President's Message: Membership, Leadership, Emerging Leaders, and LITA
Karen J. Starr

Karen J. Starr (kstarr@nevadaculture.org) is LITA President 2010–11 and Assistant Administrator for Library and Development Services, Nevada State Library and Archives, Carson City.

In 2006, ALA President Leslie Burger implemented six initiatives, including an Emerging Leaders program that is now in its fifth year. The initiative was designed to prepare librarians who are new to the profession in leadership skills that are applicable on the job and as active leaders within the association. LITA is sponsoring 2011 Emerging Leaders Bohyun Kim and Andreas Orphanides. Bohyun is currently digital access librarian at the Florida International University Medical Library. Andreas is currently librarian for digital technologies and learning at the North Carolina State University Libraries. As of the writing of this column, the projects for 2011 have not been assigned. Additional LITA members accepted into the 2011 ALA Emerging Leaders program include Tabatha Farney, Deana Greenfield, Amanda Harlan, Colleen Harris, Megan Hodge, Matthew Jabaily, Catherine Kosturski, Nicole Pagowsky, Casey Schacher, Sibyl Schaefer, Jessica Sender, and Andromeda Yelton.

LITA provides an ideal environment for its members to enhance their skills. In 2009, Emerging Leaders Team T developed a project, "Making It Personal: Leadership Development Programs for LITA," working in consultation with the LITA Membership Development Committee. Team members included Amanda Hornby (University of Washington), Angelica Guerrero Fortin (San Diego County Library), Dan Overfield (Cuyahoga Community College), and Lisa Carlucci Thomas (Yale University). The Team T members recommended the creation of "an online continuing education program to develop the leadership and project management skills necessary to maintain and promote the value and ability of LITA's professional membership to the greater librarian population." Outcomes for the training would include project-management and team-building skills within a context that focuses on the development and application of technology in libraries.

The team members also recommended establishing a LITA mentorship program that would provide for educating mentees about LITA, sharing areas of expertise and awareness, and developing a network of professionals. Dialogue on the LITA electronic discussion list and conversations with committee and interest group chairs suggest a desire and need for leadership training. The Membership Development Committee is addressing the need for mentors in LITA 101 and LITA 201, held at ALA Annual Conferences and Midwinter Meetings. LITA leadership, including the Membership Development Committee, committee and interest group chairs, the Education Committee, LITA Emerging Leaders, and others, will be included in an ongoing dialogue to see how and what can be implemented from the LITA Leadership Institute and the LITA mentorship program recommendations as submitted by the 2009 Emerging Leaders Team T.

Follow-up by LITA to implement the recommendations of Emerging Leader projects is important to the vitality and longevity of the association. Since 2007, a number of projects have been developed by Emerging Leaders. Information about the projects is available at the following locations online:

■ The ALA website: http://www.ala.org/ala/educationcareerleadership/emergingleaders/index.cfm
■ ALA Connect: http://connect.ala.org/emergingleaders
■ Facebook: http://www.facebook.com/pages/ala-emerging-leaders/156736295251?ref=ts/
■ The Emerging Leaders blog: http://connect.ala.org/2011emergingleaders
■ The Emerging Leaders wiki: http://emergingleaders.ala.org/wiki/index.php?title=main_page


Editorial Board Thoughts
Kyle Felker

Kyle Felker (felkerk@wlu.edu) is an ITAL Editorial Board member, 2007–09, and technology coordinator at Washington and Lee University Library in Lexington, Virginia.

Editor's note: We have an excellent editorial board for this journal, and with this issue we've decided to begin a new column. In each issue of ITAL, one of our board members will reflect on some question related to technology and libraries. We hope you find this new feature thought-provoking. Enjoy!

Any librarian who has been following the professional literature at all in the past ten years knows that there has been an increasing emphasis on user-centeredness in the design and creation of library services. Librarians are trying to understand and even anticipate the needs of users to a degree that's perhaps unprecedented in the history of our profession.

It's no mystery as to why. We now live in a world where global computer networks link users directly with information in such a way that often no middleman is required. Users are exploring information on their own terms, at their own convenience, sometimes even using technologies and systems that they themselves have designed or contributed to. At the same time, most libraries are feeling a financial pinch. Resources are tight, and local governments, institutions of higher education, and corporations are all scrutinizing their library operations more closely, asking "What have you done for me lately?" The unspoken coda is "It better be something good, or I'm cutting your funding."

The increasing need to justify our existence, together with our desire to build more relevant services, is driving an increased interest in assessment. How do we know when we've built a successful service? How do we define "success"? And, perhaps most importantly, in a world filled with technologies that are "here today, gone tomorrow," how do we decide which ones are appropriate to build into enduring and useful services?

As a library technologist, it's this last question that concerns me the most. I'm painfully aware of how quickly new technologies develop, mature, and fade silently into that good night with nary a trace. It's like watching protozoa under a microscope. Which of these can serve as the foundation for real, useful services? It's obvious to me that if I'm going to choose well, it's vital that I place these services in context—and not my context, the user context. In order to do that, I need to understand the users. How do they do their work? What are they most concerned with? How do they think about the library in relation to the research process? How do they use technology as part of that process? How does that process fit into the larger context of the assignment?

To answer questions like these, librarians often turn to basic marketing techniques such as the survey or the focus group. Whether we are aware of it or not, the emphasis on user-centered design is making librarians into marketers. This is a new role for us, and one that most of us have not had the training to cope with. Since most of us haven't been exposed to marketing as a discipline of study, we don't think of what we do as marketing, even when we use marketing techniques. But that's what it is. So whether we know it or not, marketing, particularly market research, is important to us.
Marketing as a discipline is in the process of undergoing some major changes right now. Recent research in sociology, psychology, and neuroscience has uncovered some new and often startling insights into how human beings think and make decisions. Marketers are struggling to incorporate these new models into their research methods and to change their own thinking about how they discover what people want.

I recently collided with this change when my own library decided to do a focus group to help us redesign our website. Since we have a school of business, I asked one of our marketing professors for help. Her advice? Don't do it. As she put it: "You and the users would just be trading ignorances." She then gave me a reading list, which included How Customers Think by Gerald Zaltman, which I now refer to as "the book that made marketing sexy."1

Zaltman's book pulls together a lot of the recent research on how people think, make choices, and remember. Some of it is pretty mind-blowing:

■ 95% of human reasoning is unconscious. It happens at a level we are barely aware of.
■ We think in images much more than we do in language.
■ Social context, emotion, and reason are all involved in the decision-making process. Without emotion, we literally are unable to make choices.
■ All human beings use metaphors to explain and understand the world around them. Metaphor is the bridge between the rational and emotional parts of the decision-making process.
■ Memory is not a collection of immutable snapshots we carry around in our heads. It's much more like a narrative or story—one that we change just by remembering it. Our experiences of the past and present are inextricably linked—one is constantly influencing the other.

Heady stuff. If you follow many of these ideas to their logical conclusions, you end up questioning the value of many traditional marketing techniques, such as surveys and focus groups. For example, if the social context in which a decision is made is important, then surveys are often going to yield false data, since the context in which the person is deciding to tick off this or that box is very different from the context in which they actually decide to use or not use your service or product. Asking users "what services would be useful" in a focus group won't be effective because you are only interviewing the users' rational thought process—it's at least as important to find out how they feel about the service, your library, the task itself, and how they perceive other people's feelings on the subject.

Zaltman proposes a number of very different marketing techniques to get a more complete picture of user decision making:

■ Use lengthy, one-on-one interviews. Interviewing the unconscious is tricky and takes trust; it's something you can't do in a traditional focus group setting.
■ Use images. We think in images, and images are a richer field for bringing unconscious attitudes to the surface.
■ Use metaphor. Invite interviewees to describe their feelings and experiences in metaphor. Explore the metaphors they come up with to more fully understand all the context.

If this sounds more like therapy than marketing to you, then your initial reaction is pretty similar to mine.
But the techniques follow logically from the research Zaltman presents. How many of us have done user assessment and launched a new service, only to find a less than warm reception for it? How many of us have had users tell us they want something, only to see it go unused when it's implemented? Zaltman's model offers potential explanations for why this happens, and methods for avoiding it.

Lest you think this has nothing to do with technology, let me offer an example: library Facebook/MySpace profile pages. There's been a lot of debate on how effective and appropriate these are. It seems to me that we can't gauge how receptive users are to this unless we understand how they feel about and think about those social spaces. This is exactly the sort of insight that new marketing techniques purport to offer us. In fact, if the research is right, and there is a social and emotional component to every choice a person makes, then that applies to every choice a user makes with regard to the library, whether it's the choice to ask a question at the reference desk, the choice to use the library website, or the choice to vote on a library bond issue.

Librarians are doing a lot of things we never imagined we'd ever need or want to do. Web design. Archival digitization. Tagging. Perhaps it's also time to acknowledge that what we do has an important marketing component, and to think of ourselves as marketers (at least part time). I'm sold enough on Zaltman's ideas that I'm willing to try them out at my own institution, and I encourage you to do the same.

Reference

1. Gerald Zaltman, How Customers Think: Essential Insights into the Mind of the Market (Boston, Mass.: Harvard Business School Press, 2003).


Editorial Board Thoughts: Appreciation for History
Cynthia Porter

Cynthia Porter (cporter@atsu.edu) is distance support librarian at A.T. Still University of Health Sciences, Mesa, Arizona.

The future looks exciting for ITAL, with our new open-access and online-only journal. As I look forward, I have been thinking about librarians and the changes I have witnessed in library technology. I would like to thank Judith Carter for her work on ITAL for over 13 years. She encouraged me to volunteer for the editorial board. I will miss her. I believe that lessons from the past can help us.

ITAL's first issue appeared in 1982—the same year that I graduated from high school. I typed all my school papers with a typewriter except for my last couple of papers in college. My father bought an early Apple computer (the Lisa). He had a daisy wheel printer—if we wanted to change fonts, we changed out the daisy wheel. I am thankful for the editing capabilities and font choices I have now when I create documents.

As an undergraduate student, I worked on dedicated OCLC terminals in the interlibrary loan (ILL) department at my college library. I was hired because I had the two hours open when ILL usually used mail. I thought our ILL service was a big help for our students. I could not imagine then that electronic copies of articles could be delivered to ILL customers within one day. Today's ILL staff doesn't have to worry about paper cuts now, either.

I graduated from library school in 1989. When I first started working as a cataloger, we were able to access OCLC on PCs (an improvement from the dumb terminals) in the libraries. Our subject heading lists were in the big red books from the Library of Congress. I tried to use the red books as an example for today's students and they had no idea what I was talking about.
Even though "subject headings" are a foreign concept to many students today, I will always value them and fight for their continuation.

I worked on several retrospective conversion projects when I worked for a library contractor until 1991. The libraries still had card catalogs, and we converted these physical catalogs to online catalogs. Nicholson Baker's article "Discards,"1 published in 1994, fondly remembered card catalogs. This article was discussed fervently in library school, but it seems quaint now. I grew up with card catalogs and I liked being able to browse through the subject listings. Browsing online does not provide the same satisfaction, but I would never give up the ability to keyword search an electronic document. I liked browsing the classification schemes, too. I liked easily seeing where my chosen number appeared within the scheme. It's harder to do the same thing online.

In 1991 I worked at an academic library where we were still converting catalog cards. We all had computers on our desks by then, and we were comfortable with regular use of e-mail. The Internet was still young, and Gophers were the new technology. Even though Gophers were text-based, I thought it was amazing how easy it was to access information from a university on the other side of the country. The Internet was the biggest technology development for me. I currently work with distance students who rely on their Internet connections to use our online library. I could not imagine even having distance students if we weren't connected with computers as we are now.

A 2009 issue of ITAL was dedicated to discovery tools. In Judith Carter's introduction to the issue she cites the browsing theory of Shan-Ju Lin Chang. Browsing is an old practice in libraries, and I am very happy to see that discovery tools use this classic library practice. Bringing like items together has been a helpful organization method for ages. When I studied S. R. Ranganathan and his Colon Classification scheme, I realized that faceted classification would work very well on the web. I found his ideas to be fascinating, but difficult to implement on book labels for classification numbers. Some discovery tools even identify "facets" in searching and limiting. Ranganathan's work is a beautiful example of an old idea blossoming years after its conception. Classification, facets, and browsing are old ideas that are still helping us organize information in our libraries. We can't see the heavily used subjects by how dirty the cards are, but getting exact statistics on search terms is more useful anyway.

I would also like to thank Marc Truitt for his time and contributions to ITAL. Marc recently finished serving for four years as ITAL editor. He helped me remember library technology. I wanted to know about his collaboration with Judith Carter. He said that he "thought no one this side of Pluto could do as well as she" as managing editor. We are lucky to have had brave librarians like Ranganathan, Carter, and Truitt. Although I enjoy remembering the past, I am very happy to utilize modern technology in my library. I don't want to live in the past, but I definitely don't want to forget it either. Thank you, library technology pioneers.

References

1. Nicholson Baker, "Discards," The New Yorker, April 4, 1994, vol. 70, no. 7, 64–85.
President's Message: Moving Forward
Karen J. Starr

Cloud computing. Web 3.0, or the Semantic Web. Google Editions. Books in copyright and books out of copyright. Born digital. Digitized material. The reduction of Stanford University's engineering library book collection by 85 percent. The publishing paradigm most of us know, and have taken for granted, has shifted. Online databases came and we managed them. Then CD-ROMs showed up and mostly went away. And along came the Internet, which we helped implement, use, and now depend on. How we deal with the current shifts happening in information and technology during the next five to ten years will say a great deal about how the library and information community reinvents itself for its role in the twenty-first century.

This shift is different, and it will create both opportunities and challenges for everyone, including those who manage information and those who use it. As a reflection of the shifts in the information arena, LITA is facing its own challenges as an association. It has had a long and productive role in the American Library Association (ALA) dating back to 1966. The talent among the association members is amazing, solid, and a tribute to the individuals who belong to and participate in LITA. LITA's members are leaders to the core and recognized as standouts within ALA as they push the edge of what information management means, and can mean.

For the past three years, LITA members, the board, and the executive committee have been working on a strategic plan for LITA. That process has been described in Michelle Frisque's "President's Message" (ITAL v. 29, no. 2) and elsewhere. The plan was approved at the 2010 ALA Annual Conference in Washington, D.C. A plan is not cast in concrete. It is a dynamic, living document that provides the fabric that drives the association. Why is this process important now more than ever? We are all dealing with the current recession. Libraries are retrenching. People face challenges participating in the library field on various levels. The big information players on the national and international level are changing the playing field. As membership, each of us has an opportunity to affect the future of information and technology locally, nationally, and internationally. This plan is intended to ensure LITA's role as a "go to" place for people in the library, information, and technology fields well into the twenty-first century.

LITA committees and interest groups are being asked to step up to the table and develop action plans to implement the strategies the LITA membership have identified as crucial to the association's ongoing success. Members of the board are liaisons to each of the committees, and there is a board liaison to the interest groups. These individuals will work with committee chairs, interest group chairs, and the membership to implement LITA's plan for the future. The committee and interest group chairs are being asked to contribute those action plans by the 2011 ALA Midwinter Meeting. They will be compiled and made available to all LITA and ALA members for their use through the LITA website (http://lita.org) and ALA Connect (http://connect.ala.org).

What is in it for you? LITA is known for its leadership opportunities, continuing education, training, publications, expertise in standards and information policy, and knowledge and understanding of current and cutting-edge technologies. LITA provides you with opportunities to develop those leadership skills that you can use in your job and lifelong career. The skills of working within a group of individuals to implement a program, influence standards and policy, collaborate with other ALA divisions, and publish can be taken home to your library. Your participation documents your value as an employee and your commitment to lifelong learning. In today's work environment, employers look for staff with proven skills who have contributed to the good of the organization and the profession.

LITA needs your participation in developing and implementing continuing education programs, publishing articles and books, and illustrating by your actions why others want to join the association. How can you do that? Volunteer for a committee, help develop a continuing education program, write an article, write a book, be a role model for others with your LITA participation, and recruit. What does your association gain? A solid structure to support its members in accomplishing the mission, vision, and strategic plan they identified as core for years to come.

Look for opportunities to participate and develop those skills. We will be working with committee and interest group chairs to develop meeting management tool kits over the next year, create opportunities to participate virtually, identify emerging leaders of all types, collaborate with other divisions, and provide input on national information policy and standards through ALA's Office for Information Technology Policy and other similar organizations. If you want to be involved, be sure to let LITA committee and interest group chairs, the board, and your elected officers know.


Editorial: Why Is ITAL Important?
Dan Marmion

Editor's note: What follows is a reprint of Dan Marmion's editorial from ITAL 20, no. 2 (2001), http://www.ala.org/ala/mgrps/divs/lita/ital/2002editorial.cfm. After reading, we ask you to consider: Why does ITAL matter to you? Post your thoughts on ITALica (http://ital-ica.blogspot.com/).

Some time ago I received an e-mail from a library school student, who asked me, "Why is [ITAL] important in the library profession?" I answered the question in this way:

ITAL is important to the library profession for at least four reasons. First, while it is no longer the only publication that addresses the use of technology in the library profession, it is the oldest (dating back to 1968, when it was founded as the Journal of Library Automation) and, we like to think, most distinguished. Second, not only do we publish on a myriad of topics that are pertinent to technology in libraries, we publish at least three kinds of articles on those subjects: pure scholarly articles that give the results of empirical research done on topics of importance to the profession, communications from practitioners in the field that present real-world experiences from which other librarians can profit, and tutorials on specific subjects that teach our readers how to do useful things that will help them in their everyday jobs. The book and software reviews that are in most issues are added bonuses. Third, it is the "official" publication of LITA, the only professional organization devoted to the use of information technology in the library profession.
Fourth, it is a scholarly, peer-reviewed journal, and as such is an important avenue for many academic librarians whose career advancement depends in part on their ability to publish in this type of journal.

In a sentence, then, ITAL is important to the library profession because it contributes to the growth of the profession and its professionals.

After sending my response, I thought it would be interesting to see what some other people with close associations to the journal would add. Thus I posed the same question to the editorial board and to the person who preceded me as editor. Here are some of their comments:

One of the many things that was not traditionally taught in library school was a systematic approach to problem solving—for somebody who needs to acquire this skill and doesn't have a mentor handy, ITAL is a wonderful resource. Over and over again, ITAL describes how a problem was identified and defined, explains the techniques used to investigate it, and details the conclusions that might fairly be drawn from the results of the investigation. Few other journals so effectively model this approach. Regardless of the specific subject of the article, the opportunity to see practical problem-solving techniques demonstrated is always valuable. (Joan Frye Williams)

The one thing I would add to your points, and it ties into a couple of them, is that by some definitions a "profession" is one that does have a major publication. As such, it is not only the "official" publication of LITA but an identity focus for those professionals in this particular area of librarianship. In fact, ideally, I would like to think that's more of a reason why ITAL is important than just the fact that it's a perk of LITA membership. (Jim Kopp)

Real-world experiences from which other librarians would profit—to use your own words. That is my primary reason for reading it, although I take note of tutorials as well. And the occasional book review here may catch my eye, as it is likely more detailed than what might appear in LJ or Booklist, and [I would] be more likely to purchase it for either my office or for the general collection. (Donna Cranmer)

ITAL begins as the oldest and best-established journal for refereed scholarly work in library automation and information technology, a role that by itself is important to libraries and the library profession. ITAL goes beyond that role to add high-quality work that does not fit in the refereed-paper mold, helping librarians to work more effectively. As the official publication of America's largest professional association for library and information technology, ITAL assures a broad audience for important work—and, thanks to its cost-recovery subscription pricing, ITAL makes that work available to nonmembers at prices far below the norm for scholarly publishing. (Walt Crawford)

The journal serves as an historical record/documentation and joins its place with many other items that together record the history of mankind. A professional/scholarly journal has a presumed life that lasts indefinitely. (Ken Bierman)

ITAL is a formal, traditional, and standardized way of sharing ideas within a specific segment of the library community. Librarianship is an institutional profession. As an institution it is an organic organization requiring communication between its members. An advantage of written communication, especially paper-based written communication, is its ability to transcend space and time. A written document can communicate an idea long after the author has died and halfway around the world. Yes, electronic communication can do the same thing, but electronic communication is much more fragile than ideas committed to paper. ITAL provides one means of fostering this communication in a format that is easily usable and recognizable. It is not the only communications format, but it fills a particular niche. In a sentence, ITAL is important to the profession because "communication is the key to our success." (Eric Lease Morgan)

So there you have the thoughts of the editor and a few other folks as to why this journal is important.

Dan Marmion was editor of ITAL, 1999–2004. This editorial was first published in the June 2002 issue of ITAL.

* * *

Why does ITAL matter to you? Post your thoughts on ITALica (http://ital-ica.blogspot.com/).


Editorial
Marc Truitt

Marc Truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, University of Alberta Libraries, Edmonton, Alberta, Canada, and editor of ITAL.

As I write this, Hurricane Ike is within twelve hours of making landfall in Texas; currently, it appears that the storm will strike directly at the Houston–Galveston area. Houstonians with long memories will be comparing Ike to Hurricane Alicia, which devastated the region in 1983, killing twenty-one and doing $2.6 billion in damage.1 Younger residents and/or more recent immigrants to the area will recall Tropical Storm Allison, which, though not of hurricane force, lashed the city and much of east Texas for two weeks in June 2001, leaving in its wake twenty-three dead, $6.4 billion in losses, and tens of thousands of homes damaged or destroyed.2 And of course, more recently, and much better known to all of us, regardless of where we live, Katrina, the "mother of all storms," killed over eighteen hundred, caused over $80 billion in damage, left huge swaths of New Orleans uninhabitable, and created a population exodus with whose effects we are living even to this day.3

Common to each of these disasters—and so many others like them—is the fact that they have often wrought terrible damage on libraries in their areas. Most of us have probably seen the pictures of the water- and mildew-damaged collections at Tulane, Xavier, the University of New Orleans, and the New Orleans Public Library system. And the damage from these events is long-term or even permanent. I formerly worked at the University of Houston (UH), and when I left there in 2006 that institution was still dealing with the consequences of Allison's destruction of UH's subterranean law library. And now I have to wonder whether UH librarians, faculty, and students might not be facing a similar or even worse catastrophe all over again with Ike.

ITAL Editorial Board member Donna Hirst has done the profession a great service with her column, "The Iowa City Flood of 2008: A Librarian and IT Professional's Perspective," which appears in this issue. Her account of how library IT folks there dealt with relocations of servers, library staff, and indeed library IT staff members themselves should be made required reading for all of us in the field, as well as for senior library administrators.

The problem, I think we all secretly know, is that emergency preparedness—also known by its current moniker, "business continuity planning" (BC)—and disaster recovery (DR) are not "sexy" subjects. Devoting a portion of our always too modest resources of money, equipment, staffing, and time to what is, at best, a sort of insurance against what might happen someday seems inexcusably profligate today.
Such planning and preparation doesn't roll out any shiny new services and will win few plaudits from staff or patrons, to say nothing of new resources from those who control our institutional purse strings. Buying higher-bandwidth equipment for a switching closet is likely to be a far easier sell.

That is, until that unthinkable something happens, and your organization is facing (or suffers) a catastrophic loss of IT services. Note that I didn't say "equipment" or "infrastructure." The really important loss will be one of services. "Stuff"—in the form of servers, workstations, networks, etc.—all costs money, but ultimately is replaceable. What are not replaceable—at least not immediately—are library services to staff and patrons: access to computing (networking, e-mail, productivity applications, etc.), Internet resources, and perhaps most importantly nowadays, the licensed electronic content on which we and our patrons have so come to rely. While the news coverage will emphasize (not without justice, I think) the lost or rescued books in a catastrophic loss situation, what staff and patrons are likely to demand first and loudest will be continuation or restoration of technology-based library services such as e-mail, web presence, web access, and licensed content. Lest there be doubt, does anyone recall what drove evacuees into public libraries in the wake of Katrina? It was, as much as anything, the desire to locate loved ones and especially the need to seek information and forms for government assistance—all of which required access to networked computing resources.

I suspect that many of us have a DR plan—if we have one at all—that is sadly dated and that has never been tested. Look at it this way: would you roll out a critical and highly visible new web service without careful preparation and testing? Yet many of us somehow think that BC or DR is somehow different, with no periodic review or testing required. Since we feel we have no resources to devote to BC or DR planning and testing, we excuse our failure to do so by telling ourselves and our administrations that "we can't really plan for a disaster, since the precise circumstances for which we're planning won't be the ones that actually occur." And so we find ourselves later facing a crisis without any preparation.

Here at the University of Alberta Libraries, we've been giving the questions of business continuity and disaster recovery a good deal of thought lately. Our preexisting DR plan was typical of the sort I've described above: out-of-date, vanishingly skeletal in its details, without explicit reference or relevance to maintenance and restoration of mission-critical services, and of course, untested. Impetus for our review has come from several sources. Perhaps the most interesting of these has been a university-sponsored BC planning process that embraces a two-pronged approach:

■ Identify and prioritize your organization's services. Working with other constituencies within the library, we have identified and prioritized approximately ten broad services to be maintained or restored in the event of an interruption of our normal business activities.
For example, our top priority is the continuation or restoration of access to licensed electronic content (e.g., e-journals, e-books, databases, etc.). Our IT disaster planning will be informed by and respond to this goal.

■ Identify "upstream" and "downstream" dependencies. We are dependent on others for services so that we can provide our own; thus we cannot offer access to the Internet for our users unless campus IT provides us with a gateway to off-campus networks. We need to make certain as we plan that campus IT is aware of and can provide this service in the scenarios for which we're planning. By the same token, others are dependent on us for the provision of services critical to their planning: our consortial partners, for example, rely on us for ILS, document delivery, and other technology-based services that we need to plan to continue in the event of a disaster.

These two facets—services and dependencies—can be expressed as a matrix that is helpful in planning for BC and DR goals that are both responsive to the needs of the organization and achievable in terms of upstream and downstream dependencies (a minimal sketch of what such a matrix might look like appears below). It has been an enlightening exercise. One consequence has been our decision to include, as part of next fiscal year's budget request, funding to help create a DR site at our library's remote storage facility, to enable us quickly to restore access to our most critical technology services. In the past, we might have used this annual request as an opportunity to highlight our need for funding to support rolling out some glamorous new service initiative. With this request, though, we are explicitly recognizing that we as an organization need to commit to measures that ensure the continuance, in a variety of situations, of our existing core services. That's a major change in mindset for us, as I suspect it would be for many library IT organizations.

A final interesting aspect of our planning process is that one of the major drivers for the university is a concern about business continuity in the event of a people-based disaster. As avian influenza (aka "bird flu") has spread beyond the confines of its southeast Asian point of origin, worry about how we continue to operate in the midst of a pandemic has been added to the more predictable suite of fires, floods, tornadoes, and earthquakes (okay, not likely in Alberta). Indeed, pandemic planning is in many ways far more difficult than that for more "normal" disasters. While in many smaller libraries the "IT shop" may consist of one person in many hats, in larger organizations such as ours (approximately 25 full-time-equivalent employees in library IT), there tends to be a great deal of specialization. Can the webmaster, in the midst of a crisis, support staff workstations? Can the help desk technician deduce why our vendor for Web of Science has suddenly and inexplicably disabled our access? Our BC process rules tell us that we should be planning for "three-deep" expertise in all critical areas, since the assumption is that a pandemic might mean that a third or more of our staff would be ill (or worse) at any given time. How many of us offer critical technology services that suffer from that IT manager's ultimate staffing nightmare, the single point of failure?

We have no profound answers to these questions, and our planning process is by no means the one that will work for all organizations. But the evidence of Katrina, Ike, and Iowa City is plain: we need to be as prepared as possible for these events.
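To make the services-and-dependencies idea a bit more concrete, here is a minimal sketch of how such a matrix might be recorded. Everything in it is hypothetical: the service names, priorities, and dependencies are invented placeholders for illustration, not the University of Alberta Libraries' actual plan. The point is only that each service carries a priority plus its upstream and downstream dependencies, so that a restoration order (and what each step presupposes) can be read directly from the data.

# Illustrative only: a toy services-and-dependencies matrix for BC/DR planning.
# All names and priorities below are hypothetical placeholders.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    name: str
    priority: int                                         # 1 = restore first
    upstream: List[str] = field(default_factory=list)     # what this service needs from others
    downstream: List[str] = field(default_factory=list)   # who depends on this service


services = [
    Service("licensed e-content access", 1,
            upstream=["campus network gateway", "vendor authentication"],
            downstream=["patrons", "distance students"]),
    Service("ILS (catalog and circulation)", 2,
            upstream=["database server", "campus network gateway"],
            downstream=["consortial partners", "document delivery"]),
    Service("library website", 3,
            upstream=["web server", "campus DNS"],
            downstream=["all public services"]),
]


def restoration_order(svcs: List[Service]) -> List[Service]:
    """Sort services by priority, i.e., the order in which to restore them."""
    return sorted(svcs, key=lambda s: s.priority)


if __name__ == "__main__":
    for svc in restoration_order(services):
        print(f"{svc.priority}. {svc.name}")
        print(f"   needs: {', '.join(svc.upstream)}")
        print(f"   needed by: {', '.join(svc.downstream)}")

In practice the same information could just as easily live in a spreadsheet or on a wiki page; what matters for planning is that priorities and dependencies are written down somewhere that can be consulted, and acted on, in the middle of an outage.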
The time to "get religion" about business continuity and disaster recovery is before the unthinkable occurs, not after. Are there any of you out there with experiences—either in preparation and planning or in recovery operations—that you would consider sharing with ITAL readers? We all would benefit from your thoughts and experiences. I know I would!

Post-Ike postscript. Ike roared ashore four days ago, and it is clear from media coverage since that Galveston suffered a catastrophe and Houston was badly damaged. Reports from area libraries are sketchy and only today beginning to filter out. Meanwhile, at the University of Houston, the building housing the architecture library lost its roof, and the salvageable portions of its collection are to be relocated to the main M.D. Anderson Library.

References

1. "Hurricane Alicia," Wikipedia, http://en.wikipedia.org/wiki/Hurricane_Alicia (accessed Sept. 12, 2007).
2. "Tropical Storm Allison," Wikipedia, http://en.wikipedia.org/wiki/Tropical_Storm_Allison (accessed Sept. 12, 2007).
3. "Hurricane Katrina," Wikipedia, http://en.wikipedia.org/wiki/Hurricane_Katrina (accessed Sept. 12, 2007).


Editorial: Beginnings
Marc Truitt

As I write these lines in late February, the first hints of spring on the Alberta prairie are manifest. Alternatively, perhaps it's just that the longer and warmer days are causing me to "think spring." There are no signs yet of early bulbs—at least, none that I can detect with around a foot of snow in most places—but the sun is now rising at 7:30 a.m. and not setting until 6 p.m., a dramatic change from the barely seven hours of daylight typical of December and January. And while none but the hardiest souls are yet outside in shorts and shirt-sleeves, somehow, daytime highs that hover around freezing seem downright pleasant in comparison with the minus thirties (not counting the wind chill) we were experiencing even a couple of weeks ago. Yes, spring is in the air, even if the calendar says it is still nearly a month away. . . .

So what, you may fairly ask, does the weather in Edmonton have to do with ITAL? This is my first issue of ITAL as editor, and it may not surprise you to hear that I've been thinking quite a bit about what might be the right theme and tone for my first column. While I've been associated with the journal for quite awhile—first as a board member, and more recently as managing editor—my role has always been comfortably limited to background tasks such as refereeing papers and production issues. Now, that is about to change; I am stepping a bit out of my comfort zone. It's about beginnings.

I follow with some awe in the footsteps of a long line of editors of ITAL (and JOLA, its predecessor). I've been honored to serve—and to learn a great deal—from the last two, Dan Marmion and John Webb. You, the readers of ITAL, and I are fortunate to have as returning managing editor Judith Carter, who preceded me and taught me the skills required for that post; I hasten to emphasize that she is definitely not responsible for the things I did not do right in the job! Regular readers of ITAL will recall that John Webb often referred humorously and admiringly to the members of the ITAL Editorial Board as his "junkyard dogs"; he claimed that they kept him honest. With the addition of a couple of fine new members, I'm confident that they will continue to do so in my case!

Okay, with that as preface, enough about me . . . let's talk about ITAL.
■ What's inside this issue

ITAL content has traditionally represented an eclectic blend of the best mainstream and leading/bleeding edge of library technology. We strive to be reflective of the broad, major issues of concern to all librarians, as well as alert to interesting applications that may be little more than a blip at the edge of our collective professional radar screen. Our audience is not limited to those actively working in library technology, although they certainly form ITAL's core readership; rather, we seek to identify and publish content that will be relevant to all with an interest in or need to know about how technology is affecting our profession. Thus, some articles will resonate with staff seeking new ways to use Web 2.0 technologies to engage our readers, while other articles will be of interest to those interested in better exploiting the four decades' worth of bibliographic metadata that forms the backbone of our integrated library systems.

The current issue of ITAL is no exception in this regard. We lead off with two papers that reflect the renewed interest of the past several years in the role and improvement of the library online catalog. Jia Mi and Cathy Weng review OPAC interfaces, searching functionality, and results displays to address the question of why the current OPAC is ineffective and what we can do to revitalize it. Timothy Dickey, in a contribution that received the 2007 LITA/Ex Libris Student Writing Award,1 summarizes the challenges and benefits of a FRBR approach to current and "next-gen" library catalogs. Interestingly, as will become clear at the end of this column, Dickey's is not the first prize-winning FRBR study to appear in the pages of ITAL.

Online learning has long been a subject of interest both to librarians and to the education sector as a whole. Whereas the focus of many previous studies has been on the techniques and efficacy of online learning systems, though, Connie Haley's paper takes a rather different approach, describing and exploring factors that characterize the preference of learners for online training, as compared with more traditional in-person techniques.

In Gary Wan's and Zao Liu's investigation of content-based information retrieval (CBIR) in digital libraries, the authors describe and argue for systems that will enable identification of images and audio clips by automated comparison against digital libraries of image and audio files. Finally, Wooseob Jeong prototypes an innovative application for enhancing web access by the visually impaired. Jeong's application makes use of force feedback, an inexpensive, proven technology drawn from the world of video gaming.

■ Some ideas about where we are going

A change of editorship is always one of those good opportunities for thinking about how we might improve, or of different directions we might explore. With that in mind, here are a couple of things we're either going to try, or that we're considering:

Different voices. ITAL's format has long included provision for two "opinion" columns, one by the editor and another by the president of LITA. From time to time, past editors have given over their columns for guest editorials.
However, there are many other voices that could enrich ITAL's pages, and the existing structure doesn't really have a "place" for the regular airing of these voices. Beginning with the June 2008 issue, ITAL will include a regular column contributed by members of the board, on a rotating basis. The column will be about any topic related to technology and libraries that is on the author's mind. I'm thinking about how we might expand this to include a similar column contributed by ITAL readers. While such reader contributions may lack the currency of a weblog, I think that they would make for thought-provoking commentary.

Oh, and there's that "currency thing." In recent years, those of us who bring you ITAL have—as have those responsible for other ALA publications—discussed at length the whole question of when and how to move to a sustainable model of electronic publishing that will address the needs of readers. This issue is of course especially important in the case of a technology-focused journal, where content tends to age rapidly. Unfortunately, for various reasons, we're not yet at the stage where we can go completely and solely electronic. A recent conversation with one board member, though, surfaced an idea that I think in the meantime has merit: essentially, we might create a preprint site for papers that have been accepted and edited for future publication in ITAL. We might call it something such as ITAL Express, and its mission would be to get content awaiting publication out and accessible. Is this a "done deal"? No, at this stage, it's just an intriguing idea, and I'd be interested in hearing your views about it . . . or anything else related to ITAL, for that matter. You can e-mail me at marc.truitt@ualberta.ca.

■ And finally, congratulations dept.

Last week, Martha Yee, of the Film and Television Archive at the University of California, Los Angeles, received the ALCTS Cataloging and Classification Section's Margaret Mann Citation for 2008. Martha was "recognized for her outstanding contributions to the practice of cataloging and her interest in cataloging education . . . [and her] professional contributions[, which] have included active participation in ALA and ALCTS and numerous publications." Of particular note, the citation specifically singled out her work in the areas of "FRBR, OPAC displays, shared cataloging and other important issues, [in which] Yee is making a significant contribution to the discussions that are leading the development of our field." Surely among the most important of these is her paper "FRBRization: A Method for Turning Online Public Finding Lists into Online Public Catalogs," which appeared in the June 2005 issue of ITAL (p. 77–95). Archived at the ITAL site, dLIST, the CDL eScholarship Repository, and elsewhere, this seminal contribution has become one of the most accessed and cited works on FRBR. We at ITAL are proud to have provided the original venue for this paper and congratulate Martha on being named recipient of the Margaret Mann Award.


Editorial
Marc Truitt

Welcome to 2009! It has been unseasonably cold in Edmonton, with daytime "highs"—I use the term loosely—averaging around -25°C (that's -13°F, for those of you ITAL readers living in the States) for much of the last three weeks. Factor in wind chill (a given on the Canadian prairies), and you can easily subtract another 10°C.
As a result, we've had more than a few days and nights where the adjusted temperature has been much closer to -40°, which is the same in either Celsius or Fahrenheit. While my boss and chief librarian is fond of saying that "real Canadians don't even button their shirts until it gets to minus forty," I've yet to observe such a feat of derring-do by anyone at much less than twenty below. Even your editor's two Labrador retrievers—who love cooler weather—are reluctant to go out in such cold, with the result that both humans and pets have all been coping with bouts of cabin fever since before Christmas.

■ So, when is it "too cold" for a server room?

Why, you may reasonably ask, am I belaboring ITAL readers with the details of our weather? Over the weekend we experienced near-simultaneous failures of both cooling systems in our primary server room (SR1), which meant that nearly all of our library IT services, including our OPAC (which we host for a consortium of twenty area libraries), a separate OPAC for Edmonton Public Library, our website, and access to licensed e-resources, e-mail, files, and print servers had to be shut down. Temperature readings in the room soared from an average of 20–22°C (68–71.5°F) to as much as 37°C (98.6°F) before settling out at around 30°C (86°F). We spent much of the weekend and beginning of this week relocating servers to all manner of places while the cooling system gets fixed. I imagine that next we may move one into each staff person's under-heated office, where they'll be able to perform double duty as high-tech foot warmers! All of this happened, of course, while the temperature outside the building hovered between -20° and -25°C.

This is not the first time we've experienced a failure of our cooling systems during extremely cold weather. Last winter we suffered a series of problems with both the systems in SR1 and in our secondary room a few feet away. The issues we had then were not the same as those we're living through now, but they occurred, as now, at the coldest time of the year. This seeming dichotomy of an overheated server environment in the depths of winter is not a matter of accident or coincidence; indeed, while it may seem counterintuitive, the fact is that many, if not all, of our cooling woes can be traced to the cold outside. The simple explanation is that extreme cold weather stresses and breaks things, including HVAC systems.

As we've tried to analyze this incident, it appears likely that our troubles began when the older of our two systems in SR1 developed a coolant leak at some point after its last preventive maintenance servicing in August. Fall was mild here, and we didn't see the onset of really severe cold weather until early to mid-December. Since the older system is mainly intended for failover of the newer one, and since both systems last received routine service recently, it is possible that the leak could have developed at any time since, although my supposition is that it may itself be a result of the cold. In any case, all seemed well because the newer cooling system in SR1 was adequate to mask the failure of the older unit, until it suffered a controller board failure that took it offline last weekend. But, with the failure of the new system on Saturday, all IT services provided from this room had to be brought down.
After a night spent trying to cool the room with fans and a portable cooling unit, we succeeded in bringing the two OPACs and other core services back online by Sunday, but the coolant leak in the old system was not repaired until midday Monday. Today is Friday, and we've limped along all week on about 60 percent of the cooling normally required in SR1. We hope to have the parts to repair the newer cooling system early next week (fingers crossed!).

Some interesting lessons have emerged from this incident, and while probably not many of you regularly deal with -30°C winters, I think them worth sharing in the hope that they are more generally applicable than our winter extremes are:

1. Document your servers and the services that reside on them. We spent entirely too much time in the early hours of this event trying to relate servers and services. We in information technology (IT) may think of shutting down or powering up servers "Fred," "Wilma," "Betty," and "Barney," but, in a crisis, what we generally should be thinking of is whether or not we can shut down e-mail, file-and-print services, or the integrated library system (ILS) (and, if the latter, whether we shut down just the underlying database server or also the related staff and public services). Perhaps your servers have more obvious names than ours, in which case, count yourself fortunate. But ours are not so intuitively named—there is a perfectly good reason for this, by the way—and with distributed applications where the database may reside here, the application there, and the web front end yet somewhere else, I'd be surprised if your situation isn't as complex as ours. And bear in mind that documentation of dependencies goes two ways: not only do you want to know that "Barney" is hosting the ILS's Oracle database, but you also want to know all of the servers that should be brought up for you to offer ILS-related services.

2. Prioritize your services. If your cooling system (or other critical server-room utility) were suddenly only operating at 50 percent of your normal required capacity, how would you quickly decide which services to shut down and which to leave up? I wrote in this space recently that we've been thinking about prioritized services in the context of disaster recovery and business continuity, but this week's incident tells me that we're not really there yet. Optimally, I think that any senior member of my on-call staff should be empowered in a given critical situation to bring down services on the basis of a predefined set of service priorities.

3. Virtualize, virtualize, virtualize. If we are at all typical of large libraries in the Association of Research Libraries (and I think we are), then it will come as no surprise that we seem to add new services with alarming frequency. I suspect that, as with most places, we tend to try and keep things simple at the server end by hosting new services on separate, dedicated servers. The resulting proliferation of new servers has led to ever-greater strains on power, cooling, and network infrastructures in a facility that was significantly renovated less than two years ago. And I don't see any near-term likelihood that this will change.
We are, consequently, in the very early days of investigating virtualization technology as a means of reducing the number of physical boxes and making much better use of the resources—especially processor and RAM—available to current-generation hardware. I'm hoping that someone among our readership is farther along this path than we are and will consider submitting to ITAL a "how we done it" on virtualization in the library server room very soon!

4. Sometimes low-tech solutions work . . . No one here has failed to observe the irony of an overheated server room when the temperature just steps away is 30° below. Our first thought was how simple and elegant a solution it would be to install ducting, an intake fan, and a damper to the outside of the building. Then, the next time our cooling failed in the depths of winter, voila!, we could solve the problem with a mere turn of the damper control.

5. . . . And sometimes they don't. Not quite, it seems. When asked, our university facilities experts told us that an even greater irony than the one we currently have would be the requirement for CAN$100,000 in equipment to heat that -30°C outside air to around freezing so that we wouldn't freeze pipes and other indoor essentials if we were to adopt the "low-tech" approach and rely on Mother Nature. Oh, well . . .

■ In memoriam

Most of the snail mail I receive as editor consists of advertisements and press releases from various firms providing IT and other services to libraries. But a few months ago a thin, hand-addressed envelope, postmarked Pittsburgh with no return address, landed on my desk. Inside were two slips of paper clipped from a recent issue of ITAL and taped together. On one was my name and address; the other was a mailing label for Jean A. Guasco of Pittsburgh, an ALA life member and ITAL subscriber. Beside her name, in red felt-tip pen, someone had written simply "deceased."

I wondered about this for some time. Who was Ms. Guasco? Where had she worked, and when? Had she published or otherwise been active professionally? If she was a life member of ALA, surely it would be easy to find out more. It turns out that such is not the case, the wonders of the Internet notwithstanding. My obvious first stop, Google, yielded little other than a brief notice of her death in a Pittsburgh-area newspaper and an entry from a digitized September 1967 issue of Special Libraries that identified her committee assignment in the Special Libraries Association and the fact that she was at the time the chief librarian at McGraw-Hill, then located in New York. As a result of checking WorldCat, where I found a listing for her master's thesis, I learned that she graduated from the now-closed School of Library Service at Columbia University in 1953. If she published further, there was no mention of it on Google. My subsequent searches under her name in the standard online LIS indexes drew blanks.

From there, the trail got even colder. McGraw-Hill long ago forsook New York for the wilds of Ohio, and it seems that we as a profession have not been very good at retaining for posterity our directories of those in the field. A friend managed to find listings in both the 1982–83 and 1984–85 volumes of Who's Who in Special Libraries, but all these did was confirm what I already knew: Ms. Guasco was an ALA life member, who by then lived in Pittsburgh. I'm guessing that she was then retired, since her death notice gave her age as eighty-six years.
of her professional career before that, i’m sad that i must say i was able to learn no more. usability as a method for assessing discovery | ipri, yunkin, and brown 181 tom ipri, michael yunkin, and jeanne m. brown usability as a method for assessing discovery the university of nevada las vegas libraries engaged in three projects that helped identify areas of its website that had inhibited discovery of services and resources. these projects also helped generate staff interest in the usability working group, which led these endeavors. the first project studied student responses to the site. the second focused on a usability test with the libraries’ peer research coaches and resulted in a presentation of those findings to the libraries staff. the final project involved a specialized test, the results of which also were presented to staff. all three of these projects led to improvements to the website and will inform a larger redesign. u sability testing has been a component of the university of nevada las vegas (unlv) libraries web management since our first usability studies in 2000.1 usability studies are a widely used and relatively standard set of tools for gaining insight into web functionality. these tests can explore issues such as the effectiveness of interactive forms or the complexity of accessing full-text articles from third-party databases. they can explore aesthetic and other emotional responses to a site. in addition, they can provide an opportunity to collect input concerning satisfaction with the layout and logic of the site. they can reveal mistakes on the site, such as coding errors, incorrect or broken links, and problematic wording. they also allow us to engage in testing issues of discovery to isolate site elements that facilitate or hamper discovery of the libraries’ resources and services. the libraries’ usability working group seized upon two library-wide opportunities to highlight findings of the past year’s studies. the first was the discovery summit, in which the staff viewed videos of staff attempting finding exercises on the homepage and discussed the finding process. the second was the discovery mini-conference, an outgrowth of a new evaluation framework and the libraries’ strategic plan. through a poster display, the working group highlighted areas dealing with discovery of library resources. the mini-conference allowed us to leverage library-wide interest in the topic of effective information-finding on the web to draw wider attention to usability’s importance in identifying the likelihood of our users discovering library resources independently. the usability working group engaged in three projects to help identify areas of the website that inhibited discovery and to generate staff interest in the process of usability. all three of these projects led to improvements to the website and will inform a larger redesign. the first project is an ongoing effort to study student responses to the site. the second was to administer a usability test with the libraries’ peer research coaches and present those findings to the libraries’ staff. the final project was requested by the dean of libraries and involved a specialized test, the results of which also were presented to staff. n student studies the usability working group began its ongoing evaluation of unlv libraries’ website by conducting two series of tests: one with five undergraduate students and one with five graduate students. 
not surprisingly, most students self-reported that the main reason they come to the libraries' site is to find books and journal articles for assignments. the group created a set of fourteen tasks that were based on common needs for completing assignments: 1. find a journal article on the death penalty. (note: if students go somewhere other than the library, guide them back.) 2. find what floor the book the catcher in the rye is on. 3. find the most current issue of the journal popular mechanics. 4. identify a way to ask a question from home. 5. find a video on global warming. 6. you need to write a bibliography for a paper. find something on the website that would help you. 7. find out what lied library's hours were for july 4. 8. find the libraries' tutorial on finding books in the library. 9. the library offers workshops on how to use the library. find one you can take. 10. find a library-recommended website in business. 11. find out what books are checked out on this card. 12. find instructions for printing from your personal laptop. 13. your sociology professor, dr. lampert, has placed something on reserve for your class. please find the material. 14. your professor wants you to read the book efficiency and complexity in grammars by john a. hawkins. find a copy of the book for your assignment. (the moderator will prompt if the person stops at the catalog.) tom ipri (tom.ipri@unlv.edu) is head, media and computer services; michael yunkin (michael.yunkin@unlv.edu) is web content manager/usability specialist; and jeanne m. brown (jeanne.brown@unlv.edu) is head, architecture studies library and assessment librarian, university of nevada las vegas libraries. the results of these tests revealed that the site was not as conducive to discovery as was hoped. the libraries are planning a complete redesign of the site in the near future; however, the results of these first two series of usability tests were compelling enough to prompt an intermediary redesign to improve some of the areas that were troublesome to students. that said, the tests also found certain parts of the old site (figure 1) to be very effective: 1. all participants used the tabbed box in the center of the page, which gives them access to the catalog, serials lists, databases, and reserves. 2. all students quickly found the "ask a librarian" link when prompted to find a way to ask a question from home. 3. most students found the libraries' hours, partly because of the "hours" tab at the top of the page and partly because of multiple access points. 4. many participants used the "site search" tab to navigate to the search page, but few actually used it to conduct searches. they effectively used the site map information also included on the search page. the usability tests also revealed some variables that undermined the goal of discoverability: 1. due to the various sources of library-related information (website, catalog, vendor databases), navigation posed problems for students. although not a specific question in the usability tests, the results show students often struggled to get back to the libraries' home page to start a new question. 2. students often expected to find different content under "help and instruction" than what was there. 3. students used the drop-down boxes as a last resort. often, they would expand a drop-down box and quickly navigate away without selecting anything from the list. 4.
with some exceptions, students mainly ignored the tabs across the top of the home page. 5. although students made good use of the tabbed box in the center of the page, many could not distinguish between "journals" and "articles & databases." 6. similarly, students easily found the "reserves" tab but could not make sense of the difference between "electronic reserves (e-reserves)" and "other reserves." 7. no student found business resources via the "subject guides" drop-down menu at the bottom of the home page. [figure 1. unlv libraries' original website design] n peer-coach test and staff presentation unlv libraries employs peer research coaches, undergraduate students who serve as frontline research mentors to their peers. the usability working group administered the same test they used with the first group of undergraduate and graduate students to the peer research coaches. although these students are trained in library research, they still struggled with some of the usability tasks. the usability working group presented the findings of the peer research coach tests to staff. the peer research coaches are highly regarded in the libraries, so staff were surprised that they had so much difficulty navigating the site; this presentation was the first time many of the staff had seen the results of usability studies of the site. the shocking nature of these results generated a great deal of interest among the staff regarding the work of the usability working group. n the dean's project in january 2009, the dean of libraries asked the usability working group for assistance in planning for the discovery summit. initially, she requested to view the video from some of the usability tests with the goal of identifying discovery-oriented problems on the libraries' website. soon after, the dean tasked the group with performing a new set of usability tests using three subjects: a librarian, a library employee with little research or web expertise, and a faculty researcher. each participant was asked to complete three tasks, first using the libraries' website, then using google. the tasks were based on items found in the libraries' special collections: 1. find a photograph available in unlv libraries of the basic magnesium mine in henderson, nevada. 2. find some information about the baneberry nuclear test. are there any documents in unlv libraries about the lawsuit associated with the test? 3. find some information about the local greenpeace chapter. are there any documents in unlv libraries about the las vegas chapter? the dean viewed those videos and chose the most interesting clips for a presentation at the discovery summit. prior to this meeting, the libraries' staff were instructed to try completing the tasks on their own so that they might see the potential difficulties users must overcome and to compare the user experience provided by our website with that provided by google. at the discovery summit, the dean presented to the staff a number of clips from these special usability tests, giving the staff an opportunity to see where users familiar with the libraries' collections stumble. the staff also were shown several clips of undergraduates using the website to perform basic tasks, such as finding journal articles or videos in the libraries, with varying degrees of success.
these clips helped illustrate the various difficulties users encounter when attempting to discover library holdings, including unfamiliar search interfaces, library jargon, and a lack of clear relationships between the catalog and other databases. this discussion helped set the stage for the discovery mini-conference. n initial changes to the site unlv libraries’ website is in the process of being redesigned, and the results of the usability studies are being used to inform that process. however, because of the seriousness of some of the issues, some changes are being implemented into an intermediary design (figure 2). the new homepage n combines article and journal searching into one tab and removes the word “databases” from the page entirely; n adds a website search to the tabbed box; n adds a “music & video” search option; n makes better use of the picture on the page by incorporating rotating advertisements in that area; n widens the page, allowing more space on the rest of the site’s templates; n breaks the confusing “help & instruction” page into two more specific pages: “help” and “using the libraries”; and n adds the main library and the branch library hours to the homepage. this new homepage is just the beginning of our efforts to improve discovery through the libraries’ website. the usability working group already has plans to do a card sort for the “using the library” category to further refine the content and language of that section. the group plans to test the initial changes to the site to ensure that they are improving discovery. reference 1. jennifer church, jeanne brown, and diane vanderpol, “walking the web: usability testing of navigational pathways at the university of nevada las vegas libraries,” in usability assessment of library-related web sites: methods and case studies, ed. nicole campbell (chicago: ala, 2001). figure 2. unlv libraries’ new website design 152 information technology and libraries | december 2011 ■■ more from the far side of the k–t boundary in my september column, i offered some old-school suggestions for how we as a profession might cope with our confused and unbalanced times. since then, several more have crossed my mind, and i thought i’d offer them, for what they may be worth: ■■ we can outsource everything but responsibility. whether it’s “the cloud,” vendor acquisition profiles, or shelfready cataloguing, outsourcing has become a popular way of dealing with budgetary and staffing stresses during the past few years. generally speaking, i have serious reservations about outsourcing our services, but i do recognize the imperatives that have caused us to resort to them. that said, in farming out critical library services, we do not at the same time gain license to farm out responsibility for their efficient operation. oversight and quality control are still up to us, and it simply will not wash with patrons today, next year, or a century from now to be told that a collection or service is unacceptably substandard because we outsourced it. a vendor’s failure is our failure, too. it’s still “our stuff,” and so are the services. ■■ we’re here to make decisions, not avoid them. document delivery, patron-driven acquisitions, usability studies, and evidence-based methodologies should help to inform and serve as validity checks for our decisions, not be replacements for them. 
as with outsourcing and our over-reliance on technology-driven solutions, i fear that these services and methodologies are in real danger of becoming crutches, enabling us to avoid making decisions that may be difficult, unpopular, tedious, or simply too much work. but if decisions regarding collections and services can be reduced to simple questions of demand or the outcome of a survey, then who needs us? it’s our job to make these decisions; demandor survey-driven techniques are simply there to assist us in doing so. ■■ relevance is relative. we talk about “relevance” in much the same breathlessly reverential voice as we speak of the “user” . . . as if there were but one, uniquely “relevant” service model for that single, all-encompassing “user.” one of the perils of our infatuation with “relevance” is the illusion that by adopting this or that technology or targeted service, we are somehow remaining relevant to “the user.” which user? just as not all patrons come to us seeking potboiler romances, so too not all users demand that all services and collections be made available electronically, over mobile platforms. since we do recognize that our resources are finite, rather than pandering to some groups at the expense of others with trendy temporal come-ons, why not instead focus on long-term services and collections that reflect our values? the patrons who really should matter most to us will respect us for this demonstration of our integrity. ■■ libraries are ecosystems. as with the rest of the world around us, libraries comprise arrays of interlocking, interdependent, and often poorly understood/ documented entities, services, and systems. they’ve developed that way over centuries. and just as so often happens in the larger world, any and every change we make can cause a cascade of countless other changes, many of which we might not anticipate before making that seemingly simple initial change. we are stewards of the libraries in which we work: our obligation, as librarians, is to respect what was bequeathed to us, to care for and use it wisely, and to pass it on to those who follow in at least the condition in which we received it—preferably better. environments, including libraries, change and evolve of course, but critics of the supposedly slow pace of change in libraries fail to grasp that our role is just as much that of the conservationist as it is the advocate of development and change. our mission is not change for change’s sake; rather, it is incremental, considered change that will benefit not only today’s patrons and librarians, but respect those of the past and serve those of the future as well. perhaps librarians need an analogue to the medical profession’s hippocratic oath: primum non nocere, “first, do no harm.” ■■ innocents abroad probably few ital readers will be aware (i certainly wasn’t!) that mark twain’s bestselling book during his lifetime was not tom sawyer or huckleberry finn—or any of a host of others of his now better-remembered works— but rather his 1869 travelogue innocents abroad, or the new pilgrims’ progress. the book, which i’ve been savoring in my spare leisure reading time over the past several months, records in journal form twain’s involvement in a voyage in 1867 by a group of american tourists to various locales in mediterranean europe, northern africa, and the near east. 
in the book, twain gleefully skewers marc truitt marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. outgoing editor’s column: parting thoughts outgoing editor’s column | truitt 153 committee assignments go, i think it fair to say that this is probably one of the more thankless. board members must be expert in all areas of technology, and as important, willing and able to do a credible job of pretending to be so in those areas where they are not expert! they must be able to recognize and create good prose and to offer authors practical, constructive insights and guidance in the sometimes black art of turning promising manuscripts into great articles. as i think many ital authors will attest, they do a superb job at this. they also write some of the most interesting and perceptive editorial columns you’ll see in ital! ■■ judith carter. it’s really impossible to overstate the contributions made by judith to ital. other than a brief four-year interlude during which i served in the role, judith has been managing editor for much of the past decade and more. she taught me the job when she relinquished it in early 2004, and then graciously offered to take it back again when i was named editor four years later. more than any other single person, she is responsible for the ital you hold in your hands, and she does it with skill and tireless dedication. she also has been my coach, my confidante, and—as only a true friend can be—even my butt-kicker when i was late in observing a deadline, which has not infrequently been the case. thank you for everything, judith. ■■ dan and john. the late dan marmion brought me on board at ital as a board member in 2000; he later asked me to serve as his managing editor. he also encouraged me to succeed john webb as editor in 2007. from both dan and john i learned much about the role of an editor and especially about what ital could and should be. i am endlessly appreciative for their mentoring and hope that i have been reasonably successful in maintaining the high standards that they set for the journal. ■■ the authors. without interesting, well-researched, and timely content, there would be no ital. i have been blessed with a rich and nearly constant supply of superb manuscript submissions that the folks who make up the ital “publication machine” have then turned into a highly stimulating and readable journal. i hope you agree. ■■ the readers. and finally, i thank all of you, gentle readers. you are the reason that ital exists. i have been grateful for your support, your patience, and your always-constructive suggestions. beginning with the march 2012 issue, ital will be edited by bob gerrity of boston college. i’ve been acquainted with bob for a number of years, and i can’t think of a better person to guide this journal through the tour-goers, those they encounter, and of course himself; as with twain generally, it is at turns witty, outlandish, biting, and—by today’s lights—completely lacking in political correctness. in short, it’s vintage mark twain: delicious! i mention innocents abroad not simply because i’m currently enjoying it (and hoping that by saying so, i might pique some other ital reader ’s interest in giving it a test drive) but also because it—as with other books, songs, stories, etc., about journeys-taken—is a metaphor for life. 
we are all “innocents” in some sense as we traverse the days and years of growth in selfawareness, relationships, work, and all the other facets that make up life. it’s a comforting way of viewing the world, i think. i’ve served with ital in various capacities for more than eleven years. that’s a very long time in terms of one particular ala/lita committee. it’s now time for my journey and ital’s to part ways. this is my final column as editor of this journal. this “innocent” is debarking the ital ship and moving on. ital is the product of the dedicated labor of many people of whom i am but one. for some of them, it is a labor of love. as with the credits at the end of a film, it is customary for an editor in her or his final column to recognize and thank the people who made it all possible. i’d like to do so now. polite audience members know to remain until “the end” rolls by. i hope you’ll help me honor these people by doing so, too: ■■ mary taylor, valerie edmonds, and melissa prentice in the lita office. over the years, they’ve been unfailingly helpful to me, to say nothing of being nearly as unfailingly tolerant of my clueless and occasionally obstreperous, passive-aggressive ignorance of the byzantine ways of the ala bureaucracy. ■■ ala production services. production services folk are the professionals who, among innumerable other skills, copyedit and typeset manuscripts, perform miracles with figures and tables, and generally make ital into the quality product you receive (whether it is celluloseor electron-based). regardless of ital’s future publishing format and directions, count yourself fortunate as long as the good people in production services continue to play a role. i’d especially like to single out tim clifford, ital’s production editor, who over the past several years has brought skill, grace, stability, and a healthy dose of humor to this critical post. ■■ the members—past and present—of the ital editorial board. the editorial board is a lita committee; the members of this committee serve as the editor’s primary group of reviewer-referees of manuscripts submitted for publication consideration. as 154 information technology and libraries | december 2011 “happy trails,” and “t-t-t-t-that’s all, folks!” “the end.” changes that will be coming over the next few years. i wish him the very best and hope that he has as much fun in the job—and on the journey—as have i. from the managing editor i’d like to take this opportunity to give marc truitt my heartfelt thanks and best wishes as he leaves his longterm relationship with information technology and libraries (ital). i appreciate how he ably stepped into the role of managing editor (me) when i needed to resign to focus on my full-time job. a few years later he became the new editor and i accepted his request to be his me. i think we’ve had a good partnership. i’ve nudged marc about the production schedule while he has managed manuscripts, the peer review process, and eloquently represented the journal when needed. marc held and communicated a clear and scholarly view of the journal to the editorial board and to lita. i have fond memories of many cups of tea drunk in various ala conference venues while we discussed ital, lita, and shared news of mutual friends. we endured the loss of our friend and mentor dan marmion together a year ago september when marc wrote a letter which i read at the memorial service. this too may be my final issue of ital. it is unknown at time of printing. 
i support the online future of ital and have offered my services to robert gerrity until a paper version is no longer supported and we successfully transition my duties into an online environment/to a new me. i know he will take the journal into its new iteration with skill and grace. i have served lita and ital for over 13 years and am proud of the quality peer reviewed journal dan marmion, john webb, marc truitt, the editorial board members and i have shared with the members of lita. it has also been my honor to communicate with each of the authors and to facilitate their scholarly communication to our profession. without the authors, where would we be? thank you all, judith carter. editorial board thoughts | dehmlow 53 mark dehmloweditorial board thoughts the ten commandments of interacting with nontechnical people m ore than ten years of working with technology and interacting with nontechnical users in a higher education environment has taught me many lessons about successful communication strategies. somehow, in that time, i have been fortunate to learn some effective mechanisms for providing constructive support and leading successful technical projects with both technically and “semitechnically” minded patrons and librarians. i have come to think of myself as someone who lives in the “in between,” existing more in the beyond than the bed or the bath, and, while not a native of either place, i like to think that i am someone who is comfortable in both the technical and traditional cliques within the library. ironically, it turns out that the most critical pieces to successfully implementing technology solutions and bridging the digital divide in libraries has been categorically nontechnical in nature; it all comes down to collegiality, clear communication, and a commitment to collaboration. as i ruminated on the last ten plus years of working in technology, i began to think of the behaviors and techniques that have proved most useful in developing successful relationships across all areas of the library. the result is this list of the top ten dos and don’ts for those of us self-identified techies who are working more and more often with the self-identified nontechnical set. 1. be inclusive—i have been around long enough to see how projects that include only technical people are doomed to scrutiny and criticism. the single best strategy i have found to getting buy-in for technical projects is to include key stakeholders and those with influence in project planning and core decision-making. not only does this create support for projects, but it encourages others to have a sense of ownership in project implementation—and when people feel ownership for a project, they are more likely to help it succeed. 2. share the knowledge—i don’t know if it is just the nature of librarianship, but librarians like to know things, and more often than not they have a healthy sense of curiosity about how things work. i find it goes a long way when i take a few moments to explain how a particular technology works. our public services specialists, in particular, often want to know the details of how our digital tools work so that they can teach users most effectively and answer questions users have about how they function. sharing expertise is a really nice way to be inclusive. 3. 
know when you have shared enough—in the same way that i don’t need to know every deep detail of collections management to appreciate it, most nontechies don’t need hour-long lectures on how each component of technology relates to the other. knowing how much information to share when describing concepts is critical to keeping people’s interest and generally keeping you approachable. 4. communicate in english—it is true that every specialization has its own vocabulary and acronyms (oh how we love acronyms in libraries) that have no relevance to nonspecialists. i especially see this in the jargon we use in the library to describe our tools and services. the best policy is to avoid jargon and explain concepts in lay-person’s terms or, if using jargon is unavoidable, define specialized words in the simplest terms possible. using analogies and drawing pictures can be excellent ways to describe technical concepts and how they work. it is amazing how much from kindergarten remains relevant later in life! 5. avoid techno-snobbery—i know that i am risking virtual ostracism in writing this, but i think it needs to be said. just because i understand technology does not make me better than others, and i have heard some variant of the “cup holder on the computer” joke way too often. even if you don’t make these kinds of comments in front of people who aren’t as technically capable as you, the attitude will be apparent in your interactions, and there is truly nothing more condescending. 6. meet people halfway—when people are trying to ask technology-related questions or converse about technical issues, don’t correct small mistakes. instead, try to understand and coax out their meaning; elaborate on what they are saying, and extend the conversation to include information they might not be aware of. people don’t like to be corrected or made to feel stupid—it is embarrassing. if their understanding is close enough to the basic idea, letting small mistakes in terminology slide can create an opening for a deeper understanding. you can provide the correct terminology when talking about the topic without making a point to correct people. 7. don’t make a clean technical/nontechnical distinction— after once offering the “technical” perspective on a topic, one librarian said to me that it wasn’t that they themselves didn’t have any technical mark dehmlow (mdehmlow@nd.edu) is digital initiatives librarian, hesburgh libraries, university of notre dame, notre dame, indiana. 54 information technology and libraries | june 2009 perspective, it just wasn’t perhaps as extensive as mine. each person has some level of technical expertise; it is better to encourage the development of that understanding rather than compartmentalizing people on the basis of their area of expertise. 8. don’t expect everyone to be interested—just because i chose a technical track and am interested in it doesn’t mean everyone should be. sometimes people just want to focus on their area of expertise and let the technical work be handled by the techies. 9. assume everyone is capable—at least at some level. sometimes it is just a question of describing concepts in the right way, and besides, not everyone should be a programmer. everyone brings their own skills to the table and that should be respected. 10. expertise is just that—and no one, no one knows everything. there just isn’t enough time, and our brains aren’t that big. embrace those with different expertise, and bring those perspectives into your project planning. 
a purely technical perspective, while perhaps being efficient, may not provide a practical or intuitive solution for users. diversity in perspective creates stronger projects. in the same way that the most interesting work in academia is becoming increasingly more multidisciplinary, so too the most successful work in libraries needs to bring diverse perspectives to the fore. while it is easy to say libraries are constantly becoming more technically oriented because of the expanse of digital collections and services, the need for the convergence of the technical and traditional domains is clear—digital preservation is a good example of an area that requires the lessons and strengths learned from physical preservation, and, if anything, the technical aspects still raise more questions than solutions—just see henry newman’s article “rocks don’t need to be backed up” to see what i mean.1 increasingly, as we develop and implement applications that better leverage our collections and highlight our services, their success hinges on their usability, user-driven design, and implementations based on user feedback. these “user”-based evaluation techniques fit more closely with traditional aspects of public services: interacting with patrons. lastly, it is also important to remember that technology can be intimidating. it has already caused a good deal of anxiety for those in libraries who are worried about long-term job security as technology continues to initiate changes in the way we perform our jobs. one of the best ways to bring people along is to demystify the scary parts of technology and help them see a role for themselves in the future of the library. going back to maslow’s hierarchy of needs, people want to feel a sense of security and belonging, and i believe it is incumbent upon those of us with a deep understanding of technology to help bring the technical to the traditional in a way that serves everyone in the process. reference 1. henry newman, “rocks don’t need to be backed up,” enterprise storage forum.com (mar. 27, 2009), www.enterprise storageforum.com/continuity/features/article.php/3812496 (accessed april 24, 2009). 8 information technology and libraries | june 20088 information technology and libraries | september 2008 from our readers: virtues and values in digital library architecture mark cyzyk editor’s note: “from our readers” will be an occasional feature, highlighting ital readers’ letters and commentaries on timely issues. at the fall 2007 coalition for networked information (cni) conference in washington, d.c., i pre-sented “a survey and evaluation of open-source electronic publishing systems.” toward the end of my presentation was a slide enumerating some of the things i had personally learned as a web application architect during my review of the systems under consideration: n platform independence should not be neglected. n one inherits the flaws of external libraries and frameworks. choose with care. n installation procedures must be simple and flawless. n don’t wake the sysadmin with “slap a gui on that xml!”—and push application administration out, as much as possible, to select users. n documentation must be concise, complete, and comprehensive. “i can’t guess what you’re thinking.” initially, these were just notes i thought might be useful to others, figuring it’s typically helpful to share experiences, especially at international conferences. 
but as i now look at those maxims, it occurs to me that when abstracted further they point in the direction of more general concepts and traits—concepts and traits that accurately describe us and the products of our labor if we are successful, and prescribe to us the concepts and traits we need to understand and adopt if we are not. in short, peering into each maxim, i can begin to make out some of the virtues and values that underlie, or should underlie, the design and architecture of our digital library systems. n freedom and equality platform independence should not be neglected. “even though this application is written in platformindependent php, the documentation says it must be run on either red hat or suse, or maybe it will run on solaris too, but we don’t have any of these here.” while i no doubt will be heartily flamed for suggesting that microsoft has done more to democratize computing than any other single company, i nevertheless feel the need to point out that, for many of us, windows server operating systems and our responsibility for administering them way back when provided the impetus for adding our swipe-card barcodes to the acl of the data center—surely a badge of membership in the club of enterprise it if ever there was one. you may not like the way windows does things. you may not like the way microsoft plays with the other boys. but to act like they don’t exist is nothing more than foolish burying one’s head in the *nix sand. windows servers have proven themselves time and again as being affordable, easily managed, dependable, and, yes, secure workhorses. windows is the ford pickup truck of the server world, and while that pickup will some day inevitably suffer a blowout of its twenty-year-old head gasket (and will therefore be respectfully relegated to that place where all dearly departed trucks go), it’s been a long and good run. we should recognize and appreciate this. windows clearly has a place in the data center, sitting quietly humming alongside its unix and linux brothers. i imagine that it actually takes some effort to produce platform-dependent applications using platform-independent languages and frameworks. such effort should be put toward other things. keep it pure. and by that i mean, keep it platform independent. freedom to choose and presumed equality among the server-side oses should reign. n responsibility and good sense one inherits the flaws of external libraries and frameworks. choose with care. so you’ve installed the os, you’ve installed and configured the specified web server, you’ve installed and configured the application platform, you’ve downloaded and compiled the source, yet there remains a long list of external libraries to install and configure. one by one you install them. suddenly, when you get to library number 16 you hit a snag. it won’t install. it requires a previous version of library number 7, and multiple versions of library number 7 can’t be installed at the same time on the same box. worse yet, as you take a break to read some more of the documentation, it sure looks like required library number 19 is dependent on the current version of library number 7 and won’t work with any previous version. and could it be that library number 21 is dependent on library number 20, which is dependent on library number 23, which is dependent on—yikes—library number 21? mark cyzyk (mcyzyk@jhu.edu) is the scholarly communication architect, library digital programs group, sheridan libraries, johns hopkins university in baltimore. 
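the tangle just described (library 21 depending on library 20, which depends on library 23, which depends back on library 21) is simply a cycle in a dependency graph, and it can be spotted before you are sixteen installs deep. the sketch below is illustrative only: the numbered "libraries" are the essay's hypothetical example, and the small depth-first search is one of many ways to find such a cycle; it describes no particular packaging tool.

# illustrative sketch: find a circular dependency among external libraries.
# the numbered "libraries" are hypothetical, echoing the example above.

DEPENDS_ON = {
    "library-16": ["library-7"],
    "library-19": ["library-7"],
    "library-20": ["library-23"],
    "library-21": ["library-20"],
    "library-23": ["library-21"],   # ...and the circle closes
}

def find_cycle(graph):
    """return one dependency cycle as a list of names, or None if the graph is acyclic."""
    visiting, done = set(), set()

    def visit(node, path):
        if node in visiting:                      # back edge: we have looped
            return path[path.index(node):] + [node]
        if node in done:
            return None
        visiting.add(node)
        for dep in graph.get(node, []):
            cycle = visit(dep, path + [node])
            if cycle:
                return cycle
        visiting.discard(node)
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

print(find_cycle(DEPENDS_ON))
# ['library-20', 'library-23', 'library-21', 'library-20']

run against a real dependency list, a check like this turns "yikes" into a build-time warning rather than an afternoon of head scratching.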
from our readers: virtues and values in digital library architecture | cyzyk 9 all things come full circle. but let’s suppose you’ve worked out all of these dependencies, you’ve figured out the single, secret order in which they must install, you’ve done it, and it looks like it’s working! yet, when you go to boot up the web service, suddenly there are errors all over the place, a fearsome crashing and burning that makes you want to go home and take a nap. something in your configuration is wrong? something in the way your configuration is interacting with an external library is wrong? you search the logs. you gather the relevant messages. they don’t make a lot of sense. now what to do? you search the lists, you search the wikis to no avail, and finally, in desperation, you e-mail the developers. “but that’s a problem with library x, not with our application.” au contraire. i would like to strongly suggest a copernican revolution in how we think about such situations. while it’s obvious that the developers of the libraries themselves are responsible for developing and maintaining them, i’d like to suggest that this does not relieve you, the developer of a system that relies on their software, from responsibility for its bugs and peculiar configuration problems. i’d like to suggest that, far from pushing responsibility in the case mentioned above out to the developers of the malfunctioning external library, that you, in choosing that library in the first place, have now inherited responsibility for it. even if you don’t believe in this notion of inheritance, if you would please at least act as if it were true, we’d all be in a better place. part of accepting this kind of responsibility is you then acting as a conduit through which we poor implementers learn the true nature of the problem and any solutions or temporary workarounds we may apply so that we can get your system up and running pronto. in the end, it’s all about your system. your system as a whole is only as strong as the weakest link in its chain of dependencies. n simplicity and perfection installation procedures must be simple and flawless. it goes without saying that if we can’t install your system we a fortiori can’t adopt it for use in our organization. i remember once having such a difficult time trying to get a system up and running that i almost gave up. i tried first to get it running against apache 1.4, then against apache 2.0. i had multiple interactions with the developers. i banged my head against the wall of that system for days in frustration. the documentation was of little help. it seemed to be more part of an internal documentation project, a way for the developers to communicate among themselves, than to inform outsiders like me about their system. and related to this i remember driving to work during this time listening to a report on npr about the famous hopkins pediatric neurosurgeon, dr. ben carson. apparently, earlier in the week he had separated the brains of siamese twins and the twins were now doing fine, recuperating. the npr commentator marveled at the intricacy of the operation and at the fact that the whole thing took, i believe, five hours. “five hours? five hours?!” i exclaimed while barreling down the highway in my vintage 1988 ford ranger pickup (head gasket mostly sealed tight, no compression leakage). 
“i can’t get this system at work installed in five days!” our goal as system architects needs to be that we provide to our users simple and flawless installation procedures so that our systems can, on average, be installed and configured in equal or less time than it takes to perform major brain surgery.1 “all in an afternoon” should become our motto. i am happy to find that there are useful and easy to use package managers, e.g., yum and synaptic, for doing such things on various linux distributions. windows has long had solid and sophisticated installation utilities. tomcat supports drop-in-place war files. when possible and appropriate, we need to use them. n justice and e-z livin don’t wake the sysadmin with “slap a gui on that xml!”—and push application administration out, as much as possible, to select users. i remember reading plato’s republic as an undergraduate and the feeling of being let down when the climax of the whole thing was a definition in which “justice” simply is each man serving his proper place in society and not transgressing the boundaries of his role. “that’s it?” i thought. “so you have this rigidly hierarchical society and each person in it knows his role and knows in which slot his role fits—and keeping to this is ‘justice’?” this may not be such a great way to structure a society, but now that i think about it, it’s a great way to structure a computer application. sit down and carefully look at the functions your program will provide. then create a small set of user roles to which these functions will be carefully mapped. in the end you will have a hierarchical structure of roles and functions that should look perfectly simple and rational when drawn on a piece of paper. and while the superuser role should have power over 10 information technology and libraries | september 2008 all and access to all functions in the application, the list of functions that he alone has access to should be small, i.e., the actual work of the superuser should be minimized as much as possible by making sure that most functions are delegated to the members of other, appropriate, proper user roles. doing this happily results in what i call the state of e-z livin: the last thing you want is for users to constantly be calling you with data issues to fix. you therefore will model management of the data—all of it—and the configuration of the application itself—most of it— directly into the architecture of the application, provide users the guis they need to configure and manage things themselves, and push as much functionality as you can out to them where it belongs. let them click their respective ways to happiness and computing goodness. you build the tool, they use it, and you retire back to the land of e-z livin. users are assigned to their roles, and all roles are in their proper places. application architecture justice is achieved. n clarity and wholeness documentation must be concise, complete, and comprehensive. “i can’t guess what you’re thinking.” as system developers we’ve probably all had the magical experience of a mind meld with a fellow developer when working intensively on a project. i have had this experience with two other developers, separately, at different stages of my career. (one of them, in fact, used to point out to everyone that, “between the two of us, we make one good developer!”) this is a wonderful and magical and productive working relationship in which to be, and it needs to be recognized, supported, and exploited whenever it happens. 
you are lucky if you find yourself designing and developing a system and your counterpart is reading your mind and finishing your sentences. however, just as it’s best to leave that nice young couple cuddling in the corner booth alone, so too it really doesn’t make a lot of sense to expect the mind-melded developers to turn out anything that remotely resembles coherent and understandable documentation. those undergoing a mind meld by definition know perfectly well what they mean. to the rest of us it just feels like we missed a memo. if you have the luxury, make sure that the one writing the documentation is not currently undergoing a mind meld with anyone else on the development team. scotty typically stayed behind while he beamed the others down. beam them down. be that scotty. you do the world a great service by staying behind on the ship and dutifully reporting, clearly and comprehensively, what’s happening down on the red planet. to these five maxims, and their corresponding virtues, i would add one more set, one upon which the others rely: n empathy and graciousness you are not your audience. at least in applied computing fields like ours, we need to break with the long-held “guru in the basement” mentality. the actions of various managerial strata have now ostensibly acknowledged for us that technical expertise, especially in applied fields, is a commodity, i.e., it can be bought. a dearth of such expertise is remedied by simply applying money to the situation—admittedly difficult to do at the majority of institutions of higher education, but a common occurrence at the wealthiest. nevertheless, the dogmatic hold of the guru has been broken and the magical aura that once draped her is not now so resplendent—her relative rarity, and the clubby superiority that depended upon it, has been diluted significantly by the sheer number of counterparts who can and will gleefully fill her function. we respect, value, and admire her; it’s just that her stranglehold on things has (rightfully) been broken. and while nobody is truly indispensable, what is more difficult and rare to find is someone who has the guru’s same level of technical chops coupled with a genuine empathic ability to relate to those who are the intended users of her systems and services. unless your systems and services are geared primarily toward other developers, programmers, and architects— and presumably they are not, nor, in the library world, should they be—your users will typically be significantly unlike you. let me repeat that: your users are not like you. rephrased: you are not your audience. when looking back over the other maxims, values, and virtues mentioned in this essay then, the moralpsychological glue that binds them all is composed of empathy for our users—faculty, students, librarians, non-technical staff—and the graciousness to design and carry out a project plan in a spirit of openness, caring, flexibility, humility, respect, and collaboration. when empathy for the users of our systems is absent—and there are cases where you can actually see this in the design and documentation of the system itself—our systems will ultimately not be used. 
when the spirit of graciousness is broken, men become robots, mere rule followers, and users will boycott using their systems and will look elsefrom our readers: virtues and values in digital library architecture | cyzyk 11 where, naturally preferring to avoid playing the simonsays games so often demanded by tech folk in their workaday worlds; there is a reason the comic strip dilbert is so funny and rings so true. when confronted with a lack of empathy and graciousness on our part, the users who can boycott using our systems and services will boycott using our systems and services. and we’ll be left out in the rain, feeling like, as bonnie raitt once sadly sang, “i can’t make you love me if you don’t / i can’t make your heart feel something it won’t.” empathy and graciousness, while not guaranteeing enthusiastic adoption of our systems and services, are a necessary precondition for users even countenancing participation. there are undoubtedly other virtues and values that can usefully be expounded in the context of digital library architecture—consistency, coherence, and elegance immediately come to mind—and i could go on and on analyzing the various maxims surrounding these that bubble up through the stack of consciousness during the course of the day. yet doing so would conflict with another virtue i think is key to the success and enjoyment of opinionpiece essays like this and maybe even of other sorts of publications and presentations: brevity. note 1. a colleague of mine has since informed me that carson’s operation took twenty-five hours, not five. nevertheless, my admonition here still holds. when installation and configuration of our systems are taking longer, significantly longer, than it takes to perform major brain surgery, surely there is something amiss? editorial | truitt 107 marc truitteditorial: computing in the “cloud” silver lining or stormy weather ahead? c loud computing. remote hosting. software as a service (saas). outsourcing. terms that all describe various parts of the same it elephant these days. the sexy ones—cloud computing, for example—emphasize new age-y, “2.0” virtues of collaboration and sharing with perhaps slightly mystic overtones: exactly where and what is the “cloud,” after all? others, such as the more utilitarian “remote hosting” and “outsourcing,” appeal more to the bean counters and sustainabilityminded among us. but they’re really all about the same thing: the tradeoff between cost and control. that the issue increasingly resonates with it operations at all levels these days can be seen in various ways. i’ll cite just a few: n at the meeting of the lita heads of library technology (holt) interest group at the 2009 ala annual conference in chicago, two topics dominated the list of proposed holt programs for the 2010 annual conference. one of these was the question of virtualization technology, and the other was the whole white hat–black hat dichotomy of the cloud.1 practically everyone in the room seemed to be looking at—or wanting to know more about—the cloud and how it might be used to benefit institutions. n my institution is considering outsourcing e-mail. all of it—to google. times are tough, and we’re being told that by handing e-mail over to the googleplex, our hardware, licensing, evergreening, and technical support fees will total zero. zilch. with no advertising. 
heady stuff when your campus hosts thirty-plus central and departmental mail servers, at least as many blackberry servers, and total costs in people, hardware, licensing, and infrastructure are estimated to exceed can$1,000,000 annually. n in the last couple of days, library electronic discussion lists such as web4lib have been abuzz—or do we now say a-twitter?—about amazon's orwellian kindle episode, in which the firm deleted copies of 1984 and animal farm from subscribers' kindle e-book readers without their knowledge or consent.2 indeed, amazon's action was in violation of its own terms of service, in which the company "grants [the kindle owner] the non-exclusive right to keep a permanent copy of the applicable digital content and to view, use, and display such digital content an unlimited number of times, solely on the device or as authorized by amazon as part of the service and solely for [the kindle owner's] personal, noncommercial use."3 all of this has me thinking back to the late 1990s marketing slogan of a manufacturer of consumer-grade mass storage devices—remember removable hard drives? iomega launched its advertising campaign for the 1 gb jaz drive with the catch-line "because it's your stuff." ultimately, whether we park it locally or send it to the cloud, i think we need to remember that it is our stuff. what i fear is that in straitened times, it becomes easy to forget this as we struggle to balance limited staff, infrastructure, and budgets. we wonder how we'll find the time and resources to do all the sexy and forward-looking things, burdened as we are with the demands of supporting legacy applications, "utility" services, and a huge and constantly growing pile of all kinds of content that must be stored, served up, backed up (and, we hope, not too often, restored), migrated, and preserved. the buzz over the cloud and all its variants thus has a certain siren-like quality about it. the notion of signing over to someone else's care—for little or no apparent cost—our basic services and even our own content (our stuff) is very appealing. the song is all the more persuasive in a climate where we've moved from just the normal bad news of merely doing more with less to a situation where staff layoffs are no longer limited to corporate and public libraries, but indeed extend now to our greatest institutions.4 at the risk of sounding like a paranoid naysayer to what might seem a no-brainer proposition, i'd like to suggest a few test questions for evaluating whether, how, and when we send our stuff into the cloud: 1. why are we doing this? what do we hope to gain? 2. what will it cost us? bear in mind that nothing is free—except, in the open-source community, where free beer is, unlike kittens, free. if, for example, the borg offer to provide institutional mail without advertisements, there is surely a cost somewhere. the borg, sensibly enough, are not in business to provide us with pro bono services. 3. what is the gain or loss to our staff and patrons in terms of local customization options, functionality, access, etc.? 4. how much control do we have over the service offered or how our content is used, stored, repurposed, or made available to other parties? 5. what's the exit strategy?
what if we want to pick up and move elsewhere? can we reclaim all of our stuff easily and portably, leaving no sign that we'd ever sent it to the cloud? we are responsible for the services we provide and for the content with which we have been entrusted. we cannot shrug off this duty by simply consigning our services and our stuff to the cloud. to do so leaves us vulnerable to an irreparable loss of credibility with our users; eventually some among them would rightly ask, "so what is it that you folks do, anyway?" we're responsible for it—whether it's at home or in the cloud—because it's our stuff. it is our stuff, right? references and notes 1. i should confess, in the interest of full disclosure, that it was eli neiburger of the ann arbor district library who suggested "hosted services as savior or slippery slope" for next year's holt program. i've shamelessly filched eli's topic, if not his catchy title, for this column. thanks, eli. also, again in the interest of full disclosure, i suggested the virtualization topic, which eventually won the support of the group. finally, some participants in the discussion observed that virtualization technology and hosting are in many ways two sides of the same topical coin, but i'll leave that for others to debate. 2. brad stone, "amazon erases orwell books from kindle," new york times, july 17, 2009, http://www.nytimes.com/2009/07/18/technology/companies/18amazon.html?_r=1 (accessed july 21, 2009). 3. amazon.com, "amazon kindle: license agreement and terms of use," http://www.amazon.com/gp/help/customer/display.html?nodeid=200144530 (accessed july 21, 2009). 4. "budget cutbacks announced in libraries, center for professional development," stanford university news, june 10, 2009, http://news.stanford.edu/news/2009/june17/layoffs-061709.html (accessed july 22, 2009); "harvard libraries cuts jobs, hours," harvard crimson (online edition), june 26, 2009, http://www.thecrimson.com/article.aspx?ref=528524 (accessed july 22, 2009).
michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago. michelle frisque president's message: join us at the forum! the first lita national forum i attended was in milwaukee, wisconsin. it seems like it was only a couple of years ago, but in fact nine national forums have since passed. i was a new librarian, and i went on a lark when a colleague invited me to attend and let me crash in her room for free. i am so glad i took her up on the offer because it was one of the best conferences i have ever attended. it was the first conference that i felt was made up of people like me, people who shared my interests in technology within the library. the programming was a good mix of practical know-how and mind-blowing possibilities. my understanding of what was possible was greatly expanded, and i came home excited and ready to try out the new things i had learned. almost eight years passed before i attended my next forum in cincinnati, ohio. after half a day i wondered why i had waited so long. the program was diverse, covering a wide range of topics. i remember being depressed and outraged about the current state of internet access in the united states as reported by the office for information technology policy.
i felt that surge of recognition when i discovered that other universities were having a difficult time documenting and tracking the various systems they run and maintain. i was inspired by david lanke’s talk, “obligations of leadership.” if you missed it you can still hear it online. it is linked from the lita blog (http:// www.litablog.org). while the next forum may seem like a long way off to you, it is in the forefront of my mind. the national forum 2010 planning committee is busy working to make sure this forum lives up to the reputation of forums past. this year’s forum takes place in atlanta, georgia, september 30–october 3. the theme is “the cloud and the crowd.” program proposals are due february 19, so i cannot give you specifics about the concurrent sessions, but we do hope to have presentations about projects, plans, or discoveries in areas of library-related technology involving emerging cloud technologies; software-as-service, as well as social technologies of various kinds; using virtualized or cloud resources for storage or computing in libraries; library-specific open-source software (oss) and other oss “in” libraries; technology on a budget; using crowdsourcing and user groups for supporting technology projects; and training via the crowd. each accepted program is scheduled to maximize the impact for each attendee. programming ranges from five-minute lightening talks to full day preconferences. in addition, on the basis of attendee comments from previous forums, we have also decided to offer thirtyand seventy-five-minute concurrent sessions. these concurrent sessions will be a mix of traditional singleor multispeaker formats, panel discussions, case studies, and demonstrations of projects. finally, poster sessions will also be available. while programs such as the keynote speakers, lightning talks, and concurrent sessions are an important part of the forum experience, so is the opportunity to network with other attendees. i know i have learned just as much talking with a group of people in the hall between sessions, during lunch, or at the networking dinners as i have sitting in the programs. not only is it a great opportunity to catch up with old friends, you will also have the opportunity to make new ones. for instance, at the 2009 national forum in salt lake city, utah, approximately half of the people who attended were first-time attendees. the national forum is an intimate event whose attendance ranges between 250 and 400 people, thus making it easy to forge personal connections. attendees come from a variety of settings, including academic, public, and special libraries; library-related organizations; and vendors. if you want to meet the attendees in a more formal setting you can attend a networking dinner organized on-site by lita members. this year the dinners were organized by the lita president, lita past president, lita presidentelect, and a lita director-at-large. if you have not attended a national forum or it has been a while, i hope i have piqued your interest in coming to the next national forum in atlanta. registration will open in may! the most up-to-date information about the 2010 forum is available at the lita website (http:// www.lita.org). i know that even after my lita presidency is a distant memory, i will still make time to attend the lita national forum. i hope to see you there! 2 information technology and libraries | june 2007 i write my final president’s column a month after the midwinter meeting in seattle. 
you will read it as preparations for the ala annual conference in washington, d.c. are well underway. despite that discon­ nect in time, i am confident that the level of enthusiasm will continue uninterrupted between the two events. indeed, the midwinter meeting was highly charged with positive energy and excitement. the feelings are reignited if you listen to the numerous podcasts now found on the lita blog. the lita bloggers and podcasters were omni­ present reporting on all of the meetings and recording the musings of the lita top tech trendsters. by the time you have read this you will have also, hopefully, cast your ballot for lita officers and directors after having had the opportunity to listen to brief podcast interviews with the candidates. the lita board approved the election pod­ casts at the annual conference in new orleans. thanks to the collaborative efforts of the nominating committee and the bigwig members, we have this new input into our voting decision­making. the most exciting aspects of the midwinter meeting were the face­to­face, networking opportunities that make lita so great. the lita happy hour crowd filled the six arms bar and lit it up with the wonderful lita glow badges. what was particularly gratifying to me was the number of new lita members alongside those of us who have been around longer than we care to count. the net­ working that went on there was phenomenal! the other important networking opportunity for lita members was the lita town meeting led by lita vice president mark beatty. the room was packed with eager members ready to brainstorm about what they think lita should be doing after consuming a wonderful breakfast. lita’s sponsored emerging leader, michelle boule, and mark have collated the findings and will be working with the other emerging leaders to fine­tune a direction. the podcast interview of michelle and mark is an excellent summary of what you can expect in the next year when mark is president. as stated earlier, this is my last president’s column, which means my term is winding down. using lita’s strategic plan as a guide, i have worked with many of you in lita to ensure that we have a structure in place that allows us to be more adaptable to the rapidly chang­ ing world and to make sure that lita is relevant to lita members 365 x 24 x 7 and not just at conferences and lita national forum. attracting and retaining new members is critical for the health of any organization and in that vein, mark and i have used the ala emerging leaders program as a jumping off point to work with lita’s emerging leaders. the bigwig group is foment­ ing with energy and excitement as they rally bloggers and have this past year launched the podcasting initiative and the lita wiki. all of these things are making it easier for members to communicate about issues of interest in their work as well as to conduct lita business. the lita blog had over nine thousand downloads of its podcasts in the first three weeks after midwinter which confirms the desire for these types of communications! i appointed two task forces that provided recommen­ dations to the lita board at midwinter. the assessment and research task force has recommended that a perma­ nent committee be established to monitor the collection of feedback and assessment data on lita programs and services. having an established assessment process will enable the board to know how well we are accomplishing our strategic plan and to keep us on the correct course to meet membership needs. 
the education working group has recommended the merger of two committees, the education and regional institutes committees, into one education committee. this merged committee will develop a variety of educational opportunities including online and face­to­face sessions. we hope to have both of these committees up and going later in 2007. happily, the feedback from the town meeting parallels the recom­ mendations of the task forces. the board will be revisit­ ing the strategic plan at the annual conference using information gathered at the town meeting. we will also be looking at what new services we should be initiating. all arrows seem to be pointing towards more educational and networking opportunities both virtual and in person. i anticipate that lita members will see some great new things happening in the next year. i have very much enjoyed the opportunity to serve as the lita president this past year. the best part has been getting to know so many lita members who have such creative ideas and who roll up their sleeves and dig in to get the work done. i am very grateful for everyone who has volunteered their time and talents to make lita such a great organization. bonnie postlethwaite (postlethwaiteb@umkc.edu) is lita president 2006/2007 and associate dean of libraries, university of missouri–kansas city. president’s column bonnie postlethwaite gender, technology, and libraries | lamont 137 melissa lamont gender, technology, and libraries information technology (it) is vitally important to many organizations, including libraries. yet a review of employment statistics and a citation analysis show that men make up the majority of the it workforce, in libraries and in the broader workforce. research from sociology, psychology, and women’s studies highlights the organizational and social issues that inhibit women. understanding why women are less evident in library it positions will help inform measures to remedy the gender disparity. t echnology not only produces goods and services, it also influences society and culture and affects our ability to work and communicate. as the computer encroaches more deeply into both workplaces and homes, encouraging participation in the development and use of technology by all segments of society is important. libraries, in particular, need to provide services and products that both appeal to and are accessible by a broad range of clientele. for libraries, information technology (it) has become vitally important to the operation of the organization. yet fewer women are active in it than men. a complex series of social and cultural biases inhibits women from participating in technology both in the library and in the larger workforce. the inclusion of more women in technology would alter the development and design of products and services as well as change the dynamic of the workplace. understanding why women reject it as it is currently practiced is necessary to understanding how to make technology more inviting for women. 
melissa lamont (mlamont@rohan.sdsu.edu) is digital collections librarian, san diego state university.
occupational data
studies and statistics from the broader it fields highlight discrepancies between the compensation, managerial level, and occupational roles of men and women.1 among the numbers are those showing that computer and information science fields included only 519,700 females and slightly more than 1,360,000 males in 2003.2 in the same occupational fields, men earned a median of $74,000 while women earned $63,000.3 similarly, the association of research libraries (arl) statistics from 2004 to 2008 show that men were more often employed as the heads of computer systems departments within libraries. computer systems department heads also earned higher salaries than the heads of other library departments. with the exception of 2004–5, female computer department heads were paid less than their male counterparts, despite the fact that they had more years of experience. in the 2007–8 report, men and women had the same number of years of experience, though women's salaries lagged slightly behind those of the men, as shown in table 1.4
table 1. library computer systems department heads
year    gender  department heads  salary  years in field
2004–5  women   32                76,764  18.9
2004–5  men     60                76,060  16.9
2005–6  women   32                78,767  19.4
2005–6  men     52                79,680  18.4
2006–7  women   26                81,435  18.2
2006–7  men     52                82,409  17.6
2007–8  women   27                87,107  18.8
2007–8  men     51                87,136  18.8
the availability of statistics for the heads of library technology departments belies the difficulty in counting the number of technology positions in libraries, or the broader workplace, and compiling statistics by gender. in a recent study of the job satisfaction of academic library it workers, lim comments on the complexities in identifying survey participants, "as a directory of library it workers does not exist."5 thus, to augment the statistical data for department heads, a citation analysis was used to identify those persons involved enough in library technology to write about it. presumably, authors of articles appearing in technology-oriented journals would have interests and expertise in technology regardless of their position titles or locations within the organization. technology-related articles can and do appear in a wide variety of library journals. journals with a focus on technology were selected to avoid the dilemma of subjectively categorizing individual articles as technical or nontechnical. the journals selected provide a cross-section of association, commercial, electronic, and print publications. information technology and libraries is the journal of the library information technology association division of the american library association (ala). the journal of the american society for information science and technology (jasis&t) is an official publication of the american society for information science and technology. the not-for-profit corporation for national research initiatives publishes d-lib magazine, an electronic publication on digital library research and development. all three are peer-reviewed. computers in libraries, published by information today, includes case studies and how-we-did-it articles and is not peer-reviewed. emerald publishes the peer-reviewed journal library hi tech. the author assembled statistics for the years 2006 and 2007. for the survey, regular columns, editors' sections, reviews, short notices, and association communications were not counted. each authored article was counted.
no attempt was made to include or discount an article based upon the topic. the gender of the authors was determined by notes within the journal, authors' websites, other internet sites, or by communication with the authors. as the statistics in table 2 demonstrate, men publish in these journals at a far higher rate than women, with the exception of computers in libraries. women make up 35 percent of the authors while men make up 65 percent. jasis&t, arguably the most technical and theoretical journal in the analysis, and the journal with the most academic authorship, illustrates the highest disparity. alternatively, the publication computers in libraries contains more articles authored by women. this publication solicits articles on the application of technology—practical and less formal articles to share successes and ideas. it may be argued that female librarians simply publish less than male librarians. two additional publications, the journal of academic librarianship, published by elsevier, and college and research libraries (c&rl), published by the association of college and research libraries, were analyzed for comparison. table 3 illustrates the data for the comparison journals alone, with women making up 62 percent of the authors. female authors outnumbered male authors in the comparison journals, but women account for approximately 80 percent of u.s. librarians and are therefore publishing at a lower rate than men.6 in the interest of comparison, the author also analyzed the journal children and libraries, the journal of the association for library service to children, a division of ala. in 2006 and 2007, only four male authors were represented in children and libraries. they appeared as authors a total of eleven times. all of the remaining fifty authors are female. women made up 82 percent of the total authors while men made up 18 percent. these statistics are similar to a study conducted by hakanson and published in 2005. she analyzed articles in selected journals from the years 1980 to 2000 and found that male authors slightly outnumbered female authors, and further that articles authored by men were more likely to be referenced than those by women.7 the data gathered here are similar: 41 percent of the total authors in both technology and comparison journals are women, and 59 percent are men. male authors also are more likely to be the lead author on articles with multiple authors. again jasis&t shows the greatest disparity. computers in libraries includes more female lead authors, as shown in table 4. in the comparison journals, women are more often the lead author, as shown in table 5. both hakanson's data and the small statistical sample reported here demonstrate that although women hold most library positions, they do not publish a comparable amount. technology journals show the most disparity between the numbers of male and female authors. together, the citation and occupational statistics illustrate the higher visibility men have in it. fewer women are evident in it as department heads, employees, academics, or authors.
table 2. gender of authors in technology journals, 2006–7
publication                          articles  female authors (# / %)  male authors (# / %)
computers in libraries               57        51 / 61.4               32 / 38.6
d-lib magazine                       92        83 / 38.6               132 / 61.4
information technology & libraries   43        28 / 33                 57 / 67
jasis&t                              354       244 / 30.3              560 / 69.7
library hi-tech                      91        63 / 41.2               90 / 58.8
totals                               637       469 / 35                871 / 65
table 3. gender of authors in comparison journals, 2006–7
publication                          articles  female authors (# / %)  male authors (# / %)
college & research libraries         66        81 / 63                 48 / 37
journal of academic librarianship    128       140 / 61                89 / 39
totals                               194       221 / 62                137 / 38
discussion
in the broader workplace, not just libraries, men hold the majority of it positions. the importance of including women in it is not just a matter of equal opportunity. according to rasmussen and hapnes, women will bring different concerns and outlooks to it. further, the products and services produced by a diverse and integrated workforce will appeal to a broader market. including more women in the it workplace will also alter the organizational environment. their ideas and interests will bring new perspectives to development discussions and likely lead to new or different systems.8 understanding why relatively few women enter it fields will help inform measures to alter the current, male-dominated dynamic. by reviewing the research in sociology, psychology, and women's studies, the factors inhibiting women from participation in it can start to be understood. the dissuasive factors are a complex and intertwined combination of organizational culture, occupational segregation, and subtle discrimination.
abilities and perceptions
technology is pervasive throughout the library, and nearly all librarians develop basic technical skills as a condition of employment. librarians may develop more advanced computing skills to address a lack of technical support, to develop new services, or for professional or personal interest. correspondingly, technologists have absorbed library concepts such as description and classification. yet knowledge and ability are valued and evaluated within the social context of the organization, according to scott-dixon. the location of an occupation within the organization will influence the perception of the ability and skill required to succeed in that position.9 although the work of librarians and technologists may be similar or interdependent, the occupations are valued differently. scott-dixon's research addresses the problem of "designating which work is technical enough to merit consideration as it work."10 technologically proficient librarians or staff working outside of the it department will not be considered part of the library's it staff, yet they may be performing at a technological level equal to that of the regular it staff. scott-dixon states, "assumptions about it work incorporate assumptions about who performs this work, and that work performed in traditionally nonwhite, non-male jobs is often viewed as less technical, regardless of the technological objects that are employed in the process."11 the number of women participating in it may be higher than the statistics represent; nevertheless, women are still less directly employed in it. any contributions they make to it will be devalued as a consequence of their positions within the library organization. position and department titles also influence the perceived value of the work. to make traditional library tasks appear modern and relevant, long-established library functions have been renamed. cataloging has become metadata, catalog control has become system administration, and librarianship has become information science. the old chestnut that information science is library science for boys has an element of truth.
in 2006, the average annual starting salary for librarians who categorized their positions as information science was $48,413; the average for those who categorized their positions as library science was $39,580. women who categorized their positions as information science earned an average starting salary of $46,118; men averaged $55,423.12 salary statistics substantiate the research showing that information technology positions are more highly valued and therefore more highly compensated in the library organization. likewise, men are more highly compensated than women.
table 4. gender of lead authors in technology journals, 2006–7
publication                          articles  female first  male first
computers in libraries               20        12            8
d-lib magazine                       50        22            28
information technology & libraries   17        7             10
jasis&t                              140       32            108
library hi-tech                      42        19            23
totals                               269       92            177
table 5. gender of lead authors in comparison journals, 2006–7
publication                          articles  female first  male first
college and research libraries       39        29            10
journal of academic librarianship    61        36            25
totals                               100       65            35
one of the causes of income inequality is occupational segregation.13 occupational segregation occurs when positions with similar educational requirements, but different titles or locations within the organization, are valued differently.14 the difference in the salaries of traditional library department heads and the heads of technology departments is one example of income inequality within the library. according to the arl annual salary survey 2007–08, heads of computer systems departments earn more than $87,100, while heads of rare books and manuscripts departments, who have the second highest salaries, earn $80,628. the rare books and manuscripts department heads are nearly evenly divided by gender; the majority of computer systems department heads are men.15 in libraries, occupational segregation divides traditional library departments and functions from it departments and technology applications. librarians are predominately female and, as the occupational statistics show, it workers are predominately male. the result for libraries has been a gendered segregation of the library workforce.16 the results of occupational segregation are intensified by the tendency for women to avoid defining themselves as technology workers. the research by adam et al. confirms the results of several earlier studies. when asked to define their roles in the organization, men more often associate their positions with it; women tend to identify with a larger or more encompassing group within the organization, not specifically it.17 though these studies did not include librarians, it could be assumed that female librarians would respond much like their counterparts in other industries. in fact, few occupational studies conducted outside the library profession include librarians. thus it appears that women choose to be excluded from an occupational group that is well compensated, integral to the organization, and considered highly skilled. not only do women define their positions as non-it, but women also underestimate their technical skills. hargittai and shafer reviewed a number of studies investigating the self-assessment of computer skills. in those studies, women test at the same skill level as men but consistently underrate their technical ability.
hargittai and shafer conducted a study of internet skills that draws the same conclusion.18 organizational culture women may underestimate their abilities and disassociate with it in part because of the perception of it organizational culture.19 technical positions are associated with long and irregular hours, leading to the assumption that family and home responsibilities will cause women to be less able to contribute. as ramsey and mccorduck note, those assumptions are not associated with men’s work.20 they emphasize that while women “often shoulder more family responsibilities than men . . . the presumption more than the reality tends to limit women’s advancement.”21 the perception of a high commitment level is fostered by the computing industries. the stereotype of the solitary computer geek, typing away in physical, though not virtual, isolation with a social life revolving around the technology is not entirely accurate. yet guzman, stam, and stanton have studied it as an occupational subculture. they call the perceived demands of the subculture “extreme and unusual,” with long hours and constant need for self reeducation.22 the appearance of high cost in time and capital is one way that the already-initiated keep outsiders out. the use of specialized language and jargon, stories of long hours spent, and complaints about end users are all means of solidifying organizational boundaries. the ramsey and mccorduck report points to a perception by some women that the long hours are often “a status symbol, a sign of machismo.”23 all occupational groups participate in us-versus-them behavior however; since it is gendered, the subculture effectively excludes women and exacerbates the segregation. according to guzman, stam, and stanton’s research, one of the hallmarks of the it subculture is the sense of control over other groups within the organization. yet the subculture also shares a sense of fulfillment in assisting others with technology.24 the esoteric knowledge held by it workers is essential to the operation of most organizations, in particular libraries. this gives the subculture an inordinate sense of power.25 the computing professions appear to be linked with masculinity and power, at least in western cultures. melanie wilson writes, “the qualities required for entry to the professions and success in them are seen as masculine.”26 masculine occupations tend to be associated with skill, learning, and hard work. construction, business, and now it have a preponderance of male professionals. masculine occupations are more prestigious and better compensated. wacjman writes, “to be in command of the very latest technology signifies being involved in directing the future, so it is a highly valued and mythologized activity.”27 the idea that women’s skills are more instinctive makes them less valued, and feminized occupations tend to be associated with the innate behaviors.28 wilson points to research indicating that “women’s work tends to be regarded as semi-skilled merely because it is women’s work.”29 women are a higher percentage of elementary school teachers, nurses, and care givers, and those positions receive modest compensation compared to occupations typically held by men. specific to libraries, technology subfields may be seen as acceptable positions for men in an occupation traditionally dominated by women. as the research suggests, an increase in the number of women involved in technology would devalue those fields. 
roos and reskin explored the effect of an increase in the numbers of women on occupational status. in a 1990 paper they wrote:
traditionally, "women's" jobs have been both lower-paying and less valued than "men's." occupational incumbents have thus been chagrined to learn that their occupation is feminizing, fearful of a drop in wages and prestige. this fear has a valid empirical basis: the percentage female in an occupation is negatively correlated with occupational earnings.30
an influx of women into library it would likely devalue the subfield and depress wages; as such, occupational segregation is one means of protecting wages and influence. women are often deterred from entering or excelling in an occupation through subtle discrimination. because the sexist actions or words are not always recognized as discriminatory, subtle sexism is difficult to define. the repetition of the behaviors and language over time creates a sense that those patterns are acceptable, and they become more difficult to change.31 examples of subtle sexism include the expectation that women will be more responsible for social occasions involving food or more responsible for the staff lounge or lunch room. often the informal exchange of information and skills, so-called boy's-room knowledge, eludes women because they are excluded from masculine socializing. in addition, men may be assigned different, usually less clerical, tasks, and women are often associated with the softer tasks of user support, help desks, and interfaces.32 although subtle discrimination occurs in all workplaces, not just libraries, the effects in a gender-segregated workplace are compounded. confronted with a complex series of social, cultural, and organizational cues, women are made to feel less competent and less comfortable with technology. the association of women's positions with lower wages and prestige serves to sustain the occupational segregation and justify the subtle discrimination that hinders women. sometimes perception creates reality. it would be a mistake to group all women as a whole, expecting that the experiences of all are exactly alike, just as not all men are technologically adept. socioeconomic factors, as well as ethnic and geographic differences, influence the abilities and desires of women and men to succeed in technology professions. yet the smaller number of women in technology subfields of librarianship implies an almost "symbolic image of the discipline as masculine, which in turn reinforces the minority position of women."33 likewise, the far greater number of women writing in children's librarianship simply reinforces this subfield as feminine. according to alksnis, "on the demand side, jobs are often seen as requiring the characteristics of the group that already dominates it."34 the lack of women in the it field continues to reinforce the stereotype and perpetuate the imbalance.
conclusion
to remedy the underrepresentation of women in it, it would be simple to call for greater educational opportunities for girls, mentoring programs for young professional women, and economic incentives to retain mid-career women. the situation, however, is not simple. a series of organizational, societal, and cultural perceptions inhibit women from associating or identifying with it.
rasmussen and hapnes refer to a combination of organizational culture and gender politics that discourage women.35 instead of a focus on the numbers of women in it, librarians should work to transform the organizational culture. as technology progresses, the definition of technology work must be reevaluated and the entries into the technology fields must be redefined. in short, what constitutes it must be rethought, recast, and revalued as technology develops. in the library specifically, it and librarianship have much in common. at present, the library has a dichotomized workforce of female librarians and male it workers. over time, the skills of librarians and technologists will blend. if managed properly, the best of classic library theory and practice will combine with it into a dynamic and diverse workforce as well as a thriving and innovative organization.
references and notes
1. examples of research and statistics concerning the number and status of women in technology fields, in addition to those noted in the paper, include carol simard et al., climbing the technical ladder: obstacles and solutions for mid-level women in technology (palo alto, calif.: anita borg institute for women and technology, 2008), http://anitaborg.org/files/climbing_the_technical_ladder.pdf (accessed oct. 20, 2008); u.s. department of labor, bureau of labor statistics, household data annual averages, "table 11. employed persons by detailed occupation, sex, race and hispanic or latino ethnicity," ftp://ftp.bls.gov/pub/special.requests/lf/aat11.txt (accessed oct. 20, 2008); and jay vesgo, "cra taulbee trends: female students and faculty," computing research association, june 17, 2008, www.cra.org/info/taulbee/women.html (accessed oct. 20, 2008).
2. national science foundation, division of science resources statistics, women, minorities, and persons with disabilities in science and engineering: 2007, nsf 07-315, table h-5, employed scientists and engineers by occupation, highest degree level, and sex: 2003 (arlington, va.: national science foundation, 2007): 222, http://www.nsf.gov/statistics/wmpd/pdf/nsf07315.pdf (accessed june 11, 2008).
3. national science foundation, division of science resources statistics, women, minorities, and persons with disabilities in science and engineering: 2007, table h-16, median annual salary of scientists and engineers employed full time, by highest degree, broad occupation, age group, and sex: 2003, 225.
4. association of research libraries, arl annual salary survey 2007–08, table 17, number and average salaries by position and sex (washington, d.c.: arl, 2008): 42–43, tables 17–18, www.arl.org/stats/annualsurveys/salary/annualedssal.shtml (accessed aug. 2008).
5. sook lim, "job satisfaction of information technology workers in academic libraries," library & information science research 30, no. 2 (2008): 120.
6. stephanie maatta, "placements and salaries 2006: what's an mlis worth?" library journal (oct. 15, 2007), www.libraryjournal.com/article/ca6490671.html (accessed aug. 29, 2008).
7. malin hakanson, "the impact of gender on citations: an analysis of college & research libraries, journal of academic librarianship and library quarterly," college & research libraries 66, no. 4 (2005): 312–22.
8. bente rasmussen and tove hapnes, "excluding women from the technologies of the future? a case study of the culture of computer science," futures 23, no. 10 (1991): 1107.
9.
krista scott-dixon, "from digital binary to analog continuum: measuring gendered it labor: notes toward multidimensional methodologies," frontiers 26, no. 1 (2005): 26.
10. ibid.
11. ibid., 30.
12. stephanie maatta, "placements and salaries 2006."
13. christine alksnis, serge desmarais, and james curtis, "workforce segregation and the gender wage gap: is women's work valued as highly as men's?" journal of applied social psychology 38, no. 6 (2008): 1416–41.
14. ibid., 1419.
15. association of research libraries, arl annual salary survey 2007–08, 42–43, tables 17–18.
16. lori ricigliano and renee houston, "men's work, women's work: the social shaping of technology in academic libraries" (paper presented at the association of college and research libraries 11th annual national conference, charlotte, n.c., apr. 10–13, 2003): 1.
17. alison adam et al., "being an 'it' in it: gendered identities in it," european journal of information systems 15, no. 4 (2006): 368–78.
18. eszter hargittai and steven shafer, "differences in actual and perceived online skills: the role of gender," social science quarterly 87, no. 2 (2006): 432–48.
19. rasmussen and hapnes, "excluding women," 1108.
20. nancy ramsey and pamela mccorduck, "where are the women in information technology? preliminary report of literature search and interviews" (report prepared for the national center for women and information technology, feb. 5, 2005): 9, http://www.anitaborg.org/files/abi_wherearethewomen.pdf (accessed june 12, 2009).
21. ibid.
22. indira r. guzman, kathryn r. stam, and jeffrey m. stanton, "the occupational culture of is/it personnel within organizations," the data base for advances in information systems 39, no. 1 (2008): 45.
23. ramsey and mccorduck, "where are the women in information technology?" 9.
24. guzman, stam, and stanton, "occupational culture," 45.
25. ibid.
26. melanie wilson, "a conceptual framework for studying gender in information systems research," journal of information technology 19, no. 1 (2004): 87.
27. judy wajcman, "reflections on gender and technology studies: what is state of the art?" social studies of science 30, no. 3 (june 2000): 454.
28. alksnis, desmarais, and curtis, "workforce segregation," 1418.
29. wilson, "conceptual framework," 85.
30. patricia a. roos and barbara f. reskin, "occupational desegregation in the 1970s: integration and economic equality?" sociological perspectives 35, no. 1 (1992): 87.
31. nijole v. benokraitis, "sex discrimination in the 21st century," in subtle sexism: current practice and prospects for change, ed. nijole v. benokraitis (thousand oaks, calif.: sage, 1997): 11.
32. fiona wilson, "can compute, won't compute: women's participation in the culture of computing," new technology, work and employment 18, no. 2 (2003): 127.
33. vivian anette lagesen, "extreme make-over? the making of gender and computer science" (phd diss., norwegian university of science and technology, trondheim, norway, 2005): 188.
34. alksnis, desmarais, and curtis, "workforce segregation," 1419.
35. rasmussen and hapnes, "excluding women," 1108.
brown university library fund accounting system
robert wedgeworth: brown university library, providence, r.i.
the computer-based acquisitions procedures which have been developed at the library provide more efficient and more effective control over fund accounting and the maintenance of an outstanding order file. the system illustrates an economical, yet highly flexible, approach to automated acquisitions procedures in a university library.
the fund accounting system of the brown university library was initiated on the basis of a program developed in april, 1966. subsequently, it was decided to implement the program in the fall of that year. the necessary in-house equipment, namely an ibm 826 typewriter card punch and an ibm 026 keypunch, was placed on order along with new six-part order forms. about the same time an agreement was reached with the administrative data processing office of the university (tabulating) which would provide for rental time on their ibm 1401, 12k system with three magnetic disks and four magnetic tape-storage units. the services of a part-time programmer were also secured through this office. the system became fully operational on december 1, 1966. the primary objective of the project was to establish more efficient and more effective control over the approximately 150 fund accounts administered by the order department of the university library. in addition, it seemed that a number of by-products were possible. among these were statistical information for management and a file of bibliographical records from which a new accessions list could be drawn on a regular basis. the system was to accommodate the payment of all invoices to be posted against the aforementioned accounts. these include monographic and serial publications as well as supplies and equipment. however, records of outstanding orders were to be maintained for monographic publications only. although the basic routines were to remain much the same, some minor adjustments were necessary to accommodate the new machine system. also, several files of dubious value to the new system were to be maintained in order to gain empirical evidence as to their worth. this report is presented as a record of an attempt to develop an economical, yet highly flexible approach to the automating of acquisitions procedures of a university library. perhaps the scope of the computer-based acquisitions procedures at brown may be determined more easily relative to three recently reported systems of varying complexity. one of the best surveys of automated university library acquisitions systems appears in the project report of the university of illinois, chicago campus (1). however, two of the systems summarized here are more recent. the university of michigan was included in the illinois literature survey, but the first full description to be published appeared just recently. automated acquisitions procedures have been in operation at the university of michigan library since june, 1965 (2). the system features a list of items produced by computer from punch cards in which order information has been recorded. this list is produced on a monthly basis with semi-weekly cumulative supplements. the computer also produces status report cards. these are punch cards, containing summarized order information, which travel with the book and at appropriate processing stages are coded and returned to the computer in order to up-date the status code in the processing list. thus by checking the status code one can determine that a book has been received, received and paid, or cataloged. claim notices are automatically produced for items which remain on order for longer than the predetermined period. in addition to creating and maintaining full financial records and compiling selected statistics, the system will produce specialized acquisitions lists on demand.
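the status-code and claiming logic just described for the michigan system can be sketched in a few lines of code. this is only an illustration of the idea — the original system ran on punch cards, and every class, field, and threshold below is a hypothetical name invented for the example — but it shows how a coded status report card advances an order's status and how overdue items generate claim notices.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// minimal sketch of the status-code idea described above: each order carries a
// status code that is advanced as coded status report cards come back, and a
// claim notice is produced for any order still open past a predetermined period.
// all names and values here are hypothetical.
public class StatusList {

    enum Status { ON_ORDER, RECEIVED, RECEIVED_AND_PAID, CATALOGED }

    static class OrderEntry {
        final String orderNumber;
        final LocalDate orderDate;
        Status status = Status.ON_ORDER;
        OrderEntry(String orderNumber, LocalDate orderDate) {
            this.orderNumber = orderNumber;
            this.orderDate = orderDate;
        }
    }

    // a returned status report card simply moves the matching entry forward
    static void applyStatusCard(List<OrderEntry> list, String orderNumber, Status newStatus) {
        for (OrderEntry e : list) {
            if (e.orderNumber.equals(orderNumber)) {
                e.status = newStatus;
                return;
            }
        }
    }

    // claim notices for items that have stayed on order longer than the allowed number of days
    static List<String> claimNotices(List<OrderEntry> list, LocalDate today, int maxDaysOnOrder) {
        List<String> notices = new ArrayList<>();
        for (OrderEntry e : list) {
            if (e.status == Status.ON_ORDER && e.orderDate.plusDays(maxDaysOnOrder).isBefore(today)) {
                notices.add("claim: order " + e.orderNumber + " placed " + e.orderDate);
            }
        }
        return notices;
    }

    public static void main(String[] args) {
        List<OrderEntry> processingList = new ArrayList<>();
        processingList.add(new OrderEntry("660101", LocalDate.of(1967, 1, 5)));
        processingList.add(new OrderEntry("660102", LocalDate.of(1967, 6, 1)));
        applyStatusCard(processingList, "660102", Status.RECEIVED);
        claimNotices(processingList, LocalDate.of(1967, 10, 1), 120).forEach(System.out::println);
    }
}
```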
yale university library creates a machine readable record of a request before it is searched or ordered (3). as a result, the status-monitoring system is almost immediately effective. an ibm 826 typewriter card punch is used to type purchase orders, and the ibm 357 data collection system is used to monitor the progress of an item through the system. the process information list is produced weekly with daily supplements. automatic claiming and financial record maintenance are also products of the system. moreover, numerous statistics are planned for management purposes. the fund control system reported by the university of hawaii features financial accounting for book purchases based on pre-punched cards corresponding to purchase orders typed (4). the list price is keypunched into the appropriate card in a separate operation and used to encumber funds. upon receipt of the book the invoice is matched with the appropriate punch card, and after actual cost is keypunched the card is used to up-date the account. the michigan and yale systems incorporate all of the major features of operational university library automated acquisitions systems. foremost among them are the list of items being processed and its coordinate monitoring system. the cost of creating and maintaining such a file was prohibitive for brown. brown, michigan and hawaii generate a machine record after searching. unlike michigan and yale, brown and hawaii do not have "total" acquisitions systems plans. at brown serials control is not included. at hawaii fund accounting is the only task of the system. also, brown differs from michigan and yale in that the claiming procedure merely notifies the department that certain items are overdue. the brown system is certainly not as economical as that of hawaii, but the use of the typewriter card punch creates a highly flexible and easily expanded system for the difference in cost.
manual files and procedures
the manual routines of the order department are based upon the maintenance of four basic files. the file documents are all parts of the six-part purchase order form. the outstanding order search file is an alphabetical card file representing unfilled orders, requests to search for items, and inquiries for bibliographical information. this file is virtually independent of other routines, thus making it feasible for it to be merged with the file of items waiting to be cataloged. the processing file consists of outstanding orders filed first by book dealer, and second by order number. this file is used to check in shipments of books, to record reports on orders and to record claims. the numerical control file is an order number sequence file containing one copy of every order typed regardless of its ultimate disposition. it provides rapid access to information regarding retrospective orders. the fund file is a file of completed or cancelled transactions filed first by fund name and second by order number. the latter two files were thought to be of dubious value to the new system. however, it was agreed to maintain both for the time being. in order to accommodate the fund accounting system, the procedures developed feature two basic routines based on the presence or absence of a unique order number.
unique order (figure 1)
items acquired in this fashion include purchases and solicited gifts. continuations, but not serials, are included.
when a request is received in the order department, it is searched in the main catalog, the waiting catalog and the outstanding order file. if it is found to be neither in the library nor on order, it is then given to an order assistant who completes the bibliographical work, if necessary, and assigns a fund and dealer. if the price is listed in a foreign currency, the assistant converts it to u.s. dollars. the request then proceeds to the typist.
(fig. 1. unique order procedure.)
all unique orders are typed on an 826 typewriter card punch. as the typist fills in the six-part order form, pre-selected pieces of information are keypunched automatically. these fields are as follows: order number; order date; source type (d for domestic, etc.); fund number; list price; author; title; imprint; series. orders are proofread on the day after they are typed. the forms are separated and the outstanding order cards are filed immediately in order to detect duplicate orders. at this point the dealer slips are mailed and the numerical control slips filed. the processing file documents, each containing a fund slip, an l.c. order slip, and a cataloger's work slip on a separate perforation, are then filed pending the arrival of the books. also, the deck of ibm cards which has been weeded of voided orders goes to tabulating. although books may be processed without invoices, the normal practice is to process after the arrival of the invoice. the processing file document is obtained and the cost, invoice date and the number of volumes are noted on the fund slip. if the item is a continuation, a supplementary fund slip is made and the original returned to the processing file with the receipt noted. the invoices are cleared and sent to the controller. the fund slips representing books received are sent to the keypuncher in order to up-date the accounts. in the meantime the books, along with the work slips and the l.c. order slips, are sent to the catalog department. as the books are cataloged, the work slips noting any major bibliographical changes and the call number are returned to the order department. from these slips are punched bibliographical adjustment cards and an up-date record card containing the call number and coded for subject and location. the resulting bibliographical record forms the data base for the new accessions listing.
no unique order (figure 2)
items acquired in this fashion include unsolicited gifts, exchanges, standing orders, etc. some continuations and all serials invoices are included. upon arrival, invoiced items without unique order numbers are searched. if they are duplicates they are returned for credit. if they are not duplicates, they are sent to the typist.
(fig. 2. no unique order procedure.)
catalog file slips are typed
and by-product bibliographical and accounting records are punched. on the record card for accounting, the order number field is filled with nines. this signals the program that this entry is a receipt for which there was no unique order number. the series of order numbers beginning with 900000 was originally reserved for assignment to our standing order agreements with presses, societies, etc. eventually, each will have its own order number. however, the last number of the series, 999999, will continue to be used for miscellaneous receipts. presently no accessions listing records are being generated for items without unique order numbers. however, all purchases without unique order numbers are processed with a series 9 order number.
serials
all serial invoices are handled as series 9 transactions with no attempt to record bibliographical information or volume counts. expenditures for serials are accumulated and entered as one transaction each time the accounts are up-dated. this decision was made in anticipation of the development of a separate serials control program.
ibm 1401 files and procedures
the basic function of the computer program for the fund accounting system is to maintain current balances on the various library fund accounts and to maintain a file of outstanding orders exclusive of standing orders. although several correlative functions are distinct possibilities, the only additional function planned is a file of bibliographic records for the production of an accessions listing. figures 3, 4, 5 and 6 illustrate the major tasks to be performed by the system. the programming language used is autocoder.
fund balance forward file
a card file created at the beginning of each fiscal year having two card types.
1. fund group header card
a. group code
b. group name
this card assigns a unique code and name to categories of funds such as endowed, special, etc.
2. fund balance forward and appropriation card
a. fund group code
b. fund code
c. fund name
d. previous year balance forward
e. current income or appropriation
f. balance forward code
g. remaining previous year encumbrances
this card contains information used to establish the individual funds at the beginning of each year. the balance forward code directs the program to carry over excess funds to the next year, not to carry over excess funds to the next year, or to carry over a negative balance to the next year, thereby reducing cash balance resulting from the new income or appropriation. encumbrances are carried over to the next year in order to maintain an accurate net available at all times.
(fig. 3. fund file creation.)
(fig. 4. file maintenance.)
(fig. 5. fund accounts updating.)
library fund file
a magnetic tape file created from the fund balance forward file and containing three record types.
1. fund group header
2. fund record
a. fund group code
b. fund code
c. fund name
d. previous year balance forward
e. current income or appropriation
f. current expenditures
g. cash balance
h. amount encumbered
i. net available
j. volumes purchased
k.
balance forward code
fund record fields a, b, c, d, e, h and k initially are taken from the corresponding fields in the fund balance forward card. current year expenditures and volumes purchased are preset to zero each year. cash balance is determined by the sum of the previous year balance forward and the current income or appropriation. amount encumbered will be preset to zero or taken from the fund card. net available is determined by the difference between cash balance and amount encumbered.
3. fund group trailer
this record is the last within each fund group and contains a summation of the quantitative fields in that fund group. it is used primarily for control purposes.
figure 4 illustrates the file maintenance program for the library fund files. this program permits the addition or deletion of a fund group code, changes to a fund group header, addition or deletion of a specific fund or changes to a specific fund. however, changes to quantitative fields are limited to those fields which are contained in the fund balance forward card. thus, net available may not be changed directly by file maintenance but may be changed by manipulating current income or appropriation. the library fund file is a serial file maintained in ascending algebraic sequence on fund group code, fund code and fund record from major to minor respectively.
outstanding order file
a magnetic disk file created and up-dated by three card types.
1. order card
a. order number
b. order date
c. source type (d is domestic, f is foreign)
d. fund number
e. list price
figure 5 illustrates the program which processes new orders. this program validates fund code, rejects duplicate order numbers and encumbers list price, thereby reducing net available.
2. record card
a. order number
b. invoice date
c. fund code
d. cost
e. continuation order code, if applicable
f. number of volumes
standing orders, blanket orders, serials, etc. are purchased without placing an order. consequently, a series 9 order number is assigned to these record cards. such cards will not match the outstanding order file by definition but will increase amount expended, decrease cash balance and net available and increase volumes purchased. all other record cards must match an existing order number on file. on continuations the record card for each part received produces a transaction as described above, except that the encumbrance remains unchanged until the final record card appears without the continuation order code.
3. adjustment card
this card may be submitted for either an order card or a record card. it is differentiated by a special code. its primary purpose is to correct a previous error or to effect a cancellation. the outstanding order file is in ascending algebraic sequence by fund group, fund code and order number. all cards used in this program must be pre-sorted into this sequence.
printout products
the accumulated punch cards are processed on a bi-weekly schedule by the tabulating office. a file maintenance report (figure 4) is the first product of each run. it lists in detail any adjustments, additions, or deletions to the fund listing plus the results of such operations. at the end of the detailed report is a summary of the status of each active fund. copies of this latter report are distributed for desk use to all order assistants, the chief order librarian, and the librarian.
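the fund arithmetic and card-processing rules described above are simple enough to restate as a small program. the sketch below is an illustration only — the original system was written in autocoder for the ibm 1401, and all class and method names here are invented — but it captures the stated rules: an order card rejects duplicate order numbers and encumbers the list price, a record card adds to expenditures and volumes purchased (reducing cash balance and net available), a series 9 number matches no outstanding order, and a continuation leaves the encumbrance in place until the final part arrives.

```java
import java.util.HashMap;
import java.util.Map;

// illustrative model of the fund-updating rules described in the text;
// names are hypothetical, not taken from the original autocoder program.
public class FundAccount {
    double balanceForward;        // previous year balance forward
    double incomeOrAppropriation; // current income or appropriation
    double expenditures = 0.0;    // preset to zero each year
    double encumbered = 0.0;
    int volumesPurchased = 0;

    // order number -> encumbered list price
    final Map<String, Double> outstandingOrders = new HashMap<>();

    FundAccount(double balanceForward, double income) {
        this.balanceForward = balanceForward;
        this.incomeOrAppropriation = income;
    }

    // cash balance starts as balance forward plus income and is reduced by expenditures
    double cashBalance() { return balanceForward + incomeOrAppropriation - expenditures; }

    // net available = cash balance - amount encumbered
    double netAvailable() { return cashBalance() - encumbered; }

    // order card: reject duplicate order numbers, encumber the list price
    boolean postOrderCard(String orderNumber, double listPrice) {
        if (outstandingOrders.containsKey(orderNumber)) return false; // duplicate rejected
        outstandingOrders.put(orderNumber, listPrice);
        encumbered += listPrice;
        return true;
    }

    // record card: series 9 numbers (e.g. 999999) match no order; otherwise the
    // matching encumbrance is released unless the continuation order code is present
    void postRecordCard(String orderNumber, double cost, int volumes, boolean continuation) {
        expenditures += cost;
        volumesPurchased += volumes;
        if (!orderNumber.startsWith("9") && outstandingOrders.containsKey(orderNumber) && !continuation) {
            encumbered -= outstandingOrders.remove(orderNumber);
        }
    }

    public static void main(String[] args) {
        FundAccount fund = new FundAccount(500.00, 2000.00);
        fund.postOrderCard("123456", 25.00);             // new order encumbers
        fund.postRecordCard("123456", 23.50, 1, false);  // book received and paid
        fund.postRecordCard("999999", 140.00, 0, false); // serials handled as series 9
        System.out.printf("cash balance %.2f, net available %.2f, volumes %d%n",
                fund.cashBalance(), fund.netAvailable(), fund.volumesPurchased);
    }
}
```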
the transaction register of fund activity (figure 5) lists each transaction posted to each fund for the inclusive period. the assistant in charge of bookkeeping is the primary user of this and the detailed file maintenance report. the delinquent orders report (figure 6) lists all past due outstanding orders according to two cycles. domestic orders are listed bi-monthly and foreign orders are listed quarterly. the listing is of the "tickler" variety, as it may not be necessary to ask reports on all of the items. an order will remain on the delinquent orders report until it is filled or cancelled.
(fig. 6. delinquent order listing.)
conclusion
as of october, 1967, the fund accounting system has been in operation for ten months. assessment of its effectiveness in terms of meeting the primary objective shows the system to be an immediate success. at this point costs are about the same for the manual system as for the present one. however, accounts which used to require from 25 to 30 man-hours per month are maintained with about 5 man-hours per month. our current equipment and processing costs run about $325 per month. on the other hand, we have become aware of some shortcomings of the system. the addition of a currency conversion sub-routine would greatly expedite the many requests for foreign publications received daily. secondly, the addition of a dealer code would make the delinquent orders list much more useful. at present a user must search the numerical file for the order to ascertain the dealer. the processing file copies are then pulled to go to the typist who asks reports on delinquent orders. a revised program incorporating both of these features is being planned and will be operational early in 1968. the proposed accessions listing has been rejected as a by-product of this system primarily because of the limited character set available on our ibm 1403 print chain and the excessive length of the average listing. the time and expense of storing and up-dating the bibliographical record for each new acquisition should, in our estimation, result in a more palatable end-product. we have, therefore, temporarily discontinued producing punch cards for the bibliographical records. as a corollary, it should be added that we have turned to a consideration of the paper tape typewriters as input/output devices, focusing on their expanded character set and operating speed. the speed of the 826 leaves much to be desired. the numerical control file has proven its usefulness as a rapid index to our files spanning several years. it is extremely helpful in identifying quotes on old order numbers which have long since been cancelled. the fund file, however, has proven to be a duplicate of our machine file. it is thought that replacement of the slip in the numerical control file with the fund slips would at the same time reduce our files by one and up-date the information in the numerical file. finally, this modest beginning, occasioned by limited financial resources as well as the lack of personnel with experience in data processing, seems to have been justified. moreover, although the increasing complexity of our involvement in library automation poses some serious planning and supervisory problems, we are encouraged by our initial success.
acknowledgments
the staff of the order department have all contributed to the production of this report.
however, a special note of gratitude is acknowledged for the assistance of dorothy woods and gloria hagberg and for the technical advice and assistance of al hansen, library programmer, and david a. jonah, librarian.
references
1. kozlow, robert d.: report on a library project conducted on the chicago campus of the university of illinois (washington: nsf, 1966), p. 50.
2. dunlap, connie: "automated acquisitions procedures at the university of michigan library," library resources & technical services, 11 (spring 1967), 192.
3. alanen, sally; sparks, david e.; kilgour, frederick g.: "a computer-monitored library technical processing system," american documentation institute. proceedings, 3 (1966), 419.
4. shaw, ralph r.: "control of book funds at the university of hawaii library," library resources & technical services, 11 (summer 1967), 380.
eclipse editor for marc records
bojana dimić surla
information technology and libraries | september 2012
abstract editing bibliographic data is an important part of library information systems. in this paper we discuss existing approaches in developing user interfaces for editing marc records. there are two basic approaches: screen forms that support entering bibliographic data without knowledge of the marc structure, and direct editing of marc records shown on the screen. this paper presents the eclipse editor, which fully supports editing of marc records. it is written in java as an eclipse plug-in, so it is platform-independent. it can be extended for use with any data store. the paper also presents a rich client platform (rcp) application made of the marc editor plug-in, which can be used outside of eclipse. the practical application of the results is integration of the rcp application into the bisis library information system. introduction an important module of every library information system (lis) is one for editing bibliographic records (i.e., cataloguing). most library information systems store their bibliographic data in the form of marc records. some of them support cataloging by direct editing of marc records; others have a user interface that enables entering bibliographic data by a user who knows nothing about how marc records are organized. the subject of this paper is user interfaces for editing marc records. it gives software requirements and analyzes existing approaches in this field. as the main part of the paper, we present the eclipse editor for marc records, developed at the university of novi sad as a part of the bisis library information system. the editor uses the marc 21 variant of the marc format. the remainder of this paper describes the motivation for the research, presents the software requirements for cataloging according to marc standards, and provides background on the marc 21 format. it also describes the development of the bisis software system, reviews the literature concerning tools for cataloging, and analyzes existing approaches in developing user interfaces for editing marc records. the results of the research are presented in the final section, which describes the functionality and technical characteristics of the eclipse marc editor. the rich client platform (rcp) version of the editor, which can be used independently of eclipse, is also presented. motivation the motivation for this paper was to provide an improved user interface for cataloging by the marc standard that will lead to more efficient and comfortable work for catalogers.
bojana dimić surla (bdimic@uns.ns.ac.yu) is an associate professor, university of novi sad, serbia. there are two basic approaches in developing user interfaces for marc cataloging. the first approach uses a classic screen form made of text fields and labels with descriptions of the bibliographic data, without any indication of the marc structure. the second approach is direct editing of a record that is shown on the screen. those two approaches will be discussed in detail in "existing approaches in developing user interfaces for editing marc records" below. the current editor in the bisis system is a mixture of these two approaches—it supports direct editing, but data input is done via a text field that opens on double click.1 the idea presented in this paper is to create an editor that overcomes the drawbacks of previous solutions. the approach taken in creating the editor was direct record editing with real-time validation and no additional dialogs. software requirements for marc cataloging the user interface for marc cataloging needs to support the following functions:
• creating marc records that satisfy the constraints proposed by the bibliographic format
• selecting codes for field tags, subfield names, and values of coded elements, such as character positions in the leader and control fields, indicators, and subfield content
• validating entered data
• access to data about the marc format (a "user manual" for marc cataloging)
• exporting and importing created records
• providing various previews of the record, such as catalog cards
background marc 21 as was previously mentioned, the eclipse editor uses the marc 21 variant. marc 21 consists of five formats: bibliographic data, authority data, holdings data, classification data, and community information.2 marc 21 records consist of three parts: the record leader, a set of control fields, and a set of data fields. the record leader content, which follows the ldr label, includes the logical length of the record (first five characters) and the code for record status (sixth character). after the record leader, there are control fields. every control field is written in a new line and consists of a three-character numeric tag and the content of the control field. the content of a control field can be a single datum or a set of fixed-length bibliographic data. control fields are followed by data fields in the record. every line in the record that contains a data field consists of a three-character numeric tag, the values for the first and the second indicator—or the number sign (#) if indicators are not defined for the field—and the list of subfields that belong to the field. detailed analysis of marc 21 shows that there are some constraints on the structure and content of a marc 21 record. constraints on the structure define which fields and subfields can appear more than once in the record (i.e., whether the fields and subfields are repeatable or not), the allowed length of the record elements, and all the elements of the record defined by marc 21. constraints on the record content are defined on the content of the leader, indicators, control fields, and subfields. moreover, some constraints connect several elements in the record (when the content of one element depends on the content of another element in the record). an example of a structural constraint for data field 016 is that the field has a first indicator whereas the second indicator is undefined.
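to make this layout concrete, a small invented record fragment is shown below. it is purely illustrative (not taken from the paper or from any real catalog), and subfield codes are written here with a leading "$":
LDR 00714nam a2200205 a 4500
001 000012345
016 7#$a123456789$2DNLM
100 1#$aexample, author
245 10$aan example title :$bfor illustration only
the leader follows the ldr label, 001 is a control field holding a single datum, and 016, 100, and 245 are data fields whose two indicator values (with # standing in for an undefined indicator) precede their subfields.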
the field 016 can have subfields a, z, 2, and 8, of which z and 8 are repeatable. bisis the results presented in this paper belong to the research on the development of the bisis library information system. this system, which has been in development since 1993, is currently in its fourth version. the editor for cataloging in the current version of bisis was the starting point for the development of the eclipse editor, the subject of this paper.3 apart from the editor for cataloging, the bisis system has a module for circulation and an editor for creating z39.50 queries.4 the indexing and searching of bibliographic records was implemented using the lucene text server.5 as a part of the editor for cataloging, we developed a module for generating various reports and catalog cards from marc records.6 bisis also supports creating an electronic catalog of unimarc records on the web, where the input of bibliographic data can be done without knowing unimarc; the entered data are mapped to unimarc and stored in the bisis database.7 the recent research within the bisis project relates to its extension for managing research results at the university of novi sad. for that purpose, we developed a current research information system (cris) on the recommendation of the nonprofit organization eurocris.8 the paper "cerif compatible data model based on marc 21 format" proposes a data model, compatible with the common european research information format (cerif), that is based on marc 21. in this model, the part of the cerif data model that relates to research results is mapped to marc 21. furthermore, on the basis of this model, research management at the university of novi sad was developed.9 the paper "cerif data model extension for evaluation and quantitative expression of scientific research results" explains the extension of cerif for evaluation of published scientific research. the extension is based on the semantic layer of cerif, which enables classification of entities and their relationships by different classification schemas.10 the current version of the bisis system is based on a variant of the unimarc format. the development of the next version of bisis, which will be based on marc 21, is in progress. the first task was migrating existing unimarc records.11 the second task is developing the editor for marc 21 records, which is the subject of this paper. cataloging tools an editor for cataloging is a standard part of a cataloger's workstation and the subject of numerous studies. lange describes the development of cataloging from handwritten catalog cards, to typewriters (first manual, then electronic), to the appearance of marc records and pc-based cataloger's workstations.12 leroy and thomas discuss the influence of web development on cataloging. they stress that the availability of information on the web, as well as the possibility of having several applications open at the same time in different windows, greatly influences the process of creating bibliographic records. their paper also indicates that there are some problems that result from using large numbers of resources from the web, such as errors that arise from copy-paste methods. consequently, there is a need for an automatic spelling check and the possibility of a detailed review by the cataloger during editing.13 khurshid deals with general principles of the cataloger's workstation, its configuration, and its influence on a cataloger's productivity.
in addition to efficient access to remote and local electronic resources, khurshid includes record transfer through a network and sophisticated record editing as important functions of a cataloger's workstation. furthermore, khurshid notes that it is possible to improve cataloging efficiency in a windows-based cataloger's workstation by finding bibliographic records in other institutions and cutting and pasting lengthy parts of the record (such as summary notes) into one's own catalog.14 existing approaches in developing user interfaces for editing marc records the basic source for this analysis of existing user interfaces for editing marc records was the official marc standards site of the library of congress, in addition to scientific journals and conferences. the analysis of existing systems shows that there are two basic approaches to the implementation of editing marc records:15
• entering bibliographic data in classic screen forms made of text fields and labels, which does not require knowledge of the marc format (concourse,16 koha,17 j-marc18)
• direct editing of a marc record shown on the screen (marcedit,19 isismarc,20 catalis,21 polaris,22 marcmaker and marcbreaker,23 exlibris voyager24).
both of these approaches have advantages and disadvantages. the drawback of the first approach is that it provides a limited set of bibliographic data to edit, and extending that set implies changes to the application or, in the best case, changes in configuration. another problem is that there are usually a lot of text fields, text areas, combo boxes, and labels on the screen that need to be organized into several tabs or additional windows. this usually makes it difficult for users to see errors or to connect different parts of the record when checking their work. moreover, all of the solutions found in the first group perform little validation of the data entered by the user.25 one important advantage of the first approach is that the application can be used by a user who is not familiar with the standard, so the need for access to data about the marc format can be avoided (one of the functions listed above). as for the second approach, editing a marc record directly on the screen overcomes the problem of extending the set of bibliographic data to enter. it also enables users to scan entered data and check the whole record, which appears on the screen. users can also copy and paste parts of records from other resources into the editor. however, the majority of those applications are actually editors for marc files that are later uploaded into a database or transformed into some other format (marcedit, marcmaker and marcbreaker, polaris), and they usually support little or no data validation.26 they allow users to write anything (i.e., the record structure is not controlled by the program) and only validate at the end of the process, when uploading or transforming the record. among those editors there are some, such as catalis and isismarc, that present the marc record as a table. they support control of the structure, but the record presented in this way is usually too big to fit on the screen, so it is separated into several tabs. an important function of editing marc records is selecting codes for coded elements, which can be character positions in the leader or a control field, the value of an indicator, or the value of a subfield. there are also field tags and subfield codes that sometimes need to be selected for addition to a record.
all of the analyzed editors provide additional dialogs for picking these codes, which requires the user to constantly open and close dialogs and can sometimes be annoying. one important fact about editors in the second group is that they can be used only by a user who is familiar with marc, so access to the large set of marc element descriptions can make the job easier. some of the mentioned systems provide descriptions of the fields and subfields (e.g., isismarc), but most of them do not. findings the editor for marc records was developed as a plug-in for eclipse; therefore it is similar to eclipse's java code editors. as the editor is written in java, it is platform-independent. the main part of this editor was created using the oaw xtext framework for developing textual domain-specific languages.27 it was created using model-driven software development, by specifying the model of the marc record in the form of an xtext grammar and generating the editor. all main characteristics of the editor were generated on the basis of the specification of constraints and extensions of the xtext grammar—therefore all changes to the editor can be realized by changing the specification. moreover, this editor can be easily adjusted for any database by using the concept of extensions and extension points in the eclipse plug-in architecture. we make this application independent of eclipse by using rich client platform (rcp) technology. this editor is implemented for the marc 21 bibliographic and holdings formats. user interface figure 1 shows the editor opened within eclipse. the main area is marked with "1"—it shows the marc 21 file that is being edited. that file contains one marc 21 bibliographic record. the field tags and subfield codes are highlighted in the editor, which contributes to presentation clarity. the area marked with "2" serves for listing the errors in the record, that is, nonvalid elements entered in the record. the area marked with "3" shows data about marc 21 in a tree form. this part of the screen has two other possible views: a marc 21 holdings format tree and a navigator, which is the standard eclipse view for browsing resources of the opened project. the actions for creating a record are available in the cataloging menu and on the cataloging toolbar, which is marked with "4." these are actions for previewing the catalog card, creating a new bibliographic record, loading a record from a database (importing the record), uploading a record to a database (exporting the record), and creating a holdings record for the bibliographic record. figure 1. eclipse editor for marc records in the eclipse editor for marc, selecting codes is enabled without opening additional dialogs or windows (figure 2). that is a standard eclipse mechanism for code completion: typing ctrl + space opens a dropdown list with all possible values for the cursor's current position. figure 2. selecting codes record validation is done in real time, and every violation is shown while editing (figure 3). figure 3 depicts two errors in the record: one is a wrong value in the second character position of control field 008, and the other is that two 100 fields were entered, although this field cannot be repeated in a record. figure 3. validation errors
rcp application of the cataloging editor as shown above, the editor is available as an eclipse plug-in, which raises the question of what a cataloger will do with all the other functions of the eclipse integrated development environment (ide). as seen in figures 1 and 3, there are a lot of additional toolbars and menus that are not related to cataloging. the answer lies in rcp technology, which generates independent software applications on the basis of a set of eclipse plug-ins.28 the main window of an rcp application with additional actions is shown in figure 4. besides the cataloging menu that is shown, the window also contains the file menu, which includes save and save as actions, as well as the edit menu, which includes undo and redo actions. all of these actions are also available via the toolbar. figure 4. rcp application conclusion the goal of this paper was to review current user interfaces for editing marc records. we presented two basic approaches in this field and analyzed the advantages and disadvantages of each. we then presented the eclipse marc editor, which is part of the bisis library software system. the idea behind the editor is inputting structured marc data in a form similar to programming-language editors. the author did not find this approach in the accessible literature. the rcp application of the presented editor will find its practical application in future versions of the bisis system. it represents an upgrade of the existing editor and a starting point for forming the version of the bisis system that will be based on marc 21. the acquired results can also be used for the input of other data into the bisis system, including data from the cris system used at the university of novi sad. this paper shows that eclipse plug-in technology can be used for creating end-user applications. the development of applications with the plug-in technology enables the use of a big library of existing components from the eclipse user interface, whereby writing source code is avoided. additionally, the plug-in technology enables the development of extendible applications by using the concept of the extension point. in this way, we can create software components that can be used by a great number of different information systems. by using the concept of "extension point," the editor can be extended with the functions that are specific for a data store. an extension point was created for export and import of marc records, which means the marc editor plug-in can be used with any database management system by extending this extension point. future work in the development of the eclipse marc editor is to implement support for additional marc formats: for authority and classification data, and for community information. these formats have the same record structure but different constraints on the content and different sets of fields and subfields, as well as different codes for character positions and subfields. therefore the appearance of the editor will remain the same; the only difference will be the specification of the constraints and codes for code completion. another interesting topic for discussion is the implementation of other modules of library information systems in eclipse plug-in technology.
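as a rough sketch of how the data-store extension point mentioned above could be consumed (a minimal illustration only: the extension point id, interface, and method names below are invented for this example and are not the actual bisis or editor code), a contributing plug-in could register an implementation class that the editor looks up through the eclipse extension registry:
import org.eclipse.core.runtime.CoreException;
import org.eclipse.core.runtime.IConfigurationElement;
import org.eclipse.core.runtime.Platform;

// illustrative consumer of a hypothetical "record store" extension point;
// all identifiers are invented for this sketch.
public class RecordStoreLocator {

    // contract a contributing data-store plug-in would implement
    public interface RecordStore {
        String importRecord(String recordId);  // load a marc record for editing
        void exportRecord(String marcText);    // save an edited record to the store
    }

    // hypothetical id of the extension point declared by the editor plug-in
    private static final String EXTENSION_POINT_ID = "org.example.marceditor.recordStore";

    // return the first data store contributed by any installed plug-in, or null
    public static RecordStore findRecordStore() throws CoreException {
        IConfigurationElement[] contributions = Platform.getExtensionRegistry()
                .getConfigurationElementsFor(EXTENSION_POINT_ID);
        for (IConfigurationElement contribution : contributions) {
            // "class" is the plugin.xml attribute naming the implementation class
            return (RecordStore) contribution.createExecutableExtension("class");
        }
        return null;
    }
}
in such a scheme, a data-store plug-in would declare the extension in its plugin.xml and supply the implementation class, so the editor itself never needs to know which database management system sits behind the store.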
references
1. bojana dimić and dušan surla, "xml editor for unimarc and marc 21 cataloging," electronic library 27 (2009): 509–28; bojana dimić, branko milosavljević, and dušan surla, "xml schema for unimarc and marc 21 formats," electronic library 28 (2010): 245–62.
2. library of congress, "marc standards," http://www.loc.gov/marc (accessed february 19, 2011).
3. dimić and surla, "xml editor"; dimić, milosavljević, and surla, "xml schema."
4. danijela tešendić, branko milosavljević, and dušan surla, "a library circulation system for city and special libraries," electronic library 27 (2009): 162–68; branko milosavljević and danijela tešendić, "software architecture of distributed client/server library circulation," electronic library 28 (2010): 286–99; danijela boberić and dušan surla, "xml editor for search and retrieval of bibliographic records in the z39.50 standard," electronic library 27 (2009): 474–95.
5. branko milosavljević, danijela boberić, and dušan surla, "retrieval of bibliographic records using apache lucene," electronic library 28 (2010): 525–36.
6. jelena rađenović, branko milosavljević, and dušan surla, "modelling and implementation of catalogue cards using freemarker," program: electronic library and information systems 43 (2009): 63–76.
7. katarina belić and dušan surla, "model of user friendly system for library cataloging," comsis 5 (2008): 61–85; katarina belić and dušan surla, "user-friendly web application for bibliographic material processing," electronic library 26 (2008): 400–410; eurocris homepage, www.eurocris.org (accessed february 21, 2011).
8. dragan ivanović, dušan surla, and zora konjović, "cerif compatible data model based on marc 21 format," electronic library 29 (2011), http://www.emeraldinsight.com/journals.htm?articleid=1906945.
9. eurocris, "common european research information format," http://www.eurocris.org/index.php?page=cerifreleasesandt=1 (accessed february 21, 2011); dragan ivanović et al., "a cerif-compatible research management system based on the marc 21 format," program: electronic library and information systems 44 (2010): 229–51.
10. gordana milosavljević et al., "automated construction of the user interface for a cerif-compliant research management system," electronic library 29 (2011), http://www.emeraldinsight.com/journals.htm?articleid=1954429; dragan ivanović, dušan surla, and miloš racković, "a cerif data model extension for evaluation and quantitative expression of scientific research results," scientometrics 86 (2010): 155–72.
11. gordana rudić and dušan surla, "conversion of bibliographic records to marc 21 format," electronic library 27 (2009): 950–67.
12. holley r. lange, "catalogers and workstations: a retrospective and future view," cataloging & classification quarterly 16 (1993): 39–52.
13. sarah yoder leroy and suzanne leffard thomas, "impact of web access on cataloging," cataloging & classification quarterly 38 (2004): 7–16.
14. zahiruddin khurshid, "the cataloger's workstation in the electronic library environment," electronic library 19 (2001): 78–83.
15. library of congress, "marc standards," http://www.loc.gov/marc (accessed february 19, 2011).
16. book systems, "concourse software product," http://www.booksys.com/v2/products/concourse (accessed february 19, 2011).
17. koha library software community homepage, http://koha-community.org (accessed february 19, 2011).
18. wendy osborn et al., "a cross-platform solution for bibliographic record manipulation in digital libraries" (paper presented at the sixth iasted international conference on communications, internet and information technology, july 2–4, 2007, banff, alberta, canada).
19. terry reese, "marcedit—your complete free marc editing utility," http://people.oregonstate.edu/~reeset.marcedit/html/index.php (accessed february 19, 2011).
20. united nations educational, scientific and cultural organization, "isismarc," http://portal.unesco.org/ci/en/ev.php-url_id=11041&url_do=do_topic&url_section=201.html (accessed february 19, 2011).
21. fernando j. gómez, "catalis," http://inmabb.criba.edu.ar/catalis (accessed february 19, 2011).
22. polaris library systems homepage, http://www.gisinfosystems.com (accessed february 19, 2011).
23. library of congress, "marcmaker and marcbreaker user's manual," http://www.loc.gov/marc/makrbrkr.html (accessed february 19, 2011).
24. exlibris, "exlibris voyager," http://www.exlibrisgroup.com/category/voyager (accessed february 19, 2011).
25. book systems, "concourse software product."
26. bonnie parks, "an interview with terry reese," serials review 31 (2005): 303–8.
27. eclipse.org, "xtext," http://www.eclipse.org/xtext (accessed february 19, 2011).
28. the eclipse foundation, "rich client platform," http://wiki.eclipse.org/index.php/rich_client_platform (accessed february 19, 2011).
tutorial
cheri smith, anastasia guimaraes, mandy havert, and tatiana h. prokrym
missing items: automating the replacement workflow process
academic libraries handle missing items in a variety of ways. the hesburgh libraries of the university of notre dame recently revamped their system for replacing or withdrawing missing items. this article describes the new process that uses a customized database to facilitate efficient and effective communication, tracking, and selector decision making for large numbers of missing items. though missing books are a ubiquitous problem affecting multiple aspects of library services and workflows, policies and procedures for handling them have not generated a great deal of buzz in library literature.
for the purpose of this article, missing books (and other collection items), refers to items that were not returned from circulation or have otherwise gone missing from the collection and cannot be located. significant staff time may be invested in the missing-book process by departments such as collection development, circulation, acquisitions, database management, systems, and public services. more importantly, user experiences can be negatively affected when missing books are not handled efficiently and effectively. while most libraries have procedures for replacing or suppressing catalog records for items that are missing from the stacks or have been checked out and never returned, few have made these procedures public. this article describes the procedure developed by the hesburgh libraries of the university of notre dame to replace missing items or to withdraw them from the catalog. hesburgh libraries’ procedure offers streamlined, paperless routing of records for missing materials, accounts for “nondecisions” by subject librarians, and results in a shortened turnaround time for acquisitions and catalogmaintenance workflows. hesburgh libraries’ experience in 2005, hesburgh libraries recognized its need to develop a streamlined method of processing missing items. because of personnel changes and competing demands on staff time, the routine handling of missing materials had been suspended for roughly five years. during this period, circulation staff continued to perform searches. when staff declared an item officially missing, the item’s catalog record was updated to the item process status “missing” (mi) and paper records were routed to the collection development department office, but no further action was taken. the mounting backlog of missing items in the catalog became a recurring source of frustration to patrons and public-services employees alike. searches for books that were popular among undergraduates often led to items with a “missing” status. to compound the problem, budgetary constraints resulted in the suspension of spending from the fund earmarked for the replacement of missing items. subject librarians were forced to use their own discipline-specific funds to replace items in their areas, but because there was no systematic means of notifying subject librarians of missing items, they replaced items very rarely and on a case-by-case basis—primarily when faculty or graduate students asked a selector to purchase a replacement for an item critical to their teaching or research. also in 2005, a library-wide fund to replace materials was made available. unfortunately, by that time, the tremendous backlog of catalog records for missing items rendered the existing paper-based system unworkable. as a result, a small task force was formed to manage the backlog and to develop a new method for handling future missing items. hesburgh libraries’ solution the missing items task force was initially composed of eight members representing all departments affected by changes in the procedures for handling missing books. the task force was chaired by the subject librarian for psychology and education. other members represented the circulation, collection development, cataloging, catalog and database maintenance (cadm), monograph acquisitions, and systems departments. during the initial meeting, each member described their portion of the workflow and communicated their requirements for effectively completing their parts of the process. 
because most items with the status “missing” were ones that a patron or patrons had either recently used or requested and could therefore be considered relatively high-use material, the task force quickly determined that the search time for missing books should be shortened from one year to six months. task force members from monograph acquisitions were particularly interested in making this change because newer books are more easily replaced if requests were made cheri smith (cheryl.s.smith.454@nd.edu) is coordinator for instructional services, anastasia guimaraes (aguimara @nd.edu) is supervisor of catalog and database maintenance, mandy havert (mhavert@nd.edu) is head of the monograph acquisitions department, and tatiana h. prokrym (tprokrym@ nd.edu) is senior technical consultant at hesburgh libraries of the university of notre dame, notre dame, indiana. 94 information technology and libraries | june 2009 sooner—many books, especially in the sciences, go out of print quickly and become difficult to replace. the systems task force member supplied a spreadsheet containing the roughly three thousand missing items. this initial spreadsheet included all fields that might be useful for staff in monograph acquisitions, cataloging, cadm, and collection development. various strategies for disseminating the spreadsheet to subject librarians were discussed, but all ideas for how the subject librarians might interact with the spreadsheet seemed laborious and inevitably required that someone sort through each item on the list to determine whether the records needed to be sent to monograph acquisitions or cadm for further processing. the process seemed feasible for a onetime effort, but the task force did not see it as a suitable permanent solution. the task force then considered the feasibility of developing a customized database to manage all of the information necessary for library employees—primarily subject librarians and monograph acquisitions and cadm staff—to participate in the processing of missing books. the database once the task force determined that a database would serve hesburgh libraries’ needs more efficiently than a spreadsheetor paper-based system, the task force enlisted the help of an applications developer. hesburgh libraries had previously created a database for handling journal cancellations, and the task force decided to base the replacement application upon this model. the application is therefore written in php and uses a mysql database. the first step in designing the database was to determine which bibliographic metadata (such as call number, isbn, issn, imprint, etc.) would be required by subject librarians to specify replacement or withdrawal decisions, including whether the item was to be replaced with the same edition, any edition, or the newest available edition. because replacement funds may not always be available, the task force wanted to enable the selector to identify other funds to use for the replacement purchase. finally, the task force felt that, no matter how easy the system was to use, there would always be a few subject librarians who choose not to use it. it was therefore important that the database could also account for “nondecisions” from subject librarians. other general database requirements included that it be available through any web browser and accessible to only those people who are part of the replacement-book process. 
with those requirements in mind, the task force created a list of metadata elements to be included in the database (see table 1). on a quarterly basis, the application pulls the database fields—title, author, call number, sub library, imprint, isbn or issn, barcode, previous fund, local cost, description, item status, update date, bib system number, and system number—from hesburgh libraries' ils (aleph v18) and imports them into the replacements database. for each item, bibliographic, circulation, and acquisitions information is retrieved from aleph and combined to generate the export data file. procedurally, a list of all items with an item process status of "missing" is first retrieved into a temporary table from the item record (z30) table. this temporary table consists of the system number, status field, sublibrary, collection, barcode, description, and the last date the item was modified (z30-update-date in aleph). a second temporary table is then created that includes the purchase price and fund code originally used to purchase the item. the two temporary tables are joined and their information merged, creating a single list of missing items and related acquisitions information. this list is then linked to the bibliographic tables to obtain key bibliographic information such as title, author, imprint, isbn or issn, the ils bibliographic number, and the barcode. these combined results are converted into an ascii text file for import into the mysql replacements database. upon the import of the ascii file, an e-mail is sent to the collection development e-mail list, informing subject librarians that data has been loaded and is ready for their review and input. table 2 lists the purpose of each of the nine tables within the replacements database. figure 1 illustrates the relationships and linking fields between the tables.
table 1. fields for the replacements database (database field: data type)
title: varchar(200)
author: varchar(150)
call number: varchar(30)
sub library: varchar(12)
imprint: varchar(150)
isbn or issn: varchar(150)
barcode: varchar(30)
previous fund: varchar(20)
local cost: decimal(10,2)
description: varchar(50)
item status: char(2)
update date: date
bib system number: int(9) unsigned zerofill
system number: varchar(50)
new database fields:
action to take: tinyint(1)
new fund code: int(10)
modified date: date
modified by: varchar(50)
notes: longtext
system-used fields:
transfer date: date
record id: int(10) (auto)
the database provides two "pick lists" for subject librarians. the first pick list is the action to take field. primary choices are "any edition," "newest edition only," "micro format only," and "do not replace." the second pick list is the new fund field. the default choice for this field is hesburgh libraries' replacement fund code, although any acquisitions funds may be selected. both pick lists provide data integrity and assurance that all input from the subject librarians is standardized. two internal fields, record id and transfer date, facilitate programming and identification. these fields are very important for auditing and tracking replacement records through the replacement process. rollbacks are easily handled through the manipulation of these two fields. programmatic process for the initial implementation of this application, the task force decided that batch loads would be performed on an as-needed basis. after the initial phase of the project, the task force implemented a quarter-based schedule.
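the extraction just described can be pictured with a short sql sketch. this is only an illustration of the logic: apart from z30 and the "missing" process status, which the article names, the table and column names below (acq_orders, bib_records, and the rest) are simplified placeholders rather than the real aleph schema, and the libraries' actual export is produced by their own scripts.
-- step 1: missing items pulled from the item record (z30) table
create temporary table tmp_missing as
  select z30_rec_key     as system_number,
         z30_sub_library as sub_library,
         z30_collection  as collection,
         z30_barcode     as barcode,
         z30_description as description,
         z30_update_date as update_date
  from z30
  where z30_item_process_status = 'MI';

-- step 2: purchase price and fund code originally used for each item
create temporary table tmp_acq as
  select system_number, purchase_price, fund_code
  from acq_orders;

-- step 3: merge the two lists and attach key bibliographic data;
-- the result corresponds to the single merged list written out as the ascii file
select m.*, a.purchase_price, a.fund_code,
       b.title, b.author, b.imprint, b.isbn_issn, b.bib_system_number
from tmp_missing m
join tmp_acq     a on a.system_number = m.system_number
join bib_records b on b.system_number = m.system_number;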
for each data load, the exported records are written to a text file, which is then imported into the replacements database through an import script. the import script archives the previous group of processed records, appending them to a set of historical tables stored within the database. the import script further processes the aleph data by eliminating duplicate records and ensuring there is only one record per barcode and system number. the historical tables are checked to see if a missing item has already been loaded into the database and processed. if a record has already been processed, it is automatically deleted from the newly imported item list. after the successful completion of the data load, an e-mail is automatically generated notifying subject librarians that the replacements database is ready for their review and input. the verified missing-item records are then transferred to the main database table, tblreplacements, and are ready for updating. included in the e-mail to subject librarians is a link that directs them to a search window allowing them to take action on the missing items (see figure 2). once the subject librarians update the records, the application provides a mechanism to distribute missing-book records to the appropriate departments for further processing. a collection development staff member runs a series of reports, each one creating a microsoft excel spreadsheet. the first report lists missing-book records marked for replacement and is sent to monograph acquisitions for processing. missing books that have been marked "do not replace," or that have had no action taken on them after a certain time period, are exported to a separate excel spreadsheet that is sent to cadm for suppression or removal of cataloging records. for each report that is run, the application generates an e-mail message notifying all necessary departments that there is information to be processed. a list of processed records is available for viewing and distribution to cadm and acquisitions, as illustrated in figure 3. the application also provides customized manipulation of the data records that are exported to each of the departments. this customization pulls together only the specific fields of interest to each department, such that each export template is unique to each department's needs. at the end of each replacement cycle, the application automatically creates backups and archives the missing-book records.
table 2. tables and their purposes within the database (table: description)
alephdump: stores imported aleph data before processing.
tbltempreplacements: stores aleph data from the alephdump table. this data is processed and sent through verification and truncation programs.
tblreplacements: post-processed aleph records. primary table for all activities, actions, and fund codes selected by the subject librarians.
tblactions: a reference list of valid actions that can be taken by the subject librarians.
tblfunds: a reference list of valid fund codes; originally imported from aleph.
tblacqrecords: temporary table that stores processed records that should be sent to monograph acquisitions.
tblcadmrecords: temporary table that stores processed records that should be sent to cadm.
tblcadmnullrecord: temporary table that stores records where no action has been taken by a subject librarian.
historytblreplacements: an archiving table.
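the de-duplication and history check performed by the import script could be expressed, very roughly, as two sql statements against the staging and history tables. the actual script is php and is not reproduced in the article, and the column names used here (barcode, system_number, record_id) are simplified guesses at the staging-table layout rather than the real schema.
-- keep only one staging row per barcode and system number
-- (record_id is assumed here to be the auto-increment key of the staging table)
delete t1
from tbltempreplacements t1
join tbltempreplacements t2
  on  t2.barcode       = t1.barcode
  and t2.system_number = t1.system_number
  and t2.record_id     < t1.record_id;

-- drop items that were already loaded and processed in an earlier cycle
delete t
from tbltempreplacements t
join historytblreplacements h
  on  h.barcode       = t.barcode
  and h.system_number = t.system_number;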
96 information technology and libraries | june 2009 subject librarian workflow when subject librarians receive a message indicating a new replacement list is ready for review, their job is surprisingly simple. after entering their network id and password to gain access to the database, they can select how they wish to view the list of missing books—by selected call number ranges, by the budget code with which the books were originally purchased, or by system number (the last two options are rarely used). subject librarians can also view items that have already been processed, and they are able to sort this list by subject librarian, action taken, new budget code, or call number. figure 1. relationship diagram for the nine database tables that were created for this application. the aleph system number is used as the primary linking field for most of the tables. missing items: automating the replacement workflow process | smith et al. 97 initially, subject librarians encounter a list of brief records for each item in the database. the brief records include system numbers, titles, authors, volume numbers (if applicable), call numbers, sublibraries, and isbns or issns. if a record has already been reviewed by a subject librarian, the list will include actions taken and the names of the subject librarians who took the action. to take action on an item, subject librarians select the system number, displaying the full record (see figure 4), and may then choose to replace the book with the same edition, any edition, the newest edition available, or a microform version. by using a drop-down menu, the selector can elect to pay for the replacement with replacement funds or with their own subject funds. subject librarians who choose to replace books with their own funds are rewarded at the end of the quarter when their replacement requests appear at the top of the queue for processing by monograph acquisitions. additional functionality includes the ability to directly link to and browse opac records for items in the database. replacement funds cannot be used for second copies of books, so quick access to opac records is often useful. it also facilitates determining if the library owns other editions of the item before taking action. a notes field allows subject librarians to communicate special instructions for monograph acquisitions or cadm, and records can be e-mailed to other librarians for additional input with just a few clicks. subject librarians are able to return to the database at any time during a given quarter to continue making decisions on their missing books and make any adjustments to prior decisions as necessary. if a subject librarian takes no action on an item by the end of the quarter, it is assumed that it is not to be replaced, and these untouched items are sent to cadm for removal or suppression. figure 2. replacements application search window figure 3. processed book records ready to be sent to monograph acquisitions and cadm. notification and data transmission to these units are achieved through the send buttons on this webpage. 98 information technology and libraries | june 2009 monograph acquisitions workflow once the quarterly database processing completes, a comma-separated file is delivered to the shared monograph acquisitions e-mail address. monograph acquisitions staff format, sort, and begin searching the spreadsheet, giving priority to the orders designated for replacement by subject librarian funds over those funded from the library replacement fund. 
staff members routinely search the library catalog for duplicate titles or review orders in process for the same title prior to searching with our library materials vendors. staff members ensure that replacement funds are not used to purchase second copies. material that is not available for purchase is referred by monograph acquisitions to the subject librarian for direction. sometimes the materials may be kept on order with a vendor to continue searching for outof-print or aftermarket availability. other times it is necessary for staff to cancel the order and remove the record from the system completely. likewise, the missing edition may have been subsumed by a newer, revised edition. subject librarians are contacted by search and order staff in the monograph acquisitions department regarding availability of different editions when they did not specify that any edition would be acceptable. when the monograph acquisitions department places a replacement-copy order, the search-and-order unit adds an ils library note field code designating the item is a replacement (rplc), the bibliographic system number of the item being replaced, and any typical order notes such as the initials of the staff member placing the order. the rplc code alerts the receipt unit to route new items to the cataloging supervisor, who then reviews and directs the items to either cataloging or cadm for processing. catalog and database maintenance (cadm) workflow cadm is usually the last unit to edit records in the missing books workflow. the unit receives two reports from the database: a “do not replace” list and a “no action taken” list. both reports get the same treatment: all catalog records for titles listed are removed from the catalog. removal of catalog records is accomplished either by suppression/ deletion of the bibliographic records or complete deletion of all records (item, holdings, bibliographic, and administrative) from the server. for titles that have order or subscription records attached to bibliographic records, a suppression/deletion procedure allows the record to be suppressed from patrons’ view while preserving the title’s order and payment history for internal staff use. records are completely deleted when no such information exists (e.g., a gift copy or an older record that has no such data attached). because it takes a long time to review each newly loaded batch from the catalog into the database, some records that come to cadm for deletion no longer need to be deleted if missing books are found and returned to the shelves. it is very important for staff working on the cleanup of records to check the item process status and not delete any items that have been cleared of the “missing” status. fortunately, aleph allows staff to look up an item’s history and view prior changes made to the record. this item history feature eliminates unnecessary shelf checks for items appearing on cadm reports that are no longer listed as “missing” in the catalog. occasionally, cadm receives requests to delete records directly from monograph acquisitions and cataloging staff because of a revised selector decision. this often occurs when a replacement item is only available in a different edition from figure 4. full record for a missing book in the replacement database missing items: automating the replacement workflow process | smith et al. 99 the one originally sought, or when an item is ultimately unable to be replaced because it has gone out of print or a vendor backs out of a purchase agreement. 
when a different edition is received to replace a missing item, the replacement copy is sent by the receipt unit in monograph acquisitions to cataloging for copy or original cataloging, and cadm is alerted by either monograph acquisitions or cataloging staff if the record for the missing item needs to be deleted. because monograph acquisitions often orders the replacement on its own record with appropriate bibliographic information (we keep the original record just in case the missing piece is found while we wait for replacement), the record for the missing book does not come to cadm on either of the two reports. perhaps in a library with a different makeup of technical services the process would be more streamlined, but because hesburgh libraries has separate cataloging and database maintenance units, we have created such partnerships to make sure nothing falls through the cracks. so far it has worked well, and every party in the process knows and carries out their responsibilities. issues while the initial implementation successfully brought a large backlog of missing records into the database, subsequent loads included duplicate records of some items processed in earlier batches. this duplication occurred, for example, if an item was identified for replacement in a prior database review cycle, but a replacement request had not yet been processed by monograph acquisitions staff. because such an item is still identified as “missing” in the catalog, it was again included in data loaded from the catalog into the missing-books database, creating confusion for selectors, cadm, and monograph acquisitions. to resolve this problem, the import process was revised to include a search for previously loaded items, deleting them before records are viewed by collection managers. a second issue involved the timing of the data load from the catalog into the replacements database. for various reasons, the data load file was not fully generated for several of the scheduled processing dates. to remedy this problem, the application automatically generates an e-mail confirming a successful data load to the collection development department staff. there is continued debate as to whether the missingitems file should be created on a daily basis, providing the capability for collection development to import new data at one time rather than periodically. results since implementing our new system, hesburgh libraries has processed records for 5,141 missing items. since its creation, twenty-five librarians have consulted the database and twenty-three of thirty subject librarians have used the database to request replacements. of the 5,141 records loaded into the database, 2,537 items (49 percent) have been selected for replacement, and 2,604 items (51 percent) have either been suppressed or deleted from our catalog. replacement funds are renewed on an annual basis and have not yet run out. as a reflection of the collection strengths at hesburgh libraries, most of the missing books (21 percent) fell in the theology/religion call number range. language and literatures was the second most popular collection for missing items (17 percent). other collections with significant numbers of missing books are history (15 percent), social sciences (17 percent), science (12 percent), and philosophy (10 percent). 
conclusion although the process could certainly be further developed and refined, the hesburgh libraries missing books application is an amazing improvement over the extremely outdated paper-based method of dealing with missing library materials. the process works; it is both efficient and effective, and employees who engage in the process have reported satisfaction with it. it has not only allowed hesburgh libraries to catch up on its backlog but, more importantly, to stay current and organized, keeping the catalog more accurate and patrons more satisfied. furthermore, should the libraries opt to do a full inventory in the future, the current system will prove invaluable. the authors are pleased to have the opportunity to share our experiences with interested libraries. feel free to contact any of the authors for further information. reproduced with permission of the copyright owner. further reproduction prohibited without permission. 190 information technology and libraries | december 2011 from static and stale to dynamic and collaborative: the drupal difference editor’s note: this paper is adapted from a presentation given at the 2010 lita forum. i n 2009, the university library of the university of california, santa cruz, moved from a static, dreamweaverand html-created website to an entirely new databasedriven website using the open-source content management system (cms) drupal. this article will describe the interdisciplinary approach the project team took for this large-scale transition process, with a focus on user testing, information architecture planning, user analytics, data gathering, and change management. we examine new approaches implemented for group-authoring of resources and the challenges presented by collaboration and crowdsourcing in an academic environment. we also discuss the impact on librarians and staff changing to this new paradigm of website design and development and the training support provided. we present our process for testing, staging, and publishing new content and describe the modules used to build dynamic subjectand course-guide displays. finally, we provide a list of resources and modules for beginning and intermediate drupal users. why change was needed our old library website was created using static html and its organizational structure evolved to mirror the administrative structure of the library. the vocabulary we used was very library-centric and, though useful to library staff, could be confusing to patrons. like many larger, older websites, we had accumulated a number of redundant and defunct pages. many of these pages had not been updated for years, had inconsistent naming conventions, or outdated page design. the catalyst for updating our web presence was predicated on several things. with more than one million visits per year and more than two million page views, our old servers were no longer able to handle this load, and we were about to begin a major project to replace our server hardware. in addition, we anticipated participating in an upcoming transition to a new campuswide website template. we saw this moment of change as an opportunity to revitalize the library website’s entire structure and reorganize it with a more user-centric approach to the menus and vocabulary. to do this, we decided to move away from dreamweaver and the static html approach to web design and instead choose a cms that would provide a more flexible and innovative interface. 
choosing drupal we had done research on commercial and open-source solutions and were leaning toward drupal as our cms. many academic departments at our campus were going through a similar process of website redesign and had already explored the cms options and had chosen drupal. this helped move us toward choosing drupal and taking advantage of a growing developer community on campus. two of the largest units on campus both chose drupal as their cms and have since been great partners for collaboration and peer support. drupal is a free, open-source cms (or content management framework) written in php with a mysql database backing it up. it is a small application of core modules with thousands of add-on modules available to increase functionality. drupal also has a very strong developer community and has been adopted by a growing number of libraries. we have found it to be very open and fluid, which is both a blessing and curse. for any one problem there can be dozens of differing solutions and modules to resolve it. the transition team the library created a core website implementation team consisting of a librarian project manager/developer, a web designer from the it department, and two librarian developers. the core team was supported by a server administrator and an it analyst. the it staff supported the technical aspects of drupal installation, backup, and maintenance. the librarian developers planned the content migration and managed the user interface design, layout, content, scope, and architecture. they needed to know the basics of how drupal works and needed to have much more access to the inner workings of drupal (e.g., modules, user permissions, etc.) than staff. the librarians also would train library staff, so needed to be able to teach and develop documentation and tailor instruction to specific staff needs. everyone who participated in the implementation team had many other competing responsibilities. the librarian developers had other projects and traditional duties such as collection development and reference services, so learning drupal and creating this new website was a part-time project and had to be integrated into existing workloads. tutorial ann hubble, deborah a. murphy, and susan chesley perry ann hubble (ahubble@ucsc.edu) is science librarian, deborah a. murphy (damurphy@ucsc.edu) is emerging technologies librarian, and susan chesley perry (chesley@ucsc.edu) is head of digital initiatives, university of california, santa cruz. selecting a web content management system for an academic library website | hubble, murphy, and perry 191from static and stale to dynamic and collaborative: the drupal difference | hubble, murphy, and perry 191 and often eccentric organizational structures that were no longer meaningful. the previous website had accumulated pages that were a bit more freewheeling in design with a lack of consistent navigation and “look and feel.” adding another layer of complexity, our website changeover took place during a period of great organizational change and a severe budget crisis. surprisingly, what seemed at first a major drawback was actually somewhat helpful. with fewer people spread thinner and doing more work, there was less need to feel in control of individual empires, leading to more cooperation during the changeover. staff learning styles vary, and no one approach to drupal training will work for everyone, so we brought many of the lessons we have learned in our bibliographic instruction sessions to our staff training. 
for example, we focused training on repetition, reassurance, and patience, ensuring it was an active process with hands-on participation as well as a lecture or demonstration. we provided ample time for questions and invited staff to bring their own projects to work on during training sessions. though some staff only needed to learn a few applications within drupal to perform their jobs, most needed specialized instruction to do some departmental-specific task or action that now had a very different interface. we supplemented our large group by drop-in training sessions with specialized departmental sessions, custom-made documentation, individual hands-on training, e-mail updates on system changes, and regular presentations of new system features. not everyone will become a “born again” drupalista, but everyone should at least feel that they can get their work done using drupal. drupal has also meant changes not only in the way content is added to the website, but also in how we handle revisions and updates. in the past, we had a very siloed initially from the increasing interest in drupal at the campus level. they attended a two-day intensive drupal training course from a company called lullabot, which provided an in-depth technical foundation for our initial drupal installation. this level of technical training and content was not appropriate for the other librarian developers on our team. however, a more detailed, midlevel training would have benefited the librarian developers and moved the project forward at a faster pace. these librarian developers learned using a combination of resources, including free online content that covers core drupal skills, combined with a few carefully chosen professional in-person consultations and online training packages and books. drupal is not a static environment, so after the initial training there was still a need for regular updates and refresitishers. our transition team joined the drupal4lib discussion list and consulted with library colleagues using drupal in the northern california area. drupalcon conferences as well as online users groups were excellent places not only to learn but also to make contact with vendors and other developers. several of these resources are listed in the accompanying bibliography in appendix b. staff training by far our largest group of library drupal users was the fifty-plus library staff content contributors who were faced with learning a new approach to web development. drupal’s successful implementation was ultimately dependent on ensuring library staff would be able to create, edit, and manage thousands of library webpages using this new cms. this was a change for everyone in the library, not just a few. the new website meant leaving behind the comfort of routines created over the years, elaborate designs that had been developed, and various idiosyncratic transition planning with the goal of making our new site user-centered, we wanted to make data-driven decisions about design rather than what had ultimately devolved into the practice of decisions based on politics and committee negotiations. to that end, we took several approaches to gathering user data. we inventoried our current site and gathered usage statistics based on website analytics. we met with small campus focus groups who answered questions about library site searching. we created personas for user categories based on profiles of average users (e.g. first-year students, graduate students, faculty, community users, etc.). 
based on this data, we drafted web interface wireframes and began user testing. drupal implementation also included developing a safe and effective means of moving from a testing environment to a final, public production site. this deployment process is a crucial component of ensuring that we could both test new features and still provide a stable environment for our users. after extensive discussions and revisions we developed a process to experiment with new modules and themes in a way that does not overwrite our existing public site. the deployment process goes from development to staging to production. it is critical to be able to determine that a new module or update will not negatively affect the database. the process we follow from our sandbox site to our production site is described in more detail in appendix a. transition team training we had three types of drupal users within the library: system administrators, developers, and staff (the primary content editors); each group had its own training needs. the library project manager, web designer, and it systems administrators benefited 192 information technology and libraries | december 2011 appropriate for this particular subject. each tab can also be customized to display whichever records pertain to this subject area. how our dynamic displays were built cck (content construction kit) module content used in both the “article databases and research tools” and “subject guides” displays is held within a special record type we created using the content construction kit (cck) module. we called this special record, or content type, online resource. we defined fields within the online resource record to hold information about individual resources we want to either display on our website or keep track of internally. the fields we defined include the resource name, url (sometimes multiple urls), description, type of information (article database, dictionary, encyclopedia, etc.), and subject discipline. figure 3 shows what a portion of the online resource record for a particular database changes, it’s updated in a central record and immediately reflected in displays throughout the site. not only is it less work to update information, but we also can provide resources in more varied combinations and make them more findable for our users. figure 1 shows how the dynamically created “article databases and research tools” list appears to a user browsing our website. the default display lists these resources in alphabetical order. the user can display the same group of records sorted by other criteria just by clicking on the appropriate tab. if the “by subject” tab is selected, the resources are displayed under subject headings. selecting the “by type” tab lists the resources by resource types, such as dictionaries and encyclopedias, citation style guides, etc. our subject guides are also created using the same components used to build the “article databases and research tools” lists. figure 2 shows a portion of one of our subject guides. like the previous example, this portion of the guide is created dynamically, displaying only records permissions environment that limited editing to only those given specific permissions. we now have role-based ownership where everyone can edit everything so that we did not have to keep up a detailed list of who does what. initial concern that someone could write over or accidently delete pages was somewhat remedied by the drupal revisions history feature, which assists with version control. 
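to make the online resource content type described above easier to picture, the sketch below models it as a simple record; this is an illustration in python rather than actual drupal configuration (cck fields are defined through the administrative interface, not in code), and the field names and values shown here are invented.

```python
from dataclasses import dataclass, field

# illustrative only: drupal/cck fields are configured through the administrative
# interface; this dataclass just mirrors the fields we defined for "online resource."
@dataclass
class OnlineResource:
    name: str                          # resource name
    urls: list[str]                    # one or more urls
    description: str                   # brief annotation displayed to users
    type_of_information: str           # e.g., "article database", "dictionary", "encyclopedia"
    subjects: list[str] = field(default_factory=list)  # taxonomy terms, e.g., "biology"

# hypothetical record, not an actual entry from our site
example = OnlineResource(
    name="sample article database",
    urls=["http://example.org/database"],
    description="a placeholder description for illustration.",
    type_of_information="article database",
    subjects=["biology", "environmental studies"],
)
```

because every display on the site is generated from records like this, a change to a url or description is made once in the central record and propagates to every page where the resource appears.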
there have been a few pages where ownership is an issue, and we are still in the process of developing a system to ensure that pages are updated when there is no specific individual linked to a page. dynamic displays: article databases and subject guides as part of the move to drupal, we wanted to take advantage of the new environment to redesign some of the more specialized portions of our site. in particular, we hoped that drupal’s dynamic displays would help us to keep our site more current with less effort. with this in mind, we chose to focus on two of the most heavily used resources: our list of article databases and our library subject guides. we planned to transform these static, high-maintenance html pages into dynamic, easily maintained, and easily generated resources. we used a number of drupal modules to develop the library’s website, and these are described in more detail in appendix c. to redesign our list of article databases and our library subject guides, we relied heavily on three important modules: cck (content construction kit), views, and taxonomy. the interaction of these three modules is key to building dynamically created webpages. once these modules are configured, information is input just once. drupal does the work of pulling the right information from each resource to create dynamic displays. if information, such as a url figure 1. dynamic display: article databases and research tools selecting a web content management system for an academic library website | hubble, murphy, and perry 193from static and stale to dynamic and collaborative: the drupal difference | hubble, murphy, and perry 193 and not programmers. we found that drupal was very different from anything we had used before and had a very steep learning curve. if we could start over, we would have invested much more time in lessons learned learning drupal takes time our implementation team was comprised predominantly of librarians and list of defined fields looks like behind the scenes to the librarian web developer. some fields within the online resource record rely on a little further customization. the “type of information” field is defined via an allowed values list. figure 4 shows a portion of the values list we have defined for this particular field. the “subject, discipline, topic” field (figure 3) incorporates a taxonomy list that we first created using the taxonomy module. this taxonomy vocabulary allows us to later sort the resources dynamically in both the “article databases and research tools” (figure 1) and “subject guides” displays (figure 2). taxonomy module figure 5 shows the list of subject terms we created using the taxonomy module. terms are easily added, edited, and organized via this taxonomy display, available only to the web developers. views module–putting it all together to define how the online resource records are displayed to the user (figure 1), we use the views module. views allow us to define, sort, and filter these records for display. figure 6 shows what the “article databases and research tools” view of figure 1 looks like to the web developer. notice that “a–z,” “subjects,” and “by type” are listed in a box on the left side of the page. each of these tabs corresponds to a tab on the page that displays to the user. in this case, “a–z” is bold and is the active tab currently being defined for this display. 
display settings such as the record type used, number of records to display per page, specific fields to display, type of sorting, and url path for the webpage are defined here. figure 2. dynamic display: dynamic portion of a subject guide figure 3. cck module: online resource record: manage fields 194 information technology and libraries | december 2011 learning drupal basics and getting a better grasp of how drupal works as a cms. the architecture and database-driven paradigm of our new drupal site is a significantly different environment from our previous website’s html-designed pages and directory-and-folder organization. of particular importance for our site were three core modules: cck, views, and taxonomy. becoming proficient with these modules was a challenge, and we can’t emphasize enough the importance of good, basic training on their use. start small: identify small parts to bring over initially, the thought of moving our old website to drupal seemed insurmountable. bringing over static html pages was straightforward, but portions of the website (such as converting our database of online resources) took more intensive planning. the entire process became more manageable when we divided up the site and focused on drupalizing small parts at a time. this way we could focus on learning enough drupal to make these portions of the site work without being overwhelmed. project management software: document & share what you’ve done if we were to transition an entire website again we would recommend using some type of project management software before starting. none of the implementation team worked on this site full time. this project was added to our other full-time workload providing reference services, collection planning, teaching, digital projects, etc. during our project we tried several free products but were not satisfied with any of them. we felt that finding the right project management package could have made the website transition process much figure 4. cck module: allowed values list (type of information) figure 5. taxonomy module: subjects list selecting a web content management system for an academic library website | hubble, murphy, and perry 195from static and stale to dynamic and collaborative: the drupal difference | hubble, murphy, and perry 195 and the library is now in a much better position for future website design transitions, a process that will be much easier with so much less static content to migrate. for example, the look and feel of our entire website can be transformed by reconfiguring a few options within drupal. ultimately, the transition of the library website to a drupal environment was a very good thing, and we are glad we did it. it was difficult and messy at times, but our website is now more flexible, agile, adaptable, and better poised for change. epilogue since this article was submitted, the uc santa cruz university library website has moved to an entirely new campus theme. we note that having a drupal-based cms greatly aided this transition process. personas for librarians and content contributors and done more usability testing for non-developers. we found that training and teaching library staff the architecture and databasedriven paradigm of the new drupal culture has been a challenge and we still have varying levels of buy-in. conclusion we now have a consistent look and feel to our site, though there are still many things yet to do. 
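the interaction of cck, views, and taxonomy described in the preceding sections is easier to see in miniature. the following sketch is our own python paraphrase, not drupal code, and the records in it are invented; it simply mimics what the views configuration does with online resource records to produce the a–z, "by subject," and "by type" tabs.

```python
from collections import defaultdict

# invented records standing in for cck "online resource" nodes
resources = [
    {"name": "sample database a", "type": "article database", "subjects": ["biology"]},
    {"name": "sample encyclopedia b", "type": "encyclopedia", "subjects": ["history", "art"]},
    {"name": "sample dictionary c", "type": "dictionary", "subjects": ["art"]},
]

def a_to_z(records):
    """default tab: every record, alphabetical by name."""
    return sorted(r["name"] for r in records)

def by_subject(records):
    """'by subject' tab: group names under each taxonomy term."""
    grouped = defaultdict(list)
    for r in records:
        for term in r["subjects"]:
            grouped[term].append(r["name"])
    return dict(sorted(grouped.items()))

def by_type(records):
    """'by type' tab: group names by resource type."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["type"]].append(r["name"])
    return dict(sorted(grouped.items()))

print(by_subject(resources))
# {'art': ['sample encyclopedia b', 'sample dictionary c'], 'biology': ['sample database a'], 'history': ['sample encyclopedia b']}
```

in drupal these groupings are configured through the views administrative screens rather than hand-coded, but the principle is the same: the information is entered once and presented in several sorted and filtered arrangements.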
now that we are more comfortable using drupal, we can focus on creating more dynamic content, such as staff lists, adding sidebars to pages, and so on. increasing the number of dynamically created pages will mean a more up-to-date site in general. though group authoring within the library is still a challenge, we continue to find ways to encourage collaboration. easier. documenting and sharing how we created elements of the site helped us replicate complex components and allowed us to collaborate more easily on various projects. test, test, test testing the website as we developed it was a crucial component of our work. modules also can interact with other modules in unpredictable ways, so we ultimately found that loading new modules on our sandbox site, a mirror of the library website, was a crucial step in determining compatibility as well as functionality with our existing site (appendix a). it’s essential to practice using a live site without bringing the real production website down. focus on essential modules: cck, views, taxonomy images, wysiwyg editors drupal comes with a set of core modules plus an ever-increasing number of specialized contributed modules. finding and installing the right contributed module that fits a particular need can sometimes be difficult. there are often myriad modules that can solve a problem. it takes time to find and test each one to see if it will actually function as needed, and not all modules work well with one another. focusing on the essential drupal core modules plus cck, views, and taxonomy will help reduce unnecessary development frustrations. staff are important though we created many personas for faculty, students, and community users, we should have created figure 6. views module: article databases and research tools view 196 information technology and libraries | december 2011 appendix a. website deployment process created by bryn kanar and sue chesley perry selecting a web content management system for an academic library website | hubble, murphy, and perry 197from static and stale to dynamic and collaborative: the drupal difference | hubble, murphy, and perry 197 appendix b. drupal resources for getting started ■■ american library association. “drupal4lib interest group (lita library & information technology association).” http://connect.ala.org/node/71787 (accessed march 18, 2011). ■■ american library association. “showcase: database pages & research guides using drupal.” http://connect.ala .org/node/98546 (accessed march 18, 2011). ■■ austin, andy, and christopher harris. “drupal in libraries.” library technology reports 44, no. 4 (2008). ■■ byron, angela, addison berry, nathan haug, jeff eaton, james walker, and jeff robbins. using drupal: choosing and configuring modules to build dynamic websites. sebastopol, ca: o'reilly, 2008. ■■ drupal. “drupal.org.” http://drupal.org/(accessed march 18, 2011). ■■ drupal dojo. “drupal dojo.” http://drupaldojo.com/ (accessed march 18, 2011). ■■ drupal modules. “search, rate, and review drupal modules.” http://drupalmodules.com/ (accessed march 18, 2011). ■■ “drupalconsf san francisco – april 19-21, 2010.” http://sf2010.drupal.org/conference/sessions (accessed march 18, 2011). ■■ drupalib.”drupalib: a place for library drupalers to hang out.” http://drupalib.interoperating.info/ (accessed march 18, 2011). ■■ gotdrupal.com. “gotdrupal: once you've got it, you're addicted!”. http://gotdrupal.com (accessed march 18, 2011). ■■ groups.drupal. 
“libraries.” http://groups.drupal.org/libraries (accessed march 18, 2011). ■■ groups.drupal. “list of libraries using drupal.” http://groups.drupal.org/libraries/libraries (accessed march 18, 2011). ■■ “is this site built with drupal?”. http://www.isthissitebuiltwithdrupal.com/ (accessed march 18, 2011). ■■ learn by the drop. “learn by the drop: a place to learn drupal.” http://learnbythedrop.com/ (accessed march 18, 2011). ■■ “lullabot.” http://lullabot.com (accessed march 18, 2011). ■■ mastering drupal. “drupal screencasts.” http://www.masteringdrupal.com/videos (accessed march 18, 2011). ■■ slideshare. “drupal resources for libraries, sarah houghton-jan.” http://www.slideshare.net/librarianinblack/ drupal-resources-2982935 (accessed march 18, 2011). ■■ slideshare. “introduction to drupal for libraries, laura solomon.” http://www.slideshare.net/oplin/intro-to -drupal-for-libraries (accessed march 18, 2011). ■■ sunrainproductions. “drupalcampla 2009 views demystified.” http://www.sunrainproductions.com/ drupalcampla/views-demystified (accessed march 18, 2011). appendix c. selected drupal modules used on the ucsc library site ■■ administration menu—adds a top menu bar for authenticated users with common administration tasks ■■ cck—allows you to add new content types, for example the online resources content type for a–z list ■■ ckeditor—wysiwyg editor ■■ google analytics—adds google javascript tracking code to all of our site's pages ■■ google cse—allows us to use google as the site search ■■ imce—image-uploading module, also allows you to create subdirectories within the image directory ■■ image cache—allows you to pre-set sizes for images ■■ ldap integration—links user authentication to the library’s ldap server ■■ mollum—spam filter and image captcha (part of spam control) ■■ nice menus—allows drop-down/right/left expandable menus ■■ nodeblock—allows you to specify a content type as being a block, which content creators to edit the block text and title without having to access the block administration page ■■ pathauto—automatically generates path aliases for various kinds of content (nodes, categories, users) ■■ printer-friendly, e-mail and pdf versions—allows you to configure any type of page to display links for print, e-mail, and pdf ■■ rules—allows site administrators to define conditionally executed actions based on occurring events, we use it to send email when new content is created and to hide some content fields from selected user roles ■■ taxonomy—enables us to assign subjects and other categories to content; the url paths and views use taxonomy ■■ webform—enables quick creation of forms and questionnaires 2 information technology and libraries | september 2008 andrew k. pacepresident’s message andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio. w elcome to my first ital column as lita president. i’ve had the good fortune to write a number of columns in the past—in computers in libraries, smart libraries newsletter, and most recently american libraries—and it is a role that i have always cherished. there is just enough space to say what you want, but not all the responsibility of backing it up with facts and figures. in the past, i have worried about having enough to say month after month for an undefined period. now i am daunted by only having one year to address the lita membership and communicate goals and accomplishments of my quickly passing tenure. 
i am simultaneously humbled and extremely excited to start my presidential year with lita. i have some ambitious agenda items for the division. i said when i was running that i wanted to make lita the kind of organization that new librarians and it professionals want to join and that seasoned librarians wanted to be active in. recruitment to lita is vital, but there is also work to be done to make that recruitment even easier. i am fortunate in following up the great work of my predecessors, many of whom i have had the pleasure of serving with on the lita board since 2005. they have set the bar for me and make the coming year as challenging as anything i have done in my career. i also owe a lot to the membership who stepped forward to volunteer for committees, liaison appointments, and other volunteer opportunities. i also think it is important for lita members to know just how much the board relies on the faithful and diligent services of the lita staff. at my vice presidential town meeting, i talked about marketing and communication in terms of list (who), method (how), and message (what and why). not only was this a good way to do some navel gazing on what it means to be a member of lita, it laid some groundwork for the year ahead. i think it is an inescapable conclusion that the lita board needs to take another look at strategic planning (which expires this year). the approach i am going to recommend, however, is not one that tries to connote the collective wisdom of a dozen lita leaders. instead, i hope we can define a methodology by which lita committees, interest groups, and the membership at large are empowered to both do the work of the division and benefit from it. one of the quirky things that some people know about me is that i actually love bureaucracy. i was pleased to read in the lita bylaws that it is actually my duty as president to “see that the bylaws are observed by the officers and members of the board of directors.” i will tell you all that i also interpret this to mean that the president and the board will not act in ways that are not prescribed. the strength of a volunteer organization comes from its volunteers. the best legacy a lita president can provide is to give committees, interest groups, and the membership a free reign to create its future. as for the board, its main objective is to oversee the affairs of the division during the period between meetings. frankly, we’re not so great at this, and it is one of the biggest challenges for any volunteer organization. it is also one of my predecessor’s initiatives that i plan to follow through on with his help as immediate past president. participation and involvement—and the ability to follow the work and strategies of the division—should be easier for all of us. so, if i were to put my platform in a nutshell it would be this—recruitment, communication, strategic planning, and volunteer empowerment. i left out fun, because it goes without saying that most of us are part of lita because it’s a fun division with great members. this is a lot to get done in one year, but because it will be fun, i’m looking forward to it. editorial board thoughts: ital 2.0 | boze 57 litablog.org/) i see that there are occasional posts, but there are rarely comments and little in the way of real discussion. it seems to be oriented toward announcements, so perhaps it’s not a good comparison with italica. 
some ala groups are using wordpress for their blogs, a few with user comments, but mostly without much apparent traffic (for example, the ll&m online blog, http://www .lama.ala.orgllandm). in general, blogs don’t seem to be a satisfactory platform for discussion. wikis aren’t particularly useful in this regard, either, so i think that rules out the lita wiki (http://wikis.ala.org/lita/index.php/ main_page). i’ve looked at ala connect (http://connect. ala.org/), which has a variety of web 2.0 features, so it might be a good home for italica. we could also use a mailing list, either one that already exists, such as lita-l, or a new one. the one advantage e-mail has is that it is delivered to the reader, so one doesn’t have to remember to visit a website. we already have rss feeds for the italica blog, so maybe that works well enough as a notification for those who subscribe to them. i’ve also wondered whether a discussion forum (aka message board) would be useful. i frequent a few software-related forums, and i find them conducive to discussion. they have a degree of flexibility lacking in other platforms. it’s easy for any participant to start up a new topic rather than limiting discussion only to topics posted by the owner, as is usually the case with blogs. frankly i’d like to encourage discussion on topics beyond only the articles published in ital. for example, we used to have columns devoted to book and software reviews. even though they were discontinued, those could still be interesting topics for discussion between ital readers. in writing this, my hope is to get feedback from you, the reader, about what ital and italica could be doing for you. how can we use ala connect in ways that would be useful? could we use other platforms to do things beyond simply discussing articles that appear in the print edition? what social web technologies do you use, and how could we apply them to ital? after you read this, i hope you’ll join us at italica for a discussion. let us know what you think. editor’s note: andy serves on the ital editorial board and as the ital website manager. he earns our gratitude every quarter with his timely and professional work to post the new issue online. t he title of this recurring column is “editorial board thoughts,” so as i sit here in the middle of february, what am i thinking about? as i trudge to work each day through the snow and ice, i think about what a nuisance it is to have a broken foot (i broke the fifth metatarsal of my left foot at the midwinter meeting in boston—not recommended) but most recently i’ve been thinking about ital. the march issue is due to be mailed in a couple of weeks, and i got the digital files a week or so ago. in a few days i’ll have to start separating the pdf into individual articles, and then i’ll start up my web editor to turn the rtf files for each article into nicely formatted html. all of this gets fed into ala’s content management system, where you can view it online by pointing your web browser to http://www.lita.org/ala/mgrps/divs/lita/ ital/italinformation.cfm. in case you didn’t realize it, the full text of each issue of ital is there, going back to early 2004. selected full-text articles are available from earlier issues going back to 2001. the site is in need of a face lift, but we expect to work on that in the near future. starting with the september 2008 issue of ital we launched italica, the ital blog at http://ital-ica .blogspot.com/, as a pilot. 
italica was conceived as a forum for readers, authors, and editors of ital to discuss each issue. for a year and a half we’ve been open for reader feedback, and our authors have been posting to the blog and responding to reader comments. what’s your opinion of italica? is it useful? what could we be doing to enhance its usefulness? in reality we haven’t had a great deal of communication via the blog. we are looking at moving italica from blogger to a platform more integrated with existing ala or lita services. is a blog format the best way to encourage discussion? when i look at the lita blog (http:// andy boze (boze.1@nd.edu) is head, desktop computing and network services, university of notre dame hesburgh libraries, notre dame, indiana. andy bozeeditorial board thoughts: ital 2.0 editorial board thoughts | dehmlow 103 i n the age of the internet, google, and the nearly crushing proliferation of metadata, libraries have been struggling with how to maintain their relevance and survive in the face of shrinking budgets and misinformed questions about whether libraries still provide value. in case there was ever any question, the answer is “of course we do.” still, an evolving environment and changing context has motivated us to rethink what we do and how we do it. our response to the shifting environment has been to envision how libraries can provide the best value to our patrons despite an information ecosystem that duplicates (and to some extent replaces) services that have been a core part of our profession for ages. at the same time, we still have to deal with procedures for managing resources we acquire and license, and many of the systems and processes that have served us so well for so many years are not suitable for today’s environment. many have talked about the need to invest in the distinctive services we provide and unique collections we have (e.g., preserving the world’s knowledge and digitizing our unique holdings) as a means to add value to libraries. there are many other ways libraries create value for our users, and one of the best is for us to respond to needs that are specific to our organizations and users— specialized services, focused collections, contextualized discovery, all integrated into environments in which our patrons work, such as course management systems, google, etc. the library market has responded to many of our needs with ermss and next-generation resource management and discovery solutions. all of this is a good start, but like any solution that is designed to work for the greatest common denominator, they often leave a “desired functionality gap” because no one system can do everything for everyone, no development today can address all of the needs of tomorrow, and very rarely do all of the disparate systems integrate with each other. so where does that leave libraries? well, every problem is an opportunity, and there are two important areas that libraries can invest in to ensure that they progress at the same pace as technology, their users, and the market: open systems that have application programmer interfaces (apis), and programmers. apis are a means to access the data and functionality of our vended or opensource systems using a program as opposed to the default interface. apis often take the shape of xml travelling in the same way that webpages do, accessed via a url, but they also can be as complex as writing code in the same language as the base system, for example software development kits (sdks). 
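as a small illustration of the url-and-xml style of api just described, the sketch below requests a record from a hypothetical catalog endpoint and extracts a few fields. the url and element names are invented for the example; any real system's api documentation defines its own.

```python
import urllib.request
import xml.etree.ElementTree as ET

# hypothetical endpoint; a real catalog or discovery api defines its own url and schema
API_URL = "https://catalog.example.edu/api/record?id=12345&format=xml"

def fetch_record(url):
    """fetch an xml record over http and return a few fields as a dict."""
    with urllib.request.urlopen(url, timeout=10) as response:
        tree = ET.parse(response)
    root = tree.getroot()
    # element names below are illustrative, not from any actual vendor schema
    return {
        "title": root.findtext("title"),
        "author": root.findtext("author"),
        "call_number": root.findtext("callNumber"),
    }

if __name__ == "__main__":
    print(fetch_record(API_URL))
```

the point is not the particular fields but that a few lines of script can repurpose system data in ways the default interface never anticipated.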
the key here is that apis provide a way to work with the data in our systems, be they backend inventory or front-end discovery interfaces, in ways that weren’t conceived by the software developers. this flexibility enables organizations to respond more rapidly to changing needs. no matter which side of the opensource/vended solution fence you sit on, openness needs to be a fundamental part of any decision process for any new system (or information service) to avoid being stifled by vendor or open-source developer priorities that don’t necessarily reflect your own. the second opportunity is perhaps the more difficult one given the state of library budgets and that the resources that are needed to hire programmers are higher than most other library staff. but having local programming skills easily accessible will be vital to our ability to address our users’ specific needs and change our internal processes as we need to. i think it is good to have at least one technical person who comes from an industry outside of libraries. they bring knowledge that we don’t necessarily have and fresh perspectives on how we do things. if it is not possible to hire a programmer, i would encourage technology managers to look closely at their existing staff, locate those in the organization who are able to think outside of the box, and provide some time and space for them to grow their skill set. i am not so obtuse as to suggest that anyone can be programmer—like any skill it requires a general aptitude and a fundamental interest—but i am a self-taught developer who had a technical aptitude and an strong desire to learn new things, and i suspect that there are many underutilized staff in libraries that with a little encouragement, mentoring, and some new technical knowledge, could easily work with apis and sdks, thereby opening the door for organizations to be nimble and responsive to both internal and external needs. i recognize that with heavy demands it can be difficult to give up some of these highly valued people’s time, but the payoff is overwhelmingly worth it. these days i can only chuckle at the doomsday predictions about libraries and the death of our services— google’s dominance in the search arena has never really made me worried that libraries would become irrelevant. we have too much that google does not, specifically licensed content that our users desire, and we have relationships with our users that google will be incapable of having. i have confidence that what we have to offer will be valuable to our users for some time to come. however, it will take a willingness to evolve with our environment and to invest in skill sets that come at a premium even when it is difficult to do so. mark dehmlow editorial board thoughts: adding value in the internet age— libraries, openness, and programmers mark dehmlow (mdehmlow@nd.edu) is digital initiatives librarian, university of notre dame, notre dame, indiana. computer based acquisitions system at texas a&i university .. ned c. morris: texas a&i university, kingsville, texas. 1 in september, 1966, a system was initiated at the university which provides for the use of automatically produced multiple orders and for the use of change cards to update order information on previously placed orders already on disk storage. the system is geared to an ibm 1620 central processing unit ( 40k) which has processed a total of 10, 222 order transactions the first year. 
it is believed that the system will lend itself to further development within its existing framework and that it will be capable of handling future work loads. in 1925, the library at texas a&i university (first known as south texas state teachers college and later as texas college of arts and industries) had an opening day collection of some 2,500 volumes. by the end of august, 1965, the library's collection had grown to 142,362 volumes, including 3,597 volumes purchased that year. the book budget doubled in september of 1965, and the acquisitions system was severely taxed as the library added by purchase a total of 6,562 volumes. after one full year under the mechanized system discussed below, a total of 9,062 volumes had been added by purchase. counting gifts, transfers, and cancellations, the computer actually handled 10,222 order transactions the first year. the computer-based acquisitions system now in operation was initiated in september of 1966, eleven months after the decision was made to mechanize the process. the library had already experienced successes in computerizing the circulation and serial systems and, because a rapidly expanding book budget had caused the old traditional type of acquisitions system to become unwieldy and seemingly obsolete, it seemed inevitable that the installation of a computerized acquisitions system would follow. furthermore, it was agreed that acquisitions could make use of the computer at no additional cost, since the library was already paying its share of the machine rental costs for circulation and serials. following the decision to go ahead with the project of computerizing the acquisitions system, a preliminary survey was made of the literature on the subject, and a plan for approaching the task was conceived. briefly, the plan hinged upon the idea of an automatically produced multiple order form similar to that proposed by ibm (1). it also provided for use of the change card, reported by becker to be "a unique and very important part of the penn state system" (2). it further provided for the automatic production of a weekly books on order list or "processing information list" similar to that reported by schultheiss to be in use at the university of illinois libraries (3). the plan was written in the form of a proposal which was then sent with an accompanying flow chart to the director of the campus computer center for consideration. the basic proposal for the new system was accepted, and work toward implementation of the system was begun immediately. as was expected, the plan and flow chart had to be altered in some areas as the project progressed. as a first step, the book order request form was redesigned to serve as a work slip in the verification routine, as a source document for keypunching, and, in the end, as notification to the requester that a requested item had been cataloged. the redesigned request card consisted of a single record form printed on one side of an ibm tab card (figure 1). the only objection to usage of this form appeared to be that the requester would have no record of his request unless he produced one for himself. however, this form was adopted because it was judged less expensive. fig. 1. book request form.
fig. 7. example of computer-produced financial statement. (figure 8) for budgetary purposes. the computer also gives credit to the appropriate fund for items cancelled. this accounting is accomplished through the use of one of the change cards mentioned above. the "books on order" list mentioned above is necessarily cumulative to include all new orders processed, since all new requests are checked against this list for possible duplications. this list always provides current information on the status of an order, enabling the user to find out to what stage in the total process a given order has progressed. non-book materials are differentiated from book materials through use of form codes (figure 9) which appear on the "books on order" print-out. fig. 8. fund codes used in the acquisitions system: aed (agricultural education), ag (agriculture), art (art), bio (biology), ba (business administration), chm (chemistry), ed (education), en (engineering), eng (english), geo (geography), gov (government), hst (history), hpe (health and physical education), he (home economics), ia (industrial arts), jrn (journalism), mth (mathematics), mdl (modern language), mus (music), phy (physics), psy (psychology), soc (sociology), spe (speech), gen (general), gft (gifts and transfers). fig. 9. form codes used for non-book materials: microforms (m), films (c), filmstrips (s), records (d), tapes (t), maps (a), manuscripts (u), serials (p). use of change cards if a dealer reports an item unavailable, cancellation data is noted on the first change card, which then is sent to the computer center. here cancellation data is keypunched into the change card and the change card is fed into the computer to remove all information pertaining to the order from disk storage and consequently from the "books on order" list. the second change card is then discarded. if a dealer supplies an item, actual cost and date received are indicated on the first change card, which is then returned to the computer center. here cost and date received are keypunched into the change card and the change card is processed through the computer to record receipt of the item and to adjust the corresponding account if necessary.
the second change card then accompanies the newly acquired item through the various stages of cataloging. at the appropriate time during the cataloging routine, the call number is written on the second change card. when the catalog cards are ready to be filed in the public catalog, the second change card is returned to the computer center where the call number is keypunched into it. from here this change card, usually in a group of several hundred, is fed into the computer and a list of current acquisitions (figure 10) is printed out. the second change card then is coded so as to make possible the deletion from disk storage of all information pertaining to an order which has appeared on an acquisitions list for as long as two months after the item has been cataloged. this allows the catalog department ample time to file cards in the public catalog, thus reducing the possibility of unintentional duplication. once deleted, the item no longer appears on the "books on order" list. use of five-part order form part one (the original) of the order is sent to the dealer. part two is sent to the catalog department for use as an order for cards from the library of congress. part three differs from part two in color only and serves primarily as a record of the library of congress card order. part four, with part five and corresponding change cards, is filed alphabetically first by dealer and then by main entry. part four serves as a report form on which to record dealer reports and other messages pertaining to the status of the item on order. in the event that an order is cancelled, part four is sent to the catalog department as a signal that library of congress cards may also be cancelled. part four is discarded if a claim or cancel procedure is negated by receipt of an ordered item. part five, with part four and corresponding change cards, is filed in the same manner as part four above. when an item is received and paid for, cost and date received are recorded on this copy of the order. part five, designated as the control copy, then is filed by order number in the library's "control" file for possible use in the identification of items already approved for payment, which may no longer appear on the "books on order" list. it further provides official evidence that purchase was duly authorized. fig. 10. example of computer-produced current acquisitions list. gifts and transfers a gift item is processed in the same manner as a purchase except that part one of the order is discarded. an estimate of the value of each title is submitted so that the total value of gifts can be produced automatically for a given period. an item transferred from the bookstore or any other department of the institution is processed in the same manner as a gift, except that the actual cost of the item is used rather than an estimate. standing and continuation orders a standing or continuation order for a series is keypunched with coded information which causes it to appear indefinitely on the "books on order" list. the two-fold purpose of this is to eliminate the possibility of unintentional duplication and to serve as evidence that the order was authorized. an item actually received on a standing or continuation order basis is processed as a confirmation order and is assigned an order number different from the one assigned the original order. in this way, the item received will appear on the "books on order" list next to the original entry only as long as it takes to catalog the item. clearance of invoices and final routines upon receipt of shipment and corresponding invoice, an item is accepted (if as ordered) and the date of acceptance and cost (as per invoice) are noted on the first change card. this change card is then returned (usually in a group of several hundred) to the computer center, where cost and receipt date are keypunched into it. this information is fed into the computer and accurate accounting results. the next printout of the "books on order" list will indicate that the item was received on the date noted. part four of the order is discarded. part five of the order, bearing cost and date received, is filed by order number in the "control" file. the second change card and the original request card accompany the book to the catalog department. book pockets are pasted in the books at this point to accommodate the second change card and, later, the ibm circulation card used by the library's circulation department. at the end of the cataloging routine, the original request card is sent to the requester as notification that the item is ready for use. discussion no attempt has been made to compare costs of the new system to the old. on the surface, however, there appear to be considerable savings in time and clerical personnel. automatic accounting alone results in a net gain of approximately twenty hours per week in clerical time which can be applied to other necessary manual tasks.
manual typing of orders has been completely eliminated with the use of the computer produced order, resulting in further savings in clerical time. limitations of the new system are about the same as those encountered by other mechanized systems, the limiting factors of space in input and electronic storage being most obvious. the present disk storage equipment is capable of storing data on approximately thirteen thousand book orders and this capacity could be doubled with the addition of another 12 journal of library automation vol. 1/ 1 march, 1968 disk unit. the problem · of disk storage space is not critical at present because removal of order information from storage at two-month intervals after the cataloging process creates additional space for new orders. although the new system has definite advantages, perfection was never expected nor does it exist. the human error factor in the book verification and keypunching processes shows up now and then. experience bears out the fact that output is only as perfect as input. nevertheless, there has been a noticeable gain in accuracy with the installation of the new system, mainly because the more exacting method of procedure helps in detecting an error before it is beyond retraction. even keypunching accuracy has been much greater than expected. conclusion the new acquisitions system at texas a&i university does the job that it was designed to do. it has resulted in faster clearance of orders, better control over unintentional duplication of orders, and automatic accounting. it is believed that the system will lend itself to further development within its existing framework and that it will be capable of handling future work loads. acknowledgements much of the credit for the success of the program goes to dr. j. r. guinn, professor and chairman of the department of electrical engineering. his time in reviewing the original proposal and his subsequent efforts toward the implementation of the project resulted in a workable, practical system. credit goes also to mr. patrick barkey, former librarian at texas a&i university (then known as texas college of arts and industries) for the encouragement he gave to the writer and for the support he gave to the project. appreciation is extended also to mr. r. c. janeway, librarian at texas technological college, for submitting some worthy ideas on design of order forms and on acquisitions procedures in general. references 1. international business machines: "mechanized library procedures," ibm data processing application manual (white plains: ibm, n. d.), p. 11. 2. becker, joseph: "system analysis-prelude to library data processing," ala bulletin, 59 (march 19~), 296. 3. schultheiss, louis a.: "data processing aids in acquisitions work," library resources and technical services, 9 (winter 1965), 68. 4. cox, carl c.: "mechanized acquisitions procedures at the university of maryland," college and research libraries, 26 (may 1965), 232. editorial board thoughts: tools of the trade sharon farnel information technology and libraries | march 2012 5 as i was trying to settle on a possible topic for this, my second “editorial board thoughts” piece, i was struggling to find something that i’d like to talk about and that ital readers would (i hope) find interesting. i had my “eureka!” moment one day as i was coming out of a meeting, thinking about a conversation that had taken place around tools. 
now, by tools, i’m referring not to hardware, but to those programs and applications that we can and do use to make our work easier. the meeting was of our institutional repository team, and the tools discussion specifically focused on data cleanup and normalization, citation integration, and the like. i had just recently returned from a short conference where i had heard mentioned or seen demonstrated a few neat applications that i thought had potential. a colleague also had just returned from a different conference, excited by some of things that he’d learned about. and all of the team members had, in recent days, seen various e-mail messages about new tools and applications that might be useful in our environment. we mentioned and discussed briefly some of the tools that we planned to test. one of the tools had already been test driven by a couple of us, and looked promising; another seemed like it might solve several problems, and so was bumped up the testing priority list. during the course of the conversation, it became clear that each of us had a laundry list of tools that we wanted to explore at greater depth. and it also became clear that, as is so often the case, the challenge was finding the time to do so. as we were talking, my head was full of images of an assembly line, widgets sliding by so quickly that you could hardly keep up. i started thinking how you could stand there forever, overwhelmed by the variety and number of things flying by at what seemed like warp speed. alternatively, if you ever wanted to get anywhere, do anything, or be a part of it all, you just had to roll up your sleeves and grab something. the meeting drew to a close, and we all left with a sense that we needed to find a way of tackling the tools-testing process, of sharing what we learn and what we know, all in the hope of finding a set of tools that we, as a team, could become skilled with. i personally felt a little disappointed at not having managed to get around to all of the tools i’d earmarked for further investigation. but i also felt invigorated at the thought of being able to share the load of testing and researching. if we could coordinate ourselves, we might be able to test drive even more tools, increasing the sharon farnel (sharon.farnel@ualberta.ca) is metadata and cataloguing librarian, university of alberta, edmonton, alberta, canada. mailto:sharon.farnel@ualberta.ca editorial board thoughts | farnel 6 likelihood we’d stumble on the few that would be just right! we’d taken furtive steps towards this in the past, but nothing coordinated enough to make it really stick and be effective. i started wondering how other individuals and institutions manage not only to keep up with all of the new and potentially relevant tools that appear at an ever-increasing pace, but more so how they manage to determine which they will become expert at and use going forward. (although i was excited at what we were thinking of doing, i was quite sure that others were likely far ahead of us in this regard!) it made me realize that at some point i—and we—need to stop being bystanders to the assembly line, watching the endless parade of tools pass us by. we need to simply grab on to a tool and take it for a spin. if it works for what we need, we stick with it. if it doesn’t, we put it back on the line, and grab a different one. but at some point we have to take a chance and give something a shot. 
we’ve decided on a few methods we’ll try for taking full advantage of the tool-rich environment in which libraries exist today. our metadata team has set up a “test bench,” a workstation that we can all use and share for trying new tools. a colleague is going to organize monthly brown-bag talks at which team members can demonstrate tools that they’ve been working with and that they think have potential uses in our work. and we’re also thinking of starting an informal, and public, blog, where we can post, among other things, about new tools we’ve tried or are trying, what we’re finding works and how, and what doesn’t and why. we hope these and other initiatives will help us all stay abreast or even slightly ahead of new developments, be flexible in incorporating new tools into our workflows when it makes the most sense, and in building skills and expertise that benefit us and that can be shared with others. so, i ask you, our ital readers, how do you manage the assembly line of tools? how do you gather information on them, and when do you decide to take one off and give it a whirl? how do you decide when something is worth keeping, or when something isn’t quite the right fit and gets placed back on the line? why not let us know by posting on the italica blog? or, even better, why not write about your experience and submit it to ital? we’re always on the lookout for interesting and instructional stories on the tools of our trade! http://ital-ica.blogspot.com/ autocomplete as a research tool: a study on providing search suggestions david ward, jim hahn, and kirsten feist information technology and libraries | december 2012 6 abstract as the library website and its online searching tools become the primary “branch” many users visit for their research, methods for providing automated, context-sensitive research assistance need to be developed to guide unmediated searching toward the most relevant results. this study examines one such method, the use of autocompletion in search interfaces, by conducting usability tests on its use in typical academic research scenarios. the study reports notable findings on user preference for autocomplete features and suggests best practices for their implementation. introduction autocompletion, a searching feature that offers suggestions for search terms as a user types text in a search box (see figure 1), has become ubiquitous on both larger search engines as well as smaller, individual sites. debuting as the “google suggest” feature in 20041, autocomplete has made inroads into the library realm through inclusion in vendor search interfaces, including the most recent proquest interface and in ebsco products. as this feature expands its presence in the library realm, it is important to understand how patrons include it in their workflow and the implications for library site design as well as for reference, instruction, and other library services. an analysis of search logs from our library federated searching tool reveals both common errors in how search queries are entered, as well as patterns in the use of library search tools. for example, spelling suggestions are offered for more than 29 percent of all searches, and more than half (51 percent) of all searches appear to be for known items.2 additionally, punctuation such as commas and a variety of correct and incorrect uses of boolean operators are prevalent. 
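the log analysis behind these figures can be approximated with a short script. the sketch below uses python and a toy query list; the real easy search logs have their own format, and the spelling-suggestion and known-item measurements involve more than this simple counting of boolean operators and stray punctuation.

```python
import re

# toy queries standing in for lines from a search log; the real log format differs
queries = [
    "global warming AND polar bears",
    "catcher in the rye salinger",
    "nursing OR nuring ethics",
    "economics, labor markets",
]

BOOLEAN = re.compile(r"\b(AND|OR|NOT)\b")   # operators as users often type them
PUNCT = re.compile(r"[,;:]")

boolean_hits = sum(1 for q in queries if BOOLEAN.search(q))
punct_hits = sum(1 for q in queries if PUNCT.search(q))

print(f"queries with boolean operators: {boolean_hits}/{len(queries)}")
print(f"queries with commas or other punctuation: {punct_hits}/{len(queries)}")
```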
these patterns suggest that providing some form of guidance in keyword selection at the point of search-term entry could improve the accuracy of composing searches and subsequently the relevance of search results. this study investigates student use of an autocompletion implementation on the initial search entry box for a library’s primary federated searching feature. through usability studies, the authors analyzed how and when students use autocompletion as part of typical library research, asked the students to assess the value and role of autocompletion in the research process, and noted any drawbacks of implementing the feature. additionally, the study sought to analyze how implementing autocompletion on the front end of a search affected providing search suggestions on the back end (search result pages). david ward (dh-ward@illinois.edu) is reference services librarian and jim hahn (jimhahn@illinois.edu) is orientation services and environments librarian, undergraduate library, university of illinois at urbana-champaign; kirsten feist (kmfeist@uh.edu) is library instruction fellow, m.d. anderson library, university of houston.
figure 1. autocomplete implementation
literature review autocomplete as a plug-in has become ubiquitous on site searches large and small. research on autocomplete includes a variety of technical terms that refer to systems using this architecture. examples include real-time query expansion (rtqe), interactive query expansion, search-as-you-type (sayt), query completion, type-ahead search, auto-suggest, and suggestive searching/search suggestions. the principal research concerns for autocomplete include issues related to both back-end architecture and assessments of user satisfaction and systems for specific implementations. nandi and jagadish present a detailed system architecture model for their implementation of autocomplete, which highlights many of the concerns and desirable features of constructing an index that the autocomplete will query against.3 they note in particular that the quality of suggestions presented to the user must be high to compensate for the user interface distraction of having suggestions appear as a user types. this concern is echoed by jung et al. in their analysis of how the results offered by their autocomplete implementation met user expectations.4 their findings emphasize configuring systems to display only keywords that bring about successful searches, noting “precision [of suggested terms] is closely related with satisfaction.” an additional analysis of their implementation also noted that suggesting search facets (or “entity types”) is a way to enhance autocomplete implementations and aid users in selecting suitable keywords for their search.5 wu also suggests using facets to help group suggestions by type, which improves comprehension of a list of possible keyword combinations.6 in defining important design characteristics for autocomplete implementations, wu advocates building in a tolerance for misplaced keywords as a critical component. chaudhuri and kaushik examine possible algorithms to use in building this type of tolerance into search systems.
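to make the error-tolerance and facet-grouping ideas from this literature concrete, here is a minimal sketch. it is written in python rather than the javascript used in the study’s own interface, the hard-coded index is a toy stand-in for a real suggestion store, and the single-pass edit-distance prefix match only gestures at the more sophisticated algorithms chaudhuri and kaushik describe.

# an illustrative sketch (not code from the cited systems) of two ideas from
# the literature above: tolerating small typing errors and grouping
# suggestions by facet ("entity type"). the index is a hard-coded toy;
# a real implementation would query a prepared suggestion store.
from collections import defaultdict

SUGGESTIONS = [
    ("journal of chromatography", "journal"),
    ("journal of chemical ecology", "journal"),
    ("battleship potemkin", "film"),
    ("epic of gilgamesh", "book"),
    ("chromatography methods", "topic"),
]


def edit_distance(a, b):
    """plain dynamic-programming levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(typed, max_errors=1):
    """return suggestions whose beginning is within max_errors edits of the
    text typed so far, grouped by facet."""
    typed = typed.lower().strip()
    grouped = defaultdict(list)
    for term, facet in SUGGESTIONS:
        prefix = term[: len(typed)]
        if edit_distance(typed, prefix) <= max_errors:
            grouped[facet].append(term)
    return dict(grouped)


if __name__ == "__main__":
    # "chromotog" is one substitution away from "chromatog",
    # so the misspelled journal title still surfaces a suggestion.
    print(suggest("journal of chromotog"))

grouping the returned terms by facet mirrors the recommendation, noted above, that suggesting entity types helps users pick suitable keywords.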
misplaced keywords include typing terms in the wrong field (e.g., an author name in a title field), as well as spelling and word order errors.7 systems that are tolerant in this manner “should enumerate all the possible interpretations and then sort them according to their possibilities,” a specification wu refers to as “interpret-as-you-type.”8 additionally, both wu and nandi and jagadish specify fast response time (or synchronization speed) as a key usability feature in autocomplete interfaces, with nandi and jagadish indicating 100ms as a maximum.9,10 speed also is a concern in mobile applications, which is part of the reason paek et al. recommend autocomplete as part of mobile search interfaces, in which reducing keystrokes is a key usability feature.11 on the usability end, white and marchionini12 assess best practices for implementation of searchterm-suggestion systems and users’ perceptions of the quality of suggestions and search results retrieved. they find that offering keyword suggestions before the first set of results has been displayed generated more use of the suggestions than displaying them as part of a results page, even though the same terms were displayed in both cases. providing suggestions at this initial stage also led to better-quality initial queries, particularly in cases where users may have little knowledge of the topic for which they are searching. the researchers also warn that, while presenting “query expansion terms before searchers have seen any search results has the potential to speed up their searching . . . it can also lead them down incorrect search paths.”13 method usability study we conducted two rounds of usability testing on a version of university of illinois at urbanachampaign’s undergraduate library website that contained a search box for the library’s federated/broadcast search tool with autocomplete built in. the testing followed nielsen’s guidelines, using a minimum of five students for each round, with iterative changes to the interface made between rounds based on feedback from the first group.14 we conducted the initial round in summer 2011 with five library undergraduate student workers. the second round was conducted in september 2011 and included eight current undergraduate students with no affiliation to the library. by design, this method does not allow us to state definitive trends for all autocomplete implementations. it is not a statistically significant method by quantitative standards—rather, it gives us a rich set of qualitative data about the particular implementation (easy search) and specific interface (undergrad library homepage) being studied. the study’s questions were approved by the campus institutional review board (irb), and each participant signed an irb waiver before participating. students for the september round were recruited via advertisements on the website and flyers in the library. gift certificates to a local coffee shop provided the incentive for the study. information technology and libraries | december 2012 9 the procedure for each interview focused on two steps (see appendix). first, each participant was asked to use the search tool to perform a series of common research tasks, including three queries for known item searches (locating a specific book, journal, and movie), and two searches that asked the student to recall and describe a current or previous semester’s subject-based search, then use the search interface to find materials on that topic. 
participants were asked to follow a speak-aloud protocol, dictating the decision-making process they went through as they conducted their search, including why they made each choice along the way. researchers observed and took notes, including transcribing user comments and noting mouse movements, clicks, and other choices made during the searches. because part of the hypothesis of the study was that the autocomplete feature would be used as an aid for spelling search queries correctly, titles with possibly challenging spelling were chosen for the known-item searches. participants were not told about or instructed in the use of autocomplete; rather, it was left to each of them to discover it and individually decide whether to use it during each of the searches they conducted as a part of the study. in the second part of the interview, researchers asked students questions about their use (or lack thereof) of the autocomplete feature during the initial set of task-based questions. this set of questions focused on identifying when students felt the autocomplete feature was helpful as part of the search process, why they used it when they did, and why they did not use it in other cases. students also were asked more general questions about ways to improve the implementation of the feature. in the second round of testing (with students from the general campus populace), an additional set of questions was asked to gather student demographic information and to have the participants assess the quality of the choices the autocomplete feature presented them with. these questions were based in part on the work of white and marchionini, who had study participants conduct a similar quality analysis.15 autocomplete implementation the autocomplete feature was written in javascript and based on the jquery autocomplete plugin (http://code.google.com/p/jquery-autocomplete/). autocomplete plugins generally pull results either from a set of previous searches on a site or from a set of known products and pages within a site. for the study, the initial dataset used was a list of thousands of previous searches using the library’s easy search federated search tool. however, this data proved to be extremely messy and slow to search. in particular, a high number of problematic searches were in the data, including entire citations pasted in, misspelled words, and long natural-language strings. constructing an algorithm to clean up and make sense of these difficult queries would have required too much time and overhead, so we investigated other sources. researchers looked at autocomplete apis for both bing (http://api.bing.com/osjson.aspx?query=test) and google (the suggest toolbar api: http://google.com/complete/search?output=toolbar&q=test). both worked well and produced similar relevant results for the test searches. significantly, the search algorithms behind each of these apis were able to process the search query into far more meaningful and relevant results than what was achieved through the test implementation using local data. these algorithms also included correcting misspelled words entered by users by presenting correctly spelled results from the dropdown list. we ultimately chose the google api on the basis of its xml output. findings the study’s findings were consistent across both rounds of usability testing.
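for readers who want to see roughly what the chosen approach involves, the sketch below queries the google toolbar suggest endpoint cited above and extracts suggestion strings from its xml response. it is a python illustration, not the authors’ jquery code; the endpoint behaved this way around the time of the study and may since have changed, and the element and attribute names assumed here (a “data” attribute on suggestion elements) are inferences about that xml format rather than a documented contract.

# a rough sketch (not the authors' production code) of querying the google
# "suggest" toolbar endpoint cited above and pulling suggestion strings out
# of its xml response. the endpoint url comes from the article; the element
# and attribute names are assumptions and may have changed since 2012.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def google_toolbar_suggestions(query, limit=10):
    """return up to `limit` suggestion strings for a partial query."""
    url = ("http://google.com/complete/search?output=toolbar&q="
           + urllib.parse.quote(query))
    with urllib.request.urlopen(url, timeout=2) as response:
        xml_bytes = response.read()
    tree = ET.fromstring(xml_bytes)
    suggestions = []
    # walk every element and keep any "data" attribute found, so the sketch
    # tolerates minor differences in the response structure.
    for element in tree.iter():
        text = element.attrib.get("data")
        if text:
            suggestions.append(text)
    return suggestions[:limit]


if __name__ == "__main__":
    # e.g., the misspelled start of "journal of chromatography" used in the study
    print(google_toolbar_suggestions("journal of chormo"))

in the study’s interface, the jquery autocomplete plugin rendered strings like these in the dropdown beneath the search box.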
notable themes include using autocomplete to correct spelling on known-item searches (specific titles, authors, etc.), to build student confidence with an unfamiliar topic, to speed up the search process, to focus broad searches, and to augment search-term vocabulary. the study also details important student perceptions about autocomplete that can guide the implementation process in both library systems and instructional scenarios. these student perceptions include themes of autocomplete’s popularity, desire for local resource suggestions, various cosmetic page changes, and user perception of the value of autocomplete to their peers. spelling “it definitely helps with spelling,” said one student, responding to a prompt of how they would explain the autocomplete feature to friends. correcting search-term spelling is a key way in which students chose to make use of the autocomplete feature. for known-item searches, all eight students in the second round of testing selected suggestions from auto-complete at least two times out of the three searches conducted. of those eight students, four (50 percent) used auto-complete every time (three out of three opportunities), and four (50 percent) used it 67 percent of the time (two out of three opportunities). we found that of this latter group who only selected auto-complete suggestions two out of the three opportunities presented, three of them did in fact refer to the dropdown selections when typing their inquiries, but did not actively select these suggestions from the dropdown all three times. in choosing to use autocomplete for spelling correction, one student noted that autocomplete was helpful “if you have an idea of a word but not how it’s spelled.” it is interesting to note, with regard to clicking on the correct spellings, that students do not always realize they are choosing a different spelling than what they had started typing. an example is the search for journal of chromatography, which some students started spelling as “journal of chormo,” then picked the correct spelling (starting “chroma”) from the list, without apparently realizing it was different. this is an important theme: if a student does not have an accurate spelling from which to begin, the search might fail, or the student will assume the library does not have any information on the chosen topic. this is particularly true in many current library catalog interfaces, which do not provide spelling suggestions on their search result pages. locating known items information technology and libraries | december 2012 11 another significant use of the autocomplete feature was in cases where students were looking for a specific item but had only a partial citation. in one case, a student used autocomplete to find a specific course text by typing in the general topic (e.g., “africa”) and then an author’s name that the course instructor had recommended. the google implementation did an excellent job of combining these pieces of information into a list of actual book titles from which to choose. this finding also echoes those of white and marchioni, who note that autocomplete “improved the quality of initial queries for both known item and exploratory tasks.”16 the study also found this to be an important finding because overall, students are looking for valid starting points in their research (see “confidence” below), and autocomplete was found to be one way to support finding instructor-approved items in the library. 
this echoes findings from project information literacy, which shows students typically turn to instructor-sanctioned materials first when beginning research.17 this use case typically arises when an instructor suggests an author or seminal text on a research topic to a student, often with an incomplete or inaccurate title. one participant also mentioned that they wanted the autocomplete feature to suggest primary or respected authors based on the topic they entered. confidence “[autocomplete is] an assurance that it [the research topic] is out there . . . you’re not the first person to look for it.”—student participant there were multiple themes related to the concept of user confidence discovered in the study. first, some participants noted that when they see the suggestions provided by autocomplete it verifies that what they are searching is “real”—validating their research idea and giving them the sense that others have been successful previously searching for their topic. when students were asked the source of the autocomplete suggestions, most thought that results were generated based on previous user searches. their response to this particular question highlighted the notion of “popularity ranking,” in that many were confident that the suggestions presented were a result of popular local queries. in addition, one participant thought that results generated were based on synonyms of the word they typed, while another believed that the results generated were included only if the text typed matched descriptions of materials or topics currently present in the library’s databases. some students did indicate the similarity of search results to google suggestions, but they did not make an exact connection between the two. this assumption that the terms are vetted seems to lend authority to the suggestions themselves and parallels the research of jung et al., who investigated satisfaction based on the connection between user expectations on selecting an autocomplete keyword and results.18 the benefit of autocomplete-provided suggestions in this context was noted even in cases when participants did not explicitly select items from the autocomplete list. students’ confidence in their own knowledge of a topic also factored into when they used autocomplete. participants reported that if they knew a topic well (particularly if the topic chosen was one that they had previously completed a paper on), it was faster to just type it in without choosing a suggestion from the autocomplete suggestion list. one participant also noted that common topics (e.g., “someone’s name and biography”) would also be cases in which they would not use the suggestions. after the first round of usability testing, a question was added to the post-test assessment asking students to rate their confidence as a researcher on a five-point scale. all participants in the second round rated themselves as a four or five out of five. while this confirms findings on student confidence from studies like project information literacy, this assessment question ultimately had no correlation to actual use of autocomplete suggestions during the subject-based research phase of the study. rather, confidence in the topic itself seemed to be the defining factor in use. speed the study also showed that speed is a factor in deciding when to use autocomplete functionality.
specifically, autocomplete should be implemented in a way in which it is not perceived as slowing down the search process. this includes having results displayed in a way that is easily ignored if students want to type in an entire search phrase themselves, and having the presentation and selection of search suggestions done in a way that is easy to read and quick to be selected. autocomplete is perceived as a time-saver when clicking on an item will shorten the amount of typing students need to do. however, some students will ignore autocomplete altogether; they do this when they know what they want, and they feel that speed is compromised if they need to stop and look at the suggestions when they already know what they want to search. in the study, different participants would often cite speed as a reason for both selecting and not selecting an item for the same question, particularly with the known-item searches. this finding indicates that a successful implementation should include both a speedy response (as noted above in nandi and jagadish’s research on delivering suggestions within 100ms, paek et al.’s research on reducing keystrokes, and white and marchionini’s finding that providing suggested words was “a real time-saver”),19 as well as an interface which does not force users to select an item to proceed, or obscure the typing of a search query. focusing topics “it helps to complete a thought.” “[autocomplete is] extra brainstorming, but from the computer.”—participant responses the above quotes indicate the use of autocomplete as a tool for query formulation and search-term identification, a function closely related to the association of college and research libraries (acrl) information literacy standard two, which includes competencies for selecting appropriate search keywords and controlled vocabulary related to a topic.20 these quotes also parallel a similar finding from white and marchionini,21 who had a user comment that autocomplete “offered words (paths) to go down that i might not have thought of on my own.” the use of autocomplete for scoping and refining a topic also parallels elements of the reference interview, specifically the open and closed questions typically asked to help a student define what aspects of a topic they are interested in researching. this finding has many exciting implications for how elements and best practices from both classroom instruction and reference methodologies can be injected directly into search interfaces, to aid students who may not consult with a librarian directly during the course of their research. autocomplete was used at a lower rate, and in different ways, for subject searching compared to known-item searching. three out of eight participants (38 percent) from the second round of testing did not use autocomplete at all for subject-based searching (zero of two opportunities). five out of eight participants (62 percent) used autocomplete on one of two search opportunities (50 percent). no participants used autocomplete on both of the search opportunities. the stage of research a student was in helped to indicate where and how autocomplete could be useful in topic formulation and search-term selection for subject searches. participants indicated that they would use autocomplete for narrowing ideas if they were at a later stage in a paper, when they knew more about what they wanted or needed specifics on their topic.
however, early in a paper, some participants indicated they just wanted broad information and did not want to narrow possible results too early. this finding also supports previous research from project information literacy, which describes student desire to learn the “big-picture context” as a key function in the early part of the research process.22 at this topic-focusing stage, some participants told us that the search suggestions reminded them of topics that were discussed in class. further, the study showed that autocomplete suggests aspects of topics to student that they had not previously considered, and one participant indicated that she might change her topic if she saw something interesting from the list of suggestions, particularly something she had not thought of yet. interface implementation though students who opted to utilize the autocomplete feature were generally satisfied with the results generated, some students recommended increasing the number of autocomplete suggestions in the dropdown menu to increase the probability of finding their desired topic or known item or to potentially lead to other related topics to narrow their search. in addition, students recommended increasing the width of the autocomplete text box, as its present proportions are insufficient for displaying longer suggestions without text wrapping. some students also noted that increasing the height of the dropdown menu containing the autocomplete suggestions might help reduce the necessity to scroll through the results and may help to draw user attention to all results for those who elect not to use the scroll bar. beyond the suggested improvements for the functionality of the autocomplete feature, students also noted a few cosmetic changes they would like to see implemented. in particular, students would prefer to see larger text and a better use of fonts and font colors when using autocomplete. one student noted that if different fonts and colors were used in this feature, the results generated might stand out more and better attract users, or better draw users’ attention to the recommended search terms. autocomplete as a research tool | ward, hahn, and feist 14 perceived value to peers most students who participated in the study stated that they would recommend that their fellow classmates utilize the autocomplete feature for two primary purposes: known-item searches and locating alternative options for research topics. one student noted that she would recommend using this feature to search keywords “easily and efficiently,” while another student indicated that the feature helps to link to other related keywords. this finding also revealed that users were not intimidated by the feature and did not see it as a distraction from the search process, an initial researcher concern. conclusion and future directions implementation implications implementing autocomplete functionality that accounts for the observed research tendencies and preferences of users makes for a compelling search experience. participant selection of autocomplete suggestions varied between the types of searches studied. spelling correction was the one universally acknowledged use. for subject-based searching, confidence in the topic searched and the stage of research emerged as indicators of the likelihood of autocomplete suggestions being taken. the use and effectiveness of providing subject suggestions requires further study, however. 
students expect suggestions to produce usable results within a library’s collections, so the source of the suggestions should incorporate known, viable subject taxonomies to maximize benefits and not lead students down false search paths. there is an ongoing need to investigate possible search-term dictionaries outside of google, such as lists of library holdings, journal titles, article titles, and controlled vocabulary from key library databases. the “brainstorming” aspect of autocomplete for subject searching is an intriguing benefit that should be more fully explored and supported. in combination with these findings, participants’ positive responses to some of the assessment questions (including first impressions of autocomplete and willingness to recommend it to friends) indicate that autocomplete is a viable tool to incorporate site-wide into library search interfaces. instruction implications traditional academic library instruction tends to focus on thinking of all possible search terms, synonyms, and alternative phrasing before the onset of actual searching and engagement with research interfaces. this process is later refined in the classroom by examining controlled vocabulary within a set of search results. however, observations from this study (as well as researcher experience with users at the reference desk) indicate that students in real-world situations often skip this step and rely on a more trial-and-error method for choosing search terms, beginning with one concept or phrasing rather than creating a list of options that they try sequentially. the implication for classroom practice is that instruction on search-term formulation should include a review of autocomplete suggestions as well as practical methods for integrating these suggestions into the research process. this is particularly important as vendor databases move toward making autocomplete a default feature. proper instruction in its use can help advance acrl information literacy goals and provide a practical, context-sensitive way to explain how a varied vocabulary is important for achieving relevant results in a research setting.23 reference implications as with classroom instruction, traditional reference practice emphasizes a prescriptive path for research that involves analyzing which aspects of a topic or alternate vocabulary will be most relevant to a search before search-term entry. open and closed questioning techniques encourage users to think about different facets of their topic, such as time period, location, and type of information (e.g., statistics) that might be relevant. an enhanced implementation of autocomplete can incorporate these best practices from the reference interview into the list of suggestions to aid unmediated searching. one way this might be incorporated is through presenting faceted results that change on the basis of user selection of the type and nature of information they are looking for, such as a time period, format, or subject. for broadcast and federated searching interfaces, this could extend into the results users are then presented with, specifically selecting items or databases on the basis of suggestions made during the search-entry phase, rather than presenting users with a multitude of options to make sense of, some of which may be irrelevant to the actual information need. finally, the findings on use of autocomplete also have implications for search-results pages.
many of the common uses (e.g., spelling suggestions and additional search-term suggestion) also should be standard on results pages. this, too, is a common feature of commercial interfaces. bing, for example, includes a related searches feature (on the left of a standard results page), that suggests context-specific search terms based on the query. this feature is also part of their api (http://www.bing.com/developers/s/apibasics.html). providing these reference-without-alibrarian features is essential both in establishing user confidence in library research tools and in developing research skills and an understanding of the information literacy concepts necessary to becoming better researchers. our autocomplete use findings draw attention to user needs and library support across search processes; specifically, autocomplete functionality offers support while forming search queries and can improve the results of user searching. for this reason, we recommend that autocomplete functionality be investigated for implementation across all library interfaces and websites to provide unified support for user searches. the benefits that can be realized from autocomplete can be maximized by consulting with reference and instruction personnel on the benefits noted above and collaboratively devising best practices for integrating autocomplete results into searchstrategy formulation and classroom-teaching workflows. http://www.bing.com/developers/s/apibasics.html autocomplete as a research tool | ward, hahn, and feist 16 references 1. “autocomplete—web search help,” google, support.google.com/websearch/bin/answer.py?hl=en&answer=106230 (accessed february 7, 2012). 2. william mischo, internal use study, unpublished, 2011. 3. arnab nandi and h. v. jagadish, “assisted querying using instant-response interfaces,” in proceedings of the 2007 acm sigmod international conference on management of data (new york: acm, 2007), 1156–58, doi: 10.1145/1247480.1247640. 4. hanmin jung et al., “comparative evaluation of reliabilities on semantic search functions: auto-complete and entity-centric unified search,” in proceedings of the 5th international conference on active media technology (berlin, heidelberg: springer-verlag, 2009), 104–13, doi: 10.1007/978-3-642-04875-3_15. 5. hanmin jung et al., “auto-complete for improving reliability on semantic web service framework,” in proceedings of the symposium on human interface 2009 on human interface and the management of information. information and interaction. part ii: held as part of hci international 2009 (berlin, heidelberg: springer-verlag, 2009), 36–44, doi: 10.1007/978-3-64202559-4_5. 6. hao wu,“search-as-you-type in forms: leveraging the usability and the functionality of search paradigm in relational databases,” vldb 2010, 36th international conference on very large data bases, september 13–17, 2010, singapore, p. 36–41, www.vldb2010.org/proceedings/files/vldb_2010_workshop/phd_workshop_2010/phd%20wor kshop/content/p7.pdf (accessed february 7, 2012). 7. surajit chaudhuri and raghav kaushik, “extending autocompletion to tolerate errors,” in proceedings of the 35th sigmod international conference on management of data (new york,: acm, 2009), 707–18, doi: 10.1145/1559845.1559919,. 8. wu, “search-as-you_type in forms,” 38. 9. wu, “search-as-you-type in forms.” 10. ibid. 11. 
tim paek, bongshin lee, and bo thiesson, “designing phrase builder: a mobile real-time query expansion interface,” in proceedings of the 11th international conference on humancomputer interaction with mobile devices and services (new york: acm, 2009), 7:1–7:10, doi: 10.1145/1613858.1613868. http://support.google.com/websearch/bin/answer.py?hl=en&answer=106230 http://www.vldb2010.org/proceedings/files/vldb_2010_workshop/phd_workshop_2010/phd%20workshop/content/p7.pdf http://www.vldb2010.org/proceedings/files/vldb_2010_workshop/phd_workshop_2010/phd%20workshop/content/p7.pdf information technology and libraries | december 2012 17 12. ryen w. white and gary marchionini, “examining the effectiveness of real-time query expansion,” information processing and management 43, no. 3 (2007): 685–704, doi: 10.1016/j.ipm.2006.06.005. 13. white and marchionini, “examining the effectiveness of real-time query expansion,” 701. 14. jakob nielsen, “why you only need to test with 5 users,” jakob nielsen’s alertbox (blog), march 19, 2000, www.useit.com/alertbox/20000319.html (accessed february 7, 2012). see also walter apai, “interview with web usability guru, jakob nielsen,” webdesigner depot (blog), september 28, 2009, www.webdesignerdepot.com/2009/09/interview-with-web-usability-gurujakob-nielsen/ (accessed february 7, 2012). 15. white and marchionini, “examining the effectiveness of real-time query expansion.” 16. ibid. 17. alison j. head and michael b. eisenberg, “lessons learned: how college students seek information in the digital age,” project information literacy progress report, december 1, 2009, projectinfolit.org/pdfs/pil_fall2009_finalv_yr1_12_2009v2.pdf (accessed february 7, 2012). 18. jung et al., “comparative evaluation of reliabilities on semantic search functions.” 19. jung et al., “comparative evaluation of reliabilities on semantic search functions”; paek, lee, and thiesson, “designing phrase builder”; white and marchionini, “examining the effectiveness of real-time query expansion.” 20. association of college and research libraries (acrl), “information literacy competency standards for higher education,” http://www.ala.org/acrl/standards/informationliteracycompetency (accessed february 7, 2012). 21. white and marchionini, “examining the effectiveness of real-time query expansion.” 22. head and eisenberg, “lessons learned.” 23. association of college and research libraries (acrl), “information literacy competency standards for higher education.” http://www.useit.com/alertbox/20000319.html http://www.webdesignerdepot.com/2009/09/interview-with-web-usability-guru-jakob-nielsen/ http://www.webdesignerdepot.com/2009/09/interview-with-web-usability-guru-jakob-nielsen/ http://projectinfolit.org/pdfs/pil_fall2009_finalv_yr1_12_2009v2.pdf http://www.ala.org/acrl/standards/informationliteracycompetency autocomplete as a research tool | ward, hahn, and feist 18 appendix. questions task-based questions 1. does the library have a copy of “the epic of gilgamesh?” 2. does the library own the movie “battleship potempkin?” 3. does the library own the journal/article “journal of chromatography?” 4. for this part, we would like you to imagine you are doing research for a recent paper, either one you have already completed or one you are currently working on. a. what is this paper about? (what is your research question?) b. what class is it for? c. search for an article on yyy 5. same as 4, but different class/topic, and search for a book on yyy autocomplete-specific questions 1. 
what is your first impression of the autocomplete feature? 2. have you seen this feature before? a. if so where have you used it? 3. why did you/did you not use the suggested words? (words in the dropdown) 4. where do you think the suggestions are coming from? or, how are they being chosen? 5. when would you use this? 6. when would you not use it? 7. how can it be improved? 8. overall, what do you like/not like about this option? 9. would you suggest this feature to a friend? 10. if you were to explain this feature to a friend how might you explain it to them? assessment and demographic questions autocomplete feature 1. [known item] rate the quality/appropriateness of each of the first five autocomplete dropdown suggestions for your search: (5 point scale) 1—poor quality/not appropriate 2—low quality 3—acceptable 4—good quality –5—high quality/very appropriate information technology and libraries | december 2012 19 2. [subject/topic search] rate the quality/appropriateness of each of the first five autocomplete dropdown suggestions for your search: (5 point scale) 1—poor quality/not appropriate 2—low quality –3—acceptable 4—good quality –5—high quality/very appropriate 3. please indicate how strongly you agree or disagree with the following statement: “the autocomplete feature is useful for narrowing down a research topic.” (5 point scale): 1—strongly disagree 2—disagree –3—undecided –4—agree –5—strongly agree demographics 1. please indicate your current class status a.  freshman b.  sophomore c.  junior d.  senior 2. what is your declared or anticipated major? 3. have you had a librarian come talk to one of your classes or give an instruction session in one of your classes? if yes, which class(es)? 4. please rate your overall confidence level when beginning research for classes that require library resources for a paper or assignment. (5 point scale): 1—no confidence 2—low confidence 3—reasonable confidence 4—high confidence –5—very high confidence 5. what factors influence your confidence level when beginning research for classes that require library resources for a paper or assignment? article title | author 33online workplace training in libraries | haley 33 this study was designed to explore and describe the relationships between preference for online training and traditional face-to-face training. included were variables of race, gender, age, education, experience of library employees, training providers, training locations, and institutional professional development policies, etc. in the library context. the author used a bivariate test, kruskalwallis test and mann-whitney u test to examine the relationship between preference for online training and related variables. i n the era of information explosion, the nature of library and information services makes library staff update their work knowledge and skills regularly. workplace training has played an important role in the acquisition of knowledge and skills required to keep up with this information explosion. as richard a. swanson states, human resource development (hrd) is personnel training and development and organization development to improve processes and enhance the learning and performance of individuals, organizations, communities, and society (swanson 2001). training is the largest component of hrd. it helps library employees acquire more skills through continuous learning. online workplace training is a relatively new medium of delivery. 
this new form of training has been explored in the literature of human resources development in corporation settings (macpherson, elliot, harris, and homan 2004), but it has not been adequately explored in university and library settings. universities are unique settings in which to study hrd, and libraries are unique settings in which to examine hrd theory and practice. in human resource development literature there are studies on participation (wang and wang 2004) from the perspective of individual motivation, attitudes, etc.; however, more research needs to be conducted to explore library employees’ demographics related to online training in the unique library contexts, such as various staff training and development, as well as training policies. hrd literature includes studies of online learning in formal educational settings (hiltz and goldman 2004; shank and sitze 2001; waterhouse 2005), and there are studies on relationships between national culture and the utility of online training (downey, wentling, wentling, and wadsworth 2005). but there has been very little research conducted in terms of online workplace training for library staff. it is not clear what relationships exist among preferences for online training and demographic variables such as ethnicity, gender, age, educational level, and years of library experience. due to lack of research in these areas, workplace training in libraries will be less effective if certain ethnic groups, or certain age groups, prefer traditional face-to-face training as libraries move toward online training. the author believes that research should govern library practice. therefore, it is necessary to research this topic and disseminate the findings. because of the growth in online training, there is a need to gain a better understanding of these relationships. ■ purpose of the study the study aims to reveal the relationships between preferences for online or traditional face-to-face training and variables such as ethnicity, gender, age, educational level, and years of experience. it also studies the relationships among preference for online training and other variables of training locations, training providers, training budgets, and professional development policies. the constructs are: the preference for online training was related to demographics, library’s training budget, professional development policies, training providers, and the training locations. these factors were included in the research questionnaire. we begin with the research questions, review the current literature, and then discuss the method, results, and need for further research. correlational research questions 1. what is the relationship between ethnicity and online workplace training preferences? 2. what is the relationship of employees’ educational levels, age, and years of library experience to online workplace training preferences? 3. how does preference for online workplace training in libraries relate to employee gender? 4. how does preference for online workplace training in libraries relate to training locations, training providers, training budgets, and professional development policies? 5. do library staff prefer traditional face-to-face training over online training? ■ review of the literature as stated above, training is the largest component of hrd. the discipline of hrd relies on three core theories: psychological theory, economic theory, and system theory. swanson (2001) stated: online workplace training in libraries connie k. haley connie k. 
haley (chaley@csu.edu) is systems librarian, chicago state university library, illinois 34 information technology and libraries | march 200834 information technology and libraries | march 2008 economic theory is recognized as a primary driver and survival metric of organizations; system theory recognizes purpose, pieces, and relationships that can maximize or strangle systems and subsystems; and psychological theory acknowledges human beings as brokers of productivity and renewal along with the cultural and behavioral nuances. each of these three theories is unique, complementary, and robust. together they make up the core theory underlying the discipline of hrd (p. 92–93). three specific economic theory perspectives are believed to be most appropriate to the discipline of hrd: (1) scarce resource theory, (2) sustainable resource theory, and (3) human capital theory (swanson 2001). training is an investment to human capital with valuable returns, but no costs. wenger and snyder’s study (as cited in mahmood, ahmad, samah, and idris 2004) states that today’s economy runs on knowledge and skills. thurow’s study (as cited in swanson 2001) states that new industries of the future depend on brain power. man-made competitive advantages replace the comparative advantage of natural-resources endowments or capital endowments. in a rapidly changing society, maintaining organizational and individual competence has become a greater challenge than ever before (hake 1999). competences include knowledge, skills, and attitudes. much of the literature focuses on job-related functional competences (deist and winterton 2005). library workplace training is one of the primary methods of investing in human capital and increasing competence for library employees. training is the process through which skills are developed, information is provided, and attributes are nurtured (davis and davis 1998). to increase training participation and efficacy, libraries need to determine employees’ preferences for online training or traditional face-to-face training; a resulting high training participation rate would increase the competence of all employees. library trainers and administrators can encourage nonparticipants to attend training by offering different training sessions (online or face-to-face), and/or by changing training policies and budget allocations. unlike personality and intelligence, skill competence may be learned; hence it may be improved through training and development (mcclelland 1998). nadler and tushman (1999) emphasized core competence as a key organizational resource that could be exploited to gain competitive advantage. core competence was defined as collective learning in the organization, especially how to coordinate diverse production skills and integrate multiple streams of technologies (prahalad and hamel 1990). mezirow (2000) asserted that there are asymmetrical power relationships that influence the learning process (as cited in baumgartner 2000). learning more about the relationships may benefit training and learning. in other words, training may be more effective if it is provided in the form preferred by the majority of staff. as stated above, there is very little research about online workplace training for library staff. past studies have focused on how to conduct online training for working catalogers (ferris 2002) or on online teaching for students (crichton and labonte 2003; hitch and hirsch 2001). 
from the design and implementation perspectives, kovacs (2000) discussed web-based training in libraries, and unruh (2000) emphasized problems in delivery of web-based training. markless (2002) addressed learning theory and other relevant theories that could be used to teach in libraries. yet there is a lack of research on the demographics of library staff participation in workplace training and a lack of research on the training preferences of library staff. ■ methodology the study took place in an online environment. the research activities covered a twenty-day period from april 10 to april 30, 2006. survey questionnaires and consent forms were posted on the web. select participants the survey url (http://freeonlinesurveys.com/rendersurvey.asp?id=106221) was sent to library staff via library discussion lists along with a consent form including contact information and a brief explanation of the survey’s purpose. the surveys were anonymous and confidential. names, e-mail addresses, and personally identifiable information were not tracked. all participants filled out the survey online. the sample was limited to employees who were at least nineteen years old. directors and department heads were also welcome to participate. instrument data collected for this study included categorical data (i.e., gender and ethnicity) and numeric data (age, years of education, and years of experience). this was an attitudinal survey; hence, the rensis likert scale was used for data feedback. most of the data was quantitative likert scale, such as the preference for online training, the professional development policy, and the budget allocation for training. data collection “entailed measuring the attitudes of employees, providing feedback to participants, and stimulating joint planning for improvement” (swanson 2001). likert-type scales provide more variation of responses and lend themselves to stronger statistical analysis (creswell 2005). it is important to select a well-tested instrument that article title | author 35online workplace training in libraries | haley 35 reports reliable and valid data. however, measuring attitudes has been one of the most challenging forms of psychometric measurement (thorkildsen 2005). due to a lack of similar studies of libraries’ online training, no instruments could be found for this study except the education participation scale (eps), the deterrents to participation scale (dps), and the style analysis survey (sas) instruments. boshier’s forty-item eps (1974) is reliable in differentiating among diverse groups with varying reasons for participating in continuing education (as cited in merriam and caffarella 1999). the eps is used to find the motivations as to why people participate in continuing education; consequently, the eps cannot answer all questions of this study. similarly, the dps reveals factors of nonparticipation; hence, the dps cannot be used in this study. and while the sas is designed to identify how individuals prefer to learn, concentrate, and perform in both educational and work environments (sloan, daane, and giesen 2002), after careful examination, it was found that the sas was not well-suited to this study. because surveys are used to collect data and to assess opinions and attitudes (creswell 2005), the researcher chose to develop a survey that contained about 20 items to assess library staff’s opinions and attitudes toward online training. 
the survey consisted of three parts: demographic variables, likert-scale assessment of online workplace training preference, and open-ended questions that were worded to reflect reasons for training preference (see appendix). to capture demographic data, participants were asked to indicate their age, years of library experience, years of education (high school/ged = 12; two years college = 14; bachelor’s degree = 16; one master’s degree = 18; two master’s degrees = 20; ph.d/ed.d = 22+), gender (1 = male or 2 = female), ethnicity (1 = asian/pacific islander, 2 = american indian, 3 = african american, 4 = hispanic, 5 = white, non-hispanic, and 6 = other). the likert scale items are designed using a forced-choice likert scale (smith 2006), that is, an even number of response options (1 = strongly agree; 2 = agree; 3 = mildly agree; 4 = mildly disagree; 5 = disagree; 6 = strongly disagree), rather than an odd number (strongly agree; agree; neither agree nor disagree; disagree; strongly disagree). a scoring decision is consistently applied in order to have a meaningful interpretation of the scores. thus, for the likert scale items, the scaling method is to use high scores to represent stronger resistance to a measured attitude of online training. to insure reliability and validity of scores, the questionnaire was reviewed by an expert in the library field to validate if questions were representative of the library field. data collection the way a researcher plans to draw a sample is related to the best way to collect data (fowler 2002). the above sampling approach made it easier for data collection. the author collected data via the web survey company by paying for survey services on a monthly basis. the data was collected by the end of april 2006. the total number of participants was 292 (n=292), of which 260 were valid. thirty-two participants did not complete the survey; those surveys with missing data were excluded from analysis. survey results were saved in a text file and then downloaded into spss for analysis. ■ results and analysis beside general frequency analysis, the kruskal-wallis test was used for six ethnic groups. since some ethnic groups had small sample sizes, all minorities (48) were merged in one ethnic group. thus, the mann-whitney test was used for the two ethnic groups—minority and majority. the author also assessed bivariate relationships with preference of online training and other variables. frequencies analysis frequencies analysis includes demographics, preference of online training versus face-to-face training, budget, and professional development policies. demographics. eighty-five percent of participants were female, 81 percent were white, 49 percent had one master ’s degree, and 23 percent had two or more master ’s degrees. nearly 70 percent were forty years old or older; 45 percent were fifty years old or older. thirty-six percent had less than 10 years of library experience (see table 1). preference of online training versus face-to-face training. most participants (87.3 percent) reported that online training was less effective than traditional face-to-face training. generally speaking, fewer participants (33.9 percent) preferred online training: strongly agree (3.1 percent), agree (13.5 percent), and mildly agree (17.3 percent). more participants (66.1 percent) did not prefer online training: mildly disagree (28.8 percent), disagree (28.1 percent), and strongly disagree (9.2 percent). budget. 
fifty-five percent of participants somewhat agree their library allocates sufficient budget for training: strongly agree (8.8 percent), agree (25.8 percent), and mildly agree (20 percent). professional development policies. sixty-eight percent of participants somewhat agree their libraries had good professional development policies: strongly agree (13.5 percent), agree (30 percent), mildly agree (24.6 percent). table 2 shows the frequencies of preference of online training, budget, and policy. 36 information technology and libraries | march 200836 information technology and libraries | march 2008 kruskal-wallis test of ethnicity (α = .05) in the kruskal-wallis test for ethnicity, to match the total number of 48 minorities, 48 white people were randomly selected from 212. the test was not significantly different. in the kruskal-wallis test, chi-square is 2.222 (df = 4) and asymptotic significance was 0.715, which was greater than the criterion α = .05. there was no difference in preference for online training between ethnic groups. mann-whitney u test statistics of ethnicity (α = .05) the mann-whitney test of ethnicity was not significant. asymptotic significance is 0.81 (z = -.241), which was greater than the criterion α = .05. there was no difference in preference for online training between the minorities group and the group of white/not hispanic. mann-whitney u test statistics of gender (α = .05) the mann-whitney test of gender was not significant. asymptotic significance was 0.675 (z = -.419), which was greater than the chosen α value (α = .05). there was no significant difference in preference for online training between males and females. bivariate analysis (α = .05) bivariate correlations were computed (see table 3). preference for online training was not associated with age, years of education, years of library experience, sufficient training budget, or professional development policy. it makes sense to believe that traditional face-to-face training has better quality than online training. before the survey analysis, the author expected that younger employees would prefer online training and older ones would prefer traditional face-to-face training due to the older employees’ reluctance to change. it was also expected that highly educated employees would prefer online training while less educated ones, with fewer online skills, would prefer traditional face-to-face training. another assumption was that employees with more library experience would prefer online training while less experienced ones would prefer traditional face-to-face training. the survey showed these assumptions were wrong. it was also assumed that an insufficient training budget might result in a preference for online training, since online training is more cost effective; and that good professional development policies might result in preference for traditional face-to-face training because it is of better quality than online training. the survey found these assumptions to be false. training budget and professional development policies were irrelevant to the preference for online training. however, it was not surprising to find that preference for online training was associated with training providers and training locations, as seen in table 3. ■ discussion the exploration of the relationships among these variables revealed that the preference for online training was not related to demographics, budgets, or professional development policies. 
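the nonparametric tests reported above were run in spss, but equivalent calls are available in standard statistical libraries. the sketch below shows the same kruskal-wallis and mann-whitney comparisons using python’s scipy.stats; the likert responses in it are made-up placeholder values (1 = strongly agree through 6 = strongly disagree), included only to demonstrate the calls, not to re-create the study’s data or results.

# a minimal sketch of the nonparametric tests reported above, using python's
# scipy.stats in place of the spss procedures the author used. the likert
# scores below are invented placeholder values, not the study's data.
from scipy.stats import kruskal, mannwhitneyu

# preference-for-online-training scores keyed by ethnic group (toy values)
groups = {
    "asian/pacific islander": [3, 4, 5, 2, 4],
    "african american":       [4, 4, 3, 5, 2],
    "hispanic":               [5, 3, 4, 4, 3],
    "white, non-hispanic":    [4, 5, 3, 4, 6],
}

# kruskal-wallis: do the score distributions differ across several groups?
h_stat, p_value = kruskal(*groups.values())
print(f"kruskal-wallis h = {h_stat:.3f}, p = {p_value:.3f}")

# mann-whitney u: two-group comparison, e.g., minority vs. majority respondents
minority = [s for name, scores in groups.items()
            if name != "white, non-hispanic" for s in scores]
majority = groups["white, non-hispanic"]
u_stat, p_value = mannwhitneyu(minority, majority, alternative="two-sided")
print(f"mann-whitney u = {u_stat:.3f}, p = {p_value:.3f}")

merging all minority respondents into a single group before the mann-whitney comparison mirrors the merging step described above for the small ethnic-group samples.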
however, the preference for online training did show a correlation to training providers and locations. it was surprising to discover that the preference for online training was not associated with ethnicity, gender, age, education, or library experience.

table 1. demographic characteristics
characteristic                    frequency (n)    %
gender
  male                            40               15
  female                          220              85
ethnicity
  asian/pacific islander          22               8.5
  american indian                 2                0.8
  african american                17               6.5
  hispanic                        7                2.7
  white                           212              81.5
age
  20–29                           23               9.4
  30–39                           54               21.2
  40–49                           61               23.9
  50–59                           102              40
  60+                             14               5.5
  missing                         5                1.9
education
  less than 16 years              27               10.4
  16–17 years/bachelor            45               17.4
  18–19 years/one master          128              49.4
  20–21 years/two masters         43               16.6
  22+ years/doctorate             16               6.2
  missing data                    1                0.4
years of library experience
  less than 10 years              94               35.9
  10–19 years                     81               31.3
  20–29 years                     48               16.2
  more than 30 years              37               14.3
  missing data                    1                0.4

it was interesting to note that training budgets and professional development policies were not related to the preference for online training. several study hypotheses were confirmed. library staff preferred traditional face-to-face training as opposed to online training. although one-third (33.8 percent) of participants preferred (including mildly agree, agree, and strongly agree) online training, only 12.7 percent of participants thought that online training was more effective than traditional face-to-face training. on the other hand, the majority (80 percent) preferred online training when the training was held out of state; 56.2 percent preferred online training when it was held in state. the study concluded that online training was preferred if the training locations required participants to travel great distances from the library. of the participants, 63.1 percent preferred online training when the training was provided by a vendor. some participants did not think face-to-face contact was important for vendor training. this finding suggests that online training is a better choice for vendor training. fifty-five percent preferred online training when it was provided by an association/organization. association/organization trainers should consider a combination of online and traditional face-to-face training to meet the needs of the majority. online training can be provided for some specific tasks, and supplemented by face-to-face training for others. the following are survey summaries of key reasons to use online and traditional training, along with suggestions from the survey participants.
the following are survey summaries of key reasons to use online and traditional training, along with suggestions from the survey participants.

the main reasons to use online training
■ flexible (allows more people from one worksite to participate)
■ saves time
■ eliminates travel cost
■ generally lower training costs
■ ease of access (able to have hands-on practice with a technology and software program, able to refer back to supplemental materials, able to obtain a wider range of training, appropriate to give general overviews in preparation for more in-depth face-to-face training)
■ convenient (have some control over one's time, attend training from the comfort of home or office rather than having to drive somewhere and sit through a presentation, fits easily into a busy schedule, and self-paced in asynchronous online training)

the main reasons to use face-to-face training
■ questions and answers: able to ask questions and discuss answers, see immediate feedback, questions others are asking may include some that you didn't think of, and problems solved directly
■ networking with peers: face-to-face training allows for serendipitous networking opportunities, you have the option of personal conversations with trainers as well as social opportunities to meet other professionals, it is hard to meet people and make friends through online training, get out of the library once in a while, find out what experiences staff from other departments or libraries are having
■ better communication and interaction: have personal interaction with instructors and participants, share ideas and experiences with others, enjoy discussions and the diversity of personal opinions that come from face-to-face training
■ learn efficiently and effectively: learn from others, not just the instructor, get more out of real training, easy to get disinterested if there is no face-to-face contact, learn better from an instructor
■ technology barrier: sometimes technology can get in the way of training, some online training was poorly designed, and some online classes took forever to load and two seconds to read the whole page

table 2. frequencies of preference of online training, budget, and policy

descriptor                      mean*   median*   std. deviation
preference of online training   3.93    4         1.281
budget                          3.45    3.0       1.550
policy                          3.0     3.0       1.437

* 1 = strongly agree; 2 = agree; 3 = mildly agree; 4 = mildly disagree; 5 = disagree; 6 = strongly disagree

suggestions to improve library workplace training

administrative support. the most important factor is having library administrators who support training and encourage staff at all levels to attend training. provide workshops for professional librarians and civil service workers that relate to their work, and give them release time for training. library administrators must understand the importance of training and develop training policies with a commitment toward staff development. library administrators must plan and design training infrastructures for core competence and cumulative learning, instead of spontaneous one-shot training for new products or systems.

more training. many participants expressed their desire for more training. training not only increases the knowledge and skills needed for their jobs, but also provides opportunities to network with colleagues. more face-to-face and technical hands-on training is needed, since many librarians felt left out of the technology loop and think that maintaining a current view of developments in technology is difficult. more online training is needed, both asynchronous and synchronous.
asynchronous training is good for self-paced learning, which is preferred by many survey participants, while some enjoy online webcasts of seminars and workshops for better interaction. it is hoped that state libraries will provide online streaming videos on various topics for academic, private, and public library staff.

more funding. make more funding available for library workplace training. the training budget should not be the first thing cut when budgets get tight.

a combination of online and traditional face-to-face training. walton (1999) notes that we must ensure we learn and grow, and we may learn and grow by participating in workplace training. training programs should be built into strategic hrd plans that best fit employees' learning preferences. this study shows that online training works well with basic informational topics and most technology topics (databases, searching, or web-related technologies). certain simple topics, such as a vendor's product and procedural training, were more appropriate for online training. some topics do not translate well into online training, however, such as how to conduct storytimes, which require a lot of interaction between participants. difficult topics need traditional training for direct answers from the instructor, and topics that need in-depth discussion should be provided with traditional training. in other words, provide basic training online and save face-to-face training for more difficult topics.

table 3. bivariate correlations with preference of online training (α = .05)

variables                              preference of online training
age                                    .980
education                              .507
library experience                     .259
budget                                 .858
prof. development policies             .280
training provider
  vendors                              <.01*
  associations/org. (ala, oclc, etc.)  <.01*
  lib. consortia                       <.01*
  library/institution                  <.01*
training location
  out of state                         <.01*
  in state                             <.01*
  in town                              <.01*
  in house                             <.01*

* significant at α = .05

■ future research

future research should focus on new learning needs, how people interact with technology, and how people learn in an online environment. more research is needed on the varieties of online training. in this study, the generic term "online training" was used; future studies need to distinguish between static asynchronous online training and interactive synchronous online training. static online training includes text-only and text-graphic formats, with or without voice. interactive online training includes voice-only and voice-video formats, with the ability to ask and answer questions in real time. as time goes by, more people will have taken online training and will be more comfortable with it; as more people gain online training experience, their attitudes toward online training may change. further research should examine and measure library staff's preference for these varieties of online training. in addition, participants should be surveyed by grouping experienced and non-experienced online trainees. finally, studies may be conducted to survey library staff in other countries to compare their preferences with those of their u.s. peers.

the goal of this study is to provide helpful information for department heads, supervisors, and library human resources staff to assist them in determining the types of training that will be most effective to meet training needs.
the author hopes this study also provides useful information to all library employees who attend training or workshops, including civil service personnel and librarians, and that this study will be utilized for further research on library training and, in turn, that research will make more contributions to the workplace training literature of libraries and other professions.

acknowledgements

the author thanks lorraine lazouskas, john webb, judith carter, and the copy editors at ala production services for their assistance and valuable input on this manuscript.

bibliography

baumgartner, l. m. 2000. preface. in l. m. baumgartner and s. b. merriam (eds.), adult learning and development: multicultural stories. malabar, fla.: krieger publishing.
creswell, j. w. 2005. educational research: planning, conducting, and evaluating quantitative and qualitative research. 2nd ed. upper saddle river, n.j.: pearson merrill prentice hall.
crichton, s., and r. labonte. 2003. innovative practices for innovators: walking the talk; online training for online teaching. educational technology & society 6, no. 1: 70–73.
davis, j. r., and a. b. davis. 1998. effective training strategies: a comprehensive guide to maximizing learning in organizations. san francisco: berrett-koehler.
deist, f. d., and j. winterton. 2005. what is competence? human resource development international 8, no. 1: 27–46.
downey, s., r. m. wentling, t. wentling, and a. wadsworth. 2005. the relationship between national culture and the usability of an e-learning system. human resource development international 8, no. 1: 47–64.
ferris, a. m. 2002. cataloging internet resources using marc21 and aacr2: online training for working catalogers. cataloging and classification quarterly 34, no. 3: 339–353.
fowler, f. j. 2002. survey research methods. 3rd ed. thousand oaks, calif.: sage.
haley, c. k. 2006. who participates in online workplace training in libraries? survey results retrieved april 25, 2006, from http://freeonlinesurveys.com/viewresults.asp?surveyid=183507.
hake, b. j. 1999. lifelong learning in late modernity: the challenges to society, organizations, and individuals. adult education quarterly 49, no. 2: 79–90.
hiltz, s. r., and r. goldman, eds. 2004. learning together online. mahwah, n.j.: lawrence erlbaum associates.
hitch, l. p., and d. hirsch. 2001. model training. journal of academic librarianship 27, no. 1: 15–19.
kovacs, d. k. 2000. designing and implementing web-based training in libraries. business and finance division bulletin 113 (winter): 31–37.
macpherson, a., m. elliot, i. harris, and g. homan. 2004. e-learning: reflections and evaluation of corporate programme. human resource development international 7, no. 3: 295–313.
mahmood, n. h. n., a. ahmad, b. a. samah, and k. idris. 2004. informal learning of management knowledge and skills and transfer of learning among head nurses. in human resource development in asia: harmony and partnership, r. moon, a. m. osman-gani, k. shinil, g. roth, and h. oh, eds. seoul: the korea academy of hrd.
markless, s. 2002. learning about learning rather than about teaching. retrieved july 5, 2007, from http://www.ifla.org/iv/ifla68/papers/081-119e.pdf.
mcclelland, d. 1998. identifying competencies with behavioral-event interviews. psychological science 9, no. 5: 331–339.
merriam, s. b., and r. s. caffarella. 1999. learning in adulthood. san francisco: jossey-bass.
mezirow, j. 2000. learning to think like an adult: transformation theory; core concepts. in learning as transformation: critical perspectives on a theory in progress, j. mezirow and associates, eds. san francisco: jossey-bass.
nadler, d. a., and m. tushman. 1999. the organization of the future: strategic imperatives and core competencies for the 21st century. organizational dynamics 27, no. 1: 45–48.
prahalad, c. k., and g. hamel. 1990. the core competence of the corporation. harvard business review 68, no. 3: 79–91.
shank, p., and a. sitze. 2004. making sense of online learning. san francisco: pfeiffer.
sloan, t., c. j. daane, and j. giesen. 2002. mathematics anxiety and learning styles: what is the relationship in elementary preservice teachers? school science and mathematics 102, no. 2: 84–87.
smith, j. t. 2006. applied categorical data analysis. lecture presented in spring 2006 at northern illinois university, dekalb.
swanson, r. a. 2001. foundations of human resource development. san francisco: berrett-koehler.
thorkildsen, t. a. 2005. fundamentals of measurement in applied research. boston: pearson education.
unruh, d. l. 2000. desktop videoconferencing: the promise and problems of delivery of web-based training. internet and higher education 3, no. 3: 183–199.
walton, j. 1999. strategic human resource development. harlow, england: pearson education.
wang, g. g., and j. wang. 2004. toward a theory of human resource development learning participation. human resource development review 3, no. 4: 326–353.
waterhouse, s. 2005. the power of elearning: the essential guide for teaching in the digital age. boston: pearson education.

appendix. questionnaire

part i.
1. gender ❏ male ❏ female
2. ethnicity ❏ asian or pacific islander ❏ american indian ❏ african american ❏ hispanic ❏ white, non-hispanic ❏ other ____
3. please indicate the year of your birth: _________
4. please indicate years of education: _________
5. please indicate years of library experience: ________

part ii.
for questions 6–16, please read each item and check the response that best matches your degree of agreement/disagreement (1 = strongly agree; 2 = agree; 3 = mildly agree; 4 = mildly disagree; 5 = disagree; 6 = strongly disagree).
6. if training is provided by library vendors such as ebsco or blackwell, i would prefer that it be offered online rather than face-to-face.
7. if training is provided by associations/organizations such as ala and oclc, i would prefer that it be offered online rather than face-to-face.
8. if training is provided by library consortia, i would prefer that it be offered online rather than face-to-face.
9. if training is provided by your institution or library, i would prefer that it be offered online rather than face-to-face.
10. if the training location is out of state, i would prefer that it be offered online rather than face-to-face.
11. if the training location is in state, i would prefer that it be offered online rather than face-to-face.
12. if the training location is in town, i would prefer that it be offered online rather than face-to-face.
13. if the training location is in-house, i would prefer that it be offered online rather than face-to-face.
14. my library allocates sufficient budget for training (may include online training).
15. my library has good professional or staff development policies.
16. generally speaking, i prefer online training rather than face-to-face training.

part iii.
17. state reasons for your preference of traditional face-to-face training.
18. state reasons for your preference of online training.
19. please make suggestions to improve library workplace training.
20. do you think that online training is less effective than traditional face-to-face training? yes __ no __

the internet public library (ipl): an exploratory case study on user perceptions

monica maceli, susan wiedenbeck, and eileen abels

monica maceli (mgm36@drexel.edu) is a doctoral student, susan wiedenbeck (susan.wiedenbeck@drexel.edu) is a professor, and eileen abels (eabels@drexel.edu) is a professor at the college of information science and technology, drexel university, philadelphia.

the internet public library (ipl), now known as ipl2, was created in 1995 with the mission of serving the public by providing librarian-recommended internet resources and reference help. we present an exploratory case study on public perceptions of an "internet public library," based on qualitative analysis of interviews with ten college student participants: some current users and others unfamiliar with the ipl. the exploratory interviews revealed some confusion around the ipl's name and the types of resources and services that would be offered. participants made many positive comments about the ipl's resource quality, credibility, and personal help.
the internet public library (ipl), now known as ipl2, is an online-based public service organization and a learning and teaching environment originally developed by the university of michigan's school of information and currently hosted by drexel university's ischool. the ipl was created in 1995 as a project in a graduate seminar; a diverse group of students worked to create an online space that would be both a library and an internet institution, helping librarians and the public identify useful internet resources and content collections. with a strong mission to serve and educate a varied community of users, the ipl sought to help the public navigate the increasingly complex internet environment as well as advocate for the continuing relevance of librarians in a digital world. the resulting ipl provided online reference, content collections (such as ready reference and a full-text reading room), youth-oriented resources, and services for other librarians, all through its free, web-based presence.1 currently, the ipl consists of a publicly accessible website with several large content collections (such as "potus: presidents of the united states"), sections targeted toward teens and children ("teenspace" and "kidspace"), and a question and answer service where users can e-mail questions to be answered by volunteer librarians.2

there has been an enormous amount of change in the internet and digital libraries since the ipl's inception in 1995. while web use statistics, user feedback, and incoming patron questions indicate that the ipl remains well-used and valued, there are many questions about its place in an increasingly information-rich online environment. digital and physical holdings, academic and public libraries, free and subscription resources, internet encyclopedias, and a multitude of other offerings form a complex (and often overwhelming) information-seeking environment. to move forward effectively and to best serve its existing and potential users, the ipl must pursue a path that is adapted to the present state of the internet and that is user-informed and user-driven. recent large-scale studies, such as the 2005 oclc reports on perceptions of libraries and information resources, have begun to explore user perceptions of libraries in the complex internet environment.3 these studies emphasize the importance of user perceptions of library use, questioning whether libraries still matter in the rapidly growing infosphere and what future use trends might be. in the internet environment, user perceptions play a key role in use (or nonuse) of library resources and services as information-seekers are faced with myriad easily accessible electronic information sources. the ipl's name, for example, may or may not be perceived as initially helpful to users' information-seeking needs. repeat use relates to such perceptions as well, in the amount of value users perceive in the library's resources over the many other sources available. in beginning to explore such issues, there is a need for current research addressing user perceptions of an internet public library: what the name implies to both existing and potential users as well as the associated functions and resources that should be offered. in this study, we present an exploratory case study on public perceptions of the ipl. qualitative analysis of interviews with ten college students, some of whom are current users of the ipl and others with no exposure to the ipl, begins to yield an understanding of the public perception of what an internet public library should be. this study seeks to expand our understanding of such issues and explore the present-day requirements for the ipl by addressing the following research questions:

■■ what is the public perception of an internet public library?
■■ what services and materials should an internet public library offer?

■■ background

the ipl: origins and research

in 1995, joe janes, a professor at the university of michigan's school of information and library studies, ran a graduate seminar in which a group of students created a web-based library intended to be a hybrid of both physical library services and internet resources and offerings. the resulting ipl would take the best from both the physical and digital

there has also been a continuous evaluation of the role of the library in an increasingly digital world, a question janes sought to address in his first imaginings of the ipl. a study conducted in 2005 claimed that "electronic information-seeking by the public, both adults and children, is now an everyday reality and large numbers of people have the expectation that they should be able to seek information solely in a virtual mode if they so choose."12 this trend in electronic information-seeking has driven both public and academic libraries to create and support vast networks of licensed and free online information, directories, and guides. these electronic offerings, which (at least in theory) are desired and appreciated by users, are often overshadowed by the wealth of quickly accessible information from tools such as search engines.13 in competition with quickly accessible (though not necessarily credible or accurate) information sources, librarians have struggled to find their place and relevance in an evolving environment. google and other web search engines often shape users' experiences and expectations with information-seeking, more so than any formal librarian-driven instruction such as in boolean searching. several recent comprehensive studies have explored user perceptions of libraries, both physical and digital, in relationship to the larger internet.
abels explored the perspective of libraries and librarians across a broad population consisting of both library users and non-users.14 her findings included the fact that web search engines were the starting point for the majority of information-seeking, and that there is a high preference among users for virtual reference desk services. she proposed an information-seeking model in which the library serves as one of many internet resources, including free websites and interpersonal sources, and is likely not the user’s first stop. in respect to this model of information-seeking, abels suggests that “librarians need to accept the broader framework of the information seeker and develop services that integrate the library and the librarian into this framework.”15 in 2005, oclc released what is possibly the most comprehensive study to date of the public’s perceptions of library and information resources as explored on a number of levels, including both the physical and digital environments.16 findings relevant to libraries on the internet (and this study) included the following: ■■ 84 percent of participants reported beginning an information search from a search engine; only 1 percent started from a library website ■■ there was a preference for self-service and a tendency to not seek assistance from library staff ■■ users were not aware of most libraries’ electronic resources ■■ college students have the highest rate of library use ■■ users typically cross-reference other sites to validate their results worlds while developing its own unique offerings and features.4 janes had conceived the idea in 1994, when the internet’s continued growth began to make it clear that the role of libraries and librarians would be forever changed as a result. janes’ motivating question was “what does librarianship have to say to the network environment and vice versa?”5 the ipl tackled a broad mission of enhancing the value of the internet by providing resources to its varied users, educating and strengthening that community, and (perhaps most unique at the time) communicating “its’ creators vision of the unique roles of library culture and traditions on the internet.”6 initial student brainstorming sessions yielded the priorities that the ipl would address and included such services as reference, community outreach, and youth services. the first version of the ipl contained electronic versions of classic library offerings, such as magazines, texts, serials, newspapers, and an e-mail reference service. the ipl was well received and continued its development, adding and expanding resources to support specific communities such as teens and children. the ipl was awarded several grants over the next few years, allowing for expansion and continuation.7 a wealth of librarian volunteers, composed of students and staff, contributed to the ipl, in particular toward the e-mail reference services. with a stated goal of responding to patrons’ questions within one week, the reference services provide help and direct contact with the ipl’s user base, many of whom are students working on school assignments.8 the ipl’s collections are discoverable through search engines (popular offerings such as the “potus: presidents of the united states” resources rank highly in search results lists) and through its presence on social networking sites such as myspace, facebook, and twitter. additionally, ipl distributes brochures to teachers and librarians at relevant conferences. 
the ipl has been the focus of many research studies covering a broad range of themes, such as its history and funding, digital reference and the ipl’s question-andanswer service, and its resources and collections.9 also, in line with the original mission of the ipl, janes developed the internet public library handbook to share best practices with other librarians.10 the majority of publications, however, have focused on ipl’s reference service, which is uniquely placed as a librarian-staffed volunteer digital reference service. as the ipl has collected and retained all reference interactions since its inception in 1995, there is a wealth of data readily available to such studies and exploratory work into how best to analyze it.11 user perceptions of digital libraries the internet is a vastly different world than it was in the early days of the ipl’s creation. the expectations of library patrons, both in digital and in physical environments, have changed as well. and as the internet evolves 18 information technology and libraries | march 2011 of the public, which is the intention of this study. ■■ method this exploratory study consisted of a qualitative analysis of data gathered from interviews and observations of ten college student participants who were academic library users and nonusers of the ipl. a pilot study preceded the final research effort and allowed us to iteratively tailor the study design to best pursue our research questions. our initial study design incorporated a usability test portion, in which users were presented with a series of information-seeking needs and instructed to use the ipl’s website to answer the questions. however, we later dropped this portion of the study because pilot results found that it contributed little to answering our research questions about public perceptions; it largely explored implementation details, which was not the focus of this study. following the pilot study, we recruited ten drexel university students from the university’s w. w. hagerty library. this ensured recruiting participants who were at least minimally familiar with physical libraries and who were from a variety of academic focuses. the participant group included eight females and two males—two were graduate students, eight were undergraduates—from a variety of majors, including biology, biomedical engineering, business, library science, accounting, international studies, and information systems. participants took an average of twenty-six minutes to complete the study. the study consisted of a short interview to assess the user’s experience with public libraries (both physical and online) and their expectations of an internet public library. these open-ended questions (included in the appendix) sought to determine what features, services, or content were desired or expected by users, whether the term of “internet public library” was meaningful, if there were similarities to web-based systems that the participants were already familiar with, or if they had previously used a website they would consider an internet public library. all interviews were audio recorded and transcribed. an initial coding scheme was established and iteratively developed (table 1). 
once we observed significant overlap between participant responses, the study then proceeded to the final analysis and presentation, using inductive qualitative analysis to code text and identify themes from the data.22 ■■ findings all participants were current or former public library patrons; six participants (p1, p4, p5, p6, p8, and p9) were a portion of the study focused on library identity or brand in the mind of the public; participants found the library brand to be “books,” with no other terms or concepts coming close. as a companion report to this study, oclc released a report focused on the library perceptions of college students.17 as our study uses a college student participant base, oclc’s findings are highly relevant. the vast majority of college students reported using search engines as a starting point for informationseeking and expressed a strong desire for self-service library resources. as compared to the general population, however, college students have the highest rate and broadest use of both physical and digital library resources and a corresponding high awareness of these services. the relationship between public libraries and the internet was explored in depth in a 2002 study by d’elia et al.18 the study sought to systematically investigate patrons’ use of the internet and of public libraries. findings included the fact that the internet and public libraries are often complementary; that more than half of internet users were library users and vice versa; and that libraries are valued more than the internet for providing accurate information, privacy, and child-oriented spaces and services. participants made a distinction between the service characteristics of the public library versus those of the internet. many of the most-valued characteristics of the internet (such as information that is always available when needed) were not supported by physical libraries because of limited offerings and hours. in addition to large, comprehensive surveys, there have been several case-study approaches, exploring user perceptions of a particular digital library or library feature. tammaro researched user perceptions of an italian digital library, finding the catalog, online databases, and electronic journals to be most valued; she found speed of access, remote access, a larger number of resources, and personalization to be key digital library services.19 this study also reported a consistent theme in digital library literature: a patron base primarily consisting of novice users who do not know how to use the library and are unaware of the various services offered. crowley et al. evaluated an existing academic library’s webpages for issues and user needs.20 they identified issues with navigational structures and overly technical terminology and a general need for robust help and extensive research portals. in respect to our study, we found no literature that studied perceptions of internet public libraries. as mentioned earlier, research that addressed the ipl from the perspective of its patrons largely focused on ipl’s reference services. in 2008, ipl staff reported 13,857 reference questions received and 9,794,292 website visitors.21 although reference is clearly a vital and well-used service, there is also a great deal of website collection use that must be researched. recent literature does not address the current state of the ipl from the perspective the internet public library (ipl) | maceli, wiedenbeck, and abels 19 of such a library. 
a few remained confused about how such a concept would relate to physical public libraries and the internet in general. one participant assumed that such a term must mean the web presence of a particular physical public library. another’s immediate reaction was to question the value of such a venture in light of existing internet resources: “i mean, the internet is already useful, so i don’t know [how useful it would be]” (p2). two other participants found meaning in the term by associating it with a known library website, such as that of their academic library or local physical public library. when asked what websites seem similar in function or appearance to what they would consider an internet public library, responses varied. while most participants could not name any similar website or service, one mentioned several academic library websites that he was familiar with, another described several bookseller websites (amazon.com, half.com, and abebooks.com), and a third mentioned wikipedia (but then immediately retracted the statement, after deciding that wikipedia was not a library). theme 2: quick and easy, but still credible participants were highly enthusiastic about the perceived benefits in access to and credibility of information from an internet public library. ease of use and faster information access, often from home, were key motivators for use of internet-based libraries, both public and academic. as described earlier, there is a wealth of competing information options freely available on the internet. given this, participants felt that an internet public library would offer the most value because of its credible information: i like the ready reference [almanacs, encyclopedias]. . . . i’m not used to using any of these, wikipedia is just so ready and user friendly. it’s so easy to go to wikipedia but it’s not necessarily credible. . . . whereas i feel like this is definitely credible. it’s something i could use if i needed to in some sort of academic setting. (p10) theme 3: lack of differentiation between public and academic; physical and digital libraries for many participants, there was confusion about what was or was not a public library, and they initially considered their academic library in that category. overall, participants did not think of public and academic libraries (physical or on the internet) as distinctly different; rather they were more likely to be associated with phase of life. participants that were not current public library users reported using public libraries frequently during their years of elementary education. for participants that were current public library users, physical public libraries (and other local academic libraries) were used to fill in the gaps current public library users, and four (p2, p3, p7, and p10) had used public libraries in the past but were no longer using their services. two participants were graduate students (p3 and p9) with the remainder undergraduates, and two of the ten students had used the ipl website before (p3 and p6). the participants could be characterized as relatively infrequent public library users with a strong interest in the physical book holdings of the public library, primarily for leisure but frequently for research as well. several participants mentioned scholarly databases that were provided by their public library (typically from within the library or online with access using a public library card). 
there was also interest in leisure audiovisual offerings and in using the library as a destination for leisure. the following themes illustrate our main findings with respect to our research questions. as described above, we conceptualized our raw data into broad themes through an iterative process of inductive coding and analysis. although multiple themes emerged as associated with each of our research questions, we present only the most important and relevant themes (see table 2). all themes were supported by responses from multiple participants. we will further elaborate the themes discovered later in this section; a selected relevant and meaningful participant quote illustrates each theme. theme 1: confusion about name “internet public library” was not an immediately clear term to four of the participants; the six other participants were able to immediately begin describing their concept table 1. inductive coding scheme developed from raw transcript text, used to identify key themes coding scheme physical public libraries tied to life phase confusion between academic and public current use frequency of use perceptions of an internet public library access properties of physical libraries reference resources tools users general internet use academic library use similar sites to ipl 20 information technology and libraries | march 2011 would contain both electronic online items and locally available items in physical formats. in particular, connections to local physical libraries to share item holdings and availability status were desired: “general book information and maybe a list of where books can be found. like online, the local place you can find the books.” (p7) given that information-seeking, for this group, was conducted indiscriminately across physical and digital libraries, this integrated view into local physical resources seems to be a natural request. theme 6: personal and personalized help although no participants claimed that reference was a service that they typically use during their physical public library experiences, it was a strong expectation for an internet public library and mentioned by nearly every participant. when questioned as to how this reference interaction should take place, there was a clear preference for communicating via instant message: “reference information. . . . you know, where you have real people. a place where you can ask questions. . . . if you think you can get an answer at a library, then online you would hope to get the same things.” (p1) in addition to being able to interact with a “real” librarian, participants desired other personalized elements, such as resources and services dedicated to information needy populations (like children) as well as resources supporting the community and personal lifestyle issues and topics (like health and money). ■■ discussion in summary, we characterized the participants in this case study as low-frequency physical public library users with a high association between life phase (high school or grade school) and public library use. participants looked to public libraries to provide physical books—primarily for leisure but often for research use as well—leisure dvds and cds, scholarly databases, and a space to “hang for items that could not be located at their school’s academic library, either through physical or digital offerings. consistent with this finding, a few participants reported conducting searches across both local academic and public libraries in pursuit of a particular item. 
there was a general disregard for where the item came from, as long as it could be acquired with relatively little effort from physically close local or online resources. however, participants reported typically starting with their academic libraries for school resources and the public libraries for leisure materials “i go to the philadelphia public library probably once a month or so usually for dvds but sometimes for books that i can’t find here [academic library]. . . . i usually check here first because it’s closer.” (p5) theme 4: electronic resources, catalog, and searching tools are key there were many participant comments, and some confusion, around what type of resources an internet public library would provide, as well as whether they would be free or not (one participant assumed there would be a fee to read online). the desired resources (in order of importance) included leisure and research e-books, scholarly databases, online magazines and newspapers, and dvds and cds (pointers to where those physical items could be found in local libraries). a few comments were negative, assuming the resources provided would only be electronic, but participants were mostly enthusiastic about the types and breadth of resources that such a website would offer. for example, one participant commented, “i think you could get more resources. . . . the library i usually visit is kind of small so it’s very limited in the range of information you can find.” (p4) many participants emphasized the importance of providing robust, yet easy-to-use, search tools in managing complex information spaces and conveying item availability. theme 5: connections to physical libraries several participants assumed that the resource collection table 2. themes identified research question themes identified what is the public perception of an internet public library? confusion about name quick and easy, but still credible lack of differentiation between public and academic; physical and digital libraries what services and materials would such a website offer? electronic resources, catalog, and searching tools are key connections to physical libraries personal and personalized help the internet public library (ipl) | maceli, wiedenbeck, and abels 21 infosphere—their services and collections both physical and virtual.25 this is, like many issues in library systems design, a complex challenge. as previous research has shown, extending the metaphor of the physical library into the digital environment does not always assist users, especially when they may be more likely to draw on previous experiences with other internet resources.26 the original prospectus for the internet public library, as developed by joe janes, acknowledges the different capabilities of physical libraries and libraries on the internet, claiming that the ipl would “be a true hybrid, taking the best from both worlds but also evolving its own features.”27 if users anticipate an experience similar to the internet resources they typically use (such as search engines), then the ipl may best serve its users by moving closer to “internet” than “library.” however, such a choice may entail unforeseen tradeoffs. several participants in this study mused over what physical public library characteristics would carry over to a digital public library and the potential tradeoffs: “you wouldn’t have to leave your home but at the same time i think it’s easier to wander the library and just see things that catch your eye. 
and i like the quiet setting of the library too.” (p8) another participant mentioned the distinctly positive public library experience, and how such an experience should be reflected in an internet-based public library: “i think that public libraries have a very positive reputation within communities. and i don’t think it would be bad for an internet public library to move toward that expectation that people have.” (p3) the question remains, then, whether the ipl can compete with a multitude of other internet resources without losing the familiar and positive essence of a traditional physical public library. or rather, how can the ipl find a way to translate that essence to a digital environment without sacrificing performance and user expectations of internet services? ■■ conclusion during this study, participants described an internet public library that, in many ways, takes the best features of several currently existing and popular websites. an internet public library should contain all the information of wikipedia, yet be as credible as information received directly from your local librarian. it should search across both websites and physical holdings, like abebooks.com or a search aggregator. it should search as powerfully and as easily as google, yet return fewer, more targeted results. and it should provide real-time help immediately and conveniently, all from the comfort of your home. out” or occupy leisure time. for the participants, an internet public library (an occasionally confusing term) described a service you could access from home, which included electronic books, information about locally available physical books, scholarly databases, reference or help services, and robust search tools. it must be easy to use and tailored to needy community populations such as children and teens. for several participants it would be similar to existing bookseller websites (such as amazon. com or abebooks.com) or academic library websites. in exploring how these findings can inform the future design and direction of the ipl, it is again necessary to reflect on the values and concepts that inspired the original creation of the ipl. the initial choice of the ipl’s name was intended to reflect a novel system at the time, as joe janes detailed in the ipl prospectus: “i would view each of those three words as equally important in conveying the intent of this project: internet, public, and library. i think the combination of the three of them produces something quite different than any pair or individual might suggest.”23 all three of these concepts—internet, public, and library—have evolved with the changing nature of the internet. and, as the research explored would indicate, there may not be a distinct boundary between these concepts from the perspective of users. our finding that participants seek information by indiscriminately crossing public and academic libraries, as well as digital and physical resource formats verifies earlier research efforts.24 as the amount of information accessible on the internet has expanded, the boundary of the library can be seen as either expanding (providing credible indexing, pointers, and information about useful resources from all over the internet), contracting (primarily providing access to select resources that must be accessed through subscription), or existing somewhere in between, depending on the perspective. 
in any of these cases, it is vital that the ipl present its resources, services, and offerings such that its value and contribution to information-seeking is highlighted and clear to users. amorphously placed in a complex world of digital and physical information, the ipl must work toward creating a strong image of its offering and mission; an image that is transparent to its users, starting with its name. this challenge is not the ipl’s alone, but rather that of all internet library portals, resources, and services. the 2005 oclc report on perceptions of libraries expressed the importance of a strengthened image for internet libraries: libraries will continue to share an expanding infosphere with an increasing number of content producers, providers and consumers. information consumers will continue to self-serve from a growing information smorgasbord. the challenge for libraries is to clearly define and market their relevant place in that 22 information technology and libraries | march 2011 library,” journal of electronic publishing 3, no. 2 (1997). 8. david s. carter and joseph janes, “unobtrusive data analysis of digital reference questions and service at the internet public library: an exploratory study,” library trends 49, no. 2 (2000): 251–65. 9. on the ipl’s history and funding, see barbara hegenbart, “the economics of the internet public library,” library hi tech 16, no. 2 (1998): 69–83; joseph janes, “serving the internet public: the internet public library,” electronic library 14, no. 2 (1996): 122–26; and carter and janes, “unobtrusive data analysis,” 251–65. on digital reference and ipl’s question-andanswer service, see kenneth r. irwin, “professional reference service at the internet public library with ‘freebie’ librarians,” searcher—the magazine for database professionals 6, no. 9 (1998): 21–23; nettie lagace and michael mcclennen, “questions and quirks: managing an internet-based distributed reference service,” computers in libraries 18, no. 2 (1998): 24–27; sara ryan, “reference service for the internet community: a case study of the internet public library reference division,” library & information science research 18, no. 3 (1996): 241–59; and elizabeth shaw, “real time reference in a moo: promise and problems,” internet public library, http://www.ipl.org/div/iplhist/moo .html (accessed dec. 4, 2008). on the ipl’s resources and collections, see thomas pack, “a guided tour of the internet public library—cyberspace’s unofficial library offers outstanding collections of internet resources,” database 19, no. 5 (1996): 52–56. 10. joseph janes, the internet public library handbook (new york: neal schuman, 1999). 11. carter and janes, “unobtrusive data analysis,” 251–65. 12. gloria j. leckie and lisa m. given, “understanding information-seeking: the public library context,” advances in librarianship 29 (2005): 1–72. 13. james rettig, “reference service: from certainty to uncertainty,” advances in librarianship 30 (2006): 105–43. 14. eileen abels, “information seekers’ perspectives of libraries and librarians,” advances in librarianship 28 (2004): 151–70. 15. ibid., 168. 16. cathy de rosa et al., “perceptions of libraries.” 17. cathy de rosa et al., “college students’ perceptions of libraries.” 18. george d’elia et al., “the impact of the internet on public library use: an analysis of the current consumer market for library and internet services,” journal of the american society for information science & technology 53, no. 10 (2002): 802–20. 19. 
anna maria tammaro, “user perceptions of digital libraries: a case study in italy,” performance measurement & metrics 9, no. 2 (2008): 130–37. 20. gwyneth h. crowley et al., “user perceptions of the library’s web pages: a focus group study at texas a&m university,” the journal of academic librarianship 28, no. 4 (2002): 205–10. 21. adam feldman, e-mail to author, apr. 3, 2009; mark galloway, e-mail to author, apr. 3, 2009. 22. for information on inductive qualitative analysis, see david r. thomas. “a general inductive approach for analyzing qualitative evaluation data” american journal of evaluation 27, no. 2 (2006): 237–46; michael quinn patton, qualitative research and evaluation methods (thousand oaks, calif.: sage, 2002); these are clearly complex, far-reaching, and labor-intensive requirements. and many of these requirements are currently difficult and unresolved challenges to digital libraries in general, not simply the ipl. this preliminary study is limited in its college student participant base and small sample size, which may not reflect perspectives of the greater community of ipl users. these results therefore may not be generalizable to other populations who are current or potential users of the ipl, including other targeted groups such as children and teens. additionally, our chosen participant group, college students who are physical library users, had relatively high levels of library and technology experience, as well as complex expectations. our results would likely differ with a participant group of novice internet users. as detailed above, this study explores public perceptions of an internet public library—an important aspect of the ipl that is not well studied and that has implications on ipl use and repeat use. while the ipl was carefully and thoughtfully constructed by a dedicated group of librarians, students, and educators, there has not been a recent study devoted to understanding what an internet public library should be today. more recently, in january 2010, the ipl merged with the librarians’ internet index to form ipl2. the two collections were merged and the website was redesigned. although this merger was because of circumstances unrelated to our research, our findings were leveraged during the redesign (for example, in naming the collections). in the future, our findings can be used in further ipl2 design iterations or explored in subsequent research studies in the specific context of ipl2 or of digital libraries in general. as discussed above, this study may be extended to different participant populations and to existing but remote ipl2 users. this study may also be continued in a more design-oriented direction to explore the usability and user acceptance of ipl2’s website. references 1. joseph janes, “the internet public library: an intellectual history,” library hi tech 16, no. 2 (1998): 55–68. 2. “about the internet public library,” internet public library, http://ipl.org/div/about/ (accessed feb. 17, 2009). 3. cathy de rosa et al., “perceptions of libraries and information resources,” oclc online computer library center, 2005, http://www.oclc.org/reports/pdfs/percept_all .pdf (accessed mar. 9, 2009); cathy de rosa et al., “college students’ perceptions of libraries and information resources,” oclc online computer library center, 2005, http://www .oclc.org/reports/pdfs/studentperceptions.pdf (accessed mar. 9, 2009). 4. janes, “the internet public library,” 55. 5. ibid., 56. 6. ibid., 57. 7. 
lorrie lejeune, “before its time: the internet public the internet public library (ipl) | maceli, wiedenbeck, and abels 23 american society for information science & technology 58, no. 3 (2007): 433–45. 25. de rosa et al., “college students’ perceptions of libraries,” 146. 26. makri et al., “a library or just another information resource?” 434. 27. joseph janes, “the internet public library,” 56. and matthew b. miles and michael huberman, qualitative data analysis: an expanded sourcebook, 2nd ed. (thousand oaks, calif.: sage, 1994). 23. janes, “the internet public library,” 56. 24. for example, stephann makri et al., “a library or just another information resource? a case study of users’ mental models of traditional and digital libraries,” journal of the appendix. interview protocol ■■ have you ever visited a public library? ■❏ if so, how often do you visit and why? ■❏ what services do you typically use? ■❏ can you describe your last visit and what you were looking for? ■❏ what do you think an internet public library would be? ■■ what sort of services would it offer? ■■ what else should it do? ■■ have you ever visited an internet public library? jeng ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ 
฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ 
฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀ ฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀฀ ฀฀฀฀฀฀฀฀฀฀฀฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ 
for those producers of content who are not able to meet the requirements of ingest, or who do not have access to an oais archive provider, what are the options? with the recent downturn in the economy, the availability of staff and the funding for the support of digital libraries has no doubt left many collections at risk of abandonment. is there a method for preparation of content for long-term storage that is within the reach of existing staff with few technical skills? if the content cannot get to the safe harbor of a trusted digital library, is it consigned to extinction? or are there steps we can take to mitigate the potential loss? the oais model incorporates six functional entities: ingest, data management, administration, preservation planning, archival storage, and access.9 of these six, only archival storage is primary; all the others are useless without the actual content. and if the content cannot be accessed in some form, the storage of it may also be useless. therefore the minimal components that must be met are those of archival storage and some form of access. the lowest cost and simplest option for archival storage currently available is the distribution of multiple copies dispersed across a geographical area, preferably on different platforms, as recommended by the current lockss initiative,10 which focuses on bit-level preservation.11 private lockss network models (such as the alabama digital preservation network)12 are the lowest-cost implementation, requiring only hardware, membership in lockss, and a small amount of time and technical expertise. reduction of the six functional entities to only two negates the need for a tremendous amount of metadata collection. in contrast, other leaders of the digital preservation movement have been stating for years that benign neglect is not a workable solution for digital materials. eric van de velde, director of caltech's library information technology group, stated that the "digital archive must be actively managed."3 tom cramer of stanford university agrees: "benign neglect doesn't work for digital objects. preservation requires active, managed care."4 the digital preservation europe website argues that benign neglect of digital content "is almost a guarantee that it will be inaccessible in the future."5 abby smith goes so far as to say that "neglect of digital data is a death sentence."6 arguments to support this statement are primarily those of media or data carrier storage fragility and obsolescence of hardware, software, and format. however, the impact of these arguments can be reduced to a manageable nightmare. by removing as much as possible of the intermediate systems, storing open-source code for the software and operating system needed for access to the digitized content, and locating archival content directly on the file system itself, we reduce the problems to primarily that of format obsolescence. this approach will enable us to forge ahead in the face of our lack of resources and our rather desperate need for rapid, cheap, and pragmatic solutions.
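the bit-level preservation that lockss-style replication provides rests on a simple idea: keep complete copies of the same content in several places and routinely compare checksums so that silent corruption in one copy can be detected and repaired from another. the following is a minimal sketch of that idea only; it is a local fixity audit written for illustration, not the lockss software, and the directory paths are hypothetical.

```python
import hashlib
from pathlib import Path

def checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hex digest of one file, read in chunks so large masters fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def manifest(root):
    """Map each file's path, relative to one copy's root, to its checksum."""
    root = Path(root)
    return {str(p.relative_to(root)): checksum(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def audit(copy_a, copy_b):
    """Report files missing from one copy and files whose bits differ."""
    a, b = manifest(copy_a), manifest(copy_b)
    missing = sorted(set(a) ^ set(b))
    damaged = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return missing, damaged

if __name__ == "__main__":
    # Hypothetical on-site and off-site replicas of the same archival tree.
    missing, damaged = audit("/archive/onsite", "/mnt/offsite/archive")
    print("missing from one copy:", missing)
    print("checksum mismatches:", damaged)
```

roughly speaking, a private lockss network performs an audit of the same character, but by polling among geographically dispersed peer caches rather than by mounting both copies on one machine.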
current long-term preservation archives operating within the open archival information system (oais) model assume that producers can meet the requirements of ingest.7 however, the amount of content that needs to be deposited into archives and the expanding variety of formats and genres that are unsupported are overwhelming the ability of depositors to prepare content for preservation. andrea goethals of harvard proposed that we revisit assumptions of producer ability to prepare content for deposit in accordance with the current best practices.8
benign neglect: developing life rafts for digital content
in his keynote speech at the archiving 2009 conference in arlington, virginia, clifford lynch called for the development of a benign neglect model for digital preservation, one in which as much content as possible is stored in whatever manner available in hopes of there someday being enough resources to more properly preserve it. this is an acknowledgment of current resource limitations relative to the burgeoning quantities of digital content that need to be preserved. we need low cost, scalable methods to store and preserve materials. over the past few years, a tremendous amount of time and energy has, sensibly, been devoted to developing standards and methods for best practices. however, a short survey of some of the leading efforts clarifies for even the casual observer that implementation of the proposed standards is beyond many of those who are creating or hosting digital content, particularly because of restrictions on acceptable formats, requirements for extensive metadata in specific xml encodings, need for programmers for implementation, costs for participation, or simply a lack of a clear set of steps for the uninitiated to follow (examples include: planets, premis, dcc, caspar, irods, sound directions, hathitrust).1 the deluge of digital content, coupled with the lack of funding for digital preservation and exacerbated by the expanding variety of formats, makes the application of extensive standards and extraordinary techniques beyond the reach of the majority. given the current circumstances, lynch says, either we can seek perfection and store very little, or we can be sloppy and preserve more, discarding what is simply intractable.2 jody l. deridder (jlderidder@ua.edu) is head, digital services, university of alabama. one of the drawbacks to gathering and storing content developed during digitization is that developing digital libraries usually have a highly chaotic disorganization of files, directory structures, and metadata that impede digital preservation readiness.19 if the archival digital files cannot be easily and readily associated with the metadata that provides their context, and if the files themselves are not organized in a fashion that makes their relationships transparent, reconstruction of delivery at some future point is seriously in question. underfunded cultural heritage institutions need clear specifications for file organization and preparation that they are capable of meeting without programming staff or extensive time commitments. particularly in the current economic downturn, few institutions have the technical skills to create mets wrappers to clarify file relationships.20 one potential solution is to use the organization of files in the file system itself to communicate clearly to future archivists how the files relate to one another.
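a minimal sketch of what such file-system-based organization might look like follows: a small routine that derives the directory path from an item's identifiers and drops the metadata file beside the master it describes. the identifiers and file names here are invented placeholders, not the university of alabama scheme described next, which uses its own numbering conventions.

```python
from pathlib import Path

def store_item(archive_root, institution, collection, item, sequence,
               master_bytes, metadata_text):
    """Write an archival master and its matching metadata file into a directory
    tree whose path mirrors the file name, so their relationship is legible
    from the file system alone. All identifiers here are hypothetical."""
    name = f"{institution}_{collection}_{item}_{sequence:04d}"
    folder = Path(archive_root) / institution / collection / item
    folder.mkdir(parents=True, exist_ok=True)
    (folder / f"{name}.tif").write_bytes(master_bytes)    # archival master
    (folder / f"{name}.txt").write_text(metadata_text)    # matching metadata
    return folder / f"{name}.tif"

if __name__ == "__main__":
    # produces /archive/u0001/0002/0003/u0001_0002_0003_0004.tif beside
    # /archive/u0001/0002/0003/u0001_0002_0003_0004.txt (hypothetical paths)
    store_item("/archive", "u0001", "0002", "0003", 4,
               master_bytes=b"...image data...",
               metadata_text="title: sample page\ncollection: hypothetical papers\n")
```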
at the university of alabama, we have adopted a standardized file naming system that organizes content by the holding institution and type, collection, item, and then sequence of delivery (see figure 1). the file names are echoed in the file system: top level directories match the holding institution number sequence, secondary level directory names match the assigned collection number sequence, and so forth. metadata and documentation are stored at whatever level in the file system corresponds to the files to which they apply, and these text and xml files have file names that also correspond to the files to which they apply, which assists further in identification (see figure 2).21 by both naming and ordering the files according to the same system, and bypassing the need for databases, complex metadata schemes and software, we leverage the simplicity of the file system to bring order to chaos and to enable our content to be easily reconstructed by future systems. the relay principle states that a preservation system should support its own migration. preserving any type of digital information requires preserving the information's context so that it can be interpreted correctly. this seems to indicate that both the intellectual context and the logical context need to be provided. context may include provenance information to verify authenticity, integrity, and interpretation;17 it may include structural information about the organization of the digital files and how they relate to one another; and it should certainly include documentation about why this content is important, for whom, and how it may be used (including access restrictions). because the cost of continued migration of content is very high, a method of mitigating that cost is to allow content to become obsolete but to support sufficient metadata and contextual information to be able to resurrect full access and use at some future time—the resurrection principle. to be able to resurrect obsolete materials, it would be advisable to store the content with open-source software that can render it, an open-source operating system that can support the software, and separate plain-text instructions for how to reconstruct delivery. in addition, underlying assumptions of the storage device itself need to be made explicit if possible (type of file system partition, supported length of file names, character encodings, inode information locations, etc.). some of the need for this form of preservation may be diminished through such efforts as the planets timecapsule deposit.18 this consortium has gathered the supporting software and information necessary to access current common types of digital files (such as pdf) for long-term storage in swiss fort knox. where the focus has been on what is the best metadata to collect, the question becomes: what is the minimal metadata and contextual information needed? the following is an attempt to begin this conversation in the hope that debate will clarify and distill the absolutely necessary and specific requirements to enable long-term access with the lowest possible barrier to implementation. if we consider the purpose of preservation to be solely that of ensuring long-term access, it is possible to selectively identify information for inclusion.
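one way to make that question concrete is to write the answer down next to the content itself. the sketch below emits the kind of plain-text context file the resurrection principle seems to call for; the field list and sample values are illustrative only and are not drawn from any published metadata standard.

```python
from pathlib import Path

# An illustrative minimum of context: enough that a future archivist can judge
# why the material matters, how it is organized, and how to bring it back.
# Field names and sample values are placeholders, not a standard.
CONTEXT = [
    ("what this is", "scanned city council minutes, 1902-1918 (hypothetical)"),
    ("why it matters and to whom", "primary source for local government history"),
    ("provenance", "digitized in 2010 from bound volumes held by the city clerk"),
    ("access restrictions", "none; public domain"),
    ("how the files relate", "one directory per volume; one tiff per page, numbered in reading order"),
    ("rendering software", "any tiff viewer; source for an open-source viewer stored in software/"),
    ("storage assumptions", "ext3 file system, utf-8 file names, names under 255 bytes"),
]

def write_context(directory):
    """Drop a human-readable context.txt beside the content it describes."""
    text = "\n".join(f"{label}: {value}" for label, value in CONTEXT) + "\n"
    Path(directory, "context.txt").write_text(text, encoding="utf-8")

write_context(".")
```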
the recent proposal by the researchers of the national geospatial digital archive (ngda) may help to direct our focus. they have defined three architectural design principles that are necessary to preserve content over time: the fallback principle, the relay principle, and the resurrection principle.13 in the event that the system itself is no longer functional, then a preservation system should support some form of hand-off of its content—the fallback principle. this can be met by involvement in lockss, as specified above. lacking the ability to support even this, current creators and hosts of digital content may be at the mercy of political or private support for ingest into trusted digital repositories.14 the recently developed bagit file package format includes valuable information to ensure uncorrupted transfer for incorporation into such an archive.15 each base directory containing digital files is considered a bag, and the contents can be any types of files in any organization or naming convention; the software tags the content (or payload) with checksums and manifest, and bundles it into a single archive file for transfer and storage. an easily usable tool to create these manifests has already been developed to assist underfunded cultural heritage organizations in preparing content for a hosting institution or government infrastructure willing to preserve the content.16 the gap of who would take and manage the content is still uncertain. while no programmers are needed to organize content into such a clear, consistent, and standardized order, we are developing scripts that will assist others who seek to follow this path. these scripts not only order the content, they also create lockss manifests at each level of the content, down to the collection level, so that the archived material is ready for lockss pickup. a standardized lockss plugin for this method is available. to assist in providing access without a storage database, we are also developing an open-source web delivery system (acumen),22 which dynamically collects content from this protected archival storage arrangement (or from web-accessible directories) and provides online delivery of cached derivatives and metadata, as well as webcrawler-enabled content to expand accessibility. this model of online delivery will enable low cost, scalable development of digital libraries by simply ordering content within the archival storage location. providing simple, clear, accessible methods of preparing content for preservation, of duplicating archival treasures in lockss, and of web accessibility without excessive cost or deep web database storage of content, will enable underfunded cultural heritage institutions to help ensure that their content will continue to survive the current preservation challenges. as david seaman pointed out, the more a digital item is used, the more it is copied and handled, the more it will be preserved.23 focusing on archival storage (via lockss) and accessibility of content fulfills the two most primary oais functional capabilities and provides a life raft option for those who are not currently able to surmount the forbidding tsunami of requirements being drafted as best practices for preservation. the importance of offering feasible options for the continued support of the long tail of digitized content cannot be overstated. while the heavily funded centers may be able to preserve much of the content under their purview, this is only a small fraction of the valuable digitized material currently facing dissolution in the black hole of our cultural memory. as clifford lynch pointed out, funding cutbacks at the sub-federal level are destroying access and preservation of government records; corporate records are winding up in the trash; news is lost daily; and personal and cultural heritage materials are disappearing as we speak.24 it is valuable and necessary to determine best practices and to seek to employ them to retain as much of the cultural and historical record as possible, and in an ideal world, these practices would be applied to all valuable digital content. but in the practical and largely resource-constrained world of most libraries and other cultural institutions, this is not feasible. the scale of content creation, the variety and geographic dispersal of materials, and the cost of preparation and support makes it impossible for this level of attention to be applied to the bulk of what must be saved. for our cultural memory from this period to survive, we need to communicate simple, clear, scalable, inexpensive options to digital holders and creators.
figure 1. university of alabama libraries digital file naming scheme (©2009. used with permission.)
figure 2. university of alabama libraries metadata organization (©2009. used with permission.)
references
1. planets consortium, planets preservation and long-term access through networked services, http://www.planets-project.eu/ (accessed mar. 29, 2011); library of congress, premis (preservation metadata maintenance activity), http://www.loc.gov/standards/premis/ (accessed mar. 29, 2011); dcc (digital curation centre), http://www.dcc.ac.uk/ (accessed mar. 29, 2011); caspar (cultural, artistic, and scientific knowledge for preservation, access, and retrieval), http://www.casparpreserves.eu/ (accessed mar. 29, 2011); irods (integrated rule-oriented data system), https://www.irods.org/index.php/irods:data_grids,_digital_libraries,_persistent_archives,_and_real-time_data_systems (accessed mar. 29, 2011); mike casey and bruce gordon, sound directions: best practices for audio preservation, http://www.dlib.indiana.edu/projects/sounddirections/paperspresent/sd_bp_07.pdf (accessed june 14, 2010); hathitrust: a shared digital repository, http://www.hathitrust.org/ (accessed mar. 29, 2011).
2. clifford lynch, challenges and opportunities for digital stewardship in the era of hope and crisis (keynote speech, is&t archiving 2009 conference, arlington, va., may 2009).
3. jane deitrich, e-journals: do-it-yourself publishing, http://eands.caltech.edu/articles/e%20journals/ejournals5.html (accessed aug. 9, 2009).
4. tom cramer, quoted in art pasquinelli, "digital libraries and repositories: issues and trends" (sun microsystems presentation at the summit bibliotheken, universitätsbibliothek kassel, 18–19 mar. 2009), slide 12, http://de.sun.com/sunnews/events/2009/bibsummit/pdf/2-art-pasquinelli.pdf (accessed july 12, 2009).
5. digital preservation europe, what is digital preservation? http://www.digitalpreservationeurope.eu/what-is-digital-preservation/ (accessed june 14, 2010).
6. abby smith, "preservation," in susan schreibman, ray siemens, john unsworth, eds., a companion to digital humanities (oxford: blackwell, 2004), http://www.digitalhumanities.org/companion/ (accessed june 14, 2010).
7. consultative committee for space data systems, reference model for an open archival information system (oais), ccsds 650.0-b-1 blue book, jan. 2002, http://public.ccsds.org/publications/archive/650x0b1.pdf (accessed june 14, 2010).
8. andrea goethals, "meeting the preservation demand responsibly = lowering the ingest bar?" archiving 2009 (may 2009): 6.
9. consultative committee for space data systems, reference model.
10. stanford university et al., lots of copies keep stuff safe (lockss), http://www.lockss.org/lockss/home (accessed mar. 29, 2011).
11. david s. rosenthal et al., "requirements for digital preservation systems: a bottom-up approach," d-lib magazine 11 (nov. 2005): 11, http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html (accessed june 14, 2010).
12. alabama digital preservation network (adpnet), http://www.adpn.org/ (accessed mar. 29, 2011).
13. greg janée, "preserving geospatial data: the national geospatial digital archive's approach," archiving 2009 (may 2009): 6.
14. research libraries group/oclc, trusted digital repositories: attributes and responsibilities, http://www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf (accessed july 17, 2009).
15. andy boyko et al., the bagit file packaging format (0.96) (ndiipp content transfer project), http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf (accessed july 18, 2009).
16. library of congress, bagit library, http://www.digitalpreservation.gov/partners/resources/tools/index.html#b (accessed june 14, 2010).
17. andy powell, pete johnston, and thomas baker, "domains and ranges for dcmi properties: definition of the dcmi term provenance," http://dublincore.org/documents/domain-range/index.shtml#provenancestatement (accessed july 18, 2009).
18. planets consortium, planets time capsule—a showcase for digital preservation, http://www.ifs.tuwien.ac.at/dp/timecapsule/ (accessed june 14, 2010).
19. martin halbert, katherine skinner, and gail mcmillan, "avoiding the calf-path: digital preservation readiness for growing collections and distributed preservation networks," archiving 2009 (may 2009): 6.
20. library of congress, metadata encoding and transmission standard (mets), http://www.loc.gov/standards/mets.
21. jody l. deridder, "from confusion and chaos to clarity and hope," in digitization in the real world: lessons learned from small to medium-sized digitization projects, ed. kwong bor ng and jason kucsma (metropolitan new york library council, n.y., 2010).
22. tonio loewald and jody deridder, "metadata in, library out. a simple, robust digital library system," code4lib journal 10 (2010), http://journal.code4lib.org/articles/3107 (accessed aug. 29, 2010).
23. david seaman, "the dlf today" (keynote presentation, 2004 symposium on open access and digital preservation, atlanta, ga.), paraphrased by eric lease morgan in musings on information and librarianship, http://infomotions.com/musings/openaccesssymposium/ (accessed aug. 9, 2009).
24. lynch, challenges and opportunities.
guest editorial clifford lynch information technology and libraries | march 2012
congratulations lita and information technology and libraries.
since the early days of the internet, i’ve been continually struck by the incredible opportunities that it offers organizations concerned with the creation, organization, and dissemination of knowledge to advance their core missions in new and more effective ways. libraries and librarians were consistently early and aggressive in recognizing, seizing, and advocating for these opportunities, though they’ve faced—and continue to face—enormous obstacles ranging from copyright laws to the amazing inertia of academic traditions in scholarly communication. yet the library profession has been slow to open up access to the publications of its own professional societies, to take advantage of the greater reach and impact that such policies can offer. making these changes is not easy: there are real financial implications that suddenly seem very serious when you are a member of a board of directors, charged with a fiduciary duty to your association, and you have to push through plans to realign its finances, organizational mission, and goals in the new world of networked information. so, as a long-time lita member, i find it a great pleasure to see lita finally reach this milestone with information technology and libraries (ital) moving to fully open-access electronic distribution, and i congratulate the lita leadership for the persistence and courage to make this happen. it’s a decision that will, i believe, make the journal much more visible, and a more attractive venue for authors; it will also make it easier to use in educational settings, and to further the interactions between librarians, information scientists, computer scientists, and members of other disciplines. on a broader ala-wide level, ital now joins acrl’s college & research libraries as part of the american library association’s portfolio of open-access journals. supporting ital as an open-access journal is a very good reason indeed to be a member of lita. clifford lynch (clifford@cni.org) is executive director, coalition for networked information. mailto:clifford@cni.org editor’s comments bob gerrity information technology and libraries | march 2012 4 welcome to the first issue of information technology and libraries (ital) as an open-access, eonly publication. as announced to lita members in early january, this change in publishing model will help ensure the long-term viability of ital by making it more accessible, more current, more relevant, and more environmentally friendly. ital will continue to feature high-quality articles that have undergone a rigorous peer-review process, but it will also begin expanding content to include more case studies, commentary, and information about topics and trends of interest to the lita community and beyond. look for a new scope statement for ital shortly. we’re pleased to include in this issue the winning paper from the 2011 lita/ex libris student writing award contest, abigail mcdermott’s overview on copyright law. we also have two lengthier-than-usual studies on library discovery services. the first, jason vaughan’s overview of his library’s investigations into web-scale discovery options, was accepted for publication more than a year ago, but due to its length did not make it into “print” until now, since we no longer face the constraints associated with the production of a print journal. the second study, by jody condit fagan and colleagues at james madison university, focuses on discovery-tool usability. 
jimmy ghaphery and erin white provide a timely overview of the results of their surveys on the use and management of web-based research guides. tomasz neugebauer and bin han offer a strategy and workflow for batch importing electronic theses and dissertations (etds) into an eprints repository. with the first open-access, e-only issue launched, our attention will be turned to updating and improving the ital website and expanding the back content available. our goal is to have all of the back issues of both ital and its predecessor, journal of library automation (jola), openly available from the ital site. we’ll also be exploring ways to better integrate the italica blog and the ital preprints site with the main site. suggestions and feedback are welcome, at the e-mail address below. bob gerrity (robert.gerrity@bc.edu) is associate university librarian for information technology, boston college libraries, chestnut hill, massachusetts. selecting a web content management system for an academic library website | black 185 others. the osu libraries needed a content management system (cms). web content management is the discipline of collecting, organizing, categorizing, and structuring information that is to be delivered on a website. cmss support a distributed content model by separating the content from the presentation and giving the content provider an easy to use interface for adding content. but not just any cms would work. it was important to select a system that would work for the organization. the focus of this article is the process followed by osu libraries in the selection of a web cms. other aspects of the project, such as the creation of the user focused information architecture, the redesign of the site, the implementation of the cms, and the management of the project are outside the scope of this article. ■■ literature review content and workflow management for library web sites: case studies, a set of case studies edited by holly yu, a special issue of library hi tech dedicated to content management, and other articles effectively outlined the need for libraries to move from static websites, dominated by html webpages, to dynamic database and cms driven websites.1 each of these works noted the messy, unmanageable situation of the static websites in which the content is inconsistently displayed and impossible to maintain. seadle summarizes the case well when he wrote “a content management system (cms) offers a way to manage large amounts of web-based information that escapes the burden of coding all of the information into each page in html by hand.”2 a cms provides an interface for content providers to add their contributions to the website without requiring knowledge of html; it separates the layout and design of the webpages from the content and provides the opportunity for reuse of both content and the code running the site. these features of a cms permit a library to professionalize its website by enforcing a consistency of design across all pages while at the same time increasing efficiency by making the maintenance of the content itself less technically challenging.3 the potential of the cms is powerful, yet it is not an easy process to select and implement a cms. one challenge is that the process of selecting and implementing a cms is not a fully technical one. the selection must be tied to the goals and strategy of the library and parent elizabeth l. 
black selecting a web content management system for an academic library website this article describes the selection of a web content management system (cms) at the ohio state university libraries. the author outlines the need for a cms, describes the system requirements to support a large distributed content model and shares the cms trial method used, which directly included content provider feedback side-by-side with the technical experts. the selected cms is briefly described. i magine a city that has been inhabited consistently for hundreds, perhaps thousands of years. those arriving in the city’s main port follow clear, wide paths that are easy to navigate. soon, however, the visitor notices that the signs change. they look similar but the terms are different and the spaces have an increasingly different look. continuing further, the visitor is lost. some sections look drastically different, as if they belong in an entirely different city. other sections are abandoned. the buildings at first seem occupied, but upon closer inspection all is old and neglected. the visitor tries to head back to the main, clear sections but cannot find the way. in frustration, the visitor leaves the city and moves on, often giving up the mission that led to the city in the first place. this metaphor describes the state of the ohio state university (osu) libraries’ website at the beginning of this project. the website has many content providers, more than 150 at one point. these content providers were given accounts to ftp files to the web server and a variety of web editors with which to manage their files. the site consisted of more than 100,000 files of many types: html, php, image files, microsoft office formats, pdf, etc. the files with content were primarily static html files. in 2005, the osu libraries began to implement a php-based template that included three php statements that called centrally maintained files for the header, the main navigation, and the footer. the template also called a series of centrally controlled style sheets. the goal was to have the content providers add the body of the pages and leave the rest to be managed by these central files. this didn’t work as intended. because of a combination of page editing practices learned with static html and a variety of skill with cascading style sheets (css), many pages lost the central control of the header, menu, and footer. also, the template was confusing for many because they had to wade through a lot of code they didn’t understand. one part of this content model was right—giving the person with the content knowledge the power to update the content while centrally controlling parts that should remain consistent throughout the website. unfortunately, the technical piece of the model didn’t support this goal. it required too much technical knowledge from the content providers. the real solution was a system that would allow the content providers to focus on their content and leave the technical knowledge to elizabeth l. black (black. 367@osu.edu) is head, web implementation team, ohio state university libraries, columbus, ohio. 186 information technology and libraries | december 2011 and interviews/focus groups with the current content providers. the research was similar to that described previously in the literature review section of this article. 
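the principle behind the centrally managed header, navigation, and footer described above, namely that content providers supply only the body of a page while presentation stays under central control, can be shown in a few lines. the sketch below uses python's string templating rather than the php actually deployed at osu, and the template and fragments are invented for illustration.

```python
from string import Template

# Centrally maintained presentation: one template, edited in one place.
PAGE = Template("""<html>
<head><title>$title</title><link rel="stylesheet" href="/central/site.css"></head>
<body>
$header
$navigation
<main>$body</main>
$footer
</body>
</html>""")

# Central fragments standing in for the shared header, menu, and footer files.
HEADER = "<header>university libraries</header>"
NAVIGATION = "<nav>home | research | services | about</nav>"
FOOTER = "<footer>contact the libraries</footer>"

def render(title, body):
    """A content provider supplies only a title and a body; the rest is central."""
    return PAGE.substitute(title=title, header=HEADER, navigation=NAVIGATION,
                           footer=FOOTER, body=body)

print(render("course reserves", "<p>how to place materials on reserve.</p>"))
```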
the most helpful for this project was a 2006 issue of library hi tech focused on cmss.10 the most useful of these articles was wiggins, remley, and klingler’s article about the work done at kent state university, particularly the way in which they organized their requirements.11 a working group of four served as interviewers for the focus groups with current web content providers. they worked in pairs, with one serving as a recorder and the other as the facilitator, who asked the questions. fifteen interview sessions were held over a period of three months. the focus group participants were invited to participate in like groups as much as possible, so for example the foreign language librarians were interviewed together in a different session from the instruction librarians. however, no one participated more than once in an interview. the same set of guiding questions was used for each interview. they are included in the appendix. the results of these interviews became the basis for the requirements document to which the technical team added the technical requirements. ■■ the cms requirements the requirements were gathered into five categories: content creation and ownership, content management, publishing, presentation, and administration/technical. these categories were modeled after those used for the project at kent state university.12 the full list is detailed below by category. content creation and ownership requirements ■■ separation of content and presentation: the content owners can add and edit content without impact on the presentation ■■ web-based gui content-editing environment that is intuitive and easy to learn without knowledge of html ■■ metadata created and maintained for each webpage or equivalent content level that contains: ■❏ owner ■❏ subject terms or tags describing the content ■■ multi-user authoring without overwriting ■■ can handle a large number of content providers (approximately 200) ■■ can integrate rss and other dynamic content from other sources ■■ can handle different content types, including: ■❏ text ■❏ images organization, must meet specific local requirements for functionality, and must include revision of the content management environment, meaning new roles for the people involved with the website.4 karen coombs noted that “the implementation of a content management system dramatically changes the role of the web services staff” and requires training for the librarians and staff who are now empowered to provide the content.5 another challenge was and continues to be a lack of a turn-key library cms.6 several libraries that did a systematic requirements gathering process generally found that the readily available cmss did not meet their requirements, and they ended up writing their own applications.7 building a cms is not a project to take lightly, so only a select few libraries with dedicated in-house programming staff are able to take on such an endeavor. the sharing of the requirements of these in-house library specific cmss is valuable for other libraries in identifying their own requirements. in the past few years, the field of open-source cmss has increased, making it more likely that a library will find a viable cms in the existing marketplace that will meet the organization’s needs. drupal is an open-source cms that was one of the first viable options for libraries and so is widely used in the library community. 
it was the subject of an edition of library technology reports in 2008.8 since drupal opened the door for open-source cmss in libraries, others have entered the market as well. in 2009 john harney noted, “there are few technologies as prolific as web content management systems. some experts number these systems in the 80-plus range, and most would concede there are at least 50.”9 the cms selection process described here builds on those described in the literature by integrating their requirements and methods to address the needs of a very large decentralized website. it builds on the increased emphasis on user involvement in technology solution building and selection by fully incorporating the cms users in the selection process. further, the process described here took place after those described in the literature, after the opensource cms field had significantly improved. the options were much greater at the time of this study and this article describes the increased possibilities of second generation cmss. while there still does not exist the perfect library ready turn-key cms, there are many excellent, robust open-source cmss available. this article describes one process for selecting among them, including an in-depth trial of three major systems: drupal, modx, and silverstripe. ■■ gathering requirements there were two parts to the requirements gathering process undertaken at osu libraries: research of the literature selecting a web content management system for an academic library website | black 187 ■■ meets established usability standards ■■ dynamic navigation generated for main and subsections of website that includes breadcrumbs and menus ■■ searching ■❏ search engine for website ■❏ can pass searches on to other library web servers ■■ mobile device friendly version (optional) ■■ delivery presentable in the browsers used most heavily by osu libraries websites visitors ■■ page load time is acceptable ■■ easy search engine optimization administration ■■ lamp (linux, apache, mysql, php) platform ■■ good documentation for technical support and end users ■■ scalable in terms of both content and traffic ■■ skills required to maintain system are available at osul the next step was to take this extensive requirements list and identify cmss that would be appropriate for a side-by-side test with both content providers and systems engineers. ■■ cms trial the web cms would become a critical part of the web infrastructure so it was important to ensure selection of the best system for both the content providers and the it team. between may 21 and august 29, 2008, two groups worked with the cmss, testing them on criteria taken from the initial requirements documents. the first team included fourteen content providers with diverse content areas and diverse technical skills; this group rated each system on a content providers set of criteria. the second team, which included the systems engineer and a technical support specialist, rated each system on a set of criteria that was more technical in nature. each participant used a microsoft excel spreadsheet containing requirements condensed from the full list. they rated each system on a scale of 1 to 3 for each criterion, where 1 was difficult, 2 was moderate, and 3 was easy. the project manager in the it web team led the trial. 
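the arithmetic behind the comparison is simple: every rating is a 1, 2, or 3, and each group's ratings for a system are pooled and averaged into one score per system. the numbers below are invented for illustration and are not the actual trial data reported in figure 1.

```python
from statistics import mean

# Hypothetical ratings: each inner list is one tester's 1-3 scores for one
# system across several criteria (1 = difficult, 2 = moderate, 3 = easy).
content_provider_ratings = {
    "drupal":       [[2, 1, 2, 3], [2, 2, 1, 2]],
    "modx":         [[2, 2, 2, 2], [3, 2, 2, 1]],
    "silverstripe": [[3, 3, 2, 3], [3, 2, 3, 3]],
}

def group_score(ratings_by_system):
    """Pool every criterion rating from every tester and average per system."""
    return {system: round(mean(score for tester in testers for score in tester), 2)
            for system, testers in ratings_by_system.items()}

print(group_score(content_provider_ratings))
# e.g. {'drupal': 1.88, 'modx': 2.0, 'silverstripe': 2.75}
```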
the criteria given to the content providers were: ■■ web gui intuitiveness ■■ media integration ■■ editor ease of use ■■ ability to add content ■■ ability to preview content ■■ ability to publish content ■■ metadata storage ■❏ videos ■❏ camtasia/captivate tutorials ■❏ flash files ■■ content owners can create tables and/or databases for display of tabular data in the web gui interface ■■ content owners can create forms in the web gui interface ■■ option for faculty and professional staff to have webpages featuring their work and their profiles ■❏ all staff must have some control over the personal information available about them on the public website content management ■■ link maintenance ■❏ does not allow internal pages to be deleted if linked to by another cms page ■❏ can regularly check the viability of external links ■❏ periodic reminders to content owners to check their content ■■ way to repurpose content elements to multiple pages for content such as: ■❏ descriptions of article and research databases ■❏ highlight or feature content elements ■■ access controls ■❏ that allows content owners to only edit their content ■❏ that allow web liaisons to provide first line support for their departments ■❏ integrates into our existing security structures (shibboleth) ■■ robust reporting features ■❏ integration with quality web analytics software ■❏ content update tracking ■❏ system usage ■❏ customized report creation publishing ■■ ability to preview before publishing ■■ cms can produce rss feeds for dynamic sections of content ■■ page templates and style sheets are used to control page layout and design centrally ■■ display non-roman scripts using unicode ■■ extensible—can incorporate non-cms content into the site ■■ ability to add personalization options for site users presentation ■■ meets ada and w3c accessibility requirements ■■ code validates to current html specifications 188 information technology and libraries | december 2011 on the technical requirement that the system be easy to extend and integrate. a simple website served as the hub for the cms trial. the site included links to each cms instance, a link to the project blog for updates from the project team, and a link to a wiki space where trial participants shared ideas and thoughts with one another. the web team’s issue tracking system was integrated into this site so participants could easily ask questions of the technical team and report problems. time was set aside each week to handle all reported issues. of the sixteen participants who started the trial, thirteen completed a criteria spreadsheet. the project manager totaled and then averaged the scores provided by each group to determine the overall content provider score and the overall technical score (see figure 1). in the end, both the content providers and the systems engineers agreed that silverstripe was the cms that best met the requirements. ■■ silverstripe silverstripe was released as an open-source cms in november 2006 by silverstripe limited. they had developed the cms as part of their business of creating websites for clients. the company was founded in 2000 and is headquartered in wellington, new zealand. the company continues to use the cms for their website business and also offers paid support for the cms. 
the testers agreed that silverstripe provided the best match in the areas of easy content creation by multiple authors, handling multilingual content, management of different types of content and content files, search engine optimization, and meeting web standards. a strong and growing open-source community and strong documentation were additional keys to the selection of silverstripe.14 use by high-profile clients, such as the 2008 democratic national convention, provided proof that silverstripe could handle high traffic. the content providers praised silverstripe for the intuitive user interface, the system’s ease of use, specifically the ease of previewing and publishing content. they also noted that silverstripe handled the metadata supporting the pages as well as tabular and form page content better than the other systems. the technical evaluators noted silverstripe’s modular structure, which makes it flexible enough to integrate easily with existing web applications and accommodate local customizations without modifying the core system. silverstripe includes a template language, which fully separates the content from the presentation. in practice, this means that even informed users cannot spot a silverstripe website through simple web browsing, as is common with other cmss. ■■ ability to “feature items” ■■ ability to add rss feeds ■■ ability to enter tabular data ■■ ability to create forms ■■ testing area for new features the criteria given to the systems engineers were: ■■ installation ■■ maintainability ■■ technical documentation ■■ active developer community ■■ structure management (subsites/trees) ■■ access control/permissions ■■ link management ■■ ease of extensibility ■■ interoperability (data portability and web services) the cms requirements document was used in conjunction with the cmsmatrix (http://cmsmatrix.org) website to select five cmss to participate in a trial.13 the five systems selected were drupal 6.2, modx 0.9.6, silverstripe 2.2.2, plone 3.0.6, and typo3. the systems engineer installed all five cmss on a development server and did a simple configuration to make each operational for testing. it was at this stage that plone and typo3 were dropped from the trial because they took too long to configure and set up. the goal was to do a simple installation of the base cms, without any modules, but some systems were not functional as a cms without some modules so we added modules selectively. at the point of the selection of the systems for the trial, the project leaders noted that the entire list of requirements could not be met by an existing cms. they also noted that the majority of the key needs could be met with an existing system. therefore the goal remained to select an existing open-source web cms with the emphasis figure 1. cms trial scores selecting a web content management system for an academic library website | black 189 web guides in a content management system,” library hi tech 24, no. 1 (2006): 29–53; yan han, “digital content management: the search for a content management system,” library hi tech 22, no. 4 (2004): 355–65; david kane and nora hegarty, “new web site, new opportunities: enforcing standards compliance within a content management system,” library hi tech 25, no. 2 (2007): 276–87; ed salazar, “content management for the virtual library,” information technology & libraries 25, no. 3 (2006): 170–75. 2. seadle, “content management systems,” 5. 3. 
huttenlock, beaird, and fordham, “untangling a tangled web”; kane and hegarty, “new web site, new opportunities”; salazar, “content management for the virtual library.” 4. holly yu, “library web content management: needs and challenges,” in content and workflow management for library web sites: case studies, ed. holly yu, 1–21 (hersey, pa.: information science, 2005). 5. karen coombs, “navigating content management,” library journal 133 (winter 2008): 24. 6. yu, “library web content management,” 10. 7. goans, leach, and vogel, “beyond html”; salazar, “content management for the virtual library”; rick wiggins, jeph remley, and tom klingler, “building a local cms at kent state,” library hi tech 24, no. 1 (2006): 69–101; regina beach and miqueas dial, “building a collection development cms on a shoe-string,” library hi tech 24, no. 1 (2006): 115–25. 8. andy austin and christopher harris, library technology reports 44, no. 4 (may/june 2008). 9. john harney, “are open-source web content management systems a bargain?” infonomics 23, no. 3 (may/june 2009): 59–62. 10. library hi tech 24, no. 1 (2006). 11. wiggins, remley, and klingler, “building a local cms at kent state.” 12. ibid. 13. cms matrix, “the content management comparison tool,” http://cmsmatrix.org/ (accessed aug. 16, 2010). 14. silverstripe.org, “open source help & support,” http:// silverstripe.org/help-and-support/ (accessed aug. 16, 2010). ■■ conclusion an academic library website is a complex operation. the best ones use the strengths of the organization to their fullest: give web content authors direct access to maintain their content without burdening them with the requirement of technical expertise in html. excellent sites also offer a consistent user experience facilitated by centrally managed presentation. a web cms facilitates this model. the selection of a web cms is not solely a technical decision; it is most effective when made in partnership with the web content providers. the process followed by osu libraries described here provides an example of one such selection process. ■■ acknowledgements the author thanks james muir and jason thompson for their thoughtful contributions to this article and their exceptional work on the project. none of it would have been possible without them. references 1. holly yu, ed., content and workflow management for library web sites: case studies (hersey, pa.: information science, 2005); michael seadle, “content management systems,” library hi tech 24, no. 1 (2006): 5–7; terry l. huttenlock, jeff w. beaird, and ronald w. fordham, “untangling a tangled web: a case study in choosing and implementing a cms,” library hi tech 24, no. 1 (2006): 61–68; doug goans, guy leach, and teri m. vogel, “beyond html: developing and re-imagining library appendix. content provider focus interview questions each group interview included a series of questions, which could be modified depending on the direction in which the interviews progressed. these are the questions provided to the interviewers: 1. who is your audience? 2. how do you teach/communicate with each audience? 3. what types of information are you trying to communicate? 4. how dynamic or static is the information? 5. what are the most important resources in your discipline? 6. who do you teach most frequently? undergrads, grads? 7. where do you start your instruction: with library.osu.edu or the department? 8. how do you connect the users/audience to your resources? 9. what message do you want to deliver? 10. 
what is unique about your discipline/ needs/ department? 11. what would make things easier for you? 6 information technology and libraries | march 2011 i n the new lita strategic plan, members have suggested an objective for open access (oa) in scholarly communications. some people describe oa as articles the author has to pay someone to publish. that can be true, but that’s not how i think of it. oa is definitely not vanity publishing. most oa journals are peer-reviewed. i like the definition provided by enablingopenscholarship: open access is the immediate (upon or before publication), online, free availability of research outputs without any of the restrictions on use commonly imposed by publisher copyright agreements.1 my focus on oa journals increased precipitously when the licensing for a popular american weekly medical journal changed. we could only access online articles from one on-campus computer unless we increased our annual subscription payment by 500 percent. we didn’t have the funds, and now the students suffer the consequences. i think it was an unfortunate decision the journal’s publishers made. i know from experience that if a student can’t access the first article they want, they will find another one that is available. interlibrary loan is simpler than ever, but i think only the patient and curious students will make the effort to contact us and request an article they cannot obtain. in 2006 scientist gary ward wrote that faculty at many institutions experience problems accessing current research. when faculty teach “what is available to them rather than what their students most need to know, the education of these students and the future of science in the u.s. will suffer.” he explains it is a false assumption that those who need access to scientific literature already have it. interlibrary loans or pay-per-view are often offered by publishers as the solution to the access problem, but this misses an essential fact of how we use the scientific literature: we browse. it is often impossible to tell from looking at an abstract whether a paper contains needed methodological detail or the perfect illustration to make a point to one’s students. apart from considerations of cost, time, and quality, interlibrary loans and pay-per-views simply do not meet the needs of those of us who often do not know what we’re looking for until we find it.2 i want our medical students and tomorrow’s doctors to have access to all of the most current medical research. we offer the service of providing jama articles to students, but i’m guessing that we hear from a small percentage of the students who can’t access the full text online. are people reading oa articles? not only are scholars reading the articles, but they are citing those articles in their publications. consider the public library of science’s plosone (http://www.plosone.org/home.action), a peerreviewed, open-access, online publication that features reports on primary research from all disciplines within science and medicine. in june 2010, plosone received its first impact factor of 4.351—an impressive number. that impact factor puts plosone in the top 25 percent of the institute for scientific information’s (isi) biology category.3 the impact factor is calculated annually by isi and represents the average number of citations received per paper published in that journal during the two preceding years.4 in other words, articles from plosone published in 2008 and 2009 were highly cited. is oa making an impact in my medical library? 
is oa making an impact in my medical library? i believe it is, although i won't be happy until our students can access the online journals they want from off campus and the library won't have to pay outrageous licensing fees. we have more than one thousand oa journal titles in our list of online journals. the more full text they can access, the less they'll have to settle for their second or third choice because their first choice is not available online.

i'm glad that lita members included oa in their strategic plan. the number of oa journals is increasing, and i believe we will continue to see that the articles are reaching readers and making a difference. i don't think ital will be adopting the "author pays" model of oa, but the editorial board is dedicated to providing lita members with the access they want.

references

1. enablingopenscholarship, "enabling open scholarship: open access," http://www.openscholarship.org/jcms/c_6157/open-access?portal=j_55&printview=true (accessed jan. 18, 2011).
2. gary ward, "deconstructing the arguments against improved public access," newsletter of the american society for cell biology, nov. 2006, http://www.ascb.org/filetracker.cfm?fileid=550 (accessed jan. 18, 2011).
3. phil davis, "plos one: is a high impact factor a blessing or a curse?" online posting, june 21, 2010, the scholarly kitchen, http://scholarlykitchen.sspnet.org/2010/06/21/plosone-impact-factor-blessing-or-a-curse/ (accessed jan. 18, 2011).
4. thomson reuters, "introducing the impact factor," http://thomsonreuters.com/products_services/science/academic/impact_factor/ (accessed jan. 18, 2011).

cynthia porter (cporter@atsu.edu) is distance support librarian at a.t. still university of health sciences, mesa, arizona.

marc truitt
editorial: and now for something (completely) different

marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital.

the issue of ital you hold in your hands—be that issue physical or virtual; we won't even go into the question of your hands!—represents something new for us. for a number of years, ex libris (and previously, endeavor information systems) has generously sponsored the lita/ex libris (née lita/endeavor) student writing award competition. the competition seeks manuscript submissions from enrolled lis students in the areas of ital's publishing interests; a lita committee on which the editor of ital serves as an ex officio member evaluates the entries and names a winner. traditionally, the winning essay has appeared in the pages of ital. in recent years, perhaps mirroring the waning interest in publication in traditional peer-reviewed venues, the number of entrants in the competition has declined. in 2008, for instance, there were but nine submissions, and to get those, we had to extend the deadline six weeks from the end of february to mid-april. in previous years, as i understand it, there often were even fewer. this year, without moving the goalposts, we had—hold onto your hats!—twenty-seven entries. of these, the review committee identified six finalists for discussion. the turnout was so good, in fact, that with the agreement of the committee, we at ital proposed to publish not only the winning paper but the other finalist entries as well. we hope that you will find them as stimulating as have we.
even more importantly, we hope that by publishing such a large group of papers representing 2009's best in technology-focused lis work, we will encourage similarly large numbers of quality submissions in the years to come.

i would like to offer sincere thanks to my university of alberta colleague sandra shores, who as guest editor for this issue worked tirelessly over the past few months to shepherd quality student papers into substantial and interesting contributions to the literature. she and managing editor judith carter—who guest-edited our recent discovery issue—have both done fabulous jobs with their respective ital special issues. bravo!

■■ ex libris' sponsorship

in one of those ironic twists that one more customarily associates with movie plots than with real life, the lita/ex libris student writing award recently almost lost its sponsor. at very nearly the same time that sandra was completing the preparation of the manuscripts for submission to ala production services (where they are copyedited and typeset), we learned that ex libris had notified lita that it had "decided to cease sponsoring" the student writing award. a brief round of e-mails among principals at lita, ex libris, and ital ensued, with the outcome being that carl grant, president of ex libris north america, graciously agreed to continue sponsorship for another year and reevaluate underwriting the award for the future. we at ital and i personally are grateful.

carl's message about the sponsorship raises some interesting issues on which i think we should reflect. his first point goes like this:

it simply is not realistic for libraries to continue to believe that vendors have cash to fund these things at the same levels when libraries don't have cash to buy things (or want to delay purchases or buy the product for greatly reduced amounts) from those same vendors. please understand the two are tied together.

point taken and conceded. money is tight. carl's argument, i think, speaks as well to a larger, implied question. libraries and library vendors share highly synergistic and, in recent years, increasingly antagonistic relationships. library vendors—and i think library system vendors in particular—come in for much vitriol and precious little appreciation from those of us on the customer side. we all think they charge too much (and by implication, must also make too much), that their support and service are frequently unresponsive to our needs, and that their systems are overly large, cumbersome, and usually don't do things the way we want them done. at the same time, we forget that they are catering to the needs and whims of a small, highly specialized market that is characterized by numerous demands, a high degree of complexity, and whose members—"standards" notwithstanding—rarely perform the same task the same way across institutions. we expect very individualized service and support, but at the same time are penny-pinching misers in our ability and willingness to pay for these services. we are beggars, yet we insist on our right to be choosers. finally, at least for those of us of a certain generation—and yep, i count myself among its members—we chose librarianship for very specific reasons, which often means we are more than a little uneasy with concepts of "profit" and "bottom line" as applied to our world. we fail to understand the open-source dictum that "free as in kittens and not as in beer" means that we will have to pay someone for these services—it's only a question of whom we will pay.
carl continues, making another point:

i do appreciate that you're trying to provide us more recognition as part of this. frankly, that was another consideration in our thought of dropping it—we just didn't feel like we were getting much for it. i've said before and i'll say again, i've never, in all my years in this business had a single librarian say to me that because we sponsored this or that, it was even a consideration in their decision to buy something from us. not once, ever. companies like ours live on sales and service income. i want to encourage you to help make librarians aware that if they do appreciate when we do these things, it sure would be nice if they'd let us know in some real tangible ways that show that is true. . . . good will does not pay bills or salaries unless that good will translates into purchases of products and services (and please note, i'm not just speaking for ex libris here, i'm saying this for all vendors).

and here is where carl's and my views may begin to diverge. let's start by drawing a distinction between vendor tchotchkes and vendor sponsorship. in fairness, carl didn't say anything about tchotchkes, so why am i? i do so because i think that we need to bear in mind that there are multiple ways vendors seek to advertise themselves and their services to us, and geegaws are one such. trinkets are nice—i have yet to find a better gel pen than the ones given out at iug 14 (would that i could get more!)—but other than reminding me of a vendor's name, they serve little useful purpose. the latter, vendor sponsorship, is something very different, very special, and not readily totaled on the bottom line.

carl is quite right that sponsorship of the student writing award will not in and of itself cause me to buy aleph, primo, or sfx (oh right, i have that last one already!). these are products whose purchase is the result of lengthy and complex reviews that include highly detailed and painstaking needs analysis, specifications, rfps, site visits, demonstrations, and so on. due diligence to our parent institutions and obligations to our users require that we search for a balance among best-of-breed solutions, top-notch support, and fair pricing. those things aren't related to sponsorship. what is related to sponsorship, though, is a sense of shared values and interests. of "doing the right thing." i may or may not buy carl's products because of the considerations above (and yes, ex libris fields very strong contenders in all areas of library automation); i definitely will, though, be more likely to think favorably of ex libris as a company that has similar—though not necessarily identical—values to mine, if it is obvious that it encourages and materially supports professional activities that i think are important. support for professional growth and scholarly publication in our field are two such values. i'm sure we can all name examples of this sort of behavior: in addition to support of the student writing award, ex libris' long-standing prominence in the national information standards organization (niso) comes to mind.
so too does the founding and ongoing support by innovative interfaces and the library consulting firm r2 for the taiga forum (http://www.taigaforum.org/), a group of academic associate university librarians. to the degree that i believe ex libris or another firm shares my values by supporting such activities—that it "does the right thing"—i will be just a bit more inclined to think positively of it when i'm casting about for solutions to a technology or other need faced by my institution. i will think of that firm as kin, if you will. with that, i will end this by again thanking carl and ex libris—because we don't say thank you often enough!—for their generous support of the lita/ex libris student writing award. i hope that it will continue for a long time to come. that support is something about which i do care deeply. if you feel similarly—be it about the student writing award, niso, taiga, or whatever—i urge you to say so by sending an appropriate e-mail to your vendor's representative or by simply saying thanks in person to the company's head honcho on the ala exhibit floor. and the next time you are neck-deep in seemingly identical vendor quotations and need a way to figure out how to decide between them, remember the importance of shared values.

■■ dan marmion

longtime lita members and ital readers in particular will recognize the name of dan marmion, editor of this journal from 1999 through 2004. many current and recent members of the ital editorial board—including managing editor judith carter, webmaster andy boze, board member mark dehmlow, and me—can trace our involvement with ital to dan's enthusiastic period of stewardship as editor. in addition to his leadership of ital, dan has been a mentor, colleague, boss, and friend. his service philosophy is best summarized in the words of a simple epigram that for many years has graced the wall behind the desk in his office: "it's all about access!!"

because of health issues, and in order to devote more time to his wife diana, daughter jennifer, and granddaughter madelyn, dan recently decided to retire from his position as associate director for information systems and digital access at the university of notre dame hesburgh libraries. he also will pursue his personal interests, which include organizing and listening to his extensive collection of jazz recordings, listening to books on cd, and following the exploits of his favorite sports teams, the football irish of notre dame, the indianapolis colts, and the new york yankees. we want to express our deep gratitude for all he has given to the profession, to lita, to ital, and to each of us personally over many years. we wish him all the best as he embarks on this new phase of his life.

communications

tam dalrymple
"just-in-case" answers: the twenty-first-century vertical file

tam dalrymple (dalrympt@oclc.org) is senior information specialist at oclc, dublin, ohio.

this article discusses the use of oclc's questionpoint service for managing electronic publications and other items that fall outside the scope of oclc library's opac and web resources pages, yet need to be "put somewhere." the local knowledge base serves as both a collection development tool and as a virtual vertical file, with records that are easy to enter, search, update, or delete.

we do not deliberately collect for the vertical file, but add to it day by day the useful thing which turns up.
these include clippings from newspapers, excerpts from periodicals . . . broadsides that are not injured by folding . . . anything that we know will be used if available.
—wilson bulletin, 1919

information that "will be used if available" sounds like the contents of the internet.1 as with libraries everywhere, the oclc library has come to depend on the internet as an almost limitless resource. and like libraries everywhere, it has confronted the advantages and disadvantages of that scope. this means that in addition to using the opac and oclc library's webpages, oclc library staff have used a mix of bookmarks, del.icio.us tags, and post-it® notes to keep track of relevant, authoritative, substantive, and potentially reusable information. much has been written about the use of questionpoint's transaction management capabilities and of the important role of knowledge bases in providing closure to an inquiry. in contrast, this article will look at questionpoint's use as a management tool for future questions, for items that fall outside the scope of oclc library's opac and web resources pages yet need to be "put somewhere." the questionpoint local knowledge base is just the spot for these new vertical file items.

about oclc library

oclc is the world's largest nonprofit membership computer library service and research organization. more than 69,000 libraries in 112 countries and territories around the world use oclc services to locate, acquire, catalog, lend, and preserve library materials. oclc library was established in 1977 to provide support for oclc's mission. the collection concentrates on library, information, and computer sciences and business management, and has special collections that include the papers of frederick g. kilgour and archives of the dewey decimal classification™. oclc library has a distinct clientele to which it offers a complete range of services—print and electronic collections, reference, interlibrary loan—within its subject areas. because of the nature of the organization, the library supports long-term and collaborative research, such as that done by oclc programs and research staff, as well as the immediate information needs of product management and marketing staff. oclc library also provides information to oclc's other service areas, such as finance and human resources. while most oclc library acquisitions are done on demand, oclc library selects and maintains an extensive collection of periodicals, journals, and reference resources, most of them online and accessible—along with the opac—to oclc employees worldwide from the library's webpages (see figure 1).

figure 1. oclc library intranet homepage

often, however, oclc staff, like those of many organizations, are too busy to consult these resources themselves and thus depend on the library. oclc library staff pursue the answers to such research questions through the library's collections and look to enhance the collections with "anything that we know will be" of use. one of the challenges is keeping track of the "anything" that falls outside the library's primary collections scope; questionpoint helps with that task.

traditional uses of questionpoint

questionpoint is a service that provides question management tools aimed at increasing the visibility of reference services and making them more efficient. oclc library uses many of those tools, but there are significant ones it does not use (for example, chat).
and although the library's questionpoint-based aska link is visible by default on the front page of the corporate intranet as well as on oclc library–specific pages, less than 8 percent of questions over the last year were received through that link. one reason for this low use may be that for most of oclc library's history, e-mail has been the primary contact method, and so it remains. even when the staff need clarification of a question, they automatically opt for telephone or e-mail messaging. working with a web form and question-and-answer software has not caught on as a replacement for these more established methods. however, questionpoint remains the reference "workspace." when questions come in through e-mail or phone, librarians enter them into questionpoint, using it to add notes and keep track of sources checked. completed transactions are added to the local knowledge base. (because their questions involve proprietary matters, many special libraries do not add their answers to the global knowledge base, and oclc library is no exception. the local knowledge base is accessible only by oclc library staff.)

not surprisingly, most of the questions received are about libraries, museums, and other cultural institutions, their collections, users, and staff. this means that the likelihood of reuse of the information in the oclc library knowledge base is relatively high, and makes the local knowledge base an early stop in the reference process. though statistics vary widely by individual institutions and type of library—and though some libraries have opted not to use the knowledge base—the average ratio for all questionpoint libraries is about one knowledge base search for every three questions received. in contrast, in the past year oclc library staff averaged 4.2 local knowledge base searches for every three questions received. the view of the questionpoint knowledge base as a repository of answers to questions that have been asked is a traditional one. oclc library's use of the questionpoint knowledge base in anticipation of the information needs of its clients—as a way of collection development—is distinctive. in many respects this use creates an updated version of the old-fashioned vertical file.

nontraditional uses of questionpoint

just-in-case

the vertical file has a quirky place in the annals of librarianship. it has been the repository for facts and information too good to throw away but not quite good enough to catalog. h. w. wilson still offers its vertical file index, a specialized subject index to pamphlets issued on topics often unavailable in book form, which began in 1932. by now, except for special collections, the internet has practically relegated the vertical file to the backroom with the card platens and electric erasers. oclc library now uses its questionpoint knowledge base to manage information that once might have gone into a vertical file: the authoritative reports, studies, .org sites, and other resources that are often not substantive enough to catalog, but too good to hide away in a single staff member's bookmarks. the questionpoint knowledge base provides a place for these resources; more importantly, questionpoint provides fast, efficient ways to collect, tag, manage, and use them.
questionpoint allows development of such collections with powerful capabilities that allow for future retrieval and use of the information, and it does so without the incredibly time-consuming processes of the past. a 1909 description of such processes describes in detail the inefficiency of yore:

in the public library [sic] of newark, n.j., material is filed in folders made of no. 1 tag manila paper, cut into pieces about 11x18 inches in size. one end is so turned up against the others as to make a receptacle 11x19 1/2 inches. the front fold is a half inch shorter than the back one, and this leaves a margin exposed on the back one, whereon the subject of that folder is written.2

thus a major benefit of using questionpoint to manage these resources is saving time. because questionpoint is a routine part of oclc library's workflow, it allows the addition of items directly to the knowledge base quickly and with a minimum of fuss. there is initially no need to make the entry "pretty," but only to describe the resource briefly, add the url, and tag it (see figure 2).

figure 2. a sample questionpoint entry, this for a report by the national endowment for the arts

unlike a physical vertical file, tagging items in the knowledge base allows items to be "put" in multiple places. staff can also add comments that characterize the authoritativeness of a resource. occasionally librarians come across articles or resources that might address multiple questions. instead of burying the data in one overarching knowledge base record, staff can make an entry for each aspect of the resource. an example of this is www.galbithink.org/libraries/analysis.htm, a page created by douglas galbi, senior economist with the federal communications commission (see figure 3). the site provides statistics, including historical statistics, on u.s. public libraries. rather than describe these generically with a tag like "library statistics"—not very useful in any case—each source can be added separately to the questionpoint knowledge base. for example, the item "audiovisual materials in u.s. public libraries" can be assigned specific tags—audiovisual, av, videos—that will make the data more accessible in the future. in other words, librarians use the faq model of asking and answering just one question at a time.

figure 3. a page with diverse facts and figures: www.galbithink.org/libraries/analysis.htm

an important element in adding "answers" to oclc library's knowledge base is the ability to provide context. with questionpoint, librarians can not only describe what the resource is, but why it may be of future use. and just the act of adding information to the knowledge base serves as a valuable mnemonic—"i've seen that somewhere." records added to the knowledge base in this way can be easily updated with information about newer editions or better sources. equally valuable is the ability to edit and add keywords when the resource becomes useful for unforeseen questions.
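the article does not show questionpoint's underlying record structure, but the practice just described, one knowledge base entry per aspect of a source, each with its own url, tags, and a note on future use, can be sketched in a few lines of generic code. the field names and lookup function below are illustrative assumptions, not questionpoint's actual data model or api:

from dataclasses import dataclass

@dataclass
class VerticalFileRecord:
    # a minimal stand-in for one "virtual vertical file" entry
    description: str
    url: str
    tags: set
    context: str = ""   # why the item may answer a future question

records = [
    VerticalFileRecord(
        description="audiovisual materials in u.s. public libraries",
        url="http://www.galbithink.org/libraries/analysis.htm",
        tags={"audiovisual", "av", "videos"},
        context="historical statistics page by douglas galbi; one entry per aspect of the site",
    ),
]

def find_by_tag(tag):
    # unlike a physical folder, a tagged record is "filed" in several places at once
    return [r for r in records if tag in r.tags]

print([r.description for r in find_by_tag("av")])

nothing here is specific to questionpoint; the point is simply that a tagged record can surface under several future questions, which a single physical folder cannot.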
sharing information with staff

the knowledge base also serves as a more formal collection development tool. when librarians run across potentially valuable resources, they can send a description and a link to a product manager who may find it of use. library staff use questionpoint's keyword capability to add tags of people's names and job titles to facilitate ongoing current awareness. employees may provide feedback suggesting an item be added to the permanent print collection, or linked to from the library website. oclc library strives to inform users without subjecting them to information overload. when a 2007 survey of oclc staff found the library's rss feeds seldom used, librarians began to send e-mails directly to individuals and teams. the reaction of oclc staff indicates that such personal messages, with content summaries that allow recipients to quickly evaluate the contents, are more often read than oclc library rss feeds—especially if items sent continue to be valuable. requirements that enable this kind of sharing include knowledge of company goals, staff needs, and product initiatives. to keep up-to-date, librarians meet regularly with other oclc staff and monitor organizational changes. attendance at oclc's members council meetings provides information on hot topics that help identify resources for future use. while oclc's growth as a global organization has brought challenges in maintaining awareness of the full range of organization needs, the questionpoint knowledge base offers a practical way to manage increased volume. maintaining resources of potential interest to staff with questionpoint has another benefit: it helps keep librarians aware of internal experts who can help the library with questions, and in many cases allows the library to connect staff with mutual interests to one another. this has become especially important as oclc has grown and its services continue to integrate with one another.

conclusions

beyond its usefulness as a system to receive, manage, and answer inquiries, questionpoint is providing a way to facilitate access to online resources that addresses the particular needs of oclc library's constituency. it is fast and easy to use: a standard part of the daily workflow. it enables direct links to sources and accommodates tagging those sources with the names of people and projects, as well as subjects. it serves as part of the library's collection management and selection system. using questionpoint in this way has some potential drawbacks. "just in case" acquisition of virtual resources entails some of the risks of traditional acquisitions: acquiring resources that are seldom used, creating a database of resources that are difficult to retrieve, and perhaps the necessity of "weeding" or updating obsolete items. with company growth comes the issue of scalability, as well. but for now, the benefits have far outweighed the risks. most of the items added have been identified for and shared with at least one staff member, so the effort has provided immediate payoff.

■■ the knowledge base serves as a collection development tool, helping to identify items that can be cataloged and added to the permanent collection.
■■ the record in the knowledge base can serve as a reminder to check for later editions.
■■ the knowledge base records are easy to update or even delete.

the questionpoint virtual vertical file helps oclc library manage and share those useful things that "just turn up."

references

1. "the vertical file for pamphlets and miscellany," wilson bulletin 1, no. 16 (june 1919): 351.
2. kate louise roberts, "vertical file," public libraries 12 (oct. 1907): 316–17.

cost analysis of an automated and manual cataloging and book processing system

joselyn druschel, washington state university, pullman.
a comparative cost analysis of an automated network system (wln) and a local manual system of cataloging and book processing at washington state university libraries indicates that the automated system is about 20 percent less costly than the manual system. a per-unit cost approach was used in calculating the monthly cost of each system based on the average number of items processed per month under the automated system. the process and the results of the analysis are presented in a series of charts which detail the tasks, items processed, and unit and total monthly costs of both the manual and automated systems. the higher costs of the manual system were essentially staff costs.

the technical services division (tsd) of washington state university libraries (wsul) has had considerable experience in the use of automated techniques in selected areas of technical processing. an in-house automated acquisitions system was developed and implemented in 1967; that in-house system was eventually replaced by the acquisitions component of the washington library network (wln). since november 1977, the technical services division of wsul has used the wln bibliographic component for data verification (searching) and cataloging of materials. although the library has generally known its total automation expenditures, it has lacked a more precise breakdown of cost data on automated processing. moreover, the library has practically no cost data on manual processing. this report deals only with the costs of using the wln bibliographic system, not the wln acquisitions component. an analysis was made of the total costs of both the automated and manual book processing systems. the objectives in undertaking the cost analysis were threefold: (1) to identify the essentially unknown costs of manual processing; (2) to provide more exact cost data on automated processing; and (3) to develop comparable data on the costs of each system.

manuscript received october 1980; accepted december 1980.

methodology

the methodology used in this cost analysis was a per-unit cost approach. first, each process or task in which the staff were engaged in cataloging and book processing was identified. second, the per-unit cost (e.g., staff, data base, materials) of each process was calculated. finally, monthly costs were determined by multiplying the average number of items processed per month by the unit cost per task. the cost analysis charts (tables 1(a)–1(e), manual system; tables 2(a)–2(d), automated system), which detail the tasks, items processed, and unit and total costs, form the body of the analysis. equipment costs (purchase, lease, maintenance) were calculated separately and are included in the summary cost data for each system (table 3).

identification of processes

the staff of the tsd cataloging and book processing unit perform the following functions: bibliographic verification, bibliographic record production, bibliographic record maintenance, the marking of materials, binding preparation and receipt (for most of the library system), and the preparation of book cards.

table 1(a). cost analysis: manual cataloging and book processing system (bibliographic searching). monthly totals: idc microfiche searching, 3,972 items, $1,949; manual national union catalog searching, 1,275 items, $2,668; bibliographic searching total, 5,247 items, $4,617.
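as a concrete illustration of that per-unit arithmetic, the short sketch below re-computes the lt i microfiche-search row of table 1(a) (staff at $.084 per minute for 3 minutes per item, plus the idc subscription of $10,000 per year spread over 47,664 searches); the function and variable names are mine, not the article's:

def unit_cost(staff_rate_per_min, minutes_per_item, other_cost_per_item=0.0):
    # cost of performing one task on one item: staff time plus any per-item charge
    return staff_rate_per_min * minutes_per_item + other_cost_per_item

subscription_per_search = round(10_000 / 47_664, 2)          # idc fiche, about $0.21 per search
lt1_search = unit_cost(0.084, 3, subscription_per_search)    # lt i searching, about $0.462 per item
monthly = lt1_search * 2_484                                  # 2,484 such items handled per month
print(subscription_per_search, round(lt1_search, 3), round(monthly))
# prints: 0.21 0.462 1148 -- the $1,148 per month shown for this row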
table 1(b). cost analysis: manual cataloging and book processing system (bibliographic record production, processing and products). monthly totals: cataloging with lc microfiche copy, 1,057 items, $3,298; cataloging with modified nuc/lc copy, 984 items, $4,359; original cataloging, 222 items, $3,145; catalog cards (7 cards per set at $.055 per card), $1,654; cataloging total, 2,263 items, $12,456; miscellaneous bibliographic record production, $687; bibliographic record production total, $13,143.
table 1(c). cost analysis: manual cataloging and book processing system (bibliographic record maintenance). monthly total: $6,402.
table 1(d). cost analysis: manual cataloging and book processing system (marking). monthly total: $4,631.

table 1(e). total monthly costs, manual system (summary): staff costs, $25,775; data base costs, none; subscription costs, $1,076; materials costs, $1,942; total cost, $28,793 per month.

table 2(a). cost analysis: automated cataloging and book processing system (bibliographic searching). monthly totals: wln data base searching, 3,972 items, $1,535; manual national union catalog searching, 508 items, $635; bibliographic searching total, 4,480 items, $2,170.
table 2(b). cost analysis: automated cataloging and book processing system (bibliographic record production, processing and products). monthly totals: cataloging with wln data base copy, 1,376 items, $4,161; cataloging with cip copy, 153 items, $739; cataloging with modified nuc/lc copy, 95 items, $934; original cataloging, 222 items, $3,343; wln cataloging total, 1,846 items, $9,177; non-wln microform cataloging, 407 items, $511; non-wln music scores, 10 items, $78; cataloging total, 2,263 items, $9,766; miscellaneous costs, $2,071; bibliographic record production total, $11,837.
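the wln data base copy row of table 2(b) also shows how an automated per-item cost is assembled from staff and data base components; the sketch below simply re-adds those published figures (the variable names are mine):

staff = 0.094 * 6                                   # lt ii attaches holdings and orders cards, 6 min/item
database = 1.60 + 0.15 + 4 * 0.055 + 0.43 + 0.06    # record use, card request, 4 shelflist cards, com, terminal use
per_item = staff + database                         # about $3.024 per title cataloged with wln copy
print(round(per_item, 3), round(per_item * 1_376))
# prints: 3.024 4161 -- the $4,161 per month shown for 1,376 such titles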
table 2(c). cost analysis: automated cataloging and book processing system (bibliographic record maintenance). monthly total: $4,194.

table 2(d). cost analysis: automated cataloging and book processing system (marking). monthly total: $4,589.
table 2(e). total monthly costs, automated system (summary): staff costs, $16,849; data base costs, $5,480; subscription costs, $157; materials costs, $304; total cost, $22,790 per month.

table 3. cataloging and book processing system: summary comparison of costs per month.
manual system: staff, $25,775; data base, none; subscriptions, $1,076; materials, $1,942; equipment, $462; total, $29,255 per month.
automated system: staff, $16,849; data base, $5,480; subscriptions, $157; materials, $304; equipment, $890; total, $23,680 per month.
cost comparison: the difference is $5,575 per month, or $66,900 per year, in favor of the automated system (roughly 19 percent of the manual system's monthly cost, consistent with the "about 20 percent" savings noted above).

since 1978 this unit, as well as all units in the technical services division, has periodically analyzed unit activities and recorded the data collected on work assignment/staffing profile sheets (see table 4 for a sample profile sheet). the primary purpose of the profiles was to develop a detailed account of work distribution throughout tsd in order to determine the staffing requirements necessary for each unit to maintain an even workflow. in the cost analysis, the cataloging and book processing (cbp) profile was used to identify each unit process, as well as to provide the basic data on the number and level of staff and the time required to perform each process. additionally, for the automated system, the cbp profile sheets, together with wln invoices (see figure 1 for a sample invoice) and wln monthly activity reports (see figure 2 for a sample activity report), were used to determine the average number of items processed per month. for example, since about 85 percent of the cataloging done in tsd is via wln, it was possible to derive exact figures from wln invoices for the average number of items cataloged per month. the wln invoices also differentiated between data-base copy cataloging and original data entry. the cbp profile sheets were used to determine the average number of non-wln items cataloged. using a combination of wln invoice and profile data, a chart was constructed of the average number of items searched and cataloged per month under the automated system (see table 5).
in order to make costs comparable, an assumption was made that the same average number of items was searched and cataloged under the previous manual system, and a similar chart was made for it (see table 6). in reality, the available staff under the manual system could not process the same amount of material per month.

table 4. technical services division work assignment/staffing profile, november 1978 (unit: cataloging and book processing; subunit: lc copy editing). the profile lists each task or process with the average number of items received for processing, the average time per item, the staff hours needed, and the number and level of staff assigned.

fig. 1. washington library network customer invoice.

fig. 2. washington library network monthly activity report (selective sample).

in the cost analysis of the automated system, the monthly wages for staff members of the cataloging and book processing unit were based on current monthly salaries (as of february 1980) plus estimated fringe benefits (21 percent). the total wages were added together for each level of staff and divided by the number of staff at that level to give an
average monthly wage. this average was then divided by 174 (the standard figure for university staff hours per month) to determine the average hourly rate. to calculate staff costs per minute, it was necessary to carry the per-minute costs to the third decimal to approximate the total dollars expended for staffing (see table 7). no other indirect costs, e.g., breaks, annual leave, or holidays, were included in staff wages; however, in order to determine the staff hours available to perform the functions being analyzed, nonproductive hours or staff hours devoted to other assignments had to be calculated and deducted. these calculations were made according to the following formula:

committee assignment (varied): __ hours/year
unit meetings (varied): __ hours/year
breaks (standard): 120 hours/year
annual leave (varied): __ hours/year
holidays (standard): 88 hours/year
sick leave (standardized, based on hours earned per month): 96 hours/year
total hours/year ÷ 12 = __ hours/month

the primary reasons for variation in the nonproductive hours were length of service and whether a staff member was faculty or classified. staff costs under the manual system were based on current monthly wages; however, the number and level of staff are essentially that which existed at the time the manual system was functioning (see table 8). timeslip costs were not based on the minimum hourly wage, since a large number of hours were work/study during the period of the analysis. the total monthly expenditure was divided by the total hours worked to derive the per-minute timeslip costs. no effort was made to reconstruct actual timeslip costs under the manual system, but the same per-minute timeslip costs were used in order to avoid unnecessary skewing of staff costs under the manual system.
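to make the arithmetic concrete, the lt ii rate used throughout the cost charts works out as follows (the figures are those reported in table 7; only the step-by-step layout is added here):

$3,955 per month for four lt ii positions, including 21 percent fringe benefits = $989 per position
$989 per month ÷ 174 hours = $5.68 per hour
$5.68 per hour ÷ 60 = $.094 per minute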
6.9 cents per data-base inquiry, three kinds of processes allow a given number of inquiries without charge. since not all allowable inquiries are always used for these processes, there are generally a number of inquiries which can be made without charges being assessed. between july 1979 and february 1980, the average number of monthly inquiries for which there was a charge was 11,800; the average number per month for which there was no charge assessed was 8,044. for this reason, in the cost analysis of the automated system (table 2(a)), there appears a category "items searched, no inquiry charges" under the bibliographic searching section. (note: part of the "no charge" inquiries are generated and used by the acquisitions unit and are therefore not included in this analysis.)

table 6. type and average number of items searched/cataloged per month on manual system (based on cbp work assignment/staffing profile)
(columns: searched (idc)/month; found/month; not found/month; nuc searched/month)
book approvals: 600; 300 (50%); 300; -
firm orders: 700; 280 (40%); 420; 420
new acquisitions (re-searched): 295; 59 (20%); 236; -
precats: 1,380; 276 (20%); 1,104; -
documents: 125; 12 (10%); 113; 113
serials: 100; 5 (5%); 95; 95
rush: 75; 23 (30%); 52; 52
gifts: 100; 5 (5%); 95; 95
monographic series: 300; 90 (30%); 210; 210
originals: 222; 0 (0%); 222; 222
reinstates: 75; 7 (10%); 68; 68
total: 3,972; 1,057 (26.5%); 2,915; 1,275
type and quantity of materials cataloged: idc copy 1,057; modified copy 984; original cataloging 222; total 2,263

although the terminal service and line charges might simply have been added as a total amount to the data-base costs, it seemed more meaningful to distribute these costs on a per-use basis. the method used to distribute these charges was to identify each use of the bibliographic data base, and to divide the total monthly costs of terminals and lines by the total monthly units of use (see table 9). this method of distributing terminal service and line charges not only provided per-unit terminal use costs, but also served to categorize kinds and quantity of data-base use.

subscription and material costs
subscription costs include only those bibliographic tools purchased for use in tsd for the purpose of bibliographic searching. as a result of the increased growth of the bibliographic data base, fewer tools are being used for searching under the automated system than under the manual system. prior to the implementation of wln, the library subscribed to bibliographic data (lc and cip copy) on microfiche supplied by the information dynamics corporation (idc). the per-unit costs of all subscriptions are presented in the cost analysis charts (tables 1(a) and 2(a)). material costs include only those materials unique to cataloging and book processing; general supplies, such as pencils and paper, are not included. the calculation of the per-unit cost of most materials is generally straightforward. it should be noted, however, that under the automated system, products, i.e., materials, are included in the data-base costs, and only those materials used independent of the data base, e.g., book pockets and book cards, are listed as material costs on the charts.
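the staff-cost arithmetic repeated throughout tables 7 and 8 below follows one pattern: monthly salaries plus 21 percent fringe benefits, divided by the number of staff at that level, by 174 hours per month, and by 60 minutes. a minimal javascript sketch of that calculation (the function name is illustrative; the sample figures are those of the lt ii line in table 7):

// per-minute staff cost: monthly salaries + 21% fringe, divided by
// staff count, by 174 hours/month, and by 60 minutes/hour
function costPerMinute(monthlySalaries, staffCount) {
  var withFringe = monthlySalaries * 1.21;   // e.g., $3,269 * 1.21 = about $3,955
  var perPerson = withFringe / staffCount;   // about $989/mo for lt ii (4)
  var perHour = perPerson / 174;             // about $5.68/hr
  return perHour / 60;                       // about $.094/min
}
costPerMinute(3269, 4); // 0.094..., matching the lt ii line of table 7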
table 7. staff costs: automated cataloging and book processing system
staff costs/month (salaries/month; plus 21% fringe benefits; costs/month):
classified staff:
oa ii: $912; $192; $1,104
lt i (4): $2,888; $606; $3,494
lt ii (4): $3,269; $686; $3,955
lt iii (2): $2,024; $425; $2,449
supervisor ii (2): $2,578; $541; $3,119
subtotal: $14,121
faculty:
catalogers (3.5) (monos): $4,691; $985; $5,676
unit head: $1,774; $373; $2,147
subtotal: $7,823
staff costs/minute:
timeslip: $1,456/mo ÷ 809 hrs = $1.80/hr ÷ 60 = $.03/min
oa ii: $1,104/mo ÷ 174 = $6.34/hr ÷ 60 = $.105/min
lt i (4): $3,494/mo ÷ 4 = $874/mo ÷ 174 = $5.02/hr ÷ 60 = $.083/min
lt ii (4): $3,955/mo ÷ 4 = $989/mo ÷ 174 = $5.68/hr ÷ 60 = $.094/min
lt iii (2): $2,449/mo ÷ 2 = $1,225/mo ÷ 174 = $7.04/hr ÷ 60 = $.117/min
supervisor ii (2): $3,119/mo ÷ 2 = $1,560/mo ÷ 174 = $8.97/hr ÷ 60 = $.15/min
catalogers (3.5): $5,676/mo ÷ 3.5 = $1,622/mo ÷ 174 = $9.32/hr ÷ 60 = $.155/min
unit head: $2,147/mo ÷ 174 = $12.34/hr ÷ 60 = $.205/min
total staff costs/month:
timeslip (809 hrs @ $1,456/mo): $1,456
special projects librarian: $345*
classified staff: $14,121
faculty: $7,823
total (all staff): $23,745
*amount of time (wages) assigned to cataloging.

under the manual system, due to the divisional arrangement of the library system and the number of card catalogs being maintained, the formula for producing sets of cards for a single title was complex. for this reason, the costs and number of cards produced for the titles cataloged per month are listed as a separate line item.

equipment costs
equipment costs include only equipment unique to cataloging and book processing, i.e., required for processing or products. general equipment, such as desks, book trucks, and typewriters, is not included.

equipment-automated system
during the period covered by the cost analysis, november 1977 to february 1980, the following equipment was purchased for the automated system:
7 bibliographic terminals: $24,360
10 modems or modem contention units: $5,433
2 printers: $6,500
subtotal: $36,293
tax: $1,887
total: $38,180
two pieces of equipment are currently being leased (maintenance included): keypunch @ $92.61/month; verifier @ $101.12/month; total $193.73/month.
summary of monthly equipment costs: purchases (5-year amortization) $636.33; maintenance $60.00; leased equipment $193.73; total $890.06/month.

equipment-manual system
if the automated system had not been implemented, the following equipment would have been purchased during this period:
2 card catalogs: $3,755
5 kardex units: $4,475
2 linedex units: $2,944
subtotal: $11,174
tax: $581
total: $11,755
although the anticipated life span of this equipment should be considerably greater than that of terminals and modems, it has also been amortized over a five-year period. the rationale for this period of amortization is that the rate of growth of the files for which the equipment is used results in the purchase of additional equipment equivalent to the expected replacement of electronic equipment. therefore, the initial cost of these purchases amortized would have been $196/month. since the multilith has been owned by the library for more than twenty years, its purchase price is not applicable to this analysis. however, maintenance on the multilith is $72.27/month. two pieces of equipment were being leased under the manual system (maintenance included): keypunch @ $92.61/month; verifier @ $101.12/month; total $193.73/month.
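the monthly purchase figures in the two equipment summaries are simply the purchase totals (including tax) spread over the five-year (60-month) amortization period stated above. a minimal javascript sketch of that arithmetic, using the totals given in the text:

// five-year (60-month) straight-line amortization of equipment purchases
function monthlyAmortization(purchaseTotalWithTax) {
  return purchaseTotalWithTax / 60;
}
monthlyAmortization(38180); // about $636.33/month for the automated system
monthlyAmortization(11755); // about $195.92/month, rounded to $196 for the manual system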
summary of monthly equipment costs: purchases (5-year amortization) $196.00; maintenance $72.27; leased equipment $193.73; total $462.00/month.

summary and conclusion
the cost analysis clearly indicates that at washington state university libraries the automated cataloging and book processing system is less expensive than its previous manual system. by using the bibliographic component of the washington library network, the library has reduced the costs of searching, cataloging, and record maintenance by almost 20 percent (see table 10, summary comparison costs by function). the higher costs of the manual system are essentially staff costs.

table 8. staff costs: manual cataloging and book processing system (based on the 1977 staffing levels at current staff costs)
staff costs/month (salaries/month; plus 21% fringe benefits; costs/month):
classified staff:
oa ii-typing: $912; $192; $1,104
lt i (11): $7,950; $1,670; $9,622
lt ii (3): $2,434; $511; $2,945
lt iii (5): $5,060; $1,063; $6,123
supervisor i (2): $2,175; $457; $2,632
supervisor ii: $1,289; $271; $1,560
offset duplicator operator: $1,135; $238; $1,373
subtotal: $25,359
faculty:
catalogers (3.5): $4,691; $985; $5,676
unit head: $1,774; $373; $2,147
subtotal: $7,823
staff costs/minute:
timeslip: $2,174/mo ÷ 1,208 hrs = $1.80/hr ÷ 60 = $.03/min
oa ii-typing: $1,104/mo ÷ 174 = $6.34/hr ÷ 60 = $.105/min
lt i (11): $9,622/mo ÷ 11 = $875/mo ÷ 174 = $5.03/hr ÷ 60 = $.084/min
lt ii (3): $2,945/mo ÷ 3 = $982/mo ÷ 174 = $5.64/hr ÷ 60 = $.094/min
lt iii (5): $6,123/mo ÷ 5 = $1,225/mo ÷ 174 = $7.04/hr ÷ 60 = $.117/min
supervisor i (2): $2,632/mo ÷ 2 = $1,316/mo ÷ 174 = $7.56/hr ÷ 60 = $.126/min
supervisor ii: $1,560/mo ÷ 174 = $8.97/hr ÷ 60 = $.149/min
offset duplicator operator: $1,373/mo ÷ 174 = $7.89/hr ÷ 60 = $.13/min
catalogers (3.5): $5,846/mo ÷ 3.5 = $1,670/mo ÷ 174 = $9.60/hr ÷ 60 = $.155/min
unit head: $2,147/mo ÷ 174 = $12.34/hr ÷ 60 = $.205/min
total staff costs/month:
timeslip (1,208 hrs @ $2,174/mo): $2,174
classified staff: $25,359
faculty: $7,823
total (all staff): $35,356

table 9. bibliographic data base use per month (one unit = one access to or process in data base)
searching: 10,688
cataloging (data base copy): 1,529
cataloging (original data entry): 317
authority verification (317 x 7): 2,219
bibliographic changes/corrections: 360
ill, ref, general: 537
total units: 15,650
wln terminal service and telecommunication line charges/month: 5 1/2 terminals @ $140/mo = $770/mo; 5 1/2 lines @ $40/mo = $220/mo; total $990/mo.
$990 ÷ 15,650 = $.06/terminal use for the cataloging and book processing system.
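the per-use terminal figure in table 9 distributes the combined monthly terminal and line charges across every use of the data base. a minimal javascript sketch of that distribution (the function name is illustrative):

// distribute monthly terminal and telecommunication line charges across all data-base uses
function costPerTerminalUse(terminalCharges, lineCharges, unitsOfUse) {
  return (terminalCharges + lineCharges) / unitsOfUse;
}
costPerTerminalUse(770, 220, 15650); // about $0.06 per terminal use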
table 10. cataloging and book processing system: summary comparison costs by function (excluding equipment costs)
(columns: function; number of items; costs per month)
manual system:
1. bibliographic searching: 5,247; $4,617
2. bibliographic record production (cost of catalog cards distribution): [2,263]*; [$13,143]†
lc copy cataloging: 1,057; $4,092
modified copy cataloging: 984; $5,021
original cataloging: 222; $3,343
miscellaneous: na; $687
3. bibliographic record maintenance: na; $6,402
4. marking: na; $4,631
total: $28,793
automated system:
1. bibliographic searching: 4,480; $2,170
2. bibliographic record production (cost of catalog cards included): [2,263]*; [$11,837]†
lc and cip copy cataloging: 1,529; $4,900
modified copy cataloging: 512; $1,523
original cataloging: 222; $3,343
miscellaneous: na; $2,071
3. bibliographic record maintenance: na; $4,194
4. marking: na; $4,589
total: $22,790
*total of items listed below. †total of costs listed below.

under that system, eleven more staff and 1,365 more timeslip hours were needed per month to process the same amount of materials as is processed under the automated system. in fact, compared to the staff costs of both the manual and automated systems, the costs of equipment, data-base use (including products), terminal service, and telecommunication lines of the automated system are a relatively small percentage (27 percent) of the total cataloging and book processing costs. this analysis serves to underscore a basic reality of the current library organization: personnel is one of its largest expenditures, and staff-intensive systems are very costly. this cost analysis has not directly addressed the issue of the quality of processing and products of either the manual or automated systems. the analysis suggests, however, that the automated system is more efficient in terms of staff time. moreover, the tsd staff has found that not only can more be done with fewer staff, but the automated system also provides more accurate data and has the flexibility to accommodate with relative ease the many corrections and changes that must be made to the library's bibliographic files.

joselyn druschel is assistant director for automation and technical support at the washington state university libraries. she is currently chairing a staff task force which is developing specifications for the libraries' on-line catalog.

158 information technology and libraries | december 2009

michelle frisque
president's message

i know the president's message is usually dedicated to talking about where lita is now or where we are hoping lita will be in the future, but i would like to deviate from the usual path. the theme of this issue of ital is "discovery," and i thought i would participate in that theme. like all of you, i wear many hats. i am president of lita. i am head of the information services department at the galter health sciences library at northwestern university. i also am a new part-time student in the masters of learning and organizational change program at northwestern university. as a student and a practicing librarian, i am now on both sides of the discovery process. as head of the information systems department, i lead the team that is responsible for developing and maintaining a website that assists our health-care clinicians, researchers, students, and staff with selecting and managing the electronic information they need when they need it. as a student, i am a user of a library discovery system.

in a recent class, we were learning about the burke-litwin causal model of organization performance and change. the article we were reading described the model; however, it did not answer all of my questions. i thought about my options and decided i should investigate further. before i continue, i should confess that, like many students, i was working on this homework assignment at the last minute, so the resources had to be available online. this should be easy, right? i wanted to find an overview of the model. i first tried the library's website using several search strategies and browsed the resources in metalib, the library catalog, and libguides with no luck. the information i found was not what i was looking for.
i then tried wikipedia without success. finally, as a last resort, i searched google. i figured i would find something there, right? i didn't. while i found many scholarly articles and sites that would give me more information for a fee, none of the results i reviewed gave me an overview of the model in question. i gave up. the student in me thought: it should not be this hard! the librarian in me just wanted to forget i had ever had this experience.

this got me to thinking: why is this so hard? libraries have "stuff" everywhere. we access "stuff," like books, journals, articles, images, datasets, etc., from hundreds of vendors and thousands of publishers who guard their stuff and dictate how we and our users can access that stuff. that's a problem. i could come up with a million other reasons why this is so difficult, but i won't. instead, i would like to think about what could be. in this same class we learned about appreciative inquiry (ai) theory. i am simplifying the theory, but the essence of ai is to think about what you want something to be instead of identifying the problems of what is. i decided to put ai to the test and tried to come up with my ideal discovery process. i put both my student and librarian hats on, and here is what i have come up with so far:
■■ i want to enter my search in one place and search once for what i need. i don't want to have to search the same terms many times in various locations in the hopes one of them has what i am looking for. i don't care where the stuff is or who provides the information. if i am allowed to access it i want to search it.
■■ i want items to be recommended to me on the basis of what i am searching. i also want the system to recommend other searches i might want to try.
■■ i want the search results to be organized for me. while perusing a result list can be loads of fun because you never know what you might find, i don't always have time to go through pages and pages of information.
■■ i want the search results to be returned to me in a timely manner.
■■ i want the system to learn from me and others so that the results list improves over time.
■■ i want to find the answer.
i'm sure if i had time i would come up with more. while we aren't there yet, we should continually take steps—both big and small—to perfect the discovery process. i look forward to reading the articles in this issue to see what other librarians have discovered, and i hope to learn new things that will bring us one step closer to creating the ultimate discovery experience.

michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago.

practical limits to the scope of digital preservation
mike kastellec

abstract
this paper examines factors that limit the ability of institutions to digitally preserve the cultural heritage of the modern era. the author takes a wide-ranging approach to shed light on limitations to the scope of digital preservation. the author finds that technological limitations to digital preservation have been addressed but still exist, and that non-technical aspects—access, selection, law, and finances—move into the foreground as technological limitations recede. the author proposes a nested model of constraints to the scope of digital preservation and concludes that costs are digital preservation's most pervasive limitation.
introduction
imagine for a moment what perfect digital preservation would entail: a perfect archive would capture all the content generated by humanity instantly and continuously. it would catalog that information and make it available to users, yet it would not stifle creativity by undermining creators' right to control their creations. most of all, it would perfectly safeguard all the information it ingested eternally, at a cost society is willing and able to sustain. now return to reality: digital preservation is decidedly imperfect. today's archives fall far short of the possibilities outlined above. much previous scholarship debates the quality of different digital preservation strategies; this paper looks past these arguments to shed light on limitations to the scope of digital preservation. what are the factors that limit the ability of libraries, archives, and museums (henceforth collectively referred to as archival institutions) to digitally preserve the cultural heritage of the modern era?1 i first examine the degree to which technological limitations to digital preservation have been addressed. next, i identify the non-technical factors that limit the archival of digital objects. finally, i propose a conceptual model of limitations to digital preservation.

mike kastellec (makastel@ncsu.edu) is libraries fellow, north carolina state university libraries, raleigh, nc.

technology
any discussion of digital preservation naturally begins with consideration of the limits of digital preservation technology. while all aspects of digital preservation are by definition related to technology, there are two purely technical issues at the core of digital preservation: data loss and technological obsolescence.2 many things can cause data loss. the constant risk is physical deterioration. a digital file consists at its most basic level of binary code written to some form of physical media. just like analog media (paper, vinyl recordings), digital media (optical discs, hard drives) are subject to degradation at a rate determined by the inherent properties of the medium and environment in which it is stored.3 when the physical medium of a digital file decays to the point where one or more bits lose their definition, the file becomes partially or wholly unreadable. other causes of data loss include software bugs, human action (e.g., accidental deletion or purposeful alteration), and environmental dangers (e.g., fire, flood, war). assuming a digital archive can overcome the problem of physical deterioration, it then faces the issue of technological obsolescence. binary code is simply a string of zeroes and ones (sometimes called a bitstream)—like any encoded information, this code is only useful if it can be decoded into an intelligible format. this process depends on hardware, used to access a bitstream from a piece of physical media, and software, which decodes the bitstream into an intelligible object, such as a document or video displayed on a screen, a printout, or an audio output. technological obsolescence occurs when either the hardware or software needed to render a bitstream usable is no longer available. given the rapid pace of change in computer hardware and software, technological obsolescence is a constant concern.4
most digital preservation strategies involve staying ahead of deterioration and obsolescence by copying data from older to current generations of file formats and storage media (migration) or by keeping many copies that are tested against one another to find and correct errors (data redundancy).5 other strategies to overcome obsolescence include pre-emptively converting data to standardized formats (normalization) or avoiding conversion and instead using virtualized hardware and software to simulate the original digital environment needed to access obsolete formats (emulation). as may be expected of a young field,6 there is a great deal of debate over the merits of each of these strategies. to date, the arguments mostly concern the quality of preservation, which is beyond the scope of this work. what should not be contentious is that each strategy also imposes limitations on the potential scale of digital preservation. migration and normalization are intensive processes, in the sense that they normally require some level of human interaction. any human-mediated process limits the scale of an archival institution's preservation activities, as trained staffs are a limited and expensive resource. emulation postpones the processing of data until it is later accessed, potentially allowing greater ingest of information. as a strategy, however, it remains at least partly theoretical and untested, increasing the possibility that future access will be limited.

data redundancy deserves closer examination, as it has emerged as the gold standard in recent years. the limitations data redundancy imposes on digital preservation are two-fold. the first is that simple maintenance of multiple copies necessarily increases expenses, therefore—given equal levels of funding—less information can be preserved redundantly than can be preserved without such measures. (cost considerations are inextricably linked to every other limitation on digital preservation and are examined in greater detail in "finances," below.) there are practical, technical limitations on the bandwidth, disk access, and processing speeds needed to perform parity checks (tests of each bit's validity) of large datasets to guard against data loss. pushing against these limitations incurs dramatic costs, limiting the scale of digital preservation. current technology and funding are many orders of magnitude short of what is required to archive the amount of information desired by society over the long term.7 the second way technology limits digital preservation is more complex—it concerns error rates of archived data. non-redundant storage strategies are also subject to errors, of course. only redundant systems have been proposed as a theoretical solution to the technological problem of digital preservation,8 though, so it is necessary to examine their error rate in particular. on a theoretical level, given sufficient copies, redundant backup is all but infallible. in practice, technological limitations emerge.9 the number of copies required to ensure perfect bit preservation is a function of the reliability of the hardware storing each copy. multiple studies have found that hardware failure rates greatly exceed manufacturers' claims.10 rosenthal argues that, given the extreme time spans under consideration, storage reliability is not just unknown but untestable.11 he therefore concludes that it cannot be known with certainty how many copies are needed to sustain acceptably low error rates.
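the dependence of copy count on hardware reliability can be made concrete with a toy calculation: if each copy independently survives a given period with probability p, the chance that every copy is lost is (1 - p) raised to the number of copies, so lower per-copy reliability demands more copies for the same target. a minimal javascript sketch (the probabilities are illustrative, and the independence assumption is a simplification, not a measured value):

// toy model: probability that all copies of an object are lost,
// assuming each copy fails independently over the period considered
function lossProbability(perCopySurvival, copies) {
  return Math.pow(1 - perCopySurvival, copies);
}
lossProbability(0.99, 3); // 1e-6 -- three copies on fairly reliable storage
lossProbability(0.90, 3); // 1e-3 -- same copy count, less reliable storage
// about six copies of the less reliable storage are needed to reach the same 1e-6 level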
even today's best digital preservation technologies are subject to some degree of loss and error. analog materials are also inevitably subject to deterioration, of course, but the promise of digital media leads many to unrealistic expectations of perfection. nevertheless, modern digital preservation technology addresses the fundamental needs of archival institutions to a workable degree. technological limitations to digital preservation still exist, but the aspects of digital preservation beyond purely technical considerations—access, selection, law, and finances—should gain greater relative importance than they have in the past.

access
with regard to digital preservation, there are two different dimensions of access that are important. at one end of a digital preservation operation, authorized users must be able to access an archival institution's holdings and unauthorized users restricted from doing so. this is largely a question of technology and rights management—users must be able to access preserved information and permitted to do so. this dimension of access is addressed in the technology and law sections of this paper. the other dimension of access occurs at the other end of a digital preservation operation: an archival institution must be able to access a digital object to preserve it. this simple fact leads to serious restrictions on the scope of digital preservation because much of the world's digital information is inaccessible for the purposes of archiving by libraries and archives. there are a number of reasons why a given digital object may be inaccessible. large-scale harvesting of webpages requires automated programs that "crawl" the web, discovering and capturing pages as they go. web crawlers cannot access password-protected sites (e.g., facebook) and database-backed sites (all manner of sites, including many blogs, news sites, e-commerce sites, and countless collections of data). this inaccessible portion of the web is estimated to dwarf the readily accessible portion by orders of magnitude. there is also an enormous amount of inaccessible digital information that is not part of the web at all, such as emails, company intranets, and digital objects created and stored by individuals.12 additionally, there is a temporal limit to access. some digital objects only are accessible (or even exist) for a short window of time, and all require some measure of active preservation to avoid permanent loss.13 the lifespans of many webpages are vanishingly short. other pages, like some news items, are publicly accessible for a short window before they are hidden behind paywalls. even long-lasting digital objects are often dynamic: the ads accompanying a webpage may change with each visit; news articles and other documents are revised; blog posts and comments are deleted. if an archival institution cannot access a digital object quickly or frequently enough, the object cannot be archived, at least not completely. large-scale digital preservation, which in practice necessarily relies on periodic automated harvesting of content, is therefore limited to capturing snapshots of the changes digital objects undergo over their lifespans.

law
existing copyright law does not translate well to the digital realm.
leaving aside the complexities of international copyright law, in the united states it is not clear, for example, whether an archival institution like the library of congress is bound by licensing restrictions and if it can require deposit of digital objects, nor whether content on the web or in databases should be treated as published or unpublished.14 "many of the uncertainties come from applying laws to technologies and methods of distribution they were not designed to address."15 a lack of revised laws or even relevant court decisions significantly impacts the potential scale of digital preservation, as few archival institutions will venture to preserve digital objects without legal protection for doing so. given this unclear legal environment, efforts at large-scale digital preservation are hampered by the need to secure permission to archive from the rights holder of each piece of content.16 this obviously has enormous impact on preserving the web, but even scholarly databases and periodical archives may not hold full rights to all of their published content. additionally, a single digital object can include content owned by any number of authors, each of whose permission is needed for legal archival. without stronger legal protection for archival institutions, the scope of digital preservation is severely limited by copyright restrictions. digital preservation is further limited by licensing agreements, which can be even more restrictive than general copyright law. frequently, purchase of a digital object does not transfer ownership to the end-user, but rather grants limited licensed access to the object. in this case, libraries do not enjoy the customary right of first sale that, among other things, allows for actions related to preservation that would otherwise breach copyright.17 preservation of licensed works requires that libraries either cede archival responsibility to rights holders, negotiate the right to archive licensed copies, or create dark archives that preserve objects in an inaccessible state until their copyright expires.

selection
the limitation selection imposes on digital preservation hinges on the act of intellectual appraisal. the total digital content created each year already outstrips the total current storage capacity of the world by a wide margin.18 it is clear libraries and archives cannot preserve everything so, more than ever, deciding what to preserve is critical.19 models of selection for digital objects can be plotted on a scale according to the degree of human mediation they entail. at one end, the selective model is closest to selection in the analog world, with librarians individually identifying digital objects worthy of digital preservation. at the other end of the scale, the whole domain model involves minimal human mediation, with automated harvesting of digital objects. the collaborative model, in which archival institutions negotiate agreements with publishers to deposit content, falls somewhere between these two extremes, as does the thematic model, which can apply either selective- or whole-domain-type approaches to relatively narrow sets of digital objects defined by event, topic, or community. each of these approaches results in limits to the scope of digital preservation. the human mediation of the selective model limits the scale of what can be preserved, as objects can only be acquired as quickly as staff can appraise them.
the collaborative and thematic models offer the potential for thorough coverage of their target but by definition are limited in scope. the whole domain model avoids the bottleneck of human appraisal but, more than any other model, is subject to the access limitations discussed above. whole domain harvesting is also essentially wasteful, as it is an anti-selection approach—everything found is kept, irrespective of potential value. this wastefulness makes the whole domain model extremely expensive because of the technological resources required to manage information at such a scale.

finances
the ultimate limiting factor is financial reality. considerations of funding and cost have both broad and narrow effects. the narrow effects are on each of the other limitations previously identified—financial constraints are intertwined with the constraints imposed by technology, access, law, and selection. the technological model of digital preservation that offers the highest quality and lowest risk, redundant offsite copies, also carries hard-to-sustain costs. while the cost of storage continues to drop, hardware costs actually make up only a small percentage of the total cost of digital preservation. power, cooling, and—for offsite copy strategies—bandwidth costs are significant and do not decrease as scale increases to the same degree that storage costs do. cost considerations similarly fuel non-technical limitations: increased funding can increase the rate at which digital objects are accessed for preservation and can enable development of systems to mine deep web resources. selection is limited by the number of staff who can evaluate objects or the need to develop systems to automate appraisal. negotiating perpetual access to objects or arranging to purchase archival copies creates additional costs. the broad financial effect is that any digital preservation requires dedicated funding over an indefinite timespan. lavoie outlines the problem:

much of the discussion in the digital preservation community focuses on the problem of ensuring that digital materials survive for future generations. in comparison, however, there has been relatively little discussion of how we can ensure that digital preservation activities survive beyond the current availability of soft-money funding; or the transition from a project's first-generation management to the second; or even how they might be supplied with sufficient resources to get underway at all.20

there are many possible funding models for digital preservation,21 each with their own limitations. creators and rights holders can preserve their own content but normally have little incentive to do so over the long-term, as demand for access slackens. publicly funded agencies can preserve content, but they may lack a clear mandate for doing so, and they are chronically underfunded. preservation may be voluntarily funded, as is the case for wikipedia, although it is not clear if there is enough potential volunteer funding for more than a few preservation efforts. fees may support preservation, either through charging users for access or by third-party organizations charging content owners for archival services; in such cases, however, fees may also discourage access or provision of content, respectively.

a nested model of limitations
these aspects can be seen as a series of nested constraints (see figure 1).

figure 1. nested model of limitations
at the highest level, there are technical limitations on how much digital information can be preserved at an acceptable quality. within that constraint, only a limited portion of what could possibly be preserved can be accessed by archival institutions for digital preservation. next, within that which is accessible, there are legal limitations on what may be archived. the subset defined by technological, access, and legal limitations still holds far more information than archival institutions are capable of archiving, therefore selection is required, entailing either the limited quality of automated gathering or the limited quantity of human-mediated appraisal. finally, each of these constraints is in turn limited by financial considerations, so finances exert pressure at each level.

conclusion
it is possible to envision alternative ways to model these series of constraints—the order could be different, or they could all be centered on a single point but not nested within each other. thus, undue attention should not be given to the specific sequence outlined above. one important conclusion that may be drawn, however, is that the identified limitations are related but distinct. the preponderance of digital preservation research to date has understandably focused on overcoming technological limitations. with the establishment of the redundant backup model, which addresses technological limitations to a workable degree, the field would be well served by greater efforts to push back the non-technical limitations of access, law, and selection. the other conclusion is that costs are digital preservation's most pervasive limitation. as rosenthal plainly states it, "society's ever-increasing demands for vast amounts of data to be kept for the future are not matched by suitably lavish funds."22 if funding cannot be increased, expectations must be tempered. perhaps it has always been the case, but the scale of the digital landscape makes it clear that preservation is a process of triage. for the foreseeable future, the amount of digital information that could possibly be preserved far outstrips the amount that feasibly can be preserved. it is useful to put the advances in digital preservation technology in perspective and to recognize that non-technical factors also play a large role in determining how much of our cultural heritage may be preserved for the benefit of future generations.

references and notes
1. issues specific to digitized objects (i.e., digital versions of analog originals) are not specifically addressed herein. technological limitations apply equally to digitized and born-digital objects, however, and the remaining limitations overlap greatly in either case.
2. francine berman et al., sustainable economics for a digital planet: ensuring long-term access to digital information (blue ribbon task force on sustainable digital preservation and access, 2010), http://brtf.sdsc.edu/biblio/brtf_final_report.pdf (accessed apr. 23, 2011).
3. marilyn deegan and simon tanner, "some key issues in digital preservation," in digital convergence—libraries of the future, ed. rae earnshaw and john vince, 219–37 (london: springer london, 2007), www.springerlink.com.proxy-remote.galib.uga.edu/content/h12631/#section=339742&page=1 (accessed nov. 18, 2010).
4. berman et al., sustainable economics for a digital planet; deegan and tanner, "digital convergence."
5. data redundancy normally will also entail hardware migration; it may or may not also incorporate file format migration.
6. the library of congress, for instance, only began digital preservation in 2000 (www.digitalpreservation.gov/partners/pioneers/index.html [accessed apr. 24, 2011]).
7. david s. h. rosenthal, "bit preservation: a solved problem?" international journal of digital curation 5, no. 1 (july 21, 2010), www.ijdc.net/index.php/ijdc/article/view/151 (accessed mar. 14, 2011).
8. h. m. gladney, "durable digital objects rather than digital preservation," january 1, 2008, http://eprints.erpanet.org/149 (accessed mar. 14, 2011).
9. rosenthal, "bit preservation."
10. ibid. rosenthal cites studies by schroeder and gibson (2007) and pinheiro (2007).
11. ibid.
12. peter lyman, "archiving the world wide web," in building a national strategy for digital preservation: issues in digital media archiving (washington, dc: council on library and information resources and library of congress, 2002), 38–51, www.clir.org/pubs/reports/pub106/pub106.pdf (accessed dec. 1, 2010); f. mccown, c. c. marshall, and m. l. nelson, "why web sites are lost (and how they're sometimes found)," communications of the acm 52, no. 11 (2009): 141–45; margaret e. phillips, "what should we preserve? the question for heritage libraries in a digital world," library trends 54, no. 1 (summer 2005): 57–71.
13. deegan and tanner, "digital convergence"; mccown, marshall, and nelson, "why web sites are lost (and how they're sometimes found)."
14. june besek, copyright issues relevant to the creation of a digital archive: a preliminary assessment (the council on library and information resources and the library of congress, 2003), www.clir.org/pubs/reports/pub112/contents.html (accessed mar. 15, 2011).
15. ibid., 17.
16. archival institutions that do not pay heed to this restriction, such as the internet archive (www.archive.org), claim their actions constitute fair use. the legality of this claim is as yet untested.
17. berman et al., sustainable economics for a digital planet.
18. francine berman, "got data?" communications of the acm 51, no. 12 (december 2008): 50, http://portal.acm.org/citation.cfm?id=1409360.1409376&coll=portal&dl=acm&idx=j79&part=magazine&wanttype=magazines&title=communications (accessed nov. 20, 2010).
19. phillips, "what should we preserve?"
20. brian f. lavoie, "the fifth blackbird," d-lib magazine 14, no. 3/4 (march 2008): i, www.dlib.org/dlib/march08/lavoie/03lavoie.html (accessed mar. 14, 2011).
21. berman et al., sustainable economics for a digital planet.
22. rosenthal, "bit preservation."

76 information technology and libraries | june 2010

godmar back and annette bailey
web services and widgets for library information systems

as more libraries integrate information from web services to enhance their online public displays, techniques that facilitate this integration are needed. this paper presents a technique for such integration that is based on html widgets.
we discuss three example systems (google book classes, tictoclookup, and majax) that implement this technique. these systems can be easily adapted without requiring programming experience or expensive hosting.

to improve the usefulness and quality of their online public access catalogs (opacs), more and more librarians include information from additional sources into their public displays.1 examples of such sources include web services that provide additional bibliographic information, social bookmarking and tagging information, book reviews, alternative sources for bibliographic items, table-of-contents previews, and excerpts. as new web services emerge, librarians quickly integrate them to enhance the quality of their opac displays. conversely, librarians are interested in opening the bibliographic, holdings, and circulation information contained in their opacs for inclusion into other web offerings they or others maintain. for example, by turning their opac into a web service, subject librarians can include up-to-the-minute circulation information in subject or resource guides. similarly, university instructors can use an opac's metadata records to display citation information ready for import into citation management software on their course pages. the ability to easily create such "mash-up" pages is crucial for increasing the visibility and reach of the digital resources libraries provide.

although the technology to use web services to create mash-ups is well known, several practical requirements must be met to facilitate its widespread use. first, any environment providing for such integration should be easy to use, even for librarians with limited programming background. this ease of use must extend to environments that include proprietary systems, such as vendor-provided opacs. second, integration must be seamless and customizable, allowing for local display preferences and flexible styling. third, the setup, hosting, and maintenance of any necessary infrastructure must be low-cost and should maximize the use of already available or freely accessible resources. fourth, performance must be acceptable, both in terms of latency and scalability.2

in this paper we discuss the design space of methods for integrating information from web services into websites. we focus primarily on client-side mash-ups, in which code running in the user's browser contacts web services directly without the assistance of an intermediary server or proxy. to create such mash-ups, we advocate the use of "widgets," which are easy-to-use, customizable html elements whose use does not require programming knowledge. although the techniques we discuss apply to any web-based information system, we specifically consider how an opac can become both the target of web services integration and also a web service that provides information to be integrated elsewhere. we describe three widget libraries we have developed, which provide access to four web services. these libraries have been deployed by us and others. our contributions are twofold: we give practitioners an insight into the trade-offs surrounding the appropriate choice of mash-up model, and we present the specific designs and use examples of three concrete widget libraries librarians can directly use or adapt. all software described in this paper is available under the lgpl open source license.

godmar back (gback@cs.vt.edu) is assistant professor, department of computer science, and annette bailey (afbailey@vt.edu) is assistant professor, university libraries, virginia tech university, blacksburg.

■■ background
web-based information systems use a client-server architecture in which the server sends html markup to the user's browser, which then renders this html and displays it to the user. along with html markup, a server may send javascript code that executes in the user's browser. this javascript code can in turn contact the original server or additional servers and include information obtained from them into the rendered content while it is being displayed. this basic architecture allows for myriad possible design choices and combinations for mash-ups. each design choice has implications for ease of use, customizability, programming requirements, hosting requirements, scalability, latency, and availability.

server-side mash-ups
in a server-side mash-up design, shown in figure 1, the mash-up server contacts the base server and each source when it receives a request from a client. it combines the information received from the base server and the sources and sends the combined html to the client. server-side mash-up systems that combine base and mash-up servers are also referred to as data mash-up systems. such data mash-up systems typically provide a web-based configuration front-end that allows users to select data sources, specify the manner in which they are combined, and to create a layout for the entire mash-up. examples of such systems include dapper and yahoo! pipes.3 these systems require very little programming knowledge, but they limit mash-up creators to the functionality supported by a particular system and do not allow the user to leverage the layout and functionality of an existing base server, such as an existing opac. integrating server-side mash-up systems with proprietary opacs as the base server is difficult because the mash-up server must parse the opac's output before integrating any additional information. moreover, users must now visit—or be redirected to—the url of the mash-up server. although some emerging extensible opac designs provide the ability to include information from external sources directly and easily, most currently deployed systems do not.4 in addition, those mash-up servers that do usually require server-side programming to retrieve and integrate the information coming from the mash-up sources into the page. the availability of software libraries and the use of special purpose markup languages may mitigate this requirement in the future.
from a performance scalability point of view, the mash-up server is a bottleneck in server-side mash-ups and therefore must be made large enough to handle the expected load of end-user requests. on the other hand, the caching of data retrieved from mash-up sources is simple to implement in this arrangement because only the mash-up server contacts these sources. such caching reduces the frequency with which requests have to be sent to sources if their data is cacheable, that is, if real-time information is not required. the latency in this design is the sum of the time required for the client to send a request to the mash-up server and receive a reply, plus the processing time required by the server, plus the time incurred by sending a request and receiving a reply from the last responding mash-up source. this model assumes that the mash-up server contacts all sources in parallel, or as soon as the server knows that information from a source should be included in a page. the availability of the system depends on the availability of all mash-up sources. if a mash-up source does not respond, the end user must wait until such failure is apparent to the mash-up server via a timeout. finally, because the mash-up server acts as a client to the base and source servers, no additional security considerations apply with respect to which sources may be contacted. there also are no restrictions on the data interchange format used by source servers as long as the mash-up server is able to parse the data returned.

figure 1. server-side mash-up construction

client-side mash-ups
in a client-side setup, shown in figure 2, the base server sends only a partial website to the client, along with javascript code that instructs the client which other sources of information to contact. when executed in the browser, this javascript code retrieves the information from the mash-up sources directly and completes the mash-up.

figure 2. client-side mash-up construction

the primary appeal of client-side mashing is that no mash-up server is required, and thus the url that users visit does not change. consequently, the mash-up server is no longer a bottleneck. equally important, no maintenance is required for this server, which is particularly relevant when libraries use turnkey solutions that restrict administrative access to the machine housing their opac. on the other hand, without a mash-up server, results from mash-up sources can no longer be centrally cached. thus the mash-up sources themselves must be sufficiently scalable to handle the expected number of requests. as a load-reducing strategy, mash-up sources can label their results with appropriate expiration times to influence the caching of results in the clients' browsers. availability is increased because the mash-up degrades gracefully if some of the mash-up sources fail, since the information from the remaining sources can still be displayed to the user. assuming that requests are sent by the client in parallel or as soon as possible, and assuming that each mash-up source responds with similar latency to requests sent by the user's browser as to requests sent by a mash-up server, the latency for a client-side mash-up is similar to the server-side mash-up.
however, unlike in the server-side approach, the page designer has the option to display partial results to the user while some requests are still in progress, or even to delay sending some requests until the user explicitly requests the data by clicking on a link or other element on the page. because client-side mash-ups rely on javascript code to contact web services directly, they are subject to a number of restrictions that stem from the security model governing the execution of javascript code in current browsers. this security model is designed to protect the user from malicious websites that could exploit client-side code and abuse the user’s credentials to retrieve html or xml data from other websites to which a user has access. such malicious code could then relay this potentially sensitive data back to the malicious site. to prevent such attacks, the security model allows the retrieval of html text or xml data only from sites within the same domain as the origin site, a policy commonly known as sameorigin policy. in figure 2, sources a and b come from the same domain as the page the user visits. the restrictions of the same-origin policy can be avoided by using the javascript object notation (json) interchange format.5 because client-side code may retrieve and execute javascript code served from any domain, web services that are not co-located with the origin site can make their results available using json. doing so facilitates their inclusion into any page, independent of the domain from which it is served (see source c in figure 2). many existing web services already provide an option to return data in json format, perhaps along with other formats such as xml. for web services that do not, a proxy server may be required to translate the data coming from the service into json. if the implementation of a proxy server is not feasible, the web service is usable only on pages within the same domain as the website using it. client-side mash-ups lend themselves naturally to enhancing the functionality of existing, proprietary opac systems, particularly when a vendor provides only limited extensibility. because they do not require server-side programming, the absence of a suitable vendor-provided server-side programming interface does not prevent their creation. oftentimes, vendor-provided templates or variables can be suitably adapted to send the necessary html markup and javascript code to the client. the amount of javascript code a librarian needs to write (or copy from a provided example) determines both the likelihood of adoption and the maintainability of a given mash-up creation. the less javascript code there is to write, the larger the group of librarians who feel comfortable trying and adopting a given implementation. the approach of using html widgets hides the use of javascript almost entirely from the mash-up creator. html widgets represent specially composed markup, which will be replaced with information coming from a mash-up source when the page is rendered. because the necessary code is contained in a javascript library, adapters do not need to understand programming to use the information coming from the web service. finally, html widgets are also preferable for javascript-savvy users because they create a layer of abstraction over the complexity and browser dependencies inherent in javascript programming. 
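because json responses are loaded as executable javascript, the usual implementation of the workaround described above is to inject a <script> element whose url names a callback function; the browser fetches and runs the returned code, which invokes the callback with the data. a minimal javascript sketch of the pattern, using the google book search dynamic link api request that appears in table 1 below (the helper name and the empty handler are illustrative, not part of the widget libraries described in this paper):

// jsonp: fetch a cross-domain web service response by appending a <script> tag
function jsonpRequest(url, callbackName, handler) {
  window[callbackName] = handler;                    // the service calls this function
  var script = document.createElement('script');
  script.src = url + '&callback=' + callbackName;    // e.g., ...&callback=process
  document.getElementsByTagName('head')[0].appendChild(script);
}
// example: request metadata for an isbn from the google book search dynamic link api
jsonpRequest(
  'http://books.google.com/books?bibkeys=isbn:0596000278&jscmd=viewapi',
  'process',
  function (result) { /* update the page with the returned metadata */ }
);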
■■ the google book classes widget library
to illustrate our approach, we present a first example that allows the integration of data obtained from google book search into any website, including opac pages. google book search provides access to google's database of book metadata and contents. because of the company's book scanning activities as well as through agreements with publishers, google hosts scanned images of many book jackets as well as partial or even full previews for some books. many libraries are interested in either using the book jackets when displaying opac records or alerting their users if google can provide a partial or full view of an item a user selected in their catalog, or both.6 this service can help users decide whether to borrow the book from the library.

the google book search dynamic link api
the google book search dynamic link api is a json-based web service through which google provides certain metadata for items it has indexed. it can be queried using bibliographic identifiers such as isbn, oclc number, or library of congress control number (lccn). it returns a small set of data that includes the url of a book jacket thumbnail image, the url of a page with bibliographic information, the url of a preview page (if available), as well as information about the extent of any preview and whether the preview viewer can be embedded directly into other pages. table 1 shows the json result returned for an example isbn.

widgetization
to facilitate the easy integration of this service into websites without javascript programming, we developed a widget library. from the adapter's perspective, the use of these widgets is extremely simple. the adapter places html <span> or <div>
tags into the page where they want data from google book search to display. these tags contain an html attribute that acts as an identifier to describe the bibliographic item for which information should be retrieved. it may contain its isbn, oclc number, or lccn. in addition, the tags also contain one or more html <class> attributes to describe which processing should be done with the information retrieved from google to integrate it into the page. these classes can be combined with a list of traditional css classes in the <class> attribute to apply further style and formatting control. examples as an example, consider the following html an adapter may use in a page: <span title=“isbn:0596000278” class=“gbs -thumbnail gbs-link-to-preview”></span> when processed by the google book classes widget library, the class “gbs-thumbnail” instructs the widget to embed a thumbnail image of the book jacket for isbn 0596000278, and “gbs-link-to-preview” provides instructions to wrap the <span> tag in a hyperlink pointing to google’s preview page. the result is as if the server had contacted google’s web service and constructed the html shown in example 1 in table 2, but the mash-up creator does not need to be concerned with the mechanics of contacting google’s service and making the necessary manipulations to the document. example 2 in table 2 demonstrates a second possible use of the widget. in this example, the creator’s intent is to display an image that links to google’s information page if and only if google provides at least a partial preview for the book in question. this goal is accomplished by placing the image inside the span and using style=“display:none” to make the span initially invisible. the span is made visible only if a preview is available at google, displaying the hyperlinked image. the full list of features supported by the google book classes widget library can be found in table 3. integration with legacy opacs the approach described thus far assumes that the mashup creator has sufficient control over the html markup that is sent to the user. this assumption does not always hold if the html is produced by a vendor-provided system, since such systems automatically generate most of the html used to display opac search results or individual bibliographic records. if the opac provides an extension system, such as a facility to embed customized links to external resources, it may be used to generate the necessary html by utilizing variables (e.g., “@#isbn@” for isbn numbers) set by the opac software. if no extension facility exists, accommodations by the widget library are needed to maintain the goal of not requiring any programming on the part of the adapter. we implemented such accommodations to facilitate the use of google book classes within a iii millennium opac.7 we used magic strings such as “isbn:millennium.record” in a table 1. 
table 1. sample request and response for google book search dynamic link api

request:
http://books.google.com/books?bibkeys=isbn:0596000278&jscmd=viewapi&callback=process

json response:
process({
  "isbn:0596000278": {
    "bib_key": "isbn:0596000278",
    "info_url": "http://books.google.com/books?id=ezqe1hh91q4c\x26source=gbs_viewapi",
    "preview_url": "http://books.google.com/books?id=ezqe1hh91q4c\x26printsec=frontcover\x26source=gbs_viewapi",
    "thumbnail_url": "http://bks4.books.google.com/books?id=ezqe1hh91q4c\x26printsec=frontcover\x26img=1\x26zoom=5\x26sig=acfu3u2d1usnxw9baqd94u2nc3quwhjn2a",
    "preview": "partial",
    "embeddable": true
  }
});

table 2. example of client-side processing by the google book classes widget library

example 1, html written by adapter:
<span title="isbn:0596000278" class="gbs-thumbnail gbs-link-to-preview"></span>

example 1, resultant html after client-side processing:
<a href="http://books.google.com/books?id=ezqe1hh91q4c&printsec=frontcover&source=gbs_viewapi">
  <span title="" class="gbs-thumbnail gbs-link-to-preview">
    <img src="http://bks3.books.google.com/books?id=ezqe1hh91q4c&printsec=frontcover&img=1&zoom=5&sig=acfu3u2d1usnxw9baqd94u2nc3quwhjn2a" />
  </span>
</a>

example 2, html written by adapter:
<span style="display: none" title="isbn:0596000278" class="gbs-link-to-info gbs-if-partial-or-full">
  <img src="http://www.google.com/intl/en/googlebooks/images/gbs_preview_button1.gif" />
</span>

example 2, resultant html after client-side processing:
<a href="http://books.google.com/books?id=ezqe1hh91q4c&source=gbs_viewapi">
  <span title="" class="gbs-link-to-info gbs-if-partial-or-full">
    <img src="http://www.google.com/intl/en/googlebooks/images/gbs_preview_button1.gif" />
  </span>
</a>

table 3. supported google book classes

gbs-thumbnail: include an <img> embedding the thumbnail image
gbs-link-to-preview: wrap span/div in link to preview at google book search (gbs)
gbs-link-to-info: wrap span/div in link to info page at gbs
gbs-link-to-thumbnail: wrap span/div in link to thumbnail at gbs
gbs-embed-viewer: directly embed a viewer for the book's content into the page, if possible
gbs-if-noview: keep this span/div only if gbs reports that the book's viewability is "noview"
gbs-if-partial-or-full: keep this span/div only if gbs reports that the book's viewability is at least "partial"
gbs-if-partial: keep this span/div only if gbs reports that the book's viewability is "partial"
gbs-if-full: keep this span/div only if gbs reports that the book's viewability is "full"
gbs-remove-on-failure: remove this span/div if gbs doesn't return book information for this item

■■ the tictoclookup widget library

the tictocs journal table of contents service is a free online service that allows academic researchers and other users to keep up with newly published research by giving them access to thousands of journal tables of contents from multiple publishers.8 the tictocs consortium compiles and maintains a dataset that maps issns and journal titles to rss-feed urls for the journals' tables of contents.
the tictoclookup web service

we used the tictocs dataset to create a simple json web service called "tictoclookup" that returns rss-feed urls when queried by issn and, optionally, by journal title. table 4 shows an example query and response.

table 4. sample request and response for the tictoclookup web service

request:
http://tictoclookup.appspot.com/0028-0836?title=nature&jsoncallback=process

json response:
process({
  "lastmod": "wed apr 29 05:42:36 2009",
  "records": [{
    "title": "nature",
    "rssfeed": "http://www.nature.com/nature/current_issue/rss"
  }],
  "issn": "00280836"
});

to accommodate different hosting scenarios, we created two implementations of this tictoclookup: a standalone and a cloud-based implementation. the standalone version is implemented as a python web application conformant to the web server gateway interface (wsgi) specification. hosting this version requires access to a web server that supports a wsgi-compatible environment, such as apache's mod_wsgi. the python application reads the tictocs dataset and responds to lookup requests for specific issns. a cron job downloads the most up-to-date version of the dataset periodically.

the cloud version of the tictoclookup service is implemented as a google app engine (gae) application. it uses the highly scalable and highly available gae datastore to store tictocs data records. gae applications run on servers located in google's regional data centers so that requests are handled by a data center geographically close to the requesting client. as of june 2009, google hosting of gae applications is free, which includes a free allotment of several computational resources. for each application, gae allows quotas of up to 1.3 million requests and the use of up to 10 gb of bandwidth per twenty-four-hour period. although this capacity is sufficient for the purposes of many small and medium-size institutions, additional capacity can be purchased at a small cost.

widgetization

to facilitate the easy integration of this service into websites without javascript programming, we developed a widget library. like google book classes, this widget library is controlled via html attributes associated with html <span> or <div> tags that are placed into the page where the user decides to display data from the tictoclookup service. the html <title> attribute identifies the journal by its issn or its issn and title. as with google book classes, the html <class> attribute describes the desired processing, which may contain traditional css classes.

example

consider the following html an adapter may use in a page:

<span style="display:none" class="tictoc-link tictoc-preview tictoc-alternate-link" title="issn:00280836: nature">
  click to subscribe to table of contents for this journal
</span>

when processed by the tictoclookup widget library, the class "tictoc-link" instructs the widget to wrap the span in a link to the rss feed at which the table of contents is published, allowing users to subscribe to it. the class "tictoc-preview" associates a tooltip element with the span, which displays the first entries of the feed when the user hovers over the link. we use the google feeds api, another json-based web service, to retrieve a cached copy of the feed. the "tictoc-alternate-link" class places an alternate link into the current document, which in some browsers triggers the display of the rss feed icon in the status bar. the <span> element, which is initially invisible, is made visible if and only if the tictoclookup service returns information for the given pair of issn and title. figure 4 provides a screenshot of the display if the user hovers over the link.

figure 4. sample use of tictoclookup classes
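the lookup underlying these classes follows the same jsonp pattern as the google book classes example. the sketch below shows a minimal client-side call against the request and response format of table 4 and mimics the "tictoc-link" behavior of revealing a hidden placeholder; the function names and the "toc-placeholder" id are our own illustration, not the published tictoc.js code.

// illustrative jsonp call against the tictoclookup service (format per table 4)
function lookupTicToc(issn, title, handler) {
  var callbackName = "tictocCallback_" + new Date().getTime();
  window[callbackName] = function (result) {
    window[callbackName] = undefined;
    handler(result.records && result.records.length ? result.records[0] : null);
  };
  var script = document.createElement("script");
  script.src = "http://tictoclookup.appspot.com/" + encodeURIComponent(issn) +
               "?title=" + encodeURIComponent(title) +
               "&jsoncallback=" + callbackName;
  document.body.appendChild(script);
}

// example: reveal a hidden placeholder span and wrap it in a link to the
// journal's table-of-contents feed, similar to the "tictoc-link" class
lookupTicToc("0028-0836", "nature", function (record) {
  var span = document.getElementById("toc-placeholder");   // hypothetical id
  if (!record || !span) { return; }    // leave the span hidden if no feed is known
  var link = document.createElement("a");
  link.href = record.rssfeed;
  span.parentNode.replaceChild(link, span);
  link.appendChild(span);
  span.style.display = "";             // make the span visible
});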
as with google book classes, the mash-up creator does not need to be concerned with the mechanics of contacting the tictoclookup web service and making the necessary manipulations to the document. table 5 provides a complete overview of the classes tictoclookup supports.

table 5. supported tictoclookup classes

tictoc-link: wrap span/div in link to table of contents
tictoc-preview: display tooltip with preview of current entries
tictoc-embed-n: embed preview of first n entries
tictoc-alternate-link: insert <link rel="alternate"> into document
tictoc-append-title: append the title of the journal to the span/div

integration with legacy opacs

similar to the google book classes widget library, we implemented provisions that allow the use of tictoclookup classes on pages over which the mash-up creator has limited control. for instance, specifying a title attribute of "issn:millennium.issnandtitle" harvests the issn and journal title from the iii millennium's record display page.

■■ majax

whereas the widget libraries discussed thus far integrate external web services into an opac display, majax is a widget library that integrates information coming from an opac into other pages, such as resource guides or course displays. majax is designed for use with a iii millennium integrated library system (ils) whose vendor does not provide a web-services interface. the techniques we used, however, extend to other opacs as well. like many legacy opacs, millennium not only lacks a web-services interface, but also lacks any programming interface to the records contained in the system and does not provide access to the database or file system of the machine housing the opac.

providing opac data as a web service

we implemented two methods to access records from the millennium opac using bibliographic identifiers such as isbn, oclc number, bibliographic record number, and item title. both methods provide access to complete marc records and holdings information, along with locations and real-time availability for each held item. majax extracts this information via screen scraping from the marc record display page. as with all screen-scraping approaches, the code performing the scraping must be updated if the output format provided by the opac changes. in our experience, such changes occur at a frequency of less than once per year.

the first method, majax 1, implements screen scraping using javascript code that is contained in a document placed in a directory on the server (/screens), which is normally used for supplementary resources, such as images. this document is included in the target page as a hidden html <iframe> element (see frame b in figure 2). consequently, the same-domain restriction applies to the code residing in it. majax 1 can thus be used only on pages within the same domain; for instance, if the opac is housed at opac.library.university.edu, majax 1 may be used on all pages within *.university.edu (not merely *.library.university.edu). the key advantage of majax 1 is that no additional server is required.
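the following heavily simplified sketch conveys the screen-scraping idea behind majax 1: a script served from the opac's own domain reads the record display page loaded in a hidden <iframe> and extracts availability information. the element names, the "majax-record-frame" and "holdings-target" ids, and the text patterns are hypothetical; a real implementation must match whatever markup the opac actually emits and be adjusted when that output changes.

// much-simplified screen-scraping sketch (not the published majax 1 code)
function extractHoldings(recordDocument) {
  var holdings = [];
  var rows = recordDocument.getElementsByTagName("tr");
  for (var i = 0; i < rows.length; i++) {
    var text = rows[i].innerText || rows[i].textContent || "";
    // assume holdings rows mention a status keyword such as "available"
    if (/available|checked out|due/i.test(text)) {
      holdings.push(text.replace(/\s+/g, " ").trim());
    }
  }
  return holdings;   // e.g. ["main library  qa76.76 ...  available"]
}

// usage: read the hidden iframe; same-domain access is exactly what restricts
// majax 1 to pages served from within the opac's domain
var frame = document.getElementById("majax-record-frame");   // hypothetical id
frame.onload = function () {
  var holdings = extractHoldings(frame.contentDocument);
  var target = document.getElementById("holdings-target");   // hypothetical placeholder
  target.appendChild(document.createTextNode(holdings.join("; ")));
};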
the second method, majax 2, uses an intermediary server that retrieves the data from the opac, translates it to json, and returns it to the client. this method, shown in figure 5, returns json data and therefore does not suffer from the same-domain restriction. however, it requires hosting the majax 2 web service. like the tictoclookup web service, we implemented the majax 2 web service using python conformant to wsgi. a single installation can support multiple opacs.

figure 5. architecture of the majax 2 web service

widgetization

the majax widget library allows the integration of both majax 1 and majax 2 data into websites without javascript programming. the <span> tags function as placeholders, and <title> and <class> attributes describe the desired processing. majax provides a number of "majax classes," multiple of which can be specified. these classes allow a mash-up creator to insert a large variety of bibliographic information, such as the values of marc fields. classes are also provided to insert fully formatted, ready-to-copy bibliographic references in harvard style, live circulation information, links to the catalog record, links to online versions of the item (if applicable), a ready-to-import ris description of the item, and even images of the book cover. a list of classes majax supports is provided in table 6.

table 6. selected majax classes

majax-marc-fff-s: marc field fff, subfield s
majax-marc-fff: concatenation of all subfields in field fff
majax-syndetics-*: book cover image
majax-showholdings: current holdings and availability information
majax-showholdings-brief: current holdings and availability information, in brief format
majax-endnote: ris version of record
majax-ebook: link to online version, if any
majax-linktocatalog: link to record in catalog
majax-harvard-reference: reference in harvard style
majax-newline: newline
majax-space: space

examples

figure 6 provides an example use of majax widgets. four <span> tags expand into the book cover, a complete harvard-style reference, the value of a specific marc field (020), and a display of the current availability of the item, wrapped in a link to the catalog record. texts such as "copy is available" shown in figure 6 are localizable.

figure 6. example use of majax widgets

html written by adapter:
<table width="340"><tr><td>
  <span class="majax-syndetics-vtech" title="i1843341662"></span>
</td><td>
  <span class="majax-harvard-reference" title="i1843341662"></span>
  <br />
  isbn: <span class="majax-marc-020" title="i1843341662"></span>
  <br />
  <span class="majax-linktocatalog majax-showholdings" title="i1843341662"></span>
</td></tr></table>

display in browser after processing:
dahl, mark., banerjee, kyle., spalti, michael., 2006, digital libraries : integrating content and systems / oxford, chandos publishing, xviii, 203 p.
isbn: 1843341662 (hbk.)
1 copy is available

even though there are multiple majax <span> tags that refer to the same isbn, the majax widget library will contact the majax 1 or majax 2 web service only once per identifier, independent of how often it is used in a page. to manage the load, the majax client-side library can be configured to not exceed a maximum number of requests per second, per client.

all software described in this paper is available under the lgpl open source license. the majax libraries have been used by us and others for about two years. for instance, the "new books" list in our library uses majax 1 to provide circulation information. faculty members at our institution are using majax to enrich their course websites. a number of libraries have adopted majax 1, which is particularly easy to host because no additional server is required.

■■ related work

most ilss in use today do not provide suitable web-services interfaces to access either bibliographic information or availability data.9 this shortcoming is addressed by multiple initiatives. the ils discovery interface task force (ils-di) created a set of recommendations that facilitate the integration of discovery interfaces with legacy ilss, but does not define a concrete api.10 related, the iso 20775 holdings standard describes an xml schema to describe the availability of items across systems, but does not describe an api for accessing them.11 many ilss provide a z39.50 interface in addition to their html-based web opacs, but z39.50 does not provide standardized holdings and availability.12 nevertheless, there is hope within the community that ils vendors will react to their customers' needs and provide web-services interfaces that implement these recommendations.

the jangle project provides an api and an implementation of the ils-di recommendations through a representational state transfer (rest)-based interface that uses the atom publishing protocol (app).13 jangle can be linked to legacy ilss via connectors. the use of the xml-based app prevents direct access from client-side javascript code, however. in the future, adoption and widespread implementation of the w3c working draft on cross-origin resource sharing may relax the same-origin restriction in a controlled fashion, and thus allow access to app feeds from javascript across domains.14

screen scraping is a common technique used to overcome the lack of web-services interfaces. for instance, oclc's worldcat local product obtains access to availability information from legacy ilss in a similar fashion as our majax 2 service.15 whereas the web services used or created in our work exclusively use a rest-based model and return data in json format, interfaces based on soap (formerly simple object access protocol) whose semantics are described by a wsdl specification provide an alternative if access from within client-side javascript code is not required.16

oclc grid services provides rest-based web-services interfaces to several databases, including the worldcat search api and identifier services such as xisbn, xissn, and xoclcnum for frbr-related metadata.17 these services support xml and json and could benefit from widgetization for easier inclusion into client pages.

the use of html markup to encode processing instructions is common in javascript frameworks, such as yui or dojo, which use <div> elements with custom-defined attributes (so-called expando attributes) for this purpose.18 google gadgets uses a similar technique as well.19 the widely used context objects in spans (coins) specification exploits <span> tags to encode openurl context objects in pages for processing by client-side extensions.20 librarything uses client-side mash-up techniques to incorporate a social tagging service into opac pages.21 although their technique uses a <div> element as a placeholder, it does not allow customization via classes; the changes to the content are encoded in custom-generated javascript code for each library that subscribes to the service. the juice project shares our goal of simplifying the enrichment of opac pages with content from other sources.22 it provides a set of reusable components that is directed at javascript programmers, not librarians. in the computer-science community, multiple emerging projects investigate how to simplify the creation of server-side data mash-ups by end user programmers.23

■■ conclusion

this paper explored the design space of mash-up techniques for the seamless inclusion of information from web services into websites. we considered the cases where an opac is either the target of such integration or the source of the information being integrated. we focused on client-side techniques in which each user's browser contacts web services directly because this approach lends itself to the creation of html widgets. these widgets allow the integration and customization of web services without requiring programming. therefore nonprogrammers can become mash-up creators. we described in detail the functionality and use of several widget libraries and web services we built. table 7 provides a summary of the functionality and hosting requirements for each system discussed. although the specific requirements for each system differ because of their respective nature, all systems are designed to be deployable with minimum effort and resource requirements. this low entry cost and the provision of a high-level, nonprogramming interface constitute two crucial preconditions for the broad adoption of mash-up techniques in libraries, which in turn has the potential to vastly increase the reach and visibility of their electronic resources in the wider community.

table 7. summary of features and requirements for the widget libraries presented in this paper
(columns: majax 1; majax 2; google book classes; tictoclookup classes)

web service: screen scraping of iii record display; json proxy for iii record display; google book search dynamic link api (books.google.com); tictoc cloud application (tictoclookup.appspot.com)
hosted by: existing millennium installation (/screens); wsgi/python script on libx.lib.vt.edu; google, inc.; google, inc. via google app engine
data provenance: your opac; your opac; google; jisc (www.tictocs.ac.uk)
additional cost: n/a; can use libx.lib.vt.edu for testing, must run wsgi-enabled web server in production; free, but subject to google terms of service; generous free quota, pay per use beyond that
same-domain restriction: yes; no; no; no
widgetization: majax.js, class-based (majax classes); majax.js, class-based (majax classes); gbsclasses.js, class-based (gbs classes); tictoc.js, class-based (tictoc classes)
requires javascript programming: no; no; no; no
requires additional server: no; yes (apache + mod_wsgi); no; no (if using gae), else need apache + mod_wsgi
iii bib record display: n/a; n/a; yes; yes
iii webbridge integration: yes; yes; yes; yes

references

1. nicole engard, ed., library mashups: exploring new ways to deliver library data (medford, n.j.: information today, 2009); andrew darby and ron gilmour, "adding delicious data to your library website," information technology & libraries 28, no. 2 (2009): 100–103.
2. monica brown-sica, "playing tag in the dark: diagnosing slowness in library response time," information technologies & libraries 27, no. 4 (2008): 29–32.
3. dapper, "dapper dynamic ads," http://www.dapper.net/ (accessed june 19, 2009); yahoo!, "pipes," http://pipes.yahoo.com/pipes/ (accessed june 19, 2009).
4.
jennifer bowen, “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1,” information technology & libraries 27, no. 2 (2008): 6–19; john blyberg, “ils customer bill-of-rights,” online posting, blyberg.net, nov. 20, 2005, http://www.blyberg .net/2005/11/20/ils-customer-bill-of-rights/ (accessed june 18, 2009). 5. douglas crockford, “the application/json media type for javascript object notation (json),” memo, the internet society, july 2006, http://www.ietf.org/rfc/rfc4627.txt (accessed mar. 30, 2010). 6. google, “who’s using the book search apis?” http:// code.google.com/apis/books/casestudies/ (accessed june 16, 2009). 7. innovative interfaces, “millennium ils,” http://www.iii .com/products/millennium_ils.shtml (accessed june 19, 2009). 8. joint information systems committee, “tictocs journal tables of contents service,” http://www.tictocs.ac.uk/ (accessed june 18, 2009). 9. mark dahl, kyle banarjee, and michael spalti, digital libraries: integrating content and systems (oxford, united kingdom: chandos, 2006). 10. john ockerbloom et al., “dlf ils discovery interface task group (ils-di) technical recommendation,” (dec. 8, 2008), http://diglib.org/architectures/ilsdi/dlf_ils_ discovery_1.1.pdf (accessed june 18, 2009). 11. international organization for standardization, “information and documentation—schema for holdings information,” http://www.iso.org/iso/catalogue_detail .htm?csnumber=39735 (accessed june 18, 2009) 12. national information standards organization, “ansi/ niso z39.50—information retrieval: application service definition and protocol specification,” (bethesda, md.: niso pr., 2003), http://www.loc.gov/z3950/agency/z39-50-2003.pdf (accessed may 31, 2010). 13. ross singer and james farrugia, “unveiling jangle: untangling library resources and exposing them through the atom publishing protocol,” the code4lib journal no. 4 (sept. 22, 2008), http://journal.code4lib.org/articles/109 (accessed apr. 21, 2010); roy fielding, “architectural styles and the design of network-based software architectures” (phd diss., university of california, irvine, 2000); j. c. gregorio, ed., “the atom publishing protocol,” memo, the internet engineering task force, oct. 2007, http://bitworking.org/projects/atom/rfc5023.html (accessed june 18, 2009). 14. world wide web consortium, “cross-origin resource sharing: w3c working draft 17 march 2009,” http://www .w3.org/tr/access-control/ (accessed june 18, 2009). 15. oclc online computer library center, “worldcat and cataloging documentation,” http://www.oclc.org/support/ documentation/worldcat/default.htm (accessed june 18, 2009). 16. f. curbera et al., “unraveling the web services web: an introduction to soap, wsdl, and uddi,” ieee internet computing 6, no. 2 (2002): 86–93. 17. oclc online computer library center, “oclc web services,” http://www.worldcat.org/devnet/wiki/services (accessed june 18, 2009); international federation of library associations and institutions study group on the functional requirements for bibliographic records, “functional requirements for bibliographic records : final report,” http://www.ifla.org/files/ cataloguing/frbr/frbr_2008.pdf (accessed mar. 31, 2010). 18. yahoo!, “the yahoo! user interface library (yui),” http://developer.yahoo.com/yui/ (accessed june 18, 2009); dojo foundation, “dojo—the javascript toolkit,” http://www .dojotoolkit.org/ (accessed june 18, 2009). 19. google, “gadgets.* api developer’s guide,” http://code. 
google.com/apis/gadgets/docs/dev_guide.html (accessed june 18, 2009). 20. daniel chudnov, “coins for the link trail,” library journal 131 (2006): 8–10. 21. librarything, “librarything,” http://www.librarything .com/widget.php (accessed june 19, 2009). 22. robert wallis, “juice—javascript user interface componentised extensions,” http://code.google.com/p/juice-project/ (accessed june 18, 2009). 23. jeffrey wong and jason hong, “making mashups with marmite: towards end-user programming for the web” conference on human factors in computing systems, san jose, california, april 28–may 3, 2007: conference proceedings, volume 2 (new york: association for computing machinery, 2007): 1435–44; guiling wang, shaohua yang, and yanbo han, “mashroom: end-user mashup programming using nested tables” (paper presented at the international world wide web conference, madrid, spain, 2009): 861–70; nan zang, “mashups for the web-active user” (paper presented at the ieee symposium on visual languages and human-centric computing, herrshing am ammersee, germany, 2008): 276–77. author id box for 2 column layout this article examines the linguistic structure of folksonomy tags collected over a thirty-day period from the daily tag logs of del.icio.us, furl, and technorati. the tags were evaluated against the national information standards organization (niso) guidelines for the construction of controlled vocabularies. the results indicate that the tags correspond closely to the niso guidelines pertaining to types of concepts expressed, the predominance of single terms and nouns, and the use of recognized spelling. problem areas pertain to the inconsistent use of count nouns and the incidence of ambiguous tags in the form of homographs, abbreviations, and acronyms. with the addition of guidelines to the construction of unambiguous tags and links to useful external reference sources, folksonomies could serve as a powerful, flexible tool for increasing the user-friendliness and interactivity of public library catalogs, and also may be useful for encouraging other activities, such as informal online communities of readers and user-driven readers’ advisory services. o ne of the most daunting challenges of information management in the digital world is the ability to keep, or refind, relevant information; book­ marking is one of the most popular methods for storing relevant web information for reaccess and reuse (bruce, jones, and dumais 2004). the rising popularity of social bookmark managers, such as del.icio.us, addresses these concerns by allowing users to organize their bookmarks by assigning tags that reflect directly their own vocabu­ lary and needs. the collection of user­assigned tags is referred to commonly as a folksonomy. in recent years, significant developments have occurred in the creation of customizable user features in public library catalogs. these features offer clients the opportunity to customize their own library web pages and to store items of interest to them, such as book lists. client participation in these interfaces, however, is largely reactive; clients can select items from the catalog, but they have little ability to orga­ nize and categorize these items in a way that reflects their own needs and language. digital document repositories, such as library cata­ logs, normally index the subject of their contents via key­ words or subject headings. 
traditionally, such indexing is performed either by an authority, such as a librarian or a professional indexer, or is derived from the authors of the documents; in contrast, collaborative tagging, or folkson­ omy, allows anyone to freely attach keywords or tags to content. demspey (2003) and ketchell (2000) recommend that clients be allowed to annotate resources of interest and to share these annotations with other clients with similar interests. folksonomies can thus make significant contributions to public library catalogs by enabling cli­ ents to organize personal information spaces; namely, to create and organize their own personal information space in the catalog. clients find items of interest (items in the library catalog, citations from external databases, external web pages, and so on) and store, maintain, and organize them in the catalog using their own tags. in order to more fully understand these applications, it is important to examine how folksonomies are struc­ tured and used, and the extent to which they reflect user needs not found in existing lists of subject headings. the purpose of this proposed research is thus to examine the structure and scope of folksonomies. how are the tags that constitute the folksonomies structured? to what extent does this structure reflect and differ from the norms used in the construction of controlled vocabular­ ies ,such as library of congress subject headings? what are the strengths and weaknesses of folksonomies (for example, reflect user need, ambiguous headings, redun­ dant headings, and so forth)? this article will examine a selection of tags obtained from three folksonomy sites, del.icio.us (referred to henceforth as delicious), furl, and technorati, over a thirty­day period. the structure of these tags will be examined and evaluated against section 6 of the niso guidelines for the construction of controlled vocabularies (niso 2005), which looks specifically at the choice and form of terms. ■ definitions of folksonomies folksonomies have been described as “user­created meta­ data . . . grassroots community classification of digital assets” (mathes 2004). wikipedia (2006) describes a folksonomy as “an internet­based information retrieval methodology consisting of collaboratively generated, open­ended labels that categorize content such as web pages, online photographs, and web links.” the concept of collaboration is attributed commonly to folksonomies (bateman, brooks, and mccalla 2006; cattuto, loreto, and pietronero 2006; fichter 2006; golder and huberman the structure and form of folksonomy tags: the road to the public library catalog louise f. spiteri louise f. spiteri (louise.spiteri@dal.ca) is associate professor at the school of information management, dalhousie university, halifax, nova scotia, canada. this research was funded by the oclc/alise library and information science research grant program. the structure and form of folksonomy tags | spiteri 13 1� information technology and libraries | september 20071� information technology and libraries | september 2007 2006; mathes 2004; quintarelli 2005; udell 2004). thomas vander wal, who coined the term folksonomy, argues, however, that: the definition of folksonomy has become completely unglued from anything i recognize. . . . it is not col­ laborative . . . it is the result of personal free tagging of information and objects (anything with a url) for one’s own retrieval. the tagging is done in a social environment (shared and open to others). 
the act of tagging is done by the person consuming the informa­ tion” (vanderwal.net 2005). it may be more accurate, therefore, to say that folk­ sonomies are created in an environment where, although people may not actively collaborate in their creation and assignation of tags, they may certainly access and use tags assigned by others. folksonomies thus enable the use of shared tags. folksonomies are used primarily in social bookmark­ ing sites, such as delicious (http://del.icio.us/) and furl (http://www.furl.net/), which allow users to add sites they like to their personal collections of links, to organize and categorize these sites by adding their own terms, or tags, and to share this collection with other people with the same interests. the tags are used to collocate bookmarks within a user’s collection and bookmarks across the entire system, so, for example, the page http://del.icio.us/tag/blogging will show all bookmarks that are tagged with blogging by any member of the delicious site. ■ benefits of folksonomies quintarelli (2005) and fichter (2006) suggest that folk­ sonomies reflect the movement of people away from authoritative, hierarchical taxonomic schemes that reflect an external viewpoint and order that may not necessarily reflect users’ ways of thinking. “in a social distributed environment, sharing one’s own tags makes for innova­ tive ways to map meaning and let relationships naturally emerge” (quintarelli 2005). vander wal (2006) adds that “the value in this external tagging is derived from people using their own vocabulary and adding explicit mean­ ing, which may come from inferred understanding of the information/object.” an attractive feature of folksonomies is their inclusive­ ness; they reflect the vocabulary of the users, regardless of viewpoint, background, bias, and so forth. folksonomies may thus be perceived to be a democratic system where everyone has the opportunity to contribute and share tags (kroski 2006). the development of folksonomies may reflect also the difficulty and expense of applying con­ trolled taxonomies to the web: building, maintaining, and enforcing a sound, controlled vocabulary is often simply too expensive in terms of development time and of the steep learning curve needed by the user of the system to learn the classification scheme (fichter 2006; kroski 2006; quintarelli 2005; shirky 2004). a further limitation of taxonomies is that they may become outdated easily. new concepts or products may emerge that are not yet included in the taxonomy; in comparison, folksonomies easily accommodate such new concepts (fichter 2006; mitchell 2005; wu, zubair, and maly, 2006). shirky (2004) points out that the advantage of folksonomies is not that they are better than controlled vocabularies, but that they are better than nothing. folksonomies follow desire lines, which are expres­ sions of the direct information needs of the user (kroski 2006; mathes 2004; merholz 2004). these desire lines also may reflect the needs of communities of interest: tag­ gers who use same set of tags have formed a group and can seek each other out using simple search techniques. “tagging provides users an easy, yet powerful method to express themselves within a community” (szekely and torres 2005). ■ weaknesses of folksonomies folksonomies share the problems inherent to all uncon­ trolled vocabularies, such as ambiguity, polysemy, syn­ onymy, and basic level variation (fichter 2006; golder and huberman 2006; guy and tomkin 2006; mathes 2004). 
the terms in a folksonomy may have inherent ambiguity as different users apply terms to documents in different ways. the polysemous tag port could refer to a sweet fortified wine, a porthole, a place for loading and unloading ships, the left­hand side of a ship or air­ craft, or a channel endpoint in a communications system. folksonomies do not include guidelines for use or scope notes. folksonomies provide for no synonym control; the terms mac, macintosh, and apple, for example, are all used to describe apple macintosh computers. similarly, both singular and plural forms of terms appear (for example, flower and flowers), thus creating a number of redun­ dant headings. the problem with basic level variation is that related terms that describe an item vary along a continuum of specificity ranging from very general to very specific, so, for example, documents tagged perl and javascript may be too specific for some users, while a document tagged programming may be too general for others. folksonomies provide no formal guidelines for the choice and form of tags, such as the use of com­ pound headings, punctuation, word order, and so forth; for example, should one use the tag vegan cooking or cooking, vegan? guy and tomkin (2006) provide some general suggestions for tag selection best practices, such as the use of plural rather than singular forms, the use article title | author 15the structure and form of folksonomy tags | spiteri 15 of underscore to join terms in a multiterm concept (for example, open_source), following conventions estab­ lished by others, and adding synonyms. these sugges­ tions are rather too vague to be of much use, however; for example, under what circumstances should singular forms be used (such as noncount nouns), and how should synonyms be linked? ■ applications of folksonomies other than social bookmarking sites, folksonomies are used in commercial shopping sites, such as amazon (http://www.amazon.com/), where clients tag items of interest; these tags can be accessed by people with similar interests. platial (http://www.platial.com/ splash) is used to tag personal collections of maps. examples of the use of folksonomies for intranets include ibm’s social bookmarking application dogear, which allows people to bookmark pages within their intranet (http://domino.watson.ibm.com/cambridge/ research.nsf/99751d8eb5a20c1f852568db004efc90/ 1c181ee5fbcf59fb852570fc0052ad75?opendocument), and scuttle (http://sourceforge.net/projects/scuttle/), an open­source bookmarking project that can be hosted on web servers for free. penntags (http://tags.library. upenn.edu/) is a social bookmarking service offered by the university of pennsylvania library to its community members. steve museum is a project that is investigating the incorporation of folksonomies into museum catalogs (trant and wyman 2006). another potential application of folksonomies is to public library catalogs, where users can organize and tag items of interest in user­specific folders; users could then decide whether or not to post the tags publicly (spiteri 2006). ■ analyses of folksonomies analysis of the structure, or composition, of tags has thus far been limited; there has been more emphasis placed upon the co­occurrence of tags and their frequency of use. cattuto, loreto, and pietronero (2006) applied a stochas­ tic model of user behavior to investigate the statistical properties of tag co­occurrence; their results suggest that users of collaborative tagging systems share universal behaviors. 
michlmayr (2005) compared tags assigned to a set of delicious bookmarks to the dmoz (http://www. dmoz.org/) taxonomy, which is designed by a commu­ nity of volunteers. the study concluded that there were few instances of overlap between the two sets of terms. mathes (2004) provides an interesting analysis of the strengths and limitations of the structure of delicious and flickr, but does not provide an explanation of the meth­ odology used to derive his observations; it is not clear, for example, for how long he studied these two sites, how many tags he examined, what elements he was looking for, or what evaluative criteria he applied. golder and huberman (2006) conducted an analysis of the structure of collaborative tagging systems, look­ ing at user activity and kinds and frequencies of tags. specifically, golder and huberman looked at what tags delicious members assigned and how many bookmarks they assigned to each tag. this study identified a number of functions tags perform for bookmarks, including iden­ tifying the: ■ subject of the item; ■ format of the item (for example, blog); ■ ownership of the item; and ■ characteristics of the item (for example, funny). while the golder and huberman study provides an important look at tag use, their study is limited in that they examined only one site for a period of four days; their results are an excellent first step in the analysis of tag use, but the narrow focus of their population and sample size means that their observations are not easily generalized. furthermore, this study focuses more on how bookmarks are associated with tags (for example, how many bookmarks are assigned per tag and by whom) rather than at the structural composition of the tags themselves. guy and tonkin (2006) collected a random sampling of tags from delicious and flickr to see whether “popular objections to folksonomic tagging are based on fact.” the authors do not explain, however, over what period the tags were acquired (for example, over a one­day period, over a month), nor to they provide any evaluative criteria. the tags were entered into aspell, an open source spell checker, from which the authors concluded that 40 percent of flickr and 28 percent of delicious tags were either mis­ spelled, encoded in a manner not understood by aspell, or consisted of compound words of two or more words. tags did not follow convention in such areas as the use of case or singular versus plural forms. while this study certainly focuses upon the structure of the tags, the bases for the authors’ conclusions are problematic. it is not clear that the use of a spell checker is a sufficient measure of quality. does the spell checker allow for cultural variations in spell­ ing (for example, labor or labour)? how well­recognized and comprehensive is the source vocabulary for this spell checker? furthermore, if a tag does not exist in the spell checker, does this necessarily mean that the tag is incor­ rect? tags may include several neologisms, such as podcasting, that may not yet exist in conventional dictionaries but are well­recognized in a particular domain. the authors do not mention whether they took into account the cor­ 16 information technology and libraries | september 200716 information technology and libraries | september 2007 rect use of the singular form of such tags as noncountable nouns (for example, air) or tags that describe disciplines or emotions (for example, history and love). 
if a named entity (person or organization) was not recognized by aspell, does this mean that the tag was classified as incorrect? lastly, the authors seem to imply that compound words of two or more words are necessarily incorrect, which may not be the case (for example, open source software). the pitfalls of folksonomies have been well­docu­ mented; what is missing is an in­depth analysis of the linguistic structure of tags against an established bench­ mark. while popular opinion suggests that folksonomies suffer from ambiguous and inconsistent structure, the actual extent of these problems is not yet clear; further­ more, analyses conducted so far have not established clear benchmarks of quality pertaining to good tag structure. although there are no guidelines for the construction of tags, recognized guidelines do exist for the construction of terms that are used in taxonomies. although these guidelines discuss the elucidation of inter­term relation­ ships (hierarchical, associative, and equivalent), which does not apply to the flat space of folksonomies, they contain sections pertaining to the choice and formation of concept terms that may, in fact, have relevance for the construction of tags. ■ methodology selection of folksonomy sites tags were chosen from three popular folksonomy sites: delicious, furl, and technorati (http://www.technorati. com/). delicious and furl function as bookmarking sites, while technorati enables people to search for and organize blogs. these sites were chosen because they provide daily logs of the most popular tags that have been assigned by their members on a given day. the daily tag logs from each of the sites were acquired over a thirty­day period (february 1–march 2, 2006). the daily tags for each site were entered into an excel spreadsheet. a list of unique tags for each site was compiled after the thirty­day period; unique refers to the single instance of a tag. some of the tags were used only once during the thirty­day period, while others, such as travel, occurred several times, so travel appears only once in the list of unique tags. variations of the same tag—for example, car or cars, cheney or dick cheney—were considered to constitute two unique tags. only english­language tags were accumulated. the analysis of the tag structure in the three lists was conducted by applying the niso guidelines for thesaurus construction, which are the most current set of recognized guidelines for the: contents, display, construction . . . of controlled vocabu­ laries. this standard focuses on controlled vocabularies that are used for the representation of content objects in knowledge organization systems including lists, syn­ onym rings, taxonomies, and thesauri (niso 2005, 1). while folksonomies are not controlled vocabularies, they are lists of terms used to describe content, which means that the niso guidelines could work well as a benchmark against which to examine how folksonomy tags are structured as well as the extent to which this structure reflects the widely accepted norm for controlled vocabu­ laries. section 6 of the guidelines (term choice, scope, and form) was applied to the tags, specifically the following elements (see appendix a for the expanded list): 6.3 term choice 6.4 grammatical form of terms 6.5 nouns 6.6 selecting the preferred form only those elements in section 6 that were found to apply to the lists of unique tags are included in appendix a. 
for each site, the section 6 elements were applied to each unique tag; for example, it was noted whether a tag consists of one or more terms, whether the tag is a noun, adjective, or adverb, and so on. the frequency of occur­ rence of the section 6 elements was noted for each site and then compared across the three sites in order to determine the existence of any patterns in tag structure and the extent to which these patterns reflect current practice in the design of controlled vocabularies. definition and disambiguation of tags the meanings of the tags were determined based upon (1) the context of their use; and (2) their definition in three external sources, namely merriam webster online dic­ tionary (http://www.m­w.com/); google (http://www. google.com/); and wikipedia (http://www.wikipedia. org/). merriam­webster was used specifically to define all tags other than those that constitute unique entities (for example, named people, places, organizations, or products) and to determine the various meanings of tags that are homographs (for example, art or web). the actual concept represented by homographs was determined by examin­ ing the sites or blogs to which the tag was assigned. merriam­webster also was used to determine the grammatical form of a tag; for example, noun, verbal noun, adjective, or adverb. determining verbal nouns proved to be complicated, especially given that niso relies only on examples to illustrate such nouns. some tags could serve as both verbal and simple nouns; for example, the tag clipping could describe the activity to clip or an item that has been clipped, such as a newspaper article title | author 17the structure and form of folksonomy tags | spiteri 17 clipping. similarly, does skiing refer to an activity, or the sport? if the dictionary defined a tag as an activity, the tag was classified as a verbal noun. in the case of tags that were defined as both verbal nouns and simple nouns, the context in which the tag was used determined the final classification. the dictionary also was used to determine the type of concept represented by a tag. the niso guidelines do not define any of these seven types of concepts outlined in section 6.3.2; they provide only a short list of examples for each type. if the term represented by the tag was defined as an activity, property, material, event, discipline or field of study, or unit of measurement, it was classified as such unless the context of the tag suggested otherwise. if none of these six types was defined in the dictionary, the default value of thing was assigned to the tag. these definitions were then compared to the context in which the tag was used. in the case of the tag art, for example, an examination of the sites associated with this tag indicated that it refers to art objects, rather than the discipline, so it was classified as a thing. merriam­webster was used to determine whether a tag constitutes a recognized term in standard english (both united states and united kingdom variants); for example, the tag blogs is a recognized term in the dictionary, while podcasting is not. niso does not provide a clear definition of slang, neologism, or jargon, other than to say that they are nonstandard terms not generally found in dictionaries. is the term podcasting, for example, an instance of slang, jargon, or neologism? at what point does jargon become a neologism? 
because of the difficulty of distinguishing among these three categories, it was decided to use the broader category nonstandard terms to cover tags that (1) could not be found in the dictionary; or (2) are designated as vulgar or slang in the dictionary. google and wikipedia were used to define the mean­ ings of tags that constitute unique entities. wikipedia also was used to distinguish the various meanings of tags that constitute abbreviations or acronyms via its disambigua­ tion pages; for example, the tag nfl is given eight pos­ sible meanings. in this case, the tag nfl is used to refer specifically to the national football league, so the tag is a homograph, noun, and unique entry. ■ tagging conventions and guidelines of the folksonomy sites delicious delicious defines tags as: one­word descriptors that you can assign to your bookmarks. . . . they’re a little bit like keywords but non­hierarchical. you can assign as many tags to a bookmark as you like and easily rename or delete them later. tagging can be a lot easier and more flexible than fitting your information into preconceived categories or folders” (del.icio.us 2006a). the delicious help page for tags encourages people to “enter as many tags as you would like, each separated by a space” in the tag field. this paragraph explains briefly that two lists of tags may appear under the entry form used to enter a bookmark. the first list consists of popular tags assigned by other people to the bookmark in question, while the second consists of recommended tags, which contains a combination of tags that have been assigned by the client in question as well as other users (del.icio.us 2006b). it is not clear how the two lists differ in that they both contain tags assigned by other people to the bookmark at hand. the only tangible guideline provided about how tags should be structured is the sentence “your only limitation on tags is that they must not include spaces.” delicious thus addresses only indirectly the fact that it does not allow multiterm tags; the examples provided suggest ways in which compound terms can be expressed; for example, san­francisco, sanfranciso, san.franciso (del. ico.us 2006b). punctuation thus appears to be allowed in the construction of tags, which is confirmed by the sug­ gestion that asterisks may be used to rate bookmarks: “a tag of * might mean an ok link, *** is pretty good, and a bookmark tagged ***** is awesome” (del.icio.us 2006b). it is thus possible that tags may not consist of recognizable terms, even though asterisks are neither searchable nor indicative of content. furl the furl web site uses the term topics rather than tags, but provides no guidelines or instructions for how to con­ struct these topics. furl mentions only that when entering a bookmark, “a small window will pop up. it should have the title and url of the page you are looking at. enter any additional details (i.e., topic, rating, comments) and click save” (furl 2006). furl provides all users with a list of default topics to which one can add at will. furl provides no guidelines as to whether single or multiword topics may be used; it is only by trial and error that the user discovers that the latter are, in fact, allowed. technorati in its tags help page, technorati encourages users to “think of a tag as a simple category name. people can categorize their posts, photos, and links with any tag that makes sense” (technorati 2006). a tag may be “anything, but it should be descriptive. 
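the rel="tag" markup described above is the only machine-readable structure any of the three sites imposes on tags. purely as an illustration of how such tags can be harvested and deduplicated, which this study's unique-tag lists accomplished manually in excel, a short script might look like the following; the function names are our own and the example is not part of the study's methodology.

// illustrative only: harvest technorati-style rel="tag" links from a post and
// build a case-folded list of unique tags, analogous to a unique-tag list
function harvestTags(rootElement) {
  var anchors = rootElement.getElementsByTagName("a");
  var seen = {};
  var uniqueTags = [];
  for (var i = 0; i < anchors.length; i++) {
    if (!/\btag\b/.test(anchors[i].rel)) { continue; }   // only <a rel="tag"> elements
    var tag = (anchors[i].innerText || anchors[i].textContent || "").trim().toLowerCase();
    if (tag && !seen[tag]) {                             // record each tag only once
      seen[tag] = true;
      uniqueTags.push(tag);
    }
  }
  return uniqueTags;   // e.g. ["global warming", "politics"]
}

// example: count unique tags in the current page
var tags = harvestTags(document.body);
console.log(tags.length + " unique tags: " + tags.join(", "));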
please only use tags that are rel­ evant to the post” (technorati 2006). technorati tags are 1� information technology and libraries | september 20071� information technology and libraries | september 2007 embedded into individual blogs via the link rel=”tag”; for example: <a href=”http://technorati.com/tag/ global+warming” rel=”tag”>global warming</a>. the tag will appear as simply global warming. no other guidelines are provided about how tags should be constructed. as can be seen, the three folksonomy sites provide very few guidelines or conventions for how tags should be constructed. users are not pointed to the common problems that exist in uncontrolled vocabulary, such as ambiguous headings, homographs, synonyms, spelling variations, and so forth, nor are suggestions made as to the preferred form of tags, such as nouns, plural forms, or the distinction between count nouns (for example, dogs) and mass nouns (for example, air). given this lack of guidance, it is not unreasonable to assume that the tags acquired from these sites will vary considerably in form and structure. ■ findings unless stated otherwise, the number of tags per folk­ sonomy site is 76 for delicious, 208 for furl, and 229 for technorati. homographs the niso guidelines recommend that homographs— terms with identical spellings but different meanings— should be avoided as far as possible in the selection of terms. homographs constitute 22 percent of delicious tags, 12 percent of furl tags, and 20 percent of technorati tags. unique entities constitute a significant proportion of the homographs in all three sites, with 71 percent in delicious, 43 percent in furl, and 55 percent in technorati. the most frequently occurring homographs across the three sites consist predominantly of computer­related terms, such as ajax and css. single-word versus multiword terms the niso guidelines recommend that terms should represent a single concept expressed by a single or mul­ tiword term, as needed. single­term tags constitute 93 percent of delicious tags, 76 percent of furl tags, and 80 percent of technorati tags. the preponderance of single tags in delicious may reflect the fact that it does not allow for the use of spaces between the different elements of the same tag; for example, open source. types of concepts niso provides a list of seven types of concepts that may be represented by terms; while this list is not exhaustive, it represents the most frequently occurring types of con­ cept. table 1 shows the percentage of tags that correspond to each of the seven types of concepts. tags that represent things are clearly predominant in the three sites, with activities and properties forming a distant second and third in importance. none of the tags represent events or measures, and only a fraction of the technorati tags represent materials. the niso guidelines provide no indication of the expected distribution of the types of concepts, so it is difficult to determine to what extent the three folksonomy sites are consistent with other lists of descriptors. none of the tags fell outside the scope of the seven types of concepts. unique entities unique entities may represent the names of people, places, organizations, products, and specific events (niso 2005). unique entities constitute 22 percent of delicious tags, 14 percent of furl tags, and 49 percent of technorati tags. 
there is no consistency in the percentage of unique enti­ ties: technorati has nearly twice the percentage of tags than delicious has, and nearly triple the percentage of tags than furl has. computer­related products constitute 100 percent of the unique entities in delicious, 63 percent in furl, and 38 percent in technorati. the remainder of the unique entities in furl and technorati represent places, people, and corporate bodies. the unique entities in technorati are closely related to developments in current news events, an occurrence that is likely due to the site’s focus on blogs rather than web sites. as will be discussed in a subsequent section, the unique entries constitute a significant proportion of the tags that represent ambiguous acronyms or abbreviated terms, such as ajax or psp. table 1. concepts represented by the tags delicious (%) furl (%) technorati (%) things 76 82 90.0 materials 0 0 0.4 activities 12 10 4.0 events 0 0 0.0 properties 8 6 4.0 disciplines 4 3 1.0 measures 0 0 0.0 article title | author 19the structure and form of folksonomy tags | spiteri 19 grammatical forms of terms the niso standards recommend the use of the following grammatical forms of terms: ■ nouns and noun phrases ■ verbal nouns ■ noun phrases ■ premodified noun phrases ■ postmodified noun phrases ■ adjectives ■ adverbs table 2 shows the distribution of the grammatical forms of tags. if all the types of nouns are combined, then 95 percent of delicious tags, 94 percent of furl tags, and 97 percent of technorati tags constitute types of nouns. the gram­ matical structure of the tags in the three folksonomy sites thus reflects very closely the niso recommendations that tags consist of mainly nouns, with the added proviso that adjectives and adverbs be kept to a minimum. none of the folksonomy sites used adverbs as tags, and the num­ ber of adjectives was very small, forming an average total of 5 percent of the tags. nouns (plural and singular forms) niso divides nouns into two categories: count nouns (how many?), and noncount, or mass nouns (how much?). niso recommends that count nouns appear in the plural form and mass nouns in the singular form. niso specifies other types of nouns that appear typi­ cally in the singular form: ■ abstract concepts ■ beliefs; for example, judaism, taoism ■ activities; for example, digestion, distribution ■ emotions; for example, anger, envy, love, pity ■ properties; for example, conductivity, silence ■ disciplines; for example, chemistry, astronomy ■ unique entities table 3 shows the distribution of the singular and plu­ ral forms of noun tags. the term singular nouns was used to collocate all the types of non­plural nouns. table 3 represents the number of tags that constitute count nouns; this does not mean, however, that the tags appeared correctly in the plural form. of the count nouns, 36 percent of delicious tags, 62 percent of furl tags, and 34 percent of technorati tags appeared correctly in the plural form. it should be noted that although table 3 indicates that properties constitute 8 percent of delicious, 6 percent of furl, and 4 percent of technorati tags, most of these tags are adjectives, and thus are not counted in the table. the niso guidelines do not suggest the typical distribution of count versus singular nouns, but table 3 indicates that at least among the three folksonomy sites, singular nouns form the bulk of the tags. table 2. 
table 2. grammatical form of tags

grammatical form             delicious (%)   furl (%)   technorati (%)
nouns                              88            71           86
verbal nouns                        5             6            4
noun phrases—premodified            1            15            4
noun phrases—postmodified           0             2            3
adjectives                          6             6            3
adverbs                             0             0            0

table 3. count and noncount noun tags

                     delicious (%)   furl (%)   technorati (%)
count nouns                18            35           23
noncount nouns             77            59           74
   mass nouns              36            32           19
   activities              12            10            4
   properties               3             0            1
   disciplines              4             3            1
   unique entities         22            14           49
total                      95            94           97

spelling

the niso guidelines divide the spelling of terms into two sections: warrant and authority. with respect to warrant, niso recommends that “the most widely accepted spelling of words, based on warrant, should be adopted,” with cross-references made between variant spellings of terms. as far as authority is concerned, spelling should follow the practice of well-established dictionaries or glossaries. while spelling refers normally to whole words, i included in this analysis acronyms and abbreviations used to denote unique entities, such as countries or product names, as there are recognized spellings of such acronyms and abbreviations. table 4 shows the tags from the three sites that do not conform to recognized spelling; the terms in parentheses show the accepted spelling.

table 4. tags that do not conform to spelling warrant

delicious (n=76): howto (how to); opensource (open source); toread (to read)
furl (n=208): hollywood bday (hollywood birthday); med-books (medical books); oralsex (oral sex)
technorati (n=229): met-art pics (metropolitan art pictures); superbowl (super bowl); web-20 (web2.0)

the number of tags that do not conform to spelling warrant is clearly very few, constituting a total of 4 percent of the delicious tags, 3 percent of the furl tags, and 2 percent of the technorati tags. two of the nonrecognized spellings in delicious are likely due to the difficulty of creating compound tags in this site, as was discussed earlier. the remainder of the tags conformed to recognized spellings as found in the three reference sources consulted. the findings suggest that tags are spelled consistently and in keeping with recognized warrant across the three folksonomy sites. because of the international nature of the three folksonomy sites, no default english spelling was assumed. table 5 shows those tags whose spellings reflect regional variations.

table 5. tags that reflect regional spelling variations

delicious (n=76): humor (u.s. spelling)
furl (n=208): humor (u.s. spelling); jewelry (u.s. spelling)
technorati (n=229): favourite (british spelling); humour (british spelling)

none of the three folksonomy sites featured lexical variants of any one tag. as the three sites are united states–based, the preponderance of american spelling is not surprising. what is surprising, however, is that technorati features only the british variants in the total of tags examined in this study. it should be pointed out that the two lexical variants of these terms do appear in the three folksonomy sites; the two variants simply did not appear in the daily logs examined. no system to enable cross-referencing (for example, humour use or see humor) exists in any of the three folksonomy sites, nor is cross-referencing discussed in the help logs of the sites.

abbreviations, initialisms, and acronyms

niso recommends that the full form of terms should be used. abbreviations or acronyms should be used only when they are so well-established that the full form of the term is rarely used. cross-references should be made between the full and abbreviated forms of the terms. abbreviations and acronyms constitute 22 percent of delicious tags, 16 percent of furl tags, and 19 percent of technorati tags. the majority of these abbreviations and acronyms pertain to unique entities, such as product names (for example, flash, mac, and nfl). in the case of delicious and furl, none of the abbreviated tags is referred to also by its full form.
four of the abbreviated technorati tags have full-form equivalents:

■ cheney/dick cheney
■ ie/internet explorer
■ sheehan/cindy sheehan
■ uae/united arab emirates

abbreviations and acronyms play a significant role in the ambiguity of the tags from the three sites; they represent 71 percent of the abbreviated delicious tags, 45 percent of the abbreviated furl tags, and 73 percent of the abbreviated technorati tags. furl and technorati are very similar in the proportion of abbreviated tags used, but delicious is significantly higher. the delicious tags are focused more heavily upon computer-related products, which may explain why there are so many more abbreviated tags, as many of these products are often referred to by these shorter terms; for example, css, flash, apple, and so on.

neologisms, slang, and jargon

the niso guidelines explain that neologisms, slang, and jargon terms are generally not included in standard dictionaries and should be used only when there is no other widely accepted alternative. nonstandard tags do not constitute a particularly relevant proportion of the total number of tags per site; they account for 3 percent of the delicious tags, 10 percent of the furl tags, and 6 percent of the technorati tags. the nonstandard tags refer almost exclusively to either computer- or sex-related concepts, such as podcast, wiki, and camsex.

nonalphabetic characters

this section of the niso guidelines deals with the use of capital letters and nonalphabetic characters. capitalization was not examined in the three folksonomy sites, as none of them are case sensitive; delicious and furl, for example, post tags in lower case, regardless of whether the user has assigned upper or lower case, while technorati shows capital letters only if they are assigned by the users themselves. the niso guidelines state that nonalphabetic characters, such as hyphens, apostrophes (unless used for the possessive case), symbols, and punctuation marks, should not be used because they cause filing and searching problems. table 6 shows the occurrence of nonalphabetic characters in the three folksonomy sites.

table 6. nonalphabetic characters

character        delicious (n=76)   furl (n=208)                     technorati (n=229)
hyphens          —                  hollywood b-day; urlproject      consumercredit; web2.0
apostrophes      —                  mom’s medical (possessive)       valentine’s day (possessive)
underscore       safari_export      blogger_life                     —
full stop        —                  web 2.0 (part of product name)   web-2.0 (part of product name)
forward slash    —                  —                                /africa
+ sign           —                  jcr+                             —

a very small proportion of the tags in the three folksonomy sites contains nonalphabetic characters, namely 1 percent of the delicious tags, and 3 percent of the furl and technorati tags. as was discussed previously, the delicious help screens may encourage people to use nonalphabetic characters to construct compound tags; in spite of this, however, such characters are not, in fact, used very frequently. it should be noted that the terms above were all searched, with punctuation intact, in their respective sites; in all three cases, the search engines retrieved the tags and their associated blogs or web sites, which suggests that nonalphabetic characters may not negatively impact searching.
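before turning to the discussion, it may help to make the retrieval consequences of these findings concrete. the short sketch below shows one way a tagging site could conflate the singular/plural and punctuation variants identified above; it is a hypothetical illustration written for this column, not a description of how delicious, furl, or technorati actually process tags, and its plural-stripping rule is deliberately crude.

import re

def normalize_tag(tag):
    """collapse case, punctuation, and a trailing plural -s so that variant
    forms of the same concept map to one retrieval key. this is an
    illustrative rule only, not the behavior of any of the three sites."""
    words = re.split(r"[\s_.\-/+]+", tag.lower())
    singular = []
    for w in words:
        if not w:
            continue
        # naive plural stripping: "computers" -> "computer", but "css" is kept
        if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
            w = w[:-1]
        singular.append(w)
    return " ".join(singular)

# variant forms discussed in the findings and the discussion below
for tag in ["open_source", "open.source", "computer", "computers"]:
    print(tag, "->", normalize_tag(tag))

under this rule, open_source and open.source collapse to the same key, and computers collapses onto computer; a space-less compound such as opensource would still need a word list before it could be split, which is one reason the guidance offered (or not offered) by the sites matters.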
■ discussion and recommendations

the tags examined from the three folksonomy sites correspond closely to a number of the niso guidelines pertaining to the structure of terms, namely in the types of concepts expressed by the tags, the predominance of single tags, the predominance of nouns, the use of recognized spelling, and the use of primarily alphabetic characters. potential problem areas in the structure of the tags pertain to the inconsistent use of the singular and plural form of count nouns, the difficulty with creating multiterm tags in delicious, and the incidence of ambiguous tags in the form of homographs and unqualified abbreviations or acronyms. as has been seen, a significant proportion of tags that represent count nouns appears incorrectly in the singular form. because many search engines do not deploy default truncation, the use of the singular or plural form could affect retrieval; a search for the tag computer in delicious, for example, retrieved 208,409 hits, while one for computers retrieved 91,205 hits. some of the results from the two searches overlapped, but only if both the singular and plural forms of the tags coexist. it would thus be useful for the help features of the folksonomy sites to explain the difference between count and noncount nouns and to discuss the impact of the form of the noun upon retrieval. while all three sites conform to the niso recommendation that single terms be used whenever possible, some concepts cannot be expressed in this fashion, and thus folksonomy sites should accommodate the use of multiterm tags. furl and technorati allow for their use, but make no mention of this feature in their help screens, which means that such tags may be constructed inconsistently—for example, by the insertion of punctuation—where a simple space between the tags will suffice. as has been seen, delicious does not allow directly for the construction of multiterm tags, and in its instructions it actually promotes inconsistency in how various punctuation devices may be used to conflate two or three separate tags, once again to the detriment of retrieval, as is shown below:

opensource: 103,476 hits
open_source: 91,205 hits
open.source: 26,494 hits

delicious should consider allowing for the insertion of spaces between the composite words of a compound tag; without this facility, users may be unaware of how to create compound tags. alternatively, delicious should recommend the use of only one punctuation symbol to conflate terms, such as the underscore. furl and technorati should explain clearly that compound tags may be formed by the simple convention of placing a space between the terms. ambiguous headings constitute the most problematic area in the construction of the tags; these headings take the form of homographs and abbreviations or acronyms. in the case of computer-related product names, it may be safe to assume that in the context of an online environment it is likely that the meaning of these product names is relatively self-evident.
in the case of the tag yahoo, for example, none of the sites or blogs associated with this tag pertained to “a member of a race of brutes in swift’s gulliver’s travels who have the form and all the vices of humans, or a boorish, crass, or stupid person” (merriam-webster 2007), but referred consistently to the internet service provider and search engine. on the other hand, the tag ajax was used to refer to asynchronous javascript and xml technology as well as to a number of mainly european soccer teams. given the international audience of these folksonomy sites, it may be unwise to assume that the meanings of these homographs are self-evident. library of congress subject headings often uses parenthetical qualifiers to clarify the meaning of terms—for example, python (computer program language)—even though this goes against niso recommendations. it is unlikely, however, that such use of parentheses will be effective in the folksonomy sites. a search for opera (browser), for example, will likely imply an underlying boolean and operator, which detracts from the purpose and value of the parenthetical qualifier; this was confirmed in a furl search, where the terms opera and browser appeared either immediately adjacent to each other or within the same document. the application of the section of the niso guidelines pertaining to abbreviations and acronyms is particularly difficult, as it is important to balance between using abbreviated forms of concepts that are so well-known that the full version is hardly used versus creating ambiguous tags. the fact that abbreviated forms appear so prominently in the daily logs of the three folksonomy sites suggests that the abbreviated forms of these tags are, in fact, very well established. at face value, therefore, many of the abbreviated tags are ambiguous because they can refer to different concepts, but it is questionable whether such tags as css, flash, apple, and rss, for example, are, in fact, ambiguous to the users of the sites. the use of the full forms for these tags seems cumbersome, as these concepts are hardly ever referred to in their full form. it could possibly be argued, in fact, that in some cases the full forms may not be familiar; i may know to what concept rss refers, for example, without knowing the specific words represented by the letters r, s, s. the possible ambiguity of abbreviated forms is compounded by the fact that none of the three folksonomy sites allows for cross-references between equivalent terms, which is a standard feature of most controlled vocabularies, for example:

nfl use national football league
national football league used for nfl

the help screens of the three sites do not address the notion of ambiguity in the construction of tags: they do not draw people’s attention to the inherent ambiguity of abbreviated forms that may represent more than one concept. the sites also fail to address the fact that abbreviated forms (or any tag, for that matter) may be culturally based, so that while the meaning of nfl may be obvious to north american users, this may not be the case for people who live in other geographic areas. it may be useful for the folksonomy sites to add direct links to an online dictionary and to wikipedia, and to encourage people to use these sites to determine whether their chosen tags may have more than one application or meaning; i had not realized, for example, that rss could represent twenty-three different concepts until i used wikipedia and was led to a disambiguation page.
access to these external sources may help users decide which full version of the abbreviation to use in the case of ambiguity. the examination of the structure of the tags pointed to some deficiencies in section 6 of the niso guidelines, specifically its occasional lack of sufficient definition or explanation of some of its recommendations. the guidelines list seven types of concepts that are typically represented by controlled vocabulary terms, but rely only upon a few examples to define the meaning and scope of these concepts. the guidelines thus provide no consistent mechanism by which the creators of terms can assess consistently the types of concepts represented. how, for example, is a discipline to be determined? does the term business represent a discipline if it is a subject area that is taught formally in a post-secondary institute, for example? is it necessary for a discipline to be recognized as such among a majority of educational institutions? in its examples for events, niso lists holidays and revolutions. it is unclear, however, what level of specificity applies to this concept; would christmas, for example, be considered an event or a unique entity/proper noun (which is listed separately from types of concepts)? it is only later in the guidelines, under the examples provided for unique entities (for example, fourth of july), that one may assume that a named event should be considered a unique entity. verbal nouns also are difficult to determine based only upon the niso examples, and once again no guidelines are provided to determine whether a noun represents an activity or a thing, or possibly both; for example, skiing or clipping. the lack of clear definitions in niso also appeared in the section pertaining to slang, neologisms, and jargon, which are considered to be nonstandard terms that do not generally appear in dictionaries. as was discussed previously, it is not clear at what point a jargon term or a slang term becomes a neologism. all of the slang tags found in the three sites (for example, babe) appeared in merriam-webster, which may serve to make this niso section even more ambiguous.

■ conclusion

the most notable suggested weaknesses of folksonomies are their potential for ambiguity, polysemy, synonymy, and basic level variation, as well as the lack of consistent guidelines for the choice and form of tags. the examination of the tags of the three folksonomy sites in light of the niso guidelines suggests that ambiguity and polysemy (such as homographs) are indeed problems in the structure of the folksonomy tags, although the actual proportion of homographs and ambiguous tags each constitutes fewer than one-quarter of the tags in each of the three folksonomy sites. in other words, although ambiguity and polysemy are certainly problematic areas, most of the tags in each of the three sites are unambiguous in their meaning and thus conform to niso recommendations. the help sites of the three folksonomies provide few tangible guidelines for (1) the construction of tags, which affects the construction of multiterm tags; and (2) the clear distinction between the singular and plural forms of count versus noncount nouns. as has been shown, the use of the singular or plural forms of terms, as well as the use of punctuation to form multiterm tags, affects search results.
a large proportion of the tags in all three sites consists of single terms, which mitigates the impact on retrieval, but the inconsistent use of the singular and plural forms of nouns is indeed significant and thus may have a marked effect upon retrieval. synonymy and basic level variation were not examined in this study, but are certainly worthy of further exploration. in other areas, the tags conform closely to the niso guidelines for the choice and form of controlled vocabularies. the tags represent mostly nouns, with very few unqualified adjectives or adverbs. the tags represent the types of concepts recommended by niso and conform well to recognized standards of spelling. most of the tags conform to standard usage; there are few instances of nonstandard usage, such as slang or jargon. in short, the structure of the tags in all three sites is well within the standards established and recognized for the construction of controlled vocabularies. should library catalogs decide to incorporate folksonomies, they should consider creating clearly written recommendations for the choice and form of tags that could include the following areas:

■ the difference between count and noncount nouns, as well as an explanation of how the use of the singular and plural forms affects retrieval.
■ one standard way in which to construct multiterm tags; for example, the insertion of a space between the component terms, or the use of an underscore between the terms.
■ a link to a recognized online dictionary and to wikipedia to enable users to determine the meanings of terms, to disambiguate amongst homographs, and to determine if the full form would be preferable to the abbreviated form. an explanation of the impact of ambiguous tags and homographs upon retrieval would be useful.
■ an acceptable use policy that would cover areas of potential concern, such as the use of potentially offensive tags, overly graphic tags, and so forth. although such terms were not the focus of this study, their presence was certainly evident in some cases, and would need to be considered in an environment that includes clients of all ages.

with the use of such expanded guidelines and links to useful external reference sources, folksonomies could serve as a very powerful and flexible tool for increasing the user-friendliness and interactivity of public library catalogs, and also may be useful for encouraging other activities, such as informal online communities of readers and user-driven readers’ advisory services.

works cited

bateman, s., c. brooks, and g. mccalla. 2006. collaborative tagging approaches for ontological metadata in adaptive e-learning systems. http://www.win.tue.nl/sw-el/2006/camera-ready/02-bateman_brooks_mccalla_swel2006_final.pdf (accessed jan. 11, 2007).

bruce, h., w. jones, and s. dumais. 2004. keeping and re-finding information on the web: what do people do and what do they need? seattle: information school. http://kftf.ischool.washington.edu/re-finding_information_on_the_web3.pdf (accessed jan. 11, 2007).

cattuto, c., v. loreto, and l. pietronero. 2006. collaborative tagging and semiotic dynamics. http://arxiv.org/ps_cache/cs/pdf/0605/0605015.pdf (accessed jan. 11, 2007).

del.icio.us. 2006a. del.icio.us/about. http://del.icio.us/about/ (accessed jan. 11, 2007).

del.icio.us. 2006b. del.icio.us/help/tags. http://del.icio.us/help/tags (accessed jan. 11, 2007).
dempsey, l. 2003. the recombinant library: portals and people. journal of library administration 39, no. 4: 103–36.

fichter, d. 2006. intranet applications for tagging and folksonomies. online 30, no. 3: 43–45.

furl. 2006. how to save a page in furl. http://www.furl.net/howtosave.jsp (accessed jan. 11, 2007).

golder, s. a., and b. a. huberman. 2006. usage patterns of collaborative tagging systems. journal of information science 32, no. 2: 198–208.

guy, m., and e. tonkin. 2006. tidying up tags? d-lib magazine 12, no. 1. http://www.dlib.org/dlib/jan.06/guy/01guy.html (accessed jan. 11, 2007).

ketchell, d. s. 2000. too many channels: making sense out of portals and personalization. information technology and libraries 19, no. 4: 175–79.

kroski, e. 2006. the hive mind: folksonomies and user-based tagging. http://infotangle.blogsome.com/2005/12/07/the-hive-mind-folksonomies-and-user-based-tagging/ (accessed jan. 11, 2007).

mathes, a. 2004. folksonomies—cooperative classification and communication through shared metadata. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html (accessed jan. 11, 2007).

merholz, p. 2004. ethnoclassification and vernacular vocabularies. http://www.peterme.com/archives/000387.html (accessed jan. 11, 2007).

merriam-webster. 2007. yahoo. http://www.m-w.com/ (accessed jan. 11, 2007).

michlmayr, e. 2005. a case study on emergent semantics in communities. http://wit.tuwien.ac.at/people/michlmayr/publications/michlmayr_casestudy_on_emergentsemantics_final.pdf (accessed jan. 11, 2007).

mitchell, r. l. 2005. tag teams wrestle with web content. computerworld 38, no. 16: 31.

niso. 2005. guidelines for the construction, format, and management of monolingual controlled vocabularies. ansi/niso z39.19-2005. bethesda, md.: national information standards organization. http://www.niso.org/standards/resources/z39-19-2005.pdf (accessed jan. 11, 2007).

quintarelli, e. 2005. folksonomies: power to the people. http://www.iskoi.org/doc/folksonomies.htm (accessed jan. 11, 2007).

shirky, c. 2004. folksonomy. http://www.corante.com/many/archives/2004/08/25/folksonomy.php (accessed jan. 11, 2007).

spiteri, l. f. 2006. the use of folksonomies in public library catalogues. the serials librarian 51, no. 2: 75–89.

szekely, b., and e. torres. 2005. ranking bookmarks and bistros: intelligent community and folksonomy development. http://torrez.us/archives/2005/07/13/tagrank.pdf (accessed jan. 11, 2007).

technorati. 2006. technorati help: tags. http://www.technorati.com/help/tags.html (accessed jan. 11, 2007).

trant, j., and b. wyman. 2006. investigating social tagging and folksonomy in art museums with steve.museum. http://www.archimuse.com/research/www2006-tagging-steve.pdf (accessed jan. 11, 2007).

udell, j. 2004. collaborative knowledge gardening. http://www.infoworld.com/article/04/08/20/34opstrategic_1.html (accessed jan. 11, 2007).

vander wal, t. 2006. understanding folksonomy: tagging that works. http://s3.amazonaws.com/2006presentations/dconstruct/tagging_in_rw.pdf (accessed jan. 11, 2007).

vanderwal.net. 2005. folksonomy definition and wikipedia. http://www.vanderwal.net/random/entrysel.php?blog=1750 (accessed jan. 11, 2007).

wikipedia. 2006. folksonomy. http://en.wikipedia.org/wiki/folksonomy (accessed jan. 11, 2007).

wu, h., m. zubair, and k. maly. 2006. harvesting social knowledge from folksonomies. http://delivery.acm.org/10.1145/1150000/1149962/p111-wu.pdf (accessed jan. 11, 2007).
appendix a: list of niso elements

6.3 term form
6.3.1 single word vs. multiword terms
6.3.2 types of concepts
   terms for things and their physical parts
   terms for materials
   terms for activities or processes
   terms for events or occurrences
   terms for properties or states
   terms for disciplines or subject fields
   terms for units of measurement
6.3.3 unique entities
6.4 grammatical forms of terms
6.4.1 nouns and noun phrases
6.4.1.1 verbal nouns
6.4.1.2 noun phrases
6.4.1.2.1 premodified noun phrases
6.4.1.2.2 postmodified noun phrases
6.4.2 adjectives
6.4.3 adverbs
6.5 nouns
6.5.1 count nouns
6.5.2 mass nouns
6.5.3 other types of singular nouns
6.5.3.1 abstract concepts
6.5.3.2 unique entities
6.6.2 spelling
6.6.2.1 spelling—warrant
6.6.2.2 spelling—authorities
6.6.3 abbreviations, initialisms, and acronyms
6.6.3.1 preference for abbreviation
6.6.3.2 preference for full form
6.6.3.2.1 general use
6.6.3.2.2 ambiguity
6.6.4 neologisms, slang, and jargon
6.7.1 capitalization and nonalphabetic characters

tagging: an organization scheme for the internet

marijke a. visser

marijke a. visser (marijkea@gmail.com) is a library and information science graduate student at indiana university, indianapolis, and will be graduating may 2010. she is currently working for ala’s office for information and technology policy as an information technology policy analyst, where her area of focus includes telecommunications policy and how it affects access to information.

how should the information on the internet be organized? this question and the possible solutions spark debates among people concerned with how we identify, classify, and retrieve internet content. this paper discusses the benefits and the controversies of using a tagging system to organize internet resources. tagging refers to a classification system where individual internet users apply labels, or tags, to digital resources. tagging increased in popularity with the advent of web 2.0 applications that encourage interaction among users. as more information is available digitally, the challenge to find an organizational system scalable to the internet will continue to require forward thinking. trained to ensure access to a range of informational resources, librarians need to be concerned with access to internet content. librarians can play a pivotal role by advocating for a system that supports the user at the moment of need. tagging may just be the necessary system.

who will organize the information available on the internet? how will it be organized? does it need an organizational scheme at all? in 1998, thomas and griffin asked a similar question, “who will create the metadata for the internet?” in their article with the same name.1 ten years later, this question has grown beyond simply supplying metadata to assuring that at the moment of need, someone can retrieve the information necessary to answer their query. given new classification tools available on the internet, the time is right to reassess traditional models, such as controlled vocabularies and taxonomies, and contrast them with folksonomies to understand which approach is best suited for the future. this paper gives particular attention to delicious, a social networking tool for generating folksonomies. the amount of information available to anyone with an internet connection has increased in part because of the internet’s participatory nature. users add content in a variety of formats and through a variety of applications to personalize their web experience, thus making internet content transitory in nature and challenging to lock into place. the continual influx of new information is causing a rapid cultural shift, more rapid than many people are able to keep up with or anticipate.
conversations on a range of topics that take place using web technologies happen in real time. unless you are a participant in these conversations and debates using web-based communication tools, changes are passing you by. internet users in general have barely grasped the concept of web 2.0 and already the advanced “internet cognoscenti” write about web 3.0.2 regarding the organization and availability of internet content, librarians need to be ahead of the crowd as the voice who will assure content will be readily accessible to those that seek it. internet users actively participating in and shaping the online communities are, perhaps unintentionally, influencing how those who access information via the internet expect to be able to receive and use digital resources. librarians understand that the way information is organized is critical to its accessibility. they also understand the communities in which they operate. today, librarians need to be able to work seamlessly among the online communities, the resources they create, and the end user. as internet use evolves, librarians as information stakeholders should stay abreast of web 2.0 developments. by positioning themselves to lead the future of information organization, librarians will be able to select the best emerging web-based tools and applications, become familiar with their strengths, and leverage their usefulness to guide users in organizing internet content. shirky argues that the internet has allowed new communities to form. primarily online, these communities of internet users are capable of dramatically changing society both on- and offline. shirky contends that because of the internet, “group action just got easier.”3 according to shirky, we are now at the critical point where internet use, while dependent on technology, is actually no longer about the technology at all. the web today (web 2.0) is about participation. “this [the internet] is a medium that is going to change society.”4 lessig points out that content creators are “writing in the socially, culturally relevant sense for the 21st century and to be able to engage in this writing is a measure of your literacy in the 21st century.”5 it is significant that creating content is no longer reserved for the internet cognoscenti. internet users with a variety of technological skills are participating in web 2.0 communities. information architects, web designers, librarians, business representatives, and any stakeholder dependent on accessing resources on the internet have a vested interest in how internet information is organized. not only does the architecture of participation inherent in the internet encourage completely new creative endeavors, it serves as a platform for individual voices as demonstrated in personal and organizationally sponsored blogs: lessig 2.0, boing boing, open access news, and others. these internet conversations contribute diverse viewpoints on a stage where, theoretically, anyone can access them.
web 2.0 technologies challenge our understanding of what constitutes information and push policy makers to negotiate equitable internet-use policies for the public, the content creators, corporate interests, and the service providers. to maintain an open internet that serves the needs of all the players, those involved must embrace the opportunity for cultural growth the social web represents. for users who access, create, and distribute digital content, information is anything but static; nor is using it the solitary endeavor of reading a book. its digital format makes it especially easy for people to manipulate it and shape it to create new works. people are sharing these new works via social technologies for others to then remix into yet more distinct creative work. communication is fundamentally altered by the ability to share content on the internet. today’s internet requires a reevaluation of how we define and organize information. the manner in which digital information is classified directly affects each user’s ability to access needed information to fully participate in twenty-first-century culture. new paradigms for talking about and classifying information that reflect the participatory internet are essential.

■ background

the controversy over organizing web-based information can be summed up by comparing two perspectives represented by shirky and peterson. in her introduction, peterson states, “items that are different or strange can become a barrier to networking.”6 shirky maintains, “as the web has shown us, you can extract a surprising amount of value from big messy data sets.”7 briefly, in this instance ontology refers to the idea of defining where digital information can and should be located (virtually). folksonomy describes an organizational system where individuals determine the placement and categorization of digital information. both terms are discussed in detail below. although any organizational system necessitates talking about the relationship(s) among the materials being organized, the relationships can be classified in multiple ways. to organize a given set of entities, it is necessary to establish in what general domain they belong and in what ways they are related. applying an ontological, or hierarchical, classification system to digital information raises several points to consider. first, there are no physical space restrictions on the internet, so relationships among digital resources do not need to be strictly identified. second, after recognizing that internet resources do not need the same classification standards as print material, librarians can begin to isolate the strengths of current nondigital systems that could be adapted to a system for the internet. third, librarians must be ready to eliminate current systems entirely if they fail to serve the needs of internet users. traditional systems for organizing information were developed prior to the information explosion on the internet. the internet’s unique platform for creating, storing, and disseminating information challenges pre–digital-age models. designing an organizational system for the internet that supports creative innovation and succeeds in providing access to the innovative work is paramount to moving the twenty-first-century culture forward.

■ assessing alternative models

controversy encourages scrutiny of alternative models.
in understanding the options for organizing digital information, it is important to understand traditional classification models. smith discusses controlled vocabularies, taxonomies, and facets as three traditional methods for applying metadata to a resource. according to smith, a controlled vocabulary is an unambiguous system for managing the meanings of words. it links synonyms, allowing a search to retrieve information on the basis of the relationship between synonyms.8 taxonomies are hierarchical, controlled vocabularies that establish parent–child relationships between terms. a faceted classification system categorizes information using the distinct properties of that information.9 in such a system, information can exist in more than one place at a time. a faceted classification system is a precursor to the bottom-up system represented by folksonomic tagging. folksonomy, a term coined in 2004 by thomas vander wal, refers to a “user-created categorical structure development with an emergent thesaurus.”10 vander wal further separates the definition into two types: a narrow and a broad folksonomy.11 in a broad folksonomy, many people tag the same object with numerous tags or a combination of their own and others’ tags. in a narrow folksonomy, one or few people tag an object with primarily singular terms. internet searching represents a unique challenge to people wanting to organize its available information. search engines like yahoo! and google approach the chaotic mass of information using two different techniques. yahoo! created a directory similar to the file folder system with a set of predetermined categories that were intended to be universally useful. in so doing, the yahoo! developers made assumptions about how the general public would categorize and access information. the categories and subsequent subcategories were not necessarily logically linked in the eyes of the general public. the yahoo! directory expanded as internet content grew, but the digital folder system, like a taxonomy, required an expert to maintain. shirky notes the yahoo! model could not scale to the internet. there are too many possible links to be able to successfully stay within the confines of a hierarchical classification system. additionally, on the internet, the links are sufficient for access because if two items are linked at least once, the user has an entry point to retrieve either one or both items.12 a hierarchical system does not assure a successful internet search, and it requires a user to comprehend the links determined by the managing expert. in the google approach, developers acknowledged that the user with the query best understood the unique reasoning behind her search. the user therefore could best evaluate the information retrieved. according to shirky, the google model let go of the hierarchical file system because developers recognized effective searching cannot predetermine what the user wants. unlike yahoo!, google makes the links between the query and the resources after the user types in the search terms.13 trusting in the link system led google to understand and profit from letting the user filter the search results. to select the best organizational model for the internet it is critical to understand its emergent nature. a model that does not address the effects of web 2.0 on internet use and fails to capture participant-created content and tagging will not be successful.
one approach to organizing digital resources has been for users to bookmark websites of personal interest. these bookmarks have been stored on the user’s computer, but newer models now combine the participatory web with saving, or tagging, websites. social bookmarking typifies the emergent web and the attraction of online networking. innovative and controversial, the folksonomy model brings to light numerous criteria necessary for a robust organizational system. a social bookmarking network, delicious is a tool for generating folksonomies. it combines a large amount of self-interest with the potential for an equal, if not greater, amount of social value. delicious users add metadata to resources on the internet by applying terms, or tags, to urls. users save these tagged websites to a personal library hosted on the delicious website. the default settings on delicious share a user’s library publicly, thus allowing other people—not limited to registered delicious account holders—to view any library. that the delicious developers understood how internet users would react to this type of interactive application is reflected in the popularity of delicious. delicious arrived on the scene in 2003, and in 2007 developers introduced a number of features to encourage further user collaboration. with a new look (going from the original del.icio.us to its current moniker, delicious) as well as more ways for users to retrieve and share resources by 2007, delicious had 3 million registered users and 100 million unique urls.14 the reputation of delicious has generated interest among people concerned with organizing the information available via the internet. how does the folksonomy or delicious model of open-ended tagging affect searching, information retrieving, and resource sharing? delicious, whose platform is heavily influenced by its users, operates with no hierarchical control over the vocabulary used as tags. this underscores the organization controversy. bottom-up tagging gives each person tagging an equal voice in the categorization scheme that develops through the user generated tags. at the same time, it creates a chaotic information-retrieval system when compared to traditional controlled vocabularies, taxonomies, and other methods of applying metadata.15 a folksonomy follows no hierarchical scheme. every tag generated supplies personal meaning to the associated url and is equally weighted. there will be overlap in some of the tags users select, and that will be the point of access for different users. for the unique tags, each delicious user can choose to adopt or reject them for their personal tagging system. either way, the additional tags add possible future access points for the rest of the user community. the social usefulness of the tags grows organically in relationship to their adoption by the group. can the internet support an organizational system controlled by user-generated tags? by the very nature of the participatory web, whose applications often get better with user input, the answer is yes. delicious and other social tagging systems are proving that their folksonomic approach is robust enough to satisfy the organizational needs of their users. defined by vander wal, a broad folksonomy is a classification system scalable to the internet.16 the problem with projecting already-existing search and classification strategies to the internet is that the internet is constantly evolving, and classic models are quickly overcome. 
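the mechanics of a broad folksonomy of the kind delicious supports can be sketched in a few lines. the example below is illustrative only: the bookmark data and url are invented for the example, and the code is not delicious’s actual implementation; it simply shows how tags applied independently by many users aggregate into a shared description of a resource.

from collections import Counter, defaultdict

# (user, url, tags) triples as a stand-in for saved bookmarks
bookmarks = [
    ("user1", "http://example.org/folksonomy-paper", ["folksonomy", "tagging", "metadata"]),
    ("user2", "http://example.org/folksonomy-paper", ["folksonomy", "web2.0"]),
    ("user3", "http://example.org/folksonomy-paper", ["folksonomy", "tagging", "toread"]),
]

tag_counts = defaultdict(Counter)
for user, url, tags in bookmarks:
    for tag in set(tags):          # count each tag once per user
        tag_counts[url][tag] += 1

# the aggregate view: widely shared tags rise to the top of the list
for url, counts in tag_counts.items():
    print(url)
    for tag, n in counts.most_common():
        print(f"  {tag}: applied by {n} user(s)")

in the aggregate view, tags shared by several users become the resource’s strongest access points, while a single-use tag such as toread remains a personal entry point without being imposed on anyone else; this is the emergent, bottom-up organization the folksonomy model relies on.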
even in the nonprint world of the internet, taxonomies and controlled vocabulary entail a commitment both from the entity wanting to organize the system and the users who will be accessing it. developing a taxonomy involves an expert, which requires an outlay of capital and, as in the case with yahoo!, a taxonomy is not necessarily what users are looking for. to be used effectively, taxonomies demand a certain amount of user finesse and complacency. the user must understand the general hierarchy and by default must suspend their own sense of category and subcategory if they do not mesh with the given system. the search model used by google, where the user does the filtering, has made it a significantly more successful search engine. google recognizes natural language, making it user friendly; however, it remains merely a search engine. it is successful at making links, but it leaves the user stranded without a means to organize search results beyond simple page rank. traditional hierarchical systems and search strategies like those of yahoo! and google neglect to take into account the tremendous popularity of the participatory web. successful web applications today support user interaction; to disregard this is naive and short-sighted. in contrast to a simple page-rank results list or a hierarchical system, delicious results provide the user with rich, multilayer results. figure 1 shows four of the first ten results of a delicious search for the term “folksonomy.” the articles by the four authors in the left column were tagged according to the diagram.

figure 1. search results for “folksonomy” using delicious.

two of the articles are peer-reviewed, and two are cited repeatedly by scholars researching tagging and the internet. in this example, three unique terms are used to tag those articles, and the other terms provide additional entry points for retrieval. further information available using delicious shows that the guy article was tagged by 1,323 users, the mathes article by 2,787 users, the shirky article by 4,383 users, and the peterson article by 579 users.17 from the basic delicious search, the user can combine terms to narrow the query as well as search what other users have tagged with those terms. similar to the card catalog, where a library patron would often unintentionally find a book title by browsing cards before or after the actual title she originally wanted, a delicious user can browse other users’ libraries, often finding additional pertinent resources. a user will retrieve a greater number of relevant and automatically filtered results than with an advanced google search. as an ancillary feature, once a delicious user finds an attractive tag stream—a series of tags by a particular user—they can opt to follow the user who created the tag stream, thereby increasing their personal resources. hence delicious is effective personally and socially. it emulates what internet users expect to be able to do with digital content: find interesting resources, personalize them, in this case with tags, and put them back out for others to use if they so choose. proponents of folksonomy recognize there are benefits to traditional taxonomies and controlled vocabulary systems.
shirky delineates two features of an organizational system and their characteristics, providing an example of when a hierarchical system can be successful (see table 1).18

table 1. domains and their participants

domain to be organized     participants in the domain
small corpus               expert catalogers
formal categories          authoritative source of judgment
restricted entities        coordinated users
clear edges                expert users

these characteristics apply to situations using databases, journal articles, and dissertations, as spelled out by peterson, for example.19 specific organizations with identifiable common terminology—for example, medical libraries—can also benefit from a traditional classification system. these domains are the antithesis of the domain represented by the web. the success of controlled vocabularies, taxonomies, and their resulting systems depends on broad user adoption. that, in combination with the cost of creating and implementing a controlled system, raises questions as to their utility and long-term viability for use on the web. though meant for longevity, a taxonomy fulfills a need at one fixed moment in time. a folksonomy is never static. taxonomies developed by experts have not yet been able to be extended adequately for the breadth and depth of internet resources. neither have traditional viewpoints been scaled to accept the challenges encountered in trying to organize the internet. folksonomy, like taxonomy, seeks to provide the information critical to the user at the moment of need. folksonomy, however, relies on users to create the links that will retrieve the desired results. doctorow puts forward three critiques of a hierarchical metadata system, emphasizing the inadequacies of applying traditional classification schemes to the digital stage:

1. there is not a “correct” way to categorize an idea.
2. competing interests cannot come to a consensus on a hierarchical vocabulary.
3. there is more than one way to describe something.

doctorow elaborates: “requiring everyone to use the same vocabulary to describe their material denudes the cognitive landscape, enforces homogeneity in ideas.”20 the internet raises the level of participation to include innumerable voices. the astonishing thing is that it thrives on this participation. guy and tonkin address the “folksonomic flaw” by saying user-generated tags are by definition imprecise. they can be ambiguous, overly personal, misspelled, or contrived compound words. guy and tonkin suggest the need to improve tagging by educating the users or by improving the systems to encourage more accurate tagging.21 this, however, does not acknowledge that successful web 2.0 applications depend on the emergent wisdom of the user community. the systems permit organic evolution and continual improvement by user participation. a folksonomy evolves much the way a species does. unique or single-use tags have minimal social import and do not gain recognition. tags used by more than a few people reinforce their value and emerge as the more robust species.

■ conclusion

the benefits of the internet are accessible to a wide range of users. the rewards of participation are immediate, social, and exponential in scope. user-generated content and associated organization models support the internet’s unique ability to bring together unlikely social relationships that would not necessarily happen in another milieu.
to paraphrase shirky and lessig, people are participating in a moment of social and technological evolution that is altering traditional ways of thinking about information, thereby creating a break from traditional systems. folksonomic classification is part of that break. its utility grows organically as users add tagged content to the system. it is adaptive, and its strengths can be leveraged according to the needs of the group. while there are “folksonomic flaws” inherent in a bottom-up classification system, there is tremendous value in weighting individual voices equally. following the logic of web 2.0 technology, folksonomy will improve according to the input of the users. it is an organizational system that reflects the basic tenets of the emergent internet. it may be the only practical solution in a world of participatory content creation. shirky describes the internet by saying, “there is no shelf in the digital world.”22 classic organizational schemes like the dewey decimal system were created to organize resources prior to the advent of the internet. a hierarchical system was necessary because there was a physical limitation on where a resource could be located; a book can only exist in one place at one time. in the digital world, the shelf is simply not there. material can exist in many different places at once and can be retrieved through many avenues. a broad folksonomy supports a vibrant search strategy. it combines individual user input with that of the group. this relationship creates data sets inherently meaningful to the community of users seeking information on any given topic at any given moment. this is why a folksonomic approach to organizing information on the internet is successful. users are rewarded for their participation, and the system improves because of it. folksonomy mirrors and supports the evolution of the internet. librarians, trained to be impartial and ethically bound to assure access to information, are the logical mediators among content creators, the architecture of the web, corporate interests, and policy makers. critical conversations are no longer happening only in traditional publications of the print world. they are happening with communication platforms like youtube, twitter, digg, and delicious. information organization is one issue on which librarians can be progressive. dedicated to making information available, librarians are in a unique position to take on challenges raised by the internet. as the profession experiments with the introduction of web 3.0, librarians need to position themselves between what is known and what has yet to evolve. librarians have always leveraged the interests and needs of their users to tailor their services to the individual entry point of every person who enters the library. because more and more resources are accessed via the internet, librarians will have to maintain a presence throughout the web if they are to continue to speak for the informational needs of their users. part of that presence necessitates an ability to adapt current models to the internet. more importantly, it requires recognition of when to forgo conventional service methods in favor of more innovative approaches. working in concert with the early adopters, corporate interests, and general internet users, librarians can promote a successful system for organizing internet resources. for the internet, folksonomic tagging is one solution that will assure users can retrieve information necessary to answer their queries.

references and notes
1. charles f. thomas and linda s. griffin, “who will create the metadata for the internet?” first monday 3, no. 12 (dec. 1998).
2. web 2.0 is a fairly recent term, although now ubiquitous among people working in and around internet technologies. attributed to a conference held in 2004 between medialive international and o’reilly media, web 2.0 refers to the web as being a platform for harnessing the collective power of internet users interested in creating and sharing ideas and information without mediation from corporate, government, or other hierarchical policy influencers or regulators. web 3.0 is a much more fluid concept as of this writing. there are individuals who use it to refer to a semantic web where information is analyzed or processed by software designed specifically for computers to carry out the currently human-mediated activity of assigning meaning to information on a webpage. there are librarians involved with exploring virtual-world librarianship who refer to the 3d environment as web 3.0. the important point here is that what internet users now know as web 2.0 is in the process of being altered by individuals continually experimenting with and improving upon existing web applications. web 3.0 is the undefined future of the participatory internet.
3. clay shirky, “here comes everybody: the power of organizing without organizations” (presentation videocast, berkman center for internet & society, harvard university, cambridge, mass., 2008), http://cyber.law.harvard.edu/interactive/events/2008/02/shirky (accessed oct. 1, 2008).
4. ibid.
5. lawrence lessig, “early creative commons history, my version,” videocast, aug. 11, 2008, lessig 2.0, http://lessig.org/blog/2008/08/early_creative_commons_history.html (accessed aug. 13, 2008).
6. elaine peterson, “beneath the metadata: some philosophical problems with folksonomy,” d-lib magazine 12, no. 11 (2006), http://www.dlib.org/dlib/november06/peterson/11peterson.html (accessed sept. 8, 2008).
7. clay shirky, “ontology is overrated: categories, links, and tags,” online posting, spring 2005, clay shirky’s writings about the internet, http://www.shirky.com/writings/ontology_overrated.html#mind_reading (accessed sept. 8, 2008).
8. gene smith, tagging: people-powered metadata for the social web (berkeley, calif.: new riders, 2008): 68.
9. ibid., 76.
10. thomas vander wal, “folksonomy,” online posting, feb. 7, 2007, vanderwal.net, http://www.vanderwal.net/folksonomy.html (accessed aug. 26, 2008).
11. thomas vander wal, “explaining and showing broad and narrow folksonomies,” online posting, feb. 21, 2005, personal infocloud, http://www.personalinfocloud.com/2005/02/explaining_and_.html (accessed aug. 29, 2008).
12. shirky, “ontology is overrated.”
13. ibid.
14. michael arrington, “exclusive: screen shots and feature overview of delicious 2.0 preview,” online posting, june 16, 2005, techcrunch, http://www.techcrunch.com/2007/09/06/exclusive-screen-shots-and-feature-overview-of-delicious-20-preview/ (accessed jan. 6, 2010).
15. smith, tagging, 67–93.
16. vander wal, “explaining and showing broad and narrow folksonomies.”
17. adam mathes, “folksonomies—cooperative classification and communication through shared metadata” (graduate paper, university of illinois urbana–champaign, dec. 2004); peterson, “beneath the metadata”; shirky, “ontology is overrated”; thomas and griffin, “who will create the metadata for the internet?”
18. shirky, “ontology is overrated.”
19. peterson, “beneath the metadata.”
20. cory doctorow, “metacrap: putting the torch to seven straw-men of the meta-utopia,” online posting, aug. 26, 2001, the well, http://www.well.com/~doctorow/metacrap.htm (accessed sept. 15, 2008).
21. marieke guy and emma tonkin, “folksonomies: tidying up tags?” d-lib magazine 12, no. 1 (2006), http://www.dlib.org/dlib/january06/guy/01guy.html (accessed sept. 8, 2008).
22. shirky, “ontology is overrated.”

judith carter

editorial board thoughts: issue introduction

discovery: what do you mean by that?

judith carter (jcarter.mls@gmail.com) is head of technical services at marquette university raynor memorial libraries, milwaukee, wi, and managing editor of ital.

mwuah ha ha ha haaa! finally it’s my turn. i hold the power of the editorial. (can you tell i’m writing this around halloween?) seriously now, i’ve been intimately and extensively involved with information technology and libraries for eleven years, yet this is the first time i’ve escaped from behind the editing scenes to address the readership directly. as managing editor for seven of the eleven volumes (18–22 and 27–28) and an editorial board member reviewing manuscripts (vols. 23–26), i am honored marc agreed to let me be guest editor for this theme issue. this issue is a compilation of presentations from the discovery mini-conference held at the university of nevada las vegas (unlv) libraries in the spring of 2009. the first article by jennifer fabbi gives the full chronology and framework of the project, but i have the pleasure of introducing this issue and topic by virtue of my role as guest editor, as well as my own participation in the mini-conference before i left unlv in july 2009.

■ what is discovery?

when the dean of libraries, patricia iannuzzi, announced that unlv would have a libraries-wide, poster-session style discovery mini-conference, jennifer fabbi and i decided we wanted to be part of it. we had already been exploring various aspects of discovery as part of an organizational focus as well as following up on a particular event that happened earlier in the year. while serving on a search committee, we posed a question to all the candidates: “what do you see the library catalog looking like in the future?
what do you see as the relationship between the library catalog and other access or discovery tools?” one of the candidates had such a unique answer that it got us thinking: are we all talking about the same thing when we discuss discovery? the mini-conference gave us the opportunity to explore the idea further. an all-library summit that preceded the mini-conference announcement had focused on users finding known items. we knew that discovery was so much more and that it depended on the users’ needs. of course, first we went to multiple online dictionaries to look up the meanings of “discovery” and found the following definitions: n something learned or found; something new that has been learned or found n the process of learning something; the fact or process of finding out about something for the first time n the process of finding something; the process or act of finding something or somebody unexpectedly or after searching we also looked at famous quotes about discovery. there were some of our favorites: a discovery is said to be an accident meeting a prepared mind. —albert szent-gyorgyi education is a progressive discovery of our own ignorance. —will durant next, a colleague recommended we look at chang’s browsing theory.1 this theory covered the broad spectrum of how users seek information and showed a more serendipitous view than the former focus of known item search. obviously, browsing implies a physical interaction with a collection, so we reframed the themes to fit discovery in the “every-library” electronic information environment. chang’s five browsing themes, adapted to discovery: n looking for a specific item, to locate n looking for something with common characteristics, to find “more like this” n keeping up-to-date, to find out what’s new in a field, topic or intellectual area n learning or finding out, to define or form a research question n goal-free, to satisfy curiosity or be entertained.2 all interesting information, but a little theoretical for a visual presentation. to make these themes more concrete and visual, i suggested we apply them to personas as described in one of my favorite books, the inmates are running the asylum.3 this encourages programmers to create a user with a full backstory and then design a product for their needs. to do this in an entertaining way, we identified five types of users we’ve encountered in our libraries and described an information-seeking need for each. i then created some colorful and representational characters using a well-known, alliteratively named candy’s website. our five characters were 1. mina, stylishly dressed and always carries a cell phone, is an undergraduate who rarely uses the library. she has a sociology class library assignment to find information on the cell phone habits of generation x. 2. ms. lvite lives in the las vegas area and contributes to the library. she is a regular from the community judith carter (jcarter.mls@gmail.com) is head of technical services at marquette university raynor memorial libraries, milwaukee, wi and managing editor of ital. 162 information technology and libraries | december 2009 who likes to dig into everything the library owns about small mining towns in nevada. 3. dr. prof is a faculty member with a slightly outdated wardrobe but a thirst for knowledge. he wants to know what books have been published in his field of quantum bowtie mechanics by any of his colleagues across the country. 4. 
phdead tired is a slightly mussed grad student who is always in the library clutching a cup of coffee. he needs to narrow down his dissertation topic. 5. duuuuude is an energetic, sociable young man who likes to hang out in the library with his friends. he has some time to kill on the computer. on our poster, we asked the discovery mini-conference attendees to place cutouts of our personas on a pie chart divided into the five themes of discovery. jennifer and i expected certain placements and were pleasantly surprised when our attendees challenged our assumptions with alternate possibilities. another section of the poster related discovery behaviors to specific electronic discovery tools. we provided a few and asked the attendees to add others (see table 1). while talking with each attendee, we provided a bookmark listing the five discovery behaviors (with colorful character personas) and suggested they keep them in mind as they visited the other conference sessions. we challenged them to identify what user behaviors the other presenters' systems or services were targeting. the message jennifer and i hoped to convey with our poster was this: the way we think about discovery, or the users' goals in finding information, drives the discovery systems we have or will create.

table 1. relating discovery behaviors to electronic discovery tools
user wants . . . | provide the user . . . | other tools?*
to find a specific item | search by title, author, or call number (e.g., libraries' webopac) | search a database; worldcat; flickr; google books
to find items with common characteristics | items linked by subject headings, format, or other elements; tag clouds; federated search for article databases (e.g., webopac, encore, article databases) | flickr; summon; twine; delicious
to be kept up-to-date | recently added items by subject; integration of blogs for news or updates (e.g., new books list, libguides, encore "recently added") | blogs; rss feeds; apple itunes; amazon readers advisory; authors/musicians websites; newspapers online
to learn more about something | general information that provides context, reviews (e.g., wikipedia, google, encore community reviews) | dissertation abstracts; encyclopedias; database of databases (for context); peer to peer: delicious, social tagging
to satisfy curiosity or be entertained | surfing the web, multimedia, social networking (e.g., google, youtube, facebook) | myspace; world of warcraft; second life; podcasts; wikipedia "random article" feature
* ideas generated at the discovery mini-conference

as you read through this issue, i hope you'll see some new ways to think about discovery and that those ways will fuel this audience's potential to create new tools. what follows is a textual walk around our mini-conference. taken as individual articles, each might not look like what you are used to seeing in ital. taken as a whole that grew out of the process, these articles are what makes this a special issue. as i said before, jennifer fabbi provides the background and process for the discovery mini-conference. then, alex dolski describes a prototype multipac discovery system he created and demonstrated, and he discusses the issues surrounding the design of such a system. tom ipri, michael yunkin, and jeanne brown, as members of the usability working group, had already been conducting testing on unlv libraries' website. they share their methods, findings, and results with us.
thomas sommer presents a look at what the special collections department has implemented to aid discovery of their unique materials. wendy starkweather and eva stowers used the mini-conference as an opportunity to research how other libraries are providing discovery opportunities to students via smartphones. patrick griffis describes his work with free screen capture tools to build pathfinders to promote resource discovery. patrick griffis and cyrus ford each looked at enhancing catalog records, so they combined their two presentations here to describe ways to enrich the online catalog to better aid our users’ success. references 1. shan-ju chang, “chang’s browsing,” in theories of information behavior, by karen e. fisher, sanda erdelez, and lynne mckechnie (medford, n.j.: information today, 2005): 69–74. 2. ibid., 71–72. 3. alan cooper, the inmates are running the asylum, (indianapolis, ind.: sams, 1999). personas are described in chapter 9. figure 1. “initial thoughts” and “five general themes of discovery behavior” panel from the discovery mini-conference poster editorial board thoughts | eden 109 editorial board thoughts bradford lee eden musings on the demise of paper w e have been hearing the dire predictions about the end of paper and the book since microfiche was hailed as the savior of libraries decades ago. now it seems that technology may be finally catching up with the hype. with the amazon kindle and the sony reader beginning to sell in the marketplace despite the cost (about $360 for the kindle), it appears that a whole new group of electronic alternatives to the print book will soon be available for users next year. amazon reports that e-book sales quadrupled in 2008 from the previous year. this has many technology firms salivating and hoping that the consumer market is ready to move to digital reading as quickly and profitably as the move to digital music. some of these new devices and technologies are featured in the march 3, 2009, fortune article by michael v. copeland titled “the end of paper?”1 part of the problem with current readers is their challenges for advertising. because the screen is so small, there isn’t any room to insert ads (i.e., revenue) around the margins of the text. but new readers such as plastic logic, polymer vision, and firstpaper will have larger screens, stronger image resolution, and automatic wireless updates, with color screens and video capabilities just over the horizon. still, working out a business model for newspapers and magazines is the real challenge. and how much will readers pay for content? with everything “free” over the internet, consumers have become accustomed to information readily available for no immediate cost. so how much to charge and how to make money selling content? the plastic logic reader weighs less than a pound, is one-eighth of an inch thick, and resembles an 8½ x 11 inch sheet of paper or a clipboard. it will appear in the marketplace next year, using plastic transistors powered by a lithium battery. while not flexible, it is a very durable and break-resistant device. other e-readers will use flexible display technology that allows one to fold up the screen and place the device into a pocket. much of this technology is fueled by e-ink, a start-up company that is behind the success of the kindle and the reader. they are exploring the use of color and video, but both have problems in terms of reading experience and battery wear. in the long run, however, these issues will be resolved. 
expense is the main concern: just how much are users willing to pay to read something in digital rather than analog? amazon has been hugely successful with the kindle, selling more than 500,000 for just under $400 in 2007. and with the drop in subscriptions for analog magazines and newspapers, advertisers are becoming nervous about their futures. or will the “pay by the article” model, like that used for digital music sales, become the norm? so what should or do these developments mean for libraries? it means that we should probably be exploring the purchase of some of these products when they appear and offering them (with some content) for checkout to our patrons. many of us did something similar when it became apparent that laptops were wanted and needed by students for their use. many of us still offer this service today, even though many campuses now require students to purchase them anyway. offering cutting-edge technology with content related to the transmission and packaging of information is one way for our clientele to see libraries as more than just print materials and a social space. and libraries shouldn’t pay full price (or any price) for these new toys; companies that develop these products are dying to find free research and development focus groups that will assist them in versioning and upgrading their products for the marketplace. what better avenue than college students? related to this is the recent announcement by the university of michigan that their university press will now be a digital operation to be run as part of the library.2 decreased university and library budgets have meant that university presses have not been able to sell enough of their monographs to maintain viable business models. the move of a university press to a successful scholarly communication and open-source publishing entity like the university of michigan libraries means that the press will be able to survive, and it also indicates that the newer model of academic libraries as university publishers will have a prototypical example to point out to their university’s administration. in the long run, these types of partnerships are essential if academic libraries are to survive their own budget cuts in the future. references 1. michael v. copeland, “the end of paper?” cnnmoney .com, mar. 3, 2009, http://money.cnn.com/2009/03/03/ technology/copeland_epaper.fortune/ (accessed june 22, 2009). 2. andrew albanese, “university of michigan press merged with library, with new emphasis on digital monographs,” libraryjournal.com, mar. 26, 2009, http://www .libraryjournal.com/article/ca6647076.html (accessed june 22, 2009). bradford lee eden (eden@library.ucsb.edu) is associate university librarian for technical services and scholarly communication, university of california, santa barbara. 
public library computer waiting queues: alternatives to the first-come-first-served strategy stuart williamson

abstract this paper summarizes the results of a simulation of alternative queuing strategies for a public library computer sign-up system. using computer usage data gathered from a public library, the performance of these various queuing strategies is compared in terms of the distribution of user wait times. the consequences of partitioning a pool of public computers are illustrated, as are the potential benefits of prioritizing users in the waiting queue according to the amount of computer time they desire.

introduction many of us at public libraries are all too familiar with the scene: a crowd of customers huddled around the library entrance in the morning, anxiously waiting for the doors to open to begin a race for the computers. from this point on, the wait for a computer at some libraries, such as the one we will examine, can hover near thirty minutes on busy days and peak at an hour or more. such long waiting times are a common source of frustration for both customers and staff. by far the most effective solution to this problem is to install more public computers at your library. of course, when the space or money runs out, this may no longer be possible. another approach is to reduce the length or number of sessions each customer is allowed.
unfortunately, reducing session length can make completion of many important tasks difficult; whereas, restricting the number of sessions per day can result in customers upset over being unable to use idle computers.1 finally, faced with daunting wait times, libraries eager to make their computers accessible to more people may be tempted to partition their waiting queue by installing separate fifteen-minute “express” computers. a primary focus of this paper is to illustrate how partitioning the pool of public computers can significantly increase waiting times. additionally, several alternative queuing strategies are presented for providing express-like computer access without increasing overall waiting times. we often take for granted the notion that first-come-first-served (fcfs) is a basic principle of fairness. “i was here first,” is an intuitive claim that we understand from an early age. however, stuart williamson (swilliamson@metrolibrary.org) is researcher, metropolitan library system, oklahoma city, oklahoma. mailto:swilliamson@metrolibrary.org information technology and libraries | june 2012 73 the inefficiency present in a strictly fcfs queue is implicitly acknowledged when we courteously invite a person with only a few items to bypass our overflowing grocery cart to proceed ahead in the check-out line. most of us would agree to wait an additional few minutes rather than delay someone else for a much greater length of time. when express lanes are present, they formalize this process by essentially allowing customers needing help for only a short period of time to cut in line. these line cuts are masked by the establishment of separate dedicated lines, i.e., the queue is partitioned into express and non-express lines. one question addressed by this article is “is there a middle ground?” in other words, how might a library system set up its computer waiting queue to achieve express-lane type service without splitting the set of public internet computers into partitions that operate separately and in parallel? several such strategies are presented here along with the results of how each performed in a computer simulation using actual customer usage data from a public library. strategies queuing systems are heavily researched in a number of disciplines, particularly computer science and operations research. the complexity and sheer number of different queuing models can present a formidable barrier to library professionals. this is because, in the absence of real-world data, it is often necessary to analyze a queuing system mathematically by approximating its key features with an applicable probability distribution. unfortunately, applying these distributions entails adopting their underlying assumptions as well as any additional assumptions involved in calculating the input parameters. for instance, the poisson distribution (used to approximate customer arrival rates) requires that the expected arrival rate be uniform across all time intervals, an assumption which is clearly violated when school lets out and teenagers suddenly swarm the computers.2 even if we can account for such discrepancies, there remains the difficulty of estimating the correct arrival rate parameter for each discrete time interval being analyzed. fortunately, many libraries now use automated computer sign-up systems which provide access to vast amounts of real-world data. with realistic data, it is possible to simulate various queuing strategies, a few of which will be analyzed in this article. 
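as an aside, the uniform-arrival-rate assumption behind a poisson model is easy to check against real sign-up logs before deciding whether an analytic approach is workable. the short sketch below (written in python; the data layout is assumed, since the article does not publish its file format) simply tallies sign-ups by hour, so an after-school surge shows up immediately.

```python
from collections import Counter

def arrivals_by_hour(signup_times):
    """signup_times: iterable of sign-up timestamps in seconds since opening.
    a roughly flat hourly profile is consistent with a single poisson rate;
    a mid-afternoon spike (school letting out) clearly is not."""
    counts = Counter(t // 3600 for t in signup_times)
    return {hour: counts.get(hour, 0) for hour in range(max(counts) + 1)}

# example with made-up sign-up times: a quiet morning, then a rush in hour 7
print(arrivals_by_hour([600, 4200, 25300, 25400, 25500, 26000]))
```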
a computer simulation using real-world data provides a good picture of the practical implications of any queuing strategy we care to devise without the need for complex models. as is often the case, designing a waiting queue strategy involves striking a balance among competing factors. for instance, one way of reducing waiting times involves breaking with the fcfs rule and allowing users in one category to cut in front of other users. how many cuts are acceptable? does the shorter wait time for users in one category justify the longer waits in another? there are no right answers to these questions. while simulating a strategy can provide a realistic picture of its results in terms of waiting times, evaluating which strategy's results are preferable for a particular library must be done on a case-by-case basis. in addition to the standard fcfs strategy with a single pool of computers and the same fcfs strategy implemented with one computer removed from the pool to serve as a dedicated fifteen-minute express computer (referred to as fcfs-15), we will consider for comparison three other well-known alternative queuing strategies: shortest-job-first (sjf), highest-response-ratio-next (hrrn), and a variant of shortest-job-first (sjf-fb) which employs a feedback mechanism to restrict the number of times a given user may be bypassed in the queue.3 the three alternative strategies all require advance knowledge or estimation of how long each particular computer session will last. in our case, this means customers would need to indicate how long a session they desire upon first signing up for a computer. although any number of minutes would be acceptable, we will limit the sign-up options to four categories in fifteen-minute intervals: fifteen minutes, thirty minutes, forty-five minutes, and sixty minutes. each session will then be initially categorized into one of four priority classes (p1, p2, p3, and p4) accordingly. as the data will show, customers selecting shorter sessions are given a higher priority in the queue and will thus have a shorter expected waiting time. it should be noted that relying on users to choose their own session length presents its own set of problems. it is often difficult to estimate how much time will be required to accomplish a given set of tasks online. however, users face a similar difficulty in deciding whether to opt for a dedicated fifteen-minute computer under the fcfs-15 system. the trade-off between use time and wait time should provide an incentive for some users to self-ration their computer use, placing an additional downward pressure on wait times. however, user adaptations in response to various queuing strategies are outside the scope of this analysis and will not be considered further. the shortest-job-first (sjf) strategy functions by simply selecting from the queue the user in the highest priority class. the amount of time spent waiting by each user is only considered as a tie breaker among users occupying the same priority class. our results demonstrate that the sjf strategy is generally best for minimizing overall average waiting time as well as for getting customers needing the least amount of computer time online the fastest. the main drawbacks of this strategy are that these gains come at the expense of more line cuts and higher average and maximum waiting times for the lowest priority users—those needing the longest sessions (sixty minutes).
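to make the priority classes and the sjf selection rule concrete, here is a minimal python sketch (not the author's simulation code, which the article does not reproduce; the record and function names are illustrative). it maps a requested session length to classes p-1 through p-4 and then picks the next user by class, using arrival time only to break ties.

```python
from dataclasses import dataclass

@dataclass
class Request:
    arrival: int        # sign-up time in seconds since opening
    requested_min: int  # desired session length: 15, 30, 45, or 60

def priority_class(req: Request) -> int:
    """map a requested session length to priority class 1 (highest) .. 4 (lowest):
    15 min -> p-1, 30 min -> p-2, 45 min -> p-3, 60 min -> p-4."""
    return min(4, max(1, (req.requested_min + 14) // 15))

def pick_next_sjf(queue: list[Request]) -> Request:
    """shortest-job-first: the best (lowest-numbered) priority class wins;
    waiting time matters only as a tie breaker within a class."""
    chosen = min(queue, key=lambda r: (priority_class(r), r.arrival))
    queue.remove(chosen)
    return chosen

# example: a 60-minute user who signed up first is bypassed by a later 15-minute user
q = [Request(arrival=0, requested_min=60), Request(arrival=300, requested_min=15)]
print(pick_next_sjf(q))  # Request(arrival=300, requested_min=15)
```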
there is no limit to how many times a user can be passed over in the queue. in theory, this means that such a user could be continually bypassed and never be assigned a computer during the day. the sjf-fb strategy is a variant of sjf with the addition of a feedback mechanism that increases the priority of users each time they are cut in line. for instance, if a user signs up for a sixty-minute session, he/she is initially assigned a priority of 4. suppose that shortly after, another user signs up for a thirty-minute session and is assigned a priority of 2. the next available computer will be assigned to the user with priority 2. the bypassed user's priority will now be bumped up by a set interval. in this simulation an interval of 0.5 is used, so the bypassed user's new priority becomes 3.5. as a result, users beginning with a priority of 4 will reach the highest priority of 1 after being bypassed six times and will not be bypassed further. this effectively restricts the maximum number of times a user can be cut in front of to six. the final alternative strategy, highest-response-ratio-next (hrrn), is a balance between fcfs and sjf. it considers both the arrival time and requested session length when assigning a priority to each user in the queue. each time a user is selected from the queue, the response ratio is recalculated for all users. the user with the highest response ratio is selected and assigned the open computer. the formula for response ratio is: response ratio = (time spent waiting + requested session length) / requested session length. this allows users with a shorter session request to cut in line, but only up to a point. even customers requesting the longest possible session move up in priority as they wait, just at a slower pace. this method produces the same benefits and drawbacks as the sjf strategy; but the effects of both are moderated, and the possibility of unbounded waiting is eliminated. still, although the expected number of cuts will be lower using hrrn than with sjf, there is no limit on how many times a user may be passed over in the queue. the response ratio formula can be generalized by scaling the importance of the waiting time factor. for instance, in the modified response ratio, increasing values of x > 1 will cause the strategy to more closely resemble fcfs, and decreasing values of 0 < x < 1 will more closely resemble sjf. one could experiment with different values of x to find a desired balance between the number of line cuts and the impact on average waiting times for customers in the various priority classes. this won't be pursued here, and x will be assumed to be 1.

methodology the data used in this simulation come from the metropolitan library system's southern oaks library in oklahoma city. this library has eighteen public internet computers that customers can sign up for using proprietary software developed by jimmy welch, deputy executive director/technology for the metropolitan library system. the waiting queue employs the first-come-first-served (fcfs) strategy. customers are allotted an initial session of up to sixty minutes but may extend their session in thirty-minute increments so long as the waiting queue is empty. repeat customers are also allowed to sign up for additional thirty-minute sessions during the day, provided that no user currently in the queue has been waiting for more than ten minutes (an indication that demand for computers is currently high).
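continuing the sketch above with a mutable priority field, the sjf-fb bump and the hrrn selection rule might look like the following. the 0.5 bump interval and the floor at priority 1 come from the article; the response ratio is the standard hrrn definition, and the exponent x is only an assumed, illustrative stand-in for the article's generalized waiting-time weight, which is not reproduced in this text.

```python
from dataclasses import dataclass

@dataclass
class WaitingUser:
    arrival: int           # sign-up time, in seconds since opening
    requested_min: int     # 15, 30, 45, or 60
    priority: float = 4.0  # 1.0 (best) .. 4.0 (worst)

BUMP = 0.5  # how much a bypassed user's priority improves (value from the article)

def record_bypass(bypassed: list[WaitingUser]) -> None:
    """sjf-fb feedback: every user who was cut in line moves up by BUMP,
    but never past the top class, so at most six cuts are possible."""
    for u in bypassed:
        u.priority = max(1.0, u.priority - BUMP)

def response_ratio(u: WaitingUser, now: int, x: float = 1.0) -> float:
    """standard hrrn ratio: (waiting time + requested time) / requested time.
    the exponent x on the waiting-time term is an assumed way to weight waiting
    more heavily (x > 1, closer to fcfs) or less heavily (x < 1, closer to sjf)."""
    wait_min = (now - u.arrival) / 60.0
    return (wait_min ** x + u.requested_min) / u.requested_min

def pick_next_hrrn(queue: list[WaitingUser], now: int, x: float = 1.0) -> WaitingUser:
    """select and remove the waiting user with the highest response ratio."""
    chosen = max(queue, key=lambda u: response_ratio(u, now, x))
    queue.remove(chosen)
    return chosen
```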
anonymous usage data gathered by the system in august 2010 was compiled to produce the information about each customer session shown in table 1. public library computer waiting queues | williamson 76 table 1. session data (units in minutes) the information about each session required for the simulation includes the time at which the user arrived to sign up for a computer, the number of minutes it took the user to log in once assigned a computer, how many minutes of computer time were used, whether or not this was the user’s first or a subsequent session for the day, and finally, whether the user gave up waiting and abandoned his/her place in the queue. users are given eight minutes to log in once a computer station is assigned to them before they are considered to have abandoned the queue. once this data has been gathered, the computer simulation runs by iterating through each second the library is open. as user sign-up times are encountered in the data, they are added to the waiting queue. when a computer becomes available, a user is selected from the queue using the strategy being simulated and assigned to the open computer. the customer occupies the computer for the length of time given by their associated log-in delay and session length. when this time expires, customers are removed from their computer and the information recorded during their time spent in the waiting queue is logged. results there were 7,403 sign-ups for the computers at the southern oaks library in august 2010. each of these requests is assigned a priority class based on the length of the session as detailed in table 2. the intended session length of users choosing to abandon the queue is unknown. abandoned sign-ups are assigned a priority class randomly in proportion to the overall distribution of priority classes in the data so as not to introduce any systematic bias into the results. even though their actual session length is zero, these users participate in the queue and cause the computer eventually assigned to them to sit idle for eight minutes until it is re-assigned. customers signing up for a subsequent session during the day are always assigned the lowest priority class (p-4) regardless of their requested session length. this is a policy decision to not give priority to users who have already received a computer session for the day. information technology and libraries | june 2012 77 table 2. assignment of priority classes figure 1 displays the average waiting time for each priority class during the simulation (bars) along with the total number of sessions initially assigned to each class (line). it is immediately obvious from the chart that each alternative strategy excels at reducing the average wait for high priority (p1) users. also observe how removing one computer from the pool to serve exclusively as a fifteen-minute computer drastically increases the fcfs-15 average wait times in the other priority classes. clearly, removing one (or more) computer from the pool to serve as a dedicated fifteen-minute station is a poor strategy here for all but the 519 users in class p-1. losing just one of the eighteen available computers nearly doubles the average wait for the remaining 6,884 users in the other priority classes. figure 1. average user wait minutes by priority class public library computer waiting queues | williamson 78 by contrast, note that the reduced average wait times for the highest priority users in class p-1 persist in classes p-2 and p-3 for the non-fcsc strategies. 
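a bare-bones version of the second-by-second replay described in the methodology might look like the sketch below. it assumes a list of sign-up records sorted by arrival time, the eighteen stations and eight-minute login window described above, and a pluggable selection rule (a first-come-first-served rule is included for illustration); it is a sketch under those assumptions, not the metropolitan library system's sign-up software.

```python
N_COMPUTERS = 18
LOGIN_WINDOW = 8 * 60  # seconds a user may take to log in before abandoning

def simulate(signups, pick_next, day_seconds=12 * 60 * 60):
    """replay one day of sign-ups under a given queue-selection rule.
    signups: dicts with 'arrival', 'login_delay', and 'session' in seconds,
             sorted by arrival time (an abandoned user has session == 0).
    pick_next: function(queue, now) returning the next record to seat.
    returns (record, seconds waited) pairs for later analysis."""
    queue, waits = [], []
    free_at = [0] * N_COMPUTERS   # time at which each station next becomes free
    i = 0                         # index of the next sign-up to arrive
    for now in range(day_seconds):
        while i < len(signups) and signups[i]["arrival"] <= now:
            queue.append(signups[i])   # the user joins the waiting queue
            i += 1
        for c in range(N_COMPUTERS):
            if free_at[c] <= now and queue:
                user = pick_next(queue, now)
                waits.append((user, now - user["arrival"]))
                # the station is tied up for the login delay (capped at the
                # eight-minute window) plus the actual session length
                busy = min(user["login_delay"], LOGIN_WINDOW) + user["session"]
                free_at[c] = now + busy
    return waits

def pick_fcfs(queue, now):
    """first-come-first-served: simply take the earliest arrival."""
    user = min(queue, key=lambda u: u["arrival"])
    queue.remove(user)
    return user
```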
the sjf strategy produces the most dramatic reductions for the 2,164 users not in class p-4. however, for the 5,239 users in class p-4, the sjf strategy produced an average wait time that was 2.1 minutes longer than the purely fcfs strategy. the hrrn strategy achieves lesser wait time reductions than sjf in the higher priority classes, but hrrn increased the average wait for users in class p-4 by only 0.7 minutes relative to fcfs. the average wait using the sjf-fb strategy falls in between that of sjf and hrrn for each priority class while guaranteeing users will be cut at most six times. an examination of the maximum wait times for each priority class in figure 2 illustrates how the express lane itself can be a bottleneck. even with a dedicated fifteen-minute express computer under the fcfs-15 strategy, at least one user would have waited over half an hour to use a computer for fifteen minutes or less. in all but the highest priority class (p-2 through p-4), the fcfs-15 strategy again performs poorly with at least one user in each of these classes waiting over ninety minutes for a computer. figure 2. maximum user wait minutes by priority class capping the number of times a user may be passed over in the queue under the sfj-fb strategy makes it less likely that members of classes p-2 and p-3 will be able to take advantage of their higher priority to cut in front of users in class p-4 during periods of peak demand. as a result, the sjf-fb maximum wait times for classes p-2 and p-3 are similar to those under the fcfs strategy. this was not the case in the breakdown of sjf-fb average waiting times across priority classes in figure 1. information technology and libraries | june 2012 79 table 3 breaks down waiting times for each queuing strategy according to the overall percentage of users waiting no more than the given number of minutes. here we see the effects of each strategy on the system as a whole, instead of by priority class. notice that the overall average wait times for the non-fcfs strategies are lower than those of fcfs. this indicates that the total reduction in waiting times for high-priority users exceeds the additional time spent waiting by users in class p-4. in other words, these strategies are globally more efficient than fcfs. notice, too, in table 3 that the non-fcfs strategies achieve significant reductions in the median wait time compared with fcfs. table 3. distribution of wait times by strategy after demonstrating the impact that breaking the first-come-first-served rule can have on waiting times, it is important to examine the line cuts that are associated with each of these strategies. line cuts are recorded by each user in the simulation while waiting in the queue. each time a user is selected from the queue and assigned a computer, remaining users who arrived prior to the one just selected note having been skipped over. by the time they are assigned a computer, users have recorded the total number of times they were passed over in the queue. public library computer waiting queues | williamson 80 figure 3. cumulative distribution of line cuts by queuing strategy figure 3 displays the cumulative percentage of users experiencing no more than the listed number of cuts for each non-fcfs strategy. the majority of users are not passed over at all under these strategies. however, there is a small minority of users that will be repeatedly cut in line. 
for instance, in our simulation, one unfortunate individual was passed over in the queue sixteen times under the sjf strategy. this user waited ninety-one minutes using this strategy as opposed to only fifty-nine minutes under the familiar fcfs waiting queue. most customers would become upset upon seeing a string of sixteen people jump over them in the queue and get on a computer while they are enduring such a long wait. the hrrn strategy caused a maximum of nine cuts to an individual in this simulation. this user waited seventy-three minutes under hrrn versus only fifty-five minutes using fcfs. extreme examples such as those above are the exception. under the hrrn and sjf-fb strategies, 99% of users were passed over at most four times while waiting in the queue. conclusion we have examined the simulation of several queuing strategies using a single month of computer usage data from the southern oaks library. the relative performance difference between queuing strategies will depend on the supply and demand of computers at any given location. clearly, at libraries with plenty of public computers for which customers seldom have to wait, the choice of queuing strategy is inconsequential. however, for libraries struggling with waiting times on par with those examined here, the choice can have a substantial impact. information technology and libraries | june 2012 81 in general, however, these simulation results demonstrate the ability of non-fcfs queuing strategies to significantly lower waiting times for certain classes of users without partitioning the pool of computers. these reductions in waiting times come at the cost of allowing high priority users to essentially cut in line. this causes slightly longer wait times for low priority users; but, overall average and median wait times see a small reduction. of course, for some customers, being passed over in line even once is intolerable. furthermore, creating a system to implement an alternative queuing strategy may present obstacles of its own. however, if the need to provide for quick, short-term computer access is pressing enough for a library to create a separate pool of “express” computers; then, one of the non-fcfs queuing strategies discussed in this paper may be a viable alternative. at the very least, the fcfs-15 simulation results should give one pause before resorting to designated “express” and “nonexpress” computers in an attempt to remedy unacceptable customer waiting times. acknowledgments the author would like to thank the metropolitan library system, kay bauman, jimmy welch, sudarshan dhall, and bo kinney for their support and assistance with this paper as well as tracey thompson and tim spindle for their excellent review and recommendations. references 1. j. d. slone, “the impact of time constraints on internet and web use,” journal of the american society for information science and technology 58 (2007): 508–17. 2. william mendenhall and terry sincich, statistics for engineering and the sciences (upper saddle river, nj: prentice-hall, 2006), 151–54. 3. abraham silberschatz, peter baer galvin, and greg gagne, operating system concepts (hoboken, nj: wiley, 2009), 188–200. editorial: singularity—are we there, yet? | truitt 55 i n my last column, i wrote about two books—nicholas carr ’s the shallows and william powers’ hamlet’s blackberry—relating to learning in the always-on, always connected environment of “screens.”1 since then, two additional works have come to my attention. 
while i won’t be able to do them justice in the space i have here, they deserve careful consideration and open discussion by those of us in the library community. if carr’s and power’s books are about how we learn in an always-connected world of screens, sherry turkle’s alone together and elias aboujaoude’s virtually you are about who we are in the process of becoming in that world.2 turkle is a psychologist at mit who studies human– computer interactions. among her previous works are the second self (1984) and life on the screen (1995). aboujaoude is a psychiatrist at the stanford university school of medicine, where he serves as director of the obsessive compulsive disorder clinic and the impulse control disorders clinic. based on extensive coverage of specialist and popular literature, as well as numerous anonymized accounts of patients and subjects encountered by the authors, both works are characterized by thorough research and thoughtful analysis. while their approaches to the topic of “what we are becoming” as a result of screens may differ— aboujaoude’s, for example, focuses on “templates” and the terminology of traditional psychiatry, while turkle’s examines the relationship between loneliness and solitude (they are different), and how these in turn relate to the world of screens—their observations of the everyday manifestations of what might be called the pathology of screens bear many common threads. i’m acutely aware of the potential for injustice (at best) and misrepresentation or misunderstanding (rather worse) that i risk in seeking to distill two very complex studies into such a small space. and, frankly, i’m still trying to wrap my head around both the books and the larger issues they raise. with that caveat, i still think we should be reading about and widely discussing the phenomena reported, which many of us observe on a daily basis. in the sections that follow, i’d like to touch on a very few themes that emerge from these books. ■■ “why do people no longer suffice?”3 a pair of anecdotes that turkle recounts to explain her reasons for writing the current book seems worth sharing at the outset. in the first, she describes taking her then-fourteen-year-old daughter, rebecca, to the charles darwin exhibition at new york’s american museum of natural history in 2005. among the many artifacts on display was a pair of live giant galapagos tortoises: “one tortoise was hidden from view; the other rested in its cage, utterly still. 
rebecca inspected the visible tortoise thoughtfully for a while and then said matter-of-factly, ‘they could have used a robot.’” when turkle queried other bystanders, many of the children agreed, with one saying, ‘for what the turtles do, you didn’t have to have live ones.’” in this case, “alive enough” was sufficient for the purpose at hand.4 sometime later, turkle read and publicly expressed her reservations about british computer scientist david levy’s book, love and sex with robots, in which levy predicted that by the middle of this century, love with robots will be as normal as love with other humans, while the number of sexual acts and lovemaking positions commonly practiced between humans will be extended, as robots will teach more than is in all of the world’s published sex manuals combined.5 contacted by a reporter from scientific american about her comments regarding levy’s book, turkle was stunned when the reporter, equating the possibility of relationships between humans and robots with gay and lesbian relationships, accused her of likewise opposing these human-to-human relationships. if we now have reached a point where gay and lesbian relationships can strike us as comparable to human-to-machine relationships, something very important has changed; for turkle, it suggested that we are on the threshold of what she terms the “robotic moment”: this does not mean that companionate robots are common among us; it refers to our state of emotional—and i would say philosophical—readiness. i find people willing to seriously consider robots not only as pets but as potential friends, confidants and romantic partners. we don’t seem to care what these artificial intelligences “know” or “understand” of the human moments we might “share” with them. at the robotic moment, the performance of connection seems connection enough. we are poised to attach to the inanimate without prejudice.6 marc truitteditorial: singularity—are we there, yet? marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 56 information technology and libraries | june 2011 while these examples are admittedly extreme, both authors agree that something very basic has changed in the way we conduct ourselves. turkle characterizes it as mobile technology having made each of us “pausable,” i.e., that a face-to-face interaction being interrupted by an incoming call, text message, or e-mail is no longer extraordinary; rather, in the “new etiquette,” it is “close to the norm.”10 and the rudeness, as well we know, isn’t limited to mobile communications. referring to “flame wars,” which regularly erupt in online communities, aboujaoude observes: the internet makes it easier to suspend ethical codes governing conduct and behavior. gentleness, common courtesy, and the little niceties that announce us as well-mannered, civilized, and sociable members of the species are quickly stripped away to reveal a completely naked, often unpleasant human being.11 even our routine e-mail messages—lacking as they often do salutations and closing sign-offs—are characterized by a form of curtness heretofore unacceptable in paper communications. 
remarkably, to those old enough to recall the traditional norms, the brusqueness is not only unintended, it is as well unconscious; “[we] just don’t think warmth and manners are necessary or even advisable in cyberspace.”12 ■■ castles in the air: avatars, profiles, and remaking ourselves as we wish we were finally, a place to love your body, love your friends, and love your life. —second life, “what is second life?”13 one of the interesting and worrisome themes in both turkle’s and aboujaoude’s studies is that of the reinvention and transformation of the self, in the form of online personas and avatars. this is the stock-in-trade of online communities and gaming sites such as facebook and second life. these sites cater to our nearly universal desire to be someone other than who we are: online, you’re slim, rich, and buffed up, and you feel you have more opportunities than in the real world. . . . we can reinvent ourselves as comely avatars. we can write the facebook profile that pleases us. we can edit our messages until they project the self we want to be.14 the problem is that for many there is an increasing fuzziness at the interface between real and virtual ■■ changing mores, or the triumph of rudeness i can’t think of any successful online community where the nice, quiet, reasonable voices defeat the loud, angry ones. . . . the computer somehow nullifies the social contract. —heather champ, yahoo!’s flickr community manager7 sadly, we’ve all experienced it. we get stuck on a bus, train, or in an elevator with someone engaged in a loud conversation on her or his mobile phone. all too often, the person is loudly carrying on about matters we wish we weren’t there to hear. perhaps it’s a fight with a partner. or a discussion of some delicate health matter. whatever it is, we really don’t want to know, but because of the limitations imposed by physical spaces, we can’t avoid being a party to at least half of the conversation. what’s wrong with these individuals? do they really have no consideration or sense of propriety? it turns out that in matters of tact and good taste, the ground has shifted, and where once we understood and abided by commonly accepted rules of conduct and respect for others, we do so no longer. indeed, the everyday obnoxious intrusions by those using public spaces for their private conversations are among the least of offenders. consider the following situations shared by turkle: sal, 62 years old, holds a small dinner party at his home as part of his “reentry into society” after several years of having cared for his recently deceased wife: i invited a woman, about fifty, who works in washington. in the middle of a conversation about the middle east, she takes out her blackberry. she wasn’t speaking on it. i wondered if she was checking her e-mail. i thought she was being rude, so i asked her what she was doing. she said that she was blogging the conversation. she was blogging the conversation.8 turkle later tells of attending a memorial service for a friend. several [attendees] around me used the [printed] program’s stiff, protective wings to hide their cell phones as they sent text messages during the service. one of the texting mourners, a woman in her late sixties, came over to chat with me after the service. matter-of-factly, she offered, “i couldn’t stand to sit that long without getting on my phone.” the point of the service was to take a moment. 
this woman had been schooled by a technology she’d had for less than a decade to find this close to impossible.9 editorial: singularity—are we there, yet? | truitt 57 enough” became yet more blurred. turkle’s anecdotes of children explaining the “aliveness” of these robots are both touching and disturbing. speaking of a tamagotchi, one child wrote a poem: “my baby died in his sleep. i will forever weep. then his batteries went dead. now he lives in my head.”19 the concept of “alive enough” is not unique to the very young, either. by 2009, sociable robots had moved beyond children’s toys with the introduction of paro, a baby seal-like “creature” aimed at providing companionship to the elderly and touted as “the most therapeutic robot in the world. . . . the children were onto something: the elderly are taken with the robots. most are accepting and there are times when some seem to prefer a robot with simple demands to a person with more complicated ones.”20 where does it end? turkle goes on to describe nursebot, a device aimed at hospitals and long-term care facilities, which colleagues characterized as “a robot even sherry can love.” but when turkle injured herself in a fall a few months later, [i was] wheeled from one test to another on a hospital stretcher. my companions in this journey were a changing collection of male orderlies. they knew how much it hurt when they had to lift me off the gurney and onto the radiology table. they were solicitous and funny. . . . the orderly who took me to the discharge station . . . gave me a high five. the nursebot might have been capable of the logistics, but i was glad that i was there with people. . . . between human beings, simple things reach you. when it comes to care, there may be no pedestrian jobs.21 but need we librarians care about something as farfetched as nursebot? absolutely. now that ibm has proven that it can design a machine—okay, an array of machines, but something much more compact is surely coming soon—that can win at jeopardy!, is the robotic reference librarian really that much of a hurdle? take a bit of watson technology, stick it in nursebot, give it sensible shoes, and hey, i can easily imagine bibliobot, factory-standard in several guises, including perhaps donna reed (as mary, who becomes the town librarian in the alter-life of capra’s it’s a wonderful life) or shirley jones (as marian, the librarian, in the music man). i like donna reed as much as anyone, but do i really want reference assistance from her android doppelgänger? but then, for years after the introduction of the atm, i confess that i continued taking lunch hours off just so that i could deal with a “real person” at the bank, so perhaps it’s just me. the future is in the helping/service professions, indeed! and when we’re all replaced by robots (sociable and otherwise), what will we do to fill the time? personas: “not surprisingly, people report feeling let down when they move from the virtual to the real world. it is not uncommon to see people fidget with their smartphones, looking for virtual places where they might once again be more.”15 turkle speaks of the development of what she terms a “vexed relationship” between the real and the virtual: in games where we expect to play an avatar, we end up being ourselves in the most revealing ways; on social-networking sites such as facebook, we think we will be presenting ourselves, but our profile ends up as somebody else—often the fantasy of who we want to be. 
distinctions blur.16 and indeed, some completely lose sight of what is real and what is not. aboujaoude relates the story of alex, whose involvement in an online community became so consuming that he not only created for himself an online persona—“’i then meticulously painted in his hair, streak by streak, and picked “azure blue” for his eye color and “snow white” for his teeth.’”—but also left his “real” girlfriend after similarly remaking the avatar of his online girlfriend, nadia—“from her waist size to the number of freckles on her cheeks.” speaking of his former “real” girlfriend, alex said, “real had become overrated.”17 ■■ “don’t we have people for these jobs?”18 ageist disclaimer: when i grew up, robots—those that weren’t in science fiction stories or films—were things that were touted as making auto assembly lines more efficient, or putting auto workers out of jobs, depending on your perspective. while not technically a robot, the other machine that characterized “that time” was the automated teller machine (atm), which freed us from having to do our banking during traditional weekday hours, and not coincidentally resulted, again, in the loss of many entry-level jobs in financial institutions. as i recall, we were all reassured that the future lay in “helping/ service” professions, where the danger of replacement by machines was thought to be minimal. now, fast forward 30 years. the first half of turkle’s book is the history of “sociable robots” and our interactions with them. moving from the reactions of mit students to joseph weizenbaum’s eliza in the mid-1970s, she recounts her studies of children’s interactions, first with electronic toys—e.g., tamagotchi—and later, with increasingly sophisticated and “alive” robots, such as furby, aibo, and my real baby. with each generation, these devices made yet more “demands” on their owners—for care, “feeding”, etc. and with each generation, the line between “alive” and “alive 58 information technology and libraries | june 2011 to admit that we’ve seen many examples of how connectedness between people we’d otherwise consider “normal” has and is changing our manners and mores.24 many libraries and other public spaces, reacting to patron complaints about the lack of consideration shown by some users, have had to declare certain areas “cell phone free.” in the interest of getting your attention, i’ve admittedly selected some fairly extreme examples from the two books at hand. however, i think the point is that, now that the glitter of always-on, always-connected, has begun to fade a bit, there is a continuum of dysfunctional behaviors that we are beginning to notice, and it’s time to talk about how we as librarians fit into all of this. are there things we in libraries are doing that encourage some of these less desirable and even unhealthy behaviors? which takes us to a second concern raised by some of my gentle draft-readers: we’ve heard this tale before. television, and radio before it, were technologies that, when they were new, were criticized as corrupting and leading us to all sorts of negative, self-destructive, and socially undesirable behaviors. how are screens and the technology of always-connected any different? a part of me—the one that winces every time someone glibly refers to the “transformational” changes taking place around us—agrees. i was trained as a historian, to take a long view about change. and we’re talking about technologies that—in the case of the web— have been in common use for just over fifteen years. 
that said, my interest here is in seeing our profession begin a conversation about how connective technologies have influenced behavioral changes in people, and especially about how we in libraries may be unwittingly abetting those behavioral changes. television and radio were fundamentally different technologies in that they were one-way broadcast tools. and to the best of my recollection, neither has ever been widely adopted by or in libraries. yes, we’ve circulated videos and sound recordings, and even provided limited facilities for the playback of such media. but neither has ever really had an impact on the traditional core business of libraries, which is the encouragement and facilitation of the largely solitary, contemplative act of reading. connective technologies, in the form of intelligent machines and network-based communities, can be said to be antithetical to this core activity. we need to think about that, and to consider carefully the behaviors we may be encouraging. notwithstanding those critics of change in our profession who feel we move far too glacially, i would maintain that we have often been, if not at the forefront of the technology pack, then certainly among its most enthusiastic ■■ where from here? i titled this column “singularity.” for those not familiar with the literature of science fiction, turkle provides a useful explanation: this notion has migrated from science fiction to engineering. the singularity is the moment—it is mythic; you have to believe in it—when machine intelligence crosses a tipping point. past this point, say those who believe, artificial intelligence will go beyond anything we can currently conceive. . . . at the singularity, everything will become technically possible, including robots that love. indeed, at the singularity, we may merge with the robotic and achieve immortality. the singularity is technological rapture.22 i think it’s pretty clear that we’re still a fair distance from anything that one might reasonably term a singularity. but the concept is surely present, albeit in a somewhat less hubristic degree, when we speak in uncritical awe of “game-changing” or “transformational” technologies. turkle puts it this way: the triumphalist narrative of the web is the reassuring story that people want to hear and that technologists want to tell. but the heroic story is not the whole story. in virtual worlds and computer games, people are flattened into personae. on social networks, people are reduced to their profiles. on our mobile devices, we often talk to each other on the move and with little disposable time—so little, in fact, that we communicate in a new language of abbreviation in which letters stand for words and emoticons for feelings. . . . we are increasingly connected to each other but oddly more alone: in intimacy, new solitudes.23 some of my endlessly patient friends—the ones who provide both you and me with some measure of buffering from the worst of my rants in prepublication drafts of these columns—have asked questions about how all this relates to libraries, for example: how much it is legitimate to generalize to the broader population research findings from cases of obsessive compulsive disorder? the individuals studied are, of course, obsessive and compulsive, in relation to the internet and new technologies. do their behaviors not represent an extreme end of the population? a fair question. and yes, the examples i’ve provided in this column are admittedly somewhat extreme. 
but turkle and aboujaoude both point to many examples that are far more common. i think all of us would have

references and notes

1. marc truitt, "editorial: the air is full of people," information technology and libraries 30 (mar. 2011): 3–5. http://www.ala.org/ala/mgrps/divs/lita/ital/302011/3001mar/editorial_pdf.cfm (accessed apr. 25, 2011). 2. sherry turkle, alone together: why we expect more from technology and less from each other (new york: basic books, 2011); elias aboujaoude, virtually you: the dangerous powers of the e-personality (new york: norton, 2011). 3. turkle, 19. 4. ibid., 3–4. 5. quoted in ibid., 5. 6. ibid., 9–10. emphasis added. 7. quoted in aboujaoude, 99. 8. turkle, 162. emphasis in original. 9. ibid., 295. 10. turkle, 161. 11. aboujaoude, 96. 12. ibid., 98. 13. quoted in turkle, 1. 14. ibid., 12. 15. ibid. 16. ibid., 153. 17. aboujaoude, 77–78. 18. turkle, 290. 19. ibid., 34. 20. ibid., 103–4. 21. ibid., 120–21. 22. ibid., 25. 23. ibid., 18–19. 24. for a recent and typical example, see david carr, "keep your thumbs still when i'm talking to you," new york times, apr. 15, 2011, http://www.nytimes.com/2011/04/17/fashion/17text.html (accessed may 2, 2011). 25. aboujaoude, 283.

adopters. in our quest to remain "relevant" to our university or school administrations, governing boards, and (in theory, at least) our patrons, we have embraced with remarkably little reservation just about every technology trend that's come along in the past few decades. at the same time, we've been remarkably uncritical and unreflective about our role in, and the larger implications of, what we might be doing by adopting these technologies. aboujaoude, in a surprising, but i think largely correct summary comment, observes: extremely little is available, however, for the individual interested in learning more about how virtual technology has reshaped our inner universe and may be remapping our brains. as centers of learning, public libraries, schools, and universities may be disproportionately responsible for this deficiency. they outdo one another in digitalizing their holdings and speeding up their internet connections, and rightfully see those upgrades as essential to compete for students, scholars, and patrons. in exchange, however, and with few exceptions, they teach little about the unintended, less obvious, and more personal consequences of the world wide web. the irony is, at least in some libraries' case, that their very survival seems threatened by a shift that they do not seem fully engaged in trying to understand, much less educate their audiences about.25 i could hardly agree more. so, how do we answer aboujaoude's critique?

life out of balance. those who saw it will surely recall the 1982 film that juxtaposed images of stunning natural beauty with scenes of humankind's intrusion into the environment, all set to a score by philip glass. the title is a hopi word meaning "life out of balance," "crazy life," "life in turmoil," "life disintegrating," or "a state of life that calls for another way of living." while the film, as i recall, relied mainly on images of urban landscapes, mines, power lines, etc., to make its point about our impact on the world around us, it did include as well images that had a technological focus, even if the pre–pc technology exemplars shown may seem somewhat quaint thirty years later.1 the sense that one is living in unbalanced, crazy, or tumultuous times is nothing new.
indeed, i think it’s fair to say that most of us—our eyes and perspectives firmly and narrowly riveted to the here and now—tend to believe that our own specific time is one of uniquely rapid and disorienting change. but just as there have been, and will be, periods of rapid technological change, social upheaval, etc.—“been there, done that, got the t-shirt,” to recall the memorably pithy, if now slightly oh-so-aughts, slogan—so too have there been reactions to the conditions that characterized those times. a couple of very different but still pertinent examples come to mind. in the second half of the nineteenth century, a reaction against the social conservatism and shoddy, mass-produced goods of the victorian era began in england. inspired by writer and designer william morris, the arts and crafts movement emphasized simplicity, hand-made (as opposed to factory-made) objects, and social reform. by the turn of the century, the movement had migrated to the united states—memo to self: who were the leading lights of the movement in canada?—finding expression in the “mission-style” furniture of gustav stickley, the elegant art pottery of rookwood, marblehead, and teco, and the social activism of elbert hubbard’s roycrofters. fast-forward another half-century to the mid-1960s and the counter-culture of that time, itself a reaction to the racism, sexism, militarism, and social regimentation of the preceding decade. for a brief period, experimentation with “alternative lifestyles,” resistance to the vietnam war, and agitation for social, racial, and sexual change flourished. whatever one’s views about, say, the flower children, civil rights demonstrations, or the wisdom of u.s. involvement in vietnam, it’s well-nigh impossible to argue that the society that emerged from that time was not fundamentally different from the one that preceded it. that both of these “movements” ultimately were subsumed into the larger whole from which they sprang is only partly the issue. and my aim is not to romanticize either of these times, even as i confess to more than a passing interest in and sympathy for both. rather, my point is that their roots lay in a reaction to excesses—social, cultural, economic, political, even technological—that marked their times. they were the result of what might be termed “life out of balance.” in turn, their result, viewed through a longer lens, was a new balance, incorporating elements of the status quo ante and critical pieces from the movements themselves. thesis —> antithesis —> synthesis. we find ourselves in such unbalanced times again today. even without resort to over-hyped adjectives such as “transformational,” it is fair to say that we are in uncertain times. in libraries, budgets, staffing levels, and gate counts are in decline. the formats and means of information delivery are rapidly changing. debates rage over whether we are merely in the business of delivering “information” or of preserving, describing, and imparting learning and knowledge. perhaps worst of all, as our role in the society of which we are a part changes into something we cannot yet clearly see, we fear “irrelevance.” what will happen when everyone around us comes to believe that “everything [at least, everything that’s important] is on the web” and that libraries and librarians no longer have a raison d’être? for much of the past decade and a half—some among us might argue even longer—we’ve reacted by taking the rat-in-the-wheel approach. 
to remain “relevant,” we’ve adopted practically every new fad or technology that came along, endlessly spinning the wheel faster and faster, adopting the tokens of society around us in the hope that by so doing we would stanch the bleeding of money, staff, patrons, and our own morale. as i’ve observed in this space previously,2 we’ve added banks of über-connected computers, clearing away book stacks to design technology-focused creative services and collaborative spaces around them. we’ve treated books almost as smut, to be hidden away in “plain-brown-wrapper” compact storage facilities. we’ve reduced staffing, in the process outsourcing some services and automating others so that they become depersonalized, the library equivalent of a bank automated teller machine. we’ve forsaken collection building, preferring instead to rent access to resources we don’t own and to cede digitization control of those resources that we ostensibly do own. where does it end? in a former job, i used to joke that my director’s vision of the library would not be fully realized until no one but the director and the library’s system administrator were left on staff and nothing but a giant super-server remained of the library. it seemed only black humor then. today it’s just black. marc truitt marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. editorial: koyaanisqatsi 88 information technology and libraries | september 2011 and intellectual rest. they are places of the imagination. play to these strengths. those seeking to reimagine library spaces as refuges could hardly do better than to look to jasper fforde’s magical bookworld in the thursday next series for inspiration.3 stuffy academics and special libraries take note: library magic is not something restricted to children’s rooms in public libraries. walk through the glorious spaces of yale’s sterling memorial library or visit the reading room at the university of alberta’s rutherford library—known to the present generation of students as the “harry potter room,” for its evocation of the hogwarts school’s great hall—and then tell me that magic does not abound in such places. it’s present in all of our libraries, if we but have eyes to see and hearts to feel. ■■ the library was once a place for the individual. to contemplate. to do research. to know the peace and serenity of being alone. in recent years, as we’ve moved toward service models that emphasize collaboration and groups, i think we’ve lost track of those who do not visit us to socialize or work in groups. we need to reclaim them by devoting as much attention to services and spaces aimed at those seeking aloneness as we do at those seeking togetherness. the preceding list will probably brand me in the minds of some readers as anti-technology. i am not. after spending the greater part of my career working in library it, i still can be amazed at what is possible. “golly? we can do that?” but i firmly believe that library technology is not an end in itself. it is a tool, a service, whose purpose is to facilitate the delivery of knowledge, learning, and information that our collections and staff embody. nothing more. that world view may make me seem old fashioned; if such be the case, count me proudly guilty. in the end, though, i come back to the question of balance. 
there was a certain balance in and about libraries that prevailed before the most recent waves of technological change began washing over libraries a couple of decades ago. those waves disrupted but did not destroy the old balance. instead, they’ve left us out of balance, in a state of koyaanisqatsi. it’s time to find a new equilibrium, one that respects and celebrates the strengths of our traditional services and collections while incorporating the best that new technologies have to offer. it’s time to synthesize the two into something better than either. it’s time for balance. references and notes 1. wikipedia, “koyaanisqatsi,” http://en.wikipedia.org/ wiki/koyaanisqatsi (accessed july 12, 2011). ital readers in the united states can view the entire film online at http://www more importantly, where has all this wheel spinning gotten us, other than continued decline and yet more hand-wringing and anguish about irrelevance? it’s time to recognize that we are living in a state of koyaanisqatsi (life out of balance). and it’s up to us to do something new about it by creating a new balance. here are a few perhaps out-of-the-box ideas that i think could help with establishing that balance. spoiler alert: some of these may seem just a bit retro. i can’t help it: my formative library years predate the chicxulub asteroid impact. anyway, here goes: ■■ cease worrying so about “relevance.” instead, identify our niche: design services and collections that are “right” and uniquely ours, rather than pale reflections of fads that others can do better and that will eventually pass. we are not google. we are not starbucks. we know that we cannot hope to beat these sorts of outfits at their games; perhaps less obvious is that we should be extremely wary of even partnering with them. their agenda is not ours, and in any conflict between agendas, theirs is likely to prevail. we must identify something unique at which we excel. ■■ find comfort in our own skins. too many of us, i sense, are at some level uneasy with calling ourselves “librarians.” perhaps this is so because so many of us came to the profession by this or that circuitous route, that is, that we intended to be something else and wound up as librarians. get over it and wear the sensible shoes proudly. ■■ stop trying to run away from or hide books. they are, after all, perceived as our brand. is that such a bad thing? ■■ quit designing core services and tools that are based on the assumption that our patrons are all lazy imbeciles who will otherwise flee to google. the evidence suggests that those folks so inclined are already doing it anyway; why not instead aim at the segment that cares about provision of quality content and services—in collections, face-to-face instruction, and metadata? people can detect our arrogance and condescension on this point and will respond accordingly, either by being insulted and alienated or by acting as we depict them. ■■ begin thinking about how to design and deliver services that are less reliant on technology. technology has become, to borrow from marx, the opiate of libraries and librarians; we rely on it to the exclusion of nontechnological approaches, even when the latter are available to us. technology has become an end in itself, rather than a means to an end. ■■ libraries are perceived by many as safe harbors and refuges from any number of storms. they are places of rest—not only of physical rest, but of emotional editorial | truitt 89 editorial.cfm (accessed july 13, 2011). 3. 
begin with fforde's the eyre affair (2001) and proceed from there. if you are a librarian and are not quickly enchanted, you probably should consider a career change very soon! thank you, michele n! .youtube.com/watch?v=sps6c9u7ras. sadly, the rest of us must borrow or rent a copy. 2. marc truitt, "no more silver bullets, please," information technology & libraries 29, no. 2 (june 2010), http://www.ala.org/ala/mgrps/divs/lita/publications/ital/292010/2902jun/

president's message continued from page 86

we give to the organization. the lita assessment and research committee recently surveyed membership to find out why people belong to lita. this is an important step in helping lita provide programming etc. that will be most beneficial to its users, but the decision on whether to be a lita member i believe is more personal and doesn't rest on the fact that a particular drupal class is offered or that a particular speaker is a member of the top tech trends panel. it is based on the overall experience that you have as a member, the many little things. i knew in just a few minutes of attending my first lita open house 12 years ago that i had found my ala home in lita. i wish that everyone could have such a positive experience being a member of lita. if your experience is less than positive how can it be more so? what are we doing right? what could we do differently? please let me or another officer know, and/or volunteer to become more involved and create a more valuable experience for yourself and others.

preparing locally encoded electronic finding aid inventories for union environments: a publishing model for encoded archival description
plato l. smith ii
information technology and libraries | june 2008

this paper will briefly discuss encoded archival description (ead) finding aids, the workflow and process involved in encoding finding aids using the ead metadata standard, our institution's current publishing model for ead finding aids, current ead metadata enhancement, and new developments in our publishing model for ead finding aids at florida state university libraries. for brevity and within the scope of this paper, fsu libraries will be referred to as fsu, an electronic ead finding aid and/or archival finding aid will be referred to as ead or eads, and locally encoded electronic ead finding aid inventories will be referred to as eads @ fsu.

■■ what is an ead finding aid?

many scholars, researchers, and learning and scholarly communities are unaware of the existence of rare, historic, and scholarly primary source materials such as inventories, registers, indexes, archival documents, papers, and manuscripts located within institutions' collections/holdings, particularly special collections and archives. a finding aid—a document providing information on the scope, contents, and locations of collections/holdings—serves as both an information provider and guide for scholars, researchers, and learning and scholarly communities, directing them to the exact locations of rare, historic, and scholarly primary source materials within institutions' collections/holdings, particularly noncirculating and rare materials. the development of the finding aid led to the institution of an encoding and markup language that was software/hardware independent, flexible, extensible, and allowed online presentation on the world wide web.
in order to provide logical structure, content presentation, and hierarchical navigation, as well as to facilitate internet access of finding aids, the university of california–berkeley library in 1993 initiated a cooperative project that would later give rise to development of the nonproprietary sgml-based, xml-compliant, machine-readable markup language encoding finding aid standard, encoded archival description (ead) document type definition (dtd) (loc, 2006a). thus, an ead finding aid is a finding aid that has been encoded using encoded archival description and which should be validated against an ead dtd. the ead xml that produces the ead finding aid via an extensible style sheet language (xsl) should be checked for well-formed-ness via an xml validator (i.e. xml spy, oxygen, etc.) to ensure proper nesting of ead metadata elements “the ead document type definition (dtd) is a standard for encoding archival finding aids using extensible markup language (xml)” (loc, 2006c). an ead finding aid includes descriptive and generic elements along with attribute tags to provide descriptive information about the finding aid itself, such as title, compiler, compilation date, and the archival material such as collection, record group, series, or container list. florida state university libraries has been creating locally encoded electronic encoded archival description (ead) finding aids using a note tab light text editor template and locally developed xsl style sheets to generate multiple ead manifestations in html, pdf, and xml formats online for over two years. the formal ead encoding descriptions and guidelines are developed with strict adherence to the best practice guidelines for the implementation of ead version 2002 in florida institutions (fcla, 2006), manuscript processing reference manual (altman & nemmers, 2006), and ead version 2002. an ead note tab light template is used to encode findings down to the collection level and create ead xml files. the ead xml files are tranformed through xsl stylesheets to create ead finding aids for select special collections. n ead workflow, processes, and publishing model the certified archivist and staff in special collections and a graduate assistant in the digital library center encode finding aids in ead metadata standard using an ead clip and ead template library in note tab light text editor via data entry input for the various descriptive, administrative, generic elements, and attribute metadata element tags to generate ead xml files. the ead xml files are then checked for validity and well-formed-ness using xml spy 2006. currently, ead finding aids are encoded down to the folder level, but recent florida heritage project 2005–2006 grant funding has allowed selected special collections finding aids to be encoded down to the item level. currently, we use two xsl style sheets, ead2html.xsl and ead2pdf.xsl, to generate html and pdf formats, and simply display the raw xml as part of rendering ead finding aids as html, pdf, and xml and presenting these manifestations to researchers and end users. the ead2html.xsl style sheet used to generate the html versions was developed with specifications such as use of fsu seal, color, and display with input from the special collections department head. the ead2pdf.xsl style sheet used to generate pdf versions uses xsl-fo (formatting plato l. smith ii (psmithii@fsu.edu) is digital initiatives librarian at florida state university libraries, tallahassee. 
preparing locally encoded electronic finding aid inventories for union environments | smith 27 object), and was also developed with specifications for layout and design input from the special collections department head. the html versions are generated using xml spy home edition with built-in xslt, and the pdf versions are generated using apache formatting object processor (fop) software from the command line. ead finding aids, eads @ fsu, are available in html, pdf, and xml formats (see figure 1). the style sheets used, ead authoring software, and eads @ fsu original site are available via www.lib.fsu.edu/dlmc/dlc/ findingaids. n enriching ead metadata as ead standards and developments in the archival community advance, we had to begin a way of enriching our ead metadata to prepare our locally encoded ead finding aids for future union catalog searching and opac access. the first step toward enriching the metadata of our ead finding aids was to use rlg ead report card (oclc, 2008) on one of our ead finding aids. the test resulted in the display of missing required (req), mandatory (m), mandatory if applicable (ma), recommended (rec), optional (opt), and encoding analogs (relatedencoding and encodinganalog attributes) metadata elements (see figure 2). the second test involved reference online archive of california best practices guidelines (oac bpg), specifically appendix b (cdl, 2005, ¶ 2), to create a formal public identifier (fpi) for our ead finding aids and make the ead fpis describing archives content standards (dacs)–compliant. this second test resulted in the creation of our very first dacs– compliant ead formal public identifier. example: <eadid countrycode=“us” identifier=“mss2003004” mainagencycode=“ftasu” publicid=“-//florida state university::strozier library::special collections//text (us::ftasu::ftasu2003004:: bernard f. sliger collection)//en”>ftasu2003004. xml</eadid> the rlg ead report card and appendix b of oac bpg together helped us modify our ead finding aid encoding template and workflow to enrich the ead document identifier <did> metadata tag element, include missing mandatory ead metadata elements, and develop fpis for all of our ead finding aids. prior to recent new developments in the publishing model of ead finding aids at fsu libraries, the ead finding aids in our eads @ fsu inventories could not be easily found using traditional web search engines, were part of the so-called “deep web,” (prom & habing, 2002) and were “unidimensional in that they [were] based upon the assumption that there [was] an object in a library and there [was] a descriptive surrogate for that object, the cataloging record” (hensen, 1999). ead finding aids in our eads @ fsu inventories did not have a descriptive surrogate catalog record and lacked the relevant related encoding and analog metadata elements within the ead metadata with which to facilitate “metadata crosswalks”—mapping one metadata standard with another metadata standard to facilitate crosssearching. “to make the metadata in ead instance as robust as possible, and to allow for crosswalks to other encoding schemes, we mandate the inclusion of the relatedencoding and encodinganalog attributes in both the <eadheader> and <archdesc> segments” (meissner, et al., 2002). incorporating an ead quality checking tool such as rlg bpg and ead compliance such as dacs when figure 1. ead finding aids in html, pdf, and xml format figure 2. 
rlg ead report card of xml ead file 28 information technology and libraries | june 2008 authoring eads, will assist in improving ead encoding and ead finding aids publishing model. n some key issues with creating and managing ead finding aids one of the major issues with creating and managing ead finding aids is the set of rules used for describing papers, manuscripts, and archival documents. the former set of rules used for providing consistent descriptions and anglo-american cataloging rules (aacr) bibliographic catalog compliance for papers, manuscripts, and archival documents down to collection level was archives, personal papers, and manuscripts (appm), which was complied by steven l. hensen and published by the library of congress in 1983. however, the need for more description granularity down to the item level, enhanced bibliographic catalog specificity, marc and ead metadata standards implementations and metadata standards crosswalks, and inclusion of descriptors of archival material types beyond personal papers and manuscripts prompted the development of describing archives: a content standard (dacs), published in 2004 with the second edition published in 2007. “dacs [u.s. implementation of international standard for the description of archival materials and their creators] is an output-neutral set of rules for describing archives, personal papers, and manuscripts collections, and can be applied to all material types ”(pearce-moses, 2005). some international standards for describing archival materials are general international standard archival description isad(g) and international standard archival authority record for corporate bodies, persons, and families [isaar(cpf)]. other issues with creating and managing ead finding aids include (list not exhaustive): 1. online presentation of finding aids 2. exposing finding aids electronically for searching 3. provision of a search interface to search finding aids 4. online public access catalog record (marc) and link to finding aids 5. finding aids linked to digitized content of collections eads @ fsu exist in html for online presentation, pdf for printing, and xml for exporting, which allow researchers greater flexibility and options in the information-gathering and research processes and have improved the way archivists communicated guides to archival collections with researchers as opposed to paper finding aids physically housed within institutions. eads @ fsu have existed online in html, pdf, and xml formats for two years in a static html document and then moved to drupal (mysql database with php) for about one year, which improved online maintenance but not researcher functionality. however, the purchase and upgrade of a digital content management system marked a huge advancement in the development of our ead finding aids implementation and thus resolutions to issues numbers 1–3. researchers now have a single-point search interface to search eads @ fsu across all our digital collections/ institutional repository (see figure 3); the ability to search within the finding aids via full-text indexing of pdfs; the option of brief (thumbnails with ead, htm, pdf, and xml manifestation icons), table (title, creator, and identifier), and full (complete ead finding aid dc record with manifestations) views of search results, which provides different levels of exposures of ead finding aids; and the ability to save/e-mail search results. 
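stepping back to the encoding workflow described earlier (check each ead xml file for well-formedness and validity, then run it through ead2html.xsl, or ead2pdf.xsl plus apache fop for print), that validate-then-transform step can be approximated with a short script. the sketch below uses python and the lxml library; the stylesheet and file names echo the examples in this paper, but the dtd path and output name are placeholders, and this is an illustration rather than fsu's production process.

    # minimal sketch, assuming python + lxml; paths are placeholders,
    # not fsu's actual files.
    from lxml import etree

    def publish_ead(ead_path, dtd_path, xslt_path, out_path):
        # parsing raises XMLSyntaxError if the file is not well-formed,
        # standing in for the xml spy / oxygen check described above
        ead_doc = etree.parse(ead_path)

        # validate against the ead 2002 dtd
        dtd = etree.DTD(dtd_path)
        if not dtd.validate(ead_doc):
            raise ValueError(dtd.error_log.filter_from_errors())

        # apply the html style sheet; the pdf route would instead apply
        # ead2pdf.xsl and hand the resulting xsl-fo to apache fop
        transform = etree.XSLT(etree.parse(xslt_path))
        html_doc = transform(ead_doc)
        with open(out_path, "wb") as out:
            out.write(etree.tostring(html_doc, pretty_print=True))

    publish_ead("ftasu2003004.xml", "ead.dtd", "ead2html.xsl", "ftasu2003004.html")

run once per finding aid, a loop of this kind would regenerate the html manifestation for every ead xml file in an inventory; the pdf branch differs only in the stylesheet and in the final formatting step.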
future initiatives are underway to enhance eads @ fsu implementation via the creation of ead marc records through dublin core to marc metadata crosswalk, to deep link to ead finding aids via 856 field in marc records, and to begin digitizing and linking to ead finding aids archival content via digital archival object <dao> ead element. <dao> is “linking element that uses the attributes entityref or href to connect the finding aid information to electronic representations of the described materials. the <dao> and <daogrp> elements allow the content of an archival collection or record figure 3. online search gui for ead finding aids and digital collections within ir preparing locally encoded electronic finding aid inventories for union environments | smith 29 group to be incorporated in the finding aid” (loc, 2006b). we have opted to create basic dublin core records of ead finding aids based on the information in the ead finding aids descriptive summary (front matter) first and then crosswalk to marc, but are cognizant that this current workflow is subject to change in the pursuit of advancement. however, we are seeking ways to improve the ead workflow and ead marc record creation through more communication and future collaboration with the fsu libraries cataloging department. n number of finding aids and percent of eads @ fsu as of february 16, 2006, we had 700 collections with finding aids in which 220 finding aids are electronic and encoded in html (31 percent of total finding aids). from the 220 electronic finding aids, 60 are available as html, pdf, and xml finding aids (20 percent of electronic finding aids are eads @ fsu). however, we currently have 63 ead finding aids available online in html, pdf, and xml formats. n new developments in publishing eads @ fsu current eads @ fsu include the recommendations from test 1 and test 2 (rlg bpg and dacs compliance) which were discussed earlier and the digital content management system (i.e. digitool) creates a descriptive digital surrogate of the ead objects in the form of brief and basic dublin core metadata records for each ead finding aid along with multiple ead manifestations (see figure 4). we have successfully built and launched our first new digital collection, fsu special collections ead inventories, in digitool 3.0 as part of fsu libraries dlc digital repository (http://digitool3.lib.fsu.edu/r/), a relational database digital content management system (dcms). digitool has an oracle 9i relational database management system backend, searchable web-based gui, a default ead style sheet that allows full-text searching of eads, supports marc, dc, mets metadata standards, jpeg2000 (built in tools for images and thumbnails) as well as z39.50 and oai protocols which will enable resource discovery and exposing of eads @ fsu. you can visit fsu special collections ead finding aids inventories at http://digitool3.lib.fsu.edu/r/? func=collections-result&collection_id=1076. n national, international, and regional aggregation of finding aids initiatives rlg’s archivegrid (http://archivegrid.org/web/index. jsp) is an international, cross-institutional search constituting the aggregation of primary source archival materials of more than 2,500 research libraries, museums, and archives with a single-point interface to search archival collections from across research institutions. 
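returning briefly to the dublin core-to-marc crosswalk planned above, the idea can be pictured with a toy mapping. the python sketch below follows the general spirit of the library of congress dc-to-marc crosswalk (title to 245, summary to 520, an electronic location to 856 for the deep link to the finding aid); the exact tags, indicators, and subfields fsu would use are not stated in this paper, so the mapping and the sample record are illustrative assumptions only.

    # illustrative dublin core -> marc mapping; not fsu's workflow.
    DC_TO_MARC = {
        "title":       "245",  # title statement
        "creator":     "720",  # uncontrolled name
        "description": "520",  # summary note (scope and content)
        "subject":     "653",  # uncontrolled index term
        "identifier":  "856",  # electronic location: deep link to the finding aid
    }

    def crosswalk(dc_record):
        """flatten a simple dc dict into (marc tag, value) pairs."""
        fields = []
        for element, values in dc_record.items():
            tag = DC_TO_MARC.get(element)
            if tag is None:
                continue  # unmapped elements are skipped in this sketch
            for value in values:
                fields.append((tag, value))
        return sorted(fields)

    # a toy record drawn from an ead descriptive summary (values invented)
    print(crosswalk({
        "title": ["bernard f. sliger collection"],
        "creator": ["sliger, bernard f."],
        "identifier": ["http://digitool3.lib.fsu.edu/r/"],
    }))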
other international, cross-institutional searches of aggregated archival collections are: n intute: arts& humanities in the united kingdom www.intute.ac.uk/artsandhumanities/ cgi-bin/browse.pl?id=200025 (international guide to subcategories of archival materials) n archives made easy www.archivesmade easy.org (guide to archives by country) there are also some regional initiatives, which provide cross-institutional search of aggregations of finding aids: n publication of archival library and museum materials (palmm) http://palmm.fcla.edu (crossfigure 4. ead finding aids in ead (default), html, pdf, and xml manifestations 30 information technology and libraries | june 2008 institutional searches in fl fsu participates, fl) n virginia heritage: guides to manuscript and archival collections in virginia http://ead.lib .virginia.edu/vivaead/ (cross-institutional searches in virginia) n texas archival resources online www.lib.utexas. edu/taro/ (cross-institutional searches in texas) n online archive of new mexico http://elibrary .unm.edu/oanm/ (cross-institutional searches in new mexico) awareness of regional, national, and international aggregation of finding aids initiatives and engagement in regional aggregation of finding aids will enable a consistent advancement in the development and implementation of eads @ fsu. acknowledgments fsu libraries digital library center and special collections department, florida heritage project funding (fcla), chuck f. thomas (fcla), and robert mcdonald (sdsc) assisted in the development, implementation, and success of eads at fsu. references altman, b. & nemmers, j. (2006). manuscripts processing reference manual. florida state university special collections. california digital library (cdl). (2005). oac best practice guidelines for encoded archival description, appendix b. formal public identifiers for finding aids. retrieved october 6, 2006 from www.cdlib.org/inside/diglib/guidelines/bpgead/ bpgead_app.html#d0e2995. digital library center, florida state university libraries. (2006). fsu special collections ead finding aids inventories. retrieved january 5, 2007 from http://digitool3.lib.fsu.edu/ r/?func=collections-result&collection_id=1076. florida center of library automation (fcla). (2004). palmm: publication of archival library and museum materials, archival collections. retrieved january 7, 2007 from http://palmm.fcla .edu. florida center for library automation (fcla). (2006). best practice guidelines for the implementaton of ead version 2002 in florida institutions. (john nemmers, ed.). accessed april 21, 2008, at www.fcla.edu/dlini/openingarchives/new/ floridaeadguidelines.pdf fox, m. (2003). the ead cookbook — 2002 edition.chicago: the society of american archivists. retrieved october 6, 2006 from www.archivists.org/saagroups/ead/ead2002cookbook .html. hensen, s. l. (1999). nistf ii and ead: the evolution of archival description. encoded archival description: context, theory, and case studies (pp. 23–34). chicago: the society of american archivsits library of congress (loc). (2006a). development of the encoded archival description dtd. retrieved october 6, 2006 from www.loc.gov/ead/eaddev.html. library of congress (loc). (2006b). digital archival object— encoded archival description tag library—version 2002. retrieved january 8, 2007 from www.loc.gov/ead/tglib. library of congress (loc). (2006c). encoded archival description —version 2002 official site. etd dtd version 2002. retrieved april 19, 2008 from www.loc.gov/ead/ead2002a.html. 
meissner, d., kinney, g., lacy, m., nelson, n., proffitt, m., rinehart, r., ruddy, d., stockling, b., webb, m., & young, t. (2002). rlg best practices guidelines for encoded archival description (pp. 1-24). mountain view: rlg. retrieved january 5, 2007 from www.rlg.org/en/pdfs/bpg.pdf. national library of australia. (1999). use of encoded archival description (ead) for manuscript collection retrieved january 4, 2007 from www.nla.gov.au/initiatives/ead/eadintro .html. oclc. (2007). archivegrid—open the door to history. retrieved january 4, 2007 from http://archivegrid.org/web. oclc. (2008). ead report card. retrieved april 11, 2008 www.oclc.org/programs/ourwork/past/ead/reportcard .htm. pearce-moses, r. (2005). a glossary of archival and records terminology. chicago: society of american archivists. retrieved january 8, 2007 from www.archivists.org/glossary/index.asp. prom, c. j. & habing, t. g. (2002). using the open archives initiative protocols with ead . paper preserted at the international conference on digital libraries proceedings of the 2nd acm/ieee-cs joint conference on digital libraries. portland, oregan, usa, july 14-18, 2002. retrieved october 6, 2006 from http://portal.acm .org/citation.cfm?doid=544220.544255. reese, t. (2005). building lite-weight ead repositories,. paper presented in the international conference on digital libraries proceedings of the 5th acm/ieee-cs joint conference on digital libraries. new york: acm. retrieved january 5, 2007 from http://doi.acm.org/10.1145/1065385.1065498. special collections department, university of virginia. (2004). virginia heritage guides to manuscripts and archival collections in virginia. retrieved january 7, 2007 from http://ead.lib.virginia .edu/vivaead/. thomas, c., et al. (2006). best practices guidelines for the implementation of ead version 2002 in florida institutions. florida state university special collections. university of texas libraries, university of texas at austin. (unknown). texas archival resources online (taro). retrieved january 4, 2007 from www.lib.utexas.edu/taro. 
improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides
kate a. pittsley and sara memmott
kate a. pittsley (kpittsle@emich.edu) is an assistant professor and business information librarian and sara memmott (smemmott@emich.edu) is an instructor and emerging technologies librarian at eastern michigan university, ypsilanti, michigan.
information technology and libraries | september 2012

abstract

can the navigation of complex research websites be improved so that users more often find their way without intermediation or instruction? librarians at eastern michigan university discovered both anecdotally and by looking at patterns in usage statistics that some students were not recognizing navigational elements on web-based research guides, and so were not always accessing secondary pages of the guides. in this study, two types of navigation improvements were applied to separate sets of online guides. usage patterns from before and after the changes were analyzed. both sets of experimental guides showed an increase in use of secondary guide pages after the changes were applied whereas a comparison group with no navigation changes showed no significant change in usage patterns. in this case, both duplicate menu links and improvements to tab design appeared to improve independent student navigation of complex research sites.

introduction

anecdotal evidence led librarians at eastern michigan university (emu) to investigate possible navigation issues related to the libguides platform. anecdotal evidence included (1) incidents of emu librarians not immediately recognizing the tab navigation when looking at implementations of the libguides platform on other university sites during the initial purchase evaluation, (2) multiple encounters with students at the reference desk who did not notice the tab navigation, and (3) a specific case involving use of a guide with an online course. the case investigation started with a complaint from a professor that graduate students in her online course were suddenly using far fewer resources than students in the same course during previous semesters. the students in that semester's section relied heavily—often solely—on one database, while most students during previous semesters had used multiple research sources.
this course has always relied on a research guide prepared by the liaison librarian; the selection of resources provided had not changed significantly between the semesters, and the assignment had not changed. furthermore, the same professor taught the course and did not alter her recommendation to the students to use the resources on the research guide. what had changed between the semesters was the platform used to present research guides. the library had just migrated from a simple one-page format for research guides to the more flexible multipage format offered by the libguides platform. only a few resources were listed on the first libguides page of the guide used for the course. only one of these resources was a subscription database, and that database was the one that current students were using to the exclusion of many other useful sources. after speaking with the professor, the liaison librarian also worked one-on-one with a student in the course. the student confirmed that she had not noticed the tab navigation and so was unaware of the numerous resources offered on subsequent pages. the professor then sent a message to all students in the course explaining the tab navigation. subsequently the professor reported that students in the course used a much wider range of sources in assignments.

statistical evidence of the problem

a look at statistics on guide use for fall 2010 showed that on almost all guides the first pages of guides were the most heavily used. as the usual entry point, it wasn't surprising that the first pages would receive the most use; however, on many multipage guides, the difference in use between the first page and all secondary pages was dramatic. that users missed the tab navigation and so did not realize additional guide pages existed seemed like a possible explanation for this usage pattern. librarians felt strongly that most users should be able to navigate guides without direct instruction in their use, and they were concerned by the evidence that indicated problems with the guide navigation. was there something that could be done to improve independent student navigation in libguides? two types of design changes to navigation were considered. to test the changes, each navigation change was applied to separate sets of guides. usage patterns were then compared for those guides before and after changes were made. the investigators also looked at usage patterns over the same period for a comparison group to which no navigation changes had been made.

literature review

navigation in libguides and pathfinders

the authors reviewed numerous articles related to libguides or pathfinders generally, but found few that mention navigation issues. they then turned to studies of website navigation in general. in an early article on the transition to web-based library guides, cooper noted that "computer screens do not allow viewers to visualize as much information simultaneously as do print guides, and consequently the need for uncomplicated, easily understood design is even greater."1 four university libraries' usability studies of the libguides platform specifically address navigation issues.
university of michigan librarians dubicki et al. found that “tabs are recognizable and meaningful—users understood the function of the tabs.”2 the michigan study then focused on the use of meaningful language for tab labels. however, at the latrobe university library (australia), corbin and karasmanis found a consistent pattern of students not recognizing the navigation tabs, and so recommended providing additional navigation links elsewhere on the page.3 at the university of washington, hungerford et al. found students did not immediately recognize the tab navigation: information technology and libraries | se ptember 2012 54 during testing it was observed that users frequently did not notice a guide’s tabs right away as a navigational option. users’ eyes were drawn to the top middle of the page first and would focus on content there, especially if there was actionable content, such as links to other pages or resources.4 the solution at the university of washington was to require that all guides have a main page navigation area (libguides “box”) with a menu of links to the tabbed pages. after a usability study, mit libraries also recommended use of a duplicate navigation menu on the first page, stating in mit libraries staff guidelines for creating libguides to “make sure to link to the tabs somewhere on the main page” as “users don’t always see the tabs, so providing alternate navigation helps.”5 navigation palmer mentions navigation as one of the factors most significantly associated with website success as measured by user satisfaction, likelihood to use a site again, and use frequency.6 however, effective navigation may be difficult to achieve. nielsen found in numerous studies that “users look straight at the content and ignore the navigation areas when they scan a new page.”7 in a presentation on the top ten mistakes in web design, human–computer interaction scholar tullis included “awkward or confusing navigation.”8 the following review of the literature on website navigation design is limited to studies of navigation models that use browsing via menus, tabs, and menu bars. the navigation problem seen in libguides is far from unique. usability studies for other information-rich websites demonstrate similar problems with users not recognizing navigation tabs or menu bars similar to those used in libguides. in 2001, mcgillis and toms investigated the usability of a library website with a horizontal navigation bar at the top of the page, a design similar to the single row of libguides tabs. this study found that users either did not see the navigation bar or did not realize it could be clicked.9 in multiple usability studies, u.s. census bureau researchers found similar problems with navigation bars on government websites. in 2009, olmsted-hawala et al. reported that study participants did not use the top-navigation bar on the census bureau’s business and industry website.10 the next year, chen et al. again reported problems with top-navigation bar use on the governments division public website, explaining that the “top-navigation bar blends into the header, leading participants to skip over the tabs and move directly to the main content. this is a recurring issue the usability laboratory has identified with many web sites.”11 one possible explanation for user neglect of tabs and navigation bars may be a phenomenon termed “banner blindness.” as early as 1999, benway provided in-depth analysis of this problem. 
in his thesis, he uses the word “banner” not just for banner ads, but also for banners that consist of horizontal graphic buttons similar to the libguides tab design. benway’s experiments show that an attempt to make important items visually prominent may have the opposite effect— that “the visual distinctiveness may actually make important items seem unimportant.” benway follows with two recommendations: (1) that “any method that is created to make something stand out should be carefully tested with users who are specifically looking for that content to ensure that it does not cause banner blindness,” and (2) that “any item visually distinguished on a page should be duplicated within a collection of links or other navigation areas of the page. that way, if searchers ignore the large salient item, they can still find what they need through basic navigation.”12 improving independent student navigation of complex educational websites | pittsley and memmott 55 in 2005, tullis cited multiple studies that showed that users found information faster or more effectively by using a simple table of contents than by using other navigation forms, including tabbased navigation.13 yet in 2011, nicolson et al. found that “participants rarely used table of contents; and often appeared not to notice them.”14 yelinek et al. pointed to a practical problem in using content menus on libguides pages: since libguides pages can be copied or mirrored on other guides, guide authors must be cognizant that such menus could cause problems with incorrect or confusing navigational links on copied or mirrored pages.15 success can also depend on the location of navigational elements, although researchers disagree on effects of location. in addition, user expectations of where to look for navigation elements may change over time along with changes in web conventions. in 2001, bernard studied user expectations as to where common web functions would be located on the screen layout. he found that “most participants expected the links to web pages within a website to be almost exclusively located in the upper-left side of a web page, which conforms to the current convention of placing links on [the] left side.”16 in 2004, pratt et al. found that users were equally effective using horizontal or vertical navigation menus, but when given a choice more users chose to use vertical navigation.17 also in 2004, mccarthy et al. 
performed an eye-tracking study, which showed faster search times when sites conformed to the expected left navigation menu and a user bias toward searching the middle of the screen; but it also found that the initial effect of menu position diminished with repeated use of a site.18 nonetheless, jones found that by 2006 most corporate webpages used “horizontally aligned primary navigation using buttons, tabs, or other formatted text.”19 in 2008, cooke found that users looked equally at left, top, and center menus; however, when “a visually prominent navigation menu populated the center of the web page, participants were more likely to direct their search in this location.”20 wroblewski describes how tab navigation was first popularized by amazon.21 burrell and sodan investigated user preferences for six navigation styles and found that users clearly preferred tab navigation “because it is most easily understood and learned.”22 in the often-cited web design manual don’t make me think, krug also recommends tabs: “tabs are one of the very few cases where using a physical metaphor in a user interface actually works.”23 krug recommends that tabs be carefully designed to resemble file folder tabs. they should “create the visual illusion that the active tab is in front of the other tabs . . . the active tab needs to be a different color or contrasting shade [than the other tabs] and it has to physically connect with the space below it. this is what makes the active tab ‘pop’ to the front.”24 an often-cited u.s. department of health and human services manual on research-based web design addresses principles of good tab design, stating that tabs should be located near the top of the page and should “look like clickable versions of real-world tabs. real-world tabs are those that resemble the ones found in a file drawer.”25 nielsen provides similar guidelines for tab design, which include that the selected tab should be highlighted, the current tab should be connected to the content area (just like a physical tab), and that one should use only one row of tabs.26 more recently, cronin highlighted examples of good tab design that effectively use elements such as rounded tab corners, space between tabs, and an obvious design for the active tab that visually connects the tab to the area beneath it.27 christie also provides best practices for tab design that include consistent use of only one row of tabs, use of a prominent color for the active tab and a single information technology and libraries | se ptember 2012 56 background color for unselected tabs, changing the font color on the active tab, and use of rounded corners to enhance the file-folder-tab metaphor.28 two articles mention that the complexity of a site can be a factor in navigation success. mccarthy et al. found that search times are significantly affected by site complexity and recommended finding ways to balance the provision of numerous user options with simplifying the site so that users can find their way.29 little specifically suggests reducing the amount of extraneous information on libguides pages in her article, which applies cognitive load theory to use of library research guides.30 in sum, effective navigation is difficult to achieve. however, navigation design can be improved by considering the purpose of the site, user expectations, common conventions, best practices, the possibility that intuitive ideas for design may not perform as expected (e.g., banner blindness), the site’s complexity, and more. 
research question and method

could design changes improve independent student use of libguides tab navigation? the literature reviewed above suggested two likely design changes to test: adding additional navigation links in the body of the page and improving the tab design. testing these design changes on selected guides would allow the emu library to assess the impact before implementing changes on all library research guides. for this experiment, each type of navigation change was applied to separate subsets of guides; a subset of similar guides was selected as a comparison group; and usage patterns were analyzed for similar periods before and after changes were made. navigation design changes were made to fourteen subject guides related to business. the business subject guides were divided into two experimental groups of seven guides. in group a, a table of contents box with navigation links was added to the front page of each guide, and in group b, the navigation tabs were altered in appearance. no navigation changes were made to comparison group c. class-specific guides were excluded from the experiment, as in many cases the business librarian would have instructed students in the use of tabs on class guides. changes were made at the beginning of the winter 2011 semester so that an entire semester's data could be collected and compared to the previous semester's usage patterns.

the design for group a was similar to the university of washington implementation of a "what's in the guide" box on guide homepages that repeated the tab navigation links.31 for guides in group a, a table of contents box was placed on the guide homepages. it contained a simple list of links to the secondary pages of the guides, using the same labels as on the navigation tabs. the table of contents box used a larger font size than other body text and was given an outline color that contrasted with the outline color used on other boxes and matched the navigation tab color to create visual cues that this box had a different function from the other boxes on the page (navigation). the table of contents box was placed alongside other content on the guide homepages so users could still see the most relevant resources immediately. figure 1 shows a guide containing a table of contents box.

figure 1. group a guide with content menu box labeled "guide sections"

the design change for group b focused on the navigation tabs. libguides tabs exhibit some of the properties of good tab design, such as allowing for rounded corners and contrasting colors for the selected tabs. other aspects are not ideal, such as the line that separates the active tab from the page body.32 in the emu library's initial libguides implementation, the option for tabs with rounded corners was used to resemble the design of manila file folders and increase the association with the file-folder metaphor. possibilities for further design adaptation on the experimental guides were somewhat limited because these changes needed to be applied to the tabs of just a selected set of guides. the investigators theorized that increasing the height of the tabs might make them more closely resemble paper file folder tabs. increasing the height would also increase the area of the tabs, and the larger size might also make the tabs more noticeable. this option was simple to implement on the guides in group b by adding html break tags, <br />, to the tab text.
taller tabs also provided more room for text on the tabs. tabs in libguides will expand in width to fit the text label used, and if the tabs on a guide require more space on the page, they will be displayed in multiple rows. multiple rows of tabs are visually confusing and break the tabs metaphor, decreasing their usefulness for navigation.33 the emu library’s best practices for research guides already encouraged limiting tabs to one row. adding height to tabs allowed for clearer text labels on some guides without expanding the tab display beyond a single row. figure 2 shows a guide containing the altered taller tabs. information technology and libraries | se ptember 2012 58 figure 2. group b guide with tabs redesigned to look more like file folder tabs while variations in content and usage of library guides did not allow for a true control group, other social science subject guides were selected as a comparison group. social science subject guides were excluded from the comparison group if they had very low guide usage during the fall 2010 semester (fewer than thirty uses), or if they had fewer than three tabs, making them structurally dissimilar to the business guides. this left a group of sixteen comparison guides. no changes were made to the navigation design of these guides during the test period. the business guides—which the authors had permission to experiment with—tend to be longer and have more pages than other guides. on average, the experimental guides had more pages per guide than the comparison guides; guides in groups a and b averaged nine pages per guide, and comparison guides averaged five pages per guide. guides with more pages will tend to have a higher percentage of hits on secondary pages because there are more pages available to users. however, the authors intended to measure the change in usage patterns with each guide measured against itself in different periods, and the number of pages in each guide did not change from semester to semester. data collection and results libguides provides monthly usage statistics that include the total hits on each guide and the number of hits on each page of a guide. use of secondary pages of the guides was measured by calculating the proportion of hits to each guide that occurred on secondary pages. data for the fall 2010 semester (september through december 2010) was used to measure usage patterns before navigation changes were made to the experimental guides. data for the winter 2011 semester (january through april 2011) was used to measure usage patterns after navigation changes were made. each would represent a full semester’s use at similar enrollment levels with many of the same courses and assignments. usage patterns for the comparison guides were also examined for these periods. improving independent student navigation of complex educational websites | pittsley and memmott 59 as shown in figures 3 and 4, in both group a and group b, the percentage of hits on secondary pages increased in five guides and decreased in two guides. figure 3. group a: change in secondary page usage with content menus added for winter 2011 figure 4. group b: change in secondary page usage with new tab design for winter 2011 both groups of experimental guides showed an increase in use of secondary guide pages after the design changes were made. the median usage score was calculated for each group. group a, with the added menu links, showed an increase of 10.3 points in the median percentage of guide hits on secondary pages. 
group b, with redesigned tabs, showed an increase of 10.4 points in the median percentage of guide hits on secondary pages. within the comparison guides, the proportion of hits on secondary pages did not change significantly from fall 2010 to winter 2011. table 1 shows the median percentage of guide hits on secondary pages before and after navigation design changes.

table 1. median percentage of guide hits on secondary pages

              group a:           group b:          group c:
              menu links added   tabs redesigned   comparison group
fall 2010     39.1%              50.5%             37.7%
winter 2011   49.4%              60.9%             37.4%

the box plot in figure 5 graphically illustrates the range of the usage of secondary pages in each group of guides and the changes from fall 2010 to winter 2011, showing the minimum, maximum, and median scores, as well as the range of each quartile.

figure 5. distribution of percentage of guide hits on secondary pages. this figure demonstrates the change in usage pattern for groups a and b and the lack of change in usage pattern for comparison group c.

averages for the percentage change in secondary tab use were also computed for the combined experimental groups and the comparison group.

table 2. average change in secondary tab use from fall 2010 to winter 2011, comparing all experimental guides (groups a & b) with all comparison (group c) guides.

                n    mean      std. deviation   std. error mean
experimental    14   .07871    .097840          .026149
comparison      16   -.02550   .145977          .036494

when comparing all experimental guides and all comparison guides, the change in use of secondary pages was found to be statistically significant. the average change in use of secondary pages for all experimental guides (groups a and b) was .07871, and the average for all comparison guides (group c) was -.02550. a t test showed that this difference was significant at the p < .05 level (p = .032).

study limitations

in some (possibly many) cases, the first page of the guide provides all necessary sources and advice for an assignment. we measured actual use of secondary pages, but were unable to measure recognition of navigation elements where the student did not use the secondary pages because they had no need for additional resources. because it wasn't possible to control use of the guides during the periods studied, it is possible that factors other than the design changes contributed to the pattern of hits. though subject guides rather than class guides were used to limit the influence of instruction in the use of guides, it wasn't possible to determine with certainty if other faculty members instructed a significant number of students in the use of particular guides during the periods examined. the comparison group was slightly dissimilar in that they had fewer pages than the experimental guides; however, the number of pages on a guide did not correlate with a change in percentage of hits on secondary pages from one semester to the next.

application of findings

when presented with the study results, the full library faculty at emu expressed interest in using both design changes across all library research guides.
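the quantities reported above, the per-guide share of hits on secondary pages, the group medians, and the t test summarized in table 2, are simple to compute. the sketch below is a minimal illustration using invented hit counts rather than the study's data, and it uses scipy's independent-samples t test as one way to make the group comparison; it is not the authors' actual workflow. with the study's fourteen experimental and sixteen comparison values, the same call would reproduce the means and the p value of .032 reported above.

```python
from statistics import median
from scipy import stats

# hypothetical libguides statistics: for each guide, (total hits,
# hits on secondary pages) summed over a semester; the real study
# had 14 experimental and 16 comparison guides
fall = {"marketing": (480, 180), "accounting": (350, 175), "finance": (220, 70)}
winter = {"marketing": (510, 260), "accounting": (340, 200), "finance": (250, 95)}

def secondary_share(total_hits, secondary_hits):
    """proportion of a guide's hits that landed on secondary pages."""
    return secondary_hits / total_hits

changes = []
for guide in fall:
    before = secondary_share(*fall[guide])
    after = secondary_share(*winter[guide])
    changes.append(after - before)
    print(f"{guide}: {before:.1%} -> {after:.1%}")

print(f"median share, fall 2010: {median(secondary_share(*v) for v in fall.values()):.1%}")

# table 2 compares the mean per-guide change for experimental guides
# against comparison guides with an independent-samples t test
comparison_changes = [0.01, -0.03, 0.02, -0.04, 0.00, -0.02]  # invented group c values
t_stat, p_value = stats.ttest_ind(changes, comparison_changes)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```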
the change to tab design—which is easiest to implement—has been made to all subject guides. some librarians also chose to add content menus to selected guides. since the complexity of research guides is also a factor in successful navigation,35 a recent libguides enhancement was used to move elements from the header area to the bottom of the guides. the elements moved out of the header included the date of last update, guide url, print option, and rss updates. the investigators hypothesize that the reduced complexity of the header may help in recognizing the tab navigation. although convinced that the experimental changes made a difference to independent student navigation in research guides, the authors hope to find further ways to strengthen independent navigation. vendor design changes to enhance the tab metaphor, such as creating a more visible connection between the active tab and page, might also improve navigation.36 information technology and libraries | se ptember 2012 62 conclusion designing navigation for complex sites, such as library research guides, is likely to be an ongoing challenge. this study suggests that thoughtful design changes can improve navigation. in this case, both duplicate menu links and improvements to tab design improved independent student navigation of complex research sites. references and notes 1. eric a. cooper, “library guides on the web: traditional tenets and internal issues,” computers in libraries 17, no. 9 (1997): 52. 2. barbara dubicki beaton et al., libguides usability task force guerrilla testing (ann arbor: university of michigan, 2009), http://www.lib.umich.edu/content/libguides-guerillatesting. 3. jenny corbin and sharon karasmanis, health sciences information literacy modules usability testing report (bundoora, australia: la trobe university library, 2009), http://arrow.latrobe.edu.au:8080/vital/access/handleresolver/1959.9/80852. 4. rachel hungerford, lauren ray, christine tawatao, and jennifer ward, libguides usability testing: customizing a product to work for your users (seattle: university of washington libraries, 2010), 6, http://hdl.handle.net/1773/17101. 5. mit libraries, research guides (libguides) usability results (cambridge, ma: mit libraries, 2008), http://libstaff.mit.edu/usability/2008/libguides-summary.html; mit libraries, guidelines for staff libguides (cambridge, ma: mit libraries, 2011), http://libguides.mit.edu/staff-guidelines. 6. jonathan w. palmer, “web site usability, design, and performance metrics,” information systems research 13, no. 2 (2002): 151-67, doi:10.1287/isre.13.2.151.88. 7. jakob nielsen, “is navigation useful?,” jakob nielsen’s alertbox, http://www.useit.com/alertbox/20000109.html. 8. thomas s. tullis, “web-based presentation of information: the top ten mistakes and why they are mistakes,” in hci international 2005 conference: 11th international conference on human-computer interaction, 22–27, july 2005, caesars palace, las vegas, nevada usa (mahwah nj: lawrence erlbaum associates, 2005), doi:10.1.1.107.9769. 9. louise mcgillis and elaine g. toms, “usability of the academic library web site: implications for design,” college & research libraries 62, no. 4 (2001): 355–67, http://crl.acrl.org/content/62/4/355.short. 10. erica olmsted-hawala et al., usability evaluation of the business and industry web site, survey methodology #2009–15, (washington, dc: statistical research division, u.s. census bureau, 2009), http://www.census.gov/srd/papers/pdf/ssm2009–15.pdf. 11. 
jennifer chen et al., usability evaluation of the governments division public web site, survey methodology #2010–02 (washington, dc: u.s. census bureau, usability laboratory, 2010), 19, http://www.census.gov/srd/papers/pdf/ssm2010-02.pdf. 12. jan panero benway, "banner blindness: what searching users notice and do not notice on the world wide web" (phd diss., rice university, 1999), 75, http://hdl.handle.net/1911/19353. 13. tullis, "web-based presentation of information." 14. donald j. nicolson et al., "combining concurrent and sequential methods to examine the usability and readability of websites with information about medicines," journal of mixed methods research 5, no. 1 (2011): 25–51, doi:10.1177/1558689810385694. 15. kathryn yelinek et al., "using libguides for an information literacy tutorial 2.0," college & research libraries news 71, no. 7 (july 2010): 352–55, http://crln.acrl.org/content/71/7/352.short. 16. michael l. bernard, "developing schemas for the location of common web objects," proceedings of the human factors and ergonomics society annual meeting 45, no. 15 (october 1, 2001): 1162, doi:10.1177/154193120104501502. 17. jean a. pratt, robert j. mills, and yongseog kim, "the effects of navigational orientation and user experience on user task efficiency and frustration levels," journal of computer information systems 44, no. 4 (2004): 93–100. 18. john d. mccarthy, m. angela sasse, and jens riegelsberger, "could i have the menu please? an eye tracking study of design conventions," people and computers 17, no. 1 (2004): 401–14. 19. scott l. jones, "evolution of corporate homepages: 1996 to 2006," journal of business communication 44, no. 3 (2007): 236–57, doi:10.1177/0021943607301348. 20. lynne cooke, "how do users search web home pages?" technical communication 55, no. 2 (2008): 185. 21. luke wroblewski, "the history of amazon's tab navigation," lukew ideation + design, may 7, 2007, http://www.lukew.com/ff/entry.asp?178. after addition of numerous product categories made tabs impractical, amazon now relies on a left-side navigation menu. 22. a. burrell and a. c. sodan, "web interface navigation design: which style of navigation-link menus do users prefer?" in 22nd international conference on data engineering workshops, april 2006. proceedings (washington, d.c.: ieee computer society, 2006), 42–42, doi:10.1109/icdew.2006.163. 23. steve krug, don't make me think! a common sense approach to web usability, 2nd ed. (berkeley: new riders, 2006), 79. 24. ibid., 82. 25. u.s. department of health and human services, "navigation," in research-based web design & usability guidelines (washington, dc: u.s. department of health and human services, 2006), 8, http://www.usability.gov/pdfs/chapter7.pdf. 26. jakob nielsen, "tabs, used right," jakob nielsen's alertbox, http://www.useit.com/alertbox/tabs.html. 27.
matt cronin, "showcase of well-designed tabbed navigation," smashing magazine, april 6, 2009, http://www.smashingmagazine.com/2009/04/06/showcase-of-well-designed-tabbed-navigation. 28. alex christie, "usability best practice, part 1—tab navigation," tamar, january 13, 2010, http://blog.tamar.com/2010/01/usability-best-practice-part-1-tab-navigation. 29. mccarthy, sasse, and riegelsberger, "could i have the menu please?" 30. jennifer j. little, "cognitive load theory and library research guides," internet reference services quarterly 15, no. 1 (2010): 52–63, doi:10.1080/10875300903530199. 31. hungerford et al., libguides usability testing. 32. christie, "usability best practice"; nielsen, "tabs, used right"; krug, don't make me think; cronin, "showcase of well-designed tabbed navigation." 33. christie, "usability best practice"; nielsen, "tabs, used right." 34. eva d. vaughan, statistics: tools for understanding data in the behavioral sciences (upper saddle river, nj: prentice hall, 1998), 66. 35. mccarthy, sasse, and riegelsberger, "could i have the menu please?" 36. springshare, the libguides vendor, has been amenable to customer feedback and open to suggestions for platform improvements.

tutorial

playing tag in the dark: diagnosing slowness in library response time

margaret brown-sica

in this article the author explores how the systems department at the auraria library (which serves more than thirty thousand primarily commuting students at the university of colorado–denver, the metropolitan state college of denver, and the community college of denver) diagnosed and analyzed slow response time when querying proprietary databases. issues examined include vendor issues, proxy issues, library network hardware, and bandwidth and network traffic.

"why is everything so slow?" this is the question that library systems departments often have the most trouble answering. it is also easy to dismiss because it is often the fault of factors beyond the control of library staff. what usually prompts these questions are the experiences of the reference librarians. when these librarians are trying to help students at the reference desk, it is very frustrating when databases seem to respond to queries slowly, files take forever to load onto the computer screen, and all the while the line in front of the desk continues to grow. or the library gets calls from students using databases and the catalog from their homes who complain that searching library resources takes too long, and that they are getting frustrated and using google instead. this question is so painful because libraries spend so much of their shrinking budgets on high-quality information in the form of expensive proprietary databases, and it is all wasted if users have trouble using them. in this case the problem seemed to be how slow the process of searching for information and downloading documents from databases was. for lack of a better term, the auraria library called this the "response time" problem.
this article will discuss the various ways the systems (technology) department of the auraria library, which serves the university of colorado–denver, metropolitan state college of denver, and the community college of denver, tried to identify problems and improve database response time. the systems department defined “response time” as the time it took for a person to send a query from a computer at home or in the library to a proprietary information database and receive a response back, or how long it took to load a selected fulltext article from a database. when a customer sets out to use a database in the library, the query to the database could be slowed down by many different factors. the first is the proxy, in our case innovative interfaces’ inc. web access management (iii wam), a product that authenticates the user via the iii api (application program interface) product. to do this the query travels over network hardware, switches, and wires to the iii server and back again. then the query goes to the database’s server, which may be almost anywhere in the world. hardware problems at the database vendor’s end can affect this transfer. in the case of auraria library this transfer can be influenced by traffic on the library’s network, the university’s network, and any other place in between. this could also be hampered by the amount of memory in the computer where the query originates, by the amount of tasks being performed by that computer, etc. the bandwidth of the network and its speed can also have an effect. basically, the bottlenecks needed to be found and fixed. bottlenecks are described by webopedia as “the delay in transmission of data through the circuits of a computer’s microprocessor or over a tcp/ip network. the delay typically occurs when a system’s bandwidth cannot support the amount of information being relayed at the speed it is being processed. there are, however, many factors that can create a bottleneck in a system.”1 literature review there is not a lot on database response slowness in library literature, probably because the issue overlaps with computer science and really is not one problem but a possibility of one of several problems. the issue is figuring out where the problem lies. gerhan and mutula examined technical reasons for network slowness, performing bandwidth testing at a library in botswana and one in the united states using the same computer, and giving several suggestions for testing, fixing technical problems, and issues to examine. gerhan and mutula concluded that bandwidth and insufficient network infrastructure were the main culprits in their situation. they studied both bandwidth and bandwidth “squeeze.” looking for the bandwidth “squeeze” means looking along the internet’s “journey of many stages through routers and exchange points, each successively farther removed from the user.”2 bandwidth bottlenecks could occur at any one or more of those stages in the query’s transmission. the following four sections parse that lengthy pathway and examine how each may contribute to delays. badue et al. in their article “basic issues on the processing of web queries,” described web margaret brown-sica (margaret.brown -sica@ucdenver.edu) is head of technology and distance education support, auraria library, serving the university of colorado–denver, metropolitan state college of denver, and the community college of denver. 
30 information technology and libraries | december 200830 information technology and libraries | december 2008 queries, load balancing, and how they function.3 bertot and mcclure’s “assessing sufficiency and quality of bandwidth for public libraries” is based on data collected as part of the 2006 public libraries and the internet study and provides a very straightforward approach for checking specific areas for problems.4 it outlines why basic data such as bandwidth readings may not give the complete picture. it also gives a nice outline of factors involved such as local settings and parameters, ultimate connectivity path, application resource needs, and protocol priority. azuma, okamoto, hasegawa, and masayuki’s “design, implementation and evaluation of resource management system for internet servers” was very helpful in understanding the role and function of proxy servers and problems they can present.5 vendor issues this is a very thorny topic because it is out of the library’s control, and also because the library has so many databases. the systems department asked the reference staff to send reports of problems listing the type of activity attempted, time and dates, the names of the database, the problem and any error messages encountered. a few that seemed to be the slowest were selected for special examination. one vendor worked extensively with the library and in the end it was believed that there were problems at their end in load balancing, which eventually seemed to be fixed. that company was in the middle of a merger and that may have also been an issue. we also noted that a database that uses very large image files, artstor, was hard to use because it was so slow. this company sent the library an application that simulated the databases’ use and was supposed to test to see if bandwidth at auraria library was sufficient for that database. according to the test, it was. databases that consistently were perceived as the slowest were those that had the largest documents and pictures, such as those that used primarily pdfs and visual material. this, with the results of the testing, pointed to a problem independent of vendor issues. bandwidth and network traffic the systems department decided to do bandwidth testing on the library’s public and staff computers after reading gerhan and mutula’s article about the university of botswana. the general perception is that bandwidth is often the primary problem in network slowness, as well as the problems with databases that use larger files. several of the computers were tested in several successive days during what is usually the busiest time for the network, between noon and 2 p.m. the results were good, averaging about 3000 kilobytes per second (kbps). for this test we used the cnet bandwidth meter, which downloads an image to your computer, measures the time of the download, and compares it to the maximum speeds offered by other internet service providers.6 there are several bandwidth meters available on the internet. when the network administrator checked the switches for network traffic, they showed low traffic, almost always less than 20 percent of capacity. this was confusing: if the problem was neither with the bandwidth nor the vendors, what was causing the slow network performance? one of the university network administrators was consulted to see if any factor in their sphere could be having an effect on our network. we knew that the main university network had implemented a bandwidth shaper to regulate bandwidth. 
“these devices limit bandwidth . . . by greedy applications, guarantee minimum throughput for users, groups or protocols, and better utilize wide-area connections by smoothing out bursty traffic.”7 it was thought that perhaps this might be incorrectly prioritizing some of the library’s traffic. this was a dead end, though—the network administrators had stopped using the device. if the bandwidth was good and the traffic was manageable, then the problem appeared to not be at the library. however, according to bertot and mcclure,

the bandwidth question is complex because typically an arbitrary number describes the number of kbps used to define “broadband.” . . . such arbitrary definitions to describe bandwidth sufficiency are generally not useful. the federal communications commission (fcc), for example, uses the term “high speed” for connections of 200kbps in at least one direction. there are three problematic issues with this definition:

1. it specifies unidirectional bandwidth, meaning that a 200kbps download, but a much slower upload (e.g., 56kbps) would fit this definition;
2. regardless of direction, bandwidth of 200kbps is neither high speed nor does it allow for a range of internet-based applications and services. this inadequacy will increase significantly as internet-based applications continue to demand more bandwidth to operate properly.
3. the definition is in the context of broadband to the single user or household, and does not take into consideration the demands of a high-use multiple-workstation public-access context.8

proxy issues

auraria library uses the iii wam proxy server product. there were several things that pointed to the proxy being an issue. one was that the systems department had been experimenting with invoking the proxy in the library building in order to collect more accurate statistics and found that complaints about speed seemed to have started around the same time as this experiment. but if the bandwidth was not showing inadequacy and the traffic was light, why was this happening? the answer is better explained by azuma et al.:

needless to say, busy web servers must have many simultaneous http sessions, and server throughput is degraded when effective resource management is not considered, even with large network capacity. web proxy servers must also accommodate a large number of tcp connections, since they are usually prepared by isps (internet service providers) for their customers. furthermore, proxy servers must handle both upward tcp connections (from proxy server to web servers) and downward tcp connections (from client hosts to proxy server). hence, the proxy server becomes a likely spot for bottlenecks to occur during web document transfers, even when the bandwidth of the network and web server performance are adequate.9

testing was done from on campus and off campus, with and without using the proxy server. the results showed that the connection was faster without the proxy. when testing was done from the health sciences library at the university of colorado with the same type of server and proxy, the response time was much faster.
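the on-campus and off-campus tests amount to timing the same request with and without the proxy in the path. the sketch below shows one way such a measurement might be scripted; the urls are placeholders, and because iii wam is a rewriting proxy the proxied request is made here by prefixing a hypothetical proxy host to the database url rather than by configuring an http proxy.

```python
import time
import urllib.request

# placeholder urls: the same database request, reached directly and
# through the library's rewriting proxy (hypothetical hosts)
DIRECT_URL = "http://vendor.example.com/search?q=test"
PROXIED_URL = "http://wam.library.example.edu/login?url=" + DIRECT_URL

def average_fetch_time(url, tries=5):
    """average seconds to fetch a url, a rough stand-in for response time."""
    timings = []
    body = b""
    for _ in range(tries):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            body = response.read()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings), len(body)

direct_time, size = average_fetch_time(DIRECT_URL)
proxied_time, _ = average_fetch_time(PROXIED_URL)

# a crude throughput estimate in kilobits per second, in the spirit of
# the bandwidth meter used above
print(f"direct:  {direct_time:.2f} s  (~{size * 8 / direct_time / 1000:.0f} kbps)")
print(f"proxied: {proxied_time:.2f} s")
```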
the difference between auraria library and the other library is that the community that auraria library serves (the community college of denver, metropolitan state college, and the university of colorado–denver) has a much larger user population who overwhelmingly use databases from home, therefore taxing the proxy server. the other library belonged to a smaller campus, but the hardware was the same. the proxy was immediately dropped for on-campus users, and that resulted in some response-time improvements. a conference call was set up with the proxy vendor to determine if improvements in response time might be attained by changing from a proxy server to ldap (lightweight directory access protocol) authentication. the response given was that although there might be other benefits, improved response time was not one of them.

library network hardware

it was evident that the biggest bottleneck was the proxy, so the systems department decided to take a closer look at iii's hardware. the switch that regulated traffic between the network and the server that houses our integrated library system, part of which is the proxy server, was discovered to have been set at "half-duplex." half-duplex refers to the transmission of data in just one direction at a time. for example, a walkie-talkie is a half-duplex device because only one party can talk at a time. in contrast, a telephone is a full-duplex device because both parties can talk simultaneously. duplex modes often are used in reference to network data transmissions. some modems contain a switch that lets you select between half-duplex and full-duplex modes. the correct choice depends on which program you are using to transmit data through the modem.10 when this setting was changed to full duplex, response time improved. there was also concern that this switch had not been functioning as well as it could. the switch was replaced, and this also improved response time. in addition, the old server purchased through iii was a generic server that had specifications based on the demands of the ils software and didn't take into consideration the amount of traffic going to the proxy server. auraria library, which serves a campus of more than thirty thousand full-time equivalent students, is a library with one of the largest commuter student populations in the country. a new server had been scheduled to be purchased in the near future, so a call was made to the ils vendor to talk about our hypothesis and requirements. the vendor agreed that the library should change the specification on the new server to make sure it served the library's unique demands. a server will be purchased with increased memory and a second processor to hopefully keep these problems from happening again in the next few years. also, the cabling between the switch and the server was changed to better facilitate heavy traffic.

conclusion

although it is sometimes a daunting task to try to discover where problems occur in the library's database response time because there are so many contributing factors and because librarians often do not feel that they have enough technical knowledge to analyze such problems, there are certain things that can be examined and analyzed. it is important to look at how each library is unique and may be inadequately served by current bandwidth and hardware configurations. it is also important not to be intimidated by computer science literature and to trust patterns of reported problems.
the auraria library systems department was fortunate to also be able to compare problems with colleagues at other libraries and test in those libraries, which revealed issues that were unique and therefore most likely due to a problem at the library end. it is important to keep learning about how your system functions and to try to diagnose the problem by slowly looking at one piece at a time. though no one ever seems to be completely satisfied with the speed of their network, the employees of auraria library, especially those who work with the public, have been pleased with the increased speed they are experiencing when using proprietary databases. having improved on the response-time issue, other problems that are not caused by the proxy hardware have been illuminated, such as browser configuration, which may be hampering certain databases—something that had been attributed to the network.

references

1. webopedia, s.v. "bottleneck," www.webopedia.com/term/b/bottleneck.html (accessed oct. 8, 2008). 2. david r. gerhan and stephen mutula, "bandwidth bottlenecks at the university of botswana," library hi tech 23, no. 1 (2005): 102–17. 3. claudine badue et al., "basic issues on the processing of web queries," sigir forum; 2005 proceedings (new york: association for computing machinery, 2005): 577–78. 4. john carlo bertot and charles r. mcclure, "assessing sufficiency and quality of bandwidth for public libraries," information technology and libraries 26, no. 1 (mar. 2007): 14–22. 5. kazuhiro azuma, takuya okamoto, go hasegawa, and murata masayuki, "design, implementation and evaluation of resource management system for internet servers," journal of high speed networks 14, no. 4 (2005): 301–16. 6. "cnet bandwidth meter," http://reviews.cnet.com/internet-speed-test (accessed oct. 8, 2008). 7. michael j. demaria, "warding off wan gridlock," network computing nov. 15, 2002, www.networkcomputing.com/showitem.jhtml?docid=1324f3 (accessed oct. 8, 2008). 8. bertot and mcclure, "assessing sufficiency and quality of bandwidth for public libraries," 14. 9. azuma, okamoto, hasegawa, and masayuki, "design, implementation and evaluation of resource management system for internet servers," 302. 10. webopedia, s.v. "half-duplex," www.webopedia.com/term/h/half_duplex.html (accessed oct. 8, 2008).

graphs in libraries: a primer

james e. powell, daniel alcazar, matthew hopkins, robert olendorf, tamara m. mcmahon, amber wu, and linn collins

answer routine searches is compelling. how, we wonder, can we bring a bit of google to the library world? google harvests vast quantities of data from the web. this data aggregation is obviously complex. how does google make sense of it all so that it can offer searchers the most relevant results? answering this question requires understanding what google is doing, which requires a working knowledge of graph theory. we can then apply these lessons to library systems, make sense of voluminous bibliometric data, and give researchers tools that are as effective for them as google is for web surfers. just as web surfers want to know which sites are most relevant, researchers want to know which of the relevant results are the most reliable, the most influential, and of the highest quality. can quantitative metrics help answer these qualitative questions?
the more deeply libraries and librarians can mine relationships between articles and authors and between subjects and institutions, the more reliable are their metrics. suppose some librarians want to compare the relative influence of two authors. they might first look at the authors’ respective number of publications. but are those papers of equally high quality? they might next count all citations to those papers. but are the citing articles of high quality? deeper still, they might assign different weights to each citing article using its own number of citations. at each step, whether realizing it or not, they are applying graph theory. with deeper knowledge of this subject, librarians can embrace complexity and harness it for research tools of powerful simplicity. ■■ pagerank and the global giant graph indexing the web is a massive challenge. the internet is a network of computer hardware resources so complex that no one really knows exactly how it is structured. in fact, researchers have resorted to conducting experiments to discern the structure and size of the internet and its potential vulnerability to attacks. representations of the data collected by these experiments are based on network whenever librarians use semantic web services and standards for representing data, they also generate graphs, whether they intend to or not. graphs are a new data model for libraries and librarians, and they present new opportunities for library services. in this paper we introduce graph theory and explore its real and potential applications in the context of digital libraries. part 1 describes basic concepts in graph theory and how graph theory has been applied by information retrieval systems such as google. part 2 discusses practical applications of graph theory in digital library environments. some of the applications have been prototyped at the los alamos national laboratory research library, others have been described in peer-reviewed journals, and still others are speculative in nature. the paper is intended to serve as a high-level tutorial to graphs in libraries. part 1. introduction to graph theory complexity surrounds us, and in the twenty-first century, our attempts at organization and structure sometimes lead to more complexity. in layman’s terms, complexity refers to problems and objects that have many distinct but interrelated issues or components. there also is an interdisciplinary field referred to as “complex systems,” which investigates emergent properties, such as collective intelligence.1 emergent properties are an embodiment of the old adage “the whole is greater than the sum of its parts.” these are behaviors or characteristics of a system “where the parts don’t give a real sense of the whole.”2 libraries reside at the nexus of these two definitions: they are creators and caretakers of complex data sets (metadata), and they are the source of explicit records of the complex and evolving intellectual and social relationships underlying the evolution of knowledge. digital libraries are complex systems. patrons visit libraries hoping to find some order in complexity or to discover a path to new knowledge. instead, they become the integration point for a complex set of systems as they juggle resource discovery by interacting with multiple systems, either overtly or via federated search, and by contending with multiple vendor sites to retrieve articles of interest. 
contrast this with google’s simple approach to content discovery: a user enters a few terms in a single box, and google returns a large list of results spanning the internet, placing the most relevant results at the top of this list. no one would suggest using google for all research needs, but its simplicity and recognized ability to james e. powell (jepowell@lanl.gov) is research technologist, daniel a. alcazar (dalcazar@lanl.gov) is professional librarian, matthew hopkins (mfhop@lanl.gov) is library professional, tamara m. mcmahon (tmcmahon@lanl.gov) is library technology professional, amber wu (amber.ponichtera@gmail.com) is graduate research assistant, and linn collins (linn@lanl .gov) is technical project manager, los alamos national laboratory, los alamos, new mexico. robert olendorf (olendorf@unm .edu) is data librarian for science and engineering, university of new mexico libraries, albuquerque, new mexico. 158 information technology and libraries | december 2011 influence a person has in a business context. if we want to analyze this aspect of the network, then it makes sense to consider the fact that some relationships are more influential than others. for example, a relationship with the president of the company is more significant than a relationship with a coworker, since it is a safe assumption that a direct relationship with the company leader will increase influence. so we assign weights to the edges based on who the edge connects to. google does something similar. all the webpages they track have centrality values, but google’s weighting algorithm takes into account the relative importance of the pages that connect to a given resource. the weighting algorithm bases importance on the number of links pointing to a page, not the page’s internal content, which makes it difficult for website authors to manipulate the system and climb the results ladder. so if a given webpage science, also known as graph theory. this is not the same network that ties all the computers on the internet together, though at first glance it is a similar idea. network science is a technique for representing the relationships between components of a complex system.3 it uses graphs, which consist of nodes and edges, to represent these sets of relationships. generally speaking, a node is an actor or object of some sort, and an edge is a relationship or property. in the case of the web, universal resource locators (urls) can be thought of as nodes, and connections between pages can be thought of as links or edges. this may sound familiar because the semantic web is largely built around the idea of graphs, where each pair of nodes with a connecting edge is referred to as a triple. in fact, tim berners-lee refers to the semantic web as the global giant graph—a place where statements of facts about things are published online and distinctly addressable, just as webpages are today.4 the semantic web differs from the traditional web in its use of ontologies that place meaning on the links and in the expectation that nodes are represented by universal resource identifiers (uris) or by literal (string, integer, etc.) values, as shown in figure 1, where the links in a web graph have meaning in the semantic web. semantic web data are a form of graph, so graph analysis techniques can be applied to semantic graphs, just as they are applied to representations of other complex systems, such as social networks, cellular metabolic networks, and ecological food webs. 
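the difference between a raw link count and the google-style weighted importance described above is easy to see with a small graph library. the sketch below uses the networkx package (one possible tool, not the one used by google or by the authors of this column) on an invented miniature web graph: in-degree gives the simple link count, and networkx's pagerank function gives an eigenvector-style weighted score.

```python
import networkx as nx

# an invented miniature web graph: an edge (a, b) means page a links to page b
web = nx.DiGraph([
    ("hub", "page1"), ("hub", "page2"), ("hub", "page3"),
    ("page1", "page2"), ("page3", "page2"),
    ("page4", "hub"), ("page5", "hub"), ("page6", "hub"),
])

in_links = dict(web.in_degree())   # simple degree: count of incoming links
importance = nx.pagerank(web)      # weighted, eigenvector-style score

for page in sorted(web.nodes, key=importance.get, reverse=True):
    print(f"{page}: {in_links[page]} in-links, pagerank {importance[page]:.3f}")
```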
herein lies the secret behind google’s success: google builds a graph representation of the data it collects. these graphs play a large role in determining what users see in response to any given query. google uses a graph analysis technique called eigenvector centrality.5 in essence, google calculates the relative importance of a given webpage as a function of the importance of the pages that point to it. a simpler centrality measure is called degree centrality. degree centrality is simply a count of the number of edges a given node has. in a social network, degree centrality might tell you how many friends a given person has. if a person has edges representing friendship that connect him to seventeen other nodes, representing other people in the network, then his degree value is seventeen (see figure 2). if a person with seventeen friends has more friendship edges than any other person in the network, then he has the highest degree centrality for that network. eigenvector centrality expands on degree centrality. consider a social network that represents the amount of figure 1. a traditional web graph is compared to a corresponding semantic web graph. notice that replacing traditional web links with semantic links facilitates a deeper understanding of how the resources are related. graphs in libraries: a primer | powell et al. 159 networks evidence for the evolution of metabolic processes.7 chemists have used networks to model reactions in a step-wise fashion by “editing” graphs representing models of molecules and their reactivity,8 and they also have used graphs to better comprehend phase transition states, such as the freezing of water or the emergence of superconductivity when a material is cooled.9 economists have used graphs to model market trades and the effects of globalization.10 infectious disease specialists have used networks to model the spread of disease and to evaluate prospective vaccination plans.11 sociologists have modeled the complex interactions of people in communities.12 and in libraries, computer scientists have explored citation networks and coauthorship networks,13 and they have developed maps of science that integrate scientific papers, their topics, the journals in which they appear, and comsumers’ usage patterns to provide a new view of the pursuit of science.14 network science can make complexity more comprehensible by representing a subset of actors and relationships in a complex system as a graph. these has only two edges, it may still rank higher than a more connected page if one of the pages that links to it has a large number of pages pointing to it (see figure 3). this weighted degree centrality measure is eigenvector centrality, and a higher eigenvector centrality score causes a page to show up closer to the top of a google results set. the user never sees a graph, but this graphbased approach to exploring a complex system (the web), works quite well for routine web searches. ■■ graph theory graph theory, also known as network science, has evolved tremendously in the last decade. for example, information scientists have discovered hubs in the web that connect large numbers of pages, and if removed, disconnect large portions of the network.6 biologists have begun to explore cellular processes, such as metabolism, by modeling these processes as networks and have even found in these figure 2. friendship network figure 3. node 2 ranks higher than node 1 because node 3, which connects to node 2, has more incoming links than node 1. 
node 3 is deemed more important than node 9, which has no incoming links. 160 information technology and libraries | december 2011 as subgraphs, e.g., in the case where a person has two friends who are also mutual friends. small world networks have numerous highly interconnected subgroups called clusters. these may be distributed throughout the network in a regular fashion, with a few random connections that connect the otherwise disconnected clusters. these random links have the effect of greatly reducing the path length between any two nodes and explain the oft-cited six degrees of separation that connect all people to one another. in social networks, agency is often described as the mechanism by which graphs can then be explored visually and mathematically. graphs can be used to represent systems as they are, to extract subsets of these systems, or to discover wholly artificial collections of relationships between components of a speculative system. data also can be represented as graphs when they consist of “measurements that are either of or from a system conceptualized as a network.”15 in short, graphs offer a continuum of techniques for comprehending complexity and are suitable either for a layman with casual interest in a topic or a serious researcher ferreting out discrete details. at the core of network science is the graph. as stated earlier, a graph is a collection of nodes and the edges that connect some of those nodes, together representing a set of actors and relationships in a type of system. relationships can be unidirectional (e.g., in a social network, when the information flows from one person to another) or bidirectional (e.g., when the information flows back and forth between two individuals). relationships also can vary in significance and can be assigned a weight—for example, a person’s relationship to his or her supervisor might be weighted more heavily than a person’s relationship to his or her peers. a graph can consist of a single type of node (for subjects) and a single type of edge connecting those nodes (for predicates). these are called unipartite graphs. from the standpoint of graph theory, these are the easiest types of graphs to work with. graphs that represent two relationships (bipartite) or more are typically reduced to unipartite graphs in the process of exploring them because the vast majority of techniques for evaluating graphs were developed for graphs that address a single relationship between a set of nodes. ■■ global properties of graphs there are other aspects of graphs to consider, sometimes referred to as “global graph properties.”16 there are two basic classes of networks: homogeneous networks and inhomogeneous networks.17 these graphs exhibit characteristics that may not be comprehensible by close examination (e.g., by examining degree centrality, node clustering, or paths within a graph)18 but may be apparent, depending on the size and the way in which the graph is rendered, merely by looking at a visualization of the graph. in homogeneous graphs, nodes have no significant difference between their number of connections. examples include random graphs, complete graphs, and small world networks. in random graphs there is an equal probability that any two nodes will be connected (see figure 4), while in complete graphs (see figure 5) all nodes are connected with one another. random graphs are often used as tools to evaluate networks that describe real systems. complete graphs might occur in social networks figure 4. a random graph. 
any given node has an equal probability of being linked to any other node figure 5. a complete graph. all nodes are connected to all other nodes graphs in libraries: a primer | powell et al. 161 building blocks of networks.20 a three-node feedback motif is a set of nodes where the edges between them form a triangle and the edges are directional. in other words, node a is connected to (and might convey some information to) node b; node b, in turn, has the same relationship with node c; and node c is connected to (and conveys information back to) node a. in digital libraries, for example, if similar papers exhibit the same pattern of connectivity to a group of subject or keyword categories, motifs will make it possible to readily identify the topical overlap between them. collections of nodes that have a high degree of connectivity with each other are called clusters.21 in many complex systems, clusters are formed by preferential attachment. a group of highly clustered nodes that have low connectivity to the larger graph is known as a clique. while there are other aspects of graphs that can be explored, these four—node centrality measures, paths between nodes, motifs, and clustering—are accessible to most users and are significant in graphs representing bibliographic metadata and textual content. this will become clearer in the examples that follow. ■■ quantitative evaluation of graphs returning now to centrality measures, two of particular interest in digital libraries are degree centrality and betweenness centrality (or flow centrality). an interesting aspect of graphs is that, regardless of the data being represented, centrality measures and clustering characteristics often reveal important clues about the system that the data these random links get established. agency refers to the idea that multiple, often unpredictable actions on the part of individuals in a network result in unanticipated connections between people. examples of such actions are hobbies, past work experience, meeting someone new while on a trip to another country—pretty much anything that takes a person outside his or her normal social circles. in the case of inhomogeneous graphs, not all nodes are created equal. one type, scale-free networks, is common in a variety of systems ranging from biological to technological (see figure 6. these exhibit a structure in which a few nodes play a central role in connecting many others. these hubs form as a result of preferential attachment, known colloquially as “the rich get richer.” researchers became aware of scale-free networks as a result of analysis of the web when it was in its infancy. scale-free networks have been documented in biology, social networks, and technological networks. as a result, they are quite important in the field of information science. small world and scale-free networks are typical of complex systems that occur in nature or evolve because of emergent dynamic processes, in which a system self-organizes over time. small world networks provide fast, reliable communication between nodes, while scale-free networks are more fault tolerant, making them ideal for systems such as living cells, which are frequently challenged by the external environment.19 ■■ local properties of graphs below the ten-thousand-foot system-level view of networks, graphs can be scrutinized more closely using many other techniques. 
we will now consider four broad categories of local characteristics that describe networks and how they are, or could be, applied in digital libraries: node centrality measures, paths between nodes, motifs, and clustering. centrality measures make it possible to determine the importance of a given node in a network. degree centrality, in its simplest form, is simply a count of the number of edges connected to any given node in a network: a node with high-degree centrality has many connections to other nodes compared to a typical node in the graph. paths make it possible to explore the connections between nodes. an author who is two degrees removed from another author—in other words, the friend of a friend of a friend—has a path length of 2. researchers are often interested specifically in the shortest path between a given pair of nodes. many other types of paths can be explored depending on the type of network, but in libraries, paths that describe the flow of ideas or communication between people are most likely to be useful. motifs are the fundamental recurring structures that make up the larger graph, and they often are called the figure 6. example of a scale-free coauthorship network. a few nodes have many links, and most nodes have few or a single link to another node 162 information technology and libraries | december 2011 path that connects a node through other nodes back to itself. within graph visualization tools, the placement of nodes can vary from one layout to another. what matters is not the pictorial representation (though this can be useful), but the underlying relationships between nodes (the topology). along with clustering, paths help differentiate motifs, which are considered to be building blocks of some types of complex networks. since bibliographic metadata represents communication in one form or another, it is often most common to apply social network theory to graphs. but it is also possible to apply various centrality measures to graphs that are not social and to use these to discover significant nodes within those graphs. in part 2 we consider various unipartite and bipartite graphs that might be especially useful for examining digital library metadata. part 2. graph theory applications in digital libraries library systems, by virtue of the content they contain, are complex systems. fielded searches, faceted searches, and full-text searches all allow users to access aspects of the complex system. fielded searches leverage the explicit structure that has been encoded into the metadata describing the resources that users are ultimately trying to find (articles, books, etc). full-text searches enable users to explore in a more free-form manner, subject of course to the availability of searchable text. often, full-text search means the user is searching titles, abstracts, and other content that summarizes a resource, rather than the actual full text of articles and books. even if the user is searching the full content, there are relationships and aspects of the content that are not readily discernible through a full-text search. furthermore, there is not one single, comprehensive digital library—many library systems live in the deep web, that is, they are databases that are not indexed by search engines like google, and so users must describes, whether it’s coauthorship relationships or protein interactions in the cell of a living organism. 
often the clusters or nodes that exhibit a higher score in some centrality calculation are significant in some meaningful way compared to other nodes. recall that degree centrality refers to how many edges a given node has. degree centrality can vary significantly in strength depending on the relationships that are represented in the graph. consider a graph of citations between papers. while it may be obvious to humans that the mostly highly cited papers will have the highest-degree centrality, computers have no idea what this means. it is still up to humans to lend a degree of comprehensibility to the raw calculation: in other words, to understand that a paper with high-degree centrality is an important paper, at least among the papers the graph represents. betweenness centrality exposes how integral a given node is to a network. basically, without getting into the mathematics, it measures how often a node falls on the shortest path between other nodes. thus, nodes with high betweenness centrality do not necessarily have a lot of edges, but they bridge disparate clusters. in an informational network, the nodes with high betweenness centrality are crucial to information flow, social connections, or collaborations. hubs are examples of nodes with high betweenness centrality. the removal of a hub causes large portions of a network to become detached. in figure 7, the node labeled “folkner, w.m.” exhibits high betweenness centrality, since it connects two clusters together. a cluster coefficient expresses whether a given node in a network is a member of a tightly interlinked collection of nodes, or clique. the cluster coefficient of an entire graph reveals the overall tendency for clustering in a graph, with higher cluster coefficients typical of small world graphs. in other types of graphs, clusters sometimes manifest as homophily; that is, nodes of a given type are highly interconnected with one another and have few connections with nodes of other types. in social networks, this is sometimes referred to as the “birds of a feather” effect. in a more current reference, the effect was explored as a function of the likelihood that someone would “unfriend” an acquaintance on the social networking site facebook.22 in some networks (such as the internet), clusters are connected by hubs, while in others, the hub is the primary connecting node of other nodes. paths refer to the edges that connect nodes. the simplest case of a path is an edge that connects two nodes directly. path analysis addresses the set of edges that connect two nodes that are not, themselves, directly connected. the shortest path, as its name implies, refers to the route that uses the least number of edges to connect from node a to node b and measures the number of edges, not the linear distance. walks and paths refer to a list of nodes between two nodes, with walks allowing repeat visits to nodes, and paths not allowing them. cycles refer to a figure 7. paths in a coauthorship network graphs in libraries: a primer | powell et al. 163 coauthorship (collaboration) networks coauthorship (collaboration) networks are typically small world networks in which crossand interdisciplinary work provides the random links that connect various clusters (see figure 8). these graphs can be explored to determine which researchers are having the most influence in a given field; influence is a function of frequency of authorship. a prime example is the collaboration network graph for paul erdős, a highly productive mathematician. 
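The bridging behavior described here is easy to reproduce on a toy graph. The sketch below (again assuming Python/networkx; the node labels are hypothetical) joins two tight clusters through a single bridge node and shows that the bridge scores highest on betweenness centrality even though it has only two edges, while the tightly clustered nodes score highest on the cluster coefficient.

```python
import networkx as nx

# Two tight clusters joined by a single "bridge" author (hypothetical labels).
G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"),      # cluster 1
                  ("d", "e"), ("d", "f"), ("e", "f"),      # cluster 2
                  ("c", "bridge"), ("bridge", "d")])       # bridge between them

# Betweenness centrality: the bridge node lies on every shortest path
# between the two clusters, so it scores highest despite having few edges.
print(nx.betweenness_centrality(G))

# Cluster coefficient per node, and the average for the whole graph.
print(nx.clustering(G))
print(nx.average_clustering(G))
```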
his popularity and influence in academia have led to the creation of the erdős number, which is “defined as indicating the topological distance in the graph depicting the co-authorship relations.”23 liu et al. proposed a node analysis measure that they called authorrank, which establishes weighted directed edges between authors. the author’s authorrank value is the sum of the weighted edges connected to that author.24 these networks also can be used to explore how an idea spreads and what opportunities may exist for future collaborations, as well as many other existing and potential relationships. citation graphs citation graphs more strongly resemble scale-free networks, in which early papers in a given field tend to accumulate more links. such hub papers can be cited hundreds or even thousands of times, while most papers are cited far less often or not at all. many researchers have explored citation graphs, though the person often credited with first noting the network characteristics of citation patterns was derek j. de solla price in 1965.25 more recently, mark newman introduced the concept of what he calls “first mover advantage” to describe the preferential attachment observed in citation networks.26 search each individually. but if more of these systems adopted semantic web standards, they could be explored as graphs, and relationships between different databases would be easier to discern and represent to the user. many libraries have tried to emulate google by incorporating federated search engines with a single search box as an interface. this copies the form of google’s search engine but not its underlying power. to do that, libraries must enhance full-text searches by drawing on relationships. a full-text search will (hopefully) find relevant papers on a given topic, but a researcher often wants to find the best papers on that topic. to meet that need, libraries must harness the information contained in relationships; otherwise each paper is stuck in a vacuum. cited references are one way to connect papers. for researchers and librarians alike, this is a familiar metric for assessing a paper’s relative importance. the web of science and scopus are two databases that perform this function. looked at another way, citation counts are nothing more than degree centrality applied to a simple graph in which papers are nodes and references are edges (see the sketch below). thus, in the framework of graph theory, citation analysis is just a small sliver of a world of possible relationships, many of which are unexplored. the following examples outline use case scenarios in which graph techniques are or could be applied to library data, such as bibliographic metadata, to help users find relationships and conduct research. ■■ informational graphs intrinsic to digital library systems there are multiple relationships represented within and between metadata contained in library systems that can be represented as graphs and explored using graph techniques. some of these, such as citation networks, are among the best-studied informational networks. citation networks are valued because the data describing them is readily accessible and because scientists studying classes of networks have used them as surrogates for exploring scale-free networks. they are often evaluated as static networks (i.e., a snapshot in time), but some also have dynamic characteristics (e.g., they change and grow over time, or they allow information-flow analysis).
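As a concrete reading of the observation that citation counts are simply degree centrality on a paper graph, the following sketch (Python and networkx assumed, as above; the paper identifiers are hypothetical) builds a small directed citation graph and reads citation counts off as in-degree.

```python
import networkx as nx

# Directed citation graph: an edge (a, b) means paper a cites paper b.
# Paper identifiers are hypothetical.
cites = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p4", "p2"), ("p5", "p3")]
G = nx.DiGraph(cites)

# Citation count is just the in-degree of each paper.
print(dict(G.in_degree()))            # {'p1': 3, 'p2': 1, 'p3': 1, 'p4': 0, 'p5': 0}

# The same idea, normalized, via in-degree centrality.
print(nx.in_degree_centrality(G))
```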
techniques such as pagerank can be used to evaluate information when the importance of a linking resource is as important as the number of links to a resource. multirelational networks can be developed to explore dynamic processes in research fields by using library data to provide the basic topological framework for some of the explorations. figure 8. a coauthorship network 164 information technology and libraries | december 2011 network with three types of nodes: one to represent individual pieces of debris, a second to represent collections of debris that are the original object that the debris is a fragment of, and a third to represent conjunction events (near misses) between objects. another example of graphs being used as tools is the case of developing vaccination strategies to curtail the spread of an infectious disease.30 in this case, networks have been used to determine that one of the best strategies for curtailing the transmission of a disease is to identify and vaccinate hubs, rather than to conduct mass vaccination campaigns. in libraries, graphs as tools could be used to help researchers identify collaboration opportunities, to disambiguate author identities and aggregate related materials, to allow library staff to evaluate the academic contribution of a group of researchers (bibliometrics), and to explore geospatial and temporal aspects of information, including changes in research focus over time. graphs for author name disambiguation author name disambiguation is a long-standing problem in libraries. many resources have been devoted to manual and automatic name authority control, yet the problem remains unsolved. projects such as oclc viaf and efforts to establish unique author identifiers will no doubt improve the situation, but many problems remain.31 meanwhile, we have experimented with an approach to author name matching by generating multirelational graphs. authors subject–author (expertise) graphs graphs that connect authors by subject areas can vary because of the granularity of subject headings (see figure 9). high-level subject headings tend to function as hubs, but more useful relationships are revealed by specific subject headings and author-provided keywords. the map of science merges publications and citations with actual end user usage patterns captured in library systems and deals, in part, with categories of scientific research.27 it clusters publications and visualizes them “as a journal network that outlines the relationships between various scientific domains.” implicit in this a model is the relationship of authors to subject areas. institution–topic and nation–topic (expertise) graphs from a commercial or geopolitical perspective, graphs that represent institutional or national expertise can reveal valuable information for scientists, elected officials, and investors, particularly in networks that represent the change in a given organization or region’s contributions to a field over time. metadata for scientific papers typically includes enough information to generate nodes and edges describing this. the resulting graph can reveal unexpected details, such as national or institutional efforts to nurture expertise in a given field, and the results of those efforts. the visualization of this data may take the form of icons that vary in shape and size depending on various aspects of nodes in the institution-topic network. 
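A minimal sketch of the pagerank idea mentioned above (Python/networkx assumed, hypothetical paper identifiers): because pagerank weights a citation by the importance of the citing paper, a paper cited by other well-cited papers can outrank one that merely has more raw citations.

```python
import networkx as nx

# Small citation graph (edge a -> b means a cites b); identifiers are hypothetical.
G = nx.DiGraph([("p2", "p1"), ("p3", "p1"), ("p4", "p3"), ("p5", "p3"), ("p6", "p3")])

# PageRank weights a citation by the importance of the citing paper,
# so p1 (cited by the well-cited p3) can outrank p3 despite fewer raw citations.
ranks = nx.pagerank(G, alpha=0.85)
for paper, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(paper, round(score, 3))
```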
these visual representations can then be overlaid onto a map, with various visual aspects of the icons also affected by centrality measures applied to a given institution’s contributions.28 ■■ graphs as tools graph representations can be used as tools to explore a variety of complex systems. even systems that do not initially appear to manifest networks of relationships can often be better understood when some aspect of the system is represented as a graph. this approach requires thinking about what aspects of information needs, discovery, or consumption might be represented or evaluated using networks. two interesting examples from other fields will illustrate the point. a 2009 paper in acta astronautica proposed that techniques to reduce the amount of space junk in orbit around the earth could be evaluated using graph theory techniques.29 the authors propose a dynamic multirelational figure 9. a subject–author graph for stephen hawking graphs in libraries: a primer | powell et al. 165 computation over time because it is typically so important to understanding data. allen’s temporal intervals address machine reasoning over disparate means of recording the temporal aspects of events.33 another temporal computing concept that has applicability to graphs is from the memento project, which makes it possible for users to view prior versions of webpages.34 entities in the memento ontology can become predicates in triples, which in turn can become edges in graphs. using graphs, time can be represented as a relationship between objects or as a distinct object within a graph. nodes that connect through a temporal node may overlap, coincide, or co-occur. nodes that cluster around time represent something important about the objects. genomic-document and proteindocument networks many people hoped that mapping the human genome would result in countless medical advances, but the process whereby genes manifest themselves in living organisms turned out to be much more complex—there wasn’t just a simple mapping between genes and organism traits, there were other processes controlled by genes representing additional layers of complexity scientists had not anticipated. today biologists apply network science to these processes to reveal the missing pieces of this puzzle.35 just as the process itself is complex, the information needs of these researchers benefit from a more sophisticated approach. biologists often need to find papers that reference a given gene or protein sequence. and so, representing these relationships (e.g., article–gene) as graphs has the added benefit of making the digital library research data compatible with the methods that biologists already use to document what they know about these processes. although this is a specialized type of graph, a similar approach might be valuable to researchers in a number of scientific disciplines, including materials science, astrophysics, and environmental sciences. graphs of omission one of the less obvious capabilities of network science is to make predictions about complex systems by looking for missing nodes in graphs.36 this has many applications: for example, identifying a hub in the metabolic processes of bacteria can yield new targets for antibiotics, but it is vital to know that interrupting the enzyme that serves as that hub will effectively kill the organism. 
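One simple way to approximate this search for missing links is a neighborhood-overlap score. The sketch below (Python/networkx assumed) ranks currently unlinked node pairs by their Jaccard coefficient; high-scoring pairs are candidate "missing" links worth a closer look. This is a deliberately simpler heuristic than the hierarchical model in the work cited above, offered only to make the idea concrete.

```python
import networkx as nx

# A small collaboration graph (hypothetical authors).
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")])

# Score currently unlinked pairs by neighborhood overlap (Jaccard coefficient).
# High-scoring pairs are candidate missing links worth investigating.
candidates = sorted(nx.jaccard_coefficient(G), key=lambda t: t[2], reverse=True)
for u, v, score in candidates[:5]:
    print(u, v, round(score, 2))
```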
making predictions about the evolution of research by identifying areas for cross-disciplinary collaboration or areas where little work has been done—enabling a researcher to leverage are the primary nodes of interest, but relationships such as topic areas, titles, dates, and even soundex representations of names also are represented. as one would expect, phonetically similar names cluster around particular soundex representations. shared coauthorship patterns and shared topic areas can reveal that two different names are for the same author as, for example, when a person’s name changes after marriage (see figure 10). graphs for title or citation deduplication string edit distance involves counting the number of changes that would need to be made to one string to convert it to another, and it is one of the most common approaches to deduplicating titles, citations, and author names. multirelational graphs, in which titles are linked to authors, publication dates, and subjects, result in subgraphs that appear virtually identical when two title variants are represented. centrality measures can be applied to unipartite subgraphs of these networks to home in on areas where data duplication may exist. temporal-topic graphs for analyzing the evolution of knowledge over time a particularly active area of research in graph theory is the representation of dynamical systems as networks. a dynamical system is described as a complex system that changes over time.32 computer scientists have developed various strategies and technologies to cope with figure 10. two authors with similar names linked by subject nodes 166 information technology and libraries | december 2011 basis for an on-the-fly search expansion tool. a querysuggestion tool might look at user-entered terms and determine that some are hubs, then suggest related terms from nodes that connect to those hub nodes. remember, graphs need not be visible to be useful! global subject resolution using dbpedia although dbpedia appears to lag behind wikipedia in terms of completeness and scrutiny by domain experts, it offers one mechanism for unifying user-provided tags, author keywords, and library-assigned subject headings with a graph of known facts about a topic. links into and out of dbpedia’s graphs on a given topic would enable serendipitous knowledge discovery through browsing these semantic graphs. viaf linked author data oclc’s effort to convert tens of millions of identity records into graphs describing various attributes of authors promises to enhance exploration of digital library content on the author dimension.42 these authority records contain a wealth of information, linking name variations, basic genealogical data such as birth and death dates, associations with institutions, subject areas, and titles published by authors. although some rough edges need to be smoothed (one of the authors of this paper discovered that his own authorship data was linked with another author of the same name), iterative refinement of this data as it is actually used may enable crowd-sourced the first-mover advantage and thus advance his or her career—is a valuable service that libraries are well positioned to provide (see figure 11). machine-supplied suggestions offer another type of prediction. for example, providing the prompt “did you mean john smith and climate change?” can leverage real or predicted relationships between author and subject (see figure 12). graphs, in turn, can be used to create tools that will simplify an author–subject search. 
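The string edit distance used for title and citation deduplication can be written as a small self-contained function; the sketch below is one such implementation in Python (the article does not specify an implementation, and the title strings are illustrative). A small distance between two title variants flags a candidate duplicate for review.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Two title variants that likely describe the same work (illustrative strings).
t1 = "graphs in libraries: a primer"
t2 = "graphs in libraries - a primer"
print(edit_distance(t1, t2))  # a small distance suggests a candidate duplicate
```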
viral concept detection phase transition typically refers to a process in thermodynamics that describes the point at which a material changes from one state of matter to another (e.g., liquid to solid). phase transition also applies to the dispersal of a new idea. interestingly enough, graphs representing matter at the point of phase transition, and graphs representing the spread of a fad in a social network, exhibit the same recognizable pattern of change: suddenly there are links between many more nodes, there’s a dramatic increase in clustering, and something called a giant component emerges.37 in a giant component, all of the nodes in that portion of the graph are interlinked, resulting in a complete graph like figure 5. this is not so different from what one observes when something “goes viral” on the internet. in a library, a dynamic graph showing the usage of new keywords for emerging subject areas would likely reflect a similar pattern. ■■ linked data graph examples cross-collection graphs, or graphs that link data under your control to data published online, can be constructed by building links into the web of linked data.38 linked data refers to semantic graphs of statements that various organizations publish on the web. for example, geonames.org publishes millions of statements about geographic locations on the linked data web.39 as these graphs grow and evolve, opportunities emerge for using this data in combination with your own data in various ways. for example, it would be quite interesting to develop a network representation of library subject headings and their relationships to concepts in the encyclopedic linked data collection known as dbpedia.40 the resulting graph could be used in a variety of ways: for example, to evaluate the consistency of statements made about concepts, to establish semantic links between user-provided tags and concepts,41 or to function as the figure 11. identifying areas for collaboration: a co-author graph with many simple motifs and few clusters might indicate a field ripe for collaboration graphs in libraries: a primer | powell et al. 167 content could be represented and explored as a graph, and some research has already shown that geographic networks—especially those representing human-constructed entities such as cities and transportation networks—exhibit small world characteristics.45 another way graphs can express geographic relationships in useful ways would be in representing the concept of nearness. waldo tobler’s first law of geography states that “everything is related to everything else, but near things are more related than distant things.”46 in practice, human beings define nearness in different ways, so a graph representing a shared concept of nearness would be very valuable, particularly in exploring works associated with biological, ecological, geological, or evolutionary sciences. graph representations of nearness could be developed by librarians working with scientists and could be the geographic equivalent to subject guides and finding aids. they also might be useful across disciplines and would enable machine inferencing across data that include geographic relationships. still other kinds of graphs what might a digital library tool based on graph theory look like? what could it do? it wouldn’t necessarily depict visualizations of graphs (though in some cases visual graphs are the most efficient way to impart concepts). 
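The emergence of a giant component can be watched directly in a simulation. The sketch below (Python/networkx assumed, purely illustrative) adds random edges to an initially empty graph and reports the size of the largest connected component, which jumps sharply once the average degree passes the critical point, the same qualitative signature described for ideas that "go viral."

```python
import random
import networkx as nx

random.seed(1)
n = 200
G = nx.empty_graph(n)

# Add random edges one at a time and watch the largest connected component.
# Around the critical point (average degree near 1) its size jumps sharply:
# the "giant component" behavior described above.
for step in range(1, 301):
    u, v = random.sample(range(n), 2)
    G.add_edge(u, v)
    if step % 50 == 0:
        giant = max(nx.connected_components(G), key=len)
        print(f"edges={step:3d}  largest component={len(giant)}")
```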
after all, citation databases utilize graph theory, but the user only sees a number (cite count) and lists of articles (citing or cited). in many cases, then, the tool would perform graph evaluation techniques behind the scenes, translating these metrics into simple descriptive queries for the user. for example, a user interested in the most influential papers in his field would enter his subject, and then on the backend, the tool would apply eigenvector centrality to that subject’s citation graph. if the same user finds an especially relevant article, clicking a “find similar articles” button will produce a list of papers in that graph with the shortest path length to the paper in question. researchers also could use this tool to evaluate authors and institutions in various ways: ■■ is my output diverse or specialized compared to my colleagues? the tool assigns a score for each author based on degree centrality in a subject-author graph. ■■ i want to find potential collaborators. the tool returns authors connected to researcher by the shortest path length in a coauthorship graph. ■■ i want to collaborate with colleagues from other departments at my institution. high betweenness centrality quality control that will more rapidly identify and resolve these problems. linked geographic data using geonames it is ironic that the use of networks to describe geographic aspects of the world is in its infancy, considering that many consider leonhard euler’s attempt to find a mathematical solution to the seven bridges of königsberg problem in 1735 to be the birth of the field.43 as some authors have pointed out, geometric evaluation of geographic relationships is actually a poor way to explore geographic relationships.44 graphs can be used to express arbitrary relationships between geographically separated objects, and it is perhaps no accident that our road and railway systems are in fact among the most familiar graphs that people encounter in the real word. a subway map is a graph where subway stations are nodes linked by railway. graphs can represent the relationships between topological features, the visibility of buildings in a city to one another, or the land, sea, and air transportation that links one country to another. geonames supplies a rich collection of geographic information that includes descriptions of geopolitical entities (cities, states, countries), geophysical features, and various names that have been ascribed to these objects. the geographic relationships in intellectual figure 12. find similar articles: a search for hv reynolds might prompt a suggestion for sd miller, who has a similar authorship pattern 168 information technology and libraries | december 2011 nov. 21, 2007, timbl’s blog, http://dig.csail.mit.edu/bread crumbs/node/215. 5. lawrence page et al., the pagerank citation ranking: bringing order to the web (1999), http://citeseerx.ist.psu.edu/ viewdoc/summary?doi=10.1.1.31.1768. 6. duncan s. callaway et al., “network robustness and fragility: percolation on random graphs,” physical review letters 85, no. 25 (2000): 5468–71. 7. adreas wagner and david a. fell, “the small world inside large metabolic networks,” proceedings of the royal society b: biological sciences 268, no. 1478 (2001): 1803–10. 8. gil benko, christopher flamm, and peter f. stadler, “a graph-based toy model of chemistry,” journal of chemical information and modeling 43, no. 4 (2003): 1085–93. 9. tad hogg, bernardo a. huberman, and colin p. 
williams, “phase transition and the search problem,” artificial intelligence 81 (1996): 1–15. 10. vladimir boginski, sergiy butenko, and panos m. pardalos, “mining market data: a network approach,” computers & operations research 33, no. 11 (2006): 3171–84. 11. zoltán dezső and albert-lászló barabási, “halting viruses in scale-free networks,” physical review e 65, no. 5 (2002), doi: 10.1103/physreve.65.055103. 12. hans noel and brendan nyhan, “the ‘unfriending’ problem: the consequences of homophily in friendship retention for causal estimates of social influence,” sept. 2010, http://arxiv.org/abs/1009.3243. 13. johan bollen et al., “toward alternative metrics of journal impact: a comparison of download and citation data,” information processing & management 41, no. 6 (2005): 1419–40; xiaoming liu et al., “co-authorship networks in the digital library research community,” information processing & management 41, no. 6 (2005): 1462–80. 14. johan bollen et al., “clickstream data yields highresolution maps of science,” ed. alan ruttenberg, plos one 4, no. 3 (3, 2009): e4803. 15. eric kolaczyk, statistical analysis of network data (new york; london: springer, 2009). 16. alejandro cornejo and nancy lynch, “reliably detecting connectivity using local graph traits,” csail technical reports mit-csail-tr-2010–043, 2010, http://hdl.handle .net/1721.1/58484 (accessed feb. 17, 2011). 17. réka albert, hawoong jeong, and albert-lászló barabási, “error and attack tolerance of complex networks,” nature 406, no. 6794 (2000): 378–82. 18. m. e. j. newman, “scientific collaboration networks. ii. shortest paths, weighted networks, and centrality,” physical review e 64, no. 1 (2001), doi: 10.1103/physreve.64.016132. 19. albert, jeong, and barabási, “error and attack tolerance.” 20. r. milo, “network motifs: simple building blocks of complex networks,” science 298, no. 5594 (2002): 824–27. 21. lawrence j. hubert, “some applications of graph theory to clustering,” psychometrika 39, no. 3 (1974): 283–309. 22. noel and nyhan, “the ‘unfriending’ problem.” 23. alexandru balaban and douglas klein, “co-authorship, rational erdős numbers, and resistance distances in graphs,” scientometrics 55, no. 1 (2002): 59–70. 24. liu et al., “co-authorship networks in the digital library research community.” 25. derek j. de solla price, “networks of scientific papers,” in a subject–author graph for that institution may locate potential “bridge” subjects to collaborate in. ■■ i’m leaving my current job. what other institutions are doing similar work? in an institution–subject graph, the shorter the path length between two institutions, the more comparable they may be. graphs also enable libraries to reach outside their own data to build connections with other data sets. heterogeneity, which makes relational database representations of arbitrary relationships difficult or impossible, becomes a trivial matter of adding additional nodes and edges to bridge collections. the linked data web defines simple requirements for establishing just such representations, and libraries are wellpositioned to build these bridges. ■■ conclusion for many centuries, libraries have served as repositories for the accumulated knowledge and creative products of civilization, and they contain mankind’s best efforts at comprehending complexity. this knowledge includes scientific works that strive to understand various aspects of the physical world, many of which are complex and require the efforts of numerous researchers over time. 
since the advent of the dewey decimal system, librarians have worked on many fronts to make this knowledge discoverable and to assist in its evaluation. qualitative evaluation increasingly requires understanding a resource in a larger context. we suggest that this context is itself a complex system, which would benefit from the modeling and quantitative evaluation techniques that network science has to offer. we believe librarians are well positioned to leverage network science to explore and comprehend emergent properties of complex information environments. as motivation for this pursuit, we offer in closing this prescient quote from carl woese, which though focused on the discipline of biology, is equally applicable to the myriad complexities of modern life: “a society that permits biology to become an engineering discipline, that allows that science to slip into the role of changing the living world without trying to understand it, is a danger to itself.”47 references 1. melanie mitchell, complexity: a guided tour (oxford, england; new york: oxford univ. pr., 2009). 2. carl woese, “a new biology for a new century,” microbiology and molecular biology reviews (june 2004): 173–86, doi: 10.1128/mmbr. 68.2.173–186.2004. 3. national research council (u.s.), network science (washington, d.c.: national academies pr., 2005). 4. tim berners-lee, “giant global graph,” online posting, graphs in libraries: a primer | powell et al. 169 networks,” proceedings of the national academy of sciences of the united states of america 98, no. 2 (jan. 16, 2001): 404–9. 38. chris bizer, richard cyganiak, and tom heath, how to publish linked data on the web? http://sites.wiwiss.fu-berlin.de/ suhl/bizer/pub/linkeddatatutorial/ (accessed feb. 17, 2011). 39. geonames, http://www.geonames.org/ (accessed feb. 17, 2011). 40. dbpedia, http://dbpedia.org/ (accessed february 17, 2011). 41. alexandre passant and phillippe laublet, “meaning of a tag: a collaborative approach to bridge the gap between tagging and linked data,” proceedings of the www 2008 workshop linked data on the web (ldow2008), bejing, apr. 2008, doi: 10.1.1.142.6915. 42. oclc, “viaf”; oclc homepage, http://www.oclc.org/ us/en/default.htm (accessed feb. 17, 2011). 43. norman biggs, graph theory, 1736–1936 (oxford, england; new york: clarendon, 1986). 44. bin jiang, “small world modeling for complex geographic environments,” in complex artificial environments (springer berlin heidelberg, 2006): 259–71, http://dx.doi.org/10.1007/3 -540-29710-3_17. 45. gillian byrne and lisa goddard, “the strongest link: libraries and linked data,” d-lib magazine 16, no. 11/12 (2010), http://www.dlib.org/dlib/november10/byrne/11byrne.html (accessed feb. 17, 2011). 46. daniel sui, “tobler’s first law of geography: a big idea for a small world?” annals of the association of american geographers 94, no. 2 (2004): 269–77. 47. woese, “a new biology for a new century.” science 149, no. 3683 (july 30, 1965): 510–15. 26. m. e. j. newman, “the first-mover advantage in scientific publication,” epl (europhysics letters) 86, no. 6 (2009): 68001. 27. bollen et al., “clickstream data yields high-resolution maps of science.” 28. chaomei chen, jasna kuljis, and ray j. paul, “visualizing latent domain knowledge,” ieee transactions on systems, man and cybernetics, part c (applications and reviews) 31, no. 4 (nov. 2001): 518–29. 29. hugh g. lewis et al., “a new analysis of debris mitigation and removal using networks,” acta astronautica 66, no. 1–2 (2010): 257–68. 30. 
dezso and barabási, “halting viruses in scale-free networks.” 31. oclc, “viaf (the virtual international authority file) [oclc—activities],” http://www.oclc.org/research/activities/viaf/ (accessed feb. 17, 2011). 32. mitchell, complexity: a guided tour. 33. james f. allen, “toward a general theory of action and time,” artificial intelligence 23, no. 2 (1984): 123–54. 34. herbert van de sompel et al., “memento: timemap apo for web archives,” http://www.mementoweb.org/events/ ia201002/slides/memento_201002_timemap.pdf (accessed feb. 17, 2011). 35. hawoong jeong et al., “lethality and centrality in protein networks,” nature 411 (may 3, 2001): 41–42. 36. aaron clauset, cristopher moore, and m. e. j. newman, “hierarchical structure and the prediction of missing links in networks,” nature 453, no. 7191 (2008): 98–101. 37. m. e. j. newman, “the structure of scientific collaboration incoming editor’s column | gerrity 155 bob gerrity g reetings ital readers. i’m writing this in late september, as the boston red sox attempt to back their way into the major league baseball postseason after blowing a 9-game lead over tampa bay in a major-league september meltdown of epic proportions. [red sox fans are prone to hyperbole, but in this case no hyperbole is needed: this meltdown really is epic.] it’s down to the last game of the season, and like many red sox fans, i’m hopeful but not optimistic. the fate of the 2011 red sox will be old news by the time this appears in print, though: as i’m coming to learn, the wheels of scholarly publishing continue to turn ever so slowly, unless forced to do otherwise. which brings me to why i’m taking on the role of editor of ital. on one hand, i’m fortunate to be taking on the editorship of a journal that quite clearly has been stewarded with care, dedication, and attention by my predecessors. i’ve spent quite a few hours recently in the z678.9 section of my library’s stacks, perusing three decades of back volumes of ital and its predecessor, the journal of library automation. there’s an impressive body of scholarly and informational output on library automation and related topics, from the sublime (“to boolean or not to boolean?” september 1983), to the not-so-sublime (“the effects of baud rate, performance anxiety, and experience in online bibliographic searches,” march 1990), to the sentimental (“floppies to pass the billion-dollar level in ’84.” september 1982), to the déjà-vu-all-over-again (“ls2000—the integrated library system for oclc,” june 1984). overall, i’d have to say there’s a solid foundation to build on, plus plenty of good content in the pipeline, and it would be easy to continue on in the same vein. but that’s not why i’m here. i’m fortunate to be taking on the role of editor as ital faces significant changes. in his inaugural editorial for ital in march 2005, then-incoming editor john webb articulated a number of worthy goals for ital, to both broaden and deepen the content of the journal and the demographic of the authors contributing to it. one goal in particular, though, strikes me (in hindsight of course) as problematic: “i hope to . . . 
facilitate the electronic publication of articles without endangering—but in fact enhancing—the absolutely essential financial contribution that the journal provides to the association.” anyone who has observed the struggles of the newspaper industry in recent years or been involved in the shift towards e-only in the world of academic/scholarly journals will not be surprised to learn that, in the intervening years since john wrote his column and ital has continued in print plus electronic form, revenues (primarily from subscriptions and advertising) have steadily declined while production and distribution costs have not, resulting in an increasing annual subsidy from ala/lita to support the publication. as a result, i’ve been tasked with exploring a new publication model for ital: open access and electronic only. plans for—and the timing of — this transition are still being developed as i write this, but should be finalized before “my” first issue is published in march 2012. there is much about ital that will not change even if the publication format does. a primary focus of the journal will continue to be to solicit and publish high-quality, peer-reviewed papers covering a broad array of topics related to the design, application, and use of technology in libraries. changes i would like to see include making ital more timely and more relevant to the day-to-day work interests of many of its readers. i’d like to add more topical, current, and informational content to ital without negatively impacting its traditional role as a publication vehicle for librarians in tenure-track positions. ital in an e-only format also needs to provide easy and transparent ways for readers to be informed when new content is published and to offer advice, criticism, and commentary to help improve ital. i look forward to your feedback as ital moves in a new direction, about which i’m both hopeful and optimistic. i would like to offer my sincere thanks to the outgoing editor of ital, marc truitt, who has been both helpful and gracious during this editorial transition. marc is passionate about ital and its legacy, and i hope he’ll see the future ital as a worthy successor to, rather than an unfortunate break from, the journal he’s stewarded for the past several years. incoming editor’s column: ch-ch-ch-ch-changes (turn and face the strain) bob gerrity (robert.gerrity@bc.edu) is associate university librarian for information technology, boston college libraries, chestnut hill, massachusetts. 50 communications how long the wait until we can call it television jerry borrell: congressional research service, library of congress , washington , d.c* this brief article will review videotex and teletext. there is little need to define terminology because new hybrid systems are being devised almost constantly (hats off to oclc's latest buzzword-viewtel). ylost useful of all would be an examination of the types of technology being used for information provision. the basic requirement for all systems is a data base-i.e ., data stored so as to allow its retrieval and display on a television screen. the interactions between the computer and the television screens are means to distinguish technologies. in teletext and videotex a device known as a decoder uses data encoded onto the lines of a broadcast signal (whatever the medium of transmission ) to generate the display screen. 
in videotex, voice grade telephone lines or interactive cable are used to carry data communications between two points (usually 1200 baud from the computer and 300 baud or less from the decoder and th e television screen). in teletext the signal is broadcast over airwaves (wideband) or via a time-sharing system (narrowband). the numerous configurations possible make straightforward classification of syst e ms questionable. a review of the systems currently available is useful to illustrate these terms, videotex and teletext. compuserve, the columbus, ohio-based company, provides on-line searching of newspapers to about 4,000 users. reader's digest recently acquired 51 percent of the source, a time*the views expressed in this paper do not necessarily represent those of the library of congress or of the congressional research ser~ vice. sharing service that provides more than 100 different (nonbibliographic) data bases to about 5,000 users. the warner and american express joint project, qube (also columbus-based), utilizes cable broadcast with a limited interactive capability . it does not allow for on-demand provision of information ; rather, it uses a polling technique. antiope, the french teletext system, used at ksl in st. louis last year and undergoing further tests in los angeles at knxt in the coming year, is only part of a complex data transmission system known as didon. antiope is also at an experimental stage in france, with 2,500 terminals scheduled for use in 1981. ceef ax and oracle , broadcast teletext by the bbc and ibc in britain, have an estimated 100,000 users currently. two thousand adapted television sets are being sold every month . prestel, bbc's videotex system, currently has approximately 5,000 users, half of whom are businesses. all other countries in europe are conducting experiments with one of the technologies. in canada, telidon, the most technically advanced system, has 200 users. experiments involving telidon are being conducted nationwide due to government interest in telecommunications improvements. telidon will also be used in washington in the spring of 1981 for consumer evaluation. these cursory notes should indicate the breadth of interes t in alternative means of information provision. video and electronic publishing newsletters (see references) keep track of the number of users and are the best way to keep informed of activities and developments. several important trends are becoming evident. perhaps the most evident is the realization that videography is being developed in countries other than the u.s. as a result of strong support by the national posts and telecommunications (ptt) authorities . until recently there was a feeling that the u.s. was technically behind europe. what is now evident is that in the free market system of the u . s. manufacturers or other potential system providers have had insufficient impetus to provide videotex/teletext technology. the technology of information display (see borrell, journal of library automation, v.13 (dec. 1980), p.277-81) in the u.s. is an order of magnitude more sophisticated than in europe. the point being that in the absence of strong ptt pressure, videography in the u . s. developed for specialized markets in which telecommunications were not a central need. in the one area of great demand, teletext services for the hearing impaired, decoders were developed and have been employed for a number of years (about 25,000 are currently in use ). 
as the high cost of telecommunications bandwidth is eased by data compression, direct broadcasting by satellite, enhanced cable services, and fiber optic networks, then videotex and te letext will become available on a wide scale in the u.s. the computer inquiry ii decision by the fcc involving reinterpretation of the communications act of 1934 has given at&t permission to enter the data processing market . in fact, at&t, in its third experiment with videotex, is taking such an aggressive stance that it seems to be doing everything that its critics have feared: providing updatable classified ads (dynamic yellow pages), allowing users to place information into the system memory , and providing voice mail servicesthereby taking on the newspapers, home computer manufacturers, and the u . s. postal service. in addition, banking services will be offered . as the largest company in the u.s ., this stance cannot be ignored. at&t supplies about 80 percent of the phone service in the u .s., and has the potential, if allowed , to become a broadcaster, data processor, publisher, and banker ; cross-ownership was never allowed up to this time . the trend toward specialized services provision is also exemplified by the communications 51 french and british systems. prestel , which was originally targeted for a home market, is now promoted with the tacit policy of being a special business service allowing financial and private data to be provided to subscribers. sofratev, the marketers of the french teletext system, are acknowledging the importance of transactional markets in two ways, based on technology they have named "smart card," a credit card-size (in one configuration) plate with a built-in microprocessor or chip. the card will allow system users to access material that will have controlled readership. an example would be a magazine of financial data provided to those who need such information (or, more importantly, are willing to pay for it). in a more complex effort, the largest retailer in paris will advertise material via teletext and system users will be able to make acquisitions with their smart card, which can be programmed with financial data. nor is this the end of the effort by the french to market information display technology. the electronic phone directory, being offered by bell in austin , is replicated in a more modest way by the french, who plan to produce a six-byeight-inch black-and,white display unit that will provide. phone directory information (both white and yellow pages) to all of france by the 1990s. developed as part of the "telerriatique" program of the .french government, the terminals represent to some (the parent company of the source has tendered an offer for up to 250,000 of the terminals) a low-cost alternative for providing videotex to a mass market. the tandy home computer in its videotex configuration seems to fill the same market slot. perhaps the most disturbing trend, at least from a librarian's point of view, is the fact that contemporary data systems are being created which could benefit greatly from the experience of librarians and libraries. for instance, research into the methods of access-keyword, phonetic and geographical-by the french is intended to pro:vide a flexible and easily used system for untrained persons searching for directory information, and is being performed by an advertising and yellow pages 52 journal of library automation vol. 14/1 march 1981 publishing firm. 
with a feeling of deja vu i listened to an explanation of how difficult it is to develop a system for the novice; one proposed solution is to allow only the first four letters of a word to be entered (one of the search methods used at the library of congress, which does suggest some cross-fertilization ). whatever the trends, the reality is that librarians and information scientists are playing decreasing roles in the growth of information display technology. hardware systems analysts, advertisers, and communications specialists are the main professions that have an active role to play in the information age. perhaps the answer is an immediate and radical change in the training of library schools of today. our small role may reflect our penchant to be collectors, archivists, and guardians of the information repositories . have we become the keepers of the system? the demand today is for service, information, and entertainment. if we librarians cannot fulfill these needs our places are not assured. should the american library association (ala) be ensuring that libraries are a part of all ongoing tests of videotex-at least in some way-either as organizers, information providers, or in analysis? consider the force of the argument given at the ala 1980 new york annual conference that cable television should be a medium that librarians become involved with for the future. certainly involvement is an important role, but we , like the industrialists and marketers before us, must make smart decisions and choose the proper niche and the most effective way to use our limited resources if we are to serve any part of society in the future. bibliography 1. electronic publishing revietc. oxford, england : learned information ltd . quarterly . 2. home video report . white plains, new york : knowledge industry publications. weekly. 3. ieee transactions on consumer electronics. new york: ieee broadcast, cable, and consumer electronics soc iety . five tim es yearly. 4. international videotex /te letext news. washington , d. c.: arlen communications ltd. monthly . 5. videodisc/teletext news. westport , conn.: microform revi ew. quarterly. 6. videoprint. norwalk , conn.: videoprint. two times monthly. 7. viewdata/videotex report. new york: link resources corp. monthly. data processing library: a very special library sherry cook, mercedes dumlao, and maria szabo: bechtel data processing library, san francisco, california. the 1980s are here and with them comes the ever broadening application of the computer. this presents a new challenge to libraries. what do we do with all these computer codes? how do we index the material? and most importantly, how do we make it accessible to our patrons or computer users? bechtel's data processing library has met these demands. the genesis for th e collection was bechte l's conversion from a honeywell 6000 computer to a univac lloo in 1974. all the programs in use at that time were converted to run on the univac system. it seemed a good time to put all of the computer programs together from all of the various bechtel divisions into a controlled collection. the librarians were charged with the responsibility of enforcing standards and control of bechtel's computer programs. the major benefits derived from placing all computer programs into a controlled library were: 1. company-wide usage of the programs. 2. minimize investment in program development through common usage. 3. computer file and documentation storage by the library to safeguard the investment. 4. 
central location for audits of program code and documentation. 5. centralized reporting on bechtel programs . developing the collection involved basic cataloging techniques which were greatly modified to encompass all the information that computer programs generate, including actual code, documentation, and list66 1 comparative costs of converting shelf list records to machine readable form richard e. chapin and dale h. pretzer: michigan state university library, east lansing, michigan a study at michigan state university library compared costs of three different methods of conversion: keypunching, paper-tape typew1·iting, and optical scanning by a service bureau. the record converted included call number, copy number, first 39 letters of the author's name, first 43 letters of the title, and m.te of publication. source documents were all of the shelf list cards at the library. the end products were a master book tape of the library collections and a machine readable book card for each volume to be used in an automated circulation system. the problems of format, cost and techniques in converting bibliographic data to machine readable form have caused many libraries to defer the automation of certain routine operations. the literature offers little for the administrator facing the decisions of what to convert and how to convert it. automated circulation systems require at least partial conversion of the accumulated bibliographic record. the university of missouri, like many libraries, has been converting the past record only for books as they are circulated ( 1) . southern illinois university ( 2) and johns hopkins ( 3), on the other hand, have converted the record for their entire collections. the southern illinois program is based upon converting only the call number. johns hopkins has converted the call number, main entry, title, pagination, size, and number of copies. and missouri has recorded call number, accession number, and abbreviated author and title. costs of shelf list conversion/ chapin and pretzer 67 · several methods of converting the record have been described. missouri employed keypunching; southern illinois marked code sheets which were scanned electronically and converted to magnetic tape; johns hopkins, working from microfilm copy of the shelf list, used special type font and typed the records for optical scanning. an ibm report on converting the national union catalog recommended an on-line terminal as the best method of conversion ( 4). studies at michigan state university led to the conclusion that acquisition, serials, circulation, and card production contained certain routines that might well be automated. once automation of circulation was decided upon as our initial effort, decisions were necessary as to the conversion. it was recommended that a portion of the bibliographic record for all items in the shelf list should be converted. information other than the call number is being used for other programs ( 5) . cost figures for converting library records are scarce. in only two instances are figures available. the ibm report on the national union cata. log shows that the average entry in nuc contains 277 characters, with an estimated conversion cost ranging from $0.3531 to $0.417 per entry. the proposed conversion method employs an on-line terminal, a technique not available to most libraries. the johns hopkins conversion of "about 300,000 · cards" was accomplished by optical scanning and cost $18,170 (3,p.4). this figures out at about $.06 per record. 
later in the report it is stated that the conversion “is at a rate of $.0038 per character converted” (3, p.25). at $.06 per card and $.0038 per character, the converted record would consist of only 16 characters! in the study herewith reported, every effort was made to arrive at comparative cost figures for the three methods of conversion that are readily available to most research libraries: keypunching, paper-tape typewriting, and optical scanning as accomplished through a service bureau. methods of study the shelf list records of the michigan state university library were divided into three sections by numbering catalog drawers in sequence: 1,2,3; then 2,3,1; then 3,1,2. all the drawers marked with number one became one sample group; those marked two and three made up the other groups. this method of numbering the drawers gave samples from each area of the classification schedule for each method of conversion. the bibliographic data were taken directly from the shelf list without transferring information to worksheets. a sample of the shelf list shows that 74 per cent of the cards are library of congress cards or copies of library of congress proof slips. of those cards produced in the library, only 12 per cent of the total were abbreviated records. the keypunch operators, the typists, and the service bureau were instructed to extract information from the shelf list record. all differences in type (capitals, italics, etc.) were to be ignored; transliterated titles were to be used in those cases where entries were in a non-roman alphabet; accents and diacritical marks were ignored, except where they made a difference in filing, as with umlauts; all numbers in title and author fields were to be spelled out as if written. fig. 1. shelf list cards. information that was transcribed is marked in the example, figure 1. the complete call number 1) was included. author 2) was typed through 39 spaces, including dates, if possible. in cases where the author entry was lengthy, the operators were instructed to stop at the end of 39 spaces. title 3) was recorded as completely as possible through 43 spaces, but not to extend beyond the first major punctuation. date 4) was included as shown. only one copy 5) was shown on each entry. in the example of abbreviated form in figure 1, five separate records were required, with a change only in copy number. the master book tape includes the call number, which occupies 32 spaces; 3 spaces are allowed for copy number, 39 for author, 43 for title, and 4 for date of publication.
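To make the fixed-width record layout concrete, here is a small illustrative sketch (Python, chosen only for demonstration; the original project worked with keypunch cards, paper tape, and optical scanning, not a scripting language) that packs a shelf list entry into the 32/3/39/43/4-column master-tape format. The sample values are hypothetical.

```python
# Field widths for the master book tape record described above:
# call number (32), copy number (3), author (39), title (43), date (4).
WIDTHS = {"call_number": 32, "copy": 3, "author": 39, "title": 43, "date": 4}

def pack_record(call_number, copy, author, title, date):
    """Pack one shelf list entry into a fixed-width, 121-character record."""
    fields = {"call_number": call_number, "copy": copy,
              "author": author, "title": title, "date": date}
    # Truncate anything too long and pad everything to its fixed width.
    return "".join(fields[name][:w].ljust(w) for name, w in WIDTHS.items())

record = pack_record("QD941 .A413", "1",
                     "Agranovich, Vladimir",
                     "Spatial dispersion in crystal optics", "1966")
print(len(record))   # 121
print(repr(record))
```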
on the book card, figure 2, which was generated by the computer from the master book tape, the format is as follows: 32 spaces for call number, 3 for copy number, 11 for author, 26 for title, and 4 for the year published. the remainder of the card is for machine codes used in the circulation system. fig. 2. book pocket card. the book card alone can be created directly by the keypunch. however, if a library has equipment available for a more complete program, it is useful to prepare information in a format to create a master book tape. programs have been written so that the master tape can be added to or deleted from at a later date. four operators worked on the project at michigan state university. two of them were average keypunch operators with little typing skill, one was an expert typist, and the other was an expert keypunch operator. the first two operators were trained to use both the keypunch and the flexowriter. the purpose in using a variety of typists and operators for the job was to arrive at average figures for the conversion project. the data show great variance of output among operators. the outline of the methods used is shown in figure 3. the keypunch method recorded the bibliographic data by use of an ibm 026 keypunch. the punch cards were transferred to a magnetic tape and the book cards were generated by the computer. the paper-tape typewriter information was punched in paper tape by the use of a 2201 flexowriter. a portion of the sample was converted directly to magnetic tape. since some libraries will not have a paper-tape to magnetic-tape converter, the remainder of the paper-tape sample was converted to punch cards and then to magnetic tape. fig. 3. flowchart of shelf list record. optical scanner the optical scanning method was handled by farrington corporation’s service bureau, input services, in dayton, ohio. the service bureau assigned 10 to 15 employees to transcribe the shelf list. they used ibm selectric typewriters with a special type font. special symbols were used to designate the end of a field. the data were recorded on continuous-form paper. the typed record was then edited and scanned, producing a magnetic tape. after the tape was used for production of book cards, it was added to the master book tape. the first batch of cards sent to dayton was gone from the library for approximately four weeks. after the personnel at dayton became accustomed to the format and to library terminology, the turnaround time was approximately two weeks. the 255,000 records which were converted by the service bureau were sent off campus in four separate batches. machine verification of the record was not required. each operator was instructed to proofread her own copy. machine verification was considered, but the idea was discarded because of the extra cost involved. also, since book cards were to be inserted in all volumes, final verification would result when the books and cards were matched. results in the conversion, keypunching cost 6.63 cents per record. paper-tape ran slightly higher, at 7.07 cents; this higher cost was due to the added cost of machinery and the added cost of going from paper tape to magnetic tape.
optical scanning, through a service bureau, was exactly the same as keypunching, 6.63 cents, including the programming costs. cost details are shown in table 1.

table 1. average cost per shelf list record converted

                                  keypunch     paper-tape typewriter    scanning, service bureau
labor (1)                         $.04073      $.03960
  salary                           .03723       .03620
  fringe benefits                  .00350       .00340
equipment rental (2)               .00322       .00888
  computer                         .00280       .00840
  supplies                         .00042       .00048                  $.00030
contractual services               .00003       .00052 (3)               .06600 (5)
overhead (4)                       .02232       .02172
total                             $.06630      $.07072                  $.06630

(1) average costs for all operators based upon salary of $2.10 per hour, and fringe benefits of 9.4 per cent.
(2) rental time to library of ibm 1401 computer is $30.00 per hour, including personnel costs.
(3) includes $.000089 for tape-to-tape conversion and $.000091 for tape to card to magnetic tape conversion.
(4) university charge of 54.87 per cent of salaries, for space, utilities, maintenance, etc. this figure does not include cost of training and supervision.
(5) $.057 per record plus .009 per record for programming costs.

late in the study we observed that a seemingly inordinate amount of the flexowriter time was consumed by the automatic movement of the typewriter carriage to the pre-determined fixed fields. in order to circumvent this the operator was instructed to strike one key to indicate end of field, and then she no longer had to wait for the carriage movement. by using the manual field markers, as opposed to automatic fixed fields, the cost of the flexowriter operation was reduced to 6.672 cents per record. the disadvantage of the manual field-marking system was the increased chance of operator error, which amounted to 3.13 per cent more than the fixed-field method. for this reason, and in spite of the economy of the manual method, the use of pre-determined fixed fields for flexowriter conversion is to be preferred. in the comparison of the salary costs for keypunching and for the use of the flexowriter, great variations were shown among operators. two participants were asked to use both the keypunch and the flexowriter on varying days, with tallies of their output accounted for throughout the entire project. operator 1 was essentially a skilled keypunch operator who had some background in typing. her salary cost per record during keypunching was 3.98 cents; her salary cost for the paper-tape typewriter was 7.92 cents. operator 2 was a skilled keypunch operator who was also sent to typing class for one term to raise her typing skill. her salary cost was 3.92 cents per record on the keypunch and 3.79 cents per record on the paper-tape machine. operator 3, who was a skilled keypunch operator, averaged 2.32 cents per record for salary cost. operator 4, who was a typist and not a keypunch operator, produced records on the flexowriter at a cost of 3.56 cents per record. the above figures indicate salaries only, and do not include overhead, fringe benefits, and other expenses which are reflected in the total conversion cost shown. a letter from farrington service corporation stated the following information about the scanning operation: "1) our typists produced an approximate total of 7,950 typing pages in the course of this conversion. 2) each typist averaged from 3.6 to 3.8 pages per hour. 3) we processed an average of 800-1,000 (shelf list) cards, per girl, per day. 4) the total man hours expended in this project was 2,144.
5) the amount of error detected as a result of sight verification varies significantly from girl to girl. the average, however, ran approximately 2.8 per cent (of records to be corrected)." comparison was made of actual records converted per eight-hour day by each of the methods. the service bureau, with skilled typists, was able to convert approximately 100 records an hour for each typist. the most efficient keypunch operator averaged about 75 records per hour, which was noticeably more than the average. the paper-tape typist, using pre-programmed fixed fields, reached 65 records per hour, but was able to produce 73 records per hour by manually typing the field markers. a short-run sample was stop-watch-timed to give an indication of the differences in results for each method when only minimum changes in certain fields, such as copy number or volume number, were required. costs of shelf list conversion/ chapin and pretzer 73 on the keypunch machine an operator consumed 34.6 seconds in typing the initial record and 20.4 seconds in duplicating the basic information and changing data in one given field. the operator with the automatic program flexowriter consumed 47.2 seconds typing the initial record, including 13.2 seconds in shifting fields and automatically firing the record marks, and 24 seconds duplicating the record. when she manually indicated the field information, she was able to convert the initial record in slightly less time-30 seconds; and she took 22.8 seconds to duplicate the data with a change in one field. final verification will be completed only when all cards are matched with the proper books. for those books that do not circulate, this may never be accomplished. a sample of cards was selected to reflect the three methods of conversion. the service bureau cards contained fewer errors than those produced by keypunching and paper-tape typewriting. production of records that were not acceptable to the computer in an edit program occurred in 1.75 per cent of the sample for keypunching, 0.93 per cent for paper-tape typewriting, and 0.16 per cent for service bureau. operator errors, discovered while matching cards with books, showed a higher percentage: 4.62 per cent for keypunching, 3.60 per cent for flexowriter, and 0.35 per cent for service bureau. conclusions and recommendations 1. the cost of converting a portion of the bibliographic record is relatively inexpensive when compared to the total cost of automated library programs. one reason for our delay in entering into the field of an automated circulation program was that of making the book cards. now that this task has been completed, it is obvious that conversion is a one-time cost that can well be absorbed. if the library cannot afford the original conversion, at a cost of 6 or 7 cents a record, then the library cannot afford to proceed with automated programs. 2. there is no difference in cost between keypunching a machine readable record and in having the project undertaken by a service bureau. the use of paper-tape typewriter for conversion costs more than the other two methods. 3. large scale conversion of records to machine readable form might well be done by an outside organization. in order to get the task completed in a short period of time, a library would be required to hire a number of short-term clerical employees. in the case of michigan state, situated in the small community of east lansing, recruiting and training a large number of employees for short-term projects is most difficult. 
it is rather certain that the overhead for such a program would bring the cost beyond that of using a service bureau. on the basis of our experience it is recommended that the conversion be sent to a service bureau. 74 journal of library automation vol. 1/ 1 march, 1968 4. a library can get along without portions of. a shelf list for short periods of time. one of the predicted problems of sending material off campus to be converted was that of losing the availability of the shelf list records. although there were some inconveniences, it was found that the library could carry on its operations and function without the shelf list. certainly, this could not be done if the shelf list cards were gone for any length of time. acknowledgment a grant from the council on library resources, inc., made possible the study described in this paper. references 1. parker, ralph h.: "development of automatic systems at the university of missouri library," in university of illinois graduate school of library science, ptoceedings of the 1963 clinic on library applications of data processing. (champaign, ill.: illini union bookstore, 1964)' 43-55. 2. southern illinois university. office of systems and procedures: an automated circulation control system for the delyte w. morris library; the system and its progress in brief. (carbondale, ill.: southern illinois university, 1963). 3. the johns hopkins university. the milton s. eisenhower library: progress report on an operations research and systems engineering study of a university library. (baltimore: johns hopkins, 1965). 4. international business machines. federal systems division: report on a pilot project for converting the pre-1952 national union catalog to a machine readable record. (rockville, maryland: ibm, 1965). 5. chapin, richard e.: "administrative and economic considerations for library automation," in university of illinois graduate school of library science, proceedings of the 1967 clinic on applications of data processing. (in press). the next generation library catalog | yang and hofmann 141 sharon q. yang and melissa a. hofmann the next generation library catalog: a comparative study of the opacs of koha, evergreen, and voyager open source has been the center of attention in the library world for the past several years. koha and evergreen are the two major open-source integrated library systems (ilss), and they continue to grow in maturity and popularity. the question remains as to how much we have achieved in open-source development toward the next-generation catalog compared to commercial systems. little has been written in the library literature to answer this question. this paper intends to answer this question by comparing the next-generation features of the opacs of two open-source ilss (koha and evergreen) and one proprietary ils (voyager’s webvoyage). m uch discussion has occurred lately on the nextgeneration library catalog, sometimes referred to as the library 2.0 catalog or “the third generation catalog.”1 different and even conflicting expectations exist as to what the next-generation library catalog comprises: in two sentences, this catalog is not really a catalog at all but more like a tool designed to make it easier for students to learn, teachers to instruct, and scholars to do research. 
it provides its intended audience with a more effective means for finding and using data and information.2 such expectations, despite their vagueness, eventually took concrete form in 2007.3 among the most prominent features of the next-generation catalog are a simple keyword search box, enhanced browsing possibilities, spelling corrections, relevance ranking, faceted navigation, federated search, user contribution, and enriched content, just to mention a few. over the past three years, libraries, vendors, and open-source communities have intensified their efforts to develop opacs with advanced features. the next-generation catalog is becoming the current catalog. the library community welcomes open-source integrated library systems (ilss) with open arms, as evidenced by the increasing number of libraries and library consortia that have adopted or are considering opensource options, such as koha, evergreen, and the open library environment project (ole project). librarians see a golden opportunity to add features to a system that will take years for a proprietary vendor to develop. open-source opacs, especially that of koha, seem to be more innovative than their long-established proprietary counterparts, as our investigation shows in this paper. threatened by this phenomenon, ils vendors have rushed to improve their opacs, modeling them after the next-generation catalog. for example, ex libris pushed out its new opac, webvoyage 7.0, in august of 2008 to give its opac a modern touch. one interesting question remains. in a competition for a modernized opac, which opac is closest to our visions for the next-generation library catalog: opensource or proprietary? the comparative study described in this article was conducted in the hope of yielding some information on this topic. for libraries facing options between open-source and proprietary systems, “a thorough process of evaluating an integrated library system (ils) today would not be complete without also weighing the open source ils products against their proprietary counterparts.”3 ■■ scope and purpose of the study the purpose of the study is to determine which opac of the three ilss—koha, evergreen, or webvoyage—offers more in terms of services and is more comparable to the next-generation library catalog. the three systems include two open-source and one proprietary ilss. koha and evergreen are chosen because they are the two most popular and fully developed open-source ilss in north america. at the time of the study, koha had 936 implementations worldwide; evergreen had 543 library users.4 we chose webvoyage for comparison because it is the opac of the voyager ils by ex libris, the biggest ils vendor in terms of personnel and marketplace.5 it also is one of the more popular ilss in north america, with a customer base of 1,424 libraries, most of which are academic.6 as the sample only includes three ilss, the study is very limited in scope, and the findings cannot be extrapolated to all open-source and proprietary catalogs. but, hopefully, readers will gain some insight into how much progress libraries, vendors, and open-source communities have achieved toward the next-generation catalog. ■■ literature review a review of the library literature found two relevant studies on the comparison of opacs in recent years. the first study was conducted by two librarians in slovenia investigating how much progress libraries had made toward the next-generation catalog.7 six online catalogs sharon q. 
yang (yangs@rider.edu) is systems librarian and melissa a. hofmann (mhofmann@rider.edu) is bibliographic control librarian, rider university. 142 information technology and libraries | september 2010 were examined and evaluated, including worldcat, the slovene union catalog cobiss, and those of four public libraries in the united states. the study also compared services provided by the library catalogs in the sample with those offered by amazon. the comparison took place primarily in six areas: search, presentation of results, enriched content, user participation, personalization, and web 2.0 technologies applied in opacs. the authors gave a detailed description of the research results supplemented by tables and snapshots of the catalogs in comparison. the findings indicated that “the progress of library catalogues has really been substantial in the last few years.” specifically, the library catalogues have made “the best progress on the content field and the least in user participation and personalization.” when compared to services offered by amazon, the authors concluded that “none of the six chosen catalogues offers the complete package of examined options that amazon does.”8 in other words, library catalogs in the sample still lacked features compared to amazon. the other comparative study was conducted by linda riewe, a library school student, in fulfillment for her master’s degree from san jose university. the research described in her thesis is a questionnaire survey targeted at 361 libraries that compares open-source (specifically, koha and evergreen) and propriety ilss in north america. more than twenty proprietary systems were covered, including horizon, voyager, millennium, polaris, innopac, and unicorn.9 only a small part of her study was related to opacs. it involved three questions about opacs and asked librarians to evaluate the ease of use of their ils opac’s search engines, their opac search engine’s completeness of features, and their perception of how easy it is for patrons to make self-service requests online for renewals and holds. a scale of 1 to 5 was used (1 = least satisfied; 5= very satisfied) regarding the three aspects of opacs. the mean and medium satisfaction ratings for open-source opacs were higher than those of proprietary ones. koha’s opac was ranked 4.3, 3.9, and 3.9, respectively in mean, the highest on the scale in all three categories, while the proprietary opacs were ranked 3.9, 3.6, and 3.6.10 evergreen fell in the middle, still ahead of proprietary opacs. the findings reinforced the perception that open-source catalogs, especially koha, offer more advanced features than proprietary ones. as riewe’s study focused more on the cost and user satisfaction with ilss, it yielded limited information about the connected opacs. no comparative research has measured the progress of open-source versus proprietary catalogs toward the next-generation library catalog. therefore the comparison described in this paper is the first of its kind. as only koha, everygreen, and voyager’s opacs are examined in this paper, the results cannot be extrapolated. studies on a larger scale are needed to shed light on the progress librarians have made toward the next-generation catalog. ■■ method the first step of the study was identifing and defining of a set of measurements by which to compare the three opacs. 
a review of library literature on the next-generation library catalog revealed different and somewhat conflicting points of views as to what the nextgeneration catalog should be. as marshall breeding put it, “there isn’t one single answer. we will see a number of approaches, each attacking the problem somewhat differently.”11 this study decided to use the most commonly held visions, which are summarized well by breeding and by morgan’s lita executive summary.12 the ten parameters identified and used in the comparison were taken primarily from breeding’s introduction to the july/ august 2007 issue of library technology reports, “nextgeneration library catalogs.”13 the ten features reflect some librarians’ visions for a modern catalog. they serve as additions to, rather than replacements of, the feature sets commonly found in legacy catalogs. the following are the definitions of each measurement: ■■ a single point of entry to all library information: “information” refers to all library resources. the next-generation catalog contains not only bibliographical information about printed books, video tapes, and journal titles but also leads to the full text of all electronic databases, digital archives, and any other library resources. it is a federated search engine for one-stop searching. it not only allows for one search leading to a federation of results, it also links to full-text electronic books and journal articles and directs users to printed materials. ■■ state-of-the-art web interface: library catalogs should be “intuitive interfaces” and “visually appealing sites” that compare well with other internet search engines.14 a library’s opac can be intimidating and complex. to attract users, the next-generation catalog looks and feels similar to google, amazon, and other popular websites. this criterion is highly subjective, however, because some users may find google and amazon anything but intuitive or appealing. the underlying assumption is that some internet search engines are popular, and a library catalog should be similar to be popular themselves. ■■ enriched content: breeding writes, “legacy catalogs tend to offer text-only displays, drawing only on the marc record. a next-generation catalog might bring in content from different sources to strengthen the visual appeal and increase the amount of information presented to the user.”15 the enriched content the next generation library catalog | yang and hofmann 143 includes images of book covers, cd and movie cases, tables of contents, summaries, reviews, and photos of items that traditionally are not present in legacy catalogs. ■■ faceted navigation: faceted navigation allows users to narrow their search results by facets. the types of facets may include subjects, authors, dates, types of materials, locations, series, and more. many discovery tools and federated search engines, such as villanova university’s vufind and innovative interface’s encore, have used this technology in searches.16 auto-graphics also applied this feature in their opac, agent iluminar.17 ■■ simple keyword search box: the next-generation catalog looks and feels like popular internet search engines. the best example is google’s simple user interface. that means that a simple keyword search box, instead of a controlled vocabulary or specific-field search box, should be presented to the user on the opening page with a link to an advanced search for user in need of more complex searching options. 
■■ relevancy: traditional ranking of search results is based on the frequency and positions of terms in bibliographical records during keyword searches. relevancy has not worked well in opacs. in addition, popularity is another factor that has not been taken into consideration in relevancy ranking. for instance, “when ranking results from the library’s book collection, the number of times that an item has been checked out could be considered an indicator of popularity.”18 by the same token, the size and font of tags in a tag cloud or the number of comments users attach to an item may also be considered relevant in ranking search results. so far, almost no opacs are capable of incorporating circulation statistics into relevancy ranking. ■■ “did you mean . . . ?”: when a search term is not spelled correctly or nothing is found in the opac in a keyword search, the spell checker will kick in and suggest the correct spelling or recommend a term that may match the user’s intended search term. for example, a modern catalog may generate a statement such as “did you mean . . . ?” or “maybe you meant . . . .” this may be a very popular and useful service in modern opacs. ■■ recommendations and related materials: the nextgeneration catalog is envisioned as promoting reading and learning by making recommendations of additional related materials to patrons. this feature is an imitation of amazon and websites that promote selling by stating “customers who bought this item also bought . . . .” likewise, after a search in the opac, a statement such as “patrons who borrowed this book also borrowed the following books . . .” may appear. ■■ user contribution—ratings, reviews, comments, and tagging: legacy catalogs only allow catalogers to add content. in the next-generation catalog, users can be active contributors to the content of the opac. they can rate, write reviews, tag, and comment on items. user contribution is an important indicator for use and can be used in relevancy ranking. ■■ rss feeds: the next-generation catalog is dynamic because it delivers lists of new acquisitions and search updates to users through rss feeds. modern catalogs are service-oriented; they do more than provide a simple display search results. the second step is to apply these ten visions to the opacs of koha, evergreen, and webvoyage to determine if they are present or absent. the opacs used in this study included three examples from each system. they may have been product demos and live catalogs randomly chosen from the user list on the product websites. the latest releases at the time of the study was koha 3.0, evergreen 2.0, webvoyage 7.1. in case of discrepancies between product descriptions and reality, we gave precedence to reality over claims. in other words, even if the product documentation lists and describes a feature, this study does not include it if the feature is not in action either in the demo or live catalogs. despite the fact that a planned future release of one of those investigated opacs may add a feature, this study only recorded what existed at the time of the comparison. the following are the opacs examined in this paper. 
koha ■■ koho demo for academic libraries: http://academic .demo.kohalibrary.com/ ■■ wagner college: http://wagner.waldo.kohalibrary .com/ ■■ clearwater christian college: http://ccc.kohalibrary .com/ evergreen ■■ evergreen demo: http://demo.gapines.org/opac/ en-us/skin/default/xml/index.xml ■■ georgia pines: http://gapines.org/opac/en-us/ skin/default/xml/index.xml ■■ columbia bible college at http://columbiabc .evergreencatalog.com/opac/en-ca/skin/default/ xml/index.xml webvoyage ■■ rider university libraries: http://voyager.rider.edu ■■ renton college library: http://renton.library.ctc .edu/vwebv/searchbasic 144 information technology and libraries | september 2010 ■■ shoreline college library: http://shoreline.library .ctc.edu/vwebv/searchbasic the final step includes data collection and compilation. a discussion of findings follows. the study draws conclusions about which opac is more advanced and has more features of the next-generation library catalog. ■■ findings each of the opacs of koha, evergreen, and webvoyage are examined for the presence of the ten features of the next-generation catalog. single point of entry for all library information none of the opacs of the three ilss provides true federated searching. to varying degrees, each is limited in access, showing an absence of contents from electronic databases, digital archives, and other sources that generally are not located in the legacy catalog. of the three, koha is more advanced. while webvoyage and evergreen only display journal-holdings information in their opacs, koha links journal titles from its catalog to proquest’s serials solutions, thus leading users to fulltext journals in the electronic databases. the example in figure 1 (koha demo) shows the journal title unix update with an active link to the full-text journal in the availability field. the link takes patrons to serials solutions, where full text at the journal-title level is listed for each database (see figure 2). each link will take you into the full text in each database. state-of-the-art web interface as beauty is in the eye of the beholder, the interface of a catalog can be appealing to one user but prohibitive to another. with this limitation in mind, the out-of-thebox user interface at the demo sites was considered for each opac. all the three catalogs have the google-like simplicity in presentation. all of the user interfaces are highly customizable. it largely depends on the library to make the user interface appealing and welcoming to users. figures 3–5 show snapshots from each ilss demo sites and have not been customized. however, there are a few differences in the “state of the art.” for one, koha’s navigation between screens relies solely on the browser’s forward and back buttons, while webvoyage and evergreen have internal navigation buttons that more efficiently take the user between title lists, headings lists, and record displays, and between records in a result set. while all three opacs offer an advanced search page with multiple boxes for entering search terms, only webvoyage makes the relationship between the terms in different boxes clear. by the use of a drop-down box, it makes explicit that the search terms are by default anded and also allows for the selection of or and not. in koha’s and evergreen’s advanced search, however, the terms are anded only, a fact that is not at all obvious to the user. in the demo opacs examined, there is no option to choose or or not between rows, nor is there any indication that the search is anded. 
the point of providing multiple search boxes is to guide users in constructing a boolean search without their having to worry about operators and syntax. in koha, however, users have to type an or or not statement themselves within the text box, thus defeating the purpose of having multiple boxes. while evergreen allows for a not construction within a row (“does not contain”), it does not provide an option for or (“contains” and “matches exactly” are the other two options available). see figures 6–8. thus koha’s and evergreen’s advanced search is less than intuitive for users and certainly less functional than webvoyage’s.

figure 1. link to full-text journals in serials solutions in koha
figure 2. links to serials solutions from koha
figure 3. koha: state-of-the-art user interface
figure 4. evergreen: state-of-the-art user interface
figure 5. voyager: state-of-the-art user interface
figure 6. voyager advanced search
figure 7. koha advanced search
figure 8. evergreen advanced search

enriched content

to varying degrees, enriched content is present in all three catalogs, with koha providing the most. while all three catalogs have book covers and movie-container art, koha has much more in its catalog. for instance, it displays tags, descriptions, comments, and amazon reviews. webvoyage displays links to google books for book reviews and content summaries but does not have tags, descriptions, and comments in the catalog. see figures 9–11.

figure 9. koha enriched content
figure 10. evergreen enriched content
figure 11. voyager enriched content

faceted navigation

the koha opac is the only catalog of the three to offer faceted navigation. the “refine your search” feature allows users to narrow search results by availability, places, libraries, authors, topics, and series. clicking on a term within a facet adds that term to the search query and generates a narrower list of results. the user may then choose another facet to further refine the search. while evergreen appears to have faceted navigation upon first glance, it actually does not possess this feature. the following facets appear after a search generates hits: “relevant subjects,” “relevant authors,” and “relevant series.” but choosing a term within a facet does not narrow down the previous search. instead, it generates an entirely new search with the selected term; it does not add the new term to the previous query. users must manually combine the terms in the simple search box or through the advanced search page. webvoyage also does not offer faceted navigation—it only provides an option to “filter your search” by format, language, and date when a set of results is returned. see figures 12–14.

figure 12. koha faceted navigation
figure 13. evergreen faceted navigation
figure 14. voyager faceted navigation

keyword searching

koha, evergreen, and webvoyage all present a simple keyword search box with a link to the advanced search (see figures 3–5).

relevancy

neither koha, evergreen, nor webvoyage provides any evidence for meeting the criteria of the next-generation catalog’s more inclusive vision of relevancy ranking, such as accounting for an item’s popularity or allowing user tags. koha uses index data’s zebra program for its relevance ranking, which “reads structured records in a variety of input formats . . . and allows access to them through exact boolean search expressions and relevance-ranked free-text queries.”19 evergreen’s dokuwiki states that the base relevancy score is determined by the cover density of the searched terms. after this base score is determined, items may receive score bumps based on word order, matching on the first word, and exact matches depending on the type of search performed.20 these statements do not indicate that either koha or evergreen go beyond the traditional relevancy-ranking methods of legacy systems, such as webvoyage.

did you mean . . . ?

only evergreen has a true “did you mean . . . ?” feature. when no hits are returned, evergreen provides a suggested alternate spelling (“maybe you meant . . . ?”) as well as a suggested additional search (“you may also like to try these related searches . . .”). koha has a spell-check feature, but it automatically normalizes the search term and does not give the option of choosing a different one. this is not the same as a “did you mean . . . ?” feature as defined above. while the normalizing process may be seamless, it takes the power of choice away from the user and may be problematic if a particular alternative spelling or misspelling is searched purposefully, such as “womyn.” (when “womyn” is searched as a keyword in the koha demo opac, 16,230 hits are returned. this catalog does not appear to contain the term as spelled, which is why it is normalized to women. the fact that the term does not appear as is may not be transparent to the searcher.) with normalization, the user may also be unaware that any mistake in spelling has occurred, and the number of hits may differ between the correct spelling and the normalized spelling, potentially affecting discovery. the normalization feature also only works with particular combinations of misspellings, where letter order affects whether a match is found. otherwise the system returns a “no result found!” message with no suggestions offered. (try “homoexuality” vs. “homoexsuality.” in koha’s demo opac, the former, with a missing “s,” yields 553 hits, while the latter, with a misplaced “s,” yields none.) however, koha is a step ahead of webvoyage, which has no built-in spell checker at all. if a search fails, the system returns the message “search resulted in no hits.” see figures 15–17.

figure 15. evergreen: did you mean . . . ?
figure 16. koha: did you mean . . . ?
figure 17. voyager: did you mean . . . ?

recommendations/related materials

none of the three online catalogs can recommend materials for users.

user contributions

koha is the only system of the three that allows users to add tags, comments, descriptions, and reviews. in koha’s opac, user-added tags form tag clouds, and the font and size of each keyword or tag indicate that keyword or tag’s frequency of use. all the tags in a tag cloud serve as hyperlinks to library materials. users can write their own reviews to complement the amazon reviews. all user-added reviews, descriptions, and comments have to be approved by a librarian before they are finalized for display in the opac. nevertheless, the user contribution in the koha opac is not easy to use. it may take many clicks before a user can figure out how to add or edit text. it requires user login, and the system cannot keep track of the search hits after a login takes place. therefore the user contribution features of koha need improvement. see figure 18.

figure 18. koha user contributions

rss feeds

koha provides rss feeds, while evergreen and webvoyage do not.

■■ conclusion

table 1 is a summary of the comparisons in this paper. these comparisons show that the koha opac has six out of the ten compared features for the next-generation catalog, plus two halves. its full-fledged features include state-of-the-art web interface, enriched content, faceted navigation, a simple keyword search box, user contribution, and rss feeds.

table 1. summary of features of the next-generation catalog

                                                     koha      evergreen    voyager
single point of entry for all library information    partial   no           no
state-of-the-art web interface                       yes       yes          yes
enriched content                                     yes       yes          yes
faceted navigation                                   yes       no           no
keyword search                                       yes       yes          yes
relevancy                                            no        no           no
did you mean . . . ?                                 partial   yes          no
recommended/related materials                        no        no           no
user contribution                                    yes       no           no
rss feed                                             yes       no           no

the two halves indicate the existence of a feature that is not fully developed. for instance, “did you mean . . . ?” in koha does not work the way the next-generation catalog is envisioned. in addition, koha has the capability of linking journal titles to full text via serials solutions, while the other two opacs only display holdings information. evergreen falls into second place, providing four out of the ten compared features: state-of-the-art interface, enriched content, a keyword search box, and “did you mean . . . ?” webvoyage, the voyager opac from ex libris, comes in third, providing only three out of the ten features for the next-generation catalog. based on the evidence, koha’s opac is more advanced and innovative than evergreen’s or voyager’s. among the three catalogs, the open-source opacs compare more favorably to the ideal next-generation catalog than the proprietary opac. however, none of them is capable of federated searching. only koha offers faceted navigation. webvoyage does not even provide a spell checker. the ils opac still has a long way to go toward the next-generation catalog. though this study samples only three catalogs, hopefully the findings will provide a glimpse of the current state of open-source versus proprietary catalogs. ils opacs are not comparable in features and functions to stand-alone opacs, also referred to as “discovery tools” or “layers.” some discovery tools, such as ex libris’ primo, also are federated search engines and are modeled after the next-generation catalog. recently they have become increasingly popular because they are bolder and more innovative than ils opacs. two of the best stand-alone open-source opacs are villanova university’s vufind and oregon state university’s libraryfind.21 both boast eight out of ten features of the next-generation catalog.22 technically it is easier to develop a new stand-alone opac with all the next-generation catalog features than to mend old ils opacs. as more and more libraries are disappointed with their ils opacs, more discovery tools will be implemented. vendors will stop improving ils opacs and concentrate on developing better discovery tools. the fact that ils opacs are falling behind current trends may eventually bear no significance for libraries—at least for the ones that can afford the purchase or implementation of a more sophisticated discovery tool or stand-alone opac. certainly small and public libraries who cannot afford a discovery tool or a programmer for an open-source opac overlay will suffer, unless market conditions change.

references

1. tanja merčun and maja žumer, “new generation of catalogues for the new generation of users: a comparison of six library catalogues,” program: electronic library & information systems 42, no. 3 (july 2008): 243–61.
2. eric lease morgan, “a ‘next-generation’ library catalog—executive summary (part #1 of 5),” online posting, july 7, 2006, lita blog: library information technology association, http://litablog.org/2006/07/07/a-next-generation-library-catalog-executive-summary-part-1-of-5/ (accessed nov. 10, 2008).
3. marshall breeding, introduction to “next generation library catalogs,” library technology reports 43, no. 4 (july/aug. 2007): 5–14.
4. ibid.
5. marshall breeding, “library technology guides: key resources in the field of library automation,” http://www.librarytechnology.org/lwc-search-advanced.pl (accessed jan. 23, 2010).
6. marshall breeding, “investing in the future: automation marketplace 2009,” library journal (apr. 1, 2009), http://www.libraryjournal.com/article/ca6645868.html (accessed jan. 23, 2010).
7. marshall breeding, “library technology guides: company directory,” http://www.librarytechnology.org/exlibris.pl?sid=20100123734344482&code=vend (accessed jan. 23, 2010).
8. merčun and žumer, “new generation of catalogues.”
9. ibid.
10. linda riewe, “integrated library system (ils) survey: open source vs. proprietary-tables” (master’s thesis, san jose university, 2008): 2–5, http://users.sfo.com/~lmr/ils-survey/tables-all.pdf (accessed nov. 4, 2008).
11. ibid., 26–27.
12. breeding, introduction.
13. ibid.; morgan, “a ‘next-generation’ library catalog.”
14. breeding, introduction.
15. ibid.
16. ibid.
17. villanova university, “vufind,” http://vufind.org/ (accessed june 10, 2010); innovative interfaces, “encore,” http://encoreforlibraries.com/ (accessed june 10, 2010).
18. auto-graphics, “agent iluminar,” http://www4.auto-graphics.com/solutions/agentiluminar/agentiluminar.htm (accessed june 10, 2010).
19. breeding, introduction; morgan, “a ‘next-generation’ library catalog.”
20. index data, “zebra,” http://www.indexdata.dk/zebra/ (accessed jan. 3, 2009).
21. evergreen dokuwiki, “search relevancy ranking,” http://open-ils.org/dokuwiki/doku.php?id=scratchpad:opac_demo&s=core (accessed dec. 19, 2008).
22. villanova university, “vufind”; oregon state university, “libraryfind,” http://libraryfind.org/ (accessed june 10, 2010).
23. sharon q. yang and kurt wagner, “open source standalone opacs” (microsoft powerpoint presentation, 2010 virtual academic library environment annual conference, piscataway, new jersey, jan. 8, 2010).

60 information technology and libraries | june 2011

because this is a family program and because we are all polite people, i can’t really use the term i want to here. let’s just say that i am an operating system [insert term here for someone who is highly promiscuous]. i simply love to install and play around with various operating systems, primarily free operating systems (oses), primarily linux distributions. and the more exotic, the better, even though i always dutifully return home at the end of the evening to my beautiful and beloved ubuntu.
in the past year or two i can recall installing (and in some cases actually using) the following: gentoo, mint, fedora, debian, moonos, knoppix, damn small linux, easypeasy, ubuntu netbook remix, xubuntu, opensuse, netbsd, sabayon, simplymepis, centos, geexbox, and reactos. (aside from stock ubuntu and all things canonical, the one i keep a constant eye on is moonos [http://www.moonos.org/], a stunningly beautiful and eminently usable ubuntu-based remix by a young artist and programmer in cambodia, chanrithy thim.) in the old days i would have rustled up an old, sloughed-off pc to use as an experimental “server” upon which i would unleash each of these oses, one at a time. but those were the old days, and these are the new days. my boss kindly bought me a big honkin’ windows-based workstation about a year and a half ago, a box with plenty of processing power and memory (can you even buy a new workstation these days that’s not incredibly powerful, and incredibly inexpensive?), so my need for hardware above and beyond what i use in my daily life is mitigated. specifically, it’s mitigated through use of virtual machines. i have long used virtualbox (http://www.virtualbox .org/) to create virtual machines (vms), lopped-off hunks of ram and disk space to be used for the installation of a completely different os. with virtualbox, you first describe the specifications of the vm you’d like to create—how much of the host’s ram to provide, how large a virtual hard disk, boot order, access to host cd drives, usb devices, etc. you click a button to create it, then you install an os onto it, the “guest” os, in the usual way. (well, not exactly the usual way; it’s actually easier to install an os here because you can boot directly from a cd image, or iso file, negating the need to mess with anything so distasteful and old-fashioned and outre as an actual, physical cd-rom.) in my experience, you can create a new vm in mere seconds; then it’s all a matter of how difficult the os is to install, and the linux distributions are becoming easier and easier to install as the months plow on. at any rate, as far as your new os is concerned, it is being installed on bare metal. virtual? real? for most intents and purposes the guest os knows no difference. in the titillatingly dangerous and virus-ridden cyberworld in which we live, i’ll not mention the prophylactic uses of vms because, again, this is a family program and we’re all polite people. suffice it to say, the typical network connection of a vm is nated behind the nic of the host machine, so at least as far as active network– based attacks are concerned, your guest vm is at least as secure as its host, even more so because it sits in its own private network space. avoiding software-based viruses and trojans inside your vm? let’s just say that the wisdom passed down the cybergenerations still holds: when it rains, you wear a raincoat—if you see what i’m saying. aside from enabling, even promoting my shameless os promiscuity, how are vms useful in an actual work setting? for one, as a longtime windows guy, if i need to install and test something that is *nix-only, i don’t need a separate box with which to do so. (and vice versa too for all you unix-weaned ladies and gentlemen who find the need to test something on a rocker from redmond.) 
if there is a software dependency on a particular os, a particular version of a particular os, or even if the configuration of what i’m trying to test is so peculiar i just don’t want to attempt to mix it in with an existing, stable vm, i can easily and painlessly whip up a new instance of the required os and let it fly. and deleting all this when i’m done is easily accomplished within the virtualbox gui. using a virtual machine facilitates the easy exploration of new operating systems and new applications, and moving toward using virtual machines is similar to when i first started using a digital camera. you are free to click click click with no further expense accrued. you don’t like what you’ve done? blow it away and begin anew. all this vm business has spread, at my home institution, from workstation to data center. i now run both a development and test server on vms physically sitting on a massive production server in our data center—the kind of machine that when switched on causes a brown-out in the tri-state area. this is a very efficient way to do things though because when i needed access to my own server, our system administrator merely whipped up a vm for me to use. to me, real or virtual, it was all the same; to the system administrator, it greatly simplified operations. and i may joke about the loud clank of the host server’s power switch and subsequent dimming of the lights, but doing things this way has been shown to be more energy efficient than running a server farm in which each server editorial board thoughts: just like being there, or how i learned to stop coveting bare metal and learned to love my vm mark cyzyk (mcyzyk@jhu.edu) is the scholarly communication architect in the sheridan libraries, johns hopkins university, baltimore, maryland. mark cyzyk editorial board thoughts | cyzyk 61 virtual machines: zero-cost playgrounds for the promiscuous, and energy efficient, staff saving tools for system operations. what’s not to like? throw dual monitors into the mix (one for the host os; one for the guest), and it’s just like being there. sucks in enough juice to quench the thirst of its redundant power supplies. (they’re redundant, they repeat themselves; they’re redundant, they repeat themselves—so you don’t want too many of them around slurping up the wattage, slurping up the wattage . . . ) 100 information technology and libraries | june 2009 tutorial andrew darby and ron gilmour adding delicious data to your library website social bookmarking services such as delicious offer a simple way of developing lists of library resources. this paper outlines various methods of incorporating data from a delicious account into a webpage. we begin with a description of delicious linkrolls and tagrolls, the simplest but least flexible method of displaying delicious results. we then describe three more advanced methods of manipulating delicious data using rss, json, and xml. code samples using php and javascript are provided. o ne of the primary components of web 2.0 is social bookmarking. social bookmarking services allow users to store bookmarks on the web where they are available from any computer and to share these bookmarks with other users. even better, these bookmarks can be annotated and tagged to provide multiple points of subject access. social bookmarking services have become popular with librarians as a means of quickly assembling lists of resources. 
since anything with a url can become a bookmark, such lists can combine diverse resource types such as webpages, scholarly articles, and library catalog records. it is often desirable for the data stored in a social bookmarking account to be displayed in the context of a library webpage. this creates consistent branding and a more professional appearance. delicious (http://delicious .com/), one of the most popular social bookmarking tools, allows users to extract data from their accounts and to display this data on their own websites. delicious offers multiple ways of doing this, from simply embedding html in the target webpage to interacting with the api.1 in this paper we will begin by looking at the simplest methods for users uncomfortable with programming, and then move on to three more advanced methods using rss, json, and xml. our examples use php, a cross-platform scripting language that may be run on either linux/ unix or windows servers. while it is not possible for us to address the many environments (such as cmses) in which websites are constructed, our code should be adaptable to most contexts. this will be especially simple in the many popular php–based cmses such as drupal, joomla, and wordpress. it should be noted that the process of tagging resources in delicious requires little technical expertise, so the task of assembling lists of resources can be accomplished by any librarian. the construction of a website infrastructure (presumably by the library’s webmaster) is a more complex task that may require some programming expertise. linkrolls and tagrolls the simplest way of sharing links is to point users directly to the desired andrew darby (adarby@ithaca.edu) is web services librarian, and ron gilmour (rgilmour@ithaca.edu) is science librarian at ithaca college library, ithaca, new york. figure 1. delicious linkroll page adding delicious data to your library website | darby and gilmour 101 delicious page. to share all the items labeled “biology” for the user account “iclibref,” one could disseminate the url http://delicious.com/iclibref/ biology. the obvious downside is that the user is no longer on your website, and they may be confused by their new location and what they are supposed to do there. linkrolls, a utility available from the delicious site, provides a number of options for generating code to display a set of bookmarked links, including what tags to display, the number, the type of bullet, and the sorting criterion (see figure 1).2 this utility creates simple html code that can be added to a website. a related tool, tagrolls, creates the ubiquitous delicious tag cloud.3 for many librarians, this will be enough. with the embedded linkroll code, and perhaps a bit of css styling, they will be satisfied with the results. however, delicious also offers more advanced methods of interacting with data. for more control over how delicious data appears on a website, the user must interact with delicious through rss, json or xml. rss like most web 2.0 applications, delicious makes its content available as rss feeds. feeds are available at a variety of levels, from the delicious system as a whole down to a particular tag in a particular account. within a library context, the most useful types of feeds will be those that point to lists of resources with a given tag. 
for example, the request http://feeds.delicious.com/rss/iclibref/biology returns the rss feed for the “biology” tag of the “iclibref” account, with items listed as follows:

<item rdf:about="http://icarus.ithaca.edu/cgi-bin/pwebrecon.cgi?bbid=237870">
  <title>darwin's dangerous idea (evolution 1)</title>
  <dc:date>2008-04-09T18:40:00Z</dc:date>
  <link>http://icarus.ithaca.edu/cgi-bin/pwebrecon.cgi?bbid=237870</link>
  <dc:creator>iclibref</dc:creator>
  <description>this episode interweaves the drama in key moments of darwin&#039;s life with documentary sequences of current research, linking past to present and introducing major concepts of evolutionary theory. 2001</description>
  <dc:subject>biology</dc:subject>
</item>

to display delicious rss results on a website, the webmaster must use some rss parsing tool in combination with a script to display the results. the xml_rss package provides an easy way to read rss using php.4 the code for such an operation might look like this:

<?php
require_once "XML/RSS.php";
$rss = new XML_RSS("http://feeds.delicious.com/rss/iclibref/biology");
$rss->parse();
echo "<ul>";
foreach ($rss->getItems() as $item) {
    echo "<li><a href=\"" . $item['link'] . "\">" . $item['title'] . "</a></li>";
}
echo "</ul>";
?>

this code uses xml_rss to parse the rss feed and then prints out a list of linked results. rss is designed primarily as a current awareness tool. consequently, a delicious rss feed only returns the most recent thirty-one items. this makes sense from an rss perspective, but it will not often meet the needs of librarians who are using delicious as a repository of resources. despite this limitation, the delicious rss feed may be useful in cases where currency is relevant, such as lists of recently acquired materials.

json

a second method to retrieve results from delicious is using javascript object notation or json.5 as with the rss feed method, a request with credentials goes out to the delicious server. the response returns in json format, which can then be processed using javascript. an example request might be http://feeds.delicious.com/v2/json/iclibref/biology. by navigating to this url, the json response can be observed directly. a json response for a single record (formatted for readability) looks like this:

delicious.posts = [
  {"u":"http:\/\/icarus.ithaca.edu\/cgi-bin\/pwebrecon.cgi?bbid=237870",
   "d":"darwin's dangerous idea (evolution 1)",
   "t":["biology"],
   "dt":"2008-04-09T06:40:00Z",
   "n":"this episode interweaves the drama in key moments of darwin's life with documentary sequences of current research, linking past to present and introducing major concepts of evolutionary theory. 2001"}
];

it is instructive to look at the json feed because it displays the information elements that can be extracted: “u” for the url of the resource, “d” for the title, “t” for a comma-separated list of related tags, “n” for the note field, and “dt” for the timestamp. to display results in a webpage, the feed is requested using javascript. then the json objects must be looped through and displayed as desired. alternately, as in the script below, the json objects may be placed into an array for sorting. the following is a simple example of a script that displays all of the available data with each item in its own paragraph. this script also sorts the links alphabetically. while rss returns a maximum of thirty-one entries, json allows a maximum of one hundred. the exact number of items returned may be modified through the count parameter at the end of the url. at the ithaca college library, we chose to use json because at the time, delicious did not offer the convenient tagrolls, and the results returned by rss were displayed in reverse chronological order and truncated at thirty-one items.
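a minimal sketch of the kind of sorting-and-display script described above, assuming the feed has been included with a script tag and has populated the delicious.posts array shown earlier (the count value, variable names, and markup below are illustrative assumptions, not taken from the original page):

<script type="text/javascript" src="http://feeds.delicious.com/v2/json/iclibref/biology?count=100"></script>
<script type="text/javascript">
// delicious.posts is filled in by the feed script included above.
// copy it, sort alphabetically by title ("d"), then write one paragraph per bookmark.
var posts = delicious.posts.slice();
posts.sort(function (a, b) {
  return a.d.toLowerCase().localeCompare(b.d.toLowerCase());
});
for (var i = 0; i < posts.length; i++) {
  var p = posts[i];
  var tags = p.t ? p.t.join(", ") : "";
  document.write("<p><a href=\"" + p.u + "\">" + p.d + "</a><br />" +
                 (p.n || "") + "<br />tags: " + tags + " (" + p.dt + ")</p>");
}
</script>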
currently, we have a single php page that can display any delicious result set within our library website template. librarians generate links with parameters that designate a page title, a comma-delimited list of desired tags, and whether or not item descriptions should be displayed. for example, www.ithacalibrary.com/research/delish_feed.php?label=biology%20films&tag=biology,biologyi&notes=yes will return a page that looks like figure 2. the advantage of this approach is that librarians can easily generate webpages on the fly and send the url to their faculty members or add it to a subject guide or other webpage. the php script only has to read the "$_GET" variables from the url and then query delicious for this content.

xml

delicious offers an application programming interface (api) that returns xml results from queries passed to delicious through https. for instance, the request https://api.del.icio.us/v1/posts/recent?&tag=biology returns an xml document listing the fifteen most recent posts tagged as "biology" for a given account. unlike either the rss or the json methods, the xml api offers a means of retrieving all of the posts for a given tag by allowing requests such as https://api.del.icio.us/v1/posts/all?&tag=biology. this type of request is labor intensive for the delicious server, so it is best to cache the results of such a query for future use. this involves the user writing the results of a request to a file on the server and then checking to see if such an archived file exists before issuing another request. a php utility called deliciousposts, which provides caching functionality, is available for free.6 note that the username is not part of the request and must be supplied separately. unlike the public rss or json feeds, using the xml api requires users to log in to their own account. from a script, this can be accomplished using the php curl functions:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $queryurl);
curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$posts = curl_exec($ch);
curl_close($ch);

this code logs into a delicious account, passes it a query url, and makes the results of the query available as a string in the variable $posts. the content of $posts can then be processed as desired to create web content. one way of doing this is to use an xslt stylesheet to transform the results into html, which can then be printed to the browser:

/* create a new dom document from your stylesheet */
$xsl = new DOMDocument;
$xsl->load("mystylesheet.xsl");

/* set up the xslt processor */
$xp = new XSLTProcessor;
$xp->importStyleSheet($xsl);

/* create another dom document from the contents of the $posts variable */
$doc = new DOMDocument;
$doc->loadXML($posts);

/* perform the xslt transformation and output the resulting html */
$html = $xp->transformToXML($doc);
echo $html;

conclusion

delicious is a great tool for quickly and easily saving bookmarks. it also offers some very simple tools such as linkrolls and tagrolls to add delicious content to a website. but to exert more control over this data, the user must interact with the delicious api or feeds. we have outlined three different ways to accomplish this: rss is a familiar option and a good choice if the data is to be used in a feed reader, or if only the most recent items need be shown. json is perhaps the fastest method, but requires some basic scripting knowledge and can only display one hundred results.
the xml option involves more programming but allows an unlimited number of results to be returned. all of these methods facilitate the use of delicious data within an existing website. references 1. delicious, tools, http://delicious.com/help/tools (accessed nov. 7, 2008). 2. linkrolls may be found from your delicious account by clicking settings > linkrolls, or directly by going to http://delicious.com/help/linkrolls (accessed nov. 7, 2008). 3. tagrolls may be found from your delicious account by clicking settings > tagrolls, or directly by going to http://delicious.com/help/tagrolls (accessed nov. 7, 2008). 4. martin jansen and clay loveless, "pear::package::xml_rss," http://pear.php.net/package/xml_rss (accessed nov. 7, 2008). 5. introducing json, http://json.org (accessed nov. 7, 2008). 6. ron gilmour, "deliciousposts," http://rongilmour.info/software/deliciousposts (accessed nov. 7, 2008). copyright: regulation out of line with our digital reality? abigail j. mcdermott information technology and libraries | march 2012 abstract this paper provides a brief overview of the current state of copyright law in the united states, focusing on the negative impacts of these policies on libraries and patrons. the article discusses four challenges current copyright law presents to libraries and the public in general, highlighting three concrete ways intellectual property law interferes with digital library services and systems. finally, the author suggests that a greater emphasis on copyright literacy and a commitment among the library community to advocate for fairer policies is vital to correcting the imbalance between the interests of the public and those of copyright holders. introduction in july 2010, the library community applauded when librarian of congress james h. billington announced new exemptions to the digital millennium copyright act (dmca). those with visual disabilities and the librarians who serve them can now circumvent digital rights management (drm) software on e-books to activate a read-aloud function.1 in addition, higher education faculty in departments other than film and media studies can now break through drm software to include high-resolution film clips in class materials and lectures. however, their students cannot, since only those who are pursuing a degree in film can legally do the same.2 that means that english students who want to legally include high-resolution clips from the critically acclaimed film sense and sensibility in their final projects on jane austen's novel will have to wait another three years, when the librarian of congress will again review the dmca. the fact that these new exemptions to the dmca were a cause for celebration is one indicator of the imbalanced state of the copyright regulations that control creative intellectual property in this country. as the consumer-advocacy group public knowledge asserted, "we continue to be disappointed that the copyright office under the digital millennium copyright act can grant extremely limited exemptions and only every three years. this state of affairs is an indication that the law needs to be changed."3 this paper provides a brief overview of the current state of u.s. copyright law, especially developments during the past fifteen years, with a focus on the negative impact these policies have had and will continue to have on libraries, librarians, and the patrons they serve.
abigail j. mcdermott (ajmcderm@umd.edu) is graduate research associate, the information policy and access center (ipac), and master's candidate in library science, university of maryland, college park. this paper does not provide a comprehensive and impartial primer on copyright law, a complex and convoluted topic, instead identifying concerns about the effects an out-of-balance intellectual property system is having on the library profession, library services, and creative expression in our digital age. as with any area of public policy, the battles over intellectual property issues create an ever-fluctuating copyright environment, and therefore, this article is written to be current with policy developments as of october 2011. finally, this paper recommends that librarians seek to better educate themselves about copyright law, and some innovative responses to an overly restrictive system, so that we can effectively advocate on our own behalf, and better serve our patrons. the state of u.s. copyright law copyright law is a response to what is known as the "progress clause" of the constitution, which charges congress with the responsibility "to promote the progress of science and the useful arts . . . to this end, copyright assures authors the right to their original expression, but encourages others to build freely upon the ideas and information conveyed by a work."4 fair use, a statutory exception to u.s. copyright law, is a complex subject, but a brief examination of the principle gets to the heart of copyright law itself. when determining fair use, courts consider 1. the purpose and character of the use; 2. the nature of the copyrighted work; 3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and 4. the effect of the use upon the potential market for the copyrighted work.5 while fair use is an "affirmative defense" to copyright infringement,6 invoking fair use is not the same as admitting to copyright infringement. teaching, scholarship, and research, as well as instances in which the use is not-for-profit and noncommercial, are all legitimate examples of fair use, even if fair use is determined on a case-by-case basis.7 despite the byzantine nature of copyright law, there are four key issues that present the greatest challenges and obstacles to librarians and people in general: the effect of the dmca on the principle of fair use; the dramatic extension of copyright terms codified by the sonny bono copyright term extension act; the disappearance of the registration requirement for copyright holders; and the problem of orphan works. the digital millennium copyright act (dmca) the dmca has been controversial since its passage in 1998. title i of the dmca implements two 1996 world intellectual property organization (wipo) treaties that obligate member states to enforce laws that make tampering with drm software illegal. the dmca added chapter 12 to the u.s. copyright act (17 u.s.c.
§§ 1201–1205), and it criminalized the trafficking of “technologies designed to circumvent access control devices protecting copyrighted material from unauthorized information technology and libraries | march 2012 9 copying or use.”8 while film studios, e-book publishers, and record producers have the right to protect their intellectual property from illegal pirating, the dmca struck a serious blow to the principle of fair use, placing librarians and others who could likely claim fair use when copying a dvd or pdf file in a catch-22 scenario. while the act of copying the file may be legal according to fair use, breaking through any drm technology that prevents that copying is now illegal.9 the sonny bono copyright term extension act while the copyright act of 1790 only provided authors and publishers with twenty-eight years of copyright protection, the sonny bono copyright term extension act of 1998 increased the copyright terms of all copyrighted works that were eligible for renewal in 1998 to ninety-five years after the year of the creator’s death. in addition, all works copyrighted on or after january 1, 1978, now receive copyright protection for the life of the creator plus seventy years (or ninety-five years from the date of publication for works produced by multiple creators).10 jack valenti, former president of the motion picture association of american, was not successful in pushing copyright law past the bounds of the constitution, which mandates that copyright be limited, although he did try to circumvent this constitutional requirement by suggesting that copyright terms last forever less one day.11 the era of automatic copyright registration perhaps the most problematic facet of modern u.s. copyright law appears at first glance to be the most innocuous. the copyright act of 1976 did away with the registration requirement established by the copyright act of 1790.12 that means that any creative work “fixed in any tangible medium of expression” is automatically copyrighted at the moment of its creation.13 that includes family vacation photos stored on a computer hard drive; they are copyrighted and your permission is required to use them. the previous requirement of registration meant authors and creators had to actively register their works, so anything that was not registered entered the public domain, replenishing that important cultural realm.14 now that copyright attaches at the moment an idea is expressed through a cocktail napkin doodle or an outline, virtually nothing new enters the public domain until its copyright term expires—at least seventy years later. in fact, nothing new will enter the public domain through copyright expiration until 2019. until then, the public domain is essentially frozen in the year 1922.15 the problem of orphan works in addition, the incredibly long copyright terms that apply to all books, photographs, and sound recordings have created the problem of orphan works. orphan works are those works that are under copyright protection, but whose owners are difficult or impossible to locate, often due to death.16 these publications are problematic for researchers, librarians, and the public in general: orphan works are perceived to be inaccessible because of the risk of infringement liability that a user might incur if and when a copyright owner subsequently appears. 
consequently, many works that are, copyright: regulation out of line with our digital reality | mcdermott 10 in fact, abandoned by owners are withheld from public view and circulation because of uncertainty about the owner and the risk of liability.17 if copyright expired with the death of the author, or if there were a clause that would allow these works to pass into the public domain if the copyright holder’s heirs did not actively renew copyright for another term, then these materials would be far less likely to fall into legal limbo. currently, many are protected despite the fact that acquiring permission to use them is all but impossible. a study of orphan works in the collections of united kingdom public sector institutions found that these works are likely to have little commercial value, but high “academic and cultural significance,” and when contacted, these difficult-to-trace rights holders often grant permission for reproduction without asking for compensation.18 put another way, orphan works are essentially “locking up culture and other public sector content and preventing organizations from serving the public interest.”19 the row that arose in september 2011 between the hathitrust institutions and the authors guild over the university of michigan’s orphan works digitization project, with j. r. salamanca’s longout-of-print 1958 novel the lost country serving as the pivot point in the dispute, is an example of the orphan works problem. the fact that university of michigan associate university librarian john price wilkin was forced to assure the public that “no copyrighted books were made accessible to any students” illustrates the absurdity in arguing over whether it’s right to digitize books that are no longer even accessible in their printed form.20 libraries, digitization, and copyright law: the quiet crisis while one can debate if u.s. copyright law is still oriented toward the public good, the more relevant question in this context is the effect copyright law has on the library profession. drm technology can get in the way of serving library patrons with visual disabilities and every library needs to place a copyright disclaimer on the photocopiers, but how much more of a stumbling block is intellectual property law to librarians in general, and the advance of library systems and technology in particular? the answer is undeniably that current u.s. copyright legislation places obstacles in the way of librarians working in all types of libraries. while there are many ways that copyright law affects library services and collections in this digital area, three challenges are particularly pressing: the problem of ownership and licensing of digital content or collections; the librarian as de facto copyright expert; and copyright law as it relates to library digitization programs generally, and the google book settlement in particular. digital collections: licenses replace ownership in the past, people bought a book, and they owned that copy. there was little they could accidentally or unknowingly do to infringe on the copyright holder’s rights. 
likewise, when physical collections were their only concern, librarians could rely on sections 108 and 109 of the copyright law to protect them from liability when they copied a book or other work and when they loaned materials in their collections to patrons.21 today, we live partly in the physical world and partly in the digital world, reaching out and connecting to each other across fiber optic lines in the same way we once did around the water cooler. likewise, the digital means of production are widely distributed. in a multimedia world, where sharing an informative or entertaining video clip is as easy as embedding a link onto someone's facebook wall, the temptation to infringe on rights by distributing, reproducing, or displaying a creative work is all too common, and all too easy.22 many librarians believe that disclaimers on public-access computer terminals will protect them from lawsuit, but they do not often consider placing such disclaimers on their cd or dvd collections. yet a copyright holder would not have to prove the library is aware of piracy to accuse the library of vicarious infringement of copyright. the copyright holder may even be able to argue that the library sees some financial gain from this piracy if the existence of the material that is being pirated serves as the primary reason a patron visits the library.23 even the physical cd collection in the public library can place the institution in danger of copyright infringement; yet the copyright challenges raised by cutting-edge digital resources, like e-books, are undoubtedly more complicated. e-books are replacing traditional books in many contexts. like most digital works today, e-books are licensed, not purchased outright. the problem licensing presents to libraries is that licensed works are not sold, they are granted through contracts, and contracts can change suddenly and negate fair-use provisions of u.s. copyright law.24 while libraries are now adept at negotiating contracts with subscription database providers, e-books are in many ways even more difficult to manage, with many vendors requiring that patrons delete or destroy the licensed content on their personal e-readers at the end of the lending period.25 the entire library community was rocked by harpercollins's february 2011 decision to limit licenses on e-books offered through library ebook vendors like overdrive to twenty-six circulations, with many librarians questioning the publisher's assertion that this seemingly arbitrary limitation is related to the average lifespan of a single print copy.26 license holders have an easy time arguing that any use of their content without paying fees is a violation of their copyright.
that is not the case when a fair use argument is justified, and while many in the library community may acquiesce to these arguments, “in recent cases, courts have found the use of a work to be fair despite the existence of a licensing market.”27 when license agreements are paired with drm technology, libraries may find themselves managing thousands of micropayments to allow their users to view, copy, move, print, or embed, for example, the pdf of a scholarly journal article.28 in the current climate of reduced staff and shrinking budgets, managing these complex licensing agreements has the potential to cripple many libraries. the librarian as accidental copyright czar during a special libraries association (sla) q&a session on copyright law in the digital age, the questions submitted to the panel came from librarians working in hospitals, public libraries, academic libraries, and even law libraries. librarians are being thrust into the position of de facto copyright expert. one of the speakers mentioned that she must constantly remind the lawyers at copyright: regulation out of line with our digital reality | mcdermott 12 the firm she works for that they should not copy and paste the full text of news or law review journal articles into their e-mails, and instead, they should send a link. the basis of her argument is the third factor of fair use mentioned earlier: the amount or substantiality of the portion of the copyrighted work being used.29 since fair use is not a “bright line” principle, the more factors you have on your side the better when you are using a copyrighted work without the owners express permission.30 librarians working in any institution must seek express permission from copyright holders for any video they wish to post, or embed, on library-managed websites. e-reserves and streaming video, mainstays of many educators and librarians seeking to capture the attention of this digital generation, have become bright red targets for litigious copyright holders who want to shrink the territory claimed under the fair-use banner even further. many in the library community are aware of the georgia state university e-reserves lawsuit, cambridge university press et al. v. patton, in which a group of academic publishers have accused the school of turning its e-reserves system into a vehicle for intentional piracy.31 university librarians are implicated for not providing sufficient oversight. it has come to light that the association of american publishers (aap) approached other schools, including cornell, hofstra, syracuse, and marquette, before filing a suit against georgia state. generally, the letters come from aaps outside counsel and are accompanied by “the draft of a federal court legal complaint that alleges copyright infringement.”32 the aap believes that e-reserves are by nature an infringement of copyright law, so they demand these universities work with their association to draft guidelines for electronic content that support aaps “cost-per-click theory of contemporary copyright: no pay equals no click.”33 it seems that georgia state was not willing to quietly concede to aap’s view on the matter, and they are now facing the association in court.34 a decision in this case was pending at the time this article went to press. 
the case brought by the association for information and media equipment (aime) against ucla is similar, except it focuses on the posting of videos so they can be streamed by students on password-protected university websites that do not allow the copying or retention of the videos.35 ucla argued that the video streaming services for students are protected by the technology education and copyright harmonization (teach) act of 2002, which is the same act that allows all libraries to offer patrons online access to electronic subscription databases off-site through a user-authentication system.36 in addition, ucla argues that it is simply allowing its students to “time shift” these videos, a practice deemed not to infringe on copyright law by the supreme court in its landmark sony corp. v. universal city studios, inc. decision of 1984.37 the american library association (ala), association of research libraries (arl), and the association of college and research libraries (acrl) jointly published an opinion supporting ucla in this case. many in the wider library community sympathized with ucla’s library administrators, who cite budget cuts that reduced hours at the school’s media laboratory as one reason they must now offer students a video-streaming option.38 in the end, the case was dismissed, mostly due to the lack of standing aime had to bring the suite against ucla, a state agency, in federal court. while the judge did not https://exch.mail.umd.edu/owa/webreadyviewbody.aspx?t=att&id=rgaaaadxslsgbeewtj9q0yhnkit2bwboujgpo3tvsou0x%2bkwiyfqalrqjtslaaboujgpo3tvsou0x%2bkwiyfqapiuledyaaaj&attid0=eacjse6zzphuq6qbfqvhbhu8&attcnt=1&pn=1#footnote30#footnote30 information technology and libraries | march 2012 13 expressly rule on the fair-use argument ucla made, the ruling did confirm that streaming is not a form of video distribution and that the public-performance argument ucla made regarding the videos was not invalidated by the fact that they made copies of the videos in question.39 digitization programs and the google book settlement librarians looking to digitize print collections, either for preservation or to facilitate online access, are also grappling with the copyright monopoly. librarians who do not have the time or resources to seek permission from publishers and authors before scanning a book in their collection cannot touch anything published after 1922. librarylaw.com provides a helpful chart directed at librarians considering digitization projects, but the overwhelming fine print below the chart speaks to the labyrinthine nature of copyright.40 the google book settlement continues to loom large over both the library profession and the publishing industry. at the heart of debate is google’s library project, which is part of google book search, originally named google print.41 the library project allows users to search for books using google’s algorithms to provide at its most basic a “snippet view” of the text from a relevant publication. 
authors and publishers could also grant their permission to allow a view of select sample pages, and of course if the book is in the public domain, then google can make the entire work visible online.42 in all cases, the user will see a "buy this book" link so that he or she could purchase the publication from online vendors on unrelated sites.43 google hoped to sidestep the copyright permission quandary for a digitization project of this scale, announcing that it would proceed with the digitization of cooperative library collections and that it would be the responsibility of publishers and authors to actively opt out or vocalize their objection to seeing their works digitized and posted online.44 google attempted to turn the copyright permissions process on its head, which was the basis of the class action lawsuit authors guild v. google inc.45 before the settlement was reached, google pointed to kelly v. arriba soft corp as proof that the indexing functions of an internet search engine constitute fair use. in that 2002 case, the ninth circuit court of appeals found that a website's posting of thumbnail images, or "imprecise copies of low resolution, scaled down images," constitutes fair use, and google argued its "snippet view" function is equivalent to a thumbnail image.46 however, judge denny chin rejected the google book settlement in march 2011, citing the fact that google would in essence be "exploiting books without the permission of copyright owners" and could also establish a monopoly over the digitized books market. the decision did in the end hinge on the fact that google wanted to follow an opt-out program for copyright holders rather than an affirmative opt-in system.47 the google book settlement was dismissed without prejudice, leaving the door open to further negotiations between the parties concerned. going forward, the library community should be concerned with how google will handle orphan works and how its index of digitized works will be made available to libraries and the public. the 2008 settlement granted google the nonexclusive right to digitize all books published before january 5, 2009, and in exchange, google would have "paid 70% of the net revenue earned from uses of google book search in the united states to rights holders."48 in addition, google would have established the book rights registry to negotiate with google and others seeking to "digitize, index or display" those works on behalf of the rights holders.49 approval of the settlement would have allowed google to move forward with plans to expand google book search and "to sell subscriptions to institutions and electronic versions of books to individuals."50 the concern that judge denny chin expressed over a potential google book monopoly was widespread among the library community.
while the settlement would not have given google exclusive rights to digitize and display these copyrighted works, google planned to ensure via the settlement that it would have received the same terms the book rights registry negotiated with any third-party digital library, while also inoculating itself against the risk of any copyright infringement lawsuits that could be filed against a competitor.51 that would have left libraries vulnerable to any subscription price increases for the google books service.52 libraries should carefully watch the negotiations around any future google books settlement, paying attention to a few key issues.53 there was considerable concern that under the terms of the 2008 settlement, even libraries participating in the google books library project would need to subscribe to the service to have access to digitized copies of the books in their own collections.53 many librarians also vocalized their disappointment in google's abandonment of its fair-use argument when it agreed to the 2008 settlement, which, if it succeeded, would have been a boon to nonprofit, library-driven digitization programs.54 finally, many librarians were concerned that google's book rights registry was likely to become the default rights holder for the orphan works in the google books library, and that claims that google books is an altruistic effort to establish a world library conceals the less admirable aim of the project—to monetize out-of-print and orphan works.55 librarians as free culture advocates: implications and recommendations our digital nation has turned copyright law into a minefield for both librarians and the public at large. intellectual property scholar lawrence lessig failed in his attempt to argue before the supreme court that the sonny bono copyright term extension act was an attempt to regulate free speech and therefore violated the first amendment.56 but many believe that our restrictive copyright laws at least violate the intent of the progress clause of the constitution, if not the first amendment: "unconstrained access to past works helps determine the richness of future works. inversely, when past works are inaccessible except to a privileged minority, future works are impoverished."57 while technological advances have placed the digital means of production into the hands of the masses, intellectual property law is leading us down a path to self-censorship.58 as the profession "at the heart of both the knowledge economy and a healthy democracy,"59 it is in our best interest as librarians to recognize the important role we have to play in restoring the balance to copyright law. to engage in the debate over copyright law in the digital age, the library community needs to educate itself and advocate for our own self-interests, focusing on three key areas: 1. copyright law in the classroom and at the conference. we must educate new and seasoned librarians on the nature of copyright law, and the impact it has on library practice and systems. library schools must step up to the plate and include a thorough overview of copyright law in their library science curriculum. while including copyright law in a larger legal-issues class is acceptable, the complexity of current u.s.
copyright law demonstrates that this is not a subject that can be glossed over in a single lecture. furthermore, there needs to be a stronger emphasis on continuing education and training on copyright law within the library profession. the sla offers a copyright certificate program, but the reach of such programs is not wide enough. copyright law, and the impacts current policy has on the library profession, must be prominently featured at library conferences. the university of maryland university college’s center for intellectual property offers an online community forum for discussing copyright issues and policies, but it is unclear how many librarians are members.60 2. librarians as standard-bearers for the free culture movement. while the library copyright alliance, to which the ala, arl, and acrl all belong, files amicus briefs in support of balanced copyright law and submits comments to wipo, the wider library community must also advocate for copyright reform, since this is an issue that affects all librarians, everywhere. as a profession, we need to throw our collective weight behind legislative measures that address the copyright monopoly. there have been a number of unfortunate failures in recent years. s. 1621, or the consumers, schools, and libraries digital management awareness act of 2003, attempted to address a number of drm issues, including a requirement that access controlled digital media and electronics include disclosures on the nature of the drm technology in use.61 h.r. 107, the digital media consumers rights act of 2003, would have amended the dmca to allow those researching the technology to circumvent drm software while also eliminating the catch-22 that makes circumventing drm software for fair-use purposes illegal. the balance act of 2003 (h.r. 1066) included provisions to expand fair use to the act of transmitting, accepting, and saving a copyrighted digital work for personal use. all of this legislation died in committee, as did h.r. 5889 (orphan works act of 2008) and s. 2913 (shawn bentley orphan works act of 2008). both bills would have addressed the orphan works dilemma, clearly spelling out the steps one must take to use an orphan work with no express permission from the copyright holder, without fear of a future lawsuit. could a show of support from the library community have saved these bills? it is impossible to know, but it is in our best interest to follow these legislative battles in the future and make sure our voice is heard. 3. libraries and the creative commons phenomenon. in addition, librarians need to take part in the creative commons (cc) movement by actively directing patrons towards this world of digital works that have clear, simple use and attribution requirements. creative commons was founded in 2001 with the support of the center for the study of the public domain at duke university school of law.62 the movement is essentially about free culture, and the idea that many people want to share their creative works and allow others to use or build off of their efforts easily and without seeking their permission. 
it is not intended to supplant copyright law, and lawrence lessig, one of the founders of creative commons, has said many times that he believes intellectual property law is necessary and that piracy is inexcusable.63 instead, a cc license states in clear terms exactly what rights the creator reserves, and conversely, what rights are granted to everyone else.64 as lawrence lessig explains, you go to the creative commons website (http://creativecommons.org); you pick the opportunity to select a license: do you want to permit commercial uses or not? do you want to allow modifications or not? if you allow modifications, do you want to require a kind of copyleft idea that other people release the modifications under a similarly free license? that is the core, and that produces a license.65 there are currently six cc licenses, and they include some combination of the four license conditions defined by creative commons: attribution (by), share alike (sa), noncommercial (nc), and no derivatives (nd).66 each of the four conditions is designated by a clever symbol, and the six licenses display these symbols after the creative commons trademark itself, two small c's inside a circle.67 there are "hundreds of millions of cc licensed works" that can be searched through google and yahoo, and some notable organizations that rely on cc licenses include flickr, the public library of science, wikipedia, and now whitehouse.gov.68 all librarians not already familiar with this approach need to educate themselves on cc licenses and how to find cc licensed works.69 while librarians must still inform their patrons about the realities of copyright law, it is just as important to direct patrons, students, and colleagues to cc licensed materials, so that they can create the mash-ups, videos, and podcasts that are the creative products of our web 2.0 world.70 the creative commons system is not perfect, and "creative commons gives the unskilled an opportunity to fail at many junctures."71 yet that only speaks to the necessity of educating the library community about the "some rights reserved" movement, so that librarians, who are already called upon to understand traditional copyright law, are also educating our society about how individuals can protect their intellectual property while preserving and strengthening the public domain. conclusion the library community can no longer afford to consider intellectual property law as a foreign topic appropriate for law schools but not library schools. those who are behind the slow extermination of the public domain rely on the complexity of copyright law, and the misunderstanding of the principle of fair use, to make their arguments easier and to browbeat libraries and the public into handing over the rights the constitution bestows on everyone. librarians need to engage in the debate over copyright law to retain control over their collections, and to better serve their patrons. in the past, the library community has not hesitated to stand up for the freedom of speech and self-expression, whether it means taking a stand against banning books from school libraries or fighting to repeal clauses of the usa patriot act. today's library patrons are not just information consumers—they are also information producers. therefore it is just as critical for librarians to advocate for their creative rights as it is for them to defend their freedom to read.
the internet has become such a strong incubator of creative expression and innovation that the innovators are looking for a way to shirk the very laws that were designed to protect their interests. in the end, the desire to create and innovate seems to be more innate than those writing our intellectual property laws expected. perhaps financial gain is less of a motivator than the pleasure of sharing a piece of ourselves and our worldview with the rest of society. whether that's the case or not, what is clear is that if we do not roll back legislation like the sonny bono copyright term extension act and the dmca so as to save the public domain, the pressure to create outside the bounds of the law is going to turn more inventors and artists into anarchists, threatening the interests of reasonable copyright holders. as librarians, we must curate and defend the creative property of the established, while fostering the innovative spirit of the next generation. as information, literature, and other creative works move out of the physical world, and off the shelves, into the digital realm, librarians need to do their part to ensure legislation is aligned with this new reality. if we do not, our profession may suffer first, but it will not be the last casualty of the copyright wars. references 1. beverly goldberg, "lg unlocks doors for creators, consumers with dmca exceptions," american libraries 41, no. 9 (summer 2010): 14. 2. ibid. 3. goldberg, "lg unlocks doors." 4. christopher alan jennings, fair use on the internet, prepared by the congressional research service (washington, dc: library of congress, 2002), 2. 5. ibid., 1. 6. ibid. 7. brandon butler, "urban copyright legends," research library issues 270 (june 2010): 18. 8. robin jeweler, "digital rights" and fair use in copyright law, prepared by the congressional research service (washington, dc: library of congress, 2003), 5. 9. rachel bridgewater, "tipping the scales: how free culture helps restore balance in the age of copyright maximalism," oregon library association quarterly 16, no. 3 (fall 2010): 19. 10. charles w. bailey jr., "strong copyright + drm + weak net neutrality = digital dystopia?" information technology & libraries 25, no. 3 (summer 2006): 117; u.s. copyright office, "copyright law of the united states," under "chapter 3: duration of copyright," http://www.copyright.gov/title17 (accessed december 8, 2010). 11. dan hunter, "culture war," texas law review 83, no. 4 (2005): 1130. 12. bailey, "strong copyright," 118. 13. u.s. copyright office, "copyright law of the united states," under "chapter 1: subject matter and scope of copyright," http://www.copyright.gov/title17 (accessed december 8, 2010). 14. bailey, "strong copyright," 118. 15. mary minnow, "library digitization table," http://www.librarylaw.com/digitizationtable.htm (accessed december 8, 2010).
16. brian t. yeh, "orphan works" in copyright law, prepared by the congressional research service (washington, dc: library of congress, 2002), summary. 17. ibid. 18. jisc, in from the cold: an assessment of the scope of "orphan works" and its impact on the delivery of services to the public (cambridge, uk: jisc, 2009), 6. 19. ibid. 20. andrew albanese, "hathitrust suspends its orphan works release," publishers weekly, sept. 16, 2011, http://www.publishersweekly.com/pw/by-topic/digital/copyright/article/48722-hathitrust-suspends-its-orphan-works-release-.html (accessed october 13, 2011). 21. u.s. copyright office, "copyright law of the united states," under "chapter 1." 22. u.s. copyright office, copyright basics (washington, dc: u.s. copyright office, 2000), www.copyright.gov/circs/circ1.html (accessed december 8, 2010). 23. mary minnow, california library association, "library copyright liability and pirating patrons," http://www.cla-net.org/resources/articles/minow_pirating.php (accessed december 10, 2010). 24. bailey, "strong copyright," 118. 25. overdrive, "copyright," http://www.overdrive.com/copyright.asp (accessed december 13, 2010). 26. josh hadro, "harpercollins puts 26 loan cap on ebook circulations," library journal (february 25, 2011), http://www.libraryjournal.com/lj/home/889452-264/harpercollins_puts_26_loan_cap.html.csp (accessed october 13, 2011). 27. butler, "urban copyright legends," 18. 28. bailey, "strong copyright," 118. 29. library of congress, fair use on the internet, 3. 30. ibid., summary. 31. matthew k. dames, "education use in the digital age," information today 27, no. 4 (april 2010): 18. 32. ibid. 33. dames, "education use in the digital age," 18. 34. matthew k. dames, "making a case for copyright officers," information today 25, no. 7 (july 2010): 16. 35. william c. dougherty, "the copyright quagmire," journal of academic librarianship 36, no. 4 (july 2010): 351. 36. ibid. 37. library of congress, "digital rights" and fair use in copyright law, 9. 38. dougherty, "the copyright quagmire," 351. 39. kevin smith, "streaming video case dismissed," scholarly communications @ duke, october 4, 2011, http://blogs.library.duke.edu/scholcomm/2011/10/04/streaming-video-case-dismissed/ (accessed october 13, 2011).
40. dougherty, "the copyright quagmire," 351. 41. librarylaw.com, "library digitization table." 42. kate m. manuel, the google library project: is digitization for purposes of online indexing fair use under copyright law, prepared by the congressional research service (washington, dc: library of congress, 2009), 1–2. 43. jeweler, "digital rights" and fair use in copyright law, 2. 44. ibid. 45. ibid. 46. manuel, the google library project, 2. 47. amir efrati and jeffrey a. trachtenberg, "judge rejects google books settlement," wall street journal, march 23, 2011, http://online.wsj.com/article/sb10001424052748704461304576216923562033348.html (accessed october 13, 2011). 48. jennings, fair use on the internet, 7. 49. manuel, the google library project, 2. 50. ibid., 9–10. 51. ibid. 52. ibid. 53. pamela samuelson, "google books is not a library," huffington post, october 13, 2009, http://www.huffingtonpost.com/pamela-samuelson/google-books-is-not-a-lib_b_317518.html (accessed december 10, 2009). 54. ivy anderson, "hurtling toward the finish line: should the google book settlement be approved?" against the grain 22, no. 3 (june 2010): 18. 55. samuelson, "google books is not a library." 56. jeweler, "digital rights" and fair use in copyright law, 3. 57. bailey, "strong copyright," 116. 58. cushla kapitzke, "rethinking copyrights for the library through creative commons licensing," library trends 58, no. 1 (summer 2009): 106. 59. ibid. 60. university of maryland university college, "member community," center for intellectual property, http://cipcommunity.org/s/1039/start.aspx (accessed february 21, 2011). 61. robin jeweler, copyright law: digital rights management legislation, prepared by the congressional research service (washington, dc: library of congress, 2004), summary. 62. creative commons, "history," http://creativecommons.org/about/history/ (accessed december 8, 2010). 63. lawrence lessig, "the vision for the creative commons? what are we and where are we headed? free culture," in open content licensing: cultivating the creative commons, ed. brian fitzgerald (sydney: sydney university press, 2007), 42. 64. steven j. melamut, "free creativity: understanding the creative commons licenses," american association of law libraries 14, no. 6 (april 2010): 22.
65. lessig, "the vision for the creative commons?" 45. 66. creative commons, "about," http://creativecommons.org/about/ (accessed december 8, 2010). 67. ibid. 68. ibid. 69. bridgewater, "tipping the scales," 21. 70. ibid. 71. woody evans, "commons and creativity," searcher 17, no. 9 (october 2009): 34. communications robert n. bland and mark a. stoffan returning classification to the catalog the concept of a classified catalog, or using classification as a form of subject access, has been almost forgotten by contemporary librarians. recent developments indicate that this is changing as libraries seek to enhance the capabilities of their online catalogs. the western north carolina library network (wncln) has developed a "classified browse" feature for its shared online catalog that makes use of library of congress classification. while this feature is not expected to replace keyword searching, it offers both novice and experienced library users another way of identifying relevant materials. classification to modern librarians is almost exclusively a tool for organizing and arranging books (or other physical media) on shelves. the role of classification as a form of subject access to collections through the public catalog—the concept of the classified catalog—has been almost forgotten. from a review of the literature, it does not appear that any major u.s. library has supported a classified catalog since boston university libraries closed its classified catalog in 1973.1 to be sure, nearly all online catalogs nowadays have some form of what is called a "call number search" or a "shelf list browsing capability" that is based on classification, but this is a humble and little-used feature because it requires that a call number (or at least a call number stem) be known and entered by the user, when no verbal index to the classification is available online. this search methodology provides nothing in the way of a systematic and hierarchical arrangement and display of subject classes, complete with accompanying verbal descriptions, that the classified catalog seeks to accomplish. but as karen markey put it in her recent review of classification and the online catalog, "to this day, the only way in which most end users experience classification online is through their online catalog's shelf list browsing capability."2 there are signs that this situation is changing.
the recently released endeca-based catalog at north carolina state university libraries uses library of congress classification (lcc) in a prominent way to provide for browsing of the collection without need of the user entering any search terms at all.3 the lcc outline is presented on the main search entry screen with verbal captions describing the classes, allowing users to navigate through several layers of the outline to retrieve with a click of the mouse bibliographic records for materials assigned to those classes. in a converse way, the new online catalog being developed by the florida center for library automation uses lc classification as a kind of back end to keyword searching. following a keyword search, a user can limit the results set by confining it to a designated lcc range chosen again from an online display of the lcc outline.4 both of these catalogs use three levels of the lcc outlines from the most general single letter level classes (q for sciences, for example) through the two-letter classes for more specific subjects (qc for physics, qd for chemistry) to an even finer granularity with designated numeric ranges within the two-letter classes identifying specific subdisciplines (qd241–qd441 for organic chemistry). the western north carolina library network (wncln) has been experimenting with classification as a retrieval tool in the public catalog for some time,5 and it has just implemented the first version of what we call a classified catalog browse in our innovative millennium system.6 like the two catalogs just mentioned, the classified catalog browse is based on software that is external to the ils software and integrated with that software through linking and webpage designs. also, like the previously discussed catalogs, it is based on scanning and incorporating into the catalog the lcc outlines as published by the library of congress. the wncln catalog goes a step further, however, in bringing the entire lc classification online down to the individual class number level—at least that portion of the classification that is actually used in our catalog. this is done through extracting class numbers and associated subject headings from bibliographic and authority records in our catalog and building an online classification display with descriptive captions (a verbal index) from these bibliographic and authority records. the result is a hierarchical display (to continue the example from above) not only of qd241–qd441 for organic chemistry but within this, qd271 for chromatographic analysis, qd273 for organic electrochemistry, and so on. the design of our interface presents this as a fourth level to which the user can "drill" down beginning with q for sciences, qd for chemistry, qd241–qd441 for organic chemistry, and finally qd271 for chromatographic analysis (figures 1–4). robert n. bland (bland@unca.edu) is associate university librarian for technical services, university of north carolina at asheville. mark stoffan (mstoffan@fsu.edu) is associate director for library technology at florida state university, tallahassee. figure 1. level 1 of lc classification in wncln webpac
from this fourth level, the user can click an associated link to execute a search of the catalog by the class number in question using the call number search function of the ils (figure 5); a second link for that class number will present the same list of titles but sorted by "most popular" (i.e., the items that have been checked out most frequently) from a separate but linked external database (figure 6); a third link will search the catalog by the associated subject heading for the class (figure 7); and finally a fourth link will show other subject headings that have been used in the catalog with this specific class number (figure 8). what does having the lc classification online in our catalog accomplish for our users? part of the point of our project is to answer this very question. chan and others7 have theorized that incorporation of the classification system into the catalog as a retrieval tool can provide enhanced subject access that is not possible through standard alphabetical subject headings and keyword searching alone. figure 2. level 2 of lc classification in wncln webpac figure 3. level 3 of lc classification in wncln webpac early studies by markey and others at oclc seem to have confirmed this with an online version of the dewey decimal classification.8 since (as far as we know) the library of congress classification has not really been tested as an online retrieval tool in a live catalog up to now, our implementation will serve as a kind of test bed for this hypothesis. how actual users in fact exploit this feature is of course only something that experience will tell. a cursory look, however, would seem to indicate definite advantages to this approach. first of all, many studies indicate that two of the major sources of failure with subject retrieval in online systems are misspellings and poor choice of search terms by users. no matter how far we may try to go with keyword searching and relevance ranking, no online library retrieval system is likely to do much with "napolyan's fites" when what the user is looking for are books on the military campaigns of the emperor napoleon. figure 4. level 4 of lc classification in wncln webpac figure 5. call number search display in wncln with the classification system and verbal index online most of these problems are eliminated, since users can navigate to a subject of choice without ever entering a search term. moreover, given the design of the verbal index based on library of congress subject headings, the user is led to actual subject headings used in the catalog, which should provide for precise retrieval beyond what is ordinarily possible with keywords even when entered correctly, and (importantly) a retrieval set that is always greater than zero. the infamous and frustrating problem of "no hits" is eliminated. secondly, the great attraction of the classified catalog approach is that it arranges subjects in a hierarchical fashion based on integral connections among the topics in a way that cannot be accommodated in an alphabetic subject approach because of the vagaries of spelling.
the topics "violence," "social conflict," and "conflict management," for example, obviously spread out in an alphabetical subject list, are collocated in the classified catalog under the class "hm1106–hm1171 interpersonal relations" (figure 9), allowing the user to find references to materials all in one place in the catalog just as the classification system arranges the books on these subjects all in one place on the library shelves. alphabetical subject indexes, of course, attempt to ameliorate this problem by means of cross references, but there is clearly a limit to how far one can go with this approach. finally, the classified catalog provides an efficient way for collection development staff to review specific subject areas and to make better informed purchasing decisions regarding the collections. in the wncln design, the classes at the bottom level of the hierarchy are linked to the catalog by call number and subject headings, and each class carries an indication of the number of items assigned that class number. the classes are also linked to an external database that shows the frequency of circulation of items in the class as well as title and date of publication. a quick review of this list can inform a bibliographer of circulation rates as well as the currency of materials in the class. as mentioned, the captions that are displayed with the lcc hierarchy in the wncln catalog are extracted from subject headings and authority records present in our catalog. readers familiar with lc marc record services may wonder why we took this approach to building the verbal index rather than using the information available in the lc marc classification records. machine-readable records for lc classification are now available in marc format. these files include records for each individual class number with a corresponding verbal caption. while we did experiment with using these files, cost and complexity determined that we go another direction. the lc classification files are huge, containing hundreds of thousands of classification numbers that we do not now and probably never would use in our wncln catalog simply because we (unlike lc) have no materials on these subjects. while these records could be filtered out by matching against lc class numbers that are found in our catalog and discarding non-matches, this would add yet another level of processing to an already complex process, as would handling the lc table subdivisions that are used in the lc schedules and that are separate from the standard class numbers. secondly, the lc marc classification files require a subscription costing several thousand dollars per year, as well as a substantial payment for the retrospective file needed to begin building the database of class numbers. on the other hand, extracting the verbal index from subject headings and authority records in our own catalog adds no cost to our processing. these headings and authority records are created and maintained, of course, as a standard part of the cataloging process, and accordingly only headings and authority records that match materials owned by our libraries are included. figure 6. most used titles display figure 7. subject search display in wncln
the description or caption that is finally assigned to a class number is determined by a computer program that analyzes both authority records and bibliographic records found in our catalog that are assigned the class number in question, with the subject heading that is used most frequently as a primary subject generally being the one selected as the caption for the class. these class numbers with associated subject headings are then processed by another program, which eventually builds html files representing the classification with links to the catalog and the external “most used” database as alluded to above. these standard html files, along with the files representing the first three levels of the lcc outline, are then loaded onto our web server to display the classification system online.

figure 8. related subjects display in wncln
figure 9. collocation of terms in the classified catalog

a second advantage of this approach is that using the actual subject heading as the caption or description for the class makes it possible to use that caption as a direct link to a subject search in the catalog, as shown in the illustration in figure 4. a disadvantage is that the captions from the lcc files are designed to retain the hierarchy that is represented in the printed schedules in a visual way by formatting and indenting. captions derived from subject headings do not retain this feature. we have tried to accommodate this in our display of the schedules by replicating the class number ranges from the outline in the appropriate place in the full display of the schedules, thereby building a hierarchy from these ranges as genus and the individual class numbers as species. this does not manage to retain the full hierarchy of the lc schedules as shown in the printed schedules or as represented in lc’s online classification web product, but it is, we hope, an adequate surrogate for the purpose intended.

in fact, in most cases, the captions derived from the extracted subject and authority headings match quite nicely the captions included in the actual lcc schedules, as shown in a comparison from the psychology classification of the hierarchy as it appears in our classified catalog browse and as it appears online in lc’s classification web product (figures 10 and 11). what is missing in our representation of the classification is not so much the subject content of the classes but the notes and information about literary form that are included in the actual lcc schedules. thus, our lcc online is not a strict image of the lcc as it would appear in printed or electronic form based on the hierarchies and captions devised by the lc. nor for that matter—despite our terminology—is it a true classified catalog, since only one classification (that used in the call number) is assigned to each item, whereas in a true classified catalog multiple classifications may be assigned to an item. it is nevertheless an online presentation of the lcc with links to our catalog that seeks to enhance subject access by exploiting the power of the classification system to organize materials by integral subject classes and to show relationships among subjects by a hierarchical arrangement of classes as genus, species, and subspecies.
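the caption-selection and html-generation steps just described can be pictured roughly as follows. this is a minimal sketch under assumed record shapes and an invented query syntax; the actual programs, record formats, and webpac urls are not specified in the article.

from collections import Counter
from urllib.parse import quote_plus

def choose_caption(class_number, bib_records):
    # pick the subject heading used most often as a primary subject among
    # bibliographic records carrying this class number
    headings = [r["primary_subject"] for r in bib_records
                if r["class_number"] == class_number and r.get("primary_subject")]
    return Counter(headings).most_common(1)[0][0] if headings else None

def class_entry_html(class_number, caption, item_count):
    # one bottom-level entry: the caption links to a subject search, the
    # class number links to a call number search (query syntax is invented)
    return ('<li><a href="/search?type=subject&q=%s">%s</a> '
            '(<a href="/search?type=call&q=%s">%s</a>, %d items)</li>'
            % (quote_plus(caption), caption, quote_plus(class_number),
               class_number, item_count))

if __name__ == "__main__":
    records = [
        {"class_number": "HM1106", "primary_subject": "interpersonal relations"},
        {"class_number": "HM1106", "primary_subject": "interpersonal relations"},
        {"class_number": "HM1106", "primary_subject": "social interaction"},
    ]
    caption = choose_caption("HM1106", records)
    print(class_entry_html("HM1106", caption, len(records)))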
and, perhaps just as importantly, it is an implementation that requires no additional cataloging effort on the part of our staff, nor any additional costs for data or processing other than the investment we have made in development of the software and the small amount of time required weekly to update the files. we do not expect that the classified catalog browse will replace keyword or subject searching as the primary means of subject access to our collections. we do believe that it promises to be a powerful and effective complement to our standard ils searches that may improve subject searching for both the novice and the experienced user.

figure 10. class captions in the wncln webpac
figure 11. class captions in lc’s classification web

references
1. margaret hindle hazen, “the closing of the classified catalog at boston university,” library resources and technical services 18 (1974): 221–26.
2. karen markey, joan s. mitchell, and diane vizine-goetz, “forty years of classification online: final chapter or future unlimited?” cataloging and classification quarterly 42 (2006): 1–63.
3. north carolina state university libraries, “ncsu libraries online catalog,” north carolina state university, www.lib.ncsu.edu/catalog (accessed mar. 23, 2007).
4. florida center for library automation, “state university libraries of florida–endeca,” board of governors, state of florida, http://catalog.fcla.edu (accessed mar. 23, 2007).
5. the western north carolina library network is a consortium consisting of the libraries of appalachian state university, the university of north carolina at asheville, and western carolina university.
6. western north carolina library network, “library catalog,” western north carolina library network, http://wncln.wncln.org (accessed mar. 23, 2007).
7. lois mai chan, “library of congress classification as an online retrieval tool: potentials and limitations,” information technology and libraries 5 (1986): 181–92.
8. karen markey and anh demeyer, dewey decimal classification online project: evaluation of a library schedule and index integrated into the subject searching capabilities of an online catalog: final report to the council on library resources (dublin, ohio: oclc, 1986), report no. oclc/opr/rr-86/1.

welcome to my first ital president’s column. each president only gets a year to do these columns, so expectations must be low all around. my hope is to stimulate some thinking and conversation that results in lita members’ ideas being exchanged and to create real opportunities to implement those ideas. my first column i thought i would keep short and sweet, and discuss just a few of the ideas that have been rattling around in my head since the 2007 midwinter lita town meeting, which have been enhanced by a number of discussions among librarians over the last six months. with any luck, these thoughts might have some bearing on what any of those ideas could mean to our organization.

first off, i don’t think i can express how weird this whole presidential appellation is to me. i am extremely proud to be associated with lita, and honored and surprised at being elected. i come from a consortia environment and an extremely flat organization. solving problems is often a matter of throwing all the parties in a room together and hashing it out until solutions are arrived at. i’ve been a training librarian for quite a while now, and pragmatic approaches to problem solving are my central focus.
i’m a consortia wrangler, a trainer, and a technology pusher, and i hope my approach is, and will be, to listen hard and then see what can be accomplished. so in my own way, i find being president kind of on the embarrassing side. it’s like not knowing what to do with your hands when you’re speaking in public.

at the lita town meeting (http://litablog.org/2007/06/17/lita-town-meeting-2007-report/) it was pretty obvious that members want community in all its various forms, face-to-face in multiple venues and online in multiple venues. it’s also pretty obvious from the studies done by pew internet and american life and by oclc that our users, and in particular our younger users, really want community. the web 2.0 and the library 2.0 movements are responses to that desire. as a somewhat flippant observation, we spent a generation educating our kids to work in groups, and now we shouldn’t be surprised that they want to work and play in groups. many of us work effectively in collaborative groups every day. we find it exciting, productive, and even fun. it’s an environment that we would like to create for our patrons, in-house and virtually. it’s what we would like to see in our association.

having been to every single top tech trends program and listened to the lita trendsters, one theme that often comes up is that complaining about the systems our vendors deliver can at times be pointless, because they simply deliver what we ask for. there is of course a corollary to this. once a system is in the marketplace, adding functionality often becomes centered around the low-hanging fruit. as a fictitious example, a vendor might easily add the ability for the patron to change the colors of the display, but adding a shelf list browse might take serious coding to create. so through discussions and rfp, we ask for and get the pretty colors while the browsing function waits, a form of procrastination. so then does innovation come only when all the low-hanging fruit has finally been plucked, and there’s nothing else to procrastinate on?

as social organizations, libraries, ala, lita and other groups, it appears that we have plucked all the low-hanging fruit of web 1.0. e-mail and static web pages have been done to death. as a pragmatist, what concerns me most is implementation. what delivery systems should and can we adopt and develop to fulfill the promise of services we’d like? can we ensure that barriers to participation are either eliminated or so low as to include everyone? i like to think that web 2.0 is innovation toward mirroring how we personally want to work and play and how we want our social structures to perform. so how can we make lita mirror how we want to work and play? i do know it’s not just making everything a wiki.

mark beatty (mbeatty@wils.wisc.edu) is lita president 2007/2008 and trainer, wisconsin library services, madison.
president’s column
mark beatty

enterprise digital asset management (dam) systems are beginning to be explored in higher education, but little information about their implementation issues is available. this article describes the university of michigan’s investigation of managing and retrieving rich media assets in an enterprise dam system. it includes the background of the pilot project and descriptions of its infrastructure and metadata schema. two case studies are summarized—one in healthcare education, and one in teacher education and research.
experiences with five significant issues are summarized: privacy, intellectual ownership, digital rights management, uncataloged materials backlog, and user interface and integration with other systems. u niversities are producers and repositories of large amounts of intellectual assets. these assets are of various forms: in addition to text materials, such as journal papers, there are theses, performances from per­ forming arts departments, recordings of native speakers of indigenous languages, or videos demonstrating surgical procedures, to name a few.1 such multimedia materials have not, in general, been available outside the originat­ ing academic department or unit, let alone systematically cataloged or indexed. valuable assets are “lost” by being locked away in individual drawers or hard disks.2 managing and retrieving multimedia assets are not problems confined to academia. media companies such as broadcast news agencies and movie studios also have faced this problem, leading to their adoption of digital asset management (dam) systems. in brief, dam systems are not only repositories of digital­rich media content and the associated metadata, but also provide management functionalities similar to database manage­ ment systems, including access control.3 a dam system can “ingest digital assets, store and index assets for easy searching, retrieve assets for use in many environments, and manage the rights associated with those assets.”4 in summer 2000, the university of michigan (u­m) tv station, umtv, was searching for a video archive solution. that fall, a u­m team visited cnn and experienced a “eureka!” moment. as james hilton, then­associate provost for academic, information, and instructional technology affairs, later wrote, “building a digital asset management into the infrastructure . . . will be the digital equivalent of bringing indoor plumbing to the campus.”5 in spring 2001, an enterprise dam system was considered for inclusion in the university infrastruc­ ture. upon completion of a limited proof­of­concept project, a cross­campus team developed the request for proposals (rfp) for the dams living lab, which was issued in july 2002 and subsequently awarded to ibm and ancept. in august 2003, hardware and software installation began in the living lab.6 by 2006, the project changed its name to bluestream to appeal to the grow­ ing mainstream user base.7 six academic and two support units agreed to partner in the pilot: ■ school of education ■ school of dentistry ■ college of literature, science, and the arts ■ school of nursing ■ school of pharmacy ■ school of social work ■ information technology central services ■ university libraries the academic units were asked to provide typical and unusual digital media assets to be included in the living lab pilot. the pilot focused on rich media, so the preferred types of assets were digital video, images, and other multimedia delivered over the web. the living lab pilot was designed to address four key questions: ■ how to create a robust infrastructure to process, manage, store, and publish digital rich media assets and their associated metadata. ■ how to build an environment where assets are eas­ ily searched, shared, edited, and repurposed in the academic model. ■ how to streamline the workflow required to create new works with digital rich media assets. ■ how to provide a campuswide platform for future application of rights declaration techniques (or other ip tools) to existing assets. 
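as a toy illustration of that definition of a dam system—ingest, indexing for easy searching, retrieval, and management of rights and access—the sketch below models an asset record and a tiny in-memory repository. every field name and the crude acl stand-in are this sketch's own simplifications, not the living lab's actual design (its um_core schema and access control lists are described later in this article).

from dataclasses import dataclass, field

@dataclass
class Asset:
    # hypothetical, simplified record
    asset_id: str
    title: str
    mime_type: str
    metadata: dict = field(default_factory=dict)      # descriptive metadata
    rights: str = ""                                   # rights statement
    allowed_users: set = field(default_factory=set)    # crude stand-in for an acl

class MiniDam:
    """Toy repository: ingest, keyword search, and access enforcement."""
    def __init__(self):
        self._store = {}

    def ingest(self, asset: Asset):
        self._store[asset.asset_id] = asset

    def search(self, keyword: str, user: str):
        hits = []
        for a in self._store.values():
            text = " ".join([a.title, *map(str, a.metadata.values())]).lower()
            if keyword.lower() in text and user in a.allowed_users:
                hits.append(a.asset_id)
        return hits

if __name__ == "__main__":
    dam = MiniDam()
    dam.ingest(Asset("vid-001", "jazz composition", "video/mp4",
                     {"creator": "faculty member"}, "cc-by", {"instructor1"}))
    print(dam.search("jazz", "instructor1"))   # ['vid-001']
    print(dam.search("jazz", "student9"))      # []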
this article describes the challenges encountered during the research­and­development phase of the u­m enterprise dam system project known as the living lab. the project has now ended, and the implemented project is known as bluestream. enterprise digital asset management system pilot: lessons learned yong-mi kim, judy ahronheim, kara suzuka, louis e. king, dan bruell, ron miller, and lynn johnson yong-mi kim (kimym@umich.edu) is carat-rackham fellow 2004, school of information; judy ahronheim (jaheim@umich .edu) is metadata specialist, university libraries; kara suzuka (ksuzuka@umich.edu) is assistant research scientist, school of education; louis e. king (leking@umich.edu) is managing producer, digital media commons; dan bruell (danlbee@umich .edu) is director, school of dentistry; ron miller (ronalan@umich .edu) is multimedia services position lead, school of education; and lynn johnson (lynjohns@umich.edu) is associate professor, school of dentistry, university of michigan, ann arbor. article title | author 5enterprise dam system pilot | kim, ahronheim, suzuka, king, bruell, miller, and johnson 5 ■ background of the living lab: u-m enterprise dam system project an enterprise project such as the living lab at u­m can have significant impact on an institution’s teaching and learning activities by allowing all faculty and students easy yet secure access to media assets across the entire campus. such extensive impact can only be obtained by overcoming numerous and varied obstacles and by docu­ menting actual implementation experiences employed to overcome those challenges. enterprise dam system vendors such as stellent, artesia, and canto list clients from many different industry sectors, including gov­ ernment and education, but provide no detailed case studies on their web sites.8 information regarding the status of enterprise dam system projects and specific issues that arose during implementation is difficult to find. information publicly available for enterprise dam system projects in higher education is usually in the form of white papers or proposals that do not cover the actual implementations.9 given the high degree of interest and the number of pilot projects announced in recent years, this shortcoming has prompted the writing of this article, which presents the most important lessons learned dur­ ing the first phase of the living lab pilot project with the hope that these experiences will be valuable to other academic institutions considering similar projects. as part of its core mission, u­m strives to meet the teaching and learning needs of the entire campus. thus, the living lab pilot solicited participation from a diverse cross­section of the university’s departments and units with the goal of evaluating the use of varied teaching and learning assets for the system. from the beginning, it was expected that this system would handle assets in many different forms, such as digital video or digitized images, and also accommodate various organizational schemas and metadata for different collections. this sets the u­m enterprise dam system apart from projects that focus on only one type of collection or define a large monolithic metadata schema for all assets. data were gathered through interviews with asset providers, focus groups with potential users, and a review of the relevant literature. a number of barriers were identified during the pilot’s first phase. 
while there were some technical barriers, the most significant barriers were cultural and organizational ones for which technical solutions were not clear. perhaps the most significant cultural divide was between the culture of academia and the culture of the commercial sector. cultural and organizational assumptions from commercial business practices were embedded in the design of the products initially used in the living lab implementation. thus, an additional implementation challenge was determining which issues should be resolved through technical means, and which should be solved by changing the academic culture. this is expected to be an ongoing challenge.

■ architecture (building the infrastructure)

an enterprise dam system in an academic community such as u-m needs to support a wide variety of services in order to meet the numerous and varied teaching, research, service, and administrative functions. figure 1 illustrates the services that are provided by an enterprise dam system and concurrently demonstrates its complexity. the left column, process, lists a few of the media processes that various producers will use to prepare their media for subsequent ingestion into the enterprise dam system; the middle column, manage, demonstrates the various functions of the enterprise dam system; while the third column, publish, lists a subset of the publishing venues for the media.

figure 1. component services of the living lab (created by louis e. king, ©2004 regents of the university of michigan)

because an enterprise dam system supports a variety of rich media, a number of software tools and workflows are required. figure 2 illustrates this complexity and describes the architecture and workflow used to add a video segment. the organization of figure 2 parallels that of figure 1. the left column, process, indicates that flip factory by telestream is used to convert digital video from the original codec to one that can be used for playback.10 in addition, videologger by virage uses media analysis algorithms to extract key frames and time codes from the video as well as to convert the speech to text for easy searching.11 the middle column, manage, illustrates tools from ibm that help create rich media as well as tools from stellent, such as its ancept media server (ams), that store and index the rich media assets.12 the third column, publish, illustrates two examples of how these digital video assets could be made available to the end user. one strategy is as a real video stream using real network’s helix server, and the other as a quicktime video stream using ibm’s videocharger.13 a thorough discussion of all of the software and hardware that make up u-m’s dam system is beyond the scope of this article. however, a list of the software components with links to their associated web sites is provided in figure 3.

from the beginning the living lab pilot aimed for a diverse collection of assets to promote resource discovery and sharing across the university. figure 4 illustrates how the living lab is expected to fit into the varied publishing venues that comprise the campus teaching and learning infrastructure. existing storage and network infrastructures are used to deliver media assets to various software systems on campus. the living lab is used to streamline the cataloging, searching, and retrieving processes encountered during academic teaching and research activities.
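the process → manage → publish pipeline sketched in figures 1 and 2 can be caricatured as a chain of steps. the functions below are hypothetical stand-ins for the commercial tools named above (flip factory, videologger, the ancept media server, and the streaming servers); they show only the shape of the workflow, not any vendor api.

def transcode(path):
    """process: convert the original codec to a playback format."""
    return {"source": path, "playback_copy": path + ".mp4"}

def analyze(asset):
    """process: extract key frames, time codes, and speech-to-text."""
    asset["key_frames"] = ["00:00:01", "00:01:30"]        # placeholder values
    asset["transcript"] = [("00:00:05", "introduction")]  # (time code, text)
    return asset

def store_and_index(asset, repository):
    """manage: store the asset and its metadata so it can be searched."""
    asset_id = f"asset-{len(repository) + 1}"
    repository[asset_id] = asset
    return asset_id

def publish(asset_id):
    """publish: hand back streaming urls for the delivery servers."""
    return [f"rtsp://stream.example.edu/real/{asset_id}",
            f"http://stream.example.edu/quicktime/{asset_id}"]

if __name__ == "__main__":
    repo = {}
    asset = analyze(transcode("lecture_raw.dv"))
    asset_id = store_and_index(asset, repo)
    print(publish(asset_id))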
the following example describes how the enterprise dam system fits into the future campus cyberinfrastructure. a faculty member in the school of music is a jazz composer. one of her compositions is digitally stored in the enterprise dam system along with the associated metadata (cataloging information) that will allow the piece to be found during a search. that single audio file is then found, accessed, and used by five unique publishing venues—the course web site, the university web site, a radio broadcast, the music store, and the library archive. the faculty member uses the piece in her jazz interpretation course and thus includes a link to the composition on her sakai course web site.14 when she receives an award, the u-m issues a press release on the u-m web site that includes a link to an audio sample. concurrently, michigan radio uses the enterprise dam system to find the piece for a radio interview with her that includes an audio segment.15 her performance is published by block m records, u-m’s web-based recording label, and, lastly, the library permanently stores the valuable piece in its institutional archive, deep blue.16

figure 2. the living lab architecture (created by louis e. king, ©2004 regents of the university of michigan)
figure 3. software used in the living lab: north american systems ancept media server (www.nasi.com/ancept.php); ibm content manager (www-306.ibm.com/software/data/cm/cmgr/mp/); telestream flip factory (www.telestream.net/products/flipfactory.htm); virage videologger (www.virage.com/content/products/index.en.html); ibm videocharger (www-306.ibm.com/software/data/videocharger/); real networks helix server (www.realnetworks.com/products/media_delivery.html); apple quicktime streaming server (www.apple.com/quicktime/streamingserver/); handmade software image alchemy (www.handmadesw.com/products/image_alchemy.htm)

■ metadata (managing assets within the academic model)

the vision for enterprise dam at u-m is for digital assets to not only be stored in a secure repository, but also be findable, accessible, and usable by the appropriate persons in the university community in their academic endeavors. information about these assets, or metadata, is a crucial component of fulfilling this vision. an important question that arises is, “what kind of metadata should be required for the assets in the living lab?”

to help answer this question, potential asset providers were interviewed regarding their current approach to metadata, such as whether they used a particular schema and how well it met their purposes. not surprisingly, asset providers had widely varied metadata implementations. while the assets intended for the living lab pilot all had some metadata, the scope and granularity varied greatly. metadata storage and access methods also varied, ranging from databases implemented using commercial database products and providing web front-ends, to a combination of paper and spreadsheet records that had to be consulted together to locate a particular asset. the assets to be used in the living lab pilot consisted primarily of high- and low-resolution digital images and digitized video. these interviews also generated a number of requirements for any potential living lab metadata schema.
it was determined that the schema should be able to:
■ describe heterogeneous collections at an appropriate level of granularity and detail, allowing for domain-specific description needs and vocabularies;
■ allow metadata entry by non-specialists;
■ enable searches across multiple subject areas and collections;
■ provide provenance information for the assets; and
■ provide information on authorized uses of the assets for differing classes of users.

an examination of the literature showed a general consensus that no single metadata standard could meet the requirements of heterogeneous collections.17 projects as diverse as pb core and vius at penn state adopted the approach of drawing from multiple existing metadata standards.18 their approaches differ in that pb core is a combination of selected metadata elements from a number of standards plus additional elements unique to pb core, while vius opted for a merged superset of all the elements in the standards selected.

in interviews with asset providers (usually faculty), cataloging backlog and the lack of personnel for generating and entering metadata emerged as consistent problems. there was concern that an overly complex or specialized schema would aggravate the cataloging backlog by making metadata generation time-consuming and cumbersome. budgetary constraints made hiring professional metadata creators prohibitive. another aspect of the personnel problem was that adequate description required subject specialists who were, ideally, the resource authors or creators. but subject specialists, while familiar with the resources and the potential audience for them, may not be knowledgeable about how to produce high-quality metadata, such as controlled vocabularies or consistent naming formats.

to address these issues, the simpler and more straightforward indexing process offered by dublin core (dc) was selected as the starting point for the metadata schema in the living lab.19 dc was originally developed to support resource discovery of a digital object, with resource authors as metadata creators. dc is a relatively small standard, but is extensible through the use of qualifiers. it has been adopted as a standard by a number of standards organizations, such as iso and ansi. a body of research exists on its use in digital libraries and its efficacy for author-generated metadata, and there are metadata crosswalks between dc and most other metadata standards. a number of other subject-specific standards were also examined for more specialized description needs and controlled vocabularies: vra core, ims learning resource meta-data specification, and snodent.20

in the end, the project leaders elected to adopt a rather novel approach to metadata by not defining one metadata schema for all assets. by taking advantage of the power of multiple approaches (for example, pb core for mix-and-match, and vius for a merged superset) each collection can have its own schema as long as it contains the elements of a more general, lowest-common-denominator schema. this overall schema, um_core, was defined based on dc. the elements are prefixed with dc or um to specify the schema origin.

figure 5. the u-m enterprise dam system metadata scheme um_core: dc_title, dc_creator, dc_subject, um_secondarysubject, dc_description, dc_publisher, dc_contributor, dc_date, dc_type, dc_format, dc_identifier, dc_source, dc_language, dc_relation, dc_coverage, dc_rights, um_publisher, um_alternatepublisher

um_publisher and um_alternatepublisher identify who should be contacted about problems or questions regarding that particular asset. um_secondarysubject is a cross-collection subject classification schema developed by the u-m libraries, and helps map the asset into the context of the university.

figure 4. the enterprise dam system as the future campus infrastructure for academic venues (created by louis e. king, ©2004 regents of the university of michigan)
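a minimal sketch of the lowest-common-denominator rule is shown below: a collection-specific schema may add its own elements but should still contain every um_core element. the element names are taken from figure 5; treating them as a python set and checking for missing elements is this sketch's own simplification, not living lab code.

# um_core element names from figure 5; requiring them as a set is an assumption
UM_CORE = {
    "dc_title", "dc_creator", "dc_subject", "um_secondarysubject",
    "dc_description", "dc_publisher", "dc_contributor", "dc_date",
    "dc_type", "dc_format", "dc_identifier", "dc_source", "dc_language",
    "dc_relation", "dc_coverage", "dc_rights",
    "um_publisher", "um_alternatepublisher",
}

def missing_core_elements(collection_schema: set) -> set:
    """Return any um_core elements absent from a collection-specific schema."""
    return UM_CORE - collection_schema

if __name__ == "__main__":
    # a hypothetical oral pathology schema adds domain-specific elements
    oral_pathology = UM_CORE | {"um_diagnosis", "um_stain", "um_magnification"}
    print(missing_core_elements(oral_pathology))                 # set()
    print(missing_core_elements({"dc_title", "dc_date", "um_publisher"}))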
in adopting such an approach to metadata, metadata creation is seen not as a one-shot process, but a collaborative and iterative one. for example, on initial ingestion into the living lab, the only metadata entered for an image may be dc_title, dc_date, and um_publisher. additional metadata may be entered as users discover and use the asset, or as input from a subject specialist becomes available.

the discussion so far has focused on metadata produced with human intervention. a number of metadata elements can be obtained from the digital objects through the use of software. in an enterprise dam system, this is referred to as automatically generated metadata and is what can be directly obtained from a computer file, such as file name, file size, and file format. this type of metadata is expected to play a larger role as an increasing proportion of assets will be born digital and come accompanied by a rich set of embedded metadata. for example, images or video produced by current digital cameras contain exchangeable image file format (exif) metadata, which include such information as image size, date produced, and camera model used. when available, the living lab presents automatically generated metadata to the user in addition to the elements in um_core.

thus, asset metadata in the living lab can be produced in two ways: automatically generated through a tool such as virage videologger in the case of video, or entered by hand through the current dam system interface.21 in addition, if metadata already exist in a database format, such as filemaker, this can be imported once the appropriate mappings are defined.22 videologger, a video analysis tool for digital video files, can extract video key frames, add closed captions, determine whether the audio is speech or music, convert speech to text, and identify (through facial recognition) the speaker(s). these capabilities allow for more sophisticated searching of video assets compared to the current capabilities of search engines such as google. some degree of content-based searching can now be done, as opposed to searching that relies on the title and other textual description provided separately from the video itself. for the pilot, particular interest was expressed in the speech recognition capability of videologger. videologger generates a time-coded text of spoken keywords with 50 to 90 percent accuracy. the result is not nearly accurate enough to generate a transcript, but does indeed provide robust data for searching the content of video. given the diversity of assets in the living lab, it is clear that the university can utilize low-cost keyword analysis to enhance search granularity as well as the more expensive, fully accurate hand-processed transcript.
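a rough sketch of how such a time-coded keyword text can be exploited for searching is shown below; the transcript shape (time code, recognized word) and the keyword list are invented for illustration and do not reflect videologger's actual output format.

# hypothetical shape for speech-to-text output: (seconds, recognized word)
transcript = [
    (12.4, "hello"), (13.0, "today"), (45.2, "flossing"),
    (88.7, "anesthetic"), (152.3, "flossing"),
]

def keyword_hits(transcript, keywords):
    """Return, for each keyword, the time codes at which it was recognized,
    so a reviewer can jump straight to those points in the video."""
    wanted = {k.lower() for k in keywords}
    hits = {k: [] for k in wanted}
    for time_code, word in transcript:
        if word.lower() in wanted:
            hits[word.lower()].append(time_code)
    return hits

if __name__ == "__main__":
    print(keyword_hits(transcript, ["flossing", "consent"]))
    # {'flossing': [45.2, 152.3], 'consent': []}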
■ workflow examples

two instructional challenges demonstrate how an enterprise digital asset management system can provide a solution to instructional dilemmas and how a unique workflow needs to be created for each situation. the challenges related to each project are described.

school of dentistry

the educational dilemma

the u-m school of dentistry uses standardized patient instructors (spis) to assess students’ abilities to interact with patients. carefully trained actors play carefully scripted patient roles. dental students interview the patients, read their records, and make decisions about the patients’ care, all in a few minutes (see figure 6). each session is video recorded. currently, spis grade each student on predetermined criteria, and the video recording is only used if a student contests the spis’ grade. ideally, a dental educator should review each recording and also grade each student. however, the u-m class size of 105 dental students causes a recording-based grading process to be prohibitively expensive in terms of personnel time. in addition, the use of digital videotape makes it difficult for the recorded sessions to be made available to the students. because the tapes are part of the student’s record, they cannot be checked out. if a student wants to review a tape, she or he must make an appointment and review it in a supervised setting.

living lab solution

the u-m school of dentistry’s living lab pilot attempted simultaneously to improve the spi program and lower the cost of faculty grading of spi sessions through three goals:
1. use speech-to-text analysis to create an easily searched transcript;
2. streamline the recording process; and
3. make the videos available online for student review.
each of these challenges and the current results are summarized.

speech-to-text analysis

it was hypothesized that an effective speech-to-text analysis of the spi session could enable a grader quickly to locate video segments that: (1) represented student discussion of specific dental procedures; and (2) contained student verbalizations of key clinical communication skills.23 in summer 2005, nine spi sessions were recorded and a comparison between manual transcription and the automated speech-to-text processes was conducted. the transcribed audio track was manually marked up with time-coded reference points and inserted as an annotation track to the video. those same videos also were analyzed through the videologger speech-to-text service in the living lab, resulting in an automatically generated, time-coded text track. lastly, six keywords were selected that, if spoken by the student, indicated the correct use of either a dental procedure or good communication skills. keyword searches were conducted on both the manual transcription and the speech-to-text analysis. three results were calculated on the keyword searches of both versions of all nine recorded sessions. they were: (1) the number of successful keyword searches; (2) the number of successful search results that did not actually contain the keywords (false positives); and (3) the time required to complete the manual transcription and text-to-speech analysis of the recordings. the results demonstrated that the speech-to-text analysis matched the manual transcription 20 to 60 percent of the time. also, the speech-to-text process resulted in a false positive less than 10 percent of the time.
lastly, the time required to complete the speech­to­text analysis of a session was two minutes, while the average time required to complete a manual transcription of the same session was 180 minutes. while not perfect, the results are encouraging that manually transcribing the audio is no longer necessary. improvements are being made to the clinical environment and microphones so that a higher­quality recording is obtained. it is anticipated that those changes combined with improved software will improve the results of the speech­to­text analysis sufficiently so that automated keyword searches can be conducted for grading purposes. streamlining the recording process scale is a significant challenge to capturing 105 spi inter­ actions in a short amount of time. two to three weeks are required for the entire class of 105 students to complete a series of spi experiences, with as a many as four concur­ rent sessions at any given time. in summer 2006, it was decided to record 50 percent of one class. logistically, one camera operator could staff two stations simultane­ ously. the stations had to be physically close enough for a one­person operation, but not so close that audio from the adjacent session was recorded. the optimal distance was about thirty to thirty­five feet of separa­ tion. staggering the start times of each session allowed the camera operator to make sure each was started with optimal settings. since the results of the speech­to­text analysis were linked to the quality of the equipment used, two prosumer minidv cameras with professional quality microphones and tripods also were purchased. student availability an important strength of living lab is the ability to make the assets both protected and accessible. the current itera­ tion does not have an interface for user­created access con­ trol lists (acl), instead they need to be created by a systems administrator. once a systems administrator has created an acl, academic technology support staff can add or subtract people. to satisfy family educational rights and protection act regulations, a separate acl is needed for each student for the spi project.24 currently, the possibility of including the spi recordings and their associated transcriptions as ele­ ments of an eportfolio is being explored.25 in the meantime, students can use url references to include these videos and transcripts in such web­based tools as eportfolios and course management systems. discussion as the challenges of improving speech­to­text analysis, recording workflow, and user­created acls are overcome, the spi program will be able to operate at a new and previ­ ously unimagined level. a more objective keyword grad­ ing process can be instituted. students will be easily able to search through and review their sessions at times and locations that are convenient for them. living lab also will allow students to view their eportfolio of spi interactions and witness how they have improved their communica­ tion skills with patients. for the first time in healthcare education, a clinician’s communication skills, such as bedside or chairside manner, will be able to be taught and assessed using objective methods. school of education the challenge of using records of practice for research and professional education classroom documentation plays a significant role in educational research and in the professional education of teachers at the u­m school of education. 
collections of videos capturing classroom lessons, small-group work, and interviews with students and teachers—as well as other classroom records, such as images of student work, teacher lesson plans, and assessment documents—are basic to much of the research that takes place in the school of education. however, there also is a large and increasing demand to use these records from real classrooms for educational purposes at the u-m and beyond, creating rich media materials for helping preservice and practicing teachers learn to see, understand, and engage in important practices of teaching. this desire to create widely distributed educational materials from classroom documentation raises two important challenges: first, there is the important challenge of protecting the identity of children (and, in some cases, teachers); and second, there is the difficult task of ensuring that the classroom records can be easily accessed by individuals who have permission to view and use the records while being inaccessible to those without permission.

one research and materials development project at the u-m school of education has been exploring the use of living lab to support the critical work of processing classroom records for use in research and in educational materials, and the distribution and protection of classroom records as they are integrated into teacher education lessons and professional development sessions at the u-m and other sites in the united states. the findings and challenges of these efforts are summarized below.

processing classroom records

the classroom records used in the pilot were processed in three main ways, producing three different types of products:
■ preservation copies are high-quality formats of the classroom records with minimal loss of digital information that can be read by modern computers with standard software. these files are given standardized filenames, cleaned of artifacts and minor irregularities, and de-identified (that is, digitally altered to remove any information that could reveal the identity of the students and, in some cases, of the teachers).
■ working copies are lower-quality versions of the preservation copies that are still sufficient for printing or displaying and viewing. trading some degree of quality for smaller file sizes and thus data rates, the working copies are easier for people to use and share. additionally, these files are further developed to enhance usability: videos are clipped and composited to feature particular episodes; videos also are subtitled, flagged with chapter markers (or other types of coding), and embedded with links for accessing other relevant information; images of student and teacher work are organized into multipage pdfs with bookmarks, links, and other navigational aids; and all files are embedded with metadata for aiding their discovery and revealing information about the files and their contents.
■ distribution copies are typically similar in quality to the working copies but are often integrated into other documents or with other content; they are labeled with copyright information and statements about the limitations of use. they are, in many cases, edited for use on a variety of platforms and copy protected in small ways (for example, word and powerpoint files are converted to pdfs).
the living lab was found to support this processing of classroom records in two important ways.
first, the system allowed for the setup and use of workflows that enabled undergraduate students hired by the project to upload processed files into the system and walk through a series of quality checks, focused on different aspects of the products. so, for example, when checking the preservation copies, one person was assigned to check the preservation copy against the actual artifact to make sure everything was captured adequately and that the resulting digital file was named properly (“quality check 1”). another individual was assigned to make sure the content was cleaned up properly and that no identifying information appeared anywhere (“quality check 2”). and finally, a third person checked the file against the metadata to make sure that all basic information about the file was correct (“quality check 3”). files that passed through all checks were organized into collections accessible to project members and others (“organize”). files that failed along the way were sent back to the beginning of the workflow (the “drawing board”), fixed, and checked again (see figure 7).

figure 6. a dental student interviewing an spi.

second, living lab allowed asset and collection development to be carried out collaboratively and iteratively, enabling different individuals to add value in different ways over time. undergraduate students did much of the initial processing and checking of the assets; skilled staff members converted subtitles into speech metadata housed within living lab; and, eventually, project faculty and graduate students will add other types of analytic codes and content-specific metadata to the assets.

distribution and protection of classroom records

in addition to supporting the production of various types of assets and collections, the living lab supported the distribution and protection of classroom records for use in education settings both at u-m and other institutions. for example, almost fifteen hours of classroom videos from a third-grade mathematics class were made accessible to and were used by instructors and students in the college of education at michigan state university. in a different context, approximately ten minutes of classroom video was made available to instructors in mathematics departments at brigham young university, the university of georgia, and the city college of new york to use in courses for elementary teachers. each asset (and its derivatives) housed within living lab has a url that can be embedded within web pages and online course-management systems, allowing for a great deal of flexibility in how and where the assets are presented and used. at the same time, each call to the server is checked and, when required, users are prompted to authenticate by logging in before any assets are delivered. this has great potential for easily, seamlessly, and safely integrating living lab assets into a variety of web spaces. although this feature has indeed allowed for a great deal of flexibility, there were and continue to be challenges with creating an integrated and seamless experience for school of education students and their instructors. for example, depending on a variety of factors, such as user operating systems and web browser combinations, users might be prompted for multiple logins.
additionally, the login for the living lab server can be quite unforgiving, locking out users who fail to login properly in the first few tries and providing limited communication about what has occurred and what needs to be done to correct the situation.

discussion

during the living lab pilot a number of workflow challenges were overcome that now allow numerous and varied types of media related to classroom records to be ingested into living lab, and derivatives created. this demonstrates that living lab is ready for complex media challenges associated with instruction. however, the next challenge of delivering easily and smoothly to others still remains. once authentication and authorization are conducted using single sign-on techniques that allow users to access assets securely from living lab through other systems, assets will be able to be incorporated into web-based materials and used to enhance the instruction of teachers in ways that have yet to be conceived.

figure 7. living lab workflow: drawing board → quality check 1 → quality check 2 → quality check 3 → organize

■ privacy, intellectual property, and copyright

during the course of the pilot, a number of issues emerged. among these were some of the most critical issues that institutions considering embarking on a similar asset management system need to address. these issues are:
■ privacy;
■ intellectual ownership and author control of materials;
■ digital rights management and copyright;
■ uncataloged materials backlog; and
■ user interface and integration with other campus systems.
up to this point, enterprise dam systems had been developed and used primarily by commercial enterprises—for example, cnn and other broadcasting companies. using a product developed by and for the commercial sector brought to the fore the cultural differences between the academy and the commercial sector (see figure 8). the first three issues in the previous list are related to the differing cultures of commercial enterprise and academia. these issues are addressed below. the fourth and fifth issues are addressed in the section “other important issues.”

privacy

videos of medical procedures can be of tremendous value to students. in their own words, “watching is different from reading about it in a textbook.” but subjects have the right to retract their consent regarding the use of their images or treatment information for educational purposes. this creates a dilemma: if other assets have been created using it, do all of them have to be withdrawn? for example, if a professor included an image from the university’s dam system in a classroom powerpoint or keynote presentation, and subsequently included the presentation in the university’s dam system, what is the status of this file if the patient withdraws consent for use of her or his treatment information?26 when must the patient’s request be fulfilled? can it be done at the end of the semester, or does it need to be completed immediately? if the request must be fulfilled immediately, the faculty member may not have sufficient time to find a comparable replacement. waiting until the end of the semester helps balance patient privacy with teaching needs. in either case, files must be withdrawn from the enterprise dam system and links to those files removed. consent status and asset relationships must be part of the metadata for an asset to handle such situations.
consideration must be given to associating a digital copy of all consent forms with the corresponding asset within an enterprise dam system. intellectual ownership and author control of materials authors’ rights, as recognized by the berne convention for the protection of literary and artistic works, have two components.27 one, the economic right in the work, is what is usually recognized by copyright law in the united states, being a property right that the author of the work can transfer to others through a contract. the other component—the moral rights of the author—is not explicitly acknowledged by copyright law in the united states and thus may escape consideration regarding ownership and use of intellectual property. moral rights include the right to the integrity of the work, and thus come into play in situations where a work is distorted or misrepresented. unlike economic rights, moral rights cannot be transferred and remain with the author. in a university setting, the university may own the economic right for a researcher’s work, in the form of copyright, but the researcher retains moral rights. the following incident illustrates what can happen when only property rights are taken into account. a digital video segment of a medical procedure was being shown as part of a living lab demo at a university it showcase. because the u­m held the copyright for that particular videotape, no problems were foreseen regarding its usage. a faculty member recognized the video as one she had cre­ ated several years ago and expressed great concern that it had been used for such a purpose without her knowledge or consent. the concern arose from the fact that video showed an outdated procedure. while the faculty member continued to use this video in the classroom, she felt this was different from having it available through the living lab. in the classroom, the faculty member alerted students to the outdated practices during the viewing, and she had full control over who viewed it. the faculty member felt she lost this control and additional clarification when the video became available through living lab. that is, her work was now misrepresented and her moral rights as an author were violated. digital rights management and copyright in the academic world, digital rights management (drm) is becoming a necessary component in disseminating intellectual products of all forms.28 however, at this time there are few standards and no technical drm solution that works for all media on all platforms. therefore, u­m has elected to use social rather than technical means of managing digital rights. the living lab metadata schema provides an element for rights statements, dc_rights. these metadata, combined with education of the univer­ sity community about copyright, fair use, and the highly granular access control and privileges management of the system, provide the community with the knowledge and tools to use the assets ethically. the university can establish rights declarations to use in the dc_rights field as standards are developed and prec­ edent is established in the courts. 
these declarations may include copyright licenses developed by the university legal counsel as well as those from the creative commons.29

current solution—access control lists

a clear difference between the cultures of commercial enterprises and academia emerged regarding access to assets, administered through acls.30

figure 8. differences between commercial and university uses of a dam system:
commercial dam system model — assets held centrally; access, roles, and privileges managed centrally; monolithic metadata frameworks.
university dam system model — federated ownership of assets; distributed management of access, privileges, and roles; federated metadata schema; agnostic user interface(s) re: privileges, ownership.

an acl specifies who is allowed to access an asset and how they can use it. in commercial settings, access to assets is centrally managed, while in academia, with its complex set of intellectual and copyright issues, it is preferable to have them managed by the asset holders. university users repeatedly asked for the ability to define acls for each asset in the living lab. currently, end users and support staff cannot define acls—only system administrators can create them. the middleware for user-defined acls has been fully developed, and the user interface for user-defined acls will be made available in the next version.

this capability is important in the academic environment because the composition of group(s) of people requiring access to a particular asset is fluid and can span many organizational boundaries, both within and outside the university. a research group owning a collection of assets may want to restrict access for various reasons, including requirements set forth by an institutional review board (irb, a university group that oversees research projects involving human subjects), or regulations such as the health insurance portability and accountability act of 1996, which addresses patient health information privacy.32 the research group will want flexible access control, as research group members may collaborate with others inside and outside the university. the original irb approval may specify that confidentiality of the subjects must be maintained, and collected data, such as video or transcripts, can only be viewed by those directly involved in the research project and cannot be browsed by other researchers not involved in the study or the public at large. in another situation, a collection of art images may only be viewed by current students of the institution, thus requiring a different acl. this situation is still open to interpretation. some say patient consent regarding the use of information for instructional purposes cannot be withdrawn for the use of existing information at the home institution. they can only withdraw it for the use of future assets. others may feel that patients can withdraw permission for the use of their patient assets.

other important issues

uncataloged materials backlog

what emerged from interviews and focus groups with content providers was that while there was no lack of assets they would like to see online, a large proportion of these assets had never been cataloged or even systematically labeled in some form.
this finding may be attributed in part to the pilot focusing on existing assets that have previously not been available for widespread sharing—such as the files stored on faculty hard disks and departmental servers—only known to a favored few. owners or creators of these materials had not consciously thought about sharing these materials or making them available to others. librarians, in contrast, have devel­ oped systems and practices to ensure the findability of materials that enter the library. asset owners were more than willing to have the assets placed online, but did not have the time or resources to provide the appropriate metadata. hiring personnel to create the metadata is problematic, as there is a limit to the metadata that can be entered by non­experts, and experts often are scarce and expensive. for example, for a collection of oral pathology images of microscopic slides, a subject expert must provide the diagnoses, stain, magnification, and other information for each image. without these details, merely putting the slides online is of little value, but these metadata cannot be provided by laypeople. collaborative metadata creation, allowing multiple metadata authors and iterations, may be one solution to this problem. a number of studies indicate that both organiza­ tional support and user­friendly metadata creation tools are necessary for resource authors to create high­ quality metadata.33 some of the backlog may be resolved through development of tools aimed at resource authors. in addition, increased use of digital file formats with embedded metadata may contribute to reducing future backlog by requiring less human involvement in meta­ data creation. faculty need to be taught that metadata raises the value and utility of assets. as they come to understand the essential role metadata plays, they, too, will invest in its creation. user interface and integration with other systems an enterprise dam system has two basic types of uses: by producers and by users. producers tend to be digital media technologists who create the digital assets and ingest them into the enterprise dam system. the users are the faculty, students, and staff who use these digital assets in their teaching, learning, or research. the research and development version of the enter­ prise dam system, living lab, works well for digital asset producers, but not for the users of these digital assets. ingestion and accessing processes are quite complex and are not currently integrated with other campus systems, such as the online library catalog or the sakai­based, campuswide course management sys­ tem, ctools.34 digital producers who are comfortable with complex systems are able to ingest and access rich media. however, users have to log onto the enterprise dam system and navigate its complex user interface. the level of complexity of accessing the media can cre­ ate a barrier to adoption and use. if the level of complex­ ity for accessing the assets is too high for users, then the system also is too complex to expect users to contribute to the ingestion of digital assets. 14 information technology and libraries | december 200714 information technology and libraries | december 2007 in both student and faculty focus groups there was concern about the technical skills needed for faculty use of an enterprise dam system in the classroom. ideally faculty should be able to incorporate assets seamlessly from the enterprise dam system to their classroom mate­ rials, such as powerpoint or keynote presentations. 
then, the presentations created on their computers should dis­ play without glitches on the classroom system. obviously faculty members cannot be expected to troubleshoot in the classroom when display problems occur. if the enterprise dam system is perceived as difficult to use, or as requiring a lot of troubleshooting by the user, this will discourage adoption by the faculty. this creates additional demands on the enterprise dam system, and potential additional it staffing demands for the academic units wanting to promote enterprise dam system use. when a problem is experienced in the classroom, the departmental it support, not the enterprise dam system support team, will be the first to be called. ideally, an enterprise dam system should be linked to the campus it infrastructure such that users or con­ sumers do not interact with the dam system itself, but rather through existing academic tools, such as the library gateway, course management system, or departmental web sites. having to learn a new system could be a sig­ nificant barrier to use for many potential dam system users in academia. ■ conclusions and lessons learned the vision of a dam system that would allow faculty and students easy yet secure access to myriad rich media assets is extremely appealing to members of the academy. conducting the pilot projects revealed numerous techni­ cal and cultural problems to resolve prior to achieving this vision. the authors anticipate that other institutions will need to address these same issues before undertaking their own enterprise dam system. using commercial software developed in academia during the course of the living lab pilot, the differ­ ences between academia and the commercial sector proved to be a significant issue. assumptions about the organizational culture and work methods are built into systems, often in a tacit manner. in the case of the initial iteration of the living lab, these assumptions were those of the corporate world, the primary clients of the commercial providers as well the environment of the developers. u­m project participants, meanwhile, brought their own expectations based on the reality of their work environment in academia. universities do not have a strict hierarchical structure, with each aca­ demic unit and department having a great degree of local control. academia also has a culture of sharing, where teaching or research products are often shared with no payment involved, other than acknowledgment of the source. thus, there was a process of mutual edu­ cation and negotiation regarding what was and was not acceptable in the enterprise dam system implementa­ tion. this difference of cultures first manifested itself with acls. in the initial implementation, an acl could be defined only by a system administrator. this was a showstopper for the u­m participants, who thought that asset providers themselves would be able to define and modify the acl for any particular asset. a centralized solution with a single owner of the assets (the company), which is acceptable in the corporate environment, is not acceptable in a university environment, where each user is consumer and owner. defining who has access to an asset can be a complex problem in academia, since this access is a moving target subject to both departmental and institutional constraints. libraries and librarians the traditional role of libraries is one of preserving and making accessible the intellectual property of all of humanity. 
with each new advance in information tech­ nology, such as dam systems, the role of libraries and librarians continues to evolve. this pilot highlighted the role and value of librarians skilled in metadata develop­ ment and assignment. without their expertise and early involvement, there would have been no standard method of indexing assets, thus preventing users from finding useful media. also, the project reinforced two reasons for encouraging asset creators to assign metadata at the asset creation point instead of at the archival point. one, this ensures that metadata are assigned when the content expertise is available. it is very difficult for producers to assign metadata retrospectively, and the indexing information may no longer be available at the point of archive. two, metadata assignment at the point of asset creation helps to ensure consistent metadata assignment that lends itself to automated solutions at the time of archiving.35 thus, while their role in digital asset man­ agement systems continues to evolve, the authors predict that the librarians’ role will evolve around metadata, and that libraries will start to become the archive for digital materials. it is anticipated that librarians will work with technical experts to develop workflows that include the automated metadata assignment to help faculty routinely add existing and new collections of assets to the system. one example of such a role is deep blue at the university of michigan. deep blue is a digital framework for pre­ serving and finding the best scholarly and artistic work produced at the university. article title | author 15enterprise dam system pilot | kim, ahronheim, suzuka, king, bruell, miller, and johnson 15 production productivity new technical complexities emerge with each new asset collection added to the u­m system. new workflows as well as richer software features continue to be developed to meet newly identified integration and user interface needs. as the living lab experience advances, techni­ cal barriers are eliminated and new workflows auto­ mated. the authors anticipate that, eventually, automated workflows will allow faculty and staff to routinely use digital assets with a minimum of technical expertise, thus decreasing the personnel costs associated with the use of rich media. for the foreseeable future, however, techni­ cally knowledgeable staff will be required to develop these workflows and even complete a significant amount of the work. academic practice the more delicate and challenging issue is educating fac­ ulty on the value and power of digital assets to improve their research and teaching. dam is a new concept to fac­ ulty, and it will only become useful when integrated into their daily teaching and research. this will happen as fac­ ulty members become more knowledgeable and increase their comfort in the use of digital assets. the dental case study demonstrates that an improved student experience can be provided with such an asset management system, while the education case study demonstrates that a com­ plex set of authentic classroom materials can be orga­ nized and ingested for use by others. these case studies are only two examples of the unanticipated outcomes that result from the use of digital assets in education. 
the authors predict that as more unanticipated and innova­ tive uses of digital assets are discovered, these new uses will, in turn, lead to increased academic productivity—for example, teaching more without increasing the number of faculty, students teaching each other with rich media, small­group work, and project­based learning. the list of possibilities is endless. as the living lab evolved from a research and development project into the implementation project known as bluestream, it has become an actual classroom resource. this article described myriad issues that were addressed so that other institutions can embark on their own enterprise dam systems fully informed about the road ahead. the remaining technical issues can and will be resolved over time. the greatest challenges that remain are being discovered as faculty and students use bluestream to improve teaching, learning, and research activities. the success of bluestream specifically, and enterprise dam systems in general, will be determined by their successes and failures in meeting the needs of faculty and students. ■ acknowledgements the authors recognize that the living lab pilot program was conducted with the support of others. we thank ruxandra­ana iacob for her administrative contributions to the project. we thank both ruxandra­ana iacob and sharon grayden for their assistance with writing this article. thanks to karen dickinson for her encourage­ ment, optimism, and constant support throughout the project. we thank mark fitzgerald for his vision regard­ ing the potential of the school of dentistry spi project and for conducting the original research. the living lab pilot was conducted with support from the university of michigan office of the provost through the carat partnership program, which pro­ vided funding for the pilot, and the carat­rackham fellowship program, which funded the metadata work. references 1. a. doyle and l. dawson, “current practices in digital asset management,” internet2/cni performance archive & retrieval working group, 2003, http://docs.internet2.edu/ doclib/draft­internet2­humanities­digital­asset­management­ practices­200310.html (accessed feb. 17, 2007). 2. d. z. spicer, p. b. deblois, and the educause current issues committee. “fifth annual educause survey identifies current it issues.” educause quarterly 27, no. 2 (2004): 8–22. 3. humanities advanced technology and information insti­ tute (hatii), university of glasgow, and the national initiative for a networked cultural heritage (ninch), “the ninch guide to good practice in the digital representation and man­ agement of cultural heritage materials,” 2003, www.nyu.edu/ its/humanities/ninchguide (accessed july 10, 2005). 4. a. mccord, “overview of digital asset management sys­ tems,” educause evolving technologies committee, sept. 6, 2002. 5. james l. hilton, “digital management systems,” educause review 38, no. 2 (2003): 53. 6. james. hilton, “university of michigan digital asset management system,” 2004. http://sitemaker.umich.edu/ bluestream/files/dams_year01_campus.ppt (accessed feb. 15, 2007). 7. the university of michigan, “bluestream,” 2006, http:// sitemaker.umich.edu/bluestream (accessed feb. 15, 2007). 8. oracle corp., “stellent universal content management,” 2006, www.stellent.com/en/index.htm (accessed feb. 15, 2007); artesia digital media group, “artesia: the open text digital media group,” 2006, www.artesia.com/ (accessed feb. 15, 2007); canto, “canto,” 2007, www.canto.com (accessed feb. 15, 2007). 9. r. d. vernon and o. v. 
riger, “digital asset management: an introduction to key issues,” www.cit.cornell.edu/oit/arch­ init/digassetmgmt.html (accessed sept. 24, 2004); yan han, “digital content management: the search for a content man­ agement system,” library hi tech 22, no. 4 (2004): 355–65; stan­ ford university libraries and academic information resources, 16 information technology and libraries | december 200716 information technology and libraries | december 2007 “media preservation: digital preservation,” 2005, http://library. stanford.edu/depts/pres/mediapres/digital.html (accessed july 29, 2005). 10. telestream, “telestream, inc.,” 2005, www.telestream.net/ products/flipfactory.htm (accessed feb. 15, 2007). 11. autonomy, inc., “virage products overview: virage vid­ eologger,” 2006, www.virage.com/content/products/index. en.html (accessed feb. 15, 2007). 12. international business machines corp., “ancept media server: digital asset management solution,” 2007, www.nasi. com/ancept.php (accessed feb. 15, 2007). 13. realnetworks, inc., “realnetworks media servers,” 2007, www.realnetworks.com/products/media_delivery.html (accessed feb. 15, 2007); apple, inc., “quicktime streaming server,” 2007, www.apple.com/quicktime/streamingserver (accessed feb. 15, 2007); international business machines corp., “db2 content manager video charger,” 2007, www­306.ibm. com/software/data/videocharger/ (accessed feb. 15, 2007). 14. sakai, “sakai: collaboration and learning environment for education,” 2007, www.sakaiproject.org (accessed feb. 15, 2007). 15. the university of michigan, “michigan radio,” 2007, www.michiganradio.org (accessed feb. 15, 2007). 16. the university of michigan, “block m records,” 2005, www.blockmrecords.org (accessed feb. 15, 2007); the univer­ sity of michigan, “deep blue,” 2007, http://deepblue.lib.umich. edu (accessed feb. 15, 2007). 17. e. duval et al., “metadata principles and practicalities,” d-lib magazine 8, no 4 (2002); a. m. white et al., “pb core— the public broadcasting metadata initiative: progress report,” 2003 dublin core conference sept. 28–oct. 2, 2003, seattle; j. attig, a. copeland, and m. pelikan, “context and meaning: the challenges of metadata for a digital image library within the university,“ college & research libraries 65, no. 3 (may 2004): 251–61. 18. white et al., “pb core—the public broadcasting meta­ data initiative”; attig, copeland, and pelikan, “context and meaning.” 19. dublin core metadata initiative, “dublin core metadata initiative,” 2007, http://dublincore.org (accessed feb. 15, 2007). 20. visual resources association, “vra core categories, version 3.0,” 2002, www.vraweb.org/vracore3.htm (accessed feb. 15, 2007); louis j. goldberg, et al., “the significance of snodent,” studies in health technology and informatics 116 (aug. 2005): 737–42; http://ontology.buffalo.edu/medo/sno­ dent_05.pdf (accessed feb. 15, 2007). 21. autonomy, “virage products overview.” 22. filemaker, inc., “filemaker,” 2007, www.filemaker.com/ products (accessed feb. 15, 2007). 23. m. fitzgerald et al., “efficacy of speech­to­text technol­ ogy in managing video recorded interactions,” journal of dental research 85, special issue a (2006): abstract no. 833. 24. u.s. department of education, “family educational rights and privacy act ferpa,” 2005, www.ed.gov/policy/ gen/guid/fpco/ferpa/index.html (accessed feb. 15, 2007). 25. g. lorenzo and j. ittelson, “an overview of e­portfolios,” educause learning initiative, 2005, http://educause.edu/ir/ library/pdf/eli3001.pdf (accessed feb. 15, 2007). 26. 
microsoft corp., "microsoft office powerpoint 2007," 2007, http://office.microsoft.com/en-us/powerpoint/default.aspx (accessed feb. 15, 2007); apple, inc., "keynote," 2007, www.apple.com/iwork/keynote (accessed feb. 15, 2007). 27. world intellectual property organization, "berne convention for the protection of literary and artistic works," 1979, www.wipo.int/treaties/en/ip/berne/trtdocs_wo001.html (accessed feb. 15, 2007). 28. wikimedia foundation, inc., "digital rights management," 2007, http://en.wikipedia.org/wiki/digital_rights_management (accessed feb. 15, 2007). 29. creative commons, "creative commons," 2007, http://creativecommons.org (accessed feb. 15, 2007). 30. wikimedia foundation, inc., "access control list," 2007, http://en.wikipedia.org/wiki/access_control_list (accessed feb. 15, 2007). 31. the university of michigan, "um institutional review boards," 2007, www.irb.research.umich.edu (accessed feb. 15, 2007). 32. health insurance portability and accountability act of 1996 (hipaa), "centers for medicare and medicaid services," 2005, www.cms.hhs.gov/hipaageninfo/downloads/hipaalaw.pdf (accessed feb. 15, 2007). 33. j. greenberg et al., "author-generated dublin core metadata for web resources: a baseline study in an organization," journal of digital information 2, no. 2 (2002), http://journals.tdl.org/jodi/article/view/jodi-39/45 (accessed nov. 10, 2007); a. crystal and j. greenberg, "usability of a metadata creation application for resource authors," library & information science research 27, no. 2 (2005): 177–89. 34. the university of michigan, "ctools," 2007, https://ctools.umich.edu/portal (accessed feb. 15, 2007). 35. m. cox et al., descriptive metadata for television (amsterdam: focal pr., 2006); michael a. chopey, "planning and implementing a metadata-driven digital repository," cataloging & classification quarterly 40, no. 3/4 (2005): 255–87.

przemysław skibiński and jakub swacha

the efficient storage of text documents in digital libraries

przemysław skibiński (inikep@ii.uni.wroc.pl) is associate professor, institute of computer science, university of wrocław, poland. jakub swacha (jakubs@uoo.univ.szczecin.pl) is associate professor, institute of information technology in management, university of szczecin, poland.

in this paper we investigate the possibility of improving the efficiency of data compression, and thus reducing storage requirements, for seven widely used text document formats. we propose an open-source text compression software library, featuring an advanced word-substitution scheme with static and semidynamic word dictionaries. the empirical results show an average storage space reduction as high as 78 percent compared to uncompressed documents, and as high as 30 percent compared to documents compressed with the free compression software gzip.

it is hard to expect the continuing rapid growth of global information volume not to affect digital libraries.1 the growth of stored information volume means growth in storage requirements, which poses a problem in both technological and economic terms.
fortunately, the digital library's hunger for resources can be tamed with data compression.2

the primary motivation for our research was to limit the data storage requirements of the student thesis electronic archive in the institute of information technology in management at the university of szczecin. the current regulations state that every thesis should be submitted in both printed and electronic form. the latter facilitates automated processing of the documents for purposes such as plagiarism detection or statistical language analysis. considering the introduction of the three-cycle higher education system (bachelor/master/doctorate), there are several hundred theses added to the archive every year. although students are asked to submit microsoft word–compatible documents such as doc, docx, and rtf, other popular formats such as tex script (tex), html, ps, and pdf are also accepted, both in the case of the main thesis document, containing the thesis and any appendixes that were included in the printed version, and the additional appendixes, comprising materials that were left out of the printed version (such as detailed data tables, the full source code of programs, program manuals, etc.). some of the appendixes may be multimedia, in formats such as png, jpeg, or mpeg.3

notice that this paper deals with text-document compression only. although the size of individual text documents is often significantly smaller than the size of individual multimedia objects, their collective volume is large enough to make the compression effort worthwhile. the reason for focusing on text-document compression is that most multimedia formats have efficient compression schemes embedded, whereas text document formats usually either are uncompressed or use schemes with efficiency far worse than the current state of the art in text compression.

although the student thesis electronic archive was our motivation, we propose a solution that can be applied to any digital library containing text documents. as the recent survey by kahl and williams revealed, 57.5 percent of the examined 1,117 digital library projects consisted of text content, so there are numerous libraries that could benefit from implementation of the proposed scheme.4 in this paper, we describe a state-of-the-art approach to text-document compression and present an open-source software library implementing the scheme that can be freely used in digital library projects.

in the case of text documents, improvement in compression effectiveness may be obtained in two ways: with or without regard to their format. the more nontextual content in a document (e.g., formatting instructions, structure description, or embedded images), the more it requires format-specific processing to improve its compression ratio. this is because most document formats have their own ways of describing their formatting, structure, and nontextual inclusions (plain text files have no inclusions). for this reason, we have developed a compound scheme that consists of several subschemes that can be turned on and off or run with different parameters. the most suitable solution for a given document format can be obtained by merely choosing the right schemes and adequate parameter values.
experimentally, we have found the optimal subscheme combinations for the following formats used in digital libraries: plain text, tex, rtf, text annotated with xml, html, as well as the device-independent rendering formats ps and pdf.5 first we discuss related work in text compression, then describe the basis of the proposed scheme and how it should be adapted for particular document formats. the section "using the scheme in a digital library project" discusses how to use the free software library that implements the scheme. then we cover the results of experiments involving the proposed scheme and a corpus of test files in each of the tested formats.

■ text compression

there are two basic principles of general-purpose data compression. the first one works on the level of character sequences, the second one works on the level of individual characters. in the first case, the idea is to look for matching character sequences in the past buffer of the file being compressed and replace such sequences with shorter code words; this principle underlies the algorithms derived from the concepts of abraham lempel and jacob ziv (lz-type).6 in the second case, the idea is to gather frequency statistics for characters in the file being compressed and then assign shorter code words for frequent characters and longer ones for rare characters (this is exactly how huffman coding works—arithmetic coding assigns value ranges rather than individual code words).7

as the characters form words, and words form phrases, there is high correlation between subsequent characters. to produce shorter code words, a compression algorithm either has to observe the context (understood as several preceding characters) in which the character appeared and maintain separate frequency models for different contexts, or has to first decorrelate the characters (by sorting them according to their contexts) and then use an adaptive frequency model when compressing the output (as the characters' dependence on context becomes dependence on position). whereas the former solution is the foundation of prediction by partial match (ppm) algorithms, burrows-wheeler transform (bwt) compression algorithms are based on the latter.8

witten et al., in their seminal work managing gigabytes, emphasize the role of data compression in text storage and retrieval systems, stating three requirements for the compression process: good compression, fast decoding, and feasibility of decoding individual documents with minimum overhead.9 the choice of compression algorithm should depend on what is more important for a specific application: better compression or faster decoding.
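both principles meet in deflate, the algorithm implemented by gzip and used as one of the back-end compressors later in this paper: lz77-style sequence matching followed by huffman coding of the matches and literals. the following minimal round-trip sketch assumes only that the free zlib library (a deflate implementation) is installed; the sample string and the reported bits-per-character figure are purely illustrative.

    #include <zlib.h>
    #include <cassert>
    #include <cstring>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        // a small piece of repetitive text; real documents compress far better
        std::string text = "to be, or not to be: that is the question. to be, or not to be.";

        // compressBound() gives the worst-case size of the deflate output
        uLongf packedLen = compressBound(text.size());
        std::vector<Bytef> packed(packedLen);
        int rc = compress(packed.data(), &packedLen,
                          reinterpret_cast<const Bytef*>(text.data()), text.size());
        assert(rc == Z_OK);   // packedLen now holds the actual compressed size

        // decompress into a buffer of the known original size and verify the round trip
        std::vector<Bytef> restored(text.size());
        uLongf restoredLen = restored.size();
        rc = uncompress(restored.data(), &restoredLen, packed.data(), packedLen);
        assert(rc == Z_OK && restoredLen == text.size());
        assert(std::memcmp(restored.data(), text.data(), text.size()) == 0);

        // bitrate in output bits per input character, the measure used in the experiments below
        std::cout << "bits per character: " << 8.0 * packedLen / text.size() << "\n";
        return 0;
    }

the same compress() call is, in effect, what gzip performs; the word-substitution transform described next aims to hand this stage more compressible input rather than to replace it.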
an early work of jon louis bentley and others showed that a significant improvement in text compression can be achieved by treating a text document as a stream of space-delimited words rather than individual characters.10 this technique can be combined with any general-purpose compression method in two ways: by redesigning character-based algorithms as word-based ones or by implementing a two-stage scheme whose first step is a transform replacing words with dictionary indices and whose second step is passing the transformed text through any generalpurpose compressor.11 from the designer’s point of view, although the first approach provides more control over how the text is modeled, the second approach is much easier to implement and upgrade to future general-purpose compressors.12 notice that the separation of the wordreplacement stage from the compression stage does not imply that two distinct programs have to be used—if only an appropriate general-purpose compression software library is available, a single utility can use it to compress the output of the transform it first performed. an important element of every word-based scheme is the dictionary of words that lists character sequences that should be treated as single entities. the dictionary can be dynamic (i.e., constructed on-line during the compression of every document),13 static (i.e., constructed off-line before the compression stage and once for every document of a given class—typically, the language of the document determines its class),14 or semidynamic (i.e., constructed off-line before compression stage but individually for every document).15 semidynamic dictionaries must be stored along with the compressed document. dynamic dictionaries are reconstructed during decompression (which makes the decoding slower than in the other cases). when the static dictionary is used, it must be distributed with the decoder; since a single dictionary is used to compress multiple files, it usually attains the best compression ratios, but it is only effective with documents of the class it was originally prepared for. n the basic compression scheme the basis of our approach is a word-based, lossless text compression scheme, dubbed compression for textual digital libraries (ctdl). the scheme consists of up to four stages: 1. document decompression 2. dictionary composition 3. text transform 4. compression stages 1–2 are optional. the first is for retrieving textual content from files compressed poorly with generalpurpose methods. it is only executed for compressed input documents. it uses an embedded decompressor for files compressed using the deflate algorithm,16 but an external tool—precomp—is used to decode natively compressed pdf documents.17 the second stage is for constructing the dictionary of the most frequent words in the processed document. doing so is a good idea when the compressed documents have no common set of words. if there are many documents in the same language, a common dictionary fares better—it usually does not pay off to store an individual dictionary with each file because they all contain similar lists of words. for this reason we have developed two variants of the scheme. the basic ctdl includes stage 2; therefore it can use a document-specific semidynamic dictionary in the third stage. the ctdl+ variant uses a static dictionary common for all files in the same language; therefore it can omit stage 2. 
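to make the word-substitution idea concrete before the individual stages are described, the following self-contained sketch performs a toy semidynamic pass: it counts the space-delimited words of a document, keeps the frequent ones as a dictionary, and replaces their occurrences with short indices. the thresholds, the '#' marker, and the printable token syntax are illustrative simplifications, not ctdl's actual code words or escape handling; the transformed stream would then be handed to a general-purpose back end such as the deflate call sketched earlier.

    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    int main() {
        std::string text = "the cat sat on the mat and the dog sat on the rug";

        // pass 1: count space-delimited words (a toy "semidynamic" dictionary pass)
        std::map<std::string, int> freq;
        std::vector<std::string> tokens;
        std::istringstream in(text);
        std::string w;
        while (in >> w) { ++freq[w]; tokens.push_back(w); }

        // keep words seen at least twice, most frequent first (lower index = shorter code)
        std::vector<std::string> dict;
        for (const auto& p : freq) if (p.second >= 2) dict.push_back(p.first);
        std::sort(dict.begin(), dict.end(),
                  [&](const std::string& a, const std::string& b) { return freq[a] > freq[b]; });

        // pass 2: the transform proper; dictionary words become "#index" tokens.
        // a real scheme would also escape literal '#' and use compact binary code words
        std::map<std::string, size_t> index;
        for (size_t i = 0; i < dict.size(); ++i) index[dict[i]] = i;
        std::ostringstream out;
        for (const auto& t : tokens) {
            auto it = index.find(t);
            if (it != index.end()) out << '#' << it->second << ' ';
            else out << t << ' ';
        }

        // the dictionary travels with the document in the semidynamic case; the
        // transformed stream then goes to any general-purpose compressor
        std::cout << "dictionary:";
        for (const auto& d : dict) std::cout << ' ' << d;
        std::cout << "\ntransformed: " << out.str() << "\n";
        return 0;
    }

a real implementation must also decide where the dictionary lives: stored with each file for the semidynamic variant, or shipped once with the decoder for the static one, as noted above.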
during stage 2, all the potential dictionary items that meet the word requirements are extracted from the document and then sorted according to their frequency the efficient storage of text documents in digital libraries | skibiński and swacha 145 to form a dictionary. the requirements define the minimum length and frequency of a word in the document (by default, 2 and 6 respectively) as well as its content. only the following kinds of strings are accepted into the dictionary: n a sequence of lowercase and uppercase letters (“a”–“z”, “a”–“z”) and characters with ascii code values from range 128–255 (thus it supports any typical 8-bit text encoding and also utf-8) n url address prefixes of the form “http:// domain/,” where domain is any combination of letters, digits, dots, and dashes n e-mails—patterns of the form “login@domain,” where login and domain are any combination of letters, digits, dots, and dashes n runs of spaces stage 3 begins with parsing the text into tokens. the tokens are defined by their content; as four types of content are distinguished, there are also four classes of tokens: words, numbers, special tokens, and characters. every token is then encoded in a way that depends on the class it belongs to. the words are those character sequences that are listed in the dictionary. every word is replaced with its dictionary index, which is then encoded using symbols that are rare or nonexistent in the input document. indexes are encoded with code words that are between one and four bytes long, with lower indexes (denoting more frequent words) being assigned shorter code words. the numbers are sequences of decimal digits, which are encoded with a dense binary code, and, similarly to letters, placed in a separate location in the output file. the special tokens can be decimal fractions, ip numerical addresses, dates, times, and numerical ranges. as they have a strict format and differ only in numerical values, they are encoded as sequences of numbers.18 finally, the characters are the tokens that do not belong to any of the aforementioned group. they are simply copied to the output file, with the exception of those rare characters that were used to construct code words; they are copied as well, but have to be preceded with a special escape symbol. the specialized transform variants (see the next section) distinguish three additional classes from the character class: letters (words not in the dictionary), single white spaces, and multiple white spaces. stage 4 could use any general-purpose compression method to encode the output of stage 3. for this role, we have investigated several open-licensed, generalpurpose compression algorithms that differ in speed and efficiency. as we believe that document access speed is important to textual digital libraries, we have decided to focus on lz–type algorithms because they offer the best decompression times. ctdl has two embedded backend compressors: the standard deflate and lzma, wellknown for its ability to attain high compression ratios.19 n adapting the transform for individual text document formats the text document formats have individual characteristics; therefore the compression ratio can be improved by adapting the transform for a particular format. as we noted in the introduction, we propose a set of subschemes (modifications of the original processing steps or additional processing steps) that can help compression— provided the issue that a given subscheme addresses is valid for the document format being compressed. 
there are two groups of subschemes: the first consists of solutions that can be applied to more than one document format. it includes n changing the minimum word frequency threshold (the “minfr” column in table 1) that a word must pass to be included in the semidynamic dictionary (notice that no word can be added to a static dictionary); n using spaceless word model (“wdspc” column in table 1) in which a single space between two words is not encoded at all; instead, a flag is used to mark two neighboring words that are not separated by a space; n run-length encoding of multiple spaces (“spruns” column in table 1); n letter containers (“letcnt” column in table 1), that is, removing sequences of letters (belonging to words that are not included in the dictionary) to a separate location in the output file (and leaving a flag at their original position). table 1 shows the assignment of the mentioned subschemes to document formats, with “+” denoting that a given subscheme should be applied when processing a given document format. notice that we use different subschemes for the same format depending on whether a semidynamic (ctdl) or static (ctdl+) dictionary is used. the remaining subschemes are applied for only one document format. they attain an improvement in compression performance by changing the definition of acceptable dictionary words, and, in one case (ps), by changing the definition of number strings. the encoder for the simplest of the examined formats—plain text files—performs no additional formatspecific processing. the first such modification is in the tex encoder. the difference is that words beginning with “\” (tex 146 information technology and libraries | september 2009 instructions) are now accepted in the dictionary. the modification for pdf documents is similar. in this case, bracketed words (pdf entities)— for example “(abc)”—are acceptable as dictionary entries. notice that pdf files are internally compressed by default—the transform can be applied after decompressing them into textual format. the precomp tool is used for this purpose. the subscheme for ps files features two modifications: its dictionary accepts words beginning with “/” and “\” or ending with “(“, and its number tokens can contain not only decimal but also hexadecimal digits (though a single number must have at least one decimal digit). the hexadecimal number must be at least 6 digits long, and is encoded with a flag: a byte containing its length (numbers with more than 261 digits are split into parts) and a sequence of bytes, each containing two digits from the number (if the number of digits is odd, the last byte contains only one digit). for rtf documents, the dictionary accepts the “\”-preceded words, like the tex files. moreover, the hexadecimal numbers are encoded in the same way as in the ps subscheme so that rtf documents containing images can be significantly reduced in size. specialization for xml is roughly the transform described in our earlier article, “revisiting dictionarybased compression.”20 it allows for xml start tags and entities to be added to dictionary, and it replaces every end tag respecting the xml well-formedness rule (i.e., closing the element opened most recently) with a single flag. it also uses a single flag to denote xml attribute value begin and end marks. html documents are handled similarly. 
the only difference is that the tags that, according to the html 4.01 specification, are not expected to be followed by an endtag (base, link, xbasehref, br, meta, hr, img, area, input, embed, param and col) are ignored by the mechanism replacing closing tags (so that it can guess the correct closing tag even after the singular tags were encountered).21 n using the scheme in a digital library project many textual digital libraries seriously lack text compression capabilities, and popular digital library systems, such as greenstone, have no embedded efficient text compression.22 therefore we have decided to develop ctdl as an open-source software library. the library is free to use and can be downloaded from www.ii.uni.wroc .pl/~inikep/research/ctdl/ctdl09.zip. the library does not require any additional nonstandard libraries. it has both the text transform and back-end compressors embedded. however, compressing pdf documents requires them to be decompressed first with the free precomp tool. the compression routines are wrapped in a code selecting the best algorithm depending on the chosen compression mode and the input document format. the interface of the library consists of only two functions: ctdl_encode and ctdl_decode, for, respectively, compressing and decompressing documents. ctdl_encode takes the following parameters: n char* filename—name of the input (uncompressed) document n char* filename_out—name of the output (compressed) document n efiletype ftype—format of the input document, defined as: enum efiletype { html, pdf, ps, rtf, tex, txt, xml}; n edictionarytype dtype—dictionary type, defined as: enum edictionarytype { static, semidynamic }; ctdl_decode takes the following parameters: n char* filename—name of the input (compressed) document n char* filename_out—name of the output (decompressed) document table 1. universal transform optimizations ctdl settings ctdl+ settings format minfr wdspc spruns letcnt wdspc spruns letcnt html 3 + + + + + pdf 3 ps 6 + + rtf 3 + + + tex 3 + + + + + + txt 6 + + + + + + xml 3 + + + + + the efficient storage of text documents in digital libraries | skibiński and swacha 147 the library was written in the c++ programming language, but a compiled static library is also distributed; thus it can be used in any language that can link such libraries. currently, the library is compatible with two platforms: microsoft windows and linux. to use static dictionaries, the respective dictionary file must be available. the library is supplied with an english dictionary trained on a 3 gb text corpus from project gutenberg.23 seven other dictionaries—german, spanish, finnish, french, italian, polish, and russian— can be freely downloaded from www.ii.uni.wroc.pl/~inikep/ research/dicts. there also is a tool that helps create a new dictionary from any given corpus of documents, available from skibiński upon request via e-mail (inikep@ii.uni .wroc.pl). the library can be used to reduce the storage requirements or also to reduce the time of delivering a requested document to the library user. in the first case, the decompression must be done on the server side. in the second case, it must be done on the client side, which is possible because stand-alone decompressors are available for microsoft windows and linux. obviously, a library can support both options by providing the user with a choice whether a document should be delivered compressed or not. 
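a minimal caller built on the two functions described above might look as follows. note two assumptions: the header name ctdl.h is a guess, and the enumerators are written here in upper case (TXT, HTML, STATIC, SEMIDYNAMIC) because a c++ constant literally spelled static would collide with the keyword; check the distributed header for the exact declarations. the file names are placeholders.

    #include "ctdl.h"  // assumed header name; declares ctdl_encode, ctdl_decode, and the enums

    int main() {
        // ctdl+ mode: compress an html document against the shared static english dictionary
        // (requires the static dictionary file shipped with the library to be present)
        char in_html[]  = "thesis.html";
        char out_html[] = "thesis.html.ctdl";
        ctdl_encode(in_html, out_html, HTML, STATIC);

        // ctdl mode: compress a plain-text file with its own semidynamic dictionary
        char in_txt[]  = "notes.txt";
        char out_txt[] = "notes.txt.ctdl";
        ctdl_encode(in_txt, out_txt, TXT, SEMIDYNAMIC);

        // restore the originals; the decoder takes no further parameters
        char restored_html[] = "thesis.restored.html";
        char restored_txt[]  = "notes.restored.txt";
        ctdl_decode(out_html, restored_html);
        ctdl_decode(out_txt, restored_txt);
        return 0;
    }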
if documents are to be decompressed client-side, the basic ctdl, using a semidynamic dictionary, seems handier, since it does not require the user to obtain the static dictionary that was used to compress the downloaded document. still, the size of such a dictionary is usually small, so it does not disqualify ctdl+ from this kind of use. n experimental results we tested ctdl experimentally on a benchmark set of text documents. the purpose of the tests was to compare the storage requirements of different document formats in compressed and uncompressed form. in selecting the test files we wanted to achieve the following goals: n test all the formats listed in table 1 (therefore we decided to choose documents that produced no errors during document format conversion) n obtain verifiable results (therefore we decided to use documents that can be easily obtained from the internet) n measure the actual compression improvement from applying the proposed scheme (apart from the rtf format, the scheme is neutral to the images embedded in documents; therefore we decided to use documents that have no embedded images) for these reasons, we used the following procedure for selecting documents to the test set. first, we searched the project gutenberg library for tex documents, as this format can most reliably be transformed into the other formats. from the fifty-one retrieved documents, we removed all those containing images as well as those that the htlatex tool failed to convert to html. in the eleven remaining documents, there were four jane austen books; this overrepresentation was handled by removing three of them. the resulting eight documents are given in table 2. from the tex files we generated html, pdf, and ps documents. then we used word 2007 to transform html documents into rtf, doc, and xml (thus this is the microsoft word xml format, not the project gutenberg xml format). the txt files were downloaded from project gutenberg. the tests were conducted on a low-end amd sempron 3000+ 1.80 ghz system with 512 mb ram and a seagate 80 gb ata drive, running windows xp sp2. for comparison purposes, we used three generalpurpose compression programs: n gzip implementing deflate n bzip2 implementing a bwt-based compression algorithm table 2. test set documents specification file name title author tex size (bytes) 13601-t expositions of holy scripture: romans corinthians maclaren 1,443,056 16514-t a little cook book for a little girl benton 220,480 1noam10t north america, v. 1 trollope 804,813 2ws2610 hamlet shakespeare 194,527 alice30 alice in wonderland carroll 165,844 cdscs10t some christmas stories dickens 127,684 grimm10t fairy tales grimm 535,842 pandp12t pride and prejudice austen 727,415 148 information technology and libraries | september 2009 n ppmvc implementing a ppm-derived compression algorithm24 tables 3–10 show n the bitrate attained on each test file by the deflatebased gzip in default mode, the proposed compression scheme in the semidynamic and static variants with deflate as the back-end compression algorithm, 7-zip in lzma mode, the proposed compression scheme in the semidynamic and static variants with lzma as the back-end compression algorithm, bzip2 and ppmvc; n the average bitrate attained on the whole test corpus; and n the total compression and decompression times (in seconds) for the whole test corpus, measured on the test platform (they are total elapsed times including program initialization and disk operations). 
bitrates are given in output bits per character of an uncompressed document in a given format, so a smaller table 3. compression efficiency and times for the txt documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.944 2.244 2.101 2.337 2.057 1.919 2.158 1.863 16514-t 2.566 2.150 1.969 2.228 1.993 1.838 2.010 1.780 1noam10t 2.967 2.337 2.109 2.432 2.151 1.958 2.160 1.946 2ws2610 3.217 2.874 2.459 2.871 2.659 2.312 2.565 2.343 alice30 2.906 2.533 2.184 2.585 2.360 2.056 2.341 2.090 cdscs10t 3.222 2.898 2.298 2.928 2.721 2.192 2.694 2.436 grimm10t 2.832 2.275 2.090 2.357 2.079 1.931 2.112 1.886 pandp12t 2.901 2.251 2.097 2.366 2.061 1.930 2.032 1.835 average 2.944 2.445 2.163 2.513 2.260 2.017 2.259 2.022 comp. time 0.688 1.234 0.954 6.688 2.640 2.281 2.110 3.281 dec. time 0.125 0.454 0.546 0.343 0.610 0.656 0.703 3.453 table 4. compression efficiency and times for the tex documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.927 2.233 2.092 2.328 2.049 1.913 2.146 1.852 16514-t 2.277 1.904 1.794 1.957 1.744 1.645 1.746 1.534 1noam10t 2.976 2.370 2.142 2.445 2.186 1.986 2.195 1.976 2ws2610 3.206 2.906 2.482 2.864 2.674 2.323 2.562 2.340 alice30 2.897 2.526 2.183 2.573 2.350 2.048 2.332 2.085 cdscs10t 3.224 2.931 2.328 2.941 2.759 2.222 2.723 2.466 grimm10t 2.831 2.304 2.120 2.364 2.113 1.960 2.143 1.910 pandp12t 2.881 2.239 2.090 2.346 2.049 1.916 2.013 1.817 average 2.902 2.427 2.154 2.477 2.241 2.002 2.233 1.998 comp. time 0.688 1.250 0.969 6.718 2.703 2.406 2.140 3.329 dec. time 0.109 0.453 0.547 0.360 0.609 0.672 0.703 3.485 the efficient storage of text documents in digital libraries | skibiński and swacha 149 bitrate (of, e.g., rtf documents compared to the plain text) does not mean the file is smaller, only that the compression was better. uncompressed files have a bitrate of 8 bits per character. looking at the results obtained for txt documents (table 3), we can see an average improvement of 17 percent for ctdl and 27 percent for ctdl+ compared to the baseline deflate implementation. compared to the baseline lzma implementation, the improvement is 10 percent for ctdl and 20 percent for ctdl+. also, ctdl+ combined with lzma compresses txt documents 31 percent better than gzip, 11 percent better than bzip2, and slightly better than the state-of-the-art ppmvc implementation. in case of tex documents (table 4), the gzip results were improved, on average, by 16 percent using ctdl and by 26 percent using ctdl+; the numbers for lzma are 10 percent for ctdl and 19 percent for ctdl+. in a cross-method comparison, ctdl+ with lzma beats gzip by 31 percent, bzip2 by 10 percent, and attains results very close to ppmvc. on average, deflate-based ctdl compressed xml documents 20 percent better than the baseline algorithm (table 5), and with ctdl+ the improvement rises to 26 percent. ctdl improves lzma compression by 11 percent, and ctdl+ improves it by 18 percent. ctdl+ with lzma beats gzip by 33 percent, bzip2 by 8 percent, and loses only 4 percent to ppmvc. similar results were obtained for html documents (table 6): they were compressed with ctdl and deflate 18 percent better than with the deflate algorithm alone, and 27 percent better with ctdl+. lzma compression efficiency is improved by 11 percent with ctdl and 20 percent with ctdl+. ctdl+ with lzma beats gzip by 33 percent, bzip2 by 9 percent, and loses only 2 percent to ppmvc. 
for rtf documents (table 7), the gzip results were improved, on average, by 18 percent using ctdl, and 25 percent using ctdl+; the numbers for lzma are respectively 9 percent for ctdl and 17 percent for ctdl+. in a cross-method comparison, ctdl+ with lzma beats gzip by 34 percent, bzip2 by 7 percent, and loses 5 percent to ppmvc. although there is no mode designed especially for doc documents in ctdl (table 8), the basic txt mode was used, as it was found experimentally to be the best choice available. the results show it managed to improve deflate-based compression by 9 percent using ctdl, and by 21 percent using ctdl+, whereas lzma-based compression was improved respectively by 4 percent for ctdl and 14 percent for ctdl+. combined with lzma, ctdl+ compresses doc documents 30 percent better than gzip, 13 percent better than bzip2, and 1 percent better than ppmvc. in case of ps documents (table 9), the gzip results were improved, on average, by 5 percent using ctdl, and by 8 percent using ctdl+; the numbers for lzma improved 3 percent for ctdl and 5 percent for ctdl+. in a cross-method comparison, ctdl+ with lzma beats gzip by 8 percent, losing 5 percent to bzip2 and 7 percent to ppmvc. finally, ctdl improved deflate-based compression of pdf documents (table 10) by 9 percent using ctdl and 10 percent using ctdl+ (compared to gzip; the numbers are table 5. compression efficiency and times for the xml documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.046 1.551 1.514 1.585 1.405 1.339 1.451 1.242 16514-t 0.871 0.698 0.670 0.703 0.612 0.590 0.599 0.552 1noam10t 2.383 1.870 1.736 1.914 1.711 1.575 1.724 1.515 2ws2610 0.691 0.539 0.497 0.561 0.474 0.440 0.461 0.422 alice30 1.477 1.258 1.140 1.248 1.131 1.034 1.116 0.999 cdscs10t 2.106 1.892 1.576 1.862 1.741 1.462 1.721 1.538 grimm10t 1.878 1.485 1.422 1.521 1.337 1.276 1.337 1.198 pandp12t 1.875 1.404 1.349 1.465 1.263 1.207 1.252 1.105 average 1.666 1.337 1.238 1.357 1.209 1.115 1.208 1.071 comp. time 0.750 1.844 1.390 10.79 4.891 5.828 7.047 3.688 dec. time 0.141 0.672 0.750 0.421 0.859 0.953 1.140 3.907 150 information technology and libraries | september 2009 much higher if compared to the embedded pdf compression—see “native” column in table 10); the numbers for lzma are respectively 7 percent for ctdl and 10 percent for ctdl+. combined with lzma, ctdl+ compresses pdf documents 28 percent better than gzip, 4 percent better than bzip2, and 5 percent worse than ppmvc. the results presented in tables 3–10 show that ctdl manages to improve compression efficiency of the general-purpose algorithms it is based on. the scale of improvement varies between document types, but for most of them it is more than 20 percent for ctdl+ and 10 percent for ctdl. the smallest improvement is achieved in case of ps (about 5 percent). figure 1 shows the same results in another perspective: the bars show how much better compression ratios were obtained for the same documents using different compression schemes compared to gzip with default options (0 percent means no improvement). compared to gzip, ctdl offers a significantly better compression ratio at the expense of longer processing time. the relative difference is especially high in case of decompression. however, in absolute terms, even in the worst case of pdf, the average delay between ctdl+ and gzip is below 180 ms for compression and 90 ms for decompression per file. taking into consideration the low-end specification of the test computer, these results table 6. 
compression efficiency and times for the html documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.696 2.054 1.940 2.121 1.868 1.751 1.932 1.670 16514-t 1.726 1.405 1.310 1.436 1.258 1.180 1.257 1.113 1noam10t 2.768 2.159 1.972 2.244 1.979 1.815 1.973 1.785 2ws2610 2.084 1.747 1.504 1.743 1.525 1.344 1.499 1.303 alice30 2.451 2.124 1.829 2.128 1.929 1.701 1.888 1.684 cdscs10t 2.880 2.593 2.084 2.597 2.410 1.966 2.348 2.131 grimm10t 2.603 2.074 1.916 2.138 1.883 1.752 1.889 1.688 pandp12t 2.640 2.037 1.891 2.120 1.826 1.717 1.777 1.596 average 2.481 2.024 1.806 2.066 1.835 1.653 1.820 1.621 comp. time 0.750 1.438 1.078 8.203 3.421 3.328 2.672 3.500 dec. time 0.140 0.515 0.594 0.359 0.688 0.750 0.812 3.672 table 7. compression efficiency and times for the rtf documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 1.882 1.431 1.372 1.428 1.267 1.200 1.300 1.120 16514-t 0.834 0.701 0.696 0.662 0.601 0.591 0.568 0.529 1noam10t 2.244 1.774 1.637 1.765 1.594 1.462 1.601 1.404 2ws2610 0.784 0.630 0.581 0.629 0.545 0.500 0.520 0.485 alice30 1.382 1.196 1.065 1.134 1.046 0.948 0.995 0.922 cdscs10t 2.059 1.882 1.558 1.784 1.704 1.432 1.645 1.488 grimm10t 1.618 1.301 1.227 1.285 1.150 1.082 1.149 1.010 pandp12t 1.742 1.340 1.264 1.336 1.169 1.115 1.142 1.012 average 1.568 1.282 1.175 1.253 1.135 1.041 1.115 0.996 comp. time 0.766 2.047 1.500 12.62 6.500 7.562 8.032 3.922 dec. time 0.156 0.688 0.766 0.469 0.875 0.953 1.312 4.157 the efficient storage of text documents in digital libraries | skibiński and swacha 151 certainly seem good enough for practical applications. compared to lzma, ctdl offers better compression and a shorter compression time at the expense of longer decompression time. notice that the absolute gain in compression time is several times the loss in decompression time, and the decompression time remains short, noticeably shorter than bzip2’s and several times shorter than ppmvc’s. ctdl+ beats bzip2 (with the sole exception of ps documents) in terms of compression ratio and achieves results that are mostly very close to the resourcehungry ppmvc. n conclusions in this paper we addressed the problem of compressing text documents. although individual text documents rarely exceed several megabytes in size, their entire collections can have very large storage space requirements. although text documents are often compressed with general-purpose methods such as deflate, much better compression can be obtained with a scheme specialized for text, and even better if the scheme is additionally specialized for individual document formats. we have developed such a scheme (ctdl), beginning with a text transform designed earlier for xml documents and table 8. compression efficiency and times for the doc documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.798 2.183 2.062 2.181 1.976 1.854 2.115 1.818 16514-t 2.226 2.213 2.073 1.712 1.712 1.652 1.919 1.686 1noam10t 2.851 2.250 2.025 2.289 2.057 1.869 2.113 1.870 2ws2610 2.497 2.499 2.210 2.095 2.095 1.890 2.251 1.999 alice30 2.744 2.714 2.270 2.345 2.345 2.038 2.348 2.058 cdscs10t 2.916 2.891 2.231 2.559 2.560 2.062 2.475 2.196 grimm10t 2.691 2.677 2.059 2.179 2.179 1.856 2.075 1.833 pandp12t 2.761 2.171 2.050 2.189 1.955 1.843 1.983 1.770 average 2.686 2.450 2.123 2.194 2.110 1.883 2.160 1.904 comp. time 0.718 1.312 1.031 7.078 4.063 3.001 2.250 3.421 dec. time 0.125 0.375 0.547 0.344 0.547 0.718 0.735 3.625 table 9. 
compression efficiency and times for the ps documents deflate lzma bzip2 ppmvc file name gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 2.847 2.634 2.589 2.213 2.105 2.074 2.011 1.778 16514-t 3.226 3.129 3.039 2.730 2.707 2.699 2.613 2.505 1noam10t 2.718 2.551 2.490 2.147 2.060 2.015 1.892 1.694 2ws2610 3.064 2.922 2.795 2.600 2.521 2.450 2.336 2.186 alice30 3.224 3.154 3.026 2.750 2.745 2.691 2.553 2.400 cdscs10t 3.110 3.029 2.890 2.657 2.683 2.579 2.447 2.276 grimm10t 2.833 2.664 2.597 2.288 2.200 2.162 2.074 1.863 pandp12t 2.814 2.533 2.468 2.193 2.049 1.998 1.858 1.644 average 2.980 2.827 2.737 2.447 2.384 2.334 2.223 2.043 comp. time 1.328 3.015 2.500 14.23 10.96 11.09 4.171 5.765 dec. time 0.203 0.688 0.781 0.609 1.063 1.125 1.360 6.063 152 information technology and libraries | september 2009 modifying it for the requirements of each of the investigated document formats. it has two operation modes: basic ctdl and ctdl+ (the latter uses a common word dictionary for improved compression) and uses two back-end compression algorithms: deflate and lzma (differing in compression speed and efficiency). the improvement in compression efficiency, which can be observed in the experimental results, amounts to a significant reduction of data storage requirements, giving the reasons to use the library in both new and existing digital library projects instead of general-purpose compression programs. to facilitate this process, we implemented the scheme as an open-source software library under the same name, freely available at http://www.ii.uni.wroc . p l / ~ i n i k e p / re s e a rc h / c t d l / ctdl09.zip. although the scheme and the library are now complete, we plan future extensions aiming both to increase the level of specializations for currently handled document formats and to extend the list of handled document formats. table 10. compression efficiency and times for the (uncompressed) pdf documents deflate lzma bzip2 ppmvc file name native gzip ctdl ctdl+ 7-zip ctdl ctdl+ 13601-t 3.443 2.624 2.191 2.200 1.986 1.708 1.656 1.852 1.659 16514-t 4.370 2.839 2.836 2.810 2.422 2.422 2.328 2.378 2.241 1noam10t 3.379 2.522 2.103 2.094 1.924 1.659 1.603 1.770 1.587 2ws2610 3.519 2.204 2.346 2.248 1.781 1.947 1.860 1.625 1.480 alice30 3.886 2.863 2.753 2.668 2.429 2.308 2.216 2.315 2.137 cdscs10t 3.684 2.835 2.688 2.557 2.399 2.276 2.164 2.260 2.079 grimm10t 3.543 2.557 2.135 2.120 2.008 1.713 1.661 1.858 1.696 pandp12t 3.552 2.684 2.267 2.256 2.071 1.831 1.769 1.870 1.705 average 3.672 2.641 2.415 2.369 2.128 1.983 1.907 1.991 1.823 comp. time n/a 1.594 3.672 3.250 19.62 13.31 16.32 5.641 7.375 dec. time n/a 0.219 0.844 0.969 0.719 1.219 1.360 1.765 7.859 figure 1. compression improvement relative to gzip the efficient storage of text documents in digital libraries | skibiński and swacha 153 acknowledgements szymon grabowski is the coauthor of the xml-wrt transform, which served as the basis for the ctdl library. references 1. john f. gantz et al., the diverse and exploding digital universe: an updated forecast of worldwide information growth through 2011 (framingham, mass.: idc, 2008), http://www .emc.com/collateral/analyst-reports/diverse-exploding-digital -universe.pdf (accessed may 7, 2009). 2. timothy c. bell, alistair moffat, and ian h. witten, “compressing the digital library,” in proceedings of digital libraries ‘94 (college station: texas a&m univ. 1994): 41. 3. ian h. witten and david bainbridge, how to build a digital library (san francisco: morgan kaufmann, 2002). 4. chad m. 
kahl and sarah c. williams, “accessing digital libraries: a study of arl members’ digital projects,” the journal of academic librarianship 32, no. 4 (2006): 364. 5. donald e. knuth, tex: the program (reading, mass.: addison-wesley, 1986); microsoft technical support, rich text format (rtf) version 1.5 specification, 1997, http://www.biblioscape .com/rtf15_spec.htm (accessed may 7, 2009); tim bray et al., eds., extensible markup language (xml) 1.0 (fourth edition), 2006, http://www.w3.org/tr/2006/rec-xml-20060816 (accessed may 7, 2009); dave raggett, arnaud le hors, and ian jacobs, eds., w3c html 4.01 specification, 1999, http://www.w3.org/ tr/rec-html40/ (accessed may 7, 2009); postscript language reference, 3rd ed. (reading, mass.: addison-wesley, 1999), http://www.adobe.com/devnet/postscript/pdfs/plrm.pdf (accessed may 7, 2009); pdf reference, 6th ed., version 1.7, 2006, http://www.adobe.com/devnet/acrobat/pdfs/pdf_ reference_1-7.pdf (accessed may 7, 2009). 6. jacob ziv and abraham lempel, “a universal algorithm for sequential data compression,” ieee transactions on information theory 23, no. 3 (1977): 337. 7. ian h. witten, alistair moffat, and timothy c. bell, managing gigabytes: compressing and indexing documents and images, 2nd ed. (san francisco: morgan kaufmann, 1999). 8. john g. cleary and ian h. witten, “data compression using adaptive coding and partial string matching,” ieee transactions on communication 32, no. 4, (1984): 396; michael burrows and david j. wheeler, “a block-sorting lossless data compression algorithm,” digital equipment corporation src research report 124, 1994, www.hpl.hp.com/techreports/ compaq-dec/src-rr-124.pdf (accessed may 7, 2009). 9. witten, moffat, and bell, managing gigabytes. 10. jon louis bentley et al., “a locally adaptive data compression scheme,” communications of the acm 29, no. 4 (1986): 320; r. nigel horspool and gordon v. cormack, “constructing word-based text compression algorithms,” proceedings of the data compression conference (snowbird, utah, 1992): 62. 11. see for example andrei v. kadach, “text and hypertext compression,” programming & computer software 23, no. 4 (1997): 212; alistair moffat, “word-based text compression,” software—practice & experience 2, no. 19 (1989): 185; przemysław skibiński, szymon grabowski, and sebastian deorowicz, “revisiting dictionary-based compression,” software— practice & experience 35, no. 15 (2005): 1455. 12. przemysław skibiński, jakub swacha, and szymon grabowski, “a highly efficient xml compression scheme for the web,” proceedings of the 34th international conference on current trends in theory and practice of computer science, lncs 4910 (2008): 766. 13. jon louis bentley et al., “a locally adaptive data compression scheme,” communications of the acm 29, no. 4 (1986): 320. 14. skibiński, grabowski, and deorowicz, “revisiting dictionary-based compression,” 1455. 15. skibiński, swacha, and grabowski, “a highly efficient xml compression scheme for the web,” 766. 16. peter deutsch, “deflate compressed data format specification version 1.3,” rfc1951, network working group, 1996, www.ietf.org/rfc/rfc1951.txt (accessed may 7, 2009). 17. christian schneider, precomp—a command line precompressor, 2009, http://schnaader.info/precomp.html (accessed may 7, 2009). 18. the technical details of the algorithm constructing code words and assigning them to indexes, and encoding numbers and special tokens, are given in skibiński, swacha, and grabowski, “a highly efficient xml compression scheme for the web,” 766. 19. 
david solomon, data compression: the complete reference, 4th ed. (london: springer-verlag, 2006). 20. skibiński, swacha, and grabowski, “a highly efficient xml compression scheme for the web,” 766. 21. dave raggett, arnaud le hors, and ian jacobs, eds., w3c html 4.01 specification, 1999, http://www.w3.org/tr/rec -html40/ (accessed may 7, 2009). 22. ian h. witten, david bainbridge, and stefan boddie, “greenstone: open source dl software,” communications of the acm 44, no. 5 (2001): 47. 23. project gutenberg, 2008, http://www.gutenberg.org/ (accessed may 7, 2009). 24. przemysław skibiński and szymon grabowski, “variablelength contexts for ppm,” proceedings of the ieee data compression conference (snowbird, utah, 2004): 409. alcts cover 2 lita cover 3, cover 4 index to advertisers 86 information technology and libraries | september 2011 on technology and other decisions in my career. i know that i can post a question to lita-l or ala connect and get a quick, diverse response to an inquiry. i know that i can call on my lita colleagues to serve as references and reviewers as i move through my career. i also know that i can depend upon lita to help keep me current and well informed about technology and how it is integrated into our libraries and lives. this also gives me an edge in my career. so much of the lita experience is currently gained from attending meetings in person and making connections—those of you who have attended the lita happy hour can probably attest to this. for several years lita has not had a requirement to attend meetings in person and allows for virtual participation in committees and interest groups. several ad hoc methods have developed to allow members to attend meetings virtually. to better institutionalize the process two new taskforces have been formed to look at virtual participation in formal and informal lita meetings. a broadcasting taskforce is charged with making a recommendation on the best ways to deliver business meetings and another taskforce is charged with investigating methods to deliver education and programming to members virtually. both taskforces will pay careful attention to interaction and other attributes of in person gatherings that can be applied to virtual meetings so that we retain the connection-making experience. it is hard to assign monetary value to membership in an association, but we do so every time we make a decision to join or renew membership. when i renew and pay annual dues to lita i affirm that i am receiving value, and i do so without thinking. it is a given that i will renew. in addition to my library memberships i am a member of the wildlife conservation society (the group behind the bronx zoo and several other zoos in nyc). each year as i renew my membership i do a quick cost analysis calculating how many times i visited the zoos and what it would have cost my family if we were not members. but before i can finish that exercise my mind begins to wander and i start to think about the excursions to the zoos-camel rides, newborn animals—and those experiences and the memories created derail any cost recovery exercise. it is impossible to put monetary value on the wonderful experiences my family share during our visits to the zoo (incidentally it is more economical as well). i also feel some pride in contributing to an organization that does such wonderful programming and makes a real difference for animals and our planet. i understand that my membership helps them do what they do best. 
i don’t do this cost analysis with lita, but perhaps i should. the current price of lita membership is sixty dollars per year, which is about sixteen cents per day. as members we need to ask ourselves if we are receiving in return what a s i write my first president’s message for ital, i am wrapping up my year as vice president and the ala annual conference is fast approaching. the past year has been a busy one—handling necessary division business, including meeting with my fellow ala vice presidents, making committee appointments, planning an orientation for new board members, strategic planning, and attending various conferences and meetings to prepare me for my role as lita president. i am lucky to follow such wonderful leaders as karen starr and michelle frisque, who have both helped ready me for the year ahead. my life outside of lita has been equally busy. i started a new position as the director of weill cornell medical college library earlier this year and have a busy home life with two small children. as usual, i have been juggling quite a bit and often dropping a few balls. my mantra is that it is impossible to keep all the balls in the air all of the time, but when they do drop be careful not to let them roll so far away from you so that you lose sight of them. eventually i pick them up and start juggling again. i know that i am not alone in this juggling exercise. lita members have real jobs and friends, family, and other social responsibilities that keep us busy. so why do we give so much to our profession, including lita? if you are like me, it is because we get so much in return. the importance of activity and leadership in national, professional associations cannot be overrated. my experience in lita and other professional library associations has given me an opportunity to hone leadership skills working with various committees and boards over the years. the achievements that i have made in my career have a direct correlation to my work with lita. as libraries flatten organizational structures, lita is one place where anyone can take on leadership roles, gaining valuable experience. many members have agreed to take on leadership roles in the coming year by volunteering for committees and taskforces and accepting various appointments and i want to thank everyone who came forward. in the coming year i will be working with several officers and committees to develop orientations, mentoring initiatives, and leadership training for our members. i do appreciate that not everyone wants to take on a leadership role in lita. the networking opportunities, both formal and informal, also have been extremely valuable in my career. the people i have met in lita have become colleagues i am comfortable turning to for advice colleen cuddy (colleen.cuddy@med.cornell.edu) is lita president 2011–12 and director of the samuel j. wood library and c. v. starr biomedical information center at weill cornell medical college, new york, new york colleen cuddy president’s message: reflections on membership continued on page 89 editorial | truitt 89 editorial.cfm (accessed july 13, 2011). 3. begin with fforde’s the eyre affair (2001) and proceed from there. if you are a librarian and are not quickly enchanted, you probably should consider a career change very soon! thank you, michele n! .youtube.com/watch?v=sps6c9u7ras. sadly, the rest of us must borrow or rent a copy. 2. marc truitt, “no more silver bullets, please,” information technology & libraries 29, no. 
2 (june 2010), http://www.ala.org/ala/mgrps/divs/lita/publications/ital/292010/2902jun/ we give to the organization. the lita assessment and research committee recently surveyed membership to find out why people belong to lita; this is an important step in helping lita provide programming and other offerings that will be most beneficial to its users, but the decision on whether to be a lita member, i believe, is more personal and doesn't rest on the fact that a particular drupal class is offered or that a particular speaker is a member of the top tech trends panel. it is based on the overall experience that you have as a member, the many little things. i knew in just a few minutes of attending my first lita open house 12 years ago that i had found my ala home in lita. i wish that everyone could have such a positive experience being a member of lita. if your experience is less than positive, how can it be more so? what are we doing right? what could we do differently? please let me or another officer know, and/or volunteer to become more involved and create a more valuable experience for yourself and others. president's message continued from page 86
reference information extraction and processing using conditional random fields

tudor groza, gunnar aastrand grimnes, and siegfried handschuh

abstract

fostering both the creation and the linking of data with the aim of supporting the growth of the linked data web requires us to improve the acquisition and extraction mechanisms of the underlying semantic metadata. this is particularly important for the scientific publishing domain, where currently most of the datasets are being created in an author-driven, manual manner. in addition, such datasets capture only fragments of the complete metadata, usually omitting important elements such as the references, although they represent valuable information. in this paper we present an approach that aims at dealing with this aspect of extraction and processing of reference information. the experimental evaluation shows that, currently, our solution handles very well diverse types of reference format, thus making it usable for, or adaptable to, any area of scientific publishing.

1. introduction

the progressive adoption of semantic web 1 techniques resulted in the creation of a series of datasets connected by the linked data 2 initiative, and via the linked data principles, into a universal web of linked data. in order to foster the continuous growth of this linked data web, we need to improve the acquisition and extraction mechanisms of the underlying semantic metadata. unfortunately, the scientific publishing domain, a domain with an enormous potential for generating large amounts of linked data, still promotes trivial mechanisms for producing semantic metadata. 3 as an illustration, the metadata acquisition process of the semantic web dog food server, 4 the main linked data publication repository available on the web, consists of two steps:

■■ the authors manually fill in submission forms corresponding to different publishing venues (e.g., conferences or workshops), with the resulting (usually xml) information being transformed via scripts into semantic metadata, and
■■ the entity uris (i.e., authors and publications) present in this semantic metadata are then manually mapped to existing web uris for linking/consolidation purposes.

tudor groza (tudor.groza@uq.edu.au) is postdoctoral research fellow, school of information technology and electrical engineering, university of queensland; gunnar aastrand grimnes (grimnes@dfki.uni-kl.de) is researcher, german research center for artificial intelligence (dfki) gmbh, kaiserslautern, germany; siegfried handschuh (msiegfried.handschuh@deri.org) is senior lecturer/associate professor, national university of ireland, galway, ireland.

moreover, independent of the creation/acquisition process, one particular component of the publication metadata, i.e., the reference information, is almost constantly neglected. the reason is mainly the amount of work required to manually create it, or the complexity of the task, in the case of automatic extraction. as a result, currently, there are no datasets in the linked data web exposing reference information, while the number of digital libraries providing search and link functionality over references is rather limited.
this is quite a problematic gap if we consider the amount of information provided by references and their foundational support for other application techniques that bring value to researchers and librarians, such as citation analysis and citation metrics, tracking temporal author-topic evolution 5 or co-authorship graph analysis. 6,7 in this paper we focus on the first of the above-mentioned steps, i.e., providing the underlying mechanisms for automatic extraction of reference metadata. we devise a solution that enables extraction and chunking of references using conditional random fields (crf). 8 the resulting metadata can then be easily transformed into semantic metadata adhering to particular schemas via scripts, the added value being the exclusion of the manual author-driven creation step from the process. from the domain perspective, we focus on computer science and health sciences only because these domains have representative datasets that can be used for evaluation and hence enable comparison against similar approaches. however, we believe that our model can be applied also in domains such as digital humanities or social sciences, and we intend, in the near future, to build a corresponding corpus that would allow us to test and adapt (if necessary) our solution to these domains. figure 1. examples of chunked and labeled reference strings reference chunking represents the process of label sequencing a reference string, i.e., tagging the parts of the reference containing the authors, the title, the publication venue, etc. the main issue associated with this task is the lack of uniformity in the reference representation. figure 1 presents three examples of chunked and labeled reference strings. one cannot infer generic patterns for all types of references. for example, the year (or date) of some of the references of this paper are similar to example 2 from the figure, i.e., they are located at the very end of the reference string. unfortunately, this does not hold for some journal reference formats, such as the one presented in example 1. and at the same time, the actual date might not comprise only the year, but also the month (and even day). in addition to the placement of the particular types of tokens within the reference string, one of the major concerns when labeling these types of tokens is disambiguation. generally, there are three categories of ambiguous elements: reference information extraction and processing |groza, grimnes, and handschuh 8  names—can act as authors, editors, or even part of organization names (e.g., max planck institute); in example 1 a name is used as part of the title;  numbers—can act as pages, years, days, volume numbers, or just numbers within the title;  locations—can act as actual locations or part of organization names (e.g., univ. of wisconsin) to help the chunker in performing disambiguation, one can use a series of markers, such as, pp. for pages, tr for technical reports, univ. or institute for organization. however, there are cases where such markers help in detecting the general category of the token, e.g., publication venue, but a more detailed disambiguation is required. for example, the proc. marker generally signals the publication venue of the reference, without knowing exactly whether it represents a workshop, conference or even journal (as in the case of proc. natl. acad. sci.—proceedings of the national academy of sciences). 
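since figure 1 itself is not reproduced here, the short python sketch below illustrates what such label sequencing produces for a single, hypothetical reference string; the reference, the tag spellings, and the grouping step are illustrative assumptions rather than the authors' actual data structures.

from itertools import groupby

# a hypothetical reference string, already tokenized and tagged as (token, label) pairs
chunked = [
    ("J.", "author"), ("Doe", "author"), ("and", "author"), ("A.", "author"), ("Smith.", "author"),
    ("Compressing", "title"), ("library", "title"), ("data.", "title"),
    ("In", "booktitle"), ("Proc.", "booktitle"), ("of", "booktitle"), ("the", "booktitle"),
    ("Digital", "booktitle"), ("Libraries", "booktitle"), ("Conference,", "booktitle"),
    ("pp.", "pages"), ("41-48,", "pages"),
    ("2009.", "date"),
]

# collapse consecutive tokens carrying the same label into the "blocked partition"
# of the reference string described in the text
fields = [(label, " ".join(tok for tok, _ in group))
          for label, group in groupby(chunked, key=lambda pair: pair[1])]

for label, text in fields:
    print(f"{label:10s} {text}")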
the solution we have devised was built to properly handle such disambiguation issues and the intrinsic heterogeneous nature of references. the features of the crf chunker model were chosen to provide a representative discrimination between the different fields of the reference string. consequently, as the experimental results show, the resulting chunker has a superior efficiency, while at the same time maintaining an increased versatility.

the rest of the paper is structured as follows: in section 2 we briefly describe conditional random fields and analyze the existing related work. section 3 details the crf-based reference chunker and, before concluding in section 5, section 4 presents our experimental results.

2. background

2.1 conditional random fields

to have a better understanding of the machine learning technique used by our solution, in the following we give a brief description of the conditional random fields paradigm.

figure 2. example linear crf—showing dependencies between features x and classes y

conditional random fields (crf) is a probabilistic graphical model for classification. crf, in general, can represent many different types of graphical models; however, in the scope of this paper, we use the so-called linear-chain crfs. a simple example of a linear dependency graph is shown in figure 2, where only the features x of the previous item influence the class of the current item y. the conditional probability is defined as

p(y | x) = (1 / Z(x)) exp( Σ_j θ_j F_j(y, x) ),

where F_j(y, x) = Σ_i f_j(y_{i-1}, y_i, x, i) and Z(x) = Σ_y exp( Σ_j θ_j F_j(y, x) ), the sum in Z(x) ranging over all possible label sequences y.

the model is usually trained by maximizing the log-likelihood of the training data by gradient methods. a dynamic algorithm is used to compute all the required probabilities p_θ(y_i, y_{i+1}) for calculating the gradient of the likelihood. this means that in contrast to traditional classification algorithms in machine learning (e.g., support vector machines 9 ), it not only considers the attributes of the current element when determining the class, but also attributes of preceding and succeeding items. this makes it ideal for tagging sequences, such as chunking of parts of speech or parts of references, which is what we require for our chunking task.

2.2 related work

in recent years, extensive research has been performed in the area of automatic metadata extraction from scientific publications. most of the approaches focus on one of the two main metadata components, i.e., on the heading/bibliographic metadata or on the reference metadata, but there are also cases when the entire set is targeted. as this paper focuses only on the second component, within this section we present and discuss those applications that deal strictly with reference chunking.

the parscit framework is the closest technique mapping to our goals and methodology. 10 parscit is an open-source reference-parsing package. while its first version used a maximum entropy model to perform reference chunking, 11 currently, inspired by the work of peng et al., 12 it uses a trained crf model for label sequencing. the model was obtained based on a set of twenty-three token-oriented features tailored towards correcting the errors that peng's crf model produced. our crf chunker builds on the work of parscit. however, as we aimed at improving the chunking performance, we altered some of the existing features and introduced additional ones. moreover, we have compiled significantly larger gazetteers required for detecting different aspects, such as names, places, organizations, journals, or publishers.
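to make the linear-chain formulation in section 2.1 concrete, the following minimal numpy sketch computes log Z(x) with the forward (dynamic programming) recursion and the probability of one label sequence. the emission and transition score matrices stand in for the weighted feature sums θ·f and are filled with random numbers, so this is an illustration of the mathematics only, not the mallet-based implementation used later in the paper.

import numpy as np

def logsumexp(a, axis):
    """numerically stable log-sum-exp along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_partition(emit, trans):
    """log Z(x) for a linear-chain crf, computed with the forward algorithm.
    emit[t, j]  -- score of label j at position t (the weighted feature sum)
    trans[i, j] -- score of moving from label i at position t-1 to label j at t
    """
    alpha = emit[0]
    for t in range(1, emit.shape[0]):
        # alpha'[j] = logsumexp_i(alpha[i] + trans[i, j]) + emit[t, j]
        alpha = logsumexp(alpha[:, None] + trans, axis=0) + emit[t]
    return logsumexp(alpha, axis=0)

def sequence_log_prob(labels, emit, trans):
    """log p(y | x) for one concrete label sequence y (list of label indices)."""
    score = emit[0, labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1], labels[t]] + emit[t, labels[t]]
    return score - log_partition(emit, trans)

# toy instance: a 4-token reference and 3 labels (say author / title / date)
rng = np.random.default_rng(0)
emit = rng.normal(size=(4, 3))
trans = rng.normal(size=(3, 3))
print(np.exp(sequence_log_prob([0, 0, 1, 2], emit, trans)))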
one of the first attempts to extract and index reference information led to the currently well known system, citeseer. 13 around the same period, seymore et al. developed one of the first reference chunking approaches that used machine learning techniques. 14 the authors trained a hidden markov model (hmm) to build a reference sequence labeler using internal states for different parts of the fields. as it represented pioneering work, it also resulted in the first gold standard set, the cora dataset. at a later stage, the same group applied crf for the first time to perform reference chunking, which later inspired parscit. 15 reference information extraction and processing |groza, grimnes, and handschuh 10 in the same learning-driven category is the work of han et al. 16 the authors proposed an effective word clustering approach with the goal of reducing feature dimensionality when compared to hmm, while at the same time improving the overall chunking performance. the resultant domain, rule-based word clustering method for cluster feature representation used clusters formed from various domain databases and word orthographic properties. consequently, they achieved an 8.5 percent improvement on the overall accuracy of reference fields classification combined with a significant dimensionality reduction. flux-cim 17 is the only unsupervised 18 approach that targets reference chunking. the system uses automatically constructed knowledge bases from an existing set of sample references for recognizing the component fields of a reference. the chunking process features two steps:  a probability estimation of a given term within a reference which is a value for a given reference field based on the information encoded in their knowledge bases, and  the use of generic structural properties of references. similarly to seymore et al., 19 the authors have also created two datasets (specifically for the computer science and health science areas) to be used for comparing the achieved accuracies. a completely different, and novel, direction was developed by poon and domingos. 20 unlike all the other approaches, they propose a solution where the segmentation (chunking) of the reference fields is performed together with the entity resolution in a single integrated inference process. they, thus, help in disambiguating the boundaries of less-clear chunked fields, using the already well-segmented ones. although the results achieved are similar to, and even better than some of, the above-mentioned approaches, this is suboptimal from the computational perspective: the chunking/resolution time reported by the authors measured around thirty minutes. in addition to the previously described works, which were specifically tailored for bibliographic metadata extraction, there are a series of other approaches that could be used for the same purpose. for example, cesario et al. propose an innovative recursive boosting strategy, with progressive classification, to reconcile textual elements to an existing attribute schema. 21 in the case of bibliographic metadata segmentation, the metadata fields would correspond to the textual elements, while an ontology describing them (e.g., dublincore 22 or swrc 23 ) would have the schema role. the authors even describe an evaluation of the method using the dblp citation dataset, however, without giving precise details on the fields considered for segmentation. 
some other approaches include, in general, any sequence labeling techniques, e.g., slf, 24 named entity recognition techniques, 25 or even field association (fa) terms extraction, 26 the latter working on bibliographic metadata fields in a quasi-similar manner as the recursive boosting strategy. in conclusion, it is worth mentioning that retrieving citation contexts is an interesting research area especially in the context of digital libraries. our current work does not feature this aspect, but we regard it as one of the key next steps to be tackled. consequently, we mention the research performed by schwartz et al. 27 teufel et al., 28 or wu et al. 29 that deal with using citation contexts for discerning a citation's function and analyzing how this influences or is influenced by the work it points to. information technology and libraries | june 2012 11 3. method this section presents the crf chunker model. we start by defining the preprocessing steps that deal with the extraction of the references block, dividing the block into actual reference entries and cleaning the reference strings, and then detail the crf reference chunker features. 3.1 prerequisites most of the features used by the crf chunker require some forms of vocabulary entries. therefore, we have manually compiled a comprehensive list of gazetteers (only for english, except for the names), explained as follows:  firstname—25,155 entries gazetteer of the most common first names (independent of gender);  lastname—48,378 entries list of the most common surnames;  month—month names gazetteer and associated abbreviations;  venuetype—a structured gazetteer with five categories: conference, workshop, journal, techreport, and website. each category has attached its own gazetteer, containing specific keywords and not actual titles. for example, the conference gazetteerfeatures ten unigrams signaling conferences, such as conference, conf, or symposium;  location—places, cities, and countries gazetteer comprising 17,336 entries;  organization—150 entries gazetteer listing organization prefixes and suffixes (e.g., e.v. or kgaa);  proceedings—simple list of all possible appearances of the proceedings marker;  publisher—564 entries gazetteer comprising publisher unigrams (produced from around 150 publisher names);  jtitle—12,101 entries list of journal title unigrams (produced from around 1600 journal titles);  connection—a 42 entries stop-word gazetteer (e.g., to, and, as). 3.2 preprocessing in the preprocessing stage we deal with three aspects:  cleaning the provided input,  extracting the reference block, and  the division of the reference block into reference entries. the first step aims to clean the raw textual input received by the chunker of unwanted spacing characters while at the same time ensuring proper spacing where necessary. since the source of the textual input is unknown to the chunker, we make no assumptions with regard to its structure or content. 30 thus, in order to avoid inherent errors that might appear as a result of extracting the raw text from the original document, we perform the following cleaning steps:  we compress the text by eliminating unnecessary carriage returns, such that the lines containing less than 15 characters are merged with previous ones, 31  we introduce spaces after some punctuation characters, such as “,,” “.” or “-”, and finally,  we split the camel-cased strings, such as johndoe. 
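a minimal python sketch of the three cleaning steps just listed follows, under two stated assumptions: the 15-character threshold and the punctuation set are taken literally from the description above, and the camel-case split is applied naively (a production version would have to spare urls, decimal numbers, and abbreviations).

import re

def clean_raw_text(raw, min_line_len=15):
    """compact the input, space out punctuation, and split camel-cased strings."""
    # 1. merge lines shorter than min_line_len characters into the previous line
    merged = []
    for line in raw.splitlines():
        line = line.strip()
        if merged and len(line) < min_line_len:
            merged[-1] = (merged[-1] + " " + line).strip()
        else:
            merged.append(line)
    text = "\n".join(merged)
    # 2. introduce a space after ",", "." and "-" where one is missing
    text = re.sub(r"([,.\-])(?=\S)", r"\1 ", text)
    # 3. split camel-cased strings such as "JohnDoe" -> "John Doe"
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return text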
reference information extraction and processing |groza, grimnes, and handschuh 12 the result will be a compact and clean version of the input. also, if the raw input is already compact and clean, this preprocessing step will not affect it. the extraction of the reference block is done using regular expressions. generally, we search in the compacted and cleaned input for specific markers, like references or bibliography, located mainly at the beginning of a line. if these are not directly found, we try different variations, such as, looking for the markers at the end of a line, or looking for split markers onto two lines (e.g., ref – erences, or refer – ences). this latter case is a typical consequence of the above-described compacting step if the initial input was erroneously extracted. the text following the markers is considered for division, although it may contain unwanted parts such as appendices or tables. the division into individual reference entries is performed on a case basis. after splitting the reference block based on new lines, we look for prefix patterns at the beginning of each line. as an example, we analyze which lines start with “[”, “(”, or a number followed by “.” or space, and we record the positions of these lines in the list of all lines. to ensure that we don't consider any false positives when merging the adjacent lines into a reference entry, we compute a global average of the differences between positions. assuming that a reference does not span on more than four lines, if this average is between one and four, a reference entry is created. the same average is also used to extract the last reference in the list, thus detaching it from eventual appendices or tables. 3.3 the reference chunking model we have built the crf learning model based on a series of features used in principle also by the other crf reference chunking approaches such as parscit 32 or peng and mccallum 33 . a set of feature values is used to characterize each token present in the reference string, where the reference's token list is obtained by dividing the reference string into space-separated pieces. the complete list of features is detailed as follows. we use example 1 from figure 1 toexemplify the feature values. 
 token—the original reference token: bronzwaer,  clean token—the original token, stripped of any punctuation and lower cased: bronzwaer  token ending—a flag signaling the type of ending (possible values: lower cap – c / upper cap – c / digit – 0 / punctuation character: ,  token decomposition–start—five individual values corresponding to token's first five characters, taken gradually: b, br, bro, bron, bronz  token decomposition–end—five individual values corresponding to the token's last five characters, taken gradually: r, er, aer, waer, zwaer,  pos tag—the token's part of speech tag (possible values: proper noun phrase – nnp ,  noun phrase – np, adjective – jj, cardinal number – cd, etc): nnp  orthographic case—a flag signaling the token's orthographic case (possible values:  initialcap, singlecap, lowercase, mixedcaps, allcaps): singlecap  punctuation type—a flag signaling the presence and type of a trailing punctuation character (possible values: cont, stop, other): cont  number type—a flag signaling the presence and type of a number in the token (possible values: year, ordinal, 1dig, 2dig, 3dig, 4dig, 4dig+, nonumber): nonumber information technology and libraries | june 2012 13  dictionary entries—a set of ten flags signaling the presence of the token in the set of individual gazetteers listed in sect. 3.1. for our example the dictionary feature set would be: no lastname no no no no no no no no  date check—a flag checking whether the token may contain a date in form of a period of days, e.g., 12-14 (possible values: possdate, no): no  pages check—a flag checking whether the token may contain pages, e.g., 234–238 (possible values: posspages, no): no  token placement—the token placement in the reference string, based on its division into nine equal consecutive buckets. this feature indicates the bucket number: 0 for training purposes we compiled and manually tagged a set of 830 randomly chosen references. these were extracted from random publications from diverse conferences and journals from the computer science field (collected from ieee explorer, springer link or the acm portal), manually cleaned, tagged, and categorized according to their type of publication venue. 34 to achieve an increased versatility, instead of performing crossvalidation, 35 which would result in a datasettailored model with limited or no versatility, we opted for sampling the test data. hence, we included in the training corpus some samples from the testing datasets as follows: 10 percent of the cora dataset (i.e., 20 entries), 36 10 percent of the flux-cim cs dataset (i.e., 30 entries), 37 and 1% of the flux-cim hs dataset (i.e., 20 entries). consequently, the final training corpus consisted of a total of 900 reference strings. to clarify, this is, to some extent, similar to the dataset-specific cross-validation, but instead of considering, for example, a 60–40 ratio for training/testing, we used only 10 percent for training, while the testing (described in section 4) was performed as a direct application of the chunker on the entire dataset. as already mentioned, our focus on computer science and health sciences is strictly due to evaluation purposes. our proposed model is domain-agnostic, and hence, the steps described here can be easily performed on datasets emerged from other domains, if at all necessary. 
in reality, the chunker’s performance on references from a domain not covered above can be easily boosted simply by including a sample of references in the training set and then retraining the chunker. the list of labels used for training and then testing consists of author, title, journal, conference, workshop, website, technicalrep, date, publisher, location, volnum, pages, etal, note, editors, organization. as we will see in the evaluation, not all labels were actually used for testing (e.g., note or editors), some of them being present in the model for the sake of disambiguation. also, as opposed to the other approaches, we made a clear distinction between workshop and conference, which adds an extra degree to the complexity of the disambiguation. the crf model was trained using the mallet (a machine learning for language toolkit) implementation. 38 the output of the chunker is post-processed to expose a series of fine-grained details. as shown in figure 1 in all the examples, the chunking provides a blocked partition of the reference string, but we require for the author field an even deeper partition. consequently, following a rule-based approach we extract the individual author names from the author block making use of the punctuation marks, the orthographic case, and the alternation between initials and actual names. when no initials, subject to the existing punctuation marks, we consider as a rule-of-thumb that each name generally comprises one first name and one surname (in this order, i.e., john doe). the result of the post-processing is used in the linking process. reference information extraction and processing |groza, grimnes, and handschuh 14 4. experimental results we have performed an extensive evaluation of the proposed reference chunking approach. in general, all the previous work in reference chunking focuses on raw reference chunking, i.e., label sequencing at the macro level. more concretely, the other approaches split and tag the reference strings using blocks of complete references, without going into details such as chunking individual authors. the only exception is the parscit package that does perform complete reference chunking in a similar fashion as we do. the evaluation results presented in this section, will feature complete chunking only for our solution and for parscit, and raw chunking for the rest of the approaches. field parscit peng han et al. our approach p r f1 f1 p r f1 p r f1 author 98.7 99.3 98.99 99.4 92.6 99.1 97.6 99.08 99.6 99.30 title 96.0 98.4 97.18 98.3 92.2 93.0 92.6 95.64 95.64 95.64 date 100 98.4 99.19 98.9 98.5 95.9 97.2 99.33 98.67 98.99 pages 97.7 98.4 98.04 98.6 95.6 96.9 96.2 99.28 99.22 99.24 location 95.6 90.0 92.71 87.2 77.7 71.5 74.5 93.45 92.59 93.01 organization 90.9 87.9 89.37 94.0 76.5 77.3 76.9 100 87.87 93.54 journal 90.8 91.2 90.99 91.3 77.1 78.7 77.9 94.02 97.42 95.68 booktitle 92.7 94.2 93.44 93.7 88.7 88.9 88.88 97.77 98.44 98.10 publisher 95.2 88.7 91.83 76.1 56.0 64.1 59.9 94.84 95.83 95.33 tech. rep. 94.0 79.6 86.2 86.7 56.2 64.1 59.9 100 90.90 95.23 website 100 100 100 table 1. evaluation results on the cora dataset an additional observation we need to make is related to the reference fields taken into account. most of the fields we have focused on coincide with the fields considered by all the existing relevant approaches. 
nevertheless, there are also some discrepancies, listed as follows:

■■ the fields volume, number, editors, or note were used in the chunking process but are not considered for evaluation
■■ unlike all the other approaches, we make the distinction between conference and workshop as publication venues. however, for alignment purposes (i.e., to be able to compare our results with the other approaches), in the evaluation results these are merged into the booktitle field.

the actual tests were performed on four different datasets, three of them used also for evaluating the other approaches, and a fourth one compiled by us. in the case of the three existing datasets, during the experimental evaluation we did not make use of the preprocessing step as they were already clean. as the evaluation metric, we used the f1 score, 39 i.e., the harmonic mean of precision and recall, using the following formula:

f1 = 2 × precision × recall / (precision + recall)

in the following, we iterate over each dataset by providing a short description and the experimental results. it is worth mentioning that our crf reference chunker was trained only once, as described earlier, and not specifically for each dataset.

4.1 dataset: cora

the cora dataset is the first gold standard created for automatic reference chunking. 40 it comprises two hundred reference strings and focuses on the computer science area. each entry is segmented into thirteen different fields: author, editor, title, booktitle, journal, volume, publisher, date, pages, location, tech, institution and note.

table 1 shows the comparative evaluation results on the cora dataset of parscit, peng et al., 41 han et al., 42 and our approach. we observe that our chunker outperforms the other chunkers on most of the fields, with some of them presenting a significant increase in performance (looking at the f1 score): journal from 91.3 percent to 95.68 percent, booktitle from 93.44 percent to 98.10 percent, publisher from 91.83 percent to 95.33 percent, and especially tech. rep. from 86.7 percent to 95.23 percent. in the case of the fields where our chunker was outperformed, the f1 score is very close to the best of the approaches and includes an increase in one of its two components (i.e., precision or recall). for example, on the organization field, we scored 93.54 percent, the best being peng's 94 percent. however, we achieved a gain of almost 10 percent in precision when compared with parscit (100 percent vs. 90.9 percent precision). similarly, on the date field, our f1 was 98.99 percent, opposed to parscit's 99.19 percent, but with a better recall of 98.67 percent.

               parscit                   flux-cim                 our approach
field          p      r      f1         p      r      f1         p      r      f1
author         98.8   99.0   98.89      93.59  95.58  94.57      99.08  99.08  99.08
title          98.8   98.3   98.54      93.0   93.0   93.0       99.65  99.65  99.65
date           99.8   94.5   97.07      97.75  97.44  97.59      98.55  98.19  98.36
pages          94.7   99.3   96.94      97.0   97.84  97.41      97.28  97.72  97.49
location       96.9   88.4   92.45      96.83  97.6   97.21      95.55  94.5   95.02
journal        97.1   82.9   89.43      95.71  97.81  96.75      94.0   97.91  95.91
booktitle      95.7   99.3   97.46      97.47  95.45  96.45      99.13  99.13  99.13
publisher      98.8   75.9   85.84      100    100    100        98.59  98.59  98.59

table 2. evaluation results on the flux-cim dataset—cs domain

               flux-cim                 our approach
field          p      r      f1         p      r      f1
author         98.57  99.04  98.81      99.8   99.36  99.57
title          84.88  85.14  85.01      91.39  91.39  97.39
date           99.85  99.5   99.61      99.89  99.69  99.78
pages          99.1   99.2   99.45      99.94  99.59  99.76
journal        97.23  89.35  93.13      99.42  99.16  99.28

table 3. evaluation results on the flux-cim dataset—hs domain
4.2 dataset: flux-cim

flux-cim 43 is an unsupervised 44 reference extraction and chunking system. in order to evaluate its performance, the authors of flux-cim created two separate datasets:

■■ the flux-cim cs dataset, composed of a collection of heterogeneous references from the computer science field, and
■■ the flux-cim hs dataset, comprising an organized and controlled collection of references from pubmed.

the flux-cim cs dataset contains three hundred reference strings randomly selected from the acm digital library. each string is segmented into ten fields: author, title, conf, journal, volume, number, pub, date, pages and place. the flux-cim hs dataset contains 2000 entries, with each entry segmented into six fields: author, title, journal, volume, date and pages.

table 2 presents the comparative test results achieved by parscit, flux-cim, and our approach on the cs dataset. similar to the cora dataset, our chunker outperformed the other chunkers on the majority of the fields, exceptions being the location, journal, and publisher fields. the test results on the hs dataset are presented in table 3. here we can observe a clear performance improvement on all fields, in some cases the difference being significant, e.g., the title field, from 85.01 percent to 97.39 percent, or the journal field, from 93.12 percent to 99.28 percent. this increase is even more relevant considering the size of the dataset, each 1 percent representing twenty references.

4.3 dataset: cs-sw

while the cora and flux-cim cs datasets do focus on the computer science field, they do not cover the slight differences in reference format that can be found nowadays in the semantic web community. consequently, to show the even broader application of our approach, we have compiled a dataset named cs-sw comprising 576 reference strings randomly selected from publications in the semantic web area, from conferences such as the international semantic web conference (iswc), the european semantic web conference (eswc), the world wide web conference (www), or the european conference on knowledge acquisition (and co-located workshops). 45 each reference entry is segmented into twelve fields: author, title, conference, workshop, journal, techrep, organization, publisher, date, pages, website and location.

table 4 shows the results of the tests carried out on this dataset. one can easily observe that the chunker performed in a similar manner as on the cora dataset, with emphasis on the author, date, pages and publisher fields.

               our approach
field          p      r      f1
author         98.61  99.27  98.93
title          94.91  93.29  94.09
date           98.89  98.34  98.61
pages          98.94  97.24  98.08
location       93.9   92.77  93.33
organization   85.71  80.00  82.75
journal        94.59  93.33  93.95
conference     96.66  95.08  95.86
workshop       83.33  88.23  85.71
publisher      96.61  97.43  97.01
tech. rep.     100    80     88.88
website        98.14  94.64  96.35

table 4. evaluation results on the cs-sw dataset

5. conclusion

in this paper we presented a novel approach for extracting and chunking reference information from scientific publications. the solution, realized using a crf-trained chunker, achieved good results in the experimental evaluation, in addition to an increased versatility shown by applying the one-time trained chunker on multiple testing datasets.
this enables a straightforward adoption and reuse of our solution for generating semantic metadata in any digital library or publication repository focused on scientific publishing. as next steps, we plan to create a comprehensive dataset covering multiple heterogeneous domains (e.g., social sciences or digital humanities) and evaluate the chunker’s performance on it. then we will focus on developing an accurate reference consolidation and linking technique, to address the second step mentioned in section 1, i.e., aligning the resulting metadata to the existing linked data on the web. we plan to develop a flexible consolidation mechanism by dynamically generating and executing sparql queries from chunked reference fields and filtering the results via two string approximation metrics (a combination of monge-elkan and chapman soundex algorithms). the sparql queries generation will be implemented in an extensible manner, via customizable query modules, to accommodate the heterogeneous nature of the diverse linked data sources. finally, we intend to develop an overlay interface for arbitrary online publication repositories, to enable on-the-fly creation, visualization, and linking of semantic metadata from repositories that currently do not expose their datasets in a semantic / linked manner. acknowledgements the work presented in this paper has been funded by science foundation ireland under grant no. sfi/08/ce/i1380 (lion-2). references and notes 1. tim berners-lee et al., “the semantic web,” scientific american 284 (2001): 35–43. 2. christian bizer et al., “linked data—the story so far,” international journal on semantic web and information systems 5 (2009): 1–22. 3. generating computer-understandable metadata represents an issue, in general, in the publishing domain, and not necessarily only in its scientific area. however, the relevant literature dealing with metadata extraction/generation has focused on scientific publishing, because of its accelerated growing rate, especially with the increasing use of the world wide web as a dissemination mechanism. reference information extraction and processing |groza, grimnes, and handschuh 18 4. knud moeller et al., “recipes for semantic web dog food – the eswc and iswc metadata projects,” proceedings of the 6th international semantic web conference (busan, korea, 2007). 5. wei peng and tao li, “temporal relation co-clustering on directional social network and author-topic evolution,” knowledge and information systems 26 (2011): 467–86. 6. laszlo barabasi et al., “evolution of the social network of scientific collaborations,” physica a: statistical mechanics and its applications 311 (2002): 590–614. 7. xiaoming liu et al., “co-authorship networks in the digital library research community,” information processing & management 41 (2005): 1462–80. 8. john d. lafferty et al., “conditional random fields: probabilistic models for segmenting and labeling sequence data,” proceedings of the 18th international conference on machine learning (san francisco, ca, usa, 2001): 282–89. 9. vladimir vapnik, the nature of statistical learning theory (new york: springer, 1995). 10. isaac g. councill et al., “parscit: an open-source crf reference string parsing package,” proceedings of the sixth international language resources and evaluation (marrakech, morocco, 2008). 11. yong kiat ng, “citation parsing using maximum entropy and repairs” (master's thesis, national university of singapore, 2004). 12. 
12. fuchun peng and andrew mccallum, "information extraction from research papers using conditional random fields," information processing & management 42 (2006): 963–79.
13. c. lee giles et al., "citeseer: an automatic citation indexing system," proceedings of the third acm conference on digital libraries (pittsburgh, pa, 1998): 89–98.
14. kristie seymore et al., "learning hidden markov model structure for information extraction," proceedings of the aaai workshop on machine learning for information extraction (1999): 37–42.
15. isaac g. councill et al., "parscit: an open-source crf reference string parsing package," proceedings of the sixth international language resources and evaluation (marrakech, morocco, 2008).
16. hui han et al., "rule-based word clustering for document metadata extraction," proceedings of the symposium on applied computing (santa fe, new mexico, 2005).
17. eli cortez et al., "flux-cim: flexible unsupervised extraction of citation metadata," proceedings of the 2007 conference on digital libraries (new york, 2007): 215–24.
18. machine learning methods can be broadly classified into two categories: supervised and unsupervised. supervised methods require training on specific datasets that exhibit the characteristics of the target domain. to achieve high accuracy levels, the training dataset needs to be reasonably large and, more importantly, it has to cover most of the possible exceptions from the intrinsic data patterns. unlike supervised methods, unsupervised methods do not require training and, in principle, use generic rules to encode both the expected patterns and the possible exceptions of the target data.
19. peng and mccallum, "information extraction from research papers using conditional random fields."
20. hoifung poon and pedro domingos, "joint inference in information extraction," proceedings of the 22nd national conference on artificial intelligence (vancouver, british columbia, canada, 2007): 913–18.
21. ariel schwartz et al., "multiple alignment of citation sentences with conditional random fields and posterior decoding," proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (prague, czech republic, 2007): 847–57.
22. simone teufel et al., "automatic classification of citation function," proceedings of the 2006 conference on empirical methods in natural language processing (sydney, australia, 2006): 103–10.
23. jien-chen wu et al., "computational analysis of move structures in academic abstracts," coling/acl interactive presentation sessions (sydney, australia, 2006): 41–44.
24. eugenio cesario et al., "boosting text segmentation via progressive classification," knowledge and information systems 15 (2008): 285–320.
25. dublin core website, http://dublincore.org (accessed may 4, 2011).
26. york sure et al., "the swrc ontology—semantic web for research communities," proceedings of the 12th portuguese conference on artificial intelligence (covilha, portugal, 2005).
27. yanjun qi et al., "semi-supervised sequence labeling with self-learned features," proceedings of the ieee international conference on data mining (miami, fl, usa, 2009).
28. david sanchez et al., "content annotation for the semantic web: an automatic web-based approach," knowledge and information systems 27 (2011): 393–418.
29. tshering cigay dorji et al., "extraction, selection and ranking of field association (fa) terms from domain-specific corpora for building a comprehensive fa terms dictionary," knowledge and information systems 27 (2011): 141–61.
30. please note that the chunker is document-format agnostic and takes as input only raw text. the actual extraction of this raw text from the original document (pdf, doc, or some other format) is the user's responsibility.
31. as a note, we chose this length of fifteen characters empirically, based on the assumption that in any format the publication content lines usually have more than fifteen characters.
32. lafferty et al., "conditional random fields: probabilistic models for segmenting and labeling sequence data."
33. councill et al., "parscit: an open-source crf reference string parsing package."
34. the manual tagging was performed by a single person, and since the reference chunks have no ambiguity attached, we did not see the need for running any data reliability tests.
35. ron kohavi, "a study of cross-validation and bootstrap for accuracy estimation and model selection," proceedings of the 14th international joint conference on artificial intelligence (montreal, quebec, 1995): 1137–43.
36. peng and mccallum, "information extraction from research papers using conditional random fields."
37. councill et al., "parscit: an open-source crf reference string parsing package."
38. mallet: machine learning for language toolkit, http://mallet.cs.umass.edu (accessed may 4, 2011).
39. william m. shaw et al., "performance standards and evaluations in ir test collections: cluster-based retrieval models," information processing & management 33 (1997): 1–14.
40. peng and mccallum, "information extraction from research papers using conditional random fields."
41. councill et al., "parscit: an open-source crf reference string parsing package."
42. seymore et al., "learning hidden markov model structure for information extraction."
43. han et al., "rule-based word clustering for document metadata extraction."
44. cortez et al., "flux-cim: flexible unsupervised extraction of citation metadata."
45. the cs-sw dataset is available at http://resources.smile.deri.ie/corpora/cs-sw (accessed may 4, 2011).
editorial board thoughts: system requirements
this past spring, my alma mater, the school of library and information studies (slis) at the university of alberta, restructured the it component of its mlis program. as a result, as of september 2010, incoming students are expected to possess certain basic it skills before beginning their program.1 these skills include the following:
■■ comprehension of the components and operations of a personal computer
■■ microsoft windows file management
■■ proficiency with microsoft office (or similar) products, including word processing and presentation software
■■ use of e-mail
■■ basic web browsing and searching
this new requirement got me thinking: is this common practice among ala-accredited library schools? if other schools are also requiring basic it skills prior to entry, how do those required by slis compare? so i thought i'd do a little investigating to see what others in "library school land" are doing. before i continue, a word of warning: this was by no means a rigorous scientific investigation, but rather an informal survey of the landscape.
i started my investigation with ala’s directory of institutions offering accredited master’s programs.2 there are fifty-seven institutions listed in the directory. i visited each institution’s website and looked for pages describing technology requirements, computer-competency requirements, and the like. if i wasn’t able to find the desired information after fifteen or twenty minutes, i would note “nothing found” and move on to the next. in the end i found some sort of list of technology or computer-competency requirements on thirty-three (approximately 58 percent) of the websites. it may be the case that such a list exists on other sites and i didn’t find it. i should also note that five of the lists i found focus more on software and hardware than on skills in using said software and hardware. even considering these conditions, however, i was somewhat surprised at the low numbers. is it simply assumed that today’s students already have these skills? or is it expected that they will be picked up along the way? i don’t claim to know the answers, and discovering them would require a much more detailed and thorough investigation, but they are interesting questions nonetheless. once i had found the requirements, i examined them in some detail to get a sense of the kinds of skills listed. while i won’t enumerate them all, i did find the most common ones to be similar to those required by slis— basic comfort with a personal computer and proficiency with word processing and presentation software, e-mail, file management, and the internet. a few (5) schools also list comfort with local systems (e-mail accounts, online courseware, etc.). several (7) schools mention familiarity with basic database design and functionality, while a few (5) list basic web design. very few (3) mention competency with security tools (firewalls, virus checkers, etc.), and just slightly more (4) mention familiarity with web 2.0 tools like blogs, wikis, etc. while many (14) specifically mention searching under basic internet skills, few (7) mention proficiency with opacs or other common information tools such as full-text databases. interestingly, one school has a computer programming requirement, with mentions of specific acceptable languages, including c++, pascal, java, and perl. but this is certainly the exception rather than the rule. i was encouraged that there seems to be a certain agreement on the basics. but i was a little surprised at the relative rarity of competency with wikis and blogs and all those web 2.0 tools that are so often used and talked about in today’s libraries. is this because there is still some uncertainty as to the utility of such tools in libraries? or is it because of a belief that the members of the millennial or “digital” generation are already expert in using them? i don’t know the reasons, but it is interesting to ponder nonetheless. i was also surprised that a level of information literacy isn’t listed more often, particularly given that we’re talking about slis programs. i do know, of course, that many of these skills will be developed or enhanced as students work their way through their programs, but it also seems to me that there is so much other material to learn that the more that can be taken care of beforehand, the better. librarians work in a highly technical and technological environment, and this is only going to become even more the case for future generations of librarians. 
certainly, basic familiarity with a variety of applications and tools and comfort with rapidly changing technologies are major assets for librarians. in fact, ala recognizes the importance of "technological knowledge and skills" as core competencies of librarianship. specifically mentioned are the following:
■■ information, communication, assistive, and related technologies as they affect the resources, service delivery, and uses of libraries and other information agencies.
■■ the application of information, communication, assistive, and related technology and tools consistent with professional ethics and prevailing service norms and applications.
■■ the methods of assessing and evaluating the specifications, efficacy, and cost efficiency of technology-based products and services.
■■ the principles and techniques necessary to identify and analyze emerging technologies and innovations in order to recognize and implement relevant technological improvements.3
given what we know about the importance of technology to librarians and librarianship, my investigation has left me with two questions: (1) why aren't more library schools requiring certain it skills prior to entry into their programs? and (2) are those who do require them asking enough of their prospective students? i hope you, our readers, might ask yourselves these questions and join us on italica for what could turn out to be a lively discussion.
sharon farnel (sharon.farnel@ualberta.ca) is metadata & cataloguing librarian at the university of alberta in edmonton, alberta, canada.
references
1. university of alberta school of library and information studies, "degree requirements: master of library & information studies," www.slis.ualberta.ca/mlis_degree_requirements.cfm (accessed aug. 5, 2010).
2. american library association office for accreditation, "library & information studies directory of institutions offering accredited master's programs 2008–2009," 2008, http://ala.org/ala/educationcareers/education/accreditedprograms/directory/pdf/lis_dir_20082009.pdf (accessed aug. 5, 2010).
3. american library association, "ala's core competences of librarianship," january 2009, www.ala.org/ala/educationcareers/careers/corecomp/corecompetences/finalcorecompstat09.pdf (accessed aug. 5, 2010).
colleen cuddy
president's message: data discovery
colleen cuddy (colleen.cuddy@med.cornell.edu) is lita president 2011–12 and director of the samuel j. wood library and c. v. starr biomedical information center at weill cornell medical college, new york, new york.
last week i attended the second annual vivo conference in washington, d.c. vivo (vivoweb.org) is a semantic web application that enables the discovery of research and scholarship across disciplines in an institution, with the potential to also link scholars and research across institutions. despite an earthquake and a hurricane, the conference itself was the real showstopper—excellent, informative programming, engaging speakers, great networking and exchange of ideas. my institution is one of the core vivo members, so it was an opportunity to showcase our work, see what others are doing, and learn more about trends in research, e-science, and data discovery and collaboration initiatives. much of what i learned or rediscovered at vivo will make it into my fifty-minute presentation on the subject at the lita national forum in st. louis later this month. in fact, the vivo conference itself reminded me of our own national forum in size, scope, and content. it was a good mix of in-depth technical discussions coupled with broad coverage of issues and trends in scientific research. this attention to content balance is something that lita consistently gets right at our annual forum—there is literally something for everyone, from introductory concepts to technical details—and i look forward to seeing many familiar faces and meeting some new folks at this year's lita national forum in st. louis, "rivers of data: currents of change."
i would also like to take this opportunity to personally invite each and every ital reader to the 2012 lita national forum. building on this year's theme, the 2012 lita national forum will be "the new world of data: discover. connect. remix." i just signed off on the theme this week, and i am excited and impressed by the work completed by the national forum planning committee so far. please look for the call for papers and posters to come out in late december. i love the forum because it is much more intimate than the much larger ala meetings; i always come away with new ideas and new friends. i am not alone in this feeling. a recent forum attendee commented, "(the lita forum) was one of the best conferences i have attended. i met a far greater concentration of peers—colleagues at other libraries doing similar work—at lita forum than i have met at other similar conferences." i don't think i could say it better myself. the 2012 forum theme is one of great personal interest to me, and i plan to extend the theme to the lita president's program on june 24, 2012, in anaheim.
in fact, hardly a day goes by in my professional life (and it sometimes creeps into my personal life too!) when i don't think about the issues of connecting people with data, and then how to present that data in ways that are relevant to their needs. the tides are shifting in health sciences libraries and likely in your library too. ongoing changes in publishing and the changing nature of research have challenged the traditional nature of the library. it is no longer solely a repository for information, physical or virtual. as librarians move from collecting and cataloging bibliographic information, new roles have emerged in data discovery, in its preservation, and in helping to make data more accessible. important specialties include knowledge management, data visualization, e-science, and copyright. librarians have valuable skill sets in mining and accessing data, human–computer interaction, computer interface design, and knowledge management that can be leveraged now.
it is inevitable that data discovery will quicken the pace of science and lead to collaboration, and collaboration will in turn lead to data discovery and accelerate the pace of science, and so on. in short, twentieth-century data stored in individual scientists' notebooks or computers is largely inaccessible; twenty-first-century data needs to be available 24/7 in a curated state for continuous analysis. information overload and the data deluge created by the intersection of science and technology are two very real problems that librarians have the skills and ability to deal with. and, as i talk of science, bear in mind that it extends beyond the biological and physical sciences to encompass the social sciences as well. interdisciplinary studies in particular have intensive data needs. in fields such as public health and urban planning, government data alongside research data is used to predict trends, forecast, make decisions, etc. government data is a particularly important part of the equation. consider the recent nsf requirement for researchers to provide open access to their data for any nsf-sponsored grants. it is likely other government agencies will follow suit.
one of taiga's provocative statements of 2011 is "#10. the oversupply of mlss," which states that "within five years, library programs will have overproduced mlss at a rate greater even than humanities phds and glutted a permanently diminished market."1 as the alarming scenario of an overabundance of new mlss in proportion to available library jobs presents itself, i encourage librarians to begin to envision themselves as digital information brokers or data scientists. the us department of labor, in the 2010–11 occupational outlook handbook, anticipates that librarian jobs in nontraditional settings will grow the fastest over this decade. nontraditional libraries and jobs include working as information brokers for private corporations, nonprofit organizations, and consulting firms. "many companies are turning to librarians because of
their research and organizational skills and their knowledge of computer databases and library automation systems. librarians can review vast amounts of information and analyze, evaluate, and organize it according to a company's specific needs."2 we have been seeing new job titles emerging to reflect these needs, such as data curation librarian, digital data outreach librarian, gis librarian, etc.
what is your library doing with data? how can you and your library address the data needs of the twenty-first century? what technology is needed to address data needs? how can lita help you meet those needs? consider this column a call to arms for librarians of all backgrounds. the time to address data discovery is now!
references
1. "taiga 2011 provocative statements," http://taigaforumprovocativestatements.blogspot.com/ (accessed sept. 22, 2011).
2. united states department of labor, bureau of labor statistics, occupational outlook handbook, 2010–11 edition, http://www.bls.gov/oco/ocos068.htm (accessed sept. 22, 2011).
statement of ownership, management, and circulation
information technology and libraries, publication no. 280-800, is published quarterly in march, june, september, and december by the library information and technology association, american library association, 50 e. huron st., chicago, illinois 60611-2795. editor: marc truitt, associate director, information technology resources and services, university of alberta, k adams/cameron library and services, university of alberta, edmonton, ab t6g 2j8 canada. annual subscription price, $65. printed in u.s.a. with periodical-class postage paid at chicago, illinois, and other locations. as a nonprofit organization authorized to mail at special rates (dmm section 424.12 only), the purpose, function, and nonprofit status for federal income tax purposes have not changed during the preceding twelve months.
extent and nature of circulation (average figures denote the average number of copies printed each issue during the preceding twelve months; actual figures denote the actual number of copies of the single issue published nearest to filing date: september 2010 issue). total number of copies printed: average, 4,547; actual, 4,494. mailed outside country paid subscriptions: average, 3,608; actual, 3,577. sales through dealers and carriers, street vendors, and counter sales: average, 395; actual, 367. total paid distribution: average, 4,003; actual, 3,944. free or nominal rate copies mailed at other classes through the usps: average, 27; actual, 27. free distribution outside the mail (total): average, 118; actual, 117. total free or nominal rate distribution: average, 145; actual, 144. total distribution: average, 4,148; actual, 4,088. office use, leftover, unaccounted, spoiled after printing: average, 399; actual, 406. total: average, 4,547; actual, 4,494. percentage paid: average, 96.50; actual, 96.48.
statement of ownership, management, and circulation (ps form 3526, september 2007) filed with the united states post office postmaster in chicago, october 1, 2011.
this article discusses structural, systems, and other types of bias that arise in matching new records to large databases. the focus is databases for bibliographic utilities, but other related database concerns will be discussed. problems of satisfying a "match" with sufficient flexibility and rigor in an environment of imperfect data are presented, and sources of unintentional variance are discussed.
editor's note: this article was submitted in honor of the fortieth anniversaries of lita and ital.
sameness is a sometime thing. libraries and other information-intensive organizations have long faced the problem of large collections of records growing incrementally. computerized records in a networked environment have encouraged the recognition that duplicate records pose a serious threat to efficient information retrieval. yet what constitutes a duplicate record may be neither exact nor completely predictable. levels of discernment are required to permit matches on records that do not differ significantly and to reject those that do.
initial definitions
matching is defined as the process by which additions to a large database are screened and compared with existing database records. ideally, this process of matching ensures that duplicates are not added, nor erroneous replacements made of record pairs that are not really equivalent. oclc (online computer library center, inc.) is a nonprofit organization serving member libraries and related institutions throughout the world. it is the chief database capital of the organization, and it is "owned" in a sense by the member libraries worldwide that use and contribute to it. at this writing, it contains over seventy-three million records. this discussion focuses chiefly on oclc's extended worldcat (xwc), though many of the issues are common to other bibliographic databases. examples of these include the research libraries group's research libraries information network (rlin) database, pica (a european cooperative of libraries headquartered in the netherlands), and other union catalogs.
the literature will demonstrate that the problems described exist in many if not most large bibliographic databases. the database contents are representations or surrogates of the objects in shared collections. individual records in xwc are complex bibliographic representations of physical or virtual objects—books, films, urls, maps, slides, and much more. each of these records consists of metadata, i.e., "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource"1 (appendix a). the records use an xml variation of the marc communications format.2 for example, a record for a book might typically contain fields such as author, title, publisher, and date, and many more in addition. the representation of any one object can be quite complex, containing scores of fields and subfields. such a record may be quite brief, or several thousand characters long. the depth and richness of the records varies enormously. they may describe materials in more than 450 languages. this is a database against which millions of searches and millions of records are processed each month.
why is matching a challenge? two records describing the same intellectual creation or work (e.g., shakespeare's othello) can vary by physical form and other attributes. two records describing both the same work and exactly the same form can differ from each other if the records were created under different rules of record description (cataloging). two records intended to describe the same object can vary unintentionally if typographical or other entry errors are present in one or both. thus sorting out significant from insignificant differences is critical. an example of the challenges of developing matching software in the metadata capture project is described elsewhere.3
the scope of misinformation is limited to information storage and retrieval, and specifically to comparison of incoming records to candidate matches in the database. the authors define misinformation as follows:
1. anything that can cause two database records, i.e., representations of different items, to be mistaken as representations of the same item. these can lead to inappropriate merging or updates.
2. the effect of techniques or processes of search that can obscure distinctions in differing items.
3. any case where matching misses an appropriate match due to nonsignificant differences in two records that really represent the same item.
note that disinformation (the intentional effort to misrepresent) is not considered in scope for this discussion. the assumption is that cooperation is in the interests of all parties contributing to a shared database. we do not assume that all institutions sharing the database have the same goals.
misinformation and bias in metadata processing: matching in large databases
gail thornburg and w. michael oskins
gail thornburg (thornbug@oclc.org) has taught at the university of maryland and the university of illinois, and served as an adjunct professor at kent state university, and as a senior-level software engineer at oclc. w. michael oskins (oskins@oclc.org) has worked as a developer and researcher at oclc for twenty years.
what is bias?
bias can be defined as factors in the creation or processing of database records that feed on misinformation or missing information, and skew charac­ terizations of the database records in question. context—matching and bias how are matching and bias related to each other? the growth of a database is in part a function of the matching process. if matching is not tuned correctly, the database can grow or change in nonoptimal ways. another way to look at the problem is to consider the goal of success in searching, and the need to know when to stop. human beings recognize that failure to find the best information for a given problem may be costly. finding the best information when less would suffice may also be costly. systems need to know this. for a large shared data­ base, hundreds of thousands of records may be processed in a day; the system must be as efficient as possible. what are some costs? fail to match when one should, and duplicates may proliferate in the database. match badly, and there is risk of merging multiple records that do not represent the same item. a system of matching can fail in more than one way. balance is needed. 1. searches, which are based on data in the incom­ ing record, may be too precise to find legitimate matches. loosen the criteria too much, and the search may return too many records to compare. 2. once retrieved, candidate matches are evaluated. compare candidates too narrowly, and records with insignificant differences will be rejected. fail to take note of salient differences between incom­ ing record and database record, and the match will be wrong, undetected, and potentially hard to detect in the future. the goals vary in different matching projects. for some projects, setting “holdings,” the indication that a member library owns a copy of something, is the main goal of the processing. this does not involve adding, replacing, or merging database records. for other projects, the goal is to update the database, either by replacing matched records, merging multiple duplicate records into one, or by adding new records if no match is found in the database. for the latter, bad matching could compromise database contents. n background hickey and rypka provide a good review of the problems of identifying duplicates and the implications for match­ ing software.4 their study notes concerns from a variety of library networks including that of the university of toronto (utlas), washington library network (wln), and research libraries group (rlin). they also refer­ ence studies on duplicate detection in the illinois state­ wide bibliographic database and at oak ridge national laboratories. background discussion of broader misinfor­ mation issues in shared library catalogs can be found in bade’s paper.5 a good, though dated, review of duplicate record problems can be found in the o’neill, rogers, and oskins article.6 the authors discuss their analysis of differences in records that are similar but not identical, and which elements caused failure to match two records for the same item. for example, when there was only one differing element in a pair, they found that element was most often publication date. their study shows the difficulties for experts to determine with certainty that a bibliographic record is for the same item. 
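the two ways a matching system can fail described above—retrieval that is too narrow or too broad, and candidate comparison that is too strict or too forgiving—can be made concrete with a small sketch. the python fragment below is a hypothetical illustration of that two-stage shape (retrieve candidates, then compare fields), not the matching software discussed in this article; the field names, the search_index function, and the result-set limit are invented for the example.

    def find_match(incoming, search_index, max_candidates=100):
        # stage 1: candidate retrieval. start with the most specific boolean
        # query and broaden it by dropping terms if nothing comes back.
        terms = [incoming.get("title"), incoming.get("author"), incoming.get("date")]
        terms = [t for t in terms if t]
        candidates = []
        while terms:
            candidates = search_index(terms)   # assumed to return records matching all terms
            if candidates:
                break
            terms = terms[:-1]                 # drop a term to broaden the query
        if not candidates or len(candidates) > max_candidates:
            return None                        # nothing retrievable, or too many to compare

        # stage 2: candidate comparison, field by field.
        for record in candidates:
            if normalize(record.get("title", "")) != normalize(incoming.get("title", "")):
                continue                       # salient difference: reject this candidate
            if record.get("date") and incoming.get("date") and record["date"] != incoming["date"]:
                continue                       # differing dates are a frequent cause of mismatch
            return record                      # first surviving candidate is treated as the match
        return None

    def normalize(text):
        # fold case and collapse whitespace so trivial differences do not block a match
        return " ".join(text.lower().split())

tightening or loosening either stage shifts which failure mode dominates: a stricter comparison lets duplicates into the database, while a looser one risks merging records that actually represent different items.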
problems of typographical errors in shared biblio­ graphic records come under discussion by beall and kafadar.7 their study of copy cataloging errors found only 35.8 percent were corrected later by libraries, though the ordinary assumption is that copy cataloging will be updated when more information is available for an item. pollock and zamora report on a spelling error detection project at chemical abstracts service (cas) and charac­ terize the types of errors they found.8 chemical abstracts databases are among the most searched databases in the world. cas is usually characterized as a set of sources with considerable depth and breadth. of the four most common typographical errors they describe, errors of omission are most common, with insertion second, substitution third, and transposition fourth. over 90 percent of the errors they found were single letter errors. this is in agreement with the findings of o’neill and aluri, though the databases were substantially different.9 another study on moving­ image materials focuses on problems of near­equivalents in cataloging.10 yee suggests that cataloging practice tends to lead to making too many separate records for near equivalents. owen gingerich provides insight in the use of holdings information in oclc and other bibliographic utilities such as rlin for scholarly research in locating early editions of copernicus’ de revolutionibus.11 among other sources, he used holdings information in multiple bibliographic utilities to help in collecting a census of copies of de revolutionibus, and plotting its movements through europe in the sixteenth century. his article high­ lights the importance of distinguishing very similar items for scholarly research. shedenhelm and burk discuss the introduction of vendor records into oclc’s worldcat database.12 their results indicate that these minimal­level records increase the duplication rate within the database and can be costly to upgrade. (see further discussion in the section change in contributor characteristics below.) one problem in analysis of sources of mismatch in previous studies is that there is no good way to detect and charac­ public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 17misinformation and bias in metadata processing | thornburg and oskins 17 terize typos that form real words. jasco reviews studies characterizing types and sources of errors.13 sheila intner compares the quality issues in the databases of oclc and the research libraries group (rlg) and finds the issues similar.14 intner used matched samples of records from both worldcat and rlin to list and compare types of errors in the records. she noted that while the perception at that time was that rlin had higher­quality cataloging, the differences found were not statistically significant. jeffrey beall, while focusing in his study on the full­ text online database jstor, notes the commonality of problems in metadata quality.15 in addition, he discusses the special quality problems in a database of scanned images. the scanning software itself may introduce typo­ graphical errors. like xwc, the database changes rapidly. o’neill and visine­goetz present a survey of quality con­ trol issues in online databases.16 their sections on dupli­ cate detection and on matching algorithms illustrate the commonalities of these problems in a variety of shared cataloging databases. they cite variation in title as the most common reason for failure to identify a duplicate record that should match. 
variations in publisher, names, and pagination were noted as common. lei zeng pres­ ents a study of chinese language records in the oclc and rlin databases.17 zeng discusses quality problems including (1) format errors such as field and subfield tagging and incorrect punctuation; (2) content errors such as missing fields and internal record inconsisten­ cies; and (3) editing and inputting errors such as spacing and misspelling. part 2 of her study presents the results of the prototype rule­based system developed to catch such errors.18 while the author refrains from comparing the quality of oclc and rlin chinese language catalog records, the discussion makes clear that the quality issues are common to a number of online databases. more work is needed on quality and accuracy of shared records in non­roman scripts, or in other lan­ guages transliterated to roman script. n types of bias to be considered specific factors that may tend to bias an attempt to match one record to another include: 1. violated expectations—system software expects data it does not receive, or data received is not well formed. 2. temporal bias—changes in rules and philosophies of record creation over time. 3. design bias—choices in layout of the records, which favor one type of record representation at the expense of another. 4. judgment calls—distinctions introduced in record representations due to differing but legitimate variation in expert judgment. oclc is a multina­ tional cooperative and there is no universal set of standards and rules for creating database records. rules of cataloging most widely used are not abso­ lutely prescriptive and are designed to allow local deviation to meet local needs.19 5. structural bias—process and systems bias. this category reflects internal influences, inherent in the automatic processing, storage, and retrieval of large numbers of records. 6. growth of the database environment—whether in raw numbers of records, numbers of specific formats, numbers of foreign languages, or other characteristics that may affect efficient location and comparison of records. 7. changes in contributor characteristics––in the goals or focus of institutions that contribute to the database. violated expectations data may not conform to expectations. expectations about the nature of records in the data­ bases are frequently violated. what seem to be good rules for matching may not work well if the incoming data is not well formed, or simply not constructed as expected. biasing sources in the incoming data include the fol­ lowing: 1. typographical errors occur in titles and other parts of the record. anywhere the software has to parse text, an entry error—or even correction of an entry error by a later update—could con­ found matching. this could confound both (a) query execution and (b) candidate comparisons. basically the system expects textual data such as the name of a title or publisher to be correct, and machine­based efforts to detect errors in data are expensive to run. spelling detection techniques can compensate in some ways for data problems, but will not identify cases of real­word errors. see kukich for a survey of spelling error, real­word, and context­dependent techniques.20 2. there is also the issue of real word differences in similar text strings. an automated system with programmed fault tolerance may wrongly equate the publisher name “mila” with “mela” when they are distinct publishers. 
equivalence tables can cross­reference known variations on well­known publisher names, but cannot predict merges and other organizational changes. or consider author names: are “john smith” and “jon smith” the 1� information technology and libraries | june 20071� information technology and libraries | june 2007 same? this is a major problem with automated authority control where context clues may not be trustworthy. 3. errors of formatting of variable fields in the meta­ data contribute to false mismatch. the rules for data entry in the marc record are complex and have changed over time. erroneous placement or coding of subfields poses challenges for iden­ tification of relevant data. the software must be fault tolerant wherever possible. changes in the format of the data itself in these fields/sub­ fields may further complicate record comparisons. isbns (international standard book numbers) and lccns (library of congress control numbers) have both changed format in the recent past. 4. errors occur in the fields that indicate format of the information. in bibliographic records, format information is used to derive the overall type of material being described: book, url, dvd, and so on. errors in the data in combination can generate an incorrect material type for the record. 5. language of cataloging: this comparison has in the past caused inappropriate mismatches. the require­ ments in the new matching aimed to address this. 6. language in formation of queries: marc records frequently are a mixture of languages. as has been seen in other projects with intensive comparison of text, overlap in languages has the potential to confuse comparisons of short strings of text.21 the assumption made here is that the use of all pos­ sible syllables contained in the title should tend to mitigate language problems. nothing short of semantic analysis by the software is likely to solve such a problem, and contextual approaches to detection have had most success (in the produc­ tion environment) in carefully controlled cases. matching overall must be generic in its problem solving techniques. temporal bias large databases developed over time have their contents influenced by changes in standards for record creation, changes in contributor perception of the role of the data­ base, and changes in technology to be described. changes may include the following: 1. description level: e.g. changes such as book or elec­ tronic book. these have evolved from format­ to content­based descriptions that transcend format. over time, the cataloging rules for describing formats have changed. thus a format description created earlier might inadvertently “mismatch” the newer description of exactly the same item. for example, the rules for describing a book on a cd originally emphasized the cd format, whereas now, the emphasis might be shifted to focus on the intellectual content, the fact that it is a book. 2. the role of the database once perceived as chiefly repository or even backup source for a given library has become a shared resource with responsibilities to a community larger than any one library. 3. over time, the use of the database may change. (this is further discussed in the section on growth of the environment later.) searching has to satisfy the reference function of the database, but match­ ing as a process also relies on searching, and its goals are different. 4. varied standards worldwide challenge coopera­ tion. while u.s. 
libraries usually follow aacr2 and use the marc21 communications format, other parts of the world may use unimarc and country­specific cataloging rules. for instance, the pica bibliotekssystem, which hosts the dutch union catalog, used the prussian cataloging rules, which tended to focus on title entries.22 the switch to the rak was made by the early nineties.23 5. some libraries may not use any form of marc but submit a spreadsheet that is then converted to marc. there is some potential for ambiguities in those conversions due to lack of 1:1 correspon­ dence of parts. 6. even within a country, standards change over time, so that “correct” cataloging in one decade may not match that in a later period. neither is wrong, in its own temporal context, but each results in different metadata being created to describe the same item. intner points out that oclc’s database was initi­ ated a full decade before rlg implemented rlin, and rlin started almost the same time as the aacr2 publication.24 thus rlin had many fewer pre­aacr2 records in its database, while worldcat had many more preexisting records to try to match with the newer aacr2 forms. 7. objects referenced in the database may change over time. for instance, a record describing an elec­ tronic resource may point to a location no longer valid for that resource. 8. vendor records are created as advance advertis­ ing, but there is no guarantee the records will be updated later. estimating the time before updates occur is impossible. 9. records themselves change over time as they are copied, derived, and migrated into other systems. they may be enhanced or corrected in any system where they reside. so when they return to the origi­ nating database, they may have been transformed so far as to be unrecognizable as representations of the same item. this problem is not unique to xwc; public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 1�misinformation and bias in metadata processing | thornburg and oskins 1� it is a challenge for any shared database where export of records and reentry is likely. design bias the title, author, publisher, place of publication, and other elements of a record, designed in a time when most of the contents of a library were books, may not appear as clear or usable for other forms of informa­ tion, such as web sites or software. there is a risk to any design of a representation for an object, that it may favor distinctions in one format over another. or representations imported from other schemes may lose distinctions in the crosswalk from one scheme to another. a crosswalk is a mechanism for the mapping of data elements/content from one metadata scheme to another. dublin core and marc are just two examples of schemes used by library professionals. software exists to convert dublin core metadata to marc for­ mat, but the process of converting less complex data to a scheme of more structured data has inevitable limita­ tions. for instance, dublin core has “subject” while marc has dozens of ways to indicate subject, each with a different kind of designation for subject aspects of an item.25 see discussion in beall.26 libraries commonly exchange or purchase records from external sources to reduce the volume or costs of in­house cataloging. if an institution harvests metadata from multiple sources, there can be varying structures, content standards, and overall quality, all of which can make record compari­ sons error prone. 
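as a rough illustration of the kind of information loss a crosswalk can introduce, the sketch below maps a simplified dublin core description onto a handful of marc-like fields. it is a toy mapping written for this discussion, not an implementation of the marc standard or of any production crosswalk, and funneling every dc subject into a single uncontrolled index term field is exactly the sort of flattening described above.

    def dc_to_marc(dc_record):
        # toy crosswalk from a flat dublin core dict to marc-style (tag, value) pairs.
        # dublin core has a single 'subject' element; marc distinguishes many kinds of
        # subject access (650 topical, 651 geographic, 600 personal name, ...). with no
        # further information, everything lands in one uncontrolled index term field,
        # so those distinctions are lost on the way in.
        mapping = {
            "creator": "100",     # main entry, personal name (assumed personal)
            "title": "245",
            "publisher": "260",
            "date": "260",        # folded into the same publication statement
            "identifier": "020",  # assumed to be an isbn
        }
        marc_fields = []
        for element, values in dc_record.items():
            if isinstance(values, str):
                values = [values]
            for value in values:
                if element == "subject":
                    marc_fields.append(("653", value))   # uncontrolled index term
                elif element in mapping:
                    marc_fields.append((mapping[element], value))
                # elements with no obvious target (e.g., 'relation') are silently dropped,
                # another place where the richer source description cannot be reconstructed
        return marc_fields

    # example: three quite different kinds of subject collapse into three identical 653 tags
    print(dc_to_marc({"title": "othello", "creator": "shakespeare, william",
                      "subject": ["tragedy", "venice (italy)", "othello (fictitious character)"]}))

a record with several carefully distinguished subject terms comes out the other side as a run of interchangeable index fields, which is precisely the loss of granularity that makes later record comparisons error prone.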
while library and information science professionals have been creating metadata in the form of catalog records for a long time, the wider community of digital repositories may be outside the lis commu­ nity, and have varied understanding of the need for consistent representations of data. robertson discusses the challenges of metadata creation outside the library community.27 museums and archives may take a dif­ ferent view of what quality standards in metadata are. for example, for a museum, extensive detail about the provenance of an object is necessary. archives often record information at the collection level rather than the object level; for example, a box of miscellaneous papers, as opposed to a record for each of the papers within the box. educators need to describe resources such as learning objects. a learning object is any entity, digital or nondigital, which can be used, reused, or referenced during technology­supported learning 28 for these objects a metadata record using the ieee lom standard may be used.29 while this is as complex as a marc record, it has less bibliographic description and more focus on description of the nature and use of the learning object. in short, for one type of institution the notion of appropriate granularity of description may be too detailed or too vague for the needs of another type of institution. judgment calls two persons creating independent records for the same item exercise judgment in describing what is most impor­ tant about the object. one may say it is a book with an accompanying cd, another may say it is software on a cd, accompanied by a book of documentation. another example of legitimate variation is the choice of use of ellipses […] to leave out parts of long titles in a metadata description. one record creator may list the whole title, another may list only the first part followed by the mark of ellipsis to indicate abbreviation of the lengthy title. either is correct, but may not match each other without special techniques. see appendix b for the perils of ellipsis handling. the form of name of a publisher, given other occur­ rences of a publisher name in a record, may be abbrevi­ ated. for instance, in one place the corporate author who is also the publisher might be listed in the author field as “department of health and human services” and then abbreviated—or not—in the publisher area as “the department.” note that there are limitations inherent to the valida­ tion of any system of matching, in that human reviewers may not be able to determine whether two representa­ tions in fact describe the same item. structural bias 1. process bias refers to any features of the software which at run­time may change the way matching is carried out, whether by shortening or lengthen­ ing the analysis, or otherwise branching the logical flow. this can arise from many sources, including but not limited to the following factors. a. there is need for efficient processing of large num­ bers of incoming records. this can force an empha­ sis on speedy matching. that is, matching not required to replace records tends to be optimized to stop searching/matching as early as is reason­ able. in the case where unique key searching finds a single match to an incoming record, it is fairly easy for the software to “justify” stopping. if there are multiple matches found, more analysis may be needed before the decision to stop matching can be made. over time the numbers of records processed has increased enormously. b. 
matching needs to exploit “unique” keys to speed searching, yet these may not prove to be unique. though agreements are in place for use of numeric keys such as isbns, creation of these keys is not under the control of any one organization. 20 information technology and libraries | june 200720 information technology and libraries | june 2007 c. problems arise when brief records are com­ pared with fuller records. comparisons may be biased inadvertently towards false matches. such sparseness of data has been identified as a problem in rlin matching as well as in xwc. d. at the same time there is bias toward less generic titles in matching. requirements of sys­ tem throughput mandate an upper limit on the size of result set that the matching software will even attempt to analyze. this upper limit could tend to discriminate against effective retrieval of generic titles. matching will reject very large results sets of searches. so the query that has fewer title terms may tend to retrieve too much. titles such as “proceedings” or “bulletin” may be difficult to match if insufficient other informa­ tion is present in the record for the query to use. ironically this can mean addition of more generic titles to the database, since what is there is in effect less findable. e. transparency can contribute to bias in that, for each layer of transparency a layer of opacity may be added, when information is filtered out from a user’s view. that user may be a human or an application. openurl access to “appropriate copy” is an example from the standards world. the complexity of choosing among multiple online copies has become known as the “appro­ priate copy” problem. there are a number of instances where more than one legitimate copy of an electronic article may exist, such as mir­ roring or aggregator databases. it is essentially a problem of where and how to introduce localiza­ tion into the linking process.30 appropriateness reflects the user’s context, e.g., location, license agreements in place, cost, and other factors. 2. systems bias. what is this, really? the database can be seen as “agent.” the weight of its own mass may affect efforts to use its contents. a. for maintainers of large database systems, the goals of database repository and search engine may be somewhat at odds. yet librarians do make use of the database as reference source. b. search strategies for the software that acts as a user of the database is necessarily developed and optimized at a certain point in time. yet a river of new information flows into this data­ base. 1. if the numbers of types of entries in various database indexes grows nonproportion­ ally, search strategies that worked well in the past could potentially fall “out of tune” with the database contents. see growth of the environment section below. 2. change in proportions of languages in the database may render an application’s use of stopword lists less effective. 3. if changes in technology or practice result in new forms of material being described in the database, the software searches using material type as a limiter may not work properly. the software is using abstractions provided by the database, and they need to be kept synchronized. c. automated query construction presents its own problems. the use of boolean searching [term a and term b and term c] is quite restrictive in the sense that there is no “halfway” or flex for a record being included in a set of candidates. 
matching starts with the most specific search to avoid too­high numbers of records retrieved, and all it can do is drop or rearrange terms from a query in the effort to broaden the results. d. disconnects in metadata object creation/revision are another problem. links can point to broken uris (uniform resource identifiers). controlled vocabularies can drift or expand. even more confusing, a uri that is not broken may point to content which has changed to the point where the metadata no longer describes the item it once did. at one extreme, bruce and hillmann describe the curious case of citation of judicial opinions, for which a record of the opinion may be created as much as eighteen months before the volume with the official citation is printed, and thus the official citation cannot be created.31 e. expectations for creation of metadata play a role as well. traditional cataloging has generally had an expectation that most metadata is being cre­ ated once and reused. yet current practice may be more iterative, and must be, if such problems as records with broken internet uris are to be avoided. f. loss of synchronization can subvert process­ ing. note that other elements of metadata may become divorced or out of synch with the origi­ nal target /purpose. the prefix to an isbn was originally intended to describe the publisher, but is now an unreliable discriminator. numeric keys intended to identify items uniquely can retrieve multiple items, if the scheme for assign­ ing them is not applied consistently. in the worst case, meaningful data elements may become so corrupted as to be useless for record retrieval or even comparison of two records. g. ownership issues can detract from optimal data­ base management. member institutions’ percep­ tions of ownership of individual records can conflict with the goals of efficient search and retrieval. members may resist the idea of a “bet­ public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 21misinformation and bias in metadata processing | thornburg and oskins 21 ter” record being merged with a “lesser” one. so systems have ways of ranking records by source or contents with the general goal of trying to avoid losing information, but with the specific effect of refraining from actions that might be enriching in a given case. growth of the database environment a shared database can grow in unpredictable ways. a change in the relative proportions of different types of materials or topical coverage can render once­effective searches ineffective due to large result sets. an example of this is the number of internet­related entries in xwc. a search such as “dog” restricted to “internet­related” entries in 1995 retrieved thirty­four hits. this might be a manageable number. but in 2005, 225 entries were in the result set. similarly with subject headings, one search on “computer animation” retrieved fourteen hits in 1980, and 342 in 2005. in both cases the result sets grew from manageable to “too large” over time. the increase in the number of foreign language entries in a database can cause problems. just determining what language an entry is in can be difficult, and records may contain multiple languages. also, such languages as chinese, japanese, and korean can overlap. chinese syllables such as: “a, an, to, no, jan, ka, jun, lung, sung, i, lo, la, le, so, sun, juan,” seen out of context might be chinese or any one of several other languages. 
determining appropriate handling of stopwords and other rules for effective title matching becomes more complex as more languages populate the database. changes in contributor characteristics copy cataloging practices in an institution can affect xwc indirectly. an institution previously oriented to fixing downloaded records may adopt a policy of refrain­ ing from changing downloaded records. historical inde­ pendence of libraries is one illustration. prior to the 1970s, most libraries did not share their cataloging with other libraries. many institutions, especially smaller ones, were outside the loop and did things their own way. they used what rules they felt were useful, if they used any rules at all. later they converted sparse and poorly formed data into marc records and sent them to oclc for matching, perhaps in an effort to get back a more complete and useful record. yet the matching process is not always able to distinguish or interpret these local dialects. changes in specialization of cata­ loging staff at an institution, or cutbacks in staff can lead to reduced facility in providing original cataloging. outsourcing of cataloging work can affect handling of specialized materials as well. the introduction of vendor records and their characteristics has been noted by shedenhelm and burk.32 as they note, these records are very brief bibliographic records originally designed to advertise an item for sale by the vendor. these mini­ mal level records have a relatively high degree of dupli­ cation with existing records (37.5 percent in their study) and because of their sparseness can increase the cost of cataloging. changes in the proportion of contribu­ tors who create records in non­marc formats such as dublin core can affect the completeness of bibliographic entries. the use of such formats, meant to facilitate the entry of bibliographic materials, does come with a cost. group cataloging is a process whereby smaller libraries can join a larger set of institutions in order to reduce costs and facilitate cataloging. this larger group then contributes to oclc’s database as an entity. the growth of group cataloging has resulted in the addition of more records from smaller libraries, which may in the future have an effect on searching/matching in xwc worldcat overall. internationalization may be a factor as well. the marc format is an anglo­based format with english­language­based documentation. rapid inter­ national growth thrusts a broader range of traditions into a marc/oclc world. the role of character sets is heightened as the database grows. a cyrillic record may not be confidently matched to a transliterated record for the same item. although worldcat has a long his­ tory with cjk records, marc and worldcat are not yet accustomed to a wide repertoire of character sets. now, however, xwc is an environment in which expanding character coverage is possible, and likely. future research n we need more systematic study of the types of errors/omissions encountered in marc record cre­ ation. n how can the process of matching accomodate objects that change over time? n how does the conversion from new metadata schemes affect matching to marc records? does it help to know in what format a record arrived, or under what rules it was created? n how can we address sparseness in vendor records or legal citations? how can we deal with other advance publication issues? n how do changes in philosophy of the database affect the integrity of the matching process? 
conclusions

in this review we have seen that characterizing metadata at a high level is difficult. challenges for adding to a large, complex database include some of the following:

■■ rules for expert creation of metadata inevitably change over time.
■■ the object of the metadata itself may change, more often than may be convenient.
■■ comparisons of briefer records to records that are more elaborate descriptions can have pitfalls. search and comparison strategies for such record pairs are challenged by the need to have matching algorithms that work for every scenario.
■■ changes within the database may themselves contribute to the exacerbation of matching problems if duplicates are added too often, or records are merged that actually represent different contents. because of the risk, policies for merging and replacing records tend to be conservative, but this does not always favor the greatest efficiency in database processing.
■■ changes in the membership sharing a database are likely to affect its shape and searchability.
■■ newer schemes of metadata representation are likely to challenge existing algorithms for determining matches.

references

1. national information standards organization, understanding metadata (bethesda, md.: niso pr., 2004), 1, http://www.niso.org/standards/resources/understandingmetadata.pdf (accessed feb. 26, 2006).
2. library of congress, "marc 21 concise format for bibliographic data (2002)," http://www.loc.gov/marc/bibliographic/ecbdhome.html (accessed nov. 20, 2004).
3. gail thornburg, "matching: discrimination, misinformation, and sudden death," informing science conference, flagstaff, ariz., june 2005.
4. thomas b. hickey and david j. rypka, "automatic detection of duplicate monographic records," journal of library automation 12, no. 2 (june 1979): 125–42.
5. david bade, "the creation and persistence of misinformation in shared library catalogs," occasional paper no. 211 (graduate school of library and information science, university of illinois at urbana–champaign, apr. 2002).
6. edward t. o'neill, sally a. rogers, and w. michael oskins, "characteristics of duplicate records in oclc's online union catalog," library resources and technical services 37, no. 1 (1993): 59–71.
7. jeffrey beall and karen kafadar, "the effectiveness of copy cataloging at eliminating typographical errors in shared bibliographic records," library resources & technical services 48, no. 2 (apr. 2004): 92–101.
8. j. j. pollock and a. zamora, "collection and characterization of spelling errors in scientific and scholarly text," journal of the american society for information science 34, no. 1 (1983): 51–58.
9. edward t. o'neill and rao aluri, "a method for correcting typographical errors in subject headings in oclc records," research report no. oclc/opr/rr-80/3 (1980).
10. martha m. yee, "manifestations and year-equivalents: theory, with special attention to moving-image materials," library resources and technical services 38, no. 3 (1995): 227–55.
11. owen gingerich, "researching the book nobody read: the de revolutionibus of nicolaus copernicus," the papers of the bibliographical society of america 99, no. 4 (2005): 484–504.
12. laura d. shedenhelm and bartley a. burk, "book vendor records in the oclc database: boon or bane?" library resources and technical services 45, no. 1 (2001): 10–19.
13. peter jasco, "content evaluation of databases," in annual review of information science and technology, vol. 32 (medford, n.j.: information today, inc., for the american society for information science, 1997), 231–67.
14. sheila intner, "quality in bibliographic databases: an analysis of member-controlled cataloging of oclc and rlin," advances in library administration and organization 8 (1989): 1–24.
15. jeffrey beall, "metadata and data quality problems in the digital library," journal of digital information 6, no. 3 (2005): 10–11.
16. edward t. o'neill and diane vizine-goetz, "quality control in online databases," annual review of information science and technology 23 (washington, d.c.: american society for information science, 1988).
17. lei zeng, "quality control of chinese-language records using a rule-based data validation system. part 1: an evaluation of the quality of chinese-language records in the oclc oluc database," cataloging and classification quarterly 16, no. 4 (1993): 25–66.
18. lei zeng, "quality control of chinese-language records using a rule-based data validation system. part 2: a study of a rule-based data validation system for online chinese cataloging," cataloging and classification quarterly 18, no. 1 (1993): 3–26.
19. anglo-american cataloguing rules, 2nd ed., 2002 rev. (chicago: ala, 2002).
20. karen kukich, "techniques for automatically correcting words in text," acm computing surveys 24, no. 4 (1992): 377–439.
21. gail thornburg, "the syllables in the haystack: technical challenges of non-chinese in a wade-giles to pinyin conversion," information technology and libraries 21, no. 3 (2002): 120–26.
22. hartmut walravens, "serials cataloguing in germany: the historical development," cataloging and classification quarterly 35, no. 3/4 (2003): 541–51; instruktionen für die alphabetischen kataloge der preuszischen bibliotheken vom 10. mai 1899, 2. ausg. in der fassung vom 10. august 1908 (berlin: behrend & co., 1909).
23. richard greene, e-mail message to author, nov. 13, 2006; regeln für die alphabetische katalogisierung: rak / irmgard bouvier (wiesbaden, germany: l. reichert, 1980, c1977).
24. intner, "quality in bibliographic databases."
25. richard greene, e-mail message to author, feb. 27, 2006.
26. beall, "metadata and data quality problems in the digital library."
27. r. john robertson, "metadata quality: implications for library and information science professionals," library review 54, no. 5 (2005): 295–300.
28. ieee learning technology standards committee, "wg12: learning objects metadata," http://ltsc.ieee.org/wg12 (accessed feb. 26, 2006).
29. ibid.
30. orien beit-arie et al., "linking to the appropriate copy: report of a doi-based prototype," d-lib 7, no. 9 (sept. 2001).
31. thomas r. bruce and diane i. hillmann, "the continuum of metadata quality: defining, expressing, exploiting," in metadata in practice (chicago: ala, 2004), 238–56.
32. shedenhelm and burk, "book vendor records in the oclc database."

appendix a. sample cdfrecord record from the xwc database

cgm 7a 27681290 vf bcahru mr baaafu 920714r19551952fr 092 mleng 92513007 dlcamim dlc lp5921 u.s.
copyright office xxu mr vbe 6360-6361 (viewing copy) fgb 5643-5647 (ref print) fpa 0621-0625 (masterpos) othello (motion picture : welles) the tragedy of othello-- the moor of venice / a mercury production, [films marceau?] ; directed, produced, and written by orson welles. u.s. ; [morocco?] france : films marceau, 1952 ; [morocco? : s.n., 1952?] ; united states : united artists, 1955. 2 videocassettes of 2 (ca. 92 min.) : sd., b&w ; 3/4 in. viewing copy. 10 reels of 10 on 5 (ca. 8280 ft.) : sd., b&w ; 35 mm. ref print. 10 reels of 10 on 5 (ca. 8280 ft.) : sd., b&w ; 35 mm. masterpos. copyright: orson welles; 19sep52; lp5921. reference sources cited below and m/b/rs preliminary cataloging card list title as othello. photography, anchisi brizzi, g.r. aldo, george fanto ; film editors, john shepridge, jean sacha, renzo lucidi, william morton ; music, francesco lavagnino, alberto barberis. orson welles, suzanne cloutier, micheál macliammóir, robert coote. director, producer, and writer credits taken from focus on orson welles, p. 205. lc has u.s. reissue copy. dlc new york times, 9/15/55. an adaptation of the play by william shakespeare. reference sources used: new york times, 9/15/55; international motion picture almanac, 1956, p. 329; focus on orson welles, p. 205–206; monthly film bulletin, v. 23, no. 267, p. 44; index de la cinématographie française, 1952, p. 496. received: 5/26/87 from lc video lab; viewing copy; preservation, made from ref print, paperwork in acq: copyright--material movement form file, lwo 21635; copyright collection. received: 12/2/64; ref print; copyright deposit; copyright collection. received: 5/70; masterpos; gift; afi theatre collection. othello (fictitious character)--drama. plays. mim features. mim welles, orson, 1915- direction, production, writing, cast. cloutier, suzanne, 1927- cast. mac liammóir, micheál, 1899–1978, cast. coote, robert, 1909–1982, cast. copyright collection (library of congress) dlc afi theatre collection (library of congress) dlc othello.

appendix b. the perils of judging near matches

a. challenges of handling ellipses in titles thought to be similar

incoming title: general explanation of tax legislation enacted in ... / prepared by the staff of the joint committee on taxation
match: general explanation of tax legislation enacted in the 104th congress prepared by the staff of the joint committee on taxation

incoming title: general explanation of tax legislation enacted in ... / prepared by the staff of the joint committee on taxation
match: general explanation of tax legislation enacted in the 106th congress prepared by the staff of the joint committee on taxation

incoming title: general explanation of tax legislation enacted in ... / prepared by the staff of the joint committee on taxation
match: general explanation of tax legislation enacted in the 107th congress prepared by the staff of the joint committee on taxation

incoming title: general explanation of tax legislation enacted in ... / prepared by the staff of the joint committee on taxation
match: general explanation of tax legislation enacted in the 108th congress prepared by the staff of the joint committee on taxation

b. partial matches in names which might represent the same publisher

publisher comparison is challenging in an environment where organizations are regularly merged or acquired by other organizations.
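before the examples, here is a minimal sketch of the sort of normalized, containment-based comparison a matcher might apply to publisher names. the normalization steps and the three-way verdict are illustrative assumptions, not the rules actually used in xwc:

import re

# illustrative normalization; not the comparison rules used in xwc
def normalize_publisher(name):
    """lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", " ", name.lower())).strip()

def compare_publishers(a, b):
    """return 'match', 'questionable', or 'no-match' for two publisher strings."""
    a, b = normalize_publisher(a), normalize_publisher(b)
    if a == b:
        return "match"
    words_a, words_b = set(a.split()), set(b.split())
    if words_a <= words_b or words_b <= words_a:
        # one name is wholly contained in the other, e.g. "wiley" vs. "john wiley"
        return "questionable"
    return "no-match"

# compare_publishers("Pearson Prentice Hall", "Prentice Hall")   -> "questionable"
# compare_publishers("U S GPO", "U S Fish and Wildlife Service") -> "no-match"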
there is no real authority control for publishers that would help cataloguers decide on a preferred form. when governmental organizations are added to the mix, the challenges increase. below are some examples of non-matching text of publisher names in records, which might or might not be considered the same by a human expert. (the publisher names have been normalized.)

1. publisher name may be partially or differently recorded in two records

incoming publisher: konzeptstudien kantonale planungsgruppe
match: kantonale planungsgruppe konzeptstudien (word order different)

incoming publisher: institut francais proche orient
match: institut francais darcheologie proche orient

incoming publisher: u s dept of commerce national oceanic and atmospheric administration national environmental satellite data and information service
match: national oceanic and atmospheric administration

2. publisher name may have changed due to acquisition by another organization

incoming publisher: pearson prentice hall
match: prentice hall

incoming publisher: uxl
match: uxl thomson gale

incoming publisher: thomson arco
match: arco thomson learning

3. one record may show a "publisher" which is actually a government distributing agency or clearinghouse such as the u.s. government printing office or national technical information service (ntis), while the candidate match shows the actual government agency. these can be almost impossible to evaluate.

incoming publisher: u s congressional service
match: supt g p o (here the distributor is the government printing office, listed as the publisher)

incoming publisher: u s dept of commerce national oceanic and atmospheric administration national environmental satellite data and information service
match: national oceanic and atmospheric administration

incoming publisher: u s gpo
match: u s fish and wildlife service

4. the publisher in a record may start with or end with the publisher in the second record. should it be called a match?

good:
incoming publisher: trotta
match: editorial trotta

incoming publisher: wiley
match: john wiley

questionable?
incoming publisher: prentice hall
match: prentice hall regents canada

incoming publisher: geuthner
match: orientaliste geuthner

incoming publisher: oxford
match: distributed royal affairs oxford

incoming publisher: pan union general secretariat organization states
match: social science section cultural affairs pan union

marc truitt
editorial: reflections on what we mean by "forever"

what do we mean when we tell people that we want or intend to preserve content or an object "forever"? a couple of weeks ago, i attended the fall meeting of the preservation and archiving special interest group (pasig) in san francisco. the group, generously sponsored by sun microsystems, is the brainchild of art pasquinelli of sun and michael keller of stanford. first, a confession on my part. since the university of alberta (ua) was one of the founding members of pasig, i had occasion to attend the first several pasig meetings. in the beginning, there were just a handful of—perhaps fewer than ten—institutions represented. it seemed at the first couple of meetings, when the group was still finding its direction, that the content was slim, repetitious, and overly focused on sun's own solutions in the digital preservation and archiving (dpa) arena.
since we had other attendees ably representing ua, i stayed away from the following several meetings. well, pasig has grown up. the attendee list for this meeting boasted nearly two hundred persons representing more than thirty institutions. among the attendees were many of the leading lights in dpa and the profession generally. institutions represented included several north american and european national libraries, as well as arls, memory institutions, and a host of companies and consultants offering a range of dpa solutions. yes, pasig has arrived, and we have art, mike, and sun to thank for this. if i have one real remaining complaint about pasig, it's that the group is still overly focused on sun's solutions. true, other vendors such as exlibris and vtls attended, but their solutions don't compete; rather, they build on sun's offerings. and while microsoft also was in attendance for the first time, its presentation focused not so much on dpa solutions—it has none—as on a raft of interesting and useful plug-ins whose purpose is to facilitate preservation of content created in microsoft products such as word, excel, powerpoint, etc. other large vendors of dpa solutions—think ibm, for one—remain conspicuously absent. it's time for sun to do the "right thing" and "open source" pasig. if sun wishes to continue to sponsor pasig by lending administrative and organizational expertise, that would be great. indeed, a leading but not controlling role in pasig would be entirely consistent with the company's new focus on support of open-source efforts such as mysql, openoffice, and opensolaris. so, what about the title of this editorial? when we talk of digital preservation, just how long are we thinking of preserving an object? ask any twenty specialists in dpa, and chances are that you'll get at least ten different answers. for some, the timeframe can be as short as five to twenty years. for others, it's fifty or perhaps one hundred years. at pasig, at least one presenter described an organizational business model that envisions preserving content for five hundred years. and there are even some in our profession who glibly use what one might call "the dpa f-word," although fortunately none of them seemed to be in attendance at this fall's pasig. what does this mean in a very practical, nuts-and-bolts it sense? chris wood of sun gave a presentation at the 2008 pasig spring meeting in which he estimated that the cost to supply power and cooling alone to maintain a petabyte (1,000 tb) of disk-based digital content for a mere ten years would easily exceed $1 million.1 refining his figures downward somewhat, wood noted a few months later at the following pasig meeting that for a 1 tb drive, the five-year power and cooling cost for 2008–12 could be estimated at approximately $320, or $640,000 per petabyte over ten years, still a considerable sum.2 add to this the costs of migration—consider that a modern spinning disk is generally thought to have a useful lifespan of about five years, and tape may have two or three decades—and the need for regular integrity-checking of digital content for "bit-rot," and you have the stuff of a sustainability nightmare. these challenges don't even include the messy question of preserving an object so that it is usable in a century or five.
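wood's per-petabyte figure follows directly from the per-drive estimate; the short calculation below simply re-derives the numbers quoted above:

# re-deriving wood's estimate: $320 per 1 tb drive for five years of power and cooling
cost_per_tb_five_years = 320                                # dollars
cost_per_tb_per_year = cost_per_tb_five_years / 5           # $64 per tb per year
petabyte_in_tb = 1000
ten_year_cost_per_pb = cost_per_tb_per_year * petabyte_in_tb * 10
print(ten_year_cost_per_pb)                                 # 640000.0, i.e., $640,000 per petabyte over ten years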
while we probably will be able to read word and excel files for the foreseeable future, there are already countless files created with now-defunct pc applications of the 1980s and 1990s; many are stored on all kinds of obsolete media and today are skating on the edge of inaccessibility. already we are seeing concern expressed at institutions with significant digital library and digitization commitments that curating, migrating, and ensuring the integrity and usability of growing petabytes of content over centuries may be unsustainable in both dollars and staff.3 can we even imagine the possible maintenance burden for our descendants, say, 250 or 500 years from now? in 2006, alexander stille observed that "one of the great ironies of the information age is that, while the late twentieth century will undoubtedly have recorded more data than any other period in history, it will also almost certainly have lost more information than any previous era."4 how are we to deal with this? can we meaningfully plan for the preservation of digital content over centuries given our poor track record over just the past few decades? perhaps we're thinking too big when we speak of "forever." maybe we need to begin by conceptualizing and implementing on a more manageable scale. or, to adopt a phrase that seemed to become the informal mantra of both this year's pasig and the immediately preceding ipres meeting, "to get to forever you have to get to five years first."5

marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital.

about this issue of ital

a few months ago, while she was still working at the university of nevada las vegas, ital's longtime managing editor, judith carter, shared with me the program for the discovery mini-conference that had just been held at unlv. the presentations, originally cast as poster sessions, suggested a diverse and fascinating collection of insights deserving of wider attention. i suggested to judith that she and her colleagues had the makings of a great ital theme issue, and i'm pleased that they accepted my invitation to rework the presentations into a form suitable for publication here. i hope that you will find the results of their work interesting—i certainly do. they've done a superb job! bravo to judith and the presenters at the unlv discovery mini-conference!

corrigenda

in our september issue, in an article by kathleen carlson, we inadvertently characterized camtasia studio as an open-source product. it is not. camtasia studio is published by techsmith corporation. you can find out more at the product website (http://www.techsmith.com/camtasia.asp). also, in the same article, we provided a url to a flash tutorial titled "how to order an article that asu does not own." ms. carlson has recently advised us that the tutorial in question is no longer available.

references and notes

1. chris wood, "the billion file problem and other archive issues" (presentation, spring meeting of the sun preservation and archiving special interest group [pasig], san francisco, california, may 28, 2008), http://events-at-sun.com/pasig_spring/presentations/chriswood_massivearchive.pdf (accessed oct. 22, 2009).
2. chris wood, "archive and preservation: emerging storage: technologies & trends" (presentation, fall meeting of pasig, baltimore, maryland, nov.
19, 2008), http://events -at-sun.com/pasig_fall08/presentations/pasig_wood.pdf. (accessed oct. 22, 2009). 3. consider, for example, the following extract from a recent posting to the syslib-l electronic discussion list by the head of library systems at the university of north carolina at chapel hill: i’m exaggerating a little in my subject line, but it’s been less than 4 years since we purchased our first large (5tb) storage array. we now have a raw 65tb online, and 84tb on order—although a considerable chunk of that 84 is going to replace storage that’s going out of warranty/maintenance and is more cost effective to replace (apple xraids, for instance). in the end, though we’ll net out with 100tb or thereabouts by the end of next year. a great deal of this space is going to digitization projects—no surprise there. we have over 20tb now in our “digital archive,” storage i consider dim, if not dark. we need a heck of a lot of space for staging backups, givien [sic] how much we write to tape in a 24-hour period. individual staff aren’t abusing our lack of quotas—it’s really almost all legitimate, project-driven work that’s eating us up. what’s scarier is that we’re now talking seriously about moving from project-driven work to programmatic work: the latest large photographic archive we acquired is being scanned as part of the acquisition/processing workflow. we’re looking at ways to prioritize the scanning of our manuscript collections. donors increasingly expect to see their gifts online. and we’re not even yet supporting an “institutional repository.” will owen, “0 to 60 in three years: mass storage management,” online posting, dec. 8, 2008, syslib-l@listserv.indiana.edu, https://listserv.indiana.edu/cgi-bin/wa-iub.exe?a0=syslib-l (account required; accessed oct. 22, 2009). 4. alexander stille, “are we losing our memory? or, the museum of obsolete technology,” lost magazine, no. 3 (feb. 2006), http://www.lostmag.com/issue3/memory.php (accessed oct. 22, 2009). while stille was referring in this quotation to both digital and nondigital materials, his comments are but part of a larger debate positing that the latter half of the twentieth century could well come to be known in the future as a “digital dark age” because of the vast quantity of at-risk digital content, recently estimated by one expert at some 369 exabytes (369 billian gb) worth of data. physorg.com, “‘digital dark age’ may doom some data,” http://www.physorg.com/news144343006 .html (accessed oct. 22, 2009). 5. ed summers, “ipres, iipc, pasic roundup/braindump,” online posting, oct. 14, 2009, inkdroid, http://inkdroid .org/journal/2009/10/14/ipres-iipc-pasig-roundupbrain dump/ (accessed oct. 22, 2009). 2 information technology and libraries | june 2008 mark beatty (mbeatty@wils.wisc.edu) is lita president 2007/2008 and trainer, wisconsin library services, madison. mark beattypresident’s message i’ve recently read three quite different articles that surprisingly all had something similar to say with a different twist on the theme uppermost in my brain the last year or two. here’s the briefest of quotes from the three. i would suggest your full reading of all three if you haven’t already. n lankes, silverstein and nicholson, “participatory networks: the library as conversation,” in the december 2007 information technology and libraries: “with their principles, dedication to service, and unique knowledge of infrastructure, libraries are poised not simply to respond to new technologies, but to drive them. 
by tying technological implementation, development and improvement to the mission of facilitating conversations across fields, libraries can gain invaluable visibility and resources.” n bill crowley, “lifecycle librarianship,” in the april 1, 2008 library journal: “public, academic, and school librarians should adopt the service philosophy of lifecycle librarianship and jointly plan at town, city, or county levels to identify and meet human learning needs from “lapsit to nursing home.”” n joe kissell, “instant messaging for introverts,” in the april 4, 2008 tidbits (http://db.tidbits.com/ article/9544): “several people i discussed this issue with (using im and twitter) expressed dismay at having had relationships deteriorate due to an unwillingness on another person’s part to adapt to changing technology. for example, people who don’t use e-mail don’t get evites, and so they end up being excluded from parties.” what all three express to me is a concern that libraries, and just plain humans, need to be part of the conversation, part of the social structure, and full participants in life. we are now, through surveys and meetings and focus groups, starting to know that new librarians and new lita members are most interested in networking with their colleagues using multiple methods to fulfill the whole range of their professional and social needs. lankes wants to make sure we participate with all our constituencies, crowley wants us to spend a lifetime with those constituencies, and kissell wants to make sure we get invited to the party. that sounds a bit facetious but i believe the point is that our association, our libraries, our social structures are now required to be active participants, physically and virtually, in the life of their communities. we have to recognize our communities and then act to participate and provide space and support to those communities. this takes work and the will to always be part of our communities. all of which leads to my president’s program, featuring keynote speaker joe janes and the blogging folks at “it’s all good” at the ala annual conference 2008 in anaheim, california. it will be part of sunday afternoon with lita, taking place on sunday, june 29, 2008. the program line up will include: n top technology trends 1:30–3:00 p.m. n lita awards and scholarships reception 3–4 p.m. n lita president’s program 4–5 p.m. n “isn’t it great to be in the library . . . wherever that is?” it’s often said that today we have to run three libraries at once: the library of yesterday, today, and tomorrow. we run both the physical, visible library, and the one that exists beyond the walls. this raises many questions of what a library is and encompasses, what it isn’t, where the boundaries lie, the impact on what we do and how we do it, what our clients want, how we serve them, and what kinds of librarians serve them. this program will attempt to examine the full social and cultural constructs of libraries that move beyond basic web 2.0 and integrate patrons, librarians, and resources in what should be a ubiquitous manner. join joe janes, associate professor in the information school of the university of washington in seattle and columnist for american libraries, keynote speaker, along with members of the “it’s all good” blogging group (http://scanblog.blogspot.com) as the reactor panel for a lively exploration of possible futures. 
i hope you’ll be able to attend but be assured that members of the lita community will blog and report and even record the sessions in various ways that will be made freely available to our community. a semantic model of selective dissemination of information | morales-del-castillo et al. 21 a semantic model of selective dissemination of information for digital libraries j. m. morales-del-castillo, r. pedraza-jiménez, a. a. ruíz, e. peis, and e. herrera-viedma in this paper we present the theoretical and methodological foundations for the development of a multi-agent selective dissemination of information (sdi) service model that applies semantic web technologies for specialized digital libraries. these technologies make possible achieving more efficient information management, improving agent–user communication processes, and facilitating accurate access to relevant resources. other tools used are fuzzy linguistic modelling techniques (which make possible easing the interaction between users and system) and natural language processing (nlp) techniques for semiautomatic thesaurus generation. also, rss feeds are used as “current awareness bulletins” to generate personalized bibliographic alerts. n owadays, one of the main challenges faced by information systems at libraries or on the web is to efficiently manage the large number of documents they hold. information systems make it easier to give users access to relevant resources that satisfy their information needs, but a problem emerges when the user has a high degree of specialization and requires very specific resources, as in the case of researchers.1 in “traditional” physical libraries, several procedures have been proposed to try to mitigate this issue, including the selective dissemination of information (sdi) service model that make it possible to offer users potentially interesting documents by accessing users’ personal profiles kept by the library. nevertheless, the progressive incorporation of new information and communication technologies (icts) to information services, the widespread use of the internet, and the diversification of resources that can be accessed through the web has led libraries through a process of reinvention and transformation to become “digital” libraries.2 this reengineering process requires a deep revision of work techniques and methods so librarians can adapt to the new work environment and improve the services provided. in this paper we present a recommendation and sdi model, implemented as a service of a specialized digital library (in this case, specialized in library and information science), that can increase the accuracy of accessing information and the satisfaction of users’ information needs on the web. this model is built on a multi-agent framework, similar to the one proposed by herrera-viedma, peis, and morales-del-castillo,3 that applies semantic web technologies within the specific domain of specialized digital libraries in order to achieve more efficient information management (by semantically enriching different elements of the system) and improved agent–agent and user–agent communication processes. furthermore, the model uses fuzzy linguistic modelling techniques to facilitate the user–system interaction and to allow a higher grade of automation in certain procedures. to increase improved automation, some natural language processing (nlp) techniques are used to create a system thesaurus and other auxiliary tools for the definition of formal representations of information resources. 
in the next section, “instrumental basis,” we briefly analyze sdi services and several techniques involved in the semantic web project, and we describe the preliminary methodological and instrumental bases that we used for developing the model, such as fuzzy linguistic modelling techniques and tools for nlp. in “semantic sdi service model for digital libraries,” the bulk of this work, the application model that we propose is presented. finally, to sum up, some conclusive data are highlighted. n instrumental basis filtering techniques for sdi services filtering and recommendation services are based on the application of different process-management techniques that are oriented toward providing the user exactly the information that meets his or her needs or can be of his or her interest. in textual domains, these services are usually developed using multi-agent systems, whose main aims are n to evaluate and filter resources normally represented in xml or html format; and n to assist people in the process of searching for and retrieving resources.4 j. m. morales-del-castillo (josemdc@ugr.es) is assistant professor of information science, library and information science department, university of granada, spain. r. pedrazajiménez (rafael.pedraza@upf.edu) is assistant professor of information science, journalism and audiovisual communication department, pompeu fabra university, barcelona, spain. a. a. ruíz (aangel@ugr.es) is full professor of information science, library and information science department, university of granada. e. peis (epeis@ugr.es) is full professor of information science, library and information science department, university of granada. e. herrera-viedma (viedma@decsai.ugr.es) is senior lecturer in computer science, computer science and artificial intelligence department, university of granada. 22 information technology and libraries | march 2009 traditionally, these systems are classified as either content-based recommendation systems or collaborative recommendation systems.5 content-based recommendation systems filter information and generate recommendations by comparing a set of keywords defined by the user with the terms used to represent the content of documents, ignoring any information given by other users. by contrast, collaborative filtering systems use the information provided by several users to recommend documents to a given user, ignoring the representation of a document’s content. it is common to group users into different categories or stereotypes that are characterized by a series of rules and preferences, defined by default, that represent the information needs and common behavioural habits of a group of related users. the current trend is to develop hybrids that make the most of content-based and collaborative recommendation systems. 
in the field of libraries, these services usually adopt the form of sdi services that, depending on the profile of subscribed users, periodically (or when required by the user) generate a series of information alerts that describe the resources in the library that fit a user’s interests.6 sdi services have been studied in different research areas, such as the multi-agent systems development domain,7 and, of course, the digital libraries domain.8 presently, many sdi services are implemented on web platforms based on a multi-agent architecture where there is a set of intermediate agents that compare users’ profiles with the documents, and there are input-output agents that deal with subscriptions to the service and display generated alerts to users.9 usually, the information is structured according to a certain data model, and users’ profiles are defined using a series of keywords that are compared to descriptors or the full text of the documents. despite their usefulness, these services have some deficiencies: n the communication processes between agents, and between agents and users, are hindered by the different ways in which information is represented. n this heterogeneity in the representation of information makes it impossible to reuse such information in other processes or applications. a possible solution to these deficiencies consists of enriching the information representation using a common vocabulary and data model that are understandable by humans as well as by software agents. the semantic web project takes this idea and provides the means to develop a universal platform for the exchange of information.10 semantic web technologies the semantic web project tries to extend the model of the present web by using a series of standard languages that enable enriching the description of web resources and make them semantically accessible.11 to do that, the project basis itself on two fundamental ideas: (1) resources should be tagged semantically so that information can be understood both by humans and computers, and (2) intelligent agents should be developed that are capable of operating at a semantic level with those resources and that infer new knowledge from them (shifting from the search of keywords in a text to the retrieval of concepts).12 the semantic backbone of the project is the resource description framework (rdf) vocabulary, which provides a data model to represent, exchange, link, add, and reuse structured metadata of distributed information sources, thereby making them directly understandable by software agents.13 rdf structures the information into individual assertions (e.g., “resource,” “property,” and “property value triples”) and uniquely characterizes resources by means of uniform resource identifiers (uris), allowing agents to make inferences about them using web ontologies or other, simpler semantic structures, such as conceptual schemes or thesauri.14 even though the adoption of the semantic web and its application to systems like digital libraries is not free from trouble (because of the nature of the technologies involved in the project and because of the project’s ambitious objectives,15 among other reasons), the way these technologies represent the information is a significant improvement over the quality of the resources retrieved by search engines, and it also allows the preservation of platform independence, thus favouring the exchange and reuse of contents.16 as we can see, the semantic web works with information written in natural language that is 
structured in a way that can be interpreted by machines. for this reason, it is usually difficult to deal with problems that require operating with linguistic information that has a certain degree of uncertainty (e.g., when quantifying the user's satisfaction in relation to a product or service). a possible solution could be the use of fuzzy linguistic modelling techniques as a tool for improving system–user communication.

fuzzy linguistic modelling

fuzzy linguistic modelling supplies a set of approximate techniques appropriate for dealing with qualitative aspects of problems.17 the ordinal linguistic approach is defined according to a finite set of tags (s), completely ordered and with odd cardinality (seven or nine tags):

s = {si , i ∈ {0, ..., t}}

the central term has a value of approximately 0.5, and the rest of the terms are arranged symmetrically around it. the semantics of each linguistic term is given by the ordered structure of the set of terms, considering that each linguistic term of the pair (si, st-i) is equally informative. each label si is assigned a fuzzy value defined in the interval [0,1] that is described by a linear trapezoidal membership function represented by the 4-tuple (ai, bi, αi, βi). (the first two parameters show the interval where the membership value is 1.0; the third and fourth parameters show the left and right limits of the distribution.) additionally, we need to define the following properties:

1. the set is ordered: si ≥ sj if i ≥ j.
2. there is a negation operator: neg(si) = sj, with j = t − i.
3. maximization operator: max(si, sj) = si if si ≥ sj.
4. minimization operator: min(si, sj) = si if si ≤ sj.

it also is necessary to define aggregation operators, such as linguistic weighted averaging (lwa),18 capable of operating with and combining linguistic information. focusing on facilitating the interaction between users and the system, the other starting objective is to achieve the development and implementation of the proposed model in the most automated way possible. to do this, we use a basic auxiliary tool—a thesaurus—that, among other tasks, assists users in the creation of their profiles and enables automating the generation of alerts. that is why it is critical to define the way in which we create this tool, and in this work we propose a specific method for the semiautomatic development of thesauri using nlp techniques.

nlp techniques and other automating tools

nlp consists of a series of linguistic techniques, statistical approaches, and machine learning algorithms (mainly clustering techniques) that can be used, for example, to summarize texts automatically, to develop automatic translators, and to create voice recognition software. another possible application of nlp would be the semiautomatic construction of thesauri using different techniques. one of them consists of determining the lexical relations between the terms of a text (mainly synonymy, hyponymy, and hyperonymy),19 and extracting the terms that are most representative of the text's specific domain.20 it is possible to elicit these relations by using linguistic tools, like princeton's wordnet (http://wordnet.princeton.edu), and clustering techniques.
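as a concrete illustration of how such lexical relations can be pulled from wordnet, here is a minimal sketch using nltk's wordnet interface; the library choice is an assumption made for the example (the authors query wordnet 2.1 directly), and the relations returned are only as good as wordnet's coverage of the domain:

# assumes the nltk package and its wordnet corpus are installed
from nltk.corpus import wordnet as wn

def lexical_relations(term):
    """collect synonyms, hypernyms, and hyponyms for every sense of a term."""
    relations = {}
    for synset in wn.synsets(term):
        relations[synset.name()] = {
            "synonyms": synset.lemma_names(),
            "hypernyms": [h.name() for h in synset.hypernyms()],
            "hyponyms": [h.name() for h in synset.hyponyms()],
        }
    return relations

# e.g. lexical_relations("library") returns, for each sense, the more general
# concepts (hypernyms) used in the bottom-up growth of the thesaurus.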
wordnet is a powerful multilanguage lexical database where each one of its entries is defined, among other elements, by its synonyms (synsets), hyponyms, and hyperonyms.21 as a consequence, once given the most important terms of a domain, wordnet can be used to create a thesaurus from them (after leaving out all terms that have not been identified as belonging or related to the domain of interest).22 this tool can also be used with clustering techniques—for example, to group the documents of a collection into a set of nodes or clusters, depending on their similarity. each of these clusters is described by the most representative terms of its documents. these terms make up the most specific level of a thesaurus and are used to search wordnet for their synonyms and more general terms, contributing (with the repetition of this procedure) to the bottom-up development of the thesaurus.23 although there are many others, these are some of the most well-known techniques of semiautomatic thesaurus generation (semiautomatic because, needless to say, the supervision of experts is necessary to determine the validity of the final result). for specialized digital libraries, we propose developing, on a multi-agent platform and using all these tools, sdi services capable of generating alerts and recommendations for users according to their personal profiles. in particular, the model presented here is the result of merging several previous models, and its service is based on the definition of "current-awareness bulletins," where users can find a basic description of the resources recently acquired by the library or those that might be of interest to them.24

the semantic sdi service model for digital libraries

the sdi service includes two agents (an interface agent and a task agent) distributed in a four-level hierarchical architecture: user level, interface level, task level, and resource level. its main components are a repository of full-text documents (which make up the stock of the digital library) and a series of elements described using different rdf-based vocabularies: one or several rss feeds that play a role similar to that of current-awareness bulletins in traditional libraries; a repository of recommendation log files that store the recommendations made by users about the resources; and a thesaurus that lists and hierarchically relates the most relevant terms of the specialization domain of the library.25 also, the semantics of each element (that is, its characteristics and the relations the element establishes with other elements in the system) are defined in a web ontology developed in web ontology language (owl).26 next, we describe these main elements as well as the different functional modules that the system uses to carry out its activity.

elements of the model

there are four basic elements that make up the system:
to create the thesaurus, we followed the method suggested by pedraza-jiménez, valverde-albacete, and navia-vázquez.27 the learning technique used for the creation of a thesaurus includes four phases: preprocessing of documents, parameterizing the selected terms, conceptualizing their lexical stems, and generating a lattice or graph that shows the relation between the identified concepts. essentially, the aim of the preprocessing phase is to prepare the documents’ parameterization by removing elements regarded as superfluous. we have developed this phase in three stages: eliminating tags (stripping), standardizing, and stemming. in the first stage, all the tags (html, xml, etc.) that can appear in the collection of documents are eliminated. the second stage is the standardization of the words in the documents in order to facilitate and improve the parameterization process. at this stage, the acronyms and n-grams (bigrams and trigrams) that appear in the documents are identified using lists that were created for that purpose. once we have detected the acronyms and n-grams, the rest of the text is standardized. dates and numerical quantities are standardized, being substituted with a script that identifies them. all the terms (except acronyms) are changed to small letters, and punctuation marks are removed. finally, a list of function words is used to eliminate from the texts articles, determiners, auxiliary verbs, conjunctions, prepositions, pronouns, interjections, contractions, and grade adverbs. all the terms are stemmed to facilitate the search of the final terms and to improve their calculation during parameterization. to carry out this task, we have used morphy, the stemming algorithm used by wordnet. this algorithm implements a group of functions that check whether a term is an exception that does not need to be stemmed and then convert words that are not exceptions to their basic lexical form. those terms that appear in the documents but are not identified by morphy are eliminated from our experiment. the parameterization phase has a minimum complexity. once identified, the final terms (roots or bases) are quantified by being assigned a weight. such weight is obtained by the application of the scheme term frequencyinverse document frequency (tf-idf), a statistic measure that makes possible the quantification of the importance of a term or n-gram in a document depending on its frequency of appearance and in the collection the document belongs to. finally, once the documents have been parameterized, the associated meanings of each term (lemma) are extracted by searching for them in wordnet (specifically, we use wordnet 2.1 for unix-like systems). thus we get the group of synsets associated with each word. the group of hyperonyms and hyponyms also are extracted from the vocabulary of the analyzed collection of documents. the generation of our thesaurus—that is, the identification of descriptors that better represent the content of documents, and the identification of the underlying relations between them—is achieved using formal concept analysis techniques. this categorization technique uses the theory of lattices and ordered sets to find abstraction relations from the groups it generates. furthermore, this technique enables clustering the documents depending on the terms (and synonyms) it contains. also, a lattice graph is generated according to the underlying relations between the terms of the collection, taking into account the hyperonyms and hyponyms extracted. 
in that graph, each node represents a descriptor (namely, a group of synonym terms) and clusters the set of documents that contain it, linking them to those with which it has any relation (of hyponymy or hyperonymy). once the thesaurus is obtained by identifying its terms and the underlying relations between them, it is automatically represented using the simple knowledge organization system (skos) vocabulary (see figure 1).28

user profiles

user profiles can be defined as structured representations that contain personal data, interests, and preferences of users with which agents can operate to customize the sdi service. in the model proposed here, these profiles are basically defined with friend of a friend (foaf), a specific rdf/xml vocabulary for describing people (which favours profile interoperability, since this is a widespread vocabulary supported by an owl ontology), and another nonstandard vocabulary of our own to define fields not included in foaf (see figure 2).29 profiles are generated the moment the user is registered in the system, and they are structured in two parts: a public profile that includes data related to the user's identity and affiliation, and a private profile that includes the user's interests and preferences about the topic of the alerts he or she wishes to receive. to define their preferences, users must specify keywords and concepts that best define their information needs. later, the system compares those concepts with the terms in the thesaurus, using the edit tree algorithm as a similarity measure.30 this function matches character strings, then returns the term introduced (if there's an exact match) or the lexically most similar term (if not). consequently, if the suggested term satisfies user expectations, it will be added to the user's profile together with its synonyms (if any). in those cases where the suggested term is not satisfactory, the system must provide a tool or application that enables users to browse the thesaurus and select terms that better describe their needs. an example of this type of application is thmanager (http://thmanager.sourceforge.net), a project of the universidad de zaragoza, spain, that enables editing, visualizing, and navigating structures defined in skos. each of the terms selected by the user to define his or her areas of interest has an associated linguistic frequency value (tagged as ) that we call "satisfaction frequency." it represents the regularity with which a particular preference value has been used in alerts positively evaluated by the user. this frequency measures the relative importance of the preferences stated by the user and allows the interface agent to generate a ranked list of results. the range of possible values for these frequencies is defined by a group of seven labels that we get from the fuzzy linguistic variable "frequency," whose expression domain is defined by the linguistic term set s = {always, almost_always, often, occasionally, rarely, almost_never, never}, being the default value and "occasionally" being the central value.

rss feeds

thanks to the popularization of blogs, there has been widespread use of several vocabularies specifically designed for the syndication of contents (that is, for making the content of a website accessible to other internet users by means of hyperlink lists called "feeds").
to create our current-awareness bulletin we use rss 1.0, a vocabulary that enables managing hyperlink lists in an easy and flexible way. it utilizes the rdf/xml syntax and data model and is easily extensible because of the use of modules that enable extending the vocabulary without modifying its core each time new describing elements are added.

figure 1. sample entry of a skos core thesaurus
figure 2. user profile sample

in this model several modules are used: the dublin core (dc) module to define the basic bibliographic information of the items utilizing the elements established by the dublin core metadata initiative (http://dublincore.org); the syndication module to facilitate software agents synchronizing and updating rss feeds; and the taxonomy module to assign topics to feed items. the structure of the feeds comprises two areas: one where the channel itself is described by a series of basic metadata like a title, a brief description of the content, and the updating frequency; and another where the descriptions of the items that make up the feed (see figure 3) are defined (including elements such as title, author, summary, hyperlink to the primary resource, date of creation, and subjects).

figure 3. rss feed item sample

recommendation log file

each document in the repository has an associated recommendation log file in rdf that includes the listing of evaluations assigned to that resource by different users since the resource was added to the system. each of the entries of the recommendation log files consists of a recommendation value, a uri that identifies the user who made the recommendation, and the date of the record (see figure 4). the expression domain of the recommendations is defined by the following set of five fuzzy linguistic labels that are extracted from the linguistic variable "quality of the resource": q = {very_low, low, medium, high, very_high}.

figure 4. recommendation log file sample

these elements represent the raw materials for the sdi service that enable it to develop its activity through four processes or functional modules: the profiles updating process, rss feeds generation process, alert generation process, and collaborative recommendation process.

system processes

profiles updating process

since the sdi service's functions are based on generating passive searches to rss feeds from the preferences stored in a user's profile, updating the profiles becomes a critical task. user profiles are meant to store long-term preferences, but the system must be able to detect any subtle change in these preferences over time to offer accurate recommendations. in our model, user profiles are updated using a simple mechanism that enables finding users' implicit preferences by applying fuzzy linguistic techniques and taking into account the feedback users provide. users are asked about their satisfaction degree (ej) in relation to the information alert generated by the system (i.e., whether the items retrieved are interesting or not). this satisfaction degree is obtained from the linguistic variable "satisfaction," whose expression domain is the set of seven linguistic labels s′ = {total, very_high, high, medium, low, very_low, null}. this mechanism updates the satisfaction frequency associated with each user preference according to the satisfaction degree ej. it requires the use of a matching function similar to those used to model threshold weights in weighted search queries.31 the function proposed here rewards the frequencies associated with the preference values present when the resources assessed are satisfactory, and it penalizes them when this assessment is negative. let ej ∈ s′ be the degree of satisfaction and fi l ∈ s the frequency of property i (in this case, i = "preference") with value l; then we define the updating function g: s′ × s → s:

'http://www.doaj.org/oai.article'; # the oai repository
mylibrary::config->instance( 'articles' ); # the mylibrary instance

# create a facet called formats
$facet = mylibrary::facet->new;
$facet->facet_name( 'formats' );
$facet->facet_note( 'types of physical items embodying information.' );
$facet->commit;
$formatid = $facet->facet_id;

# create an associated term called articles
$term = mylibrary::term->new;
$term->term_name( 'articles' );
$term->term_note( 'short, scholarly essays.' );
$term->facet_id( $formatid );
$term->commit;
$articleid = $term->term_id;

# create a location type called url
$location_type = mylibrary::resource::location::type->new;
$location_type->name( 'url' );
$location_type->description( 'the location of an internet resource.' );
$location_type->commit;
$location_type_id = $location_type->location_type_id;

# create a harvester and loop through each oai set
this satisfaction degree is obtained from the linguistic variable “satisfaction,” whose expression domain is the set of five linguistic labels: s’ = {total, very_high, high, medium, low, very_low, null}. this mechanism updates the satisfaction frequency associated with each user preference according to the satisfaction degree ej. it requires the use of a matching function similar to those used to model threshold weights in weighted search queries.31 the function proposed here rewards the frequencies associated with the preference values present when resources assessed are satisfactory, and it penalizes them when this assessment is negative. let ej { }t,=hba,|ss,s ba 0,...∈∈ s’ be the degree of satisfaction, and f j i l { }t,=hba,|ss,s ba 0,...∈∈ s the frequency of property i (in this case i = “preference”) with value l, then we define the updating function g as s’x s→s: { } { } ( ) {=f,eg s ‘http://www.doaj.org/oai.article’; # the oai repository mylibrary::config->instance( ‘articles’ ); # the mylibrary instance # create a facet called formats $facet = mylibrary::facet->new; $facet->facet_name( ‘formats’ ); $facet->facet_note( ‘types of physical items embodying information.’ ); $facet->commit; $formatid = $facet->facet_id; # create an associated term called articles $term = mylibrary::term->new; $term->term_name( ‘articles’ ); $term->term_note( ‘short, scholarly essays.’ ); $term->facet_id( $formatid ); $term->commit; $articleid = $term->term_id; # create a location type called url $location_type = mylibrary::resource::location::type->new; $location_type->name( ‘url’ ); $location_type->description( ‘the location of an internet resource.’ ); $location_type->commit; $location_type_id = $location_type->location_type_id; # create a harvester and loop through each oai set mylibrary: a digital library framework and toolkit | morgan 21 $harvester = net::oai::harvester->new( ‘baseurl’ => doaj ); $sets = $harvester->listsets; foreach ( $sets->setspecs ) { # get each record in this set and process it $records = $harvester->listallrecords( metadataprefix => ‘oai_dc’, set => $_ ); while ( $record = $records->next ) { # map the oai metadata to mylibrary attributes $fkey = $record->header->identifier; $metadata = $record->metadata; $name = $metadata->title; @creators = $metadata->creator; $note = $metadata->description; $publisher = $metadata->publisher; next if ( ! $publisher ); $location = $metadata->identifier; next if ( ! 
$location ); $date = $metadata->date; $source = $metadata->source; @subjects = $metadata->subject; # create and commit a mylibrary resource $resource = mylibrary::resource->new; $resource->fkey( $fkey ); $resource->name( $name ); $creator = ‘’; foreach ( @creators ) { $creator .= “$_|” } $resource->creator( $creator ); $resource->note( $note ); $resource->publisher( $publisher ); $resource->source( $source ); $resource->date( $date ); $subject = ‘’; foreach ( @subjects ) { $subject .= “$_|” } $resource->subject( $subject ); $resource->related_terms( new => [ $articleid ]); $resource->add_location( location => $location, location_type => $location_type_id ); $resource->commit; } } 22 information technology and libraries | september 2008 # done exit; appendix b # index mylibrary data with kinosearch # require use kinosearch::invindexer; use kinosearch::analysis::polyanalyzer; use mylibrary::core; # define use constant index => ‘../etc/index’; # location of the index mylibrary::config->instance( ‘articles’ ); # mylibrary instance to use # initialize the index $analyzer = kinosearch::analysis::polyanalyzer->new( language => ‘en’ ); $invindexer = kinosearch::invindexer->new( invindex => index, create => 1, analyzer => $analyzer ); # define the index’s fields $invindexer->spec_field( name => ‘id’ ); $invindexer->spec_field( name => ‘title’ ); $invindexer->spec_field( name => ‘description’ ); $invindexer->spec_field( name => ‘source’ ); $invindexer->spec_field( name => ‘publisher’ ); $invindexer->spec_field( name => ‘subject’ ); $invindexer->spec_field( name => ‘creator’ ); # get and process each resource foreach ( mylibrary::resource->get_ids ) { # create, fill, and commit a document with content my $resource = mylibrary::resource->new( id => $_ ); my $doc = $invindexer->new_doc; $doc->set_value ( id => $resource->id ); mylibrary: a digital library framework and toolkit | morgan 23 $doc->set_value ( title => $resource->name ) unless ( ! $resource->name ); $doc->set_value ( source => $resource->source ) unless ( ! $resource->source ); $doc->set_value ( publisher => $resource->publisher ) unless ( ! $resource->publisher ); $doc->set_value ( subject => $resource->subject ) unless ( ! $resource->subject ); $doc->set_value ( creator => $resource->creator ) unless ( ! $resource->creator ); $doc->set_value ( description => $resource->note ) unless ( ! $resource->note ); $invindexer->add_doc( $doc ); } # optimize and done $invindexer->finish( optimize => 1 ); exit; appendix c # search a kinosearch index and display content from mylibrary # require use kinosearch::searcher; use kinosearch::analysis::polyanalyzer; use mylibrary::core; # define use constant index => ‘../etc/index’; # location of the index mylibrary::config->instance( ‘articles’ ); # mylibrary instance to use # get the query my $query = shift; if ( ! $query ) { print “enter a query. “; chop ( $query = )} # open the index $analyzer = kinosearch::analysis::polyanalyzer->new( language => ‘en’ ); $searcher = kinosearch::searcher->new( invindex => index, analyzer => $analyzer ); # search $hits = $searcher->search( qq( $query )); # get the number of hits and display $total_hits = $hits->total_hits; 24 information technology and libraries | september 2008 print “your query ($query) found $total_hits record(s).\n\n”; # process each search result while ( $hit = $hits->fetch_hit_hashref ) { # get the mylibrary resource $resource = mylibrary::resource->new( id => $hit->{ ‘id’ }); # extract dublin core elements and display print “ id = “ . 
$resource->id . “\n”; print “ name = “ . $resource->name . “\n”; print “ date = “ . $resource->date . “\n”; print “ note = “ . $resource->note . “\n”; print “ creators = “; foreach ( split /\|/, $resource->creator ) { print “$_; “ } print “\n”; # get related terms and display @resource_terms = $resource->related_terms(); print “ term(s) = “; foreach (@resource_terms) { $term = mylibrary::term->new(id => $_); print $term->term_name, “ ($_)”, ‘; ‘; } print “\n”; # get locations (urls) and display @locations = $resource->resource_locations(); print “ location(s) = “; foreach (@locations) { print $_->location, “; “ } print “\n\n”; } # done exit; information retrieval using a middleware approach danijela boberić krstićev information technology and libraries | march 2013 54 abstract this paper explores the use of a mediator/wrapper approach to enable the search of an existing library management system using different information retrieval protocols. it proposes an architecture for a software component that will act as an intermediary between the library system and search services. it provides an overview of different approaches to add z39.50 and search/retrieval via url (sru) functionality using a middleware approach that is implemented on the bisis library management system. that wrapper performs transformation of contextual query language (cql) into lucene query language. the primary aim of this software component is to enable search and retrieval of bibliographic records using the sru and z39.50 protocols, but the proposed architecture of the software components is also suitable for inclusion of the existing library management system into a library portal. the software component provides a single interface to server-side protocols for search and retrieval of records. additional protocols could be used. this paper provides practical demonstration of interest to developers of library management systems and those who are trying to use open-source solutions to make their local catalog accessible to other systems. introduction information technologies are changing and developing very quickly, forcing continual adjustment of business processes to leverage the new trends. these changes affect all spheres of society, including libraries. there is a need to add new functionality to existing systems in ways that are cost effective and do not require major redevelopment of systems that have achieved a reasonable level of maturity and robustness. this paper describes how to extend an existing library management system with new functionality supporting easy sharing of bibliographic information with other library management systems. one of the core services of library management systems is support for shared cataloging. this service consists of the following activities: a librarian when processing a new bibliographical unit first checks whether the bibliographic unit has already been recorded in another library in the world. if it is found, then the librarian stores that electronic records to his/her local database of bibliographic records. in order to enable those activities, it is necessary that standard way of communication between different library management systems exists. currently, the well-known standards in this area are z39.501 and sru.2 danijela boberić krstićev (dboberic@uns.ac.rs) is a member department of mathematics and informatics, faculty of sciences, university of novi sad, serbia. 
mailto:dboberic@uns.ac.rs information retrieval using a middleware approach | krstićev 55 in this paper, a software component that integrates services for retrieval bibliographic records using the z39.50 and sru standard is described. the main purpose of that component is to encapsulate server sides of the appropriate protocols and to provide a unique interface for communication with the existing library management system. the same interface may be used regardless of which protocols are used for communication with the library management system. in addition, the software component acts as an intermediary between two different library management systems. the main advantage of the component is that it is independent of library management system with which it communicates. also, the component could be extended with new search and retrieval protocols. by using the component, the functionality of existing library management systems would be improved and redevelopment of the existing system would not be necessary. it means that the existing library management system would just need to provide an interface for communication with that component. that interface can even be implemented as an xml web service. standards used for search and retrieval the z39.50 standard was one of the first standards that defined a set of services to search for and retrieve data. the standard is an abstract model that defines communication between the client and server and does not go into details of implementation of the client or server. the model defines abstract prefixes used for search that do not depend on the implementation of the underlying system. it also defines the format in which data can be exchanged. the z39.50 standard defines query language type-1, which is required when implementing this standard. the z39.50 standard has certain drawbacks that new generation of standards, like sru, is trying to overcome. sru tries to keep functionality defined by z39.50 standard, but to allow its implementation using current technologies. one of the main advantages of the sru protocol, as opposed to z39.50, is that it allows messages to be exchanged in a form of xml documents, which was not the case with the z39.50 protocol. the query language used in sru is called contextual query language (cql).3 the sru standard has two implementations, one in which search and retrieval is done by sending messages via the hypertext transfer protocol (http) get and post methods (sru version) and the other for sending messages using the simple object access protocol (soap) (srw version). the main difference between sru and srw is in the way of sending messages.4 the srw version of the protocol packs messages in the soap envelope element, while the sru version of the protocol sends messages based on parameter/value pairs that are included in the url. another difference between the two versions is that the sru protocol for messages transfer uses only http, while srw, in can use secure shell (ssh) and simple mail transfer protocol (smtp), in addition to http. information technology and libraries | march 2013 56 related work a common approach for adding sru support to library systems, most of which already support, the z39.50 search protocol,5 has been to use existing software architecture that supports the z39.50 protocol. simultaneously supporting both protocols is very important because individual libraries will not decide to move to the new protocol until it is widely adopted within the library community. 
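As a concrete point of reference for the approaches discussed below, an SRU searchRetrieve operation is simply an HTTP GET request whose parameters carry a CQL query. The following sketch is illustrative only: the host name is hypothetical and the record schema and CQL index are merely common choices, but the parameter names themselves (operation, version, query, maximumRecords, recordSchema) are the ones defined by the SRU specification.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLEncoder;

    // Sends a minimal SRU searchRetrieve request and prints the XML response.
    // The base URL is hypothetical; a real SRU server would answer with a
    // searchRetrieveResponse document containing the matching records.
    public class SruRequestSketch {
        public static void main(String[] args) throws Exception {
            String cql = "dc.title = \"digital libraries\"";   // an ordinary CQL query
            String url = "http://catalog.example.org/sru"
                    + "?operation=searchRetrieve"
                    + "&version=1.1"
                    + "&query=" + URLEncoder.encode(cql, "UTF-8")
                    + "&maximumRecords=10"
                    + "&recordSchema=marcxml";

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);   // raw XML response
                }
            }
        }
    }

An SRW client would instead wrap the same query in a SOAP envelope and send it with an HTTP POST, which is the main practical difference between the two variants described above.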
one approach in the implementation of a system for retrieval of data using both protocols is to create two independent server-side components for z39.50 and sru, where both software components access a single database. this approach involves creating a server implementation from the scratch without the utilization of existing architectures, which could be considered a disadvantage. figure 1. software architecture of a system with separate implementations of serverside protocols this approach is good if there is an existing z39.50 or sru server-side implementation, or if there is a library management system, for example, that supports just the z39.50 protocol, but has open programming code and allows changes that would allow the development of an sru service. the system architecture that is based on this approach is shown in figure 1 as a unified modeling language (uml) component diagram. in this figure, the software components that constitute the implementation of the client and the server side for each individual protocol are clearly separated, while the database is shared. the main disadvantage of this approach is that adding support for new search and retrieval protocols requires the transformation of the query language supported by that new protocol into the query language of target system. for example, if the existing library management system uses a relational database to store bibliographic records, for every a new protocol added, its query language must be transformed into the structured query language (sql) supported by the database. z39.50 server side sru server side database z39.50 client side sru client side zservice sruservice jdbc information retrieval using a middleware approach | krstićev 57 however, in most commercial library management systems that support server-side z39.50, local development and maintenance of additional services may not be possible due to the closed nature of the systems. one of the solutions in this case would be to create a so-called “gateway” software component that implements both an sru server and a z39.50 client, used to access the existing z39.50 server. that is, if a sru client's application sends search request, the gateway will accept that request, transform it into the z39.50 request and forward the request to the z39.50 server. similarly, when the gateway receives a response from the z39.50 server, the gateway will transform this response in sru response and forward it to the client. in this way, the client will have the impression that communicates directly with the sru server, while the existing z39.50 server will think that it sends response directly to the z39.50 client. figure 2 presents a component diagram that represents the architecture of the system that is based on this approach. figure 2. software architecture of a system with a gateway the software architecture shown in the figure 2 is one of the most common approaches and is used by the library of congress (lc),6 which uses the commercial voyager7 library information system, which allows searching by the z39.50 protocol. in order to support search of the lc database using sru, indexdata8 developed the yazproxy software component,9 which is an sruz39.50 gateway. 
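A gateway of this kind is essentially a thin server that speaks SRU to the outside world and Z39.50 to the legacy system. The sketch below is schematic only: it is not YAZ Proxy's code, the nested Z3950Client interface is a stand-in for whatever toolkit the gateway would actually use (for example JAFER or another Z39.50 client library), and the query translation is reduced to a placeholder method.

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Schematic SRU-to-Z39.50 gateway: accepts an SRU searchRetrieve request,
    // translates the CQL query into a Z39.50 Type-1 (RPN) query, forwards it
    // to an existing Z39.50 server, and returns the records as an SRU response.
    public class SruToZ3950Gateway extends HttpServlet {

        // Stand-in for a Z39.50 client supplied by a toolkit such as JAFER.
        public interface Z3950Client {
            String[] search(String rpnQuery, int maxRecords) throws IOException;
        }

        private Z3950Client z3950;   // provided at deployment time (e.g., created in init())

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            if (z3950 == null) {
                resp.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE,
                        "no Z39.50 backend configured");
                return;
            }
            String cql = req.getParameter("query");            // SRU carries CQL
            String maxParam = req.getParameter("maximumRecords");
            int max = (maxParam == null) ? 10 : Integer.parseInt(maxParam);

            String rpn = translateCqlToRpn(cql);                // the heart of the gateway
            String[] records = z3950.search(rpn, max);          // talk Z39.50 to the old server

            resp.setContentType("text/xml;charset=UTF-8");
            resp.getWriter().println("<searchRetrieveResponse>");
            resp.getWriter().println("  <numberOfRecords>" + records.length + "</numberOfRecords>");
            // ... each record would be wrapped in SRU <record> elements here ...
            resp.getWriter().println("</searchRetrieveResponse>");
        }

        // Placeholder: in practice this walks the CQL parse tree and emits RPN.
        private String translateCqlToRpn(String cql) {
            return "@attr 1=4 " + cql;   // illustrative only
        }
    }

YAZ Proxy itself is a production implementation of this pattern built on the YAZ toolkit rather than as a Java servlet, but the request flow is the same.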
the same idea10 was used in the implementation of the "the european library”11 database sru client side jdbc gateway sru server side z39.50 client side srutoz3950converter zservice z39.50 server side sruservice information technology and libraries | march 2013 58 portal, which aims to provide integrated access to the major collections of all the european national libraries. another interesting approach in designing software architecture for systems dealing with retrieval of information can be observed in the systems involved in searching heterogeneous information sources. the architecture of these systems is shown in figure 3. the basic idea in most of these systems is to provide the user with a single interface to search different systems. this means that there is a separate component that will accept a user query and transform it into a query that is supported by the specific system component that offers search and data retrieval. this component is also known as a mediator. a separate wrapper component must be created for each system to be searched, to convert the user's query to a query that is understood by the particular target system.12 figure 3. architecture with the mediator/wrapper approach figure 3 shows a system architecture that enables communication with three different systems (system1, system2 and systemn), each of which may use a different query language and therefore need different wrapper components (wrapper1, wrapper2 and wrappern ). in this architecture, each system can be a new mediator component that will interact with other systems. that is, the wrapper component can communicate with the system or with another mediator. the role of the mediator is to accept the request defined by the user and send it to all wrapper components. the wrapper components know how to transform the query that is sent by a mediator into a query that is supported by the target system with which the wrapper communicates. in addition, the wrapper has to transform data received from the target system in a format prescribed by the mediator. communication between client applications and the mediator client mediator system1 system2 systemn wrapper1 wrapper2 wrappern converter1 concrete query languagenconcrete query language2concrete query language1 converter2 convertern uniform query language information retrieval using a middleware approach | krstićev 59 may be through one of the protocols for search and retrieval of information, for example through the sru or z39.50 protocols, or it may be a standard http protocol. systems in which the architecture is based on the mediator/wrapper approach are described in several papers. coiera et al (2005)13 describe the architecture of a system that deals with the federated search of journals in the field of medicine, using the internal query language unified query language (uql). for each information source with which the system communicates, a wrapper was developed to translate queries from uql into the native query language of the source. the wrapper also has the task of returning search results to the mediator. those results are returned as an xml document, with a defined internal format called a unified response language (urel). as an alternative to using particular defined languages (uql and urel), a cql query language and the sru protocol could be used. 
another example of the use of mediators is described by cousins and sanders (2006),14 who address the interoperability issues in cross-database access and suggest how to incorporate a virtual union catalogue into the wider information environment through the application of middleware, using the z39.50 protocol to communicate with underlying sources. software component for services integration this paper describes a software component that would enable the integration of services for search and retrieval of bibliographic records into an existing library system. the main idea is that the component should be modular and flexible in order to allow the addition of new protocols for search and easy integration into the existing system. based on the papers analyzed in the previous section, it was concluded that a mediator/wrapper approach would work best. the architecture of system that would include the component and that would allow search and retrieval of bibliographic records from other library systems is shown in figure 4. z39.50 client sru client library information system recordmanager intermediary mediator wrapper z39.50 server sru server information technology and libraries | march 2013 60 figure 4. architecture of system for retrieval of bibliographic records in figure 4, the central place is occupied by the intermediary component, which consists of a mediator component and a wrapper component. this component is an intermediary between the search service and an existing library system. the library system provides an interface (recordmanager) which is responsible for returning records that match the received query. figure 4 also shows the components that are client applications that use specific protocols for communication (sru and z39.50), as well as the components that represent the server-side implementation of appropriate protocols. this paper will not describe the architecture of components that implement the server side of the z39.50 and sru protocols, primarily because there are already a lot of open-source solutions15 that implement those components and can easily be connected with this intermediary component. in order to test the intermediary component, we used the server side of the z39.50 protocol developed through the jafer project16 ; for the sru server side, we developed a special web service in the java programming language. in further discussion, it is assumed that the intermediary component receives queries from server-side z39.50 and sru services, and that this component does not contain any implementation of these protocols. the mediator component, which is part of the intermediary component, must accept queries sent by the server-side search and retrieval services. the mediator component uses its own internal representation of queries, so it is therefore necessary to transform received queries into the appropriate internal representation. after that, the mediator will establish communication with the wrapper component, which is in charge of executing queries in existing library system. the basic role of the wrapper component is to transform queries received from the mediator into queries supported by library system. after executing the query, the wrapper sends search results as an xml document to the mediator. before sending those results to server side of protocol, the mediator must transform those results into the format that was defined by the client. 
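Reduced to interfaces, the flow just described might look like the sketch below. The method signatures mirror the class diagrams discussed in the following sections (getRecords on the mediator, executeQuery on the wrapper, select and getRecords on RecordManager), and the CQL object model is assumed to come from the cql-java project cited later; the orchestration code, however, is an illustration rather than the BISIS implementation itself.

    import java.util.Map;
    import org.z3950.zing.cql.CQLNode;   // CQL object model from the cql-java project

    // Converts a protocol-specific query (a Z39.50 Type-1 query, a CQL string, ...)
    // into the mediator's internal representation, a CQL parse tree.
    interface QueryConverter {
        CQLNode parseQuery(Object query) throws Exception;
    }

    // Serializes a record retrieved from the library system into the format
    // requested by the client (MARC21slim, UNIMARCslim, Dublin Core, ...).
    interface RecordSerializer {
        String serialize(String record);
    }

    // Interface exposed by the existing library management system:
    // run a query, then fetch the matching bibliographic records.
    interface RecordManager {
        int[] select(Object query);
        String[] getRecords(int[] hits);
    }

    // Translates the internal CQL representation into the query language of the
    // underlying system (Lucene in the case of BISIS), typically via RecordManager.
    interface Wrapper {
        String[] executeQuery(CQLNode cqlQuery) throws Exception;
    }

    // Entry point used by the Z39.50 and SRU server-side components.
    class MediatorService {
        private final QueryConverter converter;
        private final Wrapper wrapper;
        private final Map<String, RecordSerializer> serializers;   // keyed by format name

        MediatorService(QueryConverter c, Wrapper w, Map<String, RecordSerializer> s) {
            this.converter = c;
            this.wrapper = w;
            this.serializers = s;
        }

        String[] getRecords(Object query, String format) throws Exception {
            RecordSerializer serializer = serializers.get(format);
            if (serializer == null) {
                throw new IllegalArgumentException("unsupported record format: " + format);
            }
            CQLNode internal = converter.parseQuery(query);   // unify the incoming query
            String[] raw = wrapper.executeQuery(internal);    // run it against the library system
            String[] out = new String[raw.length];
            for (int i = 0; i < raw.length; i++) {
                out[i] = serializer.serialize(raw[i]);         // convert to the requested format
            }
            return out;
        }
    }

In the article there is a separate converter for each supported query language (one for Z39.50 Type-1 queries and one for CQL strings) and a separate serializer class for each output format; the single instances above are only meant to show how the pieces fit together.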
mediator software component the mediator is a software component that provides a unique interface for different client applications. in this study, as shown in figure 4, a slightly different solution was selected. instead of the mediator communicating directly with the client application, which in the case of protocols for data exchange is client side of that protocol, it actually communicates with the server components that implement the appropriate protocols, and the client application exchanges messages with the corresponding server-side protocol. the z39.50 client exchanges messages with the appropriate z39.50 server, and it communicates with the mediator component. a similar process is done when communication is done using the sru protocol. what is important to emphasize is that the z39.50 and sru servers communicate with the mediator through a unified user interface, represented in figure 5 by class mediatorservice. in this way the same method is used to submit the query and receive results, regardless of which protocol is used. that means information retrieval using a middleware approach | krstićev 61 that our system becomes more scalable and that it is possible to add some new search and retrieval protocols without refactoring the mediator component. figure 5 shows the uml class diagram that describes the software mediator component. the mediatorservice class is responsible for communication with the server-side z39.50 and sru protocols. this class accepts queries from the server side of protocols and returns bibliographic records in the format defined by the server. the mediator can accept queries defined by different query languages. its task is to transform these queries to an internal query language, which will be forwarded to the wrapper component. in this implementation, accepted queries are transformed into an object representation of cql, as defined by the sru standard. one of the reasons for choosing cql is that concepts defined in the z39.50 standard query language can be easily mapped to the corresponding concepts defined by cql. cql is semantically rich, so can be used to create various types of queries. also, because it is based on the concept of context set, it is extensible and allows usage of various types of context sets for different purposes. so, cql is not just limited to the function of searching bibliographic material. it could, for example, be used for searching geographical data. accordingly, it was assumed that cql is a general query language and that probably any query language could be transformed into it. in this implementation, the object model of cql query defined in project cqljava17 was used. in the case that there is a new query language, it would be necessary to perform mapping of the new query language into cql or to extend the object model of cql with new concepts. this implementation of the mediator component could transform two different types of queries into the cql object model. currently, it can transform type-1 queries (used by z39.50) and cql queries into cql object representation. to to add a new query language, it would just be necessary to add a new class that would implement the interface queryconverter shown in figure 5, but the architecture of component mediator remains the same. one task of the mediator component is to return records in the format that was defined by the client that sent the request. information technology and libraries | march 2013 62 figure 5. 
uml class diagram of mediator component as the mediator communicates with the z39.50 and sru server side, the task of the z39.50 and sru server side will be to check whether the format that the client requires is supported by the underlying system. if it is not supported, the request is not sent to mediator. otherwise, the mediator ensures the transformation of retrieved records into the chosen format. the mediator obtains bibliographic records from the wrapper in the form of an xml document that is valid according to the appropriate xml schema.18 the xml schema allows the creation of an xml document describing bibliographic records according to the unimarc19 or marc2120 format. the current implementation of the mediator component supports transformation of bibliographic records into an xml document that can be an instance of the unimarcslim xml schema,21 the marc21slim xml schema,22 or the dublin core xml schema.23 adding support for a new format would require creating a new class that would extend the class recordserializer (figure 5). because this mediator component works with xml, the transformation of bibliographic records into a new format also could be done by using exstensible stylesheet language transformations (xslt). 0..11..1 0..1 1..* 0..1 0..1 mediatorservice + getrecords (object query, string format) : string[] wrapper + executequery (cqlnode cqlquery) : string[] cqlstringconverter + parsequery (object query) : cqlnode rpnconverter + parsequery (object query) : cqlnode queryconverter + parsequery (object query) : cqlnode marc21serializer + serialize (string r) : sting dublincoreserializer + serialize (string r) : sting unimarcserializer + serialize (string r) : sting recordserialize + serialize (string r) : sting information retrieval using a middleware approach | krstićev 63 wrapper software component the wrapper software component is responsible for ensuring communication between the mediator and the existing library system. that is, the wrapper component is responsible for transforming the cql object representation into a concrete query that is supported by the existing library system and for obtaining results that match the query. implementation of the wrapper component directly depends on the architecture of the existing library system. figure 7 proposes a possible architecture of the wrapper component. this proposed architecture assumes that the existing library system provides some kind of service that will be used by the wrapper component to send the query and obtain results. the recordmanager interface in figure 7 is an example of such a service. recordmanager has two operations, one which executes the query and returns the number of hits and the second operation which returns bibliographic records. this proposed solution is useful for libraries that use a library management system that can be extended. it may not be appropriate for libraries using an “off the self” library management system that cannot be extended. the proposed architecture of the wrapper component is based on a strategy design pattern,24 primarily because of the need for transformation of the cql query into a query that is supported by the library system. according to the cql concept of context sets, all prefixes that can be searched are grouped in context sets, and these sets are registered with the library of congress. 
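As an illustration of that internal representation, a CQL query whose indexes come from the Dublin Core context set can be parsed into the cql-java object model in a few lines. This is a sketch assuming the cql-java API (org.z3950.zing.cql.CQLParser); exact method names may differ between versions, and the query itself is only an example.

    import org.z3950.zing.cql.CQLNode;
    import org.z3950.zing.cql.CQLParser;

    // Parses a CQL query into the object model the mediator uses internally.
    // dc.title and dc.creator are indexes from the Dublin Core context set.
    public class CqlParseSketch {
        public static void main(String[] args) throws Exception {
            String query = "dc.title = semantic and dc.creator = morales";
            CQLNode root = new CQLParser().parse(query);

            // toCQL() re-serializes the parse tree; a wrapper would instead walk
            // the tree and map each index to a field of the underlying system.
            System.out.println(root.toCQL());
        }
    }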
the concept of context sets enables specific communities and users to define their own prefixes, relations, and modifiers without fear that their name will be identical to the name of prefix defined in another set. that is, it is possible to define two prefixes with the same name, but they belong to different sets and therefore have different semantics. cql offers the possibility of combining in a single query elements that are defined in different context sets. when parsing a query, it is necessary to check which context set a particular item belongs to and then to apply appropriate mapping of the element from the context set to the corresponding element defined by the query language used in the library system. the strategy design pattern includes patterns that describe the behavior of objects (behavioral patterns), which determine the responsibility of each object and the way in which objects communicate with each other. the main task of a strategy pattern is to enable easy adjustment of the algorithm that is applied by an object at runtime. strategy pattern defines a family of algorithms, each of which is encapsulated in a single object. figure 6 is shows a class diagram from the book “design patterns: elements of reusable object-oriented software,“25 which describes basic elements of strategy patterns. information technology and libraries | march 2013 64 figure 6. strategy design pattern the basic elements of this pattern are the classes context, strategy, concretestrategya and concretestrategyb. the class context is in charge of choosing and changing algorithms in a way that creates an instance of the appropriate class, which implements the interface strategy. interface strategy contains the method algorityinterface(), which should implement all classes that implement that interface. class concretestrategya implements one concrete algorithm. this design pattern is used when transforming cql queries primarily because cql queries can consist of elements that belong to different context sets, whose elements are interpreted differently. classes context, strategy, cqlstrategy and dcstrategy, shown in figure 7, are elements of strategy pattern responsible for mapping concepts defined by cql. the class context is responsible for selection of appropriate strategies for parsing, depending on which context set the element that is going to be transformed belongs to. class cqlstrategy and dcstrategy are responsible for mapping the elements belonging respectively to the cql or dublin core context set in the appropriate elements of a particular query language used by the library system. the use of strategy pattern makes it possible, in real time, to change the algorithm that will parse the query depending on what context set is used. the described implementation of a wrapper component enables the parsing of queries that contain only elements that belong to cql and/or the dublin core context set. in order to provide support for a new context set, a new implementation of interface strategy (figure 7) would be required, including an algorithm to parse the elements defined by this new set. information retrieval using a middleware approach | krstićev 65 figure 7. uml class diagram of wrapper component integration of intermediary software components into the bisis library system the bisis library system was developed at the faculty of science and the faculty of technical sciences in novi sad, serbia, and has had several versions since its introduction in 1993. 
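Before turning to the details of the BISIS integration, the context-set strategy just described can be reduced to a small sketch. The interface and class names are adapted from the wrapper's class diagram (Strategy, CQLStrategy, DCStrategy, a mapIndexToUnderlyingPrefix operation); the concrete field names ti, au, and sb are the BISIS Lucene fields given in the mapping example below, and everything else is illustrative.

    import java.util.HashMap;
    import java.util.Map;

    // Strategy interface: one implementation per CQL context set.
    interface Strategy {
        // Maps a CQL index (e.g. "dc.title") to a field of the underlying system.
        String mapIndexToUnderlyingPrefix(String index);
    }

    // Maps indexes of the Dublin Core context set to BISIS Lucene fields.
    class DCStrategy implements Strategy {
        private static final Map<String, String> FIELDS = new HashMap<>();
        static {
            FIELDS.put("dc.title", "ti");     // title   -> ti
            FIELDS.put("dc.creator", "au");   // creator -> au
            FIELDS.put("dc.subject", "sb");   // subject -> sb
        }
        public String mapIndexToUnderlyingPrefix(String index) {
            return FIELDS.get(index.toLowerCase());
        }
    }

    // Maps indexes of the default CQL context set; the catch-all field is illustrative.
    class CQLStrategy implements Strategy {
        public String mapIndexToUnderlyingPrefix(String index) {
            return "cql.anywhere".equalsIgnoreCase(index) ? "all" : null;
        }
    }

    // Context: selects the appropriate strategy at runtime from the index prefix.
    class Context {
        private Strategy strategy;

        void setStrategy(String contextSet) {
            strategy = "dc".equalsIgnoreCase(contextSet) ? new DCStrategy() : new CQLStrategy();
        }

        String mapIndexToUnderlyingPrefix(String index) {
            // the part of the index before the dot names the context set
            setStrategy(index.contains(".") ? index.substring(0, index.indexOf('.')) : "cql");
            return strategy.mapIndexToUnderlyingPrefix(index);
        }
    }

Supporting a new context set then comes down to adding one more Strategy implementation, exactly as the article notes; the real classes also carry a parseOperand operation that builds the actual Lucene operand from the mapped field.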
the fourth and current version of the system is based on xml technologies. among the core functional units of bisis26 are: • circulation of library material • cataloging of bibliographic records • indexing and retrieval of bibliographic records • downloading bibliographic records through z39.50 protocol • creation of a card catalog • creation of statistical reports an intermediary software component has been integrated into the bisis system. the intermediary component was written in the java programming language and implemented as a web application. communication between server applications that support the z39.50 and sru protocols and the intermediary component is done using the software package hessian.27 hessian offers a simple implementation of two protocols to communicate with web services, a binary protocol and its corresponding xml protocol, both of which rely on http. use of hessian package makes it easy to create a java servlet on the server side and proxy object on client-side, which will be used to 0..1 1..1 0..11..1 0..1 1..1 context + + + setstrategy (string strategy) mapindext ounderlayingprefix (string index) parseoperand (string index, cqlt ermnode node) : void : string : object strategy + + mapindext ounderlayingprefix (string index) parseoperand (string underlayingpref, cqlt ermnode node) : string : object cqlstrategy + + mapindext ounderlayingprefix (string index) parseoperand (string underlayingpref, cqlt ermnode node) : string : object dcstrategy + + mapindext ounderlayingprefix (string index) parseoperand (string underlayingpref, cqlt ermnode node) : string : object recordmanager + + select (object query) getrecords (int hits[]) : int[] : string[] wrapper + executequery (cqlnode cqlquery) makequery (cqlnode cql, object underlayingquery) : string[] : object information technology and libraries | march 2013 66 communicate with the servlet. in this case, the proxy object is deployed on the server side of protocol and the intermediary component contains a servlet. communication between the intermediary and bisis is also realized using the hessian software package, which leads to the possibility of creating a distributed system because the existing library system, the intermediary component, and server applications that implement the protocols can be located on physically separate computers. the bisis library system uses the lucene software package for indexing and searching. lucene has defined its own query language,29 so the wrapper component that is integrated into bisis has to transform to the cql query object model the object representation of the query defined by lucene. therefore the wrapper first needs to determine to which context set the index belongs and then apply the appropriate strategy for mapping the index. the rules for mapping the index to lucene fields are read from the corresponding xml document that is defined for every context set. listing 1 below provides an example of an xml document that contains some rules for mapping indexes of the dublin core context set to lucene index fields. the xml element index represents the name of index which is going to be mapped, while the xml element mappingelement contains the name of lucene field. for example, the title index defined in the dublincore context set, which denotes search by title of the publication, is mapped to the field ti, which is used by the search engine of bisis system. title ti creator au subject sb listing 1. 
xml document with rules for mapping the dublincore context set after the index is mapped to corresponding fields in lucene, a similar procedure is repeated for a relationship that may belong to some other context set or may have modifiers that belong to some information retrieval using a middleware approach | krstićev 67 other context set. it is therefore necessary to change the current strategy for mapping into a new one. by doing this, all elements of the cql query are converted into a lucene query, so the new query can be sent to bisis to be executed. approximately 40 libraries in serbia currently use the bisis system, which includes a z39.50 client, allowing the libraries to search the collections of other libraries that support communication through the z39.50 protocol. by integrating the intermediary component in the bisis system, non-bisis libraries may now search the collections of libraries that use bisis. as a first step, the intermediary component was just integrated in a few libraries, without any major problems. the component is most useful to the city libraries that use system bisis, because they have many branches, which can now search and retrieve bibliographic records from their central libraries. the component could potentially be used by other library management system, assuming the presence of an appropriate wrapper component to transform cql to the target query language. conclusion this paper describes an independent, modular software component that enables the integration of a service for search and retrieval of bibliographic records into an existing library system. the software component provides a single interface to server-side protocols to search and retrieve records, and could be extended to support additional server-side protocols. the paper describes the communication of this component with z39.50 and sru servers. the software component was developed for integration with the bisis library system, but is an independent component that could be integrated in any other library system. the proposed architecture of the software component is also suitable for inclusion of the existing library system into a single portal. the architecture of the portal should involve one mediator component whose task would be to communicate with wrapper components of individual library systems. each library system would implement its own search and store functionalities and could function independently of the portal. the basic advantage of this architecture is that it is possible to include new library systems that provide search services. it is only necessary to add a new wrapper that will perform the appropriate transformation of the query obtained from the mediator component in a query that the library system can process. the task of the mediator is to send queries to the wrapper, while each wrapper can establish communication with a specific library system. after obtaining the results from underlying library system, the mediator should be able to combine results, remove duplicate, and sort results. in this way end user would have impression that he has been searched a single database. references 1. “information retrieval (z39.50): application service definition and protocol specification,” http://www.loc.gov/z3950/agency/z39-50-2003.pdf (accessed february 22, 2013). http://www.loc.gov/z3950/agency/z39-50-2003.pdf information technology and libraries | march 2013 68 2. “search/retrieval via url,” http://www.loc.gov/standards/sru/. 3. 
“contextual query language – cql,” http://www.loc.gov/standards/sru/specs/cql.html. 4. eric lease morgan, "an introduction to the search/retrieve url service (sru),” ariadne 40 (2004), http://www.ariadne.ac.uk/issue40/morgan. 5. larry e. dixson, "yaz proxy installation to enhance z39.50 server performance,” library hi tech 27, no. 2 (2009): 277-285, http://dx.doi.org/10.1108/07378830910968227; mike taylor and adam dickmeiss, “delivering marc/xml records from the library of congress catalogue using the open protocols srw/u and z39.50,” (paper presented at world library and information congress: 71st ifla general conference and council, oslo, 2005). 6. mike taylor and adam dickmeiss,“delivering marc/xml records from the library of congress catalogue using the open protocols srw/u and z39.50,” (paper presented at world library and information congress: 71st ifla general conference and council, oslo, 2005). 7. “voyager integrated library system,” http://www.exlibrisgroup.com/category/voyager. 8. “indexdata,” http://www.indexdata.com/. 9. “yazproxy,” http://www.indexdata.com/yazproxy. 10. theo van veen and bill oldroyd, “search and retrieval in the european library,” d-lib magazine 10, no. 2 (2004), http://www.dlib.org/dlib/february04/vanveen/02vanveen.html.. 11. “тhe european library,” http://www.theeuropeanlibrary.org./tel4/. 12. gio wiederhold ,“mediators in the architecture of future information systems,” computer 25, no. 3 (1992): 38-49, http://dx.doi.org/10.1109/2/121508. 13. enrico coiera, martin walther, ken nguyen, and nigel h. lovell, “architecture for knowledgebased and federated search of online clinical evidence,” journal of medical internet research 7, no. 5 (2005), http://www.jmir.org/2005/5/e52/. 14. shirley cousins and ashley sanders, “incorporating a virtual union catalogue into the wider information environment through the application of middleware: interoperability issues in crossdatabase access,” journal of documentation 62, no. 1 (2006): 120-144, http://dx.doi.org/10.1108/00220410610642084. 15. “sru software and tools,” http://www.loc.gov/standards/sru/resources/tools.html; “z39.50 registry of implementators,” http://www.loc.gov/z3950/agency/register/entries.html. 16. “jafer toolkit project,” http://www.jafer.org. 17. “cql-java: a free cql compiler for java,” http://zing/z3950.org/cql/java/. http://www.loc.gov/standards/sru/ http://www.loc.gov/standards/sru/specs/cql.html http://www.ariadne.ac.uk/issue40/morgan http://dx.doi.org/10.1108/07378830910968227 http://www.exlibrisgroup.com/category/voyager http://www.indexdata.com/ http://www.indexdata.com/yazproxy http://www.dlib.org/dlib/february04/vanveen/02vanveen.html http://www.theeuropeanlibrary.org./tel4/ http://dx.doi.org/10.1109/2/121508 http://www.jmir.org/2005/5/e52/ http://dx.doi.org/10.1108/00220410610642084 http://www.loc.gov/standards/sru/resources/tools.html http://www.loc.gov/z3950/agency/register/entries.html http://www.jafer.org/ http://zing/z3950.org/cql/java/ information retrieval using a middleware approach | krstićev 69 18. bojana dimić, branko milosavljević and dušan surla,“xml schema for unimarc and marc 21 formats,” the electronic library 28, no. 2 (2010): 245-262, http://dx.doi.org/10.1108/02640471011033611. 19. “unimarc formats and related documentation,” http://www.ifla.org/en/publications/unimarcformats-and-related-documentation. 20. “marc 21 format for bibliographic data,” http://www.loc.gov/marc/bibliographic/. 21. 
“unimarcslim xml schema,” http://www.bncf.firenze.sbn.it/progetti/unimarc/slim/documentation/unimarcslim.xsd. 22. “marc21slim xml schema,” http://www.loc.gov/standards/marcxml/schema/marc21slim.xsd. 23. “dublincore xml schema,” http://www.loc.gov/standards/sru/resources/dc-schema.xsd. 24. erich gamma, richard helm, ralph johnson, and john vlissides, design patterns: elements of reusable object-oriented software (indianapolis: addison–wesley, 1994), 315-323. 25. ibid. 26. danijela boberić and branko milosavljević, “generating library material reports in software system bisis,” (proceedings of the 4th international conference on engineering technologies icet, novi sad, 2009); danijela boberić and dušan surla, “xml editor for search and retrieval of bibliographic records in the z39.50 standard”, the electronic library 27, no. 3 (2009): 474-495, http://dx.doi.org/10.1108/02640470910966916 (accessed february 22, 1013); bojana dimić and dušan surla, “xml editor for unimarc and marc21 cataloguing,” the electronic library 27, no. 3 (2009): 509-528, http://dx.doi.org/10.1108/02640470910966934 (accessed february 22, 2013); jelena rađenović, branko milosavljеvić and dušan surla, “modelling and implementation of catalogue cards using freemarker,” program: electronic library and information systems 43, no. 1 (2009): 63-76, http://dx.doi.org/10.1108/00330330934110 (accessed february 22, 2013); danijela tešendić, branko milosavljević and dušan surla, “a library circulation system for city and special libraries”, the electronic library 27, no. 1 (2009): 162-186, http://dx.doi.org/10.1108/02640470910934669. 27. “hessian,” http://hessian.caucho.com/doc/hessian-overview.xtp. 28. branko milosavljević, danijela boberić, and dušan surla, “retrieval of bibliographic records using apache lucene,” the electronic library 28, no. 4 (2010): 525-539, http://dx.doi.org/10.1108/02640471011065355. acknowledgement the work is partially supported by the ministry of education and science of the republic of serbia, through project no. 174023: "intelligent techniques and their integration into wide-spectrum decision support." http://dx.doi.org/10.1108/02640471011033611 http://www.ifla.org/en/publications/unimarc-formats-and-related-documentation http://www.ifla.org/en/publications/unimarc-formats-and-related-documentation http://www.loc.gov/marc/bibliographic/ http://www.bncf.firenze.sbn.it/progetti/unimarc/slim/documentation/unimarcslim.xsd http://www.loc.gov/standards/marcxml/schema/marc21slim.xsd http://www.loc.gov/standards/sru/resources/dc-schema.xsd http://dx.doi.org/10.1108/02640470910966916 http://dx.doi.org/10.1108/02640470910966934 http://dx.doi.org/10.1108/00330330934110 http://dx.doi.org/10.1108/02640470910934669 http://hessian.caucho.com/doc/hessian-overview.xtp http://dx.doi.org/10.1108/02640471011065355 abstract smartphones: a potential discovery tool | starkweather and stoward 187 smartphones: a potential discovery tool wendy starkweather and eva stowers the anticipated wide adoption of smartphones by researchers is viewed by the authors as a basis for developing mobile-based services. in response to the unlv libraries’ strategic plan’s focus on experimentation and outreach, the authors investigate the current and potential role of smartphones as a valuable discovery tool for library users. 
w hen the dean of libraries announced a discovery mini-conference at the university of nevada las vegas libraries to be held in spring 2009, we saw the opportunity to investigate the potential use of smartphones as a means of getting information and services to students. being enthusiastic users of apple’s iphone, we and the web technical support manager, developed a presentation highlighting the iphone’s potential value in an academic library setting. because wendy is unlv libraries’ director of user services, she was interested in the applicability of smartphones as a tool for users to more easily discover the libraries’ resources and services. eva, as the health sciences librarian, was aware of a long tradition of pda use by medical professionals. indeed, first-year bachelor of science nursing students are required to purchase a pda bundled with select software. together we were drawn to the student-outreach possibilities inherent in new smartphone applications such as twitter, facebook, and myspace. n presentation our brief review of the news and literature about mobile phones in general provided some interesting findings and served as a backdrop for our presentation: n a total of 77 percent of internet experts agreed that the mobile phone would be “the primary connection tool” for most people in the world by 2020.1 the number of smartphone users is expected to top 100 million by 2013. there are currently 25 million smartphone users, with sales in north america having grown 69 percent in 2008.2 n smartphones offer a combination of technologies, including gps tracking, digital cameras, and digital music, as well as more than fifty-thousand specialized apps for the iphone and new ones being designed for the blackberry and the palm pre.3 the palm pre offered less than twenty applications at its launch, but one million apllication downloads had been performed by june 24, 2009, less than a month after launch.4 n the 2009 horizon report predicts that the time to adoption of these mobile devices in the educational context will be “one year or less.”5 data gathered from campus users also was presented, providing another context. in march 2009, a survey of university of california, davis (uc-davis) students showed that 43 percent owned a smartphone.6 uc-davis is participating in apple’s university education forum. here at unlv, 37 percent of students and 26 percent of faculty and staff own a smartphone.7 the presentation itself highlighted the mobile applications that were being developed in several libraries to enhance student research, provide library instruction, and promote library services. two examples were abilene christian university (http://www.acu.edu/technology/ mobilelearning/index.html), which in fall 2008 distributed iphones and ipod touches to the incoming freshman class; and stanford university (http://www.stanford .edu/services/wirelessdevice/iphone/) which participates in “itunes u” (http://itunes.stanford.edu/). if the libraries were to move forward with smartphone technologies, it would be following the lead of such universities. readers also may be interested in joan lippincott’s recent concise summary of the implications of mobile technologies for academic libraries as well as the chapter on library mobile initiatives in the july 2008 library technology report.8 n goals: a balancing act ultimately the goal for many of these efforts is to be where the users are. 
this aspiration is spelled out in unlv libraries’ new strategic plan relating to infrastructure evolution, namely, “work towards an interface and system architecture that incorporates our resources, internal and external, and allows the user to access from their preferred starting point.”9 while such a goal is laudable and fits very well into the discovery emphasis of the mini-conference presentation, we are well aware of the need for further investigation before proceeding directly to full-scale development of a complete suite of mobile services for our users. of critical importance is ascertaining where our users are and determining whether they want us to be there and in what capacity. the value of this effort is demonstrated in booth’s research report on student interest in emerging technologies at ohio state university. the report includes the results of an extensive environmental survey of their wendy starkweather (wendy.starkweather@unlv.edu) is director, user services division, and eva stowers (eva.stowers @unlv.edu) is medical/health sciences librarian at the university of nevada las vegas libraries. 188 information technology and libraries | december 2009 library users. the study is part of ohio state’s effort to actualize their culture of assessment and continuous learning and to use “extant local knowledge of user populations and library goals” to inform “homegrown studies to illuminate contextual nuance and character, customization that can be difficult to achieve when using externally developed survey instruments.”10 unlv libraries are attempting to balance early experimentation and more extensive data-driven decision-making. the recently adopted strategic plan includes specific directions associated with both efforts. for experimentation, the direction states, “encourage staff to experiment with, explore, and share innovative and creative applications of technology.”11 to that end, we have begun working with our colleagues to introduce easy, small-scale efforts designed to test the waters of mobile technology use through small pilot projects. “text-a-librarian” has been added to our existing group of virtual reference service, and we introduced a “text the call number and record” service to our library’s opac in july 2009. unlv libraries’ strategic plan helps foster the healthy balance by directing library staff to “emphasize data collection and other evidence based approaches needed to assess efficiency and effectiveness of multiple modes and formats of access/ownership” and “collaborate to educate faculty and others regarding ways to incorporate library collections and services into education experiences for students.”12 action items associated with these directions will help the libraries learn and apply information specific to their users as the libraries further adopt and integrate mobile technologies into their services. as we begin our planning in earnest, we look forward to our own set of valuable discoveries. references 1. janna anderson and lee rainie, the future of the internet iii, pew internet & american life project, http://www.pewinternet .org/~/media//files/reports/2008/pip_futureinternet3.pdf .pdf (accessed july 20, 2009). 2. sam churchill, “smartphone users: 110m by 2013,” blog entry, mar. 24, 2009, dailywireless.org, http://www.daily wireless.org/2009/03/24/smartphone-users-100m-by-2013 (accessed july 20, 2009). 3. 
mg siegler, “state of the iphone ecosystem: 40 million devices and 50,000 apps,” blog entry, june 8, 2009, tech crunch, http://www.techcrunch.com/2009/06/08/40-million-iphones -and-ipod-touches-and-50000-apps (accessed july 20, 2009). 4. jenna wortham, “palm app catalog hits a million downloads,” blog entry, june 24, 2009, new york times technology, http://bits.blogs.nytimes.com/2009/06/24/palm-app-cataloghits-a-million-downloads (accessed july 20, 2009). 5. larry johnson, alan levine, and rachel smith, horizon report, 2009 edition (austin, tex.: the new media consortium, 2009), http://www.nmc.org/pdf/2009-horizon-report.pdf (accessed july 20, 2009). 6. university of california, davis. “more than 40% of campus students own smartphones, yearly tech survey says,” technews, http://technews.ucdavis.edu/news2.cfm?id=1752 (accessed july 20, 2009). 7. university of nevada las vegas, office of information technology, “student technology survey report: 2008– 2009,” http://oit.unlv.edu/sites/default/files/survey/survey results2008_students3_27_09.pdf (accessed july 20, 2009). 8. joan lippincott, “mobile technologies, mobile users: implications for academic libraries,” arl bi-monthly report 261 (dec. 2008), http://www.arl.org/bm~doc/arl-br-261-mobile .pdf. (accessed july 20, 2009); ellyssa kroski, “library mobile initiatives,” library technology reports 44, no. 5 (july 2008): 33–38. 9. “unlv libraries strategic plan 2009–2011,” http://www .library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 20, 2009): 2. 10. char booth, informing innovation: tracking student interest in emerging library technologies at ohio university (chicago: association of college and research libraries, 2009), http:// www.ala.org/ala/mgrps/divs/acrl/publications/digital/ ii-booth.pdf (accessed july 20, 2009); “unlv libraries strategic plan 2009–2011,” 6. 11. “unlv libraries strategic plan 2009–2011,” 2. 12. ibid. manzari user-centered design of a web site | manzari and trinidad-christensen 163 this study describes the life cycle of a library web site created with a user-centered design process to serve a graduate school of library and information science (lis). findings based on a heuristic evaluation and usability study were applied in an iterative redesign of the site to better serve the needs of this special academic library population. recommendations for design of web-based services for library patrons from lis programs are discussed, as well as implications for web sites for special libraries within larger academic library settings. u ser-centered design principles were applied to the creation of a web site for the library and information science (lis) library at the c. w. post campus of long island university. this web site was designed for use by master’s degree and doctoral students in the palmer school of library and information science. the prototype was subjected to a usability study consisting of a heuristic evaluation and usability testing. the results were employed in an iterative redesign of the web site to better accommodate users’ needs. this was the first usability study of a web site at the c. w. post library. human-computer interaction, the study of the interaction of human performance with computers, imposes a rigorous methodology on the process of user-interface design. more than an intuitive determination of userfriendliness, a successful interactive product is developed by careful design, testing, and redesign based on the testing outcomes. 
testing the product several times as it is being developed, or iterative testing, allows the users’ needs to be incorporated into the design. the interface should be designed for a specific community of users and set of tasks to be accomplished, with the goal of creating a consistent, usable product. the lis library had a web site that was simply a description of the collection and did not provide access to online specialized resources. a new web site was designed for the lis library by the incoming lis librarian who made a determination of what content might be useful for lis students and faculty. the goal was to have such content readily accessible in a web site separate from the main library web site. the web site for the lis library includes: ฀ access to all online databases and journals related to lis; ฀ a general overview of the lis library and its resources as well as contact information, hours, and staff; ฀ a list of all print and online lis library journal subscriptions, grouped by both title and subject, with links to access the online journals; ฀ links to other web sites in the lis field; ฀ links to other university web pages, including the main library’s home page, library catalog, and instructions for remote database access, as well as to the lis school web site; ฀ a link to jake (jointly administered knowledge environment), a project by yale university that allows users to search for periodical titles within online databases, since the library did not have this type of access through its own software. this information was arranged in four top-level pages with sublevels. design considerations included making the site both easy to learn and efficient once users were familiar with it. since classes are taught at four locations in the metropolitan area, the site needed to be flexible enough to serve students at the c. w. post campus library as well as remotely. the layout of the information was designed to make the web site uncluttered and attractive. different color schemes were tried and informally polled among users. a version with white text on black background prompted strong likes or dislikes when shown to users. although this combination is easy to read, it was rejected because of the strong negative reactions from several users. photographs of the lis library and students were included. the pages were designed with a menu on the left side; fly-out menus were used to access submenus. where main library pages already existed for information to be included in the lis web site, such as lis hours and staff, links to those pages were made instead of re-creating the information in the lis web site. an attempt was made to render the site accessible to users with disabilities, and pages were made compliant with the world wide web consortium (w3c) by using their html validator and their cascading style sheet validator.1 ฀ literature review usability is a term with many definitions, varying by field.2 the fields of industrial engineering, product research and development, computer systems, and library science all share the study of human-and-machine interaction, as well user-centered design of a web site for library and information science students: heuristic evaluation and usability testing laura manzari and jeremiah trinidad-christensen laura manzari (manzari@liu.edu) is an associate professor and library and information science librarian at the c. w. post campus of long island university, brookville, n.y. 
jeremiah trinidad-christensen (jt2118@columbia.edu) is a gis/map librarian at columbia university, new york, n.y. 164 information technology and libraries | september 2006 as a commitment to users. dumas and reddish explain it simply: “usability means that the people who use the product can do so quickly and easily to accomplish their own tasks.”3 user-centered design incorporates usability principles into product design and places the focus on the user during project development. gould and lewis cite three principles of user-centered design: an early focus on users and tasks, empirical measurement of product usage, and iterative design to include user input into product design and modification.4 jakob nielsen, an often-cited usability engineering specialist, emphasizes that for increased functionality, engineering usability principles should apply to web design, which should be treated as a software development project. he advocates incorporating user evaluation into the design process first through a heuristic evaluation, followed by usability testing with a redesign of the product after each phase of evaluation.5 usability principles have been applied to library web-site design; however, library web-site usability studies often do not include the additional heuristic evaluation recommended by nielsen.6 in addition to usability, consideration should also be given during the design process to making the web site accessible to people with disabilities. federal agencies are now required by the rehabilitation act to make their web sites accessible to the disabled. section 508 part 1194.22 of the act enumerates sixteen rules for internet applications to help ensure web-site access for people with various disabilities.7 similarly, the web accessibility initiative hosted by the w3c works to ensure that accessibility practices are considered in web-site design. they developed the web content accessibility guidelines for making web sites accessible to people with disabilities.8 although articles have been written about usability testing of academic library web sites, very little has been written about usability testing of special-collection web sites for distinct user populations within larger academic settings.9 ฀ heuristic evaluation methodology heuristic evaluation is a usability engineering method in which a small set of expert evaluators examine a user interface for design problems by judging its compliance with a set of recognized usability principles or heuristics. nielsen developed a set of ten widely adopted usability heuristics (see sidebar). after studying the use of individual evaluators as well as groups of varying sizes, nielsen and molich recommend using three to five evaluators for a heuristic evaluation.10 the use of multiple experts will catch more flaws than a single expert, but using more than five experts does not produce greater results. in comparisons of heuristic evaluation and usability testing, the heuristic evaluation uncovered more of the minor problems while usability testing uncovered more major, global problems.11 since each method tends to uncover different usability problems, it is recommended that both methods be used complementarily, particularly with an iterative design change between the heuristic evaluation and the usability testing. for the heuristic evaluation, four people were approached from the palmer lis school faculty and ph.d. program with expertise in web-site design and humancomputer interaction. three agreed to participate. 
they were asked to familiarize themselves with the web site and evaluate it according to nielsen’s ten heuristics, which were provided to them. ฀ heuristic evaluation results the evaluators were all in agreement that the language was appropriate for lis students. one evaluator said if new students were not familiar with some of the terms they soon would be. another thought jake, the tool to access full text, might not be clear to students at first, but the lis web-site explanation was fine the way it was. they were also in agreement that the web site was well designed. comments included: “the purpose and description of each page is short and to the point, and there is a good, clean, viewable page for the users”; “the site was well designed and not over designed”; “very clear and user friendly”; “excellent example of limiting unnecessary irrelevant information.” the only page to receive a “poor layout” comment was the lengthy subject list of journals, though no suggestions for improvement were made. concern was expressed about links to other web sites on campus. one evaluator thought new students might be confused about the relationship between long island university, c. w. post, and the palmer school. two evaluators thought links to the main library’s web site could cause confusion because of the different design and layout. a preference for the design of the lis library web site over the main library and palmer school web sites was expressed. to eliminate some confusion, the menu options for other campus web sites were dropped down to a separate menu right below the menu of lis web pages. for additional clarity, some of the main library pages were re-created in the style of the lis pages instead of linking to the original page. the evaluators made several concrete suggestions for menu changes, which were included in the redesign. it was suggested that several menu options were unclear and needed clarification, so additional text was added for clarity at the expense of brevity. long island university’s online catalog is named liucat and was listed that way on the menu. new students might not be familiar with this name, so the menu label was changed to liucat (library catalog). user-centered design of a web site | manzari and trinidad-christensen 165 for the link to jake, a description, find periodicals in online databases, was added for clarification. it was also suggested that the link to the main library web page for all databases could cause confusion since the layout and design of that page is different. the wording was changed to all databases (located in the c. w. post library web site). menu options were originally arranged in order of anticipated use (see figure 1). thus, the order of menu options from the lis home page was databases, journals, library catalog, other web sites, palmer school, and main library. evaluators suggested that putting the option for lis home page first would give users an easy “emergency exit” to return to the home page if they were lost. the original menu options also varied from page to page. for example, menu options on the database page referred only to pages that users might need while doing database searches. at the suggestion of evaluators, the menu options were changed to be consistent on every page (see figure 2). a redesign based on these results was completed and posted to the internet for public use (see figure 3). ฀ usability testing methodology usability testing is an empirical method for improving design. 
test subjects are gathered from the population who will use the product and are asked to perform real tasks using the prototype while their performance and reactions to the product are observed and recorded by an interviewer. this observation and recording of behavior distinguishes usability testing from focus groups. observation allows the tester to see when and where users become frustrated or confused. the goal is to jakob nielsen’s usability heuristics visibility of system status—the system should always keep users informed about what is going on, through appropriate feedback within reasonable time. match between system and the real world— the system should speak the user’s language, with words, phrases, and concepts familiar to the user rather than system-oriented terms. follow real-world conventions, making information appear in a natural and logical order. user control and freedom—users often choose system functions by mistake and will need a clearly marked “emergency exit” to leave the unwanted state without having to go through an extended dialogue. support undo and redo. consistency and standards—users should not have to wonder whether different words, situations, or actions mean the same thing. follow platform conventions. error prevention—even better than good error messages is a careful design that prevents problems from occurring in the first place. recognition rather than recall—make objects, actions, and options visible. the user should not have to remember information from one part of the dialogue to another. instructions for use of the system should be visible or easily retrievable whenever appropriate. flexibility and efficiency of use—accelerators, unseen by the novice user, may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. allow users to tailor frequent actions. aesthetic and minimalist design—dialogues should not contain information that is irrelevant or rarely needed. every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility. help users recognize, diagnose, and recover from errors—error messages should be expressed in plain language (no codes), precisely indicate the problems, and constructively suggest a solution. help and documentation—even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. any such information would be easy to search, focused on the user’s task, list concrete steps to be carried out, and not be too large.12 figure 1. original menu figure 2. revised menu 166 information technology and libraries | september 2006 uncover usability problems with the product, not to test the participants themselves. the data gathered are then analyzed to recommend changes to fix usability problems. in addition to recording empirical data such as number of errors made or time taken to complete tasks, active intervention allows the interviewer to question participants about reasons for their actions as well as about their opinions regarding the product. in fact, subjects are asked to verbalize their thought processes as they complete the tasks using the interface. test subjects are usually interviewed individually and are all given the same pretest briefing from a script with a list of instructions followed by tasks representing actual use. test subjects are also asked questions about their likes and dislikes. 
in most situations, payment or other incentives are offered to help recruit subjects. four or five subjects will reveal 80 percent of usability problems.13 messages were sent to students via the palmer school’s mailing lists requesting volunteers. a ten-dollar gift certificate to a bookstore was offered as an inducement to recruitment. input was desired from both master’s degree and doctoral students. the first nine volunteers to respond—all master’s degree students—were accepted. this group included students from both the main and satellite campuses. no ph.d. students volunteered to participate at first, citing busy schedules, but eventually a doctoral student was recruited. testing was conducted in computer labs at the library, at the palmer school, and at the manhattan satellite campus. demographic information was gathered regarding users’ gender, age range, university status, familiarity with computers, with the internet, and with the lis library, as well as the type of internet connection and browser usually used. the subjects were given eight tasks to complete using the web site. the tasks reflected both the type of assignment a student might receive in class and the type of information they might seek on the lis web site on their own. the questions were designed to test usability of different parts of the web site. ฀ ฀usability testing results the first task tested the print journals page and asked if the lis library subscribes to a specific journal and whether it is refereed. (the web site uses an asterisk next to a journal title to indicate that it is refereed.) all subjects were able to easily find that the lis library does hold the journal title. although it was not initially obvious that the asterisk was a notation indicating that the journal was refereed, most of the subjects eventually found the explanatory note. many of the subjects did not know what a refereed journal was, and some asked if a definition could be provided on the site. for the second task, subjects needed to use jake to find the full text of an article. none of the students were familiar with jake but were able to use the lis web site to gain an understanding of its purpose and to access it. the third task asked subjects to find a library association that required using the other web sites page. all subjects demonstrated an understanding of how to use this page and found the information. the fourth task tested the full-text databases page. only one subject actually used this page to complete the task. the rest used the all databases link to the main library’s database list. that link appears above the link to full-text databases and most subjects chose that link without looking at the next menu option. several subfigure 3. final home page user-centered design of a web site | manzari and trinidad-christensen 167 jects became confused when they were taken to the main library’s page, just as the evaluators had predicted. even though wording was added warning users that they were leaving the lis web site, most subjects did not read it and wondered why the page layout changed and was not as clear. they also had trouble navigating back to the lis web site from the main library web site. the fifth task tested the journals by subject page. this task took longer for most of the subjects to answer, but all were able to use the page successfully to find a journal on a given subject. the sixth task required using the lis home page, and everyone easily used it to find the operating hours. 
the seventh task required subjects to find an online journal title that could be accessed from the electronic journals page. all subjects navigated this page easily. the final task asked subjects to find a book review. most subjects did not look at the page for library and information sciences databases to access the books in print database, saying they did not think it would be included there. instead, they used the link to the main library’s database page. one subject was not able to complete this task. problems primarily occurred during testing when subjects left the lis page to use a non-library science database located on the main web site. subjects had problems getting back to the lis site from the main library site. while performing tasks, some subjects would scroll up and down long lists instead of using the toolbars provided to bring the user to an exact location on the page. some preferred using the back button instead of using the lis web-site menu to navigate. these seemed to be individual styles of using the web and not any usability problem with the site. several people consistently used the menu to return to the lis home page before starting each new task, even though they could have navigated directly to the page they needed, making a return to the home page unnecessary. this validated the recommendation from the heuristic study that the link to the home page always be the first menu option to give users a comfortable safety valve when they get lost. the final questions asked subjects for their opinions on what they did and did not like about the web site, as well as any suggestions for improving the site. all subjects responded that they liked the layout of the pages, calling them uncluttered, clean, attractive, and logical. there were very few suggestions for improving the site. one person asked that contact information be included on the menu options in addition to its location right below the menu on the lis home page. another participant suggested adding class syllabi to the web site each semester, listing required texts along with a link to an online bookstore. some of the novice users asked for explanations of unfamiliar terms such as “refereed journals.” a participant suggested including a search engine instead of using links to navigate the site. this was considered during the initial site design but was not included since the site did not have a large number of pages. however, a search engine may be worth including. the one doctoral student had previously only used the main library’s web page to access databases. originally, he said he did not see the advantage of a site devoted to information science sources for doctoral candidates, since that program is more multidisciplinary. however, after completing the usability study, the student concluded that the lis web site was useful. he suggested that it should be publicized more to doctoral candidates and that it be more prominently highlighted on the main library web site. though the questions asked were about the lis web site, several subjects complained about the layout of the main library web site and suggested that it have better linking to the lis web site to enable it to be accessed more easily. ฀ conclusions iterative testing and user-centered design resulted in a product that testing revealed to be easy to learn and efficient to use, and about which subjects expressed satisfaction. 
based on findings that some students had not even been aware of the existence of the lis web site, greater emphasis is now given to the web site and its features during new student orientations. the biggest problem users had was navigating from the web pages of the main library back to the lis site. it was suggested that the lis site be highlighted more prominently on the main library web site. some users were confused by the different layouts between the sites, but no one expressed a preference for the design used by the main library web site. despite this confusion, subjects overwhelmingly expressed positive feedback about having a specialized library site serving their specific needs. issues regarding web-site design can be problematic for smaller specialized libraries within larger institutions. in this case, some of the problems navigating between the sites could be resolved by changes to the main library site. the design of the lis web site was preferred over the main campus web site by both the heuristic evaluators and the students in the usability test. however, designers of a main library web site might not be receptive to suggestions from a specialized or branch library. although consistency in design would eliminate confusion, requiring the specialcollection’s web site to follow a design set by the main institution could be a loss for users. in this instance, the main site was designed without user input, whereas the specialized library serving a smaller population was able to be more dynamic and responsive to its users. finding an appropriate balance for a site used by students new to the field as well as advanced students is 168 information technology and libraries | september 2006 a challenge. although the students in the study were all experienced computer and web users, their familiarity with basic library concepts varied greatly. a few novice users expressed some confusion as to the difference between journals and index databases. there actually was a description of each of these sources on the site but it was not read. (the subjects barely read any of the site’s text, so it can be difficult to make some points clearer when users want to navigate quickly without reading instructions. several subjects who did not bother to read text on the site still suggested having more notes to explain unfamiliar terms. however, if the site becomes too overloaded with explanations of library concepts, it could become annoying for more advanced users.) a separate page with a glossary is a possibility—based on the study, however, it will probably not be read. another possibility is a handout for students that could have more text for new users without cluttering the web site. having such a handout would also serve to publicize the site. there was some concern prior to the study that offering more advanced features, such as providing access to jake or indicating which journals are refereed, might be off-putting for new students; therefore, test questions were designed to gauge reactions to these features. most students in the study did express some intimidation at not being familiar with these concepts. however, all the subjects eventually figured out how to use jake and, once they tried it, thought it was a good idea to include it. even new students who had the most difficulty were still able to navigate and learn from the site to be able to use it efficiently. an online survey was added to the final design to allow continuous user input. 
the site consistently receives positive feedback through these surveys. it was planned that responses could be used to continually assess the site and ensure that it is kept responsive and up-to-date; however specific suggestions have not yet been forthcoming. how valuable was usability testing to the web-site design? several good suggestions were made and implemented, and the process confirmed that the site was well designed. it provided some insight into how subjects used the web site that had not been anticipated by the designers. since usability studies are fairly easy and inexpensive to conduct, it is probably a step worth taking during the web-site design process even if it results in only minor changes to the design. references and notes 1. w3c, “the w3c markup validation service,” validator .w3.org (accessed nov. 1, 2005); w3c, “the w3c css validation service,” jigsaw.w3.org/css-validator (accessed nov. 1, 2005). 2. see carol m. barnum, usability testing and research (new york: longman international, 2002); alison j. head, “web redemption and the promise of usability,” online 23, no. 6 (1999): 20–29; international standards organization, ergonomic requirements for office work with visual display terminals. part 11: guidance on usability—iso 9241-11 (geneva: international organization for standardization, 1998); judy jeng, “what is usability in the context of the digital library and how can it be measured?” information technology and libraries 24, no. 2 (2005): 47–52; jakob nielsen, usability engineering (boston: academic, 1993); ruth ann palmquist, “an overview of usability for the study of users’ web-based information retrieval behavior,” journal of education for library and information science 42, no. 2 (2001): 123–36. 3. joseph s. dumas and janice c. redish, a practical guide to usability testing (portland: intellect bks., 1999), 4. 4. john d. gould and clayton h. lewis, “designing for usability: key principles and what designers think,” communications of the acm 28 no. 3 (1985): 300–11. 5. jakob nielsen, “heuristic evaluation,” in jakob nielsen and robert l. mack, eds., usability inspection methods (new york: wiley, 1994), 25–62. 6. see denise t. covey, usage and usability assessment: library practices and concerns (washington, d.c.: digital library federation, 2002); nicole campbell, usability assessment of library-related web sites (chicago: ala, 2001); kristen l. garlock and sherry piontek, designing web interfaces to library services and resources (chicago: ala, 1999); anna noakes schulze, “user-centered design for information professionals,” journal of education for library and information science 42, no. 2 (2001): 116–22; susan m. thompson, “remote observation strategies for usability testing,” information technology and libraries 22, no. 3 (2003): 22–32. 7. government services administration, “section 508: section 508 standards,” www.section508.gov/index.cfm?fuseacti on=content&id=12#web (accessed nov. 1, 2005). 8. w3c, “web content accessibility guidelines 2.0,” www .w3.org/tr/wcag20 (accessed nov. 1, 2005). 9. see susan augustine and courtney greene, “discovering how students search a library web site: a usability case study,” college and research libraries 63, no. 4 (2002): 354–65; brenda battleson, austin booth, and jane weintrop, “usability testing of an academic library web site: a case study,” journal of academic librarianship 27, no. 3 (2001): 188–98; janice krueger, ron l. 
ray, and lorrie knight, “applying web usability techniques to assess student awareness of library web resources,” journal of academic librarianship 30, no. 4 (2004): 285–93; thura mack et al., “designing for experts: how scholars approach an academic library web site,” information technology and libraries 23, no. 1 (2004): 16–22; mark shelstad, “content matters: analysis of a web site redesign,” oclc systems & services 21, no. 3 (2005): 209–25; robert l. tolliver et al., “web site redesign and testing with a usability consultant: lessons learned,” oclc systems & services 21, no. 3 (2005): 156–67; dominique turnbow et al., “usability testing for web redesign: a ucla case study,” oclc systems & services 21, no. 3 (2005): 226–34; leanne m. vandecreek, “usability analysis of northern illinois user-centered design of a web site | manzari and trinidad-christensen 169 university libraries’ web site: a case study,” oclc systems & services 21, no. 3 (2005): 181–92. 10. jakob nielsen and rolf molich, “heuristic evaluation of user interfaces,” in proceedings of the acm chi ’90 (new york: association for computing machinery, 1990), 249–56. 11. robin jeffries et al., “user interface evaluation in the real world: a comparison of a few techniques,” in proceedings of the acm chi ’91 (new york: association for computing machinery, 1991), 119–24; jakob nielsen, “finding usability problems through heuristic evaluation,” in proceedings of the acm chi ’92 (new york: association for computing machinery, 1992), 373–86. 12. jakob nielsen, “heuristic evaluation,” 25–62. 13. jeffrey rubin, handbook of usability testing: how to plan, design, and conduct effective tests (new york: wiley, 1994); jakob nielsen, “why you only need to test with five users, alertbox mar. 19, 2000,” www.useit.com/alertbox/20000319.html (accessed nov. 1, 2005). 184 information technology and libraries | december 2009 thomas sommer unlv special collections in the twenty-first century university of nevada las vegas (unlv) special collections is consistently striving to provide several avenues of discovery to its diverse range of patrons. specifically, unlv special collections has planned and implemented several online tools to facilitate unearthing treasures in the collections. these online tools incorporate web 2.0 features as well as searchable interfaces to collections. t he university of nevada las vegas (unlv) special collections has been working toward creating a visible archival space in the twenty-first century that assists its patrons’ quest for historical discovery in unlv’s unique southern nevada, gaming, and las vegas collections. this effort has helped patrons ranging from researchers to students to residents. special collections has created a discovery environment that incorporates several points of access, including virtual exhibits, a collection-wide search box, and digital collections. unlv special collections also has added web 2.0 features to aid in the discovery and enrichment of this historical information. these new features range from a what’s new blog to a digital collection with interactive features. the first point of discovery within the unlv special collections website began with the virtual exhibits. staff created the virtual exhibits as static html pages that showcased unique materials housed within unlv special collections. they showed the scope and diversity of materials on a specific topic available to researchers, faculty, and students. 
one virtual exhibit is “dino at the sands” (figure 1), a point of discovery for the history not only of dean martin but of many rat pack exploits.1 the photographs in this exhibit come from the sands collection. it is a static html page, and it provides information and pictures regarding one of las vegas’ most famous entertainers. this exhibit contains links to rat pack information and various resources on dean martin, including photographs, books, and videotapes. a second mode of discovery within the unlv special collections website is its new “search special collections” google-like search box (figure 2). this is located on the homepage and searches the manuscript, photograph, and oral history primary source collections.2 the purpose is to aid in the discovery of material within the collections that is not yet detailed in the public online catalog. in the past researchers would have to work through the special collection’s website to locate the resources. they can now go to one place to search for various types of material—a one-stop shop. the search results are easy to read and highlight the search term (see figure 3).3 the third point of access is the digital collection. these collections are digital copies of original materials located within the archives. the digital copies are presented online, described, and organized for easy access. each collection offers full-text searches, browsing, zoom, pan, figure 2. unlv special collections search box figure 1. “dino at the sands” exhibit thomas sommer (thomas.sommer@unlv.edu) is university and technical services archivist in special collections at the university of nevada las vegas libraries. unlv special collections in the twenty-first century | sommer 185 side-by-side comparison, and exporting for presentation and reuse. the newest example of a digital collection is “southern nevada: the boomtown years” (figure 4).4 this collection brings together a wide range of original materials from various collections located within unlv special collections, the nevada state museum, the historical society in las vegas, and the clark county heritage museum. it even provides standards-based activities for elementary and high school students. this project was funded by the nevada state library and archives under the library services and technology act (lsta) as amended through the institute of museum figure 4. “southern nevada: the boomtown years” digital collection figure 5. “what’s new” blog figure 6. unlv special collection facebook page figure 3. hoover dam search results 186 information technology and libraries | december 2009 and library services (imls). unlv special collections director peter michel selected the content. the team included fourteen members, four of whom were funded by the grant. christy keeler, phd, created the educator pages and designed the student activities. new collections are great, but users have to know they exist. to announce new collections and displays, special collections first added a what’s new blog that includes an rss feed to keep patrons up-to-date on new messages (figure 5).5 another avenue of interaction was implemented in april 2009 when special collections created its own facebook page (figure 6).6 students and researchers are encouraged to become fans. status updates with images and links to southern nevada and las vegas resources lead the fans back to the main website where the other treasures can be discovered. 
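the rss feed attached to the what's new blog is simply an xml document that lists recent announcements; feed readers poll it on behalf of subscribers and display new items as they appear. the fragment below is an invented illustration of what a single entry in such a feed might look like: only the blog and collection addresses are real, and the item itself (title, date, description) is hypothetical.

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title>what's new in special collections</title>
        <link>http://blogs.library.unlv.edu/whats_new_in_special_collections/</link>
        <description>announcements of new unlv special collections exhibits and digital collections</description>
        <!-- one hypothetical announcement entry -->
        <item>
          <title>new digital collection: southern nevada: the boomtown years</title>
          <link>http://digital.library.unlv.edu/boomtown/</link>
          <pubDate>Tue, 28 Jul 2009 09:00:00 GMT</pubDate>
          <description>an invented example item; subscribers see it in their feed reader as soon as it is published.</description>
        </item>
      </channel>
    </rss>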
special collections has implemented various web 2.0 features within its newest digital collections. specifically, it added a comments section, a “rate it” feature, and an rss feature to its latest digital collections (figures 7, 8, and 9). these latest trends enrich the collections’ resources with patron-supplied information.7 as is apparent, unlv special collections implemented several online tools to allow patrons to discover its extensive primary resources. these tools range from virtual exhibits and digital collections with web 2.0 features to blogs and social networking sites. special collections has endeavored to stay on top of the latest trends to benefit its patrons and facilitate their discovery of historical materials in the twenty-first century. figure 8. “rate it” feature for aerial view of hughes aircraft plant photograph figure 7. comments section for aerial view of hughes aircraft plant photograph figure 9. rss feature for the index to the “welcome home howard” digital collection continued on page 190 190 information technology and libraries | december 2009 as previously mentioned, these easy-to-use tools can allow screencast videos and screenshots to be integrated into a variety of online spaces. a particularly effective type of online space for potential integration of such screencast videos and screenshots are library “how do i find . . .” research help guides. many of these “how do i find . . .” research help guides serve as pathfinders for patrons, outlining processes for obtaining information sources. currently, many of these pathfinders are in text form, and experimentation with the tools outlined in this article can empower library staff to enhance their own pathfinders with screencast videos and screenshot tutorials. reference 1. “unlv libraries strategic plan 2009–2011,” http://www .library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 30, 2009): 2. unlv special collections continued from page 186 references 1. peter michel, “dino at the sands,” unlv special collections, http://www.library.unlv.edu/speccol/dino/index.html (accessed july 28, 2009). 2. peter michel, “unlv special collections search box.” unlv special collections. http://www.library.unlv.edu/speccol/ index.html (accessed july 28, 2009). 3. unlv special collections search results, “hoover dam,” http://www.library.unlv.edu/speccol/databases/index .php?search_query=hoover+dam&bts=search&cols[]=oh&cols []=man&cols[]=photocoll&act=2 (accessed october 27, 2009). 4. unlv libraries, “southern nevada: the boomtown years,” http://digital.library.unlv.edu/boomtown/ (accessed july 28, 2009). 5. unlv special collections, “what’s new in special collections,” http://blogs.library.unlv.edu/whats_new_in_special_ collections/ (accessed july 28, 2009). 6. unlv special collections, “unlv special collections facebook homepage,” http://www.facebook.com/home .php?#/pages/las-vegas-nv/unlv-special-collections/70053 571047?ref=search (accessed july 28, 2009). 7. unlv libraries, “comments section for the aerial view of hughes aircraft plant photograph,” http://digital.library .unlv.edu/hughes/dm.php/hughes/82 (accessed july 28, 2009); unlv libraries, “‘rate it’ feature for the aerial view of hughes aircraft plant photograph,” http://digital.library.unlv.edu/ hughes/dm.php/hughes/82 (accessed july 28, 2009); unlv libraries, “rss feature for the index to the welcome home howard digital collection” http://digital.library.unlv.edu/hughes/ dm.php/ (accessed july 28, 2009). 
josé r. hilera, carmen pagés, j. javier martínez, j. antonio gutiérrez, and luis de-marcos an evolutive process to convert glossaries into ontologies dictionary, the outcome will be limited by the richness of the definition of terms included in that dictionary. it would be what is normally called a “lightweight” ontology,6 which could later be converted into a “heavyweight” ontology by implementing, in the form of axioms, knowledge not contained in the dictionary. this paper describes the process of creating a lightweight ontology of the domain of software engineering, starting from the ieee standard glossary of software engineering terminology.7 ■■ ontologies, the semantic web, and libraries within the field of librarianship, ontologies are already being used as alternative tools to traditional controlled vocabularies. this may be observed particularly within the realm of digital libraries, although, as krause asserts, objections to their use have often been raised by the digital library community.8 one of the core objections is the difficulty of creating ontologies as compared to other vocabularies such as taxonomies or thesauri. nonetheless, the semantic richness of an ontology offers a wide range of possibilities concerning indexing and searching of library documents.
the term ontology (used in philosophy to refer to the “theory about existence”) has been adopted by the artificial intelligence research community to define a categorization of a knowledge domain in a shared and agreed form, based on concepts and relationships, which may be formally represented in a computer readable and usable format. the term has been widely employed since 2001, when berners-lee et al. envisaged the semantic web, which aims to turn the information stored on the web into knowledge by transforming data stored in every webpage into a common scheme accepted in a specific domain.9 to accomplish that task, knowledge must be represented in an agreed-upon and reusable computer-readable format. to do this, machines will require access to structured collections of information and to formalisms which are based on mathematical logic that permits higher levels of automatic processing. technologies for the semantic web have been developed by the world wide web consortium (w3c). the most relevant technologies are rdf (resource description this paper describes a method to generate ontologies from glossaries of terms. the proposed method presupposes an evolutionary life cycle based on successive transformations of the original glossary that lead to products of intermediate knowledge representation (dictionary, taxonomy, and thesaurus). these products are characterized by an increase in semantic expressiveness in comparison to the product obtained in the previous transformation, with the ontology as the end product. although this method has been applied to produce an ontology from the “ieee standard glossary of software engineering terminology,” it could be applied to any glossary of any knowledge domain to generate an ontology that may be used to index or search for information resources and documents stored in libraries or on the semantic web. f rom the point of view of their expressiveness or semantic richness, knowledge representation tools can be classified at four levels: at the basic level (level 0), to which dictionaries belong, tools include definitions of concepts without formal semantic primitives; at the taxonomies level (level 1), tools include a vocabulary, implicit or explicit, as well as descriptions of specialized relationships between concepts; at the thesauri level (level 2), tools further include lexical (synonymy, hyperonymy, etc.) and equivalence relationships; and at the reference models level (level 3), tools combine the previous relationships with other more complex relationships between concepts to completely represent a certain knowledge domain.1 ontologies belong at this last level. according to the hierarchic classification above, knowledge representation tools of a particular level add semantic expressiveness to those in the lowest levels in such a way that a dictionary or glossary of terms might develop into a taxonomy or a thesaurus, and later into an ontology. there are a variety of comparative studies of these tools,2 as well as varying proposals for systematically generating ontologies from lower-level knowledge representation systems, especially from descriptor thesauri.3 this paper proposes a process for generating a terminological ontology from a dictionary of a specific knowledge domain.4 given the definition offered by neches et al. 
(“an ontology is an instrument that defines the basic terms and relations comprising the vocabulary of a topic area as well as the rules for combining terms and relations to define extensions to the vocabulary”)5 it is evident that the ontology creation process will be easier if there is a vocabulary to be extended than if it is developed from scratch. if the developed ontology is based exclusively on the josé r. hilera (jose.hilera@uah.es) is professor, carmen pagés (carmina.pages@uah.es) is assistant professor, j. javier martínez (josej.martinez@uah.es) is professor, j. antonio gutiérrez (jantonio.gutierrez@uah.es) is assistant professor, and luis de-marcos (luis.demarcos@uah.es) is professor, department of computer science, faculty of librarianship and documentation, university of alcalá, madrid, spain. 196 information technology and libraries | december 2010 configuration management; data types; errors, faults, and failures; evaluation techniques; instruction types; language types; libraries; microprogramming; operating systems; quality attributes; software documentation; software and system testing; software architecture; software development process; software development techniques; and software tools.15 in the glossary, entries are arranged alphabetically. an entry may consist of a single word, such as “software,” a phrase, such as “test case,” or an acronym, such as “cm.” if a term has more than one definition, the definitions are numbered. in most cases, noun definitions are given first, followed by verb and adjective definitions as applicable. examples, notes, and illustrations have been added to clarify selected definitions. cross-references are used to show a term’s relations with other terms in the dictionary: “contrast with” refers to a term with an opposite or substantially different meaning; “syn” refers to a synonymous term; “see also” refers to a related term; and “see” refers to a preferred term or to a term where the desired definition can be found. figure 2 shows an example of one of the definitions of the glossary terms. note that definitions can also include framework),10 which defines a common data model to specify metadata, and owl (ontology web language),11 which is a new markup language for publishing and sharing data using web ontologies. more recently, the w3c has presented a proposal for a new rdf-based markup system that will be especially useful in the context of libraries. it is called skos (simple knowledge organization system), and it provides a model for expressing the basic structure and content of concept schemes, such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other similar types of controlled vocabularies.12 the emergence of the semantic web has created great interest within librarianship because of the new possibilities it offers in the areas of publication of bibliographical data and development of better indexes and better displays than those that we have now in ils opacs.13 for that reason, it is important to strive for semantic interoperability between the different vocabularies that may be used in libraries’ indexing and search systems, and to have compatible vocabularies (dictionaries, taxonomies, thesauri, ontologies, etc.) based on a shared standard like rdf. there are, at the present time, several proposals for using knowledge organization systems as alternatives to controlled vocabularies. 
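as an illustration of the kind of shared, rdf-based encoding mentioned above, the entry for a term such as high order language could be expressed in skos roughly as follows. this is a minimal sketch under assumed names: the http://example.org/se namespace is invented for the example, and the labels and links simply restate the glossary's own cross-references for the term (its "high level language" synonym and its "contrast with" reference to assembly language).

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:skos="http://www.w3.org/2004/02/skos/core#">
      <!-- a glossary term modeled as a skos concept (hypothetical uris) -->
      <skos:Concept rdf:about="http://example.org/se#HighOrderLanguage">
        <skos:prefLabel xml:lang="en">high order language</skos:prefLabel>
        <skos:altLabel xml:lang="en">high level language</skos:altLabel>
        <skos:definition xml:lang="en">a programming language that requires little knowledge of the computer on which a program will run ...</skos:definition>
        <skos:broader rdf:resource="http://example.org/se#ProgrammingLanguage"/>
        <skos:related rdf:resource="http://example.org/se#AssemblyLanguage"/>
      </skos:Concept>
    </rdf:RDF>

an indexing or search system that understands skos can then treat the preferred and alternative labels as interchangeable entry points and use the broader and related links to expand or narrow a query.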
for example, folksonomies, though originating within the web context, have been proposed by different authors for use within libraries “as a powerful, flexible tool for increasing the user-friendliness and interactivity of public library catalogs.”14 authors argue that the best approach would be to create interoperable controlled vocabularies using shared and agreed-upon glossaries and dictionaries from different domains as a departure point, and then to complete evolutive processes aimed at semantic extension to create ontologies, which could then be combined with other ontologies used in information systems running in both conventional and digital libraries for indexing as well as for supporting document searches. there are examples of glossaries that have been transformed into ontologies, such as the cambridge healthtech institute’s “pharmaceutical ontologies glossary and taxonomy” (http://www.genomicglossaries.com/content/ontolo gies.asp), which is an “evolving terminology for emerging technologies.” ■■ ieee standard glossary of software engineering terminology to demonstrate our proposed method, we will use a real glossary belonging to the computer science field, although it is possible to use any other. the glossary, available in electronic format (pdf), defines approximately 1,300 terms in the domain of software engineering (figure 1). topics include addressing assembling, compiling, linking, loading; computer performance evaluation; figure 1. cover of the glossary document generating collaborative systems for digital libraries | hilera et al. 197 4. define the classes and the class hierarchy 5. define the properties of classes (slots) 6. define the facets of the slots 7. create instances as outlined in the introduction, the ontology developed using our method is a terminological one. therefore we can ignore the first two steps in noy’s and mcguinness’ process as the concepts of the ontology coincide with the terms of the glossary used. any ontology development process must take into account the basic stages of the life cycle, but the way of organizing the stages can be different in different methods. in our case, since the ontology has a terminological character, we have established an incremental development process that supposes the natural evolution of the glossary from its original format (dictionary or vocabulary format) into an ontology. the proposed life cycle establishes a series of steps or phases that will result in intermediate knowledge representation tools, with the final product, the ontology, being the most semantically rich (figure 4). therefore this is a product-driven process, in which the aim of every step is to obtain an intermediate product useful on its own. the intermediate products and the final examples associated with the described concept. in the resulting ontology, the examples were included as instances of the corresponding class. in figure 2, it can be seen that the definition refers to another glossary on programming languages (std 610.13), which is a part of the series of dictionaries related to computer science (“ieee std 610,” figure 3). other glossaries which are mentioned in relation to some references about term definitions are 610.1, 610.5, 610.7, 610.8, and 610.9. to avoid redundant definitions and possible inconsistencies, links must be implemented between ontologies developed from those glossaries that include common concepts. 
the ontology generation process presented in this paper is meant to allow for integration with other ontologies that will be developed in the future from the other glossaries. in addition to the explicit references to other terms within the glossary and to terms from other glossaries, the textual definition of a concept also has implicit references to other terms. for example, from the phrase “provides features designed to facilitate expression of data structures” included in the definition of the term high order language (figure 2), it is possible to determine that there is an implicit relationship between this term and the term data structure, also included in the glossary. these relationships have been considered in establishing the properties of the concepts in the developed ontology. ■■ ontology development process many ontology development methods presuppose a life cycle and suggest technologies to apply during the process of developing an ontology.16 the method described by noy and mcguinness is helpful when beginning this process for the first time.17 they establish a seven-step process: 1. determine the domain and scope of the ontology 2. consider reusing existing ontologies 3. enumerate important terms in the ontology figure 2. example of term definition in the ieee glossary figure 3. ieee computer science glossaries 610—standard dictionary of computer terminology 610.1—standard glossary of mathematics of computing terminology 610.2—standard glossary of computer applications terminology 610.3—standard glossary of modeling and simulation terminology 610.4—standard glossary of image processing terminology 610.5—standard glossary of data management terminology 610.6—standard glossary of computer graphics terminology 610.7—standard glossary of computer networking terminology 610.8—standard glossary of artificial intelligence terminology 610.9—standard glossary of computer security and privacy terminology 610.10—standard glossary of computer hardware terminology 610.11—standard glossary of theory of computation terminology 610.12—standard glossary of software engineering terminology 610.13—standard glossary of computer languages terminology high order language (hol). a programming language that requires little knowledge of the computer on which a program will run, can be translated into several difference machine languages, allows symbolic naming of operations and addresses, provides features designed to facilitate expression of data structures and program logic, and usually results in several machine instructions for each program statement. examples include ada, cobol, fortran, algol, pascal. syn: high level language; higher order language; third generation language. contrast with: assembly language; fifth generation language; fourth generation language; machine language. note: specific languages are defined in p610.13 198 information technology and libraries | december 2010 since there are terms with different meanings (up to five in some cases) in the ieee glossary of software engineering terminology, during dictionary development we decided to create different concepts (classes) for the same term, associating a number to these concepts to differentiate them. 
for example, there are five different definitions for the term test, which is why there are five concepts (test1–test5), corresponding to the five meanings of the term: (1) an activity in which a system or component is executed under specified conditions, the results are observed or recorded, and an evaluation is made of some aspect of the system or component; (2) to conduct an activity as in (1); (3) a set of one or more test cases; (4) a set of one or more test procedures; (5) a set of one or more test cases and procedures. taxonomy the proposed lifecycle establishes a stage for the conversion of a dictionary into a taxonomy, understanding taxonomy as an instrument of concepts categorization, product are a dictionary, which has a formal and computer processed structure, with the terms and their definitions in xml format; a taxonomy, which reflects the hierarchic relationships between the terms; a thesaurus, which includes other relationships between the terms (for example, the synonymy relationship); and, finally, the ontology, which will include the hierarchy, the basic relationships of the thesaurus, new and more complex semantic relationships, and restrictions in form of axioms expressed using description logics.18 the following paragraphs describe the way each of these products is obtained. dictionary the first step of the proposed development process consists of the creation of a dictionary in xml format with all the terms included in the ieee standard glossary of software engineering terminology and their related definitions. this activity is particularly mechanical and does not need human intervention as it is basically a transformation of the glossary from its original format (pdf) into a format better suited to the development process. all formats considered for the dictionary are based on xml, and specifically on rdf and rdf schema. in the end, we decided to work with the standards daml+oil and owl,19 though we are not opposed to working with other languages, such as skos or xmi,20 in the future. (in the latter case, it would be possible to model the intermediate products and the ontology in uml graphic models stored in xml files.)21 in our project, the design and implementation of all products has been made using an ontology editor. we have used oiled (with oilviz plugin) as editor, both because of its simplicity and because it allows the exportation to owl and daml formats. however, with future maintenance and testing in mind, we decided to use protégé (with owl plugin) in the last step of the process, because this is a more flexible environment with extensible modules that integrate more functionality such as ontology annotation, evaluation, middleware service, query and inference, etc. figure 5 shows the dictionary entry for “high order language,” which appears in figure 2. note that the dictionary includes only owl:class (or daml:class) to mark the term; rdf:label to indicate the term name; and rdf:comment to provide the definition included in the original glossary. figure 4. ontology development process highorderlanguage figure 5. example of dictionary entry generating collaborative systems for digital libraries | hilera et al. 199 example, when analyzing the definition of the term compiler: “(is) a computer program that translates programs expressed in a high order language into their machine language equivalent,” it is possible to deduce that compiler is a subconcept of computer program, which is also included in the glossary.) 
in addition to the lexical or syntactic analysis, it is necessary for an expert in the domain to perform a semantic analysis to complete the development of the taxonomy. the implementation of the hierarchical relationships among the concepts is made using rdfs:subclassof, regardless of whether the taxonomy is implemented in owl or daml format, since both languages specify this type of relationship in the same way. figure 6 shows an example of a hierarchical relationship included in the definition of the concept pictured in figure 5. thesaurus according to the international organization for standardization (iso), a thesaurus is “the vocabulary of a controlled indexing language, formally organized in order to make explicit the a priori relations between concepts (for example ‘broader’ and ‘narrower’).”25 this definition establishes the lexical units and the semantic relationships between these units as the elements that constitute a thesaurus. the following is a sample of the lexical units: ■■ descriptors (also called “preferred terms”): the terms used consistently when indexing to represent a concept that can be in documents or in queries to these documents. the iso standard introduces the option of adding a definition or an application note to every term to establish explicitly the chosen meaning. this note is identified by the abbreviation sn (scope note), as shown in figure 7. ■■ non-descriptors (“non-preferred terms”): the synonyms or quasi-synonyms of a preferred term. a nonpreferred term is not assigned to documents submitted to an indexing process, but is provided as an entry point in a thesaurus to point to the appropriate descriptor. usually the descriptors are written in capital letters and the nondescriptors in small letters. ■■ compound descriptors: the terms used to represent complex concepts and groups of descriptors, which allow for the structuring of large numbers of thesaurus descriptors into subsets called micro-thesauri. in addition to lexical units, other fundamental elements of a thesaurus are semantic relationships between these units. the more common relationships between lexical units are the following: ■■ equivalence: the relationship between the descriptors and the nondescriptors (synonymous and that is, as a systematical classification in a traditional way. as gilchrist states, there is no consensus on the meaning of terms like taxonomy, thesaurus, or ontology.22 in addition, much work in the field of ontologies has been done without taking advantage of similar work performed in the fields of linguistics and library science.23 this situation is changing because of the increasing publication of works that relate the development of ontologies to the development of “classic” terminological tools (vocabularies, taxonomies, and thesauri). this paper emphasizes the importance and usefulness of the intermediate products created at each stage of the evolutive process from glossary to ontology. the end product of the initial stage is a dictionary expressed as xml. the next stage in the evolutive process (figure 4) is the transformation of that dictionary into a taxonomy through the addition of hierarchical relationships between concepts. to do this, it is necessary to undertake a lexicalsemantic analysis of the original glossary. this can be done in a semiautomatic way by applying natural language processing (nlp) techniques, such as those recommended by morales-del-castillo et al.,24 for creating thesauri. 
the basic processing sequence in linguistic engineering comprises the following steps: (1) incorporate the original documents (in our case the dictionary obtained in the previous stage) into the information system; (2) identify the language in which they are written, distinguishing independent words; (3) "understand" the processed material at the appropriate level; (4) use this understanding to transform, search, or translate data; (5) produce the new media required to present the resulting outcomes; and finally, (6) present the final outcome to human users by means of the most appropriate peripheral device—screen, speakers, printer, etc. an important aspect of this process is natural language comprehension. for that reason, several different kinds of programs are employed, including lemmatizers (which implement stemming algorithms to extract the lexeme or root of a word), morphologic analyzers (which glean information about sentences from their constituent elements: morphemes, words, and parts of speech), syntactic analyzers (which group sentence constituents to extract elements larger than words), and semantic models (which represent language semantics in terms of concepts and their relations, using abstraction, logical reasoning, organization, and data-structuring capabilities). from the information in the software engineering dictionary and from a lexical analysis of it, it is possible to determine a hierarchical relationship when the name of a term contains the name of another one (for example, the term language and the terms programming language and hardware design language), or when expressions such as "is a" linked to the name of another term included in the glossary appear in the text of the term definition. (for example, when analyzing the definition of the term compiler: "(is) a computer program that translates programs expressed in a high order language into their machine language equivalent," it is possible to deduce that compiler is a subconcept of computer program, which is also included in the glossary.) the life cycle proposed in this paper (figure 4) includes a third step or phase that transforms the taxonomy obtained in the previous phase into a thesaurus through the incorporation of relationships between the concepts that complement the hierarchical relations included in the taxonomy. basically, we have to add two types of relationships—equivalence and associative, represented in the standard thesauri with uf (and use) and rt respectively. we will continue using xml to implement this new product. there are different ways of implementing a thesaurus using a language based on xml. for example, matthews et al. proposed a standard rdf format,26 whereas hall created an ontology in daml.27 in both cases, the authors modeled the general structure of a thesaurus from classes (rdf:class or daml:class) and properties (rdf:property or daml:objectproperty). as for the relationships themselves, iso establishes that the abbreviation uf (used for) precedes the nondescriptors linked to a descriptor, and the abbreviation use is used in the opposite case. for example, a thesaurus developed from the ieee glossary might include a descriptor "high order language" and an equivalence relationship with a nondescriptor "high level language" (figure 7). ■■ hierarchical: a relationship between two descriptors. in the thesaurus one of these descriptors has been defined as superior to the other one. there are no hierarchical relationships between nondescriptors, nor between nondescriptors and descriptors. a descriptor can have no lower descriptors or several of them, and no higher descriptors or several of them.
according to the iso standard, hierarchy is expressed by means of the abbreviations bt (broader term), to indicate the generic or higher descriptors, and nt (narrower term), to indicate the specific or lower descriptors. the term at the head of the hierarchy to which a term belongs can be included, using the abbreviation tt (top term). figure 7 presents these hierarchical relationships. ■■ associative: a reciprocal relationship that is established between terms that are neither equivalent nor hierarchical, but are semantically or conceptually associated to such an extent that the link between them should be made explicit in the controlled vocabulary, on the grounds that it may suggest additional terms for use in indexing or retrieval. it is generally indicated by the abbreviation rt (related term). there are no associative relationships between nondescriptors and descriptors, or between descriptors already linked by a hierarchical relation. it is possible to establish associative relationships between descriptors belonging to the same or to different categories. the associative relationships can be of very different types. for example, they can represent causality, instrumentation, location, similarity, origin, action, etc. figure 7 shows two associative relations, indicating that high order language is related to both assembly language and machine language. (figure 7. fragment of a thesaurus entry: high order language (descriptor); sn: a programming language that . . . ; uf: high level language (non-descriptor); uf: third generation language (non-descriptor); tt: language; bt: programming language; nt: object oriented language; nt: declarative language; rt: assembly language (contrast with); rt: machine language (contrast with); high level language, use: high order language; third generation language, use: high order language.) (figure 6. example of taxonomy entry.) ontology ding and foo state that "ontology promotes standardization and reusability of information representation through identifying common and shared knowledge. ontology adds values to traditional thesauri through deeper semantics in digital objects, both conceptually, relationally and machine understandably."29 this semantic richness may imply deeper hierarchical levels, richer relationships between concepts, the definition of axioms or inference rules, etc. the final stage of the evolutive process is the transformation of the thesaurus created in the previous stage into an ontology. this is achieved through the addition of one or more of the basic elements of semantic complexity that differentiate ontologies from other knowledge representation standards (such as dictionaries, taxonomies, and thesauri). for example: ■■ semantic relationships between the concepts (classes) of the thesaurus have been added as properties or ontology slots. ■■ axioms of classes and axioms of properties. these are restriction rules that the elements of the ontology are declared to satisfy. for example, disjoint classes have been defined, and the relationships have been implemented as properties with quantification restrictions (existential or universal) and cardinality restrictions.
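both kinds of axioms just listed can be written directly in owl. the fragment below is a purely hypothetical sketch: the article does not say which classes were declared disjoint or which cardinalities were used, so the classes sourceprogram, objectprogram, and testcase and the property verifies are invented solely to show the syntax (the fragment would sit inside the rdf:rdf wrapper shown earlier).

  <!-- a disjointness axiom between two classes -->
  <owl:Class rdf:about="#SourceProgram">
    <owl:disjointWith rdf:resource="#ObjectProgram"/>
  </owl:Class>

  <!-- a cardinality restriction: every instance of TestCase is related
       through the property "verifies" to at least one other individual -->
  <owl:Class rdf:about="#TestCase">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#verifies"/>
        <owl:minCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:minCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>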
in the rdf format of matthews et al., the general structure of a thesaurus is modeled with five classes: thesaurusobject, concept, topconcept, term, and scopenote; and several properties to implement the relations, like hasscopenote (sn), isindicatedby, preferredterm, usedfor (uf), conceptrelation, broaderconcept (bt), narrowerconcept (nt), topofhierarchy (tt), and isrelatedto (rt). recently the w3c has developed the skos specification, created to define knowledge organization schemes. in the case of thesauri, skos includes specific tags, such as skos:concept, skos:scopenote (sn), skos:broader (bt), skos:narrower (nt), skos:related (rt), etc., that are equivalent to those just listed. the skos specification does not make any statement about the formal relationship between the class of skos concept schemes and the class of owl ontologies, which allows different design patterns to be explored for using skos in combination with owl. although any of the above-mentioned formats could be used to implement the thesaurus, given that the end product of our process is to be an ontology, our proposal is that the product to be generated during this phase should have a format compatible with the final ontology and with the previous taxonomy. therefore a minimal number of changes will be carried out on the product created in the previous step, resulting in a knowledge representation tool similar to a thesaurus. that tool does not need to be modified during the following (final) phase of transformation into an ontology. nevertheless, if for some reason it is necessary to have the thesaurus in one of the other formats (such as skos), it is possible to apply a simple xslt transformation to the product. another option would be to integrate a thesaurus ontology, such as the one proposed by hall,28 with the ontology representing the ieee glossary. in the thesaurus implementation carried out in our project, the following limitations have been considered: ■■ only the hierarchical relationships implemented in the taxonomy have been considered. these include relationships of type "is-a," that is, generalization relationships or type–subset relationships. relationships that can be included in the thesaurus marked with tt, bt, and nt, like relations of type "part of" (that is, partitive relationships), have not been considered. instead of considering them as hierarchical relationships, the final ontology includes the possibility of describing classes as a union of classes. ■■ the relationships of synonymy (uf and use) used to model the cross-references in the ieee glossary ("syn" and "see," respectively) were implemented as equivalent terms, that is, as equivalence axioms between classes (owl:equivalentclass or daml:sameclassas), with inverse properties to reflect the preference of the terms. ■■ the rest of the associative relationships (rt) that were included in the thesaurus correspond to the cross-references of the type "contrast with" and "see also" that appear explicitly in the ieee glossary. ■■ neither compound descriptors nor groups of descriptors have been implemented because there is no such structure in the glossary. software based on techniques of linguistic analysis has been developed to facilitate the establishment of the properties and restrictions. this software analyzes the definition text for each of the more than 1,500 glossary terms (in thesaurus format), isolating those words that match the name of other glossary terms (or a word in the definition text of other glossary terms). the isolated words will then be candidates for a relationship between both of them. (figure 8 shows the candidate properties obtained from the software engineering glossary.)
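to illustrate the skos alternative mentioned above, the following hypothetical fragment re-expresses the figure 7 entry as a skos concept; the concept uris are invented, and the mapping chosen here (non-descriptors as skos:altlabel, sn as skos:scopenote, bt/nt/rt as skos:broader, skos:narrower, and skos:related) is only one reasonable way an xslt transformation of our product could render it.

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:skos="http://www.w3.org/2004/02/skos/core#">
    <skos:Concept rdf:about="http://example.org/segloss#HighOrderLanguage">
      <skos:prefLabel xml:lang="en">high order language</skos:prefLabel>
      <skos:altLabel xml:lang="en">high level language</skos:altLabel>            <!-- uf -->
      <skos:altLabel xml:lang="en">third generation language</skos:altLabel>      <!-- uf -->
      <skos:scopeNote xml:lang="en">a programming language that ...</skos:scopeNote>  <!-- sn -->
      <skos:broader rdf:resource="http://example.org/segloss#ProgrammingLanguage"/>   <!-- bt -->
      <skos:narrower rdf:resource="http://example.org/segloss#ObjectOrientedLanguage"/>  <!-- nt -->
      <skos:related rdf:resource="http://example.org/segloss#AssemblyLanguage"/>      <!-- rt -->
    </skos:Concept>
  </rdf:RDF>

in the implementation described here, however, the thesaurus stays in the owl/daml representation, and the candidate properties of figure 8 are reviewed interactively, as described next.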
the user then has the option of creating relationships with the identified candidate words. the user must indicate, for every relationship to be created, the restriction type that it represents, as well as the existential or universal quantification or the cardinality (minimum or maximum). after confirming this information, the program updates the file containing the ontology (owl or daml), adding the property to the class that represents the processed term. figure 9 shows an example of the definition of two properties and their application to the class highorderlanguage: a property express with existential quantification over the class datastructure to indicate that a language must represent at least one data structure; and a property translateto of universal type to indicate that any high-level language is translated into machine language (machinelanguage). ■■ results, conclusions, and future work the existence of ontologies of specific knowledge domains (software engineering in this case) facilitates the process of finding resources about this discipline on the semantic web and in digital libraries, as well as the reuse of learning objects of the same domain stored in repositories available on the web.30 when a new resource is indexed in a library catalog, a new record that conforms to the ontology's conceptual data model may be included, and its properties assigned according to the concept definitions in the ontology. the user may later execute semantic queries; the search system traverses the ontology to identify the concept the user is interested in and then launches a wider query that includes the resources indexed under that concept. ontologies, like the one that has been "evolved," may also be used in an open way to index and search for resources on the web. in that case, however, semantic search engines such as swoogle (http://swoogle.umbc.edu/) are required in place of traditional syntactic search engines, such as google. the creation of a complete ontology of a knowledge domain is a complex task. in the case of the domain presented in this paper, that of software engineering, although there have been initiatives toward ontology creation that have yielded publications by renowned authors in the field,31 a complete ontology has yet to be created and published.
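as a sketch of what the two figure 9 restrictions described above could look like in owl, the fragment below applies an existential and a universal restriction to the class highorderlanguage; the property and class spellings (express, translateto, datastructure, machinelanguage) follow the names used in the text, but the exact serialization in the project's file may differ.

  <owl:ObjectProperty rdf:ID="express"/>
  <owl:ObjectProperty rdf:ID="translateTo"/>

  <owl:Class rdf:about="#HighOrderLanguage">
    <!-- existential restriction: a high order language expresses at least one data structure -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#express"/>
        <owl:someValuesFrom rdf:resource="#DataStructure"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <!-- universal restriction: whatever a high order language translates to is machine language -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#translateTo"/>
        <owl:allValuesFrom rdf:resource="#MachineLanguage"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>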
this paper has described a process for developing a modest but complete ontology from a glossary of terminology, both in owl format and daml+oil format, which is ready to use in the semantic web. (figure 8. candidate properties obtained from the linguistic analysis of the software engineering glossary: an alphabetical list of several hundred candidate verbs and verb phrases, for example accept, allocate, call, contain, convert, executes, generate, implement, translate, and verify.)
as described at the opening of this article, our aim has been to create a lightweight ontology as a first version, which will later be improved by including more axioms and relationships that increase its semantic expressiveness. we have tried to make this first version as tailored as possible to the initial glossary, knowing that later versions will be improved by others who might take on the work. such improvements will increase the ontology's utility, but will make it a less-faithful representation of the ieee glossary from which it was derived. the ontology we have developed includes 1,521 classes that correspond to the same number of concepts represented in the ieee glossary. (included in this number are the different meanings that the glossary assigns to each term.) we defined 324 properties or relationships between these classes. these are based on a semiautomated linguistic analysis of the glossary content (for example, allow, convert, execute, operatewith, produces, translate, transform, utilize, workin, etc.), which will be refined in future versions. (figure 9. example of ontology entry.) the authors' aim is to use this ontology, which we have called ontoglose (ontology glossary software engineering), to unify the vocabulary. ontoglose will be used in a more ambitious project, whose purpose is the development of a complete ontology in software engineering from the swebok guide.32 although this paper has focused on this ontology, the method that has been described may be used to generate an ontology from any dictionary. the flexibility that owl permits for ontology description, along with its compatibility with other rdf-based metadata languages, makes possible interoperability between ontologies and between ontologies and other controlled vocabularies and allows for the building of merged representations of multiple knowledge domains. these representations may eventually be used in libraries and repositories to index and search for any kind of resource, not only those related to the original field. ■■ acknowledgments this research is co-funded by the spanish ministry of industry, tourism and commerce profit program (grant tsi-020100-2008-23). the authors also want to acknowledge support from the tifyc research group at the university of alcala. references and notes 1. m. dörr et al., state of the art in content standards (amsterdam: ontoweb consortium, 2001). 2. d. soergel, "the rise of ontologies or the reinvention of classification," journal of the american society for information science 50, no. 12 (1999): 1119–20; a. gilchrist, "thesauri, taxonomies and ontologies—an etymological note," journal of documentation 59, no. 1 (2003): 7–18. 3. b. j. wielinga et al., "from thesaurus to ontology," proceedings of the 1st international conference on knowledge capture (new york: acm, 2001): 194–201; j. qin and s. paling, "converting a controlled vocabulary into an ontology: the case of gem," information research 6 (2001): 2. 4. according to van heijst, schreiber, and wielinga, ontologies can be classified as terminological ontologies, information ontologies, and knowledge modeling ontologies; terminological ontologies specify the terms that are used to represent knowledge in the domain of discourse, and they are in use principally to unify vocabulary in a certain domain. g. van heijst, a. t. 20. w3c, skos; object management group, xml metadata interchange (xmi), 2003, http://www.omg.org/technology/documents/formal/xmi.htm (accessed oct. 5, 2009). 21.
uml (unified modeling language) is a standardized general-purpose modeling language (http://www.uml.org). nowadays, different uml plugins for ontologies’ editors exist. these plugins allow working with uml graphic models. also, it is possible to realize the uml models with a case tool, to export them to xml format, and to transform them to the ontology format (for example, owl) using a xslt sheet, as the one published in d. gasevic, “umltoowl: converter from uml to owl,” http://www.sfu.ca/~dgasevic/projects/umltoowl/ (accessed oct. 5, 2009). 22. gilchrist, “thesauri, taxonomies and ontologies.” 23. soergel, “the rise of ontologies or the reinvention of classification.” 24. j. m. morales-del-castillo et al., “a semantic model of selective dissemination of information for digital libraries,” information technology & libraries 28, no. 1 (2009): 22–31. 25. international standards organization, iso 2788:1986 documentation—guidelines for the establishment and development of monolingual thesauri (geneve: international standards organization, 1986). 26. b. m. matthews, k. miller, and m. d. wilson, “a thesaurus interchange format in rdf,” 2002, http://www.w3c.rl.ac .uk/swad/thes_links.htm (accessed feb. 10, 2009). 27. m. hall, “call thesaurus ontology in daml,” dynamics research corporation, 2001, http://orlando.drc.com/daml/ ontology/call-thesaurus (accessed oct. 5, 2009). 28. ibid. 29. y. ding and s. foo, “ontology research and development. part 1—a review of ontology generation,” journal of information science 28, no. 2 (2002): 123–36. see also b. h. kwasnik, “the role of classification in knowledge representation and discover,” library trends 48 (1999): 22–47. 30. s. otón et al., “service oriented architecture for the implementation of distributed repositories of learning objects,” international journal of innovative computing, information & control (2010), forthcoming. 31. o. mendes and a. abran, “software engineering ontology: a development methodology,” metrics news 9 (2004): 68–76; c. calero, f. ruiz, and m. piattini, ontologies for software engineering and software technology (berlin: springer, 2006). 32. ieee, guide to the software engineering body of knowledge (swebok) (los alamitos, calif.: ieee computer society, 2004), http:// www.swebok.org (accessed oct. 5, 2009). schereiber, and b. j. wielinga, “using explicit ontologies in kbs development,” international journal of human & computer studies 46, no. 2/3 (1996): 183–292. 5. r. neches et al., “enabling technology for knowledge sharing,” ai magazine 12, no. 3 (1991): 36–56. 6. o. corcho, f. fernández-lópez, and a. gómez-pérez, “methodologies, tools and languages for buildings ontologies. where is their meeting point?” data & knowledge engineering 46, no. 1 (2003): 41–64. 7. intitute of electrical and electronics engineers (ieee), ieee std 610.12-1990(r2002): ieee standard glossary of software engineering terminology (reaffirmed 2002) (new york: ieee, 2002). 8. j. krause, “semantic heterogeneity: comparing new semantic web approaches with those of digital libraries,” library review 57, no. 3 (2008): 235–48. 9. t. berners-lee, j. hendler, and o. lassila, “the semantic web,” scientific american 284, no. 5 (2001): 34–43. 10. world wide web consortium (w3c), resource description framework (rdf): concepts and abstract syntax, w3c recommendation 10 february 2004, http://www.w3.org/tr/rdf-concepts/ (accessed oct. 5, 2009). 11. world wide web consortium (w3c), web ontology language (owl), 2004, http://www.w3.org/2004/owl (accessed oct. 5, 2009). 
12. world wide web consortium (w3c), skos simple knowledge organization system, 2009, http://www.w3.org/ tr/2009/rec-skos-reference-20090818/ (accessed oct. 5, 2009). 13. m. m. yee, “can bibliographic data be put directly onto the semantic web?” information technology & libraries 28, no. 2 (2009): 55-80. 14. l. f. spiteri, “the structure and form of folksonomy tags: the road to the public library catalog,” information technology & libraries 26, no. 3 (2007): 13–25. 15. corcho, fernández-lópez, and gómez-pérez, “methodologies, tools and languages for buildings ontologies.” 16. ieee, ieee std 610.12-1990(r2002). 17. n. f. noy and d. l. mcguinness, “ontology development 101: a guide to creating your first ontology,” 2001, stanford university, http://www-ksl.stanford.edu/people/dlm/ papers/ontology-tutorial-noy-mcguinness.pdf (accessed sept 10, 2010). 18. d. baader et al., the description logic handbook (cambridge: cambridge univ. pr., 2003). 19. world wide web consortium, daml+oil reference description, 2001, http://www.w3.org/tr/daml+oil-reference (accessed oct. 5, 2009); w3c, owl. editorial | truitt 3 w ithin the last few months, two provocative books have been published that take different approaches to the question of how we learn in the always-on, always-connected electronic environment of “screens.” while neither is specifically directed at librarians, i think both deserve to be read and discussed widely in our community. ■■ the shallows the first, the shallows: what the internet is doing to our brains (norton, 2010), by nicholas carr, is an expanded version of his article “is google making us stupid?” published in the july/august 2008 issue of atlantic monthly and discussed in this space soon after.1 carr’s arguments in the shallows will be familiar to those who read his earlier article, but they are more thoroughly developed in his book and worth summarizing here. carr’s thesis is that use of connective technology—the internet and the web—is leading to a remapping of cognitive reading and thinking skills, and a “shallowing” of these mental faculties: over the last few years i’ve had an uncomfortable sense that someone, or something, has been tinkering with my brain, remapping the neural circuitry, reprogramming the memory. . . . i’m not thinking the way i used to think. i feel it most strongly when i’m reading. i used to find it easy to immerse myself in a book or a lengthy article. . . . that’s rarely the case anymore. (5) the problem, as carr goes on to describe at some length, chronicling in detail the results of years of neurological investigations, is that the brain is “plastic.” “virtually all of our neural circuits—whether they’re involved in feeling, seeing, hearing, moving, thinking, learning, perceiving, or remembering—are subject to change.” and one of the things that is changing them the most drastically today is our growing reliance on digital information. the paradox is that as we repeat an activity—surfing the web and clicking on links, rather than engaging with linear texts, for example—chemically induced synapses cause us to want to continue the new activity, strengthening those links (34). this quality of plastic neural circuits that can be remapped, when combined with the “ecosystem of interruption technologies” of the internet and the web (e.g., in-text hyperlinks, e-mail and rss alerts, text messaging, twitter, multiple widgets, etc.) 
is resulting in what carr argues is a growing inability or unwillingness to engage with and reflect deeply upon extended text (91).2 as carr puts it, the linear, literary mind . . . [that has] been the imaginative mind of the renaissance, the rational mind of the enlightenment, the inventive mind of the industrial revolution, even the subversive mind of modernism . . . may soon be yesterday’s mind. (10) there is much more. carr offers pointed critiques of major internet players and the roles they play in facilitating and exploiting the remapping of our neural circuits. google, whose “profits are tied directly to the velocity of people’s information intake,” is to carr “in the business of distraction” (156–57). the google book initiative “shouldn’t be confused with the libraries we’ve known until now. it’s not a library of books. it’s a library of snippets. . . . the strip-mining of ‘relevant content’ replaces the slow excavation of meaning” (166). ultimately, for carr, it’s about who is controlling whom. while the internet may permit us to better perform some functions—search, for example—“it poses a threat to our integrity as human beings . . . we program our computers and thereafter they program us” (214). put another way, “the computer screen bulldozes our doubts with its bounties and conveniences. it is so much our servant that it would seem churlish to notice that it is also our master” (4). ■■ hamlet’s blackberry perhaps less familiar than carr’s work is william powers’ hamlet’s blackberry: a practical philosophy for building a good life in the digital age (harpercollins 2010). powers, a writer whose work has appeared in the washington post, the new york times, the new republic, and elsewhere, describes the influence of digital technology (or “screens,” to use his shorthand)3 and connectedness on our lives: in the last few decades, we’ve found a powerful new way to pursue more busyness: digital technology. computers and smart phones are often pitched as solutions to our stressful, overextended lives. . . . but at the same time, they link us more tightly to all the sources of our busyness. our screens are conduits for everything that keeps us hopping—mandatory and optional, worthwhile and silly. . . . marc truitteditorial: “the air is full of people” marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 4 information technology and libraries | march 2011 if not yet a general consensus, that people are coming to experience and understand these costs. finally, they also make the point that things need not continue on their present course. i can imagine that if we in libraries take carr and powers seriously, there might be significant implications for service models and collections practices. both books have been reviewed in all the usual mainstream places. remarkably though, to me—and excluding a scant few discussion list threads such as that on web4lib several years ago—i’ve seen no discussion in the usual professional venues of their implications where libraries are concerned. perhaps i’m simply not reading the “right” weblogs or discussion lists. 
i’m not under the illusion that libraries or librarians can by themselves alter our rush toward the “shallows.” still, given our eagerness to discuss how we extend the reach of “screens” in libraries—whether in the form of learning commons, wireless access, mobile-friendly websites, clearing stacks of “tree-books” in favor of e-books, etc.—would it not be reasonable to think that we should show as much concern about the consequences of such activities, and even some interest in providing possible remedial alternatives? one of my favorite library spaces in college was the linonia and brothers reading room in yale’s sterling memorial library (see a photo of the reading room at http://images.library.yale.edu/madid/oneitem.aspx ?id=1772930). its dark oak paneling, built-in bookshelves, overstuffed leather easy chairs, cozy alcoves, toasty, footwarming steam radiators, and stained-glass windows overlooking a quiet courtyard represented the epitome of the nineteenth-century “gentleman’s library” and encouraged the sort of deep reading and contemplation that are becoming so rare in our institutions today. i spent many hours there, reading, thinking, dreaming—and yes, catnapping too. i haven’t visited the “l&b” in years; i hope it is still the way i so fondly recall it. over the past few years, as we’ve considered the various aspects of the library-as-space question, we’ve created all manner of collaborative, group-focused, überconnected learning spaces. we’ve also created bookfree spaces (to say nothing of book-free “libraries”), food-friendly spaces, quiet and cell-phone-free spaces, and a host of others of which i’m sure i haven’t thought. so, in an attempt to get us thinking about what carr ’s and powers’ books might mean for libraries, here’s a crazy idea to start us off: how about a screen-free space for deep reading and contemplation? it should be very low-tech: no mobiles, no laptops, no desktops, no networks, no clickety-clack of keys, no chimes of incoming e-mail and tweets, no unearthly glow of monitors. no food, drink, or group-study areas, either. just a quiet, inviting, comfortable space for individual reading and the goal is no longer to be “in touch” but to erase the possibility of ever being out of touch. to merge, to live simultaneously with everyone, sharing every moment, every perception, thought, and action via our screens. even the places where we used to go to get away from the crowd and the burdens it imposes on us are now connected. the simple act of going out for a walk is completely different today from what it was fifteen years ago. whether you’re walking down a big-city street or in the woods outside a country town, if you’re carrying a mobile device with you, the global crowd comes along. . . . the air is full of people. 
(14–15) drawing inspiration and analogy from a list of philosophers and other historical and literary figures beginning with plato and ending with mcluhan, powers describes seven practical approaches, tools, and techniques for disconnecting from our screen-driven life: ■■ seek physical distance (plato) ■■ seek intellectual and emotional distance (seneca) ■■ hope for devices that might allow us to customize our degree of connectedness (gutenberg) ■■ consider older, low-tech tools as alternatives where possible (shakespeare via hamlet) ■■ create positive rituals (ben franklin) ■■ create a “walden zone” refuge (thoreau) ■■ be aware of and take personal control from technology by being aware of that technology (mcluhan) powers then reviews how he and his family used these techniques to regain the sense of control and depth they felt they’d lost to screens. in the past several months, i’ve tried a couple myself. i no longer carry a blackberry unless i’m traveling out of town. i avoid e-mail and the internet completely on saturdays (my “internet sabbath”). the effect of these two small and easily achieved changes has been little short of liberating, providing space to think and reflect without the distraction of always-on connectedness. walking my lab seamus has become a special pleasure! ■■ bringing libraries into the picture so, what do carr’s and powers’ theses mean for libraries, and what do they mean in particular for those of us who provide technology solutions for libraries? they remind us that there is a very real human cost to the technology of screens and always-on connectedness that have become our stock-in-trade in recent years. as well, they provide convincing evidence that there is a growing awareness, editorial | truitt 5 references and notes 1. carr’s atlantic monthly article appeared in volume 301 (july/aug. 2008) and can be found at http://www.theatlantic . c o m / m a g a z i n e / a rc h i v e / 2 0 0 8 / 0 7 / i s g o o g l e m a k i n g u s -stupid/6868/ (accessed jan. 14, 2011); my ital column on the topic is at http://www.ala.org/ala/mgrps/divs/lita/ ital/272008/2703sep/editorial_pdf.cfm (accessed jan. 14, 2011). 2. the term “ecosystem of interruption technologies” belongs to cory doctorow. 3. powers uses the term “screens” to describe “the connective digital devices that have been widely adopted in the last two decades, including desktop and notebook computers, mobile phones, e-readers, and tablets” (1). thought. would some of our patrons adopt it? i’m willing to bet that they would. do we not owe them the same commitment to service that we’ve worked so hard to provide to those who wish to be collaborative and “always-on”? absolutely. no, we can’t change the world or stop the march of the screens. but perhaps, as with powers’ “walden zone,” we can start by providing a close-at-hand safe harbor for those of our patrons seeking refuge from the “always-on” world of screens. 44 information technology and libraries | december 2007 author id box for 3 column layout column titlecommunications afghanistan digital library initiative: revitalizing an integrated library system yan han and atifa rawan this paper describes an afghanistan digital library initiative of building an integrated library system (ils) for afghanistan universities and colleges based on open-source software. 
as one of the goals of the afghan equality digital libraries alliance, the authors applied systems analysis approach, evaluated different open-source ilss, and customized the selected software to accommodate users’ needs. improvements include arabic and persian language support, user interface changes, call number label printing, and isbn-13 support. to our knowledge, this ils is the first at a large academic library running on open-source software. the last quarter­century has been devastating for afghanistan, with an uninterrupted period of inva­ sions, civil wars, and oppressive regimes. “since 1979, the education system was virtually destroyed on all levels. schools and colleges were closed, looted, or physically reduced; student bodies and faculties were emptied by war, migration, and eco­ nomic hardship; and libraries were gutted.”1 kabul university (ku), for example, was largely demolished by 1994 and completely closed down in 1998. it is universally recognized that afghanistan desperately needs trained faculty, teachers, librarians, and staff. the current state of the higher education system is one of dramatic destruction and deteriora­ tion. based on rawan’s assessments of ku library, most of its collections were damaged or destroyed. she found that there were approximately 60,000 to 70,000 books in english, 2,000 to 3,000 books in persian, and 2,000 theses in persian. none of these collections have manual or online catalog records. the library has eigh­ teen staff members, but not all are fully trained in library activities.2 rebuilding the educational infra­ structure in afghanistan is essential. afghan equality digital libraries alliance the university of arizona (ua) library has been involved in rebuilding academic libraries in afghanistan since april 2002. in 2005, we were invited to be part of the digital libraries alliance (dla) as part of the afghan equality alliances: 21st century universities for afghanistan initiative funded by the usaid and washington state university. dla’s goal is to build the capacity of afghan libraries and librarians to work with open source digital libraries platforms; and to provide and enhance access to schol­ arly information resources and open content that all afghanistan univer­ sities can share. revitalizing the afghan ils an integrated library system (ils) usually includes several critical com­ ponents, such as acquisitions, cat­ aloging, catalog (search and find), circulation, and patron management. traditionally it has been the center of any library. recent developments in digital libraries have resulted in dis­ tributed systems in libraries, and the ils is treated as one of many digital library systems. it still is critical to have a centralized ils to provide a primary way to access library­owned materials for afghanistan universi­ ties and colleges. other services, such as interlibrary loan and other digital library systems, can be further devel­ oped to extend libraries’ services to users and communities. the ua library is working collab­ oratively with other dla members, including universities around the world and universities in afghanistan. one of the goals is to develop a digital library environment, includ­ ing a centralized ils for four aca­ demic universities in kabul (kabul university, polytechnic university, kabul medical university, and kabul education university). in the future, the ils will include other regional institutions throughout afghanistan. 
the ils will support 30,000 students and 2,000 faculty in afghan universi­ ties and colleges. overview of the ils market currently the ils market is primar­ ily dominated by commercial sys­ tems, such as innovative interface, endeavor, and sirsi. compared with other computing areas, open­source systems in ils are immature and limited, as there are only a few prod­ ucts available, and most of them do not have the full features of an ils. however, they are providing a valu­ able alternative to those costly com­ mercial systems. based on the availability of exist­ ing funding, experiences with com­ mercial vendors, and consideration of vendor supports and future direc­ tions, the authors decided to build a digital library infrastructure with the open concept (open access, open source, and open standards). the decision is widely influenced by glo­ balization, open access, open source, open standards, and increasing user expectations. at the same time, the decision gives us an opportunity to develop and integrate new tools and services for libraries as suggested by the university of california.3 koha is probably the most renowned open­source ils. it is yan han (hany@u.library.arizona.edu) is systems librarian and atifa rawan (rawana@u.library.arizona.edu) is librarian at the university of arizona libraries, tucson. afghanistan digital library initiative | han and rawan 45 a full­featured ils, developed in new zealand and first deployed in horowhenua library trust in 2000. so far koha has been running in a few public and special libraries. the underlying architecture is the linux, apache, mysql, and perl (lamp) stack. building on a simi­ lar lamp (linux, apache, mysql, and php) architecture, openbiblio has a relatively short history, releas­ ing its first beta 0.1.0 version in 2002 and currently in beta 0.5.1 version. webils is an open­source ils based on unesco’s cds/isis database, developed by the institute for computer and information engineering in poland. the software has some ils features, including cataloging, catalog (search and find), loan, and report modules. weblis must run on windows and window­ based web servers, such as xitami/ microsoft iis and isis database. gnuteca, another open­source ils widely deployed in south america universities, was developed in brazil. as with webils, it has some ils features, such as cataloging, cata­ log, and loan; however, the software interface is written in portuguese, which presents a language barrier for u.s. and afghanistan users. the paper open source integrated library systems provides a good overview of other systems.4 systems analysis the authors adopted systems analy­ sis by taking account of afghan col­ lections, users’ needs, and systems functionality required to perform essential library operations. koha was chosen as the base software, due to its functionality, maturity, and support. some of the reasons are: ■ the software architecture is open­ source lamp, which is popular, stable, and predominant. ■ our staff have skills in these open software systems. ■ it is a full­featured open­source ils. certain components, such as multiple branch support and user management, are critical. ■ two large public libraries serv­ ing population of 30,000 users in new zealand and united states have been running their ils on koha for a few years. the soft­ ware is stable, and most bugs have been fixed. ■ koha has a mailing list that is used by koha developers and users as a communication tool to ask and answer questions. 
kabul universities have com­ puter science faculty and students who have the capacity to participate in the development. due to working schedules and locations, we prefer to develop and maintain the system in the ua library. the technical project team consists of three people: yan han, who is responsible for manag­ ing the overall implementation and development in the open source ils system; one part­time (twenty hours per week) student developer whose major task is to develop and man­ age source code; and a temporary student (ten hours per week for two months) responsible for translating english to farsi and dari. testing tasks, such as unit testing and sys­ tem testing, are shared by all mem­ bers of the team. major challenges farsi and dari languages support koha version 2.2 cannot correctly handle east asian language records, including farsi and dari records. supporting persian, farsi, and dari records is a very important require­ ment, as these afghan universities have quite a few persian and dari materials. koha generates a web­ based graphical user interface (gui) through perl included templates that use a html meta tag with western character set (iso­8559­1) to encode characters. browsers such as internet explorer and firefox use the meta tag to decode characters with a predefined character set. therefore, other characters, such as arabic and persian as well as chinese would not be displayed correctly. the perl tem­ plates were identified and modified to allow characters to be encoded in unicode, and this solved the prob­ lem. persian and dari characters can be entered into the cataloging module and displayed correctly in the gui. however, we should understand the limitations of this approach when dealing with other east asian character sets, such as chinese characters. only frequently used characters can be represented. a project of academia sinica is one of the efforts to deal with 65,000 unique chinese characters.5 farsi/dari gui as the project is designed for local afghanistan users, there is a need for a farsi and dari gui. the current version of koha does not have such an interface, and we decided to create a new farsi/dari gui for the opac. the koha system’s internal structure is logically arranged; therefore, our development work in translation is not difficult to manage. the transla­ tion student translates english words in perl template files into farsi and dari. at the same time he works with the developer to make sure it is dis­ played correctly in the opac. figure 1 is the screenshot of the gui. other improvements we further developed a spine label printing module and integrated the module into the ils, as there is no such function provided. the module allows library staff to print one or more standardized labels (1.5 inches high by 1 inch wide) with oclc formats on gaylord lsl 01 paper, which has fifty­six labels per sheet. 46 information technology and libraries | december 2007 lstaff can select an appropriate label slot to start and print out his or her choices of labels through the web preview feature. this feature eases library staff operations and provides cost savings for label papers. isbn­13 replaced isbn­10 after january 1, 2007, and any ils has to be able to handle the new isbn­13. our ils has been improved to han­ dle both isbn standards. thanks to koha’s delegation of the gui and major functionality, interfaces such as fonts and web pages can be modi­ fied through the templates and css. 
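as a rough illustration of the character-encoding change described above for the farsi and dari records, the kind of html meta tag that limits a template to western characters, and the unicode declaration that replaces it, are sketched below; this is an assumption about the general form of the koha 2.2 template markup rather than a copy of the project's files.

  <!-- before: the template declares a western (latin-1) character set,
       so persian, dari, and arabic script cannot be displayed correctly -->
  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

  <!-- after: declaring unicode (utf-8) lets the browser decode and
       display persian, dari, and arabic characters in the opac -->
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

for the fix to work end to end, the declared character set must of course match the encoding in which the records themselves are stored and delivered, which is why the templates and the cataloging module were adjusted together.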
a z39.50 service has been configured to allow users to search other librar­ ies’ catalogs. hardware and software support afghanistan is still developing its fun­ damental infrastructure: electricity, transportation, and communication. when considering buying hardware for the ils, difficult issues, such as server services and computer parts, have to be solved. even international it companies, such as dell, hp, and ibm, have very limited services and support in afghanistan. regarding software and system support, our strategies are to: ■ maintain and develop the open source software at the ua library by the project team; ■ run one server in kabul, afghanistan, administrated by a local system administrator. ■ run one server in the ua library administrated by the library’s system administrator. cost we estimated our overall cost for building the open­source system is low and reasonable. the system is currently run­ ning on a dell 2800 server ($5,000 for 3ghz cpu, 4gb ram, and five 73gb hard drives), kernel built debian linux (free), apache 2 (free), mysql (free), and perl (free). han spends four hours per week for coor­ dination, communication, and man­ agement of the project. the student developer works twenty hours per week for development and mainte­ nance, while the translation student will spend one hundred hours for translation. conclusion revitalizing an afghan ils is the first important goal to build digital library initiatives for the afghanistan higher education system. by under­ standing afghan university librar­ ies, collections, and users, the ua library is working with other dla members to build the open source ils. the new farsi and dari user interface, language support, and other improvements have been made to meet needs of afghan uni­ versities and colleges. the cost of using and developing existing open source software is reasonable. acknowledgments we thank usaid, washington state university, and other dla mem­ bers for providing support. this work was supported by usaid and washington state university. references and notes 1. nazif sharani et. al., conference transcription, conference on strate­ gic planning of higher education for afghanistan, 2002, indiana university, bloomington, oct. 6–7. 2. atifa rawan, transformation in afghanistan: rebuilding libraries, paper presented at azla conference, mesa, ariz., oct. 11–13, 2005. 3. the university of california libraries, rethinking how we provide bibliographic services for the university of california, 2005, http://libraries.univer­ sityofcalifornia.edu/sopag/bstf/final. pdf. 4. eric anctil and jamshid beheshti, open source integrated library systems: an overview, 2004, www.anctil.org/users/ eric/oss4ils.html (accessed nov. 5, 2006). 5. derming juang et al., “resolving the unencoded character problem for chinese digital libraries,” proceedings of the 5th acm/ieee-cs joint conference on digital libraries, jcdl 2005, denver (june 7–11, 2005): 311–19 (new york: acm pr., 2005). figure 1: afghanistan academic libraries union catalog in farsi/dari lita cover 2, cover 3, cover 4 index to advertisers a s i approach the end of my tenure as ital edi­ tor, i reflect on the many lita members who have not submitted articles for possible publica­ tion in our journal. i am especially mindful of the smaller number who have promised or hinted or implied that they intended to or might submit articles. 
admittedly, some of them may have done so because i asked them, and their replies to me were the polite ones that one expects of the honorable members of the library and information technology association of the american library association. librarians are as individuals almost all or almost always polite in their professional discourse. pondering these potential authors, particularly the smaller number, i conjured a mental picture of a fictional, male, potential ital author. i don’t know why my fic­ tional potential author was male—it may be because more males than females are members of that group; it may be because i’m a male; or it may be unconscious sex­ ism. i’m not very self­analytic. my mental picture of this fictional male potential author saw him driving home from his place of employ­ ment after having an after­work half gallon of rum when, into the picture, a rattlesnake crawled on to the seat of his car and bit him on the scrotum. lucky him: he was, after all, a figment of my imagina­ tion. (any resemblance between my fictional author and a real potential author is purely coincidental.) lucky me: we all know that such an incident is not unthinkable in library land. lucky lita: it is unlikely that any member will cancel his or her membership or any subscriber, his, her, or its subscription because the technical term “scro­ tum” found its way into my editorial. ital is, after all, a technology journal, and members and readers ought to be offended if our journal abjures technical terminology. likewise they should be offended if our articles discuss library technology issues misusing technical terms or concepts, or confusing technical issues with policy issues, or stating technology problems or issues in the title or abstract or introduction then omitting any mention of said problems until the final paragraph(s). ital referees are quite diligent in questioning authors when they think terminology has been used loosely. their close readings of manuscripts have caught more than one author mislabeling policies related to the uses of informa­ tion technologies as if the policies were themselves tech­ nical conundrums. most commonly, they have required authors who state major theses or technology problems at the beginnings of their manuscripts, then all but ignore these until the final paragraphs, to rewrite sections of their manuscripts to emphasize the often interesting questions raised at the outset. what, pray tell, is the editor trying to communicate to readers? two things, primarily. first, i have been following with interest the several heated discussions that have taken place on lita­l for the past number of months. sometimes, the idea of the traditional quarterly scholarly/professional journal in a field changing so rapidly may seem almost quaint. a typical ital article is five months old when it is pub­ lished. a typical discussion thread on lita­l happens in “real time” and lasts two days at most. a small number of participants raise and “solve” an issue in less than a half dozen posts. a few times, however, a question asked or a comment posted by a lita member has led to a flurry of irrelevant postings, or, possibly worse, sustained bomb­ ing runs from at least two opposing camps that have left some members begging to be removed from the list until the all clear signal has been sounded. i’ve read all of these, and i could not help but won­ der, what if ital accepted manuscripts as short as lita­l postings? what would our referees do? 
i suspect, for our readers’ sakes, most would be rejected. authors whose manuscripts are rejected receive the comments made by the referees and me explaining why we cannot accept their submissions. the most frequent reason is that they are out of scope, irrelevant to the purposes of lita. when someone posts a technology question to lita­l that gener­ ates responses advising the questioner that implementing the technology in question is bad policy, the responses are, from an editor’s point of view, out of scope. how many lita members have authority—real authority—to set policy for their libraries? a second “popular” reason for rejections is that the manuscripts pose “false” problems that may be technological but that are not technologies that are within the “control” of libraries. these are out of scope in a different manner. third, some manuscripts do not pass the “so what” test. some days i wish that lita­l responders would referee, honestly, their own responses for their relevance to the questions or issues or so­whatness and to the membership. second, and more importantly to me, lita members, whether or not your bodies include the part that we all have come to know and defend, do you have the “­” to send your ital editor a manuscript to be chewed upon not by rattlesnakes but by the skilled professionals who are your ital editorial board members and referees? i hope (and do i dare beg again?) so. your journal will not suffer quaintness unless you make it so. editorial: the virtues of deliberation john webb john webb (jwebb@wsu.edu) is a librarian emeritus, washington state university, and editor of information technology and libraries. editorial | webb 3 bridging the gap: self-directed staff technology training | quinney, smith, and galbraith 205 kayla l. quinney, sara d. smith, and quinn galbraith bridging the gap: self-directed staff technology training of hbll patrons. as anticipated, results indicated that students frequently use text messages, social networks, blogs, etc., while fewer staff members use these technologies. for example, 42 percent of the students reported that they write a blog, while only 26 percent of staff and faculty do so. also, 74 percent of the students and only 30 percent of staff and faculty indicated that they belonged to a social network. after concluding that staff and faculty were not as connected as their student patrons are to technology, library administration developed the technology challenge to help close this gap. the technology challenge was a self-directed training program requiring participants to explore new technology on their own by spending at least fifteen minutes each day learning new technology skills. this program was successful in promoting lifelong learning by teaching technology applicable to the work and home lives of hbll employees. we will first discuss literature that shows how technology training can help academic librarians connect with student patrons, and then we will describe the technology challenge and demonstrate how it aligns with the principles of self-directed learning. the training will be evaluated by an analysis of the results of two surveys given to participants before and after the technology challenge was implemented. 
■■ library 2.0 and “librarian 2.0” hbll wasn’t the first to notice the gap between librarians and students, mcdonald and thomas noted that “gaps have materialized,” and library technology does not always “provide certain services, resources, or possibilities expected by emerging user populations like the millennial generation.”1 college students, who grew up with technology, are “digital natives,” while librarians, many having learned technology later in life, are “digital immigrants.”2 the “digital natives” belong to the millennial generation, described by shish and allen as a generation of “learners raised on and confirmed experts in the latest, fastest, coolest, greatest, newest electronic technologies.”3 according to sweeny, when students use libraries, they expect the same “flexibility, geographic independence, speed of response, time shifting, interactivity, multitasking, and time savings” provided by the technology they use daily.4 students are undergraduates, as members of the millennial generation, are proficient in web 2.0 technology and expect to apply these technologies to their coursework—including scholarly research. to remain relevant, academic libraries need to provide the technology that student patrons expect, and academic librarians need to learn and use these technologies themselves. because leaders at the harold b. lee library of brigham young university (hbll) perceived a gap in technology use between students and their staff and faculty, they developed and implemented the technology challenge, a self-directed technology training program that rewarded employees for exploring technology daily. the purpose of this paper is to examine the technology challenge through an analysis of results of surveys given to participants before and after the technology challenge was implemented. the program will also be evaluated in terms of the adult learning theories of andragogy and selfdirected learning. hbll found that a self-directed approach fosters technology skills that librarians need to best serve students. in addition, it promotes lifelong learning habits to keep abreast of emerging technologies. this paper offers some insights and methods that could be applied in other libraries, the most valuable of which is the use of self-directed and andragogical training methods to help academic libraries better integrate modern technologies. l eaders at the harold b. lee library of brigham young university (hbll) began to suspect a need for technology training when employees were asked during a meeting if they owned an ipod or mp3 player. out of the twenty attendees, only two raised their hands—one of whom worked for it. perceiving a technology gap between hbll employees and student patrons, library leaders began investigating how they could help faculty and staff become more proficient with the technologies that student patrons use daily. to best serve student patrons, academic librarians need to be proficient with the technologies that student patrons expect. hbll found that a self-directed learning approach to staff technology training not only fosters technology skills, but also promotes lifelong learning habits. to further examine the technology gap between librarians and students, the hbll staff, faculty, and student employees were given a survey designed to explore generational differences in media and technology use. student employees were surveyed as representatives of the larger student body, which composes the majority kayla l. 
quinney (quinster27@gmail.com) is research specialist, sara d. smith (saradsmith@gmail.com) is research specialist, and quinn galbraith (quinn_galbraith@byu.edu) is library human resource training and development manager, brigham young university library, provo, utah. 206 information technology and libraries | december 2010 2.0,” a program that “focuses on self-exploration and encourages staff to learn about new technologies on their own.”24 learning 2.0 encouraged library staff to explore web 2.0 tools by completing twenty-three exercises involving new technologies. plcmc’s program has been replicated by more than 250 libraries and organizations worldwide,25 and several libraries have written about their experiences, including academic26 and public libraries.27 these programs—and the technology challenge implemented by hbll—integrate the theories of adult learning. in the 1960s and 1970s, malcolm knowles introduced the theory of andragogy to describe the way adults learn.28 knowles described adults as learners who (1) are self-directed, (2) use their experiences as a resource for learning, (3) learn more readily when they experience a need to know, (4) seek immediate application of knowledge, and (5) are best motivated by internal rather than external factors.29 the theory and practice of self-directed learning grew out of the first learning characteristic and assumes that adults prefer self-direction in determining and achieving learning goals, and therefore learners exercise independence in determining how and what they learn.30 these theories have had a considerable effect on adult education practice31 and employee development programs.32 when adults participate in trainings that align with the assumptions of andragogy, they are more likely to retain and apply what they have learned.33 ■■ the technology challenge hbll’s technology challenge is similar to learning 2.0 in that it encourages self-directed exploration of web 2.0 technologies, but it differs in that participants were even more self-directed in exploration and that they were asked to participate daily. these features encouraged more self-directed learning in areas of participant interest as well as habit formation. it is not our purpose to critique learning 2.0, but to provide some evidence and analysis to demonstrate the success of hands-on, self-directed training approaches and to suggest other ways for libraries to apply self-directed learning to technology training. the technology challenge was implemented from june 2007 to january 2008. hbll staff included 175 full-time employees, 96 of whom participated in the challenge. (the student employees were not involved.) participants were asked to spend fifteen minutes each day learning a new technology skill. hbll leaders used rewards to make the program enjoyable and to motivate participation: for each minute spent learning technology, participants earned one point, and when one thousand points were earned, the participant would receive a gift certificate to the campus bookstore. staff and faculty participated and tracked their progress through an online masters of “informal learning”; that is, they are accustomed to easily and quickly gathering information relevant to their lives from the internet and from friends. shish and allen claimed that millennials prefer “interactive, hyper-linked multimedia over the traditional static, textoriented printed items. 
they want a sense of control; they need experiential and collaborative approaches rather than formal, librarian-guided, library-centric services.”5 these students arrive on campus expecting “to handle the challenges of scholarly research” using similar methods and technologies.6 interactive technologies such as blogs, wikis, streaming media applications, and social networks, are referred to as “web 2.0.” abram argued that web 2.0 technology “could be useful in an enterprise, institutional research, or community environment, and could be driven or introduced by the library.”7 “library 2.0” is a concept referring to a library’s integration of these technologies; it is essentially the use of “web 2.0 opportunities in a library environment.”8 manesss described library 2.0 is user-centered, social, innovative, and provider of a multimedia experiences.9 it is a community that “blurs the line between librarian and patron, creator and consumer, authority and novice.”10 libraries have been using web 2.0 technology such as blogs,11 wikis,12 and social networks13 to better serve and connect with patrons. blogs allow libraries to “provide news, information and links to internet resources,”14 and wikis create online study groups15 and “build a shared knowledge repository.”16 social networks can be particularly useful in connecting with undergraduate students: millennials use technology to collaborate and make collective decisions,17 and libraries can capitalize on this tendency by using social networks, which for students would mean, as bates argues, “an informational equivalent of the reliance on one’s facebook friends.”18 students expect library 2.0—and as libraries integrate new technologies, the staff and faculty of academic libraries need to become “librarian 2.0.” according to abram, librarian 2.0 understands users and their needs “in terms of their goals and aspirations, workflows, social and content needs, and more. librarian 2.0 is where the user is, when the user is there.”19 the modern library user “needs the experience of the web . . . to learn and succeed,”20 and the modern librarian can help patrons transfer technology skills to information seeking. librarian 2.0 is prepared to help patrons familiar with web 2.0 to “leverage these [technologies] to make a difference in reaching their goals.”21 therefore staff and faculty “must become adept at key learning technologies themselves.”22 stephen abram asked, “are the expectations of our users increasing faster than our ability to adapt?”23 and this same concern motivated hbll and other institutions to initiate staff technology training programs. the public library of charlotte and mecklenburg county of north carolina (plcmc) developed “learning bridging the gap: self-directed staff technology training | quinney, smith, and galbraith 207 their ability to learn and use technology. to be eligible to receive the gift card, participants were required to take this exit survey. sixty-four participants, all of whom had met or exceeded the thousand-point goal, chose to complete this survey, so the results of this survey represent the experiences of 66 percent of the participants. of course, if those who had not completed the technology challenge had taken the survey the results may have been different, but the results do show how those who chose to actively participate reacted to this training program. the survey included both quantifiable and open-ended questions (see appendix b for survey results and a list of the open-ended questions). 
the survey results, along with an analysis of the structure of the challenge itself, demonstrates that the program aligns with knowles’s five principles of andragogy to successfully help employees develop both technology skills and learning habits. self-direction the technology challenge was self-directed because it gave participants the flexibility to select which tasks and challenges they would complete. garrison wrote that in a self-directed program, “learners should be provided with choices of how they wish to proactively carry out the learning process. material resources should be available, approaches suggested, flexible pacing accommodated, and questioning and feedback provided when needed.”34 hbll provided a variety of challenges and training sessions related to various technologies. technology challenge participants were given the independence to choose which learning methods to use, including which training sessions to attend and which challenges to complete. according to the exit survey, the most popular training methods were small, instructor-led groups, followed by self-learning through reading books and articles. group training sessions were organized by hbll leadership and addressed topics such as microsoft office, rss feeds, computer organization skills, and multimedia software. other learning methods included web tutorials, dvds, large group discussions, and one-on-one tutoring. the group training classes preferred by hbll employees may be considered more teacher-directed than self-directed, but the technology challenge was self-directed as a whole in that learners were given the opportunity to choose what they learned and how they learned it. the structure of the technology challenge allowed participants to set their own pace. staff and faculty were given several months to complete the challenge and were responsible to pace themselves. on the exit survey, one participant commented: “if i didn’t get anything done one week, there wasn’t any pressure.” another enjoyed flexibility in deciding when and where to complete the tasks: “i liked being able to do the challenge anywhere. when i had a few minutes between appointments, classes, board game called “techopoly.” participation was voluntary, and staff and faculty were free to choose which tasks and challenges they would complete. tasks fell into one of four categories: software, hardware, library technology, and the internet. participants were required to complete one hundred points in each category, but beyond that, were able to decide how to spend their time. examples of tasks included attending workshops, exploring online tutorials, and reading books or articles about a relevant topic. for each hundred points earned, participants could complete a mini-challenge, which included reading blogs or e-books, listening to podcasts, or creating a photo cd (see appendix a for a more complete list). participants who completed fifteen out of twenty possible challenges were entered into a drawing for another gift certificate. before beginning the challenge, all participants were surveyed about their current use of technology. on this survey, they indicated that they were most uncomfortable with blogs, wikis, image editors, and music players. these results provided a focus for technology challenge trainings and mini-challenges. while not all of these technologies may apply directly to their jobs, 60 percent indicated that they were interested in learning them. 
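the scoring rules described above (one point per minute of learning, at least one hundred points in each of the four categories, a thousand-point goal for the gift certificate, a mini-challenge available for each hundred points, and drawing eligibility after fifteen of twenty mini-challenges) are simple enough to express as a small tally. the sketch below is purely illustrative and hypothetical; the class and method names are invented for this example and are not taken from the original techopoly site.

# illustrative sketch of the technology challenge scoring rules described above;
# names and structure are hypothetical, not the original techopoly implementation.
CATEGORIES = ("software", "hardware", "library technology", "internet")

class ChallengeTally:
    def __init__(self):
        # one point is earned per minute of learning, tracked per category
        self.points = {c: 0 for c in CATEGORIES}
        self.mini_challenges_done = 0

    def log_minutes(self, category, minutes):
        if category not in self.points:
            raise ValueError(f"unknown category: {category}")
        self.points[category] += minutes  # 1 point per minute

    @property
    def total(self):
        return sum(self.points.values())

    def earned_gift_certificate(self):
        # the thousand-point goal, with at least one hundred points in each category
        return self.total >= 1000 and all(p >= 100 for p in self.points.values())

    def mini_challenges_available(self):
        # one mini-challenge could be completed for each hundred points earned
        return self.total // 100

    def in_drawing(self):
        # completing fifteen of the twenty mini-challenges earned entry into a drawing
        return self.mini_challenges_done >= 15

tally = ChallengeTally()
tally.log_minutes("software", 300)
tally.log_minutes("internet", 450)
tally.log_minutes("hardware", 150)
tally.log_minutes("library technology", 120)
print(tally.total, tally.earned_gift_certificate(), tally.mini_challenges_available())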
forty-four percent reported that time was the greatest impediment to learning new technology; therefore the daily fifteen-minute requirement was introduced with the hope that it was small enough to be a good incentive to participate but substantial enough to promote habit formation and allow employees enough time to familiarize themselves with the technology. although some productivity may have been lost due to the time requirement (especially in cases where participants may have spent more than the required time), library leaders felt that technology training was an investment in hbll employees and that, at least for a few months, it was worth any potential loss in productivity. because participants could chose how and when they learned technology, they could incorporate the challenge into their work schedules according to their own needs, interests, and time constraints. of ninety-six participants, sixty-six reached or exceeded the thousand-point goal, and eight participants earned more than two thousand points. ten participants earned between five hundred and one thousand points, and another six earned between one hundred and five hundred. although not all participants completed the challenge, most were involved to some extent in learning technology during this time. ■■ the technology challenge and adult learning after finishing the challenge, participants took an exit survey to evaluate the experience and report changes in 208 information technology and libraries | december 2010 were willing, even excited, to learn technology skills: 37 percent “agreed” and 60 percent “strongly agreed” that they were interested in learning new technology. their desire to learn was cultivated by the survey itself, which helped them recognize and focus on this interest, and the challenge provided a way for employees to channel their desire to learn technology. immediate application learners need to see an opportunity for immediate application of their knowledge: ota et al. explained that “they want to learn what will help them perform tasks or deal with problems they confront in everyday situations and those presented in the context of application to real life.”39 because of the need for immediate application, the technology challenge encouraged staff and faculty to learn technology skills directly related to their jobs—as well as technology that is applicable to their personal or home lives. hbll leaders hoped that as staff became more comfortable with technology in general, they would be motivated to incorporate more complex technologies into their work. here is one example of how the technology challenge catered to adult learners’ need to apply what they learn: before designing the challenge, hbll held a training session to teach employees the basics of photoshop. even though attendees were on the clock, the turnout was discouraging. library leaders knew they needed to try something new. in the revamped photoshop workshop that was offered as part of the technology challenge, attendees brought family photos or film and learned how to edit and experiment with their photos and burn dvd copies. this time, the class was full: the same computer program that before drew only a few people was now exciting and useful. focusing on employees’ personal interests in learning new software, instead of just on teaching the software, better motivated staff and faculty to attend the training. 
motivation as stated by ota et al., adults are motivated by external factors but are usually more motivated by internal factors: “adults are responsive to some external motivators (e.g., better job, higher salaries), but the most potent motivators are internal (e.g., desire for increased job satisfaction, self-esteem).”40 on the entrance survey, participants were given the opportunity to comment on their reasons for participating in the challenge. the gift card, an example of an external motivation, was frequently cited as an important motivation. but many also commented on more internal motivations: “it’s important to my job to stay proficient in new technologies and i’d like to stay current”; “i feel that i need to be up-to-date or meetings i could complete some of the challenges.” employees could also determine how much or how little of the challenge they wanted to complete: many reached well over the thousand-point goal, while others fell a little short. participants began at different skill levels, and thus could use the time and resources allotted to explore basic or more advanced topics according to their needs and interests. garrison had noted the importance of providing resources and feedback in self-directed learning.35 the techopoly website provided resources (such as specific blogs or websites to visit) and instructions on how to use and access technology within the library. hbll also hired a student to assist staff and faculty one-on-one by explaining answers to their questions about technology and teaching other skills he thought may be relevant to their initial problem. the entrance and exit surveys provided opportunities for self-reflection and self-evaluation by questioning the participants’ use of technology before the challenge and asking them to evaluate their proficiency in technology after the challenge. use of experience the use of experience as a source of learning is important to adult learners: “the richest resource for learning resides in adults themselves; therefore, tapping into their experiences through experiential techniques (discussions, simulations, problem-solving activities, or case methods) is beneficial.”36 the small-group discussions and one-onone problem solving made available to hbll employees certainly fall into these categories. small-group classes are one of the best ways to encourage adults to share and validate their experiences, and doing so increases retention and application of new information.37 the trainings and challenges encouraged participants to make use of their work and personal experiences by connecting the topic to work or home application. for example, one session discussed how blogs relate to libraries, and another helped participants learn adobe photoshop skills by editing personal photographs. need to know adult learners are more successful when they desire and recognize a need for new knowledge or skills. the role of a trainer is to help learners recognize this “need to know” by “mak[ing] a case for the value of learning.”38 hbll used the generational survey and presurvey to develop a need and desire to learn. the results of the generational survey, which demonstrated a gap in technology use between librarians and students, were presented and discussed at a meeting held before the initiation of the technology challenge to help staff and faculty understand why it was important to learn 2.0 technology. 
results of the presurvey showed that staff and faculty bridging the gap: self-directed staff technology training | quinney, smith, and galbraith 209 statistical reports or working with colleagues from other libraries.” ■■ “i learned how to set up a server that i now maintain on a semi-regular basis. i learned a lot about sfx and have learned some perl programming language as well that i use in my job daily as i maintain sfx.” ■■ “the new oclc client was probably the most significant. i spent a couple of days in an online class learning to customize the client, and i use what i learned there every single day.” ■■ “i use google docs frequently for one of the projects i am now working on.” participants also indicated weaknesses in the technology challenge. almost 20 percent of those who completed the challenge reported that it was too easy. this is a valid point—the challenge was designed to be easy so as not to intimidate staff or faculty who are less familiar with technology. it is important to note that these comments came from those who completed the challenge—other participants may have found the tasks and mini-challenges more difficult. the goal was to provide an introduction to web 2.0, not to train experts. however, a greater range of tasks and challenges could be provided in the future to allow staff and faculty more selfdirection in selecting goals relevant to their experience. to encourage staff and faculty to attend sponsored training sessions as part of the challenge, hbll leaders decided to double points for time spent at these classes. this certainly encouraged participation, but it lead to “point inflation”—perhaps being one reason why so many reported that the challenge was too easy to complete. the doubling of points may also have encouraged staff to spend more time in workshops and less time practicing or applying the skills learned. a possible solution would be offering 1.5 points, or offering a set number of points for attendance instead of counting per minute. it also may have been informative for purpose of analysis to have surveyed both those who did not complete the challenge as well as those who chose not to participate. because the presurvey indicated that time was the biggest deterrent to learning and incorporating new technology, we assume that many of those who did not participate or who did not complete the challenge felt that they did not have enough time to do so. there is definitely potential for further investigation into why library staff would not want to participate in a technology training program, what would motivate them to participate, and how we could redesign the technology challenge to make it more appealing to all of our staff and faculty. several library employees have requested that hbll sponsor another technology challenge program. because of the success of the first and because of continuing interest in technology training, we plan to do so in the future. we will make changes and adjustments according to the on technology in order to effectively help patrons”; “to identify and become comfortable with new technologies that will make my work more efficient, more presentable, and more accurate.” ■■ lifelong learning staff and faculty responded favorably to the training. none of the participants who took the exit survey disliked the challenge; 34 percent even reported that they strongly liked it. 
ninety-five percent reported that they enjoyed the process of learning new technology, and 100 percent reported that they were willing to participate in another technology challenge—thus suggesting success in the goal of encouraging lifelong technology learning. the exit survey results indicate that after completing the challenge, staff and faculty are more motivated to continue learning—which is exactly what hbll leaders hoped to accomplish. eighty-nine percent of the participants reported that their desire to learn new technology had increased, and 69 percent reported that they are now able to learn new technology faster after completing the technology challenge. ninety-seven percent claimed that they were more likely to incorporate new technology into home or work use, and 98 percent said they recognized the importance of staying on top of emerging technologies. participants commented that the training increased their desire to learn. one observed, “i often need a challenge to get motivated to do something new,” and another participant reported feeling “a little more comfortable trying new things out.” the exit survey asked participants to indicate how they now use technology. one employee keeps a blog for her daughter’s dance company, and another said, “i’m on my way to a full-blown googlereader addiction.” another participant applied these new skills at home: “i’m not so afraid of exploring the computer and other software programs. i even recently bought a computer for my own personal use at home.” the technology challenge was also successful in helping employees better serve patrons: “i can now better direct patrons to services that i would otherwise not have known about, such as streaming audio and video and e-book readers.” another participant felt better connected to student patrons: “i understand the students better and the things they use on a daily basis.” staff and faculty also found their new skills applicable to work beyond patron interaction, and many listed specific examples of how they now use technology at work: ■■ “i have attended a few microsoft office classes that have helped me tremendously in doing my work more efficiently, whether it is for preparing monthly 210 information technology and libraries | december 2010 2. richard t. sweeny, “reinventing library buildings and services for the millennial generation,” library administration & management 19, no. 4 (2005): 170. 3. win shish and martha allen, “working with generationd: adopting and adapting to cultural learning and change,” library management 28, no. 1/2 (2006): 89. 4. sweeney, “reinventing library buildings,” 170. 5. shish and allen, “working with generation-d,” 96. 6. ibid., 98. 7. stephen abram, “social libraries: the librarian 2.0 pheonomenon,” library resources & technical services 52, no. 2 (2008): 21. 8. ibid. 9. jack m. maness “library 2.0 theory: web 2.0 and its implications for libraries,” webology 3, no. 2 (2006), http:// www.webology.ir/2006/v3n2/a25.html?q=link:webology.ir/ (accessed jan. 8, 2010). 10. ibid., under “blogs and wikis,” para. 4. 11. laurel ann clyde, “library weblogs,” library management 22, no. 4/5 (2004): 183–89; maness, “library 2.0. theory.” 12. see matthew m. bejune, “wikis in libraries,” information technology & libraries 26, no. 3 (2007): 26–38 ; darlene fichter, “the many forms of e-collaboration: blogs, wikis, portals, groupware, discussion boards, and instant messaging,” online: exploring technology & resources for information professionals 29, no. 
4 (2005): 48–50; maness, “library 2.0 theory.” 13. mary ellen bates, “can i facebook that?” online: exploring technology and resources for information professionals 31, no. 5 (2007): 64; sarah elizabeth miller and lauren a. jensen, “connecting and communicating with students on facebook,” computers in libraries 27, no. 8 (2007): 18–22. 14. clyde, “library weblogs,” 183. 15. maness, “library 2.0 theory.” 16. fichter, “many forms of e-collaboration,” 50. 17. sweeney, “reinventing library buildings”; bates, “can i facebook that?” 18. bates, “can i facebook that?” 64. 19. abram, “social libraries,” 21. 20. ibid., 20. 21. ibid., 21. 22. shish and allen, “working with generation-d,” 90. 23. abram, “social libraries,” 20. 24. helene blowers and lori reed, “the c’s of our sea change: plans for training staff, from core competencies to learning 2.0,” computers in libraries 27, no. 2 (2007): 11. 25. helene blowers, learning 2.0, 2007, http://plcmclearning .blogspot.com (accessed jan. 8, 2010). 26. for examples, see ilana kingsley and karen jensen, “learning 2.0: a tool for staff training at the university of alaska fairbanks rasmuson,” the electronic journal of academic & special librarianship 12, no. 1 (2009), http://southernlibrarianship.icaap.org/content/v10n01/kingsley_i01.html (accessed jan. 8, 2010); beverly simmons, “learning (2.0) to be a social library,” tennessee libraries 58, no. 2 (2008): 1–8. 27. for examples, see christine mackenzie, “creating our future: workforce planning for library 2.0 and beyond,” australasian public libraries & information services 20, no. 3 (2007): 118–24; liisa sjoblom, “embracing technology: the deschutes public library’s learning 2.0 program,” ola quarterly 14, no. 2 (2007): 2–6; hui-lan titango and gail l. mason, “learning library 2.0: 23 things @ scpl,” library management 30, no. 1/2 feedback we have received, and continue to evaluate it and improve it based on survey results. the purpose of a second technology challenge would be to reinforce what staff and faculty have already learned, to teach new skills, and to help participants remember the importance of lifelong learning when it comes to technology. ■■ conclusion hbll’s self-directed technology challenge was successful in teaching technology skills and in promoting lifelong learning—as well as in fostering the development of librarian 2.0. abram listed key characteristics and duties of librarian 2.0, including learning the tools of web 2.0; connecting people, technology, and information; embracing “nontextual information and the power of pictures, moving images, sight, and sound”; using the latest tools of communication; and understanding the “emerging roles and impacts of the blogosphere, web syndicasphere, and wikisphere.”41 survey results indicated that hbll employees are on their way to developing these attributes, and that they are better equipped with the skills and tools to keep learning. like plcmc’s learning 2.0, the technology challenge could be replicated in libraries of various sizes. obviously an exact replication would not be feasible or appropriate for every library—but the basic ideas, such as the principles of andragogy and self-directed learning could be incorporated, as well as the daily time requirement or the use of surveys to determine weaknesses or interests in technology skills. whatever the case, there is a great need for library staff and faculty to learn emerging technologies and to keep learning them as technology continues to change and advance. 
but the most important benefit of a self-directed training program focusing on lifelong learning is effective employee development. the goal of any training program is to increase work productivity—and as employees become more productive and efficient, they are happier and more excited about their jobs. on the exit survey, one participant expressed initially feeling hesitant about the technology challenge and feared that it would increase an already hefty workload. however, once the challenge began, the participant enjoyed “taking the time to learn about new things. i feel i am a better person/librarian because of it.” and that, ultimately, is the goal—not only to create better librarians, but also to create better people. notes 1. robert h. mcdonald and chuck thomas, “disconnects between library culture and millennial generation values,” educause quarterly 29, no. 4 (2006): 4. bridging the gap: self-directed staff technology training | quinney, smith, and galbraith 211 ers,” journal of extension 33 (2005), http://www.joe.org/ joe/2006december/tt5.php (accessed jan. 8, 2010); wayne g. west, “group learning in the workplace,” new directions for adult and continuing education 71 (1996): 51–60. 33. ota et al., “needs of learners.” 34. d. r. garrison, “self-directed learning: toward a comprehensive model,” adult education quarterly 48 (1997): 22. 35. ibid. 36. ota et al., “needs of learners,” under “needs of the adult learner,” para. 4. 37. ota et al., “needs of learners”; west, “group learning.” 38. ota et al., “needs of learners,” under “needs of the adult learner,” para. 2. 39. ibid., para. 6. 40. ibid., para 7. 41. abram, “social library,” 21–22. (2009): 44–56; illinois library association, “continuous improvement: the transformation of staff development,” the illinois library association reporter 26, no. 2 (2008): 4–7; and thomas simpson, “keeping up with technology: orange county library embraces 2.0,” florida libraries 20, no. 2 (2007): 8–10. 28. sharan b. merriam, “andragogy and self-directed learning: pillars of adult learning theory,” new directions for adult & continuing education 89 (2001): 3–13. 29. malcolm shepherd knowles, the modern practice of adult education: from pedagogy to andragogy (new york: cambridge books, 1980). 30. jovita ross-gordon, “adult learners in the classroom,” new directions for student services 102 (2003): 43–52. 31. merriam, “pillars of adult learning”; ross-gordon, “adult learners.” 32. carrie ota et al., “training and the needs of learnappendix a. technology challenge “mini challenges” technology challenge participants had the opportunity to complete fifteen of twenty mini-challenges to become eligible to win a second gift certificate to the campus bookstore. below are some examples of technology mini-challenges: 1. read a library or a technology blog 2. listen to a library podcast 3. check out a book from circulation’s new self-checkout machine 4. complete an online copyright tutorial 5. catalog some books on librarything 6. read an e-book with sony ebook reader or amazon kindle 7. scan photos or copy them from a digital camera and then burn them onto a cd 8. backup data 9. change computer settings 10. schedule meetings with microsoft outlook 11. create a page or comment on a page on the library’s intranet wiki 12. use one of the library’s music databases to listen to music 13. use wordpress or blogger to create a blog 14. post a photo on a blog 15. use google reader or bloglines to subscribe to a blog or news page using rss 16. 
reserve and check out a digital camera, camcorder, dvr, or slide scanner from the multimedia lab and create something with it 17. convert media on the analog media racks 18. edit a family photograph using photo-editing software 19. attend a class in the multimedia lab 20. make a phone call using skype
appendix b. exit survey results
how did you like the technology challenge overall? strongly disliked 0 (0%); disliked 0 (0%); liked 42 (66%); strongly liked 22 (34%).
how did you like the reporting system used for the technology challenge (the techopoly game)? strongly disliked 0 (0%); disliked 4 (6%); liked 41 (64%); strongly liked 19 (30%).
would you participate in another technology challenge? yes 64 (100%); no 0 (0%).
what percentage of time did you spend using the following methods of learning? (participants were asked to allocate 100 points among the categories) instructor-led large group 15.3; instructor-led small group 27; one-on-one instruction 3.5; web tutorial 12.8; self-learning (books, articles) 27.4; dvds 0.5; small group discussion 2.7; large group discussion 2.6; other 6.7.
i am more likely to incorporate new technology into my home or work life. strongly disagree 0 (0%); disagree 2 (3%); agree 49 (77%); strongly agree 13 (20%).
i enjoy the process of making new technology a part of my work or home life. strongly disagree 0 (0%); disagree 2 (3%); agree 37 (58%); strongly agree 24 (38%).
after completing the technology challenge, my desire to learn new technologies has increased. strongly disagree 0 (0%); disagree 7 (11%); agree 44 (69%); strongly agree 13 (20%).
i feel i now learn new technologies more quickly. strongly disagree 0 (0%); disagree 20 (31%); agree 39 (61%); strongly agree 5 (8%).
open-ended questions ■■ what would you change about the technology challenge? ■■ what did you like about the technology challenge? ■■ what technologies were you introduced to during the technology challenge that you now use on a regular basis? ■■ in what ways do you feel the technology challenge has benefited you the most?
how much more proficient do you feel in . . . hardware: not any 31%, somewhat 64%, a lot 5%; software: not any 8%, somewhat 72%, a lot 20%; internet resources: not any 17%, somewhat 68%, a lot 15%; library technology: not any 23%, somewhat 64%, a lot 13%.
in order for you to succeed in your job, how important is keeping abreast of new technologies to you? not important 1 (2%); important 22 (34%); very important 41 (64%).
marc truitt editorial i doubt that many of the blog people are in the habit of sustained reading of complex texts. —michael gorman, 2005 so, three plus years after the fact, why am i opening with michael gorman's unfortunate characterization of those he labeled "blog people"? i have no interest in reopening this debate, honestly! but the problem with generalizations, however unfair, is that at their heart there is just enough substance to make them "stick"—to give them a grain or two of credibility. gorman's words struck a chord in me that existed before his charge and has continued to exist to this day. the substance in gorman's words had little to do with these "blog people" as such; rather, my interest was piqued by the implications in his remark about how we all deal with "complex texts" and the "sustained reading" of the same.
in a time of wide availability of full-text electronic articles, it has become so easy and tempting to cherry pick the odd phrase here or there, without study of the work as a whole. how has scholarship especially been changed by the ease with which we can reduce works to snippets without having considered their overall context? i’m not arguing that scholarly research and writing hasn’t always been at least in part about finding the perfect juicy quotation around which we then weave our own theses. many of us well recall the boxes of 3x5” citation and 5x8” quotation files that we or our patrons laboriously assembled through weeks, months, and years of detailed research. but if the style of compiling these files that i witnessed (and indeed did) is any guide, their existence was the product of precisely that “sustained reading of complex texts” of which gorman spoke. my vague, nagging sense is that what is changing is this style of approaching whole texts. i wondered then about how much scholarly research today is driven by keyword searches of digitized texts that then essentially produce “virtual quotation files” without our having had to struggle with their context in the whole of the original source text? fast forward three years. lately, several articles touching on our changing ways of interacting with resources have appeared in both scholarly and popular venues, and these have served to underline my sense that we are missing something because of our growing lack of engagement with whole texts. writing in the july/august issue of the atlantic monthly, nicholas carr asks “is google making us stupid?” drawing an analogy to the scene in the film 2001: a space odyssey, in which astronaut dave bowman disables supercomputer hal’s memory circuits, carr says i can feel it, too. over the past few years i’ve had an uncomfortable sense that someone, or something, has been tinkering with my brain, remapping the neural circuitry, reprogramming the memory. my mind isn’t going—so far as i can tell—but it’s changing. i’m not thinking the way i used to think. i can feel it most strongly when i’m reading. immersing myself in a book or a lengthy article used to be easy. my mind would get caught up in the narrative or the turns of the argument, and i’d spend hours strolling through long stretches of prose. that’s rarely the case anymore. now my concentration often starts to drift after two or three pages. i get fidgety, lose the thread, begin looking for something else to do. i feel as if i’m always dragging my wayward brain back to the text. the deep reading that used to come naturally has become a struggle.1 carr goes on to explain that “what the net seems to be doing is chipping away my capacity for concentration and contemplation. my mind now expects to take in information the way the net distributes it: in a swiftly moving stream of particles. once i was a scuba diver in the sea of words. now i zip along the surface like a guy on a jet ski.”2 carr’s nagging fear found similar expression among some tech-savvy participants of library online forums; one of the more interesting comments appeared on the web4lib electronic discussion list. in a discussion of the article, tim spalding of librarything observed that he himself had experienced what he dubbed “the google effect” and noted something is lost. . . . human culture often advances by externalizing pieces of our mental life—writing externalizes memory, calculators externalize arithmetic, maps, and now gps, externalize way-finding, etc. 
each shift changes the culture. and each shift comes with a cost. nobody memorizes texts anymore, nobody knows the times tables past ten or twelve and nobody can find their way home from the stars and the side of the tree the moss grows on.3 meanwhile, another article appeared on a closely related topic, this time in the journal science. james a. evans observed that, because “scientists and scholars tend to search electronically and follow hyperlinks rather than browse or peruse,” the easy availability of electronic resources was resulting in an “ironic change” for scientific marc truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 4 information technology and libraries | september 2008 scholarship, in that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles. the forced browsing of print archives may have stretched scientists and scholars to anchor findings deeply into past and present scholarship. searching online is more efficient and following hyperlinks quickly puts researchers in touch with prevailing opinion, but this may accelerate consensus and narrow the range of findings and ideas built upon.4 evans’s research highlights an additional irony: an unintended benefit to the scholarly process in the paperbased world was “poor indexing,” since it encouraged browsing through less relevant, older, or more marginal literature. this browsing had the effect of “facilitat[ing] broader comparisons and led researchers into the past. modern graduate education parallels this shift in publication—shorter in years, more specialized in scope, culminating less frequently in a true dissertation than an album of articles.”5 what is one to make of all of this? at the outset, i wish to state clearly that i am not some sort of anti e-text luddite. electronic texts are a fact of life, and are becoming moreso every day. even though they are in their infancy as a medium, they’ve already transformed the landscape of bibliographic access. my interest is not with the tool, but with the manner in which we are using it. i began by suggesting that i share with gorman a concern about how we increasingly engage with “complex texts” today. unlike him, though, my concern is not limited only to the so-called blog people (whomever they may be), but indeed, it includes all of us. with the explosion in easily accessible electronic texts, our ideas and habits concerning interaction with these texts are changing, sometimes in unintended ways. in a recent informal survey i conducted of my colleagues at work, i asked, “have you ever read an e-book (not just a journal article) from (virtual) cover to (virtual) cover?” for those whose answer was affirmative, i also asked, “how many such books have you read in their entirety?” out of twenty-odd responses, three individuals answered that yes, they had had occasion to read an entire e-book (for a total of six books among the three “yes” respondents, which seemed surprisingly high to me). of greater interest, though, were those who chose to question the premise of the survey, arguing that people don’t “read” e-books the way that they read paper ones. 
it does make one wonder, then, how amazon thinks it possesses a viable business model in the kindle e-book reader, for which it currently lists an astounding 140,000+ available e-books. clearly, some e-books are being read as whole texts, by some people, for some purposes. but i suspect that’s another story.6 carr and evans use slightly differing imagery to describe a similar phenomenon. carr closes with a reference back to the death of 2001’s hal, saying, “as we come to rely on computers to mediate our understanding of the world, it is our own intelligence that flattens into artificial intelligence.”7 evans, on the other hand, compares contemporary scientific researchers to newton and darwin, each of whom produced works that “not only were engaged in current debates, but wove their propositions into conversation with astronomers, geometers, and naturalists from centuries past.” twenty-first-century scientists and scholars, by contrast, are able because of readily available electronic resources “to frame and publish their arguments more efficiently, [but] they weave them into a more focused—and more narrow—past and present.” 8 perhaps the most succinct statement, though, comes from librarything’s tim spalding, who summarized the problem thusly: “we advance by becoming dumber.”9 an ital research and publishing opportunity for an inquisitive and enterprising scholar, perhaps? i’d welcome the manuscript! shameless plugs department. by the time you read this, we at ital will have launched our new blog, italica (http://ital-ica.blogspot.com). italica addresses a need we on the ital editorial board have long sensed; that is, an area for “letters to the editor,” updates to articles, supplementary materials we can’t work into the journal—you name it. one of the most important features of italica will be a forum for readers’ conversations with our authors: we’ll ask authors to host and monitor discussion for a period of time after publication so that you’ll then have a chance to interact with them. italica is currently a pilot project. for our first issue we will have begun with a discussion hosted by jennifer bowen, whose article “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase i” was published in the june 2008 issue of ital. for our second italica, we plan to expand coverage and discussion to include all articles and other features in the september issue you now have in hand. italica is sure to become a stimulating supplement to and forum for topics originating in ital. we look forward to seeing you there! references and notes extract. michael gorman, “revenge of the blog people!” library journal (feb. 15, 2005) www.libraryjournal.com/article/ ca502009.html (accessed july 21, 2008). 1. nicholas carr, “is google making us stupid?” the atlantic monthly 301 (july/aug. 2008) www.theatlantic.com/ doc/200807/google (accessed july 23, 2008). editor’s column | truitt 5 2. ibid. 3. tim spalding, “re: ‘is google making us stupid? what the internet is doing to our brains,’” web4lib discussion list post, june 19, 2008, http://article.gmane.org/gmane.education .web4lib/12349 (accessed july 24, 2008). 4. james a. evans, “electronic publication and the narrowing of science and scholarship,” science (july 18, 2008) www .sciencemag.org/cgi/content/full/321/5887/395 (accessed july 24, 2008). emphasis added. 5. ibid. 6. 
as of 5:30pm (est), july 24, 2008, amazon's website listed 145,591 "kindle books." www.amazon.com/s/qid=1216934603/ref=sr_hi?ie=utf8&rs=154606011&bbn=154606011&rh=n%3a154606011&page=1. 7. carr, "is google making us stupid?" 8. evans, "electronic publication and the narrowing of science and scholarship." 9. spalding, "re: 'is google making us stupid?'" an integrated computer based technical processing system in a small college library jack w. scott: kent state university library, kent, ohio (formerly lorain county community college, lorain, ohio) a functioning technical processing system in a two-year community college library utilizes a model 2201 friden flexowriter with punch card control and tab card reading units, an ibm 026 key punch, and an ibm 1440 computer, with two tape and two disc drives, to produce all acquisitions and catalog files based primarily on a single typing at the time of initiating an order. records generated by the initial order, with slight updating of information, are used to produce, via computer, manual and mechanized order files and shelf lists, catalogs in both the traditional 3x5 card form and book form, mechanized claiming of unfilled orders, and subject bibliographies. the lorain county community college, a two-year institution designed for 4000 students, opened in september 1964, with no librarian and no library collection. when the librarian was hired in october 1964, lack of personnel, both professional and clerical, forced him to examine closely traditional ways of ordering and preparing materials, his main task being the controlled building of a collection as quickly as possible. no library having been established, there were no inflexible rules governing acquisitions or cataloging and no catalogs or other files enforcing their pattern on future plans. the librarian was free to experiment and adapt as much as he desired; and adapt and experiment he did, remembering, at least most of the time, the primary reasons for designing the system. these were 1) to notify the vendor about what material was desired; 2) to have readily available information about when material had been ordered and when it might arrive; 3) to provide a record of encumbrances; 4) to make sure that material received was the material which had been ordered; 5) to initiate payment for material received; 6) to provide catalog copy for technical processes to use in producing card and book catalogs; 7) to provide inexpensive control cards for a circulation system; and 8) to provide whatever other statistics might be needed by the librarian. the librarian attended the purdue conference on library automation (october 2-3, 1964) and an ibm conference on automation held in cleveland (december 1964), and visited libraries with data processing installations, such as the decatur public library. then an extensive literature search was run on the subject of mechanization of libraries and the available material thoroughly reviewed. it was the consensus of the president, the librarian, and the manager of data processing that, as white said later, "the computer will play a major part in how libraries are organized and operated because libraries are a part of the fabric of society and computers are becoming a daily accepted part of life." (1) moreover, it was agreed that the use of data processing equipment would be justified only if it made building a collection more efficient and more economical than manual methods could do.
after careful consideration of the ibm 870 document writing system (2) and the system described by kraft (3) as input techniques for the college library, it was decided to use the friden flexowriter, recommended both at purdue and, in european applications, by bernstein (4). its most attractive feature was the use of paper tapes to generate various secondary records without the necessity of proofreading each one. the college, by mid-1965, had the following equipment available for library use: one friden flexowriter (model 2201) with card punch control unit and tab card reading unit, one ibm 026 key punch with alternate programming, and guaranteed time on the college-owned ibm 1440 8k computer with two tape and two disc drives. to produce punched paper tape and tab cards with only one keyboarding, an electrical connection between the flexowriter and the keypunch was especially designed and installed. it was fortunate for the library that the college also had an excellent data processing manager who was interested in seeing data processing machines and techniques utilized in as many ways as possible. with his enthusiastic support, aid in programming and preparation of flow charts, and patient cooperation, it was not surprising that the automation of library processes was completely successful. at this time it was decided that since the college was likely to remain a single-campus institution it would be uneconomical to rely solely on a book catalog, even though the portability of such a device was most attractive to librarian and faculty alike. therefore, it was planned to have the public catalog, as well as the official shelf list, in card form, permitting both to be kept current economically. these two files were to be supplemented with crude book catalogs which would be a by-product, among others, of the typing of the original book orders. these book catalogs were not to replace the card catalog but simply to extend and facilitate use of the collection. it was also decided to design a system which would duplicate as few as possible of the manual aspects of normal technical processing systems, but one which would, at the same time, permit the return to a manual system from a machine system with a minimum of trouble and tribulation if support for the library's automated system should be withdrawn. concern about such withdrawal of support had originally been voiced by durkin and white in 1961, when they said: "there have been a number of unfortunate examples of libraries that abandoned their home-grown catalogs for a machine retrieval program because there was some free computer time, only to lose their machine time to a higher priority project and to be left with information storage to which they no longer have access. many of these librarians, and others who have heard about their plight, are determined not to burn their bridges behind them by abandoning their reliable, if old-fashioned, 3x5 card catalogs." (5) although the necessity of returning to an inefficient manual system has not, to date, raised its ugly head, there were times when it was most comforting to know that routes of retreat and reformation were available. under the present system there is only one manual keyboarding of descriptive catalog main entries for most titles. all other records are generated from these main entries.
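the idea that a single keyboarding of the main entry drives every later product can be sketched in a few lines. the following is a loose, modern paraphrase of that design under assumed field names; it is not a reconstruction of the actual flexowriter tapes, tab card layouts, or 1440 programs.

# a loose modern paraphrase of "one keyboarding, many records": a single main-entry
# record is typed once and every later product is derived from it. field names and
# sample data are assumptions for illustration, not the original card or tape layouts.
from dataclasses import dataclass

@dataclass
class MainEntry:
    order_number: str
    author: str
    title: str
    vendor: str
    price: float

def order_slip(entry: MainEntry) -> str:
    # the multiple request form sent to the vendor carries the full entry
    return f"order {entry.order_number}: {entry.author}, {entry.title} -- {entry.vendor} (${entry.price:.2f})"

def on_order_card(entry: MainEntry) -> dict:
    # the tab card holds full order data but only abbreviated bibliographic data
    return {"order": entry.order_number, "auth": entry.author[:10],
            "titl": entry.title[:20], "vend": entry.vendor, "cost": entry.price}

def shelf_list_stub(entry: MainEntry, call_number: str) -> dict:
    # at receipt the same entry, plus the call number assigned by the cataloger,
    # becomes the shelf-list record
    return {"call": call_number, "auth": entry.author, "titl": entry.title,
            "vend": entry.vendor, "cost": entry.price}

e = MainEntry("65-0042", "doe, jane", "a sample title", "sample vendor", 6.50)
print(order_slip(e))
print(on_order_card(e))
print(shelf_list_stub(e, "Z678.9 .D64"))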
this integrated system was adopted on the assumption that cataloging information in some form (6) would be available for a high percentage of books. experience showed that about 95 percent of acquisitions did have catalog copy readily available. of 4029 titles processed in a 5-month period, catalog copy was available for 3824. after verification that a requested title is neither in the library nor on order, a copy of a catalog entry is located in a source such as the national union catalog, library of congress proofsheets, or publisher's weekly, etc. the catalog information is manually typed in its entirety (including subject headings) onto five-part multiple request forms, using the friden flexowriter. output from the friden consists of the multiple order, a punched paper tape containing the full bibliographic entry but no order information, and tab cards, punched by the slave ibm key punch, which contain full order information but only abbreviated bibliographic data (figure 1: on order creation routine). the tab cards, containing full order information, are used as input to the 1440 computer to create an "on order" file arranged by order number and stored on magnetic tape, from which an "on order" printout is produced weekly (figure 2: on order update). at any given time this magnetic tape order file can be used to total the dollar amount of outstanding orders to any given vendor, or the total amount outstanding to all vendors (figure 3: on order cost tally). the punched paper tape and two copies of the request form are stored in a standard 3x5 card file arranged by main entry. one copy of the request form is to be used as a work slip when material is received. the original and one copy of the request form is sent to the vendor, with instructions to return one copy with shipment. in the event the vendor does not comply, the main entry can be located readily by checking the order number or order date on the "on order" printout and using the abbreviated bibliographic information which appears there. if the material requested has not been shipped within three months, the magnetic tape order file is used to prepare tab cards containing all original order information and the cards are sent to the library with a notice stating that shipment is overdue. these tab cards are used as input to the flexowriter tab card reader unit, which activates the flexowriter itself and prepares "overdue, ship or cancel" notices to the vendor (figure 4: late on order routine). products when material is received, the paper tape and one copy of the main entry work slip are pulled from the card order file and sent to the cataloger, who notes on the work slip the call number to be used as well as any changes. the work slip, punched paper tape, and book then pass to the technician who does the shelf listing.
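the uses made of the magnetic tape "on order" file (the weekly printout, the cost tallies by vendor, and the three-month claim notices) amount to a few passes over one keyed file. the sketch below restates those passes in modern terms with invented field names and sample data; the original work was of course carried out as 1440 batch runs against tape, not in python.

# a modern restatement of the passes made over the "on order" file described above:
# weekly listing by order number, cost tally by vendor, selection of orders more
# than roughly three months old for "overdue, ship or cancel" notices, and deletion
# at receipt. field names, vendors, and dates are invented for illustration.
from datetime import date, timedelta

on_order = [
    {"order": "65-0040", "vendor": "vendor a", "cost": 7.25, "ordered": date(1965, 9, 3)},
    {"order": "65-0041", "vendor": "vendor b", "cost": 4.95, "ordered": date(1965, 11, 20)},
    {"order": "65-0042", "vendor": "vendor a", "cost": 6.50, "ordered": date(1966, 1, 5)},
]

def weekly_printout(orders):
    # the weekly "on order" list, arranged by order number
    return sorted(orders, key=lambda o: o["order"])

def cost_outstanding(orders, vendor=None):
    # total encumbrance, for one vendor or for all vendors
    return sum(o["cost"] for o in orders if vendor is None or o["vendor"] == vendor)

def overdue(orders, today, months=3):
    # orders still unfilled after roughly three months trigger claim notices
    cutoff = today - timedelta(days=months * 30)
    return [o for o in orders if o["ordered"] < cutoff]

def receive(orders, order_number):
    # receipt of the material deletes the item from the on-order file
    return [o for o in orders if o["order"] != order_number]

today = date(1966, 2, 1)
print(cost_outstanding(on_order, "vendor a"))          # outstanding to one vendor
print([o["order"] for o in overdue(on_order, today)])  # orders needing claim notices
on_order = receive(on_order, "65-0041")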
at this point the original output paper tape containing full bibliographic information is used as input for the flexowriter to create a standard 3x5 hard-copy shelf list card containing full bibliographic information, as well as inventory data such as vendor, date of receipt and cost. the last three items and the call number are added manually as "changes." simultaneously a new paper tape is produced as output which contains bibliographic information from the first tape and all revisions deemed necessary by the cataloger. the revised paper tape is used on the flexowriter to prepare 3x5 card sets for the public catalog. at the same time the slave keypunch prepares a set of tab cards containing full acquisitions information: cost, vendor, date of receipt; and abbreviated bibliographic information: short author, short title, full call number (including copy, year, part and volume), accession number and short edition statement (figure 5: shelf list creation routine). the tab cards are used first to delete the item from the magnetic tape "on order" file and second as input to create a magnetic tape shelf list of abbreviated information arranged by call number (figure 6: weekly shelf list update). the magnetic tape shelf list is used to create 1) eight copies of author, title, and classified catalogs which are updated semi-annually; 2) printouts of weekly acquisitions; 3) subject printouts on demand; and 4) tab cards which serve as circulation cards for books, films, drawings, tape and disc recordings, filmstrips and any other materials. the tab cards can be used with the ibm 357 circulation system or any similar system. discussion the efficiency of this system is most dramatically demonstrated by the amount of work accomplished per person per year. one technician can process over one thousand orders per month. over fifteen thousand fully cataloged volumes per year (approximately eleven thousand titles) are added to the collection by a technical processing department which consists solely of one full-time cataloger and two full-time technicians. one technician spends one half of her time typing orders and the other half preparing the shelf list. at present the limiting factor in processing material is not the personnel time available but rather time on the flexowriter-keypunch combination, which runs continuously for sixty hours per week. the cataloger feels that if some thirty hours more per week were available for running the machines, or if a second flexowriter were available to handle catalog card output, it would then be possible to order, receive, and fully process fifteen thousand titles per year (eighteen to twenty thousand volumes) with only the present technical processing staff. references 1. white, herbert s.: "to the barricades! the computers are coming!" special libraries 57 (november, 1966), 631. 2. general information manual: mechanized library procedures (white plains, n.y.: ibm, n.d.). 3. kraft, donald h.: library automation with data processing equipment (chicago: ibm, 1964). 4. bernstein, hans h.: "die verwendung von flexowritern in dokumentation und bibliothek", nachrichten für dokumentation 12 (june, 1961), 92. 5. durkin, robert e.; white, herbert s.: "simultaneous preparation of library catalogs for manual and machine applications", special libraries 52 (may, 1961), 231. 6.
references

1. white, herbert s.: "to the barricades! the computers are coming!" special libraries 57 (november, 1966), 631.
2. general information manual: mechanized library procedures (white plains, n.y.: ibm, n.d.).
3. kraft, donald h.: library automation with data processing equipment (chicago: ibm, 1964).
4. bernstein, hans h.: "die verwendung von flexowritern in dokumentation und bibliothek," nachrichten für dokumentation 12 (june, 1961), 92.
5. durkin, robert e.; white, herbert s.: "simultaneous preparation of library catalogs for manual and machine applications," special libraries 52 (may, 1961), 231.
6. kaiser, walter h.: "new face and place for the catalog card," library journal 88 (january, 1963), 186.

metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1

jennifer bowen

the extensible catalog (xc) project at the university of rochester will design and develop a set of open-source applications to provide libraries with an alternative way to reveal their collections to library users. the goals and functional requirements developed for xc reveal generalizable needs for metadata to support a next-generation discovery system. the strategies that the xc project team and xc partner institutions will use to address these issues can contribute to an agenda for attention and action within the library community to ensure that library metadata will continue to support online resource discovery in the future.

library metadata, whether in the form of marc 21 catalog records or in a variety of newer metadata schemas, has served its purpose for library users by facilitating their discovery of library resources within online library catalogs (opacs), digital libraries, and institutional repositories. however, libraries now face the challenge of making this wealth of legacy catalog data function adequately within next-generation web discovery environments. approaching this challenge will require:

■■ an understanding of the metadata itself and a commitment to deriving as much value from it as possible;
■■ a vision for the capabilities of future technology;
■■ an understanding of the needs of current (and, where possible, future) library users; and
■■ a commitment to ensuring that lessons learned in this area inform the development of both future library systems and future metadata standards.

the university of rochester's extensible catalog (xc) project will bring these various perspectives together to design and develop a set of open-source, collaboratively built next-generation discovery tools for libraries. the xc project team seeks to make the best possible use of legacy library metadata, while also informing the future development of discovery metadata for libraries. during phase 1 of the xc project (2006–2007), the xc project team created a plan for developing xc and defined the goals and initial functional requirements for the system. this paper outlines the major metadata-related issues that the xc project team and xc partner institutions will need to address to build the xc system during phase 2. it also describes how the xc team and xc partners will address these issues, and concludes by presenting a number of issues for the broader library community to consider.

while this paper focuses on the work of a single library project, the goals and functional requirements developed for the xc project reveal many generalizable needs for metadata to support a next-generation discovery system.1 the metadata-related goals of the xc project—to facilitate the use of marc metadata outside an integrated library system (ils), to combine marc metadata with metadata from other sources in a single discovery environment, and to facilitate new functionality (e.g., faceted browsing, user tagging)—are very similar to the goals of other library projects and commercial vendor discovery software. the issues described in this paper thus transcend their connection to the xc project and can be considered general needs for library discovery metadata in the near future.
in addition to informing the library community about the xc project and encouraging comment on that work, the author hopes that identifying and describing metadata issues that are important for xc—and that are likely to be important for other projects as well—will encourage the library community to set these issues as high priorities for attention and action within the next few years.

jennifer bowen (jbowen@library.rochester.edu) is director of metadata management at the university of rochester river campus libraries, new york, and is co-principal investigator for the extensible catalog project.

the extensible catalog project

the university of rochester's vision for the extensible catalog (xc) is to design and develop a set of open-source applications that provide libraries with an alternative way to reveal their collections to library users. xc will provide easy access to all resources (both digital and physical collections) and will enable library content to be revealed through other web applications that libraries may already be using. xc will be released as open-source software, so it will be available for free download, and libraries will be able to adopt, customize, and extend the software to meet their local needs.

the xc project is a collaborative effort between partner institutions that will serve a variety of roles in its development. phase 1 of the xc project, funded by the andrew w. mellon foundation and carried out by the university of rochester river campus libraries between april 2006 and june 2007, resulted in the creation of a project plan for the development of xc. during xc phase 1, the xc project team recruited a number of other institutions that will serve as xc partners and who have agreed to contribute resources toward building and implementing xc during phase 2. xc phase 2 (october 2007 through june 2009) is supported through additional funding from the andrew w. mellon foundation, the university of rochester, and xc partners. during phase 2, the xc project team, assisted by xc partners, will deploy the xc software and make it available as open-source software.2

through its various components, the xc system will provide a platform for local development and experimentation that will ultimately allow libraries to manage and reveal their metadata through a variety of web applications such as web sites, institutional repositories, and content management systems. a library may choose to create its own customized local interface to xc, or use xc's native user interface "as is." the native xc interface will include web 2.0 functionality, such as tagging and faceted browsing of search results that will be informed by frbr (functional requirements for bibliographic records)3 and frad (functional requirements for authority data)4 conceptual models. the xc software will handle multiple metadata schemas, such as marc 215 and dublin core,6 and will be able to serve as a repository for both existing and future library metadata. in addition, xc will facilitate the creation and incorporation of user-created metadata, enabling such metadata to be enhanced, augmented, and redistributed in a variety of ways. the xc project team has designed a modular architecture for xc, as shown in the simplified schematic in figure 1.
xc will bring together metadata from a variety of sources (integrated library systems, digital repositories, etc.), apply services to that metadata, and display it in a usable way in the web environments where users expect to find it.7 xc's architecture will allow institutions that implement the software to take advantage of innovative models for shared metadata services, which will be described in this paper.

figure 1. xc system diagram

xc phase 1 activities

during the now-completed xc phase 1, the xc project team focused on six areas of activity:

1. survey and understand existing research on user practices.
2. gauge library demand for the xc system.
3. anticipate and prepare for the metadata requirements of the new system.
4. learn about and build on related projects.
5. experiment with and incorporate useful, freely available code.
6. build a community of interest.

the xc project team carried out a variety of research activities to inform the overall goals and high-level functional requirements for xc. this research included a literature search and ongoing monitoring of discussion lists and blogs, to allow the team to keep up with the most current discussions taking place about next-generation library discovery systems and related technologies and projects.8 the xc team also consulted regularly with prospective partners and other knowledgeable colleagues who are engaged in defining the concept of a next-generation library discovery system. in order to gauge library demand for the xc system, the team also conducted a survey of interested institutions.9 this paper reports the results of the third area of activity during xc phase 1—anticipating and preparing for the metadata requirements of the new system—and looks ahead to plans to develop the xc software during phase 2.

xc goals and metadata functional requirements

the goals of the xc project have significant implications for the metadata functionality of the system, with each goal suggesting specific high-level functional requirements for how the system can achieve that particular goal. the five goals are:

■■ goal 1: provide access to all library resources, digital and non-digital.
■■ goal 2: bring metadata about library resources into a more open web environment.
■■ goal 3: provide an interface with new web functionality such as web 2.0 features and faceted browsing.
■■ goal 4: conduct user research to inform system development.
■■ goal 5: publish the xc code as open-source software.

an overview of each xc goal and its related high-level metadata requirements appears below. each requirement is then discussed in more detail, with a plan for how the xc project team will address that requirement when developing the xc software.

goal 1: provide access to all library resources, digital and non-digital

working alongside a library's current integrated library system (ils) and its other web applications, xc will strive to bring together access to all library resources, thus eliminating the data silos that are now likely to exist between a library's opac and its various digital repositories and commercial databases. this goal suggests two fairly obvious metadata requirements (requirements 1 and 2).

requirement 1—the system must be capable of acquiring and managing metadata from multiple sources: ilss, digital repositories, licensed databases, etc.
a typical library currently has metadata pertaining to its collections residing in a variety of separate online systems: marc data in an ils, metadata in various schemas in digital collections and repositories, citation data in commercial databases, and other content on library web sites. a library that implements xc may want to populate the system with metadata from several online environments to simplify access to all types of resources. to achieve goal 1, xc must be capable of acquiring and managing metadata from all of these sources. each online environment and type of metadata present their own challenges. repurposing marc data repurposing marc metadata from an existing ils will be one of the biggest metadata tasks for a next-generation discovery system such as xc. in planning xc, we have assumed that most libraries will keep their current ils for the next few years or perhaps migrate to a newer commercial or open-source ils. in either case, most libraries will likely continue to rely on an ils’s staff functionality to handle materials acquisition, cataloging, circulation, etc. for the short term. relying upon an ils as a processing environment does not, however, mean that a library must use the opac portion of that ils as its means of resource discovery for users. xc will provide other options for resource retrieval by using web services to interact with the ils in the background.10 to repurpose ils metadata and enable it to be used in various web discovery environments, xc will harvest a copy of marc metadata records from an institution’s ils using the open archives initiative protocol for metadata harvesting (oai-pmh).11 using web services and standard protocols such as oaipmh offers not only a short-term solution for reusing metadata from an ils, but can also be used in both the shortand long-term to harvest metadata from any system that is oai-pmh harvestable, as will be discussed further below. while harvesting metadata from existing systems into xc creates duplication of metadata between an ils and xc, this actually has significant benefits. xc will handle metadata updates through automated harvesting services that minimize additional work for library staff, other than for setting up and managing the automated services themselves. the internal xc metadata cache can be easily regenerated from the original repositories and services when necessary, such as to enable future changes to the internal xc metadata schema. the xc system architecture also makes use of internal metadata duplication among xc’s components, which allows these components to communicate with each other using oaipmh. this built-in metadata redundancy will also enable xc to communicate with external services using this standard protocol. it is important to distinguish the deliberate metadata redundancies built into the xc architecture from the type of metadata redundancies that have been singled out for elimination in the library of congress working group on the future of bibliographic control draft report (recommendation 1.1)12 and previously in the university of california (uc) libraries bibliographic services task force’s final report.13 these other “negative” redundancies result from difficulties in sharing metadata among different environments and cause significant additional staff expense for libraries to enrich or recreate metadata locally. xc’s architecture actually solves many of these problems by facilitating the sharing of enriched metadata among xc users. 
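the text above notes that xc will harvest marc records from an ils using oai-pmh and re-harvest them whenever the cache needs to be rebuilt. the sketch below shows a minimal, generic oai-pmh listrecords loop using only the standard protocol parameters (verb, metadataPrefix, resumptionToken); the endpoint url and the "marc21" prefix in the usage comment are placeholders, and this is not the xc harvester itself.

```python
# a minimal oai-pmh listrecords harvester using only the standard protocol
# parameters. a real ils or repository advertises the metadata prefixes it
# supports via the listmetadataformats verb.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url, metadata_prefix="oai_dc"):
    """yield each <record> element from a full listrecords harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            tree = ET.fromstring(response.read())
        for record in tree.iter(OAI_NS + "record"):
            yield record
        token = tree.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break   # no resumption token means the harvest is complete
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# hypothetical usage; replace the url with a repository's actual oai-pmh endpoint.
# for record in harvest("https://example.org/oai", metadata_prefix="marc21"):
#     header = record.find(OAI_NS + "header")
#     print(header.findtext(OAI_NS + "identifier"))
```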
xc can also adapt as the library community begins to address the types of costly metadata redundancies mentioned in the above reports, such as between the oclc worldcat database14 and copies of that marc data contained within a library's ils, because xc will be capable of harvesting metadata from any source that uses a standard api.15

metadata from digital repositories and other free sources

xc will harvest metadata from various digital collections and repositories, using oai-pmh, and will maintain a copy of the harvested metadata within the xc metadata cache, as shown in figure 1. the metadata services hub architecture provides flexibility and possible economy for xc users by offering the option for multiple xc institutions to share a single metadata hub, thus allowing participating institutions to take full advantage of the hub's capabilities to aggregate and augment metadata from multiple sources. while the procedure for harvesting metadata from an external repository is not technologically difficult in itself, managing the flow of metadata coming from multiple sources and aggregating that metadata for use in xc will require the development of sophisticated software. to address this, the xc project team is partnering with established experts in bibliographic metadata aggregation to develop the metadata services portion of the xc architecture. the team from cornell university that has developed the software behind the national science digital library's metadata management system (nsdl/mms)16 is advising the xc team in the development of the xc metadata services hub, which will be built on top of the basic nsdl/mms software.

the xc metadata services hub will coordinate metadata services into a reusable task grouping that can be started on demand or scheduled to run regularly. this xc component will harvest xml metadata and combine metadata records that refer to equivalent resources (based on uniform resource identifier [uri], if available, or other unique identifier) into what the cornell team describes as a "mudball." each mudball will contain the original metadata, the sources for the metadata, and the references to any services used to combine metadata into the mudball. the mudball may also contain metadata that is the result of further automated processing or services to improve quality or to explicitly identify relationships between resources. hub services could potentially record the source of each individual metadata statement within each mudball, which would then allow a metadata record to be redelivered in its original or in an enriched form when requested.17 by allowing for the capture of provenance data for each data element, the hub could potentially provide much more granular information about the origin of metadata—and much more flexibility for recombining metadata—than is possible in most marc-based environments.

after using the redeployed nsdl/mms software as the foundation for the xc metadata hub, the xc project team will develop additional hub services to support xc's functional requirements. xc-specific hub services will accommodate incoming marc data (including marc holdings data for non-digital resources); basic authority control; mappings from marc 21, marcxml,18 and dublin core to an internal xc schema defined within the xc application profile (described below); and other services to facilitate the functionality of the xc user environments (see discussion of requirement 5, below).
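the hub's "mudball" aggregation is described only in outline here, so the following sketch should be read as an assumption-laden illustration of the general idea rather than the nsdl/mms or xc design: records from different sources that share an identifier are folded together, and the source of every individual statement is retained so a record can later be redelivered in its original or in an enriched form.

```python
# an illustrative aggregation step: records that share an identifier (a uri or
# other unique id) are folded into one aggregate, keeping the source of every
# individual statement.
from collections import defaultdict

incoming = [
    {"source": "ils",        "id": "urn:isbn:0123456789", "title": "metadata basics", "subject": "cataloging"},
    {"source": "repository", "id": "urn:isbn:0123456789", "creator": "smith, j.",     "subject": "metadata"},
]

def aggregate(records):
    """group records by identifier; keep (value, source) pairs per element."""
    mudballs = defaultdict(lambda: defaultdict(list))
    for rec in records:
        ball = mudballs[rec["id"]]
        for element, value in rec.items():
            if element in ("id", "source"):
                continue
            ball[element].append((value, rec["source"]))   # statement-level provenance
    return mudballs

def original_view(mudball, source):
    """reconstruct just the statements contributed by one source."""
    return {e: [v for v, s in vals if s == source]
            for e, vals in mudball.items() if any(s == source for _, s in vals)}

balls = aggregate(incoming)
print(dict(balls["urn:isbn:0123456789"]))                  # merged, with provenance
print(original_view(balls["urn:isbn:0123456789"], "ils"))  # redelivered in original form
```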
finally, the xc hub services will make the metadata available for harvesting from the hub by the xc client integration applications.

metadata for licensed content

for a next-generation discovery system such as xc to provide access to all library resources, it will need to provide access to licensed content, such as citation data and full-text databases. metasearch technology provides one option for incorporating access to licensed content into xc. unfortunately, various difficulties with metasearch technology19 and usability issues with some metasearch products20 make metasearch technology a less-than-ideal solution. an alternative approach would bring metadata from licensed content directly into a system such as xc. the metadata services hub architecture for xc is capable of handling the ingest and processing of metadata supplied by commercial content providers by adding additional services to handle the necessary schema transformations and to control access to the licensed content. the more difficult issue with licensed content may be to obtain the cooperation of commercial vendors to ingest their metadata into xc. pursuing individual agreements with vendors to negotiate rights to ingest their metadata is beyond the original scope of xc's phase 2 project. however, the xc team will continue to monitor ongoing developments in this area, especially the work of the ethicshare project, which uses a system architecture very similar to that of xc.21 it remains our goal to build a system that will facilitate the inclusion of licensed content within xc in situations where commercial providers have made it available to xc users.

requirement 1 summary

when considering needed functionality for a next-generation discovery system, the ability to ingest and manage metadata from a variety of sources is of paramount importance. unlike a current ils, where we often think of metadata as mostly static unless it is supplemented by new, updated, and deleted records, we should instead envision the metadata in a next-generation system as being in constant motion, moving from one environment to another and being harvested and transformed on a scheduled basis. the metadata services hub architecture of the xc system will accommodate and facilitate such constant movement of metadata.

requirement 2—the system must handle multiple metadata schemas.

an extension of requirement 1 will be the necessity for a next-generation system such as xc to handle metadata from multiple schemas, as the system harvests those schemas from various sources.

library metadata priorities

as a part of the xc survey of libraries described earlier in this paper, the xc team queried respondents about what metadata schemas they currently use or plan to use in the near future. many responding libraries indicated that they expect to increase their use of non–marc 21 metadata within the next three years, although no library indicated the intention to completely move away from marc 21 within that time period. nevertheless, the idea of a "marc exit strategy" has been discussed in various circles.22 the architecture of xc will enable libraries to move beyond the constraints of a marc-based system without abandoning their ils, and will provide an opportunity for libraries to stage their "marc exit strategy" in a way that suits their purposes.
libraries also indicated that they plan to move away from homegrown schemas toward accepted standards such as mets,23 mods,24 mads,25 premis,26 ead,27 vra core,28 and dublin core.29 several responding libraries plan to move toward a wider variety of metadata schemas in the near future, and will focus on using xmlbased schemas to facilitate interoperability and metadata harvesting. to address the needs of these libraries in the future, xc’s metadata services will contain a variety of transformation services to handle a variety of schemas. taking into account the metadata schemas mentioned the most often among survey respondents, the software developed during phase 2 of the xc project will support harvested metadata in marc 21, marcxml, and dublin core (including qualified dublin core).30 metadata crosswalks and mapping one respondent to the xc survey offered the prediction that “reuse of existing metadata and transformation of metadata from one format to another will become commonplace and routine.”31 xc’s internal metadata transformations must be designed with this in mind, to facilitate making these activities “commonplace and routine.” fortunately, many maps and crosswalks already exist that potentially can be incorporated into a next-generation system such as xc.32 the metadata services hub architecture for xc can function as a standard framework for applying a variety of existing crosswalks within a single, shared environment. following “best practices” for crosswalking metadata, such as those developed by the digital library federation (dlf),33 will be extremely important in this environment. as the dlf guidelines describe, metadata schema transformation is not as straightforward as it might first appear to be. while the dlf guidelines advise always crosswalking from a more robust schema to a simpler one, sometimes in a series of steps, such mapping will often result in “dumbing down” of metadata, or loss of granularity. this is a particularly important concern for the xc project because a large percentage of the metadata handled by xc will be rich legacy marc 21 metadata, and we hope to maintain as much of that richness as possible within the xc system. in addition to simply mapping one data element in a schema to its closest equivalent in another, it is essential to ensure that the underlying metadata models of the two schemas being crosswalked are compatible. the authors of the framework for a bibliographic future draft document define multiple layers of such models that need to be considered,34 and offer a general highlevel comparison between the frbr data model35 and the dcmi (dublin core metadata initiative) abstract model (dcam).36 more detailed comparisons of models are also taking place as a part of the development of the new metadata content standard, resource description and access (rda).37 the developers of rda have issued documents offering a detailed mapping of rda elements to rda’s underlying model (frbr)38 and analyzing the relationship between rda elements, the dcmi abstract model, and the metadata framework.39 as a result of a meeting held april 30–may 1, 2007, a joint dcmi/rda task group is now undertaking the collaborative work necessary to carry out the following tasks: n develop an rda element vocabulary. n develop an rda/dublin core application profile based on frbr and frad. 
n disclose rda value vocabularies using rdf/ rdfs/skos.40 these efforts hold much potential to provide a more rigorous way to communicate about metadata across multiple communities and to increase the compatibility of different metadata schemas and their underlying models. such compatibility will be essential to enabling the functionality of future discovery systems such as xc. an xc metadata application profile the xc project team will define a metadata application profile for xc as a way to document decisions made about data elements, content standards, and crosswalking used within the system. the use of an application profile can facilitate metadata migration, harvesting, and other automated processes, and presents an approach to metadata that is more flexible and responsive to local needs than simply adopting someone else’s metadata guidelines.41 application profiles facilitate the use of multiple schemas because elements can be selected for inclusion from more than one existing schema, or additional elements can be created and defined locally.42 because the xc system will incorporate harvested metadata from a variety of sources, the use of an application profile will be essential to support xc’s complex system requirements. the dcmi community has published guidelines for creating a dublin core application profile (dcap), which is defined more specifically as: [a] form for documenting which terms a given application uses in its metadata, with what extensions or adaptations, and specifying how those terms relate both to formal standards such as dublin core as well as to less formally defined element sets and vocabularies.43 metadata to support next-generation library resource discovery | bowen 11 the announcement of plans to develop an rda/ dublin core application profile illustrates the important role that application profiles are beginning to take to facilitate the interoperability of metadata schemas. the planned rda/dc application profile will “translate” rda into a standard structure that will allow it to be related more easily to other metadata element sets. unfortunately, the rda/dc application profile will likely not be completed in time for it to be incorporated into the first release of the xc software in mid-2009. nevertheless, we intend to use the existing definitions of rda elements to inform the development of the xc application profile.44 this will allow us to anticipate any future incompatibilities between the rda/dc and the xc application profiles, and ensure that xc will be wellpositioned to take advantage of rda-based metadata when rda is implemented. this process may have the reciprocal benefit of also informing the developers of rda of any rda elements that may be difficult to implement within a next-generation system such as xc. the potential value of rda to the xc project—in terms of providing a consistent approach to bibliographic and authority metadata and facilitating frbr-related user functionality—is very significant. it is hoped that at some point xc can become an early adopter of rda and provide a mechanism through which libraries can move their legacy marc 21 metadata into a system that is compatible with an emerging international metadata standard. 
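the xc application profile and internal schema are still to be defined, so the sketch below is only a toy illustration of the two ideas discussed above: a small declarative profile that documents which marc 21 fields feed which internal elements, and the "dumbing down" that occurs when several distinct marc subject fields collapse into a single simpler element. the element names and the profile structure are assumptions, not the xc profile.

```python
# a toy application profile: each entry names an internal element, the marc 21
# field(s) it is drawn from, and whether it is required.
PROFILE = {
    "title":   {"marc": ["245"], "required": True},
    "creator": {"marc": ["100", "110", "111"], "required": False},
    "subject": {"marc": ["600", "610", "650", "651"], "required": False},
}

def crosswalk(marc_record, profile=PROFILE):
    """map a simplified marc record (tag -> list of field values) to profile elements."""
    out = {}
    for element, rule in profile.items():
        values = []
        for tag in rule["marc"]:
            values.extend(marc_record.get(tag, []))
        if values:
            out[element] = values
        elif rule["required"]:
            out[element] = ["[title unavailable]"]   # a required element must not be dropped
    return out

# note the loss of granularity: 600/610/650/651 all collapse into one "subject"
# element, so the distinction between personal, corporate, topical, and
# geographic headings is "dumbed down" in the simpler schema.
marc = {"245": ["metadata and the web"], "100": ["coyle, karen"],
        "650": ["metadata"], "651": ["rochester (n.y.)"]}
print(crosswalk(marc))
```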
n goal 2: bring metadata about library resources into a more open web environment xc will reveal library metadata not only through its own separate interface (either the out-of-the-box xc interface or an interface designed by the local library), but will also allow library metadata to be revealed through other web applications. the latter approach will bring library resources directly to web locations that library users are already visiting, rather than attempting to entice users to visit an additional library-specific web location. making library metadata work effectively in the broader web environment (outside the well-defined boundaries of an ils or repository) will require the following requirements 3 and 4: requirement 3—metadata must conform to the standards of the new web environments as well as to that of the system from which it originated. achieving requirement 3 will require library metadata in future systems to perform a dual function: to conform to both existing library standards as well as to web standards and conventions. one way to achieve this is to ensure that the two types of standards themselves are compatible. coyle and hillmann have argued persuasively for changes in the direction of rda development to allow metadata created using rda to function in the broader web environment. these changes include the need to follow a clearly refined, high-level metadata model, to create data elements that can be manipulated by machines, and to move toward the use of uris instead of textual identifiers.45 after the announcement of the outcomes of the rda/dc data modeling meeting, the two authors are considerably more optimistic about rda functioning as a standard within the broader web environment.46 this discourse concerning rda shows but a piece of the process through which long-established library metadata standards need to be reexamined to make library metadata understandable to both humans and machines on the web. moving away from aacr2 toward rda, and ultimately toward incorporating standard web conventions into library metadata, can be a difficult process for those involved in creating and maintaining library standards. nevertheless, transforming library metadata standards in this way is essential to fulfill the requirements necessary for next-generation library discovery systems. requirement 4—metadata must function effectively within the new web environments as well as within the system from which it originated. not only must metadata for a next-generation system follow the conventions and standards used in the broader web, but the data also needs to be able to function effectively in a broader web environment. this is a slightly different proposition from requirement 3, and will necessitate testing the metadata standards themselves to ensure that they enable library metadata to function effectively. the xc project will provide direct experience with using library metadata in two types of web environments: content management systems and learning management systems. 
library metadata in a content management system

as shown in the xc architecture diagram in figure 1, the xc project team will build one of the primary user environments for xc on top of the open-source content management system, drupal.47 the xc drupal module will allow us to respond to many of the needs expressed by libraries in their responses to the xc survey48 by supplying:

■■ a web application server with a back-end database;
■■ a user interface with web 2.0 features;
■■ library-controlled web pages that will treat library metadata as a native data type;
■■ a metadata interface for enhancing or correcting metadata in the system; and
■■ an administrative interface.

the xc team will bring library metadata into the drupal content management system (cms) as a native content type within that environment, creating a drupal "node" for each metadata record. this will allow xc to take advantage of many native features of the drupal cms, such as a taxonomy system.49 building xc interfaces on top of the drupal cms will also give us an opportunity to collaborate with partner libraries that are already active participants in the drupal user community. xc's architecture will allow the possibility of developing additional user environments on top of other content management systems. bringing library metadata into these new environments will provide many new opportunities for libraries to manipulate their metadata and present it to users without being constrained by the limitations of the current generation of library systems. such opportunities will then inform the future requirements for library metadata in such environments.

library metadata in a learning management system

figure 1 illustrates two examples of xc user environments through learning management systems: xc interfaces to both the blackboard learning system50 and sakai.51 much exciting work is being done at other institutions to bring library content into these web applications.52 xc will build on projects such as these to reveal library metadata for non-licensed library resources from an ils through learning management systems. specifically, we plan to develop the capability for libraries to make the display of library metadata context-sensitive within the learning management system. for example, searching or browsing on a page for a particular academic course could be configured to reflect the subject area of the course (e.g., chemistry) and automatically present library resources related to that subject.53 this capability will build upon the experiences gained by the university of rochester through its work to develop its "course resources" system.54 such xc functionality will be integrated directly into the learning management system, rather than simply providing a link out to a separate library system. again, we hope that our efforts to bring library metadata into these new environments will encourage libraries to engage in further work to integrate library resources into broader web environments and inform future requirements for library metadata in these environments.
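the course-page scenario above (a chemistry course automatically presenting chemistry resources) reduces to a simple filtering step once records carry subject metadata. the sketch below shows only that filtering idea; the field names are assumptions, and it says nothing about how xc will actually hook into blackboard or sakai.

```python
# a course page in a learning management system knows its subject area, and the
# discovery layer filters harvested records to match it.
records = [
    {"title": "organic chemistry reactions", "subjects": ["chemistry"]},
    {"title": "intro to roman history",      "subjects": ["history"]},
    {"title": "spectroscopy handbook",       "subjects": ["chemistry", "physics"]},
]

def resources_for_course(course_subject, records):
    """return records whose subject list mentions the course's subject area."""
    wanted = course_subject.lower()
    return [r for r in records if any(wanted == s.lower() for s in r["subjects"])]

# a hypothetical chemistry course page would be handed only the chemistry items.
for r in resources_for_course("Chemistry", records):
    print(r["title"])
```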
goal 3: provide an interface with new web functionality such as web 2.0 features and faceted browsing

new functionality for users will require that metadata fulfill more sophisticated functions in a next-generation system than it may have done in an ils or repository, in order to provide more intuitive searching and navigation. the system will also need to capture and incorporate metadata generated through tagging, user-contributed reviews, etc. such new functionality creates the need for requirements 5 and 6.

requirement 5—metadata must support functionality to facilitate intuitive searching and navigation, such as faceted browsing and frbr-informed results groupings.

enabling faceting and clustering

much research has already been done regarding the design of faceted search interfaces in general.55 when considered along with user research conducted at other institutions56 and to be conducted during the development of xc, this data provides a strong foundation for the design of a faceted browse environment. the xc project team has already gained firsthand experience with developing faceted browsing through the development of the "c4" prototype interface during phase 1 of the xc project.57 to enable faceting within xc, we will also pay particular attention to what others have discovered through designing faceted interfaces on top of legacy marc 21 metadata. specific lessons learned from those involved with north carolina state university's endeca-based catalog,58 vanderbilt university's primo implementation,59 and plymouth state university's scriblio system60 provide valuable guidance for the xc project team as we design facets for the xc system. ideally, a mechanism should be developed to enable these discoveries to feed back into the development of metadata and encoding standards, so that changes to existing standards can be considered to facilitate faceting in the future.

several new system implementations have used library of congress subject headings (lcsh) and lc subdivisions from marc 21 records as the basis for deriving facets. the xc "c4" prototype interface provides facets for topic, genre, and region that are based simply upon one or more marc 21 6xx tags.61 north carolina state university's endeca-based system has enabled facets for topic, genre, region, and era using lcsh subdivisions as well, but this has necessitated a "massive cleanup" of subdivisions, as described by charley pennell.62 oclc's fast (faceted application of subject terminology) project may provide another option for enabling such facets.63 a library could populate its marc 21 data with fast headings, based upon the existing lcsh in the records, and then use the fast headings as the basis for generating facets. it remains to be seen whether fast will offer significant benefit over lcsh itself when it comes to faceting, however, since fast headings are generated directly from lcsh.

while marc 21 metadata has some known difficulties where faceting and clustering are concerned (such as those involving lcsh), the xc system will encounter additional difficulties when implementing these technologies with less robust metadata schemas such as simple dublin core, and especially across metadata from a variety of schemas. the development of web services to augment batches of metadata records in an automated manner holds some promise for improving the creation of facets from other metadata schemas. within the xc system, such services could be added to the metadata services hub and run against ingested metadata. while designing extensive services of this type is beyond the scope of the next phase of xc software development, we will encourage others to develop such services for xc.
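as a concrete illustration of deriving facets from marc 21 6xx fields, the sketch below tallies topic, genre, and region facet values from 650, 655, and 651 headings in a small batch of simplified records. the tag-to-facet assignment mirrors the general approach described above but is not the c4 prototype's code, and real lcsh strings would still need the kind of subdivision cleanup mentioned in the text.

```python
# counting facet values from simplified marc records: 650 (topical term),
# 655 (genre/form), and 651 (geographic name) are standard marc 21 tags, but
# the record structure and the tag-to-facet assignment here are illustrative.
from collections import Counter, defaultdict

FACET_TAGS = {"650": "topic", "655": "genre", "651": "region"}

def build_facets(records):
    """tally facet values across a batch of records (tag -> list of headings)."""
    facets = defaultdict(Counter)
    for record in records:
        for tag, facet in FACET_TAGS.items():
            for heading in record.get(tag, []):
                facets[facet][heading.strip().lower()] += 1
    return facets

batch = [
    {"650": ["Metadata", "Cataloging"], "651": ["United States"]},
    {"650": ["Metadata"], "655": ["Handbooks"]},
]
for facet, counts in build_facets(batch).items():
    print(facet, counts.most_common())
```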
another (but much less desirable) approach to augmenting metadata is for a metadata specialist to manually edit one record or group of records. the xc cataloging interface, built within the drupal cms, will allow recordby-record editing of metadata when necessary. while we see this editing interface as essential functionality for xc, we anticipate that libraries will want to use this feature sparingly. in many cases it will be preferable to correct or augment metadata within its original repository (e.g., the institution’s ils) and then re-harvest the corrected metadata, rather than correcting it manually within xc itself. because of the expense of manual metadata augmentation and correction, libraries will be well-advised to rely upon insights gained through user research to assess the value of this type of work. for example, a library might decide to edit individual metadata records only when the correction or augmentation will support specific system functionality that is of high priority for the institution’s users. implementing frbr results groupings to incorporate logical groupings of search results based upon the frbr64 and frad65 data models over sets of diverse metadata within xc, we will encounter similar difficulties that we face with faceting and clustering. various analyses of the marc 21 formats have dealt extensively with the relationship between frbr and marc 21,66 and others have written specifically about methodology for frbrizing a marc-based catalog.67 in addition, various tools and web services are available that can potentially facilitate this process.68 even with this extensive body of work to draw upon, however, the success of our implementation of frbr-based functionality will depend upon both the quality and completeness of the system’s metadata. metadata in xc that originated as dublin core records may need significant augmentation to be incorporated effectively into frbrized results displays. to maximize the ability of the system to support frbr/frad results groupings, we may need to supplement automated grouping of resources with a combination of additional services for the metadata services hub, and with cataloger-generated metadata correction and augmentation, as described above.69 the xc team will use the results of user research carried out during the next phase of the xc project to inform our decision-making regarding what frbr-informed results grouping users find helpful, and then assess what specific metadata augmentation services are needed for xc. providing frbr-informed groupings of related records in search results will be easier when the underlying metadata incorporates principles of authority control. of course, the vast majority of the non-marc metadata that will be ingested into xc will not be under authority control. again, this situation suggests the need for additional services or functionality to improve existing metadata within the xc metadata hub, the xc cataloging interface, or both. as an experiment in developing services to facilitate authority control, the xc project team carried out a pilot project in partnership with a group of software engineering students from the rochester institute of technology (rit) during phase 1 of xc. the rit students designed a basic name access control tool that can be used across disparate metadata schemas in an environment such as xc. 
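the next paragraph describes what the rit name access control tool can ingest and automate; purely as an illustration of the kind of normalization-based matching such a tool might start from, the sketch below links exact normalized matches to an authorized heading and queues everything else for cataloger review. the names, normalization rules, and data structures are assumptions, not the rit tool's design.

```python
# a naive name-matching pass: normalize name strings from different schemas,
# link exact matches to an authorized form, and queue the rest for review.
import re

authority = {"smith, john, 1950-": "Smith, John, 1950-"}   # normalized key -> authorized heading

def normalize(name):
    """lowercase, strip punctuation and extra spaces so 'Smith, John.' == 'smith, john'."""
    name = re.sub(r"[^\w\s,-]", "", name.lower())
    return re.sub(r"\s+", " ", name).strip()

def match_names(candidates, authority):
    """return (linked, problems): exact normalized matches vs. names needing review."""
    linked, problems = {}, []
    for raw in candidates:
        key = normalize(raw)
        if key in authority:
            linked[raw] = authority[key]
        else:
            problems.append(raw)   # handed to the cataloger as a problem report
    return linked, problems

# names as they might arrive from marc bibliographic fields and dublin core creators
incoming = ["Smith, John, 1950-", "smith, john 1950-", "Jones, A."]
linked, problems = match_names(incoming, authority)
print(linked)     # exact normalized match linked to the authorized heading
print(problems)   # near-matches and unknowns still need a cataloger's review
```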
the tool can ingest marc 21 authority and bibliographic records as well as dublin core records, provide automated matching, and facilitate a cataloger's handling of problem reports.70 the xc project team will implement the automated portion of the tool as a web service within the xc hub, and the "cataloger facilitation" portion of the tool within the xc cataloging user interface. institutions that use xc can then incorporate additional tools to facilitate authority control into xc as they are needed and developed. in addition to providing a test case for developing xc metadata services, the rit pilot project proved valuable by providing an opportunity for student software developers and catalogers to discuss the functional requirements of a cataloging tool. not only did the experience enable the developers to understand the needs of the system's intended users, but it also presented an opportunity for the engineering students to demonstrate technological possibilities that the catalogers—who work almost exclusively with legacy ils technology—may not have envisioned before participating in the project.

requirement 6—the system must manage user-generated metadata resulting from user tagging, submission of reviews, etc.

because users now expect web-based tools to offer web 2.0 functionalities, the xc project has as one of its basic goals to incorporate these functionalities into xc's user environments. the results of the xc survey rank tools to support the finding, gathering, use, and reuse of scholarly content (e.g., rss feeds, blogs, tagging, user reviews) eighth out of a list of twenty new desirable opac features.71 we expect to learn much more about the usefulness of web 2.0 technology within a next-generation system through the user research that we will carry out during phase 2 of the xc project. the xc system will capture metadata generated by users from any one of the system's user environments (e.g., drupal-based interface, learning management system integration) and harvest it back into the system's metadata services hub for processing.72 the xc application profile will incorporate user-generated metadata, mapped into its own carefully defined metadata elements. this will allow us to capture and manage this metadata as discrete content, without inadvertently mixing it with other metadata created by library staff or ingested from other sources.

goal 4: conduct user research to inform system development

user research will be essential to informing the design and functionality of the xc software. to align xc's functional requirements as closely as possible with user needs, the xc project team will practice a user-centered design methodology that takes an iterative approach to defining the system's functional requirements. since we will engage concurrently in the processes of user research and software design, we will not fully determine the system requirements for xc until a significant amount of user research has been done. a complete picture of the demands upon metadata within xc will thus emerge as we gain information from our user research.

goal 5: publish the xc code as open-source software

central to the vision of the xc project is sharing the xc software freely throughout the library community and beyond. our hope is that others will use all or part of the xc software, modify it, and improve it to meet their own needs. new requirements for the metadata within xc are likely to arise as this process takes place.
other future changes to the xc software will also be needed to ensure the software's continued compatibility with various metadata standards and schemas. these changes will all affect the system requirements for xc over time.

addressing goals 4 and 5

while goals 1 through 3 for the xc project result in specific high-level functional requirements for the system's discovery metadata that can be addressed and discussed as xc is being developed, goals 4 and 5 present general challenges that must be addressed in the future. goal 4 is likely to fuel the need to update the xc software over time as the needs of users change. goal 5 provides a challenge to managing that updating process in a collaborative environment. these two goals suggest an additional general requirement for the system's metadata:

requirement 7—the system's metadata must be extensible to facilitate future enhancements and updates.

enabling future user needs

developing xc using a user-centered design process in which user research and software design occur simultaneously will enable us to design and build a system that is as responsive as possible to the needs of users that are seeking library resources. however, user needs will change during the life of the xc software. these needs must be assessed and addressed, and then weighed against the desires of individual institutions that use xc and who request specific system enhancements. to carry forward the xc project's commitment to serving users, we will develop a governance model for the xc community that brings the needs of future users into the decision-making process by providing a method for continuing to determine and capture user needs. in addition, we will consciously cultivate a commitment to user research among members of the xc community. because the xc software will be released as open source, we can also encourage xc partners to develop whatever additional functionality they need for their own institutions and make these enhancements available to the entire community of xc users. this approach is very different from the enhancement process in place for most commercial systems, and xc partner institutions may need to adjust to this approach.

enabling future metadata standards

as current metadata standards are revised and new standards and schemas are created, xc must be able to accommodate these changes. new crosswalks will allow new metadata schemas to be mapped to the xc internal schema in the future. the xc application profile can be updated with the addition of new data elements as needed. the drupal-based xc user environment will also allow institutions that use xc to create new internal data types to incorporate additional types of metadata. as the development of the semantic web moves forward73 and enables smart linking between existing authority files and vocabularies,74 xc's architecture can make use of the resulting web services, either by incorporating them through the xc metadata services hub or through the native xc user interface as part of a user search query.

further considerations

the above discussion of the goals and requirements for xc has revealed a number of issues related to the development of next-generation discovery systems that are unfortunately beyond the scope of the next phase of the xc project. we therefore offer them as a possible agenda for future work by the broader library community:

1.
explore the wider usefulness of web-based metadata services and the need for an automated metadata services coordinator to control these functions. libraries are already comfortable with basic “services” that are performed on metadata by an outside agency: for example, a library may send copies of its marc records to a vendor for authority processing or enrichment with tables of contents or other data elements. the library community should encourage vendors and others to develop these and other metadata enrichment options as automated web services. 2. study the advantages of using statement-level metadata provenance, as used in the nsdl metadata management system and considered for use within the xc metadata services hub, and explore whether there are ways that marc 21 could move toward allowing more granularity in recording and sharing metadata provenance. 3. to facilitate access to licensed library resources, encourage the development of more robust metasearch technology and standards so that technological limitations do not hinder system performance and search result usability. if this is not successful, libraries and content providers must work together to enable metadata for licensed resources to be revealed within open discovery environments such as xc and ethicshare.75 this second scenario will enable libraries to directly address usability issues with the display of licensed content, which may make it a more desirable longer-term solution than attempting to improve metasearch technology. 4. the administrative bodies of the two groups represented on the dcmi/rda task group (i.e., the dublin core metadata initiative and the rda committee of principals) have a responsibility to take the lead in funding this group’s work to develop and maintain the rda/dc application profile and its related registries and vocabularies. beyond this, however, the broader library community must recognize that this work is essential to ensure that future library metadata standards will function in the broader web environment, and offer additional administrative and financial support for it in the coming years. 5. to ensure that library standards work effectively outside of traditional library systems, catalogers and metadata experts must develop ongoing, collaborative working relationships with system developers. such collaboration will necessitate educating each group of experts about the domain of the other. 6. libraries should experiment with using metadata in new environments and use the lessons learned from this activity to inform the metadata standards development process. while current library automation environments by and large do not provide opportunities for this, the extensible catalog will provide a flexible platform where experimentation can take place.76 xc will make experimentation as risk-free as possible by ensuring that the original metadata brought into the system can be reharvested in its original form, thus minimizing concerns about possible data corruption. xc will also minimize the investment needed for a library to engage in this experimentation because it will be released as open-source software. 7. to facilitate new functionality for next-generation library discovery environments, libraries must share their new expertise in this area with each other. 
for example, library professional organizations (such as ala and its associations) should form discussion groups and committees devoted to sharing lessons learned from the implementation of faceted interfaces and web 2.0 technologies, such as tagging and folksonomies. such groups should develop a "best practices" document outlining a preferred way to define facets from marc 21 data that can be used by any library implementing faceting on top of its legacy metadata.

8. the library community should discuss and encourage mechanisms for pooling and sharing user-generated metadata among libraries and other interested institutions.

conclusions

to present library resources via the web in a manner that users now expect, library metadata must function in ways that have never been required of it before. making library metadata function effectively within the broader web environment will require that libraries take advantage of the combined knowledge of experts in the areas of cataloging/metadata and system development who share a common vision for serving library users. the challenges to making legacy library metadata and newer metadata for digital resources interact effectively in the broader web environment are significant, and work must begin now to ensure that we can preserve the investment that libraries have made in their legacy metadata. while the recommendations within this report are the result of planning to develop one particular library discovery system—the extensible catalog (xc)—these lessons can inform the development of other systems as well. the actual development of xc will continue to add to our knowledge in this area. while it may be tempting to wait and see what commercial vendors offer as their next generation of commercial discovery products, such a passive approach may jeopardize the future viability of library metadata. projects such as the extensible catalog can serve as a vehicle for moving forward by providing an opportunity for libraries to experiment and to then take informed action to move the library community toward a next generation of resource discovery systems.

acknowledgments

phase 1 of the extensible catalog project was funded through a grant from the andrew w. mellon foundation. this paper is in partial fulfillment of that grant, originally funded on april 1, 2006, and concluding on june 30, 2007. the author acknowledges the contributions of the entire university of rochester extensible catalog project team to the content of this paper, and especially thanks david lindahl, barbara tillett, and konstantin gurevich for reading and offering suggestions on drafts of this paper.

references and notes

1. despite the use of the word "catalog" within the name of the extensible catalog project, this paper will avoid using the word "catalog" in the phrase "next-generation catalog" because this may misleadingly convey the idea of a catalog as solely a single, separate web destination for library users. instead, terms such as "discovery environment" and "discovery system" will be preferred.
2. the xc blog provides a list of xc partners, describes their roles in xc phase 2, and provides links to reports that represent the outcomes of xc phase 1. "xc (extensible catalog): an open-source online system that will unify access to traditional and digital library resources," www.extensiblecatalog.info (accessed october 4, 2007).
3.
ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records (munich: k. g. saur, 1998), www.ifla.org/vii/s13/frbr/ frbr.pdf (accessed july 23, 2007). 4. ifla working group on functional requirements and numbering of authority records (franar), “functional requirements for authority data: a conceptual model,” april 1, 2007, www.ifla.org/vii/d4/franar-conceptualmodel2ndreview.pdf (accessed july 23, 2007). 5. library of congress, network development and marc standards office, “marc 21 formats,” april 18, 2005, www.loc .gov/marc/marcdocz.html (accessed september 3, 2007). 6. “dublin core metadata element set, version 1.1,” december 20, 2004, http://dublincore.org/documents/dces (accessed september 3, 2007). 7. university of rochester river campus libraries, “extensible catalog phase 2,” (grant proposal submitted to the andrew w. mellon foundation, july 11, 2007). 8. “literature list,” extensible catalog blog, www. extensiblecatalog.info/?page_id=17 (accessed august 27, 2007). 9. a summary of the results of this survey is available on the xc blog. nancy fried foster et al., “extensible catalog survey report,” july 20, 2007, www.extensiblecatalog.info/wp-content/ uploads/2007/07/xc%20survey%20report.pdf (accessed july 23, 2007). 10. lorcan dempsey has written of the need for a service layer for libraries that would facilitate the “de-coupling” of resource retrieval from back-end processing. lorcan dempsey, “a palindromic ils service layer,” lorcan dempsey’s weblog, january 20, 2006, http://orweblog.oclc.org/archives/000927. html (accessed august 24, 2007). 11. “open archives initiative protocol for metadata harvesting v. 2.0,” www.openarchives.org/oai/openarchivesprotocol. html (accessed august 27, 2007). 12. library of congress, working group on the future of bibliographic control, “report on the future of bibliographic control: draft for public comment,” november 30, 2007, www .loc.gov/bibliographic-future/news/lcwg-report-draft-11-3007-final.pdf (accessed december 30, 2007). 13. university of california libraries bibliographic services task force, “rethinking how we provide bibliographic services for the university of california,” final report, 34, http://libraries. universityofcalifornia.edu/sopag/bstf/final.pdf (accessed august 24, 2007). 14. “[worldcat.org] search for an item in libraries near you,” www.worldcat.org (accessed august 24, 2007). 15. oclc’s plan to create additional apis to worldcat as part of its worldcat grid project is a welcome development that may enable oclc members to harvest metadata directly from worldcat into a system such as xc in the future. see the following blog posting for an early description of oclc’s plans, which have not been formally unveiled by oclc as of this writing: bess sadler, “the librarians and the chocolate factory: oclc developer network day,” solvitur ambulando, october 3, 2007, www.ibiblio.org/bess/?p=88 (accessed december 30, 2007). 16. “metadata management system,” nsdl registry, september 20, 2006, http://metadataregistry.org/wiki/index.php/ metadata_management_system (accessed july 23, 2007). 17. diane hillmann, stuart sutton, and jon phipps, “nsdl metadata improvement and augmentation services,”(grant proposal submitted to the national science foundation, 2007). 18. library of congress, network development and marc standards office, “marcxml: marc 21 xml schema,” july 26, 2006, www.loc.gov/standards/marcxml (accessed september 3, 2007). 
metadata to support next-generation library resource discovery | bowen 17 19. andrew k. pace, “category: metasearch,” hectic pace, http://blogs.ala.org/pace.php?cat=150 (accessed august 27, 2007). see in particular the following blog entries: “metameta,” july 25, 2006; “more meta,” september 29, 2006; “preaching to the publishers,” oct 31, 2006; “even more meta,” july 11, 2007; and “still here,” august 21, 2007. 20. david lindahl, “metasearch in the users’ context,” the serials librarian 51, no. 3/4 (2007): 220–222. 21. ethicshare, a collaborative project of the university of minnesota, georgetown university, indiana university–bloomington, indiana university–purdue university indianapolis, and the university of virginia, is addressing this challenge as part of its plan to develop a sustainable online environment for the practical ethics community. the architecture of the proposed ethicshare system has many similarities to that of xc, but the project focuses specifically upon ingesting citation metadata from a variety of sources, including commercial providers. see cecily marcus, “ethicshare planning phase final report,” july 2007, www.lib.umn.edu/about/ethicshare/university%20 of%20minnesota_ethicshare_final_report.pdf (accessed august 27, 2007). 22. roy tennant used this phrase in “marc exit strategies,” library journal 127, no. 19 (november 15, 2002), www.libraryjournal.com/article/ca256611.html?q=tennant+exit (accessed july 23, 2007); karen coyle presented her vision for moving beyond marc to a more flexible, identifier-based record structure that will facilitate a range of library functions in “future considerations: the functional library systems record,” library hi tech 22, no. 2 (2004). 23. library of congress, network development and marc standards office, “mets: metadata encoding and transmission standard official web site,” august 23, 2007, www.loc.gov/ standards/mets (accessed september 3, 2007). 24. library of congress, network development and marc standards office, “mods: metadata object description schema,” august 22, 2007, www.loc.gov/standards/mods (accessed september 3, 2007). 25. library of congress, network development and marc standards office, “mads: metadata authority description schema,” february 2, 2007, www.loc.gov/standards/mads (accessed september 3, 2007). 26. “premis: preservation metadata maintenance activity,” july 31, 2007, www.loc.gov/standards/premis (accessed september 3, 2007). 27. library of congress, network development and marc standards office, “ead: encoded archival description version 2002 official site,” august 17, 2007, www.loc.gov/ead (accessed september 3, 2007). 28. visual resources association, “vra core: welcome to the vra core 4.0,” www.vraweb.org/projects/vracore4 (accessed september 3, 2007). 29. “dublin core metadata element set, version 1.1.” 30. other xml-compatible schemas, such as mods and mads, will also be supported initially in xc if they are first converted into marc xml or qualified dublin core. in the future, we plan to allow these other schemas to be harvested directly into xc. 31. foster et al., “extensible catalog survey report,” july 20, 2007, 15. the original comment was submitted by meg bellinger in yale university’s response to the xc survey. 32. patricia harpring et al., “metadata standards crosswalks,” in introduction to metadata: pathways to digital information (getty research institute, n.d.), www.getty.edu/research/ conducting_research/standards/intrometadata/crosswalks. 
html (accessed august 29, 2007); see also carol jean godby, jeffrey a. young, and eric childress, “a repository of metadata crosswalks,” d-lib magazine 10, no. 12 (december 2004), www .dlib.org/dlib/december04/godby/12godby.html (accessed july 23, 2007). 33. digital library federation, “crosswalkinglogic,” june 22, 2007, http://webservices.itcs.umich.edu/mediawiki/oaibp/ index.php/crosswalkinglogic (accessed august 28, 2007). 34. karen coyle et al., “framework for a bibliographic future,” may 2007, http://futurelib.pbwiki.com/framework (accessed july 23, 2007). 35. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records. 36. andy powell et al., “dcmi abstract model,” dublin core metadata initiative, june 4, 2007, http://dublincore.org/ documents/abstract-model (accessed august 29, 2007). 37. joint steering committee for development of rda, “rda: resource description and access: background,” july 16, 2007, www.collectionscanada.ca/jsc/rda.html (accessed august 29, 2007). 38. joint steering committee for development of rda, “rda-frbr mapping,” june 14, 2007, www.collectionscanada .ca/jsc/docs/5rda-frbrmapping.pdf (accessed august 29, 2007). 39. joint steering committee for development of rda, “rda element analysis,” june 14, 2007, www.collectionscanada.ca/ jsc/docs/5rda-elementanalysis.pdf (accessed august 28, 2007). a revised version of the document was issued on december 16, 2007, at www.collectionscanada.gc.ca/jsc/docs/5rda-element analysisrev.pdf (accessed december 30, 2007). 40. “data model meeting: british library, london 30 april–1 may 2007,” www.bl.uk/services/bibliographic/meeting.html (accessed july 23, 2007). the task group has outlined its work plan, including deliverables, on its wiki at http://dublincore .org/dcmirdataskgroup (accessed october 4, 2007). 41. emily a hicks, jody perkins, and margaret beecher maurer, “application profile development for consortial digital libraries,” library resources and technical services 51, no. 2 (april 2007). 42. makx dekkers, “application profiles, or how to mix and match metadata schemas,” cultivate interactive, january 2001, www.cultivate-int.org/issue3/schemas (accessed august 29, 2007). 43. thomas baker et al., “dublin core application profile guidelines,” september 3, 2005, http://dublincore.org/usage/ documents/profile-guidelines (accessed october 8, 2007). 44. joint steering committee for development of rda, “rda element analysis.” 45. karen coyle and diane hillmann, “resource description and access (rda): cataloging rules for the 20th century,” d-lib magazine 13, no. 1/2 (jan./feb. 2007), www.dlib.org/dlib/ january07/coyle/01coyle.html (accessed august 24, 2007). 46. karen coyle, “astonishing announcement: rda goes 2.0,” coyle’s information, may 3, 2007, http://kcoyle.blogspot .com/2007/05/astonishing-announcement-rda-goes-20.html (accessed august 29, 2007). 18 information technology and libraries | june 2008 47. “drupal.org,” http://drupal.org (accessed august 30, 2007). 48. foster et al., “extensible catalog survey report,” 14. 49. “taxonomy: a way to organize your content,” drupal.org, http://drupal.org/handbook/modules/taxonomy (accessed september 12, 2007). 50. “blackboard learning system,” www.blackboard.com/ products/academic_suite/learning_system/index.bb (accessed august 31, 2007). 51. “sakai: collaboration and learning environment for education,” http://sakaiproject.org (accessed august 31, 2007). 52. 
for example, the library into blackboard project at california state fullerton has developed a toolkit for faculty that brings openurl resolver functionality into blackboard to create linked citations to resources. see “putting the library into blackboard: a toolkit for cal state fullerton faculty,” 2005, www .library.fullerton.edu/librarytoolkit/default.shtml (accessed august 31, 2007); and susan tschabrun, “putting the library into blackboard: using the sfx openurl generator to create a toolkit for faculty.” the sakaibrary project at indiana university and the university of michigan are working to integrate licensed library content into sakai using metasearch technology. see “sakaibrary: integrating licensed library resources with sakai,” june 28, 2007, www.dlib.indiana.edu/projects/sakai (accessed august 31, 2007). 53. university of rochester river campus libraries, “extensible catalog phase 2.” 54. susan gibbons, “library course management systems: an overview,” library technology reports 41, no. 3 (may/june 2005): 34–37. 55. marti a. hearst, “design recommendations for hierarchical faceted search interfaces,” august 2006, http:// flamenco.berkeley.edu/papers/faceted-workshop06.pdf (accessed august 31, 2007). 56. kristin antelman, emily lynema, and andrew k. pace, “toward a twenty-first century library catalog,” information technology and libraries 25, no. 3 (september 2006): 128–138. 57. “c4,” https://www.library.rochester.edu/c4 (accessed september 28, 2007). as of the time of this writing, the c4 prototype is available to the public. however, the prototype is no longer being developed, and this prototype may cease to be available at some point in the future. 58. charley pennell, “forward to the past: resurrecting faceted search @ ncsu libraries,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), www.lib.ncsu.edu/endeca/ presentations/200706-facetedcatalogs-pennell.ppt (accessed august 31, 2007). 59. mary charles lasater, “authority control meets faceted browse: vanderbilt and primo,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), www.ala.org/ala/lita/litamembership/ litaigs/authorityalcts/2007annualfiles/marycharleslasater.ppt (accessed august 31, 2007). 60. casey bisson, “faceting and clustering: an implementation report based on scriblio,” (powerpoint presentation at the american library association annual conference, washington, d.c., june 24, 2007), http://oz.plymouth.edu/~cbisson/ presentations/alaannual_2-2007june24.pdf (accessed august 31, 2007). 61. “subject access fields (6xx),” in marc 21 concise format for bibliographic data (2006), www.loc.gov/marc/bibliographic/ ecbdsubj.html (accessed september 28, 2007). 62. pennell, “forward to the past: resurrecting faceted search@ ncsu libraries.” 63. “fast: faceted application of subject terminology,” www.oclc.org/research/projects/fast (accessed august 31, 2007). 64. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records. 65. ifla working group on functional requirements and numbering of authority records (franar), “functional requirements for authority data.” 66. library of congress, network development and marc standards office, “functional analysis of the marc 21 bibliographic and holding formats,” april 6, 2006, www.loc. gov/marc/marc-functional-analysis/functional-analysis.html (accessed august 31, 2007); martha m. 
yee, “frbrization: a method for turning online public finding lists into online public catalogs,” information technology and libraries 24, no. 2 (june 2005): 77–95; pat riva, “mapping marc 21 linking entry fields to frbr and tillett’s taxonomy of bibliographic relationships,” library resources and technical services 48, no. 2 (april 2004): 130–143. 67. trond aalberg, “a process and tool for the conversion of marc records to a normalized frbr implementation,” in digital libraries: achievements, challenges and opportunities (berlin/heidelberg: springer, 2006), 283–292; christian monch and trond aalberg, “automatic conversion from marc to frbr,” in research and advanced technology for digital libraries (berlin/heidelberg: springer, 2003): 405–411; david mimno and gregory crane, “hierarchical catalog records: implementing a frbr catalog,” d-lib magazine 11, no. 10 (october 2005), www .dlib.org/dlib/october05/crane/10crane.html (accessed august 24, 2007). 68. trond aalberg, frank berg haugen, and ole husby, “a tool for converting from marc to frbr,” in research and advanced technology for digital libraries (berlin/heidelberg: springer, 2006), 453–456; “frbr work-set algorithm,” www .oclc.org/research/software/frbr/default.htm (accessed august 31, 2007); “xisbn (web service),” www.worldcat .org/affiliate/webservices/xisbn/app.jsp (accessed august 31, 2007). 69. for example, marc 21 data may need to be augmented to extract data attributes related to frbr works and expressions that are not explicitly coded within a marc 21 bibliographic record (such as a date associated with a work coded within a general note field); or to “sort out” the fields in a marc 21 bibliographic record for a single resource that contains various works and/or expressions (e.g. ,a sound recording with multiple tracks), to associate the various fields (performer access points, analytical entries, subject headings, etc.) with the appropriate work or expression. 70. while the rit-developed tool is not publicly available at the time of this writing, it is our intent to post it to sourceforge (www.sourceforge.net) in the near future. the final report of the rit project is available at http://docushare.lib.rochester.edu/ docushare/dsweb/get/document-27362 (accessed january 2, 2008). metadata to support next-generation library resource discovery | bowen 19 71. foster et al., “extensible catalog survey report.” 72. note the arrow pointing to the left in figure 1 between the user environments and the metadata services hub. 73. jane greenberg and eva mendez, knitting the semantic web (binghamton, ny: haworth information press, 2007). this volume, co-published simultaneously as cataloging and classification quarterly 43, no. 3/4, contains a wealth of articles that explore the role that libraries can, and should, play in the development of the semantic web. 74. corey a. harper and barbara b. tillett explore various methods for making these controlled vocabularies available in “library of congress controlled vocabularies and their application to the semantic web,” cataloging and classification quarterly 43, no. 3/4 (2007): 63. the development of skos (simple knowledge organization system), a semantic web language for representing controlled structured vocabularies, will also be valuable for xc. see alistair miles and jose r. perez-aguiera, “skos: simple knowledge organisation for the web,” catalogingand classification quarterly 43, no. 3/4 (2007). 75. marcus, “ethicshare planning phase final report.” 76. 
the talis platform provides another promising environment for experimentation and development. see “talis platform: semantic web application platform,” talis, www.talis.com/ platform (accessed september 2, 2007). student use of library computers: are desktop computers still relevant in today’s libraries? susan thompson information technology and libraries |december 2012 20 abstract academic libraries have traditionally provided computers for students to access their collections and, more recently, facilitate all aspects of studying. recent changes in technology, particularly the increased presence of mobile devices, calls into question how libraries can best provide technology support and how it might affect the use of other library services. a two-year study conducted at california state university san marcos library analyzed student use of computers in the library, both the library’s own desktop computers and laptops owned by students. the study found that, despite the increased ownership of mobile technology by students, they still clearly preferred to use desktop computers in the library. it also showed that students who used computers in the library were more likely to use other library services and physical collections. introduction for more than thirty years, it has been standard practice in libraries to provide some type of computer facility to assist students in their research. originally, the focus was on providing access to library resources, first the online catalog and then journal databases. for the past decade or so, this has expanded to general-use computers, often in an information-commons environment, capable of supporting all aspects of student research from original resource discovery to creation of the final paper or other research product. however, times are changing and the ready access to mobile technology has brought into question whether libraries need to or should continue to provide dedicated desktop computers. do students still use and value access to computers in the library? what impact does student computer use have on the library and its other services? have we reached the point where we should reevaluate how we use computers to support student research? california state university san marcos (csusm) is a public university with about nine thousand students, primarily undergraduates from the local area. csusm was established in 1991 and is one of the youngest campuses in the 23-campus california state university system. the library, originally located in space carved out of an administration building, moved into its own dedicated library building in 2004. one of the core principles in planning the new building was the vision of the library as a teaching and learning center. as a result, a great deal of thought went into the design of technology to support this vision. rather than viewing technology’s role as just supporting access to library resources, we expanded its role to providing cradle-to-grave support for the entire research process. we also felt that encouraging students to work in the library would encourage use of traditional library materials and the expertise of library staff, since these resources would be readily available.1 susan thompson (sthompsn@csusm.edu) is coordinator of library systems, california state university san marcos. 
student use of library computers | thompson 21 rethinking our assumptions about library technology’s role in the student research process led us to consider the entire building as a partner in the students’ learning process. rather than centralizing all computer support in one information commons, we wanted to provide technology wherever students want to use it. we used two strategies. first, we provided centralized technology using more than two hundred desktop computers, most located in four of our learning spaces: reference, classrooms, the media library, and the computer lab. three of these spaces are configured like information commons, providing full-service research computers grouped around the service desks near each library entrance. in addition, simplified “walk-up” computers are available on every floor. the simplified computers provide limited web services to encourage quick turnaround and no login requirement to ensure ready access to library collections for everyone, including community members. the other major component of our technology plan was the provision of wireless throughout the building, along with extensive power outlets to support mobile computing. more than forty quiet study rooms, along with table “islands” in the stacks, help support the use of laptops for group study. however, only two of these quiet studies, located in the media library, provide desktop computers designed specifically to support group work. in 2009 and again in 2010, we conducted computer use studies to evaluate the success of the library’s technology strategy and determine whether the library’s desktop computers were still meeting student needs as envisioned by the building plan. the goal of the study was to obtain a better understanding of how students use the library’s computers, including types of applications used, computer preferences, and computer-related study habits. the study addressed several specific research questions. first, librarians were concerned that the expanded capabilities of the desktop computers distracted students from an academic and library research focus. were students using the library’s computers appropriately? second, the original technology plan had provided extensive support for mobile technology, but the technology landscape has changed over time. how did the increase in student ownership of mobile devices—now at more than 80 percent—affect the use of the desktop computers? finally, did providing an application-rich computer environment encourage student to conduct more of their studying in the library, leading them more frequently to use traditional library collections and services? this article will focus on the study results pertaining to the second and third research questions. we found that, according to our expectations, students using library computer facilities also made extensive use of traditional library services. however, we were surprised to discover that the growing availability of mobile devices had relatively little impact on students’ continuing preference for libraryprovided desktop computers. literature review the concept of the information commons was just coming into vogue in the early 2000s, when we were designing our library building, and it strongly influenced our technology design as well as building design. 
information commons, defined by steiner as the “functional integration of technology and service delivery,” have become one of the primary methods by which libraries provide enhanced computing support for students studying in the library.2 one of the changes in libraries motivating the information-commons concept is the desire to support a broad range of learning styles, including the propensity to mix academic and social activities. particularly influential to our design was the concept of the information commons supporting students’ projects “from inception to completion” by providing appropriate technologies to facilitate research, collaboration, and consultation.3 information technology and libraries |december 2012 22 providing access to computers appears to contribute to the value of libraries as “place.” shill and toner, early in the era of information commons, noted “there are no systematic, empirical studies documenting the impact of enhanced library buildings on student usage of the physical library.” 4 since then, several evaluations of the information-commons approach seem to show a positive correlation between creation of a commons and higher library usage because students are now able to complete all aspects of their assignments in the library. for example, the university of tennessee and indiana university have shown significant increases in gate counts after they implemented their commons.5 while many studies discuss the value of information commons, very few look at why library computers are preferred over computers in other areas on campus. burke looked at factors influencing students’ choice of computing facilities at an australian university.6 given a choice of central computer labs, residence hall computers, and the library’s information commons, most students preferred the computers in the library over the other computer locations, with more than half using the library computers more than once a week. they rated the library most highly on its convenience and closeness to resources. perhaps the most important trend likely to affect libraries’ support for student technology needs is the increased use of mobile technology. the 2010 nationwide educause center for applied research (ecar) study, from the same year as the second csusm study, showed that 89 percent of students had laptops.7 other nationwide studies have corroborated this high level of laptop ownership.8 so, does this increased use of laptops and mobile devices have affect the use of desktop computers? the 2010 ecar study reported that desktop ownership (about 50 percent in 2010) had declined by more than 25 percent between 2006 and 2009, a significant period in the lifetime of csusm’s new library building. pew’s internet & american life project trend data showed desktop ownership as the only gadget category in which ownership is decreasing, from 68 percent in 2006 to 55 percent at the end of 2011.9 some libraries and campuses are beginning to respond to the increase in laptop ownership by changing their support for desktop computers. university of colorado boulder, in an effort to decrease costs and increase availability of flexible campus spaces, is making a major move away from providing desktop computers.10 while they found that 97 percent of their students own laptops and other mobile devices, they were concerned that many students still preferred to use desktop computers when on campus. 
to entice students to bring their laptops to campus, the university is enhancing its support for mobile devices by converting their central computer labs into flexible-use space with plentiful power outlets, flexible furniture, printing solutions, and access to the usual campus software. nevertheless, it may be premature for all libraries and universities to eliminate their desktop computer support. tom, voss, and scheetz found students want flexibility with a spectrum of technological options.11 certainly, they want wi-fi and power outlets to support their mobile technology. however, students also want conventional campus workstations providing a variety of functions, such as quick print and email computers, long-term workstations with privacy, and workstations at larger tables with multiple monitors that support group work. while the ubiquity of laptops is an important factor today, other forms of mobile devices may become more important in the future. a 2009 wall street journal article reported the trend for business travelers is to rely on smartphones rather than laptops.12 for the last three years, educause’s horizon reports have made support for non-laptop mobile technologies one of the top trends. the 2009 horizon report mentioned that in countries like japan, “young people equipped with mobiles often see no reason to own personal computers.”13 in 2010, horizon reported an interesting pilot project at a community college in which one group of students was issued mobile devices and another group was not.14 members of the group with the mobile devices were found to work on the course more during their spare time. the 2011 horizon report discusses mobiles as capable devices in their own right that are increasingly users’ first choice for internet access.15 therefore, rather than trying to determine which technology is most important, libraries may need to support multiple devices. trends described in the ecar and horizon studies make it clear that students own multiple devices. so how do they use them in the study environment? head’s interviews with undergraduate students at ten us campuses found that “students use a less is more approach to manage and control all of the it devices and information systems available to them.”16 for example, in the days before final exams, students were selective in their use of technology to focus on coursework yet remain connected with the people in their lives. the question then may not be which technology libraries should support but rather how to support the right technology at the right time.

method

the csusm study used a mixed-method approach, combining surveys with real-time observation to improve the effectiveness of assessment and generate a more holistic understanding of how library users made their technology choices. the study protocol received exempt status from the university human subjects review board. it was carried out twice over a two-year period to determine whether time of the semester affected usage. in 2009, the study was administered at the end of the spring term, april 15 to may 3. we expected that students near the end of the term would be preparing for finals and completing assignments, including major projects. the 2010 study was conducted near the beginning of the term, february 4 to february 18. we expected that early-term students would be less engaged in academic assignments, particularly major research projects. we carried out each study over a two-week period.
an attempt was made to check consistency by duplicating each time and location. each location was surveyed monday—thursday, once in the morning and once in the afternoon during the heavy-use times of 11 a.m. and 2 p.m. the survey locations included two large computer labs (more than eighty computers each), one located near the library reference desk and one near the academic technology helpdesk. other locations included twenty computers in the media library, a handful of desktop computers in the curriculum area, and laptop users, mostly located on the fourth and fifth floor of the library. the fourth and fifth floor observations also included the library’s forty quiet study rooms. for the 2010 study, the other large computer lab on campus (108 computers), located outside the library, also was included for comparison purposes. we used two techniques: a quantitative survey of library computer users and a qualitative observation of software applications usage and selected study habits. the survey tried to determine the purpose for which the student was using the computer for that day, what their computer preference was, and what other business they might have in the library. it also asked students for their suggestions for changes in the library. the survey was usually completed within the five-minute period that we had estimated and contained no identifying personal information. the survey administrator handed-out the one-page paper survey, along with a pencil if desired, to each student using a library workstation or using a laptop during each designated observation information technology and libraries |december 2012 24 period. users who refused to take the survey were counted in the total number of students asked to do the survey. however, users who indicated they refused because they had already completed a survey on a previous observation date were marked as “dup” in the 2010 survey and were not counted again. the “dup” statistic proved useful as an independent confirmation of the popularity of the library computers. the second method involved conducting “over-the-shoulder” observations of students using the library computers. while students were filling out the paper survey, the survey administrator walked behind the users and inconspicuously looked at their computer screens. all users in the area were observed whether or not they had agreed to take the survey. the one exception was users in group-study rooms. the observer did not enter the room and could only note behaviors visible from the door window, such as laptop usage or group studying. based on brief (one minute or less) observations, administrators noted on a form the type of software application the student was using at that point in time. the observer also noted other, nondesktop computer technical devices in use (specifically laptops, headphones, and mobile devices such as smart phones), and study behaviors, such as groupwork (defined as two or more people working together). the student was not identified on the form. we felt that these observations could validate information provided by the users on the survey. results we completed 1,452 observations in 2009 and 2,501 observations in 2010. the gate counts for the primary month each study took place—70,607 for april 2009 and 59,668 for february 2010— show the library was used more heavily during the final exam period. 
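the “dup” accounting described above feeds directly into the 2010 return-rate figure reported in the next paragraph. as a purely illustrative sketch (in python, using only the totals quoted below rather than the raw survey data), the arithmetic works out as follows:

# duplicate-adjusted return-rate arithmetic for the 2010 survey;
# the counts mirror the figures reported in the results discussion below.
completed = 1123        # surveys completed in 2010
unique_asks = 1423      # distinct students approached (repeat approaches excluded)
duplicates = 619        # repeat approaches of students who had already completed a survey

return_rate = completed / unique_asks      # 1123 / 1423 = 0.789, the reported 79 percent
refusals = unique_asks - completed         # refusals implied among the unique asks
print(f"return rate: {return_rate:.0%}; refusals: {refusals}; repeat approaches: {duplicates}")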
the larger number of results the second year was due to more careful observation of laptop and study-group computer users on the fourth and fifth floor and the addition of observations in a nonlibrary computer lab rather than an increase of students available to be observed. the observations looked at application usage, study habits, and devices present, but this article will only discuss the observations pertaining to devices. in 2009, 17 percent of students were observed using laptops (see table 1). this number almost doubled in 2010 to 33 percent. most laptop users were observed on the fourth and fifth floors where furniture, convenient electrical outlets, and quiet study rooms provided the best support for this technology. very few desktop computers were available, so students desiring to study on these floors have to bring their own laptops. almost 20 percent of students in 2010 were observed with other mobile technology, such as cell phones or ipods, and 16 percent were wearing headphones, which indicated there was other, often not visible, mobile technology in use.

table 1. mobile technology observed
laptop in use: 17% (2009), 33% (2010)
headphones in use: 16% (2010)
mobile device in use (cell phone, ipod): 18% (2010)

in 2009, 1,141 students completed the computer-use survey. however, we were unable to accurately determine the return rate that year. the nature of the study, which surveyed the same locations multiple times, revealed that many of the students were approached more than once to complete the survey. thus the majority of the refusals to take the survey were because the subject had already completed one previously. the 2010 study accounted for this phenomenon by counting refusals and duplications separately. in 2010, 1,123 students completed the survey out of 1,423 unique asks, resulting in a 79 percent return rate. the 619 duplicates counted represented about half of the 2010 surveys completed and could be considered another indicator of frequent use of the library’s computers. the 2010 results included an additional 290 surveys completed by students using the other large computer lab on campus outside the library.

table 2. frequency of computer use
daily when on campus: 49% (2009), 42% (2010)
several times a week: 33% (2009), 30% (2010)
several times a month: 11% (2009), 15% (2010)
rarely use computers in library: 9% (2009), 10% (2010)

in both years of the study, 78 percent of students said they preferred to use computers in the library to other computer lab locations on campus. students also indicated they were frequent users (see table 2). in 2009, 82 percent of students used the library computers frequently—49 percent daily and 33 percent several times a week. the frequency of use in the 2010 early term study dropped about 10 percent to 72 percent but with the same proportion of daily vs. weekly users. convenience and quiet were the top reasons given by more than half of students as to why they preferred the library computers, followed closely by atmosphere. about a quarter of students preferred library computers because of their close access to other library services.

table 3. preferred computer to use in the library
sit-down pc: 84% (2009), 71% (2010)
walk-up pc: 6% (2009), 5% (2010)
own laptop: 23% (2009), 28% (2010)
laptop checked out in library: 2% (2009), 2% (2010)

the types of computer that students preferred to use in the library were desktop computers followed by laptops owned by the students (see table 3).
it is notable that the preference for desktop computers changed significantly from 2009 to 2010: 84 percent of students preferred desktop computers in 2009 vs. 72 percent in 2010—a 12 percent decrease. not surprisingly, few students preferred the simplified walk-up computers used for quick lookups. however, we did not expect such little interest in checking out laptops, with only 2 percent preferring that option. the 2010 study added a new question to the survey to better understand the types of technology devices owned by students (see table 4). in 2010, 84 percent of students owned a laptop (combining the netbook and laptop statistics). almost 40 percent of students owned a desktop; therefore many students owned more than one type of computer. of the 85 percent of students that indicated they had a cell phone, about one-third indicated they owned smart phones. the majority of students own music players. the one technology students were not interested in was e-book readers, with less than 2 percent indicating ownership.

table 4. technology devices owned by students (2010)
laptop: 77%
ipod/mp3 music player: 59%
regular cell phone: 52%
desktop computer: 40%
smart phone: 31%
netbook: 7%
other handheld devices: 1%
kindle/book reader: 1%

to understand how the use of technology might affect use of the library in general, the survey asked students what other library services they used on the same day they were using library computers. table 5 shows survey responses are very similar between the late term 2009 study and the early term in 2010. by far the most popular use of the library, by more than three-quarters of the students, was for study. around 25 percent of the students planned to meet with others, and 20 percent planned to use the media services. around 15 percent of students planned to checkout print books, 15 percent planned to use journals, and 10 percent planned to ask for help. the biggest difference for students early in the term was an increased interest (5 percent more) in using the library for study. the late-term students were 9 percent more likely to meet with others. by contrast, users in the nonlibrary computer lab were much less likely to make use of other library services. only 24 percent of nonlibrary users planned to study in the library, and 8 percent planned to meet with others in the library that day. use of all other library services was less than 5 percent by the nonlibrary computer users.

table 5. other library services used
study: 76% (2009), 81% (2010), 23% (non-library)
meet with others: 35% (2009), 26% (2010), 7% (non-library)
use media: 20% (2009), 22% (2010), 4% (non-library)
checkout a book: 16% (2009), 13% (2010), 3% (non-library)
look for journals/newspapers: 15% (2009), 13% (2010), 3% (non-library)
ask questions/get help: 10% (2009), 10% (2010), 2% (non-library)
use a reserve book: 8% (2009), 9% (2010), 2% (non-library)
create a video/web page: 6% (2009), 3% (2010), 0% (non-library)
pick up ill/circuit: 3% (2009), 3% (2010), 0% (non-library)
other: 0% (2009), 4% (2010), 1% (non-library)

in 2010, we also asked users what changes they would like in the library, and 58 percent of respondents provided suggestions. the question was not limited to technology, but by far the biggest request for change was to provide more computers (requested by 30 percent of all respondents). analysis of the other survey questions regarding computer ownership and preferences revealed who was requesting more traditional desktops in the library. surprisingly, most were laptop users; 90 percent of laptop owners wanted more computers, and 88 percent of the respondents making this request were located on the fourth and fifth floor, which were almost exclusively laptop users.
the next most common comments were remarks indicating student satisfaction with the current library services: 19 percent of students said they were satisfied with current library services and 9 percent praised the library and its services. commonality of requests dropped quickly at that point, with the fourth most common request being for more quiet (2 percent).

discussion

the results show that students consistently prefer to use computers in the library, with 78 percent declaring a preference for the library over other computer locations on campus both years of the study. this preference is confirmed by the statistics reported by csusm’s campus it department, which tracks computer login data. this data consistently shows the library computer labs are used more than nonlibrary computer labs, with the computers near the library reference desk as the most popular, followed closely by the library’s second large computer lab, which is located next to the technology help desk. for instance, during the 2010 study period, the reference desk lab (80 computers) had 6,247 logins compared to 3,218 logins in the largest nonlibrary lab (108 computers)—double the amount of usage. the data also shows that use of the computers near the reference desk increased by 15 percent between 2007 and 2010. supporting the popularity of using computers in the library is the fact that most students are repeat customers. table 2 shows 82 percent of the 2009 late-term respondents used the library computers several times a week, with almost half using our computers daily. in contrast, 72 percent of the 2010 early term students used the library computers daily or several times a week. the 10 percent drop in frequency of visits to the library for computing applied to both laptop and desktop users and seems to be largely due to not yet receiving enough work from classes to justify more frequent use. the kind of computer that users preferred changed somewhat over the course of the study. the preference for desktop computers dropped from 84 percent of students in 2009 to 72 percent in 2010 (see table 3). one reason for this 12 percent drop may be related to how the survey was administered. the 2010 study did a more thorough job of surveying the fourth and fifth library floors where most laptop users are. as a result, the laptop floors represented 29 percent of the response in 2010 vs. only 13 percent in 2009. these numbers are also reflected in the proportion of laptops observed each year—33 percent in 2010 vs. 17 percent in 2009 (see table 1). the drop in desktop computer preference is interesting because it was not matched by an equally large increase in laptop preference, which only increased by 5 percent. the other reason for the decrease in desktop preference is likely due to the larger change seen nationwide in student laptop ownership.
for instance, the pew study of gadget ownership showed a 13 percent drop in desktop ownership over a five-year period, 2006–2011, while at the same time laptop ownership almost doubled from 30 percent to 56 percent.17 however, it is interesting to note that, according to the pew study, in 2011 the percent of adults who owned each type of device was nearly equal—55 percent for desktops and 56 percent for laptops. the 2010 survey tried to better understand students’ preferences by identifying all the kinds of technology they had available to them. we found that 77 percent of csusm students owned laptops and an additional 7 percent owned the netbook form of laptops (see table 4). the combined 84 percent laptop ownership is comparable with the 2010 ecar study’s finding of 89 percent student laptop ownership nationwide.18 this high level of laptop ownership may explain why the users who preferred laptop computers almost all preferred to use their own rather than laptops checked out in the library. despite the high laptop ownership and decrease in desktop preference, it is significant that the majority of csusm students still prefer to use desktop computers in the library. aside from the 72 percent of respondents who specifically stated a preference for desktop computers, the top suggestion for library improvement was to add more desktop computers, requested by 38 percent of respondents. further analysis of the survey data revealed that it was the laptop owners and the fourth and fifth floor laptop users who were the primary requestors of more desktop computers. to try to better understand this seemingly contradictory behavior, we have done some further investigation. anecdotal conversations with users during the survey indicated that convenience and reliability are two factors affecting students’ decisions to use desktop computers. the desktop computers’ speed and reliable internet connections were regarded as particularly important when uploading a final project to a professor, with some students stating they came to the library specifically to upload an assignment. in may 2012, the csusm library held a focus group that provided additional insight into the question of desktops vs. laptops. all of the eight-student focus group participants owned laptops, yet all eight participants indicated that they preferred to use desktop computers in the library. when asked why, participants indicated the reliability and speed of the desktop computers and the convenience of not having to remember to bring their laptop to school and “lug” it around. another factor influencing the convenience factor may be that our campus does not require that students own a laptop and bring it to class, so they may have less motivation to travel with their laptop. supporting the idea that students perceive different benefits for each type of computer, six of the eight participants owned a desktop computer in addition to a laptop. the 2010 study also showed that students see value in owning both a desktop and a laptop computer, since the 40 percent ownership of desktop computers overlaps the 84 percent ownership of laptops (see table 4).

table 6. reasons students prefer using library computer areas (2009 and 2010): library services are close; library staff are close

for almost half of the students surveyed, one of the reasons for their preference for using computers in the library was either the ready access to library services or staff (see table 6). even more significant, when specifically asked what else they planned to do in the library that day besides using the computer (see table 5), more than 80 percent of the students indicated that they intended to use the library for purposes other than computing. the top two uses for the library were studying (76 percent in 2009, 81 percent in 2010) and meeting with others (35/26 percent), indicating the importance of the library as place. the most popular library service was the media
even more significant, when specifically asked what else they planned to do in the library that day besides using the computer (see table 5), more than 80 percent of the students indicated that they intended to use the library for purposes other than computing. the top two uses for the library were studying (76 percent in 2009, 81 percent in 2010) and meeting with others (35/26 percent), indicating the importance of the library as place. the most popular library service was the media 0% 5% 10% 15% 20% 25% 30% library services are close library staff are close 2009 2010 student use of library computers | thompson 31 library (20/22 percent) followed by collections with 16/13 percent planning to checkout a book and 15/13 percent planning to look for journals and newspapers. it is interesting that the level of use of these library services was similar whether early or late in the term. the biggest difference was that early term students were less likely to be working with a group but were slightly more likely to be engaged in general studying. even the less-used services, such as asking a question (10 percent) or using a reserve book (8 percent), exhibited an appropriate amount of usage if one looks at the actual numbers. for example, 8 percent of 1,123 2010 survey respondents represent 90 students who used reserve materials sometime during the 8 hours of the two-week survey period. to put the use of the library by computer users into perspective, we also asked students using the nonlibrary computer lab if they planned to use the library sometime that same day. only 24 percent of the nonlibrary computer users planned to study in the library that day vs. 81 percent of the library computer users; only 4 percent planned to use media vs. 24 percent; and 2 percent planned to check out a book vs. 13 percent. the implication is clear that students using computers in the library are much more likely to use the library’s other services. we usually think of providing desktop computers as a service for students, and so it is. however, the study results show that providing computers also benefits the library itself. it reinforces its role as place by providing a complete study environment for students and encouraging all study behaviors including communication and working with others. the popularity of the library computers provide us with a “captive audience” of repeat customers. conclusion the csusm library technology that was planned in 2004 is still meeting students’ needs. although most of our students own laptops, most still prefer to use desktop computers in the library. in fact, providing a full-service computer environment to support the entire research process benefits the entire library. students who use computers in the library appear to conduct more of their studying in the library and thus make more use of traditional library collections and services. going forward, several questions arise for future studies. csusm is a commuter school. students often treat their work space in the library as their office for the day, which increases the importance of a reliable and comfortable computer arrangement. one question that could be asked is whether the results would be different for colleges where most students live on campus or nearby. if the university requires that all students own their own laptop and expects them to bring them to class, how does that affect the relevance of desktop computers in the library? the 2010 study was completed just a few weeks before the first ipad was introduced. 
since students have identified convenience and weight as reasons for not carrying their laptops, are tablets and ultra-light computers, like the macbook air, more likely to be carried on campus by students and used them more frequently for their research? how important is it to have a supportive mobile infrastructure with features such as high speed wifi, ability to use campus printers, and access to campus applications? are students using smart phones and other mobile devices for study purposes? in fact, are we focusing too much on laptops, and are other mobile devices starting to take over that role? this study’s results make it clear that we can’t just look at data such as ecar’s, which show high laptop ownership, and assume that means students don’t want or won’t use library computers. as information technology and libraries |december 2012 32 the types of mobile devices continue to grow and evolve, libraries should continue to develop ways to facilitate their research role. however, the bottom line may not be that one technology will replace another but rather that students will have a mix of devices and will choose which device is best suited to a particular purpose. therefore libraries, rather than trying to pick which device to support, may need to develop a broad-based strategy to support them all. references 1. susan m. thompson and gabriella sonntag. “chapter 4: building for learning: synergy of space, technology and collaboration.” learning commons: evolution and collaborative essentials. oxford: chandos publishing (2008): 117-199. 2. heidi m. steiner and robert p. holley, “the past, present, and possibilities of commons in the academic library,” reference librarian 50, no. 4 (2009): 309–332. 3. michael j. whitchurch and c. jeffery belliston,“information commons at brigham young university: past, present, and future,” reference services review 34, no. 2 (2006): 261–78. 4. harold shill and shawn tonner, “creating a better place: physical improvements in academic libraries, 1995–2002,” college & research libraries 64 (2003): 435. 5. barbara i. dewey, “social, intellectual, and cultural spaces: creating compelling library environments for the digital age,” journal of library administration 48, no. 1 (2008): 85–94; diane dallis and carolyn walters, “reference services in the commons environment,” references services review 34, no. 2 (2006): 248–60. 6. liz burke et al., “where and why students choose to use computer facilities: a collaborative study at an australian and united kingdom university,” australian academic & research libraries 39, no. 3 (september 2008): 181–97. 7. shannon d. smith and judith borreson caruso, the ecar study of undergraduate students and information technology, 2010 (boulder, co: educause center for applied research, october 2010), http://net.educause.edu/ir/library/pdf/ers1006/rs/ers1006w.pdf (accessed march 21, 2012). 8. 
pew internet & american life project, “adult gadget ownership over time (2006–2012),” http://www.pewinternet.org/static-pages/trend-data-(adults)/device-ownership.aspx (accessed june 14, 2012); the horizon report: 2009 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012); the horizon report: 2010 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012); the horizon report: 2011 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012). 9. pew internet, “adult gadget ownership.” 10. deborah keyek-franssen et al., computer labs study, university of colorado boulder office of information technology, october 7, 2011, http://oit.colorado.edu/sites/default/files/labsstudy-penultimate-10-07-11.pdf (accessed june 15, 2012). 11. j. s. c. tom, k. voss, and c. scheetz, “the space is the message: first assessment of a learning studio,” educause quarterly 31, no. 2 (2008), http://www.educause.edu/ero/article/space-message-first-assessment-learning-studio (accessed june 25, 2012). 12. nick wingfield, “time to leave the laptop behind,” wall street journal, february 23, 2009, http://online.wsj.com/article/sb122477763884262815.html (accessed june 15, 2012). 13. the horizon report: 2009 edition. 14. the horizon report: 2010 edition. 15. the horizon report: 2011 edition. 16. alison j. head and michael b. eisenberg, “balancing act: how college students manage technology while in the library during crunch time,” project information literacy research report, information school, university of washington, october 12, 2011, http://projectinfolit.org/pdfs/pil_fall2011_techstudy_fullreport1.1.pdf (accessed june 14, 2012). 17. pew internet, “adult gadget ownership.” 18. smith and caruso, ecar study.

content-based information retrieval and digital libraries

this paper discusses the applications and importance of content-based information retrieval technology in digital libraries. it generalizes the process and analyzes current examples in four areas of the technology. content-based information retrieval has been shown to be an effective way to search for the type of multimedia documents that are increasingly stored in digital libraries. as a good complement to traditional text-based information retrieval technology, content-based information retrieval will be a significant trend for the development of digital libraries. with several decades of their development, digital libraries are no longer a myth.
in fact, some general digital libraries such as the national science digital library (nsdl) and the internet public library are widely known and used. the advance of computer technology makes it possible to include a colossal amount of information in various formats in a digital library. in addition to traditional text-based documents such as books and articles, other types of materials—including images, audio, and video—can also be easily digitized and stored. therefore, how to retrieve and present this multimedia information effectively through the interface of a digital library becomes a significant research topic. currently, there are three methods of retrieving information in a digital library. the first and the easiest way is free browsing. by this means, a user browses through a collection and looks for desired information. the second method—the most popular technique used today—is textbased retrieval. through this method, textual information (full text of text-based documents and/or metadata of multimedia documents) is indexed so that a user can search the digital library by using keywords or controlled terms. the third method is content-based retrieval, which enables a user to search multimedia information in terms of the actual content of image, audio, or video (marques and furht 2002). some content features that have been studied so far include color, texture, size, shape, motion, and pitch. while some may argue that text-based retrieval techniques are good enough to locate desired multimedia information, as long as it is assigned proper metadata or tags, words are not sufficient to describe what is sometimes in a human’s mind. imagine a few examples: a patron comes to a public library with a picture of a rare insect. without expertise in entomology, the librarian won’t know where to start if only a text-based information retrieval system is available. however, with the help of content-based image retrieval, the librarian can upload the digitized image of the insect to an online digital image library of insects, and the system will retrieve similar images with detailed description of this insect. similarly, a patron has a segment of music audio, about which he or she knows nothing but wants to find out more. by using the content-based audio retrieval system, the patron can get similar audio clips with detailed information from a digital music library, and then listen to them to find an exact match. this procedure will be much easier than doing a search on a text-based music search system. it is definitely helpful if a user can search this non-textual information by styles and features. in addition, the advance of the world wide web brings some new challenges to traditional text-based information retrieval. while today’s web-based digital libraries can be accessed around the world, users with different language and cultural backgrounds may not be able to do effective keyword searches of these libraries. content-based information retrieval techniques will increase the accessibility of these digital libraries greatly, and this is probably a major reason it has become a hot research area in the past decade. ideally, a content-based information retrieval system can understand the multimedia data semantically, such as its objects and categories to which it belongs. therefore, a user is able to submit semantic queries and retrieve matched results. however, a great difficulty in the current computer technology is to extract high-level or semantic features of multimedia information. 
most projects still focus on lower-level features, such as color, texture, and shape. simply put, a typical content-based information retrieval system works in this way: first, for each multimedia file in the database, certain feature information (e.g., color, motion, or pitch) is extracted, indexed, and stored. second, when a user composes a query, the feature information of the query is calculated as vectors. finally, the system compares the similarity between the feature vectors of the query and multimedia data, and retrieves the best matching records. if the user is not satisfied with the retrieved records, he or she can refine the search results by selecting the most relevant ones to the search query, and repeat the search with the new information. this process is illustrated in figure 1. the following sections will examine some existing content-based information retrieval techniques for most common information formats (image, audio, and video) in digital libraries, as well as their limitations and trends. gary (gang) wan (gwan@tamu.edu) is a science librarian and assistant professor, and zao liu (zliu@tamu.edu) is a distance learning librarian and assistant professor at sterling c. evans library, texas a&m university, college station, texas. gary (gang) wan and zao liu 42 information technology and libraries | march 200842 information technology and libraries | march 2008 ■ content-based image retrieval there have been a large number of different contentbased image retrieval (cbir) systems proposed in the last few years, either building on prior work or exploring novel directions. one similarity among these systems is that most perform feature extraction as the first step in the process, obtaining global image features such as color, shape, and texture (datta et al., 2005). one of the most well-known cbir systems is query by image content (qbic), which was developed by ibm. it uses several different features, including color, sketches, texture, shape, and example images to retrieve images from image and video databases. since its launch in 1995, the qbic model has been employed for quite a few digital libraries or collections. one recent adopter is the state hermitage museum in russia (www.hermitage. ru), which uses qbic for its web-based digital image collection. users can find artwork images by selecting colors from a palette or by sketching shapes on a canvas. the user can also refine existing search results by requesting all artwork images with similar visual attributes. the following screenshots demonstrate how a user can do a content-based image search with qbic technology. in figure 2.1, the user chooses a color from the palette and composes the color schema of artwork he or she is looking for. figure 2.2 shows the artwork images that match the query schema. another example of digital libraries or collections that have incorporated cbir technology is the national science foundation’s international digital library project (www.memorynet.org), a project that is composed of several image collections. the information retrieval system for these collections includes both a traditional text-based search engine and a cbir system called simplicity (semantics-sensitive integrated matching for picture libraries) developed by wang et al. (2001) of pennsylvania state university. from the front page of these image collections, a user can choose to display a random group of images (figure 3.1). 
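the matching step that these systems share, comparing a query feature vector against stored vectors and ranking the results by similarity, can be illustrated with a short sketch. the following php fragment is not taken from qbic, simplicity, or any other system discussed here; it is a minimal, hypothetical example that ranks stored color-histogram vectors by euclidean distance to a query vector.

<?php
// minimal, hypothetical illustration of the matching step described above:
// each stored image is represented by a feature vector (here, a normalized
// color histogram); the query vector is compared against every stored
// vector and the closest matches are returned first.
function euclidean_distance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $diff = $value - $b[$i];
        $sum += $diff * $diff;
    }
    return sqrt($sum);
}

// hypothetical feature index: image id => feature vector
$index = [
    'img001' => [0.10, 0.40, 0.50],
    'img002' => [0.60, 0.20, 0.20],
    'img003' => [0.15, 0.35, 0.50],
];

$query = [0.12, 0.38, 0.50];   // vector extracted from the query image

// score every stored vector against the query and sort by distance
$scores = [];
foreach ($index as $id => $vector) {
    $scores[$id] = euclidean_distance($query, $vector);
}
asort($scores);

// the best-matching records come first; relevance feedback would simply
// rerun this ranking with a revised query vector
print_r(array_slice($scores, 0, 2, true));

in a real system the vectors are much longer and the comparison is backed by an index rather than a linear scan, but the ranking idea is the same.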
figure 1. the general process of content-based information retrieval
figure 2.1. a user query
figure 2.2. the search results for this query
below each image is a "similar" button; clicking this allows the user to view a group of images that contain similar objects to the previously selected one (figure 3.2). by providing feedback to the search engine this way, the user can find images of desired objects without knowing their names or descriptions. simply put, simplicity segments each image into small regions, extracts several features (such as color, location, and shape) from these small regions, and classifies these regions into some semantic categories (such as textured/nontextured and graph/photograph). when computing the similarity between the query image and images in the database, all these features will be considered and integrated, and the best matching results will be retrieved (wang et al., 2001).
figure 3.1. a group of random images in the collection
figure 3.2. cbir results
similar applications of cbir technology in digital libraries include the university of california–berkeley's digital library project (http://bnhm.berkeley.edu), the national stem digital library (ongoing), and virginia tech's anthropology digital library, etana (ongoing). while these feature-based approaches have been explored over the years, an emerging new research direction in cbir is automatic concept recognition and annotation. ideally, automatic concept recognition and annotation can discover the concepts that an image conveys and assign a set of metadata to it, thus allowing image search through the use of text. a trusted automatic concept recognition and annotation system can be a good solution for large data sets. however, the semantic gap between computer processors and human brains remains the major challenge in the development of a robust automatic concept recognition and annotation system (datta et al., 2005). a recent example of efforts in this field is li and wang's alipr (automatic linguistic indexing of pictures—real time, http://alipr.com) project (2006). through a web interface, users are able to search images in several different ways: they may do text searches and provide feedback to the system to find similar images. users may also upload an image, and the system will perform concept analysis and generate a set of annotations or tags automatically, as shown in figure 4. the system then retrieves images from the database that are visually similar to the uploaded image. in the process of automatic annotation, if the user doesn't think the tags given by the system are suitable, he or she can input other tags to describe the image. this is also the "training" process for the alipr system.
figure 4. alipr's automatic annotation feature
since cbir is the major research area and has the longest history in content-based information retrieval, there are many models, products, and ongoing projects in addition to the above examples. as image collections become a significant part of digital libraries, more attention has been paid to the possibilities of providing content-based image search as a complement to existing metadata search.
■ content-based audio retrieval
compared with cbir, content-based audio retrieval (cbar) is relatively new, and fewer research projects on it can be found. in general, existing cbar approaches start from the content analysis of audio clips. an example of this content analysis is extracting basic audio elements, such as duration, pitch, amplitude, brightness, and bandwidth (wold et al., 1996). because of the great difficulties in recognizing audio content, research in this area is less mature than that in content-based image and video retrieval. although no cbar system has been found to be implemented by any digital library so far, quite a few projects provide good prototypes or directions. one good example is zhang and kuo's (2001) research project on audio classification and retrieval. the prototype system is composed of three stages: coarse-level audio segmentation, fine-level classification, and audio retrieval. in the first stage, audio signals are semantically segmented and classified into several basic types, including speech, music, song, speech with music background, environmental sounds, and silence. some physical audio features—such as the energy function, the fundamental frequency, and the spectral peak tracks—are examined in this stage. in the second stage, further classification is conducted for every basic type. features are extracted from the time-frequency representation of audio signals to reveal subtle differences of timbre and pattern among different classes of sounds. based on these differences, the coarse-level segments obtained in stage one can be classified into narrower categories. for example, speech can be differentiated into the voices of men, women, and children. finally, in the information retrieval stage, two approaches—query-by-keyword and query-by-example—are employed. the query-by-keyword approach is more like the traditional text-based search system. the query-by-example approach is similar to content-based image retrieval systems: where an image can be searched by color, texture, and histogram, audio clips can be retrieved by distinct features such as timbre, pitch, and rhythm. this way, a user may choose from a given list of features, listen to the retrieved samples, and modify the input feature set to get more desired results. zhang and kuo's prototype is a very typical and classic cbar system. it is relatively mature and can be used by large digital audio libraries. more recently, li et al. (2003) proposed a new feature extraction method particularly for music genre classification, named daubechies wavelet coefficient histograms (dwchs). dwchs capture the local and global information of music signals simultaneously by computing their histograms. similar to other cbar strategies, this method divides the process of music genre classification into two steps: feature extraction and multi-class classification. the signal information representing the music is extracted first, and then an algorithm is used to identify the labels from the representation of the music sounds with respect to their features. since the decomposition of an audio signal can produce a set of subband signals at different frequencies corresponding to different characteristics, li et al. (2003) proposed a new methodology, the dwchs algorithm, for feature extraction. with this algorithm, the decomposition of the music signals is obtained at the beginning, and then a histogram of each subband is constructed. hence, the energy for each subband is computed, and the characteristics of the music are represented by these subbands.
one finding from this research reveals that this methodology, along with advanced machine learning techniques, has significantly improved accuracy of music genre classification (li et al. 2003). therefore, this methodology potentially can be used by those digital music libraries widely developed in past several years. ■ content-based video retrieval content-based video retrieval (cbvr) is a more recent research topic than cbir and cbar, partly because the digitization technology for video appeared later than those for image and audio. as digital video websites such as youtube and google video become more popular, how to retrieve desired video clips effectively is a great concern. searching by some features of video, such as motion and texture, can be a good complement to the traditional text-based search method. one of the earliest examples is the videoq system developed by chang et al. (1997) of columbia university (www.ctr.columbia.edu/videoq), which allows a user to search video based on a rich set of visual features and spatio-temporal relationships. the video clips in the database are stored as mpeg files. through a web interface, the user can formulate a query scene as a collection of objects with different attributes, including motion, shape, color, and texture. once the user has formulated the query, it is sent to a query server, which contains several databases for different content features. on the query server, the similarities between the features of each object specified in the query and those of the objects in the database are computed; a list of video clips is then retrieved based on their similarity values. for each of these video clips, key-frames are dynamically extracted from the video database and returned to browser. the matched objects are highlighted in the returned key-frame. the user can interactively view these matched video clips by simply clicking on the keyframe. meanwhile, the video clip corresponding to that key-frame is extracted from the video database (chang et al. 1997). figures 5.1–5.2 show an example of a visual search through the videoq system. many other cbvr projects also examine these content features and try to find more efficient ways to retrieve data. a recent example is wang et al.’s (2006) project, vferret, a content-based similarity search tool for continuous archived video. the vferret system segments video data into clips and extracts both visual and audio features as metadata. then a user can do a metadata search or article title | author 45content-based information retrieval and digital libraries | wan and liu 45 content-based search to retrieve desired video clips. in the first stage, a simple segmentation method is used to split the archived digital video into five-minute video clips. the system then extracts twenty image frames evenly from each of these five-minute video clips for visual feature extraction. additionally, the system splits the audio channel of each clip into twenty individual fifteensecond segments for further audio feature extraction. in the second stage, both audio and visual features are extracted. for visual features, the color element is used as the content feature. for audio features, 154 audio features originally used by ellis and lee (2004) to describe audio segments are computed. for each fifteen-second video segment, the visual feature vector extracted from the sample image and the audio feature vector extracted from the corresponding audio segment are combined into a single feature vector. 
in the information retrieval stage, the user first submits a video clip query; its feature vector is then computed and compared with those of the video clips in the database, and the most similar clips are retrieved (wang et al. 2006). similar projects in this area include carnegie mellon university's informedia digital video library (www.informedia.cs.cmu.edu) and muvis of finland's tampere university of technology (http://muvis.cs.tut.fi/index.html).
figure 5.1. the user composes a query
figure 5.2. search results for the sample query
■ content-based information retrieval for other digital formats
with the advance of digitization technology, the content and formats of digital libraries are much richer than before. they are not limited to text, image, audio, and video. some new formats of digital content are emerging. digital libraries of 3-d objects are good examples. since 3-d models have arbitrary topologies and cannot be easily "parameterized" using a standard template, as in the case of 2-d forms (bustos et al. 2005), content-based 3-d model retrieval is a more challenging research topic than retrieval of the other multimedia formats discussed earlier. so far, four types of solutions—primitive-based, statistics-based, geometry-based, and view-based—have been found (bimbo and pala 2006). primitive-based solutions represent 3-d objects with a basic set of parameterized primitive elements. parameters are used to control the shape of each primitive element and to fit each primitive element with a part of the model. with statistics-based approaches, shape descriptions based on statistical models are created and measured. geometry-based methods, however, use geometric properties of the 3-d object and their measures as global shape descriptors. for view-based solutions, a set of 2-d views of the model and descriptors of their content are used to represent the 3-d object shape (bimbo and pala 2006). another novel example is moustakas et al.'s (2005) project on 3-d model search using sketches. in the experimental system, the vector of geometrical descriptors for each 3-d model is calculated during the feature extraction stage. in the retrieval stage, a user can initially use one of the sketching interfaces (such as the virtual reality interface or an air mouse) to sketch a 2-d contour of the desired 3-d object. the 2-d shape is recognized by the system, and a sample primitive is automatically inserted in the scene. next, the user defines other elements that cannot be described by the 2-d contour, such as the height of the object, and manipulates the 2-d contour until it reaches its target position. the final query is formed after all the primitives are inserted. finally, the system computes the similarities between the query model and each 3-d model in the database, and renders the best matching records. an online demonstration can be found for a european project specifically designed for a 3-d digital museum collection, sculpteur (www.sculpteurweb.org). from its web-based search interface, a user can choose to do a metadata search or content-based search for a 3-d object. the search strategy here is somewhat similar to that in some cbir systems: the user can upload a 3-d model in vrml format, then select a search algorithm (such as similar color, texture, etc.) to perform a search within a digital collection of 3-d models.
as 3-d computer visualization has been widely used in a variety of areas, there are more research projects focusing on the content-based information retrieval techniques for this new multimedia format. ■ conclusion there is no doubt that content-based information retrieval technology is an emerging trend for digital library development and will be an important complement to the traditional text-based retrieval technology. the ideal cbir system can semantically understand the information in a digital library, and render users the most desirable data. however, the machine understanding of semantic information still remains to be a great difficulty. therefore, most current research projects, including those discussed in this paper, deal with the understanding and retrieval of lower-level features or physical features of multimedia content. certainly, as related disciplines such as computer vision and artificial intelligence keep developing, more researches will be done on higher-level feature-based retrieval. in addition, the growing varieties of multimedia content in digital libraries have also brought many new challenges. for instance, 3-d models now become important components of many digital libraries and museums. content-based retrieval technology can be a good direction for this type of content, since the shapes of these 3-d objects are often found more effectively if the user can compose the query visually. new cbir approaches need to be developed for these novel formats. furthermore, most cbir projects today tend to be web-based. by contrast, many project were based on client applications in the 1990s. these web-based cbir tools will have significant influence on digital libraries or repositories, as most of them are also web-based. particularly in the age of web 2.0, some large digital repositories—such as flickr for images and youtube and google video for video—are changing people’s daily lives. the implementation of cbir will be a great benefit to millions of users. since the nature of cbir is to provide better search aids to end users, it is extremely important to focus on the actual user’s needs and how well the user can use these new search tools. it is surprising to find that little usability testing has been done for most cbir projects. such testing should be incorporated into future cbir research before it is widely adopted. bibliography bimbo, a. and p. pala. 2006. content-based retrieval of 3-d models. acm transactions on multimedia computing, communications, and applications 2, no. 1: 20–43. bustos, b., et al. 2005. feature-based similarity search in 3-d object databases. acm computing surveys 37, no. 4: 345–387. chang, s., et al. 1997). videoq: an automated content based video search system using visual cues. in proceedings of the 5th acm international conference on multimedia, e. p. glinert, et al., eds. new york: acm. datta r., et al. 2005. content-based image retrieval: approaches and trends of the new age. in proceedings of the 7th international workshop on multimedia information retrieval, in conjunction with acm international conference on multimedia, h. zhang, , j. smith, and q. tian, eds. new york: acm. ellis, d. and k. lee. minimal-impact audio-based personal archives. in proceedings of the 1st acm workshop on continuous archival and retrieval of personal experiences carpe, j. gemmell, et al., eds. new york: acm. li, t., et al. 2003. a comparative study on content-based music genre classification. 
in proceedings of the 26th annual international acm sigir conference on research and development in information retrieval, c. clarke, et al., eds. new york: acm.
li, j. and j. wang. 2006. real-time computerized annotation of pictures. in proceedings of the 14th annual acm international conference on multimedia, k. nahrstedt, et al., eds. new york: acm.
marques, o. and b. furht. 2002. content-based image and video retrieval. norwell, mass.: kluwer.
moustakas, k., et al. 2005. master-piece: a multimodal (gesture+speech) interface for 3d model search and retrieval integrated in a virtual assembly application. proceedings of the enterface: 62–75.
wang, j., et al. 2001. simplicity: semantics-sensitive integrated matching for picture libraries. ieee trans. pattern analysis and machine intelligence 23, no. 9: 947–963.
wang, z., et al. 2006. vferret: content-based similarity search tool for continuous archived video. in proceedings of the 3rd acm workshop on continuous archival and retrieval of personal experiences, k. maze et al., eds. new york: acm.
wold, e., et al. 1996. content-based classification, search, and retrieval of audio. ieee multimedia 3, no. 3: 27–36.
zhang, t. and c. kuo. 2001. content-based audio classification and retrieval for audiovisual data parsing. norwell, mass.: kluwer.
trends at a glance: a management dashboard of library statistics
emily morton-owens and karen l.
hanson information technology and libraries | september 2012 36 abstract systems librarians at an academic medical library created a management data dashboard. charts were designed using best practices for data visualization and dashboard layout, and include metrics on gatecount, website visits, instant message reference chats, circulation, and interlibrary loan volume and turnaround time. several charts draw on ezproxy log data that has been analyzed and linked to other databases to reveal use by different academic departments and user roles (such as faculty or student). most charts are bar charts and include a linear regression trend line. the implementation uses perl scripts to retrieve data from eight different sources and add it to a mysql data warehouse, from which php/javascript webpages use google chart tools to create the dashboard charts. introduction new york university health sciences libraries (nyuhsl) had adopted a number of systems that were either open-source, home-grown, or that offered apis of one sort or another. examples include drupal, google analytics, and a home-grown interlibrary loan (ill) system. systems librarians decided to capitalize on the availability of this data by designing a system that would give library management a single, continuously self-updating point of access to monitor a variety of metrics. previously this kind of information had been assembled annually for surveys like aahsl and arl. 1 the layout and scope of the dashboard was influenced by google analytics and a beta dashboard project at brown.2 the dashboard enables closer scrutiny of trends in library use, ideally resulting in a more agile response to problems and opportunities. it allows decisions and trade-offs to be based on concrete data rather than impressions, and it documents the library’s service to its user community, which is important in a challenging budget climate. although the end product builds on a long list of technologies—especially perl, mysql, php, javascript, and google chart tools—the design of the project is lightweight and simple, and the number of lines of code required to power it is remarkably small. further, the design is modular. this means that nyuhsl could offer customized versions for staff in different roles, restricting the display to show only data that is relevant to the individual’s work. because most libraries have a unique combination of technologies in place to handle functions like circulation, reference questions, circulation, and so forth, a one-size-fits-all software package that emily morton-owens (emily.morton.owens@gmail.com) was web services librarian and karen hanson (karen.hanson@med.nyu.edu) is knowledge systems librarian, new york university health sciences libraries, new york. trends at a glance: a management dashboard of library statistics | morton-owens and hanson 37 could be used by any library may not be feasible. instead, this lightweight and modular approach could be re-created relatively easily to fit local circumstances and needs. visual design principles in designing the dashboard, we tried to use some best practices for data visualization and assembling charts into a dashboard. the best-known authority on data visualization, edward tufte, states “above all else, show the data.”3 in part, this means minimizing distractions, such as unnecessary gridlines and playful graphics. ideally, every dot of ink on the page would represent data. 
he also emphasizes truthful proportions, meaning the chart should be proportional to the actual measurements.4 a chart should display data from zero to the highest quantity, not arbitrarily starting the measurements at a higher number, because that distorts the proportions between the part and the whole. a chart also should not use graphics that differ in width as well as length, because that causes the area of the graphic to increase incorrectly, as opposed to simply the length increasing. pie charts are popular chart types that have serious problems in this respect despite their popularity; they require users to judge the relative area of the slices, which is difficult to do accurately.5 generally, it is better to use a bar chart with different length bars whose proportions users can judge better. color should also be used judiciously. some designers use too many colors for artistic effect, which creates a “visual puzzle”6 as the user wonders whether the colors carry meaning. some colors stand out more than others and should be used with caution. for example, red is often associated with something urgent or negative, so it should only be used in appropriate contexts. duller, less saturated colors are more appropriate for many data visualizations. a contrasting style is exemplified by nigel holmes, who designs charts and infographics with playful visual elements. a recent study compared the participants’ reactions to holmes’ work with plain charts of the same data.7 there was no significant difference in comprehension or shortterm memorability; however, the researchers found that the embellished charts were more memorable over the long term, as well as more enjoyable to look at. that said, holmes’ style is most appropriate for charts that are trying to drive home a certain interpretation. in the case of the dashboard, we did not want to make any specific point, nor did we have any way of knowing in advance what the data would reveal, so we used tufte’s principles in our design. a comparable authority on dashboard design is stephen few. a dashboard combines multiple data displays in a single point of access. as in the most familiar example, a car dashboard, it usually has to do with controlling or monitoring something without taking your focus from the main task.8 a dashboard should be simple and visual, not requiring the user to tune out extraneous information or interpret novel chart concepts. the goal is not to offer a lookup table of precise values. the user should be able to get the idea without reading too much text or having to think information technology and libraries | september 2012 38 too hard about what the graph represents. thinking again of a car, its speedometer does not offer a historical analysis of speed variation because this is too much data to process while the car is moving. similarly, the dashboard should ideally fit on one screen so that it can be taken in at a glance. if this is not possible, at least all of the individual charts should be presented intact, without scrolling or being cramped in ways that distort the data. a dashboard should present data dimensions that are dynamic. the user will refer to the dashboard frequently, so presenting data that does not change over time only takes up space. better yet, the data should be presented alongside a benchmark or goal. a benchmark may be a historical value for the same metric or perhaps a competitor’s value. a goal is an intended future value that may or may not ever have been reached. 
either way, including this alternate value gives context for whether the current performance is desirable. this is essential for making the dashboard into a decision-making tool. nils rasmussen et al. discuss three levels of dashboards: strategic, tactical (related to progress on a specific project), and operational (related to everyday, department-level processes). 9 so far, nyuhsl’s dashboard is primarily operational, monitoring whether ordinary work is proceeding as planned. later in this paper we will discuss ways to make the dashboard better suited to supporting strategic initiatives. system architecture the dashboard architecture consists of three main parts: importer scripts that get data from diverse sources, a data warehouse, and php/javascript scripts that display the data. the data warehouse is a simple mysql database; the term “warehouse” refers to the fact that it contains a stripped-down, simplified version of the data that is appropriate for analysis rather than operations. our approach to handling the data is an etl (extract, transform, load) routine. data are extracted from different sources, transformed in various ways, and loaded into the data warehouse. our data transformations include reducing granularity and enriching the data using details drawn from other datasets, such as the institutional list of ip ranges and their corresponding departments. data rarely change once in the warehouse because they represent historical measurements, not open transactions.10 there is an importer script customized for each data source. the data sources differ in format and location. for example, google analytics is a remote data source with a unique data export api, the ill data are in a local mysql database, and libraryh3lp has remote csv log files. the scripts run automatically via a cron job at 2a.m. and retrieves data for the previous day. that time was chosen to ensure all other nightly cron jobs that affect the databases are complete before the dashboard imports start. each uses custom code for its data source and creates a series of mysql insert queries to put the needed data fields in the mysql data warehouse. for example, a script might pull the dates when an ill request was placed and filled, but not the title of the requested item. trends at a glance: a management dashboard of library statistics | morton-owens and hanson 39 a carefully thought-out data model simplifies the creation of reports. the data structure should aim to support future expansion. in the data warehouse, information that was previously formatted and stored in very inconsistent ways is brought together uniformly. there is one table for each kind of data with consistent field names for dates, services, and so forth, and others that combine related data in useful ways. the dashboard display consists of a number of widgets, one for each chart. each chart is created with a mixture of php and javascript. google chart tools interprets lines of javascript to draw an attractive, proportional chart. we do not want to hardcode the values in this javascript, of course, because the charts should be dynamic. therefore we use php to query the data warehouse and a statement for each line of results to “write” a line of the data in javascript. figure 1. php is used to read from the database and generate rows of data as server-side javascript. each php/javascript file created through this process is embedded in a master php page. 
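a minimal sketch of one such widget follows. it is not the authors' production code; the connection details, the weekly_visits table, and its columns are assumptions. it shows the pattern from figure 1, php querying the warehouse and writing the result rows out as javascript, together with the least-squares trend line calculation described in the next paragraphs.

<?php
// hypothetical widget: weekly visit counts drawn with google chart tools.
// database credentials and the weekly_visits table/columns are assumptions,
// not the authors' schema.
$db = new PDO('mysql:host=localhost;dbname=warehouse', 'dashboard', 'secret');
$rows = $db->query(
    'SELECT week_start, SUM(visits) AS visits
       FROM weekly_visits
      GROUP BY week_start
      ORDER BY week_start
      LIMIT 26'
)->fetchAll(PDO::FETCH_ASSOC);

// accumulate the sums needed for the least-squares trend line (y = mx + b),
// excluding the 26th (partial) week as the article describes
$n = min(count($rows), 25);
$sx = $sy = $sxy = $sxx = 0;
for ($x = 0; $x < $n; $x++) {
    $y = (int) $rows[$x]['visits'];
    $sx += $x; $sy += $y; $sxy += $x * $y; $sxx += $x * $x;
}
$m = ($n * $sxx - $sx * $sx) ? ($n * $sxy - $sx * $sy) / ($n * $sxx - $sx * $sx) : 0;
$b = $n ? ($sy - $m * $sx) / $n : 0;

// the trend line is drawn between week zero and week twenty-five and
// colored according to the slope
$trendColor = ($m >= 0) ? '#109618' : '#dc3912';
?>
<script type="text/javascript">
  // php "writes" one javascript row per result line, as in figure 1
  var chartRows = [
    ['week', 'visits'],
<?php foreach ($rows as $r): ?>
    ['<?php echo htmlspecialchars($r['week_start']); ?>', <?php echo (int) $r['visits']; ?>],
<?php endforeach; ?>
  ];
  var trendEndpoints = [[0, <?php echo round($b, 1); ?>],
                        [25, <?php echo round($m * 25 + $b, 1); ?>]];
  var trendColor = '<?php echo $trendColor; ?>';
  // chartRows can be passed to google.visualization.arrayToDataTable() and
  // drawn as a column chart; trendEndpoints supply the overlaid trend line.
</script>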
this master page controls the order and layout of the individual widgets using the php include feature to add each chart file to the page plus a css stylesheet to determine the spacing of the charts. finally, because all the queries take a relatively long time to run, the page is cached and refreshes itself the first time the page is opened each day. the dashboard can be refreshed manually if the database or code is modified and someone wants to see the results immediately. many of the dashboard’s charts include a linear regression trend line. this feature is not provided by google charts and must be inserted into the widget’s code manually. the formula can be found online.11 the sums and sums of squares are totted up as the code loops through each line of data, and these totals are used to calculate the slope and intercept. in our twenty-six-week displays, we never want to include the twenty-sixth week of data because that is the present (partial) week. the linear regression line takes the form y = mx + b. we can use that formula along with the slope and intercept values to calculate y-values for week zero and the next-to-last week (week twentyfive). those two points are plotted and the trend line is drawn between them. the color of the line depends on its slope (greater or less than zero). depending on whether we want that chart’s metric to go up or down, the line is green for the desirable direction and red for the undesirable direction. information technology and libraries | september 2012 40 details on individual systems gatecount most of nyuhsl’s five locations have electronic gates to track the number of patrons who visit. formerly these statistics were kept in a microsoft excel spreadsheet, but now there is a simple web form into which staff can enter the gate reading twice daily. the data goes directly into the data warehouse, and the a.m. and p.m. counts are automatically summed. there is some errorchecking to prevent incorrect numbers being entered, which varies depending on whether that location’s gate is the kind that provides a continuously increasing count or is reset each day. the data are presented in a stacked bar chart, summed for the week. the user can hover over the stacked bars to see numbers for each location, but the height of the stacked bar and the trend line represent the total visits for all locations together. figure 2. stacked bar chart with trendline showing visits per week to pphysical library branches over a twenty-six-week period ticketing nyuhsl manages online user requests with a simple ticketing system that integrates with drupal. there are four types of tickets, two of which involve helping users and two of which involve reporting problems. the “helpful” services are general reference questions and literature search requests. the “trouble” services are computer problems and e-resource problems. these two pairs trends at a glance: a management dashboard of library statistics | morton-owens and hanson 41 each have their own stacked bar chart because, ideally, the number of “helpful” tickets would go up while the number of “trouble” tickets would go down. each chart has a trend line, color-coded for the direction that is desirable in that case. figure 3. stacked bar chart with trendline showing trouble tickets by type the script that imports this information into the data warehouse simply does so from another local mysql database. it only fetches the date and the type of request, not the actual question or response. 
it also inserts a record into the user transactions table, which will be discussed in the section on user data. drupal nyuhsl’s drupal site allows librarians directly to contribute content like subject guides and blog posts.12 the dashboard tracks the number of edits contributed by users (excluding the web services librarian and the web manager, who would otherwise swamp the results). this is done with a simple count query on the node_revisions table in the drupal database. because no other processing is needed and caching ensures the query will be done at most once per day, this is the only widget that pulls data directly from the original database at the time the chart is drawn. koha koha is an open-source opac system. at nyuhsl, koha’s database is in mysql. each night the importer script copies “issues” data from koha’s statistics table. this supports the creation of a information technology and libraries | september 2012 42 stacked bar chart showing the number of item checkouts each week, with each bar divided according to the type of item borrowed (e.g., book or laptop). as with other charts, a color-coded trend line was added to show the change in the number of item checkouts. google analytics the dashboard relies on the google analytics php interface (gapi) to retrieve data using the google analytics data export api.13 nothing is stored in the data warehouse and there is no importer script. the first widget gets and displays weekly total visits for all nyuhsl websites, the main nyuhsl website, and visits from mobile devices. a trend line is calculated from the “all sites” count. the second widget retrieves a list of the top “outbound click” events for the past thirty days and returns them as urls. a regular expression is used to remove any ezproxy prefix, and the remaining url is matched against our electronic resources database to get the title. thus, for example, the widget displays “web of knowledge” instead of “http://ezproxy.med.nyu.edu/login?url=http://apps.isiknowledge.com/.” a future improvement to this display would require a new table in the data warehouse and importer script to store historic outbound click results. this data would support comparison of the current list with past results to identify click destinations that are trending up or down. figure 4. most popular links clicked on to leave the library’s website in a thirty-day period trends at a glance: a management dashboard of library statistics | morton-owens and hanson 43 libraryh3lp libraryh3lp is a jabber-based im product that allows librarians to jointly manage a queue of reference queries. it offers csv-formatted log files that a perl script can access using “curl,” a command-line tool that mimics a web browser’s login, cookies, and file requests. the csv log is downloaded via curl, processed with perl’s text::csv module, and the data are then inserted into the warehouse. the first libraryh3lp widget counts the number of chats handled by each librarian over the past ninety days. the second widget tracks the number of chats for the past twenty-six weeks and includes a trend line. figure 5. bar chart showing number of im chats per week over a twenty-six-week period document delivery services the document delivery services (dds) department fulfills ill requests. the web application that manages these requests is homegrown, with a database in mysql. each night, a script copies the latest requests to the data warehouse. 
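the url cleanup behind that display can be sketched as follows. the regular expression is based on the example url above; the eresources table, its columns, and the lookup-by-prefix approach are assumptions rather than the actual nyuhsl schema.

<?php
// strip an ezproxy prefix from an outbound-click url and look up a
// human-readable title for it. the eresources table and its columns
// are hypothetical.
function resolveOutboundClick(PDO $db, string $url): string
{
    // remove the proxy login prefix, e.g.
    // http://ezproxy.med.nyu.edu/login?url=http://apps.isiknowledge.com/
    $clean = preg_replace(
        '#^https?://ezproxy\.med\.nyu\.edu/login\?url=#i',
        '',
        $url
    );

    // match the remaining url against the e-resources database
    $stmt = $db->prepare(
        'SELECT title FROM eresources WHERE ? LIKE CONCAT(url_prefix, "%") LIMIT 1'
    );
    $stmt->execute([$clean]);
    $title = $stmt->fetchColumn();

    // fall back to the cleaned url when no title is found
    return $title !== false ? $title : $clean;
}

// example: displays "web of knowledge" if the eresources table maps
// apps.isiknowledge.com to that title
// echo resolveOutboundClick($db, 'http://ezproxy.med.nyu.edu/login?url=http://apps.isiknowledge.com/');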
the dashboard uses this data to display a chart of how many requests are made each week and which publications are requested from other libraries most frequently. this data could be used to determine whether there are resources that should be considered for purchase. information technology and libraries | september 2012 44 the dds data was also used to demonstrate how data might be used to track service performance. one chart shows the average time it takes to fulfill a document request. further evaluation is required to determine the usefulness of such a chart for motivating improvement of the service or whether this is perceived as a negative use of the data. some libraries may find this kind of information useful for streamlining services. figure 6. this stacked bar chart shows the number of document delivery requests handled per week. the chart separates patron requests from requests made by other libraries. ezproxy data ezproxy is an oclc tool for authenticating users who attempt to access the library’s electronic resources. it does not log e-resource use where the user is automatically authenticated using the institutional ip range, but the data are still valuable because it logs a significant amount of use that can support in-depth analysis. because of the gaps in the data, much of the analysis looks at patterns and relationships in the data rather than absolute values. karen coombs’ article discussing the analysis of ezproxy logs to understand e-resource at the department level provided the initial motivation to switch on the ezproxy log.14 when logging is enabled, a standard web log file is produced. here is a sample line from the log: 123.45.6.7 amyu0gh5brmuska hansok01 [09/sep/2011:18:25:23 -0500] post http://ovidsp.tx.ovid.com: 80/sp3.3.1a/ovidweb.cgi http/1.1 20020472 http://ovidsp.tx.ovid.com.ezproxy.med.nyu.edu/sp-3.3.1a/ovidweb.cgi trends at a glance: a management dashboard of library statistics | morton-owens and hanson 45 each line in the log contains a user ip address, a unique session id, the user id, the date and time of access, the url requested by the user, the http status code, the number of bytes in the requested file, and the referrer (the page the user clicked on to get to the site). the ezproxy log data undergoes some significant processing before being inserted into the ezproxy report tables. the main goal of this is to enrich the data with relevant supplemental information while eliminating redundancy. to facilitate this process, the importer script first dumps the entire log into a table and then performs multiple updates on the dataset. during the first step of processing, the ip addresses are compared to a list of departmental ip ranges maintained by medical center it. if a match is found, the “location accessed” is stored against the log line. next, the user id is compared with the institutional people database, retrieving a user type (faculty, staff, or student) and a department, if available (e.g., radiology). one item of significant interest to senior management is the level of use within hospitals. as a medical library, we are interested in the library’s value to patient care. if there is significant use in the hospitals, this could furnish positive evidence about the library’s role in the clinical setting. next, the resource url and the referring address are truncated down to domain names. the links in the log are very specific, showing detailed user activity. 
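a rough sketch of that first processing pass follows, splitting a log line into the fields listed above and attaching the location and user information. the two lookup functions are stubs standing in for the internal ip-range list and people database, which the article does not describe in detail.

<?php
// split one ezproxy log line into the fields listed above (ip, session id,
// user id, timestamp, method, url, protocol, status, bytes, referrer) and
// enrich it with location and user information.
function parse_ezproxy_line(string $line): ?array
{
    $pattern = '/^(\S+)\s+(\S+)\s+(\S+)\s+\[([^\]]+)\]\s+' .
               '(\S+)\s+(\S+)\s+(\S+)\s+(\d{3})\s+(\S+)\s+(\S+)/';
    if (!preg_match($pattern, $line, $m)) {
        return null;                               // skip malformed lines
    }
    return [
        'ip'       => $m[1],
        'session'  => $m[2],
        'user'     => $m[3],
        'accessed' => $m[4],                       // e.g. 09/sep/2011:18:25:23 -0500
        'url'      => $m[6],                       // requested resource
        'status'   => (int) $m[8],
        'referrer' => $m[10],
    ];
}

// stub: the real lookup checks the ip against departmental ranges from it
function lookup_location(string $ip): string { return 'unknown location'; }

// stub: the real lookup queries the institutional people database
function lookup_person(string $userId): array { return ['staff', 'unknown dept']; }

function enrich(array $hit): array
{
    $hit['location'] = lookup_location($hit['ip']);       // e.g. "tisch hospital"
    [$hit['user_type'], $hit['user_dept']] = lookup_person($hit['user']);
    return $hit;
}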
because the library is operating in a medical environment, privacy is a concern, and so specific addresses are truncated to a top-level domain (e.g., ovid.com) to suppress any tie to a specific article, e-book, or other specific resource. finally, a query is run against the remaining raw data to condense the log down to unique session id/resource combinations, and this block of data is inserted into a new table. each user visit to a unique resource in a single session is recorded; for example, if a user visits lexis nexis, ovid medline, scopus, and lexis nexis again in a single session, three lines will be recorded in the user activity table. a single line in the final ezproxy activity table contains a unique combination of location accessed (e.g., tisch hospital), user department (e.g., radiology), user type (e.g., staff), earliest access date/time for that resource (e.g., 9/9/2011 18:25), resource name (e.g., scopus.com), session id, and referring domain (e.g., hsl.med.nyu.edu). there is significant repetition in the log. depending on what filters are set up, every image within a webpage could be a line in the log. the method of condensing the data described previously results in a much smaller and more manageable dataset. for example, on a single day 115,070 rows were collected in the ezproxy log, but only 2,198 were inserted into the final warehouse table after truncating the urls and removing redundancy. in a separate query on the raw data table, a distinct list containing user id, date, and the word "eresources" is built and stored in a "user transactions" table. this very basic data is stored so that simple user analysis can be performed (see "user data" below).
figure 7. line chart showing total number of ezproxy sessions captured per week over a twenty-six-week period
once the ezproxy data are transferred to the appropriate tables, the raw data (and thus the most concerning data from a privacy standpoint) is purged from the database. several dashboard charts were created using the streamlined ezproxy data: a simple count of weekly e-resource users, and a table showing resources whose use changed most significantly since the previous month. it was challenging to calculate the significance of the variations in use, since resources that went from one session in a month to two sessions were showing the same proportional change as those that increased from one thousand to two thousand sessions. a basic calculation was created to highlight the more significant changes in use:
d = p - q
if d < 0 then significance = d - 8 × 10^(d/(q+1))
if d > 0 then significance = d + 8 × 10^(d/(q+1))
where d = difference between last month and this month, p = number of visits last month (8 to 1 days ago), and q = number of visits in the previous month (15 to 9 days ago).
this equation serves the basic purpose of identifying unusual changes in e-resource use. for example, one e-resource was shown trending up in use after a librarian taught a course in it.
figure 8. table of e-resources showing the most significant change in use over the last month compared to the previous month
the ezproxy data has already proven to be a rich source of data. the work so far has only scratched the surface of what the data could show.
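continuing the sketch started above, the truncation and de-duplication steps might look roughly like this; the field names follow the previous sketch and are assumptions, not the authors' code.

<?php
// reduce each enriched hit to a top-level domain and keep only one row per
// session/resource combination, producing rows for the ezproxy activity table.
function domain_of(string $url): string
{
    $host  = parse_url($url, PHP_URL_HOST) ?: $url;
    $parts = explode('.', $host);
    return implode('.', array_slice($parts, -2));   // ovidsp.tx.ovid.com -> ovid.com
}

function condense(array $hits): array
{
    $unique = [];
    foreach ($hits as $hit) {
        $resource = domain_of($hit['url']);
        $key = $hit['session'] . '|' . $resource;
        if (!isset($unique[$key])) {                // first use of this resource in the session
            $unique[$key] = [
                'session'   => $hit['session'],
                'resource'  => $resource,
                'referrer'  => domain_of($hit['referrer']),
                'location'  => $hit['location'],
                'user_type' => $hit['user_type'],
                'user_dept' => $hit['user_dept'],
                'accessed'  => $hit['accessed'],    // earliest access time for the resource
            ];
        }
    }
    return array_values($unique);
}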
only two charts are currently displayed on the dashboard, but the value of thisdata is more likely to come from one-off customized reports based on specific queries, like tracking use of individual resources over time or looking at variations of use within specific buildings, departments, or user types. there is also a lot that could be done with the referrer addresses. for example, the library has been submitting tips to the newsletter that is delivered by email. the referrer log allows the number of clicks from this source to be measured so that librarians can monitor the success of this marketing technique. user data each library system includes some user information. where user information is available in a system, a separate table is populated in the warehouse. as mentioned briefly above, a user id, a date, and the type of service used (e-resources, dds, literature search, etc.) is stored. details of the transaction are not kept here. the user id can be used to look up basic information about the user such as role (faculty, staff, student) and department. we should emphasize for clarity that the detailed information about the activity is completely separated from any information about the user so that the data cannot be joined back together. information technology and libraries | september 2012 48 the most sensitive data, such as the raw ezproxy log data, is purged after the import script has copied the truncated and de-identified data. even though the data stored is very basic, information at the granularity of individual users is never displayed on the dashboard. the user information is aggregated by user type for further analysis and display. the institutional people database can be used to determine how many people are in each department. a table added to the dashboard shows the number of resource uses and the percentage of each department that used library resources in a six-month period. some potential uses of this data include identifying possible training needs and measuring the success of library outreach to specific departments. for example, if one department uses the resources very little, this may indicate a training or marketing deficit. it may also be interesting to analyze how the academic success of a department aligns with library resource use. do the highest intensity users of library resources have greater professional output or higher prestige as a research department, for example? it is unsurprising to find that medical students and librarians are most likely to use library resources. the graduate medical education group is third and includes medical residents (newly qualified doctors on a learning curve). as with the ezproxy data, there are numerous insights to be gained from this data that will help the library make strategic decisions about future services. figure 9. table showing the proportion of each user group that has used at least one library service in a six-month period results trends at a glance: a management dashboard of library statistics | morton-owens and hanson 49 the dashboard has been available for almost a year. it requires a password and is only available to nyuhsl’s senior management team and librarians who have asked for access. feedback on the dashboard has been positive, and librarians have begun to make suggestions to improve its usefulness. one librarian uses the data warehouse for his own reports and will soon provide his queries so that they can be added to the dashboard. 
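the aggregation behind that table can be sketched as a single query over the user transactions table; the table and column names and the headcount figures below are assumptions, not nyuhsl's actual schema or numbers.

<?php
// for each user type, count how many distinct people used at least one
// library service in the past six months and express it as a percentage
// of the group's headcount (hard-coded example values here; in practice
// the headcounts would come from the institutional people database).
$db = new PDO('mysql:host=localhost;dbname=warehouse', 'dashboard', 'secret');

$active = $db->query(
    "SELECT user_type, COUNT(DISTINCT user_id) AS active_users
       FROM user_transactions
      WHERE transaction_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
      GROUP BY user_type"
)->fetchAll(PDO::FETCH_KEY_PAIR);

$headcounts = ['faculty' => 3200, 'staff' => 5100, 'student' => 730];  // example values

foreach ($headcounts as $group => $total) {
    $used = $active[$group] ?? 0;
    printf("%s: %d of %d (%.1f%%)\n",
           $group, $used, $total, $total ? 100 * $used / $total : 0);
}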
the dashboard has facilitated discoveries about the nature of our users and has identified potential training needs and areas of weakness in outreach. a static dashboard snapshot was recently created for presentation to the dean of the medical school to illustrate the extent and breadth of library use. the initial dashboard aimed to demonstrate the kinds of library statistics that it is possible to extract and display, but there is much to be done to improve its operational usefulness. a dashboard working group has been established to build on the original proof-of-concept by improving the data model and adding relevant charts. some charts will be incorporated into the public website as a snapshot of library activity. the dashboard was structured to be adaptable and expandable. the next iteration will support customization of the display for each user. new charts will be added as requested, and charts that are perceived to be less insightful will be removed. for example, one chart shows the number of reference chat requests answered by each librarian in addition to the number of chats handled per week. the usefulness of this chart was questioned when it was observed that the results were merely a reflection of which librarians had the most time at their own desks, allowing them to answer chats. this is an example of how it can be difficult to separate context from numbers. in this instance the individual statistics were only included because the data was available, not because of any particular request from management, so these charts may be removed from the dashboard. nyuhsl is also investigating the ex libris tool ustat, which supports analysis of counter (counting online usage of networked electronic resources) reports from e-resources vendors. ustat covers some of the larger gaps in the ezproxy log, including journal-level rather than vendor-level analysis and, most importantly, the use statistics for non-ezproxied addresses. a future project will be to see whether there is an automated way to extract use metrics, either from ustat or directly from the vendors, to be incorporated into the data warehouse. preliminary discussions are being held with it administrators about the possibility of ezproxying library resource urls as they pass through the firewall so that the ezproxy log becomes a more complete reflection of use. an example of a strategic decision based on dashboard data involves nyuhsl's mobile website. librarians had been considering whether to invest substantial effort in identifying and presenting free apps and mobile websites to complement the library's small collection of licensed mobile content. the chart of website visits on the dashboard surprisingly shows that the number of visits that come from mobile devices is consistently fewer than 3 percent, probably because of the relatively modest selection of mobile-optimized website resources. rather than invest significant effort in cataloging additional, potentially lackluster free resources that would not be seen by a large number of users, the team decided to wait for more headlining subscription-based resources to become available and increase traffic to the mobile site. it would be worthwhile to add charts to the dashboard that track metrics related to new strategic initiatives, which would require librarians to translate strategic ideas into measurable quantities.
for example, if the library aspired to make sure users received responses more quickly, charts tracking the response time for various services could be added and grouped together to track progress on this goal. as data continues to accumulate, it will be possible to extend the timeframe of the charts, for example, making weekly charts into monthly ones. over time, the data may become more static, requiring more complicated calculations to reveal interesting trends.
conclusions
the medical center has a strong ethic of metric-driven decisions, and the dashboard brings the library in line with this initiative. the dashboard allows librarians and management to monitor key library operations from a single, convenient page, with an emphasis on long-term trends rather than day-to-day fluctuations in use. it was put together using freely available tools that should be within the reach of people with moderate programming experience. assembling the dashboard required background knowledge of the systems in question, was made possible by nyuhsl's use of open-source and homegrown software, and increased the designers' understanding of the data and tools in question.
references
1. association of academic health sciences libraries, "annual statistics," http://www.aahsl.org/mc/page.do?sitepageid=84868 (accessed november 7, 2011); association of research libraries, "arl statistics," http://www.arl.org/stats/annualsurveys/arlstats (accessed november 7, 2011).
2. brown university library, "dashboard_beta :: dashboard information," http://library.brown.edu/dashboard/info (accessed january 5, 2012).
3. edward r. tufte, the visual display of quantitative information (cheshire, ct: graphics press, 2001), 92.
4. ibid., 56.
5. ibid., 178.
6. ibid., 153.
7. scott bateman et al., "useful junk? the effects of visual embellishment on comprehension and memorability of charts," chi '10 proceedings of the 28th international conference on human factors in computing systems (new york: acm, 2010), doi: 10.1145/1753326.1753716.
8. stephen few, information dashboard design: the effective visual communication of data (beijing: o'reilly, 2006), 98.
9. nils rasmussen, claire y. chen, and manish bansal, business dashboards: a visual catalog for design and deployment (hoboken, nj: wiley, 2009), ch. 4.
10. richard j. roiger and michael w. geatz, data mining: a tutorial-based primer (boston: addison wesley, 2003), 186.
11. one example: stefan waner and steven r. costenoble, "fitting functions to data: linear and exponential regression," february 2008, http://people.hofstra.edu/stefan_waner/realworld/calctopic1/regression.html (accessed january 5, 2012).
12. emily g. morton-owens, "editorial and technological workflow tools to promote website quality," information technology & libraries 30, no. 3 (september 2011): 92–98.
13. google, "gapi—google analytics api php interface," http://code.google.com/p/gapi-google-analytics-php-interface (accessed january 5, 2012).
14. karen a. coombs, "lessons learned from analyzing library database usage data," library hi tech 23, no. 4 (2005): 598–609, doi: 10.1108/07378830510636373.
http://people.hofstra.edu/stefan_waner/realworld/calctopic1/regression.html http://code.google.com/p/gapi-google-analytics-php-interface/ http://code.google.com/p/gapi-google-analytics-php-interface/ library use of web-based research guides jimmy ghaphery and erin white information technology and libraries | march 2012 21 abstract this paper describes the ways in which libraries are currently implementing and managing webbased research guides (a.k.a. pathfinders, libguides, subject guides, etc.) by examining two sets of data from the spring of 2011. one set of data was compiled by visiting the websites of ninety-nine american university arl libraries and recording the characteristics of each site’s research guides. the other set of data is based on an online survey of librarians about the ways in which their libraries implement and maintain research guides. in conclusion, a discussion follows that includes implications for the library technology community. selected literature review while there has been significant research on library research guides, there has not been a recent survey either of the overall landscape or of librarian attitudes and practices. there has been recent work on the efficacy of research guides as well as strategies for their promotion. there is still work to be done on developing a strong return on investment metric for research guides, although the same could probably be said for other library technologies including websites, digital collections, and institutional repositories. subject-based research guides have a long history in libraries that predates the web as a servicedelivery mechanism. a literature-review article from 2007 found that research on the subject gained momentum around 1996 with the advent of electronic research guides, and that there was a need for more user-centric testing.1 by the mid-2000s, it was rare to find a library that did not offer research guides through its website.2 the format of guides has certainly shifted over time to database-driven efforts through local library programming and commercial offerings. a number of other articles start to answer some of the questions about usability posed in the 2007 literature review by vileno. in 2008, grays, del bosque, and costello used virtual focus groups as a test bed for guide evaluation.3 two articles from the august 2010 issue of the journal of library administration contain excellent literature reviews and look toward marketing, assessment, and best practices.4 also in 2010, vileno followed up on the 2007 literature review with usability testing that pointed toward a number of areas in which users experienced difficulties with research guides.5 jimmy ghaphery (jghapher@vcu.edu) is head, library information systems and erin white (erwhite@vcu.edu) is web systems librarian, virginia commonwealth university libraries, richmond, va. 
mailto:jghapher@vcu.edu library use of web-based research guides | ghaphery and white 22 in terms of cross-library studies, an interesting collaboration in 2008 between cornell and princeton universities found that students, faculty, and librarians perceived value in research guides, but that their qualitative comments and content analysis of the guides themselves indicated a need for more compelling and effective features.6 the work of morris and grimes from 1999 should also be mentioned; the authors surveyed 53 university libraries, finding that it was rare to find a library with formal management policies for their research guides.7 most recently, libguides has emerged as a leader in this arena, offering a popular software-as-aservice (saas) model and as such is not yet heavily represented in the literature. a multichapter libguides lita guide is pending publication and will cover such topics as implementing and managing libguides, setting standards for training and design, and creating and managing guides. arl guides landscape during the week of march 3rd, 2011, the authors visited the websites of 99 american university arl libraries to determine the prevalence and general characteristics of their subject-based research guides. in general, the visits reinforced the overarching theme within the literature that subject-based research guides are a core component of academic library web services. all 99 libraries offered research guides that were easy to find from the library home page. libguides was very prominent as a platform, in production at 67 of the 99 libraries. among these, it appeared that at least 5 libraries were in the process of migrating from a previous system (either a homegrown, database-driven site or static html pages) to libguides. in addition to the presence and platform, the authors recorded additional information about the scope and breadth of each site’s research guides. for each site, the presence of course-based research guides was recorded. in some cases the course guides had a separate listing, whereas in others they were intermingled with the subject-based research guides. course guides were found on 75 of the 99 libraries visited. of these, 63 were also libguides sites. it is certainly possible that course guides are being deployed at some of the other libraries but were not immediately visible in visiting the websites, or that course guides may be deployed through a course management system. nonetheless, it appears that the use of libguides encourages the presence of public-facing course guides. qualitatively, there was wide diversity of how course guides were organized and presented, varying from a simple a-to-z listing of all guides to separately curated landing pages specifically organized by discipline. the number of guides was recorded for each libguides site. it was possible to append “/browse.php?o=a” to the base url to determine how many guides and authors were published at each site. this php extension was the publicly available listing of all guides on each libguides platform. the “/browse.php?o=a” extension no longer publicly reports these statistics; however, findings could be reproduced by manually counting the number of guides and authors on each site. the authors confirmed the validity of this method in the fall of 2011 by revisiting four sites and finding that the numbers derived from manual counting were in line with the previous findings. 
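as a rough illustration of how such a census could be scripted rather than tallied by hand, the sketch below (in python) fetches a site's public browse listing and counts distinct guide and author identifiers. it is a sketch only: the base url is a placeholder, and the link patterns are assumptions about the 2011-era libguides markup rather than a documented interface, so they would need to be checked against the actual pages.

import re
import urllib.request

def count_guides(base_url):
    """fetch the public browse listing and count guide and author entries."""
    url = base_url.rstrip("/") + "/browse.php?o=a"
    listing = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    # the patterns below are illustrative guesses at the guide and profile links,
    # not a documented libguides api
    guides = set(re.findall(r'content\.php\?pid=(\d+)', listing))
    authors = set(re.findall(r'profile\.php\?id=(\d+)', listing))
    return len(guides), len(authors)

guides, authors = count_guides("http://libguides.example.edu")  # hypothetical site
print(guides, "guides by", authors, "authors")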
of the 63 libguides sites we observed, a total of 14,522 guides were counted from 2,101 authors for an average of 7 guides per author. on average, each site had 220 guides from 32 authors (median of 179 guides; 29 authors). at the high end of the scale, one site had 713 guides from 46 authors. based on the volume observed, libraries appear to be investing significant time toward the creation, and presumably the maintenance, of this content. in addition to creation and ongoing maintenance, such long lists of topics raise a number of usability issues that libraries will also be wise to keep in mind.8

survey

the literature review and website visits call out two strong trends:

1. research guides are as commonplace as books in libraries,
2. libguides is the elephant in the room, so much so that it is hard to discuss research guides without discussing libguides.

based on preliminary findings from the literature review and survey, we looked to further describe how libraries are supporting, innovating, implementing, and evaluating their research guides. a ten-question survey was designed to better understand how research guides sit within the cultural environment of libraries. it was distributed to a number of professional discussion lists the week of april 19, 2011 (see appendix). the following lists were used in an attempt to get a balance of opinion from populations of both technical and public services librarians: code4lib, web4lib, lita-l, lib-ref-l, and ili-l. the survey was made available for two weeks following the list announcements. survey response was very strong, with 198 responses (188 libraries) received without the benefit of any follow-up recruitment. ten institutions submitted more than one response. in these cases only the first response was included for analysis. we did not complete a response for our own institution. the vast majority (155, 82%) of respondents were from college or university libraries. of the remaining 33, 24 (13%) were from community college libraries, with only 9 (5%) identifying themselves as public, school, private, or governmental. among the college and university libraries, 17 (9%) identified themselves as members of the arl, which comprises 126 members.9 in terms of "what system best describes your research guides by subject?" the results were similar to the survey of arl websites. most libraries (129, 69%) reported libguides as their system, followed by "customized open source system" and "static html pages," both at 20 responses (11% each). sixteen libraries (9%) reported using a homegrown system, with three libraries (2%) reporting "other commercial system." in terms of initiating and maintaining a guides system, much of the work within libraries seems to be happening outside of library systems departments. when asked which statement best described who selected the guides system, 67 respondents (36%) indicated their library research guides were "initiated by public services," followed closely by "more of a library-wide initiative" at 63 responses (34%). in the middle at 34 responses (18%) was "initiated by an informal cross-departmental group." only 10 respondents (5%) selected "initiated by systems," with the top-down approach of "initiated by administration" gathering 14 responses (7%).
when narrowing the responses to those sites that are using libguides or campusguides, the portrait is not terribly different, with 36% library-wide, 35% public services, 18% informal cross-departmental, 7% administration, and systems trailing at 4%. likewise there was not a strong indication of library systems involvement in maintaining or supporting research guides. sixty-nine responses (37%) indicated "no ongoing involvement" and an additional 35 (19%) indicated "n/a we do not have a systems department." there were only 21 responses (11%) stating "considerable ongoing involvement," with the balance of 63 responses (34%) for "some ongoing involvement." not surprisingly, there was a correlation between the type of research guide and the amount of systems involvement. for sites running a "customized open source system," "other commercial system," or "homegrown system," at least 80% of responses indicated either "considerable" or "some" ongoing systems involvement. in contrast, 37% of sites running libguides or campusguides indicated "considerable" or "some" technical involvement. further, the libguides and campusguides users recorded the highest percentage (43%) of "no ongoing involvement" compared to 37% of all respondents. interestingly, 20% of libguides and campusguides users answered "n/a we do not have a systems department," which is not significantly higher than all respondents for this question at 19%.

the level of interaction between research guides and enterprise library systems was not reported as strong. when asked "which statement best describes the relationship between your web content management system and your research guides?" 112 responses (60%) indicated that "our content management system is independent of our research guides" with an additional 51 responses (27%) indicating that they did not have a content management system (cms). only 12 respondents (6%) said that their cms was integrated with their research guides, with a remaining 13 (7%) saying that their cms was used for "both our website and our research guides." a similar portrait was found in seeking out the relationship between research guides and discovery/federated search tools. when asked "which statement best describes the relationship between your discovery/federated search tool and your research guides?" roughly half of the respondents (96, 51%) did not have a discovery system ("n/a we do not have a discovery tool"). only 12 respondents (6%) selected "we prominently feature our discovery tool on our guides," whereas more than double that number, 26 (14%), said "we typically do not include our discovery tool on our guides." fifty-four respondents (29%) took the middle path of "our discovery tool is one of many search options we feature on our guides." in the case of both discovery systems and content management systems, it seems that research guides are typically not deeply integrated.

when asked "what other type of content do you host on your research guides system?" respondents selected from a list of choices as reflected in table 1.

table 1. other types of content hosted on research guides system

answer | total | percent | percent of libguides/campusguides sites
course pages | 127 | 68% | 74%
"how to" instruction | 123 | 65% | 77%
alphabetical list of all databases | 76 | 40% | 42%
"about the library" information (for example hours, directions, staff directory, events) | 59 | 31% | 35%
digital collections | 34 | 18% | 19%
everything—we use the research guide platform as our website | 16 | 9% | 9%
none of the above | 17 | 9% | 2%
these answers reinforce the portrait of integration within the larger library web presence. while the research guides platform is an important part of that presence, significant content is also being managed by libraries through other systems. it is also consistent with the findings from the arl website visits, where course pages were consistently found within the research guides platform. for sites reporting libguides or campusguides as their platform, inclusion of course pages and how-to instruction was even higher, at 74% and 77%, respectively.

another multi-answer question sought to determine what types of policies are being used by libraries for the management of research guides: "which of the following procedures or policies do you have in place for your research guides?" responses are summarized in table 2.

table 2. management policies/procedures for research guides

answer | total | percent | percent using libguides/campusguides
style guides for consistent presentation | 105 | 56 | 58
maintenance and upkeep of guides | 94 | 50 | 53
link checking | 87 | 46 | 50
required elements such as contact information, chat, pictures, etc. | 78 | 41 | 56
training for guide creators | 73 | 39 | 43
transfer of guides to another author due to separation or change in duties | 72 | 38 | 41
defined scope of appropriate content | 43 | 23 | 22
allowing and/or moderating user tags, comments, ratings | 36 | 19 | 25
none of the above | 36 | 19 | 19
controlled vocabulary/tagging system for managing guides | 23 | 12 | 25

while nearly one in five libraries reported none of the policies in place at all, the responses indicate that there is effort being applied toward the management of these systems. the highest percentage for any given policy was 56% for "style guides for consistent presentation." best practices in these areas could be emerging, or many of these policies could be specific to individual library needs. as with the survey question on content, the research-guides platform also has a role, with the libguides and campusguides users reporting much higher rates of policies for "controlled vocabulary/tagging" (25% vs. 12%) and "required elements" (56% vs. 41%). in both of these cases, it is likely that the need for policies arises from the availability of these features and options that may not be present in other systems. based on this supposition, it is somewhat surprising that the libguides and campusguides sites reported the same lack of policy adoption (none of the above; 19%).

the final question in the survey further explored the management posture for research guides by asking a free-text question: "how do you evaluate the success or failure of your research guides?" results were compiled into a spreadsheet. the authors used inductive coding to find themes and perform a basic data analysis on the responses, including a tally of which evaluation methods were used and how often. one in five institutions (37 respondents, 19.6%) looked only to usage stats, while seven respondents (4%) indicated that their library had performed usability testing as part of the evaluation. forty-four respondents (23.4%) said they had no evaluation method in place ("ouch! it hurts to write that."), though many expressed an interest in or plans to begin evaluation. another emerging theme included ten respondents who quantified success in terms of library adoption and ease of use.
this included one respondent who had adopted libguides in light of prohibitive it regulations ("we chose libguides because it [the campus it department] would not allow us to create class-specific research webpages"). several institutions also expressed frustration with the survey instrument because they were in the process of moving from one guides system to another and were not sure how to address many questions. most responses indicated that there are more questions than answers regarding the efficacy of their research guides, though the general sentiment toward the idea of guides was positive, with words such as "positive," "easy," "like," and "love" appearing in 16 responses. countering that, 5 respondents indicated that their libraries' research-guides projects had fallen through.

conclusion

this study confirms previous research that web-based research guides are a common offering, especially in academic libraries. adding to this, we have recorded a quantitative adoption of libguides both through visiting arl websites and through a survey distributed to library listservs. further, this study did not find a consistent management or assessment practice for library research guides. perhaps the most interesting finding from this study is the role of library systems departments with regard to research guides. it appears that many library systems departments are not actively involved in either the initiation or ongoing support of web-based research guides. what are the implications for the library technology community and what questions arise for future research? the apparent ascendancy of libguides over local solutions is certainly worth considering and in part demonstrates some comfort within libraries for cloud computing and saas. time will tell how this might spread to other library systems. the popularity of libguides, at its heart a specialized content management system, also calls into question the vitality and adaptability of local content management system implementations in libraries. more generally, does the desire to professionally select and steward information for users on research guides indicate librarian misgivings about the usability of enterprise library systems? how do attitudes toward research guides differ between public services and technical services? hopefully these questions serve as a call for continued technical engagement with library research guides. what shape that engagement may have in the future is an open question, but based on the prevalence and descriptions of current implementations, such consideration by the library technology community is worthwhile.

references

1. luigina vileno, "from paper to electronic, the evolution of pathfinders: a review of the literature," reference services review 35, no. 3 (2007): 434–51.
2. martin courtois, martha higgins, and aditya kapur, "was this guide helpful? users' perceptions of subject guides," reference services review 33, no. 2 (2005): 188–96.
3. lateka j. grays, darcy del bosque, and kristen costello, "building a better m.i.c.e. trap: using virtual focus groups to assess subject guides for distance education students," journal of library administration 48, no. 3/4 (2008): 431–53.
4. mira foster et al., "marketing research guides: an online experiment with libguides," journal of library administration 50, no. 5/6 (july/september 2010): 602–16; alisa c.
gonzalez and theresa westbrock, “reaching out with libguides: establishing a working set of best practices,” journal of library administration 50, no. 5/6 (july/september, 2010): 638–56. 5. luigina vileno, “testing the usability of two online research guides,” partnership: the canadian journal of library and information practice and research 5, no. 2 (2010), http://journal.lib.uoguelph.ca/index.php/perj/article/view/1235 (accessed august 8, 2011). 6. angela horne and steve adams, “do the outcomes justify the buzz? an assessment of libguides at cornell university and princeton university—presentation transcript,” presented at the association of academic and research libraries, seattle, wa, 2009, http://www.slideshare.net/smadams/do-the-outcomes-justify-the-buzz-an-assessment-oflibguides-at-cornell-university-and-princeton-university (accessed august 8, 2011). 7. sarah morris and marybeth grimes, “a great deal of time and effort: an overview of creating and maintaining internet-based subject guides,” library computing 18, no. 3 (1999): 213–16. 8. mathew miles and scott bergstrom, “classification of library resources by subject on the library website: is there an optimal number of subject labels?” information technology & libraries 28, no. 1 (march 2009): 16–20, http://www.ala.org/lita/ital/files/28/1/miles.pdf (accessed august 8, 2011). 9. association of research libraries, “association of research libraries: member libraries,” http://www.arl.org/arl/membership/members.shtml (accessed october 24, 2011). http://journal.lib.uoguelph.ca/index.php/perj/article/view/1235 http://www.slideshare.net/smadams/do-the-outcomes-justify-the-buzz-an-assessment-of-libguides-at-cornell-university-and-princeton-university http://www.slideshare.net/smadams/do-the-outcomes-justify-the-buzz-an-assessment-of-libguides-at-cornell-university-and-princeton-university http://www.ala.org/lita/ital/files/28/1/miles.pdf http://www.arl.org/arl/membership/members.shtml information technology and libraries | march 2012 29 appendix. survey library use of web-based research guides please complete the survey below. we are researching libraries’ use of web-based research guides. please consider filling out the following survey, or forwarding this survey to the person in your library who would be in the best position to describe your library’s research guides. responses are anonymous. thank you for your help! jimmy ghaphery, vcu libraries erin white, vcu libraries 1) what is the name of your organization? __________________________________ note that the name of your organization will only be used to make sure multiple responses from the same organization are not received. any publication of results will not include specific names of organizations. 2) which choice best describes your library? o arl o university library o college library o community college library o public library o school library o private library o governmental library o nonprofit library 3) what type of system best describes your research guides by subject? o libguides or campusguides o customized open source system o other commercial system o homegrown system o static html pages 4) which statement best describes the selection of your current research guides system? 
o initiated by administration o initiated by systems o initiated by public services o initiated by an informal cross-departmental group o more of a library-wide initiative library use of web-based research guides | ghaphery and white 30 5) how much ongoing involvement does your systems department have with the management of your research guides? o no ongoing involvement o some ongoing involvement o considerable ongoing involvement o n/a we do not have a systems department 6) what other type of content do you host on your research guides system? o course pages o “how to” instruction o alphabetical list of all databases o “about the library” information (for example: hours, directions, staff directory, events) o digital collections o everything—we use the research guide platform as our website o none of the above 7) which statement best describes the relationship between your discovery/federated search tool and your research guides? o we typically do not include our discovery tool on our guides o our discovery tool is one of many search options we promote on our guides o we prominently feature our discovery tool on our guides o n/a we do not have a discovery tool 8) which statement best describes the relationship between your web content management system and your research guides? o our content management system is independent of our research guides o our content management system is integrated with our research guides o our content management system is used for both our website and our research guides o n/a we do not have a content management system 9) which of the following procedures or policies do you have in place for your research guides? o defined scope of appropriate content o required elements such as contact information, chat, pictures, etc. o style guides for consistent presentation o allowing and/or moderating user tags, comments, ratings o training for guide creators o controlled vocabulary/tagging system for managing guides o maintenance and upkeep of guides o link checking information technology and libraries | march 2012 31 o transfer of guides to another author due to separation or change in duties o none of the above 10) how do you evaluate the success or failure of your research guides? [free text] 49 book reviews writing for technical and professional journals by john h. mitchell. john wiley & sons, inc., new york, london and sydney, 1968, 405 pp. this book reprints, describes, summarizes or refers to every item in what has to be the world's largest scrapbook of material relating to professional publication. the last 240 pages (three-fifths of the total) include "sample" style guides from the ieee, management science, aibs, acs (including seven or eight pages of abbreviations used in chemical abstracts), aip, the gpo, nasa, the modem language association, the american mathematical society, the american medical association, the apa, the american sociological review, the american economic review, the hispanic american historical review, the nea, and sundry others. in almost every case, the excerpted or complete style guide is followed by an illustrative article. i would doubt that any other such compilation exists. the chapters which precede this anthology discuss more general aspects of writing for professional journals: design and approach, the collection, correlation, selection and anangement of data, and the . elements of journal articles. 
the text in these chapters is crowded with material of the most varied and unexpected kinds: disquisitions on logic, formal organization, outlining, interview techniques, information retrieval, the dewey decimal system, the ejc thesaurus, and much, much more. there is only one problem in all of this, but it is a· serious one, epitomized by the quotation from robert louis stevenson which mitchell uses as motto for his first chapter: "if a man can group his ideas, he is a good writer." this real treasury of reference material is all but inaccessible to the reader. titles of the five chapters are not very descriptive, and the index is not organized as a retrieval device. if one knows where in the book to look, he can find very useful information, but just leafing through the pages is neither efficient nor easy. it is made particularly difficult, in fact, by the striking lack of editorial judgment exercised in the design of the book. there is no differentiation between the author's comments and the examples and illustrations which he reprints (unless, as in some cases, the typography of the original has been reproduced). headings within chapters, where they exist at all, are confusing-and again, it is often difficult to determine whether they are part of mitchell's organization or part of some quoted work. as a result, it is hard to say who should buy this book and even harder to say how it might be used. professor mitchell, who "was elected teacher of the year by the students of the university of massachusetts" in 1965, is presumably able to make selections from the contents and to present them effectively in a classroom. perhaps the publishers might atone for 50 journal of library automation vol. 2/1 march, 1969 their abnegations of responsibility in preparing this book for the press by prevailing upon its author to write a supplementary, and much-needed, user's guide to its contents. a. ]. goldwyn computer peripherals & typesetting by arthur h. phillips. london, her majesty's stationary office, 1968. 665 pp. $28.80. the appearance of a comprehensive volume on computer composition is a boon to librarians as it comes at a time when progress with marc and other complex data bases calls for printing and other output capabilities which exceed those now commonly available with computers. recent advances in photocomposition technology now make possible printing of graphic arts quality at acceptable costs for certain types of com~ puter produced library publications, such as book and periodical catalogs whose basic input includes upperand lower-case and a full range of diacritical marks. with these advances librarians need no longer accept the limitations of character sets and image quality imposed by present line printers. a quality product is needed for outputs which are destined for publication. some pioneers have already made good use of this advanced technology to produce quality catalogs and lists; this book will help others to travel the same road. the volume is a comprehensive reference compendium of data on computer peripherals which is not otherwise available in convenient form. it gives special emphasis to the coding and keyboarding of alphanumeric texts and describes how the computer can be used for text processing with a typographic output. it also gives an appreciation of the problems involved and the techniques and equipment that are available to those who are preparing to enter this important field. the text is arranged in three sections. 
the first is an introduction to computer processing of alphanumeric data which is intended for printing in typographic quality. the second describes many types of computer peripherals and gives considerable attention to the various codes used for computer and printing equipment data input. the third section describes alphanumeric text composition and the available graphic arts composing equipment. the text is supplemented by many illustrations, diagrams, and tables plus an index and a glossary of terms. while much of the material in the volume will become outdated within a short time, a substantial portion of it is sufficiently basic to retain its value for a longer period. this handsome book is intelligently conceived and well-written by one of england's leading authorities on printing and ' computer typesetting. for anyone seriously interested in the subject the volume is essential and worth its price. richard de gennaro book reviews 51 coordinate indexing, by john c. costello, jr. rutgers series on systems for the intellectual organization of information, volume vii. edited by susan artandi. the rutgers university press, new brunswick, n.j., 1966. 218 pp. this paperback book is the result of ~ seminar meeting on coordinate indexing held april 28 and 29, 1966, under the sponsorship of the rutgers graduate school of library service. the volume consists of a detailed presentation of the subject by john costello of battelle memorial institute, followed by a discussion of the presentation by four panelists. the objectives of the book as given in the preface are: to offer a description, discussion, critique, and collection of facts and data on coordinate indexing as one of the systems which may be used to intellectually organize information contained in documents. basically an introductory description of the subject is offered. however, the principles of coordinate indexing are included so that the material has value for anyone interested in the topic. with examples offered primarily from metallurgy and engineering, the emphasis is on the handling of technical documents. about half of the presentation is devoted to input, with storage, searching, and output comprising the other half. discussion by the panel (dr. susan artandi, moderator; dr. charles l. bernier; dr. vincent e. giuliano; and dr. i. a. warheit) is not given verbatim, but summarized by the editor. although the table of contents is quite detailed, an index would make the book more useful. the inclusion of a selective bibliography is valuable, but unfortunately it is almost never referred to· in the text. the bibliography is of course now somewhat out-of-date. laura k. osborn libraries of the future, by j. c. r. licklider. the m.i.t. press, massachusetts institute of technology, cambridge, massachusetts, 1965. third printing, september 1966. 219 pp. $6.00. this remarkable little book is rapidly becoming a classic in the field of information science. (note that it is now in its third printing.) it analyzes the concepts and problems of libraries of the future, "future" being defined as the year 2000. the book is the culmination of a two-year research project on the future of libraries sponsored by the council on library resources. the study was conducted by bolt beranek and newman, inc. between november, 1961, and november, 1963. the first part of this book describes man's interaction with recorded knowledge in what mr. licklider calls "procognitive systems." 
the author assumes man will be reacting to segments of the entire body of recorded information within a vast hierarchical information network. he estimates the present world corpus of knowledge could be stored in 10^15 bits of computer memory. the rate of increase is 2 x 10^6 bits per second. part two explores the use of computers within the procognitive system. subjects touched upon include syntactical analysis of natural languages, quantitative aspects of the representation of information, information retrieval effectiveness, and question-answer systems. some time is spent with studies of current computer techniques. in general, part two is a trifle dated as it deals with specific techniques in a field where technological obsolescence is precipitous. mr. licklider's writing is both intellectually stimulating and delightful. in discussing the future computer console, "... the concept of 'desk' may have changed from passive to active: a desk may be primarily a display-and-control station in a telecommunication-telecomputation system, and its most vital part may be the cable (umbilical cord) that connects it, via a wall socket, into the procognitive utility net." a footnote goes on to say, "if a man wishes to get away from it all and think in peace and quiet, he will have merely to turn off the power. however, it may not be economically feasible for his employer to pay him full rate for the time he thus spends in unamplified cerebration." serious students of information or library science should consider this book required reading if for no other reason than the jolt it provides one's imagination. gerry d. guthrie

tutorial

delivering information to students 24/7 with camtasia

kathleen carlson

this article examines the selection process for and use of camtasia studio software, a screen video capture program created by techsmith. the camtasia studio software allows the author to create streaming videos, which give students 24-hour access to any topic, including how to order books through interlibrary loan.

how does one engage students in the library research process? in my brief time at the downtown phoenix campus library of arizona state university (asu) i have found a software program that allows librarians to bring the classroom to the student. screen capture programs allow you to create presentations and publish them for students to view on their own time. instead of telling students how to do something, we need to show them.1 recent studies show there are numerous benefits to using streaming video in higher education. students that receive streaming video instruction as well as traditional instruction show dramatic improvement in class.2 this article takes a look at how i selected one software program and created a streaming video using the application. i examined three software applications that help create video tutorials and presentations: cam studio, macromedia's captivate, and techsmith's camtasia studio. i first experimented with cam studio, which is open-source software. there are limitations to what you can do with software that is free. the screen size is too small and the file size it can create is limited. macromedia's captivate is good if you want to create a series of screenshots with accompanying audio. i did not choose this streaming video program because i was unsure of the software's capability, and i had no one to provide technical support.
the third choice, the open-source camtasia studio, was the software i selected. there were several reasons why i preferred this software. i had more familiarity with it, and the software is very easy to load and is user friendly. it also has the ability to record a video of everything that is happening on your computer screen.3 another reason i selected camtasia studio was because of the availability of an asu software technician who had experience editing the streaming video. most users view camtasia’s video through adobe flash, but the program also can produce windows media, quicktime, dvd-ready avi, ipod, iphone, realmedia mp3, web, cd, blog, and animated gif formats.4 camtasia performs screen captures in real time. you are able to simultaneously use slideshow software, navigate to a website, and narrate step-by-step instructions. the full version of camtasia studio runs around $300. in addition to the software program, you also must have a combination headset and microphone. a stick microphone will work, but the combination headset will help eliminate any noise that can be picked up by a stick microphone. i purchased a logitech extreme pc gaming headset for about $20. when you purchase the camtasia license online at http://www.techsmith .com/, the customer service department will e-mail you the access code along with a link from which you can download the software. the cd-rom loaded with the camtasia software arrives about ten days later. my first camtasia studio project was a tutorial on how to use the university’s interlibrary loan system. here are the basic steps i took to create a streaming video: 1. preproduction. this involves the creation of a script. 2. production. the actual capturing of the video and audio content. have all websites and programs open and minimized at the bottom of the screen in order to easily select them during the video capturing. 3. postproduction. this is the most time-consuming and involves editing the video and compressing the file for delivery to users. 4. publishing. posting the video to a web server and assessing the material’s success. to see the full 3 minute 53 second streaming video “how to order an article that asu does not own” go to http://www.asu.edu/lib/tutorials/ illiad/index.html. implementing camtasia studio once camtasia studio is installed on your computer, double click on the camtasia studio icon. it will bring up a welcome window where you can select from the following (see figure 1): n start a new project by recording the screen n start recording a powerpoint presentation n start a new project by importing media files n open an existing project i have selected “start a new project by recording the screen.” on the left hand menu there is a task list, and you can select one of the kathleen carlson (kathleen.carlson@ asu.edu) is health sciences librarian, information commons library, arizona state university, downtown phoenix campus. delivering information to students 24/7 with camtasia | carlson 155 following (see figure 2): n record the screen n record the powerpoint i have selected “start a new project by recording the screen.” this will bring up a window, “new recording wizard screen recording setup.” it asks you what you would like to record (see figure 3). n region of the screen n specific window n entire screen i have selected “entire screen.” when you click on the “next” button, it brings up a recording options window (see figure 4). 
select from the following: n record audio n record camera i have selected “record audio while recording the screen.” next you see a window that lets you choose audio settings from the following (see figure 5): n microphone n speaker audio n microphone and speaker audio n manual input selection i have selected “microphone” (see figure 6). the next window is titled “tune volume input levels.” use the input level lever to set the audio input level (see figure 7). figure 1. welcome screen and what do you want to do? figure 5. choose audio settingsfigure 3. screen recording setup figure 4. recording options figure 2. record the screen 156 information technology and libraries | september 2009 the “begin recording” window appears, which includes instructions on how to start and stop recording. you have the choice of clicking the “record” button on camtasia recorder or clicking the f9 key to start recording. to stop, click the “stop” button on camtasia recorder or click the f10 key (see figure 8). finally click on either “record the screen” or “record powerpoint.” to view your streaming video, click on the saved icon where it says clip bin or go to camtasia toolbar and click on view. then click on clip bin, then click on thumbnails. that’s all there is to it. summary i found camtasia studio to be very user friendly, although i cannot emphasize enough how important it is for librarians to collaborate with their it staff. this software enables you to bring the classroom to the student when they need it. you may have instructed a class on library research, but many of these students may have already forgotten where to begin. streaming video allows students to access presentations 24/7. here is a checklist of things to think about when selecting software: n what do you want to accomplish with the software? n what kind of access are you trying to give? n do you want audio, video, or both? n is it easy for the student to access and understand? n have you researched the software to make sure it meets your needs? n how much money do you want to spend? n what additional equipment is necessary? finally, and most importantly, work with your it staff on all phases of your project. by developing a collaborative relationship with them you will have fewer bumps in the road. use your imagination: the sky is the limit. references 1. diane murley, “tools for creating video tutorials,” law library journal 99, no. 4 (2007). 2. ron reed, “streaming technology improves achievement: study shows the use of standards-based video content, powered by new internet technology application, increases student achievement,” t.h.e. journal 30, no. 7 (2003). 3. christopher cox, “from cameras to camtasia: streaming media without the stress,” internet reference services quarterly 9 no. 3/4 (2004). 4. john d. clark and qinghua kou, “captivate/camtasia,” journal of the medical library association 96, no. 1 (2008), http://www.pubmedcentral.nih.gov/ articlerender.fcgi?artid=2212324 (accessed june 24, 2009). figure 6. audio volume levels figure 7. begin recording figure 8. camtasia recorder reproduced with permission of the copyright owner. further reproduction prohibited without permission. graphical table of contents for library collections: the application ... herrero-solana, victor;félix moya-anegón;guerrero-bote, vicente;zapico-alonso, felipe information technology and libraries; mar 2006; 25, 1; proquest education journals pg. 43 reproduced with permission of the copyright owner. further reproduction prohibited without permission. 
bibliographic retrieval from bibliographic input; the hypothesis and construction of a test

frederick h. ruecking, jr.: head, data processing division, the fondren library, rice university, houston, texas

a study of problems associated with bibliographic retrieval using unverified input data supplied by requesters. a code derived from compression of title and author information to four, four-character abbreviations each was used for retrieval tests on an ibm 1401 computer. retrieval accuracy was 98.67%.

current acquisitions systems which utilize computer processing have been oriented toward handling the order request only after it has been manually verified. systems, such as that of texas a & i university (1), have proven useful in reducing certain clerical routines and in handling fund accounting (2). lack of a larger bibliographic data base and lack of adequate computer time have prevented many libraries from studying more sophisticated acquisitions systems. at the time the marc pilot project (3) was started, the fondren library at rice university did not have operating computer applications in acquisitions, serials, or cataloging. the university administration and the research computation center provided sufficient access to the ibm 7040 to permit the study of problems associated with bibliographic retrieval using input data which has varying accuracy.

in 1966, richmond expressed the concern of many librarians about the lack of specific statements describing the techniques by which on-line retrieval could be accomplished without complicating the problems presented by the current card catalog (4). she had previously described some of the problems created by the kind and quality of data being utilized as references by library users (5). an examination of the pertinent literature indicates that most of the current work in retrieval, while related to problems of bibliographic retrieval, does not offer much assistance when the input data is suspect (6,7,8). tainiter and toyoda, for example, have described different techniques of addressing storage using known input data (9,10). one of the best-known retrieval systems is that of the chemical abstracts service, which provides a fairly sophisticated title-scan of journal articles with a surprising degree of flexibility in the logic and term structure used as input. comparable systems are used by the defense documentation center, medlars centers, and nasa technology centers. these systems have one specific feature in common: a high level of accuracy in the input data.

user-supplied bibliographic data

the reliability of bibliographic data supplied to university libraries from faculty and students has long been questioned (5). any search system which accepts such data must be designed 1) to increase the level of confidence through machine-generated search structures and variable thresholds and 2) to reduce the dependence upon spelling accuracy, punctuation, spacing and word order. the initial task of formulating an approach to this problem is to determine the type, quality, and quantity of data generally supplied by a user.
to derive a controlled set of data for this purpose, the acquisition department of the fondren library provided xerox copies of all english-language requests dated 1965 or later and a random sample of 295 requests was drawn from that file of 5000 items. this random sample was compared to the manually-verified, original order-requests to determine 1) the frequency with which data was supplied by the requestor and 2) the accuracy of the provided information. results of this study are given in table 1.

table 1. level of confidence in the input data

data elements | times given | times correct | accuracy | level of confidence
edition | 295 | 294 | 99.6 | 99.6
title | 295 | 292 | 99.0 | 99.0
author | 290 | 264 | 91.0 | 82.7
publish. | 268 | 218 | 81.3 | 73.9
date | 265 | 215 | 81.1 | 72.8

the results suggest that edition can have great significance when specified and should be used as strong supporting evidence for retrieval. it should not necessarily be a restrictive element because of the low-order magnitude of actual specification, which was five times in the sample. (unstated editions were considered as first editions, and correct.) title is the most significant and most reliable element. as richmond indicates, use of the entire title for searching would present distinct problems for retrieval systems (4). consequently, an abbreviated version of the title must be derived from the input data which will reduce the impact and significance of the problems described by richmond (5).

the hypothesis

it is hypothecated that retrieval of correct bibliographic entries can be obtained from unverified, user-supplied input data through the use of a code derived from the compression of author and title information supplied by the user. it is assumed that a similar code is provided for all entries of the data base using the same compression rules for main and added entry, title and added title information. it is further hypothecated that use of weighting factors for individual segments of the code will provide accurate retrieval in those cases when exact matching does not occur. before the retrieval methodology can be described, it is necessary to outline the compression technique to be used with author and title words.

title compression

to gain some understanding of the problems to be faced in compressing title information, a random sample of 500 titles was drawn from the first half of the initial marc i reel (about 4800 titles). each of these titles was analyzed for significant words and tabulations were made on word strings and word frequencies. the following words were considered as non-significant: a, an, and, by, if, in, of, on, the, to. the tabulated data, shown in table 2, contain some surprising attributes. approximately 90% of the titles contain less than five significant words, which suggests that four significant words will be adequate to match on title.

table 2. significant word strings in titles

length of word string | 1 | 2 | 3 | 4 | 5+ | total
number of titles | 42 | 151 | 179 | 76 | 52 | 500
percentage | 8.4 | 30.2 | 35.8 | 15.2 | 10.4 | 100.0
cumulative percentage | 8.4 | 38.6 | 74.4 | 89.6 | 100.0 |

letting n stand for the corpus of words available for title use, the random chance of duplicating any specific word in another title can be stated as 1/n. when a string of words is considered, the chance of randomly selecting the same word string may be considered as 1/n^a, where 'a' is the number of words in the string.
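a minimal sketch of the kind of tabulation described above, in python, assuming plain-text titles as input; the stop-word list is the one given in the text, while the whitespace tokenization is a simplifying assumption.

from collections import Counter

STOPWORDS = {"a", "an", "and", "by", "if", "in", "of", "on", "the", "to"}

def significant_words(title):
    """return the significant words of a title, in order."""
    return [w for w in title.lower().split() if w not in STOPWORDS]

def tabulate(titles):
    """tally significant-word string lengths (cf. table 2) and word frequencies (cf. figure 1)."""
    string_lengths = Counter()
    word_frequencies = Counter()
    for title in titles:
        words = significant_words(title)
        string_lengths[min(len(words), 5)] += 1   # bucket strings of 5+ words together
        word_frequencies.update(words)
    return string_lengths, word_frequencies

lengths, freq = tabulate(["Building Library Collections", "The Collected Works"])
print(lengths)                # Counter({3: 1, 2: 1})
print(freq.most_common(3))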
certain words are used more frequently than others, and the occurrence of such words in a given string reduces the uniqueness of that string. the curve displayed in figure 1 shows the frequency distribution of words in the sample. the mean frequency of words in the title-sample is 1.33.

fig. 1. frequency distribution of words in sample (number of words plotted against frequency).

therefore, the chance of selecting an identical word string can be more accurately expressed as (1.33)^a/n^a. an examination of word lengths, as shown in table 3, shows that 95% of the significant title words contain less than ten characters. an examination of the word list revealed that some 70% of the title words contain inflections and/or suffixes. if these suffixes and inflections are removed, approximately 43% of the remaining word stems contain less than five characters and 59% contain less than six.

table 3. distribution of character length and stem length

length in characters | total words | different words | percent | stems | percent
1 | 7 | 5 | 0.5 | 5 | 0.8
2 | 25 | 14 | 1.3 | 14 | 2.3
3 | 87 | 48 | 4.6 | 48 | 7.9
4 | 172 | 117 | 11.1 | 196 | 32.3
5 | 229 | 163 | 15.5 | 92 | 15.2
6 | 198 | 153 | 14.5 | 94 | 15.5
7 | 202 | 159 | 15.3 | 64 | 10.6
8 | 158 | 122 | 11.6 | 45 | 7.4
9 | 121 | 102 | 9.7 | 15 | 2.5
10 | 84 | 69 | 6.6 | 8 | 1.3
11 | 54 | 48 | 4.6 | 7 | 1.2
12 | 38 | 28 | 2.7 | 2 | 0.3
13 | 14 | 12 | 1.1 | 2 | 0.3
14 | 6 | 4 | 0.4 | 0 | 0.0
15 | 3 | 3 | 0.3 | 0 | 0.0
16 | 2 | 2 | 0.2 | 0 | 0.0
summary | 1400 | 1049 | | 592 |

the reduction of word length does affect the uniqueness of the individual word, merging distinct words into common word stems at a mean rate of 2.5 to 1.0. in table 3 the difference between 1049 words and 592 stems reflects the reduction of similar words into a common stem; for example: america, american, americans, americanism, etc., into amer. thus, the uniqueness of a string of title words is reduced to the following chance of duplication: (2.5 x 1.33)^a/n^a, or 3.3^a/n^a.

an analysis of consonant strings made by dolby and resnikoff provides frequencies of initial and terminal consonant strings occurring in 7000 common english words (with suffixes and inflections removed) (11,12,13). these frequency lists clearly show that the terminal string of consonants has considerable information-carrying potential in terms of word identification. the starting string also carries information potential, but significantly less than the terminal string. by combining the initial and terminal strings, it is possible to generate an abbreviation which has adequate uniqueness and reduces the influence of spelling. the high percentage of four-character word stems and the fact that the maximum terminal string contains four consonants suggest the use of a four-character abbreviation.

to compress a title word into four characters, it is necessary to specify a set of rules. the first rule will be to delete all suffixes and inflections which terminate a title word. the second rule will be to delete vowels from the stem until a consonant is located or the four-character stem is produced. the suffixes and inflections deleted in this procedure are contained in table 4. when the stem contains more than four characters, the third compression rule states that the four-character field is filled with the terminal-consonant string and remaining positions are filled from the initial-character string.
table 4. deleted suffixes and inflections

-ic, -ive, -in, -et, -ed, -ative, -ain, -est, -aged, -ize, -on, -ant, -oid, -ing, -ion, -ent, -ance, -og, -ation, -ient, -ence, -log, -ship, -ment, -ide, -olog, -er, -ist, -age, -ish, -or, -y, -able, -al, -s, -ency, -ible, -ial, -es, -ogy, -ite, -ful, -ies, -ology, -ine, -ism, -ives, -ly, -ure, -um, -ess, -ry, -ise, -ium, -us, -ary, -ose, -an, -ous, -ory, -ate, -ian, -ious, -ity

the relative uniqueness of the generated abbreviation can be calculated using the data supplied by dolby and resnikoff. for example, carter and bonk's building library collections would be abbreviated buld, libr, coct. the random chance of duplicating any abbreviation can be stated as consisting of the product of the random chance of duplicating the initial string and the random chance of duplicating the terminal string: (fi/ni) x (ft/nt) x 3.3^2. the frequencies listed by dolby and resnikoff may be substituted in the above equation, producing a chance of duplication of 324/6800 x 63/6800 x 10.89 = 1/208 for buld, with corresponding chances of 1 in 14,745 for libr and 1 in 1,041 for coct. the random chance of duplicating this string of three abbreviations can be calculated by multiplying the individual calculations, which yields the random chance of 1 in 32 x 10^8. this high uniqueness declines rapidly when the title contains less than three significant words and contains high-frequency words, such as the title collected works, for which the same uniqueness calculation produces the random chance of 1 in 44 x 10^4. to increase the level of uniqueness on short titles, like collected works, it becomes necessary to provide supporting data to the title information. it is clear that the supporting data must come from supplied author text.

author compression

the same compression algorithms can be used for both personal and corporate names with some modifications. the frequent substitution of "conference" for "congress" and "symposia" for "symposium" suggests that meeting names should be considered as a secondary sub-set of non-significant words. names of organizational divisions, such as bureau, department, ministry, and office, can be considered as part of the same sub-set. the rules which govern the deletion of inflections, suffixes and vowels can be used for corporate names, but personal author names must be carried into the compression routine without modification. only the last name of an author would be compressed into a code.

constructing the test

four, four-character abbreviations are allowed for title compression and four for author. rather than use a 32-character fixed field for these codes, the lengths of the input and main-base codes are variable, with leading control digits to specify the individual code sizes for the title and author segments. provision is made for the inclusion of date, publisher and/or edition in the search-code structure although these were not implemented in the test performed. at the time the input data is read, the existence of title, author, edition, publisher and date is indicated by the setting of indicators which control the matching mask and which, in part, control the specification of the retrieve threshold. the title indicator specifies the number of compressed words in the supplied title which must be matched by the base code. a simple algorithm is used to calculate the threshold values given in columns two through four of table 5.
columns five through seven are obtained by adding two to the calculated thresholds. each agreement within the mask adds to a retrieve counter the values indicated in the last five columns of table 5, the values of x and y being the number of matching code words in the title and author segments respectively.

table 5. values for variable threshold data

given (t a e p d) | threshold values, full-code test (title words: 3 or 4 / 2 / 1) | threshold values, individual code test (3 or 4 / 2 / 1) | agreement values (title / author / edition / publish. / date)
xy111 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy110 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy101 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy100 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy011 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy010 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy001 | 12 / 8+2y / 4+2y | 14 / 10+2y / 6+2y | 4x / 2y / 3 / 2 / 1
xy000 | 12 / 8+2y / 4+2y | 14 / 18+2y / 6+2y | 4x / 2y / 3 / 2 / 1
x0111 | 12 / 11 / 7 | 13 / 12 / 7 | 4x / 2y / 3 / 2 / 1
x0110 | 12 / 11 / 7 | 13 / 12 / 7 | 4x / 2y / 3 / 2 / 1
x0101 | 12 / 11 / 7 | 13 / 12 / 7 | 4x / 2y / 3 / 2 / 1
x0100 | 12 / 11 / 7 | 13 / 11 / 7 | 4x / 2y / 3 / 2 / 1
x0011 | 12 / 10 / 6 | 13 / 11 / 7 | 4x / 2y / 3 / 2 / 1
x0010 | 12 / 10 / 6 | 13 / not permitted | 4x / 2y / 3 / 2 / 1
x0001 | 12 / 9 / 5 | 13 / not permitted | 4x / 2y / 3 / 2 / 1
x0000 | 12 | not permitted | not permitted
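expressed as an algorithm, table 5 reduces to a weighted agreement score compared against a cutoff that depends on which data elements were supplied. the php fragment below is a hypothetical sketch of that logic, not code from the original study; the function and variable names are mine, and the threshold shown is only the simplest full-code case (a supplied title of three or four significant words), without the +2 adjustment for the individual-code test or the "not permitted" combinations.

<?php
// hypothetical sketch of the table 5 matching logic: each agreement adds a weighted
// value to a retrieve counter, and a record is reported only when the counter reaches
// the threshold implied by the supplied data. weights follow the last five columns of
// table 5: 4 per matching title code word, 2 per matching author code word, 3 for
// edition, 2 for publisher, 1 for date.
function retrieve_value($title_matches, $author_matches, $edition_ok, $publisher_ok, $date_ok) {
    return 4 * $title_matches
         + 2 * $author_matches
         + ($edition_ok ? 3 : 0)
         + ($publisher_ok ? 2 : 0)
         + ($date_ok ? 1 : 0);
}

// simplified threshold: the full-code value of 12 for a supplied title of three or four
// significant words (column two of table 5); the published table varies this with the
// title and author indicators and raises it by two for the individual-code test.
$threshold = 12;
$value = retrieve_value(3, 1, false, false, false); // three title words and the author code agree
if ($value >= $threshold) {
    echo "retrieve: value $value meets threshold $threshold\n";
} else {
    echo "reject: value $value is below threshold $threshold\n";
}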
conducting the test

as mentioned above, the initial tests of the retrieve were based upon title and author matching exclusively and required three runs on the fondren library's 1401 computer. the first loaded 2874 original order-requests, generated a search code utilizing the rules specified in this paper and created an input tape. the second run extracted title and author data from the marc i data base, and created multiple search codes for title, main entry, added title and added entry. both tapes were sorted into ascending search-code sequence. the final run was the search program which attempted to match input codes with the marc i base codes. when there was agreement based on the relationship of threshold and retrieve counter, the printer displayed threshold, short author and short title on one line, and retrieve value, input author and title on the next line, as illustrated in figure 2. the printed results were compared to validate the accuracy of the retrieve. this comparison was cross-checked against the results of the acquisition department's manual procedures. the search program also provided for an attempt to match titles on the basis of a rearrangement of title words. in such attempts the retrieve threshold was raised.

fig. 2. sample of retrieved citations.

analysis of results

the raw data obtained from this experimental run are shown in table 6. of the 2874 items represented in the input file, 48.4%, or 1392, were actually found to exist in the data base. of those actually present 90.4%, or 1200, were extracted with an overall accuracy of 98.67%. an examination of the sixteen false drops revealed several omissions in the compression routines for the input data and for the data base. one of the more significant omissions was failing to compensate for multi-character abbreviations, particularly 'st.' and 'ste.' for 'saint.' a subroutine for acceptance of such abbreviations added to the search-code generating program would increase the retrieve accuracy to 99%.

table 6. table of results

retrieve values | total hits | correct hits | false hits | percentage correct
6 | 14 | 14 | 0 | 100
8 | 0 | 0 | 0 | 0
10 | 311 | 311 | 0 | 100
12 | 264 | 248 | 16 | 93.3
14 | 232 | 232 | 0 | 100
16 | 118 | 118 | 0 | 100
18 | 260 | 260 | 0 | 100
20 | 1 | 1 | 0 | 100
totals | 1200 | 1184 | 16 | 98.7

table 7. distribution of errors

no. of codes | title error | title spelling | author lacking | author error | author spelling | other | total
1 | 2 | 3 | 10 | 12 | 27 | 4 | 58
2 | 2 | 6 | 17 | 26 | 60 | 23 | 134
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0
total | 4 | 9 | 27 | 38 | 87 | 27 | 192

the occurrence of titles with the words "selected" or "collected," etc., produced additional false drops when the title word string exceeded two words.
a modification to the search program to raise the threshold when the input data contain codes such as 'sect' and 'coct' would increase the retrieve accuracy to 99.17%. the presence of personal names in titles, such as 'charles evans hughes' and 'franklin delano roosevelt', caused seven additional false drops. at present it seems unlikely that a simple method to prevent them can be included.

conclusion

the experimental results indicate that the hypothesis suggested is valid. use of multiple codes for added entry and added title, in addition to the main entry and main title data, is clearly necessary. approximately 10% of the correctly retrieved items were produced by the existence of an added entry code. the influence of spelling accuracy was lessened by use of a compression technique. an inspection of extracted titles revealed the existence of 43 spelling errors which did not affect retrieval. thus, the search code reduced the significance of spelling by some 30%. utilizing table search followed by table look-up and linking random-access addresses should enable the search-code approach to bibliographic retrieval to provide rapid, direct access to the title sought.

acknowledgment

this study was supported in part by national science foundation grants gn-758 and gu-1153 and by the regional information and communication exchange. the assistance of the acquisitions department staff, the research computation center staff and the staff of the fondren library's data processing division is gratefully acknowledged.

references

1. morris, ned c.: "computer based acquisitions system at texas a & i university," journal of library automation, 1 (march 1968), 1-12.
2. wedgeworth, robert: "brown university library fund accounting system," journal of library automation, 1 (march 1968), 51-65.
3. u.s. library of congress: project marc, an experiment in automating library of congress catalog data (washington: 1967).
4. richmond, phyllis a.: "note on updating and searching computerized catalogs," library resources and technical services, 10 (spring 1966), 155-160.
5. richmond, phyllis a.: "source retrieval," physics today, 18 (april 1965), 46-48.
6. atherton, p.; yorich, j. c.: three experiments with citation indexing and bibliographic coupling of physics literature (new york: american institute of physics, 1962).
7. international business machines corporation: reference manual, index organization for information retrieval (ibm, 1961).
8. international business machines corporation: a unique computable name code for alphabetic account numbering (white plains, n.y.: ibm, 1960).
9. tainiter, m.: "addressing random-access storage with multiple bucket capacities," association for computing machinery journal, 10 (july 1963), 307-315.
10. toyoda, junichi; tazuka, yoshikazu; kasahara, yoshiro: "analysis of the address assignment problems for clustered keys," association for computing machinery journal, 13 (october 1966), 526-532.
11. dolby, james l.; resnikoff, howard l.: "on the structure of written english words," language, 40 (apr-june 1964), 167-196.
12. resnikoff, howard l.; dolby, james l.: "the nature of affixing in written english, part i," mechanical translation, 8 (march 1965), 84-89.
13. resnikoff, howard l.; dolby, james l.: "the nature of affixing in written english, part ii," mechanical translation, 9 (june 1966), 23-33.
a simple scheme for book classification using wikipedia | yelton 7 andromeda yelton a simple scheme for book classification using wikipedia ■■ background hanne albrechtsen outlines three types of strategies for subject analysis: simplistic, content-oriented, and requirements-oriented.3 in the simplistic approach, “subjects [are] absolute objective entities that can be derived as direct linguistic abstractions of documents.” the content-oriented model includes an interpretive step, identifying subjects not explicitly stated in the document. requirementsoriented approaches look at documents as instruments of communication; thus they anticipate users’ potential information needs and consider the meanings that documents may derive from their context. (see, for instance, the work of hjørland and mai.4) albrechtsen posits that only the simplistic model, which has obvious weaknesses, is amenable to automated analysis. the difficulty in moving beyond a simplistic approach, then, lies in the ability to capture things not stated, or at least not stated in proportion to their importance. synonymy and polysemy complicate the task. background knowledge is needed to draw inferences from text to larger meaning. these would be insuperable barriers if computers limited to simple word counts. however, thesauri, ontologies, and related tools can help computers as well as humans in addressing these problems; indeed, a great deal of research has been done in this area. for instance, enriching metadata with princeton university’s wordnet and the national library of medicine’s medical subject headings (mesh) is a common tactic,5 and the yahoo! category structure has been used as an ontology for automated document classification.6 several projects have used library of congress classification (lcc), dewey decimal classification (ddc), and similar library tools for automated text classification, but their results have not been thoroughly reported.7 all of these tools have had problems, though, with issues such as coverage, currency, and cost. this has motivated research into the use of wikipedia in their stead. since wikipedia’s founding in 2001, it has grown prodigiously, encompassing more than 3 million articles in its english edition alone as of this writing; this gives it unparalleled coverage. wikipedia also has many thesaurus-like features. redirects function as “see” references by linking synonyms to preferred terms. disambiguation pages deal with homonyms. the polyhierarchical category structure provides broader and narrower term relationships; the vast majority of pages belong to at least one category. links between pages function as related-term indicators. editor’s note: this article is the winner of the lita/ex libris student writing award, 2010. because the rate at which documents are being generated outstrips librarians’ ability to catalog them, an accurate, automated scheme of subject classification is desirable. however, simplistic word-counting schemes miss many important concepts; librarians must enrich algorithms with background knowledge to escape basic problems such as polysemy and synonymy. i have developed a script that uses wikipedia as context for analyzing the subjects of nonfiction books. though a simple method built quickly from freely available parts, it is partially successful, suggesting the promise of such an approach for future research. 
a s the amount of information in the world increases at an ever-more-astonishing rate, it becomes both more important to be able to sort out desirable information and more egregiously daunting to manually catalog every document. it is impossible even to keep up with all the documents in a bounded scope, such as academic journals; there were more than twenty-thousand peer-reviewed academic journals in publication in 2003.1 therefore a scheme of reliable, automated subject classification would be of great benefit. however, there are many barriers to such a scheme. naive word-counting schemes isolate common words, but not necessarily important ones. worse, the words for the most important concepts of a text may never occur in the text. how can this problem be addressed? first, the most characteristic (not necessarily the most common) words in a text need to be identified—words that particularly distinguish it from other texts. some corpus that connects words to ideas is required—in essence, a way to automatically look up ideas likely to be associated with some particular set of words. fortunately, there is such a corpus: wikipedia. what, after all, is a wikipedia article, but an idea (its title) followed by a set of words (the article text) that characterize that title? furthermore, the other elements of my scheme were readily available. for many books, amazon lists statistically improbable phrases (sips)— that is, phrases that are found “a large number of times in a particular book relative to all search inside! books.”2 and google provides a way to find pages highly relevant to a given phrase. if i used google to query wikipedia for a book’s sips (using the query form “site:en.wikipedia .org sip”), would wikipedia’s page titles tell me something useful about the subject(s) of the book? andromeda yelton (andromeda.yelton@gmail.com) graduated from the graduate school of library and information science, simmons college, boston, in may 2010. 8 information technology and libraries | march 2011 ■■ an initial test case to explore whether my method was feasible, i needed to try it on a test case. i chose stephen hawking’s a brief history of time, a relatively accessible meditation on the origin and fate of the universe, classified under “cosmology” by the library of congress. i began by looking up its sips on amazon.com. noticing that amazon also lists capitalized phrases (caps)—“people, places, events, or important topics mentioned frequently in a book”—i included those as well (see table 1).14 i then queried wikipedia via google for each of these phrases, using queries such as “site:en.wikipedia .org ‘grand unification theory.’” i selected the top three wikipedia article hits for each phrase. this yielded a list of sixty-one distinct items with several interesting properties: ■■ four items appeared twice (arrow of time, entropy [arrow of time], inflation [cosmology], richard feynman). however, nothing appeared more than twice; that is, nothing definitively stood out. ■■ many items on the list were clearly relevant to brief history, although often at too small a level of granularity to be good subject headings (e.g., black hole, second law of thermodynamics, time in physics). ■■ some items, while not unrelated, were wrong as subject classifications (e.g., list of solar system objects by size, nobel prize in physics). ■■ some items were at best amusingly, and at worst bafflingly, unrelated (e.g., alpha centauri [doctor who], electoral district [canada], james k. 
polk, united states men’s national soccer team). ■■ in addition, i had to discard some of the top google hits because they were not articles but wikipedia special pages, such as “talk” pages devoted to discussion of an article. this test showed that i needed an approach that would give me candidate subject headers at a higher level of granularity. i also needed to be able to draw a brighter line between candidates and noncandidates. the presence of noncandidates was not in itself distressing—any automated approach will consider avenues a human would not—but not having a clear basis for discarding low-probability descriptors was a problem. as it happens, wikipedia itself offers candidate subject headers at a higher level of granularity via its categories system. most articles belong to one or more categories, which are groups of pages belonging to the same list or topic.15 i hoped that by harvesting categories from the sixty-one pages i had discovered, i could improve my method. this yielded a list of more than three hundred categories. unsurprisingly, this list mostly comprised irrelevant because of this thesaurus structure, all of which can be harvested and used automatically, many researchers have used wikipedia for metadata enrichment, text clustering and classification, and the like. for example, han and zhao wanted to automatically disambiguate names found online but faced many problems familiar to librarians: “the traditional methods measure the similarity using the bag of words (bow) model. the bow, however, ignores all the semantic relations such as social relatedness between named entities, associative relatedness between concepts, polysemy and synonymy between key terms. so the bow cannot reflect the actual similarity.” to counter this, they constructed a semantic model from information on wikipedia about the associative relationships of various ideas. they then used this model to find relationships between information found in the context of the target name in different pages. this enabled them to accurately group pages pertaining to particular individuals.8 carmel, roitman, and zwerdling used page categories and titles to enhance labeling of document clusters. although many algorithms exist for sorting large sets of documents into smaller, interrelated clusters, there is less work on labeling those clusters usefully. by extracting cluster keywords, using them to query wikipedia, and algorithmically analyzing the results, they created a system whose top five recommendations contained the human-generated cluster label more than 85 percent of the time.9 schönhofen looked at the same problem i examine— identifying document topics with wikipedia data—but he used a different approach. he calculated the relatedness between categories and words from titles of pages belonging to those categories. he then used that relatedness to determine how strongly words from a target document predicted various wikipedia categories. he found that although his results were skewed by how wellrepresented topics were on wikipedia, “for 86 percent of articles, the top 20 ranked categories contain at least one of the original ones, with the top ranked category correct for 48 percent of articles.”10 wikipedia has also been used as an ontology to improve clustering of documents in a corpus,11 to automatically generate domain-specific thesauri,12 and to improve wikipedia itself by suggesting appropriate categories for articles.13 in short, wikipedia has many uses for metadata enrichment. 
while text classification is one of these potential uses, and one with promise, it is under-explored at present. additionally, this exploration takes place almost entirely in the proceedings of computer science conferences, often without reference to library science concepts or in a place where librarians would be likely to benefit from it. this paper aims to bridge that gap. a simple scheme for book classification using wikipedia | yelton 9 computationally trivial to do so, given such a list. (the list need not be exhaustive as long as it exhaustively described category types; for instance, the same regular expression could filter out both “articles with unsourced statements from october 2009” and “articles with unsourced statements from may 2008.”) at this stage of research, however, i simply ignored these categories in analyzing my results. to find a variety of books to test, i used older new york times nonfiction bestseller lists because brand-new books are less likely to have sips available on amazon.19 these lists were heavily slanted toward autobiography, but also included history, politics, and social science topics. ■■ results of the thirty books i examined (the top fifteen each from paperback and hardback nonfiction lists), twenty-one had sips and caps available on amazon. i ran my script against each of these phrase sets and calculated three measures for each resulting category list: ■■ precision (p): of the top categories, how many were synonyms or near-synonyms of the book’s lcshs? ■■ recall (r): of the book’s lcshs, how many had synonyms or near-synonyms among the top categories? ■■ right-but-wrongs (rbw): of the top categories, how many are reminiscent of the lcshs without actually being synonymous? these included narrower terms (e.g., the category “african_american_actors” when the lcshs included “actors—united states —biography”), broader terms (e.g., “american_folk_ singers” vs. “dylan, bob, 1941–”), related terms (e.g., “the_chronicles_of_narnia_books” vs. “lion, the witch and the wardrobe (motion picture)”), and examples (“killian_documents_controversy” vs. “united states—politics and government—2001–2009”). i considered the “top categories” for each book to be the five that most commonly occurred (excluding wikipedia administrative categories), with the following exceptions: ■■ because i had no basis to distinguish between them, i included all equally popular categories, even if that would bring the total to more than five. thus, for example, for the book collapse, the most common category occurred seven times, followed by two categories with five appearances and six categories with four. rather than arbitrarily selecting two of the six four-occurrence categories to bring the total to five, i examined all nine top categories. ■■ if there were more than five lcshs, i expanded the number of categories accordingly, so as not to candidates (“wars involving the states and peoples of asia,” “video games with expansion packs,” “organizations based in sweden,” among many others). many categories played a clear role in the wikipedia ecology of knowledge but were not suitable as general-purpose subject headers (“living people,” “1849 deaths”). strikingly, though, the vast majority of candidates occurred only once. only forty-two occurred twice, fifteen occurred three times, and one occurred twelve times: “physical cosmology.” twelve occurrences, four times as many as the next candidate, looked like a bright line. 
and “physical cosmology” is an excellent description of brief history— arguably better than lcsh’s “cosmology.” the approach looked promising. ■■ automating further test cases the next step was to test an extensive variety of books to see if the method was more broadly applicable. however, running searches and collating queries for even one book is tedious; investigating a large number by hand was prohibitive. therefore i wrote a categorization script (see appendix) that performs the following steps:16 ■■ reads in a file of statistically improbable phrases17 ■■ runs google queries against wikipedia for all of them18 ■■ selects the top hits after filtering out some common wikipedia nonarticles, such as “category” and “user” pages ■■ harvests these articles’ categories ■■ sorts these categories by their frequency of occurrence this algorithm did not filter out wikipedia administrative categories, as creating a list of them would have been prohibitively time-consuming. however, it would be table 1. sips and caps for a brief history of time sips grand unification energy, complete unified theory, thermodynamic arrow, psychological arrow, primordial black holes, boundary proposal, hot big bang model, big bang singularity, more quarks, contracting phase, sum over histories caps alpha centauri, solar system, nobel prize, north pole, united states, edwin hubble, royal society, richard feynman, milky way, roger penrose, first world war, weak anthropic principle 10 information technology and libraries | march 2011 “continental_army_generals” vs. “united states— history—revolution, 1775–1783.” ■■ weak: some categories treated the same subject as the lcsh but not at all in the same way ■■ wrong: the categories were actively misleading the results are displayed in table 2. ■■ discussion the results of this test were decidedly more mixed than those of my initial test case. on some books the wikipedia method performed remarkably well; on misleadingly increase recall statistics. ■■ i did not consider any categories with fewer than four occurrences, even if that left me with fewer than five top categories to consider. the lists of three-, two-, and one-occurrence categories were very long and almost entirely composed of unrelated items. i also considered, subjectively, the degree of overlap between the lcshs and the top wikipedia categories. i chose four degrees of overlap: ■■ strong: the top categories were largely relevant and included synonyms or near-synonyms for the lcsh ■■ near miss: some categories suggested the lcsh but missed its key points, such as table 2. results (sorted by percentage of relevant categories). book p r rbw subjective quality chronicles, bob dylan 0.2 0.5 0.8 strong the chronicles of narnia: the lion, the witch and the wardrobe official illustrated movie companion, perry moore 0.25 1 0.625 strong 1776, david mccullough 0 0 0.8 near miss 100 people who are screwing up america, bernard goldberg 0 0 0.625 weak the bob dylan scrapbook, 1956–1966, with text by robert santelli 0.2 0.5 0.4 strong three weeks with my brother, nicholas sparks 0 0 0.57 weak mother angelica, raymond arroyo 0.07 0.33 0.43 near miss confessions of a video vixen, karrine steffans 0.25 0.33 0.25 weak the fairtax book, neal boortz and john linder 0.17 0.33 0.33 strong never have your dog stuffed, alan alda 0 0 0.43 weak the world is flat, thomas l. friedman 0.4 0.5 0 near miss the tender bar, j. r. 
moehringer 0 0 0.2 wrong the tipping point, malcolm gladwell 0 0 0.2 wrong collapse, jared diamond 0 0 0.11 weak blink, malcolm gladwell 0 0 0 wrong freakonomics, steven d. levitt and stephen j. dubner 0 0 0 wrong guns, germs, and steel, jared diamond 0 0 0 weak magical thinking, augusten burroughs 0 0 0 wrong a million little pieces, james frey 0 0 0 wrong worth more dead, ann rule 0 0 0 wrong tuesdays with morrie, mitch albom no category with more than 4 occurrences a simple scheme for book classification using wikipedia | yelton 11 my method’s success with a brief history of time. i tested another technical, jargon-intensive work (n. gregory mankiw’s macroeconomics textbook), and found that the method also worked very well, giving categories such as “macroeconomics” and “economics terminology” with high frequency. therefore a system of this nature, even if not usable for a broad-based collection, might be very useful for scientific or other jargon-intensive content such as a database of journal articles. ■■ future research the method outlined in this paper is intended to be a proof of concept using readily available tools. the following work might move it closer to a real-world application: ■■ a configurable system for providing statistically improbable phrases; there are many options.23 this would provide the user with more control over, and understanding of, sip generation (instead of the amazon black box), as well as providing output that could integrate directly with the script. ■■ a richer understanding of the wikipedia category system. some categories (e.g., “all articles with unsourced statements”) are clearly useful only for wikipedia administrative purposes, not as document descriptors; others (e.g., “physical cosmology”) are excellent subject candidates; others have unclear value as subjects or require some modification (e.g., “environmental non-fiction books,” “macroeconomics stubs”). many of these could be filtered out or reformatted automatically. ■■ greater use of wikipedia as an ontology. for example, a map of the category hierarchies might help locate headers at a useful level of granularity, or to find the overarching meaning suggested by several headers by finding their common broader terms. a more thorough understanding of wikipedia’s relational structure might help disambiguate terms.24 others, it performed very poorly. however, there are several patterns here: many of these books were autobiographies, and the method was ineffective on nearly all of these.20 a key feature of autobiographies, of course, is that they are typically written in the first person, and thus lack any term for the major subject—the author’s name. biography, by contrast, is rife with this term. this suggests that including titles and authors along with sips and caps may be wise. additionally, it might require making better use of wikipedia as an ontology to look for related concepts (rather in the manner that han and zhao used it for name disambiguation).21 books that treat a single, well-defined subject are easier to analyze than those with more sprawling coverage. in particular, books that treat a concept via a sequence of illustrative essays (e.g., tipping point, freakonomics) do not work well at all. 
sips may apply only to particular chapters rather than to the book as a whole, and the algorithm tends to pick out topics of particular chapters (e.g., for freakonomics, the fascinating chapter on sudhir venkatesh’s work on “gangs_in_chicago, _illinois”22) rather than the connecting threads of the entire book (e.g. “economics—sociological aspects”). the tactics suggested for autobiography might help here as well. my subjective impressions were usually, but not always, borne out by the statistics. this is because some of the rbws were strongly related to one another and suggested to a human observer a coherent narrative, whereas others picked out minor or dissimilar aspects of the book. there was one more interesting, and promising, pattern: my subjective impressions of the quality of the categories were strongly predicted by the frequency of the most common category. remember that in the brief history example, the most common category, “physical cosmology,” occurred twelve times, conspicuously more than any of its other categories. therefore i looked at how many times the top category for each book occurred in my results. i averaged this number for each subjective quality group; the results are in table 3. in other words, the easier it was to draw a bright line between common and uncommon categories, the more likely the results were to be good descriptions of the work. this suggests that a system such as this could be used with very little modification to streamline categorization. for example, it could automatically categorize works when it met a high confidence threshold (when, for instance, the most common category has double-digit occurrence), suggest categories for a human to accept or reject at moderate confidence, and decline to help at low confidence. it was also interesting to me that—unlike my initial test case—none of the bestsellers were scientific or technical works. it is possible that the jargon-intensive nature of science makes it easier to categorize accurately, hence table 3. category frequency and subjective quality subjective quality of categories frequencies of most common category average frequency of most common category strong 6, 12, 16, 19 13.25 near miss 5, 5, 7, 10 6.75 weak 4, 5, 6, 7, 8 6 wrong 3, 4, 4, 5, 5, 5, 7, 7 5 12 information technology and libraries | march 2011 (1993): 219. 4. birger hjørland, “the concept of subject in information science,” journal of documentation 48, no. 2 (1992): 172; jenserik mai, “classification in context: relativity, reality, and representation,” knowledge organization 31, no. 1 (2004): 39; jens-erik mai, “actors, domains, and constraints in the design and construction of controlled vocabularies,” knowledge organization 35, no. 1 (2008): 16. 5. xiaohua hu et al., “exploiting wikipedia as external knowledge for document clustering,” in proceedings of the 15th acm sigkdd international conference on knowledge discovery and data mining, paris, france, 28 june–1 july 2009 (new york: acm, 2009): 389. 6. yannis labrou and tim finin, “yahoo! as an ontology— using yahoo! categories to describe documents,” in proceedings of the eighth international conference on information and knowledge management, kansas city, mo, usa 1999 (new york: acm, 1999): 180. 7. kwan yi, “automated text classification using library classification schemes: trends, issues, and challenges,” international cataloging & bibliographic control 36, no. 4 (2007): 78. 8. 
xianpei han and jun zhao, “named entity disambiguation by leveraging wikipedia semantic knowledge,” in proceeding of the 18th acm conference on information and knowledge management, hong kong, china, 2–6 november 2009 (new york: acm, 2009): 215. 9. david carmel, haggai roitman, and naama zwerdling, “enhancing cluster labeling using wikipedia,” in proceedings of the 32nd international acm sigir conference on research and development in information retrieval, boston, ma, usa (new york: acm, 2009): 139. 10. peter schönhofen, “identifying document topics using the wikipedia category network,” in proceedings of the 2006 ieee/wic/acm international conference on web intelligence, hong kong, china, 18–22 december 2006 (los alamitos, calif.: ieee computer society, 2007). 11. hu et al., “exploiting wikipedia.” 12. david milne, olena medelyan, and ian h. witten, “mining domain-specific thesauri from wikipedia: a case study,” in proceedings of the 2006 ieee/wic/acm international conference on web intelligence, 22–26 december 2006 (washington, d.c.: ieee computer society, 2006): 442. 13. zeno gantner and lars schmidt-thieme, “automatic content-based categorization of wikipedia articles,” in proceedings of the 2009 workshop on the people’s web meets nlp, acl-ijcnlp 2009, 7 august 2009, suntec, singapore (morristown, n.j.: association for computational linguistics, 2009): 32. 14. “amazon.com capitalized phrases,” amazon.com, http://www.amazon.com/gp/search-inside/capshelp.html/ ref=sib_caps_help (accessed mar. 13, 2010). 15. for more on the epistemological and technical roles of categories in wikipedia, see http://en.wikipedia.org/wiki/ wikipedia:categorization. 16. two sources greatly helped the script-writing process: william steinmetz, wicked cool php: real-world scripts that solve difficult problems (san francisco: no starch, 2008); and the documentation at http://php.net. 17. not all books on amazon.com have sips, and books that do may only have them for one edition, although many editions may be found separately on the site. there is not a readily apparent pattern determining which edition features sips. therefore ■■ a special-case system for handling books and authors that have their own article pages on wikipedia. in addition, a large-scale project might want to work from downloaded snapshots of wikipedia (via http:// download.wikimedia.org/), which could be run on local hardware rather than burdening their servers, this would require using something other than google for relevance ranking (there are many options), with a corresponding revision of the categorization script. ■■ conclusions even a simple system, quickly assembled from freely available parts, can have modest success in identifying book categories. although my system is not ready for real-world applications, it demonstrates that an approach of this type has potential, especially for collections limited to certain genres. given the staggering volume of documents now being generated, automated classification is an important avenue to explore. i close with a philosophical point. although i have characterized this work throughout as automated classification, and it certainly feels automated to me when i use the script, it does in fact still rely on human judgment. wikipedia’s category structure and its articles linking text to title concepts are wholly human-created. 
even google's pagerank system for determining relevancy rests on human input, using web links to pages as votes for them (like a vast citation index) and the texts of these links as indicators of page content.25 my algorithm therefore does not operate in lieu of human judgment. rather, it lets me leverage human judgment in a dramatically more efficient, if also more problematic, fashion than traditional subject cataloging. with the volume of content spiraling ever further beyond our ability to individually catalog documents—even in bounded contexts like academic databases, which strongly benefit from such cataloging—we must use human judgment in high-leverage ways if we are to have a hope of applying subject cataloging everywhere it is expected.

references and notes

1. carol tenopir, "online databases—online scholarly journals: how many?" library journal (feb. 1, 2004), http://www.libraryjournal.com/article/ca374956.html (accessed mar. 13, 2010).
2. "amazon.com statistically improbable phrases," amazon.com, http://www.amazon.com/gp/search-inside/sipshelp.html/ref=sib_sip_help (accessed mar. 13, 2010).
3. hanne albrechtsen, "subject analysis and indexing: from automated indexing to domain analysis," the indexer, 18, no. 4
21. han and zhao, "named entity disambiguation."
22. sudhir venkatesh, off the books: the underground economy of the urban poor (cambridge: harvard univ. pr., 2006).
23. see karen coyle, "machine indexing," the journal of academic librarianship 34, no. 6 (2008): 530. she gives as examples phraserate (http://ivia.ucr.edu/projects/phraserate/), kea (http://www.nzdl.org/kea/), and extractor (http://extractor.com/).
24. per han and zhao, "named entity disambiguation."
25. lawrence page et al., "the pagerank citation ranking: bringing order to the web," stanford infolab (1999), http://ilpubs.stanford.edu:8090/422/ (accessed mar. 13, 2010). this paper precedes the launch of google; as the title indicates, the citation index is one of google's foundational ideas.
this step cannot be automated.
18. be aware that running automated queries without permission is an explicit violation of google's terms of service. see google webmaster central, "automated queries," http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=66357 (accessed mar. 13, 2010). before using this script, obtain an api key, which confers this permission. ajax web search api keys can be instantly and freely obtained via http://code.google.com/apis/ajaxsearch/web.html.
19. "hardcover nonfiction," new york times, oct. 9, 2005, http://www.nytimes.com/2005/10/09/books/bestseller/1009besthardnonfiction.html?_r=1 (accessed mar. 13, 2010); "paperback nonfiction," new york times, oct. 9, 2005, http://www.nytimes.com/2005/10/09/books/bestseller/1009bestpapernonfiction.html?_r=1 (accessed mar. 13, 2010).
20. for the purposes of this discussion i consider the problematic million little pieces to be autobiography, as it has that writing style, and as its lcsh treats it thus.

appendix. php script for automated classification

<?php
/* the published listing begins mid-script; this opening argument check is a minimal
   reconstruction implied by the error message below and by the later use of $argv[1]
   (the sip file) and $argv[2] (the number of hits to keep per query). it is assumed,
   not taken from the original. */
if ($argv[2] > 4) {
    echo "i'm sorry; the number specified cannot be more than 4.";
    die;
}
// next, turn our comma-separated list into an array.
it is a violation of the google terms of service to run automated queries without permission. obtain an ajax api key via http://code.google.com. */ $apikey = ‘your_key_goes_here’; foreach($sip_array as $query) { /* in multiword terms, change spaces to + so as not to break the google search. */ $query = str_replace( “ “, “+”,,” $query); $googresult = “http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=site%3aen.wikipedia.org+$query&key=$apikey”; $googdata = file_get_contents($googresult); // pick out the urls we want and put them into the array $links preg_match_all(‘|” url”:” [^” ]*”|i’,, $googdata, $links); /* strip out some crud from the json syntax to get just urls */ $links[0] = str_replace( “\” url\”:\” “, “”, $links[0]); $links[0] = str_replace(“\” “, “”, $links[0]); /* here we step through the links in the page google returned to us and find the top wikipedia articles among the results */ $i=0; foreach($links[0] as $testlink) { /* these variables test to see if we have hit a wikipedia special page instead of an article. there are many more flavors of special page, but these are the most likely to show up in the first few hits. */ $filetest = strpos($testlink, ‘wiki/file:’); $cattest = strpos($testlink, ‘wiki/category:’); $usertest = strpos($testlink, ‘wiki/user’); $talktest = strpos($testlink, ‘wiki/talk:’); $disambtest = strpos($testlink, ‘(disambiguation)’); $templatetest = strpos($testlink, ‘wiki/template_’); if (!$filetest && !$cattest && !$usertest && !$talktest && !$disambtest && !$templatetest) { $wikipages[] = $testlink; $i++; } /* once we’ve accumulated as many article pages as the user asked for, stop adding links to the $wikipages array. */ appendix. php script for automated classification (continued) a simple scheme for book classification using wikipedia | yelton 15 if ($i == $argv[2]) { break; } //this closes the foreach loop which steps through $links } // this closes the foreach loop which steps through $sip_array } /* for each page that we identified in the above step, let’s find the categories it belongs to. */ $mastercatarray = array(); foreach ($wikipages as $targetpage) { // scrape category information from the article page. $wikiscrape = file_get_contents($targetpage); preg_match_all(“|/wiki/category.[^\” ]+|”,,” $wikiscrape, $categories); foreach ($categories[0] as $catstring) { /* strip out the “wiki/category:” at the beginning of each string */ $catstring = substr($catstring, 15); /* keep count of how many times we’ve seen this category. */ if (array_key_exists($catstring, $mastercatarray)) { $mastercatarray[$catstring]++; } else { $mastercatarray[$catstring] =1; } } } // sort by value: most popular categories first. arsort($mastercatarray); echo “the top categories are:\n”; print_r($mastercatarray); ?> appendix. php script for automated classification (continued) mapping for the masses: gis lite and online mapping tools in academic libraries kathleen w. weessies and daniel s. dotson information technology and libraries | march 2013 23 abstract customized maps depicting complex social data are much more prevalent today than in the past. not only in formal published outlets, interactive mapping tools make it easy to create and publish custom maps in both formal and more casual outlets such as social media. this article defines gis lite, describes three commercial products currently licensed by institutions, and discusses issues that arise from their varied functionality and license restrictions. 
introduction news outlets from newspapers to television to internet these days are filled with maps that make it possible for readers to visualize complex social data. presidential election results, employment rates, and the plethora of data arising from the census of population are just a small sampling of social data mapped and consumed daily. the sharp rise in published maps in recent years has increased consumer awareness of the effectiveness of presenting data in map format and has raised expectations for finding, making and using customized maps. not just in news media, but in academia also, researchers and students have high interest in being able to make and use maps in their work. just a few years ago even the simplest maps had to be custom made by specialists. researchers and publishers had to seek out highly trained experts to make maps on their behalf. as a result, custom maps were generally only to be found in formal publications. the situation has changed partly because geographic information system (gis) software for geographic analysis and map making is more readily available than in years past. it does, however, remain specialized and wants considerable training for users to be proficient at even a basic level.1 this gap between supply and demand has been partly filled, especially in the last five years, by the growth of internet-based “gis lite” tools. while some basic tools are freely available on the internet, several tools are subscription-based and are licensed by libraries, schools and businesses for use. college and university libraries especially are quickly becoming a major resource for data visualization and mapping tools. the aim of this article is to describe several data-rich gis lite tools available in the library market and how these products have met or failed to meet the needs of several real-life college class kathleen w. weessies (weessie2@msu.edu), a lita member, is geosciences librarian and head of the map library, michigan state university, lansing. michigan. daniel s. dotson (dotson.77@osu.edu) is mathematical sciences librarian and science education specialist, associate professor, ohio state university libraries, columbus, ohio. mailto:weessie2@msu.edu mailto:dotson.77@osu.edu mapping for the masses: gis lite & online mapping tools in academic libraries | weessies and dotson 24 situations. this is followed by a discussion of issues arising from user needs and restrictions posed by licensing and copyright. what is gis lite? students and faculty across the academic spectrum often discover that their topic has a geographic element to it and a map would enhance their work (paper, presentation, project, poster, article, book, thesis or dissertation, etc.). if their research involves data analysis, geospatial tools will draw attention to spatial patterns in the data that might not otherwise be apparent. every scholar with such needs must make a cost/benefit decision concerning gis: is his or her need greater than the cost in time and effort (sometimes money) necessary to learn or hire skills to produce map products? a full functioning gis, being a specialized system of software designed to work with geospatially referenced datasets, is designed to address all the problems above. the data may be analyzed and output into customized maps exactly to the researcher’s need. the traditional lowend solution available to non-experts, on the other hand, is colorizing a blank outline map, either with hand-held tools (markers, colored pencils, etc.) 
or on a computer using a graphic editing program. the profusion of web mapping options dangles tantalizingly with possibility, and occasionally (and increasingly) is able to provide an output that illustrates a useful point of users’ research in a professional enough manner to fill a need. in recent years the web has blossomed with map applications collectively called the “geoweb” or “geospatial web.” geoweb or geospatial web refers to the “emerging distributed global gis, which is a widespread distributed collaboration of knowledge and discovery.”2 some geoweb applications are well known street map resources such as google maps and mapquest. others are designed to deliver data from an organization, such as the national hazards support system (http://nhss.cr.usgs.gov), national pipeline mapping system (http://www.npms.phmsa.dot.gov/publicviewer), and the broadband map (http://www.broadbandmap.gov). a few tools focus on map creation and output such as arcgis online (http://www.arcgis.com/home/webmap/viewer.html) and scribble maps (http://www.scribblemaps.com). the newest subgenre of the geoweb consists of participatory mapping sites such as openstreet map (http://www.openstreetmap.org), did you feel it? (http://earthquake.usgs.gov/earthquake.usgs.gov/earthquakes/dyfi), and ushahidi (http://community.ushahidi.com/deployments). the geoweb literature is small but growing. 3 elwood reviewed published research on the geographic web.4 the geoweb literature tends to focus on creation of mappable data and delivery of geoweb services.5 in these the map consumer only appears as a contributor of data. very little has been written about users’ needs from the geoweb. the term gis lite has arisen among map and gis librarians to describe a subset of geoweb applications. gis lite is useful to library patrons lacking specialized gis training but who wish to conduct some gis and map-making activities on a lower learning curve. for the purpose of this article, gis lite will refer to applications, usually web-based, which allow users to manipulate geospatial data and create map outputs without programming skills or training in full gis software. http://nhss.cr.usgs.gov/ http://www.npms.phmsa.dot.gov/publicviewer http://www.broadbandmap.gov/ http://www.arcgis.com/home/webmap/viewer.html http://www.scribblemaps.com/ http://www.openstreetmap.org/ http://earthquake.usgs.gov/earthquake.usgs.gov/earthquakes/dyfi http://community.ushahidi.com/deployments information technology and libraries | march 2013 25 while many geoweb applications allow only low-level output options, gis lite will provide an output intended to be used in activities or rolled into a gis for further geospatial processing. in libraries, gis lite is closely allied with data and statistics resources. data and statistics librarianship have already been discussed as disciplines in the literature such as by hogenboom6 and gray.7 new technologies and access to deeper data resources such as the ones presented here have raised the bar for librarians’ responsibilities for curating, serving, and aiding patrons in its use. rather than be passive shepherds of information resources, librarians are now active participants and even information partners. librarians with map and gis skills similarly can directly enhance the quality of student scholarship across academic disciplines.8 the gis lite resources, however, need not remain specialized tools of map and gis librarians. 
librarians working in disciplines across the academic spectrum may incorporate them into their arsenal of tools to meet patron needs. data visualization tools a growing number of academic libraries have licensed access to online data providers. the following data tools contain enough gis lite functionality to aid patrons in visualizing and manipulating data (primarily social data) and creating customized map outputs. three of the more powerful commercial products described here are social explorer, simplymap, and proquest statistical datasets. social explorer licensed by oxford university press, social explorer provides selected data from the us decennial census 1790 to 2010, plus american community survey 2006 through 2010.9 the interface enables either retrieval of tabular data or visualization of data in an interactive map. as the user selects options through pull-down menus, the map automatically refreshes to reflect the chosen year and population statistics. the level of geography depicted defaults to county level data. if a user zooms in to an area smaller than a county, then data refreshes to smaller geographies such as census tracts if they are available at that level for that year. output is in the form of graphic files suitable for sharing in a computer presentation (see figure 1). one advantage of social explorer is that it utilizes historic boundaries as they existed for states, territories, counties, and census tracts for each given year. social explorer utilizes data and boundary files generated by the national historical gis (nhgis) based at the university of minnesota in collaboration with other partners. the creation of these historical boundaries was a significant undertaking and accomplishment.10 custom tables of data and the historic geographic boundaries may also be retrieved and downloaded for use from an affiliated engine through the nhgis website (http://www.nhgis.org). a disadvantage of this product is that the tool, while robust, does not completely replicate all the data available in the original paper census volumes. also, historical boundaries have not been created for city or township-level data. the final map layout is not customizable either in the location of title and legend or in the data intervals. http://www.nhgis.org/ mapping for the masses: gis lite & online mapping tools in academic libraries | weessies and dotson 26 figure 1: map depicting population having four or more years of college, 1960 (source: social explorer, 2012; image used with permission) simplymap simplymap (http://geographicresearch.com/simplymap) is a product of geographic research. this powerful interface brings together public and licensed proprietary data to offer a broad array of 75,000 data variables in the united states. us census data are available 1980–2010 normalized to the user’s choice of either year 2000 or year 2010 geographies. numerous other licensed datasets primarily focus on demographics and consumer behavior, which makes it popular as a marketing research tool. each user establishes a personal login which allows created maps and tables to persist from session to session. upon creating a map view, the user may adjust the smaller geographic unit at which the theme data is displayed and also may adjust the data intervals as desired. the user creates a layout, adjusting the location of the map legend and title before exporting as a graphic or pdf (see figure 2). data are also exportable as gis-friendly shapefiles. 
http://geographicresearch.com/simplymap information technology and libraries | march 2013 27 the great advantage of this product is the ability to customize the data intervals. this makes it possible to filter the data and display specific thresholds meaningful to the user. for instance if a user needs to illustrate places where an activity or characteristic is shared by “over half” of the population, then one may change the map to display two data categories: one for places where up to 50 percent of the population shares the characteristic and a second category for places where more than 50 percent of the population shares the characteristic. another potential advantage is that all local data have been allocated pro rata so that all variables, regardless of their original granularity, may be expressed by county boundaries, by zip code boundaries, or by census tract. a disadvantage of the product is the lack of historical boundaries to match historical data. figure 2. map depicting census tracts that have more than 50% black population (yellow line indicates cincinnati city boundary) (source: simplymap, 2012; image used with permission) mapping for the masses: gis lite & online mapping tools in academic libraries | weessies and dotson 28 proquest statistical datasets statistical datasets was developed by conquest systems inc. and is licensed by proquest. this product also mingles a broad array of several thousand public and licensed proprietary datasets, including some international data, in one interface. the user may retrieve data and view it in tabular or chart form. if the data have a geographic element, then the user may switch the view to a map interface. the resulting map may be exported as an image. the data may also be exported to a gis-friendly shapefile format. this product offers more robust data manipulation than the other products, in that the user may perform calculations between any of the data tables and create a chart or map of the created data element (see figure 3). statistical datasets, however, has more simplistic map layout capabilities than the other products. figure 3. map of sorghum production, by country, in 2010 (source: proquest statistical datasets, 2012; image used with permission) case studies the following three case studies are of college classroom situations in which students utilized maps or map making as part of the assigned course work. the above mapping options are assessed for how well they met the assignment needs. information technology and libraries | march 2013 29 case study 1 an upper level statistics course at the ohio state university requires students to create maps using sas (http://www.sas.com). while many may not associate the veteran statistical software package with creating maps, this course uses it along with sas/graph to combine statistical data with a map. the project requires data articulated at the county level in ohio, which the students then combine into multi-county regions. the end result is a map with regions labeled and rendered in 3d according to the data values. an example of the type of map that could be produced from such data using sas can be seen in figure 4. figure 4. map of observed rabbit density in ohio using sas, sas/graph, and mail carrier survey data,1998 (image used with permission) while the data are provided in this course, students could potentially seek help from the library in a traditional way to find numerical data expressed at a county level. 
the librarian would guide patrons through appropriate avenues to locate data, such as the three products listed above. all three options contain numerous data variables for ohio at the county level. because the students are further processing the data elsewhere (in this case sas), the output options of the three products are less important. ultimately the availability of data on a desired subject would be the primary determinant for choosing one of the three gis lite options discussed here. social explorer will export the data in tabular form, which can then be ingested into sas. simplymap and proquest statistical datasets would both be a bit easier, though, because both packages allow the user to export the data as shapefiles, which are directly imported into sas/graph as both boundary files and joined tabular data.

case study 2

a first-year writing class at michigan state university has a theme of the american ethnic and racial experience. assignments all relate to a student's chosen ethnic group and geographic location from approximately 1880 to 1930. assignments build upon each other to culminate in a final semester paper. students with ancestors living in the united states at that time are encouraged to examine their own family's ethnicity and how they fit in their geographic context. otherwise, students may choose any ethnic group and place of interest. maps are a required element in the assignments. maps that display historical census data help students place the subject ethnic group into the larger county, state, and national context over the time frame. the students can see, for instance, if their subject household was part of an ethnic cluster or an outlier to ethnic clusters. the parameters for finding data and maps are generous and open to each student's interpretation. the wish is for students to find social statistics and maps that are insightful to their topic and will help them tell their story. of the three statistical resources considered above, currently the only useful one is social explorer because it covers the time period studied by the class. the students may map several social indicators at the county level across several decades and compare their local area to the region and the nation. also they may save their maps and include them in their papers (properly credited).

case study 3

"the ghetto" is an elective geography class restricted to upperclassmen at michigan state university. in the semester project, students analyze the spatial organization and demographic variables of "ghetto" neighborhoods in a chosen city. a ghetto is defined as a neighborhood that has a 50 percent or higher concentration of a definable ethnic group. since black and white are the only two races consistently reported at the census tract level for all the years covered by the class (1960 through 2010), the students necessarily use that data for their projects. data needs for the class are focused and deep. the students specifically need to visualize us census data from 1960 through 2010 at the census tract level within the city limits for several social indicators. indicators include median income, median housing value, median rent, educational attainment, income, and rate of unemployment.
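a minimal sketch of the kind of two-category, 50 percent threshold map these assignments call for is shown below, written in python with geopandas and matplotlib. the file names ("tracts.shp", "city_boundary.shp") and the attribute name ("pct_black") are hypothetical placeholders, not the output of any particular product.

```python
# minimal sketch: classify census tracts at a 50 percent threshold and map the result.
# file and column names are hypothetical placeholders.
import geopandas as gpd
import matplotlib.pyplot as plt

tracts = gpd.read_file("tracts.shp")

# two data categories: under 50 percent, and 50 percent or more
tracts["category"] = tracts["pct_black"].apply(
    lambda pct: "50% or more" if pct >= 50 else "under 50%"
)

ax = tracts.plot(
    column="category", categorical=True, legend=True,
    cmap="Paired", edgecolor="gray", linewidth=0.2, figsize=(8, 8),
)

# optional overlay of a (modern) city boundary, comparable to the yellow
# outline shown in figure 2
city = gpd.read_file("city_boundary.shp")
city.boundary.plot(ax=ax, color="yellow", linewidth=1.5)

ax.set_axis_off()
ax.set_title("tracts meeting the 50 percent concentration definition")
plt.savefig("threshold_map.png", dpi=150)
```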
the instructor has traditionally required use of the paper census volumes, and students created hand-made maps that highlight tracts in the subject city that conformed to the ghetto definition and those that did not for each of the census years covered. computer-retrieved data and computer-generated maps would be acceptable, but at the time of this writing no gis lite product is able to make all the maps that meet the specific requirements of this class. social explorer covers all of the date range and provides data down to the tract level. however, it does not provide an outline of the city limits and does not provide all the data variables required in the assignment. simplymap will only work for 2000 through 2010 because tract boundaries are only available for those two years, even though the data go back to 1980. simplymap does provide two excellent features, though: it is the only product that allows an overlay of the (modern) city boundary on top of the census tract map, and it is the only product that allows manipulation of the data intervals. students may choose to break the data at the needed 50 percent mark, while the other products utilize fixed data intervals not useful to this class. proquest statistical datasets can compute the data into two categories to create the necessary data intervals; however, census data are only available beginning with census 2000.

map products for user needs

these three real-life class scenarios illustrate how the rich and seemingly duplicative resources of the library can range from perfectly suitable to perfectly useless depending on each project's exact needs. the appropriateness of any given tool can only be assessed fairly if the librarian is familiar with all the "ins and outs" of every product. the geoweb and gis lite tools mentioned throughout this article are summarized in table 1. the suitability of gis lite tools will be further affected by the following issues.

historical boundaries

the range and granularity of data tools are subject to factors sometimes at odds with what a researcher would wish to have. at this time, for instance, many historical resources provide data only as detailed as the county level. county-level data are available largely due to the efforts of the nhgis mentioned above and the newberry library's atlas of historical county boundaries project (http://publications.newberry.org/ahcbp). far fewer resources provide historical data at smaller geographies such as city, township, or census tract levels. this is because the smaller the geographies get, the exponentially more there are to create and for map interfaces to process. from the well-known resource county and city data book,11 it is easy enough to retrieve us city data. the historical boundaries of every city in the united states, however, have not been created. this is because city boundaries are much more dynamic than county boundaries and there is no centralized authoritative source for their changes over time. two of the three case studies presented here utilized historic data. this isn't necessarily a representative proportion of user needs; librarians should assess data resources in light of their own patrons' needs.

normalization

two equally valid data needs arise with any kind of time series data when geographic boundaries change over time.
census tracts, for instance, provide geographic detail roughly at the neighborhood level, designed by the bureau of census to encompass approximately 2,500 to 8,000 people.12 because people move around and the density of population changes from decade to decade, the configuration and numbering of tracts change over time. some scholars will wish to see the data values in the tracts as they were drawn at the time of issue. in this situation, a neighborhood of interest might belong to different tracts over the years or even be split between two or more tracts. other scholars focused on a particular neighborhood may wish to see many decades of census data re-cast into stable tracts in order to be directly comparable. data providers will take one approach or the other on this issue, and librarians will do well to be aware of their choice.

license restrictions

a third issue affecting use of these products is the ability to use derived map images, not only in formal outlets such as professional presentations, articles, books, and dissertations, but also in informal outlets such as blogs and tweets. for the most part gis lite vendors are willing—even pleased—to see their products promoted in the literature and in social media. the vendors uniformly wish any such use to be properly credited. the license that every institution signs when acquiring these products will specify allowed and disallowed activities. the license, fixated on disallowing abuse or resale or other commercialization of the data, might have a chilling effect on users wishing to use the images in their work. if a user is in any doubt as to the suitability of an intended use of a map, he or she should be encouraged to contact the vendor to seek permission for its use.

as data resources grow and become more readily usable, the possibility for scholarly inquiry grows. librarians with familiarity with gis lite tools may partner with their patrons and guide them to the best resources.

table 1. a selection of geoweb and gis lite tools and their output options
tool name | url | free or fee | electronic output options*

geoweb tools
atlas of historical county boundaries | http://publications.newberry.org/ahcbp/ | free | spatial data as shapefile, kmz; image as pdf
did you feel it? | http://earthquake.usgs.gov/earthquakes/dyfi/ | free | tabular data as txt, xml; image as jpg, pdf, ps
google maps | https://maps.google.com/ | free | none
mapquest | http://www.mapquest.com | free | none
national broadband map | http://www.broadbandmap.gov/ | free | image as png
national hazards support systems (usgs) | http://nhss.cr.usgs.gov/ | free | image as pdf, png
national pipeline mapping system | https://www.npms.phmsa.dot.gov/publicviewer/ | free | image as jsf
openstreetmap | http://www.openstreetmap.org/ | free | tabular data as xml; image as png, jpg, svg, pdf
ushahidi community deployments | http://community.ushahidi.com/deployments/ | free | image as jpg

gis lite tools
arcgis online | http://www.arcgis.com | limited free options; access is part of institutional site license | spatial data as arcgis 10; image as png (in arcexplorer)
proquest statistical datasets | http://cisupa.proquest.com/ws_display.asp?filter=statistical%20datasets%20overview | fee | tabular data as excel, pdf, delimited text, sas, xml; spatial data as shapefile; image may be copied to clipboard
sas/graph | http://www.sas.com/technologies/bi/query_reporting/graph/index.html | fee | image as pdf, png, ps, emf, pcl
scribble maps | http://www.scribblemaps.com/ | free | spatial data as kml, gpx; image as jpg
simplymap | http://geographicresearch.com/simplymap | fee | tabular data as excel, csv, dbf; spatial data as shapefile; image as pdf, gif

* does not include taking a screen shot of the monitor or making a durable url to the page

references

1. national research council, division on earth and life studies, board on earth sciences and resources, geographical sciences committee, learning to think spatially (washington, d.c.: national academies press, 2006): 9.
2. pinde fu and jiulin sun, web gis: principles and applications (redlands, ca: esri press, 2011): 15.
3. for good overviews of the geoweb, see muki haklay, alex singleton, and chris parker, "web mapping 2.0: the neogeography of the geoweb," geography compass 2, no. 6 (2008): 2011-2039, http://dx.doi.org/10.1111/j.1749-8198.2008.00167.x; jeremy w. crampton, "cartography: maps 2.0," progress in human geography 33, no. 1 (2009): 91-100, http://dx.doi.org/10.1177/0309132508094074.
4. sarah elwood, "geographic information science: visualization, visual methods, and the geoweb," progress in human geography 35, no. 3 (2010): 401-408, http://dx.doi.org/10.1177/0309132510374250.
5. songnian li, suzana dragićević, and bert veenendaal, eds., advances in web-based gis, mapping services and applications (boca raton, fl: crc press, 2011).
6. karen hogenboom, carissa phillips, and merinda hensley, "show me the data! partnering with instructors to teach data literacy," in declaration of interdependence: the proceedings of the acrl 2011 conference, march 30-april 2, 2011, philadelphia, pa, ed. dawn m. mueller (chicago: association of college and research libraries, 2011), 410-417, http://www.ala.org/acrl/files/conferences/confsandpreconfs/national/2011/papers/show_me_the_data.pdf.
7. ann s. gray, "data and statistical literacy for librarians," iassist quarterly 28, no. 2/3 (2004): 24-29, http://www.iassistdata.org/content/data-and-statistical-literacy-librarians.
8. kathy weimer, paige andrew, and tracey hughes, map, gis and cataloging / metadata librarian core competencies (chicago: american library association map and geography round table, 2008), http://www.ala.org/magirt/files/publicationsab/magertcorecomp2008.pdf.
9. social explorer, http://www.socialexplorer.com/pub/home/home.aspx.
10. catherine fitch and steven ruggles, "building the national historical geographic information system," historical methods 36, no. 1 (2003): 41-50, http://dx.doi.org/10.1080/01615440309601214.
11. u.s. bureau of census, county and city data book, http://www.census.gov/prod/www/abs/ccdb.html.
12. census tracts and block numbering areas, http://www.census.gov/geo/www/cen_tract.html.

acknowledgments

the authors wish to thank dr. michael fligner, dr. clarence hooker, and dr. joe darden for permission to use their courses as case studies.

john carlo bertot

public access technologies in public libraries: effects and implications

public libraries were early adopters of internet-based technologies and have provided public access to the internet and computers since the early 1990s. the landscape of public-access internet and computing was substantially different in the 1990s as the world wide web was only in its initial development. at that time, public libraries essentially experimented with public-access internet and computer services, largely absorbing this service into existing service and resource provision without substantial consideration of the management, facilities, staffing, and other implications of public-access technology (pat) services and resources. this article explores the implications for public libraries of the provision of pat and seeks to look further to review issues and practices associated with pat provision resources. while much research focuses on the amount of public access that public libraries provide, little offers a view of the effect of public access on libraries. this article provides insights into some of the costs, issues, and challenges associated with public access and concludes with recommendations that require continued exploration.

public libraries were early adopters of internet-based technologies and have provided public access to the internet and computers since the early 1990s.1 in 1994, 20.9 percent of public libraries were connected to the internet, and 12.7 percent offered public-access computers.
by 1998, internet connectivity in public libraries grew to 83.6 percent, and 73.3 percent of public libraries provided public internet access.2 the landscape of public-access internet and computing was substantially different in the 1990s, as the world wide web was only in its initial development. at that time, public libraries essentially experimented with public-access internet and computer services, largely absorbing this service into existing service and resource provision without substantial consideration of the management, facilities, staffing, and other implications of public-access technology (pat) services and resources.3

using case studies conducted at thirty-five public libraries in five geographically dispersed and demographically diverse states, this article explores the implications for public libraries of the provision of pat. the researcher also conducted interviews with state library agency staff prior to visiting libraries in each state. the goals of this article are to

■■ explore the level of support pat requires within public libraries;
■■ explore the implications of pat on public libraries, including management, building planning, staffing, and other support issues;
■■ explore current pat support practices;
■■ identify issues and challenges public libraries face in maintaining and supporting their pat infrastructure; and
■■ identify factors that contribute to successful pat practices.

this article seeks to look beyond the provision of pat by public libraries and review issues and practices associated with pat-provision resources. while much research focuses on the amount of public access that public libraries provide, little offers a view of the effect of public access on libraries. this article provides insights into some of the costs, issues, and challenges associated with public access, and it concludes with recommendations that require continued exploration.

literature review

quickly over time, public libraries increased their public-access provision substantially (see figures 1 and 2). connectivity grew from 20.9 percent in 1994 to nearly 100 percent in 2006.4 moreover, nearly all libraries that connected to the internet offered public-access internet services. simultaneously, the average number of public-access computers grew from 1.9 per public library in 1996 to 12 per public library in 2007.5 accompanying and in support of the continual growth of basic connectivity and computing infrastructure was a demand for broadband connectivity. indeed, since 1994, connectivity has progressed from dial-up phone lines to leased lines and other forms of high-speed connectivity.

figure 1. public-access internet connectivity from 1994 through 2008

figure 2. public-access internet workstations from 1996 through 2008

the extent of the growth in public-access services within public libraries is profound and substantive, leading to the development of new internet-based service roles for public libraries.6 and public access to the internet through public libraries provides a number of community benefits to different populations within served communities.7 overlaid onto the public-access infrastructure is an increasingly complex service mix that now includes access to digital content (e.g., databases and digital libraries), integrated library systems (ilss), voice over internet protocol (voip), digital reference, and a host of other services and resources—some for public access, others for back-office library operations. and patrons do use these services in increasing amounts—both in the library and in everyday life.8 in fact, 82.5 percent of public libraries report that they do not have an adequate number of public-access computers some or all of the time and have resorted to time limits and wireless access to extend public-access services.9

john carlo bertot (jbertot@umd.edu) is professor and director of the center for library innovation in the college of information studies at the university of maryland, college park.

by 2007, as connectivity and public-access computer infrastructure grew, so ensued the need to provide a range of publicly available services and resources:

■■ 87.7 percent of public libraries provide access to licensed databases
■■ 83.4 percent of public libraries offer technology training
■■ 74.1 percent of public libraries provide e-government services (e.g., locating government information and helping patrons complete online applications)
■■ 62.5 percent of public libraries provide digital reference services
■■ 51.8 percent of public libraries offer access to e-books10

the list is not exhaustive, but illustrative, since libraries do offer other services such as access to homework resources, video content, audio content, and digitized collections. as public libraries expanded these services, management realized that they needed to plan and evaluate technology-based services. over the years, a range of technology management, planning, and evaluation resources emerged to help public libraries cope with their technology-based resources—those both publicly available and for administrative operations.11 but increasingly, public libraries report the strain that pat services promulgate. this centers on four key areas:

■■ maintenance and management. the necessary maintenance and management requirements of pat place an additional burden on existing staff, many of whom do not possess technology expertise to troubleshoot, fix, and support internet-based services and resources that patrons access.
■■ staff. libraries consistently cite staff expertise and availability as a barrier to the addition, support, and management of pat. indeed, as described in previous sections, some libraries have experienced a decline in library staff.
■■ finances. there is evidence of stagnant funding for libraries at the local level as well as a shift in expenditures from staff and collections to operational costs such as utilities and maintenance.
■■ buildings. the buildings are inadequate in terms of space and infrastructure (e.g., wiring and cabling) to support additional public access.12

this article explores these four areas through a site-visit method in an effort to go beyond a quantitative assessment of pat within the public library community. though related in terms of topic area and author, this study was conducted separately from the public library internet surveys conducted since 1994 and offers insights into the provision of pat services and resources that a national survey cannot explore in such depth.

method

the researcher visited thirty-five public libraries in five geographically and demographically diverse states between october 2007 and may 2008. the states were in the west, southwest, southeast, and mid-atlantic regions.
the libraries visited included urban, suburban, rural, and native american public libraries that served populations ranging from a few hundred to more than half a million. the communities that the libraries served varied in terms of poverty, race, income, age, employment, and education demographics. prior to visiting the public library sites, the researcher conducted interviews with state library agency staff to better understand the public library context within each state and to explore overall pat issues, strategies, and other factors within the state. the following research questions guided the site visits:

■■ what are the community and library contexts in which the library provides pat?
■■ what are the pat services and resources that the library makes available to its community?
■■ what pat services and resources does the library desire to provide to its community?
■■ what is the relationship between provided and desired pat and the effect on the library (e.g., staff, finances, the building, and management)?
■■ what are the perceived benefits that the library and its community gain through pat in the library?
■■ what are the issues and barriers that the library encounters in providing pat services and resources?
■■ how does the library manage and maintain its pat?

the researcher visited each library for four to six hours. during that time, he interviewed the library director and/or branch manager and technology support staff (either a specific library position, designated library employee, or city or county it staff person), toured the library facility, and conducted a brief technology inventory. at some libraries, the researcher was able to meet with community partners that in some way collaborated with the library to provide pat services and resources (e.g., educational institutions that collaborated with libraries to provide access to broadband or volunteers who conducted technology training sessions). interviews were recorded and transcribed, and the technology inventories were entered into a microsoft excel spreadsheet for analysis. the transcripts were coded using thematic content analytic schemes to allow for the identification of key issues regarding pat areas.13 this approach enabled the researcher to use an iterative site-visit strategy that used findings from previous site visits to inform subsequent visits. to ensure valid and reliable data, the researcher used a three-stage strategy:

1. site-visit reports were completed and sent to the libraries for review. corrections from libraries were incorporated into a final site-visit report.
2. a final state-based site-visit report was compiled for distribution to state library agency staff and also incorporated their corrections. this provided a state-level reliability and validity check.
3. a summary of key findings was distributed to six experts in the public library technology environment, three of whom were public library technology managers and three of whom were technology consultants who worked with public libraries.

in combination, this approach provided three levels of data quality checks, thus providing both internal (library and state) and external (technology expert) support for the findings. the findings in this article are limited to the libraries visited and interviews conducted with public librarians and state library agency staff. however, themes emerged early during the site-visit process and were reinforced through subsequent interviews and visits across the states and libraries visited.
in addition, the use of external reviewers of the findings lends additional, but limited, support to the findings.

findings

this section presents the results of the site visits and interviews with state library agency staff and public librarians. the article presents the findings by key areas surrounding pat in public libraries.

the public-access context

public libraries have a range of pat installed in their libraries for patron use. these technologies include public-access computers, wireless (wifi) access, ilss, online databases, digital reference, downloadable audio and video, and others. many of these services and resources are also available to patrons from outside library buildings, thus extending the reach (and support issues) of the library beyond the library's walls. in addition, when libraries do not provide direct access to resources and services, they serve as access points to those services, such as online gaming and social networking. while libraries can and do deploy a number of technologies for public use, it is possible to group these technologies broadly into two overlapping categories:

■■ hardware. library pat hardware can include public-access computers, public-access computing registration (i.e., reservation) systems, self-checkout stations, printers, faxes, laptops, and a range of other devices and systems. some of these technologies may have additional devices, such as those required for persons with disabilities. within the hardware grouping are networking technologies that include a range of hardware and software to enable a range of library networks to run (e.g., routers, hubs, switches, telecommunications lines, and networking software).
■■ software. software can include device operating system software (e.g., microsoft windows, mac os, and linux), device application software (e.g., microsoft office, openoffice, graphics software, audio software, e-book readers, assistive software, and others), and functional software (e.g., web browsers, online databases, and digital reference).

in short, public libraries make use of a range of technologies that the public uses in some way. each type of technology requires skills, management, implementation, and maintenance, all of which are discussed later. in the building, all of these products and services come together at the library's public-access computers, or patron mobile device if wifi is available. moreover, patrons increasingly want to use their portable devices (e.g., usb drives, ipods, and others) with library technology. this places pressure on libraries to not just offer public-access computers, but also to support a range of technologies and services. thus the environment in which libraries offer pat is complex and requires substantial technical expertise, support, and maintenance in key areas of applications, computers, and networking. moreover, as discussed below, patrons are increasingly demanding market-based approaches to pat. these demands—which are largely about single-point access to a range of information services and resources—are often at odds with library technology that is based on stove-piped approaches (e.g., ils, e-books, and licensed resources) and that do not necessarily lend themselves to seamless integration.

external pressures on pats

the advent and increased use by the public of google, amazon, itunes, youtube, myspace, second life, and other networked services affect public libraries in a number of ways.
this article discusses these services and resources from the perspective of an information marketplace in which the public library is one entrant. interviewed librarians overwhelmingly indicated that users now expect library services to resemble those in the marketplace. users expect the look and feel, integration, service capabilities, interactivity, and personalization and customization that they experience while engaging in social networking, online searching, online purchasing, or other online activities. and within the library building, patrons expect the services to integrate at the public-access computer entry point—not distributed throughout the library in a range of locations, workstations, or devices. said differently, they expect to have a "mylibrary.com" experience that allows for seamless integration across the library's services but also facilitates the use of personal technologies (e.g., ipods, mp3 players, and usb devices). thus users expect the library's services to resemble those services offered by a range of information service providers.

importantly, however, librarians indicated that library systems on which their services and resources reside by and large do not integrate seamlessly—nor were they designed to do so. public-access computers are gateways to the internet; the ils exists for patrons to search for and locate library holdings; and online databases, e-books, audiobooks, etc., are extensions of the library's holdings but are not physical items under a library's control and thus subject to a vendor's information and business models. while library vendors and the library community are working to develop more integrated products that lead users to the information they seek, the technology is under development. there are three significant issues that libraries face because of market pressures: (1) the pressures all come together at a single point—the public-access computer; (2) users want a customized experience while using technology designed for the general public, not the individual user; and (3) users have choices in the information marketplace. one participant indicated, "if the library cannot match what users have access to on the outside, users will and do move on."

managing and maintaining public access

managing the public-access computer environment for public libraries is a growing challenge. there are a number of management areas with which public librarians contend:

■■ public-access computers—the computers and laptops (if applicable) themselves, which can include anything from keyboards and mice to troubleshooting a host of computer problems (it is important to note that these may be computers that often vary in age and composition, come from a range of vendors, run different operating systems, and often have different application software versions).
■■ peripheral management—the printers, faxes, scanners, and other equipment that are part of the library's overall public access infrastructure.
■■ public-access management software or systems—these may include online or in-building computer-based reservations (which encompasses specialized reservations such as teen machines, gaming computers, computers for seniors, and so on), time management (set to the library's decided-upon time allotment), filtering, security, logins, virtual machines, etc.
■■ wireless access—this may include logins and configurations for patrons to gain access to the library's wireless network.
■■ bandwidth management—this may include the need to allocate bandwidth differently as needs increase and decrease in a typical day.
■■ training and patron assistance—for a vast array of services such as databases, online searching, e-government (e.g., completing government forms and seeking government information), and others. training can take place formally through classes, but also through point-of-use tutorials requested by patrons.

to some extent, librarians commented that, while they do have issues with the public-access computers themselves from time to time, the real challenges that they face regard the actual management of the public-access environment—sign-ups, time limits, cost recovery for print jobs, helping patrons, and so on. one librarian commented that "the computers themselves are pretty stable. we don't really have too many issues with them per se. it's everything that goes into, out from, or around the computer that creates issues for us."

as a result of the management challenges, several libraries have adopted turn-key solutions, such as public-access management systems (e.g., comprise technology's smart access manager [http://www.comprisetechnologies.com/product_29.html]) and all-encompassing public computing management systems that include networking and desktops (e.g., userful's discoverstations [http://userful.com/libraries/]). these systems allow for an all-in-one sign-up, print cost recovery, filtering (if desired), and security approach. also, the discoverstations are a linux-based, all-encompassing public-access management environment. a clear advantage to the discoverstation approach is that the discoverstation is connected to the internet and is accessible by userful staff remotely to update software and perform other maintenance functions. they also use open-source operating and application software. while these solutions do provide efficiencies, they also can create limitations. for example, the discoverstations are a thin-client system and are dependent on the server for graphics and memory, thus limiting their ability to access gaming and social-networking sites. the smart access manager, and similar programs, can rely on smart cards or other technology that users must purchase to print. another limitation is that the time limits are fixed, and, while users get warnings as time runs out, the session can end abruptly. these approaches are by and large adopted by libraries to ease the management associated with public-access computers and let staff concentrate on other duties and responsibilities. one librarian indicated that "until we had our management system, we would spend most of the day signing people up for the computers, or asking them to finish their work for the next person in line."

planning for pat services and resources

public libraries face a number of challenges when planning for pat services and resources. this is primarily because pat planning involves more than computers.
any planning needs to encompass n building needs, requirements, limitations, and design; n technology assessment that considers the library’s existing technology, technology potential, current practices, and future trends; n planning for and supporting multiple technology platforms; n telecommunications and networking; n services and resources available in the marketplace—those specifically for libraries and those more broadly available to consumers and used by patrons; n specific needs and requirements of technology (e.g., memory, disk space, training, other); n requirements of other it groups with which the library may need to integrate, for example, city or county technology mandates; n support needs, including the need to enter into maintenance agreements for computer, network, and other equipment and software; n staff capabilities, such as current staff skill sets and their ability to handle the technologies under review or purchased; and n policy, such as requirements to filter because of local, state or federal mandates. the above list may not be exhaustive, but rather based on the main items that librarians identified during the site visits, and they serve to provide indicators of the challenges those planning library it initiatives face. 86 information technology and libraries | june 2009 n the endless upgrade and planning one librarian likened the pat environment to “being a gerbil on a treadmill. you go round and round and never really arrive,” a reference to the fact that public libraries are in a perpetual cycle of planning and implementing various pat services and resources. either hardware needs to be updated or replaced, or there is a software update that needs to be installed, or libraries are looking to the next technology coming down the road. in short, the technology planning to implementation cycle is perpetual. the upgrade and replacement cycle is further exacerbated by the funding situation in which most public libraries find themselves. increasingly, public library local and state funding, which combined can account for more than 90 percent of library funding, is flat or declining.14 the most recent series of public library internet studies indicates an increase in reliance by public libraries on fees and fines, fundraising, private foundation, and grant funding to finance collections and technology within libraries.15 this places key aspects of library operations in the realm of unreliable and one-time funding sources, thus making it difficult for libraries to develop multiyear plans for pat. n multiple support models to cope with pat management and maintenance issues, public libraries are developing various support strategies. the site visits found a number of technology-support approaches in effect, ranging from no it support to highly centralized statewide approaches. the following list describes the technology-support models encountered during the site visits: 1. no technology support. libraries in this group have neither technology-support staff nor any type of organized technology-support mechanism with existing library staff. nor do they have access to external support providers such as county or city it staff. libraries in this group might rely on volunteers or engage in ad hoc maintenance, but by and large have no formal approach to supporting or maintaining their technology. 2. internal library support without technology staff. in this model, the library provides its own technology support but does not necessarily have dedicated technology staff. 
rather, the library has designated one or more staff members to serve as the it person. usually this person has an interest in technology but has other primary responsibilities within the library. there may be some structure to the support—such as updating software (e.g., windows patches) once a week at a certain time—but it may be more ad hoc in approach. also, the library may try to provide its designated it person(s) with training to develop his or her skills further over time.

3. internal library support with technology staff. in this model, the library has at least one dedicated it staff person (part- or full-time) who is responsible for maintaining and planning the library's pat environment. the person may also have responsibilities for network maintenance and a range of technology-based services and resources. at the higher end of this approach are libraries with multiple it staff with differing responsibilities, such as networking, telecommunications, public-access computers, the ils, etc. libraries at this end of the spectrum tend to have a high degree of technology sophistication but may face other challenges (i.e., staffing shortages in key areas).

4. library consortia. over the years, public libraries have developed consortia for a range of services—shared ilss, resource sharing, resource licensing, and more. as public-library needs evolve, so too do the roles of library consortia. consortia increasingly provide training and technology-support services, and may be funded through membership fees, state aid, or other sources.

5. technology partners. while some libraries may rely on consortia for their technology support, others are seeking libraries that have more technology expertise, infrastructure, and abilities with whom to partner. this can be a fee-for-service arrangement that may involve sharing an ils, a maintenance agreement for network and public-access computer support, and a range of services. these arrangements allow the partner libraries to have some input into the technology planning and implementation processes without incurring the full expense of testing the technologies, having to implement them first, or hiring necessary staff (e.g., to manage the ils). the disadvantage to this model is that the smaller partner libraries are dependent on the technology decisions that the primary partner makes, including upgrade cycles, technology choices, migration time frames, etc.

6. city, county, or other agency it support. as city or county government agencies, some libraries receive technology support from the city or county it department (or in some cases the education department). this support ranges from a full slate of services and support available to the library to support only for the staff network and computers. even at the higher end of the support spectrum, librarians gave mixed reviews for the support received from it agencies. this was primarily because of competing philosophies regarding the pat environment, with public librarians wanting an open-access policy to allow users access to a range of information services and resources and it agency staff wanting to essentially lock down the public-access environment and thus severely limit the functionality of the public-access computers and network services (i.e., wireless). other limitations might include prescribed pat, specified vendors, and bidding requirements.

7. state library support.
one state library visited provides a high degree of service through its statewide approach to supporting public-access computing in the state’s public libraries. the state library has it staff in five locations throughout the state to provide support on a regional level but also has additional staff in the capital. these staff offer training, inhouse technical support, phone support, and can remote access the public-access computers in public libraries to troubleshoot, update, and perform other functions. moreover, this state built a statewide network through a statewide application to the federal e-rate program, thus providing broadband to all libraries. this model extends the availability of qualified technical support staff to all public libraries in the state—by phone as well as in person if need be. as a result, this enables public libraries to concentrate on service delivery to patrons. it is important to note that there are combinations of the above models in public libraries. for example, some libraries support their public-access networks and technology while the county or city it department supports the staff network and technology. it is clear, however, that there are a number of models for technology support in public libraries, and likely more than are presented in this article. the key issue is that public libraries are engaging in a broad spectrum of strategies to support, maintain, and manage their pat infrastructure. also of significance is that there are public libraries that have no technology-support services that provide pat services and resources. these libraries tend to serve populations of less than ten thousand, are rural, have fewer than five full-time equivalents (ftes), and are unlikely to be staffed by professional librarians. staff needs and pressures the study found a number of issues related to the effect of pat on library staff. this section of the findings discusses the primary factors affecting library staff as they work in the public-access context. n multiple skills needed not only is the pace of technological change increasing, but the change requires an ever-increasing array of skills because of the complexity of applications, technologies, and services. an example of such complexity is the library opac or ils. visited libraries indicated that such systems are becoming so complex and technologically sophisticated that there is a need for a full-time staff person to run and maintain the library ils. given the range of hardware, software, and networking infrastructure, as well as planning and pat management requirements, public librarians need a number of skills to successfully implement and maintain their pat environments. moreover, the skill needs depend on the librarian’s position—for example, an actual it staff person versus a reference librarian who does double duty by serving as the library’s it person. the skills required fall into technology, information literacy, service and facilities planning, management, and leadership and advocacy areas: n technology o general computer troubleshooting o basic maintenance, such as mouse and keyboard cleaning o basic computer repair, such as memory replacement, floppy drive replacement, disk defragmentation, etc. 
o basic networking, such as troubleshooting an “internet” issue versus a computer problem o telecommunications so as to understand the design and maintenance of broadband networks o integrated library systems o web design n information literacy o searching and using internet-based resources o searching and using library licensed resources o training patrons on the use of the publicaccess computers, general internet resources, and library resources o designing curriculum for various patron training courses n services and facilities planning o technology plan development and implementation (including budgeting) o telecommunications planning (including 88 information technology and libraries | june 2009 e-rate plan and application development) o building design so as to accommodate the requirements of public access technologies n management o license and contract negotiation for licensed resources, various public-access software and licenses, and maintenance agreements (service and repair agreements) o integration of pat into library operations o troubleshooting guidelines and process o policy development, such as acceptable use, filtering, filtering removal requests by patrons, etc. n leadership and advocacy o grant writing and partnership development so as to fund pat services and resources and extend out into the community that the library serves o advocacy so as to be able to demonstrate the value of pat in the library as a community good o leadership so as to build a community approach to public access with the library as one of the foundational institutions these items provide a broad cross section of the skills that public library staff may need to offer a robust pat environment. in the case of smaller, rural libraries, these requirements in general fall to the library director—along with all other duties of running the public library. in libraries that have separate technology, collections development, and other specialized staff, the skills and expertise may be dispersed throughout various areas in the library. n training public librarians receive a range of technology training— including none at all. in some cases, this might be a basic workshop on some aspect of technology at a state library association annual meeting or a regional workshop hosted by the library’s consortium. it could be an online course through webjunction (http://www.webjunction .org/). it could be a one-on-one session with a vendor representative or colleague. or it could be a formal, multiday class regarding the latest release of an ils. if available, public librarians have access to technology training that can take many forms, has a wide array of content (basic to expert), and can enhance staff knowledge about it with varying degrees of success. an issue raised by librarians was that having access to training and being able to take advantage of training are two separate things. regardless of the training delivery medium, librarians indicated that they were not always able to get release time to attend a training session. this was particularly the case for small, rural libraries that had less than five ftes spread out over several part-time individuals. for these staff to take advantage of training would require a substitute to cover public-service hours—or shut down the library. funding information technology as one might expect, there was a range of technology budgets in the public libraries visited or interviewed— from no technology budget to a substantial technology budget, and many points in between. 
some libraries had a dedicated it budget line item, others had only an operating budget out of which they might carve some funds for technology. libraries with dedicated it budgets by and large had at least one it staff person; libraries with no it budget largely relied on a staff person responsible for other library functions to manage their technology. in the smallest libraries, the library director served as the technology specialist in addition to being the general library operation manager. some libraries have established foundations through which they can raise funds for technology, among other library needs. many seek grants and thus devote substantial effort to seeking grant initiatives and writing grant proposals. some libraries held fundraisers and worked with their library friends groups to generate funds. other libraries engage in all of the above efforts to provide for their pat infrastructure, services, and resources. in short, there are several budgetary approaches public libraries use to support their pat environment. critical to note is that a number of libraries are increasingly relying on nonrecurring funds to support pats, a fact corroborated by the 2007 and 2008 public library internet surveys.16 the buildings when one visits public libraries, one is immediately struck by the diversity in design, functionality, and architecture of the buildings. public libraries often reflect the communities that they serve not only in the collection and service, but also in the facilities. this diversity serves the public library community well because it allows for a custom approach to libraries and their community. the building design, however, can also be a source of substantial challenge for public libraries. the increased integration of technology into library service places a range of stresses on buildings—physical space for workstations and other equipment and specialized furniture, power, server rooms, and cabling, for example. along with the library-based technology requirements come those of patrons—particularly the need for power so that public access technologies in public libraries | bertot 89 patrons may plug in their laptops or other devices. also important to note is that the building limitations also extend to staff and their access to computing and networked technologies. a number of librarians commented that they are “simply at capacity.” one librarian summed it up by stating that “there’s no more room at the inn. unless we start removing parts of our collection, we don’t have any more room for workstations.” another said that, “while we do have the space to add more computers, we don’t have enough power or outlets to support them. and, with our building, it’s not a simple thing to add.” in short, many libraries are reaching, or have reached, a saturation point as to just how much pat they can support. n discussion and implications over time, pat services have become essential services that public libraries provide their communities. 
with nearly all public libraries connected to the internet and offering public-access computers, the high percentage of libraries that offer internet-based services and resources, the overall usage of these resources by the public,17 and 73 percent of public libraries reporting that they are the only free provider of pat in their communities, it is clear that the provision of pat services is a key and critical service role that public libraries offer.18 it is also clear, however, that the extent to which public libraries can continue to absorb, update, and expand their pat depends on the resolution of a number of staffing, financial, maintenance and management, and building barriers. in a time of constrained budgets, it is unlikely that libraries will receive increased operational funding. indeed, reports of library funding cuts are increasing in the current economic downturn, which affects the ability of libraries to increase, or significantly update, staff—particularly in the areas of technology, licensing additional resources, procuring additional and new computers, and purchasing and offering expanded services such as digital photography, gaming, or social networking.19 moreover, the same financial constraints can affect the ability of libraries to raise capital funds for building improvements and new construction. funding also has an effect on the training that public libraries can offer or develop for their staff. and training is becoming increasingly important to the success of pat services and resources in public libraries—but not just training regarding the latest technologies. rather, there is a need for training that provides instruction on the relationship between the level of pat services and resources a library can or desires to provide and advocacy; broadband, computing, and other needs; technology planning and management; collaboration and partnering; and leadership. the public library pat environment is complex, encompasses a number of technologies, and has ties to many community services and resources. training programs need to reflect this complexity. the continued provision of pat services in public libraries is increasingly burdensome on the public library community, and the pressures to expand their pat services and resources continues to grow—particularly as libraries report their “sole provider” of free pat status in their communities. the successful libraries in terms of pat services and resources visited had staff that could n understand pat (both in terms of functionality and potential); n think creatively across the technology and library service spectrum; n integrate online content, pat, and library services; n articulate the value of pat as an essential community need and public library service; n articulate the role of the perception of the library by its community as a critical bridge to online content; n demonstrate leadership within the community and library; n form partnerships and extend pat services and resources into the community; and n raise funds and develop other support mechanisms to enhance pat services and resources in the library and throughout the community. in short, successful pat in libraries was being redefined in the context of communitywide pat service and resource provision. this approach not only can lead to a more robust community pat infrastructure, but it also lessens the library’s burden of pat service and resource provision. 
but equally important to note is that the extent to which all public libraries can engage in these activities on their own is unclear. indeed, several libraries visited were struggling to maintain basic pat service levels and indicated that increasing pat services came at the expense of other library services. "we're trying to meet demand," one librarian said, "but we have too few computers, too slow a connection, and staff don't always know what to do when things go wrong or someone comes in talking about the latest technology or website." for some libraries, therefore, quality pat services that meet community needs are simply out of reach. thus another implication and finding of the study is the need for libraries to explore other models of support for their pat environments—for example, using the services of a regional cooperative, if available; if none is available, libraries could form their own cooperative for resource sharing, technology support, and other aspects of pat service provision. the same approach could be taken within a city or county to enhance technology support throughout a region. another approach would be to outsource a library's pat support and maintenance to a nearby library with support staff in a fee-for-service arrangement. there are a number of approaches that libraries could take to support their pat infrastructure. a key point is that libraries need to consider pat service provision in a broader community, regional, or state context, and the study found some libraries doing so. the need to equip staff with the skills required to truly support pat was a recurring theme throughout the site visits. approaches and access to training varied. for example, some state libraries provided—either directly or through the hiring of consultants and instructors—a number of technology-related courses taught in regional locations. an example of this approach is california's infopeople project (http://www.infopeople.org/). some state libraries subscribed to webjunction (http://www.webjunction.org/), which provides access to online instructional content. online manuals provided by compumentor through a grant funded by the bill and melinda gates foundation aimed at helping rural libraries support their pat (www.maintainitproject.org) are another resource. beyond technology skills training, however, is the need for technology planning, effective communication, leadership, value demonstration, and advocacy. the extent to which leadership, advocacy, and library marketing, for example, can be taught remains a question. all of these issues take place against the backdrop of an economic downturn and budgetary constraints. increased operating costs created through inflation and higher energy costs place substantial pressures on public libraries simply to maintain current levels of service—much less engage in the additional levels of service that the pat environment brings. indeed, as the 2008 public library funding and technology access study demonstrated, public libraries are increasingly funding their technology-based services through nonrecurring funds such as fines and fundraising activities.20 thus, the ability of public libraries to provide robust pat services and resources is increasingly limited unless such service provision comes at the expense of other library services. alone, the financial pressures place a high burden on public libraries.
combined with the building, staffing, skills, and other constraints reported by public libraries, however, the emerging picture for library pat services and resources is one of significant challenge. ■■ three key areas for additional exploration the findings from the study point to the need for additional research and exploration of three key service areas and issues related to pat support and services: 1. develop a better understanding of success in the pat environment. this study and the 2006 study by bertot et al. point to what is required for libraries to be successful in a networked environment.21 in fact, the 2007 public libraries and the internet report contained a section entitled "the successfully networked public library," which offered a range of checklists for public libraries (and others) to consider as they planned and implemented their networked services.22 this study identified additional success factors and considerations focused specifically on the public access technology environment. together, these efforts point to the need to better understand and articulate the critical success factors necessary for public libraries to plan, implement, and update their pat given current service contexts. this is particularly necessary in the context of meeting user expectations and needs regarding networked technologies and services. 2. further identify technology-support models. this study uncovered a number of different technology-support models implemented by public libraries. undoubtedly there are additional models that require identification. but, more importantly, there is a need to further explore how each technology-support model assists libraries, under what circumstances, and in what ways. some models may be more or less appropriate on the basis of the service context of the library—and that is not clearly understood at this time. 3. levels of service capabilities. an underlying theme throughout this research, and one that is increasingly supported by the public libraries and the internet studies, is that the pat service context is essentially a continuum from low service and capability to high service and capability. there are a number of factors contributing to where libraries may lie on the success continuum—funding, management, leadership, attitude, skills, community support, and innovation, to name a few. this continuum requires additional research, and the research implications could be profound. emerging data indicate that there are public libraries that will be unable to continue to evolve and meet the increased demands of the networked environment, both in terms of staff and infrastructure. public libraries will have to make choices regarding the provision of pat services and resources in light of their ability to provide high-quality services (as defined by their service communities). for better or worse, the technology environment continually evolves and requires new technologies, management, and support. that is, and will continue to be, the nature of public access to the internet. though there are likely other issues worthy of exploration, these three are critical to further our understanding of the pat environment and public library roles and issues associated with the provision of public access. ■■ conclusion the pat environment in which public libraries operate is increasingly complex and continues to grow in funding, maintenance and management, staffing, and building demands.
public libraries have navigated this environment successfully for more than fifteen years; however, stresses are now evident. libraries rose quickly to the challenge of providing public-access services to the communities that they serve. the challenges libraries face are not necessarily insurmountable, and there is a range of tools designed to help public libraries plan and manage their public-access services. these tools, however, place the burden of public access, or assume that the burden of public access is placed, on the public library. given increased operating costs because of inflation, the continual need to innovate and upgrade technologies, staff technology skills requirements, and other factors discussed in this article, libraries may not be in a position to shoulder the burden of public access alone. thus there is a need to reconsider the extent to which pat provision is the sole responsibility of the library; perhaps there is a need to integrate and expand public access throughout a community. such an approach can benefit a community through an integrated and broader access strategy, but it can also relieve the pressure on the public library as the sole provider of public access. ■■ acknowledgement this research was made possible in part through the support of the maintainit project (http://www.maintainitproject.org/), an effort of the nonprofit techsoup web resource (http://www.techsoup.org/). references 1. charles r. mcclure, john carlo bertot, and douglas l. zweizig, public libraries and the internet: study results, policy issues, and recommendations (washington, d.c.: national commission on libraries and information science, 1994). 2. john carlo bertot and charles r. mcclure, moving toward more effective public internet access: the 1998 national survey of public library outlet internet connectivity (washington, d.c.: national commission on libraries and information science, 1998), http://www.liicenter.org/reports/1998_plinternet_study.pdf (accessed apr. 22, 2009). 3. charles r. mcclure, john carlo bertot, and john c. beachboard, internet costs and cost models for public libraries (washington, d.c.: national commission on libraries and information science, 1995). 4. charles r. mcclure, john carlo bertot, and douglas l. zweizig, public libraries and the internet: study results, policy issues, and recommendations (washington, d.c.: national commission on libraries and information science, 1994); john carlo bertot, charles r. mcclure, paul t. jaeger, and joe ryan, public libraries and the internet 2006: study results and findings (tallahassee, fla.: information institute, 2006), http://www.ii.fsu.edu/projectfiles/plinternet/2006/2006_plinternet.pdf (accessed mar. 5, 2009). 5. john carlo bertot, charles r. mcclure, carla b. wright, elise jensen, and susan thomas, public libraries and the internet 2007: study results and findings (tallahassee, fla.: information institute, 2008), http://www.ii.fsu.edu/projectfiles/plinternet/2007/2007_plinternet.pdf (accessed sept. 10, 2008). 6. charles r. mcclure and paul t. jaeger, public libraries and internet service roles: measuring and maximizing internet services (chicago: ala, 2008). 7. george d'elia, june abbas, kay bishop, donald jacobs, and eleanor jo rodger, "the impact of youth's use of the internet on the use of the public library," journal of the american society for information science & technology 58, no.
14 (2007): 2180–96; george d’elia, corinne jorgensen, joseph woelfel, and eleanor jo rodger, “the impact of the internet on public library use: an analysis of the current consumer market for library and internet services,” journal of the american society for information science & technology 53, no. 10 (2002): 802–20. 8. national center for education statistics (nces), public libraries in the united states: fiscal year 2005 [nces 2008301] (washington, d.c.: national center for education statistics, 2007); pew american and internet life, “internet activities,” http:// www.pewinternet.org/trends/internet_activities_2.15.08.htm (accessed mar. 5, 2009). 9. bertot et al., public libraries and the internet 2007. 10. ibid. 11. cheryl bryan, managing facilities for results: optimizing space for services (chicago: public library association, 2007); joseph matthews, strategic planning and management for library managers (westport, conn.: libraries unlimited, 2005); joseph matthews, technology planning: preparing and updating a library technology plan (westport, conn.: libraries unlimited, 2004); diane mayo and jeanne goodrich, staffing for results: a guide to working smarter (chicago: public library association, 2002). 12. ala, libraries connect communities: public library funding & technology access study (chicago: ala, 2008), http:// www.ala.org/ala/aboutala/offices/ors/plftas/0708report.cfm (accessed mar. 5, 2008). 13. charles p. smith, ed., motivation and personality: handbook of thematic content analysis (new york: cambridge univ. 92 information technology and libraries | june 2009 pr., 1992); klaus krippendorf, content analysis: an introduction to its methodology (beverly hills, calif.: sage, 1980). 14. ala, libraries connect communities. 15. bertot et al., public libraries and the internet 2006; bertot et al., public libraries and the internet 2007. 16. ibid. 17. nces, public libraries in the united states. 18. bertot et al., public libraries and the internet 2007. 19. american libraries, “branch closings and budget cuts threaten libraries nationwide,” nov. 7, 2008, http://www .ala.org/ala/alonline/currentnews/ newsarchive/2008/november2008/ branchesthreatened.cfm (accessed nov. 17, 2008). 20. ala, libraries connect communities. 21. bertot et al., public libraries and the internet 2006. 22. bertot et al., public libraries and the internet 2007. 44 information technology and libraries | march 2011 jennifer emanuel usability of the vufind next-generation online catalog vufind incorporates many of the interactive web and social media technologies that the public uses online, including features from online booksellers and commercial search engines. the vufind search page is simple, containing only a single search box and a dropdown menu that gives users the option to search all fields or to search by title, author, subject, or isbn/issn (see figure 1). to combine searches using boolean logic or to limit to a particular language or format, the user must use the advanced search feature (see figure 2). the recordresults page displays results vertically, with each result containing basic item information, such as title, author, call number, location, item availability, and a graphical icon displaying the material’s format. the results page also has a column on the right side displaying “facets,” which are links that allow a user to refine their search and browse results using catalog data contained within the result set (see figure 3). 
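the facet column described above is produced by the search index rather than by the underlying opac: vufind loads marc records into an apache solr index and asks solr to return facet counts alongside each result set. the following is a minimal sketch of that kind of request, not vufind's actual code; the host, port, core name, and field names (format, author, publishDate) are placeholder assumptions that vary by installation.

```python
import requests  # third-party http client: pip install requests

# placeholder endpoint; the solr host, port, and core name are assumptions
SOLR_SELECT = "http://localhost:8983/solr/biblio/select"

params = {
    "q": "climate change",                                # the user's "all fields" query
    "rows": 20,                                           # one page of results
    "wt": "json",                                         # ask for a json response
    "facet": "true",                                      # turn faceting on
    "facet.field": ["format", "author", "publishDate"],   # assumed index field names
    "facet.mincount": 1,                                  # hide facet values with zero hits
}

data = requests.get(SOLR_SELECT, params=params).json()

# solr returns each facet field as a flat [value, count, value, count, ...] list;
# a discovery layer renders these pairs as the clickable links in the facet column
for field, flat in data["facet_counts"]["facet_fields"].items():
    pairs = list(zip(flat[::2], flat[1::2]))
    print(field, pairs[:5])
```

because the counts are computed over the result set itself, narrowing by a facet is simply the same query repeated with an added filter (fq) parameter.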
vufind also contains a variety of web 2.0 features, such as the ability to tag items, create a list of favorite items, leave comments about an item, cite an item, and links to google book previews and extensive author biographies data mined from the internet. corresponding to the beginning of the vufind trial at uiuc, the university library purchased reviews, synopses, and cover images from syndetic solutions to further enhance both vufind and the existing webvoyage catalog. an additional appealing aspect of vufind was its speed; the carli installation of webvoyage is slow to load and is prone to time out while conducting searches. the uiuc library first provided vufind (http:// www.library.illinois.edu/vufind) at the beginning of the 2008 fall semester and expected it to be trialed through the end of the spring semester 2009. use statistics show that throughout the fall semester (september through december), there were approximately six thousand unique visitors each month, producing a total of more than thirty-eight thousand visits. spring statistics show use averaging more than ten thousand visitors a month, an increase most likely from word-of-mouth. librarians at both uiuc and carli were interested in what users thought about vufind, especially in relation to the usability of the interface. with this in mind, the library launched several forms of assessment during the spring semester. the first was a quantitative survey based on yale’s vufind usability testing.3 the second was a more extensive qualitative usability test that had users conducting sample searches in the interface and telling the facilitator their opinions. this article will discuss the hands-on usability portion of this study. survey responses that support the results presented herein will be reported in a separate venue. while this article only discusses vufind at a single institution, it does offer a generalized view of next-generation catalogs and how library users use such a catalog compared to a traditional online catalog. the vufind open–source, next-generation catalog system was implemented by the consortium of academic and research libraries in illinois as an alternative to the webvoyage opac system. the university of illinois at urbana-champaign began offering vufind alongside webvoyage in 2009 as an experiment in next generation catalogs. using a faceted search discovery interface, it offered numerous improvements to the uiuc catalog and focused on limiting results after searching rather than limiting searches up front. library users have praised vufind for its web 2.0 feel and features. however, there are issues, particularly with catalog data. v ufind is an open–source, next-generation catalog overlay system developed by villanova university library that was released to the public as beta in 2007 and version 1.0 in 2008.1 as of july 2009, four institutions implemented vufind as a primary catalog interface, and many more are either beta or internally testing it.2 more information about vufind, including the technical requirements and compatible opacs, is available on the project website (http://www.vufind.org). in illinois, the state consortium of academic and research libraries in illinois (carli) released a beta installation of vufind in 2008 on top of its webvoyage catalog database. the carli installation of vufind is a base installation with minor customizations to the carli catalog environment. 
some libraries in illinois utilize vufind as an alternative to their online catalog, including the university of illinois at urbana-champaign (uiuc), which currently advertises vufind as a more user friendly and faster version of the library catalog. as a part of the evaluation of nextgeneration catalog systems, uiuc decided to conduct hands-on usability testing during the spring of 2009. the carli catalog environment is very complex and comprises 153 member libraries throughout illinois, ranging from tiny academic libraries to the very large uiuc library. currently, 76 libraries use a centrally managed webvoyage system referred to as i-share. i-share is composed of a union catalog containing holdings of all 76 libraries as well as individual institution catalogs. library users heavily use the union catalog because of a strong culture of sharing materials between member institutions. carli’s vufind installation uses the records of the entire union catalog, but has library-specific views. each of these views is unique to the member library, but each library uses the same interface to view records throughout i-share. jennifer emanuel (emanuelj@illinois.edu) is digital services and reference librarian, university of illinois at urbana-champaign. usability of the vufind next-generation online catalog | emanuel 45 not simply find them.6 as a result, the past five years have been filled with commercial opac providers releasing next-generation library interfaces that overlay existing library catalog information and require an up-front investment by libraries to improve search capabilities. as these systems are inherently commercial and require a significant investment of capital, several open–source, next-generation catalog projects have emerged, such as vufind, blacklight, scriblio, and the extensible catalog project.7 these interfaces are often developed at one institution with their users in mind and then modified and adapted by other institutions to meet local needs. however, because they can be locally customized, libraries with significant technical expertise can have a unique interface that commercial vendors cannot compete against. one cannot discuss next-generation catalogs without mentioning the metadata that underlie opac systems. some librarians view the interface as only part of the problem of library catalogs and point to cataloging and metadata practices as the larger underlying problem. many librarians view traditional cataloging using machine-readable cataloging (marc), which has been used since the 1960s, as outdated because it was developed with nearly fifty-year-old technology in mind.8 however, because marc is so common and allows cataloging with a fine degree of granularity, current opac systems still utilize it. librarians have developed additional cataloging standards, such as dublin core (dc), metadata object description schema (mods), and functional requirements for bibliographic records (frbr), but none of these have achieved widespread adoption for cataloging printed materials. newly developed catalog projects, such as extensible catalog, are beginning to integrate these new metadata schemas, but currently others continue to use marc.9 many librarians also advocate to integrate folksonomy, or user tagging, into library catalogs. 
folksonomy is used by many library websites, most notably flickr, delicious, and librarything, each of which store user-submitted content that istagged with self-selected keywords that allow for easy retrieval and discovery.10 vufind integrates tagging into individual item records ■■ literature review librarians have complained about the usability of online catalogs since they were first created.4 when amazon.com became the go-to site for books and book information in the early 2000s, librarians and their users began to harshly criticize both opac interfaces and metadata standards.5 ever since north carolina state university announced a partnership with the commercial-search corporation endeca in 2006, librarians have been interested in the next generation of library catalogs and more broadly, discovery systems designed to help users discover library materials, figure 1. vufind default search figure 2. vufind advanced search figure 3. facets in vufind 46 information technology and libraries | march 2011 searching the library’s online catalog and were eager to see changes made to it. the test used was developed from a statewide usability test of different catalog interfaces usedin illinois. the test was adapted using the same sample searches, but was customized to the features and uses of vufind (see appendix). the vufind test was similar to the original test to allow a comparison of other catalog interfaces to vufind for internal evaluation purposes. i designed the test to allow subjects to perform a progressively complicated series of sample searches using the catalog while the moderator pointed out various features of the catalog interface. subjects were also asked what they thought about the search result sets and their opinions of the interface and navigation; they also were asked to perform specific tasks using vufind. the tasks were common library-catalog tasks using topics familiar at undergraduate–level students. the tasks ranged from a keyword search for “global warming” to a more complicated search for a specific compact disc by the artist prince. the tasks also included using the features associated with creating and using an account with vufind, such as adding tags and creating a favorite items list. through completing the test, subjects got an overview of vufind and were then asked to draw conclusions about their experience and compare it to other library catalogs they have used. the tests were performed in a small meeting room with one workstation set up with an install of the morae software, a microphone, and a web camera. morae is a very powerful software program developed by techsmith that records the screen on which the user is interacting with an interface, as well as environmental audio and video. although the study did not utilize all the features of the morae software, it was invaluable to the researcher to be able to review the entire testing experience with the same detail as when the test actually occurred in person. the study was carried out with the researcher sitting next to the workstation asking subjects to perform a task from the script while morae recorded all of their actions. once all fifteen subjects completed the test, the researcher watched the resulting videos and coded the answers into various themes on the basis of both broad subject categories and individual question answers. 
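as a note on the sample size chosen in the methods above, nielsen's five-user figure comes from a cumulative-discovery model in which each tester independently surfaces a given problem with probability lambda, estimated at about 31 percent on average; the share found by n testers is then 1 - (1 - lambda)^n. a quick check of the figures the methods section relies on:

```python
# nielsen/landauer model: share of usability problems found by n testers,
# assuming each tester hits a given problem with probability lam (about 0.31)
lam = 0.31

for n in (1, 5, 15):
    found = 1 - (1 - lam) ** n
    print(f"{n:>2} testers find about {found:.1%} of problems")
# roughly 31.0% for one tester, 84.4% for five, and 99.6% for fifteen,
# in line with the 85 percent and "all problems" figures cited above
```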
the researcher then gathered the codes into categories and used them to further analyze and gain insight into both the useful features of and problems with the vufind interface. ■■ analysis participants generally liked vufind and preferred it to the current webvoyage system. when asked to choose which catalog they would rather use, only one person, a faculty member, stated he would still use webvoyage. this faculty but does not pull tags from other sources; rather, users must tag items individually. additionally, next-generation catalogs offer a search mechanism that focuses on discovery rather than simply searching for library materials. users, accustomed to new ways of searching both on the internet and through commercial library indexing and abstracting databases, now search in a fundamentally different style than they did when opacs first became a part of library services. the online catalog is now just one of many tools that library users use to locate information and now covers fewer resources than it did ten to fifteen years ago. library users are now accustomed to using a single search box, such as with google; they also use nonlibrary online tools to find information about books and no longer view library catalogs as the primary place to look for books.11 as users are no longer accustomed to using the controlled language and particular searching methods of library catalogs because they have moved to discovering materials online, libraries must adapt to new way of obtaining information and focus not on teaching users how to locate library materials, but give them the tools to discover on their own.12 vufind is one option among many in the genre of next-generation or discovery-catalog tools. ■■ methods the study employed fifteen subjects who participated in individual, hands-on usability test sessions lasting an average of thirty minutes. i recruited volunteers though several methods, including posting to a university faculty and staff e-mail discussion list, an e-mail discussion lists aimed toward graduate students, and flyers in the undergraduate library. all means of recruitment stated that the library sought volunteer subjects to perform a variety of sample searches in a possible new library catalog interface. i also informed subjects that there was a gift card as a thank you for their time. all subjects had to sign a human subjects statement of informed consent approved by the university of illinois institutional review board. i sought a diverse sample, and therefore accepted the first five volunteers from the following pools: faculty and staff, graduate students, and undergraduate students. i felt that these three user groups were distinct enough to warrant having separate pools. the number of five users in each group was chosen because of jakob nielsen’s statement that five users will find 85 percent of usability problems and that fifteen users will discover all usability problems.13 although i did not specifically aim to recruit a diverse sample, the sample showed a large diversity in areas including age, library experience, and academic discipline. all subjects stated they had some experience usability of the vufind next-generation online catalog | emanuel 47 though there were questions as to how results were deemed relevant to the search statement as well as how they were ranked. participants were then asked to look at the right sidebar of the results page, which contains the facets. 
most users did not understand the term “facets,” with faculty and staff understanding the term more than graduate and undergraduate students did. one faculty member who understood the term facet noted that “facets are like a diamond with different sides or ways of viewing something.” however, when asked what term would be better to call the limiting options other than facet, several users suggested either calling the facets “categories” or renaming the column “refine search,” “narrow search,” or “sort your search.” participants were then asked to find how to see results for other i-share libraries. only two faculty members found i-share results quickly, and just half of the remaining participants were able to find the option at all. when asked what would make that option easier to find, most said they liked the wording, but the option needed to stand out more, perhaps with a different colored link or bolder type. two users thought having the location integrated as a facet would be the most useful way of seeing it. participants, however, quickly took to using the facets, as they were asked to use the climate change search results to find an electronic book published in 2008. no user had problems with this task, and several remarked that using facets was a lot easier than limiting to format and year before searching. the next task for participants was to open and examine a single record within their original climate change results (see figures 4 and 5). participants liked the layout, including the cover image with some brief title information, and a tabbed bar below showing additional information, such as more detailed description, holdings information, a table of contents, reviews, comments, and a link to request the item. several users remarked that they liked having information contained under tabs, but vufind organized each tab as a new webpage that made going back to previous tabs or the results page cumbersome. the only problem users had with the information contained within the tabs was the “staff view,” which contained the marc record information. most users looked at the marc record with confusion, including one graduate student who said, “if the staff view is of no use to the user, why even have it there?” one other useful feature that individual records in vufind contain is a link to an overlay window containing the full citation information for the item in both apa and mla formats. users were able to find this “cite this” link and liked having that information available. however, several participants noted that citation information would be much more beneficial if it could be easily exported to refworks or other bibliographic software. the next several searches used progressively higher-level member thought most of his searches were too advanced for the vufind interface and needed options that vufind did not have, such as limiting a search to an individual library or call number searching. this user did, however, specify that vufind would be easier to use for a fast and simple search. other users all responded very favorably to vufind, liking it better than any other online catalog they have used, with most stating that they wanted it as a permanent addition to the library. the most common responses to vufind were that the layout is easier on the eyes and displayed data much better than the webvoyage catalog; there were no comments about actual search results. 
several users stated that it was nice to be able to do a broad search and then have all limiting options presented to them as facets, allowing users to both limit after searching and letting them browse through a large number of search results. one user, an undergraduate student, stated she liked vufind because it “was new” and she always wants to try out new things on the internet. the first section of the usability test asked users to examine both the basic and advanced search options. users easily recognized how the interface functioned and liked having a single search box as the basic interface, noting that it looked more like a web search engine. they also recognized all of the dropdown menu options and agreed that the options included what they most often searched. however, four users wanted a keyword search. even though there is not a keyword search in webvoyage and there is an “all fields” menu option, participants seemed to think of the one box search universally as a keyword search and wanted that to be the default search option. one participant, an international graduate student, remarked that keyword is more understood by international students than the “all fields” search because, internationally, a field is not a search field but a scholarly field such as education or engineering. in the advanced search, all users thought the search options were clear and liked having icons to depict the various media formats. however, two users did remark that it would be useful to be able to limit by year on the advanced search page. the advanced search also is where the user can select one of seven languages, all of which are considered western languages, including latin and russian. two users, both international graduate students, stated that more languages would be beneficial, especially asian and more slavic languages. the university of illinois has separate libraries for asian and slavic materials, and these two participants said it would be useful to have search options that include the languages served by the libraries. the first task that participants were asked to do was an “all fields” search for “climate change.” they were instructed to look at the results page and an individual record to give feedback as to how they liked the layout and what they thought of the search results. upon looking at the results, all participants thought they were relevant, 48 information technology and libraries | march 2011 to items in which james joyce is the author, no participant had any problems, though several pointed out that there were three facets using his name—joyce, james; joyce, james avery; and joyce, j. a.—because of inconsistencies in cataloging (see figure 6). participants were next asked to search for an audio recording by the artist prince using the basic (single) search box. most participants did an “all fields” search for prince and attempted to use the facets to limit by a particular format. all but one was confident that they achieved the proper result, but there was confusion about the format. some participants were confused as to what format an audio recording was because the corresponding facet was for a music recording. a couple of users thought “audio recording” could be a spoken-word recording. most participants preferred that the format facets be more concrete toward a single actual physical format, such as a record, cassette, or a compact disc (see figure 7). 
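the author and format problems described above are, at bottom, indexing decisions: whatever marc fields the import rules read become the values shown on the results page and offered as facets. as a rough illustration of the fallback behavior participants expected for the author column, and not vufind's actual import code, here is a short pymarc sketch that prefers the 100 main entry and falls back to a 700 added entry; the input file name is a placeholder.

```python
from pymarc import MARCReader  # third-party marc library: pip install pymarc

def display_author(record):
    """prefer the 100 (main entry personal name) and fall back to the first
    700 (added entry) so that edited volumes still show a name on the
    results page instead of a blank author column."""
    for tag in ("100", "700"):
        fields = record.get_fields(tag)
        if fields:
            names = fields[0].get_subfields("a")  # subfield $a carries the name
            if names:
                return names[0]
    return None  # no personal-name entry at all

# "export.mrc" is a hypothetical file of exported bibliographic records
with open("export.mrc", "rb") as fh:
    for record in MARCReader(fh):
        if record is None:  # skip records the reader could not parse
            continue
        print(display_author(record))
```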
physical formats appeared to resonate more with users than the broad cataloging term of “music recording.” a more specific format type (i.e., compact disc) is contained in the call number and should be straightforward to pull out as a facet. it appears vufind pulls the format information from marc field 245 subfield $h for medium rather than the call number (which at illinois can specify the format) or the 300 physical description field or another field such as a notes field that some institutions may use to specify the exact format. however, when participants were asked to further use facets to find prince’s first album, 1978’s for you, limitations with vufind became more apparent. each participant used a different method to search for this album, and none actually found the item either locally or in i-share, though the item has multiple copies available in both locations. most participants tried initially limiting by date because they were given that information. however, vufind’s facets focus on eras rather than specific years, which participants stated was frustrating as many items can fall under a broad era. also, the era facets brought up many more eras than one would consider an audio research skills and showed problems with both vufind and the catalog record data. the first search asked participants to do an “all fields” search for james joyce. all were able to complete the search, but there was notable confusion as to which records were written by james joyce and which were items about him. about half of the first-page results for this search did not list an author on the results page. vufind appears to pull the author field on the results page from the 100 field in the marc record, so if the 700 field is used instead for an editor, this information is not displayed on the results page. individual records do substitute the 700 field if the 100 field is not present, but this should also be the case on the initial results screen as well. several users thought it was strange that the results page often did not list the author, but an author was listed in the individual record. additionally, when asked to use the facets to limit figure 4. results set figure 5. record display figure 6. author facet figure 7. format facet usability of the vufind next-generation online catalog | emanuel 49 about both the reviews and comments that could be seen in the various records participants were asked to examine. many of the participants wanted more information as to where the reviews came from because this information was not clear. they also wanted to know whether the reviews or comments from catalog users had any type of moderation by a librarian. for the most part, participants liked having reviews inside the catalog records, but they liked having a summary even more. several users, all graduate students, expressed concern about the objectiveness of having reviews in the catalog, especially because it was not clear who did the review and feared that reviews may interject some bias that had no place in a library catalog record. one of these participants stated, “if i wanted reviews, i would just go to amazon. i don’t expect reviews, which can be subjective, to be in a library catalog—that is too commercial.” several undergraduate participants stated that reviews helped them decide whether the book was something that would be useful to them. the final task of the usability test asked participants to create an account with vufind because it is not connected to our user database. 
most users had no problems finishing this task, though they found some problems with the interface. first, it was not clear that users had to create an account and could not log in with their library number as they did in the library’s opac. second, the default field asks users for their barcode, which is not a term used at uiuc (users are assigned a library number). once logged in, participants were satisfied with the menu options and how their account information was displayed. finally, participants were asked, while logged in, to search for a favorite book and add it to their favorites list. all users liked the favorites-list feature, and many already knew of ways they could use it, but several wished they could create multiple lists and have the ability to arrange lists in folders. ■■ discussion participants thought favorably of the vufind interface and would use it again. they liked the layout of information much more than the current webvoyage interface and thought it was much easier to look at. they also had many comments that the color scheme (yellow and grey) was easier than the blues of the primary library opac. vufind also had more visual elements, such as cover images and icons representing format types that participants also commented on favorably. when asked to compare vufind to both the webvoyage catalog and amazon, only one participant indicated a preference for amazon, while the rest preferred vufind. the user who specified amazon, a faculty member, stated that that was where he always started searching for books; he would then search for specific titles in the recording, such as the 15th century. granted, the 15th century probably brings up music that originated in that era, not recorded then, but participants wanted the date to correspond to when an item was initially published or released. it appears that vufind pulls the era facet information from the subject headings and ignores the copyright or issue year. to users, the era facets are not useful for most of their search needs; users would rather limit by copyright or the original date of issue. another search that further highlighted problems searching for multimedia in vufind is the title search participants did for gone with the wind. everyone thought this search brought up relevant results, but when asked to determine whether the uiuc library had a copy of the dvd, many users expressed confusion. once again, the confusion was based on the inability to limit to a specific format. participants could use the facets to limit to a film or video, but not to a specific format. several participants stated that they needed specific formats because when they are doing a comparable search, they only want to find dvds. however, because all film formats are linked together under “film/video,” they must to go into individual records and examine the call number to determine the exact format. most participants stated clearly that “dvd” needed to be it’s own format facet and that entering a record to find the format required too much effort. participants also expressed frustration that the call number was the only place to determine specific format and believed that this information should be contained in the brief item information and not buried in the tabbed areas. the frustrations with the lack of specific formats also were evident when participants were asked to do an advanced search for a dvd on public speaking. 
all users initially thought the advanced search limiter for film/video was sufficient when they first looked at the advanced search options. however, when presented with an actual search (“public speaking”), they found that there should be more options and specific format choices up-front within the advanced search. another search that participants conducted was an author search for jack london. they then used the facets to find the book white fang. this search was chosen because the resulting records are mostly for older materials that often do not contain a lot of the additional information that newer records contain. participants looked at a specific record and then were asked what they thought of the information that was displayed. most answered that they would like as much information as you can give them, but were accepting of missing information. several participants stated that most people already know this book and thus did not need additional information. however, when pressed as to what information they would like added to the record, several users stated a summary would be the most useful. additionally, several users asked for more information 50 information technology and libraries | march 2011 the simplicity of the favorites listing feature, the difficulty of linking to other i-share library holdings, and the difficulties in using the facet categories. ■■ implications i intend to continue to perform similar usability tests on next-generation catalogs on a trial basis to examine one aspect regarding the future of online catalogs at uiuc. uiuc is looking at various catalog interfaces, of which vufind is one option, to see which best meets the needs of our users. users stated multiple times during testing that they find the current webvoyage interface to be very frustrating and will accept nearly anything that is an improvement, even if the new interface has some usability issues. vufind is not perfect for all searches, as shown by a lack of a call number search and the limitations in searching for multimedia options, but it does provide a more intuitive interface for most patrons. the future of vufind at uiuc is still open. development is currently stalled because of a lack of developer updates and internal staffing constraints both at uiuc and carli. however, because vufind is open–source, and the only ongoing cost is that of server maintenance, both carli and the library are continuing to display it as an option for searching the catalog. both carli and uiuc are closely examining other options for catalog interfaces that would provide patrons with a better search experience, but they have taken no further action to permanently adapt either vufind or to demo other options. despite its limitations, vufind is still a viable option for libraries with substantial technology expertise that are interested in a next-generation catalog interface at a low price. although it does have limitations, it has a better out-of-the-box interface than traditional opacs and should be considered alongside commercial options for any library thinking of adapting a catalog interface overlay. this usability test focused on one institution’s installation of vufind, which may or may not apply to other installations and other institutional needs. it would be interesting to study an installation of vufind at a smaller, nonresearch institution, where users have different searching needs and expectations related to a library’s opac. references 1. 
john houser, “the vufind implementation at villanova university,” library hi tech 27, no. 1 (2009): 96–105. 2. vufind, “vufind: about,” http://www.vufind.org/about .php (accessed sept. 10 2009). 3. kathleen bauer, “yale university vufind test— undergraduates,” http://www.library.yale.edu/libepub/ usability/studies/summary_undergraduate.doc (accessed mar. 20, 2010). library catalog to check availability. other participants who made comments about amazon stated that it was commercial and more about marketing materials, while the library catalog just provided the basic information needed to evaluate materials without attempting to sell them to you. several participants also stated they checked amazon for book information, but generally did not like it because of its commercial nature; because vufind provides much of the same information as amazon, they will use vufind first in the future. participants also thought amazon was for a popular and not scholarly audience, making it not useful for academic purposes. most users did not have much to say about the webvoyage opac, except it was overwhelming, had too many words on the result screen, and was not pleasantly visual. participants were also asked to look at vufind, amazon, and webvoyage from a visual preference. again, participants believed that vufind had the best layout. they liked that vufind had a very clean and uncluttered interface and that the colors were few and easy on the eye. they also commented about the visuals contained (cover art and icons) in the records and the vertical orientation of vufind (webvoyage has a horizontal orientation) to display records. they also liked how the facets were displayed, though two users thought they would be better situated on the left side of the results because they scan websites from the left to the right. the one thing that was mentioned several times was vufind’s lack of the star rating system that amazon uses to quickly rate an item. participants thought such a system might be better than reviews because it allows users to quickly scan through the item and not have to read through multiple reviews. when asked to rate the ease of use for vufind, with 1 being easy and 5 being difficult, participants rated it an average of 1.92. faculty rated the ease at 1.6, graduate students at 1.75, and undergraduates at 2.8. undergraduates were more likely to get frustrated at media searching and thought that some of the facets related to media items were confusing, which they used to explain their lower scores. however, when asked if they would rather use vufind over the current library catalog (webvoyage), all but one participant enthusiastically stated they would use vufind. most users stated that although vufind was not perfect, it was still much better than the other library catalog because of the better layout, visuals, and ability to limit results. the only user that specified they would still rather use the webvoyage catalog believed it had more options for advanced search, such as call number searching, which vufind lacked. there are, however, several changes that could make vufind more useful to our users that came out of usability testing. some of these are easy to implement on a local level, and others would improve the base build of vufind. a number of issues arose from usability testing, but the largest issues are the lack of refworks integration, usability of the vufind next-generation online catalog | emanuel 51 9. 
jennifer bowen, “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1,” information technology & libraries 27, no. 2 (2008): 6–19. 10. tom steele, “the new cooperative cataloging,” library hi tech 27, no. 1 (2009): 68–77. 11. ian rowlands and david nicholas, “understanding information behaviour: how do students and faculty find books?” journal of academic librarianship 34, no. 1 (2008): 3–15. 12. ja mi and cathy weng, “revitalizing the library opac: interface, searching, and display challengers,” information technology & libraries 27, no. 1 (2008): 5–22. 13. jakob nielsen, “why you only need to test with 5 users,” http://www.useit.com/alertbox/20000319.html (accessed mar. 20, 2010). 4. christine borgman, “why are online catalogs still hard to use?” journal of the american society for information science 47, no. 7 (1996): 493–503. 5. georgia briscoe, karne selden, and cheryl rae nyberg, “the catalog versus the home page: best practices for connecting to online resources,” law library journal 95, no. 2 (2003): 151–74. 6. kristin antelman, emily lynema, and andrew k. pace, “toward a twenty-first century library catalog,” information technology & libraries 25, no. 3 (2006): 128–39. 7. marshall breeding, “library technology guides: discovery layer interfaces,” http://www.librarytechnology. org/discovery.pl?sid=20100322930450439 (accessed mar. 2010). 8. karen m. spicher, “the development of the marc format,” cataloging & classification quaterly 21, no 3/4 (1996): 75–90. appendix. vufind usability study logging sheets i. the look and feel of vufind a. basic screen (the vufind main page) 1) is it obvious what to do? yes _____ no _____; what were you trying to do? 2) open the drop down box, examine the options. do you recognize theseoptions? yes _____ no _____ some _____ (if some, find out what the patron was expecting and get suggestions for improvement). comments: b. click on the advanced search option—take a minute to allow the participants to look around the screen 1) examine each of the advanced search options a) are the advanced search options clear? yes_____ no_____ b) are the advance search options helpful? yes_____no_____ 2) examine the limits fields, open the drop-down menu boxes a) are the limits clearly identified? yes _____ no _____ b) are the pictures helpful? yes _____ no _____ c) are the drop-down menu box options clear? yes _____ no _____ comments: ii. (back to the) basic search field a. enter the phrase—climate change (search all fields)—examine the search results 1) do the records retrieved appear to be relevant to your search statement? yes _____no _____don’t know _____ 2) what information would you like to see in the record? how should it be displayed? 3) examine the right sidebar. are the “facets” clear? yes _____no _____some, not all _____ 4) if you want to view items from other libraries in your search results, can you find the option? yes _____no _____ 5) can you find an electronic book published in 2008? yes _____no _____don’t know _____ comments: b. click on the first book record in the original climate change search results 1) is information about the book clearly represented? yes _____ no _____ 2) is it clear where to find item? yes _____ no _____ 3) look at the tags. do you understand what this feature is? yes _____ no _____ comments: c. look at the brief item information provided on the screen 1) is the information displayed useful in determining the scope and content of the item? 
yes _____no _____ 2) are the topics in the record useful for finding additional information on the topic? yes _____no _____ comments: d. click on each button below the brief record information 1) is this information useful? yes _____ no _____ 2) are the names for the tabs accurate? what should they be named? e. can you easily determine where the item is located and how to request it? yes _____no _____ comments: f. go back to the basic search box and enter the author james joyce (all fields) as a new search 1) is it easy to distinguish items by james joyce from items about james joyce? yes _____no _____ 2) using the facets, can you find only titles with james joyce as author? yes _____no _____ 3) can you find out how to cite an item? yes _____ no _____ comments: 52 information technology and libraries | march 2011 g. now try to find an audio recording by the artist prince using basic search were you successful? yes _____no _____ h. find the earliest prince recording ( “for you”; 1978). is it in the local collection? yes _____ no _____ if not, can you get a copy? comments: iii. in the advanced search screen: a. use the title drop down to find the item: gone with the wind 1) were you successful? yes _____ no _____ not sure _____ 2) can you locate a dvd of the same title? yes _____ no _____ 3) are copies of the dvd available in the university of illinois library? yes _____ no _____ comments: b. use the author drop down in the advanced search to locate titles by: jack london using the facets, find and open the record for the jack london novel, white fang. explore each of the: description, holdings, and comments tabs: 1) is this information useful? yes _____ no _____ 2) would you change the names of the tabs or the information on them? 3) other than your local library copy of white fang, can you find copies at other libraries? yes _____ no _____ comments: c. using the advanced search, find a dvd on public speaking (hint: use the limit box to select the film/video format) are there instructional videos in the university of illinois library? yes _____ no _____ 1) identify the author that’s responsible for one of the dvds 2) can you easily find other works by this author? yes _____ no _____ comments: iv. exploring the account features: a. click on login in the upper right corner of the page. on the next page, create an account. is it clear how to create an account? yes _____ no _____ b. once you have your account and are logged in to vufind, look at the menu on the right hand side. is it clear what each of the menu items are? yes _____ no _____ c. while still logged in, do a search for your favorite book and add it to your favorites list. is this tool useful, would you consider using it? yes _____ no _____ comments: v. comparing vufind to other resources: a. open three browser windows (this is easiest in firefox by entering ctrl-t for each new window) with 1) your library catalog 2) vufind 3) amazon.com enter global warming in each website in the basic search window of each. based on your initial reactions, which service appears the best for most of your uses? library catalog _____ vufind _____ amazon _____ comments: c. do you have a preference in the display formats? library catalog _____ vufind _____ amazon _____ comments: debriefing now that you have used vufind, how would you rate it—on a scale from 1–5, from easy to confusing to use? comments? how does it compare to other library catalogs you’ve used? 
if vufind and your home library catalog were available side-by-side, which would you use first? why? are you familiar with any of these other products: aquabrowser _____ googlebooks _____ microsoft live search _____ librarything _____ amazon.com _____ other preferred service _____ that's it! thank you for participating in our usability test. you will be receiving one other survey through email; we appreciate your opinions on the vufind product.

resource discovery: comparative survey results on two catalog interfaces
heather hessel and janet fransen
heather hessel (heatherhessel@yahoo.com) was interim director of enterprise technology and systems; janet fransen (fransen@umn.edu) is the librarian for aerospace engineering, electrical engineering, computer science, and history of science & technology, university of minnesota, minneapolis, mn.

abstract
like many libraries, the university of minnesota libraries-twin cities now offers a next-generation catalog alongside a traditional online public access catalog (opac). one year after the launch of its new platform as the default catalog, usage data for the opac remained relatively high, and anecdotal comments raised questions. in response, the libraries conducted surveys that covered topics such as perceptions of success, known-item searching, preferred search environments, and desirable resource types. results show distinct differences in the behavior of faculty, graduate student, and undergraduate survey respondents, and between library staff and non-library staff respondents. both quantitative and qualitative data inform the analysis and conclusions.

introduction
the growing level of searching expertise at large research institutions and the increasingly complex array of available discovery tools present unique challenges to librarians as they try to provide authoritative and clear searching options to their communities. many libraries have introduced next-generation catalogs to satisfy the needs and expectations of a new generation of library searchers. these catalogs incorporate some of the features that make the current web environment appealing: relevancy ranking, recommendations, tagging, and intuitive user interfaces. traditional opacs are generally viewed as more complex systems, catering to advanced users and requiring explicit training in order to extract useful data. some librarians and users also see them as more effective tools for conducting research than next-generation catalogs. academic libraries are frequently caught in the middle of conflicting requirements and expectations for discovery from diverse sets of searchers.

in 2002, the university of minnesota-twin cities libraries migrated from the notis library system to the aleph500™ system and launched a new web interface based on the aleph online catalog, originally branded as mncat. in 2006, the libraries contracted with the ex libris group as one of three development partners in the creation of a new next-generation search environment called primo. during the development process, the libraries conducted multiple usability studies that provided data to inform the direction of the product. participants in the usability studies generally characterized the primo interface as "clear" and "efficient."1 a year later the university libraries branded primo as mncat plus, rebranded the aleph opac as mncat classic, and introduced mncat plus to the twin cities user community as a beta service. in august 2008, mncat plus was configured as the default search for the twin cities catalog on the libraries' main website, with the libraries continuing to keep a separate link active to the aleph opac. a new organizational body called the primo management group was created in december 2008 to coordinate support, feedback, and enhancements of the local primo installation. this committee's charge includes evaluating user input and satisfaction, coordinating communication to users and staff, and prioritizing enhancements to the software and the normalization process.

when the primo management group began planning its first user satisfaction survey, the group noted that a significant number of library users seemed to prefer mncat classic. therefore, two surveys were developed in response to the group's charge. these two surveys were identical in scope and questions, except that one survey referenced mncat classic and was targeted to mncat classic searchers (appendix a), while the other survey referenced mncat plus and was targeted to mncat plus searchers (appendix b). these surveys were designed to produce statistics that could be used as internal benchmarks to gauge library progress in areas of user experience, as well as to assist with ongoing and future planning with regard to discovery tools and features.

research questions
in addition to evaluating user satisfaction and requesting user input, the primo management group also chose to question users about searching behaviors in order to set the direction of future interface work. questions directed toward searching behaviors were informed by the findings from a 2009 university of minnesota libraries report on making resources discoverable.2 the group surveyed respondents about types of items they expect to find in their searches, their interest in online resources, and the entry point for their discovery experience. the primo management group crafted the surveys to get answers to the following research questions:
■■ how often do users view their searching activity as successful?
■■ how often do users know the title of the item that they are looking for, as opposed to finding any resource relevant to their topic?
■■ what search environments do users choose when looking for a book? a journal? anything relevant to a topic?
■■ how interested are users in finding items that are not physically located at the university of minnesota?
■■ are there other types of resources that users would find helpful to discover in a catalog search?

although it can be tempting to think of the people using the catalog interfaces as a homogeneous group of "users," large academic libraries serve many types of users. as wakimoto states in "scope of the library catalog in times of transition," on the one hand, we have 'net-generation users who are accustomed to the simplicity of the google interface, are content to enter a string of keywords, and want only the results that are available online. on the other hand, we have sophisticated, experienced catalog users who understand the purpose of uniform titles and library of congress classifications and take full advantage of advanced search functions.
we need to accommodate both of these user groups effectively.3 the primo management group planned to use the demographic information to look for differences among user communities; therefore the surveys requested demographic information such as role (e.g., student) and college of affiliation (e.g., school of dentistry). in designing the surveys, the group took into account the limitations of this type of survey as well as the availability of other sources of information. for example, the primo management group chose not to include questions about specific interface features because such questions could be answered by analyzing data from system logs. the group was also interested in finding out about users’ strategies for discovering information, but members felt that this information was better obtained through focus groups or usability studies rather than through a survey instrument. research method the primo management group positioned links to the user surveys in several online locations, with the libraries’ home page providing one primary entry point. clicking on the link from the home page presented users with an intermediate page, where they were given a choice of which survey to complete: one based on mncat plus, and the other on mncat classic. if desired, users could choose to complete a separate survey for each of the two systems. links were also provided from within the mncat plus and mncat classic environments, and these links directed users to the relevant version of the survey without the intermediary page. in addition to the survey links in the online environment, announcements were made to staff about the surveys, and librarians were encouraged to publicize the surveys to their constituents around campus. the survey period lasted from october 1 through november 25, 2009. at the time of the surveys, the university of minnesota libraries was running primo version 2 and aleph version 19. because participants were self-selected, the survey results represent a biased sample, are more extreme than the norm, and are not generalizable to the whole university population. participants were not likely to click the survey link or respond to e-mailed requests unless they had sufficient incentive, such as strong feelings about one interface or the other. thirty percent of respondents provided an e-mail address to indicate that they would be willing to be contacted for focus groups or further surveys, indicating a high level of interest in the public-facing interfaces the libraries employ. in considering a process for repeating this project, more attention would be paid to methodology to address validity concerns. findings and analysis information technology and libraries | june 2012 24 findings relevant to each research question are discussed here. six hundred twenty-nine surveys contained at least one response—476 for mncat plus and 153 for mncat classic. responses by demographics as shown in table 1, graduate students were the primary respondents for both mncat plus and mncat classic, followed by undergraduates and faculty members. library staff made up 13 percent of mncat classic respondents and 4 percent of mncat plus respondents, although the actual number of library staff responding was nearly identical (twenty-one for mncat plus, twenty for mncat classic). library staff members were disproportionately represented in these survey responses and the group analyzed the results to identify categories in which library staff members differed from overall trends in the responses. 
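the percentages in table 1, and the over- and under-representation comparisons discussed below, are straightforward share calculations over the raw response counts. as a quick illustration only (the counts are transcribed from table 1; the script itself is not part of the published analysis), the arithmetic can be reproduced in a few lines of python:

    # share arithmetic behind table 1 (counts transcribed from the survey results;
    # illustrative only -- this script was not part of the published analysis)
    classic = {"graduate student": 50, "undergraduate student": 31, "library staff": 20,
               "faculty": 21, "staff (non-library)": 10, "community member": 2,
               "unspecified": 19}
    plus = {"graduate student": 176, "undergraduate student": 110, "faculty": 40,
            "staff (non-library)": 28, "library staff": 21, "community member": 11,
            "unspecified": 90}

    def shares(counts):
        """return each group's percentage of all respondents to one survey."""
        total = sum(counts.values())
        return {group: round(100 * n / total) for group, n in counts.items()}

    print(shares(classic))  # graduate ~33%, library staff ~13%, total n = 153
    print(shares(plus))     # graduate ~37%, library staff ~4%, total n = 476

    # graduate vs. undergraduate share of student respondents across both surveys,
    # for comparison with their share of the campus population discussed below
    grad = classic["graduate student"] + plus["graduate student"]                 # 226
    undergrad = classic["undergraduate student"] + plus["undergraduate student"]  # 141
    print(round(100 * grad / (grad + undergrad)))  # ~62 percent of student respondents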
questions about affiliation appeared at the end of the surveys, which may account for the high number of respondents in the “unspecified” category. mncat classic respondents frequency mncat plus respondents frequency graduate student 50 33% graduate student 176 37% undergraduate student 31 20% undergraduate student 110 23% library staff 20 13% faculty 40 8% faculty 21 14% staff (non-library) 28 6% staff (non-library) 10 7% library staff 21 4% community member 2 1% community member 11 2% (unspecified) 19 12% (unspecified) 90 19% total 153 100% total 476 100% table 1. respondents by user population a comparison of the student survey responses shows that graduate students were overrepresented, while undergraduates were underrepresented, at close to a reverse ratio. of the total number of graduate and undergraduate students, 62 percent of the respondents were graduate students, even though they accounted for only 32 percent in the larger population. conversely, undergraduates represented only 38 percent of the student respondents, even though they accounted for 68 percent of the graduate and undergraduate total. regrettably, the surveys did not include options for identifying oneself as a non-degree-seeking or professional student, so the analysis of students compared with overall population in this section includes only graduate students and undergraduates. differences were also apparent in the representation of all four categories of students within a particular college unit. at least two college units were underrepresented in the survey responses: resource discovery: comparative survey results | hessel and fransen 25 carlson school of management and the college of continuing education. one college unit was overrepresented in the survey results; 59 percent of the overall student respondents to the mncat classic survey, and 47 percent of the mncat plus students indicated that they were housed in the college of liberal arts (cla), and yet cla students only represent 32 percent of the total number of students on campus. table 2 shows the breakdown of percentages by college or unit and the corresponding breakdown by survey respondent, highlighting where significant discrepancies are evident. twin cities overall percentage of students mncat classic student survey respondents +/mncat plus student survey respondents +/ carlson school of management 9% 0% -9% 2% -7% center for allied health 0% 2% +1% 1% 0% col of educ/human development 10% 9% -1% 14% +3% col of food, agr & nat res sci 5% 4% 0% 7% +2% coll of continuing education 8% 1% -7% 1% -7% college of biological sciences 4% 6% +2% 5% 0% college of design 3% 3% 0% 3% 0% college of liberal arts 32% 59% +27% 47% +15% college of pharmacy 1% 1% 0% 0% -1% college of veterinary medicine 1% 1% 0% 1% 0% graduate school 0% 0% 0% 0% 0% humphrey inst of publ affairs 1% 1% 0% 1% 0% institute of technology (now college of science & engineering) 14% 9% -5% 10% -4% law school 2% 1% -1% 1% 0% medical school 4% 2% -3% 5% 0% school of dentistry 1% 1% 0% 0% -1% school of nursing 1% 0% -1% 0% -1% school of public health 2% 1% -1% 3% +1% table 2. student responses by affiliation information technology and libraries | june 2012 26 faculty and staff together totaled only eighty-nine respondents on the mncat plus survey and fifty-one respondents on the mncat classic survey. in keeping with graduate and undergraduate student trends, the college of liberal arts (cla) was clearly over-represented in terms of faculty responses. 
the cla faculty group represents about 17 percent of the faculty at the university of minnesota. yet over half the faculty respondents on the mncat plus survey were from cla; over 80 percent of the mncat classic faculty respondents identified themselves as affiliated with cla. faculty groups that were underrepresented include the medical school and the institute of technology.

perceptions of success
a critical area of inquiry for the surveys was user satisfaction and perceptions of success: "do users perceive their searching activity as successful?" because this question was asked in both surveys, its responses allowed the primo management group to compare respondents' perceived success between the two interfaces. results show a marked difference: while 86 percent of the mncat classic respondents reported that they are "usually" or "very often" successful at finding what they are looking for, only 62 percent of the mncat plus respondents reported the same perception of success. respondents reported very similar rates of success regardless of school, type of affiliation, or student status.

figure 1. perceptions of success: mncat plus and mncat classic (mncat classic: rarely 4%, sometimes 11%, usually 32%, very often 54%; mncat plus: rarely 14%, sometimes 24%, usually 44%, very often 18%)

these results should be interpreted cautiously. because mncat plus is the libraries' default catalog interface, mncat classic users are a self-selecting group whose members make a conscious decision to bookmark or click the extra link to use the mncat classic interface. one cannot assume that mncat users in general also would have an 86 percent perception of success were they to use mncat classic; familiarity with the tool could play a part in mncat classic users' success.

another possible factor in the reported difference in user success is the higher proportion of known-item searching—finding a book by title—occurring in mncat classic. a user's criteria for success differ when searching for a known item versus conducting a general topical search. it is easier for a searcher to determine that they have been successful in a situation where they are looking for a specific item. some features of mncat classic, such as the start-of-title and other browse indexes, are well suited to known-item searching and had no direct equivalent in mncat plus, which defaults to relevance-ranked results. (primo version 3 has implemented new features to enhance known-item searching.) comments received from users suggest that several factors played a role. one mncat classic respondent praised the "precision of the search...not just lots of random hits" and noted that mncat classic supports a "[m]ore focused search since i usually already know the title or author." in contrast, a mncat plus respondent commented that the next-generation interface was "great for browsing topics when you do not have a specific title in mind." this comment is consonant with the results from other usability testing done on next-generation catalogs.
in "next generation catalogs: what do they do and why should we care?", emanuel describes observed differences between topical and known-item searching: "during the testing, users were generally happy with the results when they searched for a broad term, but they were not happy with results for more specific searches because often they had to further limit to find what they wanted in the first screen of results."4 a common characteristic of next-generation catalogs is that they return a large result set that can then be limited using facets. training and experience may also explain some of the differences in success. mncat plus also enables functionality associated with the functional requirements for bibliographic records (frbr), which is intended to group items with the same core intellectual content in a way that is more intuitive to searchers. however, this feature is unfamiliar to traditional catalog searchers and requires an extra step to discover very specific known-items in primo. one mncat plus user expressed dissatisfaction and added, "i'm not sure if it's my lack of training/practice or that the system is not user-friendly." in focus group analyses conducted in 2008, oclc found that "when participants conducted general searches on a topic (i.e., searches for unknown items) that they expressed dissatisfaction when items unrelated to what they were looking for were returned in the results list. end users may not understand how to best craft an appropriate search strategy for topic searches."5

how often do users know the title of the item that they are looking for?
users come to the library with different goals in mind. in "chang's browsing," available in theories of information behavior, chang identified five general browsing themes,6 adapted to discovery by carter.7 for the purposes of the survey, the primo management group grouped those themes into two goals: finding an item when the title is known, and finding anything on a given topic. the primo management group had heard concerns from faculty and staff that they have more difficulty finding an item when they know the title when using mncat plus than they did with mncat classic. the group was interested in knowing how often users search for known items. to explore this topic and its impact on perceptions of success, the surveys included two questions on known-item and topical searching.

the survey results shown in table 3 indicate that a significantly higher proportion of mncat classic respondents (30 percent plus 43 percent = 73 percent) than mncat plus respondents (24 percent plus 29 percent = 53 percent) were "very often" or "usually" searching for known items. it may be that users in search of known items have learned to go to mncat classic rather than mncat plus.

                                                          rarely      sometimes    usually     very often   total
i already know the title of the item i am looking for
   mncat classic                                          7% (11)     19% (29)     30% (46)    43% (66)     152
   mncat plus                                             15% (69)    33% (151)    24% (111)   29% (132)    463
i am looking for any resource relevant to my topic
   mncat classic                                          14% (21)    32% (47)     20% (29)    34% (51)     148
   mncat plus                                             14% (62)    29% (133)    29% (133)   28% (127)    455
table 3. responses to "i already know the title of the item i am looking for"

when the primo management group considered how often researchers in different user roles searched for known items versus anything on a topic, clear patterns emerged as shown in figure 2.
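before turning to those patterns, a note on the arithmetic: the 73 percent and 53 percent figures above are simply the "usually" plus "very often" shares computed from the raw counts in table 3. the short python sketch below reproduces them and adds a two-proportion z-test; the test is our illustration only and is not part of the published analysis.

    from math import sqrt

    # "usually" + "very often" known-item shares, from the table 3 counts
    classic_known = (46 + 66) / 152   # about 0.74 (reported as 30% + 43% = 73%)
    plus_known = (111 + 132) / 463    # about 0.52 (reported as 24% + 29% = 53%)

    # two-proportion z-test (an illustration added here, not reported in the study)
    n_classic, n_plus = 152, 463
    pooled = (46 + 66 + 111 + 132) / (n_classic + n_plus)
    z = (classic_known - plus_known) / sqrt(
        pooled * (1 - pooled) * (1 / n_classic + 1 / n_plus))
    print(round(classic_known, 2), round(plus_known, 2), round(z, 1))  # 0.74 0.52 4.6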
in the mncat plus survey, only 34 percent of undergraduate mncat plus searchers “usually” or “very often” search for a particular item, versus 74 percent of faculty. conversely, 75 percent of undergraduate respondents “usually” or “very often” search for any resource relevant to a topic, versus 37 percent of faculty. graduate student respondents showed interest in both kinds of use. if successful browsing by topic is best achieved using post-search filtering, it may help to explain differences between undergraduate students and faculty. the analysis of usability testing done on other next generation catalogs described in “next generation catalogs: what do they do and why should we care?” states that “users that did not have extensive searching skills were more likely to appreciate the search first, limit later approach, while faculty members were faster to get frustrated with this technique.”8 results for all mncat classic respondents showed a preference for known item searching, but undergraduate students still indicated that they search more for anything on the topic and less for known items than faculty respondents. no significant differences were identified by discipline. resource discovery: comparative survey results | hessel and fransen 29 figure 2. searching for a known item vs. any relevant resource some qualitative comments from survey takers suggest that respondents view the library interface as a place to go to find something already known to exist, e.g., “i never want to search by topic. library catalogs are for looking up specific items.” however, with respect to discovering resources for a subject in general, both mncat classic and mncat plus respondents showed that they would also like to find items relevant to their topic (figure 2). there was no significant difference between mncat classic and mncat plus respondents on this question; in both environments, only 14 percent of the users said that they would “rarely” be interested in general results relevant to their topic. perceptions of success by specific characteristics for mncat plus, the majority of respondents “somewhat agree” or “strongly agree” that items available online or in a particular collection are easy to find. one-third of the mncat plus respondents had never tried to find an item in a particular format. over 40 percent had never tried to find an item with a particular isbn/issn. interface features may be a factor here: isbn/issn searching is not a choice in the mncat plus drop down menu, so users may not know that they can do such a search. a higher percentage of mncat classic respondents “strongly agree” that it is easy to find items by collection, available online, or in a particular format, than mncat plus respondents. figure 3 shows results based on particular characteristics. information technology and libraries | june 2012 30 figure 3. perception of success by characteristic although the surveys were primarily intended to gather reactions from end users, some interesting data emerged about usage by library staff. as demonstrated in figure 4, library staff respondents were much more likely to have performed the specific types of searches listed in this section than users generally, and reported a much higher rate of perceived success with mncat classic. figure 4. 
perception of success by characteristic: library staff resource discovery: comparative survey results | hessel and fransen 31 searching by location: local collections and other resources in a large research institution with several physical library locations and many distinct collections, users need the ability to quickly narrow a search to a particular collection. but even the largest institution cannot collect everything a researcher might need. the primo management group wondered not only whether users felt successful when they looked for an item in a particular collection but also wanted to explore whether users want to see items not owned by the institution as part of their search results. finding items among the many library locations was not a problem for either mncat plus or mncat classic respondents: 72 percent either somewhat or strongly agreed that it is easy to find items in a particular collection using mncat. furthermore, survey respondents of both interfaces agreed that they are interested in items no matter where the items are, which underlines the value of a service such as worldcat; 73 percent of mncat plus respondents and 78 percent of mncat classic respondents expressed a preference for seeing items held by other libraries, knowing they could request items using an interlibrary loan service if necessary. preferred search environments three of the survey questions asked users about their preferred search environments for different searching needs:  when looking for a particular book  when looking for a particular journal article  when searching without a particular title in mind each survey presented respondents with a list of choices and space to specify other sources not listed. respondents were encouraged to mark as many sources as they regularly use. when searching for a specific book, users of the two catalog environments identified a number of other sources. the top five sources in each survey are listed in table 4. when i am looking for a specific book, i usually search (check all that apply): mncat classic respondents (frequency) mncat plus respondents (frequency) 1. mncat classic (116) 1. mncat plus (217) 2. worldcat (50) 2. google (165) 3. amazon (50) 3. mncat classic (163) 4. google (49) 4. amazon (160) 5. google books (31) 5. google books (108) table 4. search environment for books information technology and libraries | june 2012 32 qualitative comments indicated that users like being able to connect to amazon and google books in order to look at tables of contents and reviews. they also specifically mentioned barnes and noble, as well as other local libraries. these results show that mncat plus respondents were more likely to also use mncat classic than vice-versa. the data do not suggest why this would be the case, but familiarity with the older interface may play a role. mncat classic respondents were more likely than mncat plus users to return to their search environment when searching for a particular book (82 percent versus 53 percent). one mncat plus respondent commented “i didn't know i could still get to mncat classic.” when searching for a specific journal article, users of both systems chose “other databases (jstor, pubmed, etc.)” above all the other choices. even more respondents would likely have marked this choice if not for confusion over the term “other databases.” most of the comments mentioned specific databases, even when the respondent had not selected the “other databases” choice. 
one user commented, “most of these choices would be illogical. you don't list article indexes, that's where i go first.” table 5 lists the five responses marked most often for each survey. when i am looking for a specific journal article, i usually search (check all that apply): mncat classic respondents (frequency) mncat plus respondents (frequency) 1. other databases (jstor, pubmed, etc.) (92) 1. other databases (jstor, pubmed, etc.) (232) 2. mncat classic (53) 2. google scholar (131) 3. google scholar (40) 3. e-journals list (130) 4. e-journals list (34) 4. mncat plus (110) 5. google (29) 5. mncat plus article search (101) table 5. search environment for articles. qualitative comments from respondents indicated that interfaces would be more useful if they helped users find online journal articles. this raised some questions with regard to mncat plus, which includes a tab labeled “articles” for conducting federated article searches. however, mncat plus respondents noted that they used the plus “articles” search almost as much as they did mncat plus. other plus comments included: i tried to use this for journal articles but it only has some in the database i guess and when i did my search it only found books and no articles. i don't understand it. i tried this new one and it came up with wierd [sic] stuff in terms of articles. my professor said to give up and use the regular indexes because i wasn't getting what i needed to do the paper. it wasted my time. this desire for federated search coupled with the expressions of dissatisfaction with the existing federated search platform is consistent with the mixed opinions expressed in other studies, such as sam houston state university’s assessment of use of and satisfaction with the webfeat resource discovery: comparative survey results | hessel and fransen 33 federated search tool. that study found “[f]ederated search use was highest among lower-level undergraduates, and both use and satisfaction declined as student classification rose.”9 the new search tools that contain preindexed articles, such as primo central, summon, worldcat local, and ebsco discovery service, may address the frustrations that more experienced searchers express regarding federated search technology. when researching a topic without a specific title in mind, “google” and “other databases” were nearly equal and ranked first for mncat plus respondents, while “other databases” ranked first for mncat classic respondents. table 6 lists the five responses marked most option for each survey. when i am researching a topic without a specific title in mind, i usually search (check all that apply): mncat classic respondents (frequency) mncat plus respondents (frequency) 1. other databases (jstor, pubmed, etc.) (84) 1. google (197) 2. mncat classic (76) 2. other databases (jstor, pubmed, etc.) (192) 3. google (63) 3. google scholar (155) 4. google scholar (47) 4. mncat plus (145) 5. worldcat (32) 5. mncat classic (101) table 6. search environment for topics significant differences based on school affiliation were evident in the area of preferred search environments for topical research. for example, institute of technology respondents reported using google much more often when researching without a specific title in mind than respondents in other areas. evidence from the health sciences is limited in that only seven percent of respondents in total identified themselves as being from this area. 
however, these limited results show that health sciences respondents relied more on library databases than on google. respondents in the liberal arts relied more on mncat, in either version, than did respondents in the other fields. desired resource types one feature of the primo discovery interface is its ability to aggregate records from more than one source. university libraries maintains several internal data sources that are not included in the catalog, and the possibility of including some of these in the mncat plus catalog has been considered many times since primo’s release. the primo management group was interested to hear from users whether they would find three types of internal sources useful: research reports and preprints, online media, and archival finding aids. the group also asked users to mark “online journal articles” if they would find article results helpful. the question did not specify whether journal articles would appear integrated with other search results in a mncat “books” search or information technology and libraries | june 2012 34 in a separate search such as that already provided through a metasearch on the mncat plus articles tab. the surveys asked users what kinds of resources would make mncat more useful. the results for both mncat plus and mncat classic were similar and response counts for both surveys were ordered as shown in table 7. respondents could mark more than one of the choices. i would find mncat more useful if it helped me find: mncat classic frequency mncat plus frequency online journal articles 65 255 u of m research materials (e.g., research reports, preprints) 34 149 online media (e.g., digital images, streaming audio/visual) 27 134 archival finding aids 27 90 table 7. desired resource types the primo management group noted that more mncat plus respondents chose “online journal articles” more frequently than the other categories even though the mncat plus interface includes an “articles” tab for federated searching. it is unclear whether the respondents were not seeing the “articles” tab in mncat plus because they would like to see search results integrated, or if they were using the “articles” tab and were not satisfied with the results. comments from respondents generally supported the inclusion of a wider range of resources in mncat. however, several respondents also expressed concerns about the trade-offs that might be involved in providing wider coverage. one user liked the idea of having the databases “all … in one place,” but added that “it would have to just give you the stuff that you need.” several users cited the varying quality of the material discovered through library sources. one user supported the inclusion of articles “if it included good articles and not the ones i got.” a mncat classic respondent gave the variable quality of the material he or she had found through a database search as a reason for leaving the coverage of mncat as it is: “i use the best sources depending on my needs.” another mncat classic user expressed doubt that coverage of all disciplines was feasible. in commenting on the content of mncat, respondents also mentioned specific types of material that they wanted to see (e.g. archives of various countries), as well as difficulties with particular classes of material (“the confusing world of government documents”). one mncat plus user related his or her interest in public domain items to a specific item of functionality that would enhance their discovery, namely a date sort. 
in general, the interest in university of minnesota research material was fairly high. however, faculty members ranked university of minnesota research materials last in terms of preference: only twelve faculty respondents chose the option, out of sixty-one total faculty respondents. resource discovery: comparative survey results | hessel and fransen 35 conclusions the data from two surveys, conducted concurrently in 2009 on a traditional opac (mncat classic) and next-generation catalog (mncat plus), point to differences in the use and perceptions of both systems. there appeared to be fairly strong “brand loyalty” with mncat classic, given that this interface is no longer the default search for the libraries. surveys for both systems suggest a perception of success that is lower than desirable and that there is room to improve the quality of the discovery experience. it is unclear from the data if the reported perceptions of success were the result of the systems not finding what the user wants, or if the systems did not contain what the user wanted to find. mncat classic respondents were more likely to use worldcat to find a specific book than mncat plus respondents. mncat plus respondents indicated a use of mncat classic, but not vice versa. both sets of surveys described use of amazon and google for discovery. mncat plus respondents reported lower rates of success at finding known items than mncat classic respondents. mncat classic respondents were far more likely to have a specific title in mind that they wanted to obtain; half of the mncat plus respondents reported having a specific title in mind. the team that examined the survey responses found that the data suggested several key attributes that should be present in the libraries discovery environment. further discussion of the results and suggested attributes was conducted with library staff members in open sessions. results also informed local work on improving discovery interfaces. the results suggested:  the environment should support multiple discovery tasks, including known-item searching and topical research.  support for discovery activity should be provided to all primary constituent groups, noting the significant survey response by graduate student searchers.  users want to discover materials that are not owned by the libraries, in addition to local holdings.  a discovery environment should make it easy for users to find and access resources in vendor-provided resources, such as jstor and pubmed. while the results of the 2009 surveys provided a valuable description of usage, the survey team recognized that methodological choices limit the usefulness in applying results to a larger population. the team also recognized that there were a number of questions yet unanswered. some of these outstanding questions present opportunities for future research and suggest that a variety of formats might be useful, including surveys, focus groups, and targeted interviews.  to what extent do users expect to find integrated search results among different kinds of content, such as articles, databases, indexes, and even large scale data sets?  what general search strategies do users use to navigate the complex discovery environment that is available to them, and where are the failure points?  how much of the current environment requires training and how much is truly intuitive to users? information technology and libraries | june 2012 36  how can the university libraries identify and serve users who did not complete the surveys? 
 how useful would users find targeted results based on a particular characteristic such as role, student status, or discipline? since the surveys were conducted, the university libraries upgraded to primo version 3, which included features to address some of the concerns respondents identified in the surveys, such as known-item searching. primo version 3 allows users to conduct a left-justified title search (“title begins with…”), as well as sort by fields such as title and author. once the new version has been in place long enough for users to develop some comfort with the interface, the primo management group intends to resolve methodological issues and repeat its surveys, measuring users’ reactions against the baseline data set in the 2009 surveys. acknowledgements we would like to thank the other members of the primo management group, who helped to design and implement the surveys, as well as analyze and communicate the results: chew chiat naun (chair), susan gangl, connie hendrick, lois hendrickson, kristen mastel, r. arvid nelsen, and jeff peterson. we also want to acknowledge the helpful feedback and guidance of the group’s sponsor, john butler. references 1 tamar sadeh, “user experience in the library: a case study.” new library world 109, no. 1/2 (2008): 7–24. 2 cody hanson et al., discoverability phase 1 final report (minneapolis: university of minnesota, 2009), http://purl.umn.edu/48258/ (accessed dec. 20, 2010). 3 jina choi wakimoto, “scope of the library catalog in times of transition.” cataloging & classification quarterly 47, no. 5 (2009): 409–26. 4 jenny emanuel, “next generation catalogs: what do they do and why should we care?” reference & user services quarterly 49, no. 2 (winter, 2009): 117–20. 5 karen calhoun, diane cellentani, and oclc, online catalogs : what users and librarians want: an oclc report (dublin, ohio: oclc, 2009). 6 shan-ju chang, “chang's browsing,” in theories of information behavior, ed. karen e. fisher, sandra erdelez and lynne mckechnie, 69-74 (medford, n.j.: information today, 2005). 7 judith carter, “discovery: what do you mean by that?” information technology & libraries 28, no. 4 (december 2009): 161–63. 8 jenny emanuel, “next generation catalogs: what do they do and why should we care?” reference & user services quarterly 49, no. 2 (winter, 2009): 117–20. 9 abe korah and erin dorris cassidy. “students and federated searching: a survey of use and satisfaction,” reference & user services quarterly 49, no. 4 (summer 2010): 325–32. https://purl.umn.edu/48258 resource discovery: comparative survey results | hessel and fransen 37 appendix a. mncat classic survey the library catalog is intended to help you find an item when you know its title, as well as suggest items that are relevant to a given topic. we’d like to know how often you use mncat classic for these different purposes. 1. when i visit mncat classic… very often usually sometimes rarely i already know the title of the item i am looking for     i am looking for any resource relevant to my topic     many people use tools other than the library catalog to find books, articles, and other resources. for the different situations below, please tell us what other tools you find helpful. 2. when i am looking for a specific book, i usually search (check all that apply):  amazon  mncat classic  other databases (jstor, pubmed, etc.) 
 google  mncat plus  worldcat  google books  mncat plus article search  google scholar  libraries onesearch other (please specify) _______________________________________________________ 3. when i am looking for a specific journal article, i usually search (check all that apply):  amazon  google books  mncat plus article search  citation linker  google scholar  libraries onesearch  e-journals list  mncat classic  other databases (jstor, pubmed, etc.)  google  mncat plus  worldcat other (please specify) ___________________________________________________ information technology and libraries | june 2012 38 4. when i am researching a topic without a specific title in mind, i usually search (check all that apply):  amazon  google scholar  libraries onesearch  e-journals list  mncat classic  other databases (jstor, pubmed, etc.)  google  mncat plus  worldcat  google books  mncat plus article search other (please specify) ___________________________________________________ now we’d like to know what you think of mncat classic and what new features (if any) you’d like to see. 5. when i use mncat classic very often usually sometimes rarely i succeed in finding what i’m looking for     6. it is easy to find the following kinds of items in mncat classic strongly agree somewhat agree somewhat disagree strongly disagree i haven’t looked for this with mncat classic an item that is available online      an item within a particular collection (e.g., wilson library, university archives, etc.)      an item in a particular physical format (e.g., dvd, map, etc.)      an item with a specific isbn or issn      resource discovery: comparative survey results | hessel and fransen 39 7. i would find mncat classic more useful if it helped me find (check all that apply):  online journal articles  online media (e.g., digital images, streaming audio/visual)  archival finding aids  u of m research material (e.g., research reports, preprints) other (please specify) ___________________________________________________ 8. the worldcat catalog allows you to search the contents of many library collections in addition to the university of minnesota. which of the following best describes your level of interest in this type of catalog?  yes, i am interested in what other libraries have regardless of where they are, knowing i could request it through interlibrary loan if i want it  yes, i am interested, but only if i can get the items from a nearby library  no, i am interested only in what is available at the university of minnesota libraries please share anything you particularly like or dislike about mncat classic. 9. what i like most about mncat classic is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ 10. what i like least about mncat classic is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ we want to understand how different groups of people use mncat classic, as well as other tools, for finding information. please answer the following questions to give us an idea of who you are. 11. how are you affiliated with the university of minnesota? 
 faculty  graduate student  undergraduate student  staff (non-library) information technology and libraries | june 2012 40  library staff  community member 12. with which university of minnesota college or school are you most closely affiliated?  allied health programs  food, agricultural and natural resource sciences  pharmacy  biological sciences  law school  public affairs  continuing education  liberal arts  public health  dentistry  libraries  technology (engineering, physical sciences & mathematics)  design  management  veterinary medicine  education & human development  medical school  none of these  extension  nursing 13. we are interested in learning more about how you find the materials you need. if you would be willing to be contacted for further surveys or focus groups, please provide your e-mail address: _______________________________________________ resource discovery: comparative survey results | hessel and fransen 41 appendix b. mncat plus survey the library catalog is intended to help you find an item when you know its title, as well as suggest items that are relevant to a given topic. we’d like to know how often you use mncat plus for these different purposes. 1. when i visit mncat plus… very often usually sometimes rarely i already know the title of the item i am looking for     i am looking for any resource relevant to my topic     many people use tools other than the library catalog to find books, articles, and other resources. for the different situations below, please tell us what other tools you find helpful. 2. when i am looking for a specific book, i usually search (check all that apply):  amazon  mncat classic  other databases (jstor, pubmed, etc.)  google  mncat plus  worldcat  google books  mncat plus article search  google scholar  libraries onesearch other (please specify) _______________________________________________________ 3. when i am looking for a specific journal article, i usually search (check all that apply):  amazon  google books  mncat plus article search  citation linker  google scholar  libraries onesearch  e-journals list  mncat classic  other databases (jstor, pubmed, etc.)  google  mncat plus  worldcat other (please specify) ___________________________________________________ information technology and libraries | june 2012 42 4. when i am researching a topic without a specific title in mind, i usually search (check all that apply):  amazon  google scholar  libraries onesearch  e-journals list  mncat classic  other databases (jstor, pubmed, etc.)  google  mncat plus  worldcat  google books  mncat plus article search other (please specify) ___________________________________________________ now we’d like to know what you think of mncat plus and what new features (if any) you’d like to see. 5. when i use mncat plus very often usually sometimes rarely i succeed in finding what i’m looking for     6. it is easy to find the following kinds of items in mncat plus strongly agree somewhat agree somewhat disagree strongly disagree i haven’t looked for this with mncat plus an item that is available online      an item within a particular collection (e.g., wilson library, university archives, etc.)      an item in a particular physical format (e.g., dvd, map, etc.)      an item with a specific isbn or issn      resource discovery: comparative survey results | hessel and fransen 43 7. 
i would find mncat plus more useful if it helped me find (check all that apply):  online journal articles  online media (e.g., digital images, streaming audio/visual)  archival finding aids  u of m research material (e.g., research reports, preprints) other (please specify) ___________________________________________________ 8. the worldcat catalog allows you to search the contents of many library collections in addition to the university of minnesota. which of the following best describes your level of interest in this type of catalog?  yes, i am interested in what other libraries have regardless of where they are, knowing i could request it through interlibrary loan if i want it  yes, i am interested, but only if i can get the items from a nearby library  no, i am interested only in what is available at the university of minnesota libraries please share anything you particularly like or dislike about mncat plus. 9. what i like most about mncat plus is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ 10. what i like least about mncat plus is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ we want to understand how different groups of people use mncat plus, as well as other tools, for finding information. please answer the following questions to give us an idea of who you are. 11. how are you affiliated with the university of minnesota?  faculty  graduate student  undergraduate student  staff (non-library) information technology and libraries | june 2012 44  library staff  community member 12. with which university of minnesota college or school are you most closely affiliated?  allied health programs  food, agricultural and natural resource sciences  pharmacy  biological sciences  law school  public affairs  continuing education  liberal arts  public health  dentistry  libraries  technology (engineering, physical sciences & mathematics)  design  management  veterinary medicine  education & human development  medical school  none of these  extension  nursing 13. we are interested in learning more about how you find the materials you need. if you would be willing to be contacted for further surveys or focus groups, please provide your e-mail address: _______________________________________________ editorial | truitt 3 marc truitteditorial marc truitt (marc.truitt@ualberta.ca) is associate director, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. the catalog. love it? hate it? depending upon who is speaking, it may be cast as the ultimate portal that enables user access to all local and networked resources, or it may be a tool of byzantine complexity, comprehensible at best to but a small fraction of librarians able to navigate its bibliographic metadata encoded in an arcane 1960s-era format. it is a rich trove of structured and controlled information assembled over decades by the work of countless dedicated catalogers and others. or, it is the now-obsolete product of a labor-intensive process of description and subject analysis that has no relevance in a web-centric world where “everything” is findable via the google search-box. 
its attempt to organize knowledge provides catalogers with a raison d’etre, but sends their colleagues and many users fleeing for simpler and more all-encompassing tools. it is our alpha and omega, our yin and our yang. few topics in librarianship—perhaps with the conspicuous exception of that perennial library school favorite, our profession’s status as a profession—seem to provoke the range and depth of sentiment engendered by discussions of the place of the catalog. especially in recent years, criticism of the catalog has grown ever more strident, to the point where it has become commonplace in our profession’s literature to say that this most basic of library services “sucks.” as a consequence, librarians have increasingly fallen into one of two camps, with those critical of the catalog often simplistically characterized as favoring, and those defending it as opposing “change.” a number of initiatives have emerged in response to this ferment. some of these have focused on our bibliographic metadata, and particularly on its ability to express the relationships and interconnectedness of the bibliographic universe. as we have traditionally cataloged whatever we had “in-hand,” our cataloging codes and encoding standards have done a very good job of managing the description of bibliographic items; what they have not generally expressed well are the relationships among items. frbr and frad—the functional requirements for bibliographic records and the functional requirements for authority data—seem promising beginnings for addressing the relationship issues, although there are as-yet very few practical implementations. resource description and access (rda), the forthcoming successor to aacr2, is designed around frbr concepts; it will be interesting to see how this plays out in the “real world;” equally interesting will be to what degree the present (or a modified) marc21 is able to express rda’s frbr-based relationship model. other approaches have focused on developing systems that are able to exploit our existing investment in bibliographic metadata in new and useful ways. the pioneering and best-known example of this, of course, is the discovery tool developed by a partnership of north carolina state university libraries and endeca, which premiered in early 2006. this initiative included several innovative features not previously found in library catalogs, such as search result relevance ranking and the ability to perform faceted searching against a variety of controlledvocabulary indices (subject/topical, form/genre, date, etc.) ncsu’s endeca discovery tool spawned an entirely new product segment for the catalog: major ils vendors have scrambled to develop their own next-gen products, combining relevancy and facets with additional functionality such as web 2.0 social and collaborative tools and enhanced federated searching capabilities. the result of all this activity has been the first cross-platform growth opportunity for ils vendors since the development of resource-linking tools and the erm. we at ital have watched these trends with keen interest and have published works describing many of the major developments vis-a-vis the catalog in recent years. indeed, since late 2004, ital has published at least eleven major papers on various topics related to improving the catalog. 
with our publication of jennifer bowen’s report on the first phase outcomes of the university of rochester’s extensible catalog (xc) project in this issue of ital, we continue our commitment to publish important research in this area. the rochester project is noteworthy, both for its modular and metadata-focused approach and for its high visibility as an open source effort that has received significant support from the andrew w. mellon foundation. i predict that this paper will quickly take its place among the other ground-breaking works on the catalog that ital has published, and i’ll eagerly be awaiting the next progress report on the xc. n “must-reads” dept. okay, so i may not be the first out of the gate with this one, but for those of you who haven’t looked at it yet, trust me, you’ll want to. jonathan zittrain’s the future of the internet and how to stop it (yale university press, 2008), which divides the internet into “generative” technologies such as the pc, and proprietary appliances such as the iphone, may or may not resonate with you, but i think it could well become the next big debate about where the net is and where it should be going. grab a copy and read it today. 32 information technology and libraries | june 2007 author id box for 3 column layout column title 32 information technology and libraries | june 2008 communications michaela brenner and peter klein discovering the library with google earth libraries need to provide attractive and exciting discovery tools to draw patrons to the valuable resources in their catalogs. the authors conducted a pilot project to explore the free version of google earth as such a discover tool for portland state library’s digital collection of urban planning documents. they created eye-catching placemarks with links to parts of this collection, as well as to other pertinent materials like books, images, and historical background information. the detailed how-to-do part of this article is preceded by a discussion about discovery of library materials and followed by possible applications of this google earth project. in calhoun’s report to the library of congress, it becomes clear that staff time and resources will need to move from cataloging traditional formats, like books, to cataloging unique primary sources, and then providing access to these sources from many different angles. “organize, digitize, expose unique special collections” (calhoun 2006). in 2005, portland state university library received a grant “to develop a digital library under the sponsorship of the portland state university library to serve as a central repository of the collection, accession, and dissemination of [urban] key planning documents . . . that have high value for oregon citizens and for scholars around the world” (abbott 2005). this collection is called the oregon sustainable community digital library (oscdl) and is an ongoing project that includes literature, planning reports, maps, images, rlis (regional land information system) geographical data, and more. much of the older material is unpublished, and making it available online presents a valuable resource. most of the digitized—and, more recently, borndigital—documents are accessible through the library’s catalog, where patrons can find them together with other library materials about the city of portland. the bibliographic records are arranged in the catalog in an electronic resource management (erm) system (brenner, larsen, and weston 2006). 
additionally, these bibliographic data are regularly exported from the library catalog to the oscdl web site (http://oscdl. research.pdx.edu) and there integrated with gis (global information system) features, thus optimizing cataloging costs by reusing data in a different electronic environment. committed to not falling into the trap that clifford lynch had in mind when he wrote, “i think there is a mental picture that many of us have that digitization is something you do and you finish . . . a finite, one-time process“ (lynch 2002), and agreeing with gatenby that “it doesn’t matter at all if a user finds our opac through the ‘back door ’“ (gatenby 2007), the authors looked into further using these existing data from the library catalog by making them accessible from a popular and appealing place on the internet, a place that users are more likely to visit than the library catalog. the free version of google earth, a virtual-globe program that can be installed on pcs, lent itself to experimenting. “google earth combines the power of google search with satellite imagery, maps, terrain and 3-d buildings to put the world’s geographic information at your fingertips” (http://earth.google.com). from there, the authors provide links to the digitized documents in the library catalog. easy distribution, as well as the more playful nature of this pilot project and the inclusion of pictures, make the available data even more attractive to users. “google now reigns” “google now reigns,” claims karen markey (markey 2007), and many others agree that using google is easier and more appealing to most than using library catalogs. google’s popularity has been growing spectacularly. in august 2007, google accounted for 64 percent of all u.s. searches (avtec media group 2007). in contrast, the oclc report on how users perceive the library shows that only one percent of the respondents begin their information search on a library web site, while 84 percent use search engines (de rosa, et al. 2005). “if we [libraries] want to survive,” says stephen abram, “we must place our messages where the users are seeking answers and will trip over them. today that usually means at yahoo, msn, and google” (abram 2005). according to lorcan dempsey, in the longer run, traffic to the library catalog will come by linking from larger consolidated resources, like open worldcat and google scholar (dempsey 2005). dempsey also stressed that it becomes more and more significant to differentiate between discovery and location (dempsey 2006a). initially, users want to discover; they want to find what interests them independent from where this information is actually located and available. while there may be lots of valuable, detailed, and exceptionally well-organized bibliographic information in the library catalog, not michaela brenner (brennerm@pdx.edu) is assistant professor and database maintenance and catalog librarian at portland state university library, oregon. peter klein (peter.klein@colorado.edu) is aerospace engineering bs/ms at the university of colorado at boulder. introducing zoomify image | smith 33discovering the library with google earth | brenner and klein 33 many users (one percent) are willing to discover this information through the catalog. they may not discover what a library has to offer if “the library does not find a way to go to the user, rather than waiting for the user to come to the library” (coyle 2007). 
unless the intent is to keep our treasures buried, the library community needs to work with popular outside discovery environments— like search engines—to bring information available in libraries to users from the outside. libraries are, although sometimes reluctantly, responding. google, google scholar, and google books are open worldcat partner sites that are now or soon will be providing access to worldcat records. google book search includes “find this book in the library,” and the advanced book search also has the option to limit a search to library catalogs with access to the worldcat web record for each item. “deep linking” enables web users to link from search results in yahoo, google, or other partner sites to the “find in a library” interface in open worldcat, and then directly to the item’s record in their library’s online public access catalog (opac). simply put, “find it on google, get it from your library” (calhoun 2006). the “leveraged discovery environment” is an expression coined by dempsey that means it becomes increasingly important to leverage a “discovery environment which is outside your control to bring people back into our catalog environment (like amazon, google scholar)” (dempsey 2006b). issues in calhoun’s report to the library of congress include the question of how to get a google user from google to library collections. she quotes an interviewee saying that “data about a library’s collection needs to be on google and other popular sites as well as the library interface” (calhoun 2006). with evidence pointing to the heavy use of google for discovery and with google earth technology providing such a powerful visualization tool, the authors felt tempted to experiment with existing data from portland state library’s digital oscdl collection and make these data accessible through a virtual globe. the king’s college cultural heritage project martyn jessop from king’s college in london, united kingdom, published an article about a relatively small pilot project on providing access to a digital cultural heritage collection through a geographical information system (jessop 2005). jessop’s approach to explore different technologies and techniques to apply to existing data about unique primary sources was exactly what the authors had in mind with this project, and provided encouragement to move forward with the idea of providing additional access to the oregon sustainable community digital library (oscdl) collections through google earth. similar to jessop, the authors regard it an unaffordable luxury to put a great deal of effort into collecting, digitizing, and cataloging materials without making them available to a much broader audience through multiple access points. comparable to jessop, the goal of this project was to find a relatively simple, low-cost technological solution that could also be applied to a much wider range of data without much more investment in staff time and money. once the authors mastered the initial hurdle of understanding google earth’s programming language, they could easily identify with jessop’s notion of “project creep” as more and more possibilities arose to make the project more appealing. this, as with the king’s college project, was a valuable part of the development process, the details of which are described below. 
the portland state library oscdl-on-google-earth project
the authors chose ten portland-based oscdl sub-collections as the basis of this pilot project: harbor drive, front street, portland public market, urban studies collection, downtown, park blocks, south park blocks, pioneer courthouse square, portland city archives, and jpact (joint policy advisory committee on transportation). the programming language for google earth is kml (keyhole markup language), a file format used to display geographic data. kml is based on the xml standard and can be created with the google earth user interface or from scratch with a simple text editor. having no previous kml experience, the authors decided to use both.
[figure 1. basic placemark in google earth. figure 2. kml script for basic placemark.]
a basic placemark provided by google earth (figure 1), copied and pasted in notepad (figure 2), was the starting point. at portland state library, information technology staff routinely batch export cataloged oscdl data from the library catalog (ils) to the oscdl web site for reuse. for the google earth project, the authors had two options: either export data relevant to our collections from the ils to a spreadsheet, or use an existing excel spreadsheet containing most of the same data, including place coordinates. this spreadsheet was one of many that had been created to keep track of the digitization process as well as for creating bibliographic records for the library catalog later. using the available spreadsheet again, the following data were retained:
■■ the title of the collection
■■ longitude and latitude of the place the collection refers to
■■ a brief description of the collection
the following were added manually to the remaining spreadsheet:
■■ all the texts and urls for the collection-specific links
■■ urls for the collection-specific images
the authors extracted the placemark-specific script from figure 2 to create a template in notepad. a general description and all links that were the same for the ten collections were added to this template, and placeholders were inserted for collection-specific data (figure 3). using microsoft office word's mail merge, the authors populated the template with the data from the spreadsheet in one quick step. the result was a kml script that included all the placemark data for the ten collections (figure 4). the script was saved as plain text (.txt) first, and then renamed with the extension .kml, which represents the final file (figure 5). clicking the oscdl.kml icon on a desktop or inside a web application opens google earth. the user "flies" to portland, where ten stars represent the ten collections (figure 6). zooming in, the placemarks show the locations to which the collections refer. considering the many layers and icons available in google earth, the authors decided to use yellow stars to make them more visible. in order to avoid clutter and overlapping labels, titles only appear on mouse-over (figures 7 and 8). figure 9 shows the open placemark for portland public market. "portland state university" with the university's logo is a link that takes the user to the university's homepage. the next line is the title of the collection, followed by a brief description. the paragraph after that is the same for all collections and includes links to the portland state university library and the oscdl web site.
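returning briefly to the merge step described above: the same work can also be scripted rather than done through word's mail merge. as a rough illustration (not the authors' actual workflow), the sketch below reads a spreadsheet exported as csv and writes one placemark per collection into a single kml file; the column names, file names, and icon url are assumptions made for this example.

import csv
from xml.sax.saxutils import escape

# column names below (title, lon, lat, description, catalog_url) are assumptions
# made for this sketch, not the actual oscdl spreadsheet layout.
PLACEMARK_TEMPLATE = """  <Placemark>
    <name>{title}</name>
    <description><![CDATA[{description}<br/>
      <a href="{catalog_url}">digitized documents in the library catalog</a>]]></description>
    <styleUrl>#collectionStar</styleUrl>
    <Point><coordinates>{lon},{lat},0</coordinates></Point>
  </Placemark>
"""

def build_kml(csv_path="oscdl_collections.csv", kml_path="oscdl.kml"):
    """merge each spreadsheet row into the placemark template and wrap the
    result in a minimal kml document."""
    placemarks = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            placemarks.append(PLACEMARK_TEMPLATE.format(
                title=escape(row["title"]),
                description=row["description"],   # may contain html for the balloon
                catalog_url=row["catalog_url"],
                lon=row["lon"],
                lat=row["lat"],
            ))
    with open(kml_path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write('<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n')
        # the icon url is illustrative; any hosted star icon could be used
        out.write('  <Style id="collectionStar"><IconStyle><Icon>'
                  '<href>http://maps.google.com/mapfiles/kml/paddle/ylw-stars.png</href>'
                  '</Icon></IconStyle></Style>\n')
        out.writelines(placemarks)
        out.write('</Document>\n</kml>\n')

if __name__ == "__main__":
    build_kml()

if the finished .kml file is then posted at a public url, the same file can also be pulled into google maps by pasting that url into the search box, as the authors describe below.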
the collection-specific links that follow next go to the library catalog, where the user has access to the digitized manuscripts of this collection (figure 10). other pertinent links—in this case to a book available in the library, a public web site on the history of the market, and a historic image of the market—were added as well. to make the placemarks visually more attractive, all links are presented in the school's "psu green," and an image representative of the collection was added. the pictures can be enlarged in a new window by clicking on them. to avoid copyright issues, the authors photographed their own images. the last link opens an e-mail window for questions and comments (figure 11). this link is intended to bring some feedback and suggestions on how to improve the project and on its value for researchers and other users. the authors have been toying with the idea of including more elaborate features, such as video clips and music, in the future. one more recent feature is that kml files created in google earth can now also be viewed on the web by simply entering the url of the kml file into the search box of google maps (figure 12), thus creating google earth placemarks in google maps with different view options (figures 13 and 14). not all formatting is correctly transferred, and at this point there is no way to correct this in google maps. for example, the yellow stars were white, the mouse-over didn't work, and the size of the placemarks was imprecise. however, the content of the placemarks—except for the images, which didn't show on some computers—was fully retained, and all links worked (figure 15). although the use of the kml file in google maps is not as elegant as in google earth, it has the advantage that there is no need to install software as with google earth. this adds value to kml files and makes projects like this more versatile.
[figure 3. detail of template with variables between « double brackets ». figure 4. detail: "downtown" placemark of finished kml script. figure 5. simplified process. figure 6. ten stars representing the ten collections. figure 7. zoomed in with mouse-over placemark. figure 8. location of the pioneer courthouse square placemark. figure 9. portland public market. figure 10. access to the collection in library catalog. figure 11. ready-to-go e-mail window. figure 12. url of kml file in google maps search box. figure 13. "map" view in google maps. figure 14. "satellite" view in google maps. figure 15. portland public market placemark in google maps.]
the authors have identified several uses for the kml file:
■■ a workstation in the library can be dedicated to resources about the city of portland. an icon on the desktop of this workstation will open google earth and "fly" directly to portland where the yellow stars are displayed.
■■ professors can easily add the .kml file to webct (now blackboard) or other course management systems.
■■ the file can be e-mailed as an attachment to those interested in the development of the city of portland.
■■ a link from the wikipedia page related to the oscdl project leads to the google earth pilot project.
■■ the project was added to the google earth gallery, where many remarkable projects created by individuals and groups can be found.
n it can also be accessed through the oscdl web site, and relevant links from the records in the library catalog to google maps can be included. it may be useful to alert patrons, who actually did come to the catalog by themselves, to this visual tool. conclusion “the question now is not how we improve the catalog as such,” says dempsey. “it is how we provide effective discovery and delivery of library materials in a network environment where attention is scarce and information resources are abundant and where discovery opportunities are being centralized into major search engines and distributed to other environments” (dempsey 2006a). with this in mind, the authors took on the challenge to create another discovery tool for one of the library’s primary unique digital collections. google earth is not the web, and it needs to be installed on a workstation in order to use a kml file. on the other hand, the file created in google earth can also be used on the web more readily but less elegantly in google maps, thus possibly reaching a larger audience. similar to the king’s college project and following abram’s suggestion that “we should experiment more with pilots in specific areas” (abram 2005), this pilot project is of an exploratory, experimental nature. and as with many experiments, the authors were testing an idea, trying something different and new to find out how useful this idea might be, and useful applications for this project were identified. google earth is a sophisticated, attractive, and exciting program—and fun to play with. in a time “where attention is scarce and information resources are abundant,” as dempsey (2006a) says, we need to provide these kinds of discovery tools to attract patrons and to lure them to these valuable resources in our library’s catalog that we created with so much diligence and cost of staff time and resources. works cited abbott, carl. 2005. planning a sustainable portland: a digital library for local, regional, and state planning and policy documents. framing paper. http://oscdl.research.pdx.edu/documents/library_grant.pdf. abram, stephen. 2005. the google opportunity. library journal 130, no. 2: 34. avtec media group. 2007. search engine statistics. http://avtecmedia.com/ internet-marketing/internet-marketing-trends.htm. brenner, michaela, tom larsen, and claudia weston. 2006. digital collection management through the library catalog. information technology and libraries 25, no. 2: 65–77. calhoun, karen. 2006. the changing nature of the catalog and its integration with other discovery tools; final report, prepared for the library of congress. www.loc.gov.proxy.lib.pdx. edu/catdir/calhoun-report-final.pdf. coyle, karen. 2007. the library catalog in a 2.0 world. the journal of academic librarianship 33, no. 2: 289–291. de rosa, cathy et al. 2005. perceptions of libraries and information resources. a report to the oclc membership. www .oclc.org.proxy.lib.pdx.edu/reports/ pdfs/percept_all.pdf. dempsey, lorcan. 2006a. the library catalogue in the new discovery environment: some thoughts. ariadne 48. www.ariadne.ac.uk/issue48/dempsey. dempsey, lorcan. 2006b. lifting out the catalog discovery experience. lorcan dempsey’s weblog on libraries, services, and networks, may 14, 2006. http://orweblog .oclc.org/archives/001021.html dempsey, lorcan. 2005. making data work—web 2.0 and catalogs. lorcan dempsey’s weblog on libraries, services, and networks, october 4, 2005. http://orweblog.oclc .org/archives/000815.html gatenby, janifer. 2007. 
accessing library materials via google and other web sites. paper presented to elag (european library automation group), may 9, 2007. http://elag2007.upf. edu/papers/gatenby_2.pdf. jessop, martyn. 2005. the application of a geographical information system to the creation of a cultural heritage digital resource. literary and linguistic computing: journal of the association for literary and linguistic computing 20, no. 1: 71–90. lynch, clifford. 2002. digital collections, digital libraries, and the digitization of cultural heritage information. first monday 7, no. 5. www.firstmonday. org/issues/issue7_5/lynch. markey, karen. 2007. the online library catalog. d-lib magazine 13, no. 1/2. www .dlib.org/dlib/january07/markey/01 markey.html. lita cover 2, cover 3, cover 4 index to advertisers extending im beyond the reference desk: a case study on the integration of chat reference and library-wide instant messaging network ian chan, pearl ly, and yvonne meulemans information technology and libraries | september 2012 4 abstract openfire is an open-source instant messaging (im) network and a single unified application that meets the needs of chat reference and internal communication. in fall 2009, the california state university san marcos (csusm) library began using openfire and other jive software im technologies to simultaneously improve our existing im-integrated chat reference software and implement an internal im network. this case study describes the chat reference and internal communications environment at the csusm library and the selection, implementation, and evaluation of openfire. in addition, the authors discuss the benefits of deploying an integrated im and chat reference network. introduction instant messaging (im) has become a prevalent contact point for library patrons to get information and reference help, commonly known as chat reference or virtual reference. however, im can also offer a unique method of communication between library staff. librarians are able to rapidly exchange information synchronously or asynchronously in an informal way. im provides another means of building relationships within the library organization and can improve teamwork. many different chat-reference software packages are widely used by libraries, including questionpoint, meebo, and libraryh3lp. less commonly used is openfire (www.igniterealtime.org/projects/openfire), an open-source im network and a single unified application that uses the extensible messaging and presence protocol (xmpp), a widely adopted open protocol for im. since 2009, the california state university san marcos (csusm) kellogg library has used openfire for chat reference and internal im communication. openfire was relatively easy to set up and administer by the web development librarian. librarians and library users have found the im interface to be intuitive. in addition to helpful chat reference features such as statistics capture, queues, transfer, linking to meebo widgets, openfire offers the unique capability to host an internal im network within the library. ian chan (ichan@csusm.edu) is web development librarian, california state university san marcos, pearl ly (pmly@pasadena.edu) is access services & emerging technologies librarian, pasadena community college, pasadena, and yvonne meulemans (ymeulema@csusm.edu) is information literacy program coordinator, california state university san marcos, california. 
extending im beyond the reference desk | chan, ly, and meulemans 5 in this article, the authors present a literature review on im as a workplace communication tool and its successful use in libraries for chat reference services. a case study on the selection, implementation, and evaluation of openfire for use in chat reference and as an internal network will be discussed. in addition, survey results on the library staff use of the internal im network and its implications for collaboration and increased communication are shared. literature review although there is a great deal of literature on im for library reference services, publications on the use of im in libraries for internal communications do not appear in the professional literature. a review of library and information science (lis) literature has revealed very limited work on this aspect of instant messaging. however, a wider literature review in the fields of communications, computer science, and business, indicates there is growing interest in studying the benefits of im within organizations. instant messaging in the workplace in the workplace, im can offer a cost-effective means of connecting in real-time and may increase communication effectiveness between employees. it offers a number of advantages over email, telephone, and face-to-face that we will discuss further in the following section. within the academic library, im offers the possibility of not only improving access to librarians for research help but also provides the opportunity to enhance communication and collaboration throughout the entire organization. research findings indicate that im allows coworkers to maintain a sense of connection and context that is different from email, face-to-face (ftf), and phone conversations.1 each im conversation is designed to display as a single textual thread with one window per conversation. the contributions from each person in the discussion are clearly indicated and it is easy to review what has been said. this design supports the intermittent reconnection of conversation and in contrast to email, “intermittent instant messages were thought to be more immersive and to give more of a sense of a shared space and context than such email exchanges.”2 through the use of im, coworkers gain a highly interactive channel of communication that is not available via other methods of communication.3 phone and ftf conversations are two of the most common forms of interruption within the workplace.4 however, garrett and danziger found that “instant messaging in the workplace simultaneously promotes more frequent communications and reduces interruptions.”5 participants reported they were better able to manage disruptions using im and that im did not increase their communication time. the findings of this study revealed that some communication that otherwise may have occurred over email, by telephone, or in-person were instead delivered via im. this likely contributed to the reduced interruptions because im does not require full and immediate attention unlike a phone call or face-to-face communication. in addition, im study participants reported the ability to negotiate their availability through postponing conversations, information technology and libraries | september 2012 6 and these findings support earlier studies suggesting im is less intrusive than traditional communication methods for determining availability of coworkers.6 a number of research studies show that im improves teamwork and is useful for discussing complex tasks. 
huang, hung, and chen compared the effectiveness of email and im and the number of new ideas; they found that groups utilizing im generated more ideas than the email groups.7 they suggested that the spontaneous and rapid interchanges typical of im facilitates brainstorming between team members. the information that is uniquely visible through im and the ease of sending messages help create opportunities for spontaneous dialog. this is supported by a study by quan-haase, cothrel, and wellman, which found im promotes team interaction by indicating the likelihood of a faster response.8 ou et al. also suggest im has “potential to empower teamwork by establishing social networks and facilitating knowledge sharing among organizational members.”9 im can enhance the social connectedness of coworkers through its focus on contact lists and instant, opportunistic interactivity. the informal and personalized nature of im allows workers to build relationships while promoting the sharing of information. cho, trier, and kim suggest that the use of im as a communication tool encourages unplanned virtual hallway discussions that may be difficult for those located in different parts of a building, campus, or in remote locations.10 im can build relationships between teams and organizations where members are in physically separated locations. however, cho, trier, and kim also note that im is more successful in building relationships between coworkers who already have an existing relationship. wu et al. argue that by helping to build the social network within the organization, instant messaging can contribute to increased productivity.11 several studies have cautioned that im, like other forms of communication, requires organizational guidelines on usage and best practices. mahatanankoon suggests that productivity or job satisfaction may decrease without policies and workplace norms that guide im use.12 other research indicates that personality, employee status, and working style may affect the usefulness of im for individual employees.13 some workers may find the multitasking nature of im to work in their favor while those who prefer sequential task completion may find im disruptive. the hierarchy of work relationships and the nature of managerial styles are likely to have an impact on the use of im as well. while there are no research findings associated with the use of im for internal communication within libraries, there are articles encouraging its use. breeding writes of the potential for im to bring about “a level of collaboration that only rarely occurs with the store-and-forward model of traditional e-mail.”14 fink provides a concise introduction to the advantages of using internal im for communication between library staff.15 in addition, he provides an overview of the implementation and success of the openfire-based im network at mcmaster university. extending im beyond the reference desk | chan, ly, and meulemans 7 success of chat reference in libraries im-based chat reference gives libraries the means to more easily offer low-cost delivery of synchronous, real-time research assistance to their users, commonly referred to as “chat reference.” although libraries have used im for the last decade and many currently subscribe to questionpoint, a collaborative virtual reference service through oclc, two newer online services helped propel the growth of im-based chat reference. 
first available in 2006, the web-based meebo (www.meebo.com) made it much easier to use im for localized chat reference because library patrons were no longer required to have accounts on a proprietary network, such as aol or yahoo, to communicate with librarians.16 instead, meebo provided web widgets that allowed users to chat via the web browser. libraries could easily embed these widgets throughout their website and unlike questionpoint, meebo is free and does not require a subscription. librarians could answer questions using either their account on meebo’s website or by logging-in with a locally installed instant messaging client. in comparison to im-based chat reference, a number of libraries also found questionpoint difficult to use due to its complexity and awkward interface.17 in 2008, libraryh3lp (http://libraryh3lp.com) pushed the growth of im-based chat reference even further because it offered a low-cost, library-specific service that required little technical expertise to implement and operate. libraryh3lp improved on the meebo model by adding features such as queues, multi-user accounts, and assessment tools.18 im adds a more informal means of interaction that helps librarians build relationships with their users. several recent studies have shown that users respond positively to the use of im for chat reference. the illinois state university milner library found that switching from its older chat reference software to im increased transactions by 161 percent within one year.19 with the introduction of web-based im widgets pennsylvania state university library’s im-based chat reference grew from 20 percent to 60 percent of all virtual reference (vr), which includes email reference, in one year.20 a 2010 study of vr and im service at the university of guelph library found 71 percent user satisfaction with im compared to 70 percent satisfaction with vr overall.21 im use in academic libraries has become ubiquitous, and other types of libraries also use im to communicate with library patrons. case study california state university, san marcos (csusm) is a mid-size public university with approximately 9,500 students. csusm is a commuter campus with the majority of students living in north county san diego and offers many online or distance courses at satellite campuses. the csusm kellogg library has a robust chat reference service that is used by students on and off campus. the library has about forty-five employees including librarians, library administrators, and library assistants. the following section will discuss the meebo chat reference pilot, selection of openfire to replace meebo, implementation and customization of openfire, and evaluation of openfire for chat reference by librarians and as an internal network for all library personnel. information technology and libraries | september 2012 8 meebo chat reference pilot to examine the feasibility of using im for chat reference at csusm, the reference librarians initiated a pilot program using meebo (2008–9). a meebo widget was placed on the library’s homepage, the ask a librarian page, and on library research guides. within the first year of the pilot project, chat reference grew to more than 41 percent of all reference transactions.22 based on responses to user satisfaction surveys, 85 percent indicated they would recommend chat reference to other students, and 69 percent said they preferred it to other forms of reference services. 
chat reference is now an integral part of the library’s research assistance program, and im has become a permanent access point for students to contact reference librarians. although the new im service was successful, the pilot program uncovered a number of key shortcomings with meebo when used for chat reference; these shortcomings are documented in a case study by meulemans et al.23 these findings matched problems reported by other libraries who used meebo in their reference services.24 meebo is most suited for individual users who communicate one-to-one via im. for example, meebo chat widgets are specific to each meebo user, and it is not possible to share a single widget between multiple librarians. in addition, features such as message queues and message transfers, invaluable for managing a heavily used chat reference service, are not available in meebo. those features are essential for working with multiple, simultaneous incoming im messages, a common occurrence in virtual reference. other missing features included the lack of built-in transcript retention and lack of automated usage statistics.25 selecting openfire based on the need for a more robust chat reference system, the csusm reference librarians and the web development librarian explored other im options, especially open-source software. the web development librarian had previous experience using openfire at the university of alaska anchorage, for an internal library im network and investigated its capabilities to replace meebo as a chat reference tool. the desire to replace meebo for chat reference at csusm also provided the opportunity to pilot an internal im network. openfire, part of the suite of open-source instant messaging tools from jive software, was the only application that could easily fulfill both roles and offered a number of features that made it highly preferable when compared to other im-based chat reference systems. of its many features, one of the most valuable was the integration between openfire user accounts and our campus email system. being able to tap into the university’s email system meant automated configuration and updating of all staff accounts and contact lists. this removed the burden of individual account maintenance associated with external services such as meebo, libraryh3lp, and questionpoint. openfire supports internal im networks at educational institutions such as the university of pennsylvania, central michigan university, and university of california, san francisco. extending im beyond the reference desk | chan, ly, and meulemans 9 openfire could meet our im chat reference needs because it includes the fastpath plugin, a complete web-based chat management system available at www.igniterealtime.org/projects/openfire/plugins.jsp. this robust system incorporates important features such as message queues, message transfer, statistics, and canned messages. james cook university library in australia also chose to use openfire with fastpath plugin as its chat reference solution based on their need for those features.26 other institutions using fastpath and openfire in the role of chat reference or support include the university of texas, the oregon/ohio multistate virtual reference consortium, mozilla.com, and the university of wisconsin. when reviewing chat reference solutions, we considered the possibility of using chat modules available through drupal (http://drupal.org), the web content management system (cms) for our library website. 
the primary advantage of that option was complete integration with the library website and intranet. further analysis of the drupal option revealed that the available chat modules were too basic for our needs and that reconfiguration of our intranet and website to incorporate a workable chat reference system would require extensive time. in comparison to the implementation time associated with deploying the openfire system, using drupal-based chat modules did not provide a favorable cost-benefit ratio. while the proprietary libraryh3lp offered similar functionality for chat reference, its inability to integrate with our email system was clearly a deficit when compared to openfire. in libraryh3lp, it is necessary to create accounts for all library personnel in chat reference. fastpath does not have that requirement if you integrate openfire with your organization's lightweight directory access protocol (ldap) directory; instead, the system will automatically create accounts for all library staff. furthermore, the administrative options and interface for libraryh3lp also did not compare favorably with those of fastpath. the fastpath interface for assigning users is more intuitive, and the system generates a customizable chat initiation form for each workgroup (figures 1 and 2). oregon's l-net and ohio's knowitnow24x7 offer information about software requirements and an online demonstration of spark/fastpath.27
[figure 1. fastpath chat initiation form for csusm research help desk. figure 2. fastpath chat initiation form for csusm media library.]
for our requirements, openfire was clearly superior to the available systems for chat reference. its relatively simple deployment requirements and ease of setup helped make it our first choice for building a combined im network and chat reference system. in the following section, we will discuss the installation, customization, and assessment of our openfire implementation.
openfire installation and configuration
the openfire application is a free download from ignite realtime, a community of jive software. the program will run on any web server that has a windows, linux, or macintosh operating system. if configured as a self-contained application, openfire only requires java to be available on your web server. installation of the software is an automated process, and system configuration is through a web-based setup guide. after the initial language selection form, the next step in the server configuration process is to enter the web server url and the ports through which the server will communicate with the outside world (figure 3). the third step provides fields for selecting the type of database to use with openfire and for inputting any information relating to your selection (figure 4).
[figure 3. openfire server settings screen. figure 4. openfire database configuration form.]
openfire uses a database to store information such as im network settings, user account information, and transcripts. database options include using an embedded database or connecting to an external database server. using the embedded database is the simpler option and is helpful if you do not have access to a database server. connecting to an external database server offers more control of the data generated by openfire and provides additional backup options.
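as noted just below, the authors chose mysql for the external database. if that route is taken, the empty database and a dedicated account are typically created before openfire's setup wizard is pointed at them. a minimal sketch using the mysql-connector-python package follows; the host name, account names, and passwords are placeholders, not csusm's actual configuration.

import mysql.connector  # from the mysql-connector-python package

# all connection details and names below are illustrative placeholders
admin = mysql.connector.connect(host="db.library.example.edu",
                                user="root",
                                password="admin-password")
cur = admin.cursor()

# an empty database for openfire to populate during its web-based setup
cur.execute("CREATE DATABASE openfire CHARACTER SET utf8")

# a dedicated account so openfire does not connect with admin credentials
cur.execute("CREATE USER 'openfire'@'%' IDENTIFIED BY 'change-me'")
cur.execute("GRANT ALL PRIVILEGES ON openfire.* TO 'openfire'@'%'")
cur.execute("FLUSH PRIVILEGES")

admin.close()

the host, database name, and account prepared this way are then entered on the database configuration screen of the setup wizard (figure 4).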
openfire works with a number of the more commonly used database servers, such as mysql, postgresql, and microsoft sql server. in addition, oracle and ibm's db2 are database options with additional free plugins from these vendors. we chose to use mysql because of our experience using it with other library web applications. if using the external database option, creating and configuring the external database before installing openfire is highly recommended. after choosing a database, the openfire configuration requires the selection of an authentication method for user accounts. one option is to use openfire's internal authentication system. while the internal system is robust, it requires additional administrative support to manage the process of creating and maintaining user accounts. the recommended option is to connect openfire with your organization's lightweight directory access protocol (ldap) directory (figure 5). ldap is a protocol that allows external systems to interact with the user information stored in an organization's email system. using ldap with openfire is highly preferable because it simplifies access for your librarians and staff by automatically creating user accounts based on the information in your organization's email system. library staff simply log in with their work email or network account information; they are not required to create a new username and password.
[figure 5. openfire ldap configuration form.]
the last step in the configuration process is to grant system administrator access to the appropriate users. if using the ldap authentication method, you are able to select one or more users in your organization by entering their email id (the portion before the @ sign). the selected users will have complete access to all aspects of the openfire server. once the setup and configuration process is complete, the server is ready to accept im connections and route messages. reviewing the settings and options within the openfire system administration area is highly recommended. most libraries will likely want to adjust the configurations within the sections for server settings and archives.
connecting the im network
the second phase of the implementation process connected our library personnel with the im network using im software installed on their workstations. the openfire im server works with any multiprotocol im client ("multiprotocol" refers to support for simultaneous connections to multiple im networks) that provides options for configuring an xmpp or jabber account. some of the more popular im clients that offer this functionality include spark, trillian, miranda, and pidgin. based on our chat reference requirements, we chose to use spark (www.igniterealtime.org/projects/spark), an im client program designed to work specifically with the fastpath web chat service. spark comes with a fastpath plugin that enables users to receive and send messages to anyone communicating through the web-based fastpath chat widgets (more information on fastpath configuration is in the next section of this article). this plugin provides a tab for logging into a fastpath group and for viewing the status of the group's message queues (figure 6). spark also includes many of the features offered by other im clients, including built-in screen capture, message transfer, and group chat.
[figure 6. the fastpath plugin for spark.]
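because openfire speaks standard xmpp, the network that spark connects to can also be reached from a script, which is a convenient way to confirm that the server accepts logins before rolling spark out to staff. the sketch below uses slixmpp, one of several python xmpp libraries; the account, password, and recipient are placeholders rather than real csusm accounts.

import slixmpp

class LoginCheck(slixmpp.ClientXMPP):
    """log in, announce presence, send one test message, then disconnect."""

    def __init__(self, jid, password, recipient):
        super().__init__(jid, password)
        self.recipient = recipient
        self.add_event_handler("session_start", self.start)

    async def start(self, event):
        self.send_presence()
        await self.get_roster()
        self.send_message(mto=self.recipient,
                          mbody="test message from the openfire login check",
                          mtype="chat")
        self.disconnect()

if __name__ == "__main__":
    # account details are illustrative; with ldap integration these would be
    # existing staff accounts drawn from the organization's directory
    client = LoginCheck("testagent@im.library.example.edu", "secret",
                        "librarian@im.library.example.edu")
    client.connect()
    client.process(forever=False)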
library personnel were able to install spark on their own by downloading it from the ignite software website and launching the software's installation package. the installation process is very simple, and user-specific information is only required when spark is started for the first time. the fields required for login include the username and password of the user's organizational email and the address of the im server. as part of our implementation process, we also provided library staff with recommendations regarding the selection and configuration of optional settings that might enhance their im experience. recommendations included auto-start of spark when logging in to the computer and the activation of incoming message signals, such as sound effects and pop-ups. on our openfire server, we had also installed the kraken gateway (http://kraken.blathersource.org) plugin to enable connections to external im networks. the gateway plugin works with spark to integrate library staff accounts on chat networks such as google talk, facebook, and msn (an example of integrated networks is shown in figure 6). by integrating meebo as well, librarians were able to continue using the meebo widgets they had embedded into their research guides and faculty profile pages. this allowed them to use spark to receive im messages rather than logging on to the meebo website.
configuring the fastpath plugin for chat reference
a primary motivation for using openfire was the feature set available in the fastpath plugin. fastpath is a complete chat messaging system that includes workgroups, queues, chat widgets, and reporting. fastpath actually consists of two plugins that work together: fastpath service for managing the chat system and fastpath webchat for web-based chat widgets. both plugins are available as free downloads from the openfire plugins section of the ignite software website—www.igniterealtime.org/projects/openfire/plugins.jsp. to install fastpath, upload its packages using the form in the plugins section of the openfire administrative interface. the plugins will automatically install and add a fastpath tab to the administrative main menu. the first step in getting started with the system is to create a workgroup and add members (figure 7). within each new workgroup, one or more queues are required to process and route incoming requests, and each queue requires at least one "agent." in fastpath, the term agent refers to those who will receive the incoming chat requests.
[figure 7. workgroup setup form in fastpath.]
as workgroups are created, the system automatically generates a chat initiation form that by default includes fields for name, email, and question. administrators can remove, modify, and add any combination of field types, including text fields, dropdown menus, multiline text areas, radio buttons, and check boxes. you may also configure the chat initiation form to require completion of some, all, or none of the fields. at csusm, our form (figures 1 and 2) includes name, email, a dropdown menu for selecting the topic area of the user's research, and a field for the user to enter their question. the information in these fields allows us to quickly route incoming questions to the appropriate subject librarian.
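with the ldap integration described earlier, a workgroup member must already resolve as a user in the directory, so a quick sanity check when an agent cannot be added is to confirm that the person's entry is visible using the same directory coordinates given to openfire. the sketch below uses the ldap3 package; the server, bind account, base dn, and attribute names are placeholders, not csusm's actual directory layout.

from ldap3 import Server, Connection, ALL, SUBTREE

# all names below are illustrative placeholders
server = Server("ldap.library.example.edu", use_ssl=True, get_info=ALL)
conn = Connection(server,
                  user="cn=openfire-bind,ou=service accounts,dc=example,dc=edu",
                  password="bind-password",
                  auto_bind=True)

# look up the prospective fastpath agent by the same id used for email login
conn.search(search_base="ou=people,dc=example,dc=edu",
            search_filter="(uid=jsmith)",
            search_scope=SUBTREE,
            attributes=["displayName", "mail"])

for entry in conn.entries:
    print(entry.entry_dn, entry.displayName, entry.mail)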
fastpath includes the ability to create routing rules that use the values submitted in the form to send messages to specific queues within a workgroup. in future, we may use the dropdown menu to automatically route questions to the subject specialist based on the student’s topic. there are two methods to make the fastpath chat widget available to the public. the standard approach embeds a presence icon on your webpage and provides automatic status updates. clicking on the icon displays the chat initiation form. for our needs we choose to embed the chat initiation form in our webpages (see appendix b for sample code). when the user submits the form, openfire routes the message to the next available librarian. on the librarian’s computer, the spark program plays a notification sound and displays a pop-up dialog. the pop-up dialog remains open until the librarian accepts the message, passes it on, or the time limit for acceptance is reached, in which case the message returns to the queue for the next available librarian. evaluation of openfire for enhanced chat reference the csusm reference librarians found fastpath and openfire to be much more robust than meebo for chat reference. the ability to keep chat transcripts and to retain metadata such as time stamps, duration of chats, and topic of research for each conversation is very helpful toward analyzing the effectiveness of chat research assistance and for statistical reporting. the automated recording of transcripts and metadata saved time when compared to meebo. using meebo, transcripts were manually copied into a microsoft word document and the tracking statistics of im interactions were kept in a shared excel spreadsheet. other useful features of fastpath were the capability of transferring of patrons to other librarians and having more than one librarian monitor incoming questions. furthermore, access to the database holding the fastpath data allowed us to build an intranet page to monitor real-time incoming im messages and their responses. however, some issues were encountered with the fastpath plugin when initiating chat connections. we experienced intermittent, random instances of dropped im connections and lost messages. while many of these lost connections were likely the result of user actions (accidentally closing the chat pop-up, walking away from the computer, etc.), others appear to have been due to problematic connections between the server and the user’s browser. to address these issues, we are now asking users to provide their email when they initiate a chat session. with user emails and our real-time chat monitoring system, we are able to follow up with reference patrons that experience im connection issues and provide research assistance via email. evaluation of openfire as an internal communication tool while the adoption of im as internal communication tool was highly encouraged, its use was not mandatory for all library personnel. based on the varied technical background of our staff and librarians, we recognized that some might find im difficult to integrate within their workflow or communication style and chose a soft-launch for our network. information technology and libraries | september 2012 16 in summer 2011, we conducted a survey of csusm library personnel (44 respondents, 99 percent of total staff) to evaluate im as an internal communication tool. (see appendix a for survey questions.) 
we found that 59 percent of staff use the internal im network while 85 percent use some type of im for web-based chat for work. of those who use internal im, 30 percent used it daily. while the survey was anonymous, anecdotal discussions indicate adoption rates are higher among library units where the work is technically oriented or instructional in nature, such as library systems and the information literacy program/reference. among the respondents who use im, 45 percent of library staff indicated they use it because it allows quick communication between those in the library and 39 percent like its informal nature of communication. twenty percent of total respondents preferred im to email and phone communications. two respondents use the internal im network but were dissatisfied with it and indicated it did not work well while one found it too difficult to use. an additional survey question was geared for staff members who do not use the internal im network at all (“why do you not use the library im network?”). this question was designed to find areas of possible improvement within our system to encourage greater use. survey respondents were allowed to select more than one reason. the most common reasons given by those who do not use the library im network were that they don’t feel the need (34 percent of nonusers), they mainly communicate with staff members who are also not utilizing the im network (18 percent), im does not work for their communication style (14 percent), and privacy concerns (14 percent). we believe more in-depth analysis is necessary to learn more regarding the perceived usefulness of im within our organization and to further its adoption. conclusion through additional training and user education, we hope to promote greater use of the openfire internal im network among those who work in the library. while 100 percent adoption of im as a communication tool is not a stated goal of our project, we believe that some staff have not realized the full potential of im for collaboration and productivity due to a lack of experience with this technology. in hindsight, additional training sessions beyond the initial introductory workshop to set up the spark im client may have increased the usage of im by staff. for example, providing more information on the library’s policies regarding internal im tracking and the configuration of our system may have alleviated concerns regarding privacy. in addition, we need to lead more discussions on the benefits of im for collaboration, lowering disruptions, and increasing effectiveness in the workplace. openfire and fastpath for chat reference has brought many new features that were previously unavailable to chat reference at csusm. the addition of queues, message transfer, and transcripts has enhanced the effectiveness of this service and eased its management. compared to the prior chat reference implementations that used questionpoint and meebo, this new system is more user friendly and robust. extending im beyond the reference desk | chan, ly, and meulemans 17 furthermore, the internal im network and its connection to web-based chat widgets offer the opportunity for building a library that is more open to users. library users could feasibly contact any library staff member, not just reference librarians, via im for help. we are testing this concept with a pilot project involving the csusm media library. they are staffing their own chat workgroup and a chat widget is now available on their website. 
in the future, we also hope to employ a chat widget for circulation and ill services, another public services area that frequently works with library users. it is important to note that the success of openfire and im in the library attracted the attention of other csusm instructional and student support areas. in spring 2011, instructional and information technology services (iits), which provides campus-wide technology services for faculty, staff, and students piloted an openfire-based im helpdesk service to assist users with technology questions and problems. as of fall 2011, the “ask an it technician” service is fully implemented and available on all campus webpages. discussions on the adoption of im for other campus student services, such as financial aid and counseling, have also occurred. in addition to being a contact point for students, im has potential to improve the internal communication within the organization. references 1. hee-kyung cho, matthias trier, and eunhee kim, “the use of instant messaging in working relationship development: a case study,” journal of computer-mediated communication 10, no. 4 (2005), http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2005.tb00280.x/full (accessed aug. 1, 2011). 2. bonnie a. nardi, steven whittaker, and erin bradner, “interaction and outeraction: instant messaging in action,” in proceedings of the 2000 acm conference on computer supported cooperative work (new york, new york: acm press, 2000),79–88. 3. ellen isaacs et al., “the character, functions, and styles of instant messaging in the workplace,” in proceedings of the 2002 acm conference on computer supported cooperative work (new york, new york: acm press, 2002), 11–20. 4. victor m. gonzález and gloria mark, “constant, constant, multi-tasking craziness: managing multiple working spheres,” in proceedings of the sigchi conference on human factors in computing systems (new york, new york: acm press, 2004), 113–20. 5. r. kelly garrett and james n. danziger, “im = interruption management? instant messaging and disruption in the workplace,” journal of computer-mediated communication 13, no. 1 (2007), http://jcmc.indiana.edu/vol13/issue1/garrett.html (accessed jun. 15, 2011). 6. nardi, whittaker, and bradner, “interaction and outeraction,” 83. 7. albert h. huang, shin-yuan hung, and david c. yen, “an exploratory investigation of two internet-based communication modes,” computer standards & interfaces 29, no. 2 (2006): 238–43. http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2005.tb00280.x/full http://jcmc.indiana.edu/vol13/issue1/garrett.html information technology and libraries | september 2012 18 8. anabel quan-haase, joseph cothrel, and barry wellman, “instant messaging for collaboration: a case study of a high-tech firm,” journal of computer-mediated communication 10, no. 4 (2005), http://jcmc.indiana.edu/vol10/issue4/quan-haase.html (accessed jun. 12, 2011). 9. carol x. j. ou et al., “empowering employees through instant messaging,” information technology & people 23, no. 2 (2010): 193–211. 10. cho, trier, and kim, “instant messaging in working relationship development.” 11. lynn wu et al., “value of social network—a large-scale analysis on network structure impact to financial revenue of information technology consultants” (paper presented at winter information systems conference, salt lake city, ut, feb. 5, 2009). 12. pruthikrai mahatanankoon, “28p. exploring the impact of instant messaging on job satisfaction and creativity,” conf-irm 2010 proceedings (2010). 13. 
ashish gupta and han li, “understanding the impact of instant messaging (im) on subjective task complexity and user satisfaction,” in pacis 2009 proceedings. paper 10, http://aisel.aisnet.org/pacis2009/1; and stephanie l. woerner, joanne yates, and wanda j. orlikowski, “conversational coherence in instant messaging and getting work done,” in proceedings of the 40th annual hawaii international conference on system sciences, http://www.computer.org/portal/web/csdl/doi/10.1109/hicss.2007.152 (2007). 14. marshall breeding, “instant messaging: it’s not just for kids anymore,” computers in libraries 23, no. 10 (2003): 38–40. 15. john fink, “using a local chat server in your library,” feliciter 56, no. 5 (2010): 202–3. 16. william breitbach, matthew mallard, and robert sage, “using meebo’s embedded im for academic reference services: a case study,” reference services review 37, no. 1 (2009): 83–98. 17. cathy carpenter and crystal renfro, “twelve years of online reference services at georgia tech: where we have been and where we are going,” georgia library quarterly 44, no. 2 (2007), http://digitalcommons.kennesaw.edu/glq/vol44/iss2/3 (accessed aug. 25, 2011); and danielle theiss-white et al., “im’ing overload: libraryh3lp to the rescue,” library hi tech news 26, no. 1/2 (2009): 12–17. 18. theiss-white et al., “im’ing overload,” 12–17. 19. sharon naylor, “why isn’t our chat reference used more?” reference & user services quarterly 47, no. 4 (2008): 342–54 20. sam stormont, “becoming embedded: incorporating instant messaging and the ongoing evolution of a virtual reference service,” public services quarterly 6, no. 4 (2010): 343–59. http://jcmc.indiana.edu/vol10/issue4/quan-haase.html http://www.computer.org/portal/web/csdl/doi/10.1109/hicss.2007.152 http://digitalcommons.kennesaw.edu/glq/vol44/iss2/3 extending im beyond the reference desk | chan, ly, and meulemans 19 21. lorna rourke and pascal lupien, “learning from chatting: how our virtual reference questions are giving us answers,” evidence based library & information practice 5, no. 2 (2010): 63–74. 22. pearl ly and allison carr, “do u im?: using evidence to inform decisions about instant messaging in library reference services” (poster presented at the 5th evidence based library and information practice conference, stockholm, sweden, june 29, 2009), http://blogs.kib.ki.se/eblip5/posters/ly_carr_poster.pdf (accessed august 1, 2011). 23. yvonne nalani meulemans, allison carr, and pearl ly, “from a distance: robust reference service via instant messaging,” journal of library & information services in distance learning 4, no. 1 (2010): 3–17. 24. theiss-white et al., “im’ing overload,” 12–17. 25. meulemans, carr, and ly, “from a distance,” 14–15 26. nicole johnston, “improving the reference and information experience of students in regional areas—does an instant messaging service make a difference?” (paper presented at 4th alia new librarians symposium, december 5–6, 2008, melbourne, australia), http://eprints.jcu.edu.au/2076(accessed august 17, 2011); and alan cockerill, “open source for im reference: openfire, fastpath and spark” (workshop presented at fair shake of the open source bottle, griffith university, queensland college of art, brisbane, australia, november 20, 2009), http://www.quloc.org.au/download.php?doc_id=6932&site_id=255 (accessed august 4, 2011). 27. oregon state multistate collaboration, “multi-state collaboration: home,” http://www.oregonlibraries.net/multi-state (accessed august 16, 2011). 
appendix a. library instant messaging (im) usage survey
the information you submit is confidential. your name and campus id are not included with your response.
which of the following do you use . . . (answered separately for work and for personal use)
● library's im network (spark)
● meebo
● msn
● yahoo
● gtalk
● facebook or other website-specific chat system
● im app on my phone
● trillian, pidgin or other im aggregator
● skype
● i don't use im or web-based chat
● other
if you selected other, please describe: ____________________
on average, how often do you communicate via im or web-based chat at work?
● several times a day
● almost daily
● several times a week
● several times a month
● never
how often do you use im or web-based chat to . . . (rated on a scale of 5—often, 3—sometimes, 1—never)
● discuss work-related topic
● socialize with co-worker
● answer questions from library users
● talk about non-work related topic
● request tech support
● other
if you selected other, please describe: ____________________
if you use im to communicate at work, what do you like about it?
● allows for quick communication with others in the library
● facilitates informal conversation
● students like to use it to ask library related questions
● i prefer im over phone or email
● other:
why do you not use the library im network?
● don't feel the need
● the people i usually talk to aren't on it
● does not work well
● never get around to it . . . but would like to
● it doesn't work for my communication style
● the system is too difficult to use
● privacy concerns
● other:
additional comments? ____________________
appendix b. iframe code for embedding fastpath chat widget
[sample iframe markup not reproduced here.]
investigations into library web-scale discovery services. jason vaughan. information technology and libraries | march 2012.
abstract
web-scale discovery services for libraries provide deep discovery to a library's local and licensed content and represent an evolution—perhaps a revolution—for end-user information discovery as pertains to library collections. this article frames the topic of web-scale discovery and begins by illuminating web-scale discovery from an academic library's perspective—that is, the internal perspective seeking widespread staff participation in the discovery conversation.
this included the creation of the discovery task force, a group that educated library staff, conducted internal staff surveys, and gathered observations from early adopters. the article next addresses the substantial research conducted with library vendors that have developed these services. such work included drafting of multiple comprehensive question lists distributed to the vendors, onsite vendor visits, and continual tracking of service enhancements. together, feedback gained from library staff, insights arrived at by the discovery task force, and information gathered from vendors collectively informed the recommendation of a service for the unlv libraries.

jason vaughan (jason.vaughan@unlv.edu) is director, library technologies, university of nevada, las vegas.

introduction

web-scale discovery services, combining vast repositories of content with accessible, intuitive interfaces, hold the potential to greatly facilitate the research process. while the technologies underlying such services are not new, commercial vendors releasing such services, and their work and agreements with publishers and aggregators to pre-index content, are very new. this article frames the topic of web-scale discovery and helps illuminate some of the concerns and commendations related to web-scale discovery from one library's staff perspective—that is, the internal perspective. the second part focuses on detailed dialog with the commercial vendors, enabling the library to gain a better understanding of these services; in this sense, the second half is focused externally. given that web-scale discovery is new for the library environment, the author was unable to find any substantive published work detailing identification, research, evaluation, and recommendation related to library web-scale discovery services. it's hoped that this article will serve as the ideal primer for other libraries exploring or contemplating exploration of these groundbreaking services.

web-scale discovery services are able to index a variety of content, whether hosted locally or remotely. such content can include library ils records, digital collections, institutional repository content, and content from locally developed and hosted databases. such capabilities existed, to varying degrees, in next-generation library catalogs that debuted in the mid-2000s. in addition, web-scale discovery services pre-index remotely hosted content, whether purchased or licensed by the library. this latter set of content—hundreds of millions of items—can include items such as e-books, publisher or aggregator content for tens of thousands of full-text journals, content from abstracting and indexing databases, and materials housed in open-access repositories. for purposes of this article, web-scale discovery services are flexible services which provide quick and seamless discovery, delivery, and relevancy-ranking capabilities across a huge repository of content. commercial web-scale discovery vendors have brokered agreements with content providers (publishers and aggregators), allowing them to pre-index item metadata and full-text content (unlike the traditional federated search model). this approach lends itself to extremely rapid search and return of results ranked by relevancy, which can then be sorted in various ways according to the researcher's whim (publication date, item type, full text only, etc.).
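to make the pre-indexed versus federated-search distinction above concrete, the following deliberately simplified python sketch contrasts a single query against a pre-built central index with a federated search that must contact each remote target at query time. the names, sample records, and delays are invented for illustration only and do not represent any vendor's actual api.

from time import sleep

# toy "central index": records from many sources merged and normalized ahead of time
central_index = {
    "solar energy": [
        {"title": "solar energy policy", "source": "ils catalog"},
        {"title": "solar energy materials (journal article)", "source": "aggregator"},
    ],
}

def central_search(term):
    # one lookup against the unified, locally ranked index
    return sorted(central_index.get(term, []), key=lambda r: r["title"])

def remote_target(term):
    # stand-in for a live connector to a remote database
    sleep(0.5)  # simulated network and remote-search delay
    return [{"title": term + " (remote hit)", "source": "remote database"}]

def federated_search(term, connectors):
    # broadcast the query at search time, wait on every target, then merge
    results = []
    for connector in connectors:
        results.extend(connector(term))
    return results

print(central_search("solar energy"))                         # fast: no remote calls
print(federated_search("solar energy", [remote_target] * 3))  # slower: three round trips

the point of the toy is simply that a central-index service does its gathering and normalization before the user searches, so response time and relevancy ranking do not depend on the slowest remote source.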
by default, an intuitive, simple, google-like search box is provided (along with advanced search capabilities for those wishing this approach). the interface includes design cues expected by today's researchers (such as faceted browsing) and, for libraries wishing to extend and customize the service, embraces an open architecture in comparison to traditional ils systems.

why web-scale discovery?

as illustrated by research dating back primarily to the 1990s, library discovery systems within the networked online environment have evolved, yet continue to struggle to serve users. as a result, the library (or systems supported and maintained by the library) is often not the first stop for research—or worse, not a stop at all. users accustomed to a quick, easy, "must have it now" environment have defected, and research continues to illustrate this fact. rather than weave these research findings into a paragraph or page, below are some illustrative quotes to convey this challenge. the quotations below were chosen because they succinctly capture findings from research involving dozens, hundreds, and in some cases thousands of participants or respondents:

people do not just use information that is easy to find; they even use information that they know to be of poor quality and less reliable—so long as it requires little effort to find—rather than using information they know to be of high quality and reliable, though harder to find.1

* * *

today, there are numerous alternative avenues for discovery, and libraries are challenged to determine what role they should appropriately play. basic scholarly information use practices have shifted rapidly in recent years, and as a result the academic library is increasingly being disintermediated from the discovery process, risking irrelevance in one of its core functional areas [that of the library serving as a starting point or gateway for locating research information] . . . we have seen faculty members steadily shifting towards reliance on network-level electronic resources, and a corresponding decline in interest in using locally provided tools for discovery.2

* * *

a seamless, easy flow from discovery through delivery is critical to end users. this point may seem obvious, but it is important to remember that for many end users, without the delivery of something he or she wants or needs, discovery alone is a waste of time.3

* * *

end users' expectations of data quality arise largely from their experiences of how information is organized on popular web sites . . . 4

* * *

[user] expectations are increasingly driven by their experiences with search engines like google and online bookstores like amazon. when end users conduct a search in a library catalog, they expect their searches to find materials on exactly what they are looking for; they want relevant results.5

* * *

users don't understand the difference in scope between the catalog and a&i services (or the catalog, databases, digitized collections, and free scholarly content).6

* * *

it is our responsibility to assist our users in finding what they need without demanding that they acquire specialized knowledge or select among an array of "silo" systems whose distinctions seem arbitrary . . . the continuing proliferation of formats, tools, services, and technologies has upended how we arrange, retrieve, and present our holdings. our users expect simplicity and immediate reward and amazon, google, and itunes are the standards against which we are judged.
our current systems pale beside them.7

* * *

q: if you could provide one piece of advice to your library, what would it be?
a: just remember that students are less informed about the resources of the library than ever before because they are competing heavily with the internet.8

additional factors sell the idea of web-scale discovery. obviously, something must be discoverable for it to be used (and of value) to a researcher; ideally, content should be easily discoverable. since these new services index content that previously was housed in dozens or hundreds of individual silos, they can greatly facilitate the search process for many research purposes. libraries often spend large sums of money to license and purchase content, sums that often increase annually. any tool that holds the potential to significantly increase the discovery and use of such content should cause libraries to take notice. at time of writing, early research is beginning to indicate that these tools can increase discovery. doug way compared link-resolver-database and full-text statistics prior to and after grand valley state university's implementation of the summon web-scale discovery service.9 his research suggests that the service was broadly adopted by the university's community and that it has led to an increase in the library's electronic resource discovery and use. willamette university implemented worldcat local, and bill kelm presented results that showed an increase in both ill requests and use of the library's electronic resources.10 from another angle, information-literacy efforts focus on connecting users to "legitimate" content and providing researchers the skills to identify content quality and legitimacy. given that these web-scale discovery services include or even primarily focus on indexing a large amount of scholarly research, such services can serve as another tool in the library's arsenal. results retrieved from these services—largely content licensed or purchased by libraries—are accurate, relevant, and vetted, compared to the questionable or opinionated content that may often be returned through a web search engine query. several of the services currently allow a user to refine results to just those categorized as peer-reviewed or scholarly.

the internal academic library perspective: genesis of the unlv libraries discovery task force

the following sections of this article begin with a focus on the internal unlv library perspective—from early discussions focused on the broad topic of discovery to establishing a task force charged to identify, research, evaluate, and recommend a potential service for purchase. throughout this process, and as detailed below, communication with and feedback from a variety of library staff was essential in ensuring success. given the increasing vitality of content in electronic format, and the fact that such content was increasingly spread across multiple access points or discovery systems, in late 2008 the university of nevada las vegas (unlv) libraries began an effort to engage library staff in information discovery and how such discovery would ideally occur in the future. related to the exponential growth of content in electronic format, traditional technical-services functions of cataloging and acquisitions were changing or would soon change, not just at unlv, but throughout the academic library community.
coinciding with this, the libraries were working on drafting their 2009–11 strategic plan and wanted to have a section highlighting the importance of information discovery and delivery, with action items focused on improving this critical responsibility of libraries. in spring 2009, library staff were given the opportunity to share with colleagues a product or idea, related to some aspect of discovery, which they felt was worthy of further consideration. this event, open to unlv libraries staff and other nevada colleagues, was titled the discovery mini-summit, and more than a dozen participants shared their ideas, most in a poster-session format. one of the posters focused on serials solutions summon, an early entrant into the vendor web-scale discovery service landscape; at the time, it was a few months from public release. other posters included topics such as the flickr commons (cultural heritage and academic institutions exposing their digital collections through this popular platform), and a working prototype of a homegrown, open-source federated search approach searching across various subscribed databases.

in august 2009, the dean of the unlv university libraries charged a ten-person task force to investigate and evaluate web-scale discovery services with the ultimate goal of providing a final recommendation for potential purchase. representation on the task force included three directors and a broad cross section of staff from across the functional areas of the library, including back-of-the-house and public-service operations. the director of library technologies, and author of this article, was tasked with drafting a charge and chairing the committee; once charged, the discovery task force worked over the next fifteen months to research, evaluate, and ultimately provide a recommendation regarding a web-scale discovery service. to help illustrate some of the events described, a graphical timeline of activities is presented as appendix a; the original charge appears as appendix b. in retrospect, the initial target date of early 2010 to make a recommendation was naive, as three of the five products ultimately identified and evaluated by the task force weren't publicly released until 2010. several boundaries were provided within the charge, including the fact that the task force was not investigating and evaluating traditional federated search products. the libraries had had a very poor experience with federated search a few years earlier, and the shortcomings of the traditional federated search approach—regardless of vendor—are well known.

the remainder of this article discusses the various steps taken by the discovery task force in evaluating and researching web-scale discovery services. while many libraries have begun to implement the web-scale discovery services evaluated by this group, many more are currently at the learning and evaluation stage, or have not yet begun. many libraries that have already implemented a commercial service likely went through an evaluation process, but perhaps not at the scale conducted by the unlv libraries, if for no other reason than the majority of commercial services are extremely new. even in early 2010, there was less competition, fewer services to evaluate, fewer vendors to contact, and fewer early adopters from whom to seek references.
fortunately, the initial target date of early 2010 for a recommendation was a soft target, and the discovery task force was given ample time to evaluate the products. based on presentations given by the author in 2010, it can’t be presumed that an understanding of web-scale discovery—or the awareness of the commercial services now available—is necessarily widespread. in that sense, it’s the author’s hope and intent that information contained in this article can serve as a primer, or a recipe, for those libraries wishing to learn more about web-scale discovery and perhaps begin an evaluation process of their own. while research exists on federated search technologies within the library environment, the author was unable to find any peer-reviewed published research on the evaluation model and investigations for vendor produced web-scale discovery services as described in this paper. however, some reports are available on the open web, providing some insights into web-scale discovery evaluations led by other libraries, such as two reports provided by oregon state university. the first, dated march 2009, describes a task force whose activities included “scrutinize wcl [worldcat local], investigate other vendors’ products, specifically serials solutions’ summon, the recently announced federated index discovery system; ebsco’s integrated search; and innovative interfaces’ encore product, so that a more detailed comparison can be done,” and “by march 2010, communicate . . . whether wcl or another discovery service is the optimal purchase for osu libraries.”11 note that in 2009, encore existed as a next-generation discovery layer, and it had an optional add on called “encore harvester,” which allows for the harvesting of digital local collections. the report cites the university of michigan’s evaluation of wcl, and adds their additional observations. the march 2009 report provides a features comparison matrix for worldcat local, encore, summon, and libraryfind (an open-source search tool developed at osu that provides federated searching for selected resources). feature sets include the areas of search and retrieval, content, and added features (e.g., book covers, user tagging, etc.). the report also describes some usability testing involving wcl and integration with other local library services. a second set of investigations followed “in order to provide the task force with an opportunity to more thoroughly investigate other products” and is described in a second report provided at the end of 2009.12 at the time of both phases of this evaluation (and drafted reports) three of the web-scale discovery products had yet to enter public release. the december 2009 report focused on the two released products, serials solutions summon and worldcat local, and includes a feature matrix like the earlier report, with the added feature set of “other,” which included the features of “clarity of display,” “icons/images,” and “speed.” the latter report briefly describes how they obtained subject librarian feedback and the pros and cons observed by the librarians in looking at summon. it also mentions obtaining feedback from two early adopters of the summon product, as well as obtaining feedback from librarians whose library had implemented worldcat local. 
apart from the oregon reports, some other reports on evaluations (or selection) of a particular service, or a set of particular services, are available, such as the university of michigan's article discovery working group, which submitted a final report in january 2010.13

activity: understanding web-scale

the first activity of the discovery task force was to educate the members, and later, other library colleagues, on web-scale discovery. terms such as "federated search," "metasearch," "next-generation catalogs," and "discovery layers" had all come before, and "web-scale" was a rather new concept that wasn't widely understood. the discovery mini-summit served as a springboard that, perhaps more by chance than design, introduced unlv library staff to what would later become more commonly known as web-scale discovery, though even we weren't familiar with the term back in spring 2009. in fall 2009, the discovery task force identified reports from entities such as oclc and ithaka, as well as reports prepared for the library of congress, highlighting changing user behavior and expectations; these reports helped form a solid foundation for understanding the "whys" related to web-scale discovery. registration and participation in sponsored web-scale discovery webcasts and meetings with vendors at library conferences helped further the understanding of web-scale discovery.

after the discovery task force had a firm understanding of web-scale discovery, the group hosted a forum for all library staff to help explain the concept of web-scale discovery and the role of the discovery task force. specifically, this first forum outlined some key components of a web-scale discovery service, discussed research the task force had completed to date, and outlined some future research and evaluation steps. a summary of these steps appears in the timeline in appendix a. time was allowed for questions and answers, and then the task force broadcast several minutes of a (then recent) webcast talking about web-scale discovery. as part of its education role, the discovery task force set up an internal wiki-based webpage in august 2009 upon formation of the group, regularly added content, and notified staff when new content was added. a goal of the task force was to keep the evaluative process transparent, and over time the wiki became quite substantial. links to "live" services were provided on the wiki. given that some services had yet to be released, some links were to demo sites or to the closest approximation available; some services yet to be released were built on an existing discovery layer already in general release, and thus the look, feel, and functionality of such services was basically available for staff review. the wiki also provided links to published research and webcasts on web-scale discovery. such content grew over time as additional web-scale discovery products entered general release. in addition to materials on particular services, links were provided to important background documents and reports on topics related to the user discovery experience and user expectations for search, discovery, and delivery. discovery task force meeting notes and staff survey results were posted to the wiki, as were evaluative materials such as information on the content-overlap analysis conducted for each service. announcements of relevant vendor programs at the american library association's annual conference were also posted to the wiki.
activity: initial staff survey

as noted above, when the task force began its work, only two products (out of five ultimately evaluated) were in general release. as more products entered public release, a next step was to invite vendors onsite to show their publicly released product, or a working, developed prototype nearing initial public release. to capture a sense of the library staff ahead of these vendor visits, the discovery task force conducted the first of two staff surveys. the 21-question survey consisted of a mix of "rank on a scale" questions, multiple-choice questions, and free-text response questions. both the initial and subsequent surveys were administered through the online surveymonkey tool. respondents were allowed to skip any question they wished. the survey was broken into three broad topical areas: "local library customization capabilities," "end user aspect: features and functionality," and "content." the survey had an average response rate of 47 staff, or 47 percent of the library's 100-strong workforce. the survey questions appear in appendix c. in hindsight, some of the questions could have benefitted from more careful construction. that said, there was a conscious juxtaposition of differing concepts within the same question—the task force did not want to receive a set of responses in which all library staff felt it was important for a service to do everything—in short, to be all things to all people. forcing staff to rate varied concepts within a question could provide insights into what they felt was really important. a brief summary of some key questions for each section follows.

as an introduction, one question in the survey asked staff to rate the relative importance of each overarching aspect related to a discovery service (customization, end user interface, and content). staff felt content was the most critical aspect of a discovery service, followed by the end-user interface, followed by the ability to heavily customize the service. a snapshot of some of the capabilities library staff thought were important (or not) is provided in table 1.

web-scale capabilities | sa | a | n | d | sd
physical item status information | 81.6% | 18.4% | | |
publication date sort capability | 75.5% | 24.5% | | |
display library-specified links in the interface | 69.4% | 30.6% | | |
one-click retrieval of full-text items | 61.2% | 36.7% | 2% | |
ability to place ill / consortial catalog requests | 59.2% | 36.7% | 4.1% | |
display the library's logo | 59.2% | 36.7% | 4.1% | |
to be embedded within various library website pages | 58% | 42% | | |
full-text items first sort capability | 58.3% | 31.3% | 8.3% | 2.1% |
shopping cart for batch printing, emailing, saving | 55.1% | 44.9% | | |
faceted searching | 48.9% | 42.6% | 8.5% | |
media type sort capability | 47.9% | 43.8% | 4.2% | 4.2% |
author name sort capability | 41.7% | 37.5% | 18.8% | 2.1% |
have a search algorithm that can be tweaked by library staff | 38% | 36% | 20% | 4% | 2%
user account for saved searches and marked items | 36.7% | 44.9% | 14.3% | 4.1% |
book cover images | 25% | 39.6% | 20.8% | 10.4% | 4.2%
have a customizable color scheme | 24% | 58% | 16% | 2% |
google books preview button for book items | 18.4% | 53.1% | 24.5% | 4.1% |
tag cloud | 12.5% | 52.1% | 31.3% | 4.2% |
user authored ratings | 6.4% | 27.7% | 44.7% | 12.8% | 8.5%
user authored reviews | 6.3% | 20.8% | 50% | 12.5% | 10.4%
user authored tags | 4.2% | 33.3% | 39.6% | 10.4% | 12.5%
sa = strongly agree; a = agree; n = neither agree nor disagree; d = disagree; sd = strongly disagree

table 1. web-scale discovery service capabilities
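for readers who want to reproduce this kind of summary, the short python sketch below shows one way row percentages like those in table 1 can be tabulated from raw likert responses when respondents are allowed to skip questions. the sample data and the response encoding are invented for illustration; this is not a description of how surveymonkey or the task force actually computed the figures.

from collections import Counter

scale = ["sa", "a", "n", "d", "sd"]

# one invented capability with invented answers; None marks a skipped question
responses = {
    "faceted searching": ["sa", "a", "a", "n", "sa", None, "a"],
}

for capability, answers in responses.items():
    answered = [a for a in answers if a is not None]  # skipped answers are excluded
    counts = Counter(answered)
    percentages = {
        level: round(100 * counts[level] / len(answered), 1) for level in scale
    }
    print(capability, percentages)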
none of the results was surprising, other than perhaps the low interest or indifference in several web 2.0 community features, such as the ability for users to provide ratings, reviews, or tags for items, and even a tag cloud. the unlv libraries already had a next-generation catalog offering these features, and they have not been heavily used. even if there had been appreciable adoption of these features by end users in the next-generation catalog, they are perhaps less applicable to a web-scale discovery service—users are probably less inclined to post reviews and ratings for an article than for a monograph, and article-level content vastly outnumbers book-level content in web-scale discovery services.

the final survey section focused on content. one question asked about the incorporation of ten different information types (sources) and asked staff to rank how important it was that a service include such content. results are provided in table 2. a bit surprisingly, inclusion of catalog records was seen as most important. not surprisingly, full-text and a&i content from subscription resources were ranked very highly. it should also be noted that at the time of the survey, the institutional repository was in its infancy with only a few sample records, and awareness of this resource was low among library staff. another question listed a dozen existing publishers (e.g., springer, elsevier, etc.) deemed important to the libraries and asked staff to rank the importance that a discovery service index items from these publishers on a four-point scale from "essential" to "not important." results showed that all publishers were ranked as essential or important. related to content, 83.8 percent of staff felt that it was preferable for a service to de-dupe records such that an item appears once in the returned list of results; 14.6 percent preferred that the service not de-dupe results.

information source | rating average
ils catalog records | 1.69
majority of full-text articles / other research contained in vendor-licensed online resources | 2.54
majority of citation records for non-full-text vendor-licensed a&i databases | 4.95
consortial catalog records | 5.03
electronic reserves records | 5.44
records within locally created and hosted databases | 5.64
digital collection records | 5.77
worldcat records | 6.21
ils authority control records | 6.5
institutional repository records | 6.68

table 2. importance of content indexed in discovery service

after the first staff survey was concluded, the discovery task force hosted another library forum to introduce and "test drive" the five vendor services in front of library staff. this session was scheduled just a few weeks ahead of the onsite vendor visits to help serve as a primer to engage library staff and get them actively thinking about questions to ask the vendors. the task force distributed notecards at the forum and asked attendees to record any specific questions they had about a particular service. after the forum, the 28 specific questions collected about particular products helped inform future research on those questions for which the task force did not at the time have an answer. questions ran the gamut and collectively touched on all three areas of evaluation.
activity: second staff survey

within a month after the five vendor onsite visits, a content analysis of the overlap between unlv licensed content and content indexed by the discovery services was conducted. after these steps, a second staff survey was administered. this second staff survey had questions focused on the same three functional areas as the first staff survey: local library customization features, end user features and functionality, and content. since the vendor visits had taken place and users could now understand the questions in the context of the products, questions were asked from the perspective of each product, e.g., "please rate on a five-point likert scale whether each discovery service appears to adequately cover a majority of the critical publisher titles (worldcat local, summon, eds, encore synergy, primo central)." in addition, there were free-text questions focused on each individual product allowing colleagues to share additional, detailed thoughts. the second survey totaled 25 questions and had an average response rate of 18 respondents, or about 18 percent of library staff. several staff conducted a series of sample searches in each of the services and provided feedback on their findings. though this was a small response rate, two of the five products rose to the top, a third was a strong contender, and two were seen as less desirable. the lower response rate is perhaps indicative of several things. first, not all staff had attended the onsite vendor demonstrations or had taken the time to test drive the services via the links provided on the discovery task force wiki site. second, some questions were more appropriately answered by a subset of staff; for example, the content questions might best be matched to those with reference, collection development, or curriculum and program liaison duties. finally, intricate details emerged once a thorough analysis of the vendor services commenced. the first survey was focused more on the philosophy of what was desirable; the second survey took this a step further and asked how well each product matched such wishes. discovery services are changing rapidly with respect to interface updates, customization options, and scope of content. as such, and also reflective of the lower response rate, the author is not providing response information or analysis for this second survey within this article; however, results may be provided upon specific request to the author. the questions themselves for the second staff survey are significant, and they could help serve as a model for other libraries evaluating existing services on the market. as such, the questions appear in appendix d.

activity: early adopter references

one of the latter steps in the evaluation process from the internal academic library perspective was to obtain early adopter references from other academic library customers. a preliminary shortlist was compiled through a straw vote of the discovery task force—and the results of the vote showed a consensus. this vote narrowed down the discovery task force's list of services still in contention for a potential purchase. this shortlist was based on the growing mass of research conducted by the discovery task force and informed by the staff surveys and feedback to date. three live customers were identified for each service that had made the shortlist, and the task force successfully obtained two references for each service.
reference requests were intensive and involved a set of two dozen questions that references either responded to in writing or answered during scheduled conference calls. to help libraries conducting or interested in conducting their own evaluation and analysis of these services, this list of questions appears in appendix e. the services are so new that the live references weren't able to comprehensively answer all the questions—they simply hadn't had sufficient time to fully assess the service they'd chosen to implement. still, some important insights were gained about the specific products and, at the larger level, discovery services as a whole. as noted earlier, discovery services are changing rapidly in the sense of interface updates, customization options, and scope of content. as such, the author is not providing product-specific response information or analysis of responses for each specific product—such investigations and interpretations are the job of each individual library seriously wishing to evaluate the services to help decide which product seems most appropriate for its particular environment. several broad insights merit notice, and they are shared below.

regarding a question on implementation, nothing reached the threshold of serious concern, though a few respondents mentioned some challenges. all respondents indicated the new discovery service is already the default or primary search box on their website. one section of the early adopter questions focused on content; respondents found it difficult to provide much detail for the questions in this area. in terms of "adequately covering a majority of the important library titles," responses varied from "too early to tell" and "it covers many areas but there are some big names missing" to two of the respondents answering simply, "yes." several respondents also clearly indicated that the web-scale discovery service is not the "beginning and ending" for discovery, a fact that even some of the discovery vendors openly note. for example, one respondent indicated that web-scale discovery doesn't replace remote federated searching. a majority (not all) of the discovery vendors also have a federated search product that can, to varying degrees, be integrated with their preharvested, centralized, index-based discovery service. this allows additional content to be searched because such databases may include content not indexed within the web-scale discovery service. however, many are familiar with the limitations of federated search technologies: slow speed, poor relevancy ranking of results, and the need to configure and maintain sources and targets. such problems remain with federated search products integrated with web-scale discovery services. another respondent indicated they were targeting their discovery service at undergraduate research needs. another responded, "as a general rule, i would say the discovery service does an excellent job covering all disciplines. if you start really in-depth research in a specific discipline, it starts to break down. general searches are great . . . dive deeper into any discipline and it falls apart. for example, for a computer science person, at some point they will want to go to acm or ieee directly for deep searches." related to this, "the catalog is still important, if you want to do a very specific search for a book record, the catalog is better.
the discovery service does not replace the catalog." in terms of satisfaction with content type (newspapers, articles, proceedings, etc.), respondents seemed generally happy with the content mix. a range of responses was received, such as "doesn't appear to be a leaning one way or another, it's a mix. some of these things depend on how you set the system up, as there is quite a bit of flexibility; the library has to make a decision on what they want searched." other examples were "the vendor has been working very hard to balance content types and i've seen a lot of improvement" and "no imbalance, results seem pretty well rounded." another responded, "a common complaint is that newspapers and book reviews dominate the search results, but that is much more a function of search algorithms than the amount of content in the index."

when asked about positive or critical faculty feedback to the service, several respondents indicated they hadn't had a lot of feedback yet. one indicated they had anecdotal feedback. another indicated they'd received backlash from some users who were used to other search services (but also added that it was no greater than backlash from any other service they'd implemented in the past—and so the backlash wasn't a surprise). one indicated, "not a lot of feedback from faculty, the tendency is to go to databases directly, librarians need to instruct them in the discovery service." for student feedback, one indicated, "we have received a few positive comments and see increased usage." another indicated, "reviews are mixed. we have had a lot of feedback thanking us for providing a search that covers articles and books. they like the ability to do one search and get a mix of resources without the search taking a long time. other feedback usually centers around a bug or a feature not working as it should, or as they understand it should. in general, however, the feedback has been positive." another replied, "comments we receive are generally positive, but we've not collected them systematically." some respondents indicated they had done some initial usability testing on the initial interface, but not the most recent one now in use. others indicated they had not yet conducted usability testing, but it was planned for later in 2010 or 2011.

in terms of fellow library staff and their initial satisfaction, one respondent indicated, "somewhere between satisfied and very satisfied . . . it has been increasing with each interface upgrade . . . our instruction librarians are not planning to use the discovery service this fall [in instruction efforts] because they need more experience with it . . . they have been overall intrigued and impressed by it . . . i would say our organization is grappling more with the implications of discovery tools as a phenomenon than with our particular discovery service in particular. there seems to be general agreement that it is a good search tool for the unmediated searcher." another indicated some concerns with the initial interface provided: "if librarians couldn't figure it out, users can't figure it out." another responded that it was "a big struggle with librarians getting on board with the system and promoting the service to students. they continually compare it against the catalog. at one point, they weren't even teaching the discovery service in bib instruction. the only way to improve things is with librarian feedback; it's getting better, it has been hard.
librarians have a hard time replacing the catalog and changing things that they are used to."

in terms of local customization, responses varied; some libraries had done basically no customization to the out-of-the-box interface, others had done extensive customization. one indicated they had tweaked sort options and added widgets to the interface. another indicated they had made extensive changes to the css. one indicated they had customized the colors, added a logo, tweaked the headers and footers, and created "canned" or preconfigured search boxes searching a subset of the index. another indicated they couldn't customize the header and footer to the degree they would have liked, but were able to customize these elements to a degree. one respondent indicated they'd done a lot of customization to an earlier version of the interface, which had been rather painstaking, and that much of this broke when they upgraded to the latest version; that said, they also indicated the latest version was much better than the previous version. one respondent indicated it would be nice if the service could have multiple sources for enriched record content so that better coverage could be achieved. one respondent indicated they were working on a complete custom interface from scratch, which would be partially populated with results from the discovery service index (as well as other data sources).

a few questions asked about relevancy as a search concept and how satisfied the respondents were with the quality of returned results for queries. one respondent indicated, "we have been able to tweak the ranking and are satisfied at this point." another indicated, "overall, the relevance is good – and it has improved a lot." another noted, "known item title searching has been a problem . . . the issues here are very predictable – one word titles are more likely to be a problem, as well as titles with stopwords," and noted the vendor was aware of the issue and was improving this. one noted, "we would like to be able to experiment with the discovery service more," and observed there was "no relevancy algorithm control." another indicated they planned to investigate relevance more once usability studies commenced, and noted they had worked with the vendor to make some code changes to the default search mechanism. one noted that they'd like to be able to specify some additional fields that would be part of the algorithm associated with relevancy. another optimistically noted, "as an early adopter, it has been amazing to see how relevance has improved. it is not perfect, but it is constantly evolving and improving." a final question asked simply, "overall, do you feel your selection of this vendor's product was a good one? do you sense that your users – students and faculty – have positively received the product?" for the majority of responses, there was general agreement from the early adopters that they felt they'd made the right choice. one noted that it was still early and the evaluation is still a work in progress, but felt it has been positively received. the majority were more certain: "yes, i strongly feel that this was the right decision . . .
as more users find it, i believe we will receive additional positive feedback," "yes, we strongly believe in this product and feel it has been adopted and widely accepted by our users," and "i do feel it was a good selection."

the external perspective: dialog with web-scale discovery vendors

the preceding sections focused on an academic library's perspective on web-scale discovery services—the thoughts, opinions, preferences, and vetting activities involving library staff. the following sections focus on the extensive dialog and interaction with the vendors themselves, regardless of the internal library perspective, and highlight the thorough, meticulous research activities conducted on five vendor services. the discovery task force sought to learn as much about each service as possible, a challenging proposition given the fact that at the start of investigations, only two of five services had been released, and, unsurprisingly, very little research existed. as such, it was critical to work with vendors to best understand their services and how each service compared to others in the marketplace. broadly summarized, efforts included identification of services, drafting of multiple comprehensive question lists distributed to the vendors, onsite vendor visits, and continual tracking of service enhancements.

activity: vendor identification

over the course of a year's work, the discovery task force executed several steps to systematically understand the vendor marketplace—the capabilities, content considerations, development cycles, and future roadmaps associated with five vendor offerings. given that the task force began their work when only two of these services were in public release, there was no manual, recipe, or substantial published research to rely on. the beginning, for the unlv libraries, lay in identification of the services—one must first know the services to be evaluated before evaluation can commence. as mentioned previously, the discovery mini-summit held at the unlv libraries highlighted one product—serials solutions summon; the only released product at the time of the mini-summit was worldcat local. while no published peer-reviewed research highlighting these new web-scale discovery services existed, press and news releases did exist for the three to-be-released services. such releases shed light on the landscape of services that the task force would review—a total of five services, from the first-to-market, worldcat local, to the most recent entrant, primo central. oclc worldcat local, released in november 2007, can be considered the first web-scale discovery service as defined in this research; the experience of an early pilot partner (the university of washington) is profiled in a 2008 issue of library technology reports.14 in the uw pilot, approximately 30 million article-level items were included with the worldcat database. another product, serials solutions summon, was released in july 2009, and together these two services were the only ones publicly released when the discovery task force began its work. the task force identified three additional vendors each working on their own version of a web-scale discovery service; each of these services would enter initial general release as the task force continued its research: ebsco eds in january 2010, innovative interfaces encore synergy around may 2010, and ex libris primo central in june 2010.
while each of these three was new in terms of web-scale discovery capabilities, each was built, at least in part, on earlier systems from the vendors. eds draws heavily from the ebscohost interface (the original version of which dates back to the 1990s), while the base encore and base primo systems were next-generation catalog systems that debuted in 2007.

activity: vendor investigations

after identification of existing and under-development discovery services, a next step in unlv's detailed vendor investigations was the creation of a uniform, comprehensive question list sent to each of the five vendors. the discovery task force ultimately developed a list of 71 questions divided into nine functional areas, as follows, with an example question for each:

section 1: background. "when did product development begin (month, year)?"
section 2: locally hosted systems and associated metadata. "with what metadata schemas does your discovery platform work? (e.g., marc, dublin core, ead, etc.)"
section 3: publisher/aggregator coverage (full text and citation content). "with approximately how many publishers/aggregators have you forged content agreements?"
section 4: records maintenance and rights management. "how is your system initialized with the correct set of rights management information when a new library customer subscribes to your product?"
section 5: seamlessness & interoperability with existing content repositories. "for ils records related to physical holdings, is status information provided directly within the discovery service results list?"
section 6: usability philosophy. "describe how your product incorporates published, established best practices in terms of a customer focused, usable interface."
section 7: local "look & feel" customization options. "which of the following can the library control: color scheme; logo / branding; facet categories and placement; etc."
section 8: user experience (presentation, search functionality, and what the user can do with the results). "at what point does a user leave the context and confines of the discovery interface and enter the interface of a different system, whether remote or local?"
section 9: administration module & statistics. "describe in detail the statistics reporting capabilities offered by your system. does your system provide the following sets of statistics . . ."

all vendors were given 2–3 weeks to respond, and all vendors responded. it was evident from the uneven level of responses to the questions that the vendors were at different developmental states with their products. some vendors were still 6–9 months away from initial public release; some were not even firm on when their service would enter release. it was also observed that some vendors were less explicit in the level of detail provided, reflective of, or in some cases perhaps regardless of, development state. a refined subset of the original 71 questions appears as a list of 40 questions in appendix f. apart from the detailed question list, various sets of free and licensed information on these discovery services are available online, and the task force sought to identify and digest that information.
the charleston advisor has conducted interviews with several of the library web-scale discovery vendors on their products, including ebsco,15 serials solutions,16 and ex libris.17 these interviews, each around a dozen questions, ask the vendors to describe their product and how it differs from other products in the marketplace, and include questions on metadata and content—all important questions. an article by ronda rowe reviews summon, eds, and worldcat local, and provides some analysis of each product on the basis of content, user interface and searchability, pricing, and contract options.18 it also provides a comparison of 24 product features provided by these three services, such as "search box can be embedded in any webpage," "local branding possible," and "supports social networking." a wide variety of archived webcasts, many provided by library journal, are available through free registration, and new webcasts are being offered at time of writing; these presentations to some degree touch on discussions with the discovery vendors, and are often moderated by or include company representatives as part of the discussion group.19 several libraries have authored reports and presentations that, at least partially, discuss information on particular services gained through their evaluations, which include dialog with the vendors.20 vendors themselves each have a section on their corporate website devoted to their service. information provided on these websites ranges from extremely brief to, in the case of worldcat local, very detailed and informative. in addition, much can be gained by "test-driving" live implementations. as such, a listing of vendor website addresses providing more information, as well as a list of sample, live implementations, is provided in appendix g.

activities: vendor visits and content overlap analysis

each of the five vendors visited the unlv libraries in spring 2010. vendor visits all occurred within a nine-day span; visits were intentionally scheduled close to each other to keep things fresh in the minds of library staff, and such proximity would help with product comparisons. vendor visits lasted approximately half a day, and each visit often included the field or regional sales representative as well as a product manager or technical expert. vendor visits included a demonstration and q&a for all library staff as well as invited colleagues from other southern nevada libraries, a meeting with the discovery task force, and a meeting with technical staff at unlv responsible for website design and application development and customization. vendors were each given a uniform set of fourteen questions on topics to address during their visit; these appear in appendix h. questions were divided into the broad topical areas of content coverage, end user interface and functionality, and staff "control" over the end user interface. on average, approximately 30–40 percent of the library staff attended the open vendor demo and q&a session. shortly after the vendor visits, a content-overlap analysis comparing unlv serials holdings with indexed content in each discovery service was sought from each vendor. given that the amount of content indexed by each discovery service was growing (and continues to grow) extremely rapidly as new publisher and aggregator content agreements are signed, this content-overlap analysis was intentionally not sought at an earlier date.
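libraries wishing to run a comparable overlap check on their own can do so with a simple set comparison once both sides are reduced to normalized identifiers. the python sketch below matches a local holdings export against a vendor coverage list by issn; the file names and csv column are hypothetical, and a real analysis usually also needs matching on e-issns and title strings.

import csv

def load_issns(path, column="issn"):
    # read one issn column from a csv export and normalize it (strip hyphens, uppercase)
    with open(path, newline="") as f:
        return {
            row[column].replace("-", "").strip().upper()
            for row in csv.DictReader(f)
            if row.get(column)
        }

library_titles = load_issns("library_holdings.csv")   # hypothetical local export
vendor_index = load_issns("vendor_coverage.csv")      # hypothetical vendor title list
overlap = library_titles & vendor_index
print(len(overlap), "of", len(library_titles), "titles covered",
      "({:.1f}%)".format(100 * len(overlap) / max(len(library_titles), 1)))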
some vendors were able to provide detailed coverage information against our existing journal titles (unlv currently subscribes to approximately 20,000 e-journals and provides access to another 7,000+ open-access titles). for others, this was more difficult. recognizing this, the head of collection development was asked to provide a list of the "top 100" journal titles for unlv based on such factors as usage statistics and whether the title was a core title for part of the unlv curriculum. the remaining vendors were able to provide content coverage information against this critical title list. four of the five products had quite comprehensive coverage (more than 80 percent) of the unlv libraries' titles. while outside the scope of this article, "coverage" can mean different things for different services. driven by the publisher agreements they are able to secure, some discovery services may have extensive coverage for particular titles (such as the full text, abstracts, author-supplied keywords, subject headings, etc.), whereas other services, while covering the same title, may have "thinner" metadata, such as basic citation information (article title, publication title, author, publication date, etc.). more discussion on this topic is present in the january 2011 library technology reports on library web-scale discovery services.21

activity: product development tracking

one aspect of web-scale discovery services, and the next-generation discovery layers that preceded them, is a rapid enhancement cycle, especially when juxtaposed against the turnkey-style ils systems that dominated library automation for many years. as an example, minor enhancements are provided by serials solutions to summon approximately every three to four weeks; by ebsco to ebsco discovery service approximately every three months; and by ex libris to primo/primo central approximately every three months. many vendors unveil updates coinciding with annual library conferences, and 2010 was no exception. in late summer/early fall 2010, the discovery task force had conference calls or onsite visits with several of the vendors for a focused discussion on new enhancements and changes to services, as well as to obtain answers to any questions that had arisen since their last visit several months earlier. since the vendor visits in spring 2010, each service had changed, and two services had unveiled significantly different and improved interfaces. the discovery task force's understanding of web-scale discovery services had expanded greatly since starting its work. coordinated with the second series of vendor visits and discussions, an additional list of more than two dozen questions, recognizing this refined understanding, was sent to the majority of vendors. a portion of these questions is provided as part of the refined list of questions presented in appendix f.
this second set of questions dealt with complex discussions of metadata quality, such as what level of content publishers and aggregators were providing for indexing purposes (e.g., full text, abstracts, tables of contents, author-supplied keywords or subject headings, or particular citation and record fields), and also the vendor's stance on content neutrality, i.e., whether they are entering into exclusive agreements with publishers and aggregators, and, if the discovery service vendor is owned by a company involved with content, whether that content is promoted or weighted more heavily in result sets. other questions dealt with such topics as current install base counts and technical clarifications about how their service worked. in particular, the questions related to content were tricky for many (not all) of the vendors to address. still, the discovery task force was able to get a better understanding of how things worked in the evolving discovery environment. combined with the internal library perspective and the early adopter references, information gathered from vendors provided the necessary data set to submit a recommendation with confidence.

activity: recommendation

by mid-fall 2010, the discovery task force had conducted and had at its disposal a tremendous amount of research. recognizing how quickly these services change and the fact that a cyclical evaluation could occur, the task force members felt they had met their charge. if all things failed during the next phase—implementation—at least no one would be able to question the thoroughness of the task force's efforts. unlike the hasty decision that in part led to a less-than-stellar experience with federated search a few years earlier, the evaluation process to recommend a new web-scale discovery service was deliberate, thorough, transparent, and vetted with library stakeholders. given that the discovery task force was entering its final phase, official price quotes were sought from each vendor. each task force member was asked to develop a pro/con list for all five identified products based on the knowledge that had been gained. these lists were anonymized and consolidated into a single, extensive pro/con list for each service. some of the pros and cons were subjective (such as interface aesthetics), some were objective (such as a particular discovery service not offering a desired feature). at one of the final meetings of the task force, members reaffirmed the three top contenders, indicated the other two were no longer under consideration, and, afterward, were asked to rank their first, second, and third choices for the remaining services. while complete consensus wasn't achieved, there was a resounding first choice, second choice, and third choice. the task force presented a summary of findings at a meeting open to all library staff. this meeting summarized the research and evaluation steps the task force had conducted over the past year, framed each of the three shortlisted services by discussing some strengths and weaknesses observed by the task force, and sought to answer any questions from the library at large. prior to drafting the final report and making the recommendation to the dean of libraries, several task force members led a discussion and final question-and-answer session at a libraries' cabinet meeting, one of the high-level administrative groups at the unlv libraries.
vetting by this body represented the last step in the discovery task force's investigation, evaluation, and recommendation for purchase of a library web-scale discovery service. the recommendation was broadly accepted by the libraries' cabinet, and shortly afterward the discovery task force was officially disbanded, having met its goal of investigating, evaluating, and making a recommendation for purchase of a library web-scale discovery service. next steps the discussion above describes the research, evaluation, and recommendation model used by the unlv libraries to select a web-scale discovery service. such a model, together with the associated appendixes, could serve as a framework, perhaps with some adaptations, for other libraries considering the evaluation and purchase of a web-scale discovery service. together, the discovery task force's internal and external research and evaluation provided a substantive base of knowledge on which to make a recommendation. after the recommendation, the project progressed from a research and recommendation phase to an implementation phase. the libraries' cabinet brainstormed a list of more than a dozen concise implementation points (steps that would need to be addressed), including the harvesting and metadata mapping of local library resources, local branding and some level of customization work, and integration of the web-scale discovery search box in the appropriate locations on the libraries' website. project implementation co-managers were assigned (the director of technical services and the web technical support manager), as well as key library personnel who would aid in one or more implementation steps. in january 2011 the implementation commenced, with public launch of the new service planned for mid-2011. the success of a web-scale discovery service at the unlv libraries is a story yet to be written, but one full of promise. acknowledgements the author wishes to thank the other members of the unlv libraries' discovery task force for their work in the research and evaluation of library web-scale discovery services: darcy del bosque, alex dolski, tamera hanken, cory lampert, peter michel, vicki nozero, kathy rankin, michael yunkin, and anne zald. references 1. marcia j. bates, improving user access to library catalog and portal information, final report, version 3 (washington, dc: library of congress, 2003), 4, http://www.loc.gov/catdir/bibcontrol/2.3batesreport6-03.doc.pdf (accessed september 10, 2010). 2. roger c. schonfeld and ross housewright, faculty survey 2009: key strategic insights for libraries, publishers, and societies (new york: ithaka s+r, 2010), 4, http://www.ithaka.org/ithaka-s-r/research/faculty-surveys-2000-2009/faculty%20study%202009.pdf (accessed september 10, 2010). 3. oclc, online catalogs: what users and librarians want (dublin, oh: oclc, 2009), 20, http://www.oclc.org/reports/onlinecatalogs/fullreport.pdf (accessed september 10, 2010). 4. ibid., vi. 5. ibid., 14. 6. karen calhoun, the changing nature of the catalog and its integration with other discovery tools: final report (washington, dc: library of congress, 2006), 35, http://www.loc.gov/catdir/calhoun-report-final.pdf (accessed september 10, 2010). 7.
bibliographic services task force, rethinking how we provide bibliographic services for the university of california: final report ([pub location?] university of california libraries, 2005), 2, http://libraries.universityofcalifornia.edu/sopag/bstf/final.pdf (accessed september 10, 2010). 8. oclc, college students' perceptions of libraries and information resources (dublin, oh: oclc, 2006), part 1, page 4, http://www.oclc.org/reports/pdfs/studentperceptions.pdf (accessed september 10, 2010). 9. doug way, "the impact of web-scale discovery on the use of a library collection," serials review, in press. 10. bill kelm, "worldcat local effects at willamette university," presentation, prezi, july 21, 2010, http://prezi.com/u84pzunpb0fa/worldcat-local-effects-at-wu/ (accessed september 10, 2010). 11. michael boock, faye chadwell, and terry reese, "worldcat local task force report to lamp," march 27, 2009, http://hdl.handle.net/1957/11167 (accessed february 12, 2012). 12. michael boock et al., "discovery services task force recommendation to university librarian," http://hdl.handle.net/1957/13817 (accessed february 12, 2012). 13. ken varnum et al., "university of michigan library article discovery working group final report," umich, january 29, 2010, http://www.lib.umich.edu/files/adwg/final-report.pdf. [access date?] 14. jennifer ward, pam mofjeld, and steve shadle, "worldcat local at the university of washington libraries," library technology reports 44, no. 6 (august/september 2008). 15. dennis brunning and george machovec, "an interview with sam brooks and michael gorrell on the ebscohost integrated search and ebsco discovery service," charleston advisor 11, no. 3 (january 2010): 62–65. 16. dennis brunning and george machovec, "interview about summon with jane burke, vice president of serials solutions," charleston advisor 11, no. 4 (april 2010): 60–62. 17. dennis brunning and george machovec, "an interview with nancy dushkin, vp discovery and delivery solutions at ex libris, regarding primo central," charleston advisor 12, no. 2 (october 2010): 58–59. 18. ronda rowe, "web-scale discovery: a review of summon, ebsco discovery service, and worldcat local," charleston advisor 12, no. 1 (october 2010): 5–10. 19. library journal archived webcasts are available at http://www.libraryjournal.com/csp/cms/sites/lj/tools/webcast/index.csp (accessed september 10, 2010). 20. boock, chadwell, and reese, "worldcat local task force report to lamp"; boock et al., "discovery services task force recommendation to university librarian"; ken varnum et al., "university of michigan library article discovery working group final report." 21. jason vaughan, "library web-scale discovery services," library technology reports 47, no. 1 (january 2011). note: appendices a–h available as supplemental files.
investigations into library web-scale discovery services: appendices a–h jason vaughan appendices appendix a. discovery task force timeline appendix b. discovery task force charge appendix c. discovery task force: staff survey 1 questions appendix d. discovery task force: staff survey 2 questions appendix e. discovery task force: early adopter questions appendix f. discovery task force: initial vendor investigation questions appendix g. vendor websites and example implementations appendix h. vendor visit questions appendix a. discovery task force timeline appendix b. discovery task force charge informed through various efforts and research at the local and broader levels, and as expressed in the libraries' 2010/12 strategic plan, the unlv libraries have the desire to enable and maximize the discovery of library resources for our patrons. specifically, the unlv libraries seek a unified solution which ideally could meet these guiding principles: • creates a unified search interface for users, pulling together information from the library catalog as well as other resources (e.g., journal articles, images, archival materials) • enhances discoverability of as broad a spectrum of library resources as possible • intuitive: minimizes the skills, time, and effort needed by our users to discover resources • supports a high level of local customization (such as accommodation of branding and usability considerations) • supports a high level of interoperability (easily connecting and exchanging data with other systems that are part of our information infrastructure) • demonstrates commitment to sustainability and future enhancements • informed by preferred starting points as such, the discovery task force advises libraries administration on a solution that appears to best meet the goal of enabling and maximizing the discovery of library resources. the bulk of the work will entail a marketplace survey and evaluation of vendor offerings. charge specific deliverables for this work include: 1. identify vendor next-generation discovery platforms, whether established and currently on the market, or publicized and at an advanced stage of development with an expectation of availability within a year's time. identify and create a representative list of other academic libraries which have implemented or purchased currently available products. 2. create a checklist / criteria of functional requirements / desires for a next-generation discovery platform. 3. create lists of questions to distribute to potential vendors and existing customers of next-generation discovery platforms. questions will focus on broad categories such as the following: a. seek to understand how content hosted in our current online systems (iii catalog, contentdm, locally created databases, vendor databases, etc.) could, would, or would not be able to be incorporated or made searchable within the discovery platform. apart from our existing online systems as we know them today, the task force will explore, in general terms, how new information resources could be incorporated into the discovery platform.
more explicitly, the task force will seek an understanding of what types of existing records are discoverable within the vendor’s next generation discovery platform, and seek an understanding of what basic metadata must exist for an item to be discoverable. b. seek to understand whether the solution relies on federated search, the creation of a central site index via metadata harvesting, or both, to enable discovery of items. c. additional questions, such as pricing, maintenance, install base, etc. 4. evaluate gathered information and seek feedback from library staff. 5. provide to the dean’s directs a final report which summarizes the task force findings. this report will include a recommended product(s) and a broad, as opposed to detailed, summary of workload implications related to implementation and ongoing maintenance. the final report should be provided to the dean’s directs by february 15, 2010. boundaries the work of the task force does not include: • detailing the contents of “hidden collections” within the libraries and seeking to make a concrete determination that such hidden collections, in their current form, would be discoverable via the new system. • conducting an inventory, recommending, or prioritizing collections or items which should be cataloged or otherwise enriched with metadata to make them discoverable. • coordination with other southern nevada nshe entities. • an ils marketplace survey. the underlying innovative millennium system is not being reviewed for potential replacement. • implementation of a selected product. [the charge concluded with a list of members for the task force] information technology and libraries | march 2012 55 appendix c. discovery task force: staff survey 1 questions “rank” means the surveymonkey question will be set up such that each option can only be chosen once, and will be placed on a scale that corresponds to the number of choices overall. “rate” means there will be a 5 point likert scale ranging from strongly disagree to strongly agree. section 1: customization. the “staff side” of the house 1. customization. it is important for the library to be able to control/tweak/influence the following design element [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  general color scheme  ability to include a unlv logo somewhere on the page.  ability to add other branding elements to the page.  ability to add one or more library specified links prominently in the interface (example: a link to the libraries’ home page)  able to customize the name of the product (meaning, the vendor’s name for the product doesn’t need to be used nor appear within the interface)  ability to embed the search box associated with the discovery platform elsewhere into the library website, such as the homepage (i.e. the user could start a search w/o having to directly go to the discovery platform 2. customization. are there any other design customization capabilities that are significantly important? please list, and please indicate if this is a high, low, or medium priority in terms of importance to you. (freetext box ) 3. search algorithms. it is important for the library to be able to change or tweak the platform’s native search algorithm to be able to promote desired items such that they appear higher in the returned list of [strongly disagree / disagree / neither agree or disagree / agree / strongly agree] [e.g. 
the library, at its option, could tweak one or more search algorithms to more heavily weight resources it wants to promote. for example, if a user searches for “hoover dam” the library could set a rule that would heavily weight and promote unlv digital collection images for hoover dam – those results would appear on the first page of results]. 4. statistics. the following statistic is important to have for the discovery platform [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  number of searches, by customizable timeframe number of item or article level records accessed (that is, a user clicks on something in the returned list of results)  number of searches generating 0 results investigations into library web-scale discovery services | vaughan 56  number of items accessed by type  number of items accessed by provider of content (that is, number of articles from particular database/fulltext vendor 5. statistics. what other statistics would you like to see a discovery platform provide and how important is this to you? (freetext box) 6. staff summary. please rank on a 1-3 scale how important the following elements are, with a “1” being most important, a “2” being 2nd most important, and a 3 being 3rd most important.  heavy customization capabilities as described in questions 1 & 2 above  ability to tweak search algorithms as described in question 3  ability for the system to natively provide detailed search stats such as described in question 4, 5. section 2. the “end user” side of the house 7. searching. which of the following search options is preferable when a user begins their search [choose one]  the system has a “google-like” simple search box  the system has a “google-like” simple search box, but also has an advanced search capability (user can refine the search to certain categories: author, journal, etc.)  no opinion 8. zero hit searches. for a search that retrieves no actual results: [choose one]  the system should suggest something else or ask, “did you mean?”  retrieving precise results is more important and the system should not suggest something else or ask “did you mean?”  no opinion 9. de-duplication of similar items. which of the following is preferable [choose one]  the system automatically de-dupes records (the item only appears once in the returned list)  the system does not de-dupe records (the same item could appear more than once in the returned list, such as when we have overlapping coverage of a particular journal from multiple subscription vendors)  no opinion information technology and libraries | march 2012 57 10. sorting of returned results. it is important for the user to be able to sort or reorder a list of returned results by . . [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  publication date  alphabetical by author name  alphabetical by title  full text items first  by media type (examples: journal, book, image, etc) 11. web 2.0 functionality on returned results. the following items are important for a discovery platform to have . . [strongly disagree / disagree / neither agree or disagree / agree / strongly agree] (note, if necessary, please conduct a search in the libraries’ encore system to help illustrate / remember some of the features/jargon mentioned below. in encore, “facets” appear on the left hand side of the screen; the results with book covers, “add to cart,” and “export” features appear in the middle; and a tag cloud to the right. 
note: this question is asking about having the particular feature regardless of which vendor, and not how well or how poorly you think the feature works for the encore system)  a tag cloud  faceted searching  ability to add user-generated tags to materials (“folksonomies”)  ability for users to write and post a review of an item • other (please specify) 12. enriched record information on returned results. the following items are important to have in the discovery system . . . [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  book covers for items held by the libraries  a google books preview button for print items held by the libraries  displays item status information for print items held by the libraries (example: available, checked out) 13. what the user can do with the results. the following functionality is important to have in the discovery system . . [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  retrieve the fulltext of an item with only a single click on the item from the initial list of returned results  ability to add items to a cart for easy export (print, email, save, export to refworks) investigations into library web-scale discovery services | vaughan 58  ability to place an interlibrary loan / link+ request for an item  system has a login/user account feature which can store user search information for later. in other words, a user could potentially log in to retrieve saved searches, previously stored items, or create alerts when new materials become available. 14. miscellaneous. the following feature/attribute is important to have in the discovery system . . . [strongly disagree / disagree / neither agree or disagree / agree / strongly agree]  the vendor has an existing mobile version of their discovery tool for use by smartphones or other small internet-enabled devices.  the vendor has designed the product such that it can be incorporated into other sites used by students, such as webcampus and/or social networking sites. such “designs” may include the use of persistent urls to embed hyperlinks, the ability to place the search box in another website, or specifically designed widgets developed by the vendor  indexing and availability of newly published items occurs within a matter of days as opposed to a week or perhaps a month.  library catalog authority record information is used to help return proper results and/or populate a tag cloud. 15. end user summary. please rank on a 1-8 scale how important the following elements are; a “1” means you think it is the most important, a “2” second most important, etc.  system offers a “google-like” simple search box only, as detailed in question 7 above  system offers a “did you mean?” or alternate suggestions for all searches retrieving 0 results as detailed in question 8 above (obviously, if you value precision of results over “did you mean” functionality, you would rank this toward the lower end of the spectrum). 
 system de-dupes similar items as detailed in question 9 above(if you believe the system should not dedupe similar items, you would rate this toward the lower end of the spectrum)  system provides multiple sort options of returned results as detailed in question 10 above  system offers a variety of web 2.0 features as detailed in question 11 above  system offer enriched record information as detailed in question 12 above  system offers flexible options for what a user can do with the results, as detailed in question 13 above  system has one or more miscellaneous features as detailed in question 14 above. section 3: content 16. incorporation of different information types. in an ideal world, a discovery platform would incorporate all of our electronic resources, whether locally produced or licensed/purchased from vendors. below is a listing of different information types. please rank on a scale of 1-10 how vital it is information technology and libraries | march 2012 59 that a discovery platform accommodate these information types (“1” is the most important item in your mind, a “2” is second most important, etc). a. innopac millennium records for unlv print & electronic holdings b. link+ records for print holdings held within the link+ consortium c. innopac authority control records d. records within oclc worldcat e. contentdm records for digital collection materials f. bepress digital commons institutional repository materials g. locally created web accessible database records (e.g. the special collections & architecture databases) h. electronic reserves materials hosted in eres i. a majority of the citation records from non fulltext, vendor licensed online index/abstract/citation databases (e.g. the “agricola” database) j. a majority of the fulltext articles or other research contained in many of our vendor licensed online resources (e.g. “academic search premier” which contains a lot of full text content, and the other fulltext resource packages / journal titles we subscribe to) 17. local content. related to item (g) in the question immediately above, please list any locally produced collections that are currently available either on the website, or in electronic format as a word document, excel spreadsheet or access database (and not currently available on the website) that you would like the discovery platform to incorporate. (freetext box) 18. particular sets of licensed resources, what’s important? please rank which of the licensed (full text or primarily full text) existing publishers below are most important for a discovery platform to accommodate. elsevier sage wiley springer american chemical society taylor & francis (informaworld) ieee american institute of physics oxford ovid nature emerald investigations into library web-scale discovery services | vaughan 60 section 4: survey summary 19. overarching survey question. the questions above were roughly categorized into three areas. given that no discovery platform will be everything to everybody, please rank on a 1-3 scale what the most important aspects of a discovery system are to you (1 is most critical, 2 is second in importance overall, etc.)  the platform is highly customizable by staff (types of things in area 1 of the survey)  the platform is highly flexible from the end-user standpoint (type of things in area 2 of the survey)  the platform encompasses a large variety of our licensed and local resources (type of things in area 3 of the survey) 20. additional input. 
the survey above is roughly drawn from a larger list of 71 questions sent to the discovery task force vendors. what other things do you think are really important when thinking about a next-generation discovery platform? (freetext input, you may write a sentence or a book) 21. demographic. what library division do you belong to? library administration library technologies research & education special collections technical services user services information technology and libraries | march 2012 61 appendix d. discovery task force: staff survey 2 question for the comparison questions, products are listed by order of vendor presentation. please mark an answer for each product. part i. licensed publisher content (e.g. fulltext journal articles; citations / abstracts) sa = strongly agree; a = agree; n= neither agree nor disagree; d = disagree; sd = strongly disagree 1. “the discovery platform appears to adequately cover a majority of the critical publisher titles.” sa a n d sd i don’t know enough about the content coverage for this product to comment ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 2. “the discovery platform appears to adequately cover a majority of the second-tier or somewhat less critical publisher titles.” sa a n d sd i don’t know enough about the content coverage for this product to comment ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 3. overall, from the content coverage point of view, please rank each platform from best to worst. worst 2nd worst middle 2nd best best ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 4. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the content coverage standpoint. unacceptable acceptable ex libris primo central investigations into library web-scale discovery services | vaughan 62 oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon part ii. end-user functionality & ease of use 5. from the user perspective, how functional do you think the discovery platform is? are the facets and/or other methods that one can use to limit or refine a search appropriate? were you satisfied with the export options offered by the system (email, export into refworks, print, etc.)? if you think web 2.0 technologies are important (tag cloud, etc.), were one or more of these present (and well executed) in this product? the platform appears to be severely limited in major aspects of end user functionality the platform appears to have some level of useful functionality, but perhaps not as much or as well executed as some competing products. yes, the platform seems quite rich in terms of end user functionality, and such functions are well executed. i can’t comment on this particular product because i didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 6. from the user perspective, for a full-text pdf journal article, how easy is it to retrieve the full-text? does it take many clicks? are there confusing choices? 
it’s very cumbersome trying to retrieve the full text of an item, there are many clicks, and/or it’s simply confusing when going through the steps to retrieve the full text. it’s somewhat straightforward to retrieve a full text item, but perhaps it’s not as easy or as well executed as some of the competing products it’s quite easy to retrieve a full text item using this platform, as good as or better than the competition, and i don’t feel it would be a barrier to a majority of our users. i can’t comment on this particular product because i didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. ex libris primo central information technology and libraries | march 2012 63 oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 7. how satisfied were you with the platform’s handling of “dead end” or “zero hit” searches? did the platform offer “did you mean” spelling suggestions? did the platform offer you the option to request the item via doc delivery / link+? is the vendor’s implementation of such features well executed, or were they difficult, confusing, or otherwise lacking? the platform appears to be severely limited in or otherwise poorly executes how it responds to a dead end or zero hit search. the platform handled dead end or zero hit results, but perhaps not as seamlessly or as well executed as some of the competing products. i was happy with how the platform handled “dead end” searches, and such functionality appears to be well executed, as good as or better than the competition. i can’t comment on this particular product because i didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, otherwise don’t have enough information. ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 8. how satisfied were you with the platform’s integration with the opac? were important things such as call numbers, item status information, and enriched content immediately available and easily viewable from within the discovery platform interface, or did it require an extra click or two into the opac – and did you find this cumbersome or confusing? the platform provides minimal opac item information, and a user the platform appeared to integrate ok with the opac in i was happy with how the platform integrated with the i can’t comment on this particular product because i didn’t see the investigations into library web-scale discovery services | vaughan 64 would have to click through to the opac to get the information they might really need; and/or it took multiple clicks or was otherwise cumbersome to get the relevant item level information terms of providing some level of relevant item level information, but perhaps not as much or as well executed as competing products. opac. a majority of the opac information was available in the discovery platform, and/or their connection to the opac was quite elegant. vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 9. 
overall, from an end user functionality / ease of use standpoint – how a user can refine a search, export results, easily retrieve the fulltext, easily see information from the opac record – please rank each platform from best to worst. worst 2nd worst middle 2nd best best ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 10. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the user functionality / ease of use standpoint. unacceptable acceptable ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon part iii. staff customization information technology and libraries | march 2012 65 11. the “out of the box” design demo’ed at the presentation (or linked to the discovery wiki page – whichever particular implementation you liked best for that product) was . . seriously lacking and i feel would need major design changes and customization by library web technical staff. middle of the road – some things i liked, some things i didn’t. the interface design was better than some competing products, worse than others. appeared very professional, clean, well organized, and usable; the appearance was better than most/all of the others products. i can’t comment on this particular product because i didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 12. all products offer some level of customization options that allow at least some changes to the “out of the box” platform. based on what the vendors indicated about the level of customization possible with the platform (e.g. look and feel, ability to add library links, ability to embed the search box on a homepage) do you feel there is enough flexibility with this platform for our needs? the platform appears to be severely limited in the degree or types of customization that can occur at the local level. we appear “stuck” with what the vendor gives us – for better or worse. the platform appeared to have some level of customization, but perhaps not as much as some competing products. yes, the platform seems quite rich in terms of customization options under our local control; more so than the majority or all of the other products. i can’t comment on this particular product because i didn’t see the vendor demo, don’t have enough information, and/or would prefer to leave this question to technical staff to weigh in on. ex libris primo central oclc worldcat local ebsco discovery services innovative encore investigations into library web-scale discovery services | vaughan 66 synergy serials solutions summon 13. overall, from a staff customization standpoint – the ability to change the interface, embed links, define facet categories, define labels, place the searchbox in a different webpage, etc., please rank each platform from best to worst. worst 2nd worst middle 2nd best best ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon 14. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the staff customization standpoint. 
unacceptable acceptable ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon part iv. summary questions 15. overall, from a content coverage, user functionality, and staff customization standpoint, please rank each product from best to worst. worst 2nd worst middle 2nd best best ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon information technology and libraries | march 2012 67 16. regardless of a best to worst ranking, please indicate if the products were, overall, acceptable or unacceptable to you from the overall standpoint of content coverage, user functionality, and staff customization standpoint. unacceptable acceptable ex libris primo central oclc worldcat local ebsco discovery services innovative encore synergy serials solutions summon part v. additional thoughts 17. please share any additional thoughts you have on ex libris primo central. (freetext box) 18. please share any additional thoughts you have on oclc worldcat local. (freetext box) 19. please share any additional thoughts you have on ebsco discovery services. (freetext box) 20. please share any additional thoughts you have on innovative encore synergy. (freetext box) 21. please share any additional thoughts you have on serials solutions summon. (freetext box) investigations into library web-scale discovery services | vaughan 68 appendix e. discovery task force: early adopter reference questions author’s note: appendix e originally appeared in the january 2011 library technology reports: web scale discovery services as chapter 7, “questions to consider.” part 1 background 1. how long have you had your discovery service available to your end users? (what month and year did it become generally available to your primary user population, and linked to your public library website). 2. after you had selected a discovery service, approximately how long was the implementation period – how long did it take to “bring it up” for your end‐users and make it available (even if in ‘beta’ form) on your library website? 3. what have you named your discovery service, and is it the ‘default’ search service on your website at this point? in other words, regardless of other discovery systems (ils, digital collection management system, ir, etc.), has the new discovery service become the default or primary search box on your website? part 2 content: article level content coverage & scope “article level content” = articles from academic journals, articles from mainstream journals, newspaper content, conference proceedings, open access content 4. in terms of article level content, do you feel the preindexed, preharvested central index of the discovery platform adequately covers a majority of the titles important to your library’s collection and focus? 5. have you observed any particular strengths in terms of subject content in any of the three major overarching areas -humanities, social sciences, sciences? 6. have you observed any big, or appreciable, gaps in any of the three major overarching areas – humanities, social sciences, sciences? 7. have you observed that the discovery service leans toward one or a few particular content types (e.g. peer reviewed academic journal content; mainstream journal content; newspaper article content; conference proceedings content; academic open access content)? 8. 
are there particular publishers whose content is either not incorporated, (or not adequately incorporated), into the central index, that you’d like to see included (e.g. elsevier journal content)? 9. have you received any feedback, positive or negative, from your institution’s faculty, related to the content coverage within the discovery service? 10. taking all of the above questions into consideration, are you happy, satisfied, or dissatisfied with the scope of subject content, and formats covered, in the discovery platform’s central index? 11. in general, are you happy with the level of article level metadata associated with the returned information technology and libraries | march 2012 69 citation level results (that is, before one retrieves the complete full text). in other words, the product may incorporate basic citation level metadata (e.g. title, author, publication info), or it may include additional enrichment content, such as abstracts, author supplied keywords, etc. overall, how happy do you sense your library staff is with the quality and amount of metadata provided for a “majority” of the article level content indexed in the system? part 3 content: your local library resources 12. it’s presumed that your local library ils bib records have been harvested into the discovery solution. do you have any other local “homegrown” collections – hosted by other systems at your library or institution – whose content has been harvested into the discovery solution? examples would include digital collection content, institutional repository content, library subject guide content, or other specialized, homegrown local database content. if so, please briefly describe the content – focus of collection, type of content (images, articles, etc.), and a ballpark number of items. if no local collections other than ils bib record content have been harvested, please skip to question 15. 13. [for local collections other than ils bib records]. did you use existing, vendor provided ingestors to harvest the local record content (i.e. ingestors to transfer the record content, apply any transformations and normalizations to migrate the local content to the underlying discovery platform schema)? or did you develop your own ingestors from scratch, or using a toolkit or application profile template provided by the vendor? 14. [for local collections other than ils bib records]. did you need extensive assistance from the discovery platform vendor to help harvest any of your local collections into the discovery index? if so, regardless of whether the vendor offered this assistance for free or charged a fee, were you happy with the level of service received from the vendor? 15. do you feel your local content (including ils bib records) is adequately “exposed” during a majority of searches? in other words, if your local harvested content equaled a million records, and the overall size of the discovery platform index was a hundred million records, do you feel your local content is “lost” for a majority of end user searches, or adequately exposed? part 4 interface: general satisfaction level 16. overall, how satisfied are you and your local library colleagues with the discovery service’s interface? 17. do you have any sense of how satisfied faculty at your institution are with the discovery service’s interface? have you received any positive or negative comments from faculty related to the interface? 18. do you have any sense of how satisfied your (non-faculty) end-users are with the discovery service’s interface? 
have you received any positive or negative comments from users related to the interface? 19. have you conducted any end-user usability testing related to the discovery service? if so, can you provide the results, or otherwise some general comments on the results of these tests? 20. related to searching, are you happy with the relevance of results returned by the discovery service? have you noticed any consistent “goofiness,” or surprises with the returned results? if you could make a investigations into library web-scale discovery services | vaughan 70 change in the relevancy arena, what would it be, if anything? part 5 interface: local customization 21. has your library performed what you might consider any “major customization” to the product? or has it primarily been customizations such as naming the service, defining hyperlinks and the color scheme? if you’ve done more extensive customization, could you please briefly describe, and was the product architecture flexible enough to allow you to do what you wanted to do (also see question 22 below, which is related). 22. is there any particular feature or function that is missing or non-configurable within the discovery service that you wish were available? 23. in general, are you happy with the “openness” or “flexibility” of the system in terms of how customizable it is by your library staff? part 6: final thoughts 24. overall, do you feel your selection of this vendor’s product was a good one? do you sense that your users – students and faculty – have positively received the product? 25. have you conducted any statistics review or analysis (through the discovery service statistics, or link resolver statistics, etc.) that would indicate or at least suggest that the discovery service has improved the discoverability of some of your materials (whether local library materials or remotely hosted publisher content). 26. if you have some sense of the competition in the vendor discovery marketplace, do you feel this product offers something above and beyond the other competitors in the marketplace? if so, what attracted you to this particular product, what made it stand out? information technology and libraries | march 2012 71 appendix f. discovery task force: initial vendor investigation questions section 1: general / background questions 1. customer install base how many current customers do you have that have which have implemented the product at their institution? (the tool is currently available to users / researchers at that institution) how many additional customers have committed to the product? how many of these customers fall within our library type (e.g. higher ed academic, public, k-12)? 2. references can you provide website addresses for live implementations which you feel serve as a representative model matching our library type? can you provide references – the name and contact information for the lead individuals you worked with at several representative customer sites which match our library type? 3. pricing model, optional products describe your pricing model for a library type such as ours, including initial upfront costs and ongoing costs related to the subscription and technical support. what optional add-on services or modules (federated search, recommender services, enrichment services) do you market which we should be aware of, related to and able to be integrated with your web scale discovery solution? 4. 
technical support and troubleshooting briefly describe options customers have, and hours of availability, for reporting mission critical problems; and for reporting observed non mission-critical glitches. briefly describe any consulting services you may provide above and beyond support services offered as part of the ongoing subscription. (e.g. consulting services related to harvesting of a unique library resource for which an ingest/transform/normalize routine does not already exist). is there a process for suggesting enhancement requests for potential future incorporation into the product? 5. size of the centralized index. how many periodical titles does your preharvested, centralized index encompass? how many indexed items? 6. statistics. please describe what you feel are some of the more significant use, management or content related statistics available out-of-the-box with your system. investigations into library web-scale discovery services | vaughan 72 are the statistics counter compliant? 7. ongoing maintenance activities, local library staff. for instances where the interface and discovery service is hosted on your end, please describe any ongoing local library maintenance activities associated with maintaining the service for the local library’s clientele (e.g. maintenance of the link resolver database; ongoing maintenance associated with periodic local resource harvest updates; etc.) section 2: local library resources 8. metadata requirements and existing ingestors. what mandatory record fields for a local resource has to exist for the content to be indexed and discoverable within your platform (title, date)? please verify that your platform has existing connectors -ingest/transform/normalize tools and transfer mechanisms and/or application profiles for the following schema used by local systems at our library (e.g. marc 21 bibliographic records; unqualified / qualified dublin core, ead, etc.) please describe any standard tools your discovery platform may offer to assist local staff in crosswalking between the local library database schema and the underlying schema within your platform. our library uses the abc digital collection management software. do you have any existing customers who also utilize this platform, whose digital collections have been harvested and are now exposed in their instance of the discovery product? our library uses the abc institutional repository software. do you have any existing customers who also utilize this platform, whose digital collections have been harvested and are now exposed in their instance of the discovery product? 9. resource normalization. is content for both local and remote content normalized to a single schema? if so, please offer comments on how local and remote (publisher/aggregator) content is normalized to this single underling schema. to what degree can collections from different sources have their own unique field information which is displayed and/or figures into the relevancy ranking algorithm for retrieval purposes. 10. schedule. for records hosted in systems at the local library, how often do you harvest information to account for record updates, modifications, deletions? can the local library invoke a manual harvest of locally hosted resource records on a per-resource basis (e.g. 
from a selected resource – for example, if the library launches a new digital collection and want the records to be available in the new discovery platform shortly after they are available in our local digital collection management system, is there a mechanism to force a harvest prior to the next regularly scheduled harvest routine? after harvesting, how long does it typically take for such updates, additions, and deletions to be reflected in the searchable central index? information technology and libraries | march 2012 73 11. policies / procedures. please describe any general policies and procedures not already addressed which the local library should be aware of as relates to the harvesting of local resources. 12. consortial union catalogs. can your service harvest or provide access to items within a consortial or otherwise shared catalog (e.g. the inn-reach catalog). please describe. section 3: publisher and aggregator indexed content 13. publisher/aggregator agreements: general with approximately how many publishers have you forged content agreements with? are these agreements indefinite or do they have expiration dates? have you entered into any exclusive agreements with any publishers/aggregators (i.e. the publisher/aggregator is disallowed from forging agreements with competing discovery platform vendors, or disallowed from providing the same deep level of metadata/full text for indexing purposes). 14. comments on metadata provided. could you please provide some general comments on the level of data provided to you, for indexing purposes, by the “majority” of major publishers/aggregators with which you have forged agreements. please describe to what degree the following elements play a role in your discovery service: a. “basic” bibliographic information (article title/journal title/author/publication information) b. subject descriptors c. keywords (author supplied?) d. abstracts (author supplied?) e. full text 15. topical content strength do you feel there is a particular content area that you feel the service covers especially well or leans heavily toward (e.g. humanities, social sciences, sciences). do you feel there is a particular content type that you feel the service covers very well or leans heavily toward (scholarly journal content, mainstream journal content, newspapers, conference proceedings). what subject / content areas, if any, do you feel the service may be somewhat weak? are there current efforts to mitigate these weaknesses (e.g. future publisher agreements on the horizon)? 16. major publisher content agreements. are there major publisher agreements that you feel are especially significant for your service? if so, which publishers, and why (e.g. other discovery platform vendors may not have such agreements with those particular providers; the amount of content was so great that it greatly augmented the size and scope of your service; etc.) investigations into library web-scale discovery services | vaughan 74 17. content considered key by local library (by publisher). following is a list of some major publishers whose content the library licenses which is considered “key.” has your company forged agreements with these publishers to harvest their materials. if so please describe in general the scope of the agreement. how many titles are covered for each publisher? what level of metadata are they providing to you for indexing purposes (e.g. basic citation level metadata – title, author, publication date; abstracts; full text). a. ex. elsevier b. ex. sage c. ex. 
taylor and francis d. ex. wiley / blackwell 18. content considered key by local library (by title). following is a list of some major journal / newspaper titles whose content the library licenses which is considered “key.” could you please indicate if your central index includes these titles, and if so, the level of indexing (e.g. basic citation level metadata – title, author, publication date; abstracts; full text). a. ex. nature b. ex. american historical review c. ex. jama d. ex. wall street journal 19. google books / google scholar. do any agreements exist at this time to harvest the data associated with the google books or google scholar projects into your central index? if so, could you please describe the level of indexing (e.g. basic citation level metadata – title, author, publication date; abstracts; full text). 20. worldcat catalog. does your service include the oclc worldcat catalog records? if so, what level of information is included? the complete record? holdings information? 21. e-book vendors. does your service include items from major e-book vendors? 22. record information. given the fact that the same content (e.g. metadata for a unique article) can be provided by multiple sources (e.g. the original publisher of the journal itself, an open access repository, a database / aggregator, another database / aggregator, etc.), please provide some general comments on how records are built within your discovery service. for example: a. you have an agreement with a particular publisher/aggregator and they agree to provide you with rich metadata for their content, perhaps even provide you with indexing they’ve already done for their content, and may even provide you with the full text for you to be able to “deep index” their content. b. you’ve got an agreement with a particular publisher who happens to be the only publisher/provider of that content. they may provide you rich info, or they may provide you rather weak info. in any case, you choose to incorporate this into your service, as they are the only provider/publisher of the info. or, information technology and libraries | march 2012 75 alternately, they may not be the only publisher/provider of the info, but they are the only publisher/provider you’ve currently entered into an agreement with for that content. c. for some items appearing within your service, content for those items is provided by multiple different sources whom you’ve made agreements with. in short, there will be in some/many cases of overlap for unique items, such as a particular article title. in such cases, do you create a “merged/composite/super record” -where your service utilizes particular metadata from each of the multiple sources, creating a “strong” single record built from these multiple resources. 23. deduping. related to the question immediately above, please describe your services’ approach (or not) to deduplicating items in your central index. if your service incorporates content for a same unique item from more than one content provider, does your index retrieve and display multiple instances of the same title? or do you create a merged/composite/super record, and only this single record is displayed? please describe. section 4: open access content 24. open access content sources. does your service automatically include (out of the box, no additional charge) materials from open access repositories? if so, could you please list some of the major repositories included (e.g. 
arxiv e-prints; hindawi publishing corporation; the directory of open access journals; hathi trust materials; etc.). 25. open access content sources: future plans. in addition to the current open access repositories that may be included in your service, are there other repositories whose content you are planning to incorporate in the future? 26. exposure to other libraries’ bibliographic / digital collection / ir content. are ils bibliographic records from other customers using your discovery platform exposed for discoverability in the searchable discovery instance of another customer? are digital collection records? institutional repository records? section 5: relevancy ranking 27. relevancy determination. please describe some of the factors which comprise the determination of relevancy within your service. what elements play a role, and how heavily are they weighted for purposes of determining relevancy? 28. currency. please comment on how heavily currency of an item plays in relevancy determination. does currency weigh more heavily for certain content types (e.g. newspapers)? 29. local library influence. does the local library have any influence or level of control over the relevancy algorithm? can they choose to “bump up” particular items for a search? please describe. 30. local collection visibility. could you please offer some comments on how local content (e.g. ils bibliographic records; digital collections) remains visible and discoverable within the larger pool of content indexed by your service? for example, local content may measures a million items, and your centralized index may cover half a billion items. investigations into library web-scale discovery services | vaughan 76 31. exposure of items with minimal metadata. some items likely have lesser metadata than other items. could you please offer some comments on how your system ensures discoverability for items with lesser or minimal metadata. 32. full text searching. does your service offer the capability for the user to search the fulltext of materials in your service (i.e. are they searching a full text keyword index?) if so, approximately what percentage of items within your service are “deep indexed?” 33. please describe how your system deals when no hits are retrieved for a search. does your system enable “best-match” retrieval – that is, something will always be returned or recommended? what elements play into this determination; how is the user prevented from having a completely “dead-end” search? section 6: authentication and rights management 34. open / closed nature of your discovery solution. does your system offer an unauthenticated view / access? please describe and offer some comments on what materials will not be discoverable/visible for an unauthenticated user. a. licensed full text b. records specifically or solely sourced from abstract and indexing databases c. full citation information (e.g. an unauthenticated user may see just a title; an authenticated user would see fuller citation information) d. enrichment information (such as book image covers, table of contents, abstracts, etc.) e. other 35. exposure of non-licensed resource metadata. if one weren’t to consider and take into account any e-journal/publisher package/database subscriptions & licenses the local library pays for, is there a base index of citation information that’s exposed and available to all subscribers of your discovery service? 
this may include open access materials, and/or bibliographic information for some publisher / aggregator content (which often requires a local library license to access the full text). please describe. would a user need to be authenticated to search (and retrieve results from) this “base index?” approximately how large is this “base index” which all customers may search, regardless of local library publisher/aggregator subscriptions. 36. rights management. please discuss how rights management is initialized and maintained in your system, for purposes of determining whether a local library user should have access to the full text (or otherwise “full resolution” if a library doesn’t license the fulltext – such as resolution to a detailed citation/abstract). information technology and libraries | march 2012 77 our library uses the abc link resolver. our library uses the abc a-z journal listing service. our library uses the abc electronic resource management system. is your discovery solution compatible with one/all of these systems for rights management purposes? is one approach preferable to the other, or does your approach explicitly depend on one of these particular services? section 7: user interface 37. openness to local library customization. please describe how “open” your system is to local library customization. for example, please comment on the local library’s ability to a. rename the service b. customize the header and footer hyperlinks / color scheme c. choose which facet clusters appear d. define new facet clusters e. embed the search box in other venues f. create canned, pre-customized searches for an instance of the search box g. define and promote a collection, database, or item such that it appears at the top or on the first page of any search i. develop custom “widgits” offering extra functionality or download “widgits” from an existing user community (e.g. image retrieval widgits such as flickr integration; library subject guide widgits such as libguides integration; etc. j. incorporate links to external enriched content (e.g. google book previews; amazon.com item information) k. other 38. web 2.0 social community features. please describe some current web 2.0 social features present in your discovery interface (e.g. user tagging, ratings, reviews, etc.). what, if any, plans do you have to offer or expand such functionality in future releases? 39. user accounts. does your system offer user accounts? if so, are these mandatory or optional? what services does this user account provide? a. save a list of results to return to at a later time? investigations into library web-scale discovery services | vaughan 78 b. save canned queries for later searching? c. see a list of recently viewed items? d. perform typical ils functions such as viewing checked out items / renewals / holds? e. create customized rss feeds for a search 40. mobile interface. please describe the mobile interfaces available for your product. is it a browser based interface optimized for smallscreen devices? is it a dedicated iphone, android, or blackberry based executable application? 41. usability testing. briefly describe how your product incorporates published, established “best practices” in terms of a customer focused, usable interface. what usability testing have your performed and/or do you conduct on an ongoing basis? have any other customers that have gone live with your service completed usability testing that you’re aware of? 
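several of the questions above, particularly the rights-management question (no. 36), turn on how a discovery service hands a citation off to the library's link resolver so that the user reaches licensed full text or, failing that, a detailed citation or abstract. the fragment below is an illustrative sketch only and is not drawn from any vendor's response: it builds a standard openurl (z39.88-2004) request in python, and the resolver base url, referrer identifier, and sample citation are all hypothetical.

    # Sketch of a discovery-to-link-resolver handoff: citation metadata is encoded
    # as an OpenURL 1.0 (Z39.88-2004) key/encoded-value query and appended to the
    # library's resolver address. The base URL and rfr_id below are hypothetical.
    from urllib.parse import urlencode

    RESOLVER_BASE = "https://resolver.example.edu/openurl"  # hypothetical link resolver

    def build_openurl(article):
        """Build an OpenURL 1.0 query string for a journal article citation."""
        params = {
            "ctx_ver": "Z39.88-2004",
            "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
            "rft.genre": "article",
            "rft.atitle": article["title"],
            "rft.jtitle": article["journal"],
            "rft.issn": article["issn"],
            "rft.volume": article["volume"],
            "rft.spage": article["start_page"],
            "rft.date": article["year"],
            "rfr_id": "info:sid/example.edu:discovery",  # identifies the referring service
        }
        return RESOLVER_BASE + "?" + urlencode(params)

    if __name__ == "__main__":
        citation = {
            "title": "Giraffe behavior during the drought season",   # hypothetical article
            "journal": "Journal of Animal Studies",                  # hypothetical journal
            "issn": "1234-5678",
            "volume": "12",
            "start_page": "45",
            "year": "2010",
        }
        print(build_openurl(citation))

whether the resolver then offers full text, an interlibrary loan form, or only the citation is exactly the rights-management decision the question asks each vendor to explain.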
information technology and libraries | march 2012 79 appendix g: vendor websites and example implementations oclc worldcat local www.oclc.org/us/en/worldcatlocal/default.htm example implementations: lincoln trails library system www.lincolntrail.info/linc.html university of delaware www.lib.udel.edu university of washington www.lib.washington.edu willamette university http://library.willamette.edu serials solutions summon www.serialssolutions.com/summon example implementations: dartmouth college www.dartmouth.edu/~library/home/find/summon drexel university www.library.drexel.edu university of calgary http://library.ucalgary.ca western michigan university http://wmich.summon.serialssolutions.com ebsco discovery services www.ebscohost.com/discovery example implementations: james madison university www.lib.jmu.edu mississippi state university http://library.msstate.edu northeastern university www.lib.neu.edu university of oklahoma http://libraries.ou.edu investigations into library web-scale discovery services | vaughan 80 innovative interfaces encore synergy encoreforlibraries.com/tag/encore-synergy example implementations: university of nebraska-lincoln http://encore.unl.edu/iii/encore/home?lang=eng university of san diego http://sallypro.sandiego.edu/iii/encore/home?lang=eng scottsdale public library http://encore.scottsdaleaz.gov/iii/encore/home?lang=eng sacramento public library http://find.saclibrarycatalog.org/iii/encore/home?lang=eng ex libris primo central www.exlibrisgroup.com/category/primocentral example implementations: (note: example implementations are listed in alphabetical order. some implementations are more open to search by an external audience, based on configuration decisions at the local library level.) brigham young university scholarsearch www.lib.byu.edu (note: choose all-in-one search) northwestern university http://search.library.northwestern.edu vanderbilt university discoverlibrary http://discoverlibrary.vanderbilt.edu (note: choose books, media, and more) yonsei university (korea) wisearch: articles + library holdings http://library.yonsei.ac.kr/main/main.do (note: choose the articles + library holdings link. the interface is available in both korean and english; to change to english, select english at the top right of the screen after you have conducted a search and are within the primo central interface) information technology and libraries | march 2012 81 appendix h. vendor visit questions content 1. please speak to how well you feel your product stacks up against the competition in terms of the licensed full-text / citation content covered by your product. based on whatever marketplace or other competitive analysis you may have done, do you feel the agreements you’ve made with publishers equal, exceed, or trail the agreements other competitors have made? 2. from the perspective of an academic library serving undergraduate and graduate students as well as faculty, do you feel that there are particular licensed content areas your product covers very well (e.g. humanities, social sciences, sciences). do you feel there are areas which you need to build up? 3. what’s your philosophy going forward in inking future agreements with publishers to cover more licensed content? are there particular key publishers your index currently doesn’t include, but whom you are in active negotiations with? 4. 
we have several local content repositories, such as our digital collections in contentdm, our growing ir repository housed in bepress, and locally developed, web-searchable mysql databases. given the fact that most discovery platforms are quite new, do you already have existing customers harvesting their local collections, such as the above, into the discovery platform? have any particular, common problems surfaced in their attempts to get their local collections searchable and exposed in the discovery platform? 5. let’s say the library subscribes to an ejournal title – journal of animal studies -that’s from a publisher with whom you don’t have an agreement for their metadata, and thus, supposedly, don’t index. if a student tried to search for an article in this journal – “giraffe behavior during the drought season,” what would happen? is this content still somehow indexed in your tool? would the discovery platform invoke our link resolver? please describe. 6. our focus is your next generation discovery platform, and not on your “traditional” federated search product which may be able to cover other resources not yet indexed in your next generation discovery platform. that said, please briefly describe the role of your federated search product vis a vis the next generation discovery platform. do you see your federated search product “going away” once more and more content is eventually indexed in your next generation discovery platform? end user interface & functionality 7. are there any particular or unique look and feel aspects of your interface that you feel elevate your product above your competitors? if so, please describe. 8. are there any particular or unique functionality aspects of your product that you feel elevate it above the competition (e.g. presearch or postsearch refinement categories, export options, etc.) 9. studies show that end users want very quick access to full text materials such as electronic journal articles and ebooks. what is your product’s philosophy in regards to this? does your platform, in your opinion, provide seamless, quick access to full text materials, with a minimum of confusion? please describe. investigations into library web-scale discovery services | vaughan 82 related to this, does your platform de-dupe results, or is the user presented with a list of choices for a single, particular journal article they are trying to retrieve? in addition, please describe a bit how your relevancy ranking works for returned results. what makes an item appear first or on the first page of results? 10. please describe how “well” your product integrates with the library’s opac (in our case, innovative’s millennium opac). what information about opac holdings can be viewed directly in the discovery platform w/o clicking into the catalog and opening a new screen (e.g. call #, availability, enriched content such as table of contents or book covers?) in addition, our opac uses “scopes” which allow a user – if they choose – to limit at an outset (prior to a search being conducted) what collection they are searching. in other words, these scopes are location based, not media type based. for our institution, we have a scope for the main library, one for each of our three branch libraries, and a scope for the entire unlv collection. would your system be able to incorporate or integrate these pre-existing scopes in an advanced search mode? and/or, could these location based scopes appear as facets which a user could use to drill down a results list? 11. 
what is your platform's philosophy in terms of "dead-end searches"? does such a thing exist with your product? please describe what happens if a user a.) misspells a word, or b.) searches for a book, journal title, or article that our library doesn't own or license, but that we could acquire through interlibrary loan.

staff "control" over the end user interface

12. how "open" is your platform to customization or interface design tweaks desired by the library? are there any particular aspects that the library can customize with your product that you feel elevate it above your competitors (e.g. defining facet categories; completely redesigning the end-user interface with colors, links, logos; etc.)? what are the major things customizable by the library, and why do you think this is something important that your product offers?

13. how "open" is your platform to porting over to other access points? in other words, provided appropriate technical skills exist, can we easily embed the search box for your product into a different webpage? could we create a "smaller," more streamlined version of your interface for smartphone access?

overarching question

14. in summary, what are some of the chief differentiators of your product from the competition? why is your product the best and most worthy of serious consideration?

michael jay (emjay@ufl.edu) is information technology expert, software unit, information technology department; betsy simpson (betsys@uflib.ufl.edu) is chair, cataloging and metadata department; and doug smith (dougsmith@uflib.ufl.edu) is head, copy cataloging unit, cataloging and metadata department, george a. smathers libraries, university of florida, gainesville.

michael jay, betsy simpson, and doug smith

catqc and shelf-ready material: speeding collections to users while preserving data quality

libraries contract with vendors to provide shelf-ready material, but is it really shelf-ready? it arrives with all the physical processing needed for immediate shelving, then lingers in back offices while staff conduct item-by-item checks against the catalog. catqc, a console application for microsoft windows developed at the university of florida, builds on oclc services to get material to the shelves and into the hands of users without delay and without sacrificing data quality. using standard c programming, catqc identifies problems in marc record files, often applying complex conditionals, and generates easy-to-use reports that do not require manual item review.

a primary goal behind improvements in technical service workflows is to serve users more efficiently. however, the push to move material through the system faster can result in shortcuts that undermine bibliographic quality.
developing safeguards that maintain sufficiently high standards but don’t sacrifice productivity is the modus operandi for technical service managers. the implementation of oclc’s worldcat cataloging partners (wcp, formerly promptcat) and bibliographic record notification services offers an opportunity to retool workflows to take advantage of automated processes to the fullest extent possible, but also requires some backroom creativity to assure that adequate access to material is not diminished. n literature review quality control has traditionally been viewed as a central aspect of cataloging operations, either as part of item-byitem handling or manual and automated authority maintenance. how this activity has been applied to outsourced cataloging was the subject of a survey of academic libraries in the united states and canada. a total of 19 percent of libraries in the survey indicated that they forgo quality control of outsourced copy, primarily for government documents records. however, most respondents reported they review records for errors. of that group, 50 percent focus on access points, 30 percent check a variety of fields, and a significant minority—20 percent—look at all data points. overall, the libraries expressed satisfaction with the outsourced cataloging using the following measures of quality supplied by the author: accuracy, consistency, adequacy of access points, and timeliness.1 at the inception of oclc’s promptcat service in 1995, ohio state university libraries participated in a study to test similar quality control criteria with the stated goals of improving efficiency and reducing copyediting. the results were so favorable that the author speculated that promptcat would herald a future where libraries can “reassess their local practices and develop greater confidence in national standards so that catalog records can be integrated into local opacs with minimal revision and library holdings can be made available in bibliographic databases as quickly as possible.”2 fast forward a few years and the new incarnation of promptcat, wcp, is well on its way to fulfilling this dream. in a recent investigation conducted at the university of arkansas libraries, researchers concluded that error review of copy supplied through promptcat is necessary, but the error rate does not warrant discontinuance of the service. the benefits in terms of time savings far outweigh the effort expended to correct errors, particularly when the focus of the review is to correct errors critical to user access. while the researchers examined a wide variety of errors, a primary consideration was series headings, particularly given the problems cited in previous studies and noted in the article.3 with the 2006 announcement by the library of congress (lc) to curtail its practice of providing controlled series access, the cataloging community voiced great concern about the effect of that decision on user access.4 the arkansas study determined that “the significant number of series issues overall (even before lc stopped performing series authority work) more than justifies our concern about providing series authority control for the shelf-ready titles.” approximately one third of the outsourced copy across the three record samples studied had a series, and, of that group, 32 percent needed attention, predominantly taking the form of authority record creation with associated analysis and classification decisions.5 the overwhelming consensus among catalogers is that error review is essential. 
as far as can be determined, an underlying premise behind such efforts seems to be that it is done with the book in hand. but could there be a way to satisfy the concerns without the book in hand? certainly, validation tools embedded in library management systems provide protections whether records are manually entered or batch loaded, and outsourced authority maintenance services (for those who can use them) offer further control. but a customizable tool that allows libraries to target specific needs, both standards-based and local, without relying on item-by-item handling can contribute to an economy of scale demanded by an environment with shrinking budgets and staff to devote to manual bibliographic scrutiny. if that tool is viewed as part of a workflow stream involving local error detection at the receiving location as well as enhancement at the network level (i.e., oclc's bibliographic record notification service), then it becomes an important step in freeing catalogers to turn their attention to other priorities, such as digitized and hidden collections.

n local setting and workflow

the george a. smathers libraries at the university of florida encompasses six branches that address the information needs of a diverse academic research campus with close to fifty thousand undergraduate and graduate students. the technical services division, which includes the acquisitions and licensing department and the cataloging and metadata department, acquires and catalogs approximately forty thousand items annually. seeking ways to minimize the handling of incoming material, beginning in 2006 the departments developed a workflow that made it possible to send shelf-ready incoming material directly to the branches after check-in against the invoice. shelf-ready items represent approximately 30 percent of the libraries' purchased monographic resources at this time. by using wcp record loads along with vendor-supplied shelf-ready processing, the time from receipt to shelf has been reduced significantly because it is no longer necessary to send the bulk of the shipments to cataloging and metadata. exceptions to this practice include specific categories of material that require individual inspection. the vendor is asked to include a flag in books that fall into many of these categories:

■■ any nonprocessed book or book without a spine label
■■ books with spine labels that have numbering after the date (e.g., vol. 4, no. 2)
■■ books with cds or other formats included
■■ books with loose maps
■■ atlases
■■ spiral-bound books
■■ books that have the words "annual," "biennial," or a numeric year in the title (these may be a serial add to an existing record or part of a series that will be established during cataloging)

to facilitate a post-receipt record review for those items not sent to cataloging and metadata, acquisitions and licensing runs a local programming tool, catqc, which reports records containing attributes cataloging and metadata has determined necessitate closer examination. figure 1 is an example of the reports generated, which are viewed using the mozilla firefox browser.
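before turning to the specific criteria, it may help to see the shape of such a check in code. catqc itself is a console application written in standard c, as described later in this article; the fragment below is only an illustrative sketch in python, using the third-party pymarc library, of the kind of record-level tests a report like this is built from. the input file name is hypothetical, and only three of the nine criteria discussed in the next section are shown (less-than-full encoding level, 245 subfield h, and 490 with first indicator 0).

    # Illustrative sketch only (not the authors' C implementation): flag a few of
    # the conditions the CATQC report looks for in a batch of WCP MARC records.
    from pymarc import MARCReader

    MINIMAL_LEVELS = set("2357ejkm")   # encoding levels treated as less-than-full copy

    def flag_record(record):
        """Return a list of human-readable flags for one MARC record."""
        flags = []
        leader = str(record.leader)
        if leader[17].lower() in MINIMAL_LEVELS:
            flags.append("minimal-level copy (leader/17 = %s)" % leader[17])
        title = record["245"]
        if title is not None and title.get_subfields("h"):
            flags.append("245 has subfield h (possible non-print format)")
        for series in record.get_fields("490"):
            if series.indicators[0] == "0":
                flags.append("untraced series: " + str(series))
        return flags

    with open("wcp_batch.mrc", "rb") as fh:          # hypothetical WCP record file
        for number, record in enumerate(MARCReader(fh), start=1):
            if record is None:                       # skip records pymarc cannot parse
                continue
            problems = flag_record(record)
            if problems:
                print("record %d: %s" % (number, "; ".join(problems)))

a production tool would, as catqc does, also rewrite the record file where local policy requires it and render the flags as a formatted report rather than console output.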
copy catalogers rotate responsibility for checking the report and revising records when necessary. retrieval of the physical piece is only necessary in the 1 percent of cases where the item needs to be relabeled.

n catqc report

catqc analyzes the content of the wcp record file and identifies records with particular bibliographic coding, which are used to detect potential problems:

1. encoding levels 2, 3, 5, 7, e, j, k, m
2. 040 with non-english subfield b
3. 245 fields with subfields h, n, or p
4. 245 fields with subfields a or b that contain numerals
5. 245 fields with subfields a or b that contain red flag keywords
6. 246 fields
7. 490 fields with first indicator 0
8. 856 fields without subfield 3
9. 6xx fields with second indicators 4, 5, 6, and 7

the numbers following each problem listed below indicate which codes are used to signal the presence of a potential problem.

minimal-level copy (1)
the library's wcp profiles, currently in place for three vendors, are set up to accept all oclc encoding levels. with such a wide-open plan, it is important to catch records with minimal-level copy to assure that appropriate access points exist and are coded correctly. the library encounters these less-than-full encoding levels infrequently.

parallel records (2)
catqc identifies foreign library records that are candidates for parallel record treatment by indicating in the report if the 040 has a non-english subfield b. the report includes a 936 field if present to alert catalogers that a parallel record is available.

volume sets (3, 4, 5)
the library does not generally analyze the individual volumes of multipart monographic sets (i.e., volume sets) even when the volumes have distinctive titles. these volumes are added to the collection under the title of the set. the june 2006 decision by lc to produce individual volume records when a distinctive title exists caused concern about the integrity of the libraries' existing open volume set records. because such records typically have enumeration indicated in the subfield n, and sometimes p, of the 245 field, the program searches for instances of those subfields. in addition, the program detects the presence of numerals in the 245 and keywords such as "volume," "part," and "number" as well as common abbreviations of those words (e.g., v. or vol.).

figure 1. an example report from catqc

serial vs. monograph treatment (4, 5)
titles owned by the library and classified as serials sometimes are ordered inadvertently as monographs, resulting in the delivery of a monographic record. a similar problem also occasionally arises with new titles. by detecting numerals, keywords, or the presence of one or more of the subfields in the 245 field, we can quickly scan a list of records with these characteristics. of course, most of the records detected by catqc are false hits because of the broad scope of the search; however, it takes only a few minutes to scan through the record list.

non-print formats (3)
the library does not receive records for any format other than print through wcp. consequently, detecting the presence of a subfield h in the 245 field is a good signal that there may be a problem with the record.

alternate titles (6)
alternate titles can be an important access point for library users. sometimes text that should properly be in subfield i (e.g., "at head of title") of the 246 field is placed in subfield a in front of the alternate title. this adversely affects user access to the title through browse searching. catqc checks for and reports the presence of a 246 field. the cataloger can then quickly confirm that it is coded correctly.

untraced series (7)
as a program for cooperative cataloging (pcc) participant, the library opted to follow pcc practice to continue to trace series despite lc's decision in 2006 to treat as untraced all series statements in newly cataloged records. because some libraries chose to follow lc in its decision, there has been an overall increase in the use of untraced series statements across all types of record-encoding levels. to address this issue, catqc searches all wcp records for 490 fields with first indicator 0. catalogers check the authority files for the series and make any necessary changes to the records. this is by far the most frequent correction made by catalogers.

links (8)
to provide users with information about the nature of the urls displayed in the catalog, catalogers insure that explanatory text is recorded in subfield 3 of the 856 field. catqc looks for the absence of subfield 3, and, if absent, displays the 856 field in the report as a hyperlink. the cataloger adds the appropriate text (e.g., full text) as needed.

subject headings with second indicators 4, 5, 6, and 7 (9)
the catqc report reviewed by catalogers includes subject headings with second indicator 4. when these headings duplicate headings already on the record, catalogers delete them from our local system. when the headings are not duplicates, the catalogers change the second indicator 4 to 0. typically, 6xx fields with second indicators 5, 6, and 7 contain non-english headings based on foreign thesauri. these headings can conflict with lc headings and, in some cases, are cross references on lc authorities. the resulting split files are not only confusing to patrons, but also add to the numbers of errors reported that require authority maintenance. for these reasons, our policy is to delete the headings from our local system. catqc detects the presence of second indicators 5, 6, or 7 and creates a modified file with the headings removed with one exception: a heading with second indicator 7 and subfield 2 of "nasat," which indicates the heading is taken from the national aeronautics and space administration thesaurus, is not removed because the local preference is to retain the "nasat" headings.

n library-specific issues

catqc resolves local problems when needed. for example, when more than one lc call number was present on the record, the wcp spine manifest sent to the vendor used to contain the second call number, which was affixed to the item. when the wcp records were loaded into the library's catalog, the first call number populated the holding. as a result, there was a discrepancy between the spine label on the book and the call number in the catalog. prior to generating the report, catqc found multiple instances of call numbers in the records in the wcp file and created a modified file with the call numbers reordered so that the correct call number was used on the holding when the record was loaded. previously, the library's opac did not display the text in subfield 3 of the 856 field, which specifies the type of material covered by the link, and to the user it appeared that the link was to a full-text resource.
this was particularly troublesome for records with lc links to table of contents, publisher descriptions, contributor information, and sample text. to prevent user frustration, catqc was programmed to move the links on the wcp records to 5xx fields. when the opac interface improved and the programming was no longer necessary, catqc was revised. n analysis to see how well catqc and oclc’s bibliographic notification service were meeting our goal of maintaining high-quality bibliographic control, 63 reports were randomly selected from the 171 reports generated by catqc between october 2007 and april 2008. catqc found no problems in twelve (19 percent) of the selected reports. these twelve were not used in the analysis, leaving fifty-one catqc reports examined with at least one potential problem flagged for review. an average of 35.6 percent of the records in the sample of reports was flagged as requiring review by a cataloger. an average of thirteen possible problems was detected per report. of these, 55 percent were potential problems requiring at least some attention from the cataloger. the action required of the cataloger varied from simply checking the text of a field displayed in the report (e.g., 246 fields) to bringing up the record in aleph and editing the bibliographic record (e.g., verifying and correcting series headings or eliminating unwanted subject headings). why the relatively high rate of false positives (45 percent)? to minimize missing serials and volumes belonging to sets, catqc is designed to err on the side of caution. two of the criteria listed earlier were responsible for the vast majority of the false positives generated by catqc: 245 fields with subfields a or b that contain numerals and 245 fields with subfields a or b that contain red-flag keywords. clearly, if every record with a numeral in the 245 is flagged, a lot of hits will be generated that are not actual problems. the list of keywords was purposefully designed to be extensive. for example, “volume,” “vol.,” and “v.” are all triggers causing a record to be flagged. therefore a bibliographic record containing the phrase “volume cost profit analysis” in the 245 field would be flagged as a potential problem. at first glance, a report filled with so many false positives may seem inefficient and burdensome for catalogers to use; however, this is largely mitigated by the excellent display format. the programmer worked closely with catcq and shelf-ready material | jay, simpson, and smith 45 the copy cataloging unit staff to develop a user-friendly report format. each record is framed separately, making it easy to distinguish from adjoining records. potential problems are highlighted with red lettering immediately alerting catalogers to what the potential problem might be. whenever a potential problem is found, the text of the entire field appears in the report so that catalogers can see quickly whether the field triggering the flag is an actual problem. it takes a matter of seconds to glance through the 245 fields of half a dozen records to see if the numeral or keyword detected is a problem. the catalogers who work with these reports estimated that it took them between two and three hours per month to both review the files and make corrections to bibliographic records. a second component of bibliographic quality maintenance is oclc’s bibliographic record notification service. this service compares newly upgraded oclc records with records held by the library and delivers the upgraded records to the library. 
because catqc flags records with encoding levels of 2, 3, 5, 7, e, j, k, and m, it was possible to determine if these records had, in fact, been upgraded in oclc. in the sample, thirty-three records were flagged because of the encoding level. no upgrade had been made to 21.2 percent of the records in oclc as of august 2008. upgrades had been made to 45.5 percent of the records. the remaining 33.3 percent of the records were manually loaded by catalogers in copy cataloging. these typically are records for items brought to copy cataloging by acquisitions and licensing because they meet one or more of the criteria for individual inspection discussed previously. when catalogers search oclc and find that the received record has not been upgraded, they search for another matching record. a third of the time, a record of higher quality than that received is found in oclc and exported to the catalog. the reason why the record of better quality is not harvested initially is not clear. it is possible that at the time the records were harvested both records were of equivalent quality and by chance one was enhanced over another. in no instance had any of the records originally harvested been upgraded (this is not reflected in the 21.2 percent of records not upgraded). encoding level 8 records are excluded from catqc reports. because of the relatively quick turnaround for upgrades of this type of copy, the library decided to rely solely on the bibliographic record notification service. n technical specifications catqc is a console application for windows. written in standard c, it is designed to be portable to multiple operating systems with little modification. no graphic interface was developed because (a) the users are satisfied with the current operating procedure and (b) the treatment of the records is predefined as a matter of local policy. the user opens a command console (cmd.exe) and types “catqc”+space+“[name of marc file]”+enter. the corrected file is generated; catqc analyzes the modified file and creates the xml report. it moves the report to a reviewing folder on a file server across the lan and indicates to the user that it is terminating. modifications require action by a programmer; the user cannot choose from a list of options. benefits include a 100 kb file size and a processing speed of approximately 1,000 records per second. no quantitative analysis has yet been done related to the speed of processing, but to the user the entire process seems nearly instantaneous. the genesis of the project was an interest in the record structure of marc files brought about in the programmer by the use of earlier local automation tools. the project was speculative. the first experiment contained the programming structure that would become catqc. one record is read into memory at a time, and there is another array held for individual marc fields. conceptually, the records are divided into three portions—leader, directory, and dataset—when the need arises to build an edited record. initially there was no editing, only the production of the report. the generation of strict, valid xml is a significant aspect of catqc. an original document type was created, along with a corresponding cascading style sheet. the reports are viewable to anyone with an xml–capable browser either through file server, web server, or e-mail. (the current version of internet explorer does not fully support the style sheet syntax.) 
this continues to be convenient for the report reviewers because they do not have to be client application operators. see appendix a for an excerpt of a document instance and appendix b for the document type definition. catqc is not currently a generalized tool such as marcedit, a widely used marc editing utility that provides a standard array of basic capabilities: field counting, field and subfield deletion (with certain conditional checks), field and subfield additions, field swapping and text replacement, and file conversion to and from various formats such as marcxml and dublin core as well as between marc-8 and utf-8 encodings.6 marcedit continues to grow and does offer programmability that relies on the windows scripting host. this requires the user to either learn vbscript or use the wizards offered by marcedit. the catqc development goal was to create a report, viewable through a lan or the internet, which alerts a group of catalogers to potential problems with specific records, often illustrating those problems. although it might have been possible to use a combination of marcedit capabilities and local programming to help achieve this goal, it likely would have been a more cumbersome route, particularly taking into consideration the multidimensional conditionals desired. it was deemed easier to write a program that addresses local needs directly in a language already familiar to the programmer. as catqc evolved, it was modified to identify more potential problems and to do more logical comparisons as well as to edit the files as necessary before generating the reports. catqc addresses a particular workflow directly and provides one solution. it is procedural as opposed to event driven or object oriented. with version 1.3, the generic functions were extracted into marclib 1.0, a common object file format library. functions specific to local workflow remain in catqc. the program is freely available to interested libraries by contacting the authors. as of this writing, the university of florida plans to distribute this utility under the gnu public license version 3 (see www.opensource.org/licenses/gpl-3.0.html) while retaining copyright.

n conclusion

catqc provides catalogers an easy way to check the bibliographic quality of shelf-ready material without the book in hand. as a result, throughput time from receipt to shelf is reduced, and staff can focus data review on problem areas—those affecting access or interfering with local processes. some of the issues addressed by catqc are of concern to all libraries while others reflect local preferences. the program could be easily modified to conform to those preferences. automation tools such as catqc are of key importance to libraries seeking ways to streamline workflows to the benefit of users.

references and notes

1. vinh-the lam, "quality control issues in outsourcing cataloging in united states and canadian academic libraries," cataloging & classification quarterly 40, no. 1 (2005): 101–22.
2. mary m. rider, "promptcat: a projected service for automatic cataloging—results of a study at the ohio state university libraries," cataloging & classification quarterly 20, no. 4 (1995): 43.
3. mary walker and deb kulczak, "shelf-ready books using promptcat and ybp: issues to consider (an analysis of errors at the university of arkansas)," library collections, acquisitions, & technical services 31, no. 2 (2007): 61–84.
4. "lc pulls plug on series authority records," cataloging & classification quarterly 43, no. 2 (2006): 98–99.
5. walker and kulczak, "shelf-ready books."
6. for more information about marcedit, see http://oregonstate.edu/~reeset/marcedit/html/index.php.

appendix a. catqc document instance excerpt

wcp file analysis: 201 records analyzed.
record: 71
oclc number: 243683394
timestamp: 20080824000000.0
245: 10 |a difference algebra /|c levin alexander.
245 h  245 n  245 p  numerals  keywords
490: 0 |a algebras and applications ;|v v. 8
. . .

appendix b. catqc document type definition

academic uses of google earth and google maps in a library setting

eva dodsworth and andrew nicholson

abstract

over the last several years, google earth and google maps have been adopted by many academic institutions as academic research and mapping tools. the authors were interested in discovering how popular the google mapping products are in the academic library setting. a survey was conducted to establish the mapping products' popularity and type of use in an academic library setting. results show that over 90 percent of the respondents use google earth and google maps either to help answer research questions, to create and access finding aids, for instructional purposes or for promotion and marketing. the authors recommend expanding the mapping products' user base to include all reference and liaison librarians.

introduction

since their launch in 2005, google maps and google earth have had an enormous impact on the way people think, learn, and work with geographic information. with easy access to spatial and cultural information, google maps/earth has provided users with the means to understand their world and their communities of interest. moreover, the customizable map features and dynamic presentation tools found in google maps and google earth make each one an attractive option for someone wanting to teach geographic information or make customized maps. for academic researchers, google mapping applications are also appealing for their powerful ability to share and host projects, create customized kml (keyhole markup language) files, and to easily communicate their own research findings in a geographic context. recognizing their potential for revitalizing map collections and geographic education, the authors felt that many academic libraries were also going to be active in using google maps/earth for a variety of purposes, from promoting their services to developing their own google kml files for users. with google earth's ease of use and visualization capabilities, it was even thought that academic libraries would be using google earth heavily in instruction classes bringing geographic information to subject areas traditionally outside of geography. as active users of google maps/earth in their roles as academic librarians at their universities, the authors became curious to know what other academic librarians were doing with google maps/earth, particularly those working with maps and/or geography subjects.
were they using eva dodsworth (edodsworth@uwaterloo.ca) is geospatial data services librarian, university of waterloo library, waterloo, and andrew nicholson (andrew.nicholson@utoronto.ca) is gis/data librarian, hazel mccallion academic learning centre, university of toronto mississauga, ontario mailto:edodsworth@uwaterloo.ca mailto:andrew.nicholson@utoronto.ca information technology and libraries | june 2012 103 the technology as part of their librarian roles on campus? how were they using it? what impacts was it having in how they delivered library services? to help answer these questions, the authors set out on a three-stage process with the aim of providing a more complete picture of google maps/earth use in academic libraries. the first stage consisted of a literature search focusing on library and information science research databases, to see what (if any) scholarly research had been written that discussed the role of google maps/earth in academic libraries. the second stage of the research had the authors examining over a dozen academic library websites to assess how they were integrating google maps/earth either through an api plug-in on their website or advertising other google maps/earth related services and collections. the third stage had the authors compile a set of twenty survey questions which were then distributed to academic librarians across canada and the united states, probing the use of google mapping products in the academic library setting. literature review despite the ubiquity of google for information searching, there was a surprising paucity of literature that documents the impact of google maps/earth in academic libraries. nevertheless, there are some articles which indicate just how much google maps can help raise the profile of library services. terry ballard, a librarian at quinnipiac university, describes in a few articles how he and colleagues were able to use google earth placemarks to promote his library’s special collections.1 the potential for “discovering the library with google earth” is also a theme in an article by brenner and klein in which the portland state university library linked their urban planning documents collection to google earth for ease of searching.2 although the focus is on public libraries, michael vandenburg documents how his library system began “using google maps as an interface for the library catalogue.” in his article, vandenburg discusses that the inspiration for such a project came about through various google maps mashups that were popular on search oriented websites such as “housing maps,” which combined realtor listings from craigslist with a google maps api. using api coding, vandenburg was able to link latitude and longitude data of countries to individual opac records enabling a visual search for items at the country level.3 while these articles focused on use of google earth as a collection discovery tool, troy swanson notes the visualization aspects of the applications and their utility for teaching information literacy. swanson has students use google earth and second life as tools to create a virtual exhibit on malcolm x. although swanson notes that the final output by the students did not meet the initial expectations, valuable learning opportunities for teaching in a 3d space were recognized and should be pursued. 
4 some of these opportunities are highlighted as case studies by lamb, noting the visualization aspects of google earth would be very useful for librarians providing instruction.5 academic uses of google earth and google maps in a library setting | dodsworth and nicholson 104 google maps/earth & academic libraries: a scan of selected library websites for the next stage, the authors performed an environmental scan of academic library websites to see how they are using and implementing google mapping technology into their services. many are doing creative and innovative project work which will, we hope, encourage and guide other libraries to consider doing something similar. mapping technology can be used in several different ways, and with internet users becoming more proficient using this technology, libraries have the opportunity to take advantage of this communication medium. any document or image that has a geographic component can be digitized and made easily accessible using online mapping technology. the following section will review some of the projects highlighted on websites. the projects can be grouped into the following categories: finding aids, collection distribution, and teaching and reference services. finding aids all collections in libraries require some sort of finding aid to locate library material—the most obvious one being the library catalog. however, there are many location-based materials that use customized finding aids such as map and air photo indexes, and geospatial data coverage maps. for several years now libraries have been trying to make access to the finding aids easier by digitizing them and offering them online. not only are online versions easily updatable, but they are quite often created using google technology, allowing for the use of modern basemaps and zoom capabilities. traditional paper indexes can be difficult to navigate, especially the historical ones, making the search process rather difficult for users and library staff. one of the most popular types of online indexes created by libraries is air photo indexes. most map libraries collect air photos, and many use similar indexes to help locate aerial photography for an area of interest. several libraries have digitized the indexes making the same information available online. users simply zoom into a geographical area and click on a point to retrieve the photo information they need in order to locate the air photo in the library collection. some libraries will even send an electronic copy of the photo to the users. the mcgill university library, for example, has made its air photo information available from their webpage in a kml format to be viewed in google earth. users can click on a point of interest to easily obtain the air photo information. mcgill library has also digitized topographic indexes, making them also available via google earth.6 the university of western ontario’s serge a. sauer’s library also provides its air photo indexes online, incorporating google maps directly into their website. placemarks representing individual photos have been inserted on a google map, along with the photo description so that when users click on the placemark, photo information is released. using google mapping technology to offer online finding aids that are searchable by location is an innovative and cost-free step towards collection accessibility. 
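as a concrete illustration of the air photo indexes described above, the short python sketch below writes a kml file containing one placemark per photograph, with the flight line and frame information in the description balloon. it uses only the python standard library; the sample photo identifiers, file name, and coordinates are invented for illustration and are not taken from any of the collections mentioned.

    # Sketch of a location-searchable finding aid: one KML placemark per air photo,
    # so users can click a point in Google Earth and read the index information.
    import xml.etree.ElementTree as ET

    KML_NS = "http://www.opengis.net/kml/2.2"

    def build_kml(photos):
        """Return an ElementTree holding a KML document of photo placemarks."""
        ET.register_namespace("", KML_NS)
        kml = ET.Element("{%s}kml" % KML_NS)
        doc = ET.SubElement(kml, "{%s}Document" % KML_NS)
        for photo in photos:
            pm = ET.SubElement(doc, "{%s}Placemark" % KML_NS)
            ET.SubElement(pm, "{%s}name" % KML_NS).text = photo["id"]
            ET.SubElement(pm, "{%s}description" % KML_NS).text = (
                "flight line %s, photo %s (%s)" % (photo["line"], photo["frame"], photo["year"])
            )
            point = ET.SubElement(pm, "{%s}Point" % KML_NS)
            # KML coordinates are longitude,latitude[,altitude]
            ET.SubElement(point, "{%s}coordinates" % KML_NS).text = (
                "%f,%f" % (photo["lon"], photo["lat"])
            )
        return ET.ElementTree(kml)

    photos = [  # invented sample index entries
        {"id": "A12345-067", "line": "A12345", "frame": "67", "year": "1954", "lon": -80.52, "lat": 43.47},
        {"id": "A12345-068", "line": "A12345", "frame": "68", "year": "1954", "lon": -80.50, "lat": 43.47},
    ]
    build_kml(photos).write("airphoto_index.kml", encoding="utf-8", xml_declaration=True)

the resulting file can be posted on a library webpage for download into google earth, or the same placemark data can be loaded into a google maps api page, which is the approach the website examples above take.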
what would make these types of library collections even more accessible, however, is offering users online access to digital versions of the collection items themselves. so to bring the indexing project one step forward, not only would the photo reference information be made available, but the actual image would be too, information technology and libraries | june 2012 105 thereby allowing libraries to use google mapping technology as an avenue for collection distribution and delivery. collection delivery libraries have had digital collections for quite some time. many of course do not need to digitize resources themselves as they subscribe to products such as electronic journals and books. however there are still some less common collections that are physically housed in libraries that would be much more accessible to users if they were exposed and made available online. an internet search has shed light on numerous digitization projects that use google mapping technology to search for and deliver location-based collections. examples of these types of collections include historical maps and air photos, archived photos and postcards, audio interviews, community information, textual documents like letters and diaries, and gis data. mcmaster university library is one example of a library that has digitized a historical map collection and made it available online. an index to its world war i military maps and aerial photography was created using google maps, and was embedded into its webpage.7 users can click on an area of interest to bring up the corresponding high resolution map image. likewise, brock university library has also offered its historical air photo collection online, allowing users to search using a google map, and then download photos of interest.8 additionally, yale university library has created kml indexes of its fire insurance plans, with direct links to the digitized images.9 the university of connecticut library has digitized its local historical maps and using google maps had created a map mashup which includes historic landmarks. clicking on the landmarks provides users with links to related resources. several libraries have digitized other imagery, such as postcards and photography. this is particularly popular with archival and specialized collections. the university of vermont library has embedded a google map into its website with placemarks that when clicked lead the user to the library’s long trail collection, an assortment of over 900 images of the oldest long-distance hiking trail in the united states. the images have been digitized from hand-colored lantern slides.10 cleveland state university library has also done something similar with its cleveland memory project, in which google maps were embedded into the library webpage and placemarks of local historic landmarks added. when users click on the placemarks, they are able to access a description of the landmark along with a photograph of it. clicking on “more information” will lead the user to several related resources, including the library catalog, where original documents about the location are available (e.g., images, books).11 besides digitizing their collections, some libraries have also georeferenced them so that they could not only be accurately located using an index, but so that they could be viewed in google earth (kml format). 
offering collections in kml format greatly increases exposure and use of geographic resources because google earth is one of the more popular location-based applications used by library users and the public. geographic files such as georeferenced air photos and satellite images, as well as gis data used to be only viewed in specialized gis programs. but gis technology has evolved into so many online applications, offering all computer users the benefits of geographic information and a platform to distribute information. academic uses of google earth and google maps in a library setting | dodsworth and nicholson 106 the university of waterloo map library is one example of a library that had digitized its historical air photo collection and made the images available in kml format for google earth usage.12 users can access a map index of the available photos from the map library webpage and then click on the index to download the images. the university of north carolina library has georeferenced several historical maps and made them available for viewing as an image overlay in google maps. this particular mapping project consists of around 150 thematic maps, including historical soil surveys, road and highway maps, city/county maps, and more. users can take advantage of the georeferenced maps and accurately compare historical features to modern ones with google maps’ basemap. having a preview of the dataset before it is downloaded assists the user in downloading only what is needed.13 perhaps more popular than a library’s air photo collection are libraries’ collections of geospatial data. geospatial, or gis data, has traditionally been only used by users who have access to gis programs such as esri’s arcgis, or arcview. more recently, librarians have discovered that when spatial files are converted into easy-to use file formats, such as kml, the user group is broadened and the files are used more. so it is no surprise that several libraries have converted their gis shapefiles (a spatial data file format used specifically in gis programs) into kml files and made them available for download from their webpages. university of connecticut library offers its gis files online in various formats, including kml. it also provides a sample image of the gis layer in google maps.14 baruch college at the city university of new york has made neighborhood census data available in google maps. the geographic boundary files were overlaid in google maps, and clicking on the map will lead users to the files available from the american census bureau’s fact finder. clearly, many libraries have incorporated google mapping technology into their digitization projects. the technology has proven capable of attracting collections that are not strictly locationfocused such as maps and air photos, but that have a location associated with it, such as archival photos of community landmarks or books written about a specific locale. google mapping technology makes the organization and storage of collections relatively effortless for library project managers, and it makes collection searching and distribution simple and friendly for the users. other uses of google maps/earth in libraries perhaps one of the simplest uses of google mapping technology can be illustrated by visiting several library websites. 
many libraries have embedded google maps into their website as either a webpage header15 survey: what are academic library staff doing with google maps/earth?16 following the review of the literature and academic library websites, the authors wanted to discover how academic librarians themselves were using google maps and google earth in their work, if at all. to capture this data, the authors compiled a set of survey questions targeting those in the academic library community who work with maps, gis, or geography/geology/earth science subject matter. information technology and libraries | june 2012 107 in preparing the survey questions, the authors were aware of a “survey fatigue” among the academic library community. at the time of research, many surveys were going out to librarians requesting their time and responses, so the authors wanted to keep the survey concise both in terms of number of questions, but also in the types of questions. in the end, the survey was created with twenty questions consisting of six yes/no questions, seven multiple choice, and the remaining seven questions being short answer. for distributing the survey, the authors wanted to reach as many librarians who worked with maps, geospatial data and government document subject matter as possible. the survey was then distributed on specialized map library and government publication listservs, including maps-l, govinfo, gis4lib, and carta (canadian maps & air photo systems forum). the survey was also distributed on the members’ only lists belonging to the association of canadian map libraries & archives (acmla) and the western association of map libraries (waml) listservs. the survey was made available on survey monkey for two months from december 2010 to the end of january 2011. the responses with the survey available during a quieter period of library activities, and thanks to a couple of reminder emails being sent out on the lists, our questionnaire received a total of 83 responses. who is using google maps/earth? the first couple questions dealt with the department or area of the library in which the respondent worked in, and what their position encompassed. as expected, a large majority of respondents, 81 percent, worked in “map/gis services” while 28.8 percent also had “general reference” responsibilities. other library service areas mentioned included “data services” and “it,” as well as some that fell outside library boundaries where staff worked in geography and environment science departments. not surprisingly, 52 percent of the responses indicated that their position was “librarian,” with the majority being “gis librarian” or “map librarian.” others included “reference & instructional services librarian” and “science librarian.” also received were 17 responses from gis specialists, library technicians and map assistants. what was especially noteworthy was that 12 responses were from library administrators, directors, or department heads who were finding time to work with google earth as part of their responsibilities. this number also included gis coordinators and map curators responsible for making decisions in their departments. google mapping products : what is being used, how often and for what purpose? to gain an understanding of how library staff are using google mapping products, a series of questions was asked of the respondents to determine which products were being used, how often and for which tasks. 
respondents were given a list of all the google mapping products available, and were asked to indicate which ones they had worked with. not surprisingly, the top two products used by respondents were google maps, 93 percent (71) and google earth, 91 percent (69). google maps api had been used by 40 percent (30) of the respondents, followed by google earth pro at 38 percent (29). eight percent (6) had also worked with google earth api, and 7 percent (5) had used google earth plus. interestingly, one respondent indicated that they had deployed google earth enterprise in their library. academic uses of google earth and google maps in a library setting | dodsworth and nicholson 108 figure 1. respondents’ use of google mapping products since many of these users may have simply used the products occasionally, it was important to get a sense of how often the products were being used. when asked the question “how regularly do you work with google mapping products for work-related projects?” 69 percent (54) responded that they use the products at least once a month. of those responses, 45 percent (35) use them at least weekly. specifically, eighteen percent (14) use them one to two times a week, thirteen percent (10) use them three to four times a week, and fourteen percent (11) use them even more often than that. only six percent (5) responded that they don’t use the products at all. information technology and libraries | june 2012 109 figure 2. frequency of use for work-related projects as google maps/earth can be used in many different ways and for different purposes in a library environment, the survey inquired how in fact these products were being used in their libraries. the survey question listed four possible tasks that the technology could be used for with the additional option for respondents to enter their own ‘other’ usages. respondents could check off all that applied. the options given included: • instruction • promotion/marketing • answering research questions • creating/accessing a finding aid tool (air photo map indexes, etc.) • other: (fill in answer) the majority of respondents, 82 percent (58) indicated they were using the products to answer research questions; 61 percent (43) for creating or accessing a finding aid tool; 56 percent (40) for instruction purposes; 27 percent for promotion/marketing and 20 percent (14) have used them for “other” purposes including georeferencing imagery, for use in webpages or creating learning objects. academic uses of google earth and google maps in a library setting | dodsworth and nicholson 110 figure 3. level and frequency of use in instruction are google mapping products being used for library instruction? for the authors, one of the best aspects of google maps/earth applications is their visualization capabilities. the ability to easily create and display geographic information to engage students makes google mapping applications an ideal instruction tool. in many ways, google maps and google earth have helped promote map and spatial literacy as concepts that are teachable. despite the free availability and ease of use of google mapping applications, the authors were somewhat surprised from the survey to find that 72 percent of library staff surveyed noted that their institution did not have any kind of map, spatial, or geospatial literacy policy in place. when it came time to provide instruction in the classroom, the survey found that only 31 percent (26) of the respondents had even used google earth in a classroom. 
nevertheless, in looking at the course levels, library instruction with google earth tools is actually occurring at all levels, from first year to graduate. significantly however, the frequency of the instruction seems to peak in the fourth year, where staff are using in upwards of six to nine courses. respondents were asked to give some details of these sessions, and they included a variety of class topics from environmental awareness education for first year students, to learning digitization skills in later years. has your library taken advantage of google map/earth technology for promotion or marketing purposes? information technology and libraries | june 2012 111 from our environmental scan of library websites we saw many interesting uses of google maps and google earth that were embedded directly into websites. perhaps because of this the authors were surprised to find that 55 percent of the survey respondents did not believe their library was using these technologies for promotion or marketing purposes. for those respondents who were using google maps or google earth to boost services for users, quite a few provided interesting examples of what this technology can offer. many were using google map apis to enhance map and aerial photo indexes, creating greater awareness of these resources and enhancing access. one respondent noted they had created a campus tour that highlighted all of the buildings that made up the library system, while others were using google api technology to showcase particular digitization projects such as folklore collections or geologic atlases. when asked if such activities have helped to enhance services or provided benefits to users, many responded that they had for both the users and for other library staff. greater speed and an increased familiarity of the collections were cited by several respondents, who no longer need to consult the paper indexes. does the library provide support to the wider campus community using google mapping products (not including instructional collaborations)? although many libraries are now using google maps and google earth technology, the authors were surprised that many were not actively leveraging this expertise across their campuses. almost all the respondents either skipped the question or stated that they were not providing this kind of active support. several noted that their gis services were open to all and that they were responsible for the google earth pro licences on campus, but that this was the extent of their support. working with google map/earth (kml) files in the last few years, kml files have become one of the more popular ways to display and distribute geographic information online. with its ease of use, and access, kml files have considerably broadened the user base of geographic information. kml files can be easily created in google earth, and they can be easily converted from gis files in specialized programs. it is this ease of access and usability that has popularized geographic information, hence increasing exposure to library collections and services. this survey was therefore interested in determining how libraries are using and creating kml files. when survey respondents were asked whether they work with kml files, 64 percent (47) responded they did, with 85 percent (40) claiming that they create their own kml files. 
for those who create their own, 92 percent (34) said that they created kml files by converting them from another file format using an external application, such as arcgis, earthpoint, ogr2ogr, or shp2kml software. 78 percent (29) also created them in google earth, and 32 percent (12) created kml files by writing their own xml code. the authors were most interested to know whether kml files were actually held as part of library holdings. thirty percent (13) of respondents noted that they provide access to their kml files as part of their collections, with 89 percent (8) claiming they could be located through the library website. other areas mentioned for access included libguides and specialized gis data catalogues available through the library's website. in terms of quantity, one respondent claimed a collection of 500-800 kml files, while other responses mentioned amounts ranging from 5 to 100, with some claiming that they were not sure exactly how many files made up their collection.

what other online mapping tools are used in your library apart from google maps and google earth?

although google maps and google earth are perhaps the most well-known online mapping tools available, the authors were also interested to learn if there were other products that libraries were using as part of their service offerings. as expected, many mentioned esri's arcgis online and esri's arcexplorer, while other responses included bing maps, openstreetmap, and openlayers.

discussion

google mapping applications are clearly being used for academic purposes in library settings. with such diverse capabilities made available in these programs, library professionals are using them in several different ways. google earth and google maps are popular among library staff who work with gis and/or map collections. in fact, over 90 percent of the respondents use both products, whether to help answer research questions, to create and access finding aids, for instructional purposes, or for promotion and marketing. google mapping products have also helped libraries revitalize their collections and have assisted in transferring spatial information literacy skills to students and faculty. the authors hope that readers who work in a map/gis library setting will be inspired by the many examples of online mapping projects outlined in this paper and will themselves use these online tools to the benefit of their library and their library users. google mapping products offer libraries an online platform to share information and resources in an easy, accessible, and low-cost way. the survey results also indicate that map/gis professionals in academic libraries trust and rely on google maps/earth as a solution to many academic queries and needs. since google mapping products were created for use by mainstream society, it can be suggested that fields outside of map and gis work may find the products beneficial and useful as well. google earth and google maps are very easy to learn, and users do not require any spatial or mapping skills. because this survey was limited to map/gis users, the authors do not know how, if at all, google mapping products are being used by other library staff; this will be a future area of study. the authors do, however, strongly suggest that map/gis librarians consider offering training sessions to reference staff and liaison librarians.
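such training does not need to assume any gis background: the kml format that underlies most of the projects described above is plain xml, and a minimal file can be produced with a few lines of code. the sketch below is an illustration only — it is not drawn from any survey response, and the placemark name and coordinates are invented for the example.

#!/usr/bin/perl
# illustrative sketch: write a minimal kml file containing a single placemark.
# the file name, placemark label, and coordinates are invented examples.
use strict;
use warnings;

my $name = 'Map Library Air Photo Index';   # hypothetical placemark label
my ( $lon, $lat ) = ( -80.5449, 43.4723 );  # hypothetical longitude/latitude

open my $out, '>', 'placemark.kml' or die "cannot write placemark.kml: $!";
print {$out} <<"KML";
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>$name</name>
      <description>Example placemark for illustration only.</description>
      <Point>
        <coordinates>$lon,$lat,0</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>
KML
close $out;
print "wrote placemark.kml\n";

a file like this opens directly in google earth, and converting an existing shapefile with a tool such as ogr2ogr (named by several respondents) or with google earth pro produces the same kind of document, only with many more placemarks or polygons.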
as a multidisciplinary tool, many subject areas can benefit from google maps/earth, as it’s certainly not a tool for use by only gis/map librarians. with a little bit of training, all library staff can use google mapping products to assist with research questions, spatial literacy, location-based projects and library instruction. in fact, library staff members responsible for nontraditional library material such as photographs, postcards, audio recordings, original hand-written documents, etc. may want to consider using online mapping products to organize their collection. too many times such original material is lost in the library’s filing system, is irretrievable or unavailable during convenient hours. google maps/earth will organize all collections based on their geographic location and can offer access to the actual information technology and libraries | june 2012 113 material. more exposure to and training on these free and easy to use products can increase collection use, promote mapping technology, and organize the library’s holdings. references 1 terry ballard, “inheriting the earth: using kml files to add placemarks relating to the library’s original content to google earth and google maps” new library world 110 (2009): 357-65, doi: 10.1108/0307480091097579. jacobsen, mikael and terry ballard, “google maps: you are here: using google maps to bring out your library’s local collections” library journal, october 15, 2008 (accessed september 11, 2011). http://www.libraryjournal.com/article/ca6602836.html. 2 michaela brenner and peter klein, “discovering the library with google earth” information technology and libraries 27 (2008): 32-6. 3 michael vandenburg, “using google maps as an interface for the library catalogue” library hitech 26 (2008): 33-40. 4 troy swanson, “google maps and second life: virtual platforms meet information literacy” college & research libraries news 69 (2008): 610-12. 5 annette lamb, and larry johnson, “virtual expeditions: google earth, gis, and geovisualization technologies in teaching and learning” teacher librarian 37 (2010): 81-5. 6 a list of mcgill library’s air photo indexes can be viewed at http://www.mcgill.ca/library/library-findinfo/maps/airphotos/ (accessed september 8, 2011). 7 mcmaster university library map index can be found at http://library.mcmaster.ca/maps/ww1/ndx5to40.htm, (accessed september 8, 2011). 8 the brock university historical air photo collection can be accessed at: http://www.brocku.ca/maplibrary/airphoto/historical.php (accessed september 8, 2011). 9 the yale university sanborn indexes can be found at http://www.library.yale.edu/mapcoll/print_sanborn.html (accessed september 8, 2011). 10 the university of vermont library’s google map can be found at: http://cdi.uvm.edu/collections/browsecollection.xql?pid=longtrail&title=long%20trail%20p hotographs (accessed september 8, 2011). 11 the cleveland memory project can be found at: http://www.clevelandmemory.org/hlneo/ (accessed september 8, 2011). 
12 the university of waterloo map library website can be found at: http://www.lib.uwaterloo.ca/locations/umd/project/ (accessed september 8, 2011).
13 the university of north carolina library provides interactive maps at http://www.lib.unc.edu/dc/ncmaps/interactive/overlay.html (accessed september 8, 2011).
14 the university of connecticut library offers gis files online here: http://magic.lib.uconn.edu/connecticut_data.html (accessed september 8, 2011).
15 campus map examples include: yale university library at http://maps.commons.yale.edu/venice/. example maps for library locations on campus include: brock university library, http://www.brocku.ca/maplibrary/general/where-is-the-ml.php; university of north carolina, http://www.lib.unc.edu/libraries_collections.html (all accessed on september 8, 2011).
16 the full survey instrument can be found in the appendix of this document.

appendix
google maps and google earth: influences and impacts in your library

you and your library
1. what is your work position title?
2. what department/division/area of library do you work in? (click all that apply)
   o map/gis services
   o government publications
   o general reference
   o technical services
   o other (please specify):

google mapping products
3. please check all the products you have worked with?
   o google maps
   o google maps api
   o google earth
   o google earth plus
   o google earth pro
   o google earth api
   o google earth enterprise
4. how regularly do you work with google mapping products for work-related projects?
   o not at all
   o a few times a year
   o 1-3 times a month
   o 1-2 times a week
   o 3-4 times a week
   o more often than that!
   o not sure
5. for what work related tasks, have you used these products? (click all that apply)
   o instruction
   o promotion/marketing
   o answering research questions
   o creating/accessing a finding aid tool (air photo, map indexes, etc.)

library instruction using google mapping products
6. does your library have a map, or spatial, or geospatial literacy policy or program?
   o yes
   o no
7. if you are using google mapping products for instruction, what level or year of university course(s) are you using it in, and in how many courses:
   (number of courses: 1-2 | 3-5 | 6-9 | 10-14 | 15 and more)
   1st year (100 level)
   2nd year (200 level)
   3rd year (300 level)
   4th year (400 level)
   graduate level
8. please describe some of these activities?
9. does your library offer geographic awareness or gis-related training to some or all the library staff?
promotion/marketing using google mapping products 10. has your library used google mapping technology to promote, offer, or deliver a service? (for example, offering kml files for download, indexes, guides, scanned documents, placemarks/urls from google maps/earth, etc.) o yes o no 10a. if yes, please describe with as much detail as possible how your library has used google mapping technology. if possible, please provide links to the projects. 10b. if yes, how have the google mapping related projects enhanced services or benefited the library? information technology and libraries | june 2012 117 11. does the library provide support to the wider campus community using google mapping products (not including instructional collaborations)? kml/kmz collections 12. do you work with kml files? o yes o no 13. do you create your own kml files? o yes o no 14. how do you create your own kml files? o write xml code o save in google earth o convert from another file format using an external application o other (please specify) 15. does your library hold and provide access to kml or kmz files as part of its collections? o yes o no 16. if yes, approximately how many files do you currently hold? 17. how are these files findable by your patrons? o opac o library website o both 18. do you or other library staff use other online mapping tools? please list which ones and what they are used for. batch ingesting into eprints digital repository sof tware tomasz neugebauer and bin han information technology and libraries | march 2012 113 abstract this paper describes the batch importing strategy and workflow used for the import of theses metadata and pdf documents into the eprints digital repository software. a two-step strategy of importing metadata in marc format followed by attachment of pdf documents is described in detail, including perl source code for scripts used. the processes described were used in the ingestion of 6,000 theses metadata and pdfs into an eprints institutional repository. introduction tutorials have been published about batch ingestion of proquest metadata and electronic theses and dissertations (etds),1 as well as endnote library,2 into the digital commons platform. the procedures for bulk importing of etds using dspace have also been reported.3 however, bulk importing into the eprints digital repository software has not been exhaustively addressed in the literature.4 a recent article by walsh provides a literature review of batch importing into institutional repositories.5 the only published report on batch importing into the eprints platform describes perl scripts for metadata-only records import from thomson reuters reference manager.6 bulk importing is often one of the first tasks after launching a repository, so it is unsurprising that requests for reports and documentation on eprints-specific workflow have been a recurring question on the eprints tech list.7 a recently published review of eprints identifies “the absence of a bulk uploading feature” as its most significant weakness.8 although eprints’ graphical user interface for bulk importing is limited to the use of the installed import plugins, the software does have a versatile infrastructure for this purpose. leveraging eprints’ import functionality requires some perl scripting, structuring the data for import, and using the command line interface. in 2009, when concordia university launched spectrum,9 its research repository, the first task was a batch ingest of approximately 6,000 theses dated from 1967 to 2003. 
the source of the metadata for this import consisted in marc records from an integrated library system powered by innovative interfaces and proquest pdf documents. this paper is a report on the strategy and workflow adopted for batch ingestion of this content into the eprints digital repository software. import strategy eprints has a documented import command line utility located in the /bin folder.10 documents can also be imported through eprints’ graphical interface. using the command line utility for tomasz neugebauer (tomasz.neugebauer@concordia.ca) is digital projects and systems development librarian and bin han (bin.han@concordia.ca) is digital repository developer, concordia university libraries, montreal, quebec, canada. mailto:tomasz.neugebauer@concordia.ca mailto:bin.han@concordia.ca batch ingesting into eprints digital repository software| neugebauer and han 114 importing is recommended because it is easier to monitor the operation in real time by adding progress information output to the import plugin code. the task of batch importing can be split into the following subtasks: 1. import of metadata of each item 2. import of associated documents, such as full-text pdf files the strategy adopted was to first import the metadata for all of the new items into the inbox of an editor’s account. after this first step was completed, a script was used to loop through the newly imported eprints and attach the corresponding full-text documents. although documents can be imported from the local file system or via http, import of the files from the local file system was used. the batch import procedure varies depending on the format of the metadata and documents to be imported. metadata import requires a mapping of the source schema fields to the default or custom fields in eprints. the source metadata must also be converted into one of the formats supported by eprints’ import plugins, or a custom plugin must be created. import plugins are available for many popular formats, including bibtex, doi, endnote, and pubmedxml. in addition, community-contributed import plugins such as marc and arxiv are available at eprints files.11 since most repositories use custom metadata fields, some customization of the import plugins is usually necessary. marc plugin for eprints in eprints, the import and export plugins ensure interoperability of the repository with other systems. import plugins read metadata from one schema and load it into the eprints system through a mapping of the fields into the eprints schema. loading marc-encoded files into eprints requires the installation of the import/export plugin developed by romero and miguel.12 the installation of this plugin requires the following two cpan modules: marc::record and marc::file::usmarc. the marc plugin was then subclassed to create an import plugin named “concordia theses,” which is customized for thesis marc records. concordia theses marc plugin the marc plugin features a central configuration file (see appendix a) in which each marc field is paired with a corresponding mapping to an eprints field. most of the fields were configured through this configuration file (see table 1). the source marc records from the innovative interfaces integrated library system (ils) encode the physical description of each item using the anglo american cataloguing rules, as in the following example: “ix, 133 leaves : ill. 
; 29 cm." since the default eprints field for number of pages is of the type integer and does not allow multipart physical descriptions from the marc 300 field, a custom text field for these physical descriptions (pages_aacr) had to be added. the marc.pl configuration file cannot be used to map compound fields, such as author names—the fields need custom mapping implementation in perl. for instance, the marc 100 and 700 fields are transferred into the eprints author compound field (in marc.pm). similarly, marc 599 is mapped into a custom thesis advisor compound field.

marc field | eprints field
020a | isbn
020z | isbn
022a | issn
245a | title
245b | subtitle
250a | edition
260a | place_of_pub
260b | publisher
260c | date
300a | pages_aacr
362a | volume
440a | series
440c | volume
440x | issn
520a | abstract
730a | publication

table 1. mapping table from marc to eprints

helge knüttel's refinements to the marc plugin shared on the eprints tech list were employed in the implementation of a new subclass of marc import for the concordia theses marc records. in the implementation of the concordia theses plugin, concordiatheses.pm inherits from marc.pm (see figure 1).13 knüttel added two methods that make it easier to subclass the general marc plugin and add unique mappings: handle_marc_specialities and post_process_eprint. the post_process_eprint function was not used to attach the full-text documents to each eprint. instead, the strategy to import the full-text documents using a separate attach_documents script was used (see "theses document file attachment" below). import of all of the specialized fields, such as thesis type (mapped from marc 710t), program, department, and proquest id, was implemented in the function handle_marc_specialities of concordiatheses.pm. for instance, 502a in the marc record contains the department information, whereas an eprints system like spectrum stores department hierarchy as subject objects in a tree. therefore importing the department information based on the value of 502a required regular expression searches of this marc field to find the mapping into a corresponding subject id. this was implemented in the handle_marc_specialities function.

figure 1. concordia theses class diagram, created with the perl module uml::class::simple

execution of the theses metadata import

the depositing user's name is displayed along with the metadata for each eprint. a batchimporter user with the corporate name "concordia university libraries" was created to carry out the import. as a result, the public display of the imported items shows the following as a part of the metadata: "deposited by: concordia university libraries." the marc plugin requires the encoding of the source marc files to be utf-8, whereas the records are exported from the ils with marc-8 encoding. therefore marcedit software developed by reese was used to convert the marc file to utf-8.14 to activate the import, the main marc import plugin and its subclass, concordiatheses.pm, have to be placed in the plugin folder /perl_lib/eprints/plugin/import/marc/. the configuration file (see appendix a) must also be placed with the rest of the configurable files in /archives/repositoryid/cfg/cfg.d. the plugin can then be activated from the command line using the import script in the /bin folder.
a detailed description of this script and its usage is documented on the eprints wiki. the following eprints command from the /bin folder was used to launch the import:

import repositoryid --verbose --user batchimporter eprint marc::concordiatheses theses-utf8.mrc

following the aforementioned steps, all the theses metadata was imported into the eprints software. the new items were imported with their statuses set to inbox. a status set to inbox means that the imported items are in the work area of the batchimporter user and will need to be moved to live public access by switching their status to archive.

theses document file attachment

after the process of importing the metadata of each thesis is complete, the corresponding document files need to be attached. the proquest id was used to link the full-text pdf documents to the metadata records. all of the marc records contained the proquest id, while the pdf files, received from proquest, were delivered with the corresponding proquest id as the filename. the pdfs were uploaded to a folder on the repository web server using ftp. the attach_documents script (see appendix b for source code) was then used to attach the documents to each of the imported eprints in the batchimporter's inbox and to move the imported eprints to the live archive. several variables need to be set at the beginning of the attach_documents operation (see table 2).

variable | comment
$root_dir = 'bin/importdata/proquest' | this is the root folder where all the associated documents are uploaded by ftp.
$depositor = 'batchimporter' | only the items deposited by a defined depositor, in this case batchimporter, will be moved from inbox to live archive.
$dataset_id = 'inbox' | limit the dataset to those eprints with status set to inbox.
$repositoryid = 'library' | the internal eprints identifier of the repository.

table 2. variables to be set in the attach_documents script

the following command is used to proceed with file attachment, while the output log is redirected and saved in the file attachment:

/bin/attach_documents.pl > ./attachment 2>&1

the thesis metadata record was made live even if it did not contain a corresponding document file. a list of eprint ids of theses that did not contain a corresponding full-text pdf document is written at the end of the log file, along with the count of the number of theses that were made live. after the import operation is complete, all the abstract pages need to be regenerated with the following command:

/bin/generate_abstracts repositoryid

conclusions

this paper is a detailed report on batch importing into the eprints system. the authors believe that this paper and its accompanying source code are a useful contribution to the literature on batch importing into digital repository systems. in particular, it should be useful to institutions that are adopting the eprints digital repository software. batch importing of content is a basic and fundamental function of a repository system, which is why the topic has come up repeatedly on the eprints tech list and in a repository software review. the methods that we describe for carrying out batch importing in eprints make use of the command line and require perl scripting. more robust administrative graphical user interface support for batch import functions would be a useful feature to develop in the platform.
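for readers who have not worked with the marc-handling cpan modules that the import plugin builds on (marc::record and marc::file::usmarc), the following short perl sketch illustrates the kind of record traversal involved. it is a stand-alone illustration rather than an excerpt from the concordia theses plugin; it simply reads the same theses-utf8.mrc file used in the import command above and prints a few of the fields listed in table 1.

#!/usr/bin/perl
# illustrative sketch only: read a marc file and print fields similar to
# those mapped in table 1. this is not part of the concordia theses plugin.
use strict;
use warnings;
use MARC::Batch;    # distributed with MARC::Record on cpan

my $batch = MARC::Batch->new( 'USMARC', 'theses-utf8.mrc' );

while ( my $record = $batch->next() ) {
    # 245$a maps to the eprints title field in table 1
    my $title = $record->field('245') ? $record->field('245')->subfield('a') : '';
    # 260$c maps to the eprints date field
    my $date  = $record->field('260') ? $record->field('260')->subfield('c') : '';
    # 300$a holds the aacr physical description kept in the custom pages_aacr field
    my $desc  = $record->field('300') ? $record->field('300')->subfield('a') : '';
    print join( ' | ', $title // '', $date // '', $desc // '' ), "\n";
}

a real import plugin, of course, goes further than printing: it maps values such as these into eprints metadata fields and creates the eprint objects, as the marc.pl configuration in appendix a and the concordiatheses.pm subclass described above do.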
acknowledgements the authors would like thank mia massicotte for exporting the metadata records from the integrated library system. we would also like to thank alexandros nitsiou, raquel horlick, adam field, and the reviewers at information technology and libraries for their useful comments and suggestions. references 1. shawn averkamp and joanna lee, “repurposing proquest metadata for batch ingesting etds into an institutional repository,” code{4}lib journal 7 (2009), http://journal.code4lib.org/articles/1647 (accessed june 27, 2011). 2. michael witt and mark p. newton, “preparing batch deposits for digital commons repositories,” 2008, http://docs.lib.purdue.edu/lib_research/96/ (accessed june 20, 2011). 3. randall floyd, “automated electronic thesis and dissertations ingest,” 2009, https://wiki.dlib.indiana.edu/display/iusw/automated+electronic+thesis+and+dissertations+i ngest (accessed may 26, 2011). 4. eprints digital repository software, university of southampton, uk, http://www.eprints.org/ (accessed june 27, 2011). 5. maureen p. walsh, “batch loading collections into dspace: using perl scripts for automation and quality control,” information technology & libraries 29, no. 3 (2010): 117–27, http://journal.code4lib.org/articles/1647 http://docs.lib.purdue.edu/lib_research/96/ https://wiki.dlib.indiana.edu/display/iusw/automated+electronic+thesis+and+dissertations+ingest https://wiki.dlib.indiana.edu/display/iusw/automated+electronic+thesis+and+dissertations+ingest http://www.eprints.org/ information technology and libraries | march 2012 119 http://search.ebscohost.com/login.aspx?direct=true&db=a9h&an=52871761&site=ehost-live (accessed june 26, 2011). 6. lesley drysdale, “importing records from reference manager into gnu eprints,” 2004, http://hdl.handle.net/1905/175 (accessed june 27, 2011). 7. eprints tech list, university of southampton, uk, http://www.eprints.org/tech.php/ (accessed june 27, 2011). 8. mike beazly, “eprints institutional repository software: a review,” partnership: the canadian journal of library & information practice & research 5, no. 2 (2010), http://journal.lib.uoguelph.ca/index.php/perj/article/viewarticle/1234 (accessed june 27, 2011). 9. concordia university libraries, “spectrum: concordia university research repository,” http://spectrum.library.concordia.ca (accessed june 27, 2011). 10. eprints wiki, “api:bin/import,” university of southampton, uk, http://wiki.eprints.org/w/api:bin/import (accessed june 23, 2011). 11. eprints files, university of southampton, uk, http://files.eprints.org/ (accessed june 24 2011). 12. parella romero and jose miguel, “marc import/export plugins for gnu eprints3,” eprints files, 2008, http://files.eprints.org/323/ (accessed may 31, 2011). 13. agent zhang and maxim zenin, “uml:class::simple,” cpan, http://search.cpan.org/~agent/uml-class-simple-0.18/lib/uml/class/simple.pm (accessed september 20, 2011). 14. terry reese, “marcedit: downloads,” oregon state university, http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html (accessed june 27, 2011). 
http://search.ebscohost.com/login.aspx?direct=true&db=a9h&an=52871761&site=ehost-live http://hdl.handle.net/1905/175 http://www.eprints.org/tech.php/ http://journal.lib.uoguelph.ca/index.php/perj/article/viewarticle/1234 http://spectrum.library.concordia.ca/ http://wiki.eprints.org/w/api:bin/import http://files.eprints.org/ http://files.eprints.org/323/ http://search.cpan.org/~agent/uml-class-simple-0.18/lib/uml/class/simple.pm http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html batch ingesting into eprints digital repository software| neugebauer and han 120 appendix a. marc.pl configuration file # # plugin eprints::plugin::import::marc # # marc tofro eprints mappings # do _not_ add compound mappings here. $c->{marc}->{marc2ep} = { # marc to eprints '020a' => 'isbn', '020z' => 'isbn', '022a' => 'issn', '245a' => 'title', '245b' => 'subtitle', '250a' => 'edition', '260a' => 'place_of_pub', '260b' => 'publisher', '260c' => 'date', '362a' => 'volume', '440a' => 'series', '440c' => 'volume', '440x' => 'issn', '520a' => 'abstract', '730a' => 'publication', }; $c->{marc}->{marc2ep}->{constants} = { }; ################################################################### ### # # plugin-specific settings. # # any non empty hash set for a specific plugin will override the # general one above! # ################################################################### ### # # plugin eprints::plugin::import::marc::concordiatheses # $c->{marc}->{'eprints::plugin::import::marc::concordiatheses'}->{marc2ep} = { '020a' => 'isbn', '020z' => 'isbn', '022a' => 'issn', '250a' => 'edition', information technology and libraries | march 2012 121 '260a' => 'place_of_pub', '260b' => 'publisher', '260c' => 'date', '300a' => 'pages_aacr', '362a' => 'volume', '440a' => 'series', '440c' => 'volume', '440x' => 'issn', '520a' => 'abstract', '730a' => 'publication', }; $c->{marc}->{'eprints::plugin::import::marc::concordiatheses'}->{constants} = { # marc to eprints constants 'type' => 'thesis', 'institution' => 'concordia university', 'date_type' => 'submitted', }; batch ingesting into eprints digital repository software| neugebauer and han 122 appendix b. attach_documents.pl #!/usr/bin/perl -i/opt/eprints3/perl_lib =head1 description this script allows you to attach a file to an eprint object by proquest id. =head1 copyright and license 2009 adam field, tomasz neugebauer 2011 bin han this module is free software under the same terms of perl. compatible with eprints 3.2.4 (victoria sponge). =cut use strict; use warnings; use eprints; my $repositoryid = 'library'; my $root_dir = '/opt/eprints3/bin/import-data/proquest'; #location of pdf files my $dataset_id = 'inbox'; #change to 'eprint' if you want to run it over everything. 
my $depositor = 'batchimporter'; #limit import to $depositor’s inbox #global variables for log purposes my $int_live = 0; #count of eprints moved to live archive with a document my $int_doc = 0; #count of eprints that already have document attached my @array_doc; #ids of eprints that already have documents my $int_no_doc = 0; #count of eprints moved to live with no document attached my @array_no_doc; #ids of eprints that have no documents my $int_no_proid = 0; #count of eprints with no proquest id my @array_no_proid; #ids of eprints with no proquest id my $session = eprints::session->new(1, $repositoryid); die "couldn't create session for $repositoryid\n" unless defined $session; #the hash contains all the files that need to be uploaded #the hash contains key-value pairs: (pq_id => filename) my $filemap = {}; load_filemap($root_dir); #get all eprints in inbox dataset my $dataset = $session->get_repository->get_dataset($dataset_id); #run attach_file on each eprint object $dataset->map($session, \&attach_file); information technology and libraries | march 2012 123 #output log for attachment print "#### $int_doc eprints already have document attached, skip ####\n @array_doc\n"; print "#### $int_no_proid eprints doesn't have proquest id, skip ####\n @array_no_proid\n"; print "#### $int_no_doc eprints doesn't have associated document, moved to live ####\n @array_no_doc\n"; #total number of eprints that were made live: those with and without documents. my $int_total_live = $int_live + $int_no_doc; print "#### intotal: $int_total_live eprints moved to live ####\n"; #attach file to corresponding eprint object sub attach_file { my ($session, $ds, $eprint) = @_; #skip if eprint already has a document attached my $full_text_status = $eprint->get_value( "full_text_status" ); if ($full_text_status ne "none") { print "eprint ".$eprint->get_id." already has a document, skipping\n"; $int_doc ++; push ( @array_doc, $eprint->get_id ); return; } #retrieve username/userid associated with current eprint my $user = new eprints::dataobj::user( $eprint->{ session }, $eprint->get_value( "userid" ) ); my $username; # exit in case of failure to retrieve associated user, just in case. return unless defined $user; $username = $user->get_value( "username" ); # $dataset includes all eprints in inbox, so we limit to $depositor's items only return if( $username ne $depositor ); #skip if no proquest id is associated with the current eprint my $pq_id = $eprint->get_value('pq_id'); if (not defined $pq_id) { print "eprint ".$eprint->get_id." doesn't have a proquest id, skipping\n"; $int_no_proid ++; batch ingesting into eprints digital repository software| neugebauer and han 124 push ( @array_no_proid, $eprint->get_id ); return; } #remove space from proquest id $pq_id =~ s/\s//g; #attach the pdf to eprint objects and move to live archive if ($filemap->{$pq_id} and -e $filemap->{$pq_id} ) #if the file exists { #create document object, add pdf files to document, attach to eprint object, and move to live archive my $doc = eprints::dataobj::document::create( $session, $eprint ); $doc->add_file( $filemap->{$pq_id}, $pq_id . '.pdf' ); $doc->set_value( "format", "application/pdf" ); $doc->commit(); print "adding document to eprint ", $eprint->get_id, "\n"; $eprint->move_to_archive; print "eprint ".$eprint->get_id." 
moved to archive.\n"; $int_live ++; } else { #move the metadata-only eprints to live as well print "proquest id \\$pq_id\\ (eprint ", $eprint->get_id, ") does not have a file associated with it\n"; $eprint->move_to_archive; print "eprint ".$eprint->get_id." moved to archive without document attached.\n"; $int_no_doc ++; push ( @array_no_doc, $eprint->get_id ); } } #recursively traverse the directory, find all pdf files. sub load_filemap { my ($directory) = @_; foreach my $filename (<$directory/*>) { if (-d $filename) { load_filemap($filename); } #catch the file name ending in .pdf elsif ($filename =~ m/([^\/]*)\.pdf$/i) information technology and libraries | march 2012 125 { my $pq_id = $1; #add pq_id => filename pair to filemap hash table $filemap->{$pq_id} = $filename; } } } modeling a library website redesign process: developing a user-centered website through usability testing danielle a. becker and lauren yannotta information technology and libraries | march 2013 6 abstract this article presents a model for creating a strong, user-centered web presence by pairing usability testing and the design process. four rounds of usability testing were conducted throughout the process of building a new academic library web site. participants were asked to perform tasks using a talk-aloud protocol. tasks were based on guiding principles of web usability that served as a framework for the new site. results from this study show that testing throughout the design process is an effective way to build a website that not only reflects user needs and preferences, but can be easily changed as new resources and technologies emerge. introduction in 2008 the hunter college libraries launched a two-year website redesign process driven by iterative usability testing. the goals of the redesign were to: • update the design to position the library as a technology leader on campus; • streamline the architecture and navigation; • simplify the language used to describe resources, tools, and services; and • develop a mechanism to quickly incorporate new and emerging tools and technologies. based on the perceived weaknesses of the old site, the libraries’ web committee developed guiding principles that provided a framework for the development of the new site. the guiding principles endorsed solid information architecture, clear navigation systems, strong visual appeal, understandable terminology, and user-centered design. this paper will review the literature on iterative usability testing, user-centered design, and thinkaloud protocol and the implications moving forward. it will also outline the methods used for this study and discuss the results. the model used, building the design based on the guiding principles and using the testing to uphold those principles, led to the development of a strong, user-centered site that can be easily changed or adapted to accommodate new resources and technologies. we believe this model is unique and can be replicated by other academic libraries undertaking a website redesign process. danielle a. becker (dbe0003@hunter.cuny.edu) is assistant professor/web librarian, lauren yannotta (lyannotta@hotmail.com) was assistant professor/instructional design librarian, hunter college libraries, new york, new york. 
mailto:dbe0003@hunter.cuny.edu mailto:lyannotta@hotmail.com modeling a library website redesign process | becker 7 background the goals of the research were to (1) determine the effectiveness of the hunter college libraries website, (2) discover how iterative usability testing resulting in a complete redesign impacts how the students perceive the usability of a college library website, and (3) reveal student informationseeking habits. a formal usability test was conducted both on the existing hunter college libraries website (appendix a) and the following drafts of the redesign (appendix b) with twenty users over an eighteen-month period. the testing occurred before the website redesign began, while the website was under construction, and after the site was launched. the participants were selected through convenience sampling and informed that participation was confidential. the intent of the usability test was to uncover the flaws in navigation and terminology of the current website and, as the redesign process progressed, to incorporate the users’ feedback into the new website’s design to closely match their wants and needs. the redesign of the website began with a complete inventory of the existing webpages. an analysis was done of the website that identified key information, links, units within the department, and placement of information in the information architecture of the website. we identified six core goals that we felt were the most important for all users of the library’s website: 1. user should be able to locate high-level information within three clicks. 2. eliminate library jargon from navigational system using concise language. 3. improve readability of site. 4. design a visually appealing site. 5. create a site that was easily changeable and expandable. 6. market the libraries’ services and resources through the site. literature review in 2010, oclc compiled a report, “the digital information seeker,” that found 84 percent of users begin their information searches with search engines, while only 1 percent began on a library website. search engines are preferred because of speed, ease of use, convenience, and availability.1 similar studies such as emde et al., and gross and sheridan, have shown that students are not using library websites to do their research.2 gross and sheridan assert in their article on undergraduate search behavior that “although students are provided with library skills sessions, many of them still struggle with the complex interfaces and myriad of choices the library website provides.” 3 this research shows the importance of creating streamlined websites that will information technology and libraries | march 2013 8 compete for our students’ attention. in building a new website at the hunter college libraries, we thought the best way to do this was through user-centered design. web designers both inside and outside the library have recognized the importance of usercentered design. nielsen advises that website structure should be driven by the tasks the users came to the site to perform.4 he asserts the amount of graphics on webpages should be minimized because they often affect page download times and that gratuitous graphics (including text rendered as images) should be eliminated altogether. 5 he also contends it is important to ensure that page designs are accessible to all users regardless of platform or newness of technology. 6 in their article, “how do i find an article? 
insights from a web usability study,” cockrell and jayne cited instances when researchers concluded that library terminology contributed to patrons’ difficulties when using library websites, thus highlighting the importance of understandable terminology. hulseberg and monson found in their investigation of student-driven taxonomy for library website design that “by developing our websites based on student-driven taxonomy for library website terminology, features, and organization, we can create sites that allow students to get down to the business of conducting research.” 7 performing usability testing is one way to confirm user-centered design. in his book don’t make me think!, krug insists that usability testing can provide designers with invaluable input. that, taken together with experience, professional judgment, and common sense, makes design choices easier.8 ipri, yunkin, and brown, in their article “usability as a method for assessing discovery,” emphasize the important role usability testing has in capturing emotional and aesthetic responses users have to websites, along with expressions of satisfaction with the layout and logic of the site. even the discovery of basic mistakes, such as incorrect or broken links and ineffective wording, can negatively affect discovery of library resources and services. 9 in battleson, booth, and weatherford’s literature review for their usability testing of an academic library website case study, they summarize dumas and redish's discussion of the five facets of formal usability testing: (1) the goal is to improve the usability of the interface, (2) testers should represent real users, (3) testers perform real tasks, (4) user behavior and commentary are observed and recorded, and (5) data are analyzed to recognize problems and suggest solutions. they conclude that when usability testing is "applied to website interfaces, this test method not only results in a more usable site, but also allows the site design team to function more efficiently, since it replaces opinion with user-centered design."10 this allows the designers to evaluate the results and identify problems with the design being tested. 11 usability experts nielsen and tahir contend that the earlier and more frequently usability tests are conducted, the more impact the results will have on the final design of the website because the results can be incorporated throughout the design process. they conclude it is better to conduct frequent, smaller studies with a maximum of five users. they assert, “you will always have discovered so many blunders in the design that it will be better to go back to the drawing board modeling a library website redesign process | becker 9 and redesign the interface than to discover the same usability problems several more times with even more users.” 12 based on the strength of the literature, we decided to use iterative testing for our usability study. 
krug points out that testing is an iterative process because designers need to create, test, and fix based on test results, then test again.13 according to the united states department of health and human services report “research-based web design and usability guidelines,” conducting before and after studies when revising a website will help designers determine if changes actually made a difference in the usability of the site.14 manzari and trinidad-christensen found in their evaluation of user-centered design for a library website, iterative testing is when a product is tested several times during development, allowing users’ needs to be incorporated into the design. in their study, their aim was that the final draft of their website would closely match the users’ information needs while remaining consistent, easy to learn, and efficient.15 battleson, booth, and weintrop report that there is “a consensus in the literature that usability testing be an iterative process, preferably one built into a web site’s initial design.” 16 they explain that “site developers should test for usability, redesign, and test again—these steps create a cycle for maintaining, evaluating and continually improving a site.” 17 george used iterative testing in her redesign of the carnegie mellon university libraries website and concluded that it was “necessary to provide user-centered services via the web site.” 18 cobus, dent, and ondrusek used six students to usability test the “pilot study.” then eight students participated in the first round of testing; then librarians modified the prototype and tested fourteen students in the second and final round. after the second round of testing they used the results of this test to analyze the user recordings and deliver the findings and proposed “fixes” to the prototype pages to the web editor.19 mcmullen’s redesign of the roger williams university library website was able to “complete the usability-refinement cycle” twice before finalizing the website design.20 but continued refinements were needed, leading to another round of usability tests to identify and correct problem areas.21 bauer-graham, poe, and weatherford did a comparative study of a library websites’ usability via a survey and then redesigned the website after evaluating the survey’s results. they waited a semester, distributed another survey to determine the functionality of the current site. the survey had the participants view the previous design and the current design in a side-by-side comparison to determine how useful the changes made to the site were. 22 when testing participants, in the article “how do i find an article? insights from a web usability study,” cockrell and jayne suggest using a web interface to perform specified tasks while a tester observes, noting the choices made, where mistakes occur, and using a “think aloud” protocol. they found that modifying the website through an ongoing, iterative process of testing, refining, and retesting its component parts improves functionality. 23 in conducting our usability testing we used a think-aloud protocol to capture the participants’ actions. van den haak, de jong, and schellens define think-aloud protocol as relying on a method information technology and libraries | march 2013 10 that asks users to complete a set of tasks and to constantly verbalize their thoughts while working on the tasks. 
the usefulness of this method of testing lies in the fact that the data collected reflect the actual use of the thing being tested and not the participants’ judgments about its usability. instead, the test follows the individual’s thoughts during the execution of the tasks. 24 nielsen states that think-aloud protocol “may be the single most valuable usability engineering method. . . . one gets a very direct understanding of what parts of the [interface/user] dialog cause the most problems, because the thinking aloud method shows how users interpret each individual interface item.” 25 turnbow ‘s article “usability testing for web redesign: a ucla case study” states that using the “think-aloud protocol” provides crucial real-time feedback on potential problems in the design and organization of a website.26 cobus, dent, and ondrusek used the think-aloud protocol in their usability study. they encouraged participants to talk out loud as they answered the questions, audio taped their comments, and captured their on-screen navigation using camtasia.27 this information was used to successfully reorganize hunter college library’s website. method an interactive draft of hunter college libraries redesigned website was created before the usability study was conducted. in spring 2009, the authors created the protocol for the usability testing. a think-aloud protocol was agreed upon for testing both the old site and the drafts of the new site, including a series of post-test questions that would allow participants to share their demographic information and give subjective feedback on the drafts of the site. draft questions were written, and we conducted mock usability tests on each other. after several drafts we revised our questions and performed pilot tests on an mlis graduate student and two undergraduate student library assistants with little experience with the current website. we ascertained from these pilot tests that we needed to slightly revise the wording of several questions to make them more understandable to all users. we made the revisions and eliminated a question that was redundant. all recruitment materials and finalized questions were submitted to the institutional review board (irb) for review and went through the certification process. after receiving approval we secured a private room to conduct the study. participants were recruited using a variety of methods. signs were posted throughout the library, an e-mail was sent out to several hunter college distribution lists, and a tent sign was erected in the lobby of the library. participants were required to be students or faculty. participants were offered a $10.00 barnes & noble gift card as incentive. applicants were accepted on a rolling basis. twenty students participated in the web usability study (appendix c). no faculty responded to our requests for participation so a decision was made to focus this usability test on students rather than faculty because students comprise our core user base. another usability test will be conducted in the future that will focus on faculty to determine how their academic tasks differ from undergraduates when using the library modeling a library website redesign process | becker 11 website. the redesigned site is malleable, which makes revisions and future changes in the design a predicted outcome of future usability tests. tests were scheduled for thirty-minute intervals. we conducted four rounds of testing using five participants per round. 
the two researchers switched questioner and observer roles after each round of testing. each participant was asked to think aloud while they completed the tasks and navigated the website. both researchers took notes during the tests to ensure detailed and accurate data was collected. each participant was asked to review the irb forms detailing their involvement in the study, and they were asked to consent at that time. their consent was implied if they participated in the study after reading the form. the usability test consisted of fifteen task-oriented questions. the questions were identical when testing the old and new draft site. the first round tested only the old site, while the following three rounds tested only the new draft site. we tested both sites because we believed that comparing the two sites would reveal if the new site improved performance. the questions (appendix d) were not changed after they were initially finalized and remained the same throughout the entire four rounds of the usability study. participants were reminded at the onset of the test and throughout the process that the design and usability of the site(s) were being tested, not their searching abilities. the tests were scheduled for an hour each, allowing participants to take the tests without time restrictions or without being timed. as a result, the participants were encouraged to take as much time as they needed to answer the questions, but were also allowed to skip questions if they were unable to locate answers. initially the tests were recorded using camtasia software. this allowed us to record participants’ navigation trails through their mouse movements and clicks. but, after the first round of testing, we decided that observing and taking notes was appropriate documentation, and we stopped using the software. after the participants completed the tests we asked them user preference questions to get a sense of their user habits and their candid opinions of the new draft of the website. these questions were designed to elicit ideas for useful links to include on the website and also to gauge the visual appeal of the site. information technology and libraries | march 2013 12 results table 1. percent of tasks answered correctly discussion hunter college libraries’ website was due for a redesign because the site was dated in its appearance and did not allow new content to be added quickly and easily. as a result, a decision was made to build a new site using a content management system (cms) to make the site easily expandable and simple to update. this study tested the simple tasks to determine how to structure the information architecture and to reinforce the guiding principles of the redesigned website. task successes and failures the high percentage of success of participants finding books on the redesigned website using the online library catalog and easily find library hours reinforced our guiding principle of understandable terminology and clear navigational systems. krug contends that navigation educates the user on the site’s contents through its visible hierarchy. 
the result is a site that guides the user through their options and instills confidence in the website and its designers.28 we found this to be true in the way our users easily found the hours and catalog links on the prototype of our library website. the users on the old site knew where to look for this information because they were accustomed to navigating the old site. given that the prototype was a complete departure from the navigation and design of the old site, it was crucial that the labels and links were clear and understandable in the prototype or our design would fail. we made "hours" the first link under the "about" heading and "cuny+/books" the first link under the "find" heading, and as a result both our terminology and our structure were a success with participants. on the old website, users rarely used the libraries' online chat client. despite our efforts to remind students of its usefulness, the website did not place the link in a sufficiently visible location on the home page. on the old site, only 40 percent of participants located the link, as it was on the bottom left of the screen and easy to overlook. on the new site, in contrast, the "ask a librarian" link was prominently featured at the top of the screen. these results upheld the guiding principles of solid information architecture and understandable terminology. they also supported nielsen's assertion that "site design must be aimed at simplicity above all else, with as few distractions as possible and with a very clear information architecture and matching navigation tools."29 since the launch of the redesigned site, use of the questionpoint chat client has more than doubled. finding a journal article on a topic was always problematic for users of the old library website. the participants we tested were familiar with the site, and 80 percent erroneously clicked on "journal title list" when the more appropriate link would have been "databases" if they didn't have an exact journal title in mind. although we taught this in our information literacy courses, it was challenging getting the information across. in order to address this on the new site, "databases" was changed to "databases/articles" and categorized under the heading "find." the participants using the new site had greater success with the new terminology; 66 percent correctly chose "databases/articles." this question revealed an inconsistency with the guiding principles of understandable terminology and clear navigation systems on the old site. these issues were addressed by adding the word "articles" after "databases" on the new site to clarify what resources could be found in a database and also by placing the link under the heading "find" to further explain the action a student would be taking by clicking on the "databases/articles" link.
finding reference materials was challenging for the users of the old site as none of the participants clicked on the intended link "subject guides." in an effort to increase usage of the research guides, the library not only purchased the libguides tool, but also changed the wording of the link to "topic guides." as we neared the end of our study we observed that only one participant knew to click on the "topic guides" link for research assistance. the participants suggested calling it "research guides" instead of "topic guides," and we changed it. unfortunately, the usability study had been completed, and we were unable to further test the effectiveness of the rewording of this link. anecdotally, the rewording of this link appears to be more understandable to users, as the research guides are getting more usage (based on hit counts) than the previous guides. the rewording of these guides adhered to both principles of understandable terminology and user-centered design. these results supported nielsen's assertion that the most important material should be presented up front, using the inverted pyramid principle. "users should be able to tell at a glance what the page is about and what it can do for them."30 our results also supported the hhs report, which states that terminology "plays a large role in the user's ability to find and understand information. many terms are familiar to designers and content writers, but not to users."31 we concluded that rewording the link based on student feedback reduces the use of unfamiliar terminology. although librarians are "subject specialists" and "subject liaisons" and are familiar with those labels and that terminology, our students were looking for the word "research" instead of "subject," so they were not connecting with the library's libguides. as previously discussed, students of the old site thought the link "journal title list" would give them access to the library's database holdings. when asked to find a specific journal title, the correct answer to this question on the old site was "journal title list," with only 40 percent of the participants answering correctly. in another change to terminology on the new site, both links were placed under the heading "find," and, after testing of the first prototype, "journal title list" was changed to "list of journals and magazines." in the following tests 66 percent of the participants were able to answer correctly. the difference in the percentage of success in finding circulation policies between the old site and the prototype site was slight, only 7 percent. this can be attributed to the fact that participants on the old site could click on multiple links to get to the correct page, and they were familiar enough with the site to know that. in the prototype of the site there were several paths as well, some direct, some indirect. testing the wording of this link supported the understandable terminology principle more than the old website's "library policies" link did, yet to be true to our user-centered design principle, we needed to reword it once more. therefore, after the test was completed and the website was launched, we reworded the link to "checkout policies," which utilizes the same terminology that users are familiar with because they check out books at our checkout desk.
the remaining tasks, which consisted of finding books on reserve, magazines by title, library staff contact information, and branch information, were all met with higher success rates on the prototype site because in the redesign process the links were reworded to support the understandable terminology and user-centered design principles.

participant feedback: qualitative

the usability testing process informed the redesign of our website in many specific ways. if the layout of the site didn't test well with participants, we planned to create another prototype. in their evaluation of colorado state university libraries' digital collections and the western waters digital library websites, zimmerman and paschal describe the importance of first impressions of a website as the determining factor in whether users return to a website; if the impression is positive, they will return and continue to explore.32 when given an opportunity to give feedback on the design of the website, the participants commented:

• "there were no good library links at the bottom before and there wasn't the ask a librarian link either which i like a lot."
• "the old site was too difficult to navigate, new site has a lot of information, i like the different color schemes for the different things."
• "it is contemporary and has everything i need in front of me."
• "cool."
• "helpful."
• "straightforward."
• "the organization is easier for when you want to find things."
• "interactivity and rollovers make it easy to use."
• "intuitive, straight-forward and i like the simplicity of the colors."
• "more professional, more aesthetically pleasing than the old site."
• "the four menu options (about, find, services, help) break the information down easily."

additional research conducted by nathan, yeow, and murugesan claims that attractiveness (referring to the aesthetic appeal of a website) is the most important factor in influencing customer decision-making and affects the usability of the website.33 not only that, but users feel better when using a more attractive product. fortunately, the feedback from our participants revealed that the website was visually appealing, and the navigation scheme was clear and easy to understand.

other changes made to the libraries' website because of usability testing

participants commented that they expected to find library contact information on the bottom of the homepage, so the bottom of the screen was modified to include this information as well as a "contact us" link. participants did not realize that the "about," "find," "services," and "help" headings were also links, so we modified them so they were underlined when hovered over. there were also adjustments to the gray color bars on the top of the page because participants thought they were too bright, so they were darkened to make the labels easier to read. participants also commented that they wanted links to various public libraries in new york city under the "quick links" section of the homepage. we designed buttons for brooklyn public library, queens public library, and the new york public library and reordered this list to move these links closer to the top of the "quick links" section.

conclusion

conducting a usability study of hunter college libraries' existing website and the various stages of the redesigned website prototypes was instrumental in developing a user-centered design.
approaching the website redesign in stages, with guidance from iterative user testing and influenced by the participants' comments, gave the web librarian and the web committee an opportunity to incorporate the findings of the usability study into the design of the new website. rather than basing design decisions on assumptions about users' needs and information-seeking behaviors, we were able to incorporate what we'd learned from the library literature and the users' behavior into our evolving designs. this strategy resulted in a redesigned website that, with continued testing, user feedback, and updating, has aligned with the guiding principles we developed at the outset of the redesign project. the one unexpected outcome from this study is the discovery that, no matter how well a library website is designed, users will still need to be educated in how to use the site, with an emphasis on developing strong information literacy skills.

references

1. "the digital information seeker: report of the findings from selected oclc, rin, and jisc user behaviour projects," oclc research, ed. lynn silipigni-connaway and timothy dickey (2010): 6, www.jisc.ac.uk/publications/reports/2010/digitalinformationseekers.aspx.
2. judith emde, lea currie, frances a. devlin, and kathryn graves, "is 'good enough' ok? undergraduate search behavior in google and in a library database," university of kansas scholarworks (2008), http://hdl.handle.net/1808/3869; julia gross and lutie sheridan, "web scale discovery: the user experience," new library world 112, no. 5/6 (2011): 236, doi: 10.1108/03074801111136275.
3. ibid., 238.
4. jakob nielsen, designing web usability (indianapolis: new riders, 1999), 198.
5. ibid., 134.
6. ibid., 97.
7. barbara j. cockrell and elaine a. jayne, "how do i find an article? insights from a web usability study," journal of academic librarianship 28, no. 3 (2002): 123, doi: 10.1016/s0099-1333(02)00279-3.
8. steve krug, don't make me think! a common sense approach to web usability, 2nd ed. (berkeley, ca: new riders, 2006), 135.
9. tom ipri, michael yunkin, and jeanne brown, "usability as a method for assessing discovery," information technology & libraries 28, no. 4 (2009): 181, doi: 10.6017/ital.v28i4.3229.
10. brenda battleson, austin booth, and jane weintrop, "usability testing of an academic library web site: a case study," journal of academic librarianship 27, no. 3 (2001): 189–98, doi: 10.1016/s0099-1333(01)00180-x.
11. ibid.
12. jakob nielsen and marie tahir, "keep your users in mind," internet world 6, no. 24 (2000): 44.
13. steve krug, don't make me think! a common sense approach to web usability, 135.
14. research-based web design and usability guidelines, ed. ben schneiderman (washington: united states dept. of health and human services, 2006), 190.
15. laura manzari and jeremiah trinidad-christensen, "user-centered design of a web site for library and information science students: heuristic evaluation and usability testing," information technology & libraries 25, no. 3 (2006): 163, doi: 10.6017/ital.v25i3.3348.
16. battleson, booth, and weintrop, "usability testing of an academic library web site," 190.
17. ibid.
18. carole a. george, "usability testing and design of a library website: an iterative approach," oclc systems & services 21, no. 3 (2005): 178, doi: 10.1108/10650750510612371.
19. laura cobus, valeda dent, and anita ondrusek, "how twenty-eight users helped redesign an academic library web site," reference & user services quarterly 44, no. 3 (2005): 234–35.
20. susan mcmullen, "usability testing in a library web site redesign project," reference services review 29, no. 1 (2001): 13, doi: 10.1108/00907320110366732.
21. ibid.
22. john bauer-graham, jodi poe, and kimberly weatherford, "functional by design: a comparative study to determine the usability and functionality of one library's web site," technical services quarterly 21, no. 2 (2003): 34, doi: 10.1300/j124v21n02_03.
23. cockrell and jayne, "how do i find an article?," 123.
24. maaike van den haak, menno de jong, and peter jan schellens, "retrospective vs. concurrent think-aloud protocols: testing the usability of an online library catalogue," behavior & information technology 22, no. 5 (2003): 339.
25. battleson, booth, and weintrop, "usability testing of an academic library web site," 192.
26. dominique turnbow et al., "usability testing for web redesign: a ucla case study," oclc systems & services 21, no. 3 (2005): 231, doi: 10.1108/10650750510612416.
27. cobus, dent, and ondrusek, "how twenty-eight users helped redesign an academic library web site," 234.
28. krug, don't make me think! 59.
29. nielsen, designing web usability, 164.
30. ibid., 111.
31. schneiderman, research-based web design and usability guidelines, 160.
32. don zimmerman and dawn bastian paschal, "an exploratory evaluation of colorado state university libraries' digital collections and the western waters digital library web sites," journal of academic librarianship 35, no. 3 (2009): 238, doi: 10.1016/j.acalib.2009.03.011.
33. robert j. nathan, paul h. p. yeow, and sam murugesan, "key usability factors of service-oriented web sites for students: an empirical study," online information review 32, no. 3 (2008): 308, doi: 10.1108/14684520810889646.

appendix a. hunter college libraries' old website

appendix b. hunter college libraries' new website

appendix c. test participant profiles

participant | sex | academic standing | major | library instruction session? | how often in the library
1 | female | senior | history | yes | every day
2 | female | sophomore | psychology | no | every day
3 | male | junior | nursing | no | 1/week
4 | female | junior | studio art | no | 5/week
5 | female | senior | accounting | yes | 2–3/week
6 | male | freshman | undeclared | yes | 1/week
7 | female | freshman | undeclared | no | every day
8 | male | senior | music | yes | 3–4/week
9 | male | freshman | physics/english | no | every day
10 | female | senior | english lit/media studies | no | 1/week
11 | female | junior | fine arts/geography | yes | 2–3/week
12 | male | sophomore | computer science | yes | every day
13 | male | sophomore | econ/psychology | yes | 6 hours/week
14 | female | senior | math/econ | yes | 2–3/week
15 | female | senior | art | yes | every day
16 | male | n/a* | pre-nursing | no | daily
17 | female | senior** | econ | didn't remember | 3/week
18 | male | senior | pre-med | yes | 2/week
19 | female | grad | art history | yes | 3/week
20 | male | grad | education (tesol) | no | every day

note: *this student was at hunter fulfilling prerequisites and already had a bachelor of arts degree from another college. **this student had just graduated.

appendix d. test questions/tasks

• what is the first thing you noticed (or looked at) when you launched the hunter libraries homepage?
• what's the second?
• if your instructor assigned the book to kill a mockingbird what link would you click on to see if the library owns that book?
• when does the library close on wednesday night?
• if you have a problem researching a paper topic and are at home, where would you go to get help from a librarian?
• where would you click if you needed to find two journal articles on "homelessness in america"?
• you have to write your first sociology paper and wanted to know what databases, journals, and web sites would be good resources for you to begin your research. where would you click?
• does hunter library subscribe to the e-journal journal of communication?
• how long can you check out a book for?
• how would you find items on reserve for professor doyle's liibr100 class?
• does hunter library have the latest issue of rolling stone magazine?
• what is the e-mail for louise sherby, dean of libraries?
• what is the phone number for the social work library?
• you are looking for a guide to grammar and writing on the web, does the library's webpage have a link to such a guide?
• your friend is a hunter student who lives near brooklyn college. she says that she may return books she borrowed from the brooklyn college library to hunter library. is she right? where would you find out?
• this website is easy to navigate (agree, agree somewhat, disagree somewhat, disagree)?
• this website uses too much jargon (agree, agree somewhat, disagree somewhat, disagree)?
• i use the hunter library's website (agree, agree somewhat, disagree somewhat, disagree)?

martha m. yee (myee@ucla.edu) is cataloging supervisor at the university of california, los angeles film and television archive.

martha m. yee

can bibliographic data be put directly onto the semantic web?

this paper is a think piece about the possible future of bibliographic control; it provides a brief introduction to the semantic web and defines related terms, and it discusses granularity and structure issues and the lack of standards for the efficient display and indexing of bibliographic data. it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of more frbrized cataloging rules than those about to be introduced to the library community (resource description and access) and in creating an rdf data model for the rules.
i am now in the process of trying to model my cataloging rules in the form of an rdf model, which can also be inspected at http://myee.bol.ucla.edu/. in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is sophisticated enough yet to deal with our data. this article is an attempt to identify some of those areas and explore whether or not the problems i have encountered are soluble—in other words, whether or not our data might be able to live on the semantic web. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work.

this paper is a think piece about the possible future of bibliographic control; as such, it raises more complex questions than it answers. it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of frbrized descriptive and subject-cataloging rules. here my focus will be on the data model rather than on the frbrized cataloging rules for gathering data to put in the model, although i hope to have more to say about the latter in the future. the intent is not to present you with conclusions but to present some questions about data modeling that have arisen in the course of the experiment. my premise is that decisions about the data model we follow in the future should be made openly and as a community rather than in a small, closed group of insiders. if we are to move toward the creation of metadata that is more interoperable with metadata being created outside our community, as is called for by many in our profession, we will need to address these complex questions as a community following a period of deep thinking, clever experimentation, and astute political strategizing.

■ the vision

the semantic web is still a bewitching midsummer night's dream. it is the idea that we might be able to replace the existing html–based web consisting of marked-up documents—or pages—with a new rdf–based web consisting of data encoded as classes, class properties, and class relationships (semantic linkages), allowing the web to become a huge shared database. some call this web 3.0, with hyperdata replacing hypertext. embracing the semantic web might allow us to do a better job of integrating our content and services with the wider internet, thereby satisfying the desire for greater data interoperability that seems to be widespread in our field. it also might free our data from the proprietary prisons in which it is currently held and allow us to cooperate in developing open-source software to index and display the data in much better ways than we have managed to achieve so far in vendor-developed ils opacs or in giant, bureaucratic bibliographic empires such as oclc worldcat. the semantic web also holds the promise of allowing us to make our work more efficient. in this bewitching vision, we would share in the creation of uniform resource identifiers (uris) for works, expressions, manifestations, persons, corporate bodies, places, subjects, and so on. at the uri would be found all of the data about that entity, including the preferred name and the variant names, but also including much more data about the entity than we currently put into our work (name-title and title), such as personal name, corporate name, geographic, and subject authority records.
if any of that data needed to be changed, it would be changed only once, and the change would be immediately accessible to all users, libraries, and library staff by means of links down to local data such as circulation, acquisitions, and binding data. each work would need to be described only once at one uri, each expression would need to be described only once at one uri, and so forth. very much up in the air is the question of what institutional structures would support the sharing of the creation of uris for entities on the semantic web. for the data to be reliable, we would need to have a way to ensure that the system would be under the control of people who had been educated about the value of clean and accurate entity definition, the value of choosing "most commonly known" preferred forms (for display in lists of multiple different entities), and the value of providing access under all variant forms likely to be sought. at the same time, we would need a mechanism to ensure that any interested members of the public could contribute to the effort of gathering variants or correcting entity definitions when we have had inadequate information. for example, it would be very valuable to have the input of a textual or descriptive bibliographer applied to difficult questions concerning particular editions, issues, and states of a significant literary work. it would also be very valuable to be able to solicit input from a subject expert in determining the bounds of a concept entity (subject heading) or class entity (classification).

■ the experiment (my project)

to explore these bewitching ideas, i have been conducting an experiment. as part of my experiment, i designed a set of cataloging rules that are more frbrized than is rda in the sense that they more clearly differentiate between data applying to expression and data applying to manifestation. note that there is an underlying assumption in both frbr (which defines expression quite differently from manifestation) and on my part, namely that catalogers always know whether a given piece of data applies at either the expression or the manifestation level. that assumption is open to questioning in the process of the experiment as well. my rules also call for creating a more hierarchical and degressive relationship between the frbr entities work, expression, manifestation, and item, such that data pertaining to the work does not need to be repeated for every expression, data pertaining to the expression does not need to be repeated for every manifestation, and so forth. degressive is an old term used by bibliographers for bibliographies that provide great detail about first editions and less detail for editions after the first. i have adapted this term to characterize my rules, according to which the cataloger begins by describing the work; any details that pertain to all expressions and manifestations of the work are not repeated in the expression and manifestation descriptions. this paper would be entirely too long if i spent any more time describing the rules i am developing, which can be inspected at http://myee.bol.ucla.edu. here, i would like to focus on the data-modeling process and the questions about the suitability of rdf and the semantic web for encoding our data. (by the way, i don't seriously expect anyone to adopt my rules!
they are radically different than the rules currently being applied and would represent a revolution in cataloging practice that we may not be up to undertaking in the current economic climate. their value lies in their thought-experiment aspect and their ability to clarify what entities we can model and what entities we may not be able to model.) i am now in the process of trying to model my cataloging rules in the form of an rdf model ("rdf" as used in this paper should be considered from now on to encompass rdf schema [rdfs], web ontology language [owl], and simple knowledge organization system [skos] unless otherwise stated); this model can also be inspected at http://myee.bol.ucla.edu. in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is yet sophisticated enough to deal with our data. this article is an attempt to outline some of those areas and explore whether the problems i have encountered are soluble, in other words, whether or not our data might be able to live on the semantic web eventually. i have already heard from rdf experts bruce d'arcus (miami university) and rob styles (developer at talis, a semantic web technology company), whom i cite later, but through this article i hope to reach a larger community. my research questions can be found later, but first some definitions.

■ definition of terms

the semantic web is a way to represent knowledge; it is a knowledge-representation language that provides ways of expressing meaning that are amenable to computation; it is also a means of constructing knowledge-domain maps consisting of class and property axioms with a formal semantics.

rdf is a family of specifications for methods of modeling information that underpins the semantic web through a variety of syntax formats; an rdf metadata model is based on making statements about resources in the form of triples that consist of
1. the subject of the triple (e.g., "new york");
2. the predicate of the triple that links the subject and the object (e.g., "has the postal abbreviation"); and
3. the object of the triple (e.g., "ny").
xml is commonly used to express rdf, but it is not a necessity; it can also be expressed in notation 3 or n3, for example.1

rdfs is an extensible knowledge-representation language that provides basic elements for the description of ontologies, also known as rdf vocabularies. using rdfs, statements are made about resources in the form of
1. a class (or entity) as subject of the rdf triple (e.g., "new york");
2. a relationship (or semantic linkage) as predicate of the rdf triple that links the subject and the object (e.g., "has the postal abbreviation"); and
3. a property (or attribute) as object of the rdf triple (e.g., "ny").

owl is a family of knowledge representation languages for authoring ontologies compatible with rdf.

skos is a family of formal languages built upon rdf and designed for representation of thesauri, classification schemes, taxonomies, or subject-heading systems.

■ research questions

actually, the full-blown semantic web may not be exactly what we need. remember that the fundamental definition of the semantic web is "a way to represent knowledge." the semantic web is a direct descendant of the attempt to create artificial intelligence, that is, of the attempt to encode enough knowledge of the real world to allow a computer to reason about reality in a way indistinguishable from the way a human being reasons.
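the triple structure defined above ("new york"—subject, "has the postal abbreviation"—predicate, "ny"—object) can be made concrete with a small sketch. the following fragment is a minimal illustration using the python rdflib library; the example.org namespace and the postalAbbreviation property name are invented for illustration only and are not part of any established vocabulary.

# a minimal sketch of one rdf triple using the rdflib library (assumed installed);
# ex:postalAbbreviation is a made-up property, not an established vocabulary term.
from rdflib import Graph, URIRef, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

new_york = URIRef("http://example.org/place/new-york")   # subject
g.add((new_york, EX.postalAbbreviation, Literal("NY")))  # predicate and object

# the same statement can be serialized as rdf/xml or as notation 3 (n3)
print(g.serialize(format="xml"))
print(g.serialize(format="n3"))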
one of the research questions should probably be whether or not the technology developed to support the semantic web can be used to represent information rather than knowledge. fortunately, we do not need to represent all of human knowledge—we simply need to describe and index resources to facilitate their retrieval. we need to encode facts about the resources and what the resources discuss (what they are "about"), not facts about "reality." based on our past experience, doing even this is not as simple as people think it is. the question is whether we could do what we need to do within the context of the semantic web. sometimes things that sound simple do not turn out to be so simple in the doing. my research questions are as follows:
1. is it possible for catalogers to tell in all cases whether a piece of data pertains to the frbr expression or the frbr manifestation?
2. is it possible to fit our data into rdf? given that rdf was designed to encode knowledge rather than information, perhaps it is the wrong technology to use for our purposes?
3. if it is possible to fit our data into rdf, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (i.e., providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)?
as stated previously, i am not yet ready to answer these questions. i hope to find answers in the course of developing the rules and the model. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work.

■ other relevant projects

other relevant projects include the following:
1. frbr, functional requirements for authority data (frad), functional requirements for subject authority records (frsar), and frbr-object-oriented (frbroo). all are attempts to create conceptual models of bibliographic entities using an entity-relationship model that is very similar to the class-property model used by rdf.2
2. various initiatives at the library of congress (lc), such as lc subject headings (lcsh) in skos,3 the lc name authority file in skos,4 the lccn permalink project to create persistent uris for bibliographic records,5 and initiatives to provide skos representations for vocabularies and data elements used in marc, premis, and mets. these all represent attempts to convert our existing bibliographic data into uris that stand for the bibliographic entities represented by bibliographic records and authority records; the uris would then be available for experiments in putting our data directly onto the semantic web.
3. the dc-rda task group project to put rda data elements into rdf.6 as noted previously and discussed further later, rda is less frbrized than my cataloging rules, but otherwise this project is very similar to mine.
4. dublin core's (dc's) work on an rdf schema.7 dublin core is very focused on manifestation and does not deal with expressions and works, so it is less similar to my project than is the dc-rda task group's project (see further discussion later).

■ why my project?

one might legitimately ask why there is a need for a different model than the ones already provided by frbr, frad, frsar, frbroo, rda, and dc.
the frbr and rda models are still tied to the model that is implicit in our current bibliographic data in which expression and manifestation are undifferentiated. this is because publishers publish and libraries acquire and shelve manifestations. in our current bibliographic practice, a new bibliographic record is made for either a new manifestation or a new expression. thus, in effect, there is no way for a computer to tell one from the other in our current data. despite the fact that frbr has good definitions of expression (change in content) and manifestation (mere change in carrier), it perpetuates the existing implicit model in its mapping of attributes to entities. for example, frbr maps the following to manifestation: edition statements ("2nd rev. ed."); statements of responsibility that identify translators, editors, and illustrators; physical description statements that identify illustrated editions; and extent statements that differentiate expressions (the 102-minute version vs. the 89-minute version); etc. thus the frbr definition of expression recognizes that a 2nd revised edition is a new expression, but frbr maps the edition statement to manifestation. in my model, i have tried to differentiate more cleanly data applying to expressions from data applying to manifestations.8 frbr and rda tend to assume that our current bibliographic data elements map to one and only one group 1 entity or class. there are exceptions, such as title, which frbr and rda define at work, expression, and manifestation levels. however, there is a lack of recognition that, to create an accurate model of the bibliographic universe, more data elements need to be applied at the work and expression level in addition to (or even instead of) the manifestation level. in the appendix i have tried to contrast the frbr, frad, and rda models with mine. in my model, many more data elements (properties and attributes) are linked to the work and expression level. after all, if the expression entity is defined as any change in work content, the work entity needs to be associated with all content elements that might change, such as the original extent of the work, the original statement of responsibility, whether illustrations were originally present, whether color was originally present in a visual work, whether sound was originally present in an audiovisual work, the original aspect ratio of a moving image work, and so on. frbr also tends to assume that our current data elements map to one and only one entity. in working on my model, i have come to the conclusion that this is not necessarily true. in some cases, a data element pertaining to a manifestation also pertains to the expression and the work. in other cases, the same data element is specific to that manifestation, and, in other cases, the same data element is specific to its expression. this is true of most of the elements of the bibliographic description. frad, in attempting to deal with the fact that our current cataloging rules allow a single person to have several bibliographic identities (or pseudonyms), treats person, name, and controlled access point as three separate entities or classes. i have tried to keep my model simpler and more elegant by treating only person as an entity, with preferred name and variant name as attributes or properties of that entity.
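a minimal sketch may make this last modeling choice concrete. the following fragment, again using the python rdflib library, treats the person as the only entity (one uri), with the preferred and variant names recorded as simple properties of that entity rather than as entities in their own right, as frad would have them; the skos labels and the uri are stand-ins chosen for illustration, not the properties actually defined in my model.

# a minimal, illustrative sketch: one uri for the person; preferred and variant
# names are plain properties of that entity (skos labels used only as stand-ins).
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import SKOS

g = Graph()
g.bind("skos", SKOS)

person = URIRef("http://example.org/person/mark-twain")  # invented uri
g.add((person, SKOS.prefLabel, Literal("Twain, Mark, 1835-1910")))
g.add((person, SKOS.altLabel, Literal("Clemens, Samuel Langhorne, 1835-1910")))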
frbroo is focused on the creation process for works, with special attention to the creation of unique works of art and other one-off items found in museums. thus frbroo tends to neglect the collocation of the various expressions that develop in the history of a work that is reproduced and published, such as translations, abridged editions, editions with commentary, etc. dc has concentrated exclusively on the description of manifestations and has neglected expression and work altogether. one of the tenets of semantic web development is that, once an entity is defined by a community, other communities can reuse that entity without defining it themselves. the very different definitions of the work and expression entities in the different communities described above raise some serious questions about the viability of this tenet.

■ assumptions

it should be noted that this entire experiment is based on two assumptions about the future of human intervention for information organization. these two assumptions are based on the even bigger assumption that, even though the internet seems to be an economy based on free intellectual labor, and, even though human intervention for information organization is expensive (and therefore at more risk than ever), human intervention for information organization is worth the expense.

■ assumption 1: what we need is not artificial intelligence, but a better human–machine partnership such that humans can do all of the intellectual labor and machines can do all of the repetitive clerical labor. currently, catalogers spend too much time on the latter because of the poor design of current systems for inputting data. the universal employment provided by paying humans to do the intellectual labor of building the semantic web might be just the stimulus our economy needs.

■ assumption 2: those who need structured and granular data—and the precise retrieval that results from it—to carry out research and scholarship may constitute an elite minority rather than most of the people of the world (sadly), but that talented and intelligent minority is an important one for the cultural and technological advancement of humanity. it is even possible that, if we did a better job of providing access to such data, we might enable the enlargement of that minority.

■ granularity and structure issues

as soon as one starts to create a data model, one encounters granularity or cataloger-data parsing issues. these issues have actually been with us all along as we developed the data model implicit in aacr2r and marc 21. those familiar with rda, frbr, and frad development will recognize that much of that development is directed at increasing structure and granularity in cataloger-produced data to prepare for moving it onto the semantic web. however, there are clear trade-offs in an increase in structure and granularity. more structure and more granularity make possible more powerful indexing and more sophisticated display, but more structure and more granularity are more complex and expensive to apply and less likely to be implemented in a standard fashion across all communities; that is, it is less likely that interoperable data would be produced.
any switching or mapping that was employed to create interoperable data would produce the lowest common denominator (the simplest and least granular data), and once rendered interoperable, it would not be possible for that data to swim back upstream to regain its lost granularity. data with less structure and less granularity could be easier and cheaper to apply and might have the potential to be adopted in a more standard fashion across all communities, but that data would limit the degree to which powerful indexing and sophisticated display would be possible. take the example of a personal name: currently, we demarcate surname from forename by putting the surname first, followed by a comma and then the forename. even that amount of granularity can sometimes pose a problem for a cataloger who does not necessarily know which part of the name is surname and which part is forename in a culture unfamiliar to the cataloger. in other words, the more granularity you desire in your data, the more often the people collecting the data are going to encounter ambiguous situations. another example: currently, we do not collect information about gender self-identification; if we were to increase the granularity of our data to gather that information, we would surely encounter situations in which the cataloger would not necessarily know if a given creator was self-defined as a female or a male or of some other gender identity. presently, if we are adding a birth and death date, whatever dates we use are all together in a $d subfield without any separate coding to indicate which date is the birth date and which is the death date (although an occasional “b.” or “d.” will tell us this kind of information). we could certainly provide more granularity for dates, but that would make the marc 21 format much more complex and difficult to learn. people who dislike the marc 21 format already argue that it is too granular and therefore requires too much of a learning curve before people can use it. for example, tennant claims that “there are only two kinds of people who believe themselves able to read a marc record without referring to a stack of manuals: a handful of our top catalogers and those on serious drugs.”9 how much of the granularity already in marc 21 is used either in existing records or, even if present, is used in indexing and display software? granularity costs money, and libraries and archives are already starving for resources. granularity can only be provided by people, and people are expensive. granularity and structure also exist in tension with each other. more granularity can lead to less structure (or more complexity to retain structure along with granularity). in the pursuit of more granularity of data than we have now, rda, attempting to support rdf–compliant xml encoding, has been atomizing data to make it useful to computers, but this will not necessarily make the data more useful to humans. to be useful to humans, it must be possible to group and arrange (sort) the data meaningfully, both for indexing and for display. the developers of skos refer to the “vast amounts of unstructured (i.e., human readable) information in the web,”10 yet labeling bits of data as to type and recording semantic relationships in a machine-actionable way do not necessarily provide the kind of structure necessary to make data readable by humans and therefore useful to the people the web is ultimately supposed to serve. consider the case of music instrumentation. 
if you have a piece of music for five guitars and one flute, and you simply code number and instrumentation without any way to link "five" with "guitars" and "one" with "flute," you will not be able to guarantee that a person looking for music for five flutes and one guitar will not be given this piece of music in their results (see figure 1).11

figure 1a. extract from yee rdf model that illustrates one technique for modeling musical instrumentation at the expression level (using a blank node to group repeated number and instrument type)

figure 1b. example of encoding of musical instrumentation at the expression level based on the above model (5 guitars; 1 flute; instrumentation of musical expression; original instrumentation of musical expression—number of a particular instrument; original instrumentation of musical expression—type of instrument)

the more granular the data, the less the cataloger can build order, sequencing, and linking into the data; the coding must be carefully designed to allow the desired order, sequencing, and linking for indexing and display to be possible, which might call for even more complex coding. it would be easy to lose information about order, sequencing, and linking inadvertently. actually, there are several different meanings for the term structure:
1. structure is an object of a record (structure of document?); for example, elings and waibel refer to "data fields . . . also referred to as elements . . . which are organized into a record by a data structure."12
2. structure is the communications layer, as opposed to the display layer or content designation.13
3. structure is the record, field, and subfield.
4. structure is the linking of bits of data together in the form of various types of relationships.
5. structure is the display of data in a structured, ordered, and sequenced manner to facilitate human understanding.
6. data structure is a way of storing data in a computer so that it can be used efficiently (this is how computer programmers use the term).
i hasten to add that i am definitely in favor of adding more structure and granularity to our data when it is necessary to carry out the fundamental objectives of our profession and of our catalogs. i argued earlier that frbr and rda are not granular enough when it comes to the distinction between data elements that apply to expression and those that apply to manifestation. if we could just agree on how to differentiate data applying to the manifestation from data applying to the expression instead of our current practice of identifying works with headings and lumping all manifestation and expression data together, we could increase the level of service we are able to provide to users a thousandfold. however, if we are not going to commit to differentiating between expression and manifestation, it would be more intellectually honest for frbr and rda to take the less granular path of mapping all existing bibliographic data to manifestation and expression undifferentiated, that is, to use our current data model unchanged and state this openly. i am not in favor of adding granularity for granularity's sake or for the sake of vague conceptions of possible future use. granularity is expensive and should be used only in support of clear and fundamental objectives.
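the blank-node technique shown in figure 1 can also be sketched in code. the following fragment, using the python rdflib library, is a minimal illustration of grouping each number with its instrument type so that "five" stays attached to "guitars" and "one" to "flute"; the uris and property names are invented stand-ins for the properties in my model, not established vocabulary.

# a minimal, illustrative sketch of the blank-node grouping shown in figure 1;
# the ycr: properties and the expression uri are made up for this example.
from rdflib import Graph, URIRef, Literal, BNode, Namespace

YCR = Namespace("http://example.org/ycr/")
g = Graph()
g.bind("ycr", YCR)

expression = URIRef("http://example.org/expression/piece-for-five-guitars-and-flute")

for number, instrument in [(5, "guitar"), (1, "flute")]:
    grouping = BNode()  # one anonymous node per number-instrument pairing
    g.add((expression, YCR.instrumentation, grouping))
    g.add((grouping, YCR.numberOfInstruments, Literal(number)))
    g.add((grouping, YCR.instrumentType, Literal(instrument)))

print(g.serialize(format="n3"))

with this grouping in place, a search for five flutes would have to match number and type within the same grouping node, rather than anywhere in the description, which is exactly the linkage the prose above argues can be lost when data are atomized.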
■ the goal: efficient displays and indexes

my main concern is that we model and then structure the data in a way that allows us to build the complex displays that are necessary to make catalogs appear simple to use. i am aware that the current orthodoxy is that recording data should be kept completely separate from indexing and display ("the applications layer"). because i have spent my career in a field in which catalog records are indexed and displayed badly by systems people who don't seem to understand the data contained in them, i am a skeptic. it is definitely possible to model and structure data in such a way that desired displays and indexes are impossible to construct. i have seen it happen! the lc working group report states that "it will be recognized that human users and their needs for display and discovery do not represent the only use of bibliographic metadata; instead, to an increasing degree, machine applications are their primary users."14 my fear is that the underlying assumption here is that users need to (and can) retrieve the single perfect record. this will never be true for bibliographic metadata. users will always need to assemble all relevant records (of all kinds) as precisely as possible and then browse through them before making a decision about which resources to obtain. this is as true in the semantic web—where "records" can be conceived of as entity or class uris—as it is in the world of marc–encoded metadata. some of the problems that have arisen in the past in trying to index bibliographic metadata for humans are connected to the fact that existing systems do not group all of the data related to a particular entity effectively, such that a user can use any variant name or any combination of variant names for an entity and do a successful search. currently, you can only look for a match among two or more keywords within the bounds of a single manifestation-based bibliographic record or within the bounds of a single heading, minus any variant terms for that entity. thus, when you do a keyword search for two keywords, for example, "clemens" and "adventures," you will retrieve only those manifestations of mark twain's adventures of tom sawyer that have his real name (clemens) and the title word "adventures" co-occurring within the bounded space created by a single manifestation-based bibliographic record. instead, the preferred forms and the variant forms for a given entity need to be bounded for indexing such that the keywords the user employs to search for that entity can be matched using co-occurrence rules that look for matches within a single bounded space representing the entity desired. we will return to this problem in the discussion of issue 3 in the later section "rdf problems encountered." the most complex indexing problem has always proven to be the grouping or bounding of data related to a work, since it requires pulling in all variants for the creator(s) of that work as well. otherwise, a user who searches for a work using a variant of the author's name and a variant of the title will continue to fail (as they do in all current opacs), even when the desired work exists in the catalog.
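the kind of entity-bounded co-occurrence matching described here (and developed further in the tom sawyer example that follows) can be sketched in a few lines of python. the work uri, the variant names, and the matching rule below are illustrative assumptions rather than an implementation of any existing system; the point is only that every keyword in the search must co-occur within the pool of variants bounded by one work entity, rather than within one manifestation record.

# a minimal, illustrative sketch of entity-bounded keyword co-occurrence:
# all variant creator and title keywords for a work are pooled under the work's
# uri, and a search matches only if every keyword occurs somewhere in that pool.
work_entities = {
    "http://example.org/work/adventures-of-tom-sawyer": {
        "creator_variants": ["Mark Twain", "Samuel Langhorne Clemens"],
        "title_variants": ["The Adventures of Tom Sawyer", "Tom Sawyer"],
    },
}

def bounded_match(query, entity):
    """true if every keyword in the query occurs in the entity's bounded pool
    of creator and title variants."""
    pool = " ".join(entity["creator_variants"] + entity["title_variants"]).lower()
    return all(keyword in pool for keyword in query.lower().split())

# "clemens adventures" retrieves the work even though no single
# manifestation record contains both keywords.
hits = [uri for uri, entity in work_entities.items()
        if bounded_match("clemens adventures", entity)]
print(hits)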
if we could create a uri for the adventures of tom sawyer that included all variant names for the author and all variant titles for the work (including the variant title tom sawyer), the same keyword search described above ("clemens" and "adventures") could be made to retrieve all manifestations and expressions of the adventures of tom sawyer, instead of the few isolated manifestations that it would retrieve in current catalogs. we need to make sure that we design and structure the data such that the following displays are possible:
■ display all works by this author in alphabetical order by title with the sorting element (title) appearing at the top of each work displayed.
■ display all works on this subject in alphabetical order by principal author and title (with principal author and title appearing at top of each work displayed), or title if there is no principal author (with title appearing at top of each work displayed).
we must ensure that we design and structure the data in such a way that our structure allows us to create subgroups of related data, such as instrumentation for a piece of music (consisting of a number associated with each particular instrument), place and related publisher for a certain span of dates on a serial title change record, and the like.

■ which standards will carry out which functions?

currently, we have a number of different standards to carry out a number of different functions; we can speculate about how those functions might be allocated in a new semantic web–based dispensation, as shown in table 1. in table 1, data structure is taken to mean what a record represents or stands for; traditionally, a record has represented an expression (in the days of
all of these goals only can be met if complex, high-quality displays can be built from the data created according to the data model. indexing rules (see table 1) were once under the control of catalogers (in book and card catalogs) in that users had to navigate through headings and cross-references to find table 1. possible reallocation of current functions in a new semantic web–based dispensation function current future? data content, or content guidelines (rules for providing data in a particular element) defined by aacr2r and marc 21 defined by rda and rdf/rdfs/ owl/skos data elements defined by isbd–based aacr2r and marc 21 defined by rda and rdf/rdfs/ owl/skos data values defined by lc/naco authority file, lcsh, marc 21 coded data values, etc. defined as ontologies using rdf/ rdfs/owl/skos encoding or labeling of data elements for machine manipulation; same as data format? defined by iso 2709–based marc 21 defined by rdf/rdfs/xml data structure (i.e., what a record stands for) defined by aacr2r and marc 21; also frbr? defined by rdf/rdfs/owl/ skos schematization (constraint on structure and content) marc 21, mods, dcmi abstract model defined by rdf/rdfs/owl/ skos encoding of facts about entity relationships carried out by matching data value strings (headings found in lc/naco authority file and lcsh, issn’s, and the like) carried out by rdf/rdfs/owl/ skos in the form of uri links display rules ils software, formerly isbd– based aacr2r (“application layer”) or yee rules indexing rules ils software sparql, “application layer,” or yee rules can bibliographic data be put directly onto the semantic web? | yee 63 what they wanted; currently indexing is in the hands of system designers who prefer to provide keyword indexing of bibliographic (i.e., manifestation-based) records rather than provide users with access to the entities they are really interested in (works, authors and subjects), all represented currently by authority records for headings and cross-references. rda abdicates responsibility, pushing indexing concerns completely out of the cataloging rules. the general principle on the web is to allow resources to be indexed by any web search engines that wish to index them. current web data is not structured at all for either indexing or display. i would argue that our interest in the semantic web should be focused on whether or not it will support more data structure—as well as more logic in that data structure—to support better indexes and better displays than we have now in manifestation-based ils opacs. crucial to better indexing than we have ever had before are the co-occurrence rules for keyword indexing, that is, the rules for when a co-occurrence of two or more keywords should produce a match. we need to be able to do a keyword search across all possible variant names for the entity of interest, and the entity of interest for the average catalog user is much more likely to be a particular work than to be a particular manifestation. unfortunately, catalog-use studies only have studied so-called known-item searches without investigating whether a known-item searcher was looking for a particular edition or manifestation of a work or was simply looking for a particular work in order to make a choice as to edition or manifestation once the work was found. however, common sense tells us that it is a rare user who approaches the catalog with prior knowledge about all published editions of a given work. 
notice in table 1 the unifying effect that rdf could potentially have; it could free us from the use of multiple standards that can easily contradict each other, or at least not live peacefully together. examples are not hard to find in the current environment. one that has cropped up in the course of rda development concerns family names. presently the rules for naming families are different depending on whether the family is the subject of a work (and established according to lcsh) or whether the family is responsible for a collection of papers (and established according to rda).
■■ types of data
rda has blurred the distinctions among certain types of data, apparently because there is a perception that on the semantic web the same piece of data needs to be coded only once, and all indexing and display needs can be supported from that one piece of data. i question that assumption on the basis of my experience with bibliographic cataloging. all of the following ways of encoding the same piece of data can still have value in certain circumstances:
■■ transcribed; in rdf terms, a literal (i.e., any data that is not a uri, a constant value). transcribed data is data copied from an item being cataloged. it is valuable for providing access to the form of the name used on a title page and is particularly useful for people who use pseudonyms, corporate bodies that change name, and so on. transcribed data is an important part of the historical record and not just for off-line materials; it can be a historical record of changing data on notoriously fluid webpages.
■■ composed; in rdf terms, also a literal. composed data is information composed by a cataloger on the basis of observation of the item in hand; it can be valuable for historical purposes to know which data was composed.
■■ supplied; in rdf terms, also a literal. supplied data is information supplied by a cataloger from outside sources; it can be valuable for historical purposes to know which data was supplied and from which outside sources it came.
■■ coded; in rdf, represented by a uri. coded data would likely transform on the semantic web into links to ontologies that could provide normalized, human-readable identification strings on demand, thus causing coded and normalized data to merge into one type of data. is it not possible, though, that the coded form of normalized data might continue to provide for more efficient searching for computers as opposed to humans? coded data also has great cross-cultural value, since it is not as language-dependent as literals or normalized headings.
■■ normalized headings (controlled headings); in rdf, represented by a uri. normalized or controlled headings are still necessary to provide users with coherent, ordered displays of thousands of entities that all match the user's search for a particular entity (work, author, subject, etc.). the reason google displays are so hideous is that, so far, the data searched lacks any normalized display data. if variant language forms of the name for an entity are linked to an entity uri, it should be possible to supply headings in the language and script desired by a particular user.
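a small sketch of that last point, assuming nothing more than skos preferred labels with language tags attached to a hypothetical entity uri (python/rdflib; the uri and the labels are illustrative only):

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/")   # hypothetical namespace
g = Graph()
person = EX["person/tolstoy-leo-1828-1910"]

# one uri, with preferred labels in more than one language and script
g.add((person, SKOS.prefLabel, Literal("Tolstoy, Leo, graf, 1828-1910", lang="en")))
g.add((person, SKOS.prefLabel, Literal("Толстой, Лев Николаевич, 1828-1910", lang="ru")))
g.add((person, SKOS.prefLabel, Literal("Tolstoï, Léon, 1828-1910", lang="fr")))

def heading_for(graph, entity, preferred_lang):
    """pick the display heading in the user's language, falling back to any label."""
    fallback = None
    for label in graph.objects(entity, SKOS.prefLabel):
        if label.language == preferred_lang:
            return str(label)
        fallback = str(label)
    return fallback

print(heading_for(g, person, "ru"))   # shows the cyrillic form to a russian-speaking user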
■■ the rdf model
those who have become familiar with frbr over the years will probably not find it too difficult to transition from the frbr conceptual model to the rdf model. what frbr calls an "entity," rdf calls a "subject" and rdfs calls a "class." what frbr calls an "attribute," rdf calls an "object" and rdfs calls a "property." what frbr calls a "relationship," rdf calls a "predicate" and rdfs calls a "relationship" or a "semantic linkage" (see table 2).
table 2. the frbr conceptual model translated into rdf and rdfs
■■ frbr entity: rdf subject; rdfs class.
■■ frbr attribute: rdf object; rdfs property.
■■ frbr relationship: rdf predicate; rdfs relationship/semantic linkage.
the difficulty in any data-modeling exercise lies in deciding what to treat as an entity or class and what to treat as an attribute or property. the authors of frbr decided to create a class called expression to deal with any change in the content of a work. when frbr is applied to serials, which change content with every issue, the model does not work well. in my model, i found it useful to create a new entity at the manifestation level, the serial title, to deal with the type of change that is more relevant to serials, the change in title. i also created another new entity at the manifestation level, title-manifestation, to deal with a change of title in a nonserial work that is not associated with a change in content. one hundred years ago, this entity would have been called title-edition. i am also in the process of developing an entity at the expression level—surrogate—to deal with reproductions of original artworks that need to inherit the qualities of the original artwork they reproduce without being treated as an edition of that original artwork, which ipso facto is unique. these are just examples of cases in which it is not that easy to decide on the classes or entities that are necessary to accurately model bibliographic information. see the appendix for a complete comparison of the classes and entities defined in four different models: frbr, frad, rda, and the yee cataloging rules (ycr). the appendix also shows variation among these models concerning whether a given data element is treated as a class/entity or as an attribute/property. the most notable examples are name and preferred access point, which are treated as classes/entities in frad, as attributes in frbr and ycr, and as both in rda.
■■ rdf problems encountered
my goal for this paper is to institute discussion with data modelers about which problems i observed are insoluble and which are soluble:
1. is there an assumption on the part of semantic web developers that a given data element, such as a publisher name, should be expressed as either a literal or using a uri (i.e., controlled), but never both? cataloging is rooted in humanistic practices that require careful recording of evidence. there will always be value in distinguishing and labeling the following types of data:
■■ copied as is from an artifact (transcribed)
■■ supplied by a cataloger
■■ categorized by a cataloger (controlled)
tim berners-lee (the father of the world wide web and the semantic web) emphasizes the importance of recording not just data but also its provenance for the sake of authenticity.15 for many data elements, therefore, it will be important to be able to record both a literal (transcribed or composed form or both) and a uri (controlled form). is this a problem in rdf?
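rdf itself does allow a subject to carry both a literal-valued property and a uri-valued property, so one possible pattern, sketched below with invented example.org property names, is simply to record the transcribed form and the controlled form under two different properties; how provenance would then be layered on top (named graphs and the w3c prov vocabulary are the usual candidates) is a separate modeling choice that this sketch does not settle:

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/")   # hypothetical namespace and property names
g = Graph()
manifestation = EX["manifestation/tom-sawyer-1876"]
publisher = EX["corporatebody/american-publishing-company"]

# transcribed form, copied as is from the item: a literal
g.add((manifestation, EX.publisherStatementTranscribed,
       Literal("The American Publishing Company")))

# controlled form: a uri for the publisher entity, which carries its own labels
g.add((manifestation, EX.publisher, publisher))
g.add((publisher, SKOS.prefLabel, Literal("American Publishing Company")))

print(g.serialize(format="turtle"))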
as a corollary, if any data that can be given a uri cannot also be represented by a literal (transcribed and composed data, or one or the other), it may not be possible to design coherent, readable displays of the data describing a particular entity. among other things, cataloging is a discursive writing skill. does rdf require that all data be represented only once, either by a literal or by a uri? or is it perhaps possible that data that has a uri could also have a transcribed or composed form as a property? perhaps it will even be possible to store multiple snapshots of online works that change over time to document variant forms of a name for works, persons, and so on.
2. will the internet ever be fast enough to assemble the equivalent of our current records from a collection of hundreds or even thousands of uris? in rdf, links are one-to-one rather than one-to-many. this leads to a great proliferation of reciprocal links. the more granularity there is in the data, the more linking is necessary to ensure that atomized data elements are linked together. potentially, every piece of data describing a particular entity could be represented by a uri leading out to a skos list of data values. the number of links necessary to pull together all of the data just to describe one manifestation could become astronomical, as could the number of one-to-one links necessary to create the appearance of a one-to-many link, such as the link between an author and all the works of an author. is the internet really fast enough to assemble a record from hundreds of uris in a reasonable amount of time? given the often slow network throughput typical of many of our current internet connections, is it really practical to expect all of these pieces to be pulled together efficiently to create a single display for a single user? we may yet feel nostalgia for the single manifestation-based record that already has all of the relevant data in it (no assembly required). bruce d'arcus points out, however, that "i think if you're dealing with rdf, you wouldn't necessarily be gathering these data in real-time. the uris that are the targets for those links are really just global identifiers. how you get the triples is a separate matter. so, for example, in my own personal case, i'm going to put together an rdf store that is populated with data from a variety of sources, but that data population will happen by script, and i'll still be querying a single endpoint, where the rdf is stored in a relational database."16 in other words, d'arcus essentially will put them all in one place, or in one database that "looks" from a uri perspective to be "one place" where they're already gathered.
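d'arcus's scenario can be sketched very simply: the triples are harvested ahead of time, by script, into one local store, and display-time queries go to that single store rather than out to hundreds of remote uris. a minimal python/rdflib version, with placeholder urls standing in for whatever sources would actually publish the data, might look like this:

from rdflib import Graph

g = Graph()

# population happens ahead of time, by script, not at display time;
# these urls are placeholders, not real data sources
for source in ["https://example.org/authorities.ttl",
               "https://example.org/bibliographic-data.ttl"]:
    g.parse(source, format="turtle")

# at query time there is a single local "endpoint": one graph, one query
# (the property uri below is the invented one from the earlier sketches)
results = g.query("""
    SELECT ?work ?label WHERE {
        ?work <http://example.org/isCreatedBy> ?creator .
        ?work <http://www.w3.org/2004/02/skos/core#prefLabel> ?label .
    }
""")
for row in results:
    print(row.work, row.label)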
3. is rdf capable of dealing with works that are identified using their creators? we need to treat author as both an entity in its own right and as a property of a work, and in many cases the latter is the more important function for user service. lexical labels, or human-readable identifiers for works that are identified using both the principal author and the title, are particularly problematic in rdf given that the principal author is an entity in its own right. is rdf capable of supporting the indexing necessary to allow a user to search using any variant of the author's name and any variant of the title of a work in combination and still retrieve all expressions and manifestations of that work, given that author will have a uri of its own, linked by means of a relationship link to the work uri? is rdf capable of supporting the display of a list of one thousand works, each identified by principal author, in order first by principal author, then by title, then by publication date, given that the preferred heading for each principal author would have to be assembled from the uri for that principal author and the preferred title for each work would have to be assembled from the uri for that work? for fear that this will not, in fact, be possible, i have put a human-readable work-identifier data element into my model that consists of principal author and title when appropriate, even though that means the preferred name of the principal author may not be able to be controlled by the entity record for the principal author. any guidance from experienced data modelers in this regard would be appreciated. according to bruce d'arcus, this is purely an interface or application question that does not require a solution at the data layer.17 since we have never had interfaces or applications that would do this correctly, even though the data is readily available in authority records, i am skeptical about this answer! perhaps bruce's suggestion under item 9 of designating a sortname property for each entity is the solution here as well. my human-readable work identifier consisting of the name of the principal creator and uniform title of work could be designated the sortname property for the work. it would have to be changed whenever the preferred form of the name for the principal creator changed, however.
4. do all possible inverse relationships need to be expressed explicitly, or can they be inferred? my model is already quite large, and i have not yet defined the inverse of every property as i really should to have a correct rdf model. in other words, for every property there needs to be an inverse property; for example, the property iscreatorof needs to have the inverse property iscreatedby; thus "twain" has the property iscreatorof, while "adventures of tom sawyer" has the property iscreatedby. perhaps users and inputters will not actually have to see the huge, complex rdf data model that would result from creating all the inverse relationships, but those who maintain the model will have to deal with a great deal of complexity. however, since i'm not a programmer, i don't know how the complexity of rdf compares to the complexity of existing ils software.
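on the narrower question of whether inverses must all be hand-coded: owl provides owl:inverseOf, so a model can declare each pairing once and let an owl-aware reasoner, or even a short script like the sketch below (python/rdflib, with the hypothetical iscreatorof/iscreatedby properties), materialize the other direction. whether that removes enough of the maintenance burden is a question for the data modelers:

from rdflib import Graph, Namespace
from rdflib.namespace import OWL

EX = Namespace("http://example.org/")   # hypothetical namespace
g = Graph()

# declare the two properties as inverses of one another, once
g.add((EX.isCreatorOf, OWL.inverseOf, EX.isCreatedBy))

# assert only one direction in the data
twain = EX["person/twain-mark-1835-1910"]
work = EX["work/tom-sawyer"]
g.add((twain, EX.isCreatorOf, work))

# materialize the missing direction; an owl reasoner would produce the same triples
for p1, _, p2 in list(g.triples((None, OWL.inverseOf, None))):
    for s, _, o in list(g.triples((None, p1, None))):
        g.add((o, p2, s))
    for s, _, o in list(g.triples((None, p2, None))):
        g.add((o, p1, s))

print((work, EX.isCreatedBy, twain) in g)   # True once the inverse has been inferred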
5. can rdf solve the problems we are having now because of the lack of transitivity or inheritance in the data models that underlie current ilses, or will rdf merely perpetuate these problems? we have problems now with the data models that underlie our current ilses because of the inability of these models to deal with hierarchical inheritance, such that whatever is true of an entity in the hierarchy is also true of every entity below that entity in the hierarchy. one example is that of cross-references to a parent corporate body that should be held to apply to all subdivisions of that corporate body but never are in existing ils systems. there is a cross-reference from "fbi" to "united states. federal bureau of investigation," but not from "fbi counterterrorism division" to "united states. federal bureau of investigation. counterterrorism division." for that reason, a search in any opac name index for "fbi counterterrorism division" will fail. we need systems that recognize that data about a parent corporate body is relevant to all subdivisions of that parent body. we need systems that recognize that data about a work is relevant to all expressions and manifestations of that work. rdf allows you to link a work to an expression and an expression to a manifestation, but i don't believe it allows you to encode the information that everything that is true of the work is true of all of its expressions and manifestations. rob styles seems to confirm this: "rdf doesn't have hierarchy. in computer science terms, it's a graph, not a tree, which means you can connect anything to anything else in any direction."18 of course, not all links should be this kind of transitive or inheritance link. one expression of work a is linked to another expression of work a by links to work a, but whatever is true of one of those expressions is not necessarily true of the other; one may be illustrated, for example, while the other is not. whatever is true of one work is not necessarily true of another work related to it by a related work link. it should be recognized that bibliographic data is rife with hierarchy. it is one of our major tools for expressing meaning to our users. corporate bodies have corporate subdivisions, and many things that are true for the parent body also are true for its subdivisions. subjects are expressed using main headings and subject subdivisions, and many things that are true for the main heading (such as variant names) also are true for the heading combined with one of its subdivisions. geographic areas are contained within larger geographic areas, and many things that are true of the larger geographic area also are true for smaller regions, counties, cities, etc., contained within that larger geographic area. for all these reasons, i believe that, to do effective displays and indexes for our bibliographic data, it is critical that we be able to distinguish between a hierarchical relationship and a nonhierarchical relationship. (see the sketch following item 7 below.)
6. to recognize the fact that the subject of a book or a film could be a work, a person, a concept, an object, an event, or a place (all classes in the model), is there any reason we cannot define subject itself as a property (a relationship) rather than a class in its own right? in my model, all subject properties are defined as having a domain of resource, meaning there is no constraint as to the class to which these subject properties apply. i'm not sure if there will be any fall-out from that modeling decision.
7. how do we distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location? sometimes a place is a jurisdiction and behaves like a corporate body (e.g., united states is the name of the government of the united states). sometimes place is a physical location in which something is located (e.g., the birds discussed in a book about the birds of the united states). to distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location, i have defined two different classes for place: place as jurisdictional corporate body and place as geographic area. will this cause problems in the model? will there be times when it prevents us from making elegant generalizations in the model about place per se? there is a similar problem with events. some events are corporate bodies (e.g., conferences that publish papers) and some are a kind of subject (e.g., an earthquake). i have defined two different classes for event: conference or other event as corporate body creator and event as subject.
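to make the hierarchy concern raised in item 5 concrete: one possible arrangement is to give the hierarchical link its own property and have indexing walk up that property, so that a parent body's variant names are inherited by its subdivisions. the sketch below (python/rdflib, with an invented example.org isSubdivisionOf property; it is not a claim about what rdf systems do out of the box) shows how a keyword search on "fbi counterterrorism division" could then succeed. a nonhierarchical link, such as a related-work link, would simply use a different property that the walk does not follow, preserving the distinction argued for above:

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/")   # hypothetical namespace and property
g = Graph()

fbi = EX["corporatebody/us-federal-bureau-of-investigation"]
division = EX["corporatebody/us-fbi-counterterrorism-division"]

g.add((fbi, SKOS.prefLabel, Literal("United States. Federal Bureau of Investigation")))
g.add((fbi, SKOS.altLabel, Literal("FBI")))
g.add((division, SKOS.prefLabel, Literal("Counterterrorism Division")))
g.add((division, EX.isSubdivisionOf, fbi))   # the hierarchical link

def inherited_labels(graph, body, ex):
    """labels for a corporate body plus labels inherited from every parent body."""
    labels = [str(l) for l in graph.objects(body, SKOS.prefLabel)]
    labels += [str(l) for l in graph.objects(body, SKOS.altLabel)]
    for parent in graph.objects(body, ex.isSubdivisionOf):
        labels += inherited_labels(graph, parent, ex)
    return labels

# "fbi" now reaches the subdivision too, because the parent's variant name is inherited
print(inherited_labels(g, division, EX))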
8. what is the best way to model a bound-with or an issued-with relationship, or a part–whole relationship in which the whole must be located to obtain the part? the bound-with relationship is actually between two items containing two different works, while the issued-with relationship is between two manifestations containing two different works (see figure 2). is this a work-to-work relationship? will designating it a work-to-work relationship cause problems for indicating which specific items or manifestation-items of each work are physically located in the same place? this question may also apply to those part–whole relationships in which the part is physically contained within the whole and both are located in the same place (sometimes known as analytics). one thing to bear in mind is that in all of these cases the relationship between two works does not hold between all instances of each work; it only holds for those particular instances that are contained in the particular manifestation or item that is bound with, issued with, or part of the whole. however, if the relationship is modeled as a work-1-manifestation to work-2-manifestation relationship, or a work-1-item to work-2-item relationship, care must be taken in the design of displays to pull in enough information about the two or more works so as not to confuse the user.
9. how do we express the arrangement of elements that have a definite order? i am having trouble imagining how to encode the ordering of data elements that make up a larger element, such as the pieces of a personal name. this is really a desire to control the display of those atomized elements so that they make sense to human beings rather than just to machines. could one define a property such as natural language order of forename, surname, middle name, patronymic, matronymic and/or clan name of a person, given that the ideal order of these elements might vary from one person to another? could one define properties such as sorting element 1, sorting element 2, sorting element 3, etc., and assign them to the various pieces that will be assembled to make a particular heading for an entity, such as an lcsh heading for a historical period? (depending on the answer to the question in item 11, it may or may not be possible to assign a property to a property in this fashion.) are there standard sorting rules we need to be aware of (in unicode, for example)? are there other rdf techniques available to deal with sorting and arrangement? bruce d'arcus suggests that, instead of coding the name parts, it would be more useful to designate sortname properties;19 might it not be necessary to designate a sortname property for each variant name as well, for cases in which variants need to appear in sorted displays? and wouldn't these sortname properties complicate maintenance over time as preferred and variant names changed?
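d'arcus's sortname suggestion can at least be sketched: each entity (and, if need be, each variant name) carries a filing form under an invented property, and displays sort on the filing form while showing the preferred label. the sketch below (python/rdflib; the sortName property and the headings are made up for illustration) shows the basic mechanics, though it says nothing about the maintenance problem raised above:

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/")   # hypothetical namespace; sortName is invented
g = Graph()

for key, display, filing in [
    ("person/le-carre-john", "le Carré, John, 1931-2020", "lecarre john 1931"),
    ("person/de-la-mare-walter", "De la Mare, Walter, 1873-1956", "delamare walter 1873"),
]:
    entity = EX[key]
    g.add((entity, SKOS.prefLabel, Literal(display)))
    g.add((entity, EX.sortName, Literal(filing)))   # filing form, distinct from display form

# build a browse display ordered by the filing form but showing the display form
entries = []
for entity, _, filing in g.triples((None, EX.sortName, None)):
    label = next(g.objects(entity, SKOS.prefLabel))
    entries.append((str(filing), str(label)))
for _, label in sorted(entries):
    print(label)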
10. how do we link related data elements in such a way that effective indexing and displays are possible? some examples: number and kind of instrument (e.g., music written for two oboes and three guitars); multiple publishers, frequencies, subtitles, editors, etc., with date spans for a serial title change (or will it be necessary to create a new manifestation for every single change in subtitle, publisher name, place of publication, etc.?). the assumption seems to be that there will be no repeatable data elements. based on my somewhat limited experience with rdf, it appears that there are record equivalents (every data element—property or relationship—pertaining to a particular entity with a uri), but there are no field or subfield equivalents that allow the sublinking of related pieces of data about an entity. indeed, rob styles goes so far as to argue that ultimately there is no notion of a "record" in rdf.20 it is possible that blank nodes might be able to fill in for fields and subfields in some cases for grouping data, but there are dangers involved in their use.21 (a sketch of this blank-node grouping follows item 13 below.) to a cataloger, it looks as though the plan is for rdf data to float around loose without any requirement that there be a method for pulling it together into coherent displays designed for human beings.
11. can a property have a property in rdf? as an example of where it might be useful to define a property of a property, robert maxwell suggests that date of publication is really an attribute (property) of the published by relationship (another property).22 another example: in my model, a variant title for a serial is a property. can that property itself have the property type of variant title to encompass things like spine title, key title, etc.? another example appeared in item 9, in which it is suggested that it might be desirable to assign sort-element properties to the various elements of a name property.
12. how do we document record display decisions? there is no way to record display decisions in rdf itself; it is completely display-neutral. we could not safely commit to a particular rdf–based data model until a significant amount of sample bibliographic data had been created and open-source indexing and display software had been designed and user-tested on that data. it may be that we will need to supplement rdf with some other encoding mechanism that allows us to record display decisions along with the data. current cataloging rules are about display as much as they are about content designation. isbd concerns the order in which the elements should be displayed to humans. the cataloging objectives concern display to users of such entity groups as the works of an author, the editions of a work, and the works on a subject.
13. can all bibliographic data be reduced to either a class or a property with a finite list of values? another way to put this is to ask if all that catalogers do could be reduced to a set of pull-down menus. cataloging is the art of writing discursive prose as much as it is the ability to select the correct value for a particular data element. we must deal with ambiguous data (presented by joe blow could mean that joe created the entire work, produced it, distributed it, sponsored it, or merely funded it). we must sometimes record information without knowing its exact meaning. we must deal with situations that have not been anticipated in advance. it is not possible to list every possible kind of data and every possible value for each type of data up front before any data is gathered. it will always be necessary to provide a plain-text escape hatch. the bibliographic world is a complex, constantly changing world filled with ambiguity.
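returning to item 10: the blank-node pattern from the rdf primer cited in note 21 is the closest thing rdf offers to a subfield-style grouping. in the sketch below (python/rdflib, with invented example.org properties), each number-plus-instrument pair lives in its own anonymous node, so the "two" stays attached to "oboes" and the "three" to "guitars"; the drawbacks noted above, chiefly that blank nodes cannot be referenced from outside the graph that contains them, still apply:

from rdflib import Graph, Namespace, BNode, Literal

EX = Namespace("http://example.org/")   # hypothetical namespace and properties
g = Graph()
work = EX["work/concerto-for-two-oboes-and-three-guitars"]

# each instrument/number pair is grouped in its own blank node,
# standing in for the subfield grouping a marc field would have provided
for count, instrument in [(2, "oboe"), (3, "guitar")]:
    slot = BNode()
    g.add((work, EX.mediumOfPerformance, slot))
    g.add((slot, EX.instrument, Literal(instrument)))
    g.add((slot, EX.numberOfInstruments, Literal(count)))

print(g.serialize(format="turtle"))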
figure 2. examples of part–whole relationships. how might these be best expressed in rdf?
issued-with relationship: a copy of charlie chaplin's 1917 film the immigrant can be found on a videodisc compilation called charlie chaplin, the early years along with two other chaplin films. this compilation was published and collected by many different libraries and media centers. if a user wants to view this copy of the immigrant, he or she will first have to locate charlie chaplin, the early years, then look for the desired film at the beginning of the first videodisc in the set. the issued-with relationship between the immigrant and the other two films on charlie chaplin, the early years is currently expressed in the bibliographic record by means of a "with" note: first on charlie chaplin, the early years, v. 1 (62 min.) with: the count – easy street.
bound-with relationship: the university of california, los angeles film & television archive has acquired a reel of 16 mm. film from a collector who strung five warner bros. cartoons together on a single reel of film. we can assume that no other archive, library, or media collection will have this particular compilation of cartoons, so the relationship between the five cartoons is purely local in nature. however, any user at the film & television archive who wishes to view one of these cartoons will have to request a viewing appointment for the entire reel and then find the desired cartoon among the other four on the reel. the bound-with relationship among these cartoons is currently expressed in a holdings record by means of a "with" note: fourth on reel with: daffy doodles – tweety pie – i love to singa – along flirtation walk.
■■ what are the next steps?
in a sense, this paper is a first crude attempt at locating unmapped territory that has not yet been explored. if we were to decide as a community that it would be valuable to move our shared cataloging activities onto the semantic web, we would have a lot of work ahead of us. if some of the rdf problems described above are insoluble, we may need to work with semantic web developers to create a more sophisticated version of rdf that can handle the transitivity and complex linking required by our data. we will also need to encourage a very complex existing community to evolve institutional structures that would enable a more efficient use of the internet for the sharing of cataloging and other metadata creation. this is not just a technological problem, but also a political one. in the meantime, the experiment continues. let the thinking and learning begin!
references and notes
1. "notation3, or n3 as it is more commonly known, is a shorthand non–xml serialization of resource description framework models, designed with human-readability in mind: n3 is much more compact and readable than xml rdf notation. the format is being developed by tim berners-lee and others from the semantic web community." wikipedia, "notation 3," http://en.wikipedia.org/wiki/notation_3 (accessed feb. 19, 2009).
2. frbr review group, www.ifla.org/vii/s13/wgfrbr/; frbr review group, franar (working group on functional requirements and numbering of authority records), www.ifla.org/vii/d4/wg-franar.htm; frbr review group, frsar (working group, functional requirements for subject authority records), www.ifla.org/vii/s29/wgfrsar.htm; frbroo, frbr review group, working group on frbr/crm dialogue, www.ifla.org/vii/s13/wgfrbr/frbr-crmdialogue_wg.htm.
3. library of congress, response to on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 24, 39, 40, www.loc.gov/bibliographic-future/news/lcwgrptresponse_dm_053008.pdf (accessed mar. 25, 2009).
4. ibid., 39.
5. ibid., 41.
6. dublin core metadata initiative, dcmi/rda task group wiki, http://www.dublincore.org/dcmirdataskgroup/ (accessed mar. 25, 2009).
7. mikael nilsson, andy powell, pete johnston, and ambjorn naeve, expressing dublin core metadata using the resource description framework (rdf), http://dublincore.org/documents/2008/01/14/dc-rdf/ (accessed mar. 25, 2009).
8. see for example table 6.3 in frbr, which maps to manifestation every kind of data that pertains to expression change with the exception of language change. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records (munich: k. g. saur, 1998): 95, http://www.ifla.org/vii/s13/frbr/frbr.pdf (accessed mar. 4, 2009).
9. roy tennant, "marc must die," library journal 127, no. 17 (oct. 15, 2002): 26.
10. w3c, skos simple knowledge organization system reference, w3c working draft 29 august 2008, http://www.w3.org/tr/skos-reference/ (accessed mar. 25, 2009).
11. the extract in figure 1 is taken from my complete rdf model, which can be found at http://myee.bol.ucla.edu/ycrschemardf.txt.
12. mary w. elings and gunter waibel, "metadata for all: descriptive standards and metadata sharing across libraries, archives and museums," first monday 12, no. 3 (mar. 5, 2007), http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1628/1543 (accessed mar. 25, 2009).
13. oclc, a holdings primer: principles and standards for local holdings records, 2nd ed. (dublin, ohio: oclc, 2008), 4, http://www.oclc.org/us/en/support/documentation/localholdings/primer/holdings%20primer%202008.pdf (accessed mar. 25, 2009).
14. the library of congress working group, on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 30, http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf (accessed mar. 25, 2009).
15. talis, sir tim berners-lee talks with talis about the semantic web: transcript of an interview recorded on 7 february 2008, http://talis-podcasts.s3.amazonaws.com/twt20080207_timbl.html (accessed mar. 25, 2009).
16. bruce d'arcus, e-mail to author, mar. 18, 2008.
17. ibid.
18. rob styles, e-mail to author, mar. 25, 2008.
19. bruce d'arcus, e-mail to author, mar. 18, 2008.
20. rob styles, e-mail to author, mar. 25, 2008.
21. w3c, "section 2.3, structured property values and blank nodes," in rdf primer: w3c recommendation 10 february 2004, http://www.w3.org/tr/rdf-primer/#structuredproperties (accessed mar. 25, 2009).
22. robert maxwell, frbr: a guide for the perplexed (chicago: ala, 2008).
| yee 69 entities/classes in rda, frbr, frad compared to yee cataloging rules (ycr) rda, frbr, and frad ycr group 1: work work group 1: expression expression surrogate group 1: manifestation manifestation title-manifestation serial title group 1: item item group 2: person person fictitious character performing animal group 2: corporate body corporate body corporate subdivision place as jurisdictional corporate body conference or other event as corporate body creator jurisdictional corporate subdivision family (rda and frad only) group 3: concept concept group 3: object object group 3: event event or historical period as subject group 3: place place as geographic area discipline genre/form name identifier controlled access point rules (frad only) agency (frad only) appendix. entity/class and attribute/property comparisons 70 information technology and libraries | june 2009 attributes/properties in frbr compared to frad model entity frbr frad work title of the work form of work date of the work other distinguishing characteristics intended termination intended audience context for the work medium of performance (musical work) numeric designation (musical work) key (musical work) coordinates (cartographic work) equinox (cartographic work) form of work date of the work medium of performance subject of the work numeric designation key place of origin of the work original language of the work history other distinguishing characteristic expression title of the expression form of expression date of expression language of expression other distinguishing characteristics extensibility of expression revisability of expression extent of the expression summarization of content context for the expression critical response to the expression use restrictions on the expression sequencing pattern (serial) expected regularity of issue (serial) expected frequency of issue (serial) type of score (musical notation) medium of performance (musical notation or recorded sound) scale (cartographic image/object) projection (cartographic image/object) presentation technique (cartographic image/object) representation of relief (cartographic image/object) geodetic, grid, and vertical measurement (cartographic image/ object) recording technique (remote sensing image) special characteristic (remote sensing image) technique (graphic or projected image) form of expression date of expression language of expression technique other distinguishing characteristic surrogate can bibliographic data be put directly onto the semantic web? 
| yee 71 model entity frbr frad manifestation title of the manifestation statement of responsibility edition/issue designation place of publication/distribution publisher/distributor date of publication/distribution fabricator/manufacturer series statement form of carrier extent of the carrier physical medium capture mode dimensions of the carrier manifestation identifier source for acquisition/access authorization terms of availability access restrictions on the manifestation typeface (printed book) type size (printed book) foliation (hand-printed book) collation (hand-printed book) publication status (serial) numbering (serial) playing speed (sound recording) groove width (sound recording) kind of cutting (sound recording) tape configuration (sound recording) kind of sound (sound recording) special reproduction characteristic (sound recording) colour (image) reduction ratio (microform) polarity (microform or visual projection) generation (microform or visual projection) presentation format (visual projection) system requirements (electronic resource) file characteristics (electronic resource) mode of access (remote access electronic resource) access address (remote access electronic resource) edition/issue designation place of publication/distribution publisher/distributor date of publication/distribution form of carrier numbering title-manifestation serial title item item identifier fingerprint provenance of the item marks/inscriptions exhibition history condition of the item treatment history scheduled treatment access restrictions on the item location of item attributes/properties in frbr compared to frad (cont.) 72 information technology and libraries | june 2009 model entity frbr frad person name of person dates of person title of person other designation associated with the person dates associated with the person title of person other designation associated with the person gender place of birth place of death country place of residence affiliation address language of person field of activity profession/occupation biography/history fictitious character performing animal corporate body name of the corporate body number associated with the corporate body place associated with the corporate body date associated with the corporate body other designation associated with the corporate body place associated with the corporate body date associated with the corporate body other designation associated with the corporate body type of corporate body language of the corporate body address field of activity history corporate subdivision place as jurisdictional corporate body conference or other event as corporate body creator jurisdictional corporate subdivision family type of family dates of family places associated with family history of family concept term for the concept type of concept object term for the object type of object date of production place of production producer/fabricator physical medium event term for the event date associated with the event place associated with the event attributes/properties in frbr compared to frad (cont.) can bibliographic data be put directly onto the semantic web? 
| yee 73 model entity frbr frad place term for the place coordinates other geographical information discipline genre/form name type of name scope of usage dates of usage language of name script of name transliteration scheme of name identifier type of identifier identifier string suffix controlled access point type of controlled access point status of controlled access point designated usage of controlled access point undifferentiated access point language of base access point script of base access point script of cataloguing transliteration scheme of base access point transliteration scheme of cataloguing source of controlled access point base access point addition rules citation for rules rules identifier agency name of agency agency identifier location of agency attributes/properties in frbr compared to frad (cont.) 74 information technology and libraries | june 2009 attributes/properties in rda compared to ycr model entity rda ycr work title of the work form of work date of work place of origin of work medium of performance numeric designation key signatory to a treaty, etc. other distinguishing characteristic of the work original language of the work history of the work identifier for the work nature of the content coverage of the content coordinates of cartographic content equinox epoch intended audience system of organization dissertation or theses information key identifier for work language-based identifier (preferred lexical label) variant language-based identifier (alternate lexical label) language-based identifier (preferred lexical label) for work language-based identifier for work (preferred lexical label) identified by principalcreator in combination with uniform title language-based identifier (preferred lexical label) for work identified by title alone (uniform title) supplied title for work variant title for work original language of work responsibility for work original publication statement of work dates associated with work original publication/release/broadcast date of work copyright date of work creation date of work date of first recording of a work date of first performance of a work finding date of naturally occurring object original publisher/distributor/broadcaster of work places associated with work original place of publication/distribution/broadcasting for work country of origin of work place of creation of work place of first recording of work place of first performance of work finding place of naturally occurring object original method of publication/distribution/broadcast of work serial or integrating work original numeric and/or alphabetic designations—beginning serial or integrating work original chronological designations— beginning serial or integrating work original numeric and/or alphabetic designations—ending serial or integrating work original chronological designations— ending encoding of content of work genre/form of content of work original instrumentation of musical work instrumentation of musical work—number of a particular instrument instrumentation of musical work—type of instrument original voice(s) of musical work voice(s) of musical work—number of a particular type of voice voice(s) of musical work—type of voice original key of musical work numeric designation of musical work coordinates of cartographic work equinox of cartographic work original physical characteristics of work original extent of work original dimensions of work mode of issuance of work can bibliographic data be put directly onto the semantic web? 
| yee 75 model entity rda ycr work (cont.) original aspect ratio of moving image work original image format of moving image work original base of work original materials applied to base of work work summary work contents list custodial history of work creation of archival collection censorship history of work note about relationship(s) to other works expression content type date of expression language of expression other distinguishing characteristic of the expression identifier for the expression summarization of the content place and date of capture language of the content form of notation accessibility content illustrative content supplementary content colour content sound content aspect ratio format of notated music medium of performance of musical content duration performer, narrator, and/or presenter artistic and/or technical credits scale projection of cartographic content other details of cartographic content awards key identifier for expression language-based identifier (preferred lexical label) for expression variant title for expression nature of modification of expression expression title expression statement of responsibility edition statement scale of cartographic expression projection of cartographic expression publication statement of expression place of publication/distribution/release/broadcasting for expression place of recording for expression publisher/distributor/releaser/broadcaster for expression publication/distribution/release/broadcast date for expression copyright date for expression date of recording for expression numeric and/or alphabetic designations for serial expressions chronological designations for serial expressions performance date for expression place of performance for expression extent of expression content of expression language of expression text language of expression captions language of expression sound track language of sung or spoken text of expression language of expression subtitles language of expression intertitles language of summary or abstract of expression instrumentation of musical expression instrumentation of musical expression—number of a particular instrument instrumentation of musical expression—type of instrument voice(s) of musical expression voice(s) of musical expression—number of a particular type of voice voice(s) of musical expression—type of voice key of musical expression appendages to the expression expression series statement mode of issuance for expression notes about expression surrogate [under development] attributes/properties in rda compared to ycr (cont.) 
76 information technology and libraries | june 2009 model entity rda ycr manifestation title statement of responsibility edition statement numbering of serials production statement publication statement distribution statement manufacture statement copyright date series statement mode of issuance frequency identifier for the manifestation note media type carrier type base material applied material mount production method generation layout book format font size polarity reduction ratio sound characteristics projection characteristics of motion picture film video characteristics digital file characteristics equipment and system requirements terms of availability key identifier for manifestation publication statement of manifestation place of publication/distribution/release/broadcast of manifestation manifestation publisher/distributor/releaser/broadcaster manifestation date of publication/distribution/release/broadcast carrier edition statement carrier piece count carrier name carrier broadcast standard carrier recording type carrier playing speed carrier configuration of playback channels process used to produce carrier carrier dimensions carrier base materials carrier generation carrier polarity materials applied to carrier carrier encoding format intermediation tool requirements system requirements serial manifestation illustration statement manifestation standard number manifestation isbn manifestation issn manifestation publisher number manifestation universal product code notes about manifestation titlemanifestation key identifier for title-manifestation variant title for title-manifestation title-manifestation title title-manifestation statement of responsibilities title-manifestation edition statement publication statement of title-manifestation place of publication/distribution/release/broadcasting of titlemanifestation publisher/distributor/releaser, broadcaster of title-manifestation date of publication/distribution/release/broadcast of titlemanifestation title-manifestation series title-manifestation mode of issuance notes about title-manifestation title-manifestation standard number attributes/properties in rda compared to ycr (cont.) can bibliographic data be put directly onto the semantic web? | yee 77 model entity rda ycr serial title key identifier for serial title variant title for serial title title of serial title serial title statement of responsibility serial title edition statement publication statement of serial title place of publication/distribution/release/broadcast of serial title publisher/distributor/releaser/broadcaster of serial title date of publication/distribution/release/broadcast of serial title serial title beginning numeric and/or alphabetic designations serial title beginning chronological designations serial title ending numeric and/or alphabetic designations serial title ending chronological designations serial title frequency serial title mode of issuance serial title illustration statement notes about serial title serial title issn-l item preferred citation custodial history immediate source of acquisition identifier for the item item-specific carrier characteristics key identifier for item item barcode item location item call number or accession number item copy number item provenance item condition item marks and inscriptions item exhibition history item treatment history item scheduled treatment item access restrictions attributes/properties in rda compared to ycr (cont.) 
78 information technology and libraries | june 2009 model entity rda ycr person name of the person preferred name for the person variant name for the person date associated with the person title of the person fuller form of name other designation associated with the person gender place of birth place of death country associated with the person place of residence address of the person affiliation language of the person field of activity of the person profession or occupation biographical information identifier for the person key identifier for person language-based identifier (preferred lexical label) for person clan name of person forename/given name/first name of person matronymic of person middle name of person nickname of person patronymic of person surname/family name of person natural language order of forename, surname, middle name, patronymic, matronymic and/or clan name of person affiliation of person biography/history of person date of birth of person date of death of person ethnicity of person field of activity of person gender of person language of person place of birth of person place of death of person place of residence of person political affiliation of person profession/occupation of person religion of person variant name for person fictitious character [under development] performing animal [under development] corporate body name of the corporate body preferred name for the corporate body variant name for the corporate body place associated with the corporate body date associated with the corporate body associated institution other designation associated with the corporate body language of the corporate body address of the corporate body field of activity of the corporate body corporate history identifier for the corporate body key identifier for corporate body language-based identifier (preferred lexical label) for corporate body dates associated with corporate body field of activity of corporate body history of corporate body language of corporate body place associated with corporate body type of corporate body variant name for corporate body corporate subdivision [under development] place as jurisdictional corporate body [under development] attributes/properties in rda compared to ycr (cont.) can bibliographic data be put directly onto the semantic web? 
model entity rda ycr conference or other event as corporate body creator [under development] jurisdictional corporate subdivision [under development] family name of the family preferred name for the family variant name for the family type of family date associated with the family place associated with the family prominent member of the family hereditary title family history identifier for the family concept term for the concept preferred term for the concept variant term for the concept type of concept identifier for the concept key identifier for concept language-based identifier (preferred lexical label) for concept qualifier for concept language-based identifier variant name for concept object name of the object preferred name for the object variant name for the object type of object date of production place of production producer/fabricator physical medium identifier for the object key identifier for object language-based identifier (preferred lexical label) for object qualifier for object language-based identifier variant name for object event name of the event preferred name for the event variant name for the event date associated with the event place associated with the event identifier for the event key identifier for event or historical period as subject language-based identifier (preferred lexical label) for event or historical period as subject beginning date for event or historical period as subject ending date for event or historical period as subject variant name for event or historical period as subject place name of the place preferred name for the place variant name for the place coordinates other geographical information identifier for the place key identifier for place as geographic area language-based identifier (preferred lexical label) for place as geographic area qualifier for place as geographic area variant name for place as geographic area discipline key identifier for discipline language-based identifier (preferred lexical label) (name or classification number or symbol) for discipline translation of meaning of classification number or symbol for discipline attributes/properties in rda compared to ycr (cont.) model entity rda ycr genre/form key identifier for genre/form language-based identifier (preferred lexical label) for genre/form variant name for genre/form name scope of usage date of usage identifier controlled access point rules agency note: in rda, the following attributes have not yet been assigned to a particular class or entity: extent, dimensions, terms of availability, contact information, restrictions on access, restrictions on use, uniform resource locator, status of identification, source consulted, cataloguer's note, status of identification, and undifferentiated name indicator. name is being treated as both a class and a property. identifier and controlled access point are treated as properties rather than classes in both rda and ycr. attributes/properties in rda compared to ycr (cont.)
book reviews
systematic analysis of university libraries, by jeffrey a. raffel and robert shishko. cambridge, mass.: m. i. t. press, 1969. 107 pp. $6.95. systematic analysis of university libraries is an exciting book, for it is the first report describing application of cost-benefit analysis to a library. raffel and shishko have applied the methodology of cost-benefit analysis to the m. i. t. libraries and have produced an admirable description of this
method of research that examines policy making in a system as a choice among alternatives. this work is not a cookbook providing answers derived from principles; it is an exposition of a methodology that produces data used as a basis of decision making. the book employs the case-study technique, with the m. i. t. libraries furnishing the raw material for the cases. findings cannot be extrapolated to all libraries, although they may be applicable in some. for example, raffel and shishko found that 75% of the m. i. t. libraries budget is allocated to research activities in the institution. such findings are inapplicable to small liberal arts colleges, where faculty members do little research. the purpose of systematic analysis of university libraries is to teach the application of cost-benefit analysis rather than to provide answers. it instructs in the methodology for obtaining answers. case studies presented in the book include selection, acquisitions and cataloging, among library operations. also examined are book storage, study facilities and reserve book procedures. a technique for measuring benefits by surveying users is also described. the concluding chapter presents in outline form major findings, of which only two will be given here as examples of results of this type of analysis. first, the authors found that the most effective alternate storage system, namely compact storage, saves only about one percent of annual library resources, but provokes a major loss of benefit, since compact storage limits browsing and increases retrieval time for books. a second finding of interest is that major cataloging expenses are for professional librarians doing original cataloging, and for proofreading and checking of catalog cards. that costs of original cataloging bulk largest will not be a surprise to librarians, but that the next largest cost should be proofreading and checking of catalog cards will come as a surprise to some. the book concludes with a score of research questions to be explored in the future, and it is fervently to be hoped that raffel and shishko will continue their investigation along the avenues they have delineated. frederick g. kilgour
the undergraduate library, by irene a. braden. chicago: american library association, 1970. (acrl monograph, 31). 158 pp. $7.50. the separate undergraduate library on the university campus is a phenomenon of the last two decades: harvard's lamont library was the first in 1949. more than twenty-five such libraries now exist or are in the planning or construction stage. the literature of librarianship contains descriptions of individual undergraduate libraries or philosophical essays concerning library services for undergraduates. braden, however, was the first to study more extensively and impartially this attempt to provide better services for university students. for her dissertation at the university of michigan, she collected data on six undergraduate libraries: harvard, michigan, south carolina, cornell, indiana, and texas. each library was visited in 1965/66 and interviews with librarians were conducted; documents were consulted. we here have published 25-35 page descriptions of these six pioneers. the studies range from architectural design, through the gathering of the initial collections of books and other media, to the host of services offered in the completed library. excellent statistical tables, organizational charts and floor plans illustrate the text. there are some errors.
michigan added more seats in 1965, not september, 1966 as stated on page 43. also referring to michigan on page 47: "the reference collection began with about 2000 volumes, but it soon became evident that it would have to be enlarged. the collection now numbers about 3100 volumes."83 the footnote refers to page 18 of the 1957/58 annual report of the michigan undergraduate library, but there is no mention of the number of reference volumes there. instead the 1957/58 annual report records on page 4 that there were 800 reference volumes on november 18, 1957 when the collections were moved into the new building. after presentation of the case studies, the author summarizes her conclusions on the buildings, book collections, services, staffs, and use by students. of particular value are fourteen brief guidelines formulated to assist librarians who may be contemplating an undergraduate library on their campus. the reader should be forewarned that the undergraduate library, although a most welcome publication, is now an historical document. only data through 1964/65 are presented. major changes in services and facilities have occurred in the past five years. those interested in automation would think that undergraduate libraries have done nothing. michigan, however, began an automated circulation system for reserve material in 1967 and for the main collection in 1968. billy r. wilkinson
report on the total system computer program for medical libraries, by robert e. divett and w. wayne jones. albuquerque: university of new mexico school of medicine library of the medical sciences, 1969.
424 pp. the concept of "total system" is a fairly easy one to grasp until one attempts definition of the term. then there creep in all sorts of unexpected, rather unfair practical considerations, usually related to environment. under these constraints, one man's total system becomes a very personal conditioned statement. the report is organized into three sections: a system description oriented toward the librarian; technical descriptions of the file organization and program structure for the programmer; and a set of appendices which include the source listings of all the programs. the source listings are more than three-quarters of the report, and are tiring to examine and decipher. much more useful in a report of this nature would have been the program decision tables which underlie the programs. a section on recommendations explores the future direction of the system. however, some matters of concern in the report are glossed over in a rather facile manner with little or no comment. the system has been implemented at different levels. acquisitions and cataloging are essentially translations to an on-line mode of a batch system. (it is interesting to note that a card catalog is maintained to back up this on-line operation.) on-line circulation is presented as if it is running, whereas the authors say that lack of funding prevented implementation. the really exciting work has been done with file organization, the incorporation of mesh tree structures on the file and their use for upward (to most general, not most specific) searching, and the development of an on-line interrogation procedure both for update and search of the file. one finds that hardware costs alone would be either $7728 per annum plus computer time for a batch system, or between $98,000 and $104,000 per annum plus computer time for a terminal system. but then one reads that "the terminal total computer system is the only effective, efficient way of meeting the demands of service and processing that are required by a technical library." when one is talking about a hardware cost of $100,000 per annum, what exactly do the words "only effective, efficient" mean? glyn evans
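the "upward" searching that the reviewer singles out is easier to picture with mesh tree numbers in view: each heading carries a dot-delimited code, and dropping segments from the right moves to ever broader headings. the following sketch is purely illustrative and is not taken from the divett and jones report; the tree numbers shown, the document identifiers, and the function names are hypothetical, and a simple in-memory index is assumed in place of the report's actual file organization.

```python
# illustrative sketch only -- not code from the divett/jones report.
# mesh tree numbers are dot-delimited; "C04.588.945" is narrower than
# "C04.588", which is narrower than "C04". "upward" searching moves from a
# specific heading toward its more general ancestors.

def upward_path(tree_number: str) -> list[str]:
    """return the tree number followed by each of its ancestors, most specific first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]

def upward_search(tree_number: str, index: dict[str, list[str]]) -> list[str]:
    """collect documents indexed at the heading or at any broader ancestor.
    'index' is a hypothetical mapping of tree numbers to document ids."""
    found: list[str] = []
    for node in upward_path(tree_number):
        found.extend(index.get(node, []))
    return found

if __name__ == "__main__":
    sample_index = {                      # hypothetical document ids
        "C04": ["doc-1"],
        "C04.588": ["doc-2"],
        "C04.588.945": ["doc-3"],
    }
    print(upward_path("C04.588.945"))                   # ['C04.588.945', 'C04.588', 'C04']
    print(upward_search("C04.588.945", sample_index))   # ['doc-3', 'doc-2', 'doc-1']
```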
how to manage and use technical information, by freeman h. dyke, jr. boston: industrial education institute, 1968. $15.00. freeman dyke is a veteran of the ups and downs of the information-retrieval industry and, through his association through the years with jonker, documentation, inc. (leasco), and the acm lecture circuit, has developed a wide familiarity with hardware and software used in the handling of technical information. this book is a compendium of information about equipment and systems, ranging from catalog cards to computers. a useful feature, repeated many times throughout the book, is a double list of "advantages" and "disadvantages" for the hardware or the system that has been described. thus, the advantages of uniterm cards and dual dictionaries (simplicity, low equipment and operating cost, flexibility of vocabulary, physical availability, fairly high output speed) are balanced against their disadvantages (variable search speed, low output flexibility, indirect access to information, difficulty in updating). in most cases, no bias is indicated in the descriptive sections, and the reader is more or less on his own in making a final choice of machine or technique. numerous clear illustrations (photographs, cartoons, diagrams, and other graphics) provide a helpful and interesting relief to the unjustified offset text. the lack of an index sets up serious retrieval problems. the major market for this book would seem to be business and industry, particularly companies which are planning to set up or modernize their methods for the storage and retrieval of technical information. the book might well be purchased for the business or industrial users of a library. because it is not at all oriented to the problems of library automation, it is not particularly recommended for use by the librarians themselves. a. j. goldwyn

an introduction to decision logic tables, by herman mcdaniel. new york: wiley, 1968. 96 pp. $6.95. the literature of decision tables is marked more by its absence than by its presence; before the appearance of this book, the reader was limited to brief journal articles or an infrequent technical report or two. thus, even though the author warns that the present volume makes no pretext of being an exhaustive treatment, he has nonetheless added materially to the store of knowledge of this admittedly limited field. mcdaniel carefully leads the reader through the process of developing a decision table and the simple rules of logic utilized to prove relevancy of the table elements or for eliminating irrelevant tests. of interest to all who are concerned with automation is the author's discussion of the conversion of a flow chart to a decision table. another interesting section is the use of table processors to translate decision tables into portions of computer programs. at this juncture, the author offers some evidence to support his contention that considerable programming time will be saved if the programmer works from decision tables rather than flow charts. if he is right, librarians had better get with it and learn how to construct decision tables as well as flow charts. one omission, a discussion of "and" and "or" condition statements, is unfortunate, since it appears that they merit space even in an introductory text. however, the author does provide a considerable number of exercises for the reader. these will help to sharpen the reader's understanding of decision tables. john j. miniter
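because decision logic tables may be unfamiliar, a small sketch may help fix the idea: the table itself is data (condition rows, action rows, one column per rule), and a minimal "table processor" selects the rule whose conditions match the facts at hand and carries out its actions, which is roughly what the table processors mcdaniel describes do when they emit portions of programs. the scenario, names, and rules below are invented for illustration and are not drawn from mcdaniel's book.

```python
# illustrative sketch only -- not code from mcdaniel's book.
# a decision logic table as data: condition rows, action rows, one column
# per rule. a minimal "table processor" picks the first rule whose condition
# entries all match the facts and returns the actions marked for that rule.

# hypothetical circulation scenario: what to do with a returned book.
CONDITIONS = {
    "is_reserve_item":  [True,  True,  False, False],
    "has_hold_request": [True,  False, True,  False],
}
ACTIONS = {
    "route_to_reserve_desk": [True,  True,  False, False],
    "notify_hold_patron":    [True,  False, True,  False],
    "reshelve":              [False, False, False, True],
}

def process(conditions, actions, facts):
    """apply the decision table to a set of facts and return the selected actions."""
    n_rules = len(next(iter(conditions.values())))
    for rule in range(n_rules):
        if all(conditions[name][rule] == facts[name] for name in conditions):
            return [name for name, row in actions.items() if row[rule]]
    return []  # no rule matched

if __name__ == "__main__":
    # a returned reserve item with no outstanding hold request:
    print(process(CONDITIONS, ACTIONS,
                  {"is_reserve_item": True, "has_hold_request": False}))
    # -> ['route_to_reserve_desk']
```

read column by column, each rule corresponds to one path a flow chart would trace, which is why a table processor can generate equivalent program logic directly from the table.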
computer-based library and information systems, by j. p. henley. computer monographs series. new york: american elsevier, 1970. 84 pp. $5.75. just when librarians and computer specialists were beginning to understand each other, there is published a slight monograph that effectively gaps the bridge. the book is based upon mr. henley's m.sc. work at trinity college, dublin. it bears a 1970 imprint, but appears to be about seven years out of date. one is told briefly about the king report, the information retrieval languages lisp and comit, and related ancient breakthroughs. the bibliography yields 32 dated citations with a mean date of 1963, the approximate time this work might have been considered timely. in seven slim chapters and two gratuitous appendices, the author treats such topics as "introduction to the computer", "library systems requirements", "the philosophy of a machine-based system", and even "a short note on backus normal form". some of the author's urgent allusions to old events are pure high camp, e.g., "the growing interest in mechanisation, borne out for example by . . . the recent initiation of discussions between a major publishing house and a large computer manufacturer, make it vital for the cross-fertilization of ideas between computer and library experts to proceed as quickly as possible." (p. 75.) other pronouncements are patently absurd, such as: "one common use of such real-time 'on-line' computing is the writing of a program directly at the console, instruction by instruction, instead of having to write it all beforehand and read it in from cards or paper tape." (p. 10.) it is all too easy to fault a short book for shortcomings, but other books in this same series, such as j. m. foster's list processing, have proven the excellence possible in a trim 50-shilling monograph (mr. foster's work is only 54 pages). excellence in this format appears to require focus upon a narrow subject area, and discipline in the treatment of the core elements of the area. in attempting in 84 pages to cover several subjects of encyclopedic scope (library and information systems, as well as a basic computer tutorial), the author piles pelion upon ossa and then shows us sample pebbles from the pile instead of the view from the summit.
there remains much important and exciting material to be presented to librarians and computer people about each other's work. regrettably, mr. henley, in the words of his fellow dubliner james joyce, has "speared the rod and spoiled the lightning." william r. nugent

debra a. riley-huff and julia m. rholes
librarians and technology skill acquisition: issues and perspectives

qualified individuals to fill these technology-driven librarian roles in our libraries, and if so, why? how are qualifications acquired and what are they, besides a moving target? there appear to be two major convergent trends influencing this uncertain phenomenon. the first is what is perceived as a "lack of awareness" and consensus about what the core of lis needs to be or to become in order to offer real value in a constantly changing and competitive information landscape.5 the other trend centers on the role of lis education and the continuing questions regarding its direction, efficacy, and ability to prepare future librarians for the modern information professions of now and the future. while changes are apparent, it appears many lis programs are still operating on a two-track model of "traditional librarians and information managers," and there are enough questions in this area to warrant further investigation and inquiry.6

■■ literature review

most of the literature pertaining to the readiness of librarians to work in increasingly technical environments centers on lis education. this certainly makes sense given the assumed qualifications the degree confers. scant literature focuses solely on the core of the librarians' professional identity, workplace culture, and institutional historical perspectives related to qualifications; however, allusions to "redefining" lis are often found in lis education literature. there is limited research on preprofessional or even professional in-service training, although calls for such research have been made repeatedly. a key study on lis education is the 2000 kaliper report, issued when the impact of technology in libraries was clearly reaching saturation.7 the report is the product of an analysis project with a goal of examining new trends in lis education. the report lists six trends, three of which are pertinent to the investigation of technology inclusion in lis programs. these trends note that in 2000, lis programs were beginning to address a broader range of information problems and environments, programs were increasing it content in the curriculum, and several programs were beginning to offer specializations within the curriculum, though not ones with a heavy technology focus. in a widely cited curriculum study in 2004, markey completed a comprehensive examination of 55

libraries are increasingly searching for and employing librarians with significant technology skill sets. this article reports on a study conducted to determine how well prepared librarians are for their positions in academic libraries, how they acquired their skills, and how difficult they are to hire and retain. the examination entails a close look at ala-accredited lis program technology course offerings and dovetails a dual survey designed to capture experiences and perspectives from practitioners, both library administrators and librarians who have significant technology roles.
a recent oclc report on research libraries, risk, and systemic change discusses what arl directors perceive as the highest risks to their libraries.1 the administrators reported on several high risks in the area of human resources including high-risk conditions in recruitment, training, and job pools. the oclc report notes that recruitment and retention is difficult due to the competitive environment and the reduction in the pool of qualified candidates. why precisely do administrators perceive that there is a scarcity of qualified candidates? changes in libraries, most of which have been brought on by the digital age, are reflected in the need for a stronger technological type of librarianship—not simply because technology is there to be taken advantage of, but because “information” by nature has found its dominion as the supreme commodity perfectly transported on bits. it follows, if information is your profession, you are no longer on paper. that lis is becoming an increasingly technology-driven profession is both recognized and documented. a noted trend particularly in academic libraries is a move away from simply redefining traditional or existing library roles altogether in favor of new and completely redesigned job profiles.2 this trend verifies actions by library administrators who are increasingly seeking librarians with a wider range of information technology (it) skills to meet the demands of users who are accessing information through technology.3 johnson states the need well as we need an integrated understanding of human needs and their relationships to information systems and social structures. we need unifying principles that illuminate the role of information in both computation and cognition, in both communication and community. we need information professionals who can apply these principles to synthesize human-centered and technological perspectives.4 the questions then become, is there a scarcity of debra a. riley-huff (rileyhuf@olemiss.edu) is web services librarian, university of mississippi libraries, university, miss. julia m. rholes (jrholes@olemiss.edu) is dean of libraries, university of mississippi libraries, university, mississippi. 130 information technology and libraries | september 2011 academic libraries had embarked on an unprecedented increase in filling librarian positions with professionals who do not have a master’s degree in library science.13 citing the association of research libraries annual salary statistics, among a variety of positions being filled by other professionals a substantial number are going to those in technology fields such as systems and instructional technology. in the mid 2000s, suggestions that library schools needed to work more closely with computer science departments began coming up more often. obstacles to these types of partnerships were noted as computer science departments failed to see the advantage offered by library science faculty as well as being wary of taking on a “softening” by the inclusion of what is perceived as a “soft science.”14 in response, most library schools have added courses in computing, but many still question the adequacy. more recently there have been increasing calls from within lis for more research into lis education and professional practice. 
in 2006, a study by mckinney comparing proposed “ala core competencies” to what was actually being taught in ala-accredited curricula, shed some light on what is currently offered in the core of lis education.15 the study found that the core competency required most often in ala-accredited programs were “knowledge organization” or cataloging (94.6 percent), “professional ethics” (80.4 percent), “knowledge dissemination” or reference (73.2 percent), “knowledge inquiry” or research (66.1 percent), and “technical knowledge” or technology foundations (66.1 percent).16 these courses map well to ala core competencies but the question in the digital age, is one, not even universally required, technology-related course adequate for a career in lis? the literature would seem to reflect that it is not. 2007 saw many calls for studies of lis education using methods that not only examined course curricula but that also sought evidence of outcomes by those working in the field.17 an interest in studies reporting on employers’ views, graduates’ workplace experiences, and if possible longitudinal studies have been outwardly requested.18 indications are that those in library work environments can play a vital role in shaping the future course of lis education and preprofessional training by providing targeted research, data, and evidence of where weaknesses are currently being experienced and what changes are driving new scenarios. the most current literature points out both areas of technology deficiencies and emerging opportunities in libraries. areas with an apparent need for immediate improvement are the continuing integration of third-party web 2.0 application programming interfaces (apis) and social networking platforms.19 debates about job titles and labels continue but the actuality is that the number of adequately trained digital librarians has not kept up with the demand.20 modern libraries require those in technology-related roles to have broad or ala-accredited lis programs looking for change between the years 2000 and 2002.8 markey’s study revealed that while there were improvements in the number of it-related courses offered and required throughout programs, they were still limited overall with the emphasis continuing to be on the core curriculum consisting of foundations, reference, organization, and management. one of the important points markey makes is the considerable challenge involved in retraining or acquiring knowledgeable faculty to teach relevant it courses. the focus on lis education issues came to the fore in 2004 when michael gorman released a pair of articles asserting that there was a crisis in lis education, namely an assault on lis by what gorman referred to as “information science,” “information studies” and “information technology.”9 gorman’s papers sought to establish that there is a de facto competition between information science courses, which he characterized as courses with a computational focus and lis courses, which composed core librarianship courses, those tending to be the more user focused and organizational. gorman claimed lis faculty were being marginalized in favor of information science and made further claims regarding gender roles within the profession along the alleged lis/is split. gorman also noted that there was no consensus about how “librarianship” should be defined coming from either ala or the lis graduate programs. 
the articles were not without controversy, spurring a flurry of discussion in the library community, which spawned several follow up articles. dillon and norris rallied against the library vs. information science argument as a premise, which has no bearing on the reality of what is happening in lis and does nothing but create yet another distracting disagreement over labels.10 others argued for the increasing inclusion of technology courses in lis education, as estabrook put it, librarianship without a strong linkage to technology (and it’s capacity to extend our work) will become a mastodon. technology without reference to the core library principles of information organization and access is deracinated.11 as the future of lis was being hotly debated, voices in the field were issuing warnings that obstacles were being encountered finding qualified librarians with the requisite technology skills necessary to take on new roles in the library. in 2007, johnson made the case for the increasing need for new areas of emphasis in lis, including specializations such as geographic information systems by pointing out that it is not so much the granular training that is expected of lis education but a higher level technology skill set that allows for the ability to move into these specializations, identify what is needed, assess problems, and make decisions.12 in 2006, neal noted that librarians and technology skill acquisition: issues and perspectives | riley-huff and rholes 131 by examination of course catalogs and surveys of both library administrators and technology librarians. the lis educational data was obtained by inspecting course catalogs. course catalogs and website curriculum pages from all ala-accredited lis programs in the united states, canada, and puerto rico were examined in december 2009 for the inclusion of technology-related courses. the catalogs examined were for the 2009–10 academic year. spanish and french catalogs were translated. each available course description was reviewed and those courses with a primary technology component were identified. in a secondary examination the selected courses were closely inspected for the exact technology focus and the primary subject content was noted for each course. courses were then separated into categories by areas of focus and tabulated. a targeted survey identified practicing technology librarians’ perspectives on their level of preparation and continuing skill level needs based on actual job demands. in this survey, librarians with significant technology roles was defined as “for the purposes of this survey a librarian with a significant technology role would be any librarian whose job would very likely be considered “it” if they were not in a library and whose job titles contain words like “systems, digital, web, electronic, network, database, automation, and whose job involves maintaining and/or building various it infrastructures.” the survey was posted on various library and library technology electronic discussion lists in december 2009 and was available for two weeks. library administrative perspectives were also gained through a targeted survey aimed at those with an administrative role of department head or higher. the survey was designed to capture the reported experience library administrators have had with librarians in significant technology roles, primarily as it relates to skill levels, availability, hiring, and retention. 
this survey was posted on to various library administrative and technology discussion lists in december 2009 and was also available for two weeks. both surveys included many similar questions to compare and contrast viewpoints. results were tabulated to form an overarching picture and some relevant comparisons were made. there are limitations and inherent issues with this type of research. catalog examinations when completed by qualified librarians can hold great accuracy; however, the introduction of bias or misinterpretation is always possible.26 when categorizing courses, the authors reviewed course descriptions three separate times to ensure accuracy. courses in doubt were reviewed again with knowledgeable colleagues to obtain a consensus. surveys designed to capture perspectives, views, and experiences are by nature highly subjective and provide data that is both qualitative and quantitative. tabulated data was given strictly simple numerical representation to provide a factual picture of what was reported. specialized competencies in areas such as web development, database design, and management paired with a good working knowledge of classification formats such as xml, marc, ead, rdf and dublin core. educational technology (et) has been identified as an area of expected growth opportunity for libraries and there have been suggestions that more lis programs should partner with et programs to improve lis technology offerings, skills and preprofessional training.21 lis program change, including the apparent coalescing of information technology focused education would appear to be demonstrated by the ischool or ifield caucus of ala accredited programs, however the literature is not clear on if that is actually being evidenced. the ischools organization started in as collective in 2005 with a goal of advancing information science. ischools incorporate a multidisciplinary approach and those with a library science focus are ala accredited.22 a 2009 study interestingly applied abbott’s theoretical framework used in the chaos of disciplines to the ifield.23 resulting in abstract yet relevant conclusions, abbott looks at change in a field through a sociological lens looking for patterns of fractal distinction over time. the study concluded that traditional lis education remained at the heart of the ifield movement and that the real change has been in locale, from libraries to location independent.24 hall’s 2009 study exploring the core of required courses across almost all ala accredited programs reveals that the core curriculum is still principle-centered, but it is focusing less on reference and intermediary activities with a definite shift toward research methods and information technology.25 ■■ method this research study was designed to capture a broad view of technology skill needs, skill availability, and skill acquisition in libraries, while still allowing for some areas of sharper focus on stakeholder perspectives. the four primary stakeholder groups in this study were identified as lis educators, lis students, working librarians, and library administrators. the research questions cover three main areas of technology skill acquisition and employment. one area is lis education and whether the status of all technology course offerings has changed in recent years in response to market demands. the second area is the experience of librarians with significant technology roles with regards to job availability, readiness, and technology skill acquisition. 
the third area is, the perception of library administrators regarding the availability and readiness of librarians with technology roles. to cover the research questions and provide a broad situational view, the research was triangulated and aimed at the three question areas. data collection was accomplished 132 information technology and libraries | september 2011 may arguably be considered description or cataloging. metadata was included because it is an integral part of many new digital services. the categories are presented in column 1, the total number of courses offered is presented in column 2. the number of advanced courses available within each category total is further broken out into parenthesis. some programs offered more than one course in a given category; hence the percentage of programs offering at least one course is given in column 3. additionally, although the librarian survey was targeted to “those with significant technology roles,” it would appear that the definition of “significant” seemed to vary in interpretation by the respondents. this is discussed in further detail in the findings. given the limitations of this type of research, the authors did not attempt to find definite correlations, however trends and patterns are clearly revealed. ■■ catalog findings course catalogs from all 57 ala-accredited programs in the united states, canada, and puerto rico were examined for the inclusion of technology-related courses. a total of 439 technology-related courses were offered across the 57 lis programs, including certificate program course offerings. the total number of technology-related courses offered by program ranged from 2 to 20. the mean number of courses offered per program was 7.7, the median was 10, and the mode was 4. table 1 shows the total number of technology courses being offered per program by matching them with the number of courses they offer. catalog course content descriptions were analyzed looking for a technology focus. the fifteen categories noted in table 2 were selected as representative of the technology-related courses offered. it is acknowledged that some course content may be overlapping, but each course was placed in only one category based on its primary content. note also the inclusion of “metadata markup” which table 1. number of technology-related courses being offered per program # of programs offering # of courses offered 1 offers 2 courses 6 offer 3 courses 8 offer 4 courses 6 offer 5 courses 7 offer 6 courses 5 offer 7 courses 5 offer 8 courses 1 offer 9 courses 6 offer 10 courses 1 offers 11 courses 3 offer 12 courses 2 offer 13 courses 2 offer 14 courses 1 offers 15 courses 1 offers 17 courses 1 offers 18 courses 1 offers 20 courses table 2. course content description and number of courses offered across all programs. the number of advanced courses in the total is given in parenthesis. 
course type as categorized by the course content description in the lis program catalog # of courses offered % of programs offering at least 1 course database design, development and maintenance 47 (7) 70 web architecture (web design, development, usability) 52 (11) 68 broad technology survey courses (basics of library technologies and overviews) 50 65 digital libraries 43 (4) 61 systems analysis, server management 49 (6) 60 metadata markup (dc, ead, xml, rdf) 43 (10) 50 digital imaging, audio and video production 33 (5) 47 automation and integrated library systems 21 37 networks 32 (3) 35 human computer interaction 21 (4) 29 instructional technology 12 21 computer programming languages, open source technologies 12 (2) 17 web 2.0 (social networking, virtual reality, third party api’s) 11 17 user it management (microcomputers in libraries) 6 10 geographic information systems 6 (1) 8 librarians and technology skill acquisition: issues and perspectives | riley-huff and rholes 133 ■■ perspectives on job availability, readiness and skill acquisition as previously noted in the method, two surveys were administered to collect participant viewpoint data pertinent to the study. reponses were carefully checked to determine whether they met the criteria for inclusion in the study. no attempt was made to disqualify respondents based solely on job title. it did appear that a significant number of non-target subjects did initially reply to the librarian survey, but quit the survey at the technology-related questions. final inclusion was based on either an it-related job title or if the respondent answered the technology questions regardless of job title. tables 3–5 report demographic response data. ■■ perspectives on job and candidate availability a 2009 study by matthew and pardue asked the question “what skills do librarians need in today’s world?”29 they sought to answer this question by performing a content analysis, spread over five months, of randomly selected jobs from ala’s joblist. what they found in the area of technology was a significant need for web development, an assessment of the course catalog facts reveals that there have been increases in the number of technology courses offered in lis programs, but is it enough? significant longitudinal data shows an increased emphasis in the area of metadata. a 2008 study of the total number of lis courses offering internet or electronic resources and metadata schemas, found that the number of programs offering such as being ten (17.5 percent) with only twelve metadata courses offered in total.27 current results show 43 metadata courses offered with 50 percent of lis programs offering at least one course. the lack of a solid basis in web 2.0 applications and integration as reported by aharony is confirmed by the current catalog data, with only 17 percent of programs offering a course.28 while at first glance it looks like many technology-related courses are currently being offered in lis programs, a closer inspection reveals cause for concern. many of these courses should be offered by 100 percent of lis programs and advanced courses in many areas should be offered as well. while there may be some overlap of content in some of these course descriptions, the percentages are still too low to deduce that lis graduates, without preprofessional technology experience or education, are really prepared to take on serious technology roles in academic libraries. table 3. 
response data responses administrative survey librarian survey total responses 185 382 total usable (qualified) 146 227 table 4. respondents institution by size by type administrative survey librarian survey under 5,000 37 72 5,000 10,000 25 31 10,000 15,000 18 28 15,000 20,000 11 20 20,000 25,000 13 21 25,000 30,000 16 13 30,000 35,000 4 11 35,000 40,000 5 9 more than 40,000 12 21 unknown 5 1 table 5. respondent type administrative survey: position # of responses dean, director, university librarian 46 department head 71 manager or other leadership role 29 librarian survey: general area of work # of responses public services 48 systems 42 web services 32 reporting dual roles 31 digital librarian 29 electronic resources librarian 28 emerging/instructional technologies 18 administrative 10 metadata/cataloger 9 technical services 7 distance education librarian 4 demographic data 134 information technology and libraries | september 2011 based on the difficulty rating and the classifications were then averaged by difficulty. some respondents were unsure of difficulty ratings because the searches happened before their presence at their current library and those searches were excluded. position classifications with less than five searches were excluded from averaging and are marked “na” in table 6. the difficulty rubric is as follows: 1 = easy; 2 = not too bad, pretty straightforward search; 3 = a bit tough, the search was protracted; 4 = very difficult, required more than one search; 5 = unable to fill the position. it is to be noted that almost all levels of difficulties were reported for many classifications but that the overall average hiring difficulty rating was 2.48. a comparable set of questions was posted to the librarian survey. we asked librarians to report professional level technology positions they had held in the past five years along with any current job searches. 164 responses were received by people indicating that they had held such a position or were searching for one, with the total number of positions/searches being reported at 316 with some respondents reporting multiple positions. respondents reported having between one and five different positions with the average number being 1.92 jobs per respondent (see table 7). the respondents were also asked to give the position title for each position held or positions they were applying for as well as the difficulty encountered in obtaining the position. like the administrative report, job titles were project management, systems development, and systems applications. further they suggest that some librarians are using a substantial professional it skills subset. this article’s literature review points out that there are assertions being made that some technology-related librarian positions are difficult to fill and may in fact be filled by non-mls professionals. in the associated surveys the authors sought to capture data related to actual job availability, search experiences and perspectives by both library administration and librarians. note that both mls librarians and a few professional library it staff completed the survey. the distinction is made where appropriate. the survey asked library administrators if they had hired a technology professional position in the past five years. 146 responses were received and 100 respondents indicated that they had conducted such a search, with the total number of searches being reported at 167. 
of these searches, 22 did not meet the criteria for inclusion due to other missing data such as job title. the total reported number of librarian/professional level technology positions that were posted for hire by these respondents was 145 with some respondents reporting multiple searches for the same or different positions. respondents conducting searches reported having between 1 and 5 searches total with the average number being 1.45 per respondent. the respondents were also asked to provide the position title for each search, the difficulty encountered in conducting the search, and the success rate. job titles were divided into categories to ascertain how many positions in each category reported having a relevant search conducted. each search was then assigned a point value table 6. administrative report on positions open, searches and difficulty of search (n = 145) position classification searches search difficulty systems/ automation librarian 40 2.78 digital librarian 32 2.6 emerging & instructional technology librarian 15 2.53 web services/ development librarian 33 2.51 electronic resources librarian 22 1.95 database manager 1 na network librarian/ professional 1 na table 7. librarian report on positions held or current searches and difficulty (n = 316) position classification # of positions/ searches search difficulty administrative 8 3 technical services 17 2.11 public services 57 2.1 systems/ automation librarian 76 1.89 web services/ development librarian 38 1.89 electronic resources librarian 39 1.87 digital librarian 41 1.8 metadata/cataloger 13 1.77 distance education librarian 6 1.66 emerging & instructional technology librarian 21 1.61 reporting dual roles 30 na librarians and technology skill acquisition: issues and perspectives | riley-huff and rholes 135 employment status for “newly minted” mls graduates having just entered the profession were asked in a survey “did specific information technology or computer skills lead to you getting a job?” the answer was a “resounding yes” by 66 percent of the respondents.33 experience is divided into categories to ascertain how many positions in each classification category. each position classification was then assigned a point value base on how the respondents rated the difficulty of those particular searches and the classifications were then averaged by difficulty using the same scale that was applied in the administrative survey. again, almost all levels of difficulties were reported for many classifications but that the overall average hiring difficulty rating was 1.9. to provide as accurate a picture as possible the surveys asked both groups to indicate if any well known mitigating factors contributed to complications with the job searches. these factors are shown in table 8 which stacks both groups for comparison. this particular dataset reveals some interesting patterns. those roles that were in the most demand were the also the most difficult to hire for, while these also were the easier positions for candidates to find. librarians also listed more job categories as having a significant technology component than the administrators had. perhaps most notable is the discrepancy shown between how administrators perceive the qualifications of candidates as compared to how candidates view themselves. while both groups acknowledge lack of it skills and qualifications as the number one mitigating factor, library administrators perceive the problem as being significantly more serious. 
this data backs up other recent findings that important new job categories are being defined in lis.30 the data also further support that these roles, while centering on core librarianship principles, have a different skill set.31 ■■ job readiness perspectives issues of job readiness for academic librarians need to be looked at from a number of different perspectives. job readiness can be understood in one way by a candidate and can be something different to an employer. job readiness is not only of critical concern at the beginning of a librarian’s career, clearly this attribute continues to be significant throughout an individual’s length of service in one or more roles and to one or more employers. job readiness is composed of several factors, the most important being education, experience and ongoing skill acquisition. while this is certainly true for all librarians it is of even more concern to those librarians with significant technology roles because of rapid changes in technology. a concern has been established in the literature and in this study that lis education, in the areas of technology, may be inadequate and lack the intensity necessary for modern libraries. this perception has been backed up by entrants to the profession.32 that technology skills are extremely important to library employers has been evident for at least a decade. in 2001 a case study on table 8. mitigating factors in hiring and job search (n = 93) administrative survey: mitigating factors in hiring as a percentage of respondents to the question (n = 93) % of responses we had difficulty getting an applicant pool with adequate skills 54 we are unable to offer qualified candidates what we feel is a competitive salary. 38 we are located in what may reasonably be perceived as an undesirable area to live. 23 we are located in an area with a very high cost of living. 23 we have an it infrastructure or environment that we and/or a candidate may have perceived as unacceptable. 20 the current economic climate has made hiring for these types of positions easier. 18 a successful candidate did not accept an offer of employment 13 librarian survey: mitigating factors in job search as a percentage of respondents to the question (n = 198) % of responses i suspect i may not have/had adequate skills, experience or i was otherwise unqualified. 25 i have not been able to find a position for what i consider to be a fair salary. 11 many jobs are located in what may reasonably be perceived as an undesirable area to live. 10 many jobs are located in an area with a very high cost of living. 15 some jobs have an it infrastructure or environment that i have perceived as unacceptable. 10 the current economic climate has now made finding these types of positions tougher. 22 i was a successful candidate but i could or did not accept an offer of employment. 3 136 information technology and libraries | september 2011 library technology experience they preferred from a candidate. there were 97 responses; the range of preferred experience was 0–7, the mean was 3.06, and the mode was 3. librarians were also asked how much experience they had in a technology-related library role. there were 187 responses; the range of experience was 0–39 years, the mean was 8.7, the mode was 5. when participating administrators were asked if they felt it was necessary to have an mlis librarian fill a technology-related role that is heavily user-centric, 110 administrators responded. 
also a very important factor, with one study of academic library search committees reporting committee members mentioning that “experience trumps education.”34 this study sought to gather data on possible patterns in the job readiness area. the authors wanted to know how job candidates and employers felt about the viability of new mls graduates, how experience factored into job readiness, how much experience is out there and how long term experience impacted expectations. the survey asked administrators how many years of table 9. question sets related to experience factors by group administrative survey strongly disagree disagree can’t say agree strongly agree new librarians right out of graduate school seem to be adequately prepared (n = 111) 7% 40% 24% 28% 1% librarians with undergraduate or 2nd graduate degrees in a technology/computer fields seem adequately prepared (n = 109) 1% 9% 48% 39% 4% librarians with pre-professional technologyrelated experience seem adequately prepared (n = 109) 1% 6% 47% 41% 8% librarians with some (up to 3 years) post mls technology experience seem adequately prepared (n = 111) 1% 10% 17% 62% 10% librarians with more than 3 years post mls technology experience seem adequately prepared (n = 111) 1% 3% 24% 55% 16% librarians never seem adequately prepared for technology roles (n = 111) 19% 55% 12% 7% 6% librarian survey strongly disagree disagree other agree strongly agree as a new librarian right out of graduate school i was adequately prepared (n = 187) 12% 19% no grad degree 3% 42% 8% i have an undergraduate or 2nd graduate degree in a technology/computer field that has helped me be adequately prepared (n = 187) 13% 7% no tech degree 60% 13% 6% i had pre-professional technology-related experience that helped me be adequately prepared (n = 187) 3% 7% no such experience 20% 43% 27% i have less than 3 years of post mls technology experience and i am adequately prepared (n = 180) 6% 13% na 63% 16% 1% i have more than 3 years of post mls technology experience and i am adequately prepared (n = 184) 2% 12% na 17% 48% 20% i have never felt like i am adequately prepared for technology roles (n = 186) 19% 43% neutral 23% 12% 2% librarians and technology skill acquisition: issues and perspectives | riley-huff and rholes 137 readiness of new librarians and the value of related technology degrees. areas of agreement are noted in the importance of preprofessional experience, three or more years of experience, and the generally positive attitude regarding librarians’ ability to successfully take on significant technology roles in libraries. ■■ ongoing skill acquisition and retention how librarians with significant technology roles acquire the skills needed to do their jobs and how they keep those skills current was of great interest in this study. the importance of preprofessional experience has been noted but we should also include the value of service learning in lis education as an important starting point. successful service learning experiences include practicum and partnerships with libraries in need of technology-related services. 
successful projects such as online exhibits, wireless policies, taxonomy-creation and cross-walking for contentdm are just a few of the service projects that have given lis students real-world experience.35 this responses ranged from 50 percent “yes,” 38 percent “no,” and 12 percent “unsure.” to the same question, 195 practicing technology librarians responded with 58 percent “yes,” 23 percent “no,” and 20 percent “unsure.” the administrator participants were asked if they had ever had to fill a technology-related librarian role with a non-mls hire simply because they were unable to find a qualified librarian to fill the job. of 106 responses, 22 percent reported that they hired a non-mls candidate. the librarian participants were also was asked to report on mls status; out of 194 responses, 93 percent reported holding an mls or equivalent. the survey also asked the librarian participants to report what year they graduated from their mls program as the authors felt this data was important to the inherent longitudinal perspectives reported in the study. of 162 responses, participants reported graduating between 1972–2009. the mean was 1999, the median was 2002, and the mode was 2004. table 9 shows a question set related to experience factors, which stacks both groups for comparison. there are a few notable points in this particular dataset including what appears to be an area of disagreement between administrators and librarians about the table 10. education and skill supplementation for librarians with technology roles administrative survey: in what ways have you supplemented training for your librarians or professional staff with technology-related roles? (does not include ala conferences) % we have paid for technology-related conferences and pre-conferences. 79 we have paid for or allowed time off for classes. 72 we have paid for or allowed time for off online workshops and /or tutorials 87 we have paid for books or other learning materials. 55 we have paid for some or all of a 1st or 2nd graduate degree. 12 we would like to supplement but it is not in our budget. 5 we feel that keeping up with technology is essential for librarians with technology-related roles. 73 librarian survey: in what ways have you supplemented your own education related to technology skill development in terms of your time and/or money? (not including ala conferences) % i have attended technology-related conferences and pre-conferences. 73 i have taken classes. 60 i have taken online workshops and/or tutorials 87 i have bought books or other learning materials. 77 i am getting a 1st or 2nd graduate degree. 9 we would like to supplement my own education but i can not afford it. 13 i would like to supplement my own education but i do not have time. 13 i have not had to supplement in any way. 1 i feel that keeping up with technology is essential for librarians with technology-related roles. 84 i feel that keeping up with technology is somewhat futile. 11 138 information technology and libraries | september 2011 librarians who have transitioned successfully into technology centric roles. this supports the perception that experience and on the job learning play a leading role in the development of technology skills for librarians. openended survey comments also revealed a number of staff who initially were hired in an it role and then went on to acquire an mls while continuing in their technologyfocused role. 
retention is sometimes problematic for librarians with it roles, primarily because many of them are also employable in many other settings apart from libraries. the survey asked administrators “do you know any librarians with technology roles that have taken it positions outside the library field?” and out of 111 respondents, 33 percent answered “yes.” in open-ended responses the most common reasons administrators felt retention may be a problem was salary, lack of challenges/opportunities, and risk averse cultures. the survey also asked the librarian group “do you think you would ever consider taking an it position outside the library field?” out of 190 respondents; 34 percent answered “yes,” 23 percent “yes, but only if it was education related,” and 42 percent “no.” additionally 38 percent of these librarian respondents knew a librarian who took an it position outside the library field. for the librarian participants an open response field in the survey, named work environment and lack of support for technology as the most often named reasons for this leaving a position. the surveys used in this research study covered several complicated issues. those who responded to the surveys were encouraged to leave open text comments research study asked administrators and librarians in what formal ways they supplement their ongoing education and skill acquisition. table 10 shows these results in a stacked format for comparison. also of interest in this data set is the higher level of importance librarians place on continuing skill development in the area of technology. in open ended text responses a number of librarians reported that the less formal methods of monitoring electronic discussion lists and articles was also a very important part of keeping up with technology in their area. the priority of staying educated, active and current for librarians with significant technology roles cannot be underestimated; what tennant defines as technology agility, the capacity to learn constantly and quickly. i cannot make this point strongly enough. it does not matter what they know now. can they assess a new technology and what it may do (or not do) for your library? can they stay up to date? can they learn a new technology without formal training? if they can’t they will find it difficult to do the job.36 not all librarians with technology roles start out in those positions and thus role transformation must be examined. in some cases librarians with more traditional roles such as reference and collection development have transformed their skill set and taken on technology centric roles. table 11 shows the results of the survey questions related to role transformation in a stacked format for comparison. to be noted in this data set is the large number of table 11. role transformation from traditional library roles to technology centric roles and the reverse. administrative survey (n = 104) % we have had one or more librarians make this transformation successfully. 53 we have had one or more librarians attempt this transformation with some success. 35 we have had one or more librarians attempt this transformation without success. 17 some have been interested in doing this but have not done so. 14 we do not seem to have had anyone interested in this 11 we have had one or more librarians who started out in a technology-related librarian role but have left it for a more traditional librarian role. 5 librarian survey (n = 184) % i started out in a technology-related librarian role and i am still in it. 
45 i have made a complete technology role transformation successfully from another type of librarian role. 30 i have attempted to make a technology role transformation but with only some success. 12 i have made a technology role transformation but sometimes i wish i had not. 9 i have made a technology role transformation but i wish i had not and i am interested in returning to a more traditional librarian role. 9 i am not a librarian. 4 librarians and technology skill acquisition: issues and perspectives | riley-huff and rholes 139 vary considerably from program to program and the content of individual courses appears to vary considerably as well. there appears to be a clear need for additional courses at a more advanced level. this need is evidenced by the experiences of both information technology job candidates and the administrators involved in the hiring decisions. there are clearly still difficulties in both the acquisition of needed skill sets for certain positions and in actual hiring for some information technology positions. there are also some discrepancies between how administrators perceive candidates’ qualifications as compared to how the candidates view themselves. administrators perceive the problem of a lack of it skills/qualifications as more serious than do candidates. the two groups also differ on the question of “readiness” of new professionals. the two groups do agree on the importance of preprofessional experience, and they both exhibit generally positive attitudes toward librarians’ ability to successfully take on significant technology roles in libraries. in several key areas. a large number of comments were received and many of them were of considerable length. many individuals clearly wanted to be heard, others were concerned their story would not be captured in the data, and many expressed a genuine overall interest in the topic. a few salient comments from a variety of areas covered are given in table 12. ■■ conclusion this study seeks to provide an overview of the current issues related to it staffing in academic libraries by reporting on three areas dealing with library skill acquisition and employment. with regards to the status of technology course offerings in lis programs, there has been a significant increase in the number of technologyrelated courses, but the numbers of technology courses table 12. a sample of open ended responses from the two surveys administrative survey “there is a huge need for more and adequate technology training for librarians. it is essential for libraries to remain viable in the future.” “only one library technology position (coordinator) is a professional librarian. others are professional positions without mls.” “there is a lot of competition for few jobs, especially in the current economic climate.” “we finally hired at the level of technician as none of the mls candidates had the necessary qualifications.” “if i wanted a position that would develop strategy for the library’s tools on the web or create a digitization program for special collections, i probably would want an mls with library experience simply because they understand the expectations and the environment.” “number of years of experience in technology is not as important as a willingness to learn and keep current. sometimes old dogs won’t move on to new tricks. 
sometimes new dogs aren’t interested in learning tricks.” librarian survey “i believe that because technology is constantly changing and evolving, librarians in technology-oriented positions must do the same.” “my problem with being a systems librarian in a small institution is that the job was 24/7/365. way too much stress with no down time.” “i have left the library field for a few years but came back. my motivation was a higher salary, but that didn’t really happen.” “i’m considering leaving my current position because the technology role (which i do love) was added to my position without much training or support. now that part of my job is growing so that i can’t keep up with all my duties.” “i don’t think that library school alone prepared me for my job. i had to do a lot of external study and work to learn what i did, and worked as a part-time systems library assistant while in school, where i learned the majority of what prepared me for my current job.” “library schools need to be more rigorous about teaching students how to innovate with technology, not just use tools others have built. you can’t convert “traditional” librarians into technology roles without rigorous study. otherwise, you will get mediocre and even dangerous results.” 140 information technology and libraries | september 2011 16. ibid., 53–54. 17. thomas w. leonhardt, “thoughts on library education,” technicalities 27, no. 3 (2007): 4–7. 18. thomas w. leonhardt, “library and information science education” technicalities 27, no. 2 (2007): 3–6. 19. noa aharony, “web 2.0 in u.s. lis schools: are they missing the boat?” ariande 30, no. 54 (2008): 1. 20. chuck thomas and salwa ismail patel, “competencybased training for digital librarians: a viable strategy for an evolving workforce?” journal of education for library & information science, 49, no. 4 (2008): 298–309. 21. michael j. miller, “information communication technology infusion in 21st century librarianship: a proposal for a blended core course,” journal of education for library & information science 48, no. 3 (2007): 202–17. 22. “about the ischools.” (2010); http://www.ischools.org/ site/about/ (accessed 9/1/2010). 23. laurie j. bonnici, manimegalai m. subramaniam, and kathleen burnett, “everything old is new again: the evolution of library and information science education from lis to ifield,” journal of education for library & information science 50, no. 4 (2009): 263–74; andrew abbott, the chaos of disciplines (chicago: chicago univ. pr., 2001). 24. bonnici, “everything old is new again,” 263–74. 25. russell a. hall, “exploring the core: an examination of required courses in ala-accredited,” education for information 27, no. 1 (2009): 57–67. 26. ibid., 62. 27. jane m. davis, “a survey of cataloging education: are library schools listening?” cataloging & cataloging quarterly 46, no. 2 (2008): 182–200. 28. aharony, “web 2.0 in u.s. lis,” 1. 29. janie m. mathews and harold pardue, “the presence of it skill sets in librarian position announcements,” college & research libraries 70, no. 3 (2009): 250–57. 30. “redefining lis jobs,” library technology reports 45, no. 3, (2007): 40. 31. youngok choi and edie rasmussen, “what qualifications and skill are important for digital librarian positions in academic libraries? a job advertisement analysis,” the journal of academic librarianship 35, no. 5 (2009): 457–67. 32. carla j. 
soffle and kim leeder, “practitioners and library education: a crisis of understanding,” journal of education for library & information science 46, no. 4 (2005): 312–19. 33. marta mestrovic deyrup and alan delozier, “a case study on the current employment status of new m.l.s. graduates,” current studies in librarianship 25, no. 1/2, (2001): 21–38. 34. mary a. ball and katherine schilling, “service learning, technology and lis education,” journal of education for library & information science 47, no. 4 (2006): 277–90. 35. marta mestrovic deyrup and alan delozier, “a case study on the current employment status of new m.l.s. graduates,” current studies in librarianship 25, no. 1/2 (2001): 21–38. 36. roy tennant, “the most important management decision: hiring staff for the new millennium,” library journal 123, no. 3 (1998): 102. more research is still needed to identify the key technology skills needed. case studies of successful library technology teams and individuals may reveal more about the process of skill acquisition. questions regarding how much can be taught in lis courses or practicum, and how much must be expected through on-the-job experience are good areas for more research. references 1. james michalko, constance malpas and arnold arcolio, “research libraries, risk and systematic change,” oclc research (mar. 2010), http://www.oclc.org/research/publications/ library/2010/2010-03.pdf. 2. lori a. goetsch, reinventing our work, “new and emerging roles for academic librarians,” journal of library administration 48, no. 2 (2008): 157–72. 3. janie m. mathews and harold pardue, “the presence of it skill sets in librarian position announcements,” college and research libraries 70, no. 3 (2009): 250–57. 4. peggy johnson, “from the editor’s desk,” technicalities 27, no. 3 (2007): 2–4. 5. ton debruyn, “questioning the focus of lis education,” journal of education for library & information science 48, no. 2 (2007): 108–15. 6. jacquelyn erdman, “education for a new breed of librarian,” reference librarian 47, no. 98 (2007): 93–94. 7. “educating library and information science professionals for a new century: the kaliper report,” executive summary. aliper advisory committee, alise. (reston, virginia, july 2000), http://www.si.umich.edu/~durrance/textdocs/ kaliperfinalr.pdf (accessed june 1, 2010). 8. karen markey, “current educational trends in library and information science curricula,” journal of education for library and information science 45, no. 4 (2004): 317–39. 9. michael gorman, “whither library education?” new library world 105, no. 9/10 (2004): 376–80; michael gorman, “what ails library education?” journal of academic librarianship 30, no. 2 (2004): 99–101. 10. andrew dillon and april norris, “crying wolf: an examination and reconsideration of the perception of crisis in lis education,” journal of education for library & information science 46, no. 4, (2005): 208–98. 11. leigh s. estabrook, “crying wolf: a response,” journal of education for library & information science 46, no. 4 (2005):299–303. 12. ian m. johnson, “education for librarianship and information studies: fit for purpose?” information development 23, no.1 (2007): 13–14. 13. james g. neal, “raised by wolves,” library journal 131, no. 3 (2006): 42–44. 14. sheila s. intner, “library education for the third millennium,” technicalities 24, no. 6 (2004): 10–12 15. renee d. 
mckinney, “draft proposed ala core competencies compared to ala-accredited, candidate, and precandidate program curricula: a preliminary analysis,” journal of education for library & information science 47 no.1 (2006): 52–77. editorial board thoughts: doesn’t work mark cyzyk editorial board thoughts | cyzyk 3 the proof of the pudding’s in the eating. miguel de cervantes saavedra. the ingenious hidalgo don quixote de la mancha. part i, chapter xxxvii, john rutherford, trans. about fifteen years ago i had two students from germany working for me, jens and andreas. those guys were great. they were smart and funny and interesting and always did their best. i would send them out to fix things around the library, and they would dutifully report back with success or failure. i told them that, particularly if there was a problem with a staff workstation, “if it breaks in the morning, it must be fixed by lunchtime; if it breaks in the afternoon, it must be fixed by 5:00.” they understood that if a staff workstation was down, then that probably also meant a staff member was just sitting there, waiting for it to be fixed. if we had to we could slap a sign on a broken public workstation and get back to it later—there were plenty of other working public stations after all—but staff workstations must be working at all times. insofar as we had an aged fleet of pcs whose cmos batteries were rapidly giving out, i kept jens and andreas running around the building quite a bit. on occasion, though, they would report back with the dreaded, “hey boss, doesn’t work.” this was the one thing that would raise my ire. “of course it doesn’t work, that’s why i sent you down there!” i would think. the phrase “doesn’t work” became for me a pavlovian signal that i was about to drop everything and go take a look myself. it now occurs to me, though, that this notion of “work” is precisely the point of technology, and that sometimes this gets lost for those of us employed fulltime as technologists in libraries. let me explain: in my opinion and for the most part, the proper role of the technologist in a library is that of a consultant on loan to the departments to work on projects there, embedded.1 two of the best bosses i ever had said essentially the same thing to me in our introductory first-day-on-the-job chit-chat: “you report to me, but you work for them.” such is the proper attitude in any serviceoriented profession. does this not frequently get inverted, subverted, lost? what happens is that technology starts to take on an importance undeserved. it becomes selfreferential and insular; a technology-for-technology’s-sake attitude arises. mark cyzyk (mcyzyk@jhu.edu) is scholarly communication architect in the sheridan libraries, john hopkins university. mailto:mcyzyk@jhu.edu information technology and libraries | june 2012 4 but technology-for-technology’s-sake is just wrong. technology is merely a means to an end, not an end in itself. the word itself derives from the ancient greek technê, most frequently translated into english as “craft” and frequently distinguished in the greek philosophical literature from epistêmê or (certain) knowledge.2 so it is here that the crucial distinction in the western world between practical and theoretical activities is made, and technology is clearly a practical, not theoretical activity. as such, it has by its very nature practical outcomes in the world: technology works in the world. 
technology is instrumental in achieving certain practical outcomes; its value is as a tool, instrumentally valuable, not inherently valuable. it is not for its own sake that we implement technology; we implement technology to get some sort of work accomplished in the world. our programming languages, application servers, web application frameworks, ajax libraries, integrated development environments, source-code repositories, build tools, testing harnesses, switches, routers, single-signon utilities, proxy servers, link resolvers, repositories, bibliographic management utilities, help-desk ticketing applications, and elaborate project-management protocols are all for naught if the final product of our labor, at the end of the day, doesn’t work. our product is not only literally useless, it is worse than useless because the library in which we labor has devoted precious resources to it only to result in a service or product that does not properly function, and those are precious resources that could have been spent elsewhere. hey there fellow technologists, why am i being so dismal? i would prefer the term “grave” to “dismal.” significant portions of the library budget are put toward technology each year, and as those whose duty it is to carry our local technology strategies into the future, we need to always be mindful of the fact that each and every dollar spent on technology is a dollar not available for building our collections—surely the direct center of the mission of anyone who calls himself a librarian, a.k.a., a cultural conservationist. (shouldn’t we be wearing badges that read, “to collect and preserve”?) making it work is job one for the technologist in the library. … a colleague and friend of mine once told me, a decade ago, that our fellow colleague made a snippy comment about an important and major web application i had written, “just because it works doesn’t mean it’s right.” now, admittedly, i was a very sloppy code formatter, and yet i certainly would never say that the applications i wrote were steaming plates of spaghetti. on the contrary, i think the code i wrote consisted of good, solid procedural programming. what my disgruntled colleague meant, i think, was that i failed to follow a framework, and by “framework” he naturally meant the same framework to which he’d recently hitched his own coding wagon. my response to his snippiness was, “ah, pretty-it-up all you want, organize it any-which-way, but functional code-code that works--is actually the number one criterion for being good code.” just ask your clients. editorial board thoughts | cyzyk 5 that app i wrote has been in production, happily working away as a key piece of the enterprise network infrastructure at a prominent, multi-campus, east coast university since 2002.3 references 1. and here i heartily agree with my fellow editorial board member, michael witt, when he notes that “[p]art of this process is attempting to feel our users’ pain…”, and i even extend this to the point of us technologists actively working with our users toward a common goal, literally sitting with them, among them, not merely being present to offer occasional support, not merely feeling their pain but being so invested in our common project that their pain is our pain. [did i really just suggest we take on more pain?! yep.] see: michael witt. “eating our own dogfood.” information technology and libraries 30, no. 3 (september 2011) 90. http://www.ala.org/lita/ital/sites/ala.org.lita.ital/files/content/30/3/pdf/witt.pdf 2. 
i’m no classics scholar, but this is my recollection from taking a graduate seminar many years ago on this very topic. so while i’m not pulling this entirely out of thin air, i am pulling it from the musty mists of middle-aged memory – that, and a quick scan of professor richard parry’s fine article on this topic in the stanford encyclopedia of philosophy, particularly the section on aristotle’s views. regarding my comments below on technology being instrumentally valuable, i cite parry’s words: “presumably, then, the craftsman does not choose his activity for itself but for the end; thus the value of the activity is in what is made”. see: richard parry. "episteme and techne," the stanford encyclopedia of philosophy, fall 2008 edition, edward n. zalta, editor. http://plato.stanford.edu/archives/fall2008/entries/episteme-techne/ 3. mark cyzyk, "the johns hopkins address registration system (jhars): anatomy of an application," educause quarterly 26, no. 3 (2003). https://jscholarship.library.jhu.edu/handle/1774.2/32800 http://www.ala.org/lita/ital/sites/ala.org.lita.ital/files/content/30/3/pdf/witt.pdf http://plato.stanford.edu/archives/fall2008/entries/episteme-techne/ https://jscholarship.library.jhu.edu/handle/1774.2/32800 experiences of migrating to an open source integrated library system vandana singh information technology and libraries | march 2013 36 abstract interest in migrating to open-source integrated library systems is continually growing in libraries. along with the interest, lack of empirical research and evidence to compare the process of migration brings a lot of anxiety to the interested librarians. in this research, twenty librarians who have worked in libraries that migrated to open-source integrated library system (ils) or are in the process of migrating were interviewed. the interviews focused on their experiences and the lessons learned in the process of migration. the results from the interviews are used to create guidelines/best practices for each stage of the adoption process of an open-source ils. these guidelines will be helpful for librarians who want to research and adopt an open-source ils. introduction open-source software (oss) has become increasingly popular in libraries, and every year more libraries migrate to an open-source integrated library system.1 while there many discrete opensource applications used by libraries, this paper focuses on the integrated library system (ils), which supports core operations at most libraries. the two most popular open-source ilss in the united states are koha and evergreen, and they are being positioned as alternatives to proprietary ilss. 2 as open-source software becomes more widely used, it is not enough just to identify which software is the most appropriate for libraries, but it is also important to identify best practices, common problems, and misconceptions with the adoption of these software packages. the literature on open-source ilss is usually in the form of a case study from an individual library or a detailed account of one or two aspects of the process of selection, migration, and adoption. in our interactions with librarians from across the country, we found that there are no consolidated resources for researching different open-source ilss and for sharing the experiences of the people using them. librarians who are interested in open-source ils cannot find one resource that can give them an overview of the necessary information related to open-source ilss. 
in this research, we interviewed twenty librarians from different types and sizes of libraries and gathered their experiences to create generalized guidelines for the adoption of open-source ilss. these guidelines are at a broader level than one single case study and cover all the different stages of the adoption lifecycle. the experiences of librarians are useful for people who are evaluating opensource ilss as well as those who are in the process of adoption. learning from their experiences will help librarians to not have to reinvent the wheel. this type of research helps the librarians by empowering them with the information they need; also, it helps us in understanding the current status of this popular software. vandana singh (vandana@utk.edu) is assistant professor, school of information sciences, university of tennessee, knoxville, tennessee. mailto:vandana@utk.edu experiences of migrating to an open-source integrated library system | singh 37 literature review as mentioned earlier, most of the literature on open-source ils is practitioner-based and provides case studies or single steps in the process of adoption. these research studies and resources are useful but do not address the broad information needs of the librarians who are researching the topic of open-source ilss. every library is different, so no two libraries are going to take the same path in the adoption process. the usefulness of these articles depends on whether the searcher can find one in a similar environment. another issue is the amount of information given in these resources. often these papers discuss only one aspect of moving to an open-source ils, for example choosing the open-source ils. if they do cover the whole process, there is usually not enough detail to know how they did it. for example, morton-owens, hanson, and walls organize their paper into five sections: motivation and requirements analysis, software selection, configuration, training, and maintenance. 3 however, each section includes more main points than description. another relevant stream of literature includes those that compare different opensource ilss. these range from little more than links to different open-source projects to in-depth comparisons.4 for example, muller evaluated open-source communities for different ilss on forty criteria and then compared the ils on over eight hundred functions and features.5 these types of articles are very useful for those who are trying to become acquainted with the different opensource ilss that are available and are in the evaluation phase of the process. again, they are not helpful in understanding the entire process of adoption. some best practices articles such as tennant may be a little older, but his nine tips are still valid and very useful as a good foundation for anyone thinking about making the switch to open-source ils.6 what are the factors for moving to an open-source ils? another reason why an open-source ils appeals to libraries is its underlying philosophy: “open source and open access are philosophically linked to intellectual freedom, which is ultimately the mission of libraries.” 7 the other two common reasons are cost and functionality. the literature covering the decision to move to an open-source ils makes it clear that there is a wide variety of ways that libraries come to this decision. in espiau-bechetoille, bernon, bruley, and mousin, the consortium made the decision in four parts. 
8 the article states that they initially determined that four open-source ilss met their needs (koha, emilda, gnuteca and evergreen), although it is somewhat vague as to how they determined that koha was the best for their situation. indeed, most of the article is about how the three libraries involved had to work together, coordinating and dividing responsibilities. bissels shares that money was the main reason that the complementary and alternative medicine library and information service (camlis) decided to migrate to koha.9 they explain the process of making that decision. camlis was being developed from nothing, which makes their situation different than most libraries, and hence the process is different as well. michigan is an area known for its number of evergreen libraries. much of that is due to michigan library consortium. dykhuis explains the long, involved process that led to a number of evergreen installations. 10 mlc provides services to michigan information technology and libraries | march 2013 38 libraries, such as training and support. when they started looking for an ils system that all libraries could use, the main concerns were cost and functionality, which are the two key aspects that are mentioned in any discussion about choosing an ils. kohn and mccloy state that they decided to migrate to a new ils due to frustration with their current ils and that they involved all six of their librarians in the decision-making process.11 dennison and lewis show another reason why people migrate to open-source ils.12 they say that the proprietary system they were using was much more complicated than they needed. in addition, because of staff turnover no one really understood the system. this lack of expertise combined with increasing annual costs led to the decision to move to an open-source ils. an important lesson to take from this article is that they included all six of their librarians in the decisionmaking process. for a smaller library where everyone is an expert in their area of the library, it is important to get everyone involved in order to make sure that important functions or needed capabilities are not overlooked. almost any library that chooses open-source ils will name cost as one of their primary reasons. functionality is usually what determines which ils they choose. riewe conducted a study where he asked why each library chose its current ils. 13 open source libraries responded most often with ability to customize, the freedom from vendor lock-in, portability, and cost. how does migration happen? there are two general ways to do a migration: all at once or in stages. kohn and mccloy discuss a three-phase migration.14 the reason for this method was to spread the cost over several years. they did the public website and federated catalog as phase one and did the backend part during phases two and three. when multiple libraries are involved, phased migration is more like what is described in dykhuis.15 in that case, first a pilot program was created where a few libraries migrated over to the new system. when that was successful, then more libraries migrated. in contrast to a phased migration, walls discusses a migration completed in three months.16 this time includes installation, testing, and configuration. one interesting decision they made was to migrate at the end of the fiscal year in order to limit the amount of acquisitions data to be migrated. dennison and lewis completed their migration in two months. 
in this migration, most of the work was done by the company that was hosting their system. 17 this limited the amount of expertise that the library staff needed and made the migration much smoother from their perspective. migration can also be an opportunity; for example, morton-owens, hanson, and walls mention that they used the migration to koha to synchronize circulation rules between the branches. 18 it was also used to weed out inactive patrons (anyone who had not used the library in two years). data migration can be a problem, though. in the old system, the location code had been used for where the item was within the branch library, what kind of item it was, and how it circulated, but experiences of migrating to an open-source integrated library system | singh 39 these are three separate fields in koha. however, to some extent these issues are true of any migration between different systems. the migration experience is not always of a smooth transition. one of the advantages of opensource is the ability to customize and to develop functions that are specific to your library. in the case of new york academy of medicine library (nyam) working with its consortium waldo (westchester academic library directors organization), it was the decision to have developments completed before migration that caused the problems.19 their migration schedule was delayed by a month, and even after the delay not all of the eleven key features were complete. in addition, their migration took place when liblime (a proprietary vendor) with whom they were working announced their separation from the koha open-source community, which caused additional confusion. there are a couple of lessons to take from this. first, if doing development, be sure that the time needed is built into the migration schedule. also, when choosing an ils, think about how many developments are going to be necessary to successfully run the ils in your environment. lastly, try to prioritize the developments to minimize the number needed before “going live.” what does the literature say about training? very little is available about the training process for open-source ils. in current studies, training can be done in two ways: either by buying training from a vendor, or doing it internally.20 dennison and lewis found that having staff work on the system together at first and then try it independently was the most successful. 21 they had a demonstration system to practice, which also helped. in addition to this self-training, they had onsite training done by module, which allowed staff to attend only the training that was relevant and needed for them. in all of the articles discussed in this section, only one talks about ongoing maintenance. 22 the two-paragraph section includes suggested methods and does not mention anything about the amount of time or expertise needed for ongoing maintenance. in summary, in this literature review we found that there is research about open-source ils but that there is a need for much more work in this area. it was found that research articles and practitioner pieces are available and talk about different aspects of the adoption process. the main reasons for adoption are identified. there are also a few scattered individual articles about the process of migration, training, and maintenance. there is a gap in the studies of open-source ils, and there is no comprehensive study that documents the process, explains the steps, and identifies best practices and challenges for librarians who are interested. 
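one concrete migration problem described above is a single legacy field that encodes several facts at once: morton-owens, hanson, and walls note a location code that captured where an item sat, what kind of item it was, and how it circulated, all of which map to three separate fields in koha. the sketch below illustrates that kind of field split; it is an illustration only, and the code values and field names are invented examples, not koha's actual schema.

```python
# illustrative only: splitting one combined legacy location code into the three
# separate fields a system like koha expects; code values and field names are invented
LOCATION_MAP = {
    "REF-BK-NOCIRC":  {"shelving_location": "reference",   "item_type": "book", "circ_rule": "no_circulation"},
    "MAIN-DVD-7DAY":  {"shelving_location": "main_stacks", "item_type": "dvd",  "circ_rule": "seven_day"},
    "CHILD-BK-28DAY": {"shelving_location": "childrens",   "item_type": "book", "circ_rule": "twenty_eight_day"},
}

def split_location(legacy_code: str) -> dict:
    """return the three target fields, or flag an unexpected code for manual review."""
    return LOCATION_MAP.get(
        legacy_code.strip().upper(),
        {"shelving_location": "UNMAPPED", "item_type": "UNMAPPED", "circ_rule": "UNMAPPED"},
    )

print(split_location("main-dvd-7day"))
```

a small lookup table like this also surfaces the unexpected legacy values early, which is usually where the manual cleanup effort goes.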
data sources the objective for data collection was to collect data from a variety of library types and sizes in order to collect a wide range of data. e-mail invitations for interviews were sent to koha and evergreen discussion list and to several other library-related discussion list. the e-mail requested volunteers for a telephone interview to share their experiences with open-source integrated library systems. potential participants identified themselves as being willing to be interviewed for information technology and libraries | march 2013 40 the project via e-mail and were then contacted by researchers to set up times for phone interviews. the list of interview questions was e-mailed to the participants before the interviews so that they could review the questions and had enough time to reflect on their experiences. the interviews were conducted with librarians working in a variety of libraries, including nine libraries using evergreen and one in the process of migrating to evergreen. seven libraries were using koha, two were using other open-source ilss, and one was using a proprietary ils while evaluating opensource. public libraries were the most numerous with eleven respondents, while there were also four special libraries, three academic libraries, and one school library. researchers also requested information about the size of the library collection. seven libraries owned collections of less than 100,000 items, seven had collections of 100,001–999,999 items, and four libraries owned collections of over 1,000,000 items. geographically, the respondents ranged all over the united states and included one library located in afghanistan (although the ils was installed in the united states). table 1 details the description of the data. data collection method interviews were chosen as the primary means of data collection in order to gather rich information that could be analyzed using qualitative methods. researchers sought to interview professionals from a variety of library types and sizes in order to collect a variety of different experiences regarding the selection, implementation, and ongoing maintenance of open-source ils. interviewing was the chosen methodology for several reasons. first, the goal was to go past the practitioner articles to see what kinds of trends there are in the migration process. this requires getting experiences from multiple librarians. interviews provide the in-depth “case-study description” that we were looking for.23 in addition, the most useful aspect of interviewing is the ability to follow up on an answer that the participant gives.24 this ensures that the same type of information is gathered from every interview. this is unlike surveys where sometimes participants do not respond in a way that answers what the researcher really wants to know. in our case, we used telephone interviews due to the geographic dispersion of the participants. it allowed us to talk to librarians from all over the country instead of just within our area. the interview questions are listed in appendix a. data analysis methodology interviews were transcribed, and identifying information was then removed from each of the transcribed documents. the transcripts were then uploaded into dedoose (www.dedoose.com), a web-based analysis program supporting qualitative and mixed methods research. dedoose provides online excerpt selection, coding, and analysis by multiple researchers for multiple documents. 
the research team used an iterative process of qualitatively analyzing the resulting documents. this method used multiple reviews of the data to initially code large excerpts, which were then analyzed twice more to extract common themes and ideas. researchers began by reviewing each document for quantitative information, including the library type, ils in use, number of it staff, and size of the collection. this information was added as metadata descriptors to each document in dedoose. upon review of the transcriptions and in discussions about the interview process, researchers began a content analysis of the qualitative data. codes were created based on this initial analysis to aid in categorizing the data from the interviews. two coders coded the entire dataset, assigning categories and themes to the excerpts of the interview transcriptions. all of the excerpts from each coder were used to create two tests. each coder then took the test of the other's codes by choosing their own codes for each excerpt. researchers earned scores of .96 and .95 using cohen's kappa statistic, indicating very high reliability.

table 1. description of libraries

library size (number of items in collection) | library type | ils used
under 100,000 | academic | koha
100,000–1,000,000 | public | evergreen
under 100,000 | special | proprietary—considering open-source
under 100,000 | public | koha
— | school | koha
100,000–1,000,000 | public | millennium—in process of migrating to evergreen
100,000–1,000,000 | public | evergreen
100,000–1,000,000 | special | koha
under 100,000 | public | koha
— | public | evergreen
100,000–1,000,000 | academic | evergreen-equinox
under 100,000 | special | koha
over 1,000,000 | academic | kuali ole
100,000–1,000,000 | public | evergreen-equinox
over 1,000,000 | public | evergreen
100,000–1,000,000 | public | evergreen
under 100,000 | public | koha-bywater
over 1,000,000 | public | evergreen-equinox
under 100,000 | public | evergreen
over 1,000,000 | special | collective access

results

results from the interview questions were divided into eight categories identified as stages of migration, starting with evaluation of the ils, creation of a demonstration site, data preparation, identification of customization and development needs, data migration, staff training and user testing, and going live and long-term maintenance plans. best practices and challenges for each of the stages are presented below. this section begins with some general considerations gleaned from the responses.

general considerations when migrating to an open-source ils

• create awareness about open-source culture in your library—let them know what to expect.
• develop it skills internally even if you use a vendor.
• assess your staff's abilities before committing. knowing what your staff can do will help determine whether you need to work with a vendor and to what degree, or if you can do it alone. it is also a way to determine who is going to be on your migration team.
• have a demonstration system; pre-migration, it can be used to test and train, and after migration it can be used to help find solutions to problems. this will also help develop skills internally.
• communication is key.
  o if working with a vendor either as a single library or as a consortium, have a designated liaison with the vendor so all questions go through one person. in a consortium, ensure that everybody knows what is going on.
• be prepared to commit a significant amount of staff time for testing, development, and migration, especially if you are not hiring a proprietary vendor for support.

working with vendors

• read contracts carefully. do not be afraid to ask questions and request changes. sometimes the other party has a completely different meaning for a word than you do; make sure you are on the same page.
• ensure that there is an explicit timeline and procedure for the release of usable source code.
• see that you are guaranteed and entitled to access the source code in case you need to switch developers, bring additional developers on board, or try to fix problems in-house.
• provide specific examples when reporting problems. specific examples will help the developers determine what the problem is and will help prevent any miscommunication.
• designate a liaison between library staff and developers. the liaison will have to be someone who understands, or can learn enough about, what the developers are doing so that he or she can translate any problems or complaints from one group to the other.
• set up regular meetings for those involved in the migration project. regular meetings keep everyone focused and on task. they also provide an opportunity for questions, concerns, and problems to be addressed quickly.

sample quote from interviews:

one of the main things that came up is working with equinox, it was amazing. to start with, they were very, very helpful. and i had made an assumption, and i think the rest of us had, too, that we were working with, that this was developed by librarians, and that the terminology used would be library jargon. but that was not the case. we had some stumbling points over, we would say, okay, we want this, or this is a transaction, or that's a bill, but that's not what they called it. they didn't call it a transaction, or they didn't call it a bill. and so when we wrote the contract, we wrote it so that none of the patrons' current checkout record would migrate, which is a big issue. and we didn't realize that we weren't using the right terminology in order to put that in the contract so that those current checkouts would move over with the migration and not just the record.

stage 1—evaluation

when making the decision of whether to migrate to open-source and which open-source ils is best for your library, the main things to start with are two questions: who makes the decision, and on what basis?

in practice, who makes the decision?

• if a single library, one or two people make the decision, usually the library director and whoever is serving as the tech person.
• if in a consortium, a committee makes the decision, often either the library directors or tech people.

best practice suggestion: regardless of the size of the library system, even though these are the people making the decisions, you should always try to include as many groups as possible in the decision to move to open-source.

which ils?

• make a list of requirements based on your current system and a wish list of requirements for the new system. this is one area where you can involve more than just the system staff. asking the different departments (cataloging, acquisitions, and circulation) what their needs are ensures that the final decision includes everyone.
• talk to other libraries that have made the move to open-source. they are a great resource for seeing how the system actually works, asking questions about the migration process, and providing information about open-source problems. if available, talk to a library that migrated from your current proprietary system. some systems are easier to migrate from than others, so this would be an opportunity to find out about any specific problems.
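none of the interviewed libraries describe a formal scoring exercise, but the requirements list and wish list suggested above can be turned into a simple weighted comparison so that every department's priorities are visible in the final decision. the sketch below is purely illustrative; the criteria, weights, and ratings are invented placeholders rather than data from this study.

```python
# illustrative weighted comparison of candidate systems against a requirements list;
# criteria, weights, and ratings are invented placeholders, not survey data
weights = {"acquisitions": 3, "cataloging": 5, "circulation": 5,
           "consortium support": 2, "available local expertise": 4}

# 0-5 ratings gathered from each department's review of a demo system or documentation
scores = {
    "candidate ils a": {"acquisitions": 3, "cataloging": 4, "circulation": 5,
                        "consortium support": 5, "available local expertise": 2},
    "candidate ils b": {"acquisitions": 4, "cataloging": 4, "circulation": 4,
                        "consortium support": 2, "available local expertise": 4},
}

best_possible = sum(w * 5 for w in weights.values())
for ils, rating in scores.items():
    total = sum(weights[c] * rating[c] for c in weights)
    print(f"{ils}: {total} / {best_possible}")
```

the point of the exercise is less the final number than the conversation it forces: every department has to say what it needs and how much that need weighs.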
stage 2—set up a demonstration site

• this is the most important guideline in the entire paper: create a demonstration site before making a final decision.
  o if there is still confusion in your team about which ils to use, setting up a demo site and installing koha and evergreen will be the best way to decide which one works for your situation.
  o doing at least one test migration will show what kind of data preparation needs to be done, usually by doing data mapping. data mapping is where you determine where the fields in your current system go when you move into the new system. another often-used term for data mapping is staging tables.
  o the demo site is also a good way to do staff training when needed.
  o the demo site also provides a way to determine what the best setup rules, policies, and settings are by testing them in advance.
  o it provides an opportunity to learn the processes of the different modules and how they differ from your library's current practices.
  o most importantly, it serves as a test run for migration, which will make the actual migration go smoothly.

sample quotes from interviews:

do you think that the tests with the data and doing that really helped? oh yes, we would have had a disaster if we hadn't done three tests and test loads. the pals office has done conversions multiple times before so they have it done, and we have good tech people. so they knew that the three test loads would be a good thing.

we did discover some of the tools that should be used, like for example one of the things that's recommended for evergreen patron migration is to have a staging table, so you dump all your records into a database that you can then use to create the records in the evergreen tables. and you know we found out why that was important by running into a couple, a few problems with not being able to line up the data in the multiple fields. but you know that's the sort of thing we expect. that's pretty, i classify it as pretty typical migration learning, is finding out what works one way, what doesn't the other. but you know that was a good thing because all the documents were saying, "you should use a staging table." and we had to figure out ourselves why that was such a good idea.

you should use a staging table for migration, i.e., move records into a database that is then used to create records in evergreen. it helps because some data doesn't line up in the same fields. it's a good idea to set up tables and rules far in advance in order to test before migration. it's very important to do data mapping very carefully because if you lose anything during migration it's difficult to get it back. check it to make sure that all the fields will be transported correctly, and run tests while the old system is still up to make sure everything is there.
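the staging-table approach described in stage 2 and in the quotes above can be prototyped with nothing more than a csv export and sqlite. this is a minimal sketch under assumed names: the export file, its column names, and the table layout are invented for illustration and are not evergreen's or koha's actual schemas.

```python
import csv
import sqlite3

# hypothetical export from the legacy ils; the file name and column names are assumptions
conn = sqlite3.connect("staging.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS item_staging (
        legacy_barcode TEXT PRIMARY KEY,
        title          TEXT,
        location_code  TEXT,
        item_type      TEXT,
        last_checkout  TEXT
    )
""")

with open("legacy_items_export.csv", newline="", encoding="utf-8") as f:
    rows = [(r["barcode"], r["title"], r["location_code"], r["item_type"], r["last_checkout"])
            for r in csv.DictReader(f)]
conn.executemany("INSERT OR REPLACE INTO item_staging VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# cheap spot checks before anything is loaded into the new ils
print("rows staged:", conn.execute("SELECT COUNT(*) FROM item_staging").fetchone()[0])
for code, n in conn.execute(
        "SELECT location_code, COUNT(*) FROM item_staging "
        "GROUP BY location_code ORDER BY 2 DESC LIMIT 10"):
    print(code, n)
```

loading records into a disposable database like this makes it cheap to count rows and eyeball field values before the real migration, which is exactly the kind of test load the interviewees describe.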
stage 3—data preparation

• clean up the data in advance. the better the data is, the easier it will transfer. this is also an opportunity to start fresh in a new system, so if there were inconsistencies or irritations in the old system, this is a good time to fix them.
  o weeding—if you have records (either materials or patrons) that are out of date, get rid of them. the fewer the records, the easier migration will be. in addition, vendors often charge by record, so why pay for records you do not need?
• consistency in data is key. if multiple people are working on the data, make sure they are working based on the same standards.
• do a fine amnesty when migrating to a new system. depending on the systems (current and new), it is sometimes impossible or very difficult to transfer fine data into the new ils, so doing a fine amnesty will make the process simpler.
• spot check data (in testing, during, and after migration). catching problems early means there will be less work trying to fix problems later.

sample quotes from interviews:

i would say that if you're considering converting to an ils software, that you've really got to do the data mapping very carefully with a fine-toothed comb because you don't want to lose data. it's too hard to get it back in.

the data needs to be normalized so that the numbers of fields are uniform, names are in the correct order, and data is displayed correctly. the library has had to decide whether it is worthwhile to do things like getting rid of old abbreviations, etc., to make the data more easily understood. problems occur with old data if information such as note fields has been entered inconsistently. it's important to have procedures and to make sure everyone is following them. often things are put in different places, which causes a lot of trouble.

they are doing a lot of cleanup of data, such as reducing the number of unique values in the case of some items that had a huge number of values in a drop-down list. would like to spend more time on data cleanup but need to go ahead and get data migrated.

stage 4—development/customization

• one benefit of using an open-source ils is that any development done by any library comes back to the community, so often if you want something done, someone else might have already created that functionality and you can use it.
• develop partnerships. often if you want a specific development, someone else does too. if your staff does not have the expertise, then you could provide more of the funding and the partner could provide the tech skills, or vice versa. partnerships mean the development will cost less than if you did it alone.
• grant money is also available for open-source development and may be another funding option.

sample quotes from interviews:

the library does its own minor customizations and uses equinox for major jobs. they will lay out and prepare everything, then hire equinox to write and implement new code.

the library tries not to do things on its own but always looks for partnerships when doing any customizations. that way libraries that have similar needs can share resources.

stage 5—migration process

• write workflows and policies/rules beforehand. writing these when working on the demo site should provide step-by-step instructions on how to do the final migration.
• having regular meetings during the migration process ensures that everyone stays on the same page and prevents miscommunications that will slow down the process.
• if many libraries are involved, migration in waves will make things easier. this is generally a situation with a statewide consortium. usually there is a pilot migration of four to eight libraries; then after that, each wave gets a little bigger as the system becomes more practiced. this can also be a useful model if the libraries involved in the consortium are accepting the migration at different rates.
• for a consortium that is coming from multiple ilss, having a vendor will make it easier. this is not to say that it could not be done without a vendor, but migrating from system a is going to be different than migrating from system b. this increases the complexity, which can make working with a vendor more cost effective.
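the cleanup, weeding, and spot-checking advice in stage 3 lends itself to small throwaway scripts. the sketch below is a generic illustration rather than code from any of the interviewed libraries; the patron export, its column names, and the two-year inactivity cutoff (echoing the weeding practice noted in the literature review) are all assumptions.

```python
import csv
from collections import Counter
from datetime import date, timedelta

# hypothetical patron export; the file and column names are assumptions, not a real ils schema
cutoff = date.today() - timedelta(days=2 * 365)
keep, weeded = [], 0

with open("legacy_patrons_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        last_use = date.fromisoformat(row["last_activity"] or "1900-01-01")
        if last_use < cutoff:
            weeded += 1          # weed inactive patrons instead of paying to migrate them
            continue
        row["surname"] = row["surname"].strip().title()   # normalize an obviously inconsistent field
        keep.append(row)

print(f"kept {len(keep)} patrons, weeded {weeded}")
# spot check a field that feeds a drop-down list: how many distinct values survived cleanup?
print("patron categories:", Counter(r["patron_category"] for r in keep).most_common(10))
```

running the same counts against the source export and again against the migrated data is a cheap way to catch fields that did not line up before patrons notice.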
stage 6—staff training and user testing

• who does the training? there are two main ways: by a vendor or internally.
  o if trained by a vendor, there are two options:
    - the vendor sends someone to the library to conduct training.
    - the library sends someone to the vendor for training, and then he or she comes back and trains the rest of the staff.
  o if trained internally, there are a lot of training materials available. several libraries have created their own materials and then made them available online. this is another time where having contacts with other libraries can help in using common resources.
• documentation is important for training. the best way is to find what documentation is already available and then customize it for your system.
• do training fairly close to the "go live" date.
• use a day or two for training. if a consortium is spread out geographically, use webinars and wikis.
• when doing training, have specific tasks to do. this can be done a few ways.
  o do the specific tasks at the training.
  o demonstrate the tasks at training and then give "homework" where the staff does the specific tasks independently. to implement this option, staff has to have access to a demo system.
  o have staff try the tasks on their own and use the training session for questions or problems they had.

sample quotes from interviews:

well we had, we hired equinox to come and do 2 days of training with us. so they're here and did hands-on training with us. and then we also, they provided some packets of exercises that people could do on their own. and we had the system up and running in the background so that they could play with it about a week before we actually went live to the public so that they could get used to it, figure out how things worked, and work with it a little bit so they could answer questions before the public came and said, hey, how do i find my record, and i can't get into this anymore. and the training was really good, but the hands-on was the best. and it's not a difficult system to work, but you just need a little experience with it before it makes sense.

evergreen runs a test server that anybody can download the staff client for that and work in their test server and just examine all of the records and how the system works, to figure out our workflows. we looked up documentation online—evergreen, indiana, pines, various places—copied the documentation they so graciously hosted online for everybody to use, went through it, found what worked for us. those couple staff members worked with other staff. we printed out kind of our little how-to guides for other people, depending on which worked, and told them they're going to sit down, we've got terminals set up here, sit down and learn it.
the admin person, she went through some quite detailed training. she went to atlanta and had training from equinox on a lot of aspects of evergreen. and then we also, she came back, and then she did training for all the libraries in the consortium, kind of an intensive day-long or half-day-long thing that she offered in several different central geographic locations so that all the libraries would have a chance to go and attend without having to drive too far. and we also did webinars, we got a couple webinars for the real outlying libraries. and we also have ongoing weekly webinars. and we have a wiki set up where we put all the information in the online manual and stuff like that. all the training sessions were recorded, and so we had them on cd for new people coming on board.

marketing for patrons

• most libraries have not done anything elaborate, generally just announcements through posters, local papers, flyers, and on websites.
• if the migration is greatly changing the situation for patrons, then more marketing is needed.
• set up a demo computer for patrons to try, or hold classes once the system is up.

training for patrons

• most libraries did not find this necessary. either the system is easy to use or it is set up to look like the old system.
• if training patrons, create online tutorials.

stage 7—"go live" and after

• if possible, have your old system running for a month or two until you are sure that all the data got migrated over properly.

sample quote from interviews:

check it to make sure that all the fields will be transported correctly, and run tests while the old system is still up to make sure everything is there.

maintenance—library staff (this assumes a migration being done in-house with little to no vendor support)

• staff has to have the technical knowledge (linux, sql, and coding).
• often the money saved from moving to open-source is used to pay for additional staff.
• most time is not spent on maintenance but on customization, updates, or problem-solving.

maintenance—vendor

• often libraries start with higher vendor support, which lessens as the staff learns and develops expertise.

discussion and conclusion

interviews with twenty librarians from different settings provided insight into the process of adopting an open-source ils and were used to develop the guidelines presented in this paper. these guidelines are not intended to serve as a complete guide to the process of adoption but are meant to give interested librarians an overview of the process. these guidelines can help libraries prepare themselves for the research and adoption well before they delve into the process. since these guidelines are all based on the real-life adoption experiences of libraries, they provide insight into the challenges as well as the opportunities in the process. these guidelines can be used to develop an adoption plan and requirements for the adoption process. in future research, we are working to create adoption blueprints and total cost of ownership assessments (with and without vendors) for libraries of different sizes and types. also, as part of this research we have developed an information portal that contains resources that will help librarians in each phase of the process of open-source ils adoption. the information portal, along with these guidelines, will fill a very important gap in the resources available for open-source ils adoption.
the url for the portal is not being provided in this paper to ensure anonymous review.
appendix a. interview questions
library environment
1. what is your library type (school, academic, public, special, etc.)?
2. what is your library size (how many employees, population served, and number of materials)?
evaluation (we would like as much info as possible about why the system was chosen over others, including any existing system.)
3. what open-source ils are you using and why did you choose it?
4. when choosing an open-source ils, where did you go for information (vendor/ils pages, community groups, personal contacts, etc.)?
5. who was involved in deciding which ils to use?
adoption (we would like to document specific problems or issues that could be used by other libraries to ease their installation.)
6. were there any problems during migration?
7.
what do you know now that you wish you had known before migration?
8. how long did migration take? were you on schedule?
9. if getting paid support, how did the vendors (previous and current) help with migration?
implementation (again, specific examples of the things that worked well or didn't work. how can other libraries learn from this experience?)
10. what kind of (and how much) training did your library staff receive?
11. did you do any kind of marketing to your patrons?
12. (if you haven't gotten to this part yet) what are your plans for implementation?
13. how much time did implementation take, and were you on schedule?
maintenance (this information will be especially important when compared to the library type and size as a reference for other libraries. we would like to get answers that are as specific as possible.)
14. how large is your systems staff? is it sufficient to maintain the system?
15. how much time do you spend each week doing system maintenance? how does this compare to your old system?
16. what resources (or channels) do you use to solve your technical support issues? what roles do paid vendors play in maintenance of your system?
advice for other libraries (these open-ended questions are an opportunity to learn more information that we might not have thought of asking about. responses could provide a valuable resource to other libraries as they plan their implementation.)
17. what is the best thing and worst thing about having an open-source ils?
18. are there any lessons or advice that you would like to share with other librarians who are thinking about or migrating to an open-source ils?
acknowledgment
this research was funded by an early career imls grant.
abstract
interest in migrating to open-source integrated library systems is continually growing in libraries. along with the interest, lack of empirical research and evidence to compare the process of migration brings a lot of anxiety to the interested librari...
"discovery" focus as impetus for organizational learning
jennifer l. fabbi
the university of nevada las vegas libraries' focus on the concept of discovery and the tools and processes that enable our users to find information began with an organizational review of the libraries' technical services division. this article outlines the phases of this review and subsequent planning and organizational commitment to discovery. using the theoretical lens of organizational learning, it highlights how the emerging focus on discovery has provided an impetus for genuine learning and change.
the university of nevada las vegas (unlv) libraries' focus on the concept of discovery and the tools and processes that enable our users to find information stemmed from the confluence of several initiatives. however, a significant path that is directly responsible for the increased attention on discovery leads through one unit in unlv libraries—technical services. this unit, consisting of the materials ordering and receiving (acquisitions) and bibliographic and metadata services (cataloging) departments, had been without a permanent director for three years when i was asked to take the interim post in april 2008.
while the initial expectation was that i would work with the staff to keep technical services functioning while we performed our third search for a permanent director, it became clear after three months that, because of nevada's budgetary limitations, we would not be able to go forward with a search at that time. as all personnel searches in unlv libraries were frozen, managers and staff across the divisions moved quickly to reassign staff with the aim of mitigating the effects of staff vacancies. there was division among the library administrators as to what the solution would be for technical services: split up the division—for which we had trouble recruiting and retaining a leader in the past—and divvy up its functions among other divisions in the libraries, or continue to hold down the fort while conducting a review of technical services that would inform what it might become in the future. other organizations have taken serious looks at, and provided roadmaps of, how their focus on technical services will change in the future.1 the latter route was chosen, and the review—eventually dubbed revisioning technical services—led directly to the inquiries and activities documented in this ital special issue. detailing the process of revisioning technical services and using the theoretical lens of organizational learning, i will demonstrate how the libraries' emerging focus on the concept of discovery has provided an impetus for genuine learning and change.
organizational learning
in images of organization, morgan devotes a chapter to theories of organizational development that characterize organizations using the metaphor of the brain.2 based on the principles of modern cybernetics, argyris and schön provide a framework for thinking about how organizations can learn to learn.3 while many organizations have become adept at single-loop learning—the ability to scan the environment, set objectives, and monitor their own general performance in relation to existing operating norms—these types of systems are generally designed to keep the organization "on course." double-loop learning, on the other hand, is a process of learning to learn, which depends on being able to take a "double look" at the situation by questioning the relevance of operating norms (see figure 1). bureaucratized organizations have fundamental organizing principles, including management hierarchy and subunit goals that are seen as ends in themselves, which can actually obstruct the learning process.
figure 1. single- and double-loop learning. source: learning-org discussion pages, "single and double loop learning," learning-org dialog on learning organizations, http://www.learning-org.com/graphics/lo23374singledll.jpg (accessed aug. 11, 2009).
jennifer l. fabbi (jennifer.fabbi@unlv.edu) is special assistant to the dean at the university of nevada las vegas libraries.
to become skilled in the art of double-loop learning, organizations must avoid getting trapped in single-looped processes, especially those created by "traditional management control systems" and the "defensive routines" of organizational members.4 according to morgan, cybernetics suggests that learning organizations must develop capacities that allow them to do the following:5
• scan and anticipate change in the wider environment to detect significant variations by
o embracing views of potential futures as well as of the present and the past;
o understanding products and services from the customer's point of view; and
o using, embracing, and creating uncertainty as a resource for new patterns of development.
• develop an ability to question, challenge, and change operating norms and assumptions by
o challenging how they see and think about organizational reality using different templates and mental models;
o making sure strategic development does not run ahead of organizational reality; and
o developing a culture that supports change and risk taking.
• allow an appropriate strategic direction and pattern of organization to emerge by
o developing a sense of vision, norms, values, limits, or "reference points" to guide behavior, including the ability to question the limits being imposed;
o absorbing the basic philosophy that will guide appropriate objectives and behaviors in any situation; and
o placing as much importance on the selection of the limits to be placed on behavior as on the active pursuit of desired goals.
unlv libraries' revisioning technical services process and the resulting organizational focus on discovery is outlined below, and the elements identifying unlv libraries as a learning organization throughout this process are highlighted (see appendix a).
revisioning technical services
this review of technical services was a process consisting of several distinct steps over many months, and each step was informed by the data and opinions gained in the prior steps:
phase 1: technical services baseline, focusing on the nature of technical services work at unlv libraries and in the library profession, and on factors that affect this work now and in the future
phase 2: organizational call to action, engaging the entire organization in shared learning and input
phase 3: summit on discovery, shifting significantly away from technical services and toward the concept of discovery of information and the experience of our users
technical services baseline
the first phase of the process, which i called the "technical services baseline," included a face-to-face meeting with me and all technical services staff. we talked openly about the challenges we faced, the options on the table for the division, why i thought that taking on this review would be the best course to pursue, and the goals of the review. outcomes of the process were guided by the dean of libraries, were written by me, and received input from technical services staff, resulting in the following goals:
1. collect input about the kinds of skills and leadership we would like to see in our new technical services director. (while creating these goals, we were given the go-ahead to continue our search for a new director.)
2. investigate the organization of knowledge at a broad level—what is the added value that libraries provide?
3. increase overall knowledge of professional issues in technical services and what is most meaningful for us at unlv.
4. encourage technical services staff to consider current and future priorities.
after establishing these goals, i began to document information about the process on unlv libraries’ staff website (figure 2) so that all staff could follow its progress. 166 information technology and libraries | december 2009 with the feedback i received at the face-to-face meeting and guided by the stated goals of the process, i gave technical services staff a series of three questions to answer individually: 1. what do you think the major functions of technical services are? examples are “cataloging physical materials” and “ordering and paying for all resources purchased from the collections budget.” 2. what external factors—in librarianship and otherwise—should we be paying the most attention to in terms of their effect on technical services work? examples are “the ways that users look for information” and “reduction of print book and serials budgets.” feel free to do a little research on this question and provide the sources of the information that you find. 3. what are the three highest priority/most important tasks on your to-do list right now? eighteen of twenty staff members responded to the questions. i then analyzed the twenty pages of feedback according to two specific criteria: (1) i paid special attention to phrases that indicated an individual’s beliefs, values, or philosophies to identify potential sources of conflict as we moved through the process; and (2) i looked for priority tasks listed that are not directly related to the individual’s job duties, as many of them were indicators of work stress or anxiety related to perceived impending change. during this phase, organizational learning was initiated through the process of challenging how technical services staff and others viewed technical services as a unit in the organization, and through the creation of shared reference points to guide our future actions. while beginning a dialogue about a variety of future management options for technical services work functions may have raised levels of anxiety within the organization, it also invited administration and staff to question the status quo and consider alternative modes of operation within the context of efficiency.6 in addition to thinking about current realities and external influences, staff were asked to participate in generating outcomes to guide the review process. these shared goals helped to develop a sense of coherence for what started out as a very loose assignment—a review that would inform what the unit might become in the future. organizational call to action the next phase of the process, “a call to action,” required library-wide involvement and input. while i knew that this phase would involve a library staff survey, i also desired that all staff responding to the survey had a basic knowledge of some of the issues that are facing library technical services today. 
using input from the two technical services department heads, i selected two readings for all library staff: bothmann and holmberg’s chapter on strategic planning for electronic resource management addressed many of the planning, policy, and workflow issues that unlv libraries has experienced7; and coyle’s article on information organization and the future of the library catalog offers several ideas for ensuring that valuable information is visible to our users in the information environments they are using.8 i also asked the library staff to visit the university of nebraska–lincoln’s “encore catalog search” (http://iris.unl.edu) and go through the discovery experience by performing a guided search and a search on a topic of their choice. they were then asked to ponder what collections of physical or digital resources we currently own at the libraries that are not available from the library catalog. after completing these steps, i directed library staff to a survey of questions related to the importance of several items referenced in the articles in terms of the following unlv libraries priorities: n creating a single search interface for users pulling together information from the traditional library catalog as well as other resources (e.g., journal articles, images, archival materials) n considering non–marc records in the library catalog for the integration of nontraditional library and nonlibrary resources into the catalog n linking to access points for full-text resources from the catalog n creating ways for the catalog to recommend items to users figure 2. project’s wiki page on staff website “discovery” focus as impetus for organizational learning | fabbi 167 n creating metadata for materials not found in the catalog n creating “community” within the library catalog n implementing an electronic resource management system (erms) to help manage the details related to subscriptions to electronic content n implementing federated searching so that users can search across multiple electronic resource interfaces at once n making electronic resource license information available to library staff and patrons there also were several questions asking library staff to prioritize many of the functions that technical services already undertakes to some extent: n cataloging specialized or unique materials n cataloging and processing gift collections n ensuring that full-text electronic access is represented accurately in the catalog n claiming and binding print serials n ordering and receiving physical resources n ordering and receiving electronic resources n maintaining and communicating acquisitions budget and serials data the survey asked technical services staff to “think of your current top three priority to-do items. in light of what you read and what you think is important for us to focus on, how do you think your work now will have changed in five years?” all other library staff members were asked to respond to the following: 1. please list two ways that technical services supports your work now. 2. please list two things you would like technical services to start doing in support of your work now. 3. please list two things you think technical services can stop doing now. 4. please list two things technical services will need to begin doing to support your work in the next five years. finally, the survey included ample opportunity for additional comments. fifty-eight staff members (over half of all library staff) completed the readings, activity, and survey. 
i analyzed the information to inform the design of subsequent phases of revisioning technical services. the dean of libraries’ direct reports then reviewed the design. in addition, many library staff contributed additional readings and links to library catalogs and other websites to add to the revisioning technical services staff webpage. throughout this phase, the organization was invited into the learning process through engagement with shared reference points, the ability to question the status quo, and the ability to embrace views of potential futures as well as of the present and the past.9 the careful selection of shared readings and activities created coherence among the staff in terms of thinking about the future, but these ideas also raised many questions about the concept of discovery and what route unlv libraries might take. the survey allowed library staff to better understand current practices in technical services, to prioritize new ideas against these practices, and to think about future options and their potential impact on their individual work as well as the collective work of the libraries. summit on discovery in the third phase of this process, “the discovery summit,” focus began to shift significantly from technical services as an organizational unit to the concept of discovery and what it means for the future of unlv libraries. during this half-day event, employing a facilitator from off campus, the dean of libraries and i designed a program to fulfill the following desired outcome: through a process of focused inquiry, observation, and discussion, participants will more fully understand the discovery experience of unlv libraries users. the event was open to all library staff members; however, individuals were required to rsvp and complete an activity before the day of the event. (the facilitator worked specifically with the technical services staff at a retreat designed to prepare for upcoming interviews for technical services director candidates.) participants were each sent a “summit matrix” (see appendix b) ahead of time, which asked them to look for specific pieces of information by doing the following: 1. search for the information requested with three discovery tools as your starting points: the libraries’ catalog, the libraries’ website, and a general internet search engine (like google). 2. for each discovery tool, rate the information that you were able to find in terms of “ease of discovery” on a scale of 1 (lowest ease—few results) to 5 (highest ease—best results). 3. document the thoughts and feelings you had and/ or process you went through in searching for this information. 4. answer this question: do you have other preferred starting points when looking for information that the libraries own or provide access to? the information that staff members were asked to search for using each discovery tool was mostly specific to the region of southern nevada, such as, “i heard that henderson (a city in southern nevada) started as a mining community. 
does unlv libraries have any books about that?” and “find any photograph of the gay 168 information technology and libraries | december 2009 pride parade in las vegas that you can look at in unlv libraries.” during the summit, the approximately sixty participants were asked to discuss their experiences searching for the matrix information, including any affective component to their experience, and they were asked to specify criteria for their definition of “ease of discovery.” next, we showed end-user usability video testing footage of a unlv professor, a human resources employee, and a unlv librarian going through similar discovery exercises. after each video, we discussed these users’ experiences—their successes, failures, and frustrations— and the fact that even our experts were unable to discover some of this information. finally, we facilitated a robust brainstorming session on initiatives we could undertake to improve the discovery experience of our users. [editor’s note: read more about this usability testing in “usability as a method for assessing discovery” on page 181 of this issue.] during the wrap-up of the discovery summit, the final phase of this initial process, the discovery miniconference was introduced. a call for proposals for library staff to introduce or otherwise present discovery concepts to other library staff was distributed. this call tied together the revisioning technical services process to date and also placed the focus on discovery to the libraries’ upcoming strategic planning process. this strategic planning process, outlining broad directions for the libraries to focus on for the next two years, would be the first time we would use our newly created evaluation framework. we focused on the concepts of discovery, access, and use, all tied together through an emphasis on the user. all library staff members were invited to submit a poster session or other visual display on various themes related to discovery of information to add to our collective and individual knowledge bases and to better understand our colleagues’ philosophies and positions on discovery. in addressing one of six mini-conference themes listed below, all drawn directly from the revisioning technical services survey results, potential participants were asked to consider the question, “what are your ideas for ways to improve how users find library resources?” n single search interface (federated searching, harvester-type platform, etc.) n open source vs. vendor infrastructure n information-seeking behavior of different users n social networking and web 2.0 features as related to discovery n describing primary sources and other unique materials for discovery n opening the library catalog for different record types and materials proposals could include any of these perspectives: n an environmental scan with a summary of what you learn n a visual representation of what you would consider improvement or success n a position for a specific approach or solution that you advocate ultimately, we had seventeen distinct projects involving twenty-four staff members for the afternoon miniconference. it was attended by approximately seventy additional staff members from unlv libraries as well as representatives from institutions who share our innovative system. we collected feedback on each project in written form and electronically after the mini-conference. miniconference content was documented on its own wiki pages and in this special issue of ital. 
during this phase of the revisioning technical services process, there was an emphasis on understanding our services from the customers’ point of view, a hallmark of a learning organization.10 during the discovery summit, we aimed to transform frustration and uncertainty over the user experience of the services we are providing into a motivation to embrace potential futures. the mini-conference utilized the discovery themes that had evolved throughout the revisioning technical services process to provide a cohesive framework for library staff members to share their knowledge and ideas about discovery systems and to question the status quo. n organizational ownership of discovery: strategic planning and beyond through the phases of the revisioning technical services process outlined above, it should be evident how the concept of discovery, highlighted during the process, moved from being focused on technical services to being owned by the entire organization. while the vocabulary of discovery had previously been owned by pockets of staff throughout unlv libraries, it has now become a common lexicon for all. the libraries’ evaluation framework, which includes discovery, had set the stage for our upcoming organizational strategic plan. just prior to the discovery summit, the dean of libraries’ direct reports group began to discuss how it would create a strategic plan for the 2009–11 biennium. it became increasingly apparent how important a focus on discovery would be in this process, and that we needed to time our planning right, allowing the organization and ourselves time to become familiar with the potential activities we might commit to in this area before locking into a strategic plan. “discovery” focus as impetus for organizational learning | fabbi 169 the dean’s direct reports group first spent time crafting a series of strategic directions to focus on in the two-year time period we were planning for. rather than give the organization specific activities to undertake, the strategic directions were meant to focus our new initiatives—and in a way to limit that activity to those that would move us past the status quo. of the sixteen directions, one stemmed directly from the organization’s focus on discovery: “improve discoverability of physical and electronic resources in empowering users to be self sufficient; work toward an interface and system architecture that incorporates our resources, internal and external, and allows the user to access them from their preferred starting point.” an additional direction also touched on the discovery concept: “monitor and adapt physical and virtual spaces to ensure they respond to and are informed by next-generation technologies, user expectations, and patterns in learning, social interactions, and research collaboration; encourage staff to experiment with, explore, and share innovative and creative applications of technology.” through their division directors and standing committees, all library staff members were subsequently given the opportunity to submit action items to the strategic plan within the framework of the strategic directions. the effort was made by the dean of libraries for this part of the process to coincide with the discovery mini-conference, a time when many library staff members were being exposed to a wide variety of potential activities that we might take as an organization in this area. 
one of the major action items that made it into the strategic plan was for the dean’s direct reports to charge an oversight task force with the investigation and recommendation of a systems or systems that would foster increased, unified discovery of library collections. the charge of this newly created discovery task force includes a set of guiding principles for the group in recommending a discovery solution that n creates a unified search interface for users pulling together information from the library catalog as well as other resources (e.g., journal articles, images, archival materials); n enhances discoverability of as broad a spectrum of library resources as possible; n is intuitive: minimizes the skills, time, and effort needed by our users to discover resources; n supports a high level of local customization (such as accommodating branding and usability considerations); n supports a high level of interoperability (easily connecting and exchanging data with other systems that are part of our information infrastructure); n demonstrates commitment to sustainability and future enhancements; and n is informed by preferred starting points of the user. in setting forth these guiding principles, the work of the discovery task force is informed by the organization’s discovery values, which have evolved over a year of organizational learning. in the timing of the strategic planning process and the emphasis of the plan, we made sure that the organization’s strategic development did not run ahead of organizational reality and also have worked to develop a culture that supports change and risk taking.11 the strategic discovery direction and pattern of organizational focus has been allowed to emerge throughout the organizational learning process. as evidenced in both the strategic plan directions and guiding principles laid out in the charge of the discovery task force, the organization has begun to absorb the basic philosophy that will guide appropriate objectives in this area and has focused more on this guiding philosophy than on the active pursuit of one right answer as it continues to learn. n conclusion using the theoretical lens of organizational learning, i have documented how unlv libraries’ emerging focus on the concept of discovery has provided an impetus for learning and change (see appendix a). our experience throughout this process supports the theory that organizational intelligence evolves over time and in reference to current operating norms.12 argyris and schön warn that a top-down approach to management focusing on control and clearly defined objectives encourages singleloop learning.13 had unlv libraries chosen a more management-oriented route at the beginning of this process, it most likely would have yielded an entirely different result. in this case, genuine organizational learning proved to be action based and ever-emerging, and while this is known to introduce some level of anxiety into an organization, the development of the ability to question, challenge, and potentially change operating norms has been worth the cost.14 i believe that while any single idea we have broached in the discovery arena may not be completely unique, it is the entire process of organizational learning that is significant and applicable to many information and technology-related areas of interest. references 1. 
karen calhoun, the changing nature of the catalog and its integration with other discovery tools (washington, d.c.: library of congress, 2006), http://www.loc.gov/catdir/calhoun-report-final.pdf (accessed aug. 12, 2009); bibliographic services task force, rethinking how we provide bibliographic services for the university of california (univ. of california libraries, 2005), http://libraries.universityofcalifornia.edu/sopag/bstf/final.pdf (accessed aug. 12, 2009).
2. gareth morgan, images of organization (thousand oaks, calif.: sage, 2006).
3. chris argyris and donald a. schön, organizational learning ii: theory, method, and practice (reading, mass.: addison wesley, 1996).
4. morgan, images of organization, 87.
5. morgan, images of organization, 87–97.
6. ibid.
7. robert l. bothmann and melissa holmberg, "strategic planning for electronic management," in electronic resource management in libraries: research and practice, ed. holly yu and scott breivold, 16–28 (hershey, pa.: information science reference, 2008).
8. karen coyle, "the library catalog: some possible futures," the journal of academic librarianship 33, no. 3 (2007): 414–16.
9. morgan, images of organization.
10. ibid.
11. ibid.
12. ibid.
13. argyris and schön, organizational learning ii.
14. morgan, images of organization.
appendix a. tracking unlv libraries' discovery focus across characteristics of organizational learning
scan and anticipate change in the wider environment to detect significant variations by
• embracing views of potential futures as well as of the present and the past (revisioning phase 1: technical services questions);
• understanding products and services from the customer's point of view (revisioning phase 3: summit); and
• using, embracing, and creating uncertainty as a resource for new patterns of development (revisioning phase 1: meeting; phase 3: summit).
develop an ability to question, challenge, and change operating norms and assumptions by
• challenging how they see and think about organizational reality using different templates and mental models (revisioning phase 2: survey);
• making sure strategic development does not run ahead of organizational reality (strategic planning process; discovery task force charge); and
• developing a culture that supports change and risk taking (strategic planning process).
allow an appropriate strategic direction and pattern of organization to emerge by
• developing a sense of vision, norms, values, limits, or "reference points" to guide behavior, including the ability to question the limits being imposed (revisioning phase 1: outcomes; phase 2: shared readings, activity; strategic planning process; discovery task force charge);
• absorbing the basic philosophy that will guide appropriate objectives and behaviors in any situation (strategic planning process, discovery task force charge); and
• placing as much importance on the selection of the limits to be placed on behavior as on the active pursuit of desired goals (strategic planning process, discovery task force charge).
please complete the following and bring to the summit on discovery—february 24:
1. search for the information requested in each row of the table below with three discovery tools as your starting points: the libraries catalog, the libraries website, and a general internet search engine (like google).
2.
for each discovery tool, rate the information that you were able to find in terms of “ease of discovery” on a scale of 1 (lowest ease) to 5 (highest ease). 3. document the thoughts and feelings you had and/ or process you went through in searching for this information in the space provided. 4. answer this question: do you have other preferred starting points when looking for information that the libraries own or provide access to? appendix b. summit matrix what am i looking for? libraries catalog libraries website google thoughts, etc., on what i discovered what’s all the fuss about frazier hall? why is it important? does unlv libraries have any documents about the history of the university that reference it? it’s black history month and my professor wants me to find an oral history about african americans in las vegas that is available in unlv libraries. i heard that henderson started as a mining community. does unlv libraries have any books about that? find any photograph of the gay pride parade in las vegas that you can look at in unlv libraries. 106 information technology and libraries | september 2009 michelle frisquepresident’s message michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, north western university, chicago. b y the time you read this column i will be lita president, however, as i write this i still have a couple of weeks left in my vice-presidential year. i have been warned by so many that my presidential year will fly by, and i am beginning to understand how that could be. i can’t believe i am almost done with my first year. i have enjoyed it and sometimes been overwhelmed by it—especially when i began the process of appointing lita volunteers to committees and liaison roles. i didn’t realize how many appointments there were to make. i want to thank all of the lita members who volunteered. you really helped make the appointment process easier. as a volunteer organization, lita relies on you, and once again many of you have stepped up. thank you. during the appointment process i was introduced to many lita members whom i had not yet met. i enjoyed being introduced to you virtually, and i look forward to meeting you in person in the coming year. i also want to thank the lita office. they were there whenever i needed them. without their assistance i would not have been able to successfully complete the appointment process. over the last year i have been working closely with this year’s lita emerging leaders, lisa thomas and holly tomren. i have really enjoyed the experience. their enthusiasm and energy is contagious. i wish every lita member could have been at this year’s lita camp in columbus, ohio, on may 8. during one of the lightning round sessions, lisa went to the podium and gave an impassioned speech about the benefits of belonging to a professional organization like lita. if there was a person in the audience that was not yet a lita member, i am sure they joined immediately afterward. she really captured the essence of why i became active in lita and why i continue to stay so involved in this organization so many years later. i can honestly say that as much as i have given to lita, i have received so much more in return. that is the true benefit of lita membership. over the last year, the lita board has had some great discussions with lita members and leaders. those conversations will continue as we start the work of drafting a new strategic plan. 
i want to create a strategic plan that will chart a meaningful path for the association and its members for the next several years. i want it to provide direction but also be flexible enough to adapt to changes in the information technology association landscape. as andrew pace mentioned in his last president’s message, changes will be coming. while we still aren’t sure exactly what those changes are, we know that it is time to seriously look at the current organizational structure of lita to make sure it best fits our needs today while continuing to remain flexible enough to meet our needs tomorrow. when i think of the organizational changes we are exploring, i can’t help but think of the houses i see on my favorite home improvement shows. lita has good bones. the structure and foundation are solid and well built, and as long as the house is well cared for, should last for years to come. however, like all houses, improvements need to be made over time to keep up with the market. the lita structure and foundation will be the same. when you drive up to the house you will still recognize the lita structure. when you walk in the door my hope is that you will still get that same homey feeling you had before, maybe with a few “oohs” and “aahs” thrown in as you notice the upgrades and enhancements. as the year progresses we will know more. i will use this column and other communication avenues to keep you informed of our plans and to gather your input. i would like to close my first column by thanking you for giving me this opportunity to serve you as the lita president. i am honored and humbled by the trust you have placed in me, and i am ready to start my presidential year. i hope it does not go by too quickly. i want to savor the experience. now let’s get started! editor’s comments bob gerrity information technology and libraries | september 2012 1 g’day, mates, and welcome to our third open-access issue. ital takes on an additional international dimension with this issue, as your faithful editor has taken up residence down under, in sunny queensland, australia. the recent ala annual meeting in anaheim marked some changes to the ital editorial board that i’d like to highlight. cynthia porter and judith carter are ending their tenure with ital after many years of service. cynthia is featured in this month’s editorial board thoughts column, offering her perspective on library technology past and present. judith carter ends a long run with ital as managing editor, and i thank her for her years of dedicated service. ed tallent, director of levin library at curry college, is the incoming managing editor. we also welcome two new members of the editorial board: brad eden, the dean of library services and professor of library science at valparaiso university, and jerome yavarkovsky, former university librarian at boston college, and the 2004 recipient of ala’s hugh c. atkinson award. jerome currently co-chairs the library technology working group at the mediagrid immersive education initiative. we cover a broad range of topics in this issue. ian chan, pearl ly, and yvonne meulemans describe the implementation of the open-source instant messaging (im) network openfire at california state university san marcos, in supporting of the integration of chat reference and internal library communications. richard gartner explores the use of the metadata encoding and transmission standard (mets) as an alternative to the fedora content model (fcm) for an “intermediary” digital-library schema. 
emily morton and karen hanson present an innovative approach to creating a management dashboard of key library statistics. kate pittsley and sara memmott describe navigational improvements made to libguides at eastern michigan university. bojana surla reports on the development of a platform-independent, java-based marc editor. yongming wang and trevor dawes delve into the need for next-generation integrated library systems and early initiatives in that space. melanie schlosser and brian stamper begin to explore the effects of reposting library digital collections on flickr. in addition to the compelling new content in this issue of ital, we have compelling old content from the print archive of ital and its predecessor, journal of library automation (jola), that will soon be available online, thanks in large to the work of andy boze and colleagues at the university of notre dame. scans of all of the back issues have now been deposited onto the server that currently hosts ital, and will be processed and published online over the coming months. bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, st. lucia, queensland, australia. learning to share: measuring use of a digitized collection on flickr and in the ir melanie schlosser and brian stamper information technology and libraries | september 2012 85 abstract there is very little public data on usage of digitized library collections. new methods for promoting and sharing digitized collections are created all the time, but very little investigation has been done on the effect of those efforts on usage of the collections on library websites. this study attempts to measure the effects of reposting a collection on flickr on use of the collection in a library-run institutional repository (ir). the results are inconclusive, but the paper provides background on the topic and guidance for future efforts. introduction inspired by the need to provide relevant resources and make wise use of limited budgets, many libraries measure the use of their collections. from circulation counts and in-library use studies of print materials, to increasingly sophisticated analyses of usage of licensed digital resources, the techniques have changed even as the need for the data has grown. new technologies have simultaneously presented challenges to measuring use, and allowed those measurements to become more accurate and more relevant. in spite of the relative newness of the digital era, “librarians already know considerably more about digital library use than they did about traditional library use in the print environment.”1 arl’s libqual+,2 one of the most widelyadopted tools for measuring users’ perceptions of service quality, has recently been joined by digiqual and mines for libraries. these new statsqual tools3 extend the familiar libqual focus on users into the digital environment. there are tools and studies for seemingly every type of licensed digital content, all with an eye toward better understanding their users and making better-informed collection management decisions. those same tools and studies for measuring use of library-created digital collections are conspicuous in their absence. almost two decades into library collection digitization programs, there is not a significant body of literature on measuring use of digitized collections. a number of articles have been written about measuring usage of library websites in general; arendt and wagner4 is a recent example. 
in one of the few studies to specifically measure use of a digitized collection, herold5 uses google analytics to uncover the geographical location of users of a digitized archival image collection. otherwise, a literature search on usage studies uncovers very little. less formal communication channels are similarly quiet, and public usage data on digitized collections on library sites is virtually nonexistent. commercial sites for disseminating and sharing melanie schlosser (schlosser.40@osu.edu) is digital publishing librarian and brian stamper (stamper.10@osu.edu) is administrative associate, the ohio state university libraries, columbus, ohio. mailto:schlosser.40@osu.edu mailto:stamper.10@osu.edu information technology and libraries | september 2012 86 digital media frequently display simple use metrics (image views, for example, or file downloads) alongside content; such features do not appear on digitized collections on library sites. usage and digitization projects digitized library collections are created with an eye toward use from their early planning stages. an influential early clir publication on selecting collections for digitization written by a harvard task force6 included current and potential use of the analog and digitized collection as a criterion for selection. the factors to be considered include the quantitative (“how much is the collection used?”) and the qualitative (“what is the nature of the use?”). more than ten years later, ooghe and moreels7 find that use is still a criterion for selection of collections to digitize, tied closely to the value of the collection. facilitating discovery and use of the digitized collection is a major consideration during project development. payette and rieger8 is an early example of a study of the needs of users in digital library design. usability testing of the interface is frequently a component of site design; see jeng9 for a good overview of usability testing in the digital library environment. increasing usage of the digitized collection is also a major theme in metadata research and development. standards such as the open archives initiative’s protocol for metadata harvesting10 and object reuse and exchange11 are meant to allow discovery and reuse of objects in a variety of environments, and the linked data movement promises to make library data even more relevant and reusable in the world wide web environment.12 digital collection managers have also found more radical methods of increasing usage of their collections. inserting references into relevant wikipedia articles has become a popular way to drive more users to the library’s site.13 some librarians have taken the idea a step further and have begun reposting their digital content on third-party sites. the smithsonian pioneered one reposting strategy in 2008 when they partnered with flickr, the popular photo-sharing site, to launch flickr commons.14 the commons is a walled garden within flickr that contains copyrightfree images held by cultural heritage institutions such as libraries, archives, and museums. each partner institution has its own branded space “photostream” in flickr parlance organized into collections and sets. this model aggregates content from different organizations and locates it where users already are, but it still maintains the traditional institution/collection structure. flickr commons has been, by all measures, a very successful experiment in sharing collections with users. 
the smithsonian,15 the library of congress,16 the alcuin society,17 and the london school of economics18 have all written about their experiences with the commons. stephens19 and michel and tzoc20 give advice on how libraries can work with flickr, and garvin21 and vaughan22 take a broad view of the project and the partners. another sharing strategy is beginning to emerge, where digital collection curators contribute individual or small groups of images to thematic websites. a recent example is pets in collections,23 a whimsical tumblr photo blog created by the digital collections librarian at bryn mawr college. learning to share: measuring use of a digitized collection on flickr and in the ir| schlosser and stamper 87 the site’s description states, “come on if you work in a library, archive, or museum, you know you’ve seen at least one of these a seemingly random image of that important person and his dog or a man and a monkey wearing overalls … so now you finally have a place to share them with the world!” the site requires submissions to include only the image and a link back to the institution or repository that houses it, although submitters may include more information if they choose. although more lighthearted than most traditional library image collections, it still performs the desired function of introducing users to digital collections they may never have encountered otherwise. clearly, these creative and thoughtful strategies are not dreamed up by digital librarians unconcerned with end use of their collections, so why do stewards of digitized collections so rarely collect, or at least publicly discuss, statistics on the use of their content? the one notable exception to this may shed some light on the matter. institutional repositories (irs) have been the one area of non-licensed digital library content where usage statistics are frequently collected and publicized. dspace,24 the widely-adopted ir platform developed by mit and hewlett-packard, has increasingly sophisticated tools for tracking and sharing use of the content it hosts. digital commons,25 the hosted ir solution created by bepress, provides automated monthly download reports for scholars who use it to archive their content. the development of these features has been driven by the need to communicate value to faculty and administrators. encouraging participation by faculty has been a major focus of ir managers since the initial ‘build it and they will come’ optimism faded and the challenge of adding another task to already busy faculty schedules became clear.26 having a clear need (outreach) and a defined audience (participating scholars) has led to a thriving program of usage tracking in the ir community. the lack of an obvious constituency and the absence of pointed questions about use in the digitized collections world have, one suspects, led to the current dearth of measurement tools and initiatives. still, questions about use do arise, particularly when libraries undertake laborintensive usability studies or venture into the somewhat controversial landscape of sharing library-created digital objects on third party sites.27 anecdotally, the thought of sharing library content elsewhere on the web also raises concerns about loss of context and control, as well as a fear of ‘dilution’ of the library’s web presence. 
"if patrons can use the library's collections on other sites," a fellow librarian once exclaimed, "they won't come to the library's website anymore!" without usage data, we cannot adequately answer questions about the value of our projects or the way they impact other library services.

justification for study and research questions

there were three major motivations for this project. first, inspired by the success of the flickr commons project, we wanted to explore a method for sharing our collections more widely. an image collection and a third-party image-sharing platform were an obvious choice, since image display is not a strength of our dspace-based repository. flickr is currently a major presence in the image-sharing landscape, and the existence of the commons was an added incentive for choosing flickr as our platform. second, the collection we selected for the project (described more fully below) is not fully described, and we wanted to take advantage of flickr's annotation tools to allow user-generated metadata. since further description of the images would have required an unusual depth of expertise, we were not optimistic that we would receive much useful data, and in fact we did not. still, we lost nothing by asking, and gained familiarity with flickr's capabilities for metadata capture. the final motivation for the project, and the focus of the study, was the desire to investigate the effect of third-party platform sharing of a local collection on usage of that collection on library sites. the data gathered were meant partly to inform our local practice, but also to address a concern that may hold librarians back from exploring such means of increasing collection usage: the fear that doing so will divert traffic from library sites. we suspected that sharing collections more widely would actually increase usage of the items on library-owned sites, and the study was developed to explore the issue in a rigorous way. the research question for this study was: does reposting digitized images from a library site to a third-party image sharing site have an effect on usage of the images on the library site?

about the study

platforms

for the study, the images were submitted to two different platforms: the knowledge bank (kb),28 a library-managed repository, and flickr, a commercial image sharing site. the kb is an institutional repository built on dspace software with a manakin (xml-based) user interface. established in 2005, it holds more than 45,000 items, including faculty and student research, gray literature, institutional records, and digitized library collections. image collections like the one used in this study make up a small percentage of the items in the repository. in the kb's organizational structure, the images in the study were submitted as a collection in the library's community, under a sub-community for the special collection that contributed them. each image was submitted as an item consisting of one image file and dublin core metadata.29 the project originally called for submitting the images to flickr commons, but the commons was not accepting new partners during the study period. instead, we created a standard flickr pro account for the libraries, while following the commons guidelines in image rights and settings. in contrast to dspace's community/sub-community/collection structure, flickr images are organized in sets, sets belong to collections, and all images make up the account owner's photostream.
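the set-and-photostream structure just described can be populated programmatically. the sketch below is one plausible way to script the kind of reposting described in the next section, using the third-party flickrapi package; the api key, file path, and item list are placeholders (only the handle url is taken from this article), and response parsing may differ by library version. it is an assumption-laden illustration, not the workflow the authors actually used.

```python
# hypothetical reposting sketch using the third-party flickrapi package
# (pip install flickrapi). credentials and file paths are placeholders.
import flickrapi

API_KEY, API_SECRET = "your-key", "your-secret"   # placeholders
flickr = flickrapi.FlickrAPI(API_KEY, API_SECRET)
flickr.authenticate_via_browser(perms="write")    # one-time oauth step

# each record pairs an image file with a title and a link back to the
# repository item; only the handle url below appears in this article.
items = [
    {"path": "images/devils_auction.jpg",
     "title": "the devil's auction",
     "kb_url": "http://hdl.handle.net/1811/47633"},
]

photoset_id = None
for item in items:
    resp = flickr.upload(filename=item["path"], title=item["title"],
                         description=item["kb_url"], is_public=1)
    photo_id = resp.findtext("photoid")           # adjust to your flickrapi version
    if photoset_id is None:
        created = flickr.photosets.create(title="sample set",
                                          primary_photo_id=photo_id)
        photoset_id = created.find("photoset").get("id")
    else:
        flickr.photosets.addPhoto(photoset_id=photoset_id, photo_id=photo_id)
```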
a set was created for the images, with accompanying text giving background information and inviting users to contribute to the description of the images.30 the images were accompanied by the same metadata as the items in the kb, but the files themselves were higher resolution, to take advantage of flickr’s ability to display a range of sizes for each image. all items in the set were publicly learning to share: measuring use of a digitized collection on flickr and in the ir| schlosser and stamper 89 available for viewing, commenting, and tagging, and each image was accompanied by links back to the kb at the item, collection, and repository level. the collection the choice of a collection for the study was limited by a number of factors. first, and most obviously, it needed to be an image collection. second, it needed to be in the public domain, both to allow our digitization and distribution of the images, and also to satisfy flickr commons’ “no known copyright restrictions” requirement.31 this could be accomplished either by choosing a collection whose copyright protections had expired, or by removing restrictions from a collection to which the libraries owned the rights. third, the curator of the collection needed to be willing and able to post the images on a commercial site. this required not only an open-minded curator, but also a collection without a restrictive donor agreement or items containing sensitive or private information. finally, we wanted the collection to be of broad public interest. the collection chosen for the study was a set of 163 photographs from osu’s charles h. mccaghy collection of exotic dance from burlesque to clubs, held by the jerome lawrence and robert e. lee theatre research institute.32 the photographs, mainly images of burlesque dancers, were published on cabinet and tobacco cards in the 1890s, putting them solidly in the public domain. figure 1. "the devil's auction," j. gurney & son (studio). http://hdl.handle.net/1811/47633 (kb), http://www.flickr.com/photos/60966199@n08/5588351865/ (flickr) http://hdl.handle.net/1811/47633 learning to share: measuring use of a digitized collection on flickr and in the ir| schlosser and stamper 87 methodology phases the study took place in 2011 and was organized in three ten-week phases. for the first phase (january 31 through april 11), the images were submitted to the kb. the purpose of this phase was to provide a baseline level of usage for the images in the repository. in phase two (april 12 through june 20), half of the images were randomly selected and submitted to flickr (group a). the purpose of this phase was to determine what effect reposting would have on usage of items in the repository both on those images that were reposted, and on other images in the same collection that had not been reposted. in phase three (june 21 through august 29), the rest of the images (group b) were submitted to flickr. in this phase, we began publicizing the collection. publicity consisted of sharing links to the collection on social media and sending emails to scholars in relevant fields via email lists. these efforts led to further downstream publicity on popular and scholarly blogs.33 data collection the unit of measurement for the study was views of individual images. to understand the notion of a “view,” we must contrast two different ways that an image may be viewed in the knowledge bank. each image in the collection has an individual web page (the item page) where it is presented along with metadata describing it. 
in addition, from that page a visitor may download and save the image file itself (in this collection, a jpeg). in the former case, the image is an element in a web page, while in the latter it is an image file independent of its web context. search engines and other sources commonly link directly to such files, so it is not unusual for a visitor to download a file without ever having seen it in context. in light of this, we produced two data sets, one for visits to item pages, and another for file downloads. depending on one's interpretation, either could be construed as a "view." ultimately there was little distinction in usage patterns between the two types of measure. the data were generated by making use of dspace's apache solr-based statistics system, which provides a queryable database of usage events. for each item in the study, we made two queries: one for per-day counts of item page views, and another for per-day counts of image file downloads (called "bitstream" downloads in dspace parlance). in both cases, views that came from automated sources such as search engine indexing agents were excluded from our counts. views of the images in flickr were noted and used as a benchmark, but were not the focus of the study. unlike cumulative views, which are tabulated and saved indefinitely, flickr saves daily view numbers for only thirty days. as a result, daily view numbers for most of the study period were not available for analysis, and the discussion of the trends in the flickr data is necessarily anecdotal.

results

at the end of the study period, the data showed very little usage of the collection in the repository. this lack of usage was relatively consistent through the three phases of the study, and in rough terms translates to less than one view of each item per day. there was little distinction between the two ways of measuring an image "view": counting views of the web page where the item can be found, and counting how many times the image file was downloaded. knowledge bank item pages received between 5 and 38 views per item, while files were downloaded between 5 and 34 times. further, there were no significant differences in number of views received between the first group released to flickr and the second.

table 1. the items in the study are divided into group a and group b, depending on when the images were placed on flickr. this table shows that both groups received similar traffic over the course of the study, with items having between 5 and 38 views in both groups, with a median of 10 for both, and between 4 and 34 downloads, with a median of 9 for both groups.

                                                  kb item page views     image file downloads
                                                  min   median   max     min   median   max
group a (images released to flickr in phase ii)    5     10      35       5      9      25
group b (images released to flickr in phase iii)   6     10      38       4      9      34

the items attracted more visitors on flickr, with the images receiving between 100 and 600 views each. with a few exceptions, the items that appeared towards the beginning of the set (as viewed by a user who starts from the set home page) received more views than items towards its end. this suggests a particular usage pattern: start at the beginning, browse through a certain number of images, and navigate away. a more significant trend in the flickr data is that most views of the images came after publicity for the collection began (approximately midway through the third phase of the study). (a rough sketch of the kind of per-day solr query described in the data-collection section follows.)
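the sketch below runs a date-faceted query against a dspace solr statistics core to produce per-day counts of item page views or bitstream downloads while excluding known robots. the core url and field names (type, id, owningItem, isBot, statistics_type) follow common dspace conventions but vary by version; they are assumptions, not the exact queries used in this study.

```python
# sketch of a per-day usage query against a dspace solr statistics core.
# the core url and field names are assumptions based on common dspace
# configurations; they are not taken from the article.
import requests

SOLR = "http://localhost:8080/solr/statistics/select"   # placeholder

def daily_counts(item_id, start="2011-01-31T00:00:00Z",
                 end="2011-08-30T00:00:00Z", bitstreams=False):
    """return {day: count} of item-page views (type:2) or file downloads (type:0)."""
    fq = ["isBot:false", "statistics_type:view"]
    if bitstreams:
        fq += ["type:0", f"owningItem:{item_id}"]   # bitstream (file) downloads
    else:
        fq += ["type:2", f"id:{item_id}"]           # item-page views
    params = {
        "q": "*:*", "rows": 0, "wt": "json", "fq": fq,
        "facet": "true", "facet.range": "time",
        "facet.range.start": start, "facet.range.end": end,
        "facet.range.gap": "+1DAY",
    }
    data = requests.get(SOLR, params=params).json()
    buckets = data["facet_counts"]["facet_ranges"]["time"]["counts"]
    return dict(zip(buckets[::2], buckets[1::2]))   # solr returns [day, n, day, n, ...]
```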
again, the lack of daily usage numbers on flickr makes it impossible to demonstrate the publicity ‘bump,’ but it was dramatic. we witnessed a similar, if smaller, ‘bump’ in usage of the items in the kb after publicity started. we were also able to identify 65 unique visitors to the kb who came to the site via a link on flickr, out of 449 unique visitors overall. of those who came to the kb from flickr, 31 continued on to other parts of the kb, and the rest left after viewing a single item or image. learning to share: measuring use of a digitized collection on flickr and in the ir| schlosser and stamper 89 discussion with so little data, we cannot reliably answer the primary research question. reposting certainly does not seem to have lowered usage of the items in the kb, but the numbers of views in all phases were so small as to preclude drawing meaningful conclusions. a larger issue is the fact that much of the usage came immediately following our promotional efforts. this development complicated the research in a number of ways. first, because the promotional emails and social media messages specifically pointed users to the collection in flickr, it is impossible to know how the use may have differed if the primary link in the promotion had been to the knowledge bank. would the higher use seen on flickr simply have transferred to the kb? would the unfamiliarity and non-image-centric interface of the knowledge bank have thwarted casual users in their attempt to browse the collection? the centrality of the promotion efforts also suggests that one of the underlying assumptions of the study may have been wrong. this research project was premised on the idea that an openly available collection on a library website will attract a certain number of visitors (number dependent on the popularity and topicality of the subject of the collection) who find the content spontaneously via searching and browsing. placing that same content on a third-party site could theoretically divert a percentage of those users, who would then never visit the library’s site. the percentage of users diverted would likely depend on how many more users browse the third party site than the library site, as well as the relative position of the two in search rankings. the mccaghy collection should have been a good candidate for this type of use pattern. flickr is certainly heavily used and browsed, and burlesque, while not currently making headlines, is a subject with fairly broad popular appeal. the fact that users did not spontaneously discover the collection on either platform in significant numbers suggests that this may not be how discovery of library digitized collections works. it is not surprising that email lists and social media should drive larger numbers of users to a collection than happenstance the power of link curation by trusted friends via informal communication channels is well known. what is surprising is that it was the only significant use pattern in evidence. the primary takeaway is that promotion is key. if we do not promote our collections to the people who are likely to be interested in them, barring a stroke of luck, it is unlikely that they will be found. anecdotally, promotional efforts are often an afterthought in digital collections work a pleasant but unnecessary ‘extra.’ in our environment, the repository staff often feel that promotion is the work of the collection owner, who may not think of promoting the collection in the digital environment, nor know how to do so. 
as a result, users who would benefit from the collections simply do not know they exist. these results also suggest that librarians worried about the consequences of sharing their collections on third party sites may be worrying about the wrong thing. the sheer volume of information on any given topic makes it unlikely that any but the most dedicated researcher will information technology and libraries | september 2012 90 explore all available sources. most other users are likely to rely on trusted information sources (traditional media, blogs, social networking sites) to steer them towards the items that are most likely to interest them. instead of wondering if users will still come to the library’s site if the content is available elsewhere, perhaps we should be asking of our digital collections, “is anyone using them on any site?” and if the answer is no, the owners and caretakers of those collections should explore ways to bring them to the attention of relevant audiences. conclusion as a usage study of a collection hosted on a library site and a commercial site, this project was not a success. flawed assumptions and a lack of usable data resulted in an inability to address the primary research question in a meaningful way. however, it does shed light on the questions that motivated it. are our digitized collections being used? what effect do current methods of sharing and promotion have on that use? librarians working with digitized collections have fallen behind our colleagues in the print and institutional repository arenas in measuring use of collections, but we have the same needs for usage data. in the current climate of heightened accountability in higher education and publicly funded institutions, we need to demonstrate the value of what we do. we need to know when our efforts to promote our collections are working, and determine which projects have been most successful and merit continued development. and as always, we need to share our results, both formally and informally, with our colleagues. measuring use of digital resources is challenging, and obtaining accurate usage statistics requires not only familiarity with the tools involved, but also some understanding of the ways in which the numbers can be unrepresentative of actual use. the organizations that do collect usage statistics on their digitized collections should share their methods and their results with others to help foster an environment where such data are collected and used. next steps in this area could take the shape of further research projects, or simply more visible work collecting usage statistics on digital collections. of greatest utility to the field would be data demonstrating the relative effectiveness of different methods of increasing use. do labor-intensive usability studies deliver returns in the form of increased use of the finished site? which forms of reposting generate the most views? what types of publicity are most effective in bringing users to collections? how does use of a collection change over time? there are also more policy-driven questions to be answered. for example, should further investment in a collection or site be tied to increasing use of low-traffic collections, or capitalizing on success? differences in topic, format, and audience make it difficult to generalize in this area, but we can begin building a body of knowledge that helps us learn from each other’s successes and failures. 
references

1. brinley franklin, martha kyrillidou, and terry plum, "from usage to user: library metrics and expectations for the evaluation of digital libraries," in evaluation of digital libraries: an insight into useful applications and methods, ed. giannis tsakonas and christos papatheodorou (oxford: chandos publishing, 2009), 17-39, http://www.libqual.org/publications (accessed february 29, 2012).
2. "libqual+," accessed february 29, 2012, http://www.libqual.org/home.
3. "statsqual," accessed february 29, 2012, http://www.digiqual.org/.
4. julie arendt and cassie wagner, "beyond description: converting web site usage statistics into concrete site improvement ideas," journal of web librarianship 4, no. 1 (2010): 37-54.
5. irene m. h. herold, "digital archival image collections: who are the users?" behavioral & social sciences librarian 29, no. 4 (2010): 267-282.
6. dan hazen, jeffrey horrell, and jan merrill-oldham, selecting research collections for digitization (council on library and information resources, 1998), http://www.clir.org/pubs/reports/hazen/pub74.html (accessed february 29, 2012).
7. bart ooghe and dries moreels, "analysing selection for digitisation: current practices and common incentives," d-lib magazine 15, no. 9 (2009): 28, http://www.dlib.org/dlib/september09/ooghe/09ooghe.html.
8. sandra d. payette and oya y. rieger, "supporting scholarly inquiry: incorporating users in the design of the digital library," the journal of academic librarianship 24, no. 2 (1998): 121-129.
9. judy jeng, "what is usability in the context of the digital library and how can it be measured?" information technology & libraries 24, no. 2 (2005): 47-56.
10. "open archives initiative protocol for metadata harvesting," accessed february 29, 2012, http://www.openarchives.org/pmh/.
11. "open archives initiative object reuse and exchange," accessed february 29, 2012, http://www.openarchives.org/ore/.
12. eric miller and micheline westfall, "linked data and libraries," serials librarian 60, no. 1-4 (2011): 17-22.
13. ann m. lally and carolyn e. dunford, "using wikipedia to extend digital collections," d-lib magazine 13, no. 5/6 (2007), accessed february 29, 2012, doi:10.1045/may2007-lally.
14. "flickr: the commons," accessed february 29, 2012, http://www.flickr.com/commons/.
15. martin kalfatovic, effie kapsalis, katherine spiess, anne camp, and michael edson, "smithsonian team flickr: a library, archives, and museums collaboration in web 2.0 space," archival science 8, no. 4 (2008): 267-277.
16. josh hadro, "lc report positive on flickr pilot," library journal 134, no. 1 (2009): 23.
17. jeremiah saunders, "flickr as a digital image collection host: a case study of the alcuin society," collection management 33, no. 4 (2008): 302-309, doi:10.1080/01462670802360387.
18. victoria carolan and anna towlson, "a history in pictures: lse archives on flickr," aliss quarterly 6 (2011): 16-18.
19. michael stephens, "flickr," library technology reports 42, no. 4 (2006): 58-62.
20. jason paul michel and elias tzoc, "automated bulk uploading of images and metadata to flickr," journal of web librarianship 4, no. 4 (2010): 435-448.
21. peggy garvin, "photostreams to the people," searcher 17, no. 8 (2009): 45-49.
22. jason vaughan, "insights into the commons on flickr," portal: libraries & the academy 10, no. 2 (2010): 185-214.
23. "pets-in-collections," accessed february 29, 2012, http://petsincollections.tumblr.com/.
24. "dspace," accessed february 29, 2012, http://www.dspace.org/.
25. "digital commons," accessed february 29, 2012, http://digitalcommons.bepress.com/.
26. dorothea salo, "innkeeper at the roach motel," library trends 57, no. 2 (2008): 98-123.
27. for an example of the type of debate that tends to surround projects like flickr commons, see http://www.foundhistory.org/2008/12/22/tragedy-at-the-commons/ (accessed february 29, 2012).
28. "the knowledge bank," accessed february 29, 2012, http://kb.osu.edu.
29. "charles h. mccaghy collection of exotic dance from burlesque to clubs," accessed february 29, 2012, http://hdl.handle.net/1811/47556.
30. "charles h. mccaghy collection of exotic dance from burlesque to clubs," accessed february 29, 2012, http://flic.kr/s/ahsjua3bgi.
31. "flickr: the commons (usage)," accessed february 29, 2012, http://www.flickr.com/commons/usage/.
32. "the jerome lawrence and robert e. lee theatre research institute," http://library.osu.edu/find/collections/theatre-research-institute/; "charles h. mccaghy collection of exotic dance from burlesque to clubs," http://library.osu.edu/find/collections/theatre-research-institute/personal-papers-and-special-collections/charles-h-mccaghy-collection-of-exotic-dance-from-burlesque-to-clubs/; "loose women in tights digital exhibit," http://library.osu.edu/find/collections/theatre-research-institute/digital-exhibits-projects/loose-women-in-tights-digital-exhibit/; all accessed february 29, 2012.
33. for an example of the kind of coverage it received, see http://flavorwire.com/195225/fascinating-photos-of-19th-century-vaudeville-and-burlesque-performers (accessed february 29, 2012).

nancy m. foasberg

adoption of e-book readers among college students: a survey

academic libraries need to understand whether and how students are using e-book readers in order to respond appropriately.
as new media formats emerge, libraries must avoid both extremes: uncritical, hype-driven adoption of new formats and irrational attachment to the status quo. ■■ research context recently introduced e-reader brands have attracted so much attention that it is sometimes difficult to remember that those currently on the market are not the first generation of such devices. the first generation was introduced, to little fanfare, in the 1990s. devices such as the softbook and the rocket e-book reader are well documented in the literature, but were unsuccessful in the market.1 the most recent wave of e-readers began with the sony reader in 2006 and amazon’s kindle in 2007, and thus far is enjoying more success. barnes and noble and borders have entered the market with the nook and the kobo, respectively, and apple has introduced the ipad, a multifunction device that works well as an e-reader. amazon claims that e-book sales for the kindle have outstripped their hardcover book sales.2 these numbers may reflect price differences, enthusiasm on the part of early adopters, marketing efforts on the parts of these particular companies, or a lack of other options for e-reader users because the devices are designed to be compatible primarily with the offerings of the companies who sell them. nevertheless, they certainly indicate a rise in the consumption of e-books by the public, as the dramatic increase in wholesale e-book sales bears out.3 in the meantime, sales of the devices increased nearly 80 percent in 2010.4 with this flurry of activity have come predictions that e-readers will replace print eventually, perhaps even within the next few years.5 books have been published with such bold titles as print is dead.6 however, despite the excitement, e-readers are still a niche market. according to the 2010 pew internet and american life survey, 5 percent of americans own e-book readers. those who do skew heavily to the wealthy and well-educated, with 12 percent having an annual household income of $75,000 or more and 9 percent of college graduates owning an electronic book reader. this suggests that e-book readers are still a luxury item to many.7 to academic librarians, it is especially important to know whether e-readers are being adopted by college students and whether they can be adapted for academic use. e-readers’ virtues, including their light weight, their ability to hold many books at the same time, and the speed with which materials can be delivered, could make them very attractive to students. however, they have many limitations for academic work. most do not provide the ability to copy and paste into another document, have to learn whether e-book readers have become widely popular among college students, this study surveys students at one large, urban, four-year public college. the survey asked whether the students owned e-book readers and if so, how often they used them and for what purposes. thus far, uptake is slow; a very small proportion of students use e-readers. these students use them primarily for leisure reading and continue to rely on print for much of their reading. students reported that price is the greatest barrier to e-reader adoption and had little interest in borrowing e-reader compatible e-books from the library. p ortable e-book readers, including the amazon kindle, barnes and noble nook, and the sony reader, free e-books from the constraints of the computer screen. 
although such devices have existed for a long time, only recently have they achieved some degree of popularity. as these devices become more commonplace, they could signal important changes for libraries, which currently purchase and loan books according to the rights and affordances associated with print books. however, these changes will only come about if e-book readers become dominant. for academic libraries, the population of interest is college students. their use of reading formats drives collection development practices, and any need to adjust to e-readers depends on whether students adopt them. thus, it is important to research the present state of students’ interest in e-readers. do they own e-readers? do they wish to purchase one? if they do own them, do they use them often and regard them suitable for academic work? the present study surveys students at queens college, part of the city university of new york, to gather information about their attitudes toward and ownership of e-books and e-book readers. because only queens college students were surveyed, it is not possible to draw conclusions about college students in general. however, the data do provide a snapshot of a diverse student body in a large, urban, four-year public college setting. the goal of the survey was to learn whether students own and use e-book readers, and if so, how they use them. in the midst of enthusiasm for the format by publishers, librarians and early adopters, it is important to consult the students themselves, whose preferences and reading habits are at stake. it is also vital for academic libraries to nancy m. foasberg (nfoasberg@qc.cuny.edu) is humanities librarian, queens college, city university of new york, flushing, new york. adoption of e-book readers among college students: a survey | foasberg 109 foundation survey, internet and american life, found that e-readers were luxury items owned by the well educated and well off. in the survey, 5 percent of respondents reported owning an e-reader.12 in the ecar study of undergraduate students and information technology, 3.1 percent of undergraduate college students reported owning an e-book reader, suggesting that college students are adopting the devices at a slower rate than the general population.13 commercial market research companies, including harris interactive and the book industry study group, also have collected data on e-book adoption. the harris interactive poll found that 8 percent of their respondents owned e-readers, and that those who did claimed that they read more since acquiring it. however, as a weighted online poll with no available measure of sampling error, these results should be considered with caution.14 the book industry study group survey, although it was sponsored by several publishers and e-reader manufacturers, appears to use a more robust method. this survey, consumer attitudes toward e-book reading, was conducted in three parts in 2009 and 2010. kelly gallagher, who was responsible for the group that conducted the study, remarks that “we are still in very early days on e-books in all aspects—technology and adoption.” although the size of the market has increased dramatically, the survey found that almost half of all e-readers are acquired as a gift and that half of all e-books “purchased” are actually free. however, among those who used e-books, about half said they mostly or exclusively purchased e-books rather than print. 
the e-books purchased are mostly fiction (75 percent); textbooks comprised only 11 percent of e-book purchases.15 much of the literature on e-book readers consists of user studies, which provide useful information about how readers might interact with the devices once they have them in hand but provide no information about whether students are likely to use them of their own volition. however, these studies are of interest because they hint at reasons that students may or may not find e-readers useful, important information for predicting the future of e-books. user studies have covered small devices, such as pdas (personal data assistants);16 first-generation e-readers, such as the rocket ebook;17 and more recent e-book readers.18 the results of many recent e-reader user studies have been very similar to studies on the usability of the first generation of e-book readers: the devices offer advantages in portability and convenience but lack good note-taking features and provide little support for nonlinear navigation. amazon sponsored large-scale research on academic uses of e-book readers at universities, such as princeton, case western reserve university, and the university of virginia,19 while other universities, such as northwest missouri state university,20 carried out their own projects limited note-taking capabilities, and rely on navigation strategies that are most effective for linear reading. the format also presents many difficulties regarding library lending. many publishers rely on various forms of drm (digital rights management) software to protect copyrighted materials. this software often prevents e-books from being compatible with more than one type of e-book reader. indeed, because e-book collections in academic libraries predate the emergence of e-book readers, many libraries now own or subscribe to large e-book collections that are not compatible with the majority of these devices. furthermore, publishers and manufacturers have been hesitant to establish lending models for their books. amazon recently announced that they would allow users to lend a book once for a period of fourteen days, if the publisher gave permission.8 this very cautious and limited approach speaks volumes about publishers’ fears regarding user sharing of e-books. several libraries have developed programs for lending the devices,9 but there is no real model for lending e-books to users who already own e-readers. a service called overdrive also provides downloadable collections, primarily of popular fiction, that can be accessed in this manner. however, the collections are small and are not compatible with all devices, including the most popular, the kindle. in the united kingdom, the publisher’s association has provided guidelines under which libraries can lend e-books, which include a requirement that the user physically visit the library to download the e-book.10 clearly, we do not currently have anything resembling a true library lending model for e-reader compatible e-books, especially not one that takes advantage of the format’s strengths. despite the challenges, it is clear that if e-book readers are enthusiastically adopted by students, libraries will need to find a way to offer materials compatible with them. 
as buczynski puts it, “libraries need to be in play at this critical juncture lest they be left out or sidelined in the emerging e-book marketplace.”11 however, because the costs of participating are likely to be substantial, it is very important to discover whether students are indeed adopting the hardware. few studies have focused on spontaneous student adoption of the devices, although several mention that when students were introduced to e-readers, they appeared to be unfamiliar with the devices and regard them as a novelty. however, e-readers have become more prevalent since many of these studies were conducted. thus this study surveys students to find their attitudes toward e-book readers. ■■ literature review only a few studies have attempted to quantify the popularity of e-readers. as mentioned above, the 2010 pew 110 information technology and libraries | september 2011 their first encounter with an e-book reader.”34 while this is mere anecdote, it, along with the survey results noted above, raises the question of how popular the device really is on college campuses. finally, a third group of studies attempts to predict the future of e-readers and e-books. even before the introduction of e-readers, some saw e-books as the likely future of academic libraries.35 more recently, one report discusses the likelihood of and barriers to e-book adoption. this article concludes that “barriers to e-book adoption still exist, but signs point to this changing within the next two to five years. that, of course, has been said for most of the past 15 to 20 years.”36 still, nelson points out that technologies can become ubiquitous very quickly, using the ipod as an example, and warns libraries against falling behind.37 yet another report puts e-books in the two-tothree-year adoption range and claims that e-books “have reached mainstream adoption in the consumer sector” and that the “obstacles have . . . started to fall away.”38 ■■ method the e-reader survey was conducted as part of queens college’s student technology survey, which also covered several other aspects of students’ interactions with technology. the author is grateful to the center for teaching and learning (in particular, eva fernández and michelle fraboni) for graciously agreeing to include questions about e-readers in the survey and providing some assistance in managing the data. this survey, run through queens college’s center for teaching and learning, was hosted by surveymonkey and was distributed to students through their official e-mail accounts. participants were offered a chance to win an ipod touch as an incentive, but students who did not participate also were offered an opportunity to enter the ipod drawing. the survey was available between april and june 2010. all personally identifying information was removed from the responses to protect student privacy. rather than surveying the entire population about e-readers and e-books, the survey limited most of the questions to students with some experience with the format. of the students who responded to the survey, only 63 (3.7 percent) used e-readers. however, 338 more students identified themselves as users of e-books but did not use e-readers. all other students skipped past the e-book questions and were directed to the next part of the survey. the questions about e-readers fell into several categories. the students were asked about their ownership of devices and which devices they planned to purchase in the future. 
while they might of course change their minds about future purchases, this is a useful way of measuring whether students regard the devices as desirable. with other e-readers. other types of programs, most notably texas a&m’s kindle lending program,21 and many academic focus groups have also contributed to our knowledge of how students use e-readers. users in nearly every study have praised the portability of these devices. this can be very important to students; users in one study noted that the portability of reading devices allowed them to “reclaim an otherwise difficult to use brief period,”22 and in another, students were able to multitask, doing household chores and studying at the same time.23 adjustable text size and the ability to search for words in the text have also been popular among students, as has the novelty value of these devices. environmental concerns surrounding heavy printing have also been cited as an advantage of e-readers.24 however, the limitations of these devices, some of which are severe in an academic setting, also have been noted. the comments of students at gettysburg college are typical: they liked the e-readers for leisure reading, but found them awkward for classroom use.25 lack of note-taking support was an important drawback for many students. waycott and kukulska-hulme noted that students were much less likely to take notes while reading with a pda than they were with print.26 a study at princeton found that the same was true of students using the kindle,27 and students at northwest missouri state university said they read less with an e-textbook than with a traditional one, although they did not report changes in their study habits.28 despite the ability of many devices to search the text of a book, users in many studies also disliked the inability to skim and browse through the materials as they would with print.29 interestingly, this complaint appeared in studies of all types of e-readers, even those with larger screens. students, in a recent study with the sony reader and ipod touch, noted that these devices did a poor job of supporting pdfs, a standard format for online course materials. the documents were displayed at a very small size and the words were sometimes jumbled.30 whether these drawbacks will prevent students from adopting e-book readers remained to be seen. library and information science (lis) students in a small, week-long study reiterated the problems found in the above studies, but nevertheless found themselves using e-readers extensively and reading more books and newspapers than they had before.31 several of these user studies hint that e-readers are not currently commonplace as far as users often seemed to regard the devices with surprise and curiosity. in some studies, while users were initially attracted to the novelty value of the devices, their enthusiasm dimmed after using the devices and discovering technical problems and limitations.32 one author describes e-readers as “attention getters, but not attention keepers.”33 a study in early 2009, in which students were provided with e-readers, notes that “for the majority of the participants, this was adoption of e-book readers among college students: a survey | foasberg 111 attitudes of students in general, similar surveys should be taken across many campuses in several demographically different areas. researching e-readers is inherently difficult because the landscape is changing very quickly. 
since the survey began, apple’s ipad became available, prices for dedicated e-readers have dropped dramatically, publishers have become more willing to offer content electronically, and amazon has released a new version of the kindle and has begun taking out television advertisements for it. without a follow-up survey, it is impossible to know whether these events have changed student attitudes. ■■ results and discussion e-reader adoption of the 1,705 students who responded to the survey, 401 say that they read e-books (table 1). most students (338) who use e-books read them on a device other than an e-reader, but 63 say they use a dedicated reader for e-books (table 2). however, when students were asked about the technological devices that they own, only 56 selected e-book readers. perhaps the seven students who use e-book readers but don’t report owning one are sharing or borrowing them, or perhaps they are using a device other than the ones enumerated in the question. aside from table 3, which breaks down the e-reader brands that students own, the following data will be based upon the larger sample of 63 students. the students who read e-books on another device were asked whether they planned to buy an e-reader in the respondents were also asked about their use of e-books. this category includes questions about what kind of reading students use e-books for, how much of their reading uses e-books, and where they are finding their e-books. it was important to learn whether students considered e-book readers appropriate for academic work, and whether they considered the library a potential source for e-books. finally, to assess their attitudes toward e-book readers, students were asked to identify the main benefits and drawbacks of e-book readers. several possibilities were listed, and students were asked to respond to them along a likert scale. a field was also included in which students could fill in their own answers. after 643 incomplete surveys were eliminated, there were 1,705 responses from queens college students. this is about 8 percent of the queens college student body. e-mail surveys always run the risk of response bias, especially when they concern technology. however, students who responded were representative of queens college in terms of sex, age, class standing, major, and other demographic characteristics. the results were compared using a chi-squared test with the level of significance set at 0.05. in some cases, there were too few respondents to test significance properly and comparisons could not be made. please see appendix for the e-reader questions included in the survey instruments. they will be referred to in more depth throughout this article. ■■ survey limitations the survey results may not be generalizable because of the survey’s small sample size. in particular, the 63 respondents who use e-book readers may not be representative of student e-reader owners in general. the survey also relies on self-reporting; no direct observation of student behavior took place. students who do use e-readers may be more comfortable with technology and more likely to respond to e-mail surveys. however, the sample is representative for queens college students, and the percentage of students who own e-book readers is close to the national average at the time the survey was taken (5 percent).39 since only queens college students were surveyed, the results reflect the behavior and attitudes of students at a single large, four-year public college in new york city. 
the results do not necessarily reflect the experience of students at other types of institutions or in other parts of the united states. the other parts of the technology survey show that qc students are heavy users of technology, so they may adopt new technologies such as e-book readers more quickly than other students. to understand the table 1. e-book use among respondents e-book use number of respondents read e-books 401 (23.5%) do not read e-books 1262 (74.0%) don’t know what an e-book is 42 (2.5%) total 1705 (100%) table 2. devices used to read e-books among e-book readers device used number of respondents (% of e-book users) dedicated e-reader 63 (15.7) other device 338 (84.3) total 401 (100) 112 information technology and libraries | september 2011 desire to buy an ipad, many more than reported owning an e-reader. curiously, the e-reader owners reported that they planned to buy an ipad at the same rate as the other students. it is not clear whether these students plan to replace their e-reader or use multiple devices. in either case, while the arrival of the ipad and other tablet devices seems likely to increase the number of students carrying potential e-reading devices, some of its adopters will probably be students who already own e-readers. not surprisingly, students who used e-readers tended to be early adopters of technology in general (table 4).40 compared to the general pool of respondents, they were much more likely to like or love new technologies and much less likely to describe themselves as neutral or skeptical of them. in a chi-squared test, these differences were significant at a level of 0.001. although e-reading devices have existed since the 1990s, the newest, most popular generation of them is so recent that people who own one now are early adopters by definition. compared to the rest of the survey respondents, both e-reader owners and other e-book users were much more likely to identify as early adopters of technology in general. given this trend, the adoption rate of e-readers among students may slow once the early adopters are satisfied. uses of e-books students who used an e-book reader were asked how much of their reading they did with it and whether they used it for class, recreational, or work-related reading (table 5). students without e-readers were asked the same questions about their use of e-books. while it is likely that students who use e-book readers continue to access e-books in other ways, this distinction was made because this survey was designed to study their use of e-readers specifically. because e-reader users were not asked about their use of e-books in other formats, it is not clear whether their habits with more traditional e-book formats differ from those of other students. fewer than half the e-reader users in the study used the device for two-thirds of their reading or more. in the table below, students who did all their reading and those who did about two-thirds of their reading with e-books are combined, because so few claimed to read e-books exclusively. three students with e-readers and future. the majority had no immediate plans to buy one, with those who said they did not plan to acquire one and those who did not know combining for 62.43 percent. 23.67 percent planned to buy one either within the next year or before leaving college, and the remaining 13.91 percent planned to acquire an e-reader after graduating. 
despite ergonomic disadvantages, many more students are using e-books on some other device, such as a computer or a cell phone, than are loading them on e-readers. furthermore, a large percentage of these students do not plan to buy an e-book reader. the factors preventing these students from buying e-readers will be covered in more detail in the “attitudes toward e-readers” section below. however, it seems likely that a major factor is price, identified by both e-reader owners and non-owners as the greatest disadvantage of these devices. when asked to list the devices they owned, 56 students named some type of e-book reader. among these, the amazon kindle was the most popular (table 3). as expected, e-readers have yet to be adopted by most students at queens college. at the time of this survey, less than 4 percent of respondents owned one. while the rest of the survey shows that these students are highly wired—82 percent own a laptop less than five years old and 93 percent have high-speed internet access at home—this has not translated to a high rate of e-reader ownership. although apple’s ipad, a tablet device that functions as an e-reader among other things, was not yet released at the time of the survey, it may see wider adoption than the dedicated devices. when the survey was originally distributed, this device had been announced but not yet released. overall, 8 percent of students expressed a table 3. e-reader brands owned by students devices owned number of students (% of e-reader owners) amazon kindle 26 (46.4%) barnes & noble nook 14 (25.0%) sony reader 10 (17.9%) other 6 (10.7%) total 56 (100.0%) table 4. e-reader use and self-identification as an early adopter e-reader owners all respondents love or like new technologies 40 (63.5%) 698 (40.9%) neutral or skeptical about new technologies 23 (36.5%) 1007 (59.1%) total 63 (100.0%) 1705 (100.0%) adoption of e-book readers among college students: a survey | foasberg 113 pleasure. this finding is much more surprising, given the very slow adoption of e-books before the introduction of e-readers, and the ergonomic problems with reading from vertical screens. however, students who used e-books without e-readers were much more likely to read e-books for classes. this difference may be due to the sorts of material that are available in each format. although textbook publishers have shown interest in creating e-textbooks for use on devices such as the ipad, there is little selection available for e-readers as yet. when working without e-book readers, however, there is a wide variety of academic materials available in electronic formats, and many textbooks include an online component. academic libraries, including the one at queens college, subscribe to large e-book collections of academic materials. for the most part, these collections cannot be used on an e-reader, but they are available through the library’s website to students with an internet connection and a browser. it is also possible that the e-readers are not well suited to class readings. some past studies, cited above, have found that e-readers do not accommodate functions such as note taking, skimming, and non-sequential navigation very well. since these are important functions for academic work, and both print books and “traditional” e-books are superior in these respects, such limitations may prevent students from using e-readers for classes. 
the user behaviors reported here do not appear to herald the end of print; in fact, very few students with e-readers use them for all their reading, and over half of the students with e-readers use them for one-third of their reading or less. it is not clear whether students intentionally choose to read some materials in print and others with their e-reader. three students with e-readers and nine without said they used e-books for all their reading. very few students without e-book readers used e-books for a large proportion of their reading; indeed, 54 percent said they used e-books for less than a third of their reading. differences between the groups were tested for significance using a chi-squared test. note that percentages may not add up to 100 percent, due to rounding. since many studies of e-book readers have found them more suitable for recreational reading than for academic work, users of e-readers were asked to identify the kinds of reading for which they used e-readers and to select all options that they found applicable (table 6). since students were allowed to choose more than one option, the totals are greater than the number of participants. indeed, e-readers were much more likely to be used for recreational reading, and other types of e-books far more likely to be used for class. for other types of reading, differences between these groups were not significant. since e-readers have been marketed largely for the popular fiction market and are designed to accommodate casual linear reading, it is not surprising that students who use them are most likely to report using them for leisure reading. in this area they seem to enjoy a strong advantage over more traditional e-book formats read on another device such as a computer or a cell phone. however, the study did not control for the amount of reading that students do. students who use e-readers may be heavier leisure readers in general. further research could clarify whether heavier use of leisure e-reading is due to the devices or the tendencies of those who own them. a large proportion of the students who read e-books without e-readers (65.7 percent) do read e-books for pleasure.

table 5. amount of reading done with e-books

amount of reading         e-reader users   other users    x2     significance level   significant?
about two-thirds or all   27 (42.8%)       65 (19.2%)     16.8   0.001                yes
about a third             14 (22.2%)       90 (26.6%)     0.1    0.5                  no
less than a third         22 (34.9%)       183 (54.1%)    7.9    0.01                 yes
total                     63 (99.9%)       338 (99.9%)    -      -                    -

table 6. types of reading done with e-books

type of reading   e-reader users   other users    x2     significance level   significant?
recreational      54 (85.7%)       222 (65.7%)    9.9    0.01                 yes
class             24 (38.1%)       217 (64.2%)    14.7   0.001                yes
work              11 (17.8%)       88 (26.0%)     2.1    0.5                  no
other             3 (4.8%)         8 (2.4%)       1.1    0.5                  no
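the chi-squared values in tables 5 and 6 can be checked from the published counts. the sketch below rebuilds the 2x2 contingency table for the "about two-thirds or all" row of table 5 (27 of 63 e-reader users versus 65 of 338 other users) and computes the statistic without yates' continuity correction, which reproduces the reported 16.8; it assumes scipy is available and is an illustration of the test, not the author's original analysis code.

```python
# verify one chi-squared value from table 5 using the published counts.
# assumes scipy; correction=False (no yates correction) matches the
# reported statistic of 16.8 for this 2x2 table.
from scipy.stats import chi2_contingency

e_reader = (27, 63)    # "about two-thirds or all" readers, total e-reader users
other    = (65, 338)   # same category, total other e-book users

table = [
    [e_reader[0], e_reader[1] - e_reader[0]],   # [27, 36]
    [other[0],    other[1]    - other[0]],      # [65, 273]
]
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")   # chi2 = 16.8, p < 0.001
```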
despite the existence of a service called overdrive, which provides e-books compatible with some e-readers (excluding the kindle), circulating e-books is challenging, due to a host of technical and legal problems. given this environment, it is not surprising that students without e-readers were more likely to use their public library as a source of e-books than were e-reader users. the queens college campus library, which offers many electronic collections but none that are e-reader-friendly, fared worse; only one student claimed to have used it as a source of e-reader compatible materials. in the free comment field, students mentioned other sources of e-books such as the apple itunes store, the campus bookstore and lulu.com, an online bookseller that also provides self-publishing. several also admitted, unprompted, that they download books illegally. attitudes toward e-readers in the interests of learning what caused students to adopt e-readers or not, the survey used a series of likert-style questions to ask what the students considered the benefits and drawbacks of such devices. strikingly, e-reader owners and non-owners agreed about both the advantages and disadvantages; owning an e-reader did not seem to change most of the things that students value and dislike about it. figure 1 shows the number of students in each group who their e-reader, or whether they are limited by the materials available for the e-reader. the circumstances under which students switch between electronic and print would be an excellent area for future research; is it a matter of what is practically available, or is the e-reader better suited for some texts and reading circumstances than others? sources of e-books the major producers of e-readers are either primarily booksellers, such as amazon and barnes & noble, or are hardware manufacturers who also provide a store where users can purchase e-books, such as sony (or, after the ipad launch, apple). in both models, the manufacturers hope to sell e-books to those who have purchased their devices. they provide more streamlined ways of loading these e-books on their devices, and in some cases use drm to prevent their e-books from being used on competing devices, as well as to inhibit piracy. table 7 shows the sources from which readers with and without e-readers obtain e-books. e-reader users were much more likely than non-users to get their e-books from the official store associated it—that is, the store providing the e-reader, such as amazon, barnes and noble, or sony’s readerstore. there was no significant difference between the two groups’ use of open access or independent sources, but the students who did not use e-readers were much more likely to use e-books from their public library, and while 19.8 percent of students without e-readers used the campus library as a source of e-books, only one student with an e-reader did. since respondents were allowed to choose more than one answer, the results do not sum up to 100 percent. by a wide margin, students who own e-readers are most likely to purchase their e-reading materials from the “official” store; 86 percent cited the official store as a source of e-books. students without e-readers also use these stores more than any other source of e-books, but they are nevertheless far less likely to use them than e-reader users. because it is much easier to buy e-books table 7. sources of e-books how do you get e-books? e-reader users other users x2 significance level significant? 
store specific to popular e-readers 54 (85.7%) 154 (45.6%) 34.2 0.001 yes open access repositories 16 (25.4%) 120 (35.5%) 2.4 0.5 no public library 10 (15.9%) 99 (29.3%) 4.8 0.05 yes independent online retailer 9 (14.3%) 71 (21.0%) 1.5 0.5 no other 4 (6.3%) 39 (11.5%) n/a n/a n/a campus library 1 (1.6%) 67 (19.8%) n/a n/a n/a adoption of e-book readers among college students: a survey | foasberg 115 students with e-readers were more likely than others to rate portability and convenience as “very valuable.” as the studies cited above suggest, being able to easily download books, carry them away from the computer, and store many books on a single device are very appealing to students. only the final two features, text-to-speech and special features such as dictionaries, attracted enough “not very valuable” or “not valuable” responses for an inter-group comparison. both groups considered text-to-speech the least valuable feature, but students who did not own e-readers were significantly more likely to consider it a valuable or very valuable feature, perhaps indicating that the users to whom this is important have avoided the devices, which currently support it in a very limited fashion. perhaps, too, students with e-readers rated this feature less useful because of its current limitations. in either case, rated each feature either valuable or very valuable. if the positive features of the devices are ranked based on the percentage of respondents who considered them very valuable, the order is almost the same for students with and without e-readers. for students with e-readers, the features rank as follows: portability, convenience, storage, special functions, and text-to-speech. for those without, convenience ranks slightly higher than portability; all other features rank in the same order. tables 8 and 9 present the results of these questions in more detail. for the sake of brevity, the chi-squared results have been omitted. any differences considered significant in the discussion below are significant at least at the 0.05 level. nearly all e-reader users and a strong majority of other e-book users rated portability, convenience, and storage either “valuable” or “very valuable,” though figure 1. features rated “valuable” or “very valuable” 116 information technology and libraries | september 2011 among respondents suggests that that many of those who do not own an e-book reader are unfamiliar with the technology. since e-readers are primarily sold over the internet, many people have not had a chance to see or handle one, perhaps partly explaining this result. if they become more widespread, this may well change. not surprisingly, respondents who did not own e-readers were significantly more likely to prefer print. however, it is worth noting that even among students who did use e-readers, over a third “agree” or “completely agree” that they prefer print, with another third neither agreeing nor disagreeing. use of e-readers does not appear to indicate hostility toward print. this is consistent with the students’ self-reports of e-reader use; as reported above, over half of the students surveyed use e-readers for one-third of their reading or less. thus, it seems unlikely that most of these students plan to totally abandon print any time soon; rather, e-readers are providing another format that they use in addition to print. 
as for students who do not use e-readers, over half say they prefer print, but this is far from their most widespread concern; rather, like e-reader owners, they are most likely to cite the cost of the reader or the selection of books available as a drawback of the devices. queens college students considered price the most important drawback of e-readers. for both groups (owners and non-owners), it was the factor most likely to be identified as a concern, and the difference between the it was the only variable listed in the survey for which either the “not very valuable” and “not valuable” responses from either group amounted to a combined total of greater than 10 percent of the respondents in that group. in addition to valuing the same features, e-reader owners and non-owners had similar concerns about the device. figure 2 shows the number of respondents in each group who agreed or completely agreed that the issues listed were one of the main shortcomings of e-book readers. tables 10 and 11 give the responses in more detail. the responses with which the most respondents either agreed or completely agreed were the same: cost of e-reader, selection of e-books, and cost of e-books, in that order. although groups such as the electronic frontier foundation have raised concerns about privacy issues related to e-readers,41 these issues have made little impression on students; both e-reader users and nonusers were in agreement in putting privacy at the bottom of the list. one exception to the general agreement between e-reader users and other e-book readers was concern about eyestrain. the majority (63 percent) of those who do not use e-readers either “completely agree” or “agree” that eyestrain is a drawback, while only 29 percent of e-reader owners did. this was a major concern for early e-readers, leading the current generation of these devices to use e-ink, a technology that resembles paper and is thought to eliminate the eyestrain problem. the disparity table 8. value of e-reader features, according to e-reader users very valuable valuable somewhat valuable not very valuable not valuable at all no response portability 52 (82.54%) 10 (15.87%) 1 (1.59%) 0 (0.00%) 0 (0.00%) 0 (0.00%) convenience 46 (73.02%) 13 (20.63%) 1 (1.59%) 1 (1.59%) 1 (1.59%) 1 (1.59%) storage 42 (66.67%) 16 (25.40%) 2 (3.17%) 1 (1.59%) 0 (0.00%) 2 (3.17%) special functions 32 (50.79%) 18 (28.57%) 7 (11.11%) 3 (4.76%) 3 (4.76%) 0 (0.00%) text-speech 10 (15.87%) 13 (20.63%) 12 (19.05%) 16 (25.40%) 11 (17.46%) 1 (1.59%) table 9. value of e-reader features, according to other e-book users very valuable valuable somewhat valuable not very valuable not valuable at all no response portability 199 (58.88%) 89 (26.33%) 39 (11.53%) 4 (1.18%) 5 (1.48%) 2 (0.06%) convenience 194 (57.40%) 98 (28.99%) 34 (10.06%) 7 (2.07%) 2 (0.59%) 3 (0.89%) storage 181 (53.55%) 99 (29.28%) 40 (11.83%) 10 (2.96%) 4 (1.18%) 4 (1.18%) special functions 169 (50.00%) 82 (24.26%) 58 (17.16%) 22 (6.51%) 4 (1.18%) 3 (0.89%) text-speech 95 (28.11%) 77 (22.78%) 77 (22.78%) 50 (14.79%) 35 (10.36%) 4 (1.18%) adoption of e-book readers among college students: a survey | foasberg 117 responded, but they brought up issues such as highlighting, battery life, and the small size of the screen. another student was more confident in the value of e-readers and used this space to proclaim paper books dead. 
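the ranking of positive features for e-reader owners described above can be reproduced directly from the "very valuable" column of table 8. the small sketch below (python; counts transcribed from the table, out of the 63 e-reader users) is only an illustration of that ranking computation.

```python
# rank e-reader features by the share of e-reader owners (n = 63) who rated them
# "very valuable"; counts transcribed from table 8
very_valuable = {
    "portability": 52,
    "convenience": 46,
    "storage": 42,
    "special functions": 32,
    "text-to-speech": 10,
}
total_owners = 63

ranked = sorted(very_valuable.items(), key=lambda item: item[1], reverse=True)
for feature, count in ranked:
    print(f"{feature:18s} {count / total_owners:6.1%}")
# expected order: portability, convenience, storage, special functions, text-to-speech
```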
e-book circulation programs finally, students were asked whether they would be interested in checking out e-readers with books loaded on them from the campus library (table 12). as is often the case when a survey asks for interest in a prospective new service, the response was very positive. however, it was expected that many of the students would prefer to download materials for devices that they already own to take advantage of the convenience of e-readers. on the contrary, a high percentage of both types of students expressed interest in checking out e-book readers, but very few wished to check out e-books two groups was not significant. at the time this survey was taken, amazon’s kindle cost close to $300 and barnes and noble’s nook was priced similarly. soon after the survey closed, however, the major e-reader manufacturers engaged in a “price war,” which resulted in the prices of the best-known dedicated readers, amazon’s kindle and barnes and noble’s nook, falling to under $200. given the feeling among survey respondents that the price of the readers is a serious drawback, this reduction may cause the adoption rate to rise. it would be worthwhile to repeat this survey or a similar one in the near future to learn whether the e-reader price war has had any effect upon price-sensitive students. in the pilot survey, students had written in further responses about the drawbacks of e-readers, but not about their benefits. while some of those responses were incorporated into the final survey, a free text field was also added to catch any further comments. few students figure 2. drawbacks with which students “agree” or “completely agree” 118 information technology and libraries | september 2011 ■■ future research although this survey provides some data to help libraries think about the popularity of e-readers among students, many aspects of students’ use of e-readers remain unexplored. further research on how student adoption of e-book readers varies by location and demographics, particularly considering students’ economic characteristics, for a device of their own. even students who owned e-readers were much more likely to express interest in checking out the device than checking out materials to read on it. this preference belies the common assumption that users do not wish to carry multiple devices and prefer to download everything electronically. instead, they were interested in checking out an e-reader from the library. unless the emphasis of the question altered the results, it is somewhat difficult to account for this response. table 10. drawbacks of e-readers, according to e-reader owners completely agree agree neither agree nor disagree disagree completely disagree no response cost of reader 19 (30.16%) 23 (36.51%) 13 (20.63%) 7 (11.11%) 0 (0.00%) 1 (1.59%) selection 11 (17.46%) 26 (41.27%) 12 (19.05%) 7 (11.11%) 6 (9.52%) 1 (1.59%) cost of e-books 10 (15.87%) 20 (31.75%) 16 (25.40%) 11 (17.46%) 5 (7.94%) 1 (1.59%) prefer print 6 (9.52%) 16 (25.40%) 21 (33.33%) 11 (17.46%) 8 (12.70%) 1 (1.59%) eyestrain 7 (11.11%) 11 (17.46%) 20 (31.75%) 15 (23.81%) 9 (14.29%) 1 (1.59%) interface 7 (11.11%) 10 (15.87%) 24 (38.10%) 9 (14.29%) 8 (12.70%) 5 (7.94%) privacy 3 (4.76%) 9 (14.29%) 13 (20.63%) 26 (41.27%) 11 (17.46%) 1 (1.59%) table 11. 
drawbacks of e-readers, according to other e-book users completely agree agree neither agree nor disagree disagree completely disagree no response cost of reader 146 (43.20%) 117 (34.62%) 50 (14.79%) 14 (4.14%) 11 (3.25%) 0 (0.00%) selection 80 (23.67%) 136 (40.24%) 84 (24.85%) 27 (7.99%) 7 (2.07%) 4 (1.18%) cost of e-books 94 (27.81%) 121 (35.80%) 76 (22.49%) 37 (10.95%) 10 (2.96%) 0 (0.00%) prefer print 78 (23.08%) 99 (29.29%) 116 (34.32%) 25 (7.40%) 19 (5.62%) 1 (0.30%) eyestrain 84 (24.85%) 129 (38.17%) 80 (23.67%) 33 (9.76%) 11 (3.25%) 1 (0.30%) interface 43 (12.72%) 82 (24.26%) 145 (42.90%) 33 (9.76%) 20 (5.92%) 15 (4.44%) privacy 39 (11.54%) 65 (19.23%) 144 (42.60%) 49 (14.50%) 40 (11.83%) 1 (0.30%) table 12. interest in checking out preloaded e-readers from the library e-reader owners other e-book users would be interested in checking out e-readers 44 (70.0%) 257 (76.0%) would not be interested in checking out e-readers 4 (6.3%) 38 (11.2%) would not be interested in checking out e-readers, but would like to check out e-books to read on my own e-reader 15 (23.8%) 43 (12.7%) total 63 (100.1%) 338 (99.9%) adoption of e-book readers among college students: a survey | foasberg 119 whom would not object to using a print edition if one were available. under these circumstances, and realizing that the future popularity of e-readers is far from guaranteed, developing such models is, for now, more important than putting them into practice in the short term. references 1. nancy k. herther, “the ebook reader is not the future of ebooks,” searcher 16, no. 8 (2008): 26–40, http://search.ebsco host.com/login.aspx?direct=true&db=a9h&an=34172354&site =ehost-live (accessed dec. 22, 2010). 2. charlie sorrel, “amazon: e-books outsell hardcovers,” wired, july 20, 2010, http://www.wired.com/gadgetlab/ 2010/07/amazon-e-books-outsell-hardcovers/ (accessed dec. 22, 2010). 3. international digital publishing forum, “industry statistics,” oct. 2010, http://www.idpf.org/doc_library/indus trystats.htm (accessed dec. 22, 2010). 4. kathleen hall, “global e-reader sales to hit 6.6m 2010,” electronics weekly, dec. 9, 2010, http://www.electronicsweekly .com/articles/2010/12/09/50083/global-e-reader-sales-to -reach-6.6m-2010-gartner.htm (accessed dec. 22, 2010). 5. cody combs, “will physical books be gone in five years?” video interview with nicholas negroponte, cnn, oct. 18, 2010, http://www.cnn.com/2010/tech/innovation/10/17/negro ponte.ebooks/index.html (accessed dec. 22, 2010). 6. jeff gomez, print is dead: books in our digital age (basingstoke, uk: palgrave macmillan, 2009). 7. aaron smith, “e-book readers and tablet computers,” in americans and their gadgets (washington, d.c.: pew internet & american life project, 2010), http://www.pewinternet.org/ reports/2010/gadgets/report/ebook-readers-and-tablet -computers.aspx (accessed dec. 22, 2010). 8. alex sharp, “amazon announces kindle book lending feature is coming in 2010,” suite101, oct. 26, 2010, http:// www.suite101.com/content/amazon-announces-kindle-book -lending-feature-is-coming-in-2010-a300036#ixzz18cxanfke (accessed dec. 22, 2010). 9. karl drinkwater, “e-book readers: what are librarians to make of them?” sconul focus 49 (2010): 4–10, http://www .sconul.ac.uk/publications/newsletter/49/2.pdf (accessed dec. 22, 2010). drinkwater provides an overview and a discussion of the challenges and benefits of such programs. 10. benedicte page, “pa sets out restrictions on library e-book lending,” the bookseller, oct. 
21, 2010, http://www .thebookseller.com/news/132038-pa-sets-out-restrictions-on -library-e-book-lending.html (accessed dec. 22, 2010). 11. james a. buczynski, “library ebooks: some can’t find them, others find them and don’t know what they are,” internet services reference quarterly 15, no. 1 (2010): 11–19, doi: 10.1080/10875300903517089, http://dx.doi.org/ 10.1080/10875300903517089 (accessed dec. 22, 2010). 12. smith, “e-book readers and tablet computers,” http:// www.pewinternet.org/reports/2010/gadgets/report/ebook -readers-and-tablet-computers.aspx (accessed dec. 22, 2010). 13. shannon d. smith and judith borreson caruso, the ecar study of undergraduate students and information technology, 2010 (boulder, colo.: educause, 2010), http://net.educause. is certainly important. more research on the habits of students with e-readers would also help libraries and universities to better serve their needs. in particular, while this survey found that students tend to switch between electronic and print formats, little is yet known about when and why they move from one to the other. it will also be important to research the differences between the reading habits of students who own e-readers and those who do not, as this may prove useful in interpreting the survey data about types of reading done with different kinds of e-books. furthermore, since the e-book market changes quickly, continuing to research student adoption of e-readers is also important to monitor student reactions to new developments. ■■ conclusion while many queens college students express an interest in e-readers, and even those who do not own one believe that their portability and convenience offer valuable advantages, only a small percentage of students, many of whom are early adopters of technology in general, actually use one. furthermore, even those who own e-readers do not use them exclusively, and only a third say they prefer it to print. in light of these responses, the proper response to this technology may not be a discussion about whether “paper books are dead” (as one of the survey respondents wrote in the comment field) but how each format is used. research on when, where, and for what purposes students might choose print or electronic has already begun.42 many of the factors that contribute to the niche status of e-readers are changing. competition between manufacturers has brought down the price of the reader itself, and the selection of books available for them is improving. because these were some of the most important problems standing in the way of e-reader adoption for queens college students, e-reader ownership could increase rapidly. the lack of a significant difference between the attitudes of e-reader owners and nonowners merits further emphasis and examination, as it may indicate that price is indeed the major barrier to e-reader ownership. although the prices are lower now than they were when the survey was originally taken, this would present a major concern if e-readers became the expected format in which students read, perhaps even the possibility of a new kind of digital divide. as the future is uncertain, it is important for academic libraries to pay attention to their students’ adoption of e-readers, and to consider models under which they can provide materials compatible with them. however, it is important to remember that such materials would, at present, be accessible to only a small subset of users, many of 120 information technology and libraries | september 2011 20. jon t. 
rickman et al., “a campus-wide e-textbook initiative,” educause quarterly 32, no. 2 (2009), http://www.edu cause.edu/library/eqm0927 (accessed dec. 22, 2010). 21. dennis t. clark, “lending kindle e-book readers: first results from the texas a&m university project,” collection building 28, no. 4 (2009): 146–49, doi: 10.1108/01604950910999774, http://www.emeraldinsight.com/journals.htm?articleid=18174 06&show=abstract (accessed dec. 22, 2010). 22. marshall and rutolo, “reading-in-the-small,” 58. 23. mallett, “a screen too far?” 142. 24. “e-reader pilot at princeton.” 25. foster and remy, “e-books for academe,” 6. 26. waycott and kukulska-hulme, “students’ experiences with pdas,” 38. 27. “e-reader pilot at princeton.” 28. rickman, “a campus-wide e-textbook initiative.” 29. dennis t. clark et al., “a qualitative assessment of the kindle e-book reader: results from initial focus groups,” performance measurement and metrics 9, no. 2 (2008): 118–129, doi: 10.1108/14678040810906826, http://www.emeraldinsight .com/journals.htm?articleid=1736795&show=abstract (accessed dec. 22, 2010); james dearnley, cliff mcknight, and anne morris. “electronic book usage in public libraries: a study of user and staff reactions to a pda-based collection,” journal of librarianship and information science 36, no. 4 (2004): 175–182, doi: 10.1177/0961000604050568, http://lis.sagepub.com/content/36/4/175 (accessed dec. 22, 2010); mallett, “a screen too far?” 143; waycott and kukulska-hulme, “students’ experiences with pdas,” 36. 30. mallet, “a screen too far?” 142–43. 31. m. cristina pattuelli and debbie rabina. “forms, effects, function: lis students’ attitudes toward portable e-book readers,” aslib proceedings: new information perspectives 62, no. 3 (2010): 228–44, doi: 10.1108/00012531011046880, http://www .emeraldinsight.com/journals.htm?articleid=1863571&show=ab stract (accessed dec. 22, 2010). 32. see, for example, gil-rodriguez and planella-ribera, “educational uses of the e-book,” 58–59; and cliff mcknight and james dearnley, “electronic book use in a public library,” journal of librarianship & information science 35, no. 4 (2003): 235–42, doi: 10.1177/0961000603035004003, http://lis.sagepub .com/content/35/4/235 (accessed dec. 22, 2010). 33. rickman et al. “a campus-wide e-textbook initiative.” 34. maria kiriakova et al., “aiming at a moving target: pilot testing ebook readers in an urban academic library,” computers in libraries 30, no. 2 (2010): 20–24, http://search .ebscohost.com/login.aspx?direct=true&db=a9h&an=48757663 &site=ehost-live (accessed dec. 22, 2010). 35. mark sandler, kim armstrong, and bob nardini, “market formation for e-books: diffusion, confusion or delusion?” the journal of electronic publishing 10, no. 3 (2007), doi: 10.3998/3336451.0010.310, http://quod.lib.umich.edu/cgi/t/ text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0010.310 (accessed dec. 22, 2010). 36. mark r. nelson, “e-books in higher education: nearing the end of an era of hype?” educause review 43, no. 2 (2008), http://www.educause.edu/educause+review/ educausereviewmagazinevolume43/ebooksinhigher educationnearing/162677 (accessed dec. 22, 2010). 37. ibid. 38. l. johnson et al., the 2010 horizon report (austin, tex.: edu/ir/library/pdf/ers1006/rs/ers1006w.pdf (accessed dec. 22, 2010). 14. harris interactive, “one in ten americans use an e-reader; one in ten likely to get one in the next six months,” press release, sept. 22, 2010, http://www.piworld.com/com mon/items/biz/pi/pdf/2010/09/pi_pdf_harrispoll_ereaders. 
pdf (accessed dec. 22, 2010). 15. kat meyer, “#followreader: consumer attitudes toward e-book reading,” blog posting, o’reilly radar, aug. 4, 2010, http://radar.oreilly.com/2010/08/followreader-consumer-atti tudes-toward-e-book-reading.html (accessed dec. 22, 2010). 16. the following articles are all based on user studies with small form factor devices: paul lam, shun leung lam, john lam and carmel mcnaught, “usability and usefulness of ebooks on ppcs: how students’ opinions vary over time,” australasian journal of educational technology 25, no. 1 (2009): 30–44, http:// www.ascilite.org.au/ajet/ajet25/lam.pdf (accessed dec. 22, 2010); catherine c. marshall and christine rutolo, “readingin-the-small: a study of reading on small form factor devices,” in jcdl ’02 proceedings of the 2nd acm/ieee-cs joint conference on digital libraries (new york: acm, 2002): 56–64. doi: 10.1145/544220.544230, http://portal.acm.org/citation .cfm?doid=544220.544230 (accessed dec. 22, 2010); and j. waycott and a. kukulska-hulme, “students’ experiences with pdas for reading course materials,” personal ubiquitous computing 7, no. 1 (2002): 30–43, doi: 10.1007/s00779–002–0211-x, http://www .springerlink.com/content/w288kry251dd2vcd/ (accessed dec. 22, 2010). 17. some examples in an academic context: james dearnley and cliff mcknight, “the revolution starts next week: the findings of two studies considering electronic books,” information services & use 21, no. 2 (2001): 65–78, http://search .ebscohost.com/login.aspx?direct=true&db=a9h&an=5847810& site=ehost-live (accessed dec. 22, 2010); and eric j. simon, “an experiment using electronic books in the classroom,” journal of computers in mathematics & science teaching 21, no. 1 (2002): 53–66, http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml?recid= 0bc05f7a67b1790e5237dc070f466830549a60a87b3fa34bd0b8951acd 7a879da9fa151218a88252&fmt=h (accessed dec. 22, 2010). 18. eva patrícia gil-rodriguez and jordi planella-ribera, “educational uses of the e-book: an experience in a virtual university context,” in hci and usability for education and work, ed. andreas holzinger, lecture notes in computer science no. 5298 (berlin: springer, 2008): 55–62, doi: 10.1007/9783-540-89350-9-5, http://www.springerlink.com/content/ d357482823j10m96/ (accessed dec. 22, 2010); “e-reader pilot at princeton, final report,” (princeton university, 2009), http:// www.princeton.edu/ereaderpilot/ereaderfinalreportlong .pdf (accessed dec. 22, 2010); gavin foster and eric d. remy. “e-books for academe: a study from gettysburg college,” educause research bulletin, no. 22 (2009), http://www.educause .edu/resources/ebooksforacademeastudyfromgett/187196 (dec. 22, 2010); and elizabeth mallett, “a screen too far? findings from an e-book reader pilot,” serials 23, no. 2 (2010): 14–144, doi: 10.1629/23140, http://uksg.metapress.com/ media/mfpntjwvyqtggyjvudu7/contributions/f/3/2/6/ f32687v5r12n5h77.pdf (accessed july 11, 2011). 19. steve kolowich, “colleges test amazon’s kindle e-book reader as study tool,” usa today, feb. 23, 2010, http://www .usatoday.com/news/education/2010–02–23-ihe-amazon-kin dle-for-college23_st_n.htm (accessed dec. 22, 2010). adoption of e-book readers among college students: a survey | foasberg 121 question 22, and was reused in the current survey. again, the author extends thanks to michelle fraboni and eva fernández, who ran this portion of the survey at queens college and allowed the use of their data. 41. 
electronic frontier foundation, "updated and corrected: e-book buyer's guide to privacy," deeplinks blog, jan. 6, 2010, http://www.eff.org/deeplinks/2010/01/updated-and-corrected-e-book-buyers-guide-privacy (accessed dec. 22, 2010). 42. pattuelli and rabina, "lis students' attitudes." new media consortium, 2010), http://wp.nmc.org/horizon2010/chapters/electronic-books/ (accessed july 11, 2011). 39. aaron smith, "e-book readers and tablet computers," http://www.pewinternet.org/reports/2010/gadgets/report/ebook-readers-and-tablet-computers.aspx (accessed july 11, 2011). 40. this question was located in a portion of the survey not focused on e-book readers and thus does not appear in the appendix. the question derives from smith and caruso, 105,

appendix. queens college student technology survey

2 information technology and libraries | march 2008 currently we librarians seem to be hitching our wagon to the idea of library as community because in part it's what we ourselves want. we've seen that our lita members want more community from our association, so it makes sense to us that our patrons also want community. it's what pew, oclc, and other studies seem to be telling us. the business-wired side of the world is breaking their backs to create every form of virtual community they can think of as quickly as possible. apply the appropriate amounts of marketing and then our patrons want those things and expect them from all of their historically important community resources, the library being a prime player in that group. so we strive and strive and strive to not only provide the standard issue face-to-face community we've always created, but to also create that new highly desired virtual community. either we create a library-specific version, or we at the very least create a way for our patrons to access those communities. hopefully, when our patrons step into those virtual communities, we work to make it possible for them to find libraries there, too. all well and good, but do we have a plan? what's the goal? what's the end achievement? if, as studies say, patrons with a research need turn to libraries first only one percent of the time, and instead first hit up friends and family fifty or more percent of the time, then where is our significance and place in either the physical or virtual spaces? we know we serve significant numbers in many ways. we have gate counts, circulation records, holds placed, warm bodies in the building—all manners of indicators that show a well-managed and -marketed library is in demand and appreciated. as we run into the terrible head-on crash of community and technology, willy-nilly doing absolutely everything we can to accommodate everyone and everything, because we're librarians and library technologists and that's what we do, do we really have a clue why we're doing it? all fodder for deep thought and many lattes or beers and late night discussions. on the lita side, though, we're embarking on doing something about this knot when it comes to serving our members.
under the guidance of past-president bonnie postlethwaite we’ve established an assessment and research committee co-chaired by bonnie and diane bisom. to kick off the committee activities and to help them establish an agenda and direction, lita hired the research firm the wedewer group to work with the lita board and the new committee. stay tuned for reports and announcements from this committee as it works to find answers to some of those questions. and have that latte with a lita colleague as you seek to find some answers yourself. it’s all part of building community. mark beatty (mbeatty@wils.wisc.edu) is lita president 2007/2008 and trainer, wisconsin library services, madison. president’s message: doing something about life’s persistent problems? mark beatty 24 information technology and libraries | march 2011 ruben tous, manel guerrero, and jaime delgado semantic web for reliable citation analysis in scholarly publishing nevertheless, current practices in citation analysis entail serious problems, including security flaws related to the publishing process (e.g., repudiation, impersonation, and privacy of paper contents) and defects related to citation analysis, such as the following: ■■ nonidentical paper instances confusion ■■ author naming conflicts ■■ lack of machine-readable citation metadata ■■ fake citing papers ■■ impossibility for authors to control their related citation data ■■ impossibility for citation-analysis systems to verify the provenance and trust of citation data, both in the short and long term besides the fact that they do not provide any security feature, the main shortcoming of current citation-analysis systems such as isi citation index, citeseer (http:// citeseer.ist.psu.edu/), and google scholar is the fact that they count multiple copies or versions of the same paper as many papers. in addition, they distribute citations of a paper between a number of copies or versions, thus decreasing the visibility of the specific work. moreover, their use of different analysis databases leads to very different results because of differences in their indexing policies and in their collected papers.3 to remedy all these imperfections, this paper proposes a reference architecture for reliable citation analysis based on applying semantic trust mechanisms. it is important to note that a complete or partial adoption of the ideas defended in this paper will imply the effort to introduce changes within the publishing lifecycle. we believe that these changes are justified considering the serious flaws of the established solutions, and the relevance that citation-analysis systems are acquiring in our society. ■■ reference architecture we have designed a reference architecture that aims to provide reliability to the citation and citation-tracking lifecycle. this architecture is based in the use of digitally signed semantic metadata in the different stages of the scholarly publishing workflow. 
as a trust scheme, we have chosen a public key infrastructure (pki), in which certificates are signed by certification authorities belonging to one or more hierarchical certification chains.4 trust scheme the goal of the architecture is to allow citation-analysis systems to verify the provenance and trust of machinereadable metadata about citations before incorporating analysis of the impact of scholarly artifacts is constrained by current unreliable practices in cross-referencing, citation discovering, and citation indexing and analysis, which have not kept pace with the technological advances that are occurring in several areas like knowledge management and security. because citation analysis has become the primary component in scholarly impact factor calculation, and considering the relevance of this metric within both the scholarly publishing value chain and (especially important) the professional curriculum evaluation of scholarly professionals, we defend that current practices need to be revised. this paper describes a reference architecture that aims to provide openness and reliability to the citation-tracking lifecycle. the solution relies on the use of digitally signed semantic metadata in the different stages of the scholarly publishing workflow in such a manner that authors, publishers, repositories, and citation-analysis systems will have access to independent reliable evidences that are resistant to forgery, impersonation, and repudiation. as far as we know, this is the first paper to combine semantic web technologies and public-key cryptography to achieve reliable citation analysis in scholarly publishing. i n recent years, the amount of scholarly communication brought into the digital realm has exponentially increased.1 this no-way-back process is fostering the exploitation of large-scale digitized scholarly repositories for analysis tasks, especially those related to impact factor calculation. the potential automation of the contribution– relevance calculation of scholarly artifacts and scholarly professionals has attracted the interest of several parties within the scholarly environment, and even outside of it. for example, one can find within articles of the spanish law related to the scholarly personnel certification the requirement that the papers appearing in the curricula of candidates should appear in the subject category listing of the journal citation reports of the science citation index.2 this example shows the growing relevance of these systems today. ruben tous (rtous@ac.upc.edu) is associate professor, manuel guerrero (guerrero@ac.upc.edu) is associate professor, and jaime delgado (jaime.delgado@ac.upc.edu) is professor, all in the departament d’arquitectura de computadors, universitat politècnica de catalunya, barcelona, spain. semantic web for reliable citation analysis in scholarly publishing | tous, guerrero, and delgado 25 might send a signed notification of rejection. we feel that the notification of acceptance is necessary because in a certain kind of curriculum, evaluations for university professors conditionally accepted papers can be counted, and in other curriculums not. the camera-ready version will be signed by all the authors of the paper, not only the corresponding author like in the paper submission. after the camera-ready version of the paper has been accepted, the journal will send a signed notification of future publication. this notification will include the date of acceptance and an estimate date of publication. 
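the certificates described above are ordinary public-key certificates; the article does not prescribe a concrete certificate format or toolkit. as a hedged illustration only, the sketch below uses python's cryptography package to issue an author certificate signed by an institutional ca, with a validity period and the author's identifier carried as an extension. all field values, the rsa key choice, and the decision to place the author uri in a subject alternative name are assumptions of this example, not part of the proposed architecture.

```python
# minimal sketch: an institutional ca (e.g., a university) issues an author certificate
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa

# keys for the ca (the author's institution) and for the author
ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
author_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

ca_name = x509.Name([
    x509.NameAttribute(NameOID.COUNTRY_NAME, "ES"),
    x509.NameAttribute(NameOID.ORGANIZATION_NAME, "Universitat Politecnica de Catalunya"),
])
author_name = x509.Name([
    x509.NameAttribute(NameOID.COMMON_NAME, "ruben.tous"),
    x509.NameAttribute(NameOID.EMAIL_ADDRESS, "rtous@ac.upc.edu"),
])

now = datetime.datetime.utcnow()
cert = (
    x509.CertificateBuilder()
    .subject_name(author_name)
    .issuer_name(ca_name)
    .public_key(author_key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=365))   # validity period
    # carry the author uri (see the naming scheme below) as a subject alternative name;
    # this placement is an assumption of the sketch
    .add_extension(
        x509.SubjectAlternativeName(
            [x509.UniformResourceIdentifier("author://es_upc.dac/ruben.tous")]
        ),
        critical=False,
    )
    .sign(ca_key, hashes.SHA256())                          # signed by the ca's private key
)
print(cert.subject, cert.not_valid_after)
```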
finally, once the paper has been published, the journal will send a signed notification of publication to the author. the reason for having both notification of future publication and notification of publication is that, again, some curriculum evaluations might be flexible enough to count papers that have been accepted for future publication, while stricter ones state explicitly that they only accept published papers. once this process has been completed, a citation-analysis system will only need to import the authors' ca certificates (that is, the certificates of the universities, research centers, and companies) and the publishers' ca certificates (like acm, ieee, springer, lita, etc.) to be able to verify all the signed information. a chain of cas will be possible both with authors (for example, university, department, and research line) and with publications (for example, publisher and journal). ■■ universal resource identifiers to ensure that authors' uris are unique, they will have a tree structure similar to what urls have. the first-level element of the uri will be the author's organization (be it a university or a research center) id. this organization id will be composed of the country code top-level domain (cctld) and the organization name, separated by an underscore.5 the citation-analysis system will be responsible for assigning these identifiers and ensuring that all organizations have different identifiers. then, in the same manner, each organization will assign second-level elements (similar to departments) and so forth. author's ca_id: <cctld>_<organization name>; example: es_upc. author's uri: author://<ca_id>.<second-level element>/ . . . /<author name>; example: author://es_upc.dac/ruben.tous (in this example, "es" is the cctld for spain, upc (universitat politècnica de catalunya) is the university, and dac (departament d'arquitectura de computadors) is the department.)
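as a small illustration of the naming scheme just described, the helper below assembles and loosely checks author uris of the form author://<cctld>_<organization>.<unit>/<author name>. the exact grammar (allowed characters, depth of the organizational path) is not fully specified in the article, so this is a sketch under assumptions rather than a definitive format.

```python
# sketch: building and loosely validating author uris such as
# author://es_upc.dac/ruben.tous (structure assumed from the example in the text)
import re

def make_author_uri(cctld: str, organization: str, units: list[str], author: str) -> str:
    """Compose an author URI from a country code, an organization, organizational
    units (e.g., a department), and a dotted author name."""
    ca_id = f"{cctld.lower()}_{organization.lower()}"
    authority = ".".join([ca_id, *[u.lower() for u in units]])
    return f"author://{authority}/{author.lower()}"

AUTHOR_URI = re.compile(
    r"^author://(?P<cctld>[a-z]{2})_(?P<org>[a-z0-9-]+)"   # ca_id: cctld_organization
    r"(?:\.(?P<units>[a-z0-9.-]+))?"                        # optional unit path
    r"/(?P<author>[a-z0-9_.-]+)$"                           # author segment
)

def is_author_uri(uri: str) -> bool:
    return AUTHOR_URI.match(uri) is not None

uri = make_author_uri("es", "upc", ["dac"], "ruben.tous")
print(uri)                 # author://es_upc.dac/ruben.tous
print(is_author_uri(uri))  # True
```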
the ca will manage changes in name, e-mail, and address by generating a new certificate in which the former certificate will move to the list of former information. changes in affiliation will be managed by the new ca, which will generate a new certificate with the current information. since the new certificate will have a new uri, the ca also will generate a signed link to the previous uri. therefore the citation-analysis system will be able to recognize the contributions signed with both certificates as contributions made by the same author. it will be the responsibility of the new ca to verify that the author was indeed affiliated to the former organization (which we consider a very feasible requirement). every time an author (or group of authors) submits a paper to a conference, workshop, or journal, the corresponding author will digitally sign a metadata graph describing the paper submission event. although the paper submission will only be signed by the corresponding author, it will include the uris of all the authors. journals (and also conferences and workshops) will have a certificate that contains their related information. their ca will be the organization or editorial board behind them (for instance, acm, ieee, springer, lita, etc.). if a paper is accepted, the journal will send a signed notification of acceptance, which will include the reviews, the comments from the editor, and the conditions for the paper to be accepted. if the paper is rejected, the journal 26 information technology and libraries | march 2011 ■■ microsoft’s conference management toolkit (cmt; http://cmt.research.microsoft.com) is a conference management service sponsored by microsoft research. it uses https to provide confidentiality, but it is a service for which you have to pay. although some of the web-based systems provide confidentiality through https, none of them provides nonrepudiation, which we feel is even more important. this is so because nonrepudiation allows authors to certify their publications to their curriculum evaluators. our proposed scheme always provides nonrepudiation because of its use of signatures. curriculum evaluators don’t need to search for the publisher’s website to find the evaluated author’s paper. in addition, our proposed scheme allows curriculum evaluations to be performed by computer programs. and confidentiality can easily be achieved by encrypting the messages with the public key of the destination of the message. it should not be difficult for authors to obtain the public key for the conference or journal (which could be included in its “call for papers” or included on its webpage). and, because the paper-submission message includes the author’s public key, notifications of acceptance, rejection, and publication can be encrypted with that key. ■■ modeling the scholarly communication process citation analysis systems operate over metadata about the scholarly communication process. currently, these metadata are usually automatically generated by the citation-analysis systems themselves, generally through a programmatic analysis of the scholarly artifacts unstructured textual contents. these techniques have several drawbacks, as enumerated already, but especially regarding the fact that there is metadata that cannot be inferred from the contents of a paper, like all the aspects of the publishing process. 
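the nonrepudiation argument above rests on ordinary digital signatures over the interchanged metadata. as a minimal sketch only, the code below (python, cryptography package) signs the bytes of a serialized metadata document and verifies the signature, i.e., the simpler of the two graph-signing approaches discussed later in this article; the key handling, payload, and padding choices are assumptions of the example, not requirements of the proposed architecture.

```python
# sketch: a journal signs the bytes of a serialized metadata document (e.g., the
# rdf/xml for a notification of acceptance); an author or a citation-analysis
# system later verifies the signature with the journal's public key.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

journal_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
journal_public_key = journal_key.public_key()  # in practice taken from the journal's certificate

metadata = b"<rdf:RDF>...notification of acceptance...</rdf:RDF>"  # placeholder payload

signature = journal_key.sign(
    metadata,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)

try:
    journal_public_key.verify(
        signature,
        metadata,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256(),
    )
    print("signature valid: the journal cannot later repudiate this notification")
except InvalidSignature:
    print("signature invalid or metadata tampered with")
```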
to allow citation-analysis systems accessing metadata about the entire scholarly artifacts lifecycle, we suggest a metadata model that captures a great part of the scholarly domain's static and dynamic semantics. this model is based on knowledge representation techniques in semantic web, such as resource description framework (rdf) graphs and web ontology language (owl) ontologies. metadata and rdf the term "metadata" typically refers to a certain data representation that describes the characteristics of an information-bearing entity (generally another data representation such as a physical book or a digital video file). metadata plays a privileged role in the scholarly

creations' uris are built in a similar manner to authors' uris, but in this case the use of the country code as part of the publisher's id is optional. because a creation and its metadata evolve through different stages (submission and camera-ready), we will use different uris for each phase. we propose the use of this kind of uri instead of other possible schemes such as the digital object identifier (doi), because the ones proposed in this paper have the advantage of being human readable and containing the cas chain.6 of course, that doesn't mean that once published a paper cannot obtain a doi or another kind of identifier. publisher's ca_id: <publisher name> or <cctld>_<publisher name>; examples: lita and it_italianjournalofzoology. creation's uri: creation://<publisher's ca_id>.<publication>/ . . . /<creation id>; example: creation://lita.ital/vol27_num1_paper124. confidentiality and nonrepudiation nowadays, some conferences manage their paper submissions and notifications of acceptance (with their corresponding reviews) through e-mail, while others use a web-based application, such as edas (http://edas.info/). the e-mail-based system has no means of providing any kind of confidentiality; each router through which the e-mails travel can see their contents (paper submissions and paper reviews). the web-based system can provide confidentiality through http secure (https), although some of the most popular applications (such as edas and myreview) do not provide it; their developers may not have thought that it was an important feature. the following is a short list of some of the existing web-based systems:
■■ edas (http://edas.info/) is probably the most popular system. it can manage a large number of conferences and special issues of journals. it does not provide confidentiality.
■■ myreview (http://myreview.intellagence.eu/index.php) is an open-source web application distributed under the gpl license for managing the paper submissions and paper reviews of a conference or journal. myreview is implemented with php and mysql. it does not provide confidentiality.
■■ conftool (http://www.conftool.net) is another web-based management system for conferences and workshops. a free license of the standard version is available for noncommercial conferences and events with fewer than 150 participants. it uses https to provide confidentiality.
for the purpose of the reference architecture described in this paper, we do not instruct which of the two described approaches for signing rdf graphs is to be used. the decision will depend on the implementation (i.e., on how the graphs will be interchanged and processed). owl and an ontology for the scholarly context to allow modeling the scholarly communication process with rdf graphs, we have designed an owl description logic (dl) ontology.
owl is a vocabulary for describing properties and classes of rdf resources, complementing rdfs’s capabilities for providing semantics for generalization hierarchies of such properties and classes. owl enriches the rdfs vocabulary by adding, among others, relations between classes (e.g., disjointness), cardinality (e.g., “exactly one”), equality, richer typing of properties, characteristics of properties (e.g., symmetry), and enumerated classes. owl has the influence of more than ten years of dl research. this knowledge allowed the set of constructors and axioms supported by owl to be carefully chosen so as to balance the expressive requirements of typical applications with a requirement for reliable and efficient reasoning support. a suitable balance between these computational requirements and the expressive requirements was achieved by basing the design of owl on the sh family of description logics.10 the language has three increasingly expressive sublanguages designed for different uses: owl lite, owl dl, and owl full. we have chosen owl dl to define the ontology for capturing the static and dynamic semantics of the scholarly communication process. with respect to the other versions of owl, owl dl offers the most expressiveness while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations will finish in finite time). owl dl is so named because of its correspondence with description logics. figure 3 shows a simplified graphical view of the owl ontology we have defined for capturing static and dynamic semantics of the scholarly communication process. figure 4, figure 5, and figure 6 offer a (partial) tabular representation of the main classes and properties of the ontology. in owl, properties are independent from classes, but we have chosen to depict them in an object-oriented manner to improve understanding. for the same reason we have represented some properties as arrows between classes, despite this information being already present in the tables. uris do not appear as properties in the diagrams because each instance of a class will be an rdf resource, and any resource has a uri according to the rdf model. these uris will follow the rules described in the above section, “reference architecture.” it’s worth mentioning that the selection of the included properties has been based in the study of several metadata formats and standards, such as dublin communication process by helping identify, discover, assess, and manage scholarly artifacts. because metadata are data, they can be represented through any the existing data representation models, such as the relational model or the xml infoset. though the represented information should be the same regardless of the formalism used, each model offers different capabilities of data manipulation and querying. recently, a not-so-recent formalism has proliferated as a metadata representation model: rdf from the world wide web consortium (w3c).7 we have chosen rdf for modeling the citation lifecycle because of its advantages with respect to other formalisms. rdf is modular; a subset of rdf triples from an rdf graph can be used separately, keeping a consistent rdf model. it therefore can be used with partial information, an essential feature in a distributed environment. the union of knowledge is mapped into the union of the corresponding rdf graphs (information can be gathered incrementally from multiple sources). 
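to make the modeling style concrete, the sketch below uses python's rdflib to declare a couple of owl classes and a property in the spirit of the ontology described here, and then asserts a paper-submission event as instance data (roughly what figure 1 depicts). the namespace and all class and property names are placeholders assumed for this example; the actual ontology's vocabulary is only partially reproduced in the text.

```python
# sketch: a tiny owl/rdf fragment in the spirit of the scholarly ontology,
# plus one "submitted" event instance; names in the sch namespace are assumed
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS, XSD

SCH = Namespace("http://example.org/scholarly#")  # placeholder namespace
g = Graph()
g.bind("sch", SCH)

# a few schema-level statements (owl classes and an object property)
g.add((SCH.Creation, RDF.type, OWL.Class))
g.add((SCH.Event, RDF.type, OWL.Class))
g.add((SCH.Submitted, RDF.type, OWL.Class))
g.add((SCH.Submitted, RDFS.subClassOf, SCH.Event))
g.add((SCH.concernsCreation, RDF.type, OWL.ObjectProperty))

# instance data: the action of submitting a paper for publication
submission = URIRef("creation://lita.ital/vol27_num1_paper124#submission")
paper = URIRef("creation://lita.ital/vol27_num1_paper124")
author = URIRef("author://es_upc.dac/ruben.tous")

g.add((submission, RDF.type, SCH.Submitted))
g.add((submission, SCH.concernsCreation, paper))
g.add((submission, SCH.correspondingAuthor, author))   # property name assumed
g.add((submission, SCH.date, Literal("2008-05-25", datatype=XSD.date)))

print(g.serialize(format="xml"))  # rdf/xml serialization, as in the article's figure 2
```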
rdf is the main building block of the semantic web initiative, together with a set of technologies for defining rdf vocabularies like rdf schema (rdfs) and the owl.8 rdf comprises several related elements, including a formal model and an xml serialization syntax. the basic building block of the rdf model is the triple subjectpredicate-object. in a graph-theory sense, an rdf instance is a labeled directed graph consisting of vertices, which represent subjects or objects, and labeled edges, which represent predicates (semantic relations between subjects and objects). coming back to the scholarly domain, our proposal is to model static knowledge (e.g., authors and papers metadata) and dynamic knowledge (e.g., “the action of accepting a paper for publication,” or “the action of submitting a paper for publication”) using rdf predicates. the example in figure 1 shows how the action of submitting a paper for publication could be modeled with an rdf graph. figure 2 shows how the example in figure 1 would be serialized using the rdf xml syntax (the abbreviated mode). so, in our approach, we model assertions as rdf graphs and subgraphs. to allow anybody (authors, publishers, citation-analysis systems, or others) to verify a chain of assertions, each involved rdf graph must be digitally signed by the proper principal. there are two approaches to signing rdf graphs (as also happens with xml instances). the first approach applies when the rdf graph is obtained from a digitally signed file. in this situation, one can simply verify the signature on the file. however, in certain situations the rdf graphs or subgraphs come from a more complex processing chain, and one could not have access to the original signed file. a second approach deals with this situation, and faces the problem of digitally signing the graphs themselves, that is, signing the information contained in them.9 for 28 information technology and libraries | march 2011 note that instances of submitted and accepted event classes will point to the same creation instance because no modification of the creation is performed between these events. on the other hand, instances of tobepublished and published event classes will point to different creation instances (pointed by the cameraready and publishedcreation properties) because of the final editorial-side modifications to which a work can be subject. ■■ advantages of the proposed trust scheme the following is a short list of security features provided by our proposed scheme and attacks against which our proposed scheme is resilient: core (dc), dc’s scholarly works application profile, vcard, and bibtex.11 figure 4 shows the class publication and its subclasses, which represent the different kinds of publication. in the figure, we only show classes for journals, proceedings, and books. but it could obviously be extended to contain any kind of publication. figure 5 contains the classes for the agents of the ontology (i.e., the human beings that author papers and book chapters and the organizations to which human beings are affiliated or that edit publications). the figure also includes the creation class (e.g., a paper or a book chapter). finally, figure 6 has the part of the ontology that describes the different events that occur in the process of publishing a paper (i.e., paper submission, paper acceptance, notification of future publication, and publication). figure 1. 
example rdf graph

cryptography. the necessary changes do not apply only to the citation-management software, but also to all the involved parties in the publishing lifecycle (e.g., conference and journal management systems). authors and publishers would be the originators of the digitally signed evidences, thus user-friendly tools for generating and signing the rdf metadata would be required. plenty of rdf editors and digital signature toolkits exist, but we predict that conference and journal management systems such as edas could easily be extended to provide integrated functionalities for generating and processing digitally signed metadata graphs. this could be transparent to the users because the rdf documents would be automatically generated (and also signed in the case of the publishers) during the creating–editing–publishing process. because our approach is based on a pki trust scheme, we rely on a special setup assumption: the existence of cas, which certify that the identity information and the public key contained within the public key certificates of authors and publishers belong together. to get a publication recognized by a reliable citation-analysis system, an author or a publisher would need a public-key certificate issued by a ca trusted by this citation-analysis system. the selection of trusted
■■ an author can certify to any evaluation entity that will evaluate his or her curriculum the publications that he or she has done.
■■ an evaluator entity can query the citation-analysis system and get all the publications that a certain author has done.
■■ an author cannot forge notifications of publication.
■■ a publisher cannot repudiate the fact that it has published an article once it has sent the certificate.
■■ two or more authors cannot team up and make the system think that they are the same person to have more publications in their accounts (not even if they happen to have the same name).
■■ implications
the adoption of the approach proposed in this paper has certain implications in terms of technological changes but also in terms of behavioral changes at some of the stages of the scholarly publishing workflow. regarding the technological impact, the approach relies on the use of semantic web technologies and public-key

figure 2. example rdf/xml representation of graph in figure 1
figure 3. owl ontology for capturing the scholarly communication process
figure 4. part of the ontology describing publications

the citation-analysis system obtains the information or whether the information is duplicated. the proposed approach guarantees that the citation-analysis subsystem can always verify the provenance and trust of the metadata, and the use of unique identifiers ensures the detection of duplicates. our approach also implies minor behavioral changes for authors, mainly related to the management of public-key certificates, which is often required for many other tasks nowadays. a collateral benefit of the approach would be the automation of the copyright transfer procedure, which in most cases still relies on handwritten signatures.
authors would only be required to have their public-key certificate at hand (probably installed in the web browser), and the conference and journal management software would do all the work. cas by citation-analysis systems would require the deployment of the necessary mechanisms to allow an author or a publisher to ask for the inclusion of his or her institution in the list. however, this process would be eased if some institutional cas belonged to trust hierarchies (e.g., national or regional), so including some higher-level cas makes the inclusion of cas of some small institutions easier. another technological implication is related to the interchange and storage of the metadata. users and publishers should save the signed metadata coming from a publishing process digitally, and citation-analysis systems should harvest the digitally signed metadata. the metadata-harvesting process could be done in several different ways; but here raises an important benefit of the presented approach: the fact that it does not matter where figure 5. part of the ontology describing agents and creations 32 information technology and libraries | march 2011 domain, but which we have taken in consideration. in our approach, static and dynamic metadata cross many trust boundaries, so it is necessary to apply trust management techniques designed to protect open and decentralized systems. we have chosen a public-key infrastructure (pki) design to cover such a requirement. however, other approaches exist, such as the one by khare and rifkin, which combines rdf with digital signatures in a manner related to what is known as the “web of trust.”13 one aspect of any approach dealing with rdf and cryptography is how to digitally sign rdf graphs. as described above, in the section “modeling the scholarly communication process with semantic web knowledge representation techniques,” there are two different approaches for such a task, signing the file from which the graph will be obtained (which is the one we have chosen) or digitally signing the graphs themselves (the information represented in them), as described by carroll.14 ■■ conclusions the work presented in this paper describes a reference architecture that aims to provide reliability to the citation and citation-tracking lifecycle. the paper defends that current practices in the analysis of impact of scholarly artifacts entail serious design and security flaws, including nonidentical instances confusion, author-naming conflicts, fake citing, repudiation, impersonation, etc. ■■ related work as far as we know, this is the first paper to combine semantic web technologies and public-key cryptography to achieve reliable citation analysis in scholarly publishing. regarding the use of ontologies and semantic web technologies for modeling the scholarly domain, we highlight the research by rodriguez, bollen, and van de sompel.12 they define a semantic model for the scholarly communication process, which is used within an associated large-scale semantic store containing bibliographic, citation, and use data. this work is related to the mesur (metrics from scholarly usage of resources) project (http://www.mesur.org) from los alamos national laboratory. the project’s main goal is providing novel mechanisms for assessing the impact of scholarly communication items, and hence of scholars, with metrics derived from use data. 
■■ related work
as far as we know, this is the first paper to combine semantic web technologies and public-key cryptography to achieve reliable citation analysis in scholarly publishing. regarding the use of ontologies and semantic web technologies for modeling the scholarly domain, we highlight the research by rodriguez, bollen, and van de sompel.12 they define a semantic model for the scholarly communication process, which is used within an associated large-scale semantic store containing bibliographic, citation, and use data. this work is related to the mesur (metrics from scholarly usage of resources) project (http://www.mesur.org) from los alamos national laboratory. the project’s main goal is providing novel mechanisms for assessing the impact of scholarly communication items, and hence of scholars, with metrics derived from use data. as in our case, the approach by rodriguez, bollen, and van de sompel models static and dynamic aspects of the scholarly communication process using rdf and owl. however, contrary to that approach, our work focuses on modeling the dynamic aspects of the creation–editing–publishing workflow, whereas rodriguez, bollen, and van de sompel focus on modeling the use of already-published bibliographic resources.
regarding the combination of semantic web technologies with security aspects and cryptography, there exist several works that do not specifically focus on the scholarly domain but that we have taken into consideration. in our approach, static and dynamic metadata cross many trust boundaries, so it is necessary to apply trust-management techniques designed to protect open and decentralized systems. we have chosen a public-key infrastructure (pki) design to cover such a requirement. however, other approaches exist, such as the one by khare and rifkin, which combines rdf with digital signatures in a manner related to what is known as the “web of trust.”13 one aspect of any approach dealing with rdf and cryptography is how to digitally sign rdf graphs. as described above, in the section “modeling the scholarly communication process with semantic web knowledge representation techniques,” there are two different approaches to such a task: signing the file from which the graph will be obtained (the one we have chosen) or digitally signing the graphs themselves (the information represented in them), as described by carroll.14
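the difference between the two signing strategies can be made concrete with a small example. the sketch below is an assumption-laden illustration rather than carroll's algorithm: it shows that the same triples serialized as turtle and as rdf/xml produce different byte streams (so a file-level signature binds one particular document), while a canonical rendering of a blank-node-free graph (here simply sorted n-triples) yields the same digest for both; handling blank nodes is the harder problem that carroll's canonicalization addresses.

# sketch: file-level signing binds one serialization, while hashing a
# canonical form (sorted n-triples, no blank nodes) binds the graph itself.
import hashlib
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

def as_text(s):
    return s.decode("utf-8") if isinstance(s, bytes) else s

def graph_digest(g: Graph) -> str:
    # canonical form for a blank-node-free graph: sorted n-triples lines
    lines = sorted(l for l in as_text(g.serialize(format="nt")).splitlines() if l.strip())
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

g = Graph()
paper = URIRef("http://example.org/paper/1")          # hypothetical uri
g.add((paper, DC.title, Literal("a paper")))
g.add((paper, DC.date, Literal("2008-05-25")))

turtle_doc = as_text(g.serialize(format="turtle"))
xml_doc = as_text(g.serialize(format="xml"))

# file-level: the two documents carry the same triples but different bytes
print(hashlib.sha256(turtle_doc.encode()).hexdigest()
      == hashlib.sha256(xml_doc.encode()).hexdigest())        # False

# graph-level: a canonical rendering is the same whatever the serialization
g_ttl = Graph().parse(data=turtle_doc, format="turtle")
g_xml = Graph().parse(data=xml_doc, format="xml")
print(graph_digest(g_ttl) == graph_digest(g_xml))             # True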
■■ conclusions
the work presented in this paper describes a reference architecture that aims to provide reliability to the citation and citation-tracking lifecycle. the paper argues that current practices in the analysis of the impact of scholarly artifacts entail serious design and security flaws, including confusion of nonidentical instances, author-naming conflicts, fake citing, repudiation, and impersonation. the architecture presented in this work is based on the use of digitally signed rdf graphs in the different stages of the scholarly publishing workflow, in such a manner that authors, publishers, repositories, and citation-analysis systems can have access to independent, reliable evidence. the architecture aims to allow the creation of a reliable information space that reflects not just static knowledge but also dynamic relationships, capturing the full complexity of trust relationships between the different parties in the scholarly domain. to allow modeling the scholarly communication process with rdf graphs, we have designed an owl dl ontology. rdf graphs carrying instances of classes and properties from the ontology will be digitally signed and interchanged between parties at the different stages of the creation–editing–publishing process. citation-management systems will have access to these signed metadata graphs and will be able to verify their provenance and trust before incorporating them into their repositories. because citation analysis has become a critical component in scholarly impact-factor calculation, and considering the relevance of this metric within the scholarly publishing value chain, we contend that the value of a reliable solution justifies the effort of introducing technological changes within the publishing lifecycle. we believe that these changes, which could be easily automated and incorporated into modern conference and journal editorial systems, are justified considering the serious flaws of the established solutions and the relevance that citation-analysis systems are acquiring in our society.
■■ acknowledgment
this work has been partly supported by the spanish administration (tec2008-06692-c02-01 and tsi2007-66869-c02-01).
references and notes
1. herbert van de sompel et al., “an interoperable fabric for scholarly value chains,” d-lib magazine 12, no. 10 (2006), http://www.dlib.org/dlib/october06/vandesompel/10vandesompel.html (accessed jan. 19, 2011).
2. boletín oficial del estado (b.o.e.) 054, 04/03/2005, sec. 3, pag. 7875 a 7887, http://www.boe.es/boe/dias/2005/03/04/pdfs/a07875–07887.pdf (accessed june 24, 2010). see also thomson isi web of knowledge, http://www.isiwebofknowledge.com/ (accessed june 24, 2010); and eugene garfield, citation indexing: its theory and application in science, technology and humanities (new york: wiley, 1979).
3. judit bar-ilan, “an ego-centric citation analysis of the works of michael o. rabin based on multiple citation indexes,” information processing & management 42, no. 6 (2006): 1553–66.
4. alfred arsenault and sean turner, “internet x.509 public key infrastructure: pkix roadmap,” draft, pkix working group, sept. 8, 1998, http://tools.ietf.org/html/draft-ietf-pkix-roadmap-00 (accessed june 24, 2010).
5. internet assigned numbers authority (iana), root zone database, http://www.iana.org/domains/root/db/ (accessed june 24, 2010).
6. for information on the doi system, see bill rosenblatt, “the digital object identifier: solving the dilemma of copyright protection online,” journal of electronic publishing 3, no. 2 (1997).
7. resource description framework (rdf), world wide web consortium, feb. 10, 2004, http://www.w3.org/rdf/ (accessed june 24, 2010).
8. “rdf vocabulary description language 1.0: rdf schema, w3c working draft 23 january 2003,” http://www.w3.org/tr/2003/wd-rdf-schema-20030123/ (accessed june 24, 2010); “owl web ontology language overview, w3c recommendation 10 february 2004,” http://www.w3.org/tr/owl-features/ (accessed june 24, 2010).
9. jeremy j. carroll, “signing rdf graphs,” in the semantic web—iswc 2003, vol. 2870, lecture notes in computer science, ed. dieter fensel, katia sycara, and john mylopoulos (new york: springer, 2003).
10. ian horrocks, peter f. patel-schneider, and frank van harmelen, “from shiq and rdf to owl: the making of a web ontology language,” web semantics: science, services and agents on the world wide web 1 (2003): 10–11.
11. see the dublin core metadata initiative (dcmi), http://dublincore.org/ (accessed june 24, 2010); julie allinson, pete johnston, and andy powell, “a dublin core application profile for scholarly works,” ariadne 50 (2007), http://www.ukoln.ac.uk/repositories/digirep/index/eprints_type_vocabulary_encoding_scheme, http://www.ariadne.ac.uk/issue50/allinson-et-al/ (accessed dec. 27, 2010); world wide web consortium, “representing vcard objects in rdf/xml: w3c note 22 february 2001,” http://www.w3.org/tr/2001/note-vcard-rdf-20010222/ (accessed dec. 3, 2010); and for bibtex, see “entry types,” http://nwalsh.com/tex/texhelp/bibtx-7.html (accessed june 24, 2010).
12. marko a. rodriguez, johan bollen, and herbert van de sompel, “a practical ontology for the large-scale modeling of scholarly artifacts and their usage,” proceedings of the 7th acm/ieee joint conference on digital libraries (2007): 278–87.
13. rohit khare and adam rifkin, “weaving a web of trust,” world wide web journal 2, no. 3 (1997): 77–112.
14. carroll, “signing rdf graphs.”
bibliographic displays in web catalogs: does conformity to design guidelines correlate with user performance?
joan m. cherry, paul muter, and steve j. szigeti
joan m. cherry (joan.cherry@utoronto.ca) is a professor in the faculty of information studies; paul muter (muter@psych.utoronto.ca) is an assistant professor in the department of psychology; and steve j. szigeti (szigeti@fis.utoronto.ca) is a doctoral student in the faculty of information studies and the knowledge media design institute, all at the university of toronto, canada.
the present study investigated whether there is a correlation between user performance and compliance with screen-design guidelines found in the literature. rather than test individual guidelines and their interactions, the authors took a more holistic approach and tested a compilation of guidelines. nine bibliographic display formats were scored using a checklist of eighty-six guidelines. twenty-seven participants completed ninety search tasks using the displays in a simulated web environment. none of the correlations indicated that user performance was statistically significantly faster with greater conformity to guidelines. in some cases, user performance was actually significantly slower with greater conformity to guidelines. in a supplementary study, a different set of forty-three guidelines and the user performance data from the main study were used. again, none of the correlations indicated that user performance was statistically significantly faster with greater conformity to guidelines.
attempts to establish generalizations are ubiquitous in science and in many areas of human endeavor. it is well known that this enterprise can be extremely problematic in both applied and pure science.1 in the area of human-computer interaction, establishing and evaluating generalizations in the form of interface-design guidelines are pervasive and difficult challenges, particularly because of the intractably large number of potential interactions among guidelines. using bibliographic display formats from web catalogs, the present study utilizes global evaluation by correlating user performance in a search task with conformity to a compilation of eighty-six guidelines (divided into four subsets). the literature offers many design guidelines for the user interface, some of which cover all aspects of the user interface and some of which focus on one aspect of the user interface—e.g., screen design. tullis, in chapters in two editions of the handbook of human-computer interaction, reviews the work in this area.2 the earlier chapter provides a table describing the screen-design guidelines available at that time. he includes, for example, galitz, who he notes has several hundred guidelines addressing general screen design, and smith and mosier, who he notes have about three hundred guidelines addressing the display of data.3 earlier guidelines tended to be generic. more recently, guidelines have been developed for specific applications—e.g., web sites for airline travel agencies, multimedia applications, e-commerce, children, bibliographic displays, and public-information kiosks.4 although some of the guidelines in the literature are based on empirical evidence, many are based on expert opinion and have not been tested. some of the research-based guidelines have been tested in isolation or in combination with only a few other guidelines. the national cancer institute (nci) web site, research-based web design and usability guidelines, rates sixty guidelines on a scale of 0 to 5 based on the strength of the evidence.5 the more valid the studies that directly support the guideline, the higher the rating. in interpreting the scores, the site advises that scores of 1, 2, or 3 suggest that “more evidence is needed to strengthen the designer’s overall confidence in the validity of a guideline.” of the sixty guidelines on the site, forty-six (76.7 percent) fall into this group. in 2003, the united states department of health and human services web site, research-based web design and usability guidelines, rated 187 guidelines on a different five-point scale.6 eighty-two guidelines (43.9 percent) meet the criteria of having strong or medium research support. another forty-eight guidelines (25.7 percent) are rated as having weak research support. thus, there is some research support for 69.6 percent of the guidelines. in addition to the issue of the validity of individual guidelines, there may be interactions among guidelines. an interaction occurs if the effect of a variable depends on the level of another variable—e.g., an interaction occurs if the usefulness of a guideline depends on whether some other guideline is being followed. a more severe problem is the potential for high-order interactions: the nature of a two-way interaction may depend on the level of a third variable, the nature of a three-way interaction may depend on the level of a fourth variable, and so on. because of the combinatorial explosion, if there are more than a few variables the number of possible interactions becomes huge.
as cronbach stated: “once we attend to interactions, we enter a hall of mirrors that extends to infinity.”7 with a large set of guidelines, it is impractical to test all of the guidelines and all of the interactions, including high-order interactions. muter suggested several approaches for handling the problem of intractable high-order interactions, including adapting optimizing algorithms such as simplex, seeking “robustness in variation,” re-construing the problem, and pruning the alternative space.8 the present study utilizes another approach: global evaluation by correlating user performance with conformity to a set of guidelines. using this method, particular guidelines and interactions are not tested, but the set and subsets are tested globally, and some of the interactions, including high-order interactions, are captured. bibliographic displays were scored using a compilation of guidelines, divided into four subsets, and the performance of users doing a set of search tasks using the displays was measured. an attempt was made to determine whether users find information more quickly on displays that receive high scores on checklists of screen-design guidelines. the authors are aware of only two studies that have investigated conformity with a set of guidelines and user performance, and both included only ten guidelines. d’angelo and twining measured the correlation between compliance with a set of ten standards (d’angelo standards) and user comprehension.9 the d’angelo standards are in the form of principles for web-page design, based on a review of the literature.10 d’angelo and twining found a small correlation (.266) between the number of standards met and user comprehension.11 they do not report on statistical significance, but from the data provided in the paper it appears that the correlation is not significant. gerhardt-powals compared an interface designed according to ten cognitive engineering principles to two control interfaces and found that the cognitively engineered interface resulted in statistically significantly superior user performance.12 the guidelines used in the present study were based on a list compiled by chan to evaluate displays of bibliographic records in online library catalogs.13 the set of guidelines was broken down into four subsets. participants in this study were given search tasks and clicked on the requested item on a bibliographic display. the main dependent variable of interest was response time.
■■ method
participants
twenty-seven participants were recruited through the university of toronto psychology 100 subject pool. seventeen were female; ten were male. most (twenty) were in the age group 17 to 24; three were in the age group 25 to 34, and four were in the age group 35 to 44. one had never used the web; all others reported using the web one or more hours per week. participants received course credit.
design
to control for the effects of fatigue, practice runs, and the like, the order of trials was determined by two orthogonal 9 x 9 latin squares—one to select a display and one to select a book record. each participant completed five consecutive search tasks—author, title, call number, publisher, and date—in a random order, with each display-book combination. (the order of the five search tasks was randomized each time.) this procedure was repeated, so that in total each participant did ninety tasks (9 displays x 5 tasks x 2 repetitions).
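the counterbalancing scheme can be illustrated with a short sketch. the paper does not say which orthogonal squares were used, so the cyclic construction below (rows i + j mod 9 and i + 2j mod 9, which are latin and mutually orthogonal for order 9) is an assumption made purely for illustration.

# sketch: two orthogonal 9 x 9 latin squares assign a display and a book
# record to each of the nine blocks for a participant; the five search
# tasks are then shuffled within each block.
import random

N = 9
square_display = [[(i + j) % N for j in range(N)] for i in range(N)]        # latin square
square_record = [[(i + 2 * j) % N for j in range(N)] for i in range(N)]     # latin, orthogonal to the first

TASKS = ["author", "title", "call number", "publisher", "date"]

def trial_order(participant: int) -> list:
    """return the 90 trials (9 displays x 5 tasks x 2 repetitions) for one participant."""
    row = participant % N
    trials = []
    for repetition in range(2):
        for block in range(N):
            display = square_display[row][block]
            record = square_record[row][block]
            tasks = TASKS[:]
            random.shuffle(tasks)          # task order randomized each time
            trials.extend((display, record, task) for task in tasks)
    return trials

print(len(trial_order(0)))   # 90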
materials and apparatus
the study used nine displays from library catalogs available on the web. they were selected to represent a variety of systems and to illustrate the remarkable diversity in bibliographic displays in web catalogs. the displays differed in the amount of information included, the structure of the display, employment of highlighting techniques, and use of graphical elements. four examples of the nine displays are presented in figures 1a, 1b, 1c, and 1d. the displays were captured and presented in an interactive environment using active server page (asp) software. the look of the displays was retained, but hypertext links were deactivated. nine different book records were used to provide the content for the displays. items selected were those that would be readily understood by most users—e.g., books by saul bellow, norman mailer, and john updike. the guidelines were based on a list compiled by chan from a review of the literature in human-computer interaction and library science.14 the list does not include guidelines about the process of design. chan formatted the guidelines as a checklist for bibliographic displays in online catalogs. in work reported in 1996, cherry and cox modified the checklist for use with bibliographic displays in web catalogs.15 in a 1998 paper, cherry reported on evaluations of bibliographic displays in catalogs of academic libraries, based on chan’s data for twelve opacs and data for ten web catalogs evaluated by cherry and cox using a modification of the 1996 checklist for web catalogs.16 the findings showed that, on average, displays in opacs scored 58 percent and displays in web catalogs scored 60 percent. the 1996 checklist of guidelines was modified by herrero-solana and de moya-anegón, who used it to explore the use of multivariate analysis in evaluating twenty-five latin american catalogs.17 for the present study, four questions that were considered less useful were removed from the checklist used in cherry’s 1998 analysis. the checklist consisted of four sections or subsets: labels (these identify parts of the bibliographic description); text (the display of the bibliographic, holdings/location, and circulation status information); instructions (includes instructions to users, informational messages, and options available); and layout (includes identification of the screen, the organization of the bibliographic information, spacing, and consistency of information presentation). items on the checklist were phrased as questions requiring yes/no responses. examples of the items are: labels: “are all fields/variables labeled?” text: “is the text in mixed case (upper and lowercase)?” instructions: “are instructional sentences or phrases simple, concise, clear, and free of typographical errors?” and layout: “is the width of the display no more than forty to sixty characters?” the set used in the present study contained eighty-six guidelines in total, of which forty-eight were generic and could be applied to any application; the other thirty-eight were specific to bibliographic displays in web catalogs. the experiment was run on a pentium computer with a seventeen-inch sony color monitor with a standard keyboard and mouse.
figures 1a, 1b, 1c, and 1d. examples of display
procedure
participants were tested individually. five practice trials with a display and book record not used in the experiment familiarized the participant with the tasks and software. at the beginning of a trial, the message “when ready, click” appeared on the screen. when the participant clicked on the mouse, a bibliographic display appeared along with a message at the top of the screen indicating whether the participant should click on the author, title, call number, publisher, or date of publication—e.g., “current task: author.” participants clicked on what they thought was the correct answer. if they clicked on any other area, the display was shown again. an incorrect click was not defined as an error—in effect, percent correct was always 100—but an incorrect click would of course add to the response time. the software recorded the time to successfully complete each search, the identification for the display and the book record, and the search-task type. when a participant completed the five search tasks for a display, a message was shown indicating the average response time on that set of tasks. when participants completed the ninety search tasks, they were asked to rank the nine displays according to their preference. for this task, a set of laminated color printouts of the displays was provided. participants ranked the displays, assigning a rank of 1 to the display that they preferred most and 9 to the one they preferred least. they were also asked to complete a short background questionnaire. the entire session took less than forty-five minutes.
scoring the displays on screen-design guidelines
the authors’ experience has indicated that judging whether a guideline is met can be problematic: evaluators sometimes differ in their judgments. in this study, three evaluators assessed each of the nine displays independently. if there was any disagreement among the evaluators’ responses for a given question for a given display, that question was not used in the computation of the percentage score for that display. (a guideline regarding screen density was evaluated by only one evaluator because it was very time-consuming.) the total number of questions used to assess each display was eighty-six. the number of questions on which the evaluators disagreed ranged from twelve to thirty across the nine displays. all questions on which the three evaluators agreed for a given display were used in the calculation of the percentage score for that display. hence the percentage scores for the displays are based on a variable set and number of questions—from fifty-six to seventy-four.
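the scoring rule described here (count only the questions on which all three evaluators gave the same answer) is easy to state in code. the data structure below is hypothetical; it simply assumes each evaluator recorded a yes/no judgment per checklist question for a given display.

# sketch: compute a display's percentage score using only the checklist
# questions on which all three evaluators gave the same yes/no answer.
def display_score(judgments):
    """judgments: one dict per evaluator mapping question id -> True (met) / False (not met)."""
    questions = judgments[0].keys()
    unanimous = [q for q in questions
                 if len({evaluator[q] for evaluator in judgments}) == 1]
    met = sum(1 for q in unanimous if judgments[0][q])
    return 100.0 * met / len(unanimous)

# toy example with three evaluators and four questions
e1 = {"q1": True, "q2": True, "q3": False, "q4": True}
e2 = {"q1": True, "q2": False, "q3": False, "q4": True}
e3 = {"q1": True, "q2": True, "q3": False, "q4": True}
print(round(display_score([e1, e2, e3]), 1))   # q2 is dropped; 2 of 3 unanimous questions met = 66.7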
the subset of questions on which the three evaluators agreed for all nine displays was small—twenty-two questions.
■■ results
with regard to conformity to the guidelines, in addition to the overall scores for each display, which ranged from 42 percent to 65 percent, the percentage score was calculated for each subset of the checklist (labels, text, instructions, and layout). the time to successfully complete each search task was recorded to the nearest millisecond. (for some unknown reason, six of the 2,430 response times recorded [27 x 90] were 0 milliseconds. the program was written in such a way that the response-time buffer was cleared at the time of stimulus presentation, in case the participant clicked just before this time. these trials were treated as missing values in the calculation of the means.) six mean response times were calculated: author, title, call number, publisher, date, and the sum of the five response times, called all tasks. the mean of all tasks response times ranged from 13,671 milliseconds to 21,599 milliseconds for the nine formats. the nine display formats differed significantly on this variable according to an analysis of variance, f(8, 477) = 17.1, p < .001. the correlations between response times and guidelines-conformance scores are presented in table 1. it is important to note that a high correlation between response time and conformity to guidelines indicates a low correlation between user performance (speed) and conformity to guidelines.
table 1. correlations between scores on the checklist of screen design guidelines and time to complete search tasks: pearson correlation (sig. 2-tailed); n = 9 for all cells
                all tasks      author         title          call #         publisher      year
total score:    .469 (.203)    .401 (.285)    .870 (.002)    .547 (.127)    .035 (.930)    .247 (.522)
labels:         .722 (.028)    .757 (.018)    .312 (.413)    .601 (.087)    .400 (.286)    .669 (.049)
text:           -.260 (.500)   -.002 (.997)   .595 (.091)    -.191 (.623)   -.412 (.271)   -.288 (.452)
instructions:   .422 (.258)    .442 (.234)    .712 (.032)    .566 (.112)    .026 (.947)    .126 (.748)
layout:         .602 (.086)    -.102 (.794)   .383 (.308)    .624 (.073)    .492 (.179)    .367 (.332)
row 1 of table 1 contains correlations between the total guidelines score and response times; column 1 contains correlations between all tasks (the sum of the five response times) and guidelines scores. of course, the correlations in table 1 are not all independent of each other. only five of the thirty correlations in table 1 are significant at the .05 level, and they all indicate slower response times with higher conformity to guidelines. of the six correlations in table 1 indicating faster response times with higher conformity to guidelines, none approaches statistical significance. the upper left-hand cell of table 1 indicates that the overall correlation between total scores on the guidelines and the mean response time across all search tasks (all tasks) was 0.469 (df = 7, p = 0.203)—i.e., conformity to the overall checklist was correlated with slower overall response times, though this correlation did not approach statistical significance. figure 2 shows a scatter plot of the main independent variable, overall score on the checklist of guidelines, and the main dependent variable, the sum of the response times for the five tasks (all tasks). figure 3 shows a scatter plot for the highest obtained correlation: between score on the overall checklist of guidelines and the time to complete the title search task. visual inspection suggests patterns consistent with table 1: no correlation in figure 2, and slower search times with higher guidelines scores in figure 3.
figure 2. scatter plot for overall score on checklist of screen design guidelines and time to complete set of five search tasks
figure 3. scatter plot for overall score on checklist of screen design guidelines and time to complete “title” search tasks
finally, correlations were computed between preference and response times (all tasks response times and five specific-task response times) and between preference and conformity to guidelines (overall guidelines and four subsets of guidelines). none of the eleven correlations approached statistical significance.
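the statistics reported in this section are standard and straightforward to reproduce. the sketch below uses placeholder numbers (the per-display scores and mean response times are not reprinted here in full), but it shows the two computations involved: a pearson correlation across the nine display formats, and the critical value of r at the .05 level with df = 7, which is roughly .666 and explains why the title-task correlation of .870 is significant while the overall correlation of .469 is not.

# sketch: pearson correlation between guideline-conformance scores and mean
# response times for nine displays, plus the critical r for df = 7.
import math
from scipy import stats

scores = [42, 47, 50, 53, 55, 58, 60, 63, 65]                       # placeholder percentages
mean_rt = [15.1, 14.2, 16.8, 15.9, 17.4, 16.1, 18.3, 19.0, 20.5]    # placeholder seconds

r, p = stats.pearsonr(scores, mean_rt)
print(f"r = {r:.3f}, p = {p:.3f}")

# two-tailed critical r at alpha = .05 with n = 9 (df = n - 2 = 7)
df = 7
t_crit = stats.t.ppf(0.975, df)
r_crit = t_crit / math.sqrt(t_crit**2 + df)
print(f"critical r = {r_crit:.3f}")   # about .666, so .870 is significant and .469 is not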
■■ supplementary study
to further validate the results of the main study, it was decided to score the interfaces against a different set of guidelines based on the 2003 u.s. department of health and human services research-based web design and usability guidelines. this set consists of 187 guidelines and includes a rating for each guideline based on the strength of research evidence for that guideline. the present study started with the eighty-two guidelines rated as having either moderate or strong research support, as the definitions of both of these include “cumulative research-based evidence.”18 compliance with guidelines that address the process of design can only be judged during the design process, or via access to the interface designers. since this review process did not allow for that, a total of nine process-focused guidelines were discarded. this set of seventy-three guidelines was then compared with the sixty-guideline 2001 nci set, research-based web design and usability guidelines, with the intention of adding any outstanding nci guidelines supported by strong research evidence to the existing list of seventy-three. however, all of the strongly supported nci guidelines were already represented in the original seventy-three. finally, the guidelines in iso 9241, ergonomic requirements for office work with visual display terminals (vdts), part 11 (guidance on usability), part 12 (presentation of information), and part 14 (menu dialogues), were compared to the existing set of seventy-three, with the intention that any prescriptive guideline in the iso set not already included in the original seventy-three would be added.19 again, there were none. the seventy-three guidelines were organized into three thematic groups: (1) layout (the organization of textual and graphic material on the screen), (2) interaction (which included navigation or any element with which the user would interact), and (3) text and readability. all of the guidelines used were written in a manner allowing readers room for interpretation. the authors explicitly stated that they were not writing rules but, rather, guidelines, and recognized that their application must allow for a level of flexibility.20 this ambiguity creates problems in terms of assessing displays. in this study, two evaluators independently assessed the nine displays. the first evaluator applied all seventy-three guidelines and found thirty to be nonapplicable to the specific types of interfaces considered. the second evaluator applied the shortened list of forty-three guidelines. following the independent evaluations, the two evaluators compared assessments. the initial rate of agreement between the two assessments ranged from 49 percent to 70 percent across the nine displays. in cases where there was disagreement, the evaluators discussed their rationale for the assessment in order to achieve consensus.
■■ results of supplementary study
as with the initial study, in addition to the overall scores for each display, the percentage score was calculated for each subset of the checklist (layout, interaction, and text and readability). it is worth noting that the overall scores showed higher compliance with this second set of guidelines, ranging from 68 percent to 89 percent. the correlations between response times and guidelines-conformance scores are presented in table 2. again, it is important to note that a high correlation between response time and conformity to guidelines indicates a low correlation between user performance (speed) and conformity to guidelines.
table 2. correlations between scores on subset of the u.s. dept. of health and human services (2003) research-based web design and usability guidelines and time to complete search tasks: pearson correlation (sig. 2-tailed); n = 9 for all cells
                all tasks      author         title          call #         publisher      year
total score:    .292 (.445)    .201 (.604)    .080 (.839)    -.004 (.992)   .345 (.363)    .499 (.172)
layout:         -.308 (.420)   -.264 (.492)   -.512 (.159)   -.332 (.383)   .046 (.906)    -.294 (.442)
text:           .087 (.824)    -.051 (.895)   .712 (.032)    -.059 (.879)   -.095 (.808)   -.259 (.500)
interaction:    .638 (.065)    .603 (.085)    .055 (.887)    .439 (.238)    .547 (.128)    .625 (.072)
row 1 of table 2 contains correlations between the total guidelines score and response times; column 1 contains correlations between all tasks (the sum of the five response times) and guidelines scores. of course, the correlations in table 2 are not all independent of each other. only one of the twenty-four correlations in table 2 is significant at the .05 level, and it indicates a slower response time with higher conformity to guidelines. of the ten correlations in table 2 indicating faster response times with higher conformity to guidelines, none approaches statistical significance. the upper left-hand cell of table 2 indicates that the overall correlation between total scores on the guidelines and the mean response time across all search tasks (all tasks) was 0.292 (p = 0.445)—i.e., conformity to the overall checklist was correlated with slower overall response times, though this correlation did not approach statistical significance. figure 4 shows a scatter plot of the main independent variable, overall score on the checklist of guidelines, and the main dependent variable, the sum of the response times for the five tasks (all tasks). figure 5 shows a scatter plot for the highest-obtained correlation: between score on the text and readability category of guidelines and the time to complete the title search task. visual inspection suggests patterns consistent with table 2: no correlation in figure 4, and slower search times with higher guidelines scores in figure 5.
figure 4. scatter plot for subset of u.s. department of health and human services (2003) research-based web design and usability guidelines conformance score and total time to complete five search tasks
figure 5. scatter plot for text and readability category of u.s. department of health and human services (2003) research-based web design and usability guidelines and time to complete “title” search tasks
■■ discussion
in the present experiment and the supplementary study, none of the correlations indicating faster user performance with greater conformity to guidelines approached statistical significance. in some cases, user performance was actually significantly slower with greater conformity to guidelines—i.e., in some cases, there was a negative correlation between user performance and conformity to guidelines. the authors are aware of no other study indicating a negative correlation between user performance and conformity to interface-design guidelines. some researchers would not be surprised at a finding of zero correlation between user performance and conformity to guidelines, but a negative correlation is somewhat puzzling. a negative correlation implies that there is something wrong somewhere—perhaps incorrect underlying theories or an incorrect body of assumptions. such a negative correlation is not without precedent in applied science. in the field of medicine, before the turn of the twentieth century, seeing a doctor actually decreased the chances of improving health.21 presumably, medical guidelines of the time were negatively correlated with successful practice, and the negative correlation implies not just worthlessness but medical theories or beliefs that were actually incorrect and harmful.
the boundary conditions of the present findings are unknown. the present findings may be specific to the tasks employed—fairly simple search tasks. the findings may apply only to situations in which the user is switching formats frequently, as opposed to situations in which each user is using only one format. (a between-subjects design would test this possibility.) the findings may be specific to the two sets of guidelines used. with sets of ten guidelines, d’angelo and twining and gerhardt-powals found positive correlations between user performance and conformity to guidelines (though apparently not statistically significantly in the former study).22 the guidelines used in the authors’ main study and supplementary study tended to be more detailed than in the other two studies. detailed guidelines are sometimes seen as advantageous, since developers who use guidelines need to be able to interpret the guidelines in order to implement them. however, perhaps following a large number of detailed guidelines reduces the amount of personal judgment used and results in less effective designs. (designers of the nine displays used in the present study would not have been using either of the sets of guidelines used in our studies but may have been using some of the sources from which our guidelines were extracted.) as noted by cheepen in discussing guidelines for voice dialogues, sometimes a designer’s experience may be more valuable than a particular guideline.23 the lack of agreement in interpreting the guidelines was an unexpected but interesting factor revealed during the collection of data in both the main study and the supplementary study. while a higher rate of agreement had been expected, the differences raised an important point in the use of guidelines. if guidelines intentionally leave room for interpretation, what role do expert opinion and experience play in design? in the main study, the proportion of guidelines on which the evaluators disagreed ranged from 14 percent to 35 percent across the nine displays. in the supplementary study, both evaluators had experience in interface design through a number of different roles in the design process (both academic and professional). this meant the evaluators’ interpretations of the guidelines were informed by previous experience. the initial level of disagreement ranged from 30 percent to 51 percent across the nine displays. while it was possible to quickly reach consensus
on a number of assessments (because both evaluators recognized the high degree of subjectivity that is involved in design), it also led to longer discussions regarding the intentions of the guideline authors. a majority of the differences involved lack of guideline clarity (where one evaluator had indicated a meet-or-fail score, while another felt the guideline was either unclear or not applicable). does this imply that guidelines can best be applied by committees or groups of designers? the dynamic of such groups would add another complex variable to understanding the relationship between guideline conformity and user performance. future research should test other tasks and other sets of guidelines to confirm or refute the findings of the present study. there should also be investigation of other potential predictors of display effectiveness. for example, would the ratings of usability experts or graphic designers for a set of bibliographic displays be positively correlated with user performance? crawford, in response to a paper presenting findings from an evaluation of bibliographic displays using a previous version of the checklist of guidelines used in the main study, commented that the design of bibliographic displays still reflects art, not science.24 several researchers have discussed aesthetics and user-interface design. reed et al. noted the need to extend our understanding of the role of aesthetic elements in the context of user-interface guidelines and standards.25 ngo, teo, and byrne discussed fourteen aesthetic measures for graphic displays.26 norman discussed these ideas in “emotion and design: attractive things work better.”27 tractinsky, katz, and ikar found strong correlations between perceived aesthetic appeal and perceived usability.28 most empirical studies of guidelines have looked at one variable only or, at most, a small number of variables. the opposite extreme would be to do a study that examines a large number of variables factorially. for example, assuming eighty-six yes/no guidelines for bibliographic displays, it would be theoretically possible to do a factorial experiment testing all possible combinations of yes/no—2 to the 86th power. in such an experiment, all two-way interactions and higher interactions could be assessed, but such an experiment is not feasible. what the authors have done is somewhere between these two extremes. this study has the disadvantage that we cannot say anything about any individual guideline, but it has the advantage that it captures some of the interactions, including high-order interactions. despite the present results, the authors are not recommending abandoning the search for guidelines in interface design. at a minimum, the use of guidelines may increase consistency across interfaces, which may be helpful.
however, in some research domains, particularly when huge numbers of potential interactions result in extreme complexity, it may be advisable to allocate resources to means other than attempting to establish guidelines, such as expert review, relying on tradition, letting natural selection take its course, utilizing the intuitions of designers, and observing user interaction. indeed, in pure and applied research in general, perhaps more resources should be allocated to means other than searching for explicit generalizations. future research may better indicate when to attempt to establish generalizations and when to use other methods.
■■ acknowledgements
this work was supported by a social sciences and humanities research council general research grant awarded by the faculty of information studies, university of toronto, and by the natural sciences and engineering research council of canada. the authors wish to thank mark dykeman and gerry oxford, who developed the software for the experiment; donna chan, joan bartlett, and margaret english, who scored the displays with the first set of guidelines; everton lewis, who conducted the experimental sessions; m. max evans, who helped score the displays with the supplementary set of guidelines; and robert l. duchnicky, jonathan l. freedman, bruce oddson, tarjin rahman, and paul w. smith for helpful comments.
references and notes
1. see, for example, a. chapanis, “some generalizations about generalization,” human factors 30, no. 3 (1988): 253–67.
2. t. s. tullis, “screen design,” in handbook of human-computer interaction, ed. m. helander (amsterdam: elsevier, 1988), 377–411; t. s. tullis, “screen design,” in handbook of human-computer interaction, 2d ed., eds. m. helander, t. k. landauer, and p. prabhu (amsterdam: elsevier, 1997), 503–31.
3. w. o. galitz, handbook of screen format design, 2d ed. (wellesley hills, mass.: qed information sciences, 1985); s. l. smith and j. n. mosier, guidelines for designing user interface software, technical report esd-tr-86-278 (hanscom air force base, mass.: usaf electronic systems division, 1986).
4. c. chariton and m. choi, “user interface guidelines for enhancing the usability of airline travel agency e-commerce web sites,” chi ’02 extended abstracts on human factors in computing systems, apr. 20–25, 2002 (minneapolis, minn.: acm press), 676–77, http://portal.acm.org/citation.cfm?doid=506443.506541 (accessed dec. 28, 2005); m. g. wadlow, “the andrew system; the role of human interface guidelines in the design of multimedia applications,” current psychology: research and reviews 9 (summer 1990): 181–91; j. kim and j. lee, “critical design factors for successful e-commerce systems,” behaviour and information technology 21, no. 3 (2002): 185–99; s. gilutz and j. nielsen, usability of web sites for children: 70 design guidelines (fremont, calif.: nielsen norman group, 2002); juliana chan, “evaluation of formats used to display bibliographic records in opacs in canadian academic and public libraries,” master of information science research project report (university of toronto: faculty of information studies, 1995); m. c. maguire, “a review of user-interface design guidelines for public information kiosk systems,” international journal of human-computer studies 50, no. 3 (1999): 263–86.
5. national cancer institute, research-based web design and usability guidelines (2001), www.usability.gov/guidelines/index.html (accessed dec. 28, 2005).
6. u.s. department of health and human services, research-based web design and usability guidelines (2003), http://usability.gov/pdfs/guidelines.html (accessed dec. 28, 2005).
7. l. j. cronbach, “beyond the two disciplines of scientific psychology,” american psychologist 30, no. 2 (1975): 116–27.
8. p. muter, “interface design and optimization of reading of continuous text,” in cognitive aspects of electronic text processing, eds. h. van oostendorp and s. de mul (norwood, n.j.: ablex, 1996), 161–80; j. a. nelder and r. mead, “a simplex method for function minimization,” computer journal 7, no. 4 (1965): 308–13; t. k. landauer, “research methods in human-computer interaction,” in handbook of human-computer interaction, ed. m. helander (amsterdam: elsevier, 1988), 905–28; r. n. shepard, “toward a universal law of generalization for psychological science,” science 237 (sept. 11, 1987): 1317–23.
9. j. d. d’angelo and j. twining, “comprehension by clicks: d’angelo standards for web page design, and time, comprehension, and preference,” information technology and libraries 19, no. 3 (2000): 125–35.
10. j. d. d’angelo and s. k. little, “successful web pages: what are they and do they exist?” information technology and libraries 17, no. 2 (1998): 71–81.
11. d’angelo and twining, “comprehension by clicks.”
12. j. gerhardt-powals, “cognitive engineering principles for enhancing human-computer performance,” international journal of human-computer interaction 8, no. 2 (1996): 189–211.
13. chan, “evaluation of formats.”
14. ibid.
15. joan m. cherry and joseph p. cox, “world wide web displays of bibliographic records: an evaluation,” proceedings of the 24th annual conference of the canadian association for information science (toronto, ontario: canadian association for information science, 1996), 101–14.
16. joan m. cherry, “bibliographic displays in opacs and web catalogs: how well do they comply with display guidelines?” information technology and libraries 17, no. 3 (1998): 124–37; cherry and cox, “world wide web displays of bibliographic records.”
17. v. herrero-solana and f. de moya-anegón, “bibliographic displays of web-based opacs: multivariate analysis applied to latin-american catalogs,” libri 51 (june 2001): 75–85.
18. u.s. department of health and human services, research-based web design and usability guidelines, xxi.
19. international organization for standardization, iso 9241-11: ergonomic requirements for office work with visual display terminals (vdts)—part 11: guidance on usability (geneva, switzerland: international organization for standardization, 1998); international organization for standardization, iso 9241-12: ergonomic requirements for office work with visual display terminals (vdts)—part 12: presentation of information (geneva, switzerland: international organization for standardization, 1997); international organization for standardization, iso 9241-14: ergonomic requirements for office work with visual display terminals (vdts)—part 14: menu dialogues (geneva, switzerland: international organization for standardization, 1997).
20. u.s. department of health and human services, research-based web design and usability guidelines.
21. ivan illich, limits to medicine: medical nemesis: the expropriation of health (harmondsworth, n.y.: penguin, 1976).
22. d’angelo and twining, “comprehension by clicks”; gerhardt-powals, “cognitive engineering principles.”
23. c. cheepen, “guidelines for dialogue design—what is our approach? working design guidelines for advanced voice dialogues project,
paper 3,” (1996), www.soc.surrey.ac.uk/research/reports/voice-dialogues/wp3.html (accessed dec. 29, 2005).
24. w. crawford, “webcats and checklists: some cautionary notes,” information technology and libraries 18, no. 2 (1999): 100–03; cherry, “bibliographic displays in opacs and web catalogs.”
25. p. reed et al., “user interface guidelines and standards: progress, issues, and prospects,” interacting with computers 12, no. 1 (1999): 119–42.
26. d. c. l. ngo, l. s. teo, and j. g. byrne, “formalizing guidelines for the design of screen layouts,” displays 21, no. 1 (2000): 3–15.
27. d. a. norman, “emotion and design: attractive things work better,” interactions 9, no. 4 (2002): 36–42.
28. n. tractinsky, a. s. katz, and d. ikar, “what is beautiful is usable,” interacting with computers 13, no. 2 (2000): 127–45.
publishing firm. with a feeling of déjà vu i listened to an explanation of how difficult it is to develop a system for the novice; one proposed solution is to allow only the first four letters of a word to be entered (one of the search methods used at the library of congress, which does suggest some cross-fertilization). whatever the trends, the reality is that librarians and information scientists are playing decreasing roles in the growth of information display technology. hardware systems analysts, advertisers, and communications specialists are the main professions that have an active role to play in the information age. perhaps the answer is an immediate and radical change in the training offered by the library schools of today. our small role may reflect our penchant to be collectors, archivists, and guardians of the information repositories. have we become the keepers of the system? the demand today is for service, information, and entertainment. if we librarians cannot fulfill these needs, our places are not assured. should the american library association (ala) be ensuring that libraries are a part of all ongoing tests of videotex—at least in some way—either as organizers, information providers, or in analysis? consider the force of the argument given at the ala 1980 new york annual conference that cable television should be a medium that librarians become involved with for the future. certainly involvement is an important role, but we, like the industrialists and marketers before us, must make smart decisions and choose the proper niche and the most effective way to use our limited resources if we are to serve any part of society in the future.
bibliography
1. electronic publishing review. oxford, england: learned information ltd. quarterly.
2. home video report. white plains, new york: knowledge industry publications. weekly.
3. ieee transactions on consumer electronics. new york: ieee broadcast, cable, and consumer electronics society. five times yearly.
4. international videotex/teletext news. washington, d.c.: arlen communications ltd. monthly.
5. videodisc/teletext news. westport, conn.: microform review. quarterly.
6. videoprint. norwalk, conn.: videoprint. two times monthly.
7. viewdata/videotex report. new york: link resources corp. monthly.
data processing library: a very special library
sherry cook, mercedes dumlao, and maria szabo: bechtel data processing library, san francisco, california.
the 1980s are here and with them comes the ever-broadening application of the computer. this presents a new challenge to libraries. what do we do with all these computer codes? how do we index the material?
and most importantly, how do we make it accessible to our patrons or computer users? bechtel’s data processing library has met these demands. the genesis for the collection was bechtel’s conversion from a honeywell 6000 computer to a univac 1100 in 1974. all the programs in use at that time were converted to run on the univac system. it seemed a good time to put all of the computer programs from the various bechtel divisions together into a controlled collection. the librarians were charged with the responsibility of enforcing standards and control of bechtel’s computer programs. the major benefits derived from placing all computer programs into a controlled library were:
1. company-wide usage of the programs.
2. minimized investment in program development through common usage.
3. computer file and documentation storage by the library to safeguard the investment.
4. a central location for audits of program code and documentation.
5. centralized reporting on bechtel programs.
developing the collection involved basic cataloging techniques, greatly modified to encompass all the information that computer programs generate, including actual code, documentation, and listings. historically, this information must be kept indefinitely on an archival basis. the machine-readable codes themselves are grouped together and maintained from the library’s budget. finally, a reference desk is staffed to answer questions from the entire user community. documentation for programs is strictly controlled. code changes are arranged chronologically to provide only the most current release of a program to all users. historical information is kept and is crucial to satisfy the demands of auditors (such as the nuclear regulatory commission). additionally, the names of people administratively connected with the program are recorded and their responsibilities defined (valuable in situations of liability for work completed yesteryear). the backbone of the operation is a standards manual that spells out and discusses the file requirements, documentation specifications, and control forms. this standard is made readily available throughout bechtel. in addition, there are in-house education classes about the same document. indeed, the central data processing library is the repository of computer information at bechtel. the centralization and control of computer programs eliminates the chaos that can occur if too many individuals maintain and use the same computer program.
ex libris column
a partnership for creating successful partnerships
carl grant
when marc asked me to write this column i eagerly accepted because i feel strongly about libraries leveraging their role to their greater advantage in the rapidly changing information landscape. i see sponsorships and partnerships as an important tool for doing that. however, as noted in marc’s column in this issue, we’d been having a discussion about the continuing involvement of ex libris in the lita/ex libris student writing award. like many of you, we at ex libris are trying to keep our costs low in this challenging economic environment so that we can in turn keep your costs low. thus we are closely evaluating all expenditures to ensure their cost is justified by the value they return to our organization.
i won’t repeat the discussion already outlined by marc above, but will just note with great pleasure his willingness not only to listen to my concerns, but to try to address them. his invitation to write this column was part of that response, a chance for me to share my thoughts and concerns with you about sponsorships and partnerships and where they need to go in the future. to do that, i’d like to expand on some of the concepts marc and i were discussing and talk about how to make sponsorships and partnerships successful. i want to look at what successful ones consist of as well as what types are needed in our profession tomorrow.
■■ the elements of successful sponsorships and partnerships
for a sponsorship or partnership to be successful in today’s environment, it should offer at least the following components:
1. clear and shared goals. agreeing on what is to be achieved via the sponsorship or partnership is essential. furthermore, it should be readily apparent that the goals are achievable. this will happen through joint planning and execution of an agreed-upon project plan that results in that achievement. it is up to each partner to ensure that they have the resources to execute that project plan on schedule and on budget. as there will always be unplanned events and issues, there must also be ongoing, open communications throughout the life of the sponsorship or partnership. this way, surprises are avoided and issues can be dealt with before they become problems.
2. risks and rewards must be real and shared. members of a sponsorship or partnership should share risks and rewards in proportion to the role they hold. furthermore, the rewards must be seen to be real rewards to all the members. step into the other members’ shoes and look at what you’re offering. does it clearly bring value to the other organizations in the arrangement? if so, how? if not, what can be done to address that disparity? sponsorships and partnerships should not take advantage of any one sponsor or partner by allocating risks or rewards disproportionately to their contributions. rewards realized by members of the sponsorship or partnership should be proportionally shared by all the members.
3. defined time. a sponsorship or partnership is for a defined amount of time and should not be assumed to be ongoing. regular reviews of how well the sponsorship or partnership is working for the partners must be conducted and decisions made on the basis of those results. it might be that the landscape is changing and the benefits are no longer as meaningful, or there are alternatives now available that provide better benefits for one of the members. maintaining a sponsorship or partnership past its useful life will only result in the disintegration of the overall relationship.
4. write it down. organizations merge, are acquired and sold, people change jobs, and people change responsibilities. any sponsorship or partnership should have a written agreement outlining the elements above. once finalized, it should be signed by an appropriate person representing each member organization. that way, when things do change, there is a reference point and the arrangement is more likely to survive any of these precipitous events.
■■ the sponsorships and partnerships needed for tomorrow
successful sponsorships and partnerships are a necessary part of our landscape today. the world of information and knowledge has become too large, exists in too many silos, and is far too complex.
“competition, collaboration, and cooperation” defines the only path possible for navigating the landscape successfully. as the president of a company in the library automation marketplace, i continue to seek out opportunities that uniquely position our company to effectively maintain success in the marketplace and to provide value for our customers and thus our company. i believe libraries need to seek the same opportunities for their organizations. looking ahead, it seems clear that the pace of change in today’s environment will only continue to accelerate; thus the need for us to quickly form and dissolve key sponsorships and partnerships that will result in the successful fostering and implementation of new ideas, the currency of a vibrant profession. the next challenge is to realize that many of the key sponsorships and partnerships that need to be formed are not just with traditional organizations in this profession. tomorrow’s sponsorships and partnerships will be with those organizations that will benefit from the expertise of libraries and their suppliers while in return helping to develop or provide new funding opportunities and the means and places for disseminating access to their expertise and resources. likely organizations would be those in the fields of education, publishing, content creation and management, and social and community web-based software. to summarize, we at ex libris believe in sponsorships and partnerships. we believe they’re important and should be used in advancing our profession and organizations. from long experience we also have learned there are right ways and wrong ways to implement these tools, and i’ve shared thoughts on how to make them work for all the parties involved. again, i thank marc for his receptiveness to this discussion and my even deeper appreciation for trying to address the issues. it serves as an excellent example of what i discussed above.
carl grant (carl.grant@exlibrisgroup.com) is president of ex libris north america, des plaines, illinois.
people forget, but paper, the scroll, the codex, and later the book were all major technological leaps, not to mention the printing press and moveable type. . . . there is so much potential for using technology to equalize access to information, regardless of how much money you have, what language you speak, or where you live. big ideas, enthusiasm, and hope for the profession, in addition to practical technology-focused information, await the reader. enjoy the issue, and congratulations to the winner and all the finalists!
note
1. all quotations are taken with permission from private e-mail correspondence.
editorial board thoughts: eating our own dogfood
michael witt
i’ll never forget helping one of my relatives learn how to use his first computer. we ran through the basics: turning the computer and monitor on, pointing and clicking, typing, and opening and closing windows. i went away to college, and when i came back for the holidays, he happily showed off his new abilities to send emails and create spreadsheets and such. despite his well-earned pride, i couldn’t help but notice that when he reached the edge of the desk with the mouse, he would use his other hand to place a photo album up against the desk and roll the mouse onto it, in order to reach the far right-hand side of the screen with the pointer.
when i picked up his hand and the mouse and re-centered it on the desk for him, i think it blew his mind. he had been using the photo album to extend the reach of the mouse and pointer for months! it occurred to me that i should have spent more time with him, not just showing him what to do, but watching him do it. those of us working in information technology have a tremendous impact on library staff productivity by virtue of the systems we select or develop and implement. people working in most facets of library operations trust and rely on our hardware and software to accomplish their daily work, for which we bear a significant burden of responsibility. are they using the best possible tools for their work? are they using them in the best way? a great deal of effort has gone into user-centered design and improving functionality for our patrons, but in this time of reduced budgets and changing staff roles, it is important to extend similar consideration to the systems that we provision for our co-workers. at its best, information technology has the ability to save time and add value to the library by creating efficiencies and empowering people to do better and new work. whether we are evaluating new integrated library systems or choosing the default text editor for our workstations, we are presented with opportunities to learn more about how our libraries accomplish work “on the ground” and reconsider the role that technology can play in helping them. the phrase “eating your own dog food” is so common in software development circles that some have begun using it as a verb. developers engage in “dogfooding” by using new software themselves, internally, to identify bugs and improve usability and functionality before releasing it to users. this is a regular practice of companies such as microsoft1 and google2. setting aside any negative connotations for the moment (why are people eating dog food? and exactly who are the “dogs” in this scenario?), there is a lot that we can learn by putting ourselves in the place of our users and experiencing our systems from their perspective. perhaps the best way to do this is to walk around the building and spend time in each unit of the library, shadowing its staff and observing how they interact with systems to do their work. try to learn their workflow and observe the tasks they perform—both online and offline. you don’t need to become an expert, but ideally you’d be able to try to perform some of the tasks yourself. in one case, we were able to identify and enable someone to design and run their own reports, which helped their unit make more timely decisions and eliminated the need for it to run monthly reports on their behalf. if these tasks support user-facing interactions, you might get some good usability information in the process too. for example, i learned more about our library’s website by working chat reference for an hour a week than i did in two years of web development team meetings! part of this process is attempting to feel our users’ pain, too. do you use the same locked-down workstation image that you deploy to your staff desktops? there is also a tendency among it staff to keep the newest and best machines for their own use and cycle older machines to other units. 
i understand—it staff are working with databases and developing software, and so we benefit the most from higher-performing machines—but keep in mind that your co-workers likely have older, slower machines, and take the lowest common denominator hardware into account when selecting new software. by walking a mile in your users’ shoes, you may gain a deeper appreciation and understanding of the other units of the library and how they work together. because so much work is done on computers, people working in information technology can often see a broad picture of the activities of the library. we have the ability to make connections and identify potential points of integration, not only between machines but also between people and their work. references 1. g. pascal zachary, showstopper! the breakneck race to create windows nt and the next generation at microsoft (new york: free press, 1994): 129–56. 2. steven levy, “inside google+: how the search giant plans to go social,” http://www.wired.com/epicenter/2011/06/inside-google-plus-social/all/1 (accessed july 12, 2011). editorial board thoughts: eating our own dogfood michael witt (mwitt@purdue.edu) is the interdisciplinary research librarian and an assistant professor of library science at purdue university in west lafayette, indiana. he serves on the editorial board of ital. know its power, and facets can showcase metadata in new interfaces. according to mcguinness, facets perform several functions in an interface: ■■ vocabulary control ■■ site navigation and support ■■ overview provision and expectation setting ■■ browsing support ■■ searching support ■■ disambiguation support5 these functions offer several potential advantages to the user: the functions use category systems that are coherent and complete, they are predictable, they show previews of where to go next, they show how to return to previous states, they suggest logical alternatives, and they help the user avoid empty result sets as searches are narrowed.6 disadvantages include the fact that categories of interest must be known in advance, important trends may not be shown, category structures may need to be built by hand, and automated assignment is only partly successful.7 library catalog records, of course, already supply “categories of interest” and a category structure. information science research has shown benefits to users from faceted search interfaces. but do these benefits hold true for systems as complex as library catalogs? this paper presents an extensive review of both information science and library literature related to faceted browsing. ■■ method to find articles in the library and information science literature related to faceted browsing, the author searched the association for computing machinery (acm) digital library, scopus, and library and information science and technology abstracts (lista) databases. in scopus and the acm digital library, the most successful searches included the following: ■■ (facet* or cluster*) and (usability or user stud*) ■■ facet* and usability in lista, the most successful searches included combining product names such as “aquabrowser” with “usability.” the search “catalog and usability” was also used. the author also searched google and the next generation catalogs for libraries (ngc4lib) electronic discussion list in an attempt to find unpublished studies.
search terms initially included the concept of “clustering”; however, this was quickly shown to be a clearly defined, separate topic. according to hearst, “clustering refers to the grouping of items according to some measure faceted browsing is a common feature of new library catalog interfaces. but to what extent does it improve user performance in searching within today’s library catalog systems? this article reviews the literature for user studies involving faceted browsing and user studies of “next-generation” library catalogs that incorporate faceted browsing. both the results and the methods of these studies are analyzed by asking, what do we currently know about faceted browsing? how can we design better studies of faceted browsing in library catalogs? the article proposes methodological considerations for practicing librarians and provides examples of goals, tasks, and measurements for user studies of faceted browsing in library catalogs. many libraries are now investigating possible new interfaces to their library catalogs. sometimes called “next-generation library catalogs” or “discovery tools,” these new interfaces are often separate from existing integrated library systems. they seek to provide an improved experience for library patrons by offering a more modern look and feel, new features, and the potential to retrieve results from other major library systems such as article databases. one interesting feature these interfaces offer is called “faceted browsing.” hearst defines facets as “a set of meaningful labels organized in such a way as to reflect the concepts relevant to a domain.”1 labarre defines facets as representing “the categories, properties, attributes, characteristics, relations, functions or concepts that are central to the set of documents or entities being organized and which are of particular interest to the user group.”2 faceted browsing offers the user relevant subcategories by which they can see an overview of results, then narrow their list. in library catalog interfaces, facets usually include authors, subjects, and formats, but may include any field that can be logically created from the marc record (see figure 1 for an example). using facets to structure information is not new to librarians and information scientists. as early as 1955, the classification research group stated a desire to see faceted classification as the basis for all information retrieval.3 in 1960, ranganathan introduced facet analysis to our profession.4 librarians like metadata because they jody condit fagan (faganjc@jmu.edu) is content interfaces coordinator, james madison university library, harrisonburg, virginia. jody condit fagan usability studies of faceted browsing: a literature review doing so and performed a user study to inform their decision. results: empirical studies of faceted browsing the following summaries present selected empirical research studies that had significant findings related to faceted browsing or interesting methods for such studies. it is not an exhaustive list. pratt, hearst, and fagan questioned whether faceted results were better than clustering or relevancy-ranked results.11 they studied fifteen breast-cancer patients and families. every subject used three tools: a faceted interface, a tool that clustered the search results, and a tool that ranked the search results according to relevance criteria.
the subjects were given three simple queries related to breast cancer (e.g., “what are the ways to prevent breast cancer?”), asked to list answers to these before beginning, and to answer the same queries after using all the tools. in this study, subjects completed two timed tasks. first, subjects found as many answers as possible to the question in four minutes. second, the researchers measured the time subjects took to find answers to two specific questions (e.g., “can diet be used in the prevention of breast cancer?”) that related to the original, general query. for the first task, when the subjects used the faceted interface, they found more answers than they did with the other two tools. the mean number of answers found using the faceted interface was 7.80, for the cluster tool it was 4.53, and for the ranking tool it was 5.60. this difference was significant (p<0.05).12 for the second task, the researchers found no significant difference between the tools when comparing time on task. the researchers gave the subjects a user-satisfaction questionnaire at the end of the study. on thirteen of the fourteen quantitative questions, satisfaction scores for the faceted interface were much higher than they were for either the ranking tool or the cluster tool. this difference was statistically significant (p < 0.05). all fifteen users also affirmed that the faceted interface made sense, was helpful, was useful, and had clear labels, and said they would use the faceted interface again for another search. yee et al. studied the use of faceted metadata for image searching, and browsing using an interface they developed called flamenco.13 they collected data from thirty-two participants who were regular users of the internet, searching for information either every day or a few times a week. their subjects performed four tasks (two structured and two unstructured) on each of two interfaces. an example of an unstructured task from their study was “search for images of interest.” an example of a structured task was to gather materials for an art history of similarity . . . typically computed using associations and commonalities among features where features are typically words and phrases.”8 using library catalog keywords to generate word clouds would be an example of clustering, as opposed to using subject headings to group items. clustering has some advantages according to hearst. it is fully automated, it is easily applied to any text collection, it can reveal unexpected or new trends, and it can clarify or sharpen vague queries. disadvantages to clustering include possible imperfections in the clustering algorithm, similar items not always being grouped into one cluster, a lack of predictability, conflating many dimensions, difficulty labeling groups, and counterintuitive subhierarchies.9 in user studies comparing clustering with facets, pratt, hearst, and fagan showed that users find clustering difficult to interpret and prefer a predictable organization of category hierarchies.10 ■■ results the author grouped the literature into two categories: user studies of faceted browsing and user studies of library catalog interfaces that include faceted browsing as a feature. generally speaking, the information science literature consisted of empirical studies of interfaces created by the researchers. 
in some cases, the researchers’ intent was to create and refine an interface intended for actual use; in others, the researchers created the interface only for the purposes of studying a specific aspect of user behavior. in the library literature, the studies found were generally qualitative usability studies of specific library catalog interface products. libraries had either implemented a new product, or they were thinking about (figure 1. faceted results from jmu’s vufind implementation) uddin and janecek asked nineteen users (staff and students at the asian institute of technology) to use a website search engine with both a traditional results list and a faceted results list.22 tasks were as follows: (1) look for scholarship information for a master’s program, (2) look for staff recruitment information, and (3) look for research and associated faculty member information within your interested area.23 they found that users were faster when using the faceted system, significantly so for two of the three tasks. success in finding relevant results was higher with the faceted system. in the post-study questionnaire, participants rated the faceted system more highly, including significantly higher ratings for flexibility, interest, understanding of information content, and search results relevancy. participants rated the most useful features to be the capability to switch from one facet to another, preview the result set, combine facets, and navigate via breadcrumbs. capra et al. compared three interfaces in use by the bureau of labor statistics website, using a between-subjects study with twenty-eight people and a within-subjects study with twelve people.24 each set of participants performed three kinds of searches: simple lookup, complex lookup, and exploratory. the researchers used an interesting strategy to help control the variables in their study: because the bls website is a highly specialized corpus devoted to economic data in the united states organized across very specific time periods (e.g., monthly releases of price or employment data), we decided to include the us as a geographic facet and a month or year as a temporal facet to provide context for all search tasks in our study. thus, the simple lookup tasks were constructed around a single economic facet but also included the spatial and temporal facets to provide context for the searchers. the complex lookup tasks involve additional facets including genre (e.g. press release) and/or region.25 capra et al. found that users preferred the familiarity afforded by the traditional website interface (hyperlinks + keyword search) but listed the facets on the two experimental interfaces as their best features. the researchers concluded, “if there is a predominant model of the information space, a well designed hierarchical organization might be preferred.”26 zhang and marchionini analyzed results from fifteen undergraduate and graduate students in a usability study of an interface that used facets to categorize results (relation browser++).27 there were three types of tasks: ■■ type 1: simple look-up task (three tasks such as “check if the movie titled the matrix is in the library movie collection”). ■■ type 2: data exploration and analysis tasks (six tasks essay on a topic given by the researchers and to complete four related subtasks. the researchers designed the structured task so they knew exactly how many relevant results were in the system.
they also gave a satisfaction survey. more participants were able to retrieve all relevant results with the faceted interface than with the baseline interface. during the structured tasks, participants received empty results with the baseline interface more than three times as often as with the faceted interface.14 the researchers found that participants constructed queries from multiple facets in the unstructured tasks 19 percent of the time and in the structured tasks 45 percent of the time.15 when given a post-test survey, participants identified the faceted interface as easier to use, more flexible, interesting, enjoyable, simple, and easy to browse. they also rated it as slightly more “overwhelming.” when asked to choose between the two, twenty-nine participants chose the faceted interface, compared with two who chose the baseline (n = 31). thirty-one of the thirty-two participants said the faceted interface helped them learn more, and twenty-eight of them said it would be more useful for their usual tasks.16 the researchers concluded that even though their faceted interface was much slower than the other, it was strongly preferred by most study participants: “these results indicate that a category-based approach is a successful way to provide access to image collections.”17 in a related usability study on the flamenco interface, english et al. compared two image-browsing interfaces in a nineteen-participant study.18 after an initial search, the “matrix view” interface showed a left column with facets, with the images in the result set placed in the main area of the screen. from this intermediary screen, the user could select multiple terms from facets in any order and have the items grouped under any facet. the “singletree” interface listed subcategories of the currently selected term at the top, with query previews underneath. the user could then only drill down to subcategories of the current category, and could not select terms from more than one facet. the researchers found that a majority of participants preferred the “power” and “flexibility” of matrix to the simplicity of singletree. they found it easier to refine and expand searches, shift between searches, and troubleshoot research problems. they did prefer singletree for locating a specific image, but matrix was preferred for browsing and exploring. participants started over only 0.2 percent of the time for matrix compared to 4.5 percent for singletree.19 yet the faceted interface, matrix, was not “better” at everything. for specific image searching, participants found the correct image only 22.0 percent of the time in matrix compared to 66.0 percent in singletree.20 also, in matrix, some participants drilled down in the wrong hierarchy with wrong assumptions. one interesting finding was that in both interfaces, more participants chose to begin by browsing (12.7 percent) than by searching (5.0 percent).21 of the first two studies: the first study comprised one faculty member, five graduate students, and two undergraduate students; the second comprised two faculty members, four graduate students, and two undergraduate students. the third study did not report results related to faceted browsing and is not discussed here. the first study had seven scenarios; the second study had nine.
the scenarios were complex: for example, one scenario began, “you want to borrow shakespeare’s play, the tempest, from the library,” but contained the following subtasks as well: 1. find the tempest. 2. find multiple editions of this item. 3. find a recent version. 4. see if at least one of the editions is available in the library. 5. what is the call number of the book? 6. you’d like to print the details of this edition of the book so you can refer to it later. participants found the interface friendly, easy to use, and easy to learn. all the participants reported that faceted browsing was useful as a means of narrowing down the result lists, and they considered this tool one of the differentiating features between primo and their library opac or other interfaces. facets were clear, intuitive, and useful to all participants, including opening the “more” section.31 one specific result from the tests was that “online resources” and “available” limiters were moved from a separate location to the right with all other facets.32 in a study of aquabrowser by olson, twelve subjects— all graduate students in the humanities—participated in a comparative test in which they looked for additional sources for their dissertation.33 aquabrowser was created by medialab but is distributed by serials solutions in north america. this study also had three pilot subjects. no relevance judgments were made by the researchers. nine of the twelve subjects found relevant materials by using aquabrowser that they had not found before.34 olson’s subjects understood facets as a refinement tool (narrowing) and had a clear idea of which facets were useful and not useful for them. they gave overwhelmingly positive comments. only two felt the faceted interface was not an improvement. some participants wanted to limit to multiple languages or dates, and a few were confused about the location of facets in multiple places, for example, “music” under both format and topic. a team at yale university, led by bauer, recently conducted two tests on pilot vufind installations: a subject-based presentation of e-books for the cushing/ whitney medical library and a pilot test of vufind using undergraduate students with a sample of 400,000 records from the library system.35 vufind is open-source software developed at villanova university (http://vufind.org). that require users to understand and make sense of the information collection: “in which decade did steven spielberg direct the most movies?”). ■■ type 3: (one free exploration task: “find five favorite videos without any time constraints”). the tasks assigned for the two interfaces were different but comparable. for type 2 tasks, zhang and marchionini found that performance differences between the two interfaces were all statistically significant at the .05 level.28 no participants got wrong answers for any but one of the tasks using the faceted interface. with regard to satisfaction, on the exploratory tasks the researchers found statistically significant differences favoring the faceted interface on all three of the satisfaction questions. participants found the faceted interface not as aesthetically appealing nor as intuitive to use as the basic interface. two participants were confused by the constant changing and updating of the faceted interface. the above studies are examples of empirical investigations of experimental interfaces. 
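several of the studies above report their comparisons as significance tests on per-subject measures (answers found, time on task, error rate), with results such as p < 0.05. as a rough illustration of how a within-subjects comparison of that kind can be computed, the following python sketch runs a paired t-test on hypothetical per-subject answer counts; the numbers are invented for illustration and are not data from any of the cited studies.

```python
# illustrative sketch only: hypothetical per-subject counts of answers found
# with two tools; not the analysis code of any study cited in this review.
from scipy import stats

faceted = [8, 9, 7, 8, 6, 9, 8, 7, 8, 9, 7, 8, 8, 7, 8]   # hypothetical
cluster = [5, 4, 5, 6, 4, 5, 4, 3, 5, 5, 4, 5, 4, 6, 3]   # hypothetical

# each subject used both tools, so a paired (within-subjects) test applies
t_stat, p_value = stats.ttest_rel(faceted, cluster)
print(f"mean faceted = {sum(faceted)/len(faceted):.2f}, "
      f"mean cluster = {sum(cluster)/len(cluster):.2f}, p = {p_value:.4f}")
```

a between-subjects design, where each group sees only one interface, would call for an independent-samples test (for example, scipy's ttest_ind) instead.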
hearst recently concluded that facets are a “proven technique for supporting exploration and discovery” and summarized areas for further research in this area, such as applying facets to large “subject-oriented category systems,” facets on mobile interfaces, adding smart features like “autocomplete” to facets, allowing keyword search terms to affect order of facets, and visualizations of facets.29 in the following section, user studies of next-generation library catalog interfaces will be presented. results: library literature understandably, most studies by practicing librarians focus on products their libraries are considering for eventual use. these studies all use real library catalog records, usually the entire catalog’s database. in most cases, these studies were not focused on investigating faceted browsing per se, but on the usability of the overall interface. in general, these studies used fewer participants than the information science studies above, followed less rigorous methods, and were not subjected to statistical tests. nevertheless, they provide many insights into the user experience with the extremely complex datasets underneath next-generation library catalog interfaces that feature faceted browsing. in this review article, only results specifically relating to faceted browsing will be presented. sadeh described a series of usability studies performed at the university of minnesota (um), a primo development partner.30 primo is the next-generation library catalog product sold by ex libris. the author also received additional information from the usability services lab at um via e-mail. three studies were conducted in august 2006, january 2007, and october 2007. eight users from various disciplines participated in each 62 information technology and libraries | june 2010 participants. the researchers measured task success, duration, and difficulty, but did not measure user satisfaction. their study consisted of four known-item tasks and six topic-searching tasks. the topic-searching tasks were geared toward the use of facets, for example, “can you show me how would you find the most recently published book about nuclear energy policy in the united states?”45 all five participants using endeca understood the idea of facets, and three used them. students tried to limit their searches at the outset rather than search and then refine results. an interesting finding was that use of the facets did not directly follow the order in which facets were listed. the most heavily used facet was library of congress classification (lcc), followed closely by topic, and then library, format, author, and genre.46 results showed a significantly shorter average task duration for endeca catalog users for most tasks.47 the researchers noted that none of the students understood that the lcc facet represented call-number ranges, but all of the students understood that these facets “could be used to learn about a topic from different aspects—science, medicine, education.”48 the authors could find no published studies relating to the use of facets in some next-generation library catalogs, including encore and worldcat local. 
although the university of washington did publish results of a worldcat local usability study in a recent issue of library technology reports, results from the second round of testing, which included an investigation of facets, were not yet ready.49 ■■ discussion summary of empirical evidence related to faceted browsing empirical studies in the information science literature support many positive findings related to faceted browsing and build a solid case for including facets in search interfaces: ■■ facets are useful for creating navigation structures.50 ■■ faceted categorization greatly facilitates efficient retrieval in database searching.51 ■■ facets help avoid dead ends.52 ■■ users are faster when using a faceted system.53 ■■ success in finding relevant results is higher with a faceted system.54 ■■ users find more results with a faceted system.55 ■■ users also seem to like facets, although they do not always immediately have a positive reaction. ■■ users prefer search results organized into predictable, multidimensional hierarchies.56 ■■ participants’ satisfaction is higher with a faceted system.57 the team drew test questions from user search logs in their current library system. some questions targeted specific problems, such as incomplete spellings and incomplete title information. bauer notes that some problems uncovered in the study may relate to the peculiarities of the yale implementation. the medical library study contained eight participants—a mix of medical and nursing students. facets, reported bauer, “worked well in several instances, although some participants did not think they were noticeable on the right side of the page.”36 the prompt for the faceted task in this study came after the user had done a search: “what if you wanted to look at a particular subset, say ‘xxx’ (determine by looking at the facets).”37 half of the participants used facets, half used “search within” to narrow the topic by adding keywords. sixty-two percent of the participants were successful at this task. the undergraduate study asked five participants faced with a results list, “what would you do now if you only wanted to see material written by john adams?”38 on this task, only one of the five was successful, even though the author’s name was on the screen. bauer noted that in general, “the use of the topic facet to narrow the search was not understood by most participants. . . . even when participants tried to use topic facets the length of the list and extraneous topics rendered them less than useful.”39 the five undergraduates were also asked, “could you find books in this set of results that are about health and illness in the united states population, or control of communicable diseases during the era of the depression?”40 again, only one of the five was successful. bauer notes that “the overly broad search results made this difficult for participants. again, topic facets were difficult to navigate and not particularly useful to this search.”41 bauer’s team noted that when the search was configured to return more hits, “topic facets become a confusingly large set of unrelated items. 
these imprecise search results, combined with poor topic facet sets, seemed to result in confusion for test participants.”42 participants were not aware that topics represented subsets, although learning occurred because the “narrow” header was helpful to some participants.43 other results found by bauer’s team were that participants were intrigued by facets, navigation tools are needed so that patrons may reorder large sets of topic facets, format and era facets were useful to participants, and call-number facets were not used by anyone. antelman, pace, and lynema studied north carolina state university’s (ncsu) next-generation library catalog, which is driven by software from endeca.44 their study used ten undergraduate students in a between-subjects design where five used the endeca catalog and five used the library’s traditional catalog. the researchers noted that their participants may have been experienced with the library’s old catalog, as log data shows most ncsu users enter one or two terms, which was not true of study usability studies of faceted browsing: a literature review | fagan 63 one product’s faceted system for a library catalog does not substitute for another, the size and scope of local collections may greatly affect results, and cataloging practices and metadata will affect results. still, it is important for practicing librarians to determine if new features such as facets truly improve the user’s experience. methodological best practices after reading numerous empirical research studies (some of which critique their own methods) and library case studies, some suggestions for designing better studies of facets in library catalogs emerged. designing the study ■■ consider reusing protocols from previous studies. this provides not only a tested method but also a possible point of comparison. ■■ define clear goals for each study and focus on specific research questions. it’s tempting to just throw the user into the interface and see what happens, but this makes it difficult, if not impossible, to analyze the results in a useful way. for example, one of zhang and marchionini’s hypotheses specifically describes what rich interaction would look like: “typing in keywords and clicking visual bars to filter results would be used frequently and interchangeably by the users to finish complex search tasks, especially when large numbers of results are returned.”64 ■■ develop the study for one type of user. olson’s focus on graduate students in the dissertation process allowed the researchers to control for variables such as interest of and knowledge about the subject. ■■ pilot test the study with a student worker or colleague to iron out potential wrinkles. ■■ let users explore the system for a short time and possibly complete one highly structured task to help the user become used to the test environment, interface, and facilitator.65 unless you are truly interested in the very first experience users have with a system, the first use of a system is an artificial case. designing tasks ■■ make sure user performance on each task is measurable. will you measure the time spent on a task? if “success” is important, define what that would look like. for example, english et al. defined success for one of their tasks as when “the participant indicated (within the allotted time) that he/she had reached an appropriate set of images/specific image in the collection.”66 ■■ establish benchmarks for comparison. 
one can test for significant differences between interfaces, one can test for differences between research subjects and an expert user, and one can simply measure against ■■ users are more confident with a faceted system.58 ■■ users may prefer the familiarity afforded by traditional website interface (hyperlinks + keyword search).59 ■■ initial reactions to the faceted interface may be cautious, seeing it as different or unfamiliar.60 users interact with specific characteristics of faceted interfaces, and they go beyond just one click with facets when it is permitted. english et al. found that 7 percent of their participants expanded facets by removing a term, and that facets were used more than “keyword search within”: 27.6 percent versus 9 percent.61 yee et al. found that participants construct queries from multiple facets 19 percent of the time in unstructured tasks; in structured tasks they do so 45 percent of the time.62 the above studies did not use library catalogs; in most cases they used an experimental interface with record sets that were much smaller and less complicated than in a complete library collection. domains included websites, information from one website, image collections, video collections, and a journal article collection. summary of practical user studies related to faceted browsing this review also included studies from practicing librarians at live library implementations. these studies generally had smaller numbers of users, were more likely to focus on the entire interface rather than a few features, and chose more widely divergent methods. studies were usually linked to a specific product, and results varied widely between systems and studies. for this reason it is difficult to assemble a bulleted summary as with the previous section. the variety of results from these studies indicate that when faceted browsing is applied to a reallife situation, implementation details can greatly affect user performance and user preference. some, like labarre, are skeptical about whether facets are appropriate for library information. descriptions of library materials, says labarre, include analyses of intellectual content that go beyond the descriptive terms assigned to commercial items such as a laptop: now is the time to question the assumptions that are embedded in these commercial systems that were primarily designed to provide access to concrete items through descriptions in order to enhance profit.63 it is clear that an evaluation of commercial interfaces or experimental interfaces does not substitute for an opac evaluation. yet it is a challenge for libraries to find expertise and resources to conduct user studies. the systems they want to test are large and complex. collaborating with other libraries has its own challenges: an evaluation of 64 information technology and libraries | june 2010 groups of participants, each of which tests a different system. ■❏ a within-subjects design has one group of participants test both systems. it is hoped that if libraries use the suggestions above when designing future experiments, results across studies will be more comparable and useful. designing user studies of faceted browsing after examining both empirical research studies and case studies by practicing librarians, a key difference seems to be the specificity of research questions and designing tasks and measurements to test specific hypotheses. 
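to make the measurement suggestions above concrete, the following python sketch shows one way usability-session logs could be reduced to a benchmark of the kind suggested earlier, such as "75 percent of users completed the task within five minutes." the data structure, field names, and sample values are hypothetical and are offered only as an illustration, not as part of any cited protocol.

```python
# illustrative sketch: turning hypothetical session logs into a benchmark figure.
from dataclasses import dataclass

@dataclass
class TaskResult:
    participant: str
    task: str
    success: bool     # did the participant reach the defined success state?
    seconds: float    # time on task

results = [
    TaskResult("p01", "find-formats", True, 212.0),
    TaskResult("p02", "find-formats", True, 341.5),
    TaskResult("p03", "find-formats", False, 298.0),
    TaskResult("p04", "find-formats", True, 188.2),
]

def benchmark(results, task, limit_seconds):
    """share of participants who succeeded at `task` within `limit_seconds`."""
    relevant = [r for r in results if r.task == task]
    met = [r for r in relevant if r.success and r.seconds <= limit_seconds]
    return len(met) / len(relevant)

print(f"{benchmark(results, 'find-formats', 300) * 100:.0f}% completed within five minutes")
```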
while describing a full user-study protocol for investigating faceted browsing in a library catalog is beyond the scope of this article, reviewing the literature and the study methods it describes provided insights into how hypotheses, tasks, and measurements could be written to provide more reliable and comparable evidence related to faceted browsing in library catalog systems. for example, one research question could surround the format facet: “compared with our current interface, does our new faceted interface improve the user’s ability to find different formats of materials?” hypotheses could include the following: 1. users will be more accurate when identifying the formats of items from their result set when using the faceted interface than when using the traditional interface. 2. users will be able to identify formats of items more quickly with the faceted interface than with the traditional interface. looking at these hypotheses, here is a prompt and some example tasks the participants would be asked to perform: “we will be asking you to find a variety of formats of materials. when we say formats of materials, we mean books, journal articles, videos, etc.” ■■ task 1: please use interface a to search on “interpersonal communication.” look at your results set. please list as many different formats of material as you can. ■■ task 2: how many items of each format are there? ■■ task 3: please use interface b to search on “family communication.” what formats of materials do you see in your results set? ■■ task 4: how many items of each format are there?” we would choose the topics “interpersonal communication” and “family communication” because our local catalog has many material types for these topics and because these topics would be understood by most of our students. we would choose different topics to expectations or against previous iterations of the same study. for example, “75 percent of users completed the task within five minutes.” zhang and marchionini measured error rates, another possible benchmark.67 ■■ consider looking at your existing opac logs for zeroresults searches or other issues that might inspire interesting questions. ■■ target tasks to avoid distracters. for example, if your catalog has a glut of government documents, consider running the test with a limit set to exclude them unless you are specifically interested in their impact. for example, capra et al. decided to include the united states as a geographic facet and a month or year as a temporal facet to provide context for all search tasks in their study.68 ■■ for some tasks, give the subjects simple queries (e.g., “what are the ways to prevent breast cancer?”) as opposed to asking the subjects to come up with their own topic. this can help control for the potential challenges of formulating one’s own research question on the spot. as librarians know, formulating a good research question is its own challenge. ■■ if you are using any timed tasks, consider how the nature of your tasks could affect the result. for example, pratt, hearst, and fagan noted that the time that it took subjects to read and understand abstracts most heavily influenced the time for them to find an answer.69 english et al. found that the system’s processing time influenced their results.70 ■■ consider the implications of your local implementation carefully when designing your study. 
at yale, the team chose to point their vufind instance at just 400,000 of their records, drew questions from problems users were having (as shown in log files), and targeted questions to these problems.71 who to study? ■■ try to study a larger set of users. it is better to create a short test with many users than a long test with a few users. nielsen suggests that twenty users is sufficient.72 consider collaborating with another library if necessary. ■■ if you test a small number, such as the typical four to eight users for a usability test, be sure you emphasize that your results are not generalizable. ■■ use subjects who are already interested in the subject domain: for example, pratt, hearst, and fagan used breast cancer patients,73 and olson used graduate students currently writing their dissertations.74 ■■ consider focusing on advanced or scholarly users. la barre suggests that undergraduates may be overstudied.75 ■■ for comparative studies, consider having both between-subjects and within-subjects designs.76 ■❏ a between-subjects design involves creating two usability studies of faceted browsing: a literature review | fagan 65 these experimental studies. previous case-study investigations of library catalog interfaces with facets have proven inconclusive. by choosing more specific research questions, tasks, and measurements for user studies, libraries may be able to design more objective studies and compare results more effectively. references 1. marti a. hearst, “clustering versus faceted categories for information exploration,” communications of the acm 49, no. 4 (2006): 60. 2. kathryn la barre, “faceted navigation and browsing features in new opacs: robust support for scholarly information seeking?” knowledge organization 34, no. 2 (2007): 82. 3. vanda broughton, “the need for faceted classification as the basis of all methods of information retrieval,” aslib proceedings 58, no. 1/2 (2006): 49–71. 4. s. r. ranganathan, colon classification basic classification, 6th ed. (new york: asia, 1960). 5. deborah l. mcguinness, “ontologies come of age,” in spinning the semantic web: bringing the world wide web to its full potential, ed. dieter fensel et al. (cambridge, mass.: mit pr., 2003): 179–84. 6. hearst, “clustering versus faceted categories,” 60. 7. ibid., 61. 8. ibid., 59. 9. ibid.. 60. 10. wanda pratt, marti a. hearst, and lawrence m. fagan, “a knowledge-based approach to organizing retrieved documents,” proceedings of the sixteenth national conference on artificial intelligence, july 18–22, 1999, orlando, florida (menlo park, calif.: aaai pr., 1999): 80–85. 11. ibid. 12. ibid., 5. 13. ka-ping yee et al., “faceted metadata for image search and browsing,” 2003, http://flamenco.berkeley.edu/papers/ flamenco-chi03.pdf (accessed oct. 6, 2008). 14. ibid., 6. 15. ibid., 7. 16. ibid. 17. ibid., 8. 18. jennifer english et al., “flexible search and navigation,” 2002, http://flamenco.berkeley.edu/papers/flamenco02.pdf (accessed apr. 22, 2010). 19. ibid., 7. 20. ibid., 6. 21. ibid., 7. 22. mohammed nasir uddin and paul janecek, “performance and usability testing of multidimensional taxonomy in web site search and navigation,” performance measurement and metrics 8, no. 1 (2007): 18–33. 23. ibid., 25. 24. robert capra et al., “effects of structure and interaction style on distinct search tasks,” proceedings of the 7th acm-ieee-cs joint conference on digital libraries (new york: acm, 2007): 442–51. 25. ibid., 446. 26. ibid., 450. help minimize learning effects. 
to further address this, we would plan to have half our users start first with the traditional interface and half to start first with the faceted interface. this way we can test for differences resulting from learning. the above tasks would allow us to measure several pieces of evidence to support or reject our hypotheses. for tasks 1 and 3, we would measure the number of formats correctly identified by users compared with the number found by an expert searcher. for tasks 2 and 4, we would compare the number of items correctly identified with the total items found in each category by an expert searcher. we could also time the user to determine which interface helped them work more quickly. in addition to measuring the number of formats identified and the number of items identified in each format, we would be able to measure the time it takes users to identify the number of formats and the number of items in each format. to measure user satisfaction, we would ask participants to complete the system usability scale (sus) after each interface and, at the very end of the study, complete a questionnaire comparing the two interfaces. even just selecting the format facet, we would have plenty to investigate. other hypotheses and tasks could be developed for other facet types, such as time period or publication date, or facets related to the responsible parties, such as author or director: hypothesis: users can find more materials written in a certain time period using the faceted interface. task: find ten items of any type (books, journals, movies) written in the 1950s that you think would have information about television advertising. hypothesis: users can find movies directed by a specific person more quickly using the faceted interface. task: in the next two minutes, find as many movies as you can that were directed by orson welles. for the first task above, an expert searcher could complete the same task, and their time could be used as a point of comparison. for the second, the total number of movies in the library catalog that were directed by welles is an objective quantity. in both cases, one could compare the user’s performance on the two interfaces. ■■ conclusion reviewing user studies about faceted browsing revealed empirical evidence that faceted browsing improves user performance. yet this evidence does not necessarily point directly to user success in faceted library catalogs, which have much more complex databases than those used in 66 information technology and libraries | june 2010 53. uddin and janecek, “performance and usability testing”; zhang and marchionini, evaluation and evolution; hao chen and susan dumais, bringing order to the web: automatically categorizing search results (new york: acm, 2000): 145–52. 54. uddin and janecek, “performance and usability testing.” 55. ibid.; pratt, hearst, and fagan, “a knowledge-based approach”; hsinchun chen et al., “internet browsing and searching: user evaluations of category map and concept space techniques,” journal of the american society for information science 49, no. 7 (1998): 582–603. 56. vanda broughton, “the need for faceted classification as the basis of all methods of information retrieval,” aslib proceedings 58, no. 1/2 (2006): 49–71; pratt, hearst, and fagan, “a knowledge-based approach,” 80–85.; chen et al., “internet browsing and searching,” 582–603; yee et al., “faceted metadata for image search and browsing”; english et al., “flexible search and navigation using faceted metadata.” 57. 
uddin and janecek, “performance and usability testing”; zhang and marchionini, evaluation and evolution; hideo joho and joemon m. jose, slicing and dicing the information space using local contexts (new york: acm, 2006): 66–74.; yee et al., “faceted metadata for image search and browsing.” 58. yee et al., “faceted metadata for image search and browsing”; chen and dumais, bringing order to the web. 59. capra et al., “effects of structure and interaction style.” 60. yee et al., “faceted metadata for image search and browsing”; capra et al., “effects of structure and interaction style”; zhang and marchionini, evaluation and evolution. 61. english et al., “flexible search and navigation,” 7. 62. yee et al., “faceted metadata for image search and browsing,” 7. 63. la barre, “faceted navigation and browsing,” 85. 64. zhang and marchionini, evaluation and evolution, 183. 65. english et al., “flexible search and navigation.” 66. ibid., 6. 67. zhang and marchionini, evaluation and evolution. 68. capra et al., “effects of structure and interaction style.” 69. pratt, hearst, and fagan, “a knowledge-based approach.” 70. english et al., “flexible search and navigation.” 71. bauer, “yale university library vufind test—undergraduates.” 72. jakob nielsen, “quantitative studies: how many users to test?” online posting, alertbox, june 26, 2006 http://www.useit .com/alertbox/quantitative_testing.html (accessed apr. 7, 2010). 73. pratt, hearst, and fagan, “a knowledge-based approach.” 74. tod a. olson used graduate students currently writing their dissertations. olson, “utility of a faceted catalog for scholarly research,” library hi tech 25, no. 4 (2007): 550–61. 75. la barre, “faceted navigation and browsing.” 76. capra et al., “effects of structure and interaction style.” 27. junliang zhang and gary marchionini, evaluation and evolution of a browse and search interface: relation browser++ (atlanta, ga.: digital government society of north america, 2005): 179–88. 28. ibid., 183. 29. marti a. hearst, “uis for faceted navigation: recent advances and remaining open problems,” 2008, http://people. ischool.berkeley.edu/~hearst/papers/hcir08.pdf (accessed apr. 27, 2010). 30. tamar sadeh, “user experience in the library: a case study,” new library world 109, no. 1/2 (jan. 2008): 7–24. 31. ibid., 22. 32. jerilyn veldof, e-mail from university of minnesota usability services lab, 2008. 33. tod a. olson, “utility of a faceted catalog for scholarly research,” library hi tech 25, no. 4 (2007): 550–61. 34. ibid., 555. 35. kathleen bauer, “yale university library vufind test— undergraduates,” may 20, 2008, http://www.library.yale.edu/ usability/studies/summary_undergraduate.doc (accessed apr. 27, 2010); kathleen bauer and alice peterson-hart, “usability test of vufind as a subject-based display of ebooks,” aug. 21, 2008, http://www.library.yale.edu/usability/studies/summary _medical.doc (accessed apr. 27, 2010). 36. bauer and peterson-hart, “usability test of vufind as a subject-based display of ebooks,” 1. 37. ibid., 2. 38. ibid., 3. 39. ibid. 40. ibid., 4. 41. ibid. 42. ibid., 5. 43. ibid., 8. 44. kristin antelman, andrew k. pace, and emily lynema, “toward a twenty-first century library catalog,” information technology & libraries 25, no. 3 (2006): 128–39. 45. ibid., 139. 46. ibid., 133. 47. ibid., 135. 48. ibid., 136. 49. jennifer l. ward, steve shadle, and pam mofield, “user experience, feedback, and testing,” library technology reports 44, no. 6 (aug. 2008): 22. 50. 
english et al., “flexible search and navigation.” 51. peter ingwersen and irene wormell, “ranganathan in the perspective of advanced information retrieval,” libri 42 (1992): 184–201; winfried godert, “facet classification in online retrieval,” international classification 18, no. 2 (1991): 98–109.; w. godert, “klassificationssysteme und online-katalog [classification systems and the online catalogue],” zeitschrift für bibliothekswesen und bibliographie 34, no. 3 (1987): 185–95. 52. yee et al., “faceted metadata for image search and browsing”; english et al., “flexible search and navigation.” generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 171 from previous experience and from research in software engineering. wasted effort and poor interoperability can therefore ensue, raising the costs of dls and jeopardizing the fluidity of information assets in the future. in addition, there is a need for modeling services and data structures as highlighted in the “digital library reference model” proposed by the delos eu network of excellence (also called the “delos manifesto”);2 in fact, the distribution of dl services over digital networks, typically accessed through web browsers or dedicated clients, makes the whole theme of interaction between users important, for both individual usage and remote collaboration. designing and modeling such interactions call for considerations pertaining to the fields of human– computer interaction (hci) and computer-supported cooperative work (cscw). as an example, scenariobased or activity-based approaches developed in the hci area can be exploited in dl design. to meet these needs we developed cradle (cooperative-relational approach to digital library environments),3 a metamodel-based digital library management system (dlms) supporting collaboration in the design, development, and use of dls, exploiting patterns emerging from previous projects. the entities of the cradle metamodel allow the specification of collections, structures, services, and communities of users (called “societies” in cradle) and partially reflect the delos manifesto. the metamodel entities are based on existing dl taxonomies, such as those proposed by fox and marchionini,4 gonçalves et al.,5 or in the delos manifesto, so as to leverage available tools and knowledge. designers of dls can exploit the domain-specific visual language (dvsl) available in the cradle environment—where familiar entities extracted from the referred taxonomies are represented graphically—to model data structures, interfaces and services offered to the final users. the visual model is then processed and transformed, exploiting suitable templates, toward a set of specific languages for describing interfaces and services. the results are finally transformed into platformindependent (java) code for specific dl applications. cradle supports the basic functionalities of a dl through interfaces and service templates for managing, browsing, searching, and updating. these can be further specialized to deploy advanced functionalities as defined by designers through the entities of the proposed visual the design and development of a digital library involves different stakeholders, such as: information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negotiate the services the library has to offer. to this end, high-level, language-neutral models have to be devised. 
metamodeling techniques favor the definition of domainspecific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. this paper describes cradle (cooperative-relational approach to digital library environments), a metamodel-based framework and visual language for the definition of notions and services related to the development of digital libraries. a collection of tools allows the automatic generation of several services, defined with the cradle visual language, and of the graphical user interfaces providing access to them for the final user. the effectiveness of the approach is illustrated by presenting digital libraries generated with cradle, while the cradle environment has been evaluated by using the cognitive dimensions framework. d igital libraries (dls) are rapidly becoming a preferred source for information and documentation. both at research and industry levels, dls are the most referenced sources, as testified by the popularity of google books, google video, ieee explore, and the acm portal. nevertheless, no general model is uniformly accepted for such systems. only few examples of modeling languages for developing dls are available,1 and there is a general lack of systems for designing and developing dls. this is even more unfortunate because different stakeholders are interested in the design and development of a dl, such as information architects, to librarians, to software engineers, to experts of the specific domain served by the dl. these categories may have contrasting objectives and views when deploying a dl: librarians are able to deal with faceted categories of documents, taxonomies, and document classification; software engineers usually concentrate on services and code development; information architects favor effectiveness of retrieval; and domain experts are interested in directly referring to the content of interest without going through technical jargon. designers of dls are most often library technical staff with little to no formal training in software engineering, or computer scientists with little background in the research findings of hypertext information retrieval. thus dl systems are usually built from scratch using specialized architectures that do not benefit alessio malizia (alessio.malizia@uc3m.es) is associate professor, universidad carlos iii, department of informatics, madrid, spain; paolo bottoni (bottoni@di.uniroma1.it) is associate professor and s. levialdi (levialdi@di.uniroma1.it) is professor, “sapienza” university of rome, department of computer science, rome, italy. alessio malizia, paolo bottoni, and s. levialdi generating collaborative systems for digital libraries: a model-driven approach 172 information technology and libraries | december 2010 a formal foundation for digital libraries, called 5s, based on the concepts of streams, (data) structures, (resource) spaces, scenarios, and societies. while being evidence of a good modeling endeavor, the approach does not specify formally how to derive a system implementation from the model. 
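as a rough illustration of the kind of entities the cradle metamodel names (collections, structures, services, and societies of users), the following python sketch declares simplified stand-ins and wires a search service to a collection; the class names, fields, and behavior are invented for illustration and do not reproduce the actual cradle metamodel, its visual language, or its generated java code.

```python
# hypothetical, simplified stand-ins for the metamodel entities named in the
# text (collections, structures, services, societies); not the cradle code.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Society:
    """a community of users sharing a role (e.g., librarians, patrons)."""
    name: str
    members: List[str] = field(default_factory=list)

@dataclass
class Collection:
    """an organized set of resources with a declared record structure."""
    name: str
    structure: Dict[str, str]            # field name -> field type
    items: List[dict] = field(default_factory=list)

@dataclass
class Service:
    """an operation (browse, search, update) offered to one or more societies."""
    name: str
    operates_on: Collection
    offered_to: List[Society]
    handler: Callable[[Collection, str], List[dict]]

def search(collection: Collection, query: str) -> List[dict]:
    # naive keyword match over every field of every record
    q = query.lower()
    return [item for item in collection.items
            if any(q in str(value).lower() for value in item.values())]

papers = Collection("papers", {"title": "string", "author": "string"},
                    [{"title": "faceted browsing", "author": "fagan"}])
patrons = Society("patrons", ["alice"])
search_service = Service("search", papers, [patrons], search)
print(search_service.handler(papers, "faceted"))
```

in the framework described by the authors, declarations of this kind are drawn with the visual language and then transformed by templates into interface and service code rather than written by hand.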
the new generation of dl systems will be highly distributed, providing adaptive and interoperable behaviour by adjusting their structure dynamically, in order to act in dynamic environments (e.g., interfacing with the physical world).13 to manage such large and complex systems, a systematic engineering approach is required, typically one that includes modeling as an essential design activity where the availability of such domain-specific concepts as first-class elements in dl models will make application specification easier.14 while most of the disciplines related to dls—e.g., databases,15 information retrieval,16 and hypertext and multimedia17—have underlying formal models that have properly steered them, little is available to formalize dls per se. wang described the structure of a dl system as a domain-specific database together with a user interface for querying the records stored in the database.18 castelli et al. present an approach involving multidimensional query languages for searching information in dl systems that is based on first-order logic.19 these works model metadata specifications and thus are the main examples of system formalization in dl environments. cognitive models for information retrieval, as used for example by oddy et al.,20 focus on users’ information-seeking behavior (i.e., formation, nature, and properties of a users’ information need) and on how information retrieval systems are used in operational environments. other approaches based on models and languages for describing the entities involved in a dl are the digital library definition language,21 the dspace data model22 (with the definitions of communities and workflow models), the metis workflow framework,23 and the fedora structoid approach.24 e/r approaches are frequently used for modeling database management system (dbms) applications,25 but as e/r diagrams only model the static structure of a dbms, they generally do not deal deeply with dynamic aspects. temporal extensions add dynamic aspects to the e/r approach, but most of them are not object-oriented.26 the advent of object-oriented technology calls for approaches and tools to information system design resulting in object-oriented systems. these considerations drove research toward modeling approaches as supported by uml.27 however, since the uml metamodel is not yet widespread in the dl community, we adopted the e/r formalism and complemented it with the specification of the dynamics made available through the user interface, as described by malizia et al.28 using the metamodel, we have defined a dsvl, including basic entities and language. cradle is based on the entity-relationship (e/r) formalism, which is powerful and general enough to describe dl models and is supported by many tools as a metamodeling language. moreover, we observed that users and designers involved in the dl environment, but not coming from a software engineering background, may not be familiar with advanced formalism like unified modeling language (uml), but they usually have basic notions on database management systems, where e/r is largely employed. ■■ literature review dls are complex information systems involving technologies and features from different areas, such as library and information systems, information retrieval, and hci. this interdisciplinary nature is well reflected in the various definitions of dls present in the literature. 
as far back as 1965, licklider envisaged collections of digital versions of scanned documents accessible via interconnected computers.6 more recently, levy and marshall described dls as sets of collections of documents, together with digital resources, accessible by users in a distributed context.7 to manage the amount of information stored in such systems, they proposed some sort of user-assisting software agent. other definitions include not only printed documents, but multimedia resources in general.8 however different the definitions may be, they all include the presence of collections of resources, their organization in structured repositories, and their availability to remote users through networks (as discussed by morgan).9 recent efforts toward standardization have been taken by public and private organizations. for example, a delphi study identified four main ingredients: an organized collection of resources, mechanisms for browsing and searching, a distributed networked environment, and a set of objectified services.10 the president’s information technology advisory committee (pitac) panel on digital libraries sees dls as the networked collections of digital text, documents, images, sounds, scientific data, and software that make up the core of today’s internet and of tomorrow’s universally accessible digital repositories of all human knowledge.11 when considering dls in the context of distributed dl environments, only few papers have been produced, contrasting with the huge bibliography on dls in general. the dl group at the universidad de las américas puebla in mexico introduced the concept of personal and group spaces, relevant to the cscw domain, in the dl system context.12 users can share information stored in their personal spaces or share agents, thus allowing other users to perform the same search on the document collections in the dl. the cited text by gonçalves et al. gives generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 173 education as discussed by wattenberg or zia.33 in the nsdl program, a new generation of services has been developed that includes support for teaching and learning; this means also considering users’ activities or scenarios and not only information access. services for implementing personal content delivery and sharing, or managing digital resources and modeling collaboration, are examples of tools introduced during this program. the virtual reference desk (vrd) is emerging as an interactive service based on dls. with vrd, users can take advantage of domain experts’ knowledge and librarians’ experience to locate information. for example, the u.s. library of congress ask a librarian service acts as a vrd for users who want help in searching information categories or to interact with expert librarians to search for a specific topic.34 the interactive and collaborative aspects of activities taking place within dls facilitate the development of user communities. social networking, work practices, and content sharing are all features that influence the technology and its use. following borgmann,35 lynch sees the future of dls not in broad services but in supporting and facilitating “customization by community,” i.e., services tailored for domain-specific work practices.36 we also examined the research agenda on systemoriented issues in dls and the delos manifesto.37 the agenda abstracts the dl life cycle, identifying five main areas, and proposes key research problems. 
in particular we tackle activities such as formal modeling of dls and their communities and developing frameworks coherent with such models. at the architectural level, one point of interest is to support heterogeneous and distributed systems, in particular networked dls and services.38 for interoperability, one of the issues is how to support and interoperate with different metadata models and standards to allow distributed cataloguing and indexing, as in the open archive initiative (oai).39 finally, we are interested in the service level of the research agenda and more precisely in web services and workflow management as crucial features when including communities and designing dls for use over networks and for sharing content. as a result of this analysis, the cradle framework features the following: ■■ a visual language to help users and designers when visual modeling their specific dl (without knowing any technical detail apart from learning how to use a visual environment providing diagrams representations of domain specific elements) ■■ an environment integrating visual modeling and code generation instead of simply providing an integrated architecture that does not hide technical details ■■ interface generation for dealing with different users relationships for modeling dl-related scenarios and activities. the need for the integration of multiple languages has also been indicated as a key aspect of the dsvl approach.29 in fact, complex domains like dls typically consist of multiple subdomains, each of which may require its own particular language. in the current implementation, the definition of dsvls exploits the metamodeling facilities of atom3, based on graph-grammars.30 atom3 has been typically used for simulation and model transformation, but we adopt it here as a tool for system generation. ■■ requirements for modeling digital libraries we follow the delos manifesto by considering a dl as an organization (possibly virtual and distributed) for managing collections of digital documents (digital contents in general) and preserving their images on storage. a dl offers contextual services to communities of users, a certain quality of service, and the ability to apply specific policies. in cradle we leave the definition of quality of service to the service-oriented architecture standards we employ and partially model the applicable policy, but we focus here on crucial interactivity aspects needed to make dls usable by different communities of users. in particular, we model interactive activities and services based on librarians’ experiences in face-to-face communication with users, or designing exchange and integration procedures for communicating between institutions and managing shared resources. while librarians are usually interested in modeling metadata across dls, software engineers aim at providing multiple tools for implementing services,31 such as indexing, querying, semantics,32 etc. therefore we provide a visual model useful for librarians and information architects to mimic the design phases they usually perform. moreover, by supporting component services, we help software engineers to specify and add services on demand to dl environments. to this end, we use a service component model. by sharing a common language, users from different categories can communicate to design a dl system while concentrating on their own tasks (services development and design for software engineers and dl design for librarians and information architects). 
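The service component model mentioned above can be made concrete with a small sketch. The following Java fragment is offered only as an illustration, not as CRADLE's actual API: the interface, record, and method names are assumptions introduced for this example, showing one way a pluggable service component that exchanges event and response messages might be declared.

import java.util.List;

// Hypothetical sketch of a service component contract of the kind implied by the
// CRADLE service model: services receive event messages and reply with responses.
// All names below are illustrative assumptions, not the framework's classes.
public final class ServiceComponentSketch {

    /** An event raised by an actor, e.g. "borrow" or "return". */
    record Event(String name, List<String> arguments) {}

    /** A response sent back to the raising actor, e.g. "found" or "not_found". */
    record Response(String name, String payload) {}

    /** Contract that every pluggable service component would satisfy. */
    interface ServiceComponent {
        String name();
        Response handle(Event event);   // synchronous ("wait") interaction
    }

    public static void main(String[] args) {
        ServiceComponent echo = new ServiceComponent() {
            public String name() { return "echo"; }
            public Response handle(Event e) { return new Response("ok", e.name()); }
        };
        System.out.println(echo.handle(new Event("borrow", List.of("book-42"))).payload());
    }
}

A librarian-facing design tool and a software engineer's service implementation would both refer to such a shared contract, which is the kind of common language the approach aims for.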
users are modeled according to the delos manifesto as dl end-users (subdivided into content creators, content consumers, and librarians), dl designers (librarians and information architects), dl system administrators (typically librarians), and dl application developers (software engineers). several activities have been started on modeling domain specific dls. as an example, the u.s. national science digital library (nsdl) program promotes educational dls and services for basic and advanced science 174 information technology and libraries | december 2010 ■■ how that information is structured and organized (structural model) ■■ the behavior of the dl (service model) and the different societies of actors ■■ groups of services acting together to carry out the dl behavior (societal model) figure 1 depicts the design approach supported by cradle architecture, namely, modeling the society of actors and services interacting in the domain-specific scenarios and describing the documents and metadata structure included with the library by defining a visual model for all these entities. the dl is built using a collection of stock parts and configurable components that provide the infrastructure for the new dl. this infrastructure includes the classes of objects and relationships that make up the dl, and processing tools to create and load the actual library collection from raw documents, as well as services for searching, browsing, and collection maintenance. finally, the code generation module generates tailored dl services code stubs by composing and specializing components from the component pool. initially, a dl designer is responsible for formalizing (starting from an analysis of the dl requirements and characteristics) a conceptual description of the dl using metamodel concepts. model specifications are then fed into a dl generator (written in python for atom3), to produce a dl tailored suitable for specific platforms and requirements. after these design phases, cradle generates the code for the user interface and the parts of code corresponding to services and actors interacting in the described society. a set of templates for code generation and designers ■■ flexible metadata definitions ■■ a set of interactive integrated tools for user activities with the generated dl system to sum up, cradle is a dlms aimed at supporting all the users involved in the development of a dl system and providing interfaces, data modeling, and services for user-driven generation of specific dls. although cradle does not yet satisfy all requirements for a generic dl system, it addresses issues focused on developing interactive dl systems, stressing interfaces and communication between users. nevertheless, we employed standards when possible to leave it open for further specification or enhancements from the dl user community. extensive use of xml-based languages allows us to change document information depending on implemented recognition algorithms so that expert users can easily model their dl by selecting the best recognition and indexing algorithms. cradle evolves from the jdan (java-based environment for document applications on networks) platform, which managed both document images and forms on the basis of a component architecture.40 jdan was based on xml technologies, and its modularity allowed its integration in service-based and grid-based scenarios. 
it supported template code generation and modeling, but it required the designer to write xml specifications and edit xml schema files in order to model the dl document types and services, thus requiring technical knowledge that should be avoided to let users concentrate on their specific domains. ■■ modeling digital library systems the cradle framework shows a unique combination of features: it is based on a formal model, exploits a set of domain-specific languages, and provides automatic code generation. moreover, fundamental roles are played by the concepts of society and collaboration.41 cradle generates code from tools built after modeling a dl (according to the rules defined by the proposed metamodel) and performs automatic transformation and mapping from model to code to generate software tools for a given dl model. the specification of a dl in cradle encompasses four complementary dimensions: ■■ multimedia information supported by the dl (collection model) figure 1. cradle architecture generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 175 socioeconomic, and environment dimensions. we now show in detail the entities and relations in the derived metamodel, shown in figure 2. actor entities actors are the users of dls. actors interact with the dl through services (interfaces) that are (or can be) affected by the actors preferences and messages (raised events). in the cradle metamodel, an actor is an entity with a behavior that may concurrently generate events. communications with other actors may occur synchronously or asynchronously. actors can relate through services to shape a digital community, i.e., the basis of a dl society. in fact, communities of students, readers, or librarians interact with and through dls, generally following predefined scenarios. as an example, societies can behave as query generator services (from the point of view of the library) and as teaching, learning, and working services (from the point of view of other humans and organizations). communication between actors within the same or different societies occur through message exchange. to operate, societies need shared data structures and message protocols, enacted by sending structured sequences of queries and retrieving collections of results. the actor entity includes three attributes: 1. role identifies which role is played by the actor within the dl society. examples of specific human roles include authors, publishers, editors, maintainers, developers, and the library staff. examples of nonhuman actors include computers, printers, telecommunication devices, software agents, and digital resources in general. 2. status is an enumeration of possible statuses for the actor: i. none (default value) ii. active (present in the model and actively generating events) iii. inactive (present in the model but not generating events) iv. sleeping (present in the model and awaiting for a response to a raised event) 3. events describes a list of events that can be raised by the actor or received as a response message from a service. examples of events are borrow, reserve, return, etc. events triggered from digital resources include store, trash, and transfer. examples of response events are found, not found, updated, etc. have been built for typical services of a dl environment. to improve acceptability and interoperability, cradle adopts standard specification sublanguages for representing dl concepts. 
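To make the actor entity more tangible, the following Java sketch mirrors the three attributes just listed (role, status, and raised events); the class, enumeration, and method names are assumptions made for illustration and are not the code generated by CRADLE.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the actor entity from the CRADLE metamodel: an actor has
// a role, a status drawn from the enumeration described in the text, and a list of
// events it can raise. Names are assumptions, not the framework's actual classes.
public class ActorSketch {

    /** Possible statuses, mirroring the enumeration in the metamodel description. */
    enum Status { NONE, ACTIVE, INACTIVE, SLEEPING }

    private final String role;           // e.g. "librarian", "student", "printer"
    private Status status = Status.NONE; // default value per the description
    private final List<String> events = new ArrayList<>(); // e.g. "borrow", "return"

    ActorSketch(String role) { this.role = role; }

    void raise(String event) {
        events.add(event);
        status = Status.SLEEPING;        // awaiting a response to the raised event
    }

    void receiveResponse(String response) {
        status = Status.ACTIVE;          // back to actively generating events
        System.out.println(role + " received response: " + response);
    }

    public static void main(String[] args) {
        ActorSketch student = new ActorSketch("student");
        student.raise("borrow");
        student.receiveResponse("is_available=true");
    }
}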
most of the cradle model primitives are defined as xml elements, possibly enclosing other sublanguages to help define dl concepts. in more detail, mime types constitute the basis for encoding elements of a collection. the xml user interface language (xul)42 is used to represent appearance and visual interfaces, and xdoclet is used in the libgen code generation module, as shown in figure 1.43 ■■ the cradle metamodel in the cradle formalism, the specification of a dl includes a collection model describing the maintained multimedia documents, a structural model of information organization, a service model for the dl behavior, and a societal model describing the societies of actors and groups of services acting together to carry out the dl behavior. a society is an instance of the cradle model defined according to a specific collaboration framework in the dl domain. a society is the highest-level component of a dl and exists to serve the information needs of its actors and to describe its context of usage. hence a dl collects, preserves, and shares information artefacts for society members. the basic entities in cradle are derived from the categorization along the actors, activities, components, figure 2. the cradle metamodel with the e/r formalism 176 information technology and libraries | december 2010 a text document, including scientific articles and books, becomes a sequence of strings. the struct entity a struct is a structural element specifying a part of a whole. in dls, structures represent hypertexts, taxonomies, relationships between elements, or containment. for example, books can be structured logically into chapters, sections, subsections, and paragraphs, or physically into cover, pages, line groups (paragraphs), and lines. structures are represented as graphs, and the struct entity (a vertex) contains four attributes: 1. document is a pointer to the document entity the structure refers to. 2. id is a unique identifier for a structure element. 3. type takes three possible values: i. metadata denotes a content descriptor, for instance title, author, etc. ii. layout denotes the associated layout, e.g., left frame, columns, etc. iii. item indicates a generic structure element used for extending the model. 4. values is a list of values describing the element content, e.g., title, author, etc. actors interact with services in an event-driven way. services are connected via messages (send and reply) and can be sequential, concurrent, or task-related (when a service acts as a subtask of a macroservice). services perform operations (e.g., get, add, and del) on collections, producing collections of documents as results. struct elements are connected to each other as nodes of a graph representing metadata structures associated with documents. the metamodel has been translated to a dsvl, associating symbols and icons with entities and relations (see “cradle language and tools” below). with respect to the six core concepts of the delos manifesto (content, user, functionality, quality, policy, and architecture), content can be modeled in cradle as collections and structs, user as actor, and functionality as service. the quality concept is not directly modeled in cradle, but for quality of service we support standard service architecture. policies can be partially modeled by services managing interaction between actors and collections, making it possible to apply standard access policies. from the architectural point of view, we follow the reference architecture of figure 1. 
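The struct entity can likewise be sketched as a node of the metadata tree attached to a document. The Java fragment below is only illustrative: the type names, the child list, and the helper method are assumptions, while the four attributes follow the description above.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the struct entity: a node of the metadata tree attached to
// a document, carrying the four attributes described in the text. Class and method
// names are assumptions made for this example.
public class StructSketch {

    enum Type { METADATA, LAYOUT, ITEM }   // the three values listed in the text

    final String documentLabel;            // pointer to the document entity
    final String id;                       // unique identifier of the node
    final Type type;
    final List<String> values;             // element content, e.g. title, author
    final List<StructSketch> children = new ArrayList<>();

    StructSketch(String documentLabel, String id, Type type, List<String> values) {
        this.documentLabel = documentLabel;
        this.id = id;
        this.type = type;
        this.values = values;
    }

    StructSketch addChild(StructSketch child) {
        children.add(child);
        return this;
    }

    public static void main(String[] args) {
        StructSketch root = new StructSketch("book-42", "root", Type.ITEM, List.of());
        root.addChild(new StructSketch("book-42", "title", Type.METADATA, List.of("Libraries of the Future")))
            .addChild(new StructSketch("book-42", "author", Type.METADATA, List.of("J. C. R. Licklider")));
        System.out.println("metadata nodes under root: " + root.children.size());
    }
}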
■■ cradle language and tools in this section we describe the selection of languages and tools of the cradle platform. to improve interoperability service entities services describe scenarios, activities, operations, and tasks that ultimately specify the functionalities of a dl, such as collecting, creating, disseminating, evaluating, organizing, personalizing, preserving, requesting, and selecting documents and providing services to humans concerned with fact-finding, learning, gathering, and exploring the content of a dl. all these activities can be described and implemented using scenarios and appear in the dl setting as a result of actors using services (thus societies). furthermore, these activities realize and shape relationships within and between societies, services, and structures. in the cradle metamodel, the service entity models what the system is required to do, in terms of actions and processes, to achieve a task. a detailed task analysis helps understand the current system and the information flow within it in order to design and allocate tasks appropriately. the service entity has four attributes: 1. name is a string representing a textual description of the service. 2. sync states whether communication is synchronous or asynchronous, modeled by values wait and nowait, respectively. 3. events is a list of messages that can trigger actions among services (tasks); for example, valid or notvalid in case of a parsing service. 4. responses contain a list of response messages that can reply to raised events; they are used as a communication mechanism by actors and services. the collection entity collections are sets of documents of arbitrary type (e.g., bits, characters, images, etc.) used to model static or dynamic content. in the static interpretation, a collection defines information content interpreted as a set of basic elements, often of the same type, such as plain text. examples of dynamic content include video delivered to a viewer, animated presentations, and so on. the attributes of collection are name and documents. name is a string, while documents is a list of pairs (documentname, documentlabel), the latter being a pointer to the document entity. the document entity documents are the basic elements in a dl and are modeled with attributes label and structure. label defines a textual string used by a collection entity to refer to the document. we can consider it as a document identifier, specifying a class or a type of document. structure defines the semantics and area of application of the document. for example, any textual representation can be seen as a string of characters, so that generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 177 graphs. model manipulation can then be expressed via graph grammars also specified in atom3. the general process of automatic creation of cooperative dl environments for an application is shown in figure 3. initially, a designer formalizes a conceptual description of the dl using the cradle metamodel concepts. this phase is usually preceded by an analysis of requirements and interaction scenarios, as seen previously. model specifications are then provided to a dl code generator (written in python within atom3) to produce dls tailored to specific platforms and requirements. these are built on a collection of templates of services and configurable components providing infrastructure for the new dl. 
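A compact way to read the service, collection, and document entities is as plain data holders carrying the attributes just described. The following sketch is an illustration under stated assumptions (the names and types are ours, not CRADLE's generated classes), showing how the four service attributes, the (documentName, documentLabel) pairs of a collection, and the label/structure pair of a document fit together.

import java.util.List;
import java.util.Map;

// Illustrative data holders for the service, collection, and document entities as
// described in the text: a service has a name, a sync mode (wait/nowait), and lists
// of events and responses; a collection has a name and (documentName, documentLabel)
// pairs; a document has a label and a structure. All names here are assumptions.
public class EntitySketches {

    enum Sync { WAIT, NOWAIT }

    record Service(String name, Sync sync, List<String> events, List<String> responses) {}

    record Document(String label, String structure) {}

    /** documents maps documentName -> documentLabel, mirroring the pair list. */
    record Collection(String name, Map<String, String> documents) {}

    public static void main(String[] args) {
        Service search = new Service("do_search", Sync.WAIT,
                List.of("borrow_request"), List.of("is_available"));
        Collection library = new Collection("library",
                Map.of("licklider1965", "book-42"));
        Document doc = new Document("book-42", "text");
        System.out.println(search.name() + " over " + library.name()
                + " containing " + doc.label());
    }
}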
the sketched infrastructure includes classes for objects (tasks), relationships making up the dl, and processing tools to upload the actual library collection from raw documents, as well as services for searching and browsing and for document collections maintenance. the cradle generator automatically generates different kinds of output for the cradle model of the cooperative dl environment, such as service and collection managers. collection managers define the logical schemata of the dl, which in cradle correspond to a set of mime types, xul and xdoclet specifications, representing digital objects, their component parts, and linking information. collection managers also store instances of their and collaboration, cradle makes extensive use of existing standard specification languages. most cradle outputs are defined with xml-based formats, able to enclose other specific languages. the basic languages and corresponding tools used in cradle are the following: ■■ mime type. multipurpose internet mail extensions (mime) constitute the basis for encoding documents in cradle, supporting several file formats and types of character encoding. mime was chosen because of wide availability of mime types, and standardisation of the approach. this makes it a natural choice for dls where different types of documents need to be managed (pdf, html, doc, etc.). moreover, mime standards for character encoding descriptions help keeping the cradle framework open and compliant with standards. ■■ xul. the xml user interface language (xul) is an xml-based markup language used to represent appearance and visual interfaces. xul is not a public standard yet, but it uses many existing standards and technologies, including dtd and rdf,44 which makes it easily readable for people with a background in web programming and design. the main benefit of xul is that it provides a simple definition of common user interface elements (widgets). this drastically reduces the software development effort required for visual interfaces. ■■ xdoclet. xdoclet is used for generating services from tagged-code fragments. it is an open-source code generation library which enables attribute-oriented programming for java via insertion of special tags.45 it includes a library of predefined tags, which simplify coding for various technologies, e.g., web services. the motivation for using xdoclet in the cradle framework is related to its approach for template code generation. designers can describe templates for each service (browse, query, and index) and the xdoclet generated code can be automatically transformed into the java code for managing the specified service. ■■ atom3. atom3 is a metamodeling system to model graphical formalisms. starting from a metaspecification (in e/r), atom3 generates a tool to process models described in the chosen formalism. models are internally represented using abstract syntax figure 3. cooperative dl generation process with cradle framework 178 information technology and libraries | december 2010 and (3) the metadata operations box. the right column manages visualization and multimedia information obtained from documents. the basic features provided with the ui templates are document loading, visualization, metadata organization, and management. the layout template, in the collection box, manages the visualization of the documents contained in a collection, while the visualization template works according to the data (mime) type specified by the document. 
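The choice of viewer in the document area is driven by the document's MIME type. The fragment below sketches that dispatch in Java purely for illustration; the mapping, the viewer identifiers, and the fallback are assumptions, since in CRADLE the behaviour is expressed by specialising XUL templates rather than by hand-written code.

import java.util.Map;

// Illustrative sketch of MIME-type-driven selection of a document viewer, the kind
// of decision the visualization template makes for the document area of the
// generated interface. The viewer identifiers and the mapping are assumptions.
public class ViewerDispatchSketch {

    private static final Map<String, String> VIEWER_BY_MIME = Map.of(
            "application/pdf", "pdf-plugin",
            "text/html", "html-frame",
            "image/png", "image-box");

    static String viewerFor(String mimeType) {
        // fall back to a generic viewer when the type is not recognised
        return VIEWER_BY_MIME.getOrDefault(mimeType, "generic-viewer");
    }

    public static void main(String[] args) {
        System.out.println(viewerFor("application/pdf")); // pdf-plugin
        System.out.println(viewerFor("audio/ogg"));       // generic-viewer
    }
}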
actually, by selecting a document included in the collection, the corresponding data file is automatically uploaded and visualized in the ui. the metadata visualization in the code template reflects the metadata structure (a tree) represented by a struct, specifying the relationship between parent and child nodes. thus the xul template includes an area (the metadata box) for managing tree structures as described in the visual model of the dl. although the tree-like visualization has potential drawbacks if there are many metadata items, there should be no real concern with medium loads. the ui template also includes a box to perform operations on metadata, such as insert, delete, and edit. users can select a value in the metadata box and manipulate the presented values. figure 4 shows an example of a ui generated from a basic template. service templates to achieve automated code generation, we use xdoclet to specify parameters and service code generation according to such parameters. cradle can automatically annotate java files with name–value pairs, and xdoclet provides a syntax for parameter specification. code generation is classes and function as search engines for the system. services classes also are generated and are represented as attribute-oriented classes involving parts and features of entities. ■■ cradle platform the cradle platform is based on a model-driven approach for the design and automatic generation of code for dls. in particular, the dsvl for cradle has four diagram types (collection, structure, service, and actor) to describe the different aspects of a dl. in this section we describe the user interface (ui) and service templates used for generating the dl tools. in particular, the ui layout is mainly generated from the structured information provided by the document, struct, and collection entities. the ui events are managed by invoking the appropriate services according to the imported xul templates. at the service and communication levels, the xdoclet code is generated by the service and actor entities, exploiting their relationships. we also show how code generation works and the advanced platform features, such as automatic service discovery. at the end of the section a running example is shown, representing all the phases involved in using the cradle framework for generating the dl tools for a typical library scenario. user interface templates the generation of the ui is driven by the visual model designed by the cradle user. specifically, the model entities involved in this process are document, struct and collection (see figure 2) for the basic components and layout of the interfaces, while linked services are described in the appropriate templates. the code generation process takes place through transformations implemented as actions in the atom3 metamodel specification, where graph-grammar rules may have a condition that must be satisfied for the rule to be applied (preconditions), as well as actions to be performed when the rule is executed (postconditions). a transformation is described during the visual modeling phase in terms of conditions and corresponding actions (inserting xul language statements for the interface in the appropriate code template placeholders). the generated user interface is built on a set of xul template files that are automatically specialized on the basis of the attributes and relationships designed in the visual modeling phase. the layout template for the user interface is divided into two columns (see figure 4). 
the left column is made of three boxes: (1) the collection box (2) the metadata box, figure 4. an example of an automatically generated user interface. (a) document area; (b) collection box; (c) metadata box; (d) metadata operations box. generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 179 "msg arguments.argname"> { "" , "" "" } , }; the first two lines declare a class with a name class nameimpl that extends the class name. the xdoclet template tag xdtclass:classname denotes the name of the class in the annotated java file. all standard xdoclet template tags have a namespace starting with “xdt.” the rest of the template uses xdtfield : forallfield to iterate through the fields. for each field with a tag named msg arguments.argname (checked using xdtfield : ifhasfieldtag), it creates a subarray of strings using the values obtained from the field tag parameters. xdtfield : fieldname gives the name of the field, while xdtfield : fieldtagvalue retrieves the value of a given field tag parameter. characters that are not part of some xdoclet template tags are directly copied into the generated code. the following code segment was generated by xdoclet using the annotated fields and the above template segment: public class msgargumentsimpl extends msgarguments { public static string[ ][ ] argumentnames = new string[ ][ ]{ { "eventmsg" , " event " , " eventstring " } , { " responsemsg " , " response " , " responsestring " } , }; } similarly, we generate the getter and setter methods for each field: public get () { return ; } public void set ( string value ) { based on code templates. hence service templates are xdoclet templates for transforming xdoclet code fragments obtained from the modeled service entities. the basic xdoclet template manages messages between services, according to the event and response attributes described in “cradle language and tools” above. in fact, cradle generates a java application (a service) that needs to receive messages (event) and reply to them (response) as parameters for the service application. in xdoclet, these can be attached to the corresponding field by means of annotation tags, as in the following code segments: public class msgarguments { . . . . . . /* * @msg arguments.argname name="event " desc="event_string " */ protected string eventmsg = null; /* * @msg arguments.argname name="response" * desc="response_string " */ protected string responsemsg = null; } each msg arguments.argname related to a field is called a field tag. each field tag can have multiple parameters, listed after the field tag. in the tag name msg arguments .argname, the prefix serves as the namespace of all tags for this particular xdoclet application, thus avoiding naming conflicts with other standard or customized xdoclet tags. not only fields can be annotated, but also other entities such as class and functions can have tags too. xdoclet enables powerful code generation requiring little or no customization (depending on how much is provided by the template). the type of code to be generated using the parameters is defined by the corresponding xdoclet template. we have created template files composed of java codes and special xdoclet instructions in the form of xml tags. these xdoclet instructions allow conditionals (if) and loops (for), thus providing us with expressive power close to a programming language. 
in the following example, we first create an array containing labels and other information for each argument: public class impl extends { public static string[ ][ ] argumentnames = new string[ ][ ] { " , value ) ; }< /xdtfield : ifhasfieldtag> this translates into the following generated code: public java.lang.string get eventmsg ( ) { return eventmsg ; } public void set eventmsg ( string value ) { setvalue ( "eventmsg" , value ) ; } public java.lang.string getresponsemsg ( ) { return getresponsemsg ; } public void setresponsemsg ( string value ) { setvalue ( " responsemsg " , value ) ; } the same template is used for managing the name and sync attributes of service entities. code generation, service discovery, and advanced features a service or interface template only describes the solution to a particular design problem—it is not code. consequently, users will find it difficult to make the leap from the template description to a particular implementation even though the template might include sample code. others, like software engineers, might have no trouble translating the template into code, but they still may find it a chore, especially when they have to do it repeatedly. the cradle visual design environment (based on atom3) helps alleviate these problems. from just a few pieces of information (the visual model), typically application-specific names for actors and services in a dl society along with choices for the design tradeoffs, the tool can create class declarations and definitions implementing the template. the ultimate goal of the modeling effort remains, however, the production of reliable and efficiently executable code. hence a code generation transformation produces interface (xul) and service (java code from xdoclet templates) code from the dl model. we have manually coded xul templates specifying the static setup of the gui, the various widgets and their layout. this must be complemented with code generated from a dl model of the systems dynamics coded into services. while other approaches are possible,46 we employed the solution implemented within the atom3 environment according to its graph grammar modeling approach to code generation. cradle supports a flexible iterative process for visual design and code generation. in fact, a design change might require substantial reimplementation generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 181 selecting one, the ui activates the metadata operations box—figure 6(d). the selected metadata node will then be presented in the lower (metadata operations) box, labeled “set metadata values,” replacing the default “none” value as shown in figure 6. after the metadata item is presented, the user can edit its value and save it by clicking on the “set value” button. the associated action saves the metadata information and causes its display in the intermediate box (tree-like structure), changing the visualization according to the new values. the code generation process for the do_search and front desk services is based on xdoclet templates. in particular, a message listener template is used to generate the java code for the front desk service. in fact, the front desk service is asynchronous and manages communications between actors. the actors classes are generated also by using the services templates since they have attributes, events, and messages, just like the services. 
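For readability, the following is a compilable rendering of the kind of class the message-argument template produces, with the argument-names array and the accessor methods reconstructed from the fragments quoted above. The extends clause to the annotated base class is omitted and the setValue helper is a stand-in added so that the sketch is self-contained, so this should be read as an approximation rather than the exact generated output.

// Compilable approximation of the generated message-argument class: the
// argument-names array and accessors follow the fragments quoted in the text,
// while the setValue helper is a minimal assumption added for self-containment.
public class MsgArgumentsImpl {

    public static String[][] argumentNames = new String[][] {
            { "eventMsg",    "event",    "event_string" },
            { "responseMsg", "response", "response_string" },
    };

    protected String eventMsg = null;
    protected String responseMsg = null;

    // stand-in for the framework's generic setter used by the generated code
    protected void setValue(String field, String value) {
        if ("eventMsg".equals(field)) {
            eventMsg = value;
        } else if ("responseMsg".equals(field)) {
            responseMsg = value;
        }
    }

    public String getEventMsg() { return eventMsg; }

    public void setEventMsg(String value) { setValue("eventMsg", value); }

    public String getResponseMsg() { return responseMsg; }

    public void setResponseMsg(String value) { setValue("responseMsg", value); }

    public static void main(String[] args) {
        MsgArgumentsImpl msg = new MsgArgumentsImpl();
        msg.setEventMsg("borrow");
        System.out.println(msg.getEventMsg());
    }
}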
the do_search service code is based on the producer and consumer templates, since it is synchronous by definition in the modeled scenario. a get method retrieving a collection of documents is implemented from the getter template. the routine invoked by the transformation action for struct entities performs a breadth-first exploration of the metadata tree in the visual model and attaches the corresponding xul code for displaying the struct node in the correct position within the graph structure of the ui. collections, while a single rectangle connected to a collection represents a document entity; the circles linked to the document entity are the struct (metadata) entities. metadata entities are linked to the node relationships (organized as a tree) and linked to the document entity by a metadata linktype relationship. the search service is synchronous (sync attribute set to “wait”). it queries the document collection (get operation) looking for the requested document (using metadata information provided by the borrow request), and waits for the result of get (a collection of documents). based on this result, the service returns a boolean message “is_available,” which is then propagated as a response to the librarian and eventually to the student, as shown in figure 5. when the library designer has built the model, the transformation process can be run, executing the code generation actions associated with the entities and services represented in the model. the code generation process is based on template code snippets generated from the atom3 environment graph transformation engine, following the generative rules of the metamodel. we also use pre– and postconditions on application of transformation rules to have code generation depend on verification of some property. the generated ui is presented in figure 6. on the right side, the document area is presented according to the xul template. documents are managed according to their mime type: the pdf file of the example is loaded with the appropriate adobe acrobat reader plug-in. on the left column of the ui are three boxes, according to the xul template. the collection box—figure 6(b)— presents the list of documents contained in the collection specified by the documents attribute of the library collection entity, and allows users to interact with documents. after selecting a document by clicking on the list, it is presented in the document area—figure 6(a)—where it can be managed (edit, print, save, etc.). in the metadata box—figure 6(c)—the tree structure of the metadata is depicted according to the categorization modeled by the designer. the xul template contains all the basic layout and action features for managing a tree structure. the generated box contains the parent and child nodes according to the attributes specified in the corresponding struct elements. the user can click on the root for compacting or exploding the tree nodes; by figure 5. the library model, alias the model of the library society 182 information technology and libraries | december 2010 workflow system. the release collection maintains the image files in a permanent storage, while data is written to the target database or content management software, together with xml metadata snippets (e.g., to be stored in xml native dbms). a typical configuration would have the recognition service running on a server cluster, with many dataentry services running on different clients (web browsers directly support xul interfaces). 
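The breadth-first exploration performed by the struct transformation action can be sketched as follows. The node record, the element names, and the flat output format are assumptions introduced for the example; the actual generator inserts XUL statements into hand-coded template placeholders rather than printing markup directly.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Illustrative sketch of the breadth-first exploration of a metadata tree that the
// struct transformation action performs, emitting one XUL-like tree item per node.
// The node record and the output format are assumptions made for this example.
public class StructToXulSketch {

    record Node(String id, String label, List<Node> children) {}

    static void emitXul(Node root) {
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {                       // breadth-first order
            Node current = queue.removeFirst();
            System.out.printf("<treeitem id=\"%s\" label=\"%s\"/>%n",
                    current.id(), current.label());
            queue.addAll(current.children());
        }
    }

    public static void main(String[] args) {
        Node title = new Node("title", "Title", List.of());
        Node author = new Node("author", "Author", List.of());
        Node root = new Node("root", "Metadata", List.of(title, author));
        emitXul(root);
    }
}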
whereas current document capture environments are proprietary and closed, the definition of an xml-based interchange format allows the suitable assembly of different component-based technologies in order to define a complex framework. the realization of the jdan dl system within the cradle framework can be considered as a preliminary step in the direction of a standard multimedia document managing platform with region segmentation and classification, thus aiming at automatic recognition of image database and batch acquisition of multiple multimedia documents types and formats. personal and collaborative spaces a personal space is a virtual area (within the dl society) that is modeled as being owned and maintained by a user including resources (document collections, services, etc.), or references to resources, which are relevant to a task, or set of tasks, the user needs to carry out in the dl. personal spaces may thus contain digital documents in multiple media, personal schedules, visualization tools, and user agents (shaped as services) entitled with various tasks. resources within personal spaces can be allocated ■■ designing and generating advanced collaborative dl systems in this section we show the use of cradle as an analytical tool helpful in comprehending specific dl phenomena, to present the complex interplays that occur between cradle components and dl concepts in a real dl application, and to illustrate the possibility of using cradle as a tool to design and generate advanced tools for dl development. modeling document images collections with cradle, the designer can provide the visual model of the dl society involved in document management and the remaining phases are automatically carried out by cradle modules and templates. we have provided the user with basic code templates for the recognition and indexing services, the data-entry plug-in, and archive release. the designer can thus simply translate the particular dl society into the corresponding visual model within the cradle visual modeling editor. as a proof of concept, figure 7 models the jdan architecture, introduced in “requirements for modeling digital libraries,” exploiting the cradle visual language. the recognition service performs the automatic document recognition and stores the corresponding document images, together with the extracted metadata in the archive collection. it interacts with the scanner actor, representing a machine or a human operator that scans paper documents. designers can choose their own segmentation method or algorithm; what is required to be compliant with the framework is to produce an xdoclet template. it stores the document images into the archive collection, with its different regions layout information according to the xml metadata schema provided by the designer. if there is at least one region marked as “not interpreted,” the dataentry service is invoked on the “not interpreted” regions. the data-entry service allows operators to evaluate the automatic classification performed by the system and edit the segmentation for indexing. operators can also edit the recognized regions with the classification engine (included in the recognition service) and adjust their values and sizes. the output of this phase is an xml description that will be imported in the indexing service for indexing (and eventually querying). 
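The hand-off between the recognition and data-entry services can be summarised in a few lines. The sketch below is illustrative only: the region type, the status strings, and the method names are assumptions, but the control flow follows the description above, invoking data entry exactly when at least one region is still marked "not interpreted."

import java.util.List;

// Illustrative sketch of the hand-off described for the jdan scenario: if any
// recognised region is still marked "not interpreted", the data-entry step is
// invoked on exactly those regions; otherwise the document goes straight to
// indexing. Region, statuses, and method names are assumptions for the example.
public class RecognitionFlowSketch {

    record Region(String id, String status) {}   // e.g. "interpreted", "not interpreted"

    static void process(List<Region> regions) {
        List<Region> pending = regions.stream()
                .filter(r -> "not interpreted".equals(r.status()))
                .toList();
        if (pending.isEmpty()) {
            System.out.println("all regions interpreted: forwarding to indexing service");
        } else {
            System.out.println("invoking data-entry service on " + pending.size() + " region(s)");
        }
    }

    public static void main(String[] args) {
        process(List.of(new Region("r1", "interpreted"),
                        new Region("r2", "not interpreted")));
    }
}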
the archive collection stores all of the basic information kept in jdan, such as text labels, while the indexing service, based on a multitier architecture, exploiting jboss 3.0, has access to them. this service is responsible for turning the data fragments in the archive collection into useful forms to be presented to the final users, e.g., a report or a query result. the final stage in the recognition process could be to release each document to a content management or figure 6. the ui generated by cradle transforming the library model in xul and xdoclet code generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 183 and metadata, but also can share information with the various committees collaborating for certain tasks. ■■ evaluation in this section we evaluate the presented approach from three different perspectives: usability of the cradle notation, its expressiveness, and usability of the generated dls. usability of cradle notation we have tested it by using the well known cognitive dimensions framework for notations and visual language design.48 the dimensions are usually employed to evaluate the usability of a visual language or notation, or as heuristics to drive the design of innovative visual languages. the significant results are as follows. abstraction gradient an abstraction is a grouping of elements to be treated as one entity. in this sense, cradle is abstraction-tolerant. it provides entities for high-level abstractions of communication processes and services. these abstractions are intuitive as they are visualized as the process they represent (services with events and responses) and easy to learn as their configuration implies few simple attributes. although cradle does not allow users to build new abstractions, the e/r formalism is powerful enough to provide basic abstraction levels. closeness of mapping cradle elements have been assigned icons to resemble their real-world counterparts (e.g., a collection is represented as a set of paper sheets). the elements that do not have a correspondence with a physical object in the real world have icons borrowed from well-known notations (e.g., structs represented as graph nodes). consistency a notation is consistent if a user knowing some of its structure can infer most of the rest. in cradle, when two elements represent the same entity but can be used either as input or as output, then their shape is equal but incorporates an incoming or an outgoing message in order to differentiate them. see, for example, the icons for services or those for graph nodes representing either a according to the user’s role. for example, a conference chair would have access to conference-specific materials, visualization tools and interfaces to upload papers for review by a committee. similarly, we denote a group space as a virtual area in which library users (the entire dl society) can meet to conduct collaborative activities synchronously or asynchronously. explicit group spaces are created dynamically by a designer or facilitator who becomes (or appoints) the owner of the space and defines who the participants will be. in addition to direct user-touser communication, users should be able to access library materials and make annotations on them for every other group to see. ideally, users should be able to act (and carry dl materials with them) between personal and group spaces or among group spaces to which they belong. 
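Personal and group spaces can be pictured as simple containers of references. The following sketch is a loose illustration of the conference example discussed above; every class, field, and identifier in it is an assumption, since the paper does not publish the corresponding code.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of personal and group spaces as described in the text: a
// personal space is owned by one user and holds references to resources, while a
// group space has an owner, participants, and shared annotations on materials.
public class SpacesSketch {

    static class PersonalSpace {
        final String owner;
        final List<String> resourceRefs = new ArrayList<>(); // e.g. collection or service ids
        PersonalSpace(String owner) { this.owner = owner; }
    }

    static class GroupSpace {
        final String owner;                                   // designer or appointed facilitator
        final Set<String> participants = new LinkedHashSet<>();
        final List<String> annotations = new ArrayList<>();   // visible to every participant
        GroupSpace(String owner) { this.owner = owner; }
    }

    public static void main(String[] args) {
        PersonalSpace chair = new PersonalSpace("conference-chair");
        chair.resourceRefs.add("submitted-papers-collection");

        GroupSpace committee = new GroupSpace("conference-chair");
        committee.participants.add("reviewer-1");
        committee.annotations.add("paper-17: accept with minor revisions");
        System.out.println(chair.owner + " shares with " + committee.participants);
    }
}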
it may also be the case, however, that a given resource is referenced in several personal or group spaces. basic functionality required for personal spaces includes capabilities for viewing, launching, and monitoring library services, agents, and applications. like group spaces, personal spaces should provide users with the means to easily become aware of other users and resources that are present in a given group space at any time, as well as mechanisms to communicate with other users and make annotations on library resources. we employed this personal and group space paradigm in modeling a collaborative environment in the academic conferences domain, where a conference chair can have a personal view of the document collections (resources) figure 7. the cradle model for the jdan framwork 184 information technology and libraries | december 2010 of “sapienza” university of rome (undergraduate students), shown in figure 5, and (2) an application employed with a project of records management in a collaboration between the computer science and the computer engineering department of “sapienza” university, as shown in figure 7. usability of the generated tools environments for single-view languages generated with atom3 have been extensively used, mostly in an academic setting, in different areas like software and web engineering, modeling, and simulation; urban planning; etc. however, depending on the kind of the domain, generating the results may take some time. for instance, the state reachability analysis in the dl example takes a few minutes; we are currently employing a version of atom3 that includes petri-nets formalism where we can test the services states reachability.49 in general, from application experience, we note the general agreement that automated syntactical consistency support greatly simplifies the design of complex systems. finally, some users pointed out some technical limitations of the current implementation, such as the fact that it is not possible to open several views at a time. altogether, we believe this work contributes to make more efficient and less tedious the definition and maintenance of environments for dls. our model-based approach must be contrasted with the programmingcentric approach of most case tools, where the language and the code generation tools are hard-coded so that whenever a modification has to be done (whether on the language or on the semantic domain) developers have to dive into the code. ■■ conclusions and future work dls are complex information systems that integrate findings from disciplines such as hypertext, information retrieval, multimedia, databases, and hci. dl design is often a multidisciplinary effort, including library staff and computer scientists. wasted effort and poor interoperability can therefore ensue. examining the related bibliography, we noted that there is a lack of tools or automatic systems for designing and developing cooperative dl systems. moreover, there is a need for modeling interactions between dls and users, such as scenario or activity-based approaches. the cradle framework fulfills this gap by providing a model-driven approach for generating visual interaction tools for dls, supporting design and automatic generation of code for dls. in particular, we use a metamodel made of different diagram types (collection, structures, service, and struct or an actor, with different colors. diffuseness/terseness a notation is diffuse when many elements are needed to express one concept. 
cradle is terse and not diffuse because each entity expresses a meaning on its own. error-proneness data flow visualization reduces the chance of errors at a first level of the specification. on the other hand, some mistakes can be introduced when specifying visual entities, since it is possible to express relations between source and target models which cannot generate semantically correct code. however, these mistakes should be considered “programming errors more than slips,” and may be detected through progressive evaluation. hidden dependencies a hidden dependency is a relation between two elements that is not visible. in cradle, relevant dependencies are represented as data flows via directed links. progressive evaluation each dl model can be tested as soon as it is defined, without having to wait until the whole model is finished. the visual interface for the dl can be generated with just one click, and services can be subsequently added to test their functionalities. viscosity cradle has a low viscosity because making small changes in a part of a specification does not imply lots of readjustments in the rest of it. one can change properties, events or responses and these changes will have only local effect. the only local changes that could imply performing further changes by hand are deleting entities or changing names; however, this would imply minimal changes (just removing or updating references to them) and would only affect a small set of subsequent elements in the same data flow. visibility a dl specification consists of a single set of diagrams fitting in one window. empirically, we have observed that this model usually involves no more than fifteen entities. different, independent cradle models can be simultaneously shown in different windows. expressiveness of cradle the paper has illustrated the expressiveness of cradle by defining different entities end relationships for different dl requisites. to this end, two different applications have been considered: (1) a basic example elaborated with the collaboration of the information science school generating collaborative systems for digital libraries | malizia, bottoni, and levialdi 185 retrieval (reading, mass.: addison-wesley, 1999). 17. d. lucarella and a. zanzi, “a visual retrieval environment for hypermedia information systems,” acm transactions on information systems 14 (1996): 3–29. 18. b. wang, “a hybrid system approach for supporting digital libraries,” international journal on digital libraries 2 (1999): 91–110,. 19. d. castelli, c. meghini, and p. pagano, “foundations of a multidimensional query language for digital libraries,” in proc. ecdl ’02, lncs 2458 (berlin: springer, 2002): 251–65. 20. r. n. oddy et al., eds., proc. joint acm/bcs symposium in information storage & retrieval (oxford: butterworths, 1981). 21. k. maly, m. zubair et al., “scalable digital libraries based on ncstrl/dienst,” in proc. ecdl ’00 (london: springer, 2000): 168–79. 22. r. tansley, m. bass and m. smith, “dspace as an open archival information system: current status and future directions,” proc. ecdl ’03, lncs 2769 (berlin: springer, 2003): 446–60. 23. k. m. anderson et al., “metis: lightweight, flexible, and web-based workflow services for digital libraries,” proc. 3rd acm/ieee-cs jcdl ’03 (los alamitos, calif.: ieee computer society, 2003): 98–109. 24. n. dushay, “localizing experience of digital content via structural metadata,” in proc. 2nd acm/ieee-cs jcdl ’02 (new york: acm, 2002): 244–52. 25. m. 
gogolla et al., “integrating the er approach in an oo environment,” proc. er, ’93 (berlin: springer, 1993): 376–89. 26. heidi gregersen and christian s. jensen, “temporal entity-relationship models—a survey,” ieee transactions on knowledge & data engineering 11 (1999): 464–97. 27. b. berkem, “aligning it with the changes using the goal-driven development for uml and mda,” journal of object technology 4 (2005): 49–65. 28. a. malizia, e. guerra, and j. de lara, “model-driven development of digital libraries: generating the user interface,” proc. mddaui ’06, http://sunsite.informatik.rwth-aachen.de/ publications/ceur-ws/vol-214/ (accessed oct 18, 2010). 29. d. l. atkins et al., “mawl: a domain-specific language for form-based services,” ieee transactions on software engineering 25 (1999): 334–46. 30. j. de lara and h. vangheluwe, “atom3: a tool for multi-formalism and meta-modelling,” proc. fase ’02 (berlin: springer, 2002): 174–88. 31. j. m. morales-del-castillo et al., “a semantic model of selective dissemination of information for digital libraries,” journal of information technology & libraries 28 (2009): 21–30. 32. n. santos, f. c. a. campos, and r. m. m. braga, “digital libraries and ontology,” in handbook of research on digital libraries: design, development, and impact, ed. y.-l. theng et al. (hershey, pa.: idea group, 2008): 1:19. 33. f. wattenberg, “a national digital library for science, mathematics, engineering, and technology education,” d-lib magazine 3 no. 10 (1998), http://www.dlib.org/dlib/october98/ wattenberg/10wattenberg.html (accessed oct 18, 2010); l. l. zia, “the nsf national science, technology, engineering, and mathematics education digital library (nsdl) program: new projects and a progress report,” d-lib magazine, 7, no. 11 (2002), http://www.dlib.org/dlib/november01/zia/11zia.html (accessed oct 18, 2010). 34. u.s. library of congress, ask a librarian, http://www.loc society), which describe the different aspects of a dl. we have built a code generator able to produce xul code from the design models for the dl user interface. moreover, we use template code generation integrating predefined components for the different services (xdoclet language) according to the model specification. extensions of cradle with behavioral diagrams and the addition of analysis and simulation capabilities are under study. these will exploit the new atom3 capabilities for describing multiview dsvls, to which this work directly contributed. references 1. a. m. gonçalves, e. a fox, “5sl: a language for declarative specification and generation of digital libraries,” proc. jcdl ’02 (new york: acm, 2002): 263–72. 2. l. candela et al., “setting the foundations of digital libraries: the delos manifesto,” d-lib magazine 13 (2007), http://www.dlib.org/dlib/march07/castelli/03castelli.html (accessed oct 18, 2010). 3. a. malizia et al., “a cooperative-relational approach to digital libraries,” proc. ecdl 2007, lncs 4675 (berlin: springer, 2007): 75–86. 4. e. a. fox and g. marchionini, “toward a worldwide digital library,” communications of the acm 41 (1998): 29–32. 5. m. a. gonçalves et al., “streams, structures, spaces, scenarios, societies (5s): a formal model for digital libraries,” acm transactions on information systems 22 (2004): 270–312. 6. j. c. r. licklider, libraries of the future (cambridge, mass.: mit pr., 1965). 7. d. m. levy and c. c. marshall, “going digital: a look at assumptions underlying digital libraries,” communications of the acm 38 (1995): 77–84. 8. r. reddy and i. 
public libraries and internet access across the united states: a comparison by state 2004–2006
paul t. jaeger, john carlo bertot, charles r. mcclure, and miranda rodriguez
paul t. jaeger (pjaeger@umd.edu) is an assistant professor at the college of information studies at the university of maryland; john carlo bertot (bertot@ci.fsu.edu) is professor and associate director of the information use management and policy institute, college of information, florida state university; charles r. mcclure (cmcclure@ci.fsu.edu) is francis eppes professor and director of the information use management and policy institute, college of information, florida state university; and miranda rodriguez (mrodrig08@umd.edu) is a graduate student in the college of information studies at the university of maryland.
drawing upon findings from a national survey of u.s. public libraries, this paper examines trends in internet and public computing access in public libraries across states from 2004 to 2006. based on library-supplied information about levels and types of internet and public computing access, the authors offer insights into the network-based content and services that public libraries provide. examining data from 2004 to 2006 reveals trends and accomplishments in certain states and geographic regions. this paper details and discusses the data, identifies and analyzes issues related to internet access, and suggests areas for future research. this article presents findings from the 2004 and 2006 public libraries and the internet studies detailing the different levels of internet access available in public libraries in different states.1 at this point, 98.9 percent of public library branches are connected to the internet and 98.4 percent of connected public library branches offer public internet access.2 however, the types of access and the quality of access available are not uniformly distributed among libraries or among the libraries in various states. while the data at the national level paint a portrait of the internet and public computing access provided by public libraries overall, studies of these differences among the states can help reveal successes and lessons that may help libraries in other states to increase their levels of access. the need to continue to increase the levels and quality of internet and public computing access in public libraries is not an abstract problem. the services and content available on the internet continue to require greater bandwidth and computing capacity, so public libraries must address ever-increasing technological demands on the internet and computing access that they provide.3 public libraries are also facing increased external pressure on their internet and computing access. as patrons have come to rely on the availability of internet and computing access in public libraries, so too have government agencies. many federal, state, and local government agencies now rely on public libraries to facilitate citizens' access to e-government services, such as applying for the federal prescription drug plans, filing taxes, and many other interactions with the government.4 further, public libraries also face increased demands to supply public access computing in times of natural disasters, such as the major hurricanes of 2004 and 2005.5 as a result, both patrons and government agencies depend on the internet and computing access provided by public libraries, and each group has different, but interrelated, expectations of what kinds of access public libraries should provide. however, the data indicate that public libraries are at capacity in meeting some of these expectations, while some libraries lack the funding, technology-support capacity, space, and infrastructure (e.g., power, cabling) to reach the expectations of each respective group.
as public libraries (and the internet and public computing access they provide) continue to fill more social roles and expectations, a range of new ideas and strategies can be considered by public libraries to identify successful methods for providing access that is high quality and sufficient to meet the needs of patrons and community. the goals of the public libraries and the internet studies have been to help provide an understanding of the issues and needs of libraries associated with providing internet-based services and resources. the 2006 public libraries and the internet study employed a web-based survey approach to gather both quantitative and qualitative data from a sample of the 16,457 public library outlets in the united states.6 a sample was drawn to accurately represent metropolitan status (roughly equating to their designation of urban, suburban, or rural libraries), poverty levels (as derived through census data), state libraries, and the national picture, producing a sample of 6,979 public library outlets.7 the survey received a total of 4,818 responses for a response rate of 69 percent. the data in this article, unless otherwise noted, are drawn from the 2004 and 2006 public libraries and the internet studies.8 while the survey received responses from libraries in all fifty states, there were not enough responses in all states from which to present state-level findings. the study was able to provide state-level analysis for thirty-five states (including washington, d.c.) in 2004 and forty-four states at the outlet level (including washington, d.c.) and forty-two states at the system level (including washington, d.c.) in 2006. in addition, there was some variance in states with adequate responses between the 2004 and 2006 studies. a full listing of the states is available in the final reports of the 2004 and 2006 studies at http://www.ii.fsu.edu/plinternet_reports.cfm. thus, the findings below reflect only those states for which both the 2004 and 2006 studies were able to provide analysis.
■ public libraries and the internet across the states
overview of 2004 to 2006
as the public libraries and the internet studies have been ongoing since 1994, the questions asked in the biennial studies have evolved along with the provision of internet access in libraries. the questions have varied between surveys, but there have been consistent questions that allow for longitudinal analysis at the national level. the 2004 study introduced the analysis of the data at both the national and the state levels.
with both the 2004 and 2006 studies providing data at the state level, some longitudi­ nal analysis at the state level is now possible. overall, there were a number of areas of consistent data across the states from 2004 to 2006. most states had fairly similar, if not identical, percentages of library outlets offering public internet access between 2004 and 2006. for the most part, changes were increases in the percentage of library outlets offering patron access. further, the average number of hours open per week in 2004 (44.5) and in 2006 (44.8) were very similar, as were the percentages of library outlets reporting increases in hours per week, decreases in hours per week, and no changes in hours per week. while these numbers are consistent, it is not known whether this average number of hours open, or the distribution of the hours open across the week, is sufficient to meet patron needs in most communities. data across the states also indicated that physical space is the primary reason for the inability of libraries to add more workstations within the library building. there was also consistency in the findings related to upgrades and replacement schedules. changes and continuities from 2004 to 2006 while the items noted above show some areas of stability in the internet access provided by public libraries across the states, insights are possible in the areas of change for libraries overall or in the libraries that are leading in particular areas. table 1 details the states with the highest average number of hours open per public library outlet in 2004 and 2006. between 2004 and 2006, the national average for the number of hours open increased slightly from 44.5 hours per week to 44.8 hours per week. this increase is reflected in the numbers for the individual states in 2006, which are generally slightly higher than the numbers for the individual states in 2004. for example, the top state in 2006 averaged 55.7 hours per outlet each week, while the top state in 2004 averaged 54.8 hours. the top four states—ohio, new jersey, florida, and virginia—were the same in both years, though with the top two switching positions. this demonstrates a continuing commitment in these four states by state and local government to ensure wide access to public librar­ ies. these states are also ones with large populations and state budgets, presumably fueling the commitment and facilitating the ability to keep libraries open for many hours each week. while the needs of patrons in other states are no less significant, the data indicate that states with larger populations and higher budgets, not surpris­ ingly, may be best positioned to provide the highest levels of access to public libraries for state residents. the other six states in the 2006 top ten were not in the 2004 top ten. the primary reason for this is that the six states in 2006 increased their hours more than other states. note that the fifth­ranked state in 2004, south carolina, averaged 49 hours per outlet each week, which is less than the tenth­ranked state in 2006, illinois, at 49.5 hours. simply by maintaining the average number of hours open per outlet between 2004 and 2006, south carolina fell from fifth to out of the top ten. these differ­ ences are reflected in the fact that there is nearly a ten­ hour difference from first place to tenth place in 2004; yet only a six­hour discrepancy exists from first place to tenth in 2006. 
these numbers suggest that hours of operation may change frequently for many libraries, indicating the need for future evaluations of operational hours in rela­ tion to meeting patron demand. table 2 displays the states with the highest average number of public access workstations per public library in 2004 and 2006. the national averages between 2004 and 2006 also showed a slight increase from 10.4 workstations table 1. highest average number of hours open in public library outlets by state in 2004 and 2006 2004 2006 1. new jersey 54.8 1. ohio 55.7 2. ohio 54.6 2. new jersey 55.6 3. florida 52.4 3. florida 52.3 4. virginia 51.3 4. virginia 52.3 5. south carolina 49.0 5. indiana 51.9 6. utah 48.0 6. pennsylvania 50.6 7. new mexico 47.4 7. washington, d.c. 50.6 8. rhode island 47.3 8. maryland 50.0 9. alabama 46.9 9. connecticut 49.8 10. new york 46.2 10. illinois 49.5 national: 44.5 national: 44.8 in 2004 to 10.7 workstations in 2006. a key reason for this slow growth in the number of workstations appears to have a great deal to do with limitations of physical space in libraries; in spite of increasing demands, space con­ straints often limit computer capacity.9 unlike table 1, the comparisons between 2004 and 2006 in table 2 do not show across­the­board increases from 2004 to 2006. in fact, florida had the highest average of workstations per library outlet in both 2004 and 2006, but the average number decreased from 22.6 in 2004 to 21.7 in 2006. it is interesting to note that florida has a significantly higher number of workstations than the next highest state in both 2004 and 2006. in contrast, many of the states in the lower half of the top ten in 2004 had sub­ stantially lower average numbers of workstations in 2004 than in 2006. in 2004 there were an average of seven more computers in spot two than spot ten; in 2006, there were only an average of four more computers from spot two to ten. the large increases in the number of workstations in some states, like nevada, michigan, and maryland, indicate sizeable changes in budget, numbers of outlets, and/or population size. also of note is the significant drop of the average number of workstations in kentucky, declining from 18.8 in 2004 to fewer than 13 in 2006. a possible explanation is that, since kentucky libraries have been leaders in adopting wireless technologies (see table 3), the demand for workstations has decreased as libraries have added wireless access. five states appear in the top ten of both years— florida, indiana, georgia, california, and new jersey. the average number of workstations in indiana, california, and georgia increased from 2004 to 2006, while the aver­ age number of workstations in florida and new jersey decreased between 2004 and 2006. some of the decreases in workstations can be accounted for by increases in the availability of wireless access in public libraries, as librar­ ies with wireless access may feel less need to add more networked computers, relying on patrons to bring their own laptops. such a strategy, of course, will not increase access for patrons who cannot afford laptops. some libraries have sought to address this issue by having lap­ tops available for loan within the library building. the states listed in table 3 had the highest average levels of wireless connectivity in public library outlets in 2004 and 2006. the differences between the numbers in 2004 and 2006 reveal the dramatic increases in the avail­ ability of wireless internet access in public libraries. 
the national average in 2004 was 17.9 percent, but in 2006, the national average had more than doubled to 37.4 percent of public libraries offering wireless internet access. this sizeable increase is reflected in the changes in the states with the highest levels of wireless access. every position in the ratings in table 3 shows a dra­ matic jump from 2004 to 2006. the top position increased from 47 percent to 63.8 percent. the tenth position increased from 19.6 percent to 47.8 percent, an increase of nearly two­and­a­half times. these increases show how much more prominent wireless internet access has become in the services that public libraries offer to their communities and to their patrons. four states appear on both the 2004 and 2006 lists— virginia, kentucky, rhode island, and new jersey. these four states all showed increases, but the rises in some table 2. highest average number of public access workstations in public library outlets by state in 2004 and 2006. 2004 2006 1. florida 22.6 1. florida 21.7 2. kentucky 18.8 2. indiana 17.5 3. new jersey 15.5 3. nevada 15.7 4. georgia 14.0 4. michigan 14.8 5. utah 13.0 5. maryland 14.6 6. rhode island 12.6 6. georgia 14.4 7. indiana 12.3 7. arizona 14.1 8. texas 11.9 8. california 14.0 9. california 11.8 9. new jersey 13.8 10. south carolina 11.7 10. virginia 13.0 new york 11.7 national: 10.4 national: 10.7 table 3. highest levels of public access wireless internet connectivity in public library outlets by state in 2004 and 2006 2004 2006 1. kentucky 47% 1. virginia 63.8% 2. new mexico 38.6% 2. connecticut 56.6% 3. new hampshire 31.6% 3. indiana 56.6% 4. virginia 30.8% 4. rhode island 53.9% 5. texas 26.4% 5. kentucky 52.0% 6. kansas 25.8% 6. new jersey 50.9% 7. new jersey 22.8% 7. maryland 49.8% 8. rhode island 22.5% 8. illinois 48.3% 9. florida 21.9% 9. california 47.8% 10. new york 19.6% 10. massachusetts 47.8% national: 17.9% national: 37.4% 6 information technology and libraries | june 2007 public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 7 other states were significant enough to reduce kentucky from the top­ranked state in 2004 to the fifth ranked, in spite of the fact that the number of public libraries in kentucky offering wireless access increased from 47 per­ cent to 52 percent. in both years, a majority of the states in the top ten were located along the east coast. further, high levels of wireless access may be linked in some states to areas of high population density or the strong presence of technology­related sectors in the state, as in california and virginia. smaller states with areas of dense popula­ tions, such as connecticut, rhode island, and maryland, are also among the leaders in wireless access. tables 4 and 5 provide contrasting pictures regarding the number of public access internet workstations in public libraries by state in 2004 and 2006. table 4 shows the states with the highest percentages of libraries that consistently have fewer workstations that are needed by patrons, while table 5 shows the states with the highest percentages of libraries that consistently have sufficient workstations to meet patron needs. of note is the fact that, unlike the preceding three tables, there appears to be no significant geographical clustering of states in tables 4 and 5. 
nationally, the percentage of libraries that consis­ tently have insufficient workstations to meet patron needs declined from 15.7 percent in to 2004 to 13.7 percent in 2006, a change that is within the margin of error (+/­ 3.4 percent) of the question on the 2006 survey. due to the size of the change, it is not known if the national decline was a real improvement or simply a reflection of the margin of error. washington, d.c., oregon, new mexico, idaho, and california appear on the lists for both 2004 and 2006 in table 4. washington, d.c. had the highest percentage of libraries reporting insufficient workstations in both years, though there was a significant decrease from 100 percent of libraries in 2004 to 69 percent of libraries in 2006. in this case, the significant drop represents major strides forward to providing sufficient access to patrons in washington, d.c. similarly, though california features on both lists, the percentages dropped from 44.9 percent in 2004 to 22.2 percent in 2006, a decline of more than half. states like these are obviously making efforts to address the need for increased workstations. overall, eight out of ten positions in table 4 remained constant or saw a decline percentage in each position from 2004 to 2006, indicating a national decrease in libraries with insufficient workstations. in sharp contrast, fewer than 20 percent of nevada libraries in 2004 reported insufficient workstations, placing well out of the top ten. however, in 2006 nevada ranked second, with 51.5 percent of public libraries reporting insufficient workstations to meet patron demand. with nevada’s rapidly growing population, it appears that the demand for internet access in public libraries may not be keeping pace with the population growth. the percentage of public libraries reporting suffi­ cient workstations to consistently meet patron demands increased slightly at the national level from 14.1 percent in 2004 to 14.6 percent in 2006, again well within the margin of error (+/­ 3.5 percent) of the 2006 question. however, in table 5, the top ten positions in 2006 all fea­ ture lower percentages than the same positions in 2004. in 2004 the top­ranked state had 53.2 percent of libraries able to consistently meet patron needs for internet access, but the top­ranked state in 2006 had only 31 percent of libraries able to consistently meet patron access needs. table 4. public library outlet public access workstation availability by state in 2004 and 2006–consistently have fewer workstations than are needed 2004 2006 1. washington, d.c. 100% 1. washington, d.c. 69.9% 2. california 44.9% 2. nevada 51.5% 3. florida 36% 3. oregon 34.8% 4. new mexico 30.7% 4. new mexico 31.9% 5. oregon 30.4% 5. tennessee 30.4% 6. utah 29.2% 6. alaska 27.8% 7. south carolina 28.4% 7. idaho 26% 8. kentucky 24.1% 8. california 22.2% 9. alabama 21.5% 9. new york 21.4% 10. idaho 21.1% 10. rhode island 19% national: 15.7% national: 13.7% table 5. public library outlet public access workstation availability by state in 2004 and 2006—always have a sufficient number of workstations to meet demand. 2004 2006 1. wyoming 53.2% 1. louisiana 31% 2. alaska 34.9% 2. new hampshire 30.4% 3. kansas 32.2% 3. north carolina 28.4% 4. rhode island 31.4% 4. arkansas 26.2% 5. new hampshire 29.7% 5. wyoming 25.2% 6. south dakota 25.2% 6. mississippi 24.4% 7. georgia 25% 7. missouri 23.6% 8. arkansas 24.8% 8. vermont 22.2% 9. vermont 32.7% 9. nevada 20.9% 10. virginia 22.4% 10. 
pennsylvania 17.9% west virginia 17.9% national: 14.1% national: 14.6% � information technology and libraries | june 2007 four states—new hampshire, arkansas, wyoming, and vermont—appear on both the 2004 and 2006 lists. the national increase in the sufficiency of the num­ ber of workstations to meet patron access needs and decreases in all of the top­ranked states between 2004 and 2006 seems incongruous. this situation results, however, from a decrease in range of differences among the states from 2004 to 2006, so that the range is compressed and the percentages are more similar among the states. further, in some states, the addition of wireless access may have served to increase the overall sufficiency of the access in libraries, possibly leveling the differences among states. nevertheless, the national average of only 14.6 percent of public libraries consistently having sufficient numbers of workstations to meet patron access needs is clearly a major problem that public libraries must work to address. comparing the 2006 data of tables 4 and 5 demonstrates that patron demands for internet access are being met neither evenly nor consistently across the states. nationally, the percentage of public library systems with increases in the information technology budgets from the previous year dropped dramatically from 36.1 percent in 2004 to 18.6 percent in 2006. as can be seen in table 6, various national, state, and local budget crunches have significantly reduced the percentages of public library systems with increases in information technology budgets. when inflation is taken into account, a stationary information technology budget represents a net decrease in funds available in real dollar terms, so the only public libraries that are not actually having reductions in their information technology budgets are those with increases in such budgets. since internet access and the accompa­ nying hardware necessary to provide it are clearly a key aspect of information technology budgets, decreases in these budgets will have tangible impacts on the ability of public libraries to provide sufficient internet access. virtually every position on table 6 has a decrease of 20 percent to 30 percent from 2004 to 2006, with the largest decrease being from 84.2 percent in 2004 to 48.3 percent in 2006 in the second position. five states—delaware, kentucky, florida, rhode island, and south carolina—are listed for both 2004 and 2006, though every one of these states registered a decrease from 2004 to 2006. no drop was more dramatic than south carolina’s from 84.2 percent in 2004 to 31 percent in 2006. overall, though, the declining information tech­ nology budgets and continuing increases in demands for information technology access among patrons cre­ ates a very difficult situation for libraries. public libraries and the internet in 2006 along with questions that were asked on both the 2004 and 2006 public libraries and the internet studies, the sur­ vey included new questions on the 2006 study to account for social changes, alterations of the policy environment, and the maturation of internet access in public librar­ ies. several findings from the new questions on the 2006 study were noteworthy among the state data. the states listed in table 7 had the highest percentage of public library systems with increases in total operating budget over the previous year in 2006. 
nationally, 45.1 percent of public library systems had some increase in their overall budget, which includes funding for staff, physical structures, collection development, and many other costs, along with technology. at the state level, three northeastern states clearly led the way, with more than 75 percent of library systems in maryland, delaware, and rhode island benefiting from an increase in the overall operating budget. also of note is the fact that two fairly rural and sparsely populated western states—idaho and wyoming—were among the top ten.
table 6. highest levels of public library system overall internet information technology budget increases by state in 2004 and 2006
2004: 1. florida 87.5%; 2. south carolina 84.2%; 3. rhode island 67.5%; 4. delaware 64.9%; 5. new jersey 61.5%; 6. north carolina 55.5%; 7. virginia 53.6%; 8. kentucky 53.2%; 9. new mexico 49.3%; 10. kansas 49%; national: 36.1%
2006: 1. delaware 60%; 2. kentucky 48.3%; 3. maryland 47.6%; 4. wyoming 45.7%; 5. louisiana 40%; 6. florida 38%; 7. rhode island 33.3%; 8. south carolina 31%; 9. arkansas 27.5%; 10. california 27.3%; national: 18.6%
table 7. highest levels of public library system total operating budget increases by state in 2006
1. maryland 85.7%; 2. delaware 80%; 3. rhode island 76.4%; 4. idaho 74.5%; 5. kentucky 73.6%; 6. connecticut 68.6%; 7. virginia 62.8%; 8. new hampshire 62.5%; 9. north carolina 61.6%; 10. wyoming 60.9%; national: 45.1%
five of the states in the top ten in highest percentages of increases in operating budget in 2006 were also among the top ten in highest percentages of increases in information technology budgets in 2006. comparing table 7 with table 6 reveals that delaware, kentucky, maryland, rhode island, and wyoming are on both lists. in these states, increases in information technology budgets seem to have accompanied larger increases in the overall 2006 budget. an interesting point to ponder in comparing table 6 with table 7 is the large discrepancy between average increases in information technology budgets (18.6 percent) and overall budgets (45.1 percent) at the national level. as internet access is becoming more vital to public libraries in the content and services they provide to patrons, it seems surprising that a far smaller portion of library systems would receive an increase in information technology budgets than in overall budgets. one growing issue with the provision of internet access in public libraries is the provision of access at sufficient connection speeds. more and more internet content and services are complex and require large amounts of bandwidth, particularly content involving audio and video components. fortunately, as demonstrated in table 8, 53.5 percent of libraries nationally indicate that their connection speed is sufficient at all times to meet patron needs. in contrast, only 16.1 percent of public libraries nationally indicate that their connection speed is insufficient to meet patron needs at all times. georgia has the highest percentage of libraries that always have sufficient connection speed at 80.5 percent. in the case of georgia, the statewide library network is most likely a key part of ensuring the majority of libraries have sufficient access speed. many of the other states that have the highest percentages of public libraries with sufficient connection speeds are located in the middle part of the country.
table 8. highest percentages of public library outlets where public access internet service connection speed is sufficient at all times or insufficient by state in 2006
sufficient to meet patron needs at all times: 1. georgia 80.5%; 2. new hampshire 70.6%; 3. iowa 64.2%; 4. illinois 64%; 5. ohio 63.9%; 6. indiana 63.6%; 7. vermont 63.5%; 8. oklahoma 62.8%; 9. louisiana 61.7%; 10. wisconsin 61.5%; national: 53.5%
insufficient to meet patron needs: 1. virginia 35%; 2. north carolina 28.1%; 3. alaska 27.3%; 4. delaware 26.9%; 5. mississippi 26.6%; 6. missouri 24.3%; 7. rhode island 23.1%; 8. oregon 22.4%; 9. connecticut 21.5%; 10. arkansas 21.2%; national: 16.1%
the state with the highest percentage of libraries with insufficient connection speed to meet patron demands is virginia, with 35 percent of libraries. curiously, virginia consistently ranks in the top ten of tables 1–3. though virginia libraries have some of the longest hours open, some of the highest numbers of workstations, and some of the highest levels of wireless access, they still have the highest percentage of libraries with insufficient connection speed. only five states had more than 25 percent of libraries with connection speeds insufficient to meet the needs of patrons at all times. this issue is significant now in these states, as these libraries lack the necessary connection speeds. however, it will continue to escalate as an issue as content and services on the internet continue to evolve and become more complex, thus requiring greater connection speeds. comparing table 8 with table 4 (consistently have fewer workstations than are needed) and table 5 (always have a sufficient number of workstations to meet demand) reveals some parallels. alabama and rhode island are among the top ten states both for connection speed being consistently insufficient to meet patron needs (table 8) and for consistently having fewer workstations than are needed (table 4). conversely, vermont and louisiana are among the top ten states both for connection speed being sufficient to meet patron needs at all times (table 8) and for always having a sufficient number of workstations to meet demand (table 5). table 9 displays the two leading types of internet connection providers for public libraries and the states with the highest percentages of libraries using each. nationally, 46.4 percent of public libraries rely on an internet service provider (isp) for internet access. in the states listed in table 9, three-quarters or more of libraries use an isp, with more than 90 percent of libraries in kentucky and iowa using an isp. the next most common means of connection for public libraries is through a library cooperative or library network, with 26.2 percent of libraries nationally using these means. in such cases, member libraries rely on their established network to serve as the connector to the internet. the library network approach seems to be most effective in geographically small states. the top three on the list are three of the smallest of the states—rhode island, delaware, and west virginia—with more than 75 percent of libraries in each of these states connecting through a network. nationally, the remaining approximately 25 percent of libraries connect through a network managed by a nonlibrary entity or by other means. the highest percentages of public library systems receiving each kind of e-rate discount are presented in table 10.
e­rate discounts are an important source of technology funding for many public libraries across the country, with more than $250,000,000 in e­rate discounts distributed to libraries between 2000 and 2003.10 nationally in 2006, 22.4 percent of public library systems received discounts for internet connectivity, 39.6 percent for telecommunications services, and 4.4 percent for internal connection costs. mississippi and louisiana appear in the top five for each of the three types of discounts. minnesota and west virginia are each in the top five for two of the three lists. many of the states benefiting the most from e­rate funding in 2006 have large rural popu­ lations spread out over a geographically dispersed area, indicating the continuing importance of e­ rate discounts in bringing internet connections to rural public libraries. maryland and west virginia are both included in the telecommunications service column of table 10 due to proportionally large areas of these smaller states that are rural. the importance of the telecommunications dis­ counts in certain states is obviated by the fact that more than 75 percent of public library systems in all five states listed received such discounts. in comparison, only one state has more than 75 percent of library systems receiv­ ing discounts for internet connectivity, while no state has 30 percent of library systems receiving discounts for internal connection costs, with the latter reflecting the manner in which e­rate funding is calculated. in spite of the penetration of the internet into virtually every public library in the united states and the general expectations that internet access will be publicly available in every library, not all public libraries offer information technology training for patrons. nationally, 21.4 percent of public library outlets do not offer technology training. table 10 lists the states with the highest percentages of public library outlets not offering information technol­ ogy training. six of the ten states listed are located in the southeastern part of the country. the lack of resources or adequate number of staff to provide training is a leading concern in these states. not offering patron training may be strongly linked to lacking economic resources to do so. for example, the two states with the highest percentage of public libraries not offering patron training—mississippi and louisiana—are also the two states in the top five recipients of each kind of e­rate funding listed in table 10. if the libraries in states like these are economically struggling just to provide internet access, it seems likely that providing accompany­ ing training might be difficult as well. a further difficulty is that there is little public or private funding available specifically for training. n discussion of issues the similarities and differences among the states indi­ cate that the evolution of public access to the internet in public libraries is not necessarily an evenly distributed phenomenon, as some states appear to be consistent lead­ ers in some areas and other states appear to consistently trail in others. while the national picture is one primarily of continued progress in the availability and quality of internet access available to library patrons, the progress is not evenly distributed among the states. 11 libraries in different states struggle with or benefit from different issues. 
some public libraries are limited by state and local budgetary limitations, while other libraries are seeking alternate funding sources through grant writ­ ing and building partnerships with the corporate world. some face barriers to providing access due to their geo­ graphical location or small service population. it may also be the case that the libraries in some states do not per­ ceive that patrons desire increased access. other public libraries are able to provide high­end access as a result of having strong local leadership, sufficient state and local funding, well­developed networks and cooperatives, and a proactive state library. though the discussion of the “digital divide” has become much less frequent, the state data seem to indi­ cate that there are gaps in levels of access among libraries in different states. while every state has very successful individual libraries in terms of providing quality internet table �. highest levels of types of internet connection provider for public library outlets by state in 2006 internet service provider library cooperative or network 1. kentucky 93.5% 1. rhode island 84.7% 2. iowa 90.9% 2. delaware 79.5% 3. new hampshire 83.8% 3. west virginia 77.9% 4. vermont 81.1% 4. wisconsin 71.2% 5. oklahoma 80.6% 5. massachusetts 54.7% wyoming 80.6% 6. minnesota 52.5% 7. idaho 80.2% 7. ohio 48.9% 8. montana 78.9% 8. georgia 45.1% 9. tennessee 78.4% 9. mississippi 41.2% 10. alabama 74.6% 10. connecticut 38.5% national: 46.4% national: 26.2% public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 11 access and individual libraries that could be doing a better job, the state data indicate that library patrons in different parts of the country have variations in the levels and quality of access available to them. uniformity across all states clearly will never be feasible, though, as differ­ ent states and their patrons have different needs. for example, tables 1, 2, and 3 all display features that indicate high­level internet access in public librar­ ies—high numbers of hours open, high numbers of public access workstations, and high levels of wireless internet access. three states—maryland, new jersey, and virginia—appear in the top ten in these three lists for 2006. further, connecticut, florida, illinois, and indiana each appear in the top ten of two of these three lists. these states clearly are making successful efforts at the state and local levels to guarantee widespread access to public libraries and the internet access they provide. gaps in access are also evident among different regions of the country. the highest percentages of library systems with increases in total operating budgets were concentrated in states along the east coast, with seven of the states listed in table 7 being mid­atlantic or northeastern states. in con­ trast, the highest percentages of library systems relying on e­rate funding in table 10 were concentrated in the midwest and the southeast. further, the numbers in tables 6 and 7 showed far greater increases in the total operating budgets than in the information technology budgets in all regions of the country. as a result, public libraries in all parts of the united states may need to seek alternate sources of funding specifically for information technology costs. as can be seen in table 3, the leading states in adoption of wireless technology are concentrated in the northeast and mid­atlantic. 
in table 11, southern states, particu­ larly louisiana and mississippi, had many of the highest percentages of libraries not offering any internet training to patrons. it is important to note with data from the gulfstates, however, that the effects of hurricane katrina may have had a large impact on the results reported. one key difference in a number of states seems to be the presence of a state library actively working to coordi­ nate access issues. this particular study was not able to address such issues, but evidence indicates that the state library can play a significant role in ensuring sufficiency of internet access in public libraries in a state. maine, west virginia, and wisconsin all have state libraries that apply and distribute funds at the statewide level to ensure all public libraries, regardless of size or geography, have high­end connections to the internet. the state library of west virginia, for example, applied for e­rate funding for telecommunications costs on a statewide basis and received 79.1 percent funding in 2006, using such funding to cover not only connection costs for public libraries, but also to provide it and network support to libraries. another example of a successful statewide effort to provide sufficient internet access can be found in maryland. in the early 1990s, maryland public library administrators agreed to let the state library use library services and technology act (lsta) funds to build the sailor network, connecting all public libraries in the state.12 this network predates the e­rate program by a number of years, but having an established statewide network has helped the state library to coordinate table 10. highest percentages of public library systems receiving e-rate discounts by category and state in 2006 internet connectivity telecommunications services internal connection costs 1. louisiana 89.2% 1. mississippi 92.6% 1. mississippi 29.6% 2. indiana 70.8% 2. south carolina 89.4% 2. minnesota 22.6% 3. mississippi 63% 3. louisiana 79.5% 3. arizona 19.3% 4. minnesota 50.5% 4. west virginia 79.1% 4. west virginia 14.2% 5. tennessee 44.7% 5. maryland 76.2% 5. louisiana 12.3% national: 22.4% national: 39.6% national: 4.4% table 11. highest levels of public library systems not offering patron information technology training services by state in 2006 1. louisiana 48.7% 2. mississippi 40.7% 3. arkansas 39.6% 4. alaska 36% 5. arizona 34.8% 6. georgia 34.5% 7. new hampshire 32.8% 8. south carolina 31.1% 9. tennessee 30% 10. idaho 29% national: 21.4% 12 information technology and libraries | june 2007 applications, funding, and services among the libraries of the state. the state budget in maryland also provides other types of funding to support the state library, the library systems, and the library outlets in providing internet access. in states such as georgia, maryland, maine, west virginia, and wisconsin, the provision of internet access in public libraries is shaped not only by library outlets and library systems, but by the state libraries as well. in these and other states, the efforts of the state library appear to be reflected in the data from this study. a final area for discussion is the degree to which librarians understand how much bandwidth is required to meet the needs of library users, how to measure actual bandwidth that is available in the library, and how to determine the degree to which that bandwidth is suf­ ficient. 
indeed, many providers advertise that their con­ nection speeds are “up to” a certain speed when in fact they deliver considerably less.13 the authors have offered an analysis of determining the quality and sufficiency of bandwidth elsewhere.14 suffice to say that there is consid­ erable confusion as to “how good is good enough” band­ width connection quality. these types of issues frame understandings of how connected libraries in different states are and whether those connections are sufficient to meet the needs of patrons. n future research while the experience of individual patrons in particular libraries will vary widely in terms of whether the access available is sufficient to meet their information needs, the fact that the state data indicate variations in the levels and quality of access among some states and regions of the country is worthy of note. an important area of sub­ sequent research will be to investigate these differences, determine the reasons for them, and develop strategies to alleviate these apparent gaps in access. investigating these differences requires consideration of local and situational factors that may affect access in one library but perhaps not in another. for example, one public library may have access to an internet provider that offers higher speed connectivity that is not available in another location. the range of the possible local and situational factors affecting access and services is extensive. a prelimi­ nary list of the factors that contribute to being a success­ fully networked public library is described in greater detail in the 2006 study.15 however, additional investigation into the degree to which these factors affect access, quality of service, and user satisfaction needs to be continued. the personal experience of the authors in working with various state library agencies suggests the need for additional research that explores relationships among those states ranked highest in areas such as connectivity and workstations with programs and services offered by the state library agencies. one state library, for example, has a specific program that works directly with individual public libraries to assist them in completing the various e­rate forms. is there a link between that state library providing such assistance and the state’s public libraries receiving more e­rate discounts per capita than other states? this is but one example where investigating the role of the state library and comparing those roles and services to the rankings may be useful. perhaps a number of “best practices” could be identified that would assist the libraries in other states in improv­ ing access and services. in terms of research methods, future research on the topics identified in this article may need to draw upon strategies other than a national survey and on­site focus groups/interviews. the 2006 study, for the first time, included site visits and interviews and produced a wealth of data that supplemented the national survey data.16 on­site analysis of actual connection speeds in a sample of public libraries is but one example. the degree to which survey respondents know the connec­ tion speeds at specific workstations is unclear. simply because a t­1 line comes in the front door, it is not nec­ essarily the speed available at a particular workstation. other methods such as log file analysis or user­based surveys of networked services (as opposed to surveys completed by librarians) may offer insights that could augment the national survey data. 
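a rough, purely illustrative calculation (not drawn from the survey data) shows why the speed of the incoming line says little about the speed at a given workstation: a t-1 line carries about 1.544 mbps; if ten public workstations are in use at once, that is roughly 1.544 / 10 ≈ 0.15 mbps (about 150 kbps) per workstation before any staff or wireless traffic is counted, which is adequate for basic web pages but well below what audio- and video-intensive services require.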
other approaches such as policy analysis may also prove useful in better understanding access, connectiv­ ity, and services on a state­by­state basis. there has been no systematic description and analysis of state­based laws and regulations that affect public library internet access, connectivity, and services. the authors are aware of some states that ensure a minimum bandwidth will be provided to each public library in the state and pay for such connectivity. such is not true in other states. thus, a better understanding of how state­based policies and regulations affect access, connectivity, and services may identify strategies and policies that could be used in other states to increase or improve access, connectiv­ ity, and services. the data discussed in this article also point to many other important needs in future research. libraries in certain states that seem to be frequently ranking high in the tables indicate that certain states are better able to sustain their libraries in terms of finances and usage. however, additional factors may also be key in the differ­ ences among the states. future research needs to consider the internet access in public libraries in different states in relation to other services offered by libraries and to uses of the internet connectivity in libraries, including types of online content and services available, types of training public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 13 available, community outreach, other collection issues, staffing in relation to technology, and other factors. n conclusion internet and public computing access is almost univer­ sally available in public libraries in the united states, but there are differences in the amounts of access, the kinds of access, and sufficiency of the access available to meet patron demands. now that virtually every public library has an internet connection, provides internet access to patrons, and offers a range of public computing access, the attention of public libraries must refocus on ensuring that every library can provide sufficient internet and com­ puting access to meet patron needs. the issues to address include being open to the public a sufficient number of hours, having enough internet access workstations, hav­ ing adequate wireless access, and having sufficient speed and quality of connectivity to meet the needs of patrons. if a library is not able to provide sufficient access now, the situation will only continue to grow more difficult as the content and services on the internet continue to be more demanding of technical and bandwidth capacity. public libraries must also focus on increasing provi­ sion of internet access in light of federal, state, and local governments recently adding yet another significant level of services to public libraries by “requesting” that they provide access to and training in using numerous e­gov­ ernment services. such e­government services include social services, prescription drug plans, health care, disas­ ter support, tax filing, resource management, and many other activities.17 the maintenance of traditional services, the addi­ tion and expansion of public access computing and networked services, and now the addition of a range of e­government services tacitly required by federal, state, and local governments, in combination, risk stretching public library resources beyond their ability to keep up. 
to avoid such a situation, public libraries, library sys­ tems, and state governments must learn from the library outlets, systems, and states that are more successfully providing sufficient internet access to their patrons and their communities. among these leaders, there are likely models for success that can be identified for the benefit of other outlets, systems, and states. beyond the lessons that can be learned from the most connected, however, there are also practical and logistical issues that remain beyond the control of an individual library and sometimes the entire state, such as geographical and economic factors. ultimately, the analysis of state data offered here sug­ gests that much can be learned from one state that might assist another state in terms of improving connectivity, access, and services. while the data suggest a number of significant discrepancies among the various states, it may be that a range of best practices can be identified from those more highly ranked states that could be employed in other states to improve access, connectivity, and ser­ vices. staff at the various state library agencies may wish to discuss these findings and develop strategies that can then improve access nationwide. providing access to the internet is now as established a role for public libraries as providing access to books. patrons and communities, and now government orga­ nizations, rely on the fact that internet access will be available to everyone who needs it. while there are other points of access to the internet in some communities, such as school media centers and community technology centers, the public library is often the only public access point available in many communities.18 public libraries across the states must continually work to make sure the access they provide meets all of these needs. n acknowledgements the 2004 and 2006 public libraries and the internet studies were funded by the american library association and the bill & melinda gates foundation. drs. bertot, mcclure, and jaeger served as the co­principal investigators of the study. more information on these studies is available at http://www.ii.fsu.edu/plinternet/. references and notes 1. john carlo bertot, charles r. mcclure, and paul t. jaeger, public libraries and the internet 2004: survey results and findings (tallahassee, fla.: information institute, 2005), http://www.ii.fsu .edu/plinternet_reports.cfm; john carlo bertot et al., public libraries and the internet 2006: study results and findings (tal­ lahassee, fla.: information institute, 2006), http://www.ii.fsu. edu/plinternet_reports.cfm (accessed mar. 31, 2007). 2. bertot et al., public libraries and the internet 2006. 3. john carlo bertot and charles r. mcclure, “assessing the sufficiency and quality of bandwidth for public libraries,” information technology and libraries 26, no. 1 (2007): 14–22. 4. john carlo bertot et al., “drafted: i want you to deliver e­government,” library journal 131, no. 13 (2006): 34–39; john carlo bertot et al., “public access computing and internet access in public libraries: the role of pub­ lic libraries in e­government and emergency situations,” first monday 11, no. 9 (2006). http://www.firstmonday .org/issues/issue11_9/bertot/ (accessed mar. 31, 2007). 5. ibid.; paul t. jaeger et al., “the 2004 and 2005 gulf coast hurricanes: evolving roles and lessons learned for public libraries in disaster preparedness and community services,” public library quarterly (in press). 6. 
there are actually nearly 17,000 service outlets in the united states. however, the sample frame eliminated bookmobiles as 14 information technology and libraries | june 2007 well as library outlets that the study team could neither geocode nor calculate poverty measures. additional information on the methodology is available in the study report at http://www.ii.fsu .edu/plinternet/ (accessed mar. 31, 2007). 7. bertot et al., public libraries and the internet 2006. 8. bertot, mcclure, and jaeger, public libraries and the internet 2004; bertot et al., public libraries and the internet 2006. the 2004 survey instrument is available at http://www.ii.fsu.edu/pro­ jectfiles/plinternet/plinternet_appendixa.pdf. the 2006 survey instrument is available at http://www.ii.fsu.edu/projectfiles/ plinternet/2006/appendix1.pdf (accessed mar. 31, 2007). 9. bertot et al., public libraries and the internet 2006. 10. paul t. jaeger, charles r. mcclure, and john carlo bertot, “the e­rate program and libraries and library consortia, 2000­ 2004: trends and issues,” information technology and libraries 24, no. 2 (2005): 57–67. 11. bertot, mcclure, and jaeger, public libraries and the internet 2004; bertot et al., public libraries and the internet 2006; john carlo bertot, charles r. mcclure, and paul t. jaeger, “public libraries struggle to meet internet demand: new study shows libraries need support to sustain online services,” american libraries 36, no. 7 (2005): 78–79. 12. john carlo bertot and charles r. mcclure, sailor assessment final report: findings and future sailor development (bal­ timore, md.: division of library development and services, 1996). 13. matt richtel and ken belson, “not always full speed ahead,” new york times, nov. 18, 2006. 14. bertot and mcclure, “assessing the sufficiency,” 14–22. 15. bertot et al., public libraries and the internet 2006. 16. ibid. 17. bertot et al., “drafted: i want you to deliver e­govern­ ment”; bertot et al., “public access computing and internet access in public libraries”; jaeger et al., “the 2004 and 2005 gulf coast hurricanes.” 18. paul t. jaeger et al., “the policy implications of internet connectivity in public libraries,” government information quarterly 23, no. 1 (2006): 123–41. batch loading collections into dspace | walsh 117 maureen p. walsh batch loading collections into dspace: using perl scripts for automation and quality control colleagues briefly described batch loading marc metadata crosswalked to dspace dublin core (dc) in a poster session.2 mishra and others developed a perl script to create the dspace archive directory for batch import of electronic theses and dissertations (etds) extracted with a java program from an in-house bibliographic database.3 mundle used perl scripts to batch process etds for import into dspace with marc catalog records or excel spreadsheets as the source metadata.4 brownlee used python scripts to batch process comma-separated values (csv) files exported from filemaker database software for ingest via the dspace item importer.5 more in-depth descriptions of batch loading are provided by thomas; kim, dong, and durden; proudfoot et al.; witt and newton; drysdale; ribaric; floyd; and averkamp and lee. however, irrespective of repository software, each describes a process to populate their repositories dissimilar to the workflows developed for the knowledge bank in approach or source data. 
thomas describes the perl scripts used to convert marc catalog records into dc and to create the archive directory for dspace batch import.6 kim, dong, and durden used perl scripts to semiautomate the preparation of files for batch loading a university of texas harry ransom humanities research center (hrc) collection into dspace. the xml source metadata they used was generated by the national library of new zealand metadata extraction tool.7 two subsequent projects for the hrc revisited the workflow described by kim, dong, and durden.8 proudfoot and her colleagues discuss importing metadata-only records from departmental refbase, thomson reuters endnote, and microsoft access databases into eprints. they also describe an experimental perl script written to scrape lists of publications from personal websites to populate eprints.9 two additional workflow examples used citation databases as the data source for batch loading into repositories. witt and newton provide a tutorial on transforming endnote metadata for digital commons with xslt (extensible stylesheet language transformations).10 drysdale describes the perl scripts used to convert thomson reuters reference manager files into xml for the batch loading of metadata-only records into the university of glasgow's eprints repository.11 the glasgow eprints batch workflow is additionally described by robertson and nixon and greig.12 several workflows were designed for batch loading etds into repositories. ribaric describes the automatic

this paper describes batch loading workflows developed for the knowledge bank, the ohio state university's institutional repository. in the five years since the inception of the repository approximately 80 percent of the items added to the knowledge bank, a dspace repository, have been batch loaded. most of the batch loads utilized perl scripts to automate the process of importing metadata and content files. custom perl scripts were used to migrate data from spreadsheets or comma-separated values files into the dspace archive directory format, to build collections and tables of contents, and to provide data quality control. two projects are described to illustrate the process and workflows.

the mission of the knowledge bank, the ohio state university's (osu) institutional repository, is to collect, preserve, and distribute the digital intellectual output of osu's faculty, staff, and students.1 the staff working with the knowledge bank have sought from its inception to be as efficient as possible in adding content to dspace. using batch loading workflows to populate the repository has been integral to that efficiency. the first batch load into the knowledge bank was august 29, 2005. over the next four years, 698 collections containing 32,188 items were batch loaded, representing 79 percent of the items and 58 percent of the collections in the knowledge bank. these batch loaded collections vary from journal issues to photo albums. the items include articles, images, abstracts, and transcripts. the majority of the batch loads, including the first, used custom perl scripts to migrate data from microsoft excel spreadsheets into the dspace batch import format for descriptive metadata and content files. perl scripts have been used for data cleanup and quality control as part of the batch load process. perl scripts, in combination with shell scripts, have also been used to build collections and tables of contents in the knowledge bank.
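as a small illustration of that cleanup step, the sketch below trims and xml-escapes a single metadata value the way a spreadsheet cell might need to be treated before it is written into a dublin core record. it is a hypothetical helper rather than one of the knowledge bank scripts, and the sample value is invented.

#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper: tidy a metadata value pulled from a spreadsheet cell
# and escape the characters that would make the generated xml invalid.
sub clean_value {
    my ($value) = @_;
    return '' unless defined $value;
    $value =~ s/^\s+|\s+$//g;   # trim leading/trailing whitespace
    $value =~ s/\s+/ /g;        # collapse internal runs of whitespace
    $value =~ s/&/&amp;/g;      # escape xml special characters (& first)
    $value =~ s/</&lt;/g;
    $value =~ s/>/&gt;/g;
    return $value;
}

# example: a title as it might arrive in a vendor spreadsheet.
my $raw   = "  bandwidth & access <draft>  ";
my $clean = clean_value($raw);
print "$clean\n";   # prints: bandwidth &amp; access &lt;draft&gt;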
the workflows using perl scripts to automate batch import into dspace have evolved through an iterative process of continual refinement and improvement. two knowledge bank projects are presented as case studies to illustrate a successful approach that may be applicable to other institutional repositories. ■■ literature review batch ingesting is acknowledged in the literature as a means of populating institutional repositories. there are examples of specific batch loading processes minimally discussed in the literature. branschofsky and her maureen p. walsh (walsh.260@osu.edu) is metadata librarian/ assistant professor, the ohio state university libraries, columbus, ohio. 118 information technology and libraries | september 2010 relational database postgresql 8.1.11 on the red hat enterprise linux 5 operating system. the structure of the knowledge bank follows the hierarchical arrangement of dspace. communities are at the highest level and can be divided into subcommunities. each community or subcommunity contains one or more collections. all items—the basic archival elements in dspace—are contained within collections. items consist of metadata and bundles of bitstreams (files). dspace supports two user interfaces: the original interface based on javaserver pages (jspui) and the newer manakin (xmlui) interface based on the apache cocoon framework. at this writing, the knowledge bank continues to use the jspui interface. the default metadata used by dspace is a qualified dc schema derived from the dc library application profile.18 the knowledge bank uses a locally defined extended version of the default dspace qualified dc schema, which includes several additional element qualifiers. the metadata management for the knowledge bank is guided by a knowledge bank application profile and a core element set for each collection within the repository derived from the application profile.19 the metadata librarians at osul create the collection core element sets in consultation with the community representatives. the core element sets serve as metadata guidelines for submitting items to the knowledge bank regardless of the method of ingest. the primary means of adding items to collections in dspace, and the two ways used for knowledge bank ingest, are (1) direct (or intermediated) author entry via the dspace web item submission user interface and (2) in batch via the dspace item importer. recent enhancements to dspace, not yet fully explored for use with the knowledge bank, include new ingest options using simple web-service offering repository deposit (sword), open archives initiative object reuse and exchange (oai-ore), and dspace package importers such as the metadata encoding and transmission standard submission information package (mets sip) preparation of etds from the internet archive (http:// www.archive.org/) for ingest into dspace using php utilities.13 floyd describes the processor developed to automate the ingest of proquest etds via the dspace item importer.14 also using proquest etds as the source data, averkamp and lee described using xslt to transform the proquest data to bepress’ (the berkeley electronic press) schema for batch loading into a digital commons repository.15 the knowledge bank workflows described in this paper use perl scripts to generate dc xml and create the archive directory for batch loading metadata records and content files into dspace using excel spreadsheets or csv files as the source metadata. 
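a minimal sketch of that csv-to-archive step, assuming one csv row per item, is shown below. it is illustrative rather than one of the knowledge bank scripts: the column names, file locations, the choice of the text::csv module (the appendix e listing uses a different csv module), and the dublin core elements written are all assumptions chosen for the example.

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;                    # assumed csv parser for this sketch
use File::Basename;
use File::Copy qw(copy);
use File::Path qw(make_path);

# hypothetical csv-to-simple-archive-format sketch: one item_nnn directory
# per row, each holding dublin_core.xml, a contents manifest, and the
# content file named in the row.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $in, '<:encoding(UTF-8)', 'metadata.csv' or die "metadata.csv: $!";
my $header = $csv->getline($in);
$csv->column_names(@$header);     # assumed columns: title,creators,date,abstract,file

my $item = 0;
while (my $row = $csv->getline_hr($in)) {
    my $dir = sprintf 'archive_directory/item_%03d', $item++;
    make_path($dir);

    # escape the characters that are illegal in xml before writing the record
    my %dc = %$row;
    for my $value (values %dc) {
        next unless defined $value;
        $value =~ s/&/&amp;/g;
        $value =~ s/</&lt;/g;
        $value =~ s/>/&gt;/g;
    }

    # one dcvalue element per field; the element/qualifier choices are illustrative
    open my $xml, '>:encoding(UTF-8)', "$dir/dublin_core.xml" or die $!;
    print {$xml} qq{<dublin_core>\n};
    print {$xml} qq{  <dcvalue element="title" qualifier="none">$dc{title}</dcvalue>\n};
    print {$xml} qq{  <dcvalue element="date" qualifier="issued">$dc{date}</dcvalue>\n};
    print {$xml} qq{  <dcvalue element="description" qualifier="abstract">$dc{abstract}</dcvalue>\n};
    print {$xml} qq{  <dcvalue element="creator" qualifier="none">$_</dcvalue>\n}
        for split /;\s*/, ($dc{creators} // '');
    print {$xml} qq{</dublin_core>\n};
    close $xml;

    # copy the content file into the item directory and list it in contents
    my $file = basename($row->{file});
    copy($row->{file}, "$dir/$file") or die "copy $row->{file}: $!";
    open my $contents, '>', "$dir/contents" or die $!;
    print {$contents} "$file\n";
    close $contents;
}
close $in;

keeping the whole load set under a single archive_directory means the resulting tree can be handed straight to the dspace item importer's test mode before anything touches the repository.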
■■ background

the knowledge bank, a joint initiative of the osu libraries (osul) and the osu office of the chief information officer, was first registered in the registry of open access repositories (roar) on september 28, 2004.16 as of december 2009 the repository held 40,686 items in 1,192 collections. the knowledge bank uses dspace, the open-source java-based repository software jointly developed by the massachusetts institute of technology libraries and hewlett-packard.17 as a dspace repository, the knowledge bank is organized by communities. the fifty-two communities currently in the knowledge bank include administrative units, colleges, departments, journals, library special collections, research centers, symposiums, and undergraduate honors theses. the commonality of the varied knowledge bank communities is their affiliation with osu and their production of knowledge in a digital format that they wish to store, preserve, and distribute. the staff working with the knowledge bank includes a team of people from three osul areas—technical services, information technology, and preservation—and the contracted hours of one systems developer from the osu office of information technology (oit). the osul team members are not individually assigned full-time to the repository. the current osul team includes a librarian repository manager, two metadata librarians, one systems librarian, one systems developer, two technical services staff members, one preservation staff member, and one graduate assistant. the knowledge bank is currently running dspace 1.5.2 and the

figure 1. dspace simple archive format
archive_directory/
    item_000/
        dublin_core.xml  -- qualified dublin core metadata
        contents         -- text file containing one line per filename
        file_1.pdf       -- files to be added as bitstreams to the item
        file_2.pdf
    item_001/
        dublin_core.xml
        file_1.pdf
    ...

■■ case studies

the issues of the ohio journal of science

ojs was jointly published by osu and the ohio academy of science (oas) until 1974, when oas took over sole control of the journal. the issues of ojs are archived in the knowledge bank with a two year rolling wall embargo. the issues for 1900 through 2003, a total of 639 issues containing 6,429 articles, were batch loaded into the knowledge bank. due to rights issues, the retrospective batch loading project had two phases. the project to digitize ojs began with the 1900–1972 issues that osu had the rights to digitize and make publicly available. osu later acquired the rights for 1973–present, and (accounting for the embargo period) 1973–2003 became phase 2 of the project. the two phases of batch loads were the most complicated automated batch loading processes developed to date for the knowledge bank. to batch load phase 1 in 2005 and phase 2 in 2006, the systems developers working with the knowledge bank wrote scripts to build collections, generate dc xml from the source metadata, create the archive directory, load the metadata and content files, create tables of contents, and load the tables of contents into dspace. the ojs community in the knowledge bank is organized by collections representing each issue of the journal. the systems developers used scripts to automate the building of the collections in dspace because of the number needed as part of the retrospective project. the individual articles within the issues are items within the collections.
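the simple archive format in figure 1 is easy to sanity-check before a load. the short sketch below walks an archive directory and reports items that are missing their dublin_core.xml or whose contents manifest lists files that are not present; it is a hypothetical helper, not part of the knowledge bank workflow, and the default directory name is an assumption.

#!/usr/bin/perl
use strict;
use warnings;

# hypothetical pre-flight check of a simple archive format tree (figure 1):
# every item_* directory should hold dublin_core.xml, a contents manifest,
# and the bitstream files the manifest names.
my $archive  = @ARGV ? $ARGV[0] : 'archive_directory';
my $problems = 0;

for my $dir (sort glob "$archive/item_*") {
    unless (-f "$dir/dublin_core.xml") {
        warn "$dir: missing dublin_core.xml\n";
        $problems++;
    }
    if (open my $fh, '<', "$dir/contents") {
        while (my $name = <$fh>) {
            chomp $name;
            next unless $name =~ /\S/;
            unless (-f "$dir/$name") {
                warn "$dir: contents lists missing file '$name'\n";
                $problems++;
            }
        }
        close $fh;
    }
    else {
        warn "$dir: missing contents file\n";
        $problems++;
    }
}
print $problems ? "$problems problem(s) found\n" : "archive looks consistent\n";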
there is a table of contents for the articles in each issue as part of the collection homepages.21 again, due to the number required for the retrospective project, the systems developers used scripts to automate the creation and loading of the tables of contents. the tables of contents are contained in the html introductory text section of the collection pages. the tables of contents list title, authors, and pages. they also include a link to the item record and a direct link to the article pdf that includes the file size. for each phase of the ojs project, a vendor contracted by osul supplied the article pdfs and an excel spreadsheet with the article-level metadata. the metadata format. this paper describes ingest via the dspace batch item importer. the dspace item importer is a command-line tool for batch ingesting items. the importer uses a simple archive format diagramed in figure 1. the archive is a directory of items that contain a subdirectory of item metadata, item files, and a contents file listing the bitstream file names. each item's descriptive metadata is contained in a dc xml file. the format used by dspace for the dc xml files is illustrated in figure 2. automating the process of creating the unix archive directory has been the main function of the perl scripts written for the knowledge bank batch loading workflows. a systems developer uses the test mode of the dspace item importer tool to validate the item directories before doing a batch load. any significant errors are corrected and the process is repeated. after a successful test, the batch is loaded into the staging instance of the knowledge bank and quality checked by a metadata librarian to identify any unexpected results and script or data problems that need to be corrected. after a successful load into the staging instance the batch is loaded into the production instance of the knowledge bank. most of the knowledge bank batch loading workflows use excel spreadsheets or csv files as the source for the descriptive item metadata. the creation of the metadata contained in the spreadsheets or files has varied by project. in some cases the metadata is created by osul staff. in other cases the metadata is supplied by knowledge bank communities in consultation with a metadata librarian or by a vendor contracted by osul. whether the source metadata is created in-house or externally supplied, osul staff are involved in the quality control of the metadata. several of the first communities to join the knowledge bank had very large retrospective collection sets to archive. the collection sets of two of those early adopters, the journal issues of the ohio journal of science (ojs) and the abstracts of the osu international symposium on molecular spectroscopy currently account for 59 percent of the items in the knowledge bank.20 the successful batch loading workflows developed for these two communities—which continue to be active content suppliers to the repository—are presented as case studies.

figure 2. dspace qualified dublin core xml (example item: "notes on the bird life of cedar point," 1901-04, griggs, robert f.)

article-level metadata to knowledge bank dc, as illustrated in table 1. the systems developers used the mapping as a guide to write perl scripts to transform the vendor metadata into the dspace schema of dc. the workflow for the two phases was nearly identical, except each phase had its own batch loading scripts.
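figure 2 shows a dspace qualified dc xml record; as a hedged illustration of that format, the perl fragment below writes a dublin_core.xml for the same example item, in the same heredoc style as the appendix e script. the element and qualifier names follow the table 1 mapping and common dublin_core.xml conventions, and the citation value simply repeats the format template given under table 1, so the file is illustrative rather than a copy of the original figure.

#!/usr/bin/perl
use strict;
use warnings;

# illustrative only: write a dspace dublin_core.xml file for the item shown
# in figure 2. element/qualifier choices follow the table 1 mapping; the
# citation value repeats the table 1 format template, not a real citation.
my $title    = 'notes on the bird life of cedar point';
my $creator  = 'griggs, robert f.';
my $issued   = '1901-04';
my $citation = '[cover title]. v[vol.], n[iss.] ([cover date]), [fpage]-[lpage]';

open my $fh, '>:encoding(UTF-8)', 'dublin_core.xml' or die $!;
print {$fh} <<"XML";
<dublin_core>
  <dcvalue element="title" qualifier="none">$title</dcvalue>
  <dcvalue element="creator" qualifier="none">$creator</dcvalue>
  <dcvalue element="date" qualifier="issued">$issued</dcvalue>
  <dcvalue element="identifier" qualifier="citation">$citation</dcvalue>
</dublin_core>
XML
close $fh;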
due to a staff change between the two phases of the project, a former osul systems developer was responsible for batch loading phase 1 and the oit systems developer was responsible for phase 2. the phase 1 scripts were all written in perl. the four scripts written for phase 1 created the archive directory, performed database operations to build the collections, generated the html introduction table of contents for each collection, and loaded the tables of contents into dspace via the database. for phase 2, the oit systems developer modified and added to the phase 1 batch processing scripts. this case study focuses on phase 2 of the project. batch processing for phase 2 of ojs the annotated scripts the oit systems developer used for phase 2 of the ojs project are included in appendix a, available on the italica weblog (http://ital-ica .blogspot.com/). a shell script (mkcol.sh) added collections based on a listing of the journal issues. the script performed a login as a selected user id to the dspace web interface using the web access tool curl. a subsequent simple looping perl script (mkallcol.pl) used the stored credentials to submit data via this channel to build the collections in the knowledge bank. the metadata.pl script created the archive directory for each collection. the oit systems developer added the pdf file for each item to unix. the vendor-supplied metadata was saved as unicode text format and transferred to unix for further processing. the developer used vi commands to manually modify metadata for characters illegal in xml (e.g., “<” and “&”). (although manual steps were used for this project, the oit systems developer improved the perl scripts for subsequent projects by adding code for automated transformation of the input data to help ensure xml validity.) the metadata.pl script then processed each line of the metadata along with the corresponding data file. for each item, the script created the dc xml file and the contents file and moved them and the pdf file to the proper directory. load sets for each collection (issue) were placed in their own subdirectory, and a load was done for each subdirectory. the items for each collection were loaded by a small perl script (loaditems. pl) that used the list of issues and their collection ids and called a shell script (import.sh) for the actual load. the tables of contents for the issues were added to the knowledge bank after the items were loaded. a perl script (intro.pl) created the tables of contents using the metadata and the dspace map file, a stored mapping of item received from the vendor had not been customized for the knowledge bank. the ojs issues were sent to a vendor for digitization and metadata creation before the knowledge bank was chosen as the hosting site of the digitized journal. the osu digital initiatives steering committee 2002 proposal for the ojs digitization project had predated the knowledge bank dspace instance. osul staff performed quality-control checks of the vendor-supplied metadata and standardized the author names. the vendor supplied the author names as they appeared in the articles—in direct order, comma separated, and including any “and” that appeared. in addition to other quality checks performed, osul staff edited the author names in the spreadsheet to conform to dspace author-entry convention (surname first). semicolons were added to separate author names, and the extraneous ands were removed. a former metadata librarian mapped the vendor-supplied table 1. 
mapping of vendor metadata to qualified dublin core

vendor-supplied metadata -> knowledge bank dublin core
file -> [n/a: pdf file name]
cover title -> dc.identifier.citation*
issn -> dc.identifier.issn
vol. -> dc.identifier.citation*
iss. -> dc.identifier.citation*
cover date -> dc.identifier.citation*
year -> dc.date.issued
month -> dc.date.issued
fpage -> dc.identifier.citation*
lpage -> dc.identifier.citation*
article title -> dc.title
author names -> dc.creator
institution -> dc.description
abstract -> dc.description.abstract
n/a -> dc.language.iso
n/a -> dc.rights
n/a -> dc.type
*format: [cover title]. v[vol.], n[iss.] ([cover date]), [fpage]-[lpage]

directories to item handles created during the load. the tables of contents were added to the knowledge bank using a shell script (installintro.sh) similar to what was used to create the collections. installintro.sh used curl to simulate a user adding the data to dspace by performing a login as a selected user id to the dspace web interface. a simple looping perl script (ldallintro.pl) called installintro.sh and used the stored credentials to submit the data for the tables of contents.

the abstracts of the osu international symposium on molecular spectroscopy

the knowledge bank contains the abstracts of the papers presented at the osu international symposium on molecular spectroscopy (mss), which has met annually since 1946. beginning with the 2005 symposium, the complete presentations from authors who have authorized their inclusion are archived along with the abstracts. the mss community in the knowledge bank currently contains 17,714 items grouped by decade into six collections. the six collections were created "manually" via the dspace web interface prior to the batch loading of the items. the retrospective years of the symposium (1946–2004) were batch loaded in three phases in 2006. each symposium year following the retrospective loads was batch loaded individually.

retrospective mss batch loads

the majority of the abstracts for the retrospective loads were digitized by osul. a vendor was contracted by osul to digitize the remainder and to supply the metadata for the retrospective batch loads. the files digitized by osul were sent to the vendor for metadata capture. osul provided the vendor a metadata template derived from the mss core element set. the metadata taken from the abstracts comprised author, affiliation, title, year, session number, sponsorship (if applicable), and a full transcription of the abstract. to facilitate searching, the formulas and special characters appearing in the titles and abstracts were encoded using latex, a document preparation system used for scientific data. the vendor delivered the metadata in excel spreadsheets as per the spreadsheet template provided by osul. quality-checking the metadata was an essential step in the workflow for osul. the metadata received for the project required revisions and data cleanup. the vendor originally supplied incomplete files and spreadsheets that contained data errors, including incorrect numbering, data in the wrong fields, and inconsistency with the latex encoding. the three knowledge bank batch load phases for the retrospective mss project corresponded to the staged receipt of metadata and digitized files from the vendor. the annotated scripts used for phase 2 of the project, which included twenty years of the osu international symposium between 1951 and 1999, are included in appendix b, available on the italica weblog.
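the kind of quality checking described above can be sketched as a short perl pass over a vendor spreadsheet saved as csv. the required column names are assumptions for the example, and the checks are simplified stand-ins for the problems listed (missing or misplaced values and numbering that falls out of sequence).

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# hypothetical quality-control pass over a vendor spreadsheet saved as csv.
my @required = qw(session_number title author year abstract);   # assumed columns

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $in, '<:encoding(UTF-8)', 'vendor_metadata.csv' or die $!;
my $header = $csv->getline($in);
$csv->column_names(@$header);

# flag required columns that are missing from the header row
my %seen = map { $_ => 1 } @$header;
for my $col (@required) {
    warn "missing required column: $col\n" unless $seen{$col};
}

my ($line, $last_num) = (1, 0);
while (my $row = $csv->getline_hr($in)) {
    $line++;
    for my $col (@required) {
        warn "row $line: empty $col\n"
            if !defined $row->{$col} || $row->{$col} !~ /\S/;
    }
    warn "row $line: year '$row->{year}' does not look like a year\n"
        if defined $row->{year} && $row->{year} !~ /^\d{4}$/;
    if (defined $row->{session_number} && $row->{session_number} =~ /^(\d+)$/) {
        warn "row $line: session number out of sequence\n" if $1 < $last_num;
        $last_num = $1;
    }
}
close $in;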
the oit systems developer saved the metadata as a tab-separated file and added it to unix along with the abstract files. a perl script (mkxml2.pl) transformed the metadata into dc xml and created the archive directories for loading the metadata and abstract files into the knowledge bank. the script divided the directories into separate load sets for each of the six collections and accounted for the inconsistent naming of the abstract files. the script added the constant data for type and language that was not included in the vendor-supplied metadata. unlike the ojs project, where multiple authors were on the same line of the metadata file, the mss phase 2 script had to code for authors and their affiliations on separate lines. once the load sets were made, the oit systems developer ran a shell script to load them. the script (import_ collections.sh) was used to run the load for each set so that the dspace item import command did not need to be constructed each time. annual mss batch loads a new workflow was developed for batch loading the annual mss collection additions. the metadata and item files for the annual collection additions are supplied by the mss community. the community provides the symposium metadata in a csv file and the item files in a tar archive file. the symposium uses a web form for latex–formatted abstract submissions. the community processes the electronic symposium submissions with a perl script to create the csv file. the metadata delivered in the csv file is based on the template created by the author, which details the metadata requirements for the project. the oit systems developer borrowed from and modified earlier perl scripts to create a new script for batch processing the metadata and files for the annual symposium collection additions. to assist with the development of the new script, i provided the developer a mapping of the community csv headings to the knowledge bank dc fields. i also provided a sample dc xml file to illustrate the desired result of the perl transformation of the community metadata into dc xml. for each new year of the symposium, i create a sample dc xml result for an item to check the accuracy of the script. a dc xml example from a 2009 mss item is included in appendix c, available on the italica weblog. unlike the previous retrospective mss loads in which the script processed multiple years of the symposium, the new script processes one year at a time. the annual symposiums are batch loaded individually into one existing mss decade collection. the new script for the annual loads was tested and refined by loading the 2005 symposium into the staging instance of the 122 information technology and libraries | september 2010 ■■ summary and conclusion each of the batch loads that used perl scripts had its own unique features. the format of content and associated metadata varied considerably, and custom scripts to convert the content and metadata into the dspace import format were created on a case-by-case basis. the differences between batch loads included the delivery format of the metadata, the fields of metadata supplied, how metadata values were delimited, the character set used for the metadata, the data used to uniquely identify the files to be loaded, and how repeating metadata fields were identified. because of the differences in supplied metadata, a separate perl script for generating the dc xml and archive directory for batch loading was written for each project. each new perl script borrowed from and modified earlier scripts. 
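the per-collection load sets and the import step can be sketched as a small perl driver. the collection handle 1811/6635 comes from the import.sh listing in appendix e, and the command line mirrors the dsrun invocation shown there; the second handle, the directory names, and the eperson address are placeholders for illustration.

#!/usr/bin/perl
use strict;
use warnings;

# hypothetical driver in the spirit of import_collections.sh: run the dspace
# item importer once per load set, one set per target collection.
my %load_set = (
    '1811/6635' => './2009xml',          # collection handle => archive directory
    '1811/6636' => './2008xml',          # placeholder second collection
);
my $eperson = 'loader@example.edu';      # assumed submitter account

for my $collection (sort keys %load_set) {
    my $source  = $load_set{$collection};
    my $base    = (split m{/}, $collection)[-1];
    my $mapfile = "./map-mss.$base";

    # same command line as the appendix e import.sh script
    my @cmd = ('/dspace/bin/dsrun', 'org.dspace.app.itemimport.ItemImport',
               '--add', "--eperson=$eperson", "--collection=$collection",
               "--source=$source", "--mapfile=$mapfile");
    system(@cmd) == 0 or die "import of $source into $collection failed\n";
}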
many of the early batch loads were firsts for the knowledge bank and the staff working with the repository, both in terms of content and in terms of metadata. dealing with communityand vendor-supplied metadata and various encodings (including latex), each of the early loads encountered different data obstacles, and in each case solutions were written in perl. the batch loading code has matured over time, and the progression of improvements is evident in the example scripts included in the appendixes. batch loading can greatly reduce the time it takes to add content and metadata to a repository, but successful knowledge bank. problems encountered with character encoding and file types were resolved by modifying the script. the metadata and files for the symposium years 2005, 2006, and 2007 were made available to osul in 2007, and each year was individually loaded into the existing knowledge bank collection for that decade. these first three years of community-supplied csv files contained author metadata inconsistent with knowledge bank author entries. the names were in direct order, uppercase, split by either a semicolon or “and,” and included extraneous data, such as an address. the oit systems developer wrote a perl script to correct the author metadata as part of the batch loading workflow. an annotated section of that script illustrating the author modifications is included in appendix d, available on the italica weblog. the mss community revised the perl script they used to generate the csv files by including an edited version of this author entry correction script and were able to provide the expected author data for 2008 and 2009. the author entries received for these years were in inverted order (surname first) and mixed case, were semicolon separated, and included no extraneous data. the receipt of consistent data from the community for the last two years has facilitated the standardized workflow for the annual mss loads. the scripts used to batch load the 2009 symposium year are included in appendix e, which appears at the end of this text. the oit systems developer unpacked the tar file of abstracts and presentations into a directory named for the year of the symposium on unix. the perl script written for the annual mss loads (mkxml. pl) was saved on unix and renamed mkxml2009.pl. the script was edited for 2009 (including the name of the csv file and the location of the directories for the unpacked files and generated xml). the csv headings used by the community in the new file were checked and verified against the extract list in the script. once the perl script was up-to-date and the base directory was created, the oit systems developer ran the perl script to generate the archive directory set for import. the import.sh script was then edited for 2009 and run to import the new symposium year into the staging instance of the knowledge bank as a quality check prior to loading into the live repository. the brief item view of an example mss 2009 item archived in the knowledge bank is shown in figure 3. figure 3. mss 2009 archived item example batch loading collections into dspace | walsh 123 proceedings of the 2003 international conference on dublin core and metadata applications: supporting communities of discourse and practice—metadata research & applications, seattle, washington, 2003, http://dcpapers .dublincore.org/ojs/pubs/article/view/753/749 (accessed dec. 21, 2009). 3. r. 
mishra et al., “development of etd repository at iitk library using dspace,” in international conference on semantic web and digital libraries (icsd-2007), ed. a. r. d. prasad and devika p. madalli (2007), 249–59. http://hdl.handle .net/1849/321 (accessed dec. 21, 2009). 4. todd m. mundle, “digital retrospective conversion of theses and dissertations: an in house project” (paper presented to the 8th international symposium on electronic theses & dissertations, sydney, australia, sept. 28–30, 2005), http://adt.caul .edu.au/etd2005/papers/080mundle.pdf (accessed dec. 21, 2009). 5. rowan brownlee, “research data and repository metadata: policy and technical issues at the university of sydney library,” cataloging & classification quarterly 47, no. 3/4 (2009): 370–79. 6. steve thomas, “importing marc data into dspace,” 2006, http://hdl.handle.net/2440/14784 (accessed dec. 21, 2009). 7. sarah kim, lorraine a. dong, and megan durden, “automated batch archival processing: preserving arnold wesker’s digital manuscripts,” archival issues 30, no. 2 (2006): 91–106. 8. elspeth healey, samantha mueller, and sarah ticer, “the paul n. banks papers: archiving the electronic records of a digitally-adventurous conservator,” 2009, https://pacer .ischool.utexas.edu/bitstream/2081/20150/1/paul_banks_ final_report.pdf (accessed dec. 21, 2009); lisa schmidt, “preservation of a born digital literary genre: archiving legacy macintosh hypertext files in dspace,” 2007, https://pacer .ischool.utexas.edu/bitstream/2081/9007/1/mj%20wbo%20 capstone%20report.pdf (accessed dec. 21, 2009). 9. rachel e. proudfoot et al., “jisc final report: increase (increasing repository content through automation and services),” 2009, http://eprints.whiterose.ac.uk/9160/ (accessed dec. 21, 2009). 10. michael witt and mark p. newton, “preparing batch deposits for digital commons repositories,” 2008, http://docs .lib.purdue.edu/lib_research/96/ (accessed dec. 21, 2009). 11. lesley drysdale, “importing records from reference manager into gnu eprints,” 2004, http://hdl.handle.net/1905/175 (accessed dec. 21, 2009). 12. r. john robertson, “evaluation of metadata workflows for the glasgow eprints and dspace services,” 2006, http://hdl .handle.net/1905/615 (accessed dec. 21, 2009); william j. nixon and morag greig, “populating the glasgow eprints service: a mediated model and workflow,” 2005, http://hdl.handle .net/1905/387 (accessed dec. 21, 2009). 13. tim ribaric, “automatic preparation of etd material from the internet archive for the dspace repository platform,” code4lib journal no. 8 (nov. 23, 2009), http://journal.code4lib.org/ articles/2152 (accessed dec. 21, 2009). 14. randall floyd, “automated electronic thesis and dissertations ingest,” (mar. 30, 2009), http://wiki.dlib.indiana.edu/ confluence/x/01y (accessed dec. 21, 2009). 15. shawn averkamp and joanna lee, “repurposing probatch loading workflows are dependent upon the quality of data and metadata loaded. along with testing scripts and checking imported metadata by first batch loading to a development or staging environment, quality control of the supplied metadata is an integral step. the flexibility of perl allowed testing and revising to accommodate problems encountered with how the metadata was supplied for the heterogeneous collections batch loaded into the knowledge bank. 
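the author-entry cleanup described above (names arriving in direct order, uppercase, split by semicolons or "and," with extraneous address data) can be illustrated with a hedged sketch. it is modeled on the description of the appendix d script rather than copied from it, and the pattern matching is deliberately simplified.

#!/usr/bin/perl
use strict;
use warnings;

# hypothetical normalization of a supplied author string such as
# "JOHN A SMITH AND MARY JONES, DEPT OF CHEMISTRY" into the repository's
# author-entry convention: surname first, mixed case, semicolon separated.
sub normalize_authors {
    my ($raw) = @_;
    $raw =~ s/,\s*dept.*$//i;                      # crude removal of trailing address data
    my @names = split /\s*(?:;|\band\b)\s*/i, $raw;
    my @out;
    for my $name (@names) {
        next unless $name =~ /\S/;
        my @parts   = map { ucfirst lc } split ' ', $name;
        my $surname = pop @parts;
        push @out, join(', ', $surname, join(' ', @parts));
    }
    return join('; ', @out);
}

print normalize_authors('JOHN A SMITH AND MARY JONES, DEPT OF CHEMISTRY'), "\n";
# prints: Smith, John A; Jones, Mary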
however, toward the goal of standardizing batch loading workflows, the staff working with the knowledge bank iteratively refined not only the scripts but also the metadata requirements for each project and how those were communicated to the data suppliers with mappings, explicit metadata examples, and sample desired results. the efficiency of batch loading workflows is greatly enhanced by consistent data and basic standards for how metadata is supplied. batch loading is not only an extremely efficient means of populating an institutional repository, it is also a valueadded service that can increase buy-in from the wider campus community. it is hoped that by openly sharing examples of our batch loading scripts we are contributing to the development of an open library of code that can be borrowed and adapted by the library community toward future institutional repository success stories. ■■ acknowledgments i would like to thank conrad gratz, of osu oit, and andrew wang, formerly of osul. gratz wrote the shell scripts and the majority of the perl scripts used for automating the knowledge bank item import process and ran the corresponding batch loads. the early perl scripts used for batch loading into the knowledge bank, including the first phase of ojs and mss, were written by wang. parts of those early perl scripts written by wang were borrowed for subsequent scripts written by gratz. gratz provided the annotated scripts appearing in the appendixes and consulted with the author regarding the description of the scripts. i would also like to thank amanda j. wilson, a former metadata librarian for osul, who was instrumental to the success of many of the batch loading workflows created for the knowledge bank. references and notes 1. the ohio state university knowledge bank, “institutional repository policies,” 2007, http://library.osu.edu/sites/ kbinfo/policies.html (accessed dec. 21, 2009). the knowledge bank homepage can be found at https://kb.osu.edu/dspace/ (accessed dec. 21, 2009). 2. margret branschofsky et al., “evolving metadata needs for an institutional repository: mit’s dspace,” 124 information technology and libraries | september 2010 appendix e. mss 2009 batch loading scripts -mkxml2009.pl -#!/usr/bin/perl use encode; # routines for utf encoding use text::xsv; # routines to process csv files. use file::basename; # open and read the comma separated metadata file. my $csv = new text::xsv; #$csv->set_sep(' '); # use for tab separated files. $csv->open_file("mss2009.csv"); $csv->read_header(); # process the csv column headers. # constants for file and directory names. $basedir = "/common/batch/input/mss/"; $indir = "$basedir/2009"; $xmldir= "./2009xml"; $imagesubdir= "processed_images”; $filename = "dublin_core.xml"; # process each line of metadata, one line per item. $linenum = 1; while ($csv->get_row()) { # this divides the item's metadata into fields, each in its own variable. my ( $identifier, $title, $creators, $description_abstract, $issuedate, $description, $description2, appendixes a–d available at http://ital-ica.blogspot.com/ quest metadata for batch ingesting etds into an institutional repository,” code4lib journal no. 7 (june 26, 2009), http://journal .code4lib.org/articles/1647 (accessed dec. 21, 2009). 16. tim brody, registry of open access repositories (roar), http://roar.eprints.org/ (accessed dec. 21, 2009). 17. duraspace, dspace, http://www.dspace.org/ (accessed dec. 21, 2009). 18. 
dublin core metadata initiative libraries working group, “dc-library application profile (dc-lib),” http://dublincore .org/documents/2004/09/10/library-application-profile/ (accessed dec. 21, 2009). 19. the ohio state university knowledge bank policy committee, “osu knowledge bank metadata application profile,” http://library.osu.edu/sites/techservices/kbappprofile.php (accessed dec. 21, 2009). 20. ohio journal of science (ohio academy of science), knowledge bank community, http://hdl.handle .net/1811/686 (accessed dec. 21, 2009); osu international symposium on molecular spectroscopy, knowledge bank community, http://hdl.handle.net/1811/5850 (accessed dec. 21, 2009). 21. ohio journal of science (ohio academy of science), ohio journal of science: volume 74, issue 3 (may, 1974), knowledge bank collection, http://hdl.handle.net/1811/22017 (accessed dec. 21, 2009). batch loading collections into dspace | walsh 125 $abstract, $gif, $ppt, ) = $csv->extract( "talk_id", "title", "creators", "abstract", "issuedate", "description", "authorinstitution", "image_file_name", "talk_gifs_file", "talk_ppt_file" ); $creatorxml = ""; # multiple creators are separated by ';' in the metadata. if (length($creators) > 0) { # create xml for each creator. @creatorlist = split(/;/,$creators); foreach $creator (@creatorlist) { if (length($creator) > 0) { $creatorxml .= '' .$creator.’’.”\n “; } } } # done processing creators for this item. # create the xml string for the abstract. $abstractxml = ""; if (length($description_abstract) > 0) { # convert special metadata characters for use in xml/html. $description_abstract =~ s/\&/&/g; $description_abstract =~ s/\>/>/g; $description_abstract =~ s/\' .$description_abstract.''; } # create the xml string for the description. $descriptionxml = ""; if (length($description) > 0) { # convert special metadata characters for use in xml/html. $description=~ s/\&/&/g; $description=~ s/\>/>/g; $description=~ s/\' .$description.''; } appendix e. mss 2009 batch loading scripts (cont.) 126 information technology and libraries | september 2010 # create the xml string for the author institution. $description2xml = ""; if (length($description2) > 0) { # convert special metadata characters for use in xml/html. $description2=~ s/\&/&/g; $description2=~ s/\>/>/g; $description2=~ s/\' .'author institution: '.$description2.''; } # convert special characters in title. $title=~ s/\&/&/g; $title=~ s/\>/>/g; $title=~ s/\:encoding(utf-8)", "$basedir/$subdir/$filename"); print fh <<"xml"; $identifier $title $issuedate $abstractxml $descriptionxml $description2xml article en $creatorxml xml close($fh); # create contents file and move files to the load set. # copy item files into the load set. if (defined($abstract) && length($abstract) > 0) { system "cp $indir/$abstract $basedir/$subdir"; } $sourcedir = substr($abstract, 0, 5); if (defined($ppt) && length($ppt) > 0 ) { system "cp $indir/$sourcedir/$sourcedir/*.* $basedir/$subdir/"; } if (defined($gif) && length($gif) > 0 ) { system "cp $indir/$sourcedir/$imagesubdir/*.* $basedir/$subdir/"; } # make the 'contents' file and fill it with the file names. appendix e. mss 2009 batch loading scripts (cont.) batch loading collections into dspace | walsh 127 system "touch $basedir/$subdir/contents"; if (defined($gif) && length($gif) > 0 && -d "$indir/$sourcedir/$imagesubdir" ) { # sort items in reverse order so they show up right in dspace. # this is a hack that depends on how the db returns items # in unsorted (physical) order. 
there are better ways to do this. system "cd $indir/$sourcedir/$imagesubdir/;" . " ls *[0-9][0-9].* | sort -r >> $basedir/$subdir/contents"; system "cd $indir/$sourcedir/$imagesubdir/;" . " ls *[a-za-z][0-9].* | sort -r >> $basedir/$subdir/contents"; } if (defined($ppt) && length($ppt) > 0 && -d "$indir/$sourcedir/$sourcedir" ) { system "cd $indir/$sourcedir/$sourcedir/;" . " ls *.* >> $basedir/$subdir/contents"; } # put the abstract in last, so it displays first. system "cd $basedir/$subdir; basename $abstract >>" . " $basedir/$subdir/contents"; $linenum++; } # done processing an item. --------------------------------------------------------------------------------------------------import.sh –#!/bin/sh # # import a collection from files generated on dspace # collection_id=1811/6635 eperson=[name removed]@osu.edu source_dir=./2009xml base_id=`basename $collection_id` mapfile=./map-dspace03-mss2009.$base_id /dspace/bin/dsrun org.dspace.app.itemimport.itemimport --add --eperson=$eperson --collection=$collection_id --source=$source_dir --mapfile=$mapfile appendix e. mss 2009 batch loading scripts (cont.) : | wang 81building an open source institutional repository at a small law school library | wang 81 fang wangcommunications v700 flatbed scanner, which was recommended by many digitization best practices in texas. for software, we had all the important basics such as ocr and image editing software for the project to start. for the following several months, i did extensive research on what digital asset management platform would be the best solution for the law library. we had options to continue displaying the digital collections through webpages or use a digital asset management platform that would provide long-term preservation as well as retrieval functions. we made the decision to go with the latter. generally speaking, there are two types of digital asset management platforms: proprietary and open source. in some rare occasions, a library chooses to develop its own system and not to use either type of the platforms if the library has designated programmers. there are pros and cons to both proprietary and open source platforms. although setting up the repository is fairly quick and easy on a proprietary platform, it can be very expensive to pay annual fees for hosting and using the service. for the open source software, it may appear to be “free” up front; however, installing and customizing the repository can be very time consuming and these solutions often lack technical and development support. there is no uniform rule for choosing a platform. it depends on what the organization wants to achieve and its own unique circumstances. i explored several popular proprietary platforms such as contentdm and digital commons. contentdm is an oclc product, which has a lot of capability and is especially good for displaying image collections. digital commons is owned of the repository is ongoing; it is valuable to share the experience with other institutions who wish to set up an institutional repository of their own and also add to the knowledgebase of ir development. institutional repository from the ground up unlike most large university libraries, law school libraries are usually behind on digital initiative activities because of smaller budgets, lack of staff, and fewer resources. although institutional repositories have already become a trend for large university libraries, it still appears to be a new concept for many law school libraries. 
at the beginning of 2009, i was hired as the digital information management librarian to develop a digital repository for the law school library. when i arrived at texas tech university law library, there was no institutional repository implemented. there were very few digital projects done at the law library. one digital collection was of faculty scholarship. this collection was displayed on a webpage with links to pdf files. another digital project, to digitize and provide access to the texas governor executive orders found in the texas register, was planned then disbanded because of the previous employee leaving the position. i started by looking at the digitization equipment in the library. the equipment was very limited: a very old and rarely used book scanner and a sheet-fed scanner. the good thing was that the library did have extra pcs to serve as workstations. i did research on the book scanner we had and also consulted colleagues i met at various digital library conferences about it. because the model is very outdated and has been discontinued by the vendor and thus had little value to our digitization project, i decided to get rid of the scanner. i then proposed to purchase an epson perfection building an open source institutional repository at a small law school library: is it realistic or unattainable? digital preservation activities among law libraries have largely been limited by a lack of funding, staffing and expertise. most law school libraries that have already implemented an institutional repository (ir) chose proprietary platforms because they are easy to set up, customize, and maintain with the technical and development support they provide. the texas tech university school of law digital repository is one of the few law school repositories in the nation that is built on the dspace open source platform.1 the repository is the law school’s first institutional repository in history. it was designed to collect, preserve, share and promote the law school’s digital materials, including research and scholarship of the law faculty and students, institutional history, and law-related resources. in addition, the repository also serves as a dark archive to house internal records. i n this article, the author describes the process of building the digital repository from scratch including hardware and software, customization, collection development, marketing and outreach, and future projects. although the development fang wang (fang.wang@ttu.edu) is digital information management librarian, texas tech university school of law library, lubbock, texas. 82 information technology and libraries | june 2011 two months later, we discovered that a preconfigured application called jumpbox for dspace was released and approved to be a much easier solution for the installation. the price was reasonable too, $149 a year (the price has jumped quite a bit since then). however, using jumpbox would leave our newly purchased red hat linux server of no use because jumpbox runs on ubuntu, therefore after some discussion we decided not to pursue it. we were a little stuck in the installation process. outsourcing the installation seemed to be a feasible solution for us at this point. we identified a reputable dspace service provider after doing extensive research including comparing vendors, obtaining references, and pursuing other avenues. after obtaining a quote, we were quite satisfied with the price and decided to contract with the vendor. 
while waiting for the contract to be approved by the university contracting office, i began designing the look and feel that is unique to the ttu school of law with some help from another library staff member. the installation finally took place at the beginning of january 2010. i worked very closely with the service provider during the installation to ensure the desired configuration for our dspace instance. our repository site with the ttu law branding became accessible to the public three days later. and with several weeks of warranty, we were able to adjust several configurations including display thumbnails for images. overall, we are very pleased with the results. after the installation, our it department maintains the dspace site and we host all the content on our own server. collection development of the ir content is the most critical element to an institutional repository. while we were waiting for our it department 66, the majority of the repositories worldwide were created using the dspace platform.2 for the installation, we looked at the opportunity to use services provided by the state digital library consortium texas digital library (tdl) and tried to pursue a partnership with the main university library, which had already implemented a digital repository. however, because of financial reasons and separate budgets, those approaches did not work out. so we decided to have our own it department install dspace. installation and customization of our dspace unlike large university libraries, smaller special libraries face many challenges while trying to establish an open source repository. after making the decision to use dspace, the first challenge we faced was the installation. dspace runs on postgresql or oracle and requires a server installation. customizing the web interface requires either the jspui (javaserver pages user interface) or xmlui (extensible markup language user interface). the staff in our it department knew little about dspace. however, another special library on campus offered their installation notes to our system administrator because they just installed dspace. although dspace runs on a variety of operating systems, we purchased red hat enterprise linux after some testing because it is the recommended os for dspace. then our system administrator spent several months trying to figure out how to install the software in addition to his existing projects. because we did not have dedicated it personnel working on the installation, the work was often interrupted and very difficult to complete. our it staff also found it very difficult to continue with the installation because the software requires a lot of expertise. by berkley press and is often used in the law library community. as a smaller law library, our budget did not allow us to purchase those platforms, which require annual fees of more than $10,000. so we had to look at the open source options. for the open source platforms, i investigated dspace, fedora, eprints and green stone. dspace is a javabased system developed by mit and hp labs. it offers a communitiescollections model and has built-in submission workflows and long-term preservation function. it can be installed “out of the box” and is easy to use. it has been widely adopted as institutional repository software in the united states and worldwide. fedora was also developed in the united states. it is more of a backend software with no web-based administration tools and requires a lot of programming effort. 
similar to dspace, eprints is another easy to set up and use ir software developed in the u.k. it is written in perl and is more widespread in europe. greenstone is a tool developed in new zealand for building and distributing digital library collections. it provides interfaces in 35 languages so it has many international users. when choosing an ir platform, it is not a question of which software is superior to others but rather which is more appropriate for the purpose and the content of the repository. our goal was to find a platform that had low costs and did not involve much programming. we also wanted a system that was capable of archiving digital items in various formats for the long term, flexible for data migration, had a widely accepted metadata scheme, decent search capability, and was easy to use. another factor we had to consider was the user base. because open source software relies on the user themselves for technical support for the most part, we wanted a software that had an active user community in the united states. dspace seemed to satisfy all of our needs. also, according to repository : | wang 83building an open source institutional repository at a small law school library | wang 83 hosted by the lubbock county bar association at the ttu law school. we made the initial announcement to the law faculty and staff and later to the lubbock county bar about the new digital initiative service we have established. we received very positive feedback from the law community. professor edgar’s family was delighted to see his collection made available to the public. following the success of the initial launch, i developed an outreach plan to promote the digital repository. to make the repository site more visible, several efforts were made: the repository site url was submitted to the dspace user registry, the directory of open access repositories (opendoar), and registry of open access repositories (roar); the site was registered with google webmaster tools for better indexing; and the repository was linked to several websites of the law school and library. the “faculty scholarship” collection and the “texas governor executive orders” collection became available shortly after. i then developed a poster of the newly established digital repository and presented it at the texas conference on digital libraries held at university of texas austin in may 2010. currently, our digital repository has more than eight hundred digital items as of august 2010. with more and more content becoming available in the repository, we plan on making an official announcement to the law community. we will also make entering first-year law students aware of the ir by including an article about the new repository in the library newsletter that is distributed to them during their orientation. our future marketing plan includes sending out announcements of new collections to the law school using our online announcement system techlawannounce and promoting the digital repository through the law library social networking pages on facebook and twitter. we also plan reviewed each year. based on the collection development policy, we made a decision to migrate the content of the old “faculty scholarship” collection from webpages into the digital repository. it was intended to include all publications of the texas tech law school faculty in the collection. 
we then hired a second-year law student as the digital project assistant and trained him on scanning, editing, and ocr-ing pdf files; uploading files to dspace; and creating basic metadata. we also brought another two student assistants on board to help with the migration of the faculty scholarship collection. the faculty services librarian checked the copyright with faculty members and publishers while i (the digital information management librarian) served as the repository manager handling more complicated metadata creation, performing quality control over student submissions, and overseeing the whole project. later development and promoting the ir during the faculty scholarship migration process, we discovered a need to customize dspace to allow active urls for publications. we wanted all the articles linked to three widely used legal databases: westlaw, lexisnexis, and hein online. because the default dspace system does not support active urls, it requires some programming effort to make the system detect a particular metadata field then render it as a clickable link. we outsourced the development to the same service provider who installed dspace for us. the results were very satisfying. the vendor customized the system to allow active urls and displayed the links as clickable icons for each legal database. in april 2010, “professor j. hadley edgar ’s personal papers” collection was made available in conjunction with his memorial service, to install dspace, we prepared and scanned two collections: the “texas governor executive orders” collection and the “professor j. hadley edgar’s personal papers” collection. the latter was a collection donated by professor edgar’s wife after he passed away in 2009. professor edgar taught at the law school from 1971 to 1991. he was named the robert h. bean professor of law and was twice voted by the student body as the outstanding law professor. the collection contains personal correspondence, photos, newspaper clippings, certificates, and other materials. many of the items have a high historic value to the law school. for the scanning standards, we used 200 dpi for text-based materials and 400 dpi for pictures. we chose pdf as our production file format as it is a common document format and smaller in size to download. after the installation was completed at the beginning of january, i drafted and implemented a digital repository collection development policy shortly after to ensure proper procedures and guidance of the repository development. the policy includes elements such as the purpose of the repository, scope of the collections, selection criteria and responsibilities, editorial rights, and how to handle challenges and withdrawals. i also developed a repository release form to obtain permissions from donors and authors to ensure open access for the materials in the repository. twelve collections were initially planned for the repository: “faculty scholarship,” “personal manuscripts,” “texas governor executive orders,” “law school history,” “law library history,” “regional legal history,” “law student works,” “audio/ video collection,” “dark archive,” “electronic journals,” “conference, colloquium and symposium,” and “lectures and presentations.” there will be changes to the collections in the future as the digital repository collection development policy will be 84 information technology and libraries | june 2011 all roads lead to rome. 
no matter what platform you choose, whether open source or not, the goal is to pick a system that best suits your organization’s needs. to build a successful institutional repository is not simply “scanning” and “putting stuff online.” various factors need to be considered, such as digitization, ir platform, collection development, metadata, copyright issues, and marketing and outreach. our experience has proven that it is possible for a smaller special library with limited resources and funding to establish an open source ir such as dspace and continue to maintain the site and build the collections with success. open source software is certainly not “free” because it requires a lot of effort. however, in the end it still costs a lot less than what we would pay to the proprietary software vendors. references 1. “the texas tech university school of law digital repository,” http://reposi tory.law.ttu.edu/ (accessed apr. 5, 2011). 2. “repository maps,” accessed http://maps.repository66.org/ (accessed aug. 16, 2010). (ssrn) links to individual articles in the faculty scholarship collection. after that, the next collections we will work on are the law school and law library history materials. we also plan to do some development on the dspace authentication to integrate with the ttu “eraider” system to enable single log-in. in the future, we want to explore the possibilities of setting up a collection for the works of our law students and engage in electronic journal publishing using our digital repository. conclusion it is not an easy task to develop an institutional repository from scratch, especially for a smaller organization. installation and development are certainly a big challenge for a smaller library with limited number of it staff. outsourcing these needs to a service provider seems to be a feasible solution. another challenge is training. we overcame this challenge by taking advantage of the state consortium’s dspace training sessions. subscribing to the dspace mailing list is necessary as it is a communication channel for dspace users to ask questions, seek help, and keep up to date about the software. on hosting information sessions for our law faculty and students to learn more about the digital repository. future projects there is no doubt that our digital repository will grow significantly because we have exciting collections planned for future projects. one of our law faculty, professor daniel benson, donated some of his personal files from an eight-year litigation representing the minority plaintiffs in the civil rights case of jones v. city of lubbock, 727 f. 2d 364 (5th cir. 1984) in which the minority plaintiffs won the case. the lawsuit changed the city of lubbock’s election system for city council members from the “at large” method to the “single member district system,” which allowed the minority candidates consistently being elected. this collection contains materials, notes, memoranda, letters, and other documents prepared and utilized by the plaintiffs’ attorneys. it has significant historical value because a texas tech law professor and five texas tech law graduates participated in that case successfully as pro bono attorneys for the minority plaintiffs. 
mark dehmlow editorial board thoughts: sharing responsibility in the digital age this topic is very resonant for me because this past year we launched a new interface to our catalog, rich with all of the features that our users have been self-trained to expect from browsing the internet. we actually launched this project in public beta for two years, t-w-o years. i should also mention that the initial implementation team was diverse, drawing from technology, public services, collections, and technical services. yet, when we launched the project into production, it was only then that we heard concerns and complaints. those concerns revolved around two things: first, there was functionality in the classic catalog that wasn't in the new one, and second, people were used to the old way of doing things and didn't know how the supposedly more intuitive interface worked (a kind of opac-holm syndrome); more importantly for librarians, they wanted to know how to exploit the system's full power. we also found that during the first semester there were few instructors teaching the new system, because they were afraid they couldn't speak authoritatively about it. people are creatures of habit, and even though something might be easier to learn if it were your first exposure to it (macs vs. pcs, anyone?), oftentimes changing from a more complex, but well understood, process is difficult. i remember, years ago at another institution i worked for, helping the organization move from a menu-driven ils interface to a graphical user interface. it required staff to actually rethink the process they were performing, because although the gui is able to make the process more efficient, it also hides many of the more mundane parts of it. with all of those concerns on the table all of a sudden, what did we do? we spent the summer after our production launch providing targeted training sessions and gathering in-person feedback from our internal stakeholders. it probably amounted to more than thirty meetings over the course of three months. we synthesized feedback, identified the biggest pain points, and spent a couple of months developing solutions. providing a more organized training program and targeted feedback sessions as a replacement for our more generalized call for input bought us a lot of goodwill internally. it also gave us some direction on what areas to focus on and opened more dialog with the rest of the library. in the end, it is really important for all areas to be responsible for trying out new systems, even if those responsible are doing more outreach than simple general calls for participation. in some ways, those deploying new systems have the greater onus in this relationship, in that they are driving much of the effort; this is especially critical for changes that have broad impact. taking a more organized and proactive approach to training and acclimating our organizations to change can go a long way toward reducing conflict and stress. everyone is extremely busy, and the tendency for people is to ignore the things that aren't directly in front of them. making efforts toward a more proactive strategy raises awareness, and by meeting in person, you show people that their input is valuable enough to make time to listen and talk to them. taking this type of approach is important even in the cases where projects are managed by committees.
liaisons don’t necessarily provide organizational saturation and oftentimes the vital information about a new system is filtered through their own sense of what is critical. a good start to determine how much communication is needed is to first gauge the potential impact—if a change affects more than a certain percentage of the library and its users, it probably means it will require a good deal more outreach so people don’t feel quite as off balance when the change is implemented. those deploying projects should add a couple of months onto the end of planning cycles to help provide training and gather feedback in a hands on way—e-mail announcements are more often ignored than not given the sheer amount of e-mail everyone gets these days. another possible strategy is to devise testing scripts for anyone trying the system to follow as opposed to just having them “try it out.” a script will give people some direction and hopefully get them into system functionality that they otherwise could miss by trying it without any specific goal. i am not so naive to think we will reach an allencompassing-kumbaya moment where communication is perfect and everyone agrees on what kinds of changes to implement in our organizations. i do think though, that teams and individuals who are implementing new systems can help alleviate anxieties if we build more time into our deployment processes to ease our organizations into change instead of hoping they learned how to swim before we all jump in. mark dehmlow (mdehmlow@nd.edu) is head, library web department, interim head, library information systems department, hesburgh libraries, university of notre dame, notre dame, indiana 16 information technology and libraries | march 2009 mathew j. miles and scott j. bergstrom classification of library resources by subject on the library website: is there an optimal number of subject labels? the number of labels used to organize resources by subject varies greatly among library websites. some librarians choose very short lists of labels while others choose much longer lists. we conducted a study with 120 students and staff to try to answer the following question: what is the effect of the number of labels in a list on response time to research questions? what we found is that response time increases gradually as the number of the items in the list grow until the list size reaches approximately fifty items. at that point, response time increases significantly. no association between response time and relevance was found. i t is clear that academic librarians face a daunting task drawing users to their library’s web presence. “nearly three-quarters (73%) of college students say they use the internet more than the library, while only 9% said they use the library more than the internet for information searching.”1 improving the usability of the library websites therefore should be a primary concern for librarians. one feature common to most library websites is a list of resources organized by subject. libraries seem to use similar subject labels in their categorization of resources. however, the number of subject labels varies greatly. some use as few as five subject labels while others use more than one hundred. in this study we address the following question: what is the effect of the number of subject labels in a list on response times to research questions? n literature review mcgillis and toms conducted a performance test in which users were asked to find a database by navigating through a library website. 
they found that participants “had difficulties in choosing from the categories on the home page and, subsequently, in figuring out which database to select.”2 a review of relevant research literature yielded a number of theses and dissertations in which the authors compared the usability of different library websites. jeng in particular analyzed a great deal of the usability testing published concerning the digital library. the following are some of the points she summarized that were highly relevant to our study: n user “lostness”: users did not understand the structure of the digital library. n ambiguity of terminology: problems with wording accounted for 36 percent of usability problems. n finding periodical articles and subject-specific databases was a challenge for users.3 a significant body of research not specific to libraries provides a useful context for the present research. miller’s landmark study regarding the capacity of human shortterm memory showed as a rule that the span of immediate memory is about 7 ± 2 items.4 sometimes this finding is misapplied to suggest that menus with more than nine subject labels should never be used on a webpage. subsequent research has shown that “chunking,” which is the process of organizing items into “a collection of elements having strong associations with one another, but weak associations with elements within other chunks,”5 allows human short-term memory to handle a far larger set of items at a time. larson and czerwinski provide important insights into menuing structures. for example, increasing the depth (the number of levels) of a menu harms search performance on the web. they also state that “as you increase breadth and/or depth, reaction time, error rates, and perceived complexity will all increase.”6 however, they concluded that a “medium condition of breadth and depth outperformed the broadest, shallow web structure overall.”7 this finding is somewhat contrary to a previous study by snowberry, parkinson, and sisson, who found that when testing structures of 26, 43, 82, 641 (26 means two menu items per level, six levels deep), the 641 structure grouped into categories proved to be advantageous in both speed and accuracy.8 larson and czerwinksi recommended that “as a general principle, the depth of a tree structure should be minimized by providing broad menus of up to eight or nine items each.”9 zaphiris also corroborated that previous research concerning depth and breadth of the tree structure was true for the web. the deeper the tree structure, the slower the user performance.10 he also found that response times for expandable menus are on average 50 percent longer than sequential menus.11 both the research and current practices are clear concerning the efficacy of hierarchical menu structures. thus it was not a focus of our research. the focus instead was on a single-level menu and how the number and characteristics of subject labels would affect search response times. n background in preparation for this study, library subject lists were collected from a set of thirty library websites in the united mathew j. miles (milesm@byui.edu) is systems librarian and scott j. bergstrom (bergstroms@byui.edu) is director of institutional research at brigham young university–idaho in rexburg. classification of library resources by subject on the library website | miles and bergstrom 17 states, canada, and the united kingdom. 
we selected twelve lists from these websites that were representative of the entire group and that varied in size from small to large. to render some of these lists more usable, we made slight modifications. there were many similarities between label names. n research design participants were randomly assigned to one of twelve experimental groups. each experimental group would be shown one of the twelve lists that were selected for use in this study. roughly 90 percent of the participants were students. the remaining 10 percent of the participants were full-time employees who worked in these same departments. the twelve lists ranged in number of labels from five to seventy-two: n group a: 5 subject labels n group b: 9 subject labels n group c: 9 subject labels n group d: 23 subject labels n group e : 6 subject labels n group f: 7 subject labels n group g: 12 subject labels n group h: 9 subject labels n group i: 35 subject labels n group j: 28 subject labels n group k: 49 subject labels n group l: 72 subject labels each participant was asked to select a subject label from a list in response to eleven different research questions. the questions are listed below: 1. which category would most likely have information about modern graphical design? 2. which category would most likely have information about the aztec empire of ancient mexico? 3. which category would most likely have information about the effects of standardized testing on high school classroom teaching? 4. which category would most likely have information on skateboarding? 5. which category would most likely have information on repetitive stress injuries? 6. which category would most likely have information about the french revolution? 7. which category would most likely have information concerning walmart’s marketing strategy? 8. which category would most likely have information on the reintroduction of wolves into yellowstone park? 9. which category would most likely have information about the effects of increased use of nuclear power on the price of natural gas? 10. which category would most likely have information on the electoral college? 11. which category would most likely have information on the philosopher emmanuel kant? the questions were designed to represent a variety of subject areas that library patrons might pursue. each subject list was printed on a white sheet of paper in alphabetical order in a single column, or double columns when needed. we did not attempt to test the subject lists in the context of any web design. we were more interested in observing the effect of the number of labels in a list on response time independent of any web design. each participant was asked the same eleven questions in the same order. the order of questions was fixed because we were not interested in testing for the effect of order and wanted a uniform treatment, thereby not introducing extraneous variance into the results. for each question, the participant was asked to select a label from the subject list under which they would expect to find a resource that would best provide information to answer the question. participants were also instructed to select only a single label, even if they could think of more than one label as a possible answer. participants were encouraged to ask for clarification if they did not fully understand the question being asked. recording of response times did not begin until clarification of the question had been given. response times were recorded unbeknownst to the participant. 
if the participant was simply unable to make a selection, that was also recorded. two people administered the exercise: one recorded response times; the other asked the questions and recorded label selections. relevance rankings were calculated for each possible combination of labels within a subject list for each question. for example, if a subject list consisted of five labels, for each question there were five possible answers. two library professionals, one with humanities expertise and the other with sciences expertise, assigned a relevance ranking to every possible combination of question and labels within a subject list. the rankings were then averaged for each question–label combination. results the analysis of the data was undertaken to determine whether the average response times of participants, adjusted by the different levels of relevance in the subject list labels that prevailed for a given question, were significantly different across the different lists. in other words, would the response times of participants using a particular list, for whom the labels in the list were highly relevant to the question, be different from those of participants using the other lists, for whom the labels in the list were also highly relevant to the question? a separate univariate general linear model analysis was conducted for each of the eleven questions. the analyses were conducted separately because each question represented a unique search domain. the univariate general linear model provided a technique for testing whether the average response times associated with the different lists were significantly different from each other. this technique also allowed for the inclusion of a covariate, relevance of the subject list labels to the question, to determine whether response times at an equivalent level of relevance were different across lists. in the analysis model, the dependent variable was response time, defined as the time needed to select a subject list label. the covariate was relevance, defined as the perceived match between a label and the question. for example, a label of "economics" would be assessed as highly relevant to the question, what is the current unemployment rate? the same label would be assessed as not relevant for the question, what are the names of four moons of saturn? the main factor in the model was the actual list being presented to the participant. there were twelve lists used in this study. the statistical model can be summarized as follows: response time = list + relevance + (list × relevance) + error. the general linear model required that the following conditions be met: first, the data must come from a random sample from a normal population; second, all variances within each of the groupings must be the same (i.e., they must have homoscedasticity). an examination of whether these assumptions were met revealed problems both with normality and with homoscedasticity. a common technique, logarithmic transformation, was employed to resolve these problems. accordingly, response-time data were all converted to common logarithms. an examination of assumptions with the transformed data showed that all questions but three met the required conditions. the three questions (5, 6, and 7) were excluded from subsequent analysis. figure 1, not reproduced here, plots the overall average of average search times for the eight questions for all experimental groups (i.e., lists).
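the model just described maps directly onto a standard statistics package. as a minimal sketch (not the authors' actual code), the following fits the same formulation in python with statsmodels, using log-transformed response times, the list as a categorical factor, and relevance as a covariate; the csv file and column names are placeholders for the study's data.

```python
# sketch of the univariate general linear model described above, fit once per
# question as the authors did. file name and column names are placeholders.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = pd.read_csv("label_study.csv")             # columns: question, list_id, relevance, response_time
data["log_rt"] = np.log10(data["response_time"])  # common logarithms, as in the study

for q, subset in data.groupby("question"):
    # log_rt = list + relevance + (list x relevance) + error
    fit = smf.ols("log_rt ~ C(list_id) * relevance", data=subset).fit()
    print(f"question {q}")
    print(sm.stats.anova_lm(fit, typ=2))          # f-tests: list factor, covariate, interaction
```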
n conclusions the series of graphs in the appendix show the average response times, adjusted for relevance, for eight of the eleven questions for all twelve lists (i.e., experimental groups). three of the eleven questions were excluded from the analysis because of heteroscedascity. an inspection of these graphs shows no consistent pattern in response time as the number of the items in the lists increase. essentially, this means that, for any given level of relevance, the number of items of the list does not affect response time significantly. it seems that for a single question, characteristics of the categories themselves are more important than the quantity of categories in the list. the response times using a subject list with twenty-eight labels is similar to the response times using a list of six labels. a statistical comparison of the mean response time for each classification of library resources by subject on the library website | miles and bergstrom 19 group with that of each of the other groups for each of the questions largely confirms this. there were very few statistically significant different comparisons. the spikes and valleys of the graphs in the appendix are generally not significantly different. however, when the average response time associated with all lists is combined into an overall average from all eight questions, a somewhat clearer picture emerges (see figure 1). response times increase gradually as the number of the items in the list increase until the list size reaches approximately fifty items. at that point, response time increases significantly. no association was found between response time and relevance. a fast response time did not necessarily yield a relevant response, nor did a slow response time yield an irrelevant response. n observations we observed that there were two basic patterns exhibited when participants made selections. the first pattern was the quick selection—participants easily made a selection after performing an initial scan of the available labels. nevertheless, a quick selection did not always mean a relevant selection. the second pattern was the delayed selection. if participants were unable to make a selection after the initial scan of items, they would hesitate as they struggled to determine how the question might be reclassified to make one of the labels fit. we did not have access to a high-tech lab, so we were unable to track eye movement, but it appeared that the participants began scanning up and down the list of available items in an attempt to make a selection. the delayed selection seemed to be a combination of two problems: first, none of the available labels seemed to fit. second, the delay in scanning increased as the list grew larger. it’s possible that once the list becomes large enough, scanning begins to slow the selection process. a delayed selection did not necessarily yield an irrelevant selection. the label names themselves did not seem to be a significant factor affecting user performance. we did test three lists, each with nine items and each having different labels, and response times were similar for the three lists. a future study might compare a more extensive number of lists with the same number of items with different labels to see if label names have an effect on response time. this is a particular challenge to librarians in classifying the digital library, since they must come up with a few labels to classify all possible subjects. 
creating eleven questions to span a broad range of subjects is also a possible weakness of the study. we had to throw out three questions that violated the assumptions of the statistical model. we tried our best to select questions that would represent the broad subject areas of science, arts, and general interest. we also attempted to vary the difficulty of the questions. a different set of questions may yield different results. references 1. steve jones, the internet goes to college, ed. mary madden (washington, d.c.: pew internet and american life project, 2002): 3, www.pewinternet.org/pdfs/pip_college_report.pdf (accessed mar. 20, 2007). 2. louise mcgillis and elaine g. toms, “usability of the academic library web site: implications for design,” college & research libraries 62, no. 4 (2001): 361. 3. judy h. jeng, “usability of the digital library: an evaluation model” (phd diss., rutgers university, new brunswick, new jersey): 38–42. 4. george a. miller, “the magical number seven plus or minus two: some limits on our capacity for processing information,” psychological review 63, no. 2 (1956): 81–97. 5. fernand gobet et al., “chunking mechanisms in human learning,” trends in cognitive sciences 5, no. 6 (2001): 236–43. 6. kevin larson and mary czerwinski, “web page design: implications of memory, structure and scent for information retrieval” (los angeles: acm/addison-wesley, 1998): 25, http://doi.acm.org/10.1145/274644.274649 (accessed nov. 1, 2007). 7. ibid. 8. kathleen snowberry, mary parkinson, and norwood sisson, “computer display menus,” ergonomics 26, no 7 (1983): 705. 9. larson and czerwinski, “web page design,” 26. 10. panayiotis g. zaphiris, “depth vs. breath in the arrangement of web links,” www.soi.city.ac.uk/~zaphiri/papers/hfes .pdf (accessed nov. 1, 2007). 11. panayiotis g. zaphiris, ben shneiderman, and kent l. norman, “expandable indexes versus sequential menus for searching hierarchies on the world wide web,” http:// citeseer.ist.psu.edu/rd/0%2c443461%2c1%2c0.25%2cdow nload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/ cache/papers/cs/22119/http:zszzszagrino.orgzszpzaphiriz szpaperszszexpandableindexes.pdf/zaphiris99expandable.pdf (accessed nov. 1, 2007). 20 information technology and libraries | march 2009 appendix. 
response times by question by group. (the appendix consists of eight graphs, one for each of questions 1-4 and 8-11, plotting average response time for each experimental group from group a, 5 items, through group l, 72 items; the graphs are not reproduced here.) methods of randomization of large files with high volatility. patrick c. mitchell: senior programmer, washington state university, pullman, washington, and thomas k. burgess: project manager, institute of library research, university of california, los angeles, california. key-to-address conversion algorithms which have been used for a large, direct access file are compared with respect to record density and access time. cumulative distribution functions are plotted to demonstrate the distribution of addresses generated by each method. the long-standing practice of counting address collisions is shown to be less valuable in judging algorithm effectiveness than considering the maximum number of contiguously occupied file locations. the random access disk file used by the washington state university library acquisition sub-system is a large file with a sizable number of records being added and deleted daily. this file represents not only materials on order by the acquisitions section, but all materials which are in process within the technical services area of the library.
the size of the file currently varies from approximately 12,000 to 15,000 items and has a capacity of 18,000 items. over 40,000 items are added and purged annually. each record consists of both fixed length fields and variable length fields. fixed fields primarily contain quantity and accounting information; the variable length fields represent bibliographic data. records are blocked at 1,000 characters for file structuring purposes; however the variable length information is treated as strings of characters with delimiters. the key to the file is a 16-character structure which is developed from the purchase order number. the structure of the key is as follows: six digits of the original purchase order number, two digits of partial order and credit information, and eight digits containing the computed relative record address. proper development of this key turns out to be 80 journal of library automation vol 3/1 march, 1970 the most important factor in achieving efficiency in both file access time and record density within the file. the w.s.u. purchase order numbering system, developed from a basic six-digit purchase order number, allows up to one million entries. of these, the library currently uses four blocks: one block for standing orders, one block for orders originating from the university after the system becomes operational, another block used by the systems people in prototype testing of the system, and a fourth block which was given to one vendor who operates an approval book program. in mapping a possible million numbers into eighteen thousand disk locations, there is a high probability that the disk addresses for more than one record will be the same. disk location, also called disk address, home position, and relative record address ( rra) in this paper, refers to the computed offset address of a record in the file, relative to the starting address of the file. currently, the file resides on an ibm 2316 disk pack which can store six 1000-character records per track. thus if the starting address of the file is track 40, a record with rra = 5 would have its home position on track 40, while a record with rra = 6 would have its home position on track 41. it should be noted that routines in this system are required to calculate neither absolute track address nor relative track address and therefore the file could be moved to any direct access device supported by os/bdam without program modification. when two records map into the same address, it is called a collision. for a write statement under the ibm 360 operating system, basic direct access methods, the system locates that disk address generated and if another record is found there, it sequentially searches from that point forward until a vacant space is found and then stores the new record in that space. the sequential search is done by a hardware program in the i/ 0 channel and proceeds at the rotational speed of the device on which the file resides. the cpu is free during this period to service other users. similarily, when searching for a record, the system locates the disk address and matches keys; if they do not match, it sequentially searches forward from that point. long sequential searches sharply degrade the operating efficiency of on-line systems. in initial experimentation with this file, it was discovered that some records were 2,500 disk positions away from their computed locations. this seriously reduced response time to the terminals which were operating against those records. 
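the behavior just described, in which a record's home address is computed from its key and a collision is resolved by searching forward past occupied slots, is what makes the maximum contiguous run of occupied locations, rather than the raw collision count, the quantity worth watching. the python sketch below models that behavior for an arbitrary key-to-address function; the purchase order numbers and the simple modulo function are illustrative stand-ins, not the library's actual data or programs.

```python
# a toy model of bdam-style placement: each record's home position is computed
# from its key, and a collision is resolved by searching forward (wrapping at
# the end of the file, for simplicity) until a vacant slot is found. the figure
# of merit is the longest run of contiguously occupied slots, which bounds the
# worst-case sequential search.

from typing import Callable, Iterable, Optional

FILE_SIZE = 18_000  # capacity of the acquisitions file, from the article

def place_records(keys: Iterable[int],
                  home: Callable[[int], int],
                  size: int = FILE_SIZE) -> list[Optional[int]]:
    """store each key at its home address, probing forward on collision."""
    slots: list[Optional[int]] = [None] * size
    for key in keys:
        pos = home(key) % size
        while slots[pos] is not None:        # sequential forward search
            pos = (pos + 1) % size
        slots[pos] = key
    return slots

def longest_occupied_run(slots: list[Optional[int]]) -> int:
    """length of the longest block of contiguously occupied locations."""
    if all(s is not None for s in slots):
        return len(slots)
    best = run = 0
    for slot in slots * 2:                    # scan twice so wrapped runs are seen whole
        run = run + 1 if slot is not None else 0
        best = max(best, run)
    return best

# illustrative keys and hash; the real file used six-digit purchase order numbers.
purchase_orders = range(100_000, 113_000)
prime = 17_989                                # a prime just below the file capacity (example value)
slots = place_records(purchase_orders, lambda po: po % prime)
print("longest contiguous run:", longest_occupied_run(slots))
# the same harness can evaluate the blended modulo scheme and the
# generator-based scheme described in the following paragraphs.
```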
the necessity to develop a method for placing each record close to its calculated location became quite obvious. however, the methodology for doing this was not as clear. the upper bound delay for a direct access read/write operation can be defined as the largest number of contiguously occupied record locations within the file. the problem of minimizing this upper bound for a particular file is equivalent to finding an algorithm which maps the keys in such a way that unoccupied locations are interspersed throughout the file space. one method for doing this is to triple the amount of space required for the file. this has been a traditional approach but is unsatisfactory in terms of its efficiency in space utilization. the method first used by the library was motivated by the necessity to "get on the air." its requirements were that it be easily implemented and perform to a reasonable degree. the prime modulo scheme seemed to qualify and was selected. as this algorithm was used, the largest prime number within the file size was divided into the purchase order number and the modulo remainder was used as an address; that is, rra = [po modulo pr], where rra is the relative record address, po is the purchase order number, and pr is a prime number. during the initial period the file size grew to about 8,000 records. because the acquisitions section was converting from its manual operation, the file continued to grow in size and the collision problem became pronounced. when the file reached about 70% capacity, that is, when 70% of the space allocated for the file was being occupied by records, this method became unusable; records were then located so far from their original addresses that terminal response times became degraded and batch process routines began to have significant increases in run times. with no additional space available to expand the size of the file, it became necessary to increase the record density within the existing file bounds. therefore an adaptation of the original algorithm was developed. in addition to generating the original number by dividing a prime number into the purchase order number and keeping the modulo remainder, the purchase order number was multiplied by 300 and divided by that same prime number to get an additional modulo remainder; the latter was added to the first modulo remainder and the sum was then divided by 2: rra = [(po modulo pr) + (300 × po modulo pr)] / 2. again this scheme brought some relief, but the file continued to grow as the system was implemented, and it became obvious that this procedure would also fail because of over-crowded areas in the file. a search of the literature, using w. b. climenson's chapter on file structure (2) as a start, provided some other methods for reducing the collision problem (1, 3, 4, 5, 6). several randomization or hashing schemes were examined. however, none of these methods appeared to be particularly pertinent to the set of conditions at washington state. in order to bring relief from the continuing problem of file and program maintenance involved with changing the file-mapping algorithm, research was initiated to devise an algorithm which would, independent of the input data, map records uniformly across the available file space. the algorithm which resulted utilizes a pseudo-random number generator, rand (7), developed at the w.s.u.
computing center (randl, program 360l-13.5.004, computing center library, washington state university, pullman, washington). the normal use of rand is to generate a sequence of uniformly distributed integers over the interval [1, m], where m is a specified upper bound in the interval [1, 2^31 - 1]. in addition to m, rand has a second input parameter: n, the last number generated by rand. given m and n, rand generates a result r. rand is used by the algorithm to generate relative disk addresses by setting m to the size or capacity of the file, by setting n to the purchase order number of the record to be located, and by using r as the relative address of the record: rra = rand (po, m). in order to test the effectiveness of this algorithm and others which might be devised, a file simulation program was written (bdamsim, program 360l-06.7.008, computing center library, washington state university, pullman, washington). inputs to this program are: a) an algorithm to generate relative record locations; b) a sequential file which contains the input data for "a"; and c) various scalar values such as file capacity, approximate number of records in the file, title of output, etc. the program analyzes the numbers generated by "a" operating on "b" within the constraints of "c". the outputs of the program are some statistical results and a graphical plot showing the cumulative distribution function of the generated addresses. figures 1, 2, and 3 show the plotted output of the three algorithms operating against the current acquisitions file; the abscissas of the plots are the relative record addresses (x 10^2). (the plots themselves are not reproduced here; their captions are: fig. 1, rra = po modulo pr; fig. 2, rra = [(po modulo pr) + (300 × po modulo pr)] / 2; fig. 3, rra = rand (po, pr).) fig. 1. kanji teletypewriter keyboard of the national diet library. included on this keyboard are: kanji, 2,006; kana, 90; western alphabets, 144; numerals, 20; symbols and marks, 210; kanji patterns, 40; kanji components, 139; space, 1; total, 2,650. by using shift keys on the upper left of the keyboard, kana in both styles and alphabets in upper and lower cases can be input. for satisfactory operation, the keyers must be professionally trained, and it is said that one to three months are necessary for them to be fully trained and able to input an average of fifty to sixty kanji per minute. this is not as fast as most other methods discussed.
japanese typewriter the second of the full keyboard approaches is the japanese typewriter method, which uses a modification of the standard japanese typewriter with a tray filled with kanji printing types. the operator finds a character in the tray and punches it by moving a metal handle as the type bar is punched down to print the character. this is rather primitive and different in its operation from the english typewriter, which uses the ten-finger touch method. there are four variations: character location method. kanji are arranged on a keyboard by their codes, so that when a key is punched, the kanji is typed on regular paper as if it had been done by a regular japanese typewriter. at the same time, the code is automatically read from the location of the key and is punched on tape. code-plate scanning method. each type bar has a plate attached on its side, and the code for the character is marked on its plate . when a key is typed, the kanji is printed on paper and the code from the plate is optically scanned at the same time. coded typeface method. each typeface is made with a character on the upper half and a code for it on the lower hale when a key is typed, both the character and code are printed. the code on the bottom half is optically scanned from the printed paper. modified coded typeface method. instead of typing both characters and codes on the paper, this method prints only the characters on the front of the paper and, at the same time, prints a bar code on the back of the paper. the machine capable of doing this is complicated. the size of the character on a typeface can be bigger than in the variation above, and the bar code can be larger to make the scanning of the code easier and more precise. as the discussion of the four variations indicates, the japanese typewriter offers the advantage of being able to monitor input at the time of keying. since the japanese typewriter has been in use for a long time in offices where a quantity of official documents are dealt with, and since ordinary japanese typists can use this system without any additional training, the use of equipment similar in operation was considered advantageous . however, it should be noted that japanese typewriters have never become as prevalent as english typewriters, and the demand for computers comes from more areas than just those where japanese typewriters are used . for this reason, the use of japanese typewriters is not as advantageous as its proponents claim . an obvious 12 journal of library automation vol. 14/1 march 1981 disadvantage is its slow speed of operation-thirty to fifty characters per minute on the average. another disadvantage is that the number of characters on the keyboard is limited to about 3,000. tablet style this method, also known as pen-touch method, was recently developed . each character has a key, and characters are arranged in a certain order. the location of the characters on a matrix sheet determines the two-byte binary code, which consists of a two-digit numerical abscissa and twodigit numerical ordinate . the operator touches the key with a penshaped detector and the code for the character is punched on the paper tape. the operation is one-handed, requiring only a light touch of the key by a detector. keys are on one flat keyboard and are color-coded by sections to make it easier for the operator to locate them. light touch operation reduces operator fatigue. this method does not require special training. 
however, the number of kanji on a keyboard of reasonable size is limited to approximately 3,500. by shifting, twice as many characters can be handled, though not all characters are indicated on the keyboard. speed of input is not very high: thirty to seventy characters per minute. this system, already used in many libraries, is becoming increasingly popular because of its easy operation. three different technologies are used (electromagnetic, electrostatic, and photoelectric), but there are no differences in the actual input operation among them. component pattern input although not a full keyboard method, component pattern input is closely related to these methods. the idea behind this approach is that most kanji are composed of one or more basic component units, two or more of which can be put together into one kanji according to one predetermined pattern out of forty general patterns. the inputting device has keys for those forty patterns along with keys for individual components on a special keyboard. to compose a kanji, a key for an appropriate pattern is selected and typed, and components are chosen to fill each individually numbered block of the selected pattern, following the established order. each pattern has a code, and so does each component. when a key is typed, the code is punched on a paper tape, as shown in figure 2. there are cases where a kanji with two components can itself be a component of another kanji, as shown in the first and second examples in figure 2. a kanji is constructed by punching at least three codes: one for a pattern and at least two for components. then a kanji dictionary consisting of several thousand master-code combinations (see figure 3) is stored in a magnetic drum, and the several codes composing a kanji, punched on paper or cassette tapes, are converted through this dictionary to a two-byte binary code assigned to that particular kanji. (fig. 2, component pattern input, and fig. 3, kanji dictionary, are not reproduced here.) these are then handled as other kanji with an individual code. though this can be a stand-alone approach to inputting kanji, the principle has been adopted by the national diet library to supplement the inputting of kanji on the full keyboard kanji teletypewriter. the national diet library uses this system when inputting kanji that are not included in its keyboard. instead of having a special separate keyboard, the kanji teletypewriter of the national diet library integrates patterns and components as equivalents to other characters; its keyboard includes forty patterns and approximately 140 components. this was the most elementary approach to computerizing kanji. conceived in the early developmental stage of kanji processing, it exploited one of the characteristics of kanji, their composition from several components.
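the conversion step just described, in which several punched codes per kanji are looked up in a master-code dictionary to yield a single two-byte code, is easy to picture in code. in the sketch below, the pattern codes, component codes, and dictionary entries are invented for illustration; they are not the national diet library's actual code assignments.

```python
# illustrative model of component pattern input: a kanji is keyed as one pattern
# code followed by two or more component codes, and a dictionary of master-code
# combinations converts that sequence into a single two-byte kanji code.
# all code values below are made up for the example.

KANJI_DICTIONARY = {
    # (pattern code, component codes...) -> two-byte kanji code
    ("28", "11", "45"): 0x3021,
    ("31", "11", "45", "60"): 0x3022,
}

def convert(punched_codes: list[str]) -> int:
    """convert one keyed sequence (pattern + components) to its kanji code."""
    key = tuple(punched_codes)
    if len(key) < 3:
        raise ValueError("a kanji needs at least one pattern code and two component codes")
    try:
        return KANJI_DICTIONARY[key]
    except KeyError:
        raise KeyError(f"no master-code combination for {key}") from None

print(hex(convert(["28", "11", "45"])))   # -> 0x3021
```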
in actual situations, this technique requires at least three key strokes for one kanji and consumes time to locate the needed component on the keyboard. furthermore, it requires the complicated extra step of putting input codes through a kanji dictionary to combine component codes into a code per kanji. no library is currently using this system by itself. kana keyboard system the keyboard of a japanese syllabary typewriter has adapted the conventional english typewriter keyboard and has standard roman alphabet keys that contain katakana in shift (figure 4). since the number of katakana exceeds that of roman letters, the katakana keys extend to the keys for numerals and punctuation marks. this means that this typewriter can be used either for kana or for roman letters by changing its mode. fig. 4. kana typewriter keyboard. two-key stroke method this variation of the kana keyboard system is referred to as the two-key stroke system, and uses kana as codes, not as letters. roman letters can be used as codes, too. there are two different subvariations. location correspondence. keys are divided into two sections: one for the right hand, and the other for the left hand. if two keys are to be stroked, there are four possible combinations of key strokes: (1) left hand twice, (2) left and right, (3) right and left, and (4) right twice. the keyboard is accompanied by a kanji table in which characters are arranged in several blocks and in a certain order within each block. each block, which contains twenty-six kanji in a four-by-six arrangement, is made according to one combination of strokes: the first block is left and left; the second block is left and right; and so on. within each block, the ordinate consists of keys for the first stroke and the abscissa of keys for the second, and the kanji at the intersection indicates which keys are to be typed. when kanji a is to be typed (see figure 5), since it is in block a, indicating the stroke combination left and left, the operator types keys a and w with the left hand. if kanji b is to be typed, the operator types key a with the left hand and key p with the right. each key has a byte code, and a combination of two key strokes makes a composite, two-byte binary code for a kanji. the bit may be changed by shifting, so that additional kanji beyond those in the basic table can be typed. (fig. 5, the kanji table for the location correspondence method, shows block a, for left-left, and block b, for left-right; it is not reproduced here.) association memory method. in this method, each kanji is given two kana which usually represent a reading of that kanji. the operator associates a kanji to be input with the two kana assigned to that kanji, and types them with two strokes using the kana keys. both of the key-stroke methods are economical as well as convenient because of the wide availability of kana typewriters. mainly for that reason, both of these systems have been well accepted and are expected to grow further. since this touch method does not require the operator to look for the character on the keyboard, it is the fastest to operate and is considered suitable for input in quantity. it is possible to input 60 to 120 characters per minute.
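the location correspondence variation maps a pair of keystrokes to a two-byte code by way of the block determined by which hands struck the keys and the position within that block. the sketch below models that lookup; the key layout, block assignments, shift-bit position, and resulting codes are invented for illustration, not an actual published table.

```python
# toy model of the two-key stroke "location correspondence" method: the hand
# combination of the two strokes selects a block of the kanji table, and the
# (first key, second key) position within that block selects one kanji code.
# layouts and codes below are illustrative only.

LEFT_KEYS = set("qwertasdfgzxcvb")
RIGHT_KEYS = set("yuiophjklnm")

# block tables: (first key, second key) -> two-byte code (made-up values)
BLOCKS = {
    ("left", "left"):   {("a", "w"): 0x3021, ("q", "z"): 0x3022},
    ("left", "right"):  {("a", "p"): 0x3041},
    ("right", "left"):  {},
    ("right", "right"): {},
}

def hand(key: str) -> str:
    if key in LEFT_KEYS:
        return "left"
    if key in RIGHT_KEYS:
        return "right"
    raise ValueError(f"unknown key {key!r}")

def two_stroke_code(first: str, second: str, shift: bool = False) -> int:
    """look up the two-byte code for a keystroke pair; shifting flips a high bit."""
    code = BLOCKS[(hand(first), hand(second))][(first, second)]
    return (code | 0x8000) if shift else code   # shift selects a second table of kanji

print(hex(two_stroke_code("a", "w")))               # block "left, left"
print(hex(two_stroke_code("a", "p", shift=True)))   # shifted variant
```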
the only drawback is that the operator must get acquainted with the arrangement of kanji in the first variation, and must memorize all the associated kana spelling for many kanji in case of the second variation. in either case, the operator must be professionally trained. the japan information center for science and technology, which indexes many scientific publications, employs a vendor who uses the location correspondence variation of this system for inputting information. display selection this also uses a kana typewriter with a screen in front . when a word is typed in kana, a group of kanji with that sound are displayed on the screen. the operator chooses the right kanji with a light pen-a slow but accurate operation. the operator does not have to be specially trained for this. kana-kanji conversion in contrast to the conventional approach of full keyboard inputting, an entirely new method for inputting kanji is gaining popularity as the 16 journal of library automation vol. 14/1 march 1981 availability of sophisticated software increases. this uses a kana typewriter keyboard to input japanese in syllabary or romanized form, converting them to kanji by software. there are two ways of conversion: one that converts word by word, and the other sentence by sentence. stenotype the stenotype is a typewriterlike device. the operator must be able to take shorthand. when the stenotype is used, it punches words in paper tapes. therefore, inputting is high speed. however, the operator must receive proper training. optical character recognition this system, developing quickly and expected to gain wider use, can scan a maximum of 2,500 printed kanji. 8 one variation connects a writing tablet to a computer so that as the operator writes kanji on the tablet, the computer scans them in stroke order. this function of scanning by the stroke order is considered to be an advantage for processing some types of japanese documents. the drawbacks are that the system is still very expensive, and the number of recognizable characters is fewer than 2,000. voice recognition this is an oral-visual system, in which the human voice is read by a computer. obviously the most difficult to develop, this system is still in an experimental stage . however, a prototype has been demonstrated at various exhibitions, and the system apparently possesses great potential. summary pattern configuration and output devices for japanese characters are basically the same as those for english. however, the pattern generation of characters is mechanically more complicated than that of the roman alphabet, because kanji has a more complicated structure than the roman alphabet and the number of components is greater. each kanji is represented by a two-byte binary code rather than one byte as in roman alphabet. because of this, the efficiency of retrieval is low. presently, hard copy and typesetting for printing of hard copy are the major output forms, and very little on-line retrieval of information with kanji is in current operation. problems particular to kanji processing among numerous problems in processing kanji through computers, major ones are: (1) which kanji are to be included; (2) how many characters are to be handled; (3) what code should be assigned and how it should be arranged on the keyboard or table; and (4) how the kanji not included on the keyboard should be treated. 
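the display-selection and kana-kanji conversion approaches described above both reduce to a dictionary lookup from a kana (or romanized) string to one or more candidate kanji spellings, with either the operator or the software choosing among the candidates. a minimal word-by-word sketch follows; the dictionary entries are toy examples, not real conversion software.

```python
# word-by-word kana-to-kanji conversion in miniature: each kana word is looked
# up in a dictionary of candidate spellings. taking the first candidate mimics
# naive software conversion, while returning the full candidate list mirrors
# display selection (the operator picks with a light pen). toy entries only.

CANDIDATES = {
    "としょかん": ["図書館"],               # toshokan: library
    "こうこう": ["高校", "孝行", "航行"],    # kookoo: several homophones
}

def convert_word(kana: str) -> tuple[str, list[str]]:
    """return (chosen spelling, all candidates) for one kana word."""
    options = CANDIDATES.get(kana, [kana])   # fall back to the kana itself
    return options[0], options

chosen, options = convert_word("こうこう")
print(chosen, options)   # software picks the first; an operator could pick another
```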
in the early stage of kanji computer development, different institutions handled the problems in ways best suited to their individual needs, according to the nature of the literature covered, the amount of literature processed, and the kinds of output needed. they experimented with the then best available capabilities. as a result, the finished systems are all independent and mutually incompatible. standardization is obviously necessary for exchange of information among the systems. in order to set standards for selection of characters and assignment of codes, jis (japan industrial standard) c6226-1978 has been compiled by the japan association for development of information processing. this is a table of characters designed for information exchange (a portion of which is shown in figure 6, "code of the japanese graphic character set for information interchange," not reproduced here). it has a one-byte code as its abscissa and another as its ordinate. characters are arranged so that the intersection of abscissa and ordinate determines a kanji whose code consists of four numerals, two from the abscissa and two from the ordinate. included in the table are kana in both styles, roman, greek, and cyrillic alphabets in upper and lower cases, diacritical marks, numerals, and punctuation marks, as follows: 1. special characters, 108; 2. numerals (arabic), 10; 3. roman alphabets, 52; 4. hiragana, 83; 5. katakana, 86; 6. greek alphabets, 48; 7. cyrillic alphabets, 66; 8. kanji, 6,349; total, 6,802. in the first section of the table, numerals, alphabets, kana, and special characters are grouped. in the second section, a total of 2,965 frequently used kanji are arranged as the first priority group, and an additional 3,384 kanji are selected as the second group in the bottom half of the table. kanji are printed in the preferred style for printing typeface. this table will resolve problems 1 to 3 mentioned above. institutions that had arranged their own codes for kanji, including the national institute of japanese literature, are now automatically translating their own codes into jis codes. in cases where needed kanji are not included on the keyboard, handling varies. with the japanese typewriter, because each kanji is inscribed on a typeface, only the kanji on that typeface is printed when the type bar is stroked; therefore, only kanji that have typefaces can be input in this system, while some other handling is possible in other methods. while the number of characters that can be accommodated on keyboards is limited to 2,000 to 3,500, depending on the type of equipment,
character generators have the capability of outputting more than the number of characters on the keyboard; figure 7 shows their relationship. characters that are in the generator but not on the keyboard must be frequently processed, because the number of characters needed for most documents could reach 6,000 to 6,500. using a shift key to enter another mode is a fairly common technique for inputting uncommon kanji. the keyboard may not have a character but, if the character generator has it, the code for that character can be input by shifting. for example, if a character on the keyboard has the code 0117, a bit is changed so that the code 8117 can be typed by shifting and typing that key. if the code 8117 is assigned to another kanji not on the keyboard but indexed in the dictionary, that kanji can be input. this applies for the kanji teletypewriter, the tablet style, and the two-key stroke variations of the kana typewriter. in the kanji teletypewriter system used by the national diet library, the keyboard accommodates 2,650 characters, while its character generator has the capability for 5,717. (fig. 7, kanji creating capability, diagrams the relationship among keyboard characters, character generator capability, system capability, and what lies outside system capability; it is not reproduced here.) operators in the national diet library input kanji that are not on the keyboard by using the component pattern input method. or, if the operator finds the kanji code in the specially compiled dictionary in which codes for kanji are indexed, a shift key is used to change the bit, thus creating the code for a kanji not on the keyboard. most other tablet systems use code dictionaries. in the two-key stroke variations of kana typewriters, tables of kanji for second, third, or more shifts can be built, especially when the location association method is used. the handling of kanji that are not in character generators is more difficult. only the digital character generator, the kind that uses either dots or strokes, can add characters fairly easily. in the flying spot system, characters can be added, but it must be done professionally with an additional character cylinder and is very costly. the national diet library, which now uses flying spot, limits the addition of kanji to a minimum. because its output is solely in printed book form, the national diet library inputs a fill character for kanji not in the system. when the phototypeset masters are made, the fill characters are replaced by typeset characters. the use of a fill character suffices only when the output is phototypeset, because there is a step to replace fill characters by typeface. however, as long as the data base includes many fill characters on the magnetic tapes, the on-line retrieval of information or later utilization of the tapes remains unsatisfactory. the national institute of japanese literature uses a dot matrix and prints by wire-dot impact.
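two ideas from the preceding paragraphs, a character code built from a row coordinate and a column coordinate in the jis c6226 table, and a shift key that flips a bit to reach characters beyond the keyboard, can be sketched as follows. the helper names, the sample coordinates, and the exact bit flipped are illustrative; the actual bit assignments of any particular terminal are not reproduced here.

```python
# illustrative sketch: a jis c6226-style code pairs a one-byte row (abscissa)
# value with a one-byte column (ordinate) value, conventionally read as four
# digits (two from each axis). a shift key can flip a high bit so the same
# physical key reaches a character that is not printed on the keyboard.
# the packing and the shift-bit position below are examples, not a spec.

def code_from_coordinates(row: int, col: int) -> int:
    """pack a (row, column) pair from the code table into one two-byte value."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("rows and columns in the table run from 1 to 94")
    return (row << 8) | col

def four_digit_form(row: int, col: int) -> str:
    """the 'four numerals, two from the abscissa and two from the ordinate' reading."""
    return f"{row:02d}{col:02d}"

def shifted(code: int) -> int:
    """flip a high bit, as in the 0117 -> 8117 example (reading the codes as hexadecimal)."""
    return code ^ 0x8000

code = code_from_coordinates(16, 2)
print(four_digit_form(16, 2), hex(code), hex(shifted(code)))
```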
in addition to the problems common to any character output, such as size and number of dots, the problem of the space for kanji in relation to other characters and the choice of vertical or horizontal printing of japanese sentences with kanji must be considered. kanji have many strokes and, as mentioned before, are expressed by two-byte codes. each kanji needs a double space when displayed on screens or printed. when a kanji is used with numerals or kana, the kanji part looks fine but the numerical part has too much space between each numeral. therefore, input of kanji is done in a kanji mode, and input of kana, roman letters, and numerals is done in a kana-numerical mode. in this way a multidigit figure looks like one whole figure rather than a line of one-digit figures. some formal documents must be printed in the traditional vertical arrangement. to cope with this situation, some line printers have the capability to precompose a vertical page before printing it. there are multicolor crts on the market that can be used for the retrieval of library-related information, e.g., main entry in red, series statement in yellow.

one last problem that must be considered is that most of these systems require trained operators, or else the operation is very slow. the information is edited and compiled by the editors and prepared for input in the form of worksheets. so are the revisions. at various stages of revising the text, the information must be printed, given to the editors, and revised. further developments in simplifying input and revising texts for efficient flow are to be expected.

application of kanji systems

processing of vernacular-language materials in their own writing systems is considered vital for research libraries in this country. in adopting kanji systems in such libraries, there are three major factors that must be considered: the objectives and needs of the institution, the cost, and the personnel.

first, the institution must know what it must accomplish by means of such a system. the needs may not be the same for all institutions. is the system for retrieving catalog information, or for inputting catalog and other information? is it for internal processing or patron use? is it for a large bibliographic utility to distribute information to its subscribers, or for an individual institution to process its own information? could the system be shared by the department of asian studies in any way? the character set needs of the institution are a major factor in choosing the system. since input and output devices are different, i.e., one cannot input kanji on a crt and retrieve kanji from the same crt, the institution must consider how much it will need to input, or whether it can rely on available data bases. some institutions may not need any input equipment if they utilize available data bases.
if japan marc and other tapes are made accessible by a large bibliographic utility in this country, the institutions will be able to obtain bibliographic information in kanji on the screen. if they want only catalog cards or a com catalog, they will not need any equipment except the terminals supported by the utility. if they want to input, they must consider what form or forms of output they need and how to create the characters not included in the system, in addition to which system to choose.

second, cost is an important factor. is the expense justified in terms of the other needs of the library? what can be accomplished per dollar spent? the kanji systems are still expensive, though the cost will eventually be reduced. how much can be spent and how much continuing support can be expected are factors that modify system expectations. the budget must include not only the one-time hardware cost, but also the software, maintenance, and personnel.

third, the availability of personnel will affect the choice of system. what degree of language expertise does the system require in each stage of operation, such as inputting, maintenance, and programming? does it need terminal operators trained in those languages? what other personnel does the system need as far as language-related qualifications are concerned?

apart from the three major factors discussed above, there are some technical aspects that must be adjusted to library situations in this country. since japanese, chinese, and korean use the same chinese ideographs to different degrees and in different ways, libraries considering automated processing of these language materials are probably expected to handle all three languages with the same system, to say nothing of the other non-roman scripts. problems will arise in selecting characters for inclusion in the system. as pointed out earlier with regard to japanese character processing, there are simply too many characters for the present capacity of any computer. if korean and chinese are to be handled by the same computer, this problem multiplies. the korean alphabet, called hangul, would have to be included. chinese has more characters than japanese. worse yet is the fact that some kanji are simplified in different ways in japan and china, so that they are neither recognizable nor interchangeable between the two. it will be an enormous task to accommodate both in the same system.

another problem is the arrangement and indexing of kanji. if a full keyboard, a japanese typewriter keyboard, or a two-key stroke system, especially its location association method on the kana typewriter, is considered for japanese, chinese, and korean, the arrangement of the characters must be indexed and accessed for all three languages, in addition to the multiple readings found in japanese. for example, kanji on the japanese keyboard are usually arranged by the initial sound of the japanese reading of the kanji. this arrangement will be useless for chinese and korean, because japanese readings are not the same as chinese or korean readings. the arrangement of kanji on the keyboards must be based on some new principle common to these languages. even if kana-kanji conversion is used, and roman alphabet-kanji conversion software is adopted, software to handle those three languages must be developed. such software would have to be highly sophisticated.
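to make the arrangement problem concrete, the fragment below sketches how one ideograph carries different readings in the three languages, so an index built on the japanese reading alone gives no useful access for chinese or korean users. the readings shown are standard dictionary readings, but the data structure and the sorting approach are purely illustrative.

```python
# illustrative sketch: the same ideograph has different readings in
# japanese, chinese, and korean, so a keyboard or index arranged by the
# japanese reading is useless for the other two languages.

readings = {
    "東": {"japanese": ["tō", "higashi"], "chinese": ["dōng"], "korean": ["dong"]},
    "山": {"japanese": ["san", "yama"],  "chinese": ["shān"], "korean": ["san"]},
}

def arrange_by_reading(language: str):
    """sort characters by their first reading in the chosen language."""
    return sorted(readings, key=lambda ch: readings[ch][language][0])

print(arrange_by_reading("japanese"))  # san before tō  -> ['山', '東']
print(arrange_by_reading("chinese"))   # dōng before shān -> ['東', '山']
```

the two calls return the characters in different orders, which is exactly why an arrangement tied to one language's readings cannot simply be reused for the others.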
the presence of many homonyms in chinese will cause a great problem to the extent that the system relies on transliterated or romanized forms of the language. recognition of the many identical spellings in different language contexts will be extremely difficult.

the above discussion is based on what is currently available in japan. the combination of existing inputting, generating, and outputting equipment developed by japanese technology opens up various possibilities for us to build effective systems in this country.

acknowledgment

this article is based on a study conducted in japan as a japan foundation professional fellow, and as a visiting research fellow of the center for research on information and library science, university of tokyo.

references

1. national institute of japanese literature, implementation of a computer system and a kanji handling system at nijl (tokyo: nijl, 1978), p.16.
2. toshio ishiwata, "kanji shori kenkyū ni motomerareru mono" [requirements for study on kanji processing], computopia no. 9 (1977), p.35.
3. gendai yōgo no kiso chishiki, 1980 [basic knowledge on current terms, 1980] (tokyo: jiyūkokuminsha, 1980), p.999.
4. figures are taken from the following two sources and compiled by the author: hasegawa, jitsurō, "kanji shori sōchi" [kanji processing devices], jōhō shori [information processing] 19, no. 4: 353 (april 1978); sugai, kazurō, "kanji nyū-shutsuryoku sōchi no kaihatsu dōkō" [a trend in development of kanji input-output devices], business communication 16, no. 7: 41 (1979).
5. used for the pattern input mentioned in the following component pattern input system.
6. national diet library, library automation in the national diet library (tokyo: the library, 1979), p.4.
7. ibid., p.7.
8. asia business consultants is using an optical character recognition system that can scan handwritten kana and numerals on a small scale to input and process catalog information for a library collection.
9. "jōhō kōkan no tame no kanji fugō no hyōjunka" [standardization of kanji code for information interchange], kagaku gijutsu bunken sābisu [scientific and technical documents service] no. 50 (1978), p.29.
10. ibid., p.28.

ichiko morita is assistant professor in library administration and head, automated processing division, the ohio state university libraries.

editor's notes

most jola readers are aware of significant delays in publication in the last volume. susan k. martin, a former editor of jola, and richard d. johnson, a former editor of college & research libraries, gave freely of their time and energy to bring the journal back on schedule. mary madden, judith schmidt, and the members of the editorial board under the leadership of charles husbands all worked closely with sue and richard in this effort. this was a second time around for sue, who undertook a similar task when she assumed the jola editorship in 1972. the jola readership and this editor owe debts of gratitude to sue, richard, and all the others who helped. we do not foresee major changes in the format of the journal as established principally under the editorships of kilgour and martin. we look for increased strength in our book reviews section under the editorship of david weisbrod. the addition of tom harnish as assistant editor for video technologies indicates our recognition of the growing importance of video-based information systems. we encourage reader suggestions.
w e welcome brief communications of successes or failures that might be of interest to other readers. letters to the editor about any of our feature articles or communications are solicited. president’s message: open access/open data colleen cuddy information technologies and libraries | march 2012 1 i am very excited to write this column. this issue of information technology and libraries (ital) marks the beginning of a new era for the journal. ital is now an open-access, electronic-only journal. there are many people to thank for this transition. the lita publications committee led by kristen antelman did a thorough analysis of publishing options and presented a thoughtful proposal to the lita board; the lita board had the foresight to push for an open-access journal even if it might mean a temporary revenue loss for the division; bob gerrity, ital editor, has enthusiastically supported this transition and did the heavy lifting to make it happen; and the lita office staff worked tirelessly for the past year to help shepherd this project. i am proud to be leading the organization during this time. to see ital go open access in my presidential year is extremely gratifying. as cliff lynch notes in his editorial, “the library profession has been slow to open up access to the publications of its own professional societies, to take advantage of the greater reach and impact that such policies can offer.” as librarians challenge publishers to pursue open-access venues, myself included, i am relieved to no longer be a hypocrite. by supporting open access we are sending a strong message to the community that we believe in the benefits of open access and we encourage other library organizations to do the same. ital will now reach a much broader and larger audience. this will benefit our authors, the organization, and the scholarship of our profession. i understand that while our members embrace open access, not everyone is pleased with an online-only journal. the number of new journals being offered electronically only is growing and i believe we are beginning to see a decline in the dual publishing model of publishers and societies offering both print and online journals. my library has been cutting back consistently on print copies of journals and this year will get only a handful of journals in print. personally, i have embraced the electronic publishing world. in fact, i held off on subscribing to the new yorker until it had an ipad subscription model! i estimate that i read 95 percent of my books and all of my professional journals electronically. the revolution has happened for me and for many others. i know that our membership will adapt and transition their ital reading habits to our new electronic edition and i look forward to seeing this column and the entire journal in its new format. colleen cuddy (colleen.cuddy@med.cornell.edu) is lita president 2011-12 and director of the samuel j. wood library and c. v. starr biomedical information center at weill cornell medical college, new york, new york. mailto:colleen.cuddy@med.cornell.edu president’s message | cuddy 2 earlier this week saw the research works act die. librarians and researchers across the country celebrated this victory as we preserved an important open-access mandate requiring the deposition of research articles funded by the national institutes of health into pubmed central. this act threatened not just research but the availability of health information to patients and their families. 
as librarians, we still need to be vigilant about preserving open access and supporting open-access initiatives. i would like to draw your attention to the federal research public access act (frpaa, hr 4004). this act was recently introduced in the house, with a companion bill in the senate. as described by the association of research libraries (http://www.arl.org/pp/access/frpaa-2012.shtml), frpaa would ensure free, timely, online access to the published results of research funded by eleven u.s. federal agencies. the bill gives individual agencies flexibility in choosing the location of the digital repository to house this content, as long as the repositories meet conditions for interoperability and public accessibility, and have provisions for long-term archiving. the legislation would extend and expand access to federally funded research resources and, importantly, spur and accelerate scientific discovery. notably, this bill does not take anything away from publishers. no publisher will be forced to publish research under the bill's provisions; any publisher can simply decline to publish the material if it feels the terms are too onerous. i encourage the library community to contact their representatives to support this bill.

open access and open data are the keystones of e-science and its goals of accelerating scientific discovery. i hope that many of you will join me at the lita president's program on june 24, 2012, in anaheim. tony hey, corporate vice president of microsoft research connections and former director of the u.k.'s e-science initiative, and clifford lynch, executive director of the coalition for networked information, will discuss data-intensive scientific discovery and its implications for libraries, drawing from the seminal work the fourth paradigm. librarians are beginning to explore our role in this new paradigm of providing access to and helping to manage data in addition to bibliographic resources. it is a timely topic and one in which librarians, due to our skill set, are poised to take a leadership role. reading the fourth paradigm was a real game changer for me. it is still extremely relevant. you might consider reading a chapter or two prior to the program. it is an open-access e-book available for download from microsoft research (http://research.microsoft.com/en-us/collaboration/fourthparadigm/). i keep a copy on my ipad, right there with downloaded ital article pdfs.

mary kurtz

dublin core, dspace, and a brief analysis of three university repositories

this paper provides an overview of dublin core (dc) and dspace together with an examination of the institutional repositories of three public research universities. the universities all use dc and dspace to create and manage their repositories. i drew a sampling of records from each repository and examined them for metadata quality using the criteria of completeness, accuracy, and consistency. i also examined the quality of records with reference to the methods of educating repository users. one repository used librarians to oversee the archiving process, while the other two employed two different strategies as part of the self-archiving process. the librarian-overseen archive had the most complete and accurate records for dspace entries.

the last quarter of the twentieth century has seen the birth, evolution, and explosive proliferation of a bewildering variety of new data types and formats.
digital text and images, audio and video files, spreadsheets, websites, interactive databases, rss feeds, streaming live video, computer programs, and macros are merely a few examples of the kinds of data that can be now found on the web and elsewhere. these new dataforms do not always conform to conventional cataloging formats. in an attempt to bring some sort of order from chaos, the concept of metadata (literally “data about data”) arose. metadata is, according to ala, “structured, encoded data that describe characteristics of informationbearing entities to aid in the identification, discovery, assessment, and management of the described entities.”1 metadata is an attempt to capture the contextual information surrounding a datum. the enriching contextual information assists the data user to understand how to use the original datum. metadata also attempts to bridge the semantic gap between machine users of data and human users of the same data. n dublin core dublin core (dc) is a metadata schema that arose from an invitational workshop sponsored by the online computer library center (oclc) in 1995. “dublin” refers to the location of this original meeting in dublin, ohio, and “core” refers to that fact dc is set of metadata elements that are basic, but expandable. dc draws upon concepts from many disciplines, including librarianship, computer science, and archival preservation. the standards and definitions of the dc element sets have been developed and refined by the dublin core metadata initiative (dcmi) with an eye to interoperability. dcmi maintains a website (http://dublincore.org/ documents/dces/) that hosts the current definitions of all the dc elements and their properties. dc is a set of fifteen basic elements plus three additional elements. all elements are both optional and repeatable. the basic dc elements are: 1. title 2. creator 3. subject 4. description 5. publisher 6. contributor 7. date 8. type 9. format 10. identifier 11. source 12. language 13. relation 14. coverage 15. rights the additional dc elements are: 16. audience 17. provenance 18. rights holder dc allows for element refinements (or subfields) that narrow the meaning of an element, making it more specific. the use of these refinements is not required. dc also allows for the addition of nonstandard elements for local use. n dspace dspace is an open-source software package that provides management tools for digital assets. it is frequently used to create and manage institutional repositories. first released in 2002, dspace is a joint development effort of hewlett packard (hp) labs and the massachusetts institute of technology (mit). today, dspace’s future mary kurtz (mhkurtz@gmail.com) is a june 2009 graduate of drexel university’s school of information technology. she also holds a bs in secondary education from the university of scranton and an ma in english from the university of illinois at urbana– champaign. currently, kurtz volunteers her time in technical services/cataloging at simms library at albuquerque academy and in corporate archives at lovelace respiratory research institute (www.lrri.org), where she is using dspace to manage a diverse collection of historical photographs and scientific publications. 
dc, dspace, and a brief analysis of three university repositories | kurtz 41 is guided by a loose grouping of interested developers called the dspace committers group, whose members currently include hp labs, mit, oclc, the university of cambridge, the university of edinburgh, the australian national university, and texas a&m university. dspace version 1.3 was released in 2005 and the newest version, dspace 1.5, was released in march 2008. more than one thousand institutions around the world use dspace, including public and private colleges and universities and a variety not-for-profit corporations. dc is at the heart of dspace. although dspace can be customized to a limited extent, the basic and qualified elements of dc and their refinements form dspace’s backbone.2 n how dspace works: a contributor’s perspective dspace is designed for use by “metadata naive” contributors. this is a conscious design choice made by its developers and in keeping with the philosophy of inclusion for institutional repositories. dspace was developed for use by a wide variety of contributors with a wide range of metadata and bibliographic skills. dspace simplifies the metadata markup process by using terminology that is different from dc standards and by automating the production of element fields and xml/html code. dspace has four hierarchical levels of users: users, contributors, community administrators, and network/ systems administrators. the user is a member of the general public who will retrieve information from the repository via browsing the database or conducting structured searches for specific information. the contributor is an individual who wishes to add their own work to the database. to become a contributor, one must be approved by a dspace community administrator and receive a password. a contributor may create, upload, and (depending upon the privileges bestowed upon him by his community administrator), edit or remove informational records. their editing and removal privileges are restricted to their own records. a community administrator has oversight within their specialized area of dspace and accordingly has more privileges within the system than a contributor. a community administrator may create, upload, edit, and remove records, but also can edit and remove all records available within the community’s area of the database. additionally, the community administrator has access to some metadata about the repository’s records that is not available to users and contributors and has the power to approve requests to become contributors and grant upload access to the database. lastly, the community administrator sets the rights policy for all materials included in the database and writes the statement of rights that every contributor must agree to with every record upload. the network/systems administrator is not involved with database content, focusing rather on software maintenance and code customization. when a dspace contributor wishes to create a new record, the software walks them through the process. dspace presents seven screens in sequence that ask for specific information to be entered via check buttons, fillin textboxes, and sliders. at the end of this process, the contributor must electronically sign an acceptance of the statement of rights. because dspace’s software attempts to simplify the metadata-creation process for contributors, its terminology is different from dc’s. dspace uses more common terms that are familiar to a wider variety of individuals. 
for example, dspace asks the contributor to list an “author” for the work, not a “creator” or a “contributor.” in fact, those terms appear nowhere in any dspace. instead, dspace takes the text entered in the author textbox and maps it to a dc element—something that has profound implications if the mapping does not follow expected dc definitions. likewise, dspace does not use “subject” when asking the contributor to describe their material. instead, dspace asks the contributor to list keywords. text entered into the keyword field is then mapped into the subject element. while this seems like a reasonable path, it does have some interesting implications for how the subject element is interpreted and used by contributors. dc’s metadata elements are all optional. this is not true in dspace. dspace has both mandatory and automatic elements in its records. because of this, data records created in dspace look different than data records created in dc. these mandatory, automatic, and default fields affect the fill frequency of certain dc elements—with all of these elements having 100 percent participation. in dspace, the title element is mandatory; that is, it is a required element. the software will not allow the contributor to proceed if the title text box is left empty. as a consequence, all dspace records will have 100 percent participation in the title element. dspace has seven automatic elements, that is, element fields that are created by the software without any need for contributor input. three are date elements, two are format elements, one is an identifier, and one is provenance. dspace automatically records the time of the each record’s creation in machine-readable form. when the record is uploaded into the database, this timestamp is entered into three element fields: dc.date.available, dc.date.accessioned, and dc.date.issued. therefore dspace records have 100 percent participation in the date element. for previously published materials, a separate screen asks for the original publication date, which is then 42 information technology and libraries | march 2010 placed in the dc.date.issued element. like title, the original date of publication is a mandatory field, and failure to enter a meaningful numerical date into the textbox will halt the creation of a record. in a similar manner, dspace “reads” the kind of file the contributor is uploading to the database. dspace automatically records the size and type (.doc, .jpg, .pdf, etc.) of the file or files. this data is automatically entered into dc.format.mimetype and dc.format.extent. like date, all dspace records will have 100 percent participation in the format element. likewise, dspace automatically assigns a location identifier when a record is uploaded to the database. this information is recorded as an uri and placed in the identifier element. all dspace records have a dc.identifier.uri field. the final automatic element is provenance. at the time of record creation, dspace records the identity of the contributor (derived from the sign-in identity and password) and places this information into a dc.provenance element field. this information becomes a permanent part of the dspace record; however, this field is a hidden to users. typically only community and network/systems administrators may view provenance information. still, like date, format, and identifier elements, dspace records have automatic 100 percent participation in provenance. 
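as an illustration of how a finished record combines the two kinds of metadata, the sketch below separates contributor-supplied fields from the fields dspace fills in automatically. the field names follow the qualified dc usage discussed in this article, but the record itself, its values, and its handle are invented for illustration.

```python
# illustrative sketch of a single item record, split into the fields a
# contributor supplies through the submission forms and the fields the
# repository software generates automatically. all values are invented.

contributor_supplied = {
    "dc.title": "campus sustainability report 2009",
    "dc.contributor.author": "doe, jane",           # mapped from the "author" textbox
    "dc.subject": ["sustainability", "recycling"],   # mapped from the keyword boxes
    "dc.description.abstract": "annual report on campus sustainability efforts.",
}

system_generated = {
    "dc.date.accessioned": "2009-06-01T15:04:22Z",   # timestamp at upload
    "dc.date.available": "2009-06-01T15:04:22Z",
    "dc.date.issued": "2009",                         # from the publication-date screen
    "dc.format.mimetype": "application/pdf",          # read from the uploaded file
    "dc.format.extent": "482133 bytes",
    "dc.identifier.uri": "http://hdl.handle.net/0000/1234",  # assigned at upload
    "dc.provenance": "submitted by jdoe (hidden from public users)",
}

record = {**contributor_supplied, **system_generated}
for field, value in record.items():
    print(field, value)
```

splitting the record this way makes the pattern described above easy to see: the automatic fields are always present and uniform, while the contributor-supplied fields are where completeness and consistency can vary.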
because of the design of dspace’s software, all dspace-created records will have a combination of both contributor-created and dspace-created metadata. all dspace records can be edited. during record creation, the contributor may at any time move backward through his record to alter information. once the record has been finished and the statement of rights signed, the completed record moves into the community administrator’s workflow. once the record has entered the workflow, the community administrator is able to view the record with all the metadata tags attached and make changes using dspace’s editing tools. however, depending on the local practices and the volume of records passing through the administrator’s workflow, the administrator may simply upload records without first reviewing them. a record may also be edited after it has been uploaded, with any changes being uploaded into the database at the end of editing process. in editing a record after it has been uploaded, the contributor, providing he has been granted the appropriate privileges, is able to see all the metadata elements that have attached to the record. calling up the editing tools at this point allows the contributor or administrator to make significant changes to the elements and their qualifiers, something that is not possible during the record’s creation. when using the editing tools, the simplified contributor interface disappears, and the metadata elements fields are labeled with their dc names. the contributor or administrator may remove metadata tags and the information they contain and add new ones selecting the appropriate metadata element and qualifier from a slider. for example, during the editing process, the contributor or administrator may choose to create dc.contributor. editor or dc.subject.lcsh options—something not possible during the record-creation process. in the examination of the dspace records from our three repositories, dspace’s shaping influence on element participation and metadata quality will be clearly seen. n the repositories dspace is principally used by academic and corporate nonprofit agencies to create and manage their institutional repositories. for this study, i selected three academic institutions that shared similar characteristics (large, public, research-based universities) but which had differing approaches to how they managed their metadata-quality issues. the university of new mexico (unm) dspace repository (dspaceunm) holds a wide-ranging set of records, including materials from the university’s faculty and administration, the law school, the anderson school of business administration, and the medical school, as well as materials from a number of tangentially related university entities like the western water policy review advisory commission, new mexico water trust board, and governor richardson’s task force on ethic reform. at the time of the initial research for this paper (spring 2008), dspaceunm provided little easily accessible on-site education for contributors about the dspace record-creation process. what was offered—a set of eight general information files—was buried deep inside the library community. a contributor would have to know the files existed to find them. by summer 2009, this had changed. dspaceunm had a new homepage layout. there is now a link to “help sheets and promotional materials” at the top center of the homepage. this link leads to the previously difficult-tofind help files. the content of the help files, however, remains largely unchanged. 
they discuss community creation, copyrights, administrative workflow for community creation, a list of supported formats, a statement of dspaceunm’s privacy policy, and a list of required, encouraged, and not required elements for each new record created. for the most part, dspaceunm help sheets do not attempt to educate the contributor in issues of metadata quality. there is no discussion of dc terminology, no attempts to refer the contributor to a thesaurus or controlled vocabulary list, nor any explanation of the record-creation or editing process. this lack of contributor education may be explained in part because dspaceunm requires all new records dc, dspace, and a brief analysis of three university repositories | kurtz 43 to be reviewed by a subject area librarian as part of the dspace community workflow. thus any contributor errors, in theory, ought to be caught and corrected before being uploaded to the database. the university of washington (uw) dspace repository (researchworks at the university of washington) hosts a narrower set of records than dspaceunm, with the materials limited to the those contributed by the university’s faculty, students, and staff, plus materials from the uw’s archives and uw’s school of public and community health. in 2008, researchworks was self-archiving. most contributors were expected to use dspace to create and upload their record. there is no indication in the publicly available information about the record creation workflow if record reviews were conducted before record upload. the help link on the researchworks homepage brought contributors to a set of screen-by-screen instructions on how to use dspace’s software to create and upload a record. the step-through did not include instructions on how to edit a record once it had been created. no explanation of the meanings or definitions of the various dc elements was included in the help files. there also were no suggestions about the use of a controlled vocabulary or a thesaurus for subject headings. by 2009, this link had disappeared and the associated contributor education materials with it. the knowledge bank at ohio state university(osu) is the third repository examined for this paper. osu’s repository hosts more than thirty communities, all of which are associated with various academic departments or special university programs. like researchworks at uw, osu’s repository appears to be self-archiving with no clear policy statement as to whether a record is reviewed before it is uploaded to the repository’s database. osu makes a strong effort to educate its contributors. on the upper-left of the knowledge bank homepage is a slider link that brings the contributor (or any user) to several important and useful sources of repository information: about knowledge bank, faqs, policies, video upload procedures, community set-up form, describing your resources, and knowledge bank licensing agreement. the existence and use of metadata in knowledge bank are explicitly mentioned in the faq and policies areas, together with an explanation of what metadata is and how metadata is used (faq), and a list of supported metadata elements (policies). the describe your resources section gives extended definitions of each dspace-available dc metadata element and provides examples of appropriate metadata-element use. knowledge bank provides the most comprehensive contributor education information of any of the three repositories examined. 
it does not use a controlled vocabulary list for subject headings, and it does not offer a thesaurus.

data and analysis

i chose twenty randomly selected full records from each repository. no more than one record was taken from any one collection, to gather a broad sampling from each repository. i examined each record for the quality of its metadata. metadata quality is a semantically slippery term. park, in the spring 2009 special metadata issue of cataloging and classification quarterly, suggested that the most commonly accepted criteria for metadata quality are completeness, accuracy, and consistency.3 those criteria will be applied in this analysis.

for the purpose of this paper, i define completeness as the fill rate for key metadata elements. because the purpose of metadata is to identify the record and to assist in the user's search process, the key elements are title, contributor/creator, subject, and description.abstract, all contributor-generated fields. i chose these elements because these are the fields that the dspace software uses when someone conducts an unrestricted search. table 1 shows the fill rate for the title element is 100 percent for all three repositories. this is to be expected because, as noted above, title is a mandatory field. the fill rate for contributor/creator is likewise high: 16 of 20 (80 percent) for unm, 19 of 20 (95 percent) for uw, and 19 of 20 (95 percent) for osu. (osu's fill rates for creator and contributor were summed because osu uses different definitions for the creator and contributor element fields than do unm or uw. this discrepancy will be discussed in greater depth in the discussion of consistency below.) the fill rate for subject was more variable. unm's subject fill rate was 100 percent, while uw's was 55 percent, and osu's was 40 percent. the fill rate for the description.abstract subfield was 12 of 20 (60 percent) at unm, 15 of 20 (75 percent) at uw, and 8 of 20 (40 percent) at osu. (see appendix a for a complete list of metadata elements and subfields used by each of the three repositories.) the relatively low fill rate (below 50 percent) at the osu knowledge bank in both subject and description.abstract suggests a lack of completeness in that repository's records.
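to make the completeness criterion concrete, the sketch below shows one way such fill rates can be tallied from a sample of records. it is illustrative only; the field names and sample data are hypothetical rather than drawn from the repositories studied here.

```python
# illustrative sketch: tally fill rates for key elements across a sample
# of records. field names and sample records are hypothetical.

KEY_ELEMENTS = ["title", "contributor/creator", "subject", "description.abstract"]

sample_records = [
    {"title": "campus water use report", "contributor/creator": "doe, jane",
     "subject": "water conservation", "description.abstract": "annual summary."},
    {"title": "task force minutes", "contributor/creator": "",
     "subject": "", "description.abstract": ""},
]

def fill_rates(records, elements):
    """return the percentage of records with a nonempty value for each element."""
    rates = {}
    for element in elements:
        filled = sum(1 for r in records if r.get(element, "").strip())
        rates[element] = 100.0 * filled / len(records)
    return rates

for element, rate in fill_rates(sample_records, KEY_ELEMENTS).items():
    print(f"{element}: {rate:.0f}%")
```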
accuracy in metadata quality is the essential "correctness" of a record. correctness issues in a record range from data-entry problems (typos, misspellings, and inconsistent date formats) to the correct application of metadata definitions and data overlaps.4 accuracy is perhaps the most difficult of the metadata quality criteria to judge. local practices vary widely, and dc allows for the creation of custom metadata tags for local use. additionally, there is long-standing debate and confusion about the definitions of metadata elements even among librarians and information professionals.5 because of this, only the most egregious accuracy errors were considered for this paper. all three repositories had at least one record that contained one or more inaccurate metadata fields; two of them had four or more inaccurate records.

inaccurate records included a wide variety of accuracy errors, including poor subject information (no matter how loosely one defines a subject heading, "the" is not an accurate descriptor); mutually contradictory metadata (a record contained two different language tags, although only one applied to the content); and one record in which the abstract was significantly longer than, and only tangentially related to, the file it described. additionally, records showed confusion over the contributor versus creator elements. in a few records, contributors entered duplicate information into both element fields. this observation supports park and childress's findings that there is widespread confusion over these elements.6 among the most problematic records in terms of accuracy were those contained in uw's early buddhist manuscripts project. this collection, which has been removed from public access since the original data was drawn for this paper, contained numerous ambiguous, contradictory, and inaccurate metadata elements.7 while contributor-generated subject headings were specifically not examined for this paper, it must be noted that there was wide variation in the level of detail and vocabulary used to describe records. no community within any of the repositories had specific rules for the generation of keyword descriptors for records, and the lack of guidance shows.

consistency can be defined as the homogeneity of formats, definitions, and use of dc elements within the records. this consistency, or uniformity, of data is important because it promotes basic semantic interoperability. consistency both inside the repository itself and with other repositories makes the repository easier to use and provides the user with higher quality information. all three repositories showed 100 percent consistency in dspace-generated elements. dspace's automated creation of date and format fields provided reliably consistent records in those element fields. dspace's automatic formatting of personal names in the dc.contributor.author and dc.creator fields also provided excellent internal consistency. however, the metadata elements were much less consistent for contributor-generated information. inconsistency within the subject element is where most problems occurred. personal names used as subject headings and capitalization within subject headings both proved to be particular issues. dspace alphabetizes subject headings according to the first letter of the free text entered in the keyword box. thus the same name entered in different formats (first name first or last name first) generates different subject-heading listings. the same is true for capitalization. any difference in capitalization of any word within the free-text entry generates a separate subject heading.

another field where consistency was an issue was dc.description.sponsorship. sponsorship is a problem because different communities, even different collections within the same community, use the field to hold different information. some collections used the sponsorship field to hold the name of a thesis or dissertation advisor. some collections used sponsorship to list the funding agency or underwriter for a project being documented inside the record. some collections used sponsorship to acknowledge the donation of the physical materials documented by the record. while all of these are valid uses of the field, they are not the same thing and do not hold the same meaning for the user.

table 1. metadata fields and their frequencies

element        univ. of n.m.   univ. of wash.   ohio state univ.
title          20              20               20
creator        0               0                16
subject        20              11               8
description    12              16               17
publisher      4               4                8
contributor    16              19               3
date           20              20               20
type           20              20               20
identifier     20              20               20
source         0               0                0
language       20              20               20
relation       3               1                6
coverage       2               0                0
rights         2               0                0
provenance     **              **               **

** provenance tags are not visible to public users.

the largest consistency issue, however, came from a comparison of repository policies regarding element use and definition. unaltered dspace software maps contributor-generated information entered into the author textbox during the record-creation process into the dc.contributor.author field. however, osu's dspace software has been altered so that the dc.contributor.author field does not exist. instead, text entered into the author textbox during the record-creation process maps to dc.creator. although both uses are correct, this choice does create a significant difference in element definitions. osu's dspace author fields are no longer congruent with other dspace author fields.

conclusions

dspace was created as a repository management tool. by streamlining the record-creation workflow and partially automating the creation of metadata, dspace's developers hoped to make institutional repositories more useful and functional while at the same time providing an improved experience for both users and contributors. in this, dspace has been partially successful. dspace has made it easier for the "metadata naive" contributor to create records. and, in some ways, dspace has improved the quality of repository metadata. its automatically generated fields ensure better consistency in those elements and subfields. its mandatory fields guarantee 100 percent fill rates in some elements, and this contributes to an increase in metadata completeness. however, dspace still relies heavily on contributor-generated data to fill most of the dc elements, and it is in these contributor-generated fields that most of the metadata quality issues arise. nonmandatory fields are skipped, leading to incomplete records. data entry errors, a lack of authority control over subject headings, and confusion over element definitions can lead to poor metadata accuracy. a lack of enforced, uniform naming and capitalization conventions leads to metadata inconsistency, as do localized and individual differences in the application of metadata element definitions. while most of the records examined in this small survey could be characterized as "acceptable" to "good," some are abysmal. to address the inconsistency of the dspace records, the three universities have tried differing approaches. only unm's required record review by a subject area librarian before upload seems to have made any significant impact on metadata quality. unm has a 100 percent fill rate for subject elements in its records, while uw and osu do not. this is not to say that unm's process is perfect and that poor records do not get into the system; they do (see appendix b for an example). but it appears that for now, the intermediary intervention of a librarian during the record-creation process is an improvement over self-archiving by contributors, even with education.

references and notes

1. association for library collections & technical services, committee on cataloging: description & access, task force on metadata, "final report," june 16, 2000, http://www.libraries.psu.edu/tas/jca/ccda/tf-meta6.html (accessed mar. 10, 2007).
2. a voluntary (and therefore less-than-complete) list of current dspace users can be found at http://www.dspace.org/index.php?option=com_content&task=view&id=596&itemid=180. further specific information about dspace, including technical specifications, training materials, licensing, and a user wiki, can be found at http://www.dspace.org/index.php?option=com_content&task=blogcategory&id=44&itemid=125.
3. jung-ran park, "metadata quality in digital repositories: a survey of the current state of the art," cataloging & classification quarterly 47, no. 3 (2009): 213–28.
4. sarah currier et al., "quality assurance for digital learning object repositories: issues for the metadata creation process," alt-j: research in learning technology 12, no. 1 (2004): 5–20.
5. jung-ran park and eric childress, "dc metadata semantics: an analysis of the perspectives of information professionals," journal of information science 20, no. 10 (2009): 1–13.
6. ibid.
7. for a fuller discussion of the collection's problems and challenges in using both dspace and dc, see kathleen forsythe et al., "university of washington early buddhist manuscripts project in dspace" (paper presented at dc-2003, seattle, wash., sept. 28–oct. 2, 2003), http://dc2003.ischool.washington.edu/archive-03/03forsythe.pdf (accessed mar. 10, 2007).

appendix a. a list of the most commonly used qualifiers in each repository

university of new mexico: dc.date.issued (20), dc.date.accessioned (20), dc.date.available (20), dc.format.mimetype (20), dc.format.extent (20), dc.identifier.uri (20), dc.contributor.author (15), dc.description.abstract (12), dc.identifier.citation (6), dc.description.sponsorship (4), dc.subject.mesh (2), dc.contributor.other (2), dc.description.sponsor (1), dc.date.created (1), dc.relation.isbasedon (1), dc.relation.ispartof (1), dc.coverage.temporal (1), dc.coverage.spatial (1), dc.contributor.other (1)

university of washington: dc.date.accessioned (20), dc.date.available (20), dc.date.issued (20), dc.format.mimetype (20), dc.format.extent (20), dc.identifier.uri (20), dc.contributor.author (18), dc.description.abstract (15), dc.identifier.citation (4), dc.identifier.issn (4), dc.description.sponsorship (1), dc.contributor.corporateauthor (1), dc.contributor.illustrator (1), dc.relation.ispartof (1)

ohio state university: dc.date.issued (20), dc.date.available (20), dc.date.accessioned (20), dc.format.mimetype (20), dc.format.extent (20), dc.identifier.uri (20), dc.description.abstract (8), dc.identifier.citation (4), dc.subject.lcsh (4), dc.relation.ispartof (4), dc.description.sponsorship (3), dc.identifier.other (2), dc.contributor.editor (2), dc.contributor.advisor (1), dc.identifier.issn (1), dc.description.duration (1), dc.relation.isformatof (1), dc.description.statementofresponsibility (1), dc.description.tableofcontents (1)

appendix b. sample record

dc.identifier.uri http://hdl.handle.net/1928/3571
dc.description.abstract president schmidly's charge for the creation of a north golf course community advisory board.
dc.format.extent 17301 bytes
dc.format.mimetype application/pdf
dc.language.iso en_us
dc.subject president
dc.subject schmidly
dc.subject north
dc.subject golf
dc.subject course
dc.subject community
dc.subject advisory
dc.subject board
dc.subject charge
dc.title community_advisory_board_charge
dc.type other
| zhou 151 are your digital documents web friendly?: making scanned documents web accessible the internet has greatly changed how library users search and use library resources. many of them prefer resources available in electronic format over traditional print materials. while many documents are now born digital, many more are only accessible in print and need to be digitized. this paper focuses on how the colorado state university libraries creates and optimizes text-based and digitized pdf documents for easy access, downloading, and printing. t o digitize print materials, we normally scan originals, save them in archival digital formats, and then make them webaccessible. there are two types of print documents, graphic-based and text-based. if we apply the same techniques to digitize these two different types of materials, the documents produced will not be web-friendly. graphic-based materials include archival resources such as historical photographs, drawings, manuscripts, maps, slides, and posters. we normally scan them in color at a very high resolution to capture and present a reproduction that is as faithful to the original as possible. then we save the scanned images in tiff (tagged image file format) for archival purposes and convert the tiffs to jpeg (joint photographic experts group) 2000 or jpeg for web access. however, the same practice is not suitable for modern text-based documents, such as reports, journal articles, meeting minutes, and theses and dissertations. many old text-based documents (e.g., aged newspapers and books), should be yongli zhoututorial files for fast web delivery as access files. for text-based files, access files normally are pdfs that are converted from scanned images. “bcr’s cdp digital imaging best practices version 2.0” says that the master image should be the highest quality you can afford, it should not be edited or processed for any specific output, and it should be uncompressed.1 this statement applies to archival images, such as photographs, manuscripts, and other image-based materials. if we adopt the same approach for modern text documents, the result may be problematic. pdfs that are created from such master files may have the following drawbacks: ■■ because of their large file size, they require a long download time or cannot be downloaded because of a timeout error. ■■ they may crash a user’s computer because they use more memory while viewing. ■■ they sometimes cannot be printed because of insufficient printer memory. ■■ poor print and on-screen viewing qualities can be caused by background noise and bleedthrough of text. background noise can be caused by stains, highlighter marks made by users, and yellowed paper from aged documents. ■■ the ocr process sometimes does not work for high-resolution images. ■■ content creators need to spend more time scanning images at a high resolution and converting them to pdf documents. web-friendly files should be small, accessible by most users, full-text searchable, and have good treated as graphic-based material. these documents often have faded text, unusual fonts, stains, and colored background. if they are scanned using the same practice as modern text documents, the document created can be unreadable and contain incorrect information. this topic is covered in the section “full-text searchable pdfs and troubleshooting ocr errors.” currently, pdf is the file format used for most digitized text documents. 
while pdfs that are created from high-resolution color images may be of excellent quality, they can have many drawbacks. for example, a multipage pdf may have a large file size, which increases download time and the memory required while viewing. sometimes the download takes so long it fails because a time-out error occurs. printers may have insufficient memory to print large documents. in addition, the optical character recognition (ocr) process is not accurate for high resolution images in either color or grayscale. as we know, users want the ability to easily download, view, print, and search online textual documents. all of the drawbacks created by high-quality scanning defeat one of the most important purposes of digitizing text-based documents: making them accessible to more users. this paper addresses how colorado state university libraries (csul) manages these problems and others as staff create web-friendly digitized textual documents. topics include scanning, long-time archiving, full-text searchable pdfs and troubleshooting ocr problems, and optimizing pdf files for web delivery. preservation master files and access files for digitization projects, we normally refer to images in uncompressed tiff format as master files and compressed yongli zhou is digital repositories librarian, colorado state university libraries, colorado state university, fort collins, colorado 152 information technology and libraries | september 2010152 information technology and libraries | september 2010 factors that determine pdf file size. color images typically generate the largest pdfs and black-and-white images generate the smallest pdfs. interestingly, an image of smaller file size does not necessarily generate a smaller pdf. table 1 shows how file format and color mode affect pdf file size. the source file is a page containing black-and-white text and line art drawings. its physical dimensions are 8.047" by 10.893". all images were scanned at 300 dpi. csul uses adobe acrobat professional to create pdfs from scanned images. the current version we use is adobe acrobat 9 professional, but most of its features listed in this paper are available for other acrobat versions. when acrobat converts tiff images to a pdf, it compresses images. therefore a final pdf has a smaller file size than the total size of the original images. acrobat compresses tiff uncompressed, lzw, and zip the same amount and produces pdfs of the same file size. because our in-house scanning software does not support tiff g4, we did not include tiff g4 test data here. by comparing similar pages, we concluded that tiff g4 works the same as tiff uncompressed, lzw, and zip. for example, if we scan a text-based page as blackand-white and save it separately in tiff uncompressed, lzw, zip, or g4, then convert each page into a pdf, the final pdf will have the same file size without a noticeable quality difference. tiff jpeg generates the smallest pdf, but it is a lossy format, so it is not recommended. both jpeg and jpeg 2000 have smaller file sizes but generate larger pdfs than those converted from tiff images. recommendations 1. use tiff uncompressed or lzw in 24 bits color for pages with color graphs or for historical documents. 2. use tiff uncompressed or lzw compress an image up to 50 percent. some vendors hesitate to use this format because it was proprietary; however, the patent expired on june 20, 2003. this format has been widely adopted by much software and is safe to use. csul saves all scanned text documents in this format. 
■■ tiff zip: this is a lossless compression. like lzw, zip compression is most effective for images that contain large areas of single color. 2 ■■ tiff jpeg: this is a jpeg file stored inside a tiff tag. it is a lossy compression, so csul does not use this file format. other image formats: ■■ jpeg: this format is a lossy compression and can only be used for nonarchival purposes. a jpeg image can be converted to pdf or embedded in a pdf. however, a pdf created from jpeg images has a much larger file size compared to a pdf created from tiff images. ■■ jpeg 2000: this format’s file extension is .jp2. this format offers superior compression performance and other advantages. jpeg 2000 normally is used for archival photographs, not for text-based documents. in short, scanned images should be saved as tiff files, either with compression or without. we recommend saving text-only pages and pages containing text and/or line art as tiff g4 or tiff lzw. we also recommend saving pages with photographs and illustrations as tiff lzw. we also recommend saving pages with photographs and illustrations as tiff uncompressed or tiff lzw. how image format and color mode affect pdf file size color mode and file format are two on-screen viewing and print qualities. in the following sections, we will discuss how to make scanned documents web-friendly. scanning there are three main factors that affect the quality and file size of a digitized document: file format, color mode, and resolution of the source images. these factors should be kept in mind when scanning text documents. file format and compression most digitized documents are scanned and saved as tiff files. however, there are many different formats of tiff. which one is appropriate for your project? ■■ tiff: uncompressed format. this is a standard format for scanned images. however, an uncompressed tiff file has the largest file size and requires more space to store. ■■ tiff g3: tiff with g3 compression is the universal standard for faxs and multipage line-art documents. it is used for blackand-white documents only. ■■ tiff g4: tiff with g4 compression has been approved as a lossless archival file format for bitonal images. tiff images saved in this compression have the smallest file size. it is a standard file format used by many commercial scanning vendors. it should only be used for pages with text or line art. many scanning programs do not provide this file format by default. ■■ tiff huffmann: a method for compressing bi-level data based on the ccitt group 3 1d facsimile compression schema. ■■ tiff lzw: this format uses a lossless compression that does not discard details from images. it may be used for bitonal, grayscale, and color images. it may the next generation library catalog | zhou 153are your digital documents web friendly? | zhou 153 to be scanned at no less than 600 dpi in color. our experiments show that documents scanned at 300 or 400 dpi are sufficient for creating pdfs of good quality. resolutions lower than 300 dpi are not recommended because they can degrade image quality and produce more ocr errors. resolutions higher than 400 dpi also are not recommended because they generate large files with little improved on-screen viewing and print quality. we compared pdf files that were converted from images of resolutions at 300, 400, and 600 dpi. viewed at 100 percent, the difference in image quality both on screen and in print was negligible. 
if a page has text with very small font, it can be scanned at a higher resolution to improve ocr accuracy and viewing and print quality. table 2 shows that high-resolution images produce large files and require more time to be converted into pdfs. the time required to combine images is not significantly different compared to scanning time and ocr time, so it was omitted. our example is a modern text document with text and a black-and-white chart. most of our digitization projects do not require scanning at 600 dpi; 300 dpi is the minimum requirement. we use 400 dpi for most documents and choose a proper color mode for each page. for example, we scan our theses and dissertations in black-andwhite at 400 dpi for bitonal pages. we scan pages containing photographs or illustrations in 8-bit grayscale or 24-bit color at 400 dpi. other factors that affect pdf file size in addition to the three main factors we have discussed, unnecessary edges, bleed-through of text and graphs, background noise, and blank pages also increase pdf file sizes. figure 1 shows how a clean scan can largely reduce a pdf file size and cover. the updated file has a file size of 42.8 mb. the example can be accessed at http://hdl.handle .net/10217/3667. sometimes we scan a page containing text and photographs or illustrations twice, in color or grayscale and in black-and-white. when we create a pdf, we combine two images of the same page to reproduce the original appearance and to reduce file size. how to optimize pdfs using multiple scans will be discussed in a later section. how image resolution affects pdf file size before we start scanning, we check with our project manager regarding project standards. for some funded projects, documents are required in grayscale 8 bits for pages with black-and-white photographs or grayscale illustrations. 3. use tiff uncompressed, lzw, or g4 in black-and-white for pages containing text or line art. to achieve the best result, each page should be scanned accordingly. for example, we had a document with a color cover, 790 pages containing text and line art, and 7 blank pages. we scanned the original document in color at 300 dpi. the pdf created from these images was 384 mb, so large that it exceeded the maximum file size that our repository software allows for uploading. to optimize the document, we deleted all blank pages, converted the 790 pages with text and line art from color to blackand-white, and retained the color table 1. file format and color mode versus pdf file size file format scan specifications tiff size (kb) pdf size (kb) tiff color 24 bits 23,141 900 tiff lzw color 24 bits 5,773 900 tiff zip color 24 bits 4,892 900 tiff jpeg color 24 bits 4,854 873 jpeg 2000 color 24 bits 5,361 5,366 jpeg color 24 bits 4,849 5,066 tiff grayscale 8 bits 7,729 825 tiff lzw grayscale 8 bits 2,250 825 tiff zip grayscale 8 bits 1,832 825 tiff jpeg grayscale 8 bits 2,902 804 jpeg 2000 grayscale 8 bits 2,266 2,270 jpeg grayscale 8 bits 2,886 3,158 tiff black-and-white 994 116 tiff lzw black-and-white 242 116 tiff zip black-and-white 196 116 note: black-and-white scans cannot be saved in jpeg, jpeg 2000, or tiff jpeg formats. 154 information technology and libraries | september 2010154 information technology and libraries | september 2010 many pdf files cannot be saved as pdf/a files. if an error occurs when saving a pdf to pdf/a, you may use adobe acrobat preflight (advanced > preflight) to identify problems. see figure 2. 
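the clean-up applied to the 790-page document above — dropping blank pages and converting text-only pages from color to black-and-white — is the kind of step that can be scripted. below is a minimal sketch using pillow; the directory names and the 0.5 percent ink threshold are assumptions, and borderline pages would still need human review before conversion.

```python
# minimal sketch: convert color text pages to bitonal tiff g4 and skip
# nearly blank pages. paths and the blank threshold are assumptions.
import pathlib
from PIL import Image

src_dir = pathlib.Path("scans_color")
dst_dir = pathlib.Path("scans_bw")
dst_dir.mkdir(exist_ok=True)

BLANK_THRESHOLD = 0.005  # pages with under 0.5% dark pixels are treated as blank

for tif in sorted(src_dir.glob("*.tif")):
    bw = Image.open(tif).convert("1")   # bitonal: appropriate for text/line art only
    dark = bw.histogram()[0]            # count of black pixels in a 1-bit image
    if dark / (bw.width * bw.height) < BLANK_THRESHOLD:
        print(f"skipping blank page {tif.name}")
        continue
    bw.save(dst_dir / tif.name, format="TIFF", compression="group4")
```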
errors can be created by nonembedded fonts, embedded images with unsupported file compression, bookmarks, embedded video and audio, etc. by default, the reduce file size procedure in acrobat professional compresses color images using jpeg 2000 compression. after running the reduce file size procedure, a pdf may not be saved as a pdf/a because of a “jpeg 2000 compression used” error. according to the pdf/a competence center, this problem will be eliminated in the second part of the pdf/a standard— pdf/a-2 is planned for 2008/2009. there are many other features in new pdfs; for example, transparency and layers will be allowed in pdf/a2.5 however, at the time this paper was written pdf/a-2 had not been announced.6 portable, which means the file created on one computer can be viewed with an acrobat viewer on other computers, handheld devices, and on other platforms.3 a pdf/a document is basically a traditional pdf document that fulfills precisely defined specifications. the pdf/a standard aims to enable the creation of pdf documents whose visual appearance will remain the same over the course of time. these files should be software-independent and unrestricted by the systems used to create, store, and reproduce them.4 the goal of pdf/a is for long-term archiving. a pdf/a document has the same file extension as a regular pdf file and must be at least compatible with acrobat reader 4. there are many ways to create a pdf/a document. you can convert existing images and pdf files to pdf/a files, export a document to pdf/a format, scan to pdf/a, to name a few. there are many software programs you can use to create pdf/a, such as adobe acrobat professional 8 and later versions, compart ag, pdflib, and pdf tools ag. simultaneously improve its viewing and print quality. recommendations 1. unnecessary edges: crop out. 2. bleed-through text or graphs: place a piece of white or black card stock on the back of a page. if a page is single sided, use white card stock. if a page is double sided, use black card stock and increase contrast ratio when scanning. often color or grayscale images have bleedthrough problems. scanning a page containing text or line art as black-and-white will eliminate bleed-through text and graphs. 3. background noise: scanning a page containing text or line art as black-and-white can eliminate background noise. many aged documents have yellowed papers. if we scan them as color or grayscale, the result will be images with yellow or gray background, which may increase pdf file sizes greatly. we also recommend increasing the contrast for better ocr results when scanning documents with background colors. 4. blank pages: do not include if they are not required. blank pages scanned in grayscale or color can quickly increase file size. pdf and longterm archiving pdf/a pdf vs. pdf/a pdf, short for portable document format, was developed by adobe as a unique format to be viewed through adobe acrobat viewers. as the name implies, it is table 2. color mode and image resolution vs. pdf file size color mode resolution (dpi) scanning time (sec.) ocr time (sec.) tiff lzw (kb) pdf size (kb) color 600 100 n/a* 16,498 2,391 color 400 25 35 7,603 1,491 color 300 18 16 5,763 952 grayscale 600 36 33 6,097 2,220 grayscale 400 18 18 2,888 1370 grayscale 300 14 12 2,240 875 b/w 600 12 18 559 325 b/w 400 10 10 333 235 b/w 300 8 9 232 140 *n/a due to an ocr error the next generation library catalog | zhou 155are your digital documents web friendly? | zhou 155 able. 
this option keeps the original image and places an invisible text layer over it. recommended for cases requiring maximum fidelity to the original image.8 this is the only option used by csul. 2. searchable image: ensures that text is searchable and selectable. this option keeps the original image, de-skews it as needed, and places an invisible text layer over it. the selection for downsample images in this same dialog box determines whether the image is downsampled and to what extent.9 the downsampling combines several pixels in an image to make a single larger pixel; thus some information is deleted from the image. however, downsampling does not affect the quality of text or line art. when a proper setting is used, the size of a pdf can be significantly reduced with little or no loss of detail and precision. 3. clearscan: synthesizes a new type 3 font that closely approximates the original, and preserves the page background using a low-resolution copy.10 the final pdf is the same as a born-digital pdf. because acrobat cannot guarantee the accuracy of manipulate the pdf document for accessibility. once ocr is properly applied to the scanned files, however, the image becomes searchable text with selectable graphics, and one may apply other accessibility features to the document.7 acrobat professional provides three ocr options: 1. searchable image (exact): ensures that text is searchable and selectfull-text searchable pdfs and troubleshooting ocr errors a pdf created from a scanned piece of paper is inherently inaccessible because the content of the document is an image, not searchable text. assistive technology cannot read or extract the words, users cannot select or edit the text, and one cannot figure 1. pdfs converted from different images: (a) the original pdf converted from a grayscale image and with unnecessary edges; (b) updated pdf converted from a blackand-white image and with edges cropped out; (c) screen viewed at 100 percent of the pdf in grayscale; and (d) screen viewed at 100 percent of the pdf in black-and-white. dimensions: 9.127” x 11.455” color mode: grayscale resolution: 600 dpi tiff lzw: 12.7 mb pdf: 1,051 kb dimensions: 8” x 10.4” color mode: black-and-white resolution: 400 dpi tiff lzw: 153 kb pdf: 61 kb figure 2. example of adobe acrobat 9 preflight 156 information technology and libraries | september 2010156 information technology and libraries | september 2010 but at least users can read all text, while the black-and-white scan contains unreadable words. troubleshoot ocr error 3: cannot ocr image based text the search of a digitized pdf is actually performed on its invisible text layer. the automated ocr process inevitably produces some incorrectly recognized words. for example, acrobat cannot recognize the colorado state university logo correctly (see figure 6). unfortunately, acrobat does not provide a function to edit a pdf file’s invisible text layer. to manually edit or add ocr’d text, adobe acrobat capture 3.0 (see figure 7) must be purchased. however, our tests show that capture 3.0 has many drawbacks. this software is complicated and produces it’s own errors. sometimes it consolidates words; other times it breaks them up. in addition, it is time-consuming to add or modify invisible text layers using acrobat capture 3.0. at csul, we manually add searchable text for title and abstract pages only if they cannot be ocr’d by acrobat correctly. 
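the workflow above relies on acrobat's searchable image (exact) option. purely as a hedged alternative for readers without acrobat, the open-source tesseract engine can produce the same kind of image-plus-invisible-text-layer pdf; this sketch assumes the pytesseract wrapper and a tesseract install, it is not part of the csul process described here, and the ocr accuracy caveats discussed above apply equally.

```python
# minimal sketch, not the workflow described in the article: tesseract can
# also produce an image-plus-invisible-text-layer pdf, similar in spirit to
# acrobat's "searchable image (exact)". file names are illustrative.
from PIL import Image
import pytesseract

page = Image.open("title_page.tif")
pdf_bytes = pytesseract.image_to_pdf_or_hocr(page, extension="pdf")

with open("title_page_searchable.pdf", "wb") as f:
    f.write(pdf_bytes)
```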
the example in troubleshoot ocr error 2: could not perform recognition (ocr) sometimes acrobat generates an “outside of the allowed specifications” error when processing ocr. this error is normally caused by color images scanned at 600 dpi or more. in the example in figure 4, the page only contains text but was scanned in color at 600 dpi. when we scanned this page as blackand-white at 400 dpi, we did not encounter this problem. we could also use a lower-resolution color scan to avoid this error. our experiments also show that images scanned in black-and-white work best for the ocr process. in this article we mainly discuss running the ocr process on modern textual documents. black-and-white scans do not work well for historical textual documents or aged newspapers. these documents may have faded text and background noise. when they are scanned as blackand-white, broken letters may occur, and some text might become unreadable. for this reason they should be scanned in color or grayscale. in figure 5, images scanned in color might not produce accurate ocr results, ocred text at 100 percent, this option is not acceptable for us. for a tutorial on to how to make a full-text searchable pdf, please see appendix a. troubleshoot ocr error 1: acrobat crashes occasionally acrobat crashes during the ocr process. the error message does not indicate what causes the crash and where the problem occurs. fortunately, the page number of the error can be found on the top shortcuts menu. in figure 3, we can see the error occurs on page 7. we discovered that errors are often caused by figures or diagrams. for a problem like this, the solution is to skip the error-causing page when running the ocr process. our initial research was performed on acrobat 8 professional. our recent study shows that this problem has been significantly improved in acrobat 9 professional. figure 3. adobe acrobat 8 professional crash window figure 4. “could not perform recognition (ocr)” error figure 5. an aged newspaper scanned in color and black-and-white aged newspaper scanned in color aged newspaper scanned in black-and-white the next generation library catalog | zhou 157are your digital documents web friendly? | zhou 157 a very light yellow background. the undesirable marks and background contribute to its large file size and create ink waste when printed. method 2: running acrobat’s built-in optimization processes acrobat provides three built-in processes to reduce file size. by default, acrobat use jpeg compression for color and grayscale images and ccitt group 4 compression for bitonal images. optimize scanned pdf open a scanned pdf and select documents > optimize scanned pdf. a number of settings, such as image quality and background removal, can be specified in the optimize scanned pdf dialog box. our experiments show this process can noticably degrade images and sometimes even increase file size. therefore we do not use this option. reduce file size open a scanned pdf and select documents > reduce file size. the reduce file size command resamples and recompresses images, removes embedded base-14 fonts, and subset-embeds fonts that were left embedded. it also compresses document structure and cleans up elements such as invalid bookmarks. if the file size is already as small as possible, this command has no effect.11 after process, some files cannot be saved as pdf/a, as we discussed in a previous section. 
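acrobat's reduce file size and pdf optimizer commands are the routes used above; as a hedged aside, the open-source ghostscript interpreter offers comparable recompression and image downsampling from the command line. the following is a minimal sketch, assuming the gs binary is installed and on the path; the file names are placeholders.

```python
# minimal sketch: recompress and downsample a finished pdf with ghostscript
# instead of acrobat's built-in optimizer. input/output names are assumptions.
import subprocess

subprocess.run([
    "gs",
    "-sDEVICE=pdfwrite",
    "-dCompatibilityLevel=1.4",   # keep the output readable by older viewers
    "-dPDFSETTINGS=/ebook",       # preset: moderate image downsampling and quality
    "-dNOPAUSE", "-dBATCH",
    "-sOutputFile=optimized.pdf",
    "scanned.pdf",
], check=True)
```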
we also noticed that different versions of acrobat can create files of different file sizes even if the same settings were used. pdf optimizer open a scanned pdf and select advanced > pdf optimizer. many settings can be specified in the pdf optimizer dialog box. for example, we can downsample images from sections, we can greatly reduce a pdf’s size by using an appropriate color mode and resolution. figure 9 shows two different versions of a digitized document. the source document has a color cover and 111 bitonal pages. the original pdf, shown in figure 9 on the left, was created by another university department. it was not scanned according to standards and procedures adopted by csul. it was scanned in color at 300 dpi and has a file size of 66,265 kb. we exported the original pdf as tiff images, batch-converted color tiff images to black-and-white tiff images, and then created a new pdf using blackand-white tiff images. the updated pdf has a file size of 8,842 kb. the image on the right is much cleaner and has better print quality. the file on the left has unwanted marks and figure 8 is a book title page for which we used acrobat capture 3.0 to manually add searchable text. the entire book may be accessed at http://hdl .handle.net/10217/1553. optimizing pdfs for web delivery a digitized pdf file with 400 color pages may be as large as 200 to 400 mb. most of the time, optimizing processes may reduce files this large without a noticeable difference in quality. in some cases, quality may be improved. we will discuss three optimization methods we use. method 1: using an appropriate color mode and resolution as we have discussed in previous ~do university original logo text ocred by acrobat figure 6. incorrectly recognized text sample figure 7. adobe acrobat capture interface figure 8. image-based text sample 158 information technology and libraries | september 2010158 information technology and libraries | september 2010 grayscale. a pdf may contain pages that were scanned with different color modes and resolutions. a pdf may also have pages of mixed resolutions. one page may contain both bitonal images and color or grayscale images, but they must be of the same resolution. the following strategies were adopted by csul: 1. combine bitmap, grayscale, and color images. we use grayscale images for pages that contain grayscale graphs, such as black-and-white photos, color images for pages that contain color images, and bitmap images for text-only or text and line art pages. 2. if a page contains high-definition color or grayscale images, scan that page in a higher resolution and scan other pages at 400 dpi. 3. if a page contains a very small font and the ocr process does not work well, scan it at a higher resolution and the rest of document at 400 dpi. 4. if a page has both text, color, or grayscale graphs, we scan it twice. then we modify images using adobe photoshop and combine two images in acrobat. in figure 10, the grayscale image has a gray background and a true reproduction of the original photograph. the black-and-white scan has a white background and clean text, but details of the photograph are lost. the pdf converted from the grayscale image is 491 kb and has nine ocr errors. the pdf converted from the black-and-white image is 61kb and has no ocr errors. the pdf converted from a combination of the grayscale and black-and-white images is 283 kb and has no ocr errors. the following are the steps used to create a pdf in figure 10 using acrobat: 1. 
scan a page twice—grayscale optimizer can be found at http:// www.acrobatusers.com/tutorials/ understanding-acrobats-optimizer. method 3: combining different scans many documents have color covers and color or grayscale illustrations, but the majority of pages are textonly. it is not necessary to scan all pages of such documents in color or a higher resolution to a lower resolution and choose a different file compression. different collections have different original sources, therefore different settings should be applied. we normally do several tests for each collection and choose the one that works best for it. we also make our pdfs compatible with acrobat 6 to allow users with older versions of software to view our documents. a detailed tutorial of how to use the pdf figure 9. reduce file size example figure 10. reduce file size example: combine images the next generation library catalog | zhou 159are your digital documents web friendly? | zhou 159 help.html?content=wsfd1234e1c4b69f30 ea53e41001031ab64-7757.html (accessed mar. 3, 2010). 3. ted padova adobe acrobat 7 pdf bible, 1st ed. (indianapolis: wiley, 2005). 4. olaf drümmer, alexandra oettler, and dietrich von seggern, pdf/a in a nutshell—long term archiving with pdf, (berlin: association for digital document standards, 2007). 5. pdf/a competence center, “pdf/a: an iso standard—future development of pdf/a,” http://www. pdfa.org/doku.php?id=pdfa:en (accessed july 20, 2010). 6. pdf/a competence center, “pdf/a—a new standard for longterm archiving,” http://www.pdfa.org/ doku.php?id=pdfa:en:pdfa_whitepaper (accessed july 20, 2010). 7. adobe, “creating accessible pdf documents with adobe acrobat 7.0: a guide for publishing pdf documents for use by people with disabilities,” 2005, http://www.adobe.com/enterprise/ a c c e s s i b i l i t y / p d f s / a c ro 7 _ p g _ u e . p d f (accessed mar. 8, 2010). 8. adobe, “recognize text in scanned documents,” 2010, http:// help.adobe.com/en_us/acrobat/9.0/ s t a n d a rd / w s 2 a 3 d d 1 fa c fa 5 4 c f 6 -b993-159299574ab8.w.html (accessed mar. 8, 2010). 9. ibid. 10. ibid. 11. adobe, “reduce file size by saving,” 2010, http://help.adobe.com/en_us/ acrobat/9.0/standard/ws65c0a053 -bc7c-49a2-88f1-b1bcd2524b68.w.html (accessed mar. 3, 2010). the other 76 pages as grayscale and black-and-white. then we used the procedure described above to combine text pages and photographs. the final pdf has clear text and correctly reproduced photographs. the example can be found at http://hdl .handle.net/10217/1553. conclusion our case study, as reported in this article, demonstrates the importance of investing the time and effort to apply the appropriate standards and techniques for scanning and optimizing digitized documents. if proper techniques are used, the final result will be web-friendly resources that are easy to download, view, search, and print. users will be left with a positive impression of the library and feel encouraged to use its materials and services again in the future. references 1. bcr’s cdp digital imaging best practices working group, “bcr’s cdp digital imaging best practices version 2.0,” june 2008, http://www.bcr.org/ dps/cdp/best/digital-imaging-bp.pdf (accessed mar. 3, 2010). 2. adobe, “about file formats and compression,” 2010, http://livedocs .adobe.com/en_us/photoshop/10.0/ and black-and-white. 2. crop out text on the grayscale scan using photoshop. 3. delete the illustration on the black-and-white image using photoshop. 4. create a pdf using the blackand-white image. 5. 
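the combine-files steps above can also be reproduced without acrobat when only the merging, not the ocr, is needed. a minimal sketch with pillow follows, assuming a folder of page tiffs whose file names sort into page order; the result is still an image-only pdf and needs an ocr pass before it is full-text searchable.

```python
# minimal sketch: combine a folder of page images into a single pdf with
# pillow. folder and file names are assumptions.
import pathlib
from PIL import Image

files = sorted(pathlib.Path("scans_bw").glob("*.tif"))
pages = [Image.open(f) for f in files]

pages[0].save(
    "document.pdf",
    save_all=True,
    append_images=pages[1:],
    resolution=400.0,   # matches the 400 dpi scanning practice described above
)
```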
run the ocr process and save the file. 6. insert the color graph. select tools > advanced editing > touchup object tool. rightclick on the page and select place image. locate the color graph in the open dialog, then click open and move the color graph to its correct location. 7. save the file and run the reduce file size or pdf optimizer procedure. 8. save the file again. this method produces the smallest file size with the best quality, but it is very time-consuming. at csul we used this method for some important documents, such as one of our institutional repository’s showcase items, agricultural frontier to electronic frontier. the book has 220 pages, including a color cover, 76 pages with text and photographs, and 143 text-only pages. we used a color image for the cover page and 143 black-and-white images for the 143 text-only pages. we scanned appendix a. step-by-step creating a full-text searchable pdf in this tutorial, we will show you how to create a full-text searchable pdf using adobe acrobat 9 professional. creating a pdf from a scanner adobe acrobat professional can create a pdf directly from a scanner. acrobat 9 provides five options: black and white document, grayscale document, color document, color image, and custom scan. the custom scan option allows you to scan, run the ocr procedure, add metadata, combine multiple pages into one pdf, and also make it pdf/a compliant. to create a pdf from a scanner, go to file > create pdf > from scanner > custom scan. see figure 1. at csul, we do not directly create pdfs from scanners because our tests show that it can produce fuzzy text and it is not time efficient. both scanning and running the ocr process can be very time consuming. if an error occurs during these processes, we would have to start over again. we normally scan images on scanning stations by student employees 160 information technology and libraries | september 2010160 information technology and libraries | september 2010 or outsource them to vendors. then library staff will perform quality control and create pdfs on seperate machines. in this way, we can work on multiple documents at the same time and ensure that we provide high-quality pdfs. creating a pdf from scanned images 1. from the task bar select combine > merge files into a single pdf > from multiple files. see figure 2. 2. in the combine files dialog, make sure the single pdf radio button is selected. from the add files dropdown menu select add files. see figure 3. 3. in the add files dialog, locate images and select multiple images by holding shift key, and then click add files button. 4. by default, acrobat sorts files by file names. use move up and move down buttons to change image orders and use the remove button to delete images. choose a target file size. the smallest icon will produce a file with a smaller file size but a lower image quality pdf, and the largest icon will produce a high image quality pdf but with a very large file size. we normally use the default file size setting, which is the middle icon. 5. save the file. at this point, the pdf is not full-text searchable. making a full-text searchable pdf a pdf document created from a scanned piece of paper is inherently inaccessible because the content of the document is an image, not searchable text. assistive technology cannot read or extract the words, users cannot select or edit the text, and one cannot manipulate the pdf document for accessibility. 
once optical character recognition (ocr) is properly applied to the scanned files, however, the image becomes searchable text with selectable graphics, and one may apply other accessibility features to the document. adobe acrobat professional provides three ocr options, searchable image (exact), searchable image, and clean scan. because searchable image (exact) is the only option that keeps the original look, we only use this option. to run an ocr procedure using acrobat 9 professional: 1. open a digitized pdf. 2. select document > ocr text recognition > recognize text using ocr. 3. in the recognize text dialog, specify pages to be ocred. 4. in the recognize text dialog, click the edit button in the settings section to choose ocr language and pdf output style. we recommend the searchable image (exact) option. click ok. the setting will be remembered by the program and will be used until a new setting is chosen. sometimes a pdf’s file size increases greatly after an ocr process. if this happens, use the pdf optimizer to reduce its file size. figure 2. merge files into a single pdf figure 3. combine files dialog figure 1. acrobat 9 professional’s create pdf from scanner dialog 6 information technology and libraries | march 2010 sandra shores is [tk] sandra shores editorial board thoughts: issue introduction to student essays t he papers in this special issue, although covering diverse topics, have in common their authorship by people currently or recently engaged in graduate library studies. it has been many years since i was a library science student—twenty-five in fact. i remember remarking to a future colleague at the time that i found the interview for my first professional job easy, not because the interviewers failed to ask challenging questions, but because i had just graduated. i was passionate about my chosen profession, and my mind was filled from my time at library school with big ideas and the latest theories, techniques, and knowledge of our discipline. while i could enthusiastically respond to anything the interviewers asked, my colleague remarked she had been in her job so long that she felt she had lost her sense of the big questions. the busyness of her daily work life drew her focus away from contemplation of our purpose, principles, and values as librarians. i now feel at a similar point in my career as this colleague did twenty-five years ago, and for that reason i have been delighted to work with these student authors to help see their papers through to publication. the six papers represent the strongest work from a wide selection that students submitted to the lita/ ex libris student writing award competition. this year’s winner is michael silver, who looks forward to graduating in the spring from the mlis program at the university of alberta. silver entered the program with a strong library technology foundation, having provided it services to a regional library system for about ten years. he notes that “the ‘accidental systems librarian’ position is probably the norm in many small and medium sized libraries. as a result, there are a number of practices that libraries should adopt from the it world that many library staff have never been exposed to.”1 his paper, which details the implementation of an open-source monitoring system to ensure the availability of library systems and services, is a fine example of the blending of best practices from two professions. 
indeed, many of us who work in it in libraries have a library background and still have a great deal to learn from it professionals. silver is contemplating a phd program or else a return to a library systems position when he graduates. either way, the profession will benefit from his thoughtful, well-researched, and useful contributions to our field. todd vandenbark’s paper on library web design for persons with disabilities follows, providing a highly practical but also very readable guide for webmasters and others. vandenbark graduated last spring with a masters degree from the school of library and information science at indiana university and is already working as a web services librarian at the eccles health sciences library at the university of utah. like mr. silver, he entered the program with a number of years’ work experience in the it field, and his paper reflects the depth of his technical knowledge. vandenbark notes, however, that he has found “the enthusiasm and collegiality among library technology professionals to be a welcome change from other employment experiences,” a gratifying comment for readers of this journal. ilana tolkoff tackles the challenging concept of global interoperability in cataloguing. she was fascinated that a single database, oclc, has holdings from libraries all over the world. this is also such a recent phenomenon that our current cataloging standards still do not accommodate such global participation. i was interested to see what librarians were doing to reconcile this variety of languages, scripts, cultures, and independently developed cataloging standards. tolkoff also graduated this past spring and is hoping to find a position within a music library. marijke visser addresses the overwhelming question of how to organize and expose internet resources, looking at tagging and the social web as a solution. coming from a teaching background, visser has long been interested in literacy and life-long learning. she is concerned about “the amount of information found only online and what it means when people are unable . . . to find the best resources, the best article, the right website that answers a question or solves a critical problem.” she is excited by “the potential for creativity made possible by technology” and by the way librarians incorporate “collaborative tools and interactive applications into library service.” visser looks forward to graduating in may. mary kurtz examines the use of the dublin core metadata schema within dspace institutional repositories. as a volunteer, she used dspace to archive historical photographs and was responsible for classifying them using dublin core. she enjoyed exploring how other institutions use the same tools and would love to delve further into digital archives, “how they’re used, how they’re organized, who uses them and why.” kurtz graduated in the summer and is looking for the right job for her interests and talents in a location that suits herself and her family. finally, lauren mandel wraps up the issue exploring the use of a geographic information system to understand how patrons use library spaces. mandel has been an enthusiastic patron of libraries since she was a small child visiting her local county and city public libraries. she is currently a doctoral candidate at florida state university and sees an academic future for herself. 
mandel expresses infectious optimism about technology in libraries:

people forget, but paper, the scroll, the codex, and later the book were all major technological leaps, not to mention the printing press and moveable type. . . . there is so much potential for using technology to equalize access to information, regardless of how much money you have, what language you speak, or where you live.

big ideas, enthusiasm, and hope for the profession, in addition to practical technology-focused information await the reader. enjoy the issue, and congratulations to the winner and all the finalists!

note 1. all quotations are taken with permission from private e-mail correspondence.

sandra shores (sandra.shores@ualberta.ca) is guest editor of this issue and operations manager, information technology services, university of alberta libraries, edmonton, alberta, canada.

a partnership for creating successful partnerships, continued from page 5: looking ahead, it seems clear that the pace of change in today's environment will only continue to accelerate; thus the need for us to quickly form and dissolve key sponsorships and partnerships that will result in the successful fostering and implementation of new ideas, the currency of a vibrant profession. the next challenge is to realize that many of the key sponsorships and partnerships that need to be formed are not just with traditional organizations in this profession. tomorrow's sponsorships and partnerships will be with those organizations that will benefit from the expertise of libraries and their suppliers while in return helping to develop or provide the new funding opportunities and means and places for disseminating access to their expertise and resources. likely organizations would be those in the fields of education, publishing, content creation and management, and social and community web-based software. to summarize, we at ex libris believe in sponsorships and partnerships. we believe they're important and should be used in advancing our profession and organizations. from long experience we also have learned there are right ways and wrong ways to implement these tools, and i've shared thoughts on how to make them work for all the parties involved. again, i thank marc for his receptiveness to this discussion and my even deeper appreciation for trying to address the issues. it serves as an excellent example of what i discussed above.

automated book order and circulation control procedures at the oakland university library

lawrence auld: oakland university, rochester, michigan

automated systems of book order and circulation control using an ibm 1620 computer are described as developed at oakland university. relative degrees of success and failure are discussed briefly.

introduction

oakland university, affiliated with michigan state university and founded in 1957, offers degree programs at the bachelor's and master's levels. by september, 1967, 3,896 students were enrolled and continuing growth is anticipated in coming years. the library had holdings of 86,755 volumes and 17,908 units of microform materials on july 1, 1967. although young, oakland's library has already encountered a host of problems common to most academic libraries. in recognizing a need to automate or otherwise improve basic routines of handling book ordering and circulation control, oakland is simply another member of a growing club.
the book order system developed at oakland is noteworthy because of ·~rtain features which may be unique: a title index to the on-order file, a computer prepared invoice-voucher form, and a computer prepared voucher card which serves as input to the computer for writing payment checks. in logic the system is related, through parallel invention, to the machine aided technical processing system developed at yale university ( 1). the system developed with unit record equipment at the university of maryland is perhaps more directly related, particularly in the use of the purchase order as a vendor's report form (2,3 ). the pennsyl94 journal of library automation vol. 1/ 2 june, 1968 vania state university library design for automated acquisitions, which uses a similar purchase order, includes the capacity for an elaborate and variable method for reporting the progress of each item from initial order to completion of cataloging ( 4,5) . the ibm 357 circulation control system developed at southern illinois university, carbondale, set the pattern followed by most subsequent systems ( 6,7) . oakland's circulation control system, a variation of the ibm 357 system, is more flexible than some because it uses trigger cards to control machine operations. this paper, originally distributed to a relatively small group of persons and redrafted for a more general reading, presents a case study of how one institution in modest circumstances set about solving certain problems. it describes not systems to be copied but rather a learning process which will continue for many years to come. background during the winter of 1964/ 65, oakland university library laid out the plans and began work on a program of automation of the university library. an initial four-phase plan was conceived: 1) book order, 2) circulation control, 3) serials acquisitions, and 4) a printed book catalog. these housekeeping routines were felt to be the foundation for developing further automation in the library. their automation would liberate the staff, clerical and professional, from such nonproductive and repetitive_ tasks as alphabetizing and re-copying of bibliographic information. an early decision to learn by doing rather than attempting to design the ultimate system in advance was supported by the university administration. consensus being that a larger computer to replace the ibm 1620 would be delivered within two years, computer programs were planned to be useful for twenty-four to thirty-six months. work on developing the book order system was begun in march, 1965; perhaps an all-time speed record was achieved when the system was put into use on july 1 of the same year. work on a circulation control system was begun in august and on february 21, 1966, it too was ready. phases three and four, serials acquisitions and the printed book catalog, were by then being held in abeyance until larger computer equipment should become available to the library. at oakland university all computer and related services are provided by the computing and data processing center. the computer system includes the following pieces of equipment: ibm 1620 computer, 40k with monitor 1 and additional instructions feature (mf, tnf, tns) ibm 1622 card reader/ punch (240 cpm/ 125 cpm) two ibm 1311 disk drives with changeable disk packs ibm 1443 line printer ( 240 lpm) automated book order/ auld 95 only one of the two disk drives is available for production use because the other is committed to monitor, supervisor, and stored programs. 
a disk pack on the ibm 1620 can accommodate two million numeric or one million alphabetic characters. the computer language used for most of the library programs is 1620 sps (symbolic programming system); fortran is used for some computational work. equipment within the library consists of an ibm 026 printing keypunch, which is used for the order system, and an ibm 357 data collection device, including a time clock, with output via a second ibm 026 printing keypunch for the circulation system.

book order procedure

as may be inferred from a birdseye view of the order system (figure 1), the initial input to the computer is decklets of punched cards. output from the computer is a series of printouts: purchase orders, library of congress card orders, oakland university invoice-vouchers, a complete on-order listing with title and purchase order number indices, departmental listings, and budget summaries.

fig. 1. flow chart of book order system.

faculty and library staff submit requests for book purchases to the acquisitions department on a specially designed library book request form (figure 2). the 5x8-inch size provides adequate room for notes, checking marks, etc., and makes for improved legibility, which in turn makes for easier, faster, and more accurate keypunching.

fig. 2. book order request form.

the request form calls for the bibliographic data customarily required for book purchasing, plus date of ordering, code number for the department originating the order, and vendor number. oakland university utilizes campus-wide a five-digit vendor code system; since the library's vendor numbers are a part of the university's vendor code, this interface is one of several points where the book order system ties in with other university records and procedures. a tag number is assigned to each library book request form upon its arrival in the acquisitions department. after routine bibliographic identification is completed, decklet cards (figure 3) are keypunched. the individual cards in each decklet are kept together by the tag number, punched into columns one through five. to keep the cards in order within decklets, column six is punched to identify the type of card as 1) author, 2) title, 3) place and publisher, or 4) miscellaneous information. column seven indicates the card number within type of card. for example, code 11 in columns six and seven would be the first author card and code 12 the second.

each book has a machine readable book card (figure 7). the period for which the book normally circulates is indicated with a letter code punched into column one; column two identifies the collection within the library from which the material came; column three identifies the type of material. the call number and/or other identifying information is punched into columns four through forty-one. column forty-two is punched with an end-of-transmission code.
fig. 7. book card.

the ibm 357 data collection device will perform only one operation without special instructions. if it is to perform more than one operation, it must receive instructions for each variant operation and it must receive them each time the variant operation is performed. this limitation can be met in one of three ways: by not admitting variant operations, by using a cartridge as a carrier for some information, or by providing special instructions as they are needed via a "trigger" card. denying the existence of a variant operation was not practical, because at oakland the identification of a borrower constitutes a set of variant operations. the library's clientele includes not only oakland university students, faculty, and staff, but also residents from the surrounding communities, area high school students, and neighboring college students. the heaviest users are oakland's own students and faculty, who have machine readable plastic identification cards issued by the registrar or the personnel office. it has been impractical for the library to attempt to issue similar cards to guest borrowers. thus, the identification of a borrower is a set of variant operations. use of a cartridge to gain the borrower identification number would be possible but would leave the borrower identification badge unused. this badge card constitutes an official identification card and as such should be utilized throughout the university whenever practical.

trigger cards to instruct the 357 in the performance of variant operations were developed to control the recording of borrower identification and to identify discharging and certain charging functions. the use of trigger cards provides flexibility, in that machine instructions are carried in trigger cards and are not an integral part of the book cards. a change in machine configuration would probably not require repunching book cards for the book collection. at the same time a wide range of 357 machine functions are made possible through the use of different trigger cards. in short, the adoption of trigger cards provides the greatest degree of flexibility in operating the 357.

in the customary borrowing procedure the student brings a book to the circulation desk and presents it, along with his machine readable student id card, to the desk attendant. the attendant first inserts the book card into the ibm 357 data collection device, then retrieves the book card and inserts a "student badge trigger card", which activates the badge reader on the 357. then the badge is inserted into the badge reader, completing the transaction. by remote control this has created on an ibm 026 printing keypunch a card with the following information: typical loan period, collection from which the item came, type of material, call number, borrower type, borrower's identification number, the day of the year, and the time of day secured from an on-line clock. if the borrower does not have a machine readable badge card, an alternate method of charging a book is to use a "manual entry trigger card" which activates the manual entry unit, with which can be recorded numeric information identifying the borrower.
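purely as a modern illustration — nothing like this ran on the 1620 — the fixed-column book-card layout described above maps directly onto a small record parser, which may make the column assignments easier to follow; the class and function names are, of course, invented for the sketch.

```python
# modern illustrative sketch only: restates the book-card layout described
# above as a parser. col 1 = loan-period letter code, col 2 = collection,
# col 3 = material type, cols 4-41 = call number, col 42 = end-of-transmission.
from dataclasses import dataclass

@dataclass
class BookCard:
    loan_period: str
    collection: str
    material_type: str
    call_number: str

def parse_book_card(card: str) -> BookCard:
    if len(card) < 42:
        raise ValueError("book card image is shorter than 42 columns")
    return BookCard(
        loan_period=card[0],             # column 1
        collection=card[1],              # column 2
        material_type=card[2],           # column 3
        call_number=card[3:41].strip(),  # columns 4-41
    )
```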
with special trigger cards books can also be charged to reserve, bindery, or "missing". books are discharged by passing the book card through the 357 and following it by a "discharge trigger card".

monday through friday at closing the charge and discharge cards for the day are delivered to the computing and data processing center, where they are processed by the ibm 1620 computer system. the circulation file is maintained on a disk pack similar to that for the order system. three reports are received from the computing and data processing center: a daily cumulative listing of all books and materials in circulation (figure 8); a cumulative weekly list of all books on long-term loan; and a weekly fines-due report. in addition, overdue notices, computer printed on mailable postcard stock, are sent weekly to the library where they are audited before being mailed. the fines-due report is arranged by borrower, bringing together in one place all of the borrower's delinquencies; the books which he has neglected to return are listed here, as are the overdue books which he returned through the outdoor book return chute. for the latter the number of days overdue at the time of return is listed. subsequent refinements introduced into this system include two additional reports: a pre-notice report in call number sequence produced two days in advance of the fines-due report and a listing of books discharged each day. the pre-notice report makes it possible to search the shelves for books which have been returned but, because of time lag, may still have overdue notices generated. normal turn-around time for the system is 24 hours, but on weekends it goes to 63 hours and at certain holiday periods even higher. the daily list of discharges documents the return and discharge of each book and is used to answer the student who says, "but i returned the book."

fig. 8. example of short-term circulation report ("short term books in circulation," dated wed., jul. 13, 1966; columns: call number, borrower, day of yr due, odue).

maximum file capacity will permit up to about 9,000 charges at one time. assuming an average life of four weeks for each charge, the maximum number of transactions which can be accommodated in one year is about 115,000. the circulation control system utilizes eight programs. all are written in 1620 sps and utilize 40k storage.
(an additional computational program not included in the production package is written in fortran.) with only minor modification the programs could be made to work with 20k storage. the individual programs are described in table 2. tabk 2. lib 201 lib 202 lib 204 lib 205 lib 207 lib 209 lib 212 lib 213 circulation control system programs to update file and to print short-and long-term reports. to print overdue notices and fines-due report. phase 1 routine for lib 202. cold start program to "seed" circulation file. to restart files from one term to the next. to print pre-notice report. to print daily discharges. to print circulation file or part thereof. • automated book order/ auld 107 appraisal the book order system has been described as it was originally designed, and the circulation control system as designed and modified. a partial update together with a critical appraisal follows. implicit in the planning of both systems was the assumption that the ibm 1620 would eventually be replaced by a larger and faster machine and that both systems would be redesigned and augmented. however, the ibm 1620 is continuing in use for a maximum rather than minimum projected time. in july, 1965, oakland initiated an accelerated library development program. overnight the book budget projection for several years was available and in less than three months the book order system was consequently overloaded. with the disk ble filled and many orders waiting, drastic action was required. the most obvious solution seemed to be use of an additional changeable disk pack to expand the purchase order file, but this procedure would have been hopelessly unwieldy. to use a second pack would require either that all transactions be run against both disk packs, roughly doubling computer time and costs, or that each transaction be addressed to a particular disk pack which would necessitate extensive systems redesign. another proposed solution was to revert to a completely manual system, but the order section preferred, if at all po~sible, to retain the automated fiscal control and invoice-voucher preparation features of the order system. , the alternative finally adopted required a basic philosophical change in the system. as originally designed, the system accounted for a book from the time it was placed on order to the time it was cataloged and placed on the shelf. the disk file was one-half occupied· with items received and paid for but not yet cataloged. by purging the file of such items, an on-order file in the narrowest sense was created and a doubling of file capacity gained. now a new problem was created. how was a book to be accounted for that had been received, paid, and purged from the on-order file, but not yet cataloged? the solution was to print a second (carbon) copy of the lc card order slip which would be hand-filed into the card catalog; there it would serve as an on-order/ in-process slip until replaced by a catalog card. hand-filed slips replacing a machine-filed list further altered the philosophical basis of the system. discrepancies in entry do occur, but not so often that the expedient does not work. four months later the system was again overloaded and a routine had to be devised whereby purchase orders could be issued either manually or through the computer. however, all items were still paid via the computer and all invoice-vouchers computer prepared. fiscal control was retained even though the rationale of the system was violated . 108 journal of library automation vol. 
1/ 2 june, 1968 during the summer of 1967 a change of a different nature was implemented. as originally designed the system provided constant communication between the library and each faculty department through the departmental report. but, after the changes described above, the departmental report now included less than one-half of the items being purchased with the department's book fund allocation. it had ceased to serve any purpose and was omitted after july, 1967, with a consequent reduction of nearly two-fifths of line-printer time required for the book order system. to the question, "would it be better to return to a completely manual system for ordel'ing books?" the answer by the order section has always been "no, retention of the automated system for fiscal control and voucher preparation is preferable, even with the patched system at hand." nor should it be forgotten that the book order system as originally designed worked well until the demand on it exceeded its production capacity. also to be recognized is the gain in experience and insight by the library staff during these three years. reading about or visiting someone else's work is enlightening but day-to-day work brings an understanding for which it is difficult to obtain a substitute. acknowledgments four persons deserve special recognition for the roles they played in the foregoing: dr. floyd cammack, former university librarian, without · whose imagination and courage library automation at oakland would not have been attempted; mr. donald mann, assistant director, computing and data processing center, an outstanding systems analyst and programmer; mrs. edith pollock, head of the order section, who likes computers; mrs. nancy covert, head of circulation department, who likes students. references i. alanen, sally; sparks, david e.; kilgour, frederick g.: "a computermonitored library technical processing system," in american documentation institute: proceedings of the annual meeting, v. 3, 1966 (woodland hills, calif.: adrianne press, 1966) p. 419-26. 2. cox, carl r.: "the mechanization of acquisitions and circulation procedures at the university of maryland library," in international business machines corporation: ibm library mechanization symposium (endicott, n. y.: 1964) p. 205-35. 3. cox, carl r.: "mechanized acquisitions procedures at the university of maryland," college & research libraries, 24 (may 1965) 232-36. 4. minder, thomas l.: "automation-the acquisitions program at the pennsylvania state university library," in international business machines corporation: ibm library mechanization symposium (endicott, n. y.: 1964) p. 145-56. automated book order/ auld 109 5. minder, thomas l.; lazorick, gerald: "automation of the penn state university acquisitions department" in international business machines corporation: ibm library mechanization symposium (endicott, n. y. 1964) p. 157-63. (reprinted from american documentation institute: automation and scientific communication; short papers contributed to the theme sessions of the 26th annual meeting ... (washington: 1963) p. 455-59. ) 6. dejarnett, l. r. : "library circulation control using ibm 357's at southern illinois university," in international business machines corporation: ibm library mechanization symposium (endicott, n. y.: 1964) p . 77-94. 7. mccoy, ralph e.: "computerized circulation work: a case study of the 357 data collection system," library resources & technical services, 9 (winter 1965), 59-65. 
6 information technology and libraries | september 2008 mireia ribera turróeditorial board thoughts the june issue of ital featured a new column enti-tled editorial board thoughts. the column features commentary written by ital editorial board members on the intersection of technology and libraries. in the june issue kyle felker made a strong case for gerald zaltman’s book how customers think as a guide to doing user-centered design and assessment in the context of limited resources and uncertain user needs. in this column i introduce another factor in the library–it equation, that of rapid technological change. in the midst of some recent spring cleaning in my library i had the pleasure of finding a report documenting the current and future it needs of purdue university’s hicks undergraduate library. the report is dated winter 1995. the following summarizes the hicks undergraduate library’s it resources in 1995: [the library] has seven public workstations running eight different databases and using six different search software programs. six of the stations support a single database only; one station supports one cd-rom application and three other applications (installed on the hard drive). none of the computers runs windows, but the current programs do not require it. five stations are equipped with six-disc cd-rom drives. we do not anticipate that we will be required to upgrade to windows capability in the near future for any of the application programs. today the hicks undergraduate library’s it resources are dramatically different. as opposed to seven public workstations, we have more than seventy computers distributed throughout the library and the digital learning collaboratory, our information commons. this excludes forty-six laptops available for patron checkout and eighty-eight laptops designated for instructional use. we have moved from eight cd-rom databases to more than four hundred networked databases accessible throughout the purdue university libraries, campus, and beyond. as a result, there are hundreds of “search software programs”—doesn’t that phrase sound odd today?—including the library databases, the catalog, and any number of commercial search engines like google. today all, or nearly all, of our machines run windows, and the macs have the capability of running windows. in addition to providing access to databases, our machines are loaded with productivity and multimedia software allowing students to consume and produce a wide array of information resources. beyond computers, our library now loans out additional equipment including hard drives, digital cameras, and video cameras. the 1995 report also includes system specifications for the computers. these sound quaint today. of the seven computers six were 386 machines with processors clocking in at 25 mhz. the computers had between 640k and 2.5mb of ram with hard drives with capacities between 20 and 60mb. the seventh computer was a 286 machine probably with a 12.5 mhz processor, and correspondingly smaller memory and hard disc capacity. the report does not include monitor specifications, though, based on the time, they were likely fourteenor fifteen-inch cga or ega cathode ray tube monitors. modern computers are astonishingly powerful in comparison. according to a member of our it unit, the computers we order today have 2.8 ghz dual core processors, 3gb of ram, and 250gb hard drives. this equates to being 112 times faster, 1,200 times more ram, and hard drives that are 4,167 times larger than the 1995 computers! 
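the ratios quoted above are straightforward to reproduce; the short sketch below recomputes them and, for the moore's-law comparison drawn just below, the baseline of a doubling every two years. the 1995 and 2008 figures are taken from the column itself; everything else is illustrative arithmetic.

```python
# reproduce the comparison above: 1995 library workstations vs. the 2008
# machines, against a moore's-law baseline of a doubling every two years.
years = 2008 - 1995                # 13 years
moore_factor = 2 ** (years // 2)   # 6 full doublings -> 64x, as cited below

cpu_factor = 2.8e9 / 25e6          # 2.8 ghz vs. 25 mhz          -> 112x
ram_factor = 3_000 / 2.5           # 3 gb vs. 2.5 mb (decimal)   -> 1,200x
disk_factor = 250_000 / 60         # 250 gb vs. 60 mb (decimal)  -> ~4,167x

print(f"moore's-law baseline: {moore_factor}x")
print(f"cpu {cpu_factor:.0f}x, ram {ram_factor:.0f}x, disk {disk_factor:.0f}x")
```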
as a benchmark, consider moore's law, a doubling of transistor counts roughly every two years, or about a ninety-fold increase over a thirteen-year period. who would have thought that library computers would outpace moore's law?! today's computers are also smaller than those of 1995. our standard desktop machines serve as an example, but perhaps not as dramatically as laptops, mini-laptops, and any of the mobile computing machines small enough to fit into your pocket. monitors are smaller, though also bigger. each new computer we order today comes standard with a twenty-inch flat panel lcd monitor. it is smaller in terms of weight and overall size, but the viewing area is significantly larger. these trends are certainly not unique to purdue. nearly every other academic library could boast similar it advancements. with this in mind, and if moore's law continues as projected, imagine the computer resources that will be available on the average desktop machine— although one wonders if it will in fact be a desktop machine—in the next thirteen years. what things out on the distant horizon will eventually become commonplace? here the quote from the 1995 report about windows is particularly revealing. what things that are currently state-of-the-art will we leave behind in the next decade? what's dos? what's a cd-rom? will we soon say, what's a hard drive? what's software? what's a desktop computer? in the last thirteen years we have also witnessed the widespread adoption and proliferation of the internet, the network that is the backbone for many technologies that have become essential components of physical and digital libraries. earlier this year, i co-authored an arl spec kit entitled social software in libraries.1 the survey reports on the usage of ten types of social software within arl libraries: (1) social networking sites like myspace and facebook; (2) media sharing sites like youtube and flickr; (3) social bookmarking and tagging sites like del.icio.us and librarything; (4) wikis like wikipedia and library success: a best practices wiki; (5) blogs; (6) rss used to syndicate content from webpages, blogs, podcasts, etc.; (7) chat and instant messenger services; (8) voice over internet protocol (voip) services like googletalk and skype; (9) virtual worlds like second life and massively multiplayer online games (mmogs) like world of warcraft; and (10) widgets either developed by libraries, like facebook applications and firefox catalog search extensions, or implemented by libraries, like meebome and firefox plugins. of the 64 arl libraries that responded, a 52% response rate, 61 (95% of respondents) said they are using social software. of the three libraries not using social software, two indicated they plan to do so in the future. in combination then, 63 out of 64 respondents (98%) indicated they are either currently using or planning to use social software. as part of the survey there was a call for examples of social software used in libraries. of the 370 examples we received, we selected around 70 for publication in the spec kit. the examples are captivating and they illustrate the wide variety of applications in use today.
of the ten social software applications in the spec kit, how many of them were at our disposal in 1995? by my count three: chat and instant messenger services, voip, and virtual worlds such as text-based muds and moos. of these three, how many were in use in libraries? very few, if any. in our survey we asked libraries for the year in which they first implemented social software. the earliest applications were cu-seeme, a voip chat service at cornell university in 1996, im at the university of california riverside in 1996 as well, and interoffice chat at the university of kentucky in 1998. the remaining libraries adopted social software in year 2000 and beyond, with 2005 being the most common year with 22 responses or 34% of the libraries that had adopted social software. a look at this data shows that my earlier use of a thirteen-year time period to illustrate how difficult it is to project technological innovations that may prove disruptive to our organizations is too broad a time frame. perhaps we should scale this back to looking at five-year increments of time. using the spec kit data, in year 2003, a total of 16 arl libraries had adopted social software. this represents 25% of the total number of institutions that responded when we did our survey. this seems like a more reasonable time frame to be looking to the future.
figure 1. responses to the question, "please enter the year in which your library first began using social software" (n=61).
so, what does the future hold for it and libraries, whether it be thirteen or five years in the future? i am not a technologist by training, nor do i consider myself a futurist, so i typically defer to my colleagues. there are three places i look to for prognostications of the future. the first is lita's top technology trends, a recurring discussion group that is a part of ala's annual conferences and midwinter meetings. past top technology trends discussions can be found on lita's blog (www.ala.org/ala/lita/litaresources/toptechtrends/toptechnology.cfm) and on lita's website (www.ala.org/ala/lita/litaresources/toptechtrends/toptechnology.cfm). the second source is the horizon project, a five-year qualitative research effort aimed at identifying and describing emerging technologies within the realm of teaching and learning. the project is a collaboration between the new media consortium and educause. the horizon project website (http://horizon.nmc.org/wiki/main_page) contains the annual horizon reports going back to 2004. a final approach to projecting the future of it and libraries is to consider the work of our peers. the next library innovation may emerge from a sister institution. or perhaps it may take root at your local library first!
reference
1. bejune, matthew m. and jana ronan. social software in libraries. arl spec kit 304. washington, d.c.: association of research libraries, 2008.
mobile technologies & academics: do students use mobile technologies in their academic lives and are librarians ready to meet this challenge? angela dresselhaus and flora shrode
abstract
in this paper we report on two surveys and offer an introductory plan that librarians may use to begin implementing mobile access to selected library databases and services.
results from the first survey helped us to gain insight into where students at utah state university (usu) in logan, utah, stand regarding their use of mobile devices for academic activities in general and their desire for access to library services and resources in particular. a second survey, conducted with librarians, gave us an idea of the extent to which responding libraries offer mobile access, their future plans for mobile implementation, and their opinions about whether and how mobile technologies may be useful to library patrons. in the last segment of the paper, we outline steps librarians can take as they “go mobile.” purpose of the study similar to colleagues in all types of libraries around the world, librarians at utah state university (usu) want to take advantage of opportunities to provide information resources and library services via mobile devices. observing growing popularity of mobile, internetcapable telephones and computing devices, usu librarians assume that at least some users would welcome the ability to use such devices to connect to library resources. to find out what mobile services or vendors’ applications usu students would be likely to use, we conducted a needs assessment. the lessons learned will provide important guidance to management decisions about how librarians and staff members devote time and effort toward implementing and developing mobile access. we conducted a survey of usu’s students (approximately 25,000 undergraduates and graduates) to determine the degree of handheld device usage in the student population, the purposes for which students use such devices, and students’ interests in mobile access to the library. in addition, we surveyed librarians to learn about libraries’ current and future plans to launch mobile services. this survey was administered to an opportunistic population angela dresselhaus (aldresselhaus@gmail.com) was electronic resources librarian, flora shrode (flora.shrode@usu.edu) is head, reference & instruction services, utah state university, logan, utah. mailto:aldresselhaus@gmail.com mailto:flora.shrode@usu.edu information technology and libraries | june 2012 83 comprised of subscribers to seven e-mail lists whom we invited to offer feedback. our goal was to develop an action plan that would be responsive to students’ interests. at the same time, we aim to take advantage of the growing awareness of and demand for mobile access and to balance workloads among the library information technology professionals who would implement these services. usu is utah’s land-grant university and the merrill-cazier library is its primary library facility on the home campus in logan, utah. while usu has had satellite branches for some time, a growing emphasis on expanding online and distance education courses and degree programs has resulted in a considerable growth of its distance education programs in the last five years. mobile access to university resources makes especially good sense for the distance education population and for students who may reside close to the main usu campus but who also enroll in online courses. the library has an information technology staff of 4.5 fte professionals who support the library catalog, maintain roughly 250 computer workstations in cooperation with the director of campus student computer labs, and oversee the computing needs of library staff and faculty members. 
literature review mobile access to library resources is not a new concept; in fact, the first project designed to deliver handheld mobile access to library patrons began eighteen years ago, in 1993, the time of mainframe computers and gopher. the “library without a roof” project partners included the university of southern alabama, at&t, bellsouth cellular, and notable technologies, inc. 1 library patrons at participating institutions could search and read electronic texts on their personal digital assistants (pdas) and search the library catalog while browsing in physical collections. as reflected in the literature, interest in pda applications for libraries started to pick up around the turn of the twenty-first century. medical librarians were among the first to widely recognize the potential impact of mobile technologies on librarianship. a 2002 article in the journal of the medical library association and a monograph by colleen cuddy are among the first publications that focus on pdas. 2 a quick perusal of the medical category on the itunes store reveals several professional applications, ranging from new england journal of medicine tools to remote patient vital-sign monitors. as an example of the depth of mobile-device penetration in the medical field, in 2010 the food and drug administration approved the marketing of the airstrip suite of mobile-device applications. these apps work in conjunction with vital-sign monitoring equipment to allow instant remote access to a patient’s vital signs. 3 these examples illustrate the increasing pervasiveness of mobile technology in everyday life. mobile learning in academic areas outside of medicine has increased recently as more universities have adopted mobile technologies. 4 a sampling of current projects at academic mobile technologies & academics | dresselhaus and shrode 84 institutions is provided in the 2010 horizon report. 5 according to the 2010 educause center for applied research (ecar) study, 49 percent of undergraduates consider themselves mainstream adopters of technology. 6 locally, utah state university students have adopted smartphones at the rate of 39.3 percent and other handheld internet devices at the rate of 31.5 percent. these statistics indicate that skills are increasing and the technological landscape is changing quickly. the ecar study reports that student computing is rapidly moving to the cloud, another indication of the rapid change in the use of technology. “usb may one day go the way of the eight-track tape as laptops, netbooks, smartphones and other portable devices enable students to access their content from anywhere. they may or may not be aware of it, but many of today’s undergraduates are already cloud-savvy information consumers, and higher education is slowly but surely following their lead.” 7 similarly, usu students show interest in adopting new technology. while usu students are less likely to own mobile devices, 70.2 percent of respondents indicated that they would be likely or very likely to use library resources on smartphones if they owned capable devices and if the library provided easy access to materials. bridges, gascho rempel, and griggs published a comprehensive article, “making the case for a fully mobile library web site: from floor maps to the catalog,” detailing their efforts to implement mobile services on the oregon state university campus. 8 their paper highlights the popularity of mobile phones and smartphones/web-enabled phones. 
the authors discuss mobile phone use, library mobile websites, and mobile catalogs, and they describe the process they used to develop their mobile library site. they note that mobile services will certainly be expected in the coming years, and we have learned that usu students share this expectation.
survey research
in recent years librarians have conducted surveys on mobile technology in libraries. in a 2007 study, cummings, merrill, and borrelli surveyed library patrons to find out if they are likely to access the library catalog via small-screen devices. 9 they discovered that 45.2 percent of respondents, regardless of whether they owned a device, would access the library catalog on a small-screen device. mobile access to the library catalog was the most requested service in the usu student survey, although it accounted for only 16 percent of the responses. cummings et al. also discovered that the most frequent users of the catalog were also the least willing to access the catalog via mobile devices, an interesting observation that merits further research. their survey was completed in june of 2007, just five months after the january 9 announcement of the original iphone. the release of the iphone is significant as the point where the market demographics of mobile device users began to shift to people under thirty, the primary age group of undergraduate students. 10 librarians wilson and mccarthy at ryerson university conducted two surveys to measure the usage of their catalog's feature to send a call number via text or email (initiated in 2007) and their "fledgling mobile web site" (launched in 2008). 11 the first survey indicated that 20 percent of respondents owned internet-capable cell phones, and over half said they intended to buy this type of phone when their current contracts expired. the survey respondents indicated they wanted the following services: "booking group study rooms, checking hours and schedules, checking their borrower records and checking the catalogue." 12 the second survey was conducted a year after the library had implemented a group study room reservation system, catalog and borrower record services, and a computer/laptop availability service. results of the follow-up survey show a drastic increase in ownership of internet-capable cell phones (from 20% to 65%). respondents desired two new services: article searches and e-book access. wilson and mccarthy found that very few library patrons were accessing the mobile services, but "60% of the survey respondents were unaware that the library provided mobile services." 13 the authors conclude that advertising should be a central part of mobile technology implementation. they also detail how the library contributed expertise and leadership to their campus-wide mobile initiatives. seeholzer and salem conducted a series of focus groups in the spring of 2009 to determine the extent of mobile device use among students at kent state university. 14 notable among their findings are that students are willing to conduct research with mobile devices, and they desire to have a feature-rich interactive experience via handheld devices. students expressed interest in customizing interactions with the library's mobile site and completing common tasks such as placing holds or renewing library materials.
nationwide survey of librarians we asked colleagues who subscribe to e-mail distribution lists to respond to a survey about their libraries’ implementation of mobile applications for access to library collections and services. invitations to take the survey were sent to seven lists (acrl science & technology section, eril, information literacy instruction, liblicense-l, nasig, ref-l, and serialist), and 289 librarians and library staff members responded to the survey. the population of subscribers to the e-mail lists we used to solicit survey responses is dynamic and includes librarians and staff who work in academic and other types of settings. while our findings cannot be generalized in a statistically reliable manner, we nonetheless believe that the survey responses merit thorough analysis. we chose to conduct two surveys to avoid some of the problems we noted in a 2007 study conducted by todd spires. 15 spires’ survey questions focused on librarians’ perceptions rather than on empirical data. we developed separate surveys for librarians and students in hopes of avoiding problems that could arise from basing assumptions on perceived behavior or from the complexity of interpreting and generalizing from perceptions. a survey of library patrons should provide more accurate insight into the ways that patrons are using the library mobile technologies & academics | dresselhaus and shrode 86 via handheld devices. in the libraries that currently provide mobile access to resources, the library catalog is most commonly offered. article databases and assistance from a librarian tie as the second most frequently provided services. figure 1 shows a snapshot of the resources and services librarians reported that they provide. we also asked how long libraries have provided mobile access, and the time periods ranged from a few weeks to more than ten years. five librarians indicated that they have provided mobile access for six to ten years, and it is possible that these respondents may work in medical or health science libraries, as our literature review indicated that access to medical information and journal articles via pdas has been a reality for several years. figure 1. librarians’ responses: does your library provide mobile access to the following library resources? librarians were also asked what services and resources they believe libraries should provide via mobile devices. of one hundred seventy-eight responses, 71 percent indicated that “everything” or a variety of library resources should be made available. a few of the more interesting suggestions include a library café webcam (similar to a popular link from north carolina state university), locker reservations, a virtual suggestion box, alerts about database trials, an app that lists new books, and using ipads or other mobile devices for roving reference. roving reference with tablet pcs was evaluated by smith and pietraszewski at the west campus branch library of texas a&m. 16 as tablet computers become increasingly popular with the release of the ipad and other tablets, 17 roving reference should be reconsidered. smith and pietraszewski note that "the tablet pc proved to be an extremely useful device as well as a novelty that drew student interest (anything to make reference librarians look cool!)" 18 using the latest technology in libraries will help raise awareness that libraries are relevant and adapting to changing user preferences. 
we asked librarians to indicate who had responsibility for implementing mobile access in their library. the 184 responses are summarized here: 63 percent answered that a library systems or computing professional does this work; 26.1 percent indicated that the electronic resources librarian has this role; 17.9 percent rely on an information professional from outside of the library; and 22.8 percent chose "other," though we unfortunately did not offer a space for comments where survey respondents could tell us the job title of the person in their library who implements mobile access. the results from our sample of librarians are consistent with a larger study by library journal. 19 the lj study found that the majority of academic libraries have implemented or are planning to implement mobile technologies.
student survey
in january of 2011 we sent out a thirteen-question survey to students (questions are available in appendix a). usu's student headcount is 25,767, and 3,074 students responded, representing 11.9 percent of the student population. we asked students to identify with colleges so that we could evaluate the survey sample against the enrollment at usu. response rates by college mostly clustered between 12 and 19 percent; the lowest response rate (8 percent) came from the college of education, and the highest came from the college of humanities and social sciences. we examined survey response rates from usu undergraduate and graduate populations; 54 percent of undergraduates and 50 percent of graduate students use mobile technology for academic purposes. we believe that our sample is sufficiently representative of the overall population of usu.
figure 2. student response rates by college
in order to understand the context of survey questions that specifically address mobile access, we asked students how often they used library electronic resources. the majority of students used electronic books, the library catalog, and electronic journals/articles a few times each semester. only 34.4 percent of students never use electronic books, 19.6 percent never use the library catalog, and 17.6 percent never use electronic journals/articles. we made comparisons between disciplines and found no significant difference in electronic resource use between fields in the sciences and those in humanities. further data will be collected in fall 2011 about use of print and electronic materials.
figure 3. electronic resource use among students
students were asked how often they use a variety of handheld devices. we decided to emphasize access over ownership in order to allow for a variety of situations. responses show that 39.3 percent of our students use a smartphone with internet access on a daily basis. another 31.5 percent of students use other handheld devices like an ipod touch on a daily basis. very few students use ipads or e-book readers, with 3.9 percent and 5.4 percent indicating daily use, respectively. we view the "other handheld device" category as an important segment of the mobile technology market because of the lower cost barrier, since such devices do not require a subscription to a data plan. the ecar study also noted the possibility of cost factors influencing the decision of some students not to access the internet via a handheld device. 20
figure 4. mobile device usage
students were asked if they use their mobile device or phone for academic purposes (e.g., blackboard, electronic course reserves, etc.). this question was intentionally worded broadly in order to gather general information. we used skip logic to direct respondents to different paths through the survey based on their response to earlier questions. in response to a question about how students use their mobile devices, 54 percent of respondents indicated that they use their mobile devices for academic purposes. we analyzed the results by discipline and noted a few variances. among students responding from the school of business, 63 percent said that they use their mobile device for academic purposes, and 59 percent of engineering students use their devices for school work. the respondents from the other colleges reported use under 50 percent, most likely because of more limited adoption of mobile technology by usu faculty in those fields or lack of personal funds (or unwillingness to spend) to acquire devices and data plans. the 2010 ecar report also noted higher exposure to technology in these fields, indicating that the situation at usu is in line with results from a national study. 21
table 1. device use for academic purposes by college
we asked the students, "if library resources were easily accessible on your mobile devices, and if you had such a device, how likely would you be to use any of the following for assignments or research?" responses to this question allowed us to gauge interest without concerns about cost of technology or the current state of mobile readiness in our library. among the survey respondents, 70.2 percent are likely or very likely to use resources on a smartphone; 46.9 percent are likely or very likely to use resources on an ipad; 45.9 percent are likely or very likely to use resources on an e-book reader; 63.2 percent are likely or very likely to use resources on other devices. we included an option for respondents to select "not applicable" as distinct from "not likely" to allow for those students who may welcome use of a mobile device but who may currently use a device different from the types we specified.
figure 5. likelihood of using library resources on mobile device if easily available
we are unsure how to account for the dramatic difference in interest between smartphone and ipad usage. survey responses indicated that only a small number of students have access to an ipad, and it is possible that students have had little opportunity to see their classmates or others use ipads in an academic setting. students were asked in a free-text question to list the services the library should offer. the comments were varied and often used language different from the vocabulary that librarians typically use. in order to gain an understanding of trends and to standardize the language, we coded the survey comments. after coding, trends began to emerge. access to the library catalog was mentioned by 16 percent of respondents. mobile services in general were specified by 11 percent of survey respondents, 10 percent wanted articles, and 9 percent wanted to reserve study rooms on their mobile device. the phrase "mobile services" represents a catch-all tag designated for comments that indicated that a student desired a variety of services or all services that are possible.
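as an illustration of the coding step just described, a small script along the following lines can map free-text comments onto a controlled vocabulary and tally the results. the tags and keyword lists shown here are hypothetical examples, not the actual scheme we applied.

```python
# Illustrative only: the tags and keyword lists below are hypothetical,
# not the controlled vocabulary used in the actual study.
from collections import Counter

TAG_KEYWORDS = {
    "catalog": ["catalog", "opac"],
    "articles": ["article", "database", "journal"],
    "study rooms": ["study room", "room reservation"],
    "mobile services": ["everything", "all services", "mobile site"],
}

def code_comment(comment):
    """Return the tags whose keywords appear in a free-text comment."""
    text = comment.lower()
    return [tag for tag, keywords in TAG_KEYWORDS.items()
            if any(keyword in text for keyword in keywords)]

def tally_tags(comments):
    """Count how often each tag is assigned across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(code_comment(comment))
    return counts

# Example with made-up comments:
print(tally_tags(["I want the catalog on my phone",
                  "Let me reserve a study room",
                  "Everything the library offers"]))
```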
only 9 percent of respondents indicated they had used text messaging to contact the library, for example, and 15 percent had used instant messaging. several students indicated they might have used these services but did not know they were available, indicating a need for advertising. while we learned much about students' desires for mobile services from this important subset of comments in response to the free-text question, they did not prove especially useful to guide librarians' plans for the next stages of implementing mobile technology.
figure 6. services requested by students
as is common at many institutions, funding at usu is limited and any development in the area of mobile access implementation must be strategic. our survey indicated that usu students are using mobile devices for their academic work and would like to further integrate library resources into their mobile routine. the next section of this paper outlines the steps we are taking toward mobile implementation.
going mobile
the usu library joins many other academic libraries in the beginning stages of implementing mobile technologies. survey responses from students indicate that they use mobile devices for academic purposes, and until options to use the library with such devices are available and advertised, we will not have a clear understanding of students' preferences. klatt's article, "going mobile: free and easy," 22 outlines a way to get started with mobile services with small investments of time and money. articles by griggs, 23 back, 24 and west, 25 and books by green et al. 26 and hanson 27 also provide guidance in this area. here we offer suggestions to establish an implementation team, conduct an environmental scan, outline steps to begin the process, and shed light on advertising, assessment, and policy issues.
implementation team
for a library seeking to provide mobile access to online resources, a diverse and talented implementation team is important. public services personnel in an academic library are on the front lines and often field students' questions. they may also have the opportunity to observe how students are using mobile devices in the library. if librarians track reference interactions, they may find evidence that students are attempting to use their mobile devices to access library services. the electronic resources/collections specialist will also play a key role in mobile development. these specialists are often in contact with vendors, and their advocacy is important in encouraging mobile web development in the vendor community. a web site coordinator interested in mobile services and knowledgeable in current web standards will bring essential talent to the team. arguably, a mobile-optimized web site should become a standard level of service. web sites that are optimized or adapted specifically for mobile access are device agnostic and do not require advanced knowledge of smart phone operating systems. therefore existing web development staff can apply their current skill set to expand into mobile web design. in order to launch advanced interactive access to library resources, a programmer who is interested in developing mobile apps on a number of platforms is needed. device-specific applications allow for the use of phone features such as gps and orientation sensing via an accelerometer and provide the basis for augmented reality technologies.
environmental scan librarians can learn about mobile usage in their community by gathering information to guide future development. at usu we interpret the numbers of students who use mobile devices for academic purposes as justification for implementing mobile library access, but we have not set a benchmark for a degree of interest that would trigger more development. some of the mobile implementations described at the end of this paper required minimal time or were investigated because of the electronic resources librarian’s interest for their relevance to her role as music subject librarian. in the survey we administered to students, we considered it important to include a wide range of devices, including ipod touches and similar devices that have many of the same possibilities for academic use as smartphones but which do not require a monthly contract. laptops are also considered a mobile technology, and while we did not emphasize this class of devices, some student comments referred specifically to laptop computers. we will monitor use of the mobile applications that we implement and likely conduct a follow-up survey to assess students’ satisfaction and to find out if there are other services they would like for the library to provide. while librarians may gather useful information from a user study, there are other ways to determine if students are, in fact, using mobile devices in the library. one approach is to review logs of reference questions to determine if students are inquiring about access to library resources via mobile devices. recently, a few mobile-related questions have surfaced mobile technologies & academics | dresselhaus and shrode 94 at usu in the libstats program used to track reference interactions. this is also an area where training reference staff to recognize and record questions about mobile access could be helpful to detect demand in the library’s community. if vendors provide statistics about use of their products from mobile devices, this information could also contribute to assessing need. finally, in libraries that use vpn or other off-campus authentication methods, consulting with it support staff to see if they field questions on setting up remote access on smartphones or other devices may factor into decisions regarding mobile access. the usu information technology website provides a knowledgebase that includes entries on a variety of mobile device queries. this indicates to librarians that people in the university community are using their mobile devices for academic functions. before we conducted the survey of usu students, we knew little about the exact nature of their mobile use. getting started after identifying the needs on campus, the next step is to create a plan for mobile implementation. an important aspect of anticipating the needs of a library’s user population is to understand the likely use scenarios, goals, tasks, and context as outlined in “library/mobile: tips on designing and developing mobile web sites.” 28 building on services that incorporate tasks that people already perform in non-academic contexts provides a logical bridge for those who are familiar with everyday use of a mobile device to recognize how such devices can serve academic purposes. gathering information from each vendor that supplies content to the library is an important early step in planning. 
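one low-effort way to operationalize the log review suggested above is a small script that scans an exported log of reference questions for mobile-related terms and counts matches by month. the sketch below is illustrative only; the file name, column names, and keyword list are hypothetical and do not reflect an actual libstats export format.

```python
# Illustrative only: scan an exported log of reference questions for
# mobile-related terms and count matches by month. The file name, column
# names, and keyword list are hypothetical, not an actual libstats export.
import csv
from collections import Counter

MOBILE_TERMS = ["iphone", "android", "smartphone", "ipad", "mobile", "app"]

def mobile_questions_by_month(path):
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):          # expects "date" and "question" columns
            question = row.get("question", "").lower()
            if any(term in question for term in MOBILE_TERMS):
                counts[row.get("date", "")[:7]] += 1   # YYYY-MM
    return counts

# Example: print(mobile_questions_by_month("reference_log.csv"))
```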
this information can serve as the basis of a mobile web implementation plan and, in the case of ebsco, creating a profile is necessary in order to allow access to a mobile-formatted platform. at usu our online catalog provider has developed an application for apple's ios platform. if a library’s catalog vendor does not offer a dedicated application or mobile site, samuel liston’s comparisons of three major online catalogs on three popular mobile devices is helpful in gaining an understanding of how opacs display on smartphones. his article also outlines a procedure for testing opacs and usability. 29 at usu we can also take advantage of serials solutions’ mobile-optimized search screen and a variety of applications provided by other vendors. jensen noted that librarians should not rely solely on vendor-created applications due to vendors’ tendency to develop applications that are usable by only a segment of the overall mobile device user population. 30 he adds that libraries should also avoid developing applications for limited platforms. in addition, jensen provides a simple step-by-step process for converting articles retrieved from a vendor database to a format that can be downloaded from electronic course reserves and read on a variety of handheld devices. while using vendor-developed applications is an important strategy, most libraries will find that developing a mobile-compatible library website is necessary. information technology and libraries | june 2012 95 mobile website development can be accomplished in a variety of ways. at usu we plan to offer a version of our regular website by employing cascading style sheets (css). this method is described in the paper by bridges, et al., 31 and standard guidelines can be found in the mobile web best practices 1.0. 32 this method will allow the content to be reformatted at the point of need for a variety of platforms. results from the usu student survey indicate a desire to be able to use a mobile device for access to the library catalog, to use services like reference assistance, find articles, and make study room reservations. the library plans to include hours and location information, access to existing reference chat and text features, and links to databases with mobile friendly websites or vendor-created applications in addition to the resources requested by students. we are still unsure of the best way to provide links to applications and how to explain the various authentication methods required by each vendor. while vpn and ezproxy are possible methods to authenticate via mobile devices, vendors are content at the moment to allow students to access their resources by setting up an account that is based on an authorized e-mail domain or through a user account created on the non-mobile version of the resource. in a few cases at usu, mobile applications from vendors allow access to categories of users such as alumni because they have a usu.edu e-mail address, although the library does not typically include these patrons in our authorized remote user group. advertising, assessment, and policy creating a mobile website and offering mobile services are only the beginning of the effort to provide access to library materials for mobile users. as wilson and mccarthy found, advertising is essential; 33 students won’t use a service they don’t know about. crafting a marketing plan with both online and print materials is essential. 
educating library staff members, especially those on the public services front line, is an essential part of promoting mobile services. assessment strategies must be developed in order to focus development strategically. periodic surveys and focus groups can inform future development of mobile services and gauge the impact of currently offered services. librarians should encourage vendors to provide usage data for their mobile portals or applications, and libraries can track use data from their own information technology departments. implementation of mobile web services creates the need to develop new policies and to educate staff. privacy concerns and the complexities of digital rights management have the potential to transform the role of the library and its policies. 34 patrons will need to be aware that the library has less control over maintaining privacy when materials are accessed via third-party mobile applications. libraries will need to consider how new developments in pricing models may affect expanding mobile access; one example is harpercollins' announcement in early 2011 about a policy requiring libraries to repurchase individual e-book titles after a cap on check-outs is reached. 35 librarians' desire to offer reference services or other assistance via mobile devices follows naturally from their long-standing efforts to enable patrons to ask questions via e-mail, chat, instant messaging, or sms text. instant messaging, chat, and text lend themselves to mobile access because they are designed for the relatively short exchange that people typically use when communicating with a handheld device. offering reference services using sms text and chat in particular is relatively easy for libraries because there are many free services to support them. in some cases, a systems administrator or it expert may be helpful in navigating the setup of chat and text services and in integrating them so that, for example, when a text message arrives at a time when no one is monitoring the service, a message automatically appears in the library's e-mail account. librarians can find an enormous amount of advice on the web and in the literature about how to begin offering mobile-friendly reference, how to expand the virtual reference services they currently provide, and how to choose among free and fee-based services for their library's needs and budget. two efficient places to begin are cody hanson's special issue of library technology reports, which provides a thorough overview of mobile devices and their capabilities and straightforward suggestions for planning and implementation, and m-libraries, a section of library success: a best practices wiki. 36
conclusion
in light of trends toward more widespread use of mobile computing devices and smartphones, it makes sense for libraries to provide access to their collections and services in ways that work well with mobile devices. this case study presents the situation at the merrill-cazier library at utah state university, where students who responded to a survey indicate they are very interested in mobile access, even if they have not yet purchased a smartphone or find data plans to be too expensive at this point. as is only reasonable for any library, at usu we have begun by implementing mobile applications that are available from vendors of our online catalog and databases because these require minimal effort and no additional cost.
we present ideas for establishing an implementation team and advice for academic libraries who wish to “go mobile.” we aim to have a concrete plan for the work that will be required to optimize the library’s website for mobile access by the fall of 2011. a significant step is hiring a digital services librarian to work closely with the webmaster, electronic resources librarian, and others interested in promoting access to resources and services via mobile devices. our vision is to be on track to offer an augmented-reality experience to our patrons as the 2010 horizon report indicates will be an important trend in the next two to three years. we aim to create an environment in which students can use their mobile device to gain entry to a new layer of digital information, enhancing their experience in the physical library. information technology and libraries | june 2012 97 references 1. clifton dale foster, “pdas and the library without a roof,” journal of computing in higher education 7, no. 1 (1995): 85–93. 2. russell smith, “adapting a new technology to the academic medical library: personal digital assistants,” journal of the medical library association 90, no. 1 (2002): 93–94; colleen cuddy, using pdas in libraries: a how-to-do-it manual (new york: neal-schuman publishers, 2005). 3. andrea jackson, “wireless technology poised to transform health care,” rady business journal 3, no. 1 (2010): 24–26. 4. alan w. aldrich, “universities and libraries move to the mobile web,” educause quarterly 33, no. 2 (2010), www.educause.edu/educause+quarterly/educausequarterlymagazinevolum/univers itiesandlibrariesmoveto/206531 (accessed mar. 30, 2011). 5. larry johnson, alan levine, r. smith, and s. stone, the 2010 horizon report (austin, tx: the new media consortium, 2010), www.nmc.org/pdf/2010-horizon-report.pdf (accessed mar. 31, 2011). 6. shannon d. smith and judith borreson caruso, with an introduction by joshua kim, the ecar study of undergraduate students and information technology, 2010 (research study, vol. 6) (boulder, co: educause center for applied research, 2010), www.educause.edu/ecar (accessed mar. 31, 2011). 7. smith and caruso, the ecar study of undergraduate students and information technology, 2010. 8. laurie bridges et al., “making the case for a fully mobile library web site: from floor maps to the catalog,” reference services review 38, no. 2 (2010): 309–20. 9. joel cummings, alex merrill, and steve borrelli, “the use of handheld mobile devices: their impact and implications for library services,” library hi tech 28, no. 1 (2009): 22– 40. 10. rubicon consulting, the apple iphone: success and challenges for the mobile industry (los gatos, ca: rubicon consulting, 2008), http://rubiconconsulting.com/downloads/whitepapers/rubicon-iphone_user_survey.pdf (accessed mar. 31, 2011). 11. sally wilson and graham mccarthy, “the mobile university: from the library to the campus,” reference services review 38, no. 2 (2010): 215. 
12. ibid., 216. 13. ibid., 223. 14. jamie seeholzer and joseph a. salem, "library on the go: a focus group study of the mobile web and the academic library," college and research libraries 72, no. 1 (2011): 9–20. 15. todd spires, "handheld librarians: a survey of librarian and library patron use of wireless handheld devices," internet reference services quarterly 13, no. 4 (2008): 287–309. 16. michael m. smith and barbara a. pietraszewski, "enabling the roving reference librarian: wireless access with tablet pcs," reference services review 32, no. 3 (2004): 249–55. 17. kathryn zickuhr, generations and their gadgets (washington, d.c.: pew internet & american life project, 2011), http://pewinternet.org/reports/2011/generations-and-gadgets.aspx (accessed mar. 31, 2011). 18. smith and pietraszewski, "enabling the roving reference librarian," 253. 19. lisa carlucci thomas, "gone mobile: mobile catalogs, sms reference, and qr codes are on the rise—how are libraries adapting to mobile culture?" library journal 135, no. 17 (2010): 30–34. 20. smith and caruso, the ecar study of undergraduate students and information technology, 2010. 21. ibid. 22. carolyn klatt, "going mobile: free and easy," medical reference services quarterly 30, no. 1 (2011): 56–73. 23. kim griggs, laurie m. bridges, and hannah gascho rempel, "library/mobile: tips on designing and developing mobile web sites," code4lib 8, november 23, 2009, http://journal.code4lib.org/articles/2055 (accessed mar. 30, 2011). 24. godmar back and a. bailey, "web services and widgets for library information systems," information technology & libraries 29, no. 2 (2010): 76–86. 25. mark andy west, arthur w. hafner, and bradley d. faust, "communications—expanding access to library collections and services using small-screen devices," information technology & libraries 25, no. 2 (2006): 103. 26. courtney greene, missy roser, and elizabeth ruane, the anywhere library: a primer for the mobile web (chicago: association of college and research libraries, 2010). 27. cody w. hanson, "libraries and the mobile web," library technology reports 42, no. 2 (february/march 2011). 28. griggs, bridges, and gascho rempel, "library/mobile." 29. samuel liston, "opacs and the mobile," computers in libraries 29, no. 5 (2009): 6–47. 30. r. bruce jensen, "optimizing library content for mobile phones," library hi tech news 27, no. 2 (2010): 6–9. 31. griggs, bridges, and gascho rempel, "library/mobile."
“mobile web best practices 1.0,” worldwide web consortium (w3c), www.w3.org/tr/mobile-bp (accessed mar. 30, 2011). 33. wilson and mccarthy, “the mobile university.” 34. timothy vollmer, there’s an app for that! libraries and mobile technology: an introduction to public policy considerations (policy brief no. 3) (washington, d.c.: ala office for information technology policy, 2010), www.ala.org/ala/aboutala/offices/oitp/publications/policybriefs/mobiledevices.pdf (accessed mar. 31, 2011). 35. josh hadro, “harpercollins puts 26 loan cap on ebook circulations,” library journal, february 25, 2011, www.libraryjournal.com/lj/home/889452264/harpercollins_puts_26_loan_cap.html.csp (accessed mar. 31, 2011). 36. “m-libraries: library success: a best practices wiki,” www.libsuccess.org/index.php?title=m-libraries, (accessed mar. 31, 2011). file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.ala.org/ala/aboutala/offices/oitp/publications/policybriefs/mobiledevices.pdf file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.ala.org/ala/aboutala/offices/oitp/publications/policybriefs/mobiledevices.pdf file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.libraryjournal.com/lj/home/889452-264/harpercollins_puts_26_loan_cap.html.csp file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.libraryjournal.com/lj/home/889452-264/harpercollins_puts_26_loan_cap.html.csp file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.libsuccess.org/index.php%3ftitle=m-libraries, mobile technologies & academics | dresselhaus and shrode 100 appendix a. student survey questions 1. type of student? 2. age? 3. gender? 4. what is your college? 5. how often do you use the following electronic resources provided by your library? 6. do you use any of the following devices? 7. do you use your mobile device or phone for academic purposes (e.g., blackboard, electronic course reserves, etc.)? 8. please list what you use your device to do? 9. have you ever used a text message to get help using the library? 10. have you ever used instant messaging to get help using the library? 11. if library resources were easily accessible on your mobile devices and if you had such a device, how likely would you be to use any of the following for assignments or research? 12. what mobile services would you like the library to offer? 13. comments? information technology and libraries | june 2012 101 appendix b. librarian survey questions 1. type of library? 2. your job/role in the library? 3. years working in libraries? 4. does your library offer mobile device applications for the following electronic resources? 5. who in your library or on your campus is responsible for implementing or developing mobile device applications? 6. how long has your library provided access via mobile devices to electronic resources or services? 7. if you collect use data for library electronic resources, are patrons using the mobile device applications your library provides? 8. what mobile services do you believe libraries should offer? 9. comments? 26 information technology and libraries | september 2007 author id box for 2 column layout wikis in libraries matthew m. bejune wikis have recently been adopted to support a variety of collaborative activities within libraries. this article and its companion wiki, librarywikis (http://librarywikis. pbwiki.com/), seek to document the phenomenon of wikis in libraries. this subject is considered within the framework of computer-supported cooperative work (cscw). 
the author identified thirty-three library wikis and developed a classification schema with four categories: (1) collaboration among libraries (45.7 percent); (2) collaboration among library staff (31.4 percent); (3) collaboration among library staff and patrons (14.3 percent); and (4) collaboration among patrons (8.6 percent). examples of library wikis are presented within the article, as is a discussion for why wikis are primarily utilized within categories i and ii and not within categories iii and iv. it is clear that wikis have great utility within libraries, and the author urges further application of wikis in libraries. i n recent years, the popularity of wikis has skyrocketed. wikis were invented in the mid­1990s to help facilitate the exchange of ideas between computer programmers. the use of wikis has gone far beyond the domain of com­ puter programming, and now it seems as if every google search contains a wikipedia entry. wikis have entered into the public consciousness. so, too, have wikis entered into the domain of professional library practice. the purpose of this research is to document how wikis are used in librar­ ies. in conjunction with this article, the author has created librarywikis (http://librarywikis.pbwiki.com/), a wiki to which readers can submit additional examples of wikis used in libraries. the article will proceed in three sections. the first section is a literature review that defines wikis and introduces computer­supported cooperative work (cscw) as a context for understanding wikis. the second section documents the author’s research and presents a schema for classifying wikis used in libraries. the third section considers the implications of the research results. ■ literature review what’s a wiki? wikipedia (2007a) defines a wiki as: a type of web site that allows the visitors to add, remove, edit, and change some content, typically with­ out the need for registration. it also allows for linking among any number of pages. this ease of interaction and operation makes a wiki an effective tool for mass collaborative authoring. wikis have been around since the mid­1990s, though it is only recently that they have become ubiquitous. in 1995, ward cunningham launched the first wiki, wikiwikiweb (http://c2.com/cgi/wiki), which is still active today, to facilitate the exchange of ideas among computer program­ mers (wikipedia 2007b). the launch of wikiwikiweb was a departure from the existing model of web communica­ tion ,where there was a clear divide between authors and readers. wikiwikiweb elevated the status of readers, if they so chose, to that of content writers and editors. this model proved popular, and the wiki technology used on wikiwikiweb was soon ported to other online communi­ ties, the most famous example being wikipedia. on january 15, 2001, wikipedia was launched by larry sanger and jimmy wales as a complementary project for the now­defunct nupedia encyclopedia. nupedia was a free, online encyclopedia with articles written by experts and reviewed by editors. wikipedia was designed as a feeder project to solicit new articles for nupedia that were not submitted by experts. the two services coexisted for some time, but in 2003 the nupedia servers were shut down. since its launch, wikipedia has undergone rapid growth. at the close of 2001, wikipedia’s first year of operation, there were 20,000 articles in eighteen language editions. 
as of this writing, there are approximately seven million articles in 251 languages, fourteen of which have more than 100,000 articles each. as a sign of wikipedia's growth, when this manuscript was first submitted four months earlier, there were more than five million articles in 250 languages. author's note: sources in the previous two paragraphs come from wikipedia. the author acknowledges the concerns within the academy regarding the practice of citing wikipedia within scholarly works; however, it was decided that wikipedia is arguably an authoritative source on wikis and itself. nevertheless, the author notes that there were changes—insubstantial ones—to the cited wikipedia entries between when the manuscript was first submitted and when it was revised four months later.
wikis and cscw
wikis facilitate collaborative authoring and can be considered one of the technologies studied under the domain of cscw. in this section, cscw is explained and it is shown how wikis fit within this framework. cscw is an area of computer science research that considers the application of computer technology to support cooperative, also referred to as collaborative, work. the term was first coined in 1984 by irene greif (1988) and paul cashman to describe a workshop they were planning on the support of people in work environments with computers. over the years there have been a number of review articles that describe cscw in greater detail, including bannon and schmidt (1991), rodden (1991), schmidt and bannon (1992), sachs (1995), dourish (2001), ackerman (2002), olson and olson (2002), dix, finlay, abowd, and beale (2004), and shneiderman and plaisant (2005). publication in the field of cscw primarily occurs through conferences. the first conference on cscw was held in 1986 in austin, texas. since then, the conference has been held biennially in the united states. proceedings are published by the association for computing machinery (acm, http://www.acm.org/). in 1991, the first european conference on computer supported cooperative work (ecscw) was held in amsterdam. ecscw also is held biennially, in odd-numbered years. ecscw proceedings are published by springer (http://www.ecscw.uni-siegen.de/). the primary journal for cscw is computer supported cooperative work: the journal of collaborative computing. papers also appear within publications of the acm and chi, the conference on human factors in computing.
cscw and libraries
as libraries are, by nature, collaborative work environments—library staff working together and with patrons—and as digital libraries and computer technologies become increasingly prevalent, there is a natural fit between cscw and libraries. the following researchers have applied cscw to libraries. twidale et al. (1997) published a report sponsored by the british library research and innovation centre that examined the role of collaboration in the information-searching process to inform how information systems design could better address and support collaborative activity. twidale and nichols (1998) offered ethnographic research of physical collaborative environments—in a university library and an office—to aid the design of digital libraries.
they wrote two reviews of cscw as applied to libraries—the first was more comprehensive (twidale and nichols 1998) than the second (twidale and nichols 1999). sánchez (2001) discussed collaborative environments designed and prototyped for digital library environments.

classification of collaboration

technologies that facilitate collaborative work are typically classified within cscw across two continua: synchronous versus asynchronous, and co-located versus remote. if put together in a two-by-two matrix, there are four possibilities: (1) synchronous and co-located (same time, same place); (2) synchronous and remote (same time, different place); (3) asynchronous and remote (different time, different place); and (4) asynchronous and co-located (different time, same place). this classification schema was first proposed by johansen et al. (1988). nichols and twidale (1999) mapped work applications within the realm of cscw in figure 1. wikis are not present in the figure, but their absence is not an indication that they are not cooperative work technologies. rather, wikis were not yet widely in use at the time cscw was considered by nichols and twidale. the author has added wikis to nichols and twidale's graphical representation in figure 2. interestingly, wikis are border-crossers fitting within two quadrants: the upper right—asynchronous and co-located; and the lower right—asynchronous and remote. wikis are asynchronous in that they do not require people to be working together at the same time. they are both co-located and remote in that people working collaboratively may or may not be working in the same place. it is also interesting to note that library technologies also can be mapped using johansen's schema. nichols and twidale (1999) also mapped this, and figure 3 illustrates the variety of collaborative work that goes on within libraries.

figure 1. classification of cscw applications
figure 2. classification of cscw applications including wikis
figure 3. classification of collaborative work within libraries
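to make the time/place matrix concrete, the following minimal sketch—not part of the original study—models johansen's four quadrants as a small python data structure and records wikis as occupying both asynchronous cells, mirroring the author's addition in figure 2. the application names and their mapping are illustrative assumptions drawn from the figures described above, not an exhaustive inventory.

# minimal sketch (not from the article) of johansen's time/place matrix.
# each cscw application is tagged with the quadrants it occupies; wikis span
# the two asynchronous cells, as in the author's revised figure 2.
from itertools import product

TIMES = ("synchronous", "asynchronous")
PLACES = ("co-located", "remote")

# hypothetical mapping of example applications to (time, place) quadrants
applications = {
    "meeting rooms": {("synchronous", "co-located")},
    "video conferencing": {("synchronous", "remote")},
    "organizational memory": {("asynchronous", "co-located")},
    "web-based applications": {("asynchronous", "remote")},
    "wikis": {("asynchronous", "co-located"), ("asynchronous", "remote")},
}

def quadrant_contents(time, place):
    """return the applications mapped to a given cell of the matrix."""
    return sorted(name for name, cells in applications.items()
                  if (time, place) in cells)

if __name__ == "__main__":
    for time, place in product(TIMES, PLACES):
        members = ", ".join(quadrant_contents(time, place)) or "(none listed)"
        print(f"{time:>12} / {place:<10}: {members}")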
■ method

in order to discover the widest variety of wikis used in libraries, the author searched for examples of wikis used in libraries within three areas—the lis literature, the library success wiki, and within messages posted on three professional electronic discussion lists. when examples were found, they were logged and classified according to a schema created by the author. results are presented in the next section.

the first area searched was within the lis literature. the author utilized the wilson library literature and information science database. there were two main types of articles: ones that argued for the use of wikis in libraries, and ones that were case studies of wikis that had been implemented.

the second area searched was within library success: a best practices wiki (http://www.libsuccess.org/) (see figure 4), created by meredith farkas, distance learning librarian at norwich university. as the name implies, it is a place for people within the library community to share their success stories. posting to the wiki is open to the public, though registration is encouraged. there are many subject areas on the wiki, including management and leadership, readers' advisory, reference services, information literacy, and so on. there also is a section about collaborative tools in libraries (http://www.libsuccess.org/index.php?title=collaborative_tools_in_libraries), in which examples of wikis in libraries are presented. within this section there is a presentation about wikis made by farkas (2006) titled wiki world (http://www.libsuccess.org/index.php?title=wiki_world), from which examples were culled.

figure 4. library success: a best practices wiki (http://www.libsuccess.org/)

the third area that was searched was professional electronic discussion list messages from web4lib, dig_ref, and libref-l. the web4lib electronic discussion list (tennant 2005) is "for the discussion of issues relating to the creation, management, and support of library-based world wide web servers, services, and applications." the list is moderated by roy tennant and the web4lib advisory board and was started in 1994. the dig_ref electronic discussion list is a forum for "people and organizations answering the questions of users via the internet" (webjunction n.d.). the list is hosted by the information institute of syracuse, school of information studies, syracuse university, and was created in 1998. the libref-l electronic discussion list is "a moderated discussion of issues related to reference librarianship" (balraj 2005). established in 1990, it is operated out of kent state university and moderated by a group of list owners. these three electronic discussion lists were selected for two reasons. first, the author is a subscriber to each electronic discussion list and, prior to the research, had noted messages about wikis in libraries. second, based on the descriptions of each electronic discussion list stated above, the selected lists reasonably covered the discussion of wikis in libraries within the professional library electronic discussion lists.

one year of messages, november 15, 2005, through november 14, 2006, was analyzed for each list. messages about wikis in libraries were identified through keyword searches against the author's personal archive of electronic discussion list messages collected over the years. an alternative method would have been to search the web archive of each list, but the author found it easier to search within his mail client, microsoft outlook. the word "wiki" was found in 513 messages: 354 in web4lib, 91 in dig_ref, and 68 in libref-l. this approach had high recall, as discourse about wikis frequently included the use of the word "wiki," though low precision, as there were many results that were not about wikis used in libraries. common false hits included messages about the nature study (giles 2005) that compared wikipedia to encyclopedia britannica, and messages that included the word "wiki" in passing but did not describe wikis used within libraries. from the list of 513 messages, the author read each message and came up with a much shorter list of thirty-nine messages about wikis in libraries: thirty-two in web4lib, three in dig_ref, and four in libref-l.
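the winnowing step just described—a high-recall keyword pass followed by manual review for precision—could be approximated as in the following minimal sketch. this is not the author's actual outlook-based workflow; the directory layout (one plain-text message per file, under a folder per list) and names are assumptions made for illustration.

# minimal sketch of the first, high-recall pass: flag every archived message
# that mentions "wiki", grouped by list. the file layout is hypothetical;
# the author actually searched a personal microsoft outlook archive.
from pathlib import Path

ARCHIVE = Path("discussion-archive")          # hypothetical local archive
LISTS = ("web4lib", "dig_ref", "libref-l")    # the three lists studied

def messages_mentioning(keyword: str, list_name: str):
    """yield paths of messages on one list that contain the keyword."""
    for msg in (ARCHIVE / list_name).glob("*.txt"):
        if keyword.lower() in msg.read_text(errors="ignore").lower():
            yield msg

if __name__ == "__main__":
    for name in LISTS:
        hits = list(messages_mentioning("wiki", name))
        print(f"{name}: {len(hits)} candidate messages")
    # the second, high-precision pass—deciding which candidates actually
    # describe wikis used in libraries—was done by reading each message.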
■ results

classification of the results

after all wiki examples had been collected, it became clear that there was a way to classify the results. in farkas's (2006) presentation about wikis, she organized wikis in two categories: (1) how libraries can use wikis with their patrons; and (2) how libraries can use wikis for knowledge sharing and collaboration. this schema, while it accounts for two types of collaboration, is not granular enough to represent the types of collaboration found within the wiki examples identified. as such, it became clear that another schema was needed. twidale and nichols (1998) identified three types of collaboration within libraries: (1) collaboration among library staff; (2) collaboration between a patron and a member of staff; and (3) collaboration among library users. their classification schema mapped well to the examples of wikis that were identified; however, it too was not granular enough, as it did not distinguish intraorganizational from extraorganizational collaboration among library staff, the two most common types of wiki usage found in the research (see appendix). to account for these types of collaboration, which are common not only to wiki use in libraries but to all professional library practice, the author modified twidale and nichols's schema (see figure 6). the improved schema also uniformly represents entities across the categories—library staff and member of staff are referred to as "library staff," and patrons and library users are referred to as "patrons." examples of wikis used in libraries for each category are provided to better illustrate the proposed classification schema.

figure 6. four types of collaboration within libraries: (1) collaboration among libraries (extra-organizational); (2) collaboration among library staff (intra-organizational); (3) collaboration among library staff and patrons; (4) collaboration among patrons

figure 5. wiki world (http://www.libsuccess.org/index.php?title=wiki_world)

■ collaboration among libraries

the library instruction wiki (http://instructionwiki.org/main_page) is an example of a wiki that is used for collaboration among libraries (figure 7). it appears as though the wiki was originally set up to support library instruction within oregon—it is unclear if this was associated with a particular type of library, say academic or public—but now the wiki supports library instruction in general. the wiki is self-described as: a collaboratively developed resource for librarians involved with or interested in instruction. all librarians and others interested in library instruction are welcome and encouraged to contribute. the tagline for the wiki is "stop reinventing the wheel" (library instruction wiki 2006).
from this wiki, there is a list of library instruction resources that include the following: handouts, tutorials, and other resources to share; teaching techniques, tips, and tricks; class-specific web sites and handouts; glossary and encyclopedia; bibliography and suggested reading; and instruction-related projects, brainstorms, and documents. within the handouts, tutorials, and other resources to share section, the author found a wide variety of resources from libraries across the country. similarly, there were a number of suggestions to be found under the teaching techniques, tips, and tricks section. another example of a wiki used for collaboration among libraries is the library success wiki (http://www.libsuccess.org/), one of the sources of examples of wikis used in this research. adding to earlier descriptions of this wiki as presented in this paper, library success seems to be one of the most frequently updated library wikis and perhaps the most comprehensive in its coverage of library topics.

■ collaboration among library staff

the university of connecticut libraries' staff wiki (http://wiki.lib.uconn.edu/) is an example of a wiki used for collaboration among library staff (figure 8). this wiki is a knowledge base containing more than one thousand information technology services (its) documents. its documents support the information technology needs of the library organization. examples include answers to commonly asked questions, user manuals, and instructions for a variety of computer operations. in addition to being a repository of its documents, the wiki also serves as a portal to other wikis within the university of connecticut libraries. there are many other wikis connected to library units; teams; software applications, such as the libraries' ils; libraries within the university of connecticut libraries; and other university of connecticut campuses. the health sciences library knowledge base, stony brook university (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome), is another example of a wiki that is used for collaboration among library staff (figure 9). the wiki is described as "a space for the dynamic collaboration of the library staff, and a platform of shared resources" (health sciences library 2007). on the wiki there are the following content areas: news and announcements; hsl departments; projects; troubleshooting; staff training resources, working papers and support materials; and community activities, scholarship, conferences, and publications.

■ collaboration among library staff and patrons

there are only a few examples of wikis used for collaboration among library staff and patrons to cite as exemplars. one example is the st. joseph county public library (sjcpl) subject guides (http://www.libraryforlife.org/subjectguides/index.php/main_page), seen in figure 10. this wiki is a collection of resources and services in print and electronic formats to assist library patrons with subject area searching. as the wiki is published by library staff for public consumption, it has more of a professional feel than wikis from the first two categories. pages have images, and the content is structured to look like a standard web page. though the wiki looks like a web page, there still remain a number of edit links that follow each section of text on the wiki.
while these tags bear importance for those editing the wiki—library staff only in this case—they undoubtedly puzzle library patrons who think that they have the ability to edit the wiki when, in fact, they do not.

figure 7. library instruction wiki (http://instructionwiki.org/)
figure 8. the university of connecticut libraries' staff wiki (http://wiki.lib.uconn.edu/)

another example of collaboration between library staff and patrons that takes a similar approach is the usc aiken gregg-graniteville library web site (http://library.usca.edu/) in figure 11. as with the sjcpl subject guides, this wiki looks more like a web site than a wiki. in fact, the usc aiken wiki conceals its true identity as a wiki even more so than the sjcpl subject guides. the only evidence that the web site is a wiki is a link at the bottom of each page that says "powered by pmwiki." pmwiki (http://pmwiki.org/) is a content management system that utilizes the wiki technology on the back end to manage a web site while retaining the look and feel of a standard web site. it seems that the benefits of using a wiki in such a way are shared content creation and management.

■ collaboration among patrons

as there are only three examples of wikis used for collaboration among patrons, all examples will be highlighted in this section. the first example is wiki worldcat (http://www.oclc.org/productworks/wcwiki.htm), sponsored by oclc. wiki worldcat launched as a pilot project in september 2005. the service allows users of open worldcat, oclc's web version of worldcat, to add book reviews to item records. though this wiki does not have many book reviews in it, even for contemporary bestsellers, it gives a taste of how a wiki could be used to facilitate collaboration among patrons. a second example is the biz wiki from ohio university libraries (http://www.library.ohiou.edu/subjects/bizwiki/index.php/main_page) (see figure 12). the biz wiki is a collection of business information resources available through ohio university. the wiki was created by chad boeninger, reference and instruction librarian, as an alternate form of a subject guide or pathfinder. what separates this wiki from those in the third category, collaboration among library staff and patrons, is that the wiki is editable by patrons as well as librarians. similarly, butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref) is a wiki that has reviews of reference resources created by butler librarians, faculty, staff, and students (see figure 13).

figure 9. health sciences library knowledge base (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome)
figure 10. sjcpl subject guides (http://www.libraryforlife.org/subjectguides/index.php/main_page)
figure 11. usc aiken gregg-graniteville library (http://library.usca.edu/)
figure 12. ohio university libraries biz wiki (http://www.library.ohiou.edu/subjects/bizwiki)
figure 13. butler wikiref (http://www.seedwiki.com/wiki/butler_wikiref)

full results

thirty-three wikis were identified. two wikis were classified in two categories each. the full results are available in the appendix. table 1 illustrates how wikis were not uniformly distributed across the four categories: category i had 45.7 percent, category ii had 31.4 percent, category iii had 14.3 percent, and category iv had 8.6 percent. nearly 80 percent of all examples were found within categories i and ii. as seen in some of the examples in the previous section, wikis were utilized for a variety of purposes.
here is a short list of purposes for which wikis were utilized: sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews. wiki software utilization is summarized in tables 2 and 3. mediawiki is the most popular software utilized by libraries (33.3 percent), followed by unknown (30.3 percent), pbwiki (12.1 percent), pmwiki (12.1 percent), seedwiki (6.1 percent), twiki (3 percent), and xwiki (3 percent). if the values for unknown are removed from the totals (table 3), mediawiki is utilized in almost half (47.8 percent) of all library wiki applications.

table 1. classification summary
category | no. | %
i: collaboration among libraries | 16 | 45.7
ii: collaboration among library staff | 11 | 31.4
iii: collaboration among library staff and patrons | 5 | 14.3
iv: collaboration among patrons | 3 | 8.6
total | 35 | 100.0

table 2. software totals
wiki software | no. | %
mediawiki | 11 | 33.3
unknown | 10 | 30.3
pbwiki | 4 | 12.1
pmwiki | 4 | 12.1
seedwiki | 2 | 6.1
twiki | 1 | 3.0
xwiki | 1 | 3.0
total | 33 | 100.0

table 3. software totals without unknowns
wiki software | no. | %
mediawiki | 11 | 47.8
pbwiki | 4 | 17.4
pmwiki | 4 | 17.4
seedwiki | 2 | 8.7
twiki | 1 | 4.3
xwiki | 1 | 4.3
total | 23 | 100.0
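as a worked check of the percentages reported in tables 2 and 3, the shares can be recomputed directly from the raw counts. the minimal sketch below does exactly that; the counts are taken from the tables above, and everything else (names, structure) is illustrative rather than part of the original study.

# recompute the software shares reported in tables 2 and 3 from the raw counts
software_counts = {
    "mediawiki": 11, "unknown": 10, "pbwiki": 4, "pmwiki": 4,
    "seedwiki": 2, "twiki": 1, "xwiki": 1,
}

def shares(counts):
    """return each software's percentage share of the total, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

if __name__ == "__main__":
    print("all 33 wikis:", shares(software_counts))        # mediawiki -> 33.3
    known = {k: v for k, v in software_counts.items() if k != "unknown"}
    print("known software only:", shares(known))           # mediawiki -> 47.8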
■ discussion

with a wealth of examples of wikis in categories i and ii and a dearth of examples of wikis in categories iii and iv, the library community seems to be more comfortable using wikis to collaborate within the community, but less comfortable using wikis to collaborate with library patrons or to enable collaboration among patrons. the research results pose the questions: why are wikis predominantly used for collaboration within the library community? and why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another?

why are wikis predominantly used for collaboration within the library community?

this is perhaps the easier of the two questions to explain. there is a long legacy of cooperation and collaboration intraorganizationally and extraorganizationally within libraries. one explanation for this is the shared budgetary climate within libraries. all too often there are insufficient money, staff, and resources to offer desired levels of service. librarians work together to overcome these barriers. prominent examples include cooperative cataloging, interlibrary lending, and the formation of consortia to negotiate pricing. another explanation can be found in the personal characteristics of library professionals. librarianship is a service profession that consequently attracts service-minded individuals who are interested in helping others, whether they are library patrons or fellow colleagues. a third reason is the role of library associations, such as the international federation of library associations and institutions, the american library association, the special libraries association, and the medical library association, as well as many others at the international, national, state, and local levels, and the work that is done through these associations at annual conferences and throughout the year. libraries use wikis to collaborate intraorganizationally and extraorganizationally because collaboration is what they do most naturally.

why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another?

the reasons why libraries are only minimally using wikis to collaborate with patrons and for patron collaboration are more difficult to ascertain. however, given the untapped potential of wikis, the proposed answers to this question are more important and may lead to future implementations of wikis in libraries. here are four possible explanations, some more speculative than others.

first, perhaps one of the reasons is the result of the way in which libraries are conceived by library patrons and librarians alike. a strong case can be made for libraries as places of collaborative work, and the author takes this position. however, historically libraries have been repositories of information, and this remains a pervasive and difficult concept to change—libraries are frequently seen simply as places to get books. in this scenario, the librarian is a gatekeeper that a patron interacts with to get a book—that is, if the patron interacts with a librarian at all. it also is worth noting that the relationship is one-way—the patron needs the assistance of the librarian, but not the other way around. viewed in these terms, this is not a collaborative situation. for libraries to use wikis for the purpose of collaborating with library patrons, it might demand the reconceptualization of libraries by library patrons and librarians. similarly, this extreme conceptualization of libraries does not consider patrons working with one another, even though it is an activity that occurs formally and informally within libraries, not to mention with the emergence of interdisciplinary and multidisciplinary work. if wikis are to be used to facilitate collaboration between patrons, the conceptualization of the library by library patrons and librarians must be expanded.

second, there may be fears within the library community about authority, responsibility, and liability. libraries have long held the responsibility of ensuring the authority of the bibliographic catalog. if patrons are allowed to edit the library wiki, there is potential for negatively affecting the authority of the wiki and even the perceived authority of the library. likewise, there is potential liability in allowing patrons to post to the library wiki. similar concerns have been raised in the past about other collaborative technologies, such as blogs, bulletin boards, mailing lists, and so on, all aspects of the library 2.0 movement. if libraries are fully to realize library 2.0 as described by casey and savastinuk (2006), miller (2006), and courtney (2007), these issues must be considered.

third, perhaps it is due to a matter of fit. it might be the case that wikis are utilized in categories i and ii and not within categories iii and iv because the tools are better suited to support the types of activities within categories i and ii.
consider some of the activities listed earlier: supporting association work, collecting software documentation, supporting conferences, creating digital repositories, creating intranets, and creating knowledge bases. each of these illustrates a wiki that is utilized for the creation of a resource with multiple authors and readers, tasks that are well suited to wikis. wikipedia is a great example of a wiki with clear, shared tasks for multiple authors and multiple readers and a sense of persistence over time. in contrast, relationships between library staff and patrons do not typically lead to the shared creation of resources. while it is true that the relationship between patron and librarian in the context of a patron's research assignment can be collaborative depending on the circumstances, authorship is not shared but is possessed by the patron. in addition, research assignments in the context of undergraduate coursework are short-lived and seldom go beyond the confines of a particular course. in terms of patrons working together with other patrons, there is the precedent of group work; however, groups often produce projects or papers that share the characteristics of nongroup research assignments listed above. this, of course, does not mean that wikis are not suitable for collaboration within categories iii and iv, but perhaps the opportunities for collaboration are fewer, or they stretch the imagination in terms of the types and ways of doing collaborative work.

fourth, perhaps it is a matter of "not yet." while the research has shown that libraries are not utilizing wikis in categories iii and iv, this may be because it is too soon. it should be noted that wikis are still new technologies. it might be the case that librarians are experimenting in safer contexts so they will gain experience prior to trying more public projects where their expertise will be needed. if this explanation is true, it is expected that more examples of wikis in libraries will soon emerge. as they do, the author hopes that all examples of wikis in libraries, new and old, will be added to the companion wiki to this article, librarywikis (http://librarywikis.pbwiki.com/).

■ conclusion

it appears that wikis are here to stay, and that their utilization within libraries is only just beginning. this article documented the current practice of wikis used in libraries using cscw as a framework for discussion. the author located examples of wikis in three places: within the lis literature, on the library success wiki, and within messages from three professional electronic discussion lists. thirty-three examples of wikis were identified and classified using a classification schema created by the author. the schema has four categories: (1) collaboration among libraries; (2) collaboration among library staff; (3) collaboration among library staff and patrons; and (4) collaboration among patrons. wikis were used for a variety of purposes, including sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews.
by and large, wikis were primarily used to support collaboration among library staff intraorganizationally and extraorganizationally, with nearly 80 percent (45.7 percent and 31.4 percent respectively) of the examples so identified, and less so in the support of collaboration among library staff and patrons (14.3 percent) and collaboration among patrons (8.6 percent). mediawiki was the most commonly used software, accounting for almost half (47.8 percent) of the examples for which the software could be identified. it is clear that there are plenty of examples of wikis utilized in libraries, with more to be found each day. the profession is now faced with extending the use of this technology, and it remains for the future to show how wikis will continue to be used within libraries.

works cited

ackerman, mark s. 2002. the intellectual challenge of cscw: the gap between social requirements and technical feasibility. in human-computer interaction in the new millennium, ed. john m. carroll, 179–203. new york: addison-wesley.
balraj, leela, et al. 2005. libref-l. kent state university libraries. http://www.library.kent.edu/page/10391 (accessed june 12, 2007). archive is available at this link as well.
bannon, liam j., and kjeld schmidt. 1991. cscw: four characters in search of a context. in studies in computer supported cooperative work, ed. john m. bowers and steven d. benford, 3–16. amsterdam: elsevier.
casey, michael e., and laura c. savastinuk. 2006. library 2.0. library journal 131, no. 14: 40–42. http://www.libraryjournal.com/article/ca6365200.html (accessed june 12, 2007).
courtney, nancy. 2007. library 2.0 and beyond: innovative technologies and tomorrow's user (in press). westport, conn.: libraries unlimited.
dix, alan, et al. 2004. socio-organizational issues and stakeholder requirements. in human computer interaction, 3rd ed., 450–74. upper saddle river, n.j.: prentice hall.
dourish, paul. 2001. social computing. in where the action is: the foundations of embodied interaction, 55–97. cambridge, mass.: mit press.
farkas, meredith. 2006. wiki world. http://www.libsuccess.org/index.php?title=wiki_world (accessed june 12, 2007).
giles, jim. 2005. internet encyclopaedias go head to head. nature 438: 900–01. http://www.nature.com/nature/journal/v438/n7070/full/438900a.html (accessed june 12, 2007).
greif, irene, ed. 1988. computer supported cooperative work: a book of readings. san mateo, calif.: morgan kaufmann publishers.
health sciences library, state university of new york, stony brook. 2007. health sciences library knowledge base. http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome (accessed june 12, 2007).
johansen, robert, et al. 1988. groupware: computer support for business teams. new york: free press.
library instruction wiki. 2006. http://instructionwiki.org/main_page (accessed june 12, 2007).
miller, paul. 2006. coming together around library 2.0. d-lib magazine 12, no. 4. http://www.dlib.org/dlib/april06/miller/04miller.html (accessed june 12, 2007).
nichols, david m., and michael b. twidale. 1999. computer supported cooperative work and libraries. vine 109: 10–15. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/vine.html (accessed june 12, 2007).
olson, gary m., and judith s. olson. 2002. groupware and computer-supported cooperative work. in the human-computer interaction handbook: fundamentals, evolving technologies and emerging applications, ed. julie a. jacko and andrew sears, 583–95.
mahwah, n.j.: lawrence erlbaum associates, inc.
rodden, tom t. 1991. a survey of cscw systems. interacting with computers 3, no. 3: 319–54.
sachs, patricia. 1995. transforming work: collaboration, learning, and design. communications of the acm 38: 227–49.
sánchez, j. alfredo. 2001. hci and cscw in the context of digital libraries. in chi '01 extended abstracts on human factors in computing systems. conference on human factors in computing systems, seattle, wash., mar. 31–apr. 5, 2001.
schmidt, kjeld, and liam j. bannon. 1992. taking cscw seriously: supporting articulation work. computer supported cooperative work 1, no. 1/2: 7–40.
shneiderman, ben, and catherine plaisant. 2005. collaboration. in designing the user interface: strategies for effective human-computer interaction, 4th ed., 408–50. reading, mass.: addison wesley.
tennant, roy. 2005. web4lib electronic discussion. webjunction.org. http://lists.webjunction.org/web4lib/ (accessed june 12, 2007). archive is available at this link as well.
twidale, michael b., et al. 1997. collaboration in physical and digital libraries. report no. 64, british library research and innovation centre. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/bl/report/ (accessed june 12, 2007).
twidale, michael b., and david m. nichols. 1998a. using studies of collaborative activity in physical environments to inform the design of digital libraries. technical report cseg/11/98, computing department, lancaster university, uk. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/cscw98.html (accessed june 12, 2007).
twidale, michael b., and david m. nichols. 1998b. a survey of applications of cscw for digital libraries. technical report cseg/4/98, computing department, lancaster university, uk. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/survey.html (accessed june 12, 2007).
webjunction. n.d. dig_ref electronic discussion list. http://www.vrd.org/dig_ref/dig_ref.shtml (accessed june 12, 2007).
wikipedia. 2007a. wiki. http://en.wikipedia.org/wiki/wiki (accessed april 29, 2007).
wikipedia. 2007b. wikiwikiweb. http://en.wikipedia.org/wiki/wikiwikiweb (accessed april 29, 2007).

appendix. wikis in libraries
key: i = collaboration among libraries; ii = collaboration among library staff; iii = collaboration among library staff and patrons; iv = collaboration among patrons

category | description | location | wiki software
i | library success: a best practices wiki—a wiki capturing library success stories. covers a wide variety of topics. also features a presentation about wikis (http://www.libsuccess.org/index.php?title=wiki_world) | http://www.libsuccess.org/ | mediawiki
i | wiki for school library association in alaska | http://akasl.pbwiki.com/ | pbwiki
i | wiki to support reserves direct, free, open-source software for managing academic reserves materials developed by emory university | http://www.reservesdirect.org/wiki/index.php/main_page | mediawiki
i | sunyla new tech wiki—a place for state university of new york (suny) librarians to share how they are using information technologies to interact with patrons | http://sunylanewtechwiki.pbwiki.com/ | pbwiki
i | wiki for librarians and faculty members to collaborate across campuses; being used with distance learning instructors and small groups | message from robin shapiro on [dig_ref] electronic discussion list dated 10/18/2006 | unknown
i | discusses setting up three wikis in the last month: "one to support a pre-conference workshop, another for behind-the-scenes conferences planning by local organizers, and one for conference attendees to use before they arrived and during the sessions" (30) | fichter, darlene. 2006. using wikis to support online collaboration in libraries. information outlook 10, no. 1: 30–31 | unknown
i | unofficial wiki to the american library association 2005 annual conference | http://meredith.wolfwater.com/wiki/index.php?title=main_page | mediawiki
i | unofficial wiki to the 2005 internet librarian conference | http://ili2005.xwiki.com/xwiki/bin/view/main/webhome | xwiki
i | wiki for the canadian library association (cla) 2005 annual conference | http://wiki.ucalgary.ca/page/cla | mediawiki
i | wiki for south carolina library association | http://www.scla.org/governance/homepage | pmwiki
i | wiki set up to support national discussion about institutional repositories in new zealand | http://wiki.tertiary.govt.nz/~institutionalrepositories | pmwiki
i | the oregon library instruction wiki used for sharing information about library instruction | http://instructionwiki.org/ | mediawiki
i | personal repositories online wiki environment (prowe)—an online repository sponsored by the open university and the university of leicester that uses wikis and blogs to encourage the open exchange of ideas across communities of practice | http://www.prowe.ac.uk/ | unknown
i | lis wiki—space for collecting articles and general information about library and information science | http://liswiki.org/wiki/main_page | mediawiki
i | making of modern michigan—a wiki to support a state-wide digital library project | http://blog.lib.msu.edu/mmmwiki/index.php/main_page | unknown (behind firewall)
i | wiki used as a web content editing tool in a digital library initiative sponsored by emory university, the university of arizona, virginia tech, and the university of notre dame | http://sunylanewtechwiki.pbwiki.com/ | pbwiki
ii | wiki at suny stony brook health sciences library used as knowledge base | http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome; presentation can be found at: http://ms.cc.sunysb.edu/%7edachase/wikisinaction.htm | twiki
ii | wiki at york university used internally for committee work; exploring how to use wikis as a way to collaborate with users | message from mark robertson on web4lib electronic discussion list dated 10/13/2006 | unknown
ii | wiki for internal staff use at the university of waterloo; they utilize access control to restrict parts of the wiki to groups | message from chris gray on web4lib electronic discussion list dated 08/09/2006 | unknown
ii | wiki at the university of toronto for internal communications, technical problems, and as a document repository | message from stephanie walker on libref-l electronic discussion list dated 10/28/2006 | unknown
ii | wiki used for coordination and organization of portable professor program, which appears to be a collaborative information literacy program for remote faculty | http://tfpp-committee.pbwiki.com/ | pbwiki
ii | the university of connecticut libraries' staff wiki, which is a repository of information technology services documents | http://wiki.lib.uconn.edu/wiki/main_page | mediawiki
ii | wiki used at binghamton university libraries for staff intranet; features pages for committees, documentation, policies, newsletters, presentations, and travel reports | screenshots can be found at http://library.lib.binghamton.edu/presentations/cil2006/cil%202006_wikis.pdf | mediawiki
ii | wiki used at the information desk at miami university | described in: withers, rob. "something wiki this way comes." c&rl news 66, no. 11 (2005): 775–77 | unknown
ii | use of wiki as knowledge base to support reference service | http://oregonstate.edu/~reeset/rdm/ | unknown
ii | university of minnesota libraries staff web site in wiki form | https://wiki.lib.umn.edu/ | pmwiki
ii | wiki used to support the mit engineering and science libraries b-team; the wiki may no longer be active, but is still available | http://www.seedwiki.com/wiki/b-team | seedwiki
iii | a wiki that is a subject guide at st. joseph county public library in south bend, indiana | http://www.libraryforlife.org/subjectguides/index.php/main_page | mediawiki
iii | wiki used at the aiken library, university of south carolina, as a content management system (cms) | http://library.usca.edu/main/homepage | pmwiki
iii | doucette library of teaching resources wiki—a repository of resources for education students | http://wiki.ucalgary.ca/page/doucette | mediawiki
iv | wiki worldcat (wikid) is an oclc pilot project (now defunct) that allowed users to add reviews to open worldcat records | http://www.oclc.org/productworks/wcwiki.htm | unknown
iii and iv | wikiref lists reviews of reference resources—databases, books, web sites, etc.—created by butler librarians, faculty, staff, and students | http://www.seedwiki.com/wiki/butler_wikiref; reported in matthies, brad, jonathan helmke, and paul slater. using a wiki to enhance library instruction. indiana libraries 25, no. 3 (2006): 32–34 | seedwiki
iii and iv | wiki used as a subject guide at ohio university | http://www.library.ohiou.edu/subjects/bizwiki/index.php/main_page; presentation about the wiki: http://www.infotoday.com/cil2006/presentations/c101-102_boeninger.pps | mediawiki

index blending: enabling the development of definitive, discipline-specific resources
sam brooks and mark herrick
sam brooks (sbrooks@ebscohost.com) is the senior vice president of sales & marketing for ebsco information services. mark herrick (mherrick@ebscohost.com) is the vice president of business development for ebsco publishing.

index blending is the process of database development whereby various components are merged and refined to create a single encompassing source of information. once a research need is determined for a given area of study, existing resources are examined for value and possible contribution to the end product. index blending focuses on the quality of bibliographic records as the primary factor, with the addition of full text to enhance the end user's research experience as an added convenience. key examples of the process of index blending involve the fields of communication and mass media, hospitality and tourism, as well as computers and applied sciences. when academia, vendors, subject experts, lexicographers, and other contributors are brought together through the various factors associated with index blending, relevant discipline-specific research may be greatly enhanced.

as consumers, when we set out to make a purchase, we want the utmost in quality, and when applicable, quantity, and of course all of the other "appeal" factors that might be associated with a given product or service. these factors may include any number of categories, not the least of which is price. in other words, let it suffice to say that, as buyers, we want to have our cake and eat it, too. but how often is this a realistic approach to evaluating a given item for purchase?
we first must decide what is important to us, decipher the order of this importance as we see it, and evaluate our options. wouldn't it be much easier if one product in every situation had all of the factors that we deem important, and the appropriate price to go along with it? according to veliyath and fitzgerald in an article published in competitiveness review, firms can either position themselves at the high end, offering higher quality at higher prices, or at the lower end, offering lower quality at a lower price (or anywhere in between on the continuum of constant value for customers). customers, however, want more of what they value, such as convenience, speed, state-of-the-art design, quality, etc. competitors then try to differentiate themselves from their rivals along the same line of constant value, either by offering a higher quality at the same price or the same quality at a lower price (thereby increasing value for the customer).1 as such, and using a common example, is it possible to have the handling of a bmw sports car, the luxurious ride of a cadillac, the passenger space of a winnebago, the cargo space of an oversized pick-up truck, all for the price of an economy car? it's doubtful. but through recent developments in the electronic research database marketplace, and a process known as "index blending," we may be closer than ever to this ideal formula when it comes to web-based reference resources for academic libraries.

the phrase "index blending" is used here to describe an original concept/methodology initiated by ebsco publishing (ebsco). this is not to say that ebsco is the first vendor ever to have combined resources to create a new product, but to the authors' best knowledge, no other vendor has pursued the "blending" of resources to the same extent and with such a strong guiding directive as ebsco has. index blending is the combining of niche indexes and other important components to create a single definitive index for a particular discipline. as vendors seek to offer the most powerful research database for a given area of study, the pieces may come together through a combination of existing resources and proprietary development. in other words, in order to refine the tools used for research in a discipline, existing resources may be combined, fleshed out, further expanded upon, and enhanced to culminate in the archetypical index for the particular discipline. perhaps this represents the solution to the dilemma that "database choices become increasingly complex when multiple sources exist that cover the same discipline."2

the idea may seem elementary, but the process can be arduous. processes involved with index blending expand upon the basic development stages associated with creating a research database from "scratch," coupled with an increase in applicable factors, which become evident when several existing and emerging resources are involved and subsequently interwoven. as is always the case, the first step to building a solution is to identify the problem and/or the need. in database development, this is, in a nutshell, pinpointing a subject area of research that is lacking a corresponding definitive index, and where study patterns and research interest dictate a need for such a resource. this involves not only conducting surveys and engaging in discussion with advisory boards, librarians, subject experts, users, etc., but also taking a close look at the research resources that are currently available to determine value.
because the process begins with the fact that there is a problem (no definitive index for the particular area in question), the idea is to understand the strengths of available resources, as well as to identify weaknesses. through this research process, vendors can further identify independent elements of each resource that may provide significant benefit or value, as well as pinpoint the additional important pieces that are not represented in any of the available resources. in both cases (available and not available), these elements may represent various aspects associated with a research index, such as content coverage (both current and backfile), quality of indexing and abstracts, software/search functionality, thesauri, etc. once the identification and research has taken place, vendors should have the necessary knowledge to proceed to the production phase. figure 1 helps to illustrate how the index blending process can help to develop a new database that fuses together the strengths of existing resources while simultaneously compensating for any individual weaknesses that they may have.

if value is attributed to currently available databases, then, if appropriate, database acquisition may come into play. this is often a critical phase of the process, and may involve the acquisition of more than a single index. however, the desire by a vendor to acquire a given resource is based on several motivating factors, including the quality of the database as a whole, the depth and breadth of its coverage, and at times, the extreme quality of an intricate aspect of a database, which will eventually become that database's contribution to the process of index blending, thus representing its "mark" on the final product. that there is no authoritative resource available for a given subject area does not necessarily mean that certain aspects of existing resources are not of utmost quality. hence, utilizing strengths of existing resources makes sense so as to not "reinvent the wheel" when applicable. in a journal of academic librarianship article discussing the research environment in libraries and the simultaneous utilization of existing library resources, similar principles to those used in index blending are apparent. "properly combining library resources to function collectively as a cohesive, efficient unit is the basis of information integration."3 similar themes to those associated with information integration run through index blending. this is attributed largely to the fact that the basic goal of each is to enable the extraction and utilization of essential material pertinent to specific research so as to enhance the overall research process.

■ the process of index blending: an example

an interesting example of index blending utilized for a major area of study is in the case of communication and mass media.
an article in searcher outlined the development process and release of the database, communication & mass media complete, which may be the quintessential instance of the power brought about through index blending. in the article, the author first identifies the problem/need as such:

when a communication studies student approaches my reference desk, it can take a few moments before i choose a database to search. why the delay? well, to be perfectly blunt, the communication studies literature is all over the place. if the question relates to an aspect of the communications industry, i will often begin with a business database. if the question concerns the effects of media violence on children, i may choose to search one or more of the following: comabstracts, psychinfo [sic], sociological abstracts, eric, and even a few large aggregators, such as wilsonweb's omnifile and ebsco's academic search premier. in addition, there is the question of finding a single database that covers the communication science and disorders field and the more mass media-focused communication studies field. the result has been a searching strategy that relies on consulting multiple databases—a strategy that may not please impatient or inexperienced patrons. the need for such an assortment of databases is symptomatic of the discipline. the field of communication studies is extremely interdisciplinary. the discipline's roots began in the study of rhetoric and journalism and now encompass subjects ranging from political communication to film studies to advertising to journalism to communication disorders to digital convergence and to every manner of media. the discipline has strong roots in the social sciences, but also draws heavily on the humanities and the sciences. as some have put it, there is an aspect of communication studies in every discipline. this leaves librarians with the difficult task of finding a single database that covers this wide-ranging discipline. enter ebsco's new communication & mass media complete database.4

figure 1. the index blending process

this overview of the need for a comprehensive resource in areas related to communication and mass media is indicative of the type of information that vendors must extract when deciding their course of action for creating (or not creating) a database to meet such needs. in this instance, the need became apparent to ebsco upon conducting investigative research in this direction. there were certainly important, quality resources available covering some of the subject areas and subdisciplines, but not a single, all-encompassing resource. hence, the table was set to move forward and begin the process of database development using the process of index blending. once the need for a comprehensive communication and mass media database was established, ebsco began the phases of looking closely at available resources and gathering specific important details about what was required to develop such a database. in order to understand the finer details and make appropriate forward progress in formulating an index for a given research area, a dedicated group of subject experts (advisory board, indexers, lexicographers, etc.) must be established. in addition, aggregators must develop appropriate relationships and key partnerships.
in the case of the database communication & mass media complete, ebsco worked diligently to assemble a panel of experts to provide direction. often, suggestions made by advisory board members ultimately led to larger organizational partnerships. the first of ebsco's major partnerships for the benefit of the development of communication & mass media complete was with the national communication association (nca). nca is the oldest and largest national organization to promote communication scholarship and education. founded in 1914, the nca is a nonprofit organization of approximately 7,100 educators, practitioners, and students who work and reside in every u.s. state and more than twenty countries. the purpose of the association is to promote study, criticism, research, teaching, and application of the artistic, humanistic, and scientific principles of communication. nca is a scholarly society and, as such, works to enhance the research, teaching, and service produced by its members on topics of both intellectual and social significance. staff at the nca national office follows trends in national research, teaching, and service priorities. it relays those opportunities to its members and represents the academic discipline of communication in those national efforts.5

in addition to providing insight and advice into the areas associated with communication and mass media, nca found in ebsco an ideal partner to further the tremendous efforts the organization had put into its database, commsearch. commsearch, in its original form, was a scholarly communication database with deep, archival coverage of the journals of the nca and other major journals in the field of communication studies. the database provided bibliographic and keyword references to twenty-six journals in communication studies with coverage extending to the inaugural issue of each—some from as far back as the early decades of the twentieth century. the database also included cover-to-cover indexing of the nca's first six journals (from their first editions to the present) and author-supplied abstracts from their earliest appearance in nca journals. as ebsco's goals were in line with the nca in terms of improving scholarly research in areas surrounding communication as well as enhancing the dissemination of applicable materials, a partnership was formed, and ebsco acquired commsearch. the company acquired this database with the intent to enhance the collection through content additions such that it would take residence immediately as a core component of communication & mass media complete.

the second major database acquisition came about similarly to the commsearch arrangement; only this time, ebsco worked closely with penn state university, the developers of a database called mass media articles index. created by jack pontius and maintained by the penn state libraries since 1984, mass media articles index provided citation coverage for over forty thousand articles on mass media published in over sixty research journals, as well as major journalism reviews, recent encyclopedias, and handbooks in the area of communications studies. this database, which was once a stand-alone research tool, is a good example of how a good-quality resource can arise out of the passion and unique vision of an individual, yet never fully develop into its full potential due to a lack of funding, dedicated staff, and experience in database publishing.
seeing the incredible potential of mass media articles index, ebsco earmarked this database as the second major component in its larger communication and mass media product. as mentioned, the basic idea with index blending is to pinpoint the best and most important aspects of each database to carry forward into the final product. it is at this point that difficulty typically arises in the normalization of data. once core database components are determined, a vendor's expertise in building databases, standardizing entries, etc., comes to the forefront. furthermore, because another basic ingredient to the process of index blending revolves around additional material included by the database developer, that aggregator has the burden of taking the core building blocks of the database and elevating these raw materials to the point where their combination and refinement become the desired end result—a definitive, cohesive index to research in the subject area.

with this in mind, ebsco carefully selected the indexing components of each resource that were essential to carry forward and substantially expanded the abstracting and indexing coverage of appropriate journals in commsearch and mass media articles index. the company also added indexing and abstracts for many more of the important titles in the communication and mass media fields that were not covered by these databases. through its initial research, ebsco gained a thorough knowledge of which journals and other content sources were not covered by the two acquired databases, and worked to provide coverage for those missing sources. as such, the idea with this database was to cover all appropriate, quality titles indexed in all other currently available communication and mass media-specific databases combined, as well as other important journals not previously covered by any such database. further still, the company took the database to new levels through the creation and deployment of features such as searchable cited references and index browsing. figure 2 provides a visual interpretation of the elements associated with this particular example of index blending.

figure 2. indexing components of communication & mass media complete

often academic librarians consider aggregated full-text databases as a means for accessing full-text information quickly, but with a negative outlook toward the quality of the indexing included in these databases. however, it is ebsco's intention to create first and foremost a powerful index, such that any full text included is that much easier to locate and utilize. according to cleveland and cleveland in the book introduction to indexing and abstracting, 3rd ed., "in any retrieval system, success or failure depends on the adequacy of the indexing and the related searching procedures."6 ebsco wholeheartedly agrees with this statement. and though the company is the leader in providing full-text databases, it continues to raise the bar for these databases through not only constantly increasing the quality and quantity of full text, but also by enhancing indexing, abstracts, and associated search functionality. a database may provide the greatest collection of full text, yet it is still only as good as its underlying indexing framework that guides users to the appropriate content. index blending allows for this ideal because the development of the indexing takes place at the onset as the primary objective, and full text may be included at a later stage.
this is precisely the case with ebsco’s communication/communications database where the first iteration of the collection (communication & mass media index) did not include full text, and the complete (full­text version) was soon to follow. thus, in the case of communication & mass media complete, once the core elements for the index were in place, refined, and normalized, ebsco moved forward in the area of full­text content. in addition to the inclu­ sion of full text for all of the nca journals, which david oldenkamp refers to as “heavyweights in communication studies,” ebsco included full­text coverage for nearly 230 titles. according to oldenkamp, as of april 2004, the competing database with the next largest number of publications covered in full text included only sixteen full­text titles.7 though index blending is not the traditional way in which to build a database, and may actually be the most labor­intensive way in which to proceed, the end results can be remarkable when done properly. using this process, “ebsco has managed to create the largest and most comprehensive database serving the needs of communication studies scholars, faculty, students, and librarians.”8 in addition, a review published in the charleston advisor determined that “ebsco has brought together two reliable but atrophied resources and refreshed them with new search capabilities and added content, such as abstracts. these have been combined with a healthy dose of ‘not indexed anywhere’ new titles and interdisciplinary sources to create a comprehensive figure 2. indexing components of communication & mass media complete public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 31index blending | brooks and herrick 31 resource that will satisfy the needs of students, faculty, and researchers.”9 n another example of index blending hospitality & tourism index index blending is a concept as much as it is a process and a means to an end. much like applying a particular theory to a number of different instances, index blending is inter­disciplinary in application. thus, the area of com­ munication/communications as described previously, is simply an example of practical implementation of this concept, and a particular way in which the process was approached given the specific elements involved. another discipline to which index blending has been applied is the niche areas related to hospitality and tourism. according to professor vivienne sario, director of travel and tourism at community college of southern nevada, “on a global basis the hospitality and tourism industry employs more than 10 percent of the worldwide workforce. it contributes over $4 trillion in gross global output. this means travel and tourism is the world’s largest industry.”10 though still considered (perhaps incorrectly) a “niche” area of study, the number of hospitality and tourism programs supported in colleges and universities around the globe has also increased to the point where dozens and dozens of twoand four-year academic institutions provide related courses of study. from a business perspective, in order to justify the amount of resources that would inevitably be expended to develop a high-end, comprehensive database, the basic criteria needed for database development must first be in place. considering the economic vastness of the hospitality and tourism industry, the interest and research need is quite apparent. 
if there is at least one clearly definitive academic resource covering the subject area, in all likelihood, the decision would be made to cease exploration and development in that area. contrarily, when ebsco conducted exhaustive research to determine the need for a new index to literature in the areas of hospitality and tourism, the unanimous conclusion was to move forward in the development of a product that would go above and beyond the level of the existing resources. this is not to say that quality was not inherent in some of the existing resources. in actuality, the fact that there were already quality (albeit perhaps incomplete) resources available, paved the way for utilizing principles of index blending in the development of a more comprehensive resource. the first element of what was to become ebsco’s hospitality & tourism index was purdue university’s lodging, restaurant, & tourism index (lrti). as an indi­ cator of the level of emphasis attributed to this subject area by the university, purdue’s hospitality and tourism management undergraduate program was ranked num­ ber one nationally by a survey published in the journal of hospitality & tourism education.11 a previous survey conducted by the same journal used a different method­ ology and sample, but still ranked purdue’s hospitality and tourism management (htm) program number one in the nation.12 to provide insight into the purdue htm program, the origins and history of lrti, the need for a compre­ hensive database, and the university’s decision to work with ebsco, questions were asked of two prominent purdue faculty members: raphael kavanaugh, head, hospitality and tourism management department, and priscilla geahigan, head, consumer and family sciences library. the following is taken from e­mail cor­ respondence among one of the authors (sam brooks), kavanaugh, and geahigan: brooks: how long has purdue offered a hospitality & tourism management program? kavanaugh: the program began in 1928 as the department of institutional management. brooks: when and why did purdue decide to create the lodging restaurant & tourism index (lrti)? kavanaugh: to fill a serious void of access to relevant research conducted related to the industry. geahigan: before 1990 coverage of the hospitality industry within business indexes and databases was limited. to meet the needs of researchers and students, purdue’s restaurant, hotel, institutional, and tourism management department, an in­house indexing project, started in the purdue consumer and family sciences library in 1977. citations of articles from scholarly and trade journals were entered on index cards, filed by subject headings. in 1985 the project became more for­ malized and migrated into partnership with a few other academic institutions. a printed index titled lodging and restaurant index started. in 1987, purdue became the sole producer of the index. in 1995, the index was renamed the lodging, restaurant, and tourism index (lrti), with expanded scope and coverage. over the years, data diskettes and cd­rom formats were added to the printed version. brooks: how important are “niche” or subject­specific databases to support research in a given area such as h&t? geahigan: in contrast to earlier years, students can now get their information from a multitude of databases and 32 information technology and libraries | june 200732 information technology and libraries | june 2007 venues. at purdue, we have databases that cover all aspects of business and management. 
undergraduate students often get confused and impatient at the large number of databases offered. a subject specific database like hti gives them a place to start without feeling lost. brooks: why did purdue decide to partner with ebsco, and subsequently merge lrti in the larger hospitality & tourism index (hti)? geahigan: we realized that we do not have the resources to support a database that measures up to industry technology standards and have long decided to look for a company to take over lrti. ebsco’s offer was attrac­ tive to purdue because of their willingness to assume future indexing of the lrti journals. in addition, many purdue students are already familiar with the ebsco interface because we have numerous other ebsco hosted databases. we are pleased that lrti became the foundation of ebsco’s building of hti.13 the second foundational component of the database also came about through acquisition from an academic institution. articles in hospitality and tourism was copro­ duced by oxford brookes university and the university of surrey. bournemouth university was also a source of data for this database between the years of 1988 and 1998. this database provided details of more than forty­six thousand english­language articles selected from more than 330 relevant academic and trade journals published worldwide from 1984 to 2003.14 rounding out the list of three existing resources that were acquired by ebsco, the hospitality database (acquired from the original developers at cornell university) was also assimilated into the new hospitality and tourism database. the hospitality database evolved from the print publication bibliography of hotel management and related subjects that was originally established in the 1950s by blanche fickle, the first director of the library at cornell university’s school of hotel administration.15 this database, founded on the vision of ms. fickle, would serve as a core resource for ebsco’s new hospitality & tourism index by providing it with a foundation of quality indexing for journals related to the study of hotel adminis­ tration and management. ebsco completed the initial development of its hospitality and tourism database by reviewing applicable subscription statistics maintained by its sister company, ebsco subscription services, in order to locate other publications relevant to the various subdisciplines of hospitality and tour­ ism. any such publications that were not already indexed by the other three existing resources were targeted for inclusion in the new hospitality & tourism index. figure 3 provides a visual interpretation of the ele­ ments associated with this particular example of index blending. following the initial release of hospitality & tourism index, in order to provide an even more inclusive research experience, ebsco proceeded to develop and release a full­text version of this resource entitled hospitality & tourism complete. this new variant of the database offers users the same indexing infrastructure as hospitality & tourism index, as well as provides the additional benefit of immediate access to relevant full­text content. while the availability of full text is certainly of immense value, it is still the quality of underlying indexing that allows this database to be regarded as truly innovative. 
in fact, this same perspective was echoed in a recent review in choice where the author states that “hospitality & tourism complete indexes its specialized subject area bet­ ter than any other product currently available.”16 n the whole is greater than the sum of its parts the process of index blending not only brings together content from a variety of resources, it also has the power to increase the research value of that same content. by combining such content under the umbrella of a single comprehensive database, pertinent information can now be more efficiently accessed and cross­referenced with other relevant content. previously, the same body of information could only be explored via a highly ineffec­ tive, piecemeal research process. one last example that demonstrates this potential increase in research value is found in the computers & figure 3. indexing components of hospitality & tourism complete public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 33index blending | brooks and herrick 33 applied sciences complete database. this resource was shaped through the acquisition and merger of three distinct indexes—computer science index (csi), internet & personal computing abstracts (ipca), and information science & technology abstracts (ista)—and rounded out with addi­ tional indexed content relevant to the larger discipline. this resulted in a total of 1,100 active journals indexed back as far as 1965. then, after two years of dedicated licensing work with pub­ lishers, full text for more than 570 of those titles was added to provide more direct access to such content for researchers. figure 4 illustrates how the various subject areas (unique and shared) covered by the three original databases were merged together in the blending process. from this diagram, it is apparent that the original three databases were already quality resources in their own right and adequately rep­ resented their respective subject areas. however, it should also be apparent that, through the pro­ cess of index blending, the value of the original databases has been enhanced via the fusion of their unique, yet complementary content into a single comprehensive resource. n conclusion though the above examples of communication & mass media complete, hospitality & tourism index, and computers & applied sciences complete represent only three of sev­ eral subject­specific databases culminating from the process of index blending, most database producers (including ebsco) would likely agree that this is not a common procedure for database development. however, the knowledge that a company derives from the pro­ cess often has a significant impact on the company’s other, “nonblended” databases. index blending typically requires a high degree of refinement in order to be fully successful, so when a company engages in this rigorous developmental process, the newfound experience and expertise gained from it may spill over into the com­ pany’s other database initiatives. end users may notice improved indexing, abstracts, and other valuable com­ ponents that are now included in other more established full­text resources from the same vendor. databases that were once viewed simply as “aggregated full­text data­ bases” may be looked upon in a different light after the company adopts the process of index blending for other, unrelated database projects. 
though these databases may still provide easy access to an abundance of full­text content, they may also now be considered the definitive index for their respective subject area(s). therefore, when a company implements the practice of index blending for some of its products, the resulting effects are two­ fold. the databases created directly as a result of the index blending process are the first to benefit, and the company’s other databases (including those with full text) may also benefit from index blending in an indirect manner. in the end, however, the success of any index blending initiative is measured by the level of benefit that it provides to applicable researchers and other users of the resulting databases. references 1. rajaram veliyath and elizabeth fitzgerald, “firm capabil­ ities, business strategies, customer preferences, and hypercom­ petitive arenas: the sustainability of competitive advantages with implications for firm competitiveness,” competitiveness review 10 (2000): 56–82. 2. m. suzanne brown, jana s. edwards, and jeneen lasee­ willemssen, “a new comparison of the current index to jour­ nals in education and the education index: a deep analysis of indexing,” the journal of academic librarianship 25 (may 1999): 216–22. 3. sam brooks, “integration of information resources and collection development strategy,” the journal of academic librarianship 27 (july 2001): 316–19. 4. david oldenkamp, “ebsco’s new communication and mass media complete (cmmc) database,” searcher 12, no. 4 (apr. 2004): 40. 5. national communication association web site. http:// www.natcom.org (accessed aug. 2004). figure 4. subject areas of component databases are merged into a cohesive whole through index blending 34 information technology and libraries | june 200734 information technology and libraries | june 2007 6. donald b. cleveland and ana d. cleveland, introduction to indexing and abstracting, 3rd ed. (greenwood village, colo.: libraries unlimited, 2001): 26. 7. oldenkamp, “ebsco’s new communication and mass media complete (cmmc) database.” 8. ibid. 9. dodie owens, “advisor reviews—standard review: communication and mass media complete,” the charleston advisor 6, no. 4 (apr. 2005): 45. 10. vivienne sario, “hospitality & tourism programs,” http://www.studyusa.com/articles/hospitality.asp (accessed june 1, 2006). 11. purdue university web site. http://news.uns.purdue. edu/uns/html4ever/030130.kavanaugh.rank2003.html (accessed june 1, 2006). 12. michael g. brizek and mahmood a. khan, “ranking of u.s. hospitality undergraduate programs: 2000–01,” journal of hospitality & tourism education 14, no. 2 (2002): 4. 13. raphael kavanaugh and priscilla geahigan, e­mail mes­ sage with author sam brooks, feb. 3, 2005. 14. articles in hospitality and tourism web site (hosted by the university of surrey). http://libweb.surrey.ac.uk/aht2/about .asp (accessed june 1, 2006). 15. cornell university’s school of hotel administration web site. http://www.nestlelib.cornell.edu/history.html (accessed june 1, 2006). 16. s. c. awe, “reference­social and behavioral sciences— hospitality & tourism complete,” choice 43, no. 10 (june 2006). testing information literacy in digital environments | katz 3 despite coming of age with the internet and other technology, many college students lack the information and communication technology (ict) literacy skills necessary to navigate, evaluate, and use the overabundance of information available today. 
this paper describes the development and early administrations of ets’s iskills assessment, an internet-based assessment of information literacy skills that arise in the context of technology. from the earliest stages to the present, the library community has been directly involved in the design, development, review, field trials, and administration to ensure the assessment and scores are valid, reliable, authentic, and useful. t echnology is the portal through which we interact with information, but there is growing belief that people’s ability to handle information—to solve problems and think critically about information—tells us more about their future success than does their knowledge of specific hardware or software. these skills—known as information and communications technology (ict) literacy—comprise a twenty­first­century form of literacy in which researching and communicating information via digital environments are as important as reading and writing were in earlier centuries (partnership for 21st century skills 2003). although today’s knowledge society challenges stu­ dents with overabundant information of often dubious quality, higher education has recognized that the solution cannot be limited to improving technology instruction. instead, there is an increasingly urgent need for students to have stronger information literacy skills—to “be able to recognize when information is needed and have the ability to locate, evaluate, and use effectively the needed information” (american library association 1989)—and apply those skills in the context of technology. regional accreditation agencies have integrated information lit­ eracy into their standards and requirements (for example, middle states commission on higher education 2003; western association of schools and colleges 2001), and several colleges have begun campuswide initiatives to improve the information literacy of their students (for example, the california state university 2006; university of central florida 2006). however, a key challenge to designing and implementing effective information lit­ eracy instruction is the development of reliable and valid assessments. without effective assessment, it is difficult to know if instructional programs are paying off—whether students’ information literacy skills are improving. ict literacy skills are an issue of national and inter­ national concern as well. in january 2001, educational testing service (ets) convened an international ict literacy panel to study the growing importance of exist­ ing and emerging information and communication tech­ nologies and their relationship to literacy. the results of the panel’s deliberations over fifteen months highlighted the growing importance of ict literacy in academia, the workplace, and society. the panel called for assessments that will make it possible to determine to what extent young adults have obtained the combination of techni­ cal and cognitive skills needed to be productive mem­ bers of an information­rich, technology­based society (international ict literacy panel 2002). this article describes ets’s iskills assessment (for­ merly “ict literacy assessment”), an internet­based assessment of information literacy skills that arise in the context of technology. from the earliest stages to the pres­ ent, the library community has been directly involved in the design, development, review, field trials, and admin­ istration to ensure the assessment and scores are valid, reliable, authentic, and useful. 
■ motivated by the library community although the results of the international ict literacy panel provided recommendations and a framework for an assessment, the inspiration for the current iskills assessment came more directly from the higher educa­ tion and library community. for many years, faculty and administrators at the california state university (csu) had been investigating issues of information literacy on their campuses. as part of their systemwide information competence initiative that began in 1995, researchers at csu undertook a massive ethnographic study to observe students’ research skills. the results suggested a great many shortcomings in students’ infor­ mation literacy skills, which confirmed librarian and classroom faculty anecdotal reports. however, clearly such a massive data collection and analysis effort would be unfeasible for documenting the information literacy skills of students throughout the csu system (dunn 2002). gordon smith and the late ilene rockman, both of the csu chancellor ’s office, discussed with ets the idea of developing an assessment of ict literacy that could support csu’s information competence initiative as well as similar initiatives throughout the higher edu­ cation community. irvin r. katz irvin r. katz (ikatz@ets.org) is senior research scientist in the research and development division at educational testing service. testing information literacy in digital environments: ets’s iskills assessment � information technology and libraries | september 2007� information technology and libraries | september 2007 ■ national higher education ict literacy initiative in august 2003, ets established the national higher education ict literacy initiative, a consortium of seven colleges and universities that recognized the need for an ict literacy assessment targeted at higher educa­ tion. representatives of these institutions collaborated with ets staff to design and develop the iskills assessment. the consortium built upon the work of the international panel to explicate the nature of ict literacy in higher education. over the ensuing months, repre­ sentatives of consortium institutions served as subject­ matter experts for the assessment design and scoring implementation. the development of the assessment followed a process known as evidence­centered design (mislevy, steinberg, and almond 2003), a systematic approach to the design of assessments that focuses on the evidence (student performance and products) of proficiencies as the basis for constructing assessment tasks. through the evidence­ centered design process, ets staff (psychometricians, cognitive psychologists, and test developers) and sub­ ject­matter experts (librarians and faculty) designed the assessment by considering first the purpose of the assess­ ment and by defining the construct—the knowledge and skills to be assessed. these decisions drove discussions of the types of behaviors, or performance indicators, to serve as evidence of student proficiency. finally, simulation­ based tasks designed around authentic scenarios were crafted to elicit from students the critical performance indicators. katz et al. (2004) and brasley (2006) provide a detailed account of this design and development process, illustrating the critical role played by librarians and other faculty from higher education. 
■ ict literacy = information literacy + digital environments consortium members agreed with the conclusions of the international ict literacy panel that ict literacy must be defined as more than technology literacy. college students who grew up with the internet (the “net generation”) might be impressively technologically literate, more accepting of new technology, and more technically facile than their parents and instructors (oblinger and oblinger 2005). however, anecdotally and in small­scale studies, there is increasing evidence that students do not use technology effectively when they conduct research or communicate (rockman 2004). many educators believe that students today are less information savvy than earlier generations despite having powerful information tools at their disposal (breivik 2005). ict literacy must bridge the ideas of information literacy and technology literacy. to do so, ict literacy draws out the technology­related components of infor­ mation literacy as specified in the often­cited standards of the association of college and research libraries (acrl) (american library association 1989), focusing on how students locate, organize, and communicate information within digital environments (katz 2005). this conflu­ ence of information and technology directly reflects the “new illiteracy” concerns of educators: students quickly adopt new technology, but do not similarly acquire skills for being critical consumers and ethical producers of information (rockman 2002). students need training and practice in ict literacy skills, whether through general education or within discipline coursework (rockman 2004). the definition of ict literacy adopted by the con­ sortium members reflects this view of ict literacy as information literacy needed to function in a technological society: ict literacy is the ability to appropriately use digital technology, communication tools, and/or networks to solve information problems in order to function in an information society. this includes having the ability to use technology as a tool to research, organize, and communicate information and having a fundamental understanding of the ethical/legal issues surrounding accessing and using information (katz et al. 2004, 7). consortium members further refined this defini­ tion, identifying seven performance areas (see figure 1). these areas mirror the acrl standards and other related standards, but focus on elements that were judged most central to being sufficiently information literate to meet the challenges posed by technology. ■ ets’s iskills assessment ets’s iskills assessment is an internet­delivered assess­ ment that measures students’ abilities to research, orga­ nize, and communicate information using technology. the assessment focuses on the cognitive problem­solving and critical­thinking skills associated with using technol­ ogy to handle information. as such, scoring algorithms target cognitive decision­making rather than technical competencies. the assessment measures ict literacy through the seven performance areas identified by con­ sortium members, which represent important problem­ solving and critical­thinking aspects of ict literacy skill (see figure 1). assessment administration takes approx­ imately seventy­five minutes, divided into two sec­ tions lasting thirty­five and forty minutes, respectively. article title | author 5testing information literacy in digital environments | katz 5 figure 1. 
components of ict literacy

define: understand and articulate the scope of an information problem in order to facilitate the electronic search for information, such as by:
■ distinguishing a clear, concise, and topical research question from poorly framed questions, such as ones that are overly broad or do not otherwise fulfill the information need;
■ asking questions of a "professor" that help disambiguate a vague research assignment; and
■ conducting effective preliminary information searches to help frame a research statement.

access: collect and/or retrieve information in digital environments. information sources might be web pages, databases, discussion groups, e-mail, or online descriptions of print media. tasks include:
■ generating and combining search terms (keywords) to satisfy the requirements of a particular research task;
■ efficiently browsing one or more resources to locate pertinent information; and
■ deciding what types of resources might yield the most useful information for a particular need.

evaluate: judge whether information satisfies an information problem by determining authority, bias, timeliness, relevance, and other aspects of materials. tasks include:
■ judging the relative usefulness of provided web pages and online journal articles;
■ evaluating whether a database contains appropriately current and pertinent information; and
■ deciding the extent to which a collection of resources sufficiently covers a research area.

manage: organize information to help you or others find it later, such as by:
■ categorizing e-mails into appropriate folders based on a critical view of the e-mails' contents;
■ arranging personnel information into an organizational chart; and
■ sorting files, e-mails, or database returns to clarify clusters of related information.

integrate: interpret and represent information, such as by using digital tools to synthesize, summarize, compare, and contrast information from multiple sources while:
■ comparing advertisements, e-mails, or web sites from competing vendors by summarizing information into a table;
■ summarizing and synthesizing information from a variety of types of sources according to specific criteria in order to compare information and make a decision; and
■ re-representing results from an academic or sports tournament into a spreadsheet to clarify standings and decide the need for playoffs.

create: adapt, apply, design, or construct information in digital environments, such as by:
■ editing and formatting a document according to a set of editorial specifications;
■ creating a presentation slide to support a position on a controversial topic; and
■ creating a data display to clarify the relationship between academic and economic variables.

communicate: disseminate information tailored to a particular audience in an effective digital format, such as by:
■ formatting a document to make it more useful to a particular group;
■ transforming an e-mail into a succinct presentation to meet an audience's needs;
■ selecting and organizing slides for distinct presentations to different audiences; and
■ designing a flyer to advertise to a distinct group of users.

during this time, students respond to fifteen interactive, performance-based tasks. each interactive task presents a real-world scenario, such as a class or work assignment, that frames the information problem.
students solve information­handling tasks in the context of simulated software (for example, e­mail, web browser, library database) having the look and feel of typical applications. there are fourteen three­ to five­minute tasks and one fifteen­minute task. the three­ to five­minute tasks target a single perfor­ mance area, while the fifteen­minute tasks comprise more complex problem­solving scenarios that target multiple performance areas. the simpler tasks contribute to the overall reliability of the assessment, while the more com­ plex task focuses on the richer aspects of ict literacy performance. in the assessment, a student might encounter a sce­ nario that requires him or her to access information from a database using a search engine (see figure 2). the results are tracked and strategies scored based on how he or she searches for information, such as key words chosen, search strategies refined, and how well the information returned meets the needs of the task. the assessment tasks each contain mechanisms to keep students from pursuing unproductive actions in the simulated environment. for example, in an internet browsing task, when the student clicks on an incorrect link, he might be told that the link is not needed for the current task. this message cues the student to try an alter­ native approach while still noting for scoring purposes that the student made a misstep. in a similar way, the student who fails to find useful (or any) journal articles in her database search might receive an instant message from a “teammate” providing her with a set of journal articles to be evaluated. these mechanisms potentially keep students from becoming frustrated (for example, via a fruitless search) while providing the opportunity for the students to demonstrate other aspects of their skills (for example, evaluation skills). the scoring for the iskills assessment is completely automated. unlike a multiple­choice question, each simu­ lation­based task provides many opportunities to collect information about a student and allows for alternative paths leading to a solution. scored responses are pro­ duced for each part of a task, and a student’s overall score on the test accumulates the individual scored responses across all assessment tasks. the assessment differs from existing measures in sev­ eral ways. as a large­scale measure, it was designed to be administered and scored across units of an institution or across institutions. as a simulation­based assessment, the tasks go beyond what is possible in multiple­choice format, providing students with the look and feel of interactive digital environments along with tasks that elicit higher­order critical­thinking and problem­solving skills. as a scenario­based assessment, students become engaged in the world of the tasks, and the task scenarios describe the types of assignments students should be see­ ing in their ict literacy instruction as well as examples of workplace and personal information problems. ■ two levels of assessments the iskills assessment is offered at two levels: core and advanced. the core level was designed to assess readi­ ness for the ict literacy demands of college. it is targeted at high school seniors and first­year college students. the advanced level was designed to assess readiness for the ict literacy challenges in transitioning to higher­level college coursework, such as moving from sophomore to junior year or transferring from a two­year to a four­year institution. 
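returning briefly to the automated scoring described above: the sketch below shows how scored responses from each part of a task can accumulate into an overall score. the task structure, point values, and function name are illustrative assumptions, not ets's actual scoring algorithms.

```python
# illustrative sketch of accumulating scored responses across simulation tasks;
# the task structure and point values are assumptions, not ets's algorithms.

def score_assessment(task_responses):
    """each task yields several scored responses (full or partial credit);
    the overall score is the accumulation across all assessment tasks."""
    total = 0.0
    maximum = 0.0
    for task in task_responses:
        for earned, possible in task:   # one (earned, possible) pair per scored element
            total += earned
            maximum += possible
    return total, maximum

# e.g. a short "access" task scored on keyword choice and result selection,
# plus part of a longer integrate/evaluate task
tasks = [
    [(1, 1), (0.5, 1)],
    [(1, 1), (1, 1), (0, 1)],
]
earned, possible = score_assessment(tasks)
print(f"{earned} of {possible} points")   # overall score before any scaling
```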
the advanced level targets students in their second or third year of post­secondary study. the key difference between the core and advanced levels is in the difficulty of the assessment tasks. tasks in the core level are designed to be easier; examinees are presented with fewer options, the scenarios are more straightforward, and the reasoning needed for each step in a task is simpler. an advanced task might require an individual to infer the search terms needed from a gen­ eral description of an information need; the correspond­ ing core task would state the information need more explicitly. in a task of evaluating web sites, the core level might present a web site with many clues that it is not figure 2. in the iskills assessment, students demonstrate their skills at handling information through interaction with simulated software. in this example task, students develop a search query as part of a research assignment on earthquakes. © 2007 educational testing service. all rights reserved. article title | author 7testing information literacy in digital environments | katz 7 authoritative (a “.com” url, unprofessional look, content that directly describes the authors as students). the cor­ responding advanced task would present fewer cues of the web site’s origin (for example, a professional look, but careful reading reveals the web site is by students). ■ score reports for individuals and institutions both levels of the assessment feature online delivery of score reports for individuals and for institutions. the individual score report is intended to help guide students in their learning of ict literacy skills, aiding identifica­ tion of students who might need additional ict literacy instruction. the report includes an overall ict literacy score, a percentile score, and individualized feedback on the student’s performance (see figure 3). the percentile compares students to a reference group of students who took the test in early 2006 and who fall within the target population for the assessment level (core or advanced). as more data are collected from a greater number of institutions, these reference groups will be updated and, ideally, approach nationally representative norms. score reports are available online to students, usually within one week. high schools, colleges, and universities receive score reports that aggregate results from the test­takers at their institution. the purpose of the reports is to provide an overview of the students in comparison with a reference group. these reports are available to institutions online after at least fifty students have taken either the core or advanced level test—that is, when there are sufficient num­ bers to allow reporting of reliable scores. figure 4 shows a graph from one type of institutional report. users have the option to specify the reference group (for example, all students, all students at a four­year institution) and the subset of test­takers to compare to that group (for exam­ ple, freshmen, students taking the test within a particular timeframe). a second report summarizes the performance feedback of the individual reports, providing percentages of students who received the highest score on each aspect of performance (each of the fourteen short tasks are scored on two or three different elements). finally, institutions can conduct their own analyses by downloading the data of their test­takers, which include each student’s responses to the background questions, iskills score, and responses to institution­specified questions. 
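the percentile reported to individuals is simply the test-taker's standing relative to the chosen reference group. a rough sketch follows; the handling of tied scores and the sample numbers are assumptions, since the article does not state ets's exact convention.

```python
# rough sketch of a percentile relative to a reference group of earlier
# test-takers; the treatment of ties is an assumption, not ets's definition.
from bisect import bisect_left

def percentile_rank(score, reference_scores):
    ref = sorted(reference_scores)
    below = bisect_left(ref, score)       # count of reference scores strictly below
    return 100.0 * below / len(ref)

reference_group = [310, 355, 400, 420, 450, 475, 500, 520, 560, 610]  # hypothetical scores
print(percentile_rank(475, reference_group))   # -> 50.0
```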
■ testing the test

a variety of specialists contributed to the development of ets's iskills assessment: librarians, classroom faculty, education administrators, assessment specialists, researchers, user-interface and graphic designers, and systems developers. the team's combined goal was to produce a valid, reliable, authentic assessment of ict literacy skills. before the iskills assessment produced official scores for test-takers, these specialists—both ets and ict literacy experts—subjected the assessment to a variety of review procedures at many stages of development. these reviews ranged from weekly teleconferences with consortium members during the initial development of assessment tasks (january–july 2004), to small-scale usability studies in which ets staff observed individual students completing assessment tasks (or mockups of assessment tasks), to field trials that mirrored actual test delivery. the usability studies investigated students' comprehension of the tasks and testing environment as well as the ease of use of the simulated software in the assessment tasks. the field trials provided opportunities to collect performance data and test the automated scoring algorithms. in some cases, ets staff fine-tuned the scoring algorithms (or developed alternatives) when the scores produced were not psychometrically sound, such as when one element of students' scores was inconsistent with their overall performance.

figure 3. first page of a sample score report for an individual. the subsequent pages contain additional performance feedback.
figure 4. sample portion of an institutional score report: comparison between a user-specified reference group and data from the user's institution.

through these reviews and field trials, the iskills assessment evolved to its current form, targeting and reporting the performance of individuals who complete the seventy-five-minute assessment. in some cases, feedback from experts and field trial participants led to significant changes. for example, the iskills assessment began in 2005 as a two-hour assessment (at that time called the ict literacy assessment) that reported scores only to institutions on the aggregated performance of their participating students. some students entering higher education found the 2005 assessment excessively difficult, which led to the creation of the easier core level assessment.

table 1 outlines the participation volumes for the field trials and test administrations. during each field trial, as well as during the institutional administration, feedback was collected from students on their experience with the test via a brief exit survey. table 2 summarizes some results of the exit survey. student reactions to the test were reasonably consistent: most students enjoyed taking the test and found the tasks realistic. in written comments, students taking the institutional assessment found the experience rewarding but exhausting, and thought the amount of reading excessive. student feedback directly influenced the design of the core and advanced level assessments, including the shorter test-taking time and lighter reading load compared with the institutional assessment.

table 1. chronology of field trials and test administrations
date | administration | approximate no. of students | approximate no. of participating institutions
july–september 2004 | field trials for institutional assessment | 1,000 | 40
january–april 2005 | institutional assessment | 5,000 | 30
may 2005 | field trials for alternative individual assessment structures | 400 | 25
november 2005 | field trials for advanced level individual assessment | 700 | 25
january–may 2006 | advanced level individual assessment | 2,000 | 25
february 2006 | field trials for core level individual assessment | 700 | 30
april–may 2006 | core level individual assessment | 4,500 | 45
august–december 2006 | core level: continuous administration | 2,100 | 20
august–december 2006 | advanced level: continuous administration | 1,400 | 10
note: items in bold represent "live" test administrations in which score reports were issued to institutions, students, or both.

as shown in table 1 (bolded rows), test administrations in 2005 and early 2006 occurred within set time frames. beginning in august 2006, the core and advanced level assessments switched to continuous testing: instead of a specific testing window, institutions create testing sessions to suit the convenience of their resources and students. the tests are still administered in a proctored lab environment, however, to preserve the integrity of the scores.

table 2. student feedback from the institutional assessment and individual assessments' field trials
statement | % agreeing: institutional assessment (n=4,898) | advanced level field trials (n=736) | core level field trials (n=648)
i enjoyed taking this test. | 61 | 59 | 67
this test was appropriately challenging. | 90 | 90 | 86
i have never taken a test like this one before. | 90 | 90 | 89
to perform well on this test requires thinking skills as well as technical skills. | 95 | 93 | 94
i found the overall testing interface easy to use (even if the tasks themselves might have been difficult). | 83 | 82 | 85
my performance on this test accurately reflects my ability to solve problems using computers and the internet. | 63 | 56 | 67
i didn't take this test very seriously. | 25 | 25 | 23
the tasks reflect activities i have done at school, work, or home. | 79 | 77 | 78
the software tools were unrealistic. | n/a | 21 | 24

■ student performance

almost 6,400 students at sixty-three institutions participated during the first administrations of the core and advanced level iskills assessments between january and may 2006. (some institutions administered both the core and advanced level assessments.) test-takers consisted of 1,016 high-school students, 753 community college students, and 4,585 four-year college and university students. institutions selected students to participate based on their assessment goals. some chose to test students enrolled in a particular course, some recruited a random sample, and some issued an open invitation and offered gift certificates or other incentives. because the sample of students is not representative of all united states institutions nor all higher education students, these results do not necessarily generalize to the greater population of college-age students and should therefore be interpreted with caution. even so, the preliminary results reveal interesting trends in the ict literacy skills of participating students.

overall, students performed poorly on both the core and advanced level, achieving only about half of the possible points on the tests. informally, the data suggest that students generally do not consider the needs of an audience when communicating information. for example, they do not appear to recognize the value of tailoring material to an audience. regarding the ethical use of information, students tend not to check the "fair use" policies of information on the assessment's simulated web sites. unless the usage policy (for example, copyright information) is very obvious, students appeared to assume that they may use information obtained online. on the positive side, test-takers appeared to recognize that .edu and .gov sites are less likely to contain biased material than .com sites. eighty percent of test-takers correctly completed an organizational chart based on e-mailed personnel information. most test-takers correctly categorized e-mails and files into folders. and when presented with an unclear assignment, 70 percent of test-takers selected the best question to help clarify the assignment.

during a task in which students evaluated a set of web sites:
■ only 52 percent judged the objectivity of the sites correctly;
■ sixty-five percent judged the authority correctly;
■ seventy-two percent judged the timeliness correctly; and
■ overall, only 49 percent of test-takers uniquely identified the one web site that met all criteria.

when selecting a research statement for a class assignment:
■ only 44 percent identified a statement that captured the demands of the assignment;
■ forty-eight percent picked a reasonable but too broad statement; and
■ eight percent picked statements that did not address the assignment.

when asked to narrow an overly broad search:
■ only 35 percent selected the correct revision; and
■ thirty-five percent selected a revision that only marginally narrowed the search results.

other results suggest that these students' ict literacy needs further development:
■ in a web search task, only 40 percent entered multiple search terms to narrow the results;
■ when constructing a presentation slide designed to persuade, 12 percent used only those points directly related to the argument;
■ only a few test-takers accurately adapted existing material for a new audience; and
■ when searching a large database, only 50 percent of test-takers used a strategy that minimized irrelevant results.

■ validity evidence

the goal of the iskills assessment is to measure the ict literacy skills of students—higher scores on the assessment should reflect stronger skills. evidence for this validity argument has been gathered since the earliest stages of assessment design, beginning in august 2003. these documentation and research efforts, conducted at ets and at participating institutions, include:
■ the estimated reliability of iskills assessment scores is .88 (cronbach alpha), which is a measure of test score consistency across various administrations. this level of reliability is comparable to that of many other respected content-based assessments, such as the advanced placement exams.
■ as outlined earlier, the evidence-centered design approach ensures a direct connection between experts' view of the domain (in this case, ict literacy), evidence of student performance, design of the tasks, and the means for scoring the assessment (katz et al. 2004).
through the continued involvement of the library community in the form of the ict literacy national advisory committee and development committees, the assessment maintains the endorsement of its con­ tent by appropriate subject­matter experts. ■ in november 2005, a panel of experts (librarians and faculty representing high schools, community colleges, and four­year institutions from across the united states) reviewed the task content and scoring for the core level iskills assessment. after investigat­ ing each of the thirty tasks and their scoring in detail, the panelists strongly endorsed twenty­six of the tasks. four tasks received less strong endorsement and were subsequently revised according to the committee’s recommendations. ■ students’ self­assessments of their ict literacy skills align with their scores on the iskills assessment (katz and macklin 2006). the self­assessment measures were gathered via a survey administered before the 2005 assessment. interestingly, although students’ confidence in their ict literacy skills aligned with their iskills scores, iskills scores did not correlate with the frequency with which students reported per­ forming ict literacy activities. this result supports librarians’ claims that mere frequency of use does not translate to good ict literacy skills, and points article title | author 11testing information literacy in digital environments | katz 11 to the need for ict literacy instruction (oblinger and hawkins 2006; rockman 2002). ■ several other validity studies are ongoing, both at ets and at collaborating institutions. these stud­ ies include using the iskills assessment in pre­post evaluations of educational interventions, detailed comparisons of student performance on the assess­ ment and on more real­world ict literacy tasks, and comparisons of iskills assessment scores and scores from writing portfolios. ■ national ict literacy standards and setting cut scores in october 2006, the national forum on information literacy, an advocacy group for information literacy policy (http://www.infolit.org/), announced the formation of the national ict literacy policy council. the policy coun­ cil—composed of representatives from key policy­making, information­literacy advocacy, education, and workforce groups—has the charter to draft ict literacy standards that outline what students should know and be able to do at different points in their academic careers. beginning in 2007, the council will first review existing standards docu­ ments to draft descriptions for different levels of perfor­ mance (for example, minimal ict literacy, proficient ict literacy), creating a framework for the national ict literacy standards. separate performance levels will be defined for the corresponding target population for the core and advanced assessments. these performance­level descrip­ tions will be reviewed by other groups representing key stakeholders, such as business leaders, healthcare educa­ tors, and the library community. the council also will recruit experts in ict literacy and information­literacy instruction to review the iskills assessment and recommend cut scores corresponding to the performance levels for the core and advanced assess­ ments. (a cut score represents the minimum assessment score needed to classify a student at a given performance level.) the standards­based cut scores are intended to help educators determine which students meet the ict literacy standards and which may need additional instruction or remediation. 
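as a sketch of how such standards-based cut scores would classify examinees once adopted: the level names and numeric values below are placeholders only, since no actual cut scores had been set at the time of writing.

```python
# placeholder sketch of classifying a score against standards-based cut scores;
# the level names and numeric cut scores are hypothetical.

def classify(score, cut_scores):
    """cut_scores: list of (minimum_score, level_name), lowest threshold first."""
    level = "below minimal ict literacy"
    for minimum, name in cut_scores:
        if score >= minimum:
            level = name                 # keep the highest level whose cut score is met
    return level

advanced_cuts = [(400, "minimal ict literacy"), (500, "proficient ict literacy")]
print(classify(465, advanced_cuts))      # -> "minimal ict literacy"
```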
the council will review these recommended cut scores and modify or accept them as appropriately reflecting national ict literacy standards. ■ conclusions ets’s iskills assessment is the first nationally available measure of ict literacy that reflects the richness of that area through simulation­based assessment. owing to the 2005 and 2006 testing of more than ten thousand students, there is now evidence consistent with anec­ dotal reports of students’ difficulty with ict literacy despite their technical prowess. the results reflect poor ict literacy performance not only by students within one institution, but across the participating sixty­three high schools, community colleges, and four­year colleges and universities. the iskills assessment answers the call of the 2001 international ict literacy panel and should inform ict literacy instruction to strengthen these criti­ cal twenty­first­century skills for college students and all members of society. ■ acknowledgments i thank karen bogan, dan eignor, terry egan, and david williamson for their comments on earlier drafts of this article. the work described in this article represents con­ tributions by the entire iskills team at educational testing service and the iskills national advisory committee. works cited american library association. 1989. presidential committee on information literacy: final report. chicago: ala. available online at http://www.ala.org/acrl/legalis.html (accessed june 13, 2007). brasley, s. s. 2006. building and using a tool to assess info and tech literacy. computers in libraries 26, no. 5: 6–7, 43–48. breivik, p. s. 2005. 21st century learning and information literacy. change 37, no. 2: 20–27. dunn, k. 2002. assessing information literacy skills in the cali­ fornia state university: a progress report. journal of academic librarianship 28, no. 1/2: 26–36. international ict literacy panel. 2002. digital transformation: a framework for ict literacy. princeton, n.j.: educational testing service. available online at http://www.ets.org/media/ tests/information_and_communication_technology_lit­ eracy/ictreport.pdf (accessed june 13, 2007). katz, i. r. 2005. beyond technical competence: literacy in infor­ mation and communication technology. educational technology magazine 45, no 6: 144–47. katz, i. r., and a. macklin. 2006. information and communica­ tion technology (ict) literacy: integration and assessment in higher education. in proceedings of the 4th international conference on education and information systems, technologies, and applications, f. malpica, a. tremante, and f. welsch, eds. caracas, venezuela: international institute of informatics and systemics. katz, i. r., et al. 2004. assessing information and communications technology literacy for higher education. paper presented at the 12 information technology and libraries | september 200712 information technology and libraries | september 2007 annual meeting of the international association for educa­ tional assessment, philadelphia, pa. middle states commission on higher education. 2003. developing research and communication skills: guidelines for information literacy in the curriculum. philadelphia: middle states com­ mission on higher education. mislevy, r. j., l. s. steinberg, and r. g. almond. 2003. on the structure of educational assessments. measurement: interdisciplinary research and perspectives 1: 3–67. oblinger, d. g., and b. l. hawkins. 2006. the myth about stu­ dent competency. educause review 41, no. 2: 12–13. oblinger, d. g., and j. l. 
oblinger, eds. 2005. educating the net generation. washington, d.c.: educause, http://www. educause.edu/educatingthenetgen (accessed dec. 29, 2006). partnership for 21st century skills. 2003. learning for the 21st century: a report and mile guide for 21st century skills. washington, d.c.: partnership for 21st century skills. rockman, i. f. 2002. strengthening connections between infor­ mation literacy, general education, and assessment efforts. library trends 51, no. 2: 185–98. ———. 2004. introduction: the importance of information lit­ eracy. in integrating information literacy into the higher education curriculum: practical models for transformation. i. f. rockman and associates, eds. san francisco: jossy­bass. the california state university. 2006. information competence initiative web site. http://calstate.edu/ls/infocomp.shtml (accessed june 4, 2006). university of central florida. 2006. information fluency initiative web site. http://www.if.ucf.edu/ (accessed june 4, 2006). western association of schools and colleges. 2001. handbook of accreditation. alameda, calif.: western association of schools and colleges. available online at http://www.wascsenior .org/wasc/doc_lib/2001%20handbook.pdf (accessed dec. 22, 2006). automatic retrieval of biographical reference books cherie b. well: institute for computer research, committee on information science, university of chicago, chicago, illinois 239 a description of one of the first pro;ected attempts to automate a reference service, that of advising which biographical reference book to use. two hundred and thirty-four biographical books were categorized as to type of subjects included and contents of the uniform entries they contain. a computer program which selects up to five books most likely to contain answers to biographical questions is described and its test results presented. an evaluation of the system and a discussion of ways to extend the scheme to other forms of reference work are given. ideally the reference librarian is the "middleman between the reader and the right book" ( 1 ) , and this is what the program here described is intended to be. in the past there has been very little interest shown in automating this service, probably because it is neither urgent nor practical in current reference departments. many developments in automating other areas of libraries have indirectly benefitted reference librarians, and the literature primarily emphasizes this aspect. for instance, where circulation systems have been automated, the location of a particular volume can be quickly ascertained and librarians need not waste time searching. automation of the ordering phase provides them with information on the processing stage of a new volume. if the contents of the catalog have been put in machine readable form, special bibliographies can be rapidly produced in response to a particular request or as a regular service of selective dissemination. the development of kwic (key word in context) in240 journal of library automation vol. 1/ 4 december, 1968 dexes, which are compiled and printed by computer, has enabled publishers to provide indexes to their books much faster. computers have also been programmed to make concordances and citation indexes ( 2). the combination of paper-tape typewriters, computer and a photocomposer has introduced automation into compiling index medicus (3). changes in reference services themselves, however, may make automation of question-answering practical. 
one trend is toward larger reference collections to be shared by several libraries; some areas have already set up regional reference services. there are also cooperative reference plans whereby several strong libraries agree to specialize in certain fields and cooperate in answering questions referred by the others (4). these trends will mean two things to reference librarians: greater concentration of resources, allowing more specialized books and mechanization; and screening of questions at the local level, letting reference centers concentrate on more complex questions that utilize their specialized books. thus it seems likely that special reference centers may look increasingly toward mechanizing their services, and retrieval schemes of the type presented here will be important to consider.
basic assumptions
the categorizing system was based on two nearly universal generalizations about biographical reference books: 1) they are consistently confined to biographies of persons who have something in common: for example, being alive or dead; or having the same nationality, sex, occupation, religion, race, memberships; or possessing some combination of those attributes. these common characteristics in the people covered by a given book are herein called "exclusive categories." 2) the books generally maintain uniform entries for each subject; that is, they give the same data for each biography. these facts are referred to herein as "specifics" or "specific categories." certain assumptions were made about reference work: 1) all biographical reference books fit into the scheme and can be categorized. 2) the more limited a book's scope, the more likely it is to contain the person a user wants to find. in other words, if a user is interested in a dutch economist, he is more likely to find information in a book limited to dutch economists than in a general biographical dictionary. the user, however, does not want to miss any source that might be useful. therefore a general biographical dictionary should be given to him as a last resort, after books on dutch economists, dutchmen of all occupations, and economists of all nationalities. 3) certain requirements, the specifics, have no substitutes. for example, a book lists addresses or it does not, and if a user wants an address, books without them are useless. there is merit in suggesting to a user which book to use as opposed to giving him the direct answer to his question. probably the best argument for this assumption is that the volume of names that would have to be compiled and stored for a direct inquiry system is staggering, only a small number would ever be looked up, and it is impossible to predict which ones would be searched for. there are advantages to mechanizing this particular task of a reference librarian: good reference librarians should be freed to perform work less easily mechanized; there are not enough reference librarians who have perfect recall of their collections even to the point of knowing which exclusive categories all the books fit into; and no librarian could have complete recall as to the specifics contained in each biographical reference book in the collection.
the computer program
the program was written in the comit language, a non-numerical programming language developed for research in mechanical translation, information retrieval and artificial intelligence.
it is a high-level problem-oriented language for symbol manipulation, especially designed for dealing with strings of characters. the program could probably be converted to other list-processing languages (6) for operation at other installations. the program was run at the university of chicago computation center on an ibm 7094 having the comit system on a disk. questions were submitted and run in large batches.
the data
all biographical reference books in english, with alphabetical ordering of subjects, which are in the reference room of the university of chicago's harper library were included in the data and no other books were included. since one assumption was that all biographical reference books could be categorized by the scheme, it seemed more useful to prove the system could handle any biographical reference tool than to compile a balanced list of biographical books. there was no difficulty in categorizing the books. all books are categorized in the following way. first an arbitrary abbreviation for the book is chosen to be its entry in the file; it is referred to as a "constituent." each book is then described by determining the values of nine subscripts each constituent carries, the subscripts being sex, living, nat (nationality), occup (occupation), min (minorities), date, index, spec1 and spec2 (specifics). values of the first five subscripts, the exclusive categories, are determined first. that is, is the book limited to one sex? are all the subjects living or dead? do they all have a certain occupation? does the book include only certain nationalities? or is there another restriction; e.g., to alumni of a college, members of the nobility or a religious group? the exclusive categories for a book are determined and coded from a table of abbreviations. sex, for example, allows three values: restricted to males (m), restricted to females (f), or no restriction (z). also a value x must occur with m or f, indicating there is a restriction. therefore sex can have the following combinations: sex z, sex f x, or sex m x; the values m x and f x are both the opposite of z. next the book's date is determined by asking "at what date did the values on living (yes or no) apply?" or, if the subjects are not restricted to living or dead (living z), "when was the book up to date?" next any indexes to the biographies are noted. all the biographical books list subjects in alphabetical order by surname. lists of subjects in any other order are considered indexes even if the subjects are actually listed in some other order in the main body and the list that is alphabetic by surname is an index. finally, specific categories (spec1 and spec2) are coded for such facts as birthdate, birthplace, college attended, degrees held, hobbies, illustrations, social clubs, and marital status. when all categorizing is finished, a data item is punched in this form: dictphilbio/ index field x, living n x, occup z, sex z, nat philip asian x, spec1 dc ds fl bp l cl cm dg e i z, date 50s x, spec2 p pl r ms pd z, min z +. this represents the dictionary of philippine biography, a book limited to dead filipinos and giving for each entry: dates, career, descendants, field, birthplace, long articles, class in college, degrees, education, picture, parents, publications, references, marital status and physical description. the book has a special index to find subjects by their field of work.
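to make the categorization concrete, the example entry above can be rendered as a small data structure. this is a minimal sketch in python rather than the article's comit, and the dictionary-of-sets layout is an assumption for illustration, not the original punched-card format; the subscript names and value codes are taken from the dictionary of philippine biography entry.

# a sketch (not the original comit data format) of one categorized book.
# each subscript holds a set of value codes; "z" means no restriction.
DICT_PHIL_BIO = {
    "id":     "dictphilbio",                  # arbitrary abbreviation ("constituent")
    "sex":    {"z"},                          # no sex restriction
    "living": {"n", "x"},                     # restricted to dead subjects
    "nat":    {"philip", "asian", "x"},       # restricted by nationality
    "occup":  {"z"},                          # any occupation
    "min":    {"z"},                          # no other group restriction
    "date":   {"50s", "x"},
    "index":  {"field", "x"},                 # special index by field of work
    "spec1":  {"dc", "ds", "fl", "bp", "l", "cl", "cm", "dg", "e", "i", "z"},
    "spec2":  {"p", "pl", "r", "ms", "pd", "z"},
}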
one specific value, that for a long article, requires special mention. though most biographical reference books provide the same facts about all the subjects in list form, a few provide different facts about different subjects in a narrative form. such books carry the spec1 l, and the other specifics these books are listed as providing are not always given for every subject. for example, a book with a list format may provide the birthplace for every subject when it can be ascertained, but in a book using the narrative form, where often different authors write the articles, birthplace is not necessarily given. books in narrative form are used less for quick reference; therefore the program provides a note, when a long article is requested, that the card catalog may provide more long articles on the subject. ease of file maintenance is one advantage of this system. as data is analyzed in the first place, if a new value for a category is required, such as an occupation which is not in the list, the new value is simply added under occup for that particular book and in the list of abbreviations for future use. it is a little more complicated to make an existing value more specific. for example, to differentiate botanist, chem, physics and astron and still maintain scientist as a general category embracing them all, another short program is required to retrieve the data to be reclassified.
coding the question
a biographical question can be quickly coded. the nine required subscripts are the same as those for the data books, but only one value for each subscript is necessary. for example, "what are the publications of a living dutch economist? a current book is desired." is coded as q / sex z (or m), living y, nat dutch, occup econ, min z, index z, date 60s, spec1 z, spec2 pl +.
operation of the program
briefly, the program reads in data and then the first question. it weeds out data items that can never be suitable, discarding all but those items that have the same values as the question has on the subscripts index, spec1 and spec2. it then weeds out data items that do not have either the same values as the question, or the value z, on the subscripts occup, nat, min, sex and living. after each weeding the program checks to determine that there are data items left; if all the books have been weeded out, there are no answers. there is also a provision to allow the user to designate certain titles to be ignored on a particular question in case he has already checked them, for example. all data items left after weeding are potential answers and could simply be printed out. however, subsequent searches over the remaining items serve the purpose of rearranging them into an order in which they are more likely to produce answers. it was decided that five answers are enough to judge the types of titles chosen yet few enough to avoid very long searches. a shorter list of answers would obviously be cheaper and a longer list more likely to produce a book containing the desired subject.
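the two weeding passes can be sketched as follows, reusing the dictionary-of-sets book representation from the sketch above; the helper names, the question layout, and the treatment of z as "nothing required" are assumptions for illustration, not the original comit code.

EXACT = ("index", "spec1", "spec2")            # specifics and index have no substitutes
EXCLUSIVE = ("occup", "nat", "min", "sex", "living")

# the dutch-economist question from the text, coded as sets of values
QUESTION = {
    "sex": {"z"}, "living": {"y"}, "nat": {"dutch"}, "occup": {"econ"},
    "min": {"z"}, "index": {"z"}, "date": {"60s"}, "spec1": {"z"}, "spec2": {"pl"},
}

def survives_weeding(book, question):
    # first weeding: the book must offer every index value and specific asked for
    for field in EXACT:
        needed = question[field] - {"z"}       # z = nothing required
        if not needed <= book[field]:
            return False
    # second weeding: each exclusive category must match the question or be unrestricted (z)
    for field in EXCLUSIVE:
        wanted = question[field] - {"z"}
        if wanted and not (wanted <= book[field] or "z" in book[field]):
            return False
    return True

def weed(books, question, ignore=()):
    """drop books that can never answer the question (or that the user asked to skip)."""
    return [b for b in books if b["id"] not in ignore and survives_weeding(b, question)]

print(weed([DICT_PHIL_BIO], QUESTION))   # -> []: a philippine biography cannot answer it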
ordering proceeds as follows: first values of subscripts sex, living, min, occup, nat and date on the question as originally stated are matched to those of books in the data. the computer is at this stage searching for books that are limited in just those categories in which the question is limited. for example, the question q / sex z, living y, min z, nat dutch, occup econ, index z, date 60s, spec1 z, spec2 pl + will match only those books published in the 1960's and restricted to living dutch economists which give publications for all the subjects (or the majority), and the books cannot be restricted to a sex or to any "minority" group. the books found may or may not have additional values on the subscripts; that is, a book may also contain french economists. such books found on the first search are most likely to contain the subject the questioner is looking for. if there are fewer than five books found which are a perfect match with the question, the program begins to alter the question. to make the least significant possible change in the question, the program changes the value of the subscript judged to be the limiting factor on the fewest books in the data, namely sex. if sex has a z as its value (because the questioner did not know the sex or did not prefer a book limited to one sex) it is changed to x so that a book limited to one sex will not be overlooked. if sex does not have a z value (which means it has either m x or f x), it is changed to z. this means the questioner preferred books limited to one sex but presumably his second choice is books not limited to any sex. clearly if the question has sex f x it can never be changed to sex m x or sex x, since sex x will find books in the data classified sex m x. anything other than z changes to z, and z only changes to x. after this change is made, another search is conducted and the answers counted. until there are five books or the data is exhausted, the original question is altered and the cycle continued. alterations proceed by changing the values of one subscript at a time in the following order: sex, living, min, nat and occup. then they are changed two at a time, three at a time, four at a time, and finally all five are changed, so there are thirty-one possible changes. if at the end of the thirty-second search there are still not five answers and there are more data items, the date restriction on the question is checked. if date has a value other than z, it is changed to z, which matches all the data items, and the computer prints a note if this is done; the program will then select any book regardless of date. control returns to search and begins the cycle again, continuing until five answers are found or the data is exhausted. after searching is finished, the writing routine commences. one at a time the computer takes each answer, writes out its code for possible further reference, and then writes out the complete author, title, copyright date and library of congress call number, all of which the computer finds in a list within the program.
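the alteration cycle can be sketched on top of the weed() helper and question layout above; the flip rule, the date relaxation, and the exact search order below are simplified assumptions meant to mirror the description, not a transcription of the original program.

from itertools import combinations

ALTERABLE = ("sex", "living", "min", "nat", "occup")   # least to most significant

def flip(values):
    """z (no preference) becomes x (accept restricted books); any restriction becomes z."""
    return {"x"} if values == {"z"} else {"z"}

def variants(question):
    """the original question, then the 31 alterations (1, 2, ... 5 subscripts flipped)."""
    yield question
    for k in range(1, len(ALTERABLE) + 1):
        for fields in combinations(ALTERABLE, k):
            q = dict(question)
            q.update({f: flip(question[f]) for f in fields})
            yield q

def date_ok(book, question):
    wanted = question["date"] - {"z"}
    return not wanted or wanted <= book["date"] or "z" in book["date"]

def search(books, question, wanted=5, ignore=()):
    answers = []
    for relax_date in (False, True):           # the later searches drop the date limit
        for q in variants(question):
            for book in weed(books, q, ignore):
                if not relax_date and not date_ok(book, q):
                    continue
                if book not in answers:
                    answers.append(book)
                if len(answers) == wanted:
                    return answers
    return answers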
results
to obtain some measure of the program's accuracy, fourteen textbook questions, probably more challenging than the average patron would ask, were submitted to the computer and to a professional librarian who was especially familiar with biographical reference books (see figure 1 for sample questions and results). the librarian spent a total of an hour and a half, and found answers to eleven out of fourteen questions. on the three she could not answer she felt she had exhausted the resources. in one of the eleven she answered ("how many americans won the nobel prize in medicine between 1930 and 1950?") she found the answer in a source not specifically biographical (world almanac) and therefore not in the computer's data. no problems occurred in forming the questions for submission to the computer.
fig. 1 (sample reference questions) reproduced three of the test questions with their codings, the librarian's choices, and the computer's choices: a request for at least twenty references to biographical information about dmitri mendeleef, russian chemist; a request for the academic degrees of professor reuben l. hill; and a genealogical query about jacob billings. each title returned was rated a (it has the answer or at least part of it), b (good choice but it does not have the answer), c (reasonable choice but there are better ones), or d (poor choice).
the program found some reasonable sources in all cases. it found books containing the answer in ten out of fourteen cases, the four answers not found being those three the librarian missed and the one requiring an almanac. in all but one case there were more possibilities than the five books given per answer. some questions were rerun ignoring the first five answers, and five more titles were retrieved; even then there were more possibilities. in some cases the program did better than the librarian because she wasted time looking in sources that did not give the specifics sought. for instance, when the question asked for the pronunciation of the surname of paul and lucian hillemaker, french composers, she looked in dictionaries that do not give pronunciation. the computer found the only four possible sources immediately. in other cases the program came up with rather far-fetched answers a human would have skipped.
a question asking for biographies of franz rakoczy, an hungarian hero, retrieved in its second five sources three jewish encyclopedias and a book on composers! these were not wrong and, in cases where occupation or minority group affiliations were unknown, these might be good sources. as an answer to the nobel-prize-winner question the computer retrieved sources on american doctors, nobel winners and scientists, which are the best choices from the data and would have the answers buried in them. however, what is really required is an index to award winners, and there were none in the data. the test revealed the necessity for allowing questions to have dummy values; that is, ones not used in the data. for instance there are no books limited to botanists, so occup botanist is not allowed in a question, though occup scientist is, and chem and physics are included as more specific values under scientist. asking for occup scientist when searching for a botanist avoids getting books devoted to nonscientific occupations but also gets books devoted to chemists and physicists. since one would want these books if he did not know the scientist was a botanist, that should not be changed. if he asks for occup botanist he wants books devoted to botanists first, then scientists in general. a short-term solution is to have dummy values to stand for all these other values. for example occup other-scientist could include all scientific occupations except those specifically listed, and it would retrieve books limited to all scientists but not to specific scientific occupations mentioned in the data. a long-term solution is to use a computer language allowing tree-structured data. presently this problem does no more than cause extraneous retrievals which the person using the list can easily skip. discussion advantages of the scheme can be speculated. from the library's point of view its virtues are that it is simple and inexpensive. original implementation would not require a major block of time to be spent in human indexing or abstracting. operating costs would be low because it does not require such a large store of information in memory that several tapes must be searched, and because updating the file is simple. when a new retrieval of biographical reference books/ well 247 book is added, an experienced person could categorize it in five minutes, punch a new data card and, if required, add to the list of values in the table of abbreviations. the system could provide useful information to other departments. it could keep tallies for the acquisitions department of how often a book is given as an answer, indicating whether new editions of it or similar books would be good buys. from the user's point of view the system avoids a major pitfall of some retrieval schemes which retrieve on the basis of ambiguous terms or association chains; that is, missing relevant items. if the user resubmits the same question ignoring already retrieved books each time, he will eventually have a comprehensive list of possible sources in the data that have the index and specifics he requires. a user also wants his information as brief as possible, listed in order of importance and with no extraneous answers ( 7); this requirement could be met as the program stands by having a human simply cross out any unnecessary titles. users like to know the reliability of the information ( 7) ; this detail could be provided along with the titles. users also want speed and convenience. 
as it stands, this system could be made available to users of the university of chicago library tomorrow with no more equipment than is presently in the computation center. time delay in the present implementation could be remedied by using an on-line system. users often prefer to be given facts themselves and not just citations (7). a program that gives biographical facts directly has no connection with this scheme or classification system, but the output of this program could be used as a tool by a librarian to find the answer for a patron.
bibliographies
the most obvious area to which the retrieval scheme could be extended is that of bibliographies. like biographies, they are limited in their scopes to certain exclusive categories, and they contain the same specific facts for each entry. logical exclusive categories could be: nationality, form (with such values as drama, poetry, fiction, maps, etc.), subject (probably the most frequently used criterion on which to select books for a bibliography), and date. since there is no living with which to connect date, date here should probably have not just the most recent relevant date but as many values as necessary. for instance date 40s 50s 60s would apply to an index that began publication in the 1940's and is current. then a request for any of those dates would find it. possible specifics include number of pages, the cost, or a facsimile of the title page. arrangement would be needed, being different from index in that bibliographies, unlike biographies, cannot be assumed to have the same order (alphabetic by subject's name) plus indexes in other orders. arrangement would list as values all the ways the contents of the bibliography could be approached: by subject, author, title, chronology or a combination of these.
dictionaries
dictionaries also lend themselves well to this type of scheme; one exclusive category, subject, might even be adequate for dictionaries. dictionaries' special subjects could be broken down into field (such as chemistry or business) and type (such as slang or geography), if necessary. language would be a specific category, since there are no substitutes for the language required. other possible specifics are pronunciation, definition, etymology and illustration.
atlases
atlases are also suited to the scheme. exclusive categories that seem appropriate are area covered, special subject atlases, and the size of the scale. scale should probably act as date does in the biographical program; that is, if a particular scale is requested, that would be searched for first and, if no answer is found, a note would be given and another search made for any scale. specifics for atlases could include items like topography, rainfall, winds, cities, highways and major products. factual books (those that give the highest mountain, the first four-minute mile, the january 10th price of u.s. steel, etc.) do not lend themselves to the scheme. because these books are not uniform as to entries and subject coverage, the list of possible specifics and exclusive categories would be extremely long and the number of searches consequently prohibitive. also, since such books are far fewer in number than biographical or bibliographical works, the proper one is easier to find by browsing.
conclusion a scheme for categorizing biographical reference books by their exclusive and specific categories makes it possible to automatically retrieve titles of those which would best answer reference questions. when tested it was found acceptable, with minor refinements, and it is easily adaptable to other reference book forms. such a system seems a logical direction in which to go when automation of actual reference functions is undertaken. acknowledgment the project under discussion was undertaken in partial fulfillment of requirements for the m. a. degree at the university of chicago's graduate library school. the computer program employed is detailed in the author's thesis ( 8). the work was partially completed under the auspices of aec contract no. at(ll-1)614. retrieval of biographical reference books / well 249 references 1. university of illinois library school: the library as a community information center. papers presented at an institute conducted by the university of illinois library school september 29-0ctober 2, 1957 (champaign, illinois: university of illinois library school, 1959), p. 2. 2. shera, jesse: "automation and the reference librarian," rq, iii, 6 (july 1964), 3-4. 3. austin, charles j.: medlars 1963-1967 (bethesda, national institutes of health, 1968). 4. haas, warren j.: "statewide and regional reference service," library trends. xii, 3 (january 1964), 407-10. 5. yngve, victor: com it programmers' reference manual (cambridge, mass.: m. i. t. press, 1962). 6. hsu, r. w.: characteristics of four list-processing languages (u. s. department of commerce, national bureau of standards, sept. 1963). 7. goodwin, harry b. : "some thoughts on improved technical information service," readings in information retrieval (new york, scarecrow press, 1964) , p. 43. 8. weil, cherie b.: classification and automatic retrieval of biographical reference books (chicago: university of chicago graduate library school, 1967). kwic index to government publications margaret norden: reference librarian, rush rhees library, university of rochester, rochester, new york 139 united states and united nations publications were not efficiently processed nor readily available to the reader at brandeis university library. data processing equipment was used to make a list of this material which could be referred to by a computer produced kwic index. currency and availability to the user, and time and cost efficiencies for the library were given precedence over detailed subject access. united states and united nations classification schemes> and existing bibliographies and indexes were used extensively. collections of publications of the united states government and the united nations are unwieldy and, often, unused. orne (1), kane (2), and morehead ( 3) have acknowledged that much of the output of proliferating governmental agencies and government supported research centers is hardly accessible. successful attempts to control the literature of a particular subject field, such as the indexes to the human relations area files and the american political science review, have been compiled by kenneth janda ( 4). others ( 5,6,7,8,) have described projects which apply the kwic index method of control to industrial research reports. no similar attempt to control government publications has been reported, although at northeastern university data processing equipment has been used to list united states material. 
the index developed at brandeis university library was designed to accommodate the varied government publications held by a library which served student, faculty and researcher alike.
materials and method
brandeis became a selective united states document depository late in 1965. two years later a government documents department was created to handle all united states publications, as well as those of the united nations. about 15,000 united states publications and a smaller number of united nations publications had previously been acquired and processed as a regular part of the library collection. this material formed the nucleus of the documents collection, to which some 3,000 pieces were added yearly. the new department ordered and received all publications issued by federal government agencies and the united nations, but processed and serviced only about 80% of them. materials that had been acquired for the science library or special collections, such as reserve, were directed to regular library processing departments. the materials retained were classified and arranged according to the superintendent of documents classification and the united nations scheme wherever such numbers were available. all previously cataloged items were removed from the regular collection and scheduled for reclassification. only where superintendent of documents and united nations numbers were not available was library of congress classification retained or assigned. the collection then consisted of material arranged in three sections according to the classifications of the superintendent of documents, the united nations, and the library of congress. the kwic index included all united states and united nations publications located in the documents department. the reader was reminded that additional material issued by those government publishers, housed elsewhere in the libraries, was included in the library catalog. prefatory material included a list of symbols and abbreviations. a two-part index to issuing agencies, represented by six-letter mnemonic acronyms, was arranged alphabetically by acronym, and by bureau name. the reader was cautioned to consult the united states government organization manual and a united nations organization chart for identification of government agencies and for tracing frequent changes in their structure and nomenclature. the documents list consisted of two parts: one, an accession number listing; and two, a kwic index to part one. upon arrival at the library, publications were numbered and ibm cards were punched according to format cards that described allocation of columns:
card 1: columns 1-6, item number; column 7, card number; columns 8-13, author agency; columns 14-79, title field; column 80, blank.
card 3: columns 1-6, item number; column 7, card number; columns 8-20, procedural data; columns 21-54, holdings; columns 55-79, classification number; column 80, blank.
cards one and three were punched for all documents; however, cards two and four were punched only where data exceeded the prescribed spaces on cards one or three. columns one through six were reserved for the accession numbers. a special punch in column one was used to identify united nations documents so that they were listed after the united states sequence. column seven indicated the card number for a given document and was suppressed in the print-out. the title field included not only the title, but series and number, personal author and monographic date where this information was suitable.
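as an illustration of the fixed-column layout just described, a card one image could be unpacked as follows; the function and the sample values are hypothetical, not part of the original data processing run.

# a sketch of reading card one of the documents list
# (columns 1-6 item number, 7 card number, 8-13 author acronym, 14-79 title field, 80 blank)
def parse_card_one(card: str) -> dict:
    assert len(card) == 80, "punched-card images are 80 columns"
    return {
        "item_number":   card[0:6].strip(),     # columns 1-6
        "card_number":   card[6],               # column 7 (suppressed in the print-out)
        "author_agency": card[7:13].strip(),    # columns 8-13, six-letter acronym
        "title_field":   card[13:79].strip(),   # columns 14-79
    }

# hypothetical card image modeled on an entry in the accession list
card = "0001311JNTPUB" + "TRANSLATIONS ON COMMUNIST CHINA".ljust(66) + " "
print(parse_card_one(card))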
the flexible field was used for any information for which the librarian wished kwic cards. a cross reference or explanatory note about the location of publications of a quasi-independent agency was incorporated in the title field. the procedural data included type of publication, binding and frequency information, accounting data and similar notations. a sample of part one has been reproduced in figure 1. part two of the list, the kwic index, was produced by an ibm 1620 computer, model one with 40k memory. an excerpt has been reproduced in figure 2. only cards one and two were put into the computer along with the program and dictionary of exceptions. cards three and four were not used to produce the kwic index. the program required production of cards for author acronyms and for all keywords found in the title field. except in the cases of author acronyms and first words, a keyword was identified by the fact that it followed a blank space. blanks were not necessary in these two cases because they were incorporated in the computer program. single letters, integers, and exceptions were not considered keywords. the index was printed so that the accession number always appeared on the left, and the author agency was followed by an asterisk and a space. the wraparound format usually associated with kwic indexes was abandoned to improve visual clarity.
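the keyword rule just described can be sketched briefly; the exception list, the function names, and the print format below are illustrative assumptions rather than the original 1620 program.

# a sketch of generating kwic entries for one document: one entry for the
# author acronym and one for each title word that is not a single letter,
# an integer, or a listed exception.
EXCEPTIONS = {"the", "of", "and", "on", "to", "for", "in"}   # illustrative only

def kwic_entries(item_number, author_acronym, title):
    words = title.split()
    keywords = [author_acronym] + [
        w for w in words
        if len(w) > 1 and not w.isdigit() and w.lower() not in EXCEPTIONS
    ]
    line = f"{author_acronym} * {title}"       # author acronym marked by an asterisk
    return [(kw, item_number, line) for kw in keywords]

entries = kwic_entries("131", "jntpub", "translations on communist china")
for keyword, number, line in sorted(entries): # the printed index is sorted by keyword
    print(f"{number}  {line}")                # accession number on the left, no wraparound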
results
about eight months after its inception, 2600 items had been entered on a separate documents collection list. the list had been printed offline on three-part print-out paper interleaved with carbon. after the papers were reassembled in looseleaf binders, they were made available in the documents and reference departments and in the science library.
fig. 1 (accession number list) reproduced a page of part one: for each accession number it showed the issuing-agency acronym (e.g., jntpub, libcon), the title field, procedural data such as price and holdings, and the superintendent of documents or other classification number.
fig. 2 (kwic index) reproduced an excerpt of part two: entries sorted by keyword, each carrying the accession number on the left and the author acronym followed by an asterisk, without the wraparound format of conventional kwic indexes.
the copies were usable only on the temporary basis for which they were intended. pages ripped easily in the binders. the printing on copies two and three, which were carbon copies, smudged readily. production of more permanent copies of the list was deferred until the catalog should be more complete. because of the preliminary nature of this project, no specific time accounting was made. there was an attempt to increase student assistant duties in order to save regular staff time. the librarian annotated superintendent of documents shipping lists to indicate which items required new punched cards. she omitted, for example, journals that were entered once with an open holdings statement. after annotation, punched cards for united states depository documents were made by student assistants who had been previously introduced to the allocation of columns and to punching procedures. for non-depository united states and united nations publications, the librarian mapped cards on 80-column format sheets. in the production of part two, the kwic index, staff was involved only to make cross references. since the kwic program had been designed to make entries for all words in the title field other than the dictionary of exceptions, cross references had to be interfiled manually after the kwic entries had been made and alphabetized. the cost of materials for the addition of 100 entries to the list is tabulated below:
ibm cards (800): $0.80 + freight
print-out paper (8 sets): $0.32 + freight
ibm 1620 computer rental (4 minutes): $1.68
total: $2.80 + freight
there was no charge for the use of the keypunch (ibm 026), sorter (ibm 082), and accounting machine (ibm 407), nor was the library charged by the computer center personnel who wrote the program. for the first 100 items, all cards were duplicated in production as insurance against destruction (thus the card expense itemized above was doubled). the duplicate deck was later eliminated because the time spent in duplicating and interpreting these cards was greater than that required to repunch the deck from the list entries. storage space was available without cost, and no new storage equipment was purchased. the kwic program was written so that keyword cards were made for all words in the title field except listed exceptions, single letters and integers.
it seemed at the inception of the project that such a program, which allowed untrained assistants to punch cards with a minimum of difficulty, was preferable to one that involved tagged keywords. however, the necessary filing and removal of cross references subsequently proved an inconvenience when the list was updated and reprinted.
discussion
the productivity of government publishers has directed so much material into the library that ordinary procedures have been overtaxed. card catalog entries, for example, have become tardy, cumbersome, and incomprehensible to the library user and expensive for the library. the kwic list was designed as a substitute; however, it was useful only where the subject of a publication had been fairly reflected by its title. the possibility of incorporating descriptors in the title field of the list was considered, but rejected in the interests of speed and efficiency. the list depended upon standard reference sources for more complete subject and analytic cataloging. most often used in the case of united states publications were the superintendent of documents monthly catalog (9) and its auxiliaries such as the bureau of the census catalog (10). other sources included: wilson, popular names (11), the readers guide, the social science and humanities index, the business periodicals index, the index to legal periodicals, and the commerce clearing house index. for the united nations publications, greater use was made of the trade publications such as the periodic check list (formerly the monthly sales bulletin), the international reporter, the unesco catalogue, and the publishers trade list annual section for "unesco." the kwic index also was limited in that it covered only documents in the library's collection. while the user was convenienced by the ready availability of all items listed, he was obliged to consult reference sources for other existing documents. the new tool had advantages similar to book catalogs in terms of space saving and ease of duplicating. although originally only three copies were made, the possibility of duplication and distribution of this list to interested academic departments had been considered. it was also intended that new punched cards would be used to produce lists of new accessions, which would be duplicated and circulated. the problem of updating involved reprinting of part two, the kwic index, after previous inter-alphabetizing of entries. part one, however, was not reprinted as new entries were added successively. corrections were made by duplicating parts of cards and punching where necessary. the availability and currentness of such a list would presumably have encouraged the faculty and students to make greater use of these materials, and eliminated duplication of purchase orders. a major drawback to the list was that its arrangement, by accession numbers, bore no particular logic. a classification number arrangement would have been more meaningful to the reader; it would also have served as a shelf list and provided material for subject holdings lists. however, the ibm cards were not so arranged because neither the mechanical nor manual sorting of multi-digit and letter numbers was practical. arrangement by superintendent of documents numbers was employed at northeastern university, boston, massachusetts, and proved so inadequate that the librarian added subject headings to documents punched cards.
this extra time-consuming step, plus the need to manually file punched cards, influenced the author to abandon shelf list order. a second difficulty involved in the kwic project was the dependence of the library upon use of equipment owned by another agency. it was conceivable that alterations in the equipment, policies or personnel of the university computer center could enforce changes on the library's listing procedure. this evaluation of the kwic index excluded considerations of the separation of the documents and reference departments. this matter has been thoroughly discussed elsewhere ( 12) . two other subjective considerations appeared during the first year of operation. most serious was the estrangement between documents and reference personnel. since both departments served the public, and their material was distinguished only by publisher, each staff relied extensively upon the other. cooperation and acquaintance with library material was difficult to maintain in two separate departments. because documents staff were primarily public service personnel, their extensive involvement in technical processes was not an efficient use of staff expertise. on the other hand, complete responsibility for this portion of library holdings insured that the staff became thoroughly acquainted with the collection and were better able to serve the public. conclusion the kwic index to government publications at brandeis is difficult to evaluate before the tests of time and use have been made. the system was suitable for the university library in that it was frequently consulted by the same, relatively sophisticated, users who were eager to familiarize themselves with library material. the kwic list itself emphasized currentness and flexibility at the expense of detailed subject access. this system attempted to utilize a potential goldmine of material without major investment or upheaval in the library. it has been sufficiently resilient to withstand a complete change of department personnel and was successful enough so that the possibility of expansion is being considered. note this report described the documents department as it functioned at its inception in september, 1967. the author left brandeis universityin june of 1968. the scope of the documents list was changed in september 1968 to include all united states and united nations publications acquired by the library. any inquiries about the present system should be directed to the current documents librarian: mr. michael abaray, goldfarb library, brandeis university, waltham, massachusetts 02154. kwic index 147 references 1. "report on the sixty-ninth meeting of the association of research libraries, new orleans, la. 1/8/67," lc information bulletin, 26 (january 26, 1967), 70. 2. kane, rita: "the future lies ahead: the documents depository library of tomorrow," library journal, 92 (november 1, 1967), 39713973. 3. morehead, joe: "united states government documents-a mazeway miscellany," rq, 8 (fall1968), 47-50. 4. janda, kenneth ed.: "advances in information retrieval in the social sciences," american behavioral scientist, 10 (january and february 1967). 5. sternberg, v. a.: "miles of information by the inch at the library of the bettis atomic power laboratory, westinghouse electric corporation," pennsylvania library association bulletin, 22 (may 1967), 189-194. 6. lawson, constance : "report documentation at texas instruments, incorporated," special libraries association, texas chapter bulletin, 15, (february 1964), 14-17. 7. 
minton, ann: "document retrieval based on keyword concept," special libraries association, texas chapter bulletin, 15 (february 1964), 8-10. 8. bauer, c. b.: "practical application of automation in a scientific information center-a case study," special libraries, 55 (march 1964), 137-142. 9. united states. superintendent of documents: monthly catalog of united states government publications (washington: government printing office, 1895). 10. u. s. bureau of the census: bureau of the census catalog (washington: government printing office, 1945). 11. wilson, donald f. and william p. kilroy, comps.: popular names of united states government reports, a catalog (washington: government printing office, 1966). 12. shaw, thomas shuler, ed.: "federal, state and local government publications," library trends, 15 (july 1966), 3-194.
patrick griffis
building pathfinders with free screen capture tools
this article outlines freely available screen capturing tools, covering their benefits and drawbacks as well as their potential applications. in discussing these tools, the author illustrates how they can be used to build pathfinding tutorials for users and how these tutorials can be shared with users. the author notes that the availability of these screen capturing tools at no cost, coupled with their ease of use, provides ample opportunity for low-stakes experimentation from library staff in building dynamic pathfinders to promote the discovery of library resources.
one of the goals related to discovery in the university of nevada las vegas (unlv) libraries’ strategic plan is to “expand user awareness of library resources, services and staff expertise through promotion and technology.”1 screencasting videos and screenshots can be used effectively to show users how to access materials using finding tools in a systematic, step-by-step way. screencasting and screen capturing tools are becoming more intuitive to learn and use and can be downloaded for free. as such, these tools are becoming an efficient and effective method for building pathfinders for users. one such tool is jing (http://www.jingproject.com), freeware that is easy to download and use. jing allows for short screencasts of five minutes or less to be created and uploaded to a remote server on screencast.com. once a jing screencast is uploaded, screencast.com provides a url for the screencast that can be shared via e-mail or instant message or on a webpage. another function of jing is recording screenshots, which can be annotated and shared by url or pasted into documents or presentations. jing serves as an effective tool for enabling librarians working with students via chat or instant messaging to quickly create screenshots and videos that visually demonstrate to students how to get the information they need. jing stores the screenshots and videos on its server, which allows those files to be reused in subject or course guides and in course management systems, course syllabi, and library instructional handouts. moreover, jing’s file storage provides an opportunity for librarians to incorporate tutorials into a variety of spaces where patrons may need them in such a manner that does not require internal library server space or work from internal library web specialists. trailfire (http://www.trailfire.com) is another screen capturing tool that can be utilized in the same manner.
trailfire allows users to create a trail of webpage screenshots that can be annotated with notes and shared with others via a url. such trails can provide users with a step-by-step slideshow outlining how to obtain specific resources. when a trail is created with trailfire, a url is provided to share. like jing, trailfire is free to download and easy to learn and use. wink (http://debugmode.com/wink) was originally created for producing software tutorials, which makes it well suited for creating tutorials about how to use databases. although wink is much less sophisticated than expensive software packages, it can capture screenshots, add explanation boxes, buttons, titles, and voice to your tutorials. screenshots are captured automatically as you use your computer on the basis of mouse and keyboard input. wink files can be converted into very compressed flash presentations and a wide range of other file types, such as pdf, but do not support avi files. as such, wink tutorials converted to flash have a fluid movie feel similar to jing screencasts, but wink tutorials also can be converted to more static formats like pdf, which provides added flexibility. slideshare (http://www.slideshare.net) allows for the conversion of uploaded powerpoint, openoffice, or pdf files into online flash movies. an option to sync audio to the slides is available, and widgets can be created to embed slideshows onto websites, blogs, subject guides, or even social networking sites. any of these tools can be utilized for just-in-time virtual reference questions in addition to the common use of just-in-case instructional tutorials. such just-in-time screen capturing and screencasting offer a viable solution for providing more equitable service and teachable moments within virtual reference applications. these tools allow library staff to answer patron questions via e-mail and chat reference in a manner that allows patrons to see processes for obtaining information sources. demonstrations that are typically provided in face-toface reference interactions and classroom instruction sessions can be provided to patrons virtually. the efficiency of this practice is that it is simpler and faster to capture and share a screencast tutorial when answering virtual reference questions than to explain complex processes in written form. additionally, the fact that these tools are freely available and easy to use provides library staff the opportunity to pursue low-stakes experimentation with screen capturing and screencasting. the primary drawback to these freely available tools is that none of them provides a screencast that allows for both voice and text annotations, unlike commercial products such as camtasia and captivate. however, tutorials rendered with these freely available tools can be repurposed into a tutorial within commercial applications like camtasia studio (http://www.techsmith.com/camtasia .asp) and adobe captivate (http://www.adobe.com/ products/captivate/). patrick griffis (patrick.griffis@unlv.edu) is business librarian, university of nevada las vegas libraries. 190 information technology and libraries | december 2009 as previously mentioned, these easy-to-use tools can allow screencast videos and screenshots to be integrated into a variety of online spaces. a particularly effective type of online space for potential integration of such screencast videos and screenshots are library “how do i find . . .” research help guides. many of these “how do i find . . 
.” research help guides serve as pathfinders for patrons, outlining processes for obtaining information sources. currently, many of these pathfinders are in text form, and experimentation with the tools outlined in this article can empower library staff to enhance their own pathfinders with screencast videos and screenshot tutorials.
reference
1. “unlv libraries strategic plan 2009–2011,” http://www.library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 30, 2009): 2.
bell laboratories' library real-time loan system (bellrel) r. a. kennedy: bell telephone laboratories, murray hill, new jersey bell telephone laboratories has established an on-line circulation system linking two terminals in each of its three largest libraries to a central computer. objectives include improved service through computer pooling of collections, immediate reporting on publication availability or a borrower's record, and automatic reserve follow-up; reduced labor; and increased management information. loans, returns, reserves and many queries are handled in real time. input may be keyboard only or combined with card reading, to handle all publications with the borrower present or absent. bellrel is now being used for some 1500 transactions per day. introduction as part of a continuing program to exploit available technology to improve library service, the technical information libraries system of the bell telephone laboratories has established an on-line, real-time computer circulation network. the initial configuration links two terminals in each of the holmdel, murray hill and whippany, new jersey, libraries to a central computer at murray hill. these are the three largest libraries in bell laboratories, handling 75% of a system total of more than 300,000 loans per year. the bellrel system is designed to process loans, returns, reservations and queries with real-time speed and responsiveness; additionally, it provides a wide range of other products and information basic to the effective control and use of library resources. the libraries of bell laboratories, like many other research libraries, have experienced unprecedented growth over the past decade in facilities, collections, services and traffic. new approaches have had to be found not only to supply information services of sufficient power and diversity to meet the needs of a communications research organization of over 15,000 people, but also to cope with the expanding volume of everyday work in its eighteen library units. as elsewhere, a large component of that work is circulation in all of its ramifications: direct service, record-keeping, follow-up, resource identification, inter-unit coordination, feedback for purchase and purge decisions, etc. the bellrel system is addressed to these problems within the context of the bell laboratories. the use of computers in circulation control is no longer novel. the studies done by george fry and associates for the library technology project of the american library association emphasize the expense of implementing computer-aided circulation systems (1, 2).
despite these studies, which tend to focus more on the gross costs of substituting data processing for manual techniques than on the immediate and long-range gains for the library as an information system, a trend to the computer is clear. southern illinois (3), lehigh (4), and oakland (5) are among the many university and research libraries which have automated circulation operations using the ibm 357 data collection system and batch processing. comparable systems are in use or planned by other libraries (6, 7). latterly there is increasing evidence of serious interest in real-time circulation control. the queen's university of belfast (8) and the state university of new york at buffalo (9) are two institutions reporting studies. redstone arsenal has been demonstrating a two-terminal, on-line system for about a year as part of a comprehensive automation program (10). the bellrel system was put into regular service in march, 1968, after two months of dry-run testing at all six terminals. this paper describes the reasons for changing from a manual system; the objectives established for the new system; the alternatives evaluated; the principal elements, operations and services of the selected system; and problems and performance in the brief period of operations to date. the paper is essentially a summary description; it does not report in detail on all card, disk and tape formats, maintenance procedures, products, logical operations, etc., of the system and its fifty-plus programs. a further report on bellrel will be published when significant experience has accrued. the displaced manual system the newark self-charge-signature system has been used by bell laboratories' libraries for some forty years. in this well-known simple system, the borrower writes his name and address on a prepared book card pulled from the book pocket. for the two out of three loans at bell labs where the borrower is not present, a circulation assistant fills out the card, which is then date stamped, tabbed for date due and filed by author. minor variations on this practice are used for unbound journals and other items lacking book cards. reservations for individuals or other libraries in the network are hand posted on the charge card. files are scrutinized for overdue dates every several days (latterly, less frequently as traffic has mounted) and notices prepared by xerox copying of the charge card on an appropriate form. although standard loan periods run from one to four weeks, depending upon the item and demand, about 30% of all loans result in overdue notices. each library in the network has maintained its own circulation records, including records for the local circulation of items borrowed on inter-unit loan. inter-unit traffic is heavy, although substantial duplication of important publications exists in the various libraries. the merits of the newark self-charge system (simplicity, fast handling of borrowers, relatively low cost) are widely known. the system is a venerable one; it works. but all circulation systems have imperfections, and in the bell laboratories long-recognized deficiencies of the manual system became increasingly unacceptable when loan traffic began to approach, then exceed, 200,000 items per year. these deficiencies included: 1. an increasing number of hours spent on the tedious and uninspiring tasks of sorting, tagging, posting, slipping, checking and husbanding cards. 2.
labor, frequent delays and poor service associated with processing over 60,000 overdue notices per year. 3. inability automatically to use the pooled resources of several libraries to meet demands. 4. inability to determine quickly not merely the holdings of other copies of a title in the library system (union catalogs serve this purpose, after some steps and card handling) but the availability of loan copies at the moment of need. 5. inefficiencies in tracking down missing publications, inventory items, etc. 6. inability to identify all publications currently on loan to a borrower or used by him sometime previously. 7. inadequate information on collection use for resource management. 8. excessive service delays due to combinations of the preceding factors. new system objectives the deficiencies listed above suggest some of the characteristics defined for the new system. library management concluded early in 1965 that any replacement for the existing system must: 1. meet the long-range needs of each of the major libraries in bell laboratories and be extensible to other units in the library network as traffic, experience and costs warranted. 2. provide not merely a more effective means for handling circulation operations within the walls of any one library but also, if possible, an instrument for knocking walls down, for bringing the combined resources of a number of libraries to bear on any information need. 3. handle all types of materials, bound or unbound, and all types of requests whether in person, by mail, direct telephone or recorded message (i.e., telereference) service. 4. give immediate up-to-the-minute accounting for all items on loan or otherwise off the shelves and locate copies still available for loan. 5. hold reservations against system resources (in line with objective 2) and direct the first copy returned, wherever returned, and as automatically as practical, to the first person on the reserve queue, whatever his base location. 6. identify promptly all items currently charged to a borrower and, as required, previously borrowed by him. 7. monitor circulation traffic and generate, as necessary, overdue notices, missing item lists, high-demand lists, zero-activity reports, statistics, use analyses and other feedback fundamental to effective control and management of the collection. 8. lift the circulation staff from clerical tasks to more personal service to library users, in the interest of the "human use of human beings," to use norbert wiener's phrase. 9. integrate the loan system with other computer-aided systems in use or planned in the libraries. 10. improve the total response of the library to the user. systems evaluated in view of these objectives it will be apparent that only a computer-aided system could be seriously considered. none of the several dozen noncomputer systems surveyed in the fry report (1) could be considered a worthwhile alternative to the libraries' manual system. the essential questions therefore became: off-line or on-line access? batch or real-time processing? the demonstrated success of the ibm 357 batch processing circulation system compelled study and on-site investigation in several libraries. it was concluded, however, that while the 357 system would meet a number of the established goals, and at moderate cost, the important objectives of immediate accountability, automatic follow-up on reserves, full disclosure of copies available for loan, and automatic pooling of network resources would be seriously compromised.
further, the fact that two-thirds of all loans made in bell laboratories do not involve the presence of the borrower substantially detracted from one of the major virtues of the 357 system, i.e., the simplicity of input using a pre-punched man (identification) card submitted by the borrower. the various alternatives for coping with this situation in a 357 system, for 200,000 loans a year and a potential of over 15,000 people, were not attractive. the feasibility of on-line access has been widely demonstrated in the research and business world. remote, on-line computer processing is clearly a common course of the near future. equally predictably, it will steadily give more favorable cost/value ratios as machine costs decrease and labor costs mount. in sum, the technical information libraries concluded that an on-line system was worth the investment and that no other system was worth the price. only an on-line approach would meet the overall objectives for a new system and offer advantages sufficient to justify conversion effort at this time. as frederick ruecking has observed, "a charging system should not be selected because it is 'cheaper' than others. if the selected system does not meet the present and future needs, the choice is poor." (11) the bellrel system bellrel is a joint development of the technical information libraries and the comptroller's division of bell laboratories. the system was designed, programmed and implemented in a little over two years, beginning in late 1965. during this time, preparation of the bibliographic records, system design and programming took about seven man years. basic machine elements the initial network is illustrated in figure 1 (fig. 1. bellrel circulation system network). the two ibm 1050 terminals in each of the three libraries incorporate keyboard, printer and card reader facilities for maximum flexibility in handling all types of transactions and queries. each terminal is linked by telephone lines, using western electric 103a data-sets, to an ibm 360-40 computer in the comptroller's division at murray hill. the murray hill library is only a building away from the computer. the holmdel and whippany libraries are about thirty and twelve miles distant, respectively. the computer, in heavy daily use along with other computers for regular operations of the comptroller's division, has a 262,000 byte (character) core memory. core is partitioned, permitting effective simultaneous use of the computer for routine batch operations and the bellrel system. in addition to core requirements for the 360 operating system, core partitions include (a) the teleprocessing logic of the ibm queued teleprocessing access method (qtam), (b) message editing logic and application logic packages, including library applications, and (c) batch processing programs and operations for all purposes.
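the figure 2 flowchart that follows depicts this real-time flow: a message received from a terminal is queued, edited, handed to application logic that reads and updates the disk files, and answered with a queued response. the sketch below is an illustrative modern reconstruction of that cycle in python; bellrel itself was written in cobol and assembly, and every name here is invented for illustration.

from queue import Queue

# illustrative stand-ins only: these model the receive -> queue -> process ->
# respond cycle of the real-time logic, not the actual bellrel programs.
VALID_CODES = {"lm", "ln", "lc", "lk", "la", "lb", "ld", "lp", "lr", "lg"}

inbound = Queue()    # messages received from the 1050 terminals
outbound = Queue()   # responses queued for sending back

def edit_message(raw):
    """message-editing logic: split a keyed transaction into code and data."""
    code, data = raw[:2].lower(), raw[2:]
    if code not in VALID_CODES:
        return None, "invalid transaction code"
    return (code, data), None

def apply_logic(code, data, disk):
    """application logic: consult and update disk records, build a response."""
    # a real handler would read man and publication records and update them here.
    return f"{code} processed for {data}"

def poll_once(disk):
    """one pass of the real-time loop for a single queued message."""
    terminal, raw = inbound.get()
    parsed, error = edit_message(raw)
    reply = error if error else apply_logic(*parsed, disk)
    outbound.put((terminal, reply))

inbound.put(("mh-1", "lm43486"))
poll_once(disk={})
print(outbound.get())    # ('mh-1', 'lm processed for 43486')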
figure 2 (general flowchart of bellrel real-time programming logic) illustrates the core partitioning for (a) and (b): a message from a 1050 terminal is received and queued by the teleprocessing logic, edited by the message editing logic, and processed by the application logic, which refers to the disk files, updates them and generates a response that is queued for sending back to the terminal. in addition to the programs resident in core (portions of which can be overlaid as necessary by other real-time operations), certain programs for particular functions (e.g. loan, return, etc.) are called from disk as needed. in all, 32 real-time and 23 batch programs, together with the 360 operating system, are used by bellrel. the programs are written in cobol level f and basic assembly language. disk records publication and man records are stored on an ibm 2314 disk pack with a capacity of some 29,000,000 characters. about two-thirds of this space is in use or dedicated. the man records, which are up-dated daily from tape used for telephone directory, payroll and other purposes, cover about 19,000 people including btl employees and resident visitors, i.e., contractual people who may also use library facilities. each man record is 161 characters in length and contains such information as payroll account number, name, department number, telephone number, location, occupational class, space for three book loans, keys referring to overflow loan trailers elsewhere on disk, etc. the man file is organized by payroll account number, a five-digit number which is keyed in or read from a prepunched card for all loans, reservations and other transactions requiring it. access to man records on disk is by the ibm index sequential access method (isam). publication records vary in format, length and method of access depending upon the class of publication. five classes of publications are currently in the system: books (class 1), journals (class 2), trade catalogs (class 3), college catalogs (class 4) and dewey-classified continuations and multiple-volume titles cataloged as sets (class 5). other classes of information, e.g., documents, motion picture films, etc., will be added. each title in each class is assigned a unique six-digit identification number, the first digit of which identifies the class. a typical number for a monograph title is 127391. the punched cards and book labels for each copy of this title also indicate the holding library and its copy number, e.g., 127391mh01, 127391wh05. a sample card and label, generated by the computer, are shown in figure 3. as noted above, books fall in two classes, 1 and 5. each class provides a maximum of 100,000 title numbers, more than adequate for the predicted growth of the technical information libraries where weeding is heavy. the book collections for the three libraries now on disk total about 33,000 titles and 66,000 volumes. the disk record for each class 1 title is 188 characters in length and contains the book number, 43 characters of author-title, the call number, copies by location, the fields for file maintenance change information, three loans, two reserves, keys to loan trailers and reserve trailers, etc.
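as a concrete reading of the numbering scheme just described (a six-digit identification number whose first digit gives the class, followed on cards and labels by the holding library and copy number, e.g., 127391mh01), the short python sketch below parses such a label. the field widths are assumptions inferred from the examples in the text, not a documented format.

CLASSES = {               # first digit of the six-digit identification number
    "1": "book",
    "2": "journal",
    "3": "trade catalog",
    "4": "college catalog",
    "5": "continuation / multiple-volume set",
}

def parse_label(label):
    """split a card or label string such as '127391mh01' into its parts.

    assumed layout, inferred from the examples in the text: six-digit item
    number, two-letter holding-library code (mh, wh, ...), two-digit copy number.
    """
    item_no, library, copy_no = label[:6], label[6:8], label[8:10]
    return {
        "class": CLASSES.get(item_no[0], "unknown"),
        "item_number": item_no,
        "library": library.upper(),
        "copy": int(copy_no),
    }

print(parse_label("127391mh01"))
# {'class': 'book', 'item_number': '127391', 'library': 'MH', 'copy': 1}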
each loan field identifies borrower, date due, copy and status of the loan (e.g., overdue number, renewed, number of reserves, returned). (fig. 3, bellrel book card and label, shows the computer-generated card and label for copy 102362mh01: holton, g./science and the modern mind, 500/h75.) the identification number for each new class 1 book is assigned by the computer on update runs. numbers are sequential. disk access is direct. class 5 books (cataloged continuations and multiple-volume titles cataloged as sets) share a different kind of disk record. they could all have been entered as class 1 items, in which case each volume of a set would have had a separate record on disk, a unique (not necessarily consecutive) identification number, and a separate listing in the author, call number and identification number printed catalogs. the class 5 approach, however, permits grouping of volumes in sets and series. ten volumes of one title are handled in one disk record, 288 characters in length, under the same identification number. additional volumes, up to a total of 100, are handled in succeeding records. all of the records of the set carry the same first five digits in their identification number. disk access is by the index sequential access method (isam). in addition to grouping sets, class 5 records effect a saving on disk space and permit use statistics to be derived for the set as a whole, as well as for each volume in the set. the principal disadvantage of the approach is that all keyed messages dealing with any volume in the set must cite both the basic access number and the specific data (e.g., volume number) pertinent to the volume in question. the journal disk records cover all the 2700 journal titles held in the library system. unlike books, however, records of all copies and volumes of each title are not permanently stored on disk. instead, each 155-character journal title record contains the journal identification number and 48 characters of title, plus fields for file maintenance changes, two loans, one reserve, and keys to loan and reserve trailers. specific bound volumes or unbound issues are recorded on this record only as long as they are current loan or reserve transactions. to expedite loans and returns, punch cards and computer-printed labels have been prepared for some 10,000 bound journal volumes. additional volumes are similarly processed as circulated or bound. disk records for trade catalogs and college catalogs are also 155 characters long. access to records is also by the index sequential access method. unlike journal volumes, however, each separate catalog is specifically identified and recorded on disk. when conversion is complete, more than 5000 catalogs will be accessible on disk. the loan and reserve trailers for each publication class accommodate overflow. trailer records vary in number and length depending upon function, publication type and predicted need. for example, 5000 31-character trailer records, each handling three reserves, are available for book reserves. for journals, 800 59-character records, each handling three reserves, are provided.
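the class 5 arrangement described above (ten volumes of a set sharing one 288-character record, with succeeding records for additional volumes up to a maximum of 100) implies a simple arithmetic mapping from a volume number to its record and slot. the python sketch below illustrates that inferred mapping only; the system's actual access was through isam, and no such routine is given in the article.

def class5_location(volume_number):
    """map a volume of a class 5 set to its disk record and slot.

    inferred from the description: ten volumes share one 288-character record,
    and further records hold additional volumes up to a set maximum of 100.
    """
    if not 1 <= volume_number <= 100:
        raise ValueError("class 5 sets hold at most 100 volumes")
    record_index = (volume_number - 1) // 10    # which 288-character record
    slot_in_record = (volume_number - 1) % 10   # position within that record
    return record_index, slot_in_record

# volume 23 of a set falls in the third record (index 2), third slot (index 2)
print(class5_location(23))    # (2, 2)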
the difference between the two trailer allocations reflects the heavier book traffic and the particularly sharp peaking of reserves on new book titles. apart from the normal safety back-up files (e.g., the nightly dump to tape of the current disk records), the only remaining machine record which requires mention is the history tape. this tape, up-dated daily, is a continuing record of all completed loans which provides information necessary for statistics and use analyses. on-line transactions twenty-two different transaction codes are currently available to handle loans, returns, renewals, reservations and queries in real time. in addition, any terminal can call another terminal by a single digit code and one terminal in each library can call the other two libraries simultaneously by a 'broadcast' code. this inter-library, typewritten message facility is a highly useful component of the total system. ten of the twenty-two transaction codes handle loans, returns, reservations and renewals. these codes, their prime functions and associated data inputs are listed in table 1. the eleven lq (library query) codes for requesting information from bellrel are listed in table 2. one additional code causes the computer to print out at the query terminal a statistical log of all classes of transactions at each terminal and their totals. it also gives the number of input errors made at each terminal. the log aids in adjusting work loads and monitoring performance. let us now consider several common transactions in more detail. loans: if the borrower is present, he gives the desired book to the circulation clerk. he shows his badge or, alternatively, writes his surname and five-digit number on a card. table 1 (on-line codes for loans, etc.) groups the ten codes as follows. loans: lm, loan of 1-5 items for one man at one time (input: man no. and item no., including location and copy, usually card read); ln, overnight loan, which assigns the overnight loan period automatically and does not pick up reserves on return. returns: lc, cancel loan and charge out automatically to the first person on the reserve queue (input: item no., with location and copy, usually card read); lk, cancel loan, no automatic charge-out. reserves: la, add to the reserve queue and report reserve no., copies held and available, etc. (input: man no. and item no., less location and copy, keyed); lb, bypass the reserve queue and put the designated man first; ld, delete from the reserve queue. renewals, etc.: lp, change the loan period assigned (input: new loan period and complete item no., keyed or card read); lr, renew a loan, once; lg, force renewal irrespective of reserves, overdue status, etc. recovery from errors may be done by aborting input, repeating it correctly or, if all elements are legitimate to the message edit program, by using the appropriate on-line code to correct the record. on the first day of full operations, 10% of the input transactions were incorrect. one week's experience reduced the error rate to 3%, and further improvement is expected. the .25% error rate estimated by lazorick and herling for a system planned to function without any prepunched cards (9) appears unrealistic. non-personal codes some thirty special codes, which function like man numbers in the system, are available to handle real-time transactions involving branch libraries, outside organizations and such internal library functions as charges to recataloging, repair, new book shelf, etc.
all are three-digit codes, essentially mnemonic, e.g., al9 allentown ( pa.) library; wi9withdrawn. most of the codes generate overdue notices; the codes for binding, missing, repair and a few oth~rs do not. several require backup manual records, e.g., ala interlibrary loan forms for charges to outside libraries. batch processes and products overdue notices and daily loan lists are produced in a nightly file maintenance run which also updates the history tape. the preprinted forms used for first and second overdue notices are address-sorted for direct mailing. the third notice, triggered three days after the second and ten days after the first, is a listing with telephone numbers and other data for telephone follow-up. the daily loan list is primarily a back-up record in the event of system down-time. current loans, the number of reserves and other information are combined in one list for all three libraries. the bellrel master book catalog is run quarterly from disk records. main entry, dewey number and access number catalogs are produced. all new copy; new title and other record changes made on disk in maintenance runs are reflected in cumulative weekly catalog supplements . these runs also produce all the new or changed cards and labels required. the bellrel catalog is a precursor to a system-wide printed book catalog which will replace nearly one million catalog cards held in eighteen libraries. when completely developed, input to the circulation system will be a sub-system of the master catalog maintenance procedures. maintenance of the disk journal records for bellrel follows a comparable integrated approach: journal code numbers, title abbreviations, data changes and the like are derived from the computer routines used to prepare the serials catalog since 1962. trade catalog files for the bellrel/kennedy 141 the book and the number of copies still available for loan at each library. getting one copy into the hands of the requester is then very simple. the holding library nearest to the borrower is instructed, by telephone or terminal message, to send call number such-and-such "out." the requester's name and address are not relayed. the holding library gets the book from the shelves and cancels it, using the lc command with the card reader. although this copy was not on loan, the computer ignores this fact because someone is waiting for the book, i.e., the requester whose reserve triggered this sequence. as a consequence of the cancel operation, the requester is automatically charged with the book, the holding library is told his name and address, and mailing follows. the lc command is also used in the same way to get additional copies of a book, when purchased to meet high demands, into the hands of the requesters. the la reserve transaction is put to particularly good use in handling the 600-plus requests received within a few days each month for new books announced in the library bulletin. bulletin request forms supply both item numbers ~nd man numbers. mass input follows and the computer responds with all the signposts needed to put every copy in the system to work, with a dispatch speed hitherto impossible to achieve. as shown in table 1, two transactions permit changes in reserve queues. ld deletes a requester. lb permits the queue to be bypassed and insertion of a new name at the top of the list. queries this is a fact retrieval facility. the codes listed in table 2 are reasonably self-explanatory, and take into account the realities of on-line circulation service. 
lqc, for example, tells the status of a title at the moment of asking, an up-to-dateness not available from the backup daily loan list generated each night. typical responses to the lqc code are: copies available, mh02 wh01; title removed my68; or all 03 copies loaned, 14 reserves. similarly, lql provides a requester with an immediate, printed listing of all the items he has on loan. two query codes cause display of the complete disk record for a publication (lqd) or a person (lqe), including current loans, reserves and trailer records. error detection in any keyboard operation, mistakes will be made. bellrel attempts to signal critical errors and prevent them from affecting records. as noted previously, input man numbers and item numbers are translated by the computer into alpha characters. numerous diagnostics are also returned: e.g., invalid transaction code; invalid book id #; invalid empl #; invalid transaction, bad copy #; variable data required, etc. incorrect inputs generating these and similar messages are not accepted by the system. table 2 (on-line library query codes; all queries are keyed) covers publications and people. publications: lqc, what is the status of title . . . ? (item no., less location and copy); lqs, what is the status of copy . . . ? (item no., with location and copy); lqn, what overnight items are still out? (location symbol only); lqd, display the complete disk record for title . . . (item no., less location and copy). people: lqm, how many items are on loan to . . . ? (man no.); lql, what items are charged to . . . ? (man no.); lqq, who is first on the reserve queue for . . . ?; lqr, is man . . . on reserve? where?; lqw, who are the borrowers of title . . . ?; lqz, who is man number . . . ? (man no.); lqe, display the complete disk record for man . . . . while the borrower is writing his surname and five-digit number, the clerk hits the 'request' button on the keyboard-printer, and inserts the book card in the card reader along with an end-of-transmission card. with the keyboard 'proceed' light on (2 seconds after 'request'), the clerk returns the typewriter carriage and keys in lm (the loan code) and the man number obtained from the borrower: e.g., lm43486. input is completed by activating the card reader. the card reader reads only to the end-of-block punch (column 16 in book cards), ignoring the author-title data and call number in columns 17-78. as the book identification number is read, it is listed on the typewriter. the loan period is not punched in the card, but is assigned by the computer on the basis of the first digit of the publication's identification number. the assigned loan may be altered from the keyboard, if desired. the computer responds to the loan transaction in 3 to 5 seconds, printing back in upper-case red the first three letters (trigram) of the borrower's surname and twenty characters of the book's author-title entry. these responses provide checks against errors in keying. man numbers are usually keyed (although they may also be card read) and a book number is keyed when its punch card is not available, e.g., in posting a reserve. as noted below, a wide range of other computer responses are available to flag errors and aid diagnosis. the loan transaction is completed by inserting the punch card in the book and date stamping it. total elapsed time from the borrower's presentation of the book to date stamping averages about 23 seconds for a single loan of the type described. this compares with about 20 seconds cited for one ibm 357 system (3) and 14 seconds in bell laboratories' manual system; however, in both these systems further processing is required.
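the single-loan input just walked through (keying lm plus the borrower's five-digit man number, such as lm43486, then card-reading up to the end-of-block punch in column 16) can be mimicked in a short python sketch. the validation rules and field positions below are assumptions drawn from the narrative, not the bellrel programs.

def parse_loan_input(keyed, card_columns):
    """parse one keyed loan transaction plus a card-read item field.

    assumptions for illustration: the keyed string is a two-letter loan code
    followed by the five-digit man number (e.g. 'lm43486'); the card is read
    only to the end-of-block punch in column 16, so the first 16 columns carry
    the item number with library and copy, and author-title in 17-78 is ignored.
    """
    code, man_no = keyed[:2].lower(), keyed[2:]
    if code not in {"lm", "ln"}:
        raise ValueError("invalid transaction code")
    if not (man_no.isdigit() and len(man_no) == 5):
        raise ValueError("invalid empl #")
    item_field = card_columns[:16].strip()
    return {"code": code, "man": man_no, "item": item_field}

card = "102362mh01      holton, g./ science and the modern mind   500/h75"
print(parse_loan_input("lm43486", card))
# {'code': 'lm', 'man': '43486', 'item': '102362mh01'}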
if a borrower wishes to charge out more than one book at a time (a common occurrence which ruled out punching the end-of-transmission code on the book card), up to five books may be handled with one keyboarding. total elapsed time for multiple loans averages about 15 seconds per book. loans of bound journals, trade catalogs and other publications with prepunched cards follow the routine described. for unbound journals and other items lacking cards, it is necessary to obtain the title number from the printed catalog and to key this in with the relevant issue information. other transaction codes, as noted in table 1, deal with loan period changes and renewals. typical computer responses from the renew code (lr) include: renew; overdue; res waiting; no renew. returns the two-character return code (lc) is used with card reading or typing. five items may be discharged with one lc action. the computer responds with twenty characters of author-title, and one of the following messages, for each item: returned, i.e., the loan is complete and no one is waiting for a copy of this book. loan to . . . , i.e., send the book to the man indicated by name and address. since he was first on the reserve queue, the book is now charged out to him for the loan period shown. mail to . . . libr, i.e., this book belongs to the library shown and should be returned there. no one is waiting for it. not on loan, i.e., this book was previously cancelled or somebody borrowed it without charging it out. the loan to . . . response noted above is a particularly valuable service and time-saving feature. in effect, if any reserve exists anywhere in the system for the title, then the first copy returned is automatically charged to the first person in the queue and the next person moved up. the loan period assigned by the computer depends upon whether there is a waiting list for the book. the library does not need to take any charge-out action except to date stamp the book and address a mail envelope using the information provided. the mail to . . . libr response, calling attention to the fact that the book should be returned to its 'home' library, is coupled with automatic charging by the computer to in transit to . . . questions about the copy will receive this response during the time it takes to ship it to its home base. when the book is cancelled at the home library, any reservations made during the 'in transit' phase will cause automatic loan in the manner already described. cancellation of a loan charge without automatic follow-up on the reserve queue is sometimes desirable. for example, after a copy of a book has been charged to 'missing' and search has failed to locate it, a charge to 'lost' may be desirable for record purposes. use of the lk return code, instead of the normal lc, makes this possible without automatic pick-up of the reserve queue.
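the return handling described above, in which lc discharges an item and automatically charges the first copy returned to the head of the reserve queue while lk suppresses that pick-up, reduces to a small piece of decision logic. the python sketch below is an illustrative reconstruction of that behaviour; the record fields and their names are invented.

def process_return(item, code="lc"):
    """discharge one item and pick the response, mirroring the behaviour above.

    `item` stands in for a publication's disk record; the keys used here
    ('on_loan', 'home_library', 'returned_at', 'reserves') are invented for
    illustration and are not the bellrel field names.
    """
    if not item["on_loan"]:
        return "not on loan"
    item["on_loan"] = False
    if code == "lc" and item["reserves"]:
        next_man = item["reserves"].pop(0)     # head of the reserve queue
        item["on_loan"] = True                 # charged out again at once
        return f"loan to {next_man}"
    if item["returned_at"] != item["home_library"]:
        item["on_loan"] = True                 # charged to 'in transit'
        return f"mail to {item['home_library']} libr"
    return "returned"

book = {"on_loan": True, "home_library": "mh", "returned_at": "wh",
        "reserves": ["43486"]}
print(process_return(book))    # loan to 43486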
reservations since reserves are posted in bellrel in real time, any copy of a title returned, even seconds after the reserve is made, will be charged to the first man on reserve. reserves are input using the keyboard sequence la, man number, item number. the computer response includes the standard name trigram and publication data. if all copies of the title are on loan, the computer also responds with information to the requester on where he stands; as an example, "res #03, copies held 05". if all copies are not on loan, the response includes the call number of circulation system are similarly correlated with other existing machine processes and products. as stated earlier, much improved feedback on collection use, demand patterns, and other matters important to library management was a major goal of bellrel. the history tapes serve this purpose both for special-purpose analyses and regular system reports. the latter include circulation statistics by subject class and library, laboratory location, user department and so on. three other reports may be mentioned: 1. high-demand list: this is a weekly list focusing attention on all titles with more than a specified number of reserves. reserves and copies are shown by location. previous loan totals are also given to aid in purchase decisions to meet demands. 2. zero-loan list: this is a semiannual listing of all titles in the collection with no recorded loan activity in one or more libraries for the period surveyed. a summary of previous loans is given, to help in decisions on weeding. 3. missing items list: this is a twice-monthly, dewey-ordered list of all titles charged to 'missing.' it is used to conduct scheduled searches in all libraries until the items are converted to 'lost' and replaced or withdrawn. operating experience this paper is being written after only one month's use of bellrel in regular service. the following observations are therefore limited. circulation assistants have adapted very quickly to the input mechanics, familiarity with typewriter keyboards and the novelty of conversing with the computer being contributing factors. bellrel appears to be regarded as a powerful and perceptive colleague with the occasional off moments accepted in a friend. burdensome tasks, such as preparing overdue notices and maintaining card records, have been dropped with enthusiasm. staff members are developing new perspectives as they understand the functioning of an information network. the total system concept, embracing the resources of all participating libraries and permitting one copy of a book to serve many readers without inter-library loan, is modifying many practices. greater record accuracy, completeness and utility is also being realized, along with significant time-savings throughout the system. the query facility, which shows promise of being much used, provides immediate answers to certain questions which previously could not be asked and gives a glimpse of the eventual responsiveness of a complete on-line library catalog. customer reaction has ranged from some technical interest (technical staff members were consulted in the development of the system and information about its purposes and functions has been widely disseminated) to more common approval and enthusiasm. the increase in time to charge out a book in person in bellrel (about nine seconds more than the manual system for a single loan and two seconds more per book for multiple loans) appears to be widely accepted. whether this is due to initial tolerance of a new system, or less 'work' by the borrower in the charge operation, or an appreciation that service as a whole will be faster and more responsive, is not known.
it is expected that charging time will be reduced with program modifications and experience. it should also be recalled that in two out of three loans the borrower is not present: far from experiencing additional delay, he gets what he wants faster. the usual bugs in a complex of programs have arisen; certain trailers had to be enlarged; the 360 operating system and hardware have failed several times. down-time, under initial loads of up to 1500 on-line transactions per day, has been less than anticipated for the first month and is expected to drop sharply. about two down-times per day were experienced in the first month, about half of these being deliberate, and most recoveries have taken less than fifteen minutes. down-time logs are used to record transactions for immediate entry into the system when it becomes alive, a similar procedure being used for after-hours loans. costs the costs of operating the bellrel system are, understandably, higher than the displaced manual system, the two systems, of course, not being comparable in services and functions. in the operations which can be fairly directly correlated, bellrel permits very significant labor savings. appreciable materials savings are also anticipated as a result of collection pooling (leading to reduced duplication of resources in the individual libraries), better inventory control, and other factors. rental costs are the major component. each of the six terminals, for example, with associated data-sets and telephone lines, costs $275 a month. costs of the portion .of the transmission control unit and disk facility used by the libraries total about $1100 a month. in addition to a small amount for materials, other costs include a share of the central processing unit and core memory charges, depending upon usage. to execute 1500 real-time transactions per day appears to require less than 12 minutes of main-frame computer time, but a share of the real-time terminal polling and batch processing time must also be included. however, experience with the automated system has been far too brief to reach any precise cost figures for the whole system. in particular, although the dollar value of the largely intangible but very real benefits to library users and library staff can only, at least at this stage, be guessed at, bellrel has been implemented on the premise that these benefits are major. it should be noted that the costs of the manual (newark) system in bell laboratories differ greatly from the costs calculated by the library technology project ( l tp) for this system in an academic library ( 12). bellrel/kennedy 145 ltp cost estimates for both the newark and the ibm 357 systems do not conform to our calculations for more reasons than can he discussed here. in the main, however, environmental conditions, strongly affecting labor costs, are too different. for example, in arriving at labor costs, l tp uses the figure of 44 overdues per 1000 circulations in academic libraries; in our library system where there are no fines or long loans, overdues total about eight times this figure. again, as a result of book announcement services, discipline concentration and other factors, reserves in the bell laboratories libraries are nearly twenty times the ratio used by the library technology project for academic libraries. still further, in bell laboratories some 200,000 loans per year are made without the borrower being present to fill in the loan card. these and other factors add heavily to the cost of labor. 
few industrial organizations can obtain labor at the cost of $2.00 per hour cited in library technology reports when personnel benefits and other overhead are included. conclusion paul fasana has observed: "since cost is primarily a quantitative measure of a system, it is but one of several factors (and possibly not even the most important factor) to consider in evaluating an automated system. other factors . . . qualitative factors . . . must also be considered. . . . they include such items as operating efficiency, reliability, services rendered, and growth potential." (13) a full judgment on these factors in the bellrel system must await further experience but the following observations may be made: 1) bellrel is not an experiment; it is addressed to practical problems in an industrial library network. 2) it is not a final system; software and hardware evolution will see to that. 3) it is not a model system, transportable in toto to another context; any system of comparable complexity and investment requires careful matching to local needs and objectives. bellrel objectives, to reiterate, include improved service through computer pooling of dispersed library collections, up-to-date reporting on the status of any publication, immediate identification of all items on loan to a person and automatic follow-up on reserve queues; reduced clerical labor; better inventory control; much enriched feedback for library management; more effective realization of the information network philosophy; and experience in the new era of man-machine communication in a real-life environment. the evidence is strong that these objectives are being achieved. acknowledgments the technical information libraries gratefully acknowledge the unstinting and imaginative aid given by the comptroller's division of bell telephone laboratories in the design, development and operation of the bellrel system. bibliography 1. george fry and associates, inc.: study of circulation control systems (chicago: ala, 1961). 2. american library association, library technology project: the use of data-processing equipment in circulation control (chicago: ala, july 1965), library technology reports. 3. mccoy, ralph e.: "computerized circulation work: a case study of the 357 data collection system," library resources & technical services, 9 (winter 1965), 59-65. 4. flannery, anne; mack, james d.: mechanized circulation system, lehigh university library (center for the information sciences, lehigh univ.: nov. 1966), library systems analyses report no. 4. 5. cammack, floyd; mann, donald: "institutional implications of an automated circulation study," college & research libraries, 28 (march 1967), 129-32. 6. cuadra, carlos a., ed.: american documentation institute annual review of information science and technology, vol. 1 (new york: interscience, 1966), pp. 201-4. 7. mccune, lois c.; salmon, stephen r.: "bibliography of library automation," ala bulletin, 61 (june 1967), 674-94. 8. kimber, richard t.: "studies at the queen's university of belfast on real-time computer control of book circulation," journal of documentation, 22 (june 1966), 116-22. 9. lazorick, gerald j.; herling, john p.: "a real time library circulation system without pre-punched cards," proceedings of the american documentation institute, v. 4 (washington: adi, 1967), 202-6.
10. croxton, f. e.: on-line computer applications in a technical library (redstone scientific information center, redstone arsenal, alabama: november 1967), rsic-723. 11. ruecking, frederick, jr.: "selecting a circulation-control system: a mathematical approach," college & research libraries, 25 (sept. 1964), 385-90. 12. american library association, library technology project: three systems of circulation control (chicago: ala, may 1967), library technology reports. 13. fasana, paul j.: "determining the cost of library automation," ala bulletin, 61 (june 1967), 661. paul t. jaeger and zheng yan one law with two outcomes: comparing the implementation of cipa in public libraries and schools though the children's internet protection act (cipa) established requirements for both public libraries and public schools to adopt filters on all of their computers when they receive certain federal funding, it has not attracted a great amount of research into the effects on libraries and schools and the users of these social institutions. this paper explores the implications of cipa in terms of its effects on public libraries and public schools, individually and in tandem. drawing from both library and education research, the paper examines the legal background and basis of cipa, the current state of internet access and levels of filtering in public libraries and public schools, the perceived value of cipa, the perceived consequences of cipa, the differences in levels of implementation of cipa in public libraries and public schools, and the reasons for those dramatic differences. after an analysis of these issues within the greater policy context, the paper suggests research questions to help provide more data about the challenges and questions revealed in this analysis. the children's internet protection act (cipa) established requirements for both public libraries and public schools to—as a condition for receiving certain federal funds—adopt filters on all of their computers to protect children from online content that was deemed potentially harmful.1 passed in 2000, cipa was initially implemented by public schools after its passage, but it was not widely implemented in public libraries until the 2003 supreme court decision (united states v. american library association) upholding the law's constitutionality.2 now that cipa has been extensively implemented for five years in libraries and eight years in schools, it has had time to have significant effects on access to online information and services. while the goal of filtering requirements is to protect children from potentially inappropriate content, filtering also creates major educational and social implications because filters also limit access to other kinds of information and create different perceptions about schools and libraries as social institutions. curiously, cipa and its requirements have not attracted a great amount of research into the effects on schools, libraries, and the users of these social institutions. much of the literature about cipa has focused on practical issues—either recommendations on implementing filters or stories of practical experiences with filtering. while those types of writing are valuable to practitioners who must deal with the consequences of filtering, there are major educational and societal issues raised by filtering that merit much greater exploration.
while relatively small bodies of research have been generated about cipa’s effects in public libraries and public schools,3 thus far these two strands of research have remained separate. but it is the contention of this paper that these two strands of research, when viewed together, have much more value for creating a broader understanding of the educational and societal implications. it would be impossible to see the real consequences of cipa without the development of an integrative picture of its effects on both public schools and public libraries. in this paper, the implications of cipa will be explored in terms of effects on public libraries and public schools, individually and in tandem. public libraries and public schools are generally considered separate but related public sphere entities because both serve core educational and information-provision functions in society. furthermore, the fact that public schools also contain school library media centers highlights some very interesting points of intersection between public libraries and school libraries in terms of the consequences of cipa: while cipa requires filtering of computers throughout public libraries and public schools, the presence of school library media centers makes the connection between libraries and schools stronger, as do the teaching roles of public libraries (e.g., training classes, workshops, and evening classes). n the legal road to cipa history under cipa, public libraries and public schools receiving certain kinds of federal funds are required to use filtering programs to protect children under the age of seventeen from harmful visual depictions on the internet and to provide public notices and hearings to increase public awareness of internet safety. senator john mccain (r-az) sponsored cipa, and it was signed into law by president bill clinton on december 21, 2000. cipa requires that filters at public libraries and public schools block three specific types of content: (1) obscene material (that paul t. jaeger (pjaeger@umd.edu) is assistant professor at the college of information studies and director of the center for information policy and electronic government of the university of maryland in college park. zheng yan (zyan@uamail.albany .edu) is associate professor at the department of educational and counseling psychology in the school of education of the state university of new york at albany. one law with two outcomes | jaeger and yan 7 which appeals to prurient interests only and is “offensive to community standards”); (2) child pornography (depictions of sexual conduct and or lewd exhibitionism involving minors); and (3) material that is harmful to minors (depictions of nudity and sexual activity that lack artistic, literary, or scientific value). cipa focused on “the recipients of internet transmission,” rather than the senders, in an attempt to avoid the constitutional issues that undermined the previous attempts to regulate internet content.4 using congressional authority under the spending clause of article i, section 8 of the u.s. constitution, cipa ties the direct or indirect receipt of certain types of federal funds to the installation of filters on library and school computers. therefore each public library and school that receives the applicable types of federal funding must implement filters on all computers in the library and school buildings, including computers that are exclusively for staff use. 
libraries and schools had to address these issues very quickly because the federal communications commission (fcc) mandated certification of compliance with cipa by funding year 2004, which began in summer 2004.5 cipa requires that filters on computers block three specific types of content, and each of the three categories of materials has a specific legal meaning. the first type—obscene materials—is statutorily defined as depicting sexual conduct that appeals only to prurient interests, is offensive to community standards, and lacks serious literary, artistic, political, or scientific value.6 historically, obscene speech has been viewed as being bereft of any meaningful ideas or educational, social, or professional value to society.7 statutes regulating speech as obscene have to do so very carefully and specifically, and speech can only be labeled obscene if the entire work is without merit.8 if speech has any educational, social, or professional importance, even for embodying controversial or unorthodox ideas, it is supposed to receive first amendment protection.9 the second type of content—child pornography—is statutorily defined as depicting any form of sexual conduct or lewd exhibitionism involving minors.10 both of these types of speech have a long history of being regulated and being considered as having no constitutional protections in the united states. the third type of content that must be filtered— material that is harmful to minors—encompasses a range of otherwise protected forms of speech. cipa defines “harmful to minors” as including any depiction of nudity, sexual activity, or simulated sexual activity that has no serious literary, artistic, political, or scientific value to minors.11 the material that falls into this third category is constitutionally protected speech that encompasses any depiction of nudity, sexual activity, or simulated sexual activity that has serious literary, artistic, political, or scientific value to adults. along with possibly including a range of materials related to literature, art, science, and policy, this third category may involve materials on issues vital to personal well-being such as safe sexual practices, sexual identity issues, and even general health care issues such as breast cancer. in addition to the filtering requirements, section 1731 also prescribes an internet awareness strategy that public libraries and schools must adopt to address five major internet safety issues related to minors. it requires libraries and schools to provide reasonable public notice and to hold at least one public hearing or meeting to address these internet safety issues. requirements for schools and libraries cipa includes sections specifying two major strategies for protecting children online (mainly in sections 1711, 1712, 1721, and 1732) as well as sections describing various definitions and procedural issues for implementing the strategies (mainly in sections 1701, 1703, 1731, 1732, 1733, and 1741). section 1711 specifies the primary internet protection strategy—filtering—in public schools. specifically, it amends the elementary and secondary education act of 1965 by limiting funding availability for schools under section 254 of the communication act of 1934. 
through a compliance certification process within a school under supervision by the local educational agency, it requires schools to include the operation of a technology protection measure that protects students against access to visual depictions that are obscene, are child pornography, or are harmful to minors under the age of seventeen. likewise, section 1712 specifies the same filtering strategy in public libraries. specifically, it amends section 224 of the museum and library service act of 1996/2003 by limiting funding availability for libraries under section 254 of the communication act of 1934. through a compliance certification process within a library under supervision by the institute of museum and library services (imls), it requires libraries to include the operation of a technology protection measure that protects students against access to visual depictions that are obscene, child pornography, or harmful to minors under the age of seventeen. section 1721 is a requirement for both libraries and schools to enforce the internet safety policy with the internet safety policy strategy and the filtering technology strategy as a condition of universal service discounts. specifically, it amends section 254 of the communication act of 1934 and requests both schools and libraries to monitor the online activities of minors, operate a technical protection measure, provide reasonable public notice, and hold at least one public hearing or meeting to address the internet safety policy. this is through the 8 information technology and libraries | march 2009 certification process regulated by the fcc. section 1732, titled the neighborhood children’s internet protection act (ncipa), amends section 254 of the communication act of 1934 and requires schools and libraries to adopt and implement an internet safety policy. it specifies five types of internet safety issues: (1) access by minors to inappropriate matter on the internet; (2) safety and security of minors when using e-mail, chat rooms, and other online communications; (3) unauthorized access; (4) unauthorized disclosure, use, and dissemination of personal information; and (5) measures to restrict access to harmful online materials. from the above summary, it is clear that (1) the two protection strategies of cipa (the internet filtering strategy and safety policy strategy) were equally enforced in both public schools and public libraries because they are two of the most important social institutions for children’s internet safety; (2) the nature of the implementation mechanism is exactly the same, using the same federal funding mechanisms as the sole financial incentive (limiting funding availability for schools and libraries under section 254 of the communication act of 1934) through a compliance certification process to enforce the implementation of cipa; and (3) the actual implementation procedure differs in libraries and schools, with schools to be certified under the supervision of local educational agencies (such as school districts and state departments of education) and with libraries to be certified within a library under the supervision of the imls. 
economics of cipa the universal service program (commonly known as e–rate) was established by the telecommunications act of 1996 to provide discounts, ranging from 20 to 90 percent, to libraries and schools for telecommunications services, internet services, internal systems, and equipment.12 the program has been very successful, providing approximately $2.25 billion dollars a year to public schools, public libraries, and public hospitals. the vast majority of e-rate funding—about 90 percent—goes to public schools each year, with roughly 4 percent being awarded to public libraries and the remainder going to hospitals.13 the emphasis on funding schools results from the large number of public schools and the sizeable computing needs of all of these schools. but even 4 percent of the e-rate funding is quite substantial, with public libraries receiving more than $250 million between 2000 and 2003.14 schools received about $12 billion in the same time period.15 along with e-rate funds, the library services and technology act (lsta) program administered by the imls provides money to each state library agency to use on library programs and services in that state, though the amount of these funds is considerably lower than e-rate funds. the american library association (ala) has noted that the e-rate program has been particularly significant in its role of expanding online access to students and to library patrons in both rural and underserved communities.16 in addition to the effect on libraries, e-rate and lsta funds have significantly affected the lives of individuals and communities. these programs have contributed to the increase in the availability of free public internet access in schools and libraries. by 2001, more than 99 percent of public school libraries provided students with internet access.17 by 2007, 99.7 percent of public library branches were connected to the internet, and 99.1 percent of public library branches offered public internet access.18 however, only a small portion of libraries and schools used filters prior to cipa.19 since the advent of computers in libraries, librarians typically had used informal monitoring practices for computer users to ensure that nothing age inappropriate or morally offensive was publicly visible.20 some individual school and library systems, such as in kansas and indiana, even developed formal or informal statewide internet safety strategies and approaches.21 why were only libraries and schools chosen to protect children’s online safety? while there are many social institutions that could have been the focus of cipa, the law places the requirements specifically on public libraries and public schools. if congress was so interested in protecting children from access to harmful internet content, it seems that the law would be more expansive and focused on the content itself rather than filtering access to the content. however, earlier laws that attempted to regulate access to internet content failed legal challenges specifically because they tried to regulate content. prior to the enactment of cipa, there were a number of other proposed laws aimed at preventing minors from accessing inappropriate internet content. the communications decency act (cda) of 1996 prohibited the sending or posting of obscene material through the internet to individuals under the age of eighteen.22 however, the supreme court found the cda to be unconstitutional, stating that the law violated free speech under the first amendment. 
in 1998, congress passed the child online protection act (copa), which prohibited commercial websites from displaying material deemed harmful to minors and imposed criminal penalties on internet violators.23 a three-judge panel of the district court for the eastern district of pennsylvania ruled that copa's focus on "contemporary community standards" violated the first amendment, and the panel subsequently imposed an injunction on copa's enforcement. cipa's force comes from congress's power under the spending clause; that is, congress can legally attach requirements to funds that it gives out. since cipa is based on economic persuasion—the potential loss of funds for technology—the law can only have an effect on recipients of those funds. while regulating internet access in other venues like coffee shops, internet cafés, bookstores, and even individual homes would provide a more comprehensive shield to limit children's access to certain online content, these institutions could not be reached under the spending clause. as a result, the burdens of cipa fall squarely on public libraries and public schools. ■ the current state of filtering when did cipa actually come into effect in libraries and schools? after overcoming a series of legal challenges that were ultimately decided by the supreme court, cipa came into effect in full force in 2003, though 96 percent of public schools were already in compliance with cipa in 2001. the legal challenge by public libraries, which the court rejected in upholding the constitutionality of cipa, centered on the way the statute was written.24 the court's decision states that the wording of the law does not place unconstitutional limitations on free speech in public libraries. to continue receiving federal dollars directly or indirectly through certain federal programs, public libraries and schools were required to install filtering technologies on all computers. while the case decided by the supreme court focused on public libraries, the decision virtually precludes public schools from making the same or related challenges.25 before that case was decided, however, most schools had already adopted filters to comply with cipa. as a result of cipa, a public library or public school must install technology protection measures, better known as filters, on all of its computers if it receives ■ e-rate discounts for internet access costs, ■ e-rate discounts for internal connections costs, ■ lsta funding for direct internet costs,26 or ■ lsta funding for purchasing technology to access the internet. the requirements of cipa extend to public libraries, public schools, and any library institution that receives lsta and e-rate funds as part of a system, including state library agencies and library consortia. as a result of the financial incentives to comply, almost 100 percent of public schools in the united states have implemented the requirements of cipa,27 and approximately half of public libraries have done so.28 how many public schools have implemented cipa? according to the latest report by the department of education (see table 1), by 2005, 100 percent of public schools had implemented both the internet filtering strategy and the safety policy strategy. in fact, in 2001 (the first year cipa was in effect), 96 percent of schools had implemented cipa, with 99 percent filtering by 2002.
when compared to the percentage of all public schools with internet access from 1994 to 2005, internet access became nearly universal in schools between 1999 and 2000 (95 to 98 percent), and one can see that the internet access percentage in 2001 was almost the same as the cipa implementation percentage. according to the department of education, the above estimations are based on a survey of 1,205 elementary and secondary schools selected from 63,000 elementary schools and 21,000 secondary and combined schools.29 after reviewing the design and administration of the survey, it can be concluded that these estimations should be considered valid and reliable and that cipa has been immediately and consistently implemented in the majority of public schools since 2001.30

table 1. implementation of cipa in public schools
year           1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2005
access (%)       35    50    65    78    89    95    98    99    99   100   100
filtering (%)     —     —     —     —     —     —     —    96    99    97   100

how many public libraries have implemented cipa? in 2002, 43.4 percent of public libraries were receiving e-rate discounts, and 18.9 percent said they would not apply for e-rate discounts if cipa was upheld.31 since the supreme court decision upholding cipa, the number of libraries complying with cipa has increased, as has the number of libraries not applying for e-rate funds to avoid complying with cipa. however, unlike schools, there is no exact count of how many libraries have filtered internet access. in many cases, the libraries themselves do not filter, but a state library, library consortium, or local or state government system of which they are a part filters access from beyond the walls of the library. in some of these cases, the library staff may not even be aware that such filtering is occurring. a number of state and local governments have also passed their own laws to encourage or require all libraries in the state to filter internet access regardless of e-rate or lsta funds.32 in 2008, 38.2 percent of public libraries were filtering access within the library as a result of directly receiving e-rate funding.33 furthermore, 13.1 percent of libraries were receiving e-rate funding as a part of another organization, meaning that these libraries also would need to comply with cipa's requirements.34 as such, the number of public libraries filtering access is now at least 51.3 percent, but the number will likely be higher as a result of state and local laws requiring libraries to filter as well as other reasons libraries have implemented filters. in contrast, among libraries not receiving e-rate funds, the number of libraries now not applying for e-rate intentionally to avoid the cipa requirements is 31.6 percent.35 while it is not possible to identify an exact number of public libraries that filter access, it is clear that libraries overall have far lower levels of filtering than the 100 percent of public schools that filter access. e-rate and other program issues the administration of the e-rate program has not occurred without controversy.
throughout the course of the program, many applicants for and recipients of the funding have found the program structure to be obtuse, the application process to be complicated and time consuming, and the administration of the decision-making process to be slow.36 as a result, many schools and libraries find it difficult to plan ahead for budgeting purposes, not knowing how much funding they will receive or when they will receive it.37 there also have been larger difficulties for the program. following revelations about the uses of some e-rate awards, the fcc suspended the program from august to december 2004 to impose new accounting and spending rules for the funds, delaying the distribution of over $1 billion in funding to libraries and schools.38 news investigations had discovered that certain school systems were using e-rate funds to purchase more technology than they needed or could afford to maintain, and some school systems failed to ever use technology they had acquired.39 while the administration of the e-rate program has been comparatively smooth since, the temporary suspension of the program caused serious short-term problems for, and left a sense of distrust of, the program among many recipients.40 filtering issues during the 1990s, many types of software filtering products became available to consumers, including serverside filtering products (using a list of server-selected blocked urls that may or may not be disclosed to the user), client-side filtering (controlling the blocking of specific content with a user password), text-based content-analysis filtering (removing illicit content of a website using real-time analysis), monitoring and timelimiting technologies (tracking a child’s online activities and limiting the amount of time he or she spends online), and age-verification systems (allowing access to webpages by passwords issued by a third party to an adult).41 but because filtering software companies make the decisions about how the products work, content and collection decisions for electronic resources in schools and public libraries have been taken out of the hands of librarians, teachers, and local communities and placed in the trust of proprietary software products.42 some filtering programs also have specific political agendas, which many organizations that purchase them are not aware of.43 in a study of over one million pages, for every webpage blocked by a filter as advertised by the software vendor, one or more pages were blocked inappropriately, while many of the criteria used by the filtering products go beyond the criteria enumerated in cipa.44 filters have significant rates of inappropriately blocking materials, meaning that filters misidentify harmless materials as suspect and prevent access to harmless items (e.g., one filter blocked access to the declaration of independence and the constitution).45 furthermore, when libraries install filters to comply with cipa, in many instances the filters will frequently be blocking text as well as images, and (depending on the type of filtering product employed) filters may be blocking access to entire websites or even all the sites from certain internet service providers. 
as such, the current state of filtering technology will create the practical effect of cipa restricting access to far more than just certain types of images in many schools and libraries.46 ■ differences in the perceived value of cipa and filtering based on the available data, there clearly is a sizeable contrast in the levels of implementation of cipa between schools and libraries. this difference raises a number of questions: for what reasons has cipa been much more widely implemented in schools? is this issue mainly value driven, dollar driven, both, or neither in these two public institutions? why are these two institutions so different regarding cipa implementation while they share many social and educational similarities? reasons for nationwide full implementation in schools there are various reasons—from financial, population, social, and management issues to computer and internet availability—that have driven the rapid and comprehensive implementation of filters in public schools. first, public schools have to implement cipa because of societal pressures and the lobbying of parents to ensure students' internet safety. almost all users of computers in schools are minors, the group most vulnerable to internet crimes and child pornography. public schools in america have been the focus of public attention and scrutiny for years, and the political and social responsibility of public schools for children's internet safety is enormous. as a result, society has decided these students should be most strongly protected, and cipa was implemented immediately and most widely at schools. second, in contrast to public libraries (which average slightly less than eleven computers per library outlet), the typical number of computers in public schools ranges from one hundred to five hundred, which are needed to meet the needs of students and teachers for daily learning and teaching. since the number of computers is quite large, the financial incentives of e-rate funding are substantial and critical to the operation of the schools. this situation provides administrators in schools and school districts with the incentive to make decisions to implement cipa as quickly and extensively as possible. furthermore, the amount of money that e-rate provides for schools in terms of technology is astounding. as was noted earlier, schools received over $12 billion from 2000 to 2003 alone. schools likely would not be able to provide the necessary computers for students and teachers without the e-rate funds. third, the actual implementation procedure differs in schools and libraries: schools are certified under the supervision of local educational agencies such as school districts and state departments of education; libraries are certified within a library organization under the supervision of the imls. in other words, the certification process at schools is directly and effectively controlled by school districts and state departments of education, following the same fundamental values of protecting children. the resistance to cipa in schools has been very small in comparison to libraries. the primary concern raised has been the issue of educational equality.
concerns have been raised that filters in schools may create two classes of students—ones with only filtered access at school and ones who also can get unfiltered access at home.47 reasons for more limited implementation in libraries in public libraries, the reasons for implementing cipa are similar to those of public schools in many ways. the approximately seven thousand public libraries in the united states provide an average of 10.7 computers each, a substantial amount of technology that must be supported. the e-rate and lsta funds are vital to many libraries in the provision of computers and the internet. furthermore, with limited alternative sources of funding, the e-rate and lsta funds are hard to replace if they are not available. given that public libraries have become the guarantor of public access to computing and the internet, libraries have to find ways to ensure that patrons can access the internet.48 libraries also have to be concerned about protecting and providing a safe environment for younger patrons. while libraries serve patrons of all ages, one of the key social expectations of libraries is the provision of educational materials for children and young adults. children's sections of libraries almost always have computers in them. much of the content blocked by filters is of little or no educational value. as such, "defending unfiltered internet access was quite different from defending catcher in the rye."49 nevertheless, many libraries have fought against the filtering requirements of cipa because they believe that it violates the principles of librarianship or for a number of other reasons. in 2008, 31.6 percent of public libraries refused to apply for e-rate or lsta funds specifically to avoid cipa requirements, a substantial increase from the 15.3 percent of libraries that did not apply for e-rate because of cipa in 2006.50 in defending patrons' rights to free access, the libraries that are not applying for e-rate funds because of the requirements of cipa are forced to turn down funding that would help pay for internet access in order to preserve unfiltered community access to the internet. because many libraries feel that they cannot apply for e-rate funds, local and regional discrepancies are occurring in the levels of internet access that are available to patrons of public libraries in different parts of the country.51 for adult patrons who wish to access material on computers with filters, cipa states that the library has the option of disabling the filters for "bona fide research or other lawful purposes" when adult patrons request such disabling. the law does not require libraries to disable the filters for adult patrons, and the criteria for disabling of filters do not have a set definition in the law.
the potential problems in the process of having the filters disabled are many and significant, including librarians not allowing the filters to be turned off, librarians not knowing how to turn the filters off, the filtering software being too complicated to turn off without injuring the performance of the workstation in other applications, or the filtering software being unable to be turned off in a reasonable amount of time.52 it has been estimated that approximately 11 million low-income individuals rely on public libraries to access online information because they lack internet access at home or work.53 the e-rate and lsta programs have helped to make public libraries a trusted community source of internet access, with the public library being the only source of free public internet access available to all community residents in nearly 75 percent of communities in the united states.54 therefore, usage of computers and the internet in public libraries has continued to grow at a very fast pace over the past ten years.55 thus public libraries are torn between the values of providing safe access for younger patrons and broad access for adult patrons who may have no other means of accessing the internet. ■ cipa, public policy, and further research while the diverse implementations, effects, and levels of acceptance of cipa across schools and libraries demonstrate the wide range of potential ramifications of the law, surprisingly little consideration is given to major assumptions in the law, including the appropriateness of the requirements to different age groups and the nature of information on the internet. cipa treats all users as if they are at the same level of maturity and need the same level of protection as a small child, as evidenced by the requirement that all computers in a library or school have filters regardless of whether children use a particular computer. in reality, children and adults interact in different social, physical, and cognitive ways with computers because of different developmental processes.56 cipa fails to recognize that children as individual users are active processors of information and that children of different ages are going to be affected in divergent ways by filtering programs.57 younger children benefit from more restrictive filters while older children benefit from less restrictive filters. moreover, filtering can be complemented by encouragement of frequent positive internet usage and informal instruction to encourage positive use. finally, children of all ages need a better understanding of the structure of the internet to encourage appropriate caution in terms of online safety. the internet represents a new social and cultural environment in which users simultaneously are affected by the social environment and also construct that environment with other users.58 cipa also is based on fundamental misconceptions about information on the internet. the supreme court's decision upholding cipa represents several of these misconceptions, adopting an attitude that 'we know what is best for you' in terms of the information that citizens should be allowed to access.59 it assumes that schools and libraries select printed materials out of a desire to protect and censor rather than recognizing the basic reality that only a small number of print materials can be afforded by any school or library. the internet frees schools and libraries from many of these costs.
furthermore, the court assumes that libraries should censor the internet as well, ultimately upholding the same level of access to information for adult patrons and librarians in public libraries as for students in public schools. these two major unexamined assumptions in the law certainly have played a part in the difficulty of implementing cipa and in the resistance to the law. and this does not even address the problems of assuming that public libraries and public schools can be treated interchangeably in crafting legislation. these problematic assumptions point to a significantly larger issue: in trying to deal with the new situations created by the internet and related technology, the federal government has significantly increased the attention paid to information policy.60 over the past few years, government laws and standards related to information have begun to more clearly relate to social aspects of information technologies such as the filtering requirements of cipa.61 but the social, economic, and political ramifications of decisions about information policy are often woefully underexamined in the development of legislation.62 this paper has documented that many of the reasons for and statistics about cipa implementation are available by bringing together information from different social institutions. the biggest questions about cipa are about the societal effects of the policy decisions: ■ has cipa changed the education and information-provision roles of libraries and schools? ■ has cipa changed the social expectations for libraries and schools? ■ have adult patron information behaviors changed in libraries? ■ have minor patron information behaviors changed in libraries? ■ have student information behaviors changed in school? ■ how has cipa changed the management of libraries and schools? ■ will congress view cipa as successful enough to merit using libraries and schools as the means of enforcing other legislation? but these social and administrative concerns are not the only major research questions raised by the implementation of cipa. future research about cipa not only needs to focus on the individual, institutional, and social effects of the law; it must also explore the lessons that cipa can provide to the process of creating and implementing information policies with significant societal implications. the most significant research issues related to cipa may be the ones that help illuminate how to improve the legislative process to better account for the potential consequences of regulating information while the legislation is still being developed. such cross-disciplinary analyses would be of great value as information becomes the center of an increasing amount of legislation, and the effects of this legislation have continually wider consequences for the flow of information through society. it could also be of great benefit to public schools and libraries, which, if cipa is any indication, may play a large role in future legislation about public internet access. references 1. children's internet protection act (cipa), public law 106-554. 2. united states v. american library association, 539 u.s. 154 (2003). 3. american library association, libraries connect communities: public library funding & technology access study 2007–2008 (chicago: ala, 2008); paul t. jaeger, john carlo bertot, and charles r.
mcclure, “the effects of the children’s internet protection act (cipa) in public libraries and its implications for research: a statistical, policy, and legal analysis,” journal of the american society for information science and technology 55, no. 13 (2004): 1131–39; paul t. jaeger et al., “public libraries and internet access across the united states: a comparison by state from 2004 to 2006,” information technology and libraries 26, no. 2 (2007): 4–14; paul t. jaeger et al., “cipa: decisions, implementation, and impacts,” public libraries 44, no. 2 (2005): 105–9; zheng yan, “limited knowledge and limited resources: children’s and adolescents’ understanding of the internet,” journal of applied developmental psychology (forthcoming); zheng yan, “differences in basic knowledge and perceived education of internet safety between high school and undergraduate students: do high school students really benefit from the children’s internet protection act?” journal of applied developmental psychology (forthcoming); zheng yan, “what influences children’s and adolescents’ understanding of the complexity of the internet?,” developmental psychology 42 (2006): 418–28. 4. martha m. mccarthy, “filtering the internet: the children’s internet protection act,” educational horizons 82, no, 2 (winter 2004): 108. 5. federal communications commission, in the matter of federal–state joint board on universal service: children’s internet protection act, fcc order 03-188 (washington, d.c.: 2003). 6. cipa. 7. roth v. united states, 354 u.s. 476 (1957). 8. miller v. california, 413 u.s. 15 (1973). 9. roth v. united states. 10. cipa. 11. cipa. 12. telecommunications act of 1996, public law 104-104 (feb. 8, 1996). 13. paul t. jaeger, charles r. mcclure, and john carlo bertot, “the e-rate program and libraries and library consortia, 2000–2004: trends and issues,” information technology & libraries 24, no. 2 (2005): 57–67. 14. ibid. 15. ibid. 16. american library association, “u.s. supreme court arguments on cipa expected in late winter or early spring,” press release, nov. 13, 2002, www.ala.org/ala/aboutala/hqops/ pio/pressreleasesbucket/ussupremecourt.cfm (accessed may 19, 2008). 17. kelly rodden, “the children’s internet protection act in public schools: the government stepping on parents’ toes?” fordham law review 71 (2003): 2141–75. 18. john carlo bertot, paul t. jaeger, and charles r. mcclure, “public libraries and the internet 2007: issues, implications, and expectations,” library & information science research 30 (2008): 175–184; charles r. mcclure, paul t. jaeger, and john carlo bertot, “the looming infrastructure plateau?: space, funding, connection speed, and the ability of public libraries to meet the demand for free internet access,” first monday 12, no. 12 (2007), www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/ article/view/2017/1907 (accessed may 19, 2008). 19. mccarthy, “filtering the internet.” 20. leigh s. estabrook and edward lakner, “managing internet access: results of a national survey,” american libraries 31, no. 8 (2000): 60–62. 21. alberta davis comer, “studying indiana public libraries’ usage of internet filters,” computers in libraries (june 2005): 10–15; thomas m. reddick, “building and running a collaborative internet filter is akin to a kansas barn raising,” computers in libraries 20, no. 4 (2004): 10–14. 22. communications decency act of 1996, public law 104-104 (feb. 8, 1996). 23. child online protection act (copa), public law 105-277 (oct. 21, 1998). 24. united states v. 
american library association. 25. r. trevor hall and ed carter, “examining the constitutionality of internet filtering in public schools: a u.s. perspective,” education & the law 18, no. 4 (2006): 227–45; mccarthy “filtering the internet.” 26. library services and technology act, public law 104-208 (sept. 30, 1996). 27. john wells and laurie lewis, internet access in u.s. public schools and classrooms: 1994–2005, special report prepared at the request of the national center for education statistics, nov. 2006. 28. american library association, libraries connect communities; john carlo bertot, charles r. mcclure, and paul t. jaeger, “the impacts of free public internet access on public library patrons and communities,” library quarterly 78, no. 3 (2008): 285–301; jaeger et al., “cipa.” 29. wells and lewis, internet access in u.s. public schools and classrooms. 14 information technology and libraries | march 2009 30. ibid. 31. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 32. jaeger et al., “cipa.” 33. american library association, libraries connect communities. 34. ibid. 35. ibid. 36. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 37. ibid. 38. norman oder, “$40 million in e-rate funds suspended: delays caused as fcc requires new accounting standards,” library journal 129, no. 18 (2004): 16; debra lau whelan, “e-rate funding still up in the air: schools, libraries left in the dark about discounted funds for internet services,” school library journal 50, no. 11 (2004): 16. 39. ken foskett and paul donsky, “hard eye on city schools’ hardware,” atlanta journal-constitution, may 25, 2004; ken foskett and jeff nesmith, “wired for waste: abuses tarnish e-rate program,” atlanta journal-constitution, may 24, 2004. 40. jaeger, mcclure, and bertot, “the e-rate program and libraries and library consortia.” 41. department of commerce, national telecommunication and information administration, children’s internet protection act: study of technology protection measures in section 1703, report to congress (washington, d.c.: 2003). 42. mccarthy, “filtering the internet.” 43. paul t. jaeger and charles r. mcclure, “potential legal challenges to the application of the children’s internet protection act (cipa) in public libraries: strategies and issues,” first monday 9, no. 2 (2004), www.firstmonday.org/issues/issue9_2/ jaeger/index.html (accessed may 19, 2008). 44. electronic frontier foundation, internet blocking in public schools (washington, d.c.: 2004), http://w2.eff.org/censor ship/censorware/net_block_report (accessed may 19, 2008). 45. adam horowitz, “the constitutionality of the children’s internet protection act,” st. thomas law review 13, no. 1 (2000): 425–44. 46. tanessa cabe, “regulation of speech on the internet: fourth time’s the charm?” media law and policy 11 (2002): 50–61; adam goldstein, “like a sieve: the child internet protection act and ineffective filters in libraries,” fordham intellectual property, media, and entertainment law journal 12 (2002): 1187–1202; horowitz, “the constitutionality of the children’s internet protection act”; marilyn j. maloney and julia morgan, “rock and a hard place: the public library’s dilemma in providing access to legal materials on the internet while restricting access to illegal materials,” hamline law review 24, no. 2 (2001): 199–222; mary minow, “filters and the public library: a legal and policy analysis,” first monday 2, no. 
12 (1997), www .firstmonday.org/issues/issue2_12/minnow (accessed may 19, 2008); richard j. peltz, “use ‘the filter you were born with’: the unconstitutionality of mandatory internet filtering for adult patrons of public libraries,” washington law review 77, no. 2 (2002): 397–479. 47. mccarthy, “filtering the internet.” 48. john carlo bertot et al., “public access computing and internet access in public libraries: the role of public libraries in e-government and emergency situations,” first monday 11, no. 9 (2006), www.firstmonday.org/issues/issue11_9/bertot (accessed may 19, 2008); john carlo bertot et al., “drafted: i want you to deliver e-government,” library journal 131, no. 13 (2006): 34–39; paul t. jaeger and kenneth r. fleischmann, “public libraries, values, trust, and e-government,” information technology and libraries 26, no. 4 (2007): 35–43. 49. doug johnson, “maintaining intellectual freedom in a filtered world,” learning & leading with technology 32, no. 8 (may 2005): 39. 50. bertot, mcclure, and jaeger, “the impacts of free public internet access on public library patrons and communities.” 51. jaeger et al., “public libraries and internet access across the united states.” 52. paul t. jaeger et al., “the policy implications of internet connectivity in public libraries,” government information quarterly 23, no. 1 (2006): 123–41. 53. goldstein, “like a sieve.” 54. bertot, mcclure, and jaeger, “the impacts of free public internet access on public library patrons and communities”; jaeger and fleischmann, “public libraries, values, trust, and e-government.“ 55. bertot, jaeger, and mcclure, “public libraries and the internet 2007”; charles r. mcclure et al., “funding and expenditures related to internet access in public libraries,” information technology & libraries (forthcoming). 56. zheng yan and kurt w. fischer, “how children and adults learn to use computers: a developmental approach,” new directions for child and adolescent development 105 (2004): 41–61. 57. zheng yan, “age differences in children’s understanding of the complexity of the internet,” journal of applied developmental psychology 26 (2005): 385–96; yan, “limited knowledge and limited resources”; yan, “differences in basic knowledge and perceived education of internet safety”; yan, “what influences children’s and adolescents’ understanding of the complexity of the internet?” 58. patricia greenfield and zheng yan, “children, adolescents, and the internet: a new field of inquiry in developmental psychology,” developmental psychology 42 (2006): 391–93. 59. john n. gathegi, “the public library as a public forum: the (de)evolution of a legal doctrine,” library quarterly 75 (2005): 12. 60. sandra braman, “where has media policy gone? defining the field in the 21st century,” communication law and policy 9, no. 2 (2004): 153–82; sandra braman, change of state: information, policy, & power (cambridge, mass.: mit pr., 2007); charles r. mcclure and paul t. jaeger, “government information policy research: importance, approaches, and realities,” library & information science research 30 (2008): 257–64; milton mueller, christiane page, and brendan kuerbis, “civil society and the shaping of communication-information policy: four decades of advocacy,” information society 20, no. 3 (2004): 169–85. 61. paul t. jaeger, “information policy, information access, and democratic participation: the national and international implications of the bush administration’s information politics,” government information quarterly 24 (2007): 840–59. 
62. mcclure and jaeger, "government information policy research."

frbrization of a library catalog: better collocation of records, leading to enhanced search, retrieval, and display

timothy j. dickey (dickeyt@oclc.org) is a post-doctoral researcher, oclc office of programs and research, dublin, ohio.

the functional requirements for bibliographic records (frbr)'s hierarchical system defines families of bibliographic relationships between records and collocates them better than most extant bibliographic systems. certain library materials (especially audio-visual formats) pose notable challenges to search and retrieval; the first benefits of a frbrized system would be felt in music libraries, but research already has proven its advantages for fine arts, theology, and literature—the bulk of the non-science, technology, and mathematics collections. this report will summarize the benefits of frbr to next-generation library catalogs and opacs, and will review the handful of ils and catalog systems currently operating with its theoretical structure. editor's note: this article is the winner of the lita/ex libris writing award, 2007. the following review addresses the challenges and benefits of a next-generation online public access catalog (opac) according to the functional requirements for bibliographic records (frbr).1 after a brief recapitulation of the challenges posed by certain library materials—specifically, but not limited to, audiovisual materials—this report will present frbr's benefits as a means of organizing the database and public search results from an opac.2 frbr's hierarchical system of records defines families of bibliographic relationships between records and collocates them better than most extant bibliographic systems; it thus affords both library users and staff a more streamlined navigation between related items in different materials formats and among editions and adaptations of a work. in the eight years since the frbr report's publication, a handful of working systems have been developed. the first benefits of such a system to an average academic library system would be felt in a branch music library, but research already has proven its advantages for fine arts, theology, and literature—the bulk of the non-science, technology, and mathematics collections. ■ current search and retrieval challenges the difficulties faced first, but not exclusively, by music users of most integrated library systems fall into two related categories: issues of materials formats, and issues of cataloging, indexing, and marc record structure. music libraries must collect, catalog, and support materials in more formats than anyone else; this makes their experience of the most common ils modules—circulation, reserves, and acquisitions—by definition more complicated. the study of music continues to rely on the interrelated use of three distinct information formats—scores (the notated manifestation of a composer's or improviser's thought), recordings (realizations in sound, and sometimes video, of such compositions and improvisations), and books and journals (intellectual thought regarding such compositions and improvisations)—music libraries continue to require . . . collections that integrate [emphasis mine] these three information formats appropriately.3 put a different way, "relatedness is a pervasive characteristic of music materials."4 this is why frbr's model of bibliographic relationships offers benefits that will first impact the music collection.5 at present, however, musical formats pose search and retrieval challenges for most ils users, and the problem is certainly replicated with microforms and video recordings.
the marc codes distinguish between material formats, but they support only one category for sound recordings, lumping together cd, dvd audio, cassette tape, reel-to-reel tape, and all other types.6 this single "sound recording" definition is easily reflected in opacs (such as those powered by innovative interfaces' millennium and ex libris' aleph 500) and union catalogs (such as worldcat.org).7 however, the distinction between sound recording formats is embedded in fixed character positions of the 007 field, which presently cannot be indexed by many library automation systems because the relevant positions are not adjacent. an even more central challenge derives from the fact that music sound recordings—like journals and essay collections—contain within each item more than one work. thus, for one of the central material formats collected by a music library (as well as by a public library or other academic branches), users routinely find themselves searching for a distinct subset of the item record. perversely, though music catalogers do tend to include analytic added-entries for the subparts of a cd recording or printed score, and major ils vendors are learning to index them, aacr2 guidelines set arbitrary cutoff points of about fifteen tracks on a sound recording, and three performable units within a score.8 subsets of essay collections and journal runs are routinely exposed to users' searches by indexing and abstracting services and major databases, but subsets of libraries' music collections depend upon catalogers to exploit the marc records for user access.9 in light of these pervasive bibliographic relationships, catalogers of music (again, with parallels in other subjects) have developed a distinctive approach to the marc metadata schema. in particular, they—with their colleagues in literature, fine arts, and theology—rely upon the 700t field for uniform work titles, and upon careful authority control.10 however, once again, many major ils portals have spotty records in affording access to library collections via these data. innovative interfaces' millennium, though it clearly leads other major library products in this market, frequently frustrates music librarians (it is, of course, not alone in doing so).11 its automatic authority control feature works poorly with (necessary) music authority records.12 and even though innovative has been one of the first vendors to add a database index to the 700t field, partly in response to concerns expressed to the company by the music librarians' user group, millennium apparently does not allow for an appropriate level of follow-through on searching.13 an initial search by name of a major composer, for instance, yields a huge and cluttered result set containing all indexed 700t fields.14 the results do helpfully include the appropriate see also references, but those references disappear in a subsidiary (limited) search. in addition, the subsidiary display inexplicably changes to an unhelpful arrangement of generic 245 fields ("mozart, symphonies"; "mozart, operas, excerpts").
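the indexing problem with the 007 field can be made concrete with a short sketch. the following python fragment is illustrative only: it assumes the standard 007 character-position values for sound recordings (not the behavior of any ils named above) and shows how an indexer could combine the non-adjacent positions for material designation and playing speed into one searchable carrier facet.

# illustrative sketch: derive a carrier facet for sound recordings from the
# marc 007 control field. the position codes used below (007/00 category of
# material, 007/01 specific material designation, 007/03 speed) follow the
# standard 007 definitions; the mapping itself is an assumption for
# demonstration, not any vendor's implementation.
def sound_recording_carrier(field_007: str) -> str:
    if not field_007 or field_007[0] != "s":      # 007/00: 's' = sound recording
        return "not a sound recording"
    designation = field_007[1] if len(field_007) > 1 else " "   # 007/01
    speed = field_007[3] if len(field_007) > 3 else " "          # 007/03
    if designation == "d" and speed == "f":   # sound disc played at 1.4 m/s
        return "compact disc"
    if designation == "d" and speed == "b":   # sound disc played at 33 1/3 rpm
        return "lp"
    if designation == "s":
        return "audiocassette"
    if designation == "t":
        return "reel-to-reel tape"
    return "other sound recording"

# a hypothetical 007 value for a stereo compact disc
print(sound_recording_carrier("sd fsngnnmmned"))   # -> "compact disc"

a system deriving such a facet at index time could let users limit a search to compact discs or lps, a distinction the single marc "sound recording" category hides.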
similar challenges will be faced by other parts of an academic or large public library collection, including the literature collections (for works such as shakespeare’s plays), fine arts (for images and artists’ works), and theology (for works whose uniform title is in latin). the opac interfaces of other major ils vendors fare little better. the same search (for “mozart”) on the emory university library catalog (with an ils by sirsidynix), similarly yields a rich results set of more than one thousand records, and poses similar problems in refining the search.15 in the case of this opac, an index of 700t fields also exists, but it only may be searched from the inside of a single record; as with millennium, sirsidynix’s interface will then group the next set of results confusingly by 245 fields. the library corporation’s carl-x apparently does not contain a 700t index; the simple “mozart” search returns a muchsimplified set of only 97 results organized by 245a fields, and thus offers a more concise set of results but avoids the most incisive index for audio-visual materials.16 ex libris offers a somewhat more helpful display of its more restricted results; unfortunately for the present comparison, though the detailed results set does list the “format” of all mozart-authored items, the same term— “music”—is used for sound recordings, musical scores, and score excerpts, with no attempt logically to group the results around individual works.17 no 700t index appears present. ■ the frbr paradigm: review of literature and theory from the earliest library catalogs in the modern age, the tools of bibliographic organization have sought to afford users both access to the collection and collocation of related materials. anglo-american cataloging practice has traditionally served the first function by main entries and alternate access points and the second function by classification systems. however, as knowledge increases in scope and complexity, the systems of bibliographic control have needed to evolve. as early as the 1950s, theories were developing that sought to distinguish between the intellectual content of a work, and its often manifold physical embodiments.18 the 1961 paris international conference on cataloging principles first reified within the cataloging community a work-item distinction, though even the 1988 publication of the anglo-american cataloging rules, 2nd ed., “continued to demonstrate confusion about the nature . . . 
of works."19 meanwhile, extensive research into the nature of bibliographic relationships groped toward a consensus definition of the entity-types that could encompass such relationships.20 ed o'neill and diane vizine-goetz examined some one hundred editions of smollett's the expedition of humphry clinker over a two-hundred-year span of publication history to propose a hierarchical set of definitions for entity levels.21 the theoretical entities include the intellectual content of a work—which, in the case of audio-visual works, may not even exist in any printed format—the various versions, editions, and printings in which that intellectual content manifests itself, and the specific copies of each manifestation which a library may hold.22 research has discovered such clusters of bibliographically related entities for as much as 50 percent or more of all the intellectual works in any given library catalog, and as many as 85 percent of the works in a music catalog.23 this work laid the foundation for frbr (and, once again, incidentally underscored the breadth of its applicability to, and beyond, music catalogs). the theoretical framework of frbr is most concisely set forth in the final report of the ifla study group. the long-awaited publication traces its genesis to the 1990 stockholm seminar, and the resultant 1992 founding of the ifla study group on functional requirements for bibliographic records. the study group set out to develop: a framework that identifies and clearly defines the entities of interest to users of bibliographic records, the attributes of each entity, and the types of relationships that operate between entities . . . a conceptual model that would serve as the basis for relating specific attributes and relationships . . . to the various tasks that users perform when consulting bibliographic records. the study makes no a priori assumptions about the bibliographic record itself, either in terms of content or structure.24 in other words, the intention of the group's deliberations and the final report is to present a model for understanding bibliographic entities and the relationships between them to support information organization tools. it specifically adopts an approach that defines classes of entities based upon how users, rather than catalogers, approach bibliographic records—or, by natural extension, any system of metadata.
the frbr hierarchical entities comprise a fourfold set of definitions: ■ work: "a distinct intellectual or artistic creation"; ■ expression: "the intellectual or artistic realization of a work" in any combination of forms (including editions, arrangements, adaptations, translations, performances, etc.); ■ manifestation: "the physical embodiment of an expression of a work"; and ■ item: "a single exemplar of a manifestation."25 examples of these hierarchical levels abound in the bibliographic universe, but frequently music offers the quickest examples: ■ work: mozart's die zauberflöte (the magic flute) ■ work: puccini's la bohème ■ expression: the composer's complete musical score (1896) ■ manifestation: edition of the score printed by ricordi in 1897 ■ expression: an english language edition for piano and voices ■ expression: a performance by mirella freni, luciano pavarotti, and the berlin philharmonic orchestra (october 1972) ■ manifestation: a recording of this performance released on 33¹/³ rpm sound discs in 1972 by london records ■ manifestation: a re-release of the same performance on compact disc in 1987 by london records ■ item: the copy of the compact disc held by the columbus metropolitan library ■ item: the copy of the compact disc held by the university of cincinnati. in fact, lis research has tended to demonstrate what music librarians have always understood—that relatedness among items and complexity of families is most prevalent in audio-visual collections. even before the ifla report had been penned, sherry vellucci had set out the task: "to create new catalog structures that better serve the needs of the music user community, it is important first to understand the exact nature and complexity of the materials to be described in the catalog."26 even limiting herself to musical scores alone (that is, no recordings or monographs), vellucci found that more than 94.8 percent of her sample exhibited at least one bibliographic relationship with another entity in the collection; she further related this finding to the very "inherent nature of music, which requires performance for its aural realization," as opposed to, for example, monographic book printing.27 vellucci and others have frequently commented on how the relatedness of manifestations—in different formats, arrangements, and abridgements—of musical works continues to be a problem for information retrieval in the world of music bibliography.28 musical works have been variously and industriously described by musicologists and music bibliographers. yet, in the information retrieval domain [and, i might add, under both aacr and aacr2] . . . systems for bibliographic information retrieval . . . have been designed with the document as the key entity, and works have been dismissed as too abstract . . .29 the work is the access point many users will bring—in their minds, and thus in their queries—to a system. they intend, however, to discover, identify, and obtain specific manifestations of that work.
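to make the four-level hierarchy described above more concrete, here is a minimal python sketch of the entity structure, populated with the la bohème example from the list; the class names and fields are illustrative assumptions rather than the ifla model's formal attribute set.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Work:                       # "a distinct intellectual or artistic creation"
    title: str
    creator: str
    expressions: List["Expression"] = field(default_factory=list)

@dataclass
class Expression:                 # a realization of the work (edition, performance, translation, ...)
    description: str
    manifestations: List["Manifestation"] = field(default_factory=list)

@dataclass
class Manifestation:              # a physical embodiment (a particular publication or release)
    description: str
    items: List["Item"] = field(default_factory=list)

@dataclass
class Item:                       # a single exemplar held by one library
    holding_library: str

boheme = Work("la bohème", "giacomo puccini")
performance = Expression("performance by freni, pavarotti, and the berlin philharmonic, october 1972")
boheme.expressions.append(performance)
cd = Manifestation("compact disc re-release, london records, 1987")
performance.manifestations.append(cd)
cd.items.append(Item("columbus metropolitan library"))
cd.items.append(Item("university of cincinnati"))

# collocation: from the work record, a catalog can walk down to every related
# expression, manifestation, and item, and display them as one family.
for expression in boheme.expressions:
    for manifestation in expression.manifestations:
        print(boheme.title, "|", expression.description, "|", manifestation.description)

a frbrized catalog stores or derives exactly these parent-to-child links, so that a single query on the work can collocate every edition, performance, and copy beneath it.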
very recently, research has begun to demonstrate that the frbr model can offer specific advantages to music retrieval in cases such as these: "the description of bibliographic data in a frbr-based database leads to less redundancy and a clearer presentation of the relationships which are implicit in the traditional databases found in libraries today."30 explorations of the theory in view of the benefits to other disciplines, such as audio-visual and other graphic materials, maps, oral literature, and rare books, have appeared in the literature as well.31 the admitted weakness of the frbr theory, of course, is that it remains a theory at its inception, with still precious few working applications. ■ frbr applications working implementations of frbr in catalogs, opacs, and ilss are still relatively few but promise much for the future. the frbr theoretical framework has remained an area of intense research at oclc, which has even led to some prototype applications and, very recently, deployment in the worldcat local interface.32 a scattered few other researchers have crafted frbr catalogs and catalog displays for their own ends; the library of congress has a prototype as well. innovative, the leading academic ils vendor, announced a frbr feature for 2005 release, yet shelved the project for lack of a beta-testing partner library.33 ex libris' primo discovery tool, one other complete ils (by visionary technologies for library systems, or vtls), and the national library of australia have each deployed operational frbr applications.34 the number of projects testifies to the high level of interest among the cataloging and information science communities, while the relatively small number of successful applications testifies to the difficulties faced. oclc has engaged in a number of research projects and prototypes in order to explore ways that frbrization of bibliographic records could enhance information access. oclc research frequently notes the potential streamlining of library cataloging by frbrization; in addition, they have experienced "superior presentation" and "more intuitive clustering" of search results when the model is incorporated into systems.35 work-level definitions stand behind such oclc research prototypes as audience level, dewey browser, fictionfinder, xisbn, and live search. in every case, researchers determined that, though it was very difficult to automate any identification of expressions, application of work-level categories both simplifies and improves search result sets.36 an algorithm common to several of these applications is freely available as an open source application, and now as a public interface option in oclc's worldcat local.37 the algorithm creates an author/title key to cluster worksets (often at a higher level than the frbr work, as in the case of the two distinct works that are the book and screenplay for gone with the wind). in the public search interface, the result sets may be grouped at the work level; users may then execute a more granular search for "all editions," an option that then displays the group of expressions linked to the work record. unfortunately, as the software does not use 700t fields (its intention is to travel up the entity hierarchy, and it uses the 1xx, 24x, and 130 fields), its usefulness in solving the above challenges may not be immediate.
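as a rough illustration of the kind of author/title key described above (the normalization rules below are assumptions for demonstration, not oclc's published algorithm), a few lines of python show how records sharing a normalized creator and title collapse into one workset.

import re
import unicodedata
from collections import defaultdict

def normalize(text: str) -> str:
    """fold diacritics, lowercase, strip punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def workset_key(record: dict) -> str:
    """build an author/title key from 1xx (creator) and 130/24x (title) data."""
    author = record.get("1xx", "")
    title = record.get("130") or record.get("24x", "")
    return normalize(author) + " / " + normalize(title)

def cluster(records: list) -> dict:
    """group records that share the same author/title key into worksets."""
    worksets = defaultdict(list)
    for rec in records:
        worksets[workset_key(rec)].append(rec)
    return dict(worksets)

records = [
    {"1xx": "Mozart, Wolfgang Amadeus, 1756-1791.", "24x": "Die Zauberflöte"},   # a score
    {"1xx": "Mozart, Wolfgang Amadeus, 1756-1791",  "24x": "Die Zauberflote."},  # a recording
]
print(len(cluster(records)))   # -> 1: both records fall into the same workset

in a display layer each workset would then appear as a single result line, with an "all editions" link expanding to the individual records beneath it.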
a somewhat similar application (though merrilee proffitt declares it not to be a frbr product) was redlightgreen, a user interface for the exrlg union catalog based upon quasi-frbr clustering.38 the reports from designers of other automated systems offer interesting commentaries on the process. the team building an automatically frbrized database and user interface for austlit—a new union collection of australian literature among eight academic libraries and the national library of australia—acknowledged some difficulty with non-monographic works such as poems, though the majority of their database consisted of simpler work-manifestation pairs.39 based on strongly positive user feedback (“the presentation of information about related works [is] both useful and comprehensible”), a similar application was attempted on the australian national music gateway musicaustralia; it is unclear whether the project was shelved due to difficulties in automating the frbrization process.40 one recent application created for the perseus digital library adopts a somewhat different approach.41 rather than altering previously created marc records to allow hierarchical relationships to surface, this team created new records using crosswalks between marc and, for instance, mods, for work-level records. they claim some moderate level of success; though once again, their discussion of the process is more illuminating than their product. mimno and crane successfully allowed a single manifestation-level record to link upwards to many expressions, a necessary analytic feature especially for dealing with sound recordings. they did practically demonstrate the difficulty of searching elements from different levels of the hierarchy at the same time (such as work title and translator), a complication predicted by yee.42 three ils vendors have released products that use the frbr model: portia (visualcat), ex libris (primo), and vtls (virtua).43 the first product, a cataloging utility from a smaller player in the vendor market, claims to incorporate frbr into its metadata capture, yet the information available does not explain how, nor do they offer an opac to exploit it. the 2007 release of ex libris’ primo offers what the company calls “frbr groupings” of results.44 this discovery tool is not itself an ils, but promises to interoperate with major existing ils products to consolidate search results. it remains unclear at this time how ex libris’ “standard frbr algorithms” actually group records; the single deployment in the danish royal library allows searching for more records with the same title, for instance, but does not distinguish between translations of the same work.45 vtls, on the other hand, has since 2004 offered a complete product that has the potential to modify existing marc records—via local linking tags in the 001 and 004 fields—to create frbr relationships.46 their own studies agreed with oclc that a subset, roughly 18 percent, of existing catalog records (most heavily concentrated in music collections) would benefit from the process, and they thus allow for “mixed” catalogs, with only subsets (or even individually selected records) to be frbrized. the company’s own information suggests relatively simple implementation by library catalogers, coupled with robust functionality for users, and may be the leading edge of the next generation of catalog products. 
■ frbr solutions the ifla study group, following its user-centered approach, set out a list of specific tasks that users of a computer-aided catalog should be able to accomplish: ■ to find all manifestations embodying certain criteria, or to find a specific manifestation given identifying information about it; ■ to identify a work, and to identify expressions and manifestations of that work; ■ to select among works, among expressions, and among manifestations; and ■ to obtain a particular manifestation once selected. it seems clear that the frbr model offers a framework of relationships that can aid each task. unfortunately, none of the currently available commercial solutions may be completely applicable, by itself, for a single library. the oclc work-set algorithm is open source, as well as easily available through worldcat local, but it works only to create super-work records; it also ignores the 700t field so crucial to many of the issues noted above. none of the other home-grown applications is likely to have code available to an institution. the virtua module from vtls offers a very tempting solution, but may require a change of vendor.47 either adapting one of these solutions or designing a local application, then, raises the question: what would the ideal system entail? catalog frbrization will proceed in two phases: enhancing the existing catalog so that bibliographic relationships can surface in the retrieval phase, and designing or adapting a new interface and display to reflect those relationships.48 the first task may prove the more formidable, due to the size of even a modest catalog database and the difficulties often observed in automating such a task; while the librarians constructing the austlit system found that a relatively high percentage of records could be transferred en masse, the oclc research team had difficulty automatically pinpointing expressions from current marc records.49 despite current technology trends toward users' application of tags, reviews, and other metadata, a task as specialized as adding bibliographic relationships to the catalog demands specialized cataloging professionals.50 the best approach within a current library structure may be to create a single new position to head the project and to act as liaison with cataloging staff in the various branches and with vendor staff, if applicable. each library branch may judge on its own the proportion of records to frbrize, beginning with high-traffic works and authors, those for whom search results tend to be the most overwhelming and confusing to users. each branch can be responsible for allocating cataloging staff effort to the process, and will thus have specialist oversight of subsets of the database. three technical solutions to actually changing the database structure have been attempted in the literature to date: incrementally improving the existing marc records to better reflect bibliographic relationships, adding local linking tags, and simply creating new metadata schemas. the vtls solution of adding local linking tags seems most appropriate; relationships between records are created and maintained via unique identifiers and linking statements in the 001 and 004 fields (a small sketch of this linking approach follows below).51 oclc's open source software could expedite the creation of work-level records, and the creation of expression-level records will be made easier by the large amount of bibliographic information already present in the current catalog.
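as an illustration of that linking-tag approach, here is a minimal python sketch (not vtls's implementation) in which each record carries its own identifier in an 001-style field and its parent's identifier in an 004-style field, and the work/expression/manifestation hierarchy is reassembled by following those links. the identifiers and sample records are invented for the example.

```python
# A minimal sketch of local linking tags used to express FRBR relationships:
# the 001-style field holds a record's own identifier and the 004-style field
# holds its parent's identifier; the hierarchy is rebuilt by following links.
# Identifiers and sample records are illustrative assumptions.
from collections import defaultdict

records = [
    {"001": "w100", "004": None,   "level": "work",          "label": "Also sprach Zarathustra (Strauss)"},
    {"001": "e200", "004": "w100", "level": "expression",    "label": "Performance, Berlin Philharmonic, 1974"},
    {"001": "e201", "004": "w100", "level": "expression",    "label": "Full score, critical edition"},
    {"001": "m300", "004": "e200", "level": "manifestation", "label": "Compact disc reissue, 1995"},
    {"001": "m301", "004": "e201", "level": "manifestation", "label": "Printed score, 1999"},
]

children = defaultdict(list)
by_id = {}
for rec in records:
    by_id[rec["001"]] = rec
    if rec["004"]:
        children[rec["004"]].append(rec["001"])

def print_tree(rec_id, depth=0):
    """Render the linked hierarchy as an indented tree, one node per line."""
    rec = by_id[rec_id]
    print("  " * depth + f"{rec['level']}: {rec['label']}")
    for child_id in children[rec_id]:
        print_tree(child_id, depth + 1)

for rec in records:
    if rec["004"] is None:        # start from the top (work-level) records
        print_tree(rec["001"])
```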
wherever possible, cataloging staff also should take the opportunity to verify or create links to authority files so as to enhance retrieval.52 creating a new catalog display option could be accomplished via additions to current opac coding, either by adopting worldcat local or by designing parts of a new local interface. it need not even require a complete revision; the single site (ucl) currently deploying vtls’ frbrized interface maintains a mixed catalog and offers, once again, a highly intuitive model.53 when a searcher comes across a bibliographic record for which frbr linking is available, they may click a link to open a new display screen. we should strive, however, to use simple interface statements such as “view all different kinds of holdings,” “this work has x editions, in y languages” or “this version of the work has been published z times” (both the oclc prototype and the austlit gateway offer such helpful and user-friendly statements). though the foundational work of both tillett and smiraglia focused upon taxonomies of relationships, the hierarchical structure of the ifla proposal should remain at the forefront of the display, with a secondary organization by type of relationship or type of entity. rather than adopting a design which automatically refreshes at each click, a tree organization of the display should be more user-friendly, allowing users to maintain a visual sense of the organization that they are encountering (see appendix for screenshots of this type of tree display).54 format information should be included in the display, as an indication of a users’ primary category, as well as a distinction among expressions of a work. with these changes, the library catalog will begin to afford its users better access to many of its core collections. frbrization of even part of the catalog—concentrating on high-incidence authors, as identified by subject specialists—will allow it better to reflect, and collocate, items within the families of bibliographic relationships that have been acknowledged a part of library collections for decades. this increased collocation will begin to counteract the pitfalls of mere keyword searching on the part of users, especially in conjunction with renewed authority work. finally, frbr offers a display option in a revamped opac that is at the same time simpler than current result lists, and more elegant in its reflection of relatedness among items. each feature should better 28 information technology and libraries | march 200828 information technology and libraries | march 2008 enable the users of our catalog to find, select, and obtain appropriate resources, and will bring our libraries into the next generation of cataloging practice. references and notes 1. ifla committee on the functional requirements for bibliographic records, final report (munich: k. g. saur, 1998); see also http://www.ifla.org/vii/s13/wgfrbr/bibliography.htm (accessed mar. 10, 2007). 2. this paper began as a graduate research assignment for lis 60640 (library automation), in the kent state university mlis program, march 19, 2007. my thanks to jennifer hambrick, nancy lensenmayer, and joan lippincott, for their helpful comments on earlier drafts. the curricular assignment asked for a library automation proposal in a specific library setting; the original review contained a set of recommendations concerning frbr through the lens of a (fictional) medium-sized academic library system, that of st. hildegard of bingen catholic university. 
as will be noted below, the branch music library typically serves a small population of music majors (graduate and undergraduate) within such an institution, but also a large portion of the student body that use the library’s collection to support their music coursework and arts distribution requirements. any music library’s proportion of the overall system’s holdings may be relatively small, but will include materials in a diverse set of formats: monographs, serials, musical scores, sound recordings in several formats (cassette tapes, lps, cds, and streaming audio files), and a growing collection of video recordings, likewise in several formats (vhs, laser discs, and dvd). it thus offers an early test case for difficulties with an automated library system. 3. dan zager, “collection development and management,” notes—quarterly journal of the music library association 56, no. 3 (march 2000): 569. 4. sherry l. velluci, “music metadata and authority control in an international context,” notes—quarterly journal of the music library association 57, no. 3 (mar. 2001): 541. 5. the opac for the university of huddersfield library system famously first deployed a search option for related items (“did you mean . . . ?”); http://www.hud.ac.uk/cls (accessed july 10, 2007). frbr not only offers the related item search, but also logically groups related works throughout the library catalog. 6. allyson carlyle demonstrated empirically that users value an object’s format as one of the first distinguishing features: “user categorization of works: toward improved organization of online catalog displays,” journal of documentation 55, no. 2 (mar. 1999): 184–208 at 197. 7. millennium will feature heavily in the following discussion, both because of its position leading the academic library automation market (being adopted wholesale by, for instance, the ohio statewide academic library consortium), and because it was the subject of the original paper. 8. see alastair boyd, “the worst of both worlds: how old rules and new interfaces hinder access to music,” caml review 33, no. 3 (nov. 2005), http://www.yorku.ca/caml/ review/33-3/both_worlds.htm (accessed mar. 12, 2007); michael gorman and paul w. winkler, eds., anglo-american cataloging rules, 2nd ed. (chicago: ala, 1988). 9. in the past few years, a small subset of the search literature has described technical efforts to develop search engines that can query by musical example; see j. stephen downie, “the scientific evaluation of music information retrieval systems: foundations and future,” computer music journal 28, no. 2 (summer 2004): 12–23. a company called melodis corporation has recently announced a successful launch of a query-by-humming search engine, though a verdict from the music community remains out; http://www.midomi.com (accessed jan. 31, 2007). 10. see velluci, “music metadata and authority control in an international context”; richard p. smiraglia, “uniform titles for music: an exercise in collocating works,” cataloging and classification quarterly 9, no. 3 (1989): 97–114; steven h. wright, “music librarianship at the turn of the century: technology,” notes—quarterly journal of the music library association 56, no. 3 (mar. 2000): 591–97. each author builds upon the foundational work of barbara tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (ph.d. diss., university of california at los angeles, 1987). 11. 
“at conferences, [my colleagues] are always groaning if they are a voyager client,” interview with an academic music librarian by the author, feb. 9, 2007. 12. several prominent music librarians only discovered that innovative’s system had such a feature when instances of the automatic system’s changing carefully crafted music authority records were discovered; mark sharff (washington university in st. louis) and deborah pierce (university of washington), postings to innovative music users’ group electronic discussion list, oct. 6, 2006, archive accessed feb. 1, 2007. 13. music librarians are the only subset of the millennium users to have formed their own innovate users’ group. sirsidynix has a separate users’ group for stm librarians, and ex libris hosts a law librarians’ users’ group, two other groups whose interaction with the ils poses discipline-specific challenges. 14. searches were tested on the the ohio state university libraries’ opac , http://library.osu.edu (accessed mar. 10, 2007). 15. http://www.emory.edu/libraries.cfm (accessed june 27, 2007). 16. searches performed on the library of oklahoma state university, http://www.library.okstate.edu (accessed june 27, 2007); tlc has considered making frbrization a possible feature of their product. they offer some concatenation of “intellectually similar bibliographic records,” and “tlc continues to monitor emerging frbr standards”; don kaiser, personal communication to the author, july 8, 2007. i was unable to reach representatives of sirsidynix on this issue. 17. searches performed on the mit library catalog, powered by aleph 500 http://libraries.mit.edu (accessed june 27, 2007). 18. eva verona, “literary unit versus bibliographic unit [1959],” in foundations of descriptive cataloging, ed. michael carpenter and elaine svenonius, 155–75 (littleton, colo.: libraries unlimited, 1985), and seymour lubetzky, principles of cataloging, final report phase i: descriptive cataloging (los angeles: institute for library research, 1969), are usually credited with article title | author 29frbrization of a library catalog | dickey 29 the foundational work on such theories; see richard p. smiraglia, the nature of “a work”: implications for the organization of knowledge (lanham, md.: scarecrow, 2001), 15–33, to whom the following overview is indebted. 19. anglo-american cataloging rules, cited in smiraglia, the nature of “a work,” 33. 20. among the many library and information science thinkers contributing to this body of research, the most prominent have been patrick wilson, “the second objective” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 5–16 (san diego: academic publ., 1989); edward t. o’neill and diane vizine-goetz, “bibliographic relationships: implications for the function of the catalog,” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 167–79 (san diego: academic publ., 1989); barbara ann tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (ph.d. diss, university of california, los angeles, 1987); eadem, “bibliographic relationships,” in relationships in the organization of knowledge, carol a. bean and rebecca green, eds. , 19–35 (dordrecht: kluwer, 2001) (summary of her dissertation findings on 19–20); martha m. yee, “manifestations and near-equivalents: theory with special attention to moving-image materials,” library resources and technical services 38, no. 3 (1994): 227–55. 21. 
o’neill and vizine-goetz, “bibliographic relationships”; see also edward t. o’neill, “frbr: application of the entityrelationship model to humphrey clinker,” library resources and technical services 46, no. 4 (oct. 2002): 150–59. 22. theorists in music semiotics who have more or less profoundly influenced music librarians’ view of their materials include jean-jacques nattiez, music and discourse: toward a semiology of music, trans. by carolyn abbate (princeton, n.j.: princeton univ. pr., 1990), and lydia goehr, the imaginary museum of musical works (new york: oxford univ. pr., 1992). see also smiraglia, the nature of “a work,” 64. for a concise overview of how semiotic theory has influenced thinking about literary texts, see w. c. greetham, theories of the text (oxford: oxford univ. pr., 1999), 276–325. 23. studies have found families of derivative bibliographic relationships in 30.2 percent of all worldcat records, 49.9 percent of records in the catalog of georgetown university library, 52.9 percent in the burke theological library (union theological seminary), 57.9 percent of theological works in the new york university library, and 85.4 percent in the sibley music library at the eastman school of music (university of rochester). see smiraglia, the nature of “a work,” 87, who cites richard p. smiraglia and gregory h. leazer, “derivative bibliographic relationships: the work relationship in a global bibliographic database,” journal of the american society for information science 50 (1999): 493–504; richard p. smiraglia, “authority control and the extent of derivative bibliographic relationships” (ph.d. diss., university of chicago, 1992); richard p. smiraglia, “derivative bibliographic relationships among theological works,” proceedings of the 62nd annual meeting of the american society for information science (medford, n.j.: information today, 1999): 497–506; and sherry l. vellucci, “bibliographic relationships among musical bibliographic entities: a conceptual analysis of music represented in a library catalog with a taxonomy of the relationships” (d.l.s. diss., columbia university, 1994). 24. ifla, final report, 2–3. 25. ibid, 16–23. 26. sherry l. vellucci, bibliographic relationships in music catalogs (lanham, md.: scarecrow, 1997), 1. 27. ibid, 238; 251. 28. vellucci, “music metadata”; richard p. smiraglia, “musical works and information retrieval,” notes: quarterly journal of the music library association 58, no. 4 (june 2002). patrick le boeuf notes that users of music collections often use the single word “score” to indicate any one of the four frbr entities; “musical works in the frbr model or ‘quasi la stessa cosa’: variations on a theme by umberto eco,” in functional requirements for bibliographic records (frbr): hype or cure-all? ed. patrick le boeuf, 103–23 at 105–06 (new york: haworth, 2005). 29. smiraglia, “musical works and information retrieval,” 2. 30. marte brenne, “storage and retrieval of musical documents in a frbr-based library catalogue” (masters’ thesis, oslo university college, 2004), 79. see also john anderies, “enhancing library catalogs for music,” paper presented at the conference on music and technology in the liberal arts environment, hamilton college, june 22, 2004; powerpoint presentation accessed mar. 12, 2007, from http://academics. hamilton.edu/conferences/musicandtech/presentations/catalog-enhancements.ppt; boyd, “the worst of both worlds.” 31. 
see the extensive bibliography compiled by ifla, cataloging division: “frbr bibliography,” http://www.ifla.org/ vii/s13/wgfrbr.bibliography.htm (accessed mar. 10, 2007). 32. the first ils deployment of the worldcat local application using frbr is with the university of washington libraries: http://www.lib.washington.edu (accessed june 27, 2007). 33. innovative interfaces, inc., “millennium 2005 preview: frbr support,” inn-touch (june 2004), 9. interestingly, the onepage advertisement for the new service chose a musical work, puccini’s opera la bohème, to illustrate how the sorting would work. innovative interfaces booth staff at the ala national conference, washington, d.c., june 24, 2007, told the author the company has moved in a different development direction now (investing more heavily in faceted browsing). 34. denmark’s det kongelige bibliotek has been the first ex libris partner library to deploy primo, http://www.kb.dk/en (accessed july 10, 2007). the vtls system has been operating since 2004 at the université catholique de louvain, http:// www.bib.ucl.ac.be (accessed mar. 15, 2007). for austlit, see http://www.austlit.edu.au (accessed mar. 14, 2007). 35. rick bennett, brian f. lavoie, and edward t. o’neill, “the concept of a work in worldcat: an application of frbr,” library collections, acquisitions, and technical services 27, no. 1 (spring 2003): 45–60. work-level records allow manifestation and item records to inherit labor-intensive subject classification metadata; eric childress, “frbr and oclc research,” paper presented at the university of north carolina-chapel hill, apr. 10, 2006, http://www.oclc.org/research/presentations/ childress/20060410-uncch-sils.ppt (accessed mar. 12, 2007). 36. thomas b. hickey, edward t. o’neill, and jenny toves, “experiments with the ifla functional requirements for bibliographic records (frbr),” d-lib 8, no. 9 (sept. 2002), http://www.dlib.org/dlib/september02/hickey/09hickey.html (accessed mar. 12, 2007). 37. thomas b. hickey and jenny toves, “frbr work-set algorithm,” apr. 2005 report, http://www.oclc.org/research/ projects/frbr/default.htm (accessed mar. 12, 2007); algorithm 30 information technology and libraries | march 200830 information technology and libraries | march 2008 available at http://www.oclc.org/research/projects/frbr/algorithm.htm. on worldcat local, see above, note 32. 38. merrilee proffitt, “redlightgreen: frbr between a rock and a hard place,” http://www.ala.org/ala/alcts/alctsconted/ presentations/proffitt.pdf (accessed mar. 12, 2007). redlight green has been discontinued, and some of its technology incorporated into worldcat local. 39. http://www.austlit.edu.au (accessed mar. 14, 2007), but unfortunately a subscription database at this time, and thus unavailable for operational comparison. see marie-louise ayres, “case studies in implementing functional requirements for bibliographic records: austlit and musicaustralia,” alj: the australian library journal 54, no. 1 (feb. 2005): 43–54, http:// www.nla.gov.au/nla/staffpaper/2005/ayres1.html (accessed mar. 12, 2007). 40. ibid. 41. see david mimno and gregory crane, “hierarchical catalog records: implementing a frbr catalog,” d-lib 11, no. 10 (oct. 2005); http://www.dlib.org/dlib/october05/ crane/10crane.html (accessed mar. 12, 2007). 42. ibid. see also martha m. yee, “frbrization: a method for turning online public finding lists into online public catalogs,” information technology and libraries 24, no. 
3 (2005): 77–95, http://repositories.cdlib.org/postprints/715 (accessed mar. 12, 2007). 43. portia, “visualcat overview,” http://www.portia.dk/ pubs/visualcat/present/visualcatoverview20050607.pdf (accessed mar. 14, 2007); vtls, inc., “virtua,” http://www.vtls. com/brochures/virtua.pdf (accessed mar. 14, 2007). 44. http://www.exlibrisgroup.com/primo_orig.htm (accessed july 10, 2007). 45. syed ahmed, personal communication to the author, july 10, 2007; searches run july 10, 2007, on http://www.kb.dk/en. the library’s holdings of manifestations of mozart’s singspiel opera, the magic flute, run to four different groupings on this catalog: one under the title “die zauberflöte,” one under the title “la flute enchantée: opéra fantastique en 4 actes,” and two separate groups under the title “tryllefløtjen.” 46. “vtls announces first production use of frbr,” http:// www.vtls.com/corporate/releases/2004/6.shtml (accessed mar. 14, 2007). unfortunately, though this press release indicates commitments on the part of the université catholique de louvain and vaughan public libraries (ontario, canada) to use fully frbrized catalogs, only the first is operating in this mode as of july 2007, and with only a subset of its catalog adapted. 47. virtua is not interoperable, for instance, with any of innovative’s other ils modules, which continue to dominate a number of larger academic consortia; john espley, vtls inc. director of design, personal communication to the author, mar. 15, 2007. 48. see allyson carlyle, “fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays,” library resources and technical services 41, no. 2 (1997): 79–100. 49. even at the work-level, yee distinguished fully eight different places in a marc record in which the identity of a work may be located, “frbrization,” 79–80. 50. gregory leazer and richard p. smiraglia imply that cataloger-based “maps” of bibliographic relationships are inadequate; “bibliographic families in the library catalog: a qualitative analysis and grounded theory,” library resources and technical services 43, no. 4 (1999): 191–212. the cataloging failures they describe, however, are more a result of inadequacies in the current rules and practice, and do not really prove that catalogers have failed in the task of creating useful systems. 51. vinood chacra and john espley, “differentiating libraries though enriched user searching: frbr as the next dimensions in meaningful information retrieval,” powerpoint presentation, http://www.vtls.com/corporate/frbr.shtml (accessed mar. 10, 2007). 52. see yee, “frbrization.” 53. http://www.bib.ucl.ac.be (accessed mar. 15, 2007). 54. not only does the ex libris primo application need clickthroughs, it creates a new window for an extra step before presenting a new group of records. bibliography anderies, john. “enhancing library catalogs for music.” paper presented at the conference on music and technology in the liberal arts environment, hamilton college, june 22, 2004; http://academics.hamilton.edu/conferences/musicandtech/presentations/catalog-enhancements.ppt (accessed mar. 12, 2007). ayres, marie-louise. “case studies in implementing functional requirements for bibliographic records: austlit and musicaustralia.” alj: the australian library journal 54, no. 1 (feb. 2005): 43–54; http://www.nla.gov.au/nla/staffpaper/2005/ ayres1.html (accessed mar. 12, 2007). bennett, rick, brian f. lavoie, and edward t. o’neill. 
“the concept of a work in worldcat: an application of frbr.” library collections, acquisitions, and technical services 27, no. 1 (spring 2003): 45–60. boyd, alistair. “the worst of both worlds: how old rules and new interfaces hinder access to music.” caml review 33, no. 3 (nov. 2005); http://www.yorku.ca/caml/review/33-3/ both_worlds.htm (accessed mar. 12, 2007). brenne, marte. “storage and retrieval of musical documents in a frbr-based library catalogue.” masters’ thesis, oslo university college, 2004. carlyle, allyson. “fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays,” library resources and technical services 41, no. 2 (1997): 79–100. ______. “user categorization of works: toward improved organization of online catalog displays.” journal of documentation 55, no. 2 (mar. 1999): 184–208 chacra, vinood, and john espley. “differentiating libraries though enriched user searching: frbr as the next dimensions in meaningful information retrieval.” powerpoint presentation, http://www.vtls.com/corporate/frbr.shtml (accessed mar. 10, 2007). childress, eric. “frbr and oclc research.” paper presented at the university of north carolina-chapel hill, apr. 10, 2006; http://www.oclc.org/research/presentations/ childress/20060410-uncch-sils.ppt (accessed mar. 12, 2007). hickey, thomas b., and edward o’neill. “frbrizing oclc’s worldcat.” in functional requirements for bibliographic records article title | author 31frbrization of a library catalog | dickey 31 (frbr): hype or cure-all? ed. patrick le boeuf, 239-251. new york: haworth, 2005. hickey, thomas b., and jenny toves. “frbr work-set algorithm.” apr. 2005 report; http://www.oclc.org/research/ frbr (accessed mar. 12, 2007). hickey, thomas b., edward t. o’neill, and jenny toves, “experiments with the ifla functional requirements for bibliographic records (frbr),” d-lib 8, no. 9 (sept. 2002); http://www.dlib.org/dlib/september02/hickey/09hickey. html (accessed mar. 12, 2007). ifla study group on the functional requirements for bibliographic records. functional requirements for bibliographic records: final report. munich: k. g. saur, 1998. layne, sara shatford. “subject access to art images.” in introduction to art image access: issues, tools, standards, strategies, murtha baca, ed., 1–18. los angeles: getty research institute, 2002. leazer, gregory, and richard p. smiraglia. “bibliographic families in the library catalog: a qualitative analysis and grounded theory.” library resources and technical services 43, no. 4 (1999): 191–212. le boeuf, patrick. “musical works in the frbr model or ‘quasi la stessa cosa’: variations on a theme by umberto eco.” in functional requirements for bibliographic records (frbr): hype or cure-all? patrick le boeuf, ed., 103–23 new york: haworth, 2005. markey, karen. subject access to visual resources collections: a model for computer construction of thematic catalogs. new york: greenwood, 1986. mimno, david, and gregory crane. “hierarchical catalog records: implementing a frbr catalog.” d-lib 11, no. 10 (oct. 2005); http://www.dlib.org/dlib/october05/crane/10crane. html (accessed mar. 12, 2007). o’neill, edward t. “frbr: application of the entity-relationship model to humphrey clinker.” library resources and technical services 46, no. 4 (oct. 2002): 150–59. o’neill, edward t., and diane vizine-goetz. “bibliographic relationships: implications for the function of the catalog.” in the conceptual foundations of descriptive cataloging. 
elaine svenonius, ed., 167–79. san diego: academic publ., 1989. proffitt, merrilee. “redlightgreen: frbr between a rock and a hard place.” paper presented at the 2004 ala annual conference, orlando, fla.; http://www.ala.org/ala/alcts/alctsconted/presentations/proffitt.pdf (accessed mar. 12, 2007). smiraglia, richard p. bibliographic control of music, 1897–2000. lanham, md.: scarecrow and music library association, 2006. ______. “content metadata: an analysis of etruscan artifacts in a museum of archaeology.” cataloging and classification quarterly, 40, no. 3/4 (2005): 135–51. ______. “musical works and information retrieval,” notes: quarterly journal of the music library association 58, no. 4 (june 2002): 747–64. ______. the nature of “a work”: implications for the organization of knowledge. lanham, md.: scarecrow, 2001. ______. “uniform titles for music: an exercise in collocating works.” cataloging and classification quarterly 9, no. 3 (1989): 97–114. tillett, barbara ann. “bibliographic relationships.” in relationships in the organization of knowledge. carol a. bean and rebecca green, eds., 19–35. dordrecht: kluwer, 2001. vellucci, sherry l. bibliographic relationships in music catalogs. lanham, md.: scarecrow, 1997. ______. “music metadata and authority control in an international context.” notes—quarterly journal of the music library association 57, no. 3 (mar. 2001): 541–54. wilson, patrick. “the second objective.” in the conceptual foundations of descriptive cataloging. elaine svenonius, ed., 5–16. san diego: academic publ., 1989. wright, h. s. “music librarianship at the turn of the century: technology.” notes: quarterly journal of the music library association 56, no. 3 (mar. 2000): 591–97. yee, martha m. “frbrization: a method for turning online public finding lists into online public catalogs.” information technology and libraries 24, no. 3 (2005): 77–95; http://repositories.cdlib.org/postprints/713 (accessed mar. 12, 2007). ______. “manifestations and near-equivalents: theory with special attention to moving-image materials.” library resources and technical services 38, no. 3 (1994): 227–55. zager, daniel. “collection development and management.” notes: quarterly journal of the music library association 56, no. 3 (2000): 567–73. 32 information technology and libraries | march 200832 information technology and libraries | march 2008 a search on also sprach zarathustra on the online public access catalog for the universite catholique de louvain, with results frbrized. (a vtls opac). selecting the first work yields the following screen: . . . which, when frbrized, yields a list of expressions. any part of the tree may be expanded, to display manifestations, and item-level records follow. appendix: examples of a frbrized tree display web services and widgets for library information systems | han 87on the clouds: a new way of computing | han 87 shape cloud computing. for example, sun’s well-known slogan “the network is the computer” was established in late 1980s. salesforce.com has been providing on-demand software as a service (saas) for customers since 1999. ibm and microsoft started to deliver web services in the early 2000s. microsoft’s azure service provides an operating system and a set of developer tools and services. google’s popular google docs software provides web-based word-processing, spreadsheet, and presentation applications. google app engine allows system developers to run their python/java applications on google’s infrastructure. sun provides $1 per cpu hour. 
amazon is well-known for providing web services such as ec2 and s3. yahoo! announced that it would use the apache hadoop framework to allow users to work with thousands of nodes and petabytes (1 million gigabytes) of data. these examples demonstrate that cloud computing providers are offering services on every level, from hardware (e.g., amazon and sun), to operating systems (e.g., google and microsoft), to software and service (e.g., google, microsoft, and yahoo!). cloud-computing providers target a variety of end users, from software developers to the general public. for additional information regarding cloud computing models, the university of california (uc) berkeley’s report provides a good comparison of these models by amazon, microsoft, and google.4 as cloud computing providers lower prices and it advancements remove technology barriers—such as virtualization and network bandwidth—cloud computing has moved into the mainstream.5 gartner stated, “organizations are switching from factors related to cloud computing: infinite computing resources available on demand, removing the need to plan ahead; the removal of an up-front costly investment, allowing companies to start small and increase resources when needed; and a system that is pay-for-use on a short-term basis and releases customers when needed (e.g., cpu by hour, storage by day).2 national institute of standards and technology (nist) currently defines cloud computing as “a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. network, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”3 as there are several definitions for “utility computing” and “cloud computing,” the author does not intend to suggest a better definition, but rather to list the characteristics of cloud computing. the term “cloud computing” means that ■■ customers do not own network resources, such as hardware, software, systems, or services; ■■ network resources are provided through remote data centers on a subscription basis; and ■■ network resources are delivered as services over the web. this article discusses using cloud computing on an it-infrastructure level, including building virtual server nodes and running a library’s essential computer systems in remote data centers by paying a fee instead of running them on-site. the article reviews current cloud computing services, presents the author’s experience, and discusses advantages and disadvantages of using the new approach. all kinds of clouds major it companies have spent billions of dollars since the 1990s to on the clouds: a new way of computing this article introduces cloud computing and discusses the author’s experience “on the clouds.” the author reviews cloud computing services and providers, then presents his experience of running multiple systems (e.g., integrated library systems, content management systems, and repository software). he evaluates costs, discusses advantages, and addresses some issues about cloud computing. cloud computing fundamentally changes the ways institutions and companies manage their computing needs. libraries can take advantage of cloud computing to start an it project with low cost, to manage computing resources cost-effectively, and to explore new computing possibilities. 
s cholarly communication and new ways of teaching provide an opportunity for academic institutions to collaborate on providing access to scholarly materials and research data. there is a growing need to handle large amounts of data using computer algorithms that presents challenges to libraries with limited experience in handling nontextual materials. because of the current economic crisis, academic institutions need to find ways to acquire and manage computing resources in a cost-effective manner. one of the hottest topics in it is cloud computing. cloud computing is not new to many of us because we have been using some of its services, such as google docs, for years. in his latest book, the big switch: rewiring the world, from edison to google, carr argues that computing will go the way of electricity: purchase when needed, which he calls “utility computing.” his examples include amazon’s ec2 (elastic computing cloud), and s3 (simple storage) services.1 amazon’s chief technology officer proposed the following yan hantutorial yan han (hany@u.library.arizona.edu) is associate librarian, university of arizona libraries, tucson. 88 information technology and libraries | june 201088 information technology and libraries | june 2010 company-owner hardware and software to per-use service-based models.”6 for example, the u.s. government website (http://www.usa .gov/) will soon begin using cloud computing.7 the new york times used amazon’s ec2 and s3 services as well as a hadoop application to provide open access to public domain articles from 1851 to 1922. the times loaded 4 tb of raw tiff images and their derivative 11 million pdfs into amazon’s s3 in twenty-four hours at very reasonable cost.8 this project is very similar to digital library projects run by academic libraries. oclc announced its movement of library management services to the web.9 it is clear that oclc is going to deliver a web-based integrated library system (ils) to provide a new way of running an ils. duraspace, a joint organization by fedora commons and dspace foundation, announced that they would be taking advantage of cloud storage and cloud computing.10 on the clouds computing needs in academic libraries can be placed into two categories: user computing needs and library goals. user computing needs academic libraries usually run hundreds of pcs for students and staff to fulfill their individual needs (e.g., microsoft office, browsers, and image-, audio-, and video-processing applications). library goals a variety of library systems are used to achieve libraries’ goals to support research, learning, and teaching. these systems include the following: ■■ library website: the website may be built on simple html webpages or a content management system such as drupal, joomla, or any home-grown php, perl, asp, or jsp system. ■■ ils: this system provides traditional core library work such as cataloging, acquisition, reporting, accounting, and user management. typical systems include innovative interfaces, sirsidynix, voyager, and opensource software such as koha. ■■ repository system: this system provides submission and access to the institution’s digital collections and scholarship. typical systems include dspace, fedora, eprints, contentdm, and greenstone. ■■ other systems: for example, federated search systems, learning object management systems, interlibrary loan (ill) systems, and reference tracking systems. ■■ public and private storage: staff file-sharing, digitization, and backup. 
due to differences in end users and functionality, most systems do not use computing resources equally. for example, the ils is input and output intensive and database query intensive, while repository systems require storage ranging from a few gigabytes to dozens of terabytes and substantial network bandwidth. cloud computing brings a fundamental shift in computing. it changes the way organizations acquire, configure, manage, and maintain computing resources to achieve their business goals. the availability of cloud computing providers allows organizations to focus on their business and leave general computing maintenance to the major it companies. in the fall of 2008, the author started to research cloud computing providers and how he could implement cloud computing for some library systems to save staff and equipment costs. in january 2009, the author started his plan to build library systems “on the clouds.” the university of arizona libraries (ual) has been a key player in the process of rebuilding higher education in afghanistan since 2001. ual librarian atifa rawan and the author have received multiple grant contracts to build technical infrastructures for afghanistan’s academic libraries. the technical infrastructure includes the following: ■■ afghanistan ils: a bilingual ils based on the open-source system koha.11 ■■ afghanistan digital libraries website (http://www.afghan digitallibraries.org/): originally built on simple html pages, later rebuilt in 2008 using the content management system joomla. ■■ a digitization management system. the author has also developed a japanese ill system (http://gif project.libraryfinder.org) for the north american coordinating council on japanese library resources. these systems had been running on ual’s internal technical infrastructure. these systems run in a complex computing environment, require different modules, and do not use computing resources equally. for example, the afghan ils runs on linux, apache, mysql, and perl. its opac and staff interface run on two different ports. the afghanistan digital libraries website requires linux, apache, mysql, and php. the japanese ill system was written in java and runs on tomcat. there are several reasons why the author moved these systems to the new cloud computing infrastructure: ■■ these systems need to be accessed in a system mode by people who are not ual employees. ■■ system rebooting time can be substantial in this infrastructure because of server setup and it policy. ■■ the current on-site server has web services and widgets for library information systems | han 89on the clouds: a new way of computing | han 89 reached its life expectancy and requires a replacement. by analyzing the complex needs of different systems and considering how to use resources more effectively, the author decided to run all the systems through one cloud computing provider. by comparing the features and the costs, linode (http://www.linode.com/) was chosen because it provides full ssh and root access using virtualization, four data centers in geographically diverse areas, high availability and clustering support, and an option for month-to-month contracts. in addition, other customers have provided positive reviews. in january 2009, the author purchased one node located in fremont, california, for $19.95 per month. an implementation plan (see appendix) was drafted to complete the project in phases. the author owns a virtual server and has access to everything that a physical server provides. 
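since the migrated applications answer on several different ports (the koha opac and staff interface, the joomla site, and the tomcat-based ill system), a quick availability check of the kind implied by the migration tests described in the next paragraphs might look like the following minimal python sketch. the hostnames and port numbers are illustrative assumptions, not the project's actual configuration.

```python
# A minimal post-migration check: attempt a TCP connection to each service and
# report which ones answer. Hostnames and ports below are placeholders, not
# the actual addresses used in the project described here.
import socket

SERVICES = {
    "afghan ils opac (koha)":          ("ils.example.org", 80),
    "afghan ils staff client (koha)":  ("ils.example.org", 8080),
    "digital libraries site (joomla)": ("www.example.org", 80),
    "japanese ill system (tomcat)":    ("ill.example.org", 8080),
}

def is_up(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, (host, port) in SERVICES.items():
    status = "up" if is_up(host, port) else "DOWN"
    print(f"{name:35s} {host}:{port}  {status}")
```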
in addition, the provider and the user community provided timely help and technical support. the migration of systems was straightforward: a linux kernel (debian 4.0) was installed within an hour, domain registration was complete and the domains went active in twenty-four hours, the afghanistan digital libraries' website (based on joomla) migration was complete within a week, and all supporting tools and libraries (e.g., mysql, tomcat, and java sdk) were installed and configured within a few days. a month later, the afghanistan ils (based on koha) migration was completed. the ill system was also migrated without problem. tests were performed on all these systems to verify their usability. in summary, the migration of systems was very successful and did not encounter any barriers. it addresses the issues noted above: after the migration, ssh log-ins for users who are not university employees were set up quickly; systems maintenance is managed by the author's team, and rebooting now takes only about one minute; and there is no need to buy a new server and put it in a temperature- and security-controlled environment. the hardware is maintained by the provider. the administrative gui for the linux nodes is shown in figure 1. since migration, no downtime due to hardware or other failures caused by the provider has been observed. after migrating all the systems successfully and running them in a reliable mode for a few months, the second phase was implemented (see appendix). another linux node (located in atlanta, georgia) was purchased for backup and monitoring (see figure 2). nagios, an open-source monitoring system, was tested and configured to identify and report problems for the above library systems. nagios provides the following functions: (1) monitoring of critical computing components, such as the network, systems, services, and servers; (2) timely alerts delivered via e-mail or cell phone; and (3) reports and recorded logs of outages, events, and alerts. a backup script is also run as a prescheduled job to back up the systems on a regular basis. (figure 1. linux node administration web interface. figure 2. two linux nodes located in two remote data centers, fremont, california, and atlanta, georgia.) findings and discussions since january 2009, all the systems have been migrated and have been running without any issues caused by the provider. the author is very satisfied with the outcomes and cost. the annual cost of running two nodes is $480 per year, compared to at least $4,000 if the hardware had been run in the library.12 from the author's experience, cloud computing provides the following advantages over the traditional way of computing in academic institutions: ■■ cost-effectiveness: from the above example and literature review, it is obvious that using cloud computing to run applications, systems, and it infrastructure saves staff and financial resources. uc berkeley's report and zawodny's blog provide a detailed analysis of costs for cpu hours and disk storage.13 ■■ flexibility: cloud computing allows organizations to start a project quickly without worrying about up-front costs. computing resources such as disk storage, cpu, and ram can be added when needed.
in this case, the author started on a small scale by purchasing one node and added additional resources later. ■■ data safety: organizations are able to purchase storage in data centers located thousands of miles away, increasing data safety in case of natural disasters or other factors. this strategy is very difficult to achieve in a traditional off-site backup. ■■ high availability: cloud computing providers such as microsoft, google, and amazon have better resources to provide more up-time than almost any other organizations and companies do. ■■ the ability to handle large amounts of data: cloud computing has a pay-for-use business model that allows academic institutions to analyze terabytes of data using distributed computing over hundreds of computers for a short-time cost. on-demand data storage, high availability and data safety are critical features for academic libraries.14 however, readers should be aware of some technical and business issues: ■■ availability of a service: in several widely reported cases, amazon’s s3 and google gmail were inaccessible for a duration of several hours in 2008. the author believes that the commercial providers have better technical and financial resources to keep more up-time than most academic institutions. for those wanting no single point of failure (e.g., a provider goes out of business), the author suggests storing duplicate data with a different provider or locally. ■■ data confidentiality: most academic libraries have open-access data. this issue can be solved by encrypting data before moving to the clouds. in addition, licensing terms can be negotiated with providers regarding data safety and confidentiality. ■■ data transfer bottlenecks: accessing the digital collections requires considerable network bandwidth, and digital collections are usually optimized for customer access. moving huge amounts of data (e.g., preservation digital images, audios, videos, and data sets) to data centers can be scheduled during off hours (e.g., 1–5 a.m.), or data can be shipped on hard disks to the data centers. ■■ legal jurisdiction: legal jurisdiction creates complex issues for both providers and end users. for example, canadian privacy laws regulate data privacy in public and private sectors. in 2008, the office of the privacy commissioner of canada released a finding that “outsourcing of canada .com email services to u.s.-based firm raises questions for subscribers,” and expressed concerns about public sector privacy protection.15 this brings concerns to both providers and end users, and it was suggested that privacy issues will be very challenging.16 summary the author introduces cloud computing services and providers, presents his experience of running multiple systems such as ils, content management systems, repository software, and the other system “on the clouds” since january 2009. using cloud computing brings significant cost savings and flexibility. however, readers should be aware of technical and business issues. the author is very satisfied with his experience of moving library systems to cloud computing. his experience demonstrates a new way of managing critical computing resources in an academic library setting. the next steps include using cloud computing to meet digital collections’ storage needs. cloud computing brings fundamental changes to organizations managing their computing needs. 
as major organizations in library fields, such as oclc, started to take advantage of cloud computing, the author believes that cloud computing will play an important role in library it. acknowledgments the author thanks usaid and washington state university for providing financial support. the author thanks matthew cleveland’s excellent work “on the clouds.” references 1. nicholars carr, the big switch: rewiring the world, from edison to google web services and widgets for library information systems | han 91on the clouds: a new way of computing | han 91 (london: norton, 2008). 2. werner vogels, “a head in the clouds—the power of infrastructure as a service” (paper presented at the cloud computing and in applications conference (cca ’08), chicago, oct. 22–23, 2008). 3. peter mell and tim grance, “draft nist working definition of cloud computing,” national institute of standards and technology (may 11, 2009), http:// csrc.nist.gov/groups/sns/cloud-computing/index.html (accessed july 22, 2009). 4. michael armbust et al., “above the clouds: a berkeley view of cloud computing,” technical report, university of california, berkeley, eecs department, feb. 10, 2009, http://www.eecs.berkeley .edu/pubs/techrpts/2009/eecs-200928.html (accessed july 1, 2009). 5. eric hand, “head in the clouds: ‘cloud computing’ is being pitched as a new nirvana for scientists drowning in data. but can it deliver?” nature 449, no. 7165 (2007): 963; geoffery fowler and ben worthen, “the internet industry is on a cloud—whatever that may mean,” wall street journal, mar. 26, 2009, http://online.wsj.com/article/ sb123802623665542725.html (accessed july 14, 2009); stephen baker, “google and the wisdom of the clouds,” business week (dec. 14, 2007), http://www.msnbc .msn.com/id/22261846/ (accessed july 8, 2009). 6. gartner, “gartner says worldwide it spending on pace to supass $3.4 trillion in 2008,” press release, aug. 18, 2008, http://www.gartner.com/it/page .jsp?id=742913 (accessed july 7, 2009). 7. wyatt kash, “usa.gov, gobierno usa.gov move into the internet cloud,” government computer news, feb. 23, 2009, http://gcn.com/articles/2009/02/23/ gsa-sites-to-move-to-the-cloud.aspx?s =gcndaily_240209 (accessed july 14, 2009). 8. derek gottfrid, “self-service, prorated super computing fun!” online posting, new york times open, nov. 1, 2007, http://open.blogs .nytimes.com/2007/11/01/self-service -prorated-super-computing-fun/?scp =1&sq=self%20service%20prorated&st =cse (accessed july 8, 2009). 9. oclc online computing library center, “oclc announces strategy to move library management services to web scale,” press release, apr. 23, 2009, http://www.oclc.org/us/en/news/ releases/200927.htm (accessed july 5, 2009). 10. duraspace, “fedora commons and dspace foundation join together to create duraspace organization,” press release, may 12, 2009, http:// duraspace.org/documents/pressrelease .pdf (accessed july 8, 2009). 11. yan han and atifa rawan, “afghanistan digital library initiative: revitalizing an integrated library system,” information technology & libraries 26, no. 4 (2007): 44–46. 12. fowler and worthen, “the internet industry is on a cloud.” 13. jeremy zawodney, “replacing my home backup server with amazon’s s3,” online posting, jeremy zawodny’s blog, oct. 3, 2006, http://jeremy .zawodny.com/blog/archives/007624 .html (accessed june 19, 2009). 14. yan han, “an integrated high availability computing platform,” the electronic library 23, no. 6 (2005): 632–40. 15. 
office of the privacy commissioner of canada, “tabling of privacy commissioner of canada’s 2005–06 annual report on the privacy act: commissioner expresses concerns about public sector privacy protection,” press release, june 20, 2006, http://www.priv.gc.ca/media/ nr-c/2006/nr-c_060620_e.cfm (accessed july 14, 2009); office of the privacy commissioner of canada, “findings under the personal information protection and electronic documents act (pipeda),” (sept. 19, 2008), http://www.priv.gc.ca/cf -dc/2008/394_20080807_e.cfm (accessed july 14, 2009). 16. stephen baker, “google and the wisdom of the clouds,” business week (dec. 14, 2007), http://www.msnbc.msn .com/id/22261846/ (accessed july 8, 2009). appendix. project plan: building ha linux platform using cloud computing project manager: project members: object statement: to build a high availability (ha) linux platform to support multiple systems using cloud computing in six months. scope: the project members should identify cloud computing providers, evaluate the costs, and build a linux platform for computer systems, including afghan ils, afghanistan digital libraries website, repository system, japanese interlibrary loan website, and digitization management system. resources: project deliverable: january 1, 2009—july 1, 2009 92 information technology and libraries | june 201092 information technology and libraries | june 2010 phase i ■■ to build a stable and reliable linux platform to support multiple web applications. the platform needs to consider reliability and high availability in a cost-effective manner ■■ to install needed libraries for the environment ■■ to migrate ils (koha) to this linux platform ■■ to migrate afghan digital libraries’ website (joomla) to this platform ■■ to migrate japanese interlibrary loan website ■■ to migrate digitization management system phase ii ■■ to research and implement a monitoring tool to monitor all web applications as well as os level tools (e.g. tomcat, mysql) ■■ to configure a cron job to run routine things (e.g., backup ) ■■ to research and implement storage (tb) for digitization and access phase iii ■■ to research and build linux clustering steps: 1. os installation: debian 4 2. platform environment: register dns 3. install java 6, tomcat 6, mysql 5, etc. 4. install source control env git 5. install statistics analysis tool (google analytics) 6. install monitoring tool: ganglia or nagios 7. web applications 8. joomla 9. koha 10. monitoring tool 11. digitization management system 12. repository system: dspace, fedora, etc. 13. ha tools/applications note calculation based on the following: ■■ leasing two nodes $20/month: $20 x 2 nodes x 12 months = $480/year ■■ a medium-priced server with backup with a life expectancy of 5 years ($5,000): $1,000/year ■■ 5 percent of system administrator time for managing the server ($60,000 annual salary): $3,000/year ■■ ignore telecommunication cost, utility cost, and space cost. ■■ ignore software developer’s time because it is equal for both options. appendix. project plan: building ha linux platform using cloud computing (cont.) examining attributes of open standard file formats for long-term preservation and open access eun g.park and sam oh information technology and libraries | december 2012 44 abstract this study examines the attributes that have been used to assess file formats in literature and compiles the most frequently used attributes of file formats to establish open-standard file-formatselection criteria. 
a comprehensive review was undertaken to identify the current knowledge regarding file-format-selection criteria. the findings indicate that the most common criteria can be categorized into five major groups: functionality, metadata, openness, interoperability, and independence. these attributes appear to be closely related. additional attributes include presentation, authenticity, adoption, protection, preservation, reference, and others. introduction file format is one of the core issues in the fields of digital content management and digital preservation. as many different types of file formats are available for texts, images, graphs, audio recordings, videos, databases, and web applications, the selection of appropriate file formats poses an ongoing challenge to libraries, archives, and other cultural heritage institutions. some file formats appear to be more widely accepted: tagged image file format (tiff), portable document format (pdf), pdf/a, office open xml (ooxml), and open document format (odf), to name a few. many institutions, including the library of congress (lc), possess guidelines on file format applications for long-term preservation strategies that specify requisite characteristics of acceptable file formats (e.g., they are independent of specific operating systems, are independent of hardware and software functions, conform to international standards, etc.).1 the format descriptions database of the global digital format registry is an effort to maintain detailed representation information and sustainability factors for as many file formats as possible (the pronom technical registry is another such database).2 despite these developments, file format selection remains a complex task and prompts many questions that range from the general ("which selection criteria are appropriate?") to the more specific ("are these international standard file formats sufficient for us to ensure long-term preservation and access?" or "how should we define and implement standard file formats in harmony with our local context?"). in this study, we investigate the definitions and features of standard file formats and examine the major attributes used in assessing file formats. we discuss relevant issues from the viewpoint of open-standard file formats for long-term preservation and open access. eun g. park (eun.park@mcgill.ca) is associate professor, school of information studies, mcgill university, montreal, canada. sam oh (samoh@skku.edu) is corresponding author and professor, department of library and information science, sungkyunkwan university, seoul, korea. background on standard file formats the term file format is generally defined as that which "specifies the organization of information at some level of abstraction, contained in one or more byte streams that can be exchanged between systems."3 according to interpares 2, file format is "the organization of data within files, usually designed to facilitate the storage, retrieval, processing, presentation, and/or transmission of the data by software."4 the premis data dictionary for preservation metadata observes that, technically, file format is "a specific, pre-established structure for the organization of a digital file or bitstream."5 in general, file format can be divided into two types: an access format and a preservation format.
an access format is “suitable for viewing a document or doing something with it so that users access the on-the-fly converted access formats.”6 in comparison, a preservation format is “suitable for storing a document in an electronic archive for a long period”7; it provides “the ability to capture the material into the archive and render and disseminate the information now and in the future.”8 while the ability to ensure long-term preservation focuses on the sustainability of preservation formats, an access format emphasizes that the document should be accessible and available to users, presumably at all times. many researchers have discussed file formats and long-term preservation in relation to various types of resources. for example, folk and barkstrom describe and adopt several attributes of file formats that may affect the long-term preservation of scientific and engineering data (e.g., the ease of archival storage, ease of archival access, usability, data scholarship enablement, support for data integrity, and maintainability and durability of file formats).9 barnes suggests converting word processing documents in digital repositories, which are unsuitable for long-term storage, into a preservation format.10 the evaluation by rauch, krottmaier, and tochtermann illustrates the practical use of file formats for 3d objects in terms of long-term reliability.11 others have developed and/or applied numerous criteria in different settings. for instance, sullivan uses a list of desirable properties of a long-term preservation format to explain the purpose of pdf/a from an archival and records management perspective.12 sullivan cites device independence, self-containment, self-describing, transparency, accessibility, disclosure, and adoption as such properties. rauch, krottmaier, and tochtermann’s study applies criteria that consist of technical characteristics (e.g., open specification, compatibility, and standardization) and market characteristics (e.g., guarantee duration, support duration, market penetration, and the number of independent producers). rog and van wijk propose a quantifiable assessment method to calculate composite scores of file formats.13 they identify seven main categories of criteria: openness, adoption, complexity, technical protection mechanism, self-documentation, robustness, and dependencies (a small, hypothetical illustration of such a composite calculation is sketched below). sahu focuses on the criteria developed by the uk’s national archives, which include open standards, ubiquity, stability, metadata support, feature set, interoperability, and viability.14 a more comprehensive evaluation by the lc reveals three components—technical factors, quality, and functionality—while placing a particular emphasis on the balance between the first two.15 hodge and anderson use seven criteria for sustainability, which are similar to the technical factors of the lc study: disclosure, adoption, transparency, self-documentation, external dependencies, impact of patents, and technical protection mechanisms.16 some institutions adopt another term, standard file formats, to differentiate accepted and recommended file formats from others. according to the david project, “standard file formats owe their status to (official) initiatives for standardizing or to their widespread use.”17 standard, however, may be too general a term to specify the elements of file formats.
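rog and van wijk’s composite-score idea can be illustrated with a small weighted-sum calculation. the sketch below is purely hypothetical: the weights and the per-format scores are invented for demonstration and are not taken from their published method; only the seven criterion names follow their categories.

# illustrative only: a weighted-sum "composite score" in the spirit of
# rog and van wijk's quantifiable assessment. the weights and the scores
# for the two candidate formats are invented for demonstration.
WEIGHTS = {
    "openness": 0.20,
    "adoption": 0.20,
    "complexity": 0.10,
    "technical_protection": 0.10,
    "self_documentation": 0.15,
    "robustness": 0.15,
    "dependencies": 0.10,
}

# hypothetical scores on a 0-1 scale for two candidate formats
candidates = {
    "PDF/A-1": {"openness": 0.8, "adoption": 0.9, "complexity": 0.5,
                "technical_protection": 0.7, "self_documentation": 0.8,
                "robustness": 0.7, "dependencies": 0.8},
    "MS Word (.doc)": {"openness": 0.2, "adoption": 0.9, "complexity": 0.4,
                       "technical_protection": 0.5, "self_documentation": 0.4,
                       "robustness": 0.5, "dependencies": 0.3},
}

def composite_score(scores, weights=WEIGHTS):
    # return the weighted sum of per-criterion scores
    return sum(weights[c] * scores[c] for c in weights)

for name, scores in candidates.items():
    print(f"{name}: {composite_score(scores):.2f}")

in practice, an institution would calibrate the weights to its own preservation strategy and keep them under review, along the lines of todd’s remark quoted in the conclusion below.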
nevertheless, there is a recognition that only those file formats accepted and recommended by national or international standards organizations (such as the international organization for standardization [iso], the international imaging industry association [i3a], the world wide web consortium [w3c], etc.) are genuine standard file formats. for example, iso has announced several standard file formats for images: tiff/it (iso 12639:2004), png (iso/iec 15948:2004), and jpeg 2000 (iso/iec 15444:2003, 2004, 2005, 2007, 2008). for document file formats, pdf/a-1 (iso standard 19005-1, document file format for long-term preservation) is one example. this format is proprietary; it is intended to meet archival and records-management requirements and to preserve the visual appearance and migration needs of electronic documents. office open xml file format (iso/iec 29500-1:2008, information technology—document description and processing languages) is another open standard that can be implemented in microsoft office applications on multiple platforms. odf (iso/iec 26300:2006, information technology—open document format for office applications [opendocument] v1.0) is an xml-based open file format. despite these iso-announced standards, some errors in these file formats have been reported. for example, although pdf/a-1 is intended for long-term preservation of and access to documents, studies reveal that the feature-rich nature of pdf can create difficulties in preserving pdf information over time.18 to overcome the barriers of pdf and pdf/a-1, xml technology has become prevalent for digital resources in archiving systems and digital preservation.19 the digital repository community is treating xml technology as a panacea and converting most of its digital resources to xml (a minimal, hypothetical sketch of such an xml wrapping appears below). the netherlands institute for scientific information service (nisis) adopts another noteworthy definition of standard file formats. it observes that standard image file formats “are widely accepted, have freely available specifications, are highly interoperable, incorporate no data compression and are capable of supporting preservation metadata.”20 this definition implies specific and advanced ramifications for cost-free interoperability and metadata, which closely relate to open access. open standard is another relevant term to consider for file formats. although perspectives vary greatly among researchers, open standards can generally be acquired and used without any barrier or cost.21 in other words, open standard products are free from restrictions, such as patents, and are independent of proprietary hardware or software. since the 1990s, open standard has been broadly adopted in many fields and is now an almost compulsory feature in information services.
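the xml conversion practice mentioned above can be illustrated with a minimal, hypothetical sketch. the element names below are invented for demonstration; real repositories would normally use established schemas such as mets, mods, or premis rather than an ad hoc structure like this one.

# illustrative only: wrapping a document's text and minimal descriptive
# metadata in a self-describing xml structure using python's standard library.
import xml.etree.ElementTree as ET

record = ET.Element("preservationRecord")
meta = ET.SubElement(record, "metadata")
ET.SubElement(meta, "title").text = "Annual Report 2011"
ET.SubElement(meta, "creator").text = "Example University Library"
ET.SubElement(meta, "dateCreated").text = "2011-12-31"
ET.SubElement(meta, "sourceFormat").text = "application/msword"

content = ET.SubElement(record, "content")
content.text = "Body text of the document, extracted from the source file."

# serialize to a human-readable, platform-independent text stream
print(ET.tostring(record, encoding="unicode"))

such a wrapper is self-describing in the sense discussed under the metadata criterion below: the content, its descriptive metadata, and its structure travel together in one human-readable, platform-independent file.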
to follow the national archives’ definition, open standard formats are “formats for which the technical specifications have been made available in the public domain.”22 in comparison, folk and barkstrom approach open standards from an institutional-support perspective, relying on user communities for standards that are widely available and used.23 on a more specific level, stanescu emphasizes independence as the basic selection criterion for file formats.24 others, such as todd, propose criteria for determining whether one standard is more open than another: adoption, platform independence, disclosure, transparency, and metadata support.25 other factors considered by todd include reusability and interoperability; robustness, complexity, and viability; stability; and intellectual property (ip) and rights management.26 echoing the lc, hodge and anderson also suggest a list of selection criteria grouped under the banner of “technical factors”: disclosure, adoption, transparency, self-documentation, external dependencies, impact of patents, and technical protection mechanisms.27 researchers agree that open standard file formats are less prone to obsolescence and more reliable than proprietary formats.28 close examination of the nisis definition mentioned above reveals that standard file formats are in reality not free, nor do they allow unrestricted access to resources. the three file formats that iso has announced (pdf/a, ooxml, and odf) are proprietary and sometimes costly. they also do not prohibit charging for access to the proprietary specification, even though a standard is assumed to be free from legal and financial restrictions. the iso-announced file formats, in short, are only standard file formats, not open-standard file formats. for cultural heritage institutions, questions regarding appropriate selection criteria and the sufficiency of existing international standard file formats for long-term preservation and access remain unanswered. there exists neither a uniform method to compare the specifications of different file formats nor an objective approach to assess format specifications that would ensure long-term preservation and persistent access. objectives of the study in this study, we attempt to better define and establish open-standard file-format-selection criteria. to that end, we assess and compile the most frequently used attributes of file formats reported in the literature. method we performed a comprehensive review of published articles, institutional reports, and other literature to identify the current knowledge regarding file-format-selection criteria. we included literature that deals with the three standard file formats (pdf, pdf/a, and xml) but excluded the recently announced odf format due to the scarcity of literature on odf. of the more than thirty articles initially reviewed, only the twenty-five that use their own clear attributes were included in this study. all of the attributes that we employed are listed by frequency and grouped according to similarities in meaning (see appendix). the original definitions or descriptions that we used are listed in the second column. the file formats that we assessed by their attributes are listed in the third column.
when a study gives attributes without specific definitions or descriptions, “no definite term” is inserted. findings as illustrated in the appendix, the criteria identified by the studies vary. although the requirements and context of the studies may differ, the most common criteria can be divided into five categories: functionality, metadata, openness, interoperability, and independence. first, functionality refers to the ability of a format to do exactly what it is supposed to do.29 it is important to distinguish between two broad uses: preservation of document structure and formatting, and preservation of usable content. to preserve document formatting, a “published view” of a given piece of content is critical for distribution. other content, such as database information or device-specific documents, needs to be preserved as well. functionality criteria include various attributes related to formats and structure or to the physical and technical specifications of files (e.g., robustness, feature set, viability, color maintenance, clarity, compactness, modularity, compression algorithms, etc.). second, metadata indicates that a format allows rich descriptive and technical metadata to be embedded in files. metadata can be expressed as metadata support, self-documentation (self-documenting), documentation, content-level (as opposed to presentation-level) description, self-describing, self-describing files, formal description of format, etc. third, openness refers to specifications of a file format that are publicly available and accessible and to formats that are not proprietary. whether seen as a single definition or as a set of criteria, the characteristic that appears to be at the core of the open standard movement is its independence from outside proprietary or commercial control. openness also may refer to the autonomy of a file format, which relies on several factors. first, the document should be self-contained in terms of the content information (e.g., the text), the structural information (i.e., for those documents that are structured), the formatting information (e.g., fonts, colors, styles, etc.), and the metadata information. self-containment does not necessarily mean that an archivist will only have one document to deal with. it does mean, however, that the documents will provide all the information needed to access and process the content, structure, formatting, and metadata. openness is expressed as open availability by some researchers.30 other researchers adopt the term disclosure to express that a specification is publicly available.31 fourth is the independence of a document from proprietary or commercial hardware and software configurations, especially to prevent any issues resulting from different versions of software, hardware, and operating systems. this aspect is expressed in the appendix as open standards, open-source software or equivalent, standard/proprietary, etc. it also closely relates to independence, one of the five categories in the appendix, expressed as device independencies, independent implementations, no external dependency, no external dependencies, portability, and monitoring obsolescence. having documents in a proprietary format controlled by a third party implies that, at one time or another, this format may no longer be supported, or that a change in the user agreement may lead to restricted access, access to outdated material, or patent and copyright issues.
this fact means that the document must be freely accessible, without password restrictions or protection, and without any digital rights management scheme. blocking access to a document with a password can lead to serious problems if the password gets lost. in addition, the size and compactness of the document will influence the selection of a file format. fifth, interoperability primarily refers to the ability of a file format to be compatible with other formats and to exchange documents without loss of information.32 specifically, it refers to the ability of software to open a document without requiring any special application, plug-in, codec, or proprietary add-on. adherence to open standards is usually a good indication of the interoperability of a format. in general, an open standard is released after years of bargaining and agreement among the major players. supervision by an international standards body (such as iso or the w3c) commonly helps propagate the format. in addition to the five categories mentioned above, other attributes are often used. presentation, authenticity, adoption, protection, preservation, and reference are such examples. among these, authenticity, although listed seventh in the appendix, is one of the most important attributes in archives and records management. it refers to the ability to guarantee that a file is what it originally was, without any corruption or alteration.33 specific to authenticity is data integrity, which assesses the integrity of the file through an internal mechanism (e.g., png files include byte sequences to validate against errors). another method of validating the authenticity of a document is to look at its traceability,34 that is, the traces left by the original author and by those who modified or opened a file. one example is the difference between the creation date, modification date, and access date of any file on a personal computer. these three dates correspond to moments when someone (often a different person each time) handled the file. other mechanisms may require log information, which is external to the file (a minimal illustration of integrity and traceability checks is sketched below). another good indication of authenticity is the stability of a format.35 a format that is widely used is more likely to be stable. a stable format is also likely to cause less data loss and corruption; hence it is a better indicator of authenticity. presentation includes attributes related to presenting and rendering data, expressed as distributing a page image, normal rendering, self-containment, self-contained, and beyond normal rendering. adoption indicates how popular a file format is and how widely it has been adopted by user communities; it is also represented as popularity, widely used formats, ubiquity, or continuity. protection includes technical protection mechanisms or source verification intended to secure files. preservation means long-term preservation, institutional support, or ease of transformation and preservation. reference indicates citability or referential extensibility. among the other attributes, transparency is interesting to note because it indicates the degree to which files are open to direct analysis with basic tools and to human readability.
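the integrity and traceability signals just described can be illustrated with a small, hypothetical sketch: a cryptographic checksum detects corruption or alteration, and file-system timestamps give a rough trace of later handling. none of the reviewed studies prescribes this particular script, and real repositories record fixity values in preservation metadata rather than in ad hoc code like this; the file name used here is hypothetical.

# illustrative only: checksum-based integrity and timestamp-based traceability
import hashlib
import os
import time

def fixity_report(path):
    # compute a sha-256 checksum by reading the file in chunks
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    stat = os.stat(path)
    return {
        "sha256": digest.hexdigest(),
        "modified": time.ctime(stat.st_mtime),   # last modification
        "accessed": time.ctime(stat.st_atime),   # last access
    }

# comparing a stored checksum with a freshly computed one reveals corruption
# or alteration; unexpected timestamp changes hint at later handling.
# report = fixity_report("master_copy.tif")   # hypothetical file name
# print(report)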
another important aspect across these criteria is that the terminologies used in the studies may be quite different yet describe the same or similar concepts from different angles. for instance, rog and van wijk use openness to mean standardization and specification without restrictions,36 while several other researchers use open availability to convey the same idea.37 still others adopt the term disclosure to express that a specification is publicly available.38 discussion and conclusion functionality, metadata, openness, interoperability, and independence appear to be the most important factors when selecting file formats. when file formats for long-term preservation and open access are under discussion, cultural heritage institutions need to consider many issues. despite several efforts, it is still tricky for them to identify the most appropriate file format or even to discern acceptable formats from unacceptable ones. because the creation of new file formats can hardly be prevented, format selection is not an easy task, in either theory or practice. it is critical, however, to base the decision on a clear understanding of the purpose for which the document is preserved: access preservation or repurposing preservation. cultural heritage institutions and digital repository communities need to guarantee long-term preservation of digital resources in the selected file formats. additionally, users find it necessary to have access to digital information in these file formats. a further consideration involves the level of access users may enjoy (e.g., long-term access, permanent access, open access, persistent access, etc.). when international standard file formats are determined, open access should be taken into account, given the broad interest in it. it is necessary to develop a scale or measurement to assess open-standard format specifications to ensure long-term preservation and open access. identifying which attributes are required of an open-standard file format, and which digital format is most apt for the use and sustainability of long-term preservation, is a meaningful task. the outcome of our study provides a framework for appropriate strategies when selecting file formats for long-term preservation of and access to digital content. we hope that the criteria described in this study will benefit librarians, preservers, record creators, record managers, archivists, and users. we are reminded of todd’s remark that “the most important action is to align the recognition and weighting of criteria with a clear preservation strategy and keep them under review using risk management techniques.”39 the question of how to adopt and implement these attributes can only be answered in the local context and decisions of each cultural heritage institution.40 each institution should consider implementing a file format throughout the entire life cycle of its digital resources, with a holistic approach to the managerial, technical, procedural, archival, and financial issues involved in long-term preservation and persistent access. the criteria may change over time, as is necessary for any format to adequately serve its purpose. maintaining the quality of the selected formats may be an ongoing task that cultural heritage institutions should take into account at all times. even more importantly, cultural heritage institutions need to establish and implement a set of standard guidelines, specific to each context, for the selection of open-standard file formats. note: this research was supported by the sungkyunkwan university research fund (2010-2011).
references and notes 1. library of congress, “sustainability of digital formats: planning for library of congress collections,” www.digitalpreservation.gov/formats/intro/intro.shtml (accessed november 21, 2011). 2. global digital format registry, www.gdfr.info (accessed november 17, 2011); the technical registry pronom, www.nationalarchives.gov.uk/aboutapps/pronom (accessed november 21, 2011). 3. mike folk and bruce r. barkstrom, “attributes of file formats for long-term preservation of scientific and engineering data in digital libraries” (paper presented at the joint conference on digital libraries (jcdl), houston, tx, may 27–31, 2003), 1, www.larryblakeley.com/articles/storage_archives_preservation/mike_folk_bruce_barkstrom200305.pdf (accessed november 21, 2011). 4. interpares 2 project glossary, p. 24, www.interpares.org/ip2/ip2_term_pdf.cfm?pdf=glossary (accessed november 21, 2011). 5. premis editorial committee, premis data dictionary for preservation metadata, ver. 2.0, march 2008, p. 195, www.loc.gov/standards/premis/v2/premis-2-0.pdf (accessed november 21, 2011). 6. ian barnes, “preservation of word processing documents,” july 14, 2006, p. 4, http://apsr.anu.edu.au/publications/word_processing_preservation.pdf (accessed november 21, 2011). 7. ibid. 8. gail hodge and nikkia anderson, “formats for digital preservation: a review of alternatives and issues,” information services & use 27 (2007): 46. 9. folk and barkstrom, “attributes of file formats.” 10. barnes, “preservation of word processing documents.” 11. carl rauch, harald krottmaier, and klaus tochtermann, “file-formats for preservation: evaluating the long-term stability of file-formats,” in proceedings of the 11th international conference on electronic publishing 2007 (vienna, austria, june 13–15, 2007): 101–6. 12. susan j. sullivan, “an archival/records management perspective on pdf/a,” records management journal 16, no. 1 (2006): 51–56. 13. judith rog and caroline van wijk, “evaluating file formats for long-term preservation,” 2008, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011). 14. d. k. sahu, “long term preservation: which file format to use” (paper presented in workshops on open access & institutional repository, chennai, india, may 2–8, 2004), http://openmed.nic.in/1363/01/long_term_preservation.pdf (accessed november 21, 2011). 15. cendi digital preservation task group, “formats for digital preservation: a review of alternatives and issues,” www.cendi.gov/publications/cendi_presformats_whitepaper_03092007.pdf (accessed november 21, 2011). 16.
hodge and anderson, “formats for digital preservation.” 17. david 4 project (digital archiving, guideline and advice 4), “standards for file formats,” 1, www.expertisecentrumdavid.be/davidproject/teksten/guideline4.pdf (accessed november 21, 2011). 18. sullivan, “an archival/records management perspective on pdf/a”; john michael potter, “formats conversion technologies set to benefit institutional repositories,” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.7881&rep=rep1&type=pdf (accessed november 21, 2011). 19. eva müller et al., “using xml for long-term preservation: experiences from the diva project,” in proceedings of the 6th international symposium on electronic theses and dissertations (may 20–24, 2003): 109–16, https://edoc.hu-berlin.de/conferences/etd2003/hansson-peter/html/index.html (accessed november 21, 2011). 20. rene van horik, “image formats: practical experiences” (paper presented in erpanet training, vienna, austria, may 10–11, 2004), 22, www.erpanet.org/events/2004/vienna/presentations/erpatrainingvienna_horik.pdf (accessed november 21, 2011). 21. open standard is related to open access, which comes from the open access movement that allows resources to be freely available to the public and permits any user to use those resources (e.g., mainly electronic journals, repositories, databases, software applications, etc.) without financial, legal, or technical barriers. see amy e. c. koehler, “some thoughts on the meaning of open access for university library technical services,” serials review 32, no. 1 (march 2006): 17–21; budapest open access initiative, “read the budapest open access initiative,” www.soros.org/openaccess/read.shtml (accessed november 21, 2011). 22. national archives, “selecting file formats for long-term preservation,” 6, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011). 23. folk and barkstrom, “attributes of file formats.” 24. andreas stanescu, “assessing the durability of formats in a digital preservation environment: the inform methodology,” d-lib magazine 10, no. 11 (november 2004), www.dlib.org/dlib/november04/stanescu/11stanescu.html (accessed november 21, 2011). 25. malcolm todd, “technology watch report: file formats for preservation,” www.dpconline.org/advice/technology-watch-reports (accessed november 21, 2011). 26. ibid. 27. hodge and anderson, “formats for digital preservation.” 28. edward m.
corrado, “the importance of open access, open source, and open standards for libraries,” issues in science & technology librarianship (spring 2005), www.library.ucsb.edu/istl/05-spring/article2.html (accessed november 21, 2011); carl vilbrandt et al., “cultural heritage preservation using constructive shape modeling,” computer graphics forum 23, no. 1 (2004): 25–41; marshall breeding, “preserving digital information,” information today 19, no. 5 (2002): 48–49. 29. eun g. park, “xml: examining the criteria to be open standard file format” (paper presented at the interpares 3 international symposium, oslo, norway, september 17, 2010), www.interpares.org/display_file.cfm?doc=ip3_isym04_presentation_3–3_korea.pdf (accessed november 21, 2011). 30. adrian brown, “digital preservation guidance note: selecting file formats for long-term preservation,” www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed november 21, 2011); barnes, “preservation of word processing documents”; sahu, “long term preservation”; potter, “formats conversion technologies.” 31. stephen abrams et al., “pdf-a: the development of a digital preservation standard” (paper presented at the 69th annual meeting for the society of american archivists, new orleans, louisiana, august 14–21, 2005), www.aiim.org/documents/standards/pdf-a.ppt (accessed november 21, 2011); sullivan, “an archival/records management perspective on pdf/a”; cendi, “formats for digital preservation”; and hodge and anderson, “formats for digital preservation.” 32. the national archives, http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011); ecma international, “office open xml file formats—ecma-376,” www.ecma-international.org/publications/standards/ecma-376.htm (accessed november 21, 2011). 33. christoph becker et al., “systematic characterisation of objects in digital preservation: the extensible characterisation languages,” www.jucs.org/jucs_14_18/systematic_characterisation_of_objects/jucs_14_18_2936_2952_becker.pdf (accessed november 21, 2011); national archives, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011). 34. folk and barkstrom, “attributes of file formats.” 35. national archives, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_method_27022008.pdf (accessed november 21, 2011); rog and van wijk, “evaluating file formats for long-term preservation.” 36. rog and van wijk, “evaluating file formats for long-term preservation.” 37.
see brown, “digital preservation guidance note: selecting file formats for long-term preservation,” www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed november 21, 2011); barnes, “preservation of word processing documents”; sahu, “long term preservation”; potter, “formats conversion technologies.” 38. stephen abrams et al., “pdf-a: the development of a digital preservation standard” (paper presented at the 69th annual meeting for the society of american archivists, new orleans, louisiana, august 14–21, 2005), www.aiim.org/documents/standards/pdf-a.ppt (accessed november 21, 2011); sullivan, “an archival/records management perspective on pdf/a”; cendi, “formats for digital preservation”; and hodge and anderson, “formats for digital preservation.” 39. todd, “technology watch report,” 33. 40. evelyn peters mclellan, “selecting digital file formats for long-term preservation: interpares 2 project general study 11 final report,” www.interpares.org/display_file.cfm?doc=ip2_file_formats(complete).pdf (accessed november 21, 2011). appendix: file format attributes no. attribute definition/description assessed file format 1. functionality robustness robust against single point of failure, support for file corruption detection, file format stability, backward compatibility and forward compatibility (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (limited) microsoft word (limited) a robust format contains several layers of defense against corruption (frey, 2000). n/a feature set formats supporting the full range of features and functionality (brown, 2003) n/a not defined (sahu, 2006) n/a viability error-detection facilities to allow detection of file corruption (brown, 2003). png format (yes) not defined (sahu, 2006) n/a support for graphic effects and typography not defined (cendi, 2007; hodge & anderson, 2007) tiff_g4 (no) color maintenance not defined (cendi, 2007; hodge & anderson, 2007) tiff_g4 (limited) clarity support for high image resolution (cendi, 2007; hodge & anderson, 2007) tiff_g4 (yes) quality this pertains to how well the format fulfills its task today: (1) low space costs, (2) highly encompassing, (3) robust, (4) simplicity, (5) highly tested, (6) loss-free, (7) supports metadata (clausen, 2004).
n/a compactness to minimize storage and i/o costs (folk & barkstrom, 2003) n/a simplicity ease of implementing readers (folk & barkstrom, 2003) n/a file corruption detection to be able to detect that a file has been corrupted; to provide errorcorrection (folk & barkstrom, 2003) n/a raw i/o efficiency formats that are organized for fast sequential access (folk & barkstrom, 2003) n/a availability of readers to maintain ease of data access for readers (folk & barkstrom, 2003) n/a ease of subsetting to process only part of data files (folk & barkstrom, 2003) n/a size to transfer data in large blocks (folk & barkstrom, 2003) n/a ability to aggregate many objects in a single file to maintain as small as archive “name space” as possible (folk & barkstrom, 2003) n/a ability to embed data extraction software in the files the files come with read software embedded (folk & barkstrom, 2003). n/a ability to name file elements to work with data based on manipulating the element names instead of binary offsets, or other references (folk & barkstrom, 2003) n/a rigorous definition to be defined in a sufficient rigorous way (folk & barkstrom, 2003) n/a multilanguage implementation of library software to have multiple implementations of readers for a single format (folk & barkstrom, 2003) n/a memory some formats emphasize the presence or absence of memory (frey, 2000). tiff (yes) examining attributes of open standard file formats for long-term preservation and open access | park and oh 56 accuracy in some cases, the accuracy of the data can be decreased to save memory, e.g., through compression. in the case of a digital master, however, accuracy is very important (frey, 2000). n/a speed the ability to access or display a data set at a certain speed is critical to certain applications (frey, 2000). n/a extendibility a data format can be modified to allow for new types of data and features in the future (frey, 2000). n/a modularity a modular data set definition is designed to allow some of its functionality to be upgraded or enhanced without having to propagate changes through all parts of the data set (frey, 2000). n/a plugability related to modularity, this permits the user of an implementation of a data set reader or writer to replace a module with private code (frey, 2000). n/a interpretability not binary formats (barnes, 2006) rtf (yes) ms word (no) xml (yes) the standard should be written in characters that people can read (lesk, 1995). n/a complexity human readability, compression, variety of features (rog & van wijk, 2008; wijk & rog, 2007). n/a simple raster formats are preferred (puglia et al., 2004). n/a compression algorithms the format uses standard algorithms (puglia et al., 2004). n/a accessibility to prohibit encryption in the file trailer (sullivan, 2006) pdf/a (yes) component reuse not defined (sahu, 2006) pdf (no) html (limited) sgml (excellent) xml (excellent) repurposing not defined (sahu, 1999) pdf (limited) html (limited) sgml (excellent) xml (excellent) packaging formats in general, packaging formats should be acceptable as transfer mechanisms for image file formats (puglia et al., 2004). zip (yes) significant properties the format accommodates high-bit, high-resolution (detail), color accuracy, and multiple compression options (puglia et al., 2004). n/a processability the requirement to maintain a processable version of the record to have any reuse value (brown, 2003) conversion of a word-processed document into pdf format. 
(no) searching not defined (sahu, 2006) pdf (limited) html (good) sgml (excellent) xml (excellent) no definite term to support the automatic validation of document conversions and the evaluation of conversion quality by hierarchically decomposing documents from different sources and representing them in an abstract xml language (becker et al., 2008a; becker et al., 2008b) n/a xcl (yes) to make transferring data easy (johnson, 1999) n/a xml (yes) a format that is easy to restore and understand by both humans and machines (müller et al., 2003) n/a xml (yes) information technology and libraries | december 2012 57 inability to be backed out into a usable format (potter, 2006) pdfs (no) 2. m e t a d a t a self-documentation self-documenting digital objects that contain basic descriptive, technical, and other administrative metadata (cendi, 2007; hodge & anderson, 2007) pdf (yes) pdf/a (yes) tiff_g4 (yes) xml (yes) metadata and technical description of format embedded (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (limited) microsoft word (limited) the ability of a digital format to hold (in a transparent form) metadata beyond that needed for basic rendering of the content (arms & fleischhauer, 2006) n/a self-documenting to contain its own description (abrams et al., 2005) n/a documentation deep technical documentation publicly and fully is available. it is maintained for older versions of the format (puglia et al., 2004). n/a metadata support file formats making provision for the inclusion of metadata (brown, 2003) tiff (yes) microsoft word 2000 (yes) not defined (kenney, 2001) fiff 6.0 (yes) gif 89a (yes) jpeg (yes) flashpix 1.0.2 (yes) imagepac, photo cd (no) png 1.2 (yes) pdf (yes) not defined (sahu, 2006) n/a metadata the format allows for self-documentation (puglia et al., 2004). n/a content-level description not presentation-level description; structural markup, not formatting (barnes, 2006) pdf (no) docbook (yes) tei (yes) xhtml (yes) xml (yes) content-level, not presentation-level, descriptions where possible, the labeling of items should reflect their meaning, not their appearance (lesk, 1995). sgml (yes) self-describing many different types of metadata are required to decipher the contents of a file (folk & barkstrom, 2003). n/a self-describing files embed metadata in pdf files (sullivan, 2006) pdf/a (adobe extensible metadata platform required) formal (bnfor xml-like) description of format to create new readers solely on the basis of formal descriptions of the file content (folk & barkstrom, 2003) n/a no definite term its self-describing tags identify what your content is all about (johnson, 1999). n/a xml (yes) a format for strong descriptive and administrative metadata and the complete content of the document (müller et al., 2003) n/a xml (yes) examining attributes of open standard file formats for long-term preservation and open access | park and oh 58 3. o p e n n e s s disclosure authoritative specification publicly available (abrams et al., 2005) pdf/a (yes) microsoft word (no) the degree to which complete specifications and tools for validating technical integrity exist and are accessible to those creating and sustaining digital content (cendi, 2007; hodge & anderson, 2007; arms & fleischhauer, 2006) pdf (yes) pdf/a (yes) tiff_g4 (yes) xml (yes) authoritative specification is publicly available (sullivan, 2006). 
pdf/a (yes) open availability no proprietary formats (barnes, 2006) odf (yes) gif (no) pdf (no) rtf (no) microsoft word (no) any manufacturer or researcher should have the ability to use the standard, rather than having it under the control of only one company (lesk, 1995). kodak photocd (no) gif (no) openness standardization, restrictions on the interpretation of the file format, reader with freely available source (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (yes) ms word (no) a standard is designed to be implemented by multiple providers and guide 5: file formats for digital masters employed by a large number of users (frey, 2000). n/a formats that are described by publicly available specifications or open-source source code can, with some effort, be reconstructed later: (1) open publicly available specification, (2) specification in public domain, (3) viewer with freely available source, (4) viewer with gpl’ed source, (5) not encrypted (clausen, 2004). n/a open-source software or equivalent to move toward obtaining open-source arrangements for all parts of the file format and associated libraries (folk & barkstrom, 2003) n/a open standard formats for which the technical specification has been made available in the public domain (brown, 2003) jpeg (yes) pdf (limited) ascii (limited) not defined (sahu, 2006) n/a standard/ proprietary not defined (kenney, 2001) fiff 6.0 (yes) gif 89a (yes) jpeg (yes) flashpix 1.0.2 (yes) imagepac, photo cd (no) png 1.2 (yes) pdf (yes) nonproprietary formats the specification is independent of a particular vendor (public records office of victoria, 2004). n/a no definite term to avoid vendor-lock (potter, 2006) odf (yes) information technology and libraries | december 2012 59 4. i n t e r o p e r a b i l i t y interoperability is the format supported by many software applications/os platforms or is it linked closely with a specific application (puglia et al., 2004)? n/a the ability to exchange electronic records with other users and it systems (brown, 2003) n/a not defined (sahu, 2006) n/a data interchange not defined (sahu, 2006) pdf (no) html (limited) sgml (excellent) xml (excellent) compatibility compatibility with prior versions of data set definitions often is needed for access and migration considerations (frey, 2000). n/a stability compatibility between versions (folk & barkstrom, 2003) n/a stable, not subject to constant or major changes over time (brown, 2003) n/a the format is supported by current applications and backward compatible, and there are frequent updates to the format or the specification (puglia et al., 2004). n/a not defined (sahu, 2006). n/a scalability the design should be applicable both to small and large data sets and to small and large hardware systems (frey, 2000). n/a markup compatibility and extensibility to support a much broader range of applications (ecma, 2008) n/a xml (yes) suitability for a variety of storage technologies the format should not be geared toward any particular technology (folk & barkstrom, 2003). n/a no definite term to allow data to be shared across information systems and remain impervious to many proprietary software revisions (potter, 2006) openoffice (yes) 5. i n d e p e n d e n c e device independencies can be reliably and consistently rendered without regard to the hardware/software platform (abrams et al., 2005) pdf/a (yes) tiff (no) static visual appearance can be reliably and consistently rendered and printed without regard to the hardware or software platform used (sullivan, 2006). 
pdf/a (yes) pdf/x (yes) this is a very important aspect for master files because they will be most likely used on various systems (frey, 2000). n/a independent implementations independent implementations help ensure that vendors accurately implement the specification (public records office of victoria, 2004). n/a externaldependency degree to which the format is dependent on specific hardware, operating system, or software for rendering or use and the complexity of dealing with those dependencies in future technical environments (arms & fleischhauer, 2006) n/a external dependencies the degree to which a particular format depends on particular hardware, operating system, or software for rendering or use and the predicted complexity of dealing with those dependencies in future technical environments (cendi, 2007; hodge & anderson, 2007) pdf (limited) pdf/a (no) tiff_g4 (no) xml (no) examining attributes of open standard file formats for long-term preservation and open access | park and oh 60 portability a format that makes extensive use of specific hardware or operating system features is likely to be unusable when that hardware or operating system falls into disuse. a format that is defined in an independent way will be much easier to use in the future: (1) independent of hardware; (2) independent of operating system; (3) independent of other software; (4) independent of particular institutions, groups, or events; (5) widespread current use; (6) little built-in functionality; and (7) single version or well-defined versions (clausen, 2004). n/a monitoring obsolescence information gathered through regular web harvesting can give us some information about what file types are approaching obsolescence, at least for the more frequently used types (clausen, 2004). n/a no definite term a human-readable text format and internationalized character sets are supported (müller et al., 2003). n/a xml (yes) not dependent on specific hardware, not dependent on specific operating systems, not dependent on one specific reader, not dependent on other external resources (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (limited) microsoft word (little) the format requires a plug-in for viewing if appropriate software is not available or relies on external programs to function (puglia et al., 2004). n/a 6. p r e s e n t a t i o n distributing page image not defined (sahu, 2006) pdf (excellent) html (good) sgml (good) xml (good) normal rendering not defined (cendi, 2007; hodge & anderson, 2007). pdf (yes) pdf/a (limited) tiff_g4 (yes) xml (yes) presentation preservation of its original look and feel (brown, 2003) n/a self-containment everything that is necessary to render or print a pdf/a file must be contained within the file (sullivan, 2006). pdf/a (yes) self-contained to contain all resources necessary for rendering (abrams et al., 2005) n/a beyond normal rendering not defined (cendi, 2007; hodge & anderson, 2007). pdf (yes) pdf/a (yes) tiff_g4 (yes) xml (limited) 7. a u t h e n t i c i t y authenticity the format must preserve the content (data and structure) of the record and any inherent contextual, provenance, referencing and fixity information (brown, 2003). 
n/a provenance traceability ability to trace the entire configuration of data production (folk & barkstrom, 2003) n/a integrity of layout not defined (cendi, 2007; hodge & anderson, 2007) pdf (yes) pdf/a (yes) tiff_g4 (n/a) xml (yes) integrity of rendering of equations not defined (cendi, 2007; hodge & anderson, 2007) pdf (yes) pdf/a (yes) tiff_g4 (n/a) xml (limited) integrity of structure not defined (cendi, 2007; hodge & anderson, 2007) pdf (limited) pdf/a (limited) tiff_g4 (n/a) information technology and libraries | december 2012 61 xml (yes) 8. a d o p t i o n adoption degree to which the format is already used by the primary creators, disseminators, or users of information resources (cendi, 2007; hodge & anderson, 2007) pdf (yes) pdf/a (yes) tiff_g4 (yes) xml (yes) worldwide usage, usage in the cultural heritage sector as archival format (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (yes) microsoft word (limited) the degree to which the format is already used by the primary creators, disseminators, or users of information resources (arms & fleischhauer, 2006) n/a widespread use may be the best deterrent against preservation risk (abrams et al., 2005). tiff (yes) the format is widely used by the imaging community in cultural institutions (puglia et al., 2004). n/a flexibility of implementation to promote its wide adoption (sullivan, 2006) pdf/a (yes) popularity a format that is widely used (folk & barkstrom, 2003) n/a widely used formats it is far more likely that software will continue to be available to render the format (public records office of victoria, 2004). n/a ubiquity popular formats supported by as much software as possible (brown, 2003) n/a not defined (sahu, 2006) n/a continuity the file format is mature (puglia et al., 2004) n/a 9. p r o t e c t i o n technical protection mechanism password protection, copy protection, digital signature, printing protection and content extraction protection (rog & van wijk, 2008; wijk & rog, 2007) pdf/a-1 (limited) microsoft word (limited) implementation of a mechanism such as encryption that prevents the preservation of content by a trusted repository (cendi, 2007; hodge & anderson, 2007) pdf (yes) pdf/a (no) tiff_g4 (no) xml (no) it must be able to replicate the content on new media, migrate and normalize it in the face of changing technology, and disseminate it to users at a resolution consistent with network bandwidth constraints (arms & fleischhauer, 2006). n/a no encryption, passwords, etc. (abrams et al. (2005) n/a protection the format accommodates error detection, correction mechanisms, and encryption options (puglia et al., 2004). n/a source verification cryptographic encoding of files or digital watermarks without overburdening the data centers or archives (folk & barkstrom, 2003) n/a examining attributes of open standard file formats for long-term preservation and open access | park and oh 62 10. p r e s e r v a t i o n preservation the format contains embedded objects (e.g., fonts, raster images) or links to external objects (puglia et al., 2004). n/a long-term institutional support to ensure the long-term maintenance and support of a data format by placing responsibility for these operations on institutions (folk & barkstrom, 2003) n/a ease of transformation/ preservation the format will be supported for fully functional preservation in a repository setting, or the format guarantee can currently only be made at the bitstream (content data) level (puglia et al., 2004). 
n/a no definite term to create files with either a very high or very low preservation value (becker et al., 2008a, becker et al., 2008b) pdf (no) tiff (no) 11. r e f e r e n c e citability a machine-independent ability to reference or “cite” the individual data element in a stable way (folk & barkstrom, 2003) n/a referential extensibility ability to build annotations about new interpretations of the data (folk & barkstrom, 2003) n/a no definite term an open and established notation (müller et al., 2003) n/a xml (yes) data is easily repurposed via tags or translated to any medium (johnson, 1999) n/a xml (yes) creating, using, and reusing tags is easy, making it highly extensible (johnson, 1999). n/a xml (yes) 12. o t h e r s transparency degree to which the digital representation is open to direct analysis with basic tools, such as human readability using a text-only editor (cendi, 2007, hodge & anderson, 2007). pdf (limited) pdf/a (limited) tiff_g4 (limited) xml (yes) in natural reading order (sullivan, 2006). pdf/a (yes) microsoft notepad (yes) the degree to which the format is already used by the primary creators, disseminators, or users of information resources (arms & fleischhauer, 2006) n/a amenable to direct analysis with basic tools (abrams et al., 2005) n/a ample comment space to allow rich metadata (barnes, 2006) n/a items should be labeled, as far as possible, with enough information to serve for searching or cataloging (lesk, 1995). tiff (yes) a digital format may inhibit the ability of archival institutions to sustain content in that format (arms & fleischhauer, 2006). n/a information technology and libraries | december 2012 63 table bibliography abrams, stephen et al. 2005. “pdf-a: the development of a digital preservation standard.” paper presented at the 69th annual meeting for the society of american archivists, new orleans, louisiana, august 14–21, http://www.aiim.org/documents/standards/pdf-a.ppt (accessed november 21, 2011). arms, caroline r. and carl fleischhauer. 2006. “sustainability of digital formats: planning for library of congress collections.” http://www.digitalpreservation.gov/formats/sustain/sustain.shtml (accessed november 21, 2011). barnes, ian. 2006. “preservation of word processing documents.” http://apsr.anu.edu.au/publications/word_processing_preservation.pdf (accessed november 21, 2011). becker, christoph et al. 2008. “a generic xml language for characterising objects to support digital preservation.” in proceedings of the 2008 acm symposium on applied computing, fortaleza, ceara, brazil, march 16–20. becker, christoph et al. 2008. “systematic characterization of objects in digital preservation: the extensible characterization language.” journal of universal computer science 14, no 18: 2936– 2952. brown, adams. 2003. “the national archives. digital preservation guidance note: selecting file formats for long-term preservation.” http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed november 21, 2011). cendi digital preservation task group. 2007. “formats for digital preservation: a review of alternatives and issues.” http://www.cendi.gov/publications/cendi_presformats_whitepaper_03092007.pdf (accessed november 21, 2011). clausen, lars r. 2004. “handling file formats.” http://netarchive.dk/publikationer/fileformats2004.pdf (accessed november 21, 2011). ecma. 2008. “office open xml file formats—part 1.” 2nd ed. http://www.ecmainternational.org/publications/standards/ecma-376.htm (accessed november 21, 2011). 
folk, mike, and bruce barkstrom. 2003. “attributes of file formats for long-term preservation of scientific and engineering data in digital libraries.” paper presented at the joint conference on digital libraries, houston, tx, may 27–31. http://www.hdfgroup.org/projects/nara/sci_formats_and_archiving.pdf (accessed november 21, 2011). http://www.digitalpreservation.gov/formats/sustain/sustain.shtml http://apsr.anu.edu.au/publications/word_processing_preservation.pdf http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf http://www.cendi.gov/publications/cendi_presformats_whitepaper_03092007.pdf http://netarchive.dk/publikationer/fileformats-2004.pdf http://netarchive.dk/publikationer/fileformats-2004.pdf http://www.ecma-international.org/publications/standards/ecma-376.htm http://www.ecma-international.org/publications/standards/ecma-376.htm http://www.hdfgroup.org/projects/nara/sci_formats_and_archiving.pdf examining attributes of open standard file formats for long-term preservation and open access | park and oh 64 frey, franziska. 2000. “5. file formats for digital masters.” in guides to quality in visual resource imaging, research libraries group and digital library federation. http://imagendigital.esteticas.unam.mx/pdf/guides.pdf (accessed november 21, 2011). hodge, gail and nikkia anderson. 2007. “formats for digital preservation: a review of alternatives and issues.” information services & use 27: 45–63. johnson, amy helen. 1999. “xml xtends its reach: xml finds favor in many it shops, but it’s still not right for everyone.” computerworld 33, no. 42: 76–81. lesk, michael e. 1995. “preserving digital objects: recurrent needs and challenges.” in proceedings of the 2nd npo conference on multimedia preservation. brisbane, australia. http://www.lesk.com/mlesk/auspres/aus.html (accessed november 21, 2011). müller, eva et al. 2003. “using xml for long-term preservation: experiences from the diva project.” in proceedings of the sixth international symposium on electronic theses and dissertations. berlin, may: 109–116, https://edoc.hu-berlin.de/conferences/etd2003/hanssonpeter/pdf/index.pdf (accessed december 8, 2012). potter, john michael. 2006. “formats conversion technologies set to benefit institutional repositories.” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.7881\u0026rep=rep1\u0026typ e=pdf (accessed november 21, 2011). public records office of victoria (australia). 2006. “advice on vers long-term preservation formats pros 99/007 (version2) specification 4.” department for victorian communities. http://prov.vic.gov.au/wp-content/uploads/2012/01/vers_advice13.pdf (accessed november 21, 2011). puglia, steven, jeffrey reed, and erin rhodes. 2004. “technical guidelines for digitizing archival materials for electronic access: creation of production master files—raster images.” us national archives and records administration. http://www.archives.gov/preservation/technical/guidelines.pdf (accessed november 21, 2011). rog, judith, and caroline van wijk. 2008. “evaluating file formats for long-term preservation.” national library of the netherlands. http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/kb_file_format_evaluation_metho d_27022008.pdf (accessed november 21, 2011). sahu, d.k. 2004. “long term preservation: which file format to use.” presentation at workshops on open access & institutional repository, chennai, india, may 2–8, http://openmed.nic.in/1363/01/long_term_preservation.pdf (accessed november 21, 2011). sullivan, susan j. 2006. 
“an archival/records management perspective on pdf/a.” records management journal 16, no. 1: 51–56. van wijk, caroline, and judith rog. 2007. “evaluating file formats for long-term preservation.” presentation at international conference on digital preservation, beijing, china, oct 11–12. http://ipres.las.ac.cn/pdf/caroline-ipres2007-11-12oct_cw.pdf (accessed november 21, 2011). andrew k. pace president’s message: lita forever andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio. i was warned when i started my term as lita president that my time at the helm would seem fleeting in retrospect, and i didn’t believe it. i should have. i suppose most advice of that sort falls on deaf ears—advice to children about growing up, advice to newlyweds, advice to new parents. some things you just have to experience. now i am left with that feeling of having worked very hard while not accomplishing nearly enough. it’s time to buy myself some more time. my predecessor, mark beatty, likes to jokingly introduce himself in ala circles as “lita has-been” in reference to his role as lita past-president. i say jokingly because he and i both know it is not true. not only does the past-president continue in an active role on the lita board and executive committee, the past-president has the daunting task of acting as the division’s financial officer. just as mark knows well the nature of this elected (but still volunteer) commitment, so michelle frisque, my successor this july, knows that the hard work started as vice-president/president-elect has two challenging years ahead. being elected lita president is for all intents and purposes a three-year term with shifting responsibilities. add to this the possibility of serving on the board beforehand, and it’s likely that one could serve less time for knocking over a liquor store. i’m joking, of course—there’s nothing punitive about being a lita officer; it’s as rewarding as it is challenging. neither is this intended to be a self-congratulatory screed as my last hurrah in print as lita president. i’ve referred repeatedly to the grassroots success of lita’s board, interest groups, dedicated committees, and engaged volunteers. the flatness of our division is often emulated by others. i thoroughly enjoy engagement with the lita membership, face-to-face and virtual recruitment of new members and volunteers, and group meetings to discuss moving lita forward. i love that lita is fun.
fun and enjoyment, coupled with my dedication to the profession that i love, is why i plan to make the most of my time, even as a has-been. all those meetings, all that bureaucracy? well, believe it or not, i like the bureaucracy—process works when you learn to work the process—and all those meetings have actually created some excellent feedback for the lita board. changes in ala, changes in the membership, and changes suggested by committees and interest groups all suggest . . . guess what? change. “change” has been a popular theme these days. i’m in that weird minority of people who does not believe that people don’t like to change. i think if the ideas are good, if the destination is worthwhile, then change is possible and even desirable. i’m always geared up for change, for learning from our mistakes, for asking forgiveness on occasion and for permission even less. this is a long-winded way of saying that i think lita is ready for some change. change to the board, change to the committees and interest groups, and changes to our interactions with lita and ala staff. i think ala and the other divisions are anxious for change as well, and i feel confident that lita and its membership can help, even while we change ourselves. don’t ask me today what the details of these changes are. all i can say is that i will be there for them, help see them through, and will be there on the other side to asses which changes worked and which didn’t. one thing i hope does not change is the passion and dedication of the leaders, volunteers, and members of this great organization. i only hope that our ranks grow, even in times of financial uncertainty. lita provides a valuable network of colleagues and friends—this network is always valuable, but it is indispensible in times of difficulty. for many, lita represents a second or third divisional membership, but for networking and collegial support, i think we are second to none. i titled my previous column “lita now.” i think it’s safe for me to say now, “lita forever.” usability study of a library’s m obile website: an example from portland state university kimberly d. pendell and michael s. bowman usability study of a library’s mobile website | pendell and bowman 45 abstract to discover how a newly developed library mobile website performed across a variety of devices, the authors used a hybrid field and laboratory methodology to conduct a usability test of the website. twelve student participants were recruited and selected according to phone type. results revealed a wide array of errors attributed to site design, wireless network connections, as well as phone hardware and software. this study provides an example methodology for testing library mobile websites, identifies issues associated with mobile websites, and provides recommendations for improving the user experience. introduction mobile websites are swiftly becoming a new access point for library services and resources. these websites are significantly different from full websites, particularly in terms of the user interface and available mobile-friendly functions. in addition, users interact with a mobile website on a variety of smartphones or other internet-capable mobile devices, all with differing hardware and software. it is commonly considered a best practice to perform usability tests prior to the launch of a new website in order to assess its user friendliness, yet examples of applying this practice to new library mobile websites are rare. 
considering the variability of user experiences in the mobile environment, usability testing of mobile websites is an important step in the development process. this study is an example of how usability testing may be performed on a library mobile website. the results provided us with new insights on the experience of our target users. in the fall of 2010, with the rapid growth of smartphones nationwide especially among college students, portland state university (psu) library decided to develop a mobile library website for its campus community. the library’s lead programmer and a student employee developed a test version of the website. this version of the website included library hours, location information, a local catalog search, library account access for viewing and renewing checked out items, and access to reference services. it also included a “find a computer” feature displaying the availability of work stations in the library’s two computer labs. kimberly d. pendell (kpendell@pdx.edu) is social sciences librarian, assistant professor, and michael s. bowman (bowman@pdx.edu) is interim assistant university librarian for public services, associate professor, portland state university library, portland, oregon. mailto:kpendell@pdx.edu mailto:bowman@pdx.edu information technology & libraries | june 2012 46 the basic architecture and design of the site was modeled on other existing academic library mobile websites that were appealing to the development team. the top-level navigation of the mobile website largely mirrored the full library website, utilizing the same language as the website when possible. the mobile website was built to be compatible with webkit, the dominant smartphone layout engine. use of javascript on the website was minimized due to the varying levels of support for it on different smartphones, and flash was avoided entirely. figure 1. home page of library mobile website, test version we formed a mobile website team to further evaluate the test website and prepare it for launch. three out of four team members owned smartphones, either an iphone 3gs or an iphone 4. we soon began questioning how the mobile website would work on other types of phones, recognizing that hardware and software differences would likely impact user experience of the mobile website. performing a formal usability test using a variety of internet-capable phones quickly became a priority. we decided to conduct a usability test for the new mobile website in order to answer the question: how user-friendly and effective is the new library mobile website on students’ various mobile devices? literature review smartphones, mobile websites, and mobile applications have dominated the technology landscape in the last few years. smartphone ownership has steadily increased, and a large percentage of usability study of a library’s mobile website | pendell and bowman 47 smartphone owners regularly use their phone to access the internet. the pew research center reports that 52 percent of americans aged 18–29 own smartphones, and 81 percent of this population use their smartphone to access the internet or e-mail on a typical day. additionally, 42 percent of this population uses a smartphone as their primary online access point.1 the 2010 ecar study of undergraduate students and information technology found that 62.7 percent of undergraduate students own internet-capable handheld devices, an increase of 11.5 percent from 2009. 
the 2010 survey also showed that an additional 11.3 percent of students intended to purchase an internet-capable handheld device within the next year.2 in this environment academic libraries have been scrambling to address the proliferation of student owned mobile devices, thus the number of mobile library websites is growing. the library success wiki, which tracks libraries with mobile websites, shows an 66 percent increase in the number of academic libraries in the united states and canada with mobile websites from august 2010 to august 2011.3 we reviewed articles about mobile websites in the professional library science literature and found that mobile website usability testing is only briefly mentioned. in their summary of current mobile technologies and mobile library website development, bridges, rempel, and griggs state that “user testing should be part of any web application development plan. you can apply the same types of evaluation techniques used in non-mobile applications to ensure a usable interface.”4 in a previous article, the same authors also note that not accounting for other types of mobile users is easy to do but leaves a potentially large audience for a mobile website “out in the cold.”5 more recently, seeholzer and salem found the usability aspect of mobile website development to be in need of further research.6 usability evaluation techniques for a mobile website are similar to those for a full website, but the variety of smartphones and internet-capable feature phones immediately complicates standard usability testing practices. the mobile device landscape is fraught with variables that can have a significant impact on the user experience of a mobile website. factors like small screen size, processing power, wireless or data plan connection, and on-screen keyboards or other data entry methods contribute to user experience and impact usability testing. zhang and adipat note that, mobile devices themselves, due to their unique, heterogeneous characteristics and physical constraints, may play a much more influential role in usability testing of mobile applications than desktop computers do in usability testing of desktop applications. therefore real mobile devices should be used whenever possible.7 one strategy for usability testing on mobile devices is to identify device “families” by similar operating systems or other characteristics, then perform a test of the website. for example, griggs, bridges, and rempel found representative models of device families at a local retailer, where they tested the site on the display phones. the authors also recommend “hallway usability testing,” an impromptu test with a volunteer.8 zhang and adipat go on to outline two methodologies for formal mobile application usability testing: field studies and laboratory experiments. the benefit of a mobile usability field study is information technology & libraries | june 2012 48 the preservation of the mobile environment in which tasks are normally performed. however, data collection is challenging in field studies, requiring the participant to reliably and consistently selfreport data. in contrast, the benefit of a laboratory study is that researchers have more control over the test session and data collection method. 
laboratory usability tests lend themselves to screen capture or video recording, allowing researchers more comprehensive data regarding the participant’s performance on predetermined tasks.9 however, billi and others point out that there is no general agreement in the literature about the significance or usefulness of the difference between laboratory and field testing of mobile applications.10 one compromise between field studies and laboratory experiments is the use of a smartphone emulator: an emulator mimics the smartphone interface on a desktop computer and is recordable via screen capture. however, desktop emulators mask some usability problems that impact smartphones, such as an unstable wireless connection or limited bandwidth.11 in order to record test sessions of users working directly with mobile devices, jakob nielsen, the well-known usability expert, briefly mentions the use of a document camera.12 in another usability test of a mobile application, loizides and buchanan also used a document camera with recording capabilities to effectively record users working with a mobile device.13 usability attributes are metrics that help assess the user-friendliness of a website. in their review of empirical mobile usability studies, coursaris and kim present the three most commonly used measures in mobile usability testing: efficiency: degree to which the product is enabling the tasks to be performed in a quick, effective and economical manner or is hindering performance; effectiveness: accuracy and completeness with which specified users achieved specified goals in particular environment; satisfaction: the degree to which a product is giving contentment or making the user satisfied.14 the authors present these measures in an overall framework of “contextual usability” constructed with the four variables of user, task, environment, and technology. an important note is the authors’ use of technology rather than focusing solely on the product; this subtle difference acknowledges that the user interacts not only with a product, but also other factors closely associated with the product, such as wireless connectivity.15 a participant proceeding through a predetermined task scenario is helpful in assessing site efficiency and effectiveness by measuring the error rate and time spent on a task. user satisfaction may be gauged by the participant’s expression of satisfaction, confusion, or frustration while performing the tasks. measurement of user satisfaction may also be supplemented by a post-test survey. returning to general evaluation techniques, mobile website usability employs the use of task scenarios, post-test surveys, and data analysis methods, similar to full site testing. general guides such as the handbook of usability testing by rubin and chisnell and george’s user-centered library websites: usability evaluation methods provide helpful information on designing task scenarios, how to facilitate a test, post-test survey ideas, and methods of analysis.16 another usability study of a library’s mobile website | pendell and bowman 49 common data collection method in usability testing is the think aloud protocol as it allows researchers to more fully understand the user experience. participants are instructed to talk about what they are thinking as they use the site; for example, expressing uncertainty of what option to select, frustration with poorly designed data entry fields, or satisfaction with easily understood navigation. 
examples of the think aloud protocol can also be found in mobile website usability testing.17 method while effective usability testing normally relies on five to eight participants, we decided a larger number of participants would be needed in order to capture the behavior of the site on a variety of devices. therefore, we recruited twelve participants to accommodate a balanced variety of smartphone brands and models. based on average market share, we aimed to test the website on four iphones, four android phones, and four other types of smartphones or internet-capable mobile devices (e.g., blackberry, windows phones). all study participants were university students, the primary target audience of the mobile website. we used three methods to recruit participants: a post to the library’s facebook page, a news item on the library’s home page, and two dozen flyers posted around campus. each form of recruitment described an opportunity for students to spend less than thirty minutes helping the library test its new mobile website. also, participants would receive a $10 coffee shop gift card as an incentive. a project-specific email address served as the initial contact point for students to volunteer. we instructed volunteers to indicate their phone type in their e-mail; this information was used to select and contact the students with the desired variety of mobile devices. if a scheduled participant did not come to the test appointment, another student with the same or similar type of phone was contacted and scheduled. no other demographic data or screening was used to select participants, aside from a minimum age requirement of eighteen years old. we employed a hybrid field and laboratory test protocol, which allowed us to test the mobile website on students’ native devices while in a laboratory setting that we could efficiently manage and schedule. participants used their own phone for the test without any adjustment to their existing operating preferences, similar to field testing methodology. however, we used a controlled environment in order to facilitate the test session and create recordings for data analysis. a library conference room served as our laboratory, and a document camera with video recording capability was used to record the session. the document camera was placed on an audio/visual cart and the participants chose to either stand or sit while holding their phones under the camera. the document camera recorded the phone screen, the participant’s hands, and the audio of the session. the video feed was available through the room projector as well, which helped us monitor image quality of the recordings. information technology & libraries | june 2012 50 figure 2. video still from test session recording the test session consisted of two parts: the completion of five tasks using participants’ phones on our test website recorded under the document camera, and a post-test survey. participants were read an introduction and instructions from a script in order to decrease variation in test protocol and our influence as the facilitators. we also performed a walk-through of the testing session prior to administering it to ensure the script was clearly worded and easy to understand. we developed our test scenarios and tasks according to five functional objectives for the library mobile website: 1. participants can find library hours for a given day in the week. 2. participants can perform a known title search in catalog and check for item status. 3. 
participants can use my account to view checked out books.18 4. participants can use chat reference. 5. participants can effectively search for a scholarly article using the mobile version of ebscohost academic search complete. prior to beginning the test, we encouraged participants to use the “think aloud” protocol while performing tasks. we also instructed them to move between tasks however they would naturally in order to capture user behavior when navigating from one part of the site to another. the post-test survey provided us with additional data and user reactions to the site. users were asked to rate the site’s appearance, ease of use, and how frequently they might use the different website features usability study of a library’s mobile website | pendell and bowman 51 (e.g., renewing a checked out item). the survey was administered directly after the task scenario portion of the test in order to take advantage of the users’ recent experience with the website. we evaluated the test sessions utilizing the measures of efficiency, effectiveness, and satisfaction. in this study, we assessed efficiency as time spent performing the task and effectiveness as success or failure in completing the task. we observed errors and categorized them as either a user error or site error. each error was also categorized as minor, major, or fatal: minor errors were easily identified and corrected by the user; major errors caused a notable delay, but the user was able to correct and complete the task; fatal errors prevented the user from completing the task. to assess user satisfaction, we took note of user comments as they performed tasks, and we also referred to their ratings and comments on the post-test survey. before analyzing the test recordings, we normalized our scoring behavior by performing a sample test session with a library staff member unfamiliar with the mobile website. we scored the sample recording separately and then met to discuss, clarify, and agree upon each error category. each of the twelve test sessions was viewed and scored independently. once this process was completed, we discussed our scoring of each test session video, combining our data and observations. we analyzed the combined data by looking for both common and unique errors for each usability task across the variety of smartphones tested. to protect participants’ confidentiality, each video file and post-test survey was labeled only with the test number and device type. prior to beginning the study, all recruitment methods, informed consent, methodology, tasks and post-test survey were approved by portland state university human subjects research and review committee. findings our recruitment efforts were successful with even a few same-day responses from the announcement posted on the library’s facebook page. some students also indicated that they had seen the recruitment flyers on campus. a total of fifty-two students volunteered to participate; twelve students were successfully contacted, scheduled, and tested. the distribution of the twelve participants and their types of phones is shown in table 1. number of participants operating system phone model 4 android htc droid incredible 2; motorola droid; htc mytouch 3g slide; motorola cliq 2 3 ios iphone 3gs 2 blackberry blackberry 9630; blackberry curve information technology & libraries | june 2012 52 1 windows phone 7 windows phone 7 1 webos palm pixi 1 other windows kin 2 feature phone (a phone with internet capability, running kinos) table 1. 
test participants by smartphone operating system and model usability task scenarios all test participants quickly and successfully completed the first task, finding the library hours for sunday. the second task was to find a book in the library catalog and report whether the book was available for check out. nine participants completed this task; the windows phone 7 and the two blackberry phones presented a fatal system error when working with our mobile catalog software, mobilecat. these participants were able to perform a search but were not able to view a full item record, blocking them from seeing the item’s availability and completing the task. this task also revealed one minor error for iphone users: the iphone displayed the item’s ten digit isbn as a phone number, complete with touch-to-call button. many users took more time than anticipated when asked to search for a book. the video recordings captured participants slowly scrolling through the menu before choosing “search psuonly catalog.” a few participants expressed their hesitation verbally: ● “maybe not the catalog? i don't know. yeah i guess that would be the one.” ● “i don't look for books on this site anyway...my lack of knowledge more than anything else.” ● “search psu library catalog i'm assuming?” the blackberry curve participant did not recognize the catalog option and selected “databases & articles” to search for a book. she was guided back to the catalog after her unsuccessful search in ebscohost. we observed an additional delay in searching for a book when using the catalog interface. the catalog search included a pull down menu of collections options. the collections menu was included by the site developers because it is present in the full website version of the local catalog. users tended to explore the menu looking for a selection that would be helpful in performing the task; however, they abandoned the menu, occasionally expressing additional confusion. usability study of a library’s mobile website | pendell and bowman 53 figure 3. catalog search with additional “collections” menu the next task was to log into a library account and view checked out items. all participants were successful with this task, but frequent minor user errors were observed, all misspelling or numerical entry errors. most participants self-corrected before submitting the login; however, one participant submitted a misspelled user name and promptly received an error message from the site. participants were also instructed to log out of the account. after clicking “logout” one participant made the observation; “huh, it goes to the login screen. i assume i'm logged out, though it doesn't say so.” the fourth task scenario involved using the library’s chat reference service via the mobile website. the chat reference service is provided via open source software in cooperation with l-net, the oregon statewide service. usability testing demonstrated that the chat reference service did not perform well on a variety of phones. also, a significant problem arose when participants attempted to access chat reference via the university’s unsecured wireless network. because the chat reference service is managed by a third-party host, three participants were confronted with a non-mobile friendly authentication screen (see discussion of the local wireless environment below). as this was an unexpected event in testing, participants were given the option to authenticate or abandon the task. 
all three participants who arrived at this point chose to move ahead with authentication during the test session. information technology & libraries | june 2012 54 once the chat interface was available to participants, other system errors were discovered. only three out of twelve participants successfully sent and received a chat message. only one participant (htc droid incredible) experienced an error-free chat transaction. various problems encountered included: · unresponsive or slow to respond buttons, · text fields unresponsive to data entry, · unusually long page loading time, · non-mobile-friendly error message upon attempting to exit, and · non-mobile-friendly “leave a message” webpage. another finding from this task is that participants expressed concern regarding communication delays during the chat reference task. if the librarians staffing the chat service are busy with other users, a new incoming user is placed in a queue. after waiting in the chat queue for forty seconds, one participant commented, “probably if i was on the bus and it took this long, i would leave a message.” being in a controlled environment, participants looked to the facilitator as a guide for how long to remain in the chat queue, distorting the indication of how long users would wait for a chat reference transaction in the field environment. figure 4. chat reference queue usability study of a library’s mobile website | pendell and bowman 55 the last task scenario asked participants to use the mobile version of ebscohost’s academic search complete. our test instance of this database generally performed well with android phones and less well with webos phones or iphones. android participants successfully accessed, searched, and viewed results in the database. iphone users experienced delays in initiating text entry, three consecutive touches being consistently necessary to activate typing in the search field. our feature phone participant with a windows kin 2 was unable to use ebscohost because the phone’s browser, internet explorer 6, is not supported by the ebscohost mobile website. the palm pixi participant also experienced difficulty with very long page loading times, two security certificate notifications (not present on other tests), and our ezproxy authentication page. with all these obstacles, the palm pixi participant abandoned the task. another participant, blackberry 9630, also abandoned the task due to slow page loading. a secondary objective of our ebscohost search task was to observe if participants explored ebscohost’s “search options” in order to limit results to scholarly articles. our task scenario asked participants to find a scholarly article on global warming. only one participant explored the ebscohost interface, successfully identified the “search options” menu, and limited the results to “scholarly (peer reviewed) articles.” another participant included the words “peer reviewed” with “global warming” in the search field in an attempt to add the limit. a third expressed the need to limit to scholarly articles but was unable to discover how to do so. of the remaining seven participants who searched academic search complete for the topic “global warming” none expressed concern or awareness of the scholarly limit in academic search complete. it is unclear whether this was a product of the interface design, users’ lack of knowledge regarding limiting their search to scholarly sources, or if our task scenario was simply too vague. 
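taken together, the task findings above reduce to the per-task, per-device error tally described in the method section (user versus site errors, rated minor, major, or fatal, plus success or failure for each task). the sketch below is only a minimal illustration of that tabulation in python, not the authors’ actual analysis; the example records and field names are hypothetical.

```python
from collections import Counter

# hypothetical scored observations: (device, task, source, severity)
# severity "fatal" means the participant could not complete the task
observations = [
    ("blackberry 9630", "catalog search", "site", "fatal"),
    ("windows phone 7", "catalog search", "site", "fatal"),
    ("iphone 3gs", "catalog search", "site", "minor"),      # isbn rendered as a phone number
    ("motorola droid", "account login", "user", "minor"),   # self-corrected typing error
    ("palm pixi", "database search", "site", "fatal"),
]

# errors per task, broken out by source and severity
errors = Counter((task, source, severity) for _, task, source, severity in observations)
for (task, source, severity), n in sorted(errors.items()):
    print(f"{task}: {n} {severity} {source} error(s)")

# task effectiveness: a task fails on a device if any fatal error was recorded for it
fatal = {(device, task) for device, task, _, severity in observations if severity == "fatal"}
for device, task in sorted(fatal):
    print(f"task '{task}' not completed on {device}")
```

scoring each recording twice and reconciling, as the authors did, fits the same structure: two such tallies can be compared entry by entry before being merged.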
though participants’ wireless configurations, or lack thereof, was not formally part of the usability test, we quickly discovered that this variable had a significant impact on the user’s experience of the mobile website. in the introductory script and informed consent we recommended to participants that they connect to the university’s wireless network to avoid data charges. however, we did not explicitly instruct users to connect to the secure network. most participants chose to connect to the unencrypted wireless network and appeared to be unaware of the encrypted network (psu and psu secure respectively). using the unencrypted network led to authentication requirements at two different points in the test: using the chat service and searching academic search complete. other users who were unfamiliar with adding a wireless network to their phone used their cellular network connection. these participants were asked to authenticate only when accessing ebscohost’s academic search complete (see table 2). participants expressed surprise at the appearance of an authentication request when performing different tasks, particularly while connected to the on-campus university wireless network. the required data entry in a non-mobile friendly authentication screen, and the added page loading time, created an obstacle for the participant to overcome in order to complete the task. notably, three participants also explained their naivete on how to find and add a wireless network to their phone. information technology & libraries | june 2012 56 internet connection library mobile website chat reference ebscohost on campus, unencrypted wireless no authentication required authentication required authentication required on campus, encrypted wireless no authentication required no authentication required no authentication required on campus, cellular network no authentication required no authentication required authentication required off campus, any mode no authentication required no authentication required authentication required table 2. authentication requirements based on type of internet connection and resource. post -test survey each participant completed a post-test survey that asked them to rate the mobile website’s appearance and ease of use. the survey also asked participants to rank how frequently they were likely to use specific features of the website such as search for books and ask for help on a rating scale of more than weekly, weekly, monthly, less than monthly, and never. participants were also invited to add general comments about the website. the mobile website’s overall appearance and ease of use was highly rated by all participants. the straightforward design of the mobile website’s homepage also garnered praise in the comment section of the post-test survey. comments regarding the site’s design included: “very simple to navigate,” and “the simple homepage is perfect! also, i love that the site rotates sideways with my phone.” for many of the features listed on the survey participants selected an almost even distribution across the frequency of use rating scale. however, two features were ranked as having potential for very high use. nine out of twelve participants said they would search for articles weekly or more than weekly. eight out of twelve participants said they would use the “find a computer” function weekly or more than weekly. 
two participants additionally wrote in comments that “find a computer” was “very important” and would be used “every day.” at the other end of the scale, our menu option “directions” was ranked as having a potential frequency of use of never, with the exception of one participant marking less than monthly. discussion usability testing of the library’s mobile website provided the team with valuable information, leading us to implement important changes before the site was launched. we quickly decided on a usability study of a library’s mobile website | pendell and bowman 57 few changes, while others involved longer discussion. the collections menu was removed from the catalog search; this menu distracted and confused users with options that were not useful in a general search. “directions” was moved from a top level navigation element to a clickable link in the site footer. also, the need for a mobile version of the library’s ezproxy authentication page was clearly documented and has since been created and implemented. however, the team was very pleased with the praise for the overall appearance of the website and its ease of use, especially considering the significant difficulties some participants faced when completing specific tasks. the “find a computer” feature of the mobile website was very popular with test participants. the potential popularity among users is perhaps a reflection of overcrowded computer labs across campus and the continued need students have for desktop computing. unfortunately, “find a computer” has been temporarily removed from the site due to changes in computer laboratory tracking software at the campus it level. we hope to soon again have access to the workstation data for the library’s two computer labs in order to develop a new version of this feature. the hesitation participants displayed when selecting the catalog option in order to search for a book was remarkable for its pervasiveness. it’s possible that the term “catalog” has declined in use to the point of not being recognizable to some users, and it is not used to describe the search on the homepage of the library’s full website. in fact, we had originally planned to name the catalog search option with a more active and descriptive phrase, such as “find books and more,” which is used on the library’s full website. however, the full library website employs worldcat local, allowing users to make consortial and interlibrary loan requests. in contrast, the mobile website catalog reflects only our local holdings and does not support the request functionality. the team decided not to potentially confuse users further regarding the functionality of the different catalogs by giving them the same descriptive title. in the case that worldcat local’s beta mobile catalog increases in stability and functionality, we will abandon mobilecat and provide the same request options on the mobile website as on the full website. we discussed removing the chat service option from the “ask us” page. during usability testing, it was demonstrated that users would too frequently have poor experiences using this service due to slow page loads on most phones, the unpredictable responsiveness of text entry fields and buttons, and the wait time for a librarian to begin the chat. also, it could be that waiting in a virtual queue on a mobile device is particularly unappealing because the user is blocked from completing other tasks simultaneously. 
the library recently implemented a new text reference service, and this service was added to the mobile website. the text reference service is an asynchronous, non-webbased service that is less likely to pose similar usability problems as those found with the chat service. this reflects the difference between applications developed for desktop computing, such as web-based instant messaging, versus a technology that is specifically related to the mobile phone environment, like text messaging. however, tablet device users complicate matters since they might use the full desktop website or the mobile website; for this reason, chat reference is still part of the mobile website. information technology & libraries | june 2012 58 participants’ interest in accessing and searching databases was notable. during the task, many participants expressed positive reactions to the availability of the ebscohost database. the posttest survey results demonstrated a strong interest in searching for articles via the mobile website, giving their potential frequency of use as weekly or more than weekly. this evidence supports the previous user focus group results of seeholzer and salem.19 students are interested in accessing research databases on their mobile devices, despite the likely limitations of performing advanced searches and downloading files. therefore, the team decided to include ebscohost’s academic search complete along with eight other mobile-friendly databases in the live version of the website launched after the usability test. figure 5. home page of the library mobile website, updated usability study of a library’s mobile website | pendell and bowman 59 the new library mobile website was launched in the first week of fall 2011 quarter classes. in the first full week there were 569 visits to the site. site analytics for the first week also showed that our distribution of smartphone models in usability testing was fairly well matched with the users of the website, though we underestimated the number of iphone users: 64 percent of visits were from apple ios users, 28 percent from android users, 0.7percent blackberry users, and the remaining a mix of users with alternative mobile browsers and desktop browsers. usability testing with participants’ native smartphones and wireless connectivity revealed issues which would have been absent in a laboratory test that employed a mobile device emulator and a stable network connection. the complications introduced by the encrypted and unencrypted campus wireless networks, and cellular network connections, revealed some of the many variables users might experience outside of a controlled setting. ultimately, the variety of options for connecting to the internet from a smartphone, in combination with the authentication requirements of licensed library resources, potentially adds obstacles for users. general recommendations for mobile library websites that emerged from our usability test include: · users appreciate simple, streamlined navigation and clearly worded labels; · error message pages and other supplemental pages linked from the mobile website pages should be identified and mobile-friendly versions created; · recognize that how users connect to the mobile website is related to their experience using the site; · anticipate problems with third-party services (which often cannot be solved locally). 
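the second recommendation above, creating mobile-friendly versions of error and confirmation pages, can be made concrete with a small server-side check. the article does not describe the library’s web stack, so the python sketch below is only illustrative: it sniffs the user-agent string and returns a lightweight confirmation page, addressing the missing logout feedback observed during testing.

```python
import re

# rough user-agent test; a production site might rely on responsive css instead
MOBILE_UA = re.compile(r"mobile|android|iphone|ipod|blackberry|webos|iemobile", re.IGNORECASE)

def is_mobile(user_agent: str) -> bool:
    return bool(MOBILE_UA.search(user_agent or ""))

def logout_page(user_agent: str) -> str:
    """Return explicit logout feedback, with a stripped-down page for mobile clients."""
    if is_mobile(user_agent):
        # "/m/" is a hypothetical path for the mobile site
        return ("<!doctype html>"
                "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"
                "<p>you have successfully logged out.</p>"
                "<p><a href=\"/m/\">return to the mobile site</a></p>")
    return "<!doctype html><p>you have successfully logged out of your library account.</p>"

# example
print(logout_page("Mozilla/5.0 (iPhone; CPU iPhone OS 4_3 like Mac OS X)"))
```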
additionally, system responses to user actions are important; for example, provide a “you have successfully logged out” message and an indicator that a catalog search is in progress. it is possible that users are even more likely to abandon tasks in a mobile environment than in a desktop environment if they perceive the site to be unresponsive. as test facilitators, we experienced three primary difficulties in keeping the testing sessions consistent. the unexpectedly poor performance of the mobile website on some devices required us to communicate with participants about when a task could be abandoned. for example, after one participant made three unsuccessful attempts at entering text data in the chat service interface, she was directed to move ahead to the next task. such instances of multiple unsuccessful attempts were considered to be fatal system errors. however, under these circumstances, it is difficult to know whether our test facilitation led participants to spend more or less time than they normally would attempting a task. secondly, the issue of system authentication led to unexpected variation in testing. some participants proceeded through these obstacles, while others either opted out or had significant enough technical difficulties that the task was deemed a fatal error. again, it is unclear how the average user would deal with this situation in the field. some users information technology & libraries | june 2012 60 might leave an activity if an obstacle appears too cumbersome, others might proceed. finally, participants demonstrated a wide range in their willingness to “think aloud.” in retrospect, as facilitators, we should have provided an example of the method before beginning the test; perhaps doing so would have encouraged the participants to speak more freely. the relatively simple nature of most of the test tasks may have also contributed to this problem as participants seemed reluctant to say something that might be considered too obvious. another limitation of our study is that the participants were a convenience sample of volunteers selected by phone type. though our selection was based loosely on market share of different smartphone brands, a preliminary investigation into the mobile device market of our target population would have been helpful to establish what devices would be most important to test. additional usability testing on more complex library related tasks, such as advanced searching in a database, or downloading and viewing files, is recommended for further research. also of interest would be a study of user willingness to proceed past obstacles like authentication requirements and non-mobile friendly pages in the field. conclusion we began our study questioning whether or not different smartphone hardware and operating systems would impact the user experience of our library’s new mobile website. usability testing confirmed that the type of smartphone does have an impact on the user experience, occasionally significantly so. by testing the site on a range of devices, we observed a wide variation of successful and unsuccessful experiences with our mobile website. the wide variety of phones and mobile devices in use makes developing a mobile website that perfectly serves all of them difficult; there is likely to always be a segment of users who experience difficulties with any given mobile website. 
however, usability testing data and developer awareness of potential problems will generate positive changes to mobile websites and alleviate frustration for many users down the road. references and notes 1. aaron smith, “35% of american adults own a smartphone: one quarter of smartphone owners use their phone for most of their online browsing,” pew research center, june 15, 2011, http://pewinternet.org/~/media/files/reports/2011/pip_smartphones.pdf (accessed oct. 13, 2011). 2. shannon d. smith and judith b. caruso, the ecar study of undergraduate students and information technology, 2010, educause, 2010, 41, http://net.educause.edu/ir/library/pdf/ers1006/rs/ers1006w.pdf (accessed sept. 12, 2011); shannon d. smith, gail salaway, and judith b. caruso, the ecar study of undergraduate students and information technology, 2009, educause, 2009, 49, http://www.educause.edu/resources/theecarstudyofundergraduatestu/187215 (accessed sept. 12, 2011). 3. a comparison count of u.s. and canadian academic libraries with active mobile websites, wiki page versions, august 2010 (56 listed) and august 2011 (84 listed). library success: a best practices wiki, “m-libraries: libraries offering mobile interfaces or applications,” http://libsuccess.org/index.php?title=m-libraries (accessed sept. 7, 2011). 4. laurie m. bridges, hannah gascho rempel, and kim griggs, “making the case for a fully mobile library web site: from floor maps to the catalog,” reference services review 38, no. 2 (2010): 317, doi:10.1108/00907321011045061. 5. kim griggs, laurie m. bridges, and hannah gascho rempel, “library/mobile: tips on designing and developing mobile web sites,” code4lib journal no. 8 (2009), under “content adaptation techniques,” http://journal.code4lib.org/articles/2055 (accessed sept. 7, 2011). 6. jamie seeholzer and joseph a. salem jr., “library on the go: a focus group study of the mobile web and the academic library,” college & research libraries 72, no. 1 (2011): 19. 7. dongsong zhang and boonlit adipat, “challenges, methodologies, and issues in the usability testing of mobile applications,” international journal of human-computer interaction 18, no. 3 (2005): 302, doi:10.1207/s15327590ijhc1803_3. 8. griggs, bridges, and rempel, “library/mobile.” 9. zhang and adipat, “challenges, methodologies,” 303–4. 10. billi et al., “a unified methodology for the evaluation of accessibility and usability of mobile applications,” universal access in the information society 9, no. 4 (2010): 340, doi:10.1007/s10209-009-0180-1. 11. zhang and adipat, “challenges, methodologies,” 302. 12. jakob nielsen, “mobile usability,” alertbox, september 26, 2011, www.useit.com/alertbox/mobile-usability.html (accessed sept. 28, 2011). 13. fernando loizides and george buchanan, “performing document triage on small screen devices. part 1: structured documents,” in iiix ’10: proceedings of the third symposium on information interaction in context, ed. nicholas j. belkin and diane kelly (new york: acm, 2010), 342, doi:10.1145/1840784.1840836. 14. constantinos k. coursaris and dan j.
kim, “a qualitative review of empirical mobile usability studies” (presentation, twelfth americas conference on information systems, acapulco, mexico, august 4–6, 2006), 4, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.4082&rep=rep1&type=pdf (accessed sept. 7, 2011) 15. ibid., 2. http://libsuccess.org/index.php?title=m-libraries http://journal.code4lib.org/articles/2055 file:///c:/users/gerrityr/desktop/ital%2031n2_proofread/www.useit.com/alertbox/mobile-usability.html http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.4082&rep=rep1&type=pdf information technology & libraries | june 2012 62 16. jeffrey rubin and dana chisnell, handbook of usability testing: how to plan, design, and conduct effective tests, 2nd ed. (indianapolis, in: wiley, 2008); carole a. george, user-centered library web sites: usability evaluation methods (cambridge: chandos, 2008). 17. ronan hegarty and judith wusteman, “evaluating ebscohost mobile,” library hi tech 29, no. 2 (2011): 323–25, doi:10.1108/07378831111138198; robert c. wu et al., “usability of a mobile electronic medical record prototype: a verbal protocol analysis,” informatics for health & social care 33, no. 2 (2008): 141–42, doi:10.1080/17538150802127223. 18. in order to protect participants’ confidentiality a dummy library user account was created; the user name and password for the account were provided to the participant at the test session. 19. seeholzer and salem, “library on the go,” 14. 50 information technology and libraries | june 2006 author name and second author f orty years! in july 1966, the library and information technology association (lita) was officially born at the american library association (ala) annual conference in new york as the information science and automation division (isad). it was bastille day, and i’m sure for those who had worked so hard to create this new organization that it probably seemed like a revolution, a new day. the organizational meeting held that day attracted “several hundred people.” imagine! i’ve mentioned it before, i know, but the history of the first twenty-five years of lita is intriguing reading and well worth an investment of your time. stephen r. salmon’s article “lita’s first twenty-five years: a brief history” (www.lita.org/ala/lita/aboutlita/org/1st 25years.htm) offers an interesting look back in time. any technology organization that has been in existence for forty or more years has seen a lot of changes and adapted over time to a new environment and new technologies. there is no other choice. someone (who, i don’t remember; i’d gladly attribute the quote if i did) once told me that library automation began with the electric eraser. i’m sure that many of you have neither seen an electric eraser, nor can you probably imagine its purpose. ask around. i’m sure there are staff in your organization who do remember using it. there may even be one hidden somewhere in your library. a quick search of the web even finds cordless, rechargeable electric erasers today in drafting and art supply stores. the 1960s, as lita was born, was still the era of the big mainframe systems and not-so-common programming languages. machine readable cataloging (marc) was born and oclc conceived. the 1970s saw the introduction of minicomputer systems. digital equipment corporation introduced the vax, a 32-bit platform, in 1976. the roots of many of our current integrated library systems reach back to this decade. the 1980s saw the introduction of the ibm personal computer and the apple macintosh. 
the graphical interface became the norm or at least the one to imitate. the 1990s saw a shift away from hardware to communication and access as the web was unveiled and began to give life to the internet bubble. the new millennium began with y2k. the web predominates, and increasingly, the digital form dominates almost everything we touch (text, audio, video). automation and systems evolved and changed over the years, and so did libraries. automation, which had been confined to large air-conditioned and monitored rooms, moved out into the library. it increasingly appeared at circulation desks, on staff desks, and then throughout the library. information technology (it) spread into offices everywhere and into homes. libraries had products and services to deliver to users. users demanded more convenience. of course, others knew this trend as well and provided products and services that users wanted. users often liked what they saw in stores better than what the library was able to provide. each of us attempts to keep up, compete, and beat those whom we see as our competitors. it’s a moving target and one that seems to be gaining speed. all the while, during these four decades, our association and its members continually adapted to the new environment, faced new challenges, and adopted new technologies. we would not exist if we did not. i feel that we, as an association, are again facing the need to change, to transform ourselves. it, digital technology, automation (whatever term you want to use) affects the work of virtually every library staff member. everyone’s work in the library uses or contributes to the digital presence of our employer. it is not the domain of a few. lita has a wonderful history and it has great potential to better serve the profession. what do we want our association to be? what programs and services can we provide that others do not? who can we involve to broaden our reach? how can we better communicate with members and nonmembers? if we had a clean sheet of paper, what would we write? what would we dream? we need to share that dream and bring it to life. i can’t do it. the lita board can’t do it. we need your help. we need your ideas. we need your energy. we need to break out of our comfort zone. none of us wants the strategic plan (www.lita.org/ala/lita/aboutlita/org/plan.htm) we adopted last year to ring hollow. we want to accelerate change and move into a reenergized future. i welcome your aspirations, ideas, and comments. i know that the lita board does as well. please feel free to contact me or any member of the board (www.lita .org/ala/lita/aboutlita/org/litagov/board.htm). lita is your association. where should we be going? help us navigate the future. patrick mullin patrick mullin (mullin@email.unc.edu) is lita president 2005– 2006, and associate university librarian for access services and systems, the university of north carolina at chapel hill. president’s column lauren h. mandel (lmandel@fsu.edu) is a doctoral candidate at the florida state university college of communication & information, school of library & information studies, and is research coordinator at the information use management & policy institute. geographic information systems: tools for displaying in-library use data lauren h. 
mandel geographic information systems: tools for displaying in-library use data | mandel 47 in-library use data is crucial for modern libraries to understand the full spectrum of patron use, including patron self-service activities, circulation, and reference statistics. rather than using tables and charts to display use data, a geographic information system (gis) facilitates a more visually appealing graphical display of the data in the form of a map. giss have been used by library and information science (lis) researchers and practitioners to create maps that display analyses of service area populations and demographics, facilities space management issues, spatial distribution of in-library use of materials, planned branch consolidations, and so on. the “seating sweeps” method allows researchers and librarians to collect in-library use data regarding where patrons are locating themselves within the library and what they are doing at those locations, such as sitting and reading, studying in a group, or socializing. this paper proposes a gis as a tool to visually display in-library use data collected via “seating sweeps” of a library. by using a gis to store, manage, and display the data, researchers and librarians can create visually appealing maps that show areas of heavy use and evidence of the use and value of the library for a community. example maps are included to facilitate the reader’s understanding of the possibilities afforded by using giss in lis research. t he modern public library operates in a context of limited (and often continually reduced) funding where the librarians must justify the continued value of the library to funding and supervisory authorities. this is especially the case as more and more patrons access the library virtually, calling into question the relevance of the physical library. in this context, there is a great need for librarians and researchers to evaluate the use of library facility space to demonstrate that the physical library is still being used for important social and educational functions. despite this need, no model of public library facility evaluation emphasizes the ways patrons use library facilities. the systematic collection of in-library use data must go beyond traditional circulation and reference transactions to include self-service activities, group study and collaboration, socializing, and more. geographic information systems (giss) are beginning to become deployed in library and information science (lis) research as a tool for graphically displaying data. an initial review of the literature has yielded studies where a gis has been used in analyzing service area populations through u.s. census data;1 sitting facility locations;2 managing facilities, including spatial distribution of in-library book use and occupancy of library study space;3 and planning branch consolidations.4 these uses of gis are not mutually exclusive; studies have combined multiple uses of giss.5 also, giss have been proposed as viable tools for producing visual representations of measurements of library facility use.6 these studies show the capabilities of a gis for storing, managing, analyzing, and displaying in-library use data and the value of gisproduced maps for library facility evaluations, in-library use research, and library justification. n research purpose observing and measuring the use of a library facility is a crucial step in the facility evaluation process. 
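as a concrete illustration, a single seating sweep might be recorded and totaled as in the following minimal python sketch; the zone names, activity labels, and file layout are hypothetical rather than drawn from given and leckie’s instrument.

```python
import csv
from collections import Counter

# one row per patron observed during a single sweep (hypothetical values)
sweep = [
    {"sweep_time": "2010-03-02 10:15", "zone": "quiet reading room", "activity": "reading"},
    {"sweep_time": "2010-03-02 10:15", "zone": "group study tables", "activity": "group study"},
    {"sweep_time": "2010-03-02 10:15", "zone": "group study tables", "activity": "socializing"},
    {"sweep_time": "2010-03-02 10:15", "zone": "public computers", "activity": "computer use"},
]

# totals by zone are what a gis would later join to floor-plan polygons for display
per_zone = Counter(row["zone"] for row in sweep)

with open("sweep_counts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["zone", "patrons_observed"])
    for zone, count in sorted(per_zone.items()):
        writer.writerow([zone, count])
```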
the library needs to understand how the facility is currently being used in order to justify the continued financial support necessary to maintain and operate it. understanding how the facility is used can also help librarians identify high-traffic areas of the library that are ideal locations to market library services and materials. this understanding cannot be reached by analyzing circulation and reference transaction data alone; it must include in-library use measures that account for all ways patrons are using the facility. the purpose of this paper is to suggest a method by which to observe and record all uses of a library facility during a sampling period, the so-called “seating sweep” performed by given and leckie, and then to use a gis to store, manage, and display the collected data on a map or series of maps that graphically depict library use.7 n significance of facility evaluation facility evaluation is a topic of vital importance in all fields, but this is especially true of a field such as public librarianship where funding is often a source of concern.8 in times of economic instability, libraries can benefit from the ability to identify uses of existing facilities and employ this information to justify the continued operation of the library facility. also, knowing which areas of the library are more frequently used than others can help librarians determine where to place displays of library materials and advertisements of library services. for a library to begin to evaluate patron use and how well the facility meets users’ needs, there must be an understanding of what users need from the library facility.9 to determine those needs, it is vital that library staff observe the facility while it is being used. this observation can be applied to the facility evaluation plan to justify the continued operation of the facility to meet the needs of the library service population. understanding how people use the public library facility beyond traditional measures of circulation statistics and reference transactions can lead to new theories of library use, an area of significant research interest for lis. additionally, the importance of this work transcends lis because it applies to other government-funded community service agencies as well. for example, recreation facilities and community centers could also benefit from a customer-use model that incorporates measures of the true use of those facilities. n literature review although much has been written on the use of library facilities, little of the research includes studies of how patrons actually use existing public library facilities and whether facilities are designed to accommodate this use.10 rather, much of the research in public library facility evaluation has focused on collection and equipment space needs,11 despite the user-oriented focus of public library accountability models.12 recent research in library facility design is beginning to reflect this focus,13 but additional study would be useful to the field. use of gis is on the rise in the modern technological world.
a gis is a computer-based tool for compiling, storing, analyzing, and displaying data graphically.14 usually this data is geospatial in nature, but a gis also can incorporate descriptive or statistical data to provide a richer picture than figures and tables can. although gis has been around for half a century, it has become increasingly affordable, allowing libraries and similar institutions to consider using a gis as a measurement and analysis tool. giss have begun to be used in lis research as a tool for graphically displaying library data. one fruitful area has been the mapping of user demographics for facility planning purposes,15 including studies that mapped library closures.16 mapping also can include in-library use data,17 in which case a gis is used to overlay collected in-library use data on library floor plans. this can offer a richer picture of how a facility is being used than traditional charts and tables can provide. using a gis to display library service area population data adkins and sturges suggest libraries use a gis-based library service area assessment as a method to evaluate their service areas and plan library services to meet the unique demographic demands of their communities.18 they discuss the methods of using gis, including downloading u.s. census tiger (topologically integrated geographic encoding and referencing) files, geocoding library locations, delineating service areas by multiple methods, and analyzing demographics. a key tenet of this approach is the concept that public libraries need to understand the needs of their patrons. this is a prevailing concept in the literature.19 preiser and wang, in reporting a method used to create a facilities master plan for the public library of cincinnati and hamilton county, ohio, offer a convincing argument for combining gis and building performance evaluation (bpe) methods to examine branch facility needs and offer individualized facilities recommendations.20 like other lis researchers,21 preiser and wang suggest a relationship between libraries and retail stores, noting the similar modern trends of destination libraries and destination bookstores. they also acknowledge the difficulty in completing an accurate library performance assessment due to the multitude of activities and functions of a library. their method is a combination of a gis-based service area and population analysis with a bpe that includes staff and user interviews and surveys, direct observation, and photography. the described multimethod approach offers a more complete picture of a library facility’s performance than traditional circulation-based evaluations. further use of giss in library facility planning can be seen from a study comparing proposed branches by demographic data that has been analyzed and presented through a gis. hertel and sprague describe research that used a gis to conduct geospatial analysis of u.s. census data to depict the demographics of populations that would be served by two proposed branch libraries for a public library system in idaho.22 a primary purpose of this research is to demonstrate the possible ways public libraries can use gis to present visual and quantitative demographic analyses of service area populations.
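the service-area workflow that adkins and sturges describe—download census tiger files, geocode library locations, delineate a service area, and summarize demographics—can be sketched with modern open-source tooling. the following is a minimal illustration under stated assumptions, not the procedure used in any of the cited studies: the shapefile name, the population field, the coordinates, and the one-mile buffer are all hypothetical, and the geopandas, shapely, and matplotlib packages are assumed to be installed.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point

# hypothetical inputs: a census tract shapefile (e.g., extracted from tiger data)
# with a total-population attribute, and a manually geocoded library location
TRACTS_FILE = "census_tracts.shp"          # assumed to exist; column "POP" is assumed
LIBRARY_LON, LIBRARY_LAT = -116.20, 43.61  # illustrative coordinates only

tracts = gpd.read_file(TRACTS_FILE)

# represent the library as a one-row point layer in geographic coordinates
library = gpd.GeoDataFrame(
    {"name": ["main branch"]},
    geometry=[Point(LIBRARY_LON, LIBRARY_LAT)],
    crs="EPSG:4326",
)

# project both layers to a metric crs (a utm zone assumed suitable for the region)
# so the buffer distance is in meters, then delineate a crude circular service area
tracts_m = tracts.to_crs(epsg=32611)
library_m = library.to_crs(epsg=32611)
service_area = library_m.buffer(1609).iloc[0]   # roughly one mile

# select tracts that intersect the service area and summarize demographics
served = tracts_m[tracts_m.intersects(service_area)]
print("tracts in service area:", len(served))
print("estimated population served:", served["POP"].sum())

# a quick visual: tracts shaded by population with the service-area outline on top
ax = served.plot(column="POP", cmap="Blues", legend=True)
gpd.GeoSeries([service_area], crs="EPSG:32611").boundary.plot(ax=ax, color="red")
plt.savefig("service_area.png")
```

the circular buffer is the simplest of the delineation methods adkins and sturges mention; drive-time polygons or census-block assignment could be substituted without changing the rest of the sketch.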
hertel and sprague identify that public libraries are challenged to determine which public they are serving and the needs of that population, writing that “libraries are beginning to add customer-based satisfaction as a critical component of resource allocation decisions” and need the help of a gis to provide hard-data evidence in support of staff observations.23 this evidence could take the form of demographic data, as discussed by hertel and sprague, and also could incorporate in-library use data to present a fuller picture of a facility’s use. using gis to display in-library use data xia conducted several studies in which he collected library-use data and mapped that data via a gis. in one study designed to identify the importance of space management in academic libraries, xia suggests applications of giss in library space management, particularly his tool integrating library floor plans with feature data in a gis.24 he explains that a gis can overcome the constraints of drafting and computer-aided design tools, such as those in use at the meriam library at california state university, chico, and at the michigan state university main library. for example, giss are not limited to space visualization manipulation, but can incorporate user perceptions, behavior, and daily activities, all of which are important data for library space management considerations and in-library use research. xia also reviews the use of gis tools that incorporate hospital and casino floor plans, noting that library facilities are as complex as hospitals and casinos; this is a compelling argument that academic libraries should consider the use of a gis as a space management tool. in another study, xia uses a gis to visualize the spatial distribution of books in the library in an attempt to establish the relationship between the height of bookshelves and the in-library use of books.25 this study seeks to answer the question of how the location of books on shelves of different heights could influence user behavior (i.e., patrons may prefer to browse shelves at eye level rather than the top and bottom shelves). what is of interest here is xia’s use of a gis to spatially represent the collected data. xia remarks that a gis “is suitable for assisting in the research of in-library book use where library floor layouts can be drawn into maps on multiple-dimensional views.”26 in fact, xia’s graphics depict the use of books by bookshelf height in a visual manner that could not be achieved without the use of a gis. similarly, a gis can be used to spatially represent the collected data in an in-library use study by overlaying the data onto a representation of the library floor plan. in a third project, xia measures study space use in academic libraries as a metric of user satisfaction with library services.27 he says that libraries need to evaluate space needs on a case-by-case basis because every library is unique and serves a unique population. therefore, to observe the occupancy of study areas in an academic library, xia drew the library’s study facilities (including furniture) in a gis. he then observed patrons’ use of the facilities and entered the observation data into the gis to overlay on maps of the study areas.
there are several advantages of using gis in this way: spatial databases can store continuing data sets, the system is powerful and flexible for manipulating and analyzing the spatial dataset, there are enhanced data visualization capabilities, and maps and data become interactive. conclusions drawn from the literature a gis is a tool gaining momentum in the world of lis research. giss have been used to conduct and display service area population assessments,28 propose facility locations,29 and plan for and measure branch consolidation impacts and benefits.30 giss also have been used to graphically represent in-library use for managing facility space allocation, mapping in-library book use, and visualizing the occupancy of library study space.31 additionally, giss have been used in combination studies that examine library service areas and facility location proposals.32 these uses of giss are only the beginning; a gis can be used to map any type of data a library can collect, including all measures of in-library use. additionally, gis-based data analysis and display complements the focus in library-use research on gathering data to show a richer picture of a facility’s use and the focus in library facility design literature on building libraries on the basis of community needs.33 n in-library use research that would benefit from spatial data displays unobtrusive observational research offers a rich method for identifying and recording the use of a public library facility. a researcher could obtain a copy of a library’s floor plan, predetermine sampling times during which to “sweep” the library, and conduct the sweeps by marking all patrons observed on the floor plan.34 this data then could be entered into a gis database for spatial analysis and display. specific questions that could be addressed via such a method include the following:
n what are all the ways in which people are using the library facility?
n how many people are using traditional library resources, such as books and computers?
n how many people are using the facility for other reasons, such as relaxation, meeting friends, and so on?
n do the ways in which patrons use the library vary by location within the facility (e.g., are the people using traditional library resources and the people using the library for other reasons using the same areas of the library or different areas)?
n which area(s) of the library facility receive the highest level of use?
it is hoped that answers to these questions, in whole or in part, could begin to offer a picture of how a library facility is currently being used by library patrons. to better view this picture, the data recorded from the observational research could be entered into a gis to overlay onto the library floor plan in a similar manner as xia’s use of a gis to display occupancy of library study space.35 this spatial representation of the data should facilitate greater understanding of the actual use of the library facility. instead of a library presenting tables and graphs of library use, it would be able to produce illustrative maps that would help explain patterns of use to funding and supervising authorities. these maps would not require expensive proprietary gis packages; the examples provided in this paper were created using the free, open-source mapwindow gis package. example using gis to display in-library use data for this paper, i produced example maps on the basis of fictional in-library use data.
these maps were created using mapwindow gis software along with microsoft excel, publisher, and paint (see figure 1 for a diagram of this process). mapwindow is an open-source gis package that is easy to learn and use, but its layout and graphic design features are limited compared to the more expensive and sophisticated proprietary gis packages.36 mapwindow files are compatible with the proprietary packages, so they could be imported into other gis packages for finishing. for this paper, however, the goal was to create simple maps that a novice could replicate. therefore, publisher and paint were used for finalizing the maps instead of a sophisticated gis package. it was relatively easy to create the maps. first, i drew a sample floor plan of a fictional library computer lab in excel and imported it into mapwindow as a jpeg file. i then overlaid polygons (shapes that represent area units such as chairs and tables) onto the floor plan and saved two shapefiles, one for tables and one for computers. a shapefile is a basic storage file used in most gis packages. for each of those shapefiles i created an attribute table (basically, a linked spreadsheet) using fictitious data representing use of the tables and computers at 9 and 11 a.m. and 1, 3, 5, and 7 p.m. on a sample day. the field calculator generated a final column summing the total use of each table and computer for the fictitious sample day. i then created maps depicting the use of both tables and computers at each of the sample time periods (see figure 2) and for the total use (see figure 3).
figure 1. process diagram for creating the sample maps
figure 2. example maps depicting use of tables and computers in a fictional library computer lab, by hour
figure 3. example map depicting total use of tables and computers in a fictional library computer lab for a sample day
benefits of gis-created displays for library managers the maps presented here are not based on actual data, but are meant to demonstrate the capabilities of giss for spatially representing the use of a library facility. this could be done on a grander scale using an entire library floor plan and data collected during a longer sample period (e.g., a full week). these maps can serve several purposes for library managers, specifically regarding the marketing of library services and the justification of library funding. mapping data obtained from library “sweeps” can help identify the popularity of different areas of the library at different times of the day, different days of the week, or different times of the year. once the library has identified the most popular areas, this information can be used to market library materials and services. for example, a highly populated area would be an ideal location over which to install ceiling-mounted signs that the library could use for marketing services and programs. or the library could purchase a book display table similar to those used in bookstores and install it in the middle of a frequently populated area. the library could stock the table with seasonally relevant books and other materials (e.g., tax guidebooks in march and april) and track the circulation of these materials to determine the degree to which placement on the display table resulted in increased borrowing of those materials. in addition to helping the library market its materials and services, mapping in-library use can provide visual evidence of the library’s value.
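the same workflow can also be approximated without mapwindow, excel, publisher, or paint. the sketch below is a hypothetical python alternative (assuming the geopandas, shapely, and matplotlib packages), not the procedure actually used for the figures in this paper: it builds a few furniture polygons by hand, attaches fictitious hourly counts like those in the attribute tables described above, sums them into a daily total (the role the field calculator played), and renders simple shaded maps.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# hand-drawn stand-ins for furniture polygons in an arbitrary floor-plan
# coordinate space (units are notional, not tied to any real building)
furniture = gpd.GeoDataFrame(
    {
        "label": ["table 1", "table 2", "computer 1", "computer 2"],
        # fictitious observation counts from sweeps at 9 and 11 a.m., 1, 3, 5, 7 p.m.
        "h09": [2, 0, 1, 1],
        "h11": [3, 1, 1, 0],
        "h13": [4, 2, 1, 1],
        "h15": [1, 3, 0, 1],
        "h17": [0, 1, 1, 1],
        "h19": [2, 2, 0, 0],
    },
    geometry=[box(0, 0, 2, 1), box(3, 0, 5, 1), box(0, 3, 1, 4), box(2, 3, 3, 4)],
)

# total use per furniture item for the sample day (the "field calculator" step)
hour_cols = ["h09", "h11", "h13", "h15", "h17", "h19"]
furniture["total"] = furniture[hour_cols].sum(axis=1)

# one shaded map per sampling time, plus one for the daily total
for col in hour_cols + ["total"]:
    ax = furniture.plot(column=col, cmap="Reds", edgecolor="black", legend=True)
    ax.set_title(f"use of tables and computers: {col}")
    plt.savefig(f"use_{col}.png")
    plt.close()

# the layer can also be written out as a shapefile for finishing in a desktop gis
furniture.to_file("furniture_use.shp")
```

scaling the sketch up to a full floor plan is mostly a matter of digitizing more polygons and adding more observation columns; the summing and mapping steps stay the same.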
public libraries often rely on reference and circulation transaction data, gate counts, and programming attendance statistics to justify their existence. these measures, although valuable and important, do not include many other ways that patrons use libraries, such as sitting and reading, studying, group work, and socializing. during “seating sweeps,” the observers can record any and all uses they observe, including any that may not have been anticipated. all of these uses could then be mapped, providing a richer picture of how a public library is used and stronger justification of the library’s value. these maps may be easier for funding and supervising authorities to understand than textual explanations or graphs and charts of statistical analyses. n conclusion from a review of the literature, it is clear that giss are increasingly being used in lis research as data-analysis and display tools. giss are being used to analyze patron and materials data, as well as in studies combining multiple uses of giss. patron analysis has included service-area-population analysis and branch-consolidation planning. analysis of library materials has been used for space management, visualizing the spatial distribution of in-library book use, and visual representation of facility-use measurements. this paper has proposed collecting in-library use data according to given and leckie’s “seating sweeps” method and visually displaying that data via a gis. examples of such visual displays were provided to facilitate the reader’s understanding of the possibilities afforded by using a gis in lis research, as well as the scalable nature of the method. librarians and library staff can produce maps similar to the examples in this paper with minimal gis training and background. the literature review and example figures offered in this paper show the capabilities of giss for analyzing and graphically presenting library-use data. giss are tools that can facilitate library facility evaluations, in-library use research, and library valuation and justification. references 1. denice adkins and denyse k. sturges, “library service planning with gis and census data,” public libraries 43, no. 3 (2004): 165–70; karen hertel and nancy sprague, “gis and census data: tools for library planning,” library hi tech 25, no. 2 (2007): 246–59; wolfgang f. e. preiser and xinhao wang, “assessing library performance with gis and building evaluation methods,” new library world 107, no. 1224–25 (2006): 193–217. 2. hertel and sprague, “gis and census data”; preiser and wang, “assessing library performance.” 3. jingfeng xia, “library space management: a gis proposal,” library hi tech 22, no. 4 (2004): 375–82; xia, “using gis to measure in-library book-use behavior,” information technology & libraries 23, no. 4 (2004): 184–91; xia, “visualizing occupancy of library study space with gis maps,” new library world 106, no. 1212–13 (2005): 219–33. 4. preiser and wang, “assessing library performance.” 5. hertel and sprague, “gis and census data”; preiser and wang, “assessing library performance.” 6. preiser and wang, “assessing library performance”; xia, “library space management”; xia, “using gis to measure”; xia, “visualizing occupancy.” 7. lisa m. given and gloria j.
leckie, “‘sweeping’ the library: mapping the social activity space of the public library,” library & information science research 25, no. 4 (2003): 365–85. 8. “jackson rejects levy to reopen libraries,” american libraries 38, no. 7 (2007): 24–25; “may levy set for jackson county libraries closing in april,” american libraries 38, no. 3 (2007): 14; “tax reform has florida bracing for major budget cuts,” american libraries 38, no. 8 (2007): 21. 9. anne morris and elizabeth barron, “user consultation in public library services,” library management 19, no. 7 (1998): 404–15; susan l. silver and lisa t. nickel, surveying user activity as a tool for space planning in an academic library (tampa: univ. of south florida library, 2002); james simon and kurt schlichting, “the college connection: using academic support to conduct public library services,” public libraries 42, no. 6 (2003): 375–78. 10. given and leckie, “‘sweeping’ the library”; christie m. koontz, dean k. jue, and keith curry lance, “collecting detailed in-library usage data in the u.s. public libraries: the methodology, the results and the impact,” in proceedings of the third northumbria international conference on performance measurement in libraries and information services (newcastle, uk: university of northumbria, 2001): 175–79; koontz, jue, and lance, “neighborhood-based in-library use performance measures for public libraries: a nationwide study of majorityminority and majority white/low income markets using personal digital data collectors,” library & information science research 27, no. 1 (2005): 28–50. 11. cheryl bryan, managing facilities for results: optimizing space for services (chicago: ala, 2007); anders c. dahlgren, public library space needs: a planning outline (madison, wis.: department of public instruction, 1988); william w. sannwald and robert s. smith, eds., checklist of library building design considerations (chicago: ala, 1988). 12. brenda dervin, “useful theory for librarianship: communication, not information,” drexel library quarterly 13, no. 3 (1977): 16–32; morris and barron, “user consultation”; preiser and wang, “assessing library performance”; simon and schlichting, “the college connection”; norman walzer, karen stott, and lori sutton, “changes in public library services,” illinois libraries 83, no. 1 (2001): 47–52. 13. bradley wade bishop, “use of geographic information systems in marketing and facility site location: a case study of douglas county (colo.) libraries,” public libraries 47, no. 5: 65–69; david jones, “people places: public library buildings for the new millennium,” australasian public libraries & information services 14, no. 3 (2001): 81–89; nolan lushington, libraries designed for users: a 21st century guide (new york: neal-schuman, 2001); shannon mattern, “form for function: the architecture for new libraries,” in the new downtown library: designing with communities (minneapolis: univ. of minnesota pr., 2007), 55–83. 14. united nations, department of economic and social affairs, statistics division, handbook on geographical information systems and mapping (new york: united nations, 2000). 15. adkins and sturges, “library service planning”; bishop, “use of geographic information systems”; hertel and sprague, “gis and census data”; christie koontz, “using geographic information systems for estimating and profiling geographic library market areas,” in geographic information systems and libraries: patrons, maps, and spatial information, ed. linda c. 
smith and mike gluck (urbana–champaign: univ. of illinois pr., 1996): 181–93; preiser and wang, “assessing library performance.” 16. christie m. koontz, dean k. jue, and bradley wade bishop, “public library facility closure: an investigation of reasons for closure and effects on geographic market areas,” library & information science research 31, no. 2 (2009): 84–91. 17. xia, “library space management”; xia, “using gis to measure”; xia, “visualizing occupancy.” 18. adkins and sturges, “library service planning.” 19. bishop, “use of geographic information systems”; jones, “people places”; koontz, jue, and lance, “collecting detailed in-library usage data”; koontz, jue, and lance, “neighborhood-based in-library use”; morris and barron, “user consultation”; simon and schlichting, “the college connection”; walzer, stott, and sutton, “changes in public library services.” 20. preiser and wang, “assessing library performance.” 21. given and leckie, “‘sweeping’ the library”; christie m. koontz, “retail interior layout for libraries,” marketing library services 19, no. 1 (2005): 3–5. 22. hertel and sprague, “gis and census data.” 23. ibid., 247. 24. xia, “library space management.” 25. xia, “using gis to measure.” 26. ibid., 186. 27. xia, “visualizing occupancy.” 28. adkins and sturges, “library service planning”; hertel and sprague, “gis and census data”; preiser and wang, “assessing library performance.” 29. hertel and sprague, “gis and census data”; preiser and wang, “assessing library performance.” 30. koontz, jue, and bishop, “public library facility closure”; preiser and wang, “assessing library performance.” 31. xia, “library space management”; xia, “using gis to measure”; xia, “visualizing occupancy.” 32. hertel and sprague, “gis and census data”; preiser and wang, “assessing library performance.” 33. given and leckie, “‘sweeping’ the library”; koontz, jue, and lance, “collecting detailed in-library usage data”; koontz, jue, and lance, “neighborhood-based in-library use”; silver and nickel, surveying user activity; jones, “people places”; lushington, libraries designed for users. 34. given and leckie, “‘sweeping’ the library.” 35. xia, “visualizing occupancy.” 36. for more information or to download mapwindow gis, see http://www.mapwindow.org/ here again, no weighting or differentiating mechanism is included in describing the multiple elements. what is addressed is the “what” problem: what is the work of or about? metadata schemas for images and art works such as vra core and cdwa focus on specificity and exhaustivity of indexing, that is, the precision and quantity of terms applied to a subject element. however, these schemas do not address the question of how much the work is of or about the item or concept represented by a particular keyword. recently, social tagging functions have been adopted in digital library and catalog systems to help support better searching and browsing. this introduces more subject terms into the system. yet again, there is typically no mechanism to differentiate between the tags used for any given item, except for only a few sites that make use of tag frequency information in the search interfaces. as collections grow and more federated searching is carried out, the absence of weights for subject terms can cause problems in search and navigation.
the following examples illustrate the problems, and the rest of the paper further reviews and discusses the precedent research and practice on weighting, and further outlines the issues that are critical in applying a weighting mechanism. for example, the dublin core metadata element set recommends the use of controlled vocabulary to represent subject in “keywords, key phrases, or classification codes.”1 similarly, the library of congress practice, suggested in the subject headings manual, is to assign “one or more subject headings that best summarize the overall contents of the work and provide access to its most important topics.”2 a topic is only “important enough” to be given a subject heading if it comprises at least 20 percent of a work, except for headings of named entities, which do not need to be 20 percent of the work when they are “critical to the subject of the work as a whole.”3 although catalogers are aware of it when they assign terms, this weight information is left out of the current library metadata schemas and practice. a similar practice applies in non-textual object subject indexing. because of the difficulty of selecting words to represent visual/aural symbolism, subject indexing for art and cultural objects is usually guided by panofsky’s three levels of meaning (pre-iconographical, iconographical, and post-iconographical), further refined by layne in “ofness” and “aboutness” in each level. specifically, what can be indexed includes the “ofness” (what the picture depicts) as well as some “aboutness” (what is expressed in the picture) in both pre-iconographical and iconographical levels.4 in practice, vra core 4.0 for example defines subject subelements as: terms or phrases that describe, identify, or interpret the work or image and what it depicts or expresses. these may include generic terms that describe the work and the elements that it comprises, terms that identify particular people, geographic places, narrative and iconographic themes, or terms that refer to broader concepts or interpretations.5 seeing the wood for the trees: enhancing metadata subject elements with weights subject indexing has been conducted in a dichotomous way in terms of what the information object is primarily about/of or not, corresponding to the presence or absence of a particular subject term, respectively. with more subject terms brought into information systems via social tagging, manual cataloging, or automated indexing, many more partially relevant results can be retrieved. using examples from digital image collections and online library catalog systems, we explore the problem and advocate for adding a weighting mechanism to subject indexing and tagging to make web search and navigation more effective and efficient. we argue that the weighting of subject terms is more important than ever in today’s world of growing collections, more federated searching, and expansion of social tagging. such a weighting mechanism needs to be considered and applied not only by indexers, catalogers, and taggers, but also needs to be incorporated into system functionality and metadata schemas. subjects as important access points have largely been indexed in a dichotomous way: what the object is primarily about/of or not. this approach to indexing is implicitly assumed in various guidelines for subject indexing. hong zhang, linda c.
smith, michael twidale, and fang huang gao hong zhang (hzhang1@illinois.edu) is phd candidate, graduate school of library and information science, university of illinois at urbana-champaign, linda c. smith (lcsmith@illinois.edu) is professor, graduate school of library and information science, university of illinois at urbana-champaign, michael twidale (twidale@illinois.edu) is professor, graduate school of library and information science, university of illinois at urbana-champaign, and fang huang gao (fgao@gpo.gov) is supervisory librarian, government printing office. ■■ examples of problems exhaustive indexing: digital library collections a search query of “tree” can return thousands of images in several digital library collections. the results include images with a tree or trees as primary components mixed with images where a tree or trees, although definitely present, are minor components of the image. figure 1 illustrates the point. these examples come from three different collections and either include the subject element of “tree” or are tagged with “tree” by users. there is no mechanism that catalogers or users have available to indicate that “tree” in these images is a minor component. note that we are not calling this out as an error in the professionally developed subject terms, nor indeed in the end-user-generated tags. although particular images may have an incorrectly applied keyword, we want to talk about the vast majority where the keyword quite correctly refers to a component of the image. furthermore, such keywords referring to minor components of the image are extremely useful for other queries. this kind of exhaustive indexing of images enables the effective satisfaction of search needs, such as looking for pictures of “buildings, people, and trees” or “trees beside a river.” with large image collections, such compound needs become more important to satisfy by combinations of searching and browsing. to enable them, metadata about minor subjects is essential. however, without weights to differentiate subject keywords, users will get overwhelmed with partially relevant results. for example, a user looking for images of trees (i.e., “tree” as the primary subject) would have to look through large sets of results such as a photograph of a dog with a tiny tree out of focus in the background. for some items that include rich metadata, such as a title or description, a person looking at the item’s record may very well determine from that context that the picture is primarily of, say, a dog instead of trees. that is, the subject elements have to be interpreted based on the context of other elements in the record to convey the “primary” and “peripheral” subjects among the listed subject terms. however, in a search and navigation system where subject elements are usually treated as context-free, search efficiency will be largely impaired because of the “noise” items and inability to refine the scope, especially when the volume of items grows. lack of weighting also limits other potential uses of keywords or tags.
for example, all the tags of all the items in a collection can be used to create a tag cloud as a low cost way to contribute to a visualization of what a collection is “about” overall.6 unfortunately, a laboriously developed set of exhaustive tags, although valuable for supporting searching and browsing within a large image collection, could give a very distorted overview of what the whole collection is about. extending our example, the tag “tree” may occur so frequently and be so prominent in the tag cloud that a user infers that this is mostly a botanical collection. selective indexing: lcsh in library catalogs although more extreme in the case of images in conveying the “ofness,” the same problem with multiple subjects also applies to text in terms of “aboutness.” the following example comes from an online library catalog in a faceted navigation web interface using library of congress subject headings in subject cataloging.7 the query “psychoanalysis and religion” returned 158 results, with 126 in “psychoanalysis and religion” under the topic facet. according to the subject headings manual, the first subject is always the primary one, while the second and others could be either a primary or nonprimary subject.8 this means that among these 126 books, there is no easy way to tell which books are “primarily” about “psychoanalysis and religion” unless the user goes through all of them. with the provided metadata, we do know that all books that have “psychoanalysis and religion” as the first subject heading are primarily about this topic, but a book that has this same heading as its second subject heading may or may not be primarily about this topic. there is no way to indicate which it is in the metadata, nor in the search interface. as this example shows, the library of congress manual involves an attempt to acknowledge and make a distinction between primary and nonprimary subjects. however in practice the attempt is insufficient to be really useful since apart from the first entry, it is ambiguous whether subsequent entries are additional primary subjects or nonprimary subjects. consequently, the search system and, further on, the users are not able to take full advantage of the care of a cataloger in deciding whether an additional subject is primary or not. other information retrieval systems the negative effect of current subject indexing without weighting on search outcomes has been identified by some researchers on particular information retrieval systems. in a study examining “the contribution of metadata to effective searching,”9 hawking and zobel found that the available subject metadata are “of little value in ranking answers” to search queries.10 their explanation is that “it is difficult to indicate via metadata tagging the relative importance of a page to a particular topic,”11 in addition to the problems in data quality and system implementation. the same problem : | zhang et al. 77seeing the wood for the trees | zhang et al. 
77 authors compared with the automatic indexing systems, because human indexers should be better at weighting the significance of subjects, and be more able to distinguish between important and peripheral compared with computers that base significance on term frequency.13 indeed, while various weighting algorithms have been used in automatic indexing systems to approximate the distinguishing function, there is simply no such mechanism built in human subject the particular page harder to find.12 a similar problem is reported in a recent study by lykke and eslau. in comparing searching by controlled subject metadata, searching based on automatic indexing, and searching based on automatic indexing expanded with a corporate thesaurus in an enterprise electronic document management system, the authors found that the metadata searches produced the lowest precision among the three strategies. the problem of indiscriminate metadata indexing is “remarkable” to the of multiple tags without weights is described: in the kinds of queries we have studied, there is typically one page (or at most a small number) that is particularly valuable. there are many other pages which could be said to be relevant to the query—and thus merit a metadata match—but they are not nearly so useful for a typical searcher. under the assumption that metadata is needed for search, all of these pages should have the relevant metadata tag, but this makes a. subject: women; books; dresses; flowers; trees; . . . in: victoria & albert museum (accessed aug. 30, 2010), http://collections.vam.ac.uk/item/014962/oil-painting-the-day-dream b. tags: japanese; moon; nights; walking; tree; . . . in: brooklyn museum (accessed aug. 30, 2010), http://www.brooklynmuseum.org/opencollections/objects/121725/aoi_slope_outside_toranomon_gate_no._113_from_ one_hundred_famous_views_of_edo c. tags: japanese; birds; silk; waterfall; tree; . . . in: steve: the museum social tagging project (accessed aug. 30, 2010), http://tagger.steve.museum/steve/object/15?offset=2 figure 1. example images with “tree” as a subject item 78 information technology and libraries | june 2011 anderson in niso tr021997.20 in addition, researchers have noticed the limitations of this dichotomous indexing. in an opinion piece, markey emphasizes the urgency to “replace boolean-based catalogs with post-boolean probabilistic retrieval methods,”21 especially given the challenges library systems are faced with today. it is the time to change the boolean, i.e., dichotomous, practice of subject indexing and cataloging, no matter whether it is produced by professional librarians, by user tagging, or by an automatic mechanism. indeed, as declared by svenonius, “while the purpose of an index is to point, the pointing cannot be done indiscriminately.”22 needed refinements in subject indexing the fact that weighted indexing has become more prominently needed over the past decade may be related to the shift in the continuum from subject indexing as representation/ surrogate to subject indexing as access points, which is consistent with the shift from a small number of subject terms to more subject terms. this might explain why the weighting practice is applied in the above mentioned medline/pubmed system. with web-based systems, social tagging technology, federated searching, and the growing number of collections producing more subject terms, to distinguish between them has become a prominent problem. 
in reviewing information users and use from the 1920s to the present, miksa points out the trend to “more granular access to informational objects” “by viewing documents as having many diverse subjects rather than one or two ‘main’ subjects,” no matter what the social and technical environment has been.23 in recognizing this theme in the future development of information organization and retrieval systems, we argue that the subject indexing mechanism subject indexing has been discussed in the research area of subject analysis for some time. weighting gives indexing an increased granularity and can be a device to counteract the effect of indexing specificity and exhaustivity on precision and recall, as pointed out by foskett: whereas specificity is a device to increase relevance at the cost of recall, exhaustivity works in the opposite direction, by increasing recall, but at the expense of relevance. a device which we may use to counteract this effect to some extent is weighting. in this, we try to show the significance of any particular specification by giving it a weight on a pre-established scale. for example, if we had a book on pets which dealt largely with dogs, we might give pets a weight of 10/10, and dogs, a weight of 8/10 or less.16 anderson also includes weighting as a part of indexing in the guidelines for indexes and related information retrieval devices (niso tr021997): one function of an index is to discriminate between major and minor treatments of particular topics or manifestations of particular features.17 he also notes that a weighting scheme is “especially useful in high-exhaustivity indexing”18 when both peripheral and primary topics are indicated. similarly, fidel lists “weights” as one of the issues that should be addressed in an indexing policy.19 metadata indexing without weighting is related to the simplified dichotomous assumption in subject indexing—primarily about/of and not primarily about/of, which further leads to the dichotomous retrieval result—retrieved and not retrieved. weighting as a mechanism to break this dichotomy is noted by metadata indexing even though human indexers are able to do the job much better than computers. weighting: yesterday, today, and future precedent weighting practices written more than thirty years ago, the final report of the subject access project describes how the project researchers applied weights to the newly added subject terms extracted from tables of contents and backof-the-book indexes. the criterion used in that project was that terms and phrases with a “ten-page range or larger” were treated as “major” ones.14 a similar mechanism was adopted in the eric database beginning in the 1960s, with indexes distinguishing “major” and “minor” descriptors as the result of indexing. while some search systems allowed differentiation of major and minor descriptors in formulating searches, others simply included the distinction (with an asterisk) when displaying a record. unfortunately, this distinguishing mechanism is no longer included in the later eric indexing data. a system using weighted indexing and searching and still running today is the medline/pubmed interface. a qualifier [majr] can be used with a medical subject headings (mesh) term in a query to “search a mesh heading which is a major topic of an article (e.g., thromboembolism[majr]).”15 in the search result page, each major mesh topic term is denoted by an asterisk at the end. 
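to make the idea concrete, the fragment below sketches what weighted subject metadata and a [majr]-style filter might look like. it is an illustration of the general mechanism argued for here, not an existing schema or system: the record structure, the 0–10 weight scale (echoing foskett’s example), the cutoff value, and the search function are all invented for this sketch.

```python
# hypothetical records: each subject term carries a weight on a 0-10 scale,
# where high values mark primary ("major") subjects and low values peripheral ones
records = [
    {"title": "a dog in the park",
     "subjects": {"dogs": 9, "parks": 6, "trees": 2}},
    {"title": "field guide to north american trees",
     "subjects": {"trees": 10, "identification": 7}},
    {"title": "psychoanalysis and religion: an introduction",
     "subjects": {"psychoanalysis and religion": 9, "freud, sigmund": 4}},
]

MAJOR_THRESHOLD = 7  # assumed cutoff separating major from minor subjects


def search(term, major_only=False):
    """return titles of matching records, ranked by the weight of the matched term."""
    hits = []
    for rec in records:
        weight = rec["subjects"].get(term)
        if weight is None:
            continue  # term not assigned at all
        if major_only and weight < MAJOR_THRESHOLD:
            continue  # analogous in spirit to pubmed's [majr] qualifier
        hits.append((weight, rec["title"]))
    # weight-aware ranking: primary treatments surface before peripheral ones
    return [title for weight, title in sorted(hits, reverse=True)]


print(search("trees"))                   # both tree-related records, best match first
print(search("trees", major_only=True))  # only the record primarily about trees
```

the same weights could drive a tag cloud or a faceted interface, so that a peripheral “tree” tag no longer counts as heavily as a primary one.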
weighting concept and the purpose of indexing the weighting concept is connected with the fundamental purpose of indexing. the idea of weighting in : | zhang et al. 79seeing the wood for the trees | zhang et al. 79 user tagging and machine generated metadata, such weighting becomes more important than ever if we are to make productive use of metadata richness and still see the wood for the trees. references 1. “dublin core metadata element set, version 1.1,” http://dublincore.org/docu ments/dces/ (accessed nov. 20, 2010). 2. library of congress, subject headings manual (washington, d.c.: library of congress, 2008). 3. ibid. 4. elaine svenonius, “access to nonbook materials: the limits of subject indexing for visual and aural languages,” journal of the american society for information science, 45, no. 8 (1994): 600–606. 5. “vra core 4.0 element description,” http://www.loc.gov/standards/vracore/ vra_core4_element_description.pdf (accessed mar. 31, 2011). 6. richard j. urban, michael b. twidale, and piotr adamczyk, “designing and developing a collections dashboard,” in j. trant and d. bearman (eds). museums and the web 2010: proceedings, ed. j. trant and d. bearman (toronto: archives & museum informatics, 2010). http://www .archimuse.com/mw2010/papers/urban/ urban.html (accessed apr. 5, 2011). 7. “vufind at the university of illinois,” http://vufind.carli.illinois.edu (accessed nov. 20, 2010). 8. library of congress, subject headings manual. 9. david hawking and justin zobel, “does topic metadata help with web search?” journal of the american society for information science & technology 58, no. 5 (2007): 613–28. 10. ibid. 11. ibid. 12. ibid, 625. 13. marianne lykke and anna g. eslau, “using thesauri in enterprise settings: indexing or query expansion?” in the janus faced scholar. a festschrift in honour of peter ingwersen, ed. birger larsen et al. (copenhagen: royal school of library & information science, 2010): 87–97. 14. subject access project, books are for use: final report of the subject access project to the council on library resources (syracuse, n.y.: syracuse univ., 1978). 15. “pubmed,” http://www.nlm.nih more than three categories or using continuous scales instead of category rating.24 subject indexing involves a similar judgment of relevance when deciding whether to include a subject term. more sophisticated scales certainly enable more useful ranking of results, but the cost of obtaining such information may rise. after the mechanism of incorporating weights into subject indexing/ cataloging is developed, guidelines should be provided for indexing practice to produce consistent and good quality. weights in both indexing and retrieval system adding weights to subject indexing/ cataloging needs to be considered and applied in three parts: (1) extending metadata schemas by encoding weights in subject elements; (2) subject indexing/cataloging with weight information; and (3) retrieval systems that exploit the weighting information in subject metadata elements. the mechanism will not work effectively in the absence of any one of them. conclusion this paper advocates for adding a weighting mechanism to subject indexing and tagging, to enable search algorithms to be more discriminating and browsing better oriented, and thus to make it possible to provide more granular access to information. such a weighting mechanism needs to be considered and applied not only by indexers, catalogers, and taggers, but also needs to be incorporated into system functionality. 
as social tagging is brought into today’s digital library collections and online library catalogs, as collections grow and are aggregated, and the opportunity arises for adding more metadata from a variety of different sources, including end should provide sufficient granularity to allow more granular access to information, as demonstrated in the examples in the previous section. potential challenges while arguing for the potential value of weights associated with subject terms, it is also important to acknowledge potential challenges posed by this approach. human judgment treating assigned terms equally might seem to avoid the additional human judgment and the subjectivity of the weight levels because different catalogers may give different weight to a subject heading. we argue that assigning subject headings is itself unavoidably subjective. we are already using professional indexers and subject catalogers to create value-added metadata in the form of subject terms. assigning weights would be a further enhancement. on the other hand, adding a weighting mechanism into metadata schemas is independent of the issue of human indexing. no matter who will do the subject indexing or tagging, either professional librarians or users or possibly computers, there is a need for weight information in the metadata records. the weighting scale in terms of the specific mechanism of representing the weight rating, we can benefit from research on weighting of index terms and on the relevance of search results. for example, the three categories of relevant, partially relevant, and nonrelevant in information retrieval are similar to the major, minor, and nonpresent subject indexing method in the examples above. borlund notes several retrieval studies proposing 80 information technology and libraries | june 2011 22. svenonius, “access to nonbook materials,” 601. 23. francis miksa, “information organization and the mysterious information user,” libraries & the cultural record 44, no. 3 (2009): 343–70. 24. pia borlund, “the concept of relevance in ir,” journal of the american society for information science & technology 54, no. 10 (2003): 913–25. 18. ibid. 19. raya fidel, “user-centered indexing,” journal of the american society for information science 45, no. 8 (1994): 572–75. 20. anderson, guidelines for indexes and related information retrieval devices, 20. 21. karen markey, “the online library catalog: paradise lost and paradise regained?” d-lib magazine 13, no. 1/2 (2007). . g o v / b s d / d i s t e d / p u b m e d t u t o r i a l / 020_760.html (accessed nov. 20, 2010). 16. a. c. foskett, the subject approach to information, 5th ed. (london: library association publishing, 1996): 24. 17. james d. anderson, guidelines for indexes and related information retrieval devices. niso-tr02–1997, http:// www.niso.org/publications/tr/tr02.pdf (accessed nov. 20, 2010): 25. 214 information technology and libraries | december 2010 margaret brown-sica, jeffrey beall, and nina mchale next-generation library catalogs and the problem of slow response time and librarians will benefit from knowing what typical and acceptable response times are in online catalogs, and this information will assist in the design and evaluation of library discovery systems. this study also looks at benchmarks in response time and defines what is unacceptable and why. 
when advanced features and content in library catalogs increase response time to the extent that users become disaffected and use the catalog less, nextgen catalogs represent a step backward, not forward. in august 2009, the auraria library launched an instance of the worldcat local product from oclc, dubbed worldcat@auraria. the library’s traditional catalog—named skyline and running on the innovative interfaces platform—still runs concurrently with worldcat@auraria. because worldcat local currently lacks a library circulation module that the library was able to use, the legacy catalog is still required for its circulation functionality. in addition, skyline contains marc records from the serialssolution 360 marc product. since many of these records are not yet available in the oclc worldcat database, these records are being maintained in the legacy catalog to enable access to the library’s extensive collection of online journals. almost immediately upon implementation of worldcat local, many library staff began to express concern about the product’s slow response time. they bemoaned its slowness both at the reference desk and during library instruction sessions. few of the discussions of the product’s slow response time evaluated this weakness in the context of its advanced features. several of the reference and instruction librarians even stated that they refused to use it any longer and that they were not recommending it to students and faculty. indeed, many stated that they would only use the legacy skyline catalog from then on. therefore we decided to analyze the product’s response time in relation to the legacy catalog. we also decided to further our study by examining response time in library catalogs in general, including several different online catalog products from different vendors. ■■ response time the term response time can mean different things in different contexts. here we use it to mean the time it takes for all files that constitute a single webpage (in the case of testing performed, a permalink to a bibliographic record) to travel across the internet from a web server to the computer on which the page is to be displayed. we do not include the time it takes for the browser to render the page, only the time it takes for the files to arrive to the requesting computer. typically, a single webpage is made of multiple files; these are sent via the internet from a web response time as defined for this study is the time that it takes for all files that constitute a single webpage to travel across the internet from a web server to the end user’s browser. in this study, the authors tested response times on queries for identical items in five different library catalogs, one of them a next-generation (nextgen) catalog. the authors also discuss acceptable response time and how it may affect the discovery process. they suggest that librarians and vendors should develop standards for acceptable response time and use it in the product selection and development processes. n ext-generation, or nextgen, library catalogs offer advanced features and functionality that facilitate library research and enable web 2.0 features such as tagging and the ability for end users to create lists and add book reviews. in addition, individual catalog records now typically contain much more data than they did in earlier generations of online catalogs. 
this additional data can include the previously mentioned tags, lists, and reviews, but a bibliographic record may also contain cover images, multiple icons and graphics, tables of contents, holdings data, links to similar items, and much more. this additional data is designed to assist catalog users in the selection, evaluation, and access of library materials. however, all of the additional data and features have the disadvantage of increasing the time it takes for the information to flow across the internet and reach the end user. moreover, the code that handles all this data is much more complex than the coding used in earlier, traditional library catalogs. slow response time has the potential to discourage both library patrons from using the catalog and library staff from using or recommending it. during a reference interview or library instruction session, a slow response time creates an awkward lull in the process, a delay that decreases confidence in the mind of library users, especially novices who are accustomed to the speed of an open internet search. the two-fold purpose of this study is to define the concept of response time as it relates to both traditional and nextgen library catalogs and to measure some typical response times in a selection of library catalogs. libraries margaret brown-sica (margaret.brown-sica@ucdenver.edu) is assistant professor, associate director of technology strategy and learning spaces, jeffrey beall (jeffrey.beall@ucdenver.edu) is assistant professor, metadata librarian, and nina mchale (nina.mchale@ucdenver.edu) is assistant professor, web librarian, university of colorado denver. next-generation library catalogs | brown-sica, beall, and mchale 215 mathews posted an article called “5 next gen library catalogs and 5 students: their initial impressions.”7 here he shares student impressions of several nextgen catalogs. regarding slow response time mathews notes, “lots of comments on slowness. one student said it took more than ten seconds to provide results. some other comments were: ‘that’s unacceptable’ and ‘slow-motion search, typical library.’” nagy and garrison, on lauren’s library blog, emphasized that any “cross-silo federated search” is “as slow as the slower silos.”8 any search interface is as slow as the slowest database from which it pulls information; however, that does not make users more likely to wait for search results. in fact, many users will not even know—or care—what is happening behind the scenes in a nextgen catalog. the assertion that slow response time makes wellintentioned improvements to an interface irrelevant is supported by an article that analyzes the development of semantic web browsers. frachtenberg notes that users, however, have grown to expect web search engines to provide near-instantaneous results, and a slow search engine could be deemed unusable even if it provides highly relevant results. it is therefore imperative for any search engine to meet its users’ interactivity expectations, or risk losing them.9 this is not just a library issue. users expect a fast response to all web queries, and we can learn from studies on general web response time and how it affects the user experience. huang and fong-ling help explain different user standards when using websites. 
their research suggests that “hygiene factors” such as “navigation, information display, ease of learning and response time” are more important to people using “utilitarian” sites to accomplish tasks rather than “hedonistic” sites.10 in other words, response time importance increases when the user is trying to perform a task— such as research—and possibly even more for a task that may be time sensitive—such as trying to complete an assignment for class. ■■ method for testing response time in an assortment of library catalogs, we used the websitepulse service (http://www .websitepulse.com). websitepulse provides in-depth website and server diagnostic services that are intended to save e-business customers time and money by reporting errors and web server and website performance issues to clients. a thirty-day free trial is available for potential customers to review the full array of their services; however, the free web page test, available at http://www.website server and arrive sequentially at the computer where the request was initiated. while the world wide web consortium (w3c) does not set forth any particular guidelines regarding response time, go-to usability expert jakob nielsen states that “0.1 second is about the limit for having the user feel that the system is reacting instantaneously.”1 he further posits that 1.0 second is “about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay.”2 finally, he asserts that: 10 seconds is about the limit for keeping the user’s attention focused on the dialogue. for longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.3 even though this advice dates to 1994, nielsen noted even then that it had “been about the same for many years.”4 ■■ previous studies the chief benefit of studying response time is to establish it as a criterion for evaluating online products that libraries license and purchase, including nextgen online catalogs. establishing response-time benchmarks will aid in the evaluation of these products and will help libraries convey the message to product vendors that fast response time is a valuable product feature. long response times will indicate that a product is deficient and suffers from poor usability. it is important to note, however, that sometimes library technology environments can be at fault in lengthening response time as well; in “playing tag in the dark: diagnosing slowness in library response time,” brown-sica diagnosed delays in response time by testing such variables as vendor and proxy issues, hardware, bandwidth, and network traffic.5 in that case, inadequate server specifications and settings were at fault. while there are many articles on nextgen catalogs, few of them discuss the issue of response time in relation to their success. search slowness has been reported in library literature about nextgen catalogs’ metasearch cousins, federated search products. in a 2006 review of federated search tools metalib and webfeat, chen noted that “a federated search could be dozens of times slower than google.”6 more comments about the negative effects of slow response time in nextgen catalogs can be found in popular library technology blogs. 
on his blog, 216 information technology and libraries | december 2010 ■■ findings: skyline versus worldcat@auraria in figure 2, the bar graph shows a sample load time for the permalink to the bibliographic record for the title hard lessons: the iraq reconstruction experience in skyline, auraria’s traditional catalog load time for the page is pulse.com/corporate/alltools.php, met our needs. to use the webpage test, simply select “web page test” from the dropdown menu, input a url—in the case of the testing done for this study, the permalink for one of three books (see, for example, figure 1)—enter the validation code, and click “test it.” websitepulse returns a bar graph (figure 2) and a table (figure 3) of the file activity from the server sending the composite files to the end user ’s web browser. each line represents one of the files that make up the rendered webpage. they load sequentially, and the bar graph shows both the time it took for each file to load and the order in which the files were received. longer segments of the bar graph provide visual indication of where a slow-loading webpage might encounter sticking points—for example, waiting for a large image file or third-party content to load. accompanying the bar graph is a table describing the file transmissions in more detail, including dns, connection, file redirects (if applicable), first and last bytes, file transmission times, and file sizes. figure 1. permalink screen shot for the record for the title hard lessons in auraria library’s skyline catalog figure 2. websitepulse webpage test bar graph results for skyline (traditional) catalog record figure 3. websitepulse webpage test table results for skyline (traditional) catalog record next-generation library catalogs | brown-sica, beall, and mchale 217 requested at items 8, 14, 15, 17, 26, and 27. the third parties include yahoo! api services, the google api service, recaptcha, and addthis. recaptcha is used to provide security within worldcat local with optical character recognition images (“captchas”), and the addthis api is used to provide bookmarking functionality. at number 22, a connection is made to the auraria library web server to retrieve a logo image hosted on the web server. at number 28, the cover photo for hard lessons is retrieved from an oclc server. the files listed in figure 6 details the complex process of web browsers’ assembly of them. each connection to third-party content, while all relatively short, allows for additional features and functionality, but lengthens overall response. as figure 6 shows, the response time is slightly more than 10 seconds, which, according to nielsen, “is about the limit for keeping the user ’s attention focused on the dialogue.”12 while widgets, third-party content, and other web 2.0 tools add desirable content and functionality to the library’s catalog, they also do slow response time considerably. the total file size for the bibliographic record in worldcat@auraria—compared to skyline’s 84.64 kb—is 633.09 kb. as will be shown in the test results below for the catalog and nextgen catalog products, bells and whistles added to traditional 1.1429 seconds total. the record is composed of a total of fourteen items, including image files (gifs), cascading style sheet (css) files, and javascript (js) files. as the graph is read downward, the longer segments of the bars reveal the sticking points. 
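the sequential, per-file timing that websitepulse reports can be approximated with a short script. the sketch below is not the websitepulse tool and ignores dns breakdowns, caching, and browser parallelism; it simply fetches a permalink, collects the image, stylesheet, and script urls that the page references, times each download in turn, and compares the total against nielsen's 0.1-, 1.0-, and 10-second thresholds quoted earlier. all function and variable names are illustrative only.

```python
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# nielsen's thresholds as quoted in the text
NIELSEN_LIMITS = [(0.1, "feels instantaneous"),
                  (1.0, "flow of thought stays uninterrupted"),
                  (10.0, "attention stays focused on the dialogue")]

class AssetCollector(HTMLParser):
    """collect src/href urls of images, scripts, and stylesheets in a page."""
    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.assets.append(attrs["href"])

def timed_get(url):
    """time a single sequential http get; rendering time is deliberately excluded."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = resp.read()
    return time.perf_counter() - start, body

def response_time(permalink):
    """sum download times for the html plus every asset it references."""
    total, html = timed_get(permalink)
    collector = AssetCollector()
    collector.feed(html.decode("utf-8", errors="replace"))
    for asset in collector.assets:
        seconds, _ = timed_get(urljoin(permalink, asset))
        total += seconds
    for limit, meaning in NIELSEN_LIMITS:
        if total <= limit:
            return total, meaning
    return total, "likely to lose the user's attention"
```

calling response_time on each permalink once a day would yield the same kind of per-record totals shown in the tables that follow, minus the per-file dns and connection detail visible in figures 3 and 6.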
in the case of skyline, the nine image files, two css files, and one js file loaded quickly; the only cause for concern is the red line at item four. this revealed that we were not taking advantage of the option to add a favicon to our iii catalog. the web librarian provided the ils server technician with the same favicon image used for the library’s website, correcting this issue. the skyline catalog, judging by this data, falls into nielsen’s second range of user expectations regarding response time, which is more than one second, or “about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay.”11 further detail is provided in figure 3; this table lists each of the webpage’s component files, and various times associated with the delivery of each file. the column on the right lists the size in kilobytes of each file. the total size of the combined files is 84.64 kb. in contrast to skyline’s meager 14 files, worldcat local requires 31 items to assemble the webpage (figure 4) for the same bibliographic record. figures 5 and 6 show that this includes 10 css files, 10 javascript files, and 8 images files (gifs and pngs). no item in particular slows down the overall process very much; the longestloading item is number 13, which is a wait for third-party content, a connection to yahoo!’s user interface (yui) api service. additional third-party content is being figure4. permalink screen shot for the record for the title hard lessons in worldcat@auraria figure 5. websitepulse webpage test bar graph results for worldcat@auraria record 218 information technology and libraries | december 2010 total time for each permalinked bibliographic record to load as reported by the websitepulse tests; this number appears near the lower right-hand corner of the tables in figures 3, 6, 9, 12, and 15. we selected three books that were each held by all five of our test sites, verifying that we were searching the same three bibliographic records in each of the online catalogs by looking at the oclc number in the records. each of the catalogs we tested has a permalink feature; this is a stable url that always points to the same record in each catalog. using a permalink approximates conducting a known-item search for that item from a catalog search screen. we saved these links and used them in our searches. the bibliographic records we tested were for these books; the permalinks used for testing follow the books: book 1: hard lessons: the iraq reconstruction experience. washington, d.c.: special inspector general, iraq reconstruction, 2009 (oclc number 302189848). permalinks used: ■■ worldcat@auraria: http://aurarialibrary.worldcat .org/oclc/302189848 ■■ skyline: http://skyline.cudenver.edu/record=b243 3301~s0 ■■ lcoc: http://lccn.loc.gov/2009366172 ■■ ut austin: http://catalog.lib.utexas.edu/record= b7195737~s29 ■■ usc: http://library.usc.edu/uhtbin/cgisirsi/ x/0/0/5?searchdata1=2770895{ckey} book 2: ehrenreich, barbara. nickel and dimed: on (not) getting by in america. 1st ed. new york: metropolitan, 2001 (oclc number 256770509). permalinks used: ■■ worldcat@auraria: http://aurarialibrary.worldcat .org/oclc/45243324 ■■ skyline: http://skyline.cudenver.edu/record=b187 0305~s0 ■■ lcoc: http://lccn.loc.gov/00052514 ■■ ut austin: http://catalog.lib.utexas.edu/record= b5133603~s29 ■■ usc: http://library.usc.edu/uhtbin/cgisirsi/ x/0/0/5?searchdata1=1856407{ckey} book 3: langley, lester d. simón bolívar: venezuelan rebel, american revolutionary. 
lanham: rowman & littlefield publishers, c2009 (oclc number 256770509). permalinks used:
■■ worldcat@auraria: http://aurarialibrary.worldcat.org/oclc/256770509
■■ skyline: http://skyline.cudenver.edu/record=b2426349~s0
■■ lcoc: http://lccn.loc.gov/2008041868
■■ ut austin: http://catalog.lib.utexas.edu/record=b7192968~s29
■■ usc: http://library.usc.edu/uhtbin/cgisirsi/x/0/0/5?searchdata1=2755357{ckey}
these added features slowed response time considerably, even doubling it in one case. are they worth it? the response of auraria's reference and instruction staff seems to indicate that they are not.
■■ gathering more data: selecting the books and catalogs to study
to broaden our comparison and to increase our data collection, we also tested three other non-auraria catalogs. we designed our study to incorporate a number of variables. we decided to link to bibliographic records for three different books in the five different online catalogs tested. these included skyline and worldcat@auraria as well as three additional online public access catalog products, for a total of two instances of innovative interfaces products, one voyager catalog, and one sirsidynix catalog. we also selected online catalogs in different parts of the country: worldcat local in ohio; skyline in denver; the library of congress' online catalog (lcoc) in washington, d.c.; the university of texas at austin's (ut austin) online catalog; and the university of southern california's (usc) online catalog, named homer, in los angeles. we also did our testing at different times of the day: one book was tested in the morning, one at midday, and one in the afternoon. websitepulse performs its webpage tests from three different locations in seattle, munich, and brisbane; we selected seattle for all of our tests.
we gathered the data for thirteen days in early november 2009, an active period in the middle of the semester. for each test, we recorded the response time total in seconds. we searched bibliographic records for three books in five library catalogs over thirteen days (3 x 5 x 13), for a total of 195 response time measurements. the websitepulse data is calculated to the ten-thousandth of a second, and we recorded the data exactly as it was presented. the data is displayed in tables 1–3.

table 1. response times for book 1 (response time in seconds)
day       worldcat   skyline   lc       ut austin   usc
1         10.5230    1.3191    2.6366   3.6643      3.1816
2         10.5329    1.2058    1.2588   3.5089      4.0855
3         10.4948    1.2796    2.5506   3.4462      2.8584
4         13.2433    1.4668    1.4071   3.6368      3.2750
5         10.5834    1.3763    3.6363   3.3143      4.6205
6         11.2617    1.2461    2.3836   3.4764      2.9421
7         20.5529    1.2791    3.3990   3.4349      3.2563
8         12.6071    1.3172    3.6494   3.5085      2.7958
9         10.4936    1.1767    2.6883   3.7392      4.0548
10        10.1173    1.5679    1.3661   3.7634      3.1165
11         9.4755    1.1872    1.3535   3.4504      3.3764
12        12.1935    1.3467    4.7499   3.2683      3.4529
13        11.7236    1.2754    1.5569   3.1250      3.1230
average   11.8310    1.3111    2.5105   3.4874      3.3953

table 2. response times for book 2 (response time in seconds)
day       worldcat   skyline   lc       ut austin   usc
1         10.9524    1.4504    2.5669   3.4649      3.2345
2         10.5885    1.2890    2.7130   3.8244      3.7859
3         10.9267    1.3051    0.2168   4.0154      3.6989
4         13.8776    1.3052    1.3149   4.0293      3.3358
5         10.6495    1.3250    4.5732   3.5775      3.2979
6         11.8369    1.3645    1.3605   3.3152      2.9023
7         11.3482    1.2348    2.3685   3.4073      3.5559
8         10.7717    1.2317    1.3196   3.5326      3.3657
9         11.1694    1.0997    1.0433   2.8096      2.6839
10        19.0694    1.6479    2.5779   4.3595      2.6945
11        12.0109    1.1945    2.5344   3.0848     18.5552
12        12.6881    0.7384    1.3863   3.7873      3.9975
13        11.6370    1.1668    1.2573   3.3211      3.6393
average   12.1174    1.2579    1.9410   3.5791      4.5190

table 3. response times for book 3 (response time in seconds)
day       worldcat   skyline   lc       ut austin   usc
1         10.8560    1.3345    1.9055   3.7001      2.6903
2         10.1936    1.2671    1.8801   3.5036      2.7641
3         11.0900    1.5326    1.3983   3.5983      3.0025
4         10.9030    1.4557    2.0432   3.6248      2.9285
5         12.3503    1.5972    3.5474   3.6428      4.5431
6          9.1008    1.1661    1.4440   3.4577      3.1080
7          9.6263    1.1240    2.3688   3.1041      3.3388
8         10.9539    1.1944    1.4941   2.8968      3.4224
9         11.0001    1.2805    1.3255   3.3644      2.7236
10        10.2231    1.3778    1.3131   3.3863      3.4885
11        10.1358    1.2476    2.3199   3.4552      2.9302
12        12.0109    1.1945    2.5344   3.0848     18.5552
13        11.5881    1.2596    2.5245   3.8040      3.8506
average   10.7717    1.3101    2.0076   3.4325      4.4112

table 4. averages (response time in seconds)
book      worldcat   skyline   lc       ut austin   usc
book 1    11.8310    1.3111    2.5105   3.4874      3.3953
book 2    12.1174    1.2579    1.9410   3.5791      4.5190
book 3    10.7717    1.3101    2.0076   3.4325      4.4112
average   11.5734    1.2930    2.1530   3.4997      4.1085

■■ results
the data shows the response times for each of the three books in each of the five online catalogs over the thirteen-day testing period. the raw data was used to calculate averages for each book in each of the five online catalogs, and then we calculated averages for each of the five online catalogs (table 4; see the averaging sketch below). the averages show that during the testing period, the average response time ranged from 1.2930 seconds for the skyline library catalog in denver to 11.5734 seconds for worldcat@auraria, which has its servers in ohio.
university of colorado denver: skyline (innovative interfaces)
as previously mentioned, the traditional catalog at auraria library runs on an innovative interfaces integrated library system (ils). testing revealed a missing favicon image file that the web server tries to send each time (item 4 in figure 3); however, this did not negatively affect the response time. the catalog's response time was good, with an average of 1.2930 seconds, giving it the fastest average time among all the test sites in the testing period. as figure 1 shows, however, skyline is a typical legacy catalog that is designed for a traditional library environment.
library of congress: online catalog (voyager)
the average response time for the lcoc was 2.0076 seconds.
figure 6. websitepulse webpage test table results for worldcat@auraria record
university of colorado denver: worldcat@auraria
worldcat@auraria was routinely over nielsen's ten-second limit, sometimes taking as long as twenty seconds to load all the files to generate a single webpage. as previously discussed, this is due to the high number and variety of files that make up a single bibliographic record. the files sent also include cover images, but they are small and do not add much to the total time. after our tests on worldcat@auraria were conducted, the site removed one of the features on pages for individual resources, namely the "similar items" feature. this feature was one of the most file-intensive on a typical page, and its removal should speed up page loads.
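the per-book and per-catalog averages in table 4 are plain means over the thirteen daily readings. a minimal bookkeeping sketch follows, assuming the raw websitepulse readings were exported to a csv with day, book, catalog, and seconds columns; this file layout is hypothetical, not the authors' actual worksheet.

```python
import csv
from collections import defaultdict
from statistics import mean

def catalog_averages(path="response_times.csv"):
    """average the recorded response times per (book, catalog) and per catalog."""
    by_book_catalog = defaultdict(list)   # (book, catalog) -> [seconds, ...]
    by_catalog = defaultdict(list)        # catalog -> [seconds, ...]
    with open(path, newline="") as f:
        for row in csv.DictReader(f):     # hypothetical columns: day, book, catalog, seconds
            seconds = float(row["seconds"])
            by_book_catalog[(row["book"], row["catalog"])].append(seconds)
            by_catalog[row["catalog"]].append(seconds)
    book_means = {key: round(mean(vals), 4) for key, vals in by_book_catalog.items()}
    overall_means = {cat: round(mean(vals), 4) for cat, vals in by_catalog.items()}
    return book_means, overall_means
```

run over the 195 recorded measurements, overall_means would reproduce the bottom row of table 4, for example roughly 11.57 seconds for worldcat@auraria and 1.29 seconds for skyline.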
however, worldcat@auraria had the highest average response time by far of the five catalogs tested. figure 7. permalink screen shot for the record for the title hard lessons in the library of congress online catalog figure 8. websitepulse webpage test bar graph results for library of congress online catalog record figure 9. websitepulse webpage test table results for library of congress online catalog record next-generation library catalogs | brown-sica, beall, and mchale 221 item 14 is a script, that while hosted on the ils server, queries amazon.com to return cover image art (figures 11–12). the average response time for ut austin’s catalog was 3.4997 seconds. this example demonstrates that response times for traditional (i.e., not nextgen) catalogs can be slowed down by additional content as well. university of southern california: homer (sirsidynix) the average response time for usc’s homer catalog was 4.1085 seconds, making it the second slowest after seconds. this was the second fastest average among the five catalogs tested. while, like skyline, the bibliographic record page is sparsely decorated (figure 7), this pays dividends in response time, as there are only two css files and three gif files to load after the html content loads (figure 9). figure 8 shows that initial connection time is the longest factor in load time; however, it is still short enough to not have a negative effect. total file size is 19.27 kb. as with skyline, the page itself (figure 7) is not particularly end-user friendly to nonlibrarians. university of texas at austin: library catalog (innovative interfaces) ut austin, like auraria library, runs an innovative interfaces ils. the library catalog also includes book cover images, one of the most attractive nextgen features (figure 10), and as shown in figure 12, third-party content is used to add features and functionality (items 16 and 17). ut austin’s catalog uses a google javascript api (item 16 in figure 12) and librarything’s catalog enhancement product, which can add book recommendations, tag browsing, and alternate editions and translations. total content size for the bibliographic record is considerably larger than skyline and the lcoc at 138.84 kb. it appears as though inclusion of cover art nearly doubles the response time; figure 10. permalink screen shot for the record for the title hard lessons in university of texas at austin’s library catalog figure 11. websitepulse webpage test bar graph results for university of texas at austin’s library catalog record figure 12. websitepulse webpage test table results for university of texas at austin’s library catalog record 222 information technology and libraries | december 2010 completed. added functionality and features in library search tools are valuable, but there is a tipping point when these features slow down a product’s response time to where users find the search tool too slow or unreliable. based on the findings of this study, we recommend that libraries adopt web response time standards, such as those set forth by nielsen, for evaluating vendor search products and creating in-house search products. commercial tools like websitepulse make this type of data collection simple and easy. testing should be conducted for an extended period of time, preferably during a peak period—i.e., during a busy time of the semester for academic libraries. we further recommend that reviews of electronic resources add response time as an worldcat@auraria, and the slowest among the traditional catalogs. 
this sirsidynix catalog appears to take a longer time than the other brands of catalogs to make the initial connection to the ils; this accounts for much of the slowness (see figures 14 and 15). once the initial connection is made, however, the remaining content loads very quickly, with one exception: item 13 (see figure 15), which is a connection to the third-party provider syndetic solutions, which provides cover art, a summary, an author biography, and a table of contents. while the display of this content is attractive and well-integrated to the catalog (figure 13), it adds 1.2 seconds to the total response time. also, as shown in item 14 and 15, usc’s homer uses the addthis service to add bookmarking enhancements to the catalog. total combined file size is 148.47 kb, with the bulk of the file size (80 kb) coming from the initial connection (item 1 in figure 15). ■■ conclusion an eye-catching interface and valuable content are lost on the end user if he or she moves on before a search is figure 13. permalink screen shot for the record for the title hard lessons in homer, the university of southern california’s catalog figure 14. websitepulse webpage test bar graph results for homer, the university of southern california’s catalog figure 15. websitepulse webpage test table results for homer, the university of southern california’s catalog next-generation library catalogs | brown-sica, beall, and mchale 223 4. ibid. 5. margaret brown-sica. “playing tag in the dark: diagnosing slowness in library response time,” information technology & libraries 27, no. 4 (2008): 29–32. 6. xiaotian chen, “metalib, webfeat, and google: the strengths and weaknesses of federated search engines compared with google,” online information review 30, no. 4 (2006): 422. 7. brian mathews, “5 next gen library catalogs and 5 students: their initial impressions,” online posting, may 1, 2009, the ubiquitous librarian blog, http://theubiquitouslibrarian .typepad.com/the_ubiquitous_librarian/2009/05/5-next-genlibrary-catalogs-and-5-students-their-initial-impressions.html (accessed feb. 5, 2010) 8. andrew nagy and scott garrison, “next-gen catalogs are only part of the solution,” online posting. oct. 4, 2009, lauren’s library blog, http://laurenpressley.com/library/2009/10/next -gen-catalogs-are-only-part-of-the-solution/ (accessed feb. 5, 2010). 9. eitan frachtenberg, “reducing query latencies in web search using fine-grained parallelism,” world wide web 12, no. 4 (2009): 441–60. 10. travis k huang and fu fong-ling, “understanding user interface needs of e-commerce web sites,” behaviour & information technology 28, no. 5 (2009): 461–69, http://www .informaworld.com/10.1080/01449290903121378 (accessed feb. 5, 2010). 11. nielsen, usability engineering, 135. 12. ibid. evaluation criterion. additional research about response time as defined in this study might look at other search tools, to include article databases, and especially other metasearch products that collect and aggregate search results from several remote sources. further studies with more of a technological focus could include discussions of optimizing data delivery methods—again, in the case of metasearch tools from multiple remote sources—to reduce response time. finally, product designers should pay close attention to response time when designing information retrieval products that libraries purchase. ■■ acknowledgments the authors wish to thank shelley wendt, library data analyst, for her assistance in preparing the test data. references 1. 
jakob nielsen, usability engineering (san francisco: morgan kaufmann, 1994): 135. 2. ibid. 3. ibid. 116 journal of library automation vol. 14/2 june 1981 tions only. they do not list the individual works that may be contained in publications. if an analytic catalog were to be built into a computerized system at some time in the future , the structure code would be a great help in the redesign, because it makes it easy to spot items that need analytics, namely those that contain embedded works, or codes 2, 4, 5, 6, 8, 9, 10, 11, and 13. a searcher working with such an analytic catalog could use the code to limit output to manageable stages-first all items of type c, for example; then broadening the search to include those of type d; and so forth, until enough relevant material has been found. the structure code would also be useful in the displayed output. if codes 5 or 8 appeared together with a bibliographic description on the screen, this would tell the catalog user that the item retrieved is a set of many separately titled documents. a complete list of those titles can then be displayed to help the searcher decide which of the documents are relevant for him. in the card catalog this is done by means of contents notes . not all libraries go to the trouble of making contents notes, though, and not all contents notes are complete and rtliable . the structure code would ensure consistency and completeness of contents information at all times. codes 10 and 13 in a search output, analogously, would tell the user that the item is a serial with individual issue titles. there is no mechanism in the contemporary card catalog to inform readers of those titles. codes 4 and 7 would tell that the document is part of a finite set, and so forth. it has been the general experience of database designers that a record cannot have too many searchable elements built into its format. no sooner is one approach abandoned "because nobody needs it," than someone arrives on the scene with just that requirement. it can be anticipated, then, that once the structure code is part of the standard record format, catalog users will find many other ways to work the code into search strategies. it can also be anticipated that the proposed structure code, by adding a factor of selectivity, will help catalogers because it strengthens the authority-control aspect of machine-readable catalog files. if two publications bear identical titles, for example, and one is of structure 1, the other of structure 6, then it is clear that they cannot possibly be the same items. however, if they are of structures 1 and 7, respectively, extra care must be taken in cataloging, for they could be different versions of the same work. determination of the structure of an item is a by-product of cataloging, for no librarian can catalog a book unless he understands what the structure of that book is-one or more works, one or more documents per item, open or closed set, and so forth . it would therefore be very cheap at cataloging time to document the already-performed structure analysis and express this structure in the form of a code. references l. herbert h. hoffman, descriptive cataloging in a new light: polemical chapters for librarians (newport beach, calif.: headway publications, 1976), p.43. revisions to contributed cataloging in a cooperative cataloging database judith hudson: university libraries , state university of new york at albany. introduction oclc is the largest bibliographic utility in the united states. 
one of its greatest assets is its computerized database of standardized cataloging information . the database, which is built on the principle of shared cataloging, consists of cataloging records input from library of congress marc tapes and records contributed by member libraries. oclc standards ln. order to provide records contributed by member libraries that are as usable as those input from marc tapes, it is imperative that the records meet the standards set by oclc and that the cataloging and formatting of the records be free of errors. member libraries are requested to follow the nationally accepted cataloging code (anglo-american cataloging rules, north american text, 1 • 2 for records input before december 12, 1980, and angloamerican cataloguing rules, second edition, 3 for records input later), the library of congress' application of the cataloging code, and the various marc formats in preparing records to be input. 4 • 5 the cataloging rules dictate what kind of bibliographic information should be included in the cataloging records, a prescribed system of punctuation that identifies the various fields of the cataloging record (international standard bibliographic description, isbd), which access points should be provided, and what form the entries should take. the marc formats provide a standardized method of identifying the various fields and subfields in a cataloging record and, through the use of indicators, information necessary to make the record easily manipulated by computers. in addition, fixed fields provide coded information about the cataloging records. the form of main, added, and series entries can be verified in the national union catalog to ensure that member libraries are following the library of congress' application of the cataloging code . by the same token, subject entries can be verified in the appropriate subject heading list (e.g., library of congress subject headings, sears subject headings, etc.). a study of oclc member cataloging a major problem with the use of contributed cataloging is the amount of revision needed to bring the records up to the standards described above. in 1975, a study of the quality of a group of membercontributed catalog records was conducted by c. c . ryans. 6 the first 700 monographic records input into oclc after september 1, 1975, to which kent state university attached its holdings were examined. 7 the analysis included changes in or additions to main, added, or series communications 117 entries, changes in descriptive cataloging, and changes in or additions to subject headings . the study dealt only with the revision of cataloging; revision of the formatting of records was not noted. the kent state study found that 393 revisions were necessary to 283 records. the remaining 417 records were considered to be acceptable, i.e., they adhered to aacr and isbd rules and to the oclc standards for input cataloging. recent developments relating to quality control since these records were studied, the internetwork quality control council was formed in 1977 by the oclc board of trustees. 8 its primary purpose is to identify problem areas regarding quality control and distribute information to networks concerning problems and solutions. its role is to promote quality control through education and by monitoring the implementation of standards. in addition, oclc' s documentation has steadily improved. 
the recent publication of the books format9 and the recent revision of the cataloging manual10 provide clear and specific information on oclc' s formatting requirements. with these developments in mind, it would seem likely that the quality of the contributed cataloging has improved since 1975. in order to test this assumption, a number of cataloging records were analyzed in an effort to replicate the kent state study. the analysis of these records differed from the earlier study in that differences in the treatment of series were not noted because one library's treatment of series can reasonably be expected to differ from that of another . methodology the records included in this study consist of 1,017 monographic catalog records to which the state university of new york at albany (sunya) library added its holding symbol during an eight-month period from november 1979 to july 1980. the records included only those that were entered into the oclc database after 1976. cataloging revisions that were noted 118 journal of library automation vol. 14/2 jun e 1981 consisted of changes in main and added entries to make them consistent with library of congress form of entry, and the inclusion of other added entries that were deemed necessary to provide adequate access to the material. in addition, corrections or additions to the imprint and the collation· were noted, as were typograph_ ical errors in all fields . subject headings that were changed to make them consistent with library of congress subject headings and subject headings and/or subdivisions added to provide better subject access to the material were also noted . analysis of cataloging cataloging revisions were required for 43 percent of the 1,017 records examined (596 changes or additions were made to 437 records). changes or additions to subject headings were made to 22.4 percent of all the records in the sunya sample, and represented the most common revision . changes in descriptive cataloging were made to 20 percent of the records, and changes or additions to main or added entries were made to approximately 16 percent of the records. table 1 compares the results of this analysis with the findings of the e arlier study . it should be emphasized that the two studies are not exactly comparable because the kent state study included differences in the treatment of series, while this study noted only typographical errors in series statements. the findings of this analysis do not bear out the hypothesis that the quality of member-contributed cataloging has improved since 1975. the overall percentage of records requiring cataloging revision is similar in both the kent state and the sunya samples . the percentage of changes made in the various areas of the cataloging records was similar, with the exception of added entries and subject headings . in the sunya sample , more revisions and additions were made to these two areas. this difference between the two samples may reflect variation in the cataloging policies of the two libraries rather than the presence or absence of more errors in member-contributed catalog records . analysis of oclc reportable errors and additions in the fall of 1979, oclc distributed its revised cataloging manual, which includes a chapter dealing with quality control. 11 the chapter delineates the errors and changes that are to be reported to oclc for correction or addition . the cataloging records examined in this study were also analyzed with these criteria in mind. 
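the reportable-error figures below are straightforward tallies, but the two percentage columns in table 2 use different denominators: one divides each category's count by the 1,017 records examined, the other by the total number of errors found. a small sketch of that arithmetic follows, with a hypothetical list of per-record error-category labels standing in for the review worksheets; the names here are illustrative only.

```python
from collections import Counter

def tally_reportable_errors(record_errors):
    """record_errors: one list of error-category labels per record examined (hypothetical input)."""
    total_records = len(record_errors)
    records_with_errors = sum(1 for errs in record_errors if errs)
    category_counts = Counter(err for errs in record_errors for err in errs)
    total_errors = sum(category_counts.values())
    report = {}
    for category, count in category_counts.items():
        report[category] = {
            "number": count,
            "pct_of_records": round(100 * count / total_records, 1),  # denominator: all records examined
            "pct_of_errors": round(100 * count / total_errors, 1),    # denominator: all errors found
        }
    return records_with_errors, total_errors, report
```

applied to the sunya worksheets, records_with_errors and total_errors would correspond to the 486 records and 661 errors reported in the analysis that follows.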
this analysis (table 2) revealed that 661 reportable errors or changes were found on 486 records (47.8 percent of all the records). reportable errors or changes included formatting errors or omissions such as incorrect assignment of tags, incorrect or missing indicators, subfield codes or fixed fields, and errors affecting retrieval or card printing. other types of errors included incorrect or omitted access points (added or subject entries, isbn, lc card numbers, etc.), errors in transcription of data, incorrect isbd, and the omission of needed bibliographic information.

table 1. comparison of two studies of cataloging revision
area needing revision or addition    kent state sample* (number / percentage)    sunya sample (number / percentage)
main entry                           44 / 6.2       46 / 4.5
title statement                      28 / 4.0       76 / 7.5
edition statement                     4 / 0.6        2 / 0.2
imprint                              29 / 4.4       64 / 6.3
collation                           111 / 15.9      58 / 5.7
series                               55 / 7.9        3 / 0.3
subject heading                      88 / 12.6     228 / 22.4
added entries                        44 / 6.2      119 / 11.7
total records in study              700 / 100.0   1017 / 100.0
records requiring revision          283 / 40.4     437 / 43.0
number of revisions made            393            596
*source: constance c. ryans, "a study of errors found in non-marc cataloging in a machine-assisted system," journal of library automation 11:128 (june 1978).

approximately 40 percent (408) of the records contained formatting errors, with over 29 percent (300) of the records containing incomplete or incorrect fixed fields. the apparent unconcern with fixed fields may stem from a lack of understanding of the value of correct fixed-field information. the recent addition of date and type of material as qualifiers in a search of the database is one example of the use of fixed fields. in order to underscore their importance, it might be useful for oclc to highlight this use of fixed fields and further explain to its members how other fixed fields might be used in online search strategies in the future. errors in or omission of access points were found in 222 records (21.8 percent). these errors were also noted in the study of cataloging revisions discussed above, as were errors in transcription of data, in isbd, and in omission of necessary bibliographic information.

table 2. errors and additions reportable to oclc
error or addition                               number   percentage of total records   percentage of total errors and additions
errors in transcription of data                   19       1.9       2.9
incorrect assignment of tags                       6       0.6       0.9
incorrect or missing subfield codes               13       1.3       2.0
incorrect assignment of 1st indicator             17       1.7       2.6
incorrect assignment of 2d indicator              59       5.8       8.9
incorrect fixed fields                           313      30.8      47.4
incorrect isbd                                     8       0.8       1.2
incorrect form of entry (less than lc)            87       8.6      13.2
errors affecting retrieval or card printing        3       0.3       0.5
bibliographic information missing                  1       0.1       0.2
addition of access points                        135      13.3      20.4
total number of records containing reportable errors or additions   486   47.8
total number of reportable errors or additions                       661            100.0

summary of findings
although the quality of the sunya sample seems equivalent to that of the kent state sample, an analysis by date of input of the records examined indicates a slight decrease in the percentage of records needing correction for those records input in 1979 and 1980 (table 3). perhaps this is the beginning of a trend toward more careful cataloging and formatting of records input by members. in summary, 589 of the 1,017 member-contributed records studied were found to require revision. of these, 486 records contained errors or omissions that may be reported to oclc, and 437 required cataloging revision.
it is discouraging to realize that approximately 60 percent of the member records used required revision. such a high percentage of records needing revision necessitates the review of all member records used if a library wishes to adhere to oclc standards for cataloging. this leads to tremendous duplication of effort and negates, in part, the purpose of shared cataloging.

table 3. yearly breakdown of catalog records
year of input   total number of records   records needing correction   percentage needing correction
1977            186                       115                          61.8
1978            332                       202                          60.8
1979            339                       184                          54.3
1980            160                        88                          55.0

influences for change
the implementation of aacr2 in 1981 provides the impetus for greater adherence to standards. since all catalogers have had to learn the new cataloging requirements, greater care may be used in the formulation of records by member libraries. the publication of clear and specific guidelines for reportable errors may help to alleviate the situation in two ways. first, the careful articulation of errors or desirable additions may impel member libraries to place more emphasis on the quality control of input. second, member libraries may report more errors, thus allowing oclc to correct the master records. a change in the method of correcting errors and the rate at which they are corrected might be beneficial. presently, errors on the master records can only be corrected by oclc or by the inputting library if it is the only library that has used the record. such an arrangement is clumsy and time-consuming. if other member libraries were trained and authorized to correct errors on master records, errors might be corrected as often as they are detected. in the long run, however, the responsibility for inputting catalog records that meet the standards for cataloging and formatting rests with the member libraries. oclc and the networks must develop methods of encouraging libraries to input records that are correctly formatted and cataloged. one way of alleviating the problem might be to develop training programs conducted by oclc or by network staff that are aimed at those libraries identified as having high error rates. another approach might be to give public recognition to libraries that contribute cataloging of high quality to the database. one example of this approach is the pittsburgh regional library council's fred award, which annually honors the library with the lowest error rate in the prlc network.12 through the use of peer pressure, the member libraries and networks of oclc can encourage adherence to the standards. in addition, they must continue to insist that oclc address this annoying, expensive, and seemingly perennial problem.

references
1. anglo-american cataloging rules, north american text (chicago: american library assn., 1967), 409p.
2. anglo-american cataloging rules, chapter 6 (rev. ed.; chicago: american library assn., 1974), 122p.
3. anglo-american cataloguing rules, second edition (chicago: american library assn., 1978), 620p.
4. oclc, inc., cataloging: user manual (columbus: oclc, 1979), 1v. (looseleaf).
5. oclc level i and level k input standards (columbus: ohio college library center, 1977), 1v. (looseleaf).
6. constance c. ryans, "a study of errors found in non-marc cataloging in a machine-assisted system," journal of library automation 11:125–32 (june 1978).
7. ibid., p. 127.
8. frederick g.
kilgour, "establishment of inter-network quality control council" (unpublished document, ohio college library center, 1977), 2p. 9. oclc, inc., books format (columbus: oclc, 1980), 1v. (looseleaf). 10. oclc, inc., cataloging: user manual, 1v. (looseleaf) . 11. ibid. 12. "prlc peer council cites pittsburgh theological seminary library for high cataloging standards," oclc newsletter 131:4 (sept. 1980). participatory networks | lankes, silverstein, and nicholson 17 author id box for 2 column layout column title editor the goal of the technology brief is to familiarize library decision-makers with the opportunities and challenges of participatory networks. in order to accomplish this goal the brief is divided into four sections (excluding an overview and a detailed statement of goal): ■ a conceptual framework for understanding and evaluating participatory networks; ■ a discussion of key concepts and technologies in participatory networks drawn primarily from web 2.0 and library 2.0; ■ a merging of the conceptual framework with the technological discussion to present a roadmap for library systems development; and ■ a set of recommendations to foster greater discussion and action on the topic of participatory networks and, more broadly, participatory librarianship. this summary will highlight the discussions in each of these four topics. for consistency, the section numbers and titles from the full brief are used. k nowledge is created through conversation. libraries are in the knowledge business. therefore, libraries are in the conversation business. some of those conversations span millennia, while others only span a few seconds. some of these conversations happen in real time. in some conversations, there is a broadcast of ideas from one author to multiple audiences. some conversa­ tions are sparked by a book, a video, or a web page. some of these conversations are as trivial as directing someone to the bathroom. other conversations center on the foun­ dations of ourselves and our humanity. it may be odd to start a technology brief with such seemingly abstract comments. yet, without this firm, if theoretical, footing, the advent of web 2.0, social net­ working, library 2.0, and participatory networks seems a clutter of new terminology, tools, and acronyms. in fact, as will be discussed, without this conceptual footing, many library functions can seem disconnected, and the field that serves lawyers, doctors, single mothers, and eight­year olds (among others) fragmented. the scale of this technology brief is limited; it is to present library decision­makers with the opportunities and challenges of participatory networks. it is only a single piece of a much larger puzzle that seeks to pres­ ent a cohesive framework for libraries. this framework not only will fit tools such as blogs and wikis into their offerings (where appropriate), but also will show how a more participatory, conversational approach to libraries in general can help libraries better integrate current and future functions. think of this document as an overview or introduction to participatory librarianship. readers will find plenty of examples and definitions of web 2.0 and social networking later in this article. however, to jump right into the technology without a larger frame­ work invites the rightful skepticism of a library organiza­ tion that feels constantly buffeted by new technological advances. 
in any environment with no larger conceptual founding, to measure the importance of an advance in technology or practice selection of any one technology or practice is nearly arbitrary. without a framework, the field becomes open to the influence of personalities and trendy technology. therefore, it is vital to ground any technological, social, or policy conversation into a larger, rooted concept. as susser said, “to practice without theory is to sail an uncharted sea; theory without practice is not to set sail at all.”1 for this paper, the chart will be conversation theory. the core of this article is in four sections: ■ a conceptual framework for understanding and eval­ uating participatory networks; ■ a discussion of key concepts and technologies in par­ ticipatory networks drawn primarily from web 2.0 and library 2.0; ■ a merging of the conceptual framework with the technological discussion to present a sort of roadmap for library systems development; and ■ a set of recommendations to foster greater discussion and action on the topic of participatory networks and, more broadly, participatory librarianship. it is recommended that the reader follow this order to get the big picture; however, the second section should be a useful primer on the language and concepts of partici­ patory networks. ■ library as a facilitator of conversation let us return to the concept that knowledge is created through conversation. this notion stretches back to socrates and the socratic method. however, the specific foundation for this statement comes from conversation theory, a means of explaining cognition and how people learn.2 it is not the purpose of this article to provide a r. david lankes (jdlankes@iis.syr.edu) is director and associate professor, joanne silverstein (jlsilver@iis.syr.edu) is research professor, and scott nicholson (scott@scottnicholson.com) is associate professor at the information institute of syracuse, (n.y.) syracuse university’s school of information studies. participatory networks: the library as conversation r. david lankes, joanne silverstein, and scott nicholson 18 information technology and libraries | december 200718 information technology and libraries | december 2007 detailed description of conversation theory, a task already admirably accomplished by pask. rather, let us use the theory as a structure upon which to hang an exploration of participatory networking and, more broadly, participa­ tory librarianship. the core of conversation theory is simple: people learn through conversation. different communities have different standards for conversations, from the scientific community’s rigorous formalisms, to the religious com­ munity’s embedded meaning in scripture, to the some­ times impenetrable dialect of teens. the point remains, however, that different actors establish meaning through determining common definitions and building upon shared concepts. the library has been a place where we facilitate con­ versations, though often implicitly. the concept of learn­ ing through conversation is evidenced in libraries in such large initiatives as information literacy and teaching criti­ cal thinking skills (using such meta­cognitive approaches as self­questioning), and in the smaller events of book groups, reference interviews, and speaker series. library activities such as building collections of artifacts (the tan­ gible products of conversation) inform scholars’ research through a formal conversation process where ideas are supported with evidence and methods. 
similarly, pres­ ervation efforts, perhaps of wax cylinders with spoken word content or of ancient maps that embody an ongo­ ing dialogue about the shape and nature of the physical world, seek to save, or at least document, important conversations. common use of the word “conversation” is com­ pletely in accordance with the use of the term in conver­ sation theory. the term is, however, more specifically defined as an act of communication and agreement between a set of agents. so, a conversation can be between two people, two organizations, two countries, or even within an individual. how can a conversation take place within an individual? educators and school librarians may be familiar with the term “metacogni­ tion,” or the act of reflecting on one’s learning.3 yet, even the most casual reader will be familiar with the concept of debating oneself (“if i go right, i’ll get there faster, but if i go left i can stop by jim’s . . .”). the point is that a conversation is with at least two agents trying to come to an understanding. also note that those two agents can change over time. so, while socrates and plato are dead, the conversation they started about the nature of knowl­ edge and the world is carried forward by new genera­ tions of thinkers—same conversation, different agents. people converse, organizations converse, states con­ verse, societies converse. the requirements, in the terms of conversation theory, are two cognitive systems seek­ ing agreement. the results of these conversations, what pask would call “cognitive entanglements,” are books, videos, and artifacts that either document, expand, or result from conversations.4 so, while one cannot con­ verse with a book, that book certainly can be a starting point for many conversations within the reader and within a larger community. if the theory is that conversation creates knowledge, the library community has added a corollary: the best knowl­ edge comes from an optimal information environment, one in which the most diverse and complete information is available to the conversant(s). library ethics show an implicit understanding of this corollary in the advocacy of intellectual freedom and unfettered access. libraries seek to create rich environments for knowledge and have taken the stance that they are not in the job of arbitrating the conversations that occur or the appropriateness of the information used to inform those conversations. as will be discussed later, this belief in openness of conversations will have some far­reaching implications for the library collec­ tion and is an ideal that can never truly be met. for now, the reader may take away that conversation theory is very much in line with current and past library practice, and it also shows a clear trajectory for the future. this viewpoint’s value is not just theoretical; it has real consequences and uses. for example, much of library evaluation has been based on numeric counts of tangible outputs: books circulated, collection size, reference transactions, and so on. yet this quantitative approach has been frustrating to many who feel they are count­ ing outcomes but not getting at true impact of library service. librarians may ask themselves, “which num­ bers are important . . . and why?” if libraries focused on conversations, there might be some clarity and cohesion between statistics and other outcomes. suddenly, the number of reference questions can be linked to items cat­ aloged or to circulation numbers . . . 
they are all markers of the scope and scale of conversations within the library context. this approach might enable the library com­ munity to better identify important conversations and demonstrate direct contributions to these conversations across functions. for example, a school district identifies early literacy as important. there is a discussion about public policy options, new programs, and school goals to achieve greater literacy in k–5. the library should be able to track two streams in this conversation. the first is the one libraries are accustomed to counting; that is, the library’s contribution to k–5 literacy (participation in book talks, children’s events, circulation of children’s books, reference questions, and so on). but the library also can document and demonstrate how it furthered the conversation about children’s literacy in general. it could show the resources provided to community offi­ cials. it could show the literacy pathfinders that were created. the point of this example is that the library is both participant in the conversation (what we do to pro­ mote early literacy) and facilitator of conversation (what we do to promote public discourse). article title | author 19participatory networks | lankes, silverstein, and nicholson 19 the theoretical discussion leads us to a discussion about the second topic of this technology brief: pragmatic aspects of the knowledge as conversation approach, or a participatory approach, as it will be called. as new technologies are developed and deployed in the current environment of limited resources, there must be some means of evaluating their utility. a technology’s util­ ity is appropriately measured against a given library’s mission, which is, in turn, developed to respond to the needs of the community that library serves. first, how­ ever, let us identify some of the new technologies and describe them briefly. ■ participatory networking, social networks, and web 2.0 let us now move from the theoretical to the opera­ tional. the impetus behind this article is the relatively recent emergence of a new group of internet services and capabilities. suddenly, terms such as wiki, blog, mashup, web 2.0, and biblioblogosphere have become commonplace. as with any new wave of technological creation, these terms can seem ambiguous. they also come wrapped in varying amounts of hype. they may all, however, be grouped under the phenomenon of par­ ticipatory networking. while we now have a conceptual framework to evaluate these technologies that support participatory networking (for example, do they further conversa­ tions), we still need to know the basics of the terminol­ ogy and technologies. this section outlines key concepts in the pragmatics of participatory networking. the section after this one will join the theoretical and operational to outline key chal­ lenges and opportunities for the library world. we begin with web 2.0. web 2.0 much of what we call participatory networking, at least the technological foundation of it, stems from developments in web 2.0.5 as with many buzzwords, the exact definition of web 2.0 is not clear. it is more an aggregation of concepts that range from software development (loosely coupled application programming interfaces [apis] and the ease of incorporating features across platforms) to abstrac­ tions (the user is the content). what pervades the web 2.0 approach is the notion that internet services are increas­ ingly facilitators of conversations. 
the following sections describe some of the characteristics of web 2.0. web 2.0 characteristic: social networks a core concept of web 2.0 is that people are the content of sites; that is, a site is not populated with information for users to consume. instead, services are provided to individual users for them to build networks of friends and other groups (professional, recreational, and so on). the content of a site, then, comprises user­provided infor­ mation that attracts new members of an ever­expanding network. examples include: ■ flickr. flickr (www.flickr.com) provides users with free web space to upload images and create photo albums. users then can share these photos with friends or with the public at large. flickr facilitates the creation of shared photo galleries around themes and places. ■ the cheshire public library. the teen book blog (http://cpltbb.wordpress.com) at the cheshire public library offers book reviews created only by the stu­ dents who use the library. ■ memorial hall library. the memorial hall library in andover, massachusetts, offers podcasts of poetry contests in which the content is created by students (www.mhl.org/teens/audio/index.htm). ■ libraries in myspace. myspace searches show that there are myspace sites for hundreds of individual libraries and scores of library groups. alexandrian public library (apl), for example, has established a site at myspace (www.myspace.com/teensatapl). this practice is growing among public libraries and is an attempt to reach out to users in their preferred online environments. in this venue, the more friends a library’s myspace site has, the more successful it may be considered. as of this writing, apl had sev­ enty­five friends and fifteen comments. the brooklyn college library had 2,195 friends and 270 comments. web 2.0 characteristic: wisdom of crowds there has been some research into the quality of mass decision­making.6 that research shows how remarkably accurate groups are in their judgments. web 2.0 pools large groups of users to comment on decisions. this aggregation of input is facilitated by the ready availabil­ ity of social networking sites. certainly, this approach of community organization and verification of knowledge also has its detractors. many, for example, question the wisdom seen in some entries of wikipedia. yet, recent articles have compared this mass editing process favor­ ably to traditional sources of information, such as the encyclopedia britannica.7 examples include: ■ ebay. ebay has perhaps the most studied and copied community policing and reputation systems. all buyers and sellers can be rated. the aggregation of many users’ experiences create a feedback score that is equivalent to a group credibility rating (see figure 1). these kinds of group feedback systems can now be seen in most major internet retailers. ■ librarything. librarything.com makes book recom­ 20 information technology and libraries | december 200720 information technology and libraries | december 2007 mendations based on the collective intelligence of all users of the site. the greater the pool of collective intelligence, the more information available to the user for decision­making. ■ the diary project. the diary project library (www. diaryproject.com) is a non­profit organization that encourages teens to write about their day­to­day experiences growing up. 
the goal of this site is to encourage communication among teens of all cul­ tures and backgrounds, provide peer­to­peer support, stimulate discussion, and generate feedback that can help ease some of the concerns teens encounter along the way and let them know that they are not alone. to that end, the site comprises thousands of entries in twenty­four categories. because of the great number of entries, most youth can find helpful materials. web 2.0 characteristic: loosely coupled apis an api provides a set of instructions (messages) that a programmer can use to communicate between applica­ tions. apis allow programmers to incorporate one piece of software they may not be able to directly manipulate (code) into another. for example, google maps has made a public api that allows web page designers to include satellite images into their web pages with little more than a latitude and longitude.8 apis vary in their ease of integration. loosely coupled apis allow for very easy integration using high­level scripting languages such as javascript9. examples include: ■ google maps. google maps displays street or sat­ ellite maps showing markers on specific locations provided by an external source with simple sets of longitudes and latitudes. it becomes extremely easy to create geographic information systems with little knowledge of gis principles. ■ flickr. flickr provides easy means to integrate hosted images into other web pages or applications (as with a google map that shows images taken at a specific location). ■ youtube. youtube (www.youtube.com) provides users with the capability to upload and comment upon video on the internet. it also allows for easy integration of the videos into other web pages and blogs. with a simple line of html code, anyone can access streaming video for their content. web 2.0 characteristic: mashups mashups are combinations of apis and data that result in new information resources and services.10 this ease of incorporation has led to an assumption of a “right to remix.” in the world of open source software and the creative commons, the right to remix refers to a grow­ ing expectation among internet users that they are not limited by the interfaces and uses presented to them by a single organization. examples include: ■ chicagocrime.org. an often­cited example of a mashup is chicagocrime.org, which uses google maps to plot crime data for the city of chicago. users can now see exactly which street corner had the most murders. figure 2 shows a marker at the location of every homicide in chicago from november 2, 2005, to august 2, 2006. ■ book burro. book burro (http://bookburro.org/ about.html) “is a web 2.0 extension for firefox and flock. when it senses you are looking at a page that contains a book, it will overlay a small panel which when opened lists prices at online bookstores such as amazon, buy, half (and many more) and whether the book is available at your library.” ■ library lookup. the mit library lookup greasemonkey script for firefox (http://libraries. mit.edu/help/lookup.html) searches mit’s barton catalog from an amazon book screen. web 2.0 characteristic: permanent betas the concept of a permanent beta is, in part, a realization that no software is ever truly complete so long as the user community is still commenting upon it. for example, google does not release services from beta until it has achieved a sufficient user base, no matter how fixed the underlying source code is.11 permanent beta also is a design strategy. 
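to make the loosely coupled api and mashup ideas above concrete, here is a minimal sketch of the glue code involved: one service supplies data (a hypothetical city crime feed), another supplies presentation (a hypothetical MapWidget standing in for any mapping provider's embeddable component), and a few lines combine them. the feed url, MapWidget, and addMarker are illustrative placeholders, not a real vendor api; the point is the pattern, not the product.

```typescript
// a minimal mashup: plot externally hosted incident data on a mapping provider's widget.
// MapWidget and the feed url are hypothetical stand-ins for any loosely coupled api.

interface Incident {
  description: string;
  latitude: number;
  longitude: number;
}

// stand-in for an embeddable map component; a real mashup would use the provider's own api here
class MapWidget {
  constructor(private containerId: string) {}
  addMarker(lat: number, lng: number, label: string): void {
    console.log(`[${this.containerId}] marker at ${lat}, ${lng}: ${label}`);
  }
}

async function buildCrimeMap(feedUrl: string): Promise<void> {
  // fetch raw data from one service...
  const response = await fetch(feedUrl);
  const incidents: Incident[] = await response.json();

  // ...and hand each record to another service's display component
  const map = new MapWidget("map-container");
  for (const incident of incidents) {
    map.addMarker(incident.latitude, incident.longitude, incident.description);
  }
}

buildCrimeMap("https://example.org/city-crime-feed.json").catch(console.error);
```

that so little glue code is required is exactly why the "right to remix" expectation has taken hold.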
as a design strategy, permanent beta means that large applications are broken into smaller constituent parts that can be manipulated separately. this allows large applications to be continually developed by a more diverse and distributed community (as in open source).

figure 1. a seller's profile shows a potential buyer the ebay community's current estimation of a seller's credibility.

examples include:

■ google labs. google has a site named "google labs" (http://labs.google.com) that puts out company-generated tools and services. in fact, part of a google employee's work time is dedicated to creating the resources and tools through personal projects and exploration. these tools and services remain a part of the "lab" until they are finished and have sufficient user bases. projects (see figure 3) range from the simple (google suggest, which provides a dropdown box of possible search queries as you begin to type your search terms) to the extensive (google maps, which started as a google lab project).

■ mit libraries. the mit libraries are experimenting with new technologies to help make access to information easier. the tools below are offered to the public with an appeal for feedback and for additional tools, and there is a permanent address designed just to collect feedback on the beta-phase tools, which include:

■ the new humanities virtual browsery, which highlights new books and incorporates an rss feed, the ability to comment on books, links to book reviews, availability information, and links to other books by the same author.

■ the libx—mit edition (http://libraries.mit.edu/help/libx.html), which is a firefox toolbar that allows users to search the barton catalog, vera, google scholar, the sfx fulltext finder, and other search tools; it embeds links to mit-only resources in amazon, barnes & noble, google scholar, and nyt book reviews.

■ the dewey research advisor business and economics q&a (http://libraries.mit.edu/help/dra.html), which provides starting points for specific research questions in the fields of business, management, and economics.

web 2.0 characteristic: software gets better the more people use it

an increasing number of web 2.0 sites emphasize social networks, where these services gain value only as they gain users. malcolm gladwell recounts this principle and the work of kevin kelly with an earlier telecommunications network, the network of fax machines connected to the phone system:

the first fax machine ever made . . . cost about $2,000 at retail. but it was worth nothing because there was no other fax machine for it to communicate with. the second fax machine made the first fax machine more valuable, and the third fax made the first two more valuable, and so on. . . . when you buy a fax machine, then, what you are really buying is access to the entire fax network—which is infinitely more valuable than the machine itself.12

with social networking sites, and all sites that seek to capitalize on user input (reviews, annotations, profiles, etc.), the true value of each site is defined by the number of people it can bring together. a classic example of this characteristic is amazon. amazon sells books and other merchandise, but, in reality, amazon is very much about the marketing of information. amazon gains tremendous value by allowing its users to review and rate items. the more people use amazon and the more they comment, the more visibility these active users gain and the more credibility markers they take on.
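the fax-machine principle quoted above can be made concrete with a little arithmetic: a network of n compatible machines supports n(n-1)/2 possible pairwise connections, so every new machine adds value to all of the machines already there. a throwaway sketch:

```typescript
// number of distinct pairwise connections possible among n compatible machines
const connections = (n: number): number => (n * (n - 1)) / 2;

for (const n of [1, 2, 3, 10, 100]) {
  console.log(`${n} machines -> ${connections(n)} possible connections`);
}
// 1 -> 0, 2 -> 1, 3 -> 3, 10 -> 45, 100 -> 4950: the second machine creates the first
// connection, and every later machine adds more value than the one before it.
```

the jump from 45 connections at ten machines to 4,950 at a hundred is why sites built on user input chase membership so aggressively.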
web 2.0 characteristic: folksonomies a folksonomy is a classification system created in a bottom­up fashion with no central coordination. this differs from the deductive approach of such classifica­ tions systems as the dewey decimal system, where the world of ideas is broken into ten nominal classes.13 it also differs from other means of developing classifications where some central authority determines if a term should be included. in a folksonomy, the members of a group simply attach terms (or tags) to items (such as photos or blog postings), and the aggregate of these terms is seen as the classification. what emerges is a classification scheme that prioritizes common usage (the most­used tags) over semantic clarity (if most people use “car,” but some use “cars,” they are seen as different terms, and the tag “auto­ mobile” has no real relationship within the aggregate classification). examples include: figure 2: screenshot of chicagocrime.org 22 information technology and libraries | december 200722 information technology and libraries | december 2007 ■ penntags. penntags (http://tags.library.upenn.edu/ help) is a social bookmarking tool for locating, orga­ nizing, and sharing one’s favorite online resources. members of the penn community can collect and maintain urls, links to journal articles, and records in franklin, the online catalog, and vcat, the online video catalog. once resources are compiled, users can organize them by assigning tags (free­text key­ words) or by grouping them into projects according to specific preferences. penntags also can be used collaboratively, as it acts as a repository of the varied interests and academic pursuits of the penn com­ munity, and a user can find topics and other users related to his or her own favorite online resources. ■ hillsdale teen library. the hillsdale teen library (www.flickr.com/photos/hillsdalelibraryteens) uses flickr to post pictures of events at the hillsdale teen library (figure 4). the resulting tag view is repre­ sented in figure 5. these tags allow users to easily retrieve the images in which they are interested. there are more characteristics of web 2.0, but these give some overall concepts. core new technologies: ajax and web services as we have just discussed, web 2.0 is little more than set of related concepts, albeit with a lot of value being currently attached to these concepts. these concepts are supported by two underlying technologies that have facilitated web 2.0 development and brought a substantially new (and improved) user experience to the web. the first is ajax, which allows a more desktop­like experience for users. the second is the advent of web services. these technolo­ gies are not necessary for web 2.0 concepts, but they have made web 2.0 sites much more compelling. ajax ajax stands for asynchronous javascript and xml.14 it is a set of existing web technologies brought together. at the most basic, ajax allows a browser (the part the user interacts with) and a server (where the data resides) to send data back and forth without needing to refresh the entire web page being worked on. think about the web sites you work with. you click on a link, the browser freezes and waits for the data, then draws it on the screen. early versions of such sites as mapquest would show a map. if you wanted to zoom into the map, you would press a zoom icon and wait while the new map, and the rest of the web page was redrawn. 
compare this to google maps, where you click in the middle of a map and drag left or right and the map moves dynamically. we are used to this kind of interaction in desktop applications. click and drag has become second nature on the desktop, and ajax is making it second nature on the web, too. another ajax advantage is that it is open and requires only light programming skills. javascript on the client and almost any server­side scripting language (such as active server pages or php) are easily accessible languages. this fact allows for both fast development and easier integration with existing systems. as an example, it should now be easier to bring more interactive web interfaces to existing online catalogs. web services web services allow for software­to­software interactions on the web.15 using web protocols and xml, applications exchange queries and information in order to facilitate the larger functioning of a system. one example would be a system that uses an isbn number to query multiple online catalogs and commercial vendors for availability (and price) of a book. this simple process might be part of a much larger library catalog that shows users a book and its availability. the point is, that unlike federated search systems such as z39.50, web services are small. they also tend to be lightweight (that is, limited in what they do), and are aggregated for greater functionality. this is the technological basis for the loosely coupled apis dis­ cussed previously. library 2.0 library 2.0 is a somewhat diffuse concept. walt crawford, in his extended essay “library 2.0 and ‘library 2.0,’” found sixty­two different (and often contradictory) views and seven distinct definitions of library 2.0.16 it is no wonder that people are confused. however, it is natural for emerging ideas and groups to function in an environ­ figure 3: screenshot of current google lab projects article title | author 23participatory networks | lankes, silverstein, and nicholson 23 ment of high ambiguity. for use in this technology brief, the authors see library 2.0 as an attempt to apply web 2.0 concepts (and some longstanding beliefs for greater com­ munity involvement) to the purpose of the library. in the words of ormsby, “the purpose of a library is not to . . . showcase new gadgetry . . . ; rather, it is to make possible that instant of insight when all the facts come together in the shape of new knowledge.”17 in the case of library 2.0, the new gadgetry discussed in the previous section comprises a group of software applications. how the applications are used will determine whether they support ormsby’s “instant of insight.” many libraries and librarians already are pursuing this goal. some, for instance, are using blogs to reach other librarians, their own users (on their own web sites), and potential users (using myspace and other online communities). they are using wikis to deliver reports, teach information literacy, and serve as repositories. one has developed an api that allows wordpress posts to be directly integrated into a library catalog. clearly, the internet and newer tools that empower users seem to be aligned with the library mission. after all, librarians blogging and allowing the catalog to be mashed up can be seen as an extension of current information services. but this abundance of new applications poses a challenge. given the speed with which new tools are invented, librarians may find it difficult to create strate­ gies that include all the desired services that they make possible. 
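to ground the ajax and web services descriptions above, the sketch below shows the two working together: a small browser script asks a catalog availability service about an isbn and updates a single element of the page, with no full refresh. the /availability endpoint and its response shape are assumptions made for illustration; this is the pattern described above, not any particular vendor's interface.

```typescript
// ask a (hypothetical) availability web service about an isbn and update one element in place
interface Availability {
  available: boolean;
  location: string;
}

async function showAvailability(isbn: string): Promise<void> {
  const statusElement = document.getElementById("availability-status");
  if (!statusElement) return;

  // the asynchronous request: the rest of the page stays put while we wait
  const response = await fetch(`/availability?isbn=${encodeURIComponent(isbn)}`);
  const info: Availability = await response.json();

  // update only the one element that changed, with no full page refresh
  statusElement.textContent = info.available
    ? `on the shelf at ${info.location}`
    : "checked out; place a hold?";
}

showAvailability("9780385504201").catch(console.error);
```

small services like this are cheap to assemble, which is precisely what makes the management question raised next so pressing.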
for every new application that becomes available, library administrators must decide whether it can serve the library, how to use it, and how to find additional resources to manage it (for example, "now we can do this. but why should we?"). this problem stems from focusing excessively on the technology. librarians should instead focus on the phenomena made possible by the technology. the most important of these phenomena is that the library invites participation. as chad and miller state:

library 2.0 facilitates and encourages a culture of participation, drawing upon the perspectives and contributions of library staff, technology partners and the wider community. library 2.0 means harnessing this type of participation so that libraries can benefit from increasingly rich collaborative cataloguing efforts, such as including contributions from partner libraries as well as adding rich enhancements, such as book jackets or movie files, to records from publishers and others. library 2.0 is about encouraging and enabling a library's community of users to participate, contributing their own views on resources they have used and new ones to which they might wish access. with library 2.0, a library will continue to develop and deploy the rich descriptive standards of the domain, whilst embracing more participative approaches that encourage interaction with and the formation of communities of interest.18

the carte blanche statement that user participation in the library is "good," however, is insufficient. library administrators must ask, "what is the ultimate goal?"

in summary, current initiatives in the library world to bring the tools of web 2.0 to the service of library 2.0 are exciting and innovative, and, more to the point, they are supportive of the library's purpose. they may, however, incur costs, such as monitoring blogs and wikis, and creating content and corresponding with users, that stretch already inadequate resources even further.

figure 4: hillsdale teen library
figure 5: hillsdale teen library flickr site

ultimately, the value of library 2.0 concepts requires us to answer some important questions: will they be used to further knowledge, or will they simply create more work for librarians? what does the next version of library 2.0 look like? is its mission the same, and only the tools different? what makes the library different from myspace? simply its legacy? should we incorporate new services into the current library offerings? how do we, as facilitators of conversations, point the way to the next generation of libraries? it is hoped that some of the concepts in participatory librarianship may answer these questions and help further the innovations of the library 2.0 community.

participatory networks

the authors use the phrase "participatory networking" to encompass the concept of using web 2.0 principles and technologies to implement a conversational model within a community (a library, a peer group, the general public, and so on). why not simply adopt social networking, web 2.0, or library 2.0 for that matter? let us examine each term's limitations:

■ social networking: social network sites such as myspace and facebook have certainly captured public attention. they also have proven very popular.
in their short life spans, these sites have garnered an immense audience (myspace has been ranked one of the top destination sites on the web) and drawn much atten­ tion from the press.19 some of that attention, however, has been very negative. myspace, for example, has been typified as a refuge for pedophiles and online predators. even the television show saturday night live has parodied the site for the ease with which users can create false personas and engage in risky online behaviors.20 to say you are starting a social networking site in your library may draw either enthusiastic support, vehement opposition (“social networking experiment in my library?!”), or simply confused looks. add to the potential negative con­ notations the ambiguity of the term. is a blog a social networking site? is flickr? to compound this confu­ sion, the academic domain of social network theory predates myspace by about a decade. ■ web 2.0: ambiguity also dogs the web 2.0 world. for some, it is technology (blogs, ajax, web ser­ vices, and so on). for others, it is simply a buzzword for the current crop of internet sites that survived the burst of the dot­com bubble. in any case, web 2.0 certainly implies more than just the inclusion of users in systems. ■ library 2.0: as stated before, the term library 2.0 is a vague term used by some as a goad to the library community. further, this term limits the discussion of user­inclusive web services to the library world. while this brief focuses on the library community, it also sees the library community as a potential leader in a much broader field. so, ultimately, the authors propose “participatory net­ working” as a positive term and concept that libraries can use and promote without the confusion and limitations of previous language. the phrase “participatory network” also has a history of prior use that can be built upon. it represents systems of exchange and integration and has long been used in discussions of policy, art, and government.21 the phrase also has been used to describe online communities that exchange and integrate information. ■ libraries as participatory conversations so where are we? we started with the abstract statement that knowledge is created through conversation. we then looked at the current landscape of technologies that can facilitate these conversations and showed examples of how libraries, other industries, and individuals are using these technologies. in this section we combine the larger framework with the technologies to see how libraries can incorporate participatory networks to further their knowledge mission. participatory librarianship in action let us look specifically at how participatory networks can be used in the library’s role as facilitator of knowledge through conversation. an obvious example is libraries hosting blogs and wikis for their communities, creat­ ing virtual meeting spaces for individuals and groups. indeed, these are increasingly useful functions for librar­ ies. they meet a perceived need in the community and can generate excitement both within the library and in the community. the idea of creating online sites for individu­ als and organizations makes sense for a library, although it is not without difficulties (see the section on challenges and opportunities). libraries also could use freely avail­ able (and increasingly easy to implement) open source software to create library versions of wikipedia (with or without enhanced editorial processes). 
another way for libraries to offer these services would be through a cooperative or other third­party vendor. such a service easily can be seen as a knowledge management activity capturing and providing local expertise while linking this expertise to that produced at other libraries. another reason for libraries to engage in participatory networking is that one library can more easily collaborate article title | author 25participatory networks | lankes, silverstein, and nicholson 25 with other libraries in richer dialogues. we currently have systems that connect our online catalogs and share resources through interlibrary loan. these conduits exist and can be used for the transferal of richer data, as has been proved through collaborative virtual reference sys­ tems. in our current systems, as in traditional library practice, when users are referred to other libraries, they are sent out and not brought back. in a participatory library setting, libraries would facilitate a conversation between the user, the community of the local library, and then through the developed conduits, other libraries and their communities. the end result would be a seamless web of libraries where the user can ignore the intrica­ cies of the library’s organization structure and boundar­ ies, and in which the libraries are using the best local resources to meet local needs. bringing libraries seamlessly together to participate in conversations with a single user has another sig­ nificant advantage: the library would make it easy for users to join the conversation regardless of where they are, through the presentation of a single façade. there is, for example, only one google, one amazon, and one wikipedia. why should users have to search from among thousands of libraries to find the conversations they want? participatory networking will be most effective when libraries work together, when the whole is greater than its parts. we currently see elements of the participatory library in the oclc open worldcat project. for example, users searching google may come across a listing provided by oclc. after selecting the entry for the book, the user can then jump to his or her own local library’s information about the book. users do not have to know which library to visit to find a book near them. extending this concept to conversations, one goal of these participatory networks is to make it easier for the user to enter a conversation with the library without having to work to discover their own specific entry points. however, ensuring this effective seamless access to the library will require more than simply adding ele­ ments of participatory networking around the library’s edges. adding services such as blogs and wikis may be seen merely as adjunct to current library offerings. as with any technological advance, scarce resources must be weighed against a desire to incorporate new services. do we expand the collection, improve the web site, or offer blogs to students? a better approach for making these kinds of decisions is to look at the needs of the community served in context with the commonly accepted, core tasks of a library, and see how they can be recast (and enhanced) as conversational, or participatory, tools. in point of fact, every service, patron, and access point is a starting point for a conversation. let’s start with the catalog. if the catalog is a conversation, it is decidedly formal and, more importantly, one way. think of today’s catalog as the educational equivalent of a college lecture. 
a formal system is used to serve up a series of presentations on a given topic (selected by the user). the presentations are rigid in their construction (marc, aacr2, and so on). they follow an abstract model (relevance scores, sometimes alphabetical listings), and offer the receiver of the information minimal opportunities to provide feedback or input. they provide no constructive means for the user to improve or shape the conversation. even recent advances in catalog functions (dynamic, graphical visualizations; faceted searching; simple search boxes; links to non-collection resources) do little more than make the presentation of information more varied. they are still not truly interactive because they do not allow user participation; they do not allow for conversation.

to highlight the one-way nature of the catalog, ask a simple question: what happens when the user doesn't find something? do we assume that the information is there, but that the user is simply incapable of finding it (in which case the catalog presents search tips, refers the patron to an expert librarian who is capable, or offers more information literacy instruction)? do we assume that the information does not exist (refer the patron to interlibrary loan, pass him or her on to a broader search engine)? do we assume that the catalog itself is limited (refer the user to online databases, or other finding aids)? what if we assume that the catalog is just the current place where a user is engaged in an ongoing conversation? what would that look like?

how can such a traditionally rigid system (in concept, more than in any one feature set) be made more participatory? what if the user, finding no relevant information in the catalog, adds either the information or a placeholder for someone else to fill in the missing information? possibly the user adds information from his or her expertise. however, assuming that most people go to a catalog because they don't have the information, perhaps the user instead begins a process for adding the information. the user might ask a question using a virtual reference service; at the end of the transaction, the user then has the option to add the question, along with the answer and associated materials, to the catalog. or perhaps the user simply leaves the query in the catalog for other patrons to answer, requesting to be notified when a response is posted. in that case, when a new user does a catalog search and runs across the question, he or she can provide an answer. that answer might be a textual entry (or an image, sound, or video), or simply a new query that directs the original questioner or new patrons to existing information in the catalog (user-created see-also entries in the catalog).

the catalog also can associate conversations with any data point. for example, a user pulls up the record for a book she or he feels might be relevant to an information need she or he is having. this process starts a conversation between that user and the library, its users, and authors of associated works. the user can see comments and ratings associated with this book from not only users of this library, but users of other libraries. also associated is a list of related works and the full audio of a lecture by the author. the user also might be directed to an in-person or online book group that is reading that book. a minimal sketch of the data behind such a record appears below.
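one way to picture the data behind that scenario is a conventional bibliographic surrogate wrapped in the conversational layers just described: comments and ratings, open questions awaiting answers, user-created see-also links, and pointers to related events. the field names below are illustrative only, not a proposed standard or an existing ils schema.

```typescript
// a conversational catalog record: formal metadata plus the conversation around it
interface Contribution {
  author: string;        // a patron, a librarian, or a user at a partner library
  text: string;
  rating?: number;       // optional 1-5 rating
  postedAt: Date;
  expiresAt?: Date;      // transient contributions can be weeded automatically
}

interface OpenQuestion {
  askedBy: string;
  question: string;
  answers: Contribution[]; // filled in later by staff or by other patrons
  notifyAsker: boolean;    // "tell me when someone responds"
}

interface ConversationalRecord {
  // the durable, well-groomed core
  title: string;
  author: string;
  isbn?: string;
  marcFields: Record<string, string>;

  // the conversational layers
  comments: Contribution[];
  questions: OpenQuestion[];
  seeAlso: string[];       // user-created see-also links to other records
  relatedEvents: string[]; // e.g., a book group or the audio of an author lecture
}

const example: ConversationalRecord = {
  title: "the wisdom of crowds",
  author: "james surowiecki",
  marcFields: { "245": "the wisdom of crowds / james surowiecki." },
  comments: [],
  questions: [
    { askedBy: "patron-482", question: "is there a readers' guide for this title?", answers: [], notifyAsker: true },
  ],
  seeAlso: [],
  relatedEvents: ["in-person book group, third thursdays"],
};
```

note the optional expiration on contributions: as discussed shortly, transient material can enter and leave such a record the way a withdrawn blog comment would.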
the point is that the catalog facilitates a conversation as opposed to simply presenting what it “knows” about a topic and then stepping out of the process. the catalog, then, does not simply present information, but instead helps users construct knowledge by allowing the user to participate in a conversation. there are other means of improving (and linking) systems in a conversational library. take the implicit link between the catalog and circulation. of course, these systems have always been linked in that items found in the catalog can be checked out, and checked out items have their status reflected in the catalog. but this kind of state information is a pretty meager offering. imagine using circulation data to improve the actual functionality of the catalog. take the example of a user who is search­ ing the catalog for fictional books on magic. currently, a relevance score between an item’s metadata and the query is computed and then all the items are ranked in a retrieval set. this relevance score can be computed in many ways, but is usually based on the number of times a keyword appears in the record and the placement of that keyword in the metadata record (giving preference to terms appearing in certain marc fields, such as titles). what is missing is the actual, real­world circulation of an item. wouldn’t it make sense, given such an abstract query, to present the user with harry potter first (but not exclusively)? what if we added circulation data to our relevance rankings: how many times this item has been checked out? it turns out that using a simple statistic is amazingly powerful. it is akin to google’s page rank algorithm that presents sites most linked to higher in the results. also, for those worried that users would be flooded with only popular materials, studies show that while these algorithms do change the very top ranked material, the effect quickly fades so that the user can still easily find other materials. another consideration for adjusting a search is to allow the user to tweak the algorithms used to retrieve works. in the example above, a user could turn off the popularity feature. the user also could toggle switches for currency, authority, and other facets of relevancy rankings. the conversational model requires us to rethink the catalog as a dynamic system with data of varying levels of currency and, frankly, quality, coming into and out of the system. in a conversational catalog, there is no reason that some data can’t exist in the catalog for limited dura­ tions (from years to seconds). records of well­groomed physical collections may be a core and durable collection in the catalog, but that is only one of many types of infor­ mation that could exist in the catalog space. furthermore, even this core data can be annotated and linked to (and from) more transient media. so, the user might see a review from a blog as part of a catalog record on one day, but when she or he pulls the record up again in a few days, that review might be absent, the blog writer hav­ ing withdrawn the comment. this is akin to weeding the collection; however, it would happen in a more dynamic fashion than occurs with the content on library shelves. the conversational model also can be used in other areas of the library. what do we digitize? what do we select? what programs do we offer? what do we pre­ serve? 
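returning to the circulation-weighted ranking described above, the sketch below blends a conventional keyword score with a dampened checkout count and lets the user switch the popularity signal off. the weight and the logarithmic dampening are illustrative choices, not the algorithm of any existing catalog.

```typescript
// rank catalog items by keyword relevance, optionally boosted by real-world circulation
interface CatalogItem {
  title: string;
  keywordScore: number; // e.g., from term frequency and placement in marc fields
  checkouts: number;    // how many times the item has circulated
}

function rank(items: CatalogItem[], usePopularity: boolean): CatalogItem[] {
  const score = (item: CatalogItem): number => {
    // logarithmic dampening keeps blockbuster titles from drowning out everything else
    const popularity = usePopularity ? Math.log1p(item.checkouts) : 0;
    return item.keywordScore + 0.5 * popularity;
  };
  return [...items].sort((a, b) => score(b) - score(a));
}

const results = rank(
  [
    { title: "harry potter and the sorcerer's stone", keywordScore: 1.2, checkouts: 950 },
    { title: "a scholarly history of stage magic", keywordScore: 1.6, checkouts: 12 },
  ],
  true, // the user can toggle this off to remove the popularity boost entirely
);
console.log(results.map((item) => item.title));
```

the same kind of participatory signal bears on the selection and preservation questions just raised, which is where the next paragraph picks up.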
the empowered user can participate in answering all of these questions but does not replace the expert librarian; rather, the user contributes additional and diverse information and commentary. in fact, the catalog scenario just proposed already assumes that the library catalog does more than store metadata. in order for the scenario to work, the catalog must store questions, answers, video, and audio; in essence, the catalog must be expanded and integrated with other library systems so that a final participatory library system can present a coherent view of the library to patrons. the next section lays out a sort of road map for these enhancements and mergers.

framework for integration of participatory librarianship

as has been noted, participatory networks and libraries as conversations are not brand new concepts sprung from the head of zeus. instead, they are means to integrate past and current innovations and create a viable plan forward. figure 6 provides a sort of road map of how the library might make the transition from current systems to a truly participatory system. it includes current systems, systems under development (such as federated searching), and new concepts (such as the participatory library). it seeks to capture current momentum and push the field forward to a larger view instead of getting bogged down in the intricacies of any one development activity.

along the left side of the figure are current library systems. while the terminology may differ from library to library, nearly every system can be found on today's library web sites. by showing the systems together, the problems of user confusion and library management burden become obvious. users must often navigate these systems based on their needs, and often with little help. should they search the catalogs first, or the databases? isn't the catalog really just another database? which database do they choose? in our attempts to serve users better by creating a rich set of resources and services, we have instead complicated their information-seeking lives. as one librarian puts it, "don't give me one more system i, or my patrons, have to deal with."

from the array of systems on the left side, we can see that libraries have not been doing themselves any favors either. we are already maintaining many systems, which makes the calls for yet more systems not only impractical but unwise. the answer is to integrate systems, combining the best of each while discarding the complexity of the whole. the library world is in the midst of doing just that. this section seeks to highlight promising developments in integrating library systems well beyond the library catalog and to highlight not only an ideal endpoint, but also how this ideal system is truly participatory.

merging reference and community involvement

the functional area furthest along in the integration of participatory librarianship is reference; as reference is most readily recognizable as a conversation, this comes as no surprise. over the last decade, reference services have gone online and have led to shared reference services. more importantly, reference done online creates artifacts of reference conversations: electronic files that can be cleaned of personal information, placed in a knowledge base, and used as a resource for other users.
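a small but essential step in turning those reference artifacts into a shared knowledge base is scrubbing personal information before anything is archived. the sketch below strips obvious identifiers (e-mail addresses and phone numbers) with regular expressions; a production system would need far more careful review, so treat this purely as an illustration of the idea.

```typescript
// redact obvious personal identifiers from a reference transcript before archiving it
function scrubTranscript(transcript: string): string {
  return transcript
    // e-mail addresses
    .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, "[email removed]")
    // north-american-style phone numbers, e.g. 315-555-0100 or (315) 555-0100
    .replace(/\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g, "[phone removed]");
}

const raw = "patron (jane@example.org, 315-555-0100): where can i find census data for my county?";
console.log(scrubTranscript(raw));
// -> patron ([email removed], [phone removed]): where can i find census data for my county?
```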
a new development in reference is the reference blog, in which multiple librarians and other users can be part of a question­answering community with conversations that can live on beyond a single transaction. another functional area of libraries that is already involved with participatory librarianship is community involvement. for decades, public libraries have supported local community groups through meeting spaces. some libraries now are hosting web spaces for local groups. as libraries incorporate participatory technologies into their offerings, they can create virtual places such as discussion forums, wikis, and blogs for these community groups to use. if there are standards for these discussion areas, then groups from different communities also could easily participate in shared boards; this makes sense for groups such as weight watchers or alcoholics anonymous that have local branches and national involvement. in an academic setting, these groups can be student, faculty, or staff organizations or courses. in addition to reference and hosted community con­ versations, the library has been actively creating digi­ tal collections of materials (either through digitization, leasing service from content providers, or capturing the library’s born digital items). parallel to the digital collec­ tion building of library materials is an active attempt to create institutional repositories of faculty papers, teacher lesson plans, organizational documentation, and the like. these services are participatory systems in which col­ lections come from users’ contributions, and they may evolve into digital repositories that include both user­ and librarian­created artifacts. these different conversations can be archived into a single repository, and, if properly planned, the refer­ ence conversations can live alongside, and eventually be intermingled with, the community conversations, and the digital repository (which, after all, though formal, is a community conversation) into a community repository. community repositories allow librarians to be more eas­ ily involved in the conversations of the community and capture important artifacts of these conversations for later use. merging library metadata into an enhanced catalog participatory librarianship can be supported by another functional area of the library: collections. traditionally, the collection comprises books, magazines, and other information resources paid for by the library. electronic resources, such as databases that are leased instead of purchased, make up a large portion of library expen­ ditures. more recently, web­based resources (external feeds and sites) have been selected and added to the virtual collection. several kinds of finding aids are used to locate these information resources. the catalog and databases both contain descriptions of resources and searching interfaces. in order to improve access, libraries include records for databases within the catalog. conversely, federated search­ ing tools combine the records from different databases and could allow the retrieval of both books and articles by com­ bining records from the traditional catalog and databases into one tool. if community­created resources are part of the catalog, then these resources also would be findable alongside other traditional library resources. the tools for describing information resources also can be participatory. in traditional librarianship, the librarians provide metadata that patrons then use to make selections. 
figure 6: road map of how the library might make the transition from current systems to a truly participatory system. 28 information technology and libraries | december 200728 information technology and libraries | december 2007 by examining this use data, recommender systems can be created to help users locate new materials. in participatory networking, patrons will be encouraged to add comments about items. if standards are used for these comments, then they can be shared among libraries to create larger pools of recommendations. as these comments are analyzed, they can be combined with usage databases to create stronger recommender systems to present patrons with additional choices based upon what is being explored. the end result is an enhanced catalog that allows users and libraries to find information regardless of which sys­ tem the information resides in. however, the enhanced catalog is still just that, a catalog. it contains surrogates of digital information and is managed separately from the artifacts themselves. in the case of physical items, this may be all the library systems can manage, but in the case of digital content, there is one more step that needs to be taken. namely, the artificial barrier between catalog (defined as inventory control system) and content (housed in the community repository) must come down. building the participatory library at this point in the evolution of distributed systems into a truly integrated library system, the participatory library, we have two large collections: one of resources, and one of information about the resources. the first collection of digital content, the community repository, is built by the library and its users collaboratively. the second collection, the enhanced catalog, includes metadata, both formal and user­created (such as ratings, commentary, use data, and the like). both the community repository and the enriched catalog are participatory. yet to realize the dream of a seamless system of functionality (seamless to the user and the library), these two systems must be merged, allow­ ing users to find resources and, much more importantly, conversations. furthermore, the users must be able to add to metadata (such as tags to catalog records) and content (such as articles, postings to a wiki, or personal images). the result may be conceived of as a single integrated infor­ mation resource, which, for the purposes of this conversa­ tion, is called the participatory library. users may access the participatory library directly through the library or as a series of services in google, myspace, or their own home pages. the point is that the access to the library takes place at the point of conversa­ tion, not at the point the user realizes he or she needs information from the library. conversations and preservation the conversation model highlights the need for preserva­ tion. aside from simply providing systems that facilitate conversation, libraries serve as the vital community memory. conversations construct knowledge, but some­ one must remember what has already been said and know how to access that dialog. scientific conversations, for example, are built on previous conversations (theories, studies, methods, results, and hypotheses). capturing conversations and playing them back at the right time is essential. this might mean the preservation of artifacts (maps, transcripts, blueprints, photographs), but also it means the increasingly important tasks of capturing the digital dialogs. 
this highlights the need for institutional repositories (that will later be integrated seamlessly with other library systems, as previously discussed). specifically, web sites, lectures, courseware, and articles must be kept. further, they must be kept in true conversa­ tional repositories that capture the artifacts (the papers), the methods (data, instruments, policy documents), and the process (meeting notes, conversations, presentations, web sites, electronic discussions). they must be kept in information structures that make them readily available as conversations; in other words, users must be able to search for materials and reconstruct a conversation in its entirety from one fragment. being where the conversation is imagine the conversations that are going on in your local library as you read this. imagine the physicist chatting with the gardener, and the trustee talking with the volunteer who is reading the latest best­seller. what knowledge can be gleaned from these novel interac­ tions? can you measure it? can you enhance it? can you capture it? can you recall it when it would be precisely what a user needs? note also that these conversations do not belong solely to the library. the library is only part of the con­ versation. faced with the daunting variety of resources available on the web, many organizations try to become the single point of entry into it. remember that conversa­ tions are varied in their mode, places, and players, and, more importantly, that they are intensely personal. this means that participants need to have ownership in them, and often in their locations as well. this also means that the library, as facilitator, needs to be varied in its modes and access points. in many cases, it is better to either create a personal space in which users may converse, or, increasingly, to be part of someone else’s space. what we can learn from web 2.0’s mashups is that smaller sets of limited (but easy to access) functionalities lead to greater incorporation of tools into people’s lives. in the chicagocrime–google maps mashup, combining maps from google and chicago crime statistics, it was important for the host of the site to brand the space and shape the interface for his conversation on crime. can your library functions be as easily incorporated into these types of conversations? can a user search your catalog and present the results on his or her web site? the point is that libraries need to be proactive in a new way. instead of article title | author 29participatory networks | lankes, silverstein, and nicholson 29 the mantra, “be where the user is,” we need to, “be where the conversation is.” it is not enough to be at the users’ desktops; you need to be in their e­mail program, in their myspace pages, in their instant messaging lists, and in their rss feed readers. all of these examples point to a significant mental shift that librarians will need to make in moving from delivering information from a centralized location to delivering information in a decentralized manner where the conversations of users are taking place. the catalog example presented earlier is an example of a centralized place for conversations. what if, instead of only being in a catalog, the same data were split into smaller components and embedded in the user’s browser and e­mail pro­ grams? just as google’s mail system embeds advertising based upon the content of a message, the library could provide links to its resources based upon what a user is working on. 
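a minimal sketch of that "links based on what the user is working on" idea: pull a few candidate keywords out of whatever text the user has in front of them and ask a catalog search service for matching resources. the /search endpoint and its response shape are assumptions for illustration only.

```typescript
// suggest library resources based on whatever text the user is currently working on
const STOPWORDS = new Set(["that", "this", "with", "from", "have", "about", "which", "would", "their"]);

function extractKeywords(text: string, max = 5): string[] {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z]{4,}/g) ?? []) {
    if (!STOPWORDS.has(word)) counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent terms first
    .slice(0, max)
    .map(([word]) => word);
}

async function suggestResources(workingText: string): Promise<string[]> {
  const query = extractKeywords(workingText).join(" ");
  // hypothetical catalog search service returning a flat list of matching titles
  const response = await fetch(`/search?q=${encodeURIComponent(query)}`);
  const payload: { titles: string[] } = await response.json();
  return payload.titles;
}
```

embedded in an e-mail client, a course page, or a feed reader, a call like suggestResources(currentDraft) would surface catalog links in the user's own space rather than the library's.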
by disaggregating the information within its system, the library can deliver just what is needed to a user, provide connections into mashups, and live in the space of the user instead of forcing the user to come to the space of the library. challenges and opportunities there is clearly a host of challenges in incorporating par­ ticipatory networks and a participatory model into the library. this is to be expected when we are dealing with something as fundamental as knowledge and as personal as conversations. we consider four major challenges that must be met by libraries before they can truly get into the business of participatory librarianship. technical there is a rich suite of participatory networking software that libraries can incorporate into their daily operations. implementing a blog, a wiki, or rss feeds these days is not a hard task, and they can easily be used to deliver information about library services and conversations to the user’s space. furthermore, these systems are often tested in very large­scale environments and are, in some cases, the same tools used in large participatory network­ ing sites such as wikipedia and blogger. some of these packages are commercial, but others are open source software. open source software is cheaper, easier to adapt, and, in some cases, more advanced. the downside to open source is that it requires a considerable amount of technical knowledge by the library (but not as much as one might think) and does not come with a technical support hotline. the largest technological impediment, however, may be the currently installed base of software within librar­ ies. integrated library systems have a long history and include a broad range of library functions. legacy code and near monolithic systems have restricted the easy exchange of a diverse set of information. were these sys­ tems written today, they would use modular code and loosely coupled apis and allow customers much more interface customizability. these changes may come to integrated library systems (as customers are demanding it), but it may take years. several libraries are currently attempting to pick apart these integrated systems themselves. often, libraries go to the underlying databases that hold the library metadata or create their own data structures, such as the university of pennsylvania data farm project.22 once components of this system are exposed, the catalog simply becomes another database that can be federated into new and uni­ fied interfaces. however, such integration requires a great deal of technological expertise. there is an opportunity for integrated library system vendors or large consortial groups such as oclc to move quickly into this space. in the meantime, there is an opportunity for the larger library community. this technology brief was created in response to a perceived need. whether evi­ denced in the library 2.0 community or in conversations at lita, libraries are now interested in incorporating new web technologies into their offerings and opera­ tions. the technologies under consideration here pres­ ent platforms for experimentation. rather than setting up thousands of separated experiments, however, the library community should create a participatory net­ work of its own. 
the technology certainly exists to create a test bed for libraries to set up various combinations of communication technologies (blogs, tagging, wikis), to test new web services against pooled data (catalog data, metadata repositories, and large-scale data sets), and even to incorporate new services into current library offerings (rss feeds, for example). by combining resources (money, time, expertise) in a single, large-scale test bed, libraries not only can get greater impact for their investments, but can directly experience life as a connected conversation. these connections, if built at the ground level, will then make it easier for the participatory library to come into existence. terminology can be clarified, claims tested, and best practices collaboratively developed, greatly accelerating innovation and dissemination.

operational

in addition to being in the conversation business, libraries are in the infrastructure business. one of the most powerful aspects of a library is its ability not only to develop a collection of some type of information, but to maintain it over time. sometimes infrastructure can be problematic (as in the case of legacy systems), but more often than not it provides a stable foundation from which to operate. there are many conversations going on that need infrastructure but have none (or little). think of the opportunities in your community for using the web to facilitate a conversation. it might be a researcher wanting to disseminate the results of his or her latest study. it might be a community organization seeking funding. it might be a business trying to manage its basic operational knowledge. the point is that such individuals and community organizations are not in the infrastructure business and could use a partner who is. imagine a local organization coming to the library and, within a few minutes, setting up a web site with an rss feed, a blog, and bulletin boards. the library facilitates, but does not own, that individual's or organization's conversation. it does form a strong partnership, however, that can be leveraged into resources and support. the true power of participatory networking in libraries is not to give every librarian a blog; it is in giving every community member a blog (and making the librarian a part of the community). in addition, the library can play the role of connecting these conversations to other users when appropriate.

participatory libraries allow the concept of community center (intellectual center, service center, media center, information center, meeting center) to be extended to the web. many public libraries have no problem providing meeting space to local non-profits; why not provide web meeting space in the form of a web site or web conferencing? many academic libraries attempt to capture the scholarly output of their faculties; why not help generate that output with research data stores? the answers to these questions inevitably come back to time and money. however, there is nothing in this brief that says such services have to be free. in fact, the best partnerships are formed when all partners are invested in the process. the true problem is that libraries have no idea how to charge for such services. faculty would be glad to write library support into grants (in the form of web site creation and hosting), but they need a dollar figure to include and an estimate of how long each task will take.
many libraries aren’t used to positioning their services on a per item basis, and this makes it difficult to build partnerships. sometimes it is not a lack of money, but a lack of structure to take in money that is the problem. policy as always, it is policy that presents the greatest challenges. the idea of opening the library functions to a greater set of inputs is rife with potential pitfalls. how can libraries use the technologies and concepts of facebook and myspace without being plagued by their problems? how can users truly be made part of the collection without the library being liable for all of their actions? the answers may lie in a seemingly obscure concept: identity management. conversations can range in their mode, topic, and duration. they also can vary in the conversants. the library needs to know a conversant’s status to determine policy (for example, we can only disclose this information to this person), and requires a unique identifier, such as a library card, to uphold it. in traditional libraries, that is the extent of identity management. in a participatory model, distinctions among identi­ ties become complex and graduated, and require us to consider a new approach. this new model, of patrons adding information directly to library systems, is not as radical as it may first appear. we have become very used to the idea of roles and tiered levels of authority in many other settings. most modern computer systems allow for some gradation in user abilities (and responsibilities). online communities have even introduced merit systems, by which continual high­quality contributions to a site equals greater power in the site. think about amazon, wikipedia, even ebay; as users contribute more to the community, they gain status and recognition. from par­ ticipants to editors, from readers to writers, these organi­ zations have seen membership as a sliding scale of trust, and libraries need to adopt this approach in all of their basic systems. we currently do, to a degree, in the form of librarians, paraprofessionals, and other staff. yet even these distinctions tend to be rigid and often class­based, with high walls (such as a master’s degree) between the strata. some of this is imposed by outside organizations (civil service requirements, tenure track, and so on), but a great deal is there by inertia of the field. skillful use of identity management will help librar­ ies avoid the baggage of myspace and facebook. as users grain greater access, greater responsibility, and greater autonomy, libraries need to be more certain of their identities. that is, for a user to do more requires the library to know more. knowing about a user may involve traditional identity verification or tracking an activity trail, whereby intentions can be judged in rela­ tion to actions. these concepts may be expressed as, “the more we know you, the more control you can have in valuable services such as blogging, or the catalog.” the concepts are illustrated in blogger and livejournal, both of which require some level of identity information. in another example, to join livejournal you must be invited, thus the community confers identity. the common theme is that verifying (and building) identity is community­ based. the difference between the library and myspace is that the library works in an established community with traditional norms of identity, whereas myspace is seeking to create a community (where identity is more defined by social connections than actions). 
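the sliding scale of trust described above can be pictured as a simple, graduated policy: what a participant may do in the catalog or on a hosted blog grows with verified identity and with accepted contributions. the capability names and thresholds below are arbitrary placeholders chosen only to show the shape of such a policy.

```typescript
// a graduated identity policy: capability grows with verified identity and accepted contributions
type Capability = "read" | "comment" | "tag" | "edit-records" | "moderate";

interface Participant {
  verifiedIdentity: boolean;      // e.g., holds a library card tied to a real person
  acceptedContributions: number;  // comments, tags, or answers the community has accepted
}

function capabilities(p: Participant): Capability[] {
  const granted: Capability[] = ["read"];
  if (p.verifiedIdentity) granted.push("comment", "tag");
  if (p.verifiedIdentity && p.acceptedContributions >= 25) granted.push("edit-records");
  if (p.verifiedIdentity && p.acceptedContributions >= 100) granted.push("moderate");
  return granted;
}

console.log(capabilities({ verifiedIdentity: true, acceptedContributions: 40 }));
// -> [ "read", "comment", "tag", "edit-records" ]
```

the particular thresholds matter less than the principle that, as with blogger or livejournal, doing more requires the library to know more.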
both the library and the services mentioned above, however, base their functions and services on identity. ethical as knowledge is developed through conversation, and libraries facilitate this process, libraries have a powerful impact on the knowledge generated. can librarians inter­ fere with and shape conversations? absolutely. should we? we can’t help it. our collections, our reference work, article title | author 31participatory networks | lankes, silverstein, and nicholson 31 our mere presence will influence conversations. the ques­ tion is, in what ways? by dedicating a library mission to directly align with the needs of a finite community, we are accepting the biases, norms, and priorities of the com­ munity. while a library may seek to expand or change the community, it does so from within. when internet filtering became a requirement for fed­ eral internet funding, public and school libraries could not simply quit, or ignore the fact, because they are agents of their communities. school libraries had to accept filtering with federal funding because their parent organizations, the schools, accepted filtering.23 we see, from this example, that libraries may shift from facilitating conversations to becoming active conversants, but they are always doing both. thus, the question is not whether the library shapes conversations, but which ones, and how actively? these questions are hardly new to the underlying principles of librarianship. and nothing in the participa­ tory model seeks to change those underlying principles. the participatory model does, however, highlight the fact that those principles shape conversations and have an impact on the community. ■ recommendations the overall recommendation of this article is that librar­ ies must be active participants in the ongoing conversa­ tions about participatory networking. they must do so through action, by modeling appropriate and innovative use of technologies. this must be done at the core of the library, not on the periphery. rather than just adding blogs and photosharing, libraries should adopt the princi­ ples of participation in existing core library technologies, such as the catalog. anything less simply adds stress and stretches scarce resources even further. to complement this broad recommendation, the authors make two specific proposals: expand and deepen the discussion and understanding of participatory net­ works and participatory librarianship, and create a par­ ticipatory library test bed to give librarians needed participatory skills and sustain a standing research agenda in participatory librarianship. as stated in the outset of this document, what you are reading is limited. while it certainly contains the kernel and essence of participatory networks (systems to allow users to be truly part of services) and participatory librar­ ianship (the role of librarianship as facilitators and actors in conversations in general), the focus was on technology and technology changes. already, the ideas contained in this document have been part of an active conversation. the first draft of this document was made available for public comment via a wiki, e­mail, and bulletin boards, and concepts herein presented at conferences and lec­ tures. however, there is now a need to broaden the scope and scale of the conversation. the theoretical founda­ tions of participatory librarianship need to be rigorously presented. the nontechnical components of the ideas (and the marriage of nontechnical to technical) need to be explored. 
there are curricular implications: how do we prepare participatory librarians? the nature and form of the library and participatory systems need to be discussed and examined in theoretical, experimental, and operational contexts. in order to do this, the authors propose a series of conversations to engage the ideas. these conversations, both in person and virtual, need to be within the profession and across disciplines and industries. the deeper conversations need to be documented in a series of publications that expand this document for academics and practitioners. the authors feel, however, that the first proposal must be grounded in action. to complement the more abstract exploration of participatory networks and participatory librarianship, there must be an active playground where conversants can experience firsthand the technologies discussed, and then actively shape the tools of participation. this is the test bed. this test bed would implement a participatory network of libraries, and provide a common technology platform to host blogs, wikis, discussion boards, rss aggregators, and the like. these shared technologies would be used to experiment with new technologies and to provide real services to libraries. thus, libraries could not only read about blogging applications, they could try them and even roll them out to their community members. as libraries start new community initiatives, they could rapidly add wikis and rss feeds hosted at the shared test bed. the test bed would also make all software available to the libraries so they could locally implement technologies that have proven themselves. the test bed would provide the open source software and consulting support to implement features locally. the test bed also would develop new metrics and means of evaluating participatory library services for the use of planners and policy makers. a major deliverable of the test bed, however, would be to model innovations in integrated library systems (ils). the test bed would work with libraries and ils vendors to pilot new technologies and specify new standards to accelerate ils modernization. the point of the test bed is not to create new ilss, but to make it easy to incorporate innovative technologies into vendor and open source ilss. the location and support model of the test bed are open for the library community to determine. certainly, it could be placed in existing library associations or organizations. however, it would require the host to be seen as neutral in ils issues, and to be capable of supporting a diverse infrastructure over time. the host organization also would need to be a nimble organization, able to identify new technical opportunities and implement them quickly. one model that might work is establishing a pooled fund from interested libraries. this pooled fund would support an open source technology infrastructure and a small team of researchers and developers. the team’s activities would be overseen by an advisory panel drawn from contributing members. such a model spreads this investment out into experimentation across a broad collaboration and should, ultimately, save libraries time and money. as a result, the time and money that individual libraries might spend on isolated or disconnected experiments can be invested in a common effort with greater return.
libraries have a chance not only to improve service to their local communities, but to advance the field of participatory networks. with their principles, dedication to service, and unique knowledge of infrastructure, libraries are poised not simply to respond to new technologies, but to drive them. by tying technological implementation, development, and improvement to the mission of facilitating conversations across fields, libraries can gain invaluable visibility and resources. impact and leadership, however, come from a firm and conceptual understanding of libraries’ roles in their communities. the assertion that libraries are an indispensable part of knowledge generation in all sectors provides a powerful argument for an expanded function of libraries. eventually, blogs, wikis, rss, and ajax all will fade in the continuously dynamic internet environment. however, the concept of participatory networks and conversations is durable. ■ acknowledgements the authors would like to thank the following people and groups: ken lavender, for his editing prowess. the doctoral students of ist 800 for providing input on conversation theory: johanna birkland, john d’ignazio, keisuke inoue, jonathan jackson, todd marshall, jeffrey owens, katie parker, david pimentel, michael scialdone, jaime snyder, sarah webb. the students of ist 676 for their tremendous input and for their exploration of the related concept of massive scale librarianship: marcia alden, charles bush, janet chemotti, janet feathers, gabrielle gosselin, ana guimaraes, colleen halpin, katie hayduke, agnes imecs, jennifer kilbury, min-chun ku, todd mccall, virginia payne, joseph ryan, jean van doren, susan yoo. those who commented on the draft, including karen schneider, walt crawford and john buschman, and kathleen de la peña mccook. lita for giving us a forum for feedback. carrie lowe, rick weingarten, and mark bard of ala’s oitp for their feedback and support. the institute staff, including lisa pawlewicz, joan laskowski, and christian o’brien, for logistical support. references and notes 1. cited in p. hardiker and m. baker, “towards social theory for social work,” handbook of theory for practice teachers in social work, j. lishman, ed. (london: jessica kingsley, 1991). 2. g. pask, conversation theory: applications in education and epistemology (new york: elsevier, 1976). 3. linda h. bertland, “an overview of research in metacognition: implications for information skills instruction,” school library media quarterly 15 (winter 1986): 96–99. 4. pask, conversation theory, 92. 5. tim o’reilly, “what is web 2.0: design patterns and business models for the next generation of software,” o’reilly, www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html (accessed feb. 1, 2007). 6. j. surowiecki, the wisdom of crowds (new york: doubleday, 2004). 7. “wiki’s wild world: researchers should read wikipedia cautiously and amend it enthusiastically,” nature 438, no. 7070 (dec. 2005): 890, www.nature.com/nature/journal/v438/n7070/full/438890a.html (accessed feb. 1, 2007). 8. google, “google maps api,” www.google.com/apis/maps (accessed feb. 1, 2007). 9. “javascript tutorial,” w3 schools, www.w3schools.com/js/default.asp (accessed feb. 1, 2007). 10. while the terms in web 2.0 are a bit ambiguous, many people confuse the term “mashup” with “remixes.” mashups are combining data and functions (such as mapping), whereas remixes are reusing and combining content only.
so combining a song with a piece of video to create a “new” music video would be a remix. mapping all of your videos using youtube to store the videos and google maps to plot them geographically would be a mashup. 11. for example, gmail, a very widely used, web-based e-mail service, is still considered “beta” by google. 12. malcolm gladwell, the tipping point: how little things can make a big difference (boston: back bay books, 2000), 272. 13. oclc, “introduction to dewey decimal classification,” www.oclc.org/dewey/versions/ddc22print/intro.pdf (accessed feb. 1, 2007). 14. “ajax (programming),” wikipedia, http://en.wikipedia.org/wiki/ajax_(programming) (accessed feb. 1, 2007). 15. “web services activity,” w3c, www.w3.org/2002/ws (accessed feb. 1, 2007). 16. walt crawford, “library 2.0 and ‘library 2.0,’” cites & insights 6, no. 2 (2006), http://citesandinsights.info/civ6i2.pdf (accessed dec. 13, 2007). 17. eric ormsby, “the battle of the book: the research library today,” the new criterion (oct. 2001): 8. 18. ken chad and paul miller, “do libraries matter? the rise of library 2.0: a white paper,” version 1.0, 2005, www.talis.com/downloads/white_papers/dolibrariesmatter.pdf (accessed feb. 1, 2007). 19. slashdot, “myspace #1 us destination last week,” http://slashdot.org/articles/06/07/12/0016211.shtml (accessed feb. 1, 2007); pete williams, “myspace, facebook attract online predators,” msnbc, www.msnbc.msn.com/id/11165576 (accessed feb. 1, 2007); “the myspace generation,” businessweek, dec. 12, 2005, www.businessweek.com/magazine/content/05_50/b3963001.htm (accessed feb. 1, 2007). 20. saturday night live, “sketch: myspace seminar,” nbc, www.nbc.com/saturday_night_live/segments/9166.shtml (accessed feb. 1, 2007). 21. c. stohl and g. cheney, “participatory processes/paradoxical practices,” management communication quarterly 14, no. 3 (2001): 349–407. 22. j. zucca, “traces in the clickstream: early work on a management information repository at the university of pennsylvania,” information technology and libraries 22, no. 4 (2003): 175–78. 23. to be more precise, public and school libraries that accept e-rate funding.

this paper summarizes a research program that focuses on how catalogers, other cultural heritage information workers, web/semantic web technologists, and the general public understand, explain, and manage resource description tasks by creating, counting, measuring, classifying, and otherwise arranging descriptions of cultural heritage resources within the bibliographic universe and beyond it. a significant effort is made to update the nineteenth-century mathematical and scientific ideas present in traditional cataloging theory to their twentieth- and twenty-first-century counterparts. there are two key elements in this approach: (1) a technique for diagrammatically depicting and manipulating large quantities of individual and grouped bibliographic entities and the relationships between them, and (2) the creation of resource description exemplars (problem–solution sets) that are intended to play theoretical, pedagogical, and it system design roles. to the reader: this paper presents a major re-visioning of cataloging theory, introducing along the way a technique for depicting diagrammatically large quantities of bibliographic entities and the relationships between them.
as many details of the diagrams cannot be reproduced in regularly sized print publications, the reader is invited to follow the links provided in the endnotes to pdf versions of the figures. c ataloging—the systematic arrangement of resources through their descriptions that is practiced by libraries, archives, and museums (i.e., cultural heritage institutions) and other parties1—can be placed in an advanced, twenty-first-century context by updating its preexisting scientific and mathematical ideas with their more contemporary versions. rather than directing our attention to implementation-oriented details such as metadata formats, database designs, and communications protocols, as do technologists pursuing bottom-up web and semantic web initiatives, in ronald j. murray and barbara b. tillett cataloging theory in search of graph theory and other ivory towers object: cultural heritage resource description networks this paper we will define a complementary, top-down approach. this top-down approach focuses on how catalogers, other cultural heritage information workers, web/ semantic web technologists, and the general public have understood, explained, and managed their resource description tasks by creating, counting, measuring, classifying, and otherwise arranging descriptions of cultural heritage resources within and beyond the bibliographic universe. we go on to prescribe what enlargements of cataloging theory and practice are required such that catalogers and other interested parties can describe pages from unique, ancient codices as readily as they might describe information elements and patterns on the web. we will be enhancing cataloging theory with concepts from communications theory, history of science, graph theory, computer science, and from the hybrid field of anthropology and mathematics called ethnomathematics. employing this strategy benefits two groups: ■■ workers in the cultural heritage realm, who will acquire a broadened perspective on their resource description activities, who will be better prepared to handle new forms of creative expressions as they appear, and who will be able to shape the development of information systems that support more sophisticated types of resource descriptions and ways of exploring those descriptions. to build a better library system (perhaps an n-dimensional, n-connected system?), one needs better theories about the library collections and the people or groups who manage and use them. ■■ the full spectrum of people who draw on cultural heritage resources: scholars, creatives (novelists, poets, visual artists, musicians, and so on), professional and technical workers, students, and other people or groups pursuing specific or general, long or short-term interests, entertainment, etc. to apply a multidisciplinary perspective to the processes by which resource description data (linked or otherwise) are created and used is not an ivory tower exercise. our approach draws lessons from the debates on why, what, and how to describe physical phenomena that were conducted by physicists, engineers, software developers (and their historian and philosopher of science observers) during the evolution of high-energy physics. 
during that time, intensive debates raged over theory and observational/experimental data, the roles of theorists, experimenters, and instrument builders, instrumentation, and hardware/software system design.2 accommodating the resulting scientific approaches to description, collaboration, and publishing has required the creation of information technologies that have had and continue to have world-shaking effects. ronald j. murray (rmur@loc.gov) is a digital conversion specialist in the preservation reformatting division, and barbara b. tillett (btil@loc.gov) is the chief of the policy and standards division at the library of congress. cataloging theory in search of graph theory and other ivory towers | murray and tillett 171 descriptions—accounts or representations of a person, object, or event being drawn on by a person, group, institution, and so on, in pursuit of its interests. given this definition, a person (or a computation) operating from a business rules–generated institutional or personal point of view, and executing specified procedures (or algorithms) to do so, is an integral component of a resource description process (see figure 1). this process involves identifying a resource’s textual, graphical, acoustic, or other features and then classifying, making quality and fitness for purpose judgments, etc., on the resource. knowing which institutional or individual points of view are being employed is essential when parties possessing multiple views on those resources describe cultural heritage resources. how multiple resource descriptions derived from multiple points of view are to be related to one another becomes a key theoretical issue with significant practical consequences. ■■ niels bohr’s complementarity principle and the library in 1927, the physicist niels bohr offered a radical explanation for seemingly contradictory observations of physical phenomena confounding physicists at that time.6 according to bohr, creating descriptions of nature is the primary task of the physicist: it is wrong to think that the task of physics is to find out how nature is. physics concerns what we can say about nature.7 descriptions that appear contradictory or incomparable may in fact be signaling deep limitations in language. bohr’s complementarity principle states that a complete description of atomic-level phenomena requires descriptions of both wave and particle properties. this is generally understood to mean that in the normal language these physics research facilities and their supporting academic institutions are the same ones whose scientific subcultures (theory, experiment, and instrument building) generated the data creation, management, analysis, and publication requirements that resulted in the creation of the web. in response to this development, we have come to believe that cultural heritage resource description (i.e., the process of identifying and describing phenomena in the bibliographic universe as opposed to the physical one) must now be as open to the concepts and practices of those twenty-first-century physics subcultures as it had been to the natural sciences during the nineteenth century.3 we have consequently undertaken an intensive study of the scientific subcultures that generate scientific data and have identified four principles on which to base a more general approach to cultural heritage resource description: 1. observations 2. complementarity 3. graphs 4. 
exemplars the cultural heritage resource description theory to follow proposes a more articulated view of the complex, collaborative process of making available—through their descriptions—socially relevant cultural heritage resources at a global scale. we will demonstrate that a broader understanding of this resource description process (along with the ability to create improved implementations of it) requires integrating ideas from other fields of study, reaching beyond it system design to embrace larger issues. ■■ cataloging as observation as stated in the oxford english dictionary, an observation is: the action or an act of observing scientifically; esp. the careful watching and noting of an object or phenomenon in regard to its cause or effect, or of objects or phenomena in regard to their mutual relations (contrasted with experiment). also: a measurement or other piece of information so obtained; an experimental result.4 following the scientific community’s lead in striving to describe the physical universe through observations, we adapted the concept of an observation into the bibliographic universe and assert that cataloging is a process of making observations on resources. human or computational observers following institutional business rules (i.e., the terms, facts, definitions, and action assertions that represent constraints on an enterprise and on the things of interest to the enterprise)5 create resource figure 1. a resource description modeled as a business ruleconstrained account of a person, object, or event 172 information technology and libraries | december 2011 purpose, its reformatting, and its long-term preservation must take into consideration that resource’s physical characteristics. having things to say about cultural heritage resources—and having many “voices” with which to say them—presents the problem of creating a well-articulated context for library-generated resource descriptions as well as those from other sources. these contextualization issues must be addressed theoretically before implementation-level thinking, and the demands of contextualization require visualization tools to complement the narratives common to catalogers, scholars, and other users. this is where mathematics and ethnomathematics make their entrance. ethnomathematics is the study of the mathematical practices of specific cultural groups over the course of their daily lives and as they deal with familiar and novel problems.10 an ethnomathematical perspective on cultural heritage resource description directs one’s attention to the existence of simple and complex resource descriptions, the patterns of descriptions that have been created, and the representation of these patterns when they are interpreted as expressions of mathematical ideas. a key advantage of operating from an ethnomathematical perspective is becoming aware that mathematical ideas can be observed within a culture (namely the people and institutions who play key roles in observing the bibliographic universe) before their having been identified and treated formally by western-style mathematicians. ■■ resource description as graph creation relationships between cultural heritage resource descriptions can be represented as conceptually engaging and flexible systems of connections mathematicians call graphs. 
a full appreciation of two key mathematical ideas underlying the evolution of cataloging—putting things into groups and defining relationships between things and groups of things—was only possible after the founding, naming, and expansion of graph theory, which is a field of mathematics that emerged in the 1850s, and the eventual acceptance around 1900 of set theory, a field founded amid intense controversy in 1874. between the emergence of formal mathematical treatments of those ideas by mathematicians and their actual exploitation by cataloging theorists—or by anyone capable of considering library resource description and organization problems from a mathematical perspective—lay a gulf of more than one hundred years.11 it remained for scholars in the library world to begin addressing the issue. tillett’s 1987 work on bibliographic relationships and svenonius’s 2000 definition of bibliographic entities in set-theoretic terms that physicists use to communicate experimental results, the wholeness of nature is accessible only through the embrace of complementary, contradictory, and paradoxical descriptions of it. later in his career, bohr vigorously affirmed his belief that the complementarity principle was not limited to quantum physics: in general philosophical perspective, it is significant that, as regards analysis and synthesis in other fields of knowledge, we are confronted with situations reminding us of the situation in quantum physics. thus, the integrity of living organisms, and the characteristics of conscious individuals, and most of human cultures, present features of wholeness, the account of which implies a typically complementary mode of description. . . . we are not dealing with more or less vague analogies, but with clear examples of logical relations which, in different contexts, are met with in wider fields.8 within a library, there are many things catalogers, conservators, and preservation scientists—each with their distinctive skills, points of view, and business rules—can observe and say about cultural heritage resources.9 much of what these specialists say and do strongly affects library users’ ability to discover, access, and use library resources in their original or surrogate forms. while observations made by these specialists from different perspectives may lead to descriptions that must be accepted as valid for those specialists, a fuller appreciation of these descriptions calls for the integration of those multiple perspectives into a well-articulated, accessible whole. reflecting the perspectives of the library of congress directorates in which we work, the acquisitions and bibliographic access (aba) directorate and the preservation directorate, we assert that the most fundamental complementary views on cultural heritage resources involve describing a library’s resources in terms of their availability (from an acquisitions perspective), in terms of their information content (from a cataloging perspective), and in terms of their physical properties (from a preservation perspective). for example, in the normal languages used to communicate their results, preservation directorate conservators narrate their condition assessments and record simple physical measurements of library-managed objects—while at the same time preservation scientists in another section bring instrumentation to acquire optical and chemical data from submitted materials and from reference collections of physical and digital media. 
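the idea of multiple, complementary observations on a single resource can be made concrete with a small sketch. the fragment below is illustrative only and is not drawn from the article or from any library of congress system; the resource identifier, field names, and values are all invented, and they show one way that descriptions made under different business rules might be recorded side by side and related through a shared identifier.

# a minimal sketch, with invented identifiers and values: observations of
# one resource recorded from complementary institutional points of view.
# each observation carries the perspective under which it was made, so the
# descriptions can be related to one another without being collapsed into
# a single record.

observations = [
    {"resource": "item-0001", "perspective": "acquisitions",
     "statement": {"availability": "in print", "list_price_usd": 40.00}},
    {"resource": "item-0001", "perspective": "cataloging",
     "statement": {"title": "an example title", "subjects": ["an example subject"]}},
    {"resource": "item-0001", "perspective": "preservation",
     "statement": {"condition": "brittle paper", "paper_ph": 4.8}},
]

# group the complementary descriptions of a single resource by perspective
views = {obs["perspective"]: obs["statement"]
         for obs in observations
         if obs["resource"] == "item-0001"}
print(sorted(views))   # -> ['acquisitions', 'cataloging', 'preservation']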
even though these assessments and measurements may not be comprehended by or made accessible to most library users, the information gathered possess a critical logical relationship to bibliographic and other descriptions of those same resources. key decisions regarding a library resource’s fitness for cataloging theory in search of graph theory and other ivory towers | murray and tillett 173 by the modeling technique. what is required instead is theory-based guidance of systems development, alongside theory testing and improvement through application use. if software development is not constrained by a tacit or explicit resource description theory or practice, graph or other data structures familiar to the historically less well-informed, those favored by an institution’s system designers and developers, or those familiar to and favored by implementation-oriented communities may be invoked inappropriately.18 given graph theory’s potentially overwhelming mathematical power—as evidenced by its many applications in the physical sciences, engineering, and computer science—investigations into graph theory and its history require close attention both to the history and evolving needs of the cultural heritage community.19 the unnecessary constraint on resource description theory formation occasioned by the use of e-r or oo modeling can be removed by dispensing with it system analysis tools and expressing resource description concepts in graph-theoretical terms. with this step, the very general elements (i.e., entities and relationships) that characterize e-r models and the more implementation-oriented ones in oo models are replaced by more mathematically flexible, theory-relevant elements expressed in graph-theoretical terms. the result is a “graph-friendly” theory of cultural heritage resource description, which can borrow from other fields (e.g., ethnomathematics, history of science) to improve its descriptive and predictive power, guide it system design and use, and, in response to users’ experiences with functioning systems, results in improved theories and information systems. graph theory in a cultural heritage context ever since the nineteenth century foundation of graph theory (though scholars regularly date its origins from euler’s 1736 paper)20 and its move from the backwaters of recreational mathematics to full field status by 1936, graph theory has concerned itself with the properties of systems of connections—nowadays regularly expressed as the mathematical objects called sets.21 in addition to its set notational form, graphs also are depicted and manipulated in diagrammatic form as dots/labeled nodes linked by labeled or unlabeled, simple or arrowed lines. for example, the graph x, consisting of one set of nodes labeled a, b, c, d, e, and f and one set of edges labeled ab, bd, de, ef, and fc, can be depicted in set notation as x = {{a b c d e f}, {ab bd de ef fc}} and can be depicted diagrammatically as in figure 2. when graphs are defined to represent different types of nodes and relationships, it becomes possible to create and discuss structures that can support cultural heritage resource description theory and application building. 
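the set notation for graph x translates directly into a small data structure. the fragment below is a sketch, not part of the original text: it holds the node set and edge set exactly as given above and expands them into an adjacency map so that a node’s neighbors can be looked up directly.

# graph x from the text, x = {{a b c d e f}, {ab bd de ef fc}}, held as two
# python sets and expanded into an undirected adjacency map.

nodes = {"a", "b", "c", "d", "e", "f"}
edges = {("a", "b"), ("b", "d"), ("d", "e"), ("e", "f"), ("f", "c")}

adjacency = {node: set() for node in nodes}
for left, right in edges:
    adjacency[left].add(right)
    adjacency[right].add(left)

print(sorted(adjacency["b"]))   # -> ['a', 'd'], the neighbors of node b

typed nodes and labeled relationships of the kind resource description graphs require can be carried by attaching a label to each edge tuple; the representation itself does not have to change.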
the following diagrams depict simple resource description identified those mathematical ideas in cataloging theory and developed them formally.12 then in 2009, we were able to employ graph theory (expressed in set-theoretical terms and in its highly informative graphical representation) as part of a broader historical and cultural analysis.13 cataloging theory had by 2009 haltingly embraced a new view on how resources in libraries have been described and arranged via their descriptions—an activity that in principle stretches back to catalogs created for the library of alexandria14—and how these structured resource descriptions have evolved over time, irrespective of implementation. murray’s investigation into this issue revealed that the increasingly formalized and refined rules that guided anglo-american catalogers had, by 1876, specified sophisticated systems of cross-references (i.e., connections between bibliographic descriptions of works, authors, and subjects)—systems whose properties were not yet the subject of formal mathematical treatment by mathematicians of the time.15 murray also found that library resource description structures—when teased out of their book and card and digital catalog implementations and treated as graphs—are arguably more sophisticated than those being explored in the world wide web consortium’s (w3c) library linked data initiative.16 implementation-oriented substitutes for graph theory cataloging theory has been both helped and hindered by the use of information technology (it) techniques like entity-relationship modeling (e-r, first used extensively by tillett in 1987 to identify bibliographic relationships in cataloging records) and object-oriented (oo) modeling.17 e-r and oo modeling may be used effectively to create information systems that are based on an inventory of “things of interest” and the relationships that exist between them. unfortunately, the things of interest in cultural heritage institutions keep changing and may require redefinition, aggregation, disaggregation, and re-aggregation. e-r and oo modeling as usually practiced are not designed to manage the degree and kind of changes that take place under those circumstances. when trying to figure out what is “out there” in the bibliographic universe, we assert that focus should first be placed on identifying and describing the things of interest, what relationships exist between them, and what processes are involved in the creation, etc., of resource descriptions. having accomplished this, attention can then be safely paid to defining and managing information deemed essential to the enterprise, that is, undertaking it system analysis and design. but when an it-centric modeling technique becomes the bed on which the resource description theory itself is constructed, the resulting theory will be driven in a direction that is strongly influenced 174 information technology and libraries | december 2011 of the resources they describe. figure 4’s diagrammatic simplicity becomes problematic when large quantities of resources are to be described, when the number and kinds of relationships recorded grows large, and when more comprehensive but less-detailed views of bibliographic relationships are desired. to address these problems in a comprehensive fashion, we examined similar complex description scenarios in the sciences and borrowed another idea from the physics community—paper tool creation and use. 
■■ paper tools: graph-aware diagram creation paper tools are collections of symbolic elements (diagrams, characters, etc.), whose construction and manipulation are subject to specified rules and constraints.23 berzelian chemical notation (e.g., c6h12o6) and—more prominently—feynman diagrams like those in figure 5 are familiar examples of paper tool creation and use.24 creating a paper tool resource diagram requires that the rules for creating resource descriptions be reflected in diagram elements, properties of diagram elements, and drawing rules that define how diagram/symbolic elements are connected to one another (e.g., the formula c6h12o6 specifies six molecules of carbon, twelve of hydrogen, and six of oxygen). the detailed bibliographic information in figure 4 is progressively schematized in a graphs that are based on real-world bibliographic descriptions. nodes in the graphs represent text, numbers, or dates and relationships that can be nondirectional (as a simple line), unidirectional (as single arrowed lines) or bidirectional (as a double arrowed line). the all-in-one resource description graph in figure 3 can be divided and connected according to the kinds of relationships that have been defined for cultural heritage resources. this is the point where institutional, group, and individual ways of describing resources shape the initial structure of the graph. once constructed, graph structures like this and their diagrammatic representations are then interpreted in terms of a tacit or explicit resource description theory. in the case of graphs constructed according to ifla’s functional requirements for bibliographic records (frbr) standard,22 figure 3 can be subdivided into four frbr sub-graphs, yielding figure 4. the four diagrams depict the initial graph of cataloging data as four complementary frbr wemi (w–work, e–expression, m–manifestation, and i–item) graphs. note that the item graph contains the call numbers (used here to identify the location of the copy) of three physical copies of the novel. this use of call numbers is qualitatively different from the values found in the manifestation graph in that resource descriptions in this graph apply to the entire population of physical copies printed by the publisher. the descriptions contained in figure 4’s frbr subgraphs reproduce bibliographic characteristics found useful by catalogers, scholars, other educationally oriented end users, and to varying extents the public in general. once created, resource description graphs and subgraphs (in mathematical notation or in simple diagrams like figure 4) can proliferate and link in multiple and complex ways—in parallel with or independently figure 3. library of congress catalog data for thomas pynchon’s novel gravity’s rainbow, represented as an all-inone graph labeled c figure 2. a diagrammatic representation of graph x cataloging theory in search of graph theory and other ivory towers | murray and tillett 175 6 graph is now represented explicitly by a black dot in a ring in the more schematic paper tool version. resource descriptions are then represented in fixed colors and positions relative to the resource/ring: the worklevel resource description is represented by a blue box, expression by a green box, manifestation by a yellow box, and item by a red box. depicting one aspect of the frbr way that reflects frbr definitions of bibliographic things of interest and their relevant relationships. 
as a first step, the four wemi descriptions in figure 4 are given a common identity by linking them to a c node, as in figure 6. the diagram is then further schematized such that frbr description types and relationships are represented by appropriate graphical elements connected to other elements. the result shows how a frbr paper tool makes it much easier to construct and examine complex large-scale properties of resource and resource description structures (like figure 7, right side) without being distracted by textual and linkage details. the resource described (but not shown) by the figure figure 4. the all-in-one graph in figure 3, separated into four frbr work (top-left), expression (top-right), manifestation (bottom-left), and item (bottom-right) graphs figure 5. feynman diagrams of elementary particle interactions figure 6. a frbr resource description graph 176 information technology and libraries | december 2011 expressions. the work products of scholars—especially those creations that are dense with quotations, citations, and other types of direct and derived textual and graphical reference within and beyond themselves—are excellent environments for paper tool explorations and more generally, for testing of exemplars—solutions to the potentially complex problem of describing cultural heritage resources. ■■ exemplars the fourth principle in our cultural heritage resource description theory involves exemplar identification and analysis. according to the historian of science thomas s. kühn, exemplars are sets of concrete problems and solutions encountered during one’s education, training, and work. in the sciences, exemplar-based problem finding and solving involves mastery of relevant models, builds knowledge bases, and hones problem-solving skills. every student in a field would be expected to demonstrate mastery by learning and using their field’s exemplars. change within a scientific field is manifest by the need to modify old or create new exemplars as new problems appear and must be solved.26 a cultural heritage resource description theorist would, in addition to identifying and developing exemplars from real bibliographic data and other sources, want to speculate about possible resource/description configurations that call for changes in existing information technologies. to the theorist, it would be as important to find out what can’t be done with frbr and other resource description models at library, archive, museum, and internet scales, as it is to be able to explain routine item cataloging and tagging activities. discovering system limitations is better done in advance by simulating uncommon or challenging circumstances than by having problems appear later in production systems. model graphically, the descriptions closest to the black dot resource/slot are the most concrete and those furthest away the most abstract. (readers wishing to interpret frbr paper tool diagrams without reference to color values should note the strict ordering of wemi elements: w–e–m–i–resource/ring or resource/ring–i–m–e–w.) finally, to minimize element use when pairs of wemi boxes touch, the appropriate frbr linking relationship for the relevant pair of descriptions (as explicitly shown in the expanded graph) is implied but not shown. 
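the grouping that figure 6 depicts can also be sketched as a data structure. the fragment below is illustrative only; the field names and values are invented rather than taken from the catalog record behind figures 3 and 4, and the dictionary stands in for the c node that gives the four wemi descriptions a common identity.

# an illustrative sketch, with invented field names and values: four wemi
# descriptions linked to a common identifier, the role played by the c node.

frbr_graph = {
    "c": "resource-0001",                    # the common resource identifier
    "work":          {"title": "an example work", "creator": "an example author"},
    "expression":    {"language": "english", "form": "text"},
    "manifestation": {"publisher": "an example publisher", "date": "1973"},
    "items":         [{"call_number": "call-number-copy-1"}],
}

# describing a second physical copy touches only the item level; the shared
# work, expression, and manifestation descriptions are not duplicated --
# the consolidation that figure 8 depicts for two copies of the same edition.
frbr_graph["items"].append({"call_number": "call-number-copy-2"})
print(len(frbr_graph["items"]))   # -> 2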
with appropriate diagramming conventions, the process of creating and exploring resource description complexes addresses combined issues of cataloging theory and institutional policy—and results in an ability to make better-informed judgments/computations about resource descriptions and their referenced resources. as a result, resource description graphs are readily created and transformed to serve theoretical—and with greater experience in thinking and programming along graph-friendly lines, practical—ends. one example of transformability would arise when exploring the implications of removing redundant portions of related resource descriptions as more copies of the same work are brought to the bibliographic universe. the frbr paper tool elements and the more articulated resource description graphs in figure 8 both depict the consequences of a practical act: combining resource descriptions for two copies of the same edition of the novel gravity’s rainbow.25 the top-most frbr diagram and its magnified section depict how the graph would look with a single item-level description, the call number for one physical copy. the bottom-most frbr diagram and its magnified section depict the graph with two item-level descriptions, the call numbers for two physical copies. a frbr paper tool’s flexibility is useful for exploring potentially complex bibliographic relationships created or uncovered by scholars—parties whose expertise lies in identifying, interrelating, and discussing creative concepts and influences across a full range of communicative figure 7. a frbr paper tool diagram element (left) and the less schematic frbr resource description graph it depicts (right) cataloging theory in search of graph theory and other ivory towers | murray and tillett 177 drawing diagrams. use case diagrams are secondary in use case work.28 as products of and guides for theory making, resource description exemplars have different origins and audiences than those for use cases. while use cases and exemplars offer perspectives that can support information system design, exemplars were originally introduced as theoretical entities by kühn to explain how theories and theory-committed communities can crystallize around problem-solution sets, how these sets also can serve as pedagogical tools, and why and when problem-solution sets get displaced by new ones. the proposed process of cultural heritage exemplar creation and use, followed by modification or replacement in the face of changes in the bibliographic universe draws on kühn’s and historian of science david kaiser’s interest in how work gets done in the sciences, in addition to their rejection of paradigms as eerie self-directing processes.29 exemplars are not use cases use cases are a software modeling technique employed by the w3c library linked data incubator group (lld xg) in support of requirements specification.27 kühnstyle exemplars are definitely not to be confused with use cases, which are requirements-gathering documents that contribute to software engineering projects. there is a wikipedia definition of a use case that describes its properties: a use case in software engineering and systems engineering, is a description of steps or actions between a user (or “actor”) and a software system which leads the user towards something useful. the user or actor might be a person or something more abstract, such as an external software system or manual process. . . . 
use cases are mostly text documents, and use case modeling is primarily an act of writing text and not figure 8. frbr paper tool diagram elements and the frbr resource description graphs they depict 178 information technology and libraries | december 2011 ■■ a webpage and its underlying, globally distributed, multimedia resource network, as it changes over time. such exemplars can be presented diagrammatically through the use of paper tools. this use of diagrams in support of conceptualization and information system design is deliberately patterned after professional data modeling theory and practice.31 paper tool–supported analyses of a nineteenth-century american novel (exemplar 1) and of eighteenth-century french poems drawn from state archives (exemplar 2) will be presented to illustrate how information system design and pedagogy can be informed by exemplary scholarly research and publication, combined with narrativized diagrammatic representations of bibliographic and other relationships in traditional and digital media. exemplar 1. from moby-dick to mash-ups—a print publication history and multimedia mash-up problem document the publication history of print copies of a literary work, identifying editorially driven content transfer across print editions along with content selection and transformation in support of multimedia resource creation. solution the solution to this descriptive problem relies heavily on placing resource descriptions into groups and then defining relationships within and across those groups— i.e., on graph creation. after locating a checklist that documented the publication history of the novel and after identifying key components of a moby-dick and orson welles–themed multimedia resource appropriation and transformation network, murray used the frbr paper tool along with additional connection rules to create a resource description diagram (rdd) that represented g. thomas tanselle’s documentation of the printing history (from 1851 to 1976) of herman melville’s epic novel, moby-dick.32 the resulting diagram provides a high-level view of a large set of printed materials—depicting concepts such as a creative work, the expression of the work in a particular mode of languaging (i.e., speech, sign, image), and more concrete concepts such as publications. to reduce displayed complexity, sets of frbr diagram elements were collapsed into green shaded squares representing entire editions/printings, yielding figure 9.33 the vertical axis represents the year of publication, starting with the 1851 printings at the top. connected squares the resulting network of connections in figure 9 can be interpreted in publishing terms. one line or two or more lines descending downwards from a printing’s green in addition, resource description structures specified in an exemplar can and should represent a more abstract treatment of a resource description and not just data or data structures engaged by end users. exemplars on hand and others to come cultural heritage resource description exemplars have been created over time as solutions to problems of resource description and later made available for use, study, mastery, and improvement. 
while not necessarily bound to a particular information technology, such as papyrus, parchment, index cards, database records, or rdf aggregations, resource description exemplars have historically provided descriptive solutions of physical resources whose physical and intellectual structure had originally been innovative solutions to describing, for example, ■■ a manuscript (individual and related multiples, published but host to history, imaginary, etc.); ■■ a monograph in one edition (individual and related multiples); ■■ a monograph in multiple editions (individual and related multiples); and ■■ a publication in multiple media, created sequentially or simultaneously. with the advent of electronic and then digital communications media, more complex resource description problem-solution sets have been called for as a response to enduringly or recently more sophisticated creative/ editorial decision-making and to more flexible print and digital information technology production capabilities. the most challenging problem-solution sets involve the assembly and cross-referencing of several multipart—and possibly multimedia—creative or editorially constructed works, such as the following: ■■ a work published as a monograph, but which has been reprinted and reedited; translated into numerous languages; supplemented by illustrations from multiple artists; excerpted and adapted as plays, an opera, comic books, and cartoon series; multimedia mash-ups; and has been directly quoted in paintings and other graphic arts productions, and has been the subject of dissertations, monographs, journal articles, etc. ■■ a continuing publication (individual and related multiple publications, special editions, name, publisher, editorial policy changes, etc.). ■■ a monograph whose main content is composed nearly entirely of excerpts from other print publications.30 ■■ a library-hosted multimedia resource and its associated resource description network. cataloging theory in search of graph theory and other ivory towers | murray and tillett 179 by paper tool diagram creation, analysis, and subsequent action, namely, ■■ connecting the squares (i.e., assigning at least one relationship to a printing) ensures access based on the relationship assigned; and ■■ parties located around the globe can examine a given connected or disconnected resource description network and develop strategies for enhancing its usefulness. the wealth of descriptive information available in the moby-dick exemplar illustrates how previous and future collaborative efforts between cultural heritage institutions and other parties have already generated resource descriptions that possess a network structure alongside its content. with a more graph-friendly and collaborative implementation, melville scholars, scholarly organizations,34 and enthusiasts could more effectively examine, discuss, and through their actions enhance the moby dick resource description network’s documentary, scholarly, and educational value. in its original form, the moby dick resource description diagram (and the exemplar it partially documents) only depicted full-length publications of melville’s work. 
as a test of the frbr paper tool’s ability to accommodate both traditional and modern creative expressions in individual and aggregate form—while continuing to serve theoretical, practical, and educational ends—murray added a resource description network for orson whales,35 square are interpreted to mean that the printing gave rise to one or more additional printings, which may occur in the same or later years. two or more lines converging on a green square from above indicate that the printing was created by combining texts from multiple prior printings—an editorial/creative technique similar to that used to construct the mash-ups published on the web. connecting unconnected squares tanselle’s checklist did not specify predecessor or successor relationships for each post–1851 printing. this often unavoidable, incomplete status is depicted in figure 9 as green squares that are ■■ not linked to any squares above it, i.e., to earlier printings; and/or ■■ not linked to any squares below it, i.e., to later printings; or ■■ connected islands, without a link to the larger structure. recognizing the extent of moby-dick printing disconnectedness in tanselle’s checklist and developing a strategy for dealing with it only by analyzing tanselle’s checklist would be extremely difficult. in contrast, the disconnectedness of the moby-dick resource description network, and its implications for search-based discovery based on following the depicted relationships is readily discernable in figure 9. the ease with which the disconnected condition can be assessed also hints at benefits to be gained by collaborative resource description supported figure 9. a moby-dick resource description diagram, depicting relationships between printings made between 1851–1976 (greatly reduced scale) 180 information technology and libraries | december 2011 darnton’s book can stand on its own as an exemplar for historical method, with the diagram providing additional diagrammatic support. solution 2 darnton’s analysis treated each poem found in the archives as an individual creative work,38 enabling the use of the frbr paper tool (as a bookkeeping device this time) instead of a tool designed to aggregate and describe archival materials. the resulting diagram is a more articulated frbr paper tool depiction of darnton’s poetry communication network, a section of which appears as figure 11. the depiction of the poetry communication network shown in figure 11 is composed of: ■■ tan squares that depict individuals (clerks, professors, priests, students, etc.) who read, discussed, copied, and passed along the poems. ■■ diagram elements that depict poetry written on scraps of paper (treated as resources) that were police custody, were admitted to having existed by suspects, or assumed to have existed by the police. if one’s theory and business rules permit it, paper tool drawing conventions can depict descriptions of lost and nonexistent but nonetheless describable resources. ■■ arrowed lines that represent relationships between a poem and the individuals who owned copies, those who created or received copies of the poem, etc.39 with darnton’s monograph to provide background information regarding the historical personages involved, relationships between the works and the people, document selection from archival fonds, and the point of view of the scholar, the resulting problem-solution set can: ■■ serve as enhanced documentation for darnton-style communication network analysis and discussion. 
■■ serve as an exemplar for catalogers, scholars, and alex itin’s moby-dick-themed multimedia mash-up, to the print media diagram. the four-minute long orson whales multimedia mashup contains hundreds of hand-painted page images from the novel, excerpts from the led zeppelin song “moby dick,” parts of two vocal performances by the actor orson welles, and a video clip from welles’s motion picture citizen kane. the result is shown in figure 10.36 the leftmost group of descriptions in figure 10 depicts various releases of led zeppelin’s “moby dick.” the central group depicts the sources of two orson welles audio dialogues after they had been ripped (i.e., digitized from physical media) and made available online. the grouping on the right depicts the orson whales mash-up itself and collections of digital images of painted pages created from two printed copies of the novel. exemplar 2. poetry and the police—archival content identification and critical analysis problem examine archival collections and select, describe, and document ownership and other relationships of a set of documents (poems) alleged to have circulated within a loosely defined social group. solution 1 in his 2010 work, poetry and the police: communication networks in eighteenth-century paris, historian robert darnton studied a 1749 paris police investigation into the transmission of poems highly critical of the french king, louis xv. after combing state archives for police reports, finding and identifying scraps of paper once held as evidence, and collecting other archival materials, darnton was able to construct a poetry communication network diagram,37 which, along with his narrative account, identified a number of parties who owned, copied, and transmitted six of the scandalous poems and placed their activities in a political, social, and literary context. figure 10. a resource description diagram of alex itin’s moby-dick multimedia work, depicting the resources and their frbr descriptions. cataloging theory in search of graph theory and other ivory towers | murray and tillett 181 with all of the adaptations and excerpts extant within a specified bibliographic universe (such as the cataloging records that appear in oclc’s worldcat bibliographic database). resource description diagrams, created from real-world or theoretically motivated considerations, would then provide a diagrammatic means for depicting the precise and flexible underlying mathematical ideas that, heretofore unrecognized but nonetheless systematically employed, serve resource description ends. if the structure of a well-motivated and constructed resource description diagram subsequently makes data representation and management requirements that a given information system cannot accommodate, cataloging theorists and information technologists alike will then know of that system’s limitations, will work together on mitigating them, and will embark on improving system capabilities. 
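the disconnectedness analysis described for exemplar 1 is one place where such requirements become concrete. the fragment below is a sketch only, with invented printing identifiers: given predecessor/successor links between printings, it enumerates the connected “islands” so that gaps in a recorded publication history can be listed rather than spotted by eye.

# an illustrative sketch with invented printing identifiers: find the
# connected "islands" in a predecessor/successor network of printings.

links = [("printing-1851-a", "printing-1851-b"),
         ("printing-1851-b", "printing-1855"),
         ("printing-1892", "printing-1920")]        # an isolated pair

neighbors = {}
for earlier, later in links:                        # undirected adjacency
    neighbors.setdefault(earlier, set()).add(later)
    neighbors.setdefault(later, set()).add(earlier)

def component(start, seen):
    # collect every printing reachable from `start`
    stack, found = [start], set()
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            found.add(node)
            stack.extend(neighbors.get(node, ()))
    return found

seen, islands = set(), []
for printing in neighbors:
    if printing not in seen:
        islands.append(component(printing, seen))

print(len(islands))   # -> 2: the 1851-1855 chain and the isolated 1892-1920 pair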
■■ cataloging theory, tool-making, education, and practice this modernized resource description theory offers new and enhanced roles and benefits for cultural heritage personnel as well as for the scholars, students, and those members of the general public who require support not just for searching, but also for collecting, reading, writing, collaborating, monitoring, etc.40 information systems that others who seek similar solutions to their problems with identifying, describing, depicting, and discussing as individual works documents ordinarily bundled within hierarchically structured archival fonds at multiple locations. ■■ a paper tool into a power tool there are limits to what can be done with a hand-drawn frbr paper tool. while murray was able to depict largescale bibliographic relationships that probably had not been observed before, he was forced to stop work on the moby-dick diagram because much of the useful information available could not fit into a static, hand-drawn diagram. we think that automated assistance in creating resource description diagrams from bibliographic records is required. with that capability available, cataloging theorists and parties with scholarly and pedagogical interests could interactively and efficiently explore how scholars and sophisticated readers describe significant quantities of analog and digital resources. it would then be possible and extremely useful to be able to initiate a scholarly discussion or begin a lecture by saying, “given a moby-dick resource description network . . . ” and then proceed to argue or teach from a diagram depicting all known printings of moby-dick—along figure 11. a section of darnton’s poetry communication network 182 information technology and libraries | december 2011 the value of non-euclidean geometry lies in its ability to liberate us from preconceived ideas in preparation for the time when exploration of physical laws might demand some geometry other than the euclidean.41 taking riemann to heart, we assert that the value of describing cultural heritage resources as observations organized into graphs and of enhancing and supplementing the resource description exemplars that have evolved over time and circumstance rests in opportunities for liberating the cultural heritage community from preconceived ideas about resource description structures and from longstanding points of view on those resources. having achieved such a goal, the cultural heritage community would then be ready when the demand came for resource description structures that must be more flexible and powerful than the traditional ones. given the unprecedented development of the web and the promise of bottom-up semantic web initiatives, we think that the time for the cultural heritage community’s liberation is at hand. ■■ acknowledgments the authors wish to thank beacher wiggins and dianne van der reyden, directors of the library of congress acquisitions and bibliographic access directorate and the preservation directorates, respectively, for supporting the authors’ efforts to explore and renew the scientific and mathematical foundations of cultural heritage resource description. thanks also to marcia ascher, david hay, robert darnton, daniel huson, and mark ragan, whose scholarship informed our own; and to joanne o’brienlevin for her critical eye and for editorial advice. references and notes 1. oed online, “catalogue, n.” http://www.oed.com/view dictionaryentry/entry/28711 (accessed aug. 10, 2011). 2. 
peter galison, “part ii: building data,” in image & logic: a material culture of microphysics (chicago: univ. of chicago pr., 2003): 370–431. 3. gordon mcquat, “cataloguing power: delineating ‘competent naturalists’ and the meaning of species in the british museum,” british journal for the history of science 34, no. 1 (mar. 2001): 1–28. exclusive control of classification schemes and of the records that named and described its specimens are said to have contributed to the success of the british museum’s institutional mission in the nineteenth century. as a division of the british museum, the british library appears to have incorporated classification concepts (hierarchical structuring) from its parent and elaborated on the museum’s strategies for cataloging species. 4. oed online, “observation, n.” http://www.oed.com/ viewdictionaryentry/entry/129883 (accessed july 8, 2011). couple modern, high-level understandings about how cultural heritage resources can be described, organized, and explored with data models that support linking within and across multiple points of view will be able to support those requirements. the complementarity of cosmological and quantum-level views cataloging theory formation and practice—two areas of activity that did not interest many outside of cultural heritage institutions—can now be understood as a much more comprehensive multilayered activity that is approachable from at least two distinct points of view. the approach presented in this paper represents a cosmological-level view on the bibliographic universe. this treatment of existing or imaginable large-scale configurations of cultural heritage resource descriptions serves as a complement to the quantum-level view of resource description, as characterized by it-related specificities such as character sets, identifiers, rdf triples, triplestores, etc. activities at the quantum level—the domain of semantic web technologists and others—yield powerful and relatively unconstrained information management systems. in the absence of cosmological-level inspiration or guidance, these systems have not necessarily been tested against nontrivial, challenging cultural heritage resource description scenarios like those documented in the above two exemplars. applying both views to the bibliographic universe would clearly be beneficial for all institutional and individual parties involved. if ever a model for multilevel, multidisciplinary effort was required, the history of physics is illuminated by mutually influential interactions of cosmological and quantum-level theories, practices, and pedagogy. workers in cultural heritage institutions and technologists pursuing w3c initiatives would do well to reflect on the result. ■■ ready for the future—and creating the future to explore the cultural, scientific, and mathematical ideas underlying cultural heritage resource description, to identify, study, and teach with exemplars, and to exploit the theoretical reach and bookkeeping capability of paper tool –like techniques is to pay homage to the cultural heritage community’s 170+ year-old talent for pragmatic, implementation-oriented thinking,while at the same time pointing out a rich set of possibilities for enhanced service to society. the cultural heritage community can draw inspiration from geometrician bernhard riemann’s own justification for his version of thinking outside of the box called euclidean geometry: cataloging theory in search of graph theory and other ivory towers | murray and tillett 183 18. 
the prospects for creating graph-theoretical functions that operate on resource description networks are extremely promising. for example, combinatorica (an implementation of graph theory concepts created for the computer mathematics application mathematica) is composed of more than 450 functions. were cultural heritage resource description networks to be defined using this application’s graph-friendly data format, significant quantities of combinatorica functions would be available for theoretical and applied uses; siriam pemmaraju and steven skiena, computational discrete mathematics: combinatorics and graph theory with mathematica (new york: cambridge univ. pr., 2003). 19. dénes könig, theory of finite and infinite graphs, trans. richard mccoart (boston: birkhaüser, 1990); fred buckley and marty lewinter, a friendly introduction to graph theory (upper saddle river, n.j.: pearson, 2003); oystein ore and robin wilson, graphs and their uses (washington d.c.: mathematical association of america, 1990). 20. leonhard euler, “solutio problematis ad geometriam situs pertinentis,” commentarii academiae scientarium imperalis petropolitanae no. 8 (1736): 128–40. 21. “set theory, branch of mathematics that deals with the properties of well-defined collections of objects, which may or may not be of a mathematical nature, such as numbers or functions. the theory is less valuable in direct application to ordinary experience than as a basis for precise and adaptable terminology for the definition of complex and sophisticated mathematical concepts.” quoted from encyclopædia britannica online, “set theory,” oct. 2010, http://www.britannica.com/ebchecked/ topic/536159/set-theory (accessed oct. 27, 2010). 22. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records: final report (munich: k.g. saur, 1998). this document is downloadable as a pdf from http://www.ifla.org/vii/s13/ frbr/frbr.pdf or as an html page at http://www.ifla.org/vii/ s13/frbr/frbr.htm. 23. ursula klein, ed., experiments, models, paper tools: cultures of organic chemistry in the nineteenth century (stanford, calif.: stanford univ. pr., 2003); klein, ed., tools and modes of representation in the laboratory sciences (boston: kluwer, 2001); david kaiser, drawing theories apart: the dispersion of feynman diagrams in postwar physics (chicago: univ. of chicago pr., 2005). 24. for more examples and a general description of feynman diagrams, see http://www2.slac.stanford.edu/vvc/theory/ feynman.html. 25. an enlarged version of this diagram may be found online. ronald j. murray and barbara b. tillett, “frbr paper tool diagram elements and the frbr resource description graphs they depict,” aug. 2011, http://arizona.openrepository.com/ arizona/bitstream/10150/139769/2/fig%208%20frbr%20 paper%20tool%20elements%20and%20graphs.pdf. other informative illustrations also are available. murray and tillett, “resource description diagram supplement to ‘cataloging theory in search of graph theory and other ivory towers. object: cultural heritage resource description networks,” aug. 2011, http://hdl.handle.net/10150/139769. 26. thomas s. kühn, the structure of scientific revolutions, 2nd ed. (chicago: univ. of chicago pr., 1970). 27. daniel vila suero, “use case report,” world wide web consortium, june 27, 2011, http://www.w3.org/2005/ incubator/lld/wiki/usecasereport. 5. david c. 
hay, uml and data modeling: a vade mecum for modern times (bradley beach, n.j.: technics pr., forthcoming 2011): 124–25. some scholars argue that decisions as to what the things of interest are and the categories they belong to are influenced by social and political factors. geoffrey c. bowker, susan leigh star, sorting things out: classification and its consequences (cambridge, mass.: mit pr., 1999). 6. gerald holton, “the roots of complementarity,” daedalus 117, no. 3 (1988): 151–97, http://www.jstor.org/stable/20023980 (accessed feb. 24, 2011). 7. niels bohr, quoted in aage petersen, “the philosophy of niels bohr,” bulletin of the atomic scientists 19, no. 7 (sept. 1963): 12. 8. niels bohr, “quantum physics and philosophy: causality and complementarity,” in essays 1958–1962 on atomic physics and human knowledge (woodbridge, conn.: ox bow, 1997): 7. 9. for cataloging theorists, the description of cultural heritage things of interest yields groups of statements that occupy different levels of abstraction. upon regarding a certain physical object, a marketer describes product features, a linguist enumerates utterances, a scholar perceives a work with known or inferred relationships to other works, and so on. 10. marcia ascher, ethnomathematics: a multicultural view of mathematical ideas (pacific grove, calif.: brooks/cole, 1991); ascher, mathematics elsewhere: an exploration of ideas across cultures (princeton: princeton univ. pr., 2002). 11. a timeline of events, people, and so on that have had or should have had an impact on describing cultural heritage resources is available online. seven fields or subfields are represented in the timeline and keyed by color: library & information science; mathematics; ethnomathematics; physical sciences; biological sciences; computer science; and arts & literature. ronald j. murray, “the library organization problem,” dipity .com, aug. 2011, http://www.dipity.com/rmur/libraryorganization-problem/ or http://www.dipity.com/rmur/ library-organization-problem/?mode=fs (fullscreen view). 12. barbara ann barnett tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (phd diss., university of california, los angeles, 1987); elaine svenonius, the intellectual foundation of information organization (cambridge, mass.: mit pr., 2000): 32–51. svenonius’s definition is opposed to database implementations that permitted boolean operations on records at retrieval time. 13. ronald j. murray, “the graph-theoretical library,” slideshare.net, july 5 2011, http://www.slideshare.net/ ronmurray/-the-graph-theoretical-library. 14. francis j. witty, “the pinakes of callimachus,” library quarterly 28, no. 1–4 (1958): 132–36. 15. ronald j. murray, “re-imagining the bibliographic universe: frbr, physics, and the world wide web,” slideshare .net, oct. 22 2010, http://www.slideshare.net/ronmurray/frbrphysics-and-the-world-wide-web-revised. 16. for an overview of the technology-driven library linked data initiative, see http://linkeddata.org/faq. murray’s analyses of cultural heritage resource descriptions may be explored in a series of slideshows at http://www.slideshare.net/ronmurray/. 17. pat riva, martin doerr, and maja žumer, “frbroo: enabling a common view of information from memory institutions,” international cataloging & bibliographic control 38, no. 2 (june 2009): 30–34. 184 information technology and libraries | december 2011 36. 
the multimedia mash-up in figure 10 was linked to the much larger moby-dick structure depicted in figure 9. the combination of the two yields figure 10a, which is too detailed for printout but which can be downloaded for inspection as the following pdf file: ronald j. murray and barbara b. tillett, “transfer and transformation of content across cultural heritage resources: a moby-dick resource description network covering full-length printings from 1851–1976*,” july 2011, http://arizona.openrepository.com/arizona/bitstream/10150/136270/4/fig%2010a%20orson%20whales%20 in%20moby%20dick%20context.pdf. in the figure, two print publications have been expanded to reveal their own similar mash-up structure. 37. robert darnton, poetry and the police: communication networks in eighteenth-century paris (cambridge, mass.: belknap pr. of harvard univ. pr., 2010): 16. 38. ronald j. murray in a discussion with robert darnton, sept. 20, 2010. darnton considered the poems retrieved from the archives as distinct intellectual creations, which permitted the use of frbr diagram elements for the analysis. otherwise, a paper tool with diagram elements based on the archival descriptive standard isad(g) would have been used. committee on descriptive standards, isad (g): general international standard archival description (stockholm, sweden, 1999– ). 39. the complete poetry communication diagram may be viewed at http://arizona.openrepository.com/arizona/ bitstream/10150/136270/6/fig%2011%20poetry%20commun ication%20network.pdf. 40. carole l. palmer, lauren c. teffeau, and carrie m. pittman, scholarly information practices for the online environment: themes from the literature and implications for library science development (dublin, ohio: oclc research, 2009), http://www . o c l c . o rg / p ro g r a m s / p u b l i c a t i o n s / re p o r t s / 2 0 0 9 0 2 . p d f (accessed july 15, 2011). 41. g. f. b. riemann, quoted in marvin j. greenberg, euclidean and non-euclidean geometry: development and history (new york: freeman, 2008): 371. 28. wikipedia.org, “use case,” june 13, 2011, http://en .wikipedia.org/wiki/use_case. 29. kaiser, drawing theories, 385–86. 30. prime examples being jacques derrida’s typographically complex 1974 work glas (univ. of nebraska pr.), and reality hunger: a manifesto (vintage), david shield’s 2011 textual mashup on the topic of originality, authenticity, and mash-ups in general. 31. graeme simsion, data modeling: theory and practice (bradley beach, n.j.: technics, 2007): 333. 32. herman melville, moby-dick (new york: harper & brothers; london: richard bentley, 1851). moby-dick edition publication history excerpted from g. thomas tanselle, checklist of editions of moby-dick 1851–1976. issued on the occasion of an exhibition at the newberry library commemorating the 125th anniversary of its original publication (evanston, ill.: northwestern univ. pr.; chicago: newberry library, 1976). 33. ronald j. murray, “from moby-dick to mash-ups: thinking about bibliographic networks,” slideshare.net, apr. 2011, http://www.slideshare.net/ronmurray/from-mobydick-to-mashups-revised. the moby-dick resource description diagram was presented to the american library association committee on cataloging: description and access at the ala annual conference, washington d.c., july 2010. 34. the life and works of herman melville, melville.org, july 25, 2000, http://melville.org. 35. the new york artist alex itin describes his creation: “it is more or less a birthday gift to myself. 
i’ve been drawing it on every page of moby dick (using two books to get both sides of each page) for months. the soundtrack is built from searching ‘moby dick’ on youtube (i was looking for orson’s preacher from the the [sic] john huston film) . . . you find tons of led zep [sic] and drummers doing bonzo and a little orson . . . makes for a nice melville in the end. cinqo [sic] de mayo i turn forty. ahhhhhhh the french champagne.” quoted from alex itin, “orson whales,” youtube, jan. 2011, http://www.youtube .com/watch?v=2_3-gem6o_g. 2 information technology and libraries | december 2008 andrew k. pacepresident’s message i n my first column, i mentioned that the lita board’s main objective is “to oversee the affairs of the division during the period between meetings.” of course, oversight requires communication. sometimes this is among board members, or it’s an e-mail update, or a post to the lita-l discussion list, or even the articles in this journal. regardless, i see the cornerstone of “between-meeting oversight” as keeping the membership fully (or even partially) engaged from january through june and july through december. as a mea culpa for the board, but without placing the blame on any one individual, i am willing to concede that the board has not done an adequate job of engaging the membership between american library association (ala) meetings. while ala itself is addressing this problem with recommendations for virtual participation and online collaboration, lita should be at the forefront of setting the benchmark for virtual communication, participation, education, planning, and membership development. in an attempt to posit some solutions, as opposed to finding someone to blame, i first thought of the lita committees. which one should be responsible for communicating lita opportunities and events to the membership using twenty-first-century technology? education? membership? web coordinating? program planning? publications? in the end, i was left with the choice of two evils: merge all the committees into one so that they can do everything or create a new committee to deal with the perceived problem. knowing that neither of those solutions will suffice, i’d like to put the onus back on the membership. maybe i’m trying to be a 2.0 librarian—crowdsourcing the problem, that is, taking the task that might have been done by an individual or committee and asking for more of a community-driven solution. in the past, lita focused on the necessary technologies for crowdsourcing—discussion lists, blogs, and wikis—as if the technology alone could solve the problem. the bigwig taskforce and web coordinating committee have shouldered the burden of both implementing the technology and gaining philosophical consensus on its use—a daunting task that can easily appear chaotic. now that the technology is commoditized (and generally embraced by ala at large and other divisions as well), perhaps it is time to embrace the philosophy of crowdsourcing. maybe it’s just because i have had cloud computing and web-scale architectures on the brain too much lately (having decided that it is impossible to serve two masters—job and volunteer work—i shall forever endeavor to find the overlap between the two), but i sincerely believe that repeating the mantra that lita’s strength is its membership is not mere rhetorical lipservice. ebay is better for sellers because there are so many buyers; it is better for buyers because there are so many sellers. 
googledocs works for sharing documents better than a corporate wiki or microsoft sharepoint because it breaks down the barriers of domains, allowing the participants to determine who shares responsibility for producing something. barcamps are rising in popularity not only because of a content focus on open data, open source, and open access, but because of the participatory and usergenerated style of the barcamp-style meetings. as a division of ala, lita has two challenges— leading the efforts of educating the membership, other divisions, and ala about impending sea changes in information technology, but also embracing these technologies itself. we must eat our own dog food, as the saying goes. perhaps it is more fitting to suggest that lita must not only focus on getting technology to work, but putting technology to work. in the next few months, the lita board will be tackling lita’s strategic plan, which expires in 2008. that means it is time not only to review the strategy—to educate, to serve, to reach out—but also to assess the tactics employed to fulfill that strategy. you are probably reading this column in or after the month in which the strategic plan ends, which does not mean that we will be coasting into the ala midwinter meeting. on the contrary, i sincerely hope to gather enough information from committees, task forces, members, and nonmembers in order for the lita leadership to come up with something strategically meaningful going into the next decade. one year isn’t nearly long enough to see something this big through to completion. just as national politicians begin reelection campaigns as soon as they are elected, i suspect that ala divisional presidents begin thinking about their legacy within the first couple months of office, if not before. but i hope, at least, to establish some groundwork, including a platform strategy that will allow the membership to maintain a connection with the board and with other members—to crowdsource solutions on a scale that has not been attempted in the past and that will solidify our future. and when we have a plan, you can trust that we will use all the available methods at our disposal to promote it and solicit your feedback. andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio. laneconnex | ketchell et al. 31 laneconnex: an integrated biomedical digital library interface debra s. ketchell, ryan max steinberg, charles yates, and heidi a. heilemann this paper describes one approach to creating a search application that unlocks heterogeneous content stores and incorporates integrative functionality of web search engines. laneconnex is a search interface that identifies journals, books, databases, calculators, bioinformatics tools, help information, and search hits from more than three hundred full-text heterogeneous clinical and bioresearch sources. the user interface is a simple query box. results are ranked by relevance with options for filtering by content type or expanding to the next most likely set. the system is built using component-oriented programming design. the underlying architecture is built on apache cocoon, java servlets, xml/xslt, sql, and javascript. the system has proven reliable in production, reduced user time spent finding information on the site, and maximized the institutional investment in licensed resources. 
most biomedical libraries separate searching for resources held locally from external database searching, requiring clinicians and researchers to know which interface to use to find a specific type of information. google, amazon, and other web search engines have shaped user behavior and expectations.1 users expect a simple query box with results returned from a broad array of content ranked or categorized appropriately with direct links to content, whether it is an html page, a pdf document, a streaming video, or an image. biomedical libraries have transitioned to digital journals and reference sources, adopted openurl link resolvers, and created institutional repositories. however, students, clinicians, and researchers are hindered from maximizing this content because of proprietary and heterogeneous systems. a strategic challenge for biomedical libraries is to create a unified search for a broad spectrum of licensed, open-access, and institutional content.
■■ background
studies show that students and researchers will use the search path of least cognitive resistance.2 ease and speed are the most important factors for using a particular search engine. a university of california report found that academic users want one search tool to cover a wide information universe, multiple formats, full-text availability to move seamlessly to the item itself, intelligent assistance and spelling correction, results sorted in order of relevance, help navigating large retrievals by logical subsetting and customization, and seamless access anytime, anywhere.3 studies of clinicians in the patient-care environment have documented that effort is the most important factor in whether a patient-care question is pursued.4 for researchers, finding and using the best bioinformatics tool is an elusive problem.5 in 2005, the lane medical library and knowledge management center (lane) at the stanford university medical center provided access to an expansive array of licensed, institutional, and open-access digital content in support of research, patient care, and education. as at most of its peer libraries, lane users were required to use scores of different interfaces to search external databases and find digital resources. we created a local metasearch application for clinical reference content, but it did not integrate result sets from disparate resources. a review of federated-search software in the marketplace found that products were either slow or they limited retrieval when faced with a broad spectrum of biomedical content. we decided to build on our existing application architecture to create a fast and unified interface. a detailed analysis of lane website-usage logs was conducted before embarking on the creation of the new search application. key points of user failure in the existing search options were spelling errors that could easily be corrected to avoid zero results; lack of sufficient intuitive options to move forward from a zero-results search or change topics without backtracking; lack of use of existing genre or role searches; confusion about when to use the resource, openurl resolver, or pubmed search to find a known item; and results that were cognitively difficult to navigate.
studies of web search-engine and pubmed search logs concurred with our usage-log analysis: a single-term search is the most common, with three words maximum entered by typical users.6 a pubmed study found that 22 percent of user queries were for known items rather than for a general subject, confirming our own log-analysis findings that the majority of searches were for a particular source item.7 search-term analysis revealed that many of our users were entering partial article citations (e.g., author, date) in any query box, expecting that article databases would be searched concurrently with the resource database.
debra s. ketchell (debra.ketchell@gmail.com) is the former associate dean for knowledge management and library director; ryan max steinberg (ryan.max.steinberg@stanford.edu) is the knowledge integration programmer/architect; charles yates (charles.yates@stanford.edu) is the systems software developer; and heidi a. heilemann (heidi.heilemann@stanford.edu) is the former director for research & instruction and current associate dean for knowledge management and library director at the lane medical library & knowledge management center, information resources & technology, stanford university school of medicine, stanford, california.
our displayed results were sorted alphabetically, and each version of an item was displayed separately. for the user, this meant a cluttered list with redundant title information that increased their cognitive effort to find meaningful items. overall, users were confronted with too many choices upfront and too few options after retrieving results. focus groups of faculty and students were conducted in 2005. attendees wanted local information integrated into the proposed single search. local information included content such as how-to information, expertise, seminars, grand rounds, core lab resources, drug formulary, patient handouts, and clinical calculators. most of this content is restricted to the stanford user population. users consistently described their need for a simple search interface that was fast and customized to the stanford environment. in late 2005, we embarked on a project to design a search application that would address both existing points of failure in the current system and meet the expressed need for a comprehensive discovery-and-finding tool as described in focus groups. the result is an application called laneconnex.
■■ design objectives
the overall goal of laneconnex is to create a simple, fast search across multiple licensed, open-access, and special-object local knowledge sources that depackages and reaggregates information on the basis of stanford institutional roles. the content of lane's digital collection includes forty-five hundred journal titles and forty-two thousand other digital resources, including video lectures, executable software, patient handouts, bioinformatics tools, and a significant store of digitized historical materials as a result of the google books program. media types include html pages, pdf documents, jpeg images, mp3 audio files, mpeg4 videos, and executable applications. more than three hundred reference titles have been licensed specifically for clinicians at the point of care (e.g., uptodate, emedicine, stat-ref, and micromedex clinical evidence). clinicians wanted their results to reflect subcomponents of a package (e.g., results from the micromedex patient handouts).
other clinical content is institutionally managed (e.g., institutional formulary, lab test database, or patient handouts). more than 175 biomedical research tools have been licensed or selected from open-access content. the needs of biomedical researchers include molecular biology tools and software, biomedical literature databases, citation analysis, chemical and engineering databases, expertise-finding tools, laboratory tools and supplies, institutional-research resources, and upcoming seminars. the specific objectives of the search application are the following:
■■ the user interface should be fast, simple, and intuitive, with embedded suggestions for improving search results (e.g., did you mean? didn't find it? have you tried?).
■■ search results from disparate local and external systems should be integrated into a single display based on popular search-engine models familiar to the target population.
■■ the query-retrieval and results display should be separated and reusable to allow customization by role or domain and future expansion into other institutional tools.
■■ resource results should be ranked by relevance and filtered by genre.
■■ metasearch results should be hit counts and filtered by category for speed and breadth. results should be reusable for specific views by role.
■■ finding a known article or journal should be streamlined and directly link to the item or "get item" option.
■■ the most popular search options (pubmed, google, and lane journals) should be ubiquitous.
■■ alternative pathways should be dynamic and interactive at the point of need to avoid backtracking and dead ends.
■■ user behavior should be tracked by search term, resource used, and user location to help the library make informed decisions about licensing, metadata, and missing content.
■■ off-the-shelf software should be used when available or appropriate, with development focused on search integration.
■■ the application should be built upon existing metadata-creation systems and trusted web-development technologies.
based on these objectives, we designed an application that is an extension of existing systems and technologies. resources are acquired and metadata are provided using the voyager integrated library system (ils). the sfx openurl link resolver provides full-text article access and expands the title search beyond biomedicine to all online journals at stanford. ezproxy provides seamless off-campus access. webtrends provides usage tracking. movable type is used to create faq and help information. a locally developed metasearch application provides a cross search with hit results from more than three hundred external and internal full-text sources. the technologies used to build laneconnex and integrate all of these systems include extensible stylesheet language transformations (xslt), java, javascript, the apache cocoon project, and oracle.
■■ systems description
architecture
laneconnex is built on a principle of separation of concerns. the lane content owner can directly change the inclusion of search results, how they are displayed, and additional path-finding information. application programmers use java, javascript, xslt, and structured query language (sql) to create components that generate and modify the search results. the merger of content design and search results occurs "just in time" in the user's browser. we use component-oriented programming design whereby services provided within the application are defined by simple contracts.
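the next paragraph describes these components as "transformers." as a rough, hypothetical illustration of what such a contract amounts to (this is not cocoon's actual transformer api, which is sax/stream-based), a service can be defined purely by what it consumes and produces:

```java
import java.util.List;
import org.w3c.dom.Document;

// hypothetical sketch of a component contract; cocoon's real transformer
// interface is stream-based and more involved than this.
interface XmlTransformer {
    Document transform(Document input) throws Exception;
}

// components that fulfill the same contract can be chained into a pipeline,
// mirroring the pipeline-processing model described in the text.
class Pipeline {
    static Document run(Document input, List<XmlTransformer> stages) throws Exception {
        Document current = input;
        for (XmlTransformer stage : stages) {
            current = stage.transform(current);
        }
        return current;
    }
}
```

because each stage depends only on the contract, one implementation can be swapped for another without touching the rest of the pipeline, which is the property relied on when a built-in component does not do what is needed and a new one must be written.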
in laneconnex, these components (called “transformers”) consume xml information and, after transforming it in some way, pass it on to some other component. a particular contract can be fulfilled in different ways for different purposes. this component architecture allows for easy extension of the underlying apache cocoon application. if laneconnex needs to transform some xml data that is not possible with built-in cocoon transformers, it is a simple matter to create a software component that does what is needed and fulfills the transformer contract. apache cocoon is the underlying architecture for laneconnex, as illustrated in figure 1. this java servlet is an xml–publishing engine that is built upon a component framework and uses a pipeline-processing model. a declarative language uses pattern matching to associate sets of processing components with particular request urls. content can come from a variety of sources. we use content from the local file system, network file system, http, and a relational database. the xslt language is used extensively in the pipelines and gives fine control of individual parts of the documents being processed. the end of processing is usually an xhtml document but can be any common mime type. we use cocoon to separate areas of concern so things like content, look and feel, and processing can all be managed as separate entities by different groups of people with little effect on another area. this separation of concerns is manifested by template documents that contain most of the html content common to all pages and are then combined with content documents within a processing pipeline. the declarative nature of the sitemap language and xslt facilitate rapid development with no need to redeploy the entire application to make changes in its behavior. the laneconnex search is composed of several components integrated into a query-and-results interface: oracle resource metadata, full-text metasearch application, movable type blogging software, “did you mean?” spell checker, ezproxy remote access, and webtrends tracking. n full-text metasearch integration of results from lane’s metasearch application illustrates cocoon’s many strengths. when a user searches laneconnex, cocoon sends his or her query to the metasearch application, which then dispatches the request to multiple external, full-text search engines and content stores. some examples of these external resources are uptodate, access medicine, micromedex, pubmed, and md consult. the metasearch application interacts with these external resources through jakarta commons http clients. responses from external resources are turned into w3c document object model (dom) objects, and xpath expressions are used to resolve hit counts from the dom objects. as result counts are returned, they are added to an xml–based result list and returned to cocoon. the power of cocoon becomes evident as the xml– based metasearch result list is combined with a separate display template. this template-based approach affords content curators the ability to directly add, group, and describe metasearch resources using the language and look that is most meaningful to their specific user communities. for example, there are currently eight metasearch templates curated by an informationist in partnership with a target community. curating these templates requires little to no assistance from programmers. 
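as a rough sketch of the hit-count resolution described above, the following java fragment sends a query to one external resource and resolves a count from the xml response with an xpath expression. the url template and xpath are hypothetical placeholders for per-resource configuration, and the standard java.net and javax.xml apis stand in for the jakarta commons http client used in production:

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// one external resource: where to send the query and how to read a hit count back
class MetasearchSource {
    final String name;          // display name used in the results template
    final String urlTemplate;   // hypothetical: contains a {query} placeholder
    final String hitCountXPath; // xpath that resolves to the result count in the response

    MetasearchSource(String name, String urlTemplate, String hitCountXPath) {
        this.name = name;
        this.urlTemplate = urlTemplate;
        this.hitCountXPath = hitCountXPath;
    }
}

class HitCountFetcher {
    // query one source and extract a hit count from its xml response
    static int fetchHitCount(MetasearchSource source, String query) throws Exception {
        String url = source.urlTemplate.replace("{query}", URLEncoder.encode(query, "UTF-8"));
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        // turn the response into a dom document, as described in the text
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(conn.getInputStream());
        // resolve the hit count with the configured xpath expression
        Number count = (Number) XPathFactory.newInstance().newXPath()
                .evaluate(source.hitCountXPath, doc, XPathConstants.NUMBER);
        return count.intValue();
    }
}
```

counts returned this way would then be appended to the xml result list and merged downstream with a curator-maintained display template.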
in lane’s 2005 interface, a user’s request was sent to the metasearch application, and the application waited five seconds before responding to give external resources a chance to return a result. hit counts in the user interface included a link to refresh and retrieve more results from external resources that had not yet responded. usability studies showed this to be a significant user barrier, since the refresh link was rarely clicked. the initial five second delay also gave users the impression that the site was slow. the laneconnex application makes heavy use of javascript to solve this problem. after a user makes her initial request, javascript is used to poll the metasearch application (through cocoon) on the user’s behalf, popping in result counts as external resources respond. this adds a level of interactivity previously unavailable and makes the metasearch piece of laneconnex much more successful than its previous version. resource metadata laneconnex replaces the catalog as the primary discovery interface. metadata describing locally owned and 34 information technology and libraries | march 2009 licensed resources (journals, databases, books, videos, images, calculators, and software applications) are stored in the library’s current system of record, an instance of the voyager ils. laneconnex makes no attempt to replace voyager ’s strengths as an application for the selection, acquisition, description, and management of access to library resources. it does, however, replace voyager ’s discovery interface. to this end, metadata for about eight thousand digital resources is extracted from voyager ’s oracle database, converted into marcxml, processed with xslt, and stored in a simple relational database (six tables and twenty-nine attributes) to support fast retrieval speed and tight control over search syntax. this extraction process occurs nightly, with incremental updates every five minutes. the oracle text search engine provides functionality anticipated by our internet-minded users. key features are speed and relevance-ranked results. a highly refined results ranking insures that the logical title appears in the first few results. a user ’s query is parsed for wildcard, boolean, proximity, and phrase operators, and then translated into an sql query. results are then transformed into a display version. related services laneconnex compares a user’s query terms against a dictionary. each query is sent to a cocoon spell-checking component that returns suggestions where appropriate. this component currently uses the simple object figure 1. laneconnex architecture. laneconnex | ketchell et al. 35 access protocol (soap)–based spelling service from google. google was chosen over the national center for biotechnology information (ncbi) spelling service because of the breadth of terms entered by users; however, cocoon’s component-oriented architecture would make it trivial to change spell checkers in the future. each query is also compared against stanford’s openurl link resolver (findit@stanford). client-side javascript makes a cocoon-mediated query of findit@stanford. using xslt, findit@stanford responses are turned into javascript object notation (json) objects and popped into the interface as appropriate. although the vast majority of laneconnex searches result in zero findit@stanford results, the convenience of searching all of lane’s systems in a single, unified interface far outweighs the effort of implementation. 
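as an illustration of the query-to-sql translation described earlier in this section, an oracle text search with relevance ranking can be expressed roughly as follows. the table and column names are hypothetical (the real store is the six-table schema described above), and the real parser also rewrites wildcard, boolean, proximity, and phrase operators into oracle text syntax before binding the query:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

class ResourceSearch {
    // run a relevance-ranked search against the extracted resource metadata;
    // "resource" and "keywords" are hypothetical names used for illustration only
    static void search(Connection conn, String oracleTextQuery) throws Exception {
        String sql = "SELECT title, url, SCORE(1) AS relevance "
                   + "FROM resource "
                   + "WHERE CONTAINS(keywords, ?, 1) > 0 "
                   + "ORDER BY SCORE(1) DESC";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, oracleTextQuery);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s (%s) score=%d%n",
                            rs.getString("title"), rs.getString("url"),
                            rs.getInt("relevance"));
                }
            }
        }
    }
}
```

ordering by the oracle text score is what lets a single-word title surface in the first few results, with any additional title weighting applied on top of that score.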
a commercial analytics tool called webtrends is used to collect web statistics for making data-centric decisions about interface changes. webtrends uses client-side javascript to track specific user click events. libraries need to track both on-site clicks (e.g., the user clicked on “clinical portal” from the home page) and off-site clicks (e.g., the user clicked on “yamada’s gastroenterology” after doing a search for “ibs”). to facilitate off-site click capture, webtrends requires every external link to include a snippet of javascript. requiring content creators to input this code by hand would be error prone and tedious. laneconnex automatically supplies this code for every class of link (search or static). this specialized webtrends method provides lane with data to inform both interface design and licensing decisions. n results laneconnex version 1.0 was released to the stanford biomedical community in july 2006. the current application can be experienced at http://lane.stanford.edu. the figure 2. laneconnex resource search results. resource results are ranked by relevance. single word titles are given a higher weight in the ranking algorithm to insure they are displayed in the first five results. uniform titles are used to co-locate versions (e.g., the three instances of science from different producers). journals titles are linked to their respective impact factor page in the isi web of knowledge. digital formats that require special players or restrictions are indicated. the metadata searched for ejournals, databases, ebooks, biotools, video, and medcalcs are lane’s digital resources extracted from the integrated library system into a searchable oracle database. the first “all” tab is the combined results of these genres and the lane site help and information. figure 3. laneconnex related services search enhancements. laneconnex includes a spell checker to avoid a common failure in user searches. ajax services allow the inclusion of search results from other sources for common zero results failures. for example, the stanford link resolver database is simultaneously searched to insure online journals outside the scope of biomedicine are presented as a linked result for the user. production version has proven reliable over two years. incremental user focus groups have been employed to improve the interface as issues arose. a series of vignettes will be used to illustrate how the current version of 36 information technology and libraries | march 2009 the “sunetid login” is required. n user query: “new yokrer.” a faculty member is looking for an article in the new yorker for a class reading assignment. he makes a typing error, which invokes the “did you mean?” function (see figure 3). he clicks on the correct spelling. no results are found in the resource search, but a simultaneous search of the link-resolver database finds an instance of this title licensed for the campus and displays a clickable link for the user. n user query: “pathway analysis.” a post–doc is looking for information on how to share an ingenuity pathway. figure 4 illustrates the integration of the locally created lane faqs. faqs comprise a broad spectrum of help and how-to information as described by our focus groups. help text is created in the movable type blog software, and made searchable through the laneconnex application. the movable type interface lowers the barrier to html content creation by any staff member. 
more complex answers include embedded images and videos to enable the user to see exactly how to do a particular procedure. cocoon allows for the syndication of subsets of this faq content back into static html pages where it can be displayed as both category-specific lists or as the text for scroll-over help for a link. having a single store of help information insures the content is updated once for all instances. n user query: “uterine cancer kapp.” a resident is looking for a known article. laneconnex simultaneously searches pubmed to increase the likelihood of user success (see figure 5). clicking on the pubmed tab retrieves the results in the native interface; however, the user sees the pubmed@stanford version, which includes embedded links to the article based on our openurl link resolver. the ability to retrieve results from bibliographic databases that includes article resolution insures that our biomedical community is always using the correct url to insure maximum full-text article access. user testing in 2007 found that adding the three most frequently used sources (pubmed, google, and lane catalog) into our one-box laneconnex search was a significant time saver. it addresses laneconnex meets the design objectives from the user’s perspective. n user query: “science.” a graduate student is looking for the journal science. the laneconnex results are listed in relevance order (see figure 2). singleword titles are given a higher weight in the ranking algorithm to insure they are displayed in the first five results. results from local metadata are displayed by uniform title. for example, lane has three instances of the journal science, and each version is linked to the appropriate external store. brief notes provide critical information for particular resources. for example, restricted local patient education documents and video seminars note that figure 4. example of integration of local content stores. help information is managed in moveable type and integrated into laneconnex search results. laneconnex | ketchell et al. 37 the expectation on the part of our users that they could search for an article or a journal title in a single search box without first selecting a database. n user query: “serotonin pulmonary hypertension.” a medical student is looking for the correlation of two topics. clicking on the “clinical” tab, the student sees the results of the clinical metasearch in figure 6. metasearch results are deep searches of sources within licensed packages (e.g., textbooks in md consult or a specific database in micromedex), local content (e.g., stanford’s lab-test database), and openaccess content (e.g., ncbi databases). pubmed results are tailored strategies tiered by evidence. for example, the evidence-summaries strategy retrieves results from twelve clinical-evidence resources (e.g., buj, clinical evidence, and cochrane systematic reviews) that link to the full-text licensed by stanford. an example of the bioresearch metasearch is shown in figure 7. content selected for this audience includes literature databases, funding sources, patents, structures, clinical trials, protocols, and stanford expertise integrated with gene, protein, and phenotype tools. user testing revealed that many users did not click on the “clinical” tab. the clinical metasearch was originally developed for the clinical portal page and focused on clinicians in practice; however, the results needed to be exposed more directly as part of the laneconnex search. 
figure 8 illustrates the “have you tried?” feature that displays a few relevant clinical-content sources without requiring the user to select the “clinical” tab. this feature is managed by the smartsearch component of the laneconnex system. smartsearch sends the user’s query terms to pubmed, extracts a subset of articles associated with those terms, extracts the mesh headings for those articles, and computes the frequency of headings in the articles to determine the most likely mesh terms associated with the user’s query terms. these mesh terms are mapped to mesh terms associated with each metasearch resource. preliminary evaluation indicates that the clinical content is now being discovered by more users. figure 5. example of integration of popular search engines into laneconnex results. three of the most popular searches based on usage analysis are included at the top level. pubmed and google are mapped to lane’s link resolver to retrieve the full article. creating or editing metasearch templates is a curator driven task. programming is only required to add new sources to the metasearch engine. a curator may choose from more than three hundred sources to create a discipline-based layout using general templates. names, categories, and other description information are all at the curator ’s discretion. while developing new subspecialty templates, we discovered that clinicians were confused by the difference in layout of their specialty portal and their metasearch results (e.g., the cardiology portal used the generic clinical metasearch). to address this issue, we devised an approach that merges a portal and metasearch into a single entity as illustrated in figure 9. a combination of the component-oriented architecture of laneconnex and javascript makes the integration of metasearch results into a new template patterned after a portal easy to implement. this strategy will enable the creation of templates contextually appropriate to knowledge requests originating from electronic medical-record systems in the future. direct user feedback and usage statistics confirm that search is now the dominant mode of navigation. the amount of time each user spends on the website has dropped since the release of version 1.0. we speculate that the integrated search helps our users find relevant 38 information technology and libraries | march 2009 information more efficiently. focus groups with students are uniformly positive. graduate students like the ability to find digital articles using a single search box. medical students like the clinical metasearch as an easy way to look up new topics in texts and customized pubmed searches. bioengineering students like the ability to easily look up patient care–related topics. pediatrics residents and attendings have championed the development of their portal and metasearch focused on their patient population. medical educators have commented on their ability to focus on the best information sources. n discussion a review of websites in 2007 found that most biomedical libraries had separate search interfaces for their digital resources, library catalog, and external databases. biomedical libraries are implementing metasearch software to cross search proprietary databases. 
the university of california, davis is using the metalib software to federate searching multiple bibliographic databases.8 the university of south california and florida state university are using webfeat software to search clinical textbooks.9 the health sciences library system at the university of pittsburgh is using vivisimo to search clinical textbooks and bioresearch tools.10 academic libraries are introducing new “resource shopping” applications, such as the endeca project at north carolina state university, the summa project at the university of aarhus, and the vufind project at villanova university.11 these systems offer a single query box, faceted results, spell checking, recommendations based on user input, and asynchronous javascript and xml (ajax) for live status information. we believe our approach is a practical integration for our biomedical community that bridges finding a resource and finding a specific item through figure 6. integration of metasearch results into laneconnex. results from two general, role-based metasearches (bioresearch and clinical) are included in the laneconnex interface. the first image shows a clinician searching laneconnex for serotonin pulmonary hypertension. selecting the clinical tab presents the clinical content metasearch display (second image), and is placed deep inside the source by selecting a title (third image). laneconnex | ketchell et al. 39 a metasearch of multiple databases. the laneconnex application searches across digital resources and external data stores simultaneously and presents results in a unified display. the limitation to our approach is that the metasearch returns only hit counts rather than previews of the specific content. standardization of results from external systems, particularly receipt of xml results, remains a challenge. federated search engines do integrate at this level, but are usually slow or limit the number of results. true integration awaits health level seven (hl7) clinical decision support standards and national information standards organization (niso) metasearch initiative for query and retrieval of specific content.12 one of the primary objectives of laneconnex is speed and ease of use. ranking and categorization of results has been very successful in the eyes of the user community. the integration of metasearch results has been particularly successful with our pediatric specialty portal and search. however, general user understanding of how the clinical and biomedical tabs related to the genre tabs in laneconnex has been problematic. we reviewed web engines and found a similar challenge in presenting disparate format results (e.g., video or image search results) or lists of hits from different systems (e.g., ncbi’s entrez search results).13 we are continuing to develop our new specialty portal-and-search model and our smartsearch term-mapping component to further integrate results. n conclusion laneconnex is an effective and openended search infrastructure for integrating local resource metadata and full-text content used by clinicians and biomedical researchers. its effectiveness comes from the recognition that users prefer a single query box with relevance or categorically organized results that lead them to the most likely figure 7. example of a bioresearch metasearch. figure 8. the smartsearch component embeds a set of the metasearch results into the laneconnex interface as “have you tried?” clickable links. these links are the equivalent of selecting the title from a clinical metasearch result. 
the example search for atypical malignant rhabdoid tumor (a rare childhood cancer) invokes oncology and pediatric textbook results. these texts and pubmed provide quick access for a medical student or resident on the pediatric ward. figure 9. example of a clinical specialty portal with integrated metasearch. clinical portal pages are organized so metasearch hit counts can display next to content links if a user executes a search. this approach removes the dissonance clinicians felt existed between separate portal page and metasearch results in version 1.0. 40 information technology and libraries | march 2009 answer to a question or prospects in their exploration. the application is based on separation of concerns and is easily extensible. new resources are constantly emerging, and it is important that libraries take full advantage of existing and forthcoming content that is tailored to their user population regardless of the source. the next major step in the ongoing development of laneconnex is becoming an invisible backend application to bring content directly into the user’s workflow. n acknowledgements the authors would like to acknowledge the contributions of the entire laneconnex technical team, in particular pam murnane, olya gary, dick miller, rick zwies, and rikke ogawa for their design contributions, philip constantinou for his architecture contribution, and alain boussard for his systems development contributions. references 1. denise t. covey, “the need to improve remote access to online library resources: filling the gap between commercial vendor and academic user practice,” portal libraries and the academy 3 no.4 (2003): 577–99; nobert lossau, “search engine technology and digital libraries,” d-lib magazine 10 no. 6 (2004), www.dlib.org/dlib/june04/lossau/06lossau.html (accessed mar. 1, 2008); oclc, “college students’ perception of libraries and information resource,” www.oclc.org/reports/ perceptionscollege.htm (accessed mar 1, 2008); and jim henderson, “google scholar: a source for clinicians,” canadian medical association journal 12 no. 172 (2005). 2. covey, “the need to improve remote access to online library resources”; lossau, “search engine technology and digital libraries”; oclc, “college students’ perception of libraries and information resource.” 3. jane lee, “uc health sciences metasearch exploration. part 1: graduate student gocus group findings,” uc health sciences metasearch team, www.cdlib.org/inside/assess/ evaluation_activities/docs/2006/draft_gradreport_march2006. pdf (accessed mar. 1, 2008). 4. karen k. grandage, david c. slawson, and allen f. shaughnessy, “when less is more: a practical approach to searching for evidence-based answers,” journal of the medical library association 90 no. 3 (2002): 298–304. 5. nicola cannata, emanuela merelli, and russ b. altman, “time to organize the bioinformatics resourceome,” plos computational biology 1 no. 7 (2005): e76. 6. craig silverstein et al., “analysis of a very large web search engine query log,” www.cs.ucsb.edu/~almeroth/ classes/tech-soc/2005-winter/papers/analysis.pdf (accessed mar. 1, 2008); anne aula, “query formulation in web information search,” www.cs.uta.fi/~aula/questionnaire.pdf (accessed mar. 1, 2008); jorge r. herskovic, len y. tanaka, william hersh, and elmer v. bernstam, “a day in the life of pubmed: analysis of a typical day’s query log,” journal of the american medical informatics association 14 no. 2 (2007): 212–20. 7. herskovic, “a day in the life of pubmed.” 8. 
davis libraries university of california, “quicksearch,” http://mysearchspace.lib.ucdavis.edu/ (accessed mar. 1, 2008). 9. eileen eandi, “health sciences multi-ebook search,” norris medical library newsletter (spring 2006), norris medical library, university of southern california, www.usc.edu/hsc/ nml/lib-information/newsletters.html (accessed mar. 1, 2008); maguire medical library, florida state university, “webfeat clinical book search,” http://med.fsu.edu/library/tutorials/ webfeat2_viewlet_swf.html (accessed mar. 1, 2008). 10. jill e. foust, philip bergen, gretchen l. maxeiner, and peter n. pawlowski, “improving e-book access via a librarydeveloped full-text search tool,” journal of the medical library association 95 no. 1 (2007): 40–45. 11. north carolina state university libraries, “endeca at the ncsu libraries,” www.lib.ncsu.edu/endeca (accessed mar. 1, 2008); hans lund, hans lauridsen, and jens hofman hansen, “summa—integrated search,” www.statsbiblioteket.dk/ publ/summaenglish.pdf (accessed mar. 1, 2008); falvey memorial library, villanova university, “vufind,” www.vufind.org (accessed mar. 1, 2008). 12. see the health level seven (hl7) clinical decision support working committee activities, in particular the infobutton standard proposal at www.hl7.org/special/committees/dss/ index.cfm and the niso metasearch initiative documentation at www.niso.org/workrooms/mi (accessed mar 1, 2008). 13. national center for biotechnology information (ncbi) entrez cross-database search, www.ncbi.nlm.nih.gov/entrez (accessed mar. 1, 2008). acrl 5 alcts 15 lita cover 2, cover 3 jaunter cover 4 index to advertisers author name and second author b y now, most library and information technology association (lita) members and information technology and libraries (ital) readers know that 2006 is the fortieth anniversary of lita’s predecessor, the information science and automation division (isad) of the american library association (ala). and 2007 marks the fortieth birthday of ital, first published in 1967 as the journal of library automation (jola). i hope that members and readers know the vital role played by fred kilgour in the founding of the division and as jola’s founding editor. this issue marks the initiation of a two-volume celebration (volumes 25 and 26) of his role as founding editor by publishing what we hope are significant articles resulting from original research, the development of important and creative new systems, or explications of significant new technologies that will shape future information technologies. i have invited some of the authors of these articles to submit their manuscripts. others are being submitted in response to a call i published both in an earlier editorial and in a message to the lita-l discussion list. whether invited or submitted, they will receive the same double-blind refereeing that all ital articles undergo. the referees will not know which articles have been invited or submitted for this purpose. the articles will, however, be so designated when they are published. volume 25 initiates a second landmark for ital. henceforth, ital will be published simultaneously in electronic and print versions. the electronic copy will be available to lita members and ital subscribers on the ala/lita web site. equally significantly, at the 2006 ala midwinter meeting in san antonio, the lita board of directors approved a second proposal from the lita publications committee. (the ital editor and editorial board report to the publications committee.) 
after six months, the electronic issues will be open to all, not restricted to members and subscribers. put simply, if you are a member or subscriber reading this issue in print, you may also read it and volume 25, number 1 (the march 2006 issue) on the web. when volume 25, number 3 is published in september 2006, the march issue on the web will be open for anyone to read. when the december issue is published, this june e-issue will be open to all. the web version will be published in both pdf and html formats. most ital articles now include urls, and readers will be able to link to them. most figures and graphs submitted by authors are in color; from now on, these will be available to the readers of the e-copies. ala publishing allows authors to submit their articles to institutional repositories, and many authors now do so. authors will retain this option. some articles have been posted on other portals as well. martha yee's outstanding june 2005 article on how to frbrize the opac appears not only on ucla's repository site but also on the escholarship repository site of the university of california system (http://repositories.cdlib.org/escholarship), one of the few library-related articles on the site. furthermore, on november 29, 2005, it was among the top ten most popular articles on the site. recently, dlist (http://dlist.sir.arizona.edu) at the university of arizona library received permission to include it. the decisions to allow simultaneous publication of print and electronic versions and to allow open access after six months were not made lightly. the lita board members carried on extensive electronic discussions among themselves and with nancy colyar, chair of the publications committee, and me. lita president pat mullin's summary of those discussions was more than ten single-spaced pages. nancy and i also attended a meeting of the board in san antonio. publications and memberships are two chief sources of revenue for almost all professional associations. in two surveys in the past ten years, lita members have indicated they considered ital to be their most important membership benefit. lita membership fell this year, probably because of the recent dues increases by other divisions of ala. this decline was anticipated by lita's leadership. i think both the ital editorial board and the lita leadership would love to take the additional pioneering step of making our journal a full open-access publication. however, legitimate concern was expressed that opening access after six months might lead to a decrease in both members and subscribers. a significant number of lita leaders said that their membership was based on lita programs, participation, and interaction with colleagues, not just ital. i hope that all lita members feel the same. i further hope that lita members will do everything they can to discourage their libraries from canceling their subscriptions. our financial health would be enhanced if all lita members took two other steps: participating in writing and encouraging the writing of significant articles, and encouraging the many library technology vendors they work with to advertise in ital. fred kilgour and the other founders of our division were library information technology (it) pioneers. fred's leadership helped make jola and now ital vital reading for library it professionals.
i believe that by celebrating the lita/ital anniversaries with a reconfirmation of our practice of publishing articles of the highest quality and by making ital more accessible through electronic publication, we are reaffirming the scholarly and professional commitments first made by fred kilgour and his isad colleagues such a short forty years ago. john webb

john webb (jwebb@wsu.edu) is assistant director for systems and planning, washington state university libraries, pullman, and editor of information technology and libraries.

editorial: lita and ital: forty and still counting

book reviews

automation in libraries, by r. t. kimber. oxford: pergamon press, 1968. 140 pp. $6.00.

many books have been published in recent years on the subject of library automation. very few of them, however, have succeeded in making meaningful contributions to a better understanding of the subject. this volume has made a sincere effort to be one of the few. although library automation is an ambiguous term which lacks precise definition, it is used here clearly to mean the use of computers in libraries. the book is intended for those with no computer background but who are familiar with library operations. it attempts to give a good introduction to current practices in library automation and a fairly detailed account of the state of the art. in the first chapter, "libraries and automation," mr. kimber discusses the relationship between the library and the computer. seeing the computer as a means of performing human clerical functions, he points out two important attitudes that must be observed: first, one must not change to a computer system just for the sake of changing, and second, one must be willing to change if the change means improvement. the monetary worth of the computer in the library is difficult to express because the end result is not increased profit but better service. since benefits from computer operations can be expressed in time and effort saved, these are the means of monetary comparison the author suggests. he also observes that although there are many good reasons for wanting computerized operations, some of these are merely emotional. chapter ii, "introduction to computers," is written by anne h. boyd, lecturer in computation at queen's university of belfast. miss boyd gives a brief review of the development and use of computers and discusses the fundamentals of computer systems. the next four chapters by mr. kimber present computerized systems for various library activities: chapter iii, "ordering and acquisitions"; chapter iv, "circulation control"; chapter v, "periodicals listing and accessioning"; and chapter vi, "catalogues and bibliographies." each chapter, with a minimum of technical terminology, gives a good account of what is involved in automating a particular operation. his treatment is very informative on these matters. in his final chapter (chapter vii, "the present state of automation in libraries") kimber discusses current trends of library automation and gives examples of libraries which use computers. his list is admittedly not comprehensive, but it does provide a comparison to the "ideal" systems he has described in the earlier chapters. in commenting on the future of computerized library systems, he sees these systems as an escape from the problems of everyday library operations. this book should be a good addition to the current books on library automation.
one unfortunate aspect, however, appears to be an absence of treatment regarding the psychological impact of automation on librarians and users, which is certainly one important aspect to be considered when automation of a system is proposed. also, at times the author, in attempting to simplify his discussion, has made a generalized statement without fuller explanation. this could be misleading and tend to confuse the uninitiated reader. these deficiencies are not of major consequence and do not prejudice the total work, but care should be taken in reading. sul h. lee

1968 international directory of research and development scientists. philadelphia: institute for scientific information, inc., 1969. 1352 pages (approx.). $60.00.

the second issue of the "international directory of research and development scientists" (idr&ds) lists the names and organizational addresses of 152,648 authors whose papers were listed in either

implications of marc, and the library of congress systems studies. (this paper includes twenty-eight pages of appendices, mostly charts.) two additional papers include a discussion of the future of, and a tabulation of trends affecting, library automation. much of the material in these non-survey papers is reported more completely elsewhere and some of it now seems dated. the material presented in this publication must have produced a highly effective educational institute in 1967. in 1969, its value is at best as a first reader in library automation but not as the state-of-the-art review the title proclaims. charles t. payne

computers and data processing: information sources, by chester morrill, jr. an annotated guide to the literature, associations, and institutions concerned with input, throughput, and output of data. detroit: gale research co., [1969]. 275 pp. $8.75. (management information guide, 15)

this latest volume in the management information guide series should prove as useful as its predecessors, offering to those persons interested in or concerned with computers and data processing (and who now is not?) an organized and extensive survey of the basic and necessary sources of available information. thus the text is for the most part an annotated bibliography of pertinent references arranged in broad categories, each category prefaced with a paragraph or two of comment. this is in the style of mr. morrill's earlier contribution to the series, systems and procedures including office management (1967), and, in general, that of all the volumes of the series. section 7, "operating," is the largest category, some forty pages of references subdivided into "manuals," "digital computers," "data transmission," "fortran," "software," and the like. section 9, entitled "front office references," is of particular interest to the reference librarian, since it serves as a guide to desirable dictionaries, handbooks, and abstracting services in the fields of automation and data processing. individual annotations are usually brief, informative, and on occasion evaluative. they give evidence of considerable skill in the art of capsule characterization. the prefatory paragraphs and notes to each section characterize the particular topic as successfully and succinctly as do the individual annotations. the preface to section 3, "personnel," is particularly felicitous. coverage is ample not only as to the subjects chosen but also as to numbers of references under individual subjects.
an important thirty pages of appendices lists additional sources of information (associations, manufacturers, seminars, publishers, placement firms, etc.), particularly valuable to the businessman or government official as a desk or front-office reference book, although the librarian will also find it of value in providing specific information for his clientele. in all, this is a highly competent and very welcome addition to the series as well as to the ranks of special reference sources so necessary to the proper practice of the reference librarian's art. i think of crane's a guide to the literature of chemistry and white's sources of information in the social sciences and consider the author quite comfortable in their company as well as in that of his colleagues in the series. in addition, he evinces in his annotations and prefaces a wit, a turn of phrase, and a capacity for direct statement that inform and delight the user. he displays an expertise in the fields of management and computer science, and one feels one can rely on his selection and judgment. eleanor r. devlin

centralized book processing: a feasibility study based on colorado academic libraries, by lawrence e. leonard, joan m. maier, and richard m. dougherty. metuchen, n.j.: scarecrow press, 1969. 401 pp. $10.00.

in october 1966 the national science foundation awarded a grant to the university of colorado libraries and the colorado council of librarians for research in the area of centralized processing. the project was in three phases. phase i involved an examination of the feasibility of establishing a book-processing center to serve the needs of the nine state-supported college and university libraries in colorado (which range in size from the university of colorado, with 805,959 volumes as of june 30, 1967, to metropolitan state college, a new institution with 8,310 volumes). phase ii involved a simulation study of the proposed center, while phase iii involved an operational book-processing center on a one-year experimental basis. this book summarizes the results of the first two phases of the study. phase i involved a detailed time-and-cost analysis of the acquisition, cataloging, and bookkeeping procedures in the nine participating libraries, with resultant processing costs per volume which are both convincing and somewhat startling, ranging as they do from $2.67 to $7.71 per volume. the operating specifications of the proposed book-processing center are then set forth, and a mathematical model for simulating its operations under a variety of alternative conditions is prepared. the conclusions are less than surprising: "a centralized book processing center to serve the needs of the academic libraries in colorado is a viable approach to book processing." project benefits are enumerated in the areas of cost savings, time-lag reductions, and the more efficient utilization of personnel. unfortunately, while many of the conclusions are buttressed by a dazzling array of tables and mathematical formulas (how can most librarians really argue with a regression analysis correlation coefficient matrix?), some of the most important savings cited are based on simple guesses, in some cases very simple guesses. to mention just two examples: 1) we are told that "a discount advantage expected through the use of combined ordering and a larger volume of ordering is conservatively estimated at 5% ... " (perhaps, but what is this based on?)
2) in the area of time lag reduction, "the greatest savings in time will accrue when the center is able to purchase materials from a vendor who has built up his book stock to reflect the needs of academic institutions. up to now, vendors have been unwilling to do this because there is insufficient profit motive." would nine libraries combining together change this profit picture? it is unfortunate that this report could not have waited on phase iii, the completion of the one-year trial of the operational center which was to have been ready in august 1969, so that we could see just how the predictions for the center worked out in practice. as it stands, however, the book is a valuable study in library systems analysis and design, and its identification and quantification of the various technical processing activities can yield real benefits to librarians everywhere, be they ever so decentralized. norman dudley

a guide to a selection of computer-based science and technology reference services in the u.s.a. chicago: american library association, 1969. 29 pages. $1.50.

this guide is an attempt to bring together those reference publications which are also available in machine readable form. as a "selection" it is limited to eighteen sources from government, professional, and private organizations. the guide is the result of a survey undertaken in 1968 by the science and technology reference services committee of the american library association reference services division. the committee was composed of elsie bergland, john mcgowan, william page, joseph paulukonis, margaret simonds, george caldwell, robert krupp, and richard snyder. each entry is broken down into three units: 1) the characteristics of the data base, 2) the equipment configuration, and 3) the use of the file. subject headings under characteristics of the data base include subject matter, literature surveyed, types of material covered, etc. the equipment configuration section describes computer model, core, operating systems, and programming language. the use of the file section covers potential uses of the data base by the producer and the subscriber. unfortunately for publications of this sort, they become out of date rather quickly. the continuing series, the directory of computerized information in science and technology, is updated periodically and is a very useful reference tool in this field. gerry d. guthrie
orthographic error patterns of author names in catalog searches

renata tagliacozzo, manfred kochen, and lawrence rosenberg: mental health research institute, the university of michigan, ann arbor, michigan

an investigation of error patterns in author names based on data from a survey of library catalog searches. position of spelling errors was noted and related to length of name. probability of a name having a spelling error was found to increase with length of name. nearly half of the spelling mistakes were replacement errors; following, in order of decreasing frequency, were omission, addition, and transposition errors.

computer-based catalog searching may fail if a searcher provides an author or title which does not match with the required exactitude the corresponding computer-stored catalog entry (1). in designing computer aids to catalog searching, it is important to build in safety features that decrease sensitivity to minor errors. for example, compression coding techniques may be used to minimize the effects of spelling errors on retrieval (2, 3, 4). preliminary to the design of good protection devices, the application of error-correction coding theory (5, 6, 7) and data on error patterns in actual catalog searches (8, 9) may be helpful. a recent survey of catalog use at three university libraries yielded some data of the above-mentioned kind (10). the aim of this paper is to present and analyze those results of the survey which bear on questions of error control in searching a computer-stored catalog. in the survey, users were interviewed at random as they approached the catalog. of the 2167 users interviewed, 1489 were searching the catalog for a particular item ("known-item searches"). of these, 67.9% first entered the catalog with an author's or editor's name, 26.2% with a title, and 5.9% with a subject heading. approximately half the searchers had a written citation, while half relied on memory for the relevant information.

and mit mandates, and other mandates such as the one instituted at stanford's school of education, have come to pass, and the registry of open access repository material archiving policies (roarmap) lists more than 120 mandates around the world that now exist.3 while it is too early to tell whether these developments will be successful in getting faculty to deposit their work in digital repositories, they at least establish a precedent that other institutions may follow. how many institutions follow and how effective the mandates will be once enacted remains to be seen. will all colleges and universities, or even a majority, adopt mandates that require faculty to deposit their work in repositories? what of those that do not?
even if most institutions are successful in instituting mandates, will they be sufficient to obtain faculty cooperation? for those institutions that do not adopt mandates, how are they going to persuade faculty to participate in self-archiving, or even in some variation—such as having surrogates (librarians, staff, or graduate assistants) archive the work of faculty? are mandates the only way to ensure faculty cooperation and compliance, or are mandates even necessarily the best way? to begin to adequately address the problem of user resistance to digital repositories, it might help to first gain some insight into the psychology of resistance. the existing literature on user behavior with regard to digital repositories devotes scant attention to the psychology of resistance. in an article entitled "institutional repositories: partnering with faculty to enhance scholarly communication," johnson discusses the inertia of the traditional publishing paradigm. he notes that this inertia is most evident in academic faculty. this would suggest that the problem of eliciting user cooperation is primarily motivational and that the problem is more one of indifference than active resistance.4 heterick, in his article "faculty attitudes toward electronic resources," suggests that one reason faculty may be resistant to digital repositories is that they do not fully trust them. in response to a survey he conducted, 48 percent of faculty felt that libraries should maintain paper archives.5 the implication is that digital repositories and archives may never completely replace hard copies in the minds of scholars. in "understanding faculty to improve content recruitment for institutional repositories," foster and gibbons point out that faculty complain of having too much work already. they resent any additional work that contributing to a digital repository might entail. thus the authors echo johnson in suggesting that faculty resistance

the potential value of digital repositories is dependent on the cooperation of scholars to deposit their work. although many researchers have been resistant to submitting their work, the literature on digital repositories contains very little research on the psychology of resistance. this article looks at the psychological literature on resistance and explores what its implications might be for reducing the resistance of scholars to submitting their work to digital repositories. psychologists have devised many potentially useful strategies for reducing resistance that might be used to address the problem; this article examines these strategies and how they might be applied.

observing the development and growth of digital repositories in recent years has been a bit like riding an emotional roller coaster. even the definition of what constitutes a repository may not be the subject of complete agreement, but for the purposes of this study, a repository is defined as an online database of digital or digitized scholarly works constructed for the purpose of preserving and disseminating scholarly research. the initial enthusiasm expressed by librarians and advocates of open access toward the potential of repositories to make significant amounts of scholarly research available to anyone with internet access gradually gave way to a more somber appraisal of the prospects of getting faculty and researchers to deposit their work.
in august 2007, bailey posted an entry to his digital koans blog titled "institutional repositories: doa?" in which he noted that building digital repository collections would be a long, arduous, and costly process.1 the success of repositories, in his view, will be a function not so much of technical considerations as of attitudinal ones. faculty remain unconvinced that repositories are important, and there is a critical need for outreach programs that point to repositories as an important step in solving the crisis in scholarly communication. salo elaborated on bailey's post with "yes, irs are broken. let's talk about it," on her own blog, caveat lector. salo points out that institutional repositories have not fulfilled their early promise of attracting a large number of faculty who are willing to submit their work. she criticizes repositories for monopolizing the time of library faculty and staff, and she states her belief that repositories will not work without deposit mandates, but that mandates are impractical.2 subsequent events in the world of scholarly communication might suggest that mandates may be less impractical than salo originally thought. since her post, the national institutes of health mandate, the harvard

brian quinn (brian.quinn@ttu.edu) is social sciences librarian, texas tech university libraries, lubbock.

whether or not this was actually the case.11 this study also suggests that a combination of both cognitive and affective processes feeds faculty resistance to digital repositories. it can be seen from the preceding review of the literature that several factors have been identified as being possible sources of user resistance to digital repositories. yet the authors offer little in the way of strategies for addressing this resistance other than to suggest workaround solutions such as having nonscholars (e.g., librarians, graduate students, or clerical staff) serve as proxy for faculty and deposit their work for them, or to suggest that institutions mandate that faculty deposit their work. similarly, although numerous arguments have been made in favor of digital repositories and open access, they do not directly address the resistance issue.12 in contrast, psychologists have studied user resistance extensively and accumulated a body of research that may suggest ways to reduce resistance rather than try to circumvent it. it may be helpful to examine some of these studies to see what insights they might offer to help address the problem of user resistance. it should be pointed out that resistance as a topic has been addressed in the business and organizational literature, but has generally been approached from the standpoint of management and organizational change.13 this study has chosen to focus primarily on the psychology of resistance because many repositories are situated in a university setting. unlike employees of a corporation, faculty members typically have a greater degree of autonomy and latitude in deciding whether to accommodate new work processes and procedures into their existing routines, and the locus of change will therefore be more at an individual level.

■■ the psychology of user resistance

psychologists define resistance as a preexisting state or attitude in which the user is motivated to counter any attempts at persuasion. this motivation may occur on a cognitive, affective, or behavioral level.
psychologists thus distinguish between a state of not being persuaded and one in which there is actual motivation to not comply. the source of the motivation is usually an affective state, such as anxiety or ambivalence, which itself may result from cognitive problems, such as misunderstanding, ignorance, or confusion.14 it is interesting to note that psychologists have long viewed inertia as one form of resistance, suggesting paradoxically that a person can be motivated to inaction.15 resistance may also manifest itself in more subtle forms that shade into indifference, suspicion of new work processes or technologies, and contentment with the status quo.

may be attributed at least in part to motivation.6 in another article published a few months later, foster and gibbons suggest that the main reason faculty have been slow to deposit their work in digital repositories is a cognitive one: faculty have not understood how they would benefit by doing so. the authors also mention that users may feel anxiety when executing the sequence of technical steps needed to deposit their work, and that they may also worry about possible copyright infringement.7 the psychology of resistance may thus manifest itself in both cognitive and affective ways. harley and her colleagues talk about faculty not perceiving any reward for depositing their work in their article "the influence of academic values on scholarly publication and communication practices." this perception results in reduced drive to participate. anxiety is another factor contributing to resistance: faculty fear that their work may be vulnerable to plagiarism in an open-access environment.8 in "towards user responsive institutional repositories: a case study," devakos suggests that one source of user resistance is cognitive in origin. scholars do not submit their work frequently enough to be able to navigate the interface from memory, so they must reinitiate the learning process each time they submit their work. the same is true for entering metadata for their work.9 their sense of control may also be threatened by any limitations that may be imposed on substituting later iterations of their work for earlier versions. davis and connolly point to several sources of confusion, uncertainty, and anxiety among faculty in their article "institutional repositories: evaluating the reasons for non-use of cornell university's installation of dspace." cognitive problems arise from having to learn new technology to deposit work and not knowing copyright details well enough to know whether publishers would permit the deposit of research prior to publication. faculty wonder whether this might jeopardize their chances of acceptance by important journals whose editors might view deposit as a form of prior publication that would disqualify them from consideration. there is also fear that the complex structure of a large repository may actually make a scholar's work more difficult to find; faculty may not understand that repositories are not isolated institutional entities but are usually searchable by major search engines like google.10 kim also identifies anxiety about plagiarism and confusion about copyright as being sources of faculty resistance in the article "motivating and impeding factors affecting faculty contribution to institutional repositories." kim found that plagiarism anxiety made some faculty only willing to deposit already-published work and that prepublication material was considered too risky.
faculty with no self-archiving experience also felt that many publishers do not allow self-archiving,

more open to information that challenges their beliefs and attitudes and are more open to suggestion.18 thus before beginning a discussion of why users should deposit their research in repositories, it might help to first affirm the users' self-concept. this could be done, for example, by reminding them of how unbiased they are in their work or how important it is in their work to be open to new ideas and new approaches, or how successful they have been in their work as scholars. the affirmation should be subtle and not directly related to the repository situation, but it should remind them that they are open-minded individuals who are not bound by tradition and that part of their success is attributable to their flexibility and adaptability. once the users have been affirmed, librarians can then lead into a discussion of the importance of submitting scholarly research to repositories. self-generated affirmations may be even more effective. for example, another way to affirm the self would be to ask users to recall instances in which they successfully took a new approach or otherwise broke new ground or were innovative in some way. this could serve as a segue into a discussion of the repository as one more opportunity to be innovative. once the self-concept has been boosted, the threatening quality of the message will be perceived as less disturbing and will be more likely to receive consideration. a related strategy that psychologists employ to reduce resistance involves casting the user in the role of "expert." this is especially easy to do with scholars because they are experts in their fields. casting the user in the role of expert can deactivate resistance by putting that person in the persuasive role, which creates a form of role reversal.19 rather than the librarian being seen as the persuader, the scholar is placed in that role. by saying to the scholar, "you are the expert in the area of communicating your research to an audience, so you would know better why the digital repository is an alternative that deserves consideration once you understand how it works and how it may benefit you," you are empowering the user. casting the user as an expert imparts a sense of control to the user. it helps to disable resistance by placing the user in a position of being predisposed to agree to the role he or she is being cast in, which also makes the user more prone to agree with the idea of using a digital repository. priming and imagining one important discovery that psychologists have made that has some bearing on user resistance is that even subtle manipulations can have a significant effect on one's judgments and actions. in an interesting experiment, psychologists told a group of students that they were to read an online newspaper, ostensibly to evaluate its design and assess how easy it was to read. half of them read an editorial discussing a public opinion survey of youth

■■ negative and positive strategies for reducing resistance

just as the definition of resistance can be paradoxical, so too may be some of the strategies that psychologists use to address it. perhaps the most basic example is to counter resistance by acknowledging it.
when scholars are presented with a message that overtly states that digital repositories are beneficial and desirable, it may simultaneously generate a covert reaction in the form of resistance. rather than simply anticipating this and attempting to ignore it, digital repository advocates might be more persuasive if they acknowledge to scholars that there will likely be resistance, mention some possible reasons (e.g., plagiarism or copyright concerns), and immediately introduce some counter-rationales to address those reasons.16 psychologists have found that being up front and forthcoming can reduce resistance, particularly with regard to the downside of digital repositories. they have learned that it can be advantageous to preemptively reveal negative information about something so that it can be downplayed or discounted. thus talking about the weaknesses or shortcomings of digital repositories as early as possible in an interaction may have the effect of making these problems seem less important and weakening user resistance. not only does revealing negative information impart a sense of honesty and credibility to the user, but psychologists have found that people feel closer to people who reveal personal information.17 a librarian could thus describe some of his or her own frustrations in using repositories as an effective way of establishing rapport with resistant users. the unexpected approach of bringing up the less desirable aspects of repositories—whether this refers to the technological steps that must be learned to submit one's work or the fact that depositing one's work in a repository is not a guarantee that it will be highly cited—can be disarming to the resistant user. this is particularly true of more resistant users who may have been expecting a strong hard-sell approach on the part of librarians. when suddenly faced with a more candid appeal, the user may be thrown off balance psychologically, leaving him or her more vulnerable to information that is the opposite of what was anticipated and to possibly viewing that information in a more positive light. if one way to disarm a user is to begin by discussing the negatives, a seemingly opposite approach that psychologists take is to reinforce the user's sense of self. psychologists believe that one source of resistance arises when a user's self-concept—which the user tries to protect from any source of undesired change—has been threatened in one way or another. a stable self-concept is necessary for the user to maintain a sense of order and predictability. reinforcing the self-concept of the user should therefore make the user less likely to resist depositing work in a digital repository. self-affirmed users are

or even possibly collaborating on research. their imaginations could be further stimulated by asking them to think of what it would be like to have their work still actively preserved and available to their successors a century from now. using the imagining strategy could potentially be significantly more effective in attenuating resistance than presenting arguments based on dry facts. identification and liking conscious processes like imagining are not the only psychological means of reducing the resistance of users to digital repositories. unconscious processes can also be helpful.
one example of such a process is what psychologists refer to as the “liking heuristic.” this refers to the tendency of users to employ a rule-of-thumb method to decide whether to comply with requests from persons. this tendency results from users constantly being inundated with requests. consequently, they need to simplify and streamline the decision-making process that they use to decide whether to cooperate with a request. the liking heuristic holds that users are more likely to help someone they might otherwise not help if they unconsciously identify with the person. at an unconscious level, the user may think that a person acts like them and dresses like them, and therefore the user identifies with that person and likes them enough to comply with their request. in one experiment that psychologists conducted to see if people are more likely to comply with requests from people that they identify with, female undergraduates were informed that they would be participating in a study of first impressions. the subjects were instructed that they and a person in another room would each learn a little about one another without meeting each other. each subject was then given a list of fifty adjectives and was asked to select the twenty that were most characteristic of themselves. the experimenter then told the participants that they would get to see each other’s lists. the experimenter took the subject’s list and then returned a short time later with what supposedly was the other participant’s list, but was actually a list that the experimenter had filled out to indicate that either the subject had much in common with the other participant’s personality (seventeen of twenty matches), some shared attributes (ten of twenty matches), or relatively few characteristics in common (three of twenty matches). the subject was then asked to examine the list and fill out a survey that probed their initial impressions of the other participant, including how much they liked them. at the end of the experiment, the two subjects were brought together and given credit for participating. the experimenter soon left the room and the confederate participant asked the other participant if she would read and critically evaluate an eight-page paper for an english class. the results of the experiment indicated that the more the participant thought she shared in consumer patterns that highlighted functional needs, and the other half read a similar editorial focusing on hedonistic needs. the students next viewed an ad for a new brand of shampoo that featured either a strong or a weak argument for the product. the results of the experiment indicated that students who read the functional editorial and were then subsequently exposed to the strong argument for the shampoo (a functional product) had a much more favorable impression of the brand than students who had received the mismatched prime.20 while it may seem that the editorial and the shampoo were unrelated, psychologists found that the subjects engaged in a process of elaborating the editorial, which then predisposed them to favor the shampoo. the presence of elaboration, which is a precursor to the development of attitudes, suggests that librarians could reduce users’ resistance to digital repositories by first involving them in some form of priming activity immediately prior to any attempt to persuade them. for example, asking faculty to read a brief case study of a scholar who has benefited from involvement in open-access activity might serve as an effective prime. 
another example might be to listen briefly to a speaker summarizing the individual, disciplinary, and societal benefits of sharing one's research with colleagues. interventions like these should help mitigate any predisposition toward resistance on the part of users. imagining is a strategy related to priming that psychologists have found to be effective in reducing resistance. taking their cue from insurance salesmen—who are trained to get clients to actively imagine what it would be like to lose their home or be in an accident—a group of psychologists conducted an experiment in which they divided a sample of homeowners who were considering the purchase of cable tv into two groups. one group was presented with the benefits of cable in a straightforward, informative way that described various features. the other group was asked to imagine themselves enjoying the benefits and all the possible channels and shows that they might experience and how entertaining it might be. the psychologists then administered a questionnaire. the results indicated that those participants who were asked to imagine the benefits of cable were much more likely to want cable tv and to subscribe to it than were those who were only given information about cable tv.21 in other words, imagining resulted in more positive attitudes and beliefs. this study suggests that librarians attempting to reduce resistance among users of digital repositories may need to do more than merely inform or describe to them the advantages of depositing their work. they may need to ask users to imagine in vivid detail what it would be like to receive periodic reports indicating that their work had been downloaded dozens or even hundreds of times. librarians could ask them to imagine receiving e-mail or calls from colleagues indicating that they had accessed their work in the repository and were interested in learning more about it,

students typically overestimate the amount of drinking that their peers engage in at parties. these inaccurate normative beliefs act as a negative influence, causing them to imbibe more because they believe that is what their peers are doing. by informing students that almost three-quarters of their peers have fewer than three drinks at social gatherings, psychologists have had some success in reducing excessive drinking behavior by students.23 the power of normative messages is illustrated by a recent experiment conducted by a group of psychologists who created a series of five cards to encourage hotel guests to reuse their towels during their stay. the psychologists hypothesized that by appealing to social norms, they could increase compliance rates. to test their hypothesis, the researchers used a different conceptual appeal for each of the five cards. one card appealed to environmental concerns ("help save the environment"), another to environmental cooperation ("partner with us to save the environment"), a third card appealed to the advantage to the hotel ("help the hotel save energy"), a fourth card targeted future generations ("help save resources for future generations"), and a final card appealed to guests by making reference to a descriptive norm of the situation ("join your fellow citizens in helping to save the environment").
the results of the study indicated that the card that mentioned the benefit to the hotel was least effective in getting guests to reuse their towels, and the card that was most effective was the one that mentioned the descriptive norm.24 this research suggests that if users who are resistant to submitting their work to digital repositories were informed that a larger percentage of their peers were depositing work than they realized, resistance may be reduced. this might prove to be particularly true if they learned that prominent or influential scholars were engaged in populating repositories with their work. this would create a social-norms effect that would help legitimize repositories to other faculty and help them to perceive the submission process as normal and desirable. the idea that accomplished researchers are submitting materials and reaping the benefits might prove very attractive to less experienced and less well-regarded faculty. psychologists have a considerable body of evidence in the area of social modeling that suggests that people will imitate the behavior of others in social situations because that behavior provides an implicit guideline of what to do in a similar situation. a related finding is that the more influential people are, the more likely it is for others to emulate their actions. this is even more probable for high-status individuals who are skilled and attractive and who are capable of communicating what needs to be done to potential followers.25 social modeling addresses both the cognitive dimension of how resistant users should behave and also the affective dimension by offering models that serve as a source of motivation to resistant users to change

common with the confederate, the more she liked her. the more she liked the confederate and experienced a perception of consensus, the more likely she was to comply with her request to critique the paper.22 thus, when trying to overcome the resistance of users to depositing their work in a digital repository, it might make sense to consider who it is that is making the request. universities sometimes host scholarly communication symposia that are aimed not only at getting faculty interested in open-access issues but also at urging them to submit their work to the institution's repositories. frequently, speakers at these symposia consist of academic administrators, members of scholarly communication or open-access advocacy organizations, or individuals in the library field. the research conducted by psychologists, however, suggests that appeals to scholars and researchers would be more effective if they were made by other scholars and those who are actively engaged in research. faculty are much more likely to identify with and cooperate with requests from their own tribe, as it were, and efforts need to be concentrated on getting faculty who are involved in and understand the value of repositories to articulate this to their colleagues. researchers who can personally testify to the benefits of depositing their work are most likely to be effective at convincing other researchers of the value of doing likewise and will be more effective at reducing resistance. librarians need to recognize who their potentially most effective spokespersons and advocates are, which the psychological research seems to suggest is faculty talking to other faculty.
perceived consensus and social modeling the processes of faculty identification with peers and perceived consensus mentioned above can be further enhanced by informing researchers that other scholars are submitting their work, rather than merely telling researchers why they should submit their work. information about the practices of others may help change beliefs because of the need to identify with other in-group members. this is particularly true of faculty, who are prone to making continuous comparisons with their peers at other institutions and who are highly competitive by nature. once they are informed of the career advantages of depositing their work (in terms of professional visibility, collaboration opportunities, etc.), and they are informed that other researchers have these advantages, this then becomes an impetus for them to submit their work to keep up with their peers and stay competitive. a perception of consensus is thus fostered—a feeling that if one's peers are already depositing their work, this is a practice that one can more easily agree to. psychologists have leveraged the power of identification by using social-norms research to inform people about the reality of what constitutes normative behavior as opposed to people's perceptions of it. for example, college

highly resistant users that may be unwilling to submit their work to a repository. rather than trying to prepare a strong argument based on reason and logic, psychologists believe that using a narrative approach may be more effective. this means conveying the facts about open access and digital repositories in the form of a story. stories are less rhetorical and tend not to be viewed by listeners as attempts at persuasion. the intent of the communicator and the counter-resistant message are not as overt, and the intent of the message might not be obvious until it has already had a chance to influence the listener. a well-crafted narrative may be able to get under the radar of the listener before the listener has a chance to react defensively and revert to a mode of resistance. in a narrative, beliefs are rarely stated overtly but are implied, and implied beliefs are more difficult to refute than overtly stated beliefs. listening to a story and wondering how it will turn out tends to use up much of the cognitive attentional capacity that might otherwise be devoted to counterarguing, which is another reason why using a narrative approach may be particularly effective with users who are strongly resistant. the longer and more subtle nature of narratives may also make them less a target of resistance than more direct arguments.28 using a narrative approach, the case for submitting work to a repository might be presented not as a collection of dry facts or statistics, but rather as a story. the protagonists are the researchers, and their struggle is to obtain recognition for their work and to advance scholarship by providing maximum access to the greatest audience of scholars and to obtain as much access as possible to the work of their peers so that they can build on it. the protagonists are thwarted in their attempts to achieve their ends by avaricious publishers who obtain the work of researchers for free and then sell it back to them in the form of journal and database subscriptions and books for exorbitant prices. these prices far exceed the rate of inflation or the budgets of universities to pay for them.
the publishers engage in a series of mergers and acquisitions that swallow up small publishing firms and result in the scholarly publishing enterprise being controlled by a few giant firms that offer unreasonable terms to users and make unreasonable demands when negotiating with them. presented in this dramatic way, the significance of scholar participation in digital repositories becomes magnified to an extent that it becomes more difficult to resist what may almost seem like an epic struggle between good and evil. and while this may be a greatly oversimplified example, it nonetheless provides a sense of the potential power of using a narrative approach as a technique to reduce resistance. introducing a time element into the attempt to persuade users to deposit their work in digital repositories can play an important role in reducing resistance. given that faculty are highly competitive, introducing the idea not only that other faculty are submitting their work but that they are already benefiting as a result makes the their behavior in the desired direction. redefinition, consistency, and depersonalization another strategy that psychologists use to reduce resistance among users is to change the definition of the situation. resistant users see the process of submitting their research to the repository as an imposition at best. in their view, the last thing that they need is another obligation or responsibility to burden their already busy lives. psychologists have learned that reframing a situation can reduce resistance by encouraging the user to look at the same phenomenon in a different way. in the current situation, resistant users should be informed that depositing their work in a digital repository is not a burden but a way to raise their professional profile as researchers, to expose their work to a wider audience, and to heighten their visibility among not only their peers but a much larger potential audience that would be able to encounter their work on the web. seen in this way, the additional work of submission is less of a distraction and more of a career investment. moreover, this approach leverages a related psychological concept that can be useful in helping to dissolve resistance. psychologists understand that inconsistency has a negative effect on self-esteem, so persuading users to believe that submitting their work to a digital repository is consistent with their past behavior can be motivating.26 the point needs to be emphasized with researchers that the act of submitting their work to a digital repository is not something strange and radical, but is consistent with prior actions intended to publicize and promote their work. a digital repository can be seen as analogous to a preprint, book, journal, or other tangible and familiar vehicles that faculty have used countless times to send their work out into the world. while the medium might have changed, the intention and the goal are the same. reframing the act of depositing as “old wine in new bottles” may help to undermine resistance. in approaching highly resistant individuals, psychologists have discovered that it is essential to depersonalize any appeal to change their behavior. 
instead of saying, “you should reduce your caloric intake,” it is better to say, “it is important for people to reduce their caloric intake.” this helps to deflect and reduce the directive, judgmental, and prescriptive quality of the request, thus making it less likely to provoke resistance.27 suggestion can be much less threatening than prescription among users who may be suspicious and mistrusting. reverting to a third-person level of appeal may allow the message to get through without it being immediately rejected by the user. narrative, timing, and anticipation psychologists recommend another strategy to help defuse reducing psychological resistance to digital repositories | quinn 73 technological platforms, and so on. this could be followed by a reminder to users that it is their choice—it is entirely up to them. this reminder that users have the freedom of choice may help to further counter any resistance generated as a result of instructions or inducements to anticipate regret. indeed, psychologists have found that reinstating a choice that was previously threatened can result in greater compliance than if the threat had never been introduced.32 offering users the freedom to choose between alternatives tends to make them more likely to comply. this is because having a choice enables users to both accept and resist the request rather than simply focus all their resistance on a single alternative. when presented with options, the user is able to satisfy the urge to resist by rejecting one option but is simultaneously motivated to accept another option; the user is aware that there are benefits to complying and wants to take advantage of them but also wants to save face and not give in. by being offered several alternatives that nonetheless all commit to a similar outcome, the user is able to resist and accept at the same time.33 for example, one alternative option to self-archiving might be to present the faculty member with the option of an authorpays publishing model. the choice of alternatives allows the faculty member to be selective and discerning so that a sense of satisfaction is derived from the ability to resist by rejecting one alternative. at the same time, the librarian is able to gain compliance because one of the other alternatives that commits the faculty member to depositing research is accepted. options, comparisons, increments, and guarantees in addition to offering options, another way to erode user resistance to digital repositories is to use a comparative strategy. one technique is to first make a large request, such as “we would like you to submit all the articles that you have published in the last decade to the repository,” and then follow this with a more modest request, such as “we would appreciate it if you would please deposit all the articles you have published in the last year.” the original request becomes an “anchor” or point of reference in the mind of the user against which the subsequent request is then evaluated. setting a high anchor lessens user resistance by changing the user’s point of comparison of the second request from nothing (not depositing any work in the repository) to a higher value (submitting a decade of work). in this way, a high reference anchor is established for the second request, which makes it seem more reasonable in the newly created context of the higher value.34 the user is thus more likely to comply with the second request when it is framed in this way. 
using this comparative approach may also work because it creates a feeling of reciprocity in the user. when proposition much more salient. it not only suggests that submitting work is a process that results in a desirable outcome, but that the earlier one’s work is submitted, the more recognition will accrue and the more rapidly one’s career will advance.29 faculty may feel compelled to submit their work in an effort to remain competitive with their colleagues. one resource that may be particularly helpful for working with skeptical faculty who want substantiation about the effect of self-archiving on scholarly impact is a bibliography created by the open citation project titled, “the effect of open access and downloads (hits) on citation impact: a bibliography of studies.”30 it provides substantial documentation of the effect that open access has on scholarly visibility. an additional stimulus might be introduced in conjunction with the time element in the form of a download report. showing faculty how downloads accumulate over time is analogous to arguments that investment counselors use showing how interest on investments accrues and compounds over time. this investment analogy creates a condition in which hesitating to submit their work results in faculty potentially losing recognition and compromising their career advancement. an interesting related finding by psychologists suggests that an effective way to reduce user resistance is to have users think about the future consequences of complying or not complying. in particular, if users are asked to anticipate the amount of future regret they might experience for making a poor choice, this can significantly reduce the amount of resistance to complying with a request. normally, users tend not to ruminate about the possibility of future disappointment in making a decision. if users are made to anticipate future regret, however, they will act in the present to try to minimize it. studies conducted by psychologists show that when users are asked to anticipate the amount of future regret that they might experience for choosing to comply with a request and having it turn out adversely versus choosing to not comply and having it turn out adversely, they consistently indicate that they would feel more regret if they did not comply and experienced negative consequences as a result.31 in an effort to minimize this anticipated regret, they will then be more prone to comply. based on this research, one strategy to reduce user resistance to digital repositories would be to get users to think about the future, specifically about future regret resulting from not cooperating with the request to submit their work. if they feel that they might experience more regret in not cooperating than in cooperating, they might then be more inclined to cooperate. getting users to think about the future could be done by asking users to imagine various scenarios involving the negative outcomes of not complying, such as lost opportunities for recognition, a lack of citation by peers, lost invitations to collaborate, an inability to migrate one’s work to future 74 information technology and libraries | june 2010 submit their work. mandates rely on authority rather than persuasion to accomplish this and, as such, may represent a less-than-optimal solution to reducing user resistance. mandates represent a failure to arrive at a meeting of the minds of advocates of open access, such as librarians, and the rest of the intellectual community. 
understanding the psychology of resistance is an important prerequisite to any effort to reduce it. psychologists have assembled a significant body of research on resistance and how to address it. some of the strategies that the research suggests may be effective, such as discussing resistance itself with users and talking about the negative effects of repositories, may seem counterintuitive and have probably not been widely used by librarians. yet when other more conventional techniques have been tried with little or no success, it may make sense to experiment with some of these approaches. particularly in the academy, where reason is supposed to prevail over authority, incorporating resistance psychology into a program aimed at soliciting faculty research seems an appropriate step before resorting to mandates. most strategies that librarians have used in trying to persuade faculty to submit their work have been conventional. they are primarily of a cognitive nature and are variations on informing and educating faculty about how repositories work and why they are important. researchers have an important affective dimension that needs to be addressed by these appeals, and the psychological research on resistance suggests that a strictly rational approach may not be sufficient. by incorporating some of the seemingly paradoxical and counterintuitive techniques discussed earlier, librarians may be able to penetrate the resistance of researchers and reach them at a deeper, less rational level. ideally, a mixture of rational and less-conventional approaches might be combined to maximize effectiveness. such a program may not eliminate resistance but could go a long way toward reducing it. future studies that test the effectiveness of such programs will hopefully be conducted to provide us with a better sense of how they work in real-world settings. references 1. charles w. bailey jr., “institutional repositories: doa?,” online posting, digital koans, aug. 22, 2007, http://digital -scholarship.org/digitalkoans/2007/08/21/institutional -repositories-doa/ (accessed apr. 21, 2010). 2. dorothea salo, “yes, irs are broken. let’s talk about it,” online posting, caveat lector, sept. 5, 2007, http://cavlec. yarinareth.net/2007/09/05/yes-irs-are-broken-lets-talk-about -it/ (accessed apr. 21, 2010). 3. eprints services, roarmap (registry of open access repository material archiving policies) http://www.eprints .org/openaccess/policysignup/ (accessed july 28, 2009). 4. richard k. johnson, “institutional repositories: partnering the requester scales down the request from the large one to a smaller one, it creates a sense of obligation on the part of the user to also make a concession by agreeing to the more modest request. the cultural expectation of reciprocity places the user in a situation in which they will comply with the lesser request to avoid feelings of guilt.35 for the most resistant users, breaking the request down into the smallest possible increment may prove helpful. by making the request seem more manageable, the user is encouraged to comply. psychologists conducted an experiment to test whether minimizing a request would result in greater cooperation. they went door-to-door, soliciting contributions to the american cancer society, and received donations from 29 percent of households. they then made additional solicitations, this time asking, “would you contribute? even a penny will help!” using this approach, donations increased to 50 percent. 
even though the solicitors only asked for a penny, the amounts of the donations were equal to that of the original request. by asking for “even a penny,” the solicitors made the request appear to be more modest and less of a target of resistance.36 librarians might approach faculty by saying “if you could even submit one paper we would be grateful,” with the idea that once faculty make an initial submission they will be more inclined to submit more papers in the future. one final strategy that psychological research suggests may be effective in reducing resistance to digital repositories is to make sure that users understand that the decision to deposit their work is not irrevocable. with any new product, users have fears about what might happen if they try it and they are not satisfied with it. not knowing the consequences of making a decision that they may later regret fuels reluctance to become involved with it. faculty need to be reassured that they can opt out of participating at any time and that the repository sponsors will guarantee this. this guarantee needs to be repeated and emphasized as much as possible in the solicitation process so that faculty are frequently reminded that they are entering into a decision that they can reverse if they so decide. having this reassurance should make researchers much less resistant to submitting their work, and the few faculty who may decide that they want to opt out are worth the reduction in resistance.37 the digital repository is a new phenomenon that faculty are unfamiliar with, and it is therefore important to create an atmosphere of trust. the guarantee will help win that trust. ■■ conclusion the scholarly literature on digital repositories has given little attention to the psychology of resistance. yet the ultimate success of digital repositories depends on overcoming the resistance of scholars and researchers to reducing psychological resistance to digital repositories | quinn 75 20. curtis p. haugtvedt et al., “consumer psychology and attitude change,” in knowles and linn, resistance and persuasion, 283–96. 21. larry w. gregory, robert b. cialdini, and kathleen m. carpenter, “self-relevant scenarios as mediators of likelihood estimates and compliance: does imagining make it so?” journal of personality & social psychology 43, no. 1 (1982): 89–99. 22. jerry m. burger, “fleeting attraction and compliance with requests,” in the science of social influence: advances and future progress, ed. anthony r. pratkanis (new york: psychology pr., 2007): 155–66. 23. john d. clapp and anita lyn mcdonald, “the relationship of perceptions of alcohol promotion and peer drinking norms to alcohol problems reported by college students,” journal of college student development 41, no. 1 (2000): 19–26. 24. noah j. goldstein and robert b. cialdini, “using social norms as a lever of social influence,” in the science of social influence: advances and future progress, ed. anthony r. pratkanis (new york: psychology pr., 2007): 167–90. 25. dale h. schunk, “social-self interaction and achievement behavior,” educational psychologist 34, no. 4 (1999): 219–27. 26. rosanna e. guadagno et al., “when saying yes leads to saying no: preference for consistency and the reverse foot-inthe-door effect,” personality & social psychology bulletin 27, no. 7 (2001): 859–67. 27. mary jiang bresnahan et al., “personal and cultural differences in responding to criticism in three countries,” asian journal of social psychology 5, no. 2 (2002): 93–105. 28. melanie c. green and timothy c. 
brock, “in the mind’s eye: transportation-imagery model of narrative persuasion,” in narrative impact: social and cultural foundations, ed. melanie c. green, jeffrey j. strange, and timothy c. brock (mahwah, n.j.: lawrence erlbaum, 2004): 315–41. 29. oswald huber, “time pressure in risky decision making: effect on risk defusing,” psychology science 49, no. 4 (2007): 415–26. 30. the open citation project, “the effect of open access and downloads (‘hits’) on citation impact: a bibliography of studies,” july 17, 2009, http://opcit.eprints.org/oacitation -biblio.html (accessed july 29, 2009). 31. matthew t. crawford et al., “reactance, compliance, and anticipated regret,” journal of experimental social psychology 38, no. 1 (2002): 56–63. 32. nicolas gueguen and alexandre pascual, “evocation of freedom and compliance: the ‘but you are free of . . .’ technique,” current research in social psychology 5, no. 18 (2000): 264–70. 33. james p. dillard, “the current status of research on sequential request compliance techniques,” personality & social psychology bulletin 17, no. 3 (1991): 283–88. 34. thomas mussweiler, “the malleability of anchoring effects,” experimental psychology 49, no. 1 (2002): 67–72. 35. robert b. cialdini and noah j. goldstein, “social influence: compliance and conformity,” annual review of psychology 55 (2004): 591–21. 36. james m. wyant and stephen l. smith, “getting more by asking for less: the effects of request size on donations of charity,” journal of applied social psychology 17, no. 4 (1987): 392–400. 37. lydia j. price, “the joint effects of brands and warranties in signaling new product quality,” journal of economic psychology 23, no. 2 (2002): 165–90. with faculty to enhance scholarly communication,” d-lib magazine 8, no. 11 (2002), http://www.dlib.org/dlib/november02/ johnson/11johnson.html (accessed apr. 2, 2008). 5. bruce heterick, “faculty attitudes toward electronic resources,” educause review 37, no. 4 (2002): 10–11. 6. nancy fried foster and susan gibbons, “understanding faculty to improve content recruitment for institutional repositories,” d-lib magazine 11, no. 1 (2005), http://www.dlib.org/ dlib/january05/foster/01foster.html (accessed july 29, 2009). 7. suzanne bell, nancy fried foster, and susan gibbons, “reference librarians and the success of institutional repositories,” reference services review 33, no. 3 (2005): 283–90. 8. diane harley et al., “the influence of academic values on scholarly publication and communication practices,” center for studies in higher education, research & occasional paper series: cshe.13.06, sept. 1, 2006, http://repositories.cdlib.org/ cshe/cshe-13-06/ (accessed apr. 17, 2008). 9. rea devakos, “towards user responsive institutional repositories: a case study,” library high tech 24, no. 2 (2006): 173–82. 10. philip m. davis and matthew j. l. connolly, “institutional repositories: evaluating the reasons for non-use of cornell university’s installation of dspace,” d-lib magazine 13, no. 3/4 (2007), http://www.dlib.org/dlib/march07/davis/03davis .html (accessed july 29, 2009). 11. jihyun kim, “motivating and impeding factors affecting faculty contribution to institutional repositories,” journal of digital information 8, no. 2 (2007), http://journals.tdl.org/jodi/ article/view/193/177 (accessed july 29, 2009). 12. peter suber, “open access overview” online posting, open access news: news from the open access environment, june 21, 2004, http://www.earlham.edu/~peters/fos/overview .htm (accessed 29 july 2009). 13. 
see, for example, jeffrey d. ford and laurie w. ford, "decoding resistance to change," harvard business review 87, no. 4 (2009): 99–103; john p. kotter and leonard a. schlesinger, "choosing strategies for change," harvard business review 86, no. 7/8 (2008): 130–39; and paul r. lawrence, "how to deal with resistance to change," harvard business review 47, no. 1 (1969): 4–176. 14. julia zuwerink jacks and maureen e. o'brien, "decreasing resistance by affirming the self," in resistance and persuasion, ed. eric s. knowles and jay a. linn (mahwah, n.j.: lawrence erlbaum, 2004): 235–57. 15. benjamin margolis, "notes on narcissistic resistance," modern psychoanalysis 9, no. 2 (1984): 149–56. 16. ralph grabhorn et al., "the therapeutic relationship as reflected in linguistic interaction: work on resistance," psychotherapy research 15, no. 4 (2005): 470–82. 17. arthur aron et al., "the experimental generation of interpersonal closeness: a procedure and some preliminary findings," personality & social psychology bulletin 23, no. 4 (1997): 363–77. 18. geoffrey l. cohen, joshua aronson, and claude m. steele, "when beliefs yield to evidence: reducing biased evaluation by affirming the self," personality & social psychology bulletin 26, no. 9 (2000): 1151–64. 19. anthony r. pratkanis, "altercasting as an influence tactic," in attitudes, behavior and social context: the role of norms and group membership, ed. deborah j. terry and michael a. hogg (mahwah, n.j.: lawrence erlbaum, 2000): 201–26. click analytics: visualizing website use data tutorial tabatha a. farney librarians who create website content should have access to website usage statistics to measure their webpages' effectiveness and refine the pages as necessary.3 with web analytics, libraries can increase the effectiveness of their websites, and as marshall breeding has observed, libraries can regularly use website statistics to determine how new webpage content is actually being used and make revisions to the content based on this information.4 several recent studies used google analytics to collect and report website usage statistics to measure website effectiveness and improve their usability.5 while web analytics are useful in a website redesign process, several studies concluded that web usage statistics should not be the sole source of information used to evaluate a website. these studies recommend using click data in conjunction with other website usability testing methods.6 background a lack of research on the use of click analytics in libraries motivated the web services librarian to explore their potential by directly implementing them on the library's website. she found that there are several click analytics products available and that each has its own unique functionality. however, many are commercially produced and expensive. with limited funding, the web services librarian selected google analytics' in-page analytics, clickheat, and crazy egg because they are either free or inexpensive. each tool was evaluated on the library's website over a six-month period. google analytics, however, cannot discern between the same link repeated in multiple places on a webpage. furthermore, she wanted to use website use data to determine the areas of high and low usage on the library's homepage and to use this information to justify her webpage reorganization decisions.
although this data can be found in a google analytics report, the web services librarian found it difficult to easily identify the necessary information within the massive amount of data the reports contain. the web services librarian opted to use click analytics, also known as click density analysis or site overlay, a subset of web analytics that reveals where users click on a webpage.1 a click analytics report produces a visual representation of what and where visitors are clicking on an individual webpage by overlaying the click data on top of the webpage that is being tested. rather than wading through the data, libraries can quickly identify what content users are clicking by using a click analytics report. the web services librarian tested several click analytics products while reassessing the library's homepage. during this process she discovered that each click analytics tool had different functionalities that affected its usefulness to the library. this paper introduces and evaluates three click analytics tools, google analytics' in-page analytics, clickheat, and crazy egg, in the context of redesigning the library's homepage and discusses the benefits and drawbacks of each. literature review library literature indicates that libraries are actively engaged in interpreting website usage data for a variety of purposes. laura b. cohen's study encourages libraries to use their website usage data to enhance their understanding of how visitors access and use library websites.2 jeanie m. welch further recommends that all librarians who create website content have access to website usage statistics.3 editor's note: this paper is adapted from a presentation given at the 2010 lita forum. click analytics is a powerful technique that displays what and where users are clicking on a webpage, helping libraries to easily identify areas of high and low usage on a page without having to decipher website use data sets. click analytics is a subset of web analytics, but there is little research that discusses its potential uses for libraries. this paper introduces three click analytics tools, google analytics' in-page analytics, clickheat, and crazy egg, and evaluates their usefulness in the context of redesigning a library's homepage. web analytics tools, such as google analytics, assist libraries in interpreting their website usage statistics by formatting that data into reports and charts. the web services librarian at the kraemer family library at the university of colorado, colorado springs wanted to use website use data to reassess the library's homepage, which was crowded with redundant links. for example, all the links in the site's dropdown navigation were repeated at the bottom of the homepage to make the links more noticeable to the user, but this unintentionally made the page long. to determine which links the web services librarian would recommend for removal, she needed to compare the use or clicks the repetitive links received. at the time, the library relied solely on google analytics to interpret website use data. however, this practice proved insufficient. tabatha a. farney (tfarney@uccs.edu) is web services librarian, kraemer family library, university of colorado, colorado springs, colorado. for libraries, outbound links include library catalogs or subscription databases.
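as the next paragraph explains, google analytics at the time only recorded clicks on such outbound links if each link carried an extra piece of javascript. a minimal sketch of that tagging, assuming the classic asynchronous _gaq api that google analytics then used (the account id, urls, and the "/outbound/..." virtual path are placeholders rather than the library's actual values), might look like this:

    <!-- standard asynchronous google analytics page tag (placeholder account id) -->
    <script type="text/javascript">
      var _gaq = _gaq || [];
      _gaq.push(['_setAccount', 'UA-0000000-1']);
      _gaq.push(['_trackPageview']);
      (function() {
        var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
      })();
    </script>

    <!-- an outbound link tagged so that the click is recorded as a virtual pageview -->
    <a href="http://catalog.example.edu/"
       onclick="_gaq.push(['_trackPageview', '/outbound/library-catalog']);">library catalog</a>

recording the click under a made-up path such as "/outbound/library-catalog" is one common convention; the same push could instead use '_trackEvent' if event reports are preferred.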
additional javascript tags must be added to each outbound link for google analytics to track that data.9 once google analytics recognizes the outbound links, their click data will be available in the in-page analytics report. the report also identifies inbound sources, links from other webpages leading visitors to that page, and outbound destinations, links that navigate visitors away from that webpage. the inbound sources and outbound destinations reports can track outbound links, which are links that have a different domain or url address from the website tracked within google analytics. in-page analytics. google analytics is a popular, comprehensive web analytics tool that contains a click analytics feature called in-page analytics (formerly site overlay) that visually displays click data by overlaying that information on the current webpage (see figure 1). site overlay was used during the library's redesign process; however, it was replaced by in-page analytics in october 2010.7 the web services librarian reassessed the library's homepage using in-page analytics and found that the current tool resolved some of site overlay's shortcomings. site overlay is no longer accessible in google analytics, so this paper will discuss in-page analytics. essentially, in-page analytics is an updated version of the site overlay (see figure 2). in addition to visually representing click data on a webpage, in-page analytics contains new features, including the ability to easily segment data. web analytics expert avinash kaushik stresses the importance of segmenting website use data because it breaks down the aggregated data into specific data sets that represent more defined groups of users.8 rather than studying the total number of clicks a link received, an in-page analytics report can segment the data into specific groups of users, such as mobile device users. in-page analytics provides several default segments, but custom segments can also be applied, allowing libraries to further filter the data in ways that are useful to them. in-page analytics also displays a complementary overview report of statistics located in a side panel next to the typical site overlay view. this overview report extracts useful data from other reports generated in google analytics without having to leave the in-page analytics report screen. the report includes the webpage's inbound sources, also called top referrals, which are links from other webpages leading visitors to that page. (figure 1. screenshot of google analytics' defunct site overlay. figure 2. screenshot of google analytics' in-page analytics.) the web services librarian uses a screen capture tool, such as the firefox add-on screengrab,13 to collect and archive the in-page analytics reports, but the process is clunky and results in the loss of the ability to segment the data. clickheat. labsmedia's clickheat is an open source heat mapping tool that visually displays the clicks on a webpage using color to indicate the amount of clicks an area receives. similar to in-page analytics, a clickheat heat map displays the current webpage and overlays that page with click data (see figure 3). instead of listing percentages or actual numbers of clicks, the heat map represents clicks using color. the warmer the color, such as yellows, oranges, or reds, the more clicks that area receives; the absence of color implies little to no click activity. each heat map has an indicator that outlines the number of clicks a color represents.
a heat map clearly displays the heavily used and underused sections on a webpage making it easy for people with little experience interpreting website usage statistics to interpret the data. however, a heat map is not about exact numbers, but rather general areas of usage. for exact numbers, a traditional, comprehensive web analytics tool is required. clickheat can stand alone or be integrated into other web analytic tools.14 to have a more comprehensive web analytics product, the web services librarian opted to use the clickheat plugin for piwik, a free, open source web analytics tool that seeks to be an alternative to google analytics.15 by itself piwik has no click analytics feature, therefore clickheat is a useful plugin. both piwik and clickheat require access to a web server for installation and knowledge of php and mysql to configure them. because the kraemer family library does not maintain its own web servers, the pages, but it is time consuming and may not be worth the effort since the data are indirectly available.11 a major drawback to in-page analytics is that it does not discern between the same links listed in multiple places on a webpage. instead it tracks redundant links as one link, making it impossible to distinguish which repeated link received more use on the library’s homepage. similarly, the library’s homepage uses icons to help draw attention to certain links. these icons are linked images next to their counterpart text link. since the icon and text link share the same url, in-page analytics cannot reveal which is receiving more clicks. in-page analytics is useless for comparing repetitive links on a webpage, but google reports that they are working on adding this capability.12 as stated earlier, in-page analytics lays the click data over the current webpage in real-time, which can be both useful and limiting. using the current webpage allows libraries to navigate through their site while staying within the in-page analytics report. libraries can follow in the tracks of website users to learn how they interact with the site’s content and navigation. the downside is that it is difficult to compare a new version of a webpage with an older version since it only displays the current webpage. for example, the web services librarian could not accurately compare the use data between the old homepage and the revised homepage within the in-page analytics report because the newly redesigned homepage replaced the old page. comparing different versions of a webpage could help determine whether the new revisions improved the page or not. an archive or export feature would remedy this problem, but in-page analytics does not have this capacity. additionally, an export function would improve the ability to share this report with other librarians without having them login to the google analytics website. currently, the web evaluation of in-page analytics in-page analytics’ advanced segmenting ability far exceeds the old site overlay functionality. segmenting click data at the link level helps web managers to see how groups of users are navigating through a website. for example, in-page analytics can monitor the links mobile users are clicking, allowing web managers to track how that group of users are navigating through a website. this data could be used in designing a mobile version of a site. in-page analytics integrates a site overlay report and an overview report that contains selected web use statistics for an individual webpage. 
although the overview report is not in visual context with the site overlay view, it combines the necessary data to determine how a webpage is being accessed and used. this assists in identifying possible flaws in a website’s navigation, layout, or content. it also has the potential to clarify misleading website statistics. for instance, google analytics top exit pages report indicates the library’s homepage is the top exit page for the site. exit pages are the last page a visitor views before leaving the site.10 having a high exit rate could imply visitors were leaving the library’s site from the homepage and potentially missing a majority of the library’s online resources. using in-page analytics, it was apparent the library’s homepage had a high number of exits because many visitors clicked on outbound links, such as the library catalog, that navigated visitors away from the library’s website. rather than finding a potential problem, in-page analytics indicated that the homepage’s layout successfully led visitors to a desired point of information. while the data from the outbound links is available in the data overview report, it is not displayed within the site overlay view. it is possible to work around this problem by creating internal redirect 144 information technology and libraries | september 2011 the precise number of clicks is available in traditional web analytics reports. installing and configuring clickheat is a potential drawback for some libraries that do not have access to the necessary technology or staff to maintain it. even with access to a web server and knowledgeable staff, the web services librarian still experienced glitches implementing clickheat. she could not add clickheat to any high trafficked webpage because it created a slight, but noticeable, lag in response time to any page it was added. the cause was an out-of-box configuration setting that had to be fixed by the campus’ information technology department.17 another concern for libraries is that clickheat is continuously being developed with new versions or patches released periodically.18 like any locally installed software, libraries must plan for continuing maintenance of clickheat to keep it current. just as with in-page analytics, clickheat has no export or archive function. this impedes the web main navigation on the homepage and opted to use links prominently displayed within the homepage’s content. this indicated that either the users did not notice the main navigation dropdown menus or that they chose to ignore them. further usability testing of the main navigation is necessary to better understand why users do not utilize it. clickheat is most useful when combined with a comprehensive web analytics tool, such as piwik. since clickheat only collects data where visitors are clicking, it does not track other web analytics metrics, which limits its ability to segment the click data. currently, clickheat only segments clicks by browser type or screen resolution. additional segmenting ability would enhance this tool’s usefulness. for example, the ability to segment clicks from new visitors and returning visitors may reveal how visitors learn to use the library’s homepage. furthermore, the heat map report does not provide the actual number of clicks on individual links or content areas since heat maps generalize click patterns. web services librarian worked with the campus’ information technology department to install piwik with the clickheat plugin on a campus web server. 
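the next paragraph notes that, once installed, piwik and clickheat each generate a javascript tag that has to be added to every page whose use will be tracked. a minimal sketch of the piwik half of such a page tag, assuming a simplified form of the piwik javascript api and placeholder values for the server path and site id (the clickheat tag, whose exact form depends on the plugin version, is indicated only by a comment), might look like this:

    <!-- piwik page tag (simplified; "example.edu/piwik/" and the site id 1 are placeholders) -->
    <script type="text/javascript" src="http://example.edu/piwik/piwik.js"></script>
    <script type="text/javascript">
      try {
        var piwikTracker = Piwik.getTracker('http://example.edu/piwik/piwik.php', 1);
        piwikTracker.trackPageView();      // records the page view in piwik
        piwikTracker.enableLinkTracking(); // records clicks on outbound links and downloads
      } catch (err) {}
    </script>
    <!-- the second, separate tag generated by the clickheat plugin would be added here;
         only pages carrying that tag produce heat maps in the piwik interface -->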
once installed, piwik and clickheat generate javascript tags that must be added to every page that website use data will be tracked. although piwik and clickheat can be integrated, the tools work separately so two javascript tags must be added to a webpage to track click data in piwik as well as in clickheat. only the pages that contain the clickheat tracking script will generate heat maps that are then stored within the local piwik interface. evaluation of clickheat in-page analytics only tracks links or items that perform some sort of action, such as playing a flash video,16 but clickheat tracks clicks on internal links, outbound links, and even nonlinked objects, such as images. hence, clickheat is able to track clicks on the entire webpage. tracking non-linked objects was unexpectedly useful in identifying potential flaws in a webpage’s design. for instance, within a week of beta testing the library’s redesigned homepage, it was evident that users clicked on the graphics that were positioned closely to text links. the images were intended to draw the user’s attention to the text link, but instead users clicked on the graphic itself expecting it to be a link. to alleviate possible user frustration, the web services librarian added links to the graphics to take visitors to the same destinations as their companion text links. clickheat treats every link or image as its own separate component, so it has the ability to compare the same link listed in multiple places on the same page. unlike in-page analytics, clickheat was particularly helpful in analyzing which redundant links received more use on the homepage. in addition, the heat map also revealed that users ignored the site’s figure 3. screenshot of clickheat’s heat map report librarians and technology skill acquisition: issues and perspectives | farney 145click analytics: visualizing website use data | farney 145 clicks that area has received with the brighter colors representing the higher percentage of clicks. the plus signs can be expanded to show the total number of clicks an item has received, and this number can be easily filtered into eleven predefined allowing crazy egg to differentiate between the same link or image listed multiple times on a webpage. crazy egg displays this data in color-coded plus signs which are located next to the link or graphic it represents. the color is based on the percentage of services librarian’s ability to share the heat maps and compare different versions of a webpage. again, the web services librarian manually archives the heat maps using a screen capture tool, but the process is not the perfect solution. crazy egg crazy egg is a commercial, hosted click analytics tool selected for this project primarily for its advanced click tracking functionality. it is a fee-based service that requires a monthly subscription. there are several subscription packages based on the number of visits and “snapshots.” snapshots are webpages that are tracked by crazy egg. the kraemer family library subscribes to the standard package that allows up to twenty snapshots at one time with a combined total of 25,000 visits a month. to help manage how those visits are distributed, each tracked page can be assigned a specific number of visits or time period so that one webpage does not use all the visits early in the month. 
once a snapshot reaches its target number of visits or its allocated time period, it automatically stops tracking clicks and archives that snapshot within the crazy egg website.19 the snapshots convert the click data into three different click analytic reports: heat map, site overlay, and something called “confetti view.” crazy egg’s heat map report is comparable to clickheat’s heat map; they both use intensity of colors to show high areas of clicks on a webpage (see figure 4). crazy egg’s site overlay is similar to in-page analytics in that they both display the number of clicks a link receives (see figure 5). unlike in-page analytics, crazy egg tracks all clicks including outbound links as well as nonlinked content, such as graphics, if it has received multiple clicks. every clicked link and graphic is treated as its own separate entity, figure 4. screenshot of crazy egg’s heat map report figure 5. screenshot of crazy egg’s site overlay report 146 information technology and libraries | september 2011 to decide which redundant links to remove from the homepage. the confetti view report was useful for studying clicks on the entire webpage. segmenting this data allowed the web services librarian to identify click patterns on the webpage from a specific group. for example, the report revealed that mobile device users would scroll horizontally on the homepage to click on content, but rarely vertically. she also focused on the time to click segment, which reports how long it took a visitor to click on something, in the confetti view to identify links or areas that took users over half a minute to click. both segments provided interesting information, but further usability testing is necessary to better understand why mobile users preferred not to scroll vertically or why it took users longer to click on certain links. crazy egg also has the ability to archive its snapshots within its profile. this is useful for comparing different versions of a webpage to discover if the modifications were an improvement or not. one goal for the library’s homepage redesign was to shorten the page so users did not have to scroll evaluation of crazy egg crazy egg combines the capabilities of in-page analytcis and clickheat in one tool and expands on their abilities. it is not a comprehensive web analytics tool like google analytics or piwik, but rather is designed to specifically track where users are clicking. crazy egg’s heat map report is comparable to the one freely available in clickheat, however, its site overlay and confetti view reports are more sophisticated than what is currently available for free. the web services librarian found crazy egg to be a worthwhile investment during the library’s homepage redesign because it provided additional context to show how users were interacting with the library’s website. the site overlay facilitated the ability to compare the same link listed in multiple locations on the library’s homepage. not only could the web services librarian see how many clicks the links received, but she could also segment and compare that data to learn which links users were finding faster and which links new visitors or returning visitors preferred. this data helped her segments that include day of week, browser type, and top referring websites. custom segments may be applied if they are set up within the crazy egg profile. the confetti view report displays every click the snapshot recorded and overlays those clicks as colored dots on the snapshot as shown in figure 6. 
the color of the dot corresponds to specific segment value. the confetti view report uses the same default segmented values used in the site overlay report but here they can be further filtered into defined values for that segment. for example, the confetti view can segment the clicks by window width and then further filter the data to display only the clicks from visitors with window widths under 1000 pixels to see if users with smaller screen resolutions are scrolling down long webpages to click on content. this information is hard to glean from crazy egg’s site overlay report because it focuses on the individual link or graphic. the confetti view report focuses on clicks at the webpage level, allowing libraries to view usage trends on a webpage. crazy egg is a hosted service like google analytics, which means all the data are stored on crazy egg’s web servers and accessed through its website. implementing crazy egg on a webpage is a two-step process requiring the web manager to first set up the snapshot within the crazy egg profile and then add the tracking javascript tags to the webpage it will track. once the javascript tags are in place, crazy egg takes a picture of the current webpage and stores that as the snapshot on which to overlay the click data reports. since it uses a “snapshot” of the webpage, the website manager needs to retake a snapshot of the webpage if there are any changes to it. retaking the snapshot requires only a click of a button to automatically stop the old snapshot and regenerate a new one based on the current webpage without having to change the javascript tags. figure 6. screenshot of crazy egg’s confetti view report librarians and technology skill acquisition: issues and perspectives | farney 147click analytics: visualizing website use data | farney 147 website. next, she will explore ways to automate the process of sharing of website use data to make this information more accessible to other interested librarians. by sharing this information, the web services librarian hopes to promote informed decision making for the library’s web content and design. references 1. avinash kaushik, web analytics 2.0: the art of online accountability and science of customer centricity (indianapolis: wiley, 2010): 81–83. 2. laura b. cohen, “a two-tiered model for analyzing library website usage statistics, part 2: log file analysis,” portal: libraries & the academy 3, no. 3 (2003): 523–24. 3. jeanie m. welch, “who says we’re not busy? library web page usage as a measure of public service activity,” reference services review 33, no. 4 (2005): 377–78. 4. marshall breeding, “an analytical approach to assessing the effectiveness of web-based resources,” computers in libraries, 28, no. 1 (2008): 20–22. 5. julie arendt and cassie wagner, “beyond description: converting web site statistics into concrete site improvement ideas,” journal of web librarianship 4, no. 1 (january 2010): 37–54; steven j. turner, “websites statistics 2.0: using google analytics to measure library website effectiveness,” technical services quarterly 27, no. 3 (2010): 261–278; wei fang and marjorie e. crawford, “measuring law library catalog web site usability: a web analytic approach,” journal of web librarianship 2, no. 2–3 (2008): 287–306. 6. ardent and wagner, “beyond description,” 51–52; andrea wiggins, “data-driven design: using web analytics to validate heuristics,” bulletin of the american society for information science and technology 33, no. 5 (2007): 20–21; elizabeth l. 
black, “web analytics: a picture of the academic library web site user,” journal of web librarianship 3, no. 1 (2009): 12–13. 7. trevor claiborne, “introducing in-page analytics: visual context for your analytics data,” google analytics blog, oct. 15, 2010, http://analytics.blogspot .com/2010/10/introducing-in-page-ana tracking abilities, however, all provide a distinct picture of how visitors use a webpage. by using all of them, the web services librarian was able to clearly identify and recommend the links for removal. in addition, she identified other potential usability concerns, such as visitors clicking on nonlinked graphics rather than the link itself. a major bonus of using click analytics tools is their ability to create easy to understand reports that instantly display where visitors are clicking on a webpage. no previous knowledge of web analytics is required to understand these reports. the web services librarian found it simple to present and discuss click analytics reports with other librarians with little to no background in web analytics. this helped increase the transparency of why links were targeted for removal from the homepage. as useful as click analytics tools are, they cannot determine why users click on a link, only where they have clicked. click analytics tools simply visualize website usage statistics. as elizabeth black reports, these “statistics are a trail left by the user, but they do not explain the motivations behind the behavior.”20 she concludes that additional usability studies are required to better understand users and their interactions on a website.21 libraries can use the click analytics reports to identify a problem on a webpage, but further usability testing will explain why there is a problem and help library web managers fix the issue and prevent repeating the mistake in the future. the web services librarian incorporated the use of in-page analytics, clickheat, and crazy egg in her web analytics practices since these tools continue to be useful to test the usage of new content added to a webpage. furthermore, she finds that click analytics’ straightforward reports prompted her to share website use data more often with fellow librarians to assist in other decisionmaking processes for the library’s down too much to get to needed links. by comparing the old homepage and the new homepage confetti reports in crazy egg, it was instantly apparent that the new homepage had significantly fewer clicks on its bottom half than the old version. furthermore, comparing the different versions using the time to click segment in the site overlay showed that placing the link more prominently on the webpage decreased the overall time it took users to click on it. crazy egg’s main drawback is that archived pages that are no longer tracking click data count toward the overall number of snapshots that can be tracked at one time. if libraries regularly retest a webpage, they will easily reach the maximum number of snapshots their subscription permits in a relatively short period. once a crazy egg subscription is cancelled data stored in the account is no longer accessible. this increases the importance of regularly exporting data. crazy egg is designed to export the heat map and confetti view reports. the direct export function takes a snapshot of the current report as it is displayed, and automatically converts that image into a pdf. 
exporting the heat map is fairly simple because the report is a single image, but exporting all the content in the confetti view report is more difficult because the report is based on segments of click data. each segment type would have to be exported in a separate pdf report to retain all of the content. in addition, there is no export option for the site overlay report so there is not an easy method to manage that information outside of crazy egg. even if libraries are actively exporting reports from crazy egg, data loss is inevitable. summary and conclusions closely examining in-page analytics, clickheat, and crazyegg reveals that each tool has different levels of click 148 information technology and libraries | september 2011 (2009): 81–84. 17. clickheat performance and optimization, labsmedia, http://www .labsmedia.com/clickheat/156894.html (accessed feb. 7, 2011). 18. clickheat, sourceforge, http:// sourceforge.net/projects/clickheat/files/ (accessed feb. 7, 2011). 19. crazy egg, http://www.crazyegg .com/, (accessed on mar. 25, 2011). 20. black, “web analytics,” 12. 21. ibid., 12–13. 13. screengrab, firefox add-ons, https://addons.mozilla.org/en-us/fire fox/addon/1146/ (accessed feb. 7, 2011). 14. clickheat, labsmedia, http:// www.labsmedia.com/clickheat/index .html (accessed feb. 7,2011). 15. piwik, http://piwik.org/ (accessed feb. 7, 2011). 16. paul betty, “assessing homegrown library collections: using google analytics to track use of screencasts and flash-based learning objects,” journal of electronic resources librarianship, 21, no. 1 lytics-visual.html (accessed feb. 7, 2011). 8. kaushik, web analytics 2.0, 88. 9. turner, “websites statistics 2.0,” 272–73. 10. kaushik, web analytics 2.0, 53–55. 11. site overlay not displaying outbound links, google analytics help forum, http://www.google.com/ support/forum/p/google+analytics/ thread?tid=39dc323262740612&hl=en (accessed feb. 7, 2011). 12. claiborne, “introducing in-page analytics.” 30 information technology and libraries | march 2010 the path toward global interoperability in cataloging ilana tolkoff libraries began in complete isolation with no uniformity of standards and have grown over time to be ever more interoperable. this paper examines the current steps toward the goal of universal interoperability. these projects aim to reconcile linguistic and organizational obstacles, with a particular focus on subject headings, name authorities, and titles. i n classical and medieval times, library catalogs were completely isolated from each other and idiosyncratic. since then, there has been a trend to move toward greater interoperability. we have not yet attained this international standardization in cataloging, and there are currently many challenges that stand in the way of this goal. this paper will examine the teleological evolution of cataloging and analyze the obstacles that stand in the way of complete interoperability, how they may be overcome, and which may remain. this paper will not provide a comprehensive list of all issues pertaining to interoperability; rather, it will attempt to shed light on those issues most salient to the discussion. unlike the libraries we are familiar with today, medieval libraries worked in near total isolation. most were maintained by monks in monasteries, and any regulations in cataloging practice were established by each religious order. 
one reason for their lack of regulations was that their collections were small by our standards; a monastic library had at most a few hundred volumes (a couple thousand in some very rare cases). the “armarius,” or librarian, kept more of an inventory than an actual catalog, along with the inventories of all other valuable possessions of the monastery. there were no standard rules for this inventory-keeping, although the armarius usually wrote down the author and title, or incipit if there was no author or title. some of these inventories also contained bibliographic descriptions, which most often described the physical book rather than its contents. the inventories were usually taken according to the shelf organization, which was occasionally based on subject, like most libraries are today. these trends in medieval cataloging varied widely from library to library, and their inventories were entirely different from our modern opacs. the inventory did not provide users access to the materials. instead, the user consulted the armarius, who usually knew the collection by heart. this was a reasonable request given the small size of the collections.1 this type of nonstandardized cataloging remained relatively unchanged until the nineteenth century, when charles c. jewett introduced the idea of a union catalog. jewett also proposed having stereotype plates for each bibliographic record, rather than a book catalog, because this could reduce costs, create uniformity, and organize records alphabetically. this was the precursor to the twentieth-century card catalog. while many of jewett’s ideas were not actually practiced during his lifetime, they laid the foundation for later cataloging practices.2 the twentieth century brought a great revolution in cataloging standards, particularly in the united states. in 1914, the library of congress subject headings (lcsh) were first published and introduced a controlled vocabulary to american cataloging. the 1960s saw a wide array of advancements in standardization. the library of congress (lc) developed marc, which became a national standard in 1973. it also was the time of the creation of anglo-american cataloguing rules (aacr), the paris principles, and international standard bibliographic description (isbd). while many of these standardization projects were uniquely american or british phenomena, they quickly spread to other parts of the world, often in translated versions.3 while the technology did not yet exist in the 1970s to provide widespread local online catalogs, technology did allow for union catalogs containing the records of many libraries in a single database. these union catalogs included the research libraries information network (rlin), the oclc online computer library center (oclc), and the western library network (wln). in the 1980s the local online public access catalog (opac) emerged, and in the 1990s opacs migrated to the web (webpacs).4 currently, most libraries have opacs and are members of oclc, the largest union catalog, used by more than 71,000 libraries in 112 countries and territories.5 now that most of the world’s libraries are on oclc, librarians face the challenge and inconvenience of discrepancies in cataloging practice due to the differing standards of diverse countries, languages, and alphabets. the fields of language engineering and linguistics are working on various language translation and analysis tools. 
some of these include machine translation; ontology, or the hierarchical organization of concepts; information extraction, which deciphers conceptual information from unorganized information, such as that on the web; text summarization, in which computers create a short summary from a long piece of text; and speech processing, which is the computer analysis of human speech.6 while these are all exciting advances in information technology, as of yet they are not intelligent enough to help us establish cataloging interoperability. it will be interesting to see whether language engineering tools will be capable of helping catalogers in the future, but for now they are ilana tolkoff (ilana.tolkoff@gmail.com) holds a ba in music and italian from vassar college, an ma in musicology from brandeis university, and an mls from the university at buffalo. she is currently seeking employment as a music librarian. the path toward global interoperability in cataloging | tolkoff 31 best at making sense of unstructured information, such as the web. the interoperability of library catalogs, which consist of highly structured information, must be tackled through software that innovative librarians of the future will produce. in an ideal world, oclc would be smoothly interoperable at a global level. a single thesaurus of subject headings would have translations in every language. there would be just one set of authority files. all manifestations of a single work would be grouped under the same title, translatable to all languages. there would be a single bibliographic record for a single work, rather than multiple bibliographic records in different languages for the same work. this single bibliographic record could be translatable into any language, so that when searching in worldcat, one could change the settings to any language to retrieve records that would display in that chosen language. when catalogers contribute to oclc, they would create the records in their respective languages, and once in the database the records would be translatable to any other language. because records would be so fluidly translatable, an opac could be searched in any language. for example, the default settings for the university at buffalo’s opac could be english, but patrons could change those settings to accommodate the great variety of international students doing research. this vision is utopian to say the least, and it is doubtful that we will ever reach this point. but it is valuable to establish an ideal scenario to aim our innovation in the right direction. one major obstacle in the way of global interoperability is the existence of different alphabets and the inherently imperfect nature of transliteration. there are essentially two types of transliteration schemes: those based on phonetic structure and those based on morphemic structure. the danger of phonetic transliteration, which mimics pronunciation, is that semantics often get lost. it fails to differentiate between homographs (words that are spelled and pronounced the same way but have different meanings). complications also arise when there are differences between careful and casual styles of speech. park asserts, “when catalogers transcribe words according to pronunciation, they can create inconsistent and arbitrary records.”7 morphemic transliteration, on the other hand, is based on the meanings of morphemes, and sometimes ends up being very different from the pronunciation in the source language. 
one advantage to this, however, is that it requires fewer diacritics than phonetic transliteration. park, whose primary focus is on korean–roman transliteration, argues that the mccune reischauer phonetic transliteration that libraries use loses too much of the original meaning. in other alphabets, however, phonetic transliteration may be more beneficial, as in the lc’s recent switch to pinyin transliteration in chinese. the lc found pinyin to be more easily searchable than wade-giles or monosyllabic pinyin, which are both morphemic. however, another problem with transliteration that neither phonetic nor morphemic schemes can solve is word segmentation—how a transliterated word is divided. this becomes problematic when there are no contextual clues, such as in a bibliographic record.8 other obstacles that stand in the way of interoperability are the diverse systems of subject headings, authority headings, and titles found internationally. resource description and access (rda) will not deal with subject headings because it is such a hefty task, so it is unlikely that subject headings will become globally interoperable in the near future.9 fortunately, twenty-four national libraries of english speaking countries use lcsh, and twelve non-english-speaking countries use a translated or modified version of lcsh. this still leaves many more countries that use their own systems of subject headings, which ultimately need to be made interoperable. even within a single language, subject headings can be complicated and inconsistent because they can be expressed as a single noun, compound noun, noun phrase, or inverted phrase; the problem becomes even greater when trying to translate these to other languages. bennett, lavoie, and o’neill note that catalogers often assign different subject headings (and classifications) to different manifestations of the same work.10 that is, the record for the novel gone with the wind might have different subject headings than the record for the movie. this problem could potentially be resolved by the functional requirements for bibliographic records (frbr), which will be discussed below. translation is a difficult task, particularly in the context of strict cataloging rules. it is especially complicated to translate among unrelated languages, where one might be syntactic and the other inflectional. this means that there are discrepancies in the use of prepositions, conjunctions, articles, and inflections. the ability to add or remove terms in translation creates endless variations. a single concept can be expressed in a morpheme, a word, a phrase, or a clause, depending on the language. there also are cultural differences that are reflected in different languages. park gives the example of how angloamerican culture often names buildings and brand names after people, reflecting our culture’s values of individualism, while in korea this phenomenon does not exist at all. on the other hand, korean’s use of formal and informal inflections reflects their collectivist hierarchical culture. another concept that does not cross cultural lines is the korean pumasi system in which family and friends help someone in a time of need with the understanding that the favor will be returned when they need it. this cannot be translated into a single english word, phrase, or subject heading. 
one way of resolving ambiguity in translations is through modifiers or scope notes, but this is only a partial solution.11 because translation and transliteration are so difficult, 32 information technology and libraries | march 2010 as well as labor-intensive, the current trend is to link already existing systems. multilingual access to subjects (macs) is one such linking project that aims to link subject headings in english, french, and german. it is a joint project under the conference of european national librarians among the swiss national library, the bibliothèque nationale de france (bnf), the british library (bl), and die deutsche bibliothek (ddb). it aims to link the english lcsh, the french répertoire d’autorité matière encyclopédique et alphabétique unifié (rameau), and the german schlagwortnormdatei/ regeln für den schlagwortkatalog (swd/rswk). this requires manually analyzing and matching the concepts in each heading. if there is no conceptual equivalent, then it simply stands alone. macs can link between headings and strings or even create new headings for linking purposes. this is not as fruitful as it sounds, however, as there are fewer correspondences than one might expect. the macs team experimented with finding correspondences by choosing two topics: sports, which was expected to have a particularly high number of correspondences, and theater, which was expected to have a particularly low number of correspondences. of the 278 sports headings, 86 percent matched in all three languages, 8 percent matched in two, and 6 percent was unmatched. of the 261 theater headings, 60 percent matched in three languages, 18 percent matched in two, and 22 percent was unmatched.12 even in the most cross-cultural subject of sports, 14 percent of terms did not correspond fully, making one wonder whether linking will work well enough to prevail. a similar project—the virtual international authority file (viaf)—is being undertaken for authority headings, a joint project of the lc, the bnf, and ddb, and now including several other national libraries. viaf aims to link (not consolidate) existing authority files, and its beta version (available at http://viaf.org) allows one to search by name, preferred name, or title. oclc’s software mines these authority files and the titles associated with them for language, lc control number, lc classification, usage, title, publisher, place of publication, date of publication, material type, and authors. it then derives a new enhanced authority record, which facilitates mapping among authority records in all of viaf’s languages. these derived authority records are stored on oai servers, where they are maintained and can be accessed by users. users can search viaf by a single national library or broaden their possibilities by searching all participating national libraries. as of 2006, between the lc’s and ddb’s authority files, there were 558,618 matches, including 70,797 complex matches (one-to-many), and 487,821 unique matches (one-to-one) out of 4,187,973 lc names and 2,659,276 ddb names. ultimately, viaf could be used for still more languages, including non-roman alphabets.13 recently the national library of israel has joined, and viaf can link to the hebrew alphabet. a similar project to viaf that also aimed to link authority files was linking and exploring authority files (leaf), which was under the auspices of the information society technologies programme of the fifth framework of the european commission. 
the three-year project began in 2001 with dozens of libraries and organizations (many of which are national libraries), representing eight languages. its website describes the project as follows: information which is retrieved as a result of a query will be stored in a pan-european “central name authority file.” this file will grow with each query and at the same time will reflect what data records are relevant to the leaf users. libraries and archives wanting to improve authority information will thus be able to prioritise their editing work. registered users will be able to post annotations to particular data records in the leaf system, to search for annotations, and to download records in various formats.14 park identifies two main problems with linking authority files. one is that name authorities still contain some language-specific features. the other is that disambiguation can vary among name authority systems (e.g., birth/death dates, corporate qualifiers, and profession/ activity). these are the challenges that projects like leaf and viaf must overcome. while the linking of subject headings and name authorities is still experimental and imperfect, the frbr model for linking titles is much more promising and will be incorporated in the soon-to-be-released rda. according to bennett, lavoie, and o’neill, there are three important benefits to frbr: (1) it allows for different views of a bibliographic database, (2) it creates a hierarchy of bibliographic entities in the catalog such that all versions of the same work fall into a single collapsible entry point, (3) and the confluence of the first two benefits makes the catalog more efficient. in the frbr model, the bibliographic record consists of four entities: (1) the work, (2) the expression, (3) the manifestation, and (4) the item. all manifestations of a single work are grouped together, allowing for a more economical use of information because the title needs to be entered only once.15 that is, a “title authority file” will exist much like a name authority file. this means that all editions in all languages and in all formats would be grouped under the same title. for example, the lord of the rings title would include all novels, films, translations, and editions in one grouping. this would reduce the number of bibliographic records, and as danskin notes, “the idea of creating more records at a time when publishing output threatens to outstrip the cataloguing capacity of national bibliographic agencies is alarming.”16 the frbr model is particularly beneficial for complex canonical works like the bible. there are a small number of complex canonical works, but they take up a the path toward global interoperability in cataloging | tolkoff 33 disproportionate number of holdings in oclc.17 because this only applies to a small number of works, it would not be difficult to implement, and there would be a disproportionate benefit in the long run. there is some uncertainty, however, in what constitutes a complex work and whether certain items should be grouped under the same title.18 for instance, should prokofiev’s romeo and juliet be grouped with shakespeare’s? the advantage of the frbr model for titles over subject headings or name authorities is that no such thing as a title authority file exists (as conceptualized by frbr). we would be able to start from scratch, creating such title authority files at the international level. 
subject headings and name authorities, on the other hand, already exist in many different forms and languages so that cross-linking projects like viaf might be our only option. it is encouraging to see the strides being made to make subject headings, name authority headings, and titles globally interoperable, but what about other access points within a record’s bibliographic description? these are usually in only one language, or two if cataloged in a bilingual country. should these elements (format, contents, and so on) be cross-linked as well, and is this even possible? what should reasonably be considered an access point? most people search by subject, author, or title, so perhaps it is not worth making other types of access points interoperable for the few occasions when they are useful. yet if 100 percent universal interoperability is our ultimate utopian goal, perhaps we should not settle for anything less than true international access to all fields in a record. because translation and transliteration are such complex undertakings, linking of extant files is the future of the field. there are advantages and disadvantages to this. on the one hand, linking these files is certainly better than having them exist only for their own countries. they are easily executed projects that would not require a total overhaul of the way things currently stand. the disadvantages are not to be ignored, however. the fact that files do not correspond perfectly from language to language means that many files will remain in isolation in the national library that created them. another problem is that cross-linking is potentially more confusing to the user; the search results on http://www.viaf.org are not always simple and straightforward. if cross-linking is where we are headed, then we need to focus on a more user-friendly interface. if the ultimate goal of interoperability is simplification, then we need to actually simplify the way query results are organized rather than make them more confusing. very soon rda will be released and will bring us to a new level of interoperability. aacr2 arrived in 1978, and though it has been revised several times, it is in many ways outdated and mainly applies to books. rda will bring something completely new to the table. it will be flexible enough to be used in other metadata schemes besides marc, and it can even be used by different industries such as publishers, museums, and archives.19 its incorporation of the frbr model is exciting as well. still, there are some practical problems in implementing rda and frbr, one of which is that reeducating librarians about the new rules will be costly and take time. also, frbr in its ideal form would require a major overhaul of the way oclc and integrated library systems currently operate, so it will be interesting to see to what extent rda will actually incorporate frbr and how it will be practically implemented. danskin asks, “will the benefits of international co-operation outweigh the costs of effecting changes? is the usa prepared to change its own practices, if necessary, to conform to european or wider ifla standards?”20 it seems that the united states is in fact ready and willing to adopt frbr, but to what extent is yet to be determined. 
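to make the frbr grouping discussed above a little more concrete, the sketch below models the four entities (work, expression, manifestation, and item) as a small hierarchy in python. it is only an illustration of the idea, not code from rda, oclc, or any integrated library system; the class names, fields, and bibliographic details are invented for the example.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:                       # a single physical or digital copy
    barcode: str
    call_number: str

@dataclass
class Manifestation:              # a particular published form (edition, format)
    publisher: str
    year: int
    carrier: str                  # "book", "film", "e-book", ...
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:                 # a realization of the work (a language, a version)
    language: str
    form: str                     # "text", "moving image", ...
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:                       # the abstract creation; one entry point in the catalog
    title: str
    creator: str
    expressions: List[Expression] = field(default_factory=list)

# all editions, translations, and adaptations collapse under one work
lotr = Work("the lord of the rings", "j. r. r. tolkien")
lotr.expressions.append(Expression("eng", "text", [
    Manifestation("allen & unwin", 1954, "book",
                  [Item("31197000000001", "pr6039.o32 l6")])]))
lotr.expressions.append(Expression("spa", "text", [
    Manifestation("minotauro", 1977, "book")]))
lotr.expressions.append(Expression("eng", "moving image", [
    Manifestation("new line cinema", 2001, "film")]))

# a catalog display can then offer a single collapsible entry for the work
for expr in lotr.expressions:
    for man in expr.manifestations:
        print(lotr.title, "|", expr.language, expr.form, "|", man.year, man.carrier)

in a catalog organized this way the title is recorded once, on the work, and every translation, film version, or new edition simply hangs beneath it, which is the economy bennett, lavoie, and o'neill describe.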
what i have discussed in this paper are some of the more prominent international standardization projects, although there are countless others, such as eurowordnet, the open language archives community (olac), and international cataloguing code (icc), to name but a few.21 in general, the current major projects consist of linking subject headings, name authority files, and titles in multiple languages. linking may not have the best correspondence rates, we have still not begun to tackle the cross-linking of other bibliographic elements, and at this point search results may be more confusing than helpful. but the existence of these linking projects means we are at least headed in the right direction. the emergent universality of oclc was our most recent step toward interoperability, and it looks as if cross-linking is our next step. only time will tell what steps will follow. references 1. lawrence s. guthrie ii, “an overview of medieval library cataloging,” cataloging & classification quarterly 15, no. 3 (1992): 93–100. 2. lois mai chan and theodora hodges, cataloging and classification: an introduction, 3rd ed. (lanham, md.: scarecrow, 2007): 48. 3. ibid., 6–8. 4. ibid., 7–9. 5. oclc, “about oclc,” http://www.oclc.org/us/en/about/default.htm (accessed dec. 9, 2009). 6. jung-ran park, “cross-lingual name and subject access: mechanisms and challenges,” library resources & technical services 51, no. 3 (2007): 181. 7. ibid., 185. 8. ibid.
9. julie renee moore, “rda: new cataloging rules, coming soon to a library near you!” library hi tech news 23, no. 9 (2006): 12. 10. rick bennett, brian f. lavoie, and edward t. o’neill, “the concept of a work in worldcat: an application of frbr,” library collections, acquisitions, & technical services 27, no. 1 (2003): 56. 11. park, “cross-lingual name and subject access.” 12. ibid. 13. thomas b. hickey, “virtual international authority file” (microsoft powerpoint presentation, ala annual conference, new orleans, june 2006), http://www.oclc.org/research/projects/viaf/ala2006c.ppt (accessed dec. 9, 2009). 14. leaf, “leaf project consortium,” http://www.crxnet.com/leaf/index.html (accessed dec. 9, 2009). 15. bennett, lavoie, and o’neill, “the concept of a work in worldcat.” 16. alan danskin, “mature consideration: developing bibliographic standards and maintaining values,” new library world 105, no. 3/4 (2004): 114. 17. ibid. 18. bennett, lavoie, and o’neill, “the concept of a work in worldcat.” 19. moore, “rda.” 20. danskin, “mature consideration,” 116. 21. ibid.; park, “cross-lingual name and subject access.” donna hirst editorial board thoughts the iowa city flood of 2008: a librarian and it professional’s perspective do you like to chase fire trucks? do you enjoy watching a raft of adventurers go over the waterfall, careening from rock to rock? well, this is a story of the iowa city flood of 2008, a flood projected to happen once every five hundred years, from the perspective of a librarian and it professional. ■ the approach of the flood the winter of 2008 was hard, and we got mounds of snow. the spring was wet that year in iowa city. it rained almost every day.
minnesota’s snow melt-off hadn’t been released from the reservoir due to the heavy rains. everyone watched the river rise, day by day. the parks were underwater; the river was creeping up toward buildings, including the university of iowa. in early june, with about a day and a half notice, library staff at the university’s main library, art library, and music library were told to evacuate. one of the first acts of evacuation was the relocation of all of the library servers to the engineering building up the hill—high and dry—literally rolling them across the street and up the sidewalk. although all servers were relocated to engineering, engineering didn’t have enough power in their server room to handle the extra capacity to run all of our machines. the five primo servers that run our discovery searching service had to stay disconnected. with the servers safe and sound, we moved our attention to staff workstations. the personal workstations of the administrative staff and the finance department were moved to the business library. the libraries’ laptops were collected and moved into the branch libraries, which would be receiving displaced staff. many staff would be expected to work from public clusters in the various library branches, locked down to specific functions. as library staff were collecting their critical possessions, the town was madly sandbagging. more than a million sandbags were piled around university buildings, private businesses, and residences. in retrospect, some of the sandbags may have made a difference, but since the flood was so much greater than anticipated, the water largely went over and around, leaving a lot of soggy sandbags. on june 13, the day before the main library was to be closed, the decision was made to move books up from the basement. there were well over 500,000 volumes in the basement, and a group of approximately five hundred volunteers moved 62,000 volumes and 37,000 manuscript boxes from the lower shelves. volunteers passed books hand to hand into the third, fourth, and fifth floors of the building. a number of the volunteers came from sandbagging teams. individuals who had never been in a boxes of manuscripts being stacked on the fifth floor photo by carol jonck moving boxes out of the basement photo courtesy of the university of iowa news service donna hirst (donna-hirst@uiowa.edu) is project coordinator, library information technology, university of iowa libraries, iowa city. 6 information technology and libraries | december 2008 library, didn’t know what a circulation desk was, or what a library of congress call number was were working hard side by side with physicians, ministers, scientists, students, and retirees. the end result was not orderly, but the collection was saved from the encroaching river. the libraries at the university of iowa are indebted to these volunteers who helped protect the collection from the expected water. n the river peaks approximately twenty university buildings were closed because of the flood, including the main library, the art building, and the music building. the university’s power plant was closed. the entire arts campus was deeply under water. most of the main roads connecting the east side of iowa city to the west side were closed, and most of the highways into iowa city were closed. interstate 80 was closed in multiple places, and no traffic was allowed from the east side of the state to the west side. many bridges in and around iowa city were closed; some had actually crumbled and floated down stream. 
so the president of the university, sally mason, closed the university for the first time in its history. most staff would not be able to get to work anyway. many individuals were struggling with residences and businesses that were under water. the university was to be closed for the week of june 15, with the university’s hospitals continuing to operate under strained conditions; continued delivery of patient services was a priority. most library staff stayed home and followed the news stories, shocked at the daily news of destruction and loss. select library it staff began working in the background to set up new work environments for library staff returning to foreign workstations or relocated work environments. at the flood’s peak, the main library took several inches of water in the basement. there was slight rusting in the compact shelving, but the collection was completely saved. a portion of the basement was lower, and the computer equipment controlling the libraries’ public computer cluster was completely ruined. this computer cluster housing more than two hundred workstations library staff and volunteers sandbagging photo by carol jonck moving books out of the basement photo courtesy of the university of iowa news service the beginning of a book chain to the fourth floor photo courtesy of the university of iowa news service editorial board thoughts | hirst 7 which had been moved on the last day before the evacuation. much of this administrative work could proceed, and during the first week at the business library our finance department successfully completed our end-ofyear rollover process on all our materials funds. staff from the music library, art library, preservation, and special collections were assigned to the business library. the engineering library adopted the main library circulation and reserve departments. the media services staff was relocated to the physics library. the media staff had cleverly pulled most of the staff development videos and made them available to staff from the physics library, thus allowing the many displaced library staff to make progress on staff development requirements. was completely out of commission. the basements and first floors of the art and music buildings were completely ruined, but the libraries for these disciplines were on higher floors. the collections were spared, but there was absolutely no access to the building. n the libraries take baby steps to resume service after a week of being completely shut down, the university opened to a first day of summer school, but things were not the same. for the nineteen university buildings that had been flooded, hordes of contractors, subcontractors, and laborers began the arduous task of reclamation. university staff could work at home when that was possible, and most of the library’s dislocated reference staff did that, developing courses for the fall, progressing on selection work, and so on. staff could take vacation, but few chose this option. approximately 160 staff from the main library and the art and music libraries were reassigned to four branch libraries that were not affected by the flood. all of central technical services (cts) and interlibrary loan staff were assigned to the hardin health science library. central shipping and facilities was also at harden library, thus the convoluted distribution of mail started from here. most of the public machines were taken by cts staff, but their routine work proceeded very slowly. 
cts did not have access to oclc until the end of their flood relocation, which seriously impacted their workflow. an early problem that had to be solved was providing telephones and printing to relocated staff. virtually none of the relocated staff had dedicated telephones, even the administration. in any given location the small number of regular branch staff graciously shared their phones with their visitors. sharing equipment tended to be true for printers as well. for a few critical phone numbers in the main library, the phone number was transferred to a designated phone in the branch. thus often, when regular staff or student workers answered a phone, they had no idea what number the originating caller was trying to call. staff were encouraged to transfer their office phone number to their cell phone. at the business library, the library administrative staff and the finance staff had their personal workstations, library staff sandbagging photo by donald baxter 8 information technology and libraries | december 2008 was closed for about four weeks. the art and music libraries may be closed for a year. when library staff returned to the main library, there were books and manuscript boxes piled on the floor and on top of all the study tables. some of the main corridors, approximately twentyone feet wide, were so filled with library materials that you almost had to walk sideways and suck in your tummy to walk down the hall. bathrooms were blocked and access to elevators was limited. every library study table on the third through fifth floors were piled three feet high or more with books. for many weeks, library staff and volunteers carefully sorted through the materials and reshelved them as required. many materials needed conservation treatment, not because of the flood, but because of age and handling. many adjustments needed to be made to resume full service. due dates for all circulation categories had to be retrospectively altered to allow for the libraries being closed and for the extraordinary situations in which our library users found themselves during the flood. library materials were returned wet and moldy, and some items were lost. during the flood, in some cases, buildings actually floated down river. the libraries’ preservation department did extensive community education regarding treatment of materials damaged in the flood. the university was very interested in documenting the affect of the flood, and thus the libraries cooperated in trying to gather statistics on the number of hours of library staff and volunteers used during the flood. record keeping was complex, since one person could be a staff person working on flood efforts but also a volunteer working evenings and weekends. n our neighbors the effect of the iowa city flood of 2008 has been extensive, but was nothing compared to the flood in cedar rapids, our neighbor to the north. the cedar rapids public library lost their entire collection of 300,000 volumes, except for the children’s collection and 26,000 volumes that were checked out to library users that week. it staff were housed throughout the newly distributed libraries complex. one it staff member was at the engineering library, one was at the health science library, and two were at the business library. several it staff were relocated to the campus computer center. 
n the libraries proceed apace despite hurdles as the water receded and workers cleaned and proceeded with air handling and mold abatement, a very limited number of library staff were allowed back into the main library, typically with escorts, for very limited periods of time. during this time it staff was able to go into the main library and retrieve barcode scanners to allow cts staff to progress with book processing. staff went back for unprocessed materials needing original cataloging since staff had the time to process materials but didn’t have the materials. it staff retrieved some of our zebra printers so that labels could be applied to unbound serials. as it staff were allowed limited access to the main library, they went around to the various staff workstations and powered them up so that relocated staff could utilize the remote desktop function. n moving back the art and music libraries were evacuated june 10. the main library was evacuated june 13. the main library passing the books up the stairs photo courtesy of the university of iowa news service the goal of this paper is to describe a design—including the hardware, software, and configuration––for an open source wireless network. the network designed will require authentication. while care will be taken to keep the authentication exchange secure, the network will otherwise transmit data without encryption. w ireless networks are an essential tool for provid­ ing service for colleges and libraries. this paper will explain the setup of a wireless network using open­source software and inexpensive commodity hardware. open­source software was employed exclu­ sively. this allowed for flexibility in design and reduction in expense while also providing a platform for students to learn more about the internal workings of the system by examining particular sections of code in which they have interest. standard commodity hardware was used as a means of saving cost. this should allow others to repeat this design with a minimum of funding. the purpose of a network, like any resource, is to provide a service for those who own it; in this case, the patrons of a library, or students, faculty, and staff at a col­ lege. to ensure that this network serves its owners, users will be required to authenticate before gaining access. once authenticated, the central captive portal can pro­ vide different levels of service for specific user groups, including guest access, if desired. for this system, ease of access for users was the primary concern; other than using the secure socket layer for authentication, the remainder of the traffic was unencrypted. other than the base nodes, the remaining access points were connected to each other using a wireless connection in order to avoid physically connecting all access points across campus and to further reduce the expense for the deployment of the network. this was accomplished using the wds (wireless distributed system) feature on the wireless routers. all access points connect to a centralized set of servers that provide: dhcp, web­caching proxy, dns caching, radius, web server, a captive portal, and logging of network traffic. n hardware requirements for the network were relatively modest, using inexpensive wireless routers along with several linux servers built upon older pentium 3 desktop systems. linksys wrt54gs routers were chosen as the access points as they are inexpensive, readily available, and possess the ability to run custom open­source firmware. 
other access points could be used; however, the configuration sugges­ tions are specific to the wrt54gs and may not apply to other hardware. the routing functions of the wrt54gs were not used in this implementation. the servers need not be anything special; older hardware will work just fine. for this implementation, decommissioned 900 mhz units with 512mb of ram and 40gb hard drives were used. n wireless router software in order to provide the functionality required, the units had their firmware flashed with an open­source, linux­ based operating system available from openwrt for the linksys routers (http://www.openwrt.org). support is also available for other wireless devices. “the firmware from openwrt provides a fully writable file system with pack­ age management. this allows developers the freedom to customize the devices by choosing only the packages and software that are necessary for their applications.”1 as the routers have limited storage, being able to hand select only the necessary components is a definite advantage. n server software for the operating system on the servers, fedora core was chosen.2 fedora provides the yellow dog updater, modified (yum), which eases the updating of all pack­ ages installed on the system, including kernel updates.3 this aids security by providing a platform for easily and frequently updating the system. fedora core is an open­ source distribution that is available for free. fedora core also comes with many other open­source packages that were used in this design, such as the apache web server. while the designers had more familiarity with fedora, other distributions are also available that provide simi­ lar benefits (suse, ubuntu, openbsd, debian, etc.). the server was run in command line mode with no graphical user interface in order to reduce the load on the server and save space on the hard drive. n captive portal in order to require authentication before gaining access to the network, a captive portal was used. some of the open source wifi hotspot implementation | sondag and feher 35 open source wifi hotspot implementation tyler sondag and jim feher jim feher (jdfeher@mckendree.edu) is an associate professor of computer science at mckendree college in lebanon, illinois. tyler sondag (tnsondag@mckendree.edu), is a senior in computer science at mckendree college. 36 information technology and libraries | june 200736 information technology and libraries | june 2007 desired features in the choice of the captive portal were: encrypted authentication, traffic logging, and the ability to provide different levels of service for different user groups. logging traffic allows the system administrators to identify accounts that have been misusing the network. those who inadvertently misuse the system or perhaps have had their accounts compromised can have their access temporarily disabled until they can be contacted with instructions concerning acceptable use of the net­ work. as the network must be shared by all, those who habitually abuse the resource can have their accounts per­ manently disabled. the captive portal should also redi­ rect web traffic to a login page that is served on the secure socket layer until the user logs in. chillispot was chosen as it possesses all of the features mentioned above.4 n server layout as can be seen in appendix a, three servers were used for this implementation. the first server was used as the main router to the internet. 
the second server ran a squid web caching server.5 it also ran a dns cach­ ing server and the freeradius server.6 the third was used for the captive portal. three servers were used for various reasons. first, this distributed the load. second, portions of the network that were not behind the cap­ tive portal could more easily use the services on the second server running squid, dns, and freeradius. it should be noted that three independent servers are not required; many of the services could be consolidated on two or even one single server to reduce the hardware requirements. the implementation depends upon the specific needs for the network. n server installation installing the operating system (fedora core) on each server is a relatively straightforward procedure. each machine was partitioned with 1024 mbs of swap space with the rest of the drive being an ext3 partition with the mount point “/”. only the minimal set of packages required were installed at this time. the first server, server #1 (router), was given three network interfaces, one for the internet connection, one to connect to a switch that then connects to server #2 (web/dns caching and radius) as well as other machines that do not connect through the captive portal, and one connecting to server #3 (captive portal machine). the second server, server #2, only needs one interface, but the third, server #3, requires two interfaces, one for the master wireless access point, and one to connect to the switch connecting this machine to the rest of the network (appendix a). ssh login for root was also disabled at this time for added security. n server #1 configuration for server #1, very little setup was required. since this server works mainly as a router, the only major items that went into its configuration were the iptables rules, which are shown and described in appendix b.7 rules were set up to: n set up network address translation; n allow traffic to flow within the network; n log the traffic from the wireless portion of the net­ work; n allow for the transparent setup of the web proxy server; and n set up port knocking before allowing users to log into the router via ssh.8 a reference to this script was placed in the /etc/rc.d/ rc.local file so that it would run when the server boots. last was the setup of the three network interfaces in the machine. this can be done during system installation or afterwards on the fedora core based server by editing the configuration files in the /etc/sysconfig/networking­ scripts/ directory. one of the configuration files used in this implementation can be seen in appendix c. of course the configuration will change as the topology of the net­ work changes. n server #2 configuration the second server required significantly more setup to configure all of the necessary services that it runs. the first service added for this implementation was the web­ caching proxy server, squid. squid’s default configura­ tion file (/etc/squid.conf) is quite large; fortunately it requires little modification to get a simple server up and running.9 the changes made for this implementation can be seen in appendix d. the most important lines in this configuration are the last few, which enable it to act as a transparent proxy server, making it invisible to the users and requiring no setup of their browsers. 
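a minimal sketch of the two pieces just described, network address translation on the router and the prerouting redirect that makes squid a transparent proxy, is shown below. the interface names and addresses are the ones used in this implementation (see appendix b); on another network they would of course differ.

# enable packet forwarding on the router (server #1)
echo "1" > /proc/sys/net/ipv4/ip_forward

# hide the internal subnets behind the router's external interface (nat)
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# silently redirect web traffic from the wireless subnet to squid on
# server #2, so client browsers need no proxy configuration
iptables -t nat -A PREROUTING -i eth2 -s 10.5.0.0/16 -p tcp --dport 80 \
    -j DNAT --to-destination 10.4.1.90:3128

the complete rule set, including the logging, subnet-forwarding, and port-knocking rules, appears in appendix b.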
as there was no need for an authoritative dns server, just dns caching for the network, dnsmasq, which is easy to configure and can handle both dhcp services as well as dns caching, was chosen.10 in this instance, the captive portal was used to provide dhcp services for the wireless clients; however dnsmasq was used for dynamic clients on the remaining portion of the network. dnsmasq public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 37open source wifi hotspot implementation | sondag and feher 37 is relatively easy to configure, requiring only one change in its default configuration file, which points to the file in which the dns server addresses are stored, in this case /etc/dnsmasq_resolv.conf. next is the configuration of freeradius server. there are two files that need to be modified for the radius server; both are in the /etc/raddb/ directory. the first is clients.conf (appendix e). in this file at least two clients must be listed, one for localhost (this machine) and one for the captive portal machine. for each machine, a pass­ word must be specified as well as the hostname for that machine. this establishes the shared key that is used to encrypt communication between the captive portal and the radius server. the second is the users file (appendix f). in this file, each user for the captive portal system must be listed with his/her password. this implementa­ tion also included a class, a session timeout (dhcp lease time), idle timeout, accounting interim interval, and the maximum upload and download speeds. if guest access is required, one or several guest accounts should be added to this file along with entries for the registered users. an entry was added for each access point so that they can obtain an ip address from the dhcp server. finally for this machine, the interface configuration file was changed according to the network specifications. for this machine the configuration is simple since it only has one interface, and the only requirement for its address is that it be on the same network as the interface on the main router server to which it is connected. n server #3 configuration the third server required the installation of the captive portal software, in this case chillispot. in order to install chillispot, if fedora was used for the base system, it may be possible to install it as a prepackaged binary in the form an rpm package manager (rpm) file. otherwise, if you find that you need to compile chillispot from source code, you may need to deviate from a minimal installa­ tion of the operating system and base components and also include the gnu compiler collection (gcc). when installing from source code, first download the code from the chillispot web site. once the code is down­ loaded, unzipped and untarred, installing the chillispot daemon is done by entering the directory containing the source files and entering the standard commands: ./configure make make install when chillispot is on the system, either by compiling from source or through an rpm file, two more files must be configured and copied to the proper directory, the main configuration file and the login file. the configuration file, chilli.conf, is located in the directory that contains the source files. move this file to the /etc/ directory and make the necessary changes. in this implementation, the file required several changes (appendix g). 
one of the more significant alterations was to change the default network range of 192.168.182.0/24, which would be limited to less than 256 addresses. the address range was for the dhcp server was also expanded to allow for more users. the lower portion of the network range was left to make room for addresses that could be assigned to the wireless access points. an entry was added to allow the access points to obtain a static ip address in that lower range. after this, settings must be changed for the dns addresses given out to clients, and the address of the radius server. there is also a setting in the chillispot configuration file that allows users to access a certain list of domains without logging in. for this implementation, the decision was to allow the users access to the campus network, as well as to the dns server. next, the “radi­ ussecret” must be set. this is the same password that was entered into the clients.conf file on the radius server for this machine. it is also necessary to set the address of the page to which users will be directed. two lines must also be added to allow authentication using the physical or media access control (mac) address for the access points. all of the access points shared a common password. chillispot passes the physical address of the access point to the radius server along with this password. a separate entry must exist in the radius configuration file for each ip/physical address combination. for this setup, the redirect page was placed on this server, therefore apache (using yum) was also installed, and this server’s address was added as the web address for the redirect page (also note that the https module may be required for apache if it does not automatically install). rather than write a new page at this time, the sample page (hotspotlogin.cgi) from the chillispot source folder was copied and modified slightly (appendix h). in addi­ tion, a secure socket layer (ssl) certificate was installed on this server. this is not necessary, but it helps to avoid the warnings that pop up when a client attempts to access the login page with a browser. a few iptables rules need to be added. the first com­ mand needs to be executed in order to utilize network address translation (nat) and have the server forward packets to the outside network. /sbin/iptables ­t nat ­a postrouting ­o eth0 \ ­j masquerade the next is used to drop all outbound traffic originating from the access points. this prevents anyone spoofing the physical address of the access point from accessing 3� information technology and libraries | june 20073� information technology and libraries | june 2007 the internet, while still allowing the access points and the chillispot server to communicate for configuration and monitoring. /sbin/iptables ­a forward ­s 192.168.182.0/24 \ ­j drop these commands need to be executed when the chillispot machine boots, so they were placed into the /etc/rc.d/rc.local file. it may also be necessary to ensure that the machine can forward network traffic. this can be accomplished with the following command, which is also found as the first executable command from the script in appendix b: echo “1” > /proc/sys/net/ipv4/ip_forward finally, the configuration files for the interfaces were set up. 
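the glue between the captive portal and the radius server is the shared secret: the radiussecret in chilli.conf has to match the secret recorded for the chillispot host in the radius server's clients.conf, and every wireless user (or guest account) needs an entry in the users file. the stripped-down sketch below uses the addresses from this implementation (appendices e through g); the secret and the user credentials are placeholders.

# /etc/chilli.conf on server #3 (excerpt)
net           192.168.176.0/20
dynip         192.168.184.0/21
statip        192.168.182.0/24
dns1          10.4.1.90
radiusserver1 10.4.1.90
radiusserver2 10.4.1.90
radiussecret  example-secret
dhcpif        eth1
uamserver     https://10.5.3.30/cgi-bin/hotspotlogin.cgi
uamallowed    10.4.1.90

# /etc/raddb/clients.conf on server #2 (excerpt)
client 10.5.3.30 {
        secret    = example-secret    # must equal radiussecret above
        shortname = chillispot
}

# /etc/raddb/users on server #2 (excerpt)
joeuser  auth-type := local, user-password == "changeme"
         session-timeout = 3600,
         idle-timeout = 600

the same users file also carries one entry per access point, keyed on its physical address, so that the macauth and macpasswd options in chilli.conf can let the access points themselves onto the network, as shown in appendices f and g.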
n openwrt installation and configuration several ways exist to replace the default linksys firmware with the openwrt firmware.11 the tftp protocol can be used with both windows and linux, and one such method can be found in appendix i.12 in addition, other methods for using the standard web interface can be found on the openwrt web site.13 there are several versions of the openwrt firmware available; the newest version that uses the squashfs filesystem was chosen because it utilizes com­ pression that frees more space on the access point. openwrt comes with a default web interface that can be used for configuration, however, ssh was enabled and a script using the nvram command was used to configure each access point (see appendix j). before ssh can be used, you must telnet into the router and change the default password (which for linksys routers is ‘admin’). note: even if you decide to use the web interface, you should still change the default password. as several services that were installed with the default configuration were not used in the implementa­ tion, they were disabled once the firmware was flashed by removing the modules that boot at startup: the web interface, dnsmasq, and the firewall. this is done by deleting their entries in the /etc/init.d directory. changes were needed to set the mode of the access point, to turn on and configure the clients needing to use wds, to set the network information for the access point and then to save these settings. all of the wireless access points that communicate with each other via a wireless connec­ tion must have their physical addresses entered using a nvram command. for example, the command used for the main access point for the library would be: nvram set w10_wds=”mac_4_lib1 mac_4_lib2” all of this is detailed in appendix j. a final set of com­ mands, which were needed for the wrt54gs, are included to allow the access point to obtain its ip address from the dhcp server. these commands may not be necessary depending upon the type of access point used. since extra wireless access points are available, if an access point fails or is having problems for some reason, it is simply a matter of running a script similar to the one found in the appendix on one of the extra routers and swapping it out. n security unfortunately this system is not very secure. only the login credentials are encrypted via ssl. general data packets are in no way encrypted, so any information being transmitted is available to anyone sniffing the channel. wep and wpa could be used for encryption, but they have known vulnerabilities. other methods exist for securing the network such as wpa with radius or the use of a virtual private network, however the client setup for such systems may not be considered trivial for the typical user. therefore it was decided that it was better to inform the users that the data was not being encrypted and let them act accordingly, rather than use encryption with known flaws or invest the time required to train the general population on how to configure their mobile units to use a more secure form of encryption. as the main goal of this particular network was connectivity and not security, it was felt that this was a fair trade­ off. as new standards for wireless communication are developed and commodity hardware that supports them becomes available, this may change so that encrypted channels can be employed more easily. n conclusion this implementation is in no way completed. it is a work in progress, with many goals still in mind. 
also, as new features are desired, parts of the system will change to accommodate these requirements. current plans for the future are first to develop scripts to check the status of the access points and display this information to a web page. these scripts will also notify network administrators when access points go offline. this will help the adminis­ trators in making sure the system is up at all times. after this, scripts will be developed to parse the log files to find abusive activity (spamming, viruses, etc). however, the current project as described is complete and has already functioned successfully for nearly a year providing con­ nectivity for the library and portions of the mckendree college campus. public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 3�open source wifi hotspot implementation | sondag and feher 3� references and notes 1. openwrt, wireless freedom. www.openwrt.org (accessed june 16, 2006). 2. the fedora project. www.fedora.redhat.com (accessed nov. 29, 2005). 3. yum: yellow dog updater, modified. www.linux.duke. edu/projects/yum (accessed july 22 2006). 4. chillispot—open source wireless lan access point controller. www.chillispot.org (accessed june 23, 2006). 5. squid web proxy cache. www.squid­cache.org (accessed june 1, 2006). 6. freeradius—building the perfect radius server. www. freeradius.org (accessed june 28, 2006). 7. netfilter/iptables project homepage—the netfilter.org project. www.netfilter.org (accessed aug. 8, 2006). 8. thomas eastep, “port knocking and other uses of ‘recent match.’” www.shorewall.net/portknocking.html (accessed aug. 11, 2006). 9. squid web proxy cache, “squid frequently asked questions: interception caching/proxying.” www.squid­cache. org/doc/faq/faq­17.html (accessed aug. 8, 2006). 10. dnsmasq—a dns forwarder for nat firewalls. www. thekelleys.org.uk/dnsmasq/doc.html (accessed june 1, 2006). 11. linksys.com. www.linksys.com (accessed dec. 15, 2005). 12. openwrtdocs/installing/tftp—openwrt. wiki.open­ wrt.org/openwrtdocs/installing/tftp?action=show&redirect =openwrtviatfp (accessed aug. 2, 2006). 13. openwrtdocs/installing—openwrt. wiki.openwrt.org/ openwrtdocs/installing (accessed aug. 2, 2006). appendix a. network configuration 40 information technology and libraries | june 200740 information technology and libraries | june 2007 appendix b. iptables script—server #1 # this particular bit must be set to one to allow the # network to forward packets echo “1” > /proc/sys/net/ipv4/ip_forward # set up path to the internal network from internet if the # internal network initiated the connection iptables ­a forward ­i eth0 ­o eth1 ­d 10.4.0.0 \ ­m state ­­state established,related ­j accept # same for the chillispot subnet iptables ­a forward ­i eth0 ­o eth2 ­d 10.5.0.0 \ ­m state ­­state established,related ­j accept # allow the internal subnets to communicate with one another iptables ­a forward ­i eth1 ­d 10.5.0.0 ­o eth2 \ ­j accept iptables ­a forward ­i eth2 ­d 10.4.0.0 ­o eth1 \ ­j accept # allow subnet containing server 2 to reach the internet iptables ­a forward ­i eth1 ­o eth0 ­j accept # chillispot – accept and forward packets iptables ­a forward ­i eth2 ­s 10.5.3.30 ­j accept # set up transparent proxy for wireless network, but allow # connections that go through to the campus network # to bypass proxy iptables ­t nat ­a prerouting ­i eth2 ! 
\ ­d 66.99.172.0/23 ­p tcp ­­dport 80 ­s 10.5.0.0/16 \ ­j dnat ­­to­destination 10.4.1.90:3128 # nat iptables ­t nat ­a postrouting ­o eth0 \ ­j masquerade # simple port knocking to allow port 22 connection adapted # from www.shorewall.net/portknocking.html1 another # excellent document can be found at # www.debian-administration.org/articles/26814 # once connection started let it continue iptables ­a input ­m state ­­state \ established,related ­j accept # if name ssh has been set, then allow connection iptables ­a input ­p tcp ­­dport 22 ­m recent \ ­­rcheck ­­name ssh ­j accept # surround the port that opens ssh so that a sequential port # scanners will end up closing it right after opening it. iptables ­a input ­p tcp ­­dport 1233 ­m recent \ –­name ssh ­­remove ­j drop iptables ­a input ­p tcp ­­dport 1234 ­m recent \ ­­name ssh ­­set ­j drop iptables ­a input ­p tcp ­­dport 1235 ­m recent \ ­­name ssh ­­remove ­j drop # drop all packets that do not match a rule above by default iptables ­a input ­j drop appendix c. server configuration for first network card (ethernet 0) # /etc/sysconfing/networking­scripts/ifcfg­eth0 ­ # server #1 # device=eth0 bootproto=static broadcast=66.128.109.63 hwaddr=00:11:22:33:44:66 ipaddr=66.128.109.60 netmask=255.255.255.248 network=66.128.109.56 onboot=yes type=ethernet appendix d. /etc/squid.conf—server #2 #default squid port http_port 3128 # settings changed to specify memory for squid cache_mem 32 mb cachedir ufs /var/spool/squid 1000 16 256 # allow assess to squid for all within our network acl all src 0.0.0.0/0.0.0.0 http_access allow all http_reply_access allow all # internal host with no externally known name so we put # our internal host name visible_hostname hostname # specifications needed for transparent proxy2 httpd_accel_port 80 httpd_accel_host virtual httpd_accel_with_proxy on httpd_accel_uses_host_header on public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 41open source wifi hotspot implementation | sondag and feher 41 appendix e. /etc/raddb/clients.conf— server #2 client 127.0.0.1 { secret = password shortname = localhost nastype = other } client 10.5.3.30 { secret = password shortname = other machine } appendix f. /etc/raddb/users—server #2 # example of an entry for a user joeuser auth­type:=local, user­password==”passwd” class = 0702345678, session­timeout = 3600, idle­timeout = 600, acct­interim­interval = 60, wispr­bandwidth­max­up = 128000, wispr­bandwidth­max­down = 512000 # example of an entry for an access point # the physical/mac address listed below is for the # lan side of the router/access point mac_address auth­type := local, user­password == “password” framed­ip­address = 192.168.182.10, acct­interim­interval = 3600, session­timeout = 0, idle­timeout = 0 appendix g. 
/etc/chilli.conf—server #3 # used to expand the network net 192.168.176.0/20 # used to expand the number of hosts that can connect # while still leaving a portion of the network for # infrastructure dynip 192.168.184.0/21 # used to give static addresses to the access points statip 192.168.182.0/24 # internal dns followed by external dns dns1 10.4.1.90 dns2 24.217.0.3 # radius server for the network radiusserver1 10.4.1.90 radiusserver2 10.4.1.90 # radius secret used radiussecret password # interface chillispot server to listens to dhcp requests dhcpif eth1 # specified default login page uamserver https://10.5.3.30/cgi­bin/hotspotlogin.cgi # addresses that users can visit without authenticating uamallowed 10.4.1.90,24.217.0.3,66.99.172.0/24 # this allows the access points to authenticate based on # mac address only, this is required to log into the access # points from the captive portal server macauth # this password corresponds with the password from the # radius users file macpasswd password 42 information technology and libraries | june 200742 information technology and libraries | june 2007 appendix h. redirection page appendix i. method for flashing firmware of linksys router the firmware can be flashed using the built­in web inter­ face or via tftp. while help is available online3 for this, the procedure outlined here may also be helpful. on newer versions of the linksys routers, an older version of the linksys firmware must be installed first that supports a bug in the ping function on the router. once the older version is installed, you can exploit a bug in the ping com­ mand on the router to enable “boot wait,” which enables the router to accept a connection to flash its firmware as it is booting. detailed instructions for this installation are as fol­ lows: n first, download an old version of a linksys firmware that supports the ping bug to enable boot wait. one is available at: ftp://ftp.linksys.com/pub/network/ wrt54gs_3.37.2_us_code.zip n download and unzip this file. n plug an ethernet patch cable into link #1 on the router (not the wan port) and the interface on your machine. set the ip address of your computer to a static ip address in the 192.168.1.x range, not 192.168.1.1, which is used by the router. n log into router by opening a browser window and putting 192.168.1.1 into the address bar. (note: this is only for factory preset routers.) username: (leave blank) password: admin n click on "administration". n click on "firmware upgrade". n click "browse" and locate the old linksys firmware on your machine. n click "upgrade". n wait patiently while it flashes the firmware…. n click "setup". n click "basic setup". public libraries and internet access | jaeger, bertot, mcclure, and rodriguez 43open source wifi hotspot implementation | sondag and feher 43 n choose "static ip" from the first box. n for the ip address put in "10.0.0.1". n for the netmask put in "255.0.0.0". n for the gateway put in "10.0.0.2". n you can leave everything else as their default set­ tings. n choose save settings at the bottom of the page. n click on "administration". n click on "diagnostics". n click on "ping". in the “address” box put the following commands in one at a time and click on “ping”; if you see the message that the host was unreachable you have done something wrong. 
;cp${IFS}*/*/nvram${IFS}/tmp/n
;*/n${IFS}set${IFS}boot_wait=on
;*/n${IFS}commit
;*/n${IFS}show>tmp/ping.log
■■ after the last command you will see a list of all the nvram settings on the router; make sure that the line for "boot_wait" is set to on.
■■ unplug the router (the linksys router will only look for new firmware on boot).
■■ use tftp on your linux or windows machine.
■■ if the openwrt-wrt54gs-squashfs.bin file is not in this directory, copy the file to this directory.
■■ run the following commands at the prompt (below are the linux commands):
tftp 192.168.1.1
tftp> binary
tftp> rexmt 1
tftp> timeout 60
tftp> trace
tftp> put openwrt-xxx-x.x-xxx.bin
■■ the router will now reboot (it may take a very long time); when it is done rebooting, the dmz light will turn off. the new firmware is now loaded onto the router.

appendix j. nvram script for wireless routers

## server information stored as comments
##192.168.182.10 mainap 00:11:22:33:44:00
##192.168.182.11 cl202a 00:11:22:33:44:11
##192.168.182.20 lib01 00:11:22:33:44:22
##192.168.182.21 lib02 00:11:22:33:44:33
##192.168.182.22 lib03 00:11:22:33:44:44
##192.168.182.30 car01 00:11:22:33:44:55
## same for all
nvram set wl0_mode=ap
nvram set wl0_ssid=mck_wireless
nvram set wl0_channel=9
nvram set lan_proto=dhcp
## sample configuration for a few access points.
## uncomment and run for the appropriate node. make sure to
## add a line for every access point you have.
## unique for lib01
## allow connections to/from lib02 and lib03
#nvram set wl0_wds="00:11:22:33:44:33 00:11:22:33:44:44"
## unique for lib02
## allow connections to/from lib01
#nvram set wl0_wds="00:11:22:33:44:22"
## unique for lib03
## allow connections to/from lib01
#nvram set wl0_wds="00:11:22:33:44:22"
## same for all
nvram commit
## same for all
## this needed to be done to allow each wrt54gs router
## to accept an ip address from a dhcp server. this is
## only for the wrt54gs; other access points/routers
## may require something different.
# cd /etc/init.d
# rm S05nvram
# cp /rom/etc/init.d/S05nvram .
# vi S05nvram
## place a # in front of (comment out)
## nvram set lan_proto="static"

references

1. thomas eastep, "port knocking and other uses of 'recent match,'" www.shorewall.net/portknocking.html (accessed aug. 11, 2006).
2. ibid.
3. openwrtdocs/installing openwrt, wiki.openwrt.org/openwrtdocs/installing (accessed aug. 2, 2006).

animated subject maps for book collections
tim donahue
information technology and libraries | june 2013 7
tim donahue (tdonahue@montana.edu) is assistant professor/instruction librarian, montana state university, bozeman, mt.

abstract

of our two primary textual formats, articles by far have received the most fiscal and technological support in recent decades. meanwhile, our more traditional format, the book, seems in some ways to already be treated as a languishing symbol of the past. the development of opacs and the abandonment of card catalogs in the 1980s and 1990s is the seminal evolution in print monograph access, but little else has changed. to help users locate books by call number and browse the collection by subject, animated subject maps were created. while the initial aim is a practical one, helping users to locate books and subjects, the subject maps also reveal the knowledge organization of the physical library, which they display in a way that can be meaningful to faculty, students, and other community members. we can do more with current technologies to assist and enrich the experience of users searching and browsing for books. the subject map is presented as an example of how we can do more in this regard.
lc classification, books, and library stacks

during the last few decades of technological evolution in libraries, we have helped facilitate a seismic shift from print-based to digital research. our library websites are jammed with electronic resources, digital collection components, database links, virtual reference assistance, online tutorials, and mobile apps. collection budgets too have shifted from a print to an electronic focus. many libraries are now spending less than 20 percent of their material budgets on print monographs. and yet, our stacks are still filled with books that often take up more than 50 percent of our library spaces. knowledge organization schemas have also evolved in libraries. we have subject lists to help users decide which databases to select that reflect current disciplines and majors in higher education. internal database navigation continues to evolve in terms of limits, fields, and subject searching. web searching is based on the contemporary keyword approach where "everything is miscellaneous" and need not be organized, but nationwide, billions of books still sit on shelves according to dewey or library of congress classification systems that were initially developed over a century ago. some say these organizing systems are woefully antiquated and do not reflect our contemporary post-modern realities, though they still amply serve their purpose to assign call number locations for our books. we hear little of plans to update these classification schemes. why invest more time, energy, and resources on revamped organization schemes for libraries? the hathitrust now contains the scanned text of more than ten million books. google claims there are almost 130 million published titles in the world and intends to digitize all of them.1 what will happen to our physical book collections? how long will they reside on our library shelves? how long will they be located using the dewey and lc systems? is the library a shrinking organism? profession-wide, there seems to be no concrete vision in regard to the future of our book collections. there is, of course, general acknowledgement that acquisition of e-books will increase as print acquisitions decrease and that, overall, print collections will accordingly shrink to reflect the growing digital nature of knowledge consumption. but for now and into the foreseeable future these billions of monographs remain on our shelves in the same locations our call number systems assigned to them decades ago. and while online library users are now able to utilize an array of electronic access delivery systems and web technologies for their article research and consumption, book seekers still need a call number. books and articles have been our two primary textual formats for centuries. articles have moved into the digital realm more fleetly than their lengthier counterparts. their briefer length, the cyclical serial publication process, and the evolution of database containment and access have enabled, in a relatively short time, a migration from print to primarily digital access. books, however, are accessed in much the same way they were a hundred years ago.
the development of opacs in the 1980s and 1990s and abandonment of card catalogs is the seminal evolution in print monograph access, but little else has changed.2 once a call number is attained, the rest of the process remains physical, usually requiring pencil, paper, feet, sometimes a librarian, and a trip through the library until the object itself is found and pulled from the shelf. so while the process of article acquisition may employ a plethora of finding aids, keyword searching, database features, full text availability, and various delivery methods through our richly developed websites, beyond the opac and possibly a static online map, book seekers are on their own or need a librarian in what may seem a meaningless labyrinth of stacks and shelves. while the primary and most practical purpose of our classification schemes is to provide an assigned call number for book finding, these organizational outlines create an order to the layout of our stacks that maps a universe of knowledge within our library walls. this structure of knowledge reveals a meaning to our collections that includes the colocation of books by topic and proximity of related subjects. these features enhance the browsing process and often lead to the act of serendipitous discovery. to locate a book by call number, a user may consult library floor plans, which are typically limited to broad ranges or lc main classes, then rely on stack-end cards to home in on the exact stack location. to browse books by subject without using the catalog, a user typically must rely on a combination of floor plans and lc outline posters if they exist at all. often, informed browsing by subject cannot take place without a visit to the reference desk for mediation by a librarian. even then, many librarians are barely familiar with their book collection’s organizational structure and are reticent to recommend broad subject browsing. information technology and libraries | june 2013 9 purpose and description of the subject map to help users locate books by call number and browse the collection by subject, animated subject maps were created at skidmore college and montana state university. displaying overhead views of library floors, users mouse over stacks to reveal the lc sub-classes located within. alternatively, they may browse and select lc subject headings to see which stacks contain them. the lc outline contains 21 main subject classes and 224 sub-classes, corresponding to the first two elements of a book call number. on stack mouse-over, three items are displayed: the call number by range, the main subject heading, and all sub-classes contained within the stack. when using the browse by subject option, users select and click an lc main class and the stacks where this class is located are highlighted. while the initial aim is a practical one, helping users to locate books and subjects, the subject map also reveals the knowledge organization of the physical library, which it displays in a way that can be meaningful to faculty, students, and other community members. the map also provides local electronic access to the lc classification outline. at both institutions the maps are linked from prominent web locations and electronic points of need that are relevant and proximate to other book searching functions and tools. figure 1. skidmore college subject map showing stack mouse-over display. animated subject maps for book collections | donahue 10 figure 2. montana state university subject map showing stack mouse-over display. 
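the two lookups just described, stack mouse-over and browse by subject, amount to a small two-way index between stacks and lc classes. the sketch below is our own illustration of that index in python, not the actionscript used in the actual maps; the stack labels and class ranges are invented examples.

# hypothetical two-way index behind a subject map: each stack knows its
# call-number range, lc main class, and sub-classes; browse-by-subject
# simply inverts the index. labels and ranges are invented examples.
stacks = {
    "floor2-stack01": {"range": "bf1-bf990",
                       "main": "b - philosophy, psychology, religion",
                       "subclasses": ["bf"]},
    "floor2-stack02": {"range": "bl1-bx4700",
                       "main": "b - philosophy, psychology, religion",
                       "subclasses": ["bl", "bm", "bp", "br", "bs", "bt", "bv", "bx"]},
    "floor3-stack14": {"range": "qa1-qa939",
                       "main": "q - science",
                       "subclasses": ["qa"]},
}

def mouseover(stack_id):
    """the three items shown on rollover: range, main class, sub-classes."""
    s = stacks[stack_id]
    return s["range"], s["main"], s["subclasses"]

def stacks_for_class(main_class):
    """the stacks highlighted when a user clicks an lc main class."""
    return [sid for sid, s in stacks.items() if s["main"] == main_class]

print(mouseover("floor2-stack02"))
print(stacks_for_class("b - philosophy, psychology, religion"))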
design rationale and methodology the inspiration for the subject map started with a question: what if users could see on a map where individual subjects were located within the library? most library maps examined were limited to lc main classes or broad ranges denoting wide swaths of call numbers. including hundreds of lc subclasses would convolute and clutter a floor map beyond usability. but what if an online map contained each individual stack and only upon user-activation was the information revealed, saving space and avoiding clutter? such a map should be as devoid of congestion as possible and focus the user’s attention on library stack locations and lc classification. working from existing maps and architectural blueprints of the library building, a basic perimeter was rendered using adobe illustrator and indesign software. these perimeters were then imported into adobe flash and a new .fla file created. library stacks were then measured, counted, and added as a separate layer within each floor perimeter. basic location elements such as stairways, elevators, and doors were added for locational reference points. each stack was then programmed as a button with basic rollover functionality. flash actionscript was coded so that the correct call number, main class, and sub-class information appear within the interface upon rollover activation. this functionality accounts for the stack searching ability of the subject map. information technology and libraries | june 2013 11 additionally, the lc outline was made searchable within the map so that users can mouse over subjects and upon clicking, see what stacks contain those main classes. this functionality accounts for the subject searching ability of the map. left-hand navigation was built in so users can toggle between these two main search functions. maintaining visual minimalism and simplicity was a priority and inclinations to render the map more comprehensively were resisted in order to maximize attention to subject and stack information. black, white, and gray colors were chosen to enhance the contrast of the map and aid the user’s eye for quick and clear use. other relevant links and instructional context were added to the left-hand navigation including links to the catalog, official lc outline, and library homepage. finally, after uploading to the local server and creating a simple url, links to the subject map were established in prominent and meaningful points of need within the library website. user acceptance once the subject map was completed and links to it were made public, a brief demonstration was provided for reference team members who began showing it to users at the reference desk. initial reaction was enthusiastic. students thought it was “cool” and enjoyed “playing with it.” one reported, “i didn’t know the library actually made sense like that. it’s neat to see the logic about where things are.” another student said, “now i can see where all the books on buddhism are!” faculty, too, were pleased. though faculty members typically know a little about lc classification, they are not accustomed to seeing it visualized and grafted onto their institutional library’s stacks. making transparent the intellectual organization of the library for other faculty can bolster their confidence in our order and structure. professors are often pleased to see their discipline’s place within our stacks and where related subjects are located. 
the most positive praise for the subject map, however, comes from the sense of convenience it lends. many comments express appreciation for the ability to directly locate an individual book stack. because primary directional and finding elements like stairs and elevators are included in the maps, users are able to see the exact path that leads to the book they are seeking. for those not interested in browsing, in a hurry, or challenged in terms of mobility, the subject map is a time and energy saver. some users, however, have reported frustration with the sensitivity required for the mouse-over functions. others desire a more detailed level of searching beyond the sub-class level. one user pointed out that the subject map was of no help to the blind.

multiple uses and internal applications

the primary use and most obvious application of the subject map is as a reference tool. as a front-line finding aid, librarians and other public service staff at reference, circulation, or other help desks can easily and conveniently deploy the map to point users in the right direction and orient them to the book collection. in library instruction sessions, the subject map is not only a practical local resource worth pointing out, but also serves as an example of applied knowledge organization. when accompanying a demonstration of the library catalog, the map is not only a valuable finding aid, but adds a layer of meaning as well. students who understand the map are not only more able to browse and locate books, but learn that a call number represents a detailed subject meaning as well as a locational device. used in conjunction with a tour, the map reinforces the layout of library shelves and helps to bridge the divide between electronic resources and physical retrieval. the subject map facilitates a concrete and visual introduction to the lc classification outline, a knowledge of which can be applied to most college and research libraries in the united states. the subject map can also be of assistance with collection development. perusal of the map can reveal relative strengths and weaknesses within the collection. subject liaisons and bibliographers may use the map to home in on and visualize their assigned areas. circulation staff and stacks maintenance workers find the map useful for book retrieval, shifting projects, and in the training and acclimation of new workers to the library. the subject map has proven to be a useful reference for library redesign and space planning considerations. at information fairs and promotional events where devices or projection screens are available, the map has served as a talking point and promotional piece of digital outreach. the map has been demonstrated by information science professors to lis graduate students as an example of applied knowledge organization in libraries. recently, a newly hired incoming library dean commented that the map helped him "get to know the book collection" and familiarized him with the library. figure 3. skidmore college subject map showing subject search display.

issues and challenges

in some libraries, books don't move for decades. the same subjects may reside on the same shelves during an entire library's lifetime. in this case, a subject map can be designed once and never edited. but, of course, most library buildings go through changes and evolutions. in many libraries, collection shifting seems to be ongoing.
book collections wax and wane. certain subjects expand with their times, while others shrink in irrelevancy. weeding does not affect all subjects and stacks equally, and adjustments to shelves and end cards are necessary. in addition to the transitions of weeding and shifting, sometimes whole floors are reconfigured. in the library commons era of the last few decades, substantial redesigns have been commonplace as book collections make way for computer stations and study spaces. in all these cases, adjustments and updates will be necessary to keep a subject map accurate. this is easily done by going back into the master .fla file and editing as needed. in many cases only a stack or two need be adjusted, but in instances of major collection shifting some planning ahead may be necessary and more time allotted for redesign. shifting can be a complex spatial exercise and it is difficult to predict where subjects will realign exactly. subject map editing may have to wait until physical shifting is completed. it should be noted that each stack must be hand-coded separately. in libraries with hundreds of stacks this can seem a tedious and time-consuming design method. both subject maps rely on adobe flash animation technology. flash is proprietary software, so the benefits of open source software cannot be utilized with subject maps at this time. further, adobe flash reader software must be installed on a computer for the subject map to render. this has almost never been a problem, however, as the flash reader is ubiquitous and automatically installed on most public and private machines upon initial boot up. another concern, however, relating to flash technology is human assets. not every library has a flash designer or even someone who can implement the most fundamental flash capabilities. flash is not hard to learn and the subject maps utilize only its most basic functionalities, but still, for some it remains a niche software and many libraries will not have the resources to invest. reaction, though, to the live subject maps and the rollover interactivity they provide, has been so positive that more fully integrated flash maps have been proposed. why not have all physical elements of the library incorporated into one flash-enabled map? this is possible but may come at some expense to the functionality of the subject-rendering aspect of the maps. by limiting the application to stacks and lc classes, a user may remain more focused. avoiding clutter, overcrowding, and a preponderance of choice is a design strategy that has gained much credibility in recent years.3 the subject map enjoys the usability success of clean design, limited purpose, and simple rendering. while demonstrating the potential of user-activated animation for other proposed library applications, the subject map might be best maintained as a limited specialty map. a final concern regarding the long-term success of subject maps should be mentioned. how long will books remain in libraries? how long will they be organized by subject? when the physical arrangement and organization of information objects no longer exist in libraries, maps of any kind will seemingly lose all efficacy. but will libraries themselves exist in this future? whither books? whither libraries?

future developments

the most prominent and practical attribute of the subject map is its ability to show a user the exact stack where the book they are seeking is located.
but in its current state as a stand-alone application, a user must obtain a call number from a catalog search, then open the subject map by going to its independent url. investigation is underway to determine what is necessary in order to integrate the subject map with the online catalog. in this scenario, a catalog item record might also display an embedded subject map that automatically highlights the floor and stack where the call number is located. this seemingly requires .swf files and flash actionscript to be embedded in catalog coding. one potential solution is to attribute an individual url to each stack rendering so that a get url function can be applied and embedded in each catalog item record. this synthesis of subject map and catalog poses a complex challenge but promises meaningful and time-saving results for the item retrieval process. qr code technology in conjunction with subject map use is also being deployed. by fixing qr codes on stack end cards that link to relevant sections of the lc outline, a researcher may use a mobile device to browse digitally and physically within the stacks at the same time. in this way a user may conduct digital subject browsing and physical item browsing simultaneously. the urls linked to by qr coding contain detailed lc sub-levels not contained within the subject map, which is limited to the level of sub-class. the active discovery of new knowledge facilitated by exploiting preexisting lc organization inside library stacks in real time can be quite impressive when experienced firsthand. another development exploiting lc knowledge organization is in beta mode at this time. an lc search database has been created allowing users to enter words and find matching lc subject terminology. potentially, this database could be merged with the subject map, allowing users to correlate subject word search with physical locations independent of call numbers. despite its intent as a limited specialty map, possibilities are also being explored to incorporate the subject map into a more fully integrated library map. one way forward in this regard is to create map layers that could be toggled on and off by users. in this way, the subject map could exist as its own layer, maintaining its clarity and integrity when isolated but integrated when viewed with other layers. flash technology excels at allowing such layer creation.

other stack maps and related technologies

searching the web for "subject map" and related terminology such as stack, shelf, book, and lc maps does turn up various efforts and approaches to organizing and exploiting classification scheme data, but no animated, user-activated maps are found. similar searches across library and information science literature turn up some explorative research on the possibilities of mapping lc data, but again no animated stack maps are found.4 there is a product licensed by bowker inc. called stackmap that can be linked to catalog search results. when a user clicks on the map link next to a call number result, a map is displayed with the destination stack highlighted, but the information provided is locational only. stackmap is not animated or user-activated. no subject information is given and the map offers no browsing features. since the release of html5, we are beginning to see more animation on the web that is not flash-driven. steve jobs and apple's determined refusal to run flash on their mobile devices has motivated many to seek other animation options.
new html5 animation tools such as adobe edge, hippo animator, and hype offer promising starts at dislodging the flash grip on web animation, but they have far to go and do not yet offer the ease of design or the range of creative possibilities of flash. building an animated subject map with html5 alone does not seem possible at this time.

universal applicability of the subject map

so far, subject maps have been created for two very different libraries. the commonality shared between the montana state university and skidmore college libraries is their possession of hundreds of thousands of books in stacks shelved by the lc classification system. this is a trait shared by nearly all college and research libraries. subject maps can be easily structured on the dewey decimal system as well so that public libraries could benefit from their functionality, making the subject map appropriate and creatable for more than 12,000 libraries.5 of our two primary textual formats, articles by far have received the most fiscal and technological support in recent decades. article searching and retrieval continues to evolve through the rich implementation of assets such as locally constructed resource management tools, independent journal title searches, complexly designed database search interfaces, and dedicated electronic resource librarians. meanwhile, our more traditional format, the book, seems in some ways to already be treated as a languishing symbol of the past. because its future is uncertain, does that justify our neglect in the present? but has the book disappeared yet? as we make room for more student lounges, coffee bars, computer stations, writing labs, and information commons, we should carefully ask what makes a library special. good books and the focused, sustained treatment of knowledge they contain are part of the correct answer, symbolically and, as yet, practically speaking. while our books still occupy our library shelves, shouldn't they also fully benefit from the ongoing technological explosion through which we continue to evolve? opacs haven't evolved much in recent years. in fact they seem quite stymied to many librarians and users. we can do more with current technologies to assist and enrich the experience of users searching and browsing for books. the subject map is hopefully an example of how we can do more in this regard. while we have grown accustomed to increasingly look forward in order to position our libraries for the future, we should also remember to sometimes look back. our classification systems and book collections are assets built from the past that represent many decades of great labor, investment, and achievement. more than 12,000 public and academic libraries together make up one of our greatest national treasures and bulwarks of living democracy. libraries are among the dearest valued assets in any of our states. many of the most beautiful buildings in our nation are libraries. based on library insurance values and estimated replacement costs, library buildings and the collections they hold amount cumulatively to hundreds of billions of dollars of worth.6 this astounding worth is figured mainly from the buildings themselves and the books they contain. a few have commented that there is some aesthetic quality to the subject maps.
if this is true, the appeal comes from the synthesis of architectural form and the universe of knowledge revealed within, from the beauty of libraries both real and ideal, from physical and mental constructions unified. animated subject maps can help bring the physical and intellectual beauty of libraries into the digital realm, but the main appeal is a practical one: to point the user directly to the book or subject they are seeking. so in conclusion, perhaps we should measure the subject map's potential in the light of ranganathan's five laws of library science:7
1. books are for use.
2. every reader his [or her] book.
3. every book its reader.
4. save the time of the reader.
5. the library is a growing organism.
the subject maps can be found at the following urls:
skidmore college subject map: http://lib.skidmore.edu/includes/files/subjectmaps/subjectmap.swf
montana state university subject map: www.lib.montana.edu/subjectmap

references

1. google, "google books library project—an enhanced card catalog of the world's books," http://books.google.com/googlebooks/library.html, accessed november 8, 2012.
2. antonella iacono, "opac, users, web. future developments for online library catalogues," bollettino aib 50, no. 1–2 (2010): 69–88, http://bollettino.aib.it/article/view/5296.
3. geoffrey little, "where are you going, where have you been? the evolution of the academic library web site," the journal of academic librarianship 38, no. 2 (2012): 123–25, doi:10.1016/j.acalib.2012.02.005.
4. kwan yi and lois mai chan, "linking folksonomy to library of congress subject headings: an exploratory study," journal of documentation 65, no. 6 (2009): 872–900, doi:10.1108/00220410910998906.
5. american library association, "number of libraries in the united states, ala library fact sheet 1," www.ala.org/tools/libfactsheets/alalibraryfactsheet01.
6. edward marman, "a method for establishing a depreciated monetary value for print collections," library administration and management 9, no. 2 (1995): 94–98.
7. s. r. ranganathan, the five laws of library science (new delhi: ess ess, 2006), http://hdl.handle.net/2027/mdp.39015073883822.

social contexts of new media literacy: mapping libraries
elizabeth thorne-wallington
information technology and libraries | december 2013 53

abstract

this paper examines the issue of universal library access by conducting a geospatial analysis of library location and certain socioeconomic factors in the st. louis, missouri, metropolitan area. framed around the issue of universal access to internet, computers, and technology (ict) for digital natives, this paper demonstrates patterns of library location related to race and income. this research then raises important questions about library location, and, in turn, how this impacts access to ict for young people in the community.
objectives and purpose the development and diffusion of new media and digital technologies has profoundly affected the literacy experiences of today’s youth.1 young people today develop literacy through a variety of new media and digital technologies.2 the dissemination of these resources has also allowed for youth to have literacy-rich experiences in an array of different settings. ernest morrell, literacy researcher, writes, as english educators, we have a major responsibility to help future english teachers to redefine literacy instruction in a manner that is culturally and socially relevant, empowering, and meaningful to students who must navigate a diverse and rapidly changing world.3 this paper will explore how mapping and geographic information systems (gis) can help illuminate the cultural and social factors related to how and where students access and use new media literacies and digital technology. libraries play an important role in encouraging new media literacy development;4 yet access to libraries must be understood through social and cultural contexts. the objective of this paper is to demonstrate how mapping and gis can be used to provide rigorous analysis of how library location in st. louis, missouri, is correlated with socioeconomic factors defined by the us census including median household income and race. by using gis, the role of libraries in providing universal access to new media resources can be displayed statistically, both challenging and confirming previously held beliefs about library access. this analysis raises new questions about how libraries are distributed across the st. louis area and whether they truly provide universal and equal access. elizabeth thorne-wallington (ethornew@wustl.edu) is a doctoral student in the department of education at washington university in st. louis. mailto:ethornew@wustl.edu information technology and libraries | december 2013 54 literature review advances in technologies are transforming the very meaning of literacy.5 traditionally, literacy has been defined as the ability to understand and make meaning of a given text.6 the changing global economy requires a variety of digital literacies, which schools do not provide.7 instead, young people acquire literacy through a multitude of inand out-of-school experiences with new media and digital technology.8 libraries play a vital role in supporting new media literacy by offering out-of-school access and experiences. to understand the role that libraries play in offering access to new media literacy technologies, a few key concepts must be defined. first is the concept of the digital native. those born around 1980, who have essentially grown up with technology, are known as digital natives.9 digital natives are expected to have a base knowledge of technology and to be able to pick up and learn new technology quickly because of that base knowledge. digital natives have been exposed to technology from a young age and are adept at using a variety of digital technologies. the suggestion is that young people can quickly learn to make use of the new media and technology available in a specific location. key to any discussion of digital natives is the concept of the digital divide. 
the digital divide has been a central issue of education policy since the mid-1990s.10 early work on the digital divide was concerned primarily with equal access.11 more recently, however, the idea of a "binary digital divide" has been replaced by studies focusing on a multidimensional view of the digital divide.12 hargittai asserts that even among digital natives, there are large variations in internet skills and uses correlated with socioeconomic status, race, and gender.13 these variations call for a nuanced study examining social and cultural factors associated with new media literacy, including out-of-school contexts. the concept of literacy and learning in out-of-school contexts has a strong historical context. hull and schultz provide a review of the theory and research on literacy in out-of-school settings.14 a variety of studies, including self-guided literacy activities, after-school programs, and reading programs were reviewed, and the significance of out-of-school learning opportunities was supported by these studies. importantly for the research here, research has also been done on the use of digital technology in out-of-school settings. lankshear and knobel examine out-of-school practices extensively with their work on new literacies.15 lankshear and knobel also make clear the complexity of out-of-school experiences among young people. students participate in nontraditional literacy activities such as blogging and remix in a variety of out-of-school contexts, from home computers to community-based organizations to libraries. most importantly, lankshear and knobel found that the students did connect what they learned in the classroom with these out-of-school activities. the connection between out-of-school literacies and in-school learning has also been studied. education policy researcher allan luke writes, the redefined action of governments . . . is to provide access to combinatory forms of enabling capital that enhance students' possibilities of putting the kinds of practices, texts, and discourses acquired in schools to work in consequential ways that enable active position taking in social fields.16 collins writes about this relationship between in- and out-of-school literacies. collins writes in her case study that there are a variety of "imports" and "exports" in terms of practices. that is, skill transaction works in both directions, with skills learned out of school used in school, and skills learned in school used out of school.17 skerrett and bomer make this connection even more explicit when looking at adolescent literacy practices.18 their article examines how a teacher in an urban classroom drew on her students' out-of-school literacies to inform teaching and learning in a traditional literacy classroom. the authors found that the teacher in their study was able to create a curriculum that engaged students by inviting them to use literacies learned in out-of-school settings. however, the authors write that this type of literacy study was taxing and time-consuming for both the teacher and the student. still, it is clear that connections between in- and out-of-school literacies can be made. the role libraries play in making this connection has not been studied as extensively. yet it is clear that young people do use libraries to access technology. becker et al. found that nearly half of the nation's 14 to 18 year olds had used a library computer within the past year. becker et al.
additionally found that for poor children and families, libraries are a “technological lifeline.” among those below the poverty line, 61 percent used public library computers and the internet for educational purposes.19 tripp writes that libraries have long played an important role in helping people gain access to digital media tools, resources, and skills.20 tripp writes that libraries should capitalize on the potential of new media to engage young people. additionally, tripp argues that librarians need to develop skills to train young people to use new media. the idea that libraries are important in meeting the need is further supported by the recent grants, totaling $1.2 million, by the john d. and catherine t. macarthur foundation to build “innovative learning labs for teens” in libraries. this grant making was a response to president obama’s “educate to innovate” campaign, a nationwide effort to bring american students to the forefront in science and math.21 this literature review demonstrates that the body of research currently available focuses on digital natives and the digital divide, but that the research lacks the nuance needed to capture the complexity of social and cultural contexts surrounding the issue. this literature review further demonstrates both the importance of new media literacy and out-of-school learning, as well as the key role that libraries play in supporting these learning opportunities. the study provided here uses gis analysis to demonstrate important socioeconomic and cultural factors that surround libraries and library access. first, i describe the role of gis in understanding context. next, i describe the methods used in this paper. finally, i analyze the results and implications for the study. geographic information systems analysis in education there is a burgeoning body of research which uses geographic information systems (gis) to better understand socioeconomic and cultural contexts of education and literacy issues.22 information technology and libraries | december 2013 56 there are several key works that link geography and social context. lefebvre defines space as socially produced, and he writes that space embodies social relationships shaped by values and meanings. he describes space as a tool for thought and action or as a means of control and domination. lefebvre writes that there is a need for spatial reappropriation in everyday urban life. the struggle for equality, then, is central to the “right of the city.”23 the unequal distributions of resources in the city help to maintain social and economic advantaged positions, which is important to the analysis here of library access. this unequal distribution of resources continues today. de souza briggs and others write that there is clear geographical segregation in american cities today.24 this is seen in housing choice, racial attitudes, and discrimination, as well as metropolitan development and policy coalitions. in the conclusion of his book, de souza briggs writes that housing choice is limited for low-ses minorities, and these limitations produce myriad social effects. again, this finding is important to the contexts of where libraries are located. jargowsky writes of similar findings.25 like de souza briggs, jargowsky focuses on the role that geography plays in terms of neighborhood and poverty. 
jargowsky even finds social characteristics of these neighborhoods: there is a higher prevalence of single-parent families, lower educational attainment, a higher level of dropouts, and more children living in poverty. important here, though, is that all such characteristics can be displayed geographically, which means that varying housing, economic, and social conditions can be displayed with library locations. soja goes beyond the geographic analysis offered by de souza briggs and jargowsky and writes that space should be applied to contemporary social theory.26 soja found that spatiality should be used in terms of critical human geography to advance a theory of justice on multiple levels. he writes that injustice is spatially construed and that this spatiality shapes social injustice as much as social injustice shapes a specific geography. this understanding, then, shapes how i approach the study of new media literacies as influenced by cultural and social factors. these factors are particularly prevalent in the st. louis, missouri, area. colin gordon reiterates the arguments of lefbvre jargowsky and de souza briggs in arguing that st. louis is a city in decline.27 by providing maps that project housing policies, gordon is able to provide a clear link between historical housing policies such as racial covenants and current urban decline. gordon is able to show that vast populations are moving out of st. louis city and into the county, resulting in a concentration of minority populations in the northern part of the city. gordon argues that the policies and programs offered by st. louis city have only exacerbated the problem and led to greater blight.28 in terms of literacy, morrell makes the most explicit connection between literacy and mapping with a study that used a community-asset mapping activity to make the argument that teachers need to make an explicit connection between literacy at school and the new literacies experienced in the community.29 the significance of this is that gis can be used to illuminate the social and economic contexts of new media literacy opportunities as well, which in turn could help inform social dialogue about the availability of and access to informal education opportunities for new media literacy. social contexts of new media literacies: mapping libraries| thorne-wallington 57 methods and data the gis analysis performed here concerns library locations in the st. louis metropolitan area, including st. louis city and st. louis county. the st. louis metropolitan area was chosen because of past research mapping the segregation of the city, largely because the city and county are so clearly segregated racially and economically along the north–south line. this segregation is striking when displayed geographically and illuminating when mapped with library location. maps were created using tiger files (www.census.gov/geo/maps-data/data/tiger.html) and us census data (http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml), both freely available to the public via internet download. libraries were identified using the st. louis city library’s “libraries & hours” webpage (www.slpl.org/slpl/library/article240098545.asp), the st. louis county library “locations & hours” webpage (www.slcl.org/about/hours_and_locations), google maps (www.maps.google.com), and the yellow pages for the st. louis metropolitan area (www.yellowpages.com). 
the address of each library was entered into itouchmap (http://itouchmap.com) to identify the latitude and longitude of the library. a spreadsheet containing this information was then loaded into the gis software and displayed as x–y data. the maps were then displayed using median household income, african american population, and latino and hispanic population as obtained from the us census at census tract level. for median household income, the data was from 1999. for all other census data, the year was 2010. for district-level data, communication arts data from the missouri department of elementary and secondary education (modese) website (http://dese.mo.gov/dsm) was entered into microsoft excel and then displayed on the maps. the data is district level, representing all grades tested for communication arts across all district schools. the modese data was from 2008, the most recent year available at the time the analysis was performed. the communication arts data was taken from the missouri assessment program test. this test is given yearly across the state to all public school students. the state then collects the data and makes it available at the state, district, and school level. the data used here is district-level data. scores are broken into four categories: advanced, proficient, basic, and below basic. the groups for proficient and advanced were combined to indicate the district's success on the map test. these are the two levels generally considered acceptable or passing by the state.30 before looking at patterns of library location and these socioeconomic and educational factors, density analysis was performed on the library locations using esri arcgis software, version 9.0, to analyze whether clustering was statistically significant. this analysis was used to demonstrate whether libraries were clustered in a statistically significant pattern, or if location was random. the nearest neighbor tool of arcgis was used to determine if a set of features, in this case the libraries, shows a statistically significant level of clustering. this was done by measuring the distance from each library to its single nearest neighbor and calculating the average distance of all the measurements. the tool then created a hypothetical set of data with the same number of features, but placed randomly within the study area. then an average distance was calculated for these features and compared to the real data. that is, a hypothetical random set of locations was compared to the set of actual library locations. a near-neighbor index was produced, which expresses the ratio of the observed distance divided by the distance from the hypothetical data, thus comparing the two sets.31 this score was then standardized, producing a z-score, reported below in the results section.

results and conclusions

using the nearest neighbor tool produced a z-score of -3.08, showing that the data is clustered beyond the 0.01 significance level. this means that there is a less than 1 percent chance that library location would be clustered to this degree by chance.
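for readers without arcgis, the same average nearest neighbor statistic can be approximated directly from projected point coordinates. the short python sketch below is our own illustration (it is not the tool the author used) of the standard clark–evans style calculation the arcgis tool reports: the observed mean nearest-neighbor distance is compared with the distance expected for a random pattern of the same density and standardized to a z-score. the coordinates and study-area size are placeholders, not the st. louis data.

# approximate the average nearest neighbor z-score outside arcgis.
# coordinates (projected, e.g. meters) and study area below are placeholders.
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_z(points, area):
    """points: (n, 2) array of projected x-y coordinates; area: study area size."""
    n = len(points)
    tree = cKDTree(points)
    # k=2 because each point's nearest neighbor at distance 0 is itself
    dists, _ = tree.query(points, k=2)
    d_obs = dists[:, 1].mean()               # observed mean nearest-neighbor distance
    d_exp = 0.5 / np.sqrt(n / area)          # expected under complete spatial randomness
    se = 0.26136 / np.sqrt(n ** 2 / area)    # standard error used by the arcgis tool
    ratio = d_obs / d_exp                    # nearest neighbor index (< 1 means clustered)
    z = (d_obs - d_exp) / se
    return ratio, z

rng = np.random.default_rng(0)
libraries = rng.uniform(0, 10_000, size=(40, 2))   # 40 fake library points in a 10 km square
print(nearest_neighbor_z(libraries, area=10_000 * 10_000))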
knowing, then, that library location is not random, we can now examine socioeconomic patterns of the areas where libraries are located. figure 1 shows library location and population of individuals under the age of 18 at the census tract level for st. louis city and county, using data from the 2010 us census. to clarify, the city and county are divided by the bold black line crossing the middle of the map, the only such boundary in figure 1, where the county is the larger geographic area. library location is important because previous research shows that young people use informal learning environments to access new media technologies,32 and libraries are a key informal learning environment.33 this map demonstrates, however, that libraries are not located in census tracts with the highest populations of individuals under the age of 18 in st. louis city and county. in fact, for all the tracts with the highest number of individuals under the age of 18, there are zero libraries located in these tracts. this is especially concerning given that young people may have less access to transportation, so their access to facilities in neighboring census tracts may be quite limited. figure 1. number of individuals under the age of 18 by census tract and library location in st. louis city and st. louis county. source: 2010 us census. figure 2 includes maps showing library locations in st. louis city and county in terms of poverty and race by census tract level, as well as act score by district, represented by the bold lines, where st. louis city is represented by a single district, the st. louis public school district. median household income is indicated by the gray shading, with white areas not having data available. first, census tracts with low median household income are clustered in the northern part of the city and county. there are four libraries in the northern half of the city, and eleven libraries in the central and southern parts of the city. there are fewer libraries in the census tracts with low median household income. figure 2. median household income, act score, and library location, st. louis city and county. source: 2010 us census and missouri department of elementary and secondary education, 2010, www.modese.gov. while the nearest neighbor analysis has already demonstrated the libraries are significantly clustered, the maps seem to suggest the pattern of that clustering. this is especially concerning given the report by becker that 61 percent of those living below the poverty line use libraries to access the internet.34 first, in terms of median household income, it does appear that many libraries are located in higher income areas of the city and county. while the libraries appear to be clustered centrally, and particularly near major freeways, there appear to be libraries in many of the higher income census tracts. adding to the concern of location is that of access to these library locations. for those living below the poverty line, transportation is often a prohibitive cost, so access from public transportation should also be a major concern for libraries. additionally, in a pattern repeated in figure 4, the location of libraries does not appear to have any effect on act scores, but there are clearly higher act scores in wealthier areas of the city and county.
this is not to say that there is a statistical relationship between act score and library location, but rather to look at the spatial patterns of each in order to note similarities and differences in these patterns. figure 3 shows library location by race, including african american or black and hispanic or latino. first, it is important to note that patterns of race in st. louis have been carefully documented by gordon.35 the st. louis area is clearly a highly segregated region, which makes the social contexts of libraries in the st. louis area even more important. this map demonstrates that while there are many libraries in the northern parts of st. louis city and county, none of these libraries is located in the census tracts with the highest populations of those identifying themselves as african american or black in either the city or county. this raises questions about the inequality of access to the libraries. on the other hand, the densest populations of those identifying themselves as hispanic or latino are in the southern part of the city, but not the county. there is a library located in one of those tracts. it appears the areas with higher concentrations of african americans or blacks have fewer libraries, while areas with the higher concentrations of latinos or hispanics are located in the southern parts of the city that do have libraries. it is important to note, however, that the concentrations of latinos and hispanics is quite low, and those areas are majority white census tracts. as noted above, beyond location, access from public transportation is also an important issue. at the same time, the clustering and patterns shown on these maps raise key issues about access based on income and race. libraries are not located in areas with low median household income or in areas with high concentrations of african americans or blacks. this raises serious questions about why libraries are located where they are, and whether the individuals located in these areas have equal access to library resources, particularly new media technologies. social contexts of new media literacies: mapping libraries| thorne-wallington 61 figure 3. african american or black and hispanic, library location, st. louis city and county. source: 2010 us census. the final map raises a slightly different issue, one of test scores and student achievement. figure 4 shows library location by percent proficient or advanced on the missouri achievement program test by district. beyond the location of the libraries, one factor that stands out is that the areas with the lowest percent proficient or advanced are also the areas with the lowest median household income and the highest percentage of those identifying as african american or black. here an interesting pattern emerges. while there are many libraries in the city and northern part of the county, the percent proficient or advanced on the communication arts portion of exam is quite low (20–30 percent). on the other hand, in the western part of the county, there are few libraries, but the percent proficient or advanced is at its highest level. this suggests that there may not be a strong connection between achievement on the map exam and library location, similar to the lack of relationship seen in between act average score and library location in figure 2. at the same time, there does appear to be a correlation between race, income, and test scores. 
this correlation is noted throughout the literature on student achievement.36 clearly, these maps raise important questions such as how and why libraries are located in a certain area, who uses libraries in a given area, as well as what other informal learning environments and community assets exist in these areas. what is made clear by the maps, though, is that gis can be used as a tool to help understand the context of new media literacy. information technology and libraries | december 2013 62 figure 4. proficient or advanced, communication arts map by district, 2009, and library location. source: missouri department of elementary and secondary education, 2010, www.modese.gov. significance these results demonstrate that gis can be used to illuminate the social, cultural, and economic complexity that surrounds informal learning environments, particularly libraries. this can help demonstrate not only where young people have the opportunity to use new media literacy, but also the complex contextual factors surrounding those opportunities. paired with traditional qualitative and quantitative work, gis can provide an additional lens for understanding new media literacy ecologies, which can help inform dialogue about this topic. for the results of this study, there does appear to be a relationship between library location and race and income. this study illuminates the complex contextual factors affecting libraries. because of the important role that libraries can play in offering young people out of school learning opportunities, particularly in terms of access to new media resources, these contextual factors are important to ensuring equal access and opportunity for all. http://www.modese.gov/ social contexts of new media literacies: mapping libraries| thorne-wallington 63 references 1. ernest morrell, “critical approaches to media in urban english language arts teacher development,” action in teacher education 33, no. 2 (2011): 151–71, doi: 10.1080/01626620.2011.569416. 2. mizuko ito et al., hanging out, messing around, and geeking out: kids living and learning with new media (cambridge: mit press/macarthur foundation, 2010). 3. morrell, “critical approaches to media in urban english language arts teacher development.” 4. lisa tripp, “digital youth, libraries, and new media literacy,” reference librarian 52, no. 4 (2011): 329–41, doi: 10.1080/02763877.2011.584842. 5. gunther kress, literacy in the new media age (london: routledge, 2003). 6. ibid. 7. donna e. alvermann and alison h. heron, “literacy identity work: playing to learn with popular media,” journal of adolescent & adult literacy 45, no. 2 (2001): 118–22. 8. colin lankshear and michele knobel, new literacies: everyday practices and classroom learning (maidenshead: open university press, 2006). 9. john palfrey and urs gasser, born digital: understanding the first generation of digital natives (new york: perseus, 2009). 10. karin m. wiburg, “technology and the new meaning of educational equity,” computers in the schools 20, no. 1–2 (2003): 113–28, doi: 10.1300/j025v20n01_09. 11. rob kling, “learning about information technologies and social change: the contribution of social informatics,” information society 16, no. 3 (2000): 212–24. 12. james r. valadez and richard p. durán, “redefining the digital divide: beyond access to computers and the internet,” high school journal 90, no. 3 (2007): 31–44, http://www.jstor.org/stable/40364198. 13. eszter hargittai, “digital na(t)ives? 
variation in internet skills and uses among members of the 'net generation,'" sociological inquiry 80, no. 1 (2010): 92–113, doi: 10.1111/j.1475-682x.2009.00317.x.
14. glynda hull and katherine schultz, "literacy and learning out of school: a review of theory and research," review of educational research 71, no. 4 (2001): 575–611, http://www.jstor.org/stable/3516099.
15. colin lankshear and michele knobel, new literacies.
16. allan luke, "literacy and the other: a sociological approach to literacy research and policy in multilingual societies," reading research quarterly 38, no. 1 (2003): 132–41, http://www.jstor.org/stable/415697.
17. stephanie collins, "breadth and depth, imports and exports: transactions between the in- and out-of-school literacy practices of an 'at risk' youth," in cultural practices of literacy: case studies of language, literacy, social practice, and power (mahwah, nj: lawrence erlbaum, 2007).
18. allison skerrett and randy bomer, "borderzones in adolescents' literacy practices: connecting out-of-school literacies to the reading curriculum," urban education 46, no. 6 (2011): 1256–79, doi: 10.1177/0042085911398920.
19. samantha becker et al., opportunity for all: how the american public benefits from internet access at u.s. libraries (washington, dc: institute of museum and library services).
20. lisa tripp, "digital youth, libraries, and new media literacy."
21. nora fleming, "museums and libraries awarded $1.2m to build learning labs," education week (blog), december 7, 2012, http://blogs.edweek.org/edweek/beyond_schools/2012/12/museums_and_libraries_awarded_12_million_to_build_learning_labs_for_youth.html.
22. see william f. tate iv and mark hogrebe, "from visuals to vision: using gis to inform civic dialogue about african american males," race ethnicity and education 14, no. 1 (2011): 51–71, doi: 10.1080/13613324.2011.531980; mark c. hogrebe and william f. tate iv, "school composition and context factors that moderate and predict 10th-grade science proficiency," teachers college record 112, no. 4 (2010): 1096–1136; robert j. sampson, great american city: chicago and the enduring neighborhood effect (chicago: university of chicago press, 2012).
23. henri lefebvre, the production of space (oxford: blackwell, 1991).
24. xavier de souza briggs, the geography of opportunity: race and housing choice in metropolitan america (washington, dc: brookings institution press, 2005).
25. paul jargowsky, poverty and place: ghettos, barrios, and the american city (new york: russell sage foundation, 1997).
26. edward w. soja, postmodern geographies: the reassertion of space in critical social theory (new york: verso, 1989).
27. colin gordon, mapping decline: st. louis and the fate of the american city (university of pennsylvania press, 2008).
28. ibid.
29. ernest morrell, "critical approaches to media in urban english language arts teacher development."
30. missouri department of elementary and secondary education, http://dese.mo.gov/dsm/.
31. david allen, gis tutorial ii: spatial analysis workbook (redlands, ca: esri press, 2009).
32. becker et al., opportunity for all.
33. lisa tripp, "digital youth, libraries, and new media literacy."
34. becker et al., opportunity for all.
35. colin gordon, mapping decline: st. louis and the fate of the american city.
36. see mwalimu shujaa, beyond desegregation: the politics of quality in african american schooling (thousand oaks, ca: corwin, 1996); william j. wilson, the truly disadvantaged: the inner city, the underclass, and public policy (chicago: university of chicago press, 1987); gary orfield and mindy l. kornhaber, raising standards or raising barriers: inequality and high-stakes testing in public education (new york: century foundation, 2010).

r. todd vandenbark

tending a wild garden: library web design for persons with disabilities

r. todd vandenbark (todd.vandenbark@utah.edu) is web services librarian, eccles health sciences library, university of utah, salt lake city.

nearly one-fifth of americans have some form of disability, and accessibility guidelines and standards that apply to libraries are complicated, unclear, and difficult to achieve. understanding how persons with disabilities access web-based content is critical to accessible design. recent research supports the use of a database-driven model for library web development. existing technologies offer a variety of tools to meet disabled patrons' needs, and resources exist to assist library professionals in obtaining and evaluating product accessibility information from vendors. librarians in charge of technology can best serve these patrons by proactively updating and adapting services as assistive technologies improve.

in march 2007, eighty-two countries signed the united nations' convention on the rights of persons with disabilities, including canada, the european community, and the united states. the convention's purpose was "to promote, protect and ensure the full and equal enjoyment of all human rights and fundamental freedoms by all persons with disabilities, and to promote respect for their inherent dignity."1 among the many provisions for assuring respect and equal treatment of people with disabilities (pwd) under the law, signatories agreed to take appropriate measures:

(g) to promote access for persons with disabilities to new information and communications technologies and systems, including the internet; and (h) to promote the design, development, production and distribution of accessible information and communications technologies and systems at an early stage, so that these technologies and systems become accessible at minimum cost.

in addition, the convention seeks to guarantee equal access to information by doing the following:

(c) urging private entities that provide services to the general public, including through the internet, to provide information and services in accessible and usable formats for persons with disabilities; and (d) encouraging the mass media, including providers of information through the internet, to make their services accessible to persons with disabilities.2

because the internet and its design standards are evolving at a dizzying rate, it is difficult to create websites that are both cutting-edge and standards-compliant.
this paper evaluates the challenge of web design as it relates to individuals with disabilities, exploring current standards and offering recommendations for accessible development. examining the provision of it for this demographic is vital because, according to the u.s. census bureau, the u.s. public includes about 51.2 million noninstitutionalized people living with disabilities, 32.5 million of whom are severely disabled. this means that nearly one-fifth of the u.s. public faces some physical, mental, sensory, or other functional impairment (18 percent in 2002).3 because a library's mandate is to make its resources accessible to everyone, it is important to attend to the special challenges faced by patrons with disabilities and to offer appropriate services with those special needs in mind.

■ current u.s. regulations, standards, and guidelines

in 1990 congress enacted the americans with disabilities act (ada), the first comprehensive legislation mandating equal treatment under the law for pwd. the ada prohibits discrimination against pwd in employment, public services, public accommodations, and in telecommunications. title ii of the ada mandates that all state governments, local governments, and public agencies provide access for pwd to all of their activities, services, and programs. since school, public, and academic libraries are under the purview of title ii, they must "furnish auxiliary aids and services when necessary to ensure effective communication."4 though predating widespread use of the internet, the law's intent points toward the adoption and adaptation of appropriate technologies to allow persons with a variety of disabilities to access electronic resources in a way that is most effective for them. changes to section 508 of the 1973 rehabilitation act enacted in 1998 and 2000 introduced the first standards for "accessible information technology recognized by the federal government."5 many state and local governments have since passed laws applying the standards of section 508 to government agencies and related services. according to the access board, the independent federal agency charged with assuring compliance with a variety of laws regarding services to pwd, information and communication technology (ict) includes

any equipment or interconnected system or subsystem of equipment, that is used in the creation, conversion, or duplication of data or information. the term electronic and information technology includes, but is not limited to, telecommunications products (such as telephones), information kiosks and transaction machines, world wide web sites, multimedia, and office equipment such as copiers and fax machines.6

the access board further specifies guidelines for "web-based intranet and internet information and applications," which are directly relevant to the provision of such services in libraries.7 what follows is a detailed examination of these standards, with examples to assist in understanding and implementation.

(a) a text equivalent for every non-text element shall be provided. assistive technology cannot yet describe what pictures and other images look like; they require meaningful text-based information associated with each picture. if an image directs the user to do something, the associated text must explain the purpose and meaning of the image.
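the "text equivalent" requirement is also straightforward to audit. the following sketch, written in python purely for illustration (it is not part of the article or of any particular library's toolchain), scans an html page and reports img elements that lack an alt attribute; the file name is hypothetical.

    # minimal sketch: flag <img> tags with no alt attribute (section 508 (a)).
    # "page.html" is a hypothetical input file.
    from html.parser import HTMLParser

    class AltAuditor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.missing = []  # (line, column) of offending <img> tags

        def handle_starttag(self, tag, attrs):
            if tag == "img" and "alt" not in dict(attrs):
                self.missing.append(self.getpos())

    with open("page.html", encoding="utf-8") as fh:
        auditor = AltAuditor()
        auditor.feed(fh.read())

    for line, col in auditor.missing:
        print(f"img without alt text at line {line}, column {col}")

a check like this can only confirm that alternative text is present; whether the text is actually meaningful to a screen-reader user still requires human review.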
with meaningful alternative text in place, someone who cannot see the screen can understand and navigate the page successfully. this is generally accomplished by using the "alt" and "longdesc" attributes for images. however, these aids also can clutter a page when not used properly. the current versions of the most popular screen-reader software do not limit the amount of "alt" text they can read. however, freedom scientific's jaws 6.x divides the "alt" attribute into distinct chunks of 125 characters each (excluding spaces) and reads them separately as if they were separate graphics.8 this can be confusing to the end user. longer content can be put into a separate text file, and the file linked to using the "longdesc" attribute. when a page contains audio or video files, a text alternative needs to be provided. for audio files such as interviews, lectures, and podcasts, a link to a transcript of the audio file must be immediately available. for video clips such as those on youtube, captions must accompany the clip.

(b) equivalent alternatives for any multimedia presentation shall be synchronized with the presentation. this means that captions for video must be real-time and synchronized with the actions in the video, not contained solely in a separate transcript.

(c) web pages shall be designed so that all information conveyed with color is also available without color, for example from context or markup. while color can be used, it cannot be the sole source or indicator of information. imagine an educational website offering a story problem presented in black and green print, where the answer to the problem can be deciphered only from the green letters. this would be inaccessible to students who have certain forms of color-blindness as well as to those who use screen-reader software.

(d) documents shall be organized so they are readable without requiring an associated style sheet. the introduction of cascading style sheets (css) can improve accessibility because they allow the separation of presentation from content. however, not all browsers fully support css, so webpages need to be designed so any browser can read them accurately. the content needs to be organized so that it can be read and understood with css formatting turned off.

(e) redundant text links shall be provided for each active region of a server-side image map, and (f) client-side image maps shall be provided instead of server-side image maps except where the regions cannot be defined with an available geometric shape. an image map can be thought of as a geometrically defined and arranged group of links to other content on a site. a clickable map of the fifty u.s. states is an example of a functioning image map. a server-side image map would appear to a screen reader only as a set of coordinates, whereas client-side maps can include information about where each link leads through "alt" text. the best practice is to use only client-side image maps and make sure the "alt" text is descriptive and meaningful.

(g) row and column headers shall be identified for data tables, and (h) markup shall be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers. correct table coding is critical. each table should use the "table summary" attribute to provide a meaningful description of its content and arrangement.
headers should be coded using the table header ("th") tag, and its "scope" attribute should specify whether the header applies to a row or a column.

        return ($statusline, $toemail, $msg);
    } # end printstatus function

    # checks the status for the given daemon.
    # takes in ip, port to check, daemon name, and protocol (tcp/udp).
    # if given port=0 it checks for a local daemon.
    sub checkdaemon {
        my ($ip, $port, $daemon, $proto) = @_;
        my $dstat = 0;
        if ($proto !~ /local/) {
            # -sU checks for udp ports
            my $com = ($proto =~ /tcp/)
                ? ("nmap -p $port $ip | grep $port")
                : ("nmap -sU -p $port $ip | grep $port");
            open(TMP, "$com|");
            my $comout = <TMP>;
            close(TMP);
            if ($comout =~ /open/) {
                $dstat = 1; # if port is open, status is up
            }
        }
        else {
            $daemon =~ s/ +.*//g;
            # \l lowercases the first letter of $daemon
            my $com = "which \l$daemon";
            open(TMP, "$com|");
            my $comout = <TMP>;
            close(TMP);
            chomp $comout; # strip trailing newline before building the ps pipeline
            $com = "ps aux | awk '{print \$11}' | grep $comout";
            open(TMP, "$com|");
            $comout = <TMP>;
            close(TMP);
            $dstat = 1 if ($comout);
        }
        return $dstat;
    } # end checkdaemon function

    # send the output perl status file to the webserver
    sub scpfile {
        my ($filepath, $webservuname, $webservpass, $webservurl, $webservtarg) = @_;
        my $command = "scp $filepath $webservuname" . "\@$webservurl:$webservtarg";
        my $exp1 = Expect->spawn($command);
        # the first argument "30" may need to be adjusted
        # if your system has very high latency
        my $ret = $exp1->expect(30, "word:");
        print $exp1 "$webservpass\r";
        $ret = $exp1->expect(undef);
        $exp1->close();
    } # end scpfile function

    # send an email to the admin & append error to log file
    sub sendemail {
        my ($errorlist, $weboutputurl, $fromemail, $toaddresses) = @_;
        my $mailer = Mail::Mailer->new("sendmail");
        $mailer->open({ From    => "$fromemail",
                        To      => [$toaddresses],
                        Subject => "wireless problem" });
        $errorlist .= "\n\n$weboutputurl";
        print $mailer $errorlist;
        $mailer->close();
    } # end sendemail function

appendix d. script output page

appendix e. diagram of network

thmanager: an open source tool for creating and visualizing skos

javier lacasta, javier nogueras-iso, francisco javier lópez-pellicer, pedro rafael muro-medrano, and francisco javier zarazaga-soria

javier lacasta (jlacasta@unizar.es) is assistant professor, javier nogueras-iso (jnog@unizar.es) is assistant professor, francisco javier lópez-pellicer (fjlopez@unizar.es) is research fellow, pedro rafael muro-medrano (prmuro@unizar.es) is associate professor, and francisco javier zarazaga-soria (javy@unizar.es) is associate professor in the computer science and systems engineering department, university of zaragoza, spain.

the term knowledge organization systems denotes formally represented knowledge that is used within the context of digital libraries to improve data sharing and information retrieval. to increase their use, and to reuse them when possible, it is vital to manage them adequately and to provide them in a standard interchange format. simple knowledge organization systems (skos) seem to be the most promising representation for the type of knowledge models used in digital libraries, but there is a lack of tools that are able to properly manage it. this work presents a tool that fills this gap, facilitating their use in different environments and using skos as an interchange format.

unlike the largely unstructured information available on the web, information in digital libraries (dls) is explicitly organized, described, and managed. in order to facilitate discovery and access, dl systems summarize the content of their data resources into small descriptions, usually called metadata, which can be either introduced manually or automatically generated (index terms automatically extracted from a collection of documents). most dls use structured metadata in accordance with recognized standards, such as marc21 (u.s.
library of congress 2004) or dublin core (iso 2003). in order to provide accurate metadata without ter­ minological dispersion, metadata creators use different forms of controlled vocabularies to fill the content of typi­ cal keyword sections. this increase of homogeneity in the descriptions is intended to improve the results provided by search systems. to facilitate the retrieval process, the same vocabularies used to create the descriptions are usu­ ally used to simplify the construction of user queries. as there are many different schemas for modeling controlled vocabularies, the term knowledge organization systems (kos) is intended to encompass all types of schemas for organizing information and promoting knowledge management. as hodge (2000) says, “a kos serves as a bridge between the users’ information need and the material in the collection.” some types of kos can be highlighted. examples of simple types are glossaries, which are only a list of terms (usually with definitions), and authority files that control variant ver­ sions of key information (such as geographic or personal names). more complex are subject headings, classifica­ tion schemes, and categorization schemes (also known as taxonomies) that provide a limited hierarchical structure. at a more complex level, kos includes thesauri and less traditional schemes, such as semantic networks and ontologies, that provide richer semantic relations. there is not a single kos on which everyone agrees. as lesk (1997) notes, while a single kos would be advantageous, it is unlikely that such a system will ever be developed. culture constrains the knowledge classifi­ cation scheme because what is meaningful to one area is not necessarily meaningful to another. depending on the situation, the use of one or another kos has its advan­ tages and disadvantages, each one having its place. these schemas, although sharing many characteristics, usually have been treated heterogeneously, leading to a variety of representation formats to store them. thesauri are an example of the format heterogeneity problem. according to iso­2788 (norm for monolingual thesauri) (iso 1986), a thesaurus is a set of terms that describe the vocabulary of a controlled indexing language, formally organized so that the a priori relationships between con­ cepts (for example, synonyms, broader terms, narrower terms, and related terms) are made explicit. this stan­ dard is complemented with iso­5964 (iso 1985), which describes the model for multilingual thesauri, but none of them describe a representation format. the lack of a stan­ dard representation model has caused a proliferation of incompatible formats created by different organizations. so each organization that wants to use several external thesauri has to create specific tools to transform all of them to the same format. in order to eliminate the heterogeneity of represen­ tation formats, the w3c initiative has promoted the development of simple knowledge organization systems (skos) (miles et al. 2005) for its use in the semantic web environment. skos has been created to represent simple kos, such as subject heading lists, taxonomies, classifica­ tion schemes, thesauri, folksonomies, and other types of controlled vocabulary as well as concept schemes embed­ ded in glossaries and terminologies. 
although skos has been proposed only recently, the number and importance of the organizations involved in its creation process (and that publish their kos in this format) indicate that it will probably become a standard for kos representation. skos provides a rich, machine-readable language that is very useful for representing kos, but nobody would expect to have to create it manually or by just using a general-purpose resource description framework (rdf) editor (skos is rdf-based). however, in the digital library area, there are no specialized tools that are able to manage it adequately. therefore, this work tries to fill this gap, describing an open source tool, thmanager, that facilitates the construction of skos-based kos. although thmanager has been created to manage thesauri, it also is appropriate for creating and managing any other models that can be represented using the skos format. this article describes the thmanager tool, highlighting its characteristics. thmanager's layer-based architecture permits the reuse of the components created for the management of thesauri in other applications where they are also needed. for example, it facilitates the selection of values from a controlled vocabulary in a metadata creation tool, or the construction of user queries in a search client. the tool is distributed as open source software accessible through the sourceforge platform (http://thmanager.sourceforge.net/).

■ state of the art in thesaurus tools and representation models

the problem of creating appropriate content for thesauri is of interest in the dl field and other related disciplines, and an increasing number of software packages have appeared in recent years for constructing thesauri. for instance, the website of willpower information (http://www.willpower.demon.co.uk/thessoft.htm) offers a detailed review of more than forty tools. some are only available as a module of a complete information storage and retrieval system, but others also allow the possibility of working independently of any other software. among these thesaurus creation tools, one may note the following products:

■ bibliotech (http://www.inmagic.com/). this is a multiplatform tool that forms part of the bibliotech pro integrated library system and can be used to build an ansi/niso standard thesaurus (standard z39.19 [ansi 1993]).

■ lexico (http://www.pmei.com/lexico.html). this is a java-based tool that can be accessed and/or manipulated over the internet. thesauri are saved in a text-based format. it has been used by the u.s. library of congress to manage such vocabularies and thesauri as the thesaurus for graphic materials, the global legal information network thesaurus, the legislative indexing vocabulary, and the symbols of american libraries listing.
■ multites (http://www.multites.com/). this is a windows-based tool that provides support for ansi/niso relationships plus user-defined relationships and comment fields for an unlimited number of thesauri (both monolingual and multilingual).

■ termtree 2000 (http://www.termtree.com.au/). this is a windows-based tool that uses access, sql server, or oracle for data storage. it can import and export trim thesauri (a format used by the towers records information management system [http://www.towersoft.com/]), as well as a defined termtree 2000 tag format.

■ webchoir (http://www.webchoir.com/). this is a family of client-server web applications that provides different utilities for thesaurus management on multiple dbms platforms. termchoir is a hierarchical information organizing and searching tool that enables one to create and search varieties of hierarchical subject categories, controlled vocabularies, and taxonomies based on either predefined standards or a user-defined structure, which can then be exported to an xml-based format. linkchoir is another tool that allows indexers to describe information sources using terminology organized in termchoir. and seekchoir is a retrieval system that enables users to browse thesaurus descriptors and their references (broader terms, related terms, synonyms, and so on).

■ synaptica (http://www.synaptica.com/). this is a client-server web application that can be installed locally on a client's intranet or extranet server. thesaurus data is stored in a sql server or oracle database. the application supports the creation of electronic thesauri in compliance with the ansi/niso standard. the application allows the exchange of thesauri in csv (comma-separated values) text format.

■ superthes (batschi et al. 2002). this is a windows-based tool that allows the creation of thesauri. it extends the ansi/niso relationships, allowing many possible data types to enrich the properties of a concept. it can import and export thesauri in xml and tabular format.

■ tematres (http://r020.com.ar/tematres/). this is a web application specially oriented to the creation of thesauri, but it also can be used to develop web navigation structures or to manage the documentary languages in use. the thesauri are stored in a mysql database. it provides the created thesauri in zthes (taylor 2004) or in skos format.

finally, it must be mentioned that, given that thesauri can be considered as ontologies specialized in organizing terminology (gonzalo et al. 1998), ontology editors have sometimes been used for thesaurus construction. a detailed survey of ontology editors can be found in the denny study (2002). all of these tools (desktop or web-based) present some problems when used as general thesaurus editors. the main one is the incompatibility of the interchange formats that they support. these tools also present integration problems. some are deeply integrated in bigger systems and cannot easily be reused in other environments because they need specific software components to work (such as a dbms to store thesauri). others are independent tools (they can be considered general-purpose thesaurus editors), but their architecture does not facilitate their integration within other information management tools. and most of them are not open source tools, so there is no possibility of modifying them to improve their functionality.
focusing on the interchange format problem, the iso­5964 standard (norm for multilingual thesauri) is currently undergoing review by iso tc46/sc 9, and it is expected that the new modifications will include a stan­ dard exchange format for thesauri. it is believed that this format will be based on technologies such as rdf/xml. in fact, some initiatives in this direction have already arisen: ■ the adl thesaurus protocol (janée et al. 2003) defines an xml­ and http­based protocol for access­ ing thesauri. as a result of query operations, portions of the thesaurus encoded in xml are returned. ■ the language independent metadata browsing of european resources (limber) project has published a thesaurus interchange format in rdf (matthews et al. 2001). this work introduces an rdf representa­ tion of thesauri, which is proposed as a candidate thesaurus interchange format. ■ the california environmental resources evaluation system (ceres) and the nbii biological resources division are collaborating in a thesaurus partnership project (ceres/nbii 2003) for the development of an integrated environmental thesaurus and a thesau­ rus networking toolset for metadata development and keyword searching. one of the deliverables of this project is an rdf format to represent thesauri. ■ the semantic web advanced development for europe (swad­europe 2001) project includes the swad­europe thesaurus activity, which has defined the skos, a set of specifications to represent the knowledge organization systems (kos) on the semantic web (thesauri between them). the british standards bs­5723 (bsi 1987) and bs­6723 (bsi 1985) (equivalent to the international iso­2788 and iso­5964) also lack a representation format. the british standards institute idt/2/2 working group is now developing the bs­8723 standard that will replace them and whose fifth part will describe the exchange formats and protocols for interoperability of thesauri. the objec­ tive of this working group is to promote the standard to iso, to replace the iso­2788 and iso­5964. here, it is important to remark that given the direct involvement of the idt/2/2 working group with skos development; probably the two initiatives will not diverge. the new representation format will be, if not exactly skos, at least skos­based. taking into account all these circumstances, skos seems to be the most adequate representation model to store thesauri. given that skos is rdf­based, it can be created using any tool that is able to manage rdf (usually used to edit ontologies); for example, swoop (mindswap group 2006), protégé (noy et al. 2000), or triple20 (wielemaker et al. 2005). the problem with these tools is that they are too complex for editing and visualizing such a simple model as skos. they are thought to create complex ontologies, so they provide too many options not spe­ cifically adapted to the type of relations in skos. in addition, they do not allow an integrated management of collection of thesauri and other types of controlled vocabularies as needed in dl processes (for example, the creation of metadata of resources, or the construction of queries in a search system). ■ skos model skos is a representation model for simple knowledge organization systems, such as subject heading lists, tax­ onomies, classification schemes, thesauri, folksonomies, other types of controlled vocabulary, and also concept schemes embedded in glossaries and terminologies. 
this section describes the model, providing characteristics, showing the state of development, and indicating the problems found to represent some types of kos. skos was initially developed within the scope of the semantic web advanced development for europe (swad-europe 2001). swad-e was created to support w3c's semantic web initiative in europe (part of the ist-7 programme). skos is based on a generic rdf schema for thesauri that was initially produced by the desire project (cross et al. 2001), and further developed in the limber project (matthews et al. 2001). it has been developed as a draft of an rdf schema for thesauri compatible with relevant iso standards, and later adapted to support other types of kos. among the kos already published using this new format are gemet (eea 2001), agrovoc (fao 2006), adl feature types (hill and zheng 1999), and some parts of the wordnet lexical database (miller 1990), all of them available on the skos project web page. skos is a collection of three different rdf schema application profiles: skos-core, to store common properties and relations; skos-mapping, whose purpose is to describe relations between different kos; and skos-extension, to indicate specific relations and properties only contained in some types of kos. for the first step of the development of the thmanager tool, only the most stable part of skos has been considered. figure 1 shows the part of skos-core used. the rest of skos-core is still unstable, so its support has been delayed until it is approved. skos-mapping and skos-extension are still in their first steps of development and are very unstable, so their management in thmanager also has been delayed until the creation of stable versions. in skos-core, a kos (in our case, usually a thesaurus) consists of a set of concepts (labelled as skos:concept) that are grouped by a concept scheme (skos:conceptscheme). to distinguish between the different models provided, the skos:conceptscheme contains a uri that identifies it, but to describe the model content to humans, metadata following the dublin core standard also can be added. the relation of the concept scheme with the concepts of the kos is done through the skos:hastopconcept relation. this relation points at the most general concepts of the kos (top concepts), which are used as entry points to the kos structure. in skos, each concept consists of a uri and a set of properties and relations to other concepts. among the properties, skos.preflabel and skos.altlabel provide labels for a concept in different languages. the first one is used to show the label that better identifies a concept (for thesauri it must be unique). the second one is an alternative label that contains synonyms or spelling variations of the preferred label (it is used to redirect to the preferred label of the concept). the skos concepts also can contain three other properties called skos.scopenote, skos.definition, and skos.example. they contain annotations about the ways to use a concept, a definition, or examples of use in different languages. last, the skos.prefsymbol and skos.altsymbol properties are used to provide a preferred or some alternative symbols that graphically represent the concept. for example, a graphical representation is very useful to identify the meaning of a mathematical formula. another example is a chemical formula, where a graphical representation of the structure of the substance also provides valuable information to the user.
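as an illustration of the structures just described, the following sketch builds a one-concept scheme with rdflib's skos vocabulary and serializes it. it is written in python for brevity and is not part of thmanager (which is a java application); the example.org uris and labels are invented.

    # minimal sketch of the skos-core structures described above;
    # the uris and labels are hypothetical.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    EX = Namespace("http://example.org/thesaurus/")
    g = Graph()

    # a concept scheme groups the concepts and names the entry points
    g.add((EX.scheme, RDF.type, SKOS.ConceptScheme))

    # a top concept with preferred and alternative labels in two languages
    g.add((EX.water, RDF.type, SKOS.Concept))
    g.add((EX.water, SKOS.prefLabel, Literal("water", lang="en")))
    g.add((EX.water, SKOS.prefLabel, Literal("agua", lang="es")))
    g.add((EX.water, SKOS.altLabel, Literal("h2o", lang="en")))
    g.add((EX.water, SKOS.definition,
           Literal("the liquid that falls from clouds as rain", lang="en")))
    g.add((EX.scheme, SKOS.hasTopConcept, EX.water))

    print(g.serialize(format="turtle"))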
with respect to the relations, each concept indicates, by means of the skos:inscheme relation, in which concept scheme it is contained. the skos.broader and the skos.narrower relations are inverse relations used to model the generalization and specialization characteristics present in many kos (including thesauri). skos.broader relates a concept to more general concepts, and skos.narrower to more specific ones. the skos.related relation describes associative relationships between concepts (also present in many thesauri), indicating that two concepts are related in some way. with these properties and relations, it is perfectly possible to represent thesauri, taxonomies, and other types of controlled vocabularies. however, there is a problem for the representation of classification schemes that provide multiple codes for terms, as there is no place to store this information. under this category, one may find classification schemes such as iso-639 (iso 2002) (the iso standard for coding of languages), which proposes different types of alphanumeric codes (for example, two letters and three letters). for this special case, the skos working group proposes the use of the property skos.notation. although this property is not in the skos vocabulary yet, it is expected to be added in future versions. given the need to work with these types of schemes, this property has been included in the thmanager tool.

■ thmanager architecture

this section presents the architecture of the thmanager tool. this tool has been created to manage thesauri in skos, but it also is a base infrastructure that facilitates the management of thesauri in dls, simplifying their integration in tools that need to use thesauri or other types of controlled vocabularies. in addition, to facilitate its use on different computer platforms, thmanager has been developed using the java object-oriented language. the architecture of the thmanager tool is shown in figure 2. the system consists of three layers: first, a repository layer where thesauri are stored and identified by means of associated metadata describing them; second, a persistence layer that provides an api for access to thesauri stored in the repository; and third, a gui layer that offers different graphical components to visualize thesauri, to search by their properties, and to edit them in different ways. the thmanager tool is an application that uses the different components provided by the gui layer to allow the user to manage the thesauri. in addition, the layered architecture allows other applications to use some of the visualization components or the methods provided by the persistence layer to access thesauri. the main features that have guided the design of these layers have been the following: a metadata-driven design, efficient management of thesauri, the possibility of interrelating thesauri, and the reusability of thmanager components.

figure 1. skos model.

the following subsections describe these characteristics in detail.
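to make the division of responsibilities concrete, here is a sketch of what a persistence-layer entry point of this kind might look like. it is a hypothetical python rendering for illustration only; thmanager itself is written in java, and its actual api is not reproduced here.

    # hypothetical sketch of the layered design; names are invented and do not
    # reproduce thmanager's real java api.
    class ThesaurusRepository:
        """repository layer: imported thesauri and their descriptive metadata."""
        def __init__(self):
            self._items = {}  # thesaurus uri -> (metadata dict, concepts dict)

        def save(self, uri, metadata, concepts):
            self._items[uri] = (metadata, concepts)

        def metadata_records(self):
            return [meta for meta, _ in self._items.values()]

        def concepts(self, uri):
            return self._items[uri][1]

    class ThesaurusPersistence:
        """persistence layer: the single access point used by gui components
        and by any other tool that needs controlled vocabularies."""
        def __init__(self, repository):
            self._repo = repository

        def list_thesauri(self):
            return self._repo.metadata_records()  # metadata drives selection

        def get_concepts(self, uri):
            return self._repo.concepts(uri)

a gui layer, a metadata-creation tool, or a web search client would all talk to the persistence object rather than to the repository directly, which is what makes the components reusable outside the desktop application.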
metadata-driven design

a fundamental aspect of the repository layer is the use of metadata to describe thesauri. thmanager considers metadata about thesauri as basic information in the thesaurus management process, stored in the metadata repository and managed by the metadata manager. the reason for this metadata-driven design is that thesauri must be described and classified to facilitate the selection of the one that best fits the user's needs, allowing the user to search for them not only by name but also, among other things, by application domain or associated geographical area. the lack of metadata makes the identification of useful thesauri (provided by other organizations) difficult, producing a low reuse of them in other contexts. to describe thesauri in our service, a metadata profile based on dublin core has been created. the reason to use dublin core as the basis of this profile has been its extensive use in the metadata community. it provides a simple way to describe a resource using very general metadata elements, which can be easily matched with complex domain-specific metadata standards. additionally, dublin core also can be extended to define application profiles for specific types of resources. following the metadata profile hierarchy described in tolosana-calasanz et al. (2006), the thesaurus metadata profile refines the definition and domain of dublin core elements and includes two new elements (metadata language and metadata identifier) to appropriately identify the metadata records describing a thesaurus. the profile for thesauri has been described using the iemsr format (heery et al. 2005) and is distributed with the tool. iemsr is an rdf-based format created by the jisc ie metadata schema registry project to describe metadata application profiles. figure 3 shows the metadata created for the gemet thesaurus (the resource), expressed as a hedgehog graph (a reinterpretation of rdf triplets: resources, named properties, and values). the purpose of these metadata is not only to simplify thesaurus location for a user, but also to facilitate the identification of thesauri useful for a specific task in machine-to-machine communication. for instance, one may be interested only in thesauri that cover a restricted geographical area or have a specific thematic coverage.

efficient thesauri storage

thesauri vary enormously in size, ranging from hundreds of concepts and properties to millions. so the time spent on load, navigation, and search processes is a functional restriction for a tool that has to manage them. skos is rdf-based, and because reading rdf to extract the content is a slow process, the format is not appropriate for inner storage. to provide better access times, thmanager transforms skos into a binary format when a new skos file is imported. the persistence layer provides unified access to the thesaurus repository.

figure 2. kos manager architecture (repository, persistence, and gui layers; skos core and skos mapping are handled through the jena api, and the components are reused by thmanager and by other desktop tools and web services that use thesauri).

figure 3. metadata of the gemet thesaurus (among other elements: title "general multilingual environmental thesaurus," alternative title "gemet," publisher european environment agency (eea), date 2005-03-07, identifier http://www.eionet.eu.int/gemet).

this layer is used by the gui layer to access the thesauri, but it also can be employed by other tools that need to use thesauri outside a desktop environment (for example, a thematic search system accessible through the web that requires browsing a thesaurus to facilitate the construction of user queries). this layer performs the transformation of skos to the binary format when a thesaurus is imported. the transformation is provided using the jena library, a popular library for manipulating rdf documents that allows storing them in different kinds of repositories (http://jena.sourceforge.net/). jena provides an open model that can be extended with specialized modules to use other ways of storage, making it possible to easily change the storage format for another that is more efficient if needed. the data structure used is shown in figure 4. the model is an optimized representation of the information given by the rdf triplets.
the concepts map contains the concepts and their associated relations in the form of key-value pairs: the key is a uri identifying a concept, and the value is a relations object containing the properties of the concept. a relations object is a map that stores the properties of one concept in the form of (property type, value) pairs. the keys used for this map are the names of the typical property types in the skos model (for example, narrower or broader). the only special cases for encoding these property types in the proposed data structure occur when they have a language attribute (for example, preflabel, definition, or scopenote). in those cases, we propose the use of a [lang] suffix to distinguish the property type for a particular language. for instance, preflabel_en indicates a preflabel property type in english. additionally, it must be noted that the data type of the property values assigned to each key in the relations map varies with the semantics given to each property type. the data types fall into the following categories: a string for a preflabel property type; a list of strings for altlabel, definition, scope note, and example property types; a uri for a prefsymbol property type; a list of uris for narrower, broader, related, and altsymbol property types; and a list of notation objects for a notation property type. the data type used for notation values is a complex object because there may be different notation types. a notation object consists of type and value attributes. the type attribute is a uri that identifies a particular notation type and qualifies the associated notation value. additionally, and with the objective of increasing the speed of some operations (for example, navigation or search), some optimizations have been added. first, the uris of the top concepts are stored in the topconcepts list. this list contains redundant information, given that those concepts also are stored in the concepts map, but it makes their location immediate. second, to speed up the search of concepts and the drawing of the alphabetic viewer, the translations map has been added. for each language supported by the thesaurus, this map contains a translationterm object, a list of (label, concept uri) pairs ordered by preflabel. it also contains redundant information that allows the immediate creation of the alphabetic viewer for a language, simplifying the search process; as can be seen later, this does not add a big overhead in load time. in addition, if no alphabetic viewer and search are needed, this structure can be removed without affecting the hierarchical viewer. this solution has proven to be useful for managing the kind of thesauri we use (they do not surpass 50,000 concepts and about 330,000 properties), loading them into memory on an average computer in a reasonable time, and allowing immediate navigation and search (see section 6).

figure 4. persistence model (concepts map of uri-to-relations entries, topconcepts list, and per-language translations map).
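a rough python rendering of this in-memory layout is sketched below as a reading aid; it is illustrative only (thmanager implements the structure in java), and the sample uris and labels are invented.

    # illustrative sketch of the persistence model described above;
    # the uris and values are hypothetical.
    topconcepts = ["http://example.org/concept/1"]

    concepts = {
        "http://example.org/concept/1": {
            "prefLabel_en": "water",                       # one string per language
            "altLabel_en": ["h2o"],                        # list of strings
            "narrower": ["http://example.org/concept/2"],  # list of concept uris
            "notation": [("http://example.org/notation/iso", "wat")],
        },
        "http://example.org/concept/2": {
            "prefLabel_en": "groundwater",
            "broader": ["http://example.org/concept/1"],
        },
    }

    # per-language index backing the alphabetic viewer and the search box:
    # (label, concept uri) pairs kept sorted by preferred label
    translations = {
        "en": sorted(
            (props["prefLabel_en"], uri)
            for uri, props in concepts.items()
            if "prefLabel_en" in props
        )
    }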
interrelation of thesauri

the vast choice of thesauri available nowadays implies an undesired effect of content heterogeneity. although a thesaurus is usually created for a specific application domain, some of the concepts defined in thesauri from different application domains may be equivalent. in order to facilitate cross-domain classification of resources, users would benefit from the possibility of knowing the connections of a thesaurus in their application domain to thesauri used in other domains. however, it is difficult to manually detect the implicit links between those different thesauri. therefore, in order to automatically facilitate these interthesaurus connections, the persistence layer of the thmanager tool provides an interrelation function that relates a thesaurus to an upper-level lexical database (the concept core displayed in figure 2). the interrelation mechanism is based on the method presented in nogueras-iso, zarazaga-soria, and muro-medrano (2005). it is an unsupervised disambiguation method that uses the relations between concepts as disambiguation context. it applies a heuristic voting algorithm to select the most adequate sense of the concept core for each thesaurus concept. at the moment, the concept core is the wordnet lexical database. wordnet is a large english lexical database that groups nouns, verbs, adjectives, and adverbs into sets of cognitive synonyms (synsets), each expressing a distinct concept. those synsets are interlinked by means of conceptual-semantic and lexical relations. the interrelation component has been conceived as an independent module that receives a thesaurus as input in skos and returns the relation with respect to the concept core using an extended version of the skos mapping model (miles and brickley 2004). this model, as commented before, is a part of skos that allows describing exact, major, and minor mappings between concepts of two different kos (in this case, between a thesaurus and the common core). skos mapping is still in an early stage of development and has been extended in order to provide the needed functionality. the base skos mapping provides the map:exactmatch, map:majormatch, and map:minormatch relations to indicate the degree of relation between two concepts. given that the interrelation algorithm cannot ensure that a mapping is 100 percent exact, only the major and minor match properties are used. the algorithm returns a list of possible mappings with the lexical database for each concept: the one with the highest probability is assigned as a major match, and the rest are assigned as minor matches. to store the interrelation probability, skos mapping has been extended by adding a blank node with the reliability of the mapping.
also, to be able to know which concepts of which thesauri are equivalent to one of the common core, the inverse relations of map:majormatch and map:minormatch have been created. an example of skos mapping can be seen in figure 5. there, concept 340 of the gemet thesaurus (alloy) is correctly mapped to the wordnet concept number 13751474 (alloy, metal) with a probability of 91.007 percent; an unrelated minor mapping also is found, but it is given a low probability (8.992 percent).

reusability of thmanager components

on top of the api layer, the gui layer has been constructed. this layer contains several graphical interfaces to provide different types of viewers, searchers, and editors for thesauri. this layer is used as the base for the construction of the thmanager tool. the tool groups a subset of the provided components, relating them to obtain a final user application that allows the management of the stored thesauri, their visualization (navigation by the concept relations), their edition, and their importation and exportation using the skos format. the thmanager tool not only has been created as an independent tool to facilitate thesauri management, but also to allow easy integration in tools that need to use thesauri. this has been done by combining the information management with specific graphical interfaces in different black-box components. among the provided components there is a hierarchical viewer, an alphabetic viewer, a list viewer, a searcher, and an editor, but more components can be constructed if needed. the use of the gui layer as a library of reusable graphical components makes it possible to create different tools that are able to manage thesauri with different user requirements with minimum effort, also allowing the integration of this technology in other applications that need controlled vocabularies to improve their functionality. for example, in a metadata creation tool, it can be used to provide the graphical component to select controlled values from thesauri and automatically insert them in the metadata. it also can be used to provide the list of possible values to use in a web search system, or to provide a thesaurus-based navigation of a collection of resources in an exploratory search system. figure 6 shows the integration process of a thesaurus visualization component in an external tool. the provided thesaurus components have been constructed following the java beans philosophy (reusable software components that can be manipulated visually in a builder tool), where a component is a black box with methods to read and change its state that can be reused when needed. here, each thesaurus component is a thesaurusbean that can be directly inserted in a graphical application to use its functionality (visualize or edit thesauri) in a very simple way. the thesaurusbeans are provided by the thesaurusbeanmanager that, given the parameters of the thesaurus to visualize and the type of visualization, returns the most adequate component to use.

■ description of thmanager functionality

the thmanager tool is a desktop application that is able to manage thesauri stored in skos. as regards the installation requirements, the application requires 100 mbs of free space on the hard disk. with respect to ram and cpu requirements, they depend greatly on the size and the number of thesauri loaded in the tool.
considering the number and size of thesauri used as testbed in section 6, ram consumption ranges from 256 to 512 mbs, and with a 3ghz cpu (for example, pentium iv), the load times for the bigger thesauri are acceptable. however, if the size of thesauri is smaller, ram and cpu requirements decrease, being able to operate on a computer with just a 1 ghz cpu (for example, pentium iii) and 128 mbs of ram. given that the management of thmanager is meta­ data oriented, the first window in the application shows a table including the metadata records describing all the thesauri stored in the system (figure 7). the selection of a record in this table indicates to the rest of the compo­ nents the selected thesaurus. the creation or deletion of thesauri also is provided here. the only operation that can be performed when no record is selected is to import a new thesaurus stored in skos. to import it, the name of the skos file must be provided. the import tool also contains the option to interrelate the imported thesaurus to the concept core. the metadata of the thesaurus are extracted from inside of the skos if they are available, or they can be provided in an associated xml metadata file. if no metadata record is provided, the application generates a new one with minimum information, using as base the name of the skos file. once the user has selected a thesaurus, it can visualize and modify its metadata or content, export it to skos, or, as commented before, delete it. with respect to the metadata describing a thesaurus, a metadata viewer visualizes the metadata in html and a metadata editor allows the editing of metadata following the thesaurus metadata profile described in the metadata­driven design section (figure 8 shows a screenshot of the metadata edi­ tor). different html views can be provided by adding more css files to the application. the metadata editor is customiz­ able. to add or delete metadata elements to the metadata edi­ tor window, it is only neces­ sary to modify the description of the iemsr profile for thesauri included in the application. the main functionality of the tool is to visualize the thesaurus structure, showing all proper­ ties of concepts and allowing the navigation by relations (see figure 9). here, different read­only viewers are provided. there is an alphabetic viewer that shows all the concepts ordered by the preferred label in one language. a hierar­ chical viewer provides navigation by broader and nar­ rower relations. additionally, a hypertext viewer shows all properties of a concept and provides navigation by all its relations (broader, narrower, and related) via hyper­ links. finally, there also is a search system that allows the typical searches needed for thesauri (equals, starts with, contains). currently, search is limited to preferred labels in the selected language, but it could be extended to allow searches by other properties, such as synonyms, defini­ tions, or scope notes. figure 5. skos mapping extension alloy ... 
figure 6. gui component integration (an external desktop tool requests a thesaurusbean from the thesaurusbeanmanager, e.g., type: tree, thesaurus: gemet)

all of these viewers are synchronized, so the selection of a concept in one of them produces the selection of the same concept in the others. the layered architecture described previously allows these viewers to be reused in many situations, including other parts of the thmanager tool. for example, in the thesaurus metadata editor described before, the thesaurus viewer is used to facilitate the selection of values for the subject section of the metadata. also, in the thesaurus editor shown later, the thesaurus viewer simplifies the selection of a concept related (by some kind of relation) to the selected one, and it provides a preview of the hierarchical viewer to help detect wrong relations. the third available operation is to edit the thesaurus structure. here, to create a thesaurus following the skos model, an editing component is provided (see figure 10). the graphical interface shows a list with all the concepts created in the selected thesaurus, allowing the creation of new ones (providing their uris) or the deletion of selected ones. once a concept has been selected, its properties and relations to other concepts are shown, allowing the creation of new ones and the deletion of others. to facilitate the creation of relations between concepts, a selector of concepts (based on the thesaurus viewer) is provided, allowing the user to add related concepts without manually typing the uri of the associated concept. also, to check whether the created thesaurus is correct, a preview of the hierarchical viewer can be shown, allowing the user to easily detect problems in the broader and narrower relations. with respect to the interrelation functionality, at the moment the mapping obtained is shown in the thesaurus viewers, but the navigation between equivalent concepts of two thesauri must be done manually by the user. however, a navigation component still under development will allow the user to jump from a concept in a thesaurus to concepts in others that are mapped to the same concept in the common core. as mentioned before, for efficiency, the format used to store the thesauri in the repository is binary, but the interchange format used is skos. so a module for thesauri importation and exportation is provided. this module is able to import from and export to skos.
in addition, if the thesaurus has been interrelated with respect to the concept core, it is able to export its mapping to the concept core using the extended version of skos mapping described above.

■ results of the work

this section shows some experiments performed with the thmanager tool for the storage and management of a selected set of thesauri. in particular, this set of thesauri is relevant in the context of the geographic information community. the increasing relevance of geographic information for decision-making and resource management in different areas of government has promoted the creation of geolibraries and spatial data infrastructures to facilitate distribution of and access to geographic information (nogueras-iso, zarazaga-soria, and muro-medrano 2005). in this context, complex metadata schemes, such as iso 19115, have been proposed for a full-detail description of resources. many of the metadata elements in these schemes are either constrained to a selected vocabulary (iso 639 for language encoding, iso 3166 for country codes, and so on), or the user is told to pick a term from the most suitable thesaurus. the problems with this second case are that typically the choice of thesauri is quite open, the thesauri are frequently large, and the exchange format of available thesauri is quite heterogeneous. in such a context, the thmanager tool has proven to be very useful for simplifying the management of the thesauri used. at the moment, eighty kos, counting thesauri and other types of controlled vocabularies, have been created or transformed to skos and managed through this tool. table 1 shows some of them, indicating their names (name column), the number of concepts (nc column), their total number of properties and relations (np and nr columns), and the number of languages in which concept properties are provided (nl column). to give an idea of the cost of loading these structures, the sizes of the skos and binary files (ss and sb columns) are provided in kilobytes (kb).

figure 7. thesaurus selector
figure 8. thesaurus metadata editor

additionally, table 1 compares the performance of thmanager with respect to other tools that load the thesauri directly from an rdf file using the jena library (time performance has been obtained using a 3 ghz pentium iv processor). for this purpose, three different load times (in seconds) have been computed. the bt column contains the load time of binary files without the cost of creating the gui for the thesauri viewers. the lt column contains the total load time of binary files (including the time of gui creation and drawing). the jt column contains the time spent by a hypothetical rdf-based editor tool to invoke jena and load the rdf skos files containing the thesauri into its memory model (it does not include gui creation). the difference between the bt and lt columns shows the time used to draw the gui once the thesauri have been loaded in memory. the difference between the bt and jt columns shows the gain in time of using binary storage instead of an rdf-based one (an illustrative sketch of how such timings might be taken appears below).
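the following is a minimal sketch, not the authors' benchmark code, of how the jena-based load time (the jt column) could be measured. it assumes a current apache jena distribution on the classpath and a local skos rdf/xml file whose name is passed on the command line; the file name shown is hypothetical.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class SkosLoadTimeSketch {
    public static void main(String[] args) {
        // hypothetical file; pass the real skos rdf/xml file as the first argument
        String skosFile = args.length > 0 ? args[0] : "file:gemet.rdf";

        // roughly what the jt column measures: building an in-memory jena model
        // from the rdf skos file, with no gui work included
        long start = System.nanoTime();
        Model model = ModelFactory.createDefaultModel();
        model.read(skosFile); // parses rdf/xml by default
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("statements loaded: " + model.size());
        System.out.println("jena load time (ms): " + elapsedMs);
        // the bt column would instead time deserialization of thmanager's binary
        // representation, and lt would add the time to build and draw the viewers
    }
}
```

timing the binary path the same way (start a clock, deserialize, stop the clock) would reproduce the bt figure, and the difference between the two runs is the gain the text describes.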
the thesauri shown in the table are the adl feature types thesaurus (adl ftt), the isoc thesaurus of geography (isoc-g), the iso 639 code list, the unesco thesaurus (unesco 1995), the ogp surveying and positioning committee code lists (epsg) (ogp 2006), the multilingual agricultural thesaurus (agrovoc), the european vocabulary thesaurus (eurovoc) (eupo 2005), the european territorial units (spain and france) (etu), and the general multilingual environmental thesaurus (gemet). they have been selected because they have different sizes and can be used to show how the load time evolves with the thesaurus size. among them, gemet and agrovoc can be highlighted: although they are provided as skos, they include nonstandard extensions that we have transformed to standard skos relations and properties. eurovoc and unesco are examples of thesauri provided in formats other than skos that we have completely transformed into skos; the former was in an xml-based format, and the latter used a plain-text format. another thesaurus transformed to skos is the european territorial units, which contains the administrative political units in spain and france. here, the original source was a collection of heterogeneous documents that contained parts of the needed information and that have been processed to generate a skos file. some classification schemes also have been transformed to skos, such as iso 639 and the different epsg codes for coordinate reference systems (including datums, ellipsoids, and projections). with respect to controlled vocabularies created in skos (by the authors) using the thmanager tool, there is an extended version of the adl feature types that includes a more detailed classification of feature types, as well as different glossaries used for resource classification.

figure 11. thesaurus load times (load time in seconds against skos file size in kb, comparing rdf loading with jena and binary loading with thmanager)

figure 11 depicts the comparison of the different load times shown in table 1 with respect to the size of the rdf skos files. the order of the thesauri in the figure is the same as in table 1. it can be seen that the time to construct the model from the binary format is almost half the time spent to create the model from an rdf file. in addition, once the binary model is loaded, the time to generate the gui is not very dependent on thesaurus size. this is possible thanks to redundant information added to facilitate access to top concepts and to speed up loading of the alphabetic viewer. this redundant information produces an overhead in the load of the model, but without it the drawing time would be much worse, as the information would have to be generated on the fly. however, in spite of the improvements, for the larger thesauri considered the load time starts to be long, given that it includes loading the whole structure of the thesaurus into memory and creating the objects used to manage it quickly once loaded. but, once a thesaurus is loaded, future accesses are immediate (quicker than 0.5 seconds). these accesses include opening it again, navigating by thesaurus relations, changing the visualization language, and searching concepts by their preferred labels. to minimize the load time, thesauri can be loaded in the background when the application is launched, reducing in that way the user's perception of the load time.

figure 9. thesaurus concept selector
figure 10. thesaurus concept editor

another interesting aspect of figure 11 is the peak at the third element.
it corresponds to the iso 639 classification scheme, which has the special characteristic of having no hierarchy and many notations. these two characteristics produce a small increase in the model load time, given that the top-concepts list contains all the concepts and the notations are more complex than other relations, but most of the time is used to generate the gui of the tree viewer. the tree viewer gets all the concepts that are top terms, and for each one it asks for the preferred label in the selected language and sorts them alphabetically to show the first level of the tree. this is fast for a few hundred concepts, but not for the 7,599 in iso 639. however, this problem could be easily solved if the metadata contained a description of the type of kos to visualize: if the tool knew that the kos does not have broader and narrower relations, it could use the structures used to visualize the alphabetic list, which are optimized to show all of the kos concepts rapidly, instead of trying to load it as a tree. the persistence approach used has the advantage of not requiring external persistence systems, such as a dbms, and of providing rapid access after loading, but it has the drawback of loading entire thesauri into memory (costly in both time and space). so, for much bigger thesauri, the use of some kind of dbms would be necessary. if this change were needed, minimal modifications would be required (a single class). however, if not all the concepts are loaded, the alphabetic viewer (which shows all the concepts) would have to be updated (for example, showing the concepts by pages), or it would become too slow to work with.

■ conclusions

this article has presented a tool for managing the thesauri needed in a digital library for creating metadata and for running search processes, using skos as the interchange format. this work reviews the tools that are available to edit thesauri, highlighting the lack of a formalized way to exchange thesauri and the difficulty of integrating those tools in other environments. it selects skos from the available interchange formats for thesauri as the most promising candidate to become a standard for thesaurus representation, and highlights the lack of tools that are able to manage it properly. the thmanager tool is offered as a solution to these problems. it is an open-source tool that can manage thesauri stored in skos, allowing their visualization and editing. thanks to the layered architecture, its components can be easily integrated in other applications that need to use thesauri or other controlled vocabularies. additionally, the components can be used to control the possible values used in a web search service to facilitate traditional or exploratory searches based on a controlled vocabulary. the performance of the tool is demonstrated through a series of experiments on the management of a selected set of thesauri. this work analyzes the features of this selected set of thesauri and compares the efficiency of this tool with respect to other tools that load the thesauri directly from an rdf file. in particular, it is shown that the internal representation used by thmanager helps to decrease the time spent on the graphical loading of thesauri, facilitating navigation of the thesaurus contents as well as other typical operations, such as sorting or changing the visualization language.
additionally, it is worth noting that the tool can be used as a library of components to simplify the integration of thesauri in other applications that require the use of controlled vocabularies. thmanager has been integrated within the open-source catmdedit tool (zarazaga-soria et al. 2003), a metadata editor tool for the documentation of geographic information resources (metadata compliant with the iso 19115 geographic information metadata standard). the thesaurusbeans provided in the thmanager library have been used to facilitate keyword selection for some metadata elements. the thmanager component library also has contributed to the development of catalog search systems guided by controlled vocabularies. for instance, it has been used to build a thematic catalog in the sdiger project (zarazaga-soria et al. 2007). sdiger is a pilot project on the implementation of the infrastructure for spatial information in europe (inspire) for the development of a spatial data infrastructure to support access to geographic information resources concerned with the european water framework directive. thanks to the thmanager components, the thematic catalog allows browsing of resources by means of several multilingual thesauri, including gemet, unesco, agrovoc, and eurovoc.

table 1. sizes of some thesauri and other types of vocabularies (ss and sb in kb; lt, bt, and jt in seconds)

name      nc      np       nr      nl   lt    bt      jt      ss      sb
adl ftt   210     210      408     1    0.4   0.047   0.062   103     41
isoc-g    5,136   5,136    1,026   1    2.4   1.063   1.797   2,796   1,332
iso 639   7,599   16,247   0       6    5.1   1.969   2.89    3,870   3,017
unesco    8,600   13,281   21,681  3    2.1   1.406   2.984   4,034   2,135
epsg      4,772   9,544    0       1    1.8   0.969   1.796   2,935   1,682
agrovoc   16,896  103,484  30,361  3    7.5   4.953   14.75   15,859  5,089
eurovoc   6,649   196,391  20,861  15   11.1  9.266   15.828  18,442  11,483
etu       44,991  89,980   89,976  2    13.3  10.625  17.844  23,828  10,412
gemet     5,244   326,602  12,750  21   13.7  11.828  25.61   28,010  15,048

future work will enhance the functionalities provided by thmanager. first, the ergonomics will be improved to show connections between different thesauri. currently, these connections can be computed and annotated, but the gui does not allow the user to navigate them; as the base technology already has been developed, only a graphical interface is needed. second, the tool will be enhanced to support data types other than text (for example, images, documents, or other multimedia sources) for the encoding of concepts’ property values. third, it has been noted that thesaurus concepts can evolve with time; thus, a mechanism for managing the different versions of thesauri will be necessary in the future. finally, improvements in usability also are expected. thanks to the component-based design of thmanager widgets (thesaurusbeans), new viewers or editors can be readily created to meet the needs of specific users.

■ acknowledgments

this work has been partially supported by the spanish ministry of education and science through projects tin2006-00779 and tic2003-09365-c02-01 from the national plan for scientific research, development, and technology innovation. the authors would like to express their gratitude to juan josé floristán for his support in the technical development of the tool.

references

american national standards institute (ansi). 1993. guidelines for the construction, format, and management of monolingual thesauri. ansi/niso z39.19-1993. revision of z39.19.
batschi, wolf-dieter et al. 2002.
superthes: a new software for construction, maintenance, and visualisation of multilingual thesauri. http://www.t-reks.cnr.it/docs/st_enviroinfo_2002.pdf (accessed sept. 6, 2007).
british standards institute (bsi). 1985. guide to establishment and development of multilingual thesauri. bs 6723.
british standards institute (bsi). 1987. guide to establishment and development of monolingual thesauri. bs 5723.
ceres/nbii. 2003. the ceres/nbii thesaurus partnership project. http://ceres.ca.gov/thesaurus/ (accessed june 12, 2007).
cross, phil, dan brickley, and traugott koch. 2001. rdf thesaurus specification. technical report 1011, institute for learning and research technology. http://www.ilrt.bris.ac.uk/discovery/2001/01/rdf-thes/ (accessed june 12, 2007).
denny, michael. 2002. ontology building: a survey of editing tools. xml.com. http://xml.com/pub/a/2002/11/06/ontologies.html (accessed june 12, 2007).
european environment agency (eea). 2004. general multilingual environmental thesaurus (gemet). version 2.0. european environment information and observation network. http://www.eionet.europa.eu/gemet/rdf (accessed june 12, 2007).
european union publication office (eupo). 2005. european vocabulary (eurovoc). publications office. http://europa.eu/eurovoc/ (accessed june 12, 2007).
food and agriculture organization of the united nations (fao). 2006. agriculture vocabulary (agrovoc). agricultural information management standards. http://www.fao.org/aims/ag%20alpha.htm (accessed june 12, 2007).
gonzalo, julio, et al. 1998. applying eurowordnet to cross-language text retrieval. computers and the humanities 32, no. 2/3 (special issue on eurowordnet): 185–207.
heery, rachel, et al. 2005. jisc metadata schema registry. in 5th acm/ieee-cs joint conference on digital libraries, 381–81. new york: acm pr.
hill, linda, and qi zheng. 1999. indirect geospatial referencing through place names in the digital library: alexandria digital library experience with developing and implementing gazetteers. in asis ’99: proceedings of the 62nd asis annual meeting: knowledge: creation, organization, and use, 57–69. medford, n.j.: information today, for the american society for information science.
hodge, gail. 2000. systems of knowledge organization for digital libraries: beyond traditional authority files. washington, d.c.: the digital library federation.
international organization for standardization (iso). 1985. guidelines for the establishment and development of multilingual thesauri. iso 5964.
international organization for standardization (iso). 1986. guidelines for the establishment and development of monolingual thesauri. iso 2788.
international organization for standardization (iso). 2002. codes for the representation of names of languages. iso 639.
international organization for standardization (iso). 2003. information and documentation—the dublin core metadata element set. iso 15836:2003.
janée, greg, satoshi ikeda, and linda l. hill. 2003. the adl thesaurus protocol. http://www.alexandria.ucsb.edu/~gjanee/thesaurus/ (accessed june 12, 2007).
lesk, michael. 1997. practical digital libraries: books, bytes, and bucks. san francisco: morgan kaufmann.
matthews, brian m., et al. 2001. internationalising data access through limber.
in third international workshop on internationalisation of products and systems, 1–14. milton keynes (uk). http://epubs.cclrc.ac.uk/bitstream/401/limber_iwips.pdf (accessed june 12, 2007).
miles, alistair, and dan brickley, eds. 2004. skos mapping vocabulary specification. w3c. http://www.w3.org/2004/02/skos/mapping/spec/2004-11-11.html (accessed june 12, 2007).
miles, alistair, brian matthews, and michael wilson. 2005. skos core: simple knowledge organization for the web. in 2005 dublin core annual conference—vocabularies in practice, 5–13. madrid: universidad carlos iii de madrid.
miller, george a. 1990. wordnet: an on-line lexical database. international journal of lexicography 3: 235–312.
mindswap group. 2006. swoop: a hypermedia-based featherweight owl ontology editor. maryland information and network dynamics lab, semantic web agents project. http://www.mindswap.org/2004/swoop/ (accessed june 12, 2007).
nogueras-iso, javier, francisco javier zarazaga-soria, and pedro rafael muro-medrano. 2005. geographic information metadata for spatial data infrastructures—resources, interoperability, and information retrieval. new york: springer verlag.
noy, natalie f., ray w. fergerson, and mark a. musen. 2000. the knowledge model of protégé-2000: combining interoperability and flexibility. in knowledge engineering and knowledge management: methods, models, and tools: 12th international conference, ekaw 2000, juan-les-pins, france, october 2–6, 2000: proceedings, 1–20 (lecture notes in computer science, 1937). new york: springer.
ogp surveying & positioning committee. 2006. surveying and positioning. http://www.epsg.org/ (accessed june 12, 2007).
semantic web advanced development for europe (swad-europe). 2001. semantic web advanced development for europe thesaurus activity. http://www.w3.org/2001/sw/europe/reports/thes (accessed june 12, 2007).
taylor, mike. 2004. the zthes specifications for thesaurus representation, access, and navigation. http://zthes.z3950.org/ (accessed june 12, 2007).
tolosana-calasanz, r., et al. 2006. semantic interoperability based on dublin core hierarchical one-to-one mappings. international journal of metadata, semantics, and ontologies 1, no. 3: 183–88.
united nations educational, scientific, and cultural organization (unesco). 1995. unesco thesaurus: a structured list of descriptors for indexing and retrieving literature in the fields of education, science, social and human science, culture, communication and information. paris: unesco publ.
u.s. library of congress, network development and marc standards office. 2004. marc standards. http://www.loc.gov/marc/ (accessed june 12, 2007).
wielemaker, jan, guus schreiber, and bob wielinga. 2005. using triples for implementation: the triple20 ontology-manipulation tool (lecture notes in computer science, 3729): 773–85. new york: springer.
zarazaga-soria, francisco javier, et al. 2003. a java tool for creating iso/fgdc geographic metadata. in geodaten- und geodiensteinfrastrukturen—von der forschung zur praktischen anwendung: beiträge zu den münsteraner gi-tagen, 26./27. juni 2003 (ifgiprints, 18). münster, germany: institut für geoinformatik, universität münster.
zarazaga-soria, francisco javier, et al. 2007. providing sdi services in a cross-border scenario: the sdiger project use case. in research and theory in advancing spatial data infrastructure concepts, 113–26. redlands, calif.: esri.
president’s column
bonnie postlethwaite

many things happen on the national front that affect libraries and their use of technology. legislative action, national policy, and standards development are all arenas in which ala and lita both take an active role. lita has articulated in its strategic plan the need to pursue active involvement in providing its expertise on national issues and standards development. lita achieves these important objectives in a variety of ways. lita has several committees, interest groups, and representatives to ala standing committees that address legislation, regulation, and national policy issues that pertain to technology. the charge of the lita legislation and regulations committee reads: “the legislation and regulation committee monitors legislative and regulatory developments in the areas of information and communications technologies; identifies relevant issues affecting libraries and assists in developing appropriate strategies for responding to these issues.” as its educational mission, the committee publicizes issues and strategies on the lita web site. the chairperson of this committee serves as the lita representative to the ala legislation assembly, which advises ala on positions to take regarding legislative and regulatory action. lita also has a representative to the ala office of information technology policy advisory committee who works closely with the legislation and regulation committee on it policy issues that may cross over into the legislative realm. lita also appoints a representative to the ala intellectual freedom committee whose purpose is “to recommend such steps as may be necessary to safeguard the rights of library users, libraries, and librarians, in accordance with the first amendment to the united states constitution and the library bill of rights.” much has happened on the national front in the past few years that provides plenty of work for these lita and ala committees. the patriot act, calea, net neutrality, dopa, ada compliance, and debates over copyright and intellectual property rights in an electronic world are all examples of issues that require technological control or affect systems and network solutions. they also touch the heart of what librarians have always stood for: protection of intellectual property, personal privacy, and intellectual freedom. library technologists expend enormous time and effort protecting the privacy of patron records through data retention policies, system controls, and strong authentication systems, all while providing authorized access to intellectual property according to copyright or licensing restrictions. keeping lita members apprised of all of these issues and of the technologies required to abide by legal requirements is an enormous task for the committees and interest groups. these groups do this through programming, publications, and postings to the lita web site. lita has always been very active on the standards-development front. from the start, lita was involved with the marc standards through the hard work of henriette avram. the number of standards that affect libraries has mushroomed. there are standards for all aspects of technology—data formats, hardware and firmware, and networking. ala regularly calls on lita to provide expertise on developing standards that pertain to library technology. lita has a standards interest group and shares membership with alcts and rusa on the marbi committee.
most lita interest groups deal with standards of some sort at least occasionally. the lita board felt that lita’s work on developing standards was so important that in 2006 a new standards coordinator position was created, and diane hillman, cornell university, was appointed as the first person in this role. the standards coordinator identifies lita experts to assist in calls for review of developing standards and seeks input from the membership. the standards coordinator also works closely with the standards interest group to help educate the membership. because of the nature of digital information, networks, and the standards that enable the distribution of digital information and services, it has become impossible for any one person to understand all the standards that affect the library technologist. as standards proliferate, it becomes more important for lita to provide educational opportunities alongside its involvement in the development of these standards that so impact our daily lives. the lita web site provides a wealth of information about standards. a new means of contributing to the dialogue about developing standards is to participate in the lita wiki, where diane hillman will be leading the way in posting information about various library technology standards. also, a great place to learn about various standards is right here in ital: practically every issue has at least one article about one standard or another. lita’s participation in technological developments on the national front is critical to all libraries. policy, regulation, and standards form the infrastructure of technological implementation and are the cornerstone of library technology. lita is the place where you can learn more about these developments and participate in the dialogue about them.

bonnie postlethwaite (postlethwaiteb@umkc.edu) is lita president 2006/2007 and associate dean of libraries, university of missouri–kansas city.

searchable signatures: context and the struggle for recognition
gina schlesselman-tarango

abstract

social networking sites made possible through web 2.0 allow for unique user-generated tags called “searchable signatures.” these tags move beyond the descriptive and act as means for users to assert online individual and group identities. this paper presents a study of searchable signatures on the instagram application, demonstrating that these types of tags are valuable not only because they allow for both individuals and groups to engage in what social theorist axel honneth calls the “struggle for recognition,” but also because they provide contextual use data and sociohistorical information so important to the understanding of digital objects. methods for the gathering and display of searchable signatures in digital library environments are also explored.

introduction

a comparison of user-generated tags with metadata traditionally assigned to digital objects suggests that social network platforms provide an intersubjective space for what social theorist axel honneth has termed the “struggle for recognition.”1 social network users, through the creation of identity-based tags—or what can be understood as “searchable signatures”—are able to assert and perform online selves and are thus able to demand, or struggle for, recognition within a larger social framework.
baroncelli and freitas cogently argue that web 2.0, or the interactive online social arena, in fact functions as a “recognition market in which contemporary individuals . . . trade personal worth through displays and exchanges of . . . self-presentations.”2 a comparison of a metadata schema used in yale university’s digital images database with user-generated tags accompanying shared photographs on the social networking platform instagram demonstrates that searchable signatures are unique to social networking sites. as phenomena that allow for public presentations of disembodied selves, searchable signatures thus provide specific information about the context of the digital images with which they are associated. capturing context remains a challenge for those working with digital collections, but searchable signatures allow viewers to derive valuable use data and sociohistorical information to better understand the world in which digital images originated and exist.

gina schlesselman-tarango (gina.schlesselman@du.edu) holds a master of social sciences from the university of colorado denver and is currently an mlis candidate at university of colorado.

literature review

web 2.0 identities and recognition theory

while web 2.0 can be imagined as a highly collaborative space where social actors are able to communicate to the world new identities, some warn that this communication is somehow engineered and performed. van dijck, in an analysis of social media, argues that it is indeed “publicity strategies [that] mediate the norms for sociality and connectivity,” and baroncelli and freitas note that web 2.0 allows people to make themselves visible through modes of spectacularization.3 though his focus is on the spectacle in fin de siècle france, clark provides some insight into the effects of spectacularization on the individual.4 working within a historical materialist framework, clark points out that with the growth of capitalism, the individual has become colonized.5 clark further describes this colonization as “massive internal extension of the capitalist market—the invasion and restructuring of whole areas of free time, private life, leisure, and personal expression . . . the making-into-commodities of whole areas of social practice which had once been referred to casually as everyday life.”6 here, web 2.0 is not a liberatory tool but instead a space where users are colonized to the extent that they create selves exchanged through social networking sites owned by capitalist enterprises. web 2.0, then, has created a situation in which personal time and identification can be successfully commodified. baroncelli and freitas conclude, “from that formula, personal life becomes a capital to be shared with other people—preferably, with a large audience.”7 the problem, then, is that one’s existence is defined simply “by being seen by others” and can no longer be understood as authentic.8 despite the sophistication of the argument detailed above, there are some who view the online self, created through web 2.0, as a legitimate and authentic identity. in an account of the online self, hongladarom summarizes this position, noting that both offline and virtual identities are constructed in social environments.9 for hongladarom, these identities are not different in essence because “what it is to be a person . . .
is constituted by external factors.” 10 the online world as an external factor has the ability to affirm one’s existence, regardless of whether that existence is physical or virtual. in sum, it is the social other and not a material existence that is the authenticating factor in identity formation. there are others who validate the role that spectacle—or what also can be understood as performance—plays in identity formation. pearson calls on the work of goffman to argue, “identity-as-performance is seen as part of the flow of social interaction as individuals construct identity performances fitting their milieu.” 11 for pearson, the identity is always performed, be it through web 2.0 or otherwise. there is nothing particularly worrisome, then, about the effects of web 2.0 on the self, nor does web 2.0 threaten the authenticity of the self. identity is always performed and is in a sense a spectacle—this does not mean, however, that identity in itself is spurious. it is with this perspective of the online self as a performed albeit authentic identity that this paper further develops. before a thorough analysis of the searchable signature as an online self can be conducted, a deeper understanding of honneth’s theory of recognition is in order. information technology and libraries | september 2013 7 in his 1995 work the struggle for recognition: the moral grammar of social conflicts, honneth sets out to develop a social theory based on what he calls “morally motivated struggle.” 12 based on the habermasian concept of communicative action, honneth contends that it is through mutual recognition that “one can develop a practical relation-to-self [and can] view oneself from the normative perspective of one’s partners in interaction, as their social addressee.” 13 relation-toself is key for honneth, and he argues that a healthy relation-to-self, or what can be thought of as self-esteem, is developed when one is seen as valuable by others. beyond self-esteem, honneth points that the success of social life itself depends on “symmetrical esteem between individualized (and autonomous) subjects.” 14 for honneth, this “symmetrical esteem” can lead to solidarity between individuals. “relationships of this sort,” he explains, “can be said to be cases of ‘solidarity’ because they inspire not just passive tolerance but felt concern for what is individual and particular about the other person.” 15 that is to say that felt concern for another allows one to see the specific traits of the other as valuable in working towards common goals, and honneth imagines that in situations of “symmetrical esteem . . . every subject is free from being collectively denigrated, so that one is given the chance to experience oneself to be recognized, in light of one’s own accomplishments and abilities, as valuable for society.” 16 until this ideal is realized, however, individuals must find sites in which to struggle to be recognized as valuable social assets. according to baroncelli and freitas, it is in fact web 2.0 that provides the arena where “the contemporary demand for the visibility of the self” is able to flourish. 
17 they position this argument within honneth’s framework, asserting that the visibility of self is “directed towards a quest for recognition,” and they thus conclude that web 2.0 can be understood as a “recognition market.” 18 context and its importance capturing and integrating markers of context into records, according to chowdhury, still present a challenge for many.19 “there is now a general consensus that the major challenge facing a digital library as well as a digital preservation program is that it must describe its content as well as the context sufficiently well to allow its correct interpretation by the current and future generations of users,” he contends.20 context in itself is difficult to define, let alone its myriad facets that might or might not facilitate better understanding of digital objects. dervin, in her exploration of the meaning of context, points that it is often conceptualized as the “container in which the phenomenon resides.” 21 she points that the list of factors that constitute the container and might be considered contextual is in fact “inexhaustible”—items on this list, for example, might include the gender, race, and ethnicity of those involved in a phenomenon. 22 in an indexing or digital collection environment, the goal is to determine which of these many factors ought be included in a record to best allow for discovery and use. searchable signatures: context and the struggle for recognition | schlesselman-tarango 8 others imagine context as a fluid, ever-changing process rather than as a static container of data. “in this framework,” dervin writes, “reality is in a continuous and always incomplete process of becoming.” 23 this understanding of context as changing is helpful for those working with objects that live in digital environments, especially web 2.0. certainly the interactive nature of the web has created room for a variety of users to create, share, appropriate, comment on, tag, reject, celebrate, and ultimately understand images in a multitude of contexts that might be different from one moment to the next. there are many reasons to include contextual information in records of digital objects. lee argues that by providing context, or what he describes as the “social and documentary” world “in which [a digital object] is embedded,” future users will be able to better understand the “details of our current lives.” 24 further, lee contends that context is helpful in that is illustrates the ways in which a digital object is related to other materials: relationships to other digital objects can dramatically affect the ways in which digital objects have been perceived and experienced. in order for a future user to make sense of a digital object, it could be useful for that user to know precisely what set of . . . representations—e.g. titles, tags, captions, annotations, image thumbnails, video keyframes—were associated with a digital object at a given point in time. 25 the user-generated tag, then, is a valuable representation that provides contextual information surrounding the perception and experience of the image with which it is directly related. discussion user-generated tags and traditional metadata user-generated tags have been hailed as an important stage in the evolution of image description and are said to have the potential to shape controlled vocabularies used in traditional metadata schemas. for example, in a comparison of flickr tags and index terms from the university of st. 
andrews library photographic archive, rorissa stresses the importance of exploring similarities and differences between indexers’ and users’ language, noting that “social tagging could serve as a platform on which to build future indexing systems.”26 like others, rorissa hopes that continued research into user-generated social tags will be able to “bridge the semantic gap between indexer-assigned terms and users’ search language.”27 in fact, some are currently utilizing social tags in an effort to describe and facilitate access to collections. one such organization is steve: the museum social tagging project, “a place where you can help museums describe their collections by applying keywords, or tags, to objects.”28 the organization allows users to not only view traditional metadata associated with cultural objects, but also tags generated by others. in an effort to better understand the similarities and differences between user-generated tags and the language used in traditional metadata schemas, one must compare the two systems. yale university’s digital images database provides a glimpse at the ways in which traditional metadata schemas are typically used to describe images in digital library settings. most of the images included in the database are accompanied by descriptive, structural, and administrative metadata. for example, an item entitled “boy sitting on a stoop holding a pole” (see figure 1) from the university’s collection of 1957–90 andrews st. george papers provides a digital copy of the image, the image number, name of the creator, date of creation, type of original material, dimensions, copyright information, manuscript group name and number, box and folder numbers, and a credit line.29 the image is further described by the following: “man in the shed is making homemade bombs. the boy and man are also in image 45350.”30

figure 1. “boy sitting on a stoop holding a pole” from yale university’s digital images database collection of 1957–90 andrews st. george papers, november 2012.

certainly, such information is useful in library environments and provides users with helpful and formatted data to best guide the information discovery process. the finding aid for the andrews st. george collection is additionally helpful in that it includes information about provenance, access, processing, associated materials, and the creator; it also contains descriptive information about the collection by box and folder number.31 however, if additional use data and sociohistorical information specific to this individual item were available, it would be most helpful in assisting users in determining the image’s greater context. a study of modes of participation on social networking sites suggests that it is now possible to supply such contextual information for digital objects that live in interactive online environments. a useful site for exploring user-generated tags associated with images is instagram, a social application designed for iphone and android.32 instagram users are able to upload and edit photos, and other users can then view, like, and comment on the shared photos. instagram users are able to follow other users and search for photos by the creator’s username or by accompanying tags.
instagram, owned by facebook, is interoperable with other social networking sites, and users have the ability to share their photos on facebook, flickr, tumblr, and twitter. as of july 2012, it was reported that instagram had 80 million users, and in september 2012, the new york times reported that 5 billion photos were shared through the application.33 users are limited to 30 tags per photo, and instagram suggests that users be as specific as possible when describing an image with a tag so that communities of users with similar interests can form.34 many tags, like the information included in traditional metadata schemas, aim to best describe an image by explaining its content; for example, one user assigned the tags #kids, #nieces, #nephews, and #family to a photograph of a group of smiling children (see figure 2). like the information accompanying the photograph in the yale university digital images database, such tags provide users and viewers with tools to better determine the “aboutness” of the image at hand. information technology and libraries | september 2013 11 figure 2. photo shared on instagram assigning both descriptive tags and the searchable signature #proudaunt, november 2012. however, instagram users are repurposing the tagging function in a way that is unique to social networking sites. in addition to the descriptive tags assigned to the image of the children described above, the user also tagged the photo with the term #proudaunt (see figure 2). there is, however, no aunt (what can be assumed to be an adult female) in the photograph. this tag, then, functions to further identify the user who created or shared the photograph and does not describe the content of the image at hand. a search of the same tag, #proudaunt, demonstrates that this user is not alone in identifying as such: in november 2012, this search returned 40,202 images with the same tag and more than 58,000 images with tags derived from the same phrase (#proudaunty, #proudauntie, #proudaunties, #proudauntiemoment, and #proudaunti) (see figure 3). figure 3. list of results from #proudaunt hashtag search on instagram, november 2012. this type of user-generated tag—one that identifies the creator or sharer of the photograph yet is not necessarily meant to describe the content of the image—can be understood as a searchable signature. such identity-based tags are not found within yale university’s digital images database; the closest relative of the searchable signature is the creator’s name. while searchable, this name is not alternative, or secondary, and it was not created and does not exist in a social environment. searchable signatures: context and the struggle for recognition | schlesselman-tarango 12 currently, born-digital objects are often created and shared in a technological milieu that allows for the assignment of user-generated tags. consequently, the integration of the searchable signature into the presentation of digital objects has become part of accepted social practice and offers unique opportunities for digital library curators and users alike. until quite recently, most materials—be they photographs, manuscripts, or government documents—were not born in digital environments. however, digitization projects have been undertaken to ensure that such historical materials are more widely and eternally available. these reborn digital objects, then, have been and can be integrated into dynamic social environments. 
steve: the museum social tagging project, mentioned earlier in this paper, is one example of an organization that has capitalized on the social practice of user-generated tagging and is using descriptive tags along with traditional metadata to better describe reborn digital objects. it is important, then, to explore what (if any) implications the application of the searchable signature, a unique type of user-generated tag, has for historical objects that are later integrated into digital environments. searchable signatures associated with born digital images on social networking sites contain valuable information about their creators, users, and the images’ context. one cannot ignore that users will, if given the chance, also likely apply signatures to reborn digital objects in similar ways that they do to objects that have always existed in social environments. since the searchable signature is used to identify not only digital image creators, but also sharers, and if these signatures do in fact provide important insight into the sharers and their motivations, then these signatures are not to be ignored. rather than focusing on the creating, the lens through which to understand the searchable signature for reborn digital objects can be shifted to the social act of sharing: by whom, when, in which social environments, and for what purposes. a deeper analysis of the presentation of self through the searchable signature and the role that the signature plays in providing valuable contextual information for both bornand reborn-digital objects is developed below. searchable signatures and the struggle for recognition if web 2.0 indeed functions as a recognition market, then social media and social networking sites might appear to be tables at such a market. placing oneself behind a table—be it facebook, twitter, or instagram—the user is able to perform his or her online identity to passersby and effectively struggle to be recognized as a unique individual or as a member of a social group. these performances, which could be deemed narcissistic in nature, can alternatively be read as healthy attempts to self-actualize and connect to larger society.35 one such “table” in the recognition market is instagram. beyond instagram’s social nature that allows participants to interact with and follow one another, the specific role of the searchable signature is of interest to those who are concerned with struggles for recognition. rather than describing shared images, searchable signatures reflect performative yet authentic user identities. information technology and libraries | september 2013 13 mccune, in a case study of consumer production on instagram, acknowledges the potential of the tag to not only facilitate image exchange but to communicate users’ positions as members of social groups.36 through a simple search of tags, users who identify as, for example, “cat ladies,” are able to validate their identities when they see that there are many others who use the same or similar language in demonstrations of the self (see figure 4). other signatures such as #proudaunt, while not necessarily playful, still function to provide viewers with additional information about the instagram user that cannot be determined through the photo itself. the ability to find images based on these searchable signatures allows users to find others who identify in a like manner and to imagine themselves as part of a larger social group. 
in effect, searchable signatures allow users to be recognized as social addressees of like-minded others. positioning oneself within a group must be understood as a struggle for recognition, for to imagine oneself as part of the social fabric is also to see oneself as valuable.

figure 4. list of results from #catlady hashtag search on instagram, november 2012.

enabled by web 2.0, searchable signatures contain potential for marginalized peoples or groups to assert online selves to be seen and ultimately heard in a truly intersubjective landscape. it is not too much of a leap to imagine that searchable signatures might make possible the organization of individuals and groups for political purposes. in fact, in a discussion of social groups, honneth notes that “the more successful social movements are at drawing the public sphere’s attention to the neglected significance of the traits and abilities they collectively represent, the better their chances of raising the social worth, or indeed, the standing of their members.”37 here, searchable signatures might provide such movements with a venue to capture the public’s attention and to effectively struggle for and gain recognition.

searchable signatures and context

as markers of individual and group identities, searchable signatures are unique in that they provide a snapshot of the multitude of social, historical, political, individual, and interpersonal relationships that ontologize the images with which they are paired. it is this very contextual information that is at times lacking in traditional indexing environments. by examining searchable signatures, experts and users are able to understand which individuals and groups create, use, and identify with certain images. thus, as markers of self, searchable signatures provide use data for scholars to better investigate which images are important to online individual or group identities. if the searchable signature is used in a political fashion, historians and sociologists might be able to study which types of images, for example, marginalized groups rally around, identify with, and use in their struggles for recognition. such use data also illuminates how and by whom certain digital images have been appropriated over time. for example, if a picture of a cat is first created or shared via instagram by an animal rights activist, the image might be accompanied by the searchable signature #humanforcats. this same image, shared by another user months later, might be accompanied by the #catlady signature. those interested will be able to examine how the same image has been historically used for different purposes and will be better able to grasp the evolving nature of its digital context. in addition to use data, the searchable signature provides insight into the sociohistorical context surrounding digital images. for those who perceive “reality . . . as accessible only (and always incompletely) in context, in specific historicized moments in time space,”38 the searchable signature clarifies and makes more accessible that reality surrounding the digital image.
in a traditional library setting, a photo of a cat might be indexed with descriptive subject headings such as “cat,” “persian cat,” or “kitten—behavior.” however, the searchable signature #catladyforlife provides additional information on how the cat has become, for a certain social group in a specific moment in time, a trope of sorts for those who are proud not only of their relationships with their domestic pets, but of their shared values and lifestyles as well. if a historian were to dig deeper, he or she also might see that “cat lady” has historically been used in a derogatory manner to mark single, unattractive women thought to be crazy and unable to care for the great number of cats they own, and that, by (re)claiming this title, women might be engaging in a struggle for recognition that extends beyond mere admiration for felines.39 chowdhury, in a continued discussion of challenges facing the digital world, asks whether it is “possible to capture the changing context along with the content of each information resource, because as we know the use and importance . . . changes significantly with time.”40 additionally, he asks, “will it be possible to re-interpret the stored digital content in the light of the changing context and user community, and thereby re-inventing the importance and use of the stored objects?”41 it is here that the searchable signature offers use data and sociohistorical information to illuminate the (changing) value digital images have for individuals, communities, and society.

conclusion

clark argues that representation must be understood within the confines of what he calls “social practice.”42 social practice, among other things, can be understood as “the overlap and interference of representations; it is their rearrangement in use.”43 representation of self also must be understood within current social practice, and an important facet of today’s practice is web 2.0. as a social space, web 2.0 allows for the creation of disembodied self-representations. one type of such representation, the searchable signature, is a phenomenon unique to social networking sites. while many acknowledge the potential of descriptive, user-generated tags to inform or even to be used in conjunction with metadata schemas or controlled vocabularies, instagram users have created an additional, alternative use for the tag. rather than simply using tags to describe shared images, they have successfully created a route to online identity formation and recognition. searchable signatures demonstrate the power of the online self, as they allow users to struggle to be recognized as unique individuals or as parts of larger social groups. these signatures, too, might act as platforms on which social groups can assert their value and thus demand recognition. additionally, searchable signatures provide contextual information that reflects the social practice in which digital images live. while the capture and integration of such information remains a challenge for those engaged in traditional indexing, web 2.0 allows for this unique type of user-generated tag and thus provides better understanding of the context surrounding digital images.
as to the question of whether searchable signatures can be integrated into existing metadata schemas or be used to inform controlled vocabularies in library environments, it is not unreasonable to suggest that digital objects be accompanied by their supplemental yet valuable representations (e.g., searchable signatures and the like). many methods exist through which these signatures might be both gathered and displayed. certainly, a full exploration of such practices is the stuff of future research; however, some initial ideas are detailed below. one method of gathering identity-based tags would involve the active hunting down of searchable signatures. locating objects on social networking sites that are also in one's digital collection, the indexer would identify and track associated user-generated searchable signatures. this method would require extreme diligence, high levels of comfort navigating and using web 2.0, a clear idea of which social networking sites yield the most valuable searchable signatures, and likely one or more full-time staff members devoted to such activities. even if feed systems were employed for individual digital objects, this method demands much of indexers and would likely not be sustainable over time. a more passive yet efficient way of gathering searchable signatures would simply be to build on methods that have been shown to be successful. by creating interactive digital environments that encourage users to assign not only descriptive but also identity-based tags, indexers are freed of the time-consuming task of hunting for searchable signatures on the web. since searchable signatures have come to be part of online social practice, assigning them would likely be familiar to users—initially, libraries might need to prompt users to share signatures or provide them with examples. this gathering tactic could be used to harvest signatures for items that are already part of the library's digital collection (telling us about signatures used by potential sharers) or as a means to incorporate new digital objects into the collection (telling us about signatures used by both creators and sharers). in both gathering scenarios, indexers might choose to display only the most frequently occurring or what they deem to be the most relevant searchable signatures, or they might choose to display all such tags; decisions such as these will ultimately depend on each institution's mission and resources. of course, if a library integrates a born-digital image into its collection and can identify the searchable signatures originally assigned to it via social networking sites or otherwise, this information should also be recorded. here, users will be able to get a glimpse of the image in its pre-library life. providing associated usernames, dates posted, and the names of the social networking sites will also assist in providing a more complete picture of the individuals or groups linked to the image. this information can provide valuable data about the information creators and sharers who use specific social platforms. the aim of this paper is to lay the theoretical groundwork to better understand the role of searchable signatures in today's digital environment as well as the signature's unique ability to provide context for digital images. surely, further research into the phenomenon of the searchable signature would demonstrate how it is currently used outside of instagram or as a political tool.
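as a concrete illustration of the two gathering scenarios just described, the sketch below records harvested signatures with their provenance (username, date posted, platform) and supports displaying either every tag or only the most frequently occurring ones. it is only a sketch under assumed conventions: the SearchableSignature structure, the record fields, and the example data are hypothetical, and any real harvesting would still depend on what each social networking site exposes.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass
class SearchableSignature:
    """one identity-based tag harvested for a digital object."""
    tag: str        # e.g. "#catlady"
    username: str   # account that assigned the tag
    posted: date    # when the tagged post appeared
    platform: str   # e.g. "Instagram"

def record_signatures(record: dict, harvested: list) -> None:
    """append harvested signatures to a collection record, keeping provenance
    (username, date, platform) so the image's pre-library life stays visible."""
    record.setdefault("searchable_signatures", []).extend(
        {"tag": s.tag, "username": s.username,
         "posted": s.posted.isoformat(), "platform": s.platform}
        for s in harvested
    )

def display_signatures(record: dict, top_n=None) -> list:
    """return all harvested tags, or only the most frequently occurring ones,
    depending on local display policy."""
    counts = Counter(s["tag"] for s in record.get("searchable_signatures", []))
    return [tag for tag, _ in counts.most_common(top_n)]

# example: a born-digital image entering the collection
image_record = {"identifier": "img-001", "subjects": ["Cats"]}
record_signatures(image_record, [
    SearchableSignature("#humanforcats", "activist_a", date(2012, 3, 1), "Instagram"),
    SearchableSignature("#catlady", "user_b", date(2012, 11, 5), "Instagram"),
    SearchableSignature("#catlady", "user_c", date(2012, 11, 9), "Instagram"),
])
print(display_signatures(image_record, top_n=1))  # ['#catlady']
```

whether an institution shows only the top-ranked tags or the full list is, as noted above, a local decision about mission and resources; the code merely makes the two options explicit.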
others might consider examining the username as another arena in which individuals or groups construct and perform online identities and thus engage in struggles for recognition. usernames also might provide contextual use data and sociohistorical information that inevitably support greater understanding of digital objects. finally, further research is needed to identify how libraries could utilize the searchable signature in promotional activities and to build and cater to user communities. references 1. axel honneth, the struggle for recognition: the moral grammar of social conflicts (cambridge: mit press, 1995). 2. lauane baroncelli and andre freitas, “the visibility of the self on the web: a struggle for recognition,” in proceedings of 3rd acm international conference on web science, 2011, accessed august 12, 2013, www.websci11.org/fileadmin/websci/posters/191_paper.pdf. information technology and libraries | september 2013 17 3. jose van dijck, “facebook as a tool for producing sociality and connectivity,” television & new media 13, no. 2 (2012): 160–76; baroncelli and freitas, “the visibility of the self.” 4. t. j. clark, introduction to the painting of modern life: paris in the art of manet and his followers (princeton, nj: princeton university press, 1984), 1–22. 5. ibid. 6. ibid., 9. 7. baroncelli and freitas, “the visibility of the self.” 8. ibid. 9. soraj hongladarom, “personal identity and the self in the online and offline world,” minds & machines 21 (2011): 533–48. 10. ibid., 541. 11. erika pearson, “all the world wide web’s a stage: the performance of identity in online social networks,” first monday 14 (2009), accessed november 9, 2012, www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm; erving goffman, the presentation of self in everyday life (garden city, ny: doubleday, 1959). 12. honneth, the struggle for recognition, 1. 13. jurgen habermas, the theory of communicative action (boston: beacon, 1984); honneth, the struggle for recognition, 92. 14. honneth, the struggle for recognition, 129. 15. ibid. 16. ibid., 130. 17. baroncelli and freitas, “the visibility of the self.” 18. ibid. 19. gobinda chowdhury, “from digital libraries to digital preservation research: the importance of users and context,” journal of documentation 66, no. 2 (2010): 207–23, doi: 10.1108/00220411011023625. 20. ibid., 217. 21. brenda dervin, “given a context by any other name: methodological tools for taming the unruly beast,” in information seeking in context, ed. pertti vakkari et al. (london: taylor graham, 1997), 13–38. searchable signatures: context and the struggle for recognition | schlesselman-tarango 18 22. ibid., 15. 23. ibid., 18. 24. christopher a. (cal) lee, “a framework for contextual information in digital collections,” journal of documentation 67 (2011): 95–143. 25. ibid., 100. 26. abebe rorissa, “a comparative study of flickr tags and index terms in a general image collection,” journal of the american society for information science and technology 61, no. 11 (2010): 2230–42. 27. ibid., 2239. 28. “steve central: social tagging for cultural collections,” steve: the museum social tagging project, accessed december 16, 2012, http://tagger.steve.museum. 29. “yale university library manuscripts & archives department,” yale university manuscripts & archives digital images database, last modified april 19, 2012, accessed december 3, 2012, http://images.library.yale.edu/madid. 30. ibid. 31. “andrew st. 
george papers (ms 1912)," manuscripts and archives, yale university library, accessed april 30, 2013, http://drs.library.yale.edu:8083/fedoragsearch/rest. 32. "faq," instagram, accessed november 10, 2012, http://instagram.com/about/faq. 33. emil protalinski, "instagram passes 80 million users," cnet, july 6, 2012, accessed november 13, 2012, http://news.cnet.com/8301-1023_3-57480931-93/instagram-passes-80-million-users; jenna wortham, "it's official: facebook closes its acquisition of instagram," new york times, september 6, 2012, accessed november 13, 2012, http://bits.blogs.nytimes.com/2012/09/06/its-official-facebook-closes-its-acquisition-of-instagram. 34. "tagging your photos using #hashtags," instagram, accessed november 10, 2012, http://help.instagram.com/customer/portal/articles/95731-tagging-your-photos-using-hashtags; "instagram tips: using hashtags," instagram, accessed november 10, 2012, http://blog.instagram.com/post/17674993957/instagram-tips-using-hashtags. 35. andrew l. mendelson and zizi papacharissi, "look at us: collective narcissism in college student facebook photo galleries," in a networked self: identity, community and culture on social network sites, ed. zizi papacharissi (new york: routledge, 2010), 251–73. 36. zachary mccune, "consumer production in social media networks: a case study of the 'instagram' iphone app" (master's dissertation, university of cambridge, 2011), accessed december 20, 2012, http://thames2thayer.com/portfolio/a-study-of-instagram. 37. honneth, the struggle for recognition, 127. 38. dervin, "given a context by any other name," 17. 39. kiri blakeley, "crazy cat ladies," forbes, october 15, 2009, accessed december 4, 2012, www.forbes.com/2009/10/14/crazy-cat-lady-pets-stereotype-forbes-woman-time-felines.html; crazy cat ladies society & gentlemen's auxiliary homepage, accessed december 4, 2012, www.crazycatladies.org. 40. chowdhury, "from digital libraries to digital preservation," 219. 41. ibid. 42. clark, introduction to the painting of modern life, 6. 43. ibid. acknowledgments many thanks to erin meyer and dr. krystyna matusiak at the university of denver for their feedback and guidance. academic web site design and academic templates | peterson academic web site design continues to evolve as colleges and universities are under increasing pressure to create a web site that is both hip and professional looking. many colleges and universities are using templates to unify the look and feel of their web sites. where does the library web site fit into a comprehensive campus design scheme? the library web site is unique due to the wide range of services and content available.
based on a poster session presented at the twelfth annual association of college and research libraries conference in minneapolis, minnesota, april 2005, this paper explores the prevalence of university-wide academic templates on library web sites and discusses factors libraries should consider in the future. c ollege and universities have a long history with the web. in the early 1990s, university web sites began as piecemeal projects with varying degrees of complexity—many started as informational sites for various technologically advanced departments on campus. over the last decade, these web sites have become a vital part of postsecondary institutions and one of their most visible faces. academic web sites communicate the brand and mission of an institution. they are used by prospective students to learn about an institution and then used later to apply. current students use them to pay tuition bills, register for classes, access course materials, participate in class discussions, take tests, get grades, and more. online learning and course-management software programs, such as blackboard, continue to increase the use of web sites. they are now an important learning tool for the entire campus community and the primary communication tool for current students, parents, alumni, the community, donors, and funding organizations. web site standards have developed since the 1990s. usability and accessibility are now important tenets for web site designers, especially for educational institutions. as a result, campus web designers or outside consultants are often responsible for designing large parts of the academic web site. as web sites have grown, ongoing maintenance is an important workload issue. databases and other technologies are used to simplify daily updates and changes to web sites. this is where the academic template fits in. an academic template can be defined as a common or shared template used to control the formatting of web pages in different departments on a campus. generally, administrators will mandate the use of a specific template or group of templates. this mandate includes guidelines for such things as layout, design, color, font, graphics, and navigation links to be used on all web pages. often, the templates are administered using content management systems (cmss) or web development software such as macromedia’s contribute. these programs give different levels of editing rights to individuals, thus keeping tight control over particular web pages or even parts of web pages. academic templates give the web site administrator the ability to change the template and update all pages with a single keystroke. for example, the web site administrator may give editing rights to content editors, such as librarians, to edit only the center section of the web page. the remaining parts of the page such as the top, sides, and bottom are locked and cannot be edited. the result of using templates is that the university web site is very unified and consistent. this is particularly important in creating a brand for the university. well-branded institutions have the opportunity to increase revenue, improve administration and faculty staffing, improve retention, and increase alumni relationships.1 but what about the library? libraries are one of the most visited web pages on a university’s web site.2 thus, the design of the library page can be crucial to a well-designed academic web site. 
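a minimal sketch of the locked-template idea described above follows: the campus-wide template supplies the banner, navigation, and footer, local editors supply only the center content, and a single change to the template re-skins every page. the template text, page identifiers, and render function are invented for illustration and do not correspond to any particular cms.

```python
# one campus-wide template; only the {content} slot is editable by local staff
ACADEMIC_TEMPLATE = """\
[university banner + seal]
[top navigation: admissions | academics | libraries | athletics]
{content}
[footer: university address | privacy statement | legal disclaimers]"""

# page editors may supply only the center content; header, nav, and footer are locked
library_pages = {
    "library-home": "welcome to the library. search the catalog, find databases, ask a librarian.",
    "library-hours": "main library: 8am to midnight; branch libraries: 9am to 9pm.",
}

def render(page_id: str) -> str:
    """assemble a page by wrapping the editable center section in the locked chrome."""
    return ACADEMIC_TEMPLATE.format(content=library_pages[page_id])

print(render("library-home"))
# editing ACADEMIC_TEMPLATE once re-skins every page the next time it is rendered,
# which is the "update all pages with a single keystroke" effect described above
```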
the library web site can set a tone for an institution and help prospective students get a feel for the campus. belanger, mount, and wilson contend it is important for the image of an institution to match the reality.3 if there is discord between the two, students may choose an inappropriate college and quickly drop out, lowering a campus’s retention data. the library web site can also be important in the recruitment of new faculty members. in addition, libraries use their web sites for marketing, public relations, and fund-raising for the library.4 library web sites are crucial to delivering data, research tools, and instruction to students, faculty, staff, and community patrons. more than 90 percent of students access the library from their home computers, and 78 percent prefer this form of access.5 today, the web site connects users with article citations and databases, library catalogs, full-text journals, magazines, newspapers, books, videos, dvds, e-books, encyclopedias, streaming music and video, and more. users access subject-specific research guides, library tutorials, information-literacy instruction, and critical evaluation tools. services such as interlibrary loan (ill), reference management programs such as endnote or refworks, and print and electronic reserves are also used via the web. users get help with doing research by e-mail and virtual chat. in addition, libraries are digital repositories for a growing number of digital historic documents and archives. academic web site design and academic templates: where does the library fit in? kate peterson kate peterson (katepeterson@gmail.com) is an information literacy librarian at capella university, minneapolis, minnesota. 218 information technology and libraries | december 2006 how common are academic templates in library web sites? what effect do they have on the content and services provided by libraries? ■ methods for the purposes of this study, a list of doctoral, master’s, and bachelor of arts (ba) institutions (private and public) based on the carnegie classification of institutions of higher education was created and a random number table was used to select a sample of web pages (n=216).6 home pages, admissions pages, departmental pages, and library web pages were analyzed. a similarly sized sample of each type was selected to give a broad overview of trends—18 percent of doctoral institutions (n=47), 19 percent of master’s institutions (n=115), and 23 percent of ba institutions (n=54). the following questions were asked: ■ does the college or university web site use an academic template? ■ if yes, is the library using the template, and for how much of the library web site? ■ to what extent is the template being used? primarily, a web site was determined to be using an academic template based on the look of the site. for example, if the majority of the web elements (top banner, navigation) all matched, then the web site was counted as using some sort of template. use and nonuse of content management system (cms) software behind the web site was not considered in this study—only the look of the web site. ■ results a majority of college and university web sites (94 percent) use an academic template. fifty percent of the libraries surveyed use the academic template for at least the library’s home page. of that number, about 34 percent of libraries use the template on a majority of the library pages. 
roughly 44 percent of the total libraries surveyed did not use the academic template, and approximately 5 percent of academic web sites do not use any sort of unified academic template. smaller ba institutions are more likely to use the academic template on multiple library pages than doctoral institutions, which tend to have their own library design or template (see table 1). for those libraries that did not use the academic template on every library page, the most commonly used elements template were the top header (which often has the university seal or an image of the university), the top navigation bar (with university-wide links), and the bottom footer, which often contains the university address, privacy statement, or legal disclaimers. less frequently used elements were the bottom navigation bar, and the left or right navigation bar with university-wide links (see tables 2–3). ■ discussion while many colleges and universities use academic templates, only about half of their libraries follow suit. libraries using the template often use selected parts of the template, or only use the template on their home page. though not considered in this study, there may be a correlation between institution size and template use, as larger institutions are more likely to have library web designers and thus use the academic template only on the library’s home page. while academic templates can cause libraries many problems, there are also many benefits to be considered. ■ problems with academictemplates on library web sites the primary concern with any template is how much space is available for content. for example, there may be a very small box for the page content while images, banner bars, and large navigation links may take up most of the real estate on the page. this problem can be exacerbated for libraries because there are so many different types of content such as the library catalog, databases, tutorials, forms, ill, and other library services delivered via the web. libraries can be caught between the design imposed by the academic template and the rigid size requirements from outside vendors such as database companies, ill or reserve modules, federated search products, or others. academic templates are usually mandated by administrators without a full understanding of the specific content and uses of the library web site. many problems can occur when trying to fit an existing library web site into a poorly designed academic template. it can be very difficult to modify the template effectively for the library’s purposes. an example of one specific problem is confusing links on the template, where a link on every page to the “university catalog” links to the course catalog and not the library catalog, which is very confusing for users. another example is a search box as part of the academic template—what are users searching? the university web site? the library web site? the library catalog? the world wide web? another drawback to using academic templates for library web sites can be the time involved in training librarians, staff, and library web site administrators. the existing academic web site design and academic templates | peterson 219 content must be fit into the new template—a huge project, given that many library web sites contain one thousand pages or more. generally, a decision to use a template is accompanied by a decision to use a cms or new web-page editor. this takes yet more time to train individuals on the new software in addition to the new template. 
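as a quick arithmetic check on the survey figures reported above, the sketch below derives the table 1 percentages from the raw counts in table 2 (both tabulated just below); the dictionary keys are shorthand labels rather than the paper's exact column headings.

```python
# raw counts of library template use by institution type (from table 2)
counts = {
    "bachelor of arts": {"no template": 2, "library not using": 20, "top page only": 7,  "majority of pages": 25, "total": 54},
    "master's":         {"no template": 7, "library not using": 55, "top page only": 14, "majority of pages": 39, "total": 115},
    "doctoral":         {"no template": 3, "library not using": 21, "top page only": 13, "majority of pages": 10, "total": 47},
}

for inst, row in counts.items():
    total = row["total"]
    # rounded percentages reproduce the table 1 values (e.g., 25/54 is about 46%)
    pcts = {k: round(100 * v / total) for k, v in row.items() if k != "total"}
    print(inst, pcts)
# bachelor of arts -> 4 / 37 / 13 / 46
# master's         -> 6 / 48 / 12 / 34
# doctoral         -> 6 / 45 / 28 / 21
```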
table 1. percentages of occurrences of academic templates
(columns: no academic template % / library not using template % / library using template—transition or top page % / library using template—majority of pages %)
bachelor of arts: 4 / 37 / 13 / 46
master's: 6 / 48 / 12 / 34
doctoral: 6 / 45 / 28 / 21
table 2. occurrence of templates in academic and library web sites
(columns: no academic template / library not using template / library using template—transition or top page / library using template—majority of pages / total sites analyzed)
bachelor of arts: 2 / 20 / 7 / 25 / 54
master's: 7 / 55 / 14 / 39 / 115
doctoral: 3 / 21 / 13 / 10 / 47
total: 12 / 96 / 34 / 74 / 216
table 3. percentages of occurrence for institutions using the academic-wide template for first page of library web site or libraries using modified academic template
(columns: ba % / master's % / doctoral % / all colleges and universities %)
top header (no navigation): 100 / 94 / 94 / 91
top navigation: 75 / 82 / 82 / 76
bottom header (no navigation): 83 / 65 / 76 / 72
bottom navigation: 25 / 18 / 18 / 20
left navigation: 42 / 18 / 18 / 24
right navigation: 8 / 0 / 0 / 2
■ benefits of using academic templates one of the benefits for libraries using an academic template is the ability to exploit the expertise of the web site designers who created the template. the academic template often incorporates images, logos, and branding that the library may not be able to design otherwise. many libraries do not have professional web designers on staff; even if they do, there often is no one person who designs and maintains the entire library web site. instead, different parts of a library web site are designed and maintained by different individuals with varying degrees of web site ability. as a result, many library web sites are a mix of styles, which can be disorienting for students who are familiar with the university's "look." web site uniformity has a positive effect on usability since familiarity with one part of the web site helps students, faculty, and staff navigate other parts of the web site. even web site basics such as knowing the color and style of the links and how to navigate to different pages can be helpful.8 another benefit is that academic templates are generally ada compliant as required under section 508 of the rehabilitation act of 1973.9 as usability and usability testing become more prevalent, academic template designers may also test the template and navigation for usability. such testing will improve the template and thus the library web site as well. ■ trends in academic and library web sites colleges and universities are responding to a new generation of students, the majority of whom have grown up with computers. in trying to meet their needs and desires, many academic web sites have high-quality photographs, quotes, and testimonials from the universities' students on their home pages. more and more materials are being placed online to allow both prospective and current students to do what they need to do twenty-four hours a day, from registering for classes to handing in research papers. many web sites have interactive elements such as instant polls or quizzlets or use instant messaging to connect with tech-savvy students. for example, prospective students can chat with admissions staff members or current students about what it is like to attend a particular university. a large number of sites also highlight weblogs written by current students or those studying abroad.
these features allow students to use the technology they are comfortable with to maximize their academic experience. numerous library web sites are changing as well, featuring a library catalog, article database, or federated search box on the home page to allow users to search instantly. additionally, library sites are beginning to include images of students using the library, external or internal shots of the building, flash graphics, icons, and sound. many incorporate screen captures to help users navigate specific databases or forms. in addition, an increasing number of libraries use weblogs to give more of a dynamic quality with daily library news and announcements. ■ strategies for using academic templates based on comments received in april 2005 during the poster session, and in recent electronic discussion list postings, many academic libraries are dealing with these issues. libraries should work on creating a mission statement and objectives for their web sites that expand upon the library’s mission, the institutional web site’s mission, and the institution’s overall mission and brand. librarians must be knowledgeable about web site usability and trends in web site design in order to communicate effectively to designers and administrators. librarians should also become members of campus web committees and be a voice for library users during the design process. teaching administrators and campus web designers about the library and the library web site’s prominence are important tools to successfully deal with any proposed university-wide academic templates. for example, a librarian could mock-up a few pages, conduct informal usability testing, and invite administrators to learn firsthand about potential problems library users could experience with a template. librarians could also propose a modified template that uses a few key elements from the academic template. this would maintain the brand but retain enough space for important library content. connecting with other librarians and learning from each other’s successes and failures will also help bring insight into this academic template issue. ■ conclusion the use of academic templates is only going to increase as institutional web sites grow in complexity and importance. libraries are an important part of institutions both physically—on campus—and virtually—as part of the campus web site. academic templates are part of a unified design scheme for colleges and universities. librarians must work with both library and university administrators to create a well-designed but usable library web site. they must advocate for library users and continue to help students and faculty access the rich resources and services available from the library. library administrators need to allocate resources and staff time to improve their web sites and to work in concert with academic web site designers to merge the best of the academic template to the best of the library site while not sacrificing users’ needs. the result will be highly used, highly usable library web sites that attract students and keep them coming back to access the fantastic world of information available in today’s academic libraries. ■ references 1. robert sevier, “university branding: 4 keys to success,” university business 5, no. 1 (2002): 27–28. 2. mignon adams and richard m. dougherty, “how useful is your homepage? a quick and practical approach to evaluating a library’s web site,” college & research libraries news 63, no. 8 (2002): 590–92. 
3. charles belanger, joan mount, and mathew wilson, "institutional image and retention," tertiary education and management 8, no. 3 (2002): 217. 4. jeanie m. welch, "the electronic welcome mat: the academic library web site as a marketing and public-relations tool," the journal of academic librarianship 31, no. 3 (2005): 225–28. 5. oclc, "how academic librarians influence students' web-based information choices," in oclc online computer library center database online, (2002), 5, http://www5.oclc.org/downloads/community/informationhabits.pdf (accessed march 10, 2005). 6. carnegie foundation, carnegie classification of institutions of higher education, 2000 edition, http://www.carnegiefoundation.org/classification/ (accessed jan. 8, 2005). 7. beth evans, "the authors of academic library home pages: their identity, training, and dissemination of web construction skills," internet research 9, no. 4 (1999): 309–19. 8. oclc, 6. 9. u.s. department of justice, section 508 home page, in united states department of justice database online, (2004), 1, http://www.usdoj.gov/crt/508/ (accessed july 3, 2005). emily g. morton-owens editorial and technological workflow tools to promote website quality everard and galletta performed an experimental study with 232 university students to discover whether website flaws affected perception of site quality and trust. their three types of flaws were incompleteness, language errors (such as spelling mistakes), and poor style in terms of "ambiance and aesthetics," including readable formatting of text.
they discovered that subjects’ perception of flaws influenced their judgment of a site being highquality and trustworthy. further, they found that the first perceived error had a greater negative impact than additional problems did, and they described website users as “quite critical, negative, and unforgiving.”5 briggs et al. did two studies of users’ likelihood of accepting advice presented on a website. of the three factors they considered—credibility, personalization, and predictability—credibility was the most influential in predicting whether users would accept or reject the advice. “it is clear,” they report, “that the look and feel of a web site is paramount in first attracting the attention of a user and signaling the trustworthiness of the site. the site should be . . . free of errors and clutter.”6 though none of these studies focuses on libraries or academic websites and though they use various metrics of trustworthiness, together they point to the importance of quality. text quality and functional usability should be important to library website managers. libraries ask users to entrust them to choose resources, answer questions, and provide research advice, so projecting competence and trustworthiness is essential. it is a challenge to balance the concern for quality with the desire to update the website frequently and with librarians’ workloads. this paper describes a solution implemented in drupal that promotes participation while maintaining quality. the editorial system described draws on the author’s prior experience working in book publishing at penguin and random house, showing how a system that ensures quality in print publishing can be adjusted to fit the needs of websites. ■■ setting editing most people think of editing in terms of improving the correctness of a document: fixing spelling or punctuation errors, fact-checking, and so forth. these factors are probably the most salient ones in the sense that they are editor’s note: this paper is adapted from a presentation given at the 2010 lita forum library websites are an increasingly visible representation of the library as an institution, which makes website quality an important way to communicate competence and trustworthiness to users. a website editorial workflow is one way to enforce a process and ensure quality. in a workflow, users receive roles, like author or editor, and content travels through various stages in which grammar, spelling, tone, and format are checked. one library used a workflow system to involve librarians in the creation of content. this system, implemented in drupal, an opensource content management system, solved problems of coordination, quality, and comprehensiveness that existed on the library’s earlier, static website. t oday, libraries can treat their websites as a significant point of user contact and as a way of compensating for decreases in traditional measures of library use, like gate counts and circulation.1 websites offer more than just a gateway to journals; librarians also can consider instructional or explanatory webpages as a type of public service interaction.2 as users flock to the web to access electronic resources and services, a library’s website becomes an increasingly prominent representation of the library. at the new york university health sciences libraries (nyuhsl), for example, statistics for the 2009–10 academic year showed 580,980 in-person visits for all five locations combined. by comparison, the website received 986,922 visits. 
in other words, the libraries received 70 percent more website visits than in-person visits. many libraries conduct usability testing to determine whether their websites meet the functional needs of their users. a concern related to usability is quality: users form an impression of the library partly based on how it presents itself via the website. as several studies outside the library arena have shown, users’ experience of a website leads them to attribute characteristics of competence and trustworthiness to the sponsoring organization. tseng and fogg, discussing non-web computer systems, present “surface credibility” as one of the types of credibility affecting users. they suggest that “small computer errors have disproportionately large effects on perceptions of credibility.”3 in another paper by fogg et al., “amateurism” is one of seven factors in a study of website credibility. the authors recommend that “organizations that care about credibility should be ever vigilant—and perhaps obsessive—to avoid small glitches in their websites. . . . even one typographical error or a single broken link is damaging.”4 emily g. morton-owens (emily.morton-owens@med.nyu.edu) is web services librarian, new york university health sciences libraries, new york. 92 information technology and libraries | september 2011 happens when a page moves from one state to another. the very simple workflow in figure 1 shows two roles (author and editor) and three states (draft, approval, and published). there are two transitions with permissions attached to them. only the author can decide when he or she is done working and make the transition from draft to approval. only the editor can decide when the page is ready and make the transition from approval to published. (in these figures, dotted borders indicate states in which the content is not visible to the public.) a book publishing workflow involves perhaps a dozen steps in which the manuscript passes between the author, his or her agent, and various editorial staff. a year can pass between receiving the manuscript and publishing the book. the reason for that careful, conservative process is that it is very difficult to fix a book once thousands of copies have been printed in hardcover. by contrast, consider a newspaper: a new version appears every day and contains corrections from previous editions. a newspaper workflow is hardly going to take a full year. a website is even more flexible than a newspaper because it can be fixed or improved at any time. the kind of multistep process used for books and newspapers is effective, but not practical for websites. a website should have a workflow for editorial quality control, but it should be proportional to the format in terms of the number of steps, the length of the process, and the number of people involved. alternate workflow models this paper focuses on a contributor/editor model in which multiple authors create material that is vetted by a central authority: the editor. other models could be implemented with much the same tools. for example, in a peer-review system as is used for academic journals, there is a reviewer role, and an article could have states like “published,” “under review,” “conditionally accepted,” and so forth. most noticeable when neglected. editors, however, have several other important roles. for example, they select what will be published. in book publishing, that involves rejecting the vast majority of material that is submitted. 
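the very basic workflow of figure 1 can be sketched as a small state machine in which each transition carries the role allowed to perform it. the code below is an illustrative sketch only, not drupal's actual workflow api; the state names follow the figure, and the move helper is invented for the example.

```python
# states a page can be in, and which role may perform each transition
STATES = {"draft", "approval", "published"}

TRANSITIONS = {
    ("draft", "approval"): "author",      # author decides the draft is done
    ("approval", "published"): "editor",  # editor decides the page is ready
    ("approval", "draft"): "editor",      # editor can send it back for revision
}

def move(page: dict, new_state: str, acting_role: str) -> dict:
    """advance a page through the workflow if the acting role is allowed to."""
    allowed_role = TRANSITIONS.get((page["state"], new_state))
    if allowed_role is None:
        raise ValueError(f"no transition from {page['state']} to {new_state}")
    if acting_role != allowed_role:
        raise PermissionError(f"only the {allowed_role} may make this transition")
    page["state"] = new_state
    return page

page = {"title": "new database trial", "state": "draft"}
move(page, "approval", acting_role="author")    # author submits the draft
move(page, "published", acting_role="editor")   # editor publishes it
# an author trying to publish directly would raise PermissionError
```

a peer-review or moderation model of the kind mentioned above would use the same machinery with different state names and roles, which is why the transition table, rather than the code around it, is the part worth designing carefully.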
in many professional contexts, however, it means soliciting contributions and encouraging authors. either way, the editor has a role in deciding what topics are relevant and what authors should be involved. additionally, editors are often involved in presenting their products to audiences. in book publishing, that can mean weighing in on jacket designs or soliciting blurbs from popular authors. on websites, it might mean choosing templates or fonts. editors want to make materials attractive and accessible to the right audience. together, correctness, choice, and presentation are the main concerns of an editor and together contribute to quality. each of these ideas can be considered in light of library websites. correctness means offering information that is current and free of errors, contradictions, and confusing omissions. it also means representing the organization well by having text that is well written and appropriate for the audience. writing for the web is a special skill; people reading from screens have a tendency to skim, so text should be edited to be concise and preferably organized into short chunks with “visible structure.”7 there is also good guidance available about using meaningful link words, action phrases, and “layering” to limit the amount of information presented at once.8 of course, correctness also means avoiding the kind of obvious spelling and grammar mistakes that users find so detrimental. choice probably will not involve rejecting submissions to the website. instead, in a library context it could mean identifying information that should appear on the website and writing or soliciting content to answer that need. presentation may or may not have a marketing aspect. a public library’s website may advertise events and emphasize community participation. as an academic medical library, nyuhsl has in some sense a captive audience, but it is still important to communicate to users that librarians understand their unique and highlevel information needs and are qualified to partner with them. workflow a workflow is a way to assign responsibility for achieving the goals of correctness, choice, and presentation. it breaks the process down into steps that ensure the appropriate people review the material. it also leaves a paper trail that allows participants to see the history and status of material. workflow can alleviate the coordination problems that prevent a website from exhibiting the quality it should. a workflow is composed of states, roles, and transitions. pages have states (like “draft” or “published”) and users have roles (like “contributor” or “editor”). a transition figure 1. very basic workflow editorial and technological workflow tools to promote website quality | morton-owens 93 effect was on the quality of the website, which contained mistakes and confusing information. ■■ methods nyuhsl workflow and solutions to resolve its web management issues, nyuhsl chose to work with the drupal content management system (cms). the ability to set up workflow and inventory content by date, subject, or author was a leading reason for that decision. other reasons included usability of the backend for librarians, theming options, the scripting language the cms uses (php), and drupal’s popularity with other libraries and other nyu departments.9 nyuhsl’s drupal environment has four main user roles: 1. anonymous: these are visitors to the nyuhsl site who are not logged in (i.e., library users). they have no permissions to edit or manage content. 
they have no editorial responsibilities. 2. library staff: this group includes all the staff content authors. their role is to notice what content library users need and to contribute it. staff have been encouraged to view website contributions as something casual—more akin to writing an e-mail than writing a journal article. 3. marketing team: this five-member group checks content that will appear on the homepage. their mandate is to make sure that the content is accurate about library services and resources and represents the library well. its members include both librarians and staff with relevant experience. 4. administrators: there are three site admins; they have the most permissions because they also build the site and make changes to how it works. two of the three admins have copyediting experience from prior jobs, so they are responsible for content approvals. they copyedit for spelling, grammar, and readability. admins also check for malformed html created by the wysiwyg (what you see is what you get) interface provided for authors, and they use their knowledge of other material on the site to look out for potential conflicts or add relevant links. returning to the themes of correctness, choice, and presentation, it could be said that librarian authors are responsible for choice (deciding what to post), the marketing team is responsible for choice and presentation, and the administrators are responsible for all three. an important thing to understand is that each person in a role has the same permissions, and any one of in an upvoting system like reddit (http://reddit .com), content is published by default, any user has the ability to upvote (i.e., approve) a piece of content, and the criterion for being featured on the front page is the number of approvals. in a moderation system, any user can submit content and the default behavior is for the moderator to approve anything that is not outright offensive. the moderator never edits, just chooses the state “approved” or the state “denied.” moderation is often used to manage comments. another model, not considered here, is to create separate “staging” and “production” websites. content and features are piloted on the staging site before being pushed to the live site. (nyuhsl’s workflow occurs all on the live site.) still, even in a staging/production system the workflow is implicit in choosing someone who has the permission and responsibility to push the staging site to the production site. problems at nyuhsl in 2007, the web services librarian position at nyuhsl had been open for nearly a year. librarians who needed to post material to the website approached the head of library systems or the “sysadmin.” both of them could post pages, but they did not proofread. pages that became live on the website stayed: they were never systematically checked. if a librarian or user noticed a problem with a page, it was not clear who had the correct information or was responsible for fixing it. often, pages that were found to be out-of-date would be delinked from other pages but were left on the server and thus findable via search engines or bookmarks. because only a few people had ftp access to the server, but authored little content, the usernames shown on the server were useless for determining who was responsible for a page. similarly, timestamps on the server were misleading; someone might fix one link on a page without reviewing the rest of it, so the page could have a recent timestamp but be full of outdated information. 
even after a new web services librarian started in 2007, problems remained. the new librarian took over sole responsibility for posting content, which made the responsibility clearer but created a bottleneck, for example, if she went on vacation. furthermore, in a library with five locations and about sixty full-time employees, it was hard for one person to do justice to all the libraries’ activities. if a page required editing, there was no way to keep track of whose turn it was to work on the document. there also was no automatic notification when a page was published. this made it possible for content to go astray and be forgotten. these problems added up to frustration for would-be content authors, a time drain for systems staff, and less time to create new content and sites. the most significant 94 information technology and libraries | september 2011 at the top of the homepage. their appearance should not be delayed, so any staff author can publish one. class sessions are specific dates, times, and locations that a class is being offered. these posts are assembled from prewritten text, so there is no way to introduce errors and no reason to route them through an approval step. figure 2 illustrates the main steps of the three cases. the names of the states are shown with arrows indicating which role can make each transition. unlabeled arrows mean that any staff member can perform that step. figure 3 shows how, at each approval step, content can be sent back to the author (with comments) for revision. although this happens rarely, it is important to have a way to communicate with the author in a way that is traceable by the workflow. figure 4 illustrates the concept of retirement. nyuhsl needed a way to hide content from library users and search engines, but it is dangerous to allow library staff to delete content. also, old content is sometimes useful to refer to or can even be republished if the need arises. any library staff user can retire content if they recognize it as no longer relevant or appropriate. additionally, library staff can resurrect retired content by resetting it to the draft state. that is, they cannot directly publish retired content (because they do not have permission to publish), but they can put it back on the path to being published by saving it as a draft, editing, and resubmitting for approval. figure 5 shows that library staff do not really need to understand the details of workflow. for any new content, they only have two options: keep the content in the draft state or move it on to whatever next step is available. all them can perform an action. the five marketing team members do not vote on the content, nor do they all have to approve it; instead, any one of them, who happens to be at his workstation when they get a notification, is sufficient to perform the marketing team duty. also, the marketing team members and administrators do not “self-approve”—no matter how good an editor someone may be, he or she is rarely good at editing her own work. nyuhsl’s workflow considers three cases: 1. most types of content are reviewed by one of the administrators before going live. 2. content types that appear on the homepage (i.e., at higher visibility) are reviewed by a member of the marketing team before being reviewed by an administrator. 3. two types of content do not go through any workflow. alerts are urgent messages that appear in red figure 2. approval steps figure 3. returning contents for edits figure 4. 
retirement editorial and technological workflow tools to promote website quality | morton-owens 95 this may sound like a large volume of e-mail, but it does not appear to bother library staff. the subject line of every e-mail generated by the system is prefaced with “[hsl site]” for easy filtering. also, every e-mail is signed with “love, the nyuhsl website.” this started as a joke during testing but was retained because staff liked it so much. one described it as giving the site a “warm, fuzzy feeling.” drupal modules nyuhsl developers used a number of different drupal modules to achieve the desired workflow functionality. a simple system could be achieved using fewer modules; the book using drupal offers a good walkthrough of workflow, actions, and trigger.10 of course, it also would be possible to implement these ideas in another cms or in a homegrown system. this list does not describe how to configure each module because the features are constantly evolving; more information is available on the drupal website.11 the drupal modules used include: ■■ workflow ■■ actions ■■ trigger ■■ token ■■ module grants ■■ wysiwyg, imce, imce wysiwyg api bridge ■■ node expire ■■ taxonomy role ■■ ldap integration ■■ rules ■■ results participation figure 6 shows the number of page revisions per person from july 14, 2009, to november 4, 2010. since many pages are static and were created only once, but need to be updated regularly, a page creation and a page update count equally in this accounting, which was drawn from the node_revisions table in drupal. it gives a general sense of content-related activity. a reasonable number of staff have logged in, including all of the librarians and a number of staff in key positions (such as branch managers). the black bars represent the administrators of the website. it is clear that the workflow system, while broadening participation, has hardly diffused primary responsibility of managing the website. the web services librarian and web manager have by far the most page edits, as they both write new content and edit content written by all other users. of the other options are hidden because staff do not have permission to perform them. the status of content in the workflow can be checked by clicking on the workflow tab of each page, but it also is tracked by notification e-mails. when the content enters a state requiring an approval, each person in that approving role gets an e-mail letting them know something needs their attention. the e-mail includes a link directly to the editing page. for example, if a librarian writes a blog post and changes its state from “draft” to “ready for marketing approval,” he or she gets a confirmation e-mail that the post is in the marketing approval queue. the marketing team members each get an e-mail asking them to approve the post; only one needs to do so. once someone has performed that approval, the marketing team members receive an e-mail letting them know that no further action is required. now the content is in the “ready for approval” state and the author gets another e-mail notification. the administrators get a notification with a link to edit the post. once an administrator gives the post final approval, the author gets an e-mail indicating that the post is now live. the nyuhsl website workflow system also includes reminders. each piece of content in the system has an author (authorship can be reassigned, so it is not necessarily the person who originally created the page). 
the author receives an e-mail every four months reminding him or her to check the content, revise it if necessary, and re-save it so that it gets a new timestamp. if the author does not do so, he or she will continue to get reminders until the task is complete. also, the site administrators can refer to a list of content that is out of date and can follow up in person if needed. note that reminders only apply to static content types like pages and faqs, not to blog posts or event announcements, which are not expected to have permanent relevance. figure 5. workflow choices for library staff users 96 information technology and libraries | september 2011 check the status by clicking on the workflow tab. this eliminates the discouraging mystery of having content get lost on the way to being published. ■■ identifying “problem” content: the node expire module has been modified to send e-mail reminders about stale content; as a result, this “problem” figure 7 shows the distribution of content updates once the web team members have been removed. it is clear that a small number of heroic contributors are responsible for the bulk of new content and updates, with other users logging on sporadically to address specific needs or problems. how editorial workflow addresses nyuhsl’s problems different aspects of the nyuhsl editorial workflow address different website problems that existed before the move to a cms. together, the workflow features create a clearly defined track that marches contributed content along a path to publication while always making the history and status of that content clear. ■■ keeping track of who wrote what when: this information is collected by the core drupal software and visible on administrative pages. (drupal also can be customized to display or sort this information in more convenient ways.) ■■ preventing mistakes and inconsistencies: this requires a human editor, but drupal can be used to formalize that role, assign it to specific people, and ensure nothing gets published without being reviewed by an editor. ■■ bottlenecks: nyuhsl eliminated bottlenecks that stranded content waiting for one person to post it by creating roles with multiple members, any one of whom can advance content to the next state. there is no step in the system that can be performed by only one person. ■■ knowledge: the issue of having too much going on in the library for one person to report on was addressed by making it easier for more people to contribute. drupal encourages this through its usability (especially a wysiwyg editor), and workflow makes it safe by controlling how the contributions are posted. ■■ “lost” content: when staff contribute content, they get e-mail notifications about its status and also can figure 6. number of revisions by user each user is indicated by their employee type rather than by name. figure 7. number of revisions by user, minus web team each user is indicated by their employee type rather than by name editorial and technological workflow tools to promote website quality | morton-owens 97 places web content in the context of other communication methods, like e-mail marketing, press releases, and social media.12 in her view, it is not enough to consider a website on its own; it has to be part of a complete strategy for communicating with an organization’s audience. libraries embarking on a website redesign would benefit from contemplating this larger array of strategic issues in addition to the nitty-gritty of creating a process to ensure quality. 
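the four-month reminder loop described earlier in this section can be sketched as a periodic check over static content. the function names, node fields, and example dates below are hypothetical and stand in for what the modified node expire module does in drupal, while the subject prefix and sign-off echo the conventions mentioned above.

```python
from datetime import datetime, timedelta

REVIEW_INTERVAL = timedelta(days=120)   # roughly four months
STATIC_TYPES = {"page", "faq"}          # reminders apply only to static content types

def stale_content(nodes: list, now: datetime) -> list:
    """return static pages whose last save is older than the review interval."""
    return [n for n in nodes
            if n["type"] in STATIC_TYPES
            and now - n["last_saved"] > REVIEW_INTERVAL]

def reminder_email(node: dict) -> dict:
    """draft the reminder that would go to the page's current author."""
    return {
        "to": node["author_email"],
        "subject": f"[hsl site] please review: {node['title']}",
        "body": ("this page has not been re-saved in over four months. "
                 "please check it, revise if necessary, and re-save it.\n\n"
                 "love, the nyuhsl website"),
    }

nodes = [
    {"type": "page", "title": "interlibrary loan", "author_email": "librarian@example.edu",
     "last_saved": datetime(2011, 1, 15)},
    {"type": "blog", "title": "new e-books", "author_email": "librarian@example.edu",
     "last_saved": datetime(2010, 6, 1)},   # blog posts are never flagged
]
for node in stale_content(nodes, now=datetime(2011, 7, 1)):
    print(reminder_email(node)["subject"])  # [hsl site] please review: interlibrary loan
```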
■■ conclusions nyuhsl differs from other libraries in its size, status as an academic medical library, level of it staffing, and other ways. some aspects of nyuhsl’s experience implementing editorial workflow will, however, likely be applicable to other libraries. it does not necessarily make sense to assign editorial responsibility to it staff; instead, there may be someone on staff who has editorial or journalistic experience and could serve as the content approver. many universities offer short copyediting courses, and a prospective website editor could attend such a course. implementing a workflow system, especially in drupal, requires a lot of detailed configuration. developers should make sure the workflow concept is clearly mapped out in terms of states, roles, and transitions before attempting to build anything. workflow can seem complicated to users too, so developers should endeavor to hide as much as possible from nonadministrators. small mistakes in drupal settings and permissions can cause confusing failures in the workflow system. for example, a user may find him or herself unable to advance a blog post from “draft” to “ready for approval,” or a state change from “ready for approval” to “live,” and may not actually cause the content to be published. it would save time in the long run to thoroughly test all the possibilities with volunteers who play each role before the site is in active use. finally, when the workflow is in place, the website’s managers may find themselves doing less writing and fewer content updates. they have a new role, though: to curate the site and support staff who use the new tools. the concept of editing is not yet consistently applied to websites unless the site represents an organization that already relies on editors (like a newspaper)—but it is gaining recognition as a best practice. if the website is the most readily available public face of an institution, it should receive editorial attention just as a brochure or fundraising letter would. workflow is one way that libraries can promote a higher level of quality and perceived competence and reliability through their website presence. content is usually addressed by library staff without the administrators/editors doing anything at all. the administrators also can access a page that lists all the content that has been marked as “expired” so they know with whom to follow up. ■■ outdated content: some content may be outdated and undesirable to show the public or be indexed by search engines, but be useful to librarians. it also is not safe to allow staff to delete content, as they may do so by accident. these issues are addressed by the notion of “retiring” content, which hides content by unpublishing it but does not delete it from the system. ■■ future work the workflow system sets up an environment that achieves nyuhsl’s goals, structurally speaking, but social (nontechnology) considerations prevent it from living up to its full potential. not all of the librarians contribute regularly. this is partly because they are busy, and writing web content is not one of their job requirements. another reason is that some staff are more comfortable using the system than others, a phenomenon that reinforces itself as the expert users spend more time creating content and become even more expert. a third cause is that not all librarians may perceive that they have something useful to say. reluctant contributors have no external motivation to increase their involvement. 
it would be helpful to formalize the role of librarians as content contributors. there is presently no librarian at nyuhsl whose job description includes writing content for the website; even the web services librarian is charged only with "coordinating, designing, and maintaining" sites. ideally, every librarian job description would include working with users and would mention writing website content as an important forum for that. that said, it is not clear what metric could be used to judge the contributions fairly. it also is important to continue to emphasize the value of content contributions so that librarians are motivated and feel recognized. even librarians whose specialties are not outreach-oriented (e.g., systems librarians) have expert knowledge that could be shared in, say, a short article on how to set up rss feeds.

workflow is part of a group of concerns being called "content strategy." this concept, which has grown in popularity since 2008, includes editorial quality alongside issues like branding/messaging, search engine optimization, and information architecture. a content strategist would be concerned with why content is meaningful in addition to how it is managed. in her brief, useful book on the topic, kristina halvorson places web content in the context of other communication methods, like e-mail marketing, press releases, and social media.12 in her view, it is not enough to consider a website on its own; it has to be part of a complete strategy for communicating with an organization's audience. libraries embarking on a website redesign would benefit from contemplating this larger array of strategic issues in addition to the nitty-gritty of creating a process to ensure quality.

■■ conclusions

nyuhsl differs from other libraries in its size, status as an academic medical library, level of it staffing, and other ways. some aspects of nyuhsl's experience implementing editorial workflow will, however, likely be applicable to other libraries. it does not necessarily make sense to assign editorial responsibility to it staff; instead, there may be someone on staff who has editorial or journalistic experience and could serve as the content approver. many universities offer short copyediting courses, and a prospective website editor could attend such a course. implementing a workflow system, especially in drupal, requires a lot of detailed configuration. developers should make sure the workflow concept is clearly mapped out in terms of states, roles, and transitions before attempting to build anything. workflow can seem complicated to users too, so developers should endeavor to hide as much of it as possible from nonadministrators. small mistakes in drupal settings and permissions can cause confusing failures in the workflow system. for example, a user may find him or herself unable to advance a blog post from "draft" to "ready for approval," or a state change from "ready for approval" to "live" may not actually cause the content to be published. it would save time in the long run to thoroughly test all the possibilities with volunteers who play each role before the site is in active use. finally, when the workflow is in place, the website's managers may find themselves doing less writing and fewer content updates. they have a new role, though: to curate the site and support staff who use the new tools. the concept of editing is not yet consistently applied to websites unless the site represents an organization that already relies on editors (like a newspaper), but it is gaining recognition as a best practice. if the website is the most readily available public face of an institution, it should receive editorial attention just as a brochure or fundraising letter would. workflow is one way that libraries can promote a higher level of quality and perceived competence and reliability through their website presence.

■■ acknowledgments

thank you to jamie graham, karen hanson, dorothy moore, and vikram yelanadu.

references

1. charles martell, "the absent user: physical use of academic library collections and services continues to decline 1995–2006," journal of academic librarianship 34 (2008): 400–407.
2. jeanie m. welch, "who says we're not busy? library web page usage as a measure of public service activity," reference services review 33 (2005): 371–79.
3. b. j. fogg and hsiang tseng, "the elements of computer credibility" (paper presented at chi '99, pittsburgh, pennsylvania, may 15–20, 1999): 82.
4. b. j. fogg et al., "what makes web sites credible? a report on a large quantitative study" (paper presented at sigchi '01, seattle, washington, mar. 31–apr. 4, 2001): 67–68.
5. andrea everard and dennis f. galletta, "how presentation flaws affect perceived site quality, trust, and intention to purchase from an online store," journal of management information systems 22 (2005–6): 79.
6. pamela briggs et al., "trust in online advice," social science computer review 20 (2002): 330.
7. patrick j. lynch and sarah horton, "online style," web style guide, 3rd ed., http://webstyleguide.com/wsg3/9-editorial-style/3-online-style.html (accessed dec. 1, 2010).
8. janice (ginny) redish, letting go of the words: writing web content that works (san francisco: morgan kaufman, 2007).
9. emily g. morton-owens, karen l. hanson, and ian walls, "implementing open-source software for three core library functions: a stage-by-stage comparison," journal of electronic resources in medical libraries 8 (2011): 1–14.
10. angela byron et al., using drupal (sebastopol, calif.: o'reilly, 2008).
11. all drupal modules can be found via http://drupal.org/project/modules.
12. kristina halvorson, content strategy for the web (berkeley, calif.: new riders, 2010).

the development and administration of automated systems in academic libraries

richard de gennaro: harvard university library, cambridge, mass.

the first part of this paper considers three general approaches to the development of an automation program in a large research library.
the library may decide simply to wait for developments; it may attempt to develop a total or integrated system from the start; or it may adopt an evolutionary approach leading to an integrated system. outside consultants, it is suggested, will become increasingly important. the second part of the paper deals with important elements in any program regardless of the approach. these include the building of a capability to do automation work, staffing, equipment, organizational structure, selection of projects, and costs.

since most computer-based systems in academic libraries at the present time are in the developmental or early operational stages when improvements and modifications are frequent, it is difficult to make a meaningful separation between the developmental function and the administrative or management function. development, administration, and operations are all bound up together and are in most cases carried on by the same staff. this situation will change in time, but it seems safe to assume that automated library systems will continue to be characterized by instability and change for the next several years. in any case, this paper will not attempt to distinguish between developmental and administrative functions but will instead discuss in an informal and non-technical way some of the factors to be considered by librarians and administrators when their thoughts turn, as they inevitably must, to introducing computer systems into their libraries or to expanding existing machine operations. alternative approaches to library automation will be explored first. there will follow a discussion of some of the important elements that go into a successful program, such as building a capability, a staff, and an organization. the selection of specific projects and the matter of costs will also be covered briefly.

approaches to library automation

devising a plan for automating a library is not entirely unlike formulating a program for a new library building. while there are general types of building best suited to the requirements of different types of library, each library is unique in some respects, and requires a building which is especially designed for its own particular needs and situation. as there are no canned library building programs, so there are no canned library automation programs, at least not at this stage of development; therefore the first task of a library administration is to formulate an approach to automation based on a realistic assessment of the institution's needs and resources.

certain newly-founded university libraries such as florida atlantic, which have small book collections and little existing bibliographical apparatus, have taken the seemingly logical course of attempting to design and install integrated computer-based systems for all library operations. certain special libraries with limited collections and a flexible bibliographical apparatus are also following this course. project intrex at m.i.t. is setting up an experimental library operation parallel to the traditional one, with the hope that the former will eventually transform or even supersede the latter. several older university libraries, including chicago, washington state, and stanford, are attempting to design total systems based on on-line technology and to implement these systems in modules.
many other university libraries (british columbia, harvard, and yale to name only a few) approach automation in an evolutionary way and are designing separate, but related, batch-processing systems for various housekeeping functions such as circulation, ordering and accounting, catalog input, and card production. still other libraries (princeton is a notable example) expect to take little or no action until national standardized bibliographical formats have been promulgated, and some order or pattern has begun to emerge from the experimental work that is in progress. only time will tell which of these courses will be most fruitful. meanwhile the library administrator must decide what approach to take; and the approach to automation, like that to a building program, must be based on local requirements and available resources (1,2). for the sake of this discussion the principal approaches will be considered under three headings: 1) the wait-for-developments approach, 2) the direct approach to a total system, and 3) the evolutionary approach to a total system. the use of outside consultants will also be discussed.

the wait-for-developments approach

this approach is based on the premise that practically all computer-based library systems are in an experimental or research-and-development stage with questionable economic justification, and that it is unnecessary and uneconomical for every library to undertake difficult and costly development work. the advocates of this approach suggest that library automation should not be a moon race and say that it makes sense to wait until the pioneers have developed some standardized, workable, and economical systems which can be installed and operated in other libraries at a reasonable cost. for many libraries, particularly the smaller ones, this is a reasonable position to take for the next few years. it is a cautious approach which minimizes costs and risks. for the larger libraries, however, it overlooks the fact that soon, in order to cope with increasing workloads, they will have to develop the capability to select, adapt, implement, operate, and maintain systems that were developed elsewhere. the development of this capability will take time and will be made more difficult by the absence of any prior interest and activity in automation within the adapting institution. the costs will be postponed and perhaps reduced because the late-starters will be able to telescope much of the process, like countries which had their industrial revolution late. however, it will take some courage and political astuteness for a library administrator to hold firmly to this position in the face of the pressures to automate that are coming from all quarters, both inside and outside the institution (3).

a major error in the wait-for-developments approach is the assumption that a time will come when the library automation situation will have shaken down and stabilized so that one can move into the field confidently. this probably will not happen for many years, if it happens at all, for with each new development there is another more promising one just over the horizon. how long does one wait for the perfect system to be developed so that it can be easily "plugged in," and how does one recognize that system when one sees it? there is real danger of being left behind in this position, and a large library may then find it difficult indeed to catch up.
the direct approach to a total system

this approach to library automation is based on the premise that, since a library is a total operating unit and all its varied operations are interrelated and interconnected, the logic of the situation demands that it be looked upon as a whole by the systems designers and that a single integrated or total system be designed to include all machinable operations in the library. such a system would make the most efficient and economical use of the capabilities of the computer. this does not require that the entire system be designed and implemented at the same time, but permits treating each task as one of a series of modules, each of which can be implemented separately, though designed as part of a whole. several large libraries have chosen this method and, while a good deal of progress is being made, these efforts are still in the early development stage. the university of chicago system is the most advanced (4).

unlike the evolutionary approach, which assumes that much can be done with local funds, home-grown staff, batch processing and even second generation computers, the total systems approach must be based on sophisticated on-line as well as batch-processing equipment. this equipment is expensive; it is also complex, requiring a trained and experienced staff of systems people and expert programmers to design, implement, and operate it effectively. since the development costs involved in this approach are considerable, exceeding the available resources of even the larger libraries, those libraries that are attempting this method have sought and received sizable financial backing from the granting agencies.

the total systems approach has logic in its favor: it focuses on the right goal and the goal will ultimately be attainable. the chief difficulty, however, is one of timing. the designers of these systems are trying to telescope the development process by skipping an intermediate stage in which the many old manual systems would have been converted to simple batch-processing or off-line computer systems, and the experience and knowledge thus acquired utilized in taking the design one step further into a sophisticated, total system using both on-line and batch-processing techniques. the problem is that we neither fully understand the present manual systems nor the implications of the new advanced ones. we are pushing forward the frontiers of both library automation and computer technology. it may well be that the gamble will pay off, but it is extremely doubtful that the first models of a total library system will be economically and technically viable. the best that can be hoped for is that they will work well enough to serve as prototypes for later models. while bold attempts to make a total system will unquestionably advance the cause of library automation in general, the pioneering libraries may very well suffer serious setbacks in the process, and the prudent administrator should carefully weigh the risks and the gains of this approach for his own particular library.

the evolutionary approach to a total system

this approach consists basically of taking a long-range, conservative view of the problem of automating a large, complex library. the ultimate goal is the same as that of the total systems approach described in the preceding section, but the method of reaching it is different.
in the total systems approach, objectives are defined, missions for reaching those objectives are designed, and the missions are computerized, usually in a series of modules. in the evolutionary approach, the library moves from traditional manual systems to increasingly complex machine systems in successive stages to achieve a total system with the least expenditure of effort and money and with the least disruption of current operations and services (5).

in the first stage the library undertakes to design and implement a series of basic systems to computerize various procedures using its own staff and available equipment. this is something of a bootstrap operation, the basic idea of which is to raise the level of operations (circulation, acquisitions, catalog input, etc.) from existing manual systems to simple and economical machine systems until major portions of the conventional systems have been computerized. in the process of doing this, the library will have built up a trained staff, a data processing department or unit with a regular budget, some equipment, and a space in which to work: in short, an in-house capability to carry on complex systems work. during this first stage the library will have been working with tried and tested equipment and software packages, probably of the second generation variety; meanwhile, third generation computers with on-line and time-sharing software are being debugged and made ready for use in actual operating situations.

at some point the library itself, computer hardware and software, and the state of the library automation art will all have advanced to a point where it will be feasible to undertake the task of redesigning the simple stage-one systems into a new integrated stage-two system which builds upon the designs and operating experience obtained with the earlier systems. these stage-one systems will have been, for the most part, mechanized versions of the old manual systems; but the stage-two systems, since they are a step removed from the manual ones, can be designed to incorporate significant departures from the old way of doing things and take advantage of the capabilities of the advanced equipment and software that will be used. the design, programming, and implementation of these stage-two systems will be facilitated by the fact that the library is going from one logical machine system to another, rather than from primitive unformalized manual systems to highly complex machine systems in one step. because existing manual systems in libraries produce no hard statistical data about the nature and number of transactions handled, stage-one machine systems have had to be designed without benefit of this essential data. however, even the simplest machine systems can be made to produce a wide variety of statistical data which can be used to great advantage by the designers of stage-two systems. the participation of non-library-oriented computer people in stage-two design will also be facilitated by the fact that they will be dealing with formalized machine systems and records in machine readable form with which they can easily cope. while the old stage one of library automation was one in which librarians almost exclusively did the design and programming, it is doubtful that stage-two systems can or should be done without the active aid of computer specialists.
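as an aside, the point above about statistical data from stage-one systems can be made concrete with a small sketch. it is purely illustrative; the transaction file name and its columns are invented for the example and are not drawn from any system the paper describes. it simply shows how even a flat batch circulation file yields the transaction counts that the old manual files never recorded and that stage-two designers need.

```python
# illustrative only: a stage-one batch file of circulation transactions can be
# summarized to give stage-two designers basic workload statistics.
import csv
from collections import Counter

def summarize(path):
    """Count transactions per month and per borrower category from a flat file."""
    by_month, by_category = Counter(), Counter()
    with open(path, newline="") as f:
        # assumed columns: date (yyyy-mm-dd), action, borrower_category
        for row in csv.DictReader(f):
            by_month[row["date"][:7]] += 1          # e.g. "1968-03"
            by_category[row["borrower_category"]] += 1
    return by_month, by_category

# by_month, by_category = summarize("circulation_transactions.csv")
```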
in stage one it was easier for librarians to learn computing and to do the job themselves than it was to teach computer people about the old manual systems and the job to be done to convert them. this may no longer be the case in dealing with redesign of old machine systems into very complex systems to run on third or fourth generation equipment in an on-line, time-sharing environment. there is now a generation of experienced computer-oriented librarians capable of specifying the job to be done and knowledgeable enough to judge the quality of the work that has been done by the experts. there is no reason why a team of librarians and computer experts should not be able to work effectively together to design and implement future library systems. as traditional library systems are replaced by machine systems, the specialized knowledge of them becomes superfluous, and it was this type of knowledge that used to distinguish the librarian from the computer expert. just as there is a growing corps of librarians specializing in computer work, so there is a growing corps of computer people specializing in library work. it is with these two groups working together as a team that the hope of the future lies. the question of who is to do library automation (librarians or computer experts) is no longer meaningful; library automation will be done by persons who are knowledgeable about it and who are deeply committed to it as a specialty; whether they have approached it through a background of librarianship or technology will be of little consequence. experience has shown that computer people who have made a full-time commitment to the field of library automation have done some of the best work to date.

stage-two, or advanced integrated library systems, may be built by a team of library and computer people of various types working as staff members of the library, as has been suggested in the preceding discussion, but this approach also has its weaknesses. for example, let us assume that a large library has finally brought itself through stage one and is now planning to enter the second stage. it may have acquired a good deal of the capability to do advanced work, but its staff may be too small and too inexperienced in certain aspects of the work to undertake the major task of planning, designing, and implementing a new integrated system. additional expert help may be needed, but only on a temporary basis during the planning and design stages. such people will be hard to find, and also hard to hire within some library salary structures. they will be difficult to absorb into the library's existing staff, administrative, and physical framework. they may also be difficult to separate from the staff when they are no longer needed.

use of outside consultants

there are alternative approaches to creating advanced automated systems. the discussion that follows will deal with one of the most obvious: to contract much of the work out to private research and development firms specializing in library systems. what comes to mind here is an analogy with the employment of specialized talents of architects, engineers, and construction companies in planning and building very large, complex and costly library buildings, which are then turned over to librarians to operate.
when a decision has been made to build a new building, the university architect is not called in to do the job, nor is an architect added to the library staff, nor are librarians on the staff trained to become architects and engineers qualified to design and supervise the construction of the building. most libraries have on their staffs one or two librarians who are experienced and knowledgeable enough to determine the over-all requirements of the new building, and together they develop a building program which outlines the general concept of the building and specifies various requirements. a qualified professional architect is commissioned to translate the program into preliminary drawings, and there follows a continuing dialogue between the architect and the librarians which eventually produces acceptable working drawings of a building based on the original program. for tasks outside his area of competence, the architect in turn engages the services of various specialists, such as structural and heating and ventilating engineers. both the architect and the owners can also call on library consultants for help and advice if needed. the architect participates in the selection of a construction company to do the actual building and is responsible for supervising the work and making sure that the building is constructed according to plans and contracts. upon completion, the building is turned over to the owners, and the librarians move in and operate it and see to its maintenance. in time, various changes and additions will have to be made. minor ones can be made by the regular buildings staff of the institution, but major ones will probably be made with the advice and assistance of the original architect or some other.

in the analogous situation, the library would have its own experienced systems unit or group capable of formulating a concept and drawing up a written program specifying the goals and requirements of the automated system. a qualified "architect" for the system would be engaged in the form of a small firm of systems consultants specializing or experienced in library systems work. their task, like the architect's, would be to turn the general program into a detailed system design with the full aid and participation of the local library systems group. this group would be experienced and competent enough to make sure that the consultants really understood the program and were working in harmony with it. after an acceptable design had emerged from this dialogue, the consultant would be asked to help select a systems development firm which would play a role similar to that of the construction company in the analog: to complete the very detailed design work and to do the programming and debugging and implementation of the system. the consultant would oversee this work, just as the architect oversees the construction of a building. the local library group will have actively participated in the development and implementation of the system and would thus be competent to accept, operate, maintain and improve it.

success or failure in this approach to advanced library automation will depend to a large extent on the competence of the "architect" or consultant who is engaged. until recently this was not a very promising route to take for several reasons.
there were no firms or consultants with the requisite knowledge and experience in library systems, and the state of the library automation art was confused and lacking in clear trends or direction. it was generally felt that batch-processing systems on second and even third generation computing equipment could and should be designed and installed by local staff in order to give them necessary experience and to avoid the failures that could come from systems designed outside the library. library automation has evolved to a point where there is a real need for advanced library systems competence that can be called upon in the way that has been suggested, and individuals and firms will appear to satisfy that need. it is very likely, however, that the knowledge and the experience that is now being obtained in on-line systems by pioneering libraries such as the university of chicago, washington state university and stanford university, will have to be assimilated before we can expect competent consultants to emerge.

the chief difficulty with the architect-and-building analog is that while the process of designing and constructing library buildings is widely understood, there being hundreds of examples of library buildings which can be observed and studied as precedents, the total on-line library system has yet to be designed and tested. there are no precedents and no examples; we are in the position of asking the "architect" to design a prototype system, and therein lies the risk. after this task has been done several times, librarians can begin to shop around for experienced and competent "architects" and successful operating systems which can be adapted to their needs. the key problem here, as always in library automation, is one of correct timing: to embark on a line of development only when the state of the art is sufficiently advanced and the time is ripe for a particular new development.

building the capability for automation

regardless of the approach that is selected, there are certain prerequisites to a successful automation effort, and these can be grouped under the rubric of "building the capability." to build this capability requires time and money. it consists of a staff, equipment, space, an organization with a regular budget, and a certain amount of know-how which is generally obtained by doing a series of projects. success depends to a large extent on how well these resources are utilized, i.e. on the overall strategy and the nature and timing of the various moves that are made. much has already been said about building the capability in the discussion on the approaches to automation, and what follows is an expansion of some points that have been made and a recapitulation of others.

staff

since nothing gets done without people, it follows that assembling, training, and holding a competent staff is the most important single element in a library's automation effort. the number of trained and experienced library systems people is still extremely small in relation to the ever-growing need and demand. to attract an experienced computer librarian and even to hold an inexperienced one with good potential, libraries will have to pay more than they pay members of the staff with comparable experience in other lines of library work. this is simply the law of supply and demand at work. to attract people from the computer field will by the same token require even higher salaries.
in addition, library systems staff, because of the rate of development of the field and the way in which new information is communicated, will have to be given more time and funds for training courses and for travel and attendance at conferences than has been the case for other library staff.

the question of who will do library automation (librarians or computer experts) has already been touched upon in another context, but it is worth emphasizing the point that there is no unequivocal answer. there are many librarians who have acquired the necessary computer expertise and many computer people who have acquired the necessary knowledge of library functions. the real key to the problem is to get people who are totally committed to library automation whatever their background. computer people on temporary loan from a computing center may be poor risks, since their professional commitment is to the computer world rather than that of the library. they are paid and promoted by the computing center and their primary loyalty is necessarily to that employer. computer people, like the rest of us, give their best to tasks which they find interesting and challenging, and by and large, they tend to look upon the computerization of library housekeeping tasks as trivial and unworthy of their efforts. on the other hand, a first-rate computer person who has elected to specialize in library automation and who has accepted a position on a library staff may be a good risk, because he will quickly take on many of the characteristics of a librarian yet without becoming burdened by the full weight of the conventional wisdom that librarians are condemned to carry. the ideal situation is to have a staff large enough to include a mixture of both types, so that each will profit by the special knowledge and experience of the other. to bring in computer experts inexperienced in library matters to automate a large and complex library without the active participation of the library's own systems people is to invite almost certain failure. outsiders, no matter how competent, tend to underestimate the magnitude and complexity of library operations; this is true not only of computing center people but also of independent research and development firms.

a library automation group can include several different types of persons with very different kinds and levels of qualifications. the project director or administrative head should preferably be an imaginative and experienced librarian who has acquired experience with electronic data processing equipment and techniques, and an over-all view of the general state of the library automation art, including its potential and direction of development. there are various levels of library systems analysts and programmers, and the number and type needed will depend on the approach and the stage of a particular library's automation effort. the critical factor is not numbers but quality. there are many cases where one or two inspired and energetic systems people have far surpassed the efforts of much larger groups in both quality and quantity of work. some of the most effective library automation work has been done by the people who combine the abilities of the systems analyst with those of the expert programmer and are capable of doing a complete project themselves. a library that has one or two really gifted systems people of this type and permits them to work at their maximum is well on the way to a successful automation effort.
as a library begins to move into development of on-line systems, it will need specialist programmers in addition to the systems analysts described above. these programmers need not be, and probably will not be, librarians. other members of the team, again depending on the projects, will be librarians who are at home in the computer environment but who will be doing the more traditional types of work, such as tagging and editing machine catalog records. in any consideration of library automation staff, it would be a mistake to underestimate the importance of the role of keypunchers, paper tape typists, and other machine operators; it is essential that these staff members be conscientious and motivated persons. they are responsible for the quality and quantity of the input, and therefore of the output, and they can frequently do much to make or break a system. a good deal of discussion and experimentation has gone into the question of the relative efficiency of various keyboarding devices for library input, but little consideration is given to the human operators of the equipment. experience shows that there can be large variations in the speed and accuracy of different persons doing the same type of work on the same machine.

equipment

one of the lessons of library automation learned during the last few years is that a library cannot risk putting its critical computer-based systems onto equipment over which it has no control. this does not necessarily mean that it needs its own in-house computer. however, if it plans to rely on equipment under the administrative control of others, such as the computer center or the administrative data processing unit, it must get firm and binding commitments for time, and must have a voice in the type and configuration of equipment to be made available. the importance of this point may be overlooked during an initial development period, when the library's need for time is minimal and flexible; it becomes extremely critical when systems such as acquisitions and circulation become totally dependent on computers. people at university computing centers are generally oriented toward scientific and research users and in a tight situation will give the library's needs second priority; those in administrative data processing, because they are operations oriented, tend to have a somewhat better appreciation of the library's requirements. in any case, a library needs more than the expressed sympathy and goodwill of those who control the computing equipment; it needs firm commitments.

for all but the largest libraries, the economics of present-day computer applications in libraries make it virtually impossible to justify an in-house machine of the capacity libraries will need, dedicated solely or largely to library uses. even the larger libraries will find it extremely difficult to justify a high-discount second generation machine or a small third generation machine during the period when their systems are being developed and implemented a step or a module at a time. eventually, library use may increase to a point where the in-house machine will pay for itself, but during the interim period the situation will be uneconomical unless other users can be found to share the cost. in the immediate future, most libraries will have to depend on equipment located in computing or data processing centers.
the recent experience of the university of chicago library, which is pioneering on-line systems, suggests that this situation is inevitable, given the high core requirements and low computer usage of library systems. experience at the university of missouri (6) suggests that the future will see several libraries grouping to share a machine dedicated to library use; this may well be preferable to having to share with research and scientific users elsewhere within the university. a clear trend is not yet evident, but it seems reasonable to suppose that in the next few years sharing of one kind or another will be more common than having machines wholly assigned to a single library; and that local situations will dictate a variety of arrangements. while it is clear that the future of library automation lies in third-generation computers, much of their promise is as yet unfulfilled, and it would be premature at this point to write off some of the old, reliable, second-generation batch-processing machines. the ibm 1401, for example, is extremely well suited for many library uses, particularly printing and formatting, and it is a machine easily mastered by the uninitiated. this old workhorse will be with us for several more years before it is retired to majorca along with obsolete paris taxis.

organization

when automation activity in a library has progressed to a point where the systems group consists of several permanent professionals and several clericals, it may be advisable to make a permanent place for the group in the library's regular organizational structure. the best arrangement might be to form a separate unit or department on an equal footing with the traditional departments such as acquisitions, cataloging, and public services. this systems department would have a two-fold function: it would develop new systems and operate implemented systems; and it would bring together for maximum economy and efficiency most of the library's data processing equipment and systems staff. it will require adequate space of its own and, above all, a regular budget, so that permanent and long-term programs can be developed and sustained on something other than an ad hoc basis.

there are other advantages to having an established systems department or unit. it gives a sense of identity and esprit to the staff; and it enables them to work more effectively with other departments and to be accepted by them as a permanent fact of life in the library, thereby diminishing resistance to automation. let there be no mistake about it: the systems group will be a permanent and growing part of the library staff, because there is no such thing as a finished, stable system. (there is a saying in the computer field which goes "if it works, it's obsolete.") the systems unit should be kept flexible and creative. it should not be allowed to become totally preoccupied with routine operations and submerged in its day-to-day workload, as is too frequently the case with the traditional departments, which consequently lose their capacity to see their operations clearly and to innovate. part of the systems effort must be devoted to operational systems, but another part should be devoted to the formulation and development of new projects. the creative staff should not be wasted running routine operations.
there has never been any tradition for research and development work in libraries; they were considered exclusively service and operational institutions. the advent of the new technology is forcing a change in this traditional attitude in some of the larger and more innovative libraries, which are doing some research and a good deal of development. it is worth noting that a concomitant of research and development is a certain amount of risk but that, while there is no such thing as change without risk, standing pat is also a gamble. not every idea will succeed and we must learn to accept failures, but the experiments must be conducted so as to minimize the effect of failure on actual library operations.

automated systems are never finished; they are open-ended. they are always being changed, enlarged, and improved; and program and system maintenance will consequently be a permanent activity. this is one of the chief reasons why the equipment and the systems group should be concentrated in a separate department. the contrary case, namely dispersion of the operational aspects among the departments responsible for the work, may be feasible in the future as library automation becomes more sophisticated and peripheral equipment becomes less expensive, but the odds at this time appear to favor greater centralization.

the harvard university library has created, with good results, a new major department along the lines suggested above, except that it also includes the photo-reproduction services. the combination of data processing and reprography in a single department is a natural and logical relationship and one which will have increasingly important implications as both technologies develop concurrently and with increasing interdependence in the future. even at the present time, there is sufficient relationship between them so that the marriage is fruitful and in no way premature. while computers have had most of the glamour, photographic technology in general, and particularly the advent of the quick-copying machine, during the last seven years has so far had a more profound and widespread impact on library resources and services to readers than the entire field of computers and data processing. within the next several years, computer and reprographic technology will be so closely intertwined in libraries as to be inseparable. it would be a mistake to sell reprography short in the coming revolution.

project selection

no academic library should embark on any type of automation program without first acquiring a basic knowledge of the projects and plans of the library of congress, the national library of medicine, the national library of agriculture, and certain of their joint activities, such as the national serials data program. as libraries with no previous experience with data processing systems move into the field of automation, they frequently select some relatively simple and productive projects to give experience to the systems staff and confidence in machine techniques to the rest of the library staff. precise selection will depend on the local situation, but projects such as the production of lists of current journals (not serials check-in), lists of reserve books, lists of subject headings, circulation, and even acquisitions ordering and accounting systems are considered to be the safest and the most productive type of initial projects.
since failures in the initial stage will have serious psychological effects on the library administration and entire staff, it is best to begin with modest projects. until recently it was fashionable to tackle the problem of automating the serials check-in system as a first project on the grounds that this was one of the most important, troublesome, and repetitive library operations and was therefore the best area in which to begin computerization. fortunately, a more realistic view of the serials problem has begun to prevail: that serial receipts is an extremely complex and irregular library operation and one which will probably require some on-line updating capabilities, and complex file organization and maintenance programs. in any case, it is decidedly not an area for beginners.

a major objection to all of the projects mentioned is that they do not directly involve the catalog, which is at the heart of library automation. now that the marc ii format has been developed by the library of congress and is being widely accepted as the standardized bibliographical and communications format, the most logical initial automation effort for many libraries will be to adapt to their own environments the input system for current cataloging which is now being developed by the library of congress. the logic of beginning an integrated system with the development of an input sub-system for current cataloging has always been compelling for this author, far more compelling than beginning in the ordering process, as so many advocate. the catalog is the central record, and the conversion of this record into machinable form is the heart of the matter of library automation. it seems self-evident that systems design should begin here with the basic bibliographical entry upon which the entire system is built. having designed this central module, one can then turn to the acquisitions process and design this module around the central one. circulation is a similar secondary problem. in other words, systems design should begin at the point where the permanent bibliographical record enters the system and not where the first tentative special-purpose record is created. unfortunately, until the advent of the standardized marc ii format, it was not feasible, except in an experimental way, for libraries to begin with the catalog record, simply because the state of the art was not far enough advanced.

the development and acceptance of the marc ii format in 1967 marks the end of one era in library automation and the beginning of another. in the pre-marc ii period every system was unique; all the programming and most of the systems work had to be done by a library's own staff. in the post-marc ii period we will begin to benefit from systems and programs that will be developed at the library of congress and elsewhere, because they will be designed around the standard format and for at least one standard computer. as a result of this, automation in libraries will be greatly accelerated and will become far more widespread in the next few years (7). an input system for current cataloging in the marc ii format will be among the first packages available. it will be followed shortly by programs designed to sort and manipulate the data in various ways.
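for readers who have not seen a marc record, the sketch below suggests roughly what a standardized bibliographical record looks like in machine-readable form: numbered field tags, indicators, and coded subfields. it is a simplified illustration, not the marc ii specification itself (which also defines a leader, a record directory, and fixed fields), and the sample data and helper function are invented.

```python
# simplified illustration of a MARC-style record: tagged variable fields, each
# with indicators and coded subfields (not the full MARC II specification).
record = [
    # (tag, indicators, [(subfield code, value), ...])
    ("100", "1 ", [("a", "De Gennaro, Richard.")]),
    ("245", "14", [("a", "The development and administration of automated systems.")]),
    ("260", "  ", [("b", "Journal of Library Automation,"), ("c", "1968.")]),
]

def subfield_values(record, tag, code):
    """Pull one subfield out of every occurrence of a field, e.g. the 245 $a title."""
    return [value
            for t, _indicators, subfields in record
            if t == tag
            for c, value in subfields
            if c == code]

print(subfield_values(record, "245", "a"))  # programs can sort and manipulate records by tag
```

a local input system, in these terms, is simply the apparatus for creating, correcting, and adding to records of this shape.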
a library will require a considerable amount of expertise on the part of its staff to adapt these procedures and programs to its own uses (we are not yet at the point of "plugging-in" systems), but the effort will be considerably reduced and the risks of going down blind alleys with homemade approaches and systems will be nearly eliminated for those libraries that are willing to adopt this strategy. the development and operation of a local marc ii input system with an efficient alteration and addition capability will be a prerequisite for any library that expects to learn to make effective use of the magnetic tapes containing the library of congress's current catalog data in the marc ii format, which will be available as a regular subscription in july, 1968. in addition to providing the experience essential for dealing with the library of congress marc data, a local input system will enable the library to enter its own data both into the local systems and into the national systems which will begin to emerge in the near future. since the design of the marc ii format is also hospitable to other kinds of library data, such as subject-headings lists and classification schedules, the experience gained with it in an input system will be transferable to other library automation projects.

costs

the price of doing original development work in the library automation field comes extremely high, so high that in most cases such work cannot be undertaken without substantial assistance from outside sources. even when grants are available, the institution has to contribute a considerable portion of the total cost of any development effort, and this cost is not a matter of money alone; it requires the commitment of the library's limited human resources. in the earlier days of library automation attention was focused on the high cost of hardware, computer and peripheral equipment. the cost of software, the systems work and programming, tended to be underestimated. experience has shown, however, that software costs are as high as hardware costs or even higher. the development of new systems, i.e., those without precedents, is the most costly kind of library automation, and most libraries will have to select carefully the areas in which to do their original work. for those libraries that are content to adopt existing systems, the costs of the systems effort, while still high, are considerably less and the risks are also reduced. these costs, however, will probably have to be borne entirely by the institution, as it is unlikely that outside funding can be obtained for this type of work.

the justification of computer-based library systems on the basis of the costs alone will continue to be difficult because machine systems not only replace manual systems but generally do more and different things, and it is extremely difficult to compare them with the old manual systems, which frequently did not adequately do the job they were supposed to do and for which operating costs often were unknown. generally speaking, and in the short run at least, computer-based systems will not save money for an institution if all development and implementation costs are included. they will provide better and more dependable records and systems, which are essential to enable libraries simply to cope with increased intake and workloads, but they will cost at least as much as the inadequate and frequently unexpansible manual systems they replace.
the picture may change in the long run, but even then it seems more reasonable to expect that automation, in addition to profoundly changing the way in which the library budget is spent, will increase the total cost of providing library service. however, that service will be at a much higher level than the service bought by today's library budget. certain jobs will be eliminated, but others will be created to provide new services and services in greater depth; as a library becomes increasingly successful and responsive, more and more will be demanded of it.

conclusion

the purpose of this paper has been to stress the importance of good strategy, correct timing, and intelligent systems staff as the essential ingredients for a successful automation program. it has also tried to make clear that no canned formulas for automating an academic library are waiting to be discovered and applied to any particular library. each library is going to have to decide for itself which approach or strategy seems best suited to its own particular needs and situation. on the other hand, a good deal of experience with the development and administration of library systems has been acquired over the last few years and some of it may very well be useful to those who are about to take the plunge for the first time. this paper was written with the intention of passing along, for what they are worth, one man's ideas, opinions, and impressions based on an imperfect knowledge of the state of the library automation art and a modest amount of first-hand experience in library systems development and administration.

references

1. wasserman, paul: the librarian and the machine (detroit: gale, 1965). a thoughtful and thorough review of the state of the art of library automation, with some discussion of the various approaches to automation. essential reading for library administrators.
2. cox, n. s. m.; dews, j. d.; dolby, j. l.: the computer and the library (newcastle upon tyne: university of newcastle upon tyne, 1966). american edition published by archon books, hamden, conn. extremely clear, well-written and essential book for anyone with an interest in library automation.
3. dix, william s.: annual report of the librarian for the year ending june 30, 1966 (princeton: princeton university library, 1966). one of the best policy statements on library automation; a comprehensive review of the subject in the princeton context, with particular emphasis on the "wait-for-developments" approach.
4. fussler, herman h.; payne, charles t.: annual report 1966/67 to the national science foundation from the university of chicago library; development of an integrated, computer-based, bibliographical data system for a large university library (chicago: university of chicago library, 1967). appended to the report is a paper given may 1, 1967, at the clinic on library application of data processing conducted by the graduate school of library science, university of illinois. mr. payne is the author, and the paper is entitled "an integrated computer-based bibliographic data system for a large university library: progress and problems at the university of chicago."
5. kilgour, frederick g.: "comprehensive modern library systems," in the brasenose conference on the automation of libraries, proceedings (london: mansell, 1967), 46-56. an example of the evolutionary approach as employed at the yale university library.
6. parker, ralph h.: "not a shared system: an account of a computer operation designed specifically and solely for library use at the university of missouri," library journal, 92 (nov. 1, 1967), 3967-3970.
7. annual review of information science and technology (new york: interscience publishers), 1 (1966). a useful tool for surveying the current state of the library automation art and for obtaining citations to current publications and reports is a chapter on automation in libraries which appears in each volume.
information discovery insights gained from multipac, a prototype library discovery system
alex a. dolski
at the university of nevada las vegas libraries, as in most libraries, resources are dispersed into a number of closed “silos” with an organization-centric, rather than patron-centric, layout. patrons frequently have trouble navigating and discovering the dozens of disparate interfaces, and any attempt at a global overview of our information offerings is at the same time incomplete and highly complex. while consolidation of interfaces is widely considered to be desirable, certain challenges have made it elusive in practice.
multipac is an experimental “discovery,” or metasearch, system developed to explore issues surrounding heterogeneous physical and networked resource access in an academic library environment. this article discusses some of the reasons for, and outcomes of, its development at the university of nevada las vegas (unlv).
the case for multipac
fragmentation of library resources and their interfaces is a growing problem in libraries, and unlv libraries is no exception.
electronic information here is scattered across our innovative webpac; our main website, our three branch library websites; remote article databases, local custom databases, local digital collections, special collections, other remotely hosted resources (such as libguides), and others. the number of these resources, as well as the total volume of content offered by the libraries, has grown over time (figure 1), while access provisions have not kept pace in terms of usability. in light of this dilemma, the libraries and various units within have deployed finding and search tools that provide browsing and searching access to certain subsets of these resources, depending on criteria such as n the type of resource; n its place within the libraries’ organizational structure; n its place within some arbitrarily defined topical categorization of library resources; n the perceived quality of its content; and n its uniqueness relative to other resources. these tools tend to be organization-centric rather than patron-centric, as they are generally provisioned in relative isolation from each other without placing as much emphasis on the big picture (figure 2). the result is, from the patron’s perspective, a disaggregated mass of information and scattered finding tools that, to varying degrees, each accomplishes its own specific goals at the expense of macro-level findability. currently, a comprehensive search for a given subject across as many library resources as possible might involve visiting a half-dozen interfaces or more—each one predicated upon awareness of each individual interface, its relation to the others, and figure 1. “silos” in the library figure 2. organization-centric resource provisioning alex a. dolski (alex.dolski@unlv.edu) is web & digitization application developer at the university of nevada las vegas libraries. information discovery insights gained from multipac | dolski 173 the characteristics of its specific coverage of the corpus of library content. our library website serves as the de facto gateway to our electronic, networked content offerings. yet usability studies have shown that findability, when given our website as a starting point, is poor. undoubtedly this is due, at least in part, to interface fragmentation. test subjects, when given a task to find something and asked to use the library website as a starting point, fail outright in a clear majority of cases.1 multipac is a technical prototype that serves as an exploration of these issues. while the system itself breaks no new technical ground, it brings to the forefront critical issues of metadata quality, organizational structure, and long-term planning that can inform future actions regarding strategy and implementation of potential solutions at unlv and elsewhere. yet it is only one of numerous ways that these issues could be addressed.2 in an abstract sense, multipac is biased toward principles of simplification, consolidation, and unification. in theory, usability can be improved by eliminating redundant interfaces, consolidating search tools, and bringing together resource-specific features (e.g., opac holdings status) in one interface to the maximum extent possible (figure 3). taken to an extreme, this means being able to support searching all of our resources, regardless of type or location, from a single interface; abstracting each resource from whatever native or built-in user interface it might offer; and relying instead on its data interface for querying and result-set gathering. 
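to make the phrase "relying on its data interface" concrete, here is a minimal php sketch of that idea. everything in it is hypothetical: the endpoint url, the query parameters, and the field names are invented for illustration and are not part of multipac. the point is simply that a resource is queried through a machine-readable (here, json) interface and its hits are reduced to small normalized records, rather than being scraped out of its html search pages.

<?php
// minimal sketch: query a hypothetical resource's json data interface
// instead of its html user interface, and normalize the results.
// the url, parameters, and field names are invented for illustration.
$query = 'las vegas history';
$url = 'http://example.org/api/search?' . http_build_query(array(
    'q'    => $query,
    'rows' => 10,
));

$raw  = file_get_contents($url);   // fetch the machine-readable response
$data = json_decode($raw, true);   // decode json into a php array

$results = array();
foreach ($data['items'] as $item) {
    // reduce each hit to a small, resource-independent record
    $results[] = array(
        'title'       => isset($item['title']) ? $item['title'] : '(untitled)',
        'description' => isset($item['description']) ? $item['description'] : '',
        'url'         => isset($item['link']) ? $item['link'] : null,
    );
}

print_r($results);
?>

a production adapter would add error handling and per-resource quirks, but the essential move, treating each resource as data rather than as a web page, is the one described above.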
thus multipac is as much a proof-of-concept as it is a concrete implementation. n background: how multipac became what it is multipac came about from a unique set of circumstances. from the beginning, it was intended as an exploratory project, with no serious expectation of it ever being deployed. our desire to have a working prototype ready for our discovery mini-conference meant that we had just six weeks of development time, which was hardly sufficient for anything more than the most agile of table 1. some popular existing library discovery systems name company/institution commercial status aquabrowser serials solutions commercial blacklight university of virginia open-source (apache) encore innovative interfaces commercial extensible catalog university of rochester open-source (mit/gpl) libraryfind oregon state university open-source (gpl) metalib ex libris commercial primo ex libris commercial summon serials solutions commercial vufind villanova university open-source (gpl) worldcat local oclc commercial table 2. some existing back-end search servers name company/institution commercial status endeca endeca technologies commercial idol autonomy commercial lucene apache foundation open-source (apache) search server microsoft commercial search server express microsoft free solr (superset of lucene) apache foundation open-source (apache) sphinx sphinx technologies open-source (gpl) xapian community open-source (gpl) zebra index data open-source (gpl) 174 information technology and libraries | december 2009 development models. the resulting design, while foundationally solid, was limited in scope and depth because of time constraints. another option, instead of developing multipac, would have been to demonstrate an existing open-source discovery system. the advantage of this approach is that the final product would have been considerably more advanced than anything we could have developed ourselves in six weeks. on the other hand, it might not have provided a comparable learning opportunity. n survey of similar systems were its development to continue, multipac would find itself among an increasingly crowded field of competitors (table 1). a number of library discovery systems already exist, most backed by open-source or commercially available back-end search engines (table 2), which handle the nitty-gritty, low-level ingestion, indexing, and retrieval. these lists of systems are by no means comprehensive and do not include notable experimental or research systems, which would make them much longer. n architecture in terms of how they carry out a search, meta-search applications can be divided into two main groups: distributed (or federated search), in which searches are “broadcast” to individual resources that return results in real time (figure 4); and harvested search, in which searches are carried out against a local index of resource contents (figure 5).3 both have advantages and disadvantages beyond the scope of this article. multipac takes the latter approach. it consists of three primary components: the search server, the user interface, and the metadata harvesting system (figure 6). figure 4. the federated search process figure 5. the harvested search process figure 6. the three main components of multipac figure 3. 
patron-centric resource provisioning information discovery insights gained from multipac | dolski 175 n search server after some research, solr was chosen as the search server because of its ease of use, proven library track record, and http–based representational state transfer (rest) application programming interface (api), which improves network-topological flexibility, allowing it to be deployed on a different server than the front-end web application—an important consideration in our server environment.4 jetty—a java web application server bundled with solr—proved adequate and convenient for our needs. the metadata schema used by solr can be customized. we derived ours from the unqualified dublin core metadata element set (dcmes),5 with a few fields removed and some fields added, such as “library” and “department,” as well as fields that support various multipac features, such as thumbnail images, and primary record urls. dcmes was chosen for its combination of generality, simplicity, and familiarity. in practice, the solr schema is for finding purposes only, so whether it uses a standard schema is of little importance. n user interface the front-end multipac system is written in php 5.2 in a model-view-controller design based on classical object design principles. to support modularity, new resources can be added as classes that implement a resource-class interface. the multipac html user interface is composed of five views: search, browse, results, item, and list, which exist to accommodate the finding process illustrated in figure 7. each view uses a custom html template that can be easily styled by nonprogrammer web designers. (needless to say, judging by figures 8–12, they haven’t been.) most dynamic code is encapsulated within dedicated “helper” methods in an attempt to decouple the templates from the rest of the system. output formats, like resources, are modular and decoupled from the core of the system. the html user interface is one of several interfaces available to the multipac system; others include xml and json, which effectively add web services support to all encompassed resources—a feature missing from many of the resources’ own built-in interfaces.6 n search view search view (figure 8) is the simplest view, serving as the “front page.” it currently includes little more than a brief introduction and search field. the search field is not complicated; it is, in fact, possible to include search forms on any webpage and scope them to any subset of resources on the basis of facet queries. for example, a search form could be scoped to las vegas–related resources in special collections, which would satisfy the demand of some library departments for custom search engines tailored to their resources without contributing to the “interface fragmentation” effect discussed in the introduction. (this would require a higher level of metadata quality than we currently have, which will be discussed in depth later.) because search forms can be added to any page, this view is not essential to the multipac system. to improve simplification, it could be easily removed and replaced with, for example, a search form on the library homepage. n browse view browse view (figure 9) is an alternative to search view, intended for situations in which the user lacks a “concrete target” (figure 7). as should be evident by its appearance, figure 7. the information-finding process supported by multipac figure 8. 
the multipac search view page 176 information technology and libraries | december 2009 this is the least-developed view, simply displaying facet terms in an html unordered list. notice the facet terms in the format field; this is malprocessed, marc– encoded information resulting from a quick-and-dirty extensible stylesheet language (xsl) transformation from marcxml to solr xml. n results view the results page (figure 10) is composed of three columns: 1. the left column displays a facet list—a feature generally found to be highly useful for results-gathering purposes.7 the data in the list is generated by solr and transformed to an html unordered list using php. the facets are configurable; fields can be made “facetable” in the solr schema configuration file. 2. the center column displays results for the current search query that have been provided by solr. thumbnails are available for resources that have them; generic icons are provided for those that do not. currently, the results list displays item title and description fields. some items have very rich descriptions; others have minimal descriptions or no descriptions at all. this happens to be one of several significant metadata quality issues that will be discussed later. 3. the right column displays results from nonindexed resources, including any that it would not be feasible to index locally, such as google, our article databases, and so on. multipac displays these resources as collapsed panes that expand when their titles are clicked and initiate an ajax request for the current search query. in a situation in which there might be twenty or more “panes” to load, performance would obviously suffer greatly if each one had to be queried each time the results page loaded. the on-demand loading process greatly speeds up the page load time. currently, the right column includes only a handful of resource panes—as many as could be developed in six weeks alongside the rest of the prototype. it is anticipated that further development would entail the addition of any number of panes—perhaps several dozen. the ease of developing a resource pane can vary greatly depending on the resource. for developerfriendly resources that offer a useful javascript object notation (json) api, it can take less than half an hour. for article databases, which vendors generally take great pains to “lock down,” the task can entail a two-day marathon involving trial-and-error http-request-token authentication and screen-scraping of complex invalid html. in some cases, vendor license agreements may prohibit this kind of use altogether. there is little we can do about this; clearly, one of multipac’s severest limitations is its lack of adeptness at searching these types of “closed” remote resources. n item view item view (figure 11) provides greater detail about an individual item, including a display of more metadata fields, an image, and a link to the item in its primary context, if available. it is expected that this view also would include holdings status information for opac resources, although this has not been implemented yet. the availability of various page features is dependent on values encoded in the item’s solr metadata record. for example, if an image url is available, it will be displayed; if not, it won’t. an effort was made to keep the view logic separate from the underlying resource to improve code and resource maintainability. the page template itself does not contain any resource-dependent conditionals. 
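as a rough illustration of how a front end like this can talk to solr over its http rest api, the hedged php sketch below issues a faceted, scoped query and walks the json response. the solr host, core location, and field names ("format," "library," "title") are assumptions suggested by the schema description in this article, not multipac's actual configuration.

<?php
// minimal sketch of a faceted, scoped solr query over http.
// the solr url and the field names are assumed for illustration only.
$params = array(
    'q'           => 'nevada mining',
    'wt'          => 'json',      // ask solr for a json response
    'rows'        => 20,
    'facet'       => 'true',
    'facet.field' => 'format',    // a field made "facetable" in the solr schema
    'fq'          => 'library:"special collections"', // scope the search, as a scoped search form might
);
$url      = 'http://localhost:8983/solr/select?' . http_build_query($params);
$response = json_decode(file_get_contents($url), true);

// center column: the result documents (title assumed to be single-valued)
foreach ($response['response']['docs'] as $doc) {
    echo $doc['title'] . "\n";
}

// left column: facet terms, returned as a flat list alternating term and count
$facets = $response['facet_counts']['facet_fields']['format'];
for ($i = 0; $i < count($facets); $i += 2) {
    echo $facets[$i] . ' (' . $facets[$i + 1] . ")\n";
}
?>

the same kind of request, with different fq values, is what would let individual departments embed search forms scoped to their own materials without adding yet another interface.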
n list view list view (figure 12), essentially a “favorites” or “cart” view, is so named because it is intended to duplicate the list feature of unlv libraries’ innovative millennium figure 9. the multipac browse view page information discovery insights gained from multipac | dolski 177 opac. the user can click a button in either results view or item view to add items to the list, which is stored in a cookie. although currently not feature-rich, it would be reasonable to expect the ability to send the list as an e-mail or text message, as well as other features. n metadata harvesting system for metadata to be imported into solr, it must first be harvested. in the harvesting process, a custom script checks source data and compares it with local data. it downloads new records, updates stale records, and deletes missing records. not all resources support the ability to easily check for changed records, meaning that the full record set must be downloaded and converted during every harvest. in most cases, this is not a problem; most of our resources (the library catalog excluded) can be fully dumped in a matter of a few seconds each. in a production environment, the harvest scripts would be run automatically every day or so. in practice, every resource is different, necessitating a different harvest script. the open archives initiative protocol for metadata harvesting (oai-pmh) is the protocol that first jumps to mind as being ideal for metadata harvesting, but most of our resources do not support it. ideally, we would modify as many of them as possible to be oai–compliant, but that would still leave many that are out of our hands. either way, a substantial number of custom harvest scripts would still be required. for demonstration purposes, the multipac prototype was seeded with sample data from a handful of diverse resources: 1. a set of 16,000 marc records from our library catalog, which we converted to marcxml and then to solr xml using xsl transformations 2. our locally built las vegas architects and buildings database, a mysql database containing more than 10,000 rows across 27 tables, which we queried and dumped into xml using a php script 3. our locally built special collections database, a smaller mysql database, which we dealt with the same way 4. our contentdm digital collections, which we downloaded via oai-pmh and transformed using another custom xsl stylesheet there are typically a variety of conversion options for each resource. because of time constraints, we simply chose what we expected would be the quickest route for each, and did not pay much attention to the quality of the conversion. n how multipac answers unlv libraries’ discovery questions multipac has essentially proven its capability of solving interface multiplication and fragmentation issues. figure 10. the multipac results view page 178 information technology and libraries | december 2009 by adding a layer of abstraction between resource and patron, it enables us to reference abstract resources instead of their specific implementations—for example, “the library catalog” instead of “the innopac catalog.” this creates flexibility gains with regard to resource provision and deployment. this kind of “pervasive decoupling” can carry with it a number of advantages. first, it can allow us to provide custom-developed services that vendors cannot or do not offer. second, it can prevent service interruptions caused by maintenance, upgrades, or replacement of individual back-end resources. 
third, by making us less dependent on specific implementations of vendor products—in other words, reducing vendor “lock-in”—it can potentially give us leverage in vendor contract negotiations. because of the breadth of information we offer from our website gateway, we as a library are particularly sensitive about the continued availability of access to our resources at stable urls. when resources are not persistent, patrons and staff need to be retrained, expectations need to be adjusted, and hyperlinks—scattered all over the place—need to be updated. by decoupling abstract resources from their implementations, multipac becomes, in effect, its own persistent uri system, unifying many library resources under one stable uri schema. in conjunction with a url rewriting system on the web server, a resource-based uri schema (figure 13) would be both powerful and desirable.8 n lessons learned in the development of multipac the lessons learned in the development of multipac fall into three main categories, listed here in order of importance. metadata quality considerations quality metadata—characterized by unified schemas; useful crosswalking; and consistent, thorough description—facilitates finding and gathering. in practice, a surrogate record is as important as the resource it describes. below a certain quality threshold, its accompanying resource may never be found, in which case it may as well not exist. surrogate record quality influences relevance ranking and can mean the difference between the most relevant result appearing on page 1 or page 50 (relevance, of course, being a somewhat disputed term). solr and similar systems will search all surrogates, including those that are of poor quality, but the resulting relevancy ranking will be that much less meaningful. figure 13. example of an implementation-based vs. resource-based uri implementation-based http://www.library.unlv.edu/arch/archdb2/index.php/projects/view/1509 resource-based (hypothetical) http://www.library.unlv.edu/item/483742 figure 11. the multipac item view page figure 12. the multipac list view page information discovery insights gained from multipac | dolski 179 metadata quality can be evaluated on several levels, from extremely specific to extremely broad (figure 14). that which may appear to be adequate at one level may fail at a higher level. using this figure as an example, multipac requires strong adherence to level 5, whereas most of our metadata fails to reach level 4. a “level 4 failure” is illustrated in table 3, which compares sample metadata records from four different multipac resources. empty cells are not necessarily “bad”— not all metadata elements apply to all resources—but this type of inconsistency multiplies as the number of resources grows, which can have negative implications for retrieval. suggestions for improving metadata quality the results from the multipac project suggest that metadata rules should be applied strictly and comprehensively according to library-wide standards that, at our libraries, have yet to be enacted. surrogate records must be treated as must-have (rather than nice-to-have) features of all resources. resources that are not yet described in a system that supports searchable surrogate records should be transitioned to one that does; for example, html webpages should be transitioned to a content management system with metadata ascription and searchability features (at unlv, this is planned). 
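one concrete, low-cost step toward the library-wide metadata standards recommended above is an automated completeness audit run as part of each harvest. the php sketch below is illustrative only; the required-field list and the input format (one json record per line) are assumptions, not part of the multipac harvesting system.

<?php
// minimal sketch: audit harvested surrogate records for missing fields.
// the required-field list and the input format (one json record per line)
// are assumptions for illustration, not multipac's actual harvest output.
$required   = array('title', 'description', 'format', 'library');
$problems   = array();
$lineNumber = 0;

foreach (file('harvested_records.jsonl') as $line) {
    $lineNumber++;
    $record = json_decode($line, true);
    if ($record === null) {
        $problems[] = "line $lineNumber: unparseable record";
        continue;
    }
    foreach ($required as $field) {
        $value = isset($record[$field]) ? $record[$field] : '';
        if (is_array($value)) {
            $value = implode('', $value);  // multi-valued fields pass if any value is non-empty
        }
        if (trim($value) === '') {
            $problems[] = "line $lineNumber: missing or empty '$field'";
        }
    }
}

echo count($problems) . " problems found\n";
foreach ($problems as $problem) {
    echo $problem . "\n";
}
?>

a report like this makes metadata gaps visible to the contributing units at harvest time, instead of letting them surface later as weak relevance ranking or unfindable items.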
however, it is not enough for resources to have high-quality metadata if not all schemas are in sync. there exist a number of resources in our library that are well-described but whose schemas do not mesh well with other resources. different formats are used; different descriptive elements figure 14. example scopes of metadata application and evaluation, from broad (top) to specific table 3. comparing sample crosswalked metadata from four different unlv libraries resources library catalog digital collections special collections database las vegas architects & buildings database title goldfield: boom town of nevada map of tonopah mining district, nye county, nevada 0361 : mines and mining collection flamingo hilton las vegas creator paher, stanley w. booker & bradford call number f849.g6p34 contents (item-level description of contents) format digital object photo collections database record language eng eng eng coverage tonopah mining district (nev.) ; ray mining district (nev.) description (omitted for brevity) publisher nevada publications university of nevada las vegas libraries unlv architecture studies library subject (lcsh omitted for brevity) (lcsh omitted for brevity) 180 information technology and libraries | december 2009 are used; and different interpretations, however subtle, are made of element meanings. despite the best intentions of everyone involved with its creation and maintenance, and despite the high quality of many of our metadata records when examined in isolation, in the big picture, multipac has demonstrated—perhaps for the first time—how much work will be needed to upgrade our metadata for a discovery system. would the benefits make the effort worthwhile? would the effort be implementable and sustainable given the limitations of the present generation of “silo” systems? what kind of adjustments would need to be made to accommodate effective workflows, and what might those workflows look like? these questions still await answers. of note, all other open-source and vendor systems suffer from the same issues, which is a key reason that these types of systems are not yet ascendant in libraries.9 there is much promise in the ability of infrastructural standards like frbr, skos, rda, and the many other esoteric information acronyms to pave the way for the next generation of library discovery systems. organizational considerations electronic information has so far proved relatively elusive to manage; some of it is ephemeral in existence, most of it is constantly changing, and all of it is from diverse sources. attempts to deal with electronic resources—representing them using catalog surrogate records, streamlining website portals, farming out the problem to vendors—have not been as successful as they have needed to be and suffer from a number of inherent limitations. multipac would constitute a major change in library resource provision. our library, like many, is for the most part organized around a core 1970s–80s ils–support model that is not well adapted to a modern unified discovery environment. next-generation discovery is trending away from assembly-line-style acquisition and processing of primarily physical resources and toward agglomerating interspersed networked and physical resource clouds from onand offsite.10 in this model, increasing responsibilities are placed on all content providers to ensure that their metadata conforms to site-wide protocols that, at our library, have yet to be developed. 
n conclusion in deciding how to best deal with discovery issues, we found that a traditional product matrix comparison does not address the entire scope of the problem, which is that some of the discoverability inadequacies in our libraries are caused by factors that cannot be purchased. sound metadata is essential for proper functioning of a unified discovery system, and descriptive uniformity must be ensured on multiple levels, from the element level to the institution level. technical facilitators of improved discoverability already exist; the responsibility falls on us to adapt to the demands of future discovery systems. the specific discovery tool itself is only a facilitator, the specific implementation of which is likely to change over time. what will not change are library-wide metadata quality issues that will serve any tool we happen to deploy. the multipac project brought to light important library-wide discoverability issues that may not have been as obvious before, exposing a number of limitations in our existing metadata as well as giving us a glimpse of what it might take to improve our metadata to accommodate a next-generation discovery system, in whatever form that might take. references 1. unlv libraries usability committee, internal library website usability testing, las vegas, 2008. 2. karen calhoun, “the changing nature of the catalog and its integration with other discovery tools.” report prepared for the library of congress, 2006. 3. xiaoming liu et al., “federated searching interface techniques for heterogeneous oai repositories,” journal of digital information 4, no. 2 (2002). 4. apache software foundation, apache solr, http://lucene .apache.org/solr/ (accessed june 11, 2009). 5. dublin core metadata initiative, “dublin core metadata element set, version 1.1,” jan. 14, 2008, http://dublincore.org/ documents/dces/ (accessed june 25, 2009). 6. lorcan dempsey, “a palindromic ils service layer,” lorcan dempsey’s weblog, jan. 20, 2006, http://orweblog.oclc .org/archives/000927.html (accessed july 15, 2009). 7. tod a. olson, “utility of a faceted catalog for scholarly research,” library hi tech 4, no. 25 (2007): 550–61. 8. tim berners-lee, “hypertext style: cool uris don’t change,” 1998, http://www.w3.org/provider/style/uri (accessed june 23, 2009). 9. bowen, jennifer, “metadata to support next-generation library resource discovery: lessons from the extensible catalog, phase 1,” information technology and libraries 2, no. 27 (june 2008): 6–19. 10. calhoun, “the changing nature of the catalog.” evaluation of the new jersey digital highway | jeng 17 judy jeng evaluation of the new jersey digital highway the aim of this research is to study the usefulness of the new jersey digital highway (njdh, www.njdigitalhigh way.org) and its portal structure. the njdh intends to provide an immersive and user-centered portal for new jersey history and culture. the research recruited 145 participants and used a web-based questionnaire that contained three sections: for everyone, for educators, and for curators. the feedback on the usefulness of the njdh was positive and the portal structure was favorable. the research uncovered several reasons why some collections did not want to or could not participate. the findings also suggested priorities for further development. this study is one of the few on the evaluation of cultural heritage digital library. 
t he new jersey digital highway (njdh, www .njdigitalhighway.org) is a digital library for new jersey history and culture, including collections of new jersey libraries, museums, archives, and historical societies. the njdh, funded in part by the 2003 national leadership grant of the institute for museum and library services, is a joint project by new jersey state library, the new jersey division of archives and records management at rutgers university libraries, the new jersey historical society, and the american labor museum. as part of the project, the njdh identifies 686 cultural heritage institutions (public libraries, archives, historical societies, and museums). as of november 2007, there are more than ten thousand objects (pictures, records, and oral histories) in the repository. more are being added daily. the njdh, at this writing, is still very much a work in process. the principal investigator of this project continues to extend opportunities to more communities to link their sites and scan their images.1 the njdh provides portals for four different groups of people: everyone, educators, students, and librarians and curators. its mission is to develop an immersive, user-centered information portal and to support the new jersey learner through a collaboration among cultural heritage institutions that supports preservation of the past, new access strategies for the future, and active engagement with resources at the local and the global level for shared access and local ownership. the njdh uses fedora (flexible extensible digital object repository architecture) as a platform to mount participating institutions’ digital objects and metadata. fedora is developed jointly by cornell university and the university of virginia and is currently supported through an andrew w. mellon foundation grant that is customizable and allows local institutions to have true control over what they digitize and post.2 fedora is built on xml with core standards that support flexibility and interoperability such as mets (metadata encoding and transmition standard, www.loc.gov/standards/ mets) and oai-pmh (open archives initiative protocol for metadata harvesting, www.openarchives.org) functions. fedora is chosen for the njdh because it can effectively accommodate and manage a broad array of information sources with the flexibility to integrate with other information repositories. the njdh uses a metadata structure based on mods (metadata open description schema, www.loc.gov/ standards/mods), mets, niso, and premis (preservation metadata, www.loc.gov/standards/premis) metadata standards to support preservation of digital objects, to ensure scalability for projects and interoperability with other systems through oai-pmh. this hybrid approach enables njdh collection managers and metadata creators to provide information through multiple presentation standards in a schema easily understood within distinctive cultural heritage organization communities. mods is used for descriptive metadata, provides and retains standard bibliographic cataloging principles, and is therefore easily mapped to marc. the njdh therefore includes a mapping utility that allows the export of records from the njdh to online catalogs for any organization that wants to make its digital objects accessible within its integrated library system. 
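the export path just described, descriptive mods records mapped out toward marc so a contributing organization can load them into its integrated library system, can be sketched in a few lines of php. the sketch below is not the njdh mapping utility; it covers only a handful of elements, and the field correspondences follow the general mods-to-marc conventions (title to 245, personal name to a 100 or 700 field, topical subject to 650, abstract to 520) rather than any njdh-specific profile.

<?php
// minimal sketch: pull a few descriptive elements from a mods record and
// print them as marc-like field/subfield pairs for export.
// illustrative only; this is not the njdh's actual mapping utility.
$mods = simplexml_load_file('record-mods.xml');   // hypothetical input file
$mods->registerXPathNamespace('mods', 'http://www.loc.gov/mods/v3');

// a few common mods-to-marc correspondences
// (simplified: every personal name is mapped to 100 here)
$map = array(
    '245 $a' => '//mods:titleInfo/mods:title',
    '100 $a' => '//mods:name[@type="personal"]/mods:namePart',
    '650 $a' => '//mods:subject/mods:topic',
    '520 $a' => '//mods:abstract',
);

foreach ($map as $marcField => $xpath) {
    foreach ($mods->xpath($xpath) as $node) {
        echo $marcField . ' ' . trim((string) $node) . "\n";
    }
}
?>

a real crosswalk would also handle indicators, repeated and added entries, and encoding details, but the basic element-to-field mapping is the core of any such export.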
additionally, there are four other types of metadata in njdh: source metadata describes provenance, condition, and conservation of analog source materials such as photographs, books, maps, audio, and video; technical metadata describes born digital images and provides information about the digital master files that will be maintained for long-term preservation and access; rights metadata identifies the rights holder(s) for each information source, identifies the permissions for use including any restrictions, and documents the copyright status of each work; digital provenance metadata provides a digital “audit trail” of any changes to the metadata.3 the use of the njdh has steadily grown and has some three thousand unique visitors a month averaging eight to ten thousand visits per month.4 n prior cultural heritage digital library evaluations literature review indicates that few researchers have investigated the usability or the evaluation of cultural judy jeng (jjeng@njcu.edu) is head of collection services, new jersey city university, new jersey. 18 information technology and libraries | december 2008 heritage digital libraries. the minerva (ministerial network for valorising activities) project proposed a number of criteria and principles specifically for usability evaluations of cultural web applications, including visibility, affordance, natural mapping, constraints, conceptual models, feedback, safety, flexibility, the scope and aim of the site, meaningful organization of the website’s functions, quality of content (for example, consistency, completeness, conciseness, accuracy, objectivity), design of functional layout, consistent use of graphics and multimedia components, as well as provision for navigation tools and search mechanisms.5 in addition, vaki, dallas, and dalla proposed sixteen usability guidelines for cultural applications.6 garoufallou, siatri, and balatsoukas reported their research on the user interface of the veriagrid application.7 the veriagrid system (www.theveriagrid.org) is a platform based on digital cartography that supports a vector map of the city of veria organized by layers and linked to multimedia objects such as text, images, photos, and video clips. the researchers were interested in learnability, errors, and satisfaction. n usefulness as the primary evaluation criterion for the njdh the njdh aims to serve heterogeneous communities and information needs. like other digital cultural services, it is not easy to address usability issues. lynch has said that digital libraries of cultural heritage don’t really have natural communities around them and that digital materials find their own unexpected user communities.8 garoufallou, siatri, and balatsoukas said that “different types of users, such as students and scholars or tourists and travelers look at these services from different angles (for example, scholarly or recreational needs). thus, the provision of accessible and user-friendly systems is important for the wider use and acceptance of these services.”9 the aim of this evaluation was to assess usefulness of the njdh from the perspectives of general users, educators, and cultural heritage professionals. 
usefulness is one of the criteria of usability with a focus on “did it really help me?” and “was it worth the effort?” usefulness differs from usableness in that usableness refers to functions such as “can i turn it on?” or “can i invoke that function?” usefulness can also mean “serving an intended purpose.” in the technology acceptance model (tam) developed by davis and his colleagues, perceived usefulness refers to the extent to which an information system will enhance a user’s performance.10 in addition to usefulness and usableness, jeng has gathered a comprehensive collection of usability criteria such as effectiveness, efficiency, satisfaction, learnability, ease of use, memorability, mistake recovery, and interface effectiveness.11 usability is a multidimensional construct and has a theoretical root in human–computer interaction. although usefulness may be an important evaluation criterion, thomas and jeng report that usefulness is an often overlooked criterion of usability.12 literature review indicates that usefulness has been used as either the primary or one of the criteria in the following evaluations of digital libraries: elibraryhub, the digital work environment, grow (geotechnical, rock, and water engineering, www.grow.arizona.edu), mcmaster university library’s gateway, the miguel de cervantes virtual library, minnesota’s foundations project, and the moving image collections.13 this paper reports the evaluation of the njdh. n research method a web-based online survey was conducted in september– december 2006. the questionnaire was designed, collected, and analyzed using web-based software called surveymonkey. convenience sampling method was used in this study. subjects were recruited by posting a link on the njdh website, by posting announcements on a number of electronic discussion lists for educators and cultural heritage professionals, and by word-of-mouth invitations. the participants were asked to complete a two-part questionnaire. the first part gathered demographic data such as gender, age, ethnic background, educational background, the county they live in, and how they learned about the njdh. the second part contained three sections: one for everyone, one for educators, and one for cultural heritage professionals. the section for everyone contained twenty-six questions, including seven-point likert scales and open-ended questions with a focus on the digital library’s usefulness, navigation, design, terminology, and user lostness. in addition to this general section, educators were asked to complete another fifteen questions pertaining specifically to the educators’ portal; the cultural heritage professionals had another thirteen questions regarding the librarians and curators’ portal. a total of 145 individuals participated in the survey, of which 32 were educators (22%) and 28 (20%) were cultural heritage professionals. the participants were mostly white (127 respondents or 89%), mostly female (118 respondents or 81%), and most had a master’s or doctoral degree (114 respondents or 79%). in terms of age distribution, more than half of the participants were over 50 (79 respondents or 55%) (see table 1). nearly all (136 respondents or 94%) were residents of new jersey. 
evaluation of the new jersey digital highway | jeng 19 among the educators that participated in this survey who evaluated the educators’ portal, 56% (18 respondents) worked at colleges or universities, 16% (5 respondents) worked at high schools, 13% (4 respondents) worked at elementary or middle schools, and 6% (2 respondents) identified themselves as specialists in museums, libraries, or archives. roughly a third (10 respondents or 31%) were teachers, 3% (1 respondent) was a teaching assistant, 13% (4 respondents) were school administrators, and 28% (9 respondents) were school library media specialists or librarians (see table 2). in terms of what they teach, 27% (7 respondents) teach new jersey history, 23% (6 respondents) teach social studies, 12% (3 respondents) teach civics, 8% (2 respondents) teach geography, and 8% (2 respondents) teach popular culture. as to the survey participants who identified themselves as cultural heritage professionals, 61% (17 respondents) worked at libraries, 11% (3 respondents) worked at museums, 11% (3 respondents) worked at historical societies, and 4% (1 respondent) worked with archives. in terms of their roles at those organizations, 61% (17 respondents) said they were faculty or staff, 18% (5 respondents) were administrators, one was a consultant, one was a librarian, and one was a volunteer (see table 3). n findings how do users find out about the njdh and will they come back? the survey found that more than half of the respondents (58 participants or 40%) learned about the njdh from their colleagues or friends, 19 participants (13%) learned through attending conferences, 16 participants (11%) were linked from other websites (see figure 1). the njdh digital library intends to build rich and “one stop shop” digital collections of new jersey history and culture. cultural heritage digital library plays a particularly important role for students of the humanities because the digital library is the humanist’s laboratory, its resources are the scholar’s primary data.14 it is important to enhance users’ awareness of this digital library among new jerseyans and even promote this cultural heritage digital library to users at global level. table 1. demographic data (n = 145) total % gender male 27 18.6 female 118 81.4 age 18–24 1 0.7 25–49 63 44.1 50–64 74 51.7 65+ 5 3.5 ethnic background white 127 89.4 african american 5 3.5 asian 6 4.2 hispanic 3 2.1 native american 1 0.7 education high school 5 3.4 associate’s degree 7 4.8 bachelor’s degree 19 13.1 master’s or phd degree 114 78.6 in terms of the purposes of visiting the njdh, the study found 72 respondents (76%) were just browsing and 23 respondents (24%) were looking for specific information such as a specific county information, history, and family genealogy (see figure 2). seventy-two respondents (74%) replied that they will come back to use the njdh again (see figure 3). those who said “no” gave reasons such as their doubts on whether the information in the njdh is reliable and authoritative, the depth and breadth of content in this digital library, and the inconsistency of fonts and font sizes. n navigation navigation has been reported in literature as a common problem in a digital library. users could accidentally leave the digital library, following the links to other web-based resources, and were unaware that they were no longer using the digital library. 
brinck, gergle, and wood report that disorientation is among the biggest frustrations for web users.15 20 information technology and libraries | december 2008 average 2.54 on a 7-point likert scale, 1 being easy to navigate and 7 being difficult to navigate). twentythree participants (25%) marked 1 on the likert scale, 28 participants (30%) marked 2, and 26 participants (28%) marked 3. these brought the total of the top three points to 83%. the overall response regarding user lostness was also not a problem (response average 2.42 on a 7-point likert scale, 1 being not lost at all and 7 being very lost). only two participants expressed they were very lost and one expressed lost. the reasons that could lead to user lostness include the lack of material in the collections so far, the need for explanation of how relevance is ranked, the home page being text heavy and cluttered, the photos not being legible, the lack of author information in documents, no indication of a trail of how one got there, lengthy urls, the need for better chosen direct links instead of layered links, and patrons’ unfamiliarity with icons and their functions. n layout the rating for the layout of the njdh was very positive (response average 2.54 on a 7-point likert scale, 1 being good and 7 being bad). however, the site may improve its appearance in the following areas: there is currently too much text per page (the font is too small and the use of typography, informational hierarchy, and white space must be improved); more important information needs to go at the top of pages; and more colors need to be used. n terminology the degree to which users interact with a digital library depends on how well users understand the terminology displayed on the system interface. literature review has indicated that the inappropriate use of jargon has been a common problem in digital library design. hartson, shivakumar, and pérez-quinones report from their usability inspection of the networked computer science technical reference library (www.ncstrol.org) that problems with wording accounted for 36% of the digital library’s usability problems.16 system designers often assume too much about the extent of user knowledge. the precise use of words in a user interface is one of the utmost important design considerations for usability. table 2. educators’ demographic data (n = 32) total % institutions university or college 18 56 high school 5 16 elementary or middle school 4 13 museums and others 2 6 no answer 3 9 total 32 100 roles teacher 10 31 teaching assistant 1 3 administrator 4 13 librarian 9 28 no answer 8 25 total 32 100 table 3. cultural heritage professional’s demographic data (n = 28) total % institutions library 17 61 museum 3 11 historical society 3 11 archives 1 4 others or no answer 4 14 roles faculty or staff 17 61 administrator 5 18 consultant 1 4 librarian 1 4 volunteer 1 4 no answer 3 11 this survey found the overall response regarding the navigation of the njdh was very positive (response evaluation of the new jersey digital highway | jeng 21 this research found that the overall response regarding terminology and labeling in the njdh was positive (response average 2.34 on a 7-point likert scale, 1 being clear and 7 being not clear). n usefulness usefulness was the fundamental research focus of this study. this research investigated whether the njdh was useful to the general public, educators, and students. 
the responses were overwhelmingly positive: 73% of the respondents gave 1–3 ratings on the 7-point likert scale (1 being useful and 7 being not useful)—30% (29 respondents) marked 1, 33% (32 respondents) marked 2, and 12% (12 respondents) marked 3. the average response was 2.63. this was a very positive response. when it comes to the specific section for educators to evaluate the educator’s portal, the rating was also positive (response average 3.04). those educators felt that the most useful information was the “how to” information for teaching with digital resources, research genealogy, developing an oral history, and so on. twelve respondents (44%) indicated they would encourage their students to use the njdh site for term papers or homework assignments. thirteen respondents (50%) indicated they would make their own lesson plans using the resources and information from the njdh. regarding the student’s portal, those educators who responded to the survey indicated that, from their perspectives, the most useful information for students was the general information about new jersey, including a directory of cultural heritage organizations, places to visit, etc. as for the librarians and curators’ portal, those cultural heritage professionals identified the librarians and curators’ resource center as the most useful resource in the njdh, followed by the digital highway collections roadmap and associated guidelines, calendar, the searching capabilities of new jersey cultural heritage organizations, and new jersey information. sixteen respondents (67%) said they would recommend this digital library to their patrons, two respondents (8%) won’t, and six respondents (25%) were not sure. it is obvious that the njdh administrators need to work harder in this area to enhance usefulness for cultural heritage professionals and their patrons. figure 1. where did you hear about njdh? the survey asked all respondents to suggest what themes should be enriched in the njdh collections. the suggestions were, in this order: new jersey history, new jersey state and county documents, new jersey culture, genealogy, everyday life in new jersey, new jersey industry, more immigration resources, education in new jersey, new jersey in wartime, and transportation. regarding the librarians and curators’ portal, the respondents suggested the contents of this particular portal should be enhanced in the following priority order: (1) more links to other websites with history resources and activities, (2) access to mentors experienced in digitizing and metadata who can provide one-to-one assistance, (3) a discussion list or blog where users can ask questions or share ideas with others, (4) information about training sessions around new jersey on digitization and metadata, (5) more resources on digital preservation and metadata, (6) educational activities that users can share with their patrons, (7) a tool for users to create their own interactive activities using the njdh resources, and (8) more information about helping patrons to use the njdh more effectively. n portal structure the njdh provides four portals for different target users: everyone, educators, students, and librarians and curators. each portal provides different interface and packages different information for a different type of user. the survey found 80% of the subjects understood the purpose of the four portals (by marking 1 or 2 on the 7-point likert scale) and only 4 participants (4%) found this type of portal structure confusing. 
the survey further found 65% of participants felt this kind of portal structure helpful to them. 22 information technology and libraries | december 2008 n why not contributing to the njdh collections? the respondents indicated that the barriers for them to contribute collections or resources to the njdh were, in this order: (1) lack of staff or time, (2) lack of funding, (3) lack of knowledge, and (4) copyright concerns. n statistical analyses the study found demographic factors, such as age, gender, ethnic background, and educational level, do not have significant effects on a number of areas: (1) how the participants ranked usefulness of the digital library, (2) usefulness evaluation of the four-portal structure, (3) understanding of terminology, (4) ease of navigation, and (5) lostness. the study found the correlation between navigation and lostness was statistically significant: r (66) = .83, p < .001. when a user felt the system easy to navigate, the user felt less lost. the study also found usefulness of the digital library has a statistically significant effect on a user ’s return decision. a one-way analysis of variance was conducted. the analysis of variance was significant, f (2, 59) = 20.42, p < .001. the strength of relationship between usefulness ranking and the decision of whether to revisit the digital library, as assessed by n2, was strong, with the usefulness factor accounting for 41% of the variance of the return decision. because the overall f test was significant, follow-up tests were conducted to evaluate pairwise differences among the means. using the turkey test, the pairwise comparisons yes vs. no and yes vs. not sure were significant. the pairwise comparison no vs. not sure was not significant. n conclusions usability evaluation is a user-centered evaluation to learn from users’ needs, expectations, and satisfaction. this research studied usefulness, navigation, user lostness, terminology, and layout. the overall response was positive, and the finding was that the njdh was useful in providing new jersey history and culture information. designers of the njdh learned from the study the priorities of adding various new jersey themes to the collections and how to make the site easier to use. as a result of the study, lifelong learners are identified as an important target audience. this research provided insights on why people came to use this particular digital library, their pleasure of using it, how to improve ease-of-use, navigation, website appearance, and the use of terminology and labeling. the front page of the website was redesigned to address the overuse of text on each page. the study also helped to discover what components of the site were more useful and why. furthermore, it investigated why some museums or collections in new jersey have not participated in this digital library development project. as a result of the study, more emphasis has been placed on building tools figure 2. purpose of the most recent visit figure 3. will you use njdh again? evaluation of the new jersey digital highway | jeng 23 to increase independent collection contribution by museums and archives. the observations of this study may help the development of other academic digital libraries because the barriers found in the study are common obstacles. after eighteen months of the study, the njdh governance planning committee still uses the evaluation report to address more complex and fundamental changes and the reorganization of the digital library. 
the study confirmed that users of this digital library appreciated the idea of providing different portals for different users. the study did not find demographic factors (age, gender, ethnic background, and educational level) play statistically significant roles in the usefulness rankings of the digital library or portal structure, terminology, ease of use, or user lostness. the study found there was a strong correlation between ease of navigation and user lostness. users don’t have feelings of lostness when a system is easy to navigate. the study also found users will come back to revisit a digital library when they find the site is useful. n acknowledgments judy jeng and grace agnew were the codesigners of the questionnaire for this study. judy served as the evaluation consultant for the njdh. grace agnew, the associate university librarian for digital library systems at rutgers university, was the principal investigator of the njdh. the njdh received funding from institute of museum library services grant lg30-03-0269-03. references 1. linda langschied, “history and high-tech intersect on the new jersey digital highway,” www.imls.gov/profiles/ nov07.shtm (accessed aug. 12, 2008). 2. linda langschied and ann montanaro, “the new jersey digital highway: a next-generation approach to statewide digital library development,” microform & imaging review 34, no. 4 (2005): 167–73. 3. the new jersey digital highway: final report on imls grant #lg30-03-0269-03, www.njdigitalhighway.org/documents/ njdh-final_report_www_version.pdf (accessed aug. 12, 2008). 4. ibid. 5. minerva working group 5, handbook for quality in cultural web sites improving quality for citizens: version 1.2—draft. (2003), www.minervaeurope.org/publications/ qualitycriteria1_2draft/qualitypdf1103.pdf (accessed aug. 12, 2008). 6. elina vaki, costis dallas, and christina dalla, calimera: cultural applications: local institutions mediating electronic resources: deliverable d 18: usability guidelines, www.calimera .org/lists/resources%20library/the%20end%20user%20 experience,%20a%20usable%20community%20memory/ usability%20guidelines.pdf (accessed aug. 12, 2008). 7. emmanouel garoufallou, rania siatri, and panagiotis balatsoukas, “virtual maps—virtual worlds: testing the usability of a greek virtual cultural map,” journal of the american society for information science and technology 59, no. 4 (2008): 591–601. 8. clifford lynch, “digital collections, digital libraries and the digitization of cultural heritage information,” first monday 7, no. 5 (2002), www.firstmonday.org/issues/issue7_5/lynch/ (accessed aug. 12, 2008). 9. garoufallou, siatri, and balatsoukas, “virtual maps— virtual worlds,” 591–601. 10. fred d. davis, “perceived usefulness, perceived ease of use, and user acceptance of information technology,” mis quarterly 13, no. 3 (1989): 319–40; fred d. davis, richard p. bagozzi, and paul r. warshaw, “user acceptance of computer technology: a comparison of two theoretical models,” management science 35, no. 8 (1989): 982–1003. 11. judy jeng, “usability of the digital library: an evaluation model” (phd diss., rutgers university, 2006): 10–19; judy jeng, “usability assessment of academic digital libraries: effectiveness, efficiency, satisfaction, and learnability,” libri: international journal of libraries and information services 55, no. 2/3 (2005): 96–121; judy jeng, “what is usability in the context of the digital library and how can it be measured?” information technology and libraries 24, no. 2 (2005): 47–56. 12. 
rita leigh thomas, “elements of performance and satisfaction as indicators of the usability of digital spatial interfaces for information-seeking: implications for isla” (phd diss., univ. of southern california, 1998); judy jeng, “usability of the digital library: an evaluation model” (phd diss., rutgers university, 2006): 33. 13. yin-leng theng, mei-yee chan, ai-ling khoo, and raju buddharaju, “quantitative and qualitative evaluations of the singapore national library board’s digital library,” in design and usability of digital libraries: case studies in the asia pacific, ed. yin-leng theng and schubert foo (hershey, pa.: information science publishing, 2005): 334–49.; n. meyyappan, schubert foo, and g. g. chowdhury, “design and evaluation of a taskbased digital library for the academic community,” journal of documentation 60, no. 4 (2004): 449–75; janice lodato, “creating an educational digital library: grow a national civil engineering education resource library,” (paper presented at the conference on human factors in computing systems, vienna, austria, apr. 24–29, 2004), in the acm digital library, http://portal.acm.org/citation.cfm?id=985942&coll=portal&dl =acm&cfid=32427354&cftoken=28824529 (accessed aug. 12, 2008); brian detlor et al., fostering robust library portals: an assessment of the mcmaster university library gateway (hamilton, ont.: michael g. degroote school of business, mcmaster university, 2003); álvaro quijano-solís and raúl novelo-peña, “evaluating a monolingual multinational digital library by using usability: an exploratory approach from a developing country,” the international information & library review 37, no. 4 (2005): 329–36; eileen quam, “informing and evaluating a metadata initiative: usability and metadata studies in minnesota’s foundations project,” government information quarterly 18, no. 24 information technology and libraries | december 2008 3 (2001): 181–94; judy jeng, “metadata usefulness evaluation of the moving image collections” (paper presented at the new jersey library association annual conference, long branch, new jersey, apr. 23–25, 2007), www.njla.org/conference/2007/ presentations/metadata.pdf (accessed aug. 12, 2008). 14. gregory crane and clifford wulfman, “towards a cultural heritage digital library,” proceedings of the 3rd acm/ ieee-cs joint conference on digital libraries, in the acm digital library, http://delivery.acm.org/10.1145/830000/827150/p75 -crane.pdf?key1=827150&key2=9784876911&coll=acm&dl=a cm&cfid=8598346&cftoken=44546164 (accessed aug. 12, 2008). 15. tom brinck, darren gergle, and scott d. wood, designing web sites that work: usability for the web (san francisco: morgan kaufmann, 2002). 16. h. rex hartson, priy a. shivakumar, and manuel a. pérez-quinones, “usability inspection of digital libraries: a case study,” international journal on digital libraries 4, no. 2 (2004): 108–23. lib-mocs-kmc364-20131012113626 236 news and announcements programmers discussion group meets: pl/1, the marc format, and holdings twenty-two computer programmers, analysts, and managers met on june 29 in san francisco for the formative meeting of the lit a/isas programmers discussion group. in an informal and informative hour, the group established ground rules, started a mailing list, planned the topic for midwinter 1982, and found out more about practice<> in fifteen library-related installations. programming language usage what programming languages are used, and used primarily, at the installations? 
nine languages turned up, excluding database management systems (and lumping all "assembly" languages together), but one language accounted for more than one-half of the responses:

language                        users   primary
pl/1                               14        13
assembler/assembly languages        8         5
cobol                               4         2
pascal                              3         1
basic                               1         1
c                                   1         1
mils (a mumps dialect)              1
fortran                             0
snobol                              0

(note: some installations use more than one "primary" language.) a second round of hands showed only four users with no use of pl/1. marc format usage these questions are asked on an agency-by-agency basis. one agency made no use of the marc communications format. none of those receiving marc-format tapes were unable to recreate the format. eight of the fifteen agencies made significant internal-processing use of the marc-communications-format structure, including the leader, directory, and character storage patterns; this question was made more explicit to try to narrow the answers. thus, the marc communications format is used as a processing format in a significant number of institutions. only three agencies use ascii internally; most use of marc takes place within ebcdic. (all but three agencies were using ibm 360/370-equivalent computers; the parallel is clear.) computer usage as noted, all but three agencies use ibm equivalents in the mainframe range; three of those use plug-compatible equipment such as magnuson and amdahl. the other major computers are cdc, dec/vax, and data general eclipse systems. smaller computers in use include dc, dec 11/70, datapoint, and ibm series/1 units. home terminals and computers four of those present currently have home terminals. three have home computers. future plans for the discussion group the midwinter 1982 topic will be "holdings," with some emphasis on dealing with holdings formats in various technical processing systems (such as oclc, utlas, wln, rlin). an announcement and mailing list will go to all those on the mailing list, as will an october/november mailing with questions sent to the chair. those interested should send their names and addresses to walt crawford, rlg, jordan quad, stanford, ca 94305. it is anticipated that papers on the topic may be ready by midwinter; questions and comments are welcomed. note: there will be no set speakers or panelists; this will be a true discussion group. the topic for the philadelphia meeting will be set at midwinter 1982.-walt crawford, chair, the research libraries group, inc. channel 2000 a test of a viewdata system called channel 2000 was conducted by oclc in columbus, ohio, during the last quarter of 1980. an outgrowth of the oclc research department's home delivery of library services program, channel 2000 was developed and tested to investigate technical, business, market, and social issues involved in electronic delivery of information using videotex technology. data collection throughout the test, data were collected in three ways. transaction logs were maintained, recording keystrokes of each user during the test, thus allowing future analyses and reconstruction of the test sessions. questionnaires requesting demographic information, life-style, opinion leadership, and attitudes toward channel 2000 were collected from each user in each household before, during, and after the test. six focus-group interviews were held and audiotaped to obtain specific user responses to the information services. attitudes toward library services forty-six percent of the respondents agreed that channel 2000 saved time in getting books from the library.
responding to other questions, 29 percent felt that they would rather go to a traditional library than order books through channel 2000, and 38 percent of the users felt that channel 2000 had no effect on their library attendance. forty-one percent of the channel 2000 test group felt that their knowledge of library services increased as a result of the channel 2000 test. in addition, 16 percent of the respondents stated that they spent more time reading books than they did before the test. eighty-two percent of the respondents felt that public libraries should spend tax dollars on services such as channel 2000. although this might suggest that library viewdata services should be tax-based, subsequent focus-group interviews indicated that remote use of these services should be paid for by the individual, whereas on-site use should be "free." sixty-three percent of the test population stated that they would probably subscribe to and pay for a viewdata library service if the services were made available to them off-site. purchase intent respondents were asked to rank-order the seven channel 2000 services according to the likelihood that they would pay money to have that service in their home. a mean score was calculated for each channel 2000 service, and the following list shows the rank order of preference:

1. video encyclopedia: locate any of 32,000 articles in the new academic american encyclopedia via one of three easy look-up indexes
2. video catalog: browse through the videocard catalog of the public libraries of columbus and franklin county, and select books to be mailed directly to your home
3. home banking: pay your bills; check the status of your checking and savings accounts; look up the balance of your visa credit card; look up your mortgage and installment loans; get current information on bank one interest rates
4. public information: become aware of public and legislative information in ohio
5. columbus calendar: check the monthly calendar of events for local educational and entertainment happenings
6. math that counts!: teach your children basic mathematics, including counting and simple word problems
7. early reader: help your children learn to read by reinforcing word relationships

the final report, mailed to all oclc member libraries, was published as channel 2000: description and findings of a viewdata test conducted by oclc in columbus, ohio, october-december 1980. dublin, ohio: research department, online computer library center, inc., 1981. 21p. notis software available at the 1981 ala annual conference in san francisco, the northwestern university library announced the availability of version 3.2 of the notis computer system. intended for medium and large research libraries or groups of libraries, notis provides comprehensive online integrated-processing capabilities for cataloging, acquisitions, and serials control. patron access by author and title has been in operation for more than a year, and version 3.2 adds subject-access capability as well as other new features. an improved circulation module and other enhancements are under development for future release. although notis, which runs on standard ibm or ibm-compatible hardware, has been in use by the national library of venezuela for several years, northwestern only recently decided to actively market the software, and provided a demonstration at the ala conference.
a contract has been signed with the university of florida, and several other installations are expected within a few months. further information on notis may be obtained from the northwestern university library, 1935 sheridan rd., evanston, il 60201. bibliographic access & control system the washington university school of medicine library announces its computerbased online catalog/library control system known as the bibliographic access & control system (bacs). the system is now in operation and utilizes marc cataloging records obtained from oclc since 1975, serials records from philsom serials control network, and machine-readable patron records. features of interest in the system are: 1. patron access by author, title, subject, call number, or combination of keywords. the public-access feature has been in operation since may 1981. online instructions support system use, minimizing staff intervention. user survey indicates a high degree of satisfaction with the system. 2. low cost public access terminal with a specially designed overlay board. 3. barcode-based circulation system featuring the usual functions, including recalls for high demand items, overdue notices, suspension of circulation privileges, etc. 4. cataloging records loaded from oclc marc records by tape and from a microcomputer interface at the oclc printer port. authority control available on three levels: (a) controlled authority, i.e. , mesh or lc, (b) library-specific assigned authority, and (c) word list available to user. 5. full cataloging functions online, including editing, deleting, and entering records. 6. serials control from philsom system. philsom is an online distributed computer network that currently controls serials for sixteen medical school libraries. philsom features rapid online check-in, claims, fiscal control, union lists, and management reports. 7. five possible displays of the basic bibliographic record, varying from a brief record for the public access terminal to complete information for cataloging and reference staff. 8. two levels of documentation available online. the software is available to interested libraries, bibliographic utilities, or commercial firms. contact: washington university school of medicine library, 4580 scott, st. louis, mo 63110; (314) 454-3711. editorial | truitt 55 a recent library journal (lj) story referred to “the palpable hunger public librarians have for change . . . and, perhaps, a silver bullet to ensure their future” in the context of a presentation at the public library association’s 2010 annual conference by staff members of the rangeview (colo.) library district. now, lest there be any doubt on this point, allow me to state clearly from the outset that none of the following ramblings are in any way intended as a specific critique of the measures undertaken by rangeview. far be it from me to second-guess the rangeview staff’s judgment as to how best to serve the community there.1 rather, what got my attention was lj’s reference to a “palpable hunger”for magic ammunition, from whose presumed existence we in libraries seem to draw comfort. in the last quarter century, it seems as though we’ve heard about and tried enough silver bullets to keep our collective six-shooters endlessly blazing away. here are just a few examples that i can recall off the top of my head, and in no particular order: ■■ library cafes and coffee shops. ■■ libraries arranged along the lines of chain bookstores. 
■■ general-use computers in libraries (including information/knowledge commons and what-have-you) ■■ computer gaming in libraries. ■■ lending laptops, digital cameras, mp3 players and ipods, e-book readers, and now ipads. ■■ mobile technology (e.g., sites and services aimed at and optimized for iphones, blackberries, etc.) ■■ e-books and e-serials. ■■ chat and instant-message reference. ■■ libraries and social networking (e.g., facebook, twitter, second life, etc.). ■■ “breaking down silos,” and “freeing”/exposing our bibliographic data to the web, and reuse by others outside of the library milieu. ■■ ditching our old and “outmoded” systems, whether the object of our scorn is aacr2, lcsh, lcc, dewey, marc, the ils, etc. ■■ library websites generally. remember how everyone—including us—simply had to have a website in the 1990s? and ever since then, it’s been an endless treadmill race to find the perfect, user-centric library web presence? if sisyphus were to be incarnated today, i have little doubt that he would appear as a library web manager and his boulder would be a library website. ■■ oh, and as long as we’re at it, “user-centricity” generally. the implication, of course, is that before the term came into vogue, libraries and librarians were not focused on users. ■■ “next-gen” catalogs. i’m sure i’m forgetting a whole lot more. anyway, you get the picture. each of these has, at one time or another, been positioned by some advocate as the necessary change—the “silver bullet”—that would save libraries from “irrelevance” (or worse!), if we would but adopt it now, or better yet, yesterday. well, to judge from the generally dismal state of libraries as depicted by some opinionmakers in our profession—or perhaps simply from our collective lack of self-esteem—we either have been misled about the potency of our ammunition, or else we’ve been very poor markspersons. notwithstanding the fact that we seem to have been indiscriminately blasting away with shotguns rather than six-shooters, our shooting has neither reversed the trends of shrinking budgets and declining morale nor staunched the ceaseless dire warnings of some about “irrelevance” resulting from ebbing library use. to stretch the analogy a bit further still, one might even argue that all this shooting has done damage of its own, peppering our most valuable services with countless pellet-sized holes. at the same time, we have in recent years shown ourselves to be remarkably susceptible to the marketingfocused hyperbole of those in and out of librarianship about technological change. each new technology is labeled a “game-changer”; change in general is either— to use the now slightly-dated, oh-so-nineties term—a “paradigm shift” or, more recently, “transformational.” when did we surrender our skepticism and awareness of a longer view? what’s wrong with this picture?2 i’d like to suggest another way of viewing this. a couple of years ago, alan weisman published the world without us, a book that should be required reading for all who are interested in sustainability, our own hubris, and humankind’s place in the world. the book begins with our total, overnight disappearance, and asks (1) what would the earth be like without us? and (2) what evidence of our works would remain, and for how long? the bottom line answers for weisman are (1) in the long run, probably much better off, and (2) not much and not for very long, really. 
so, applying weisman’s first question to our own, much more modest domain, what might the world be like if tomorrow librarians all disappeared or went on to work doing something else—became consultants, perhaps?— and our physical and virtual collections were padlocked? would everything be okay, because as some believe, marc truitteditorial: no more silver bullets, please marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 56 information technology and libraries | june 2010 think we need to be prepared to turn off the lights, lock the doors, and go elsewhere, because i hope that what we’re doing is about more than just our own job security. and if the far-fetched should actually happen, and we all disappear? i predict that at some future point, someone will reinvent libraries and librarians, just as others have reinvented cataloguing in the guise of metadata. notes and references 1. norman oder, “pla 2010 conference: the anythink revolution is ripe,” library journal, mar. 26, 2010, http://www .libraryjournal.com/article/ca6724258.html (accessed mar. 30, 2010). there, i said it! a fairly innocuous disclaimer added to one of my columns last year seemed to garner more attention (http:// freerangelibrarian.com/2009/06/13/marc-truitts-surprising -ital-editorial/) than did the content of the column itself. will the present disclaimer be the subject of similar speculation? 2. one of my favorite antidotes to such bloated, short-term language is embodied in michael gorman’s “human values in a technological age,” ital 20, no. 1 (mar. 2000): 4–11, http:// www.ala.org/ala/mgrps/divs/lita/ital/2001gorman.cfm (accessed apr 12, 2010)—highly recommended. the following is but one of many calming and eminently sensible observations gorman makes: the key to understanding the past is the knowledge that people then did not live in the past—they lived in the present, just a different present from ours. the present we are living in will be the past sooner than we wish. what we perceive as its uniqueness will come to be seen as just a part of the past as viewed from the point of a future present that will, in turn, see itself as unique. people in history did not wear quaintly oldfashioned clothes—they wore modern clothes. they did not see themselves as comparing unfavorably with the people of the future, they compared themselves and their lives favorably with the people of their past. in the context of our area of interest, it is particularly interesting to note that people in history did not see themselves as technologically primitive. on the contrary, they saw themselves as they were—at the leading edge of technology in a time of unprecedented change. it’s all out there on the web anyway, and google will make it findable? absent a few starry-eyed bibliophiles and newly out-of-work librarians—those who didn’t make the grade as consultants—would anyone mourn our disappearance? would anyone notice? if a tree falls in the woods . . . in short, would it matter? and if so, why and how much? the answer to the preceding two questions, i think, can help to point the way to an approach for understanding and evaluating services and change in libraries that is both more realistic and less draining than our obsessive quest for the “silver bullet.” what exactly is our “valueadd”? what do we provide that is unique and valuable? 
we can't hope to compete with barnes and noble, starbucks, or the googleplex; seeking to do so simply diverts resources and energy from providing services and resources that are uniquely ours. instead, new and changed services and approaches should be evaluated in terms of our value-add: if they contribute positively and are within our abilities to do them, great. if they do not contribute positively, then trying to do them is wasteful, a distraction, and ultimately disillusioning to those who place their hopes in such panaceas. some of the "bullets" i listed above may well qualify as contributing to our value-add, and that's fine. my point isn't to judge whether they are "bad" or "good." my argument is about process and how we decide what we should do and not do. understanding what we contribute that is uniquely ours should be the reference standard by which proposed changes are evaluated, not some pie-in-the-sky expectation that pursuit of this or that vogue will magically solve our funding woes, contribute to higher (real or virtual) gate counts, make us more "relevant" to a particular user group, or even raise our flagging self-esteem. in other words, our value-add must stand on its own, regardless of whether it actually solves temporal problems. it is the "why" in "why are we here?" if, at the end of the day, we cannot articulate that which makes us uniquely valuable—or if society as a whole finds that contribution not worth the cost—then i camilla fulton web accessibility, libraries, and the law as a typical student, you are able to scan the resources and descriptions, familiarize yourself with the quiz's format, and follow the link to
the quiz with no inherent problems. everything on the page flows well for you and the content is broken up easily for navigation. now imagine that you are legally blind. you navigate to the webpage with your screen reader, a software device that allows you to surf the web despite your impairment. ideally, the device gives you equal access to webpages, and you can navigate them in an equivalent manner as your peers. when you visit your teacher’s webpage, however, you start experiencing some problems. for one, you cannot scan the page like your peers because the category titles were designed with font tags instead of heading tags styled with cascading style sheets (css). most screen readers use heading tags to create the equivalent of a table of contents. this table of contents function divides the page into navigable sections instead of making the screen reader relay all page content as a single mass. second, most screen readers also allow users to “scan” or navigate a page by its listed links. when you visit your teacher’s page, you get a list of approximately twenty links that all read, “search this resource.” unfortunately, you are unable to differentiate between the separate resources without having the screen reader read all content for the appropriate context. third, because the resources are separated by hard returns, you find it difficult to differentiate between each listed item. your screen reader does not indicate when it approaches a list of categorized items, nor does it pause between each item. if the resources were contained within the proper html list tags of either ordered or unordered (with subsequent list item tagging), then you could navigate through the suggested resources more efficiently (see figures 1, 2, and 3). finally, the video tutorial’s audio tract explains much of the quiz’s structure; however, the video relies on image-capture alone for page orientation and navigation. without a visual transcript, you are at a disadvantage. stylistic descriptions of the page and its buttons are generally unhelpful, but the page’s textual content, and the general movement through it, would better aid you in preparation for the quiz. to be fair, your teacher would already be cognizant of your visual disability and would have accommodated your class needs appropriately. the individuals with disabilities education act (idea) mandates educational institutions to provide an equal opportunity to education.1 your teacher would likely avoid posting any class materials online without being certain that the content was fully accessible and usable to you. unlike educational institutions, however, most libraries are not legally bound to the same law. idea does not command libraries to provide equal access to information through with an abundance of library resources being served on the web, researchers are finding that disabled people oftentimes do not have the same level of access to materials as their nondisabled peers. this paper discusses web accessibility in the context of united states’ federal laws most referenced in web accessibility lawsuits. additionally, it reveals which states have statutes that mirror federal web accessibility guidelines and to what extent. interestingly, fewer than half of the states have adopted statutes addressing web accessibility, and fewer than half of these reference section 508 of the rehabilitation act or web content accessibility guidelines (wcag) 1.0. 
regardless of sparse legislation surrounding web accessibility, librarians should consult the appropriate web accessibility resources to ensure that their specialized content reaches all. i magine you are a student. in one of your classes, a teacher and librarian create a webpage that will help the class complete an online quiz. this quiz constitutes 20 percent of your final grade. through the exercise, your teacher hopes to instill the importance of quality research resources found on the web. the teacher and librarian divide their hand-picked resources into five subject-based categories. each resource listing contains a link to that particular resource followed by a paragraph of pertinent background information. the list concludes with a short video tutorial that prepares students for the layout of the online quiz. neither the teacher nor the librarian has extensive web design experience, but they both have basic html skills. the library’s information technologists give the teacher and librarian web space, allowing them to freely create their content on the web. unfortunately, they do not have a web librarian at their disposal to help construct the page. they solely rely on what they recall from previous web projects and visual layouts from other websites they admire. as they begin to construct the page, they first style each category’s title with font tags to make them bolder and larger than the surrounding text. they then separate each resource and its accompanying description with the equivalent of hard returns (or line breaks). next, they place links to the resources within the description text and label them with “search this resource.” finally, they create the audiovisual tutorial with a runtime of three minutes. camilla fulton (cfulton2@illinois.edu) is web and digital content access librarian, university of illinois, urbana-champaign. web accessibility, libraries, and the law | fulton 35 providing specifics on when those standards should apply. for example, section 508 of the rehabilitation act could serve as a blueprint for information technology guidelines that state agencies should follow. section 508 states that federal employees with disabilities [must] have access to and use of information and data that is comparable to the access and use by federal employees who are not individuals with disabilities, unless an undue burden would be imposed on the agency.4 section 508 continues to outline how the declaration should be met when procuring and managing software, websites, telecommunications, multimedia, etc. section 508’s web standards comply with w3c’s web content accessibility guidelines (wcag) 1.0; stricter compliance is optional. states could stop at section 508 and only make web accessibility laws applicable to other state agencies. section 504 of the rehabilitation act, however, provides additional legislation to model. in section 504, no disabled person can be excluded from programs or activities that are funded by federal dollars.5 section 504 further their websites. neither does the federal government possess a carte blanche web accessibility law that applies to the nation. this absence of legislation may give the impression of irrelevance, but as more core components of librarianship migrate to the web, librarians should confront these issues so they can serve all patrons more effectively. this article provides background information on the federal laws most frequently referenced within web accessibility cases. 
additionally, this article tests three assumptions: ■■ although the federal government has no web accessibility laws in place for the general public, most states legalized web accessibility for their respective state agencies. ■■ most state statutes do not mention section 508 of the americans with disabilities act (ada) or acknowledge world wide web consortium (w3c) standards. ■■ most libraries are not included as entities that must comply with state web accessibility statutes. further discussion on why these issues are important to the library profession follows. ■■ literature review no previous study has systematically examined state web accessibility statutes as they relate to libraries. most articles that address issues related to library web accessibility view libraries as independent entities and run accessibility evaluators on preselected library and university websites.2 those same articles also evaluate the meaning and impact of federal disability laws that could drive the outcome of web accessibility in academia.3 in examining state statutes, additional complexities may be unveiled when delving into the topic of web accessibility and librarianship. ■■ background with no definitive stance on public web accessibility from the federal government, states became tasked with figure 1. these webpages look exactly the same to users, but the html structure actually differs in source code view. 36 information technology and libraries | march 2011 title ii, section 201 (1) defines “public entity” as state and local governments, including their agencies, departments, and districts.9 title iii, section 302(a) builds on title ii and states that in the case of commercial facilities, no individual shall be discriminated against on the basis of disability in the full and equal enjoyment of the goods, services, facilities, privileges, advantages, or accommodations of any place of public accommodation by any person who owns, leases . . . or operates a place of public accommodation.10 delineates specific entities subject to the auspice of this law. though section 504 never mentions web accessibility specifically, states could freely interpret and apply certain aspects of the law for their own use (e.g., making organizations receiving state funds create accessible websites to prevent the exclusion of disabled people). if states wanted to provide the highest level of service to all, they would also consider incorporating the most recent w3c recommendations. the w3c formed in 1994 to address the need for structural consistency across multitudinous websites and web browsers. the driving principle of the w3c is to make the benefits of the web accessible to all, “whatever their hardware, software, network infrastructure, native language, culture, geographical location, or physical or mental ability.”6 the most recent w3c guidelines, wcag 2.0, detail web accessibility guidelines that are simpler to understand and, if followed, could improve both accessibility and usability despite browser type. alternatively, states could decide to wait until the federal government mandates an all-encompassing law on web accessibility. the national federation of the blind (nfb) and american council of the blind (acb) have been trying commercial entities in courts, claiming that inaccessible commercial websites discriminate against disabled people. 
the famous nfb lawsuit against target provided a precedent for other courts to acknowledge; commercial entities should provide an accessible means to purchase regularly stocked items through their website (if they are already maintaining one).7 these commercial web accessibility lawsuits are often defended with title ii and title iii of the ada. title ii, section 202 states, subject to the provisions of this title, no qualified individual with a disability shall, by reason of such disability, be excluded from participation in or be denied the benefits of the services, programs, or activities of a public entity, or be discriminated by any such entity.8 figure 2. here we see distinct variances in the source code. the image at the top (inaccessible) reveals code that does not use headings or unordered lists for each resource. the image on the bottom (accessible) does use semantically correct code, maintaining the same look and feel of the headings and list items through an attached cascading stylesheet. web accessibility, libraries, and the law | fulton 37 accessibility believe that section 301(7) specifically denotes places of physical accommodation because the authors’ original intent did not include virtual ones.13 settling on a definition for “public accommodation” is so divisive that three district courts are receptive to “public accommodation” referring to nonphysical places, four district courts ruled against the notion, and four have not yet made a decision.14 despite legal battles within the commercial sector, state statute analysis shows that states felt compelled to address web accessibility on their own terms. ■■ method this study surveys the most current state statute web presences as they pertain to web accessibility and their connection to libraries. using georgia institute of technology’s state e&it accessibility initiatives database and golden’s article on accessibility within institutions of higher learning as starting points, i searched each state government’s online statutes for the most recently available code.15 examples of search terms used include “web accessibility,” “information technology,” and “accessibility -building -architecture -health.” “building,” for example, excluded statute results that pertained to building accessibility. i then reviewed each statute to determine whether its mandates applied to web accessibility. some statutes excluded mention of web accessibility but outlined specific requirements for an institution’s software procurement. when statutes on web accessibility could not be found, additional searches were conducted for the most recently available web accessibility guidelines, policies, or standards. using a popular web search engine and the search terms “[state] web accessibility” usually resulted in finding the state’s standards online. if the search engine did not offer desirable results, then i visited the appropriate state government’s website. the term “web accessibility” was used within the state government’s site search. the following results serve only as a guide. because of the ever-changing nature of the law, please consult legal advisors within your institution for changes that may have occurred post article publication. 
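before turning to the survey results, the markup contrast described in the opening scenario and in figures 1 and 2 can be made concrete. the sketch below is illustrative only (the resource names and urls are invented placeholders, not content from any actual course page); it shows how the same visual layout can be produced either with font tags, hard returns, and repeated generic link text, or with semantic headings, list markup, and descriptive links that a screen reader can announce and navigate:

<!-- inaccessible version: font tags for titles, hard returns between resources, generic link text -->
<font size="5" color="#003366"><b>new jersey history resources</b></font><br>
background information about the first resource.
<a href="http://example.org/resource1">search this resource</a><br><br>
background information about the second resource.
<a href="http://example.org/resource2">search this resource</a><br><br>

<!-- accessible version: a real heading, a real list, and link text that makes sense out of context -->
<h2>new jersey history resources</h2>
<ul>
  <li>background information about the first resource.
      <a href="http://example.org/resource1">search the state archives collection</a></li>
  <li>background information about the second resource.
      <a href="http://example.org/resource2">search the historical newspaper index</a></li>
</ul>

styled with css, the second version can look identical to the first, but the heading lets a screen reader build a table-of-contents view, the list items are announced one at a time, and the rewritten link text remains meaningful when links are read in isolation. the same principle extends to images and to the video tutorial described earlier: alt attributes and a caption track or text transcript give nonvisual users an equivalent path through the content.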
■■ results “although the federal government has no web accessibility laws in place for the general public, most states legalized web accessibility for its respective state agencies.” false—only seventeen states have codified laws ensuring web accessibility for their state websites.16 four this title’s proclamation seems clear-cut; however, legal definitions of “public accommodation” differ. title iii, section 301(7) defines a list of acceptable entities to receive the title of “public accommodation.”11 among those listed are auditoriums, theaters, terminals, and educational facilities. courts using title iii in defense for web accessibility argue that the web is a place, and therefore cannot discriminate against those with visual, motor, or mental disabilities.12 those arguing against using title iii for web figure 3. fangs (http://www.standards-schmandards.com/ projects/fangs/) visually emulates what a standard screen reader outputs so that designers can take the first steps in creating more accessible content on the web. 38 information technology and libraries | march 2011 classified institutions with library websites found that less than half of each degree-producing division was directed by their institution to comply with the ada for web accessibility.24 some may not recognize the significance of providing accessible library websites, especially if they do not witness a large quantity of accommodation requests from their users. coincidentally, perceived societal drawbacks could keep disabled users from seeking the assistance they need.25 according to american community survey terminology, disabilities negatively affecting web accessibility tend to be sensory and self-care based.26 the 2008 american community survey public use microdata sample estimates that 10,393,100 noninstitutionalized americans of all ages live with a hearing disability and 6,826,400 live with a visual disability.27 according to the same survey, an estimated 7,195,600 noninstitutionalized americans live with a self-care disability. in other words, nearly 24.5 million people in the united states are unable to retrieve information from library websites unless web authors make accessibility and usability their goal. as gatekeepers of information and research resources, librarians should want to be the first to provide unrestricted and unhindered access to all patrons despite their ability. nonetheless, potential objections to addressing web accessibility can deter improvement: learning and applying web accessibility guidelines will be difficult. there is no way we can improve access to disabled users in a way that will be useful. actually, more than 90 percent of sensory-accessibility issues can be resolved through steps outlined in section 508, such as utilizing headings properly, giving alternative image descriptions, and providing captions for audio and video. granted, these elements may be more difficult to manage on extensive websites, but wisely applied web content management systems could alleviate information technology units’ stress in that respect.28 creating an accessible website is time consuming and resource draining. this is obviously an “undue burden” on our facility. we cannot do anything about accessibility until we are given more funding. the “undue burden” clause seen in section 508 and several state statutes is a real issue that government officials needed to address. however, individual institutions are not supposed to view accessible website creation as an isolated activity. 
“undue burden,” as defined by the code of federal regulations, relies upon the overall budget of the program or component being developed.29 claiming an “undue burden” means that the institution must extensively document why creating an accessible website would cause a burden.30 the institution would also have to provide disabled users an alternative means of access to information provided online. of these seventeen extended coverage to include agencies receiving state funds (with no exceptions).17 though that number seems disappointingly low, many states addressed web accessibility through other means. thirtyone states without web accessibility statutes posted some form of standard, policy, or guideline online in its place (see appendix). these standards only apply to state entities, however, and have no legal footing outside of federal law to spur enforcement. at the time of article submission, alaska and wyoming were the only two states without an accessibility standard, policy, or guideline available on the web. “most state statutes do not mention section 508 of the americans with disabilities act or acknowledge world wide web consortium (w3c) standards” true—interestingly, only seven of the seventeen states with web accessibility statutes reference section 508 or wcag 1.0 directly within their statute text (see appendix).18 minnesota is the only state that references the more current wcag 2.0 standards.19 these numbers may seem minuscule as well, but all states have supplemented their statutes with more descriptive guidelines and standards that delineate best practices for compliance (see appendix). within those guidelines and standards, section 508 and wcag 1.0 get mentioned with more frequency. “most libraries are not included as entities that must comply with state web accessibility statutes.” true—from the perspective of a librarian, the above data means that forty-eight states would require web accessibility compliance for their state libraries (see appendix). four of those states (arkansas, california, kentucky, and montana) require all libraries receiving state funds to maintain an accessible website.20 an additional four states (illinois, oklahoma, texas, and virginia) explicitly hold universities, and therefore their libraries, to the same standards as their state agencies.21 despite the commendable efforts of eight states pushing for more far-reaching web accessibility, thousands of k–12, public, and academic libraries nationwide escape these laws’ reach. ■■ discussion and conclusion without legal backing for web accessibility issues at all levels, “equitable access to information and library services” might remain a dream.22 notably, researchers have witnessed web accessibility improvements in a four-year span; however, as of 2006, even libraries at institutions with ala-accredited library and information science programs did not average an accessibility validation of 70 percent or higher.23 additionally, a survey of carnegie web accessibility, libraries, and the law | fulton 39 9. 42 u.s.c. §12131. 10. 42 u.s.c. §12182. 11. 42 u.s.c. §12181. 12. carrie l. kiedrowski, “the applicability of the ada to private internet web sites,” cleveland state law review 49 (2001): 719–47; shani else, “courts must welcome the reality of the modern word: cyberspace is a place under title iii of the americans with disabilities act,” washington & lee law review 65 (summer 2008): 1121–58. 13. ibid. 14. nikki d. 
kessling, “why the target ‘nexus test’ leaves disabled americans disconnected: a better approach to determine whether private commercial websites are ‘places of public accommodation,’” houston law review 45 (summer 2008): 991–1029. 15. state e & it accessibility initiatives workgroup, “state it database,” georgia institute of technology, http://acces sibility.gtri.gatech.edu/sitid/state_prototype.php (accessed jan. 28, 2010); nina golden, “why institutions of higher education must provide access to the internet to students with disabilities,” vanderbilt journal of entertainment & technology law 10 (winter 2008): 363–411. 16. arizona revised statutes §41-3532 (2010); arkansas code of 1987 annotated §25-26-201–§25-26-206 (2009); california government code §11135–§11139 (2010); colorado revised statutes §24-85-101–§24-85-104 (2009); florida statutes §282.601– §282.606 (2010); 30 illinois complied statutes annotated 587 (2010); burns indiana code annotated §4-13.1-3 (2010); kentucky revised statutes annotated §61.980–§ 61.988 (2010); louisiana revised statutes §39:302 (2010); maryland state finance and procurement code annotated §3a-311 (2010); minnesota annotated statutes §16e.03 subdivisions 9-10 (2009); missouri revised statutes §191.863 (2009); montana code annotated §185-601 (2009); 62 oklahoma statutes §34.16, §34.28–§34.30 (2009); texas government code §2054.451–§2054.463 (2009); virginia code annotated §2.2-3500–§2.2-3504 (2010); west virginia code § 18-10n-1–§18-10n-4 (2009). 17. arkansas code of 1987 annotated §25-26-202(7) (2009); california government code §11135 (2010); kentucky revised statutes annotated §61.980(4) (2010); montana code annotated §18-5-602 (2009). 18. arizona revised statutes §41-3532 (2010); california government code §11135(d)(2) (2010); burns indiana code annotated §4-13.1-3-1(a) (2010); florida statutes §282.602 (2010); kentucky revised statutes annotated §61.980(1) (2010); minnesota annotated statutes §16e.03 subdivision 9(b) (2009); missouri revised statutes §191.863(1) (2009). 19. minnesota annotated statutes §16e.03 subdivision 9(b) (2009). 20. arkansas code of 1987 annotated §25-26-202(7) (2009); california government code §11135 (2010); kentucky revised statutes annotated §61.980(4) (2010); montana code annotated §18-5-602 (2009). 21. 30 illinois complied statutes annotated 587/10 (2010); 62 oklahoma statutes §34.29 (2009); texas government code §2054.451 (2009); virginia code annotated §2.2-3501 (2010). 22. american library association, “alahead to 2010 strategic plan,” http://www.ala.org/ala/aboutala/missionhistory/ plan/2010/index.cfm (accessed jan. 28, 2010). 23. comeaux and schmetzke, “accessibility trends.” no one will sue an institution focused on promoting education. we will just continue providing one-on-one assistance when requested. in 2009, a blind student, backed by the nfb, initiated litigation against the law school admissions council (lsac) because of the inaccessibility of its online tests.31 in 2010, they added four law schools to the defense: university of california hastings college of the law, thomas jefferson school of law, whittier law school, and chapman university school of law.32 these law schools were added because they host their application materials on the lsac website.33 assuredly, if instructors and students are encouraged or required to use library webpages for assignments and research, those unable to use them in an equivalent manner as their peers may pursue litigation for forcible change. 
ultimately, providing accessible websites for library users should not be perceived as a hassle. sure, it may entail a new way of thinking, but the benefits of universal access and improved usability far outweigh the frustration that users may feel when they cannot be self-sufficient in their web-based research.34 regardless of whether the disabled user is in a k–12, college, university, or public library, they are paying for a service that requires more than just a physical accommodation.35 federal agencies, state entities, and individual institutions are all responsible (and important) in the promotion of accessible website construction. lack of statutes or federal laws should not exempt libraries from providing equivalent access to all; it should drive libraries toward it. references 1. individuals with disabilities education act of 2004, 40 u.s.c. §1411–§1419. 2. see david comeaux and axel schmetzke, “accessibility trends among academic library and library school web sites in the usa and canada,” journal of access services 6 (jan.–june 2009): 137–52; julia huprich and ravonne green, “assessing the library homepages of copla institutions for section 508 accessibility errors: who’s accessible, who’s not and how the online webxact assessment tool can help,” journal of access services 4, no. 1 (2007): 59–73; michael providenti and robert zai iii, “web accessibility at kentucky’s academic libraries,” library hi tech 25, no. 4 (2007): 478–93. 3. ibid.; michael providenti and rober zai iii, “web accessibility at academic libraries: standards, legislation, and enforcement,” library hi tech 24, no. 4 (2007): 494–508. 4. 29 u.s.c. §794(d); 36 code of federal regulations (cfr) §1194.1. 5. 29 u.s.c. § 794. 6. world wide web consortium, “w3c mission,” http:// www.w3.org/consortium/mission.html (accessed jan. 28, 2010). 7. national federation of the blind v. target corp., 452 f. supp. 2d 946 (n.d. cal. 2006). 8. 42 u.s.c. §12132. 40 information technology and libraries | march 2011 special needs, vol. 5105, lecture notes in computer science (linz, australia: springer-verlag, 2008) 454–61; david kane and nora hegarty, “new site, new opportunities: enforcing standards compliance within a content management system,” library hi tech 25, no. 2 (2007): 276–87. 29. 28 cfr §36.104. 30. ibid. 31. sheri qualters, “blind law student sues law school admissions council over accessibility,” national law journal (feb. 20, 2009), http://www.law.com/jsp/nlj/pubarticlenlj .jsp?id=1202428419045 (accessed jan. 28, 2010). follow the case at the county of alameda’s superior court of california, available online (search for case number rg09436691): http://apps .alameda.courts.ca.gov/domainweb/html/index.html (accessed sept. 20, 2010). 32. ibid. 33. ibid. after finding the case, click on “register of actions” in the side navigation menu. these details can be found on page 10 of the action “joint case management statement filed,” uploaded june 30, 2010. 34. jim blansett, “digital discrimination: ten years after section 508, libraries still fall short of addressing disabilities online,” library journal 133 (aug. 2008): 26–29; drew robb, “one site fits all: companies are working to make their web sites comply with accessibility guidelines because the effort translates into more customers,” computerworld (mar. 28, 2005): 29–32. 35. the united states department of justice supports title iii’s application of “public accommodation” to include virtual web spaces. see u.s. 
department of justice, “settlement agreement between the united states of america and city of missoula county, montana under the americans with disabilities act,” dj# 204-44-45, http://www.justice.gov/crt/foia/mt_1.php and http://www.ada.gov/missoula.htm (accessed jan. 28, 2010). 24. ruth sara connell, “survey of web developers in academic libraries,” journal of academic librarianship 34, no. 2 (2008): 121–29. 25. patrick m. egan and traci a. guiliano, “unaccommodating attitudes: perceptions of students as a function of academic accommodation use and test performance” north american journal of psychology 11, no. 3 (2009): 487–500; ramona paetzold et al., “perceptions of people with disabilities: when is accommodation fair?” basic & applied social psychology 30 (2008): 27–35. 26. u.s. census bureau, american community survey, puerto rico community survey: 2008 subject definitions (washington, d.c.: government printing office, 2009). hearing disability pertains to deafness or difficulty in hearing. visual disability pertains to blindness or difficulty seeing despite prescription glasses. self-care disability pertains to those whom have “difficulty dressing or bathing.” 27. u.s. census bureau, data set: 2006–2008 american community survey (acs) public use microdata sample (pums) 3-year estimates (washington, d.c.: government printing office, 2009). for a more interactive table, with statistics drawn directly from the american community survey pums data files, see the database created and maintained by the employment and disability institute at cornell university: m. j. bjelland, w. a. erickson, and c. g. lee, disability statistics from the american community survey (acs), cornell university rehabilitation research and training center on disability demographics and statistics (statsrrtc), http://www.disabilitystatistics.org (accessed jan. 28, 2010). 28. sébastien rainville-pitt and jean-marie d’amour, “using a cms to create fully accessible web sites,” journal of access services 6 (2009): 261–64; laura burzagli et al., “using web content management systems for accessibility: the experience of a research institute portal,” in proceedings of the 11th international conference on computers helping people with appendix. library website accessibility requirements, by state state libraries included? code online state statutes online statements/policies/ guidelines ala. n/a n/a n/a http://isd.alabama.gov/isd/statements .aspx alas. n/a n/a n/a n/a ariz.* state and statefunded (with exceptions) arizona revised statutes §413532 http://www.azleg.state.az.us/ arizonarevisedstatutes.asp? title=41 http://az.gov/polices_accessibility.html ark. state and state-funded arkansas code annotated §2526-201 thru §25-26-206 http://www.arkleg.state.ar.us/assembly/ arkansascodelargefiles/title%2025%20 state%20government-chapter%2026%20 information%20technology.htm and http:// www.arkleg.state.ar.us/bureau/publications/ arkansas%20code/title%2025.pdf http://portal.arkansas.gov/pages/policy .aspx web accessibility, libraries, and the law | fulton 41 state libraries included? code online state statutes online statements/policies/ guidelines calif.* state and state-funded california government code §11135 thru §11139 http://www.leginfo.ca.gov/calaw.html http://www.webtools.ca.gov/accessibility/ state_standards.asp colo. state colorado revised statutes §2485-101 thru §24-85-104 http://www.state.co.us/gov_dir/leg_dir/ olls/colorado_revised_statutes.htm www.colorado.gov/colorado/accessibility .html conn. 
conn. | n/a | n/a | n/a | http://www.access.state.ct.us/
del. | n/a | n/a | n/a | http://gic.delaware.gov/information/access_central.shtml
fla.* | state | florida statutes §282.601 thru §282.606 | http://www.leg.state.fl.us/statutes/ | http://www.myflorida.com/myflorida/accessibility.html
ga. | n/a | n/a | n/a | http://www.georgia.gov/00/static/0,2085,4802_0_0_accessibility,00.html
hawaii | n/a | n/a | n/a | http://www.ehawaii.gov/dakine/docs/ada.html
idaho | n/a | n/a | n/a | http://idaho.gov/accessibility.html
ill. | state and university | 30 illinois compiled statutes annotated 587 | http://www.ilga.gov/legislation/ilcs/ilcs.asp | http://www.dhs.state.il.us/page.aspx?item=32765
ind.* | state and local government | burns indiana code annotated §4-13.1-3 | http://www.in.gov/legislative/ic/code/title4/ar13.1/ch3.html | http://www.in.gov/core/accessibility.htm
iowa | n/a | n/a | n/a | http://www.iowa.gov/pages/accessibility
kans. | n/a | n/a | n/a | http://www.kansas.gov/about/accessibility_policy.html
ky.* | state and state-funded | kentucky revised statutes annotated §61.980 thru §61.988 | http://www.lrc.ky.gov/krs/titles.htm | http://technology.ky.gov/policies/webtoolkit.htm
la. | state | louisiana revised statutes §39:302 | http://www.legis.state.la.us/ | http://www.louisiana.gov/government/policies/#webaccessibility
maine | n/a | n/a | n/a | http://www.maine.gov/oit/accessibility/policy/webpolicy.html
md. | state and (possibly) community college | maryland state finance and procurement code annotated §3a-311 | http://www.michie.com/maryland/ and http://www.dsd.state.md.us/comar/comar.aspx | http://www.maryland.gov/pages/accessibility.aspx
mass. | n/a | n/a | n/a | http://www.mass.gov/accessibility and http://www.mass.gov/?pageid=mg2utilities&l=1&sid=massgov2&u=utility_policy_accessibility
mich. | n/a | n/a | n/a | http://www.michigan.gov/som/0,1607,7–192–26913–2090—,00.html
minn.** | state | minnesota annotated statutes §16e.03 subdivisions 9–10 | https://www.revisor.mn.gov/pubs/ | http://www.starprogram.state.mn.us/accessibility_usability.htm
miss. | n/a | n/a | n/a | http://www.mississippi.gov/access_policy.jsp
mo.* | state | missouri revised statutes §191.863 | http://www.moga.mo.gov/statutes/statutes.htm | http://oa.mo.gov/itsd/cio/standards/ittechnology.htm
mont. | state and state-funded | montana code annotated §18-5-601 | http://data.opi.mt.gov/bills/mca_toc/index.htm | http://mt.gov/discover/disclaimer.asp#accessibility
neb. | n/a | n/a | n/a | http://www.webmasters.ne.gov/accessibilitystandards.html
nev. | n/a | n/a | n/a | http://www.nitoc.nv.gov/psps/3.02_standard_webstyleguide.pdf
n.h. | n/a | n/a | n/a | http://www.nh.gov/wai/
n.j. | n/a | n/a | n/a | http://www.state.nj.us/nj/accessibility.html
n.m. | n/a | n/a | n/a | http://www.newmexico.gov/accessibility.htm
n.y. | n/a | n/a | n/a | http://www.cio.ny.gov/policy/nys-p08–005.pdf
n.c. | n/a | n/a | n/a | http://www.ncsta.gov/docs/principles%20practices%20standards/application.pdf
n. dak. | n/a | n/a | n/a | http://www.nd.gov/ea/standards/
ohio | n/a | n/a | n/a | http://ohio.gov/policies/accessibility/
okla. | state and university | 62 oklahoma statutes §34.16, §34.28 thru §34.30 | http://www.lsb.state.ok.us/ | http://www.ok.gov/accessibility/
ore. | n/a | n/a | n/a | http://www.oregon.gov/accessibility.shtml
pa. | n/a | n/a | n/a | http://www.portal.state.pa.us/portal/server.pt/community/it_accessibility/10940
r.i. | n/a | n/a | n/a | http://www.ri.gov/policies/access.php
s.c. | n/a | n/a | n/a | http://sc.gov/policies/accessibility.htm
s. dak. | n/a | n/a | n/a | http://www.sd.gov/accpolicy.aspx
tenn. | n/a | n/a | n/a | http://www.tennesseeanytime.org/web-policies/accessibility.html
tex. | state and university | texas government code §2054.451 thru §2054.463 | http://www.statutes.legis.state.tx.us/ | http://www.texasonline.com/portal/tol/en/policies
utah | n/a | n/a | n/a | http://www.utah.gov/accessibility.html
va. | state, university, and commonwealth | virginia code annotated §2.2-3500 thru §2.2-3504 | http://leg1.state.va.us/000/src.htm | http://www.virginia.gov/cmsportal3/about_virginia.gov_4096/web_policy.html
vt. | n/a | n/a | n/a | http://www.vermont.gov/portal/policies/accessibility.php
wash. | n/a | n/a | n/a | http://isb.wa.gov/webguide/accessibility.aspx
w. va. | state | west virginia code §18-10n-1 thru §18-10n-4 | http://www.legis.state.wv.us/wvcode/code.cfm | http://www.wv.gov/policies/pages/accessibility.aspx
wis. | n/a | n/a | n/a | http://www.wisconsin.gov/state/core/accessibility.html
wyo. | n/a | n/a | n/a | n/a

*these states mention section 508 of the rehabilitation act within statute text
**this state mentions wcag 2.0 within its statute text
note: most states with statutes on web accessibility also have statements, policies, and guidelines that are more detailed than the statute text and may contain references to section 508 and wcag 2.0. all webpages were visited between january 1, 2010, and february 12, 2010.

public libraries, values, trust, and e-government
paul t. jaeger and kenneth r. fleischmann
paul t. jaeger (pjaeger@umd.edu) is an assistant professor and director of the center for information policy and electronic government at the college of information studies of the university of maryland, college park. kenneth r. fleischmann (kfleisch@umd.edu) is an assistant professor at the college of information studies of the university of maryland, college park.

as public libraries are becoming e-government access points relied on by both patrons and government agencies, it is important for libraries to consider the implications of these roles. while providing e-government access serves to reinforce the tremendously important role of public libraries in the united states social infrastructure, it also creates new demands on libraries and opens up significant new opportunities. drawing upon several different strands of research, this paper examines the nexus of public libraries, values, trust, and e-government, focusing on the ways in which the values of librarianship and the trust that communities place in their public libraries reinforce the role of public libraries in the provision of e-government. the unique values embraced by public libraries have not only shaped the missions of libraries, they have influenced popular opinion surrounding public libraries and fostered the confidence that communities place in them as a source of trusted information and assistance in finding information. as public libraries have embraced the provision of internet access, these values and trust have become intertwined with their new social role as a public access point for e-government both in normal information activities and in the most extreme circumstances. this paper explores the intersections of these issues and the relation of the vital e-government role of public libraries to library funding, public policy, library and information science education, and research initiatives.

public libraries have always been valued and trusted institutions within society.
due to recent advances in technology and changes in united states society, public libraries now also play a unique and critical role by offering free public internet access. with the increasing reliance on the internet as a key source of news, social capital, and access to government services and information, the free access provided by public libraries is an invaluable resource. as a result, a significant proportion of the u.s. population, including people who have no other means of access, people who need help using computers and the internet, and people who have lower quality access, rely on the internet access and computer help available in public libraries. federal, state, and local government agencies now also rely on public libraries to provide citizens with access to and guidance in using e-government web sites, forms, and services; many government agencies simply direct citizens to the nearest public library for help. this confluence of events has created a major new social role for public libraries—guarantors of internet and e-government access. though public libraries are not the only points of free internet access in many communities, they have made the strongest commitment to providing access and help for all. by providing not only access to technology but also help in using that technology, libraries became internet access points, while community technology centers, which usually did not offer the same level of available assistance, failed in the late 1990s and early 2000s. further, as libraries provide not only internet access but free computer access as well, they attract the people who do not own computers and do not benefit from a city's or coffee shop's free wi-fi. the compelling combination of free computer access, free internet access, the availability of assistance from knowledgeable librarians, the value that public librarians place on serving their local communities, and the historical trust that society places in public libraries has made libraries a critical part of the u.s. social infrastructure. without public libraries, large segments of the population would be cut off from access to the internet and e-government. while the provision of internet access for those who have no other access parallels the role of public libraries as providers of access to print materials, the maturation of public libraries into internet and e-government access hubs has profound implications for the roles that public libraries are being expected to play in their communities. public libraries are trusted by their communities as places that community members can turn to for unfettered information access and as places to go for information in times of need. combining this trust with the power of internet access and support makes public libraries even more critical within their local communities. the trust placed in libraries is also important in balancing the lack of confidence that many citizens place in other government institutions as well as in the internet. clearly, e-government, which exists at this intersection, has its trustworthiness bolstered by the role of public libraries in its use. as patrons are able to access e-government through the library—a place that is trusted—they may have greater confidence in the government services they use through library computers and with the assistance of librarians. the important role of libraries in providing citizens with access to the internet, and especially to e-government, makes natural sense given the values of the public library.
these new services reflect the values traditionally upheld by public libraries, such as equal access to information, literacy and learning, and democracy. indeed, these values likely have played a significant role in developing and sustaining public trust in public libraries as institutions. thus, to understand how public libraries have come to serve as the default site for e-government access, it is important to consider how this role builds on and reflects the public library's enduring values. drawing upon several different strands of research, this article explores the intersections of public libraries, values, trust, and e-government. the article first examines the values of public libraries and the role that these values play in influencing popular opinion surrounding public libraries. next, the article focuses on the trust that communities place in public libraries, which builds upon the values that libraries uphold. after that, the article explores the reasons why public libraries became and remain the public access point for e-government, providing examples from the 2004 and 2005 hurricane seasons that illustrate this point in the most extreme circumstances. the article then examines the nexus of public libraries, values, trust, and e-government, further examining how the values of librarianship and the confidence that communities place in their public libraries reinforce the role of public libraries in the provision of e-government. finally, the article explores how the e-government role of public libraries could be cultivated to improve library services through involvement in research and educational initiatives.

■ public libraries and values

values can be seen as "evaluative beliefs that synthesize affective and cognitive elements to orient people to the world in which they live."1 in other words, values tie together how individuals think about the world and how they feel about the world. following this definition, values are situated within individuals. although they are a result of social interaction and may be shared among individuals, values are a highly individualized and personalized phenomenon. thus, values arise at the intersection of the individual and the social, with some scholars now making a case for increasing the emphasis placed on values in the social sciences.2 recently, many scholars and commentators have focused on the values of libraries, most notably former ala president michael gorman, who has written extensively on the topic.3 gorman focuses on library values in response to what he views as a disconnect between library practitioners and academics.
he argues that library­science programs are becoming increasingly detached from reality, and that one way to ground library science, as well as the library profession, is through an emphasis on the values of librar­ ianship, which demonstrate the core, enduring values of the profession.4 he explains that values, on the one hand, should provide a foundation for interaction and mutual understanding among members of a profession; on the other hand, they should not be viewed as immutable, but rather as sufficiently flexible to match the changing times. he lists eight central values of librarianship that he views as particularly salient at present: stewardship, service, intellectual freedom, rationalism, literacy and learning, equity of access to recorded knowledge and information, privacy, and democracy. frances groen echoes gorman’s sentiments and argues that one of the major limitations of library­science programs is their lack of attention to values.5 she argues that library and information science (lis) programs place almost all of their educational emphasis on what librar­ ians do and how they do it, and almost none on the rea­ sons why they do what they do and why such activities are important. she identifies three fundamental library values: access to information, universal literacy, and preservation of cultural heritage, all of which she argues are also characteristics of liberal democratic societies. this argument parallels the observation that increases in information access within a society are essential to increasing the inclusiveness of the democratic process in that society.6 library historian toni samek focuses on another aspect of library values that is no longer as strongly emphasized—attempts to achieve neutrality in libraries.7 neutrality often was advocated as a cherished value, in the sense of providing equal access to all information and sources. however, samek demonstrates that libraries, on the contrary, were more likely to emphasize mainstream information sources and thus privilege them over alter­ native sources. not only has the value of neutrality been problematic in terms of how it has been implemented and mobilized in public libraries in the 1960s and 1970s, but it also is perhaps impossible to ever achieve in reality.8 the fact that neither gorman nor groen include neutrality in their listings of fundamental library values demonstrates how library values have continued to evolve as public libraries have developed as social institutions. as library values have developed, they have served to unite librarians and establish the role of public libraries in their communities. the values of librarianship have been encoded in the american library association’s (ala) library bill of rights, which strongly asserts the values of equal access and service for all patrons, nondiscrimina­ tion, diversity of viewpoint, and resistance to censorship and other abridgments of freedom of expression.9 the values of libraries and librarianship are one of the fac­ tors that lead communities to trust public libraries, as the following section explores. 
overall, further study of the role of values in libraries is essential, especially given the increasing role of technology in public libraries.10

■ public libraries and trust

exactly one half of the respondents to a 2007 pew research center study agreed with the statement "you can't be too careful in dealing with people."11 however, even in a climate where trust can be a precious commodity, public libraries are trusted by their communities. carr argues that libraries have come to earn the trust of their communities because of four obligations that librarians strive to meet: to provide user-centered service, to actively engage in helping users, to connect information seekers to unexplored information sources, and to take the goal of helping users as a professional duty that is controlled first and foremost by the library user.12 similarly, jaeger and burnett argue that, because of its traditional defense of commonly accepted and popular values—such as free access to and exchange of information, providing a diverse range of materials and perspectives to users from across society, and opposition to government intrusions into personal reading habits—the public library has come to be seen by members of the populace as a trusted source of information in the community.13 gorman argues for a direct link between the values of libraries and the trust that is instilled within them by the public, stating that one important mission for ensuring the survival of libraries and librarianship is "assuring the bond of trust between the library and the society we serve by demonstrating our stewardship and commitment, thus strengthening the mutuality of the interests of librarians and the broader community."14 further, a 2006 study conducted by public agenda found that "public libraries seem almost immune to the distrust that is associated with so many other institutions."15 in specific terms of the internet, the public library "is a trusted community-based entity to which individuals turn for help in their online activities—even if they have computers and internet access at home or elsewhere."16 in a large-scale national survey, 64 percent of respondents, including both users and non-users of public libraries, asserted that providing public access to the internet should be one of the highest priorities for public libraries.17 thus, trust in public libraries seems to carry over from other library services to the provision of internet access and training. however, challenges to trust in public libraries seem to be growing in the internet age.
the trusted role of protecting users' personal information may create conflicts with the other social responsibilities of public libraries.18 as a result of a lack of preparedness of some librarians to deal with privacy issues, it is possible that "the trust that research shows users place in libraries is not fully repaid."19 a 2005 oclc study suggests that, indeed, user trust in public libraries shows signs of weakening, as the majority of citizens place as much trust in internet search engines as they do in public libraries.20 further, the changes in the law following the 9/11 terror attacks that have increased the ability of the federal government to track patron activities in public libraries, such as through the usa patriot act, have raised serious concerns about privacy and freedom of expression among many public library patrons and librarians.21

trust in libraries also has been challenged by the imposition of filters for public libraries that receive e-rate funding due to the children's internet protection act.22 while internet access is no longer unfettered in libraries that have to comply with the law, public libraries have been able to prevent this law from eroding their role as trusted internet provider through ala's vigorous legal challenge to the constitutionality of the law and the rejection of e-rate funds by a large number of libraries after the supreme court upheld the constitutionality of the law.23 thus, the trusting relationships that public libraries have built with their communities are valuable commodities that can be transferred under some circumstances from one particular service to another, yet they are not inalienable rights granted to public libraries. rather, public trust is something that libraries must work hard to maintain. trust in public libraries also has served as an important cause and effect of the role of libraries in providing access to e-government.

■ public libraries and e-government

public libraries are not only trusted as a means of access to the internet in general, they are trusted as a provider of access to e-government. with nearly every united states public library now connected to the internet and offering free public access, libraries can fill a community need by ensuring that all citizens have access to e-government and assistance in using e-government services.24 indeed, public libraries and the internet have both improved public access to government information.25 this social role also is embraced by all levels of government, with government agencies often directing people with questions about their online materials to public libraries for help.26 as such, government agencies also trust public libraries to serve as key providers of e-government access and training. public libraries could not have foreseen becoming the default social access point for e-government when they began to provide free public internet access in the mid-1990s, due in great part to the largely separate evolution of internet access in libraries and e-government. however, they now fill this role in society, ensuring access for those who have no other means of reaching e-government and providing a safety net of training and assistance for those who have access but need help using e-government. public libraries have developed into the social source of e-government for two reasons.
the first is simply that libraries committed to the provision of public internet access in the early 1990s and have continued to grow and improve that access so that virtually all public libraries in the united states provide free public internet access.27 however, presence of access alone does not account for the current role of the public library, as most public schools and government offices have internet access, and community technology centers were origi­ nally funded to create an environment that would provide computer access. a key difference in public libraries is that they are historically trusted as providers of information, including government information, to all segments of society. “the public library is one place that is culturally ingrained as a trusted source of free and open information access and exchange.”28 a key part of the provision of internet access in pub­ lic libraries also has been providing help. as heanue explains, “even if americans had all the hardware they needed to access every bit of government information they required, many would still need the help of skilled librarians whose job it is to be familiar with multiple systems of access to government systems.”29 not only is the information trusted because of the source, the help is trusted because the librarians are part of the library. as e­government has developed and the complexity has grown, this trusted help has become invaluable to many people who need to use e­government but do not feel able to on their own. in a 2001 study of both public library and internet users, the key preferences identified for public libraries included the ease of use, accuracy of informa­ tion available, and help provided by library staff.30 these perceptions have carried over into e­government, as the staff members not only provide help using e­government; their guidance directs users to the correct e­government sites and forms and makes using the sites an easier expe­ rience than it otherwise would be. in the era of e­government, governments internation­ ally are showing a strong preference for delivering ser­ vices via the internet, particularly as a means of boosting cost­efficiency and reducing time spent on direct interac­ tions with citizens.31 however, citizens show a strong preference for phone­based or in­person interactions with government representatives when they have questions or are seeking services.32 e­government services generally are limited by difficulties in searching for and locating the desired information, as well as lack of availability of computers and internet access to many segments of the general population.33 such problems are exacerbated by general lack of familiarity of the structure of government and which agencies to contact as well as many citizens’ attitudes toward technology and government.34 also, as e­government sites give more emphasis to presenting political agendas rather than promoting democratic par­ ticipation, users are less trusting of the sites themselves.35 finally, perhaps the most compelling reason for the reli­ ance on public libraries to provide access to and help with e­government is that public libraries provide support equally to all members of a community—and that free services are of most relative value to those who have the fewest resources of their own. 
as a result of the reliance of patrons and government agencies on the public library as a center for e-government access and assistance, public librarians have had to become de facto experts on e-government, ranging from medicare prescription plans to fema forms to immigration registration to water management registration.36 in one case, the involvement of a librarian who specialized in government information was necessary in a community planning process to sort through the related e-government materials and information sources.37 one area where the social roles as provider of e-government and as trusted provider of information were notably intertwined was during the 2004 and 2005 hurricane seasons along the gulf coast.

■ public libraries as trusted provider of e-government

public libraries have become vital access points and communication hubs for many communities and, in times of emergency, are vital in helping their communities cope with the crisis.38 this role proved especially important in communities along the gulf coast during the unprecedented 2004 and 2005 hurricane seasons, with public libraries employing their internet access to assist their communities in hurricane recovery in numerous ways. the public libraries in that region described five major roles for public library internet access in communities after a hurricane:

■ finding and communicating with dispersed and displaced family members and friends;
■ completing fema forms, which are online only, and insurance claims;
■ searching for news about conditions in the areas from which they had evacuated;
■ trying to find information about the condition of their homes or places of work, including checking news sites and satellite maps; and
■ helping emergency service providers find information and connect to the internet.39

the provision of e-government information and assistance in filling out e-government forms was a central function of these libraries in helping their communities. the level of assistance was astounding—one mississippi library completed more than forty-five thousand fema applications for patrons in the first month after katrina struck—despite the fact that the libraries were not specifically prepared to offer such a service and that few library systems planned for this type of situation.40 furthermore, while libraries helped many communities, they could not meet the enormous needs in the affected communities. the events along the gulf coast in 2004 and 2005 revealed a serious need for the integration of local and state public entities that have large-scale coordination plans to work with the libraries.41 most of the functions that community organizations played in the most ravaged areas after katrina, rita, wilma, dennis, ivan, and the other major storms were completely ad hoc and unplanned.42 the federal government was of little help in the immediate aftermath of many of these situations.43 as such, it was the local community organizations, particularly public libraries, that used information technology (at least what was still working) to try to pick up the pieces, get aid, find the missing, and perform other vital functions.
consider the following quotes from local government officials explaining the role computers and internet access in public libraries played in providing information to devastated communities:

our public access computers have been the only source of communicating with insurance carriers, the federal emergency management agency and other sources of aid.

the greatest impact has been access to information such as fema forms and job applications that are only available via internet. this was highly visible during the aftermath of hurricanes rita & katrina. overall access to information in this rural community has been outstanding due to use of the internet.

relief workers were encouraged to use the library to keep in touch with family and friends through email. . . . the library provided a fema team with local maps and help in locating areas that potentially suffered major damage from the storm.

during the immediate aftermath of katrina, our computers were invaluable in locating missing family, applying for fema relief (which could only be done online) and other emergency needs. for that time—the computers were a godsend.

we have a large number of displaced people who are coming to rely upon the library in ways many of them never expected. i've had so many people tell me that they had never been to a library before they had to find someplace to file a fema application or insurance claim. many of these people knew nothing about computers and would have been totally lost without the staff's help.44

along with e-government access, one of the greatest effects of internet access involved searches for lost family, friends, and pets, with many libraries creating lists of individuals who had been to the library and who were being sought, to help establish contacts between people. as one librarian stated, "our computers were invaluable in locating a missing family."45 searches were conducted by patrons and by librarians helping them to locate evacuees and search for information about those who stayed behind. internet access also allowed patrons to have "contact with family members outside of the disaster area," "communicate with family and friends," and "stay in touch with family and friends due to lack of telephone service."46 libraries used their internet access to help rescue personnel communicate with their agencies, and even to direct emergency responders with directions, maps, and information about where people most needed help.47

the level of local libraries' success in meeting the needs of their communities after the hurricanes varied widely, though. many were simply overwhelmed by the numbers of people in need and limited by the fact that they had never expected to have to act as a community lifeline in this way.48 the libraries that fared the best were usually in florida; they have a greater familiarity with dealing with hurricanes and thus were more prepared and had more established ties between local libraries, county governments, and state agencies.49 having internet access and expertise is clearly not enough. planning, coordination, experience, and government support and funding all influenced how different public libraries were able to respond after the major hurricanes.
public libraries also may be able to play a role in ongoing emergency response efforts, such as the development of large-scale community response grids that coordinate citizens and emergency responders in emergencies.50 the greatest lesson, however, may be that public libraries, as trusted providers of information technology access, particularly access to e-government, are the most local line of response in communities. the national government failed shatteringly and completely to help people after hurricane katrina, while little public libraries in and on the edges of the devastation hummed along. the local nature of the response that libraries could provide managed to reach communities and members of those communities much better than national or state level responses. such local response to crises, while vital, is becoming much harder to find outside of public libraries.

■ the nexus of public libraries, values, trust, and e-government

the democratically oriented core values of public libraries and the trust that communities place in their public libraries have the potential to significantly enhance and strengthen the role of public libraries in the provision of e-government. citizens who access e-government using computers in public libraries, and with the expert assistance of librarians, may have more confidence in the e-government information and services they are using as a result of their high regard for public libraries. as patrons trust that librarians will help them reach the information they need, patrons' awareness of and confidence in e-government will increase as they learn from librarians about the types of information and services available from e-government. further, by teaching patrons what is available from and how to use e-government, librarians are serving to increase the number of e-government users. because e-government is still at an early stage in its development, such positive associations could play a critical role in encouraging and facilitating its widespread acceptance and adoption.

just as e-government is still in its formative stages, research on e-government also is just getting started. to date, research on e-government has focused more on technical than social aspects. for example, a meta-analysis of 110 peer-reviewed journal articles related to e-government revealed that the relationship between e-government and values is an important, yet to date understudied, topic.51 it is important to consider not only bandwidth and markup languages, but also values and trust in developing and analyzing e-government. it also is important to consider the relationship between trust in e-government and the potential for increasingly participatory democracy. trust can be seen as "centrally positioned at the nexus between the primarily internally driven administrative reforms of e-government's architecture and the related, more externally rooted pressures for e-governance reflected in widening debates on openness and engagement."52 similarly, "citizen engagement can help build and strengthen the trust relationship between governments and citizens."53 e-government can facilitate citizen participation in government through the bidirectional interactive potential of the internet, making it possible to move toward strong democracy.54 greater faith in democracy can potentially significantly increase citizen trust in e-government.
at the same time that we consider all of these impor­ tant issues related to e­government, it is important not to lose sight of the critical role that public libraries play in the provision of e­government. further, it is necessary to make certain that public libraries receive credit and support for the work that they do in providing access to and help with e­government. as demonstrated above, public libraries are uniquely and ideally situated to ensure access to and assistance in using e­government information and services. however, this activity is not sustainable without the recognition and resources that must accompany this role. the conclusion addresses this important point in more detail. ■ conclusions and future directions the evolution of the public library into an e­government access point has occurred without the direct intention of public libraries and without their involvement in policy decisions related to these new social roles. as with the need to become more active in encouraging the develop­ ment of technologies to help libraries fulfill these social expectations, public libraries also must become more involved in the policy­making process and in seeking financial and other support for these activities. public libraries have to demand a voice not only to better con­ vey their critical role in the provision e­government, but to help shape the direction of the policy­making process to ensure more government support for the access to and help with e­government that they provide. public libraries have taken on these responsibilities without receiving additional funding. while the provi­ sion of internet access alone is a major expense for public libraries, the reliance of government agencies on public libraries as the public support system for e­government adds very significant extra burdens to libraries.55 in a 2007 survey of florida public libraries, for example, 98.7 percent indicated that they receive no support from an outside agency to support the e­government services the library provides, despite the fact that 83.3 percent of responding libraries indicated that the use of e­govern­ ment in the library had increased overall library usage.56 this lack of outside support has resulted in public librar­ ies in different parts of the country having widely varying access to the internet.57 the reality is that public libraries are expected by patrons and government agencies to fulfill this social role, whether or not any support—financial, staffing, or training—is provided for this role. the vital roles that public libraries played in the aftermath of the major hur­ ricanes of the 2004 and 2005 seasons may have perma­ nently cemented the public and government perception of public libraries as hubs for e­government access.58 while public libraries have become the unofficial uni­ versal access point for e­government and are trusted to serve as a vital community response and recovery agency during emergencies, they do not receive funding or other forms of external assistance for these functions. public libraries need to become involved in and encourage plans and programs that will serve to sustain these essential and inextricably linked activities, while also bringing some level of financial, training, and staffing support for these roles. 
the tremendous efforts and successes of public libraries in the aftermath of the 2004 and 2005 hurricanes have earned libraries a central position in e-government and emergency planning at local, state, and federal levels. in those emergency situations, public libraries were able to serve their communities in a capacity that was far beyond the traditional image of the role of libraries, but these emergency response roles are as significant as anything else libraries could do for their communities. in order to continue fulfilling these roles and adequately performing other expected functions, public libraries need to push not only for financial support, but also for a greater role in planning and decision-making related to e-government services as well as emergency response and recovery at all levels of government.

if strategic plans and library activities have a consistent message about the need for support, the interrelated roles of trusted source of local information, e-government access provider, and community-response information and coordination center can make a compelling argument for increases in funding, support, and social standing of public libraries. the most obvious source of further support for these activities would be the federal government. amazingly, federal government support accounts for only about 1 percent of public library funding.59 given that federal government agencies are already relying on public libraries to ensure access to e-government and foster community response and recovery in times of emergencies, federal support for these social roles of the public library clearly can and should be increased significantly. state libraries, cooperatives, and library networks already work to coordinate funding and activities related to certain programs, such as the e-rate program.60 these same library collectives may be able to work together to promote the need for additional resources and coordinate those resources once they are attained. private and public partnerships offer another potential means of support for these library activities. with its strong historical and current connections to technology and libraries, the bill and melinda gates foundation might be a very important partner in funding and facilitating the increased role that public libraries play in providing access to and help with e-government. the search for additional funding to support e-government provision should not only focus on funds for access and training, but also on funds for research about how to better meet individual and community e-government needs and the effects of e-government provision by public libraries on individuals and communities.

regardless of what approaches are taken to finding greater support, however, public libraries must do a better job of communicating their involvement in the provision of e-government to governments and private organizations in order to increase support. such communications will need to be part of a larger strategy to define a place within public policy that gives public libraries a voice in e-government issues. if public libraries are going to fulfill this social role, they must become a greater presence in the national policy discourse surrounding e-government.
to increase their support and standing in policy discourse, libraries must not be hesitant in reminding the public and government officials of their successes after emergencies and in providing the social infrastructure for e-filing of taxes, enrolling in medicare prescription drug plans, and myriad other routine e-government activities.

in many societies, e-government has come to be seen by many citizens and governments as a force that will enhance democratic participation, more closely link citizens and their representatives, and help disadvantaged populations become more active participants in government and in society.61 e-government is seen by many as having "the potential to fundamentally change a whole array of public interactions with government."62 while the e-government act of 2002 and the president's e-government management agenda have emphasized the transformative effect of e-government, thus far it has primarily been used as a way to make information available, provide forms and electronic filing, and distribute the viewpoints of government agencies.63 however, many citizens do look to e-government as a valuable source of information, considering e-government sites to be "objective authoritative sources."64 currently, the primary reason that people use e-government is to gather information.65 in the united states, 58 percent of internet users believe e-government to be the best source for government information, 65 percent of americans expect that information they are seeking will be on a government site, and 26 million americans seek political information online every day.66

public satisfaction with the e-government services available, however, is limited. as commercial sites are developing faster and provide more innovative services than e-government sites, public satisfaction with government web sites is declining.67 public confidence in government web sites also has declined as much of the public policy related to e-government since 9/11 has been to reduce access to information through e-government.68 the types of information that have been affected include many forms of socially useful information, from scientific information to public safety information to information about government activities.69 for these and other reasons, the majority of citizens, even those with a high-speed internet connection at home, seeking government information and services prefer to speak to a person directly in their contacts with the government.70 in many cases, people turn to public librarians to serve as the person involved in e-government contacts.

further, when people struggle with, become frustrated by, or reject e-government services, they turn to public libraries. every year, public libraries deal with huge numbers of patrons needing help with online taxes, and the medicare prescription drug plan sign-up period resulted in an influx of seniors to public libraries seeking help in using the online registration system.71 for example, during the 2006 tax season, virginia discontinued the distribution of free print copies of tax forms to encourage use of the online system. instead, citizens of the state flooded public libraries, assuming that libraries could find them print copies of the forms, which of course the libraries did.
it seems unlikely, however, that the same government officials pushing the use of e­government are aware of the roles of public libraries in helping citizens with day­to­day e­government use. further, the enormous social roles of public libraries in emergency response in communities, such as during the 2004 and 2005 hurricane seasons, are far from widely known among government officials. to encourage the provision of external funding, the develop­ ment of targeted support technologies, and policy sup­ port for these social roles, public libraries must make the government and the public better aware of these roles and what is needed to ensure that the roles can be fulfilled. similarly, there is an extremely important role for lis programs in ensuring public libraries can meet community expectations for e­government provision. lis program graduates need to be prepared to help patrons access and use e­government information and services. as govern­ ment activities move primarily or exclusively online, patrons will increasingly seek help with e­government from public libraries. lis programs must ensure that grad­ uates are ready to serve patrons in this capacity. in 2007, the college of information studies at the university of maryland became the first ala­accredited school to offer a concentration in e­government as part of the master of library science program.72 the goal of this concentration is to prepare future librarians who wish to specialize in e­government, which will be an area of increasing and sig­ nificant need as more government information and services move online and more government agencies rely on public libraries to ensure access to e­government. lis programs need to prioritize finding ways to incorporate the teaching of issues related to e­government in public libraries as new concentrations or courses, or into existing courses. the provision of e­government is an important role of public libraries that is likely to increase significantly, and gradu­ ates of lis programs need to be prepared to meet patrons’ e­government information needs. further, lis faculties also can support public libraries in their e­government access and training roles by focusing more research on the intersections of public libraries and e­government. ultimately, the role of the trusted and valued public provider of e­government access creates many financial and staffing obligations and social responsibilities, but it also is a tremendous opportunity for public libraries. fighting against censorship efforts in the 1950s estab­ lished the public perception of libraries as guardians of the first amendment during the mccarthy era.73 working to ensure access and the ability to use e­government is creating new public perceptions of libraries as guardians of equal access in new but just as socially meaningful ways. rather than needing to ponder whether the emer­ gence of the internet will limit or remove the relevance of public libraries, the advent of e­government has created a brand new and very significant role that public libraries can play in serving their communities. given the empha­ sis that governments are placing on moving information and services online, patrons will continue to need access to and assistance in using e­government. the trust and values that have long been associated with public libraries are evolving to include the social expectations of the provision of access to and training for e­government by public libraries. 
in the same ways that patrons have learned to trust public libraries to provide equal access to print information sources, they now have learned to trust that libraries can provide equal access to e­government information. it seems that citizens will regu­ larly be turning to public libraries for help with mundane e­government activities, such as finding forms and filing taxes, as well as with the most pressing e­government activities, as was demonstrated in the aftermath of hur­ ricanes katrina and rita. because the trust in and values of public libraries have set the stage for the emerging role of libraries in e­government, public libraries need to work to ensure the availability of the support, education, and policy decisions that they need to serve their communities in this new and vital role in situations ranging from every­ day information needs to the most extreme circumstances. in spite of the costs associated with serving as the public’s e­government access center, acting as the social guarantor of equal access to e­government emphatically demonstrates that public libraries will continue to be a central part of the infrastructure of society in the internet age. public libraries now must learn to articulate better the social roles they are playing and the types of support they need from lis programs, funding agencies, and gov­ ernment agencies to continue playing these roles. ■ acknowledgment the authors of this paper have worked with several col­ leagues on projects related to the ideas discussed in this paper. the authors would particularly like to thank john carlo bertot, lesley a. langa, charles r. mcclure, jennifer preece, yan qu, ben shneiderman, and philip fei wu. references and notes 1. margaret mooney marini, “social values and norms,” encyclopedia of sociology, edgar f. borgatta and marie l. borgatta, eds., 2828 (new york: macmillan, 2000). 42 information technology and libraries | december 200742 information technology and libraries | december 2007 2. steven hitlin and jane allyn piliavin, “values: reviv­ ing a dormant concept,” annual review of sociology 30 (2004): 359–93. 3. michael gorman, our singular strengths: meditations for librarians (chicago: ala, 1997); michael gorman, our enduring values: librarianship in the 21st century (chicago: ala, 2000); michael gorman, our own selves: more meditations for librarians (chicago: ala, 2005). 4. gorman, our enduring values. 5. frances k. groen, access to medical knowledge: libraries, digitization, and the public good (lanham, md.: scarecrow, 2007). 6. elizabeth smith, “equal information access and the evo­ lution of american democracy,” journal of educational media and library sciences 33, no. 2 (1995): 158–71. 7. toni samek, intellectual freedom and social responsibility in american librarianship, 1967–1974 (jefferson, n.c.: mcfarland, 2001). 8. pam scott, evelleen richards, and brian martin, “cap­ tives of controversy: the myth of the neutral social researcher in contemporary scientific controversies,” science, technology, and human values 15 (1990): 474–94. 9. american library association, “library bill of rights,” www.ala.org/ala/oif/statementspols/statementsif/librarybill­ rights.htm (accessed may 19, 2007). 10. kenneth r. fleischmann, “digital libraries with embed­ ded values: combining insights from lis and science and technology studies,” library quarterly (in press); kenneth r. 
fleischmann, “digital libraries and human values: human­ computer interaction meets social informatics,” proceedings of the 70th annual conference of the american society for infor­ mation science and technology, milwaukee, wisc., 2007. 11. pew research center, americans and social trust: who, where, and why (washington, d.c.: pew research center, 2007), http://pewresearch.org/assets/social/pdf/socialtrust.pdf, 2. 12. david wildon carr, “an ethos of trust in information service,” in ethics and electronic information: a festschrift for stephen almagno, barbara rockenbach and tom mendina, eds., 45–52 (jefferson, n.c.: mcfarland, 2003). 13. paul t. jaeger and gary burnett, “information access and exchange among small worlds in a democratic society: the role of policy in redefining information behavior in the post­ 9/11 united states,” library quarterly 75, no. 4 (2005): 464–95. 14. gorman, our enduring values, 66. 15. public agenda, long overdue: a fresh look at public and leadership attitudes about libraries in the 21st century (new york: public agenda, 2006), 11, www.publicagenda.org/research/ pdfs/long_overdue.pdf (accessed may 19, 2007). 16. john carlo bertot et al., “public access computing and internet access in public libraries: the role of public librar­ ies in e­government and emergency situations,” first monday 11, no. 9 (2006), www.firstmonday.org/issues/issue11_9/bertot (accessed may 19, 2007). 17. public agenda, long overdue. 18. nancy zimmerman and feili tu, “it is not just a matter of ethics ii: an examination of issues related to the ethical provi­ sion of consumer health services in public libraries,” ethics and electronic information: a festschrift for stephen almagno, barbara rockenbach and tom mendina, eds., 119–27 (jefferson, n.c.: mcfarland, 2003). 19. paul sturges and ursula iliffe, “preserving a secret garden for the mind: the ethics of user privacy in the digital library,” ethics and electronic information: a festschrift for stephen almagno, barbara rockenbach and tom mendina, eds., 74–81 (jefferson, n.c.: mcfarland, 2003), 81. 20. online computer library center, inc. (oclc), perceptions of libraries and information resources: a report to the oclc membership (dublin, ohio: oclc, 2005). 21. jaeger and burnett, “information access and exchange among small worlds in a democratic society”; paul t. jaeger et al., “the usa patriot act, the foreign intelligence surveil­ lance act, and information policy research in libraries: issues, impacts, and questions for library researchers,” library quarterly 74, no. 2 (2004): 99–121. 22. children’s internet protection act, public law 106–554. 23. paul t. jaeger, john carlo bertot, and charles r. mcclure, “the effects of the children’s internet protection act (cipa) in public libraries and its implications for research: a statistical, policy, and legal analysis,” journal of the american society for information science and technology 55, no. 13 (2004): 1131–39; paul t. jaeger et al., “cipa: decisions, implementation, and impacts,” public libraries 44, no. 2 (2005): 105–09. 24. bertot et al., “public access computing and internet access in public libraries”; john carlo bertot et al., “drafted: i want you to deliver e­government,” library journal 131, no. 13 (2006): 34–39; john carlo bertot et al., public libraries and the internet 2006: study results and findings (tallahassee, fla.: infor­ mation institute, 2006), www.ii.fsu.edu/plinternet_reports.cfm (accessed may 19, 2007). 25. 
nancy kranich, “libraries, the internet, and democracy,” libraries & democracy: the cornerstones of liberty, nancy kranich, ed., 83–95 (chicago: ala, 2001). 26. bertot et al., “public access computing and internet access in public libraries”; bertot et al., “drafted.” 27. bertot et al., public libraries and the internet 2006. 28. jaeger and burnett, “information access and exchange among small worlds in a democratic society,” 487. 29. anne heanue, “in support of democracy: the library role in public access to government,” information, libraries, and democracy: the cornerstones of liberty, nancy kranich, ed. (chi­ cago: ala, 2001), 124. 30. george d’elia et al., “the impact of the internet on public library uses: an analysis of the current consumer market for library and internet services,” journal of the american society for information science and technology 53, no. 10 (2002): 802–20; eleanor jo rodger, george d’elia, and corrine jorgensen, “the public library and the internet: is peaceful coexistence pos­ sible?,” american libraries 31, no. 5 (2001): 58–61. 31. w. e. ebbers, w. j. pieterson, and h. n. noordman, “elec­ tronic government: rethinking channel management strate­ gies,” government information quarterly (in press). 32. ibid. 33. awdhesh k. singh and rajendra sahu, “integrating inter­ net, telephones, and call centers for delivering better quality e­governance to all citizens,” government information quarterly (in press). 34. paul t. jaeger and kim m. thompson, “e­government around the world: lessons, challenges, and new directions,” government information quarterly 20, no. 4 (2003): 389–94; paul t. jaeger and kim m. thompson, “social information behavior article title | author 43public libraries, values, trust, and e-government | jaeger and fleischmann 43 and the democratic process: information poverty, normative behavior, and electronic government in the united states,” library & information science research 26, no. 1 (2004): 94–107. 35. paul t. jaeger, “deliberative democracy and the con­ ceptual foundations of electronic government,” government information quarterly 22, no. 4 (2005): 702–19; paul t. jaeger, “information policy, information access, and democratic partic­ ipation: the national and international implications of the bush administration’s information politics,” government information quarterly (in press). 36. bertot et al., “public access computing and internet access in public libraries”; bertot et al., “drafted.” 37. aimee c. quinn and laxmi ramasubramanian, “infor­ mation technologies and civic engagement: perspectives from librarianship and planning,” government information quarterly (in press). 38. bertot et al., public libraries and the internet 2006; paul t. jaeger et al., “the 2004 and 2005 gulf coast hurricanes: evolv­ ing roles and lessons learned for public libraries in disaster preparedness and community services,” public library quarterly (in press). 39. bertot et al., “drafted.” 40. jaeger et al., “the 2004 and 2005 gulf coast hurricanes.” 41. ibid. 42. ibid. 43. michael arnone, “storm watch 2006: ready or not,” federal computer week, june 5, 2006, www.fcw.com/print/12_20/ news/94711­1.html (accessed may 19, 2007). 44. jaeger et al., “the 2004 and 2005 gulf coast hurricanes.” 45. bertot et al., “public access computing and internet access in public libraries.” 46. jaeger et al., “the 2004 and 2005 gulf coast hurricanes.” 47. ibid. 48. ibid. 49. bertot et al., “public access computing and internet access in public libraries.” 50. paul t. 
jaeger et al., “911.gov: harnessing e­government, mobile communication technologies, and social networks to promote community participation in emergency response,” telecommunications policy (in press); ben shneiderman and jenny preece, “911.gov: community response grids,” science 315 (2007): 944. 51. kim viborg andersen and helle zinner henriksen, “e­government research: capabilities, interaction, orientation, and values,” current issues and trends in e-government research, donald f. norris, ed., 269–88 (hershey, pa.: cybertech, 2007). 52. jeffrey roy, “e­government in canada: transition or trans­ formation?” current issues and trends in e-government research, donald f. norris, ed., 44–67 (hershey, pa.: cybertech, 2007), 51. 53. oecd e­government studies, the e-government imperative (danvers, mass.: organization for economic co­operation and development, 2005), 45. 54. bruce barber, strong democracy (berkeley, calif.: univ. of california pr., 1984). 55. bertot et al., public libraries and the internet 2006. 56. charles r. mcclure et al., e-government and public libraries: current status, meeting report, findings, and next steps (tallahassee, fla.: information use management and policy institute, 2007), www.ii.fsu.edu/announcements/e­gov2006/ egov_report.pdf (accessed may 19, 2007). 57. paul t. jaeger et al., “public libraries and internet access across the united states: a comparison by state from 2004 to 2006,” information technology and libraries 26, no. 2 (2007): 4–14. 58. jaeger et al., “the 2004 and 2005 gulf coast hurricanes.” 59. bertot et al., “drafted.” 60. jaeger et al., “public libraries and internet access across the united states.” 61. beth simone noveck, “designing deliberative democracy in cyberspace: the role of the cyber­lawyer,” boston university journal of science and technology 9 (2003): 1–91. 62. s. h. holden and l. i. millett, “authentication, privacy, and the federal e­government,” information society 21 (2005): 367. 63. e­government act of 2002, p.l. 107–347; jaeger, “delibera­ tive democracy and the conceptual foundations of electronic government”; e-government strategy: implementing the president’s management agenda for e-government (washington, d.c.: egov, 2003), www.whitehouse.gov/omb/egov/2003egov_strat.pdf (accessed may 19, 2007). 64. anderson office of government services, a usability analysis of selected federal government web sites (anderson office of government services: washington, d.c., 2002), 1. 65. christopher g. reddick, “citizen interaction with e­gov­ ernment: from the streets to servers?,” government information quarterly 22, no. 1 (2005): 338–57. 66. john b. horrigan, politics online (washington, d.c., pew internet & american life project, 2006); john b. horrigan and lee rainie, counting on the internet (washington, d.c., pew internet & american life project, 2002). 67. stephen barr, “public less satisfied with government websites,” washington post, mar. 21, 2007, www.washingtonpost. com/wp­dyn/content/article/2007/03/20/ar2007032001338. html (accessed may 19, 2007). 68. lotte e. feinberg, “foia, federal information policy, and information availability in a post­9/11 world,” government information quarterly 21 (2004): 439–60; elaine l. halchin, “electronic government: government capability or terrorist resource,” government information quarterly 21 (2004): 406–19: harold c. relyea and elaine l. halchin, “homeland security and information management,” the bowker annual: library and trade almanac 2003, d. 
bogart, ed., 231–50 (medford, n.j.: infor­ mation today, 2003). 69. jaeger, “information policy, information access, and democratic participation.” 70. john b. horrigan, how americans get in touch with government (washington, d.c., pew internet & american life project, 2004). 71. bertot et al., “public access computing and internet access in public libraries”; bertot et al., “drafted.” 72. the description of the university of maryland’s e­gov­ ernment master’s program is available at www.clis.umd.edu/ programs/egov.shtml. 73. jaeger and burnett, “information access and exchange among small worlds in a democratic society.” 54 information technology and libraries | june 2010 tinuing education opportunities for library information technologists and all library staff who have an interest in technology. 2. innovation: to serve the library community, lita expert members will identify and demonstrate the value of new and existing technologies within ala and beyond. 3. advocacy and policy: lita will advocate for and participate in the adoption of legislation, policies, technologies, and standards that promote equitable access to information and technology. 4. the organization: lita will have a solid structure to support its members in accomplishing its mission, vision, and strategic plan. 5. collaboration and outreach: lita will reach out and collaborate with other library organizations to increase the awareness of the importance of technology in libraries, improve services to existing members, and reach out to new members. the lita executive committee is currently finalizing the strategies lita will pursue to achieve success in each of the goal areas. it is my hope that the strategies for each goal are approved by the lita board of directors before the 2010 ala annual conference in washington, d.c. that way the finalized version of the lita strategic plan can be introduced to the committee and interest group chairs and the membership as a whole at that conference. this will allow us to start the next fiscal year with a clear road for the future. while i am excited about what is next, i have also been dreading the end of my presidency. i have truly enjoyed my experience as lita president, and in some way wish it was not about to end. i have learned so much and have met so many wonderful people. thank you for giving me this opportunity to serve you and for your support. i have truly appreciated it. a s i write this last column, the song “my way” by frank sinatra keeps going through my head. while this is definitely not my final curtain, it is the final curtain of my presidency. like sinatra i have a few regrets, “but then again, too few to mention.” there was so much more i wanted to accomplish this year; however, as usual, my plans were more ambitious than the time i had available. being lita’s president was a big part of my life, but it was not the only part. those other parts—like family, friends, work, and school—demanded my attention as well. i have thought about what to say in this final column. do i list my accomplishments of the last year? nah, you can read all about that in the lita annual report, which i will post in june. tackle some controversial topic? while i can think of a few, i have not yet thought of any solutions, and i do not want to rant against something without proposing some type of solution or plan of attack. i thought instead i would talk about where i have devoted a large part of my lita time over the last year. 
as i look back at the last year, i am also thinking ahead to the future of lita. we are currently writing lita's strategic plan. we have a lot of great ideas to work with. lita members are always willing to share their thoughts both formally and informally. i have been charged with the task of taking all of those great ideas, gathered at conferences, board meetings, hallway conversations, surveys, e-mail, etc., to create a roadmap for the future. after reviewing all of the ideas gathered over the last three years, i was able to narrow that list down to six major goal areas. with the assistance of the lita board of directors and the lita executive committee, we whittled the list down to five major goal areas of the lita strategic plan: 1. training and continuing education: lita will be nationally recognized as the leading source for con- michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago. michelle frisque president's message: the end and new beginnings editorial board thoughts: technology and mission: reflections of a first-year college library director ed tallent information technology and libraries | december 2012 3 as i reflect on my first year as director for a small college library, several themes are clear to me, but perhaps none resonates as vibrantly as the challenges in managing technology, technology planning, and the never-ending need for technology integration, both within the library and the college. it is all-encompassing, involving every library activity and initiative. while my issues will naturally have a contextual flavor unique to my place of employment, i imagine they reflect issues that all librarians face (or have already faced). what is perhaps less unique is how these issues of library technology intersect with some very high priority college initiatives and challenges. and, given myriad reports on students' ongoing ambivalent attitudes toward libraries (after everything we have done for them!), it still behooves us to keep working at this integration of the library into the learning and teaching process and to hitch our wagon to larger strategic missions. so, what issues have i faced? the campus portal vs. library web site: this issue is neither new nor unique, but it is still a tangled web of conflicting priorities and attitudes, campus politics and technology vision, the extent and location of technology support, and the flexibility of the campus portal or content management system (cms) and the people who direct it. it is not a question of any misunderstandings, as the need to market the library via the campus web site is obvious and the goal of personalized service is laudable. yet, marrying the external marketing needs with the internal support needs is a difficult balance to achieve. the web offers a more dramatic entrée to the library than a portal/intranet, and portal technology is not perfect, as jakob nielsen highlights in a recent post. the goal obviously is further complicated by the fact that the support needed to maintain a quality web presence--one that is graphically interesting, vibrant, and intuitive--is significant when one considers library web sites are rarely used as a place to begin research by students and faculty. ed tallent (edtallent@curry.edu) is director, levin library, curry college, milton, massachusetts. 
http://www.useit.com/alertbox/intranet-usability.html editorial board thoughts: technology and mission | tallent 4 the portal, on the other hand, promises a personalized approach and easier maintenance, but lacks the level of operability that would be desirable. the web presence can support both user needs and offer visitors a sense of the quality services and collections the library provides. so, at this writing, what we have is a litany of questions not yet resolved. mobile, tablets, and virtual services: the questions also abound in these areas. should we build our own mobile services, or contract out the development? do we (can we) focus on creating a leadership role for the library in the area of emerging technology, or wait for a coordinated institutional vision and plan to emerge? in the area of tablets, we are about to commence circulating ipads, and anyone who has gone through the labyrinthine process just to load apps will know that the process gives one pause as to the value of such an initiative, and that is before they circulate and need to be managed. still, it is a technology initiative that demands review of library work flows, security, student training, and collection access. virtual services were at a fairly nascent state upon my arrival and have grown slowly, as they are being developed in a culture that stressed individual, hands-on, and personalized services. virtual services can be all that, but that needs to be demonstrated not only to the user but to the people delivering the service. the added value here is that the work engages us in valuable reflections on the way in which we work or should work. value of the library: i began my new position at a time when the college was deeply engrossed in the issue of student recruitment, retention, and success. for my employer these are significant institutional identity issues, and the library is expected to document its contributions to student outcomes and success. not nearly enough has been done, though a working relationship with a new director of institutional research is developing, and critical issues such as information literacy, integrated student support, learning spaces, learning analytics, and the need for a data warehouse will be incorporated into the college's strategic plan. the opportunity is there for the library to link with major college initiatives, for example, and make information literacy more than a library issue. citation management: now, here is a traditional library activity, the bane of many a reference service interaction and the undergraduate's last-minute nightmare. a combination of technical, service, and fiscal challenges revolves around the campus climate on the use of technology to respond to this quandary. what to do with faculty who believe strongly that the best way to learn this skill is by hand, not with any system that aims for interoperability and a desire to save the time of the user? for others, which tool should be used? should we not just go with a free one? while discipline differences will always exist, the current environment does present opportunities for the library to take a leadership role in defining what the possibilities are and ideally connecting the approach to appropriate and measurable learning outcomes and to the larger issue of academic integrity. 
information technology and libraries | december 2012 5 e-books, pda, article tokens: one of the unforeseen benefits of my moving to a small college library is that there is not the attachment to a print collection that exists in many/most research libraries. there is remarkable openness to experimenting with and committing to various methods of digital delivery of content. thus, we have been able to test myriad possibilities, from patron driven book purchasing, tokens for journal articles, and streaming popular films from a link in the library management system. this blurring of content, delivery, and functionality presents numerous opportunities for librarians to have conversations with departments of the future of collections. connecting with alumni: this is always an important strategic issue for colleges and universities and it seems as though there are promising emerging options for libraries to deliver database content to alumni, as vendors are beginning to offer more reasonable alumni-oriented packages. my library will be working with the appropriate campus offices next year to develop a plan for funding targeted library content for alumni as part of the college’s broader strategic activities to engage alumni. web design skills: while i understand the value that products like libguides can bring to the community, allowing content experts (librarians) to quickly and easily create template-driven web-based subject guides, i remain troubled by the lack of design skills librarians possess, and by the lack of recognition that good design can be just as important as good content. this is not a criticism, as we are not graphic designers. we have a sense of user needs, knowledge about content, and a desire to deliver, but i believe that products like this lead librarians to believe that good design for learning is easy. i do not claim to be an expert, but i know this is not the case. this approach does not translate into user friendly guides that hold to consistent standards. i think we need to recognize that we can benefit from non-librarian expertise in the area of web design. one opportunity that i want to investigate along these lines is to create student internships that would bring design skills and the student perspective to the work. a win-win, as this also supports the college’s desire for more internships and experiential learning for students. there is neither time nor space to address an even broader library technology issue on the near horizon, which will be another campus engagement moment, the future ils for the library. yet, maybe that should have been addressed first, since what i have read and heard, the new ilss will solve all of the above problems! 
editor's comments bob gerrity information technology and libraries | september 2013 3 bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia. this month's issue in this month's issue, we welcome back the president's message column, with incoming lita president cindi trainor describing upcoming lita events, priorities, and opportunities for members. university of denver mlis candidate gina schlesselman-tarango contributes a compelling piece describing the background, use, and potential library application of searchable signatures in web 2.0 applications such as instagram. jenny emanuel from university of illinois reports on the complex relationship that millennial academic librarians have with technology. kristina l. southwell and jacquelyn slater from university of oklahoma present the findings of a study evaluating the accessibility of special collections finding aids to screen readers for visually impaired users. ping fu from central washington university and moira fitzgerald from yale university look at the potential effects of cloud-based next-generation library services platforms on staffing models for systems and technical-services departments. visiting the discovery side of library services, megan johnson from appalachian state university reports on usability testing of appalachian's "one box" integrated articles and catalog search, using innovative interfaces' encore discovery service. speaking of usability, i had the chance recently to observe a usability testing session for my library's website, and was reminded of the importance of designing library websites and delivering web-based library services that will actually be of value to our users, delivered with their context in mind rather than ours. my library, like many others, has a website rich in content and complexity and organized around our structure. to the user i was observing, the complexity and library-centric organization clearly were obstacles to the rich content we offer. an undergraduate art history major, she was primarily interested in library resources and services that were directly connected to her coursework and that were accessible from the university's learning management system (lms). she valued the convenience of direct access from the lms to library-managed course readings and past exam papers. but, when asked to navigate to the same resources using the library homepage as a starting point, rather than the lms, she quickly became frustrated and confused by the overload of search options with (to her) confusing labels. she was further stymied by our proclivity to make things more complex than they need to be (or should be). a simple example: a common occurrence at the beginning of semester is that students with outstanding library fines/fees are blocked from registering for classes. rather than providing a simple, direct "resolve my library fees" link, with clear instructions on how to fix their problem, as 
editor’s comments bob gerrity editor’s comments | gerrity 4 quickly as possible, we instead provide pages of information about how and why the fines/fees were calculated, with no link to a solution to the problem at hand. my takeaways from the session were that (1) our website needs to be radically simplified and (2) we should be focussing on designing and delivering services that can be embedded in the context of the user’s natural workflows, not the library’s. easier said than done, of course. reviewers needed the ital editorial board has room for a couple of additional members, to help us keep up with incoming article submissions. if you have a passion for library technology, a willingness to undertake a few reviews each year, and are a member of lita (or willing to join), please send me an e-mail indicating your interest and area(s) of expertise. as always, suggestions and feedback on ital are welcome, at the e-mail address above. content management systems: trends in academic libraries ruth sara connell information technology and libraries | june 2013 42 abstract academic libraries, and their parent institutions, are increasingly using content management systems (cmss) for website management. in this study, the author surveyed academic library web managers from four-year institutions to discover whether they had adopted cmss, which tools they were using, and their satisfaction with their website management system. other issues, such as institutional control over library website management, were raised. the survey results showed that cms satisfaction levels vary by tool and that many libraries do not have input into the selection of their cms because the determination is made at an institutional level. these findings will be helpful for decision makers involved in the selection of cmss for academic libraries. introduction as library websites have evolved over the years, so has their role and complexity. in the beginning, the purpose of most library websites was to convey basic information, such as hours and policies, to library users. as time passed, more and more library products and services became available online, increasing the size and complexity of library websites. many academic library web designers found that their web authoring tools were no longer adequate for their needs and turned to cmss to help them manage and maintain their sites. for other web designers, the choice was not theirs to make. their institution transitioned to a cms and required the academic library to follow suit, regardless of whether the library staff had a say in the selection of the cms or its suitability for the library environment. the purpose of this study was to examine cms usage within the academic library market and to provide librarians quantitative and qualitative knowledge to help make decisions when considering switching to, or between, cmss. in particular, the objectives of this study were to determine (1) the level of saturation of cmss in the academic library community; (2) the most popular cmss within academic libraries, the reasons for the selection of those systems, and satisfaction with those cmss; (3) if there is a relationship between libraries with their own dedicated information technology (it) staff and those with open source (os) systems; and (4) if there is a relationship between institutional characteristics and issues surrounding cms selection. 
ruth sara connell (ruth.connell@valpo.edu) is associate professor of library services and electronic services librarian, christopher center library services, valparaiso university, valparaiso, in. mailto:ruth.connell@valpo.edu content management systems: trends in academic libraries | connell 43 although this study largely focuses on cms adoption and related issues, the library web designers who responded to the survey were asked to identify what method of web management they use if they do not use a cms and asked about satisfaction with their current system. thus, information regarding cms alternatives (such as adobe’s dreamweaver web content editing software) is also included in the results. as will be discussed in the literature review, cmss have been broadly defined in the past. therefore, for this study participants were informed that only cmss used to manage their primary public website were of interest. specifically, cmss were defined as website management tools through which the appearance and formatting is managed separately from content, so that authors can easily add content regardless of web authoring skills. literature review most of the library literature regarding cms adoption consists of individual case studies describing selection and implementation at specific institutions. there are very few comprehensive surveys of library websites or the personnel in charge of academic library websites to determine trends in cms usage. the published studies including cms usage within academic libraries do not definitively answer whether overall adoption has increased. in 2005 several georgia state university librarians surveyed web librarians at sixty-three of their peer institutions, and of the sixteen responses, six (or 38 percent) reported use of “cms technology to run parts of their web site.” 1 a 2006 study of web managers from wide range of institutions (associates to research) indicated a 26 percent (twenty-four of ninety-four) cms adoption rate.2 a more recent 2008 study of institutions of varying sizes resulted in a little more than half of respondents indicating use of cmss, although the authors note that “people defined cmss very broadly,” 3 including tools like moodle and contentdm, and some of those libraries indicated they did not use the cms to manage their website. a 2012 study by comeaux and schmetzke differs from the others mentioned here in that they reviewed academic library websites of the fifty-six campuses offering ala-accredited graduate degrees (generally larger universities) and used tools and examined page code to try to determine on their own if the libraries used cmss, as opposed to polling librarians at those institutions to ask them to self-identify if they used cmss. they identified nineteen out of fifty-six (34 percent) sites using cmss. the authors offer this caveat, “it is very possible that more sites use cmss than could be readily identified. this is particularly true for ‘home-grown’ systems, which are unlikely to leave any readily discernible source code.” 4 because of different methodologies and population groups studied in these studies, it is not possible to draw conclusions regarding cms adoption rates within academic libraries over time using these results. as mentioned previously, some people define cmss more broadly than others. one example of a product that can be used as a cms, but is not necessarily a cms, is springshare’s libguides. many libraries use libguides as a component of their website to create guides. 
however, some libraries have utilized the product to develop their whole site, in effect using it as a cms. a case study by information technology and libraries | june 2013 44 two librarians at york college describes why they chose libguides as their cms instead of as a more limited guide creation tool.5 several themes recurred throughout many of the case study articles. one common theme was the issue of lack of control and problems of collaboration between academic libraries and the campus entities controlling website management. amy york, the web services librarian at middle tennessee state university, described the decision to transition to a cms in this way, “and while it was feasible for us to remain outside of the campus cms and yet conform to the campus template, the head of the it web unit was quite adamant that we move into the cms.” 6 in a study by bundza et al., several participants who indicated dissatisfaction with website maintenance mentioned “authority and decision-making issues” as well as “turf struggles.” 7 other articles expressed more positive collaborative experiences. morehead state university librarians kmetz and bailey noted, “when attending conferences and hearing the stories of other libraries, it became apparent that a typical relationship between librarians and a campus it staff is often much less communicative and much less positive than [ours]. because of the relatively smooth collaborative spirit, a librarian was invited in 2003 to participate in the selection of a cms system.” 8 kimberley stephenson also emphasized the advantageous relationships that can develop when a positive approach is used, “rather than simply complaining that staff from other departments do not understand library needs, librarians should respectfully acknowledge that campus web developers want to create a site that attracts users and consider how an attractive site that reflects the university’s brand can be beneficial in promoting library resources and services.” 9 however, earlier in the article she does acknowledge that the iterative and collaborative process between the library and their university relations (ur) department was occasionally contentious and that the web services librarian notifies ur staff before making changes to the library homepage.10 another common theme in the literature was the reasoning behind transitioning to a cms. one commonly cited criterion was access control or workflow management, which allows site administrators to assign contributors editorial control over different sections of the site or approve changes before publishing.11 however, although this feature is considered a requirement by many libraries, it has its detractors. kmetz and bailey indicated that at morehead state university, “approval chains have been viewed as somewhat stifling and potentially draconian, so they have not been activated.” 12 these studies greatly informed the questions used and development of the survey instrument for this study. method in designing the survey instrument, questions were considered based on how they informed the objectives of the study. to simplify analysis, it was important to compile as comprehensive a list of content management systems: trends in academic libraries | connell 45 cmss as possible. this list was created by pulling cms names from the literature review, the web4lib discussion list, and the cmsmatrix website (www.cmsmatrix.org). 
in order to select institutions for distribution, the 2010 carnegie classification of institutions of higher education basic classification lists were used.13 the author chose to focus on three broad classifications: 1. research institutions consisting of the following carnegie basic classifications: research universities (very high research activity), research universities (high research activity), and dru: doctoral/research universities. 2. master’s institutions consisting of the following carnegie basic classifications: master's colleges and universities (larger programs), master's colleges and universities (medium programs), master's colleges and universities (smaller programs). 3. baccalaureate institutions consisting of the following carnegie basic classifications: baccalaureate colleges—arts & sciences and baccalaureate colleges—diverse fields. the basic classification lists were downloaded into excel with each of the three categories in a different worksheet, and then each institution was assigned a number using the random number generator feature within excel. the institutions were then sorted by those numbers creating a randomly ordered list within each classification. to determine sample size for a stratified random sampling, ronald powell’s “table for determining sample size from a given population” 14 (with a .05 degree of accuracy) was used. each classification’s population was considered separately, and the appropriate sample size chosen from the table. the population size of each of the groups (total number of institutions within that carnegie classification) and the corresponding sample sizes were • research: population = 297, sample size = 165; • master’s: population = 727, sample size = 248; • baccalaureate: population = 662, sample size = 242. the total number of institutions included in the sample size was 655. the author then went through the list of selected institutions and searched online to find their library webpages and find the person most likely responsible for the library’s website. during this process, there were some institutions, mostly for-profits, for which a library website could not be found. when this occurred, that institution was eliminated and the next institution on the list used in its place. in some cases, the person responsible for web content was not easily identifiable; in these cases an educated guess was made when possible, or else the director or a general library email address was used. the survey was made available online and distributed via e-mail to the 655 recipients on october 1, 2012. reminders were sent on october 10 and october 18, and the survey was closed on october 26, 2012. out of 655 recipients, 286 responses were received. some of those responses http://www.cmsmatrix.org/ information technology and libraries | june 2013 46 had to be eliminated for various reasons. if two responses were received from one institution, the more complete response was used while the other response was discarded. some responses included only an answer to the first question (name of institution or declination of that question to answer demographic questions) and no other responses; these were also eliminated. once the invalid responses were removed, 265 remained, for a 40 percent response rate. before conducting an analysis of the data, some cleanup and standardization of results was required. for example, a handful of respondents indicated they used a cms and then indicated that their cms was dreamweaver or adobe contribute. 
these responses were recoded as non-cms responses. likewise, one respondent self-identified as a non-cms user but then listed drupal as his/her web management tool and this was recoded as a cms response. demographic profile of respondents for the purposes of gathering demographic data, respondents were offered two options. they could provide their institution’s name, which would be used solely to pair their responses with the appropriate carnegie demographic categories (not to identify them or their institution), or they could choose to answer a separate set of questions regarding their size, public/private affiliation, and basic carnegie classification. the basic carnegie classification of the largest response group was master’s with 102 responses (38 percent); then baccalaureate institutions (94 responses or 35 percent), and then research institutions (69 responses or 26 percent). this correlates pretty closely with the distribution percentages, which were 38 percent master’s (248 out of 655), 37 percent baccalaureate (242 out of 655), and 25percent research (165 out of 655). of the 265 responses, 95 (36 percent) came from academic librarians representing public institutions and 170 (64 percent) from private. of the private institutions, the vast majority (166 responses or 98 percent) were not-for-profit, while 4 (2 percent) were for-profits. to define size, the carnegie size and setting classification was used. very small institutions are defined as less than 1,000 full-time equivalent (fte) enrollment, small is 1,000–2,999 fte, medium is 3,000–9,999 fte, and large is at least 10,000 fte. the largest group of responses came from small institutions (105 responses or 40 percent), then medium (67 responses or 25 percent), large (60 responses or 23 percent), and very small (33 responses or 12 percent). results the first question asking for institutional identification (or alternative routing to carnegie classification questions) was the only question for which an answer was required. in addition, because of question logic, some people saw questions that others did not based on how they answered previous questions. thus, the number of responses varies for each question. one of the objectives of this study was to identify if there were traits among institutional characteristics and cms selection and management. the results that follow include both content management systems: trends in academic libraries | connell 47 descriptive statistics and statistically significant inferential statistics discovered using chi-square and fisher’s exact tests. statistically significant results are labeled as such. the responses to this survey show that most academic libraries are using a cms to manage their main library website (169out of 265 responses or 64 percent). overall, cms users expressed similar (although slightly greater) satisfaction levels with their method of web management (see table 1.) table 1 satisfaction by cms use use a cms to manage library website yes no user is highly satisfied or satisfied yes 79 responses or 54% 41 responses or 47% no 68 responses or 46% 46 responses or 53% total 147 responses or 100% 87 responses or 100% non-cms users non-cms users were asked what software or system they use to govern their site. by far, the most popular system mentioned among the 82 responses was adobe dreamweaver, with 24 (29 percent) users listing it as their only or primary system. 
some people listed dreamweaver as part of a list of tools used; for example “php / mysql, integrated development environments (php storm, coda), dreamweaver, etc.,” and if all mentions of dreamweaver are included, the number of users rises to 31 (38 percent). some version of “hand coded” was the second most popular answer with 9 responses (11 percent), followed by adobe contribute with 7 (9 percent). many of the “other” responses were hard to classify and were excluded from analysis. some examples include: • ftp to the web • voyager public web browser ezproxy • excel, e-mail, file folders on shared drives among the top three non-cms web management systems, dreamweaver users were most satisfied, selecting highly satisfied or satisfied in 15 out of 24 (63 percent) cases. hand coders were highly satisfied or satisfied in 5out of 9 of cases (56 percent), and adobe contribute users were only highly satisfied or satisfied in 3 out of 7 (43 percent) cases. respondents not using a cms were asked whether they were considering a move to a cms within the next two years. most (59 percent) said yes. research libraries were much more likely to be planning such a move (81percent) than master’s (50 percent) or baccalaureate (45 percent) libraries (see table 2.) a chi-square test rejects the null hypothesis that the consideration of a move to cms is independent of basic carnegie classification; this difference was significant at the p = 0.038 level. information technology and libraries | june 2013 48 table 2 non-cms users considering a move to a cms within the next two years by carnegie classification* baccalaureate master’s research total no 11 responses or 55% 11 responses or 50% 4 responses or 19% 26 responses or 41% yes 9 responses or 45% 11 responses or 50% 17 responses or 81% 37 responses or 59% total 20 responses or 100% 22 responses or 100% 21 responses or 100% 63 responses or 100% chi-square=6.526, df=2, p=.038 *excludes “not sure” responses non-cms users were asked to provide comments related to topics covered in the survey, and here is a sampling of responses received: • cmss cost money that our college cannot count on being available on a yearly basis. • the library doesn't have overall responsibility for the website. university web services manages the entire site, i submit changes to them for inclusion and updates. • we are so small that the time to learn and implement a cms hardly seems worth it. so far this low-tech method has worked for us. • the main university site was moved to a cms in 2008. the library was not included in that move because of the number of pages. i hear rumors that we will be forced into the cms that is under consideration for adoption now. the library has had zero input in the selection of the new cms. cms users when respondents indicated their library used a cms, they were routed to a series of cms related questions. the first question asked which cms their library was using. of the 153 responses, the most popular cmss were drupal (40); wordpress (15); libguides (14), which was defined within the survey as a cms “for main library website, not just for guides”; cascade server (12); ektron (6); and modx and plone (5 each). these users were also asked about their overall satisfaction with their systems. among the top four cmss, libguides users were the most satisfied, selecting highly satisfied or satisfied in 12 out of 12 (100 percent) cases. 
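as a side note on the statistics reported above, the chi-square result for table 2 can be reproduced directly from the printed cell counts. the short sketch below is illustrative only (python with scipy, which is not the software used in the study); the counts are copied from table 2, and the computation recovers the reported chi-square of 6.526 with 2 degrees of freedom and p ≈ .038.

from scipy.stats import chi2_contingency

# observed counts from table 2: rows are "no" and "yes" (non-cms users considering
# a move to a cms within two years); columns are baccalaureate, master's, research.
observed = [
    [11, 11, 4],   # no
    [9, 11, 17],   # yes
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p:.3f}")
# prints: chi-square = 6.526, df = 2, p = 0.038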
the remaining three systems’ satisfaction ratings (highly satisfied or satisfied) were as follows: wordpress (12out of 15 cases or 80 percent), drupal (26out of 38 cases or 68 percent), and cascade server (3 out of 11 cases or 27 percent). when asked whether they would switch systems if given the opportunity, most (61out of 109 cases or 56 percent) said no. looking at the responses for the top four cmss, responses echo the content management systems: trends in academic libraries | connell 49 satisfaction responses. libguides users were least likely to want to switch (0 out of 7 cases or 0 percent), followed by wordpress (1 out of 5 cases or 17 percent), drupal (8out of 23 cases or 26 percent), and cascade server (3 out of 7 or 43 percent) users. respondents were asked whether their library uses the same cms as their parent institution. most (106 out of 169 cases or 63 percent) said yes. libraries at large institutions (over 10,000 fte) were much less likely (34 percent) than their smaller counterparts to share a cms with their parent institution (see table 3.) a chi-square test rejects the null hypothesis that sharing a cms with a parent institution is independent of size: at a significance level of p = 0.001, libraries at smaller institutions are more likely to share a cms with their parent. table 3 cms users whose libraries use the same cms as their parent institution by size large medium small very small total no 23 responses (66%) 15 responses (33%) 19 responses (27%) 6 responses (35%) 63 responses (37%) yes 12 responses (34%) 31 responses (67%) 52 responses (73%) 11 responses (65%) 106 responses (63%) total 35 responses (100%) 46 responses (100%) 71 responses (100%) 17 responses (100%) 169 responses (100%) chi-square=15.921, df=3, p=.001 not surprisingly, a similar correlation holds true for comparing shared cmss and simplified basic carnegie classification. baccalaureate and master’s libraries were more likely to share cmss with their institutions (69 percent and 71 percent respectively) than research libraries (42 percent) (see table 4.) at a significance level of p = 0.004, a chi-square test rejects the null hypothesis that sharing a cms with a parent institution is independent of basic carnegie classification. table 4 cms users whose libraries use the same cms as their parent institution, by carnegie classification baccalaureate master’s research total no 19 responses (31%) 18 responses (29%) 26 responses (58%) 63 responses (37%) yes 43 responses (69%) 44 responses (71%) 19 responses (42%) 106 responses (63%) total 62 responses (100%) 62 responses (100%) 45 responses (100%) 169 responses (100%) chi-square = 11.057, df = 2, p = .004 information technology and libraries | june 2013 50 when participants responded that their library shared a cms with the parent institution, they were asked a follow up question about whether the library made the transition with the parent institution. most (80 out of 99 cases or 81 percent) said yes, the transition was made together. however, private institutions were more likely to have made the switch together (88 percent) than public (63 percent) (see table 5.) a fisher’s exact test rejects the null hypothesis that transition to cms is independent of institutional control: at a significance level of p = 0.010, private institutions are more likely than public to move to a cms in concert. 
table 5 users whose libraries and parent institutions use the same cms: transition by public/private control* private public total switched independently 9 responses (13%) 10 responses (37%) 19 responses (19%) switched together 63 responses (88%) 17 responses or (63%) 80 responses (81%) total 72 responses (101%)** 27 responses (100%) 99 responses (100%) fisher’s exact test: p = .010 * excludes responses where people indicated “other” ** due to rounding, total is greater than 100% similarly, a relationship existed between transition to cms and basic carnegie classification. baccalaureate institutions (93 percent) were more likely than master’s (80 percent), which were more likely than research institutions (53 percent) to make the transition together (see table 6.) a chi-square test rejects the null hypothesis that the transition to cms is independent of basic carnegie classification: at a significance level of p = 0.002, higher degree granting institutions are less likely to make the transition together. table 6 users whose libraries and parent institutions use the same cms: transition by carnegie classification* baccalaureate master’s research total switched independently 3 responses (7%) 8 responses (21%) 8 responses (47%) 19 responses (19%) switched together 40 responses (93%) 31 responses (80%) 9 responses (53%) 80 responses (81%) total 43 responses (100%) 39 responses (101%)** 17 responses (100%) 99 responses (100%) chi-square = 12.693, df = 2, p = .002 *excludes responses where people indicated “other” **due to rounding, total is greater than 100% content management systems: trends in academic libraries | connell 51 this study indicates that for libraries that transitioned to a cms with their parent institution, the transition was usually forced. out of the 88 libraries that transitioned together and indicated whether they were given a choice, only 8 libraries (9 percent) had a say in whether to make that transition. and even though academic libraries were usually forced to transition with their institution, they did not usually have representation on campus-wide cms selection committees. only 25 percent (22 out of 87) respondents indicated that their library had a seat at the table during cms selection. when comparing cms satisfaction ratings among libraries that were represented on cms selection committees versus those that had no representation, it is not surprising that those with representation were more satisfied (13 out of 22 cases or 59 percent) than those without (21 out of 59 cases or 36 percent). the same holds true for those libraries given a choice whether to transition. those given a choice were satisfied more often (6out of 8 cases or 75 percent) than those forced to transition (21 out of 71 cases or 30 percent). respondents who said that they were not on the same cms as their institution were asked why they chose a different system. many of the responses indicated a desire for freedom from the controlling influence of either it and marketing arms of the institution : • we felt drupal offered more flexibility for our needs than cascade, which is what the university at large was using. i've heard more recently that the university may be considering switching to drupal. • university pr controls all aspects of the university cms. we want more freedom. • we are a service-oriented organization, as opposed to a marketing arm. we by necessity need to be different. 
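the fisher's exact result reported for table 5 can be checked in the same way; the sketch below is again illustrative only (python with scipy, not the software used in the study), with the cell counts copied from table 5. the article reports p = .010 for this comparison.

from scipy.stats import fisher_exact

# observed counts from table 5: rows are private and public institutions;
# columns are "switched independently" and "switched together".
table5 = [
    [9, 63],    # private
    [10, 17],   # public
]

odds_ratio, p_value = fisher_exact(table5, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.3f}, two-sided p = {p_value:.3f}")
# the article reports p = .010 for table 5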
cms users were asked to provide a list of three factors most important in their selection of their cms and to rank their list in order of importance. the author standardized the responses, e.g., “price” was recorded as “cost.” the factors listed first, in order of frequency, were ease of use (15), flexibility (10), and cost (6). ignoring the ranking, 38 respondents listed ease of use somewhere in their “top three,” while 23 listed cost, and 16 listed flexibility. another objective of this study was to determine if there was a positive correlation between libraries with their own dedicated it staff and those who chose open source cmss. therefore, cms users were asked if their library had its own dedicated it staff, and 66 out of 143 libraries (46 percent) said yes. then the cmss used by respondents were translated into two categories, open source or proprietary systems (when a cms listed was unknown it was coded as a missing value), and a fisher’s exact test was run against all cases that had values for both variables to see if a correlation existed. although those with library it had open source systems more frequently than those without, the difference was not significant (see table 7).

table 7. libraries with their own it personnel, by open source cms (number of responses, with column percentages)
                          library has own it   no own it    total
cms is open source        37 (73%)             32 (57%)     69 (65%)
cms is not open source    14 (28%)             24 (43%)     38 (36%)
total                     51 (101%)*           56 (100%)    107 (101%)*
fisher’s exact test: p = .109
* due to rounding, total is greater than 100%

in another question, people were asked to self-identify if their organization uses an open source cms, and if so were asked whether they have outsourced any of its implementation or design to an outside vendor. most (61 out of 77 cases or 79 percent) said they had not outsourced implementation or design. one person commented, “no, i don’t recommend doing this. the cost is great, you lose the expertise once the consultant leaves, and the maintenance cost goes through the roof. hire someone fulltime or move a current position to be the keeper of the system.” one of the advantages of having a cms is the ability to give multiple people, regardless of their web authoring skills, the opportunity to edit webpages. therefore, cms users were asked how many web content creators they have within their library. out of 152 responses, the most frequent range cited was 2–5 authors (72 responses or 47 percent), followed by a single author (33 responses or 22 percent), 6–10 authors (20 responses or 13 percent), 21–50 authors (16 responses or 11 percent), 11–20 authors (6 responses or 4 percent), and over 50 authors (5 responses or 3 percent). because this question was an open-ended response and responses varied greatly, including “over 100 (over 20 are regular contributors)” and “1–3,” standardization was required. when a range or multiple numbers were provided, the largest number was used. respondents were asked whether their library uses a workflow management process requiring page authors to receive approval before publishing content. of the 131 people who responded yes or no, most (88 responses or 67 percent) said no. cms users were asked to provide comments related to topics covered in the survey. many comments mentioned issues of control (or lack thereof), while another common theme was concerns with specific cmss. here is a sampling of responses received:
• having dedicated staff is a necessity.
there was a time when these tools could be installed and used by a techie generalist. those days are over. a professional content person and a professional cms person are a must if you want your site to look like a professional site... i’m shocked at how many libraries switched to a cms yet still have a site that looks and feels like it was created 10 years ago.
• since the cms was bred in-house by another university department, we do not have control over changing the design or layout. the last time i requested a change, they wanted to charge us.
• our university marketing department, which includes the web team, is currently in the process of switching [cmss]. we were not invited to be involved in the selection process for a new cms, although they did receive my unsolicited advice.
• we compared costs for open source and licensed systems, and we found the costs to be approximately equivalent based on the development work we would have needed in an open source environment.
• the library was not part of the original selection process for the campus’ first cms because my position didn’t exist at that time. now that we have a dedicated web services position, the library is considered a “power user” in the cms and we are often part of the campus wide discussions about the new cms and strategic planning involving the campus website.
• we currently do not have the preferred level of control over our library website; we fought for customization rights for our front page, and won on that front. however, departments on campus do not have permission to install or configure modules, which we hope will change in the future.
• there’s a huge disconnect between it/administration and the library regarding unique needs of the library in the context of web-based delivery of information.

discussion

comparing the results of this study to previous studies indicates that cms usage within academic libraries is rising. the 64 percent cms adoption rate found in this survey, which used a narrower definition of cms than some previous studies cited in the literature review, is higher than the adoption rates in any of those studies. as more libraries make the transition, it is important to know how different cmss have been received among their peers. although cms users are slightly more satisfied than non-cms users (54 percent vs. 47 percent), the tools used matter. so if a library using dreamweaver to manage its site is given the option of moving with its institution to a cms and that cms is cascade server, it should strongly consider sticking with its current non-cms method, based on the respective satisfaction levels reported in this study (63 percent vs. 27 percent). satisfaction levels are important, but should not be considered in a vacuum. for example, although libguides users reported very high satisfaction levels (100 percent were satisfied or very satisfied), those users were mostly (11 out of 14 users or 79 percent) from small or very small schools, while the remaining three (21 percent) were medium schools. no large schools reported using libguides as their cms. libguides may be wonderful for a smaller school without need of much customization or, in some cases, without access to technical expertise, but may not be a good cms solution for larger institutions. one of the largest issues raised by survey respondents was libraries’ control, or lack thereof, when moving to a campus-selected cms.
given the complexity of academic libraries websites, library representation on campus-wide cms selection committees is warranted. not only are libraries more satisfied with the results when given a say in the selection, but libraries have special needs when it comes to website design that other campus units do not. including library representation ensures those needs are met. some of the respondents’ comments regarding lack of control over their sites are disturbing to libraries being forced or considering a move to a campus cms. clearly, having to pay another campus department to make changes to the library site is not an attractive option for most libraries. nor should libraries have to fight for the right or ability to customize their home pages. developing good working relationships with the decision makers may help prevent some of these problems, but likely not all. this study indicates that it is not uncommon for academic libraries to be forced into cmss, regardless of the cmss acceptability to the library environment. conclusion the adoption of cmss to manage academic libraries’ websites is increasing, but not all cmss are created equal. when given input into switching website management tools, library staff have many factors to take into consideration. these include, but are not limited to, in-house technical expertise, desirability of open source solutions, satisfaction of peer libraries with considered systems, and library specific needs, such as workflow management and customization requirements. ideally, libraries would always be partners at the table when campus-wide cms decisions are being made, but this study shows that this does not happen in most cases. if a library suspects that it is likely to be required to move to a campus-selected system, its staff should be alert for news of impending changes so that they can work to be involved at the beginning of the process to be able to provide input. a transition to a bad cms can have long-term negative effects on the library, its users, and staff. a library’s website is its virtual “branch” and vitally important to the functioning of the library. the management of such an important component of the library should not be left to chance. references 1. doug goans, guy leach, and teri m. vogel, “beyond html: developing and re-imagining library web guides in a content management system,” library hi tech 24, no. 1 (2006): 29–53, doi:10.1108/07378830610652095. 2. ruth sara connell, “survey of web developers in academic libraries,” the journal of academic librarianship 34, no. 2 (march 2008): 121–129, doi:10.1016/j.acalib.2007.12.005. http://dx.doi.org/10.1016/j.acalib.2007.12.005 content management systems: trends in academic libraries | connell 55 3. maira bundza, patricia fravel vander meer, and maria a. perez-stable, “work of the web weavers: web development in academic libraries,” journal of web librarianship 3, no. 3 (july 2009): 239–62. 4. david comeaux and axel schmetzke, “accessibility of academic library web sites in north america—current status and trends (2002–2012).” library hi tech 31, no. 1 (january 28, 2013): 2. 5. daniel verbit and vickie l. kline, “libguides: a cms for busy librarians,” computers in libraries 31, no. 6 (july 2011): 21–25. 6. amy york, holly hebert, and j. michael lindsay, “transforming the library website: you and the it crowd,” tennessee libraries 62, no. 3 (2012). 7. bundza, vender meer, and perez-stable, “work of the web weavers: web development in academic libraries.” 8. 
tom kmetz and ray bailey, “migrating a library’s web site to a commercial cms within a campus-wide implementation,” library hi tech 24, no. 1 (2006): 102–14, doi:10.1108/07378830610652130. 9. kimberley stephenson, “sharing control, embracing collaboration: cross-campus partnerships for library website design and management,” journal of electronic resources librarianship 24, no. 2 (april 2012): 91–100. 10. ibid. 11. elizabeth l. black, “selecting a web content management system for an academic library website,” information technology & libraries 30, no. 4 (december 2011): 185–89; andy austin and christopher harris, “welcome to a new paradigm,” library technology reports 44, no. 4 (june 2008): 5–7; holly yu , “chapter 1: library web content management: needs and challenges,” in content and workflow management for library web sites: case studies, ed. holly yu (hersey, pa: information science publishing, 2005), 1–21; wayne powel and chris gill, “web content management systems in higher education,” educause quarterly 26, no. 2 (2003): 43– 50; goans, leach, and vogel, “beyond html.” 12. kmetz and bailey, “migrating a library’s web site.” 13. carnegie foundation for the advancement of teaching, 2010 classification of institutions of higher education, accessed february 4, 2013, http://classifications.carnegiefoundation.org/descriptions/basic.php. 14. ronald r. powell , basic research methods for librarians (greenwood, 1997). http://classifications.carnegiefoundation.org/descriptions/basic.php j costs of library catalog cards produced by computer 121 frederick g. kilgour: ohio college library center, columbus, ohio production costs of 79,831 cards are analyzed. cards were produced by four variants of the columbia-harvard-yale procedure employing an ibm 870 document writer and an ibm 1401 computer. costs per card ranged from 8.8 to 9.8 cents for completed cards. . early in september, 1964, the yale medical library.put into routine operation the columbia-harvard-yale computerized technique for catalog card manufacture ( 1), and during the following three · years yale produced over 87,000 cards. the principal objective of the chy project was an on-line, computerized, bibliographic information retrieval system. however, the route selected for attaining the objective included manufacture of cards from machine readable data to keep up the manual catalog while machine readable records were being inexpensively accumulated for computerized subject retrieval. catalog cards were only one product of the system, but their production was designed to be as efficient as possible within constraints of the system. nevertheless, this paper will examine chy card production costs as though this segment of the system were an isolated procedure, yielding but one product, as is the case in classical library procedures. costing will disregard other benefits, such as accession lists and machine readable data produced for little, or no, additional expense. the columbia medical library and harvard medical library also installed ibm 870 document writers and tested the programs for card production, but neither library routinely produced cards. however, co122 journal of library automation vol. 1/ 2 june, 1968 lumbia produced its acquisitions lists until october, 1966, using chy techniques. harvard issued a similar list, but for a shorter period of time, and it was harvard's withdrawal early in 1966 that brought about the collapse of the project. 
nevertheless, other institutions adopted the chy procedure for catalog card production, among them the medical library at the university of rochester, which used the programs for two years following february, 1966. e. r. squibb & sons at east brunswick, new jersey, also uses the programs. at the university of kentucky an 870 document writer types catalog cards, but new programs were written to run on an ibm 7040 computer that recently have been recoded in cobol for an ibm 360/50. similarly, the library at philip morris, inc., richmond, virginia, rewrote the programs to run on an ibm 1620 computer which punches cards that drive an 870. the korean social science bibliography project of the human relations area files has elaborated the chy technique into its automated bibliographic system ( 2), which in turn is the base for another bibliographic system for mrican studies. the machine readable cataloging record of the chy mechanized system eventually became the great-grandfather of the marc ii format and contributed about as much to marc ii as would have been the case had their relationship been truly biological. although the columbia-harvard-yale project never did develop and activate its proposed bibliographic information retrieval system, r. k. summit working entirely independently has brought into successful operation his excellent dialog system ( 3) which is essentially the system that chy had in design stage. moreover, summit's system is definitely superior because it has several useful functions not contemplated in chy. nearly all reports on catalog card production limit study of costs to reproduction of cards and neglect other costs involved in preparing cards for the catalog. an exception is p. j. fasana's 1963 investigation wherein he found that library of congress cards, in seven copies and ready to be filed into a catalog, cost 16.6 cents per card; cards produced by a machine method consisting of a tape typewriter and a very small special purpose computer cost 9.9 cents ( 4). fasana used an hourly salary rate of $2.00. a study of early experience with chy production yielded 12.5 cents per card ( 1) whereas the present study shows that costs range between 8.8 and 9.8 cents per card, cards being ·in completed form, arranged in packs for individual catalogs, and ready for bursting before alphabetizing for filing. methods · during the course of the three years in which the chy programs were in operation, four variant techniques were used for card production. the first three with their limitations have been described · elsewhere ( 5). briefly, the initial system consisted of keypunching from worksheets, _listing the punch cards on an ibm 870 document writer, proofreading and costs of library catalog cards/ kilgour 123 correcting, processing the proofread and corrected punch cards on an ibm 1401 computer which produced punch card output that, in tum, was used to drive the 870 document writer for production of catalog cards on oneup forms. in the next arrangement, printing of cards on one-up forms was accomplished on an ibm 1401 computer driving an upperand lowercase print chain. in the third procedure, a two-up card form replaced the one-up form. finally, the medical library returned the 870 document writer to the manufacturer, and the 1401 was programmed to do the prooflisting in upper and lower case. the yale bibliographic system (6) replaced the chy routines on 25 july 1967. 
the keypuncher kept time records for the various activities listed in table 1 throughout the period of this study. during the first two months of operation, design for recording data was inadequate. subsequently an individual would, albeit infrequently, fail to record time elapsed, so that production of 7,630 cards was omitted from the study, leaving a total of 79,831 to be included. on several occasions during the fourth part of the study, the second proofreading was suspended, and only correction carried out. hence, time expended in this category is less than in the previous three periods. at first an ibm 1401 computer in the yale computer center was used, the center being located about a mile from the medical library. subsequently, another 1401 modified to drive an upper- and lower-case print chain and located in the medical school was employed. later this machine was transferred to the administrative data systems computer center, which moved to a new location not long after it assumed operation of the 1401. still later, the 1401 was again transferred, this time to the yale computer center. as can be seen from the computer charges in table 1, these wanderings about new haven appear to have had no effect on operating efficiency. time recorded for each computer run was actual time clocked by the operator. other times were recorded by the individual performing the operation. salaries used in the cost calculation were salaries being paid in june, 1967, which were, of course, appreciably higher than those in the autumn of 1964; hourly rate for the first proofreader in table 1 was $2.62 and for the second $2.21. hourly rental for the 870 document writer was $.78. rate of computer charges employed in the calculation was $20 per hour, a rate that had existed during the last year or so during which data was collected. initially, computer charges had been $75 an hour, but they dropped precipitously during the first two years. costs for catalog card stock were the lowest cost charged for the two types of forms. since these forms were not standard items during the years of the study, their prices varied considerably depending upon the amount ordered.

results

table 1 contains cost figures for catalog card production by the four variant techniques. since salaries and computer charges can vary widely, particularly among countries, time per card produced is also included in the table to facilitate comparison with other systems.

table 1. per-card costs of computer-produced catalog cards. variant 1: one-up form on 870, proof on 870; variant 2: one-up form on 1401, proof on 870; variant 3: two-up form on 1401, proof on 870; variant 4: two-up form on 1401, proof on 1401. each pair of columns gives dollars and hours per card.
                               variant 1        variant 2        variant 3        variant 4
                               $       hrs      $       hrs      $       hrs      $       hrs
keypunching                    .0219   .0099    .0218   .0099    .0222   .0101    .0235   .0106
keypunch                       .0029   .0099    .0030   .0099    .0030   .0101    .0032   .0106
ibm 870-proof                  .0033   .0043    .0036   .0046    .0039   .0051
ibm 1401-proof                                                                    .0091   .0046
proofreaders (2):
  proofreading                 .0115   .0044    .0113   .0043    .0118   .0045    .0116   .0044
  proofreading and correcting  .0120   .0055    .0122   .0055    .0119   .0054    .0091   .0041
ibm 1401                       .0149   .0085    .0313   .0156    .0231   .0116    .0245   .0112
ibm 870-card typing            .0104
card stock                     .0149            .0149            .0125            .0125
total                          .0918            .0981            .0884            .0935
number of cards                15,149           9,343            27,210           28,129
number of titles               1,655            990              2,920            3,130
cards per title                9.2              9.4              9.3              9.0

of course, amounts of time calculated by dividing elapsed time by amount of product are not directly comparable with results of time and motion studies such as henry voos’ helpful study (7). however, two different methods of comparing the input costs in table 1 with those johnson (8) published for the stanford book catalog gave divergences of only 2 and 6 per cent. source of the increase in costs of six-tenths of a cent from the first procedure to the second is entirely the increase in computer charges when the 1401 replaced the 870 to print cards. when the two-up form was employed on the computer in variant three, charges then dropped to less than the combined 1401 and 870 costs in the first procedure. costs rose again in procedure four. here the principal cause of the increase was the substitution of computer-produced proof listings after the 870 document writer had been returned to the manufacturer. although there is no reason to think that preparation of cataloging copy on a worksheet is either more or less expensive than older techniques, coding a worksheet constitutes additional work for which there is no equivalent in classical procedures. coding costs were examined between 9 march and 11 may 1965, when six individuals, ranging from professional catalogers to a student assistant, recorded time required to code 725 worksheets. time per final catalog card produced was three seconds; in other words, $.003 for a cataloger receiving $7500 a year, or $.001 for a student assistant earning $1.50 an hour. if total coding cost, rather than a portion of it, were to be charged to card production, costs reported in table 1 could rise one- to three-tenths of a cent.

discussion

the accurate comparison of costs would be with those of systems similar to the chy system that produce more than one product. for instance, the chy system also produced monthly accession lists from the same punch-card decklets that produced catalog cards. the accession list was produced mechanically at a cost far less than that for the previous manual preparation. the decklets also constituted machine readable information available for other purposes, most of which have not yet been realized. system costing would assign only a portion of keypunching and proofreading costs to card production. another saving was the appreciable shortening of time required for catalog cards to appear in the catalog. in procedures one through three, usually three or four days elapsed from the day on which the cataloger completed cataloging to the day on which cards were filed into the catalog. however, in procedure four, the computer, which was then a mile distant from the medical library, was used on two separate occasions for each batch of decklets, so that elapsed time rose to at least a week. even though other benefits are not reflected in comparative costs, it is clear from fasana’s findings that the chy computer-produced cards cost far less than do lc cards, and have a similar cost to those produced mechanically on which fasana reported.
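the totals in table 1 can be cross-checked by summing the per-card components, and the worksheet-coding figures quoted above follow from simple rate arithmetic. a minimal python sketch of both checks (the language, and the assumption of roughly 2,080 paid hours in a work year, are mine rather than the article’s):

```python
# per-card component costs in dollars for the four chy variants (taken from table 1)
components = {
    "variant 1 (one-up on 870, proof on 870)":   [.0219, .0029, .0033, .0115, .0120, .0149, .0104, .0149],
    "variant 2 (one-up on 1401, proof on 870)":  [.0218, .0030, .0036, .0113, .0122, .0313, .0149],
    "variant 3 (two-up on 1401, proof on 870)":  [.0222, .0030, .0039, .0118, .0119, .0231, .0125],
    "variant 4 (two-up on 1401, proof on 1401)": [.0235, .0032, .0091, .0116, .0091, .0245, .0125],
}
for variant, costs in components.items():
    # totals reported in table 1: .0918, .0981, .0884, .0935
    print(f"{variant}: {sum(costs):.4f}")

# worksheet coding: three seconds of coding per finished catalog card
cataloger_hourly = 7500 / 2080          # assumed ~2,080 paid hours per year
student_hourly = 1.50
print(round(3 / 3600 * cataloger_hourly, 3))   # about $.003 per card
print(round(3 / 3600 * student_hourly, 3))     # about $.001 per card
```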
although there appears to be no published evidence that photocopying techniques can produce finished catalog cards at less expense than 9 cents, it is possible that some photoreproduced cards may be less expensive than those described in this article. however, it must be pointed out that photo-reproduced cards are products . of single-product procedures, whereas the chy cards are one of several system products. increase in cost between procedure three and procedure four was due to increase in cost of prooflisting in upper and lower case on the 1401 computer as compared to prooflisting on the 870 document writer. this cost increase was not detected until calculations were done for this investigation, and therein lies a moral. it was the policy at the yale library for all programming to be done by library programmers, since various inefficiences, and indeed catastrophes, had occasionally been observed when non-library personnel had prepared programs for library operations. the single exception to this policy was the proof program, which this investigation reveals used an exhorbitant amount of time-one-third of that required for subsequent card production. since it had been felt that writing and coding a prooflisting program. was perfectly straightfmward, an outside programmer of recognized ability was employed to write and code the program. because the program was simple, and because the programmer had high competence, efficiency of the program was never checked as it should have been. this episode raises the question that if even the wary can be trapped, how can the tmwary avoid pitfalls? there is no satisfactory answer, but it would appear that some difficulties could be avoided by review of new programs by experienced library programmers, of which there are unfortunately far too few. comparison with data such as that in table 1 will also be helpful, but not definitive, in evaluating new programs. of course, when widely used library computer programs of recognized efficiency are generally available, magnitude of the pitfalls will have been greatly reduced. concl"qsion computer-produced catalog cards, even when they are but one of several system products, can be prepared in finished form for a local catalog less expensively and with less delay than can library of congress printed cards. computer card production at 8.8 to 9.8 cents per completed card appears to be competitive with other procedures for preparing catalog cards. however, undetected inefficiency in a minor program increased costs, thereby emphasizing need to insure efficiency in programs used routinely. costs of library catalog cards/ kilgour 127 acknowledgements the author is most grateful to mrs. sarah boyd, keypuncher extraordinary, who maintained the record of the data used in this study. national science foundation grant no. 179 supported the chy project in part. references 1. kilgour, frederick g.: "mechanization of cataloging procedures," bulletin of the medical library association, 53 (aprill965), 152-162. 2. koh, hesung c.: "a social science bibliographic system; computer adaptations," the american behavioral scientist, 10 (jan. 1967), 2-5. 3. summit, roger k.: "dialog; an operational on-line reference retrieval system," association for computing machinery, proceedings of 22nd national conference, (1967), 51-56. 4. fasana, p.j.: "automating cataloging functions in conventional libraries," library resources & technical services, 7 ( fall1963), 350-365. 5. 
kilgour, frederick g.: "library catalogue production on small computers," american documentation, 17 (july 1966), 124-131. 6. weisbrod, david l.: "an integrated, computerized, bibliographic system for libraries," (in press). 7. voos, henry: standard times for certain clerical activities in technical processing (ann arbor, university microfilms, 1965). 8. johnson, richard d.: "a book catalog at stanford~" journal of library automation, 1 (march 1968), 13-50. ----------------------editor’s comments bob gerrity information technology and libraries | december 2012 1 past and present converge with the december 2012 issue of information technology and libraries (ital), as we also publish online the first volume of ital’s predecessor, the journal of library automation (jola), originally published in print in 1968. the first volume of jola offers a fascinating glimpse into early days of library automation, when many things were different, such as the size (big) and capacity (small) of computer hardware, and many things were the same (e.g., richard johnson’s description of the book catalog project at stanford, where “the major achievement of the preliminary systems design was to establish a meaningful dialogue between the librarian and systems and computer personnel.” plus ça change, plus c'est la meme. there are articles by luminaries in the field: richard de gennaro describes approaches to developing an automation program in a large research library, frederick kilgour, from the ohio bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia. http://ejournals.bc.edu/ojs/index.php/ital/issue/view/312 editor’s comments bob gerrity editor’s comments | gerrity 2 college library center (now oclc), analyzes catalog-card production costs at columbia, harvard, and yale in the mid 1960s (8.8 to 9.8 cents per completed card), and henriette avram from the library of congress describes the successful use of the cobol programming language to manipulate marc ii records. the december 2012 issue marks the completion of ital’s first year as an e-only, open-access publication. while we don’t have readership statistics for the previous print journal to compare with, download statistics for the e-version appear healthy, with more than 30,000 full-text article downloads for 2012 content so far this year, plus more than 10,000 downloads for content from previous years. based on the download statistics, the topics of most interest to today’s ital readers are discovery systems, web-based research guides, digital preservation, and digital copyright. this month’s issue takes some of these themes further, with articles that examine the usability of autocompletion features in library search interfaces (ward, hahn, and feist), reveal patterns of student use of library computers (thompson), propose a cloud-based digital library storage solution (sosa-sosa), and summarize attributes of open standard file formats (park, oh). happy reading. 20 information technology and libraries | june 2008 an assessment of student satisfaction with a circulating laptop service louise feldmann, lindsey wess, and tom moothart since may of 2000, colorado state university’s (csu) morgan library has provided a laptop computer lending service. in five years the service had expanded from 20 to 172 laptops. although the service was deemed a success, users complained about slow laptop startups, lost data, and lost wireless connections. 
in the fall of 2005, the program was formally assessed using a customer satisfaction survey. this paper discusses the results of the survey and changes made to the service based on user feedback. colorado state university (csu) is a land-grant insti-tution located in fort collins, colorado. the csu libraries consist of the morgan library, the main library on the central campus; the veterinary teaching branch hospital library at the veterinary hospital campus; and the atmospheric branch library at the foothills campus. in 1997, morgan library completed a major renovation and expansion which provided a designated space for public desktop computers in an information commons environment. the library called this space the electronic information center (eic). due to the popularity of the eic ,and with the intent of expanding computer access without expanding the computer lab, library staff began to explore the implementation of a laptop checkout service in 2000. library staff used heather lyle’s (1999) article “circulating laptop computers at west virginia university” as a guide in planning the service. development funds were used to purchase twenty laptop computers, and the 3com corporation donated fifteen wireless network access points. the laptops were to be used in morgan library on a wireless network maintained by the library technology services department. these computers were to be circulated from the loan desk, the same desk used to check out books. although the building is open to the public, use of the laptops was limited to university students and staff and for library in-house use only. all the public desktop computers and laptops use microsoft windows and microsoft office. maintaining the security of the libraries’ network and students’ personal data in a wireless environment was paramount. to maintain a secure computing environment and present a standardized computing experience in the library, an application of windows xp group policies was used. currently, the laptop software is updated at least every semester using symantec ghost. ghost copies a standardized image to every laptop even when the library owns a variety of computer models from the same manufacturer. additionally, due to concerns over wireless computer security, morgan library implemented cisco’s virtual private network (vpn) in 2004. the laptop service was launched in may 2000. more than 22,000 laptop transactions occurred in the initial year. since its inception, the use of the morgan library laptop service and the number of laptops available for checkout has steadily grown. using student technology funds, the service had grown to 172 laptops and ten presentation kits consisting of a laptop, projector, and a portable screen. circulation during the fall 2005 semester totaled 30,626 laptops and 102 presentation kits. in fiscal year 2005, 66,552 laptops and presentation kits were checked out. based on the high circulation statistics and anecdotal evidence, the service appeared to be successful. although morgan library replaced laptops every three years and upgraded the wireless network, laptop support staff noted that users complained of slow laptop startups, lost data, and lost wireless connections. the researchers also noted that large numbers of users queued at the circulation desk at 5:00 p.m. even though large numbers of desktop computers were available in the eic. a customer service satisfaction survey was developed to assess the service and test library staff’s assumptions about the service. 
csu had a student population of 25,616 students at the time of the survey. n literature review much of the published literature discussing laptop services focuses on the implementation of laptop lending programs and was published from 2001 to 2003, when many libraries were beginning this service (allmang 2003; block 2001; dugan 2001; myers 2001; oddy 2002; vaughan and burnes 2002; williams 2003). these articles deal primarily with topics such as how to deal with start-up technological, staffing, and maintenance issues. they have minimal discussion of the service post-implementation. researchers who have surveyed users of university laptop lending services include direnzo (2002), lyle (1999), jordy (1998), block (2001), oddy (2002), and monash university’s caulfield library (2004). direnzo from the university of akron only briefly discusses a survey they conducted with some information about additional software added as a result of their user comments. lyle from west virginia university discusses the percentage of respondents to particular questions such louise feldmann (louise.feldmann@colostate.edu) is the business and economics librarian at colorado state university libraries. she serves as the college liaison librarian to the college of business. lindsey wess (lindsey.wess@colostate. edu) coordinates assistive technology services and manages the information desk and the electronic information center at colorado state university libraries. tom moothart (tmoothar@ library.colostate.edu) is the coordinator of on-site services at colorado state university libraries. student satisfaction with circulating laptop service | feldmann, wess, and moothart 21 as what applications were used, problems encountered, and overall satisfaction with the service. jordy’s report provides in-depth analysis of the survey results from the university of north carolina at chapel hill, but the focus of his survey is on the laptop service’s impact on library employee work flow. monash university’s caulfield library survey focuses on wireless access and awareness of the program by patrons. other survey results found on university library web sites include southern new hampshire university library (west 2005) and murray state university library (2002). additionally, the monmouth university library web site (2003) provides discussion and written analysis of a survey they conducted prior to implementation of their service, a survey which was used to gather information and assess patron needs in order to aid in the construction and planning of their service. from the survey results discussed in the literature and posted on web sites, overall comments from users are very consistent with one another. most users indicate that they use a loaned laptop computer rather than desktop computer for privacy and portability (lyle 1999; oddy 2002; west 2005). in addition, the responses from patrons are overwhelmingly positive and users appreciated having the service made available (lyle 1999; jordy1998; west 2005). both west virginia university and the university of north carolina at chapel hill surveys found that 98 percent of respondents would check out a laptop again (lyle 1999; jordy 1998). southern university of new hampshire’s survey indicated that 88 percent of those responding would check one out again (west 2005). many respondents stated that a primary drawback of using the laptops was the slowness of connectivity (lyle 1999; monash 2004; murray state 2002). 
the primary use of the laptops, reported in the surveys, was microsoft word (lyle 1999; jordy 1998; oddy 2002). there is a lack of published literature regarding laptop lending customer satisfaction surveys and analysis. this could be due to the relative newness of many programs, the lack of university libraries that provide laptops, or the reliance on circulation statistics solely to assess the program. articles that discuss circulation and usage statistics as an assessment indicator to judge the popularity of their programs include direnzo (2002), dugan (2001), and vaughan and burnes (2002). based on high circulation statistics and positive anecdotal evidence, it may appear that library users are pleased with laptop programs, and perhaps there has been a hesitation to survey users on a program that is perceived by those in the library as successful. n results with the strong emphasis on assessment at colorado state university, it was decided to formally survey laptop users on their satisfaction with the program. the survey was distributed by the access services staff when the laptops were checked out from october 28, 2005, to november 28, 2005. this was a voluntary survey and the respondents were asked to complete one survey. users returned 173 completed surveys. undergraduates are the predominant audience for the laptop service; of the 173 returned surveys, 160 identified themselves as undergraduates. as shown in table 1, the responses indicated that the library has a core of regular laptop users, with 33 percent using the laptops at least daily and 82 percent using the laptops at least weekly. only 3 percent indicated that they were using a laptop for the first time. many laptop users also utilized the eic with 67 percent responding that they use the information commons at least weekly (see table 2). the laptops were initially purchased with the intent that they would be used to support student team projects. presentation kits with a laptop, projector, and portable screen were an extension of this idea and were also made available for checkout. surprisingly, only 15 percent of table 1. how often do you use a library laptop? frequency percentage more than once a day 3% daily 30% weekly 49% monthly 15% my first time 3% n=172 table 2. how often do you use a library pc? frequency percentage more than once a day 3% daily 20% weekly 44% monthly 20% never 13% n=169 22 information technology and libraries | june 2008 the respondents noted that they were using the laptop with a group. during evenings, it was observed by staff that students were regularly queuing and waiting for a laptop even though pcs were available in the library computer lab. figure 1 shows an hourly use statistics for the desktop and laptop public computers. the usage of the desktop computer drops in the late afternoon, just as the use of the laptop computer increases. students were asked why they chose a laptop rather than a library pc and were allowed to choose from multiple answers. as can be seen in table 3, most students noted the advantages of portability and privacy. five respondents wrote in the “other” category that they were able to work better in quieter areas, and ten mention that the computer lab workspace is limited. the dense use of space in the library computer lab has been noted by morgan library staff and students. the desktop surrounding each library pc only provides about three feet of workspace. 
one respondent explained the choice of laptop over pc was because “i can take it to a table and spread out my notes vs. on a library pc.” for many users, the desktops are too crowded to spread research material, and the eic is too noisy for contemplative thought. as can be noted from the use statistics, the public laptop program has been a very popular library service. prior to the survey, the perception of the morgan library staff was that students were waiting in the evening for extended periods of time for a laptop. when the library expanded the laptop pool from 20 in 2000 to 172 in 2005, it had seemingly no effect on reducing the number of students waiting to use them. as can be seen in table 4, when asked how long they had waited for a laptop, 74 percent of the students said they had access to a laptop immediately, and 15 percent waited less than a minute. the survey was administered during the second busiest time of the year for the library, the month before thanksgiving break. in the open comments, one respondent stated that it was possible to wait fortyfive minutes to an hour for a laptop and another noted that “during finals weeks it is almost impossible to get one.” even with the limited waiting time recorded by the page 1 of 1 feldmann figures.doc 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 7:30 am 8:30 am 9:30 am 10:30 am 11:30 am 12:30 pm 1:30 pm 2:30 pm 3:30 pm 4:30 pm 5:30 pm 6:30 pm 7:30 pm 8:30 pm 9:30 pm 10:30 pm 11:30 pm time of day p er ce nt ag e of u se r desktop computers checkout laptops figure 1. computer use statistics for may 1, 2006. figure 1. computer use statistics for may 1, 2006. table 3. why did you choose to use a laptop rather than a library pc? response number portability 41 privacy 12 easier to work with a group 7 portability and privacy 54 portability and easier to work with a group 10 portability, privacy, and easier to work with a group 12 student satisfaction with circulating laptop service | feldmann, wess, and moothart 23 respondents, when asked how the library could improve the laptop service many respondents requested that more laptops be purchased to decrease the wait. the library is struggling to determine the appropriate number of laptops to have available during peak use periods to reduce or eliminate wait times. the library laptops are more problematic than the library desktop computers to support. the laptops are more fragile than the desktop computers and have the added complication of connecting to the wireless network. every morning the morgan library’s technology staff retrieves non-functioning laptops; library technicians regularly retrieve lost data due to malfunctioning laptops and unsophisticated computer users. the addition of the virtual private network (vpn) connection to the laptop startup script files has slowed the boot-up to the wireless network. an effort has been made to ameliorate wireless “dead zones,” but users still complain of being dropped from the wireless network. with these problems in mind, users were asked about the technical complications they have experienced with the library laptops. the survey responses in tables 5 and 6 indicate a much lower percentage of users reporting technical problems than was anticipated. the technical staff’s large volume of technical calls may reflect the volume of users rather than systematic problems with the laptop service. surprisingly, 79 percent of the users reported rarely or never returning a non-functioning laptop. 
in addition, the library technicians have reported that no problems have been found on some of the laptops returned for repair. some of the returned computers may be due to frustration with the slow connection to the wireless network. forty-five percent of respondents reported at least occasionally having problems connecting to the wireless network. from the inception of the laptop program, the library has experienced problems with the wireless technology. from its original fifteen wireless access points to its current twenty-nine, the library has struggled to meet the demand of additional library laptops and users’ personal laptops. many written comments on the surveys complained about the slow connection speed of the wireless network such as, “find a way to make the boot-up process faster. i need to wait about five minutes for it to be totally booted and ready to use.” even with the slow connection to the wireless network, 41 percent of students responding to the survey rated their satisfaction with the library’s laptop service as excellent and 49 percent rated their satisfaction as good (see table 7). n discussion even with 90 percent of our users rating the laptop service as good or excellent, the survey noted some problems that needed attention. the morgan library laptops seamlessly connect to a wireless network through a login script when the computer is turned on. a new script was written to table 4. how long did you wait before you were able to check out your laptop? response percentage i did not wait 74% less than one minute 15% one to four minutes 11% five to ten minutes 2% more than ten minutes 0% n=171 table 5. how often have you experience problems saving files, connecting to the wireless network, or had a laptop that locked up or crashed? frequency saving files wireless connection locked up or crashed often <1% 5% <1% occasionally 8% 40% 17% rarely 33% 32% 35% never 58% 24% 49% n= 165 165 163 table 6. how often have you returned a library laptop that was not working properly? frequency percentage often 4% occasionally 18% rarely 30% never 49% n=165 24 information technology and libraries | june 2008 allow the connection and authentication to the cisco virtual private network (vpn) client. during testing it was found that some laptops took as long as ten minutes to connect to the wireless network, which resulted in numerous survey respondents commenting on our slow wireless network. to help correct this problem, the library’s network staff changed each laptop’s user profile from a mandatory roaming profile to a local profile and simplified the login script. the laptops connected faster to the wireless network with the new script, but they still did not meet the students’ expectations. in the fall of 2006, the library network staff moved the laptops from vpn to wi-fi protected access (wpa) wireless security, and laptop login time to the wireless network dropped to under two minutes. the number of customer complaints dropped dramatically after implementing wpa. additional access points were purchased to improve connectivity in morgan library’s wireless “dead zones.” in january 2006, the university’s central computing services audited the wireless network after continued wireless connectivity complaints. the audit recommended reconfiguring the access points channel assignments. in many cases it was found that the same channel had been assigned to access points adjacent to each other, ultimately compromising laptop connectivity. 
the audit also discovered noise interference on the wireless network from a 2.4-ghz cordless phone used by the loan desk staff. the phone was replaced with a 5.8-ghz one, which has resulted in fewer dropped connections near the loan desk. supporting almost 200 laptops has introduced several problems in the library. the morgan library building was not designed to support the use of large numbers of laptops. because it is impractical for the loan desk to charge nearly 200 laptop batteries throughout the day, laptops available for checkout must be connected to electrical outlets. these are seldom near study tables, and students are forced to crawl underneath tables to locate power or stretch adapter cords across aisles. a space plan for the morgan library is being developed that will increase the number of outlets near study tables. in the meantime, 100 power strips were added to tables used heavily by laptop users. the loan desk staff is very efficient at circulating, but has less success at troubleshooting technical problems. when the laptop service was first implemented, large numbers of laptops were not available due to servicing reasons. the public laptop downtime was lowered by hiring additional library technology students. a one-day onsite repair service agreement was purchased from the manufacturer which resulted in many equipment repairs being completed within 48 hours. in order to reduce the downtime further, a plan to replace some loan desk student workers with library technology students is being evaluated. the technology students will be able to troubleshoot connectivity and hardware problems with the users when they return the defective computers to the loan desk. if a computer needs additional service, it can be handled immediately, which will allow more laptops for checkout since fewer will be removed for repair. when the laptop service was first envisioned, it was seen as a great service for those working in groups. as can be seen in table 3, very few students are using the laptops in a group setting. in survey written comments, students emphasize that they enjoy the portability and privacy enabled by using a laptop. the morgan library eic is cramped and noisy, with the configuration allowing very little room for students to spread out research materials and notes for writing. the morgan library space plan takes these issues into consideration and recommends reconfiguring the eic to lessen the noise and provide writing space near computers. this is intended to improve the student library experience and encourage students to use the desktop computers during the evenings when lines form for the laptops. in order to decrease the current laptop queue at the loan desk, more laptops will be added. as a result of survey comments requesting apple computers, five mac powerbooks were added to the library’s laptop fleet. in addition, as morgan library adds more checkout laptops and the number of students arriving on campus with wireless laptops increases, the wireless infrastructure will need to be upgraded. upgrading the wireless access points to standard 802.11g has been implemented. updating each laptop with a new hardrive image has become problematic as the number of laptops has increased. the wireless network capacity is not large enough for the ghost software to transmit the image to multiple laptops, and so each laptop must be physically attached to the library network. 
initially, when library technology services attempted imaging many laptops at once, it took six to eight hours and required up to eight staff members. this method of large-scale laptop imaging was so network intensive that it had to be performed when the library was closed to avoid disrupting table 7. please rate your satisfaction with the laptop service. response percentage excellent 41% good 49% neutral 7% poor very poor 2% <1% n=166 student satisfaction with circulating laptop service | feldmann, wess, and moothart 25 public internet use. now imaging the laptop fleet is done piecemeal, twenty to thirty laptops at a time, in order to minimize complications with the ghost process and multicasting through the network switches. due to the staff time required, laptop software is not updated as often as the users would like. technological solutions continue to be investigated that will decrease the labor and network intensity of imaging. n conclusion the morgan library laptop service was established in 2000 and has been a very popular addition to the library’s services. as an example of its popularity, in fiscal year 2005 the laptops circulated 66,552 times. student government continues to support the use of student technology fees to support and expand the fleet of laptops. this survey was an attempt to assess users’ perceptions of the service and identify areas that need improvement. the survey found that students rarely wait more than a few minutes for a laptop, and in open-ended survey questions, students noted that they waited for computers only during peak use periods. while relatively few survey respondents experienced technical difficulties with the laptops and wireless network, slow wireless connection time was a concern that students noted in the open comments section of the survey. overall, the students gave the laptop service a very high rating. when asked to suggest improvements to the service, many respondents recommended purchasing more laptops. the libraries made several changes to improve the laptop service based on survey responses. changes have been made to the login script files, wireless network, and security protocol to speed and stabilize the wireless connection process. additional wireless access points will be added to the building and all access points will be upgraded to the 802.11g standard. in addition, five mac powerbooks have been added to the fleet of windowsbased laptops. the library continues to investigate new service models to circulate and maintain the laptops. works cited allmang, nancy. 2003. our plan for a wireless loan service. computer in libraries 23, no. 3: 20–25. block, karla j. 2001. laptops for loan: the experience of a multilibrary project. journal of interlibrary loan, document delivery, and information 12, no. 1: 1–12. direnzo, susan. 2002. a wireless laptop-lending program: the university of akron experience. technical services quarterly 20, no. 2: 1–12. dugan, robert e. 2001. managing laptops and the wireless network at the mildred f. sawyer library. journal of academic librarianship 27, no. 4: 295–298. jordy, matthew l. 1998. the impact of user support needs on a large academic workflow as a result of a laptop check-out program. master’s thesis, university of north carolina. lyle, heather. 1999. circulating laptop computers at west virginia university. information outlook 3, no. 11: 30–32. myers, penelope. 2001. laptop rental program, temple university libraries. 
journal of interlibrary loan, document delivery, and information supply 12, no. 1: 35–40. monash university caulfield library. 2004. laptop users and wireless network survey. www.its.monash.edu.au/staff/networks/wireless/review/caul-lapandnetsurvey.pdf (accessed june 8, 2005). monmouth university. 2003. testing the wireless waters: a survey of potential users before the implementation of a wireless notebook computer lending program in an academic library. http://bluehawk.monmouth.edu/~hholden/wwl/wireless_survey_results.html (accessed june 8, 2005). murray state university. 2002. library laptop computer usage survey results. www.murraystate.edu/msml/laptopsurv. htm (accessed june 8, 2005). oddy, elizabeth carley. 2002. laptops for loan. library and information update 1, no. 4: 54–55. vaughn, james b., and brett burnes. 2002. bringing them in and checking them out: laptop use in the modern academic library. information technology and libraries 21, no. 2: 52–62. west, carol. 2005. librarians pleased with results of student survey. southern new hampshire university. www.snhu. edu/3174/asp (accessed june 8, 2005). williams, joe. 2003. taming the wireless frontier: pdas, tablets, and laptops at home on the range. computers in libraries 23, no. 3: 10–12, 62–64. 128 information technology and libraries | september 2010 lynne weber and peg lawrence authentication and access: accommodating public users in an academic world in cook and shelton’s managing public computing, which confirmed the lack of applicable guidelines on academic websites, had more up-to-date information but was not available to the researchers at the time the project was initiated.2 in the course of research, the authors developed the following questions: ■■ how many arl libraries require affiliated users to log into public computer workstations within the library? ■■ how many arl libraries provide the means to authenticate guest users and allow them to log on to the same computers used by affiliates? ■■ how many arl libraries offer open-access computers for guests to use? do these libraries provide both open-access computers and the means for guest user authentication? ■■ how do federal depository library program libraries balance their policy requiring computer authentication with the obligation to provide public access to government information? ■■ do computers provided for guest use (open access or guest login) provide different software or capabilities than those provided to affiliated users? ■■ how many arl libraries have written policies for the use of open-access computers? if a policy exists, what is it? ■■ how many arl libraries have written policies for authenticating guest users? if a policy exists, what is it? ■■ literature review since the 1950s there has been considerable discussion within library literature about academic libraries serving “external,” “secondary,” or “outside” users. the subject has been approached from the viewpoint of access to the library facility and collections, reference assistance, interlibrary loan (ill) service, borrowing privileges, and (more recently) access to computers and internet privileges, including the use of proprietary databases. 
deale emphasized the importance of public relations to the academic library.3 while he touched on creating bonds both on and off campus, he described the positive effect of “privilege cards” to community members.4 josey described the variety of services that savannah state college offered to the community.5 he concluded his essay with these words: why cannot these tried methods of lending books to citizens of the community, story hours for children . . . , a library lecture series or other forum, a great books discussion group and the use of the library staff in the fall of 2004, the academic computing center, a division of the information technology services department (its) at minnesota state university, mankato took over responsibility for the computers in the public areas of memorial library. for the first time, affiliated memorial library users were required to authenticate using a campus username and password, a change that effectively eliminated computer access for anyone not part of the university community. this posed a dilemma for the librarians. because of its federal depository status, the library had a responsibility to provide general access to both print and online government publications for the general public. furthermore, the library had a long tradition of providing guest access to most library resources, and there was reluctance to abandon the practice. therefore the librarians worked with its to retain a small group of six computers that did not require authentication and were clearly marked for community use, along with several standup, open-access computers on each floor used primarily for searching the library catalog. the additional need to provide computer access to high school students visiting the library for research and instruction led to more discussions with its and resulted in a means of generating temporary usernames and passwords through a web form. these user accommodations were implemented in the library without creating a written policy governing the use of open-access computers. o ver time, library staff realized that guidelines for guests using the computers were needed because of misuse of the open-access computers. we were charged with the task of drafting these guidelines. in typical librarian fashion, we searched websites, including those of association of research libraries (arl) members for existing computer access policies in academic libraries. we obtained very little information through this search, so we turned to arl publications for assistance. library public access workstation authentication by lori driscoll, was of greater benefit and offered much of the needed information, but it was dated.1 a research result described lynne webber (lnweber@mnsu.edu) is access services librarian and peg lawrence (peg.lawrence@mnsu.edu) is systems librarian, minnesota state university, mankato. authentication and access | weber and lawrence 129 providing service to the unaffiliated, his survey revealed 100 percent of responding libraries offered free in-house collection use for the general public, and many others offered additional services.16 brenda johnson described a one-day program in 1984 sponsored by rutgers university libraries forum titled “a case study in closing the university library to the public.” the participating librarians spent the day familiarizing themselves with the “facts” of the theoretical case and concluded that public access should be restricted but not completely eliminated. 
a few months later, consideration of closing rutgers’ library to the public became a real debate. although there were strong opposing viewpoints, the recommendation was to retain the open-door policy.17 jansen discussed the division between those who wanted to provide the finest service to primary users and those who viewed the library’s mission as including all who requested assistance. jansen suggested specific ways to balance the needs of affiliates and the public and referred to the dilemma the university of california, berkeley, library that had been closed to unaffiliated users.18 bobp and richey determined that california undergraduate libraries were emphasizing service to primary users at a time when it was no longer practical to offer the same level of service to primary and secondary users. they presented three courses of action: adherence to the status quo, adoption of a policy restricting access, or implementation of tiered service.19 throughout the 1990s, the debate over the public’s right to use academic libraries continued, with increasing focus on computer use in public and private academic libraries. new authorization and authentication requirements increased the control of internal computers, but the question remained of libraries providing access to government information and responding to community members who expected to use the libraries supported by their taxes. morgan, who described himself as one who had spent his career encouraging equal access to information, concluded that it would be necessary to use authentication, authorization, and access control to continue offering information services readily available in the past.20 martin acknowledged that library use was changing as a result of the internet and that the public viewed the academic librarian as one who could deal with the explosion of information and offer service to the public.21 johnson described unaffiliated users as a group who wanted all the privileges of the affiliates; she discussed the obligation of the institution to develop policies managing these guest users.22 still and kassabian considered the dual responsibilities of the academic library to offer internet access to public users and to control internet material received and sent by primary and public users. further, they weighed as consultants be employed toward the building of good relations between town and gown.6 later, however, deale indicated that the generosity common in the 1950s to outsiders was becoming unsustainable.7 deale used beloit college, with an “open door policy” extending more than 100 years, as an example of a school that had found it necessary to refuse out-of-library circulation to minors except through ill by the 1960s.8 also in 1964, waggoner related the increasing difficulty of accommodating public use of the academic library. he encouraged a balance of responsibility to the public with the institution’s foremost obligation to the students and faculty.9 in october 1965, the ad hoc committee on community use of academic libraries was formed by the college library section of the association of college and research libraries (acrl). this committee distributed a 13-question survey to 1,100 colleges and universities throughout the united states. 
the high rate of response (71 percent) was considered noteworthy, and the findings were explored in “community use of academic libraries: a symposium,” published in 1967.10 the concluding article by josey (the symposium’s moderator) summarized the lenient attitudes of academic libraries toward public users revealed through survey and symposium reports. in the same article, josey followed up with his own arguments in favor of the public’s right to use academic libraries because of the state and federal support provided to those institutions.11 similarly, in 1976 tolliver reported the results of a survey of 28 wisconsin libraries (public academic, private academic, and public), which indicated that respondents made a great effort to serve all patrons seeking service.12 tolliver continued in a different vein from josey, however, by reporting the current annual fiscal support for libraries in wisconsin and commenting upon financial stewardship. tolliver concluded by asking, “how effective are our library systems and cooperative affiliations in meeting the information needs of the citizens of wisconsin?”13 much of the literature in the years following focused on serving unaffiliated users at a time when public and academic libraries suffered the strain of overuse and underfunding. the need for prioritization of primary users was discussed. in 1979, russell asked, “who are our legitimate clientele?” and countered the argument for publicly supported libraries serving the entire public by saying the public “cannot freely use the university lawn mowers, motor pool vehicles, computer center, or athletic facilities.”14 ten years later, russell, robison, and prather prefaced their report on a survey of policies and services for outside users at 12 consortia institutions by saying, “the issue of external users is of mounting concern to an institution whose income is student credit hour generated.”15 despite russell’s concerns about the strain of 130 information technology and libraries | september 2010 be aware of the issues and of the effects that licensing, networking, and collection development decisions have on access.”35 in “unaffiliated users’ access to academic libraries: a survey,” courtney reported and analyzed data from her own comprehensive survey sent to 814 academic libraries in winter 2001.36 of the 527 libraries responding to the survey, 72 libraries (13.6 percent) required all users to authenticate to use computers within the library, while 56 (12.4 percent) indicated that they planned to require authentication in the next twelve months.37 courtney followed this with data from surveyed libraries that had canceled “most” of their indexes and abstracts (179 libraries, or 33.9 percent) and libraries that had cancelled “most” periodicals (46 libraries or 8.7 percent).38 she concluded that the extent to which the authentication requirement restricted unaffiliated users was not clear, and she asked, “as greater numbers of resources shift to electronic-only formats, is it desirable that they disappear from the view of the community user or the visiting scholar?”39 courtney’s “authentication and library public access computers: a call for discussion” described a follow-up with the academic libraries participating in her 2001 survey who had self-identified as using authentication or planning to employ authentication within the next twelve months. 
her conclusion was the existence of ambivalence toward authentication among the libraries, since more than half of the respondents provided some sort of public access. she encouraged librarians to carefully consider the library’s commitment to service before entering into blanket license agreements with vendors or agreeing to campus computer restrictions.40 several editions of the arl spec kit series showing trends of authentication and authorization for all users of arl libraries have been an invaluable resource in this investigation. an examination of earlier spec kits indicated that the definitions of “user authentication” and “authorization” have changed over the years. user authentication, by plum and bleiler indicated that 98 percent of surveyed libraries authenticated users in some way, but at that time authentication would have been more precisely defined as authorization or permission to access personal records, such as circulation, e-mail, course registration, and file space. as such, neither authentication nor authorization was related to basic computer access.41 by contrast, it is common for current library users authenticate to have any access to a public workstation. driscoll’s library public access workstation authentication sought information on how and why users were authenticated on public-access computers, who was driving the change, how it affected the ability of federal depository libraries to provide public information, and how it affected library services in general.42 but at the time of driscoll’s survey, only 11 percent of surveyed libraries required authentication on all computers and 22 percent required it only on selected terminals. cook and shelton’s managing public computing the reconciliation of material restrictions against “principles of freedom of speech, academic freedom, and the ala’s condemnation of censorship.”23 lynch discussed institutional use of authentication and authorization and the growing difficulty of verifying bona fide users of academic library subscription databases and other electronic resources. he cautioned that future technical design choices must reflect basic library values of free speech, personal confidentiality, and trust between academic institution and publisher.24 barsun specifically examined the webpages of one hundred arl libraries in search of information pertinent to unaffiliated users. 
she included a historic overview of the changing attitudes of academics toward service to the unaffiliated population and described the difficult balance of college community needs with those of outsiders in 2000 (the survey year).25 barsun observed a consistent lack of information on library websites regarding library guest use of proprietary databases.26 carlson discussed academic librarians’ concerns about “internet-related crimes and hacking” leading to reconsideration of open computer use, and he described the need to compromise patron privacy by requiring authentication.27 in a chapter on the relationship of it security to academic values, oblinger said, “one possible interpretation of intellectual freedom is that individuals have the right to open and unfiltered access to the internet.”28 this statement was followed later with “equal access to information can also be seen as a logical extension of fairness.”29 a short article in library and information update alerted the authors to a uk project investigating improved online access to resources for library visitors not affiliated with the host institution.30 salotti described higher education access to e-resources in visited institutions (haervi) and its development of a toolkit to assist with the complexities of offering electronic resources to guest users.31 salotti summarized existing resources for sharing within the united kingdom and emphasized that “no single solution is likely to suit all universities and colleges, so we hope that the toolkit will offer a number of options.”32 launched by the society of college, national and university libraries (sconul), and universities and colleges information systems association (ucisa), haervi has created a best-practice guide.33 by far the most useful articles for this investigation have been those by nancy courtney. “barbarians at the gates: a half-century of unaffiliated users in academic libraries,” a literature review on the topic of visitors in academic libraries, included a summary of trends in attitude and practice toward visiting users since the 1950s.34 the article concluded with a warning: “the shift from printed to electronic formats . . . combined with the integration of library resources with campus computer networks and the internet poses a distinct threat to the public’s access to information even onsite. it is incumbent upon academic librarians to authentication and access | weber and lawrence 131 introductory letter with the invitation to participate and a forward containing definitions of terms used within the survey is in appendix a. in total, 61 (52 percent) of the 117 arl libraries invited to participate in the survey responded. this is comparable with the response rate for similar surveys reported by plum and bleiler (52 of 121, or 43 percent), driscoll (67 of 124, or 54 percent), and cook and shelton (69 of 123, or 56 percent).45 1. what is the name of your academic institution? the names of the 61 responding libraries are listed in appendix b. 2. is your institution public or private? see figure 1. respondents’ explanations of “other” are listed below. ■❏ state-related ■❏ trust instrument of the u.s. people; quasigovernment ■❏ private state-aided ■❏ federal government research library ■❏ both—private foundation, public support 3. are affiliated users required to authenticate in order to access computers in the public area of your library? see figure 2. 4. 
if you answered “yes” to the previous question, does your library provide the means for guest users to authenticate? see figure 3. respondents’ explanations of “other” are listed below. all described open-access computers. ■❏ “we have a few “open” terminals” ■❏ “4 computers don’t require authentication” ■❏ “some workstations do not require authentication” ■❏ “open-access pcs for guests (limited number and function)” ■❏ “no—but we maintain several open pcs for guests” ■❏ “some workstations do not require login” 5. is your library a federal depository library? see figure 4. this question caused some confusion for the canadian survey respondents because canada has its own depository services program corresponding to the u.s. federal depository program. consequently, 57 of the 61 respondents identified themselves as federal depository (including three canadian libraries), although 5 of the 61 are more accurately members of the canadian depository services program. only two responding libraries were neither a member of the u.s. federal depository program nor of the canadian depository services program. 6. if you answered “yes” to the previous question, and computer authentication is required, what provisions have been made to accommodate use of online government documents by the general public in the library? please check all that touched on every aspect of managing public computing, including public computer use, policy, and security.43 even in 2007, only 25 percent of surveyed libraries required authentication on all computers, but 46 percent required authentication on some computers, showing the trend toward an ever increasing number of libraries requiring public workstation authentication. most of the responding libraries had a computer-use policy, with 48 percent following an institution-wide policy developed by the university or central it department.44 ■■ method we constructed a survey designed to obtain current data about authentication in arl libraries and to provide insight into how guest access is granted at various academic institutions. it should be noted that the object of the survey was access to computers located in the public areas of the library for use by patrons, not access to staff computers. we constructed a simple, fourteen-question survey using the zoomerang online tool (http://www .zoomerang.com/). a list of the deans, directors, and chief operating officers from the 123 arl libraries was compiled from an internet search. we eliminated the few library administrators whose addresses could not be readily found and sent the survey to 117 individuals with the request that it be forwarded to the appropriate respondent. the recipients were informed that the goal of the project was “determination of computer authentication and current computer access practices within arl libraries” and that the intention was “to reflect practices at the main or central library” on the respondent’s campus. recipients were further informed that the names of the participating libraries and the responses would be reported in the findings, but that there would be no link between responses given and the name of the participating library. the survey introduction included the name and contact information of the institutional review board administrator for minnesota state university, mankato. potential respondents were advised that the e-mail served as informed consent for the study. the survey was administered over approximately three weeks. 
we sent reminders three, five, and seven days after the survey was launched to those who had not already responded. ■■ survey questions, responses, and findings we administered the survey, titled “authentication and access: academic computers 2.0,” in late april 2008. following is a copy of the fourteen-question survey with responses, interpretative data, and comments. the 132 information technology and libraries | september 2010 ■❏ “some computers are open access and require no authentication” ■❏ “some workstations do not require login” 7. if your library has open-access computers, how many do you provide? (supply number). see figure 6. a total of 61 institutions responded to this question, and 50 reported open-access computers. the number of open-access computers ranged from 2 to 3,000. as expected, the highest numbers were reported by libraries that did not require authentication for affiliates. the mean number of open-access computers was 161.2, the median was 23, the mode was 30, and the range was 2,998. 8. please indicate which online resources and services are available to authenticated users. please check all that apply. see figure 7. ■❏ online catalog ■❏ government documents ■❏ internet browser apply. see figure 5. ■❏ temporary user id and password ■❏ open access computers (unlimited access) ■❏ open access computers (access limited to government documents) ■❏ other of the 57 libraries that responded “yes” to question 5, 30 required authentication for affiliates. these institutions offered the general public access to online government documents various ways. explanations of “other” are listed below. three of these responses indicate, by survey definition, that open-access computers were provided. ■❏ “catalog-only workstations” ■❏ “4 computers don’t require authentication” ■❏ “generic login and password” ■❏ “librarians login each guest individually” ■❏ “provision made for under-18 guests needing gov doc” ■❏ “staff in gov info also login user for quick use” ■❏ “restricted guest access on all public devices” figure 3. institutions with the means to authenticate guests figure 4. libraries with federal depository and/or canadian depository services status figure 2. institutions requiring authentication figure 1. categories of responding institutions authentication and access | weber and lawrence 133 11. does your library have a written policy for use of open access computers in the public area of the library? question 7 indicates that 50 of the 61 responding libraries did offer the public two or more open-access computers. out of the 50, 28 responded that they had a written policy governing the use of computers. conversely, open-access computers were reported at 22 libraries that had no reported written policy. 12. if you answered “yes” to the previous question, please give the link to the policy and/or summarize the policy. twenty-eight libraries gave a url, a url plus a summary explanation, or a summary explanation with no url. 13. does your library have a written policy for authenticating guest users? out of the 32 libraries that required their users to authenticate (see question 3), 23 also had the means to allow their guests to authenticate (see question 4). fifteen of those libraries said they had a policy. 14. if you answered “yes” to the previous question, please give the link to the policy and/or summarize the policy. eleven ■❏ licensed electronic resources ■❏ personal e-mail access ■❏ microsoft office software 9. 
please indicate which online resources and services are available to authenticated guest users. please check all that apply. see figure 8. ■❏ online catalog ■❏ government documents ■❏ internet browser ■❏ licensed electronic resources ■❏ personal e-mail access ■❏ microsoft office software 10. please indicate which online resources and services are available on open-access computers. please check all that apply. see figure 9. ■❏ online catalog ■❏ government documents ■❏ internet browser ■❏ licensed electronic resources ■❏ personal e-mail access ■❏ microsoft office software figure 5. provisions for the online use of government documents where authentication is required figure 6. number of open-access computers offered figure 7. electronic resources for authenticated affiliated users (n = 32) number of libraries number of librariesnumber of libraries number of libraries figure 8. resources for authenticating guest users (n = 23) 134 information technology and libraries | september 2010 ■■ respondents and authentication figure 10 compares authentication practices of public, private, and other institutions described in response to question 2. responses from public institutions outnumbered those from private institutions, but within each group a similar percentage of libraries required their affiliated users to authenticate. therefore no statistically significant difference was found between authenticating affiliates in public and private institutions. of the 61 respondents, 32 (52 percent) required their affiliated users to authenticate (see question 3) and 23 of the 32 also had the means to authenticate guests (see question 4). the remaining 9 offered open-access computers. fourteen libraries had both the means to authenticate guests and had open-access computers (see questions 4 and 7). when we compare the results of the 2007 study by cook and shelton with the results of the current study (completed in 2008), the results are somewhat contradictory (see table 1).46 the differences in survey data seem to indicate that authentication requirements are decreasing; however, the literature review—specifically cook and shelton and the 2003 courtney article—clearly indicate that authentication is on the rise.47 this dichotomy may be explained, in part, by the fact that of the more than 60 arl libraries responding to both surveys, there was an overlap of only 34 libraries. the 30 u.s. federal depository or canadian depository services libraries that required their affiliated users to authenticate (see questions 3 and 5) provided guest access ranging from usernames and passwords, to open-access computers, to computers restricted to libraries gave the url to their policy; 4 summarized their policies. ■■ research questions answered the study resulted in answers to the questions we posed at the outset: ■■ thirty-two (52 percent) of the responding arl libraries required affiliated users to login to public computer workstations in the library. ■■ twenty-three (72 percent) of the 32 arl libraries requiring affiliated users to login to public computers provided the means for guest users to login to public computer workstations in the library. ■■ fifty (82 percent) of 61 responding arl libraries provided open-access computers for guest users; 14 (28 percent) of those 50 libraries provided both open-access computers and the means for guest authentication. ■■ without exception, all u.s. 
federal depository or canadian depository services libraries that required their users to authenticate offered guest users some form of access to online information. ■■ survey results indicated some differences between software provided to various users on differently accessed computers. office software was less frequently provided on open-access computers. ■■ twenty-eight responding arl libraries had written policies relating to the use of open-access computers. ■■ fifteen responding arl libraries had written policies relating to the authorization of guests. figure 9. electronic resources on open access computers (n = 50) figure 10. comparison of library type and authentication requirement number of libraries authentication and access | weber and lawrence 135 ■■ one library had guidelines for use posted next to the workstations but did not give specifics. ■■ fourteen of those requiring their users to authenticate had both open-access computers and guest authentication to offer to visitors of their libraries. other policy information was obtained by an examination of the 28 websites listed by respondents: ■■ ten of the sites specifically stated that the open-access computers were for academic use only. ■■ five of the sites specified time limits for use of openaccess computers, ranging from 30 to 90 minutes. ■■ four stated that time limits would be enforced when others were waiting to use computers. ■■ one library used a sign-in sheet to monitor time limits. ■■ one library mentioned a reservation system to monitor time limits. ■■ two libraries prohibited online gambling. ■■ six libraries prohibited viewing sexually explicit materials. ■■ guest-authentication policies of the 23 libraries that had the means to authenticate their guests, 15 had a policy for guests obtaining a username and password to authenticate, and 6 outlined their requirements of showing identification and issuing access. the other 9 had open-access computers that guests might use. the following are some of the varied approaches to guest authentication: ■■ duration of the access (when mentioned) ranged from 30 days to 12 months. ■■ one library had a form of sponsored access where current faculty or staff could grant a temporary username and password to a visitor. ■■ one library had an online vouching system that allowed the visitor to issue his or her own username and password online. ■■ one library allowed guests to register themselves by swiping an id or credit card. ■■ one library had open-access computers for local resources and only required authentication to leave the library domain. ■■ one library had the librarians log the users in as guests. ■■ one library described the privacy protection of collected personal information. ■■ no library mentioned charging a fee for allowing computer access. government documents, to librarians logging in for guests (see question 6). numbers of open-access computers ranged widely from 2 to more than 3,000 (see question 7). eleven (19 percent) of the responding u.s. federal depository or canadian depository services libraries that did not provide open-access computers issued a temporary id (nine libraries), provided open access limited to government documents (one library), or required librarian login for each guest (one library). all libraries with u.s. federal depository or canadian depository services status provided a means of public access to information to fulfill their obligation to offer government documents to guests. 
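the percentages and descriptive statistics reported above are simple tabulations of the raw survey responses. the short python sketch below shows how figures of this kind could be reproduced; the list of open-access computer counts is a hypothetical stand-in, since the individual per-library responses to question 7 are not published in this article.

from statistics import mean, median, mode

# hypothetical per-library counts of open-access computers (question 7);
# the real responses, not reproduced here, yield the published mean of 161.2,
# median of 23, mode of 30, and range of 2,998.
open_access_counts = [2, 12, 23, 30, 30, 45, 160, 3000]

print("mean:", round(mean(open_access_counts), 1))
print("median:", median(open_access_counts))
print("mode:", mode(open_access_counts))
print("range:", max(open_access_counts) - min(open_access_counts))

# the reported proportions are plain ratios, e.g. 32 of the 61 responding
# libraries required affiliated users to authenticate (52 percent).
print(f"requiring authentication: {32 / 61:.0%}")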
figure 11 shows a comparison of resources available to authenticated users and authenticated guests and offered on open-access computers. as might be expected, almost all institutions provided access to online catalogs, government documents, and internet browsers. fewer allowed access to licensed electronic resources and e-mail. access to office software showed the most dramatic drop in availability, especially on open-access computers. ■■ open-access computer policies as mentioned earlier, 28 libraries had written policies for their open-access computers (see question 11), and 28 libraries gave a url, a url plus a summary explanation, or a summary explanation with no url (see question 12). in most instances, the library policy included their campus’s acceptable-use policy. seven libraries cited their campus’s acceptable-use policy and nothing else. nearly all libraries applied the same acceptable-use policy to all users on all computers and made no distinction between policies for use of open-access computers or computers requiring authentication. following are some of the varied aspects of summarized policies pertaining to open-access computers: ■■ eight libraries stated that the computers were for academic use and that users might be asked to give up their workstation if others were waiting. table 1. comparison of findings from cook and shelton (2007) and the current survey (2008) authentication requirements 2007 (n = 69) 2008 (n = 61) some required 28 (46%) 23 (38%) required for all 15 (25%) 9 (15%) not required 18 (30%) 29 (48%) 136 information technology and libraries | september 2010 ■■ further study although the survey answered many of our questions, other questions arose. while the number of libraries requiring affiliated users to log on to their public computers is increasing, this study does not explain why this is the case. reasons could include reactions to the september 11 disaster, the usa patriot act, general security concerns, or the convenience of the personalized desktop and services for each authenticated user. perhaps a future investigation could focus on reasons for more frequent requirement of authentication. other subjects that arose in the examination of institutional policies were guest fees for services, age limits for younger users, computer time limits for guests, and collaboration between academic and public libraries. ■■ policy developed as a result of the survey findings as a result of what was learned in the survey, we drafted guidelines governing the use of open-access computers by visitors and other non-university users. the guidelines can be found at http://lib.mnsu.edu/about/libvisitors .html#access. these guidelines inform guests that openaccess computers are available to support their research, study, and professional activities. the computers also are governed by the campus policy and the state university system acceptable-use policy. guideline provisions enable staff to ask users to relinquish a computer when others are waiting or if the computer is not being used for academic purposes. while this library has the ability to generate temporary usernames and passwords, and does so for local schools coming to the library for research, no guidelines have yet been put in place for this function. figure 11. online resources available to authenticated affiliated users, guest users, open-access users authentication and access | weber and lawrence 137 these practices depend on institutional missions and goals and are limited by reasonable considerations. 
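several of the policies summarized above combine a session time limit (30 to 90 minutes at the sites that stated one) with the provision that guests are asked to give up a workstation only when others are waiting or the machine is not being used for academic purposes. purely as an illustration of that policy logic, and not as any responding library's actual system, a minimal python sketch might look like the following; the 60-minute limit and the waiting flag are hypothetical.

from datetime import datetime, timedelta

# hypothetical session-limit check for an open-access workstation; in practice
# the libraries cited reservation systems, sign-in sheets, or pc-management software.
SESSION_LIMIT = timedelta(minutes=60)  # reported limits ranged from 30 to 90 minutes

def should_ask_to_relinquish(session_start: datetime, others_waiting: bool) -> bool:
    """ask a guest to give up the workstation only when the time limit has been
    exceeded and someone else is waiting, mirroring the policies described above."""
    elapsed = datetime.now() - session_start
    return others_waiting and elapsed > SESSION_LIMIT

# example: a session that began 75 minutes ago while another user is waiting
started = datetime.now() - timedelta(minutes=75)
print(should_ask_to_relinquish(started, others_waiting=True))  # prints True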
in the past, accommodation at some level was generally offered to the community, but the complications of affiliate authentication, guest registration, and vendor-license restrictions may effectively discourage or prevent outside users from accessing principal resources. on the other hand, open-access computers facilitate access to electronic resources. those librarians who wish to provide the same level of commitment to guest users as in the past as well as protect the rights of all should advocate to campus policy-makers at every level to allow appropriate guest access to computers to fulfill the library’s mission. in this way, the needs and rights of guest users can be balanced with the responsibilities of using campus computers. in addition, librarians should consider ensuring that the licenses of all electronic resources accommodate walk-in users and developing guidelines to prevent incorporation of electronic materials that restrict such use. this is essential if the library tradition of freedom of access to information is to continue. finally, in regard to external or guest users, academic librarians are pulled in two directions; they are torn between serving primary users and fulfilling the principles of intellectual freedom and free, universal access to information along with their obligations as federal depository libraries. at the same time, academic librarians frequently struggle with the goals of the campus administration responsible for providing secure, reliable networks, sometimes at the expense of the needs of the outside community. the data gathered in this study, indicating that 82 percent of responding libraries continue to provide at least some open-access computers, is encouraging news for guest users. balancing public access and privacy with institutional security, while a current concern, may be resolved in the way of so many earlier preoccupations of the electronic age. given the pervasiveness of the problem, however, fair and equitable treatment of all library users may continue to be a central concern for academic libraries for years to come. references 1. lori driscoll, library public access workstation authentication, spec kit 277 (washington, d.c.: association of research libraries, 2003). 2. martin cook and mark shelton, managing public computing, spec kit 302 (washington, d.c.: association of research libraries, 2007): 16. 3. h. vail deale, “public relations of academic libraries,” library trends 7 (oct. 1958): 269–77. 4. ibid., 275. 5. e. j. josey, “the college library and the community,” faculty research edition, savannah state college bulletin (dec. 1962): 61–66. ■■ conclusions while we were able to gather more than 50 years of literature pertaining to unaffiliated users in academic libraries, it soon became apparent that the scope of consideration changed radically through the years. in the early years, there was discussion about the obligation to provide service and access for the community balanced with the challenge to serve two clienteles. despite lengthy debate, there was little exception to offering the community some level of service within academic libraries. early preoccupation with physical access, material loans, ill, basic reference, and other services later became a discussion of the right to use computers, electronic resources, and other services without imposing undue difficulty to the guest. current discussions related to guest users reflect obvious changes in public computer administration over the years. 
authentication presently is used at a more fundamental level than in earlier years. in many libraries, users must be authorized to use the computer in any way whatsoever. as more and more institutions require authentication for their primary users, accommodation must be made if guests are to continue being served. in addition, as courtney’s 2003 research indicates, an ever increasing number of electronic databases, indexes, and journals replace print resources in library collections. this multiplies the roadblocks for guest users and exacerbates the issue.48 unless special provisions are made for computer access, community users are left without access to a major part of the library’s collections. because 104 of the 123 arl libraries (85 percent) are federal depository or canadian depository services libraries, the researchers hypothesized that most libraries responding to the survey would offer open-access computers for the use of nonaffiliated patrons. this study has shown that federal depository libraries have remained true to their mission and obligation of providing public access to government-generated documents. every federal depository respondent indicated that some means was in place to continue providing visitor and guest access to the majority of their electronic resources— whether through open-access computers, temporary or guest logins, or even librarians logging on for users. while access to government resources is required for the libraries housing government-document collections, libraries can use considerably more discretion when considering what other resources guest patrons may use. despite the commitment of libraries to the dissemination of government documents, the increasing use of authentication may ultimately diminish the libraries’ ability and desire to accommodate the information needs of the public. this survey has provided insight into the various ways academic libraries serve guest users. not all academic libraries provide public access to all library resources. 138 information technology and libraries | september 2010 identify yourself,” chronicle of higher education 50, no. 42 (june 25, 2004): a39, http://search.ebscohost.com/login.aspx?direct =true&db=aph&an=13670316&site=ehost-live (accessed mar. 2, 2009). 28. diana oblinger, “it security and academic values,” in luker and petersen, computer & network security in higher education, 4, http://net.educause.edu/ir/library/pdf/pub7008e .pdf (accessed july 14, 2008). 29. ibid., 5. 30. “access for non-affiliated users,” library & information update 7, no. 4 (2008): 10. 31. paul salotti, “introduction to haervi-he access to e-resources in visited institutions,” sconul focus no. 39 (dec. 2006): 22–23, http://www.sconul.ac.uk/publications/ newsletter/39/8.pdf (accessed july 14, 2008). 32. ibid., 23. 33. universities and colleges information systems association (ucisa), haervi: he access to e-resources in visited institutions, (oxford: ucisa, 2007), http://www.ucisa.ac.uk/ publications/~/media/files/members/activities/haervi/ haerviguide%20pdf (accessed july 14, 2008). 34. nancy courtney, “barbarians at the gates: a half-century of unaffiliated users in academic libraries,” journal of academic librarianship 27, no. 6 (nov. 2001): 473–78, http://search.ebsco host.com/login.aspx?direct=true&db=aph&an=5602739&site= ehost-live (accessed july 14, 2008). 35. ibid., 478. 36. nancy courtney, “unaffiliated users’ access to academic libraries: a survey,” journal of academic librarianship 29, no. 1 (jan. 
2003): 3–7, http://search.ebscohost.com/login.aspx?dire ct=true&db=aph&an=9406155&site=ehost-live (accessed july 14, 2008). 37. ibid., 5. 38. ibid., 6. 39. ibid., 7. 40. nancy courtney, “authentication and library public access computers: a call for discussion,” college & research libraries news 65, no. 5 (may 2004): 269–70, 277, www.ala .org/ala/mgrps/divs/acrl/publications/crlnews/2004/may/ authentication.cfm (accessed july 14, 2008). 41. terry plum and richard bleiler, user authentication, spec kit 267 (washington, d.c.: association of research libraries, 2001): 9. 42. lori driscoll, library public access workstation authentication, spec kit 277 (washington, d.c.: association of research libraries, 2003): 11. 43. cook and shelton, managing public computing. 44. ibid., 15. 45. plum and bleiler, user authentication, 9; driscoll, library public access workstation authentication, 11; cook and shelton, managing public computing, 11. 46. cook and shelton, managing public computing, 15. 47. ibid.; courtney, unaffiliated users, 5–7. 48. courtney, unaffiliated users, 6–7. 6. ibid., 66. 7. h. vail deale, “campus vs. community,” library journal 89 (apr. 15, 1964): 1695–97. 8. ibid., 1696. 9. john waggoner, “the role of the private university library,” north carolina libraries 22 (winter 1964): 55–57. 10. e. j. josey, “community use of academic libraries: a symposium,” college & research libraries 28, no. 3 (may 1967): 184–85. 11. e. j. josey, “implications for college libraries,” in “community use of academic libraries,” 198–202. 12. don l. tolliver, “citizens may use any tax-supported library?” wisconsin library bulletin (nov./dec. 1976): 253. 13. ibid., 254. 14. ralph e. russell, “services for whom: a search for identity,” tennessee librarian: quarterly journal of the tennessee library association 31, no. 4 (fall 1979): 37, 39. 15. ralph e. russell, carolyn l. robison, and james e. prather, “external user access to academic libraries,” the southeastern librarian 39 (winter 1989): 135. 16. ibid., 136. 17. brenda l. johnson, “a case study in closing the university library to the public,” college & research library news 45, no. 8 (sept. 1984): 404–7. 18. lloyd m. jansen, “welcome or not, here they come: unaffiliated users of academic libraries,” reference services review 21, no. 1 (spring 1993): 7–14. 19. mary ellen bobp and debora richey, “serving secondary users: can it continue?” college & undergraduate libraries 1, no. 2 (1994): 1–15. 20. eric lease morgan, “access control in libraries,” computers in libraries 18, no. 3 (mar. 1, 1998): 38–40, http://search .ebscohost.com/login.aspx?direct=true&db=aph&an=306709& site=ehost-live (accessed aug. 1, 2008). 21. susan k. martin, “a new kind of audience,” journal of academic librarianship 24, no. 6 (nov. 1998): 469, library, information science & technology abstracts, http://search.ebsco host.com/login.aspx?direct=true&db=aph&an=1521445&site= ehost-live (accessed aug. 8, 2008). 22. peggy johnson, “serving unaffiliated users in publicly funded academic libraries,” technicalities 18, no. 1 (jan. 1998): 8–11. 23. julie still and vibiana kassabian, “the mole’s dilemma: ethical aspects of public internet access in academic libraries,” internet reference services quarterly 4, no. 3 (1999): 9. 24. clifford lynch, “authentication and trust in a networked world,” educom review 34, no. 4 (jul./aug. 1999), http://search .ebscohost.com/login.aspx?direct=true&db=aph&an=2041418 &site=ehost-live (accessed july 16, 2008). 25. 
rita barsun, “library web pages and policies toward ‘outsiders’: is the information there?” public services quarterly 1, no. 4 (2003): 11–27. 26. ibid., 24. 27. scott carlson, “to use that library computer, please authentication and access | weber and lawrence 139 appendix a. the survey introduction, invitation to participate, and forward dear arl member library, as part of a professional research project, we are attempting to determine computer authentication and current computer access practices within arl libraries. we have developed a very brief survey to obtain this information which we ask one representative from your institution to complete before april 25, 2008. the survey is intended to reflect practices at the main or central library on your campus. names of libraries responding to the survey may be listed but no identifying information will be linked to your responses in the analysis or publication of results. if you have any questions about your rights as a research participant, please contact anne blackhurst, minnesota state university, mankato irb administrator. anne blackhurst, irb administrator minnesota state university, mankato college of graduate studies & research 115 alumni foundation mankato, mn 56001 (507)389-2321 anne.blackhurst@mnsu.edu you may preview the survey by scrolling to the text below this message. if, after previewing you believe it should be handled by another member of your library team, please forward this message appropriately. alternatively, you may print the survey, answer it manually and mail it to: systems/ access services survey library services minnesota state university, mankato ml 3097—po box 8419 mankato, mn 56001-8419 (usa) we ask you or your representative to take 5 minutes to answer 14 questions about computer authentication practices in your main library. participation is voluntary, but follow-up reminders will be sent. this e-mail serves as your informed consent for this study. your participation in this study includes the completion of an online survey. your name and identity will not be linked in any way to the research reports. clicking the link to take the survey shows that you understand you are participating in the project and you give consent to our group to use the information you provide. you have the right to refuse to complete the survey and can discontinue it at any time. to take part in the survey, please click the link at the bottom of this e-mail. thank you in advance for your contribution to our project. if you have questions, please direct your inquiries to the contacts given below. thank you for responding to our invitation to participate in the survey. this survey is intended to determine current academic library practices for computer authentication and open access. your participation is greatly appreciated. below are the definitions of terms used within this survey: ■■ “authentication”: a username and password are required to verify the identity and status of the user in order to log on to computer workstations in the library. ■■ “affiliated user”: a library user who is eligible for campus privileges. ■■ “non-affiliated user”: a library user who is not a member of the institutional community (an alumnus may be a nonaffiliated user). this may be used interchangeably with “guest user.” ■■ “guest user”: visitor, walk-in user, nonaffiliated user. ■■ “open access computer”: computer workstation that does not require authentication by user. 140 information technology and libraries | september 2010 appendix b. 
responding institutions 1. university at albany state university of new york 2. university of alabama 3. university of alberta 4. university of arizona 5. arizona state university 6. boston college 7. university of british columbia 8. university at buffalo, state university of ny 9. case western reserve university 10. university of california berkeley 11. university of california, davis 12. university of california, irvine 13. university of chicago 14. university of colorado at boulder 15. university of connecticut 16. columbia university 17. dartmouth college 18. university of delaware 19. university of florida 20. florida state university 21. university of georgia 22. georgia tech 23. university of guelph 24. howard university 25. university of illinois at urbana-champaign 26. indiana university bloomington 27. iowa state university 28. johns hopkins university 29. university of kansas 30. university of louisville 31. louisiana state university 32. mcgill university 33. university of maryland 34. university of massachusetts amherst 35. university of michigan 36. michigan state university 37. university of minnesota 38. university of missouri 39. massachusetts institute of technology 40. national agricultural library 41. university of nebraska-lincoln 42. new york public library 43. northwestern university 44. ohio state university 45. oklahoma state university 46. university of oregon 47. university of pennsylvania 48. university of pittsburgh 49. purdue university 50. rice university 51. smithsonian institution 52. university of southern california 53. southern illinois university carbondale 54. syracuse university 55. temple university 56. university of tennessee 57. texas a&m university 58. texas tech university 59. tulane university 60. university of toronto 61. vanderbilt university 54 information technology and libraries | june 2011 recreation, law enforcement and public safety, and social services available in the community ■■ access to electronic encyclopedias, local libraries’ catalogs, full-text articles online, and document delivery.”2 at the time we were asking the question, will an information infrastructure be built? the answer? most assuredly. indeed, librarians stepped up to the table and ensured that the public had access to information-related services at their local library. the information the public asked for in 1994, as listed above, is widely available today. there are numerous examples in which librarians and libraries have served as leaders in the ongoing sustainablity of local, regional, and national information networks. it was pointed out at the time, and remains true today, that in an era of ever-shrinking resources, libraries cannot and should not compete with telecommunications, entertainment, and computer companies. they need to “join them as equals in the information arena.”3 lita has a viable role in the development of the twentyfirst-century skills that will firmly put the information infrastructure into place. a lita member is appointed as a liaison to the office for information technology policy (oitp) and serves on the lita technology and access committee, which addresses similar issues. the lita transliteracy interest group explores, develops, and promotes the role of libraries in all aspects of literacy. working with the oitp provides lita membership with the opportunity to participate in current issues, such as digital literacy. the information infrastructure has come a long way in the last twenty some years. there is still much to be done. 
robert bocher, technology consultant with the wisconsin state library and oitp fellow, will present “building the future: addressing library broadband connectivity issues in the 21st century” at the lita president’s program from 4 p.m. to 5:30 p.m. on sunday, june 26, at the ala annual conference in new orleans. i look forward to seeing you at the program and to hear about the successes and the work that remains to be done to address the broadband needs we all face in the country. references 1. federal communications commission, the national broadband plan: chapter 2: goals for a high performance america, http://www.broadband.gov/plan/2 -goals-for-a-high-performance-america/ (accessed apr. 2, 2011). 2. karen starr, “the american public, the public library, and the internet; an ever-evolving partnership” in the cybrarian’s manual, ed. pat ensor (chicago: ala, 1997): 23–24. 3. ibid., 31. t wenty years ago, librarians became involved in the implementation of the internet for the use of the public across the country. those initiatives were soon followed by the bill and melinda gates foundation projects supporting public libraries, which included funding hardware grants to implement public computer labs and connectivity grants to support high-speed internet connections. in 2008, the institute of museum and library services (imls) convened a task force to define twentyfirst-century skills for museums and libraries, which became an ongoing national initiative (http://www.imls .gov/about/21stcskills.shtm). the one year anniversary of the release of the national broadband plan was march 16, 2011. as described on broadband.gov, the plan is intended “to create a high-performance america—a more productive, creative, efficient america in which affordable broadband is available everywhere and everyone has the means and skills to use valuable broadband applications.”1 in 1994, the idaho state library’s development division cosponsored eight focus groups in which 179 people participated. the participants were asked several questions, including the types of information they would like to see on the internet. the results reflected the public’s interest at that time in the following: ■■ “expert advice on a variety of topics including medicine, law, car repair, computer technology, animal husbandry, and gardening ■■ economic development, investment, bank rates, consumer product safety, and insurance ■■ community-based information such as events, volunteers, local classified advertisements, special interest groups, housing information, public meetings, transportation schedules, and local employment opportunities ■■ computer training, foreign language programs, homework service, teacher recertification, school activities, school scheduling, and adult education ■■ electronic mail and the ability to transfer files locally as well as worldwide ■■ access to public records, voting records of legislators, absentee voting, the ability to renew a driver’s license, the rules and regulations from governmental agencies, and taxes ■■ information about hunting and fishing, environmental quality, the local weather, road advisories, sports, karen j. starr (karen.j.starr@gmail.com) is lita president 2010-11 and assistant administrator for library and development services, nevada state library and archives, carson city. karen j. 
starr president’s message: 21st century skills, 21st century infrastructure editorial the authors of “the state of rfid applications in libraries,” that appeared in the march 2006 issue, inadvertently included two sentences that are near quotations from a commentary by peter warfield and lee tien in the april 8, 2005 issue of the berkeley daily planet. on page 30 immediately following footnote 24, the authors wrote: “the eugene public library reported ‘collision’ problems on very thin materials and on videos as well as false readings from the rfid security gates. collision problems mean that two or more tags are close enough to cancel the signals, making them undetectable by the rfid checkout and security systems.” warfield and lien wrote: “the eugene (ore.) public library reported ‘collision’ problems on very thin materials and on videos as well as ‘false readings’ from the rfid security gates. (collision problems mean that two or more tags are close enough to ‘cancel the signals,’ according to an american library association publication, making them undetectable by the rfid checkout and security systems.)” (accessed may 16, 2006, www .berkeleydailyplanet.com/article.cfm?archivedate=04-08 -05&storyid=21128). the authors’ research notes indicated that it was a near quotation, but this fact was lost in the writing of the article. the article referee, the copy editors, and i did not question the authors because earlier in the same paragraph they wrote about the eugene public library experience and referred (footnote 23) to an earlier article in the berkeley daily planet. the authors and i apologize for this unfortunate error. **** july 1, 2006 marked the merger of rlg and oclc. by the time this editorial appears, many words will already have been spoken and written about this monumental, twentyfirst century library event. i know what i think the three very important immediate effects of the merger will be. first, it is a giant step toward the realization of a global library bibliographic database. second, taking advantage of rlg’s unique and successful programs and integrating them and their development philosophy as “rlgprograms,” while working alongside oclc research, seems a step so important for the future development of library technology that it cannot be overemphasized. third, and very practically, incorporating redlightgreen into open worldcat will give the library world a product that users might prefer over a search of google books or amazon. i requested and received quotes about the merger from the principals that i might put into this editorial that won’t appear until four months after the may 3 announcement. jay jordan, president and ceo, oclc, remarked: “we have worked cooperatively with rlg on a variety of projects over the years. since we announced our plans to combine, staff from both organizations have been working together to develop plans and strategies to integrate systems, products, and services. over the past several months, staff members have demonstrated great mutual respect, energy, and enthusiasm for the potential of our new relationship and what it means for the organizations we serve. there is much work to be done as we complete this transition. clearly, we are off to a good start.” betsy wilson, chair, oclc board of trustees, and dean of libraries, university of washington, wrote: “the response from our constituencies has been overwhelmingly supportive. 
over the past several months, we have finalized appointments for the twelve-person program council, which reports to . . . oclc through a standing committee called the rlg board committee. we are starting to build agendas for our new alliance. the members of this group from the rlg board are: james neal, vice president for information services and university librarian, columbia university; nancy eaton, dean of university libraries and scholarly communication, penn state university (and former chair of the oclc board); and carol mandel, dean of libraries, new york university. from oclc the members are elisabeth niggeman, director, deutschesbibliothek; jane ryland, senior scientist, internet 2; and betsy wilson, dean of university libraries, university of washington.” and from james michalko, currently president and ceo of rlg, and by the time you read this, vice president, rlg-programs development, oclc: “we are combining the practices of rlg and oclc in a very powerful way— by putting together the traditions of rlg and oclc we are creating a robust new venue for research institutions and new capacity that will provide unique and beneficial outcomes to the whole community.” by now, all lita members and ital readers know that in 1967, fred kilgour founded oclc; and was the founding editor of the journal of library automation (jola—vol. 1, no. 1 was published in march, 1968), which, with but a mild outcry from serials librarians, changed its title to information technology and libraries in 1982. this afternoon (6/15/06), i called fred. he and his wife eleanor reminisced about the earliest days, and then i asked him for his comments on the oclc-rlg merger. because he had had the first words about both oclc and jola, as it were, i told him that i would like for him to have the last. and this is what he said, “at long last!” fred kilgour died on july 31, 2006, aged 92. a tribute posted by alane wilson of oclc may be read at http:// scanblog.blogspot.com/2006/07/frederick-g-kilgour -1914-2006.html editorial: a confession, a speculation, and a farewell john webb john webb (jwebb@wsu.edu) is a librarian emeritus, washington state university and editor of information technology and libraries. 
an algorithm for variable-length proper-name compression james l. dolby: r & d consultants company, los altos, california viable on-line search systems require reasonable capabilities to automatically detect (and hopefully correct) variations between request format and stored format. an important requirement is the solution of the problem of matching proper names, not only because both input specifications and storage specifications are subject to error, but also because various transliteration schemes exist and can provide variant proper name forms in the same data base. this paper reviews several proper name matching schemes and provides an updated version of these schemes which tests out nicely on the proper name equivalence classes of a suburban telephone book.
an appendix lists the corpus of names used for algorithm test. a viable on-line search system cannot reasonably assume that each user will invariably provide the proper input information without error. human beings not only make errors, but also expect their correspondents, be they human or mechanical, to be able to cope with these errors, at least at some reasonable error-rate level. many of the difficulties in implementing computer systems in many areas of human activity stem from failure to recognize, and plan for, routine acceptance of errors in the systems. indeed, computing did not become the widespread activity it is now until the so-called higher-level languages came into being. although it is customary to think of higher-level languages as being "more english-like," the height of their level is better measured by the brevity with which various jobs can be expressed (for brevity tends to reduce errors) and the degree of sophistication of their automatic error detection and correction procedures. the processing of catalog information for the purposes of exposing and retrieving information presents at least two major areas for research in automatic error detection and correction. at the first stage, the data bank must be created, updated and maintained. methods for dealing with input errors at this level have been derived by a number of groups and it seems reasonable to assert that something in the order of 60% of the input errors can be detected automatically (1, 2, 3). with the possibility of human proofreading and error detection through actual use, it is reasonable to expect a mature data base to have a very low over-all error rate. at the second stage, however, when a user approaches the data base through a terminal or other on-line device, the errors will be of a recurring nature. each user will generate his own error set and, though experience will tend to minimize the error rate for a particular user, there will be an essentially irreducible minimum error rate even for an experienced user. if the system is to attract users other than professional interrogators, it must respond intelligently at this minimal error level. this paper explores certain problems associated with making "noisy matches" in catalog searches. because preliminary information indicates that the most likely source of input errors is in the keyboarding of proper names, the main emphasis of the paper is on the problem of algorithmically compressing proper names in such a way as to identify similar names (and likely misspellings) without over-identifying the list of possible authors.
existing name-compression algorithms
the problem of providing equivalence classes of proper names is hardly new. library catalogs, telephone directories and other major data bases have made use of "see-also"-type references for many years. some years ago remington-rand derived an alphanumeric name compression algorithm, soundex, that could be applied either by hand or by machine for such purposes (4). perhaps the most widely used on-line retrieval system presently in existence, the airline reservation system (such as sabre), makes use of such an algorithm (5). the closely related problem of compressing english words (either to establish noisy matches, to eliminate misspelled words, or simply to achieve data bank compression) has also received some attention (6, 7, 8). implementation of such algorithms has been described (9, 10, 11, 12, 13).
although english word structure differs from proper-name structure in some important respects (e.g., the existence of suffixes), three of the algorithms are constructed by giving varying degrees of attention to the following five areas of word structure: 1) the character in word initial position; 2) the character set: (a, e, i, o, u, y, h, w); 3) doubled characters (e.g., tt); 4) transformation of consonants (i.e., all alphabetic characters other than those in 2 above) into equivalence classes; 5) truncation of the residual character string. the word-initial character receives varying attention. soundex places the initial consonant in the initial position of the compressed form and then transforms all other consonants into equivalence classes with numeric titles. sabre maintains the word-initial character even if it is a vowel. in the armour research foundation scheme (arf), the word-initial character is also retained as is. both soundex and sabre eliminate all characters in the set 2) above. the arf scheme retains all characters in shorter words and deletes vowels only, to reduce the compressed form to four characters, deleting the "u" after "q," the second vowel in a vowel string, and then all remaining vowels. all three systems delete the second letter of a double-letter string. sabre goes a step further and deletes the second letter of a double-letter string occurring after the vowels have been deleted. thus, the second "r" of "bearer" would be deleted. soundex maps the eighteen consonants into six equivalence classes: 1) b, f, p, v; 2) c, g, j, k, q, s, x, z; 3) d, t; 4) l; 5) m, n; 6) r. sabre and arf do not perform any transformations on these eighteen consonants. finally, all three systems truncate the remaining string of characters to four characters. for shorter forms, padding in the form of zeros (soundex), blanks (sabre), or hyphens (arf) is added so that all codes are precisely four characters long. variable-length coding schemes have been considered but generally rejected for implementation on major systems because of the attendant difficulties of programming and the fact that code compression is enhanced by fixed-length codes where no interword space is necessary. although fixed-length schemes of length greater than four have been considered, no definitive data appears to be available as to the enhanced ability of compressed codes to discriminate by introduction of more characters. the sabre system does add a fifth character but makes use of the person's first initial for added discrimination. tukey (14) has constructed a personal author code for his citation indexing and permuted title studies on an extensive corpus of the statistical literature. in this situation the author code is a semi-mnemonic code in a tag form to assist the user in identification rather than to be used as a basic entry point. however, tukey does note that in his corpus a three-character code of the surname, plus two initials, is superior to a five-character surname code for purposes of unique identification.
measuring algorithmic performance
one of the main problems in constructing linguistic algorithms is to decide on appropriate measures of performance and to obtain data bases for implementing such measures. in this case it is clear that certain improvements in existing algorithms can be made, particularly by using more sophisticated transformation rules for the consonants, and that
the problems of implementing such changes are not so great in today's context as they were when the systems noted above were originally derived. improvements in processing speeds and programming languages, however, do not remove the need for keeping "linguistic frills" to a minimum. ideally, it would be desirable to have a list of common errors in keyboarding names as a test basis for any proposed algorithms. unfortunately, no such list of sufficient size appears to be available. lacking this, one can speculate that certain formal properties of the predictability of language might be useful in deriving an algorithm. at the english word level, some effort has been made to exploit measures of entropy as developed by shannon in this direction (6, 7). however, there is good reason to question whether entropy, at least when measured in the usual way, is strongly correlated with actually occurring errors (15). as an alternative, one can study existing lists of personal-name equivalence classes to derive such algorithms and then test the algorithm against such classes, measuring both the degree of over-identification and the degree of under-identification. clearly, such tests will carry more weight if they are conducted under economic forcing conditions where weaknesses in the test set will lead to real and measurable expense to the organization publishing the list. the sabre system operates under strong economic forcing conditions in the sense that airline passengers frequently have a number of competitive alternatives available to them and lost reservations can cause sufficient inconvenience for them to consider these alternatives. however, the main application of the sabre system is to rather small groups of persons (at least when compared to the number of personal authors in a typical library catalog), so that errors of over-identification are essentially trivial in cost to the airlines. a readily available source of "see-also"-type equivalence classes of proper names is given in the telephone directory system. here, the economic forcing system is not so strong as in the airline situation, but it is measurable in that failure to provide an adequate list will lead to increased user dependence on the information operator, with consequent increased cost to the telephone company. as a test of the feasibility of using such a set of equivalence classes, the 451 classes found in the palo alto-los altos (california) telephone directory were copied out by hand and used in deriving and testing the algorithm given in the next section and the soundex algorithm. there remains the question of deciding what is to constitute proper agreement between any algorithm and the set of equivalence classes chosen as a data base. at the grossest level it seems reasonable to argue that over-identification is less serious than under-identification. false drops only tend to clog the line. lost reference points, on the other hand, lead to lost information. investigation of other applications of linguistic algorithms, such as algorithms to hyphenate words, identify semantically similar words through cutting off of suffixes, and so forth, indicates that it is usually possible to reduce crucial error (in this case under-identification) to something under 5%, while preserving something in the order of 80% of the original distinctions (or efficiency) of the system.
efforts to improve materially on the "five-and-eighty" rule generally lead to solutions involving larger context and/or extensive exception dictionaries. in this study efforts are directed at achieving a "five-and-eighty" solution.
a variable-length name-compression scheme
in light of the fact that no definitive information is available on the problems of truncating errors in name-compression algorithms, it is convenient to break the problem into two pieces. first is derivation of a variable-length algorithm of the required accuracy and efficiency and then determination of the errors induced by truncation. a study of the set of equivalence classes given in the palo alto-los altos telephone directory made fairly clear that with minor modifications of the basic five steps used in the other algorithms noted above, it would not be too difficult to provide a reasonably accurate match without requiring too much over-identification. the main modifications made consisted of maintaining the position of the first vowel and using local context to make transformations on the consonants. the algorithm is given below. (the rules given must be applied in the order given both with respect to the rules themselves and to the order of the lists within the rules, as the precedence relations are important to the performance of the algorithm.)
a spelling equivalent abbreviation algorithm for personal names
1) transform: "mcg" to "mk", "mag" to "mk", "mac" to "mk", "mc" to "mk".
2) working from the right, recursively delete the second letter from the following letter pairs: "dt", "ld", "nd", "nt", "rc", "rd", "rt", "sc", "sk", "st".
3) transform: "x" to "ks", "ce" to "se", "ci" to "si", "cy" to "sy", "consonant-ch" to "consonant-sh"; all other occurrences of "c" to "k", "z" to "s", "wr" to "r", "dg" to "g", "qu" to "k", "t" to "d", "ph" to "f" (after the first letter).
4) delete all consonants other than "l", "n", and "r" which precede the letter "k" (after the first letter).
5) delete one letter from any doubled consonant.
6) transform "pf#" to "p#", "#pf" to "#f", "vowel-gh#" to "vowel-f#", "consonant-gh" to "consonant-g", and delete all other occurrences of "gh". ("#" is the word-beginning and word-ending marker.)
7) replace the first vowel in the name by the symbol "•".
8) delete all remaining vowels.
9) delete all occurrences of "w" or "h" after the first letter in the word.
the vowels are taken to be (a, e, i, o, u, y). the remaining literal characters are treated as consonants. the algorithm splits 22 (4.9%) of the 451 equivalence classes given by the phone directory. on the other hand, the algorithm provides 349 distinct classes (not counting those classes that were broken off in error) or 77.4% of the 451 classes in the telephone directory data base. thus a reasonable approximation has been achieved to the "five-and-eighty" performance found in other linguistic problem areas. to give a proper appreciation of the nature of these under-identification errors, they are discussed below individually.
1) the name bryer is put in the same equivalence class with a variety of spellings of the name bear. the algorithm fails to make this identification.
2) blagburn is not equated to blackburn.
3) the name davison is equated to davidson in its various forms.
the algorithm fails to make this identification and this appears to be one of a modest class of difficulties that occur prior to the -son, -sen names.
4) the class of names dickinson, dickerson, dickison, and dickenson are all equated by the directory but kept separate, except for the two forms of dickinson, by the algorithm.
5) the name holm is not equated with the name home.
6) the name holmes is not equated with the name homes.
7) the algorithm fails to equate jaeger with two forms of yaeger.
8) the algorithm fails to equate lamb with lamn.
9) the algorithm incorrectly assumes that the final "gh" of leigh should be treated as an "f". treating final "gh" either as a null sound or an "f" leads to about the same number of errors in either direction.
10) the algorithm fails on the pairing of leicester and lester. the difficulty is an intervening vowel.
11) the algorithm fails to equate the various forms of lindsay with the forms of lindsley.
12) the algorithm fails to equate the various forms of mclaughlin with mclachlan.
13) the algorithm fails to equate mccullogh with mccullah. this is again the final "gh" problem.
14) the algorithm fails to equate mccue with mchugh (again the final "gh" problem).
15) the algorithm fails to equate moretton with morton. this is an intervening vowel problem.
16) the algorithm fails to equate rauch with roush.
17) the algorithm fails to equate robinson with robison (another -son type problem).
18) the algorithm incorrectly assumes that the interior "ph" of shepherd is an "f".
19) the algorithm fails to equate speer with speier.
20) the algorithm fails to equate stevens with stephens.
21) the algorithm fails to equate stevenson with stephenson.
22) the algorithm fails to equate the various forms of the word thompson (an -son problem).
in several of the errors noted above it may be questioned whether the telephone directory is following its own procedures with complete rigor. setting these aside, the primary errors occur with the final "gh," the words ending in "son," and the words with the extraneous interior vowels. each of these problems can be resolved to any desired degree of accuracy, but only at the expense of noticeable increases in the degree of complexity of the algorithm.
the truncation problem
simple truncation does not introduce errors of under-identification; it can only lead to further over-identification. examination of the results of applying the algorithm to the telephone directory data base shows that no new over-identification is introduced if the compressed codes are all reduced to the leftmost seven characters. further truncation leads to the following results:
code length: 7, 6, 5, 4
cumulative over-identification losses: 0, 1, 6, 45
thus there is a strong argument for maintaining at least five characters in the compressed code. however, there is no real need for restriction to simple truncation. following the procedures used in the arf system, further truncation can be obtained by selectively removing some of the remaining characters. the natural candidate for such removal is the vowel marker. if the vowel marker is removed from all the five-character codes, only six more over-identification errors are introduced. removal of the vowel markers from all of the codes would have introduced 17 more errors of over-identification. the utility of the vowel marker is in the short codes.
this in turn suggests that introduction of a second vowel marker in the very short codes may have some utility, and this is indeed the case. if the conception of vowel marker is generalized as marking the position of a vowel-string (i.e., a string of consecutive vowels), where for these purposes a vowel is any of the characters (a, e, i, o, u, y, h, w), and these markers are maintained as "padding" in the very short words, 18 errors of over-identification are eliminated at the cost of two new errors of under-identification. in this way the following modification to the variable-length algorithm is derived:
1) mark the position of each of the first two vowel strings with an "•", if there is more than one vowel.
2) truncate to six characters.
3) if the six-character code has two vowel markers, remove the right-hand vowel marker. otherwise, truncate the sixth character.
4) if the resulting five-character code has a vowel marker, remove it. otherwise remove the fifth character.
5) for all codes having less than four characters in the variable-length form, pad to four characters by adding blanks to the right.
measured against the telephone directory data base, this fixed-length compression code provides 361 distinct classes (not counting improper class splits as separate classes) or 80% of the 451 given classes. twenty-four (5.3%) of the classes are improperly split. by way of comparison, the soundex system improperly splits 135 classes (30%) and provides only 287 distinct classes (not counting improperly split classes), or 63.8% of the telephone directory data base.
acknowledgments
this research was carried out for the institute of library research, university of california, under the sponsorship of the office of education, research grant no. oeg-1-7-071083-5068. the author would like to thank ralph m. shoffner and kelley l. cartwright for suggesting the problem and for a number of useful comments on existing systems. allan j. humphrey was kind enough to program the variable-length version of the algorithm for test purposes.
appendix: corpus of names used for algorithm test
a list of personal-name equivalence classes from the palo alto-los altos telephone directory is arranged according to the variable-length compression code (with the vowel marked "•" treated as an "a" for ordering). names whose compressed codes do not match the one given in the first column (and hence represent weaknesses in the algorithm and/or the directory groupings) are given in italics. a small number of directory entries that do not bear on the immediate problem have been deleted from the list: bell's see also bells; co-op see also co-operative; st. see also saint; etc.
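before the corpus listing that follows, it may help to see the scheme in executable form. the following is a rough python sketch, prepared for illustration only, of the nine variable-length rules given above, together with a small scoring helper in the spirit of the "five-and-eighty" measure (split classes as under-identification, distinct codes as retained distinctions). it is not dolby's original program: the function names, the use of "*" in place of the "•" marker, the exact recursion in rule 2, and the simplified "gh"/"pf" handling in rule 6 are assumptions made for this sketch.

```python
# illustrative reconstruction of the variable-length compression rules above;
# "*" stands in for the "•" vowel marker used in the paper.
import re

def compress(name: str) -> str:
    w = re.sub(r"[^a-z]", "", name.lower())
    if not w:
        return w
    # 1) initial "mcg", "mag", "mac", "mc" -> "mk"
    w = re.sub(r"^(mcg|mag|mac|mc)", "mk", w)
    # 2) working from the right, recursively delete the second letter of the
    #    pairs dt, ld, nd, nt, rc, rd, rt, sc, sk, st
    pairs = ("dt", "ld", "nd", "nt", "rc", "rd", "rt", "sc", "sk", "st")
    changed = True
    while changed:
        changed = False
        for i in range(len(w) - 2, -1, -1):
            if w[i:i + 2] in pairs:
                w = w[:i + 1] + w[i + 2:]
                changed = True
                break
    # 3) context-sensitive consonant transformations
    w = w.replace("x", "ks")
    w = re.sub(r"c(?=[eiy])", "s", w)             # ce, ci, cy -> se, si, sy
    w = re.sub(r"(?<=[^aeiouy])ch", "sh", w)      # consonant-ch -> consonant-sh
    w = (w.replace("c", "k").replace("z", "s").replace("wr", "r")
          .replace("dg", "g").replace("qu", "k").replace("ph", "f"))
    w = w[0] + w[1:].replace("t", "d")            # t -> d after the first letter
    # 4) delete consonants other than l, n, r that precede "k" (after 1st letter)
    w = w[0] + re.sub(r"[^aeiouylnr](?=k)", "", w[1:])
    # 5) delete one letter from any doubled consonant
    w = re.sub(r"([^aeiouy])\1", r"\1", w)
    # 6) simplified "pf" / "gh" handling
    w = re.sub(r"pf$", "p", re.sub(r"^pf", "f", w))
    w = re.sub(r"(?<=[aeiouy])gh$", "f", w)
    w = w.replace("gh", "")
    # 7)-8) mark the first vowel with "*" and delete the remaining vowels
    m = re.search(r"[aeiouy]", w)
    if m:
        i = m.start()
        w = w[:i] + "*" + re.sub(r"[aeiouy]", "", w[i + 1:])
    # 9) delete "w" and "h" after the first letter
    return w[0] + re.sub(r"[wh]", "", w[1:])

def score(classes, fn):
    """count split classes (under-identification) and distinct codes retained."""
    split = sum(1 for names in classes if len({fn(n) for n in names}) > 1)
    distinct = len({fn(n) for names in classes for n in names})
    return split, distinct

if __name__ == "__main__":
    groups = [["kelley", "kelly"], ["hoffman", "huffman"],
              ["schmidt", "smith", "smyth"], ["stevens", "stephens"]]
    for g in groups:
        print(g, [compress(n) for n in g])
    print("split classes, distinct codes:", score(groups, compress))
```

running the sketch on a few of the directory groups shows, for example, that kelley/kelly, hoffman/huffman, and schmidt/smith collapse to common codes, while stevens/stephens do not, which is consistent with under-identification error 20 above.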
0 bl abel, abele, abell, able 0 brms abrahams, abrams 0 brmsn abrahamson, abramson •d eddy, eddie 0 dmns edmonds, edmunds 0 dmnsn edmondson, edmundson 0 dms adams, addems 0 gn eagen, egan, eggen 0 gr jaeger, yaeger, yeager °kn aiken, aikin, aitken °kns adkins, akins °kr okr ·ks 0 lbrd ·ln 0 ln 0 lsn 0 lvr •ms 0 ngl 0 nl 0 nrs 0 nrsn •ns 0 rksn 0 rl 0 rn •rns •rs 0 rvn 0 rvng 0 sbrn b•n b•ns b°kmn b0 l b0 l b0 l b0 l b.l b 0 ln b·m b 0 mn b•n b0 nd b·r b0 r b•r b•r b 0 rbr b•rc b 0 rgr b 0 rk b 0 rn algorithm for name compressionjdolby 265 acker, aker eckard, eckardt, eckart, eckert, eckhardt oakes, oaks, ochs albright, allbright elliot, elliott allan, allen, allyn ohlsen, olesen, olsen, olson, olsson oliveira, olivera, olivero ames, eames engel, engle, ingle o'neal, o'neil, o'neill andrews, andrus andersen, anderson, andreasen ennis, enos enrichsen, erickson, ericson, ericsson, eriksen earley, early erwin, irwin aarons, ahrends, ahrens, arens, arentz, arons ayers, ayres ervin, ervine, irvin, irvine erving, irving osborn, osborne, osbourne, osburn beatie, beattie, beatty, beaty, beedie betts, betz bachman, bachmann, backman bailey, baillie, bailly, baily, bayley beal, beale, beall, biehl belew, ballou, bellew buhl, buell belle, bell bolton, boulton baum, bohm, bohme bauman, bowman bain, bane, bayne bennet, bennett baer, bahr, baier, bair, bare, bear, beare, behr, beier, bier, bryer barry, beare, beery, berry bauer, baur, bower bird, burd, byrd barbour, barber berg, bergh, burge berger, burger boerke, birk, bourke, burk, burke burn, byrne 266 journal of library automation vol. 3/4 december, 1970 b 0 rnr b 0 rns b 0 rnsn b0 rs bl°kbrn bl 0 m br 0 d br 0 n br 0 n d 0 ds d°f d 0 gn d°k n•knsn n•ksn n•l n•l n•l d 0 mn n•n n•n n•n n•n n•n d0 nl d.r n•r d 0 rm d 0 vdsn n•vs dr0 sl f• f°fr f 0 gn f 0 l f 0 l f 0 lknr f 0 lps f 0 ngn f 0 nl f0 rl f 0 rr f 0 rr f 0 rs bernard, bernhard, bernhardt, bernhart berns, bims, burns, byrns, byrnes bernstein, bornstein bertsch, birch, burch blackburn, blagburn blom, bloom, bluhm, blum, blume brode, brodie, brody braun, brown, browne brand, brandt, brant diezt, ditz duffie, duffy dougan, dugan, duggan dickey, dicke dickenson, dickerson, dickinson, dickison dickson, dixon, dixson dailey, daily, daley, daly dahl, dahle, dall, doll deahl, deal, diehl diamond, dimond, dymond dean, deane, deen denney, denny donahoo, donahue, donoho, donohoe, donohoo, donohue, dunnahoo downey, downie dunn, dunne donley, donnelley, donnelly daugherty, doherty, dougherty dyar, dyer derham, durham davidsen, davidson, davison davies, davis driscoll, driskell fay, fahay, fahey fifer, pfeffer, pfeiffer fagan, feigan, fegan feil; pfeil feld, feldt, felt faulkner, falconer philips, phillips finnegan, finnigan finlay, finley farrell, ferrell ferrara, ferreira, ferriera foerster, forester, forrester, forster forrest, forest f 0 rs f 0 rs f 0 sr fl 0 n fl 0 ngn fr0 fr0 dmn fr0 drksn fr°k fr0 ns fr0 ns fr 0 s fr0 sr g0 d g0 ds g°f g0 l g0 lmr g0 lr g0 ms g0 nr g 0 nsls g0 nslvs g0 rd c•rn g 0 rn g 0 rnr c•rr g 0 s gr 0 gr.fd gr0 n gr•s h•n h°f h°fmn h0 g h 0 gn h°k h°ksn h 0 l h•l h•l h0 l h 0 ld algorithm for name compressionjdolby 267 faris, farriss, ferris, ferriss first, fuerst, furst fischer, fisher flinn, flynn flanagan, flanigan, flannigan frei, frey, fry, frye freedman, friedman frederickson, frederiksen, fredickson, fredriksson franck, frank france, frantz, franz frances, francis freeze, freese, fries fraser, frasier, frazer, frazier good, goode getz, goetz, goetze goff, gough gold, 
goold, gould gilmer, gilmore, gilmour gallagher, gallaher, galleher gomes, gomez guenther, gunther gonzales, gonzalez consalves, gonzalves garratt, garrett garrity, geraghty, geraty, gerrity gorden, gordohn, gordon gardiner, gardner, gartner garrard, gerard, gerrard, girard gauss, goss gray, grey griffeth, griffith green, greene gros, grose, gross hyde, heidt hoff, hough, huff hoffman, hoffmann, hofman, hofmann, huffman hoag, hoge, hogue hagan, hagen hauch, hauck, hauk, hauke hutcheson, hutchison holley, holly holl, hall halley, haley haile, hale holiday, halliday, holladay, holliday i 268 journal of libra1·y automation vol. 3/4 december, 1970 h 0 lg h 0 lm h 0 lms h 0 ln h0 m h 0 mr h 0 n h 0 n h0 nn h 0 nrks h 0 nrksn h0 ns h0 ns i-jonsn h 0 r h 0 r h 0 r h 0 r h 0 rmn h 0 rmn h 0 rmn h0 rn h 0 rn h 0 rn h 0 rngdn h 0 s h 0 s h 0 s h 0 sn h 0 vr r tfr rfrs tkb rkbsn rks rl rms rmsn rnsn rs ko k°f k°fmn helwig, hellwig holm, home holmes, homes highland, hyland ham, hamm hammar, hammer hanna, hannah hahn, hahne, harm, haun hanan, hannan, hannon hendricks, hendrix, henriques hendrickson, henriksen, henrikson heintz, heinz, heinze, hindes, hinds, hines, hinze haines, haynes henson, hansen, hanson, hanssen, hansson, hanszen herd, heard, hird, hurd hart, hardt, harte, heart hare, hair hardey, hardie, hardy hartman, hardmen, hardman, hartmann herman, hermann, herrmann harman, harmon heron, herrin, herron hardin, harden hom, horne herrington, harrington haas, haase, hasse howes, house, howse hays, hayes houston, huston hoover, hover jew, jue jeffery, jeffrey jefferies, jefferis, jefferys, jeffreys jacobi, jacoby jacobsen, jacobson, jackobsen jacques, jacks, jaques jewell, juhl jaimes, james jameson, jamieson, jamison jahnsen, jansen, jansohn, janssen, jansson, janzen, jensen, jenson joice, joyce kay, kaye coffee, coffey coffman, kauffman, kaufman, kaufmann k°k k0 l k0 l k0 lmn k0 lr k0 mbrln k 0 mbs k0 mp k0 mps k0 n k0 n k0 n k0 n k0 n k0 n k0 n k 0 nl k 0 nr k0 ns k0 p k0 pl k0 r k0 r k0 r k0 r k0 r k 0 rd k0 rln k 0 rn k0 rsnr k0 s k0 s k0 s k0 sl k0 slr k 0 sr kl 0 n kl.,rk kl 0 sn kr 0 kr 0 gr kr.,mr kr 0 n kr 0 s kr 0 s algor·ithm. for name compressionfdolby 269 cook, cooke, koch, koche cole, kohl, koll kelley, kelly coleman, cohnan koehler, koeller, kohler, koller chamberlain, chamberlin combs, coombes, coombs camp, kampe, kampf campos, campus cahn, conn, kahn cahen, cain, caine, cane, kain, kane chin, chinn chaney, cheney coen, cohan, cohen, cohn, cone, koehn, kahn coon, kuhn, kuhne kenney, kenny, kinney conley, conly, connelly, connolly conner, connor coons, koontz, kuhns, kuns, kuntz, kunz coop, co-op, coope, coupe, koop chapel, chapell, chappel, chappell, chappelle, chapple carrie, carey, cary corey, cory carr, kar, karr kurtz, kurz kehr, ker, kerr cartwright, cortright carleton, carlton carney, cerney, kearney kirschner, kirchner chace, chase cass, kass kees, keyes, keys cassel, cassell, castle kesler, kessler, kestler kaiser, kayser, keizer, keyser, kieser, kiser, kizer cline, klein, kleine, kline clark, clarke claussen, clausen, clawson, closson crow, crowe krieger, kroeger, krueger, kruger creamer, cramer, kraemer, kl·amer, kremer craine, crane christie, christy, kristee crouss, kraus, krausch, krause, krouse 270 journal of library automation vol. 
3/4 december, 1970 kr 0 s kr 0 s kr 0 snsn lo lo l 0 d l 0 dl l 0 drmn l°k l°ks l 0 ln l 0 lr l 0 mb l 0 mn l 0 mn l0 n l0 n l0 n l0 n l 0 ng l 0 nn l 0 ns l 0 r l 0 rns l 0 rns l 0 rsn l 0 s l 0 s l 0 sr l0 v l 0 vd l 0 vl l 0 vn m 0 d m 0 dn m0 ds m 0 dsn m°kl m°km m°ks m°ks m 0 ln m 0 ln m 0 lr m 0 lr cross, krost crews, cruz, kruse christensen, christiansen, christianson loe, loewe, low, lowe lea, lee, leigh lloyd, loyd litle, littell, little, lytle ledterman, letterman leach, leech, leitch lucas, lukas laughlin, loughlin lawler, lawlor lamb, lamm lemen, lemmon, lemon layman, lehman, lehmann lind, lynd, lynde lion, lyon lin, linn, lynn, lynne lain, laine, laing, lane, layne lang, lange london, lundin lindsay, lindsey, lindsley, linsley lawry, lowery, lowrey, lowry lawrence, lowrance laurence, lawrance, lawrence, lorence, lorenz larsen, larson lewis, louis, luis, luiz lacey, lacy leicester, lester levey, levi, levy leavett, leavitt, levit lavell, lavelle, leavelle, loveall, lovell lavin, levin, levine mead, meade m oretton, morton mathews, matthews madison, madsen, matson, matteson, mattison, mattson michael, michel meacham, mechem marques, marquez, marquis, marquiss marcks, marks, marx maloney, moloney, molony mullan, mullen, mullin mallery, mallory moeller, moller, mueller, muller m0 lr m 0 ls m 0 n m0 nr m0 nr m0 nsn m 0 r m 0 r m0 r m0 r m0 r m0 rf m0 rl m 0 rn m 0 rs m0 rs mk0 mk0 mk0 mk 0 mk 0 l mk 0 lf mk 0 lm mk 0 n mk 0 nr mk 0 ns mk0 ns mk0 r mk0 r mkd 0 nl mkf 0 rln mkf 0 rsn mkl 0 d mkl 0 kln mkl 0 ln mkl 0 n mkl•n mkl 0 s mkm 0 ln mkn°l mkr•o n°kl n°kls n°kls algorithm for name compressionjdolby 271 millar, miller miles, myles mahan, mann miner, minor monroe, munro monson, munson murray, murrey maher, maier, mayer mohr, moor, moore meyers, myers meier, meyer, mieir, myhre murphey, murphy merrell, merrill marten, martin, martine, martyn meyers, myers maurice, morris, morse mccoy, mccaughey magee, mcgee, mcgehee, mcghie mackey, mackay, mackie, mckay mccue, mchugh magill, mcgill mccollough, mccullah, mccullough mccallum, mccollum, mccolm mckenney, mckinney macintyre, mcentire, mcintire, mcintyre mackenzie, mckenzie maginnis, mcginnis, mcguinness, mcinnes, mcinnis maguire, mcguire mccarthy, mccarty macdonald, mcdonald, mcdonnell macfarland, macfarlane, mcfarland, mcfarlane macpherson, mcpherson macleod, mccloud, mcleod maclachlan, maclachlin, mclachlan, mclaughlin, mcloughlin mcclellan, mcclelland, mclellan mcclain, mcclaine, mclain, mclane maclean, mcclean, mclean mccloskey, mcclosky, mccluskey macmillan, mcmillan, mcmillin macneal, mcneal, mcneil, mcneill magrath, mcgrath nichol, nicholl, nickel, nickle, nicol, nicoll nicholls, nichols, nickels, nickles, nicols nicholas, nicolas 272 journal of library automation vol. 
3/4 d ecember, 1970 n°klsn n°ksn n°l n°lsn n°mn n°rs n°sbd p•n p 0 drsn p•c p 0 lk p0 lsn p•n p•r p•r p0 rk p 0 rks p•rs r•rs p•rs p 0 rsn pr°kr pr 0 ns pr 0 r r• r• r 0 bnsn r•n r•n r 0 d r 0 dr r•ns r 0 gn r•gr r°k r°k r°kr n•l r0 mngtn r0 mr n•ms n•n r0 nr r•s nicholsen, nicholson, nicolaisen, nicolson nickson, nixon neal, neale, neall, neel, neil, neill neilsen, neilson, nelsen, nelson, nielsen, nielson, nilson, nilssen, nilsson neumann, newman norris, nourse nesbit, nesbitt, nisbet pettee, petty peterson, pederson, pedersen, petersen, petterson page, paige polak, pollack, pollak, pollock polson, paulsen, paulson, poulsen, poulsson paine, payn, payne parry, perry parr, paar park, parke parks, parkes pierce, pearce, peirce, piers parish, parrish paris, parris pierson, pearson, pehrson, peirson prichard, pritchard prince, prinz prior, pryor roe, rowe rae, ray, raye, rea, rey, wray robinson, robison rothe, roth rudd, rood, rude reed, read, reade, reid rider, ryder rhoades, rhoads, rhodes regan, ragon, reagan rodgers, rogers richey, ritchey, ritchie reich, reiche reichardt, richert, rickard reilley, reilly, reilli, riley remington, rimington reamer, reimer, riemer, rimmer ramsay, ramsey rhein, rhine, ryan reinhard, reinhardt, reinhart, rhinehart, rinehart reas, reece, rees, reese, reis, reiss, ries r0 s r0 s r0 s r•vs s•br s°fl s•fn s°fns s°fnsn s°fr s°fr s•cl s 0 glr s•k s•ks s•l s•l s•lr s•ls s•lv s•lvr s 0 mkr s 0 mn s 0 mn s•mrs s·ms s•n s 0 n s 0 nr s0 nrs s 0 pr s·r s·r s·r s 0 r s0 r s•rl s 0 rlng s•rmn s0 rn s•rr sos sm 0 d algorithm for name compressionjdolby 273 rauch, rausch, roach, roche, roush rush, rusch russ, rus reaves, reeves seibert, siebert schofield, scofield stefan, steffan, steffen, stephan, stephen steffens, stephens, stevens steffensen, steffenson, stephenson, stevenson schaefer, schaeffer, schafer, schaffer, schafer, shaffer, sheaffer stauffer, stouffer siegal, sigal sigler, ziegler schuck, shuck sachs, sacks, saks, sax, saxe seeley, seely, seley schell, shell schuler, schuller schultz, schultze, schulz, schulze, shults, shultz silva, sylva silveira, silvera, silveria schomaker, schumacher, schumaker, shoemaker, shumaker simon, symon seaman, seemann, semon somers, sommars, sommers, summers simms, sims stein, stine sweeney, sweeny, sweney senter, center sanders, saunders shepard, shephard, shepheard, shepherd, sheppard stahr, star, starr stewart, stuart storey, story saier, sayre schwartz, schwarz, schwarze, swartz schirle, shirley sterling, stirling scheuermann, schurman, sherman stearn, stem scherer, shearer, sharer, sherer, sheerer sousa, souza smith, smyth, smythe 274 journal of library automation vol. 
3/4 december, 1970 sm 0 d sn°dr sn°l sp 0 lng sp 0 r sp 0 r sr 0 dr sr0 dr t0 d t 0 msn t0 rl tr 0 s v·l v·l v·r w•o w 0 dkr w·nl w·nmn w 0 dr w 0 drs w 0 gnr w 0 l w 0 l w 0 l w 0 lbr w 0 lf w 0 lkns w 0 lks w 0 ln w 0 lr w 0 lrs w 0 ls w 0 ls w 0 ls w 0 lsn w 0 n w 0 r w 0 r w 0 rl w 0 rnr w 0 s w·smn schmid, schmidt, schmit, schmitt, smit schneider, schnieder, snaider, snider, snyder schnell, snell spalding, spaulding spear, speer, speirer spears, speers schroder, schroeder, schroeter schrader, shrader tait, tate thomason, thompson, thomsen, thomson, tomson terrel, terrell, terrill tracey, tracy vail, vaile, vale valley, valle vieira, vierra white, wight whitacre, whitaker, whiteaker, whittaker whiteley, whitley whitman, wittman woodard, woodward waters, watters wagener, waggener, wagoner, wagner, wegner, waggoner willey, willi wiley, wylie wahl, wall wilber, wilbur wolf, wolfe, wolff, woolf, woulfe, wulf, wulff wilkens, wilkins wilkes, wilks whalen, whelan walter, walther, wolter walters, walthers, wolters wallace, wallis welch, welsh welles, wells willson, wilson winn, wynn, wynne worth, wirth ware, wear, weir, wier wehrle, wehrlie, werle, worley warner, werner weis, weiss, wiese, wise, wyss weismann, weissman, weseman, wiseman, wismonn, wissman algorithm for name compressionjdolby 275 references 1. cox, n.s.m.; dolby, j. l.: "structured linguistic data and the automatic detection of errors." in advances in computer typesetting (london: institute of printing, 1966), pp. 122-125. 2. cox, n.s.m.; dews, j. d.; dolby, j. l.,: the computer and the library (hamden, conn.: archon press, 1967). 3. dolby, j. l.; forsyth, v. j.; resnikoff, h. l.: computerized library catalogs: their growth, cost and utility (cambridge, massachusetts: the m.i.t. press, 1969) . 4. becker, joseph; hayes, robert m. : information storage and retrieval (new york: wiley, 1963 ), p. 143. 5. davidson, leon: "retrieval of misspelled names in airlines passenger record system," communications of the acm, 5 (1962), 169-171. 6. blair, c. r.: "a program for correcting spelling errors," information & control, 3 ( 1960), 60-67. 7. schwartz, e. s.: an adaptive information transmission system employing minimum redundancy word codes (armour research foundation report, april 1962). (ad 274-135). 8. bourne, c. p.; ford, d.: "a study of methods for systematically abbreviating english words and names," journal of the acm, 8 ( 1961), 538-552. 9. kessler, m. m., "the "on-line" technical information system at m.i.t.", in 1967 ieee international convention record. (new york: institute of electrical and electronic engineers, 1967), pp. 40-43. 10. kilgour, f. g.: "retrieval of single entries from a computerized library catalog file," american society for information science, proceedings, 5 ( 1968), 133-136. 11. nugent, w. r.: "compression word coding techniques for information retrieval," journal of library automation, 1 (december 1968), 250-260. 12. rothrock, h. i.: computer-assisted directory search; a dissertation in electrical engineering. (philadelphia: university of pennsylvania, 1968). 13. ruecking, f. h.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-238. 14. tukey, j. w.: a tagging system for journal articles and other citable items: a status report (princeton, n.j.: statistical techniques research group, princeton university, 1963). 15. resnikoff, h. l.; dolby, j. 
l.: a proposal to construct a linguistic and statistical programming system (los altos, cal.: r & d consultants company, 1967). the next generation integrated library system: a promise fulfilled? yongming wang and trevor a. dawes information technology and libraries | september 2012 76 abstract the adoption of integrated library systems (ils) became prevalent in the 1980s and 1990s as libraries began or continued to automate their processes. these systems enabled library staff to work, in many cases, more efficiently than they had in the past. however, these systems were also restrictive—especially as the nature of the work began to change—largely in response to the growth of electronic and digital resources that they were not designed to manage. new library systems—the second (or next) generation—are needed to effectively manage the processes of acquiring, describing, and making available all library resources. this article examines the state of library systems today and describes the features needed in a next-generation library system. the authors also examine some of the next-generation library systems currently in development that purport to fill the changing needs of libraries. introduction since the late 1980s and early 1990s, the library automation system has gone from inception to rapid implementation to near ubiquitous adoption. but after two decades of changes in information technology, and especially in the last decade, the library has seen itself facing tremendous changes in terms of both the resources and the services it provides. on the resource side, print material and physical items are no longer the dominant collections; electronic resources are fast outpacing physical materials to become the dominant library resources, especially in academic and special libraries. in addition, many other digital format resources, such as digital collections, institutional repositories, and e-books, have taken root. on the service front, library users—accustomed to immediate and instant searching, finding, and accessing information in the google age—demand more and more instant and easy access to library resources and services. but the library automation system, also called the integrated library system (ils), has not changed much for the past two decades. it finds itself uneasily handling the ever-changing library environment and workflow. library staff become ever more frustrated with the ils, noting its inadequacy in dealing with their daily jobs. library users are confused by the many interfaces and complexity of library applications and systems. it is obvious that we are at the tipping point for a dramatic change in the area of library automation systems. the library literature has been referring to these new systems as second-generation library automation systems or next-generation library systems.1 two pillars of the second-generation library automation system are: (1) it will manage library resources in a comprehensive and unified way regardless of resource format and location; and (2) it will break away from the traditional ils models and build on the service-oriented architecture (soa) model. yongming wang (wangyo@tcnj.edu) is systems librarian for the college of new jersey library, ewing township, and trevor dawes (tdawes@princeton.edu) is access services & circulation librarian, princeton university libraries, princeton, new jersey. we are at the beginning of a new era of library automation systems.
some library system vendors have realized the need to change and have started to develop and implement the secondgeneration library automation system. we believe that the concept and implementation of the new library automation system will catch on quickly among the all types of libraries. it will change how the library conducts its business and will benefit both library staff and users. literature review there is not much research literature on the subject to date. after more than a decade of library automation development and implementation, starting in the late 1990s, libraries have been facing the challenges ushered in by rapidly evolving internet and web 2.0 technologies in addition to the growing number of savvy web users. libraries found themselves lagging behind other sources (such as internet search engines) in meeting users’ information needs, and library staff members are generally frustrated by the lack of flexibility of traditional library systems. as early as 2007, marshall breeding pointed out that “as librarians continue to operate with sparse resources, performing ever more services with ever more diverse collections—but with no increases in staff—it’s more important than ever to have automation tools that provide the most effective assistance possible.”2 in his 2009 article, he deliberately says that “dissatisfaction with the current slate of ils products runs high. the areas of concern lie in their inability to manage electronic content and with their user interfaces that do not fare well against contemporary expectations of the web.”3 so what are the trends in libraries for the last decade in terms of library resources, collections, services, and resource discoveries? according to breeding, there are three trends: “1. increased digital collections; 2. changed expectations regarding interfaces; 3. shifted attitudes toward data and software.”4 andrew pace notes that “web-based content, licensed resources, born-digital documents, and institutionally significant digital collections emerged rapidly to overtake the effort required to maintain print collections, especially in academic libraries.”5 another noticeable trend in the library technology field is occurring along with a similar trend in the general information technology field, that is, the open-source software movement. as pace states, “open source software (oss) efforts such as the open archive initiative (oai), dspace, and koha—just to name a few, as an exhaustive list would overwhelm the reader—challenged commercial proprietary systems, not only for market share but often in terms of sophistication and functionality.”6 as for the infrastructure and features of the second-generation library automation system, both breeding and pace have their respective visions. breeding writes that “the next generation of library automation systems needs to be designed to match the workflows of today’s libraries, which manage both digital and print resources.”7 “one of the fundamental assumptions of the next generation library automation would involve a design to accommodate the hybrid physical and digital existence that libraries face today.”8 pace specifically requires that the next-generation library automation system should use the web as a platform to fulfill the notion of software-as-aservice (saas), or further, platform-as-a-service (paas). the technical advantages of such systems would include the ability to “1. develop, test, deploy, host, and maintain on the same integrated environment; 2. 
user experience without compromise; 3. build-in scalability, reliability, and information technology and libraries | september 2012 78 security; 4. build-in integration with web services and databases; 5. support collaboration; 6. deep application instrumentation.”9 also as early as october 2007, computers in libraries invited ellen bahr to survey a number of library technology experts regarding what features and functionality they want to see built into ilss soon. the experts included roy tennant, kristin antelman, ross singer, andrew pace, john blyberg, stephen abram, and h. frank cervone. they identified the following key functionality for future ilss: • direct, read-only access to data, preferably through an open source database management system like mysql. • a standard way to communicate with the ils, preferably through an application programming interface. • standards-compliant systems including better security and more complete documentation. • the ability to run the ils on hardware that the library selects and on servers that the library administers. • greater interoperability of systems, pertaining to the systems within the library (including components from vendors, open source communities, and homegrown systems) and beyond (enterprise-level systems such as courseware and university portals, and shared library systems such as oclc). • greater distinction between the ils (which needs to efficiently manage a library’s business processes) and the opac (which needs to be a sophisticated finding tool). • better user interfaces, making use of the most current technologies available and providing a single interface to all of the library’s holdings, regardless of format.10 four aspects of next-generation ils there are four distinguishing characteristics of the next-generation ils we believe are critical. they are comprehensive library resources management; a system based on service-oriented architecture; the ability to meet the challenge of new library workflow; and a next-generation discovery layer. comprehensive library resources management comprehensive library resources management requires that next-generation ilss should be able to manage all library materials regardless of format or location. current ilss are built around the traditional library practice of print collections and services designed around these collections, but the last ten to fifteen years have seen great shifts in both library collections and services. print and physical materials are no longer the dominant resources. actually, in many libraries, especially in academic and research libraries, the building of electronic and digital collections have taken a larger role in library collection development. the traditional ils has not been able to handle ever-growing electronic and digital resources—either in terms of their acquisition or management. therefore a variety of either commercial or open-source the next generation library system: a promise fulfilled? | wang and dawes 79 electronic resources management systems (erm systems) have been developed over the years to address this management gap, but two problems exist: first, most erm systems, whether commercial or open-source, have not been able to truly integrate the acquisition process into the acquisitions workflow of the current ils systems, causing a messy and redundant workflow for the library staff. in libraries where an erm is deployed, staff generally track workflows in both the erm and the ils. 
if the library’s workflows have not been revised, miscommunication between the traditional acquisitions staff and the electronic resources staff can cause confusion, delay, and may even lead to disruption of services to library patrons. second, erm systems, by design, don’t take current library workflows into account. while it is true that these resources may need to be processed differently, library staff generally are used to traditional processes and want systems that function in familiar ways. many libraries, particularly academic libraries, still have relatively large serials departments responsible for the management of print journals. some have only recently begun to develop the personnel and the skills required to manage the influx of electronic and digital resources. because of these problems with existing erm systems, it is important that the next-generation ilss fully integrate the key features of erm systems, enabling the library to streamline and efficiently manage resources and staff. full integration of e-resource management would not only include acquisitions functionality but also the ability to manage licenses—a critical component of e-resource management—and the ability to manage the various packages, databases, and vendors. describing and providing access to e-resources are two aspects of the e-resources management process. these two features of the erm system should also be integrated with the description and metadata management component of the next-generation ils. centrally managing the metadata of e-resources enables easier discovery of resources by library users and has the advantage of shifting some of the management workflow to the metadata (or cataloging) staff. system based on service-oriented architecture next-generation ilss should be designed based on service-oriented architecture (soa). what is soa? a service-oriented architecture (soa) is an architecture for building business applications as a set of loosely coupled distributed components linked together to deliver a well-defined level of service. these services communicate with each other, and the communication involves data exchange or service coordination. soa is based on web services. broadly, soa can be classified into two aspects: services and connections, described below. services: a service is a function or some processing logic or business processing that is welldefined, self-contained, and does not depend on the context or state of other services. an example of a service is loan processing services, which can be a self-contained unit for processing loan applications. another example is weather services, used to get weather information. any application on the network can use the services of the weather service to get the weather information for a local area or region. in the library field, an example of a well-defined service is a check-in or check-out service. information technology and libraries | september 2012 80 connections: connections are the links connecting these self-contained distributed services with each other. they enable client-to-services communication. in case of web services, simple object access protocol (soap) is frequently used to communicate between services. there are many benefits of soa in the next-generation ils. these include the ability to be platform independent, therefore allowing libraries to use the software and hardware of their choice. there is no threat of being locked in to a single vendor, as many libraries are now with their current ilss. 
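to make the services-and-connections idea concrete, the sketch below exposes a single self-contained check-out operation as a small web service: any client that can send an http request and read the json reply can use it, whatever platform it runs on. the listing is hypothetical; the endpoint name, the barcode and patron_id fields, and the in-memory item table are invented for the example, and it exchanges json over http rather than soap simply to keep the code short.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# toy in-memory state standing in for the circulation database
ITEMS = {"30123456789012": {"title": "walden", "checked_out_to": None}}

class CheckoutService(BaseHTTPRequestHandler):
    # one well-defined, self-contained operation: check an item out to a patron
    def do_POST(self):
        if self.path != "/checkout":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        item = ITEMS.get(request.get("barcode"))
        if item is None or item["checked_out_to"]:
            reply, status = {"ok": False, "error": "item unavailable"}, 409
        else:
            item["checked_out_to"] = request.get("patron_id")
            reply, status = {"ok": True, "due": "2012-10-15"}, 200  # toy due date
        body = json.dumps(reply).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), CheckoutService).serve_forever()
```

a check-in service, a fines service, and so on would be separate, equally self-contained components; that loose coupling is what lets a vendor or a library replace one piece without rebuilding the rest.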
soa also enables incremental development, deployment, and maintenance. the vendors can use the existing software (investment) and use soa to build applications without replacing existing applications. as breeding described, the potential of web services (soa) for libraries includes • real-time interaction between library-automation systems and business systems of a library’s parent organization; • real-time interaction between library-automation systems and library suppliers or other business partners; • blending of library services into campus or municipal portal environments; • insertion of library services and content into courseware-management systems or other learning environments; • blending of content from external sources into library interfaces; and • delivery of library services and content to library users through nontraditional channels. 11 meet the challenge of the new library workflow the library systems in use today are, in general, aging—most were developed at least ten to fifteen years ago. they have been updated with software patches and new releases, but they still demand that staff work in the manner in which the systems were originally designed. although changes in our library operations have been realized in many organizations, these systems have not been able to adequately adapt to how library staff now want to—or need to—operate. the inability to keep pace with the move from largely print to increasingly electronic resources in our libraries is one of the reasons our existing systems fail. copeland et al. present a stunning visual of the typical workflow involved in acquiring and making available an electronic resource in the print-based library management system.12 their graphic depicts five possible starting points, nine decision points, and close to twenty steps involved in the process. this process may not be typical, but it is illustrative of the complex nature of our new workflows that simply cannot be accommodated by existing ilss. as early as 1997, the sirsi corporation recognized the need to modify systems; they introduced workflows, which is designed to streamline library operations.13 workflows, which introduced a graphical user interface to the sirsi unicorn system, was intended to allow staff a certain amount of flexibility and customization, depending on the tasks they typically perform. the new systems that are being developed and deployed today promise even more flexibility and propose to enable staff to work more efficiently irrespective of the format of the material being processed. but these systems will require staff to think about workflows in entirely different ways. not only will the method used to perform tasks be different (now web-based, hosted services as the next generation library system: a promise fulfilled? | wang and dawes 81 opposed to client-server-based tools) but the functionality has been enhanced to be more efficient. we cannot say how these new systems will be welcomed or resisted by staff. nor can we say how much staff savings will be realized because these systems are still too new and have not yet been implemented on a wide enough scale for a thorough assessment. but they are at least starting to address the issue. on the one hand, they will open a new window for further study and exploration of how to shape the next-generation ilss to suit the new library workflow. on the other hand, the library will benefit by changing some of their out-of-date practices and workflows around the new system. 
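breeding's list above can also be read from the client side: a campus portal or courseware plugin that wants to blend in library services only needs to post a message to the relevant service, as sketched below. the endpoint url, field names, and patron identifier are invented for the illustration (the service being called could be the check-out sketch shown earlier).

```python
import json
from urllib.request import Request, urlopen

# hypothetical endpoint exposed by the library's automation system
SERVICE = "https://library.example.edu/api/checkout"

def portal_checkout(barcode, patron_id):
    """what a campus portal or courseware plugin might call in order to
    reuse the library's check-out service instead of re-implementing it."""
    payload = json.dumps({"barcode": barcode, "patron_id": patron_id}).encode()
    request = Request(SERVICE, data=payload,
                      headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.loads(response.read())

print(portal_checkout("30123456789012", "P-0042"))
```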
next-generation discovery layer current library opacs, like the ilss themselves, are more than ten years old and generally have shown no improvement in search capability, navigability, or discovery. meanwhile, search technology has radically improved in the past decade. frustrations with the opacs’ limitations on the part of both librarians and library users eventually motivated many libraries to seek alternatives. libraries want to take advantage of the advances in search and discovery technology by implementing “nextgen” opacs or library discovery services. given the vast range of resources available in libraries—local print holdings, specialized databases, and commercial databases to name only a few—libraries want a service that would make as many of them as discoverable as possible. the ideal system would have a unified search interface with a single search box, but with relevance ranking, faceted search, social tagging of records, persistent links to records, rss feeds for searches, and the ability to easily save searches or export selected records to standard bibliographic management software programs. the ideal system would also integrate with the library’s opac, overlaying its current interface with a more nimble and navigable interface that still allows real-time circulation status and provides as much support as possible for foreign language fonts. it would also be as customizable as possible. numerous options for discovery currently exist, and these include summon from serials solutions, primo from ex libris, worldcat local from oclc, ebsco discovery service, and encore from innovative interfaces. as these services are not the focus of this article, they will not be discussed in detail, but the next-generation ilss should have the ability to integrate seamlessly with these discovery services. analysis of two examples 1. alma development in early 2009, ex libris (owner of aleph and voyager) began discussions with several institutions (boston college, princeton university, and katholieke universiteit leuven; purdue university joined later) to develop what they then termed the unified resource management system (urm). the urm was to replace the existing ilss and the subsequent add-ons that provided functionality not inherently available, such as the electronic resources management (erm) tools. the “backend” operations would also be de-coupled from the user interface as described elsewhere in this paper. information technology and libraries | september 2012 82 through a series of in-person and online meetings with the development partners, ex libris staff developed the conceptual framework and functional requirements for the urm (later named alma) and began development of the product. alma was delivered to the partners in a series of releases, each with more functionality, and the feedback was used to enhance or further develop the product. alma uses the concept of a shared metadata repository (the metadata management system) to which libraries would contribute, through which records would be shared, and from which records would be downloaded and edited with local information. selection and acquisitions functions would be integrated not only within alma, but within the discovery layer to allow patrons, as well as staff, the ability to suggest items for addition to the library’s holdings. 
with “smart fulfillment,” the workflows for delivering materials to patrons will also be seamless.14 one of the major changes planned for alma is the ability to manage the types of resources that cannot be effectively managed in current ilss—specifically electronic and digital resources. these resources are currently managed with the use of add-on products that interact with varying degrees of success with the ilss. this lack of integration has been a source of frustration for library staff, particularly as library electronic and digital collections continue to steadily grow. the development partners have presented extensively at various conferences about the development process and have been mostly positive about the product. dawes and lute described princeton university’s participation in a presentation at the 2011 acrl conference in philadelphia.15 at princeton, an executive committee was created to oversee that partner’s process. other staff members were then involved in testing each of the partner releases as the functionality increased and was made available to them. the princeton university team then provided feedback to ex libris via regular telephone calls, after which they would see changes based on their feedback, or a status update from ex libris about the particular issue reported. the staff members at princeton believe that their participation in the development of alma has given them an opportunity to closely examine their workflows to see where efficiencies can be made. 2. kuali ole project in 2008 a group of nine libraries formed the open library environment (ole) project, later called kuali ole. kuali is a community of higher education institutions that came together to build enterprise-level and open-source applications for the higher education community. these systems include some core applications such as kuali financial system, kuali people management, and other campus-wide applications. the kuali ole is its most recent endeavor. the purpose of the kuali ole project is to build an enterprise-level, open-source, and next-generation ils. the goal of kuali ole, taken from its website (http://kuali.org/ole), is to “develop the first system designed by and for academic and research libraries for managing and delivering intellectual information.” there are six principal objectives of the project: • to be built, owned, governed by the academic and research library community • to supports the wide range of resources and formats of scholarly information • to interoperate and integrate with other enterprise and network-based systems the next generation library system: a promise fulfilled? | wang and dawes 83 • to support federation across projects, partners, consortia, and institutions • to provide workflow design and management capabilities • to provides information management capabilities to nonlibrary efforts the funding is provided by a contribution from the andrew w. mellon foundation and the nine partner institutions. kuali ole will be built based on the soa model, on top of the kuali middleware application, kuali rice, the core component of the kuali suite of applications. kuali rice “provides an enterprise class middleware suite of integrated products that allows for applications to be built in an agile fashion. this enables developers to react to end user business requirements in an efficient and productive manner, so that they can produce high quality business applications.”16 version 1.0 of kuali ole is scheduled to be released to the public in december 2012. 
a stepping and testing version (0.3) was released in november 2011, which covers some core acquisitions features such as “select” and “acquire” processes. we believe that the kuali ole software will not only provide an alternative solution of the ils for academic and research libraries, but will change the way the library conducts its business, and will also have implications for staffing. these changes will result from the comprehensive management of library materials and resources, and the system’s interoperability with other college-level enterprise applications. conclusion after about two decades of library automation system history, both libraries and vendors have begun to realize that a revolutionary change is needed in designing and developing the nextgeneration ils. the system, built on the model of soa, should enable the library to comprehensively and effectively manage all library resources and collections, should accommodate a more flexible library workflow, and should enable the library to provide better services to library users. it is encouraging to see that, in both the commercial and open-source arenas, concrete steps are being taken to develop these systems that will manage all library resources. alma and kuali ole are but two of the next-generation ilss in development. in 2011, serials solutions announced their intent to develop a system using the same principles as described. so have innovative interfaces and oclc, the latter of which has already released an early version of their product to some institutions. since these products are still in development and implementation is not yet widespread, their success in meeting the needs of the library community is still to be seen. references 1. marshall breeding, “next generation library automation: its impact on the serials community,” the serials librarian 56, no. 1–4 (2009): 55–64. 2. marshall breeding, “it’s time to break the mold of the original ils,” computers in libraries 27, no. 10 (2007): 39–41. 3. breeding, “next generation library automation information technology and libraries | september 2012 84 4. breeding, “it’s time to break the mold of the original ils.” 5. andrew pace, “21st century library systems,” journal of library administration 49, no. 6 (2009): 641–50. 6. ibid. 7. breeding, “it’s time to break the mold of the original ils.” 8. breeding, “next generation library automation.” 9. dave mitchell, “defining platform-as-a-service, or paas,” bungee connect developer network, 2008, http://bungeeconnect.wordpress.com/2008/02/18/defining-platform-as-a-service-orpaas (accessed jan. 28, 2012). 10. ellen bahr, “dreaming of a better ils,” computers in libraries 27, no. 9 (2007): 10–14. 11. marshall breeding, “web services and service oriented architecture,” library technology reports 42, no. 3 (2006): 3–42. 12. jessie l. copeland et al., “workflow challenges: does technology dictate workflow?” serials librarian 56, no. 1–4 (2009): 266–70. 13. “sirsi introduces workflows to streamline library operations,” information today 14, no. 7 (1997): 52. 14. ex libris, “ex libris alma: the next generation library services framework,” 2011, www.exlibrisgroup.com/category/almaoverview (accessed jan. 3, 2012). 15. acrl virtual conference, “princeton university discusses ex libris alma,” 2011, www.learningtimes.net/acrl/2011/906 (accessed jan. 3, 2012). 16. kuali rice website, http://www.kuali.org/rice (accessed sept. 10, 2012). 
http://bungeeconnect.wordpress.com/2008/02/18/defining-platform-as-a-service-or-paas http://bungeeconnect.wordpress.com/2008/02/18/defining-platform-as-a-service-or-paas http://www.exlibrisgroup.com/category/almaoverview http://www.kuali.org/rice a file storage service on a cloud computing environment for digital libraries victor jesús sosa-sosa and emigdio m. hernandez-ramirez information technology and libraries | december 2012 34 abstract the growing need for digital libraries to manage large amounts of data requires storage infrastructure that libraries can deploy quickly and economically. cloud computing is a new model that allows the provision of information technology (it) resources on demand, lowering management complexity. this paper introduces a file-storage service that is implemented on a private/hybrid cloud-computing environment and is based on open-source software. the authors evaluated performance and resource consumption using several levels of data availability and fault tolerance. this service can be taken as a reference guide for it staff wanting to build a modest cloud storage infrastructure. introduction the information technology (it) revolution has led to the digitization of every kind of information.1 digital libraries are appearing as one more step toward easy access to information spread throughout a variety of media. the digital storage of data facilitates information retrieval, allowing a new wave of services and web applications that take advantage of the huge amount of data available.2 the challenges of preserving and sharing data stored on digital media are significant compared to the print world, in which data “stored” on paper can still be read centuries or millennia later. in contrast, only ten years ago, floppy disks were a major storage medium for digital data, but now the vast majority of computers no longer support this type of device. in today’s environment, selecting a good data repository is important to ensure that data are preserved and accessible. likewise, defining the storage requirements for digital libraries has become a big challenge. in this context, it staff—those responsible for predicting what storage resources will be needed in the medium term—often face the following scenarios: • prediction of storage requirements turn out to be below real needs, resulting in resource deficits. • prediction of storage requirements turn out to be above real needs, resulting in expenditure and administration overhead for resources that end up not being used. in these situations, considering only an efficient strategy to store documents is not enough.3 the acquisition of storage services that implement an elastic concept (i.e., storage capacity that can be victor jesús sosa-sosa (vjsosa@tamps.cinvestav.mx) is professor and researcher at the information technology laboratory at cinvestav, campus tamaulipas, mexico. emigdio m. hernandez-ramirez (emhr1983@gmail.com) is software developer, svam international, ciudad victoria, mexico. information technology and libraries | december 2012 35 increased or reduced on demand, with a cost of acquisition and management relatively low) becomes attractive. cloud computing is a current trend that considers the internet as a platform providing on-demand computing and software as a service to anyone, anywhere, and at any time. 
digital libraries naturally should be connected to cloud computing to obtain mutual benefits and enhance both perspectives.4 in this model, storage resources are provisioned on demand and are paid according to consumption. services deployment in a cloud-computing environment can be implemented three ways: private, public, or hybrid. in the private option, infrastructure is operated solely for a single organization; most of the time, it requires an initial strong investment because the organization must purchase a large amount of storage resources and pay for the administration costs. the public cloud is the most traditional version of cloud computing. in this model, infrastructure belongs to an external organization where costs are a function of the resources used. these costs include administration. finally, the hybrid model contains a mixture of private and public. a cloud-computing environment is mainly supported by technologies such as virtualization and service-oriented architectures. a cloud environment provides omnipresence and facilitates deployment of file-storage services. it means that users can access their files via the internet from anywhere and without requiring the installation of a special application. the user only needs a web browser. data availability, scalability, elastic service, and pay-per-use are attractive characteristics found in the cloud service model. virtualization plays an important role in cloud computing. with this technology, it is possible to have facilities such as multiple execution environments, sandboxing, server consolidation, use of multiple operating systems, and software migration, among others. besides virtualization technologies, emerging tools that allow the creation of cloud-computing environments also support this type of computing model, providing dynamic instantiation and release of virtual machines and software migration. currently, it is possible to find several examples of public cloud storage, such as amazon s3 (http://aws.amazon.com/en/s3), rackspace (http://www.rackspace.com/cloud/public/files), and google storage (https://developers.google.com/storage), each of which provide high availability, fault tolerance, and services and administration at low cost. for organizations that do not want to use a third-party environment to store their data, private cloud services may offer a better option, although the cost is higher. in this case, a hybrid cloud model could be an affordable solution. organizations or individual users, can store sensitive or frequently used information in the private infrastructure and less sensitive data in the public cloud. the development of a prototype of a file-storage service implemented on a private and hybrid cloud environment using mainly free and open-source software (foss) helped us to analyze the behavior of different replication techniques. we paid special attention to the cost of the system implementation, system efficiency, resource consumption, and different levels of data privacy and availability that can be achieved by each type of system. http://aws.amazon.com/en/s3 http://www.rackspace.com/cloud/public/files https://developers.google.com/storage a file storage service on a cloud computing environment for digital libraries | sosa-sosa 36 infrastructure description the aim of this prototyping project was to design and implement scalable and elastic distributed storage architecture in a cloud-computing environment using free, well-known, open-source tools. 
this architecture represents a feasible option that digital libraries can adopt to solve financial and technical challenges when building a cloud-computing environment. the architecture combines private and public clouds by creating a hybrid cloud environment. for this purpose, we evaluated tools such as kvm and xen, which are useful for creating virtual machines (vm).5 open nebula (http://opennebula.org), eucalyptus (http://www.eucalyptus.com), and openstack (http://www.openstack.org) are good, free options for managing a cloud environment. we selected open nebula for this prototype. commodity hard drives have a relatively high failure rate, hence our main motivation to evaluate different replication mechanisms, providing several levels of data availability and fault tolerance. figure 1(a) shows the core components of our storage architecture (the private cloud), and figure 1(b) shows a distributed storage web application named distributed storage on the cloud (disoc), used as a proof of concept. the private cloud also has an interface to access a public cloud, thus creating a hybrid environment. figure 1. main components of the cloud storage architecture the core components and modules of the architecture are the following: • virtual machine (vm). we evaluated different open-source were evaluated, such as kvm and xen, for the creation of virtual machines.6 some performance tests were done, and kvm showed a slightly higher performance than xen. we selected kvm as the main virtual machine manager (vmm) for the proposed architecture. vmms also are called http://opennebula.org/ http://www.eucalyptus.com/ http://www.openstack.org/ information technology and libraries | december 2012 37 hypervisors. each vm has a linux operating system that is optimized to work in virtual environments and requires a minimum consumption of disk space. the vm also includes an apache web server, a php module, and some basic tools that were used to build the disoc web application. every vm is able to transparently access a pool of disks through a special data access module, which we called dam. more details about dam follow. • virtual machine manager module (vmmm). this has the function of dynamic instantiation and de-instantiation of virtual machines depending on the current load on the infrastructure. • data access module (dam). all of the virtual disk space required by every vm was obtained through the data access module interface (dam-i). dam-i allows vms to access disk space by calling dam, which provides transparent access to the different disks that are part of the storage infrastructure. dam allocates and retrieves files stored throughout multiple file servers. • load balancer module (lbm). this distributes the load among different vms instantiated on the physical servers that make up the private cloud. • load manager (lm). this monitors the load that can occur in the private cloud. • distributed storage on the cloud (disoc). this is a web-based file-storage system that is used as a proof of concept and was implemented based on the proposed architecture. replication techniques high availability is one of the important features offered in a storage service deployed in the cloud. the use of replication techniques has been the most useful proposal to achieve this feature. dam is the component that provides different levels of data availability. it currently includes the following replication policies: no-replication, total-replication, mirroring, and ida-based replication. • no-replication. 
this replication policy represents the data availability method with the lowest level of fault tolerance. in this method, only the original version of a file is stored in the disk pool. it follows a round-robin allocation policy whereby load assignment is made based on a circularly linked list, taking into account disk availability. this policy prevents all files from being allocated to the same server, providing minimal fault tolerance in case of a server failure.
• mirroring. this replication technique is a simple way to ensure higher availability without high resource consumption. in this replication, every time a file is stored on a disk, the dam creates a copy and places it on a different disk.
• total-replication. this represents the highest data availability approach. in this technique, a copy of the file is stored on all of the file servers available. total-replication also requires the highest consumption of resources.
• ida-based replication. to provide higher data availability with less impact on the consumption of resources, an alternative approach based on information-dispersal techniques can be used. the information dispersal algorithm (ida) is an example of this strategy.7 when a file (of size |f|) is required to be stored using the ida, the file is partitioned into n fragments of size |f|/m, where m < n; any m of the n fragments are sufficient to reconstruct the original file.
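the three simpler policies are straightforward to picture in code. the fragment below is only an illustrative sketch of the round-robin, mirroring, and total-replication behavior described above; the class and method names (diskpool, store) are invented rather than taken from the disoc implementation, and ida-based dispersal is left out because it requires an erasure-coding step that would not fit in a few lines.

```python
from itertools import cycle

class DiskPool:
    """Illustrative sketch only; names do not come from the DiSOC code."""

    def __init__(self, disks):
        self.disks = {d: {} for d in disks}   # disk name -> {file name: bytes}
        self.available = set(disks)           # disks currently reachable
        self._ring = cycle(disks)             # circular list for round-robin

    def _next_disk(self, exclude=()):
        # walk the ring until an available disk outside `exclude` turns up
        for _ in range(len(self.disks)):
            d = next(self._ring)
            if d in self.available and d not in exclude:
                return d
        raise RuntimeError("no disk available")

    def store(self, name, data, policy="no-replication"):
        first = self._next_disk()
        self.disks[first][name] = data
        if policy == "mirroring":              # one extra copy on a different disk
            mirror = self._next_disk(exclude={first})
            self.disks[mirror][name] = data
        elif policy == "total-replication":    # a copy on every available disk
            for d in self.available:
                self.disks[d][name] = data
        return first

pool = DiskPool(["disk1", "disk2", "disk3"])
pool.store("thesis.pdf", b"%PDF-1.4 ...", policy="mirroring")
```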
fig. 3. output of cobol language program using marc ii data. marc ii and cobol / avram and droz 271
table 8. manpower expenditure
activity                     man weeks
analysis and programming     1
debugging and checkout       2
total                        3
since the processing time of a print program is usually a function of the speed of the printer, no accurate internal processing times were recorded. however, there was no noticeable time difference between this program and other marc print programs written at the library of congress in assembly language. communication format processing the aforementioned techniques are equally adaptable for use with the marc ii communications format (3) with the following changes in format conventions: 1) the communication format has a 24-character leader rather than 92 characters of fixed length items in the processing format. in the program, under the "working-storage section", the group item labelled "fixed-marc" would have to be redefined to conform with the 24-character leader. the cobol statements that are noted with " 0 0 " would require a change of their value from "92" to "24". 2) the communication format has no total count of entries in the record directory. a calculation would have to be made to arrive at the total count and that figure stored in a new hold area labelled "directory-count". the base address of the data in the communication format is not relative to the first position of the record as defined in the processing format, but to the first position of the first variable field. this base address is carried in the record leader, and is available for the calculation required for the directory entry count ((base address - 24) / 12). in the program, after the record directory had been searched and the proper entry placed in the work area, the "move-data" sub-routine would move the appropriate field to the work area for processing with the one alteration noted below with an asterisk.
move-data.
    move zeros to tsub.
    move spaces to hold-data.
    move d-address to dsub.
*   add base-address to dsub.
    perform move-a d-length times.
move-a.
    add 1 to dsub.
    add 1 to tsub.
    move marc-byte (dsub) to d-hold (tsub).
programming techniques naturally are dependent on the processing required and the format characteristics at the individual institution. if the marc ii communications format were to be manipulated in the form 272 journal of library automation vol.
1/ 4 december, 1968 in which it is received (each byte equal to a character with a 24-character leader followed by 12-character directory entries) an alternate approach to that suggested above could be to work in the record area and not move data to a work area. conclusion the only marc ii data available to users up to the writing of this article (october 1968) has been the marc ii test tape released by the library of congress in august 1968. therefore, it is probable that most people expressing doubts about the use of cobol with marc records have done so without the experience of actually using the language. we now have this experience at the libary of congress. cobol was successfully used for the computer processing of marc records. the complexity of the record did not detract from ease in programming. although the programs written were for a report function, the data accessing modules of cobol nevertheless can be used for many other functions. file maintenance and retrieval algorithms could be defined and programmed in cobol with facility equal to that in programming the subject function. references 1. griffin, hillis: "automation of technical processes in libraries," in annual review of information science and technology, edited by carlos a. cuadra (chicago: encyclopaedia britannica) 3 (1968), 241-262. 2. u. s. library of congress, information systems office: subscriber's guide to the marc distribution service (washington, d. c.: library of congress, 1968). 3. avram, henriette d.; knapp, john f.; rather, lucia j.: the marc ii format: a communications format for bibliographic data (washington, d . c.; library of congress, 1968 ), pp. 1, 2, 10. microsoft word december_ital_kiscaden_final.docx creating  a  current  awareness  service   using  yahoo!  pipes  and  libguides             elizabeth  kiscaden     information  technology  and  libraries  |  december  2014         51   abstract   migration  from  print  to  electronic  journals  brought  an  end  to  traditional  current  awareness  services,   which  primarily  used  print  routing.  the  emergence  of  real  simple  syndication,  or  rss  feeds,  and  email   alerting  systems  provided  users  with  alternative  services.  to  assist  users  with  adopting  these   technologies,  a  service  utilizing  aggregate  feeds  to  the  library’s  electronic  journal  content  was  created   and  made  available  through  libguides.  libraries  can  reestablish  current  awareness  services  using   existing  technologies  to  increase  awareness  and  usage  of  library-­‐provided  electronic  journal  content.   the  current  awareness  service  presented  is  an  example  of  how  libraries  can  build  basic  current   awareness  services  utilizing  freely  accessible  technologies.     current  awareness  services   library  current  awareness  services,  commonly  referred  to  as  “table  of  contents”  services,  historically   involved  the  dissemination  of  information  in  the  form  of  print  journals  or  photocopied  journal   contents  routed  to  library  users  subscribed  to  the  service.1,2  these  services  have  been  particularly   popular  among  corporate,  law,  and  hospital  libraries,  which  routinely  route  serials  to  primarily   internal  clients.  while  these  paper-­‐based  services  are  still  offered  at  some  libraries,  most  shifted  to   an  electronic  model  of  service  with  the  migration  to  electronic  journals.   
as  libraries  adopted  electronic  journals,  many  paper-­‐based  current  awareness  services  transitioned   to  an  electronic  table  of  contents  service  utilizing  email  alerts  or  referred  users  to  rss  feeds  made   available  by  publishers  and  database  vendors.3  a  common  challenge  to  a  library-­‐managed  electronic   table  of  contents  service  is  the  complexity  of  managing  alerts  for  hundreds  of  electronic  journals  for   multiple  patrons.  more  often,  libraries  make  individual  users  responsible  for  subscribing  to  email   alerts  or  rss  feeds  on  their  own,  effectively  transferring  the  responsibility  of  subscribing  to,  filtering,   and  managing  incoming  information  to  the  user.   a  drawback  to  this  migration  is  that  library  users  often  don’t  possess  a  clear  understanding  of  what   tools  are  available  to  create  their  own  service.4  formerly,  journals  may  have  arrived  on  a  user’s  desk   for  perusal,  yet  now  users  are  required  to  seek  out  information  independently.  additionally,  despite   the  number  of  discovery  tools  available,  library  users  are  often  unaware  of  journals  available  in  an   electronic  format  through  their  library.5  information  management  tools  have  become  necessary  in   our  current  information  environment;  with  the  abundance  of       elizabeth  kiscaden  (elizabeth-­‐kiscaden@uiowa.edu),  former  library  director  at  waldorf  college,  is   head,  library  services,  hardin  library  for  the  health  sciences,  university  of  iowa,  iowa  city.       creating  a  current  awareness  service  using  yahoo!  pipes  and  libguides  |  kiscaden   52   information  available,  keeping  up-­‐to-­‐date  with  new  information  in  a  discipline  can  be  overwhelming.   therein  exists  an  opportunity  for  libraries—academic,  special,  and  public—to  revitalize  current   awareness  services  and  build  information  management  tools  using  aggregate  feeds.     design  and  description  of  the  service   at  waldorf  college,  the  luise  v.  hanson  library  created  a  current  awareness  service  utilizing  rss   feeds,  with  the  intent  to  assist  faculty  with  keeping  up-­‐to-­‐date  with  newly  published  content  in  the   library’s  electronic  journal  collection.  the  service,  dubbed  info  sos,  was  designed  to  overcome  two   barriers  to  patron  participation  in  feed  services:  the  chore  of  subscribing  to  and  curating  multiple   feeds  and  the  lack  of  awareness  of  feeds  and  feed  reader  technology.  info  sos  was  piloted  to  faculty   during  the  spring  of  2014  and  was  accompanied  by  an  informal  questionnaire  to  collect  feedback.   info  sos  is  built  on  rss,  or  “really  simple  syndication”  technology,  one  of  the  most  prevalent  tools  for   keeping  current  with  new  information  published  electronically.  rss  has  been  available  for  more  than   a  decade,6  and  many  users—both  patrons  and  library  professionals—are  using  this  technology.   however,  while  powerful  and  freely  accessible,  rss  feeds  have  their  limitations.  subscribing  to  and   curating  multiple  feeds  can  become  a  burden.     to  eliminate  the  chore  of  managing  multiple  feeds,  info  sos  displays  feed  aggregates  created  using   yahoo!  pipes  http://pipes.yahoo.com/pipes/).  
aggregate  feeds,  or  feeds  comprising  multiple  rss   feeds,  can  be  created  using  many  tools  available  freely  online,  such  as  feed  stitch,  feed  informer,   feedburner,  and  more.  yahoo!  pipes  was  chosen  for  this  service  primarily  because  it  requires  limited   coding  knowledge,7  yet  the  software  provides  a  number  of  advanced  functions  for  sorting  and   combining  large  groups  of  feeds.  these  advanced  features  became  essential  when  building  aggregate   feeds  for  content  from  journal  aggregators.     yahoo!  pipes  requires  a  user  account  (free  of  charge)  before  constructing  pipes.  the  software   combines  and  sorts  information  using  a  visual  editor  that  resembles  virtual  plumbing,  which  is   presumably  why  the  software  is  called  pipes.  to  construct  the  aggregate  feeds  composing  info  sos,   librarians  used  the  fetch  feed  operator  to  combine  individual  rss  feeds  into  a  single  feed.  once   combined,  the  service  uses  the  sort  operator,  which  sorts  the  aggregated  content  by  date.  from  the   sort  operator,  the  content  is  connected  to  the  pipe  output,  from  which  a  single  rss  feed  is  generated.     the  strength  of  yahoo!  pipes  lies  in  the  advanced  tools  available  for  manipulating  feed  content.  for   example,  pipes  sorts  feed  content  from  database  vendors  by  the  date  it  is  published  to  the  feed,  not   the  publication  date  of  the  article.  if  desired,  aggregate  feed  creators  can  use  the  rename  and  regex   operators  to  remove  the  article  publication  date  from  the  description  field  and  use  it  to  sort  the  feed   content.  another  useful  tool  is  the  union  operator,  which  allows  creators  to  string  together  larger   bundles  of  feeds.       information  technology  and  libraries  |  december  2014   53     figure  1.  fetch  feed  and  sort  operator  in  yahoo!  pipes       figure  2.  image  of  yahoo!  pipe  using  advanced  tools     creating  a  current  awareness  service  using  yahoo!  pipes  and  libguides  |  kiscaden   54   lack  of  awareness  is  a  barrier  to  user  adoption  of  rss  feeds;  many  users  have  an  unclear   understanding  of  what  a  rss  feed  is.  if  unfamiliar  with  rss  feeds,  it  is  safe  to  assume  that  users  are   unfamiliar  with  rss  reader  technology  as  well.  at  waldorf  college,  this  was  confirmed  by  the   questionnaire  distributed  during  the  pilot  of  this  service.  of  the  twenty-­‐eight  faculty  respondents,   more  than  70  percent  had  never  used  an  rss  feed  before  using  info  sos.  it  is  safe  to  assume  that   these  faculty  would  not  have  a  subscription  to  a  feed  reader.   recognizing  the  need  for  an  interface  to  deliver  content,  librarians  used  the  libguides  software  to   display  content  from  these  aggregate  feeds.  the  software  contains  a  tool  for  adding  feed  content,  and   allows  for  the  application  of  an  institution’s  proxy  prefix  to  the  url,  creating  seamless  access  on  and   off  campus.  the  info  sos  resource  contains  tabbed  pages  designated  for  individual  fields  (biology,   psychology,  library  sciences,  etc.)  displaying  aggregate  feeds  for  journals  in  each  subject  area.  
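for readers who want to see the aggregation step spelled out rather than drawn in the pipes editor, the fragment below does in python what the fetch feed, union, and sort operators do: fetch several feeds, pool their entries, and order them newest first by date. it illustrates the behavior only, not how yahoo! pipes works internally; the feed urls are invented, and the sketch assumes the third-party feedparser package.

```python
import time
import feedparser  # third-party: pip install feedparser

# hypothetical journal feed urls; in practice these come from the database vendor
FEED_URLS = [
    "https://example.org/journals/physics-a/rss",
    "https://example.org/journals/physics-b/rss",
]

def aggregate(urls, limit=25):
    """emulate fetch feed + union + sort: pull every feed, pool the entries,
    and order them newest first by publication date."""
    entries = []
    for url in urls:
        entries.extend(feedparser.parse(url).entries)
    entries.sort(
        key=lambda e: e.get("published_parsed") or time.gmtime(0),
        reverse=True,
    )
    return entries[:limit]

for entry in aggregate(FEED_URLS):
    print(entry.get("published", "undated"), "-", entry.get("title", "untitled"))
```

the fallback in the sort key is where the limitation noted above shows up: some vendor feeds carry the article's real publication date only in the description field, so sorting on published_parsed reflects when the item hit the feed rather than when the article was published.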
for   example,  the  physics  page  contains  aggregate  feeds  for  new  articles  published  in  the  library’s  full-­‐ text  physics  journals,  as  displayed  in  the  figure  below.       figure  3.  aggregate  physics  feeds  in  libguides   user  feedback   info  sos  remains  a  relatively  new  service  to  library  users  at  the  luise  v.  hanson  library,  but   preliminary  feedback  has  been  positive.  the  service  was  advertised  to  faculty  via  email  and     information  technology  and  libraries  |  december  2014   55   accompanied  by  a  feedback  survey  created  using  google  forms.  as  stated  previously,  librarians   received  twenty-­‐eight  responses  to  the  survey,  a  relatively  strong  response  considering  the  limited   number  of  faculty  at  the  college.   of  the  respondents,  more  than  70  percent  had  never  used  an  rss  feed  previously,  instead  using  a   variety  of  other  tools  to  stay  current  with  their  field.  of  those  other  tools,  18  percent  of  faculty   subscribed  to  table  of  contents  alerts,  27  percent  browsed  new  issues  of  print  journals,  25  percent   visited  association  websites,  and  23  percent  conducted  periodic  searches  for  information  in  the   library  databases.  it  was  of  some  concern  that  of  these  tools,  only  faculty  using  databases  and   subscribing  to  table  of  contents  alerts  would  be  connecting  with  the  library’s  electronic  journal   collection.   when  presented  with  info  sos  and  asked  whether  faculty  would  find  this  tool  useful,  more  than  70   percent  responded  that  they  would.  faculty  were  solicited  for  suggestions  for  improving  the   resource,  and  librarians  received  many  suggestions  for  expanding  the  content.  this  feedback  was   valuable  in  that  it  provided  justification  for  continuing  the  service  beyond  the  pilot  and  a  list  of   potential  subject  areas  to  begin  expanding  the  service.  the  intended  outcome  of  the  service  is  to   assist  faculty  in  keeping  current  with  literature  in  their  field  and  utilizing  the  library’s  resources  in   the  process.   limitations  and  challenges   generating  feeds  from  popular  library  databases,  such  as  ebscohost  and  proquest,  is  limited  in  that   the  publication  dates  for  articles  are  contained  in  the  description  field.  this  can  make  the  sort   operator  in  yahoo!  pipes  somewhat  inaccurate  because  it  would  be  sorting  by  the  date  they  were   published  to  the  feed,  not  by  actual  publication  date  of  the  journal  article.  if  necessary,  this  issue  can   be  corrected  using  the  rename  and  regex  operators  by  copying  the  item  description  as  the   publication  date.     an  additional  challenge  regarding  vendor-­‐created  feeds  relates  to  the  issue  of  expiring  feeds  created   from  library  databases.  a  library  profile  was  required  for  each  database,  such  as  ebscohost  or   proquest,  to  create  and  save  feeds.  this  allows  for  the  renewal  of  expiring  feeds;  the  email  account   attached  to  the  profile  receives  an  invitation  to  renew  expiring  feeds.  most  vendors  allow  for  feeds  to   be  created  at  the  database  without  a  profile,  but  those  feeds  will  automatically  expire  if  not  used   within  a  period  of  time.  
the  potential  of  feeds  expiring  may  add  an  element  of  maintenance  to  the   current  awareness  service.       future  developments   yahoo!  pipes  offers  the  unique  ability  to  publish  pipes  that  others  may  share  and  “clone.”  for   libraries  interested  in  creating  aggregate  feeds  for  popular  ebscohost  journals,  the  pipes  created  for   info  sos  are  available  to  clone  at  http://pipes.yahoo.com/infosos.  a  search  of  published  pipes   available  in  yahoo!  pipes  reveals  pipes  created  by  many  public  and  academic  libraries,  all  of  which   are  available  to  clone  and  edit.  the  ability  to  share  pipes  with  other  institutions  introduces  the   possibility  of  current  awareness  services  shared  between  library  consortia  or  associations.     creating  a  current  awareness  service  using  yahoo!  pipes  and  libguides  |  kiscaden   56   as  information  becomes  more  abundant,  tools  and  services  to  manage  incoming  information  will   continue  to  be  a  corresponding  need.  creating  and  sharing  services  that  utilize  technology  common   to  libraries  presents  us  with  the  opportunity  to  collaborate  with  one  another  and  revitalize  library-­‐ engineered  current  awareness  services.  these  services  offer  a  value  that  is  twofold:  library  users   benefit  from  the  ability  to  stay  current  with  publications  in  their  field,  and  libraries  have  the   potential  of  increased  usage  of  their  purchased  content.  with  no  financial  investment,  an  aggregate   feed-­‐based  service  is  a  value  that  a  variety  of  libraries  can  implement  with  the  investment  of  only   limited  personnel  time.   references     1.    g.  mahesh  and  dinesh  kumar  gupta,  “changing  paradigm  in  journals  based  current  awareness   services  in  libraries,”  information  services  &  use  28,  no.  1  (2008):  59–65,   http://dx.doi.org/10.3233/isu-­‐2008-­‐0555.       2.    stephen  m.  johnson,  andrew  osmond,  and  rebecca  j.  holz,  “developing  a  current  awareness   service  using  really  simple  syndication  (rss),”  journal  of  the  medical  library  association  97,  no.  1   (2009):  51–53,  http://dx.doi.org/10:3163/1536-­‐5050.97.1.011.   3.    mahesh  and  gupta,  “changing  paradigm  in  journals  based  current  awareness  services  in   libraries.”   4.    m.  kathleen  kern  and  cuiying  mu,  “the  impact  of  new  technologies  on  current  awareness  tools   in  academic  libraries,”  reference  &  user  services  quarterly  51,  no.  2  (2011):  92–97.     5.    sandra  j.  weingart  and  janet  a.  anderson,  “when  questions  are  answers:  using  a  survey  to   achieve  faculty  awareness  of  the  library’s  electronic  resources,”  college  &  research  libraries  61,   no.  2  (2000):  127–34,  http://dx.doi.org/10.5860/crl.61.2.127.   6.    jim  doree,  “rss:  a  brief  introduction,”  journal  of  manual  &  manipulative  therapy  15,  no.  1  (2007):   57–58.     7.    bill  dyszel,  “create  no-­‐code  mashups  with  yahoo!  pipes,”  pc  magazine  26,  no.  21/22  (2007):  103– 5.   
editorial board thoughts: libraries as makerspace? tod colegrove information technology and libraries | march 2013 2 recently there has been tremendous interest in "makerspace" and its potential in libraries: from middle school and public libraries to academic and special libraries, the topic seems very much top of mind. a number of libraries across the country have been actively expanding makerspace within the physical library and exploring its impact; as head of one such library, i can report that reactions to the associated changes have been quite polarized. those from the supported membership of the library have been uniformly positive, with new and established users as well as principal donors immediately recognizing and embracing its potential to enhance learning and catalyze innovation; interestingly, the minority of individuals who recoil at the idea have been either long-term librarians or library staff members. i suspect the polarization may be more a function of confusion over what makerspace actually is. this piece offers a brief overview of the landscape of makerspace—a glimpse into how its practice can dramatically enhance traditional library offerings, revitalizing the library as a center of learning. been happening for thousands of years . . . dale dougherty, founder of make magazine and maker faire, at the "maker monday" event of the 2013 american library association midwinter meeting framed the question simply, "whether making belongs in libraries or whether libraries can contribute to making." more than one audience member may have been surprised when he continued, "it's already been happening for hundreds of years—maybe thousands."1 the o'reilly/darpa makerspace playbook describes the overall goals and concept of makerspace (emphasis added): "by helping schools and communities everywhere establish makerspaces, we expect to build your makerspace users' literacy in design, science, technology, engineering, art, and math. . . . we see making as a gateway to deeper engagement in science and engineering but also art and design. makerspaces share some aspects of the shop class, home economics class, the art studio and science lab. in effect, a makerspace is a physical mashup of these different places that allows projects to integrate these different kinds of skills."2 building users' literacies across multiple domains and a gateway to deeper engagement? surely these are core values of the library; one might even suspect that to some degree libraries have long been makerspace. patrick "tod" colegrove (pcolegrove@unr.edu), a lita member, is head of the delamare science & engineering library at the university of nevada, reno, nevada. a familiar example of maker activity in libraries might include digital media: still/video photography and audio mastering and remixing. youmedia network, funded by the macarthur
foundation through the institute of museum and library services, is a recent example of such effort aimed at creating transformative spaces; engaged in exploring, expressing, and creating with digital media, youth are encouraged to "hang out, mess around, and geek out." a more pedestrian example is found in the support of users with first-time learning or refreshing of computer programming skills. as recently as the 1980s, the singular option the library had was to maintain a collection of print texts. through the 1990s and into the early 2000s, that support improved dramatically as publishers distributed code examples and ancillary documents on accompanying cd or dvd media, saving the reader the effort of manually typing in code examples. the associated collections grew rapidly, even as the overhead of maintaining and weeding a collection that became obsolete ever more quickly grew along with them. today, e-book versions combined with the ready availability of computer workstations within the library, and the rapidly growing availability of web-based tutorials and support communities, render a potent combination that customers of the library can use to quickly acquire the ability to create or "make" custom applications. with the migration of the supporting print collections online, the library can contemplate further support in the physical spaces opened up. open working areas and whiteboard walls can further amplify the collaborative nature of such making; the library might even consider adding popular hardware development platforms to its collection of lendable technology, enabling those interested to check out a development kit rather than purchase one on their own. after all, in a very real sense that is what libraries do—and have done, for thousands of years: buy sometimes expensive technology tailored to the needs and interests of the local community and make it available on a shared basis. makerspace: a continuum along with outreach opportunities, the exploration of how such examples can be extended to encompass more of the interests supported by the library is the essence of the maker movement in libraries. makerspace encompasses a continuum of activity that includes "co-working," "hackerspace," and "fab lab"; the common thread running through each is a focus on making rather than merely consuming. it is important to note that although the terms are often incorrectly used as if they were synonymous, in practice they are very different: for example, a fab lab is about fabrication. fully realized, it is a workshop designed around personal manufacture of physical items—typically equipped with computer-controlled equipment such as laser cutters, multiple-axis computer numerical controlled (cnc) milling machines, and 3d printers. in contrast, a "hackerspace" is more focused on computers and technology, attracting computer programmers and web designers, although interests begin to overlap significantly with the fab lab for those interested in robotics. co-working space is a natural evolution for participants of the hackerspace: a shared working environment offering much of the social and collaborative benefit of the informal hackerspace while maintaining a focus on work. as opposed to the hobbyist who might be attracted to a hackerspace, co-working space attracts independent contractors and professionals who may work from home.
information technology and libraries | march 2013 4 it is important to note that it is entirely possible for a single makerspace to house all three subtypes and be part hackerspace, fab lab, and co-working space. can it be a library at the same time? to some extent, these activities are likely already ongoing within your library, albeit informally; by recognizing and embracing the passions driving those participating in the activity, the library can become central to the greater community of practice. serving the community’s needs more directly, opportunities for outreach will multiply even as it enables the library to develop a laser-sharp focus on the needs of that community. depending on constraints and the community of support, the library may also be well-served by forming collaborative ties with other local makerspace; having local partners can dramatically improve the options available to the library in day-to-day practice, and better inform the library as it takes well-chosen incremental steps. with hackerspace/co-working/fab lab resources aligned with the traditional resources of the library, engagement with one can lead naturally to the other in an explosion of innovation and creativity. renaissance in addition to supporting the work of the solitary reader, “today's libraries are incubators, collaboratories, the modern equivalent of the seventeenth-century coffeehouse: part information market, part knowledge warehouse, with some workshop thrown in for good measure.”3 consider some of the transformative synergies that are already being realized in libraries experimenting with makerspace across the country: • a child reading about robots able to go hands-on with robotics toolkits, even borrowing the kit for an extended period of time along with the book that piqued the interest; surely such access enables the child to develop a powerful sense of agency from early childhood, including a perception of self as being productive and much more than a consumer. • students or researchers trying to understand or make sense of a chemical model or novel protein strand able not only to visualize and manipulate the subject on a two-dimensional screen, but to relatively quickly print a real-world model to be able and tangibly explore the subject from all angles. • individuals synthesizing knowledge across disciplinary boundaries able to interact with members of communities of practice in a non-threatening environment; learning, developing, and testing ideas—developing rapid prototypes in software or physical media, with a librarian at the ready to assist with resources and dispense advice regarding intellectual property opportunities or concerns. the american libraries association estimates that as of this printing there are approximately 121,169 libraries of all kinds in the united states today; if even a small percentage recognize and begin to realize the full impact that makerspace in the library can have, the future looks bright indeed. editorial board thoughts: libraries as makerspace? | colegrove 5 references 1. dale dougherty, “the new stacks: the maker movement comes to libraries” (presentation at the midwinter meeting of the american library association, seattle, washington, january 28, 2013). http://alamw13.ala.org/node/10004. 2. michele hlubinka et al., makerspace playbook, december 2012, accessed february 13, 2012, http://makerspace.com/playbook. 3. 
alex soojung-kim pang, "if libraries did not exist, it would be necessary to invent them," contemplative computing, february 6, 2012, http://www.contemplativecomputing.org/2012/02/if-libraries-did-not-exist-it-would-be-necessary-to-invent-them.html. president's column it is with great pleasure that i bring you greetings as your lita president. it is an honor to be the lita president and to follow the very productive term of tom wilson. as you know, this column is an opportunity for the president to increase communication with the membership. as lita president, i plan to concentrate my efforts on continuing to capitalize on the association's many strengths. last year at lita's 2004 town meeting at the ala midwinter meeting in san diego, we built on the planning efforts of tom wilson. to assist in the development of goals to support lita's developing vision statement, i gathered additional information for the next planning phase. i asked the town meeting's more than eighty attendees to consider these three questions. 1. what do you like about lita, its organizational structure, and its programs? 2. what services or products does lita currently offer that you value? 3. what new services or products would increase lita's value to you? the attendees gathered at small tables and discussed the three questions. after the discussions, each table shared their answers. here's what they thought: 1. what do you like about lita, its organizational structure, and its programs? • easy to become involved • inviting • great networking • forward thinking • lack of bureaucracy • flexibility • enthusiasm of members • encourages discussion • openness to new members • open structure 2. what services or products does lita currently offer that you value? • lita-l • regional institutes • top tech trends • ter • national forum • ital • lita publications • interest groups • programming • networking with knowledgeable people 3. what new services or products would increase lita's value to you? • coordination with other divisions • liaison at state levels • mentoring (partnering with nmrt) • leadership in advising libraries • best practices and competencies • more standards involvement • more access for those not attending conference • webcasts in one-to-two-hour sessions • partnerships with vendors • more diversity in the organization • conference reports • use more leading-edge technology • more content on the web site • more focus on technical issues • blogging • online newsletter • announcement service • mechanism for sharing information • another electronic discussion list for tech how-to's • tech reviews and recommendations • online community support • rss feed on the web site • stronger voice on technology and policy • semiannual e-mail messages to all lita members as you can tell from these comments, lita provides many valuable services to our members, and to the association at large. however, there are many opportunities for us to do more. to accomplish the goal of expanding our services and setting priorities, we have continued our emphasis on strategic planning activities. along with mary taylor, lita's executive director, i attended several ala ahead planning sessions last fall.
our participation in these meetings reinforced our commitment to the planning processes we used to draft lita’s goals and strategies for the next five years. this draft plan is scheduled for review by the lita board during the 2005 ala midwinter meeting. additionally, lita’s strategic plan will be discussed at the 2005 lita town meeting, also during midwinter meeting. we look forward to finalizing our strategic plan by the 2005 ala annual conference. i believe this plan will help us achieve our goals and be used to gauge our successes. in addition to our strategic planning process, lita has made great strides in a number of significant areas. the lita web advisory task force, chaired by zoe stewart marshall, has been working to “establish policies governing the lita web site’s content, responsibilities for its management, and an approval process for posting content 2 information technology and libraries | march 2005 president’s column colby mariva riggs colby mariva riggs is lita president and project coordinator, library systems, university of california–irvine. (continued on page 31) online.” they have implemented several process improvements already and will complete their work by the 2005 ala annual conference. this past fall, michelle frisque, lita web manager, conducted a survey of our members about the lita web site. michelle and the web coordinating committee are already working on a new look and feel for the lita web site based on the survey comments, and the result promises to be phenomenal. on top of all of the current activities, new vision statement, strategic planning, and the lita web site redesign, mary taylor and the lita board worked with a graphic designer to develop a new lita logo. after much deliberation, the new logo debuted at the 2004 lita national forum with great enthusiasm. many members commented that the new logo expresses the “energy” of lita and felt the change was terrific. with your help, lita had a very successful conference in orlando. although there were weather and transportation difficulties, the lita programs and discussions were of the highest quality, as always. the program and preconference offerings for the upcoming annual conference in chicago promise to be as strong as ever. don’t forget, lita also offers regional institutes throughout the year. check the lita web site to see if there’s a regional institute scheduled in your area. lita held another successful national forum in fall 2004 in st. louis, “ten years of connectivity: libraries, the world wide web, and the next decade.” the threeday educational event included excellent preconferences, general sessions, and more than thirty concurrent sessions. i want to thank the wonderful 2004 lita national forum planning committee, chaired by diane bisom, the presenters, and the lita office staff who all made this event a great experience. the next lita national forum will be held at the san jose marriott, san jose, california, september 29–october 2, 2005. the theme will be “the ubiquitous web: personalization, portability, and online collaboration.” thomas dowling, chair, and the 2005 lita national forum planning committee are preparing another “must attend” event. next year marks lita’s fortieth anniversary. 2006 will be a year for lita to celebrate our history, future, and our many accomplishments. we are fortunate to have lynne lysiak leading the fortieth anniversary task force activities. i know we all will enjoy the festivities. 
i look forward to working with many of you as we continue to make lita a wonderful and vibrant association. i encourage you to send me your comments and suggestions to further the goals, services, and activities of lita. information technology and libraries | september 2009 success factors and strategic planning: rebuilding an academic library digitization program cory lampert and jason vaughan cory lampert (cory.lampert@unlv.edu) is digitization projects librarian and jason vaughan (jason.vaughan@unlv.edu) is director, library technologies, university of nevada las vegas. this paper discusses a dual approach of case study and research survey to investigate the complex factors in sustaining academic library digitization programs. the case study involves the background of the university of nevada, las vegas (unlv) libraries' digitization program and elaborates on the authors' efforts to gain staff support for this program. a related survey was administered to all association of research libraries (arl) members, seeking to collect baseline data on their digital collections, to understand their respective administrative frameworks, and to gather feedback on both the obstacles and the positive inputs affecting their success. results from the survey, combined with the authors' local experience, point to several potential success factors, including staff skill sets, funding, and strategic planning. establishing a successful digitization program is a dialog and process already undertaken or currently underway at many academic libraries.
in 2002, according to an institute of museum and library services report, “thirty-four percent of academic libraries reported digitization activities within the past 12 months.” nineteen percent expect to be involved in digitization work in the next twelve months, and forty-four percent beyond twelve months.1 more current statistics from a subsequent study in 2004 reflected that digitization work has both continued and expanded, with half of all academic libraries performing digitization activities.2 fifty-five percent of arl libraries responded to a survey informing part of the 2006 association of research libraries (arl) study managing digitization activities; of these, 97 percent of the respondents indicated engagement in digitization.3 the 2008 ithaka study key stakeholders in the digital transformation in higher education found that nearly 80 percent of large academic libraries either already have or plan to have digital repositories.4 with digitization becoming the norm in many institutions, the time is right to consider what factors contribute to the success and rapid growth of some library digitization programs while other institutions find digitization challenging to sustain. the evolution of digitization at the unlv libraries is doubtless a journey many institutions have undertaken. over the past couple of years, those responsible for such a program at the unlv libraries have had the opportunity to revitalize the program and help collaboratively address some key philosophical questions that had not been systematically asked before, let alone answered. associated with this was a concerted focus to engage other less involved staff. one goal was to help educate them on academic digitization programs. another goal was to provide an opportunity for input on key questions related to the programs’ strategic direction. as a subsequent action, the authors conducted a survey of other academic libraries to better understand what factors have contributed to their programs’ own success as well as challenges that have proven problematic. many questions asked of our library staff in the planning and reorganization process were asked in the survey of other academic libraries. while the unlv libraries have undertaken what is felt are the proper structural steps and have begun to author policies and procedures geared toward an efficient operation, the authors wanted to better understand the experiences, key players, and underlying philosophies of other institutional libraries as theses pertain to their own digitization program. the following article provides a brief context relating the background of the unlv libraries’ digitization program and elaborates on the authors’ efforts toward educating library colleagues and gaining staff buy-in for unlv’s digitization program—a process that countless other institutions have no doubt experienced, led, or suffered. the administered survey to arl members dealt with many topics similar to those that arose during the authors’ initial planning and later conversations with library staff, and as such, survey questions and responses are integrated in the following discussion. the authors administered a 26-question survey to the 123 members of the arl. the focus of this survey was different from the previously mentioned arl study managing digitization activities, though several of the questions overlapped to some degree. 
in addition to demographic or concrete factual types of questions, the unlv libraries digitization survey had several questions focused on perceptions—that is, staff support, administrative support, challenges, and benefits. areas of overlap with the earlier arl survey are mentioned in the appropriate context. though unlv isn't a member of the arl, we consider ourselves a research library, and, regardless, it was a convenient way to provide some structure to the survey. survey responses were collected for a forty-five-day period from mid-june to late july, 2008. by visiting each arl library's website, the authors identified the individuals who appeared to be the "leaders" of the arl digitization programs, with instructions to forward the message to a colleague if they themselves had been incorrectly identified. this was very tricky, and revealed numerous program structures in place, differences between institutions in promoting their collections, and so on. the authors didn't necessarily start with the presumption that all arl libraries even have a digitization program, but most (though not all) either seemed to have a formally organized digitization program with staffing, or at least had digitized and made available something, even if only a single collection. we e-mailed a survey announcement and a link to the survey to the targeted individuals, with a follow-up reminder a month later. responses were anonymous, and respondents were allowed to skip questions; thus the number of responses for the twenty-six questions making up the survey ranged from a low of thirty (24.4 percent) to a high of forty-four (35.8 percent). the average number of responses for each of the questions was 39.8, yielding an overall response rate of 32.4 percent. questions were of three types: multiple choice (select one answer), multiple choice (mark all that apply), and open text. in addition, some of the multiple choice questions allowed additional open text comments. survey responses appear in appendix a. ■ context of the unlv libraries' digitization program "digital collection," for the purpose of the unlv library digitization survey, was defined as a collection of library or archival materials converted to machine-readable format to provide electronic access or for preservation purposes; typically, digital collections are library-created digital copies of original materials presented online and organized to be easily searched. they may offer features such as full text search, browsing, zooming and panning, side-by-side comparison of objects, and export for presentation and reuse. one question the survey asked was "what year do you feel your library published its first 'major' digital collection?" responses ranged from 1990 to 2007; the general average of all responses was 2001. the earlier arl study found 2000 as the year most respondents began digitization activities.5 mirroring this chronology, the unlv libraries have been active in designing digital projects and digitizing materials from library collections since the late 1990s.
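the response-rate figures reported above reduce to simple arithmetic against the 123-member arl population. the short python sketch below reproduces them; the counts are taken directly from the text, and the sketch itself is illustrative only—it was not part of the survey instrument.

ARL_MEMBERS = 123        # survey population: all arl member libraries
LOW_RESPONSES = 30       # fewest responses received for any single question
HIGH_RESPONSES = 44      # most responses received for any single question
AVG_RESPONSES = 39.8     # mean responses across the twenty-six questions

def response_rate(count, population=ARL_MEMBERS):
    """return a response count as a percentage of the survey population."""
    return 100.0 * count / population

print(f"lowest-answered question:  {response_rate(LOW_RESPONSES):.1f}%")   # 24.4%
print(f"highest-answered question: {response_rate(HIGH_RESPONSES):.1f}%")  # 35.8%
print(f"overall response rate:     {response_rate(AVG_RESPONSES):.1f}%")   # 32.4%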
technical web design expertise was developed in the cataloging unit (later renamed bibliographic and metadata services), and some of the initial efforts were to create online galleries and exhibits of visual materials from special collections, such as the jeanne russell janish (1998) exhibit.6 subsequently, the unlv libraries purchased the contentdm digital collection management software, providing both back-end infrastructure and front-end presentation for digital collections. later, the first digitization project with search functionality was created in partnership with special collections and was funded by a unlv planning initiative award received in 1999. the early las vegas (2003) project focused on las vegas historical material and was designed to guide users to search, retrieve, and manipulate results using contentdm software to query a database.7 unlv’s development corresponds with regional developments in utah in 2001, when “the largest academic institutions in utah were just beginning to develop digital imaging projects.”8 data from the 2004 imls study showed that, in the twelve months prior to the study release in 2004, the majority of larger academic libraries had digitized between one and five hundred images for online presentation.9 in terms of staffing, digitization efforts occur in a wide variety of configurations, from large departments to solo librarians managing volunteers. for institutions with recognized digitization staff, great variations exist between institutions in terms of where in the organizational chart digitization staff are placed. boock and vondacek’s research revealed that, of departments involved in digitization, special collections, archives, technical services, and newly created digital library units are where digitization activities most commonly take place.10 a majority of respondents to the arl study indicated that some or all activities associated with digitization are distributed across various units in the library.11 in 2003, the unlv libraries created a formal department within the knowledge access management division—web and digitization services (wds)—initially comprising five staff focused on the development of the unlv libraries’ public website, the development of web-based applications and databases to manage and efficiently present information resources, and the digitization and online presentation of library materials unique to the unlv libraries’ collections and of potential interest to a wider audience. augmenting their efforts were individuals in other departments helping with metadata standards, content selection, and associated systems technical support. the unlv library digitization survey showed that the majority (78 percent) of libraries that responded have at least one full-time staff member whose central job responsibility is to support digitization activities. this should not imply the existence of a fully staffed digitization program; the 2006 imls study found that 74.1 percent of larger academic libraries described themselves as lacking in sufficiently skilled technology staff to accomplish technology-related activities.12 central to any digitization program should be some structure in terms of how projects are proposed and subsequently prioritized. to help guide the priorities 118 information technology and libraries | september 2009 of unlv’s infant wds department, a digital projects advisory committee was formed to help solicit and prioritize project ideas, and subsequently track the development of approved projects. 
this committee’s work could be judged as having mixed success partly because it met too infrequently, struggled with conflicting philosophical thoughts on digitization, and was confronted with the reality that staff that were needed to help bring approved ideas to fruition simply weren’t in place because of too many other library priorities drawing attention away from digitization. an evaluation of the lessons learned from these early years can be found in brad eden’s article.13 the unlv library digitization survey had several questions related to management and prioritization for digital projects and shows that despite the challenges of a committee-based decisionmaking structure, when a formal process is in place at all, 42.1 percent of survey respondents used a committee versus a single decision maker (23.7 percent) for determining to whom projects are proposed for production. a follow-up question asked “how are approved projects ultimately prioritized?” the most popular response (54.1 percent) indicated “by a committee for review by multiple people,” followed by “no formal process” (27 percent). “by a single decision maker” was selected by 18.9 percent of the respondents. the earlier arl study asked a somewhat related question: “who makes decisions about the allocation of staff support for digitization efforts? check all that apply.” out of seven possible responses, the three most popular were “head of centralized unit,” “digitization team/committee/working group,” and “other person”; the other person was most often in an administrative capacity, such as a dean, director, or department head.14 administrative support for a program was another variable the unlv library digitization survey investigated. the survey asked respondents to rate, on a scale of one to five, “how would you characterize current support for digitization by your library’s administration?” more than 40 percent of responses indicated “consistent support,” followed by 31 percent of respondents indicating “very strong support, top priority,” 14.3 percent ranking support as neutral, and 14.2 percent claiming “minimal support” or “very little support, or some resistance.” it was also clear from some of the other questions’ responses that the dean or director’s support (or lack thereof) can have dramatic effects on the digitization program. 2005 brought change to the unlv libraries in the form of a new dean. well-suited for the digitization program, she came from california, a state very heavily engaged and at the forefront of digitization within the library and larger academic environment. one of her initiatives was a retooling of the digitization program at the unlv libraries, and her enthusiasm reflects a growing awareness of administrators regarding the benefits of digitization. n reorganization, library staff engagement, and decision making in 2006, two new individuals joined unlv libraries’ web and digitization services department, the digitization projects librarian (filling a vacancy), and the web technical support manager (a new position). a bit later, the systems department (providing technical support for the web and digitization servers, among other things), and the wds department were combined into a single unit and renamed library technologies. collectively, these changes brought new and engaged staff into the digitization program and combined under one division many of the individuals responsible for digital collection creation and support. 
perhaps more subtlety, this arrangement also provided formal acknowledgement of the importance and desire of publishing digital collections. with the addition of new staff and a reorganization, a piece still missing was a resuscitation of library stakeholders to help solicit, prioritize, and manage the creation of digital collections and an overall vision guiding the program. while the technical expertise, knowledge of metadata and imaging standards, and deep-rooted knowledge of digitization programs and concepts existed within the library technologies staff, other knowledge didn’t—primarily in-depth knowledge of the unlv libraries’ special collections and a track record of deep engagement with college faculty and the educational curriculum. similar to other organizations, the unlv libraries had not only created a new unit, but was also poised to introduce cross-departmental project groups that would collaborate on digitization activities. in their study of arl and greater western library association (gwla) libraries, book and vondracek found that this was the most commonly used organizational structure.15 knowledge of the concepts of a digitization program and what is involved in digitizing and sustaining a collection was not widespread among other library colleagues. acknowledged, but not guaranteed up front for the unlv libraries, was the likely eventual reformation of a group of interested and engaged library stakeholders charged to solicit, prioritize, and provide oversight of the unlv libraries’ digitization program. for various reasons, the authors wanted to garner staff buy-in to the highest degree possible. apart from wanting less informed colleagues to understand the benefits of a digitization program, it was also likely that such colleagues would help solicit projects through their liaison work with programs of study across campus. one unlv library digitization survey question asked, “how would you characterize support for digitization in your library by the majority of those providing content for digitization projects?” “consistent support” was indicated by 65.9 percent of respondents; 15.9 percent indicated “very strong support, top priority,” 13.6 percent indicated neutrality, and 4.6 success factors and strategic planning | lampert and vaughan 119 percent indicated either minimal support or even some resistance. to help garner staff buy-in and set the stage for revitalizing the unlv libraries’ digitization efforts, we began laying the groundwork to educate and engage library staff in the benefits of a digitization program. this work included language successfully woven into the unlv libraries’ strategic plan and an authored white paper posing engaging questions to the larger library audience related to the strategic direction of the program. finally, we planned and executed two digitization workshops for library staff. n the strategic plan one unlv library digitization survey question asked, “is the digitization program or digitization activities referenced in your library’s strategic plan?” a total of 63.4 percent indicated yes, with an additional 22 percent indicating no specific references, but rather implied references. only 7.3 percent indicated that the digitization program was not referenced in any manner in the strategic plan, while, surprisingly, 3 responses (7.3 percent) indicated that their library doesn’t have a strategic plan. 
the unlv libraries’ strategic plan is an important document authored with wide feedback from library staff, and it exemplifies the participatory decision-making process in place in the library. the current iteration of the strategic plan covers 2007–9 and includes various goals with supporting strategies and action items.16 in addition, all action items have associated assessment metrics and library staff responsible for championing the action items. departmental annual reports explicitly reference progress toward strategic plan goals. as such, if goals related to the digitization program appear in the strategic plan, that’s a clear indication, to some degree, of staff buy-in in acknowledging the significance of the digitization program. fortunately, digitization efforts figure prominently in several goals, strategies, and action items, including the following: n increasingly provide access to digital collections and services to support instruction, research, and outreach while improving access to the unlv libraries’ print and media collections. n provide greater access to digital collections while continuing to build and improve access to collections in all formats to meet the research and teaching needs of the university. identify collections to digitize that are unique to unlv and that have a regional, national, and international research interest. create digital projects utilizing and linking collections. develop and adapt metadata and scanning standards that conform to national standards for all formats. provide content and metadata for regional and national digital projects. continue to develop expertise in the creation and management of digital collections and information. collaborate with faculty, students, and others outside the library in developing and presenting digital collections. n be a comprehensive resource for the documentation, investigation, and interpretation of the complex realities of the las vegas metropolitan area and provide an international focal point for the study of las vegas as a unique urban and cultural phenomenon. facilitate real and digital access to materials and information that document the historical, cultural, social, and environmental setting of las vegas and its region by identifying, collecting, preserving, and managing information and materials in all formats. identify unique collections that strengthen current collections of national and international significance in urban development and design, gaming, entertainment, and architecture. develop new access tools and enhance the use of current bibliographic and metadata utilities to provide access to physical and digital collections. develop web-based digital projects and exhibits based upon the collections. an associated capital campaign case statement associated with the strategic plan lists several gift opportunities that would benefit various aspects of the unlv libraries; several of these include gift ideas related to the digitization of materials. n the white paper another important step in laying the groundwork for the digitization program was a comprehensive white paper authored by the recently hired digitization projects librarian. the finished paper was originally given to the dean of libraries and thereafter to the administrative cabinet, and eventually distributed to all library staff. the outline of this white paper is provided as appendix b. the purpose of the white paper was multifaceted. 
after a brief historical context, the white paper addressed perhaps the single most important aspect of a digitization program—program planning—developing the strategic goals of the program, selecting and prioritizing projects though a formal decision-making process, and managing initiatives from idea to reality through efficient project teams. this first topic addressing the core values of the program had a strong educational purpose for the entire library staff—the ultimate audience of the paper. as part of its educational goal, the white paper enumerated the various strengths of digitization and why an institution 120 information technology and libraries | september 2009 would want to sustain a digitization program (providing greater worldwide access to unique materials, promoting and supporting education and learning when integrated with the curriculum, etc.). it defined distinctions between an ephemeral digital exhibit and a long-term published and maintained collection. it discussed the various components of a digital collection—images, multimedia, metadata, indexing, thematic presentation (and the preference to be unbiased), integration with other digital collections and the library website, etc. it posited important questions on sustenance and assessment, and defined concepts such as refreshing of data and migration of data to help set the stage for future philosophical discussions. given the myriad reasons one might want to publish a digital collection, checked by the reality that all the reasons and advantages may not be realized or given equal importance, the white paper listed several scenarios and asked if each scenario was a strong underlying goal for our program—in short, true or false: n “the libraries are interested in digitizing select unique items held in our collection and providing access to these items in new formats.” n “the libraries are interested in digitizing whole runs of an information resource for access in new formats.” n “the libraries should actively pursue funding to support major digitization initiatives.” n “the libraries should take advantage of the unique publicity, promotion, and marketing opportunities afforded by a digital project/program.” continuing with a purpose of defining boundaries of the new program, the paper asked questions related to audience, required skill sets, and resources. the second primary topic introduced the selection and prioritization of the items and ideas suggested for digitization. it posed questions related to content criteria (why does this idea warrant consideration? would complex or unique metadata be required from a subject specialist?) and listed various potential evaluative measures of project ideas (should we do this if another library is already doing a very similar project?). technical criteria considerations were enumerated, touching on interoperability of collections in different formats, technical infrastructure considerations, and so on. multiple simultaneous ideas beg for prioritization, and the white paper proposed a formal review process and the library staff and skill sets that would help make such a process successful. the third primary topic focused on the details of carrying an approved idea to reality, and strengthened the educational purpose of the white paper. 
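as a concrete illustration of the metadata component named above, a single item record in a digital collection typically carries a set of descriptive elements; dublin core is the schema commonly used by digital collection software such as contentdm. the python sketch below is hypothetical—the element names follow unqualified dublin core, but every value, including the identifier, is invented for illustration rather than drawn from an actual unlv collection.

# hypothetical item-level record using unqualified dublin core element names;
# all values below are invented for illustration.
item_record = {
    "dc:title": "fremont street at night, ca. 1948",
    "dc:creator": "photographer unknown",
    "dc:subject": ["las vegas (nev.)", "neon signs", "street photography"],
    "dc:description": "black-and-white photograph of fremont street after dark.",
    "dc:date": "1948",
    "dc:type": "image",
    "dc:format": "image/tiff",
    "dc:identifier": "demo_0001",   # hypothetical identifier
    "dc:rights": "for research and educational use only",
}

# the kind of controlled-vocabulary check a metadata specialist might apply
ALLOWED_TYPES = {"image", "text", "sound", "moving image", "physical object"}
assert item_record["dc:type"] in ALLOWED_TYPES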
it described the general planning steps for an approved project and included a list of typical steps involved with most digital projects—scanning; creating metadata, indexes, and controlled vocabulary; coding and designing the web interface; loading records into unlv libraries’ contentdm system; publicizing the launch of the project; and assessing the project after completion. one unlv library digitization survey question was related to thirteen such skills the unlv libraries identified as critical for a successful digitization program. the question asked respondents to rate skill levels possessed by personnel at their library, based on a five-point scale (from one to five: “no expertise,” “very limited expertise,” “working knowledge/enough to get by,” “advanced knowledge,” and “tremendous expertise”). neither “no expertise” nor “very limited expertise” garnered the highest number of responses for any of the skills. the overall rating average of all thirteen skills was 3.79 out of 5. the skills with the highest rating averages were “metadata creation/cataloging” 4.4 and “digital imaging/document scanning/post image processing/photography” with 4.27. the skills with the lowest rating averages were “marketing and promotion” with 2.95 followed by “multimedia formats” with 3.33. the unlv libraries’ white paper contained several appendixes that likely provided some of the richest content of the white paper. with the educational thrust completed, the appendixes drew a roadmap of “where do we want to go from here?” this roadmap suggested the revitalization of an overarching digital projects advisory committee, potential members of the committee, and functions of the committee. the committee would be responsible for soliciting and prioritizing ideas and tracking the progress of approved ideas to publication. the appendixes also proposed project teams (which would exist for each project), likely members of the project teams, and the functions of the project team to complete day-to-day digitization activities. the liaison between the digital projects advisory committee and the project team would be the digitization projects librarian, who would always serve on both. the last page of the white paper provided an illustration highlighting the various steps proposed in the lifecycle of a digital project—from concept to reality. n digitization workshops several months after the white paper had been shared, the next step in restructuring the program and building momentum was sponsoring two forums on digitization. the first one occurred in november 2006 and included two speakers brought in for the event, roy tennant (formerly user services architect with the california digital library and now with oclc) and ann lally (head of the digital initiatives program at the university of washington libraries). this session consisted of a success factors and strategic planning | lampert and vaughan 121 two-hour presentation and q&a to which all library staff were invited, followed by two breakout sessions. all three sessions were moderated by the digitization projects librarian. questions from these sessions are provided in appendix c. the breakout sessions were each targeted to specific departments in the unlv libraries. the first focused on providing access to digital collections (definitions of digital libraries, standards, designing useful metadata, accessibility and interoperability, etc.). 
the second focused on components of a well-built digital library (goals of a digitization program, content selection criteria, collaboration, evaluation and assessment, etc.). colleagues from other libraries in nevada were invited, and the forum was well attended and highly praised. the sessions were recorded and later made available on dvd for library staff unable to attend. this initial forum accomplished two important goals. first, it was an allstaff meeting offering a chance to meet, explore ideas, and learn from two well-known experts in the field. second, it offered a more intimate chance to talk about the technical and philosophical aspects of a digitization program for those individuals in the unlv libraries associated with such tasks. as a momentum-building opportunity for the digitization program, the forum was successful. the second workshop occurred in april 2007. to gain initial feedback on several digitization questions and to help focus this second workshop, we sent out a survey to several dozen library staff—those that would likely play some role at some point in the digitization program. the survey contained questions focused on several thematic areas: defining digital libraries, boundaries to the digitization program, users and audience, digital project design, and potential projects and ideas. it contained thirteen questions consisting of open-ended response questions, questions where the respondent ranked items on a five-point scale, and “select all that apply”–type questions. we distributed the survey to invitees to the second workshop, approximately three dozen individuals; of those, eighteen (about 50 percent) responded to most of the questions. the survey was closely tied to the white paper and meant to gauge early opinions on some of the questions posed by that paper. whereas the first workshop included some open q&a, the second session was structured as a hands-on workshop to answer some of the digitization questions and to illustrate the complexity of prioritizing projects. the second workshop began with a status update on the retooling of the unlv libraries’ digitization program. this was followed by an educational component that focused on a diagram that detailed the workflow of a typical digitization project and who was involved and that emphasized the fact that there is a lot of planning and effort needed to bring an idea to reality. in addition, we discussed project types and how digital projects can vary widely in scope, content, and purpose. finally, we shared general results from the aforementioned survey to help set the stage for the structured hands-on exercises. the outline for this second workshop is provided in appendix d. one question of the unlv library digitization survey asked, “on a scale of 1 to 5, how important are each of the factors in weighing whether to proceed with a proposal for a new digital collection project, or enhancement of an existing project?” eight factors were listed, and the fivepoint scale was used (from one to five: “not important,” “less important,” “neutral,” “important,” and “vitally important”). the average rating for all eight factors was 3.66. the two most important factors were “collection includes unique items” (4.49 average rating) and “collection includes items for which there is a preservation concern or to make fragile items more accessible to the public” (3.95 average rating). 
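the importance ratings just described, like the skill ratings reported earlier, are simple averages over a five-point scale. a minimal python sketch of that calculation follows; the scale labels are those given above, but the response tally is invented for illustration and does not reproduce actual survey data.

# five-point scale from the survey question described above
SCALE = {1: "not important", 2: "less important", 3: "neutral",
         4: "important", 5: "vitally important"}

tally = {1: 2, 2: 4, 3: 9, 4: 15, 5: 9}   # hypothetical response counts for one factor

def rating_average(tally):
    """weighted mean of a likert tally: sum(point * count) / total responses."""
    total = sum(tally.values())
    return sum(point * count for point, count in tally.items()) / total

print(f"average rating: {rating_average(tally):.2f} out of {max(SCALE)}")   # 3.64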
the factors with the lowest average ratings were “collection includes integration of various media into a themed presentation” (2.54 average rating) followed by “collection involves a whole run of an information resource (i.e., such as an entire manuscript, newspaper run, etc.” (3.39 average rating). the earlier arl survey asked a somewhat related question, “what is/has been the purpose of these digitization efforts? check all that apply.” of the six possible responses (which differed somewhat from those in the unlv library digitization survey), the most frequent responses were “improved access to library collections,” “support for research,” and “preservation.”17 the earlier survey also asked the question, “what are the criteria for selecting material to be digitized? check all that apply.” the most frequent responses were “subject matter,” “material is part of a collection being digitized,” and “rarity or uniqueness of the item(s).”18 the first exercise of the second digitization workshop focused on digital collection brainstorming. the authors provided a list of ten project examples and asked each of the six tables (with four colleagues each) to prioritize the ideas. afterward, a speaker from each table presented the prioritizations and defended their rankings. this exercise successfully illustrated to peers in attendance that different groups of people have different ideas about what’s important and what constitutes prime materials for digitization. the rankings from the varying tables were quite divergent. a related question asked of the arl libraries in the unlv library digitization survey was “from where have ideas originated for existing, published digital collection at your library?” and offered six choices. respondents could mark multiple items. the most chosen answer (92.7 percent) was “special collections, archives, or library with a specialized collection or focus.” the least chosen answer (51.2 percent) was “an external donor, friend of the library, community user, etc.” for the second part of the workshop exercise, each table came up with their own digital collection ideas, defined the audience and content of the proposal, and defended and 122 information technology and libraries | september 2009 explained why they thought these were good proposals. fourteen unique and varied ideas were proposed, most of which were tightly focused on las vegas and nevada, such as “history of las vegas,” “unlv yearbooks,” “las vegas gambling and gamblers,” and “african american entertainers in las vegas.” other proposals were less tied to the area, such as a “botany collection,” “movie posters,” “children’s literature,” “architecture,” and “federal land management.” this exercise successfully showed that ideas for digital collections stretch across a broad spectrum, as broad as the individual brainchilden themselves. finally, in the last digitization workshop exercise, each table came up with specialties, roles, and skills of candidates who could potentially serve on the proposed committee, and defended their rationale—in other words, committee success factors. this exercise generated nineteen skills seen as beneficial by one or more of the group tables. at the end of the workshop, we asked if others had alternate ideas to the proposed committee. none surfaced, and the audience thought such a committee should be reestablished. 
this second workshop concluded with a brief discussion on next steps—drafting a charge for the committee, choosing members, and a plug for the expectation of subject liaisons working with their respective areas to help better identify opportunities for collaboration on digital projects across campus. n toward the future digital projects currently maintained by the unlv libraries include both static web exhibits in the tradition of unlv’s first digitization efforts, as well as several searchable contentdm–powered collections. the unlv libraries have also sought to continue collaborative efforts, participating as project partners for the western waters digital library (phase 1) and continuing in a regional collaboration as a hosting partner in the mountain west digital library. partnerships were shown in the unlv library digitization survey to garner increased buy-in for projects, with one respondent commenting that faculty partnerships had been “the biggest factor for success of a digital library project.” institutional priorities at unlv libraries reflect another respondent’s comment regarding “interesting archival collections” as a success factor. one recently launched unlv collection is the showgirls collection (2006), focused on a themed collection of historical material about las vegas entertainment history.19 another recently launched collection, the nevada test site oral history project (2008), recounts the memories of those affiliated with and affected by the nevada test site during the era of cold war nuclear testing and includes searchable transcripts, selected audio and video clips, and scanned photographs and images.20 with general library approval, the restructured digitization projects advisory committee was established in july 2007 with six members drawn from library technologies, special collections, the subject specialists, and at large. the advisory committee has drafted and gained approval for several key documents to help govern the committee’s future work. this includes a collection development policy for digitization projects and a project proposal form to be completed by the individual or group proposing an idea for a digital collection. at the time of writing, the committee is just now at the point of advertising the project proposal form and process, and time will tell how successful these documents prove. in the unlv library digitization survey, 65.4 percent responded that a digitization mission statement or collection development policy was in place at their institution. one goal at unlv is to “ramp up” the number of simultaneous digitization projects underway at any one time at unlv. many items in the special collections are ripe for digitization. many of these are uncataloged, and digitizing such collections would help promote these hidden treasures. related to ramping up production, one unlv library digitization survey question asked, “on average over the past three years, approximately how many new digital collections are published each year?” responses ranged from zero new collections to sixty. the average number of new collections added each year was 6.4 for the 32 respondents who gave exact numerical answers. while this is perhaps double the unlv libraries’ current rate of production, it illustrates that increasing production is an achievable goal. staffing and funding for the unlv libraries’ digitization program have both seen increases over the past several years. 
a new application developer was hired, and a new graphics/multimedia specialist filled an existing vacancy. together, these staff have helped with projects such as modifying contentdm templates, graphic design, and multimedia creation related to digital projects, in addition to working on other web-based projects not necessarily related to the digitization program. another position has had its job focus shifted toward usability for all things web-based, including digitization projects. in terms of funding, the two most recent projects at the unlv libraries are both the result of successful grants. the recently launched nevada test site oral history project was the result of two grants from the u.s. departments of education and energy. subsequently, a $95,000 lsta grant proposal seeking to digitize key items related to the history of southern nevada from 1900 to 1925 was funded for 2008–9, with the resulting digital collection publicly launched in may 2009. this collection, southern nevada: the boomtown years, contains more than 1,500 items from several institutions, focused on the heyday of mining town life in southern nevada during the early twentieth century.21 this grant funded four temporary positions: a metadata specialist, an archivist, a digital projects intern, and an education consultant to help tie the digitized collection into the k–12 curriculum. grants will likely play a large role in the unlv libraries' future digitization activities. the unlv library digitization survey asked, "has your institution been the recipient of a grant or gift whose primary focus was to help efforts geared toward digitization of a particular collection or to support the overall efforts of the digitization program?" the question sought to determine whether grants had played a role, and if so, whether it was primarily large grants (defined as > $100,000), small grants (< $100,000), or both. the largest share of responses (46.2 percent) indicated that a combination of both small and large grants had been received in support of a project or the program. an additional 25.6 percent indicated that large grants had played a role, and 23.1 percent indicated that one or more small grants had played a role. two respondents (5.1 percent) indicated that no grants had been received or that they had not applied for any grants. the earlier arl survey asked the question, "what was/is the source of the funds for digitization activities? check all that apply." of seven possible responses, "grant" was the second most frequent response, trailing only "library."22 with an eye toward the future, the survey administered to arl libraries asked two blunt questions summarizing its overall thrust. one of the final open-ended survey questions asked, "what are some of the factors that you feel have contributed to the success of your institution's digitization program?" forty respondents offered answers, ranging from a single item to several. several responses along the same general themes surfaced, which could be organized into rough clusters.
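before walking through those clusters, a minimal illustration of how such open-ended answers might be tallied into rough themes is sketched below. the theme labels, keywords, and sample responses are hypothetical and purely illustrative; the published clusters appear to have been grouped by the authors' own reading of the responses, not by any automated process.

# illustrative only: a simple keyword tally over free-text survey answers.
# theme labels, keywords, and sample responses are hypothetical.
from collections import Counter

themes = {
    "administrative support": ["administration", "dean", "administrative"],
    "faculty collaboration": ["faculty", "provost", "partner"],
    "staff commitment": ["staff", "team", "commitment", "dedication"],
    "content strength": ["content", "collection", "unique"],
}

def tally(responses):
    counts = Counter()
    for text in responses:
        lowered = text.lower()
        for theme, keywords in themes.items():
            # count each theme at most once per response
            if any(k in lowered for k in keywords):
                counts[theme] += 1
    return counts

sample = ["support of the dean", "strong collaboration with faculty partners"]
print(tally(sample))  # Counter({'administrative support': 1, 'faculty collaboration': 1})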
in general, support from library administration was mentioned by a dozen respondents, with such statements as “consistent interest on the part of higher level administration,” “having support for the digitization program at an administrative level from the very beginning,” “good support from the library administration,” “support of the dean,” and, mentioned multiple times in the same precise language, “support from library administration.” faculty collaboration and interest across campus was mentioned by ten respondents, evidenced by statements such as “strong collaboration with faculty partners,” “support of faculty and other partners,” “interest from faculty,” “heavily involving faculty in particular . . . ensures that we can have continued funding since the faculty can lobby the provost’s office,” and “grant writing partnerships with faculty.” passionate individuals involved with the program and/or support from other staff in the libraries were mentioned by ten respondents, with comments such as “program management is motivated to achieve success,” “a strong department head,” “individual staff member ’s dedication to a project,” “commitment of the people involved,” “team work, different departments and staff willing to work together,” and “supportive individuals within the library.” having “good” content to digitize was mentioned by seven respondents, with statements such as “good content,” “collection strength,” “good collections,” and “availability of unique source materials.” strategic plan or goals integration was mentioned in several responses, such as “strong financial commitment from the strategic plan” and “mainstreaming the work of digital collection building into the strategic goals of many library departments.” successful grants and donor cultivation were mentioned by four respondents. other responses were more unique, such as one respondent’s one-word response—“luck”—and other responses such as “nimbleness, willingness, and creativity,” and “a vision for large-scale production, and an ability to achieve it.” the final unlv library digitization survey question asked, “what are the biggest challenges for your institution’s digitization program?” thirty-nine respondents provided feedback, and again, several variations on a theme emerged. 
the most common response, unsurprisingly, was "not enough staffing," mentioned by eighteen respondents, with responses such as "lack of support for staffing at all necessary levels," "the real problem is people, we don't have enough staff," "limited by staff," and "we need more full-time people." following this was a likely related response, "funding," mentioned by another nine respondents, with statements such as "funding for external digitization," "identifying enough funding to support conversion," "we could always use more money," and, succinctly, "money." related to staffing specifically, six responses focused on technical staff or support from technical staff, such as "need more it (information technology) staff," "need support from existing it staff," "not enough application development staff," and "limited technical expertise." prioritization and demand issues surfaced in six responses, such as "prioritizing efforts now that many more requests for digital projects have been submitted," "prioritization," "can't keep up with demand," and "everyone wants to digitize everything." workflow was mentioned in four responses, such as "workflow bottlenecks," "we need to simplify the process of getting materials into the repository," and "it takes far longer to describe an object than to digitize it, thus creating bottlenecks." "not enough space" was mentioned by three respondents, and "maintaining general librarywide staff support for the program" was mentioned by two respondents. the unlv libraries will keep in mind the experiences of our colleagues, as few, if any, libraries are likely immune to similar issues. ■ conclusions the unlv library digitization survey revealed, not surprisingly, that not all libraries, even those of high stature, are created equal. many have struggled to some extent in growing and sustaining their digitization programs. many have numerous published projects; others have few or perhaps even none. administrative and colleague support varies, as does funding. additional questions remain to be tackled at the unlv libraries. how precisely will we define success for the digitization program? by the number of published collections? by the number of successful grants executed? by the number of image views or metadata record accesses? by the frequency of press coverage and word-of-mouth praise from colleagues? ideas abound, but no definitive answers exist as of yet. at a larger level, other questions are looming. as libraries continue to promote themselves as relevant in the digital age, and as a (or the) central partner in student learning, to what degree will libraries' digital collections be tied into the educational curriculum, whether at their own affiliated institutions or with k–12 in their own states and beyond? clearly the profession is changing, with library schools creating courses and certificate programs in digitization. discussions about the integration of various information silos, metadata crosswalking, and item exposure in other online systems used by students will continue. libraries' digitized collections are primary resources in such discussions. while these questions persist, it is hoped that, at a minimum, the unlv libraries have established the foundational structure to foster what we hope will be a successful digitization program. references
1. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2002 report," may 23, 2002, www.imls.gov/publications/techdig02/2002report.pdf (accessed mar. 1, 2009). 2. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2006 report," jan. 2006, www.imls.gov/resources/techdig05/technology%2bdigitization.pdf (accessed mar. 1, 2009). 3. rebecca mugridge, managing digitization activities, spec kit 294 (washington, d.c.: association of research libraries, 2006): 11. 4. ross housewright and roger schonfeld, "ithaka's 2006 studies of key stakeholders in the digital transformation in higher education," aug. 18, 2008, www.ithaka.org/research/ithakas%202006%20studies%20of%20key%20stakeholders%20in%20the%20digital%20transformation%20in%20higher%20education.pdf (accessed mar. 1, 2009). 5. ibid. 6. university of nevada, las vegas university libraries, "jeanne russell janish, botanical illustrator: landscapes of china and the southwest," oct. 17, 2006, http://library.unlv.edu/speccol/janish/index.html (accessed mar. 1, 2009). 7. university of nevada, las vegas university libraries, "early las vegas," http://digital.library.unlv.edu/early_las_vegas/earlylasvegas/earlylasvegas.html (accessed mar. 1, 2009). 8. kenning arlitsch and jeff jonsson, "aggregating distributed digital collections in the mountain west digital library with the contentdm multi-site server," library hi tech 23, no. 2 (2005): 221. 9. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2006 report." 10. michael boock and ruth vondracek, "organizing for digitization: a survey," portal: libraries and the academy 6, no. 2 (2006), http://muse.jhu.edu/journals/portal_libraries_and_the_academy/v006/6.2boock.pdf (accessed mar. 1, 2009). 11. mugridge, managing digitization activities, 12. 12. institute for museum and library services, "status of technology and digitization in the nation's museums and libraries 2006 report." 13. brad eden, "managing and directing a digital project," online information review 25, no. 6 (2001), www.emeraldinsight.com/insight/viewpdf.jsp?contenttype=article&filename=html/output/published/emeraldfulltextarticle/pdf/2640250607.pdf (accessed mar. 1, 2009). 14. mugridge, managing digitization activities, 32–33. 15. boock and vondracek, "organizing for digitization: a survey." 16. university of nevada, las vegas university libraries, "university libraries strategic goals and objectives," june 1, 2005, www.library.unlv.edu/about/strategic_goals.pdf (accessed mar. 1, 2009). 17. mugridge, managing digitization activities, 20. 18. ibid., 48. 19. university of nevada, las vegas university libraries, "showgirls," http://digital.library.unlv.edu/showgirls/ (accessed mar. 1, 2009). 20. university of nevada, las vegas university libraries, "nevada test site oral history project," http://digital.library.unlv.edu/ntsohp/ (accessed mar. 1, 2009). 21. university of nevada, las vegas university libraries, "southern nevada: the boomtown years," http://digital.library.unlv.edu/boomtown/ (accessed may 15, 2009). 22. mugridge, managing digitization activities, 40. appendix a. unlv library digitization survey responses 1. is the digitization program or digitization activities referenced in your library's strategic plan?
answer options (41 responses total) response percent response count yes 63.4 26 no 7.3 3 not specifically, but implied 22.0 9 our library doesn’t have a strategic plan 7.3 3 2. how would you characterize current support for digitization by your library’s administration? answer options (42 responses total) response percent response count very strong support, top priority 31.0 13 consistently supportive 40.5 17 neutral 14.3 6 minimal support, 7.1 3 very little support, or some resistance 7.1 3 3. how would you characterize support for digitization in your library by the majority of those providing content for digitization projects (i.e., regardless of whether those providing content have as a primary or a minor responsibility provisioning content for digitization projects)? answer options (44 responses total) response percent response count very strong support, top priority 15.9 7 consistently supportive 65.9 29 neutral 13.6 6 minimal support 2.3 1 very little support, or some resistance 2.3 1 126 information technology and libraries | september 2009 4. what year do you feel your library published its first “major” digital collection? major is defined as this was the first project deemed as having permanence and which would be sustained; it has associated metadata, etc. if you do not know, you may estimate or type “unknown.” responses ranged from 1990 to 2007. 5. to date, approximately how many digital collections has your library published? (please do not include ephemeral exhibits that may have existed in the past but no longer are present or sustained.) responses ranged from 1 to 1,000s. the great majority of responses were under 100; four responses were between 100 and 200, and one response was “1,000s.” success factors and strategic planning | lampert and vaughan 127 6. on average over the past 3 years, approximately how many new digital collections are published each year? all but two responses ranged from 0 to 10. one response was 13, one was 60. 7. what hosting platform(s) do you use for your digital collections (e.g., contentdm, etc.)? 8. does your institution have an institutional repository (e.g., dspace)? answer options (41 responses total) response percent response count yes 73.2 30 no 26.8 11 9. if the answer was “yes” in question 5, is your institutional repository using the same software as your digital collections? answer options (30 responses total) response percent response count yes 26.7 8 no 73.3 22 128 information technology and libraries | september 2009 10. is there an individual at your library whose central job responsibility is the development, oversight, and management of the library’s digitization program? (for purposes of this survey, central job responsibility means that 50 percent or more of the employee’s time is dedicated to digitization activities.) answer options (38 responses total) response percent response count yes 78.9 30 no 21.1 8 11. are there regular, full-time staff at your library who have as their primary or one of their primary job responsibilities support of the digitization program? for this question, a primary job responsibility means that at least 20 percent of their normal time is spent on activities directly related to supporting the digitization program or development of a digital collection. 
(mark all that apply) answer options (39 responses total) response percent response count digital imaging/document scanning, post-image processing, photography 82.1 32 metadata creation/cataloging 79.5 31 archival research of documents included in a collection(s) 28.2 11 administration of the hosting server 53.8 21 grant writing/donor cultivation/program or collection marketing 23.1 9 project management 61.5 24 multimedia formats 25.6 10 database design and data manipulation 53.8 21 maintenance, customization, and/or configuration of digital asset management software or features within that software (e.g., contentdm) 64.1 25 programming languages 30.8 12 web design and development 71.8 28 usability 25.6 10 marketing and promotion 28.2 11 none of the above 2.6 1 12. approximately how many individuals not on the full-time library staff payroll (i.e., student workers, interns, fieldworkers, volunteers) are currently working on digitization projects? answers ranged from 0 to “approximately 46.” the majority of responses (24) fell between 0 and 10 workers; twelve responses indicated more than 10; several responses indicated “unknown.” success factors and strategic planning | lampert and vaughan 129 13. has your library funded staff development, training, or conference opportunities that directly relate to your digitization program and activities for one or more library staff members? answer options (41 responses total) response percent response count yes, frequently, one or more staff have been funded by library administration for such activities 48.8 20 yes, occasionally, one or more staff have been funded by library administration for such activities 51.2 21 no, to the best of my knowledge, no library staff member has been funded for such activities 0.0 0 14. where does the majority of digitization work take place? answer options (41 responses total) response percent response count centralized in the library (majority of content digitized using library staff and equipment in one department) 48.8 20 decentralized (majority of content digitized in multiple library departments or outside the library by other university entities) 12.2 5 through vendors or outsourcing 7.3 3 hybrid of approaches depending on project 31.7 13 15. on a scale of 1 to 5 (1 being least important and 5 being vitally important), how important are each of the factors in weighing whether to proceed with a proposal for a new digital collection project or enhancement of an existing project? answer options (41 responses total) not important less important neutral important vitally important rating average response count collection includes item(s) for which there is a preservation concern or to make fragile item(s) more accessible to the public 0 1 9 22 9 3.95 41 collection includes unique items 0 0 1 19 21 4.49 41 collection involves a whole run of an information resource (e.g., an entire manuscript, newspaper run, etc.) 
2 5 11 21 2 3.39 41 130 information technology and libraries | september 2009 answer options (41 responses total) not important less important neutral important vitally important rating average response count collection includes the integration of various media (i.e., images, documents, audio) into a themed presentation 7 11 17 6 0 2.54 41 collection has a direct tie to educational programs and initiatives (e.g., university courses, statewide education programs, or k–12 education) 3 3 6 17 12 3.78 41 collection supports scholarly communication and/or management of institutional content 1 4 7 21 8 3.76 41 collection involves a collaboration with university colleagues 1 3 9 18 10 3.83 41 collection involves a collaboration with entities external to the university (e.g., public libraries, historical societies, museums) 2 4 11 19 5 3.51 41 16. from where have ideas originated for existing, published digital collections at your library? in other words, have one or more digital collections been the brainchild of one of the following? (mark all that apply) answer options (41 responses total) response percent response count library subject liaison or staff working with teaching faculty on a regular basis 75.6 31 library administration 65.9 27 special collections, archives, or library with a specialized collection or focus 92.7 38 digitization program manager 63.4 26 university staff or faculty member outside the library 68.3 28 an external donor, friend of the library, community user, etc. 51.2 21 (continued from previous page) success factors and strategic planning | lampert and vaughan 131 17. to whom are new projects first proposed to be evaluated for digitization consideration? answer options (38 responses total) response percent response count to an individual decision-maker 23.7 9 to a committee for review by multiple people 42.1 16 no formal process 34.2 13 18. how are approved projects ultimately prioritized? answer options (37 responses total) response percent response count by a single decision-maker 18.9 7 by a committee for review by multiple people 54.1 20 by departments or groups outside of the library 0.0 0 no formal process 27.0 10 19. are digitization program mission statements, selection criteria, or specific prioritization procedures in use? answer options (40 responses total) response percent response count yes, one or more of these forms of documentation exist detailing process 67.5 27 yes, some criteria are used but no formal documentation exists 25.0 10 no documented process in use 7.5 3 20. what general evaluation criteria do you employ to measure how successful a typical digital project is? (mark all that apply) answer options (39 responses total) response percent response count log analysis showing utilization/record views of digital collection items 69.2 27 analysis of feedback or survey responses associated with the digital collection 38.5 15 publicity generated by, or citations referencing, digital collection 46.2 18 e-commerce sales or reproduction requests for digital images 12.8 5 we have no specific evaluation measures in use 33.3 13 132 information technology and libraries | september 2009 21. has your institution been the recipient of a grant or gift whose primary focus was to help efforts geared toward digitization of a particular collection or to support the overall efforts of the digitization program? 
answer options (39 responses total) response percent response count we have received one or more smaller grants or donations (each of which was $100,000 or less) to support a digital collection/program 23.1 9 we have received one or more larger grants or donations (each of which was greater than $100,000) to support a digital collection/program 25.6 10 we have received a mix of small and large grants or donations to support a digital collection/program 46.2 18 we have been unsuccessful in receiving grants or have not applied for any grants—grants and/or donations have not played any role whatsoever in supporting a digital collection or our digitization program 5.1 2 22. how would you rate the overall level of buy-in for collaborative digitization projects between the library and external partners (an external partner is someone not on the full-time library staff payroll, such as other university colleagues, colleagues from other universities, etc.)? answer options (41 responses total) response percent response count excellent 41.5 17 good 39.0 16 neutral 4.9 2 minimal 7.3 3 low or none 0.0 0 not applicable—our library has not yet published or attempted to publish a collaborative digital project involving individuals outside the library 7.3 3 23. when considering the content available for digitization, which of the following statements apply? (mark all that apply) answer options (40 responses total) response percent response count at my institution, there is a lack of suitable library collections for digitization 0.0 0 content providers regularly contact the digitization program with project ideas 52.5 21 the main source of content for new digitization projects comes from special collections, archives, other libraries with specialized collections (maps, music, etc.), or local cultural organizations (historical societies, museums) 87.5 35 success factors and strategic planning | lampert and vaughan 133 answer options (40 responses total) response percent response count the main source of content for new digitization projects comes from born digital materials (such as dissertations, learning objects, or faculty research materials) 32.5 13 content digitization is mainly limited by available resources (lack of staffing, space, equipment, expertise) 47.5 19 obtaining good content for digitization can be challenging 7.5 3 24. various types of expertise are important in collaborative digitization projects. please rate the level of your local library staff’s expertise in the following areas (1–5 scale, with 1 having no expertise and 5 having tremendous expertise). 
answer options (41 responses total) no expertise very limited expertise working knowledge/ enough to “get by” advanced knowledge tremendous expertise n/a rating average response count digital imaging/ document scanning, post image processing, photography 0 1 3 21 16 0 4.27 41 metadata creation/ cataloging 0 0 2 20 18 0 4.40 40 archival research of documents included in a collection 0 2 6 15 16 2 4.15 41 administration of the hosting server 1 2 7 16 15 0 4.02 41 grant writing/ donor cultivation 1 4 13 13 8 2 3.59 41 project management 0 1 9 23 8 0 3.93 41 multimedia formats 0 5 21 10 4 1 3.33 41 database design and data manipulation 0 4 9 14 13 1 3.90 41 (continued from previous page) 134 information technology and libraries | september 2009 answer options (41 responses total) no expertise very limited expertise working knowledge/ enough to “get by” advanced knowledge tremendous expertise n/a rating average response count digital asset management software (e.g., contentdm) 3 0 5 21 11 0 3.93 40 programming languages 4 3 14 9 11 0 3.49 41 web design and development 2 1 13 10 15 0 3.85 41 usability 1 7 12 13 8 0 3.49 41 marketing and promotion 2 11 17 7 3 1 2.95 41 25. what are some of the factors that you feel have contributed to the success of your institution’s digitization program? survey responses were quite diverse because respondents were speaking to their own perceptions and institutional experience. the general trend of responses are discussed in the body of the paper. 26. what are the biggest challenges for your institution’s digitization program? survey responses were quite diverse because respondents were speaking to their own perceptions and institutional experience. the general trend of responses are discussed in the body of the paper. appendix b. white paper organization i. introduction ii. current status of digitization projects at the unlv libraries iii. topic 1: program planning a. are there boundaries to the libraries digitization program? what should the program support? b. what resources are needed to realize program goals? c. who is the user or audience? d. when selecting and designing future projects, how can high-quality information be presented in online formats incorporating new features while remaining un-biased and accurate in service provision? e. to what degree do digitization initiatives need their own identity versus heavily integrating with the libraries’ other online components, such as the general website? f. how do the libraries plan on sustaining and evaluating digital collections over time? g. what type of authority will review projects at completion? how will the project be evaluated and promoted? iv. topic 2: initiative selection and prioritization a. project selection: what content criteria should projects fall within in order to be considered for digitization and what is the justification for conversion of the proposed materials? (continued from previous page) success factors and strategic planning | lampert and vaughan 135 b. project selection: what technical criteria should projects fall within in order to be considered for digitization? c. project selection: how does the project relate to, interact with, or complement other published projects and collections available globally, nationally, and locally? d. project selection and prioritization: after a project meets all selection criteria, resources may need to be evaluated before the proposal reaches final approval. 
what information needs to be discussed in order to finalize the selection process, select between qualified project candidates, and begin the prioritization process for approved proposals? e. project prioritization: should we develop a formal review process? v. topic 3: project planning a. what are the planning steps that each project requires? b. who will be responsible for the different steps in the project plan and department workload? c. how can the libraries provide rich metadata and useful access points? d. what type of web design will each project require? e. what type of communication needs to exist between groups during the project? vi. concluding remarks vii. related links and resources cited viii. white paper appendixes a. working list of advisory committee functions and project workgroup functions b. contentdm software: roles and expertise c. project team workflow d. contentdm elements appendix c. first workshop questions general questions 1. how do you define a digital library? do the terms “repository,” “digital project,” “exhibit,” or “online collection” connote different things? if so, what are the differences, similarities, and boundaries for each? 2. what factors have contributed to a successful digitization program at your institution? did anything go drastically wrong? were there any surprises? what should new digitization programs be cautious and aware of? 3. what is the role, specifically, of the academic library in creating digital collections? how is digitization tied to the mission of your institution? 4. why digitize and for whom? do digital libraries need their own mission statement or philosophy because they differ from physical collections? should there be boundaries to what is digitized? 5. what standards are most widely in use at this time? what does the future hold? are there new standards you are interested in? technical questions, metadata questions 1. what are some of the recommended components of digital library infrastructure that should be in place to support a digitization program (equipment, staff, planning, technical expertise, content expertise, etc?) 2. what are the relationships between library digitization initiatives, the library website, the campus website or portal, and the web? in what ways do these information sources overlap, interoperate, or require boundaries? 3. how do you decide on what technology to use? what is the decision-making process when implementing a new technology? 4. standards are used in various ways during digitization. what is the importance of using standards, and are there areas where standards should be relaxed, or not used at all? how do digitization programs deal with evolving standards? 5. preservation isn’t talked about as much as it used to be. what’s your solution or strategy to the problem of preserving digital materials? 6. will embedded metadata ever be the norm for digital objects, or will we continue to rely on collection management like contentdm to link digital objects to their associated metadata? 136 information technology and libraries | september 2009 appendix d. second workshop outline 1. introduction—purpose/focus of the meeting a. to talk about next steps in the digitization program b. quick review of the current status and where the program has been c. serve to further educate participants on the steps involved in taking a project idea to reality d. 
goals for participants: understand types of projects and project prioritization; engage in activities on ideas and prioritization; talk about process and discuss committee; open forum 2. staff digitization survey discussion a. “defining digital libraries” b. “boundaries to the digitization program” c. “users and audience” d. “digital project design” e. “potential projects and ideas” 3. first group exercise: digital project idea ranking and defense of ranking 4. second group exercise: digital project idea brainstorming and defense of ideas brainstormed 5. concept/proposal for a digitization advisory committee 6. conclusion and next steps collections and design questions 1. how do you decide what should be included in a digital library? does the digital library need a collection development policy and if so, what type? how are projects prioritized at your institution? 2. how do you decide who your user is? are digital libraries targeting mobile users or other users with unique needs? what value-added material compliments and enhances digital collections (i.e., item-level metadata records, guided searches, narrative or scholarly content, teaching material, etc.)? 3. how should digital libraries be assessed and evaluated? how do you gauge the success of a digital collection, exhibit, or library? what has been proven and disproved in the short time that libraries have been doing digital projects? 4. what role do digital libraries play in marketing the library? how do you market your digital collections? are there any design criteria that should be considered for the web presence of digital libraries (should the digital library look like the library website, the campus website, or have a unique look and feel)? 5. do you have any experience partnering with teaching faculty to create digital collections? how are collaborations initiated? are such collaborations a priority? what other types of collaborations are you involved in now? how do you achieve consensus with a diverse group of collaborators? to what degree is centralization important or unnecessary? 2 information technology and libraries | march 2009 andrew k. pace president’s message: lita now andrew k. pace (pacea@oclc.org) is lita president 2008/2009 and executive director, networked library services at oclc inc. in dublin, ohio. a t the time of this writing, my term as lita president is half over; by the time of publication, i will be in the home stretch—a phrase that, to me, always connotes relief and satisfaction that is never truly realized. i hope that this time between ala conferences is a time of reflection for the lita board, committees, interest groups, and the membership at large. various strategic planning sessions are, i hope, leading us down a path of renewal and regeneration of the division. of course, the world around us will have its effect—in particular, a political and economic effect. first, the politics. i was asked recently to give my opinion about where the new administration should focus its attention regarding library technology. i had very little time to think of a pithy answer to this question, so i answered with my gut that the united states needs to continue its investment in it infrastructure so that we are on par with other industrialized nations while also lending its aid to countries that are lagging behind. furthermore, i thought it an apt time to redress issues of data privacy and retention. 
the latter is often far from our minds in a world more connected, increasingly through wireless technology, and with a user base that, as one privacy expert put it, would happily trade a dna sample for an extra value meal. i will resist the urge to write at greater length a treatise on the bill of rights and its status in 2008. i will hope, however, that lita’s technology and access and legislation and regulation committees will feel reinvigorated post–election and post–inauguration to look carefully at the issues of it policy. our penchant for new tools should always be guided and tempered by the implementation and support of policies that rationalize their use. as for the economy, it is our new backdrop. one anecdotal view of this is the number of e-mails i’ve received from committee appointees apologizing that they will not be able to attend ala conferences as planned because of the economic downturn and local cuts to library budgets. libraries themselves are in a paradoxical situation—increasing demand for the free services that libraries offer while simultaneously facing massive budget cuts that support the very collections and programs people are demanding. what can we do? well, i would suggest that we look at library technology through a lens of efficiency and cost savings, not just from a perspective of what is cool or trendy. when it comes to running systems, we need to keep our focus on end-user satisfaction while considering total cost of ownership. and if i may be selfish for a moment, i hope that we will not abandon our professional networks and volunteer activities. while we all make sacrifices of time, money, and talent to support our profession, it is often tempting when economic times are hard to isolate ourselves from the professional networks that sustain us in times of plenty. politics and economics? though i often enjoy being cynical, i also try to make lemonade from lemons whenever i can. i think there are opportunities for libraries to get their own economic bailout in supporting public works and emphasizing our role in contributing to the public good. we should turn our “woe-are-we” tendencies that decry budget cuts and low salaries into championed stories of “what libraries have done for you lately.” and we should go back to the roots of it, no matter how mythical or anachronistic, and think about what we can do technically to improve systemwide efficiencies. i encourage the membership to stay involved and reengage, whether through direct participation in lita activities or through a closer following of the activities in the ala office of information technology policy (oitp, www.ala.org/ala/aboutala/offices/oitp) and the ala washington office itself. there is much to follow in the world that affects our profession, and so many are doing the heavy lifting for us. all we need to do sometimes is pay attention. make fun of me if you want for stealing a campaign phrase from richard nixon, but i kept coming back to it in my head. in short, library information technology— now more than ever. 5 tails wagging dogs a funny thing happened on the way to the form. in the past decade, many libraries believed they were developing or using automated systems to produce catalog cards, or order slips, or circulation control records. the trauma of aacr2 implementation has helped many to realize belatedly that they have, in fact, been building data bases. libraries must relate their own machine-readable records to each other in a new way as they face new applications. 
further methods of relating and using records from different libraries, and even different networks, are becoming necessities in our increasingly interdependent world. a narrow view of the process of creating records has often resulted in the introduction of nonstandard practices that provide the required immediate result but create garbage in the data base. in effect, letting the tails wag the dogs. for many years, john kountz and the tesla (technical standards for library automation) committee addressed this issue forcefully, but were as voices in the wilderness. the problems created are the problems of success. the expectations libraries have developed have outstripped their practices. many libraries are only now seriously addressing the practices they have used to create data bases that already contain hundreds of thousands of records. precisely because of its success, the oclc system is a useful case in point. in general, oclc has adhered closely to marc standards. in call number and holding fields, national standards have been late in coming, and libraries have often improvised. meeting the procrustean needs of catalog cards has ofttimes blinded libraries to the long-term effects of their practices. multiple subfield codes to force call number "stacking" and omission of periods from lc call numbers are two examples of card-driven practice. not following the recommended oclc practice of fully updating the record at each use has created archive tapes requiring significant manual effort to properly reflect library holdings. variant branch cataloging practices create dilemmas. some malpractices have resulted from attempts to beat pricing algorithms. some, like retaining extraneous fields or accepting default options when they are incorrect, merely reflect laziness or shortsighted procedures. while implementing systems in the present, libraries must keep a weather eye to the future. what new requirements will future systems place on records being created today? brian aveney helping the hacker? library information, security, and social engineering samuel t. c. thompson (sthompson@collier-lib.org) is a public service librarian at the collier county public library, naples, florida. social engineering is the use of nontechnical means to gain unauthorized access to information or computer systems. while this method is recognized as a major security threat in the computer industry, little has been done to address it in the library field. this is of particular concern because libraries increasingly have access to databases of both proprietary and personal information. this tutorial is designed to increase the awareness of library staff in regard to the issue of social engineering. one morning the phone rings at the circulation desk; the assistant, joyce, answers. "seashore branch public library, how may we help you?" she asks, smiling. "my wife and i recently moved and i wanted to confirm that you had our current address," a pleasant male voice responds. "could you give me your name please?" "the card is in my wife's name, jennifer greene. we've been so busy with the move that she hasn't had a chance to catch up with everything." "okay, i have her information here. 123 main street, apartment 2b. is that correct?" "thank you so much, that's it. do you have our new number or is it still 555-555-1234 in your records?" "let me see . . . no, i think we have your new number." "could you read it back to me?" "sure . . . 555-555-6789, is that right?" "555-555-6789 . . . that's right. thank you very much, you've been very helpful." "no problem, that's what we're here for." what just happened?
what happened to joyce may have been exactly what it appeared to be—a conscientious spouse trying to make sure information was updated after a move. but what else could it have been—research for an identity theft, or a stalker trying to get personal information? we have no way of knowing. all reasons except for the first, innocent, reason are covered by the term social engineering. in the language of computer hackers, social engineering is a nontechnical hack. it is the use of trickery, persuasion, impersonation, emotional manipulation, and abuse of trust to gain information or computer-system access through the human interface. regardless of an institution's commitment to computer security through technology, it is vulnerable to social engineering. recently, the institute of management and administration (ioma) reported social engineering as the number-one security threat for 2005. according to ioma, this method of security violation is on the rise due to continued improvements in technical protections against hackers.1 why and how does social engineering work? the first thing to keep in mind about social engineering is that it does work. kevin mitnick, possibly the best-known hacker of recent decades, carried out most of his questionable activities through the medium of social engineering.2 he did not need to use his technical expertise because it was easier to just ask for the information he wanted. he discovered that people, when questioned appropriately, would give him the information he wanted. social engineering succeeds because most people work under the assumption that others are essentially honest. as a pure matter of probability, this is true; the vast majority of communications that we receive during the day are completely innocent in character. this fact allows the social engineer to be effective. by making seemingly innocuous requests for information, or by making requests in a way that seems reasonable at the time, the social engineer can gather the information that he or she is looking for. methods of social engineering the arsenal of the social engineer is large and very well established, mainly because social engineering amounts to a variation on confidence trickery, an art that goes back as far as human history can recall. one might argue that homer's iliad contains the first record of a social engineering attack in the form of the trojan horse. direct requests many social-engineering methods are complex and require significant planning. however, there is a simpler approach that is often just as effective: the social engineer contacts his or her target and simply asks for the information. preying on trust and emotion social engineering is a method of gaining information through the persuasion of human sources, based on the abuse of trust and the manipulation of emotion. in his book the art of deception, mitnick makes the argument that once a social engineer has established the trust of a contact, all security is effectively voided and the social engineer can gather whatever information is required. the most common method of targeting computer end-users is through the manipulation of gratitude.
in these cases, a social engineer, usually impersonating a technician, contacts a user and states that there is something wrong on the victim’s end, and that the social engineer needs a few pieces of information to “help” the user. appreciative of the assistance, the victim provides the necessary information to the helpful caller or carries out the requested actions. predictably, no problem ever existed and the victim has now provided the social engineer either access to a computer system or with the information needed to gain that access. a counterpoint to the manipulation of gratitude is the manipulation of sympathy. this method is most often used on information providers such as help-desk personnel, technicians, and library staff members. in this scenario, a social engineer contacts a victim and claims to have either lost information, is out of contact with a normal source, or is simply ignorant of something that he or she should know. as anyone can empathize with this plea, the victim is often all too willing to provide the information sought by the social engineer. using these methods—taking advantage of the gratitude, sympathy, and empathy of their victims—social engineers are able to achieve their aims. impersonation because forming trust relationships with their victims is critical to a socialengineering attack, it is not surprising that social engineers often pretend to be someone or something that they are not. two of the major tools of impersonation are (1) speaking the language of the victim institution and (2) knowledge of personnel and policy. to allay suspicion, a social engineer needs to know and be able to use an institution’s terminology. being unable to do so would cause the victim to suspect, rather than trust, the social engineer. with a working knowledge of an organization’s particular vocabulary, a social engineer can phrase his or her request in terms that will not rouse alarm with the intended victim. the other major goal of a social engineer in preparing a successful impersonation is to develop a familiarity with the “lay of the land,” i.e., the specifics of and personnel within an organization. for instance, a social engineer needs to discover who has what authority within an organization so as to understand for whom he or she needs to claim to speak. research to establish trust in their victims, social engineers use research as a tool. this comes in two forms, background research and cumulative research. background research is the process by which a social engineer uses publicly available resources to learn what to ask for, how to ask for it, and whom to ask it of. while the intent and goal of this research differs from the techniques used by students, librarians, and other members of the population, the actual process is the same. cumulative research is the process by which a social engineer gathers the information that he or she needs to make more critical requests of their victims. the facts that a social engineer seeks through cumulative research may seem without value to the casual observer, but put together properly, they are anything but that. questions can include names of staff, internal phone numbers, procedures, or seemingly minor technical details about the library’s network (e.g., what operating system are you running?). late in the afternoon the phone at the reference desk rings. marcy, the librarian on duty answers, “reference desk.” “hi there, this is dave simpson calling from information services at the main branch. 
sorry about the echo, i'm working in the cabling closet at the moment, so i'm calling you on my cell phone." "no problem, i can hear you fine. what can i do for you?" "thanks. a lot of the branches have been having network problems over the last few days. has everything been okay at the seashore branch reference desk?" "i think so." "okay, that's good. i'm running a test right now on the network and needed to find a terminal that was behaving itself. could you log off and let me know if any messages come up?" "no problem." marcy logs off the reference computer; nothing strange happens. "just the usual messages." "good. now start logging back on. what user are you going in as? i mean, which login name are you using?" "searef. okay, i'm logged on now." "no strange messages?" "nothing." "that's great. look, our problem might be kids hacking into the system, so i need you to change the password. do you know how to do that?" "i think so." "well, let me walk you through it." dave spends a couple of minutes walking marcy through changing the system password. the password is now changed to 5ear3f, a moderately secure password. "thanks, marcy. you've been a great help. we have your new password logged into the system. could you pass on the new password to the other reference personnel?" "sure." "wonderful. just remember not to give the password out to anyone who doesn't need it, and don't write it down where anyone who shouldn't have it can get at it. have a great day." "you too." why are libraries vulnerable? libraries are vulnerable to social-engineering attacks for two major reasons: (1) ignorance and (2) institutional psychology. the first of these difficulties is the easiest to address. the ignorance of library professionals in this matter is easily explained—there is very little literature to date about the issue of social engineering directed at library personnel. what exists is usually mixed into larger articles on general security issues and receives little focus. this lack of concern about social engineering can also be seen in the computer professional literature, where it is dwarfed by the volume of articles concerning technical security issues. this is a curious gap, considering the high rate of occurrence of this kind of attack. is it because many technical professionals are less comfortable with a social issue—one that can only be solved through people—than with a technical security issue that can be solved through the development or implementation of proper software?3 unfortunately, not knowing about a method of security violation leaves one vulnerable to that method. it is incumbent on librarians, computer administrators, and security professionals to be aware of these issues. the second factor is harder to address but equally important. unlike almost any other profession, librarians are expected to fulfill their patrons' informational needs without question or bias. this laudable goal makes librarians vulnerable to social-engineering attacks because the inquiries made by a social engineer about the information resources available at a library may be used for nefarious purposes. a reference interview over these issues may be very successful from the point of view of both parties involved, as the librarian fills the open-ended inquiries of the social engineer, and the social engineer receives much, if not all, of the information that he or she needs to violate the library's internal information systems.
why libraries can be targets at this point, it is relevant to ask why security violators would even bother with library computer networks. what do libraries have that is worth possibly committing a crime to get? personal information is probably the most tempting target in a library computer system. libraries possess databases of names, addresses, and other personal data about library cardholders. this information is valuable, and not all of it is easily available from public sources. as may be seen in the section of this article on techniques, such information could be used as an end unto itself or as a stepping stone to security violations in other systems. subscriptions to proprietary databases are quite expensive, as any acquisitions librarian will explain. given the high prices and limited licensing, a hacker may want to gain access to these information resources. this could be a casual hacker who wants access to a library-only resource from his or her home computer, or it may be a criminal who wishes to steal intellectual property from a database provider. libraries often have broadband access designed for a large network (e.g., t1). as these lines are very expensive, few individuals can afford them. at the same time, these broadband lines have immense capacity for downloading information from other networks. there are many reasons why a hacker would seek to use such a resource illicitly. for instance, a casual hacker may want to download a large number of bootlegged movie files, or a criminal may wish to download a corporate database. with access to a library's high-bandwidth internet line, these actions can be carried out quickly and with a minimized chance of detection. libraries possess large numbers of computers due to their increasing automation. these computer resources can, if compromised, be used as anonymous remote computers by hackers. called "zombies," compromised computers could be used to deliver illegal spam, to launch distributed denial-of-service (ddos) attacks, or to serve as distribution points for illegal materials. if library computers are used in this way, there is a potential for a library to face legal responsibility for the actions of its computers or for the questionable materials found on them. prevention the tools needed to prevent social engineering from succeeding are awareness, policy, and training. these tools feed into one another—we become aware of the possibility of social-engineering attacks, develop policy to communicate these concerns to others, and then train others in these policies to protect them and their libraries from social engineering. libraries should have a simple set of policies to help prevent social engineering from affecting them. this policy need not be long; ideally, it should be a small page of bullet points that are easy to remember or to post near telephones. what is important is that it is easy to remember and implement when a call or e-mail comes in.4 basic guidelines for protection against social engineering ■ be suspicious of unsolicited communications asking about employees, technical information, or other internal details. ■ do not provide passwords or login names over the phone or via e-mail, no matter who claims to be asking. ■ do not provide patron information to anyone but the patron in person, and only upon presentation of the patron's library card or other proper identification. ■ if you are not sure whether a request is legitimate, contact the appropriate authorities.
■ trust your instincts. if you feel suspicious about a question or communication, there is probably a good reason. ■ document and report suspicious communications. in closing social engineering is an immensely effective method of breaching computer and network security. it is, however, entirely dependent on the ability of the social engineer to persuade staff members to provide information or access that they should not provide. with care and good information policies, we can prevent social engineering from working. after all, do we really want to be helping the hacker? the circulation desk phone rings. joyce answers, "seashore branch public library, how may we help you?" "hi there, i'm worried that i haven't turned in all the books i have out, and i really don't want to get stuck with a fine. could you tell me what i have out?" "no problem. what is your name?" "sean grey." joyce brings up sean grey's circulation records, then remembers the library's information policy and decides to ask another question: "could you give me your library card number?" "i don't have that with me. i really don't want to get stuck with those fines." "i'm sorry, mr. grey, to preserve patron privacy we can only give out circulation information if you give us your card number or if you are here in person with your card or id." "but i just want to avoid a fine. can't you help?" "don't worry; if you are late by accident on occasion, we are willing to forgive a fine." "so you can't give me my records?" "i'm sorry, but we have to protect patron privacy. i'm sure you understand." "i guess so. goodbye." "have a good day." ■ references 1. institute of management & administration, "six security threats that will make headlines in '05," ioma's security director's report 5, no. 1 (2004): 1–14. 2. k. manske, "an introduction to social engineering," security management practices (nov./dec. 2000): 53–59. 3. m. mcdowell, "cyber-security tip st04-014," 2005, http://www.us.cert.gov/cas/tips/st04-014.html (accessed june 5, 2005). 4. k. mitnick and w. simon, the art of deception (indianapolis: wiley, 2002). marc truitt editorial: the space in between, or, why ital matters marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. ■■ the space in between in my opinion, ital has an identity crisis. it seems to try in many ways to be scholarly like jasist, but lita simply isn't as formal a group as asist. on the other end of the spectrum, code4lib is very dynamic, informal and community-driven. ital kind of flops around awkwardly in the space in between. —comment by a respondent to ital's reader survey, december 2009 last december and january, you, the readers of information technology and libraries, were invited to participate in a survey aimed at helping us learn your likes and dislikes about ital, and where you'd like to see this journal go in terms of several important questions. the responses provide rich food for reflection about ital, its readers, what we do well and what we don't, and our future directions. indeed, we're still digesting and discussing them, nearly a year after the survey. i'd like to use some of my editorial space in this issue to introduce, provide an overview of, and highlight a few of the most interesting results. i strongly encourage you to access the full survey results, which i've posted to our weblog italica (http://ital-ica.blogspot.com/); i further invite you to post your own thoughts there about the survey results and their meaning. we ran the survey from mid-december to mid-january.
a few responses trickled in as late as mid-february. the survey invitation was sent to the 2,614 lita personal members; nonmembers and ital subscribers (most of whom are institutions) were excluded. we ultimately received 320 responses—including two from individuals who confessed that they were not actually lita members—for a response rate of 12.24 percent. thus the findings reported below reflect the views of those who chose to respond to the survey. the response rate, while not optimal, is not far from the 15 percent that i understand lita usually expects for its surveys. as you may guess, not all respondents answered all questions, which accounts for some small discrepancies in the numbers reported. who are we? in analyzing the survey responses, one of the first things one notices is the range and diversity of ital’s reader base, and by extension, of lita’s membership. the largest groups of subscribers identify themselves either as traditional systems librarians (58, or 18.2 percent) or web services/development librarians (31, or 9.7 percent), with a further cohort of 7.2 percent (23) composed of those working with electronic resources or digital projects. but more than 20 percent (71) come from the ranks of library directors and associate directors. nearly 15 percent (47) identify their focus as being in the areas of reference, cataloguing, acquisitions, or collection development. see figure 1. the bottom line is that more than a third of our readers are coming from areas outside of library it. a couple of other demographic items: ■■ while nearly six in ten respondents (182, or 57.6 percent) work in academic libraries, that still leaves a sizable number (134, or 42.3 percent) who don’t. more than 14 percent (45) of the total 316 respondents come from the public library sector. ■■ nearly half (152, or 48.3 percent) of our readers indicated that they have been with lita for five years or fewer. note that this does not necessarily indicate the age or number of years of service of the respondents, but it’s probably a rough indicator. still, i confess that this was something of a surprise to me, as i expected larger numbers of long-time members. and how do the numbers shake out for us old geezers? the 6–10 and greater-than-15-years cohorts each composed about 20 percent of those responding; interestingly, only 11.4 percent (36) answered that they’d been lita members for between 11 and 15 years. assuming that these numbers are an accurate reflection of lita’s membership, i can’t help but wonder about the explanation for this anomaly.” see figure 2. how are we doing? question 4 on the survey asked readers to respond to several statements: “it is important to me that articles in ital are peerreviewed.” more than 75 percent (241, or 77.2 percent) answered that they either “agreed” or “strongly agreed.” “ital is timely.” more than seven in ten respondents (228, or 73.0 percent) either “agreed” or “strongly agreed” that ital is timely. only 27 (8.7 percent) disagreed. as a technology-focused journal, where time-to-publication is always a sensitive issue, i expected more dissatisfaction on this question (and no, that doesn’t mean that i don’t worry about the nine percent who believe we’re too slow out of the gate). marc truitt editorial: the space in between, or, why ital matters marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 
"i use information from ital in my work and/or i find it intellectually stimulating." by a nearly identical margin to that regarding timeliness, ital readers (226, or 72.7 percent) either "agreed" or "strongly agreed" that they use ital in their work or find its contents stimulating.

"ital is an important benefit of lita membership." an overwhelming majority (248, or 79.78 percent) of respondents either "agreed" or "strongly agreed" with this statement.1 this perception clearly emerges again in responses to the questions about whether readers would drop their lita membership if we produced an electronic-only or open-access ital (see below).

where should we be going?

several questions sought your input about different options for ital as we move forward. question 7, for example, asked you to rank how frequently you access ital content via several channels, with the choices being "print copy received via membership," "print copy received by your institution/library," "electronic copy from the ital website," or "electronic copy accessed via an aggregator service to which your institution/library subscribes (e.g., ebsco)." the choice most frequently accessed was the print copy received via membership, at 81.1 percent (228).

question 8 asked about your preferences in terms of ital's publication model. of the 307 responses, 60.6 percent (186) indicated a preference for continuance of the present arrangement, whereby we publish both paper and electronic versions simultaneously. four in ten respondents preferred that ital move to publication in electronic version only.2 of those who favored continued availability of paper, the great majority (159, or 83.2 percent) indicated in question 9 that they simply preferred reading ital in paper. those who advocate moving to electronic-only do so for more mixed reasons (question 10), the most popular being cost-effectiveness, timeliness, and the environmental friendliness of electronic publication.

a final question in this section asked that you respond to the statement "if ital were to become an electronic-only publication i would continue as a dues-paying member of lita." while a reassuring 89.8 percent (273) of you answered in the affirmative, 9.5 percent (29) indicated that you would likely quit lita, with narrative explanations that clearly underscore the belief that ital—especially a paper ital—is viewed by many as an important benefit of membership. the following comments are typical:

■■ "lita membership would carry no benefits for me."
■■ "dues should decrease, though." [from a respondent who indicated he or she would retain lita membership]
■■ "ital is the major benefit to me as we don't have funds for me to attend lita meetings or training sessions."
■■ "the paper journal is really the only membership benefit i use regularly."
■■ "actually my answer is more, 'i don't know.' i really question the value of my lita membership. ital is at least some tangible benefit i receive. quite honestly, i don't know that there really are other benefits of lita membership."

question 12 asked about whether ital should continue with its current delayed open-access model (i.e., the latest two issues embargoed for non-lita members), or go completely open-access. by a three-to-two margin, readers favored moving to an open-access model for all issues. in the following question that asked whether respondents would continue or terminate lita membership were ital to move to a completely open-access publication model, the results were remarkably similar to those for the question linking print availability to lita membership, with the narrative comments again suggesting much the same underlying reasoning.

in sum, the results suggest to me more satisfaction with ital than i might have anticipated; at the same time, i've only scratched the surface in my comments here. the narrative answers in particular—which i have touched on in only the most cursory fashion—have many things to say about ital's "place," suggestions for future articles, and a host of other worthy ideas. there is as well the whole area of crosstabbing: some of the questions, when analyzed with reference to the demographic answers in the beginning of the survey, may highlight entirely new aspects of the data. who, for instance, favors continuance of a paper ital, and who prefers electronic-only?

but to come back to that reader's comment about ital and "the space in between" that i used to frame this discussion (indeed, this entire column): to me, the demographic responses—which clearly show ital has a substantial readership outside of library it—suggest that that "space in between" is precisely where ital should be. we may or may not occupy that space "awkwardly," and there is always room for improvement, although i hope we do better than "flop around"! the results make clear that ital's readers—who would be you!—encompass the spectrum from the tech-savvy early-career reader of code4lib journal (electronic-only, of course!) to the library administrator who satisfies her need for technology information by taking her paper copy of ital along when traveling. elsewhere on that continuum, there are reference librarians and catalogers wondering what's new in library technology, and a traditional systems librarian pondering whether there is an open-source discovery solution out there that might breathe some new life into his lipstick-on-a-pig ils. somewhere else there's a library blogger who fends off bouts of insomnia by reading "wonky" ital papers in the wee hours of the morning. and that ain't the half of it, as they say. in short—in terms of readers, interests, and preferences—"the space in between" is a pretty big niche for ital to serve. we celebrate it. and we'll keep trying our best to serve it well.

figure 1. professional position of lita members
figure 2. years of lita membership

■■ departures

as i write these lines in late september, it's been a sad few weeks for those of us in the ital family. in mid-august, former ital editor jim kopp passed away following a battle with cancer. last week, dan marmion—jim's successor as editor (1999–2004)—and a dear friend to many of us on the current ital editorial board—also left us, the victim of a malignant brain tumor. i never met jim, but lita president karen starr eulogized him in a posting to lita-l on august 16, 2010.3 i noted dan's retirement due to illness in this space in march.4

i first met dan in the spring of 2000, when he arrived at notre dame as the new associate director for information systems and digital access (i think the position was differently titled then) and, incidentally, my new boss. dan arrived only six weeks after my own start there. things at notre dame were unsettled at the time: the libraries had only the year before successfully implemented exlibris' aleph500 ils, the first north american site to do so. while exlibris moved on to implementations at mcgill and the university of iowa, we at notre dame struggled with the challenges of supporting and upgrading a system then new to the north american market. it was not always easy or smooth, but throughout, dan always maintained an unflappable and collegial manner with exlibris staff and a quiet but supportive demeanor toward those of us who worked for him. i wish i could say that i understood and appreciated this better at the time, but i can't. i still had some growing ahead of me—i'm sure that i still do.

dan was there for me again as an enthusiastic reference when i moved on, first to the university of houston in 2003 and then to the university of alberta three years later. in these jobs i'd like to think i've come to understand a bit better the complex challenges faced by senior managers in large research libraries; in the process, i know i've come to appreciate dan's quiet, knowledgeable, and hands-off style with department managers. it is one i've tried (not always successfully) to cultivate.

while i was still at notre dame, dan invited me to join the editorial board of information technology and libraries, a group which over the years has come to include many "friends of dan," including judith carter (quite possibly the world's finest managing editor), andy boze (ital's webmaster), and mark dehmlow. while dan left ital in 2004, i think that he left the journal a wonderful and lasting legacy in these extremely capable and dedicated folks.

my fondest memories of dan concern our shared passion for model trains. i remember visiting a train show in south bend with him a couple of times, and our last time together (at the ala midwinter meeting in denver two years ago) was capped by a snowy trek with exlibris' carl grant, another model train enthusiast, to the mecca of model railroading, caboose hobbies. three boys off to see their toys—oh, exquisite bliss!

i don't know whether ital or its predecessor jola have ever reprinted an editorial, but while searching the archives to find something that would honor both jim and dan, i found a piece that i hope speaks eloquently of their contributions and to ital's reason for being. dan's editorial, "why is ital important?" originally published in our june 2002 issue, appears again immediately following this column. i think its message and the views expressed therein by jim and dan remain as valid today as they were in 2002. they also may help to frame my comments concerning our reader survey in the previous section. farewell, jim and dan. you will both be sorely missed.

notes and references

1. a number of narrative answers to the survey make it clear that ital readers who are lita members perceive a link between membership and receiving the journal. many of them appear to infer that a portion of their lita dues, then, are earmarked for the publication and mailing of ital. sadly, this is not the case. in years past, ital's income from advertising paid the bills and even generated additional revenue for lita coffers. today, the shoe is on the other foot because of declining advertising revenue, but ital is still expected to pay its own way, which it has failed to do in recent years. but to those who reasonably believe that some portion of their dues is dedicated to the support of ital, well, t'ain't so. bothered by this? complain to the lita board.

2. as a point of comparison, consider the following results from the 2000 ital reader survey. respondents were asked to rank several publishing options on a scale of 1 to 3 (with 1 = most preferred option and 3 = least preferred option): ital should be published simultaneously as a print-on-paper journal and an electronic journal (n = 284): 1 = 169 (59.5%); 2 = 93 (32.7%); 3 = 22 (7.7%). ital should be published in an electronic form only (n = 293): 1 = 55 (18.8%); 2 = 61 (20.8%); 3 = 177 (60.4%). in other words, then as now, about 60% of readers preferred paper and electronic to electronic-only.

3. karen starr, "fw: [libs-or] jim kopp: celebration of life," online posting, aug. 16, 2010, lita-l, http://lists.ala.org/sympa/arc/lita-l/2010-08/msg00079.html (accessed sept. 29, 2010).

4. marc truitt, "dan marmion," information technology & libraries 29 (mar. 2010): 4, http://www.ala.org/ala/mgrps/divs/lita/ital/292010/2901mar/editorial_pdf.cfm (accessed sept. 29, 2010).

news and announcements

redi or not . . . "public libraries and the remote electronic delivery of information (redi)," a working meeting, was held in columbus, ohio, on monday and tuesday, march 23 and 24, 1981. the meeting, jointly sponsored by the public library of columbus and franklin county (ohio) and oclc, inc., considered the issues that public libraries must examine before becoming involved in electronic information services. subjects explored included technology, communications, information providers, information users, social implications, and financial, legal, and regulatory responsibilities. tom harnish, program director of oclc's home delivery of library services program, was moderator of the two-day event.
participants at the conference represented a variety of public libraries from throughout the u.s., including new york, georgia, texas, california, colorado, and illinois. don hammer represented lita at the meeting; mary jo lynch of the ala office for research also attended. "geographic distances," said harnish, "were the only points of separation among the meeting participants. there was an overwhelming agreement on the concerns for the future of libraries and universal access to information in the electronic age."

on the second day of the conference it became apparent that the redi agenda could not be properly dealt with in two days. "we need an organization which will address these issues on an ongoing basis," said richard sweeney, executive director of plcfc. "librarians at the conference agreed to promote and lead the development of the electronic library. to that end, this group is seeking recognition by ala as a membership initiative group with a special interest in the electronic library."

the group's founders prepared the following mission statement for the membership initiative group: to ensure that information delivered electronically remains accessible to the general public, the electronic library association shall promote participation and leadership in the remote electronic delivery of information* (redi) by publicly supported libraries and nonprofit organizations. goals of the organization are to:

• identify services and information that are best suited to remote electronic delivery;
• plan, fund, and develop working demonstrations of library redi services;
• communicate the availability of electronic library services to the user community;
• inform the library profession of trends, specific events, and future directions of redi;
• create coalitions with organizations in allied fields of interest.

public libraries and nonprofit organizations with information interests, such as information and referral groups, are invited to join the electronic library association. the group plans to meet at the ala annual conference in san francisco. meeting details will be announced as soon as they are available. it was the goal of the "public libraries and the remote electronic delivery of information" meeting to provide the framework within which to address the myriad issues in redi. the electronic library group will validate the role of libraries in technology. ... redi or not, here we come.

*information delivered electronically where and when it is needed, in the library and elsewhere (home/office/off-site).

arl adopts plan for improving access to microforms

a plan aimed at improving bibliographic access to materials in microform by building a nationwide database of machine-readable records for individual titles in microform sets was approved in principle by the arl board of directors on january 30, 1981. the plan concentrates on monograph collections, and is aimed at providing records for individual titles in both current and retrospective sets. records added to the database will also aid cooperative efforts in preservation microfilming.
elements of the plan include:

• inputting of records conforming to accepted north american standards to the major bibliographic utilities by libraries and microform publishers;
• development of "profile matching" by the bibliographic utilities, permitting the cataloging of all individual titles in a series or microform collection with a single operation;
• cooperative cataloging of current and retrospective microform sets by libraries and publishers;
• compensation for publishers who input acceptable bibliographic records to the bibliographic utilities to offset loss of revenue from card set sales.

cooperation among libraries, publishers, networks, and others has been stressed throughout the development of the plan, and initiatives on a number of fronts are necessary and encouraged in order to accomplish the goal of improved bibliographic access to microforms. arl will seek outside funding for a program coordinator to facilitate implementation of the elements outlined above, and recruitment for the one-year position will begin shortly. the coordinator, advised by a committee of librarians (from arl and non-arl institutions) and microform publishers, will work with libraries, publishers, and the bibliographic utilities to help get the plan off the ground.

the plan is the result of a one-year study funded by a grant from the national endowment for the humanities and conducted for arl by richard boss of information systems consultants, inc. during the course of the year, he interviewed librarians, microform publishers, representatives of the bibliographic utilities, and others interested in bibliographic access to microforms, gradually building the plan from elements on which there was agreement and discarding ideas that were not widely accepted. the effort to build a consensus among the various interested parties was aided by the advisory committee, comprising both arl librarians and microform publishers, which assisted and advised throughout the course of the project. arl will publish the study this spring.

arl sponsorship of this project and its follow-up reflects the long-standing commitment the association has had to improving access to microforms. two earlier arl studies on improving bibliographic access contributed to the development of standards for descriptive cataloging of microforms, reinforced the importance of microforms for preserving and disseminating scholarly materials, and identified some of the problem areas that the current study has addressed. today, as the amount of materials in microform in arl libraries continues to grow (arl libraries hold more than 146,660,000 units of microform), improving access to these materials has taken on even greater urgency.

the association of research libraries is an organization of major research libraries in the united states and canada. members include the larger university libraries, the national libraries of both countries, and a number of public and special libraries with substantial research collections. there are at present 111 institutional members.

battelle studies using computers to access unpublished technical information

engineers may be able to use computers to store, call up, and otherwise display some technical information not currently published in professional journals as a result of a study recently begun by battelle's columbus laboratories.
in a four-month study sponsored by the american society of mechanical engineers (asme), battelle researchers are examining ways to use computers as an alternative to publications for communicating with the technical community. asme is a technical and educational organization with a membership of 100,000 individuals, including 17,000 student members. it conducts one of the largest technical publishing operations in the world, which includes codes, standards, and operating principles for industry.

according to battelle's gabor j. kovacs, certain types of information traditionally are not covered in monthly or quarterly technical journals, yet they often have widespread appeal among engineers. "recent advances in computer and telecommunications technologies, coupled with rapidly rising publication costs and postal rates, have created an ideal environment for organizations to consider using computers as an alternative mode of communication," kovacs said. "data bases can be used to maintain information that is impractical for conventional publication, and it is now possible to use them for many other types of communication as well."

during the study, researchers will determine the feasibility of using a computer database to disseminate to asme members such information as short articles dealing with design and applications data, catalog data, and teleconference messages. with the help of the asme, battelle specialists will define the information requirements for such a system. while technology is sufficiently advanced to accommodate virtually any type of information, costs can become prohibitive unless practical compromises are made, kovacs said. as part of the study, battelle researchers also will analyze the costs associated with systems of varying capabilities. researchers then will define several alternative database systems, which will include such attributes as:

• online, interactive retrieval features
• simple-to-use retrieval language
• user-aid features
• a minimum of seventy-five simultaneous users
• ability to send, store, and broadcast messages
• compatibility with a variety of hard copy and crts (cathode ray tube terminals)
• sixteen or more hours per day availability to accommodate different time zones
• a minimum of thirty-characters-per-second transmission rates

two of these alternative system designs (one representing a minimum capability and the other a maximum capability) then will be selected for further evaluation by battelle and the asme.

editorial: farewell and thank you

john webb

this issue of information technology and libraries (ital), december 2007, marks the end of my term as editor. it has been an honor and a privilege to serve the lita membership and ital readership for the past three years. it has been one of the highlights of my professional career. editing a quarterly print journal in the field of information technology is an interesting experience. my deadlines for the submission of copy for an issue are approximately three and a half months prior to the beginning of the month in which the issue is published; for example, my deadline for the submission of this issue to ala production services was august 15. therefore, most articles that can appear in an issue were accepted in final form at least five months before they were published. some are older; one was a baby at only four months old.
when one considers the rate of change in information technologies today, one understands the need for blogs, wikis, lists, and other forms of professional discourse in our field. what role does ital play in this rapidly changing environment? for one, unlike these newer forms, it is double-blind refereed. published articles run a peer review gauntlet. this is an important distinction, not least to the many lita members who work for academic institutions. it may be crass to state it so baldly, but publication in ital can help one earn tenure, an old-fashioned fact of life. it is indexed or abstracted in nineteen published sources, not all of them in english. many of its articles appear in various digital repositories and archives, and these also are harvested or indexed or both. in addition, its articles are cataloged in worldcat local. many of lita's most prominent members—your distinguished peers—have published articles in ital. the journal also serves as a source for the wider dissemination of sponsored research, a requirement of most grants. and you can read it on the bus or at the beach (heaven forbid!), in the brightest sunlight, or with a flashlight under the covers (though there are no reports of this ever having been observed).

i am amazed at how quickly these three years have passed, though that may be at least as much a function of my advanced age as of the fun and pleasure i have had as editor. certainly, these past three years have hosted some notable landmarks in our history. lita and ital both celebrated their fortieth anniversaries. sadly, the death of one of lita's founders and ital's first editor, frederick g. kilgour, on july 31, 2006, at age ninety-two, was a landmark in the passing of an era. oclc and rlg's merger, which fred lived to witness, was a landmark of a different sort—one of maturity, we hope. ital is now an electronic as well as a print journal. this conversion has had some rough passages, but i trust these will have been ironed out by the time you read this.

when i became editor, i had a number of goals for the journal, which i stated in my first editorial in march 2005. reading that editorial today, i realize that we successfully accomplished the concrete ones that were most important to me then: increasing the number of articles from library and i-school faculty; increasing the number that result from sponsored research; increasing the number that describe any relevant research or cutting-edge advancements; increasing the number of articles with multiple authors; and finding a model for electronic publication of the journal. the accomplishment of the most abstract and ambitious goal, "to make ital a destination journal of excellence for both readers and authors," only you, the readers and authors, can judge.

i thank mary taylor, lita executive director, and her staff for all of the support they provided to me during my term. i owe a debt that i can never repay to all of the staff of ala production services who worked with me these past three years. their patience with my sometimes bumbling ways was award-winning. thank all of you. the lita presidents and other officers and board members were unfailingly supportive, and i thank you all. in the lita organizational structure, the ital editor and the editorial board report to the lita publications committee, and the editor is a member of that body. i thank all of the chairs and other members of that committee for their support.
once more, and sadly for the last time, i thank all of the members of the ital editorial board who served during my term for their service and guidance. they perform more than their share of refereeing, but more importantly, as i have written before, they are the junkyard dogs who have kept me under control and prevented my acting on my worst instincts. i say again, you, the lita membership and ital readership, owe them more than you can ever guess. trust me.

to marc truitt, ital managing editor and the incoming ital editor for the 2008–2010 volume years, i must say, "thank you, thank you, thank you!" marc and the ala production services staff were responsible for the form, fit, and finish of the journal issues you received in the mail, held in your hands, and read under the covers. finally, most of all, thank you, authors whose articles, communications, and tutorials i have had the privilege to publish, and you whose articles have been accepted and await publication.

john webb (jwebb@wsu.edu) is a librarian emeritus, washington state university, and editor of information technology and libraries.

not only is this the end of my term as editor, but i also have retired. from now on, my only role in the field of library and information technology will be as a user. those of you who have seen the movie the graduate probably remember the early scene when benjamin, the dustin hoffman character, receives the single word of advice regarding his future: "plastics." (i don't know if that scene is in the novel from which the movie was adapted.) my single word of advice to those of you too young or too ambitious to retire from our field is: "handhelds." i am surprised that my treo is more valuable to me now in retirement than it was when i was working. (i'm not surprised that my ipod video is, nor that word thinks that treo and ipod are misspellings.) i just wish that more of the web was as easily accessible on my treo as are google maps and almost all of yahoo!. handhelds. trust me.

rural public libraries and digital inclusion: issues and challenges

brian real, john carlo bertot, and paul t. jaeger

information technology and libraries | march 2014

abstract

rural public libraries have been relatively understudied when compared to public libraries as a whole. data are available to show that rural libraries lag behind their urban and suburban counterparts in technology service offerings, but the full meaning and effect of such disparities is unclear. the authors combine data from the public library technology and access study with data from smaller studies to provide greater insight to these issues. by filtering these data through the digital inclusion framework, it becomes clear that disparities between rural and nonrural libraries are not merely a problem of weaker technological infrastructure. instead, rural libraries cannot reach their full customer service potential because of lower staffing (but not lower staff dedication) and funding mechanisms that rely primarily on local monies. the authors suggest possible solutions to these disparities while also discussing the barriers that must be overcome before such solutions can be implemented.

introduction

despite their large numbers, rural public libraries in the united states are surprisingly understudied, particularly in terms of technology access.
the american library association (ala) and other professional organizations consider a public library to be small or rural if its population of legal service area is 25,000 or less. when viewed through this lens, rural public libraries1

• have on average less than one (.75) librarian with a master's degree from an ala-accredited institution;
• have an average of 1.9 librarians, defined as employees holding the title of librarian;
• have an average total of 4.0 staff, including both full- and part-time employees;
• have a median annual income (from all sources) of $118,704.50;
• have an average of 41,425 visits annually; and
• typically have one building or branch that is open an average of 40 hours/week.

brian real (breal@umd.edu) is a phd candidate in the college of information studies, john carlo bertot (jbertot@umd.edu) is co-director of the information policy and access center and professor in the college of information studies, and paul t. jaeger (pjaeger@umd.edu) is co-director of the information policy and access center and associate professor and diversity officer of the college of information studies, university of maryland, college park, maryland.

while these data suggest rural libraries operate on a smaller and less financially robust scale than their suburban and urban counterparts, the full effect of these discrepancies on service levels is unclear. this article uses various information sources to analyze the effect of these discrepancies on the ability of rural libraries to offer technology-based services. since the advent of the internet in the mid-1990s, public libraries have been key internet-access and technology-training providers for their communities. the ability to offer internet access alongside support and training for patrons using such technology is a primary indicator of libraries' value to their communities. by analyzing data from the 2012 public library funding and technology access survey (plftas), the authors found that rural libraries, on average, have weaker technological infrastructure (such as fewer average numbers of computers and slower broadband connections) and are able to offer fewer support services, such as training classes, than urban and suburban public libraries. with public libraries being the only source of broadband access for many patrons in rural communities, limitations for rural libraries may affect patrons' ability to fully participate in employment, education, government, and other central aspects of society. through analysis of the plftas data2 about technology access in rural public libraries in conjunction with other studies of rural libraries and librarians, this article explores the causes and effects of the relatively more limited technological and support infrastructures for rural patrons and communities.

method

as documented since 1994,3 public libraries were early adopters of internet-based technologies. the purpose of the plftas survey, and its previous iterations, is to identify public library internet connectivity; propose and promote public library internet policies at the federal level; maintain selected longitudinal data as to the connectivity, services, and deployment of the internet in public libraries; and provide national estimates regarding public library internet connectivity.
through changes in funding sources and frequency of administration over the past two decades, the survey has maintained core longitudinal questions (e.g., numbers of public access workstations, bandwidth), but consistently explored a range of emerging topics (e.g., jobs assistance, e-government, emergency roles). the survey's method has evolved over time to meet changing survey data goals. the 2012 survey provides both national and state estimates of public library internet connectivity, public access technologies, and internet-enabled services and resources.

the survey used a stratified "proportionate to size" sample to ensure a proportionate national sample, using the fy2009 imls public library dataset (formerly maintained by the us national center for education statistics) to draw its sample. strata included the states in which libraries resided and metropolitan status (urban, suburban, rural) designations. bookmobile and books by mail service outlets were removed from the file, leaving 16,776 library outlets. the study team drew a sample with replacement of 8,790 outlets, stratified and proportionate by state and metropolitan status.4 the survey received 7,252 responses for a response rate of 82.5%. using weighted analysis, the responses are used to generate national and state estimates for all public library outlets (minus bookmobiles and books by mail), both in the aggregate and by metropolitan status designation. unless otherwise noted, all data discussed in the article are from the 2012 study. that study, along with all previous public libraries and the internet and public library funding and technology access studies, additional analysis, and data products, are available at http://www.plinternetsurvey.org.

digital inclusion and the value of public libraries

digital inclusion is a useful framework through which one can understand the importance of ensuring individuals have access to digital technologies as well as the means to learn how to use them.5 digital inclusion comprises policies and actions that mitigate the significant, interrelated problems of the digital divide and digital literacy:

• digital divide implies the gap—whether based in socioeconomic status, education, geography, age, ability, language, or other factors—between individuals for whom internet access is readily available and those for whom it is not. indeed, even those with basic, dial-up internet access are losing ground as internet and computer technologies continue to advance, using increasing bandwidth and demanding high-speed ("broadband") internet access.
• digital literacy encompasses the skills and abilities necessary for access once the technology is available, including understanding the language and component hardware and software required to successfully navigate the technology.
• digital inclusion refers to policies developed to close the digital divide and promote digital literacy. it marries high-speed internet access (as dial-up access is no longer sufficient) and digital literacy in ways that reach various audiences, many of whom parallel those mentioned within the digital divide debate. to match the current policy language, digital inclusion will signify outreach to unserved and underserved populations.

since virtually every public library in the united states offers public internet access, these institutions are invaluable in promoting digital inclusion.
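to make the weighted analysis described in the method section above a little more concrete, the sketch below shows one simple way estimates can be produced from a stratified sample. this is a minimal, hypothetical illustration rather than the study's actual weighting procedure: it assumes plain design weights (outlets in a stratum divided by respondents in that stratum), and the function name and toy data are invented for the example.

# a minimal, hypothetical sketch of design-weighted estimation from a
# stratified sample; not the plftas study's actual weighting procedure
from collections import defaultdict

def weighted_estimates(population_counts, responses):
    """population_counts: {stratum: total library outlets in that stratum}
    responses: iterable of (stratum, answered_yes) pairs from respondents."""
    respondents = defaultdict(int)
    yeses = defaultdict(int)
    for stratum, answered_yes in responses:
        respondents[stratum] += 1
        if answered_yes:
            yeses[stratum] += 1
    estimated_yes = 0.0
    estimated_total = 0.0
    for stratum, n_outlets in population_counts.items():
        if respondents[stratum] == 0:
            continue  # no respondents in this stratum; a real study would adjust for this
        weight = n_outlets / respondents[stratum]  # each respondent "stands in" for this many outlets
        estimated_yes += yeses[stratum] * weight
        estimated_total += respondents[stratum] * weight  # equals n_outlets
    return estimated_yes, estimated_yes / estimated_total

# toy usage: estimate how many outlets offer a given service, and the share that do
population = {("md", "rural"): 120, ("md", "urban"): 60}  # invented counts
answers = [(("md", "rural"), True), (("md", "rural"), False), (("md", "urban"), True)]
count, share = weighted_estimates(population, answers)
print(round(count), round(share, 2))  # 120 0.67

real survey weighting typically also adjusts for nonresponse and other design features, so this should be read only as an outline of the general idea.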
however, the plftas data shows that not all libraries are equal, with rural public libraries lagging behind libraries in more populated areas in providing technology services. therefore this article focuses on the following issues and questions: • digital divide: why do rural individuals have less access to broadband technologies than their suburban and urban counterparts? how are rural libraries currently compensating for this deficit? rural public libraries and digital inclusion | real, bertot, and jaeger 9 • digital literacy: why do rural libraries offer less digital literacy training and patron support? how do rural libraries compare to libraries in more populated areas on key issues in digital literacy, such as employment and government information? • digital inclusion: what policies have been developed to help rural libraries close the digital divide and promote digital literacy, and what policies—including funding structures and decisions—hinder these libraries from adequately addressing these concerns? what governmental and extra-governmental policies can be enacted to help rural libraries to better promote digital inclusion? the following section describes the differences between rural libraries and their urban and suburban counterparts, combining plftas data with information from other studies to demonstrate how rural libraries are more essential in bridging the digital divide yet are seemingly doing less to promote digital literacy. following this, the authors discuss why rural libraries trail suburban and urban libraries in these areas, with studies suggesting the issue is a result of inadequate resources, not a lack of staff dedication. finally, the authors present a review of some of the initiatives that are attempting to bridge these divides, including suggestions that may help rural librarians to act as better advocates for their patrons’ needs. rural challenges to digital inclusion numerous studies, including plftas, show that rural libraries offer less technology access with slower connection speeds than libraries in more populated areas. these libraries also offer comparatively less formalized digital literacy training, although rural libraries still provide invaluable informal training in this area. this section highlights discrepancies between rural libraries and those in more populated areas. technology and service disparities between rural and nonrural libraries while almost every public library offers patrons internet access, 70.3% of rural libraries are the only free internet and computer terminal access providers in their service communities, compared to 40.6% of urban and 60.0% of suburban libraries.6 the disparity between these categories becomes more striking when one considers the difference between home broadband adoption in rural and nonrural areas. according to the pew research center’s home broadband 2010 survey, only 50% of rural homes have broadband internet access, compared to 70% of nonrural homes.7 this disparity is due in large part to the greater difficulty and cost of creating the infrastructure to support broadband internet access in more sparsely populated areas.8 with broadband access provided primarily by for-profit companies, little profit motive exists to expand services to areas where the infrastructure cost would not allow for a quick and efficient recouping of costs. 
the us government has attempted to address this problem in a numerous ways, including dedicating $7.2 billion to improving broadband access throughout the country through grants (broadband information technology and libraries | march 2014 10 technology opportunity programs; btop) and loans (broadband infrastructure projects; bip) as part of the american recovery and reinvestment act (arra) of 2009.9 expanding this infrastructure will take time, and at this time it is unknown as to the extent to which broadband access in rural communities, both in general and for public libraries, will increase. as the arra projects near completion, it will be important to conduct follow-up analysis of the effect in terms of access to broadband in the home and in anchor institutions such as public libraries, as well as the extent to which broadband subscriptions increased. at present, however, public libraries—and rural public libraries in particular—are still the primary source of broadband access for many americans, and this will likely remain true for large portions of the population for the foreseeable future. individuals in need of internet access have few options in many communities. though there are increasing free wireless (wi-fi) internet access sources in communities (e.g., coffee shops, food outlets), one needs to have a device (e.g., tablet, laptop) to use these options. in two-thirds of american communities, the public library is the only source of freely available public internet access inclusive of public access computers.10 specific government efforts to increase internet access, broadband networks, and digital literacy of the population, however, fail to involve public libraries in a meaningful way, if at all.11 to be fair, public libraries were eligible to compete for the grants or submit loan applications for the arra broadband funding initiatives, and public libraries in states such as alaska, arizona, colorado, idaho, maine, montana, nebraska, and others have benefited from this, primarily through inclusion in applications with multiple beneficiaries.12 since btop works as a grantmaking process, relatively few us public libraries (approximately 20%) have benefited from btop funding, but the results have been encouraging. for example, 85 libraries in mostly rural nebraska have upgraded their broadband capacity using btop funds, with broadband capacity for these locations increasing from an average of 2.9 mbps to an average of 18.2 mbps. other states have tried innovative ideas, such as the idaho department of labor’s youth corps program to train high school and college students to work as digital literacy coaches, and then deploy them to libraries around the state. indeed, the btop program has certainly created some encouraging results, but this is not a permanently funded program and it targets a limited number of libraries, so it cannot be considered as a primary, widespread solution to the digital inclusion gap between rural and more populated areas. the authors of a recent btop report note, “unless strategic investments in u.s. public libraries are broadened and secured, libraries will not be able to provide the innovative and critical services their communities need and demand.”13 thus btop may provide a good model to addressing gaps in digital inclusion, but it was never designed to be a permanent solution. 
this role of ensuring digital inclusion in communities has accelerated at a time of unprecedented austerity nationally and at the state and local levels of government in particular. based on bureau of labor statistics (http://www.bls.gov) data, the united states lost 584,000 public-sector jobs between june 2009 and april 2012, or 2.5% of the local, state, and federal government jobs that rural public libraries and digital inclusion | real, bertot, and jaeger 11 existed before the prolonged economic downturn began. according to the center on budget and policy priorities, state budget shortfalls have ranged from $107 billion to $191 billion between 2009 and 2012, and current projections place state budget shortfalls at $55 billion for 2013.14 the prolonged economic downturn, in part, has driven up library usage in some communities.15 even before the downturn began, public libraries in the rural areas typically had the oldest computing equipment, the slowest internet access speeds, and the lowest support levels from the federal government.16 as a part of becoming the main source of digital literacy training and digital inclusion, public libraries have also become a primary training provider for in-demand, technology-based job skills.17 the resulting situation forces public libraries to balance reduced support, increased demand, and a growing centrality in helping their communities recover from the economic downturn. at the center of both increased demand and increased support of digital literacy and inclusion lies sufficient internet access. in a survey of rural librarians in tennessee, respondents reported that their patrons’ most critical information need was broadband internet access.18 the respondents also ranked access to recent hardware technology and software, technology training, and help with specific tasks like applying for jobs or government benefits as highly critical. by comparison, the respondents ranked traditional services such as book loaning as the least critical duty, significantly trailing the abovementioned and other technology services. despite rural librarians viewing technology-based services as their most important function, however, rural libraries lack the resources to meet the same service quality as nonrural libraries. the ensuing section discusses the nature of those disparities. technology infrastructure and technology training virtually all public libraries offer their patrons access to the internet. there is no statistical difference between rural, suburban, and urban libraries in this regard.19 likewise, rural libraries only lag slightly in wireless internet availability, which is becoming increasingly important with the ubiquity of mobile technology devices; 86.3% of rural libraries have wireless access available for patrons, compared to an average of 90.5% across all three categories.20 and, in one of the few technological areas where rural libraries lead their nonrural counterparts, 42.3% of rural libraries reported they had sufficient public access computer terminals at all times, compared to 33.5% of suburban and 12.9% of urban libraries. 
while the number of rural library computer terminals may be adequate in many locations, hardware quality suffers; 69.5% of rural libraries replace their public access computer terminals as needed, while 66.4% of urban libraries have a technology replacement schedule.21 for many small libraries with only a single full-time librarian, that employee also serves as the it specialist for the location.22 therefore many rural libraries have less up-to-date technologies and less technical support than their nonrural counterparts. even if the librarians who also provide it support for their locations are qualified to fulfill this role, the greater issue is limited time for librarians to work on these issues in addition to other duties. in addition to less recent hardware, rural libraries also have limited bandwidth; 31.1% of rural libraries operate on bandwidths of 1.5 mbps (t1) or less, compared to only 18.3% of suburban libraries and a mere 9.7% of urban libraries.23

the greatest issues facing rural libraries are not well represented by the broader categories of internet access but instead in the implementation of services to make these technologies highly useful and effective for patrons. only 31.8% of rural libraries offer formal technology training classes, as compared to 63.2% of urban and 54.0% of suburban libraries.24 this comparison alone does not present a problem, since more populated areas have larger customer bases that justify training patrons in groups rather than in one-on-one sessions. however, rural libraries also trail significantly in offering one-on-one technology training, with only 30.1% of rural libraries providing such programs, compared to 43.4% of urban and 37.9% of suburban libraries. only 21.9% of rural libraries have online training materials, compared to 36.3% and 33.7% of urban and suburban libraries, respectively. in fact, 12.5% of rural libraries do not offer planned technology training at all, compared to a mere 5.1% of urban libraries and 8.0% of suburban libraries. therefore, while most patrons in nonrural areas who have limited technology skills can go to their local library and acquire such skills for free, such access to the resources for personal advancement is drastically limited by comparison in rural areas. since many rural residents do not have internet access in their homes, many of these individuals do not own computers and have limited technology skills resulting from limited technology exposure. this makes the technology training disparity between rural and nonrural libraries quite problematic, since most americans need these skills to maintain a high standard of living and employment.

employment assistance

while public libraries in all areas saw adequate staffing as a statistically similar problem for helping patrons find jobs—51.9% of rural librarians agreed this was a challenge, only slightly exceeding the overall average of 49.8%—the greater issue is the disparity of confidence levels in assisting patrons in employment matters.25 nearly half (48.3%) of rural survey respondents agreed a lack of staff expertise was a challenge to helping patrons find and apply for jobs online, compared to 27.9% of urban and 37.7% of suburban libraries. the internet has become essential for many people who wish to gain employment, thus rural public librarians' inability to support rural residents with limited technology skills is problematic.
many government agencies, hospitals, and private employers—including walmart, the largest employer in the united states—will no longer accept paper applications, but instead insist potential employees submit applications via the internet.26 this can be especially challenging for individuals who have recently lost jobs they have held for decades, as they simultaneously need to refresh basic application and interviewing skills while learning how to use unfamiliar information technologies to find and apply for jobs. librarians can offer critical assistance in these cases, especially for individuals who do not own a computer or have internet access in their homes. however, inequities in staffing between rural rural public libraries and digital inclusion | real, bertot, and jaeger 13 and nonrural libraries can prevent rural residents from having equal access and aid in finding careers. government service access rural libraries also lag behind libraries in more populated areas in providing support for accessing online government services. there is no statistically significant difference between public libraries in staff providing assistance to patrons who need help filling out forms, with 96.6% of all libraries offering this service.27 however, only 45.6% of rural libraries assist customers in understanding government programs and services, compared to 57.8% of urban and 52.9% of rural libraries. rural libraries are also far less likely to have formal guides to help customers understand these government services, with only 15.3% of rural libraries offering such products as compared to 33.6% of urban and 22.2% of suburban counterparts. just 6.2% of rural libraries offer formal training classes for using government websites, understanding government programs, and completing forms. roughly a one-fourth (24.5%) of urban and 11.9% of suburban libraries offer such services. in terms of staff expertise, 20.0% of rural libraries reported having at least one staff member with significant knowledge and expertise of government services, compared to 31.4% of urban and 25.0% of suburban libraries. therefore, while most public libraries help patrons access government services, rural libraries lag substantially in the type of formal planning that may make patrons more aware of government services that would improve their quality of life. important services such as voter registration, motor vehicle services, payment of taxes, and school enrollment for children can now be done either only or much more efficiently online.28 these online services are more convenient for many americans, but “while many members of the public may struggle with accessing or using egovernment information and services, government agencies have come to focus on it as a means of cost savings rather than increasing access for members of the public.”29 government agencies have for the most part not taken many americans' lack of digital literacy into account when shifting their primary means of service to the digital realm, nor have they considered the effect this shift has on public libraries as the primary internet provider for many americans. this has led to extra responsibilities for rural public libraries but not a direct increase in resources. one might consider that rural libraries offer fewer of these services, or have less expertise in providing digital government services, in part because such services are not in demand by patrons. 
however, government services have steadily moved online, and the pace is accelerating toward an e-only means of interacting with government. the open government movement,30 combined with the federal government's release of the technology and services blueprint, signals the further use of technologies to offer innovative and operational digital government services—both through more traditional web-based services and mobile applications.31 state and local governments are also increasingly engaging in e-government services such as unemployment and social service benefits, taxation, licensing, and more. in short, federal, state, and local governments are moving rapidly to a range of e-services that will require librarians to be fluent with technologies, government services, and government information to better help their communities navigate the challenges of e-government.

government intervention in digital literacy

although most government agencies have not considered the effect their shift to primarily digital services has on individuals who lack basic digital literacy, the federal communications commission launched two programs that could help with the digital literacy problem. the first of these, digitalliteracy.gov, is designed to provide individuals with tools to facilitate digital inclusion, helping users to acquire skills that will make them more capable in the modern information environment. the challenge with this approach is that many resources on the website are designed for individuals who need such skills and who therefore probably do not have access to the internet or possess the skills to fully engage the resources. moreover, most of these resource links point to external sites, which are organized by arbitrary user ratings rather than skill level and relevance.32 likewise, educator resources, which should be most valuable in helping librarians educate patrons, are presented as links to external sites with limited information about each resource. these resources may be able to help patrons, but a collaborative effort that includes public librarians in creating resources could better target particular patron needs in a public access setting. a newer project, connect2compete, has demonstrated more promising progress in this area. connect2compete is a partnership between the fcc and private businesses to provide low-cost internet and computers to low-income families, digital literacy training, and other services.33 its partners also publicize the digital literacy divide, working with the ad council and other organizations to promote this issue.34 the website allows users to search for places where they can receive digital literacy training, with the search results primarily displaying local public library branches. however, despite pointing users to public libraries for such training, connect2compete currently only helps to fund such training in limited cases. while this program provides a strong model for raising awareness about digital inclusion, it is unlikely to provide infrastructure resources to fully bridge the gap between rural and nonrural communities in the near future. while the fcc has been innovative in soliciting private funds so that connect2compete does not use any taxpayer funds, these private funds will not replace the need for government funds for public libraries throughout the nation, nor is private funding likely to continue indefinitely.
indeed, "while governments at all levels are relying on public libraries to ensure digital inclusion, the same governments are reducing the funding of the very libraries that are being relied on."35 the following section will detail how decreasing funding and limited resources have contributed to the digital divide between rural and nonrural libraries.

rural libraries and barriers to promoting digital inclusion

when the internet was emerging in the 1990s, "public libraries essentially experimented with public-access internet and computer services, largely absorbing this service into existing service and resource provision without substantial consideration of the management, facilities, staffing, and other implications of public-access technology services and resources."36 while some libraries have increased their funding levels to match these challenges, most funding agencies have not recognized the costs or value of additional services that public libraries now offer in a wired nation. this section discusses the reasons why rural libraries have not been able to offer the same level of service nonrural library patrons routinely expect.

funding inadequacies for rural libraries

rural libraries face challenges from their problematic funding structure. sin noted that for public libraries, "on average, the local government provided 76.6% of the funding; the state, 8.7%; the federal government, 0.7%; and other sources, 13.9%."37 this is a particular problem for rural libraries since, as holt explained, "if cities and suburbs had to survive on the extraordinarily low taxes on agricultural property, the urban/suburban public sector would have service levels so low that most officials would turn away in disgust."38 this lack of local revenue for all public services—including libraries—in rural areas is exacerbated by the continuing population decrease in small towns and the desirability of such locales for retiring seniors, who prefer to live in areas with low taxes because of limited incomes.39 in other words, public library funding structures that place local governments at the forefront of budgeting plans put rural libraries at a serious disadvantage and promote a digital divide between rural and nonrural areas. holt notes, "it is the legitimate function of state government to make things right. state governments, after all, are of a size and scale that historically allows them to perform as equity agencies for locales."40 indeed, the averages for funding sources cited above can vary, and state and federal governments have attempted to dampen the funding inequities between rural and nonrural libraries. one example is the federal e-rate program, established under the telecommunications act of 1996 to provide schools, libraries, and healthcare providers with a discounted "education rate" for communication technologies, including internet technologies.41 while this has subsidized part of the internet service costs for libraries throughout the nation, many libraries do not apply because they do not know they are eligible or the application process is too complicated. some rural libraries have had the advantage of their state library systems applying on their behalf, but even when funding is provided this only covers parts of the libraries' connection and equipment costs.
and, according to the plftas survey, only 61.5% of rural libraries received e-rate funds, compared to 75.0% of urban libraries, showing the program does not favor the class of libraries with the greatest connectivity issues.42 likewise, as noted above, the federal government designated $7.2 billion from the american recovery and reinvestment act of 2009 for improving broadband access throughout the nation, with funding designated for rural areas and public libraries in general. these improvements will take time, though, and will not fully compensate for the lack of local funds for rural libraries or for rural libraries not receiving nearly as much in nongovernmental funds as nonrural libraries.43 additionally, while local governments in some areas have created their own broadband infrastructure to compensate for corporate providers' unwillingness to expand to some areas due to inadequate predicted profits, nineteen state governments banned such practices due to lobbying efforts from the broadband industry.44 the corporations that lobbied for these laws feared that if this became common practice, local governments could offer low enough pricing to compete against for-profit services. while this may be a legitimate concern, the end result of this legislation is local governments—including rural governments—in some states being legally blocked from allocating funds to solve the market failure that has prevented corporate providers from adequately expanding into rural areas. therefore public libraries' funding and resource structures are inherently stacked against rural institutions. while e-rate and other federal and state programs may mitigate the problem, the ultimate solution needs to be a restructuring of library funding models that takes the primary burden off struggling local governments or at least increases state and federal contributions. in a seminal article on rural libraries and technology written in 1995, vavrek noted that "public libraries cannot survive by only appealing to those who are least likely to be able to pay to support the library. while visions of the homeless person using the internet to locate information is both compassionate and within the social role of the public library, can the library afford to provide this access?"45 beyond patrons not being open to assisting less fortunate individuals, vavrek suggested attempts to diversify library services—including introducing internet technology services, which was novel at the time—could distract resources from libraries' established services that have traditionally appealed to all income classes and, with this, erode public support for these institutions. the pew home broadband 2010 survey shows vavrek's thoughts on this matter were prescient, as 53% of survey respondents believed the government either should not support broadband expansion or that this should not be a very important priority.46 the benefits of greater broadband access and relevant service support may seem obvious to those who are intimate with this matter, but much of the public does not see the importance of expanding such services. if rural librarians cannot fight these perceptions and convince traditional library users and the general public of the importance of these services, then they will probably not be able to reverse these negative trends. unfortunately, rural libraries lack the time, resources, and data to lobby the public on these matters.
staffing and training problems for rural librarians

a lack of funding and resources affects not only rural public libraries, but also rural public librarians. in a study that illustrated such issues, flatley and wyman surveyed a random sample of libraries in extremely rural areas, with their service population baseline being 2,500 as opposed to the 25,000 threshold noted above.47 while the data they collected are somewhat dated (the survey was conducted in 2007), this study still deserves special attention because similar data have not been collected more recently or by other authors. the authors found that 80% of rural libraries have only a single full-time employee, and 50% have two or fewer paid employees when full- and part-time employees are considered.48 these employees are underpaid compared to the national average, with 72% reporting they earned $12.99 or less per hour.49 when asked why they believed their pay was relatively low, more than half (53%) of rural librarians responded it was because their communities lacked funds, demonstrating that the structure of local funding matters more to librarians' salaries than state and federal funding.50 flatley and wyman also found that only 14% of these employees held mls degrees, with 32% having achieved bachelor's degrees and 37% having completed only a high school diploma.51 as one would expect given that most rural librarians lack professional training before entering the field, many of these individuals applied for their first library position because they saw it advertised at their local library and it offered better pay than most other local jobs. while many rural librarians entered the profession for reasons other than a desire to become librarians, the data suggest these individuals are capable and enthusiastic about their jobs. almost half (47%) of rural librarians had worked in the field for more than a decade, with an additional 22% having been librarians for six to ten years.52 two-thirds (66%) of survey respondents stated they intended to remain librarians until retirement age, and 97% responded they were very satisfied or somewhat satisfied with their careers.53 additionally, despite the relatively low pay for library positions, this was not the most common complaint rural librarians had about their jobs. instead, while 27% found low pay to be the greatest issue they faced, 29% felt a lack of funds for new materials was a greater problem.54 therefore, while certain technological issues in rural public libraries—such as the lack of technological training courses for patrons—can be framed accurately as a problem involving rural librarians, these problems should not be framed as the librarians' fault. with current staffing levels, rural librarians do not have as much available staff time to provide training courses and one-on-one training as their suburban and urban counterparts. these librarians may also lack the knowledge and experience to train others in technological skills, and their libraries may lack the funds to help them acquire these abilities.
these factors are outside of these librarians' control, however, and "no matter how hard lis professionals try, one cannot expect public library systems (especially those in less-advantaged neighborhoods) to bridge the information gap when the libraries are themselves underfunded and understaffed."55 considering typical rural librarians' high dedication levels, one can assume they would be willing to remedy information gaps if they first had the resources to fix their libraries' skill, funding, and staffing gaps.

possible solutions

rural libraries face the dual issue of a lack of resources to allow librarians enough time to advocate for their branches and a lack of data that advocates can use to show funders these libraries' value to their communities. as a solution for the latter problem, sin suggested that library and information science (lis) scholars and other prominent figures in the field begin a dialogue with underfunded libraries—including rural institutions—to work with librarians to gather, process, and interpret data on libraries' needs and libraries' effects on their communities.56 this would have the dual benefit of giving librarians better information with which they could focus their services for maximum value and providing graduate-student and professional-level researchers with a stronger understanding of their field. the authors of this article would like to expand on this slightly to suggest that any researchers who draft scholarly papers and presentations from data collected from work with underfunded libraries should feel obligated to assist libraries in using these data for their own benefit. scholars are likely to be in a better position to advocate for libraries with which they collaborate than time- and resource-strapped librarians, and they should feel an ethical responsibility to do so after reaping the benefits of research. more rural librarians also need the skills to empower them to lead technological training courses for patrons, gather data to better understand how to best optimize their services, and lobby for greater funding at the local level. mehra et al. of the school of information sciences at the university of tennessee attempted to remedy this problem to a limited degree with a program they launched in june 2010 with funding from the imls laura bush 21st century librarian program.57 the researchers used this funding to provide full scholarships—including laptop computers and funds for books—to sixteen rural librarians already working in the south and central appalachia regions, allowing them to earn an mls degree in two years of part-time study. the researchers had previously conducted a qualitative survey of rural librarians in tennessee to determine the training and resource needs of rural librarians,58 and they used these data to form a customized mls program for the scholarship students. this included courses focusing on strong technical competencies, service evaluation, grant writing, and other courses of particular relevance to the rural environment.
likewise, georgia uses state funds to pay the salaries of many experienced librarians with mls degrees throughout the state, thereby lifting the burden of affording such individuals off cash-strapped counties and municipalities.59 however, as this system develops in georgia, state funding is still limited and there have been state funding cuts to other areas, such as materials and infrastructure, to allow for an increase in state-funded professional librarians.60 therefore, while this appears to be a promising model that can be of particular benefit to rural residents of the state, further study is needed to determine its overall effects. with an estimated more than 8,000 rural public libraries operating in the united states,61 it would be impossible to find the resources to provide the large majority of librarians without an mls at these locations with the full training needed to earn the degree. even if such funding were available, a large portion—if not the majority—of these resources could be put to better use by improving rural libraries' technological infrastructure, increasing salaries, and growing collections. therefore, while the mls may remain the gold standard for library professionalism, it is not a realistic goal for many experienced and dedicated librarians throughout the country. instead, a more realistic program on a larger scale may be to provide rural librarians with targeted online and in-person training to enhance the skills they feel they need to be more successful. faculty and graduate students in lis academic programs are perhaps the most capable people to lead such training, and they are likely more capable of writing grant proposals to cover the costs of such programs than the rural librarians they could assist. mehra et al. have shown promising progress in this direction,62 and by removing the mls goal (or only expecting it in limited cases), their work could easily be emulated to help lis educators empower librarians throughout the nation. connect2compete, as detailed above, also has the potential to provide a training model for public librarians. the organization plans to create a "digital literacy corps," comprising individuals who will help train portions of the public in basic digital literacy skills.63 while this program is still in its early phases, the organization plans to include librarians among this corps, training them to be better able to train others. once again, this will be achieved through private funds donated by corporate partners. this is certainly a noble effort and will likely benefit many libraries and their patrons, but "having access to training and being able to take advantage of training are two separate things."64 connect2compete, digitalliteracy.gov, and other organizations already provide some resources to help rural librarians understand digital literacy issues and provide better training, but librarians have limited time to familiarize themselves with these sources when dealing with their daily duties. for librarians to use current, future, or more refined training resources, the problem of low staffing—and its cause, low funding—must be addressed. since many rural librarians lack the skill or, more importantly, time to lobby for their own libraries, this is a significant area where partner organizations can help. whether these partners are university departments as envisioned by mehra et al.
and sin or individuals funded by private donations in the connect2compete model is inconsequential. the important issue is that if these partner groups want to truly help rural libraries bridge the digital divide, these groups will have to contribute a significant portion of their efforts to lobbying to increase library funding enough to improve infrastructure and increase staffing—and, through this, staff time—for training and assisting patrons. as discussed above, the btop program has had success both in increasing technological infrastructure and human infrastructure, with grant funding being used in some cases to bring in temporary staff capable of training patrons in digital literacy and to increase training opportunities for patrons using existing staff. given the information above, btop's holistic approach is certainly encouraging, and the program's use of federal funds has shown how resources from above the local level can serve as an equalizing force. the temporary nature and limited funding of this program, however, make it important to remember that this cannot be considered the primary solution to the digital inclusion problem.

conclusion

many rural public libraries are the only providers of free broadband internet service and computer terminals for their communities, and these communities have the lowest average proportion of homes with broadband connections. with the internet being essential for receiving important government services and for applying for jobs with some of the largest and most ubiquitous employers throughout the nation, the value of the services offered by these libraries cannot be overstated. the basic public library funding structure needs to be modified to close the digital inclusion gap between rural and more populated areas. even if local governments remain the primary funding source for public libraries, this contribution cannot remain grossly disproportionate when compared to state and federal support. state and federal governments are already seeing savings by moving access to government services and information online, and these governments will benefit from the better employment rates and better employee competency that come with a digitally inclusive society. since these governments share in the benefits of digital inclusion, they must also share in the costs. some programs have shown promising results in bolstering rural public libraries and, through this, improving the nation's digital inclusion. these results range from large-scale programs such as btop to smaller programs such as the mls education program initiated by mehra et al. a common element of many of these programs, though, is their temporary nature, showing that funders are not recognizing that as technological innovation continues, new problems in digital inclusion will emerge. for government decision makers to understand the ongoing nature of the digital inclusion problem, rural public librarians and their allies—including academics and other stakeholders—will need to gather better data and provide better advocacy.

references

1. "fy2011 public library (public use) data files," institute of museum and library services, http://www.imls.gov/research/pls_data_files.aspx.
2. john carlo bertot et al., 2011–2012 public library funding and technology access survey: survey findings and results (college park, md: information policy and access center, 2012), http://ipac.umd.edu/sites/default/files/publications/2012_plftas.pdf.
3.
the studies began as the public libraries and the internet survey series, funded through various sources until 2006, at which time they became part of the public library funding and technology access study (http://www.ala.org/plinternetfunding), funded by the american library association and the bill & melinda gates foundation.
4. john carlo bertot et al., “public libraries and the internet: an evolutionary perspective,” library technology reports 47, no. 6 (2011): 7–8.
5. paul t. jaeger et al., “the intersection of public policy and public access: digital divides, digital literacy, digital inclusion, and public libraries,” public library quarterly 31, no. 1 (2012): 1–20.
6. bertot et al., 2011–2012 public library funding and technology access survey.
7. aaron smith, home broadband 2010 (washington, dc: pew research center, 2010): 8, http://www.pewinternet.org/~/media/files/reports/2010/home%20broadband%202010.pdf.
8. federal communications commission, connecting america: the national broadband plan (washington, dc: federal communications commission, 2009): xi–xiii, http://download.broadband.gov/plan/national-broadband-plan.pdf.
9. aaron smith, home broadband 2010, 5.
10. john carlo bertot, charles r. mcclure, and paul t. jaeger, “the impacts of free public internet access on public library patrons and communities,” library quarterly 78, no. 3 (2008): 286; bertot et al., “public libraries and the internet,” 12–13.
11. jaeger et al., “the intersection of public policy and public access,” 1–20.
12. us public libraries and the broadband technology opportunities program (btop) (washington, dc: american library association, 2013): 1–2, http://www.districtdispatch.org/wp-content/uploads/2013/02/ala_btop_report.pdf.
13. ibid., 18.
14. “states continue to feel recession’s impact,” center on budget and policy priorities, last modified june 27, 2012, http://www.cbpp.org/cms/index.cfm?fa=view&id=711.
15. deanne w. swan et al., public libraries survey: fiscal year 2010 (imls-2013–pls-01) (washington, dc: institute of museum and library services, 2010).
16. paul t. jaeger et al., “public libraries and internet access across the united states: a comparison by state from 2004 to 2006,” information technology & libraries 26, no. 2 (2007): 4–14, http://dx.doi.org/10.6017/ital.v26i2.3277.
17. natalie greene taylor et al., “public libraries in the new economy: 21st century skills, the internet, and community needs,” public library quarterly 31, no. 3 (2012): 191–219.
18. bharat mehra et al., “what is the value of lis education? a qualitative study of the perspectives of tennessee’s rural librarians,” journal of education for library & information science 52, no. 4 (2011): 272.
19. bertot et al., 2011–2012 public library funding and technology access survey, 15.
20. ibid., 22.
21. ibid., 46.
22. bertot, “public access technologies in public libraries,” 88.
23. bertot et al., 2011–2012 public library funding and technology access survey, 21.
24. ibid., 29.
25. ibid., 42–45.
26. mehra et al., “what is the value of lis education?” 271–72.
27. bertot et al., 2011–2012 public library funding and technology access survey, 36.
28. paul t. jaeger and john carlo bertot, “responsibility rolls down: public libraries and the social and policy obligations of ensuring access to e-government and government information,” public library quarterly 30, no. 2 (2011): 91–116.
29. ibid., 100.
30. the obama administration’s commitment to open government: a status report (washington: government printing office, 2013): 4–7, http://www.whitehouse.gov/sites/default/files/opengov_report.pdf.
31. barack obama, digital government: building a 21st century platform to better serve the american people (washington, dc: office of management and budget, 2012), http://www.wh.gov/digitalgov/pdf.
32. “find educator tools,” digitalliteracy.gov, http://www.digitalliteracy.gov/content/educator.
33. “about us,” everyoneon, http://www.everyoneon.org/c2c.
34. ad council, “ad council & connect2compete launch nationwide psa campaign to increase digital literacy for 62 million americans,” press release, march 21, 2013, http://www.adcouncil.org/news-events/press-releases/ad-council-connect2compete-launch-nationwide-psa-campaign-to-increase-digital-literacy-for-62-million-americans.
35. jaeger et al., “public libraries and internet access,” 14.
36. bertot, “public access technologies in public libraries,” 81.
37. sei-ching joanna sin, “neighborhood disparities in access to information resources: measuring and mapping u.s. public libraries’ funding and service landscapes,” library & information science research 33, no. 1 (2011): 45.
38. glenn e. holt, “a viable future for small and rural libraries,” public library quarterly 28, no. 4 (2009): 288.
39. ibid., 288–89.
40. ibid., 289.
41. paul t. jaeger, charles r. mcclure, and john carlo bertot, “the e-rate program and libraries and library consortia, 2000–2004: trends and issues,” information technology & libraries 24, no. 2 (2005): 57–67.
42. bertot et al., 2011–2012 public library funding and technology access survey, 61.
43. sin, “neighborhood disparities in access,” 51.
44. olivier sylvain, “broadband localism,” ohio state law journal 73, no. 4 (2012): 20–24.
45. bernard vavrek, “rural information needs and the role of the public library,” library trends 44, no. 1 (1995): 26.
46. aaron smith, home broadband 2010, 2.
47. robert flatley and andrea wyman, “changes in rural libraries and librarianship: a comparative survey,” public library quarterly 28, no. 1 (2009): 25–26.
48. ibid., 34.
49. ibid., 35.
50. ibid., 28.
51. ibid., 33.
52. ibid., 26.
53. ibid., 29.
54. ibid., 30.
55. sin, “neighborhood disparities in access,” 50.
56. ibid., 51.
57. bharat mehra et al., “collaborations between lis education and rural libraries in the southern and central appalachia: improving librarian technology literacy and management training,” journal of education for library & information science 52, no. 3 (2011): 238–47.
58. mehra et al., “what is the value of lis education?”
59. “state paid position guidelines,” last updated august 2013, http://www.georgialibraries.org/lib/stategrants_accounting/official_state_paid_position_guidelines-updated-august-2013.pdf.
60. bob warburton, “georgia tweaks state funding formula to prioritize librarians,” library journal, february 2, 2014, http://lj.libraryjournal.com/2014/02/budgets-funding/georgia-tweaks-state-funding-formula-to-prioritize-librarians.
61. bertot et al., 2011–2012 public library funding and technology access survey, 14.
62. mehra et al., “collaborations between lis education and rural libraries”; mehra et al., “what is the value of lis education?”
63. institute of museum and library services, “imls announces grant to support libraries’ roles in national broadband adoption efforts,” press release, june 14, 2012, http://www.imls.gov/imls_announces_grant_to_support_libraries_roles_in_national_broadband_adoption_efforts.aspx.
64. bertot, “public access technologies in public libraries,” 88.

book reviews

epub 3: best practices, by matt garrish and markus gylling. sebastopol, ca: o'reilly. 2013. 345 pp. isbn: 978-1-449-32914-3. $29.99.

there is much of value in this book—there aren't really that many books out right now about the electronic book markup framework, epub 3—yet i have a hard time recommending it, especially if you're an epub novice like me. so much of the book assumes a familiarity with epub 2. if you aren't familiar with this version of the specification, then you will be playing a constant game of catch-up. also, it's clear that the book was written by multiple authors; the chapters are sometimes jarringly disparate with respect to pacing and style. the book as a whole needs a good edit. this is surprising since o'reilly is almost uniformly excellent in this regard. the first three chapters form the core of the book. the first chapter, "package document and metadata," illustrates how the top level container of any epub 3 book is the "package document." this document contains metadata about the book as well as a manifest (a list of files included in the package as a whole), a spine (a list of the reading order of the files included in the book), and an optional list of bindings (a lookup list similar to the list of helper applications contained in the configurations of most modern web browsers).
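to make the package document structure described above concrete, here is a minimal sketch, not taken from the book, that assembles the skeleton of an epub 3 package document with python's standard library; the element and attribute names follow the epub 3 specification, but the identifier, title, and file names are invented, and a valid publication would need further required metadata (for example a dcterms:modified property).

import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("", OPF)
ET.register_namespace("dc", DC)

package = ET.Element("{%s}package" % OPF,
                     {"version": "3.0", "unique-identifier": "pub-id"})
metadata = ET.SubElement(package, "{%s}metadata" % OPF)
ident = ET.SubElement(metadata, "{%s}identifier" % DC, {"id": "pub-id"})
ident.text = "urn:uuid:00000000-0000-0000-0000-000000000000"  # placeholder value
ET.SubElement(metadata, "{%s}title" % DC).text = "an example title"

# manifest: every file in the package, including the navigation document
manifest = ET.SubElement(package, "{%s}manifest" % OPF)
ET.SubElement(manifest, "{%s}item" % OPF,
              {"id": "nav", "href": "nav.xhtml",
               "media-type": "application/xhtml+xml", "properties": "nav"})
ET.SubElement(manifest, "{%s}item" % OPF,
              {"id": "ch1", "href": "chapter1.xhtml",
               "media-type": "application/xhtml+xml"})

# spine: the reading order of the content documents
spine = ET.SubElement(package, "{%s}spine" % OPF)
ET.SubElement(spine, "{%s}itemref" % OPF, {"idref": "ch1"})

print(ET.tostring(package, encoding="unicode"))

printing the element tree shows the same manifest/spine skeleton the chapter describes, with one navigation document and one content chapter listed in reading order.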
the second chapter, "navigation," addresses and illustrates the creation of a proper table of contents, a list of landmarks (sort of an abbreviated table of contents), and a page list (useful for quickly navigating to a specific print-equivalent page in the book). the third chapter, "content documents," is the heart of the core of the book. this chapter addresses markup of actual chapters in a book, pointing out that epub 3 markup here is mostly a subset of html5, but also pointing out such things as the use of mathml for mathematical markup, svg (scalable vector graphics), page layout issues, use of css, and the use of document headers and footers. after reading these first three chapters, my sense is that one is ready to dive into a markup project, which is exactly what i did with my own project. that said, i think a reread of these core chapters is due, which i intend to do presently. the rest of the book is devoted to specialty subjects such as how to embed fonts, use of audio and video clips, "media overlays" (epub 3 supports a subset of smil, the synchronized multimedia integration language, for creating synchronized text/audio/video presentations), interactivity and scripting (with javascript), global language support, accessibility issues, provision for automated text-to-speech, and a nice utility chapter on validation of epub 3 xml files. of these, the chapter on global language support i found to be fascinating. for us native english speakers, it's not immediately obvious some of the problems one will inevitably encounter when trying to create an electronic publication that can work in non-western languages. just consider languages that read vertically and from right to left, for one! as an epub novice, my greatest desire would be for the book to provide, maybe in an appendix, a fairly comprehensive example of an epub 3 marked-up book. maybe this is a tall order? nevertheless, i would love to see an example of marked up text including bidirectional footnotes, pagination, a table of contents, etc.; simple, foundational things, really. examples of each of these are included in the book, but not in one place. having such an example in one place would be something that could be used as a quick-start template for us epub beginners. to be fair, code examples of all of this are up on the accompanying website, and i am using these examples as i learn to code epub 3 for my own project. but having a single, relatively comprehensive example as an appendix to the book would be very useful. as i read this book, something kept bothering me. epub 2 and epub 3 are so very different, with reading systems designed to render epub 3 documents being fairly rare at this point. so if different versions of the same spec are so different, with no guarantee that a future reading system will be able to read documents adhering to a previous version, then the prospect of reading epub documents into the future is pretty sketchy. are e-books, then, just convenient and cool mechanisms for currently reading longish narrative prose—convenient and cool, but transitory? mark cyzyk is the scholarly communication architect in the sheridan libraries, johns hopkins university, baltimore, maryland, usa.

compression word coding techniques for information retrieval

william r. nugent: vice president, inforonics, inc., cambridge, massachusetts

a description and comparison is presented of four compression techniques for word coding having application to information retrieval.
the emphasis is on codes useful in creating directories to large data files. it is further shown how differing application objectives lead to differing measures of optimality for codes, though compression may be a common quality.

introduction

cryptographic studies have documented much useful language data having application to retrieval coding. because unclassified cryptographic studies are few, fletcher pratt's 1939 work (1) remains the classic in its field. gaines (2) has the virtue of being in print, and the more recent cryptographic history of kahn (3), while comprehensive, lacks the statistical data that made the earlier works valuable. the word coding problem for language processing, as opposed to cryptography, has been extensively studied by nugent and vegh (4). information theorists have contributed the greatest volume of literature on coding and have added to its mathematical basis, largely from the standpoint of communications and error avoidance. a brief discussion of compression codes and their objectives is here presented, and then a description of a project encompassing four compression codes having application to retrieval directories. two of the codings are newly devised. one is transition distance coding, a randomizing code that results in short codes of high resolving power. the second is alphacheck. it combines high readability with good resolution, and permits simple truncation to be used by means of applying a randomized check character that acts as a surrogate of the omitted portion. it appears to have the greatest potential, in directory applications, of the codes considered here. recursive decomposition is a selected letter code devised by the author several years ago (4). it has been tested and has the advantages of simple derivation and high resolution. soundex (5) is the only compression code that has achieved wide usage. it was devised at remington rand for name matching under conditions of uncertain spelling.

objectives of compression coding

it is desired to transform sets of variable length words into fixed length codes that will maximally preserve word to word discrimination. in the final directories to be used, the codes for several elements will be accessible to enable the matching of several factors before a file record is selected. the separate codes for differing factors need not be the same length, though each type of code will be of uniform length; nor need the codes for differing factors be derived by the same process. what we loosely call codes must be formally designated ciphers. that is, they must be derivable from the data words themselves, and not require "code books" to determine equivalences. this is so because the file directories must be derivable from file items, entries in directory form must be derivable from an input query, and these two directory items must match when a record is to be extracted. the ciphers need not be decipherable for the application under consideration, and in general are not. fixed length codes, which provide the rough equivalent and simplicity of a margin entry in a paper directory, are generally desirable for machine directories. the functions of the codes will determine their form, and a code or file key designed to meet one objective will generally not be satisfactory for any other objective.
the following typical objectives serve as four examples:

(1) create a file key for extraction of records in approximate file order, as is required for the common sorting and printout problem. a typical code construction rule is to take the first six letters.

johnsen → johnse
johnson → johnso
johnston → johnst
johnstone → johnst

(2) create a file key for extraction of records under conditions of uncertainty of spelling (airline reservation problem). a typical code construction rule is vowel elimination or soundex. a typical matching rule is best match.

vowel elimination / soundex:
johnsen → jhnsn / j525 → j52
johnson → jhnsn / j525 → j52
johnston → jhnstn / j5235 → j52
johnstone → jhnstn / j5235 → j52

(3) create a file key for extraction of records from accurate input, with the objective of maximum discrimination of similar entries (cataloging search problem). typical code construction rules are recursive decomposition coding or transition distance coding.

recursive decomposition / transition distance:
johnsen → jhnsen / bftz
johnson → jhnson / dnwu
johnston → jhston / ziky
johnstone → jhsone / ecrc

for the file keys of primary concern, accurate input data is assumed and the objective is maximum discrimination. desirably, a code would be as discriminating as transition distance coding and be as readable as truncation coding. this can be achieved to some degree by combining the two codes into one, with an initial portion truncated and a final check character representing the remainder via a compressed transition distance code: alphacheck.

(4) create a file key for human readability and high word to word discrimination. possible code construction rules are alphacheck, and simple truncation plus a terminal check character.

johnsen → johnsv
johnson → johnsx
johnston → johnsd
johnstone → johnss

methods

the algorithms for creating the preceding codes are described in the following sections. it is axiomatic that randomizing codes give the greatest possible discrimination for a given code space. the whole trick of creating a good compression code is to eliminate the natural redundancy of english orthography, and preserve discrimination in a smaller word size. letter-selection codes can only half accomplish this, due to the skewed distribution of letter usage. they can eliminate the higher-frequency components, but they cannot increase the use of the lower-frequency components. randomizing codes—often called "hash" codes, properly quasi-random codes—can equalize letter usage and hence make best use of the code space. prime examples here are the variants of godel coding devised by vegh (4) in which the principle of obtaining uniqueness via the products of unrepeated primes is exploited, as it is in the randomizing codes considered here. the problem in design of a randomizing code is that the results can be skewed rather than uniformly distributed due to the skewed nature of the letters and letter sequences that the codes operate on. in transition distance coding, the natural bias of letters and letter sequences is overcome by operating on a word parameter that is itself semi-random in nature.
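as a concrete illustration of the letter-selection rules above, the following short python sketch, ours rather than nugent's, implements simple truncation, vowel elimination, and a basic soundex variant; the digit table is the standard soundex mapping, and the padding behavior is left optional so the outputs match the unpadded examples shown above.

# standard soundex digit table
SOUNDEX_MAP = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        SOUNDEX_MAP[ch] = digit

def truncate6(word):
    # objective (1): first six letters as a fixed-length file key
    return word.lower()[:6]

def strip_vowels(word):
    # objective (2): vowel elimination, keeping the initial letter
    w = word.lower()
    return w[0] + "".join(c for c in w[1:] if c not in "aeiou")

def soundex(word, length=None):
    # objective (2): keep the first letter, code later consonants, collapse
    # repeated digits; length=None leaves the code unpadded, as in the
    # examples above (the standard form pads or truncates to four characters)
    w = word.lower()
    code, prev = w[0], SOUNDEX_MAP.get(w[0], "")
    for ch in w[1:]:
        digit = SOUNDEX_MAP.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not reset the previous digit
            prev = digit
    return (code + "000")[:length] if length else code

for name in ("johnsen", "johnson", "johnston", "johnstone"):
    print(truncate6(name), strip_vowels(name), soundex(name))

run as written, the three functions reproduce the johnsen/johnson/johnston/johnstone truncation, vowel-elimination, and soundex codes listed above.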
the following principle, not quite a theorem, applies: "considering letters in their normal ordinal alphabetic position, and considering letter transitions to be unidirectional and cyclic, the distribution of transition distances in english words is essentially uniform." in view of the fact that letter usage has an extremely skewed distribution, with a probability ratio in excess of 170 to one for the extremes, it is seen that the more uniform parameter of transition distances is a superior one for achieving randomized codes. the relative uniformity of transition distance needs further investigation, but one typical letter digram sample from gaines (2) with 9999 transitions (mean number of occurrences of each distance = 385) yielded a mean deviation of 99 and a standard deviation of 123, and an extreme probability ratio of 3.3 to one for the different transition distances from 0 to 25. the distribution can be made more uniform by letter permutation. permutation is used in the algorithm for transition distance coding but not in alphacheck.

algorithm

the method of transition distance coding is used to operate on a variable length word to achieve fixed length alphabetic or alphanumeric codes that exhibit quasi-random properties. the code is formed from the modulo product of primes associated with transition distances of permuted letters. the method is intended strictly for computer operation, as it is a simple program but an extremely tedious manual operation. there are five steps:

(1) permute characters of natural language word. this breaks the digram dependency that could make the transition distances less uniformly distributed. this step might be dispensed with if the resulting distributions prove satisfactory without it. the permutation process consists of taking the middle letter (or letter right of middle for words with an even number of letters), the first, the last, the second, the next-to-last, etc., until all letters have been used. that is, for a letter sequence

a1, a2, ... ai ..., an

the following permutation is taken:

a(int(n/2)+1), a1, an, a2, an-1, ... a
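the python sketch below is our reconstruction, under stated assumptions, of step (1) and of the general prime-product idea described above; the permutation follows the verbal rule given in the text, while the way the prime product is folded into a fixed-length alphabetic code is our own illustrative choice, so the outputs will not reproduce the bftz/dnwu examples printed earlier.

from string import ascii_lowercase

# one prime per transition distance 0-25
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def permute(word):
    # middle letter (letter right of middle for even lengths), then first,
    # last, second, next-to-last, ... until all letters are used
    w = word.lower()
    mid = len(w) // 2
    rest = w[:mid] + w[mid + 1:]
    out, i, j = [w[mid]], 0, len(rest) - 1
    while i <= j:
        out.append(rest[i])
        if j != i:
            out.append(rest[j])
        i, j = i + 1, j - 1
    return "".join(out)

def transition_code(word, code_length=4):
    # multiply the primes for each cyclic letter-transition distance of the
    # permuted word, reduce modulo 26**code_length, and write the residue
    # in base 26 as letters (this final mapping is our assumption)
    w = permute(word)
    product = 1
    for a, b in zip(w, w[1:]):
        distance = (ascii_lowercase.index(b) - ascii_lowercase.index(a)) % 26
        product *= PRIMES[distance]
    residue = product % (26 ** code_length)
    letters = []
    for _ in range(code_length):
        residue, r = divmod(residue, 26)
        letters.append(ascii_lowercase[r])
    return "".join(reversed(letters))

for name in ("johnsen", "johnson", "johnston", "johnstone"):
    print(name, "->", transition_code(name))

because the code space is used quasi-randomly, similar names such as johnston and johnstone map to dissimilar four-letter keys, which is the discrimination property the article attributes to randomizing codes.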
if the table's content is complex, it may be necessary to provide an alternative presentation of the information. it is best to rely on css for page layout, taking into consideration the directions in subparagraph (d) above.

(i) frames shall be titled with text that facilitates frame identification and navigation. frames are a deprecated feature of html, and their use should be avoided in favor of css layout.

(j) pages shall be designed to avoid causing the screen to flicker with a frequency greater than 2 hz and lower than 55 hz. lights with flicker rates in this range can trigger epileptic seizures. blinking or flashing elements on a webpage should be avoided until browsers provide the user with the ability to control flickering.

(k) a text-only page, with equivalent information or functionality, shall be provided to make a web site comply with the provisions of this part, when compliance cannot be accomplished any other way. the content of the text-only page shall be updated whenever the primary page changes. complex content that is entirely visual in nature may require a separate text-only page, such as a page showing the english alphabet in american sign language. this requirement also serves as a stopgap measure for existing sites that require reworking for accessibility. some consider this to be the web's version of separate-but-equal services, which should be avoided.9 offering a text-only alternative site can increase the sense of exclusion that pwd already feel. also, such versions of a website tend not to be equivalent to the parent site, leaving out promotions or advertisements. finally, a text-only version increases the workload of web development staff, making it more costly than creating a single, fully accessible site in the first place.

(l) when pages utilize scripting languages to display content, or to create interface elements, the information provided by the script shall be identified with functional text that can be read by assistive technology. scripting languages such as javascript allow for more interactive content on a page while reducing the number of times the computer screen needs to be refreshed. if functional text is not available, the screen reader attempts to read the script's code, which outputs as a meaningless jumble of characters. using redundant text links avoids this result.

(m) when a web page requires that an applet, plug-in, or other application be present on the client system to interpret page content, the page must provide a link to a plug-in or applet that complies with [subpart b: technical standards] §1194.22(a) through (i). web developers need to ascertain whether a given plug-in or applet is accessible before requiring their webpage's visitors to use it. when using applications such as quicktime or realaudio, it is important to provide an accessible link on the same page that will allow users to install the necessary plug-in.

(n) when electronic forms are designed to be completed on-line, the form shall allow people using assistive technology to access information, field elements, and functionality required for completion and submission of the form, including all directions and cues. if scripts used in the completion of the form are inaccessible, an alternative method of completing the form must be made immediately available. each element of a form needs to be labeled properly using the
$serviceup";
# if last two status' were down
if ($oldstatusfile =~ m/\($service\)-0--->/){
    $msg = "$service back up at $currenttime\n";
    # if service has owner & not already in mail list,
    # add owner to mail list
    $toemail .= ", \'$owner\'" if ($owner && (!($toemail =~ $owner)));
    }
}
# else current status is down
else{
    $statusline .= "down\">down";
    # if last status was down & before that status was up
    if ($oldstatusfile =~ m/\($service\)-0-1-->/){
        $msg = "$service down at $currenttime\n";
        # if service has owner & not already in mail list,
        # add owner to mail list
        $toemail .= ", \'$owner\'" if ($owner && (!($toemail =~ $owner)));
    }
}
$statusline .= "
retrieval

two search methods were used: direct probabilistic retrieval. an in-house implementation was used of a probabilistic full-text retrieval algorithm developed at berkeley.7 this search engine takes a free-form text query and returns a ranked list of captions of tables ranked according to their relevance scores.

figure 1. query interface for search-term recommender system for the north american industry classification system
figure 2. display of naics code search-term recommendations for “car”
figure 3. display of numeric data retrieved using selected naics code

for example, the five top-ranked captions returned to the query “public libraries in california” were:

1. library statistics, statewide summary by type of library california, 1992–93 to 1997–98 table f6.
2. library statistics, statewide summary by type of library california, 1993–94 to 1998–99 table f6yr0-0.
3. number of california libraries, 1989 to 1999 table f5yr00
4. number of california libraries, 1989 to 1998, as of september table f5.
5. california public schools, grades k–12, 1989 to 1998 table f4.

each entry in the retrieved set list is linked to a numeric table maintained at the counting california web site and, by clicking on the appropriate link, a user can display the table as an ms excel file or as a pdf file. mediated search. from the same extracted records the words in the captions were used to create an evi to the subtopics in the topic classification using the method already described. as an example, the query “personal individual income tax,” when submitted to the evi, generated the following ranked list of subtopics:

1. income
2. government earnings and tax revenues
3. personal income
4. property tax
5. personal income tax
6. corporate income tax
7. per capita income

a user can click on any selected subtopic to retrieve the captions of tables assigned that subtopic. for example, clicking on the fifth subtopic, personal income tax, retrieves:

■ personal income tax returns: number and amount of adjusted gross income reported by adjusted gross income class california, 1998 taxable year. table d10yr00
■ personal income tax returns: number and amount of adjusted gross income reported by adjusted gross income class california, 1997 taxable year. table d9
■ personal income statistics by county, california 1997 taxable year. table d10
■ personal income statistics by county, california 1998 taxable year. table d11yr00

■ transverse searching between text and numeric-data series

to demonstrate the searching capability from a bibliographic record to numeric-data sets, the first step is to retrieve and display a bibliographic record from an online catalog. a web-based interface for searching online catalogs was implemented using an in-house implementation of the z39.50 protocol. besides the z39.50 protocol, an important component that makes searching remote online catalogs feasible is the gateway between the http (hypertext transfer protocol) and the z39.50 protocol. while http is a connectionless-oriented protocol, the z39.50 is a connection-oriented protocol. the gateway maintains connections to remote z39.50 servers. all search requests to any remote z39.50 server go through the gateway.

searching from catalog records to numeric data sets

having selected some text (for the purposes of this study, a catalog record), how could one identify the facts or statistics in a numeric database that are most closely related to the topic?
clicking on a “formulate query” button placed at the end of a displayed full marc record creates a query for searching a numeric database. the initial query will contain the words extracted from the title, subtitle, and the subject headings and is placed in a new window where the user can modify or expand the query before submitting it to the search engine for a numeric database. so, for example, the following text extracted from a catalog record: library laws of the state of california, library legislation. california. public libraries when submitted as a query, retrieves a ranked list of table names, of which two, covering different time periods, are entitled library statistics, statewide summary by type of library, california.

searching from numeric data sets to catalog records

transverse search in the other direction, starting from a data table, is achieved by forwarding the caption of a table to the word-to-lcsh evi to generate a prompt list of the seven top-ranked lcshs, any one of which can be used as a query submitted to the catalog.

■ architecture

figure 4 shows the structure of the implementation. the boxes shown in the figure are:

1. a search interface for accessing bibliographic/textual resources through a word-to-lcsh evi.
2. a word-to-lcsh evi.
3. a ranked list of lcshs closely associated with the query.
4. an online catalog.
5. results of searching the online catalog using an lcsh.
6. a full marc record displayed in tagged form.
7. a new query formed by extracting the title and subject fields from the displayed full marc record.
8. a numeric database.
9. a list of captions of numeric tables ranked by relevance score to the query.
10. numeric table displayed in pdf or ms excel format.
11. a search interface for numeric databases based on a probabilistic search algorithm.

a user can start a search using either interface (boxes 1 or 11) and, from either starting point, find records on the same topic of interest in a textual (here bibliographic) database and a socioeconomic database.

■ conclusions and further work

enhanced access to numeric data sets

the descriptive texts associated with numeric tables, such as the caption, headers, or row labels, are usually very short. they provide a rather limited basis for locating the table in response to queries, or describing a data cell sufficiently to form a usefully descriptive query from it. sometimes the title (caption) of a table may be the only searchable textual description about the content of the table, and the titles are sometimes very general. for example, one of the titles, library statistics, statewide summary by type of library california, 1992–93 to 1997–98, is so general that neither the kinds of statistics nor the types of libraries are revealed. if a user posed the question, “what are the total operating expenditures of public libraries in california?” to a query system that indexes table titles only, the search may well be ineffective since the only word in common between the table title and the user’s query is “california” and, if the plurals of nouns have been normalized to the singular form, “library.” table column headings and row headings provide additional information about the content of a numeric table. however, the column and row headings are usually not directly searchable. for example, a table named “language spoken at home” in counting california databases consists of rows and columns.
the column headings list the languages spoken at home, while the row headings show the county names in california. each cell in the table gives the number of people, five years of age and older, who speak a specific language at home. to answer questions such as “how many people speak spanish at home in alameda county, california?” using the table title alone may not retrieve the table that contains the answer to the example question. it is recommended that the textual descriptions of numeric tables be enriched. automatically combining the table title and its column and row headings would be a small but practical step toward improved retrieval. geographic search socioeconomic numeric data series refer to particular areas and, in contrast to text searching, the geographical aspect ordinarily has to be specified. to match the geographical area of the numeric data, a matching text search may also have to specify the same place. the authors found that this was hard to achieve for several reasons. place names are ambiguous and unstable: a search for data relating to trinidad might lead to trinidad, west indies, instead of trinidad, california, for example. the problem is compounded because, in numeric data series, specialized geopolitical divisions, such as census tracts and counties, are commonly used. these divisions do not match conveniently with searchers’ ordinary use of place names. also, the granularity of geographical coverage may not match well. data relating to berkeley, for example, may be available only in aggregated data for alameda county. it was eventually concluded that reliance on the names of places could never work satisfactorily. the only effective path to reliable access to data relating to places would be to use geospatial coordinates (latitude and longitude) to establish unambiguously the identity and location of any place and the relationship between places. this means that gazetteers and map visualizations become important. gazetteers relate named places to defined spaces, and thereby reveal spatial relationships between places, e.g., the city of alameda is on alameda island within alameda county. this problem has been addressed in a subsequent figure 4. architecture of the prototype search across different media | buckland, chen, gey, and larson 187 study entitled “going places in the catalog: improved geographical access.”8 temporal search searches of text files and of socioeconomic numeric data series also differ substantially with respect to time periods: numeric data searches ordinarily require the years of interest to be specified; text searches rarely specify the period. an additional difficulty arises because in text, as in speech, a period is commonly referred to by a name derived metaphorically from events used as temporal markers, rather than by calendar time, as in “during vietnam,” “under clinton,” or “in the reign of henry viii.” named time periods have some of the characteristics of place names: they are culturally based and tend to be multiple, unstable, and ambiguous. it appears that an analogous solution is indicated: directories of named time periods mapped to calendar definitions, much as a gazetteer links place names to spatial locators. 
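the authors stop short of showing such a directory of named time periods; a minimal sketch of the idea in python follows, in which the period names and calendar ranges are illustrative entries chosen for this example rather than data from the study.

```python
# A minimal sketch of a named-time-period directory, the temporal analogue of
# a gazetteer: it maps culturally based period names to calendar spans so that
# a text query and a numeric-data series can be matched on time.
PERIOD_DIRECTORY = {
    "reign of henry viii": (1509, 1547),
    "clinton administration": (1993, 2001),
    "great depression": (1929, 1939),
}

def calendar_span(period_name):
    """Translate a named period into a (start_year, end_year) pair."""
    return PERIOD_DIRECTORY[period_name.lower()]

def year_in_period(period_name, year):
    """True when a data series for the given year falls inside the named period."""
    start, end = calendar_span(period_name)
    return start <= year <= end

print(calendar_span("Reign of Henry VIII"))             # (1509, 1547)
print(year_in_period("Clinton administration", 1998))   # True
```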
this problem is being addressed in a subsequent study entitled “support for the learner: what, where, when, and who.”9 media forms the paradox, in an environment of digital “media convergence,” that it appears impossible to search directly across different media forms invites closer attention to concepts and terminology associated with media. a view that fits and explains the phenomena as the authors understand them, distinguishes three aspects of media: ■ cultural codes: all forms of expression depend on some shared understandings, on language in a broad sense. convergence here means cultural convergence or interpretation. ■ media types: different types of expression have evolved: texts, images, numbers, diagrams, art. an initial classification can well start with the five senses of sight, smell, hearing, taste, and feel. ■ physical media: paper; film; analog magnetic tape; bits; . . . being digital affects directly only this aspect. anything perceived as a meaningful document has cultural, type, and physical aspects, and genre usefully denotes specific combinations of code, type, and physical medium adopted by social convention. genres are historically and culturally situated. convergence can be understood in terms of interoperability and is clearly seen in physical media technology. the adoption of english as a language for international use in an increasingly global community promotes convergence in cultural codes. nevertheless, the different media types are fundamentally distinct. metadata as infrastructure it is the metadata and, in a very broad sense, “bibliographic” tools that provide the infrastructure necessary for searches across and between different media—thesauruses, mappings between vocabularies, place-name gazetteers, and the like. in isolation, metadata is properly regarded as description attached to documents, but this is too narrow a view. collectively, the metadata forms the infrastructure through which different documents can be related to each other. it is a variation on the role of citations: individually, references amplify an individual document by validating statements made within it; collectively, as a citation index, references show the structure of scholarship to which documents are attached. ■ summary a project was undertaken to demonstrate simultaneous search of two different media types (socioeconomic numeric data series and text files) without ingesting these diverse resources into a shared environment. the project objective was eventually achieved, but proved harder than expected for the following reasons: access to these different media types has been developed by different communities with different practices; the systems (vocabularies) for topical categorization vary greatly and need interpretative mappings (also known as relative indexes, searchterm recommender systems, and evis); specification of geographical area and time period are as necessary for search in socioeconomic data series and, for this, existing procedures for searching text files are inadequate. ■ acknowledgement this work was partially supported by the institute of museum and library services through national library leadership grant no. 178 for a project entitled “seamless searching of numeric and textual resources,” and was based on prior research partially supported by darpa contracts n66001-97-c-8541; ao# f477: “search support for unfamiliar metadata vocabularies” and n66001-00-18911, to# j290: “translingual information management using domain ontologies.” references 1. michael k. 
buckland, fredric c. gey, and ray r. larson, seamless searching of numeric and textual resources: final report on institute of museum and library services national leadership 188 information technology and libraries | december 2006 grant no. 178 (berkeley, calif.: univ. of california, school of information management and systems, 2002), http:// metadata.sims.berkeley.edu/papers/seamlesssearchfinal report.pdf (accessed july 18, 2006); michael buckland et al., “seamless searching of numeric and textual resources: friday afternoon seminar, feb. 14, 2003,” http://metadata.sims .berkeley.edu/papers/seamlessfri.ppt (accessed july 18, 2006). 2. michael buckland et al., “mapping entry vocabulary to unfamiliar metadata vocabularies,” d-lib magazine 5, no. 1 (jan. 1999), www.dlib.org/dlib/january99/buckland/01buckland .html (accessed july 18, 2006); michael buckland, “the significance of vocabulary,” 2000, http://metadata.sims.berkeley .edu/vocabsig.ppt (accessed july 18, 2006); fredric c. gey et al., “entry vocabulary: a technology to enhance digital search,” in proceedings of the first international conference on human language technology, san diego, mar. 2001 (san francisco: morgan kaufmann, 2001), 91–95, http://metadata.sims.berkeley.edu/ papers/hlt01-final.pdf (accessed july 18, 2006). 3. los angeles times, july 12, 1995: d1. 4. michael buckland, “vocabulary as a central concept in library and information science,” in digital libraries: interdisciplinary concepts, challenges, and opportunities. proceedings of the third international conference on conceptions of library and information science (colis3), dubrovnik, croatia, may 23–26, 1999, ed. t. arpanac et al. (lokve, croatia: benja pubs., 1999), 3–12, www .sims.berkeley.edu/~buckland/colisvoc.htm (accessed july 18, 2006); buckland et al., “mapping entry vocabulary.” 5. counting california, http://countingcalifornia.cdlib.org (accessed july 18, 2006). 6. “factsheet: unified medical language system,” www .nlm.nih.gov/pubs/factsheets/umls.html (accessed july 18, 2006). 7. william s. cooper, aitao chen, and fredric c. gey, “fulltext retrieval based on probabilistic equations with coefficients fitted by logistic regression,” in d. k. harman, ed., the second text retrieval conference (trec-2), march 1994, 57–66 (gaithersburg, md.: national institute of standards and technology, 1994), http://trec.nist.gov/pubs/trec2/papers/txt/05.txt (accessed july 18, 2006). 8. “going places in the catalog: improved geographical access,” http://ecai.org/imls2002 (accessed jul. 18, 2006). 9. vivien petras, ray larson, and michael buckland, “time period directories: a metadata infrastructure for placing events in temporal and geographic context,” in opening information horizons: joint conference on digital libraries (jcdl), chapel hill, n.c., june 11–15, 2006, forthcoming, http://metadata.sims .berkeley.edu/tpdjcdl06.pdf (accessed july 18, 2006); “support for the learner: what, where, when, and who,” http://ecai .org/imls2004 (accessed july 18, 2006). search across different media | buckland, chen, gey, and larson 189 appendix: statistical association methodology a statistical maximum likelihood ratio weighting technique was used to construct a two-way contingency table relating each natural-language term (word or phrase) with each value in the metadata vocabulary of a resource, e.g., lcsh, lccns, u.s. 
patent classification numbers, and so on.1 an associative dictionary that will map words in natural languages into metadata terms can also, in reverse, return words in natural language that are closely associated with a metadata value. training records containing two different metadata vocabularies can be used to create direct mappings between the values of the two metadata vocabularies. for example, u.s. patents contain both u.s. and international patent classification numbers and so can be used to create a mapping between these two quite different classifications. multilingual training sets, such as catalog records for multilingual library collections, can be used to create multilingual natural language indexes to metadata vocabularies and, also, mappings between natural language vocabularies. in addition to the maximum likelihood ratio-based association measure, there are a number of other association measures, such as the chi-square statistic, mutual information measure, and so on, that can be used in creating association dictionaries. the training set used to create the word-to-lcsh evi was a set of catalog records with at least one assigned lcsh (i.e., at least one 6xx field). natural language terms were extracted from the title (field 245a), subtitle (245b), and summary note (520a). these terms were tokenized; the stopwords were removed; and the remaining words were normalized. a token here can contain only letters and digits. all tokens were then changed to lower case. the stoplist has about six hundred words considered not to be content bearing, such as pronouns, prepositions, coordinators, determiners, and the like. the content words (those not treated as stopwords) were normalized using a table derived from an english morphological analyzer.2 the table maps plural nouns into singular ones; verbs into the infinitive form; and comparative and superlative adjectives to the positive form. for example, the plural noun printers is reduced to printer, and children to child; the comparative adjective longer and the superlative adjective longest are reduced to long; and printing, printed, and prints are all reduced to the same base form print. when a word belonging to more than one part-of-speech category can be reduced to more than one form, it is changed to the first form listed in the morphological analyzer table. as an example, the word saw, which can be a noun or the past tense of the verb to see, is not reduced to see. subject headings (field 6xxa) were extracted without qualifying subdivisions. the inclusion of foreign words (alcoholismo, alcoolisme, alkohol, and alcool), derived from titles in foreign languages, demonstrate that the technique is language independent and could be adopted in any country. it could also support diversity in u.s. libraries by allowing searches in spanish or other languages, so long as the training set contains sufficient content words. evis are accessible at http://metadata. sims.berkeley.edu/prototypesi.html. fuller descriptions of the project methodology can be found in the literature.3 ■ references 1. ted dunning, “accurate methods for the statistics of surprise and coincidence,” computational linguistics 19 (march 1993): 61–74. 2. daniel karp et al., “a freely available wide coverage morphological analyzer for english,” in proceedings of coling-92, nantes, 1992 (morristown, n.j.: association for computational linguistics, 1992), 950–55, http://acl.ldc.upenn .edu/c/c92/c92-3145.pdf (accessed july 18, 2006). 3. michael k. buckland, fredric c. 
gey, and ray r. larson, seamless searching of numeric and textual resources: final report on institute of museum and library services national leadership grant no. 178 (berkeley, calif.: univ. of california, school of information management and systems, 2002), http://metadata.sims .berkeley.edu/papers/seamlesssearchfinalreport.pdf (accessed jul. 18, 2006); youngin kim et al., “using ordinary language to access metadata of diverse types of information resources: trade classification and numeric data,” in knowledge: creation, organization, and use. proceedings of the american society for information science annual meeting, oct. 29–nov. 4, 1999 (medford, n.j.: information today, 1999), 172–80. microsoft word march_ital_tharani_tc proofread.docx linked  data  in  libraries:  a  case  study     of  harvesting  and  sharing  bibliographic   metadata  with  bibframe     karim  tharani     information  technology  and  libraries  |  march  2015             5   abstract   by  way  of  a  case  study,  this  paper  illustrates  and  evaluates  the  bibliographic  framework  (or   bibframe)  as  means  for  harvesting  and  sharing  bibliographic  metadata  over  the  web  for  libraries.   bibframe  is  an  emerging  framework  developed  by  the  library  of  congress  for  bibliographic   description  based  on  linked  data.  much  like  semantic  web,  the  goal  of  linked  data  is  to  make  the   web  “data  aware”  and  transform  the  existing  web  of  documents  into  a  web  of  data.  linked  data   leverages  the  existing  web  infrastructure  and  allows  linking  and  sharing  of  structured  data  for   human  and  machine  consumption.   the  bibframe  model  attempts  to  contextualize  the  linked  data  technology  for  libraries.  library   applications  and  systems  contain  high-­‐quality  structured  metadata,  but  this  data  is  generally  static   in  its  presentation  and  seldom  integrated  with  other  internal  metadata  sources  or  linked  to  external   web  resources.  with  bibframe  existing  disparate  library  metadata  sources  such  as  catalogs  and   digital  collections  can  be  harvested  and  integrated  over  the  web.  in  addition,  bibliographic  data   enriched  with  linked  data  could  offer  richer  navigational  control  and  access  points  for  users.  with   linked  data  principles,  metadata  from  libraries  could  also  become  harvestable  by  search  engines,   transforming  dormant  catalogs  and  digital  collections  into  active  knowledge  repositories.  thus   experimenting  with  linked  data  using  existing  bibliographic  metadata  holds  the  potential  to   empower  libraries  to  harness  the  reach  of  commercial  search  engines  to  continuously  discover,   navigate,  and  obtain  new  domain  specific  knowledge  resources  on  the  basis  of  their  verified   metadata.   the  initial  part  of  the  paper  introduces  bibframe  and  discusses  linked  data  in  the  context  of   libraries.  the  final  part  of  this  paper  outlines  and  illustrates  a  step-­‐by-­‐step  process  for  implementing   bibframe  with  existing  library  metadata.   introduction   library  applications  and  systems  contain  high-­‐quality  structured  metadata,  but  this  data  is  seldom   integrated  or  linked  with  other  web  resources.  this  is  adequately  illustrated  by  the  nominal   presence  of  library  metadata  on  the  web.1  libraries  have  much  to  offer  to  the  web  and  its  evolving   future.  
making  library  metadata  harvestable  over  the  web  may  not  only  refine  precision       karim  tharani  (karim.tharani@usask.ca)  is  information  technology  librarian  at  the  university   of  saskatchewan  in  saskatoon,  canada.     information  technology  and  libraries  |  march  2015   6   and  recall  but  has  the  potential  to  empower  libraries  to  harness  the  reach  of  commercial  search   engines  to  continuously  discover,  navigate,  and  obtain  new  domain  specific  knowledge  resources   on  the  basis  of  their  verified  metadata.  this  is  a  novel  and  feasible  idea,  but  its  implementation   requires  libraries  to  both  step  out  of  their  comfort  zones  and  to  step  up  to  the  challenge  of  finding   collaborative  solutions  to  bridge  the  islands  of  information  that  we  have  created  on  the  web  for   our  users  and  ourselves.     by  way  of  a  case  study,  this  paper  illustrates  and  evaluates  the  bibliographic  framework  (or   bibframe)  as  means  for  harvesting  and  sharing  bibliographic  metadata  over  the  web  for  libraries.   bibframe  is  an  emerging  framework  developed  under  the  auspices  of  the  library  of  congress  to   exert  bibliographic  control  over  traditional  and  web  resources  in  an  increasingly  digital  world.   while  bibframe  has  been  introduced  as  a  potential  replacement  for  marc  (machine-­‐readable   cataloging)  in  libraries;2  however,  the  goal  of  this  paper  is  to  highlight  the  merits  of  bibframe  as   a  mechanism  for  libraries  to  share  metadata  over  the  web.   bibframe  and  linked  data   while  the  impetus  behind  bibframe  may  have  been  replacement  of  marc,  “it  seems  likely  that   libraries  will  continue  using  marc  for  years  to  come  because  that  is  what  works  with  available   library  systems.”3  despite  its  uncertain  future  in  the  cataloging  world,  bibframe  in  its  current   form  provides  fresh  and  insightful  mechanism  for  libraries  to  repackage  and  share  bibliographic   metadata  over  the  web.  bibframe  utilizes  the  linked  data  paradigm  for  publishing  and  sharing   data  over  the  web.4  much  like  semantic  web,  the  goal  of  linked  data  is  to  make  the  web  “data   aware”  and  transform  the  existing  web  of  documents  into  a  web  of  data.  linked  data  utilizes   existing  web  infrastructure  and  allows  linking  and  sharing  of  structured  data  for  human  and   machine  consumption.  in  a  recent  study  to  understand  and  reconcile  various  perspectives  on  the   effectiveness  of  linked  data,  the  authors  raise  intriguing  questions  about  the  possibilities  of   leveraging  linked  data  for  sharing  library  metadata  over  the  web:     although  library  metadata  made  the  transition  from  card  catalogs  to  online  catalogs   over  40  years  ago,  and  although  a  primary  source  of  information  in  today’s  world  is  the   web,  metadata  in  our  opacs  are  no  more  free  to  interact  on  the  web  today  than  when   they  were  confined  on  3"  ×  5"  catalog  cards  in  wooden  drawers.  what  if  we  could  set   free  the  bound  elements?  that  is,  what  if  we  could  let  serial  titles,  subjects,  creators,   dates,  places,  and  other  elements,  interact  independently  with  data  on  the  web  to  which   they  are  related?  
what  might  be  the  possibilities  of  a  statement-­‐based,  linked  data   environment?  5       linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       7     figure  1.  the  bibframe  model6   bibframe  provides  the  means  for  libraries  to  experiment  with  linked  data  to  find  answers  to   these  questions  for  themselves.  this  makes  bibframe  both  daunting  and  delighting   simultaneously.  it  is  daunting  because  it  imposes  a  paradigm  shift  in  how  libraries  have   historically  managed,  exchanged,  and  shared  metadata.  but  embracing  linked  data  also  leads  to  a   promise  land  where  metadata  within  and  among  libraries  can  be  exchanged  seamlessly  and   economically  over  the  web.  bibframe  (http://bibframe.org)  consists  of  a  model  and  a  vocabulary   set  specifically  designed  for  bibliographic  control.7  the  model  identifies  four  main  classes,  namely,   work,  instance,  authority,  and  annotation  (see  figure  1).  for  each  of  these  classes,  there  are  many   hierarchical  attributes  that  help  in  describing  and  linking  instantiations  of  these  classes.  these   properties  are  collectively  called  the  bibframe  vocabulary.     philosophically,  linked  data  is  based  on  the  premise  that  more  links  among  resources  will  lead  to   better  contextualization  and  credibility  of  resources,  which  in  turn  will  help  in  filtering  irrelevant   resources  and  discovering  new  and  meaningful  resources.  at  a  more  practical  level,  linked  data   provides  a  simple  mechanism  to  make  connections  among  pieces  of  information  or  resources  over   the  web.  more  specifically,  it  not  only  allows  humans  to  make  use  of  these  links  but  also  machines   to  do  so  without  human  intervention.  this  may  sound  eerie,  but  one  has  to  understand  the  history   behind  the  origin  of  linked  data  not  to  think  of  this  as  yet  another  conspiracy  for  machines  to  take   over  the  world  (wide  web).     in  1994  tim  berners-­‐lee,  the  inventor  of  the  web,  put  forth  his  vision  of  the  semantic  web  as  a   “web  of  actionable  information—information  derived  from  data  through  a  semantic  theory  for     information  technology  and  libraries  |  march  2015   8   interpreting  the  symbols.  the  semantic  theory  provides  an  account  of  ‘meaning’  in  which  the   logical  connection  of  terms  establishes  interoperability  between  systems.”8  while  the  idea  of   semantic  web  has  not  been  fully  realized  for  a  variety  of  functional  and  technical  reasons,  the   notion  of  linked  data  introduced  subsequently  has  made  the  concept  much  more  accessible  and   feasible  for  a  wider  application.9  once  again,  it  was  tim  berners-­‐lee  who  put  forth  the  ground   rules  for  publishing  data  on  the  web  that  are  now  known  as  the  linked  data  principles.10  these   principles  advocate  using  standard  mechanisms  for  naming  each  resource  and  their  relationships   with  unique  universal  resource  identifiers  (uris);  making  use  of  the  existing  web  infrastructure   for  connecting  resources;  and  using  resource  description  framework  (rdf)  for  documenting  and   sharing  resources  and  their  relationships.     
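because linked data relies on nothing more than the ordinary web protocol, a client can obtain the machine-readable description of a resource simply by dereferencing its uri and asking for rdf. the short python sketch below assumes the requests library and a publisher that honours content negotiation; the uri in the usage comment is a placeholder to be replaced with any resource published as linked data.

```python
# A sketch of dereferencing a linked data URI over plain HTTP.
# The Accept header asks for RDF/XML instead of the HTML page a browser
# would normally receive from the same address.
import requests

def fetch_rdf(resource_uri):
    """Return the RDF/XML description of a resource, if the server offers one."""
    response = requests.get(
        resource_uri,
        headers={"Accept": "application/rdf+xml"},
        timeout=30,
    )
    response.raise_for_status()
    return response.text

# Example (placeholder URI): fetch_rdf("http://example.org/resource/1")
```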
a  uri  serves  as  a  persistent  name  or  handle  for  a  resource  and  is  ideally  independent  of  the   underlying  location  and  technology  of  the  resource.  although  often  used  interchangeably,  a  uri  is   different  from  a  url  (or  universal  resource  locator),  which  is  a  more  commonly  used  term  for   web  resources.  a  url  is  a  special  type  of  uri,  which  points  to  the  actual  location  (or  the  web   address)  of  a  resource,  including  the  file  name  and  extension  (such  as  .html  or  .php)  of  a  web   resource.  being  more  generic,  the  use  of  uris  (as  opposed  to  urls)  in  linked  data  provides   persistency  and  flexibility  of  not  having  to  change  the  names  and  references  every  time  resources   are  relocated  or  there  is  a  change  in  server  technology.  for  example  if  an  organization  switches  its   underlying  web-­‐scripting  technology  from  active  server  pages  (asp)  to  java  server  pages  (jsp),  all   the  files  on  a  web  server  will  bear  a  different  extension  (e.g.,  .jsp)  causing  all  previous  urls  with   old  extension  (e.g.,  .asp)  to  become  invalid.  this  technology  change,  however,  may  have  no  impact   if  uris  are  used  instead  of  urls  because  the  underlying  implementation  and  location  details  for  a   resource  are  masked  from  the  public.  thus  the  uri  naming  scheme  within  an  organization  must   be  developed  independent  of  the  underlying  technology.  there  are  diverse  best  practices  on  how   to  name  uris  to  promote  usability,  longevity,  and  persistence.11  the  most  important  factors,   however,  remain  the  purpose  and  the  context  for  which  the  resources  are  being  harvested  and   shared.     use  of  rdf  is  also  a  requirement  of  using  linked  data  for  sharing  data  over  the  web.  much  like   how  html  (hypertext  markup  language)  is  used  to  create  and  publish  documents  over  the  web,   rdf  is  used  to  create  and  publish  linked  data  over  the  web.  the  format  of  rdf  is  very  simple  and   makes  use  of  three  fundamental  elements,  namely,  subject,  predicate,  and  object.  similar  to  the   structure  of  a  basic  sentence,  the  three  elements  make  up  the  unit  of  description  of  a  resource   known  as  a  triple  in  the  rdf  terminology.  unsurprisingly,  rdf  requires  all  three  elements  to  be   denoted  by  uris  with  the  exception  of  the  object,  which  may  also  be  represented  by  constant   values  such  as  a  dates,  strings,  or  numbers.12  as  an  example,  consider  the  work  divine  comedy.  the   fact  this  work,  also  known  as  divina  commedia,  was  created  by  dante  alighieri  can  be  represented   by  the  following  two  triples  (using  n-­‐triples  format):       linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       9              .         “divina  commedia”.   in  the  first  triple  of  this  example,  the  work  divine  comedy  (subject)  is  being  attributed  to  a  person   called  dante  alighieria  (object)  as  the  creator  (predicate).  in  the  second  triple  the  use  of  sameas   predicate  asserts  that  both  divine  comedy  and  divina  commedia  refer  to  the  same  resource.  
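the two example triples themselves do not survive in this copy of the text; the fragment below reconstructs them with the rdflib library, using illustrative uris that stand in for whatever identifiers the original example printed.

```python
# A reconstruction of the Divine Comedy example: two triples, one asserting
# the creator relationship and one asserting that the two names refer to the
# same resource. All URIs below are illustrative stand-ins, not those used in
# the article.
from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS, OWL

work  = URIRef("http://example.org/work/divine-comedy")
dante = URIRef("http://example.org/person/dante-alighieri")
alt   = URIRef("http://example.org/work/divina-commedia")

g = Graph()
g.add((work, DCTERMS.creator, dante))  # Divine Comedy was created by Dante Alighieri
g.add((work, OWL.sameAs, alt))         # Divine Comedy and Divina Commedia name the same resource

print(g.serialize(format="nt"))        # emit the graph in N-Triples form
```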
thus   using  uris  makes  the  resources  and  relationships  persistent  whereas  use  of  rdf  makes  the   format  discernible  by  humans  and  machines.  this  seemingly  simple  idea  allows  data  to  be   captured,  formatted,  shared,  transmitted,  received,  and  decoded  over  the  web.  use  of  the  existing   web  protocol  (http  or  hypertext  transfer  protocol)  for  exchanging  and  integrating  data  saves   the  overhead  of  putting  additional  agreements  and  infrastructure  in  place  among  parties  willing   or  wishing  to  exchange  data.  this  ease  and  freedom  to  define  relationships  among  resources  over   the  web  also  makes  it  possible  for  disparate  data  sources  to  interact  and  integrate  with  each  other   openly  and  free  of  cost.     why  is  this  seemingly  simple  idea  so  significant  for  the  future  of  the  web?  from  a  functional   perspective,  what  this  means  is  that  linked  data  facilitates  “using  the  web  to  create  typed  links   between  data  from  different  sources.  these  may  be  as  diverse  as  databases  maintained  by  two   organisations  in  different  geographical  locations,  or  simply  heterogeneous  systems  within  one   organisation  that,  historically,  have  not  easily  interoperated  at  the  data  level.”13  the  notion  of   typed  linking  refers  to  the  facility  and  freedom  of  being  able  to  have  and  name  multiple   relationships  among  resources.  from  a  technical  point  of  view,  “linked  data  refers  to  data   published  on  the  web  in  such  a  way  that  it  is  machine-­‐readable,  its  meaning  is  explicitly  defined,  it   is  linked  to  other  external  data  sets,  and  can  in  turn  be  linked  to  from  external  data  sets.”14  in  a   traditional  database,  relationships  between  entities  or  resources  are  predefined  by  virtue  of  tables   and  column  names.  moreover,  data  in  such  databases  become  part  of  the  deep  web  and  not   readily  accessed  or  indexed  by  search  engines.  15   the  use  of  uris  to  name  relationships  allows  data  sources  to  establish,  use,  and  reuse   vocabularies  to  define  relationships  between  existing  resources.  these  names  or  vocabularies,   much  like  the  resources  they  describe,  have  their  own  dedicated  uris,  making  it  possible  for   resources  to  form  long-­‐term  and  reliable  relationships  with  each  other.  if  resources  and   relationships  have  and  retain  their  identities  by  virtue  of  their  uris,  then  links  between  resources   add  to  the  awareness  of  these  resources  both  for  humans  and  machines.  this  is  a  key  concept  in   realizing  the  overall  mission  of  linked  data  to  imbue  data  awareness  and  transforming  the   existing  web  of  documents  into  a  web  of  data.  consequently  various  institutions  and  industries     information  technology  and  libraries  |  march  2015   10   have  established  standard  vocabularies  and  made  them  available  for  others  to  use  with  their  data.   for  example,  the  library  of  congress  has  published  its  subject  headings  as  linked  data.  
the   impetus  behind  this  gesture  is  that  if  data  from  multiple  organizations  is  “typed  link”  using  lcsh   (library  of  congress  subject  headings)  with  linked  data,  then  libraries  and  others  gain  the  ability   to  categorize,  collocate,  and  integrate  data  from  disparate  systems  over  the  web  by  virtue  of  using   a  common  vocabulary.  as  more  and  more  resources  link  to  each  other  through  established  and   reusable  vocabularies,  the  more  data  aware  the  web  becomes.  recognizing  this  opportunity,  the   library  of  congress  has  also  developed  and  shared  its  vocabulary  for  bibliographic  control  as  part   of  the  bibframe  framework.16     implementing  bibframe  to  harvest  and  share  bibliographic  metadata   nowadays,  systems  like  catalogs  and  digital  collection  repositories  are  commonplace  in  libraries,   but  these  source  systems  often  operate  as  islands  of  data  both  within  and  across  libraries.  the   goal  of  this  case  study  is  to  explore  and  evaluate  bibframe  as  a  viable  approach  for  libraries  to   integrate  and  share  disparate  metadata  over  the  web.  as  discussed  above,  the  bibframe  model   attempts  to  contextualize  the  use  of  linked  data  for  libraries  and  provides  a  conceptual  model  and   underlying  vocabulary  to  do  so.  to  this  end,  a  unique  collection  of  ismaili  muslim  community  was   identified  for  the  case  study.  the  collection  is  physically  housed  at  the  harvard  university  library   (hul)  and  the  metadata  for  the  collection  is  dispersed  across  multiple  systems  within  the  library.   an  additional  objective  of  this  case  study  has  been  to  define  concrete  and  replicable  steps  for   libraries  to  implement  bibframe.  the  discussion  below  is  therefore  presented  in  a  step-­‐by-­‐step   format  for  harvesting  and  sharing  bibliographic  metadata  over  the  web.     1. establishing  a  purpose  for  harvesting  metadata   the  harvard  collection  of  ismaili  literature  is  first  of  its  kind  in  north  america.  “the  most   important  genre  represented  in  the  collection  is  that  of  the  ginans,  or  the  approximately  one   thousand  hymn-­‐like  poems  written  in  an  assortment  of  indian  languages  and  dialects.”17  the   feasibility  of  bibframe  was  explored  in  this  case  study  by  creating  a  thematic  research  collection   of  ginans  by  harvesting  existing  bibliographic  metadata  at  hul.  the  purpose  of  this  thematic   research  collection  is  to  make  ginans  accessible  to  researchers  and  scholars  for  textual  criticism.   historically  libraries  have  played  a  vital  role  in  making  extant  manuscripts  and  other  primary   sources  accessible  to  scholars  for  textual  criticism.  the  need  for  having  such  a  collection  in  place   for  ginans  was  identified  by  dr.  ali  asani,  professor  of  indo-­‐muslim  and  islamic  religion  and   cultures  at  harvard  university:     perhaps  the  greatest  obstacle  for  further  studies  on  the  ginan  literature  is  the  almost   total  absence  of  any  kind  of  textual  criticism  on  the  literature.  thus  far  merely  two  out  of   the  nearly  one  thousand  compositions  have  been  critically  edited.  naturally,  the   availability  of  reliably  edited  texts  is  fundamental  to  any  substantial  scholarship  in  this   field.  .  .  .  
for  the  scholar  of  post-­‐classical  ismaili  literature,  recourse  to  this  kind  of     linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       11   material  has  become  especially  critical  with  the  growing  awareness  that  there  exist   significant  discrepancies  between  modern  printed  versions  of  several  ginans  and  their   original  manuscript  form.  fortunately,  the  harvard  collection  is  particularly  strong  in  its   holdings  of  a  large  number  of  first  editions  of  printed  ginan  texts—a  strength  that  should   greatly  facilitate  comparisons  between  recensions  of  ginans  and  the  preparation  of   critical  editions.18   2. modeling  the  data  to  fulfill  functional  requirements   historically,  the  physicality  of  resources  such  as  book  or  compact  disc  has  dictated  what  is   described  in  library  catalogs  and  to  what  extent.  the  issue  of  cataloging  serials  and  other  works   embedded  within  larger  works  has  always  been  challenging  for  catalogers.  for  this  case  study  as   well,  one  of  the  major  implementation  decisions  revolved  around  the  granularity  of  defining  a   work.  designating  each  ginan  as  a  work  (rather  than  a  manuscript  or  lithograph)  was  perhaps  an   unconventional  decision,  but  one  that  was  highly  appropriate  for  the  purpose  of  the  collection.   thus  there  was  a  conscious  and  genuine  effort  to  liberate  a  work  from  the  confines  of  its  carriers.   fortuitously,  bibframe  does  not  shy  away  from  this  challenge  and  accommodates  embedded  and   hierarchal  works  in  its  logical  model.  but  bibframe,  like  any  other  conceptual  model,  only   provides  a  starting  point,  which  needs  to  be  adapted  and  implemented  for  individual  project   needs.       figure  2.  excerpt  of  project  data  model     information  technology  and  libraries  |  march  2015   12   the  data  model  for  this  case  study  (see  figure  2)  was  designed  to  balance  the  need  to   accommodate  bibliographic  metadata  with  the  demands  of  linked  data  paradigm.  central  to  the   project  data  model  is  the  resources  table  where  information  on  all  resources  along  with  their  uris   and  categories  (work,  instance,  etc.)  are  stored.  resources  relate  to  each  other  with  use  of   predicates  table,  which  captures  relevant  and  applicable  vocabularies.  the  namespace  table  keeps   track  of  all  the  set  of  vocabularies  being  used  for  the  project.  in  the  triples  table,  resources  are   typed  linked  using  appropriate  predicates.  once  the  data  model  for  the  project  was  finalized,  a   database  was  created  using  mysql  to  house  the  project  data.   3. planning  the  uri  scheme     in  general  the  uri  scheme  for  this  case  study  conformed  to  the  following  intuitive  nomenclature:   .     this  uri  naming  scheme  ensures  that  a  uri  assigned  to  a  resource  depends  on  its  class  and   category  (see  table  1).  while  it  may  be  customary  to  use  textual  identifiers  in  the  uris,  the  project   used  numeric  identifiers  to  account  for  the  fact  that  most  of  the  ginans  (works)  are  untitled  and   transliterated  into  english  from  various  indic  languages.  
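the four tables named in figure 2 and the category-based uri pattern can be sketched together in a few lines; the column names below are inferred from the description above, and sqlite3 stands in for the project's mysql database so that the fragment is self-contained.

```python
# A sketch of the project data model (resources, predicates, namespaces,
# triples) and of minting URIs of the form <base>/<category>/<identifier>.
# Table and column names are assumptions made for illustration.
import sqlite3

SCHEMA = """
CREATE TABLE namespaces (prefix TEXT PRIMARY KEY, base_uri TEXT NOT NULL);
CREATE TABLE predicates (id INTEGER PRIMARY KEY,
                         prefix TEXT REFERENCES namespaces(prefix),
                         term TEXT NOT NULL);
CREATE TABLE resources  (id INTEGER PRIMARY KEY,
                         uri TEXT UNIQUE NOT NULL,
                         category TEXT NOT NULL,
                         label TEXT);
CREATE TABLE triples    (subject_id   INTEGER REFERENCES resources(id),
                         predicate_id INTEGER REFERENCES predicates(id),
                         object_id    INTEGER REFERENCES resources(id),
                         object_value TEXT);
"""

BASE = "http://domain.com"  # base address used in the article's table 1 examples

def mint_uri(category, numeric_id):
    """Build a URI whose path depends on the resource category, e.g. /ginan/1."""
    return f"{BASE}/{category}/{numeric_id}"

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO resources (uri, category) VALUES (?, ?)",
           (mint_uri("ginan", 1), "work"))
db.commit()
```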
generally  support  for  using  uris  is  either   already  built-­‐in  or  added  on  depending  on  the  server  technology  being  used.  this  case  study   utilized  the  lamp  (linux,  apache,  mysql,  and  php)  technology  stack,  and  the  uri  handler  for  the   project  was  added  on  to  the  apache  webserver  using  url-­‐rewriting  (or  mod_rewrite)  facility.19     resource  types   bibframe  category   uri  example   organizations   annotation   http://domain.com/organization/1   collections   annotation   http://domain.com/collection/1   items   instance   http://domain.com/item/1     ginan   work   http://domain.com/ginan/1     subjects   authority   http://domain.com/subject/1     table  1.  uri  naming  scheme  and  examples   4. using  standard  vocabularies     bibframe  provides  the  relevant  vocabulary  and  the  underlying  uris  to  implement  linked  data   with  bibliographic  data  in  libraries.  while  not  all  attributes  may  be  applicable  or  used  in  a  project,   the  ones  that  are  identified  as  relevant  must  be  referenced  with  their  rightful  uri.  for  example,   the  predicate  hasauthority  from  bibframe  has  a  persistent  uri   (http://bibframe.org/vocab/hasauthority)  enabling  humans  as  well  as  machines  to  access  and   decode  the  purpose  and  scope  of  this  predicate.  other  vocabulary  sets  or  namespaces  commonly   used  with  linked  data  include  resource  description  frameowrk  (rdf),  web  ontology  language   (owl),  friend  of  a  friend  (foaf),  etc.  in  rare  circumstances,  libraries  may  also  choose  to  publish   their  own  specific  vocabulary.  for  example,  any  unique  predicates  for  this  case  study  could  be     linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       13   defined  and  published  using  the  http://domain.com/vocab  namespace.   5. identifying  data  sources     the  bibliographic  metadata  used  for  this  case  study  was  obtained  from  within  hul.  as  mentioned   above,  the  data  pertained  to  a  unique  collection  of  religious  literature  belonging  to  the  ismaili   muslim  community  of  the  indian  subcontinent.  this  collection  was  acquired  by  the  middle  eastern   department  of  the  harvard  college  library  in  1980.  the  collection  comprises  28  manuscripts,  81   printed  books,  and  11  lithographs.  in  1992,  a  book  on  the  contents  of  this  collection  was  published   in  1992  by  dr.  asani  and  was  titled  the  harvard  collection  of  ismaili  literature  in  indic  languages:   a  descriptive  catalog  and  finding  aid.  the  indexes  in  the  book  served  as  one  of  the  sources  of  data   for  this  case  study.     subsequent  to  the  publication  of  the  book,  the  harvard  collection  of  ismaili  literature  was  also   made  available  through  harvard’s  opac  (online  public  access  catalog)  called  hollis  (see  figure  3).   the  catalog  records  were  also  obtained  from  the  library  for  the  case  study.  some  of  the  120  items   from  the  collection  were  subsequently  digitized  and  shared  as  part  of  the  harvard’s  islamic   heritage  project.  the  digital  surrogates  of  these  items  were  shared  through  the  harvard   university  library  open  collections  program.  and  the  library  catalog  records  were  also  updated  to   provide       figure  3.  
hollis:  harvard  university  library’s  opac   direct  access  to  the  digital  copies  where  available.  additional  metadata  for  the  digitized  items  was   also  developed  by  the  library  to  facilitate  open  digital  access  through  harvard  library’s  page   delivery  service  (pds)  to  provide  page-­‐turning  navigational  interface  for  scanned  page  images   over  the  web.  data  from  all  these  sources  was  leveraged  for  the  case  study.       information  technology  and  libraries  |  march  2015   14     6. transforming  source  metadata  for  reuse   etl  (extract,  transform,  and  load)  is  an  acronym  commonly  used  to  refer  to  the  steps  needed  to   populate  a  target  database  by  moving  data  from  multiple  and  disparate  source  systems.  extraction   is  the  process  of  getting  the  data  out  of  the  identified  source  systems  and  making  it  available  for   the  exclusive  use  of  the  new  database  being  designed.  in  the  context  of  the  library  realm,  this  may   mean  getting  marc  records  out  from  a  catalog  or  getting  descriptive  and  administrative  metadata   out  of  a  digital  repository.  format  in  which  data  is  extracted  out  of  a  source  system  is  also  an   important  aspect  of  the  data  extraction  process.  use  of  xml  (extensible  markup  language)  format   is  fairly  common  nowadays  as  most  library  source  systems  have  built-­‐in  functionality  to  export   data  into  a  recognized  xml  standard  such  as  marcxml  (marc  data  encoded  in  xml),  mods   (metadata  object  description  schema),  mets  (metadata  encoding  and  transmission  standard),   etc.  in  certain  circumstances,  data  may  be  extracted  using  csv  (comma-­‐separated  values)  format.   transformation  is  the  step  in  which  data  from  one  or  more  source  systems  is  massaged  and   prepared  to  be  loaded  to  a  new  database.  the  design  of  the  new  database  often  enforces  new  ways   of  organizing  source  data.  the  transformation  process  is  responsible  to  make  sure  that  the  data   from  all  source  systems  is  integrated  while  retaining  its  integrity  before  being  loaded  to  the  new   database.  a  simplistic  example  of  data  transformation  may  be  that  the  new  system  may  require   authors’  first  and  last  names  to  be  stored  in  separate  fields  rather  than  in  a  single  field.  how  such   transformations  are  automated  will  depend  on  the  format  of  the  source  data  as  well  as  the   infrastructure  and  programming  skills  available  within  an  organization.  since  xml  is  becoming   the  de  facto  standard  for  most  data  exchange,  use  of  xslt  (extensible  stylesheet  language   transformations)  scripts  is  common.  with  xslt,  data  in  xml  format  can  be  manipulated  and   given  different  structure  to  aid  in  the  transformation  process.     the  loading  process  is  responsible  for  populating  the  newly  minted  database  once  all   transformations  have  been  applied.  one  of  the  major  considerations  in  this  process  is  maintaining   the  referential  integrity  of  the  data  by  observing  the  constraints  dictated  by  the  data  model.  this  is   achieved  by  making  sure  that  records  are  correctly  linked  to  each  other  and  are  loaded  in  proper   sequence.  
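as a concrete illustration of the transform and load steps just described, a short script along the following lines could pull the needed elements out of a marcxml export and emit rows for loading in dependency order; the marcxml element paths are standard, but the choice of fields and the shape of the output rows are simplifying assumptions.

```python
# A sketch of the "transform" stage: read a MARCXML export, extract the title
# (field 245, subfields a and b) and topical subjects (field 650, subfield a),
# and yield one flat row per record, ready to load into the target database.
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

def subfield_values(record, tag, codes):
    values = []
    for field in record.findall(f"marc:datafield[@tag='{tag}']", MARC_NS):
        for sf in field.findall("marc:subfield", MARC_NS):
            if sf.get("code") in codes and sf.text:
                values.append(sf.text.strip())
    return values

def transform(marcxml_path):
    """Yield one dictionary of extracted values per bibliographic record."""
    root = ET.parse(marcxml_path).getroot()
    for record in root.iter("{http://www.loc.gov/MARC21/slim}record"):
        yield {
            "title": " ".join(subfield_values(record, "245", {"a", "b"})),
            "subjects": subfield_values(record, "650", {"a"}),
        }

# Loading then proceeds in dependency order, with parent records inserted
# before the records that reference them, so that links between rows resolve.
```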
for  instance,  to  ensure  referential  integrity  of  items  and  their  annotations,  it  may  be   necessary  to  load  the  items  first  and  then  the  annotations  with  correct  reference  to  the  associated   item  identifiers.   for  this  case  study,  records  from  source  systems  were  obtained  in  marcxml  and  mets  formats,   and  specific  scripts  were  developed  to  extract  desired  elements  and  transform  them  into  the   required  format.  a  somewhat  unconventional  mechanism  was  used  to  capture  and  reuse  the  data   from  dr.  asani’s  book,  which  was  only  available  in  print.  the  entire  book  was  scanned  and   processed  by  an  ocr  (optical  character  recognition)  tool  to  glean  various  data  elements.  once  the   data  was  cleaned  and  verified,  the  information  was  transformed  into  a  csv  data  file  to  facilitate     linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       15   database  loading.   7. generating  rdf  triples   the  rdf  triples  can  be  written  or  serialized  using  a  variety  of  formats  such  as  turtle,  n-­‐triples,   json,  as  well  as  rdf/xml,  among  others.  the  traditional  rdf/xml  format,  which  was  the  first   standard  to  be  recommended  for  rdf  serialization  by  the  world  wide  web  consortium  (w3c),   was  used  for  this  case  study  (see  figure  4).  the  format  was  chosen  for  its  modularity  in  preserving   the  context  of  resources  and  their  relationships  as  well  as  its  readability  for  humans.  generating   rdf  may  be  a  simple  act  if  the  data  is  already  stored  in  a  triplestore,  which  is  a  database   specifically  designed  to  store  rdf  data.  but  given  that  this  project  was  implemented  using  a   relational  database  management  system  (rdbms),  i.e.,  mysql,  the  programming  effort  to  generate   rdf  data  was  complex.  the  complications  arose  in  identifying  and  tracking  the  hierarchical  nature   of  the  rdf  data,  especially  in  the  chosen  serialization  format.  several  server-­‐side  scripts  were   developed  to  aid  in  discerning  the  relationships  among  resources  and  formatting  them  to  generate   triples.  in  hindsight  generating  triples  would  have  been  easier  using  the  n-­‐triples  serialization  but   that  would  have  also  required  more  complex  programming  for  rebuilding  the  context  for  the  user   interface  design.   figure  4.  a  sample  of  triples  serialized  for  the  project   8. formatting  rdf  triples  for  human  and  machine  consumption   the  raw  rdf  data  is  sufficient  for  machines  to  parse  and  process,  but  humans  typically  require   intuitive  user  interface  to  contextualize  triples.  in  this  case  study,  xsl  was  extensively  used  for   formatting  the  triples.  while  xslt  and  xsl  (extensible  stylesheet  language)  are  intricately   related,  they  serve  different  purposes.  xslt  is  a  scripting  language  to  manipulate  xml  data   whereas  xsl  is  a  formatting  specification  used  in  presentation  of  xml,  much  like  how  css   (cascading  style  sheets)  are  used  for  presenting  html.  a  special  routing  script  was  also   developed  to  detect  whether  the  request  for  data  was  intended  for  machine  or  human   consumption.  
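a routing decision of this kind amounts to content negotiation on the http accept header; the sketch below illustrates the idea, with the function names and the trivial html rendering being assumptions rather than the project's actual script.

```python
# A sketch of content negotiation between machine and human consumers of the
# same triples: serve raw RDF/XML when RDF is requested, otherwise wrap the
# data for display in a browser (the project itself used XSL for this step).
import html

RDF_TYPES = ("application/rdf+xml", "text/rdf", "application/xml")

def wants_rdf(accept_header):
    """True when the client, typically a machine agent, asks for RDF."""
    accept = (accept_header or "").lower()
    return any(t in accept for t in RDF_TYPES) and "text/html" not in accept

def render_html(rdf_xml):
    """Minimal stand-in for the XSL formatting used for human-readable display."""
    return "<pre>" + html.escape(rdf_xml) + "</pre>"

def respond(accept_header, rdf_xml):
    if wants_rdf(accept_header):
        return "application/rdf+xml", rdf_xml      # machine request: raw triples
    return "text/html", render_html(rdf_xml)       # human request: formatted page

print(respond("application/rdf+xml", "<rdf:RDF/>")[0])  # application/rdf+xml
```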
for  machine  requests,  the  triples  were  served  unformatted  whereas  for  human   requests,  the  triples  were  formatted  to  display  in  html.       information  technology  and  libraries  |  march  2015   16     figure  5.  formatted  triples  for  human  consumption   discussion   models  are  tools  of  communicating  simple  and  complex  relations  between  objects  and  entities  of   interest.  effectiveness  of  any  model  is  often  realized  during  implementation  when  the  theoretical   constructs  of  the  models  are  put  to  test.  the  challenge  faced  by  bibframe,  like  any  new  model,  is   to  establish  its  worthiness  in  the  face  of  the  existing  legacy  of  marc.  the  existing  hold  of  marc  in   libraries  is  so  strong  that  it  may  take  several  years  for  bibframe  to  be  in  a  position  to  challenge   the  status  quo.  historically  bibliographic  practices  in  libraries  such  as  describing,  classifying,  and   cataloging  resources  have  primarily  catered  to  tangible,  print-­‐based  knowledge  carriers  such  as   books  and  journals.20  bibframe  challenges  libraries  to  revisit  and  refresh  their  traditional  notion   of  text  and  textuality.   although  initially  introduced  as  a  replacement  for  marc,  bibframe  is  far  from  being  an  either-­‐or   proposition  given  the  marc  legacy.  nevertheless,  bibframe  has  made  linked  data  paradigm   much  more  accessible  and  practical  for  libraries.  rather  than  perceiving  bibframe  as  a  threat  to   existing  cataloging  praxis,  it  may  be  useful  for  libraries  to  allow  bibframe  to  coexist  within  the   current  cataloging  landscape  as  a  means  for  sharing  bibliographic  data  over  the  web.  libraries   maintain  and  provide  authentic  metadata  about  knowledge  resources  for  their  users  based  on   internationally  recognized  standards.  this  high  quality  structured  metadata  from  library  catalogs   and  other  systems  can  be  leveraged  and  repurposed  to  fulfill  unmet  and  emerging  needs  of  users.   with  linked  data,  library  metadata  could  become  readily  harvestable  by  search  engines,   transforming  dormant  catalogs  and  collections  into  active  knowledge  repositories.   in  this  case  study  seemingly  disparate  library  systems  and  data  were  integrated  to  provide  a   unified  and  enabling  access  to  create  a  thematic  research  collection.  it  is  also  possible  to  create   such  purpose-­‐specific  digital  libraries  and  collections  as  part  of  library  operations  without  having   to  acquire  additional  hardware  and  commercial  software.  it  was  also  evident  from  this  case  study   that  digital  libraries  built  using  bibframe  offer  superior  navigational  control  and  access  points     linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       17   for  users  to  actively  interact  with  bibliographic  data.  any  linked  data  predicate  has  the  potential   to  become  an  access  point  and  act  as  a  pivot  to  provide  insightful  view  of  the  underlying   bibliographic  records  (see  figure  6).  
with  advances  in  digital  technologies  “richer  interaction  is   possible  within  the  digital  environment  not  only  as  more  content  is  put  within  reach  of  the  user,   but  also  as  more  tools  and  services  are  put  directly  in  the  hands  of  the  user.”21  developing  capacity   to  effectively  respond  to  the  informational  needs  of  users  is  part  and  parcel  of  libraries’   professional  and  operational  responsibilities.  with  the  ubiquity  of  the  web  and  increased  reliance   of  users  on  digital  resources,  libraries  must  constantly  reevaluate  and  reimagine  their  services  to   remain  responsive  and  relevant  to  their  users.       figure  6.  increased  navigational  options  with  linked  data   conclusion   just  as  libraries  rely  on  vendors  to  develop,  store,  and  share  metadata  for  commercial  books  and   journals,  similar  metadata  partnerships  need  to  be  put  in  place  across  libraries.  the  benefits  and   implications  of  establishing  such  a  collaborative  metadata  supply  chain  are  far  reaching  and  can   also  accommodate  cultural  and  indigenous  resources.  library  digital  collections  typically   showcase  resources  that  are  unique  and  rare,  and  the  metadata  to  make  these  collections   accessible  must  be  shared  over  the  web  as  part  of  library  service.     as  the  amount  of  data  on  the  web  proliferates,  users  find  it  more  and  more  difficult  to  differentiate   between  credible  knowledge  resources  and  other  resources.  bibframe  has  the  potential  to   address  many  of  the  issues  that  plague  the  web  from  a  library  and  information  science  perspective,   including  precise  search,  authority  control,  classification,  data  portability,  and  disambiguation.   most  popular  search  engines  like  google  are  gearing  up  to  automatically  index  and  collocate   disparate  resources  using  linked  data.22  libraries  are  particularly  well  positioned  to  realize  this   goal  with  their  expertise  in  search,  metadata  generation,  and  ontology  development.  this  research   looks  forward  to  further  initiatives  by  libraries  to  become  more  responsive  and  make  library     information  technology  and  libraries  |  march  2015   18   resources  more  relevant  to  the  knowledge  creation  process.     references     1.     tim  f.  knight,  “break  on  through  to  the  other  side:  the  library  and  linked  data,”  tall   quarterly  30,  no.  1  (2011):  1–7,  http://hdl.handle.net/10315/6760.   2.     eric  miller  et  al.,  “bibliographic  framework  as  a  web  of  data:  linked  data  model  and   supporting  services,”  november  11,  2012,  http://www.loc.gov/bibframe/pdf/marcld-­‐report-­‐ 11-­‐21-­‐2012.pdf.   3.     angela  kroeger,  “the  road  to  bibframe:  the  evolution  of  the  idea  of  bibliographic   transition  into  a  post-­‐marc  future,”  cataloging  &  classification  quarterly  51,  no.  8  (2013):   873–89,  http://dx.doi.org/10.1080/01639374.2013.823584.   4.     eric  miller  et  al.,  “bibliographic  framework  as  a  web  of  data:  linked  data  model  and   supporting  services,”  november  11,  2012,  http://www.loc.gov/bibframe/pdf/marcld-­‐report-­‐ 11-­‐21-­‐2012.pdf.   5.     nancy  fallgren  et  al.,  “the  missing  link:  the  evolving  current  state  of  linked  data  for  serials,”   serials  librarian  66,  no.  
1–4  (2014):  123–38,   http://dx.doi.org/10.1080/0361526x.2014.879690.   6.     the  figure  has  been  adapted  from  eric  miller  et  al.,  “bibliographic  framework  as  a  web  of   data:  linked  data  model  and  supporting  services,”  november  11,  2012,   http://www.loc.gov/bibframe/pdf/marcld-­‐report-­‐11-­‐21-­‐2012.pdf.   7.     “bibliographic  framework  initiative  project,”  library  of  congress,  accessed  august  15,  2014,   http://www.loc.gov/bibframe.   8.     nigel  shadbolt,  wendy  hall,  and  tim  berners-­‐lee,  “the  semantic  web  revisited,”  intelligent   systems  21  no.  3  (2006):  96–101,  http://dx.doi.org/10.1109/mis.2006.62.   9.     sören  auer  et  al.,  “introduction  to  linked  data  and  its  lifecycle  on  the  web,”  in  reasoning   web:  semantic  technologies  for  intelligent  data  access,  edited  by  sebastian  rudolph  et  al.,  1– 90  (heidelberg:  springer,  2011),  http://dx.doi.org/10.1007/978-­‐3-­‐642-­‐23032-­‐5_1.   10.    tim  berners-­‐lee,  “linked  data,”  design  issues,  last  modified  june  18,  2009,   http://www.w3.org/designissues/linkeddata.html.   11.    danny  ayers  and  max  völkel,  “cool  uris  for  the  semantic  web,”  world  wide  web  consortium   (w3c),  last  modified  march  31,  2008,  http://www.w3.org/tr/cooluris.   12.    tom  heath  and  christian  bizer,  linked  data:  evolving  the  web  into  a  global  data  space   (morgan  &  claypool,  2011),  http://dx.doi.org/10.2200/s00334ed1v01y201102wbe001.     linked  data  in  libraries:  a  case  study  of  harvesting  and  sharing  bibliographic  metadata     with  bibframe  |  tharani       19     13.    christian  bizer,  tom  heath,  and  tim  berners-­‐lee,  “linked  data—the  story  so  far,”   international  journal  on  semantic  web  and  information  systems  5,  no.  3  (2009):  1–22,   http://dx.doi.org/10.4018/jswis.2009081901.   14.    ibid.     15.    tony  boston,  “exposing  the  deep  web  to  increase  access  to  library  collections”  (paper   presented  at  the  ausweb05,  the  twelfth  australasian  world  wide  web  conference,   queensland,  australia,  2005),   http://www.nla.gov.au/openpublish/index.php/nlasp/article/view/1224/1509.   16.      “bibliographic  framework  initiative,”  bibframe.org,  accessed  august  15,  2014,     http://bibframe.org/vocab;  “bibliographic  framework  initiative  project,”  library  of  congress,   accessed  august  15,  2014,  http://www.loc.gov/bibframe.   17.    ali  asani,  the  harvard  collection  ismaili  literature  in  indic  languages:  a  descriptive  catalog   and  finding  aid  (boston:  g.k.  hall,  1992).   18.    ibid.   19.    ralf  s.  engelschall,  “url  rewriting  guide,”  apache  http  server  documentation,  last  modified   december,  1997,  http://httpd.apache.org/docs/2.0/misc/rewriteguide.html.   20.    yann  nicolas,  “folklore  requirements  for  bibliographic  records:  oral  traditions  and  frbr,”   cataloging  &  classification  quarterly  39,  no.  3–4  (2005):  179–95,   http://dx.doi.org/10.1300/j104v39n03_11.   21.    lee  l.  zia,  “growing  a  national  learning  environments  and  resources  network  for  science,   mathematics,  engineering,  and  technology  education:  current  issues  and  opportunities  for   the  nsdl  program,”  d-­‐lib  magazine  7,  no.  3  (2001),   http://www.dlib.org/dlib/march01/zia/03zia.html.     22.    
thomas  steiner,  raphael  troncy,  and  michael  hausenblas,  “how  google  is  using  linked  data   today  and  vision  for  tomorrow”  (paper  presented  at  the  linked  data  in  the  future  internet   at  the  future  internet  assembly  (fia  2010),  ghent,  december  2010),   http://research.google.com/pubs/pub37430.html. 142 lc/marc on molds; an experiment in computer-based, interactne bibliographic storage, search, retrieval, and processing pauline atherton, associate professor, school of library science, and karen b. miller, research associate, syracuse university, syracuse, new york a project at syracuse university utilizing molds, a generalized computer-based interactive retrieval program, with a portion of the library of congress marc pilot project tapes as a data base. the system, written in fortran, was used in both a batch and an on-line mode. it formed part of a computer laboratory for library science students during 1968-1969. this report describes the system and its components and points out its advantages and disadvantages. introduction the somewhat intimidating title of this report becomes less so when translated from jargon into more familiar phrases. the lc/marc on molds experimental project conducted at syracuse university school of library science utilizes a computer: 1) to store bibliographic reference (library catalog) data, 2) to search the data for items that meet a searcher's criteria, 3) to retrieve items the searcher wishes retrieved, and 4) to process or manipulate items as required. a dialog or interaction between man and his data, via the machine, is established when a searcher makes a request in a query language and the computer responds immediately to the request. the lc/marc on molds system consists of two major components. the first is the data base, which is a slightly modified subset of the library of congress marc pilot project records ( 1). the second component is the computer programming system written in fortran known as molds (acronym for management on-line lc marc on molds/atherton and miller 143 data system). molds provides the computer routines required to store and maintain the data base, and the query language (also known generally as molds) that a searcher uses to interact with his data stored in the computer. the lc/ marc on molds system was originally implemented in april 1968 on the ibm 360/ 50 at the syracuse university computing center. this system is part of an experiment to determine how on-line interactive retrieval systems could be used to greatest advantage in the information gathering process. the molds system, developed in 1966 by the syracuse university research corporation (2) for management purposes, was readily available for use in the research reported in this paper. molds has been used with several data bases, including the marc records. the system has not been made available to a large user population. preliminary work with the system and a few demonstrations to students have already provided considerable insight into the desirable and undesirable features in both the marc data base and the molds query language, an insight that has already resulted in both data-base and querylanguage modification. work with the system on the computer at syracuse university has raised many crucial questions extending beyond the original research plan about system and data base design-questions for which there are as yet no answers. 
even at its early stage of experimentation the work should be of interest to librarians because of its use of the marc pilot project records and its use of an available retrieval program with features suitable for reference retrieval. to the authors' knowledge, this is the first computer-based project in which the library of congress marc records were used in an interactive retrieval environment. the query language (molds) was not specifically designed for reference retrieval, but its design features make its use for this purpose quite feasible. it differs from the usual interactive system designed for bibliographic reference retrieval and therefore deserves attention for comparative purposes. molds gives a user the ability to process as well as retrieve data, something very few search and retrieval systems are designed to do. the contribution of lc/marc on molds to the world of information retrieval, promising though it appears, cannot be assessed until all experiments are run. this report on its features, both good and bad, is offered in order to make those concerned with the design and application of interactive systems aware of its unique aspects and potential. hopefully, this work will contribute another ingredient to the synthesis of ideas and methods that will bring the state of the art ever closer to the optimum and ideal.

table 1. some features of interactive retrieval systems (circa 1968)
[the table compares nine systems (audacious from aip, bold and colex from sdc, grins from lehigh university, multilist, marc/molds, nasa/recon, tip from mit, and the suny biomedical communication network) on the size and structure of each data base, its access points, on-line access to authority files, related terms or cross-references in the query language, the number of commands, computer instruction in language use, computer-aided query formulation, root-word searching, and the communication link (crt, teletype, or console).]

background

a number of interactive retrieval systems have been designed and implemented within the last few years.
the features and potential of lc/marc on molds are best viewed in relation to what has been done in the field up to now. to gain some perspective, the major features of the data base structures and query languages of other interactive systems are summarized in table 1. this table presents those features of most interest to librarians who may wish to compare searching on a computer with searching in the card catalog or other bibliographic reference tools. references 3-12 document sources for the data in this table.

molds data base structure

the general structure of the data base with which molds operates is, in comparison with the threaded lists and inverted indexes found in many retrieval systems, extremely simple and unsophisticated. the data base can be composed of from one to ten distinct files of 1000 records each. a record is equal to the bibliographic description on a card in a library catalog. each record may be up to 300 computer words (1200 characters) long and may be subdivided into 80 blocks. originally there was a 200-word (800-character) limitation on record size, but this has now been expanded. the total file size (a limit of 10,000 records) is adequate for testing purposes, but expansion beyond the present limitations is planned in order to make the system more practical for actual use.

the structure of a file is essentially a simple matrix. each row contains all the elements of a single complete record; each column contains all like discrete items of all the records in the file. the columns are called blocks in the molds system, block and field being used synonymously in this report. for example, a library catalog card for one publication would be a record in a file composed of library catalog records. the main entries in the file constitute a block, and the dates of publication constitute another block. figure 1 illustrates the data base structure as of 1968. in this illustration the maximum number of files is 10 (1000 records each) and the maximum number of blocks is 80. each file and each block in a file is given a name and/or number. a user can reference or call up any file or data block within a file by using its name or number in a molds query language command. there are as many access points to a file as there are blocks in that file. this is in contrast to a conventional card catalog, for example, where the only access points are filing entries: main entry, title, subject(s), added entries, series, and analytics.

no specific provision is made within the molds system for the storage of authority files, cross-reference lists, or other intermediate keys to the records. such files are not absolutely necessary for effective operation of the system, since every block can be accessed and can serve as its own authority file. for more efficient system operation, however, it is intended to explore the possibility of creating authority files as part of the data base, beginning with portions of the seventh edition of the library of congress list of subject headings.

figure 1. section of general molds data base structure. [the figure shows the data base as up to ten files, each a matrix of up to 1000 records by up to 80 named blocks, as of 1968.]
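the cell-matrix organization just described can be made concrete with a short sketch. the python below is an illustration added for this discussion; the original system was written in fortran iv, and the file name, block names, and record values here are invented.

# illustrative sketch of the molds cell-matrix data base: each file is a simple
# matrix of records (rows) by named blocks (columns), and every block is an
# access point. file and block names below are hypothetical.
MAX_RECORDS_PER_FILE = 1000
MAX_BLOCKS = 80


class MoldsFile:
    def __init__(self, name, block_names):
        if len(block_names) > MAX_BLOCKS:
            raise ValueError("a molds file may define at most 80 blocks")
        self.name = name
        self.block_names = list(block_names)
        self.records = []                      # each record is a dict block -> value

    def add_record(self, record):
        if len(self.records) >= MAX_RECORDS_PER_FILE:
            raise ValueError("a molds file holds at most 1000 records")
        self.records.append({b: record.get(b, "") for b in self.block_names})

    def block(self, block_name):
        """return the whole column for one block: its own de facto authority file."""
        return [r[block_name] for r in self.records]


medicine = MoldsFile("MARC", ["MAIN", "TITL", "DATE", "LANG", "BIB"])
medicine.add_record({"MAIN": "SMITH, JOHN", "TITL": "CLINICAL CHEMISTRY",
                     "DATE": "1966", "LANG": "ENG", "BIB": "X"})
medicine.add_record({"MAIN": "JONES, ANN", "TITL": "HOSPITAL PLANNING",
                     "DATE": "1967", "LANG": "ENG", "BIB": ""})
print(medicine.block("DATE"))   # ['1966', '1967']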
provision is made for temporary user storage areas in which the user places the results of his retrieval and processing operations. data in the user area is retained only during the session in which it is created. although it cannot be saved for use at a later date, all or part of it can be printed out on the on-line printer for the user's later reference. while the general structure of the data base is formalized within the molds system, the content and specific organization of a particular data base is determined by its originator. this feature, plus the simplicity of molds' own structure, introduces a great deal of flexibility into the data base and the use that can be made of it. the originator of the data base may designate as a block any discrete data item he wishes. if the user population is dissatisfied with results using one content and arrangement of blocks, the base can be reformatted and restructured in a fairly simple maintenance run. no problems of linking records or modifying authority lists arise, as neither is part of the system. the first version of the lc/marc data base has in f.act been modified by addition of three blocks and division of one block in half to form two blocks, giving access to smaller units of data. 148 journal of library automation vol. 3/2 june, 1970 the lc/marc data base in molds format library of congress marc pilot project tapes containing some 40,000 records of english language books cataloged in 1966-67 became available for this project in the fall of 1967. because of the molds data base limitations, a subset of these catalog records was selected for use with molds. the original plan was to have each file in the data base consist of as complete a set as possible of all marc pilot project records from a single library of congress classification schedule. the candidate for the first file was class r (medicine) which contained just under 1000 records. later molds files were formed for two other lc classes: t (technology) and z (bibliography and library science). in mid-1969 two stratified sample files of the marc data base were created, one in the humanities, another in the social sciences. in all, syracuse has a marc/molds data base of 10,000 records. the record format of the marc tape was first analyzed to determine which fields should be included in the data base, and which might be omitted. the criterion for selection was probable usefulness to searchers of the data base, a conception that should undoubtedly be modified as searches are monitored. appropriate changes would not be difficult. toward the end of january 1969, a programming project was begun which entailed the design and implementation of a computer program to perform format conversion of the library of congress marc i bibliographic file to satisfy molds data base requirements. the project represented a three man-month effort and was completed by june 1969. the data-base converter program represents an attempt to provide a user-oriented facility for creating a molds data base from marc information. 
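before turning to the details, the following python sketch (an illustration added here, not the original pl/i converter) shows the kind of per-file description the converter accepts, as enumerated in the list that follows; the field names, widths, and selection rule are hypothetical stand-ins.

# hypothetical per-file description for a marc-to-molds conversion: fixed field
# widths, source marc fields, a selection rule, and per-field options for
# stripping diacritics and forcing upper-case codes.
import unicodedata

FILE_SPEC = {
    "file_name": "SS01",
    "select": lambda marc: marc.get("lc_class", "").startswith("R"),  # e.g. class r
    "fields": [
        # (molds name, width in chars, marc source, strip diacritics, force upper)
        ("MAIN", 68, "main_entry", True, True),
        ("TITL", 80, "title",      True, True),
        ("DATE",  4, "date",       False, False),
    ],
}


def to_fixed(value, width):
    """pad or truncate a variable-length marc value to a fixed molds field."""
    return value[:width].ljust(width)


def strip_diacritics(value):
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def convert(marc_record, spec):
    if not spec["select"](marc_record):
        return None                            # record not selected for this file
    molds = {}
    for name, width, source, strip, upper in spec["fields"]:
        value = marc_record.get(source, "")
        if strip:
            value = strip_diacritics(value)
        if upper:
            value = value.upper()              # molds requires all-upper-case codes
        molds[name] = to_fixed(value, width)
    return molds


print(convert({"lc_class": "R123", "main_entry": "Métraux, Alfred",
               "title": "Voodoo in Haiti", "date": "1966"}, FILE_SPEC))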
essentially, the user of the program describes each molds file to be produced by specifying: 1) the number of (fixed) fields per molds record; 2) the name and size (in characters) of each field in the molds record; 3) the name of the marc i field from which the data are to be taken; 4) selection criteria according to which marc i records are to be chosen for conversion; 5) for any marc i field, a data conversion procedure to be applied prior to transferring the information to the appropriate molds field; 6) whether or not diacritical codes should be stripped from the marc i field prior to transferring the information to the molds field; 7) whether or not character translation from lower-case to upper-case codes should be performed on the data prior to transfer from the marc i to the molds field. although the program has not yet been refined to the extent originally intended, nevertheless it contains all the features indicated above and has lc marc on molds/atherton and miller 149 been used to create ten molds files since its completion. the program is written in pl/i and more fully documented in a report available from the national auxiliary publication service of asis. molds requires fixed-field input for its data base, but many of the fields or data blocks on the marc tape are variable in length. therefore, the field lengths of 200 records in the class r (medicine) subset were examined to determine the maximum size which would produce a molds record within the original 200 computer-word (boo-character) limitation and still retain all the desired data. this limitation was easily expanded to 300 words, allowing addition of new fields and expansion of existing fields as new marc/molds files were generated. a record whose original variable length was 500 characters or less expanded to about 800 characters when converted to fixed-field form. in the first data base only records of 500 characters or less were considered for inclusion, which gave a total of 620 records in the first marc/molds file. by mid-1969 this data base was greatly enlarged using the program described above. the names of the present marc/molds files are: ss01, ss02, ss03, ss04, ssoz, and ssoh. the first files generated were called marc and marz. the marc/molds format now in use is given in table 2. the additions made to the original format are noted. marc/molds block names can be used instead of block numbers; for ease of searching both name and number are given in the table. the molds block number corresponds to marc pilot project field tags whenever possible. after this second revision had been completed, marc ii ( 13) format with new field tags appeared . interestingly, there were remarkably few differences. creating an information retrieval system from other data bases can present some major headaches. during the first test session with the marc/ molds data base, it was discouraging to find that successful retrieval operations could not be performed on such vital items as subject or main entry (blocks main and suba, respectively) . the problem lay in the fact that the lower-case character codes employed on the marc tape had not been converted to the all-upper-case-codes required by molds. once discovered, the problem was easily remedied. other problems were not so easy to solve. the marc data base had been received in a "raw form", i.e., there were typographical errors in the original tapes and irregular spacing; and incorrect punctuation, spelling and abbreviations. 
there was no way to detect these typographical errors and irregularities in the tapes, and the retrieval program would only work on direct matches of query and document information elements. the molds language (to be discussed subsequently) required a good deal of standardization and regularity of the records to take full and effective advantage of its retrieval capabilities.

table 2. marc/molds data base format
[the table lists each field in the marc/molds record: the fixed-field indicators (lc card number, type of main entry, form of work, bibliography, illustration, map, conference, and juvenile indicators, languages, publication dates, height, and uniform-title and series tracing indicators), the place and publisher codes, the lc and dewey class numbers, and the main entry, title, subtitle, edition, imprint, collation, series, note, subject tracing, personal and corporate author tracing, and lc card suffix fields. for each field it gives the molds block name and number, the field width in characters, and the marc i source field or fixed-field position; the fixed-field marc/molds record totals 848 characters.]

the molds system

functionally, the molds system consists of utility routines to store a data base, a well-defined query language, a language interpreter, and a set of logical procedures which allow the user to operate on a data base. the molds system is a set of fortran iv subroutines which perform the maintenance functions, interpret the commands in the query language, and perform the desired logical procedures. the subroutines render the system modular and open. it is therefore relatively easy for a programmer skilled in fortran iv to add, modify, and delete commands and functions as required. this feature of the system is quite desirable. user feedback invariably points up weaknesses in the language or suggests useful features which might be incorporated. molds was continually modified in response to user requirements, and each modification was implemented within a short time without requiring major programming changes throughout the system. the system has already grown since it was first implemented with the marc data base, and commands have been added or modified as required.

hardware configuration

marc/molds was run at the syracuse university computing center on an ibm 360/50 computer. originally, the on-line mode required full dedication of the computer during execution. the molds system requires some 150,000 bytes of main memory and a disk storage unit to hold the entire data base, as well as intermediate data generated by the user. the molds system has been implemented on other computers (2).
interaction with the system in the on-line version was carried on through an ibm 2260 display station consisting of a keyboard and crt (cathode ray tube) display screen. although two or more consoles have not as yet been operated simultaneously, the system is intended to be time-shared. effort was made to alter the system to operate in a 50,000 ( 50k) upper partition, so that it could be accessible at all times rather than on a scheduled basis. this involved reorganizing the program into an overlay structure in which the basic or root segments are resident in a fixed portion of memory throughout execution, while the remainder of the program is divided into a set of smaller segments which can overlay each other, being brought into memory only when needed. this task 152 ] ournal of library automation vol. 3/ 2 june, 1970 required a careful analysis of each subroutine for its dependence upon others, breaking the program into mutually exclusive segments, while ensuring that any given set of segments which occupied memory simultaneously did not exceed 50k bytes of storage. many of the larger segments which had to be further subdivided required considerable reprogramming. the first attempt at executing the new overlay version failed. due to a general lack of experience with the 2260 display units, it had not been anticipated that system software would not allow the console to be accessed from outside of the root segment, and the 2260 software package had been placed in an overlay area. as a result the original overlay configuration had to be altered. the console input/ ouput (i/0) package was moved into the root segment, increasing its size by several hundred bytes and similarly decreasing the amount of storage available for the overlay portions. therefore, it was necessary to develop yet another configuration to conform to these new storage limitations. while the necessary changes were being made, the computing center began operating a limited time-sharing system which itself required full dedication of the 360/50 machine. projected dates for returning to normal computer operations within a multi-partition environment were far enough in the future to suggest the efficacy of creating a new version of molds which could function off line, with cards and printer instead of the 2260 consoles. in this batch, or off-line, mode molds jobs could be submitted through the regular queue and run by computer center staff during batch processing time. with the on-line source program as a starting point, all references to 2260's were replaced with card reader and printer statements and the molds language instructions deleted which depended on the console for their use. mter all changes had been made and compilation was completed successfully, the off-line molds was exercised against a sample data base until it was satisfactorily debugged. since it was known that the computing center would eventually return to r artitioned operation, it was next undertaken to overlay the off-line molds into a 50k partition. this was accomplished with little difficulty since the problems encountered in working with the on-line version were largely due to the consoles. the end result of the entire task, therefore, was an off-line molds which could operate either in core or in overlay structure at the discretion of the user. the molds query language the molds query language includes some 34 distinct commands which must be entirely formulated by the user according to precise syntactical rules. 
the large number of commands is in part a reflection of the fact that this system provides the user with the ability to perform more operations of a greater variety on a data base than other interactive inforlc marc on molds/ atherton and miller 153 mation retrieval systems. it provides for retrieval of records from the data base according to data value descriptors, processing of data values by arithmetic and logical operations, sorting of retrieval records, and display of retrieval records in full or in part. operationally, the molds system regards a file of records as a set of parallel lists of blocks (figure 1). with the marc data base, these blocks were the 38 fields of catalog data (such as dewey class number, title, author, etc.). the commands in the molds query language are geared to list processing operations. in general, most of the molds commands will result in the formation of lists which are either identical in format to the original file, or are an independent list of alpha or numeric constants not subdivided into blocks. despite its surface complexity, the query language was designed specifically for users with absolutely no computer experience. the fixed format commands are easy to learn and use, even for the novice in computer based systems. they are mnemonic enough so that a little use soon brings an easy familiarity with them. commands in the molds query language there are six categories of commands in the language: retrieval, processing, display, storage, utility, and language augmentation. the commands are listed below with a brief explanation of each. retrieval commands: find: extract fetch define chain select forms a temporary subfile consisting of records from the data base for which the value in a specified block is equal, not equal, greater, greater or equal, less, less or equal to an input value. forms a temporary subfile consisting of records from an argument subfile for which the value in a specified block is equal, not equal, greater, greater or equal, less, less or equal to an input value. forms a temporary file which duplicates an existing file in the data base (added to original molds commands during this project) . forms a temporary subfile from two argument subfiles based on logical relationships and, or, not. forms a temporary subfile consisting of records from an argument subfile for which the value in a specified block is equal to any of the values in a specified block from a second argument subfile. forms a temporary subfile consisting of records from an argument subfile for which the value in a specified block is equal to any of the values in an argument list. 154 journal of library automation vol. 3/2 june, 1970 these six retrieval commands allow the user to extract selected data from the data base. selection is based on 1) a simple algebraic relationship (e.g., equal, not equal, greater than, etc.) between block values and a value specified by the user in the command (value may be alphanumeric or numeric), or 2) a simple logical relationship (e.g., and, or, not) between block values in two lists. all retrievals from molds files are based on exact-match correspondences between input descriptors and data values as they occur in records. each file is treated as distinct regardless of the fact that for the marc/ molds data base the second file may simply be a continuation of the first, etc. any block in a file may be used as an argument in a retrieval process. 
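as a rough illustration of how these retrieval commands operate on blocks, the python sketch below (added for this discussion; it is not molds syntax or code, and the record values are invented) implements find, define, and select as simple list-processing functions.

# rough analogues of the find, define, and select retrieval commands:
# a subfile is just a list of records, and any block can be the argument.
import operator

RELATIONS = {"E": operator.eq, "NE": operator.ne, "G": operator.gt,
             "GE": operator.ge, "L": operator.lt, "LE": operator.le}


def find(file_records, block, relation, value):
    """form a temporary subfile of records whose block satisfies the relation."""
    test = RELATIONS[relation]
    return [r for r in file_records if test(r[block], value)]


def define(subfile_a, subfile_b, logic):
    """combine two subfiles with and / or / not."""
    key = lambda r: tuple(sorted(r.items()))
    b_keys = {key(r) for r in subfile_b}
    if logic == "AND":
        return [r for r in subfile_a if key(r) in b_keys]
    if logic == "OR":
        a_keys = {key(r) for r in subfile_a}
        return subfile_a + [r for r in subfile_b if key(r) not in a_keys]
    if logic == "NOT":
        return [r for r in subfile_a if key(r) not in b_keys]
    raise ValueError(logic)


def select(subfile, block, values):
    """keep records whose block value equals any value in an argument list."""
    return [r for r in subfile if r[block] in values]


marc = [{"LANG": "ENG", "BIB": "X", "SUBJ": "PRINTING"},
        {"LANG": "ENG", "BIB": "",  "SUBJ": "TYPE-SETTING"},
        {"LANG": "FRE", "BIB": "X", "SUBJ": "PRINTING"}]
bibl = find(marc, "BIB", "E", "X")
engl = find(marc, "LANG", "E", "ENG")
both = define(bibl, engl, "AND")
print(select(both, "SUBJ", ["PRINTING", "TYPE-SETTING", "TYPE-FOUNDING"]))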
thus, the usual range of access points (author, title, subject, classification number) is considerably extended to include such unorthodox access points as juvenile literature, language, illustrations, and bibliographies. for example, one can retrieve all documents on a given subject or subjects which are juvenile books with bibliographies and illustrations published by a given publisher in 1966. the user can define his search limits with a degree of specificity not found in most interactive systems. however, the price he must pay is exactness in specifying the values used as retrieval criteria. the system will not retrieve on root words or key letter combinations, although such capability could be added. the block values must, therefore, be consistent and the user must have a precise knowledge of what they may be. this knowledge can be gained by examining the values and having them printed out as needed. (molds does have the capability of selecting unique values from a list, ordering them, and printing them out at any time during system operation. processing commands: count counts the number of records in an argument subfile or items in an argument list. order (reverse) maximum (minimum) total average arranges the records of an argument subfile in ascending (descending) order according to the values in a specified block or similarly sorts the values in an argument list. may be applied to alphabetic, numeric, and chronological data. selects the record containing the maximum (minimum) value in a specified block from an argument subfile, or the maximum (minimum) value in an argument list. may be applied to numeric or chronological data. calculates the sum of the values in a specified block of an argument subfile or of a list of numbers. calculates the average of the values in a specified block of an argument subfile or of a list of numbers. lc marc on molds/atherton and miller 155 median variance squareroot difference add (subtract multiply divide) calculates the median of the values in a specified block of an argument subfile or of a list of numbers. calculates the variance (standard deviation squared) of the values in a specified block of an argument subfile or of a list of numbers. calculates the square root of each value in a block of an argument subfile or of a list of numbers. calculates successive differences in the values of a specified block in an argument subfile or of a list of numbers. adds (subtracts, multiplies, divides ) the values from a specified block from an argument file (or list) to the corresponding values from a specified block from a second argument file (or list) . firstelement selects the first record from an argument subfile or reduce compress list. deletes the first record from an argument subfile or list. forms a temporary list composed of all the unique values in a specified block of an argument subfile or in an argument list. the eighteen processing commands allow the user to manipulate the data in the lists he has retrieved. he may count the number of elements in a list, arrange them in ascending or descending order, form the sum, average, variance, median and square root of a list of numbers; add, subtract, multiply, and divide one list by another, and select all unique elements from a list. the ability to process data as well as retrieve it may be unique to molds as compared to other interactive systems, and gives the language a useful added power. 
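a few of the processing commands can be sketched the same way; the python below is an added illustration with invented records, not a rendering of the molds subroutines.

# compact analogues of count, order, maximum, total, average, and compress,
# operating on a subfile (list of records) or one of its blocks.
def count(subfile):
    return len(subfile)

def order(subfile, block, reverse=False):
    return sorted(subfile, key=lambda r: r[block], reverse=reverse)

def maximum(subfile, block):
    return max(subfile, key=lambda r: r[block])

def total(subfile, block):
    return sum(r[block] for r in subfile)

def average(subfile, block):
    return total(subfile, block) / count(subfile)

def compress(subfile, block):
    """a temporary list of all the unique values in one block."""
    seen, unique = set(), []
    for r in subfile:
        if r[block] not in seen:
            seen.add(r[block])
            unique.append(r[block])
    return unique

books = [{"HITE": 24, "DATE": "1966"}, {"HITE": 28, "DATE": "1967"},
         {"HITE": 22, "DATE": "1966"}]
print(count(books), maximum(books, "HITE")["HITE"], average(books, "HITE"))
print(compress(books, "DATE"))   # ['1966', '1967']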
display commands display show print outputs on the crt (cathode ray tube) each complete record in an argument subfile (added to original molds commands during this project). outputs in columnar fashion on the crt selected blocks from up to three argument subfiles or lists (deleted in batch or off -line mode) . outputs in columnar fashion on the printer selected blocks from up to three argument subfiles or lists (added to original molds commands during this project) . the three display commands allow the user to display entire documents, or display selected books of information or records in columnar format. in 156 journal of library automation vol. 3/2 june, 1970 the on-line version of molds this may be done on the crt, or a printout made of selected blocks or lists of documents on the high speed printer. there is much flexibility and versatility in output format which is completely determined by the user. the command, show, is not used in the batch mode of molds. storage commands: set stores a single numeric value. store stores an alphabetic, chronological, or numeric list of arbitrary length. the two storage commands allow the user to insert independent lists of constants into the storage area. such lists do not become part of the data base, but are used in conjunction with retrieval and processing commands. utility commands: clear delete dump recall list deletes from storage a temporary subfile or list created during the session. deletes from storage all temporary subfiles or lists created during the session. displays on the crt in tabular fashion the names, file origins, and number of items in each subfile and list created by the user during the session (deleted in batch or off-line mode). displays on the crt the command which resulted in the creation of a specified temporary subfile or list (added to original molds commands during this project). produces printed copy of all commands issued during the session. may be used with stop at end of search (added to original molds commands during this project). the five utility commands allow the user to perform housekeeping operations, such as the clearing of storage areas, reinitialization of the system, and termination of execution. the command dump is not used in the batch mode of molds. language augmentation command: program allows the user to create new commands consisting of a sequence of basic commands and to store them for future sessions. the language augmentation command program, is one of the most important features of the language. it allows the user to create new commands tailormade to his own needs. this is shown in the first molds search query which follows. lc marc on molds/atherton and miller 157 search request formulation in marc/molds molds search query-example 1 (batch mode) program tally a/ count b a/ print b// end find zny ssoz/plcd/e/nyny/ tally zny/ print zny/plce/ plcd/pucd/ i find p67 ssoz/ date/e/1967 i tally p67/ define ny67 zny/and/p67/ tally ny67/ average avht ny67 /hite/ print avht/ stop the above example shows an off-line or batch-mode search. this sequence of commands would be keypunched and submitted as a job deck in the regular queue and run by the computer center staff, the searcher receiving the results as a printout from the high speed printer. ssoz is the name of one of the marc/molds files. this particular interaction shows the use of the operator program to augment the language in the subsequent search by adding tally to the list of commands. 
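the language augmentation idea can be shown with a small sketch: a user-defined command such as tally is stored as a sequence of basic commands and replayed on demand. the python below is an added illustration; the storage model, the abridged command set, and the $ARG placeholder are simplifications, not molds behavior.

# sketch of the program (language augmentation) command: tally is defined once
# as count-then-print and can then be used like any other command.
store = {}                       # temporary user storage: label -> result
macros = {}                      # user-defined commands


def count_cmd(label, operand):
    store[label] = len(store[operand])


def print_cmd(label, operand):
    print(operand, "->", store[operand])


BASIC = {"COUNT": count_cmd, "PRINT": print_cmd}


def program(name, steps):
    """define a new command as a stored sequence of basic commands."""
    macros[name] = steps


def run(name, argument):
    for command, label, operand in macros.get(name, []):
        BASIC[command](label, operand if operand != "$ARG" else argument)


# program tally a/ count b a/ print b//  -- roughly as in example 1 above
program("TALLY", [("COUNT", "B", "$ARG"), ("PRINT", None, "B")])
store["ZNY"] = ["rec1", "rec2", "rec3"]      # pretend this came from a find command
run("TALLY", "ZNY")                          # prints: B -> 3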
the following example shows a search query which is a sequence of some typical molds commands along with an explanation of the effect of each. each command has three parts. the first part (find, define, etc.) is the imperative which tells what operation is to be performed. the second part ( bibl, engl, both, etc.) is the label of the place in storage where the result of the operation is to be stored. this label is made up by the user when he gives a command. the third part of the command is the operand. in some cases the operand gives the criteria for retrieval (as in find, define) . it always gives the name or label of the list to be operated on, and in some cases specifies a particular block of that list. the request shown in this example was handled by molds to retrieve, display, and process all english language books on printing, or typesetting, or type founding which have bibliographies. the sequence illustt·ates the flexibility of molds, the many types of processing which can be done, the relatively easy way to use command format. this particular sequence was performed in the on-line version with chance for usersystem interaction after each command. 158 journal of library automation vol. 3/2 june, 1970 molds search query-example 2 (on-line mode) molds commands: find bibl marc/bib/e/x/ explanation: find all records in the file named marc for which the block named bib contains a value equal to (e) x (x in the block indicates presence of bibliographies). the list of selected records is to be stored in a location called bibl. find engl marc/lang/e/eng/ find all documents in the file named marc for which the block named lang contains a value equal to (e) eng, i.e. english language books. the list of selected records is to be !itored in a location called engl. define both bibl/ and/engl/ store subs 3/ alpha/13/ element 1 = printing/ element 2 = type-setting/ element 3 = define a new list called both which consists of the documents common to both bibl and engl, i.e., all english language books with bibliographies. inform the system that the user wishes to store, via the console, a list of values which will be called subs. the list will contain 3 elements which will be alphanumeric (alpha) as opposed to strictly numeric. the longest element will not exceed 13 characters. (system responds with these words.) user inserts first value by typing it on the console. (system responds with these words.) user inserts second value. (system responds with these words.) lc marc on molds/atherton and miller 159 type-founding/ select all both/subj/subsi count no. all/ show no.// print all/main/titl/lcno i i all/publ/plce/ i maximum big all/hite/ average ave all/hite/ user inserts third value. user has now created an independent list of three distinct valuesprinting, typesetting, type-founding and stored them in a location called subs. select all records from the list called both for which the values in the block named subj are equal to any of the values in the list called subs, i.e. those records for which the subject heading is printlng, type-setting, or type-founding. the selected records are stored in a location called all. count the number of records in the list called all. the count is stored in a location called no. display the contents of no. on the crt. produce a 5-column printed listing consisting of the values in the blocks ·named main (main entry), titl (title), lcno (library of congress classification number), publ (publisher), plce (place of publication) from each record of the list called all. 
from the list called all, select the record containing the maximum value in the block named hite (height) . the record is stored in a location called big. calculate the average of the values in the block named hite (height) of the list called all. the value is stored in a location called ave. 160 journal of library automation vol. 3/2 june, 1970 the following example records another interaction and the results in the off-line or batch mode. notice the error message which did not interrupt the search. this result also includes a report on the length of central processing unit (cpu) time each operation takes in hours, minutes, seconds and tenths of seconds. any line preceded by c indicates that the line was printed by the computer; any line minus the c indicates that the information was typed in by the user. molds retrieval-example 3 (batch mode) c please enter your program c line 1 oooooooopauline athertonooooooooo c invalid command name c set in at 185 day of 1969 16-01-17.1 c line 1 program tally a/ c line 1 count b a/ c line 2 print b// c line 3 end c set in 185 day of 1969 16-01-17.5 c line 2 find d2 ssoz/dew2/ne/o? c set in at 185 day of 1969 16-02-38.7 c line 3 find d1 ssoz/dewl/ne/ i c set in at 185 day of 1969 16-03-56.7 c line 4 tally d2/ c 950.00 c set in at 185 day of 1969 16-03-57.3 c line 5 tally d1/ c 905.00 c line 6 stop comments on marc/molds thus far this report has been confined to a more or less factual description of the components of the marc/molds system. no doubt the reader has asked himself many questions about the system, and made his own critical comparisons between this system and others. what follows are preliminary and necessarily subjective comments based on a lc marc on molds/atherton and miller 161 few demonstrations given to students in the school of library science and on the authors' own observations and reflections. system design response time response time (i.e. the time between transmission of a command in the on-line version and its execution) has been on the order of 90 seconds for a search of 620 records, to 20 seconds for an arithmetic operation involving the same number of records. when one thinks of these times in comparison with the time required to perform the same operations manually, they seem rapid. however, 90 seconds appears to be an unreasonably long period of time in a computer-based interactive retrieval environment. viewers of demonstrations often asked why it took the computer "so long" to perform a search. a user's tolerance for delay appears to vary a great deal with the type of retrieval system he is using. this has been observed on other occasions, but no determination has yet been made of tolerable limits in different environments, a determination that would be important in designing computer-based systems. · man-system interaction a design goal of most other existing interactive retrieval systems seems to be to give the computer certain anthropomorphic qualities and make it into a teacher or a responsive friend. such systems offer computeraided query formulation and/or a friendly conversation with the computer. the molds on-line system does not include either of these features. the user must first master a marc/molds manual which is an explanation of the system and the data base. he then goes on line and gives his command. molds responds by performing that command or by putting out a brief error message if the command format was improper. 
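as an added illustration of the three-part command format and the brief error reply, the toy parser below (python, not the molds interpreter; the command list is abridged) splits a line into imperative, result label, and slash-delimited operand.

# toy parser for the three-part command format shown in the examples above;
# the error reply imitates the terse "invalid command name" style of response.
KNOWN = {"FIND", "EXTRACT", "FETCH", "DEFINE", "CHAIN", "SELECT",
         "COUNT", "PRINT", "SHOW", "STORE", "SET", "STOP", "PROGRAM"}


def parse(line):
    parts = line.strip().rstrip("/").split(None, 2)
    if not parts or parts[0].upper() not in KNOWN:
        return None, "INVALID COMMAND NAME"
    imperative = parts[0].upper()
    label = parts[1] if len(parts) > 1 else None
    operand = parts[2].split("/") if len(parts) > 2 else []
    return (imperative, label, operand), None


for line in ["find bibl marc/bib/e/x/", "oooooooopauline athertonooooooooo"]:
    command, error = parse(line)
    print(error if error else command)
# ('FIND', 'bibl', ['marc', 'bib', 'e', 'x'])
# INVALID COMMAND NAME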
apparently the objective of conversation with the computer as found in most systems is to make it easier for the user to achieve desired results or to make him feel more at ease with the system. the person who plays with an interactive system once or twice probably finds conversations with a computer amusing, novel, and helpful in his first attempts. however, for a serious and steady user, carrying on the same conversation with the computer during each and every session can be tedious, repetitive, time consuming and sometimes circular. the optimum mix of computer-aided and independent user-formulated query is yet to be studied and found. perhaps molds, because it is a poor conversationalist, could aid in this search. at any rate, the automatic assumption of conversational features as a design goal for computer-based retrieval systems may not be based on sound knowledge of what suits the serious user. molds repertory of commands the processing commands in the molds query language are a wei162 journal of library automation vol. 3/2 june, 1970 come and valuable addition to the usual repertory of search and display commands common to most interactive systems. although the marc data base does not lend itself to a great deal of processing, we have found some commands useful, particularly count, order, maximum, minimum, and compress. processing times when individual commands of a single search take seconds of cpu time, it is certain that a retrieval system will be expensive if it is employed by a great many users as a general purpose system. some of the molds commands operating on the marc data base took whole minutes of cpu time! the authors have learned a great deal about interactive retrieval systems by using molds experimentally, but because of the excessive cost of certain runs, may not be able to continue research with it. modifications will have to be made to make it more efficient (i.e. cheaper to run) before it could be recommended for general use in the syracuse university library school or anywhere else. if the molds system can be designed to yield good results for certain types of searches with a realistic file size, it will be a boon to the library or educational institution seeking to automate some part of its searching procedures. data base noah prywes ( 14) has commented, "the effectiveness in retrieving documents is highly dependent on the amount of labor and processing invested in the storage of documents." the minimum amount of processing done on the marc tapes has, in fact, limited the effectiveness of retrieval. the extreme simplicity of the general molds data base structure is worthy of study. the efficiency and cost of retrieval using this structure needs to be compared very carefully with more sophisticated threaded lists. one extremely important factor to consider will undoubted· ly be the effect of increasing the size of the file. as pointed out before, the molds system requires an exact match of punctuation and spelling between retrieval criteria and stored data items, a match difficult to achieve. to be sure, this is partially a limitation in the molds system that may be relaxed by incorporating a capability to search for root words and key letter combinations. however, the many inconsistencies in abbreviations, punctuation, and spelling that appear in bibliographic records when information on title pages is transcribed, as on the marc tapes, can enormously complicate effective retrieval. 
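the difference between the exact-match rule molds used and the root-word matching it lacked can be shown in a few lines; the python below is an added illustration and does not reflect any capability that was actually implemented.

# contrast of the exact-match rule with a looser root-word (prefix) rule that
# tolerates differences in case, punctuation, and word endings.
import re


def exact_match(query, value):
    return query == value


def root_word_match(query, value):
    """true when every query token begins some token of the stored value."""
    value_tokens = re.findall(r"[a-z0-9]+", value.lower())
    return all(any(tok.startswith(q) for tok in value_tokens)
               for q in re.findall(r"[a-z0-9]+", query.lower()))


stored = "Type-founding -- Hist. & criticism."
print(exact_match("TYPE-FOUNDING", stored))        # False: punctuation and case differ
print(root_word_match("type found", stored))       # True: matches on word roots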
marc or non-marc bibliographic records will always contain some "author" variations that such a system as molds may have to accommodate. this is a very knotty problem. these comments are not to be construed as a criticism of the fine work the library of congress has done in its marc pilot project. the marc lc marc on molds/atherton and miller 163 pilot project record format, with sometimes indistinct data elements ( special punctuation marks and symbols), was not specifically designed for computer-based interactive search systems. hopefully, the use herein described to which the marc data base has been put, and the experience derived from that use, will be of value as future modifications of the marc format are made. mter all, reference retrieval, using bibliographic information, automated or manual, is natural to libraries and is, indeed, one of the purposes for which that information is recorded in the first place. since one of the true values of a computer-based file lies in making multiple use of the records, it becomes imperative to test the various uses to which these records can be put. the future use of marc/molds at syracuse university the marc/molds system has undergone continual modification in data base structure and query language during the first year of work on it. a computer-based system must be capable of such flexibility, for changes should be accomplished easily and smoothly. no system is perfect, especially in its early days, least of all molds. it is intended to continue investigation into information-seeking behavior, and to use marc/ molds occasionally along with other retrieval systems. another paper describes use of the marc file with the ibm/ document processing system ( 15). summary this report has tried to describe, not sell, marc/molds as fairly as possible in the belief that some of its features should be considered by persons designing interactive systems, and by those responsible for refinement of the marc format. the searching capability is valuable as it increases the access points to the data. the arithmetic and logical operations provide an opportunity to perform certain studies of the marc data base. the marc files will eventually have many applications beyond technical processing functions in libraries. these applications would be more practically implemented if the marc format were modified to accommodate them and if librarians would use systems such as molds during their exploration of alternatives. marc/molds as a computer-based system has many wealmesses. outnumbering and to some extent overshadowing the concrete statements about its faults is its great potential. many questions have been raised which remain unanswered. questions dealing with the basic design of the system and data base are indicative of the development and experimentation which must be done before computer-based interactive retrieval in libraries is a practical reality. acknowledgments the work on this project has been supported by rome air develop164 journal of library automation vol. 3/2 june, 1970 ment center (contracts. u. no. af30 (602)-4283). related work, supported by a grant from the u. s. office of education, provided an education in understanding of the marc tapes. the authors gratefully acknowledge the comments made by phyllis a. richmond and frank martel on the original manuscript. mrs. 
sharon stratakos, programmer most responsible for molds, contributed a great deal to the authors' understanding of this retrieval program and its potential use with a bibliographic reference file such as marc. program microfiches and photocopies of the following may be obtained from national auxiliary publications service of asis: "rome project program description: molds support package" (naps 00884). references 1. avram, henriette: the marc pilot project, final report (washington, d. c.: library of congress, 1968). 2. a user-oriented on-line data system (syracuse, n. y.: syracuse university research corp., 1966). 2 v. 3. freeman, robert r.; atherton, pauline: audacious-an experiment with an on-line interactive reference retrieval system using the universal decimal classification as the index language in the field of nuclear science (new york: american institute of physics, april 25, 1968) (aip/udc-7). 4. burnaugh, h . p.; et al: the bold user's manual (revised) (santa monica, cal.: jan. 16, 1967) ( tm-2306/004/01). 5. cegala, l.; waller, e.: colex user's manual (falls church, va.: system development. feb., 1969) (tm-wd-(l)-405/000/00). 6. smith, j. l.; micro: a strategy for retrieving ranking and qualifying document references (santa monica, cal.: jan. 15, 1966) (sp 2289). 7. green, james sproat: grins : an on-line structure for the negotiation of inquiries (bethlehem, pa.: lehigh university, center for the information sciences, september 1967) . 8. computer command and control company: description of the multilist system (philadelphia, pa.: july 31, 1967. 9. national aeronautics and space administration, scientific and technical information division: nasa/recon user's manual (washington, d. c.: october 1966). 10. kessler, m. m.: tip user's manual (cambridge, mass.: massachusetts institute of technology, dec. 1, 1965). 11. biomedical communication network: user's training manual (syracuse, new york : december 1968). 12. welch, noreen 0. : a survey of five on-line retrieval systems (washington, d. c.: mitre corp., august 1968) (mtp-322). lc marc on molds/atherton and miller 165 13. avram, henriette d.; knapp, john f.; rather, lucia j.: the marc ii format (washington, d. c.: library of congress, 1968). 14. prywes, noah s.: on-line information storage and retrieval (philadelphia, pa.: university of pennsylvania, moore school of electrical engineering, june 1968). 15. atherton, p.; wyman, j.: "searching marc project tapes using ibm/document processing system," proceedings of american society for information science, 6 ( 1969), 83-88. 90 oclc search key usage patterns in a large research library kunj b. rastogi: oclc; and ichiko t. morita: ohio state university, columbus. many libraries use the oclc online union catalog and shared cataloging subsystem to perform various library functions, such as acquisitions and cataloging of library materials. as an initial part of the operations, users must search and retrieve a bibliographic record for the desired item from the large oc lc database. various types of derived search keys are available for retrieval. this study of actual search keys entered by users of the oclc online system was conducted to determine the types of search keys users prefer for performing various library operations and to find out whether the preferred search keys are effective. introduction in the last decade, many information systems have been developed that use search keys to retrieve bibliographic records from large databases. 
the oclc online union catalog and shared cataloging subsystem in particular is one of the larger of these systems. 1--u there are currently more than 7 million bibliographic records in the oclc database. the oclc online system uses search keys to access various index files that locate bibliographic records in the database. index files are maintained for name/title, personal author, corporate author, coden, isbn, and lccn indexes. the first four of the above index files contain search keys that are derived from information (e. g., author, title) present in the piece or citation. search keys in these four indexes are in general not unique, because the derived key could be the same for different bibliographic records. the last three indexes (coden, isbn, and lccn) contain search keys or identifiers that are unique in general. a user enters a search key consisting of characters (letters, numbers, symbols, commas, hyphens) formatted according to specific rules that identify to the system which index file to search. for example, to search the name/title index, the user enters a search key consisting of the first four characters of the author's last name and the first four characters of manuscript received october 1980; .accepted december 1980. search key usage!rastogi and morita 91 the first nonarticle word of the title of the work, separated by a comma. to search the title index, the user enters a search key consisting of the first three characters of the first nonarticle word in the title, the first two characters of the second word, the first two characters of the third word, and the first character of the fourth word, each separated by a comma. 7 the system compares the user-entered search key with the search keys contained in that index file. this comparison results in one of three possible cases: l. only one index file search key matches the user-entered search key . 2 . more than one index file search key matches the user-entered search key. 3. no index file search key matches the user-entered search key. in the first case, the system retrieves the unique bibliographic record corresponding to the search key and displays it on the user's terminal screen. in the second case, the system retrieves all records that correspond to the search key, prepares truncated entries (consisting of author, title, imprint data, etc.) for those records, and displays the truncated entries on the user's terminal screen . the user then selects the truncated entry that corresponds to the desired record and requests the system to display the full record for that item. in the third case, the system responds with the reply that a record matching the user-entered search key was not present (a "not found" response) in the index. in the oclc online system, 2,500 member libraries ·using 3,800 terminals search the oclc database to perform various library functions such as acquisitions, monograph cataloging, and serials cataloging. users can choose to enter any type of search key from the various types of search keys permitted by the system. users' preferences to enter a particular type of search key will depend in part upon the kind of information they have about the item to be searched and the type of library function they wish to perform. if users receive a "not found" response after entering a particular type of search key, they may then try a different type of search key that they consider next best. 
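the derived-key rules described above are easy to sketch; the python below is an added illustration, and its handling of nonarticle words is reduced to a tiny stop list rather than the oclc system's actual rules.

# sketch of the derived search keys: the 4,4 name/title key and the 3,2,2,1 title key.
ARTICLES = {"a", "an", "the"}


def _words(title):
    return [w for w in title.lower().split() if w not in ARTICLES]


def name_title_key(author_last_name, title):
    """first four characters of the author's surname, comma, first four of the title."""
    return f"{author_last_name.lower()[:4]},{_words(title)[0][:4]}"


def title_key(title):
    """first 3, 2, 2, and 1 characters of the first four nonarticle title words."""
    words = _words(title) + ["", "", "", ""]         # pad short titles
    return ",".join(w[:n] for w, n in zip(words, (3, 2, 2, 1)))


print(name_title_key("Melville", "The Confidence-Man"))   # melv,conf
print(title_key("The Old Man and the Sea"))               # old,ma,an,s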
the purpose of this study was to determine what types of search keys are preferred to perform various library functions and whether the preferred search keys are effective. the study also investigated what type of search key is used next when particular types of search keys are unable to retrieve the desired record to determine if there are any discernible search patterns. materials and methods for conducting this study, data were needed on the pattern of searchkey use in oclc member libraries. further, the data had to include the actual time of day when work was performed for a particular library 92 journal of library automation vol. 14/2 june 1981 function on a specific terminal. this requirement would permit identification in the online system use data collected by oclc of search keys entered to perform specific library functions. ideally, a library with several oclc terminals, each used exclusively for only one library function, was desired. the ohio state university (osu) library met this requirement. the osu library has eleven terminals: two of the eleven terminals are used exclusively for performing acquisition functions, seven are used for monographic cataloging, and one terminal each is used for serials cataloging and public use. the terminal assigned for serials cataloging is used for monograph cataloging after 5 p.m. library staff at osu use all the terminals exclusively, except for the public-use terminal. this public-use terminal can be used by anyone, including faculty, students, and library staff. two full days' transactions for each of the osu terminals were obtained from the oclc online system use statistics (olsus) file. during the online operation, the system writes a record on the olsus file for each message entered by the user. this record includes the institution number, a number identifying the terminal from which the message came, the time of the transaction, and the first nonblank sixteen characters of the message . if the user-entered message is a search key, the system response is either a "not found" response or a "found" response. with the "found" response, the system displays the bibliographic record (if unique) or displays a truncated entry screen. however, a "found" response does not necessarily mean that the truncated entry screen includes information about the bibliographic record the user was actually seeking. for the study, a program was written to scan the records in the olsus file for two full days in october 1978. the program extracted all the records for messages that came from the eleven osu terminals and wrote the records on two tapes--one for each day's activity. these tapes were sorted first by the terminal number and then within each terminal number by the time of transaction. each sorted tape was fed to another program that printed, for each terminal, the actual messages in chronological order and the associated system response. from this printout, it was possible manually to go through the complete sequence of messages entered to search a single bibliographic item. the printout for an entire day's activity for each terminal was thus divided into sections, each section containing all transactions that were performed to search for a single item. for each section, the type of search key first entered and the system response was noted. in case of a "not found" response, the type of search key next entered (if the search process was continued for the item) also was noted. 
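the manual grouping of transactions into per-item search sequences can be approximated in code; the python sketch below is an added illustration whose transaction layout and key-type labels are assumptions, not the olsus record format.

# group one terminal's time-ordered transactions into per-item search sequences,
# then note the primary key type and any secondary choice after a not-found reply.
def sessions(transactions):
    """each sequence runs until a 'found' reply or until searching is abandoned."""
    current, out = [], []
    for key_type, response in transactions:
        current.append((key_type, response))
        if response == "found":
            out.append(current)
            current = []
    if current:
        out.append(current)              # search abandoned after not-found replies
    return out


def primary_and_secondary(session):
    primary = session[0][0]
    secondary = session[1][0] if len(session) > 1 else None
    return primary, secondary


terminal_log = [("lccn", "not found"), ("name/title", "found"),
                ("name/title", "found"),
                ("title", "not found"), ("title", "not found")]
for s in sessions(terminal_log):
    print(primary_and_secondary(s))
# ('lccn', 'name/title')   ('name/title', None)   ('title', 'title')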
the results were combined for all the terminals used to perform a specific library function (e.g., acquisitions) and for the two days. results and discussion table 1 and figure 1 show the different types of search keys used as the first choice to perform various library functions. note that at the time of data collection for this study, the interlibrary loan subsystem was not operational.

table 1. different types of searches for various applications

                    acquisitions         monograph cataloging   serials cataloging     public use
type of search    items      % of       items      % of        items      % of        items      % of
                  searched   total      searched   total       searched   total       searched   total
name/title           111      37.5         313      51.7          15       15.9          77       48.7
title                 49      16.6          48       7.9          72       76.6          44       27.8
personal author        0       0.0           9       1.5           0        0.0          16       10.1
lccn                 122      41.2         201      33.2           1        1.1          13        8.2
isbn                  14       4.7          34       5.6           1        1.1           3        1.9
issn                   0       0.0           0       0.0           5        5.3           3        1.9
coden                  0       0.0           0       0.0           0        0.0           2        1.3
total                296     100.0         605     100.0          94      100.0         158      100.0

[figure 1. number of different types of search keys for various applications.]

during the two-day period, a total of 605 items were searched for monograph cataloging, 296 items were searched for acquisitions operations, and 94 items were searched for serials cataloging. a total of 158 items were searched on the public-use terminal. most types of search keys were used to some extent. the use of isbn and issn search keys was quite limited for all types of library functions. the coden search key was used only twice, and both times through the public-use terminal. the corporate author search key was not used at all. the use of the personal-author search key was much smaller than expected. this was probably because at the time of the study the system did not permit use of personal-author keys during peak hours (9 a.m. to 5 p.m.) of online system operation. for the acquisitions function, the lccn search key was used most often, followed by the name/title key. these two types of keys together were used for about 80 percent of the acquisitions items searched. for the monograph cataloging function, the most frequently used search key was the name/title key. this key was entered for about 52 percent of items searched. the next most frequently used key for monograph cataloging was the lccn key, used for about 33 percent of the items searched. for the serials cataloging function, the title key was used most often, for more than 75 percent of the items searched. searches performed through the public-use terminal included all types of search keys. the name/title key was used most frequently, followed by the title key. before performing an actual search, a user must choose, from among the various types of search keys available in the oclc system, the particular search key to use. if the search key used for a first try (primary choice of search key) results in a "not found" response from the system, a second key may be entered (secondary choice of search key). this sequence may continue through many search-key choices until the user retrieves the desired record ("found" response) or decides to abandon the search at some point upon obtaining a "not found" response. for this study, the investigation was confined to only primary and secondary choices of search keys.
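from per-item sequences like those sketched earlier, the tallies reported in tables 2 through 5 below can be derived: the primary key choice, its hit rate, and the secondary key (or abandonment) after a first "not found" response. the sketch below is a hedged illustration, assuming each message already carries a key_type classified from the format of the entered key; the published tables were compiled manually.

```python
# illustrative sketch only: tally primary-choice counts, hit rates, and the
# secondary choice (or abandonment) after a first not-found response, in the
# shape of tables 2-5. assumes each message dict carries a 'key_type' inferred
# from the format of the entered key; the original tables were compiled by hand.
from collections import Counter, defaultdict

def tally(sequences):
    primary = Counter()                 # items searched, by first key type
    primary_found = Counter()           # found on the first try
    after_miss = defaultdict(Counter)   # first key type -> second key type or 'abandoned'
    for _terminal, item in sequences:
        first = item[0]
        primary[first["key_type"]] += 1
        if first["response"] == "found":
            primary_found[first["key_type"]] += 1
        elif len(item) > 1:
            after_miss[first["key_type"]][item[1]["key_type"]] += 1
        else:
            after_miss[first["key_type"]]["abandoned"] += 1
    hit_rate = {k: primary_found[k] / primary[k] for k in primary}
    return primary, hit_rate, after_miss
```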
the results of the "found" responses for the primary choice of key and for the secondary search key entered after receiving the first "not-found" response are presented in tables 2 through 5.

table 2. number of primary and secondary choices of search keys for acquisitions

type of key        items     found      % of     not-found   type of search key used after the first not-found response           search discontinued
used first        searched  responses   found    responses   name/title    title        pers. author  lccn        isbn           after first not-found
name/title           111       57       51.3        54       17 (31.5%)   22 (40.7%)   0 (0.0%)      1 (1.9%)    0 (0.0%)       14 (25.9%)
title                 49       17       34.7        32        6 (18.8%)   11 (34.4%)   0 (0.0%)      2 (6.2%)    1 (3.1%)       12 (37.5%)
personal author        0        -          -         -            -            -           -             -           -               -
lccn                 122      109       89.3        13        5 (38.4%)    1 (7.7%)    0 (0.0%)      2 (15.4%)   1 (7.7%)        4 (30.8%)
isbn                  14        1        7.1        13        8 (61.5%)    3 (23.1%)   0 (0.0%)      0 (0.0%)    0 (0.0%)        2 (15.4%)
issn                   0        -          -         -            -            -           -             -           -               -
coden                  0        -          -         -            -            -           -             -           -               -
total                296      184       62.2       112       36 (32.1%)   37 (33.0%)   0 (0.0%)      5 (4.5%)    2 (1.8%)       32 (28.6%)

note: to calculate the percentage given in parentheses, the number of "types of search key used after the first not-found response" was divided by the number of "not-found responses."

for the acquisitions function (table 2), the most frequently used primary search key was the lccn key, which retrieved the desired record about 89 percent of the time. when the lccn key could not retrieve the record, the users mostly chose the name/title key as their secondary choice or abandoned the search. the next most frequently used primary search key was the name/title key, which retrieved the desired record about 51 percent of the time. when the name/title key was unsuccessful, the users entered as their secondary search key a title key about 41 percent of the time, or a different name/title key about 31 percent of the time. approximately 26 percent of the time they abandoned the search. it seems that acquisitions users mostly try the lccn key first if it is available (the lccn is not present in all records) and the name/title key first if the lccn is not available. thus, users adopted the right approach, since the lccn key has the highest hit rate. furthermore, the lccn key is more efficient than other keys because it results, on the average, in fewer replies. for the monograph cataloging function (table 3), the name/title key was used most often as the primary search key, resulting in retrieval of the desired record about 57 percent of the time. when the name/title key could not retrieve the record, the users next attempted a title key (52 percent of the time) or a different name/title key (21 percent of the time). about 23 percent of the time they discontinued the search. the lccn key was the second most frequently used primary search key and successfully retrieved the record about 79 percent of the time. when the lccn key was unsuccessful, the users tried the name/title key (58 percent of the time) as their secondary choice or abandoned the search. unlike the search-key usage pattern for acquisitions, the use of the lccn key for monograph cataloging was lower than use of the name/title key, although here also the hit rate was highest for the lccn key. the reason lccn use was lower is that ohio state university, being a research institution, processes a large number of items from various sources other than regular acquisitions channels, and many of these sources do not have lccn information.
table 3. number of primary and secondary choices of search keys for monograph cataloging

type of key        items     found      % of     not-found   type of search key used after the first not-found response           search discontinued
used first        searched  responses   found    responses   name/title    title        pers. author  lccn        isbn           after first not-found
name/title           313      180       57.5       133       28 (21.1%)   69 (51.9%)   1 (0.7%)      4 (3.0%)    1 (0.7%)       30 (22.6%)
title                 48       24       50.0        24        9 (37.5%)    2 (8.3%)    1 (4.2%)      3 (12.5%)   2 (8.3%)        7 (29.2%)
personal author        9        3       33.3         6        4 (66.6%)    0 (0.0%)    0 (0.0%)      0 (0.0%)    1 (16.7%)       1 (16.7%)
lccn                 201      158       78.6        43       25 (58.1%)    4 (9.3%)    0 (0.0%)      2 (4.7%)    1 (2.3%)       11 (25.6%)
isbn                  34        3        8.8        31       20 (64.5%)    4 (12.9%)   1 (3.2%)      1 (3.2%)    3 (9.7%)        2 (6.5%)
issn                   0        -          -         -            -            -           -             -           -               -
coden                  0        -          -         -            -            -           -             -           -               -
total                605      368       60.8       237       86 (36.3%)   79 (33.3%)   3 (1.3%)     10 (4.2%)    8 (3.4%)       51 (21.5%)

note: to calculate the percentage given in parentheses, the number of "types of search key used after the first not-found response" was divided by the number of "not-found responses."

for the serials cataloging function (table 4), the title key was the first primary choice and retrieved the desired records 44 percent of the time. if this key failed to retrieve the desired records, the users entered as their secondary key a different title key 55 percent of the time and a name/title key 17 percent of the time. approximately 23 percent of the time, users decided to discontinue the search. although for serials cataloging the title key was used most frequently, its hit rate was less than 45 percent. on the other hand, the issn key was used very little, but its hit rate was as high as 80 percent. the use of the issn key is likely to increase in the future, however, because the united states postal service now requires the issn to be present on serials.8 therefore, the issn will be more readily available to the user.

table 4. number of primary and secondary choices of search keys for serials cataloging

type of key        items     found      % of     not-found   type of search key used after the first not-found response           search discontinued
used first        searched  responses   found    responses   name/title    title        pers. author  lccn        isbn           after first not-found
name/title            15        3       20.0        12        6 (50.0%)    4 (33.3%)   1 (8.3%)      0 (0.0%)    0 (0.0%)        1 (8.3%)
title                 72       32       44.4        40        7 (17.5%)   22 (55.0%)   2 (5.0%)      0 (0.0%)    0 (0.0%)        9 (22.5%)
personal author        0        -          -         -            -            -           -             -           -               -
lccn                   1        0        0.0         1        0 (0.0%)     1 (100.0%)  0 (0.0%)      0 (0.0%)    0 (0.0%)        0 (0.0%)
isbn                   1        0        0.0         1        1 (100.0%)   0 (0.0%)    0 (0.0%)      0 (0.0%)    0 (0.0%)        0 (0.0%)
issn                   5        4       80.0         1        0 (0.0%)     1 (100.0%)  0 (0.0%)      0 (0.0%)    0 (0.0%)        0 (0.0%)
coden                  0        -          -         -            -            -           -             -           -               -
total                 94       39       41.5        55       14 (25.5%)   28 (50.9%)   3 (5.4%)      0 (0.0%)    0 (0.0%)       10 (18.2%)

note: to calculate the percentage given in parentheses, the number of "types of search key used after the first not-found response" was divided by the number of "not-found responses."

among the searches performed through the public-use terminal (table 5), the most frequently used primary search key was the name/title key, which resulted in a successful search about 29 percent of the time. when patrons encountered a "not found" response, they tried as their secondary choice a different name/title key 29 percent of the time, or a title key 29 percent of the time. they abandoned the search 38 percent of the time. as mentioned earlier, the public-use terminal can be used by anyone, including faculty and students.
table 5. number of primary and secondary choices of search keys for public use

type of key        items     found      % of     not-found   type of search key used after the first not-found response           search discontinued
used first        searched  responses   found    responses   name/title    title        pers. author  lccn        isbn           after first not-found
name/title            77       22       28.6        55       16 (29.1%)   16 (29.1%)   0 (0.0%)      2 (3.6%)    0 (0.0%)       21 (38.2%)
title                 44       20       45.4        24       11 (45.8%)    9 (37.5%)   0 (0.0%)      0 (0.0%)    0 (0.0%)        4 (16.7%)
personal author       16        5       31.3        11        0 (0.0%)     0 (0.0%)    3 (27.3%)     0 (0.0%)    0 (0.0%)        8 (72.7%)
lccn                  13        5       38.5         8        2 (25.0%)    2 (25.0%)   0 (0.0%)      1 (12.5%)   1 (12.5%)       2 (25.0%)
isbn                   3        2       66.7         1        0 (0.0%)     0 (0.0%)    0 (0.0%)      0 (0.0%)    1 (100.0%)      0 (0.0%)
issn                   3        1       33.3         2        0 (0.0%)     0 (0.0%)    0 (0.0%)      0 (0.0%)    0 (0.0%)        2 (100.0%)
coden                  2        0        0.0         2        0 (0.0%)     1 (50.0%)   0 (0.0%)      0 (0.0%)    0 (0.0%)        1 (50.0%)
total                158       55       34.8       103       29 (28.2%)   28 (27.2%)   3 (2.9%)      3 (2.9%)    2 (1.9%)       38 (36.9%)

note: to calculate the percentage given in parentheses, the number of "types of search key used after the first not-found response" was divided by the number of "not-found responses."

the hit rate for the name/title key at this terminal was rather low. from this study, it is not possible to say whether this was due to patrons' lack of knowledge in key construction or lack of sufficient information needed for the construction of the key. summary and conclusions among the various types of search keys available to users, the name/title, lccn, and title search keys were entered most frequently. the use of personal-author, isbn, issn, and coden search keys was very limited for all library functions. corporate-author search keys were not used at all. for the acquisitions function, system users most frequently entered the lccn key, followed by the name/title key. for monograph cataloging, the users entered the name/title key most frequently, followed by the lccn key. for serials cataloging, the use of the title key was the most common. persons using public-use terminals entered mostly name/title and title search keys. for the acquisitions and monograph cataloging functions, the lccn key was most successful in retrieving the desired records. the next most successful key was the name/title key. for both of these functions, when the name/title key failed to retrieve the record, users next tried the title key most of the time. for serials cataloging, the title key was used most frequently but was not very successful in retrieving serial records. on the other hand, the issn key was the most successful, but it was used very little. individual identifiers such as lccn, issn, isbn, and coden are very efficient search keys because they retrieve, on the average, far fewer replies than other types of search keys. with the exception of the lccn, the individual identifiers were used only to a small extent. from this study, it is not possible to answer questions such as: why weren't individual-identifier search keys used more often? did a searcher use a name/title key even when the lccn was available?
to answer such questions, data will have to be collected concerning what kind of information is available to the searcher when constructing the search keys. acknowledgments the authors wish to thank william h. hochstettler for programming assistance, and peggy zimbeck for editorial assistance with the manuscript. references 1. f. g. kilgour, p. l. long, and e. b. leiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7:79-82 (1970). 2. f. g. kilgour and others, "title-only entries retrieved by truncated search keys," journal of library automation 4:207-10 (dec. 1971). 3. p. l. long and f. g. kilgour, "a truncated search key title index," journal of library automation 5:17-20 (march 1972). 4. a. l. landgraf and f. g. kilgour, "catalog records retrieved by personal author using derived search keys," journal of library automation 6:103-8 (june 1973). 5. a. l. landgraf, k. b. rastogi, and p. l. long, "corporate author entry records retrieved by use of derived truncated search keys," journal of library automation 6:151-61 (sept. 1973). 6. j. d. smith and j. e. rush, "the relationship between author names and author entries in a large on-line union catalog as retrieved using truncated keys," journal of the american society for information science 28, no. 2:115-20 (march 1977). 7. oclc, inc., searching the on-line union catalog (columbus, ohio: oclc, inc., 1979). 8. library of congress information bulletin 37:35 (1 sept. 1978). kunj b. rastogi is a research scientist at oclc. ichiko morita is assistant professor at the ohio state university libraries.
deborah l. macpherson digitizing the non-digital: creating a global context for events, artifacts, ideas, and information
deborah l. macpherson (debmacp@gmail.com) is projects director, accuracy&aesthetics (www.accuracyandaesthetics.com) in vienna, virginia.
this paper discusses some of the problems associated with search and digital-rights management in the emerging age of interconnectivity. an open-source system called context driven topologies (cdt) is proposed to create one global context of geography, knowledge domains, and internet addresses, using centralized spatial databases, geometry, and maps. the same concept can be described by different words, the same image can be interpreted a thousand ways by every viewer, but mathematics is a set of rules to ensure that certain relationships or sequences will be precisely regenerated. therefore, unlike most of today's digital records, cdts are based on mathematics first, images second, words last. the aim is to permanently link the highest quality events, artifacts, ideas, and information into one record documenting the quickest paths to the most relevant information for specific data, users, and tasks. a model demonstration project using cdt to organize, search, and place information in new contexts while protecting the authors' intent is also introduced. ■ statement of the problem human history is composed of original events, artifacts, ideas, and information translated into records that are subject to deciphering and interpretation by future generations (figure 1). it's like putting together a puzzle, except that each person assembling bits and pieces of the same information may end up with a different picture. we are at a turning point in the history of humanity's collective knowledge and expertise. we need more precise ways to structure questions and more interactive ways to interpret the results.
today, there is nearly unlimited access to online knowledge collections, information services, and research or educational networks to preserve and interpret records in more efficient and creative ways.1 there is no reason digital archiving and dissemination techniques could not also be used to streamline redundancies between collections and build cross-references more methodically.2 content should be presented and techniques utilized according to orderly specifications. this will help to document work more responsibly, making shared records more correct, interesting, and complete. the open-source system proposed, context driven topologies (cdt), packs and unpacks ideas and information in themes similar to museum exhibitions, using specifications created by each author and network. data layers are formed by registering unique combinations of geography, knowledge domains, and internet addresses to create multidimensional shapes showing where data originate, where they belong, and how they relate to similar information over time. the topologies can be manipulated to consolidate and compare multiple sources to identify the most reliable source, block out repetitious or irrelevant background information, and broadcast precise combinations of ideas and information to and from particular places. "places," in this sense, means geographic region and cultural background, knowledge domain and education level, and all of their corresponding online resources. modern information must be searchable on multiple and simultaneous levels.3 today's searches occur for a number of reasons that did not exist when most current collections, repositories, and publications were created. digital records have the potential to reach far broader audiences than original events, artifacts, and ideas. therefore, digitized items and the acts of publishing and referencing over networks could theoretically serve a longer-term and more expanded purpose than most individual collections, repositories, or publications are designed to serve. there is no shortage of interesting work to look at. we live in a complex world that is just recently being digitized, mapped, analyzed, and broadcast over the internet in fine detail and compelling overall relationships. many of these relationships require mathematics, images, and maps to explain them.
[figure 1. 50 word word-search-puzzle (courtesy of kevin lightner)]
we need more than keywords to explore and reference all that has been documented, but we have formed the habit of using keywords and machine-based classification schemes. the entire digital world is in a mire of conflicting priorities, funding opportunities, and intellectual quests toward the future. to advance humanity's collective curiosity and knowledge, and to coordinate similar efforts across disciplines and cultures, we need one form of record keeping. one global context to show: 1. where ideas and information begin; 2. if the original is non-digital (e.g., an artifact or real world event), and if so, the location where the artifact resides or the time and place of the event; and 3.
a marking system to keep track of the ways information has been exchanged, reinterpreted, and reused to create a more comprehensive and simplified guide to humanity's collective knowledge and expertise. digitizing the non-digital is a concept to address three issues: ■ tools to assemble the bigger pictures needed to document the best paths to the most relevant information in sets rather than retrieving results item by item; ■ placeholders for information that has not been digitized or was never recorded; and ■ distribution to and from specific places according to the ways it is used, the kind of information it is, and the types of people who are able to understand it. there is currently little distinction between all data that have been collected or exist, versus the data and techniques selected to draw conclusions. there are no tools to differentiate between information under rigorous discussion by a discipline or culture versus random bits and pieces. there is a need to develop the equivalent of interpretive exhibits to instruct and inspire the general public. there is currently no way to herd information into crowded areas to be consolidated, compressed, and prioritized by its relationship to similar ideas and information. citation patterns are able to show connections or structure related information.4 however, they currently do not show whether the reference is for or against the other work. there are very few big pictures.5 there is no way to trace where an idea has led over time. the global context proposed is not like the ancient library of alexandria or large-scale contemporary initiatives. the envisioned process looks beyond the quest to digitize or publish every available event, artifact, and idea. it is not about each item itself. it is being able to make sense of the ways the same information can be viewed in different contexts, and being able to construct a reliable process to search and document the results. having bigger pictures will allow researchers, curators, and others to see what is missing or decide which archival works should be converted into digital form. we do not have the time, resources, or reasons to digitize every item in every collection. the aim is to gradually identify what the most telling examples are in different areas so someone new to an event, artifact, idea, or information can see it in various contexts and automatically be shown the most compelling or instructive sequences first (figure 2).
[figure 2. photomosaic®: thousands of miniature images of the civil war combine to make one large portrait. (courtesy of robert silvers)]
a coordinated effort to overlap and see all archives and publications by ranking accuracy and appeal to the public in relationship to all knowledge will make it possible for entirely new lines of inquiry to be established. it will help researchers coordinate work across disciplines. an example of this principle today is the international virtual observatory alliance (ivoa).6 ivoa is a coordinated effort by astronomers worldwide to document our universe more efficiently by systematizing their records; showing where they originate; indicating how they were collected; meeting their rigorous mathematical standards; and deciding themselves how and where their records belong in relationship to each other, and which ones are most important. only astronomers are qualified to do this. the same is true in any area of humanity's specialized knowledge and expertise.
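a minimal sketch can make the "marking system" of item 3 above concrete: a record that registers geography, knowledge domain, and internet address together and carries a provenance trail forward each time the item is reused or reinterpreted. every class and field name below is hypothetical and assumed for illustration; none of it is a published cdt schema.

```python
# illustrative sketch only: a minimal "registration record" of the kind the cdt
# proposal describes, binding geography, knowledge domain, and internet address
# so an item's origin and context travel with it. all names are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CDTRecord:
    title: str
    latitude: float            # where the original event or artifact resides
    longitude: float
    knowledge_domain: str      # e.g., "sculpture", "astronomy"
    url: str                   # internet address of the digital surrogate
    is_digital_original: bool  # false for artifacts or events digitized later
    derived_from: List[str] = field(default_factory=list)  # provenance trail

    def reinterpret(self, new_url: str) -> "CDTRecord":
        """mark a reuse or reinterpretation while keeping the provenance trail."""
        return CDTRecord(self.title, self.latitude, self.longitude,
                         self.knowledge_domain, new_url,
                         is_digital_original=True,
                         derived_from=self.derived_from + [self.url])

david = CDTRecord("david", 43.7767, 11.2593, "sculpture",
                  "http://example.org/david-scan", is_digital_original=False)
model = david.reinterpret("http://example.org/david-center-of-mass")
```

a record of this kind is the sort of thing that could bind, for instance, a restoration model to the digital history of the original sculpture, so that each later reuse keeps a checkable path back to the original.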
the most difficult aspect of creating a global context is accommodating and expressing each area in its unique way as created from within, while still being able to get the most descriptive examples from all areas to fit together in a sensible and appealing overview. until digital archives and publications can be deeply searched on a global level using simpler tools and predetermined pathways accessible by anyone, two researchers in different geographic or academic areas may be investigating the same topic from different points of view and will not know it. there is no way to be led to the best internet resources. today, as so much information surrounds us, it is hard to believe that common lines of inquiry could be discovered by accident. context of the place, time, idea, or education level should be able to drive internet topologies to the most appropriate online resources. constructing a reliable and beautiful digital history of all events—both natural and man-made—artifacts, ideas, and information means contributing to and combining a wide range of knowledge, expertise, networks, archives, and tools. mapping digital knowledge to historical knowledge means arguing about and perfecting an entirely new set of checks and balances. historical and digital knowledge are different. historical knowledge is fluid, continuous, and held by traditionally separated cultures and disciplines. digital knowledge goes everywhere that can be marked and traced by the times and places it was created, captured, and distributed. trying to visualize what is happening and relating it to working practices and the types of information that came before it is not like tracing the history of the human race back to adam and eve or the universe back to the big bang, where substantial guesswork beyond our memory or experience is involved. the entire conversion into the networked age is happening before our eyes in less than one generation without the benefit of reflection, careful review, and storytelling. we’re collecting everything indiscriminately over and over again while all datasets are rapidly expanding. we need to step back, slow down, and acknowledge that many current digitization and publication methods do not consistently generate reflective or reviewed results that are able to tell a story. we do not currently have one shared map, context, mathematical record, language, or set of symbols to interpret from different points of view for a variety of purposes over time. we do not currently mark the original versus subsequent interpretations of the same information as an integral component of most digital records. there is no financial support for one single shared storage space to preserve only the highest-resolution, most agreed-upon versions because we may never be able to agree on what they are. therefore, there is also not one system that can be fine-tuned to discover research and results that may be accidentally overlapping. instead, unusual approaches get watered down by constrained words designed to fit metadata requirements developed by archivists and engineers rather than the original authors. links get broken, web sites are no longer maintained, trends change. 
there are currently very few feasible ways to pick up on a line of inquiry previously initiated by others without sorting through and regenerating the same information again.7 a simplified version of the work needs to be preserved on the network, able to be referenced by others even if they are far away, live in a different time, or are more or less advanced in their ways of thinking. if digital information is reliable, someone in a remote place or in the future should not need to collect the same information again or unintentionally retrieve out-of-date or duplicate results. searches in the public domain should not be boring. they should be as easy to click through as tv channels, with more directions to go and better content. all searchers should not have to start at the top like everyone else on the first page of google, citeseer, or arxiv with a blank white space and a box to enter key words. investigators should be able to outline the facts they know, dial in measurements, specify relationships, and generally be able to use their own knowledge and expertise to isolate and extract entire ideas over broad spectrums or select only relevant portions of archives and publications to reintegrate into larger bodies of work for further discussion. digital objects are able to depict more than the unaided eye can see. an example is the evaluation of the center of mass of michelangelo’s david performed for david’s restoration by the visual computing lab based on a 3d model of the statue built by stanford university (figure 3).8 the digital david does not have mass. the original david is a beautiful object sculpted of a known and predictable material. the model makes it possible to test restoration techniques without permanent damage in ways no one would dare attempt on the irreplaceable original without first knowing more. the documentation process is an enhanced original that should be permanently bound to the digital history of the original sculpture. the evaluation method could be applied to other objects, but this model belongs with this object and this type of research. a global context built upon a solid, mathematically linked foundation would mean this conscientious work would not be lost or need to be repeated. digital records are not being used nearly to their full potential. so many influences on humanity’s intellectual evolution could be examined as history takes shape over 98 information technology and libraries | june 2006 time. concurrent and conflicting interpretations can take on more meaning than the original by itself. for example, how could the internet and legal citations be used to map the subsequent interpretations of the u.s. constitution from the time, place, and reasons where it was written to every supreme court case and related citation since the original context? what would this map look like (figure 4)? the impact that these four pages of ink on paper have had to the united states and the entire world cannot currently be examined in one volume to see where the most contentious and useful passages are. similar dynamics in wikipedia are shown in history flow by martin wattenberg at ibm research.9 what if techniques developed in one field could be applied to content from another area? for example, what if computer models created to track storms and hurricanes could be used to arrange and watch the evolution and real world impact of all the documents and actions associated with a war? 
being able to see how originals evolve in their interpretation and impact on society over time is practical because not all records are worth keeping. even worse, mundane or meaningless events, artifacts, ideas, or information may seem more important than they actually were if they are not translated into digital form or distributed in the right way.10 the task today is to make the most advanced ways of thinking and working more approachable and appealing to someone new, which is everyone outside a particular discipline or culture, while traversing a map of humanity’s collective knowledge and expertise. because shared memories of this magnitude would be so far-reaching and complex, the record itself needs to be able to show every user how to use it. every unique purpose for looking around, publishing, or referencing work, and adding to or taking away from a collaborative global context should be geared toward improvement and simplification. while millions and millions of people are accessing enormous numbers of files and collections, some paths are better than others. in order to sort and choose the best parts of vast collections, documenting everyone going in and out of various semantic places can ultimately identify the best paths to information everyone understands. what if someone who does not care at all about paintings makes an inquiry—which ten should they be shown to get them interested? there is also the issue of gearing the internet to provide more efficient pathways to widely accessed preapproved and curated information. every mouse click could accumulate to document the most reliable pathways in and out of shared information spaces to generate an assortment of scenarios for looking at the same information in different ways (figure 5).11 we think there is far too much information to consolidate into one big picture, that our ideas and methods are too incompatible to coexist comfortably in one space, but perhaps this is not really the case. perhaps we can understand what is happening more clearly by working backwards. ■ proposed solution and design for a running prototype even though many networks are in place and countless computers have been manufactured, technology advances rapidly. there are very few reasons to repair obsolete equipment or maintain outdated web resources. therefore, why not go back to the drawing board on all of it? we may have completely new computers and networks within ten years, anyway. a record-keeping and referencing system this ambitious needs to incorporate every type of record, classification scheme, symbol, style, and quirk. when visiting a new place outside your comfort zone, it needs to be obvious what the best local techniques are to filter and understand the results. people new to an area need to have the option of using tools they can invent or already know. figure 3. david’s center of mass (courtesy of the visual computing lab and stanford university) article title | author 99digitizing the non-digital | macpherson 99 the visualization of cdt’s model demonstration project will bring together research scientists, artists, integrators, and institutions to develop a running prototype. the purpose is to establish and record a series of planned and spontaneous situations in different parts of the world across a range of disciplines and existing networks so that these situations can be mapped. the project will be a group of people thinking together to confront the roadblocks in assembling incompatible ideas and information into one context. 
the group will collaborate in larger and smaller groups in roughly three-month intervals as participants continue with their existing work. the development of this system has to be dynamic, changing piece by piece both from the bottom up and the top down while everyone’s regular work continues. therefore, the system will be geared toward sample sets of active work products, rather than the record-keeping system by itself. the current objective is to establish a network of ten art museums, ten scientific research institutes, and ten new media/new technology efforts in ten cities that speak different natural languages (for example: english, german, french, italian, hindi, mandarin, ga [belonging to the cluster of kwa languages in ghana], uzbek, spanish, and arabic). the overall intent is to use mathematics, art, and individual ways of knowing to develop a series of professional sketches to serve as shortcuts between languages and key words in the search process. the first step is to map the background of each of the project participants’ previous work by time, location, and discipline. the database will include scientific visualizations, art objects, performances, algorithms, mathematical formulae, musical recordings, and many other forms of creative and scholarly expression. the next steps will be to hold a series of interactive workshops. at the first workshop, the research scientists will explain the mathematics and images they use in their work. two sets of artists will isolate the aesthetics to render their own map through the scientists’ ideas. two traveling exhibits will be created, one to be experienced in person, the other to be presented through a new media and online exhibit. both will be tracked physically and conceptually using cdt. the results will be generated and interpreted using gis, matlab, photoshop, and flow visualization software. for more information, please contact the author. a survey of individual and institutional requirements will be undertaken to define practical ways to move and organize ideas and information into a unified sample map of previously unrelated content and techniques. for example, at one institute, perhaps only two participants and four local professors will understand what that part of the map is showing. another part may only have meaning to one artist. a unified map for everyone, with built-in copyright protection for the participating artists, scientists, and institutions, will be presented to nonspecialist general publics around the world for feedback and further change within specified limits. the participating publics will be people interested in contemporary art, cutting-edge scientific research, new media, and events where all three communities can interact. each part of the prototype will be able to be examined in groups to compare and contrast different elements against different backgrounds. some arrangements will be assisted by the computer and network. the project will map everything with which each event, idea, and artifact has ever been associated in scale, proportion, and relative placement in the record overall. for example, if the records in question are paintings, any group could figure 5. thick and thin (courtesy of the artist john simon) figure 4. the constitution of the united states (courtesy of the u.s. national archives and records administration [nara]) 100 information technology and libraries | june 2006 be gathered together into the same reference window without copying the images. 
the assembly window has a built-in scale for the items it is showing, so they will be displayed in the correct proportion to each other. the system binds images of physical objects with their dimensions and the times and places they were created while this information is known—so a user does not ever have to guess later when looking back at any part of the record. any group of paintings can be automatically arranged chronologically, by size, culture, or any number of comparisons and curatorial issues. a sample sequence is: 1. a zoomed-in map showing a group of paintings in an exhibit. each painting links to its history. 2. within the map of all paintings shown in an intricate collage. 3. inside the map of all human endeavor shown as an appealing landscape. higher levels can then be used to reorganize a theme, for example, “only germany 2005 to 2007,” and drilling back down to generate other exhibitions. this would lead to other paintings and other curators’ conclusions, which would provide a more complete representation of each painting, exhibition, museum, curator, culture, and era. when the records in question are scientific visualizations, problems of presenting unrelated files together are more complex. the records may not share a common scale or system of reference. it may only be possible to place mathematical constructs in contexts based on where they originate geographically and by knowledge domain. an important part of the work will be determining the best contexts by which to introduce ideas or information to untrained viewers and devising methods to start deeper in the records using mathematical, cultural, or other prior knowledge and preferences. the same concept can be described by different words, the same image can be interpreted a thousand ways by every viewer, but mathematics is a set of rules to ensure that certain relationships or sequences will be precisely regenerated. therefore, unlike most of today’s digital records, cdts are based on mathematics first, images second, words last. ideas and information will be encoded to persist over specified periods of time. better examples will find higher placement by connecting to more background information and showing stronger relationships to larger numbers of open questions. cycles will be implemented to return to the same idea later and remove information that is never referenced or has not changed the course of the record’s flow. out-of-date, irrelevant, or rarely used information has to either be compressed or be thrown away, a new type of identity and a process to assemble and eliminate information will be created in thirty prototype forms showing the intertwined history of the events, artifacts, ideas, and information generated by the project and all it branches out to when connecting back to the publications, exhibits, ideas, artifacts, and other information generated by the participating individuals and institutions. the cdt model will relate and join tables to display all the different forms together in one map. each piece of information and the patterned space around it will be documented a special way to generate drawings leading back to originals reliably structured to transfer to other computers and networks. they will transfer without ambiguity because the transactions and paths to the internet addresses are based on mathematical relationships that can be checked. 
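the assembly-window behavior described above, binding an image to its physical dimensions and the time and place it was created so that a group can be shown in true proportion and re-sorted on demand, can be sketched as follows. the record and function names are hypothetical illustrations, not the cdt prototype's actual data model.

```python
# illustrative sketch of the "assembly window" idea above: bind an image to its
# physical dimensions and the time/place it was created, so a group can be shown
# in true proportion and re-sorted without copying the images. names are hypothetical.
from dataclasses import dataclass

@dataclass
class PaintingRecord:
    title: str
    image_url: str
    width_cm: float
    height_cm: float
    year: int
    culture: str

def arrange(paintings, by="year"):
    """return display entries sorted chronologically, by size, or by culture,
    with pixel sizes scaled so the largest canvas fits a 1000-px window."""
    keys = {"year": lambda p: p.year,
            "size": lambda p: p.width_cm * p.height_cm,
            "culture": lambda p: p.culture}
    ordered = sorted(paintings, key=keys[by])
    scale = 1000.0 / max(max(p.width_cm, p.height_cm) for p in ordered)
    return [(p.title, round(p.width_cm * scale), round(p.height_cm * scale))
            for p in ordered]
```

because the dimensions travel with the image reference, the same group can be rearranged chronologically, by size, or by culture without copying the images, as described above.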
each contributor has the first opportunity to place his or her ideas in context and define the limits of how their originals can be referenced, changed, and presented. at the end of the project, the set will be closed so that it can be cleaned of information that was only temporary, placeholders can be examined, and the entire model can be manipulated as one whole. for more information, please see www.contextdriventopologies.org the more specified a single piece or set of information, the easier it will be to define its history and place it in context. each unique placement and priority assigned by each individual or institution may not agree with the priorities and placements envisioned by others, but sooner or later, there will begin to be correspondence and everyone will be looking very generally at the same emerging map. ■ conclusions there will be innumerable contexts to create, discover, and remark upon in the future by creating a shared pace of curiosity and knowledge acquisition. a global context could be used to extrapolate new knowledge from trends that occur over longer periods of time in more places than we currently share or document. as the envisioned system is fine-tuned, it will become an ideal place to test an idea that is only partially complete to see where the idea fits or to determine if it has already been done. the results could be immediately applied to improve education. in today’s frantic information overload, we should not forget that digital information—and even cold, hard, raw data—is more than ones and zeros. they represent peoples’ work, their fingerprints; people are attached to their data. one wishes networks of computers could understand one’s ideas and work, but we only show them the boring parts. the proposed system will capture beauty so computers can help to find where it is hidden inside all the repositories, publications, and collections through which no person has the time to sort. the system will allow article title | author 101digitizing the non-digital | macpherson 101 users to specify how they think their information relates to the rest of the world so their intended context can be traced in the future. one hopes that using networks and computers to compare ideas and works on larger levels will restore craftsmanship and attention spans to make users want to spend more time with better information. a shared visual language driven by mathematical relationships that can be checked will allow future historians to see where records simply will not harmonize. users will be able to analyze why different ways of looking can shape and divide knowledge and history as it changes. visiting online archives and publications will change. developing processes to pre-organize searches and results for public viewing can change now by creating a system for curators and others to develop sets of information, rather than publishing individual items on their web sites. library facilities can change, and research rooms can become multimedia centers. networks can broadcast content and techniques in one package. there is not one clearly defined reason why being able to see these kinds of overviews or make these types of comparisons can be useful. the internet is a worldwide invention being constructed for a variety of purposes. a perfectly legitimate reason to capture the history of transactions across it in a simple form is just to see what might happen with the objective of increasing our understanding and respect for each other. 
the most important reason for establishing a global context is to allow users to transfer and update complex histories, thoughts, images, studies, visualizations, drawings, flow diagrams, sequences, transformations, cultural objects, stories, expressions, and purely mathematical or dynamic relationships without depending on constrained keywords or illegible codes that do not describe this information as well as the information can describe itself. all cultures and disciplines would be able to construct their parts of the record precisely the way they prefer. we would finally be able to use computers to show why and how we think information is related—a huge leap forward in the world of digital record keeping. references and notes 1. citeseer, 2005, http://citeseer.ist.psu.edu (accessed apr. 6, 2006); internet2, 2005, www.internet2.edu (accessed apr. 6, 2006); jane’s information group, 2005, www.janes.com (accessed apr. 6, 2006); machine learning network online information service (mlnetois), 2005, www.mlnet.org (accessed apr. 6, 2006); national technical information service, 2005, www.ntis .gov (accessed apr. 6, 2006); smithsonian institution libraries, galaxy of knowledge, 2005, www.sil.si.edu/digitalcollections (accessed apr. 6, 2006); thompson scientific, isi web of knowledge, 2005, www.thomsonisi.com (accessed apr. 6, 2006); visual collections, david rumsey collections, 2005, www .davidrumsey.com/collections (accessed apr. 6, 2006); world health organization, statistical information system, 2005, www3.who.int/whosis/menu.cfm (accessed apr. 6, 2006). 2. g. ammons et al., “debugging temporal specifications with concept analysis,” in proceedings of the acm sigplan 2003 conference on programming language design and implementation (new york: association for computing machinery, june 2003). 3. w. huyer and a. neumaier, “global optimization by multilevel coordinate search,” in global optimization 14 (1999): 331–55 4. a. bagga and b. baldwin, (workshop paper), in colingacl ‘98: 36th annual meeting of the association for computational linguisitics and 17th international conference on computational linguisitics, aug. 10–14, 1998, montréal, quebec, canada: proceedings of the conference (new brunswick. n.j.: acl; san francisco: morgan kaufmann, 1998); s. deerwester et al., “indexing by latent semantic analysis,” journal of the american society for information science 41, no. 6 (1990): 391–07; a. mccallum and b. wellner, “toward conditional models of identity uncertainty with application to proper noun coreference,” in proceedings of the ijcai workshop on information integration on the web (mountain view, calif: research institute for advanced computer science, 2003), 79–84; t. nisonger, “citation autobiography: an investigation of isi database coverage in determining author citedness,” college and research libraries 65, no. 2 (mar. 2004): 152–63; k. van deemter and r. kibble, “on coreferring: coreference in muc and related annotation schemes,” computational linguistics 26, no. 4 (dec. 2000); k. boyack, “mapping all of science and technology at the paper level,” presented at the session mapping humanity’s knowledge and expertise in the digital domain as part of the 101st annual meeting of the association of american geographers (denver: association of american geographers, 2005): 54; metacarta, 2005, www.metacarta.com. 5. j. burke, “knowledgeweb project, 2005.” www.k-web .org (accessed apr. 
6, 2006); visual browsing in web and non-web databases, iowa state university, www.public.iastate .edu/~cyberstacks/bigpic.htm (accessed apr. 6, 2006). 6. international virtual observatory alliance, 2005, www .ivoa.net (accessed apr. 6, 2006). 7. s. bradshaw, “charting excursions through the literature to manage knowledge in the biological sciences,” presented at the session mapping humanity’s knowledge and expertise in the digital domain, as part of the 101st annual meeting of the association of american geographers (denver: association of american geographers, 2005): 56, project paper available from http://dollar .biz.uiowa.edu/~sbradsha/beedance/publications.html (accessed apr. 6, 2006). 8. m. callieri et al., “visualization and 3d data processing in the david restoration,” ieee computer graphics and applications 24, no. 2 (mar./apr., 2004): 16–21. 9. m. wattenberg, “history flow,” 2005, http://research web.watson.ibm.com/history (accessed apr. 6, 2006). 10. k. börner, “semantic association networks: using semantic web technology to improve scholarly knowledge and expertise management,” in visualizing the semantic web, 2nd ed. vladmire geroimenko and chaomei chen, eds., (london: springer verlag, 2006) 99–115. 11. g. sidler, a. scott, and h. wolf, “collaborative browsing in the world wide web,” in proceedings of 8th joint european networking conference, edinburgh, scotland (new york: elsevier, 102 information technology and libraries | june 2006 1997); j. thomas, “meaning and metadata: managing information in a visual resource reference collection,” in proceedings of association for computers and the humanities and the association for literary and linguistic computing meeting (charlottesville, va.: university of virginia, 1999); h. yu and a. vahdat, “design and evaluation of a conit-based continuous consistency model for replicated services,” in acm transactions on computer systems 20, no. 3 (aug. 2002): 239–82. 12. visualization of context driven topologies/cdt model demonstration project, 2005, www.contextdriventopologies.org (accessed apr. 6, 2006). image acknowledgments: 50-word word-search puzzle www.synthfool.com/puzzle.gif permission: kevin lightner, synthesizer enthusiast. wrightwood, california abraham lincoln www.photomosaic.com/samples/large/abrahamlincoln.jpg permission: from the artist robert silver. david’s center of mass http://vcg.isti.cnr.it/projects/davidrestoration/restaurodavid.htm http://graphics.stanford.edu/projects/mich/book/book.html permission: roberto scopigno, visual computing lab, isti-cnr, via g. moruzzi, 1, 56124 pisa italy and marc levoy, stanford computer graphic lab, gates computer science bldg. stanford, ca 94305 u.s. constitution www.archives.gov/ repository: national archives building, washington, d.c. permission: nara government records are in the public domain. thick and thin www.numeral.com/drawings/plotter/thickandthin.html 1995 11" × 15" ink on paper. permission: from the artist john simon, new york city. specializing in algorithms and conceptual art. editorial i think that writing editorials in my job as the new editor of information technology and libraries (ital) is going to be a real piece of cake. 
all i have to do, dear readers, is to quote (with proper attribution) walt crawford, the title of whose book i repeat as the title of this, my inaugural editorial.1 and then quote other sages of our profession, using only as many of their words as is fitting and proper to make my editorials relevant to the concerns of our membership and readers and as few of my own words as i can to repay the confidence that the library information and technology association (lita) has placed in me— and to avoid muddling the ideas of those to whom i shall be indebted. those of you reading this will note that i have already fallen prey to the conceit of all scholarly journal editors: that their readers, of course, after surveying the tables of contents, dive wide-eyed first into the editorials. of course. to paraphrase a technologist of an earlier era, “when in the course of human events, it becomes necessary for” a new editor to take on the responsibility for the stewardship of ital, “a decent respect to the opinions of mankind requires that” he “should declare the causes which impel” him to accept that responsibility and, further, to write editorials. i quote, of course, from the first paragraph of the declaration of independence adopted by the “thirteen united states of america” july 4, 1776. in this, my first editorial, i, too, shall put forth for the examination of the members of lita and the readers of ital my goals and hopes for the journal that i am now honored to lead. these goals and hopes are shared by the members of the ital editorial board, whose names appear in the masthead of this journal. ital is a double-blind refereed journal that currently has a manuscript acceptance rate of 50 percent. it began in 1968 as the journal of library automation (jola), the journal of the information science and automation division (isad) of ala, and its first editor was fred kilgour. in 1978 isad became lita, and in 1982, the journal title was changed to reflect the expanding role of information technology in libraries, an expansion that continues to accelerate so that ital is no longer the only professional journal within ala whose pages are now dominated by our accelerating use of information technologies as tools to manage the services we provide our users and as tools we use ourselves to accomplish our daily duties. i write part of this editorial in the skies over the middle section of the united states as i return home from the seventh national lita forum held in st. louis, october 7–10. at the forum, i heard presentations, visited poster sessions, and talked with colleagues from forty-four states and six countries who had something to say and said it well. i hope that some of them may submit manuscripts to ital so that all the members of lita and all the readers of the journal will profit as well from some of what the attendees of the forum heard and saw. i attended the forum forewarned by previous ital editors to carry plenty of business cards, and i went armed with a pocketful. i think i distributed enough that, if pieced together, their blank sides would provide sufficient writing space for at least one manuscript! in an attempt to fulfill the jeffersonian promise above, i hereby list a few of my goals for the beginning of my term as editor. i must emphasize that these goals of mine supplement but do not supplant the purposes of the journal as stated on the first page and on the ital web site (www.ala.org/lita/litapublications/ital/italinformation. 
htm); likewise, they do not supplant the goals of my predecessors. in no particular order: i hope to increase the number of manuscripts received from our library and information schools. their faculty and doctoral students are some of the incubators of new and exciting information technologies that may bear fruit for future library users. however, not all research turns up maps on which "x marks the spot." exploration is interesting, even vital, for the journey, for the search itself, and our graduate faculties and students have something to say. i hope to increase the submission of manuscripts that describe relevant sponsored research. in the earlier volumes, jola had an average of at least one article per issue, maybe more, describing the results of funded research. ital can and should be a source that information-technology researchers consider as a vehicle for the publication of their results. two articles in this issue result from sponsored research. in fact, i hope to increase the number of manuscripts that describe any relevant research or cutting-edge developments. much of the exploration undertaken by librarians improving and strengthening their services involves research or problems solved on both small scales and large. neither the officers of lita, the referees, the readers, nor i are interested in very many "how i run my library good" articles. we all want to read a statement of the problem(s), the hypotheses developed to explore the issues surrounding the problem(s), the research methods, the results, the assessment of the outcomes, and, when feasible, a synthesis of how the research methods or results may be generalized. i hope to increase the number of articles with multiple authors. libraries are among society's most cooperative institutions, and librarians are members of one of the most cooperative of professions. the work we do is rarely that of solitary performers, whether it be research or the design and implementation of complex systems to serve our users. writing about that should not be solitary either.
john webb (jwebb@wsu.edu) is assistant director for digital services/collections, washington state university libraries, pullman, and editor of information technology and libraries.
i hope to publish think-pieces from leaders in our field. i hope to publish more articles on the management of information technologies. i hope to increase the number of manuscripts that provide retrospectives.
libraries have always been users of information technologies, often early adopters of leading-edge technologies that later become commonplace. we should, upon occasion, remember and reflect upon our development as an information-technology profession. i hope to work with the editorial board, the lita publications committee, and the lita board to find a way, and soon, to facilitate the electronic publication of articles without endangering—but in fact enhancing—the absolutely essential financial contribution that the journal provides to the association. in short, i want to make ital a destination journal of excellence for both readers and authors, and in doing so reaffirm the importance of lita as a professional division of ala. to accomplish my goals, i need more than an excellent editorial board, more than first-class referees to provide quality control, and more than the support of the lita officers. i need all lita members to be prospective authors, prospective referees, and prospective literary agents acting on behalf of our profession to continue the almost forty-year tradition begun by fred kilgour and his colleagues, who were our predecessors in volume 1, number 1, march 1966, of our journal. reference 1. walt crawford, first have something to say: writing for the library profession (chicago: ala, 2003). wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 21 75 video technologies: neologism or library trend? converging factors are shaping a new environment for libraries, and, as a consequence, the present is full of opportunity. technical and social changes provide libraries with a host of alternatives for service, growth, and innovation. in this new environment libraries will, undoubtedly, continue to promote the availability of books and other materials, continue to increase their efforts to furnish patrons with information, and continue to broaden the range of activities offered so that patrons can receive personalized service. patron information seeking and searching methods we have known, however, will give way to new methods based on computers and telecommunications. a host of new technologies is growing out of the evolutionary pathway marked by telegraph, telephone, radio, and television . broadband communications (that's the cable that today brings you predominantly entertainment television), satellite, videotex, teletext, videodisc, videotape, large-screen television, and computer displays (some are as large as the side of a building) are available either today or within the next year or two . each of these technologies is a new medium within its own inherent capabilities and limitations. each has the promise of providing faster and more cost-efficient information services than some present forms of printed communication. and each requires a different approach and different knowledge for effective and efficient use, and integration into library operations. in a growing number of locations, cable communications for delivery of library se rvices have already been made available virtually free of charge . other technologies, such as videotex, will grow significantly . estimates suggest that in five years more than 8 million american homes will be able to obtain extensive, automated information services from commercial, private, and government sources . probably a larger number will receive limited information services over the broadcast airwaves via teletext. 
dramatic new services will combine television, computer, telephone, satellite, and cable into home entertainment and information centers ... potential extensions of libraries. some sources suggest more than 50 percent of the american gross national product results from the collection, processing, and dissemination of information, much of which involves new technologies. inevitably, this technological trend also will occur in libraries and, in this light, the relatively low level of involvement of computers in providing patron services today is notable. by their natural inertia, individuals and organizations in the library community will be opposed to the acceptance of cable services, videotex, online catalogs, information retrieval, and other video technologies simply because they represent change. but these technologies are technically feasible and are becoming an economic reality. the point of demarcation between computer and library may well become a terminal in a patron's home. whether or not the service provided is a library's or a commercial competitor's depends to a great extent on how libraries define their role in this environment, and on the degree of library participation in the evolutionary process that's now taking place. something besides inertia opposes the acceptance of new technologies, however. to some degree, lack of awareness of technological trends is a factor, but more significant is a lack of clear understanding (both by the proponents of the technology and by librarians) of how new technology can be integrated into the library setting. understanding the value a technology offers for increased service or decreased cost, for example, should be paramount, but frequently the technology seems to be offered as an end in itself. internal and external factors must be considered to guide the application of technology toward meeting library and patron needs. financial concerns, social forces, and the consumer/patron appear to be major factors leading libraries toward a future deeply involved in video technologies. whether the outcome will result from external pressure or internal plan remains to be seen. it's incumbent upon libraries to be informed and active participants in directing their own future in this kind of an environment. what are the implications of this technical evolution and these internal/external factors? one thing is sure: it's a massive industry growing at a very rapid rate, and it is going to grow even faster. libraries have the opportunity to grow with this trend through application of the technologies to existing technical services, increased availability of patron services, and development of innovative services. if there is a common thread that can identify those libraries which will grow and prosper, that thread is flexibility: the capacity of library management and staff to adapt the library to the new environment and integrate technology into their library. readers and contributors of jola are the people who can either have an integral part in defining the future direction of libraries, or passively watch patron needs outstrip services. library schools and people involved in library-related research must play a key role in assessing the value of video technologies and defining how to integrate them into the business and service of libraries.
what is going to preserve and enhance the role of libraries in the 1980s will not only be flexibility but another very critical element: foresight dedicated to patron needs. many libraries have met this technological revolution head-on and are intimately involved in testing, developing, and providing innovative library services. in this and forthcoming issues, we hope to bring a perspective on these changes that is valuable and cogent to the library community. readers of jola and practitioners in all areas of video technology are called upon to describe their efforts and share their results drawn from this rapidly changing field through contributions to this journal. thomas d. harnish editor's notes jola will continue to be interesting and useful to its readers to the extent that its readers are willing to expend the efforts to also be its writers. the authors in this issue are all as busy as you and i. they have made time in their already full schedules to write down ideas and information they hope will be useful and provocative. they and we of the jola staff hope you are pleased with the results. so what's new by you? how have your patrons reacted to your new online catalog? what do the costs of your acquisitions system look like? how about that idea you had about a new way to do whatever? do you think the fuss over authority control is worth it? if you have ideas, perceptions, or stories to tell that you feel are of interest to your fellow readers, please write and let them know. editorial board thoughts: the promise of immersive libraries jerome yavarkovsky information technology and libraries | december 2013 immersive technologies—interactive 3d graphics, simulation, and gaming technologies—have much to offer higher education by collapsing geography and by providing a richer learning environment. over the past forty years, through digitization and internet services, librarians have brought technology to bear on making it easier to find and use information, even to the point where people can find and use library resources without coming to the library. information space—the space between user and literature—has been collapsed through digital access. now there is potential to collapse the space between users themselves as they work together from different locations. in recent decades, learning has gone from a predominantly independent and competitive process for students to one that makes greater use of collaboration, cooperation, and group study. the library as a place for students and researchers to work individually with their literature has become a collaborative workspace where students work together on research projects and shared class assignments. this presents a challenge to libraries and learning institutions with limited space for students to meet, share ideas, do coursework together, work on joint projects, and practice presentations. for more than five years, advances in the development of virtual meeting space and workspace have enabled librarians to provide immersive, 3d virtual world services that give a sense of presence that is lacking in conference calls, text chat, and web conferencing.1 as a result, not only is the individual's physical distance from library materials eliminated, but also the distance is eliminated between individuals who work with each other using library materials.
immersive technologies offer the promise of 3d virtual world libraries where students and their teachers can work together in virtual space with library materials and tools—search engines, online catalogs, media, text, etc. students would sit at their computer wherever they are and work together with classmates in a shared environment, using library materials as well as productivity tools—word processor, spreadsheet, web, or blog development software. this would be a boon for real-world institutions lacking sufficient physical space, but also for distance education and international education. for example, students might study abroad but take a class at their home institution, take classes with classmates at foreign institutions, learn jerome yavarkovsky (jeromey@bc.edu) is emeritus university librarian, boston college, and founding co-chair of the libraries and museums technology working group under the immersive education initiative of the media grid. editorial board thoughts | yavarkovsky 6 languages and experience foreign cultures more directly, or take classes in locales not accessible to professors. and now, with the advent of massive open online courses, the opportunity lies within the immersive library to serve large numbers of students who have limited or no library facilities. these students would be able to use library resources as well as to communicate and work with classmates, teaching assistants, tutors, and others whose support would be augmented through their virtual presence rather than via email or text chat. as current research material is born digital, and legacy material is digitized at accelerating rates and is delivered digitally, they are perfect for use in immersive environments. however, immersive library resources are not limited to traditional materials. they include virtual world learning objects and environments, and virtual representations of books—walk-in books or educational simulations—that are interactive experiences with literature.2 further, in virtual space, physical research objects can be part of the study environment and be brought into an active relationship with the information resources that pertain to them. for example, if you were studying 3d models of mayan pottery in a virtual library workspace, you would have access to historical accounts, conference papers, current periodical articles, photographic archives, dissertations and other material pertinent to your studies. through this, not only would distance be eliminated between individuals and library materials, but also physical research objects from pottery to architecture, from monuments to molecular models could be represented in virtual space and related to the information that pertains to them.3 so whether the collaborators are students, teachers, librarians, researchers, we can see a time, even now, when they will no longer be bound by physical workspace. in addition to the immersive library as repository and collaborative study space, the immersive library can be expected to offer enhanced services to users. for example, immersive information literacy programs, immersive research and course consultations, virtual interlibrary document management, and document delivery are just a few possibilities. the immersive workspace is a logical setting for instructing students in digital literacy, the evaluation of resources, and the use of information tools. 
the goal here would be to bring into the immersive environment the rich body of course designs and curriculum materials pertaining to the educated use of information. we know that among the most significant lifelong benefits of higher education are information skills—finding, evaluating, and using information for work or for personal enrichment. research and course consultations are further academic services that would be valuable applications in the immersive library. librarians, library assistants, and docents have all helped individuals in second life and opensim libraries by providing advice, information, and guidance. these services have further potential in 3d virtual environments through enhancement with photographs, documents and virtual structures. add to these the tools and resources of the library collaborative workspace, and the potential for student and faculty consultation and advisory information technology and libraries | december 2013 7 services is even greater. imagine the phd candidate or student with a writing assignment in the library space with the librarian and all the tools needed for help with dissertation research. as immersive libraries realize their potential and grow in number and scale, the challenge of managing them will grow as well. the tools of 3d virtual workspaces hold promise here also to facilitate the work of the library and improve librarian productivity. librarians work across geographic boundaries regionally and nationally in consortia and multitype systems for resource sharing, collaborative research and development projects, digitization initiatives, staff development programs, and any number of efforts to economize and improve performance. the very management of the library enterprise should benefit from 3d virtual reality tools brought to bear on the day-to-day work and communication of the library. for any who might want to learn more, the ala virtual communities and libraries member initiative group maintains an ala connect site (http://connect.ala.org/node/66325). communication on libraries in virtual environments is also available through the acrl virtual worlds interest group and its google group, acrlinsl (http://groups.google.com/group/acrlinsl). references 1. lori bell and rhonda b. trueman, eds., virtual worlds, real libraries: librarians and educators in second life and other multi-user virtual environments (medford nj: information today, 2008); tom peters, “librarianship in virtual worlds,” library technology reports 44, no. 7 (october 2008). 2. aaron griffiths, ma education in virtual worlds: immersive literature, “presenting the novel night, by elie wiesel, as an immersive literature discussion space,” youtube video, www.youtube.com/watch?v=i-ijpjcwtxa&feature=player_embedded#t=0, blog post at f/xual education services, may 17, 2013, http://fxualeducation.wordpress.com/2013/05/17/immersivelit/; bernadette daly swanson/daisyblue hefferman, “second life: bradburyville virtual experience, fahrenheit 451,” youtube video, 2008, http://www.youtube.com\\watch?v=yyhqo0q2m_g. 3. luís miguel sequeira and leonel caseuri morgado, “virtual archaeology in second life and opensimulator,” journal of virtual worlds research 6, no. 1 (april 2013), http://journals.tdl.org/jvwr/index.php/jvwr/article/view/7047. 
editor's comments bob gerrity information technology and libraries | march 2013 with this issue, information technology and libraries (ital) begins its second year as an open-access, e-only publication. there have been a couple of technical hiccups related to the publication of back issues of ital previously only available in print: the publication system we're using (open journal system) treats the back issues as new content and automatically sends notifications to readers who have signed up to be notified when new content is available. we're working to correct that glitch, but hope that the benefit of having the full ital archive online will outweigh the inconvenience of the extra e-mail notifications. overall though, ital continues to chug along and the wheels aren't in danger of falling off any time soon. thanks go to mary taylor, the lita board, and the lita publications committee for supporting the move to the new model for ital. readership this year appears to be healthy—the total download count for the thirty-three articles published in 2012 was 42,166, with 48,160 abstract views. unfortunately we don't have statistics about online use from previous years to compare with. the overall number of article downloads for 2012, for new and archival content, was 74,924. we continue to add to the online archive: this month the first issues from march 1969 and march 1981 were added. if you haven't taken the opportunity to look, the back issues offer an interesting reminder of the technology challenges our predecessors faced. in this month's issue, ital editorial board member patrick "tod" colegrove reflects on the emergence of the makerspace phenomenon in libraries, providing an overview of the makerspace landscape. lita members danielle becker and lauren yannotta describe the user-centered website redesign process used at the hunter college libraries. kathleen weessies and daniel dotson describe gis lite and provide examples of its use at the michigan state university libraries. vandana singh presents guidelines for adopting an open-source integrated library system, based on findings from interviews with staff at libraries that have adopted open-source systems. danijela boberić krstićev from the university of novi sad describes a software methodology enabling sharing of information between different library systems, using the z39.50 and sru protocols. beginning with the june issue of ital, articles will be published individually as soon as they are ready. ital issues will still close on a quarterly basis, in march, june, september, and december. by publishing articles individually as they are ready, we hope to make ital content more timely and reduce the overall length of time for our peer-review and publication processes.
suggestions and feedback are welcome, at the e-mail address below. bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia. library network analysis and planning (lib-nat) maryann duggan: director, industrial information services program, southern methodist university, dallas, texas. a preliminary report on planning for network design undertaken by the reference round table of the texas library association and the state advisory council to library services and construction act title iii texas program. necessary components of a network are discussed, and network transactions of eighteen dallas area libraries analyzed using a methodology and quantitative measures developed for this project. to be a librarian in 1969 is to stand at the crossroads of change, with a real opportunity to put libraries and professional experience to work on immediate problems of today's world. in mobilizing total library resources for effective service to a variety of patron groups in a variety of ways, the librarian has at hand an exciting new tool of great potential and equally great challenge: the library network. library networks and reference services. networks and all that they imply are simply an extension of good reference services as they have been practiced for years, but their existence and potential capability require redefinition of the reference function, which, being no longer limited to one collection, has been given new dimensions of time, depth and breadth. networks, and the inter-library cooperation they require, offer an opportunity to combine materials, services and expertise in order to achieve more than any one library can do alone. in this case, the whole is greater than the sum of its parts, for each library can offer its particular patron group the total capability of the network, including outside resources not previously available. with the new tool of library networks, it is possible to provide responsive, personalized, in-depth reference service, and to provide it so rapidly that a patron can receive a pertinent bibliography covering his desired topic within an hour of his original inquiry. the reference librarian becomes an expert in resources and resource availability at the national level. his reference desk becomes a switching center, at which he receives and analyzes inquiries, decides the level of service required, identifies available sources or resources that match an inquiry, transmits the latter (restructured to be compatible with the network language), conducts a dialog with the source, receives the response and interprets it to the patron. this procedure is not markedly different from what has been done for years in any reference library, but with greater potential the process must be more formalized and structured. networks do require new expertise and crystallizing the reference philosophy. clarification is needed as to 1) types or levels of reference services, and unit operations in reference services; 2) the role of in-depth subject analysis of reference queries; 3) decisions on alternate choices of sources and of communications links; 4) structuring of large blocks of resources to permit fast access; and 5) the role of each library in the network and its responsibility to the network.
approach to network design the reference round table of the texas library association and the state advisory council to library services and construction act title iii texas program have been struggling with the challenge of inter-library network design for the past two years. this paper is written to share with reference librarians some of their preliminary findings and to urge the involvement of reference librarians in planning and developing networks and network parameters. for identification the project herein described is referred to as lib-nat, for library network analysis theory. although only the author can be blamed for any faults of this "theory," many persons have contributed to the development of it. the reference round table of the texas library association has provided the forum for exploring and developing ideas on inter-library cooperation. title iii of the library services and construction act has provided the legal and financial impetus enabling the field testing of some of those ideas. texas chapter, special libraries association, has sparked and catalyzed ideas and clarified needs. the state technical services act provided the vehicle for experimental development of new approaches to reference services. southern methodist university provided the haven and ivory tower from which these new approaches could be tried under the cloak of academic respectability. but, of greatest importance of all, individual librarians, with vision and desire to be of service and willingness to try new things, have been the driving force in helping to develop new concepts of library use and purpose in the texas area. lib-nat 159 the basic philosophy back of lib-nat is simply that any person anywhere in the state of texas should have access to any material in any library anywhere in the state through a planned, orderly, effective system that will preserve the autonomy of each library while serving the needs of all the citizens of the state. particular needs of special user groups (such as the blind or the accelerated student or the industrial researcher) should also be identified and provided for in a cooperative mode through local libraries throughout the state. network components in the process of developing lib-nat, twelve critical components were identified that are essential to orderly, planned development of the objectives stated above. as a minimum, such a network must have the following: 1) organizational structure that provides for fiscal and legal responsibility, planning, and policy formulation. it must require commitment, operational agreement and common purpose. 2) collaborative development of resources, including provision for cooperative acquisition of rare and research material and for strengthening local resources for recurrently used material. the development of multi-media resources is essential. 3) identification of nodes that provide for designation of role specialization as well as for geographic configuration. 4) identification of primary patron groups and provision for assignment of responsibility for library service to all citizens within the network. 5) identification of levels of service that provide for basic needs of patron groups as well as special needs, and distribution of each service type among the nodes. there must be provision for "referral" as well as "relay" and for "document" as well as "information" transfer. 
6) establishment of a bi-directional communication system that provides "conversational mode" format and is designed to carry the desired message/document load at each level of operation. 7) common standard message codes that provide for understanding among the nodes on the network. 8) a central bibliographic record that provides for location of needed items within the network. 9) switching capability that provides for interfacing with other networks and determines the optimum communication path within the network. 10) selective criteria of network function, i. e., guidelines of what is to be placed on the network. 11 ) evaluation criteria and procedures to provide feedback from users and operators and means for network evaluation and modification to meet specified operational utility. 160 journal of library automation vol. 2/ 3 september, 1969 12) training programs to provide instruction to users and operators of the system, including instruction in policy and procedures. the foregoing components of the ideal inter-library network (one so designed that any citizen anywhere in the state can have access to the total library and information resources of the state through his local library) may be considered the conceptual model, or the floor plan from which the network of the program can be constructed. although these twelve components might be labeled "ideal," they are achievable and they are within reach of the present capability of all libraries today. they have also weathered the unrelenting critique of 288 reference librarians in the march 27, 1969, tla reference round table ("the 1969 reference round table pre-conference institute: an overview," texas library journal, vol. 45 (summer 1969), no. 2.). during that reference round table the twelve components were tested in a simulated network, using 42 cases. in this behavioral model actual, current inter-library practices were observed during game-playing in the simulated network. the experience verified that the components outlined above are essential to the development of planned, cooperative, inter-library systems. analysis of network transactions as part of the lsca title iii project, and to test the twelve components, exploration was instituted into the existing inter-library relations among eighteen libraries of all types in the dallas area to see how current practices compared with the ideal conceptual model the essential minimum requirement of a library is document transfer, i. e., the ability to supply a known item on request; and on-going inter-library loan transactions are a valid indicator of emerging network patterns in the current environment. this microscopic study of 1967 individual library loans among eighteen libraries of different types has provided a wealth of insight into network developments. as a pilot model it has offered a means of observing and studying existing practices, identifying problems, and experimentally evaluating the effect of changes in the system or environment. more must be known about on-going inter-library transactions for the design of improved networks. in the attempt to find out who was attempting to borrow what from whom and how successfully requests were filled, the following variables were considered: 1) type of library, both borrowing and lending, such as academic, public, special, or public school. 2) type of message format, i. e., telephone, twx, telex, letter, or interlibrary loan. 3) type of item requested in the transaction, such as monograph, serial, map, document. 
4 ) geographic location of borrowing and lending library, i. e., local, area, state, regional, national or international. lib-nat 161 the complexity of even a small pilot model required the formulation of some rigor in the analysis and the development of analytical tools and symbolic models. figure 1, for example, is a symbolic model that permits comparison of two variables simultaneously, e. g., the type of library participating in the transactions and the geographic level of the participants. for modeling purposes, it was assumed all libraries fall into one of four 1 = local 3 = state 2 = area 4 = re ion switching centers fig. 1. symbolic model of inter-library networks. classes represented by the quadrants in figure 1. also it was assumed that each library can be identified as to a specific geographic level, as indicated by the numbers 1 through 6. in the analysis of the pilot model data it was observed that transactions occur among libraries of the same type and at the same geographic level, and between libraries of different types at different geographic levels. figure 1 provides a symbolic model for conceptualizing these various types of transactions. switching centers, represented on figure 1 by the circles around the geographic numbers, participate in transactions at varying geographic levels, as well as between and among various types of library sectors. the role and the location of switching centers is an important aspect of lib-nat. 162 journal of library automation vol. 2/ 3 september, 1969 within the framework of the symbolic model, the simple form of interlibrary loan may be represented as a two-body transaction between the borrowing library and the lending library, as shown in figure 2. applying these transactions on the symbolic model of figure 1 and considering both a b fig. 2. two-body transaction. type of library and geographic level, four general classes of two-body transactions can be identified: 1 ) homogeneous vertical, i. e., between two libraries of the same type but at different geographic levels (pt _..,.. p~; st _..,.. sa) ; 2) heterogeneous horizontal, i. e., between two different types of libraries at different levels ( pt _..,.. a1; st _..,.. p1); 3) heterogeneous vertical, i. e., between two different types of libaries at different levels (pt _ ..,.. a4; sl _..,.. pg); 4) homogeneous horizontal, i.e., between two libraries of the same type and the same geographic level (pt _..,.. pt; s2 .... s2). the formulas serve as a shorthand symbolic representations of some typical transactions of these four classes. the final report on lib-nat will contain statistical data on distribution of pilot model transactions by type and by geographic level, showing type interdependency and geographic dependency or self-sufficiency. further analysis of the pilot model data revealed another type of transaction, the three-body transaction, in which a third agent becomes involved. the third agent may act as a referral center, as illustrated in figure 3, or as a relay center, as illustrated in figure 4 ( sw indicates switching center) . part of the lib-nat theory specifies that there is a distinction between referral and relay, and that the latter is a valid function of a true switching center. figure 5 illustrates the various types of possible three-body transactions with different geographic levels of switching among the different types of libraries. 
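the four two-body classes described above follow mechanically from two comparisons: are the two libraries of the same type, and are they at the same geographic level? the sketch below, with hypothetical type and level codes, illustrates the classification; it is only an illustration of the scheme, not code from the lib-nat project.

```python
# classify a two-body inter-library loan transaction into one of the four
# classes described above, from the type and geographic level of the
# borrowing and lending libraries. type and level codes are hypothetical.

def classify_two_body(borrower_type, borrower_level, lender_type, lender_level):
    kind = "homogeneous" if borrower_type == lender_type else "heterogeneous"
    direction = "horizontal" if borrower_level == lender_level else "vertical"
    return f"{kind} {direction}"

# levels: 1 = local, 2 = area, 3 = state, 4 = region; types: p = public,
# a = academic, s = special.
print(classify_two_body("p", 1, "p", 2))  # homogeneous vertical
print(classify_two_body("p", 1, "a", 1))  # heterogeneous horizontal
print(classify_two_body("p", 1, "a", 4))  # heterogeneous vertical
print(classify_two_body("s", 2, "s", 2))  # homogeneous horizontal
```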
which of these transactions is the most efficient or has the greatest utility is one of the basic design parameters needing further analysis. it should be noted that the variable of message format, that is, the channel of communication or type of communication link, has not yet been investigated in the symbolic modeling of these transactions. fig. 3. three-body transaction: referral. fig. 4. three-body transaction: relay. fig. 5. three-body transactions at various geographic levels. network configuration another very important design parameter is the network configuration or organizational hierarchy specifying the communication channels and message flow pattern. figure 6 illustrates symbolically a non-directed configuration of communication. if each dot represents a node in the network (i.e., a participating library), and each line represents a communication link, it can be seen that each node can communicate directly with every other node, providing (or requiring) a total of fifteen links among the six nodes: c = n(n-1)/2 = 15. fig. 6. non-directed network. by contrast, figure 7 illustrates a directed configuration in which the six nodes are interconnected through a switching center, requiring only six channel links (c = n = 6). in like manner, if a non-directed network desires to interface with a specialized center, such as the library of congress or a special bibliographic center or search center, a total of twenty-one channels is required (figure 8), whereas a directed network can interface with a specialized center via only seven channels, as illustrated in figure 9. fig. 7. directed network. fig. 8. non-directed network including specialized center. fig. 9. directed network including specialized center. as local or area networks begin to develop, there will be a need for tying together two area networks to develop larger units of service. the interfacing of an original network of six libraries in one area with an adjoining area network of six libraries will result in the network configuration shown in figure 10 in the case of a non-directed network, and sixty-six communication links among twelve nodes will be required. whereas, if two directed networks of six libraries each desire to interface, a type of linkage requiring only thirteen channels may be envisioned (figure 11). which is the best type of network configuration? what are the decision parameters that should be considered in designing or planning network configuration? how can alternate configurations be evaluated? alternate channel requirements? and alternate geographic levels of switching? in the pilot model study, a mathematical model has been devised which can be used for simulating various configurations and channel capacities, thereby permitting some desired criteria function of network performance to be maximized or optimized. fig. 10. interface of two non-directed networks. fig. 11. interface of two directed networks.
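the channel counts cited above (fifteen, six, twenty-one, seven, sixty-six, and thirteen) all follow from two simple rules: a non-directed network of n nodes needs n(n-1)/2 links, while a directed network needs only one link per node into its switching center. the short sketch below reproduces those figures; it illustrates the arithmetic only and is not part of the lib-nat mathematical model.

```python
# channel (link) requirements for the network configurations of figures 6-11.

def nondirected_links(n):
    # every node communicates directly with every other node
    return n * (n - 1) // 2

def directed_links(n):
    # every node communicates only with a single switching center
    return n

n = 6
print(nondirected_links(n))          # 15 -- figure 6
print(directed_links(n))             # 6  -- figure 7
print(nondirected_links(n) + n)      # 21 -- figure 8: each node also linked to a specialized center
print(directed_links(n) + 1)         # 7  -- figure 9: only the switching center links to the specialized center
print(nondirected_links(2 * n))      # 66 -- figure 10: two six-node networks fully interconnected
print(2 * directed_links(n) + 1)     # 13 -- figure 11: two directed networks joined through their switching centers
```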
the details of the mathematical model will be published as part of the final report on lib-nat; in the meantime it can be said that this is a fascinating area of network analysis which will be useful to any group of libraries planning network configurations. the mathematical model, a multi-commodity, multi-channel, capacitated network model developed by dr. richard nance at southern methodist university as part of the title iii project, promises to have a high potential application in network design and performance evaluation. it does require that the librarian make some hard-nosed decisions on operational and performance parameters of the inter-library systems discussed in the preceding article, but this is part of the challenge of lib-nat. measures of participation it is obvious that types of libraries, geographic level, types of transactions, various network configurations, alternate communication links and switching levels are all important in planning inter-library systems. next it is necessary to take an in-depth look at the relationship between the individual participating library and the total network. in the pilot model study of eighteen libraries a noticeable difference appeared in the magnitude and type of participation. in surveying only the two-body transactions, it was observed that some libraries were primarily borrowers and others primarily lenders, and some were heavy and some light. in pursuit of a quantitative method of representing these relationships, some formulae were evolved which are helpful in understanding node/network dynamics. starting with the individual library or node, let bn equal the number of borrowing transactions originating at that node and ln equal the number of lending transactions; then ln plus bn will equal the total number of all transactions at that particular node. in like manner, looking at the total network (in this case all eighteen participating libraries), let bt equal the total number of borrowing transactions originating in the network and lt the total number of lending transactions; then bt plus lt will equal the total number of both types of transactions in the network. in the analysis of node/network dynamics, it was felt there should be some way of quantitatively expressing the individual node's dependency on the total network and also a way of expressing the relative degree of activity of each node. in other words, a participating library that was a net borrower (compared to its lending) was obviously more dependent on the network than would be a library that borrowed very little compared to its lending. the extent of dependency can be expressed as a node dependency coefficient, calculated as bn / (bn + ln), the relative amount of borrowing compared to total node transactions. among its other uses, the dependency coefficient of a node may give some insight into the extent to which it should share in network expenses, but the dependency coefficient alone should not be a final criterion, since magnitude of activity is of equal importance.
for developing a method of quantitatively expressing activity of a node compared to total activity of the network, a factor called the node activity coefficient may be calculated as (bn + ln) / (bt + lt), the relative activity of both types at one node compared to total activity in the total network. then, to quantitatively express the dependency of a given node on the network, one can calculate the node/network dependency coefficient, as shown in figure 12. fig. 12. node dependency coefficient, b / (b + l), plotted against total node transactions, b + l; values greater than 0.5 indicate a net borrower, values less than 0.5 a net lender.
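the two coefficients just defined are simple ratios of transaction counts and can be computed directly from interlibrary loan tallies. the sketch below uses made-up counts for three hypothetical nodes rather than the pilot-model data, and is offered only as an illustration of the calculation.

```python
# node dependency and node activity coefficients for a small, made-up network.
# borrow[i] and lend[i] are the borrowing and lending counts at node i.

borrow = {"node a": 120, "node b": 40, "node c": 15}
lend   = {"node a": 30,  "node b": 95, "node c": 15}

# bt + lt: total of both transaction types across the whole network
network_total = sum(borrow.values()) + sum(lend.values())

for node in borrow:
    bn, ln = borrow[node], lend[node]
    dependency = bn / (bn + ln)            # bn / (bn + ln): above 0.5 means a net borrower
    activity = (bn + ln) / network_total   # (bn + ln) / (bt + lt): the node's share of all traffic
    print(f"{node}: dependency = {dependency:.2f}, activity = {activity:.2f}")
```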
since the calculated q value is greater than the critical q value of 0.2241, the null hypothesis is rejected and it must be accepted that 15,123 is an outlier. once it is determined with statistical certainty that the suspected outlier is indeed an outlier, it needs to be replaced with the median calculated from all values found in dataset 2. for the case of polymer, the median was calculated to be 27 from all values in table 2. replacing an outlier with the median to accommodate the data has been proven to be quite effective in dealing with outliers by introducing less distortion to that dataset.39 extreme values are therefore replaced with values more consistent with the rest of the data.40

              jan  feb  mar  apr  may  jun  jul  aug  sep  oct  nov  dec
polymer 2009   27   14   35   22   15   28   24   19   11    8   13    7
polymer 2010   12   15   26   33   38   64   39    5   13   27  109   44
polymer 2011  113  159  638  345   52   57   94   70   39   36  221   65
polymer 2012  130    4   98   24   27   18   13   16   18   25    9    5

table 3. the identified outlier is replaced with the median (highlighted in bold).

table 3 represents the number of full-text articles downloaded for polymer after the outlier had been replaced with the median. the confirmed outlier of 15,123 articles downloaded recorded in october 2010 is replaced with the median of 27, highlighted in bold. this then becomes the accepted value for the number of articles downloaded from polymer in october 2010. the outlier is discarded. the new value of 27 articles downloaded in october 2010 replaces the extreme value of 15,123 in the original 2010 jr1 report (see table 4). this is the final step.

                                               jan  feb  mar  apr  may    jun  jul  aug  sep    oct    nov  dec
polymer                                         12   15   26   33   38     64   39    5   13     27    109   44
surface and coatings technology                  3    1    2    1   22     17   17    0   12  3,771  5,428  601
international journal of radiation oncology    11   18   35   22   17  6,436  176   13   25     29     24   19
journal of catalysis                             0    1    5    1    2      2   16    4    0      2  6,693    1

table 4. sample from a 2010 jr1 counter-compliant report indicating the number of articles downloaded per journal over a twelve-month period. polymer's identified outlier is replaced with the median calculated from table 2 (highlighted in bold).

once the first outlier is corrected, the same procedures need to be followed for the other suspected outliers highlighted in table 1. if it is determined that they are outliers, they are replaced with their associated median values. although the steps and calculations used to identify and correct for outliers are relatively simple to follow, it is admittedly a very lengthy and time-consuming process. but in the end, it is well worth the effort.
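as a rough illustration of the test-and-replace step just described, the sketch below applies the simplest form of the dixon ratio (the gap between the suspect value and its nearest neighbour, divided by the range) to the polymer values and substitutes the median when the ratio exceeds the critical value. the 0.2241 critical value is simply the figure quoted above; in practice the appropriate dixon variant and critical value for the sample size should be taken from the published tables cited in the references.

```python
import statistics

# polymer full-text downloads, 2009-2012 (dataset 2), with the suspect
# october 2010 value of 15,123 still in place.
polymer = {
    2009: [27, 14, 35, 22, 15, 28, 24, 19, 11, 8, 13, 7],
    2010: [12, 15, 26, 33, 38, 64, 39, 5, 13, 15123, 109, 44],
    2011: [113, 159, 638, 345, 52, 57, 94, 70, 39, 36, 221, 65],
    2012: [130, 4, 98, 24, 27, 18, 13, 16, 18, 25, 9, 5],
}

values = sorted(v for year in polymer.values() for v in year)
suspect = values[-1]                        # 15,123, the largest value
gap = suspect - values[-2]                  # distance to its nearest neighbour (638)
q_calc = gap / (values[-1] - values[0])     # simple dixon ratio: gap / range

Q_CRITICAL = 0.2241                         # critical value quoted in the text

if q_calc > Q_CRITICAL:
    # the suspect is an outlier: replace it with the median of dataset 2 (27)
    replacement = statistics.median(values)
    corrected_2010 = [replacement if v == suspect else v for v in polymer[2010]]
    print(f"q = {q_calc:.3f} > {Q_CRITICAL}: outlier replaced with {replacement}")
    print(corrected_2010)
else:
    print(f"q = {q_calc:.3f} <= {Q_CRITICAL}: value retained")
```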
results and discussion table 5 details the changes in the overall number of articles downloaded from j. n. desmarais library e-journals that resulted from the elimination of outliers. the column titled "recorded downloads" details the number of articles downloaded between 2000 and 2012, inclusively, prior to outlier testing. the column titled "corrected downloads" represents the number of articles downloaded during the same period of time but after the outliers had been positively identified and the data cleaned. the affected values are highlighted in bold.

year   recorded downloads   corrected downloads
2000         806                  806
2001        1034                 1034
2002        1015                 1015
2003        4890                 4890
2004       72841                72841
2005      251335               251335
2006      640759               640759
2007      731334               731334
2008      710043               710043
2009      725019               725019
2010      857360               757564
2011      869651               696973
2012      716890               716890

table 5. comparison of the recorded number of articles downloaded to the corrected number of articles downloaded, over a thirteen-year period.

all data from all available years were tested for outliers. only data recorded in 2010 and 2011 tested positive for outliers. replacing outliers with the median values for those affected journal titles dramatically reduced the total number of downloaded articles (see table 5). between 2007 and 2009, inclusively, the actual number of full-text articles downloaded recorded from the library's e-journal collection totaled between 731,334 and 725,019 annually (see table 5). the annual average for those three years is 722,132 articles downloaded. but in 2010 that number dramatically increased to 857,360 downloaded articles, which was followed by 869,651 downloaded articles in 2011 (see table 5). the elimination of outliers from the 2010 data resulted in the number of downloads dropping from 857,360 to 757,564, a difference of 99,796 downloads, or 12 percent. similarly, in 2011, the number of articles downloaded decreased from 869,651 to 696,973 once outliers were replaced with median values. this represents a reduction of 172,678 downloaded articles, or 20 percent. a staggering 20 percent of articles downloaded in 2011 can therefore be considered as erroneous and, in all likelihood, the result of illicit downloading. figure 1 is a graphical representation of the change in the number of articles downloaded before and after the identification of outliers and their replacement by median values. the line "recorded downloads" clearly indicates a surge in usage between 2010 and 2011, with usage returning to levels recorded prior to the 2010 increase. the line "corrected downloads" depicts a very different picture: the plateau in usage that began in 2007 continues through 2012. evidently, the observed spike in usage was artificial and the result of the presence of outliers in certain datasets. if the data had not been tested for outliers, it would have appeared that usage had substantially increased in 2010 and it would have been incorrectly assumed that usage was on the rise once more. instead, the corrected data bring usage levels for 2010 and 2011 back in line with the plateau that had begun in 2007 and reflect a more realistic picture of usage rates at laurentian university. figure 1. comparing the recorded number of articles downloaded to the corrected number of articles downloaded over a thirteen-year period. accuracy in any data gathering is always extremely important, but accuracy in e-resource usage levels is critical for academic libraries. academic libraries having e-journal subscription rates based either entirely or partly on usage can be greatly affected if usage numbers have been artificially inflated. it can lead to unnecessary increases in cost. since it was determined that outliers were present only during the period in which the library had found itself under "attack," it can be assumed that the vast majority, if not all, of the extreme usage values were a result of illegal downloading.
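as a quick check, the 12 percent and 20 percent figures quoted above follow directly from the recorded and corrected totals in table 5:

```python
# quick check of the reductions reported above, from the table 5 totals.
recorded  = {2010: 857_360, 2011: 869_651}
corrected = {2010: 757_564, 2011: 696_973}

for year in recorded:
    diff = recorded[year] - corrected[year]
    share = 100 * diff / recorded[year]
    print(f"{year}: {diff:,} downloads removed ({share:.0f}% of the recorded total)")
# 2010: 99,796 downloads removed (12% of the recorded total)
# 2011: 172,678 downloads removed (20% of the recorded total)
```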
it would therefore be a shame to need to pay higher costs because of inappropriate or illegal downloading of licensed content. accurate usage data is also important for academic libraries that integrate usage statistics into their collection development policy for the purpose of justifying the retention or cancellation of a particular subscription. the j. n. desmarais library is such a library. as indicated earlier, if the cost-per-download of a subscription is consistently greater than the cost of an interlibrary loan for three or more years, it is marked for cancellation. at the j. n. desmarais library, the average cost of an interlibrary loan had been previously calculated to be approximately can$15.00.42 therefore, subscriptions recording a “cost-per-download” greater than the can$15.00 target for more than three years can be eliminated from the collection. information technology and libraries | june 2014 40 any artificial increase in the number of downloads would have as result to artificially lower the cost-per-use ratio. this would reinforce the illusion that a particular subscription was used far more than it really was and lead to the false belief that it would be less expensive to retain rather than rely on interlibrary loan services. the true cost-per-use ratio may be far greater than initially calculated. the unnecessary retention of a subscription could prevent the acquisition of another, more relevant, one. for example, after adjusting the number of articles downloaded from sciencedirect in 2011, the cost-per-download ratio increased from can$0.74 to can$1.59, a 53 percent increase. for the j. n. desmarais library, this package was obviously not in jeopardy of being cancelled. but a 53 percent change in the cost-per-use ratio for borderline subscriptions would definitely have been affected. it must also be stated that none of the library’s subscriptions having experienced extreme downloading found themselves in the position of being cancelled after the usage data had been corrected for outliers. regardless, it is important to verify all usage data prior to any data analysis to identify and correct for outliers. once the outlier detection investigation has been completed and any extreme values replaced by the median, there would be no further need to manipulate the data in such a fashion. the identification of outliers is a one-time procedure. the corrected or cleaned datasets would then become the official datasets to be used for any further usage analyses. conclusions outliers can have a dramatic effect on the analysis of any dataset. as demonstrated here, the presence of outliers can lead to the misrepresentation of usage patterns. they can artificially inflate average values and introduce severe distortion to any dataset. fortunately, they are fairly easy to identify and remove. the following steps were used to identify outliers in jr1 countercompliant reports: 1. identify possible outliers: visually inspect the values recorded in a jr1 report dataset (dataset 1) and mark any extreme values. 2. for each suspected outlier identified, take the usage values for the affected e-journal title and incorporate them into a separate blank spreadsheet (dataset 2). incorporate into dataset 2 all other usage values for the affected journal from all available years. it is important that dataset 2 contain only those values for the affected journal. 3. test for the outlier: perform dixon q test on the suspected outlier to confirm or disprove existence of the outlier. 4. 
if the suspected outlier tests as positive, calculate the median of dataset 2. 5. replace the outlier in dataset 1 with the median calculated from dataset 2. 6. perform steps 1 through 5 for any other suspected outliers in dataset 1. 7. the corrected values in dataset 1 will become the official values and will be used for all subsequent usage data analysis. the importance of identifying and accommodating e-resources usage data for the presence of outliers | lamothe 41 the identification and removal of outliers had a noticeable effect on the usage statistics for j. n. desmarais library’s e-journal collection. outliers represented over 100,000 erroneous downloaded articles in 2010 and nearly 200,000 in 2011. a total of 20 percent of recorded downloads in 2011 were anomalous, and in all likelihood a result of illicit downloading after laurentian university’s ezproxy server was breached. new technologies have made digital content easily available on the web, which has caused serious concern for both publishers43 and institutions of higher learning, which have been experiencing an increase is illicit attacks.44 the history of napster supports the argument that users “will freely steal content when given the opportunity.”45 since web robot traffic will continue to grow in pace with the internet, it is critical that this traffic be factored into the performance and protection of any web servers.46 references 1. victoria j. hodge and jim austin, “a survey of outlier detection methodologies,” artificial intelligence review 85 (2004): 85–126, http://dx.doi.org/10.1023/b:aire.0000045502.10941.a9; patrick h. menold, ronald k. pearson, and frank allgöwer, “online outlier detection and removal,” in proceedings of the 7th mediterranean conference on control and automation (med99) haifa, israel—june 28-30, 1999 (haifa, israel: ieee, 1999): 1110–30. 2. hodge and austin, “a survey of outlier detection methodologies,” 85–126. 3. vic barnett and toby lewis, outliers in statistical data (new york: wiley, 1994). 4. hodge and austin, “a survey of outlier detection methodologies,” 85–126; r. s. witte and j. s. witte, statistics (new york: wiley, 2004); menold et al., “online outlier detection and removal,” 1110–30. 5. menold et al., “online outlier detection and removal,” 1110–30. 6. hodge and austin, “a survey of outlier detection methodologies,” 85–126. 7. laurentian university (sudbury, canada) is classified as a medium multi-campus university. total 2012 full-time student population was 6,863, of which 403 were enrolled in graduate programs. in addition, 2012 part-time student population was 2,652 with 428 enrolled in graduate programs. also in 2012, the university employed 399 full-time teaching and research faculty members. academic programs cover a multiple of fields in the sciences, social sciences, and humanities and offers 60 undergraduate, 17 master’s, and 7 doctoral degrees. 8. alain r. lamothe, “factors influencing usage of an electronic journal collection at a mediumsize university: an eleven-year study,” partnership: the canadian journal of library and information practice and research 7, no. 1 (2012), https://journal.lib.uoguelph.ca/index.php/perj/article/view/1472#.u36phvmsy0j. https://journal.lib.uoguelph.ca/index.php/perj/article/view/1472#.u36phvmsy0j information technology and libraries | june 2014 42 9. ben tremblay, “web bot—what is it? 
can it predict stuff?” daily common sense: scams, science and more (blog), january 24, 2008, http://www.dailycommonsense.com/web-botwhat-is-it-can-it-predict-stuff/. 10. derek doran and swapna s. gokhale, “web robot detection techniques: overview and limitations,” data mining and knowledge discovery 22 (2011): 183–210, http://dx.doi.org/10.1007/s10618-010-0180-z. 11. c. lee giles, yang sun, and isaac g. councill, “measuring the web crawler ethics,” in www 2010 proceedings of the 19th international conference on world wide web (raleigh, nc: international world wide web conferences steering committee, 2010): 1101–2, http://dx.doi.org/10.1145/17772690.1772824. 12. shinil kwon, kim young-gab, and sungdeok cha, “web robot detection based on patternmatching technique,” journal of information science 38 (2012): 118–26, http://dx.doi.org/10.1177/0165551511435969. 13. david watson, “the evolution of web application attacks,” network security (2007): 7–12, http://dx.doi.org/10.1016/s1353-4858(08)70039-4. 14. eric kin-wai lau, “factors motivating people toward pirated software,” qualitative market research 9 (2006): 404–19, http://dx.doi.org/1108/13522750610689113. 15. huan-chueh wu et al., “college students’ misunderstanding about copyright laws for digital library resources,” electronic library 28 (2010): 197–209, http://dx.doi.org/10.1108/02640471011033576. 16. ibid. 17. ibid. 18. emma mcculloch, “taking stock of open access: progress and issues,” library review 55 (2006): 337–43; c. patra, “introducing e-journal services: an experience,” electronic library 24 (2006): 820–31. 19. wu et al., “college students’ misunderstanding about copyright laws for digital library resources,” 197–209. 20. ibid. 21. vincent j. calluzzo and charles j. cante, “ethics in information technology and software use,” journal of business ethics 51 (2004): 301–12, http://dx.doi.org/10.1023/b:busi.0000032658.12032.4e. 22. s. l. solomon and j. a. o’brien “the effect of demographic factors on attitudes toward software piracy,” journal of computer information systems 30 (1990): 41–46. 23. j. n. desmarais library, “collection development policy” (sudbury, on: laurentian university, 2013), http://www.dailycommonsense.com/web-bot-what-is-it-can-it-predict-stuff/ http://www.dailycommonsense.com/web-bot-what-is-it-can-it-predict-stuff/ http://dx.doi.org/10.1007/s10618-010-0180-z http://dx.doi.org/10.1145/17772690.1772824 http://dx.doi.org/10.1177/0165551511435969 http://dx.doi.org/10.1016/s1353-4858(08)70039-4 http://dx.doi.org/1108/13522750610689113 http://dx.doi.org/10.1108/02640471011033576 http://dx.doi.org/10.1023/b:busi.0000032658.12032.4e the importance of identifying and accommodating e-resources usage data for the presence of outliers | lamothe 43 http://biblio.laurentian.ca/research/sites/default/files/pictures/collection%20development %20policy.pdf. 24. lamothe, “factors influencing usage”; alain r. lamothe, “electronic serials usage patterns as observed at a medium-size university: searches and full-text downloads,” partnership: the canadian journal of library and information practice and research 3, no. 1 (2008), https://journal.lib.uoguelph.ca/index.php/perj/article/view/416#.u364kvmsy0i. 25. martin zimerman, “e-books and piracy: implications/issues for academic libraries,” new library world 112 (2011): 67–75, http://dx.doi.org/10.1108/03074801111100463. 26. ibid. 27. 
27. peggy hageman, "ebooks and the long arm of the law," econtent (june 2012), http://www.econtentmag.com/articles/column/ebookworm/ebooks-and-the-long-arm-of-the-law--82976.htm.
28. "dataset, n.," oed online (oxford, uk: oxford university press, 2013), http://www.oed.com/view/entry/261122?redirectedfrom=dataset; "dataset—definition," ontotext, http://www.ontotext.com/factforge/dataset-definition; w. paul vogt, "data set," dictionary of statistics and methodology: a nontechnical guide for the social sciences (london, uk: sage, 2005); allan g. bluman, elementary statistics—a step by step approach (boston: mcgraw-hill, 2000).
29. david b. rorabacher, "statistical treatment for rejection of deviant values: critical values of dixon's 'q' parameter and related subrange ratios at the 95% confidence level," analytical chemistry 63 (1991): 139–45; r. b. dean and w. j. dixon, "simplified statistics for small numbers of observations," analytical chemistry 23 (1951): 636–38, http://dx.doi.org/10.1021/ac00002a010.
30. surenda p. verma and alfredo quiroz-ruiz, "critical values for six dixon tests for outliers in normal samples up to sizes 100, and applications in science and engineering," revista mexicana de ciencias geologicas 23 (2006): 133–61.
31. robert r. sokal and f. james rohlf, biometry (new york: freeman, 2012); j. h. zar, biostatistical analysis (upper saddle river, nj: prentice hall, 2010).
32. "null hypothesis," accessscience (new york: mcgraw-hill education, 2002), http://www.accessscience.com.
33. ibid.
34. "critical value," accessscience (new york: mcgraw-hill education, 2002), http://www.accessscience.com.
35. ibid.
36. verma and quiroz-ruiz, "critical values for six dixon tests for outliers," 133–61.
37. rorabacher, "statistical treatment for rejection of deviant values," 139–45.
38. ibid.
39. jaakko astola and pauli kuosmanen, fundamentals of nonlinear digital filtering (new york: crc, 1997); jaakko astola, pekka heinonen, and yrjö neuvo, "on root structures of median and median-type filters," ieee transactions on acoustics, speech, and signal processing 35 (1987): 1199–201; l. ling, r. yin, and x. wang, "nonlinear filters for reducing spiky noise: 2-dimensions," ieee international conference on acoustics, speech, and signal processing 9 (1984): 646–49; n. j. gallagher and g. wise, "a theoretical analysis of the properties of median filters," ieee transactions on acoustics, speech, and signal processing 29 (1981): 1136–41.
40. menold et al., "online outlier detection and removal," 1110–30.
41. ibid.
42. lamothe, "factors influencing usage"; lamothe, "electronic serials usage patterns."
43. paul gleason, "copyright and electronic publishing: background and recent developments," acquisitions librarian 13 (2001): 5–26, http://dx.doi.org/10.1300/j101v13n26_02.
44. tena mcqueen and robert fleck jr., "changing patterns of internet usage and challenges at colleges and universities," first monday 9 (2004), http://firstmonday.org/issues/issue9_12/mcqueen/index.html.
45. robin peek, "controlling the threat of e-book piracy," information today 18, no. 6 (2001): 42.
46. gleason, "copyright and electronic publishing," 5–26.

assessing sufficiency and quality of bandwidth for public libraries

john carlo bertot and charles r. mcclure

john carlo bertot (jbertot@fsu.edu) is the associate director of the information use management and policy institute and professor at the college of information, florida state university; charles r. mcclure (cmcclure@ci.fsu.edu) is the director of the information use management and policy institute (www.ii.fsu.edu) and francis eppes professor of information studies at the college of information, florida state university.

based on data collected as part of the 2006 public libraries and the internet study, the authors assess the degree to which public libraries provide sufficient and quality bandwidth to support the library's networked services and resources. the topic is complex due to the arbitrary assignment of a number of kilobits per second (kbps) used to define bandwidth. such arbitrary definitions of bandwidth sufficiency and quality are not useful. public libraries are indeed connected to the internet and do provide public-access services and resources. it is, however, time to move beyond connectivity type and speed questions and consider issues of bandwidth sufficiency, quality, and the range of networked services that should be available to the public from public libraries. a secondary but important issue is the extent to which libraries, particularly in rural areas, have access to broadband telecommunications services.

the biennial public libraries and the internet studies, conducted since 1994, describe public library involvement with and use of the internet.1 over the years, the studies showed the growth of public-access computing (pac) and internet access provided by public libraries to the communities they serve. internet connectivity rose from 20.9 percent to essentially 100 percent in less than ten years; the average number of public access computers per library increased from an average of two to nearly eleven; and bandwidth rose to the point where 63 percent of public libraries had connection speeds of greater than 769kbps (kilobits per second) in 2006. this dramatic growth occurred amid related information technology challenges as well as the budgetary and staffing challenges that public libraries face in maintaining traditional services alongside networked services.

one challenge is the question of bandwidth sufficiency and quality. the question is complex because typically an arbitrary number describes the number of kbps used to define "broadband." as will be seen in this paper, such arbitrary definitions of bandwidth sufficiency are generally not useful. the federal communications commission (fcc), for example, uses the term "high speed" for connections of 200kbps in at least one direction.2 there are three problematic issues with this definition:
1. it specifies unidirectional bandwidth, meaning that a connection with a 200kbps download but a much slower upload (e.g., 56kbps) would fit this definition;
2. regardless of direction, bandwidth of 200kbps is neither high speed nor sufficient to allow for a range of internet-based applications and services. this inadequacy will increase significantly as internet-based applications continue to demand more bandwidth to operate properly; and
3. the definition is in the context of broadband to the single user or household, and does not take into consideration the demands of a high-use, multiple-workstation public-access context.

in addition to connectivity speed, there are many questions related to public library pac and internet access that can affect bandwidth sufficiency—from budget and sustainability, staffing and support, to the services public libraries offer through their technology infrastructure, and the impacts of connectivity and pac on the communities that libraries serve. one key question, however, is what is quality pac and internet bandwidth for public libraries? and, in attempting to answer that question, what are measures and benchmarks of quality internet access? this paper provides data from the 2006 public libraries and the internet study to foster discussion and debate around determining quality pac and internet access.3 bandwidth and connectivity data at the library outlet or branch level are presented in this article. the bandwidth measures are not systemwide but rather at the point of service delivery in the branch.

■ the bandwidth issue

there are a number of factors that affect the sufficiency and quality of bandwidth in a pac and internet service context. examples of factors that influence actual speed include:
■ number of workstations (public-access and staff) that simultaneously access the internet;
■ provision of wireless access that shares the same connection;
■ ultimate connectivity path—that is, a direct connection to the internet that is truly direct, or one that goes through regional or other local hops (that may have aggregated traffic from other libraries or organizations) out to the internet;
■ type of connection and bandwidth that the telecommunications company is able to supply the library;
■ operations (surfing, e-mail, downloading large files, streaming content) being performed by users of the internet connection;
■ switching technologies;
■ latency effects that affect packet loss, jitter, and other forms of noise throughout a network;
■ local settings and parameters, known or unknown, that impede transmission or bog down the delivery of internet-based content;
■ range of networked services (databases, videoconferencing, interactive/real-time services) to which the library is linked;
■ if networked, the speed of the network on which the public-access workstations reside; and
■ general application resource needs, protocol priority, and other general factors.
thus, it is difficult to precisely answer "how much bandwidth is enough" within an evolving and dynamic context of public access, use, and infrastructure. putting public-access internet use into a more typical application-and-use scenario, however, may provide some indication of adequate bandwidth. for example:
■ a typical three-minute digital song is 3mb;
■ a typical digital photo is about 2mb; and
■ a typical powerpoint presentation is about 10mb.

if one person in a public library were to e-mail a powerpoint presentation at the same time that another person downloaded multiple songs, and another was exchanging multiple pictures, even a library with a t1 line (1.5mbps—megabits per second) would experience a temporary network slowdown during these operations. this does not take into account many other new high-bandwidth-consuming applications such as cnn's streaming-video channel; uploading and accessing content on a wiki, blog, or youtube.com; or streaming content such as cbs's webcast of the 2006 ncaa basketball tournament. an increasingly used technology in various settings is two-way internet-based video conferencing. with an installed t1 line, a library could support two 512kbps or three 384kbps videoconferences, depending on the amount of simultaneous traffic on the network—which, in a public access context, would be heavy. indeed, the 2006 public libraries and the internet study indicated near continuous use of public-access workstations by patrons (only 14.6 percent of public libraries indicated that they always had a sufficient number of workstations available for patron use). public libraries increasingly serve as access points to e-government services and resources (e.g., social services, disaster relief, health care).4 these services can range from the simple completion of a web-based form (low bandwidth consumption) to more interactive services (high bandwidth consumption). and, as access points to continuing education and online degree programs, public libraries need to offer adequate broadband to enable users to access services and resources that increasingly depend on streaming technologies that consume greater bandwidth.

■ bandwidth and pac in public libraries today

as table 1 demonstrates, public libraries continue to increase their bandwidth, with 63.3 percent of public libraries reporting connection speeds of 769kbps or greater. this compares to 47.7 percent of public libraries reporting connection speeds of greater than 769kbps in 2004. there are disparities between rural and urban public libraries, with rural libraries reporting substantially fewer instances of connection speeds of greater than 1.5mbps in 2006. on the one hand, the increase in connectivity speeds between 2004 and 2006 is a positive step. on the other, 16.1 percent of public libraries report that their connection speeds are insufficient to meet patron demands all of the time, and 29.4 percent indicate that their connection speeds are insufficient to meet patron demands some of the time. thus, nearly half of public libraries indicate that their connection speeds are insufficient to meet patron demands some or all of the time. in terms of public access computers, the average number of workstations that public libraries provide is 10.7 (table 2). urban libraries have an average of 17.1 workstations, as compared to rural libraries, which report an average of 7.1 workstations.
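editor's illustration: to make the arithmetic behind the t1-line scenario above concrete, the short sketch below converts the quoted file sizes into transfer times over a t1 connection and checks how many fixed-rate videoconference streams the same line can carry. the file sizes and stream rates come from the paragraphs above; the t1 capacity of roughly 1.544 megabits per second and the unit conversions are general networking facts, not figures from the study.

```python
# Editorial sketch: unit conversions behind the T1-line scenarios above.
# A T1 line carries about 1.544 megabits per second (Mbps); file sizes are quoted
# in megabytes (MB), and each megabyte is 8 megabits.

T1_MBPS = 1.544          # T1 capacity, megabits per second
MBITS_PER_MB = 8         # megabits per megabyte

files_mb = {"three-minute song": 3, "digital photo": 2, "PowerPoint presentation": 10}

for name, size_mb in files_mb.items():
    seconds = size_mb * MBITS_PER_MB / T1_MBPS
    print(f"{name}: {size_mb} MB takes about {seconds:.0f} s on an otherwise idle T1")

# If all three transfers run at once they contend for the same 1.544 Mbps, so the
# link stays saturated for the combined duration -- the "temporary slowdown" noted above.
total_mb = sum(files_mb.values())
print(f"all three together: about {total_mb * MBITS_PER_MB / T1_MBPS:.0f} s of saturated link")

# Videoconferencing: raw number of fixed-rate streams that fit on one T1.
# The article quotes two 512 kbps or three 384 kbps conferences because it leaves
# headroom for other simultaneous traffic on the same connection.
for stream_kbps in (512, 384):
    streams = int(T1_MBPS * 1000 // stream_kbps)
    print(f"{stream_kbps} kbps streams that fit on a dedicated T1: {streams}")
```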
a closer look at bandwidth and pac for the next sections, the data offer two key views for analysis purposes: (1) workstations—divided into libraries with ten or fewer public­access workstations and libraries with more than ten public­access worksta­ tions (given that the average number of public­access workstations in libraries is roughly ten); and (2) band­ width—divided into libraries with 769kbps or less and libraries with greater than 769kbps (an arbitrary indicator of broadband for a public library context). in looking across bandwidth and public­access work­ stations (table 3), overall 31.8 percent of public libraries have connection speeds of less than 769kbps while 63.3 percent have connection speeds of greater than 769kbps. a majority of public libraries—68.5 percent—have ten or fewer workstations, while 30.9 percent have more than ten workstations. in general, rural libraries have fewer workstations and lower bandwidth as compared to sub­ urban and urban libraries. indeed, 75.2 percent of urban 16 information technology and libraries | march 200716 information technology and libraries | march 2007 libraries with fewer than ten workstations have connec­ tion speeds of greater than 769kbps, as compared to 45.2 percent of rural libraries. when examining pac capacity, it is clear that public libraries have capacity issues at least some of the time in a typical day (tables 4 through 6). only 14.6 percent of public libraries report that they have sufficient numbers of workstations to meet patron demands at all times (table 6), while nearly as many, 13.7 percent, report that they consistently are unable to meet patron demands for public­access workstations (table 4). a full 71.7 percent indicate that they are unable to meet patron demands during certain times in a typical day (see table 5). in other words, 85.4 percent of public libraries report that they are unable to meet patron demand for public­access workstations some or all of the time during a typical day—regardless of number of workstations available and type of library. the disparities between rural and urban libraries are notable. in general, urban libraries report more difficulty in meeting patron demands for public­access workstations. of urban public libraries, 27.8 percent report that they consistently have difficulty in meeting patron demand for workstations, as compared to 11.0 percent of suburban and 10.6 percent of rural public libraries (table 4). by contrast, 6.6 percent of urban libraries report sufficient workstations to meet patron demand all the time as compared to 18.9 percent of rural libraries (table 6). when reviewing the adequacy of speed of connectiv­ ity data by the number of workstations, bandwidth, and metropolitan status, a more robust and descriptive pic­ table 1. 
public library outlet maximum speed of public-access internet services by metropolitan status and poverty metropolitan status poverty level maximum speed urban suburban rural low medium high overall less than 56kbps 0.7% ±0.8% (n=18) 0.4% ±0.6% (n=17) 3.7% ±1.9% (n=275) 2.0% ±1.4% (n=245) 2.7% ±1.6% (n=61) 2.6% ±1.6% (n=5) 2.1% ±1.4% (n=311) 56kbps– 128kbps 2.5% ±1.6% (n=67) 5.4% ±2.3% (n=264) 15.2% ±3.6% (n=1,132) 9.9% ±3.0% (n=1,237) 9.5% ±2.9% (n=216) 5.3% ±2.2% (n=10) 9.8% ±3.0% (n=1,463) 129kbps– 256kbps 2.7% ±1.6% (n=72) 6.8% ±2.5% (n=332) 11.1% ±3.1% (n=829) 8.5% ±2.8% (n=1,067) 7.3% ±2.6% (n=166) 8.2% ±2.8% (n=1,233) 257kbps–768kbps 9.1% ±2.9% (n=241) 10.4% ±3.1% (n=504) 13.4% ±3.4% (n=1,002) 12.5% ±3.3% (n=1,557) 8.4% ±2.8% (n=190) 11.7% ±3.2% (n=1,747) 769kbps– 1.5mbps 33.6% ±4.7% (n=889) 40.0% ±4.9% (n=1,945) 31.0% ±4.6% (n=2,310) 34.3% ±4.8% (n=4,286) 34.6% ±4.8% (n=788) 38.1% ±4.9% (n=70) 34.4% ±4.8% (n=5,144) greater than 1.5mbps 49.4% ±5.0% (n=1,304) 31.6% ±4.7% (n=1,533) 19.9% ±4.0% (n=1,488) 27.4% ±4.5% (n=3,423) 35.5% ±4.8% (n=808) 50.5% ±5.0% (n=93) 28.9% ±4.5% (n=4,324) don’t know 1.9% ±1.4% (n=50) 5.4% ±2.3% (n=263) 5.7% ±2.3% (n=427) 5.5% ±2.3% (n=685) 2.1% ±1.4% (n=48) 3.5% ±1.8% (n=6) 4.9% ±2.2% (n=739) weighted missing values, n=1,497 table 2. average number of public library outlet graphical publicaccess internet terminals by metropolitan status and poverty* poverty level metropolitan status low medium high overall urban 14.7 20.9 30.7 17.9 suburban 12.8 9.7 5.0 12.6 rural 7.1 6.7 8.1 7.1 overall 10.0 13.3 26.0 10.7 * note that most library branches defined as “high poverty” are in general part of library systems with multiple branches and not single building systems. by and large, library systems connect and provide pac and internet services systemwide. article title | author 17assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 17 ture emerges. while overall, 53.5 percent of public librar­ ies indicate that their connection speeds are adequate to meet demand, some parsing of this figure reveals more variation (tables 7 through 10): ■ libraries with connection speeds of 769kpbs or less are more likely to report that their connection speeds are insufficient to meet patron demand at all times, with 24.0 percent of rural libraries, 25.8 percent of suburban libraries, and 25.4 percent of urban libraries so reporting (table 7). ■ libraries with connection speeds of 769kpbs or less are more likely to report that their connection speeds are insufficient to meet patron demand at some times, with 35.0 percent of rural libraries, 38.1 per­ cent of suburban libraries, and 53.4 percent of urban libraries so reporting (table 8). ■ libraries with connection speeds of greater than 769kbps also report bandwidth­sufficiency issues, with 12.0 percent of rural libraries, 10.5 percent of suburban libraries so reporting; and 14.0 percent of urban librar­ ies indicating that their connection speeds are insuf­ ficient all of the time (table 7); 20.3 percent of rural libraries, 29.5 percent of suburban libraries, and 30.0 percent of urban libraries indicating that their connec­ tion speeds are insufficient some of the time (table 8). ■ libraries that have ten or fewer workstations tend to rate their bandwidth as more sufficient at either 769kbps or less or greater than 769kbps (tables 7, 8, and 10). 
thus, in looking at the data, it is clear that libraries with fewer workstations indicate that their connection speeds are more sufficient to meet patron demand. table 3. public library public-access workstations and speed of connectivity by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 48.4% n=2,929 45.2% n=2,737 30.1% n=891 63.2% n=1,872 21.6% n=269 75.2% n=937 more than 10 workstations 22.0% n=307 75.5% n=1,053 12.0% n=225 85.1% n=1,595 9.6% n=130 89.8% n=1,221 total 43.4% n=3,242 50.9% n=3,802 23.0% n=1,116 71.6% n=3,474 15.1% n=399 83.0% n=2,194 missing: 7.6% (n=1,239) table 4. fewer public library public-access workstations than patrons wishing to use them by metropolitan status rural suburban urban total 10 or fewer workstations 10.5% n=681 10.8% n=339 23.6% n=300 12.1% n=1,321 more than 10 workstations 10.8% n=158 11.4% n=220 31.2% n=430 16.9% n=808 total 10.6% n=845 11.0% n=562 27.8% n=748 13.7% n=2,157 missing: 2.9% (n=473) table 5. fewer public library public-access workstations than patrons wishing to use them at certain times during a typical day by metropolitan status rural suburban urban total 10 or fewer workstations 68.8% n=4,444 74.5% n=2,347 69.1% n=880 70.5% n=7,670 more than 10 workstations 78.1% n=1,139 80.2% n=1,548 62.8% n=866 74.5% n=3,553 total 70.5% n=5,605 76.7% n=3,905 65.6% n=1,764 71.7% n=11,273 missing: 2.9% (n=473) table 6. sufficient public library public-access workstations available for patrons wishing to use them by metropolitan status rural suburban urban total 10 or fewer workstations 20.6% n=1,331 14.7% n=464 7.4% n=94 17.4% n=1,889 more than 10 workstations 11.0% n=161 8.4% n=163 6.0% n=83 8.5% n=406 total 18.9% n=1,501 12.3% n=627 6.6% n=177 14.6% n=2,304 missing: 2.9% (n=473) 18 information technology and libraries | march 200718 information technology and libraries | march 2007 table 7. public library connection speed insufficient to meet patron needs by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 25.4% n=668 12.1% n=297 27.4% n=233 9.8% n=173 15.4% n=34 10.2% n=90 more than 10 workstations 11.6% n=34 11.4% n=108 19.2% n=41 11.3% n=168 25.4% n=32 17.1% n=199 total 24.0% n=705 12.0% n=408 25.8% n=274 10.5% n=341 18.7% n=72 14.0% n=293 table 8. public library connection speed insufficient to meet patron needs at some times by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 34.1% n=898 19.3% n=474 37.1% n=315 29.0% n=511 50.0% n=130 27.0% n=238 more than 10 workstations 43.2% n=127 22.5% n=214 42.3% n=90 30.3% n=450 60.3% n=76 32.0% n=374 total 35.0% n=1,025 20.3% n=694 38.1% n=405 29.5% n=961 53.4% n=206 30.0% n=626 table �. public library connection speed is sufficient to meet patron needs by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 38.9% n=1,025 68.3% n=1,675 35.0% n=297 60.2% n=1,062 34.6% n=90 62.9% n=556 more than 10 workstations 45.2% n=133 66.1% n=628 38.5% n=82 54.9% n=817 14.3% n=18 50.9% n=594 total 39.5% n=1,158 67.5% n=2,306 35.7% n=379 57.9% n=1,886 28.0% n=108 56.0% n=1,168 table 10. 
public library connection speed insufficient to meet patron needs some or all of the time by metropolitan status rural suburban urban lt769kbps gt769kbps lt769kbps gt769kbps lt769kbps gt769kbps 10 or fewer workstations 59.5% n=1,566 31.4% n=771 64.6% n=549 38.8% n=684 65.4% n=170 37.1% n=328 more than 10 workstations 54.8% n=161 33.9% n=322 61.5% n=131 41.6% n=618 85.7% n=108 49.1% n=573 total 24.0% n=1,025 32.3% n=1,102 64.0% n=680 40.0% n=1,302 72.0% n=278 44.0% n=919 article title | author 1�assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 1� ■ discussion and selected issues the data presented point to a number of issues related to the current state of public library pac and internet­access adequacy in terms of available public access computers and bandwidth. the data also provide a foundation upon which to discuss the nature of quality and sufficient pac and internet access in a public library environment. while public libraries indicate increased ability to meet patron bandwidth demand when providing fewer publicly avail­ able workstations, public libraries indicate that they have difficulty in meeting patron demand for public access computers. growth of wireless connections in 2004, 17.9 percent of public library outlets offered wire­ less access, and a further 21.0 percent planned to make it available. outlets in urban and high­poverty areas were most likely to have wireless access. the majority of librar­ ies (61.2 percent), however, neither had wireless access nor had plans to implement it in 2004. as table 11 demon­ strates, the number of public library outlets offering wire­ less access has roughly doubled from 17.9 percent to 36.7 percent in two years. furthermore, 23.1 percent of outlets that do not currently have it plan to add wireless access in the next year. thus, if libraries follow through with their plans to add wireless access, 61.0 percent of public library outlets in the united states will have it by 2007. the implications of the rapid growth of the public library’s provision of wireless connectivity (as shown in table 11) on bandwidth requirements are significant. either libraries added wireless capabilities through their current overall bandwidth, or they obtained additional bandwidth to support the increased demand created by the service. if the former, then wireless access created an even greater burden on an already problematic band­ width capacity and may have actually reduced the overall quality of connectivity in the library. if the latter, libraries then had to shoulder the burden of increased expendi­ tures for bandwidth. either scenario required additional technology infrastructure, support, and expenditures. sufficient and quality connections the notion of sufficient and quality public library con­ nection to the internet is a moving target and depends on a range of factors and local conditions. for purposes of discussion in this paper, the authors used 769kbps to differentiate “slower” from “faster” connectivity. if, how­ ever, 1.5mbps or greater had been used to define faster connectivity speeds, then only 28.9 percent of public libraries would meet the criterion of “faster” connectiv­ ity (see table 1). and in fact, simply because 28.9 percent of public libraries report connection speeds of 1.5mbps or faster does not also mean that they have sufficient or quality bandwidth to meet the computing needs of their users, their staff, their vendors, and their service provid­ ers. 
some public libraries may need 10mbps to meet the pac needs of their users as well as the internal staff and management computing needs. the library community needs to become more edu­ cated and knowledgeable about what constitutes sufficient and quality connectivity in their library for the communi­ ties that they serve. a first step is to understand clearly the nature and type of the connectivity of the library. the next step is to conduct an internal audit that minimally: ■ identifies the range of networked services the library provides both to users as well as for the operation of the library; ■ identifies the typical bandwidth consumption of these services; ■ determines the demands of users on the bandwidth in terms of services they use; ■ determines peak bandwidth­usage times; ■ identifies the impact of high­consumption networked services used at these peak­usage times; ■ anticipates bandwidth demands of newer services and resources that users will want to access through the library’s infrastructure—myspace.com, youtube. com—regardless of whether or not the library is the direct provider of such services; and ■ determines what broadband services are available to the library, the costs of these services, and the “fit” of these services to the needs of the library. based on this and related information from such an audit, library administration can better determine the degree to which the bandwidth is sufficient in speed and quality. ■ planning for sufficient and quality bandwidth knowing the current condition of existing bandwidth in the library is not the same as successful technology plan­ ning and management to ensure that the library has, in fact, bandwidth that is sufficient in speed and quality. once an audit such as has been suggested is completed, careful planning for bandwidth deployment in the library is essential. it appears, however, that currently much of the management and planning for networked services is based first on what bandwidth is available as opposed to the bandwidth that is needed to provide the necessary services and resources in a networked environment. this stance puts public libraries in a reactive condition rather than a proactive condition regarding provision of net­ worked services. 20 information technology and libraries | march 200720 information technology and libraries | march 2007 most public library planning approaches stress the importance of conducting some type of needs assessment as a precursor to any type of planning.5 further, technology plans should include such things as goals, objectives, ser­ vices provision, and evaluation as they relate to bandwidth and the appropriate bandwidth needed. recent library technology planning guides, however, give little attention to the management, planning, and evaluation of band­ width as it relates to provision of networked services. it must be noted that some public libraries may be prevented from accessing higher bandwidth due to high cost, lack of availability of bandwidth alternatives, or other local factors that determine access to advanced telecommunications in their areas. in such circumstances, the audit may serve to inform the public service/utilities commissions, fcc, and others of the need for deploy­ ment of advanced telecommunications services in these areas. ■ bandwidth planning in a community context the audit and planning processes that have been described are critical activities for libraries. it is essential, however, for these processes to occur in the larger community con­ text. 
investments in technology infrastructure are increas­ ingly a community­wide resource that services multiple functions—emergency services, community access, local government agencies, to name a few. it is in this larger context that library pac and internet access occurs. moreover, there is a convergence of technology and service needs. for example, public libraries increasingly serve as agents of e­government and disaster­relief providers.6 first responders rely on the library’s infrastructure when theirs is destroyed, as hurricane katrina and other storms demonstrated. local, state, and federal government agen­ cies rely on broadband and pac and internet access (wired or wireless) to deliver e­government services. thus, at their core, libraries, emergency services, gov­ ernment agencies, and others have similar needs. pooling resources, planning jointly, and looking across needs may yield economies of scale, better service, and a more robust community technology infrastructure. emergency providers need access to reliable broadband and commu­ nications technologies in general, and in emergency situ­ ations in particular. libraries need access to high­quality broadband and pac technologies. both need access to wireless technologies. as broadcast networks relinquish ownership of the 700 mhz frequency used for analog television in february 2009, and this frequency is distributed to municipali­ ties for emergency services, now is an excellent time for libraries to engage in community technology planning for e­government, disaster planning and relief efforts, and pac and internet services. by working with the larger community to build a technology infrastructure, the library and the entire community benefit. ■ availability to high-speed connectivity one key consideration not known at this time is the extent to which public libraries—particularly those in rural areas—even have access to high­speed connec­ tions. many rural communities are served not by the large telecommunications carriers, but rather by small, privately owned­and­run local exchange carriers. iowa and wisconsin, for example, are each served by more than eighty exchange carriers. as such, public libraries are limited in capacity and services to what these exchange table 11. public-access wireless internet connectivity availability in public library outlets by metropolitan status and poverty metropolitan status poverty level provision of public-access wireless internet services urban suburban rural low medium high overall currently available 42.9% ± 4.9% (n=1,211) 42.5% ± 4.9% (n=2,240) 30.7% ± 4.6% (n=2,492) 38.0% ± 4.8% (n=5,165) 28.1% ±4.5% (n=679) 53.8% ± 5.0% (n=99) 36.7% ± 4.8% (n=5,943) not currently available and no plans to make it available within the next year 23.1% ± 4.2% (n=651) 29.7% ± 4.6% (n=1,562) 49.2% ± 5.0% (n=3,988) 37.4% ± 4.8% (n=5,091) 44.4% ± 4.9% (n=1,072) 21.0% ± 4.1% (n=39) 38.3% ± 4.9% (n=6,201) not currently available, but there are plans to make it available within the next year 30.6% ± 4.6% (n=864) 26.0% ± 4.4% (n=1,369) 18.6% ± 3.9% (n=1,509) 22.5% ± 4.2% (n=3,063) 26.2% ± 4.4% (n=633) 25.3% ± 4.4% (n=46) 23.1% ± 4.2% (n=3,742) article title | author 21assessing sufficiency and quality of bandwidth for public libraries | bertot and mcclure 21 carriers offer and make available. thus, in some areas, dsl service may be the only form of high­speed connec­ tivity available to libraries. 
and, as suggested earlier, dsl may or may not be considered high speed given the needs of the library and the demands of its users. communities that lack high­quality broadband ser­ vices by telecommunications carriers may want to con­ sider building a municipal wireless network that meets the community’s broadband needs for emergency, disas­ ter, and public­access settings. as a community engages in community­wide technology planning, it may become evident that local telecommunications carriers do not meet the broadband needs of the community. such com­ munities may need to build their own networks, based on identified technology­plan needs. ■ knowledge of networked services connectivity needs patrons may not attempt to use high­bandwidth services at the public library because they know from previous visits that the library cannot provide acceptable connec­ tivity speeds to access that service—thus, they quit trying to access that service, limiting the usefulness of the pub­ lic library. in addition, librarians may have inadequate knowledge or information to determine when bandwidth is or is not sufficient to meet the demands of their users. indeed, the survey and site visits revealed that some librarians did not know the connection speeds that linked their library to the internet. consequently, libraries are in a dilemma: increase both the number of workstations and the bandwidth to meet demand; or provide less service in order to operate within the constraints of current connectivity infrastruc­ ture. and yet, roughly 45 percent of public libraries indi­ cate that they have no plans to add workstations within the next two years; the average number of workstations has been around ten for the last three surveys (2002, 2004, and 2006); and 80 percent of public libraries indicate that space limitations affect their ability to add workstations.7 hence, for many libraries, adding workstations is not an option. ■ missing the mark? the networked environment is such that there are multi­ ple uses of bandwidth within the same library—for exam­ ple, public internet access, staff access, wireless access, integrated library system access. we are now in the web 2.0 environment, which is an interactive web that allows for content uploading by users (e.g., blogs, mytube.com, myspace.com, gaming). streaming content, not text, is increasingly the norm. there are portable devices that allow for text, video, and voice messaging. increasingly, users desire and prefer wireless services. this is a new environment in which libraries provide public access to networked services and resources. it is an enabling environment that puts users fully in the content seat—from creation to design to organization to access to consumption. and users have choices, of which the public library is only one, regarding the information they choose to access. it is an environment of competition, advanced applications, bandwidth intensity, and high­quality com­ puters necessary to access the graphically intense content. the impacts of this new and substantially more com­ plex environment on libraries are potentially significant. as user expectations rise, combined with the provision of high­quality services by other providers, libraries are in a competitive and service­ and resource­rich informa­ tion environment. 
providing “bare minimum” pac and internet access can have two detrimental effects in that they: (1) relegate libraries to places of last resort, and (2) further digitally divide those who only have public­access computers and internet access through their public librar­ ies. it is critical, therefore, for libraries to chart a high­end course regarding pac and internet access, and not access that is merely perceived to be acceptable by the librarians. ■ additional research the context in which issues regarding quality pac and sufficient connectivity speeds to internet access reside is complex and rapidly changing. research questions to explore include: ■ is it possible to define quality pac and internet access in a public library context? ■ if so, what are the attributes included in the defini­ tion? ■ can these attributes be operationalized and mea­ sured? ■ assuming measurable results, what strategies can the library, policy, research, and other interested communities employ to impact public library move­ ment toward quality pac and internet access? ■ should there be standards for sufficient connectivity and quality pac in public libraries? ■ how can public librarians be better informed regard­ ing the planning and deployment of sufficient and quality bandwidth? ■ what is the role of federal and state governments in supporting adequate bandwidth deployment for public libraries?8 ■ to what extent is broadband deployment and avail­ ability truly universal as per the universal service 22 information technology and libraries | march 200722 information technology and libraries | march 2007 (section 254) of the telecommunications act of 1996 (p.l. 104­104)? these questions are a beginning point to a larger set of activities that need to occur in the research, practitioner, and policy­making communities. ■ obtaining sufficient and quality public-library bandwidth arbitrary connectivity speed targets, e.g., 200kbps or 769kbps, do not in and of themselves ensure quality pac and sufficient connectivity speeds. public libraries are indeed connected to the internet and do provide public­ access services and resources. it is time to move beyond connectivity­type and ­speed questions and consider issues of bandwidth sufficiency, quality, and the range of networked services that should be available to the public from public libraries. given the widespread connectivity now provided from most public libraries, there continue to be increased demands for more and better networked services. these demands come from governments that expect public libraries to support a range of e­government services, from residents who want to use free wireless connectivity from the public library, to patrons who need to download music or view streaming videos (to name but a few). simply providing more or better connectivity will not, in and of itself, address all of these diverse service needs. increasingly, pac support will require additional public librarian knowledge, resources, and services. sufficient and quality bandwidth is a key component of those services. the degree to which public libraries can provide such enhanced networked services (requiring exceptionally high bandwidth that is both sufficient and of high quality) is unclear. mounting a significant effort now to better understand existing bandwidth use and plan for future needs and requirements in individual public libraries is essential. in today’s networked envi­ ronment, libraries must stay competitive in the provision of networked services. 
such will require sufficient and high-quality connectivity and bandwidth.

■ acknowledgements

the authors gratefully acknowledge the support of the bill & melinda gates foundation and the american library association for support of the 2006 public libraries and the internet study. data from that study have been incorporated into this paper.

references

1. information institute, public libraries and the internet (tallahassee, fla.: information use management and policy institute, 2006). all studies conducted since 1994 are available at http://www.ii.fsu.edu/plinternet (accessed march 1, 2007).
2. u.s. federal communications commission, high speed services for internet access: status as of december 31, 2005 (washington, d.c.: fcc, 2006), available at http://www.fcc.gov/bureaus/common_carrier/reports/fcc-state_link/iad/hspd0604.pdf (accessed mar. 1, 2007).
3. j. c. bertot et al., public libraries and the internet 2006 (tallahassee, fla.: information use management and policy institute, forthcoming), available at http://www.ii.fsu.edu/plinternet (accessed mar. 1, 2007).
4. j. c. bertot et al., "drafted: i want you to deliver e-government," library journal 131, no. 13 (aug. 2006): 34–37.
5. c. r. mcclure et al., planning and role setting for public libraries: a manual of options and procedures (chicago: ala, 1987); e. himmel and w. j. wilson, planning for results: a public library transformation process (chicago: ala, 1997).
6. j. c. bertot et al., "drafted: i want you to deliver e-government"; p. t. jaeger et al., "the policy implications of internet connectivity in public libraries," government information quarterly 23, no. 1 (2006): 123–41.
7. j. c. bertot et al., public libraries and the internet 2006.
8. jaeger et al., "the policy implications of internet connectivity in public libraries."

checking out facebook.com: the impact of a digital trend on academic libraries

laurie charnigo and paula barnett-ellis

laurie charnigo (charnigo@jsu.edu) is an education librarian and paula barnett-ellis (pbarnett@jsu.edu) is a health, science, and nursing librarian at the houston cole library, jacksonville state university, alabama.

while the burgeoning trend in online social networks has gained much attention from the media, few studies in library science have addressed the topic in depth. this article reports on a survey of 126 academic librarians concerning their perspectives toward facebook.com, an online network for students. findings suggest that librarians are overwhelmingly aware of the "facebook phenomenon." those who are most enthusiastic about the potential of online social networking suggested ideas for using facebook to promote library services and events. few individuals reported problems or distractions as a result of patrons accessing facebook in the library. when problems have arisen, strict regulation of access to the site seems unfavorable. while some librarians were excited about the possibilities of facebook, the majority surveyed appeared to consider facebook outside the purview of professional librarianship.

during the fall of 2005, librarians noticed something unusual going on in the houston cole library (hcl) at jacksonville state university (jsu). students were coming into the library in droves. patrons waited in lines with photos to use the public-access scanner (a stack of discarded pictures quickly grew). library traffic was noticeably busier than usual, and the computer lab was constantly full, as were the public-access terminals. the hubbub seemed to center around one particular web site.
once students found available computers, they were likely to stay glued to them for long stretches of time, mesmerized and lost in what was later determined to be none other than “facebook addiction.” this addic­ tion was all the more obvious the day the internet was down. withdrawal was severe. soon after the librarians noticed this curious behavior, an article in the chanticleer, the campus newspaper for jsu, dispelled the mystery surrounding the web­site brouhaha. a campus reporter broke the exciting news to the jsu community that “after months of waiting and requests from across the country, it’s finally here. jsu is officially on the facebook.”1 the library suddenly became a popular hangout for students in search of computers to access facebook. apparently jsu jumped on the bandwagon relatively late. the facebook craze had already spread throughout other colleges and universities since the web site was founded in february 2004 by mark zuckerberg, a former student at harvard university. the creators of facebook vaguely define the site as “a social utility that connects you with the people around you.”2 although originally created to allow students to search for other students at colleges and universities, the site has expanded to allow individuals to connect in high schools, companies, and within regions. recently, zuckerberg has also announced plans to expand the network to military bases.3 currently, students and alumni in more than 2,200 colleges and uni­ versities communicate, connect with other students, and catch up with past high school classmates daily through the network. students who may never physically meet on campus (a rather serendipitous occurrence in nature) have the opportunity to connect through facebook. establishing virtual identities by creating profiles on the site, students post photographs, descriptions of academic and personal interests such as academic majors, campus organizations of which they are members, political orientation, favorite authors and musicians, and any other information they wish to share about themselves. facebook’s search engine allows users to search for students, faculty, and staff with similar interests by keyword. it would be hard to gauge how many of these students actually meet in person after connecting through facebook. the authors of this study have heard students mention that either they or their friends have made dates with other students on campus through facebook. many of the “friends” facebook users first add when they initially establish their accounts are the ones they are already acquainted with in the physical world. when facebook made its debut at jsu, it had become the “ninth most highly trafficked web site in the u.s.”4 one source estimated that 85 percent of college students whose institutions are registered in facebook’s directory have created personal profiles on the site.5 membership for the university network requires a university e­mail address, and an institution cannot be registered in the directory unless a significant number of students request that the school be added. currently, more than nine mil­ lion people are registered on facebook.6 soon after jsu was registered on facebook’s direc­ tory, librarians began to receive questions regarding use of the scanner and requests for help uploading pictures to facebook profiles. students seemed surprisingly open about showing librarians their profiles, which usually contained more information than the librarians wanted to know. 
however, not all students were enthusiastic about facebook. complaints began to surface from students awaiting access to computers for academic work while classmates "tied up" computers on facebook. some students complained about the distraction facebook caused in the library's computer lab, a complaint that eventually reached the president of jsu. currently, the administration at jsu has decided to block access to facebook in the computer labs on campus, including the lab in the library. opinions of faculty and staff in the library about facebook vary. some librarians scoff at this new trend, viewing the site primarily as just another dating service. others have created their own facebook accounts just to see how it works, to connect with students, and to keep up with the latest internet fad.7

■ study rationale

prompted by the issues that have arisen at hcl as a result of heavy patron use of facebook, the authors surveyed academic librarians throughout the united states to find out what impact, if any, the site has had on other libraries. the authors sought information about the practical effect facebook has had on libraries, as well as librarians' perspectives on, perceived roles associated with, and awareness of internet social trends and their place in the library. online social networking, like e-mail and instant messaging, is emerging as a new method of communication. recently, the librarians have heard facebook being used as a verb (e.g., "i'll facebook you"). few would probably disagree that making social connections and friends (and facebook revolves around connecting friends) is an important aspect of the campus experience. much of the attraction students and alumni have toward college yearbooks (housed in the library) stems from the same fascination that viewing photos, student profiles, and searching for past and present classmates on facebook inspires. emphasis in this study centers on librarians' awareness of, experimentation with, and attitudes toward facebook and whether or not they have created policies to regulate or block access to the site on public-access computers. however trendy an individual web site such as facebook may appear, online social networking, the category facebook falls within, has become a new subject of inquiry for marketing professionals, sociologists, communication scholars, and library and information scientists. downes defines social networks as a "collection of individuals linked together by a set of relations."8 according to downes, "social networking web sites fostering the development of explicit ties between individuals as 'friends' began to appear in 2002."9 facebook is just one of many popular online social network sites (myspace, friendster, flickr), and survey respondents often asked why questions focused solely on facebook. the authors decided to investigate it specifically because it is currently the largest online social network targeted for the academic environment.
librarians are also increasingly exploring the use of what have loosely been referred to as “internet 2.0” com­ panies and services, such as facebook, to interact with and reach out to our users in new and creative ways. the term internet 2.0 was coined by o’reilly media to refer to internet services such as blogs, wikis, online social net­ working sites, and types of networks that allow users the ability to interact and provide feedback. o’reilly lists the core competencies that define internet 2.0 services. one of these competencies, which might be of particular inter­ est to librarians, is that internet 2.0 services must “trust the users” as “co­developers.”10 as librarians struggle to develop innovative ways to reach users beyond library walls, it seems logical to observe online services, such as facebook and myspace, which appeal to a huge portion of our clientele. from a purely evaluative standpoint of the site as a database, the authors were impressed by several of the search features offered in facebook. graph­theory algo­ rithms and other advanced network technology are used to process connections.11 some of the more interesting search options available in facebook include the ability to: ■ search for students by course field, class number, or section; ■ search for students in a particular major; ■ search for students in a particular student organiza­ tion or club; ■ create “groups” for student organizations, clubs, or other students with common interests; ■ post announcements about campus or organization events; ■ search specifically for alumni; and ■ block or limit who may view profiles, providing users with built­in privacy protection if the user so wishes. since the authors finished the study, the site has added a news feed and a mini feed, features that allow users to keep track of their friends’ notes, messages, profile changes, friend connections, and group events. in response to negative feedback about the news feeds and mini feeds by users who felt their privacy was being violated, facebook’s administrators created a way for users to turn off or limit information displayed in the feeds. the addition of this technology, however, provides a sophisticated level of connectivity that is a benefit to users who like to keep abreast of the latest happenings in their network of friends and groups. the pulse, another feature on the site, keeps daily track of popular interests (e.g., favorite books) and member demographics (number of members, political orientation) and compares them with overall facebook member averages. the authors were pleasantly surprised to discover that the beatles and led zeppelin, beloved bands of the baby boomers, article title | author 25checking out facebook.com | charnigo and barnett-ellis 25 continue to live on in the hearts of today’s students. these groups were ranked in the top ten favorite bands by stu­ dents at jsu. as of october 2006, the top campaign issues expressed by facebook users were: reducing the drinking age to eighteen (go figure) and legalization for same­sex marriage. arguably, much of the information provided by facebook is not academic in nature. however, an evaluation or review of facebook might provide useful information to instruction librarians and database ven­ dors regarding interface design and search capabilities that appeal to students. 
provitera­mcglynn suggests that facilitating learning among millennials, who “represent 70 to 80 million people” born after 1992 (a large percent­ age of facebook members) involves understanding how they interact and communicate.12 awareness of students’ cultural and social interests, and how they interact online, may help older generations of academic librarians better connect with their constituents. ■ the literature on online social networks although social networks have been the subject of study by sociologists for years and social network theories have been established to describe how these networks func­ tion, the study of online social networks has received little attention from the scholarly community. in 1997, garton, haythornthwaite, and wellman were among the first to describe a method, social network analysis, for studying online social networks.13 their work was published years before online social networks similar to facebook evolved. currently, the literature on these networks is predominantly limited to popular news pub­ lications, business magazines, occasional blurbs in library science and communications journals, and numerous student newspapers.14 privacy issues and concerns about sexual predators lurking on facebook and similar sites have been the focus of most articles. in the chronicle of higher education, read details numerous arrests, suspensions, and schol­ arship withdrawals that have resulted from police and administrators searching for incriminating information students have posted in facebook.15 read discovered that, because students naively reveal so much informa­ tion about their activities, some campus police were regularly trolling facebook, finding it “an invaluable ally in protecting their campuses.”16 students may feel a false sense of security when they post to facebook, regarding it as their private space. however, read warns that “as more and more colleges offer alumni e­mail accounts, and as campus administrators demonstrate more internet savvy, students are finding that their conversations are playing to a wider audience than they may have antici­ pated.”17 privacy concerns expressed about facebook appear to revolve more around surveillance than stalk­ ers. in a web seminar on issues regarding facebook use in higher education, shawn mcguirk, director of judicial affairs, mediation, and education at fitchburg state college, massachusetts, recommends that administrators and others concerned with students posting potentially incriminating, embarrassing, or overtly personal infor­ mation draft a document similar to the one created by cornell university’s office of information technologies, which advises students on how to safely and responsibly use online social networking sites similar to facebook.18 after pointing out the positive benefits of facebook and reassuring students that cornell university is proud of its liberal policy in not monitoring online social networks, the essay, entitled “thoughts on facebook,” provides poignant advice and examples of privacy issues revolv­ ing around facebook and similar web sites.19 the golden rule of this essay states: don’t say anything about someone else that you would not want said about yourself. and be gentle with your­ self too! 
what might seem fun or spontaneous at 18, given caching technologies, might prove to be a liability to an on­going sense of your identity over the longer course of history.20 a serious concern discussed in this document is the real possibility that potential employers may scan facebook profiles for the “real skinny” on job candidates. however, unless the employer uses an e­mail issued from the same school as the candidate, he or she is unable to look at the individual’s full profile without first request­ ing permission from the candidate to be added as a “friend.” all the employer is able to view is the user’s name, school affiliation, and picture (if the user has posted one). unless the user has posted an inappropriate picture or is applying for a job at the college he or she is attending, the threat of employers snooping for informa­ tion on potential candidates in facebook is minimal. the same, however, cannot be said of myspace, which is much more open and accessible to the public. additionally, three pilot research studies have also focused on privacy issues specifically relating to facebook, including those of stutzman, gross and acquisti, and govani and pashley. results from all three studies revealed strikingly close findings. individuals who participated in the studies seemed willing to dis­ close personal information about themselves—such as photos and sometimes even phone numbers and mailing addresses—on facebook profiles even though students also seemed to be aware that this information was not secure. in a study of fifty carnegie mellon university undergraduate users, govani and pashley concluded that these users “generally feel comfortable sharing their per­ sonal information in a campus environment. participants said they “had nothing to hide” and “they don’t really 26 information technology and libraries | march 200726 information technology and libraries | march 2007 care if other people see their information.”21 a separate study of more than four thousand facebook members at the same institution by gross and acquisti echoed these findings.22 comparing identity elements shared by members of facebook, myspace, friendster, and the university of north carolina directory, stutzman discov­ ered that a significant number of users shared personal information about themselves in online social networks, particularly facebook, which had the highest level of campus participation.23 gross and acquisti provide a list of explanations suggesting why facebook members are so open about sharing personal information online. three explanations that are particularly convincing are that “the perceived benefit of selectively revealing data to strang­ ers may appear larger than the perceived costs of possible privacy invasions”; “relaxed attitudes toward (or lack of interest in) personal privacy”; and “faith in the network­ ing service or trust in its members.”24 in public libraries, concern has primarily centered on teenagers accessing myspace.com, an online social net­ working site much larger than facebook. 
myspace, whose membership, unlike facebook, does not require an .edu e­mail address, has a staggering 43 million users, a num­ ber that continues to rise.25 julian aiken, a reference librar­ ian at the new haven free public library, wrote about the unpopular stance he took when his library decided to ban access to myspace due to the hysterical hype of media reports exposing the dangers from online predators lurking on the site.26 for aiken, the damage of censorship policies in libraries far outweighs the potential risk of sex crimes. furthermore, he suggests that there are even edu­ cational benefits of myspace, observing that “[t]eenagers are using myspace to work on collaborative projects and learn the computer and design skills that are increasingly necessary today.”27 what is apparent is that whether facebook continues to rise in popularity or fizzles out among the college crowd, the next generation of college students, who now constitute the largest percentage of myspace users, are already solidly entrenched and adept at using online social networks. librarians in institutions of higher education might need to consider what implica­ tions the communication style preferences of these future students could have, if any, on library services. while most of the academic attention regarding online social networks has centered on privacy concerns, perhaps the business sector has done a more thorough investiga­ tion of user behavior and students’ growing attraction towards these types of sites. business magazines have naturally focused on the market potential, growth, and fluctuating popularity of various online social networks. advertisers and investors have sought ways to capital­ ize on the exponential growth of these high­traffic sites. business week reported that as of october 2005, facebook .com had 4.2 million members. more than half of those members were between the ages of twelve and twenty­ four.28 while some portended that the site was losing momentum, as of august 2006, membership on facebook had expanded beyond eight million.29 marketing experts have closely studied, apparently more so than com­ munication scholars, the behavior of users in online social networks. in a popular business magazine, hempel and lehman describe user behavior of the “myspace generation”: “although networks are still in their infancy, experts think they’re already creating new forms of social behavior that blur the distinctions between online and real­world interactions.”30 the study of user behavior in online social networks, however, has yet to be addressed in length by those outside the field of marketing. although evidence of interest in online social net­ works is apparent in librarian weblogs and forums (many librarians have created facebook groups for their libraries), actual literature in the field of library and information science is scarce.31 dvorak questions the lack of interest displayed by the academic community toward online social networks as a focus of scholarly research. calling on academics to “get to work,” he argues “aca­ demia, which should be studying these phenomena, is just as out of the loop as anyone over 30.”32 this discon­ nect is also echoed by michael j. 
bugeja, director of the greenlee school of journalism and communication at iowa state university, who writes, "while i'd venture to say that most students on any campus are regular visitors to facebook, many professors and administrators have yet to hear about facebook, let alone evaluate its impact."33 the lack of published research articles on these types of networks, however, is understandable given the newness of the technology.
a few members of the academic community have suggested opportunities for using facebook to communicate with and reach out to students. in a journal specifically geared toward student services in higher education, shier considers the impact of facebook on campus community building.34 although she cannot identify an academic purpose for facebook, she describes how the site can contribute to the academic social life of a campus. facebook provides students with a virtual campus experience, particularly in colleges where students are commuters or are in distance education. shier writes, "as the student's definition of community moves beyond the geographic and physical limitations, facebook.com provides one way for students to find others with common interests, feel as though they are part of a large community, and also find out about others in their classes."35 furthermore, facebook membership extends beyond students to faculty, staff, and alumni. shier cites examples of professors who used facebook to connect or communicate with their students, including the president of the university of iowa and more than one hundred professors at duke university. professors who teach online courses make themselves seem more human or approachable by establishing facebook profiles.36
greeting students on their own turf is exactly the direction staff at washington university's john m. olin library decided to take when they hired web services librarian joy weese moll to communicate and answer questions through a variety of new technologies, including facebook.37 brian mathews, information services librarian at georgia institute of technology, also created a facebook profile in order to "interact with the students in their natural environment."38 mathews decided to experiment with the possibilities of using facebook as an outreach tool to promote library services to 1,700 students in the school of mechanical engineering after he discovered that 1,300 of these students were registered on facebook. advising librarians to become proactive in the use of online social networks, mathews reported that overall, his experience helped him to effectively "expand the goal of promoting the library."39 bill drew was among the first librarians to create an account and profile for his library, the suny morrisville library. as of september 2006, nearly one hundred librarians had created profiles or accounts for their libraries on facebook. one month later, however, the administration at facebook began shutting down library accounts on the grounds that libraries and institutions were not allowed to represent themselves with profiles as though they were individuals. in response, many of these libraries simply created groups for their libraries, which is completely appropriate, similar to creating a profile, and just as searchable as having an account. the authors of this study created the "houston cole library users want answers!" group, which currently has ninety-one members.
library news and information of interest about the library is announced in the group.40 in this study, one trend the authors will try to identify is whether other librarians have considered or are already using facebook in similar ways that moll, mathews, and drew have explored as avenues for com­ municating with students or promoting library services. ■ the survey in february 2006, 244 surveys were mailed to reference or public service librarians (when the identity of those per­ sons could be determined). these individuals were chosen from a random sample of the 850 institutions of higher education classified by the carnegie classification listing of higher education institutions as “master’s colleges and universities (i and ii)” and “doctoral/ research universities (extensive and intensive).”41 the sample size provided a 5.3 percent margin error and a 95 percent confidence level. one hundred twenty­six surveys were completed, providing a response rate of 51 percent. fifteen survey questions (appendix a) were designed to target three areas of inquiry: awareness of facebook, practical impact of the site on library services, and perspectives of librarians toward online social networks. awareness of facebook a series of questions on the survey queried respondents about their awareness and degree of knowledge about facebook. the overwhelming majority of librarians were aware of facebook’s existence. out of 126 librarians, 114 had at least heard of facebook; 24 were not familiar with the site. as one individual wrote, “i had not heard of facebook before your survey came, but i checked and our institution is represented in facebook.” universities registered in facebook are easily located through a search­by­region on facebook’s home page. thirty­eight colleges and universities for alabama (jsu’s location) are registered in facebook. (in comparison, 143 academic institutions in california are listed.) out of those librar­ ians who had heard of the site, 27 were not sure whether their institutions were registered in facebook’s directory. sixty survey participants were aware that their institu­ tions were registered in the directory, while fifteen librar­ ians reported that their universities were not registered (figure 1). several comments at the end of the survey indicated that some of the institutions surveyed did not issue school e­mail accounts, making membership in facebook impossible for their university. interestingly, out of the sixty individuals who could claim that their universities were in the directory, 34 percent have created their own personal facebook accounts and two libraries have individual profiles (figure 2). one individual who established an account on the site wrote, “personally, i’m a little embarrassed by having an account because it’s such a teeny­bopper kind of thing and i’m a little old for it. but it’s an interesting cultural phenomenon and academic librarians need to get on the bandwagon with it, if only to better understand their constituents.” another survey respondent with an individual profile on the site reported a group created by his or her institution on facebook titled “i totally want to have sex in the library.” this individual wanted to make it clear, however, that the students—not the librarians—created this group. 
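the sampling figures reported above can be checked with the standard worst-case margin-of-error formula. this is a minimal sketch, and it assumes the reported 5.3 percent was computed on the 244 mailed surveys (p = 0.5, 95 percent confidence) with a finite-population correction against the 850-institution frame; the article does not spell out the exact formula used.

```python
import math

N = 850   # institutions in the carnegie sampling frame
n = 244   # surveys mailed
z = 1.96  # z-score for a 95 percent confidence level
p = 0.5   # worst-case proportion

standard_error = math.sqrt(p * (1 - p) / n)
fpc = math.sqrt((N - n) / (N - 1))        # finite-population correction
margin_of_error = z * standard_error * fpc

print(f"margin of error: {margin_of_error:.1%}")  # about 5.3%
print(f"response rate: {126 / 244:.0%}")          # about 52%, reported as 51 percent
```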
a particularly help­ ful participant went so far as to poll the reference col­ leagues in all nine of the libraries at his/her institution and found that “only a few had even heard of facebook.” that librarians will become increasingly aware of online social networks was the sentiment expressed by another individual who wrote, “most librarians at my institu­ tion are unaware of social software in general, much less facebook. however, i think this will change in the future as social software is mentioned more often in traditional media (such as television and newspapers).” according to survey responses, it does not appear 28 information technology and libraries | march 200728 information technology and libraries | march 2007 that use of facebook by students has been as noticeable or distracting in other libraries as it has been at hcl. when asked to describe their observation of student use of library computers to access facebook, 56 percent of those surveyed checked “rarely to never.” only 20 percent indicated “most of the time” to “all of the time” (table 1). however, it is important to remember that only sixty individuals could verify that their institutions are regis­ tered on facebook. through comments, some librarians hinted that “snooping” or keeping mental notes of what students view on library computers is frowned upon. it simply is not our business. “we do not regulate or track student use of computers in the library,” wrote one indi­ vidual. several librarians noted that students were using facebook in the libraries, but more so on personal laptops than public­access computers. practical impact of facebook another goal of this study was to find out whether facebook has had any real impact on library services, such as an increase in bandwidth, library traffic, and noise, or in use of public­access computers, scanners, or other equipment. student complaints about monopolization of computers for use of facebook led administrators to block the site from computer labs at jsu. access to facebook on public­access terminals, however, was not regulated. survey responses revealed that facebook has had minimal impact on library services elsewhere. only one library was forced to develop a policy for specifically addressing computer­use concerns as a result of facebook use. one individual mailed the sign posted on every computer terminal in the library, which states, “if you are using a computer for games, chat, or other recreational activity, please limit your usage to thirty minutes. computers are primarily intended for academic use.” another librarian reported that academic computing staff had to shut down access to facebook on library computers due to band­ width and access issues. this individual, however, added, “interestingly, no one has complained to the library staff about its absence!” given a list of possible effects facebook may have had on library services and operations, 10 per­ cent of respondents indicated that facebook has increased patron use of computers. seven percent agreed that it has increased patron traffic, and only 2 percent reported that the site has created bandwidth problems or slowed down internet access. only four individuals received patron complaints about other users “tying up” the computers with facebook (figure 3). since the advent of facebook, the public scanner has become one of the hottest items in hcl. 
figure 1. institutions added to the facebook directory
figure 2. involvement with facebook
table 1. student use of library computers to access facebook (based on observation)
■ never: 23 (32 percent)
■ rarely: 17 (24 percent)
■ some of the time: 17 (24 percent)
■ all the time: 7 (10 percent)
■ most of the time: 7 (10 percent)
librarians at jsu know that use of the scanner has increased tremendously due to facebook because the scanner used by students to upload photos is attached to a public workstation next to the general reference desk. students often ask questions about uploading pictures to their facebook profiles as well as how to edit photos (e.g., resizing and cropping). one survey question asked whether scanner use had increased as a result of facebook. of the sixty-two respondents who answered this question (it was indicated that only those libraries that provide public access to scanners should answer the question), 77 percent reported that scanner use had not increased. furthermore, only two librarians have assisted students with the scanner or provided any other type of assistance, for that matter, with facebook. the assistance the two librarians gave included scanning photographs, editing photos, uploading photos to facebook profiles, and creating accounts. however, in a separate question, 21 percent of participants agreed that librarians should be responsible for helping students, when needed, with questions about facebook. no librarian has added additional equipment such as computers or scanners as a result of facebook. only one individual reported future plans by his/her library to add additional equipment as a result of heavy use of the site.
perspectives toward facebook
one of the main goals of the study was to obtain a snapshot of the perspectives and attitudes of librarians toward facebook and online social networks in general. most of the librarians surveyed were neither enthusiastic nor disdainful of facebook. a small group of the respondents, however, when given the chance to comment, were extremely positive and excited about the possibilities of online social networking. twenty-one individuals saw no connection between libraries and facebook. sixty-seven librarians were in agreement that computer use for academic purposes should take priority, when needed, over use of facebook. however, fifty-one respondents indicated that librarians needed to keep up with internet trends, such as facebook, even when such trends are not academic in nature (table 2). out of 126 librarians who completed the survey, only 23 reported that facebook has generated discussion among library faculty and staff about online social networks. on the other hand, few individuals voiced negative opinions toward facebook. only 5 percent of those surveyed indicated that facebook annoyed faculty and staff. one individual wrote, "i don't like facebook or most social networking services. they encourage the formation of cliques and keep users from meeting and accepting those who are different than themselves." comments like this, however, were rare. although the majority of librarians seemed fairly apathetic toward facebook, few individuals expressed negative comments toward the site. few librarians indicated that facebook should be addressed or regulated in library policy. most individuals viewed the site as just another communication tool similar to instant messaging or cell phones.
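note that the percentages in table 1 above are not computed against all 126 respondents; judging by the counts in the table, only the 71 librarians who answered the observation question form the base. a small sketch that rederives the reported figures from those counts:

```python
# counts reported in table 1 (observed student use of library computers for facebook)
observations = {
    "never": 23,
    "rarely": 17,
    "some of the time": 17,
    "all the time": 7,
    "most of the time": 7,
}

answered = sum(observations.values())  # 71 librarians answered this question
for response, count in observations.items():
    print(f"{response}: {count} ({count / answered:.0%})")

# the combined figures cited in the text
print(f"rarely to never: {(23 + 17) / answered:.0%}")        # about 56%
print(f"most to all of the time: {(7 + 7) / answered:.0%}")  # about 20%
```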
table 2. access, assistance, and awareness of facebook and similar trends: perspectives (respondents were allowed to check any or all responses that applied)
■ computer use for academic purposes should take priority, when needed, over use of facebook: 67 (53 percent)
■ librarians need to "keep up" with internet trends, such as facebook, even when these trends are not academic in nature: 51 (40 percent)
■ library resources should not be monopolized with use of facebook: 35 (28 percent)
■ librarians should help students, when able, with questions regarding facebook: 27 (21 percent)
■ there is no connection between libraries and facebook: 21 (17 percent)
■ student use of facebook on library computers should not be regulated: 15 (12 percent)
■ library computers should be available for access to facebook, but librarians should not feel that it is their responsibility to assist students with questions regarding the site: 11 (9 percent)
figure 3. patron complaints about facebook
in fact, while most librarians did not express much interest in facebook, many were quite vocal about not regulating its use. the following comment by one survey participant captures this sentiment: "attempts to restrict use of facebook in the library would be futile, in my opinion, in the same way it is now impossible to ban use of usb drives and aim in academic libraries." while most individuals agreed that academic use of computers should take priority over recreational use, a polite request that a patron using facebook allow another student to use the computer for academic purposes, when necessary, appears preferable to the creation and enforcement of strict policies. as one librarian put it, "i don't want students to see the library as a place where they are 'policed' unnecessarily."
when asked if facebook serves any academic purpose, 54 percent of those surveyed indicated that it does not, while 34 percent were "not sure." twelve percent of the librarians identified academic potential or possible benefits of the site (figure 4). the authors were surprised to find that 46 percent of those surveyed were not completely willing to dismiss facebook as pure recreation. some librarians found facebook to be a distraction to academics: "maybe i'm old fashioned, but when do students find time for this kind of thing? i wonder about the impact of distractions like this on academic pursuits. there's still only twenty-four hours in a day." another individual asked two students who were using facebook in the library what they thought of the site and they admitted that it was "frequently a distraction from academic work." for the 34 percent who were not sure whether facebook has any academic value, there were comments such as "i am continuing to observe and will decide in the future." academic uses for facebook included suggestions that it be used as a communication tool for student collaboration in classes (facebook allows students to search for other students by course and section number). one individual suggested it could be used as an "online study hall," but then wondered if this might lead to plagiarism. some thought instructors could somehow use facebook for conducting online discussion forums, with one participant observing "it's 'cooler' than using blackboard." "building rapport" with students through a communication medium that many students are comfortable with was another benefit mentioned.
respondents who were enthusiastic about facebook thought it most beneficial as a virtual extension of the campus. facebook could potentially fill a void where face-to-face connections are absent in online and distance-education classes. several librarians suggested that facebook has had a positive influence in fostering collegiate bonds and school spirit. as one individual wrote, "[t]he academic environment is not only responsible for scholarly growth, but personal growth as well. this is just one method for students to interact in our highly technological society." facebook could provide students who are not physically on campus with a means to connect with other students at their institutions who have similar academic and social interests.
some librarians were so enthusiastic about facebook that they suggested libraries use the site to promote their services. using the site to advertise library events and creating online library study groups and book clubs for students were some of the ideas expressed. one librarian wrote: "facebook (and other social networking sites) can be a way for libraries to market themselves. i haven't seen students using facebook in an academic manner, but there was a time when librarians frowned on e-mail and aim too. if it becomes a part of students' lives, we need to welcome it. it's part of welcoming them, too." more librarians, however, felt that facebook should serve as a space exclusively for students and that librarians, professors, administrators, police, and other uninvited folks should keep out. furthermore, as one individual noted, it is not "an appropriate venue" for librarians to promote their services.
while the review of literature demonstrates that much has been made of online social networks and privacy issues, the librarians surveyed were not particularly concerned about privacy. only 19 percent indicated that they were concerned about privacy issues related to facebook. however, some librarians voiced concerns that many students are ignorant about the risks of posting personal information and photographs on facebook and do not seem fully aware of the possibility that individuals outside their social sphere might also have reason to access the site. one individual mentioned that the librarians at her institution have begun to emphasize this to students during library instruction sessions on internet research and evaluation.
figure 4. finds conceivable academic value in facebook
■ limitations
several limitations to this study must be noted when attempting to reach any type of conclusion. participants who had never heard of facebook obviously could not answer any questions except that they were not familiar with the site. some questions required respondents to "guesstimate." unless librarians have access to their institution's internet usage statistics, it would be hard for them to really know how much bandwidth is being used by students accessing facebook. librarians, having been trained in a profession that places a high value on freedom of access, might also be wary of activities that suggest any type of censorship. therefore, it is conceivable that some of the librarians surveyed do not know whether students are using facebook in the library because they make a point not to snoop or make note of individual web sites that students view.
■ discussion while online education is growing at a rapid rate across the united states, so is the presence of virtual academic social communities. although facebook might prove to be a passing fad, it is one of the earliest and largest online social networking communities geared specifically for students in higher education. it represents a new form of communication that connects students socially in an online environment. if online academics have evolved and continue to do so, then it is only natural that online academic social environments, such as facebook, will continue to evolve as well. while traditionally considered the heart of the campus, one is left to ponder the library’s presence in online academic social networks. what role the library will serve in these environments might largely depend on whether librarians are proactive and experi­ mental with this type of technology or whether they simply dismiss it as pure recreation. emerging technolo­ gies for communication should provoke, at the very least, an interest in and knowledge of their presence among library and information science professionals. this survey found that librarians were overwhelmingly aware of and moderately knowledgeable about facebook. some librarians were interested in and fascinated with facebook, but preferred to study it as outsiders. others had adopted the technology, but more for the purpose, it would seem, of having a better understanding of today’s students and why facebook (and other online social net­ working sites) appeals to so many of them. it is apparent from this study that there is a fine line between what now constitutes “academic” activity and “recreational” activity in the library. sites like facebook seem to blur this line fur­ ther and librarians do not seem eager or find it necessary to distinguish between the two unless absolutely pressed (e.g., asking a student to sign out of facebook when other patrons are waiting to use computers for academic work). one area of attention this study points to is a lack of con­ cern among librarians toward the internet and privacy issues. some individuals surveyed suggested that librari­ ans play a larger role in making students aware that people outside their society of friends—namely, administrative or authority figures—have the ability to access the informa­ tion they post online to social networks. participants were most enthusiastic about facebook’s role as a space where students in the same institution can connect and share a common collegiate bond. librarians who have not yet “checked out” facebook might consider one individual’s description of the site as “just another ver­ sion of the college yearbook that has become interactive.”42 among the most cherished books in hcl that document campus life at jsu are the mimosa yearbooks. alumni and students regularly flip through this treasure trove of pho­ tographs and memories. no administrator or librarian would dare weed this collection or find its presence irrele­ vant. while year books archive campus yesteryears, online social networks are dynamically documenting the here and now of campus life and shaping the future of how we communicate. as casey writes, “libraries are in the habit of providing the same services and the same programs to the same groups. 
we grow comfortable with our provision and we fail to change."42 by exploring popular new types of internet services such as facebook instead of quickly dismissing them as irrelevant to librarianship, we might learn new ways to reach out and communicate better with a larger segment of our users.
■ acknowledgements
the authors would like to acknowledge stephanie m. purcell, student worker at the houston cole library, for her excellent editing suggestions and insight into online social networks from the student's point of view, and john-bauer graham, head of public services at the houston cole library, for his encouragement.
references and notes
1. angela reid, "finally . . . the facebook," the chanticleer, sept. 22, 2005, 4.
2. facebook.com, http://www.facebook.com/about.php (accessed dec. 2, 2005).
3. angus loten, "the great communicator," inc.com, june 6, 2006, http://www.inc.com/30under30/zuckerberg.html (accessed dec. 4, 2005).
4. adam lashinsky, "facebook stares down success," fortune, nov. 28, 2005, 4.
5. michael arrington, "85 percent of college students use facebook," techcrunch (sept. 7, 2005), http://www.techcrunch.com/2005/09/07/85-of-college-students-use-facebook (accessed dec. 2, 2005).
6. http://www.facebook.com/about.php.
7. facebook us! if you are a registered member of facebook, do a global search for "laurie charnigo" or "paula barnett-ellis."
8. stephen downes, "semantic networks and social networks," the learning organization 12, no. 5 (2005): 411.
9. ibid.
10. tim o'reilly, "what is web 2.0?" http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html (accessed aug. 6, 2006).
11. http://www.facebook.com/about.php.
12. angela provitera mcglynn, "teaching millennials, our newest cultural cohort," the education digest 71, no. 4 (2005): 13.
13. laura garton, caroline haythornthwaite, and barry wellman, "studying online social networks," journal of computer mediated communication 31, no. 4 (1997).
14. facebook.com's "about" page archives a collection of college newspaper articles about facebook: http://www.facebook.com/about.php (accessed dec. 4, 2005).
15. brock read, "think before you share," the chronicle of higher education, jan. 20, 2006, a38–a41.
16. ibid., a41.
17. ibid., a40.
18. shawn mcguirk, "facebook on campus: understanding the issues," magna web seminar presented live on june 14, 2006. transcripts available for a fee from magna pubs: http://www.magnapubs.com/catalog/cds/598755-1.html (accessed aug. 2, 2006).
19. tracy mitrano, "thoughts on facebook" (apr. 2006), cornell university office of information technologies, http://www.cit.cornell.edu/oit/policy/memos/facebook.html (accessed june 22, 2006).
20. ibid., "conclusion."
21. tabreez govani and harriet pashley, "student awareness of the privacy implications when using facebook," unpublished paper presented at the "privacy poster fair" at the carnegie mellon university school of library and information science, dec. 14, 2005, 9, http://lorrie.cranor.org/courses/fa05/tubzhlp.pdf (accessed jan. 15, 2006).
22. ralph gross and alessandro acquisti, "information revelation and privacy in online social networks," paper presentation at the acm workshop on privacy in the electronic society, alexandria, va., nov. 7, 2005, 79, http://portal.acm.org/citation.cfm?id=1102214 (accessed nov. 30, 2005).
23. frederic stutzman, "an evaluation of identity-sharing behavior in social network communities," paper presentation at the idmaa and ims code conference, oxford, ohio, april 6–8, 2006, 3–6, http://www.ibiblio.org/fred/pubs/stutzman_pub4.pdf (accessed may 23, 2006).
24. gross and acquisti, "information revelation and privacy in online social networks," 73.
25. "myspace: design anarchy that works," business week, jan. 2, 2006, 16.
26. julian aiken, "hands off myspace," american libraries 37, no. 7 (2006): 33.
27. ibid.
28. jessi hempel and paula lehman, "the myspace generation," business week, dec. 12, 2005, 94.
29. http://www.facebook.com/about.php.
30. hempel and lehman, "the myspace generation," 87.
31. the authors created the "librarians and facebook" group on facebook to discuss issues concerning facebook and librarianship, such as censorship issues, policies, and ideas for connecting with students through facebook. this is a global group. if you have a facebook account, we invite you to do a search for "librarians and facebook" and join our group.
32. john c. dvorak, "academics get to work!" pc magazine online, http://www.pcmag.com/article2/0,1895,1928970,00.asp (accessed feb. 21, 2006).
33. michael j. bugeja, "facing the facebook," the chronicle of higher education, jan. 27, 2006, c1–c4; ibid.
34. maria tess shier, "the way technology changes how we do what we do," new directions for student services 112 (winter 2005): 83–84.
35. ibid., 84.
36. shier, "the way technology changes how we do what we do," 112; j. duboff, "'poke' your prof: faculty discovers thefacebook.com," yale daily news, mar. 24, 2005, http://www.yaledailynews.com/article.asp?aid=28845 (accessed jan. 15, 2006); mingyang liu, "would you friend your professor?" duke chronicle online, feb. 25, 2005, http://www.dukechronicle.com/media/paper884/news/2005/02/25/news/would.you.friend.your.professors-1472440.shtml?norewrite&sourcedomain=www.dukechronicle.com (accessed jan. 15, 2006).
37. brittany farb, "students can 'check out' new librarian on the facebook," student life (washington univ. in st. louis), feb. 27, 2006, http://www.studlife.com/home/index.cfm?event=displayarticle&ustory_id=5914a90d-53b (accessed feb. 27, 2006).
38. brian s. mathews, "do you facebook? networking with students online," college & research libraries news 37, no. 5 (2006): 306.
39. ibid., 307.
40. view the "houston cole library users want answers!" group by doing a search for the group title on facebook.
41. nces compare academic libraries, http://nces.ed.gov/surveys/libraries/compare/peervariable.asp (accessed dec. 2, 2005). the random sample was chosen using the research randomizer available online, http://www.randomizer.org/form.htm (accessed dec. 2, 2005).
42. michael e. casey and laura c. savastinuk, "library 2.0," library journal 131, no. 14 (2006): 40.
appendix a: survey on the impact of facebook on academic libraries
1. has your institution been added to the facebook directory?
□ yes
□ no (skip to questions 10, 11, and 12)
□ not sure (skip to questions 10, 11, and 12)
□ i am not familiar with facebook (skip all questions and submit)
2. which best describes your involvement with facebook?
□ i have a personal account
□ my library has an account
□ no involvement
3. which best describes your observation of student use of library computers to access facebook?
□ all the time
□ most of the time
□ some of the time
□ rarely
□ never
4. has your library added additional equipment such as computers or scanners as a result of facebook use?
□ yes
□ no
□ no, but we plan to in the future
5. have patrons complained about other patrons using library computers for facebook?
□ yes
□ no
□ not sure
6. has your library had to develop a policy or had to address computer use concerns as a result of facebook use?
□ yes
□ no
□ not sure
7. if your library provides public access to a scanner, has patron use of scanners increased due to the use of facebook?
□ yes
□ no
8. have you assisted students with the library's scanner for facebook?
□ yes
□ no
9. if you have provided assistance to students with facebook, please check all that apply:
□ creating accounts
□ scanning photographs or offering advice on where students can access a scanner
□ editing photographs (e.g., resizing photos or use of a photo editor)
□ uploading photographs to facebook profiles
□ other __________________________________
10. check the responses that best describe your opinion about the responsibilities of librarians in assisting students with facebook questions and access to the web site:
□ student use of facebook on library computers should not be regulated.
□ library resources should not be monopolized with facebook use.
□ computer use for academic purposes should take priority, when needed, over use of facebook.
□ librarians should help students, when able, with facebook questions.
□ librarians need to "keep up" with internet trends, such as facebook, even if they are not academic in nature.
□ there is no connection between librarians, libraries, and facebook.
□ library computers should be available for facebook use, but librarians should not feel that they need to assist students with facebook questions.
11. would you consider facebook to be a relevant academic endeavor?
□ yes
□ no
□ not sure
12. if you answered "yes" to question 11, please describe how facebook could be considered an academic endeavor.
______________________________________________
13. please check all answers that best describe what effect, if any, use of facebook in the library has had on library services and operations:
□ has increased patron traffic
□ has increased patron use of computers
□ has created computer access problems for patrons
□ has created bandwidth problems or slowed down internet access
□ has generated complaints from other patrons
□ annoys library faculty and staff
□ interests library faculty and staff
□ has generated discussion among library faculty and staff about facebook
14. is privacy a concern you have about students using facebook in the library?
□ yes
□ no
□ not sure
please list any observations, concerns, or opinions you have regarding facebook use in libraries.
editorial continued from page 3
extracted the paragraphs from my palm to my desktop, and saved that document and the tocs on a universal serial bus (usb) key. today, i combined them in a new document on my laptop and keyed the remaining paragraphs in my room at an inn on a pier jutting into commencement bay in tacoma on southern puget sound. i sought inspiration from the view out my window of the water and the fall color, from old crow medicine show on my ipod, and from early sixties beyond the fringe skits on my treo.
fred kilgour was committed to delivering information to users when and where they wanted it. libraries must solve that challenge today, and i am confident that we shall.
metasearching and beyond: implementation experiences and advice from an academic library
gail herrera (gherrera@olemiss.edu) is assistant dean for technical services & automation and associate professor at the university of mississippi.
in march 2003 the university of mississippi libraries made our metasearch tool publicly available. after a year of working with this product and integrating it into the library web site, a wide variety of libraries interested in our implementation process and experiences began to call. libraries interested in this product have included consortia, public, and academic libraries in the united states, mexico, and europe. this article was written in an effort to share the recommendations and concerns given. much of the advice is general and could be applied to many of the metasearch tools available. google scholar and other open web initiatives that could impact the future of metasearching are also discussed.
many libraries are looking for ways to facilitate the discovery process for users. implementing a one-stop search product that does not require database-specific knowledge is one of the paths libraries are choosing.1 as these search engines are made available to patrons, the burden of design falls to the library as well as to the product developers. most library users may be familiar with a few databases, but the vast majority of electronic resources remain unrevealed. using a metasearch product, a single search is broadcast out to similar and divergent electronic resources, and search results are returned and typically mixed together. metasearch results are returned in real time and link the user to the native interface. although there are many products that support one-stop searching, the university of mississippi libraries chose to purchase innovative interfaces' metafind product because it tied into a digital initiative partnership with innovative. some of the types of resources you can search include:
■ library catalogs
■ licensed databases
■ locally created databases
■ full text from journals and newspapers
■ digital collections
■ selected web sites
internet search engines
the simplicity of google searching is very appealing to users. in fact, users have come to expect this kind of empowering tool. at the university of mississippi, students use and have been using google for research. as google scholar went public, it became evident that university faculty also use it for the same reasons. it was apparent from the university of mississippi libraries' 2003 libqual+ survey results that users would like more personal control than the library was offering (table 1). unintentionally elaborate mazes are created and users become lost in a quagmire of choices.
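the broadcast-and-merge model described above can be sketched in a few lines. the code below is illustrative only and assumes nothing about innovative interfaces' metafind internals: the connector functions are invented stand-ins for vendor search interfaces, the query is sent to all of them in parallel, and the merged hits keep a source label so a user could link back to the native interface.

```python
from concurrent.futures import ThreadPoolExecutor

# invented connectors; a real metasearch tool would call each vendor's search interface
def search_catalog(query):
    return [{"source": "library catalog", "title": f"record matching '{query}'"}]

def search_fulltext_database(query):
    return [{"source": "full-text database", "title": f"article matching '{query}'"}]

def search_digital_collections(query):
    return [{"source": "digital collections", "title": f"item matching '{query}'"}]

CONNECTORS = [search_catalog, search_fulltext_database, search_digital_collections]

def metasearch(query):
    """broadcast one query to every connector and merge whatever comes back."""
    with ThreadPoolExecutor(max_workers=len(CONNECTORS)) as pool:
        result_sets = pool.map(lambda connector: connector(query), CONNECTORS)
    return [hit for results in result_sets for hit in results]

for hit in metasearch("james meredith"):
    print(hit["source"], "->", hit["title"])
```

the parallel broadcast is why a single box can reveal databases the user never thought to open, and also why bandwidth and response-time issues surface at the library rather than at any one vendor.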
as indicated by our libqual+ survey results, our users want easy-to-use tools that allow them to find information on their own, and they want information to be easily accessible for independent use. these are clearly two areas that many libraries are struggling to improve for their patrons. the question is how to go about it. based on several changes made between 2003 and 2005, which included implementing a metasearch tool, the adequacy mean improved for both questions and for undergraduates as well as graduate students and faculty (table 2). the adequacy mean compares the minimum level of service that a user expects with the level of service that they perceive. in table 1, the negative adequacy mean figures indicate that the library was not meeting users' minimum level of service for these two questions or that the perceived level of service was lower than the minimal level of service. table 2 compares the adequacy mean from 2005 with 2003 and indicates a notable, positive change in adequacy mean for each question and with each group.
■ design perspectives and tension
generally, there are conflicts within libraries regarding the question of how to improve access for patrons and allow for independent discovery. for those leading a metasearch implementation, these tensions are important to understand. in implementing new technologies, there are key development issues that may decrease internal acceptance until they are addressed. however, one may also find that there are some underlying fears regarding this technology. although the following cross-subculture comparisons simply do not do justice to each of the valid perspectives, these brief descriptions highlight the types of perspectives one might encounter when considering or implementing a metasearch product.
expert searchers prefer native interfaces and all of the functionalities of the native interface. they are typically unhappy with the "dumbed-down" or clunky searching of a metasearch utility. they would prefer for patrons to be taught the ins and outs of the database they should be using for their research. this presupposes that the students either know which database to use, will spend time investigating each database on their own, or that they will ask for assistance. however, there are clearly native interface functionalities—such as limiting to full text—that, while wonderful to patrons, are not consistent across resources or a part of the metasearch standard. users would certainly benefit if limiting to full text were ubiquitous among vendors and if there were some way to determine full-text availability within metasearch tools. results ranking is another issue that expert searchers may bring to the table. currently, there is a niso metasearch initiative that is striving to standardize metasearching.2 another downside for the expert searcher is that there is no browse function.
those who are in administrative or managerial positions working with electronic resources see metasearching as an opportunity to reveal these resources to users who might not otherwise discover them.
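for readers unfamiliar with libqual+ arithmetic, the adequacy mean is simply the average perceived-service rating minus the average minimum-acceptable rating, so a negative value means the library is falling below users' minimum expectations. the sketch below uses invented 1-to-9 ratings, since the article reports only the resulting means.

```python
# invented 1-to-9 libqual+ ratings for one survey question
minimum_expected = [6, 7, 5, 8, 6]   # lowest service level each user would accept
perceived        = [6, 6, 5, 7, 7]   # service level each user felt was delivered

def adequacy_mean(minimum, perceived):
    """mean perceived rating minus mean minimum-acceptable rating."""
    gaps = [p - m for m, p in zip(minimum, perceived)]
    return sum(gaps) / len(gaps)

print(f"adequacy mean: {adequacy_mean(minimum_expected, perceived):+.2f}")
# a negative result (as in table 1 for 2003) means the minimum expectation was not met
```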
for example, many users have learned to search ebsco’s academic search premier not realizing that key articles on a local civil rights figure such as james meredith are also available in america: history & life, jstor, and lexisnexis. metasearching removes the need for the user to spend additional time choosing databases that seem relevant and searching them indi­ vidually. from a financial perspective, if a library is pay­ ing for these electronic resources, they should be using them as much as possible. and while the university of mississippi libraries generally target the undergraduate audience with our metasearch tool, the james meredith search is a good example of how a metasearch tool might reveal other databases with information that a serious researcher could then further investigate by link­ ing through the citation to the native interface. those associated with library instruction may also be uncomfortable with metasearching. in fact within a short time of implementing the product, several instructors conveyed their fear that in making searching so simple, they would no longer have a job as the product developed. generally, it seems that users are always in need of instruc­ tion although the type of instruction and the tools continue to change. it is an understandable fear and one that would be wise to acknowledge for those embarking on a metasearch implementation. while metasearch can be an empowering tool for users, you may also encounter some emotional reactions among library employees. from an information literacy point of view, frost has noted that metasearching is “a step backward” and “a way of avoiding the learning process.”3 it is true that in providing an easy search tool, the library is not endeavoring to teach all students intermedi­ ate or advanced information retrieval knowledge or skills. however, it is important to provide tools that meet users at their level of expertise and as previously noted, this is an area identified in need of improvement. for those working at public service points such as the reference desk, metasearching is an adjustment. many times those working with patrons tend to use databases with which they are more familiar or in which they feel more confident. federated search tools may reveal resources that are typically less used and therefore unfa­ miliar to library employees. training may then become an issue worthy of addressing not just for the metasearch interface and design but also for the less­used resources. for those involved in technical support, this product may range from exciting to exasperating. the amount of time your technical support personnel have to dedicate to your metasearch project should be a major factor when investigating the available products. just like any other technological investment, you are either going to (1) purchase the technology and outsource manage­ ment or (2) obtain a lesser price from a vendor for the tool and invest in developing it yourself. there is also a middle ground, but this cost­shifting is important to keep in mind. regardless of your approach, it is critical to include the technical support person on your imple­ mentation team and to keep in mind the kind of time investment that is available when reviewing prices. along with developing this product, one may also find oneself investing additional time and money into infra­ structural upgrades such as the proxy server, network equipment, or dns servers. 
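the cost-shifting just described can be made concrete with a rough comparison of total cost under the two approaches. the dollar figures and staff-hour estimates below are invented for illustration; only the buy-versus-build framing comes from the discussion above.

def total_cost(license_per_year, staff_hours_per_year, hourly_rate, years=3):
    # simple three-year total: subscription plus local staff time
    return years * (license_per_year + staff_hours_per_year * hourly_rate)

# hypothetical figures: a pricier vendor-managed product needing little local
# work versus a cheaper license that shifts development time onto local staff
vendor_managed = total_cost(license_per_year=25000, staff_hours_per_year=100, hourly_rate=30)
self_developed = total_cost(license_per_year=12000, staff_hours_per_year=600, hourly_rate=30)

print(f"vendor-managed, 3 years:  ${vendor_managed:,.0f}")
print(f"self-developed, 3 years:  ${self_developed:,.0f}")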
table 1. 2003 libqual adequacy mean (undergrad / grad / faculty)
  easy-to-use access tools that allow me to find things on my own: -.10 / -.30 / -.29
  making information easily accessible for independent use: .37 / -.09 / .03

table 2. positive change in libqual adequacy mean from 2003 to 2005 (undergrad / grad / faculty)
  easy-to-use access tools that allow me to find things on my own: .53 / .46 / .24
  making information easily accessible for independent use: .22 / .22 / .45

in addition to these perspectives, there is a general tension in library web site design philosophies between how librarians would like patrons to use their services and what patrons want. the traditional design based on educating users and having users navigate to information "our way" has definitely been curtailed over the past several years, with attention being paid increasingly to usability. as usability studies give librarians increasing information, libraries are moving toward designing for our users based on their approaches and needs rather than how librarians would have them work. depending on where one's library is in this spectrum of design philosophy, implementing a metasearch tool may be harder or easier. judy luther surmised the situation well, "for many searchers, the quality of the results matter less than the process—they just expect the process to be quick and easy."4 moving toward this lofty goal is to some extent dictated by the abilities and inabilities of the technologies chosen. as a technologist, the general rule seems to be that the easier navigation is made for our users, the more complex the technical structure becomes.

n metasearch categories

in arranging categories of searches for a metasearch product, some libraries group their electronic resources by subject, and others use categories that reflect full-text availability. the university of mississippi libraries use both. the most commonly used category is our full-text category. this full-text category was set as the default on our most popular search box located on our articles and databases web page (figure 1). since limiting to full-text materials is not a standard, the category was defined by the percentage of full text each resource contains. this is an important distinction to understand because a user may receive results that are not full-text, but the majority of results will likely be full-text. at our library, if the resource contains more than 50 percent full-text, it is included in the full-text category. other categories included in this implementation are ready reference, library catalogs, digital collections, limited resources, publicly available databases, and broad subject categories. one electronic resource may be included in the full-text category, a broad subject category such as "arts and humanities," and also have its own individual category in order to mix and match individual resources on subject guides using a tailor-made search box. the limited resource category contains resources that should be searchable using the metasearch tool but that have a limited number of simultaneous users. if such a resource were included in the heavily used default full-text category, it would be tied up too much. investigating resources with only one or two simultaneous users at the beginning of the project may help you avoid error messages and user frustration. a brief sketch of this category logic is shown below.
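the resource names and full-text percentages in the sketch are hypothetical; the one rule taken directly from this implementation is that a resource joins the default full-text category when more than 50 percent of its content is full text, while free and seat-limited resources are kept in their own categories and a resource may also appear in one or more subject categories.

resources = {
    "academic search premier": {"full_text_pct": 80, "subjects": ["social science"]},
    "jstor": {"full_text_pct": 100, "subjects": ["arts and humanities"]},
    "pubmed": {"full_text_pct": 10, "subjects": ["science and engineering"], "free": True},
    "small society index": {"full_text_pct": 60, "subjects": ["social science"],
                            "simultaneous_users": 2},
}

def build_categories(resources):
    categories = {"full text": [], "publicly available": [], "limited resources": []}
    for name, info in resources.items():
        if info.get("free"):
            # free resources return fast and would crowd the top of the result
            # list, so they get their own category instead of the default
            categories["publicly available"].append(name)
        elif info.get("simultaneous_users"):
            # resources with one or two seats stay out of the default full-text
            # category so every search does not tie them up
            categories["limited resources"].append(name)
        elif info["full_text_pct"] > 50:
            categories["full text"].append(name)
        for subject in info["subjects"]:
            categories.setdefault(subject, []).append(name)
    return categories

for category, members in build_categories(resources).items():
    print(f"{category}: {', '.join(members)}")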
one might wonder, “why profile limited resources then?” there may be specific search boxes on subject guides where librarians decide to add that individual but limited resource. it might also be necessary to shorten the time­out period for limited user resources. along those same lines, having pay­per­search resources profiled could also be expensive and is not recommended. since the initial implementation, migrating away from per­ search resources has become a priority. within the first few months of implementation, the free resources such as pubmed and askeric were moved to a new “publicly available” category. the reason is that since there is not any authentication involved, these results return very quickly and are always the first results a user sees. while they are important resources, our intent was really to reveal our subscription resources. this approach allows users to search these resources if specifically chosen but they are not included in the default full­text category. this approach does still allow subject librarians to mix and match these free individual resources on subject guide search boxes. n response time of all of the issues with our metasearch tool, response time has been the most challenging. there are so many issues when it comes to tracking down sluggish response that it can be extremely difficult to know where to start. if one’s metasearch software is not locally hosted, response time could involve the library network, campus network, off­campus network provider, and the vendor’s network, not to mention the networks of all the electronic resources users are searching. when one adds the other variable of authentication, the picture becomes even more over­ whelming and difficult to troubleshoot. for authentication, the university of mississippi libraries purchased innovative’s web access management module (wam), which is based on the figure 1. metasearch tailored search box with full text category selected metasearching and beyond | herrera 47 ezproxy software. as the use of our electronic resources from on­campus and off­campus has grown, the inci­ dence of increasing network issues has risen. in work­ ing with our campus telecommunications group, the pursuit of ever­greater bandwidth has become a priority. troubleshooting has included tracking down trouble­ some switch settings, firewall settings, as well as campus dns and vendor dns issues. if your network adminis­ trators use packet shapers, this may be another hurdle. clearly, our metasearch product has placed a significant load increase on the proxy server. in looking at proxy statistics, 24 percent of total proxy hits were from the metasearch product (figure 2). with this in mind, one may find the load on one’s proxy server increasing very dramatically during peak usage and may need to plan for upgrades accordingly. even with improvements and tweaks along the way, response time is still an issue and one of the highest hurdles in selling a metasearch product internally and externally. one metasearch statistical module includes response time information for individual resources along with usage data. the response time information would be very helpful in troubleshooting and in working with electronic resource vendors. usage tracking is another criterion to consider in reviewing metasearch products. n response time and tailored search boxes during implementation, one of the first discussions to have is who will be the target audience for this product. 
at this institution, undergraduates were the target audi­ ence and more specifically, those looking for three to five articles for a paper. while our metasearch software has a master screen showing all of the resources divided into the main categories, facing users with over sixty check boxes was not a good solution (figure 3). this master screen is good for demonstrating categories to library staff, overall functionality of the technology, and also for quickly checking all of your resources for connectivity errors. from early conversations with students, keeping basic users far away from this busy screen is a good goal. remember, the purpose is to give them an easy starting point. the best way to keep users in a simple search box is to construct search boxes and hand­pick either individual resources or categories keep­ ing in mind the context of the web page. for example, the articles and databases page has a simple search box that searches for articles. subject guide boxes search individual electronic resources selected by the subject librarian. the university of mississippi libraries also have a large col­ lection from the american institute of certified public accountants (aicpa). the search box on that page searches our catalog, which contains aicpa books along with the aicpa digital collection. some libraries are interested in developing a standard metasearch box to display as a widget or standing content area throughout their web site. this is interesting and worth considering. however, matching the web page content with appropri­ ate resources has been our approach. as the standards and technology develop, this may be worth further con­ sideration depending on usability findings. for the most commonly used search box on the articles and databases page (figure 1), the default category checked is the full­ text articles category. donna fyer stated that, “for the average end user, the less decision making, the better.”5 this certainly rings true for our users. originally, a simple metasearch search box was placed on the library homepage. the library catalog and the basic metasearch box were both displayed. this seemed confusing for users since both products have search capabilities. with the next web site redesign, the basic metasearch box moved from the library homepage to the articles and journals web page. this was a success­ ful place for the article quick search box to reside since the default was set to search the full­text category. there were some concerns that users might be typing journal titles into the search box but these were rare instances and not necessarily inappropriate uses. the next rede­ sign eventually moved this search box to the articles and databases page, where it remains. for the articles and databases pages, the simple search box (figure 1) by default searches the full­text category and searches the title keyword index. the index category with the label, “article citations,” can also be checked by the user. the majority of metasearches begin with this search box and figure 2. total proxy hits vs. metafind proxy hits 4� information technology and libraries | june 2007 most users do not change the default settings for the resources or the index. n subject guide search boxes in addition to the “article quick search” box, subject librarians slowly became interested in a search box for their subject guides as the possibili­ ties were demonstrated. 
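a tailored search box of the kind subject librarians asked for can be sketched as a small html-generating function. the form field names and target url below are placeholders and not metafind's actual parameters; the point is simply that each box hand-picks resources, marks some as checked by default, and sets a default index.

def render_search_box(action_url, resources, default_index="title keyword"):
    """resources is a list of (resource_id, label, checked_by_default)."""
    lines = [f'<form action="{action_url}" method="get">',
             '  <input type="text" name="query" size="40">',
             f'  <input type="hidden" name="index" value="{default_index}">']
    for resource_id, label, checked in resources:
        checked_attr = " checked" if checked else ""
        lines.append(f'  <label><input type="checkbox" name="resource" '
                     f'value="{resource_id}"{checked_attr}> {label}</label>')
    lines.append('  <input type="submit" value="search">')
    lines.append('</form>')
    return "\n".join(lines)

# hypothetical subject-guide box: two resources, one checked by default
guide_box = render_search_box(
    "/metasearch",  # placeholder url, not the product's real endpoint
    [("res_a", "database a", True),
     ("res_b", "database b", False)])
print(guide_box)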
in order to do this, the ven­ dor was asked to profile each resource with its own unique value in order to mix and match individual resources. while the idea of searching resources by subject category sounds useful and appealing, sometimes universal design begets universal dis­ cord. even with a steering committee involved, it is hard for everyone to agree what resources should be in each of the main subject categories: arts and humanities, science and engineering, business and economics, and social science. some libraries have put a lot of time and effort into creating a large number of subject categories. the master search screen (figure 3) displays several of this library’s categories but not the broad subject categories noted above. these general sub­ ject categories are brought out in the multipurpose interface called the “library search engine” (figure 4). the library search engine design is a collection of the categories and resources showing the full functionality of our metasearch tool. the subject categorization approach within our metasearch interface is a good way to show the multifunction­ ality of the product but remains relatively unused by patrons. by giving each resource its own value, subject librarians have the flexibility to select spe­ cific resources and/or categories for their subject guides. it is worth noting that it required additional setup from our vendor and was not part of the original implementation. after a few months of testing with the initial implemen­ tation, willing subject librarians chose individual resources for their tailored search boxes. once a simple search box has been constructed, it can be easily copied with minor modi­ fications to make search boxes for those requesting them. while progress was slow to add these boxes to subject guides, after about a year there was growing interest. in setting these up, subject librarians have several choices to make. first of all, they choose the resources that will be searched. for example, the biology subject guide search box searches academic search premier, bioone, and jstor by default. basicbiosis and pubmed are also avail­ able but are not checked by default. users can check these search boxes if they also wish to search these resources. choosing the resources to include in the search box as well as setting what resources are checked by default is the most important decision. the subject librarian is also encour­ aged to assist in evaluating the number of hits per resource returned. with response time being a critical factor, deter­ mining the number of hits per resource should involve testing and take into consideration the overall number of resources being searched. n relevance selecting the default index is another decision in setting up search boxes. again, users are google­oriented and tend to go with whatever is set as the default option. out of the box, our metasearch tool defaults to the keyword index or keyword search. the issue of relevancy is a hot topic for metasearch products. this issue typically comes up in metasearch discussions. it is also listed as an issue in the niso metasearch initiative. from the technical side of the equation, results are displayed to the user as soon as they are retrieved. this allows users to begin immediately exam­ figure 3. master screen display (partial screenshot) figure 4. library search engine subject categories metasearching and beyond | herrera 4� ining the results. 
adding a relevancy algorithm as a step would mean all of the results would have to be returned, ranked, and then displayed. with response time being a key issue, a faster response is more important than relevance. another consideration is if the metasearch results are displayed to the user as interfiled or by electronic resource where the resource is returning results based on its own relevancy rankings. one way to increase relevance is to change the default index from keyword to title keyword. for our students, bringing back keywords in the title made the results more relevant. this is the default index used for our article search on the articles and database web page. subject librarians have the choice of indexes they prefer when blending resources. one caveat in using title keyword is that there are resources that do not support title keyword searching. for other resources, title keyword is not an appropriate index. for example, wilson biographies does not have a title keyword search. it makes perfect sense that a biography database would not support title keyword searching. in these cases, the search may fail and note that the index is not supported. to accommodate this type of exception, the profile for wilson biographies needed to be changed to have the title keyword search­mapped to a basic keyword search. while this does not make the results as relevant as the other search results, it keeps any errors from appearing and allows results to be retrieved. n results per source and per page for metafind, there are also two minor controls that can work as hidden values unseen by the patron or as compo­ nents within the search box for users to manipulate. the first control is the number of hits to return per resource. if a subject librarian is only searching two or three resources in his tailored search box, he probably will want to set this number higher. if there are many resources, this number should be lower in order to keep response time reasonable. the second control is the number of results to return per page. in general, it is important to adjust these controls after testing the response for the resources selected. while users typically use the default settings, showing these two con­ trols gives the user a visual clue that the metasearch tool is not retrieving all of the results from the resource. instead, it is only retrieving the first twenty­five, for example. n implementation advice one of the most important pieces of advice is that it is extremely important to have a date in one’s contract or rfp for all of the profiling to be completed if the vendor is doing the resource profiling. from this library’s experi­ ence, the profiling of a resource can take a very long time, and this is a critical point to include in the contract. one might also consider adding cost and turn­around time for new resources after the initial implementation to the contract. the more resources profiled, the more useful the product. however, one also needs to pay attention to response time. if the plan is to profile one’s own resources or connectors, librarians should be mindful of the time involved and ask other libraries with the same product about time investments. being able to work with vendors who will provide an opportunity to evaluate the product “live” is preferable. in deciding who to target for an implementation team, consider representatives from reference, collection development, and systems. 
it is also very important to include whoever manages electronic resource access/ subscriptions and a web manager. in watching other pre­ sentations, exclusion of any of these representatives can seriously undermine the implementation. buy­in is essen­ tial to success. additionally, giving librarians as many options as possible, such as control over what types of resources are in their search boxes as well as the number of hits per resource makes the product more appealing. n questions to ask once the implementation team is set, interviewing refer­ ences for the products under consideration is an impor­ tant part of the process. unstructured conversations with references really allow librarians to explore together what the group wants and how its needs fit with the services the vendor offers. a survey of questions via e­mail is another possibility. in choosing this method, be sure to leave some room for open comments. regardless of the approach, it is important to spend some time asking ques­ tions. provided are a list of recommended questions: n who is responsible for setting up each resource—the vendor or you? n how much time does it typically take to set up a new resource and what is the standard cost to add a new resource? n is there a list or database of already­established pro­ files for electronic resources for this product? n how much time would you estimate that it took to implement the product? n will you be able to edit all of the public web pages yourself or will you be using vendor support staff to make changes? if the vendor support staff has to make some of the changes, how responsive are they? 50 information technology and libraries | june 2007 n can you easily mix and match individual resources for subject guides, departmental pages, or other kinds of web pages? or do you only have the option to set up global categories? n is your installation local or does the vendor host it? are there response issues? n is there an administrative module to allow you to maintain categories, resource values, and configura­ tion options? n how much time goes into managing the product monthly? and who manages the product at your library? n what kind of statistical information does the vendor provide? n how satisfied are you with the training, implementa­ tion support, and technical documentation? n how does the vendor handle broken resources or subscription changes? as with most technologies, there are upfront and hid­ den costs. it is important to determine what hidden costs are involved and if you have the resources to support all of the costs. sometimes libraries choose the least expen­ sive product. however, this approach can lead librar­ ies down the path of hidden costs. for example, if the product is less expensive but your library is responsible for setting up new electronic resources, managing all of the pages, and finding ways to monitor and troubleshoot performance outside of the tools provided, the hidden expenditures in time and training may be more costly in the end than purchasing the premium metasearch tool. in essence, one must pay for the product one way or another. the big question is, where are the resources to support the product? if one’s library has more it/web personnel than money, the lower­costing product may be the way to go, but be sure to check with other librar­ ies to see if they have been able to successfully clear this hurdle. 
additionally, if your library has more one­time money than yearly subscription money, this may dictate the details of the rfp, and your library may lean toward a purchase rather than an annual subscription. n metasearch summary clearly, students want a simple starting place for their research. implementing a metasearch tool to meet this need can be a hard sell internally for many reasons. at this institution, response time has been the overriding critical issue. response has lagged due to server and network issues that have been difficult to track down and improve. however, authentication is truly the most time­ consuming and complex part of the equation. some fed­ erated search tools are actually searching locally stored information, which helps with response. while these are not truly metasearch tools and are not performing real­ time searches, this approach may yield more stability with faster response. over the years in implementing new services such as the library web site, illiad, electronic resources, and off­ campus authentication, new services are often adopted at a much faster rate by library users than by library employees. typically, there will be early adopters who use the services immediately based on need. it then takes general users about a year to adopt a new service. iii’s metasearch technology has been available for the past four years. however, our implementation is evolving with each web site redesign. still, it is used regularly. the university of mississippi libraries has been pro­ viding access to its electronic resources in two distinct ways: (1) providing urls on web pages to the native interface of the electronic resource and (2) metasearching. as the library moves forward in developing digital col­ lections and the number of electronic resources profiled for metasearching increases, it is possible that this kind of global discovery tool will compete in popularity with the library catalog. providing such information mining tools to patrons will cause endless frustration for the library literate. response times, record retrieval order, as well as licensing and profiling issues, are all obstacles to pro­ viding a successful metasearch infrastructure. retrieval inconsistency and ad hoc retrieval order of records is very unsettling for librarians. however, this is the kind of tool to which web users have become accustomed and certainly seems to fill a need that to date has been lacking where library electronic resources are concerned. n open web developments one other trend appearing is scholarly research discovery tools on the open web. enter google scholar along with other similar initiatives such as windows live academic search. google scholar beta was released in november 2004 and very soon after began an initiative to work with libraries and their openurl resolvers.6 this bridging between an open web tool and libraries is an interest­ ing development. a fair amount has been written about google scholar to date although the project is still in its beta phase. what does google scholar have to do with metasearching? good question. it remains to be seen how much scholarly information will become search­ able via google scholar. for now, the jury is still out as to whether google scholar will begin to encroach upon the traditional territory of the indexing and abstracting world. 
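the openurl linking mentioned above can be illustrated with a short sketch that turns a citation into a resolver link. the resolver base url is a placeholder for a library's own link resolver, and the key names follow the common openurl key/encoded-value convention rather than any particular vendor's syntax; the sample citation reuses the luther article cited in the references, with the issn supplied only as an illustration.

from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # placeholder address

def openurl_for_article(citation, sid="metasearch:example"):
    # build the key/value pairs a link resolver typically expects for an article
    params = {
        "sid": sid,
        "genre": "article",
        "atitle": citation["article_title"],
        "title": citation["journal_title"],
        "volume": citation["volume"],
        "issue": citation["issue"],
        "spage": citation["start_page"],
        "date": citation["year"],
        "issn": citation["issn"],
    }
    return RESOLVER_BASE + "?" + urlencode(params)

citation = {
    "article_title": "trumping google? metasearching's promise",
    "journal_title": "library journal",
    "volume": "128", "issue": "16", "start_page": "36",
    "year": "2003", "issn": "0000-0000",  # issn is an illustrative placeholder
}
print(openurl_for_article(citation))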
if sufficient content becomes available on the open web, whether from publishers or vendors allowing their metasearching and beyond | herrera 51 content to be included, then the authentication piece that directly effects response time may be overcome. in using google scholar or other such open web portals, search­ ing happens instantly. when a user uses the openurl resolver to get to the full­text, that is where authentication enters into the picture and removes the negative impact on searching. the tradeoff is that there are many issues involved in openurl linking and the standardization of the metadata needed to provide consistent access. there are many parallels between what google scholar is attempting to offer and what the promises of metasearching have been. for metasearching, under­ graduate students looking for their three to five articles for a paper are considered our target audience. for in­ depth searching, metasearching does have limitations, but for the casual searcher looking for a few full­text articles, it works well. interestingly, similar recommen­ dations are being made for google scholar.7 however, opinions differ on this point. roy tennant went so far as to indicate it is a step forward in access to those users without access to licensed databases, but remained reserved in his opinion regarding the usefulness for those with access.8 google scholar also throws in a few bonuses. while providing access to open access (oa) materials in our opac for specific collections such as the directory of open access journals, these same resources have not been included in our metasearch discovery tool. google scholar is searching these open repositories of scholarly informa­ tion, although there is some concern over the automatic inclusion of materials such as syllabi and undergraduate term papers within the institutional repositories.9 google scholar also provides a useful citation feature and rel­ evancy. google scholar recognizes the user’s preference for full­text access and provides a visual cue from the brief results when article full­text is available. this func­ tionality is not currently available from our metasearch software but would be extremely helpful to users. on the downside, some of google scholar’s linking policies make it difficult for libraries to extend services beyond full­ text articles to their users. another notable development among subscription indexing services is the ability to reveal content to web search engines. ebsco’s initiative is called ebscohost connection.10 in implementing metasearching, libraries have debated about providing access to free versus subscrip­ tion resources. for our purposes, free resources were not included in the most commonly used search in the full­ text category. there are those who would argue against this decision, and they have very good points. in fact, it has already been noted that some libraries use google scholar to verify incomplete interlibrary loan citations quickly.11 in watching the development of google scholar, it seems possible that this free tool that uncovers free open access resources and institutional repository mate­ rials may not necessarily be a competitive product, but may be a very complementary one. n impact on the opac what will this mean for the “beloved” opac? for a very long time, users have expected more of the library catalog than it has provided. 
while the library catalog is typically appreciated by library personnel, its usefulness for finding materials other than books has been hard for general users to understand. many libraries including the university of mississippi have been loading records from their electronic resources in hopes of making the library catalog more useful. the current conversation regarding digital library creation also begs the question, “what is the library catalog?” although the library catalog serves as a searchable inventory of what the library owns, it is simply a pointing mechanism, whether it points the user to a shelf, a building, or a url. in our endeavor to provide instant gratification and full­text, as well as the user’s desire for information regardless of format, the library catalog is beginning to take a backseat. it was clear four years ago in plan­ ning digital collections that a metasearch tool would be needed to tie together subscription resources, digital collections, publicly available resources, and the library catalog. it will be interesting to see whether patrons choose to use the formal tools provided by the library or the informal tools developing on the open web, such as google scholar, to perform their research. more than likely, discovery and access will happen through many avenues. while this may complicate the big picture for those in library instruction, it is important to meet users on the open web. one’s best intentions and designs are presented to users but they may choose unintended paths. librarians should watch the paths they are taking and build upon them. sometimes even one’s best attempts fall short, as pointed out clearly in karen schneider’s latest series, “how opacs suck.”12 still it is important to acknowl­ edge design shortcomings and keep forging ahead. dale flecker, who spoke at the taiga forum, recommended not to spend years trying to “get it right” before imple­ menting, but instead to consider ourselves in perpetual beta and simply implement and iterate.13 in other words, do not try to make the service perfect before implement­ ing it. most libraries do not have the time and resources to do this. instead, find ways to gain continual feedback and constantly adjust and develop. students are familiar with internet search engines and do not want to choose between resources. access to a simple resource discovery tool is an important service for users. unfortunately, authentication, product design 52 information technology and libraries | june 2007 and management, and licensing restrictions tend to be stumbling blocks to providing fast and comprehen­ sive access. regarding the metasearch tool used at the university of mississippi libraries, development part­ nerships have already been formed between the vendor and a few libraries to improve upon many of the issues discussed. innovative is developing a next­generation metasearch product called research pro that leverages ajax technology. while efforts are made to participate in discussions and develop our already­existing tools, it is also impor­ tant to pay attention to other developments such as google scholar. at this point, google scholar is in beta but this kind of free searching could turn the current infra­ structure on its ear to the benefit of patrons. the efforts to meet users on the open web and reveal scholarly content are definitely worth keeping an eye on. references 1. roland dietz and kate noerr, “one­stop searching bridges the digital divide,” information today 21, no. 7 (2004): s24. 2. 
niso metasearch initiative, http://www.niso.org/committees/ms_initiative.html (accessed may 8, 2006). 3. william j. frost, "do we want or need metasearching?" library journal 129, no. 6 (2004): 68. 4. judy luther, "trumping google? metasearching's promise," library journal 128, no. 16 (2003): 36. 5. donna fyer, "federated search engines," online 28, no. 2 (2004): 19. 6. jill e. grogg and christine l. ferguson, "openurl linking with google scholar," searcher 13, no. 9 (2005): 39–46. 7. mick o'leary, "google scholar: what's in it for you?" information today 22, no. 7 (2005): 35–39. 8. roy tennant, "is metasearching dead?" library journal 130, no. 12 (2005): 28. 9. o'leary, "google scholar." 10. what is ebscohost connection?, http://support.epnet.com/knowledge_base/detail.php?id=2716 (accessed may 10, 2006). 11. laura bowering mullen and karen a. hartman, "google scholar and the library web site: the early response by arl libraries," college & research libraries 67, no. 2 (2006): 106–22. 12. karen g. schneider, "how opacs suck," ala techsource, http://www.techsource.ala.org/blog/karen+g./schneider/100003/ (accessed may 10, 2006). 13. dale flecker, "my goodness, life is different," presentation to the taiga forum, mar. 27–28, 2006, http://www.taigaforum.org/pres/fleckerlifeisdifferenttaiga20060327.ppt (accessed may 10, 2006).

the recon pilot project: a progress report, october 1970-may 1971
henriette d. avram and lenore s. maruyama: marc development office, library of congress, washington, d.c.

synopsis of three progress reports on the recon pilot project submitted by the library of congress to the council on library resources covering the period october 1970-may 1971. progress is reported in the following areas: recon production, foreign language editing test, format recognition, microfilming, input devices, and tasks assigned to the recon working task force.

introduction

with the implementation of the marc distribution service in march 1969, the library of congress and the library community have had available in machine readable form the catalog records for english language monographs cataloged since 1969. most libraries, however, also need to convert their older cataloging records, and the library of congress attempted to meet these needs by establishing the recon pilot project in august 1969. during the two-year period of the pilot project, various techniques for conversion of retrospective bibliographic records have been tested, and a useful body of catalog records is being converted to machine readable form. the pilot project is being supported with funds from the library of congress, the council on library resources, and the u.s. office of education. earlier articles in the journal of library automation have described the progress through september 1970 (1, 2, 3). this article covers the period october 1970 through may 1971.

progress, october 1970 through may 1971

recon production

the conversion of 8476 records in the 1969 and 7-series of card numbers that had not been included in the marc distribution service was completed, and these records were sent to 47 subscribers of the marc distribution service. the subscribers were not charged for these records but were asked to send a tape reel to the library for the duplication process.
at present, the recon data base consists of 25,206 records in the 7, 1969, and 1968 series of card numbers. records in the 1968 series that were part of the data base for the marc pilot project are being converted by program from the marc i format to the marc ii format, proofed, and updated. to date, 7551 out of 7583 marc i records have been processed. prior to the implementation of the marc distribution service, records were input for test purposes, and the resulting practice tapes contain data requiring correction or updating to correspond with the present specifications of the marc ii format. of the 8340 titles on the practice tapes, 3460 have been updated and reside on the recon master file. these updated machine readable records will be distributed with the recon titles in the 1968 card series. foreign languages editing experiment a foreign language editing experiment was conducted to test the accuracy of marc/recon editors in editing french and german language records. records used for this test included 1180 of the 5000 recon research titles. at least 50 percent accuracy was expected since half of the task of editing a marc record involves being able to read the language of the record. the other half involves identifying the data elements by their location in the record. the three editors used in the experiment had studied french in high school, one having had an additional year in college; none had studied german. each editor was required to edit approximately 200 records in each language. statistics on the number of records edited per hour and the number of errors made, when compared with the same editors' statistics for editing english language records, showed that each editor maintained an approximately equal rate of speed in editing foreign language records as in editing english. the error rate for each editor, however, was more than tripled on foreign records, and each made approximately as many errors in french (the language studied) as in german. each editor averaged more than 12 errors per batch in french and 12 in german. since the marc editorial office has established a standard of 2.5 errors per batch ( 20 records comprising a batch ) as being acceptable for trained marc editors, this error rate would have to be lowered in a production environment. the majority of errors occurred in the title field, which is a portion of the recon pilot projectjavram and maruyama 161 the record that must be read for content in order to be edited correctly. the second largest number of errors occurred in the fixed fields, which are also dependent upon a reading knowledge of the language of the record for accurate coding. the number of errors made in each batch of records by each editor was tabulated to determine if any improvement was made during the course of the experiment. in no case was improvement noted. statistics were also kept on the number of times an editor consulted various sources for help: e.g., dictionaries, the editing manual, the lc official catalog, the reviser, or a language specialist. dictionaries were consulted frequently, and the reviser and language specialists rarely. typing statistics (number of errors) were also recorded for 181 french and 185 german records. the error rate for typing foreign language material was lower than for typing english. the english language statistics, however, were combined for several typists, and the foreign language statistics were for one typist only. 
charts showed that there was no improvement in the number of typing errors made at the end of the test. the primary conclusion drawn from the results of the experiment is that in order to edit foreign language records with an acceptable degree of accuracy, it would be necessary for the editor to have a good knowledge of the language as well as the editing procedures. f orrnat recognition format recognition is a technique that allows the computer to process unedited bibliographic records by analyzing data strings for certain keywords, significant punctuation, and other clues to determine proper identification of data fields. the library of congress has been developing this technique since early 1969 in order to eliminate substantial portions of the manual editing process, which in turn should represent a considerable savings in the cost of creating machine readable records. the recon report, which was written prior to the completion of the first format recognition feasibility study, concluded that "partial editing combined with format recognition processing is a promising alternative to full editing." ( 4) since that time, the emphasis in the deve1opment of the programs has been shifted to no editing prior to format recognition processing. the programs are in the final stages of acceptance testing, and it is expected that 75% of the records can be processed without errors created by the format recognition programs. preliminary estimates show that it takes approximately half a second of machine time to process one record by format recognition ; the manual editing process, on the other hand, takes approximately six minutes per record. the total amount of core storage required is approximately 120k: 80k for the programs and 40k for the keyword lists. although the keyword lists are maintained as a separate data set on a 2314 disk pack, they are loaded into memory during processing. the format recognition programs have been written 162 journal of library automation vol. 4/3 september, 1971 in assembler language for the library's ibm 360/40 under dos. the logical design of the format recognition process, with detailed flow charts needed for implementation of computer programming, has been published as a worki~;tg document by the american library association so that the technical content would be available to assist librarians in their automation projects ( 5). workflow for format recognition begins with the input of unedited catalog records via the mt /st following the typing specifications created for format recognition. mter being processed by the format recognition programs, these records are proofed by the editors (the first instance in which they see the records), and the necessary corrections or verifications made. correction procedures for format recognition records are the same as those used for regular marc records. figures 1, 2, and 3 are examples of the printed card used for input, the mt /st hard copy, anq the proofsheet of the record created by format recognition. initial use of the format recognition programs is for input of approximately 16,000 recon records in the 1968 card series. input of current marc records via format recognition will begin at a later date. recon records were chosen for large-scale testing because they are not required for an actual production operation such as the marc distribution service. in addition, work has begun on the expansion of format recognition to foreign languages. 
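the format recognition programs themselves were written in assembler and driven by stored keyword lists, but the general idea named above (using punctuation and keyword clues in an unedited record to guess which fields the pieces belong to) can be illustrated with a toy sketch. the rules and field split below are simplified inventions for illustration, not the published logical design.

import re

def recognize(card_text):
    """guess main entry, title, imprint, and collation from one card."""
    fields = {}
    parts = [p.strip() for p in card_text.split(". ") if p.strip()]
    # heuristic 1: a leading "surname, forename" string is a personal main entry
    if parts and re.match(r"^[A-Z][a-z]+, [A-Z]", parts[0]):
        fields["100 (main entry)"] = parts[0]
        parts = parts[1:]
    # heuristic 2: the next sentence-like chunk is treated as the title statement
    if parts:
        fields["245 (title)"] = parts[0]
        parts = parts[1:]
    rest = ". ".join(parts)
    # heuristic 3: "place, publisher, date" with a four-digit year looks like an imprint
    imprint = re.search(r"([A-Z][\w .]+,\s+[\w .]+,\s+\[?\d{4}[^.]*)", rest)
    if imprint:
        fields["260 (imprint)"] = imprint.group(1).strip()
    # heuristic 4: pagination and size keywords ("p.", "cm") mark the collation
    collation = re.search(r"(\d+\s*p\..*?cm)", rest)
    if collation:
        fields["300 (collation)"] = collation.group(1).strip()
    return fields

# sample card text adapted (and simplified) from the record shown in the figures
card = ("Ewart, Andrew. The world's greatest love affairs. "
        "London, Odhams, 1968. 287 p. 8 plates, illus., ports. 22 cm.")
for tag, value in recognize(card).items():
    print(f"{tag}: {value}")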
analysis is being done on german and french monograph records, and eventually spanish, for new or expanded keyword lists and some changes to the algorithms.

figure 1. input for format recognition (the lc printed card for ewart, andrew, "the world's greatest love affairs," london, odhams, 1967 [i.e. 1968]; lc card number 68-97457).
figure 2. mt/st hard copy of the same record.
figure 3. proofsheet of the format recognition record.

microfilming

for a full-scale retrospective conversion project at the library of congress, it is likely that records for input would be microfilmed from the card division record set and updated from the corresponding records in the library's official catalog. a subset of the record set, such as the catalog cards for a given year, would be microfilmed and then the appropriate records, i.e., english language monographs, german monographs, etc., would be selected after filming. costs were calculated for a base figure of 100,000 records for the year 1965, and four different methods of microfilming have been estimated as follows by the library's photoduplication service: 1) microfilming for a direct-read optical character reader ($2000); 2) microfilming for reader/printer specifications ($2350); 3) microfilming for reader specifications ($400); and 4) microfilming for a xerox copyflo printout of a card overlaid on an 8 x 10 1/2 worksheet ($7000). the differences in cost are primarily attributable to the type of camera used (rotary or planetary) and the kind of feed mechanism (manual or automatic). other factors need to be considered, such as the fact that film suitable for ocr requirements could not be used on xerox copyflo or even for contact printing to positive film. since a readable copy of the original printed card is necessary for updating and proofing, microfilming for direct-read ocr would not be a viable alternative.

input devices

the monitoring of existent input devices was continued with an investigation of dissly systems' scan data optical character reader. scan data has been modified, via software, to read 55 different type fonts which are recognized by a "best compare" technique using six stored fonts to match against the remaining 49. according to the manufacturer, direct-reading is accomplished with approximately 95% level of accuracy. errors are recorded during a proofing cycle and corrected in the machine readable data base.
the scan data equipment does not have a transport for a 3 x 5 document, so that a number of 3 x 5 cards must be attached to an 8 x 14 document for scanning, and therefore these cards would not be returned to the library by the manufacturer. under these conditions, cards to be read by scan data equipment would have to be obtained from stock rather than from the card division record set. unfortunately, many cards are out of stock; and of those that are in stock many may be cards reprinted several times by photo-offset methods and consequently have a poor image. therefore the use of this device would be severely hampered. fifty good quality cards were submitted to dissly systems for an experiment that was run without any modifications to the existing machine and software. five of the 50 cards were returned to the library with a matching printout. the results were not encouraging because many lines of text were missed and many characters misread.

recon working task force

the recon working task force has compiled work statements for contractual support for two of its research projects. these projects involve investigations on the implications of a national union catalog in machine readable form and the possible utilization of machine readable data bases other than that of the library of congress for use in a national bibliographic store. preliminary tasks related to these projects have been described in earlier progress reports (6, 7). the first part of the work statement deals with the products that could be derived from the machine readable national union catalog: a bibliographic register, indexes by name, title, and subject, and a register of locations. these indexes would provide multiple access points to the records in the national union catalog. the bibliographic register will contain a full bibliographic record on each title covered. the indexes will contain partial records which are associated with the full records in the register, and a given index file will carry one or more partial records for every record in the register. for each title in the register, the register of locations lists those libraries where copies of the title are held. the assumption is made that the indexes under consideration will contain the following data elements (the numeric designations and subfield codes are those used in the marc format fields):

name index: name (100, 110, 111, 400, 410, 411, 600, 610, 611, 700, 710, 711, 800, 810, 811); short title (245); main entry in abbreviated form; date (fixed field date 1); language (fixed field language code); lc card number; register number.

title index: short title (130, 240, 241, 245, 440, 630, 730, 740, 840); main entry in abbreviated form; date (fixed field date 1, or may be omitted if in heading); language (fixed field language code, or may be omitted if in heading); lc card number; register number.

subject index: subject heading (650, 651); main entry (100, 110, or 111); short title (245); date (fixed field date 1); language (fixed field language code); lc card number; register number.

the abbreviated form of main entry noted above is to be included in the record of the name or title index unless the name itself is carried in the main entry of that record; the exact form of the abbreviation is spelled out following the sketch below.
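a minimal sketch of deriving the three kinds of index entries listed above from a full register record follows. the record here is a simplified python dictionary keyed by marc tag rather than an actual marc communications-format record, and the register number is assigned arbitrarily; the abbreviated main entry is handled only loosely here, since its exact form is defined in the text that follows.

NAME_TAGS = ["100", "110", "111", "400", "410", "411", "600", "610", "611",
             "700", "710", "711", "800", "810", "811"]
TITLE_TAGS = ["130", "240", "241", "245", "440", "630", "730", "740", "840"]
SUBJECT_TAGS = ["650", "651"]

def common_elements(record, register_number):
    # elements shared by every partial index record
    return {"short title": record["245"]["a"],
            "date": record["fixed"]["date1"],
            "language": record["fixed"]["lang"],
            "lc card number": record["010"],
            "register number": register_number}

def index_entries(record, register_number):
    base = common_elements(record, register_number)
    main_entry = record.get("100", record.get("110", {})).get("a", "")
    name_ix = [{**base, "name": record[t]["a"]} for t in NAME_TAGS if t in record]
    # the abbreviated main entry is appended in brackets after the short title;
    # the fuller rule (including corporate names) is given in the text below
    title_ix = [{**base, "short title": f'{record[t]["a"]} [{main_entry}]'}
                for t in TITLE_TAGS if t in record]
    subject_ix = [{**base, "subject heading": record[t]["a"], "main entry": main_entry}
                  for t in SUBJECT_TAGS if t in record]
    return name_ix, title_ix, subject_ix

# illustrative record, not an actual lc register entry
record = {
    "fixed": {"date1": "1968", "lang": "eng"},
    "010": "68-97457",
    "100": {"a": "Ewart, Andrew."},
    "245": {"a": "The world's greatest love affairs."},
    "650": {"a": "Love."},
}
names, titles, subjects = index_entries(record, register_number=1)
print(names[0]["name"], "|", subjects[0]["subject heading"])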
it is defined as follows: 1) for a personal name, a conference, or a uniform title heading-subfield "$a" is appended in brackets after the title; and 2) for a corporate name-subfield "$a" plus the first "$b" subfield are appended, within a single set of brackets, after the title. 166 journal of library automation vol. 4/3 september, 1971 the specific objective of this project is to define and investigate alternative processing schemes associated with an automated national union catalog. this study will explore and examine these processing schemes and the following components: 1) techniques for introducing the necessary input into the automated nuc svstem. the considerations to be covered include the relationship to' marc input, use of the format recognition programs, and the problems of language in terms of selection of input. 2) techniques for structuring or organizing the data contained in the register and the various indexes to establish and maintain the relationships among the records contained in these data bases. 3) techniques and procedures connected with the production of the products listed above. this investigation will also cover any selection and sorting procedures necessary. 4) analysis of the format, i.e., graphic design and printing, size, style, typographic variation, condensation, etc. 5) examination of alternative cumulation patterns associated with the products of the system. in this connection, items such as number of characters in an average entry, average number of entries on a page, expected rate of increase of number of entries in catalog, and segmentation of catalog are to be taken into consideration. 6) feasibility of producing a register through automation techniques. if this can be accomplished, further investigation will be directed toward the feasibility and cost of segmenting the register into three sections: one produced from machine readable records (english and whatever roman alphabet language records are in machine readable form); one produced from roman alphabet language records which are only in printed form; and one produced from non-roman alphabet language records which are only in printed form. the costs associated with the various techniques and procedures enumerated above as well as with their components will be calculated. from these figures an average total cost per title cataloged is to be determined for each alternative processing scheme. these cost values (one per alternative scheme ) are to be compared with those associated with a purely manual processing scheme. included in this cost analysis will be the associated costs for different forms of hard copy as well as for the use of com (computer output microfilm). from any one index and the register of locations, the maximum number of alphabetic and numeric lists (registers of location ordered by register number) will be determined, taking into account ease of usage and technical and economic feasibility. the intent is to have as few lists as possible and still keep the cost within reasonable bounds. supplements to the indexes should be issued monthly; supplements to the register of locations may be issued monthly or quarterly. the recon pilot pro;ectfavram and maruyama 167 the second project is a continuation of a previous investigation on the possible utilization of machine readable data bases other than that produced by the library of congress for use in a national bibliographic store. 
the results of this project should determine if the use of other data bases is economically and technically feasible. using three or four data bases selected by the recon working task force, the study will determine the following: 1) method and cost of acquiring these other data bases in machine readable form. 2) analysis of the kinds of programs capable of converting records from a number of these data bases into the marc format. different level data bases might require different kinds of programs. if such an effort is deemed feasible, a cost estimate for such a program or array of programs will be calculated. 3) method and cost of printing the records for examination, corrections, etc. 4) method and cost of eliminating records already in the marc data base. 5) method and cost of comparing these records against the lc official catalog and making the necessary changes in the data or content designators. 6) cost for input of additions and corrections. 7) method and cost of incorporating the additions and corrections in the machine readable file. 8) cost of providing means by which these records would not be input again by any future lc retrospective conversion effort. a result of this project should be a determination as to whether high potential or medium potential files, or both, are suitable for conversion. a determination will be made of the minimum yield or the minimum number of titles needed to justify writing the programs to convert these data bases. a factor to be considered is that the number of unique titles will decrease as more data bases are converted for this pool of records. it was decided that the research tasks to study the problems in distributing name and subject cross reference control files would be dropped because of limitations of time and funds. an additional task, however, has been added that can be performed within the time limits of the pilot project. during the past year, the library of congress card division has recorded information about card orders in machine readable form. this information will be analyzed as to the year and language of the most frequent orders because it is assumed that the most popular card orders bear a relationship to the potential use of a data base in machine readable form by libraries in the field. this study involves the following: 1) analysis of a frequency count of lc card orders for a one-year period and preparation of a distribution curve for card series. 168 journal of libmry automation vol. 4/3 september, 1971 2) analysis of a sample of frequently ordered cards to determine with fair reliability the proportion of english language titles in this group. the sample will be large enough to give an indication of other language groups that might be significant for any recon effort. 3) preparation of distribution curves for english language and nonenglish titles by card series. 4) mathematical analysis of the results of 1) -3) above to arrive at a table to show the anticipated utility of converting specified subsets of the lc card set. outlook research in input devices has not uncovered any equipment that offers a significant technical and cost improvement over the mt /st currently used in the library of congress. on-line correction and verification of marc/recon records will, however, speed conversion and will offer relief in the flow of documents and paper work required in a purely batch operation. 
since marc/recon records will be corrected and verified in one operation rather than by the cyclic process of the present system, · cost savings should be realized. the library of congress will have this on-line capability through the multiple use marc system. this new system is still in the design phase, and a projected date for implementation has not yet been set. to date investigations in the use of direct-read optical character readers have demonstrated that there are no devices currently available capable of scanning the lc printed card. the format recognition programs are operational, and recon titles in the 1968 card series are being converted without any prior editing of the records. procedures are being implemented to gather the necessary data to compare costs of the format recognition technique with costs of conversion with human editing. production statistics have shown that retrospective records are more costly to convert than current records. this higher cost is attributed to the additional tasks in recon of selecting the subset for input from the lc record set and comparing the records with the lc official catalog for updating. since cards in the lc record set do not necessarily reflect the latest changes made to the cards in the lc official catalog, the official catalog comparison is necessary to ensure that recon records are as up-to-date as the cards in the official catalog. although the recon report ( 8) recommended conversion in reverse chronological order with highest priority given to the last ten years of english language monograph cataloging, the working task force study on the card division popular titles may reveal that selective conversion is a more practical approach. the orderliness of chronological conversion by language does mean that records in machine readable form can be ascertained easily. it is interesting, however, to speculate on the use of the recon pilot project/ avram and maruyama 169 these records compared with popular titles which may cross many years and languages. the marc/recon titles constitute the data base for the phase ii card division mechanization project, and close liaison continues to be maintained between both projects. it is recognized that the distribution of cards and marc records requires the same computer based bibliographic files and has similar hardware and software requirements. plans are presently underway to transfer the duplication of tapes for ~.iarc subscribers from the library's ibm 360/40 to the card division's spectra 70 when the phase ii system is operational. the recon pilot project does not officially end until august 1971. in an attempt to make information available as rapidly as possible, the preparation of the final report will begin this summer, since several aspects of the project are complete enough to be documented. the final report will be published by the library of congress, and its availability will be announced in the lc information bulletin and in professional journals. acknowledgments the authors wish to thank the staff members associated with the recon pilot project in the marc development office, the marc editorial office, the technical processes research office, and the photoduplication service of the library of congress for their contributions to the project and, therefore, to this report. special thanks are due to patricia e. parker of the marc development office for her work on the foreign language editing experiment and for writing that section of this article. references 1. 
avram, henriette d.: "the recon pilot project: a progress report," journal of library automation, 3 (june 1970), 102-114. 2. avram, henriette d.; guiles, kay d.; maruyama, lenore s.: "the recon pilot project: a progress report, november 1969-april 1970," journal of librm·y automation, 3 (september 1970), 230-251. 3. avram, henriette d.; maruyama, lenore s.: "recon pilot project: a progress report, april-september 1970," jow·nal of library automation, 4 ( march 1971 ) , 38-51. 4. recon working task force: conversion of retrospective catalog records to machine-readable form: a study of the feasibility of a national bibliographic service (washington, d.c.: library of congress, 1969 ), 179. 5. u. s. library of congress. information systems office. format recognition process for marc records: a logical design (chicago, american library association, 1970 ). 6. avram , guiles, maruyama, op. cit., 248-249. 7. avram, maruyama, op. cit., 49-51. 8. recon working task force, op. cit., 11. conversion of bibliographic information to machine readable form using on-line computer terminals 217 frederick m. balfour: information systems engineer, technical information dissemination bureau, state university of new york, buffalo, new york a description of the first six months of a profect to convert to machine readable form the entire shelf list of the libraries of the state university of new york at buffalo. ibm datatext~ the on-line computer service which was used for the conversion, provided an upperand lowercase typewriter which transmitted data to disk storage of a digital computer. output was a magnetic tape containing bibliographic information tagged in a· modified marc i format. typists performed all tagging at the console. au information except diacriticals and non-roman alphabets was converted. direct costs for the first six months were $.55 per title. several recent articles have reported on methods and related costs to convert library bibliographic information to machine readable form. chapin ( 1) compared keypunching, paper tape, and optical character recognition. keypunching was also described by hammer ( 2), and black (3) . buckland (4) described paper tape conversion, and johns hopkins university ( 5) reported on optical character recognition. online computer terminals have been proposed ( 6), but have hitherto not been tried in a large library. without attempting to discuss the various techniques, this paper presents a detailed report of converting with on-line computer terminals. it is hoped that the experiences reported here and in the cited articles will 218 journal of library automation vol. 1/ 4 december, 1968 provide suitable information to a library administration considering largescale conversion. background in 1965 a systematic program of automation was begun in the libraries of the state university of new york at buffalo. the general goals of the program were to improve services to patrons and streamline internal operations. there are three general areas usually considered for automation in a library: acquisitions and accounting, the card catalog, and circulation control. an analysis of the system indicated that conversion of the card catalog to machine readable form would provide the greatest improvement in library services and operations. the reasons for the decision were as follows. first, the university libraries are growing rapidly; in one year the shelf list will increase by 60,000 to 100,000 titles, or about 15 to 25 per cent. 
second, suny buffalo is currently planning a new campus which will be completed in five to ten years. in the interim, the university will be spread over three major campus locations, with many smaller offices and departments located throughout the city, and the libraries must provide some form of bibliographic index for each location. the conversion of the shelf list to machine readable form will allow this distribution of the bibliographic information at a very low cost per title. finally, the project will provide experience in using magnetic tape for the handling of bibliographic information, so that when the library of congress' marc project begins to produce magnetic tapes, suny buffalo will be able to utilize them immediately. selecting the conversion hardware in 1966, a proposal for converting the shelf list to machine readable form ( 7) was presented to the library administration. it pointed out the many improvements in patron services, the advantages to the library staff, both professional and clerical, and the monetary savings to be realized by such a conversion. it discussed the four methods of file conversion then feasible: punched cards, optical scanners, punched paper tape, and magnetic tape-keyed data converters (as exemplified by tl1e mohawk data sciences equipment) ( 8). the proposal recommended using the magnetic tape-keyed data converters because of their input speed, ease of entry, and elimination of handling cards or paper tape. during the first quarter of 1967, a fifth method of conversion was considered, an ibm product called datatext (9). it required the rental of an ibm 27 41 communications terminal (essentially a typewriter), a western electric 103a data-set, and a voice-grade telephone line to the nearest ibm installation, which was cleveland, ohio. a customer may buy time in six-hour blocks called datatext agreements. an agreeconversion of bibliographic information/ balfour 219 ment covered a time segment from 7:00a.m. to 1:00 p.m., or from 1:30 p.m. to 7:30p.m., five days a week. datatext provided everything that the magnetic tape converters did with some important additions. first, it had upperand lower-case alphabet using a shift character (the library administration had seen only the mohawk upper-case converter). second, the typewriter gave a typed copy which was easy to proofread. third, corrections were much easier because of the text-editing capabilities of the on-line computer. text-editing can best be illustrated by describing a typical datatext job. a typist working from source material produces a typewritten page; at the same time, the ibm 27 41 she is using transmits the data being typed to the computer in an area called "working storage". when typing is completed, the clerk gives the appropriate command and the information is stored in an area called "permanent storage", a computer manipulation which can be compared to taking a page from the typewriter and placing it in a folder in a file cabinet. when the typist wishes to make changes to the information, she can give a command to recall it from permanent storage to working storage. she can then manipulate it in several ways. during original entry, the computer automatically assigned numbers to each line. using these line numbers, a typist can move information within the text, can add or delete information, and can correct errors. commands are very simple and concise; for example, it takes four keystrokes to move a new line into the text. 
in making a correction, the typist merely types the incorrect word and the correct word; the computer then types the complete line to show that the correction has been properly executed. (this instant replay, or on-line interaction, is a benefit unique to the on-line terminal.) after any change, the computer automatically renumbers lines and reformats the entire text. a sample of typed input is illustrated and discussed later in the article. in april 1967, it was decided to test the datatext service because of its powerful correction capability, and because it could be installed and working within three weeks. in may the console was delivered, the telephone equipment installed, and a long-distance line to cleveland rented. a one-month test of datatext proving successful, three more consoles, data sets and telephone lines were added, and the conversion project was fully underway. training the typists the majority of the typing and proofreading staff were drawn from existing personnel in the cataloging department. individuals chosen had a background in either catalog card typing or file maintenance, and consequently a good working knowledge of information on a catalog card. it was anticipated that with a minimum of further training, the typists could identify and tag information as they were typing it at the console. this assumption was critical to the success of the project, since the li.... ----------------~---220 journal of library automation vol. 1/ 4 december, 1968 brary could not afford the professional time necessary for complete pretagging of bibliographic information. typists involved in the one-month test were given several hour-long training sessions on tagging before the console arrived. when the project got underway, a list of all possible tags was posted near the console, and a librarian was nearby to answer questions. mter three weeks of operation, it was obvious that the typists could tag at the console, thus making this part of the test run a success. the tagging system used was developed from the marc i pilot project ( 10). most of the original tags were retained and several additional ones designed to meet specific local needs. tape files created were formatted according to marc i specifications, although fixed fields were left blank. the tagging system is outlined in a reference manual prepared for typists and proofreaders ( 11). operation of an on-line console requires special training. ibm sent a datatext instructor to buffalo on several occasions to provide typist training. for the major training session, which occurred in june, the ibm representative came for a full week. ten typists were trained; five specialized in entering information, and five specialized in retrieving, correcting, and transmitting information. by the end of the week both groups were skilled in their respective specialities, and many typists were able to perform well in both areas. later, typists were trained in several sessions by one of the library's typing staff. during the first three months, the author was near the terminals at all times to answer questions on terminal operation, to collect data for measuring and controlling performance, and to act as supervisor. a librarian was on call for questions on complex library problems, and the programmer-analyst was available to help solve problems regarding input format and tagging. at the end of this period, appropriate clerical staff had been trained to supervise minute-to-minute operation. 
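before turning to the conversion procedures themselves, the editing behavior described above is easy to model. the short python sketch below is purely illustrative -- the class names, method names, and command spellings are invented, and datatext's real command language is only partially reproduced in this article -- but it captures the mechanics the text describes: lines numbered automatically as they are typed, a word-for-word correction that echoes the full corrected line, and a store/retrieve cycle between working and permanent storage.

    # illustrative model of the datatext editing cycle described above.
    # all names are hypothetical; only the behavior follows the article.

    class Document:
        """a 'document' held in working storage, one typed line per entry."""
        def __init__(self):
            self.lines = []

        def type_line(self, text):
            self.lines.append(text)          # the computer assigns the next number
            return len(self.lines)           # line numbers start at 1 in this sketch

        def correct(self, number, wrong, right):
            # the 'line number, incorrect word, tab, correct word' command:
            # substitute the word and echo the complete corrected line.
            self.lines[number - 1] = self.lines[number - 1].replace(wrong, right, 1)
            return f"{number:>3} {self.lines[number - 1]}"

        def delete(self, number):
            del self.lines[number - 1]       # later lines are renumbered automatically,
                                             # since numbering follows list position

    class PermanentStorage:
        """the disc 'file cabinet' that typed documents are stored into."""
        def __init__(self):
            self.folders = {}

        def store(self, name, doc):
            self.folders[name] = list(doc.lines)

        def retrieve(self, name):
            doc = Document()
            doc.lines = list(self.folders[name])
            return doc

    # a correction typist retrieves a stored document and fixes a misspelling
    disc = PermanentStorage()
    d = Document()
    d.type_line("20t an introduction to library automaton.")
    disc.store("doc-0421", d)
    revised = disc.retrieve("doc-0421")
    print(revised.correct(1, "automaton", "automation"))
    # prints:   1 20t an introduction to library automation.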
conversion procedures

the general method of conversion (figure 1) was as follows. a typist typed into "working storage" for an hour, inputting 15 to 30 shelf list cards. she instructed the computer to store this "document" in a permanent storage location on disc. she then placed the typed copy and cards in a proofreading bin, cleared working storage, and started another document. a proofreader compared typed copy with original cards and indicated any errors. the corrected document then went to a correction typist who "retrieved" the document from permanent storage to working storage, performed the corrections, and transmitted the corrected document to magnetic tape. the original uncorrected document was left in permanent storage overnight and deleted the following day. documents were transmitted to tape for about two weeks and the accumulation returned to the library via the mails. (ibm saved all permanent storage records for one week as a security measure. if a library typist inadvertently deleted a document, it could be retrieved by the computer operator.)

[figure 1. shelf list conversion information flow -- the diagram shows 3 x 5 shelf list cards and hard copy circulating through the buffalo proofreading operation and file, typed data flowing to computer disc storage in cleveland, and accumulated tape returning to the library by mail.]

figure 2 shows a sample of typed input and subsequent correction. line numbers, as they are stored on the disc, are included on the right margin for ease of explanation. lines typed in capitals are computer responses to commands, the first entry being the command to clear working storage. the computer responds and then indicates that the console is in one of two general input modes. all cards are typed in "automatic" mode, for which the typist gives the appropriate command. when the computer responds the typist asks for the next line number, which is 3, and begins to input the card. in line 4, the typist makes an error and realizes it before throwing the carriage. she hits the "attention" key, causing the underscore, rolls the platen down, back spaces, and retypes the correct word. the computer then corrects the error. in line 6 the typist misspells "cambridge", but does not realize it before throwing the carriage. the correction is shown at the bottom although the input typist could not have performed it herself; it would have gone through proofreading and back to the correction typist. the correction is made by typing the line number, in this case "6", the incorrect word, "dambridge", tab, and the correct word. the computer responds by typing out the complete line showing the correction.

[figure 2. sample input and correction of one shelf list card. the transcript below is an approximate reproduction; computer responses are shown in capitals, and the right-margin line numbers (3 through 9) and the strikeovers visible on the hard copy could not be reliably realigned here.]

    proc
    CLEARED
    UNCONTROLLED MODE
    a
    AUTOMATIC MODE
    n
    NEXT NUMBER - 3
    90t bs2575.3.a7
    10t bible. n.t. matthew. english. 1963. new english.
    20t the gospel according to matthew. commemen
        tary by a.w. argyle.
    30a cambridge 30b university press 30c 1963
    40t 227 p. maps. 20 cm.
    50t the dambridge bible commentary: new english bible
    70t bible. n.t. matthew - commentaries.
    71t argyle, aubrey william, 1910-
    73t title.
    60z
    92t 226.207
    94t 63-23728
    n
    NEXT NUMBER - 10
    6    dambridge    cambridge
    50t the cambridge bible commentary: new english bible

except for a brief period, the shelf list was converted in alphabetic order, and by december 1 shelf list drawers through the e's were completed. early in the project, some of the literature classification, p and pq, was converted.
foreign languages in the pq's gave no particular problems, and typing rates did not drop. all cards were converted in shelf list order except for those having non-western alphabets. when possible, these were transliterated and entered. otherwise their input was delayed. since the 2741 console has no diacritical marks, these were left out; however each card having them was entered and given a special tag to permit retrieval at a later date when diacritical marks could be added by special coding such as used by marc. conversion consoles and shelf list were in the same building. each day, several inches of cards were removed from the drawer being processed and a marker inserted indicating where the cards had gone. in general operation, cards were returned and refiled in less than a day so that inconvenience to staff was minimal. as a card was proofread, it was marked on the back with a "c" and the upper right hand corner received a very small notch with a mcbee punch. thus, newly cataloged cards filed with cards already converted are recognizable by the unnotched corner.

costs

table 1 gives a statistical summary of the conversion project from july 31 through december 1, 1967. the term "l.c. card" refers to a complete bibliographic entry for a title and may include more than one physical card, or may include writing on the back of a card. input and correction functions are reported separately and then totaled to give a realistic input rate per hour for corrected cards. supervisor cost reflects wages of clerical supervisors only. those of the programmer-analyst, the librarian and the systems analyst assigned to the project are not included. a breakdown of monthly equipment costs per console is given in table 2. installation costs were $150 for each terminal, and $50 for each leased telephone line. when the project operated four consoles, the monthly equipment cost was $4,472.

table 1. conversion project statistics (july 31-dec. 1, 1967)

    input, proofreading and correction
      total l.c. cards input                                   49,348
      typist hours input                                        3,035
      typist hours correcting                                     492
      total typist hours                                        3,527
      proofreading hours                                        1,235
      number of errors per l.c. card                              .42
      l.c. card input rate per hour                              16.3
      l.c. card correction rate per hour                          100
      overall conversion rate (input & correction), cards per hour 14
      proofreading rate, cards per hour                            40

    costs
      labor cost @ $1.75 per hour                           $ 8,078.00
      equipment and supervisors                              18,995.00
      total cost                                            $27,073.00
      cost per card converted                                    $0.55

    utilization of console time
      hours typed                            3,381              81.4%
      hours consoles down                      245               5.9%
      hours computer down                       91               2.2%
      hours lost time                          438              10.5%
      total                                  4,155             100.0%

table 2. monthly operational costs per terminal

      ibm 2741 communications terminal                         $ 85.00
      western electric 103a data set                             27.50
      24-hour voice-grade lease line to cleveland,
        plus local telephone costs                              385.50
      2 datatext agreements @ $310                              620.00
      total                                                  $1,118.00

"hours typed" is time that consoles were actually being used to input or correct cards. this is slightly less than "typist hours worked" because some correction had been delayed, but it was included in hours worked to give a true representation of input rates. "hours consoles down" reflects time lost due to console breakdown. during the early part of the period, two consoles were failing often. however, as operating problems were solved, console down-time dropped far below the average 5.9 per cent shown.
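the derived figures in table 1 are internally consistent, which a short arithmetic check confirms before the remaining utilization notes resume; the calculation below uses only values taken from the table itself.

    # reproducing the derived figures in table 1 from the raw counts
    cards       = 49_348     # total l.c. cards input
    input_hrs   = 3_035      # typist hours input
    correct_hrs = 492        # typist hours correcting
    proof_hrs   = 1_235      # proofreading hours
    typed_hrs   = 3_381      # console hours actually typed
    wage        = 1.75       # hourly labor cost

    print(round(cards / input_hrs, 1))               # 16.3 cards input per hour
    print(round(cards / correct_hrs))                # 100 cards corrected per hour
    print(round(cards / (input_hrs + correct_hrs)))  # 14 cards converted per hour overall
    print(round(cards / proof_hrs))                  # 40 cards proofread per hour
    print((typed_hrs + proof_hrs) * wage)            # 8078.0 dollars of labor
    print(round((8_078.00 + 18_995.00) / cards, 2))  # 0.55 dollars per card converted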
"hours computer down" was also greater during early weeks of the project. however, for each hour down, ibm credited the library with $12.00 ( $3.00 per terminal for four terminals). "hours lost time" reflects periods when a working console could not be manned because of personnel breaks or operator absence. all times are given in console-hours, four consoles operating for one hour being recorded as four hours. the error rate of .42 errors per card is very low. allowing 350 characters per shelf list card, typists were making one error for every 830 keystrokes. this translates to about 3 errors per typewritten page of 50 characters per line, 50 lines per page. the office of secretarial studies of suny at buffalo indicates that this rate is well within the tolerance for "normal" typing, as in a typing pool. when it is considered that typists were tagging and inputting complicated bibliographic information, rate of accuracy was commendably high. typists used in the project included the lowest salary grade of civilservice typists, part-time hourly workers, and students. an acceptable input rate for civil service typists was 18 cards per hour, which is equivalent to 21 5-character words per minute. the faster typists, at 26 cards per hour, were typing at 30 words per minute. again, let it be mentioned that the material was complex and that typists were required to tag each piece of information. conclusions several points can be made about converting with datatext. it was easy to implement and received excellent support from ibm. the ibm information marketing staff in cleveland provided constant assistance during the early part of the installation and visited often once the project was successfully underway. ibm sent the datatext instructor as often as needed and provided free computer time during teaching sessions. the four long-distance telephone lines and data sets proved reliable. there was only one instance during the period when a line was inoperable and it was repaired in three hours. the liaison and support from new york bell telephone was very good. datatext costs would have been lower had the ibm installation been nearer. cleveland is 173 miles from buffalo giving a 24-hour leaseline cost of $342 per month. (datatext service will soon include a uniform long-distance-lines cost.) verification or correction on datatext does not require human retyping of each line of entry. only the word in error and its replacement need be typed; the console then types the corrected line to show that the error was deleted and the replacement inserted. consequently correction costs are low and corrections accurate. 226 journal of library automation vol . l / 4 december, 1968 average rates and costs given in table i reflect learning during the first six months of the project. towards the end of the reported period, rates were improving and costs decreasing. since december 1967, the project has added three more consoles and uses a datatext service provided by a campus computer. costs have dropped below $.45 per card, a figure which will increase somewhat when diacriticals are added. potentially cost per title for complete conversion is under $.50. references 1. chapin, richard e.; pretzer, dale h.: "comparative costs of converting shelf list records to machine readable form," journal of library automation, 1 (march 1968), 66-7 4. 2. hammer, donald p.: "problems in the conversion of bibliographic dataa keypunching experiment," american documentation, 19 (january 1968), 12-17. 3. black, donald v. 
: "creation of computer input in an expanded character set," journal of library automation, 1 (june 1968), 110-120. 4. buckland, l. f.: recording of library of congress bibliographical data in machine readable form (rev. ed.; washington, d.c.: council on library resources, 1965). 5. the johns hopkins university. milton s. eisenhower library: progress report on an operations research and systems engineering study of a university library (baltimore: johns hopkins, 1965). 6. international business machines corporation. federal systems division : report of a pilot protect for converting the pre-1952 national union catalog to a machine readable record (rockville, maryland: ibm, 1965). 7. lazorick, gerald j.; herling, john; atkinson, hugh: conversion of shelf list bibliographic information to machine readable form and production of book indexes to shelf list (buffalo, n.y.: state university of new york at buffalo, technical information dissemination bureau, 1966). 8. mohawk data sciences corp.: datagram no. 35, 1181 twk correspondence data-recorder, (herkimer, n.y., mohawk data sciences corp., 1967). 9. international business machines corporation: datatext operators instruction guide, form # j20-0010-1 (ibm, white plains, n.y., 1967). 10. u.s. library of congress, information systems office: a preliminary report on the marc (machine readable catalog) pilot protect (washington, d.c.: library of congress, 1966). 11. michael m. coffey: reference manual for typists and proofreaders. sunyab shelf list conversion project (buffalo, n.y. : suny at buffalo, technical information dissemination bureau, 1968). a hybrid access method for bibliographic records abraham bookstein: the university of chicago graduate library school, chicago, illinois. 97 this paper defines an access method for bibliographic reco1'ds that combines features of the sea1'ch key app1'oach and the inverted file approach. it is a refinement of the search key technique that permits its extension to la1'ge files. a method by which this approach can be efficiently implemented is suggested. introduction a major problem in the development of computerized files of bibliographic records is the creation of a convenient and economical mechanism to access the records. as the problem of organizing a file for efficient access is a general one, a number of structural devices have been suggested. hsiao and harary propose an abstract model for file structure that encompasses those that are discussed most frequently. 1 lefkovitz discusses these techniques in more detail and considers the advantages of each for implementation, while dodd and knuth describe the data structures needed in implementing such files. 2-4 these works reveal the interrelation between a file's organization and its retrieval capability, but the determination of which routes of access to provide must be the task of those responsible for creating the file. such a determination may involve consideration of both the intrinsic structure of the items represented by the file and the conditions under which the file is to be used. they will influence which file organization should be chosen. because of the complexity inherent in collections of bibliographic " items, the problem of determining suitable access routes to library files has been a challenging one. almost any datum may, on some occasion, be a useful means of entering the file. 
dimsdale and heaps, in their discussion of a file structure for an on-line catalog, explicitly propose words from the title, authors, and library of congress call numbers. 5 in this paper we shall consider the problem of accessing a known item by means of information contained in the author and title field. we shall concentrate on two approaches that have received much attention-the use 98 i ottrnal of library automation vol. 7/2 june 197 4 of a truncated search key, referred to simply as search key, and the use of boolean expressions of key words from the title. both of these are intended to allow a user simple entry into the file when the full field of information is long, complicated, or, perhaps, incompletely known by the user. the authors and titles of books often share these characteristics. each of the approaches, taken by itself, has its strengths and its weaknesses. we will discuss each technique in tum, and then suggest an elaboration of the search key technique that incorporates some features of the boolean search technique; this combination of techniques should enable systems that are committed to the use of search keys as a primary access route to extend this technique to large files. it introduces into the search key approach some of the flexibility of the key word approach. search keys this approach defines at least one special field, the search key, for each item represented in the file, and allows retrieval of the record for an item by inputting the value of its search key. 68 the search key should be constructed so as to allow its evaluation from data that are available at the time of access. the main advantage of this approach as it is usually implemented has been its great simplicity-for a broad variety of materials, the key can be readily evaluated and quickly entered into the system. the most heavily discussed defect of this approach is that it will sometimes retrieve a considerable number of records to a single request. consider, for example, these works: 1. ramsay, blanche margaret. relation of various climactic factors to the growth and development of sugar beets, and 2. ramsey, ian thomas. religious language. the popular ( 3, 3) search key, constructed by concatenating the first three letters of the author's name and the first three letters of the first significant word of the title, would represent each of these by the key ram, rel. this defect becomes particularly severe with certain corporate entries and works such as conference proceedings. furthermore, this difficulty can be expected to become aggravated as the file increases in size or, equivalently, as some items are given multiple search key values; the latter may be required in order to alleviate the problems inherent in having to access items with ambiguous or multiple forms of titles. attempts to remedy the difficulty of multiple retrievals have resulted in increasingly complex keys, defeating the purpose for which this technique was originally proposed. a more complex key makes greater demands on the user, encourages mistakes on entry, and also might increase the likelihood of two individuals deriving different keys for the same item. inverted files in this approach, a user attempts to retrieve a record by forming a hyb1·id access methodjbookstein 99 boolean expression of key words taken from various fields of the desired record. 9• 10 stanford university's ballots, for example, allows the user to enter the file by means of words taken from the title of a book. 
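before weighing the two approaches against each other, a minimal sketch makes the collision problem of the (3,3) key described above concrete. the derivation rule used here -- lowercase the author, skip leading articles in the title, take three letters from each part -- is an assumed reading of the key as described, not the code of any particular system.

    # illustrative (3,3) search key: first three letters of the author's name
    # plus first three letters of the first significant word of the title.
    ARTICLES = {"a", "an", "the"}          # assumed stop list for illustration

    def search_key(author, title):
        first = next(w for w in title.lower().split() if w not in ARTICLES)
        return author.lower()[:3] + "," + first[:3]

    print(search_key("Ramsay, Blanche Margaret",
                     "Relation of various climactic factors to the growth "
                     "and development of sugar beets"))             # ram,rel
    print(search_key("Ramsey, Ian Thomas", "Religious language"))   # ram,rel
    # both works reduce to the same key, the ambiguity discussed above

key word access through an inverted file sidesteps this ambiguity, as the comparison that follows brings out.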
two advantages of this approach as compared to the search key are that: (a) the user need not know the information required to form a search key, for example the first word of the title; and (b) the user is able to enter the system by what appears to him to be the most distinctive terms in the title, thereby minimizing false drops. users of ballots have found that because of the speed at which computers operate, usually the indexes can be manipulated and a record retrieved immediately, or in a very short period of time. fayollat gives an estimate of two to five seconds as the response time. the most direct way to implement this approach would be to access each record in the file and compare it to the request. for any but the smallest files this would be unreasonably costly in computer time. an alternative, and customary, implementation involves maintenance of indexes of key words. while experience with this approach, as at the ballots project, recommends this as a workable implementation, it can be costly in terms of the computer costs involved with upkeeping the indexes. hybrid approach we offer for consideration an elaboration of the search key approach that incorporates aspects of the key word approach. it is intended as an alternative to developing increasingly complex keys for systems adopting a search key approach, but for which a simple search key retrieves too many items; possibly this approach can be selectively applied to the more troublesome parts of the file, such as to items with corporate authors. this approach associates a search key with each record, hopefully one that is simple and easily derived. a user would begin by entering into the system the search key. if the system finds that the number of items that would be retrieved exceeds a preset threshold, it would output a message requesting that the user enter a set of key words taken from various fields in the records; the title would be very useful in this regard. the system first generates a subfile of records having the desired search key. if a hashing technique is used, constructing this subfile can be accomplished quickly and at relatively little cost in space for tables. 11 once the smaller file is formed, a complete search of the full records can be made for the key words. since the system operates in two phases, it is less sensitive to the number of records the search key retrieves as far as user considerations are concerned. ease of use becomes the dominating objective in designing the search key. experience to date suggests that even a very simple search key will almost always produce less than thirty records with files having in the order of 100,000 records. however, a complete search of a reduced file of thirty records should be feasible; in fact, usually the subfile will be no larger than two or three records. from one point of view, in the hybrid system, we can 100 journal of library automation vol. 7/2 june 1974 think of the search key not as an access mechanism, as earlier, but rather as a file reduction mechanism. this system trades the cost of maintaining and storing large indexes for an increase in costs of computer processing; only relatively easily maintained hash tables for fixed length search keys need be maintained. an accurate assessment of these costs can be made only after the statistical characteristics of various search keys have been explored. observations if it should be desired to implement a hybrid system, the following observations would be in order: 1. 
among the current concerns of facilities with large bibliographic files is file compaction. if records will have to be searched for key words, this consideration will influence planning of compaction techniques. for example, a technique such as cop ack, which completely scrambles the bits in a record, would not be permissible. 12 use of variable length codes for characters, such as in hoffman coding, would allow searches for key words; most likely such a search would be implemented by attempting to match substrings of bits rather than matching on the full word level.13 another common compaction technique, bigram coding, would also complicate the separation of words unless the blank were prevented from combining with other characters; because of the frequency with which the blank occurs with other characters, this restriction would interfere considerably with the efficacy of the technique.u a different approach would be to recognize that each word could have only two "spellings," depending on what happened to the blank preceding the word, and both spellings could be tested. (a brief survey of the above compaction techniques has been conducted by fouty.15 ) 2. though a complete search for key words would be feasible on a small file, it is possible to expedite the search considerably by means of a technique devised by malcolm harrison, which involves adding a fixed number of bits, or signatures, to each field on which a search can take place; these additional bits are derived in a well-defined way from the original field. 16• 17 this subfield is a fixed-size representation of the full field in a form that can be used to very rapidly eliminate most records which would not pass the key word matching test. it is stored in the index to the file along with the address of the record. though this preliminary test is not foolproof, it could considerably reduce the size of the subfile that requires a more costly complete search, thereby reducing the number of disc accesses. if this procedure is adopted, a possible sequence of events would be as follows: (a) a user inputs a search key and, perhaps, a couple of key words. these may be words he is certain are in the title, although the hybrid access methodjbookstein 101 name of a series, the author, or subject headings would also represent candidates. (b) on the basis of the search key the system creates a sub file of record addresses and signatures taken from the index-if the user is unfortunate the subfile would have a large number of records. (c) a rapid preliminary search of the signatures using the harrison technique is made of the reduced file to test whether the key words could possibly be part of a record. this pass eliminates a number of records; how efficient this technique is will depend on the number of bits the system associates with each representative field. (d) finally, the full records of the remaining items are retrieved and a full search is made. at any point, if the subfile is too large, the system may request additional key words. example of technique implementation how to create a signature for a record is best explained by means of an example. many variants are possible, and we have chosen a simple one for the purposes of illustration. the signature we shall create will consist of one word of thirty-two bits. we proceed as follows: 1. list all the substantive words of the title, e.g., relation, various, climactic, ... , beets, if we consider one of the titles mentioned above. 2. 
truncate each word to, say, the first four characters: rela, vari, ... , beet. other truncation sizes, or no truncation at all, may be elected. 3. for each string of characters produced in this way, form the two consecutive strings of three characters. for example, "vari" contributes "var" and "ari." since the first word is already represented in the search key, we may use only the second three-letter string for that word-here "rela" is represented only by "ela." implicit in this implementation is the assumption that if a user remembers anything about a word, he will correctly remember at least its first three characters, and that the first four characters go a long way toward giving the word away. 4. finally, we turn on a bit in the signature for each three-character string, essentially creating a hash code of thirty-two bits. the code should incorporate information from all three characters. for purposes of illustration, the following method will suffice: (a) for each letter in a three-letter string, substitute the rank of that letter in the alphabet, beginning with 01 for a-thus "ela" becomes 05,12,01; (b) consider the string of digits as a single six-digit number, and multiply that number by 1111-thus "ela" becomes 51201 and then 56884311; (c) divide by 32 and use the remainder as the address of the bit which is to be turned on. the string "ela" is thus associated with bit number 23, where the leftmost bit is the oth bit. as the algorithm is 102 journal of library automation vol. 7/2 june 1974 applied to each three-character string, the signature is formed. the book by blanche margaret ramsay is accordingly represented by: 01000011100100011000010100100101 similarly the book by ian thomas ramsey is represented by: 00000000000000010000000001000010 suppose a patron, or a cataloger, wishes to see the record associated with mr. ramsey's book on religious language. he would enter the search key, ram,rel, and, say, the word "language." among the index entries reb·ieved by the search key will be the desired book, and also the book by ramsay, dealing with sugar beets. the signature for the word "language" has bits numbered 30 and 25 turned on. since the ramsay book does not have both of these turned on (in this case neither bit is turned on), it is immediately eliminated; the actual records retrieved from the file will be only those for which both bits are on. though it is quite possible that false drops can be incurred in this way, clearly many incorrect records are easily eliminated. note also that the user need input only as much of the word as he has confidence in, provided that at least three characters are produced. use of the above technique leaves a number of decisions that still must be made by the system designer. among these are: 1. should a signature be associated with each item, or only a part of them, for example, with corporate authors? 2. how much truncation is appropriate, if any? if no truncation is used, then the user can input fragments of words, including fragments taken from the middle of a word, as well as full words. on the other hand, as the signature fills up, the probability of a false drop increases. earlier research contains a formula that allows us to estimate this effect.18 consider a title with six significant words. 
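the signature construction in steps 1 through 4 can be written out directly. the sketch below is an illustration of the procedure as described (letter ranks, multiplication by 1111, remainder modulo 32, leftmost bit numbered 0), with an assumed list of non-substantive words; it reproduces bit 23 for "ela" and the bit pattern shown above for the ramsey title.

    # signature construction following steps 1-4 above (illustrative sketch)

    def bit_for(trigram):
        # letter ranks (a=01 ... z=26) -> six-digit number, times 1111, mod 32
        n = int("".join(f"{ord(c) - ord('a') + 1:02d}" for c in trigram))
        return (n * 1111) % 32

    STOP = {"a", "an", "and", "of", "the", "to"}   # assumed non-substantive words

    def signature(title):
        sig = 0
        words = [w for w in title.lower().split() if w not in STOP]
        for i, word in enumerate(words):
            stem = word[:4]                        # truncate to four characters
            trigrams = [stem[:3], stem[1:4]]       # the two consecutive three-letter strings
            if i == 0:
                trigrams = trigrams[1:]            # first word is already in the search key
            for t in (t for t in trigrams if len(t) == 3):
                sig |= 1 << (31 - bit_for(t))      # leftmost bit is the 0th bit
        return sig

    print(bit_for("ela"))                          # 23, as in the worked example
    print(f"{signature('religious language'):032b}")
    # 00000000000000010000000001000010 -- the pattern given above for the ramsey book

with the construction in hand, the false-drop estimate for the six-word title just posed continues below.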
fayollat has found that in a file of biomedical serials, about 83 percent of all items will be of this size or less.19 similarly, let us assume that the average word in the title is made up of eight characters, a modal number of characters in fayollat's data base. if the user requests a term composed also of eight characters, then table 1 estimates the probability of a false drop as a function of the truncation size.

table 1. probability of false drops as a function of truncation size.

    truncation length    probability of false drop
    3                    .17
    4                    .10
    5                    .08
    6                    .08
    7                    .08
    8                    .09

it is seen that for this typical case, the method eliminates about 90 per cent of the false drops. it must be understood that longer titles, or titles made up of longer words, will be more likely to be erroneously retrieved; on the other hand, the user can increase his precision by inputting a larger number of terms. the above calculation assumes that terms in the request and in the title are independent; of course, all items having the same search key as the request and sharing the discriminant word will be retrieved; presumably the user will minimize this effect by choosing distinctive words. fayollat finds that 50 percent of the words appearing in his titles occur only once.

conclusion

in conclusion, we propose a technique for entering a bibliographic data base that retains the simplicity of search keys while also including some of the flexibility that boolean expressions of key words have for uniquely defining an item. in such a system, the only indexes that must be maintained are the hash tables; the other indexes, such as title words, are replaced by the search algorithms. if a signature, the supplementary field described above, is also stored in the index, this approach reduces the number of disc accesses. a major limitation of this approach is that a user must be able to provide a search key; this is shared, however, with systems depending exclusively on search keys. furthermore, since the system is capable of handling larger numbers of returns on the search key, there is greater inducement to associating more search key values for each item. thus such a hybrid system allows groups that find search keys an attractive access technique to extend this approach to file sizes which strain the capacities of the direct approach.

references

1. d. hsiao and f. harary, "a formal system for information retrieval from files," communications of the acm 13:67-73 (feb. 1970).
2. d. lefkovitz, file structures for on-line systems (new york: spartan books, 1969).
3. g. dodd, "elements of data management systems," computing surveys 1:117-35 (june 1969).
4. d. knuth, fundamental algorithms, the art of computer programming, vol. 1 (new york: addison-wesley, 1968).
5. j. j. dimsdale and h. s. heaps, "file structure for an on-line catalog of one million titles," journal of library automation 6:37-55 (march 1973).
6. f. g. kilgour, p. l. long, and e. b. leiderman, "retrieval of bibliographic entries for a name-title catalog by use of truncated search keys," proceedings of the asis 7:79-82 (1970).
7. p. l. long and f. g. kilgour, "a truncated search key title index," journal of library automation 5:17-20 (march 1972).
8. a. landgraf, k. rastogi, and p. long, "corporate author entry records retrieved by use of derived truncated search keys," journal of library automation 6:156-61 (sept. 1973).
9. james fayollat, "on-line serials control system in a large bio-medical library. 104 journal of library automation vol.
7/2 june 1974 part ii. evaluation of retrieval features," journal of the asis 23:353-58 (nov.dec. 1972). 10. a. h. epstein et al., articles in proceedings of the asis 10 ( 1973). 11. a. bookstein, "double hashing," journal of the asis 23:402-5 (nov.dec. 1972). 12. b. a. matton and p. a. d. de maine, "automatic data compression," communications of the acm 10:711-15 (nov. 1967). 13. w. d. maurer, "file compression using hoffman coding," in computing metho.ds in optimization problems 2, from second international conference on computing methods in optimization problems (new york: academic press, 1969), p.242-56. 14. w. d. schieber and g. w. thomas, "compaction of alphanumeric data," journal of library automation 4:198-206 (dec. 1971). 15. gary fouty, unpublished master's thesis, university of chicago. 16. m. harrison, "implementation of the substring test by hashing," communications of the acm 14:777-79 (dec. 1971). 17. a. bookstein, "on malcolm harrison's subsb·ing testing technique," communications of the acm 16:180-81 (march 1973). 18. ibid. 19. fayollat, "on-line serials control system." 20 file organization of library records i. a. warheit: international business machines corporation, san jose, california library records and their utilization are described and the various types of file organization available are examined. the serial file with a series of inverted indexes is preferred to the simple serial file or a threaded list file. it is shown how various records should be stored, according to their utilization, in the available storage devices in order to achieve optimum cost-performance. one of the problems data processing people are beginning to face is the organization of library files. these are some of the largest and most voluminous files that will have to be organized, maintained and searched. they range in size from the national union catalog of the library of congress, which has over sixteen million records with an average of three hundred characters each, down to the hundreds of small college catalogs of 100,000 records. there are more than fifty universities whose holdings range from one million to over eight million volumes. the average holdings of library systems serving cities of 500,000 or more exceed two million volumes, although the actual number of titles is less. since the tum of the century the university libraries have been growing exponentially and at present are doubling, on the average, every fifteen years. , also the abstracting-indexing services, whose records are very similar to library catalog records and are used in much the same way, have grown very large. chemical abstracts which has been operating since 1907, now has over three and a half million citations. it provides data on file organization of library records/w arheit 21 some three million compounds and is today adding over a quarter of a million citations each year. if the present rate of growth continues, it will be adding 400,000 citations a year by 1971. index medicus and biological abstracts are very similar and there are a number of other somewhat smaller bibliographic services in the field of metals, engineering, physics, petroleum, urban renewal, atomic energy, meteorology, geology, aerospace, and so on. in addition, library-type file maintenance, organization and search are being applied to medical records, adverse drug reaction reports, intelligence files, engineering drawings, museum catalogs · and the like, and these too, represent very large information retrieval files. 
in other words, library files are very widespread and are beginning to become a problem for data processing. characteristics of files the aforementioned library files have certain common characteristics. first, as already noted, they are large. in the next ten or fifteen years there will probably be several hundred libraries with holdings exceeding one million volumes each. second, the records themselves are alphabetic and tend to be voluminous. they range from two hundred characters in an index journal, to three hundred characters for the standard catalog card up to two thousand characters for the abstract journals. in 1962 the library of congress, for example, estimated that it would need a file exceeding 9 x 108 bits to do its normal library processing and to store the serial records; it would need a file of 1.3 x 109 bits to store the circulation records and location directory and monitor the use of the collection, and would need a file of 1012 bits for the central catalog and the catalog authority files ( 1) . on the basis of library experience since 1962, these figures are generally considered too low. third, file records are variable in length. the librarian cannot control his inputs. the world's publica,tions appear in every shape, form and identity and they must be recorded the way they have appeared so that they can be properly identified. artificial identification such as book numbers, call numbers, coden numbers for journals and the like are simply parochial conveniences and do not replace the actual bibliographic record. records in a large catalog file are generally stable and not dynamic. if there is a new edition of a document, a new bibliographic record is made. if the old document is retained along with the new edition, the old catalog record is also retained. the record is discarded only if the document is discarded and, in the large research library, this occurs very infrequently. new indexing or cataloging is seldom applied to old records. in contrast, the smaller item record file used for acquisition and processing, the circulation file, and the serials records file, all ranging from 10,000 to 100,000 records, are dynamic records requiring many and frequent changes, additions and deletions. _ 22 journal of library automation vol. 2/1 march, 1969 each record item must have a number of different access points, since a single class or access point which everyone will accept is an impossibility. at present, with conventional library cataloging, card catalogs and printed indexes provide about five or six access points or records per title. however, computer systems, with their greater opportunity to do deeper indexing, are providing from ten to twenty keys or access points per title. distribution of index tenns is very uneven and not predictable. a few terms have a great many postings or addresses, while many terms, notably author entries, have only one or two postings. file segmentation by subject class has been proposed by some data processing personnel, but inter-disciplinary needs are such that subject segmentation is not considered very seriously. file segmentation by date, especially for the abstract services, is increasing in popularity. it is generally thought that major activity, in the technologies especially, is concentrated in current records; this is less true, however, in the sciences and even less in the humanities. 
public library and undergraduate library personnel may not object to segmenting their files, but those librarians responsible for major research collections that cover all disciplines do not look with favor on segmented files. although circulation records do provide some clues as to the activity of the various parts of a library's collection, no one really knows what the search activity in the catalog is, or how it is distributed across the various records used. therefore, since every record is considered permanent in libraries, major effort has been expended on input processing which has included the recording of much material whose utility is questionable. a user wants to access files in open language, and wants to receive response in open language; he will not use codes and so-called machine language and will tolerate only a minimum of training on methods to interrogate the file. he prefers to engage in an actual dialogue with the file and if he cannot do this will ask a reference librarian or reader's advisor to find the references for him. he also wants real-time response. if he doesn't get fairly prompt answers, he will go elsewhere to satisfy his informational needs. types of files the librarian must work with a number of files: 1) the item record file is the record of an item, book, journal, report. etc., that is being ordered, is on order, is being received, or is being processed by the cataloger. 2) the catalog file is the permanent bibliographic and subject record of the item that has been processed by the cataloger. 3) the serial~ record file, which is in two parts, is the record of holdings of completed volumes both bound and unbound, and the check-in record of currently received periodical issues. 4) the circulation control file keeps the record of all items loaned or otherwise charged out. 5) the catalog authority file organization of library records/w arheit 23 file is the thesaurus-like vocabulary control which indexers and catalogers use as their authority list and guide in assigning index terms. it is also used to "normalize" the inquiries of a searcher and convert them to legitimate index terms. the librarian is also concerned with a number of indexed abstracts produced by various discipline oriented institutions which are used in libraries. he also uses a number of special files: borrower or patron file, special collection files, location files, vendor files, and the like. except for a few comments about the item record, this discussion is confined primarily to the catalog file, which is by far the largest file and, for the librarian and the general user, the most important. as already noted, in most respects it is very similar to the indexed abstract file and, in fact, in certain special libraries, these two files are combined. in process file the in process, or item record, file consists of records of all items which the library is acquiring and processing. it is not a very large file, or, at least if properly policed, should not be. unfortunately, because in manual systems it is difficult continuously to follow up outstanding orders, a lot of deadwood accumulates and files become unnaturally large and difficult to handle. in a well controlled file, however, the number of records does not grow appreciably, for, although new items are added, processed titles are removed when they are added to the catalog file. 
· in addition to providing such normal bibliographic access points as personal author, corporate author, title, report nmj}ber and the like, the item record may also be searched by a number of specialized keys: order number, vendor, publisher, journal code, contract number, fund, requester. the item record is very dynamic. information available to the librarian when the order for an item is placed may be faulty. new information will be coming in about the item, such as price, shipping costs, invoice number, change in vendor, and change in title. various funds have to be charged and obligations changed, payments authorized, funds decremented, receipt notices prepared and sent to requesters, flags in various files changed to prevent duplicate orders and the bibliographic record transmitted to the cataloging staff. however, once an item has been received and cataloged, only the bibliographic information (author, title, place, publisher, date, pagination) are retained and the rest of the information is retired to an historical file. ( 2). because it would provide greater flexibility as new and unexpected demands are generated, the best way to handle this dynamic file would be with a generalized data management system rather than with a tailormade acquisitions and processing program. although present data management systems are really not suitable, because of variable length records in item record files and because terminals will be used, it appears that some could be adapted. 24 journal of library automation vol. 2/1 march, 1969 catalog file the tendency today, however, is to build a single master file with various functional fields where bibliographic information, ordering, and purchasing data, loan records, location information and other item control data are stored. how should this very large master catalog file be organized so that it will be easy and economical to maintain and provide all the desired search capabilities? there are three basic file organization schemes in use today for information retrieval: the serial file, the inverted file and the list process file ( 3,4,5). actually, from a technical point of view, both the inverted file and the list process file represent two different classes of list structures and are, therefore, sometimes referred to as the inverted list system and the threaded list system. serial file organization although the serial file is the easiest and cheapest to maintain, the librarian obviously cannot accept purely serial searching of his catalog. the file is much too big and the real time requirements are such as to rule out any but the shortest, simplest serial or sequential search. as will be pointed out later, the librarian does need some serial searching capability, and of course he does need it if he wants to do any browsing. however, if he is to provide any kind of useful service, he must use direct-access storage devices and access to his records individually. threaded list file organization for a while there was some interest in using a threaded list file organization for the catalog file. here, the searcher is first directed through a dictionary or directory to the latest record associated with a term. this record also contains the chain address of the previous record having the same descriptor, so that a user can run through a "chain" or "list" until he reaches the oldest or last record, or comes back full circle to the starting record. 
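the chain mechanism just outlined can be sketched in a few lines. the fragment below is an illustration of the general threaded-list idea rather than of any particular system: the directory holds, for each descriptor, the address of the newest record posted under it, and each record carries one backward link per descriptor.

    # illustrative threaded-list organization: directory -> newest record,
    # each record chains back to the previous record with the same descriptor.

    class ThreadedFile:
        def __init__(self):
            self.records = {}      # record number -> (data, {descriptor: link to previous})
            self.directory = {}    # descriptor -> number of the latest record posted

        def add(self, number, data, descriptors):
            links = {d: self.directory.get(d) for d in descriptors}
            self.records[number] = (data, links)
            for d in descriptors:
                self.directory[d] = number       # this record becomes the head of the list

        def search(self, descriptor):
            # walk one chain from the newest record back to the oldest
            number = self.directory.get(descriptor)
            while number is not None:
                data, links = self.records[number]
                yield number, data
                number = links[descriptor]

    f = ThreadedFile()
    f.add(1, "title a", ["agriculture", "sugar beets"])
    f.add(2, "title b", ["agriculture"])
    print(list(f.search("agriculture")))         # [(2, 'title b'), (1, 'title a')]

a request on two descriptors, by contrast, must walk a complete chain before their intersection is known, a limitation the discussion returns to below.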
each record belongs to a number of lists, one for each descriptor used to describe it, and there are as many lists as there are descriptors. such a system seems economical of storage space in that a secondary or separate index does not have to be stored, but, since storage space for the chain or link address has to be provided, the actual savings are very small. there are several possible refinements of this list file organization which reduce storage costs. some involve elimination of redundant information; a term, or any other searchable piece of information, is stored just once, sometimes in the form of a table. each record that contains searchable information has a pointer to the term itself. there have to be, of course, pointers from every term back to the records as well. insofar as the pointers may require fewer bits than the terms or addresses themselves, there is a saving in storage space. it does cost some additional processing time and file maintenance is somewhat complicated. ( 6). file organization of library records/w arheit 25 another economy measure is provided by what is generally called a multilist system which groups several-usually three-descriptors into one super key with one chain address. a multilist not only saves space but also speeds both file posting and searching by processing multiple descriptors simultaneously. ( 7,8,9,10,11). such a system, to be workable, must permit grouping of various descriptors into mutually exclusive groups, and within each group there must be some equitable distribution of descriptors posted to records. in normal library information retrieval applications, a very large percentage of the descriptors are used just for one or two documents and only a few descriptors are used to identify a large number of document records. in other words, most of the so-called super keys end up having just a single real descriptor, which is equivalent to establishing a separate list for each descriptor. in a test made with the defense document center collection it turned out that about ninety percent of the super keys had only single descriptors. ( 12,13). there are, in addition, special modifications of multilist files which essentially involve segmenting the multilist to fit the hardware, for example, the track length or cylinder size. (14). a fragmented sub-list, sometimes referred to as a cellular multilist, may even contain all the link addresses in the directory, thus becoming indistinguishable from an inverted file. any list process file organization, however, does pose serious file maintenance problems, especially where individual records must be changed or deleted. also special precautions must be taken to avoid broken chains and provision made to repair breaks, although some advocates of list process files claim it is easier to maintain thread~d lists than inverted lists. of course, if multilists are used, a special effort must be made to build the super keys. · it must not be forgotten that a threaded list directory can only provide the search statistics for a single term and, unlike the inverted list, can only provide intersection statistics upon completion of a total search. the few librarians who have been exposed to threaded list file organization have not reacted favorably. a few have been interested in applying this technique to do hierarchical searches and other relationship connections in their authority lists or thesauri, but have not seriously considered using it for their catalog files. 
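the chained organization described above, a directory pointing to the most recent record posted under each descriptor, with each record carrying the address of the previous record under the same descriptor, can be sketched briefly. this is an illustrative python sketch of the threaded-list idea only; the record layout is an assumption, not any particular system cited here.

```python
# minimal sketch of a threaded (chained) list file: the directory holds, for
# each descriptor, the address of the most recently added record; each record
# holds, for each of its descriptors, the address of the previous record
# posted under that descriptor (or None at the end of the chain).

class ThreadedListFile:
    def __init__(self):
        self.records = []    # record "addresses" are list positions
        self.directory = {}  # descriptor -> address of latest record

    def add(self, title, descriptors):
        address = len(self.records)
        links = {d: self.directory.get(d) for d in descriptors}
        self.records.append({"title": title, "links": links})
        for d in descriptors:
            self.directory[d] = address  # new record becomes the head of the chain
        return address

    def search(self, descriptor):
        # follow the chain from the newest record back to the oldest
        address = self.directory.get(descriptor)
        while address is not None:
            record = self.records[address]
            yield record["title"]
            address = record["links"][descriptor]


if __name__ == "__main__":
    f = ThreadedListFile()
    f.add("book one", ["chemistry", "kinetics"])
    f.add("book two", ["chemistry"])
    print(list(f.search("chemistry")))  # ['book two', 'book one']
```

deleting or changing a record in such a file means finding and repairing every chain that passes through it, which is exactly the maintenance problem noted above.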
inverted file organization the traditional library file organization as exemplified by the standard card catalog has been based on a serial main file plus an inverted file. here a normal serial file is "inverted" and the file sequenced by index entry or key. the record itself is duplicated under each of its keys, which librarians call tracings. by strictly limiting the number of tracings or keys applied to each record, the librarian can keep the card catalog down to a reasonable size. however, as deeper indexing is applied to the documents, more keys or tracings are used and the file becomes very large. 26 ] ournal of library automation vol. 2/1 march, 1969 furthermore, storage costs in the mechanized file are appreciably higher than in an ordinary manual card file. the full record, therefore, in a mechanized system cannot be economically stored behind each term. only the document or record number or file address of the master record is recorded after each term; in other words, the inverted file is just an index to the record file. the main record file itself is a simple serial file where each record is complete in itself, the tracings or keys in the record and the address of the record being duplicated on the inverted file. the catalog file, therefore, is made up of two parts: a serially organized main or master record file, and an inverted index to the main file. ( 15). maintenance of an inverted index is expensive. tracings and the addresses to which they refer have to be duplicated, requiring costly additional storage space. new terms and new addresses cannot simply be added to the end of a file but must be distributed and interfiled throughout tl1e index, causing a number of file maintenance problems. the inverted index and main serial file must be kept in phase, with changes in one being reflected in the other. to maintain these files, separate inputs should not be prepared; instead the inverted index should be generated from the main record file update by program control. ( 16,17,18,19). although the combined file organization of a serial record file and an inverted index does cost more to maintain than serial or list file organization, it provides such superior search capabilities that it has become the favored library catalog file organization. since the inverted file is organized by subject headings or descriptors and since a search request is specified by listing the desired descriptors and their logical relationships, the search programs need only examine the items filed behind each selected descriptor or subject heading. it is unnecessary to look at all the records, as it is with the serial file. the inverted file search, in · its basic form, takes the request descriptors, obtains the list of record addresses or items under each relevant descriptor, makes the specified logical connections, and produces all items satisfying the request. the search procedure examines only potentially pertinent records, ignoring the rest of the file. in other words, the file is organized every time a search is made to suit the requirements of the search. thus, the file and the request are compatible and utilization of the file is essentially independent of its size. an inverted index provides a very special capability to a searcher who is using a terminal, on-line system. he can test both individually and collectively the effectiveness of the terms of his search statement without having to make a complete search of the master record, simply by examining the inverted index. 
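the combined organization just described, a serially organized master record file plus an inverted index of tracings pointing at record addresses, can be sketched briefly. this is an illustrative python sketch with assumed field names and sample data; it is not drawn from any of the systems cited here.

```python
# minimal sketch of a serial master file plus an inverted index generated from
# it: the index maps each tracing (subject heading, author, etc.) to the set of
# master-file addresses carrying that tracing, so a boolean search examines
# only the index, never the whole master file.

from collections import defaultdict

def build_inverted_index(master_file):
    index = defaultdict(set)
    for address, record in enumerate(master_file):
        for tracing in record["tracings"]:
            index[tracing].add(address)
    return index

def search(index, all_of=(), any_of=()):
    """records bearing every term in all_of and, if given, at least one term in any_of."""
    result = None
    for term in all_of:
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    if any_of:
        union = set().union(*(index.get(t, set()) for t in any_of))
        result = union if result is None else result & union
    return result or set()


master = [
    {"title": "liver diseases", "tracings": ["medicine", "diagnosis"]},
    {"title": "management science", "tracings": ["medicine", "operations research"]},
]
idx = build_inverted_index(master)
print(len(idx["medicine"]))                            # entries under a single term
print(search(idx, all_of=["medicine", "diagnosis"]))   # intersection of terms
```

the size of the intersection computed from the index alone is the upper limit on the number of hits that, as the next paragraph notes, can be returned to the terminal before the master records are touched.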
the system will tell him, for example, the number of entries under a term. it will tell him how many entries several terms share in common so that he can test the intersections, that is, the conjunction and disjunction of the terms. the count of addresses that results from the list intersection can be returned immediately to the terminal as an upper limit of the number of hits. in effect, decoding of the boolean expression takes place in the inverted index, which is a very compact list, and hence the response time is fast. it is true that some additional calculations and comparisons in the record itself may reduce the number of hits, but they will never increase them. sitting at a terminal, a searcher can ask the system what the maximum number of hits will be in response to a search statement. he can change the parameters of his search statement and see immediately what effect that will have on the response of the system. it is primarily because of this capability of the user to have a dialog with the machine that every terminal-oriented library information retrieval system of which the author is aware is adopting an inverted file organization. in order to reduce storage costs, not every search term need be carried on an inverted index. those search terms or index entries that are practically never searched alone, but are used rather in conjunction with another term or tracing, are carried only in the main file and not on the inverted index. in a library catalog these terms are usually the place and date of publication, publisher, language of the book, level of the publication (i.e., adult, children, youth), number and type of illustrations, and so on. these terms appear on almost every record and some of them are high-density terms; that is, they are heavily posted. for example, in a typical u.s. library, some eighty percent of the books are identified as being in english. form headings (bibliography, essay, poem, biography, map, etc.), geographic headings, and numerics that are used in conjunction with what are called main headings also do not appear on the inverted index, but can be searched in the main file. in the very unlikely event that a search is required to be made only for a term not on the inverted file, then, of course, a serial search can be made of the master file. in some systems, a very compact serial file of data may simplify serial searching of the master file.

physical organization

a basic understanding of how a library's records are used is necessary to a proper plan for their physical organization. in a manual system, logical organization and physical organization of a library's records are identical. furthermore, all files are physically the same, usually on 3x5 catalog cards or, in a few cases, in printed book or sheaf catalogs. in a computer system, however, because of varying capacities, speeds, and storage costs of different direct access devices, it is extremely important that the various records and segments of records be stored in those devices which will give the best cost-performance for the application. this means that the rate of utilization of the various records and parts of records, as well as the size of the records, will determine what types of devices will be used as physical files.
in a library operation there is very heavy use of index terms, or subject headings and author entries, to search the files; records for these entries can be very short. borrower records and charge-out records in circulation control systems are also very actively used. there is less use made of the bibliographic record or journal citation. these records are somewhat longer than the subject and author tracings, and hence require more storage, but do not need such rapid access. notes, abstracts, and other explanatory material can require an enormous amount of storage space but, as a rule, are used only infrequently. patron registration, as contrasted with borrower records, is used much less frequently, unless, of course, the two types of records are combined. since serials holdings records do not change very frequently, printouts are quite satisfactory as finding tools and the records are usually kept off-line. journal check-in, however, requires a great number of accesses every day. in view of the requirements generated by the above uses, the present thinking for on-line library systems, in terms of current hardware, runs something like this: in a combined file system as described above, with the bibliographic record on the serially organized main file and the index in an inverted file arrangement, the inverted file, which must be accessed many more times than the main file, would best be carried on disk files. the bibliographic record itself, being much more voluminous and accessed less frequently, is stored in a larger, slower, more economical file like the ibm 2321, the data cell. abstracts and other seldom-used bulk records might well be on tape, off line. actually, though, as libraries build up their record files to control their total collections, they will, of course, exceed the capacity of the present data cells and will have to go to future mass memory devices similar to the ibm photo-digital storage system. then it may be economical to put even abstracts and notes of the bibliographic record on line. if there is a separate item record file of in-process or acquisitions data, it can be handled in the same way as the catalog file, that is, all access points as an inverted file on disk with the record itself on the data cell strips. if, however, the total item record file is not too big, it might well be stored on disk. circulation control records are carried on disk, but patron registration, if it is to be kept on line, would be more economically stored in the data cell. the authority list or thesaurus really has two functions. it is heavily used to validate and convert all inputs and all search requests. it is also used to store all cataloging and indexing decisions and to provide guides to users as to the formulation of search queries. the necessary data makes for long records that are either infrequently used or available as printouts. therefore, a condensed form of the authority list or thesaurus, a form which carries only the terms and their equivalents, is best stored on disk, whereas the full-blown authority list, which is used primarily for printing the thesaurus and its supplements, can be carried off-line on tape, or in the cheapest, biggest, and slowest direct access device which is available. in order to achieve economical, compact storage, the subject headings, descriptors, or index terms would not be stored in open language but in numeric codes.
by using, for example, the decimal code as used in a dewey decimal system, numeric codes would also make it possible economically to build hierarchies or class tables with the descriptors. it would be necessary, therefore, in every transaction, to translate from open language to code when interrogating the system and to translate from code to open language when outputing from the system. translations would have to be very fast to accommodate the traffic of a large number of terminals. the translation job, using a stored table, might have to be done in an auxiliary, large core storage, which is very fast but more expensive than disk files. as a general rule, what is being proposed is that for very large files the index and the bibliographic record are not to be stored in the same device. one might start this way until the file and the traffic into it are built up and the system becomes fully operational. however, the system should be so structured that indexes could be stored in files that are faster than the bulk storage devices used for the records. the translation files, that is, the tables that convert from open language to stored codes on input and the reverse on output, can be stored in the fastest available exemal storage. ( 20). it is extremely doubtful that hardware development in the' immediate future will change these principles of library file organization very much. as storage costs drop, total capacities increase, and _access times become shorter, more and more libraries will find it practical and economical to put their files on line in order to provide the improved services that users demand. references 1. u. s. library of congress: automation and the library of congress (washington, government printing office, 1963), p. 74. 2. batts, n. c.: "data analysis of science monograph order/cataloging fmms," special libraries, 57 (october, 1966), 583-586. 3. "corporate data file design," edp analyzer. 4 (december, 1966). 4. climenson, w. d.: "file organization and search techniques," annual review of information science and technology. 1 (new york: interscience, 1966), p. 50. 5. borko, h.: "design of information systems and services," annual review of information science and technology, 2 (new york: interscience, 1967), p. 50. 6. castner, w. g., et al.: "the mecca systema modified list processing application for library collections," proceedingsa. c. m. national meeting ( 1966), pp. 489-498. 30 journal of library automation vol. 2/1 march, 1969 7. prywes, n. s., et al.: the multi-list system (philadelphia, moore school of electrical engineering, university of pennsylvania technical status report no. 1 under contract nonr 551(40), november, 1961). 8. prywes, n. s.; gray, h. j.: "the multi-list system for real time storage and retrieval," ifip congress proceedings. 1962, pp. 112-116. 9. university of pennsylvania, moore school of electrical engineering: the tree as a stratagem for automatic information handling (report of work under ... contract nonr 551 ( 40) and ... af 30 ( 602)2832, moore school report no. 63-15, 15 december 1962). 10. lefkovitz, d.: automatic stratification of descriptors (philadelphia, moore school of electrical engineering, university of pennsylvania, technical report under contract nonr 551 ( 40), moore school report no. 64-03, 15 september 1963). 11. landauer, i.: "the balanced tree and its utilization in information retrieval," ieee transactions on electric computers (december, 1963), pp. 863-871. 12. 
univac division of sperry rand corporation: multi-list systems: preliminary report of a study into automatic attribute group assignment; technical status report no. 1-2, 3#ad 609 709, 4#ad 609 710 ( 1963-1964). 13. univac division of sperry rand corporation: optimization and standardization of information retrieval language and systems; final report (ad 630-797, 1966). 14. lefkovitz, d.: file st1'uctures for on-line systems (class syllabus). 15. curtice, r. m.: magnetic tape and disc file organizations for retrieval (lehigh university, center for information sciences, july, 1966). 16. warheit, i. a.: "the direct access search system," afips conference proceedings, 24 ( 1963), pp. 167-172. 17. warheit, i. a.: the combined file search system. a case study of system design fm· information retrieval (paper presented at the f. i. d. meeting in washington, d. c., october 15, 1965; abstract, 1965 congress, international federation for documentation ( fid), washington, d. c., u. s. a. 10-15, october 1965), p. 92. 18. prentice, d. d.: the combined file search system (san jose, california: ibm june 15, 1964). 19. 1401 information storage and retrieval systemversion ii; the combined file search system, no. 10.3.047 (hawthorne, new york: ibm, may 1, 1966) . 20. warheit, i. a.: file organization for libraries; report to project intrex, mit, cambridge, massachusetts, march 14, 1968. lib-mocs-kmc364-20131012115244 150 the british library's approach to aacr2* lynne brindley: british library, bibliographic services division, london, england. the formal commitment of the british library to aacr2 and dewey 19 entailed substantial changes to the u.k. marc format, the blaise filing rules, and a variety of products produced for the british library itself and for other libraries, including the british national bibliography. the british library file conversion involved not only headings but also algorithmic conversion of the descriptive cataloguing. along with the u.s. library of congress and the national libraries of australia and canada, the british library was formally committed to the adoption of the anglo-american cataloguing rules, second edition (aacr2) and decimal classification, 19th edition (dc19) in 1981. this entailed fairly substantial changes to the marc format as published in the u.k. marc manual, 2nd edition as well as the implementation of the new and more sophisticated blaise (british library automated information service) filing rules. 1 there is, of course, never an ideal time for making major changespolitically, economically, or technically; and the bibliographic services division (bsd) found itself having a large number of preexisting separate systems, particularly for our batch processing work, which had grown up over a long period of time and had in most cases been tailor-made to the individual products. whilst relatively small, bsd is nonetheless responsible for a multiplicity of products and services, almost all of which were to be affected to some extent by the change toaacr2/dc19. briefly, then, a comment on the different services and the degree to which they were affected, thus setting the scene for our decisions on machine conversion. *based on a talk given at the library association seminar "library automation and aacr2," held in london on january 28, 1981. the views expressed in this paper do not necessarily represent those of the british library or the bibliographic services division. manuscript received june 1981; accepted june 1981. 
services and impacts

printed publications

the major printed publication of the division is the british national bibliography. it is arguable that for the printed publications (especially the weeklies) there would have been little justification for retrospective conversion. the files could have been cut off at the end of 1980 and started afresh for 1981; it might, however, have precluded, or certainly made more messy, the possibility of any multiannual cumulations across this period.

microform products

these are mostly individual com catalogues, both within the bl, especially the reference division, and externally, provided through locas (bsd's local catalogue service) to some sixty libraries in the u.k. in many ways those libraries that plunged into automation early, building up files of records derived from central u.k. and lc marc, were likely to be worst affected. individual machine-readable files had grown very large and exploited not only relatively current cataloguing data, but also full retrospective u.k. holdings back to 1950. also, we foresaw no lessening of use by libraries taking our catalogue service of the u.k. retrospective 1950-80 file after aacr2 implementation. therefore the grounds for attempting automatic retrospective conversion of records were indisputable.

tape services

u.k. exchange tapes, either as a weekly service or through the selective record service, are supplied to nearly one hundred organisations. the same argument, that there will be continuing selection from the retrospective files, applies; therefore, for compatibility and ease of use we needed to consider conversion. the weekly exchange tape service makes a clean aacr1/aacr2 break, but obviously libraries have back files of aacr1 records. mindful of our responsibility to other organisations and agencies utilising our records, we decided to make our own converted tapes of lc and u.k. marc records available to tape-service customers to aid their own conversions.

online services

regarding the blaise online information retrieval system for u.k. and lc marc, our concern was to ensure continued easy searching and printing across the total span of files. without automatic conversion it would have been difficult, if not impossible, to ensure consistency in search elements and index entries (e.g., in u.k. marc, series fields 400, 410, and 411 no longer exist, so without conversion a searcher would have to remember specific search qualifiers for pre-1981 records, and different ones thereafter). without conversion the searcher would need a lot more knowledge of marc and the history of cataloguing practices to formulate effective strategies.

outside users of marc

last and very much not least was a consideration of what we could do to help the now large community of u.k. marc users in coping with the changeover. this is now a very large and diverse group relying on bsd for the provision of bibliographic records for whatever purpose. our own conversion enabled us to provide a multiplicity of aids to libraries. of particular note are (1) u.k. and lc exchange tapes of converted records, and (2) machine-readable and microfiche versions of our own name conversion file, which is being used as the basis for the new name authority fiche. so, in the context of the variety of our services, the case for conversion was strong.

retrospective conversion

the extent of the retrospective conversion exercise is discussed below.
in conjunction with this work we were faced with the necessity of rationalising our com and print product software (library software package), both to enable it to drive each of the previously separate print applications and to ensure that it had sufficiently sophisticated output facilities to cope with the complexity of aacr2/u.k. marc 2 records, with their increase in numbers of subfields, their repeatability, all or some, and varying sequences, to produce the specified layout and punctuation across our services. extent of conversion we are now in a position to discuss the retrospective conversion exercise. having decided in principle to become involved with conversion, the extent of our involvement had to be established. british libraries have never had the tradition of building and utilising name authority files, and certainly the concepts fit more easily in the north american primarily online system context rather than in the predominantly batch cataloguing systems established in the u.k. the bl therefore found itself without a machinereadable authority file and began to create one from scratch to enable the important heading changes required by aacr2 to be handled automatically. again because of the overriding importance of com catalogues in the u.k., considerable attention was paid not only to automatic heading changes but also to automatic marc coding and text conversions bringing the descriptive cataloguing elements also into line with aacr2/u.k. marc 2, so that catalogue records could be consistent on output whether derived from the conversion or newly created . the third consideration for conversion was our library of congress file british library!brindley 153 (books all1968), used in the u.k. as part of our cataloguing services and as a file in the blaise online system. we had always performed certain conversions on lc records to bring them more into line structurally with the u.k. marc format. however, u.k. libraries using these records for cataloguing purposes still had to undertake substantial editing. it was therefore decided to use the opportunity to enhance this conversion and bringlc records into line with u.k. marc 2 to make them of maximum use to british librarians. to summarise, then, the retrospective conversion comprised three main parts: 1. that part which utilised information stored in the name conversion file, which records the aacr2 and aacrl forms of names. this enabled the automatic conversion of major, commonly occurring personal and corporate headings. 2. automatic marc coding and text conversions-this consisted of specifications at marc tag and subfield level of algorithms for automatic marc coding and scme bulk text conversions. it resulted in records being converted to a pseudo-aacr2/u.k. marc 2jormat, so that all output specifications, whether by profile or by online inversion, had only to cater for the new format. these two parts of the conversion are inexorably linked, both conceptually and in programming terms , with frequent references to alternative courses of action dependent on whether a match has been found on ncf. the details of conversion are in "specification for retrospective conversion of the uk marc files 1950-1980,"2 prepared in the computer services department. 3 . the third facet of conversion was to our library of congress files (books all1968), to bring records in line with u.k. marc 2 as far as possible. 
only conversions of tags, indicators, subfield marks, punctuation, and order of data elements have been included; no attempt has been made to bring textual data into conformity with bsd practice. the converted records are therefore in aacr2 form to the extent that lc applies aacr2 to a particular record. the next section highlights major points of each part of the conversion, commenting particularly on aspects of programming and testing.

name conversion

the name conversion file was built up by bsd's descriptive cataloguing section over nine months of 1980 and comprises authenticated aacr2 headings with the aacr1 form where different. it will form the basis of an authority file of headings and references for future bsd cataloguing and will be the first publicly available u.k. authority file. the file was maintained using existing locas facilities. pseudo-marc records were created recording the aacr1 and aacr2 forms of headings in the format shown in example 1.

example 1. name conversion file record
field 001 (control number)
049 (source code)
110.1 $a great britain $c accidents investigation branch (name heading in aacr2 form)
710.1 $a great britain $c department of trade $c accidents investigation branch (name heading in aacr1 form)
910.1 $a great britain $c department of trade $c accidents investigation branch $x see $a great britain $x accidents investigation branch (reference for aacr2 name heading)

the file being used for conversion comprised some 12,000 records, of which 4,000 had aacr2 heading changes. the remaining records were authenticated by bsd as correct aacr2 headings without alteration. of the changed headings most were prolific personal and corporate (particularly u.k. government) headings. the first stage of the conversion process for u.k. marc records (1950-80) involved all records being processed against the name conversion file to replace aacr1 with aacr2 headings and associated references. in programming terms, the name conversion was relatively easy; relatively, that is, in the context of bibliographic programming. the matching program used was not particularly sophisticated. it took each ncf record, identified the 7xx (aacr1) field, created a key of fifty characters stripping out all blanks, embedded punctuation, and diacriticals, and then tried to match the key against each 1xx heading in whatever file was being converted. if there was a match on the key, then the program proceeded to match character by character through the data looking for an exact match. if this was not found, then the ncf record was not processed. example 2 shows this procedure more clearly.

example 2. name conversion matching
ncf record:
710 (aacr1) $a great britain $c civil service department $c central computer agency#
110 (aacr2) $a central computer agency#
910 (aacr2) $a great britain $c civil service department $c central computer agency $x see $a central computer agency#
key: 10$agreatbritain$ccivilservicedepartment$ccentralc
matching on data: would match "central computer agency"; would not match "central cataloguing agency"
n.b. the key equals 50 characters (upper case)

ncf record:
700 (aacr1) $a walker $h david esdaile#
100 (aacr2) $a walker $h david e. $q david esdaile $r 1907- #
900 (aacr2) $a walker $h david $c 1907- $x see $a walker, david e.#
key: 10$awalker$hdavidesdaile
book record before: 100 walker $h david esdaile#
after: 100 $a walker $h david e. $q david esdaile $r 1907- #
       900 $a walker $h david $c 1907- $x see $a walker, david e. $z 100#
n.b. addition of new reference

of course, this file has not converted all aacr1 headings, but it has ensured that the majority of headings likely to recur (i.e., of any significance in catalogue collocation of headings) have been automatically changed.
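the matching procedure just described, a fifty-character key built by stripping blanks, embedded punctuation, and diacriticals from the aacr1 heading, followed by a character-by-character confirmation, can be sketched as follows. this is a hedged python illustration of the logic as described above, not the actual bsd program; the record layout and helper names are assumptions made for the example.

```python
import unicodedata

KEY_LENGTH = 50

def make_key(heading):
    # build the 50-character match key: fold to upper case and strip blanks,
    # punctuation, and diacriticals, keeping subfield marks and alphanumerics.
    decomposed = unicodedata.normalize("NFKD", heading)
    kept = [c for c in decomposed if c.isalnum() or c == "$"]
    return "".join(kept).upper()[:KEY_LENGTH]

def convert_headings(ncf_records, bib_records):
    """replace aacr1 (1xx) headings with aacr2 forms where the ncf matches."""
    # index the ncf by the key of its 7xx (aacr1) heading
    ncf_by_key = {make_key(r["aacr1_7xx"]): r for r in ncf_records}
    changed = 0
    for rec in bib_records:
        ncf = ncf_by_key.get(make_key(rec["1xx"]))
        if ncf is None:
            continue  # no key match: leave the record alone
        # key match found: confirm with a full comparison before substituting
        if rec["1xx"].strip().lower() == ncf["aacr1_7xx"].strip().lower():
            rec["1xx"] = ncf["aacr2_1xx"]  # substitute the aacr2 heading
            rec.setdefault("references", []).extend(ncf.get("references", []))
            changed += 1
    return changed
```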
automatic marc coding and text conversions

this is commonly known as the format conversion program and forms the bulk of the "specification for retrospective conversion." the original specification was extremely complex, particularly bearing in mind the tight time scales that we were working to. the major difficulty throughout all parts of this facet of conversion was having to specify procedures to accommodate the variety of usage of marc across thirty years, including previously automatically converted 1950-68 u.k. marc records; it has been almost impossible to verify absolutely that any of the automatic changes would cover all cases. not surprisingly, this was an extremely complex program. it had to allow for manipulating nonstandard and variable data in fairly precise ways, and had to be designed to cope with occurrences in many different combinations. the programmer had to code for these combinations, some of which may possibly never have been used. it is probably the case that certain combinations do not exist, but this could not be guaranteed over such a large number of records until the total file had been converted. a good example of the complex logic of this kind of processing is found in the 245 field, where seven complex conditions were allowed for (the details of each condition are left blank in the published skeleton):

field 245
(1) if $e ___ then ___ else ___
(2) if $f ___ then ___ else ___
(3) if $d or $e ___ or ___ or ___ or ___ or ___ or ___ or ___ then ___ else if $d or $e ___ or ___ or ___ or ___ or ___ or ___ or ___ then ___
(4) if tags ___ then ___
(5) if 008 and ___ or ___ or ___ then ___
(6) if $h ___ then ___ and ___
(7) if $e ___ then ___ else if first $e ___ then ___ else ___ else ___
repeat for all levels of 245.

another variation on this theme is that the specification catered for what it expected to find. again, because of the volume and span of data, the expected was not always found. for example, a lot of processing of references is dependent on the presence of a $x. what do you do when you find a record accidentally without one? a third problem was that of interdependency of fields and subsequent actions. a good example of this is found in 110s and related 910s. if a 110 is changed, you may have to create a 910, replace a 910 with another one, or reorganise existing subfields. then you may have to reorder the field and also flag the action to come back to later in the program. hence you are switching back and forth across fields throughout the program; you cannot simply start at field one, process sequentially, and then stop. clearly this makes program testing that much more complicated. however, those were the problems, and really a very small percentage of the whole. from all that has been seen of the converted files so far, it has been a highly successful exercise. all of the major marc changes and many of less significance have been converted automatically by this program (treaties, laws, statutes, series, conferences, multipart works), the resulting records being consistent in marc tagging structure and in significant headings and areas of text.
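to make the shape of this processing concrete, the sketch below expresses a couple of conversion conditions as data-driven rules, including the kind of cross-field dependency described for 110 and related 910 fields. it is a hypothetical python illustration with assumed record and rule layouts; it is not the bsd specification or program, and the specific conditions shown are invented stand-ins.

```python
# hypothetical sketch: per-field conversion rules applied to a record held as a
# dict of tag -> list of field strings. rules may inspect and modify other
# fields, mimicking the 110/910 interdependency described above.

def rule_245_example(record):
    # stand-in for one of the 245 conditions: if any 245 carries a $e,
    # flag the record for an alternative processing path.
    for value in record.get("245", []):
        if "$e" in value:
            record.setdefault("flags", []).append("245 has $e")

def rule_110_creates_910(record):
    # if a 110 heading was replaced, make sure a "see" reference (910) exists
    # pointing from the old form to the new one.
    old = record.get("110_old")
    if old and record.get("110"):
        reference = f"{old} $x see {record['110'][0]}"
        if reference not in record.get("910", []):
            record.setdefault("910", []).append(reference)

CONVERSION_RULES = [rule_245_example, rule_110_creates_910]

def convert(record):
    for rule in CONVERSION_RULES:
        rule(record)
    return record


if __name__ == "__main__":
    rec = {
        "110": ["$a central computer agency"],
        "110_old": "$a great britain $c civil service department $c central computer agency",
        "245": ["$a annual report $e edited by a. n. other"],
    }
    print(convert(rec)["910"])
```

keeping the conditions as separate, named rules is one way to cope with the "unusual combinations" problem the text describes: each rule can be tested in isolation and then exercised against volume test files in combination.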
library of congress file conversion it has already been stressed that the automatic marc coding and text conversions for u.k. marc were very complex programs. perhaps even more complicated was the conversion program written to transform lc into u.k. marc format. the main reason for this is that the u.k. and ncf conversions are one-off programs and a great number of the manipulations could be hard-coded. however, it is intended that the lc conversion program will be used on an ongoing basis against each weekly lc tape. thus each conversion has been treated as a separate parameter to the british library!brindley 157 program so that it is general purpose and easily alterable in the light of changes of practice by lc. to give you some idea of the complexity, there are well over 600 separate parameters to the program. i say separate, but in fact they are interrelated parameters, so that if a minor change is made to one it can potentially affect many others. many of the problems relating to this program could again only be really apparent in volume testing, not in writing. each parameter written and tested in isolation was satisfactory, but when they began to be put together in modular form, then the problem of unusual combinations began to show. although the conversion parameters for lc records are extensive, they cannot touch the cataloguing data, certainly not nearly as much as in the u.k. marc conversion. there are added problems in the fact that the records coming to us from lc do not show the clean aa crl/ aacr2 break that bsd is adopting. we are having to allow for mixed records from lc at least in the foreseeable future. details of the lc-to-u.k. marc conversion are published in a detailed specification. 3 common issues in conversion testing it is possible to draw out common problems applicable across all the conversion work, particularly in testing. they are as follows: 1. variability of records; 2. complexity of records; 3. volume of data; 4. nonstandard data; 5. repercussions throughout system. variability this is an obvious problem in the handling of marc records, but particularly pertinent when trying to do such complex manipulations. the record format itself is of course variable-there are very few essential fields or data elements; most need not be present at all; if they are present, they can be there once or ten times. standards of cataloguing, and therefore marc coding, have changed considerably over the period in question, adding to the variability. in some exceptional cases bsd practices are different from those prescribed in the marc manual, e. g., nonstandard use of title references. all of this results in additional difficulties from specification, through programming and testing. on average we found that one conversion process took two to three times the amount of coding required for more normal computer processing. complexity this is linked with variability and was manifest particularly in the fact that it was extremely difficult to ensure that the programs catered for all 158 journal of library automation vol. 14/3 september 1981 conditions. we found that testing threw up oddities not allowed for in the original specification. in an ideal situation with no time constraints a totally tailored and comprehensive test file should have been drawn up for each facet of conversion. this exercise alone would have taken a good year and would still not have catered for the unexpected data problems. 
in practice, whilst bsd's descriptive cataloguing staff were able to provide several hundred records that tested the majority and most important of the conversions, we always faced the possibility of coming across exceptions. this soon became apparent when volume testing commenced and each new file threw up another combination and a different program route not previously tested. volume the third major factor adding to the complexity of the whole operation was the sheer volume of data to be processed. approximate figures are as follows : u.k. marc 0.7 million records lc marc 1.4 million records locas 2.5 million records the combination of these three factors-variability, complexity, and volume of datamade testing extremely difficult and expensive in machine terms, in that large test batches of material had to be processed. nonstandard data like any large file, u.k. marc has its share of incorrect data, most of it of no particular significance. however, some problems arose in conversion testing resulting occasionally in corrupted records. one example that springs to mind was the incorrect spelling of months in treaties, giving problems in the 110 $b conversion to 240 . repercussions throughout system a cautionary note, really: we made a decision that postconversion records should not be put back and overwrite existing master files until they had been through validation programs (i.e., those used for validating new input for bnb and locas); it was felt that this was a necessary safeguard against reintroducing any structurally incorrect records postconversion. it was here again that testing threw up timely reminders of just how much the validation programs had been upgraded and changed since many of the original records had been input through the system. scheduling the scheduling of such a large, complex exercise was extremely difficult, with interdependency of processing related to the success or otherwise of overnight runs . a lot of time was spent before the conversion period in british library/brindley 159 discussion with our computer bureau to ensure maximum cooperation throughout the difficult time. they were extremely helpful in ensuring operator coverage throughout weekends and priority for our work. one of the problems we encountered was having to forecast the approximate number of machine hours that would be required throughout january 1981 when the bulk of conversion work was carried out. at the time the figures were needed we were still in early stages of programming so no volume tests could be run. equally, although we were experienced in large-volume processing it was difficult to draw any direct comparisons with production work. additionally, we had to allow for a heavier than normal production work load towards the end of the year, which always sees annual volumes, cumulations, online file reorganisation, and so on. scheduling therefore was a fine art to ensure correct priorities for production, the bureau's own work, and conversion , and to minimise contentions for files and peripherals. staffing of interest is a picture of the human resources involved in this project . what is striking is the magnitude of the task achieved by very few people. the overall management of the project was taken on by existing line management within bsd's computer services department. two project leaders were appointed, one a librarian and one a systems analyst. 
the librarian had a team of four temporarily seconded staff who were totally responsible for all output profile specifications (printed products and com), testing, and implementation. they also did a considerable amount of checking of test file conversion runs . the systems analyst was a project leader for three analyst-programmers and one jcl writer. between them they were responsible for lc and u.k. conversion programming and the new filing rules. existing operations staff and others as appropriate within the division were called upon for other tasks. disruption to services whilst disruption to our normal production services was kept to an absolute minimum, it was decided that it would be necessary to temporarily suspend certain services through the month of january 1981 while the bulk of the file conversion took place. throughout the period, the blaise online information retrieval system continued to be operational : associated online facilities that would normally allow the despatch of marc records to catalogue files were suspended to avoid any non-aacr2 or nonconverted records inadvertently updating converted locas files. the production of com catalogues through locas was suspended for a single month, and the first issue of bnb for 1981 was not scheduled until early in february. the schedule for the conversion exercise was adhered to with no major slippage except in the case of our lc file conversion; this exercise 160 journal of library automation vol. 14/3 september 1981 stretched on into the spring for a variety of technical reasons largely concerned with the characteristics of the lc data. conclusions having been so closely involved in this project it is difficult to draw out general conclusions as yet. however, there are some already obvious benefits both for bsd and the wider library community: the rationalisation of our software for com/printed products will lead to easier maintenance and future upgrading; the introduction of the blaise filing rules across all our products is an improvement; the new lc conversion will make our lc files much more easily usable by the british library community; we have the basis of a u.k. name authority file for the first time. this was a vast and sophisticated conversion exercise and will result in u.k. marc files probably more uniform in structure than they have ever been. it forms an excellent basis for the continuation of bsd services, especially those based on utilising records across the whole time span, e. g., blaise information retrieval, selective record and cataloguing services. equally, because our conversion has been so extensive we have been able to share it: the specification, the name conversion file, and the converted u.k. and lc files were all available at minimal cost to libraries in the u.k. of course, it is not the 100 percent solutionit was never intended to beso of course if you look hard enough you will find inconsistencies. however, it has proved that very extensive automatic conversion is possible even with today's state of the art of computing and that bsd had led the way, indeed eased the path of transition to aacr2 for british libraries. references 1. british library, filing rules committee, blaise filing rules (bl , 1980). 2. british library, bibliographic services division, computer services department, "specification for retrospective conversion of the uk marc files 19501980" (unpublished with limited distribution). 3. 
british library, bibliographic services division, "specification for conversion of lc marc records to uk marc" (unpublished with limited distribution). lynne brindley is head of c ustomer services for the british library automated information service (blaise). lib-s-mocs-kmc364-20141005043703 68 book reviews indiana seminar on information networks (isin). proceedings. compiled by donald p. hammer and gary c. lelvis. west lafayette, indiana: purdue university libraries, 1972. 91 p. (available at no charge from the extension division, indiana state library, 140 north senate avenue, indianapolis, indiana 46204 as long as the supply lasts). the indiana seminar on information networks (october 26-28, 1971) was an attempt to introduce indiana librarians to the benefits (and presumably problems) of library networking. papers included in the proceedings are introduction to networks (maryann duggan), library of congress marc & recon (lucia j. rather), nelinet (ronald f. miller), an on-line interlibrary circulation and bibliographic searching demonstration (gary c. lelvis and donald p. hammer), ohio college library center (frederick g. kilgour), user response to the facts (facsimile tran.smission system) netu1ork (lynn r. hard), indiana twx network discussion (margaret d. egan & abbie d. h eitger), and how does the n etwork serve the researcher? (irwin h. pizer) . as with any collection of written papers or oral presentations, the quality is mixed. the papers are introductory in nature, the pizer article being the exception. the majority report "case studies" of particular automated operations and/ or networks (marc & recon, nelinet, oclc, facts) . the facts article is the most interesting of these "case studies" because it moves beyond simply reporting "how we done it good" into an evaluation of why the network did not succeed (the network did not meet a real and/or consciously recognized need of the libraries it was proposing to serve) and emphasizes the importance of careful planning. any wouldbe network planner should read this article; there are many lessons to be learned. although the collected papers have all of the disadvantages usually associated with a collection of oral presentations (material is loosely organized and lacks continuity, introductory and oversimplified, repetitive, and out of date), they are a valuable addition to the growing body of literature dealing with networks both from the idealized conceptual view and, perhaps more importantly, from the practical reality view of existing networks. kenneth j. bierman systems librarian virginia polytechnic institute computers and systems; an introduction for librarians, by john eyre and peter tonks. hamden, connecticut, linnet books (shoe string press), 1971. 127 p. $5. 75. isbn: 0-208-01073-4. at last an inexpensive introductory text specifically written for librarians and library students! not since n. s. m. cox's the computer and the library have we had such a short, easy to read, yet comprehensive, description of the essentials. complementing the text are twenty-nine figures illustrating everything from batch and real-time processors, disc drives, program process, and systems flowcharts to data elements, formats, and input procedures, marc ii records on magnetic tapes, and sample pages from a computer-produced author catalog. the text reads like a well-organized glossary, treats the subjects of library use of computers and systems analysis in a way at once simple and informative. 
the authors had tested the material with students in courses at the school of librarianship of the polytechnic of north london. thanks to the british-american cooperation surrounding marc efforts, this book will be as useful in our library school classes as it is in theirs. the index d eserves a special note because it was compiled after the style of precis developed by the british national bibliography. it is a facet analysis of the text featuring access to "activity:thing: type:aspect" in a prescribed permuted order. although there is not much emphasis in such a text on subject access or information retrieval, this is not entirely overlooked and this index serves as an excellent example of what could be done by computer. truly an excellent introduction to computers and systems analysis for librarians! a two-page bibliography contains suggestions for further reading on the topic or for an expanded reading of various applications of computers in libraries. pauline atherton school of library science syracuse university isis: integrated scientific information system; a general description of an ap· proach to computerised bibliographical control, by william schieber. geneva: international labour office, 1971. 115p. $1.50. this document is a well-written description of the computerized library system developed at the international labour office. planning and development for the system began in 1963. it has been implemented and is now in operation within the central library and documentation bmnch of the ilo. the isis bibliographic control system is a large file system for storing, processing, and retrieving bibliographic information. the ilo data base consists of some 45,000 records of books, periodical articles, and other documents. each record consists of conventional bibliographic data (with less detailed definitions than marc data, however) plus an abstract. in form, the abstract appears to be written in natural language, but all descriptor words used in the abstract are taken from a controlled vocabulary and, in fact, provide subject indexing. on-line terminals are used for ide searches. the search system allows searches by subject descriptors, language, and date of publication. sequential formulation of the search allows control of the number of responses to a desirable size. records are also indexed on various data fields, such book reviews 69 as author and title. display of records and browsing are handled on line, but printing of lists or bibliographies is handled through subsequent batch printing jobs. regularly scheduled outputs of the system include printed catalogs, indexes, and authority lists. two other systems have been developed at the ilo using some programs and files of the bibliographic control system. one is for controlling loans of library books, the other is for serials data and includes a subsystem for routing library periodicals. these three major systems are described in some detail in this report. a fourth section deals with system monitoring and control. costs are discussed here. the isis system is an interesting and unique one even though the system is geared primarily to a special library environment. it is evident that much careful thought and attention to detail went into the system design and development. the integrated use of programs and files as described here and the details of some design elements make this a useful document. the report itself is well done. describing a complex system for a varied audience is a difficult task. the author, william d. 
schieber, has put together an excellent example of a systems report document. charles t. payne systems development office university of chicago library title derivative indexing techniques: a comparative study, by hilda feinberg. metuchen, n.j.: the scarecrow press, 1973. x+297p.; index and bibliography. this book is primarily a survey of key word indexes, with some discussion of issues in indexing. the survey is quite good, but already out of date. the discussion is unfortunate. the survey covers a wide range of computer-based article title key word indexes, including extreme cases such as permuterm. sample pages are included for fifty-six indexes, and thirteen lists of excluded words ("stopwords") are· given. reproduction of samples is generally ex70 journal of library automation vol. 6/ 1 march 1973 cellent, and this portion is valuable in showing the virtues and defects of various approaches to key word indexing. since this survey, at least three major libraries have begun publication of key word indexes to serial titles, a type of index with different problems which is likely to be more common in the future. the discussion suffers from a lack of focus. there are no clear standards for key word indexes or the traditional tools they complement or replace, and studies of user preference and convenience have been limited and inconclusive. it is difficult to say what makes a key word index more or less workable, and this book seems to cloud the issues even more. ms. feinberg makes some questionable and unsupported assumptions about what users think, want, and need, and a number of recommendations which are at best only applicable to indexes of article titles in scientific fields. take three major recommendations: plural and singular forms should be interfiled, synonyms and similar words should be interfiled, and foreign titles should be translated. the university of california (berkeley) library found "college," "university," "company" and "papers" to be good exclusion words, while "colleges," "universities," "companies," and "paper" are good subject words. synonym control increases homonym problems, makes for longer (and thus more difficult to use) lists, and entails difficult decisions as to what con~titute true synonyms. translation raises the qm ..., tion of whether a user should be guided to a publication he may not be able to read. in sum, these and similar decisions should depend much more on the field of study and user population than on this type of general treatment. there are other problems reflecting deficiencies in the areas of technical background, understanding of typography, and appreciation of some reasons for key word indexing. ms. feinberg comes out strongly in favor of "title enrichment"-adding artificial titles to improve indexing. this, however, adds cost and time to the key word approach, and subtracts from its clear advantages. a large section is devoted to an experimental study of different indexing programs, with the result that different programs produce different in dexes. generally, the discussion detracts from the survey. finally, the title chosen seems unfortunate. "key word indexing" may not be an ideal term, but it is fairly well known; must we introduce yet another vague, polysyllabi<, phrase, "title derivative indexing"? walt crawford university of california berkeley accountability: systems planning in education. leon lessinger & associates. creta d. sabine, editor. homewood, ill.: etc publications, 1973. 242 pages. 
"accountability" has become a rallying cry in many educational circles of late: for the public in its demand for visible results for educational dollars, and for educators as they attempt to define and defend new programs. this well-sequenced collection of nine papers on this subject addresses the problem of accountability at all levels of the educational 'enterprise. first is a conceptualization of systemsplanning through an explanation of the systems approach, cost effectiveness, and cost analysis. next are specific methods of systems-planning at the classroom, community college, university, and state 0 ....... 0 1:: '"'t ~ ~ .q. t"-t .... ~ '"'t ~ ~ e0 ~ .,... .... 0 ~ < 0 ~ c.:> .......... ~ s: ill '"i n .?"' ~ (0 ~ 0 e~tus hconstruct •c? hatcr message; ____j \.__} \.__} l.__1 \.__} i """ no ~ ~ ~ co co -· ;:$ q"q .q.. add 1 to h construct ~ print h skip a skip i a:: a read a read > delete delete msg old new old new ~ counter ~iessage ht":ti ~~~ d~a d-a (j ~ ~ ~ co construct i ~ral .......... invalid to matcr msg 1-; trj !:lo ~ > z ill ::s 0.. to roe fig. 2 continued. c:: tr1 ,j:>.. ....... ~ ~ ...... c subtract i 'r·-··-~ 2048 from ...... length c -1:"'4 .... add 1 to c:.-neii·count ~ exit ) ~ ~ .... c ;:! ~ .... s· ;: < c :-c.:> subtract i ......... ~ subtract i v \ r~d i 1 2048 from i 2048 from length ~ length cj no >; ~ no i . \ .?" exit ~ (!) exit ~ fig. 2 continued. add 1 to new-count fig. 2 continued. subtract 2048 i froh length ~ i exit ) ud-ne\1 hove hivalues to old-compare ~ area -----' hove hi· values to h nell-compare area ~~ j exit '"i; ~ ~ <.':) ~ ~0 ~ ::x:l (") ~ ~ ) <.':) ex it ~ b:j 1-c tr1 !:d ~ > z § 0.. b:j t-t c tr1 ..... (,:) 00886nam 2200205 0010013000000080041000130500021000540820018000751110093000932450119001862600 ~7ft03053000033003425000089c037550400290046465000320049365000240052570000460054970000460059571000400 0641& 67026007 &690324s1968 moua · b 10100 engo &0 sarc847sb.a67 1966& $4616.3/62/0755&20 saapplied seminar 0~ the laboratory diagnosis of liver diseases,scwashington, d.c.,$01966.&1 $alabor atory diagnosis of liver oiseases.$ccompiled and edited by f. william suno f. r~an and f. william sunde rman, jr.&o sast. louis,sbw. h. greensc*c1968*& saxl[[, 542 p.sbillus.sc27 cm.& sahelo under the a uspicf.s of the association of clinical scientists, nov. 10•13, 1966.& $al~cludes bibliographies.&oo saliversxdiseasessxdiagnosis.&oo$ameoicine, clinical.&10sasunderman, frederick williamtsd1898•seeo.& 10~asunderman, frederick william,sd1931•seed.&20saassociation of clinical scientists.* 00778nam 2200169 0010013000000080041000130500019000540820010000731000017000832450295001202600 04600415300002600461500003800487500002800525652002800553740002800581& 6702a617 &690324r19681846mdu c c 00000 f~go &0 saf93sb.h65 1968& sa929.3&10sahinman, royal ralph,$01785•1868.&1 saa catal ogtw of the names of the first puritan settlers of the colony of connecticut,sbwith the time of thei r a~rival in the colony, and their standing in society, together with thfir place of residence, as f ar as can be discovered by the records.sccollected from the state and town records.&o sabaltimore,sb genealogical pub. 
co.,sc1968.& sa336 p.sbport.sc23 cm.& $aon spine* first puritan settlers.& sare print of the 1646 ed.&oosaconnecticutsxgenealogy.&olsafirst puritan settlers.* 00896nam 2200193 00100130000000800410u0130500017000540820010000711000021000812450128001022600 0500023030000320020049c005800312500013300370504003100503650003100534710006100565810007700626& 6703 0030 &690324s1968 nyua b 00010 engo &0 sara395.a3sbu4& sa362.1£10saullmann, john e.t1 sat he application of management science to thf evaluation and design of regional hf.alth services,scedit ed by john e. ullmann.&o sa*hempsteao, n.y.,sbhofstra university*sc1968.& saiii, 346 p.sbillus.$c28 cm.&1 sahofstra university yearbook of business, ser. 5 0 v. 2& 'a**this* ~fport results from the c ontinulng series of m.b.a. seminars conducted by the school of business of hofsfra university.*& sa bibliographical footnotes.&oosacommunitv health services.&20 sahofstra university, hempstead, n.y.sbs chool of business.&2 sahofstra university, hempstead, n.y.styearbook of buslnfss,svser. 5, v. 2* 00844nam 2200217 00100130000000a0041000130410011000540500018000650 8 20014000831000027000972450 0940012426000580021830000490027635000100032549000730033550400810040865000260048965000330051584000270 054884000s200575& 67031114 &690328s1968 njua 8 00100 engo &1 shengfrf.&o san7b32sb.g6613& sa704.948/2&10sagrabar, anor=e,$dl896•&1 sachrlstian iconography*sba study of its origins.sc*trans lateo from french by terry grabar.&o saprinceton, n.j.*sbprinceton univepsity presssc*c196r*& sal, 174, *203* p.sbillus. ipart col.)sc27 cm.& sa15.00&1 sabollingen series, ~s. the a. w. mellon lectu res in the fine arts, 10& sabibliography* p. 149•158 12d group) *illustrations** p. *1*•*203* (30 g roupl&oosaartt early christian.&oosachr.istian art and symbolism.& sabolltngfn seriesrsv35.& sathe a. w. mellon lectures in the fine artsrlv10* fig. 3. print record program output. ..,. ..,. i ....... -q.. t"i & ~ ~ ..... c ~ ..... c;· ;:s ~ !"""' (;:) ........... ...... a:: ~ '"i pi-' cd c3 processing of marc tapes j bierman and blue 45 drop and transfer records program this is a utility program that enables any number of lc card numbers to be entered on cards, with the option in each case of dropping the record entirely or transferring it to another tape for future action. it has proven useful for removing out-of-sequence records, purging files, etc. inputs are two in number: 1) any tape in marc code and format (sequence is not checked) ; and 2) detail cards, each of which contains a 12-position lc card number and a code indicating if this marc record is to be dropped or transferred to another tape. these cards must be in sequence. there are three outputs: 1) an updated tape containing all marc records on which no action was taken; 2) transferred tape containing, in sequence, all records transferred; and 3) a listing showing the lc number and the action taken, which is useful for verification of results. print record program this program prints in readable form any tape in marc code and format. the translation table, which produces a form of upper-case ebcdic, is the same as that used for other department of libraries programs. it is a character-for-character translation, which, for the present, is useful for many and varied applications. input is any tape in marc code and format. output is an upper-case ebcdic translation of the tape. figure 3 shows a sample output. 
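the translation just described (and elaborated below for figure 4) is a straight character-for-character mapping with fallback characters for whatever the print chain lacks. the following is a minimal, hypothetical sketch of that idea in python rather than the original cobol; the printable set and the diacritic list are illustrative stand-ins, not the actual odl translation table.

```python
def translate_for_printer(text,
                          printable=set(" $.,()/:;'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
                          diacritics=set("~^`")):
    """Character-for-character fallback translation for a limited print chain.

    Rules follow the in-house table described in the text: lower-case letters
    print as their upper-case equivalents, diacriticals and foreign-language
    symbols print as "=", and rarely used punctuation not on the printer
    prints as a bullet.  The character sets here are assumptions for the
    example, not the real 1404 print chain.
    """
    out = []
    for ch in text:
        ch = ch.upper()                      # fold lower case to upper case
        if ch in printable:
            out.append(ch)
        elif ch in diacritics or ord(ch) > 127:
            out.append("=")                  # diacriticals, foreign symbols
        else:
            out.append("•")                  # punctuation the printer lacks
    return "".join(out)
```

the production table mentioned in the text differs only in mapping missing punctuation to its closest printable equivalent or to blank instead of to a bullet.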
figure 4 shows how the oklahoma department of libraries is handling the marc expanded character set with a small printer (ibm 1404, 48 characters). simply stated, the problem is that there are many more characters coded in the marc ascii character set than are available on the particular printer that the department of libraries is using. (this is a local limitation of the printer that happens to be available; it is not a limitation of computer technology, as printers with expanded character sets are readily available.) in general, rarely used punctuation and special punctuation marks not in the printer's character set print as an "•", the lower-case letters print as their upper-case equivalents, and diacriticals and foreign language symbols print as "=". this translation table is used for in-house lists (for checking purposes, etc.). for production purposes, a slightly different translation table is used. characters, particularly punctuation marks, not available on the printer are translated to their closest equivalent or left blank, whichever is more appropriate. at the oklahoma department of libraries, all translations at this time are internal and do not affect the marc tapes, which are being left in the original ascii code. it seemed unreasonable to centrally translate the tapes to ebcdic until agreement among all the users could be reached as to a mutually useful translation table. there is a good possibility that in the near future the information and management services division will make available an off-line printer with an expanded character set (upper- and lower-case letters, additional punctuation, etc.). if this does happen, then print-outs in an expanded character set would be economically possible.

fig. 4. conversion table.

fig. 5. library of congress number listing.

fig. 6. print card numbers program detail flowchart.

retrieval sub-system

withdrawing records program

this program withdraws records selected by lc card number and copies the complete marc ii records onto another tape. a library sends the department of libraries a magnetic tape containing the lc card numbers for the records it wants copied from the data base. the data base is searched and the requesting library is sent back three tapes and three hard copies. the tapes are: 1) the original finder tape, 2) an item tape containing the records which matched, and 3) a tape containing the lc card numbers of the records which did not match.
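the match itself is a classic two-file sequential comparison: the finder records and the marc master are both sorted on the twelve-position lc card number, so a single pass over each file is enough to split the finders into matched and unmatched groups. the original programs were written in cobol; the sketch below is only a hypothetical illustration of that matching logic in python, and none of the names are taken from the odl source.

```python
def match_finders(master_records, finder_numbers):
    """Merge-match two sequences sorted on the 12-position LC card number.

    master_records: iterable of (lc_number, marc_record) pairs, ascending.
    finder_numbers: iterable of LC card number strings, ascending.
    Returns (matched, unmatched) - the records for the item tape and the
    finder numbers for the unmatched-finders tape described in the text.
    """
    matched, unmatched = [], []
    master_iter = iter(master_records)
    current = next(master_iter, None)
    for finder in finder_numbers:
        # advance the master file until its key reaches or passes the finder key
        while current is not None and current[0] < finder:
            current = next(master_iter, None)
        if current is not None and current[0] == finder:
            matched.append(current)      # copied to the item tape
        else:
            unmatched.append(finder)     # listed for resubmission next cycle
    return matched, unmatched
```

a finder number that matches contributes one record to the item tape; one that does not is carried on the unmatched-finders tape so that it can simply be merged into the next cycle's finder records, as described later in the article.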
the three hard copies are: 1) a list in lc card number order of the records which matched, containing on the first line information from the finder tape and on the second line information from the marc tape; 2) a listing of the card numbers and other information on the finder tape which did not match any card number in the data base; 3) a listing of card numbers and other information on the finder tape that were invalid.

there are three inputs to the system, the first being a marc master, which is the latest merged master at the department of libraries; its records are in the original code and format. the second consists of finder records, which come from the individual library. input is originally on card in the format specified in table 1, then put on tape, blocked 5, and sorted (no tape labels are used at this time) on all 12 positions of the lc number. the tapes are unlabeled upper-case ebcdic, 1600 bpi. the third is a card that enters the appropriate date and library code into the system.

table 1. original card input format to odl-05

card columns    field contents and special instructions
1               local library code (assigned by dept. of libraries)
2-4             lc card number prefix (upper case alpha or blank)
5-12            lc card number (numeric)
13              lc card supplement indicator (may be blank)
14-28           local use (may be blank)
29-48           local use or first 20 positions of author (may be blank)
49-76           local use or first 28 positions of title (may be blank)
77-80           local use or publication date (may be blank)

the system gives the following five outputs:

1) matched records, a listing of records that matched and were transferred to the individual library's item tape. this listing shows all information from the finder record, and immediately below, the following information from the marc record: lc card number, the first 20 characters of the author, the first 28 characters of the title, and the publication date. information pulled is as follows: author (first tag beginning with 1), which will usually be 100 or 110; title, which will always be 245; and date, which will be the 7-10 positions under tag 008. figure 7 shows a sample of output. the first line is data from the finder tape and the second line data from the marc master tape.

fig. 7. matched records listing.

2) items tape, containing all records requested from the master tape. they are in marc format and code, and the number of logical records should match the matched record count.

3) unmatched finders listing, showing all valid finder records that did not match the marc master tape. figure 8 shows sample output.

4) unmatched finders tape, containing all valid finder records that did not match the marc master tape.

fig. 8. unmatched records listing.

fig. 9. errors listing.

5) errors listing, showing all 80 columns of invalid finder records and the appropriate error message. finder records are invalid if one of the following errors occurs: 1) blank or invalid library code; 2) prefix contains any character other than a blank or upper-case alpha; 3) lc card number not pure numeric. invalid finder records are not processed but are placed on an error listing. figure 9 shows sample output. no edits will be made on columns 14-80, which are for local use entirely; all data from these fields will be transmitted to printed listings for any desired local use or for verification. record counts are included at numerous points to facilitate accurate record control.
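the three edits above are simple column checks on the 80-column finder card, using the field positions of table 1. the following is a small, hypothetical sketch of those edits in python (the production program was cobol); the error messages are paraphrased from the sample listing, and the library-code edit is reduced to a blank test because the table of assigned codes is not reproduced here.

```python
def edit_finder_card(card):
    """Apply the three finder-card edits described in the text.

    card: an 80-column card image.  Columns 14-80 are local use and are not
    edited.  Returns a list of error messages; an empty list means valid.
    Column positions follow table 1 (column 1 is index 0).
    """
    card = card.ljust(80)
    library_code = card[0]       # column 1: local library code
    prefix = card[1:4]           # columns 2-4: lc card number prefix
    lc_number = card[4:12]       # columns 5-12: lc card number

    errors = []
    # rule 1: blank or invalid library code (the real edit also checks the
    # code against the list of assigned codes, which is not given here)
    if not library_code.strip():
        errors.append("invalid library code")
    # rule 2: prefix may contain only blanks or (upper-case) alphabetics
    if not all(c == " " or c.isalpha() for c in prefix):
        errors.append("invalid lc prefix")
    # rule 3: the lc card number itself must be pure numeric
    if not lc_number.isdigit():
        errors.append("invalid lc number")
    return errors
```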
for the purposes of this particular program, counts should check as follows: matched records + errors + unmatched records - generated errors = original count. matched records appear at the end of the listing of the same name, errors appear at the end of the listing of the same name, and unmatched records appear at the end of the listing of the same name. generated errors appear at the end of the matched records listing. a generated error indicates more than one error in a single card, and this count is included only for control purposes. the original count is expected to be maintained by the submitting library for maximum accuracy. these counts are checked immediately, and any discrepancies cleared up as soon as possible. figure 10 gives the overall view of the program and figure 11 a detailed flowchart.

the odl-05 program was written to provide the greatest flexibility possible to the user libraries. the only information absolutely required for the finder tape is the local library code and the complete lc card number. however, the remaining 67 card columns are available to the local library for any use it may wish to make of them. if the local library would like a quick method of sight checking to make sure that the records copied were the records wanted, it can keypunch the first twenty characters of the author in columns 29-48, the first 28 characters of the title in columns 49-76, and the date of publication in columns 77-80. if this is done, the matched records listing will contain the author, title, and date from the finder tape, immediately followed underneath in the same position on the page by the corresponding information from the marc record. figure 7 shows sample output. thus, the library can quickly sight check what it thought it was getting at the time of request against what it actually got from the marc record. of course, the local library is free to put no information, or other information, in columns 29-80; the operation of the system will not be affected, and whatever information is included in columns 29-80 will appear on the three output listings (matched records, unmatched finders, and errors).

fig. 10. withdrawing records program system flowchart.

fig. 11. withdrawing records program detail flowchart.

another convenience for the local library is that it has to do no original programming to use the system. all that is needed are standard sort, merge, and card-to-tape programs. any of the programs written by the department of libraries is available to users on demand. they may find the merge or lc card number print programs useful.
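the record-control balance stated at the start of this section is easy to express as a single check. the fragment below is a hypothetical illustration in python, with invented sample figures; it is not part of the odl programs, which handled the counts on the printed listings themselves.

```python
def counts_balance(matched, errors, unmatched, generated_errors, original_count):
    """Verify the record-control equation from the text:
    matched records + errors + unmatched records - generated errors = original count.

    A generated error marks a second (or later) error found on a card that is
    already counted once among the errors, which is why it is subtracted out.
    """
    return matched + errors + unmatched - generated_errors == original_count

# invented example: 15 matched, 14 errors (2 of them generated), 9 unmatched,
# against an original deck of 36 finder cards
assert counts_balance(15, 14, 9, 2, 36)
```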
another consideration for the user is the ease with which invalid finder records and unmatched finder records can be resubmitted into the system. to correct finder records in error, the library simply repunches cards from the error listing, with necessary corrections, and resubmits them in the next cycle with new cards. unmatched finder records can be merged with any new finder records in the next cycle and resubmitted, no repunching being necessary. 60 journal of library automation vol. 3/ 1 march, 1970 what is presently being done the variety of applications for marc presently being worked on in oklahoma libraries is most interesting. central state college, edmond, oklahoma, is currently subscribing to the weekly marc tapes and producing an index of available materials which cumulates for two months and then drops off the older entries. the library is receiving its own subscription to the marc tapes for this purpose but does not plan to maintain a complete file of marc records. the tulsa city-county library system, tulsa, oklahoma, is currently using marc records from the state data base for bibliographic information for its machine produced book catalog. it originally had a subscription to the marc tape service, but with the operation of the state-wide data base, is dropping it. the university of oklahoma, oklahoma state university, and oklahoma county libraries have no immediate plans for utilization of the marc records as distributed by the library of congress; however, when they do move in this area it will probably be for use in their technical processing departments and the state marc data base will form a basis for their use. computer and language used the computer being used for the department of libraries marc program is an ibm 360/ 30 located in the state budget bureau but under the administrative control and operation of the information and management services division of the board of affairs (the centralized state computer center for the capitol complex) . the computer has 32k core size, one on-line card read/ punch, model 2540, four magnetic tape drives, model 2415, two magnetic disk drives, model 2311, and one on-line printer model 1404. the programs are written in cobol for the 360/ 30, operating under dos, with a cobol compiler. very little modification would be required to operate under os. the merge program ( odl-01) requires three tape drives. the withdrawing program ( odl-05) requires four tape drives but could be modified to operate with only three tape drives. in agreement with henriette avram and julius droz ( 4), the department of libraries has found that cobol can easily be used to process marc records. the information and management services division has assigned a programmer to the department of libraries who has done, and will do, all the marc programming. she is actually employed by the imsd and the department of libraries contracts with them for her services. presently, the department is being charged about $7.00 an hour for programming time. the planning, system design, actual programming, and production are all closely supervised by the data processing coordinator of the library, and he is on the department of libraries' staff. processing of marc tapes/ bierman and blue 61 the relationship between the imsd and the odl has been extremely beneficial for the library. thus far, the centralized computer center has provided fast and excellent service at a minimum cost. 
having a fulltime data processing coordinator on the staff of the library has negated the communication barrier which so often exists between a computer service center and a user library.

cost

cost figures for use of marc are very difficult to find. few of the marc i participants (3) give anything but a fleeting reference to cost. the reason is clear: cost figures are difficult to determine and even more difficult to evaluate meaningfully. table 2 is a breakdown of the charges to the department of libraries for programming and machine time; it does not include department of libraries' staff time or overhead costs. the figures are accurate through the end of february 1970.

table 2. costs

system design                                                    $1,102.00
programming                                                       2,467.00
machine cost for program testing and debugging, and machine
and operator cost for merging through 2/28/70                     2,026.00
total                                                            $5,595.00

for the first year, the department of libraries is absorbing all the costs of merging and maintaining the marc master file, as well as the costs of all programming, as a form of state aid to libraries. the machine costs of comparing a finder tape with the master file, copying the desired records, and printing the various hard-copy lists are being absorbed by the user library. the user also supplies the two blank tapes which are needed for each run. the machine time costs are based on the rate of $80.00 an hour of cpu time.

plans for the future of the state-wide marc master file

two major problems are apparent in the system as it is now set up. the system was initially created as a sequential tape system because this was the easiest and quickest way to establish a working system, and because it was felt that this would be practical for at least the first year of operation. one problem is that the sequential file will become expensive to maintain and does not allow direct access to a particular record without a sequential search. another problem is that the present system allows entry into the file only by lc card number and does not allow entry directly with bibliographic information. in accordance with present plans, in march 1970 work will begin on converting the storage medium from tape to a direct access device (disk or data cell) as the recon study suggests (5). at that time the file will cease to be maintained in lc card number order and will be maintained in the order in which the records are received from the library of congress. various indices to the marc data base will be produced; author and title indices will enable the data base to be searched by bibliographic information when the lc card number is not known. in this way, only the indices (which would be comparatively much smaller), and not the complete data base, would have to be merged and searched. in terms of the data base itself, this will be the next major change. in the long run, it will be desirable for libraries that want access to the marc data base to have such access directly via terminals. at the present time, the cost of this kind of access is not worth the increased speed of access, nor is the money presently available; however, in the future, the cost of such a system will surely be reduced by technological improvements and the increased importance of instantaneous access to the data base. when need balances with cost, such a set-up will be feasible.
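the planned author and title indices amount to small inverted files keyed back to the lc card number, so that only the index, rather than the full data base, has to be merged and searched. the sketch below is a hypothetical modern rendering of that idea in python; the choice of fields (author from the first 1xx tag, title from tag 245) follows the description of the matched records listing earlier in the article, and nothing here is taken from the odl programs themselves.

```python
def build_indices(records):
    """Build author and title indices that point back to LC card numbers.

    records: iterable of dicts with keys "lc_number", "author", "title"
    (author from the first 1xx field, title from field 245).  Returns two
    dicts mapping a normalized key to the LC card numbers that carry it.
    """
    author_index, title_index = {}, {}
    for rec in records:
        author_key = rec["author"].strip().lower()
        title_key = rec["title"].strip().lower()
        author_index.setdefault(author_key, []).append(rec["lc_number"])
        title_index.setdefault(title_key, []).append(rec["lc_number"])
    return author_index, title_index
```

a request arriving without an lc card number could then be resolved against the much smaller index, and only the matching records pulled from the direct access file.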
the geographical expansion of the system is a possibility. economically, this is most desirable, because the more ways the cost of maintaining the data base is split, the cheaper it is for all involved. some preliminary investigation along these lines with bordering states is being made and hopefully at some time in the future there will be a regional data base which many libraries can use. plans for future cooperative use of marc the cooperative use of marc thus far in oklahoma only affects the larger libraries which have access to computers and automation personnel. essentially, each library is autonomous and is free to use marc in any manner it wishes. it will remain true in oklahoma that individual libraries will always be free to use the data base to retrieve part or all of the data base for any purpose. however, plans are under way for more cooperative use of marc with libraries that do not have automation capabilities that would result in useful hard copy products for such libraries. two such cooperative plans have been proposed for immediate implementation. the first of these is a current awareness service. selected subjects would be compared against the data base on a bi-weekly (or other period) basis and complete bibliographic information for books representing the selected subjects would be printed as a personalized current awareness service. for example, all law titles on the marc tapes for two weeks could be pulled and listed, and the listing distributed to the county and state law offices, attorney firms, the law school library, etc., for selection and order purposes. the same could be done with library processing of marc tapes/ bierman and blue 63 science or any other subject. subject lists of interest to various agencies of state government could be produced and sent to them. another possibility is a profile of a legislative session by subject and then weekly or monthly lists of current materials available on these subjects for ordering by the department of libraries and possible lists to be made available to the legislative members. there are many possible uses for such a system which could be done fairly inexpensively. work began on this project in october 1969, and the service became operational on a cost basis in february 1970. a second possibility is catalog card and processing aids production. this would probably be done as a pilot project with several libraries throughout the state and then, if successful, expanded to any library in the state wanting to use the service. catalog card sets with subject headings printed at the top, and call numbers printed if the library accepts lc or lc dewey classification (there would be several options available within the system), spine labels, and book and circulation card labels would be provided. a by-product of such a state-wide operation would be the maintenance of book location information in machine readable form in a central place for future use as a basis for a machine readable state-wide union catalog. a project not in the immediate future but certainly being considered is that of cooperative retrospective conversion. that is, several libraries in the state would like to have bibliographic information in marc format for all books in their collections. whether the department of libraries would go ahead with such an ambitious project or wait for it to be done nationally ( recon study) would depend on timeliness on the national scene, need on the local scene, and available financial resources. 
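the current awareness service proposed above is essentially a subject-profile match run against each period's incoming marc records. the sketch below is a hypothetical python illustration of that selection step; the profile names and record fields are invented for the example and do not describe how the odl service was actually programmed.

```python
def current_awareness(new_records, profiles):
    """Select each period's new records for each subject profile.

    new_records: iterable of dicts with keys "lc_number", "title", "subjects".
    profiles: dict mapping a recipient (e.g. "state law offices") to a set of
    subject terms of interest.  Returns a dict of recipient -> matching
    records, ready to be printed as a personalized current awareness list.
    """
    selections = {name: [] for name in profiles}
    for rec in new_records:
        rec_subjects = {s.lower() for s in rec["subjects"]}
        for name, wanted in profiles.items():
            if rec_subjects & {w.lower() for w in wanted}:
                selections[name].append(rec)
    return selections
```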
eventually, oklahoma would like to have in machine readable form a complete union catalog of the entire library resources of the state that could be used for cooperative acquisitions programs, for strengthening subjects which are weak within the state, and as a location tool for interlibrary loan. such a data base would later be used also for reference functions. needless to say, such an ambitious project as this is not in the immediate future.

conclusion

early in the game, oklahoma libraries learned that the most economical means to library automation was cooperative automation. the creation of a state-wide marc data base is an important step toward cooperative library automation, while still allowing each local library to maintain its individuality for uses of the data. many areas of cooperation still remain untouched. the future success of library automation in oklahoma lies in the imaginative and creative projects that could be designed and implemented cooperatively to the mutual cost savings and benefit of all.

programs

copies of the programs mentioned in this paper may be obtained from the national auxiliary publications service of asis as follows:
1) "a program to merge all marc ii tapes received from the library of congress onto a single tape" (naps 00815);
2) "a program to drop given records or to transfer them to a separate tape" (naps 00816);
3) "a program to print marc tapes in readable form" (naps 00817);
4) "a program to pull selected records from the marc master tape for a single library" (naps 00818); and
5) "a program to print a listing of all library of congress card numbers on a given marc tape" (naps 00819).

references

1. nugent, william r.: "nelinet: the new england library information network." a paper presented at the international federation for information processing, ifip congress 68, edinburgh, scotland, august 6, 1968. (cambridge, mass.: inforonics, inc., 1968), 4 pp.
2. pulsifer, josephine s.: "washington state library." in avram, henriette d.: the marc pilot project; final report on a project sponsored by the council on library resources, inc. (washington: library of congress, 1968), pp. 149-165.
3. avram, henriette d.: the marc pilot project; final report on a project sponsored by the council on library resources, inc. (washington: library of congress, 1968), pp. 89-183.
4. avram, henriette d.; droz, julius r.: "marc ii and cobol," journal of library automation, 1 (december 1968), 261-72.
5. recon working task force: conversion of retrospective catalog records to machine-readable form; a study of the feasibility of a national bibliographic service. (washington, d.c.: library of congress, 1969).

reports and working papers

cable library survey results

public service satellite consortium: washington, d.c.

the following paper was distributed to pssc members in may 1981, and is reproduced here to bring it to the attention of a wider audience.

background

the public service satellite consortium (pssc) conducted a survey of academic libraries in july 1980 to study their data communications needs and services.
results of that study, coupled with library interest generated by that study, convinced pssc that: (1) libraries have a wide variety of communications needs which could be addressed with appropriate uses of telecommunications; (2) all types of libraries are affected, not just academic libraries; and (3) data transfer was but one of many types of library services in need of better communications. this information motivated pssc to take a broader look at library communications. that second look resulted in the identification of the "cable library" (catvlib) phenomenon and video library services. in december 1980, pssc launched a second survey directed to cable libraries; that is, libraries of all types which are connected to local cable companies. this study was aimed at determining to what extent, if any, a national satellite cable library network might be already in technical existence. how many libraries are presently connected to cooperative cable companies with satellite hardware and excess satellite receiver capacity? and of that number, how many cable libraries would be interested in participating in satellite-assisted library services and video-teleconferences? to answer these questions, pssc mailed questionnaires to 101 libraries that had been identified as potential cable libraries. in order to allow the participation of unidentified cable libraries, pssc also advertised the survey in various library periodicals, including american libraries, cable-libraries, and lola. that ad resulted in an additional 97 cable libraries requesting to participate in the survey, raising the total number of libraries receiving the questionnaire to 198. as of april 1981, 86 libraries have responded, yielding a 43 % return. follow-up phone calls have indicated that more surveys are forthcoming, or that the questionnaire proved to be irrelevant to present library conditions. in some cases, copies of the survey were requested and distributed for informational purposes only. the survey instrument the questionnaire incorporated explanations of terminology and was eight pages long. additional enclosures furnished more specific information about pssc and videoteleconferencing. the respondent was not only questioned about his/her library facilities, but also was asked to interview thecable company for necessary technical information. though contributing to slower returns, this two-tiered approach did succeed in establishing contact between the library and the cable company, as well as provide all the data required to profile each library as a potential network participant. survey participants since a national network is being pursued, an attempt was made to reach as many of the states as possible. thirty-seven states received copies of the survey, while thirty-one had at least one responding library. all types of libraries were surveyed. those surveyed included elementary school libraries, high school libraries, vocational school libraries, academic libraries, public libraries, regional library networks, state libraries, library systems, special libraries, and libraries that also double as their local community access center for cable television. of the 86 who responded, 63 were public, 18 were academic, 4 were school, and one was a special library. responding libraries have been categorized according to their ability to be an active member of the network: uf usable facility-those libraries that have met all the technical requirements for network participation. 
the library must be currently connected to an operational cable system which has a satellite receiving station and excess receiver capacity. in addition, the cable system and the library must have indicated an interest in participating in and hosting occasional satellite-transmitted events.

nxc no excess ro capacity - libraries that meet all technical cable connectivity requirements, but whose cable system cannot presently accommodate any more activity on its satellite receiver(s), are grouped here. should time become available in the future, these libraries are then technically able to advance to the usable facility group.

nro no catv ro - here are placed those libraries that are connected to an operational cable system. however, the cable system has no satellite receiving station and, therefore, no satellite access. in order to become a usable facility, these cable systems must install a satellite receiving station and be able to offer excess receiver capacity.

ncc no catv connection - while a cable system with all the satellite hardware requirements may be operating in the library's area, these libraries are not connected to the cable system. reasons given in the survey are varied, including logistics, economics, and disinterest. depending upon the technical status of the cable system, a simple link may be all that is needed for the library to become a usable facility.

nca no catv in area - libraries in this group are located in areas that presently have no operational cable system. some areas are now in the franchising process, some have awarded franchises but are not operational, and others have no idea if and when cable service will come to their areas. libraries here have the advantage of knowing what requirements are necessary for network participation and can use this information when franchising negotiations begin.

ni no interest - here are grouped those libraries that are at various stages of technical capability, but have no desire to participate in a national satellite cable library network.

table 1 illustrates responses according to geographical location. (numbers refer to the quantity of libraries from each state that fit into the above defined categories.) exactly half of these respondents are usable facilities. the largest hindrance to network participation is lack of connectivity between the library and the cable system.

table 1. survey responses by state and category. column totals: 86 responding libraries; uf 43, nxc 7, nro 9, ncc 14, nca 8, ni 5.

library/cable connectivity

part one of this survey established the degree of connectivity between libraries and their local cable companies. pssc's major concern was to find libraries wired to at least receive cable programming. pssc also discovered that the highest percentage of libraries had two-way connection, usually for the purpose of cablecasting. connectivity among the 86 respondents was broken down as follows (all percentages have been rounded off):

33 (39%)  two-way interconnection (transmit and receive video)
29 (34%)  one-way catv drop (receive only - regular subscriber)
14 (16%)  no catv connection
9 (10%)   no catv in my area, or none presently operational in my area
1 (1%)    no answer to question

other questions in this section profiled the technical capabilities of the cable system. specific hours of each day of the week a satellite receiver was available for occasional use were charted.
weekday mornings proved to be the most available time block. it is also imperative for pssc to know which transponders (channels) of the satellite the cable systems can access. there are twenty-four transponders on satcom i, the main satellite used by cable. when pssc coordinates a satellite telecast, time on a satellite transponder must be secured. each transponder is leased to someone, such as home box office (hbo), ted turner's cable news network, or the appalachian community service network (acsn), to name a few, for the carriage of their programming. time needed by pssc for a two-hour satellite event, for example, can be sublet from a transponder lessee, subject to availability. however, finding time slots on satcom i transponders is becoming increasingly difficult as many lessees are expanding the number of hours of their own programming. as a result, pssc must know which transponders each cable system can receive so that an attempt can be made, where possible, to accommodate the majority of survey facilities. the ideal situation is for catvs to own "frequency agile" satellite receivers; that is, receivers that can access any of the transponders. some receivers can get only even-numbered transponders or odd-numbered transponders; others can access only certain individual transponders. transponder accessibility is usually related to the type of programming the cable operator offers or plans to offer to the local cable subscribers, or to the age of the system. (older systems often use twelve-channel receivers, tunable to only even- or odd-numbered transponders on satcom i.) for example, if a cable operator does not anticipate offering anything besides hbo now or in the future from satcom i, often he/she cannot justify the need for a frequency agile receiver. table 2 outlines transponder accessibility for usable facilities only.

table 2.

transponder #     # of facilities able to access
1                 2
2                 2
3                 1
4                 1
5                 1
6                 4
7                 3
8                 3
9                 6
10                3
11                0
12                2
13                1
14                3
15                0
16                3
17                1
18                1
19                0
20                2
21                3
22                4
23                0
24                5
frequency agile   30
not sure          4

note: these figures are for transponder accessibility on satcom i. numbers for the specific transponders were tabulated from those surveys that indicated their satellite receivers were not frequency agile, but rather could access only those transponders they had listed.

this abundance of frequency agile receivers will provide the connected libraries with a greater amount of flexibility in receiving programming, since their participation will not be dependent upon a certain transponder. another question probed the availability of provisions for closed-circuit, discrete delivery of satellite transmissions from the cable system's receiver into the library. being able to provide closed-circuit capabilities would ensure the privacy of a satellite telecast. some pssc clients insist that their transmissions be safe-guarded through closed-circuit delivery. as expected, closed-circuit arrangement does not exist between very many libraries and their catvs. unless part of an institutional cable loop, most libraries cannot presently be singled out for closed-circuit cable reception. under normal conditions, what is transmitted from the head end of the cable system travels to everyone subscribing to the cable service. eleven of the forty-three usable facilities claimed that closed-circuit capabilities are currently available.
those thirty-two without described what technical considerations must be present before such provisions could be offered. these technical requirements included scrambling devices, mid-band channel usage, modulators and demodulators. such upgrading of the cable company's hardware was quoted as costing from hundreds to several thousands of dollars. no catv indicated willingness to assume the expenses for such special capabilities, but a few did offer to investigate the possibility of temporary special links on a per-occasion basis.

library facilities

the survey also asked about the library's facilities. information in part two centered on library accommodations and equipment. answers here provided a description of each library, which gave pssc an idea of how adaptable to hosting satellite teleconferences each might be. a basic satellite program viewing facility consists of the viewing area, equipped with chairs and tables, at least one television monitor (wired to receive the cable programming), and, for interactive programs, a telephone. survey libraries reported they had conference rooms, auditoriums, and classrooms available for viewing satellite telecasts. the number of viewers able to be accommodated at one time ranged from 6 to 400, with the average facility holding 75 people. some libraries could provide simultaneous viewing in more than one room, which increased the total number of people they could accommodate for a single event. a majority of the libraries had more than one monitor; some as many as fifteen monitors. three libraries indicated they owned a large-screen television projector. forty-four percent of the usable facilities have no phones in the viewing rooms, but many explained that phones were either nearby or could be temporarily installed for an interactive event. in response to a question about the location and accessibility of the library within its community, the general comments described the majority of the libraries as being in a convenient part of town, with ample parking and barrier-free design. when given enough advance notice, most libraries were willing to schedule an event at any time, even during hours and on days the library was normally closed to the public. traditionally, as a part of its standard networking service, pssc rents viewing facilities for the client, whether they are public television stations, hotels, or other facilities. libraries, as another type of viewing resource, would be entitled to receive payment for use of their facilities. obviously, this fact treads on controversial "fee or free" waters. being aware of this, pssc asked the libraries whether they could accept money for these purposes; and, if not, whether they might have some other mechanism, such as a "friends of the library" group, to which the money could be given instead. those libraries that said they could accept money directly for the use of their facilities numbered thirty-four.
oddly enough, thirtyfour libraries also said they could not accept money directly for the use of their facilities. of that group, thirty-one indicated they did have a "friends of the library" or similar group to which money could be given for indirect channeling back into the library. eighteen libraries did not answer this question (many due to libraries not completing the entire survey once they felt the cable information made them technically ineligible for participation). only three libraries might have a problem with financial arrangements for an event. program interests the final section of the survey (part three) gave each respondent the opportunity to list topics of interest to the library and community that could be presented via a satellite video-teleconference. general comments identified continuing education, organizational conferences, training, seminars, workshops, media distribution, and information dissemination as major activities suitable for satellite-assisted delivery and distribution. special target audiences included the following: 1. senior citizens 2. handicapped 3. minorities 4. the disadvantaged (economically, educationally, socially) 5. the abused (drug addicts and alcoholics; abused children and spouses, teachers and students; victims of crime; and the sexually harrassed) 6. the institutionalized (in hospitals, prisons, nursing homes, mental health centers, hospices) these special patrons are often served through outreach programs and were named here as potential beneficiaries of satellite programming. the most frequently named special population was the elderly, with suggestions for retirement, social services, nursing-home care, insurance, and other senior-oriented programming. three major classes of other potential users of satellite video-teleconferencing in the library were identified: 1. education-oriented: preschool and nursery students; elementary, middle, junior high, and high school students; postsecondary and graduate students; vocational, technical, extension, and cooperative education students; special education students; adult and continuing education students; educational administrators, faculties, and staff 2. government-oriented: federal, regional, state, county, and local government officials and employees 3. employment-oriented: professional! nonprofessional; salaried/hourly; union /nonunion; management/staff; public/private sectors; employed/ unemployed; full /part-time; permanent/ temporary; big/small business; human services/ trade particular topics of interest felt to be ideal satellite program areas within each library's community included the following (appearing in no rank order): energy (solar and natural resources) consumerism community services environment historic preservation/oral history legal aid librarianship computers, data processing technology communications/telecommunications fund raising safety recreation , physical education, sports, parks language (bilingual, sign, foreign, literacy) economics and finance (investment, banking, inflation, budgeting) conservation genealogy religion business and industry civil defense agriculture and forestry health and medicine mental health arts and humanities curriculum sharing therapy and rehabilitation real estate several local associations, who have affiliates or branches located nationally, were listed as potential users of satellite videoteleconferencing (in order of popularity): 1. american association of retired persons 2. league of women voters 3. historical societies 4. 
american library association 5. chamber of commerce 6. american association of university women 7. parent/teacher associations 8. councils of government 9. jaycees 10. boy scouts 11. friends of the library

three questions concerning interest and ability to participate in future satellite video-teleconferencing activities were asked. the questions, vital to the outcome of this survey, are reiterated here with their respective answers:

1. would you be interested in helping set up one or more of these specialized teleconferences?
yes 63 (73%)   no 10 (12%)   maybe 5 (6%)   no answer 8 (9%)

2. would you be interested in doing a local follow-up program after a national teleconference that is of interest to your community?
yes 65 (76%)   no 6 (7%)   maybe 8 (9%)   no answer 7 (8%)

3. periodically, nationally based organizations sponsoring teleconferences or special programs enlist promotional and site arrangement support from local site facilitators. would you like to be listed as available to provide this support?
yes 54 (63%)   no 18 (21%)   maybe 3 (3%)   no answer 11 (13%)

the interest of the libraries surveyed is well documented in questions one and two. however, their ability to presently participate is limited by financial and personnel resources, as demonstrated by question three's responses.

general conclusions and recommendations

the majority of surveyed libraries recognize the need for libraries to expand their community service roles through some use of telecommunications. many of the 86 libraries indicated the concept of libraries becoming satellite program viewing facilities through their cable connectivity was an idea so new to them that they could not fully
-coordinate local follow -up activities? provide refreshments? coordinate advance publicity within the community? once the catvlib has determined whether or not it is able and desires to offer their services, the ca tvlib would be recorded as a satellite program "receive site." theca tv lib will then assume the degree of local responsibility requested and contracted by the requesting organization, including all negotiations necessary with the cable system. while there were survey indications of general support for such a national satellite cable library network, what are the pros and cons of its operation? pros pre-existing conditions. ca tvlibs need no investment for hardware, but merely take advantage of pre-existing cable connectivity. community service. such ca tvlib participation potentially offers service to every member of the community. outreach to new patrons. those community residents not previously using the library may find this new service applicable to their needs. economics. catvlibs could recoup any charges incurred through this service, as well as expect payment as a rented receive site. program interaction. live satellite programming has the advantage over taped programming of allowing the option of offering viewers the opportunity to interact with the program's presenter(s). resource-sharing potential. this service has the future potential of providing catvlibs with an alternative method of accessing new information resources and data bases. human resources can be shared now through this service. potential catv expansion. more catvs are expanding and upgrading their satellite access capabilities as usage of satellites by cable programming vendors increases. some catv s have already purchased west ar iii hardware in addition to their satcom i hardware. future implications. if satellite-related services become valued by the community, the residents might decide the catv lib should have its own satellite hardware so that the community could take advantage of more programming available directly from satellite. cons lack of sa tcom i occasional time. it is becoming increasingly difficult to sublease transponder time on this satellite for occasional satellite programs. dependency. the catvlib must depend entirely on the cable system to be able to be a network participant and offer this service. ca tvlib participation is dependent upon the cable system's satellite access capabilities, which generally means satcom i only. lack of cctv. generally, most ca tvlibs cannot offer closed-circuit capability, so absolute privacy cannot be guaranteed to the program's sponsor. catvlib policies. some catvlibs will have to make decisions about various controversial items, such as: -accepting money for use of facilities. -allowing some clients the right to limit viewing to only registrants. -hosting controversial groups. range of catvlib capabilities. the survey demonstrated that ca tvlibs cannot all offer the same degree of service due to the wide range of technical capabilities. at present, each satellite event would have to be judged individually to determine which catvlibs were equipped to participate. a glance at the pros and cons of marrying libraries and satellite communications through cable connectivity suggests a national satellite catv lib network is a presently available and usable resource with potential for future expanded capabilities and unlimited programming uses. 
the obstacles imposed by the cons, however, are cause for a serious and objective look at the present and future viability of such a network. popular present uses of satellite videoteleconferencing are for telecasting continuing education and organizational conference interactive programming to special audiences. some pssc clients will often request to: -charge his/her special audience for participating (course or conference fees, for example). -have the satellite-transmitted event reports and working papers 311 closed-circuit telecasted to the receiving locations only. -reach specific geographical locations (often large urban areas, such as new york or los angeles). charging special audiences for closed-circuit satellite event the first two client requests are often related. if the client intends to charge the registrant-viewer a fee, he/she often expects the program to be viewed only at designated receive sites that are hosting the paying participants. (why should a viewer pay if heishe could watch the same program at home on a cable channel for free?) obviously, those clients interested in a "box office" approach to their event, that is, to make a profit rather than offer a service, are not suited for catvlib network use. however, how can the ca tvlibs accommodate those public service groups which must recoup expenses in order to offer such satellite program services? client-designed incentives such as giving the phone number for viewer interaction in a program only to the ca tvlibs rather than displaying or announcing the number during the program; requiring participants to have special materials and/ or integrating local preor postevent activities in the catvlibs with the program; even offering course credit to registrants only are manageable alternatives for those catvlibs that cannot terminate the program in their facilities only. some catvlibs may be able to negotiate whh their catv for the provision of the necessary equipment to provide closed-circuit capabilities. however, this survey did not identify many catvs that were willing to cooperate with the libraries to that extent. for those catvlibs whose policies restrict their involvement with financial transactions, particularly money exchange among library patrons, advance registration fees paid directly to the client could enable the libraries to avoid being required by the client to "collect at the door." most libraries, however, by their very nature, cannot prohibit anyone from viewing a program within their facilities, thereby making it generally impossible for them to guar312 journal of library automation vol. 14/4 december 1981 antee the client their requested selective audience. size, location, and distribution of receive sites video-teleconference users generally want to reach as many of their members or special populations as possible, yet they must pay to rent each receive site. economics influence their attempt to reach more people at fewer locations, not necessarily those most in need of the program. therefore, it is no surprise that popular receive sites are located in heavily populated cities. while cable television is finally coming to urban areas, present conditions find a lack of operational catvs available. the typical catvlib now is located in a smaller city or rural area. large states, such as california and texas, have little or no catvlib representation. only twentythree states currently have a usable catvlib facility, which makes the network descriptor "national" not quite accurate. 
expanding the catvlib network to include more and larger cities and all states is a must to make it competitive with other satellite networks available to a client. but even if the network is able to expand, the previously mentioned inability of catvlibs to provide closed-circuit capabilities will lessen its desirability as a resource when that capability is offered by another satellite ground facility in the same city. one competitive alternative a catv lib can consider is rental cost. clients expect to pay a reasonable rate for the use of each facility. this rate differs among different types of satellite networks, and even within the same network. for example, renting a public television station is generally less expensive than booking a hotel. yet the rate for two public television stations can vary in the hundreds of dollars. if a catvlib chooses to offer its facilities for free, asking only for compensation os any expenses it might incur because of the satellite event or charges a minimal amount, their facility becomes economically attractive. one factor the ca tvlibs must not overlook when contemplating such a decision is the cable system. will the cable system expect remuneration for its services, especially if the catvlib is receiving payment? libraries must remember they have entered into a cooperative arrangement with their catvs in order to become a satellite program viewing facility. toward future independence while a skeletal cable library network does technically exist, it is imperative that libraries work toward their own future independence before they can truly establish themselves as a viable satellite network. evolution of a catvlib network to a satellite library network might include the following two steps: l. expanded catvljb network. the survey instrument should now evolve into an interview tool for profiling additional libraries to become part of this network. efforts should be made to encourage libraries within poorly represented states to join the network if technically feasible. expansion is urged for two main reasons: to allow libraries the opportunity to experience being a satellite program viewing facility without financial obligations. -to allow community residents the opportunity to experience a library service with great potential for all local population segments. once the library is regarded as the logical place for community communications, it will be much ea~ier to begin a community drive toward supporting the outfitting of the library with the proper hardware necessary to function in that capacity. requirements for becoming part of the expanded catvlib network include: -at least one-way connectivity between the library and the catv. (a typical subscription for basic service will suffice.) -the catv must have a satellite receiving station. -the catv must have excess capacity available on its satellite receiver. -catv must be willing to cooperate with the library in providing satellite reception of occasional satellite telecasts. library must have at least one viewing room available to seat those viewing the satellite program. library must have at least one television monitor, wired to receive cable programming, available in the viewing room. library must be willing to assume role of community contact to extent requested by client. (need is for library interest in participating in these occasional satellite telecasts; degree of local responsibility can be negotiated.) 
even though this network is designed to be a temporary method of allowing library participation in satellite communications, future implications could find these libraries expanding, improving, or beginning cablecasting on a library-designated cable channel. thus, libraries deciding whether they should become involved with a temporary network might contemplate the related activities available from library/cable system cooperation. 2. satellite library network. at some point in the not too distant future, libraries will be faced with the decision of becoming independent from their cable system and obtaining their own satellite hardware. a library with its own satellite receiving station will become more desirable to more users as a receive site for a satellite video-teleconference since it will be more flexible and autonomous. besides satellite video-teleconferences, libraries could investigate other uses of their satellite hardware, including: -direct satellite access (with permission recommended) for cable television fare; -reception of nationwide satellite distribution of taped video programming for library use; -facilitation of various library data communications. if the library is able to prove to the residents the value and practicality of having community satellite access capabilities located at its facilities, through participation in the catvlib network, local funding of a satellite library project might be realistic. if corporations are made aware of how such a satellite library facility could benefit their own communications needs, a corporate grant could prove to be another funding route. other sources of support must also be explored. final word: as a result of this survey, pssc has profiled cable libraries of all technical capabilities for input into a database of network resources. however, the limitations of a catvlib network have been noted. effort will be made by pssc, where appropriate, to use this network for client satellite telecasts. pssc will continue to profile interested cable libraries for addition to the network, upon request of the library.
48 information technology and libraries | march 2007 zoomify image is a mature product for easily publishing large, high-resolution images on the web. end users view these images with existing web-browser software as quickly as they do normal, downsampled images. a flash-based zoomifyer client asynchronously streams image data to the web browser as needed, resulting in response times approaching those of desktop applications using minimal bandwidth. the author, a librarian at cornell university and the principal architect of a small, open-source company, worked closely with zoomify to produce a cross-platform, open-source implementation of that company's image-processing software and discusses how to easily deploy the product into a widely used web-publishing environment. limitations are also discussed, as are areas of improvement and alternatives. zoomifyer from zoomify (www.zoomify.com) enables users to view large, high-resolution images within existing web-browser software while providing a rich, interactive user experience. a small zoomifyer client, authored in macromedia flash, is embedded in an html page and makes asynchronous requests to the server to stream image data back to the client as needed. by streaming the image data in this way, the image renders as quickly as a normal, downsampled image, even for images that are gigabytes in size. as the user pans and zooms, the response time approaches that of desktop applications while using the smallest possible bandwidth necessary to render the image. and because flash has 98.3 percent browser saturation, viewing "zoomified" images is seamless for most users and allows them to view images interactively in much greater detail than would otherwise be practical or even possible.1 zoomify image (sourceforge.net/projects/zoomifyimage) was created at cornell university in collaboration with zoomify to create an open-source, cross-platform, and scriptable version of the processing software that creates the image data displayed in a zoomifyer client. this work was immediately integrated into an innovative content-management system that was being developed within the zope application server, a premier web application and publishing platform. authors in this system can add high-resolution images just as they normally add downsampled images, and the image is automatically processed on the server by zoomify image and displayed within a zoomifyer client. zoomify image is now in its second major release on sourceforge and contains user-contributed software to easily deploy it in other environments such as php. zoomifyer has been used in a number of applications in many fields, and can greatly enhance many research and instructional activities. applying zoomifyer to digital-image collections is obvious, allowing libraries to deliver an unprecedented level of detail in images published to the web.
new applications also suggest themselves, such as serving high-resolution images taken from tissue samples in a medical lab or using zoomifyer in advanced geospatial image applications, particularly when advanced client features such as annotations are used. the zoomifyer approach also has positive implications for preservation and copyright protection. zoomify image generates cached derivatives of master image files so the image masters are never directly accessed in the application or sent over the internet. image data are stored and transmitted to the client in small chunks so that end users do not have access to the full data of the original image. deploying zoomify image: dependencies and installation. zoomify image was designed initially to be a faithful, cross-platform port of zoomify's image-processing software. it was developed in close cooperation with zoomify to provide a scriptable method for invoking the image-preparation process for zoomifyer clients so this technology could be used in more environments. zoomify image is written in the python programming language and uses the third-party python imaging library (pil) with jpeg support, both of which are also open source and cross-platform. it has been tested in the following environments: ■ python 2.1.3 and pil 1.1.3, and ■ python 2.4.3 and pil 1.1.4. installers for python and pil exist for all major platforms and can be obtained at python.org and www.pythonware.com/products/pil. the installation documentation that comes with pil will help you locate the appropriate jpeg libraries if they are missing from your system. for macosx, you can find pre-built binary installers for python, pil, and zope at sourceforge.net/projects/mosxzope. introducing zoomify image. adam smith (ajs17@cornell.edu) is a systems librarian at cornell university library, ithaca, new york. the "ez" version of the zoomifyer client, a flash-based applet with basic pan and zoom functionality, is packaged with zoomify image for convenience so the software can be used immediately once installed. the ez client is covered by a separate license and can be easily replaced with more advanced clients from zoomify at www.zoomify.com. (a description of how to upgrade the zoomifyer client is included in this paper.) after python and pil with jpeg support are installed, download the zoomify image software from sourceforge.net/projects/zoomifyimage and decompress it. using zoomify image from the command line: begin exploring zoomify image by invoking it on the command line: python [path to zoomify image]/zoomifyfileprocessor.py [image file] or, to process more than one file at a time: python [path to zoomify image]/zoomifyfileprocessor.py [image file 1] [image file 2] ... the file formats of the images input to zoomify image are typically either tiff or jpeg, but can be any of the many formats that pil can read.2 an image called "test.jpg" is included in the zoomify image distribution and is of sufficient size and complexity to provide an interesting example. during processing, zoomify image creates a new directory to hold the converted image data in the same location as the image file being processed. the name of this directory is based on the file name of the image being processed, so that, for example, an image called "test.jpg" would have a corresponding folder called "test" containing the converted image data used by the zoomifyer client.
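because the converter is an ordinary python script, it can also be driven from another script for batch work. the following is a minimal sketch under stated assumptions, not code from the zoomify image distribution: it simply shells out to zoomifyfileprocessor.py (the path used here is a placeholder for wherever the distribution was decompressed) for every jpeg and tiff file in a hypothetical "incoming" directory, mirroring the multi-file command line shown above.

import glob
import subprocess

# assumption: adjust this placeholder to wherever the distribution was decompressed
processor = "/path/to/zoomifyimage/zoomifyfileprocessor.py"

# gather the images to convert; tiff and jpeg are the typical input formats
images = glob.glob("incoming/*.jpg") + glob.glob("incoming/*.tif")

# one invocation can take any number of image files, just as on the command line;
# each image gains a sibling directory of converted tile data (e.g., test.jpg -> test/)
if images:
    subprocess.call(["python", processor] + images)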
if the image file has no file extension, the directory is named by appending “_data” to the image name, so that an image file named “test” would have a corresponding directory called “test_data.” if the process is re­run on the same images, any previously generated data are automatically deleted before being regenerated. zoomify provides substantial documentation and sample code on its web site that demonstrates how to use the data generated by zoomify image in several environments. user­ contributed code is bundled with zoomify image itself, further dem­ onstrating how to dynamically incor­ porate this conversion process into several environments. an example of the use of zoomify image within the zope application server is given. incorporating zoomify image into the zope application server the popular zope application server contains a number of built­in services including a web server, ftp and webdav servers, plug­ins for access­ ing relational databases, and a hier­ archical object­oriented database that uses a file­system metaphor for stor­ age. this object database provides a unique opportunity to incorporate zoomifyer into zope seamlessly. to use zoomify image with zope, the distribution must be decom­ pressed into your zope products directory. for versions 2.7.x and up, this is at: /products/ in zope versions prior to the 2.7.x series, the products directory is at: /lib/python/ products/ restart zope and now within the web­based zope management interface (zmi), the ability to add zoomify image objects appears. after selecting this option, a form is presented that is identical to the form used for adding ordinary image objects within zope. when an image is uploaded using this form, zope automatically invokes the zoomify image conversion process on the server and links the generated data to the default zoomifyer client that comes with the distribution. if the image is subsequently edited within zmi to upload a new version, any existing conversion data for that image are automatically deleted, and the new conversion data are gener­ ated to replace them, just as when invoked on the command line. again, the uploaded image can be in any format that zope recognizes as having a content­type of “image/...” and that pil can read. the only potential “gotcha” in this process is that in the versions of the zoomifyer client the author has tested, zoomify image objects that have file names (in zope terminology, the file name is the object’s “id” property) with extensions other than “.jpg” are not displayed properly by the zoomifyer client. so, when uploading a tiff image, for example, the id given to the zoomify image object should either not contain an extension, or it should be changed from image.tif to something like image_tif. this bug has been reported to zoomify and may be fixed in newer versions of the flash­based viewing software at the time of publication. to view the image within the zoomifyer client, simply call the “view” method of the object from within a browser. so, for a zoomify image object uploaded to: http:///test/test.jpg go to this url: http:///test/test. jpg/view or, to include this view of the image within a zope page template 50 information technology and libraries | march 200750 information technology and libraries | march 2007 (zpt), simply call the tag method of the zoomify image just as you would a normal image object in zope. 
it is possible that the zoomify image conversion process will not have had time to complete when someone tries to view the image. the zoomify image object will attempt to degrade gracefully in this situation by trying to display a downsampled version of the image that is generated part way through the conversion process, or, if that is also not available, finally informing the user that the image is not yet ready to be viewed. this logic is built into the tag method. to add larger images more efficiently, or to add images in bulk, the zoomify image distribution contains detailed documentation to quickly configure zope to accept images via ftp or webdav and automatically process them through zoomify image when they are uploaded. finally, the default zoomifyer client can be overridden by uploading a custom zoomifyer client into a location where the zoomify image object can "acquire" it, and giving it a zope id of "zoomifyclient.swf". how it works: to be viewed by a zoomifyer client, an image must be processed to produce tiles of the image at different scales, or tiers. an xml file that describes these tiles is also necessary. zoomify image provides a cross-platform method of producing these tiled images and the xml file that describes them. beginning at 100-percent scale, the image is successively scaled in half to produce each tier, until both the width and height of the final tier are, at most, 256 pixels each. each tier is further divided into tiles that are, at most, 256 pixels wide by 256 pixels tall, as seen in figure 1. these tiles are created left to right, top to bottom. tiles are saved as images with the naming convention indicated in figure 2. the numbering is zero-based, so that the smallest tier is represented by one tile that is at most 256 x 256 pixels with the name "0-0-0.jpg." tiles are saved in directories in groups of 256, and those directories also follow a zero-based naming convention starting with "tilegroup0." lower-numbered tile groups contain lower-numbered tiles, so 0-0-0.jpg is always in tilegroup0. zoomifyer clients understand this tile-naming scheme and only request tiles from the server that are necessary to stitch together the portion of the image being viewed at a particular scale. [figure 1. tiers and tiles for a 2048 x 2048 pixel image. figure 2. tile image naming scheme.] limitations: zoomify image was developed to meet two goals: 1. to provide a cross-platform port of the zoomifyer converter for use in unix/linux systems, and 2. to make the converter scriptable, and ultimately integrate it into open-source content-management software, particularly zope. this zoomifyer port was written in python, a mature, high-level programming language with an execution model similar to java. although zoomify image continues to be optimized, compared to the official zoomify conversion software it is slower and more limited in the sizes of images it can reasonably process. anecdotally, zoomify image has been used effectively on images hundreds of megabytes large, but significant performance degradation has been reported in the multi-gigabyte range. because of these limitations in zoomify image, the official zoomify image-processing software is recommended for converting very large images manually in a windows or macintosh environment.
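before turning to the circumstances in which zoomify image is recommended, it may help to make the tiling scheme from the "how it works" section concrete. the short python sketch below is illustrative only, not code from the zoomify image distribution; it simply applies the halving-until-256-pixels rule to count tiers and 256 x 256 tiles for an image of a given size.

import math

def tier_sizes(width, height):
    # halve the full-resolution dimensions repeatedly until both fit in one
    # 256-pixel tile, then list the scales from smallest (tier 0) to full size
    sizes = [(width, height)]
    while max(sizes[-1]) > 256:
        w, h = sizes[-1]
        sizes.append(((w + 1) // 2, (h + 1) // 2))
    return list(reversed(sizes))

total = 0
for tier, (w, h) in enumerate(tier_sizes(2048, 2048)):  # the image of figure 1
    cols = int(math.ceil(w / 256.0))
    rows = int(math.ceil(h / 256.0))
    print("tier %d: %d x %d pixels, %d tiles" % (tier, w, h, cols * rows))
    total += cols * rows

# tiles are stored 256 to a directory ("tilegroup0", "tilegroup1", ...),
# so the number of tilegroup directories needed is:
print("tilegroups: %d" % int(math.ceil(total / 256.0)))

for the 2048 x 2048 pixel example of figure 1 this gives four tiers of 1, 4, 16, and 64 tiles, or 85 tiles in all, which fit within a single tilegroup0 directory; the lone tile of tier 0 is the "0-0-0.jpg" mentioned above.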
the zoomify image product is recommended in the following circumstances: ■ the conversion must be per­ formed on a unix/linux machine. ■ the conversion process must be scriptable, such as for batch pro­ cessing or being run dynamically. ■ images sizes are not in the multi­ gigabyte range. if a scriptable, cross­platform version of the zoomifyer converter is needed, but performance is an issue, several things can be done to extend the current limits of the soft­ ware. obviously, upgrading hard­ ware, particularly ram, is effective and relatively inexpensive. running the latest versions of python and pil will also help. each new version of python makes significant perfor­ mance improvements, and this was a primary goal of version 2.5, which was released in september 2006. the author believes that the cur­ rent weak link in the performance chain is related to how zoomify image is loading image data into memory with pil during processing. in the current distribution, a python script contributed by gawain avers, which is based partially on the zoomify image approach, uses imagemagick instead of pil for image manipula­ tion and is better able to process multi­gigabyte images. the author would like to add the ability to des­ ignate the image library at runtime in future versions of zoomify image. future development beyond improving the performance of the core­processing algorithm, the author would also like to explore opportunities for more efficiently processing images within zope, such as spawning a background thread for processing images so the zope web server can immediately respond to the client’s image­submission request. the author would also like to improve the tag method to display data more flexibly in the zoomifyer client and ensure consistent behav­ ior with zope’s default image tag method. finally, zoomify image could also benefit from the addi­ tion of a simple configuration file to control such runtime properties as image quality and which third­party image­processing library to use, for example. conclusion zoomify image is mature, open­ source software that makes it pos­ sible to publish large, high­resolution images to the web. it is designed to be convenient to use in a variety of architectures and can be viewed within existing browser software. download it for free, begin using it in minutes, and explore its unique possibilities. references 1. adobe systems, macromedia flash player statistics, http://www.adobe.com/ products/player_census/flashplayer/ (accessed march 1, 2007). 2. pythonware, python imaging library handbook: image file formats, http:// www.pythonware.com/library/pil/ handbook/formats.htm (accessed aug. 6, 2006). resources macromedia flash player statistics (http://www.adobe.com/ products/player_census/flash­ player/) (accessed jan. 2, 2007). python imaging library (pil) (http:// www.pythonware.com/products/ pil/) (accessed jan. 2, 2007). python programming language official web site (http://www.python.org/) (accessed jan. 2, 2007). zoomify image (http://sourceforge.net/ projects/zoomifyimage/) (accessed jan. 2, 2007). zoomify (http://www.zoomify.com/) (accessed jan. 2, 2007). zope community (http://www.zope .org/) (accessed jan. 2, 2007). zope installers for macosx (http:// sourceforge.net/projects/ mosxzope/) (accessed jan. 2, 2007). lib-mocs-kmc364-20140103102106 27 personnel aspects of library automation david c. 
weber: director of libraries, stanford university, stanford, california personnel of an automation project is discussed in terms of talents needed in the design team, their qualifications and organization, the attitudes to be fostered, and the communication and documentation that is important for effective teamwork. discussion is based on stanford university's experience with protect ballots and includes comments on some specific problems which have personnel importance and may be faced in major design efforts. no operation is any better than its rersonnel. the selection, encouragement, motivation and advancement o the individuals who operate libraries or library automation programs are the critical elements in the success of automation. the following observations are based upon experience at stanford university over the past eight years in applying data processing to libraries, and particularly in the large scale on-line experience of project ballots (an acronym standing for bibliographic automation of large library operations using a time sharing system) supported by the u. s. office of education bureau of research during the past three years. the first par! of the paper treats of five key personnel aspects: the automation team, thetr qualifications, their organization, the climate for effort, and documentation. 28 journal of library automation vol. 4/1 march, 1971 the team experts are required for the design of any computer system or system based on other sophisticated equipment and they must emphatically form a "team" to be effective. the group may include a statistician and/or financial expert, a systems analyst, a systems designer, a systems programmer, a computer applications programmer, and a librarian. there may be several persons of each type, or one person may assume more than one responsibility. a few universities have librarians who have received training in systems analysis or in programming. the computer related professions are, however, demanding in themselves, and especially so when the programming language may change with each generation of computers. it is therefore usual for the head librarian to work with experts located in a systems office, an administrative datajrocessing center, or a computation center. except for the librarians, few · any of the experts may be on the library payroll, although in a very large project all may be financed from one or two accounts in the library. the team must cover the variety of functions encompassed in a formal system development process. these functions are enumerated in detail in stanford's project documentation ( 1), but a brief summary of typical functions performed by the team may indicate its diversity. there is the analysis of existing library operations, conceptual design of what is desired under an automated system, form and other output design, review of published literature and on-site analysis of selected efforts of a related nature; determination of machine configuration to support the system design, study of machine efficiency, and reliability of main frame plus peripheral equipment; choice of programming language, checkout and debugging of programs; cost effectiveness study, study of present manpower conversion, analysis of space requirements and equipment changes; staff training programs with manuals or computer aided instruction, system documentation and publicity; systems programming and applications programming, and project management. 
the total effort is collaborative; the system is designed by and with the users of it (i.e., library staff), not for them, and a tremendous contribution of local staff time is essential to success. in many instances an institution will have some, but not all, of these resources and capabilities in adequate amount. if the amount is insufficient, the project director must determine how, through consultants or change of project course, a needed talent can be obtained or bypassed. the consequences of each mix of talent and change of strategy need assessment at frequent intervals; reassessment must be done with the full participation of the most senior library officers, including the director of libraries, as well as certain other key university officers. at stanford, the group has for three years comprised diverse talent and worked reasonably well as a team. the library has recently delegated to the director of the computation center the immediate project management of ballots and spires (stanford public information retrieval system) (2). thus the current combined staff of twenty-three, which should reach a peak of twenty-five during 1971, reports to the ballots-spires project director. he in turn reports both to the director of the computation center in a direct relationship and, under his second hat as chief of the library automation department, to the assistant director of libraries for bibliographic operations in a dotted-line relationship. see table 1 for stanford's diversity of staff.
table 1. staff of project ballots-1970
title or classification          age   degree        years of experience   years on project
project director                 36    bs, ce        15                    1
special assistant                40    bs            12                    2
senior system programmer         37    ba            8                     1
system programmer                36    bs            14                    3
manager technical development    29    bs            5                     2
system services manager          30    ba            8                     2
librarian ii/system analyst      28    ba, mls       3                     3
librarian/system analyst         27    ba, mls       2                     <1
project documentation editor     35    ba, mls       3                     1
assistant                        26    ma            3                     <1
system analyst                   27    ba, ma        5                     1
junior system analyst            25    ba            2                     2
programmer trainee               26    -             1                     1
programmer                       30    aa            7                     3
programmer                       26    ba            4                     1
programmer                       32    bs            11                    <1
research assistant               27    bs, ms, phd   4                     3
research assistant               28    ba, llb       8                     2
research assistant               22    ba            3                     2
research assistant               24    ba            4                     2
senior secretary                 27    -             8                     1
secretary                        19    -             1                     1
in development of library automation or of any sophisticated data processing system, it is essential to utilize librarians and other system users to the utmost in constructing the design. there is evidence that an effective program of library automation results from on-campus development: that is, using a local staff with librarians working on a daily basis with system analysts, programmers, and information scientists. librarians most definitely should not try to do it all themselves; that would be sheer folly and would reveal a lamentable lack of appreciation of the highly complex skills of the other professionals working in the information sciences. team qualifications: a qualified and enthusiastic team with strong backing from the library administration is the most important single element in a library's automation efforts. this requires that the library administrator have a grasp of the intricacies, although he himself will probably not understand all details involved in the system design.
it also requires consideration of the desire for advancement of those in computer refated professions and the various characteristics of their career/attems, including training, experience, job market, salary potentials, an mobility. the team will need to be selected with care and joint ehort by computer stah and library stah management. people are needed who can teach and learn from one another. they must be tolerant, and interested in problems and details, for they will be changing traditional systems, altering people's work habits, and probably shaking their self-confidence. security comes from knowing the facts and being able to work on the new system-to be in part responsible for one's own future. team harmony of ehort can be promoted by the so-called "bridge professional", or what the sociologists call a "marginal professional", meaning one who is able to assist those in one profession to converse and work ehectively with those in another. at stanford the librarian/analysts and the project editor have been ehective in such a capacity. those in the computer related professions, along with all on the library stah, need a sense of purpose, a sense of achievement, and recognition of their contributions by superiors as well as peers. the automation team needs a competent, experienced, technically knowledgeable, and tactful captain. he must manage with an appreciation for communication, a knack for touching base with various groups having interests in the ehort, the judgment to assign reasonable tasks, and the realism to set and achieve feasible time schedules-all within budget limitations. if the leader is less than this paragon, others in the organization must provide these qualities, all of which are required. for at least another decade it is likely that the expert analyst andjrogrammer will receive as high a salary as a librarian division hea or assistant department chief, and a highly qualified systems designer may well earn more than any chief and perhaps as much as the assistant director of libraries. the scale is not irrational or unjust; it merely recognizes the scarcity of particular talents and their importance to major library automation programs. designing an on-line library system requires a person of proven competence in on-line systems. a salary oher shaved here may well lead to regret. experience in project ballots points up problems with the selection of personnel who are not library trained. some persons may be excellent in theoretical development but poor as managers, or some may play a "campus politics" game in order to move into senior positions in the computation center. computer specialists have diherent career goals than do librarians, and rarely see the library as a permanent career commitment personnel aspects of automation/weber 31 by which to promote library automation; rather their commitment is toward automation and computer applications, not a particular section of the university. a project manager also needs to take great care that research does not become an end in itself, a particular tendency of graduate students doing system development. implementation must be the goal of library automation; automated operations must be sound, efficient, dependable, and economical. some of the special needs and working conditions for personnel in an automated program are outlined by allen b. veaner (3). 
team organization: the organizational unit of an automation program may be first an office, then later a division when the group is larger and the function more permanent. the staff of a major project should have a departmental status equal to that of the acquisition or cataloging department. these latter two departments may be combined with an automation department under an assistant or associate director for technical processing. however, it is a rare individual who can give adequate attention to both the complexities of a major traditional library function and the direction of a major research and development program. thus the initial organizational pattern may be one of separate but equal status, and at some point in time the units may be combined under one administrator. see figure 1 for stanford's new organization adopted after three years of effort, as it entered the production-engineered phase. units may best be combined when a research and development project begins to take on a significant amount of operational work. the reason is that the person in charge of the system development may need to oversee its implementation in order to assure that standards are followed for data preparation, coding, and the details of forms; and that feedback of experience for system improvement is secured. this combination of units should not be achieved when the project is still in the development stage, but it should also not wait until operations are well under way. some anticipation is desirable. in the medium-scale program such combination of units may be possible after a year of operation, or the continuing production may be assumed by a traditional department and the systems office left free for further experimentation and development work. production is normally the responsibility of traditional departments from the day of implementation; the automation department responsibility is for instructing in system use, debugging of programs, and fine tuning of the system. in a large project striving toward an integrated system for all technical processes and public services, the transfer of responsibilities to traditional departments may come in no less than three years and perhaps as many as five years from the origin of the project because of constant developments in software and hardware, developments which library users cannot control but to which they must be responsive.
[figure 1. ballots/spires organization, 1970: an organization chart linking the director of the stanford university libraries, the ballots principal investigator (the assistant director of libraries for bibliographic operations), the library systems design committee, the vice president for research, the director of the stanford computation center, and the spires/ballots project director, with the project's library systems analysts, system programmers, applications programmers, graduate student assistants, and documentation editors grouped beneath the project director.]
personnel aspects of automation/weber 33 automation division or systems office would remain to take care of the refinements, maintenance, and development of further applications which are a result of the open-ended nature of a major automation program. the climate for effort if the librarian is to work effectively with all of the previously mentioned experts, he must become more than superficially familiar with the equipment and with the software which instructs it. the librarian who carries the responsibility for major mechanized data processing programs will probably have taken at least half a dozen courses in various aspects of data processing in order to be able to state reasonable requirements, to comprehend economic and technical limitations, discuss file organization problems with the systems designer, and be sufficiently informed to help explain the new system to the library staff that will operate or make use of it. this type of specialized training will also be necessary for other team members who will work with different parts of the system. a number of librarians will need to take several short courses selected for their early relevance to the work at hand. staff may take courses offered in the university computer science department, by the computation center, or by a local computer firm. various clerical personnel will need briefing sessions, and it will be necessary to train some typists to serve as skilled terminal operators. indeed, training will be needed on a continuing basis as more staff use the system; manuals are important unless self-instruction is built in. these efforts are desirable because the employee needs assurance that his talents will not be outdated and he be laid off as a consequence; rather that he will be retrained to the new system, shown that its function is not totally different from the previous one, and shown that it can actually serve him and lead to enhanced satisfaction and improved salary in his library employment. computer based systems are far more likely to upgrade librarianship than to make it obsolete. they will enhance the profession by eliminating its routine drudgery, and thus more sharply identify its really professional nature. don r. swanson has commented on this point: "those librarians who have some kind of irrational antipathy toward mechanization per se (not just toward some engineers who have inappropriately oversold mechanization) i regard with some suspicion because i think they do not have sufficient respect for their profession. they may be afraid that librarianship is going to be exposed as being intellectually vacuous, which i don't think is so. even in a completely mechanized library there would still be need for skilled reference librarians, bibliographers, catalogers, acquisitions specialists, administrators, and others. those librarians in the future who regard mechanization, not with suspicion, but as a subject to be mastered will be those who will plan our future libraries and who will plan the things that machines are going to do. there will be no doubt of their professional status." ( 4) 34 journal of library automation vol. 4/1 march, 1971 persons who have inhibitions about machine based systems will not be effective members of the design and development group. those receptive to the change will benefit by having their job horizons enlarged and their prospects for improved salary and personnel classification enhanced. they will also share in the enthusiasm inspired by a bold new enterprise. 
this is not to say that all library staff members will enjoy the exacting refinements of a machine system, just as not everyone has talent to be a first-rate cataloger. it is not suited to everyone, and therefore the nature and purpose of the system must be clearly explained or demonstrated to anyone interested in such an assignment lest he accept it and then become disenchanted with the work. the importance cannot be overstated of telling the entire library staff what is being done in regard to automation-and why. disquieting rumors will abound in the absence of full and candid communication. staff meetings should be held to review progress and outline next steps. staff bulletins should publish summaries of the program and reports on its current status, information that can also be useful for faculty and staff outside the library. it must not be forgotten that the card catalog, the manual circulation system, and common order forms are familiar to all students and faculty. most students will have seen these in their high school or public libraries, yet few will have seen a sophisticated machine system, and will often be skeptical about its efficiency and dependability. faculty members may well wonder whether it is worth the cost. the effort to explain a program concisely but clearly to the library staff, students, faculty, and other university staff can be highly rewarding in understanding, and in moral and financial support. columbia university's experience with library automation has led them to state that .. though the hardware and software programs associated with computer technology are formidable, they are not the only (and possibly not even the most important) problems in an automation effort. two areas often overlooked or grossly underestimated are: 1 ) creating an environment hospitable to change [and] especially important in this area is staff training and organization. 2) describing and analyzing existing manual procedures sufficiently before attempting to design automated systems." (5) documentation the documentation of any new system is of singular importance. there is an oral tradition in most libraries; techniques of filing or searching are passed on by the supervisor, although libraries use staff manuals to formalize some of the techniques. however, in a system where absolute exactitude is demanded and where costs of system development are high, methodical recording of principles and procedures is obviously necessary. especially vital are details of design and programming, for purposes of debugging, maintenance, and transfer to others. personnel aspects of automation/weber 35 critical personnel issues in an important statement from massachusetts institute of technology's project mac in 1968, professor f. j. corbat6 outlines fifteen critical issues ranging from technical to managerial that affect the complexity and difficulty of constructing computer systems to serve multiple users ( 6). seven of the fifteen have substantial personnel aspects; experience with project ballots provides the basis for the following comments on them. 1) "the first danger signal is when the designers of the system won't document. they don't want to be bothered trying to write out in words what they intend to do." stanford's experience might not put this as a first critical issue, yet it is evident that without adequate and clear documentation the advancement of any research or development project is jeopardized. 
one expert, an invaluable member of the ballots team, has full responsibility for this very important task. the position requires adequate clerical support; there are one-and-a-half assistants on the ballots team. 2) "the second danger signal is when designers won't or can't implement. what is referred to here is the lofty designer who sketches out on a blackboard one day his great ideas and then turns the job over to coders to finish many months later." stanford has experienced some of the seductiveness of design innovations, especially on the part of graduate student research assistants. (yet these assistants have done excellent work and it is wished they were all full time on the project. ) without constant review and the use of pert charts or other scheduling, shying away from implementation can be a real hazard. there will be dark days when the design team cannot surmount some intractable but crucial obstacle, and tne project manager and stah librarians working with the team must be sympathetic, encouraging and patient. 3) "the next danger signal is when the design needs more than ten people. this doesn't mean that all the support people . . . must add up to no more than ten. but when the crucial kernel of the design team is more than ten people, a larger scale project is coming into being. this is the point where communication problems begin to develop." stanford has flirted with that particular danger point. with acquisition and cataloging staff included, the ballots design group is over ten and there is a communication problem, but one due not so much to size as to different backgrounds, vocabulary and scheduling of effort. the need for communication has been intensified because the main library is over half a mile from the computation center. it has required monthly staff meetings at early stages of design, and late stages of development, and at other times weekly staff meetings of the design group with the librarians who are ~etting the design criteria. failure of constant and accurate communication m a research and development effort is a threat to its effective progress. 4) "if a project cannot be finished or made use of in one year, there is potential trouble, because the chances of underestimation are strong (and ) a personnel turnover of roughly 20% per year must be assumed." stanford's 36 journal of library automation vol. 4/1 march, 1971 experience would bear this out. there was some time and cost underestimation. turnover during 1969-70 was 17%; the year before it was 50%. obviously documentation then becomes a more critical element in progress, and turnover may lead the librarian to feel that it is sometimes one step backwards for every two steps forward. turnover may be minimized by generous salary increases, not only once a year but perhaps at other times also when merit deserves reward and as responsibilities increase. in contrast to customary operations, an automation design effort is constantly changing in nature and emphasis; this fact requires flexibility in personnel management and frequently deserves immediate response in salary and classification administration. to keep a qualified research team in an area of specialization in demand, one must pay the price. let there be no misunderstanding, a good system of library automation cannot be finished in one year-nor in three; and it is costly. 5) "another danger signal is when a system is not a line-of-sight system. 
this means that all of the terminals, consoles, or what-have-you are not in the same room within shouting distance of the operator." any on-line system like ballots cannot be line-of-sight. terminals are brought to the users, not users to the terminals. since an on-line system requires total file recovery through use of log tapes, a facility not available on the prototype system, stanford has experienced problems when the machine goes down; it takes time to rerun a program or mount a different disk pack; a file was once wiped out; and there are many other users of the central facility, which puts a premium on scheduling, advance notice, backup, and the like. if a design team is not housed in adjacent space, it will take more personnel or time than in a line-of-sight arrangement to achieve the same accomplishment. ballots systems analysts were in the main library througbout the early design phases and the systems designers were near the computation center. lack of line-of-sight was a sufficiently severe problem that all of the ballots staff were collocated near the computation center last winter as the production engineered phase began. 6) "a somewhat related danger signal is when there are over ten system maintainers. here i am talking about an on-line system that is actually being maintained on-line." at stanford no more than one person has worked at one time on the program maintenance of stanford's four-yearold computer produced undergraduate library book catalog. there have been some complexities due to staff changes, changes in the operating system, and an off-campus contract for reprogramming to third-generation equipment, but the problems have not resulted because of the scale of the project. ballots, on the other hand, is twenty to fifty times as large a system, and it is expected that two or three programmers will be needed to maintain the systems software and a similar number to maintain and make minor revisions to the applications software. 7) "the last danger signal is when the system requires the ability to permit combinations of sharing, privacy and control." at stanford, assignpersonnel aspects of automation/ weber 37 ment of authority for file access has become a problem-who is permitted to update an acquisition record or authorize payment? the requirement for security also enters in any system which has salary data or other personnel information in files. a whole order of complexity is added. as in many of the above problems, complexity is accentuated when one is developing an on-line interactive system which serves multiple users. security must be designed to the file level and, later, to the record or even data element level. security requires control of access to file, of writing in a file, and of updating data through three types of checks: access allowable from a given terminal, from the file password, or from an individual password. such problems do not exist in off-line systems. conclusion for successful automation of library operations, it is of fundamental importance to choose a task that is appropriate in timing, magnitude of effort, funding, and personnel. the ballots experience demonstrates that one must devote great thought, care, and analysis to choosing the right automation project at the right time, and base it on having well qualified people to direct and accomplish the task. given suitable conditions it will be a most exciting and fruitful endeavor. the system that works well is a thing of beauty, and people make it so. references 1. 
stanford university, spires/ballots project: project control notebook, may 1970. section 1.4 "system development process." 2. parker, edwin b.: spires (stanford physics information retrieval system) 1969-70 annual report to the national science foundation. (stanford university: institute for communication research, june 1970). 3. veaner, allen b.: "major decision points in library automation," college & research libraries, 31 (september 1970), 299-312. 4. swanson, don r.: "design requirements for a future library." in markuson, barbara evans, ed.: libraries and automation. (washington: library of congress, 1964), p. 21. 5. columbia university libraries : progress report [to the national science foundation on library automation] for jan. 1968-dec. 1969 (nsf-gn-694). p. 14. 6. corbat6, fernando j.: sensitive issues in the design of multi-use systems (waltham, massachusetts: honeywell edp technology center, technical symposium on advances in software technology, february 1968). 17 pp. project mac internal memo. mac-m-383. microsoft word 5485-10835-5-ce.docx negotiating  a  text  mining  license  for   faculty  researchers       leslie  a.  williams,     lynne  m.  fox,     christophe  roeder,     and  lawrence  hunter       information  technology  and  libraries  |  september  2014           5     abstract   this  case  study  examines  strategies  used  to  leverage  the  library’s  existing  journal  licenses  to  obtain  a   large  collection  of  full-­‐text  journal  articles  in  xml  format,  the  right  to  text  mine  the  collection,  and   the  right  to  use  the  collection  and  the  data  mined  from  it  for  grant-­‐funded  research  to  develop   biomedical  natural  language  processing  (bnlp)  tools.  researchers  attempted  to  obtain  content   directly  from  pubmed  central  (pmc).  this  attempt  failed  because  of  limits  on  use  of  content  in  pmc.   next,  researchers  and  their  library  liaison  attempted  to  obtain  content  from  contacts  in  the  technical   divisions  of  the  publishing  industry.  this  resulted  in  an  incomplete  research  data  set.  researchers,  the   library  liaison,  and  the  acquisitions  librarian  then  collaborated  with  the  sales  and  technical  staff  of  a   major  science,  technology,  engineering,  and  medical  (stem)  publisher  to  successfully  create  a   method  for  obtaining  xml  content  as  an  extension  of  the  library’s  typical  acquisition  process  for   electronic  resources.  our  experience  led  us  to  realize  that  text-­‐mining  rights  of  full-­‐text  articles  in   xml  format  should  routinely  be  included  in  the  negotiation  of  the  library’s  licenses.   introduction   the  university  of  colorado  anschutz  medical  campus  (cu  anschutz)  is  the  only  academic  health   sciences  center  in  colorado  and  the  largest  in  the  region.  annually,  cu  anschutz  educates  3,480   full-­‐time  students,  provides  care  during  1.5  million  patient  visits,  and  receives  more  than  $400   million  in  research  awards.1  cu  anschutz  is  home  to  a  major  research  group  in  biomedical  natural   language  processing  (bnlp),  directed  by  professor  lawrence  hunter.  natural  language  processing   (also  known  as  nlp  or,  more  colloquially,  “text  mining”)  is  the  development  and  application  of   computer  programs  that  accept  human  language,  usually  in  the  form  of  documents,  as  input.  
bnlp takes as input scientific documents, such as journal articles or abstracts, and provides useful functionality, such as information retrieval or information extraction. cu anschutz's health sciences library (hsl) supports hunter's research group by providing a reference and instruction librarian, lynne fox, to participate on the research team. hunter's group is working on computational methods for knowledge-based analysis of genome-scale data.2 as part of that work, his group is devising and implementing text-mining methods that extract relevant information from biomedical journal articles, which is then integrated with information from gene-centric databases and used to produce a visual representation of all of the published knowledge relevant to a particular data set, with the goal of identifying new explanatory hypotheses. hunter's research group demonstrated the potential of integrating data and research information in a visualization to further new discoveries with the "hanalyzer" (http://hanalyzer.sourceforge.net). their test case used expression data from mice related to craniofacial development and connected that data to pubmed abstracts using gene or protein names. "copying of content that is subject to copyright requires the clearing of rights and permissions to do this. for these reasons the body of text that is most often used by researchers for text mining is pubmed."3 the resulting visualization allowed researchers to identify four genes involved in mouse craniofacial development that had not previously been connected to tongue development, with the resulting hypotheses validated by subsequent laboratory experiment.4 the knowledge-based analysis tool is open access. to continue the development of the bnlp tools for the knowledge-based analysis system, three things were required: a large collection of full-text journal articles in xml format, the right to text mine the collection, and the right to store and use the collection and the data mined from it for grant-funded research. the larger the dataset, the more robust the visual representations of the knowledge-based analysis system, so hunter's research group sought to compile a large corpus of relevant literature, beginning with journal articles. the text that is mined can start in many formats; however, xml provides a computer-ready format for text mining because it is structured to indicate parts of the document.
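as a brief illustration of what "computer-ready" means in practice, the sketch below pulls the major sections out of a single tagged article using python's standard library. the element names (article-title, abstract, body) follow the jats convention used by many stem publishers; the tag set in any particular publisher's dtd may differ, so the names here are illustrative assumptions rather than a description of the files hunter's group actually received.

```python
# minimal sketch: extract article sections from a tagged xml file.
# the element names are hypothetical, jats-like stand-ins.
import xml.etree.ElementTree as ET

def extract_sections(xml_path):
    """return the title, abstract, and body text of one tagged article."""
    root = ET.parse(xml_path).getroot()

    def text_of(tag):
        node = root.find(f".//{tag}")
        # itertext() gathers text nested inside inline markup (italics, links, etc.)
        return " ".join(node.itertext()).strip() if node is not None else ""

    return {
        "title": text_of("article-title"),
        "abstract": text_of("abstract"),
        "body": text_of("body"),
    }

# usage: sections = extract_sections("example-article.xml")
```

a program built this way can walk an entire directory of delivered xml files and hand each section to downstream mining code, which is why direct access to the xml is so much more useful than text recovered from pdf.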
xml  is  “called  a  ‘markup  language’  because  it  uses  tags  to  mark   and  delineate  pieces  of  data.  the  ‘extensible’  part  means  that  the  tags  are  not  pre-­‐defined;  users   can  define  them  based  on  the  type  of  content  they  are  working  with.”5,6   xml  has  been  adopted  as  a  standard  for  content  creation  by  journal  publishers  because  it   provides  a  flexible  format  for  electronic  media.7  xml  allows  the  parts  of  a  journal  article  to  be   encoded  with  tags  that  identify  the  title,  author,  abstract,  and  other  sections,  allowing  the  article  to   be  transmitted  electronically  between  editor  and  publisher  and  to  be  easily  formatted  and   reproduced  into  different  versions  (e.g.,  print,  online).  xml  can  also  indicate  significant  content  in   the  text,  such  as  biological  terms  or  concepts.  xml  allowed  hunter’s  research  group  to  write   computer  programs  that  can  make  sense  of  each  article  by  using  the  xml  tags  as  indicators  of   content  and  placement  within  the  article.  products  have  been  developed,  such  as  la-­‐pdftext,  to   extract  text  from  pdf  documents.8  however,  direct  access  to  xml  provides  more  useful  corpora     information  technology  and  libraries  |  september  2014   7   because  the  document  markup  saves  time  and  improves  the  accuracy  of  results  extracted  from   xml.     once  the  sections  and  content  of  an  article  are  identified,  text-­‐mining  techniques  are  applied  to  the   article.  “text  mining  extracts  meaning  from  text  in  the  form  of  concepts,  the  relationships  between   the  concepts  or  the  actions  performed  on  them  and  presents  them  as  facts  or  assertions.”9  text-­‐ mining  techniques  can  be  applied  to  any  type  of  information  available  in  machine-­‐readable  format   (e.g.,  journal  article,  e-­‐books).  a  dataset  is  created  when  the  text-­‐mined  data  is  aggregated.  using   bnlp  tools,  hunter’s  research  group’s  knowledge-­‐based  analysis  system  analyzed  the  dataset  and   produced  visual  representations  of  the  knowledge  that  have  the  potential  to  lead  to  new   hypotheses.  text  mining  and  bnlp  techniques  have  the  potential  to  build  relationships  between   the  knowledge  contained  in  the  scholarly  literature  that  lead  to  new  hypothesis  resulting  in  more   rapid  advances  in  science.   literature  review   hunter  and  cohen  explored  “literature  overload”  and  its  profoundly  negative  impact  on  discovery   and  innovation.10  with  an  estimated  growth  rate  of  3.1  percent  annually  for  pubmed  central,  the   us  national  library  of  medicine’s  repository,  researchers  struggle  to  master  the  new  literature  of   their  field  using  traditional  methods.  yet  much  of  the  advancement  of  biological  knowledge  relies   on  the  interplay  of  data  created  by  protein,  sequence,  and  expression  studies  and  the   communication  of  information  and  discoveries  through  nontextual  and  textual  databases  and   published  reports.11  how  do  biomedical  researchers  capitalize  on  and  integrate  the  wealth  of   information  available  in  the  scholarly  literature?  
“the  common  ground  in  the  area  of  content   mining  is  in  the  shared  conviction  that  the  ever  increasing  overload  of  information  poses  an   absolute  need  for  better  and  faster  analysis  of  large  volumes  of  content  corpora,  preferably  by   machines.”12   bnlp  “encompasses  the  many  computational  tools  and  methods  that  take  human-­‐generated  texts   as  input,  generally  applied  to  tasks  such  as  information  retrieval,  document  classification,   information  extraction,  plagiarism  detection,  or  literature-­‐based  discovery.”13  bnlp  techniques   accomplish  many  tasks  usually  performed  manually  by  researchers,  including  enhancing  access   through  expanded  indexing  of  content  or  linkage  to  additional  information,  automating  reviews  of   the  literature,  discovering  new  insights,  and  extracting  meaning  from  text.14  text  mining  is  just   one  tool  in  a  larger  bnlp  toolbox  of  resources  used  to  read,  reason,  and  report  findings  in  a  way   that  connects  data  to  information  sources  to  speed  discovery  of  new  knowledge.15  according  to   pioneering  text-­‐mining  researcher  marti  hearst,  “text  mining  is  the  discovery  by  computer  of  new,   previously  unknown  information,  by  automatically  extracting  information  from  different  written   resources.  a  key  element  is  the  linking  together  of  the  extracted  information  together  to  form  new   facts  or  new  hypotheses  to  be  explored  further  by  more  conventional  means  of     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   8   experimentation.”16  biomedical  text  mining  uses  “automated  methods  for  exploiting  the  enormous   amount  of  knowledge  available  in  the  biomedical  literature.”17   recent  reports,  commissioned  by  private  and  governmental  interest  groups,  discuss  the  economic   and  societal  value  of  text  mining.18,19  the  mckinsey  global  institute  estimates  the  worth  of   harnessing  big  data  insights  in  us  health  care  at  $300  billion.  the  report  concludes  that  greater   sharing  of  data  for  text  mining  enables  “experimentation  to  discover  needs,  expose  variability,  and   improve  performance”  and  enhances  “replacing/supporting  human  decision  making  with   automated  algorithms,”  among  other  benefits.  
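to make the idea of extracting "concepts and the relationships between the concepts" concrete, the toy sketch below finds gene names in a set of abstracts with a simple dictionary lookup and counts how often pairs of genes are mentioned together. this shows only the shape of the computation, not hunter's group's actual pipeline; real bnlp systems use tokenizers, synonym expansion, and statistical or machine-learning models, and the gene symbols listed here are hypothetical placeholders.

```python
# toy sketch: dictionary-based concept matching plus co-occurrence counting.
from itertools import combinations
from collections import Counter

gene_dictionary = {"shh", "bmp4", "msx1", "pax9"}  # hypothetical gene symbols

def genes_in(text):
    # crude normalization: strip punctuation and lowercase each token
    tokens = {tok.strip(".,;()").lower() for tok in text.split()}
    return sorted(gene_dictionary & tokens)

def cooccurrence(abstracts):
    pair_counts = Counter()
    for abstract in abstracts:
        for pair in combinations(genes_in(abstract), 2):
            pair_counts[pair] += 1
    return pair_counts

# usage:
# cooccurrence(["BMP4 and MSX1 interact during tongue development.", ...])
```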
furthermore,  the  mckinsey  report  points  out  that   north  america  and  europe  have  the  greatest  potential  to  take  advantage  of  innovation  because  of   a  well-­‐developed  infrastructure  and  large  stores  of  text  and  data  to  be  mined.20  however,  these   new  and  evolving  technologies  are  challenging  the  current  intellectual-­‐property  framework  as   noted  in  an  independent  report  by  ian  hargreaves,  “digital  opportunity:  a  review  of  intellectual   property  and  growth,”  resulting  in  lost  opportunity  for  innovation  and  economic  growth.21  in  “the   value  and  benefits  of  text  mining,”  jisc  finds  copyright  restrictions  limit  access  to  content  for  text   mining  in  the  biomedical  sciences  and  chemistry  and  that  costs  for  access  and  infrastructure   prevent  entry  into  text-­‐mining  research  for  many  noncommercial  organizations.22  despite   copyright  barriers,  organizations  surveyed  pointed  out  the  risks  associated  with  failing  to  use  text-­‐ mining  techniques  to  further  research  include  financial  loss,  loss  of  prestige,  opportunity  lost,  and   the  brain  drain  of  having  talented  staff  seek  more  fulfilling  work.  jisc  explores  a  research  project’s   workflow  and  finds  a  lack  of  access  to  text  mining  delayed  the  publication  of  an  important  medical   research  study  by  many  months,  or  the  time  the  research  team  spent  analyzing  and  summarizing   relevant  research.23  both  reports  advocate  an  exception  to  intellectual  property  rights  for   noncommercial  text-­‐mining  research  to  balance  the  protection  of  intellectual  property  with  the   access  needs  of  researchers.  a  centrally  maintained  repository  for  text  mining  has  been  proposed,   although  its  creation  would  face  significant  challenges.24   scholarly  journal  content  is  the  raw  “ore”  for  text  mining  and  bnlp.  the  lack  of  access  to  this  ore   creates  a  bottleneck  for  researchers.  “new  business  models  for  supporting  text  mining  within  the   scholarly  publishing  community  are  being  explored;  however,  evidence  suggests  that  in  some   cases  lack  of  understanding  of  the  potential  is  hampering  innovation.”25  bnlp  and  machine-­‐ learning  research  products  are  more  accurate  and  complete  when  more  content  is  available  for   text  mining.  “knowledge  discovery  is  the  search  for  hidden  information.  .  .  .  hence  the  need  is  to   start  looking  as  widely  as  possible  in  the  largest  set  of  content  sources  possible.”26  however,  as   noted  in  a  nature  article,  “the  question  is  how  to  make  progress  today  when  much  research  lies   behind  subscription  firewalls  and  even  ‘open’  content  does  not  always  come  with  a  text-­‐mining   license.”27  large  scientific  publishers  are  facing  economic  challenges,  and  potentially  diminished   economic  returns,  as  the  tension  over  the  right  to  use  licensed  content  heats  up.  
nature,  the   flagship  of  a  major  scientific  publisher,  predicted  “trouble  at  the  text  mine”  if  researchers  lack   access  to  the  contents  of  research  publications.28  and  a  2012  investment  report  predicted  slower     information  technology  and  libraries  |  september  2014   9   earnings  growth  for  elsevier,  the  largest  stem  publisher,  if  it  blocked  access  to  licensed  content   by  text-­‐mining  researchers.  the  review  predicted,  “if  the  academic  community  were  to  conclude   that  the  commercial  terms  imposed  by  elsevier  are  also  hindering  the  progress  of  science  or  their   ability  to  efficiently  perform  research,  the  risk  of  a  further  escalation  of  the  acrimony  [between   elsevier  and  the  academic  community]  rises  substantially.”29  with  open  access  alternatives   proliferating,  including  making  federally  funded  research  freely  accessible,  stem  publishers  are   under  increased  pressure  to  respond  to  market  forces.  “the  greatest  challenge  for  publishers  is  to   create  an  infrastructure  that  makes  their  content  more  machine-­‐accessible  and  that  also  supports   all  that  text-­‐miners  or  computational  linguists  might  want  to  do  with  the  content.”30  on  the  other   end  of  the  spectrum,  researchers  are  struggling  to  gain  legal  access  to  as  much  content  as  possible.     academic  libraries  have  long  excelled  at  serving  as  the  bridge  between  researchers  and  publishers   and  can  expand  their  roles  to  include  navigating  the  uncharted  territory  of  obtaining  text-­‐mining   rights  for  content.  increasing  the  library’s  role  in  text  mining  and  other  associated  bnlp  and   machine-­‐learning  methods  offers  tremendous  potential  for  greater  institutional  relevance  and   service  to  researchers.31  at  cu  anschutz’s  hsl,  fox  and  williams,  an  acquisitions  librarian,  found   natural  opportunities  for  collaboration  including  negotiating  rights  to  content  more  efficiently   through  expanded  licensing  arrangements  and  facilitating  the  secure  transfer  and  storage  of  data   to  protect  researchers  and  publishers.   method   hunter  and  fox  began  working  in  2011  to  obtain  a  large  corpus  of  biomedical  journal  articles  in   xml  format  to  create  a  body  of  text  as  comprehensive  as  possible  for  bnlp  experimentation  that   would  further  advance  hunter’s  research  group’s  knowledge-­‐based  analysis  system.  the  desired   result  was  an  aggregated  collection  obtained  from  multiple  publishers,  stored  locally,  and   available  on  demand  for  the  knowledge-­‐based  analysis  system  to  process.  hunter  and  fox  soon   realized  that  “the  process  of  obtaining  or  granting  permissions  for  text  mining  is  daunting  for   researchers  and  publishers  alike.  researchers  must  identify  the  publishers  and  discover  the   method  of  obtaining  permission  for  each  publisher.  most  publishers  currently  consider  mining   requests  on  a  case  by  case  basis.”32  they  pursued  a  multifaceted  strategy  to  build  a  robust   collection  and  to  determine  which  strategy  proved  most  fruitful  because,  during  a  grant  review,   national  library  of  medicine  staff  wanted  evidence  of  access  to  an  xml  collection  before  awarding   a  grant.     
fox  first  approached  two  open-­‐access  publishers,  biomed  central  (bmc)  and  public  library  of   science  (plos),  to  request  access  to  xml  text  from  journals  in  the  subjects  of  life  and  biomedical   science.  fox  had  existing  contacts  within  both  organizations  and  an  agreement  was  reached  to   obtain  xml  journal  articles.  letters  of  understanding  were  quickly  obtained  as  both  publishers   were  excited  about  exploring  new  ways  for  their  research  publications  to  be  accessed  and  the   potential  to  increase  the  use  of  their  journals.  possible  journal  titles  were  identified  and     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   10   arrangements  were  made  to  transfer  and  store  files  locally  from  bmc  and  plos  to  hunter’s   research  group.   hunter  approached  staff  at  pubmedcentral  (pmc)  to  request  access  to  articles  and  discovered   they  could  only  be  made  available  with  permission  from  publishers.  a  wiley  research  and  product   development  executive  granted  hunter  permission  to  access  wiley  articles  in  pmc.  the  wiley   executive  was  interested  in  learning  what  impact  text  mining  might  have  on  wiley  products.   hunter’s  research  group  planned  to  transfer  document  type  definition  (dtd)  format  files  from   pmc.  unfortunately,  when  hunter’s  research  group  staff  requested  file-­‐transfer  assistance  from   pmc,  no  pmc  staff  were  available  to  provide  the  technical  help  needed  because  of  budget   reductions.  pmc  staff  could  accurately  evaluate  their  time  commitment  because  they  had  a  clear   understanding  of  the  xml  access  and  transfer  process,  and  knew  they  could  not  allocate  resources   to  the  effort.     hunter  then  began  to  leverage  his  professional  network  connections  to  obtain  content  from  a   major  stem  vendor.  research  and  development  division  directors  within  the  company  were   familiar  with  the  work  of  hunter’s  research  group  and  were  willing  to  provide  assistance  in   acquiring  content.  however,  when  the  research  group  began  to  perform  research  using  this  data,   further  investigation  determined  that  the  contents  were  not  adequate  for  the  research.  follow-­‐up   between  fox,  the  research  group,  and  the  vendor  revealed  that  the  group’s  needs  were  not   communicated  in  the  vendor’s  vernacular,  resulting  in  the  group  not  clearly  understanding  what   content  the  vendor  was  providing.  this  disconnect  occurred  in  the  communication  flow  from  the   research  group  to  the  vendor’s  research  and  development  staff  to  the  vendor’s  sales  staff  (who   identified  the  content  to  be  shared).  it  was  a  like  a  game  of  telephone  tag.   after  the  initial  strategies  produced  mixed  results,  hunter’s  research  group  hypothesized  that  they   could  harvest  materials  through  hsl’s  journal  subscriptions.  hunter’s  research  group  attempted   to  crawl  and  download  journal  content  being  provided  by  hsl’s  subscription  to  a  major  chemistry   publisher.  
since publishers monitor for web crawling of their content, the chemistry publisher became aware of the unusual download activity, turned off campus access, and notified the library that there may have been an unauthorized attempt to access the publisher's content. researchers are often unaware of complex copyright and license compliance requirements. in fact, librarians sometimes become aware of text-mining projects only after automated downloads of licensed content prompt vendors to shut off campus access.33 libraries can prevent interruption of campus-wide access to important resources by suggesting more effective content-access methods. williams, an hsl acquisitions librarian, investigated the interruption in access and discovered hunter's research group's efforts to obtain journal articles to text mine for their research. she offered to use her expertise in acquiring content to help hunter's research group obtain the dataset needed for their research. initially, hunter and fox had not included an acquisitions librarian because that position was vacant. after williams became involved, the effort focused on acquiring content through negotiation and licensing with individual publishers. results "there are a large number of resources to help the researcher who is interested in doing text mining" but "no similar guide to obtaining the necessary rights and permissions for the content that is needed."34 at cu anschutz, this vacuum was filled by williams, who is knowledgeable about the acquisition of content, and fox, who is knowledgeable about hunter's research, serving as the bridge between the research group and the stem publisher. by working together and capitalizing on each other's expertise, williams and fox were able to facilitate the collaboration that developed a framework for purchasing a large collection of full-text journal articles in xml format. as the collaboration progressed, three major elements of the framework surfaced: a pricing model, a license agreement, and the dataset and delivery mechanism. researchers interested in legally text mining journal content often find themselves having to execute a license agreement and pay a fee.35 what should the fee be based on to create a fair and equitable pricing model? publishers establish pricing for library clients on the basis of not only the content but many value-added services, such as the breadth of titles aggregated and made available for purchase in a single product, the creation of a platform to access the journal titles, the indexing and searching functionality within the platform, and the production of easily readable pdf versions of articles. these value-added services are not required for text-mining endeavors. rather, the product is the raw journal content that has been peer-reviewed, edited, and formatted in xml, before the addition of any value-added services.
therefore  the  pricing  should  not  be   equivalent  to  the  cost  of  a  library’s  subscription  to  a  journal  or  package  of  journals.  in  the  end,   after  lengthy  negotiations,  the  pricing  model  for  the  hunter’s  research  group  collection  of  full-­‐text   journal  articles  in  xml  format  consisted  of   • a  cost  per  article;   • a  minimum  purchase  of  400,000  articles  for  one  sum  on  the  basis  of  the  cost  per  article;   • an  annual  subscription  for  the  minimum  purchase  of  400,000;   • the  ability  to  subscribe  to  additional  articles  in  excess  of  400,000  in  quantities  determined   by  hunter’s  research  group;   • a  volume  discount  off  the  per  article  price  for  every  article  purchased  in  excess  of  400,000;   • inclusion  of  the  core  journal  titles  purchased  via  the  library’s  subscription  at  no  charge;     • inclusion  of  the  core  journal  titles  purchased  by  the  university  of  colorado  boulder  at  no   charge  because  of  hunter’s  joint  appointment  at  both  cu  boulder  and  cuanschutz   campuses;  and   • a  requirement  for  hsl  to  maintain  its  subscription  to  the  vendor’s  product  at  its  current   level.     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   12   “where  institutions  already  have  existing  contracts  to  access  particular  academic  publications,  it  is   often  unclear  whether  text  mining  is  a  permissible  use.”36  from  the  beginning,  common  ground   was  easily  found  on  the  subject  of  core  titles  purchased  by  the  two  campuses’  libraries.  core  titles   are  typically  those  journals  that  libraries  pay  a  premium  for  to  obtain  perpetual  rights  to  the   content.  most  of  the  negotiation  focused  on  access  titles,  which  are  journals  that  libraries  pay  a   nominal  fee  to  have  access  to  without  any  perpetual  rights  included.   the  final  challenge  related  to  cost  was  determining  how  to  process  and  pay  for  the  product.   hunter’s  research  group  operates  on  major  grant  funding  from  federal  government  agencies.  the   university  of  colorado  requires  additional  levels  of  internal  controls  and  approvals  to  expend   grant  funds  as  well  as  to  track  expenditures  to  meet  reporting  requirements  of  the  funding   agencies.  also,  grant  funding  of  this  type  often  spans  multiple  fiscal  years  whereas  the  library’s   budget  operates  on  a  single  fiscal  year  at  a  time.  therefore  it  was  decided  that  hunter  would   handle  payment  directly  rather  than  transferring  funds  to  hsl  to  make  payment  on  their  behalf.   “libraries  as  the  licensee  of  publishers’  content  are  from  that  perspective  interested  in  the  legal   framework  around  content  mining.”37  during  price  negotiations,  williams  recommended   negotiating  a  license  agreement  similar  to  those  libraries  and  publishers  execute  for  the  purchases   of  journal  packages.  a  license  agreement  would  offer  a  level  of  protection  for  all  parties  involved   while  clearly  outlining  the  parameters  of  the  transaction.  hunter  and  the  stem  publisher  readily   agreed.     
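the pricing structure negotiated above can be summarized as a small calculation. the per-article price and volume-discount rate below are invented placeholders, since the article does not disclose the negotiated figures; only the shape of the model (a paid minimum of 400,000 articles, a discounted rate for articles beyond that, and core titles contributed at no additional charge) comes from the case study.

```python
# back-of-the-envelope sketch of the tiered pricing model described above.
# the numeric values are hypothetical placeholders, not the negotiated terms.
MINIMUM_ARTICLES = 400_000
PRICE_PER_ARTICLE = 0.10      # hypothetical cost per article, in dollars
VOLUME_DISCOUNT = 0.25        # hypothetical discount beyond the minimum

def annual_cost(articles_requested):
    """estimate the annual subscription cost for a requested corpus size."""
    base = MINIMUM_ARTICLES * PRICE_PER_ARTICLE       # minimum purchase is always paid
    extra = max(0, articles_requested - MINIMUM_ARTICLES)
    discounted_price = PRICE_PER_ARTICLE * (1 - VOLUME_DISCOUNT)
    return base + extra * discounted_price

# usage: annual_cost(400_000) -> 40000.0; annual_cost(600_000) -> 55000.0
```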
the  final  license  agreement  contained  ten  sections  including  definitions;  subscription;  obligations;   use  of  names;  financial  arrangement;  term;  proprietary  rights;  warranty,  indemnity,  disclaimer,   and  limitation  of  liability;  and  miscellaneous.  while  the  license  agreement  was  similar  to   traditional  license  agreements  between  libraries  and  publishers  for  journal  subscriptions,  there   were  some  notable  differences.  first,  in  the  definitions  section,  users  were  defined  and  limited  to   hunter  and  his  research  team.  this  limited  the  users  to  a  specific  group  of  individuals  unlike   typical  library–publisher  license  agreements  that  license  content  for  the  entire  campus.     second,  the  subscription  section  covered  how  the  data  can  be  used  in  detail  and  allowed  the   dataset  to  be  installed  locally.  this  was  important  to  make  the  dataset  available  on  demand  to   researchers;  to  allow  researchers  to  manipulate,  segment,  and  store  the  data  in  multiple  ways   instead  of  as  one  large  dataset;  and  to  allow  the  researchers  the  ability  to  access  and  use  the  large   dataset  efficiently  and  quickly.  because  the  dataset  would  be  manipulated  so  extensively,  the   license  gave  permission  to  create  a  backup  copy  and  store  it  separately.  the  subscription  section   also  required  the  dissemination  of  the  research  results  to  occur  in  such  a  way  that  the  dataset   could  not  be  extracted  and  used  by  others.  this  was  significant  because  prof.  hunter  releases  the   bnlp  software  applications  they  develop  as  open  source  software  so  that  the  applications  can  be   open  to  peer  review  and  attempts  at  reproduction.  ideally,  someone  could  download  the  open   source  software,  obtain  the  same  corpus  as  input,  and  see  the  same  output  mentioned  in  the  paper.     information  technology  and  libraries  |  september  2014   13   third,  the  obligations  section  was  radically  different  from  traditional  library–publisher  license   agreements  because  even  though  “publishers  are  still  working  out  how  to  take  advantage  of  text   mining  .  .  .  none  wants  to  miss  out  on  the  potential  commercial  value.”38  this  interest  prompted   the  crafting  of  an  atypical  obligations  section  in  the  license  agreement  that  included  an  option  for   hunter  to  collaborate  with  the  stem  publisher  to  develop  and  showcase  an  application  on  the   vendor’s  website  and  included  a  commitment  for  hunter  to  meet  quarterly  with  the  vendor’s   representatives  to  discuss  advances  in  research.  furthermore,  the  obligations  section  specified  a   request  for  hunter  and  the  university  of  colorado  to  recognize  the  vendor  where  appropriate  and   a  right  for  the  stem  publisher  to  use  any  research  software  application  released  as  open  source.   up  to  this  point,  williams  had  been  collaborating  with  the  university  of  colorado  in-­‐house  counsel   to  review  and  revise  the  license  agreement.  when  the  stem  publisher  requested  the  right  to  use   the  software  application,  williams  was  required  to  submit  the  license  agreement  to  the  university   of  colorado‘s  technology  transfer  office  for  review  and  approval.  
approval  was  prompt  in  coming,   primarily  because  prof.  hunter  releases  his  software  applications  as  open  source.   fourth,  the  license  agreement  included  a  “use  of  names”  section,  which  is  not  found  in  typical   library–publisher  agreements.  this  section  authorized  the  vendor  to  use  factual  information   drawn  from  a  case  study  in  market-­‐facing  materials  and  a  requirement  that  the  vendor  request   written  consent,  as  required  from  the  university  of  colorado  system,  for  information  in  the  case   study  to  be  released  for  market  facing  materials.  the  vendor  also  agreed  not  to  use  the  university   of  colorado’s  trademark,  service  mark,  trade  name,  copyright,  or  symbol  without  prior  written   consent  and  to  use  these  items  in  accordance  with  the  university  of  colorado  system’s  usage   guidelines.     fifth,  the  vendor  agreed  not  to  represent  in  any  way  that  the  university  of  colorado  or  its   employees  endorse  the  vendor’s  products  or  services.  this  is  extremely  important  because  the   university  of  colorado’s  controller  does  not  allow  product  endorsements  because  of  the  federal   unrelated  business  income  tax.  exempt  organizations  are  required  to  pay  this  tax  if  engaged  in   activities  that  are  regularly  occurring  business  activities  that  do  not  further  the  purpose  of  the   exempt  organization.39     finally,  the  license  agreement  stated  all  items  would  be  provided  in  xml  format  with  a  unique   digital  object  identifier  (doi)  number,  essential  for  linking  xml  content  to  real-­‐world  documents   that  researchers  using  hunter’s  research  group’s  knowledge-­‐based  analysis  system  would  want  to   access.   after  a  pricing  model  and  license  agreement  were  finalized,  the  focus  turned  to  the  last  major   element  of  the  framework:  the  dataset  and  delivery  mechanism.  elements  such  as  quality  of  the   corpora  contents,  file  transfer  time,  and  storage  capacity  are  all  important.  in  other  words,  “the   need  is  to  start  looking  as  widely  as  possible  in  the  largest  set  of  content  sources  possible.  this   need  is  balanced  by  the  practicalities  of  dealing  with  large  amounts  of  information,  so  a  choice     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   14   needs  to  be  made  of  which  body  of  content  will  most  likely  prove  fruitful  for  discovery.  text  mines   are  dug  where  there  is  the  best  chance  of  finding  something  valuable.”40   when  building  an  xml  corpora  for  research,  hunter’s  research  group  wanted  to  maximize  their   return  on  investment,  so  a  pilot  download  was  conducted  to  assure  that  the  most  beneficial   content  could  be  transferred  smoothly  to  a  local  server.  “permissions  and  licensing  is  only  a  part   of  what  is  needed  to  support  text  mining.  the  content  that  is  to  be  mined  must  be  made  available   in  a  way  that  is  convenient  for  the  researcher  and  the  publisher  alike.”41  this  pilot  phase  allowed   hunter’s  researchers  and  the  vendor’s  technical  personnel  to  clarify  the  requirements  of  the   dataset  and  to  efficiently  deliver  and  accurately  invoice  for  content.  
one of the initial obstacles was that a filter for the delivery mechanism didn't exist. letters to the editor, errata, and more were all counted as articles. hunter's researchers quickly determined that research articles were most important at this point in the development of the knowledge-based analysis system. how should a useful or minable article be defined: by its length, by xml tags indicating content type, or by some other criteria? roeder, a software engineer, used article attributes and characteristics embedded in xml tags to define an article as including all of the following:
• an abstract
• a body
• at least 40 lines of text
• none of the following tags: corrigendum, erratum, book review, editorial, introduction, preface, correspondence, or letter to the editor
(a minimal sketch of such a filter appears below.) in the end, hunter's research group and the vendor agreed to transmit everything and allow the group fifteen business days to evaluate the content. the research group would then notify the vendor of how many "articles" were received. this process would continue until 400,000 "articles" were received. after more than a year spent developing a structure to purchase a large corpus of journal articles to text mine, and just as hunter's research group was ready to execute the license, remit payment, and receive the articles, their federal grant expired, stalling the purchase. in retrospect, this unfortunate development was the catalyst for a shift in philosophy and strategy for the researchers and librarians at cu anschutz. discussion xml text-mining efforts will continue to expand, leading to increased demand on libraries and librarians to play a role in securing content. publishers, researchers, and libraries see the potential commercial and research value of text mining journal content and are driving the rapid evolution of this arena, in part, because "there is increasing demand from public and charitable funders that maximum value is leveraged from their substantial investment and this includes making outputs accessible and usable. . . . text mining offers the potential for fuller use of the existing publicly-funded research base."42
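the filter criteria listed in the results section above translate naturally into a short program. the sketch below assumes jats-like element names and an article-type attribute; the vendor's actual markup, and the exact rule for counting "lines of text," are not specified in the article, so both are assumptions.

```python
# sketch of the article filter described above: keep a document only if it
# has an abstract, a body, at least 40 lines of text, and is not one of the
# excluded article types. element and attribute names are hypothetical.
import xml.etree.ElementTree as ET

EXCLUDED_TYPES = {
    "corrigendum", "erratum", "book-review", "editorial",
    "introduction", "preface", "correspondence", "letter-to-the-editor",
}

def is_minable_article(xml_path, minimum_lines=40):
    root = ET.parse(xml_path).getroot()
    article_type = (root.get("article-type") or "").lower()
    if article_type in EXCLUDED_TYPES:
        return False
    if root.find(".//abstract") is None or root.find(".//body") is None:
        return False
    # approximate "lines of text" as non-empty lines in the body text;
    # the original counting rule is not documented, so this is an assumption
    body = root.find(".//body")
    body_lines = [ln for ln in "".join(body.itertext()).splitlines() if ln.strip()]
    return len(body_lines) >= minimum_lines
```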
the  various   types  of  “costs  are  currently  borne  by  researchers  and  institutions,  and  are  a  strong  hindrance  to   text  mining  uptake.  these  could  be  reduced  if  uncertainty  is  reduced,  more  common  and   straightforward  procedures  are  adopted  across  the  board  by  license  holders,  and  appropriate   solutions  for  orphaned  works  are  adopted.  however,  the  transaction  costs  will  still  be  significant  if   individual  rights  holders  each  adopt  different  licensing  solutions  and  barriers  inhibiting  uptake   will  remain.”45   in  a  survey  of  libraries,  findings  indicated  that  librarians  anticipate  a  new  role  as  facilitators   between  researchers  and  publishers  to  enable  text  mining.46  librarians  are  a  natural  fit  for  this   role  because  they  already  have  expertise  in  navigating  copyright,  requesting  copyright   permissions,  and  negotiating  license  agreements  for  journal  content.  “advice  and  guidance  should   be  developed  to  help  researchers  get  started  with  text  mining.  this  should  include:  when   permission  is  needed;  what  to  request;  how  best  to  explain  intended  work  and  how  to  describe   the  benefits  of  research  and  copyright  owners.”47   after  their  experience  with  developing  a  framework  to  license  and  purchase  a  large  corpora  of   journal  articles  in  xml  format  to  be  text  mined,  fox  and  williams  came  to  believe  that,  in  addition   to  providing  copyright  expertise,  librarians  should  assist  in  reducing  transaction  costs  by   developing  model  license  clauses  for  text  mining  and  routinely  negotiating  for  these  rights  when   the  library  purchases  journals  and  other  types  of  content.  adopting  this  philosophy  and  strategy   led  williams  and  fox  to  successfully  advocate  for  the  inclusion  of  a  text-­‐mining  clause  in  the   license  agreement  for  the  stem  publisher  in  this  case  study  at  the  time  of  the  library’s   subscription  renewal.  this  occurred  at  a  regional  academic  consortium  level,  making  text  mining   easier  at  fourteen  academic  institutions.  furthermore,  the  university  of  colorado  libraries,  which   includes  five  libraries  on  four  campuses,  is  now  working  on  drafting  a  model  clause  to  use  when   purchasing  journal  content  as  the  university  of  colorado  system  and  to  put  forth  for  consideration   by  the  consortiums  that  facilitate  the  purchase  of  our  major  journal  packages.  given  that   incorporating  text  mining  clauses  into  library–publisher  license  agreements  for  scholarly  journals   is  in  its  infancy,  there  are  few  resources  available  to  assist  librarians  adopting  this  new  role.  model   clauses  include  the  following:     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   16   • british  columbia  electronic  library  network’s  model  license  agreement48   o clause  3.11.  “data  and  text  mining.  members  and  authorized  users  may  conduct   research  employing  data  or  text  mining  of  the  licensed  materials  and  disseminate   results  publicly  for  non-­‐commercial  purposes.”     • california  digital  library’s  standard  license  agreement49   o section  iv.  authorized  use  of  licensed  materials.  “text  mining.  
authorized  users   may  use  the  licensed  material  to  perform  and  engage  in  text  mining/data  mining   activities  for  legitimate  academic  research  and  other  educational  purposes.”     • jisc’s  model  license  for  journals50   o clause  3.1.6.8.  “use  the  licensed  material  to  perform  and  engage  in  text   mining/data  mining  activities  for  academic  research  and  other  educational   purposes  and  allow  authorised  users  to  mount,  load  and  integrate  the  results  on  a   secure  network  and  use  the  results  in  accordance  with  this  license.”   o clause  9.3.  “for  the  avoidance  of  doubt,  the  publisher  hereby  acknowledges  that  any   database  rights  created  by  authorised  users  as  a  result  of  textmining/datamining  of   the  licensed  material  as  referred  to  in  clause  3.1.6.8  shall  be  the  property  of  the   institution.”   publishers  are  also  beginning  to  break  down  barriers  perhaps,  in  part,  because  of  the  sentiment   that  “privately  erected  barriers  by  copyright  holders  that  restrict  text  mining  of  the  research  base   could  be  increasingly  regarded  as  inequitable  or  unreasonable  since  the  copyright  holders  have   borne  only  a  small  proportion  of  the  costs  involved  in  the  overall  process;  furthermore,  they  do   not  have  rights  or  ownership  of  the  inherent  facts  or  ideas  within  the  research  base.”51  biomed   central  and  plos  both  offer  services  that  allow  researchers  to  access  xml  text  collections.  biomed   central  makes  content  readily  accessible  by  providing  a  website  for  bulk  download  of  xml  text.52   plos  requires  contact  with  a  staff  member  for  download  of  xml  text.53  in  december  2013,   elsevier  also  announced  that  it  would  create  a  “big  data”  center  at  the  university  college  london   to  allow  researchers  to  work  in  partnership  with  mendeley,  a  knowledge  management  and   citation  application  now  owned  by  elsevier.  while  this  is  a  positive  step,  the  partnership  does  not   appear  to  make  the  data  available  to  research  groups  beyond  the  university  college  london.54     however,  there  is  still  a  long  way  to  go  before  publishers  and  librarians  are  routinely   collaborating  on  opening  up  the  scholarly  literature  to  be  mined.  for  example,  a  2012  nature   editorial  states  “nature  publishing  group,  which  also  includes  this  journal,  says  that  it  does  not   charge  subscribers  to  mine  content,  subject  to  contract.”55  repeated  attempts  by  williams  to   obtain  more  information  from  nature  publishing  group  and  a  copy  of  the  contract  have  proved   fruitless.     in  january  2014,  elsevier  announced  that  “researchers  at  academic  institutions  can  use  elsevier’s   online  interface  (api)  to  batch-­‐download  documents  in  computer-­‐readable  xml  format”  after     information  technology  and  libraries  |  september  2014   17   signing  a  legal  agreement.  elsevier  will  limit  researchers  to  accessing  10,000  articles  per  week.56,57   for  small-­‐scale  projects  with  a  narrow  scope,  this  limit  will  suffice.  for  example,  mining  the   literature  for  a  specific  gene  that  plays  a  known  role  in  a  disease  could  require  a  text  set  under   30,000  articles.  
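the practical consequence of a 10,000-articles-per-week cap is easiest to see as arithmetic. a minimal sketch, assuming only the weekly limit reported above and the corpus sizes discussed in this article:

```python
# how long a corpus takes to assemble under a fixed weekly download cap
ARTICLES_PER_WEEK = 10_000

def weeks_to_build(corpus_size, per_week=ARTICLES_PER_WEEK):
    # round up: a partial week still has to be waited out
    return -(-corpus_size // per_week)

print(weeks_to_build(30_000))    # 3 weeks for a narrowly scoped project
print(weeks_to_build(400_000))   # 40 weeks, most of a year of elapsed time
```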
at  elsevier’s  current  rate  of  article  transfer,  a  30,000  article  text  set  could  be   created  in  roughly  three  weeks.  however,  for  large-­‐scale  projects  such  as  hunter’s  research   group’s  knowledge-­‐based  analysis  system  that  require  a  text  set  of  400,000  articles  (or  much   more,  if  not  limited  by  budget  constraints),  nearly  a  year  of  time  would  be  required  to  build  the   corpora.  time  is  one  of  the  most  valuable  commodities  in  computational  biology.  the  elapsed  time   required  to  transfer  articles  at  the  rate  of  10,000  articles  per  week  represents  a  bottleneck  that   most  grant-­‐funded  research  cannot  afford.  speed  of  transfer  will  also  be  a  factor.  researchers   require  flexibility  to  maximize  available  central  processing  unit  (cpu)  hours  because  documents   can  take  from  a  few  seconds  to  a  full  minute  each  to  transfer  to  the  storage  destination.   monopolizing  peak  hours  in  high  performance  computing  (hpc)  settings  may  mean  that   computing  power  is  not  available  for  other  tasks,  although  many  hpc  centers  have  learned  to   allocate  cpu  use  more  efficiently  to  high  volumes.  furthermore,  the  terms  and  conditions  set  by   elsevier  for  output  limits  excerpting  from  the  original  text  to  200  characters.58  this  is  roughly   equivalent  to  two  lines  of  text  or  approximately  forty  words.  this  may  be  insufficient  to  capture   important  biological  relationships  necessary  to  evaluate  the  relevance  of  the  article  to  the   research  being  represented  by  the  hanalyzer  knowledge-­‐based  analysis  system.     conclusion   forging  a  partnership  between  a  library,  a  research  lab,  and  a  major  stem  vendor  requires   flexibility,  patience,  and  persistence.  our  experience  strengthened  the  existing  relationship   between  the  library  and  the  research  lab  and  demonstrated  the  library’s  willingness  and  ability  to   support  faculty  research  in  a  nontraditional  method.  librarians  are  encouraged  to  advocate  for   the  inclusion  of  text-­‐mining  rights  in  their  library’s  license  agreements  for  electronic  resources.   what  the  future  holds  for  publishers,  researchers,  and  libraries  involved  in  text  mining  remains  to   be  seen.  however,  what  is  certain  is  that  without  cooperation  between  publishers,  researchers,   and  libraries,  breaking  down  the  existing  barriers  and  achieving  standards  for  content  formats   and  access  terms  will  remain  elusive.   references     1.     university  of  colorado  anschutz  medical  campus,  university  of  colorado  anschutz  medical   campus  quick  facts,  2013,   http://www.ucdenver.edu/about/whoweare/documents/cuanschutz_facts_041613.pdf.     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   18     2.     sonia  m.  leach  et  al.,  “biomedical  discovery  acceleration,  with  applications  to  craniofacial   development,”  plos  computational  biology  5,  no.  3  (2009):  1–19,   http://dx.doi.org/10.1371/journal.pcbi.1000215.   3.     jonathan  clark,  text  mining  and  scholarly  publishing  (publishing  research  consortium,  2013).   4.     corie  lok,  “literature  mining:  speed  reading,”  nature  463  (2010):  416–18,   http://dx.doi.org/10.1038/463416a.   5.     
hong-­‐jie  dai,  yen-­‐ching  chang,  richard  tzong-­‐han  tsai,  wen-­‐lian  hsu,  "new  challenges  for   biological  text-­‐mining  in  the  next  decade,"  journal  of  computer  science  and  technology  25,   no.1  (2010):  169-­‐179,  doi:  10.1007/s11390-­‐010-­‐9313-­‐5.     6.     anne  hoekman,  “journal  publishing  technologies:  xml,”   http://www.msu.edu/~hoekmana/wra%20420/ismte%20article.pdf.   7.     alex  brown,  "xml  in  serial  publishing:  past,  present  and  future,"  oclc  systems  &  services  19,   no.  4,  (2003):149-­‐154,  doi:  10.1108/10650750310698775.   8.     cartic  ramakrishnan  et  al.,  “layout-­‐aware  text  extraction  from  full-­‐text  pdf  of  scientific   articles,”  source  code  for  biology  and  medicine  7,  no.  7  (2012),   http://dx.doi.org/10.1186/1751-­‐0473-­‐7-­‐7.   9.     ibid.   10.    lawrence  hunter  and  k.  bretonnel  cohen,  “biomedical  language  processing:  perspective   what’s  beyond  pubmed?”  molecular  cell  21,  no.  5,  (2006):  589–94.   11.    martin  krallinger,  alfonso  valencia,  and  lynette  hirschman,  “linking  genes  to  literature:  text   mining,  information  extraction,  and  retrieval  applications  for  biology,”  genome  biology  9,   supplement  2  (2008):  s8.1–s8.14,  http://dx.doi.org/10.1186/gb-­‐2008-­‐9-­‐s2-­‐s8.   12.    eefke  smit  and  maurits  van  der  graaf,  “journal  article  mining:  the  scholarly  publishers’   perspective,”  learned  publishing  25,  no.  1  (2012):  35–46,   http://dx.doi.org/10.1087/20120106.   13.    hunter  and  cohen,  “biomedical  language  processing,”  589.   14.    clark,  text  mining  and  scholarly  publishing.   15.    leach  et  al.,  “biomedical  discovery  acceleration.”   16.    marti  hearst,  “what  is  text  mining?”  october  17,  2003,   http://people.ischool.berkeley.edu/~hearst/text-­‐mining.html.     information  technology  and  libraries  |  september  2014   19     17.    k.  bretonnel  cohen  and  lawrence  hunter,  “getting  started  in  text  mining,”  plos   computational  biology  4,  no.  1  (2008):  1–3,  http://dx.doi.org/10.1371/journal.pcbi/0040020.   18.    jisc,  “the  model  nesli2  licence  for  journals,”  2013,  http://www.jisc-­‐collections.ac.uk/help-­‐ and-­‐information/how-­‐model-­‐licences-­‐work/nesli2-­‐model-­‐licence-­‐/.   19.    ian  hargreaves,  “digital  opportunity:  a  review  of  intellectual  property  and  growth,”  may   2011,  http://www.ipo.gov.uk/ipreview-­‐finalreport.pdf.     20.    james  manyika  et  al.,  “big  data:  the  next  frontier  for  innovation,  competition,  and   productivity,”  mckinsey  &  company,  may  2011,   http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_inn ovation.   21.   hargreaves,  “digital  opportunity.”     22.    diane  mcdonald  and  ursula  kelly,  “the  value  and  benefits  of  text  mining  to  uk  further  and   higher  education,”  jisc,  2012,  http://www.jisc.ac.uk/reports/value-­‐and-­‐benefits-­‐of-­‐text-­‐ mining.   23.    jisc,  “the  model  nesli2  licence  for  journals.”   24.    smit  and  van  der  graaf,  “journal  article  mining.”   25.    mcdonald  and  kelly,  “the  value  and  benefits  of  text  mining.”   26.    clark,  text  mining  and  scholarly  publishing.   27.    “gold  in  the  text?”  nature  483  (march  8,  2012):  124,  http://dx.doi.org/10.1038/483124a.   28.    richard  van  noorden,  “trouble  at  the  text  mine,”  nature  483  (march  8,  2012):  134–35.   29.    
claudio  aspesi,  a.  rosso,  and  r.  wielechowski.  reed  elsevier:  is  elsevier  heading  for  a  political   train-­‐wreck?  2012.   30.    clark,  text  mining  and  scholarly  publishing.   31.    jill  emery,  “working  in  a  text  mine:  is  access  about  to  go  down?”  journal  of  electronic   resources  librarianship  20,  no.  3  (2009):135–38,   http://dx.doi.org/10.1080/19411260802412745.   32.    clark,  text  mining  and  scholarly  publishing:  14.   33. van  noorden,  “trouble  at  the  text  mine.” 34.   ibid. 35.    ibid.     negotiating  a  text  mining  license  for  faculty  researchers  |  williams  et  al   20     36.    jisc,  “the  model  nesli2  licence  for  journals.”   37.    smit  and  van  der  graaf,  “journal  article  mining.”   38.    van  noorden,  “trouble  at  the  text  mine.”   39.    internal  revenue  service,  “unrelated  business  income  defined,”     http://www.irs.gov/charities-­‐&-­‐non-­‐profits/unrelated-­‐business-­‐income-­‐defined.   40.    clark,  text  mining  and  scholarly  publishing:  10.   41.    ibid:  14.   42.    mcdonald  and  kelly,  “the  value  and  benefits  of  text  mining.”   43.    smit  and  van  der  graaf,  “journal  article  mining.”   44.    mcdonald  and  kelly,  “the  value  and  benefits  of  text  mining.”   45.    ibid.   46.    smit  and  van  der  graaf,  “journal  article  mining.”   47.    mcdonald  and  kelly,  “the  value  and  benefits  of  text  mining.”   48.    british  columbia  electronic  library  network,  bc  eln  database  licensing  framework,     http://www.cdlib.org/services/collections/toolkit/.   49.      “licensing  toolkit,”  california  digital  library,   http://www.cdlib.org/services/collections/toolkit/.   50.    jisc,  “the  model  nesli2  licence  for  journals.”   51.    mcdonald  and  kelly,  “the  value  and  benefits  of  text  mining.”   52.    “using  biomed  central’s  open  access  full-­‐text  corpus  for  text  mining  research,”     http://www.biomedcentral.com/about/datamining.   53.    “help  using  this  site,”  plos,  http://www.plosone.org/static/help.   54.    iris  kisjes,  “university  college  london  and  elsevier  launch  ucl  big  data  institute,”  elsevier   connect,  press  release,  december  18,  2013,  http://www.elsevier.com/connect/university-­‐ college-­‐london-­‐and-­‐elsevier-­‐launch-­‐ucl-­‐big-­‐data-­‐institute.   55.    “gold  in  the  text?”   56.    richard  van  noorden,  “elsevier  opens  its  papers  to  text-­‐mining,”  nature  506  (february  2,   2014):  17.   57.    sciverse,  content  apis,  http://www.developers.elsevier.com/cms/content-­‐apis.     information  technology  and  libraries  |  september  2014   21     58.    “text  and  data  mining,”  elsevier,  ,  http://www.elsevier.com/about/universal-­‐access/content-­‐ mining-­‐policies.   108 information technology and libraries | june 2006 tutorial writing your first scholarly article: a guide for budding authors in librarianship scott nicholson this series of questions and answers is designed to help you take the first steps toward the successful production of a scholarly article in librarianship. you may find yourself in a library position that requires writing or you may have just decided that you are ready to share your findings, experiences, and knowledge with the current and future generations of librarians. 
while following the guidelines listed here will not guarantee that you will be successful, these steps will take you closer to discovering the thrill of seeing your name in print and making a difference in the field. what should i write about? perhaps you already have an idea based upon your experiences and expertise, or perhaps you aren’t sure which of those ideas you should write about. the best way to start writing is to read other articles! many scholarly articles end with a future research section that outlines other projects and questions that the article suggests. it is useful to contact the author of a piece that holds a future research seed to ensure that the author has not already taken on that challenge. sometimes, the original author may be interested in collaborating with you to explore that next question. how do i start? scholarship is an iterative process, in that works that you produce are bricks in an ever-rising wall. your brick will build upon the works of others and, once published, others will build upon your work. because of this, it is essential to begin with a review of related literature. search in bibliographic and citation databases as well as web search tools to see if others have done similar projects to your own. the advantage of finding related literature is that you can learn from the mistakes of others and avoid duplicating works (unless your plan is to replicate the work of others). starting with the work of others allows you to place your brick on the wall. if you do not explicitly discuss how your scholarship relates to the scholarship of others, only those having familiarity with the literature will be able to understand how your work fits in with that of previous authors. in addition, it’s easier to build upon your work if those who read it have a better idea of the scholarly landscape in which your work lives. as you go out and discover literature, it is crucial to keep citation information about each item. much of what you will cite will be book chapters or articles in journals, and you will save yourself time and trouble later if you make a printed copy of source items and record bibliographic information on that copy. recording the title of the work, the full names (including middle initials) of authors and editors, page range, volume, issue, date, publisher and place of publication, url and date accessed, and any other bibliographic information at the time of collection will save you headaches later when you have to create your references list. as different journals have different citation requirements, having all of this information allows you the flexibility of adapting to different styles. one type of scholarship produced by libraries is the “how our library did something well” article. while a case study of your library can be an appropriate area of discussion, it is critical to position these pieces within the scholarship of the field. this allows readers to better understand how applicable your findings are to their own libraries. the concept illustrates the difference between the practice of librarianship and library science. library science is the study of librarianship and includes the generalization of library practice in one setting to other settings. before starting your writing, talk about your idea with your colleagues, which will help you refine your ideas. it will also generate some excitement and publicity about your work, which can help inspire you to continue in the writing process. 
colleagues can help you consider different places where similar works may already exist and might even open your eyes to similar work in another discipline. you may find a colleague who wants to coauthor the piece with you, which can make the project easier to complete and richer through the collaborative process. another important early step is to consider the journals you would like to be published in. many times, it can be fruitful to publish in the journal that has published works that are in your literature review. considering the journal at this point will allow you to correctly focus the scope, length, and style of your article to the requirements of your desired journal. your article should match the length and tone of other articles in that journal. most journals provide instructions to authors in each issue or on the web; the information page for ital authors is at www.ala.org/ala/ lita/litapublications/ital/information authors.htm. how can i find funding for my research? some projects can’t be easily done in your spare time and require resources for surveys, statistical analysis, travel, or other research costs. you will find that successful requests for funding scott nicholson (srnichol@syr.edu) is an assistant professor in the school of information studies, syracuse university, new york. writing your first scholarly article | nicholson 109 start with a literature review and a research plan. developing these before requesting funding will make your request for funding much stronger, as you will be able to demonstrate how your work will sit within a larger context of scholarship. you will need to develop a budget for your funding request. this budget will come together more easily if you have planned out your research. it may be useful or even required for you to develop a set of outcomes for your project and how you will be assessing those outcomes (find more information on outcome-based evaluation through the imls web site at www.imls.gov/grants/current/ crnt_obe.htm). developing this plan will give you a more concrete idea of what resources you will need and when, as well as how you can use the results of your work. resources for research may come from the inside, such as the library or the parent organization of the library, or from an external source, such as a granting body or a corporate donor. in choosing an organization for selection, you should consider who would most benefit from the research, as the request for funding should focus on the benefit to the granting body. many libraries and schools do have small pots of money available for research that will benefit that institution and that, many times, go untapped due to a lack of interest. granting organizations put out formal calls for grant proposals. these can result in a grant that would carry some prestige but would require a detailed formal application that can take months of writing and waiting. another approach is to work with a corporate or nonprofit organization that gives grants. if your organization has a development office, this office may be able to help connect you with a potential supporter of your work. how do i actually do the research? just as the most critical part of a dissertation is the proposal, a good research plan will make your research process run smoothly. before you start the research, write the literature review and the research plan as part of an article. it can be useful to create tables and charts with dummy data that will show how you plan to present results. 
doing this allows you to notice gaps in your data-collection plan well before you start that process. in many research projects, you only have a single chance to collect data; therefore, it’s important to plan out the process before you begin. how do i start writing the paper? the best way to start the writing process is to just write. don’t worry about coming up with a title; the title will develop as the work develops. you can skip over the abstract and introduction; these can be much easier to write after the main body of the article is complete. if you’ve followed the advice in this paper, then you’ve already written a literature review and perhaps a research plan; these make a good starting point for your article. one way to develop the body of the article is to develop an outline of headings and subheadings. starting with this type of outline forces you to think through your entire article and can help you identify holes in your preparation. once you have the outline completed, you can then fill in the outline by adding text to the headings and subheadings. this approach will keep your thinking organized in a way typically used in scholarly writing. scholarly writing is different than creative writing. many librarians with a humanities background face some challenges in transitioning to a different writing style. scholarly writing is terse; strunk and white’s the elements of style (2000) focuses on succinct writing and can help you refresh your writing skills.1 if you are having difficulty finding the time to write, it can be useful to set a small quota of writing that you will do every day. a quota such as four paragraphs a day is a reasonable amount to fit into even a busy day, but it will result in the completion of your first draft in only a few weeks. i’m finished with my first complete draft! now what? while you will be excited with the completion of the draft, it’s not appropriate to send that off to a journal just yet. take a few days off and let your mind settle from the writing, then go back and reread your article carefully. examine each sentence for a subject and a verb, and remove unneeded words, phrases, sentences, paragraphs, or even pages. try to tighten and clean your writing by replacing figures of speech with statements that actually say what you mean in that situation and removing unneeded references to firstand second-person pronouns. working through the entire article in this way greatly improves your writing and reduces the review and editing time needed for the article. after this, have several colleagues read your work. some of these might be people with whom you shared your original ideas, and others may be new to the concepts. it can be useful to have members of different departments and with different backgrounds read the piece. ask them if they can read your work by a specific date, as this type of review work is easy to put off when work gets busy. these colleagues may be people who work in your institution or may be people you have met online. if you know nobody who would be appropriate, consider putting out a request for assistance on a library discussion list focused on your research topic. dealing with the comments from others requires you to set aside your 110 information technology and libraries | june 2006 defenses. you did spend a lot of time on this work and it can be easy to slip into a defensive mode. attempt to read their comments from an objective viewpoint. 
remember—these people are spending their time to help you, and a comment you disagree with at first blush may make more sense if you consider the question “why would someone say this about my work?” putting yourself into the reader’s shoes can aid you in the creation of a piece that speaks to many audiences. what goes on when i submit my work? at this point, your readers have looked at the piece, and you have made corrections on it. now you’re ready to submit your work. follow the directions of the target journal, including length, citation format, and method of submission. if submission is made by e-mail, it would be appropriate to send a follow-up e-mail a few days after submission to ensure the work was received; it can be very frustrating to realize, after a month of waiting, that the editor never got the work. once you have submitted your work, the editor will briefly review it to ensure it is an appropriate submission for the journal. if it is appropriate, then the editor will pass the article on to one or more reviewers; if not, you will receive a note fairly quickly letting you know that you should pick another journal. if the reviewing process is “blind,” then you will not know who your reviewers are, but they may know your identity. if the process is “double-blind,” neither reviewer nor author will know the identity of the other. the reviewers will read the article and then submit comments and a recommendation to the editor. the editor will collect comments from all of the reviewers and put them together, and send those comments to you. this will always take longer than you would prefer; in reality, it will usually take two to six months, depending upon the journal. after a few months, it would be appropriate for you to contact the editor and ask about the progress on the article and when you should expect comments. do not expect to have your article accepted on the first pass. the common responses are: ■ reject. at this point, you can read the comments provided, make changes, and submit it to another journal. ■ revise and resubmit. the journal is not making a commitment to you, but they are willing to take another look if you are willing to make changes. this is a common response for first submissions. ■ accept with major changes. the journal is interested in publishing the article, but it will require reworking. ■ accept with minor changes. you will be presented with a series of small changes. some of these might be required and others might be your choice. ■ accept. the article is through the reviewing process and is on to the next stage. this is an iterative process. you will most likely go through several cycles of this before your article is accepted, and staying dedicated to the process is key to its success. it can be disheartening to have made three rounds of changes only to face another round of small changes. ideally, each set of requested changes should be smaller (and take less time) until you reach the acceptance level. do not submit your work to multiple journals at the same time. if you choose to withdraw your work from one journal and submit it to another, let the editor know that you are doing this (assuming they have not rejected your work). my article has been accepted. when will it come out? once your article is accepted, it will be sent into a copyediting process. the copy editor will contact you with more questions that focus more on writing and citation flaws than on content. 
after making more corrections, you will receive a proof to review (usually with a very tight deadline). this proof will be what comes out in the journal, so check important things like your name, institutions, and contact information carefully. the journal will usually come out several months after you see this final proof. the process from acceptance to publication can take from six months to two years (or more), depending on how much of a publication queue the journal has. the editor should be able to give you an estimate as to when the article will come out after full acceptance. can i put a copy of my article online? it depends upon the copyright agreement that you sign. many publishers will allow you to put a copy of your article on a local or institutional web site with an appropriate citation. some allow you to put up a preprint, which would be the version after copyediting but not the final proof version. if the copyright agreement doesn’t say anything about this, then ask the editor of the journal about the policy of authors mounting their own articles on a web site. conclusion writing an article and getting it published is akin to having a child. your child will have a life of its own, and others may notice this new piece of knowledge and build upon it to improve their own library services writing your first scholarly article | nicholson 111 or even make their own works. it is a way to make a difference that goes far beyond the walls of your own library, to extend your professional network, and to engage other scholars in the continued development of the knowledge base of our field. reference 1. w. strunk jr. and e. b. white, the elements of style (boston: allyn & bacon, 2000). for more information: w. crawford, first have something to say: writing for the library profession (chicago: ala, 2003). r. gordon, the librarian’s guide to writing for publication (lanham, md.: scarecrow, 2004). l. hinchliffe and j. dorner, eds., how to get published in lis journals: a practical guide (san diego: elsevier, 2003), www .elsevier.com/framework_librarians/lib raryconnect/lcpamphlet2.pdf, (accessed feb. 8, 2006). adventure code camp: library mobile design in the backcountry david ward , james hahn, and lori mestre information technology and libraries | september 2014 45 abstract this article presents a case study exploring the use of a student coding camp as a bottom-up mobile design process to generate library mobile apps. a code camp sources student programmer talent and ideas for designing software services and features. this case study reviews process, outcomes, and next steps in mobile web app coding camps. it concludes by offering implications for services design beyond the local camp presented in this study. by understanding how patrons expect to integrate library services and resources into their use of mobile devices, librarians can better design the user experience for this environment. introduction mobile applications offer an exciting opportunity for libraries to expand the reach of their services, to build new connections, and to offer unique, previously unavailable services for their users. mobile apps not only provide the ability to present library services through mobile views (e.g., the library catalog and library website), but they can tap into an ever-increasing list of mobile-specific features. 
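the article itself contains no code, so the following is only an illustrative sketch of what "mobile-specific features" can mean in a library web app: the standard browser geolocation api used to rank a list of branches by proximity. the branch names, coordinates, and helper functions are hypothetical placeholders, not anything built at the camp.

```javascript
// illustrative sketch only: use the standard browser geolocation api to find
// the nearest library branch. branch data and helper names are hypothetical.
var branches = [
  { name: "main library", lat: 40.1046, lon: -88.2291 },
  { name: "engineering library", lat: 40.1125, lon: -88.2268 }
];

// rough squared-distance comparison; adequate for ranking nearby points
function distanceSquared(a, b) {
  var dLat = a.lat - b.lat;
  var dLon = a.lon - b.lon;
  return dLat * dLat + dLon * dLon;
}

function showNearestBranch() {
  if (!navigator.geolocation) {
    console.log("geolocation is not available on this device");
    return;
  }
  navigator.geolocation.getCurrentPosition(function (position) {
    var here = { lat: position.coords.latitude, lon: position.coords.longitude };
    var nearest = branches.slice().sort(function (x, y) {
      return distanceSquared(here, x) - distanceSquared(here, y);
    })[0];
    console.log("nearest branch: " + nearest.name);
  }, function (error) {
    console.log("could not get a location fix: " + error.message);
  });
}

showNearestBranch();
```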
by understanding how patrons expect to integrate library services and resources into their use of mobile devices, librarians can better design the user experience for this environment. by adjusting the normal app production workflow to directly involve students during the formative stages of mobile app conception and design, libraries have the potential to generate products that more accurately anticipate real-life student needs. this article details one such approach, which sources student talent to code apps in a fast-paced, collaborative setting. as part of a two-year institute of museum and library services (imls) grant, an academic library– based research team investigated three different methods for involving users in the app development process—a student competition, integrated computer science class projects, and the coding camp described in this article. the coding camp method focuses on a trend in mobile software development of having intensive two-to-three-day coding events that result in working prototypes of applications (e.g., iphonedevcamp, http://www.iphonedevcamp.org/). coders typically work in groups to simultaneously learn how new software works, and also develop a functioning app that matches an area of personal interest. camps promote collaboration, which provides additional networking and social outcomes to attendees. additionally, camps provide an david ward (dh-ward@illinois.edu) is reference services librarian, james hahn (imhahn@illinois.edu) is orientation services & environments librarian, and lori mestre (lmestre@illinois.edu) is head, undergraduate library and professor of library administration, university of illinois at urbana-champaign. http://www.iphonedevcamp.org/ mailto:dh-ward@illinois.edu mailto:imhahn@illinois.edu mailto:lmestre@illinois.edu adventure code camp: library mobile design in the backcountry | ward, hahn, and mestre 46 opportunity for software makers to promote their services and products, and they can result in new code and ideas on which to base future products. for academic libraries, a camp environment provides an educational opportunity for students, particularly those in a field with a computing or engineering focus, to learn new coding languages and techniques and to gain experience with a professional software production process that runs the full timeline from conception to finished product. coding camps offer a chance for librarians to get direct student feedback on their own software development goals. the resulting applications provide potential benefits to both groups—students have a functional prototype to enhance their classroom experiences and a codebase to build on for future projects, and the librarians gain an insight into students’ desires for the content of mobile apps, code to integrate into existing apps, and direct student input into the iterative design process. this article presents the results of a mobile application coding camp held in fall 2013. the camp method was tested as a way to explore a less timeand staff-intensive process for involving students in the creation of library mobile apps. three specific research questions framed this investigation: 1. what library and course-related needs do students believe would benefit from the development of a mobile application? 2. is the library providing access to data that is relevant to these needs, and is it available in a format (e.g., restful apis) that end users can easily adopt into their own application design process? 3. 
how viable is the coding camp method for generating usable mobile app prototypes? literature review in line with efforts in academic libraries to operationalize participatory design for novel service implementation,1 the library approach to code camps included sourcing student technical expertise in line with other tech companies’ approaches to quickly iterating prototypes that may advance or enhance company services. while coding camps happen in corporate settings, other types of camps try to publicize technologies, like a programming language, while others still are directed toward a specific cohort.2 the departure point for the library was in understanding other ways the library might consider organizing and pairing its resources of apis with other available campus services. a few highly visible and notable corporate “hackfests” or “hackdays” include the facebook hackdays, in which facebook timeline was developed (http://www.businessinsider.com/facebook-timeline-began-as-a-hackathon-project-2012-1). the mobile app company twitter also has monthly hackfests where employees from across the company work for a sustained period (a weekend or friday) on new ideas putting together prototypes that may transition into new services for the company. http://www.businessinsider.com/facebook-timeline-began-as-a-hackathon-project-2012-1 information technology and libraries | september 2014 47 an example of code camps from academia are the mhacks camps at the university of michigan (http://mhacks.challengepost.com/), among the largest code camps for university students in the midwest. these camps are notable for their funding from corporations and for their support of student travel from colleges around the country to participate at the university of michigan. at each event, coders are encouraged to make use of the corporate apis that student programmers may make use of once they graduate or form companies after graduation. on the professional front, digital library code meet-ups (such as that of the code4lib preconference: http://code4lib.org/) are an opportunity for library technologists to share strategies and new directions in software using hands-on group coding sessions that last a half or full day. a recent digital event for the digital public library of america (dpla) hosted hackfests to demonstrate interface and functional possibilities with the source content in the dpla. similarly, the hathi trust research center organized a developer track for api coaching at their conference so that participants would have hands-on opportunities to use the hathi trust api (http://www.hathitrust.org/htrc_uncamp2013). goals of coding camps include development of new services or creation of value-added services on already existing operations. code is not required to be complete, but functional prototypes help showcase new ways of approaching problems or novel solutions. recently, mhacks issued the call to form new businesses at their winter hackathon (http://www.mhacks.org). libraries are typically less interested in new businesses, but rather seek new service ideas and new principles for organizing content via mobile and to do so in such a way that will source student preferences for location specific services, a key focus for the research team’s student/library collaborative imls grant. method while the camp itself took only two days, there was a significant amount of lead-time needed to prepare. 
in addition to obtaining standard campus institutional-review-board permissions for the study, it was also necessary to consult the office of technology management to devise an assignment agreement covering the software generated by the camp. the research team chose a model that gave participating students the option to assign co-ownership of the code they developed to the library. this meant that both students and the library could independently develop applications using the code generated during the camp. marketing for the camp specifically targeted departments and courses where students with interest and skills for mobile application development were likely to be found, particularly in computer science and engineering. individual instructors were contacted, as well as registered student organizations, to help promote the camp. attendees were directed to an online application form, where they were asked to provide information on their coding skills and details on their interest in mobile application development. http://mhacks.challengepost.com/ http://code4lib.org/ http://www.hathitrust.org/htrc_uncamp2013 http://www.mhacks.org/ adventure code camp: library mobile design in the backcountry | ward, hahn, and mestre 48 ten students were ultimately selected from the pool and, of those, six attended the camp. a precamp package was sent to these students to help them prepare for the short, intense timeframe the event entailed. this package included details on library data that were available to base applications on through web apis, as well as brief tutorials on the coding languages and data formats participants needed to be familiar with (e.g., javascript, json, xml, etc.). participants were also provided with information on parking and other logistics for the event. the research team consisted of librarians and academic professionals involved in public services and mobile design, and student coders employed by the library to serve as peer mentors. the team designed the camp as a two-day experience occurring over a weekend (friday evening to saturday late afternoon). the first day was scheduled as an introduction to the camp, with details on library and related apis that could be used for apps and an opportunity for participants to brainstorm app ideas and form design teams. the day ended with some preliminary coding and consultation with camp organizers about planned directions and needs for the second day. the second day of the camp mostly for coding, with breaks scheduled for food, presentations of work-in-progress, and an opportunity to ask questions of the research team. the day ended with each team presenting their app, describing their motivation in designing it and the functionality they had been able to code into it. given the brief turnaround time, the research team put a heavy focus during the orientation session on clearly articulating the need to develop apps germane to student library needs. examples from the student mobile app design competition conducted in february 2013 were provided as starting points for discussion, as these reflected known student desires for mobile library applications.3 after the camp ended, students who elected to share their code with the library were given details on how and where to deposit the code. post-camp debriefing interviews (lasting 30 to forty-five minutes each) were scheduled individually with all participants to get their feedback on the setup of the event as well as what they felt they learned from the experience. 
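the pre-camp package is described only in general terms, so the snippet below is an assumed illustration of the json familiarity it aimed at: a made-up study-room record, parsed the way a browser-based app would handle an api response. the field names are invented and do not reflect the library's actual schema.

```javascript
// assumed example of the kind of json payload covered in the pre-camp package;
// the record shape is invented, not the library's actual api response.
var responseText =
  '{ "room": "room 291", "building": "undergraduate library",' +
  '  "capacity": 8, "available": true,' +
  '  "equipment": ["whiteboard", "lcd display"] }';

// a browser app usually receives the payload as text and parses it
var record = JSON.parse(responseText);

if (record.available) {
  console.log(record.room + " in " + record.building +
              " seats " + record.capacity + " and is free right now");
}

// the equivalent xml (e.g. <room capacity="8" available="true">room 291</room>)
// would instead be handled with the browser's DOMParser
```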
discussion researcher observations and feedback from students, both during the camp and in individual interviews afterwards, led to several insights about what sorts of outcomes libraries might anticipate from running camps, how to best structure library coding camps, what outcomes students anticipate from participating in a sponsored camp environment, and what features and preferences students have for mobile apps designed to support their academic endeavors. a key student assumption, which emerged from comments at the event and through subsequent student interviews, was that students anticipated completing a fully functioning mobile app by the end of camp. instead, the two student teams each finished with an app that, while it included some of the features they desired, still required additional coding to be fully realized. several information technology and libraries | september 2014 49 suggestions were made for how this need might be met at future events. the most consistent feedback from the students was that they would have liked an additional day of coding (three total camp days), so that they could have gotten further on the implementation of their app ideas. during the exit interviews, one student noted that the two-day timeframe really only allowed for sketching out an idea for an app, not coding it from scratch. a pair of related suggestions from students included having templates for mobile apps available to review to get up to speed on new frameworks (particularly jquery), and secondly, a longer meetand-greet for teams prior to beginning work during which they could compare available coding skills and have some extended brainstorming of app ideas. students were somewhat mixed in their desire for assistance in developing app ideas—some appreciated the open-endedness of the camp, but others wanted a more organizer-driven approach. some students suggested having time to work with library staff after the camp to finish or polish their apps. this observation suggests the enthusiasm students had for the camp itself, and specifically for having a social, structured, and mentored opportunity to develop their coding skills. based on these requests, the research team created “office hours” on the fly after the camp ended to support this request. research team members and coding staff communicated times when team members could come into the library and get additional help with developing their apps. the students had very similar themes for app features to those that the research team observed in an earlier student mobile app competition study. notable categories included the following: • identify and build connections with peers from courses. • discover campus resources and spaces. • facilitate common activities such as studying and meeting for group work. students remarked that the camp was an opportunity to both meet people with similar coding interests as well as to learn more about specific functional areas of app development (specific coding languages, user interface design, etc.) in which they had little experience. jquery and javascript for user-facing designs were particular areas of interest. many students had some indepth background working on pieces of a finished software product but had not previously done start-to-finish software design; this was a big selling point for the camp. the collaborative nature of the camp also matched students’ preferences to work in teams and to learn from peers. 
while the research team had coders on hand to assist with both the library apis, as well as jquery basics, most teams did the majority of their work themselves, and preferred self-discovery. each team did eventually ask for coding advice, but this occurred toward the end of the camp, once their apps were largely coded and they needed assistance overcoming particular sticking points. the other piece of advice students asked organizers about concerned identifying apis for locations of campus maps, and other related resources to serve as data sources powering their apps. in the course of assisting with these requests, researchers discovered another key issue facing library mobile app development—the lack of campus standards for presenting information across adventure code camp: library mobile design in the backcountry | ward, hahn, and mestre 50 different colleges and departments. in particular, maps of rooms inside campus buildings were not provided in a consistent or comprehensive way. this was particularly frustrating to the team that was attempting to develop an app featuring turn-by-turn navigation and directions to meeting rooms and computer labs. in addition to sharing information on known apis and data sources, camp organizers also learned about previously unknown data sources from the student teams. one example was a json feed for the current availability of computers in labs provided by the college of engineering. while this feed was beneficial to starting work on an app for one team, it also led to frustration because feeds for other campus computer labs did not exist, and the team was limited to designing around the specific labs that did have this information available. observed student discussions about the randomness of data availability also highlighted one of the key themes of student-centered design—the conceptualization of a university as a single entity, the various parts of which combine and come in and out of focus depending on the current student task. related student feedback from one of the post-event interviews described a strong desire to create integrated, multifunction apps to meet student needs as opposed to a variety of apps that each did one thing. the siloed nature of campus infrastructures frustrates this desire to some extent but also creates opportunities for students to build a tool that meets a real need among their peers to comprehend and organize their academic environment. this observation also matches those found during the aforementioned student competition. conclusion and future directions student feedback on the camp, as a whole, was very positive, and in the individual interviews, students noted they would like to participate in another camp if it was offered. on the library side, the research team felt that the camp was useful to their ongoing mobile app development process, partially for the code generated but primarily for the direct feedback on what types of apps students wanted to see. the start-up time and costs for the project were low, as expected, and the insights into student mobile preferences seemed proportionate to this outlay. the camp method should be reproducible in a variety of library environments. the key assets other libraries will need to have in place to run a camp include staff with knowledge of client-side api use (in particular jquery, cors, or related skills), and knowledge of campus data sources that students may wish to pull from. 
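as a concrete (and hedged) illustration of the client-side skills named above, the sketch below requests a hypothetical cross-origin json feed of lab workstation availability with jquery and renders it into a page. the url and field names are invented; the real engineering feed mentioned earlier may differ, and the serving host would need to send cors headers for a browser app to read it. the snippet assumes jquery is loaded and an empty <ul id="labs"> element exists in the page.

```javascript
// minimal sketch, assuming a hypothetical cross-origin lab-availability feed;
// the url and field names are invented. the feed's host must send
// access-control-allow-origin (cors) headers for a browser app to read it.
var feedUrl = "https://labs.example.edu/api/availability.json"; // hypothetical

$.getJSON(feedUrl)
  .done(function (data) {
    // assume data.labs is an array like [{ name: "...", open: 12, total: 40 }]
    $.each(data.labs, function (i, lab) {
      $("#labs").append(
        $("<li>").text(lab.name + ": " + lab.open + " of " + lab.total + " seats open")
      );
    });
  })
  .fail(function (jqXHR, textStatus) {
    // a blocked cross-origin request or network error ends up here
    $("#labs").append($("<li>").text("availability feed unavailable: " + textStatus));
  });
```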
third-party apis with bibliographic data (e.g., good reads) could also be used as placeholders for libraries that do not have access to apis for their own catalogs or discovery systems. student suggestions for extending the camp by a day, and their ideas for how to structure it for student success, were very specific and actionable and provided excellent guidance. one of their ideas was to develop tutorials and templates that could be introduced as a pre-camp meeting. this would not add too much prep time. another idea for a future camp would be to develop a specific theme for teams, which would allow for more documentation of and practice with specific apis. information technology and libraries | september 2014 51 the low attendance was a concern, so for the next camp twice the number of desired participants will be invited to ensure both a variety of coding skills and interests as well as opportunities for more teams to be formed. additionally, partnerships with student coding groups or related classes should help to drive up attendance. the biggest difficulty moving forward will be developing campus standards for data that can be made available to students about resources, spaces, and services. as noted above, students typically do not design a “library app,” rather they look to build a “student app” that pulls in a variety of data from across campus. functions of apps are therefore more oriented toward common student activities like studying, socializing, and learning. a related challenge will be to provide adequate format and delivery mechanisms for access to supporting data feeds. cognizant of the silo issue, noted above, as libraries present their own data for student consumption, these tendencies towards a unified view need to be taken into account. completion of an assignment is more than identifying three scholarly sources; it might involve identifying a space to do the research, locating peers or mentors for either the research or writing process, locating suitable technology to complete an assignment, and a variety of other needs. the features and information presented on a library’s website should be designed as modular building blocks that can fit into other campus services in a similar way to how course reserves are sometimes presented in campus learning management services alongside syllabi and assignments. separating library content (e.g., full-text articles, room information, research services) from library process can help with freeing information about what libraries have to offer and can facilitate broader discovery of services and resources at point of need. key to this process is recognizing the student desire to shape the resources they need into a comprehensible format that matches their workflow rather than forcing students to learn a specific, isolated, and inflexible path for each part of the projects they work on. this study has shown that a collaborative process in technology design can yield insights into students’ conceptual models about how spaces, resources, and services can be implemented. while the traditional model of service development often leaves these considerations until the very end in a summative assessment of service, the coding camp and collaborative methods presented here provide librarians a new tool for adding depth to service design and implementation, ultimately resulting in services and platforms that are informed by a more wellrounded and deeper understanding of the student mobile-use experience. 
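a minimal sketch, assuming hypothetical endpoints, of the "modular building block" idea discussed above: a small lookup function whose data source can be swapped between a local catalog api and a third-party bibliographic placeholder (in the spirit of the goodreads suggestion) without changing the code that renders results. neither url nor response shape comes from the article, and like the previous sketch it assumes jquery is loaded.

```javascript
// sketch of a swappable data source behind one ui-facing function; both
// endpoints and the response shape are hypothetical, not from the article.
function searchBooks(query, backend) {
  var endpoints = {
    catalog: "https://catalog.example.edu/api/search?q=",   // assumed local api
    placeholder: "https://books.example.com/api/search?q="  // assumed third party
  };
  return $.getJSON(endpoints[backend] + encodeURIComponent(query))
    .then(function (data) {
      // normalize whatever the backend returns into one simple shape
      return (data.results || []).map(function (item) {
        return { title: item.title, author: item.author, available: !!item.available };
      });
    });
}

// the rendering code only ever sees the normalized records
searchBooks("data structures", "placeholder").then(function (books) {
  books.forEach(function (b) {
    console.log(b.title + " / " + b.author + (b.available ? " (available)" : ""));
  });
});
```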
in that regard, the initial research questions that framed this study could also be used by other libraries as they explore the library and course-related needs that students could benefit from through the development of mobile applications, as well as to determine whether their library provides access to data that is relevant to those needs. the results from this study have affirmed that, at least for the library in this study, the coding camp method is viable for generating usable mobile app prototypes. it also affirmed that, by directly involving students during the formative stages of mobile app conception and design, the resulting apps more accurately reflect real-life student needs.
references
1. council on library and information resources (clir), participatory design in academic libraries: methods, findings, and implementations (washington, dc: clir, 2012), http://www.clir.org/pubs/reports/pub155/pub155.pdf.
2. "hackathon," wikipedia, 2014, http://en.wikipedia.org/wiki/hackathon.
3. david ward, james hahn, and lori mestre, "designing mobile technology to enhance library space use: findings from an undergraduate student competition," journal of learning spaces (forthcoming).
book reviews
theory and application of information research. edited by ole harbo and leif kajberg. london: mansell publishing, 1980. 235p. £16.00. isbn: 0-7201-1513-2.
this book reproduces twenty-one papers presented at the second international research forum on information science, which was held at the royal school of librarianship in copenhagen during august of 1977. the title of this work may be misleading, since the majority of the papers could better be described as the foundations of information science. the papers that advanced the theory of information science were the exception, and the contributions dealing with practical applications were even rarer. the contributors included many familiar names: kathleen t. bivins, anthony debons, william goffman, manfred kochen, allan d. pratt, and hans h. wellisch from the united states; nicholas j. belkin, j. m. brittain, b. c. brookes, robert a. fairthorne, j.-m. griffiths, m. h. heine, s. e. robertson, b. c. vickery, and t. d. wilson from the united kingdom; and many names from europe that may be less familiar on this side of the atlantic. the forum was organized into five sessions: general models of information science, information science in relation to other scientific disciplines, measurement, the information retrieval process, and the future tasks of information scientists in europe. within the book, the distinction between these sessions generally is not obvious. appendixes give the forum program, summarize the discussions of the papers, and report on group discussions. in the introduction, it was stated that it was hoped that the forum would bridge the gap between theory and research on one side and practice on the other. the book does not fulfill this hope, but it does present a good collection of papers dealing with a variety of aspects of information science. the view that the main problems of information science are cognitive rather than technical is evident in many of the papers. however, bradford's law, shannon's theory, and the epidemic model are addressed in several of the papers.
with a few exceptions, the papers are quite readable and do not require a mathematical background to be understood and appreciated. the summaries and group discussions are disappointing, possibly because several of the authors were unable to attend the forum. kathleen bivius was the only american contributor present. there is no index, although one would have been helpful. the book is valuable and should be part of any library collection covering information science. anyone interested in information science should be able to find several highly relevant papers. however, only a limited number of scholars will find it necessary to read the entire work.-edward t. o'neill, matthew a. baxter school of information and library science, case western reserve university, cleveland, ohio. personal documentation for professionals-means and methods, by v. stibic. amsterdam: north-holland pub!. co., 1980. 214p. $29.25 (dfl 60.00). isbn: 0-444-85480-0. while there have been many a number of books written on the design, development, and use of large-scale database systems, there have been few that focus on the control of one's own personal collect ion of reprints, memoranda, reports, drafts, slides, and related miscellanea, which accumulate so rapidly in any professional " information -handler's" office. stibic's book addresses this problem in a thoroughly professional and competent manner. his first two chapters introduce the general nature of the problem, and discuss professionals' information needs and sources. the third, "document description," covers the record structure, abstracting, subject descriptions, keywords and classification methods, and their various combinations. the fourth chapter details the various technical means for storage of original documents, microfilm, and such control meobanisms as card indexes, peek-a-boo cards, and computer-supported indexes. all of these chapters draw on the experience and practices familiar to users of large-scale systems. stibic recommends the use of iso and other standardized practices, and endeavors to emphasize the need for constructing one's own system in accord with generally accepted design principles. stibic is careful to point out, however, that if one is in fact designing a personal documentation system, then personal idiosyncrasies and preferences can be built into it. it is not necessary to use an established and standardized vocabulary or classification system without modification. one may alter it to suit one's own purposes. however, the structure of the system (whether descriptors, classification numbers, or other means} must be controlled; otherwise the system will become useless. the next four chapters are case studies of different systems. the first is a card index technique used by an individual. the second describes a computerized index to support the documentation needs of a project team. (essentially an augmented kwic index, published quarterly.) the third case study is one of particular interest to many professionals at the moment-the use of a personal computer as an indexing control system. the system, though not explicitly identified, is roughly comparable to many of those available in the u.s.; a microcomputer with 64k ram, a display of 80x24 lines, two floppy disks with 512k bytes/disk, and an socharacter-per-line printer. the indexing is done via a faceted classification system of about 250 terms, which are hierarchically linked, providing automatic up-posting from specific to generic terms. 
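stibic's own implementation is not given in the review, so the following is only a sketch of what "automatic up-posting from specific to generic terms" means in practice: assigning a narrow term to a document also posts every broader term above it, so a search on the generic term still retrieves the document. the tiny vocabulary and document id are invented for illustration.

```javascript
// sketch (not stibic's code): each term points at its broader term, and indexing
// a document under a specific term automatically posts the generic terms above it.
var broaderTerm = {
  "floppy disks": "storage devices",
  "storage devices": "computer hardware",
  "microcomputers": "computer hardware"
};

// walk up the hierarchy from a specific term to the top
function upPost(term) {
  var posted = [];
  for (var t = term; t; t = broaderTerm[t]) {
    posted.push(t);
  }
  return posted; // e.g. ["floppy disks", "storage devices", "computer hardware"]
}

var index = {}; // term -> list of document ids

function indexDocument(docId, terms) {
  terms.forEach(function (term) {
    upPost(term).forEach(function (t) {
      (index[t] = index[t] || []).push(docId);
    });
  });
}

indexDocument("doc-042", ["floppy disks"]);
console.log(index["computer hardware"]); // ["doc-042"], found via up-posting
```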
a hashcoding technique is used to minimize the storage space required on the disk, and searching is performed by simple serial book reviews 125 searching of the index records. the fourth case study is an examination of the upgrading of the manual card index described in the first study to a system supported by a large main-frame computer, using a terminal in the professional's office. a combination of automatic keyword extraction and manual classincation is used for indexing. complex boolean searches are possible with this system. stibic concludes with a chapter on future prospects, touching briefly on such things as internal and public viewdata/teletext systems. he also provides a checklist of desirable features of "a multi-purpose personal work station." such a station is not merely a special-purpose device used to aid in some parts of one's work, such as retrieval, but is an integral part of all of one's work; computer, calculator, textprocessor, mail-dispatch system, calendar, in/out box, and so forth. the author, a scientist of long standing with philips in holland, has provided a valuable guide to this area. there are two relatively minor points of criticism, however. whether it was the author's or the publisher's choice is not clear, but there is an excessive use of italics throughout the text. this lavish use seems more appropriate to teenagers' romantic novels than to a serious work. in this case, it is more distracting than helpful. secondly, but more understandably, the extensive references stibic gives are frequently to documents not easily available in the u.s. some are oecd papers, some refer to the german din standards, and some to internal philips technical reports. these are minor points, however, regarding an excellent book. it is recommended not only for the information professional, but for anyone who is seriously concerned with the problem of keeping track of what one needs to know.-allan d. pratt, university of arizona graduate library school, tucson. viewdata revolution, by sam fedida and rex mahle a halsted press book. new york: wiley, 1979. l86p. $34.95. lc: 7923869. isbn: 0-470-26879-4. sam fedida is the inventor of prestel, 126 journal of library automation vol. 14/2 june 1981 the british post office's viewdata system . with this as his license, he and rex malik have written a 186-page volume explaining the prestel system. prestel is a series of databases, which are accessed by a keypad similar to a calculator. the common television takes on the characteristic of a crt for viewin·g alphabetical and numerical information. the connection to the computer is by telephone, and, in britain, the post office is in charge of the telephones . overall, in spite of several printing errors, this book does provide information about the system. the authors explain the types of information that will be available on the pres tel system, such as "buying a car," "houses for sale," "entertainment," "education," "an evening out," and "news . " they have also devoted individual chapters to electronic mail, electronic funds transfer, and education, explaining how each works in the system. the authors stress the benefits and attributes of their system almost to the point of redundancy . in each of the chapters, the manner in which the information is going to be accessed is repeated. despite the repetition, the primary focus is what prestel will do for the betterment of mankind. the uniqueness of prestel is the simplicity of its access process. 
according to the authors, being able to access the information in one's own home will make prestel a major tool for dissemination of information for many agencies and businesses. at times, the "hard sell" is very obvious throughout the volume. however, the diagrams are good and help to explain the authors' points. the problems fedida and malik anticipate in the electronic mail and protocols are realistic. in the chapters "future i" and "future ii," the authors go off on a tangent, using a time line, on what they see in the future. again, it is basically a repetition of what was said in the previous chapters, only from a futuristic point of view. here, the reader gets a distinct feeling of what is really bothering them now in the system; that is, government bureaucracy . they cite the different groups trying to control the information by means of legislation. they delve into the problem of uniformity of standards. television is an example . what will be standard for convertors and adapters for the computer hookup? this is a real problem that was well explored throughout the work. this volume is good for librarians who are interested in cable, telecommunications, and computers . however, be aware of its poor organization. there are numerous printing errors that affect its readability. nevertheless, if a person can wade through these errors and the repetition of ideas, he/she can obtain some useful information from this text. there is a distinct feeling throughout this work that it was put together hastily . nonetheless, there is a dearth of information on this subject, and this book will serve some useful purpose for libraries .-robert miller, memphis/shelby county public library and information center, memphis, tennessee . ala filing rules. filing committee, resources and technical services division, american library association . chicago: american library assn., 1980. 50p. $3.50. lc: 80-22186. isbn: 0-8389-3255-x. library of congress filing rules. prepared by john c . rather and susan c. biebel. washington, d. c.: library of congress, 1980. ll1p . $5. lc: 80-607944 isbn: 0-8444-0347-4. available from customer services section, cataloging distribution service, library of congress, washington, dc 20541. these two works represent the culmination of over a decade of effort within the library profession to overhaul the techniques by which entries are arranged to form catalogs. the impetus for this work came from recognition that computer technology would soon be enlisted to perform the arrangement of entries for the production of catalogs, and that filing rules current at the time would be impossible to implement in their entirety on the computer. although the original intention was to develop rules appropriate for the arrangement of entries by computer, those at the library of congress and the ala committee working on the problem soon realized that, from the point of view of catalog users, it would be very undesirable to have different sets of filing rules in operation depending on the physical medium of the catalog. therefore, the scope of the effort was broadened to rules that could be applied both manually and by machine using headings that were formulated according to more than one set of cataloging rules. now that we have these new rules, the question arises whether they are better than what preceded them . the criteria for "better" ought to be whether the rules make entries easier to find both for known-item searches and browsing within the complex device called a library catalog. 
or to state the same criteria negatively: it should be more difficult to lose an entry in the catalog if it has been filed according to the rules . the evaluation of these rules against other possible approaches to catalog arrangement ought to be centered on observation of the needs of a variety of both experienced and unsophisticated catalog users and on measurement of the effectiveness of the alternative approaches to meet these needs. the complex problems of filing clearly exemplify the need for research as recently expressed by herb white in his columns in american libraries. lacking any empirical data on which to base an evaluation, we must rely on our professional judgment and personal biases to argue the case for the new rules. to this reviewer, it seems that common sense supports a set of rules that are simple, consistent, and easy to explain to library users. the need for simplicity and consistency directly implies the "file-as-is" principle (i.e., file exactly as the heading is visually constructed, not by some interpretation of it), which should be applied even at the cost of having to search in more than one place in the arrangement; e .g., numeric digits and numeric words , mac and me, muller and mueller. the file-as-is principle has been more consistently applied in the ala rules than the lc rules, the latter undoubtedly a result of the anticipated complexity and size book reviews 127 of lc's catalogs, although there is no justification argued for these departures from the basic principle. of specific interest to readers of the journal is whether these rules can be implemented for computer sorting of catalog entries . do the rules succeed in meeting their original objective? the ala rules certainly appear to be amenable to very straightforward systems analysis and programming. for this the committee and its chairperson, joe rosenthal, need to be commended. from some sources there are already claims of systems that fully implement the new ala rules, which certainly could be the case . however, it would be interesting to know how these systems deal with the follow ing, which seem to be potentially troublesome: • the lack of consistent support in the marc format for handling initial articles when the rules call for ignoring initial articles in corporate names other than personal or place names, title subheadings ($t subfield), and subject headings. the english articles obviously present no problem, but the table of articles in appendix 2 shows more than thirty words that can be both an article and the cardinal numeral l. in addition, the footnote , "in h awaiian, the '0 emphatic' must be carefully distinguished from the preposition 0, but 0 also serves. the h awaiian language as a noun and a verb (each with several meanings), an adverb, and a conjunction," must surely give pause to the diligent systems designer. the recent library of congress practice of dropping nonfiling initial articles from heading fields still does not solve the problem of initial articles in the several million marc records that already exist in library catalogs . • the requirement that roman numerals be filed numerically presents an opportunity to construct an interesting but not overly complex algorithm . however, although the marc format makes the identification of roman numerals in heading fields fairly straightforward (the $b subfield), the identification of roman numerals embedded in a long title is much more ambiguous . for example, does iv mean "4" or "intravenous"? 
128 journal of library automation vol. 14/2 june 1981 • the rules require that punctuation in an arabic numeral that is included to increase its readability is to be ignored in filing, but decimal points are significant in determining the numeric value of the number (i.e . , .003 files before 1) . how does one specify an algorithm to deal with the title, "5.000 kilometres dans le sud"? using european practice, this number is obviously 5,000, but why not 5 according to the computer algorithm? • the special rule for nonroman alphabets (rule 7) is interesting: "if, in the arrangement of bibliographic records, it is necessary to distinguish access points containing characters in different nonroman alphabets, scripts and syllabaries (cf. rule 1, order of characters) the following order of precedence is used. . .. " there follows a table beginning with amharic and ending in tibetan. that is the entire rule. systems designers who have implemented this rule clearly have transcendent skills! reliance on the marc language code in the 008 field has both theoretical and practical problems. • the introductory text advises libraries to include in the file information notes and references that explain filing practices to catalog users . however, the rules do not specify where these references are to file in relation to other headings. admonishment to provide these at "appropriate points" is not much help. • the ampersand is ignored in filing (for which we should be grateful) . but, by including the optional rule 1.3, which allows filing the ampersand "as its spelled-out language equivalent," the ala committee has put systems designers in the position of having to explain why this rule cannot be implemented on the computer-at least not until the marc format includes a code for language of the field (not a likely development, and even then not all ambiguity would be eliminated). interestingly, the library of congress treats all ampersands as a character filing between blank and the letter a . • the optional rule 9.1, which allows the inclusion of "the role of a person or a corporate body in a legal action in arranging access points," presents a problem when the rule requires suppression of all other relators . how is the computer programmed to recognize a legal action? is there a finite list of such relator words? differences between aacr2 and previous cataloging practices further complicate the use of this option . admittedly, many of these problems are marginal in terms of the number of entries in a catalog affected, but to a systems designer, even though there is only one instance, it must be accounted for in the computer programs if the system can claim a "full" implementation of the rules. clearly, full implementation will require some changes in the marc format before all rules can be applied absolutely consistently and unambiguously . the library of congress rules, although applying similar principles, depart significantly from the ala rules in detail and complexity. a full analysis of the implementation problems would require much more space than this review will allow. suffice it to say that although the library's libsked program has been under development for twelve years, and its strengths and limitations have undoubtedly influenced the development of these filing rules, there are elements in these rules that have not yet been implemented in libsked, and several where no one has yet figured out how to do it. 
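to make the roman-numeral point in the list above more concrete, here is a hedged sketch of how a filing program might turn a clearly tagged numeral (for example, a marc $b numeration such as "xiv") into a numeric sort key so that "ii" files before "xiv" in a plain string comparison. it deliberately handles only tagged numerals; deciding whether "iv" buried in a title means 4 or "intravenous" is exactly the ambiguity the reviewer flags, and nothing here resolves it.

```javascript
// hedged sketch of a numeric filing key for a clearly tagged roman numeral
// (e.g. a marc $b numeration); ambiguous numerals inside titles are not handled.
var romanValues = { i: 1, v: 5, x: 10, l: 50, c: 100, d: 500, m: 1000 };

function romanToInteger(numeral) {
  var chars = numeral.toLowerCase().split("");
  var total = 0;
  for (var k = 0; k < chars.length; k++) {
    var value = romanValues[chars[k]];
    if (value === undefined) return null; // not a roman numeral
    var next = romanValues[chars[k + 1]];
    if (next !== undefined && value < next) {
      total += next - value; // subtractive pair such as iv, ix, xl
      k++;
    } else {
      total += value;
    }
  }
  return total;
}

// zero-pad so that a plain character-by-character sort orders keys numerically
function filingKey(subfieldB) {
  var n = romanToInteger(subfieldB);
  if (n === null) return subfieldB;
  var key = String(n);
  while (key.length < 6) key = "0" + key;
  return key;
}

console.log(filingKey("ii"));  // "000002"
console.log(filingKey("xiv")); // "000014"
```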
although the work on these rules is complete, there are two more projects the profession should undertake that would be most useful for those concerned with catalog development. in both sets of rules, there is mention in the introduction of the need for a brief version of the essential rules, which could be handed out to catalog users. why did the committee not develop such a brief guide and include it as an appendix to the rules? those of us who work on computers are familiar with the reference cards for programming languages put out by computer manufacturers. a similar format for the filing rules would be very useful. another more difficult but equally useful project would be the publication of a standard design implementation of the ala filing rules expressed in terms of the marc format. such a design would include the marc fields and subfields necessary for each possible entry from a bibliographic record and a description of any special processing required for particular data elements. the design would be expressed at a level that is independent of programming languages and computer hardware. we need a standard reference that translates the filing rules into the language of the marc format. the ala rules, in some tantalizingly brief instances, begin this process. both sets of filing rules are significant improvements over those previously available to systems analysts. reference librarians should find these rules easy to explain to beleaguered catalog users. for their simplicity and relatively slight departure from the "file-as-is" principle, the ala rules are to be recommended. the library of congress rules, in their attempt to retain the classificatory structures that support the browsing user, further complicate the task of the user performing a known-item search. library research has indicated that the preponderance of catalog searches in research libraries are known-item searches.-john f. knapp, ringgold management systems, beaverton, oregon.
highlights of lita board meetings
the highlights of lita board meetings are published here to inform division members of the activities of their board. the highlights are not the official minutes of the meetings. 1981 ala midwinter meeting, washington, d.c. first session, february 1, 1981. the meeting was called to order by s. michael malinconico, president. the following board members were present: s. michael malinconico, barbara evans markuson, brigitte l. kenney, nancy l. eaton, kenneth j. bierman, ronald f. miller, bonnie k. juergens, marilyn j. rehnberg, heike kordish, and donald p. hammer, lita executive director. staff: laura stewart. the minutes of the 1980 annual meetings were approved and adopted with the correction that brigitte kenney be reported as present at the wednesday, july 2, 1980, meeting. marbi committee report (report by eleanor montague). montague reported that the marbi committee is continuing its work, and that the members do not feel that the value of their work has been lessened by the new arrangement with the library of congress.
the committee has discussed changing its mode of operations by introducing teleconferencing and by establishing a steering committee, but these things may be in the future . board discussion took place on the value of ala input to the marc format and whether or not lita should support a representative to the two lcsponsored meetings. montague requested a budget of $2200 to support that representative and the board decided to vote on that matter when it considers the 1981182 lit a budget later in the week. new lita button and new avs brochure (report by donald hammer). the new lita button, "litaship is for everyone," was introduced, and copies of the audiovisual section's membership recruitment brochure "who says ala doesn't do anything about a v?" were distributed to the board members . joint litairtsd board meeting. malinconico announced and discussed the joint lita/rtsd board meeting to take place later in the week. he pointed but that there are many areas of joint interest and many activities the two divisions could cooperate in . he mentioned specifically discussion groups, highlights of meetings 131 cosponsorship of programs, z39 representation, problems concerning ala policies , the coming five-year review of the isbd, and other things. telecommunications committee report (report by joan maier). the board was brought up to date by maier on the preconference the telecommunications committee plans to sponsor at san francisco. it will be concerned with the "office in the home" concept and the support the library should provide to that "electronic cottage" mode of operating. the second day will consist of a tour of "silicon valley's" mission college where that college will demonstrate its new approach to education and its use of automation . additionally, the silicon valley electronic manufacturers will demonstrate their technology . joyce capell, who represented mission college, gave the board information about the college and the potential exhibitors from silicon valley. vacancy in division councilor position. hammer reported that the request was made to the ala bylaws committee to ask council to change the ala bylaws to allow for an alternate councilor to be elected by each division, and for that alternate councilor to have the vote if a division's councilor cannot complete the term of office. lit a will elect an alternate councilor in the coming ala election and it is expected that the ala bylaws committee will present their proposal to council this week. proposed increase in ala overhead charges . hammer reported that the ala controller has proposed that ala raise its overhead charges from 13 percent to 16.5 percent. this is the ala charge against institutes , preconferences, and other special activities. · the board decided to establish a task force to determine ho~ these overhead · charges are arrived at and exactly what items are included in them . . the task force will consist of ronald f . miller, chairperson; barbara markuson, bonnie juergens, and donald hammer, resour~e person. the following motion was made by ronald miller, seconded by kenneth bierman , and passed: that a task force be formed to obtain additional information about overhead charges which are assessed the division . 
toward that end, the task force will accomplish the following : (1 ) describe in writing the steps required for determination and approval and adoption of an overhead rate; (2) define the component costs included in the overhead rate ; (3) suggest services which overhead covers which might be contracted for in other ways . the members of the task force are : ronald f. miller, chairman barbara e. markuson bonnie k. juergens donald p. hammer, resource person the dates for accomplishment of the three items are: (1 ) may 1, 1981 (2) june 1, 1981 (3) ala annual meeting 132 journal of library automation vol. 14/2 june 1981 report on free ]olas. to date, twenty-six requests have been received from lita members for free copies of back issues of jola. this offer was approved by the board at the last annual conference as a means to reduce the supply of back issues of jola. it was suggested that new members of lita should be notified that these issues are available. report on funds allocated for san francisco programs . the ala conference program committee allocated to the lit a units the following funds for programs at the san francisco annual conference . isas/tesla "technical standards: the good, the bad, the missing" vccs "use of video by and for the deaf" vccs "viewdata-the electronic delivery of information" end of first session. second session february 2, 1981 $100.00 $350.00 $700.00 the meeting was called to order by s. michael malinconico, president . the following board members were present: s. michael malinconico, brigitte l. kenney, barbara e. markuson, nancy l. eaton, kenneth j. bierman, ronald f. miller, bonnie k. juergens, marilyn j. rehnberg, heike kordish, and donald p. hammer, lita executive director. staff: laura stewart . lita standards committee. a problem has arisen concerning an overall standards committee in lita in that those seeking information about technical standards have no one or no unit within lita to contact except tesla, which is concerned only with computer and data processing standards. an example of the situation is that of steve salmon who was appointed liaison to lita from the ala standards committee . he can only contact tesla and has nowhere to go concerning standards in any of the other areas of interest to lita. there is also no unit in lita empowered to establish standards policy for the entire division. after discussion, the board asked the lita executive director to contact mr. salmon and discuss the matter with him to determine what, if any, problems he felt the present arrangement made for him . the board will later reconsider the matter. rtsd catalog form, function, and use committee. this is a committee rtsd is proposing that would be an interdivisional committee concerned with the evolving and the proliferation of library catalogs and with development of programs and workshops " to inform and develop professional thinking on the form, function, and use of library catalogs." it was decided to bring the matter up at the lita/rtsd joint board meeting and to ask for additional information at that time. lita legislation and regulation committee (report by judith sessions). the legislation and regulation committee has made arrangements to hold a reception in the russell senate office building at which librarians will be able to meet their legislator and/or the legislators' staff members. the rehighlights of meetings 133 sponse to the invitations has been excellent as about one hundred rsvps have been received from legislators and their staff members. 
a report was given on the revision of the communications act of 1934 and the provisions that librarians should be working to have included. the copyrigh~, law was also discussed, especially the lack of a clear definition for "fair use. information bill of rights. about a year ago the information industry association compiled and published a statement called the "information bill of rights." the lita executive director brought the statement to the attention of the board because it was felt that the statement was written from the aspect of the profit-making organization only and perhaps should be broadened . the board decided that this was not in its province and asked the lita executive director to forward the matter to the ala office for intellectual freedom for any action they feel is warranted. marc users & library automation discussion groups. the marc users discussion group has decided that it would like to merge with the library automation discussion group (formerly cola) but would like to retain the four-hour time slot it has· had for many years. a motion was made by kenney, seconded by ron miller, and passed: that the lita board permit the merger of the marc users discussion group (mudg) and the library automation discussion group (ladg) and that they be called library and information technology discussion group. a motion was made by kenney, seconded by juergens, and passed: that the discussion groups (mudg & ladg) after they merge retain the four-hour time slot for the combined new group. a motion was made by juergens, seconded, and passed: that the chair of the library and information technology discussion group be instructed to contact the lit a program planning committee chair for coordination of discussion topics prior to each litdg meeting. a motion was made by kenney, seconded by ron miller, and passed: that the library and information technology discussion group elect a deputy chair to assist the chair from now on. end of second session. third session february 2, 1981 the meeting was called to order by s. michael malinconico, president. the following board members were present: s. michael malinconico, brigitte l. kenney, barbara e. markuson, nancy l. eaton, kenneth j. bierman, angie w. leclercq, helen cyr, bonnie k. juergens, marilyn j. rehnberg, heike kordish, charles husbands, and donald p . hammer, lita executive director. staff: laura stewart. apple education foundation grants. brigitte kenney reported that the apple foundation had been flooded with grant requests and that they 134 journal of library automation vol. 14/2 june 1981 have decided to restrict their grants to software development only. president malinconico asked kenney to determine exactly what the limitations are before the board considers the matter further. honoraria paid to lita speakers. the question raised was whether or not the people on the lit a board -of directors or any of the lit a program planning committees should be paid honoraria when they serve as speakers at lita institutes. a motion was made by juergens, seconded by helen cyr, and passed: that lita will not pay honoraria to lita board members or lita members of program committees for participation in institute programs . (this will take effect after end of next annual conference.) (this will take effect immediately for board members.) ala survey of priorities of membership (report by ron miller). 
five priorities the ala membership priorities committee has determined are access to information, legislation and funding, intellectual freedom, public awareness, and professional and staff development. some board members expressed surprise that some of the areas of concern to the white house conference were not included as ala priorities and that one of the expressed priorities (legislation) is only a means to an end. malinconico asked the board members to send their comments by april 1 to miller, who will then distribute a proposed amendment to the board. a task force of three, barbara markuson, brigitte kenney, and ron miller, was appointed by consensus to write the proposed amendment. it was suggested by juergens that the lita statement be published in american libraries as a letter to the editor in the same issue that the proposed ala priorities are published. lita teleconferencing system. a representative, john sehnert, from the source, gave a presentation of that system to the board. after a long discussion about the capabilities of such systems and the needs of the board, it was decided that by march 1 the recommendations for a pilot project would be provided, by march 15 the system should be operational, by april 1 a set of criteria should be made up to evaluate the project, and by the next board meeting (in san francisco) an evaluation should be held with a decision made as to the permanency of the system. it was suggested by barbara markuson that the "electronic mail" system should be demonstrated during the lit a president's program and input should be sought from the members as to what their needs are along this line. program planning committee (report by kaye capen). the committee is in the midst of a transition of the chair. sue tyner will be the new chairperson. capen discussed the joint rtsd/lita/rasd preconference on online catalogs to be held in philadelphia. national conference plans. berna heyman reported on plans for the lita national conference planned for baltimore in the spring of 1983. if for highlights of meetings 135 any reason it cannot be held in the spring of 1983, she stated that the fall of 1984 would be their second choice. the maryland library association would be interested in cosponsorship. the committee is considering asking for help from the council on state governments. the conference format would include exhibits, workshops prior to the conference, invited papers, contributed papers, a poster session, and panel sessions. a survey of the lita members is being considered in order to get ideas on subjects of interest. discussion followed, but no action was necessary. end of third session. joint lita/rtsd board meeting february 2, 1981 introductions. both boards, guests, and staff introduced themselves. agenda . there was no set agenda. karen horny, rtsd president, suggested that one topic that might be discussed or at least recognized is that both lita and rtsd have retrospective conversion discussion groups . both appeared to have different focuses on their discussion of retrospective conversion. background . michael malinconico, lita president, gave background on the reason for the joint board meeting. there has been created an uneasy sort of division between technology, application of technology, and technical services systems. this uneasy division is thinking of technology as the form in which library services are delivered, thinking of the technical-services interests as reflected in rtsd as concerned primarily with the content of that service. 
the distinction between form and content obviously falls apart very rapidly. in previous lita discussions, barbara markuson p"ointed out that there are perhaps three stages of implementation of technology . in the first phase, there is exploration of the potential of technology. that is the domain of lita. in the second phase, there is an implementation and a certain amount of acclimatization that is ne<;essary. this is the gray area. the third phase is where the technology becomes integrated into the operation of a library. this is the concern of the traditional technical services. the gray area needs to be addressed. with automated cataloging systems in particular, they are beginning to mature, and it is no longer clear who should be concerned and addressing the problems. thus there is overlap. we need to meet together to consider ways to make more efficient use of time that is expended at ala meetings. currently there are a number of joint ventures: cosponsorship of the catalogs preconference, 1982; the establishment of a joint committee on catalog form, function, and use; cosponsorship of a program in san francisco on union lists of serials. three things should be considered: l. how to organize to take joint action on matters that concern us mutually. 2. think of the joint programs as pilot ventures and attempt to set up a structure that can be used for future joint ventures. 136 journal of library automation vol. 14/2 june 1981 3. consider what other projects we might want to do jointly. bill gosling, rtsd past president, stated that one of the points in terms of overlap is the factor of growth ofboth divisions . the factor of growth is related to two things: (1) interest and (2) the desire or need to have official affiliation with the association . this is not unique to rtsd and lita. ala, as well as the divisions, is growing; more and more people are involved and want to be involved. michael malinconico stated that we should let the growth be a result of conscious action, it should not be something that happens without our conscious intent or control. it may be that there are instances where overlap is necessary and desirable. let the overlap occur as something done by intention. norman dudley stated that ala does have mechanisms for resolving overlap, which we are just beginning to use . we can never identify the gray areas because of the very nature of the technology. every new application of technology presents us with new or possible gray areas . what is needed is the sensitivity, willingness, and ability to approach the other unit and ask for cosponsorship. michael malinconico stated that the divisions had experimented with liaisons to their boards. the meetings often conflict, so this seems an enormously inefficient method of communication. another example of perhaps peripheral interest to lita is the isbd fiveyear program. there might be some value in having a joint review of the isbds . arnold hirshon suggested that the division executive directors exchange minutes or summary board minutes . at this time rtsd does not do summary minutes. the rtsd newsletter reports rtsd board action as well as section and committee reports, however. bill gosling, rtsd past president, stated that when talking about units of a division, even as an officer, it is difficult to ensure information communication . an orientation session is very important. if two or three people miss this, the information has to be picked up by sitting in meetings. 
for programming, a mechanism to be used is a screening for all programs . the planners have to include what affiliation is appropriate and what contact has been made. perhaps , to return to mike malinconico's point about structure, we should charge our organization committees, who review recommendations for new committees, also to look at possible affiliation. this happened with the catalog form, function, and use committee. it was suggested that rtsd and lita ought to exchange representatives to the organization and bylaws committees. michael malinconico suggested expanding this to exchanging representatives to the division level programming committees . rtsd does not have one as yet. bill gosling agreed that when the structure becomes defined, this is another area of exchange . michael malinconico suggested two other areas that lita and rtsd could explore--the proposed increase in the ala overhead rate for workshops, institutes, preconferences, etc., and the difficulty of getting publicity for forthcoming programs in american libraries. lita has formed a task force to look into the proposed change in the overhead rate to look at what ala central provides for the overhead charges and to identify those things that might be more economical to contract for separately. highlights of meetings 137 this is perhaps another area for cooperation. the sense of the lita board was that they would like rtsd participation in the task force. nancy williamson agreed to sit in on the task force as an rtsd observer. the task force's function statement has three aspects: 1. to identify the procedural steps that a dues increase would have to go through and how to effect those steps. 2. determine what it is we get for the overhead we pay. 3. determine those things that we get that might more economically be contracted for separately. student dues and graduated dues. the lit a board acted in support of student dues. rtsd had a concern about the impact of student dues on publications as the current $7.50 fee from the $15.00 membership fee does not cover the cost of publications. rtsd on the proposed graduated dues structure for new members, felt that it was difficult to assess the impact of new division members until they saw the effects on the ala general membership. the lita board was in favor of the graduated dues structure for new division members. interdivisional committee on catalog form, function, and use. this committee would replace the book catalog committee. currently rtsd is receiving responses from other divisions on their interest in forming such an interdivisional committee . ala/coo would have to look at this committee. michael malinconico stated that the formation of this committee would be one way of addressing some of rtsd and lita's mutual concerns. on-line preconference. the division executive directors were charged with writing an agreement on the responsibilities of each division with respect to this program and then circulate it to the respective boards. the joint board meeting was adjourned at 5:42 p.m. fourth session february 3, 1981 the meeting was called to order by s. michael malinconico, president. the following board members were present : s. michael malinconico, brigitte l. kenney, barbara evans markuson, nancy l. eaton, kenneth j. bierman, angie w. leclercq, helen cyr, bonnie k. juergens, marilyn j. rehnberg, heike kordish, charles husbands, donald p. hammer, lita executive director. staff: laura stewart. discussion of marbi committee request for funds. 
a discussion of the marbi committee and lita representation at its meetings took place. the following motion was made by barbara e. markuson, seconded by bonnie k. juergens, and passed : the lit a board approves the expenditure of up to $2200 to cover expenses for one lita representative at the two 1981 marbi meetings held 138 journal of library automation vol. 14/2 june 1981 outside the two annual ala meetings . this matter will be reviewed again at the next midwinter meeting. (amended by s. michael malinconico, and approved unanimously.) lita's place in standards setting. a long discussion took place on the past contribution of lit a in standards setting and what its position should be now and in the future in the standards field. the place of marbi, tesla (isas), and the isas international mechanization consultation committee was considered. the discussion culminated in a decision to ask the executive director of lita to write a background paper on the history of lita's involvement with all standards activities, including actions with other groups, and what results were achieved . the report is to be available at the next annual conference. national conference report (report by berna heyman) . it was reported that the national conference program committee would ask the ala executive board to approve a conference for lita in the spring of 1983. a discussion took place on the audience at which the conference would be aimed. concern was expressed for the inclusion of beginning-level programs and papers as well as activities for the more knowledgeable. the tutorial approach to all aspects of areas of interest to lita members and others was advocated by several board members. after a discussion on the registration fees and on the individuals who should be present to represent the lita board before the ala executive board, a motion was made by bonnie k. juergens, seconded by brigitte kenney, and passed: the board approves the request of the program planning committee to proceed with current plans to hold a lita conference entitled "informationffechnology : lita brings it all together." such approval includes a vote of appreciation to the committee for the effort that has gone into this plan. end of fourth session. fifth session february 3, 1981 the meeting was called to order by s. michael malinconico, lita president . funds allocated by ala to lita for san francisco programs. hammer reported that $900.00 was allocated by ala to each division to be distributed by their boards for san francisco conference programs. also---$100.00 was given to tesla by ala. also---$350.00 was given to lita vccs "video for the deaf" program. also---$700.00 was given to lita/vccs "viewdata" program; vccs requests at least $300.00 more for this . a motion was made by kenneth j. bierman, seconded by bonnie k. juergens, and passed : that tesla be awarded $550.00 and vccs program planning committee be awarded $350.00 for additional support for their programs for the san highlights of meetings 139 francisco conference. these funds are to come from the "regular conference program funds." bylaws and organization committee (report by heike kordish). no action items to report . program planning committee (report by sue tyner). no report given as sue tyner had just recently become chair. telecommunications committee (report by joan maier). after discussion it was decided to double the number of registrants expected for the "office in the home" preconference from 150 to 300. a revised budget was presented to the board. 
the lit a telecommunications committee would like to publish in the lit a newsletter a listing of electronic mail systems, a listing of paperless information technology consultants, and a dial-order-type services listing. these items have been turned down by the editor of ]ola, but have been accepted by the editor of the lita newsletter . the lita board gave its enthusiastic endorsement. nominating committee . malinconico reported on the 1981 elections slate as follows: vice-president/president-elect: kevin hegarty, carolyn gray director-at-large: hugh atkinson, emma cohn council: bonnie juergens, george abbott, lynne bradley lita section reports: isas (report by bonnie k. juergens). no action items to report. publications committee (report by charles husbands): brian aveney has augmented the jola staff by getting david weisbrod to be book review editor, and tom harnish to be an assistant editor for video communicationswhich we think will bring some new focus to those areas. as the committee has begun to organize the division's publications program, it is questioning whether or not an editorial board and a publications committee are needed . the committee proposes that lita have a publications committee, and that the journal and the newsletter have editorial boards. the committee recommends the chairperson of the publications committee be an ex officio member of each of the editorial boards, and that the chief editor of each of those publications be an ex officio member of the publications committee. the newsletter editorial board would consist only of the staff, i.e ., the chief editor, and the section editors; and the journal editorial board would consist of the chief editor, the various assistant editors, and additional people to serve as a core of reviewers (but not necessarily limited to that function) . the relationship to the lita board is something of a question. the bylaws state that the jola editor is a member, ex officio, of the lita board . the roster shows that charles husbands, as chair of the publications committee, is the ex officio member. there is a question as to whether either needs to be a member of the lita board . the publications committee feels that there should be only one ex officio member on the lita board-the chair of the publications committee-and asks the lita board to resolve this question. 140 journal of library automation vol. 14/2 june 1981 after discussion it was moved by ronald f . miller, seconded by brigitte l. kenney, and passed : that the board officially recognizes and approve s the establishment of two editorial boards; the first for the association's newsletter, the second for its journal. furthermore, the chairperson of the publications committee should appoint a liaison to the lita board for reportorial purposes, and the bylaws shall be amended to delete the journal editor as an ex officio member of the board. the publications committee recommends that the lita budget be published each year in the newsletter. the committee suggests changes in the form of the lita budget that would more accurately and/or more specifically indicate expenditures. the publications committee also suggested that some narrative be included with the budget to explain various aspects of it. no board action was necessary . the suggestion came up, in reference to items for the newsletter, that it might be interesting to try getting the headlines from the newsletter into some kind of electronic distribution. nexis was suggested. 
tom harnish suggested the source as another possibility, keeping in mind legal and copyright considerations . the lita board is considering an electronic mail pilot project, but the lita telecommunications committee is already in the process of setting up such a project of its own at this time . the board asked the newsletter editorial board to draw up a proposal for the lit a board to consider at the san francisco annual conference . american national standards institute committees . hammer brought to the board's attention two recent problems concerning lit a's representation on american national standards institute (ansi) committees . 1. ansi sent lita an invoice for $50.00 for membership in ansi. when it was pointed out to ansi that lita is a division of ala and ala is a member of ansi, the $50.00 charge was dropped. 2. as was reported to the board at the last annual conference meetings, the computer & business equipment manufacturers association (cbema) billed lita for $1,125.00 for a partial-year 1980 membership on x4 ($1 ,500.00 for a full year), and later information revealed that membership on x3 would cost $2,500.00. x3 and x4 have now been combined, but no information has been received on what the dues are for the "new" x3 committee . the problem is that letters to cbema asking what provision has been made for representation from nonprofit users groups are ignored . lita, therefore, no longer has any representation on the computer-standards-setting committees. after discussion , it was suggested that the lita executive director continue to try to communicate with cbema. sponsorship of lita institutes by outside organizations . bonnie juergens, chair of isas, brought up the matter of outside organizations asking to hold lita institutes for their members. the specific incident concerned is that of the law library association's request for sponsoring the "data processing specifications and contracting" workshop as a preconference workshop prior to their conference in june. the board indicated a willingness to allow such arrangements, but felt that lit a should gain some financial return from them. highlights of meetings 141 in this case, the board, by consensus, indicated that the law library association should be asked for 20 percent of the costs (with 15 percent being least acceptable) as remuneration to lita. lita bibliography . juergens brought up the question of continuing the lit a bibliography on library automation . the last one published included the years 1973--1977. she wanted board reaction as to whether or not it is a viable project and whether or not isas should prepare a working plan and a budget to be presented to the board at the next annual conference. the board, by consensus, asked isas to proceed to develop a plan . lita representative to ifla ( international federation of library associations) . ifla representative nominations. kenney presented a statement concerning the need of, and the requirements for, nominees to ifla. her recommendations for nominees were fred kilgore, susan martin, russell shank, and dick degennaro . members of the lita board were invited to submit additional names of possible nominees, especially as there is no limit to the number of nominees. ala operating agreement with divisions . at one of the copes meetings, there emerged a new ala operating agreement for the divisions written by robert wedgeworth, now being discussed by all units . there was a negative reaction to the vagueness of the document as it now stands. 
the president of the board suggested that the board members put their comments in writing to send to him around march 1.

student membership dues proposal. ron miller asked the board if it wanted to reconsider its approval of the reduced student dues proposal in the light of recent discussions and actions in ala council. after discussion, the board confirmed its approval of reduced student dues and also took the position of being in favor of "local," i.e., divisional, control of dues.

ala membership promotion task force (reported by blanche woolls). the membership promotion task force is going to arrange special discounts for members of ala to go to museums and so forth in san francisco. lita might want to mention in the lita newsletter places of interest and things to do that the members might not otherwise know about. more specifically, lita might want to highlight the technology that exists in the san francisco area that lita members might be interested in going to see on their own.

membership committee (blanche woolls). the lita membership committee recommends that lita prepare information for ala members who are not members of lita, to suggest that they should belong to lita by stressing those areas of the division that could attract individual participation in the association, such as the discussion groups and programs. it was moved by brigitte l. kenney, seconded by kenneth j. bierman, and passed: that the lita board authorize up to $700.00 for a mailing to ala members who are not lita members. the membership committee requests support of the lita board for student chapters. though there is only one, university of michigan, there should be a letter sent to welcome the students into lita. woolls offered to write the "greetings" letter. bringing into lita people who are not librarians was presented to the board and discussed. aect is having their national meeting in philadelphia in april. as a member of aect as well as lita, blanche woolls would like authorization to arrange a very small reception at this meeting to attract members to lita. it was moved by kenneth j. bierman, seconded by brigitte l. kenney, and passed: that the lita board authorize up to $300.00 to the membership committee for a reception for the aect national convention, april 5-9. the purpose of this reception is to encourage new members for lita. the membership committee is going to have a microcomputer in the lita booth at ala with a "lita game" on it, telling what lita is all about. they are aiming at zero cost to lita for both the microcomputer and the game.

lita oral history task force. s. michael malinconico suggested that board members read the report on this subject that was made to the board by robert miller.

avs & vccs proposed merger. brigitte l. kenney announced that both the av section and the vccs section have expressed an interest in merging into one section and in expanding the telecommunications interests in lita into another separate section. s. michael malinconico suggested that the av section and vccs meet in san francisco and, in a joint meeting, discuss this matter and see that their memberships are informed of the results of that meeting. end of fifth session.

lita board of directors meetings: record of votes, 1981 midwinter motions (in order of appearance in the "highlights")

board member              1  2  3  4  5  6  7  8  9  10 11 12 13
s. michael malinconico    y  y  y  y  y  y  y  y  y  y  y  y  y
brigitte l. kenney        0  y  y  y  y  y  y  y  y  y  y  y  y
barbara e. markuson       y  y  y  y  y  y  y  y  y  y  y  y  y
nancy l. eaton            y  y  y  y  y  y  y  y  y  y  y  y  y
kenneth j. bierman        y  y  y  y  y  y  y  y  y  y  y  y  y
ronald f. miller          a  y  y  y  y  y  y  y  y  y  y  y  y
angie w. leclercq         0  0  0  0  0  y  y  y  y  y  y  y  y
helen cyr                 0  0  0  0  0  y  y  y  y  y  y  y  y
bonnie k. juergens        y  y  y  y  y  y  y  y  y  y  y  y  y
marilyn j. rehnberg       y  y  y  y  y  y  y  y  y  y  y  y  y

key: y = yes, n = no, a = abstain, 0 = absent

instructions to authors

the journal of library automation welcomes manuscripts related to all aspects of library and information technology. some specific topics of interest are mentioned on the masthead page. feature articles, communications, letters to the editor, and news items are all considered for inclusion in the journal. feature articles are refereed; other items generally are not. all material is edited as necessary for clarity or length. manuscripts must be typewritten and submitted in original and one duplicate. do not use onion skin. all text must be double spaced, including footnotes and references. manuscripts should conform to a manual of style, 12th ed., rev. (chicago: university of chicago press, 1969). illustrations should be prepared carefully as camera-ready copy, neatly drawn in a professional manner on separate sheets of paper. manuscript pages, bibliographic references, tables, and figures should all be numbered consecutively.

feature articles consist of original research, state-of-the-art reviews, or comprehensive and in-depth analyses. they may be from ten to twenty-five pages in length. an abstract of 100 words or less should accompany the article on a separate sheet. headings should be used to identify major sections. authors are encouraged to relate their work to other research in the field and to the larger context of economic, organizational, or management issues surrounding the development, implementation, and use of particular technologies. communications consist of brief research reports, technical findings, and application notes. these may be up to ten pages in length; an abstract need not be included. letters to the editor may offer corrections, clarifications, and additions to previously published material, or may be independent expressions of opinion or fact related to current matters of concern in the interest area of the journal. a letter commenting on an article in the journal is shared with the author, and a response from the author may appear with the letter. letters should be no more than three pages in length. news items may announce publications, conferences, meetings, products, services, or other items of note. these should be limited to two pages in length. book reviews are assigned by the book review editor. readers wishing to review books for the journal are invited to contact the book review editor, indicating their special areas of interest and expertise. names and addresses of the journal editors may be found in paragraph three on the masthead page. in all correspondence please include your own name, institutional affiliation, mailing address, and phone number.

nated, volume-oriented, resource-sharing electronic ordering process. for information relative to bisac transmission formats or bisac membership, write to: book industry systems advisory committee, 160 fifth ave., suite 604, new york, ny 10010. for input to bisac purchase order formats, write to: j. k. long, chairman, bisac p.o. subcommittee, c/o oclc, inc., 6565 frantz rd., dublin, oh 43017. (mr.
long is also the library or network representative on the isbn advisory council.) for input to the ansi z39 p.o. transmission formats, write to: mr. e. muro, chairman, subcommittee u, c/o baker & taylor co., 6 kirby ave., somerville, nj 08876. for problems with the isbn and san, write to: mr. emory i koltay, international standard book numbering agency, 1180 avenue of the americas, new york, ny 20036.

microcomputer backup to online circulation

sheila intner: emory university, atlanta, georgia.

our primary objective in purchasing microcomputer systems for the great neck library was to provide a better alternative to paper and pencil checkouts when our minicomputer-based clsi libs 100 automated circulation system was down. two difficult and lengthy downtime periods occurring shortly after going online convinced the administration that public service should not be jeopardized because of system failure. after investigation of the backup systems vended by computer translation, inc.,1 two of them were purchased in november 1980.

computer translation, inc. (cti) sells a turnkey backup system based on an apple ii plus microcomputer, with two mini-disk drives using 5 1/4-inch floppy diskettes, a tv monitor, and a switching system connecting the apple to the libs 100 console and terminals. software designed to interface with the clsi system is part of the package. the backup collects and stores data for check-ins and checkouts and then dumps them into the database by simulating a terminal when the mini-mainframe is operational again. this requires dedicating a terminal to this process until it is complete. it can also be used alone as a portable unit for circulation purposes, or with any of the many applesoft packages available, or with an applesoft program of the user's own design.

our initial experience in great neck was with a borrowed demonstration system, set up by a sympathetic cti representative on the spur of the moment, in tandem with and connected to the main library checkout station's crt laser terminal after several days of downtime. the circulation staff cheered as the familiar prompts appeared on both screens. they used the clsi equipment which they were accustomed to operating, and the computer room staff learned to operate the cti system. the ease with which the apple could be transported to different locations in the building, and the immediate relief it gave wherever it was connected, sometimes one checkout station, sometimes another, led us to put off deciding on a permanent installation at first. we thought it might be more advantageous to keep it on a rolling cart and use it wherever a terminal was down, or wherever the traffic appeared to be heaviest. we continued in this manner for a while even after both of our own apple systems were delivered. it soon became apparent that the apple and its accompaniments, especially the switching system with its dangling cables, were a nuisance at the checkout counter. people with piles of books or records tended to nudge it dangerously close to the edge or jiggle its connections loose. the circulation staff didn't like waiting until someone from the computer room could be spared to bring up the system, secure the connections, and turn on the apple. also, although the apple is a very reliable instrument which has given us negligible downtime, bumpy rides over various floors, carpets, lintels, and textured tiles occasionally loosened its chips and rendered it, too, inoperative.
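the store-and-forward pattern the cti unit implements in hardware can be sketched in a few lines. the following is a purely illustrative python sketch, assuming json files in place of the apple's diskettes and a caller-supplied send routine in place of the simulated terminal; none of the names correspond to the actual cti software, and the exception file simply mirrors the error log described later in this communication.

    import json
    from pathlib import Path

    QUEUE = Path("transactions.jsonl")      # stands in for the apple's diskette
    EXCEPTIONS = Path("exceptions.jsonl")   # transactions the host rejects

    def record_offline(kind, patron_id, item_barcode):
        """while the circulation host is down, append each checkout or
        check-in to local storage instead of sending it."""
        with QUEUE.open("a") as f:
            f.write(json.dumps({"kind": kind, "patron": patron_id,
                                "item": item_barcode}) + "\n")

    def dump_to_host(send):
        """replay stored transactions in chronological order once the host
        is back, as if keyed at a terminal; send is any callable that
        submits one transaction and raises on an error response."""
        if not QUEUE.exists():
            return 0
        replayed = 0
        with QUEUE.open() as f, EXCEPTIONS.open("a") as errs:
            for line in f:
                txn = json.loads(line)
                try:
                    send(txn)
                    replayed += 1
                except Exception as exc:    # set aside for later manual entry
                    errs.write(json.dumps({**txn, "error": str(exc)}) + "\n")
        QUEUE.unlink()                       # queue fully processed
        return replayed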
cti representatives were called in to make a more permanent installation for the apple in our computer room, a simple operation requiring some additional cable. selection of the terminals to be attached as alternate backup or dumping sites was not so easy, however. the choice of the primary backup site was not a problem, since one of the two checkout stations flanking the main door was fairly obvious. but the second terminal, which would be preempted for dumping, was a more difficult decision. dumping sessions vary in length depending on the number of records to be processed and the activity on the rest of the libs 100 system. in our library, we find it takes about an hour to dump 100 to 150 transactions. this appears to be slower than average and may well be due to the extremely high level of system activity. thus, dumping 1,000 transactions would take a full working day. we had been online for such a short time that great backlogs of patron and material data entry from new registrants and unconverted books had developed and were a high-priority item. neither the circulation department, which was handling registrations, nor technical services, which was handling materials, felt they could afford to lose much terminal time for dumping. thus, the reference department's information desk terminal was reluctantly chosen as the alternate terminal, on the grounds that they only did inquiries for materials which borrowers could locate by means of searching the catalog and making trips to the shelves. if necessary, information desk personnel could step across the aisle to the circulation department and use a terminal there.

the permanent installation was set up in this way for one backup system, while the other one remained mobile in the event we wanted to use it at one of our three branches. only the switching box and cables were really unmovable; the apple, drives, and monitor could still be disconnected and moved about at will. experience over the last few months with this arrangement demonstrated that, all things considered, it is unwise to attach two public service terminals to one apple, in spite of the pressure it puts on behind-the-scenes operations, which lose terminal time in the event of an extensive dump. the reaction of the public to being told that a terminal that usually helped them was inoperative has been so negative that it outweighed the delays in data entry. therefore, a change in the current configuration will soon be made.

meanwhile, we realized the second backup system was not being used to greatest advantage. when the libs 100 was down, the next most pressing demand after main library checkouts was checkouts at the largest branch, located near the railroad station. we were collecting about thirty transactions an hour or less at other locations in the main building while the station branch staff were writing down twice that amount or more and explaining to their public that the computer was down. it seemed important to pursue the possibility of connecting one of the station branch's terminals to the second apple while keeping the apple itself in the computer room in the main library. not only was there even less space in the branch for another piece of hardware on their counter, but staff training and hardware control presented a greater problem since many more part-time people were employed there. cti worked on the problem for about two months, resolving it through the addition of a modem to the basic configuration.
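the dump-rate arithmetic reported above is worth making explicit. the following is a minimal sketch in python, assuming only the throughput of 100 to 150 transactions per hour quoted in this communication; the function name is illustrative and is not part of the cti package.

    def dump_hours(transactions, low_rate=100, high_rate=150):
        """return the (best-case, worst-case) hours needed to replay stored
        transactions into the libs 100 at the observed throughput."""
        return transactions / high_rate, transactions / low_rate

    best, worst = dump_hours(1000)
    print(f"1,000 transactions: {best:.1f} to {worst:.1f} hours")  # about 6.7 to 10 hours

at those rates a backlog of 1,000 transactions does indeed occupy a dumping terminal for roughly a full working day.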
in this new installation, which we did ourselves with phone assistance from cti, and which has been operational for three weeks as of this writing, the dedicated phone line connector for the branch terminal is removed from its port on the libs 100 console and inserted into one of the switching box connectors. the apple is turned on as usual and the crt laser terminal at station branch appears to operate normally. in fact, it operates so closely to its usual libs 100 mode that staff members forget they are not online with the libs and call up to find out why inquiries don't work.

we are still experiencing a significant amount of downtime with our libs 100. some of this is attributable to our relatively full storage, requiring us to perform housekeeping routines frequently, but the rest is a result of system failure. now, however, because of the apples, this causes far less anguish in the circulation department. when the libs 100 goes down, the permanently connected backups are switched on in the computer room by their staff and circulation clerks continue checking materials out on their regular clsi equipment in the main library and station branch. on days when housekeeping chores are scheduled, the console operator's job includes turning on the apples so we can begin serving the public when the doors open at 9:00 a.m. unless downtime persists for more than a day, no other routines are done except checkouts. under some circumstances, certain materials might be checked in on the apple, but it is not desirable to do this for newer materials on which holds may have been placed.

when the libs 100 is online again, the checkout station is switched back to normal mode and the apple takes over the information desk's port for dumping, rendering that terminal inoperative. dumping continues around the clock until all transactions have been processed from both apples. normal activities proceed at all other terminals. diskettes are dumped in chronological order. as the dumping process operates, a file of transactions eliciting error or exception messages from the libs 100 is created on the apple diskette. this file is available for attention at a later time for manual entry into the database.

the chief asset of the dumping process is the accuracy achieved by automatic inputting. when we used paper and pencil, not only was the original writing time-consuming, but manual data entry was difficult because of illegible handwriting, inaccurate transcription of the numbers, inaccurate inputting into the database, and lack of available personnel for the job. the cti system resolves all of these difficulties, but a price is paid in the loss of the dumping terminal's services. the public may be less disturbed if a terminal in a nonpublic area is used. but to the department involved, access to the database is a central part of their work and its loss severely limits their output. in fact, dependence on the automated circulation system by all departments in the library has been swift and universal, even though we originally assumed the terminals outside the circulation department would be used sparingly.

plans are being made to store personnel records in machine-readable form on diskettes. other developments are being put on a back burner until we have less frequent need for the apples as backups. however, levels, great neck library's youth department, has several apples of its own on which budding "computerniks" practice their art.
for them there are few limits to possible applications, perhaps only the outermost boundaries of imagination.

reference

1. joseph covino and sheila intner, "an informal survey of the cti computer backup system," journal of library automation 14:108-10 (june 1981).

computer-to-computer communication in the acquisition process

sandra k. paul: skp associates, new york city.

in the 1970s, we entered the period of computer-to-computer communication; we now appear to have reached the second stage of development. today more than seventy publishers are equipped to receive computer tape orders and input them directly to their order fulfillment systems; twenty-six publishers can produce computer invoices and credits for their customers; six are capable of sending monthly updating information about titles, prices, publication dates, and books declared out of print. all of this, however, is based on a system through which computer tapes are sent from buyer to seller and back via the united states mail. the next step, computer-to-terminal or computer-to-computer communication, is just around the corner.

historical perspective

how did this happen? it started in september 1974 when dewitt c. ("bud") baker, newly appointed president of the baker & taylor company, envisioned the savings his company could find if their customers provided the international standard book number (isbn) on their orders. he also believed that the volume of paper created by the computer was expensive and time-consuming for publishers to handle.

journal of library automation vol. 5/2, june 1972

analysis of search key retrieval on a large bibliographic file

gerry d. guthrie and steven d. slifko: research & development division, the ohio state university libraries, columbus, ohio

two search keys (4,5 and 3,3) are analyzed using a probability formula on a bibliographic file of 857,725 records. assuming random requests by record permits the creation of a predictive model which more closely approximates the actual behavior of a search and retrieval system as determined by a usage survey.

introduction

systems planners are hard pressed to accurately predict the access characteristics of search keys on large on-line bibliographic files when so little is known about user requests. this paper presents a realistic model for analyzing different search keys and, in addition, the results are compared to actual request data gathered from a usage survey of the ohio state university libraries circulation system. a number of papers are available in the literature concerning search key effectiveness; however, all of these were done on relatively small data bases (1-5). of particular importance to this paper is kilgour's article on truncated search keys (6).

purpose

the purposes of this study are (1) to determine the comparative effectiveness of the 4,5 and 3,3 search keys, (2) to compare two predictive models, and (3) to test the results with an actual usage survey.

method

the ohio state university libraries circulation system contained at the time of this study 857,725 titles representing over 2.6 million volumes in the osu collection. the data base used for this study was the search key index file, which contained one search key for each title in the master file. the search key is composed of the first four letters of the author's last name and the first five letters of the first word of the title excluding nonsignificant words (4,5 key).
title words are passed against a stop-list to determine significance. the stop-list contains the words: a, an, and, annual, bulletin, conference, in, international, introduction, journal, of, on, proceedings, report, reports, the, to, yearbook. the search key file is in sequence by search key. for comparative purposes, a second search key file was created and sorted which contained a 3,3 key (the first three characters of the author's last name and the first three characters of the first significant word of the title). the two files of sorted search keys were then processed by a statistical analysis computer program. this program created a frequency distribution table of identical keys, i.e., how many keys were unique, duplicated once, duplicated twice, etc. from this table two models were compared.

model 1: file entry was viewed as a random process with the choice of any unique search key equiprobable. this model has been suggested in the literature mentioned earlier. it states that if x_i is the number of keys that will return i matches, then the probability of a file search returning i matches may be written

p(i) = x_i / K_u

where K_u is the total number of unique file keys. likewise, the cumulative probability for I or fewer matches is

P(I) = \sum_{i=1}^{I} p(i) = ( \sum_{i=1}^{I} x_i ) / K_u

model 2: file entry is viewed as a random process with the choice of any record equiprobable. thus,

p(i) = i x_i / R_T

where R_T is the total number of file records. correspondingly,

P(I) = \sum_{i=1}^{I} p(i) = ( \sum_{i=1}^{I} i x_i ) / R_T

survey: the ohio state university libraries automated circulation system includes a telephone center to which patrons may telephone requests for library holdings information and for checking out and renewing books. telephone operators, sitting at cathode ray tube (crt) terminals, translate the patron's author-title request into a 4,5 search key and proceed with a file search. by having the telephone operators treat telephone calls as random input to the system and recording the number of matches returned for each search used, results can be generated in the same form that both of the models take, i.e., I or fewer matches have been returned P(I) x 100 percent of the time. this is a relatively easy survey to conduct since the output list of matching records for any particular key entry is headed with the exact number of matches which follow. the sample size was 1000 information requests recorded over two one-week periods separated by one month. before these two subsamples were merged, statistical analysis on their individual means (for percent of 10 or fewer matches) signified they were identical at the 99 percent confidence level.

results

the results predicted by the two models for both a 4,5 and a 3,3 search key for 1-10 matches appear in tables 1 and 2. the figures pertaining to the 4,5 key can be compared directly to the data received from the survey conducted through the osu library's telephone center. this comparison is shown in table 1 for 1-10 matches.

table 1. file access comparisons (4,5 search key).
(percent of time I or fewer matches returned)

 I    actual survey    model 1 (random key)    model 2 (random record)
 1        35.9               81.3                     55.7
 2        53.8               92.9                     71.6
 3        66.0               96.3                     78.5
 4        73.1               97.7                     82.4
 5        78.5               98.4                     84.9
 6        81.3               98.8                     86.6
 7        83.8               99.1                     87.8
 8        85.6               99.3                     88.8
 9        86.6               99.4                     89.6
10        87.8               99.5                     90.2

to acquire a 99 percent upper confidence limit on the percent of requests returning 10 or fewer matches, the normal distribution was used as an approximation to the binomial distribution (n = 1000, p = .878), producing an upper limit of 90.2 percent.

table 2. file access comparisons (3,3 search key). (percent of time I or fewer matches were returned)

 I    model 1 (random key)    model 2 (random record)
 1          64.3                     28.0
 2          81.0                     42.5
 3          87.9                     51.7
 4          91.6                     58.0
 5          93.7                     62.7
 6          95.1                     66.3
 7          96.1                     69.3
 8          96.8                     71.8
 9          97.3                     73.9
10          97.7                     75.7

discussion

in table 1 the results of the survey show that 87.8 percent of all searches recorded returned 10 or fewer titles. in model 1, assuming that requests of the file are random with respect to search key, it is predicted that 99.5 percent of all searches will return 10 or fewer titles. all predicted percentages for model 1 are consistently higher than observed results. the predicted response in model 2 more closely approximates the observed behavior of the system as the number of responses increases. however, model 2 is also consistently higher than the actual survey. comparing model 1 and model 2 only, it is apparent that assuming a random record request more accurately reflects the true usage of a library collection. the lower percentages recorded in the actual survey may be attributable to a number of variables not taken into consideration in this study. clustering due to common english word titles and common names may account for the greater part of this difference. table 2 shows the results of predicted response for a 3,3 search key. in this table, model 2 predicts that only 75.7 percent of requests will return 10 or fewer titles. equally important, only 28.0 percent of the requests will return a single record.

conclusion

in predicting the expected behavior of an information retrieval system, it is more accurate to assume random requests by record than to assume random requests by search key. probability predictions are deceptively high for assumed random key requests and do not reflect actual usage of the file. even assuming random requests by record will produce higher-than-observed results. data calculated using model 2 should be considered as an upper limit or "ideal" performance indicator. regarding the results of the random record model as the upper limit on effectiveness of the search key, the data gathered indicate that, as the search key is shortened from 4,5 to 3,3, the deviation between the random key and random record models is considerably heightened. the 4,5 search key is more efficient for retrieval of 10 or fewer records from a large file than the 3,3 key (90.2 versus 75.7 percent). based on these data, the osu libraries decided to retain the 4,5 search key and not reduce it to 3,3. additional studies should be undertaken to determine the effects of common word usage, common names, and their relation to book usage. secondly, the data presented here could be systematically and randomly reduced in size to predict the behavior of various search key combinations on varying file sizes.

references

1. philip l. long and frederick g.
kilgour, "a truncated search key title index," journal of library automation 5:17-20 (mar. 1972 ). 2. frederick g. kilgour, philip l. long, eugene b. leiderman, and alan l. landgraf, "title-only entries retrieved by use of truncated search keys," journal of library automation 4:207-10 (dec. 1971 ). 3. frederick g. kilgour, "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science 5: 133-36 ( 1968) . 4. frederick h. ruecking, jr., "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," j ournal of library automation 1:227-38 ( dec. 1968). 5. william l. newman and edwin j. buchinski, "entry / title compression code access to machine readable bibliographic files," journal of library automation 4:72-85 (june, 1971 ). 6. frederick g. kilgour, philip l. long, and eugene b. leiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7:79-81 ( 1970). .. starr ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀฀ ฀฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ ฀ 
the new york public library automated book catalog subsystem

s. michael malinconico: assistant chief, systems analysis and data processing office, and james a. rizzolo: chief, systems analysis and data processing office, the new york public library.

a comprehensive automated bibliographic control system has been developed by the new york public library. this system is unique in its use of an automated authority system and highly sophisticated machine filing algorithms. the primary aim was the rigorous control of established forms and their cross-reference structure. the original impetus for creation of the system, and its most highly visible product, is a photocomposed book catalog. the book catalog subsystem supplies automatic punctuation of condensed entries and contains the ability to produce cumulation/supplement book catalogs in installments without loss of control of the cross-referencing structure.

background

in 1965 studies confirmed what much of the new york public library's administration had long felt: the public card catalog of the research libraries, containing entries dating back to 1857, was rapidly deteriorating.1 it was estimated that 29 percent of the cards were illegible, damaged, or in some other way unusable. further, cataloging and card filing arrearages were monotonically increasing at an alarming rate. increases in labor costs were eroding all efforts to cope with these problems manually. in addition, the deputy director at that time (now director), john m. cory, realized that a wider base of support was absolutely essential to the survival of the new york public library as an institution. as a result of these disquieting observations, three logical conclusions followed. first, the existing card catalog would have to be closed off, rehabilitated, and photographically preserved. second, available technology should be explored as a possible solution to some of the spiraling arrearage problems. in particular the applicability of computer technology was to be explored. this exploration appeared to offer some most attractive long-term solutions. the capture of all future cataloging in a machine-readable form would obviate for all time the deterioration problem.
this strategy could also provide a basis for a check against spiraling costs, since traditionally unit costs have tended to increase in manual and decrease in automated systems.2 seen within the context of the marc project at the library of congress (lc), the economies were becoming manifestly obvious. the long-term benefits to the entire library community of a national network of shared machine-readable bibliographic data could not be denied. capture of data in machine-readable form for use by information retrieval systems which might become economically feasible in the near future had to be viewed as a matter of great value. third, wider access to the resources of the new york public library had to be provided if a wider base of support for the library's operation was to be sought.

the solution decided upon was the development of an automated bibliographic control system capable of producing photocomposed book catalogs. the book catalog would then serve as the prospective catalog and augment the retrospective card catalog, which would also appear in book form following photographic duplication of the cards.3 this solution, at one stroke, addressed itself to all three of the major problems, and showed great promise as a future investment. reproducible book catalogs could be widely distributed. a machine-based system would eliminate manual filing, would take full advantage of cataloging available from marc, and would begin at the earliest possible time the establishment of an invaluable machine-readable bibliographic data base. photographic techniques had already been employed in producing book catalogs, e.g. the national union catalog, the book catalog of the free library of philadelphia, and the enoch pratt free library catalogs, among others.4 computer-produced book catalogs embodying various techniques (computer line printing, photo-typesetting, etc.) and levels of sophistication were being produced by many institutions, e.g. harvard university's widener library shelflist, stanford university's undergraduate library catalog, and baltimore county public library's catalog, among others.5-7 an extensive review of various types of book catalogs, including typical pages of each, is given by hilda feinberg.8

following extensive studies conducted by messrs. henderson, rosenthal, and nantier of the nypl research libraries, the systems analysis and data processing office (sadpo) was formed, staffed by edp and library specialists, to be completely dedicated to the solution of problems of automated bibliographic control and library automation. from the beginning it was decided that if edp technology were to be utilized, it should be utilized in a manner which took full advantage of the properties of the medium. the computer was not to be used as an ultrasophisticated and costly printing press. the application of new technology to a field will invariably lead to waste and awkward results if the intrinsic properties of the technology are not fully utilized. the fundamental properties of edp technology lie in its abilities to:

1. reorganize and combine data;
2. select items meeting a set of predefined conditions;
3. maintain a permanent but flexible correlation between items;
4. transform a set of conditions into data;
5. perform all of the above with remarkable speed and accuracy;
6. perform all operations with a merciless consistency.
thus, it was realized, at the outset of the project at nypl, that technology could provide a great deal more than the maintenance of a machine-readable record and its reorganization for display. a rigorous control of bibliographic data was possible, and would extract maximum utility from any investment in edp technology. it was with these ideas in mind that machine-based authority control and filing systems were developed. the authority control file provides the fundamental utility of the system. control of data usage has always been of paramount concern to the professional bibliographer. it becomes even more important in a machine-based system in which the data lie in an essentially invisible form until a fairly complex display operation is performed.

advantages of an authority file

another bibliographic aid which the computer could provide through an authority control system was the maintenance and integrity of a cross-reference structure. in addition, one of the classical functions of cross-referencing could be eliminated: it would no longer be necessary to direct a user from one classification which has been used extensively to a newer one when terminology changes. consider the problems which might arise if the library of congress were to change its current usage of the heading aeroplane to airplane. it would be virtually impossible, under a manual system, for a library to attempt to locate, alter, and refile all cards bearing the tracing aeroplane. with a central authority file the problem is reduced to a single transaction and a fraction of a second of effort by the computer. the change is effected with an accuracy unattainable in a manual system. finally, the common nuisance of a cross-reference leading to yet another cross-reference is automatically obviated.

the presence of a machine-readable authority file and the ability to verify use of all forms against this central authority, with machine accuracy, eliminates all clerical errors in the usage of names and headings to which a manual system is susceptible. the problem of consistent usage is greatly compounded in a machine-based system which does not provide mechanical verification. inconsistencies in any automated system generally tend to diminish its utility, and invariably lead to ludicrous results. nonetheless, inconsistencies of usage in an automated system are more readily corrected than those in a manual system. the existence of a central authority file, however, reduces the operation to maximum simplicity and allows no deviation from established standards. while maximum rigor in machine control was attempted, an attempt was also made to shield the professional librarian, who would be using the system, from as much of the tyranny imposed by the machine as possible. in the system finally adopted, the librarian need only exercise care when establishing a form. following establishment of the form, the cataloger need not be concerned with any of the details of the entry, such as punctuation, accent marks, marc delimiting, or categorization. the authority subsystem supplies all such details. in short, the cataloger is only required to spell the form correctly. the machine will identify any incorrect usage; thus a great deal of tedious and time-consuming (and thereby costly) manual searching is eliminated.
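the kind of verification the authority file makes possible can be suggested by the following much-simplified sketch in modern python. the class and field names are ours, and the nypl system itself was written in 360 assembler against tape and disk files, not in python; the sketch only illustrates the two ideas described above: a proposed heading is accepted only if it matches an established form with a compatible category, and a global change of terminology is a single transaction against one authority record.

    # illustrative model of authority verification; all names are ours.
    class AuthorityFile:
        def __init__(self):
            # established form -> (control number, category)
            self.forms = {}
            # variant form -> established form ("see" references)
            self.see_refs = {}

        def establish(self, form, number, category, variants=()):
            self.forms[form] = (number, category)
            for v in variants:
                self.see_refs[v] = form

        def verify(self, proposed, use_as):
            """accept a heading for a bibliographic record only if it matches an
            established form and is used consistently with its category."""
            if proposed in self.see_refs:
                raise ValueError(f"use established form {self.see_refs[proposed]!r}, not {proposed!r}")
            if proposed not in self.forms:
                raise ValueError(f"{proposed!r} is not an established form")
            number, category = self.forms[proposed]
            if category != use_as:
                raise ValueError(f"{proposed!r} is categorized as {category}, not {use_as}")
            return number          # only the control number is carried in the record

        def rename(self, old_form, new_form):
            """a change such as aeroplane -> airplane touches one record only;
            bibliographic records keep just the control number, so nothing else moves."""
            self.forms[new_form] = self.forms.pop(old_form)
            for variant, target in self.see_refs.items():
                if target == old_form:
                    self.see_refs[variant] = new_form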
at the same time that work began on the automated system at nypl, extensive activity in library automation was also in progress in many other parts of the country, involving virtually all areas of library operation: cataloging, acquisitions, serials control, circulation, and reference services (information retrieval). since, at nypl, it was assumed that the bibliographic data base and its control would form the cornerstone of each of these systems, cataloging was given first priority. this approach differed from that taken at other institutions; others, columbia university for example, chose to develop an acquisitions system first.9 still others developed highly sophisticated circulation systems, ohio state university being notable among these.10

even among those institutions which chose to address themselves to the problems of automated cataloging, important differences in approach were evident. these differences were largely a result of attempts to solve different types of problems related to cataloging. among the many projects initiated at that time two will be mentioned, as they are representative of the differences in approach to automated cataloging. the first is represented by the university of california union book catalog project, undertaken by the institute of library research (ilr). this system is characterized by an attempt to minimize, via computer programming, manual intervention in data preparation. employing the technique of automatic format recognition, the ilr staff attempted to find the most economical means of rendering a vast amount of retrospective data into machine-readable form.11 in converting such a large amount of data they had also to concern themselves with the statistical error levels to be expected from keying. having decided that extensive manual edit was too time-consuming and costly, and itself prone to statistical error, they attempted to create computer programs which would use the massive amounts of data as a self-editing device. in a sense, ilr used the nature of the problem as its own solution. the goal of the project was the production of a book catalog representing a five-year cumulation (1963-1967) of materials on the nine university of california campuses, and a marc-like data tape clean enough for print purposes. nypl, on the other hand, decided to consider only prospective materials in a continuously published catalog, and the creation of a marc-like record which would approach in completeness, as closely as was economically feasible, that created by the library of congress. to this end manual tagging and editing were absolutely essential.

the second system to be considered is the shared cataloging system developed by the ohio college library center.12 the primary emphasis here is on the economy to be derived from instantaneous access to the combined cataloging efforts of a cooperating group of libraries. at oclc the primary emphasis was placed on on-line bibliographic data input and access. the major bibliographic product to be produced was a computer-printed card set. the overriding consideration of oclc was the sharing of resources among many users, while at nypl the major concern was the content integrity of a single user's file.

advantages of a book form catalog

a book form catalog has several advantages over a card form catalog: it is portable, compact, more readily scanned, and extremely simple to reproduce.
when coupled with an automated system for maintenance and production the advantages are greatly magnified, as manual filing is virtually eliminated. the format, sequencing, and usage of terms in a book catalog may be varied at will to accommodate users' needs and library service policies. advantages and disadvantages of book catalogs are summarized in the introduction to tauber and feinberg's collection of articles on book catalogs.13 comparisons of book versus card catalogs are presented by catherine macquarrie and irwin pizer in articles reprinted in the work cited above.14, 15

the most obvious advantage of the book catalog is its portability. wide availability of the catalog of a library's collection makes possible a level of service not economically feasible under any other system. access to the complete collection of a library system can be made available economically to every educational institution in the region served by the system. access to a highly valuable research collection can be made available to a much wider geographic region than was hitherto possible. the concept of a union catalog for a region becomes much more viable, making possible regional cooperation in acquisitions policies and relieving the burden of heavily duplicated collections currently borne by library systems within manageable geographic regions. such cooperative ventures allow the cost of maintaining the catalog to be defrayed among the various members of the consortium. thus, a book form catalog would appear to provide groups of libraries with the possibility of operating economies, while increasing the overall level of service to the public they serve.

the utility of a book form union catalog has already been demonstrated by the experience of the mid-manhattan libraries in new york. mid-manhattan, a central circulating library, consists essentially of five libraries in two locations. provision of complete bibliographic access with a traditional card catalog would require the manual maintenance of five individual and two union catalogs. the utility of the mid-manhattan catalog has been further increased with the inclusion of the entire nypl branch library system in january 1973.

a library's internal operation benefits by wide availability of the catalog, as individual copies of the catalog can be made available to the acquisition division, the cataloging division, and each of a library's special collection administrators, making references to the traditional official catalog more efficient; such has been the experience of nypl. baltimore county public library reports a similar finding.16

a perhaps hidden advantage of a book catalog lies in its compactness. a book catalog requires neither the space nor the expensive furniture required by a card catalog. the problem of space becomes more and more acute as the "information explosion" continues to mushroom. an ironic squeeze is encountered in that the collection yearns more and more for the space occupied by the catalog, while the catalog, in growing, continues to make its own demands on available space.

description of the nypl bibliographic system files

before attempting to describe the book catalog subsystem, we shall briefly describe the nature of the files from which the bibliographic data are drawn. the complete bibliographic system consists of four major files and computer programs for their control and maintenance. (footnote: the system actually consists of three independent sets of such files, marc being common to all: one each for the research libraries, the branch libraries, and the dance collection.) the files are:
1. complete marc data base (updated weekly with all changes and additions) from which cataloging may be drawn;
2. bibliographic master file;
3. authority master file;
4. bibliographic/authority linkage file.

for the purpose of this discussion we shall take the existence and maintenance of these files for granted, and concern ourselves solely with their use in the production of photocomposed book catalogs.

bibliographic master file

this file contains unit records for each bibliographic item in the collection; books and book-like materials, monographs, serials, analytics, and indexing items are included. (footnote: separate data bases are maintained for the research and branch library systems. the research libraries file contains all book and certain book-like material added to its collections since january 1971. the branch libraries' file contains all holdings of books and book-like materials of the mid-manhattan library collections. this file currently duplicates to a large extent the holdings of the rest of the branch system, and will eventually encompass the entire system.) (footnote: at nypl a distinction is made between analysis and indexing of a work. the latter refers to selective analysis, used when it is desired to provide, for example, subject access to a significant article in a periodical without creation of the series added entry. there are two types of indexing provided by the nypl system. the first creates only a subject tracing; such treatment might be accorded an article of topical significance by a staff writer of a popular periodical. the second would create both an author and subject entry; this might be used in the case of an author of note writing on a significant subject in a popular periodical, e.g. norman mailer writing on political conventions for esquire magazine.)

the information content is identical to that of marc records. tagging and delimiting adhere to the marc conventions except in those cases in which it was necessary to expand delimiting in order to enhance the functional utility of the marc coding structure. some data distinctions which marc has since dropped, but which are nonetheless useful, have been retained. the expansions consist of the addition of several delimiters not used by marc in order to provide filing forms (which are automatically generated, but which may be manually overridden) for titles, and sequencing information for volume numbers of series and serials. transformations from a marc ii communications format to the nypl format and vice versa are possible due to the isomorphism of the two records. the transformation of marc ii format records into nypl processing format is carried out in the normal course of processing, in which marc records are selected for addition to the nypl files.

authority master file

this file is the central repository of all established forms. names (personal, corporate, and place), series titles, uniform titles, conventional titles, and topical subject headings are all established on this file. categorization of each form is controlled by this file. no form is accepted for use in a bibliographic record unless it matches a form already established on the authority file, and is used consistently with the categorization assigned to it, e.g. a form categorized as a topical subject is never permitted as an author, a series title may only match a form categorized as a title, etc. the cross-reference and note structures are maintained on this file. an additional heading employed by nypl, which falls conceptually halfway between a cross-reference and a subject heading, the dual entry, is also controlled here. the dual entry heading serves to bring together, under a non-lc heading, bibliographic items which nypl considers unique by virtue of the nature of its collection. an example might be found in the genealogy division, which contains a very extensive collection dealing with new york city.
use of the dual entry allows a sequencing under both a subject heading indirectly regionalized to new york city (the lc heading) and, at the same time, a drawing together of all items about new york city into a single sequence headed by new york city. take, for example, the lc established heading elections - new york (city); nypl automatically causes all items traced to the above heading to appear under both the lc heading and the dual entry new york (city) - elections (figure 1). the dual entry merely provides an alternate form of organization for display. no bibliographic tracing is permitted directly to a dual entry. the additional entry point is automatically created when a catalog is printed. manual effort by the cataloger in order to provide the additional entry point is prevented; in addition, the bibliographic record remains rigorously marc-compatible. automatic control of cross-references, dual entries, and the en masse alteration of classification are facilitated by the authority subsystem together with the correlative and reorganizational capabilities of the computer. there is some irony in the relative ease with which the computer allows such individualized organization of data to be effected and the computer's reputation, richly deserved, for imposing a bland uniformity on its victims.

[figure 1 (reproduced catalog pages omitted): the nypl research libraries dictionary catalog, july 1972, pages 201 and 297; dual entries under new york (city) are shown on the right-hand page. this catalog was produced in 6 and 8 pt. type set on an 8 pt. body.]
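a sketch of the dual-entry idea, in our own simplified notation, may help; the field names and the string layout of headings below are ours, not the nypl record format.

    def expand_entries(lc_heading, dual_entry_place):
        """return the display headings generated for one tracing.

        the bibliographic record is traced only to the lc heading, e.g.
        "elections - new york (city)"; if the authority record carries a dual-entry
        flag for "new york (city)", a second, transposed heading is produced when
        the catalog is printed."""
        entries = [lc_heading]
        topic, _, place = lc_heading.partition(" - ")
        if place == dual_entry_place:
            entries.append(f"{place} - {topic}")
        return entries

    # expand_entries("elections - new york (city)", "new york (city)")
    #   -> ["elections - new york (city)", "new york (city) - elections"]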
the authority file provides one other invaluable service: it controls, in a single location, filing forms to be associated with a heading. consistency of filing is assured and, again, extreme simplicity of alteration is possible. only one record need be changed in order to alter the filing of the entire body of material associated with a heading. filing forms are automatically generated, with provision made for a manual override. automatic filing has been found to be correct in better than 95 percent of the cases currently in use; the remaining 5 percent required manual intervention. the machine filing algorithms are based on language and on marc categorization and delimiting.18 initial articles are dropped in each of thirty-eight languages, including the major languages transliterated into a romanized alphabet (those employing cyrillic alphabets, oriental languages, hebrew, and yiddish). chronological subdivisions are filed automatically, observing rules regarding inclusiveness of dates, etc. important chronological periods (currently fifty-four such periods) are recognized and filed automatically, e.g. american revolutionary and civil wars, french revolutions, chinese dynasties, middle ages, etc. roman enumeration is automatically filed in correct decimal sequence.

bibliographic/authority linkage file

the basic function of the bibliographic/authority linkage file is to provide a communications channel between the two major files by assigning to each authority form a neutral unique number. the linkage file then provides access to the established form regardless of the metamorphoses which it may have undergone since its original use (the number remains inviolate). each authority, upon addition to the file, is assigned a unique number; however, the authority file is sequenced by an alphabetic sort key. this sort key bears no logical relationship to the filing form of the heading; it is constructed by dropping punctuation and accent marks, converting to upper case, dropping multiple blanks, and appending a hash total. the linkage file maintains the correspondence between authority control number and alphabetic sort key. only the authority control numbers, determined by the first bibliographic/authority file match for each field, are carried in the bibliographic records. in addition, information is provided to the book catalog subsystem regarding changes in the authority file (alteration of established forms, etc.) which would cause an entry exhibiting such alterations to be immediately regenerated for inclusion in a book catalog supplement. appropriate action is taken against the bibliographic file when activity to an authority heading is sensed by the book catalog subsystem. the presence of a dual entry form, which will require the creation of an additional entry under the associated variant form, is also indicated here.
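the sort-key construction just described can be sketched as follows; this is an illustration only, and in particular the hash total used here (a simple sum of character codes, modulo 1000) is our own stand-in, since the article does not specify nypl's algorithm.

    import unicodedata

    def alphabetic_sort_key(heading):
        """drop punctuation and accent marks, convert to upper case, collapse
        multiple blanks, and append a hash total of the result."""
        # strip accents by decomposing and discarding combining marks
        decomposed = unicodedata.normalize("NFKD", heading)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        # keep letters, digits, and blanks only; fold case
        kept = "".join(c if c.isalnum() or c == " " else " " for c in stripped).upper()
        collapsed = " ".join(kept.split())
        hash_total = sum(ord(c) for c in collapsed) % 1000
        return f"{collapsed}#{hash_total:03d}"

    # the linkage file then pairs this key with the permanent authority control number:
    # linkage = {alphabetic_sort_key("Élections -- New York (City)"): 104217}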
alternative input files

it should be mentioned that the full set of files described above is not a mandatory requirement for creation of a book catalog. a bibliographic file in a marc ii communications format alone will suffice. we have performed tests using both another library's data file and the marc file as sole input to the system. using unmodified file update software we have generated from these marc ii format data bases complete authority files, and thence book catalogs. no cross-references or scope notes are possible in this mode of operation, since marc makes no provision for them. a further experiment was performed using another library's data base (in marc ii format) in combination with the cross-reference structure of the nypl authority file. this led to highly satisfactory results, demonstrating that a photocomposed catalog could be created, and exhibiting the utility of the input file enhanced by cross-references. (footnote: the september 1972 hennepin county public library book catalog was published with a bibliographic data base produced by the hennepin county library combined with the nypl research libraries' authority file.)

the photocomposed book catalog subsystem

the system for production of book catalogs represents only the visible tip, albeit a large and complex tip, of the entire bibliographic system. it consists, in all, of ten computer programs and several score modules. the system was designed with thought toward production of catalogs with a variety of output options. in most cases, these options can be attained by the elimination of entire programs or modules. space does not permit a consideration of all possible variations; the most important will be mentioned in the course of the discussion. one consideration which was deemed of paramount importance was to remain as independent of photocomposition hardware as possible. photocomposition is yet in its infancy; hence, an inextricable commitment to a particular device, it was decided, was to be avoided. the final approach taken was the design, by sadpo, of generalized photocomposition software which is responsive to device-independent typographic commands. the only function of this software is to accept, as input, completely defined text data and typographic instructions from which it generates formatted pages. this task is accomplished via a translation of device-independent into device-particular commands in the form of a photocomposition device driver tape. should a new or more desirable photocomposition device become available, or significant advantage be found in employing a different photocomposition vendor, only one program need be altered. the photocomposition software is completely generalized and can be used to generate anything from book catalogs to typeset prose, in virtually any format (see the section on the pagination program for a discussion of the formatting options provided). figures 2, 3, and 4 demonstrate some of the possibilities. the creation, organization, and control of data to appear in the catalog was undertaken as a completely distinct set of programming tasks.

design objectives of the book catalog system

before embarking upon a discussion of the technical aspects of each processing step, we shall state the objectives which we set out to meet, and the constraints (generally economic) under which they were met.
[figure 2 (reproduced catalog page omitted): the nypl research libraries dictionary catalog supplement, november 1972, a-z page 1; produced in 6 and 7 pt. type set on a 7 pt. body in a three-column format, with captions in 8 pt. type.]

[figure 3 (reproduced catalog page omitted): the mid-manhattan names catalog, april 1972, page 20; a divided catalog produced in a two-column format utilizing 6 and 8 pt. type set on an 8 pt. body.]
[figure 4 (reproduced catalog page omitted): the mid-manhattan titles catalog supplement, july 1972. this page was created as a test utilizing 4 pt. type on a 4 pt. body; the actual supplement created for use by the public was set in 6 and 8 pt. type.]

method of publication

as it is economically impractical to publish the entire catalog on a very frequent basis, a cumulation/supplement scheme was adopted.
two basic types of supplements are possible: (1) a supplement containing only new items for the period represented; or (2) a cumulative supplement containing all items new to the system since the last appearance of a cumulation of the entire collection, automatically replacing all previous supplements. the latter is more costly than the former. the economic desirability of the former was eschewed in favor of convenience to the user. under the scheme adopted, a user has, at any time, only three sources to consider: the retrospective catalog, the prospective cumulation, and the cumulative supplement. (footnote: all material in the card catalog has become known as the retrospective collection, and all material entered into the automated system after january 1972 has become known as the prospective collection.)

we have derived several optimization formulae for reaccumulation schedules.19 application of these formulae indicated a reaccumulation cycle of approximately one year, assuming that supplements would appear monthly. the formulae also indicated that a small premium would have to be paid for the administrative convenience of spreading the printing and processing load of the cumulation over the span of the entire reaccumulation period, compared to the cost of a complete printing at the beginning of each period. the adopted publication scheme calls for the publication each month of one-twelfth of the cumulation, together with a supplement containing all items which have not yet appeared in the cumulation and those which have been altered since their appearance in a cumulation. the division into twelve segments is table-controlled; the number of segments may be varied from one to sixteen. for example, in january a cumulation is published for the alphabetic span a-b; a supplement is published for the remaining letters of the alphabet. a similar situation would occur the following month, etc. thus, at any given time the public is presented with a set of volumes representing the cumulated catalog and a supplement which contains all material not found in the former. the public is unaware of the fact that the cumulation is being cyclically updated. they are only aware of the fact that they have no more than three sources to consult: (1) the old card catalog, (2) the basic cumulative book catalog, and (3) the cumulative supplement. the fact that entries are migrating from the supplement to the basic cumulation each month is of no consequence from the standpoint of catalog usage.

the decision governing representation of an item in a cumulation or supplement is made on an entry-by-entry basis. for example, one of the subject added entries may have migrated into the cumulation; hence, it will no longer appear in a supplement. however, the main, and all other added entries, falling into different filing ranges, will continue to appear in a supplement until they too can be absorbed into the cumulation. similarly, alterations to a bibliographic record will cause only those entries whose text or sequencing is affected to reappear in a supplement. a change to or an addition of a subject tracing will cause only that subject added entry to be regenerated for inclusion in a supplement. the main, and all other added entry citations, which remain unaltered, need not reappear in a supplement (assuming they have previously migrated into the cumulation).
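the cycling scheme can be sketched as follows. the alphabetic spans in the table below are illustrative only (the article gives just the january example, a-b), and the real segment table was a parameter of the system, variable from one to sixteen segments; the entry dictionary keys are likewise ours.

    SEGMENTS = ["a-b", "c-d", "e-f", "g-h", "i-j", "k-l",
                "m-n", "o-p", "q-r", "s-t", "u-v", "w-z"]   # twelve monthly segments

    def this_months_publication(month_index, entries):
        """split entries into the cumulation segment republished this month and the
        cumulative supplement (everything not yet absorbed, or altered since)."""
        segment = SEGMENTS[month_index % len(SEGMENTS)]
        cumulation, supplement = [], []
        for e in entries:
            if e["segment"] == segment:
                cumulation.append(e)              # absorbed into the recumulated segment
            elif not e["in_cumulation"] or e["altered"]:
                supplement.append(e)              # still migrating, or changed since migrating
        return segment, cumulation, supplement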
condensed added entries

in order to keep printing costs to a minimum, all added entries are condensed; title page extension, publisher, and bibliographic notes do not appear under any of the added entries, the assumption being that the user who is interested in such data will take the trouble to refer to the main entry, which contains the complete bibliographic citation. this type of back-and-forth reference, while quite awkward in a card environment, is extremely simple in a book catalog. economic considerations also led to the decision to suppress tracings from the main entry. the system was designed so that these decisions would not be irreversible: the choice of data which are to appear with an entry is governed by a set of tables which may be readily altered should it be desired to change the format or content of an entry. punctuation of condensed entries is accomplished automatically. this is not a trivial problem, and one that only a cataloger can truly appreciate. consider, for a moment, the myriad ways in which bracketing may occur within the title or imprint statement, and the ways in which these may span the two fields. add to these factors the rules which do not permit the appearance of double punctuation. we have found that punctuation of added entries is effected correctly in 98 percent of catalog entries. in those instances in which ala punctuation rules are observed in the complete record, correct punctuation is assured (this is not true of cataloging obtained from european sources).

control of cross-references

it is in the realm of cross-references that the mindless consistency of the computer is most effectively employed. the goal to which we addressed ourselves was the absolute integrity of cross-referencing. under no circumstances (short of erasing a cross-reference from a previously published catalog) were cross-references to refer the user to a heading which did not have an associated bibliographic citation. all meaningful cross-references providing alternate access points to a citation must appear. by the same token, in order to minimize costs, cross-references which appear in a cumulation available to the public are not to be repeated in a supplement. cross-references to a heading would be considered valid entry points to the catalog when bibliographic citations appear under a subdivision of that heading. for example, the appearance of bibliographic citations under negro art - exhibitions would cause all cross-references to negro art to be generated (figure 5). the same rules concerning appearance in supplements and cumulations are observed for these secondary cross-references. alterations to cross-references which have appeared in a cumulation will cause the altered forms to reappear immediately in a supplement, provided the referenced heading is still in use in the catalog. similarly, alteration of the referenced heading would cause the reference to the new form to be automatically generated.
fig. 5. the nypl research libraries dictionary catalog supplement, october 1972: pages 120 and 121. these pages demonstrate the generation of the cross-reference negroes-art see negro art even though only subdivisions of negro art appear in the catalog.

a further consideration extends to cross-references which have migrated into a cumulation. when a cumulation segment is updated, all cross-references which previously appeared in it should continue to appear if, and only if, the referenced heading is still in use in either the same segment of the cumulation, another segment of the cumulation, or a supplement; if not, its use is discontinued. subsequent use of the referenced heading would then call up the cross-reference for reuse. each of the above desiderata requires rather intricate logic when the cumulation is being produced in monthly installments, as any of the following is possible:

1. cross-reference in a supplement, referenced heading in a supplement;
2. cross-reference in a supplement, referenced heading in a cumulation;
3. cross-reference in a cumulation, referenced heading in a supplement;
4. cross-reference in a cumulation, referenced heading in a cumulation.

in each case, the cross-reference must be suppressed whenever the referenced heading disappears from the catalog available to the public, but must be retained when it refers to a heading existing in any part of the catalog. the cross-reference and referenced heading may easily appear in catalog segments published as much as eleven months apart, making it absolutely essential that both the authority and book catalog subsystems maintain strict control of the cross-reference structure.
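as a rough illustration of the integrity rule, the python sketch below checks whether a cross-reference may continue to appear, with simplified in-memory sets standing in for the published catalog; the function names and record layout are assumptions, not the nypl file design.

def heading_in_public_catalog(heading, cumulation_headings, supplement_headings):
    """true if the heading, or any subdivision of it, appears in a published part."""
    for h in cumulation_headings | supplement_headings:
        if h == heading or h.startswith(heading + "-"):
            return True
    return False

def cross_reference_action(to_heading, appeared_in_cumulation,
                           cumulation_headings, supplement_headings):
    """decide what happens to one cross-reference in the next installment."""
    if not heading_in_public_catalog(to_heading, cumulation_headings,
                                     supplement_headings):
        return "suppress"                    # nothing left to refer the user to
    if appeared_in_cumulation:
        return "retain in cumulation only"   # never repeat it in a supplement
    return "print in supplement"

# example: citations under a subdivision keep the reference alive
print(cross_reference_action("negro art", False,
                             {"negro art-exhibitions"}, set()))   # print in supplement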
control of hierarchies

it was decided that the appearance of cataloging under a subdivision of a heading which contains associated notes should cause the higher level heading with its attendant notes to appear. such a heading would be forced to appear regardless of whether or not it itself headed a bibliographic citation, under the assumption that notes concerning a heading might be valuable to a user interested in a subdivision of that heading (see figure 6 for an example).

fig. 6. the nypl research libraries dictionary catalog supplement, august 1972: page 2. the heading actors is caused to appear due to the presence of a scope note and the use of a subdivision of the heading.

dictionary and divided catalogs

the same system was required to serve two divisions of the new york public library, each of which has different traditions and philosophies of service to identifiably different users. therefore, an additional flexibility was required of the system: the ability to produce both dictionary form and divided catalogs. the research libraries, which have traditionally used a dictionary form of catalog, wished to continue that practice. the branch libraries, on the other hand, felt that their public could be better served by a divided catalog, separated into titles, subjects, and names. the system was designed in such a manner that the modification of a single parameter in the final sort would produce either form of catalog.

book catalog subsystem-technical description

the entire subsystem consists of ten separate programs, each of which will be described below. the flow charts in figures 7, 8, and 9 depict the processing flow of the subsystem. the system was designed to operate on an ibm 360 model 40 (which has since been replaced with a 370 model 145) with 256k bytes of core storage. the programs were written exclusively in bal for a dos configuration. a conversion to full os has recently been completed. each processing step described below is executed sequentially. significant peripheral devices required are: five tape drives, one disk drive in addition to those required by the operating system, and a line printer. please refer to figures 7, 8, 9, and 10 for the programs and files referenced by symbols p1, t1, d1, etc.

fig. 7. subsystem flow chart (p1: explode catalog entries, generate responses, select headings and update cross-reference linkage; p3.1: create requests for cross-references, dual entries, and higher level headings).
fig. 8. subsystem flow chart (p4: eliminate duplicate heading requests; p5: locate higher level headings and dual entries).
fig. 9. subsystem flow chart (p3.2: create requests for secondary cross-references, write headings; p6: locate cross-references, update cross-reference indicators).
fig. 10. subsystem flow chart (p7: insert authority text data into skeleton catalog entries; p8: pagination).

entry explosion and construction-program p1

this program serves as the driver for the entire subsystem. in this step entries are selected for inclusion in a supplement or cumulation segment.
requests for data required from the authority file are initiated. the format and data content of each entry are defined by this program via a set of tables. these tables may be altered at will, allowing redefinition of the format and content of any entry. the bibliographic master file is updated to indicate the appearance of an entry in a cumulation, preventing its subsequent appearance in a supplement. in addition, this program is charged with accepting communication of activity to the authority file and taking the appropriate action with respect to the bibliographic file. this activity may take several forms: alteration of a heading, change of delimiting, change to a filing form, posting or removal of a cross-reference or dual entry, change of categorization, or the complete transfer of all cataloging from one valid heading to another. evidence of activity to an authority heading is carried on the authority/bibliographic linkage file (d0). when such activity has affected a heading used by a bibliographic record as an authority field, the field is tagged for verification by the authority file in the next file update/authority-interface run. the indicator for the field in question, denoting previous appearance in a cumulation, is turned off. at the same time, the indicators for all other catalog tracings which require that authority field as data are turned off. when a transfer from one heading to another has occurred, the new linkage number is inserted into the authority directory of the bibliographic record. this is not absolutely necessary, as the authority/bibliographic linkage file provides the link via a chain when a transfer has occurred. nonetheless, the insertion of the true authority control number into the bibliographic file eliminates the necessity of a chained search in all future accesses of the tracing, space on the linkage file is conserved, and no additional indicators are required to make note of the fact that the entry has been caused to reappear in a supplement as a result of the transfer. in all cases of activity to an authority record, reverification is forced for the associated tracing field in order to guarantee correct usage of the altered authority. each bibliographic record is examined to determine whether it will contribute to the catalog. this is done on an entry by entry basis. each field of the bibliographic record capable of defining a catalog entry is examined.
all fields which define a catalog entry (tags 1-, 245, 4-, 6-, 7-) carry a set of indicators denoting appearance in the cumulation, and a number defining the cumulation segment into which the entry should file. an additional indicator for authority fields denotes the presence (or absence) of an associated dual entry on the authority file. appearance in the cumulation and filing segment number of the dual entry are also carried in the bibliographic record, allowing independent control of the dual entry citation. as may be readily seen, the dual entry acts as a phantom tracing in the bibliographic record and will thus not be specifically mentioned in the discussion of selection criteria below. an entry is selected for construction on the basis of the following criteria:

1. the bibliographic record is in a valid status, i.e. has passed all editing tests, and sufficient time for proofreading has elapsed.
2. all authority fields required for construction of the entry have been verified against the authority file in the weekly bibliographic file update/interface production runs.
3. it files in the segment being produced that month.
4. the indicator denoting appearance in the cumulation is not set.

thus, any alteration to the content of a bibliographic record, warranting immediate reappearance of an entry, may be communicated to the book catalog subsystem by the extinction of the cumulation indicator. both cumulation and supplement entries are created in the same run. the entries are separately collated by causing the highest level of the final sort to be a code denoting supplement or cumulation. it will prove fruitful at this point to draw a distinction between a catalog entry (the printed bibliographic citation) and the machine record which is created by the system prior to phototypesetting. the machine record is nothing more than a highly organized print record. the final merging of such print records from various processing steps completely defines the text, typography, and sequencing of the final printed catalog. the machine print records created by the system up to step p8 will be referred to as text entry (te) records. when an entry is to be included in a particular month's catalog segment or supplement, a table for the particular type of entry is consulted in order to determine the data and the typographic commands which will govern the entry's format. at this point only a skeleton text entry record is constructed, as all authority data will be obtained from the authority file. the sequencing information is contained in the sort key of each te record, which defines six levels of sorting:

1. collation-catalog or supplement. this is further refined when a divided catalog is being produced.
2. level i sort, and sort code.
3. level ii sort, and sort code.
4. level iii sort, and sort code.
5. publication date.
6. publisher.

in the case of certain series entries, levels ii and iii may be split into two half-size levels by the program in order to further refine the sort sequence. as an example of the use of sort levels i, ii, and iii, we might consider a subject added entry. in that case, the level i sort is defined by the filing form of the subject tracing, level ii by the filing form of the author's name, and level iii by the filing form of the title of the work. the sort codes are used to separate entries which would result in the same sort keys but are conceptually different, e.g.
a name which might simultaneously define a title added entry, a main entry, and a subject added entry. a similar situation exists at the second sort level, where conventional titles are to be separated from titles or subject title entries. sort key levels, as all other data elements required in a te record, will be directly inserted into the record under construction if they consist of nonauthority data, and will be identified by linkage codes for later insertion when the filing form data is returned from the authority file. the final te record will not be completed until step p7, to be described below. following construction of the sort key (or indications to complete a sort key), typographic commands and text data are inserted into the te record. the typographic commands are contained as binary bit settings in a record directory. the directory also defines the location and length of each data element, or gives a linkage code when the data are to be obtained from the authority file and hence cannot be inserted until program p7. the order of entries in the directory defines the printing sequence of text data. thus, when text data are available, true locations and lengths are provided in the record. when they are not, linkage codes replace them in the directory. these linkage codes are simply replaced by true locations and lengths when the authority text is added to the end of the record by another program (p7). it will suffice at this point to mention that all typographic commands are present in the record. the function of the commands will be discussed in detail below when the pagination program (p8) is discussed. having constructed a set of skeleton te records, the program initiates requests to the authority file for authority text data and filing forms. requests are also made to the authority file for headings which are to print above the bibliographic citations. these headings will be constructed in the same manner as catalog entries, i.e. as te records. they will then be merged with the respective te records as citation entries. these heading requests also initiate a sequence of processing steps culminating in the location and formatting of all relevant cross-references. the necessary cross-references are formatted into te records, and are likewise merged to form the complete catalog. when an entry is chosen for inclusion in a cumulation segment, indicators to that effect are set in the bibliographic master record; it is then written onto the updated bibliographic master file.
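to make the sort key structure concrete, here is a minimal python sketch of the six-level key carried by each te record. the tuple representation, the particular sort-code values, and the field names are illustrative assumptions; the actual records were fixed-format structures built in bal.

from dataclasses import dataclass

# sort codes keep conceptually different entries apart when their filing forms collide
MAIN_ENTRY, TITLE_ADDED_ENTRY, SUBJECT_ADDED_ENTRY = 1, 2, 3

@dataclass
class SubjectAddedEntry:
    subject_filing_form: str
    author_filing_form: str
    title_filing_form: str
    publication_date: str
    publisher: str

def te_sort_key(entry: SubjectAddedEntry, in_supplement: bool) -> tuple:
    collation = 1 if in_supplement else 0                # highest level: cumulation vs. supplement
    return (
        collation,
        entry.subject_filing_form, SUBJECT_ADDED_ENTRY,  # level i sort and sort code
        entry.author_filing_form, MAIN_ENTRY,            # level ii sort and sort code
        entry.title_filing_form, TITLE_ADDED_ENTRY,      # level iii sort and sort code
        entry.publication_date,
        entry.publisher,
    )

sorting a batch of te records with this key collates supplement and cumulation entries separately, and files each subject entry by subject, then author, then title, as the text describes.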
locate authority data and select headings-program p2

all inquiries to the authority file are sorted into authority sort key sequence and matched with the authority file. all inquiries will result in a match to a valid authority record. a match for each inquiry is assured by the weekly file update/interface processing programs. inquiries to the authority file result in any combination of the following actions: (1) authority text and filing data are supplied, via a response record, to program p7 for the completion of te records created by program p1; (2) authority records are selected to serve as headings above bibliographic citations (these same records will also cause cross-references to be selected); (3) authority records are selected in order to initiate a search for the associated dual entry, as per instructions contained in the inquiry record.

the selected headings consist of complete authority records with instructions regarding their eventual use and routing. headings are routed, via a collation code, into cumulation segments or supplements. since a single authority heading may appear as both a main entry and subject heading, indicators are set defining its eventual use as one, the other, or both. these indicators will be called usage indicators. usage decisions made by p1 are passed to this step as part of the inquiry records. the results of these decisions are then transmitted as a set of codes inserted into the selected authority records.

this program is further charged with the responsibility of keeping current the catalog status indicators for cross-references by maintaining two binary indicators with every cross-reference. a cross-reference record with multiple see fields will have a pair of indicators for each see field. the first binary indicator denotes prior appearance of a cross-reference in a cumulation segment. the second indicates that the referenced heading currently appears in some part of the catalog. in passing through the entire authority file, this program will note that a heading which falls in the current month's filing range has had no requests for its use lodged against it. when this is the case, transactions are created for every cross-reference, defined by see froms in the heading record, extinguishing the second binary indicator described above. the cross-reference will then not be used again until it is required. the need for this operation will become more evident when we discuss program p6. the maintenance of the physical linkage between cross-references and headings is performed by the authority file update subsystem. this subsystem guarantees that the linkage is kept current regardless of alterations to headings and cross-references. hence, all see froms are guaranteed to refer to a cross-reference (direct see) record on the file.

explode hierarchies, cross-references and dual entries-program p3.1

the selected authority records are examined for the presence of see from fields. if any are found, they are used to create further inquiries to the authority file for cross-references. a similar operation is performed for dual entries, with the exception that the dual entry inquiry is not created unless it was requested by program p1. the request is passed via indicators in the inquiry record (as discussed above in the description of program p2). all records which are subdivisions of headings, e.g. sculpture-technique, will cause inquiries for all significant higher level headings (sculpture in this case) to be created. higher level headings will supply additional entry points via cross-references to them, or may themselves appear if they contain notes. cross-reference requests are separated for later processing. they will be processed with requests for secondary cross-references to be generated by program p3.2 below.

exclude duplicate headings and separate inquiries-program p4

this program is nothing more than a sort with exits. the input tape of selected headings and higher level heading requests is sorted, and if a request for a higher level heading has already been filled by a heading selected in p2, the request is dropped. all usage information carried by the request is logically added to the matching heading. when multiple requests for the same higher level heading are discovered, all but the first are
dropped. usage information from all duplicates is added to the retained request by a logical or operation. the authority records which were selected by p2 for use as headings are formatted into complete text entry (te) records for later input to the pagination program. te heading records are formatted by a single module invoked by this step and again in p5. the surviving hierarchy requests, and all dual entry requests, are separated for processing in the next step.

format headings module

all heading records selected for print are processed by this module, which converts the input text and filing data of authority records into te records. at times quasi-duplicates of the te record are constructed with different filing and typography codes for use as main entry and subject headings. at times portions of the data are encoded as nonprinting because it is known that the print data will be provided by other heading records. this is the case with author/conventional title records. the author heading is assured because of the explosion of higher level headings; hence, a simple method is provided for insuring its appearance only once regardless of the number of associated conventional titles. when a subject heading record is created, the heading is made to appear twice in the record, once in upper case for printing, and once in its normal upper and lower case form, encoded as nonprinting, for possible use as a dictionary heading by the pagination program. the conversion to upper case is effected via a translate table, because of the presence of control information within the text for floating diacritics. also, diacritics and many special characters do not have a simple upper case equivalent due to the use of the complete ala character set. punctuation of cross-references is effected in this module. the complexities by no means approach those encountered in punctuating condensed added entries; nonetheless, they do exist. for example, terminal periods in headings referenced in a cross-reference must be replaced with semicolons when more than one heading is referenced, a blank must be inserted following the hyphen and preceding the semicolon in open ended dates, and the final referenced heading in a string must end in a period unless it terminates with a hyphen, quote mark, exclamation point, question mark, parenthesis, etc. typographic codes which apply to headings, notes associated with headings, and phrases in cross-references are inserted by this program when te records are created.
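a minimal python sketch of those punctuation rules follows, assuming each referenced heading arrives as a plain string; the function name and the exact set of terminal characters are illustrative simplifications of the module just described.

NO_PERIOD_AFTER = set('-"\'!?)')   # characters after which no final period is added

def punctuate_references(headings: list[str]) -> str:
    parts = []
    for i, h in enumerate(headings):
        h = h.rstrip()
        last = i == len(headings) - 1
        if not last:
            # interior headings: a terminal period becomes a semicolon
            h = h[:-1] if h.endswith(".") else h
            # open-ended dates such as "1826-" get a blank before the semicolon
            h = h + " ;" if h.endswith("-") else h + ";"
        elif not h.endswith(".") and h[-1] not in NO_PERIOD_AFTER:
            # the final heading ends in a period unless it already ends in a
            # hyphen, quote mark, exclamation point, question mark, parenthesis, etc.
            h = h + "."
        parts.append(h)
    return " ".join(parts)

# e.g. punctuate_references(["pim, bedford clapperton trevelyan, 1826-", "negro art."])
# -> "pim, bedford clapperton trevelyan, 1826- ; negro art."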
locate hierarchies and dual entries-program p5

all heading requests are applied to the authority master file. when the heading corresponding to a request is located, the entire authority record is written onto an output file for further processing. this process is similar to that executed when the original heading requests were processed in program p2. higher level headings are encoded for use in accordance with their categorization and filing form. when a requested dual entry heading is located, a te record is written for later processing by the pagination program. a response record containing the filing form of the dual entry is also written onto an indexed sequential disk file. a direct access file is necessary since the catalog record contains only a link to the primary heading, and all requests for the dual entry come via a request against the primary heading in program p2. rather than attempting a complex scheme for keeping track of all bibliographic items requiring the dual entry data, only one copy of the dual entry response is isolated and indexed by the control number of the primary heading. it is then retrieved on that basis when needed.

explode secondary cross-references, separate and select hierarchical headings-program p3.2

this program is simply a phase of program p3.1 described above. the major difference lies in its handling of the authority records which it accepts as input. they are written out as te records, but only if they meet one of two conditions: if the authority record matching the heading request contains notes, it is selected for eventual formatting into a heading; or if it represents an author, required for an author/conventional title combination. in all other instances higher level headings are not selected for printing. the format headings module is invoked by this step for all higher level headings selected for print. if secondary cross-references are not desired, the explosion module which creates the requests is simply bypassed. similarly, higher level headings may be suppressed. no further attempt is made to generate higher level headings, as they have all been exploded in p3.1. the exploded cross-reference requests are separated in this program, just as they were in p3.1.

locate cross-references-program p6

prior to execution of this step tapes t3.1, t3.2, and t3.3 are sort/merged into a single tape t3.4 (figure 9). t3.4 now contains all of the transactions generated by program p2, and all cross-reference requests. recall that p2 has created transactions extinguishing the indicator carried by cross-reference headings, denoting that the referenced heading appears somewhere in the catalog. the sort causes all of these transactions to be applied before any cross-reference requests are processed. it might appear a bit paradoxical that a request should be made to a cross-reference whose referenced heading was not selected in p2; however, recall that a cross-reference may be invoked as the result of the use of a subdivision of the referenced heading (secondary cross-reference). at this point some discussion of the cross-reference record is in order. a cross-reference may point to several headings simultaneously, e.g. animals see aardvarks/ bears/ cats/ ... zebras. each referenced heading is controlled individually. only the required references are extracted as needed. in the example above, if aardvarks and cats appeared in the catalog, those two references would have been selected, and no others. hence, the discussion which follows will be greatly simplified if we consider each cross-reference transaction to apply to only a single reference. this is effected operationally by carrying the control number of the heading which gave rise to the cross-reference request within the request. following the application of transactions, if any, to extinguish indicators, the selection for print logic is executed. cross-references are selected for printing when the indicator specifies that the cross-referenced heading appears somewhere in the catalog available to the public, regardless of whether there is a specific request for it, and the cross-reference is filed in the segment being produced. a request for a cross-reference which already appears in a cumulation segment currently in use is ignored. a request for a cross-reference which is not already in the catalog is honored.
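the end result of that selection can be sketched as follows in python, with simplified status flags standing in for the indicators carried on the cross-reference record; the real program also had to apply the extinguishing transactions and handle multiple referenced headings per record.

from dataclasses import dataclass

@dataclass
class CrossRefStatus:
    referenced_heading_in_use: bool   # second binary indicator, maintained by p2
    in_cumulation: bool               # first binary indicator: already in a segment in use
    filing_segment: int               # segment in which the cross-reference files

def select_for_print(xref: CrossRefStatus, current_segment: int):
    """where, if anywhere, the cross-reference prints in this month's installment."""
    if not xref.referenced_heading_in_use:
        return None                    # suppress: nothing in the public catalog to refer to
    if xref.filing_segment == current_segment:
        return "cumulation"            # its segment is being reprinted this month
    if xref.in_cumulation:
        return None                    # already available to the public; do not repeat it
    return "supplement"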
the actual logic is somewhat complex; however, the end result is as described above. cross-references to be printed are routed to either a supplement or cumulation installment depending upon the filing range in which they fall. when a divided catalog is being produced, cross-references are further routed into the appropriate catalog on the basis of categorization. following the selection of, or refusal to select, a heading, the indicators denoting prior appearance in the catalog and linkage to a heading in use are updated. continuing integrity of the cross-reference structure for future printings of the catalog is thus assured.

complete citation text entry records-program p7

prior to execution of this processing step, response records emanating from p2 are sorted into bibliographic item number sequence. sequencing is necessary since the skeleton te records are in the same sequence as the bibliographic master file. identification of authority response data required by a te record is via bibliographic item number and a sequence number assigned to each authority field within a bibliographic record. subfields of a response record are identified by delimiter. response records are matched to skeleton te records bearing the same item number. following the match, all required data are inserted into the skeleton te record. codes are carried in the te record directing this program to perform certain formatting functions not possible in step p1. these functions include insertion of certain combinations of parentheses and brackets required by series notes, addition of a series note to certain call numbers, and the replacement of the author portion of an author-title combination series note with his:, her:, in his:, in her:, etc. none of the above could have been accomplished in a typographically acceptable manner in program p1. dual entry data are obtained from the indexed sequential file (d1). the identification of such data is via the authority control number of the primary lc subject heading carried in the bibliographic record. this number is used to access file d1 for the required text and filing data.
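a minimal sketch of this matching step follows, assuming both streams are already in bibliographic item number sequence; python dictionaries stand in for the tape files and the fixed-format record directory, and the field names are illustrative.

def complete_te_records(skeleton_records: list[dict], responses: list[dict]) -> list[dict]:
    """insert authority text and filing forms into skeleton te records (program p7)."""
    # index responses by (bibliographic item number, authority field sequence number)
    by_key = {(r["item_no"], r["seq_no"]): r for r in responses}
    for te in skeleton_records:
        for slot in te["directory"]:
            linkage = slot.get("linkage")
            if linkage is not None:                          # data to come from the authority file
                response = by_key[(te["item_no"], linkage)]
                slot["text"] = response["text"]              # authority text data
                slot["filing_form"] = response["filing_form"]  # completes the sort key
                slot["linkage"] = None                       # true location and length now known
    return skeleton_records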
pagination-program p8

prior to execution of this step a set of page initialization records is created for the particular type of catalog being produced. these records are prepared by a program not shown in the subsystem flow. initialization records govern the overall format of the book to be produced. there are six such initialization records, all of which must appear at the beginning of the input tape. they may also appear embedded anywhere among the te records in various combinations. the first initialization record, known as a page dimension (pd) record, defines the physical dimensions of the page to be printed. parameters carried in this record also determine the dimensions of inner and outer page margins, head and foot margins (independently for recto and verso pages), number and width of columns, body size on which to set type, and spacing between entries. when an embedded pd record is encountered the program will terminate any page currently being formatted, begin a new page, and continue formatting in accordance with the redefined dimensions. the second initialization record defines the starting page number, and indicates whether paging is to start with a recto or verso page. the pagination program may also be directed via this record to place a black square at the edge of a page, at a location defined by the record, to serve as a thumb index. this record may also appear anywhere else on the tape. when it does appear as an embedded record it commands the program to terminate the page being formatted at that point, to begin a new page, and possibly provide a number of blank pages. this allows volumes to be broken at predefined sort points. in this manner we may separate alphabetic segments, the various volumes of a divided catalog, or cumulation and supplement volumes, and move the thumb index. subsequently, four records define caption and legend text (independently for recto and verso pages). any one or combination of these records may also occur elsewhere on the tape. when they do occur as embedded records, the program terminates the page currently being formatted, alters the appropriate caption and/or legend text, and continues to format text. interfiling of these records with te records allows captions to be changed automatically between volumes of a divided catalog, or between supplement and cumulation volumes, or at any other desired sort point.

the six records described above control those aspects of page format which are common to a large class of entries. individual te records carry typographic commands which are specific to the entry, or to an element of the entry. a code carried by each te record (entry format code) defines typographical rules for the entry as a whole. this code is used to identify data to be used in the formation of dictionary and column headings when page breaks occur. certain widow rules affecting the entire entry are specified, e.g. entry may not span columns, entry may not form the last line of a column, etc. line advance commands, defining the amount of space (if any) to be left between entries, are carried in this code. data elements within an entry may require different typographic rules. format codes for each such element are carried within a record directory. the directory also serves to identify the location and length of text data to be typeset in accordance with the typography specified by element format codes. element format codes consist of 32-bit fullwords. groups of bits within the word define separate typographic rules. these bits may be set in any combination, defining a complete spectrum of typography. the major typographic parameters governed by these bit settings are:

1. starting indention ("continue on the previously used line" is included).
2. overflow indention to be used if the element must be continued onto another line.
3. space to be left on a line before adding any additional text to a previously used line.
4. justification-left, right, center of column, and center of page.
5. type size height.
6. type size width relative to height.
7. type face-bold or light.
8. type style-roman or italic.
9. element widow rules-restrictions which do not allow text to: span columns, form the first line of a column, span from a verso to a recto page, or span from a recto to a verso page.
10. line break-indicating whether lines may be broken at blanks only, or may be broken at blanks and certain special characters. line break decisions observe a hierarchy of rules, e.g. if the indicator is set to break at blanks only and no blanks are found within the entire line, the program automatically reverts to the second option (break at blanks and special characters); should that also fail, the line will be broken arbitrarily at the last character which fits on the line.
11. hyphenation indicator-due to the great number of foreign languages used in the nypl catalog, no hyphenation routine is employed. allowance has been made, however, for the inclusion of a hyphenation module should it be desired in the future, and an indicator is provided in order to invoke it.

other rules of lesser importance exist, but space does not warrant their discussion.
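as a concrete illustration of packing independent typographic rules into one fullword, the python sketch below assigns bit positions and widths of its own choosing; only the general idea of grouped bits within a 32-bit element format code comes from the text, not these particular assignments.

JUSTIFY_LEFT, JUSTIFY_RIGHT, JUSTIFY_CENTER_COL, JUSTIFY_CENTER_PAGE = range(4)

def pack_format_code(start_indent: int,     # 0-15 (4 bits)
                     overflow_indent: int,  # 0-15 (4 bits)
                     justification: int,    # 2 bits
                     point_size: int,       # 0-31 (5 bits)
                     bold: bool,
                     italic: bool,
                     no_span_columns: bool) -> int:
    code = 0
    code |= (start_indent & 0xF)
    code |= (overflow_indent & 0xF) << 4
    code |= (justification & 0x3) << 8
    code |= (point_size & 0x1F) << 10
    code |= int(bold) << 15
    code |= int(italic) << 16
    code |= int(no_span_columns) << 17
    return code

def is_bold(code: int) -> bool:
    return bool((code >> 15) & 1)

# e.g. a bold, left-justified, 8-point element that may not span columns:
code = pack_format_code(2, 4, JUSTIFY_LEFT, 8, True, False, True)
print(is_bold(code))   # True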
the entire ala character set plus several additional characters specified by nypl may be typeset via this program on an iii videocomp. diacritics are floated onto the characters they accent. the coding structure adopted by nypl consists of two unique codes preceding a pair of characters to be overprinted. the first code indicates to all processing programs that the data to follow must be interpreted in a unique manner. the second defines the unique treatment to be accorded. we currently employ only two such function codes; both imply a form of overprint. coding in this manner allows unlimited expansion of the character set. a function code has been assigned but not yet utilized for overprinting of triplets. this would be necessary in handling doubly accented characters, such as are found in vietnamese. function codes have been assigned defining escapes to nonroman alphabets. the character set includes two blanks in addition to the normal word space. one of these will provide a word space on printed output but will fail line break tests. such a character is of great utility as a separator in abbreviations and as a word space preceding such terminal characters as a close parenthesis. conversion of the nypl data base to utilize this super blank will be effected following definition of sufficiently reliable rules for its automatic generation at input. the second blank is a zero set width character. this character, when present in a machine record, is assigned a null width by the phototypesetting device. its utility lies in areas in which it is required to remove only one or two characters from a record, but it is not desired to expend the programming or processing time in restructuring the record. all of the input text data and format codes are translated into commands to an iii videocomp 830 and written onto a driver tape. the driver tape is then delivered to a photocomposition vendor who mounts it on a videocomp to produce camera ready copy for catalog pages. the camera ready copy is then delivered to a printer who produces multilith plates, and thence, pages which are bound into monthly supplements and cumulation segments.

conclusion

photocomposed book catalogs have been in use at nypl since january 1971. the effectiveness of the system can, perhaps, best be judged by the only adverse reaction received thus far: in the case of material which must pass through the bindery after cataloging, entries appear in the catalog before the materials reach the shelves, thereby causing annoyance to users. judged by more serious criteria, the system has been proven an operational success. the processing budget for the research libraries is now insignificantly higher than it was under the manual system, but cataloging volumes have increased dramatically: 7,500 titles/mo.
cataloged vs. 5,500 titles/mo. under the old manual system. the increase in productivity cannot be solely attributed to the automated system. some of it is attributable to the revision, by the head of preparation services, of manual procedures.

expansion of book catalog coverage

the entire bibliographic system is currently in the final stages of revision for production of a multimedia catalog of the dance collection of the research library of the performing arts.20 the organization of citations referring to material in diverse media will be accomplished by providing separate sequences under appropriate headings, denoting: works by, works about, visual works, music, audio materials. listed under each of these headings will be the following types of materials:

1. works by-written works by an author.
2. works about-written works about an author, performer, etc. (the subheading is not used under topical subjects.)
3. visual works-photographs (original and indexed), prints and original designs, motion pictures and videotapes, filmstrips and slides.
4. music-music scores.
5. audio materials-phono records and phonotape.

these headings are not as specific as those suggested by riddle, et al.; however, they do provide the early warning function discussed by virginia taylor.21,22 this catalog is due for publication in early 1974. pending the success of this venture, a study will be made of the means of extending the scope of the research libraries' catalog to include nonbook materials.

in late fall 1973, an extremely exciting and bold step will be taken by the jewish division of the research libraries. they will begin data input of material in hebrew, using the recently defined ansi correspondence scheme for hebrew characters.23 within this scheme roman and special keyboard characters have been assigned to each character of the hebrew alphabet. book catalog display of hebrew text will utilize these characters in a left to right print mode until such time as development money is found for the digitization of hebrew character fonts, and for modifications to the pagination program in order to display mixed roman and hebrew text. all hebrew entries will be filed in accordance with conventions for sequencing hebrew text. the hebrew entries will be interfiled with entries in romanized forms by conceptually assuming the sequencing alphabet to contain 57 characters: blank, a, b, . . . , z, 0, 1, . . . , 9, followed by the letters of the hebrew alphabet from aleph to tav. if we have an author who has written several titles in roman alphabet languages, and others in hebrew, we would create a sequence of main entries under his name interfiled according to the alphabetic sequence shown above. all hebrew or variant title added entries would be found in a sequence starting at the end of the roman alphabet. the primary reasons for adopting such a scheme, as opposed to the more traditional romanization, are:

1. a nationally endorsed correspondence schedule has been provided by ansi.
2. it is desired to enter this data into the automated system and end the manual operation at the earliest possible time.
3. it is desired not to have to revise all cataloging when true hebrew text may be economically displayed. it is virtually impossible to recover the true form of nonroman text from its romanized form.
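the interfiling scheme described above can be sketched as a simple collation key in python; the exact inventory of hebrew characters below (base letters only, final forms omitted) and the titles in the example are illustrative assumptions, not nypl's actual filing routine or data.

import string

HEBREW = "אבגדהוזחטיכלמנסעפצקרשת"   # assumed inventory, aleph through tav
SEQUENCE = " " + string.ascii_lowercase + string.digits + HEBREW

RANK = {ch: i for i, ch in enumerate(SEQUENCE)}

def filing_key(text: str) -> tuple:
    """map a filing string to ranks in the conceptual sequencing alphabet."""
    return tuple(RANK.get(ch.lower(), len(SEQUENCE)) for ch in text)

# romanized and hebrew titles under one author interfile, with all hebrew
# entries sorting after the roman alphabet and the digits:
titles = ["dance", "מחול"]
print(sorted(titles, key=filing_key))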
these two areas, nonroman alphabet display and inclusion of nonbook materials, represent the only areas in which further development of the book catalog system is planned. future efforts will be directed to conversion of the batch-oriented processing system to one with on-line file maintenance capability. it should be stressed again that the primary aim of the bibliographic system is not production of book catalogs. the system was designed to create a highly controlled data base which could be used in conjunction with whatever display medium is technologically and economically feasible. online access to the catalog will require extreme control of the data, as automated retrieval techniques require very precise definition of access points. the problems of data organization become greatly magnified when crt display devices are used, as the visual scan range produced is severely limited. the extensive development effort to produce book catalogs was undertaken at nypl since it was felt that for at least the next decade book catalogs in printed or microform would provide the only economically viable form of access to the collection. book catalogs will, no doubt, also serve as backup forms of display for a considerable time after introduction of electronic access techniques.

references

1. seoud makram matta, the card catalog in a large research library: present conditions and future possibilities in the new york public library, submitted in partial fulfillment of the requirements for the degree of doctor of library science (new york: columbia university, school of library service, 1965).
2. i. a. warheit, "automation of libraries-some economic considerations," presented to: canadian association of information science, ottawa, ontario, canada, 27 may 1971.
3. james w. henderson and joseph a. rosenthal, eds., library catalogs: their preservation and maintenance by photographic and automated techniques (mit report no. 14) (cambridge, mass.: mit press, 1968).
4. margaret c. brown, "a book catalog at work (free library of philadelphia)," library resources and technical services 8:349-58 (fall 1964).
5. richard de gennaro, "harvard university's widener library shelflist conversion and publication program," college & research libraries 31:318-33 (september 1970).
6. richard d. johnson, "a book catalog at stanford," journal of library automation 1:13-50 (march 1968).
7. paula kieffer, "the baltimore county public library book catalog," library resources and technical services 10:133-41 (spring 1966).
8. hilda feinberg, "sample book catalogs and their characteristics." in: book catalogs by maurice f. tauber and hilda feinberg (metuchen, n.j.: the scarecrow press, 1971), p. 381-511.
9. paul j. fasana and heike kordish, the columbia university libraries integrated technical services system. part ii: acquisitions. (a) introduction (new york: columbia university libraries systems office, 1970). 62 p.
10. gerry d. guthrie, "an on-line remote access and circulation system." in: american society for information science. annual meeting. 34th, denver, colorado, 7-11 november 1971. proceedings 8:305-9, communications for decision-makers (westport, connecticut: greenwood publishing corp., 1971).
.kilgour, "initial design for the ohio college library center: a case history." in : clinic on library applications of data processing, 1968. proceedings (urbana: university of illinois, graduate school of library science, 1969) , p. 54-78. 13. maurice f. tauber and hilda s. feinberg, book catalogs (metuchen, n. j.: the scarecrow press, 1971). 14. catherine 0. macquarrie, "library catalogs: a comparison," hawaii library association ]ournal21:18-24 (august 1965). 15. irwin h. pizer, "book catalogs versus card catalogs," medical library association bulletin 53: 225-38 (april 1965). 16. kieffer, "the baltimore county public library," p.l33--41. 17. james a. rizzolo, "the nypl book catalog system: general systems flow," the larc reports 3:87-103 ( falll970). 18. edward duncan, "computer filing at the new york public library," the larc r eports 3:66-72 (fall1970). 19. s. michael malinconico, "optimization of publication schedules for an automated book catalog," the larc reports 3:8185 (fall 1970) . 20. dorothy lourdou, "the dance collection automated book catalog," the larc reports 3: 1738 (fall 1970). 21. jean riddle, shirley lewis, and janet macdonald, n on-book materials: the organization of integrate d collections. prelim. ed. (ottawa, ont.: canadian library association, 1970). 22. virginia taylor, "media designators," library resources and technical services 1:60-65 (winter 1973) . 23. edward a. goldman, et al., "transliteration and a 'computer-compatible' semitic alphabet," hebrew union college annual 42:251-78 (1971). lib-mocs-kmc364-20140103102512 64 predicting the need for multiple copies of books robert s. grant, presently at hope college, holland, michigan. an industrial inventory technique adapted to a university library's computer based circulation system as one aid in identifying heavily used books for multiple-copy purchase. the university of windsor has approximately 5,000 students. the university library's open stacks contain more than 300,000 volumes, 100,000 of which are non-circulating (bound periodicals and reference books). there are approximately 200,000 books available for circulation, a booksto-student ratio of 40:1. nevertheless, a perennial student complaint is: "why is it that every time i need a book, someone else has already checked it out?" to help mitigate this problem, the library decided several years ago to embark upon a programme of purchasing multiple copies of much used books. the question then became one of determining which books would need duplicating, and how many more copies of each title would need to be bought. suggestions of titles to be duplicated were at £rst solicited from the faculty, but ever-increasing demands on them prevented their being more than minimally cooperative. three years ago, in an effort to increase the availability of books to undergraduates, the library changed its circulation period for undergraduates from two weeks to one week, with unlimited renewals. at the same time there was instituted a system whereby a student £lied out a reserve card requesting that he be allowed to check out a book upon its predicting need for multiple copies/grant 65 return. when there were five or more such requests, then a copy of the book was to be purchased. although this . system of ordering multiple copies was very cumbersome, it was better than nothing. an article by william l. 
leffler (1) suggested a system of adapting industrial inventory techniques to the problem of identifying books to be duplicated that would be compatible with the library's computer based circulation system and also could be expected to be simpler and more thorough than the above method of buying multiple copies. without rehearsing leffler's arguments, the basic formula used in this project can be simply stated as:

nbooks = (n x n95%) / t

where
nbooks = the number of copies of a single title necessary to meet at least 95% of student demand for that title;
t = number of days of observation, i.e., the number of days in the academic year in which students are permitted to check out books (a constant of 273 in this formula, being the number of days in the period from 1 september to 31 may);
n = total number of times a title circulated during t;
n95% = a + 2s, where
a = the average length of time a title was on loan, i.e., the total number of days in which a title was in circulation divided by the number of times (n) the title circulated;
s = standard deviation, which is computed as the square root of [ (sum of (ai - a)^2) / n ].

ai is the length of time, in terms of days, that a single title was off the shelves each time it circulated, and is not to be confused with a, which is the average length of time (over the academic year) that the same title was on loan. the sum of all the ai's was used earlier to calculate a:

a = (a1 + a2 + a3 + . . . etc.) / n

for example, if a book circulated three times during the academic year (the first time for 18 days, the second time for 20 days, and the third time for 3 days) then a (the average length of time the book was on loan) would be calculated as (18 + 20 + 3) / 3, or 13.66.

at this point it should be noted that although the library continues to accept request cards for books presently on loan (and to reserve books for the requestors), these requests are not used as part of the data in determining the number of copies necessary to meet at least 95% of the demand. for one thing, there is no way of knowing how long the person making a request will want to keep a book out, and time is an important element in the formula. but more importantly, the formula, as it now stands, attempts to account for unsatisfied requests. it assumes that in at least some instances there will be more requests for a title than there are copies in the library. by providing an analysis of the present circulation profile of each book, the formula attempts to predict the number of copies of each title the library would need to have in order to more adequately accommodate unsatisfied demand.

fig. 1. programme logic (calculate t and n, count the copies circulating, accumulate the days each title was on loan, compute the average loan period and standard deviation, compute the projected need, and print the report).

the programme for performing the calculations is written in pl/1 and is run on an ibm 360/50 (figure 1). the execution time for 140,000 circulation records (each time a book circulates the data on its circulation is considered a single record) is 15 minutes. the historical record file, the source of data for the programme, is incremented each time a book in circulation is returned. figure 2 shows the format of this file. the file itself is a sequential file stored on magnetic tape, updated daily to include the previous day's circulation data. entries are arranged in lc call number-accession number order.

fig. 2. format of historical record file.

field                                  length   accumulative length
card type                              1        1
lc call number                         29       30
author                                 15       45
accession number                       6        51
spare                                  1        52
card sequence number                   6        58
spare                                  2        60
borrower's id code                     1        61
borrower's id number                   6        67
spare                                  3        70
action code                            1        71
due date (mmddyy) (mo.-day-yr.)        6        77
spare                                  3        80
indicator                              1        81
date charged out (yyddd) (yr.-day)     5        86
date returned (yyddd) (yr.-day)        5        91
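the arithmetic above can be sketched in a few lines of python; this is only an illustration of the formula (the windsor programme was written in pl/1), and the function name and data layout are assumed.

import math

DAYS_OF_OBSERVATION = 273   # 1 september through 31 may

def projected_copies(loan_lengths: list[int], t: int = DAYS_OF_OBSERVATION) -> float:
    """copies estimated to satisfy at least 95% of the demand for one title."""
    n = len(loan_lengths)                       # times the title circulated during t
    if n == 0:
        return 0.0
    a_bar = sum(loan_lengths) / n               # average length of loan
    s = math.sqrt(sum((a - a_bar) ** 2 for a in loan_lengths) / n)   # standard deviation
    n_95 = a_bar + 2 * s                        # loan period covering about 95% of loans
    return n * n_95 / t

# the worked example from the text (loans of 18, 20, and 3 days) has an
# average loan period of about 13.66 days; its projected need is well below 1.00.
print(round(projected_copies([18, 20, 3]), 2))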
results

after the calculations described above have been performed for every title circulated during the academic year, a print-out of the results is produced (figure 3). in order to limit paperwork, only those results under "projected need" which were ≥ 1.00 appear on the print-out; any results less than 1.00 were suppressed. the column labelled "transactions" is simply the number of times the book was checked out and checked back in again. the column "average loan period" is the a described in the formula above. and the column "copies circulated" is the number of books with the same classification number as listed in the left-hand column, but with different accession numbers, checked out during the year. this figure is not the number of copies of the book that the library owns, which could, in some instances, be more copies than were actually circulated. the column labelled "projected need" should, according to the calculations, indicate the number of copies of a title which could accommodate the demand for that title with 95% certainty.

fig. 3. circulation history analysis report. (sample page listing, for each classification number and author, the projected need, number of transactions, average loan period, and copies circulated; the entry for b.72.j6, for example, shows a projected need of 2.84 from 8 transactions, an average loan period of 21.75 days, and 2 copies circulated.)

in order to find out whether or not the library should purchase more copies of a particular title, the number listed in this column is simply checked against the number of
copies listed for this classification number in the official shelf list. for example, the book classified as b.72.j6 shows a "projected need" of 2.84. therefore if the library had three copies of this book, and the book's circulation pattern did not change significantly in the immediate future, then the library would be able to fill 95% of the requests. the official shelf list, however, indicates that the library only owns two copies of this title, suggesting that at least one more copy should be purchased to meet present demand. these calculations do not anticipate future demand on the book. also, doubling the number of copies can never succeed in doubling circulation, a fact demonstrated by leimkuhler (2). this print-out, therefore, can only serve as one guide to multiple-copy purchase.

precautions and pitfalls

in using the results of these computations as a guide to the purchase of multiple copies, the librarian should be aware of several factors which may have distorted the results. one is that the student who checks out the only copy of a book and keeps checking it out all year, in lieu of buying his own copy, creates a false "demand" for the book. it may be that he is the only person in the university interested in it, and when he graduates this book may sit out its life on the shelves completely unused. however, since the historical record file contains the borrower's id number, it is possible to distinguish between an original loan and a renewal. the first time the borrower's id number appears on the book's circulation record indicates the original loan. each additional and consecutive time the same borrower's id number appears on the same circulation record indicates a renewal. although the pilot project did not contain provisions for obviating this problem, it would have been simple enough to build into the programme a mechanism for suppressing the unwanted data. a faculty member who assigns parts of books for students to read, but does not place the books on reserve, forces competition for them on the open shelves. this too creates a demand which may not exist after the professor leaves the university or stops teaching a particular course. the librarian should be aware of such possible short-lived demands that may never recur. the circulation analysis programme was executed at the end of one academic year in order to provide the university of windsor librarians with guidelines for purchase of multiple copies of books to be used in the next academic year. if it were known that a particular book receiving heavy use one year would not receive equally heavy use in the next (because, for example, the particular course requiring that book would no longer be taught; or the book would be placed on a "two-hour reserve" for the coming academic year; or the book circulated frequently in one year only because it was on the "best-seller list"), then it would be folly to purchase three or four additional copies of the book just because the computer print-out indicated that a number of additional copies were needed. other factors, therefore, although not included in the input data, are certainly relevant in determining the need for multiple copies. at the university of windsor library, a book that needs to be re-bound because of heavy use or mutilation is charged out to the bindery department. it then shows up on the historical record file, just as though it had been charged out.
at the university of windsor library, a book that needs to be re-bound because of heavy use or mutilation is charged out to the bindery department. it then shows up on the historical record file, just as though it had been charged out. but since the "borrower's id number" for books charged to the bindery department consists of all zeroes, it would be simple enough to identify and suppress these particular records as unwanted data.
by-products
in addition to providing a list of books to be considered for duplication, the historical record file upon analysis revealed several other interesting facts about the university library's circulation. most noteworthy is the fact that, although there were more than 200,000 circulating books sitting on the open shelves at the time of this pilot project, only 40,205 different titles circulated for a total of 134,276 times. assuming there were only 100,000 different titles among the 200,000 books, this would mean that nearly 60% of the collection was probably not used by the students. of the 40,205 different works which did circulate, the calculations indicated that only 3,257 titles required one or more copies in order to fill 95% of the requests. of this latter number, only 570 titles were in need of duplication. (that is to say, the number of copies listed under projected need exceeded the number of copies actually owned by the library as indicated by the shelf list.) a random sample comprising one-third of these 570 titles was checked to see whether or not the books were in print. indications were that 38% of the titles in need of duplication were no longer in print.
conclusions
a close examination of the 570 titles apparently in need of duplication reveals that, with very few exceptions, students are apparently checking out only books that are curriculum oriented in the most narrow sense, i.e., books which they need to use in writing term papers. nevertheless, one can appreciate the fact that these books are in demand by the student, and if the library is to be responsive to users' demands on its facilities, it will need to spend part of the book budget each year purchasing multiple copies of the most heavily used books. unfortunately, even with these good intentions and the sophisticated assistance of the computer, students' demands for books will still be frustrated (at least one out of three times) because books which need to be duplicated are no longer in print.
programme
a print-out copy of the circulation analysis programme described above is available from mrs. jean griffiths, computer centre, university of windsor, windsor, ontario, canada.
acknowledgments
the initial impetus and continuous guidance for this project was provided by albert v. mate, assistant librarian for public services at the university of windsor. dr. martin basic, faculty of business administration, acted as consultant. systems analyst was mrs. jean griffiths, and programmer was mrs. lillian jin, both at the university computer centre.
references
1. leffler, william l.: "a statistical method for circulation analysis," college and research libraries, 25 (1964), 488-490.
2. leimkuhler, ferdinand f.: "systems analysis in university libraries," college and research libraries, 27 (1966), 13-18.
the evolution of an online acquisitions system
jenko lukac: lewis and clark college library, portland, oregon.
about two years ago a home-grown online acquisitions system was developed and implemented at pacific university. the program, written in basic for the data general nova computer, performs all the necessary functions such as ordering, receiving, fund accounting, etc.
1 this program was offered to the library community, and about one hundred libraries from around the world have availed themselves of it . one of the libraries that obtained and adopted pacific's electronic acquisitions system (peas) was the watzek library at lewis and clark college. the advantage of a home-grown system is that it can be freely modified to suit the evolving needs of a particular library. this communication describes some of the changes made by lewis and clark college to the peas program, in order to illustrate how software developed at one institution can be "imported" into and enhanced by another institution . although matters were particularly simplified by having the same person who developed peas at pacific be responsible for the enhancements at lewis and clark, the procedure and conclusions are still generally applicable. the first change made to the peas program was to rename it clas-the computerized library acquisitions system . the most important change, however, was to translate it from data general basic to digital equipment corporation basic, since the computer at lewis and clark is a dec vax-11 . (each hardware manufacturer implements a slightly different version of a programming language.) the translation requires changing things such as square brackets to parentheses, the word read to get, the word write to put, etc. these changes would have to have been done repeatedly throughout the program, but, in fact, were quite easily accomplished by using a text editor-a metaprogram that can be instructed to change all occurrences of, for example, the word read to the word get in a single pass. clas retained all of the features of peas , and became fully operational at lewis and clark in february of 1980. since then, new features have been added as the staff expressed a need for them. some are minor, such as having the computer recognize initial articles in titles . others are more significant : 1. searching for records in clas by author and title makes use of unlimited rightand left-handed truncation. this makes possible subject searching through k~y words in the title . for this purpose an extra terminal is provided at the reference desk. 2. clas permits the file to be searched by the name of the faculty member who requested the item, in addition to the eight other access points available in peas. 3. clas provides an activity report for any given period showing, for each fund, the amount ordered, the amount received, and the average cost per item . 4. clas can produce vendor reports showing for each vendor the average discount and the delivery schedule. 5. clas asks the operator to verify the cost of an item if the list price and cost differ by more than 30 percent. 6. clas allows the receipt of partial shipments. some of the enhancements to clas involved successive modifications. for example, one of the features of peas was the prevention of duplicate orders by matching new orders being input with records already in the database. a potential duplicate is reported if there is a match on both the author and the title fields . it was decided at the time of implementation at lewis and clark that this criterion was too restrictive, and clas was programmed to report a duplicate if only the title fields matched . after some months of experience, it turned out that even this requirement was excessively restrictive: a slight variation in the way a title was input would prevent a duplicate from showing up. 
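the duplicate check being tuned here can be thought of as a single match predicate with interchangeable criteria. a minimal sketch, with hypothetical title and author fields, of the two criteria described so far (the author-plus-title match of peas as delivered, and the first clas relaxation to a title-only match):

    def is_potential_duplicate(new_order, existing, criterion="author_and_title"):
        # normalize case and runs of whitespace before comparing fields
        def norm(s):
            return " ".join(s.lower().split())

        title_match = norm(new_order["title"]) == norm(existing["title"])
        author_match = norm(new_order["author"]) == norm(existing["author"])

        if criterion == "author_and_title":   # peas as delivered
            return title_match and author_match
        if criterion == "title_only":         # first clas relaxation
            return title_match
        raise ValueError("unknown criterion: " + criterion)

the further relaxations and tightenings described next amount to adding further criteria to the same predicate, without touching the rest of the ordering routine.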
the criterion was then further relaxed to signal duplicates if either the title or the author's last name matched. this, however, was too broad a net : although no duplicates were missed, ordering a book by wilson or smith produced a tedious list of potential duplicates. hence , the requirement was tightened slightly to look for a match in either the title or the author's last name and first initial. this final criterion is currently serving well the needs of the watzek library. what is important about this evolutionary process is that it illustrates the dynamic way in which a library can "fine-tune" an automated system that is receptive to user modifications. since peas is supposed to be a selfexplanatory system, it lacks any documentation. clas is still a self-explanatory system, but nevertheless a manual has been produced to describe all its features and to record programming information such as the structure of the files . one version of the documentation is kept in machine-readable form so that it can be easily updated to correspond to developments in the program . in conclusion, it can be stated that a library-application software package has been successfully transplanted from one institution to another, from one hardware environment to another, and in doing so has matured into a fuller and more flexible system, which it is hoped will, in turn, benefit other libraries contemplating the automation of their acquisitions operation .2 references 1. jenko lukac, "a no cost online acquisicommunications 101 tions system for a medium-size library," library journal 107:684-85 (march 15, 1980). 2. interested libraries can request a copy of the clas program ($80) or manual ($40) directly from the author. the significance of information in the ordinary conduct of life* robert newhard: torrance public library, torrance, california. the information benefit provided to the general public by the developing telecommunications systems will be highly dependent upon the provider's perception of the current and potential role of information in the ordinary interests of life. as sessing this role cannot easily be done by standard questionnaire or survey methods because information does not have a conscious function in people's lives. some paradigms from the past and present may, therefore, be of use in articulating the everyday importance of information. the tool paradigm: information as a link between man and his tools or repairing a lost confidence prior to the industrial revolution, most production was carried on in the home, using tools either made or repaired mainly at home. in this cottage industry, each person was very close to and secure in the use of his tools . with the advent of the industrial revolution and the factory system, the worker no longer owned his tools, but went to one place to use someone else's tools. man and his tools began to separate. many used the tools, fewer understood them. this process began to create the "expert." today most of the tools we use-the automobile, telephone, computer termi* a version of this paper was delivered at the meeting on "public libraries and the remote electronic delivery of information (redi)," columbus, ohio, march 23-24, 1981. editorial board thoughts: a considerable technology asset that has little to do with technology mark dehmlow information technology and libraries | march 2014 4 for this issue’s editorial, i thought i would set aside the trendy topics like discovery, the clo ud, and open . . . 
well, everything—source, data, science—and instead focus on an area that i think has more long-term implications for technologists and libraries. for technologists in libraries, probably any industry really, i believe our most important challenges aren’t technical at all. for the average “techie,” even if an issue is complex, it is often finite and ultimately traceable to a root cause—the programmer left off a semi-colon in a line of code, the support person forgot to plug in the network cable, or the systems administrator had a server choke after a critical kernel error. debugging people issues, on the other hand, is much less reductive. people are nothing but variables who respond to conflict with emotion and can become entrenched in their perspectives (right or wrong). at a minimum, people are unpredictable. the skill set to navigate people and personalities requires patience, flexibility, seeing the importance of the relationship through the 1s and 0s, and often developing mutual trust. working with technology benefits from one’s intelligence (iq), but working with people requires a deeper connection to perception, self-awareness, body language, and emotions, all parts of emotional intelligence (eq). eq is relevant to all areas of life and work, but i think particularly relevant to technology workers. of particular importance are eq traits related to emotional regulation, self-awareness, and the ability to pick up social queues. my primary reasoning for this is that technology is (1) fairly opaque to people outside of technology areas and (2) technology is driving so much of the rapid change we are experiencing in libraries. it units in traditional organizations have a significant challenge because many root issues in technology are not well understood, and change is uncomfortable for most, so it is easy to resent technology for being such a strong catalyst for change. as a result, it is becoming more incumbent upon us in technology to not only instantiate change in our organizations but also to help manage that change through clear communication, clear expectation setting, defining reasonable timeframes that accommodate individuals’ needs to adapt to change, a commitment to shift behavior through influence, and just plain old really good listening. i would like to issue a bit of a challenge to technology managers as you are making hiring decisions. if you want the best possible working relationships with other functional areas in the library, especially traditional areas, spend time evaluating candidates for soft skills like a relaxed demeanor; patience; clear, but not condescending, communication; and a personal commitment to mark dehmlow (mdehmlow@nd.edu), a member of lita and the ital editorial board, is director, information technology program, hesburgh libraries, university of notre dame, south bend, indiana. editorial board thoughts: a considerable technology asset | dehmlow 5 serving others. these skills are very hard to teach. they can be developed if one is committed to developing them, but more often than not, they are innate. if a candidate has those traits as a base but also has an aptitude for understanding technology, that individual will likely be the kind of employee people will want to keep, certainly much more so than someone who has incredible technical skill but little social intelligence. 
for those who are interested in developing their eq, there are many of tools available—a million management books on team building, servant leadership, influencing coworkers, providing excellent service, etc. personally, i have found that developing a better sense of self-awareness is one of the best ways to increase one’s eq. tests such as the meyers briggs type indicator ,1 the strategic leadership type indicator ,2 and the disc,3 which categorize your personality and work-style traits, can be very effective tools for understanding how you approach your work and how your work style may affect your peers. combined with a willingness to flex your style based on the personalities of your coworkers, these can be very powerful tools for influencing outcomes. most importantly, i have found putting the importance of the relationship above the task or goal can make a remarkable difference in cultivating trust and collaboration. self-awareness and flexible approaches not only have the opportunity to improve internal relationships between technology and traditional functional areas of the library, but between techies and end users. we are using technology in many new creative ways to support end users, meaning techies are more and more likely to have direct contact with users. in many ways, our reputation as a committed service profession will be affected by out tech staffs’ ability to interact well with end users, and ultimately, i believe the proportion of our tech staff that have a high eq could be one the strongest predictor s of the long-term success for technology teams in libraries. references 1. “my mbti personality type,” the myers briggs foundation, http://www.myersbriggs.org/mymbti-personality-type/mbti-basics. 2. “strategic leadership type indicator —leader’s self assessment,” hrd press, http://www.hrdpress.com/slti. 3. “remember that boss who you just couldn’t get through to? we know why…and we can help,” everything disc, http://www.everythingdisc.com/disc-personality-assessment-about.aspx. http://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/ http://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/ http://www.hrdpress.com/slti http://www.everythingdisc.com/disc-personality-assessment-about.aspx lib-mocs-kmc364-20131012113323 velopment, which has recently seen the implementation of a new batch retrospectiveconversion subsystem, and added com catalog options and online authority verification during input/edit. while not the only bibliographic system to be successfully replicated, the wln computer system is becoming the most systematically replicated main-frame facility, with a broad range of future possibilities, including that of a truly turnkey system. wln's experience indicates that, if a system is designed for ease of maintenance at perhaps some sacrifice of efficiency, it will be readily transportable and allow others to obtain the benefits of a highly sophisticated bibliographic capability without the everincreasing cost of original development and, more importantly, without having to support the ongoing maintenance of a unique system. a general planning methodology for automation richard w. meyer, beth ann reuland, francisco m. diaz, and frances colburn: clemson university, clemson, south carolina. introduction a workable planning methodology is the logical starting place for the successful implementation of automation in libraries. 
an automation plan may develop on the basis of an informal arrangement or from the efforts of one individual, but just as often, automation plans are developed by committees. an automation planning committee must determine and execute some kind of planning methodology and is more likely to be successful if it starts with clear guidelines, good leadership, and a thoroughly proven approach. as a summary review of the literature will bear out, many libraries have developed their own planning techniques inhouse. some of these, which are addressed to the issues of cataloging rule changes and public-access catalogs, have been very well thought out .1 however, these techniques are generally not directed to planning for communications 205 library-wide automation, and are usually designed to meet the specific needs of an individual library. although the pattern for these studies is often similar, they do not seem to be based upon any general automation design methodology. neither, in addition, does there seem to be a general methodology available through any external library agency. the office of library management studies of the association of research libraries has developed a number of programs designed to assist libraries with their planning efforts, some of which appear to be useful in automation development. 2 but for many libraries, these programs may be too broad, too time-consuming or too expensive. as an alternative, some libraries will need to look elsewhere for a general automation planning methodology. this problem was addressed by the administration of the clemson library, and was resolved in a unique way. background the robert muldrow cooper library of clemson university has the responsibility of acquiring, preserving, and making available for use the many materials needed by faculty and students in their research and instructional efforts. at a typical landgrant institution like clemson, the amount of scholarly publishing and the pressure to develop research proposals has risen sharply in recent years . the increased needs of users working with an expanding and diversified collection have resulted in a doubling of circulation activity, and have required the growth of library staff by 70 percent over the last decade . furthermore, acquisition, processing, and access problems are compounded by the high inflation rate of materials, particularly serial publications, and manpower costs. even though user demands heavily burdened the traditional manual systems, the extent of library automation at clemson had been limited to a batch circulation system, a simple serials-listing capability, and the use of bibliographic utilities. although it had been generally accepted for some time that the acquisitions and fund-control functions at clemson were in need of automation, no concrete approach to develop206 journal of library automation vol. 14/3 september 1981 ing a system had been established. in addition, there was some concern that the development of an automated acquisitions system shouldn't be initiated without a clear understanding of how such an effort would affect the rest of the functions in the library. with this in mind, and as an initial part of planning, the library administration decided to implement a programmed study to determine specific needs and problems of the whole library at clemson and to determine the attendant costs and benefits of their resolution. 
since developing the methodology for this kind of study effort inhouse has been shown by experience elsewhere to be both expensive and time-consuming, a planning methodology was sought which could be brought in from outside the library and applied in a timely fashion. the international business machines corporation (ibm), through their local marketing representative, volunteered to supply that methodology by means of an education industry application transfer team (att} study. in order to implement the study, a team was organized consisting of representatives from the library, from the university's division of administrative programming services (daps), and from the ibm corporation. the purpose, approach, and results of that study constitute the rest of this paper. purpose the application transfer team methodology was implemented to fulfill a fourfold purpose. • first, it was necessary to act on the recognized need for a library-wide automation plan with something tangible that library and university administrators could use in the decision-making process. • second, basic objectives and implementation estimates were required to provide groundwork to the development of systems specifications and evaluation. • third, the planning process needed to provide a forum for meaningful participation by a number of library staff and users. • fourth, the planning needed to be accomplished rather quickly. the att met all these requirements. although the att study technique is generalized for work on any problem in the education arena, it seems particularly well suited to the library environment because it is oriented toward developing applications that solve production problems. the application transfer team methodology was developed by the ibm corporation for customer use. the a tt methodology evolved from ibm's business system planning function, which has been operational since the early 1970s. although the methodology has been used several times in the academic environment, this is the first time, to our knowledge, that it has been used in a library operation. the strength of the att is that it helps members of a team with diverse backgrounds to understand the environment under study. its final goal was "to improve operational productivity, provide better service to students, and provide information which can enhance management planning and decision making."3 put to work, the methodology is straightforward and effective. from beginning to end, the a tt process took clemson slightly more than three months elapsed time. total work time (including all report writing) for library staff was approximately one thousand man hours. as the initial step with the a tt methodology, it was necessary to engage a sponsor and to select a team . for this study, the sponsor chosen was the dean of graduate studies, who reported directly to the vicepresident for academic affairs. in turn, the director of mmpnting and the director of the division of administrative programming services (daps) reported to the dean of graduate studies. although it was not critical that the sponsor be intimately involved in the project, his level of authority within the university administration would help to secure acceptance of the study's recommendation. the sponsor also provided cogent advice along the way, based upon his understanding of institutional resources, and he served as a communication link with other university administrative offices. 
the study team was chosen by the library administration with the intention of getting diverse involvement and expertise. library staff included the associate director, the head of circulation, the serials cataloger, and a reference librarian. although only the associate director brought significant experience in library automation development, the head of circulation contributed substantial practical experience with automation systems. the cataloger offered specifics of bibliographic problems, cataloging rule changes, and serials control issues, and the reference librarian contributed a comprehensive knowledge of informationretrieval concerns. outside staff included the director of daps, who furnished details on the clemson computing environment, and an ibm marketing representative, who provided appropriate help with hardware capabilities, the att metnodology, and legwork. in addition, clemson was also able to engage the help of a representative of ibm's education industry division to guide the a tt efforts on the basis of his experience in the use of the methodology. from time to time, other ibm and daps staff were involved in assisting with interviews and report writing. the associate director served as team chair in order to act as spokesperson, to coordinate team effort, and to edit the final report. methodology the application transfer team methodology is applied in six phases. ibm recommends that these phases be conducted sequentially, and that they last from five to sixteen weeks, depending on the size of the problem. throughout the process, verbal reviews were conducted by the team with the sponsor and with the library staff. the first phase involved an organizational session. following the introduction of team members, the ibm education industry division representative presented an overview of the methodology and explained the mechanics of the a tt study process. the team then established the scope of the study by choosing an application area on which to focus and by determining the general objectives of the final system to be implemented. since part of the purpose of the project was to develop a plan for librarywide automation, it was quickly recognized by the team that the application area should be an integrated library information system. however, the ibm representative suggested that this scope was too broad for the study and that one functional area such as communications 207 acquisitions be chosen, with other functions reserved for subsequent att studies. given time constraints, a compromise arrangement was made in which serials control was determined as the scope. since serials control is a single functional area, but encompasses nearly all bibliographic issues, it served as a microcosm of overall library operations. therefore, it was generally accepted that a plan that effectively accommodated serials would constitute an integrated system plan. the organizational phase continued by determining who to interview during the data-collections phase and by setting up an interview schedule. this phase was concluded by developing an outline of the final report and by assigning writing responsibilities to individual team members. the data-gathering effort constituted phase two. this involved structured interviews of representative staff of each unit of the library who were involved in routine interactions with any phase of serials control at clemson. 
interviews were conducted with staff from acquisitions, cataloging, circulation, reference units, and branch libraries as well as the university business office, students, and faculty. following an outline in the att, each person interviewed was asked for specific details of his work with serial publications regarding (1) interfaces (or points of interaction), (2) concerns or needs, (3) suggested improvements, (4) expected values or benefits of improvements, (5) work volume, and (6) cycles. data gathered in each of these interview sessions were immediately documented in a letter to the interviewees. these letters were reviewed by those interviewed for corrections and added detail. data from completed and documented interviews were consolidated during the third phase of the study into a matrix of each of the six questions plotted against operational areas of the library, graphically designating areas of the greatest concern to the largest part of the library. this composite was analyzed to separate problems that could be reasonably handled by an integrated automation system from those that needed the attention of administrative policy and direction. functions for automation consideration were then examined in a "blue sky" session of the committee to envision what system would accommodate the specifications for serials control and access that each library unit and serials user required. from this session a synthesis emerged of the architecture for an integrated system.4 this architecture included a description of the basic relationships of functional modules of the system, a list of the various files needed to contain system information, and a list of data elements required for bibliographic holdings, acquisition, and patron records in the system database. phase four called for the translation of the architecture and general system requirements into modules on basic access, acquisition or processing functions, and into the individual programs needed to execute each module. the team divided into two parts. the ibm and daps personnel, with the associate director, listed the modules and programs and formulated descriptions of each. part of the description effort involved drafting approximate flowcharts of each program. using algorithms developed by ibm, these descriptions were used to assign estimates of person hours required to create the necessary modules. in order to determine the overall cost of system development the person-hour figures were converted to dollars using an average hourly cost for clemson daps personnel. committee members not involved in program/module design formed a group to evaluate anticipated benefits defined in the interviews, to collect data from library staff to support these expectations, and to assign a value to them. benefits from reduced file maintenance, processing, and tracking time were valued as person hours saved by the new system. additional improvements were projected for the system's capability for better fund control, more complete and immediate on-order, claiming, and in-process information, and statistical collection development/use data. these benefits were assigned the value of estimated duplicate and inappropriate material acquired under the present system. a value was not assigned to user benefits. faculty and student satisfaction is intangible, and variable from case to case. enhanced user service was recognized as a substantial benefit of the proposed system, but was not quantified.
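the arithmetic behind phase four, and the consolidation that follows in phase five, is straightforward and can be stated compactly. a sketch, in which every input is an assumption supplied by the planning team rather than a figure taken from the study:

    def cost_benefit_summary(module_hours, hourly_cost,
                             hours_saved_per_year, purchases_avoided_per_year):
        # development cost: estimated person hours per module times average hourly cost
        development_cost = sum(module_hours.values()) * hourly_cost
        # annual benefit: staff hours saved, valued at the same hourly rate, plus the
        # value of duplicate or inappropriate purchases avoided
        annual_benefit = (hours_saved_per_year * hourly_cost
                          + purchases_avoided_per_year)
        return {"development cost": development_cost,
                "annual benefit": annual_benefit}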
the cost factors determined in phase four were consolidated with derived benefit values to form a cost/benefit analysis, which constituted phase five. in the sixth and final phase an implementation plan was formulated. this plan, along with recommended target dates, was presented orally to library staff and university administration. in addition, the entire process, recommendations, and plan of action were documented in a written report. 5 results within the a tt report were a description of the current library environment, objectives and description of the proposed system, implementation considerations, a cost/benefits analysis, and recommendations for a plan of action. although care was taken to "walk through" the function of each module of the described system, the report was not intended to provide detailed computer program specifications ready to be coded by a programmer. it described a useful and powerful integrated serials system in sufficient detail to be a working tool in the hands of a knowledgeable systems analyst to match (or revise) already available systems and programs to the library's specifications. the report itself also served as an effective communication link with the university administration, setting out library concerns and giving rational solutions to the pervasive problem of serials control and, in the long term, to an integrated library information system. the timing of the a tt study was fortunate for the clemson library. the university was on the eve of an accreditation selfstudy. as often happens with the examination of any organization, a host of related, but unacknowledged , problems surfaced in the course of the att study. during the interviews, staff members felt free to bring up matters of unclear policies, misunderstood hierarchical arrangements, and staffing inadequacies throughout the library. the number and importance of nonautomation concerns was significant enough that an administrative report was written to articulate these problems to the university administration. 6 it is interesting to note also that, while in every instance the team received enthusiastic cooperation from all those interviewed , there was fear among some staff members that any automation project would necessarily cut staff positions. once this worry was identified, the study team was able to allay those fears by explaining the study's purpose. one of the greatest contributions of the att study has been the direction it has given the library for future goals and priorities. by focusing on the problems of serials control, the team evaluated a microcosm of library problems. investigating these problems in the environment of more limited budgets, possible future closing or freezing of the card catalog, and increased user demands for services has helped the library develop a course of action, a resolve of mission, and a direction for future growth. the staff of daps and the library are conducting a review of existing software and systems potentially appropriate for a comprehensive serials control system. the att study was the tool successfully used to elicit university support for library automation. the university has given its approval, and supplied funding, to proceed with the determination of available systems and with the development of a request for quotation. communications 209 references l. for example: university of rochester, river campus libraries, task force on access systems, report (rochester, n.y.: univ. 
of rochester, 1980), university of california, berkeley, general library, committee on bibliographic control, future of the general library catalogs of the university of california at berkeley (berkeley: univ. of california, 1977); pennsylvania state university libraries, systems development department, remote catalog access system: general system specifications (university park: pennsylvania state univ., 1977). 2. association of research libraries, office of management studies, annual report, 1979 (was hington, d.c.: the association, 1979). 3. international business machines corporation, application transfer teams: application description (white plains, n.y.: the corporation, 1977), p.1 ; international business machines corporation, application transfer teams: realizing your computing syste1ns' potential (white plains, n.y.: the corporation, 1977). 4. inte rnational busine'is machines corporation , business systems planning: information systems planning guide (white plains, n.y.: the corporation, 1975), p.49. 5. richard w. meyer and others, total integrated library information system: a report on the general design phase (syracuse, n.y.: eric clearinghouse on information resources, 1980), ed 191446. 6. richard w. meyer, cooper library: status and agenda. a report on fy 1979-80 (clemson, s.c.: clemson univ., 1980). lib-s-mocs-kmc364-20140601051731 58 book reviews descriptive cataloguing; a student's introduction to the anglo-american cataloguing rules 1967. by james a. tait and douglas anderson. second ed.; rev. and enl. hamden, conn.: linnet books, 1971, 122p. $5.00 this second edition contains some corrections to the errors made in the 1968 edition, and includes the changes and clarifications brought out by the aacr amendment bulletin. the number of exemplary title pages has been increased from twenty-five to forty, thus giving the student more practice in determining entries and doing descriptive cataloging. this reviewer believes that a more exact title would be "descriptive cataloging and determining entries and headings," because this introductory text not only covers descriptive cataloging as defined and explained in "part iidescriptive cataloging" of the anglo-american cataloguing rules, but also includes some of the basic rules for determining entries and headings in aacr's "part !-entry and heading." there are three distinct sections: descriptive cataloging; determining entries and headings; and facsimile title pages for student practice. descriptive cataloging is covered in just thirteen pages, but all the basic elements are there. the explanations are clear and examples are shown, but not in the context of a full card. (unfortunately only one full catalog card is illustrated in the entire book.) it is in this section, more than in any other, where the differences between british and american cataloging become obvious. british descriptive cataloging varies in so many ways from its american counterpart that a beginning student in an american library school would be quite confused by these variations. the next section consists of twenty-five pages and is devoted to the basic rules on entries and headings. examples are used to illustrate the rules and the authors point out some differences between the british and american texts of the aacr. the remaining seventy pages contain the forty reproduced title pages which are followed by some commentary and a key corresponding to each title page. 
these title pages give the student a wide range of experience in transcribing the proper information onto the card and in determining main and added entries. even though this book is an excellent introduction to the rudiments of descriptive cataloging and the determination of main and added entries, its use of british descriptive cataloging precludes its being widely adopted in beginning cataloging courses in american library schools.
donald j. lehnus
centralized processing for academic libraries. by richard m. dougherty and joan m. maier. metuchen, n.j.: scarecrow press, 1971. 254p. $10.00
this is the final report of the colorado academic libraries book processing center (calbpc) two-part study investigating centralized processing. phase i, reported by laurence leonard, maier, and dougherty in centralized book processing, scarecrow, 1969, was basically a feasibility study, whereas this final report describes the beginning six months of operations that tested the phase i recommendations. partially funded by the national science foundation, the experiment measured anticipated time and cost savings, monitored acquisitions and cataloging operations, and tested product acceptability for six libraries participating in the 1969 six-month study. even though centralized book processing might hold little appeal for the reader, this volume nonetheless is valuable to technical service heads because of its above average sophistication in applying a systems analysis approach to technical services problems. the authors objectively report their findings, outlining in detail the mistakes, the unanticipated problem areas, and what they believed to be the successes. from the start the authors encountered problems with scheduling. by the time the experiment began most participants had a large portion of their book money encumbered, and the center was forced to accept cataloging arrearages in addition to book order requests. those who did send in orders did not conform to patterns predicted in phase i. instead, the center was used as a source of obtaining more difficult materials, including foreign language items. it was discovered that in actual practice calbpc had no impact on discounts received from vendors. the vendor performance study lacked relevancy because it was based upon the date invoices were cleared for payment rather than the date books were received in house. in evaluating the total processing time, four libraries reduced their time lag by participating in the center's centralized processing, and the cost of processing the average book was reduced from $3.10 to $2.63. the product acceptance study showed that the physical processing was only partially accepted with most of the libraries modifying a truncated title that was printed on the book card and book pocket as a by-product of the automated financial subsystem. other local modifications were made on books processed by the center but that cost or local error correction costs were not reported in the study. calbpc's automated financial subsystem was besieged with many problems resulting from lack of programming foresight and adequate consulting by those who had previously designed such systems. individuals interested in the automation of acquisitions should read this section of the report. calbpc's problems were typically those of building exceptions to exceptions in order to accommodate unanticipated program omissions.
simply not recognizing that books could be processed before invoices were paid caused delays and bottlenecks of such magnitude that procedures had to be devised to circumvent requirements of the automated subsystem. many recommendations were particularly relevant to cooperative ventures. in formulating processing specifications such as call number format and abbreviation standardization, calbpc had not anticipated the infinite local variations they would have to accommodate. they quickly recognized the need for both greater quality control to minimize errors within the system and better communications and educational programs for participants. a reoccurring message was that librarians emphasized the esthetics of catalog cards rather than the content, thus a recommendation was made to investigate whether a positive correlation exists between the esthetics of the product and the quality of the library service. the authors emphasized that a cooperative program depends more upon competencies and willingness of individuals than the technical aspects of the operations. some diversification of services was called for but no mention was made of the possibilities of an on-line system. it was felt that in future operations the center should accept orders for out-of-print and audiovisual materials. those libraries participating in approval programs had received no benefit by having books sent first to the center, thus it was suggested that the center forward those libraries a bibliographic packet only and that the approval books bypass the center. this well-documented study, half of which is devoted to charts and appendix materials, concluded its recommendations with a positive evaluation of the service the center had performed and suggested that public and school libraries should also be participants. 
ann allan mccrory
orthographic error patterns of author names in catalog searches
renata tagliacozzo, manfred kochen, and lawrence rosenberg: mental health research institute, the university of michigan, ann arbor, michigan
an investigation of error patterns in author names based on data from a survey of library catalog searches. position of spelling errors was noted and related to length of name. probability of a name having a spelling error was found to increase with length of name. nearly half of the spelling mistakes were replacement errors; following, in order of decreasing frequency, were omission, addition, and transposition errors.
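the four error categories named in the abstract can be made concrete with a small classifier. a sketch, assuming the cited form of a name differs from the correct form by a single error of one of these kinds:

    def classify_error(correct, cited):
        # returns "replacement", "omission", "addition", or "transposition",
        # or None when the names match or differ by more than one such error
        if correct == cited:
            return None
        lc, li = len(correct), len(cited)
        if lc == li:
            diffs = [k for k in range(lc) if correct[k] != cited[k]]
            if len(diffs) == 1:
                return "replacement"
            if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                    and correct[diffs[0]] == cited[diffs[1]]
                    and correct[diffs[1]] == cited[diffs[0]]):
                return "transposition"
        elif li == lc - 1:
            # cited name omits one letter of the correct form
            for k in range(lc):
                if correct[:k] + correct[k + 1:] == cited:
                    return "omission"
        elif li == lc + 1:
            # cited name carries one extra letter
            for k in range(li):
                if cited[:k] + cited[k + 1:] == correct:
                    return "addition"
        return None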
computer-based catalog searching may fail if a searcher provides an author or title which does not match with the required exactitude the corresponding computer-stored catalog entry (1). in designing computer aids to catalog searching, it is important to build in safety features that decrease sensitivity to minor errors. for example, compression coding techniques may be used to minimize the effects of spelling errors on retrieval (2, 3, 4). preliminary to the design of good protection devices, the application of error-correction coding theory (5, 6, 7) and data on error patterns in actual catalog searches (8, 9) may be helpful. a recent survey of catalog use at three university libraries yielded some data of the above-mentioned kind (10). the aim of this paper is to present and analyze those results of the survey which bear on questions of error control in searching a computer-stored catalog. in the survey, users were interviewed at random as they approached the catalog. of the 2167 users interviewed, 1489 were searching the catalog for a particular item ("known-item searches"). of these, 67.9% first entered the catalog with an author's or editor's name, 26.2% with a title, and 5.9% with a subject heading. approximately half the searchers had a written citation, while half relied on memory for the relevant information. paradoxically, though most known-item searchers tried to match primarily an author and only secondarily a title, there were in the sample of searches many more cases of exact title citation than of exact author citation.
imperfect recall of author name
of the 1489 "known-item" searches, 1356 could be verified against the actual item. from the total number of searches (1260) in which the catalog user had provided an author's (or editor's) name, those works were subtracted which did not have a personal authorship (208) or had multiple authors or multiple editors (127). this left 925 searches, of which 470 had complete and correct author entries, while 455 contained various degrees of imperfection in the author citation. table 1 gives the distribution of incorrect and/or incomplete author citations. in the study an author's name was defined as incomplete when the first name, or the two initials, or one out of two initials was missing.

table 1. incorrect and/or incomplete author names
                                      categories
university of michigan libraries     i      ii     iii    total
general library                     144     25      6     175
undergraduate library                94     35      4     133
medical library                     110     27     10     147
total                               348     87     20     455

in category i (the most numerous) the author's last name was correct, but the author citation as a whole was either incomplete or incorrect; i.e., there were mistakes and/or omissions in the first and middle name or initials. most of the searches in category i were incomplete rather than incorrect. since in category i there is nothing wrong with the author's last name, the searcher's ability to gain access to the right location in the catalog is presumably not impaired as long as the last name is not too common. once the searcher has entered the catalog, he will make use of other clues, such as title or knowledge of the topic, to identify the right item. but if the name is smith or brown or johnson, and the catalog is a large one, to have an incomplete author's name may be equivalent to having no name at all. (in the university of michigan general library catalog, which contains over four million cards, the entry "smith" extends over eight drawers, and the entries "brown" and "johnson" over four drawers each.)
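the "compression coding" safeguard mentioned at the opening of this article can be illustrated with soundex, one widely known coding of this family (the techniques treated in references 2-4 differ in detail). a sketch of a simplified soundex key:

    _SOUNDEX = {**{c: "1" for c in "bfpv"},
                **{c: "2" for c in "cgjkqsxz"},
                **{c: "3" for c in "dt"},
                "l": "4",
                **{c: "5" for c in "mn"},
                "r": "6"}

    def soundex(name):
        # collapse a name to its first letter plus up to three digits, so that
        # similarly spelled (and similar-sounding) names share one key
        letters = [c for c in name.lower() if c.isalpha()]
        if not letters:
            return ""
        first, rest = letters[0], letters[1:]
        digits = []
        prev = _SOUNDEX.get(first, "")
        for c in rest:
            code = _SOUNDEX.get(c, "")
            if code and code != prev:
                digits.append(code)
            if c not in "hw":       # vowels reset the run of repeats; h and w do not
                prev = code
        return (first.upper() + "".join(digits) + "000")[:4]

names that differ by a minor slip, such as a doubled or dropped letter, frequently collapse to the same key, so a file keyed on the code rather than on the exact spelling is less sensitive to the kinds of errors analyzed below, though it cannot absorb every error.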
in an automated catalog it is easy to limit the set of entries from which the right item has to be selected by intersecting the last name of the author with some other clues. incompleteness of the author name may then not be a serious handicap.

category iii includes all searches in which the searcher had an author that turned out to be wrong. the error in this case was not in incompleteness or misspelling of the author's name, but in the identity of the author. no further analysis of this group was conducted. category ii is the one which forms the object of the present report. the analysis concerns mainly position and type of errors, and the incidence of errors as related to name length.

position of errors in author names

the location of errors in the author citation is important for manual systems, such as traditional library card catalogs, as well as for automated systems. table 2 shows the distribution of e in the sample of incorrect author citations from all three libraries, where e is the position of the letter, counting from left to right, in which an error appeared. in the fourteen cases in which more than one error occurred in the same name, only the first error was considered. in a few cases the error involved a string of letters (e.g., friedman for friedberg). in such cases the position of the first letter of the string determined the location of the error.

table 2. position of error in last name of author

   e     no.      %    cumulative %
   1      2      2.3        2.3
   2     11     12.6       14.9
   3     11     12.6       27.6
   4     19     21.8       49.4
   5     13     14.9       64.4
   6     12     13.8       78.2
   7      7      8.0       86.2
   8      6      6.9       93.1
   9      3      3.4       96.6
  10      2      2.3       98.9
  11      1      1.1      100.0
total    87

table 2 shows that about half the incorrect author names had errors in one of the first four letters, while the other half had errors in one of the following letters, from the fifth to the eleventh position. the most frequently misspelled is the fourth letter, which is responsible for 21.8% of the total number of errors occurring in the sample. the ordinal number indicating the position of the error is not, by itself, a sufficient indicator of the area where the error occurred. an error in the third letter, for instance, is close to the beginning of the name if the name is 9 letters long, but close to the end if the name is 4 letters long. in table 3, l indicates the length (the number of letters) of the author name and pe the location of the error, i.e., the position of the first letter, counting from left to right, where an error appears. the incorrect author names of the sample (87) have a length of between 3 and 12 letters. the column on the right of the table, el, indicates the distribution of names of a given length. the row at the bottom of the table gives the distribution of errors occurring in a given position. mistakes are shown to occur anywhere from the first letter to the eleventh letter. when the error consists in the addition of a letter to the end of the correct name, pe is beyond the name itself. the figures which appear next to the diagonal line, on the right, indicate mistakes of this sort.

table 3. position of error (pe) vs. length of name (l). [cross-tabulation of error position (columns 1-11) against name length (rows 3-12) for the 87 incorrect names; the column totals are those given in table 2, and the row totals (el) are the frequencies of incorrect names of each length given in table 4.]
a summary inspection of the table produces the impression that errors are clustered toward the end of the names, or at least that they are more prevalent in the second half of the name than in the first half. this seems to be a direct consequence of the fact that the first column of the table (errors in position 1) is almost empty. it is tempting to say that errors very rarely occur in the first letter of a proper name. but is this really so? it is true that english-speaking people place particular emphasis on initials, to the extent that initials are often sufficient for identifying well-known figures. the special attention given to the first letter of a name would certainly contribute to the scarcity of errors in such a letter. but it is also possible that when errors in the first letter occur, they so transform the name that it becomes unrecognizable. several such authors may have ended up in the category of non-verified authors necessarily excluded from the analysis. it would be interesting to verify whether the "serial-position effect" that some authors found in the spelling of common nouns is present also in the spelling of proper names. according to jensen and to kooi et al., the distribution of spelling errors in relation to letter position closely approximates the serial-position curve for errors found in serial rote learning (11, 12). to ascertain if this is the case for author names, a data base much larger than that used for this study would be needed.

distribution of errors and length of names

is the probability of a catalog searcher misspelling the name of an author dependent to any extent on the length of the name? table 3 shows the frequency of occurrence of names of a given length in the 87 misspelled names (column el). the next step was to calculate the distribution of the length of author names in the whole group of verified author citations provided by the catalog searchers. this group, it should be remembered, does not include multiple authors, multiple editors or nonpersonal authors. the ratio of the corresponding figures in the two distributions will give the percentage of names of a given length having spelling mistakes (table 4).

table 4. probability of errors in recall of author names of a given length

length of name    frequency of incorrect names    frequency of all names    percentage of incorrect names
      2                        -                             1                          -
      3                        1                             9                        11.1%
      4                        5                            87                         5.7%
      5                        7                           169                         4.1%
      6                       21                           215                         9.8%
      7                       19                           191                         9.9%
      8                       16                           127                        12.6%
      9                        8                            59                        13.6%
     10                        7                            36                        19.4%
     11                        2                            26                         7.7%
     12                        1                             5                        20.0%
   total                      87                           925

(grouped: short names, 3-5 letters, 4.9%; medium-length names, 6-8 letters, 10.5%; long names, 9-12 letters, 14.3%)

there is an observable trend toward an increase of mistakes with length of name. of course, the two extremes of the length distribution are scarcely represented, and this is probably responsible for inconsistencies in the percentage distribution. grouping names into three length categories (i.e., short names, middle-length names, and long names) makes the differences in percentages of incorrect names more apparent. the differences are significant at the .01 level of confidence.
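the grouped percentages in table 4 follow directly from its two frequency columns. a minimal sketch of the computation (the figures are those of table 4; the three length groups are the authors' own):

# frequency of misspelled names and of all verified names, by name length (table 4)
incorrect = {3: 1, 4: 5, 5: 7, 6: 21, 7: 19, 8: 16, 9: 8, 10: 7, 11: 2, 12: 1}
all_names = {2: 1, 3: 9, 4: 87, 5: 169, 6: 215, 7: 191, 8: 127,
             9: 59, 10: 36, 11: 26, 12: 5}

groups = {"short (3-5 letters)": range(3, 6),
          "medium (6-8 letters)": range(6, 9),
          "long (9-12 letters)": range(9, 13)}

for label, lengths in groups.items():
    bad = sum(incorrect.get(l, 0) for l in lengths)
    total = sum(all_names.get(l, 0) for l in lengths)
    print(f"{label}: {bad}/{total} = {100 * bad / total:.1f}% incorrect")

# output: short 13/265 = 4.9%, medium 56/533 = 10.5%, long 18/126 = 14.3%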
type of error in author names

errors which occurred in the spelling of the last names of authors were grouped into four broad categories: replacement errors, omission errors, addition errors, and transposition errors. while it is true, especially in badly mangled words, that an error can often be said to be of any of several types, it was generally easy to identify the simplest necessary transformation of the letters, and to assign the incorrect name to the type of error corresponding to that kind of transformation. in some cases this meant adding a string of letters or replacing one string by another. altogether the sample of 87 incorrect authors contained 104 errors. eleven names exhibited two errors each, three had three errors, and the remaining names just one error.

of the 104 errors, 50 were replacement errors; these are cases in which one letter or string of letters of the correct name has been replaced by a different letter or string of letters (e.g., hoiser for hoijer, friedman for friedberg). the most common replacement errors appear in table 5, in order of decreasing frequency.

table 5. single-letter replacement errors

no. of errors    correct letter    incorrect letter
      6                o           a, a, a, a, p, r
      5                            a, e, y, y, y
      4                y           a, i, u, z
      3                a           i, o, o
      3                s           c, r, z
      3                v           b, f, w
      2                e           i, o
      2                g           c, r
     28

not included in the table are the 10 letters which were each replaced just once and the 12 strings of letters. in four cases, the replaced letter was the second of a double letter.

there were 34 omission errors in all. four of these involved a string of letters; all the rest were single-letter omissions. eleven single-letter omissions occurred in the last letter of the name (e.g., abbot instead of abbott), and 19 in the middle of the name (e.g., brent instead of brendt). table 6 gives the frequency distribution of the omitted letters. the asterisk indicates that the omitted letter was the second of a double letter.

table 6. single-letter omission errors. [frequency of omitted letters, by middle vs. final position, for the 30 single-letter omissions: e was omitted 8 times, a and t 4 times each, n 3 times, h, i, l, and s twice each, and c, d, and r once each.]

addition errors totaled 18. in one case the addition consisted of a string of letters, while in the others only one letter was added. addition errors can occur in the middle of a name (e.g., berelison for berelson) or at the end of it (e.g., haller for halle). in the latter case, the added letter is found beyond the last letter of the correct name (these were the errors on the right of the diagonal in table 3). the distribution of addition errors is shown in table 7. the asterisk indicates that the added letter duplicated the previous letter.

table 7. single-letter addition errors. [frequency of added letters, by middle vs. final position, for the 17 single-letter additions: s was added 5 times, c, e, and i twice each, and a, f, l, m, n, and z once each.]

there were two transposition errors: ie for ei and ai for ia. in cases of second and third errors in the name, there were five replacement errors, seven omission errors, and five addition errors.
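for single-letter mistakes, the "simplest necessary transformation" can be identified mechanically. a minimal sketch of such a classifier, handling only single-character edits and ignoring the string-level cases discussed above (the name pairs are the examples quoted in the text):

def classify(correct: str, written: str) -> str:
    """return the simplest single-character edit turning `correct`
    into `written`: replacement, omission, addition, or transposition."""
    if len(correct) == len(written):
        diffs = [i for i, (a, b) in enumerate(zip(correct, written)) if a != b]
        if len(diffs) == 1:
            return f"replacement at position {diffs[0] + 1}"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and correct[diffs[0]] == written[diffs[1]]
                and correct[diffs[1]] == written[diffs[0]]):
            return f"transposition at position {diffs[0] + 1}"
    if len(correct) == len(written) + 1:          # one letter dropped
        for i in range(len(correct)):
            if correct[:i] + correct[i + 1:] == written:
                return f"omission at position {i + 1}"
    if len(written) == len(correct) + 1:          # one letter added
        for i in range(len(written)):
            if written[:i] + written[i + 1:] == correct:
                return f"addition at position {i + 1}"
    return "more than one simple edit"

for correct, written in [("hoijer", "hoiser"), ("abbott", "abbot"),
                         ("brendt", "brent"), ("halle", "haller"),
                         ("berelson", "berelison")]:
    print(written, "->", classify(correct, written))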
table 8 summarizes the types of errors encountered in the sample of incorrect author names. figures in this table include strings as well as single letters, and second and third errors as well as first errors.

table 8. distribution of types of errors

                        middle position    final position    total
replacement errors            44                  6            50
omission errors               21                 13            34
addition errors               10                  8            18
transposition errors           2                  -             2
total                                                         104

conclusion

four trends could be observed:

1) vowels usually replaced vowels, and consonants usually replaced consonants. apparently the probability of misspelling a single letter was slightly higher for vowels than for consonants. with the latter, there is some indication that the substitution was guided by phonetic similarity (e.g., "v" is replaced by "b", or "f", or "w").

2) most omissions in which the correct name had a double letter occurred at the end of the word.

3) replacement errors tended to come earlier in words than did omissions and additions. (this is not due to the fact that addition and omission errors contained a disproportionately high number of final errors; even when these final errors are excluded, replacement errors still come earlier than other types.)

4) second and third errors in a name have comparatively few replacement errors.

acknowledgment

this work was supported in part by the national science foundation, grant gn 716.

references

1. kilgour, f. g.: "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science, 5 (1968), 133-136.
2. nugent, william r.: "compression word coding techniques for information retrieval," journal of library automation, 1 (december 1968), 250-260.
3. ruecking, frederick h., jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-238.
4. dolby, james l.: "an algorithm for noisy matches in catalog searching." in: a study of the organization and search of bibliographic holdings records in on-line computer systems: phase i (berkeley, cal.: institute of library research, university of california, march 1969), 119-136.
5. peterson, william w.: error correcting codes (new york: wiley, 1961).
6. alberga, cyril n.: "string similarity and misspellings," communications of the acm, 10 (1967), 302-313.
7. galli, enrico j.; yamada, hisao m.: "experimental studies in computer-assisted correction of unorthographic text," ieee transactions on engineering writing and speech, ews-11 (august 1968), 75-84.
8. tagliacozzo, r., et al.: "patterns of searching in library catalogs." in: integrative mechanisms in literature growth. vol. iv (university of michigan, mental health research institute, january 1970). report to the national science foundation, gn 716.
9. university of chicago graduate library school: requirements study for future catalogs (chicago: university of chicago graduate library school, 1968).
10. tagliacozzo, renata; rosenberg, lawrence; kochen, manfred: access and recognition: from users' data to catalog entries (ann arbor, mich.: the university of michigan, mental health research institute, october 1969, communication no. 257).
11. jensen, arthur r.: "spelling errors and the serial-position effect," journal of educational psychology, 53 (june 1962), 105-109.
12. kooi, beverly y.; schutz, richard e.; baker, robert l.: "spelling errors and the serial-position effect," journal of educational psychology, 56 (1965), 334-336.

president's message

cindi trainor

information technologies and libraries | september 2013

it's fall already, and for lita that means some exciting things!
the program planning committee is hard at work evaluating the sixty-plus program proposals that have been submitted. the new conference format specified by ala means that we have 20 slots for lita programs at annual, including the president's program and top tech trends. the ppc certainly has its work cut out!

well before midwinter in philadelphia will be national forum 2013. i'm so excited that this year's forum is going to be in my home state of kentucky. join us from november 7-10 in "luhvuhl" (louisville) for great keynotes, preconference workshops, concurrent sessions, and of course, networking opportunities. travis good from make magazine will deliver our opening keynote, nate hill from chattanooga public library is up on saturday, and emily gore from the digital public library of america closes out the forum on sunday. i hope you'll join us for an exciting forum in the bluegrass state.

in governance news, the board of directors identified three goal areas on which we will concentrate this year: stabilizing the budget, engaging members, and growing membership. we are eagerly awaiting the final report of the financial stability task force, led by tom wilson and andrew pace; they presented preliminary findings at the board meeting in chicago in july. we are currently updating the strategic plan, which was last updated in 2010; watch the lita-l list and the blog for more. we would love your input on them!

if you're interested in lita governance, check out the board's space on ala connect. many discussion posts are open, and we welcome your comments! you can also complete this form (http://www.ala.org/lita/about/board/contact) to reach out to the board anytime. the board will be having several meetings this fall: the executive committee is tentatively meeting september 30; the forum steering committee will meet after forum 2013; the budget review committee will be meeting at the ala joint boards meeting in late october; and the entire board will have an online meeting before we convene in person in philadelphia. watch lita-l for the announcements; we welcome you as guests to all our meetings.

finally, for those of you interested in leadership but not necessarily ready to run for board, i want to point you to documents put together by the lita emerging leaders team in 2013 (http://connect.ala.org/node/197839). our team of three ala emerging leaders, margaret heller, zach coble, and katie heidgerken-greene, surveyed lita leaders, worked with committee chairs coordinator michelle frisque and ig chairs coordinator paul keith, and synthesized tons of information and many documents into the leadership guide for new chairs of committees and interest groups (http://connect.ala.org/node/209032). they also created a sample leadership game (http://www.gloriousgeneralist.com/leadership.html) to test our leadership knowledge, and presented their work at the emerging leaders poster session in chicago. the leadership guide will inform future orientation activities for committee and ig chairs and will then be handed over to the bylaws & organization committee to be incorporated into the lita manual.

thank you for being a lita member!
i hope to see you at a future event. my fellow board members and i welcome your comments and suggestions--for program ideas, workshop or online class ideas, and for how we can keep lita awesome. :)

cindi trainor (cindiann@gmail.com) is lita president 2013-14 and community specialist & trainer for springshare, llc.

history of library computerization

frederick g. kilgour: director, ohio college library center, columbus, ohio

the history of library computerization from its initiation in 1954 to 1970 is described. approximately the first half of the period was devoted to computerization of user-oriented subject information retrieval and the second half to library-oriented procedures. at the end of the period on-line systems were being designed and activated.

this historical scrutiny seeks the origins of library computerization and traces its development through innovative applications. the principal evolutionary steps following upon a major application are also depicted. the investigation is not confined to library-oriented computerization, for it examines mechanization of the use of library tools as well; indeed, the first half-dozen years of library computerization were devoted only to user applications.

the study reveals two major trends in library computerization. first, there are those applications designed primarily to benefit the user, although few, if any, applications have but one goal. the earliest such applications were machine searches of subject indexes employing post-coordination of uniterms. nearly a decade later, the first of the bookform catalogs appeared that made catalog information far more widely available to users than do card catalogs. finally, networks are under development that have as their objective availability of regional resources to individual users. the second trend is employment of computers to perform repetitive, routine library tasks, such as catalog production, order and accounting procedures, serials control, and circulation control. this type of mechanization is extremely important as a first step toward an increasingly productive library technology, which must be an ultimate goal if libraries are to be economically viable in the future (1, 2).

historical studies of library computerization have not yet appeared, although some reports beginning with that of l. r. bunnow (3) in 1960 contain valuable literature reviews. both editions of literature on information retrieval and machine translation by c. f. balz and r. h. stanwood (4, 5) are extremely useful. in addition, j. a. speer's libraries and automation (6) is a valuable, retrospective bibliography of over three thousand entries.

origins

the origins of library computerization were in engineering libraries newly established in the 1950's and employing the uniterm coordinate indexing techniques of mortimer taube on collections of report literature. the technique of post-coordination of simple index terms proved most suitable for computerization, particularly when the size of a file caused manual manipulation to become cumbersome. harley e. tillitt presented the first report, albeit unpublished at the time, on library computerization at the u.s. naval ordnance test station (nots), now the naval weapons center at china lake, california. the report, entitled "an experiment in information searching with the 701 calculator" (7), was given at an ibm computation seminar at endicott, new york, in may 1954.
the system was extended and improved in 1956, and a published report appeared in 1957 (8). tillitt subsequently published an evaluation (9). the nots system mimicked manual use of a uniterm card file. this noteworthy system could add new information, delete information related to discarded documents, match search requests against the master file, and produce a printout of document numbers selected. search requests were run in batches, thereby producing inevitable delays that caused user dissatisfaction. when the user did receive results of his search, he had a host of document numbers that he had to take to a shelf list file to obtain titles. subsequent system designers also found that a computerized system could cause user dissatisfaction if it did not speed up and make more thorough practically all tasks. because use of the system dwindled, it was not reprogrammed for an ibm 704 that replaced the 701 in 1957. however, a couple of years later, when an ibm 709 became available, the system was reprogrammed and improved so that the user received a list of document titles (10). tillitt, bracken, and their colleagues deserve much credit for their pioneer computerization of a subject information retrieval system. the application required considerable ingenuity, for the ibm 701 did not have built-in character representation. therefore it was necessary to develop subroutines that simulated character representation (11). moreover, the 701 had an unreliable electrostatic core memory. on some machines the mean time between failures was less than twenty minutes (12).

in september 1958, general electric's aircraft gas turbine division at evendale, ohio, initiated a system on an ibm 704 computer (13) that was similar to the nots application. mortimer taube and c. d. gull had installed a uniterm index system at evendale in 1953 (14, 15). the ge system was an improvement over the then-existing nots system because it printed out author and title information for a report selected, as well as an abstract of the report. like the nots system, however, the ge application provided only for boolean "and" search logic.

the celebrated medlars system (16) encompassed the first major departure in machine citation searching. the original medlars had two principal products: 1) composition of index medicus; and 2) machine searching of a huge file of journal article citations for production of recurrent or on-demand bibliographies. the system became operational in 1964. the nots and ge systems coordinated document numbers as listed under descriptors. medlars departed from this technique by searching a compressed citation file in which each citation had its descriptors or subject headings associated with it. the medlars system also provides for boolean "and," "or," and "not" search logic. the next major development was dialog (17), an on-line system for machine subject searching of the nasa report file. queries were entered from remote terminals.

the suny biomedical communication network constitutes an important development in operation of machine subject searching and production of subject bibliographies of traditional library materials. the suny network went into operation in the autumn of 1968 with nine participating libraries (18). its principal innovation is on-line searches from remote terminals of the medlars journal article file to which book references have been added. the suny network eliminates the two major dissatisfactions with the nots system and all subsequent batch systems, in that it provides the user with an immediate reply to his search query.
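the post-coordination underlying the nots, ge, and medlars searches amounts to set operations on the document numbers listed under each descriptor. a minimal sketch, assuming a small in-memory inverted file; the descriptors and document numbers are hypothetical and do not come from those systems.

# inverted file: uniterm descriptor -> set of document numbers (hypothetical)
index = {
    "propulsion": {101, 104, 110, 230},
    "turbine":    {104, 110, 512},
    "corrosion":  {110, 230, 615},
}

def search(all_of=(), any_of=(), none_of=()):
    """post-coordinate descriptors with boolean 'and', 'or', and 'not'."""
    universe = set().union(*index.values())
    hits = universe.copy()
    for term in all_of:                       # boolean 'and'
        hits &= index.get(term, set())
    if any_of:                                # boolean 'or'
        hits &= set().union(*(index.get(t, set()) for t in any_of))
    for term in none_of:                      # boolean 'not'
        hits -= index.get(term, set())
    return sorted(hits)

print(search(all_of=["propulsion", "turbine"]))               # [104, 110]
print(search(all_of=["propulsion"], none_of=["corrosion"]))   # [101, 104]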
catalog production

in 1960, l. r. bunnow prepared a report for the douglas aircraft company (3) in which he recommended a computerized retrieval system like the nots and ge systems that would also include catalog card production. bunnow's proposal was perhaps the first to contain the concept of production of a single machine readable record from which multiple products could be obtained, such as printed catalog cards and subject bibliographies produced by machine searching. catalog card production began in may 1961 (19), the cards having a somewhat unconventional format and being printed all in upper-case characters as shown in figure 1. cards were mechanically arranged in packs for individual catalogs, and alphabetized within packs, an early sophistication.

[figure 1. sample catalog card: an upper-case card for a douglas aircraft company status report (g. w. koriagin and l. r. bunnow, january 1962), carrying the uniterm descriptors information retrieval, libraries, computer, searching, ibm 7090, and ibm 1401.]

accompanying the production of catalog cards was production of accession lists from the same machine readable data. the next development in catalog card production occurred at the air force cambridge research laboratory library, which began to produce cards mechanically in upper- and lower-case in 1963 (20). a special computer-like device called a crossfiler manipulated a single machine readable cataloging record on paper tape to produce a complete set of card images punched on paper tape. this paper-tape product drove a friden flexowriter that mechanically typed the cards in upper- and lower-case. two years later, yale began to produce catalog cards in upper- and lower-case directly on a high-speed computer printer (21). the yale cards were also arranged in packs, as had been those at douglas, but were not alphabetized within packs.

the new england library information network, nelinet, demonstrated in a pilot operation in 1968 a batch processing technique servicing requests from new england state university libraries, via teletype terminals, for production of catalog card sets, book labels, and book pockets from a marc i catalog data file (22). the nelinet system became operational in the spring of 1970 employing the marc ii data base. also in 1968 the university of chicago library brought into operation catalog card production with data being input remotely on terminals in the library, and cards being printed in batches on a high-speed computer printer centrally (23).

bookform catalogs began to appear in the early 1960's, and it appears that the information center of the monsanto company in st. louis, missouri, published the earliest report on a bookform catalog that it had produced by computer in 1962 (24, 25). the center discontinued its card catalog in the same year. book catalogs can increase availability of cataloging information to users while reducing library work, and the monsanto book catalog is an example of such an achievement, for it provides a union catalog of the holdings of seven monsanto libraries, and is produced in over one hundred copies. as would be expected, the catalog appeared all in upper-case.
however, in september 1964 the library at florida atlantic university produced a bookform catalog in upper- and lower-case (26), and the university of toronto library put out the first edition of its upper- and lower-case onulp catalog on 15 february 1965 (27, 28). the monsanto catalog format called for author and call number on one line, with title and imprint on a second, or second and third, line. both florida atlantic and toronto catalogs were essentially catalogs of catalog cards. under the leadership of mortimer taube, documentation, inc. was first to produce a bookform catalog in upper- and lower-case, with a format like that of bookform catalogs in the nineteenth century (29); documentation, inc., prepared the catalog for the baltimore county public library. entries were made once, with titles listed under an entry if there were more than one. the stanford bookform catalog appeared late in 1966, introducing a new type of unit record, whose first element is the title paragraph.

h. p. luhn proposed selective dissemination of information (sdi) in 1958 (30), and perhaps the first library application of sdi was in the spring of 1962 at the ibm library at owego (31), where special processing was given to new acquisitions for input into the sdi system. at about the same time, the library of the douglas missile & space systems division instituted an sdi system that employed as input a single machine readable record from which catalog cards and accessions lists were also produced (32). the introduction of sdi into library operation is a major, historic innovation, for sdi is a routine but personalized service in contradistinction to the depersonalized library service characteristic of all but the smallest libraries. selective dissemination of information is one of the few examples of library computerization that takes full advantage of the computer's ability to treat an individual as a person and not as one of a horde of users.

circulation

the picatinny arsenal reported the first computerized circulation system (33). the picatinny application produced a computer printed loan record, lists of reserves, overdues, lists of books on loan to borrowers, and statistical analysis, in a system that began operation in april 1962. the charge card at picatinny was an ibm punch card into which was punched the bibliographic data and data concerning the borrower each time the book was charged. in the fall of 1962, the thomas j. watson research center (34) activated a circulation system much like the picatinny system, except that bibliographic data was punched into a book card by machine, but information about the borrower was manually punched.

the next step forward occurred at southern illinois university (35), where a circulation system like the two just described began limited operation in the spring of 1964 employing an ibm 357 data collection system. by using the 357, it was possible to have a machine punched book card and a machine readable borrower's identification card that could be read by the 357, thereby eliminating manual punching. the southern illinois system became fully operational at the beginning of the fall term of 1964, as did a similar 357 system at florida atlantic university (26).
batch processed circulation systems periodically producing a listing of books on loan have a built-in source of dissatisfaction, particularly in academic libraries, for current records are unavailable on the average for half the period of the frequency of the printout. such delay can be eliminated in an on-line system, wherein information about the loan is available immediately after recording the loan. however, not all circulation systems with remote terminals operate interactively. in an on-line system introduced at the illinois state library in december 1966 (36) the transactions were recorded on an ibm 1031 terminal located at the circulation desk, data transmitted from the terminal being accumulated daily and processed into the file nightly. as first activated, the system did not permit querying the file to determine books charged out, but this capability was added in 1969. also in december 1966, the redstone scientific information center brought into operation a pilot on-line book circulation system based on a converted machine readable catalog consisting of brief catalog entries. this pilot system remained in operation until october 1967, and was capable of recording loans, discharging loans, putting out overdues, maintaining reserves, and locating the record in the file (37).

the bellrel real time loan system went into operation at bell laboratories library in march 1968 (38). bellrel has a data base consisting of converted catalog records, so that in effect it also is a remote catalog access system. bellrel serves three libraries remotely from two ibm 1050 terminals in each library. bellrel is a sophisticated on-line, real time circulation system that not only records and discharges books, but also replies to inquiries as to the status of a title, and the status of a copy, and will display the full record for a title, as would be required for remote catalog access.

serials

the library of the university of california, san diego, activated the first computerized serials control system (39). this system has as its objective production of a complete holdings list, lists of current receipts, binding lists, claims, nonreceipt lists, and expiration of subscription lists. checking in was accomplished by manual removal from a file of a prepunched card for a specific title and issue. the check-in clerk sent this card to the computer center for processing and the journal issue to the shelves. this technique of prepunching receipt cards has generated new problems in some libraries, for professional advice is often needed as to action to be taken when the issue received does not match the prepunched card. nevertheless, the san diego system still operates, albeit with modifications. the washington university school of medicine library activated a serials control system in 1963 (40) that was essentially like that at san diego. a series of symposia held at washington university, with the first in the autumn of 1963, widely publicized the system and led to its adoption elsewhere. the university of minnesota biomedical library introduced a technique of writing in receipts of individual journal issues on preprinted check-in lists (41). check-in data was then keypunched from the lists. this system obviated the problem generated by prepunched cards that did not match received issues, but, of course, reintroduced manual procedures.
difficulties with check-in procedures, and delays in receipt of printed lists of holdings, made it clear that an on-line, real time serials control system would be superior to the batch systems described in the previous paragraph. laval university in quebec introduced the first on-line, real time system in 1969 (42). in september 1969 the laval on-line file held 16,335 titles. access to the file from cathode ray tube terminals is by accession number, and the file, or sections thereof, can be listed. the system also produces operating statistics and contains the potential for automatic claiming. the kansas union list of serials (43), which appeared in 1965, was the first computerized union list to contain holdings of several institutions. the kansas union list recorded holdings for nearly 22,000 titles in eight colleges and universities. reproduced photographically from computer printout and printed three columns on a page, this legible and easy-to-use list set the style for many subsequent union lists.

acquisitions

the national reactor testing station library was first to use a computer in ordering processes (44). a multiple-part form was produced for library records and for dealers. the library of the thomas j. watson research center activated a more sophisticated system in 1964 that produced a processing information list containing titles of all items in process, a shelf list card, a book card, and a book pocket label (45). the pennsylvania state university library put a computerized acquisition system into operation in 1964 (46). this system produced a compact, line-a-title listing of each item in process, together with an indication of the status of the item in processing. a small decklet of punch cards was produced for each item on a keypunch, and one of these cards was sent to the computer center for processing each time its associated item changed status. the pennsylvania system also produced purchase orders.

in june 1964, the university of michigan library (47) introduced a computerized acquisitions procedure more sophisticated than its predecessors. the michigan system produced a ten-part purchase order fanfold, an in-process listing, and computer produced transaction cards to update status of items in process, and carried out accounting for encumbrance and expenditure of book funds. in addition, the system produced periodic listings of "do-not-claim" orders, listings of requests for quotation, and of "third claims" for decision as to future action on such orders. in 1966, the yale machine aided technical processing system began operation (48). it produced daily and weekly in-process lists arranged by author, a weekly order number listing, weekly fund commitment registers, and notices to requesters of status of request. subsequently, claims to dealers were added, as well as management information reports on activities within the system. like the pennsylvania and michigan systems, its in-process list recorded the status of the item in processing.

the washington state university library brought the first on-line acquisition system into operation in april 1968 (49). access to the system was by purchase order number, with records arranged in a random access file under addresses computed by a random number generator (50). the stanford university libraries on-line acquisition system began operation in 1969 (51), and employed a sequential file of entries having an index of words in author and title elements of the entry. the stanford system calculated addresses of index words by employing a division hashing technique on the first three letters of the word.
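division hashing of the kind used at stanford can be sketched in a few lines. the table size and the numeric coding of the first three letters below are assumptions for illustration only, not the stanford system's actual parameters.

TABLE_SIZE = 997          # a prime table size (hypothetical)

def bucket(word: str) -> int:
    """compute a file address from the first three letters of an index word
    by coding them as base-26 digits and taking the remainder (division hashing)."""
    prefix = (word.lower() + "aa")[:3]        # pad very short words
    value = 0
    for ch in prefix:
        value = value * 26 + (ord(ch) - ord("a"))
    return value % TABLE_SIZE

# index words from author and title elements of an entry (hypothetical)
for word in ["library", "automation", "catalog", "cat"]:
    print(word, "->", bucket(word))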
standardization

by 1965, a dozen or more libraries had a dozen or more formats for machine readable bibliographic records, and an impenetrable thicket of such records was evolving. fortunately, the library of congress, with the help of the council on library resources, took the initiative in standardization of format of bibliographic records and produced the now familiar marc format (52). just as standardization of catalog card sizes enabled interchange of catalog records, so has marc made possible interchange of machine readable catalog records. this standardization has encouraged developments of networks, such as the suny biomedical network, nelinet, the washington state libraries network, and that of the ohio college library center. with each of these regional networks employing the marc bibliographic record, it will be possible to integrate these regional nodes into a future national network.

substance and sum

the first half of the first decade and a half of library computerization was confined almost entirely to two major mechanizations of mortimer taube's uniterm coordinate indexing. the computerization of single descriptors with attendant document numbers was a relatively easy task. the first breakaway from computerized subject searching came at the douglas aircraft corporation, where the technique of producing one machine readable record from which multiple products could be obtained was introduced in 1961. the last half of library automation's decade and a half has been largely consumed with efforts to automate existing library procedures. although notable departures have occurred that take advantage of the computer's powerful qualities, on-line, real time techniques introduced at the very end of the historical period under review began again to use individual words as words, not unlike the logic in which the first applications employed uniterms; and it seems likely that the immediate future will witness increasing degrees of computerization based on individual words in bibliographic descriptions rather than on the record as a whole.

acknowledgments

the author is grateful to sheila bertram for identifying, searching out, and gathering most of the references used in this paper. cloyd dake gull furnished in correspondence invaluable information about events of the fifties and early sixties, and various librarians supplied photocopies of early documents.

references

1. kilgour, frederick g.: "the economic goal of library automation," college & research libraries, 30 (july 1969), 307-311.
2. baumol, william j.: "the costs of library and informational services." in libraries at large (new york: r. r. bowker co., 1969), pp. 168-227.
3. bunnow, l. r.: study of and proposal for a mechanized information retrieval system for the missiles and space systems engineering library (santa monica, california: douglas aircraft co., 1960).
4. balz, charles f.; stanwood, richard h.: literature on information retrieval and machine translation (international business machines corp., november 1962).
5. balz, charles f.; stanwood, richard h.: literature on information retrieval and machine translation, 2d ed. (international business machines corp., january 1966).
6. speer, jack a.: libraries and automation; a bibliography with index (emporia, kansas: teachers college press, 1967).
7.
tillitt, harley e.: "an experiment in information searching with the 701 calculator," journal of library automation, 3 (sept. 1970 ), 202-206. 8. bracken, r. h. ; tillitt, h. e.: "information searching with the 701 calculator," journal of the a ssociation for computing machinery, 4 ( april 1957 ), 131-136. 9. tillitt, harley e. : "an application of an electronic computer to information retrieval." in boaz, martha : modern trends in doc'lrlm entation (new york: pergamon press, 1959), pp. 67-69. history of library computedzationjkilgour 227 10. zaharias, jerome l.: lizards; libmry irlformation search and retrieval data system (china lake, california: u. s. naval ordnance test station, 1963). 11. bracken, robert h.; oldfield, bruce g.: "a general system for handling alphameric information on the ibm 701 computer," journal of the association for computing machinery, 3 (july 1956), 175-180. 12. rosen, saul: "electronic computers: a historical survey," computing surveys, 1 (march 1969), 7-36. 13. barton, a. r.; schatz, v. l.; caplan, l. n.: information retrieval on a high speed computer (evendale, ohio: general electric co., 1959), p· 8. 14. gull, c. d.: personal communication, (22 august 1969) . 15. dennis, b. k.; brady, j. j.; dovel, j. a., jr.: "five operational years of inverted index manipulation and abstract retrieval by an electronic computer," journal of chemical documentation, 2 (october 1962 )) 234-242. 16. austin, charles j.: medlars; 1963-1967 (bethesda, maryland: national library of medicine, 1968). 17. summit, roger k.: "dialog: an operational on-line reference retrieval system." in association for computing machinery: proceedings of 22nd national conference. (washington, d. c.: thomson, 1967), pp. 51-56. 18. pizer, irwin: "regional medical library network," bulletin of the medical libmry association, 51 (april1969), 101-115. 19. koriagin, gretchen w .: "library information retrieval program," journal of chemical documentation, 2 (october 1962 ) 242-248. 20. fasana, paul j.: "automating cataloging functions in conventional libraries," 7 (fall 1963), 350-365. 21. kilgour, frederick g.: "library catalogue production on small computers," american documentation, 17 (july 1966), 124-131. 22. nugent, william r.: "nelinet-the new engjand information network." in congress of the international federation for information processing, 4th, edinburgh, 5-10 august, 1968: proceedings (amsterdam: north-holland publishing co., 1968), pp. g 28-g 32. 23. payne, charles t.: "the university of chicago's book processing system." in proceedings of a conference held at stanford university libraries, october 4-5, 1968 (stanford, califomia: stanford university libraries, 1969). 24. wilkinson, w . a.: personal communication (november 1969). 2.5. wilkinson, w. a.: "the computer-produced book catalog: an appli · cation of data processing at monsanto's information center." in university of illinois graduate school of library science: proceedings of the 1965 clinic on library applications of data processing (champaign, illinois: illini union bookstore, 1966), pp. 92-111. 228 journal of library automation vol. 3/3 september, 1970 26. heiliger, edward: "florida atlantic university library." in university of illinois graduate school of library science: proceedings of the 1965 clinic on library applications of data processing (champaign, illinois: illini union bookstore, 1966), pp. 92-111. 27. bregzis, ritvars: personal communication (november 1969 ) . 28. 
bregzis, ritvars: "the ontario universities library project-an automated bibliographic data control system," college & research libraries, 26 (november 1965), 495-508. 29. robinson, charles w.: "the book catalog: diving in," wilson library bulletin, 40 (november, 1965), 262-268. 30. luhn, h. p.: "a business intelligence system," ibm journal of research and development, 2 (october 1958), 315-319. 31. stanwood, richard h.: "the merge system of information dissemination, retrieval and indexing using the ibm 7090 dps ." in association for computing machinery: digest of technical papers (1962), pp. 38-39. 32. young, e. j.; williams, a. s.: historical development and present status-douglas aircraft company computerized library program (santa monica, california: douglas aircraft co., 1965). 33. haznedari, i.; voos, h.: "automated circulation at a government r & d installation," special libraries, 55 (february 1964), 77-81. 34. gibson, r. w., jr.: randall, g. e.: "circulation control by computer," special libraries, 54 (july-august 1963), 333-338. 35. mccoy, ralph e.: "computerized circulation work: a case study of the 357 data collection system," library resources & technical services, 9 (winter 1965), 59-65. 36. hamilton, robert e.: "the illinois state library 'on-line' circulation control system." in university of illinois graduate school of library science: proceedings of the 1968 clinic on library applications of data processing (urbana, illinois: graduate school of library science, 1969), pp. 11-28. 37. "redstone center shows on-line library subsystems," datamation, 14 (february 1968), 79, 81. 38. kennedy, r. a. : "bell laboratories' library real-time loan system (bellrel)," journal of library automation, 1 (june 1968), 128-146. 39. university of california, san diego, university library: report on serials computer project; university library and ucsd computer center (la jolla, california: university library, july 1962). 40. pizer, irwin h.; franz, donald r.; brodman, estelle: "mechanization of library procedures in the medium-sized medical library: i. the serial record," bulletin of the medical library association, 51 (july 1963) , 313-338. 41. strom, karen c.: "software design for bio-medical library serials control system." in american society for information science, annual meeting, columbus, 0., 20-240ct.1968: proceedings, 5 (1968) , 267-275. history of l-ibrary computerizationjkilgour 229 42. varennes, rosario de : "on-line serials system at laval university library," journal of library automation, 3 (june 1970). 43. kansas union list of serials ( lawrence, kansas: university of kansas libraries, 1965 ), 357 pp. 44. griffin, hillis l.: "electronic data processing applications to technical processing and circulation activities in a technical library." in university of illinois graduate school of library science: p-roceedings of the 1963 clinic on library applications of data process'ing (champaign, illinois: illini union bookstore, 1964) , pp. 96-108. 45. randall, g. e.; bristol, roger p.: "pil (processing information list ) or a computer-controlled processing record," special libraries, 55 (feb. 1964), 82-86. 46. minder, thomas l.: "automation-the acquisitions program at the pennsylvania state university library." in international business machines corporation: ibm library mechanization symposium, endicott, new york, may 25, 1964, pp. 145-156. 47. 
dunlap, connie: "automated acquisitions procedures at the university of michigan library," library resources & technical services, 11 (spring 1967), 192-206.
48. alanen, sally; sparks, david e.; kilgour, frederick g.: "a computer-monitored library technical processing system." in american documentation institute, 1966 annual meeting, october 3-7, 1966, santa monica, california: proceedings, pp. 419-426.
49. burgess, t.; ames, l.: lola; library on-line acquisitions subsystem (pullman, wash.: washington state university library, july 1968).
50. mitchell, patrick c.; burgess, thomas k.: "methods of randomization of large files with high volatility," journal of library automation, 3 (march 1970).
51. parker, edwin b.: "developing a campus information retrieval system." in proceedings of a conference held at stanford university libraries, october 4-5, 1968 (stanford, california: stanford university libraries, 1969), pp. 213-230.
52. "preliminary guidelines for the library of congress, national library of medicine, and national agricultural library implementation of the proposed american standard for a format for bibliographic information interchange on magnetic tape as applied to records representing monographic materials in textual printed form (books)," journal of library automation, 2 (june 1969), 68-83.

providing bibliographic services from machine-readable data bases: the library's role

richard de gennaro: director of libraries, university of pennsylvania, philadelphia

libraries will play a key role in providing access to data bases, but not by subscribing to tape services and establishing local processing centers as is commonly assumed. high costs and the nature of the demand will make this approach unfeasible. it is more likely that the library's reference staff will develop the capability of serving as a broker between the local campus user and the various regional or specialized retail distribution centers which exist or will be established.

this brief paper will attempt to counter the widely held view that the larger research libraries will soon need to begin subscribing to the growing number of data bases in machine-readable form and providing current awareness and other services from them for their local users.* it will speculate on how this field might develop and will suggest a less expensive and more feasible strategy which libraries may use to gain access to these increasingly important bibliographic services. the key question of who will pay for these new services, the user or the institution, will also be discussed.

* this paper was developed from a talk by the author on a panel entitled "library management of machine-readable data bases." the program was jointly sponsored by cola, isad, and acrl and took place at the ala conference in las vegas, june 24, 1973.

while it is clearly outside the scope of this paper to review the state-of-the-art of data base services, reference to a few key works and a brief introduction to the subject may be helpful. the most comprehensive and authoritative review of the state-of-the-art of the field and its literature is the excellent chapter entitled "machine-readable bibliographic data bases" by marvin c. gechman in the 1972 volume of the annual review of information science and technology (1). a useful selection of readings is key papers on the use of computer-based bibliographic services edited by stella keenan and published jointly
by the american society for information science and the national federation of abstracting and indexing services in 1973 (2). a study of six university-based information systems made by the national bureau of standards is essential and contains in convenient form comparative and descriptive information about these pioneering centers which are sponsored by the national science foundation (3).

some of the most useful and important data bases available are those that have been developed by the indexing and abstracting services as byproducts of their efforts to automate the production of their regular printed publications. like the publications, the tapes come in a wide variety of incompatible formats. among the important producers are: chemical abstracts service, biosciences information service, engineering index inc., american institute of physics, and the american geological institute. ccm information corporation (pandex) and the institute for scientific information are two examples of major commercial suppliers. several of the scientific societies received substantial grants from the national science foundation and other sources in the 1960s for this automation effort, and it was generally expected that an important new market for the by-product tapes would develop among researchers in universities and in industry.

imaginative and forward-looking librarians and computer people at various universities applied for and received grants to establish centers where these new data tapes could be used to provide current awareness and retrospective search services to users. the national aeronautics and space administration established a network of regional dissemination centers at six universities, including the universities of connecticut, indiana, and new mexico, the north carolina science and technology research center, university of pittsburgh, and the university of southern california. the national science foundation has been supporting centers at the university of georgia, lehigh university, university of california at los angeles, ohio state university, and stanford university. other centers have been established at the illinois institute of technology research institute and the university of florida. it is worth noting that nearly all centers provide services free to their own institutional users and continue to be heavily subsidized. all seem eager to expand their markets to include paying customers from a larger region.

the latest entry into this field is the new england board of higher education's northeast academic science information center (nasic) sponsored by nsf. nasic's approach is basically different from the unitary centers that have been named. it will attempt to become a broker between the various existing centers and its own members, facilitating their access to existing services elsewhere. it will serve a ten-state region and is expected, perhaps somewhat optimistically, to become self-supporting after the three-year grant period ends.

the number of data bases available in the united states is now over a hundred and is growing rapidly, apparently without benefit of firm standards. a parallel development is taking place in europe. as the number of available data bases increases, and as the activity at these centers expands, more and more librarians become interested in and concerned about how they are going to provide these new, important, and expensive services on their own campuses.
interest among librarians in data base services is running high. a session at the association of research libraries conference in the spring of 1973 was devoted to it, and a program at the annual meeting of the american library association in las vegas on the subject was jointly sponsored by the cola discussion group, the information science and automation division, and the association of college and research libraries. while this interest is commendable and should be stimulated, it is also important that it be tempered and put into perspective by a realistic consideration of some of the costs and problems involved in providing these services. this is what the remainder of this paper will attempt to do.

the title of the ala program was "library management of machine-readable reference data bases." implied in that title are two basic assumptions that are widely accepted: one is that libraries will play a key role in providing access to information in machine-readable data bases on their campuses. the other is that in order to provide this access they will have to acquire and maintain these data bases and develop the capability of searching and manipulating them for their local users. the first assumption is valid; libraries will be responsible for assisting users in gaining access to information in this new form. the second assumption is highly questionable, if not invalid. it is extremely unlikely that many individual libraries will be able to afford to establish centers to acquire and process these machine-readable data bases. while it may appear that a straw man is being set up that can be easily demolished, the idea that academic libraries must and will begin acquiring and servicing many large and expensive data bases, and even statistical data banks, is still widely enough held that it ought to be put to rest.

how did this idea gain such currency? perhaps it was because the first available data bases were from the indexing and abstracting services and contained machine-readable versions of their printed indexes. since libraries subscribed to the printed editions, it followed that they should also subscribe to the tape editions. the same is true for the census tapes. libraries were the chief repositories for printed census publications, so it was natural to assume that they would have to subscribe to and make available the machine-readable census data as well. we now know better about the census tapes; the problem was simply beyond our resources, and they are being made available from specialized centers. a similar solution may well emerge for the bibliographical data tapes of the indexing and abstracting services.

to help put matters into perspective, it might be useful to review a few other ideas we had in the last two decades on how certain technological developments would be implemented in the library. take microfilm, for example. back in the 1950s when microfilm came of age for library use, many librarians thought that every major library would require its own laboratory where large quantities of film could be produced and processed under the direction of a new breed of librarian called a documentalist. several major libraries did establish such laboratories for a time, but the only remaining ones of any significance are at the library of congress and a few other large libraries.
most of the others were put out of business by the copying machine, the local service bureau, and commercial micropublishers-and the documentalists became information scientists. library automation provides other interesting examples. many of us recall that in the 1960s it was a commonly held view that each major library would have to automate its operations, and that librarians would learn to master the computer that was soon to be installed in every library basement, or see themselves replaced by computer experts. as we all know, it did not happen that way. librarians will probably end up with computer terminals or minicomputers, with software packages supplied by library cooperatives or commercial vendors. when the marc tapes were first made available, it was assumed (and this is what the marc i experiment was all about) that each library would have to subscribe to the tapes and design, implement, and operate its own system to use the data in its cataloging operations. again, it did not happen that way. marc data are being used by libraries, but indirectly through cooperative centers such as oclc, or through commercial vendors of card services such as information design or josten's, inc. individual libraries are not subscribing to marc tapes, as we had thought would be the case. the point of citing these few examples is to suggest that it is extremely difficult in the early stage of a new technology to predict with any confidence how it will be introduced and implemented, and what effects it will have. we seem to have a natural tendency first to try to cope with each new technological development on a do-it-yourself individual library level, and when experience teaches us that implementing the particular technology is more difficult and more expensive than we thought, we regroup and try a broader-based approach. this is approximately where we are with data base services; it is time for a broader-based approach. again, it is unlikely that libraries will provide access to machine-readable data by setting up their own campus information centers to acquire and process data bases. anyone who takes the time to look at a list of data bases available and their annual subscription rates will understand that research library book budgets will not be large enough to cover these additional subscription costs. in fact, the subscriptions are only a minor element in the total cost of providing these services. the data bases must be cumulated and maintained. programs to manipulate and access them in their many nonstandard formats and contents must be written or adapted. the cost of administering and marketing the services and interfacing with the users will be high. perhaps the most critical question to be answered is: will the individual user be charged for the services he uses or will the costs be absorbed by the university? the answer to that question will determine how and to what extent the machine-based services will be used in the future. if they are offered free, as are traditional library services, then one can assume with some confidence that a substantial demand for them will materialize. this has in fact been the early experience of the centers at the university of georgia and ohio state and others where use has been totally subsidized by grant money.3,4 on the other hand, if the individual user is asked to pay for these services out of his own pocket or even out of departmental or grant funds, the market for them will be severely limited.
it is extremely unlikely that large numbers of faculty and other researchers in universities will be seriously interested in becoming paying users of machine-based information services. the experience of c. c. parker at the university of southampton may prove to be typical.5 he reported a drop from forty-seven to five users of an sdi service after charges were introduced. it was not that the users could not pay the charges, but that they preferred to use their resources for other more important needs. the national library of medicine recently instituted user charges in the medline system in order to effect a needed reduction in the number of users. the case for giving these services to users free is theoretically sound in the traditional library context, but there are practical difficulties. first, these services will be expensive and they will require a net addition to library budgets rather than a transfer from one activity to another; the prospects for such budget increases seem dim in the next few years. second, if the services are offered free, there will be no natural or automatic mechanism for controlling their use, and such control is essential to limit costs. once users get on a free subscription list they will tend to stay on it whether they actually use the products or not. this happens in many libraries where current accessions lists are regularly sent to faculty, most of whom discard them unread. on the other hand, there is ample precedent for charging a modest fee for certain services in libraries. the best example is the almost universal charge for photocopies. in those instances where libraries offered free copies, the service was abused and charges had to be reinstated. it seems likely that a combination of institutional subsidy and individual charges will evolve as the dominant method of paying for machine-readable services. in order to recover some costs and prevent abuses, an appropriate system of charges will have to be instituted in spite of the logic of the argument for free services. incidentally, the case for free computer time in universities is perhaps equally valid, but it has never been accepted by the responsible budget officers. regardless of who pays, these services will have to be advertised and marketed aggressively to reach the limited number of potential users on each campus. it will not be enough to announce their availability and wait for customers. but even the best salesman on the most research-oriented campus will probably fail to find enough users to justify the high costs of providing the extensive and diverse subject coverage that every university will require. the solution, of course, lies in the establishment of a small number of comprehensive regional or even national information processing centers, possibly backed up by a much larger number of specialized centers or services for particular subject or mission-oriented fields such as physics, chemistry, medicine, pollution, urban studies, census data, etc. libraries will play a key role in facilitating access to data bases by functioning as the interface or broker between the users on campus and these regional and special processing and distribution centers. this means that they must develop a new kind of information or data services librarian on their reference staffs whose function it will be to publicize these services and maintain extensive files of information on their scope, contents, cost, and availability.
these reference specialists will also guide users to the most appropriate services, help them to build and maintain their interest profiles, and provide assistance with the business aspects of dealing with vendors. (the university of pennsylvania library recently established a data services office based on this concept, with encouraging early results.) after an initial start-up period, this function should and doubtless will become a fully integrated part of the regular reference service, and the need for specialists will disappear as this knowledge becomes a part of every reference librarian's repertoire. the available data base services fall into two main categories: off-line batch and on-line interactive services. the most commonly available up to now have been regular off-line current awareness (sdi) services based on an interest profile; these have been supplemented by occasional requests for retrospective searches of the older files. the results of these off-line searches are delivered to the subscriber by conventional mail. on-line services permit the user or the reference specialist to access a portion of the data base directly via terminals and telephone lines and perform the search in an interactive mode. some results are immediately displayed on the terminal and others are sent by mail. the lockheed information retrieval service and systems development corporation have recently begun offering interactive searching with online computer terminals of a large selection of the most useful bibliographic data bases. with this capability commercially available from leased terminals on a fee-per-use basis, it will be difficult for a university or even some existing centers to justify subscribing to and maintaining these data bases for their own limited use. if lockheed, sdc, and other vendors can develop the market and operate these services at a profit, they may be able to satisfy a very substantial portion of the need for these new bibliographic services. medline, toxline, recon, and the new york times information bank provide other models for specialized and centralized interactive services. some authorities assert that this trend toward on-line interactive searching will accelerate and eventually supersede tape searching.6 others argue that the cost of maintaining and searching on-line the really large data bases is prohibitive and will remain so for several years to come. it seems most likely to this author that the trend will be toward on-line systems covering a limited period of time, probably the latest three to five years, with supporting off-line services for retrospective searches. if this proves to be the case, libraries will find it practical and convenient to make terminals available at or near reference desks. a close look at the several centers which now exist on individual campuses would probably show that they are heavily subsidized by grant or other outside funds, and that they are trying to expand to serve their states or even wider regions in order to achieve greater cost effectiveness. these centers deserve the credit that is always due pioneers. they are in the process of developing the patterns for providing these services in the future. one of the chief lessons they may have already taught us is that a single university, or even possibly a single state or region, is not a large enough market base upon which to build this activity.
these centers will require a large volume of business to justify their high overhead and operating costs and they will seek and welcome additional paying customers. to summarize and conclude, libraries will play a key role in providing access to machine-readable data bases, but they will generally not do it by acquiring and managing these data bases in local campus centers because of the high costs involved. these high costs and the limited market will restrict the number of processing centers to several regional or even national centers, supplemented by a larger number of specialized discipline and mission-oriented services. many data bases and services will be available on a fee-for-service basis either through existing centers or directly from professional societies, government agencies, and commercial vendors with the library serving as facilitator or broker. it seems likely that a combination of institutional subsidies and individual charges will emerge as the pattern for paying for these new computer-based bibliographical services. references 1. marvin c. gechman, "machine-readable bibliographic data bases," in annual review of information science and technology, v. 7 (washington, d.c.: asis, 1972), p. 323-78. 2. stella keenan, ed., key papers on the use of computer-based bibliographic services (washington, d.c.: asis, 1973). 3. b. marron, and others, a study of six university-based information systems (washington, d.c.: national bureau of standards, 1973 [nbs technical note 781]). 4. james l. carmon, "a campus-based information center," special libraries 64:65-69 (feb. 1973). 5. c. c. parker, "the use of external current awareness services at southampton university," aslib proceedings 25:4-17 (jan. 1973). 6. m. cerville, l. d. higgins, and francis j. smith, "interactive reference retrieval in large files," information storage and retrieval 7:205-10 (dec. 1971). digital collection management through the library catalog. michaela brenner, tom larsen, and claudia weston. michaela brenner (brennerm@pdx.edu) and tom larsen (larsent@pdx.edu) are database maintenance and catalog librarians, and claudia weston (westonc@pdx.edu) is assistant university librarian for technical services, portland state university. digitization has bestowed upon librarians and archivists of the late 20th and early 21st centuries the opportunity to reexamine how they access their collections. it draws these two traditional groups together with it specialists in order to collaborate on this new great challenge. in this paper, the authors offer a strategy for adapting a library system to traditional archival practice. "the librarian and the archivist . . . both collect, preserve, and make accessible materials for research; but significant differences exist in the way these materials are arranged, described, and used."1 among the items usually collected by libraries are: published books and serials, and in more recent times, commercially available sound recordings, films, videos, and electronic resources of various types. archives, on the other hand, tend to collect original records of an organization, unique personal papers, as well as other effects of individuals and families. each type of institution, given its particular emphasis, has its own traditions and its own methods of dealing with its collections. most mid- to large-sized automated libraries in the united states and abroad use machine-readable cataloging (marc) records to form the basis of their online catalogs. bibliographic records, including those in the marc format, generally represent an individually published item, or "information product,"2 and describe the physical characteristics of the item itself.
the basic unit of archival description, however, is a much more complex entity than the basic unit of bibliographic description and often involves multiple hierarchical levels that may or may not extend down to the level of individual items. at portland state university (psu) the authors examined whether the capabilities of their present integrated library system could be expanded to capture the hierarchical structure of traditional archival finding aids. ■ background as early as 1841, the cataloging rules established by panizzi were geared toward locating individual published items. panizzi based his rules on the idea that any person looking for any particular book should be able to find it through the catalog.3 this tradition has continued over time up through current standards such as the anglo-american cataloguing rules and reaffirmed in marc, the standard for the representation and exchange of bibliographic information that has been widely used by libraries for over thirty years.4 archival description, on the other hand, is generally based on the fonds, that is, the entire collection of materials in any medium that were created, accumulated, and used by a particular person, family, or organization in the course of that creator's activities and functions.5 thus, the basic unit of archival description, usually a finding aid, is a much more complex entity than the basic unit of bibliographic description, often involving multiple hierarchical levels of description that may or may not extend down to the level of individual items. before archival description begins, the archivist identifies related groups of materials and determines their proper arrangement. once the arrangement is determined, then the description of the materials reflects both their provenance and their original order.6 the first explicit statement of the levels of arrangement in an archival collection was by holmes and has since been elevated to the level of dogma in the archival community.7 a more recent statement in describing archives: a content standard (dacs) indicates that the actual levels of arrangement may differ for each collection. by custom, archivists have assigned names to some, but not all, levels of arrangement. the most commonly identified are collection, record group, series, file (or filing unit), and item. a large or complex body of material may have many more levels. the archivist must determine for practical reasons which groupings will be treated as a unit for purposes of description.8 rephrasing holmes, the five levels of arrangement can be defined as: 1. the collection level, which holmes called the depository level—the breakdown of the depository's complete holdings into a few major divisions based on the broadest common denominator 2. the record group level—the fonds or complete collection of the papers of a particular administrative division or branch of an organization or of a particular individual or family 3. the series level—the breakdown of the record group into natural series and the arrangement of each series with respect to the others 4. the filing unit level—the breakdown of each series into unit components, which are usually fairly obvious if the documents are kept in file folders 5. the document level—the level of individual items
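as a simple illustration of how these nested levels of arrangement might be represented in software, the following sketch models them as a small recursive data structure. the level names follow the list above; the sample collection, record group, series, file, and item are invented for the example and do not describe any actual holdings.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ArchivalUnit:
    # one node in the hierarchy: collection, record group, series, file, or item
    level: str
    title: str
    children: List["ArchivalUnit"] = field(default_factory=list)

# an invented example showing collection > record group > series > file > item
holdings = ArchivalUnit("collection", "city planning records", [
    ArchivalUnit("record group", "office of the planning director", [
        ArchivalUnit("series", "correspondence, 1960-1975", [
            ArchivalUnit("file", "waterfront redevelopment folder", [
                ArchivalUnit("item", "memo of 3 january 1968"),
            ]),
        ]),
    ]),
])

def walk(unit, depth=0):
    # print the arrangement with indentation reflecting the level of description
    print("  " * depth + unit.level + ": " + unit.title)
    for child in unit.children:
        walk(child, depth + 1)

walk(holdings)
```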
the end result of archival description is usually a finding aid that ideally presents an accurate representation of the items in an archival collection so that users can, as independently as possible, locate them.9 building on the print finding aid, the archival community has explored a number of mechanisms for disseminating information on the availability of items in their collections. in 1983, the usmarc format for archival and manuscript control (marc-amc) was released and subsequently sanctioned for use as one possible standard data structure and communication protocol in the saa descriptive standard archives, personal papers, and manuscripts (appm) and its successor, dacs.10 its adoption, however, has been somewhat controversial among archivists.11 the difficulty in capturing the hierarchical nature of collections through the marc format is one factor that has limited the use of marc by the archival community. while it is possible to encode this hierarchical description in marc using notes and linking fields, few archivists in practice have actually made use of these linking fields.12 thus, in archival cataloging, marc records have been used primarily for collection-level description, allowing users to search and discover only general information about archival collections in online catalogs while the finding aid has remained the primary tool for detailed data at all levels of description. in 1995, the encoded archival description (ead) emerged as a new standard for encoding descriptions of archival collections. the ead standard, like the marc standard, allows for the electronic storage and exchange of archival information; but unlike marc, it is based on the finding aid. ead is well suited for encoding the hierarchical relationships between the different parts of the collection and displaying them to the user, and it has become more widely adopted by the archival community. as outlined, the standards and systems chosen by an institution are dictated by the needs and traditions of that institution. the archival community relies heavily on finding aids and, with increasing frequency, on ead, their electronic extension; whereas the library community heavily relies on the online public access catalog (opac) and marc records. new trends capitalizing on the strengths of both traditions are evolving as libraries and archives seek ways to improve access to their archival and digital collections. ■ access to digital archival collections in libraries when searching the web for collections of information, one frequently encounters separate interfaces for traditional library, archival, and digital collections even though these collections may be owned, sponsored, hosted, or licensed by a single institution. descriptive records for traditional library materials reside in the opac and are constructed according to standard library practice, while finding aids for the archival and digital collections increasingly appear on specially designed web sites.
this, of course, means that users searching the opac may miss relevant materials that are described only in the archival and digital documents database or web site. similarly, users searching the archival and digital documents database or web site may miss relevant materials that are described only in the opac. in other instances, libraries, such as the library of congress, selectively add records to their opacs for individual items in their archival and digital document collections. this incorporation allows users more complete access to items within the library's collections. authority control and the assignment of descriptors further enhance access to the item-level records. to minimize processing costs, however, libraries frequently create brief descriptive records for items, thereby limiting their value to patrons.13 by creating descriptive records for the items only, libraries also obscure the hierarchical relationships among the items and the collections in which they reside. these relationships can provide the user with a useful context for the individual items and are an essential part of archival description. still other libraries, such as the university of washington, include collection-level marc records in the opac for their archival and digital document collections. these are searchable in the opac in the same way as bibliographic records for other materials. these collection-level records can then in turn be linked to finding aids that describe the collections more fully.14 collection-level records often are used in libraries where library resources may be insufficient for cataloging large collections of materials at the item level.15 the guidelines for collection-level records in appm and dacs, however, allow for additional fields that are not ordinarily used in library bibliographic records. these include such things as descriptions of the organization and arrangement of the collection, citations for published descriptions of the collection and links to the finding aid, and acknowledgment of the donors, as well as ample subject access to the collection. despite their potential for detail, collection-level records cannot provide the same degree of access to individual items as full item-level records. ■ an approach taken at portland state university library in many ways, archival and digital-document collections are continuing resources. a continuing resource is defined as ". . . a bibliographic resource that is issued over time with no predetermined conclusion. continuing resources include serials and ongoing integrating resources."16 like published continuing resources, archival and digital collections generally are created over time with no predetermined conclusion. in fact, some archival collections continue to grow even after part of the collection has been accessioned by a library or archive. thus, even though many of the individual items in the collection might be properly treated as monographic (not unlike serial analytics), it would not be unreasonable to treat the entire collection as a continuing resource. with this in mind, the authors examined whether their electronic-resource management system could be adapted to accommodate evolving collections of digitized and born-digital material. more specifically, the present system was examined to determine whether its capabilities could be expanded to capture the hierarchical structure found in traditional archival finding aids.
the electronic resource management system in use by psu library is innovative interfaces’ electronic resource management (erm) product. according to innovative interfaces inc.’s (iii) marketing literature, “[erm] effectively controls subscription and licensing information for licensed resources such as e-journals, abstracting and indexing (a&i) databases, and full-text databases.”17 to control and provide improved access to these resources, erm stores details about purchase orders, aggregators and publishers, subscription terms, licensing conditions, breadth of holdings, internal and external contact information, and other aspects of these resources that individual libraries consider relevant. for increased security and data integrity, multilevel permissions restrict viewing and editing of data to the appropriate level of staff or patron. the ability of erm to replicate the two-level hierarchical relationships between aggregators or publishers and the electronic and print resources they provide was of particular interest to the authors. through erm and iii’s batch record load capabilities, bibliographic and resource records can be loaded into the iii system using delimited source files such as those provided by serials solutions. resource records are the mechanisms used by iii to describe digital resources at a collection, subcollection, or title level, thereby enabling the capture of descriptive information not permitted by standard bibliographic records. iii uses holdings records to document serial holdings statements. according to the marc 21 formats for holdings data, a holdings statement is the “record of the location(s) and bibliographic units of a specific bibliographic item held at one or more locations.”18 iii holdings records may also contain a url for connecting to an electronic resource. in figure 1, for example, the resource record shows that psu library provides limited access to a number of journal titles through its springer journals online resource. as seen in figure 2, the display of a holdings record embedded in a bibliographic record provides more specific information on the availability of a title through the library’s collection. in this particular example, the information display reveals that print volumes are available for this title but that psu only has this title available as a part of the springer-verlag electronic collection accessible by clicking on the hotlink. more information on the springer collection can be discovered by clicking on the about resource button to retrieve the springer journals online resource record. this example, then, represents a two-level hierarchy where the resource springer journals online is analogous to an archival collection and abdominal imaging is analogous to an archival series. adaptation of erm for library-created digital collections was explored through work being done to fulfill the requirements of a grant received in 2005 by psu library. the goal of this grant was “to develop a digital library under the sponsorship of the portland state university library to serve as a central repository for the collection, accession, and dissemination of key planning documents and reports, maps, and other ephemeral materials that have high value for oregon citizens and for scholars around the world.”19 the overall collection is called the oregon sustainable community digital library (oscdl). 
in addition to having its own web site, it was decided to make this collection accessible through the psu library catalog so that patrons could find digitized original documents about the city of portland together with other library materials. bibliographic records would be added to the database with hyperlinks to the digitized original documents using existing staff and tools. these bibliographic marc records would be as complete as possible. initially, attention was focused on documents originating from four different sources: ernest bonner, a former portland city planner; the city of portland archives; metro (the regional government for the portland, oregon, metropolitan area); and trimet (the portland metropolitan public transportation system). along with the documents, metadata was received from various databases. these descriptions ranged from almost nothing to detailed archival descriptions. unlike the challenge of shifting titles and holdings with typical serials collections, the challenge of this project was to reflect the four hierarchical levels of psu library's collection (figure 3). innovative's system structure was manipulated in order to accomplish this. at the core of iii's erm module are resource records (rr) created to reflect the peculiarities of a particular collection. linked to these resource records are holdings records (hr) containing hyperlinks to the actual digitized documents (doc h1 – doc h3) as well as to their respective bibliographic records (bib doc h1 – bib doc h3) containing additional information on the individual items within the collection (figure 4). first, resource records were manually created for three of the subcollections within the bonner collection. these subcollections contained documents reflecting the development of harbor drive, front street, and the park blocks. the fields defined for the resource records include the resource title; type (digitized documents) and format (pdf) of the resource; a hyperlink to the new oscdl web site; content and systems contact names; a brief description of the resource; and, most importantly, the resource id used to connect holding records for individual documents to the corresponding resource record. next, the batch-loading function in erm was used to create bibliographic and holding records and associate them with the resource records. taking advantage of tracking data produced during the digitization process (figure 5), spreadsheets were created for each collection reflecting the data assigned to each individual digitized document. the document title, the date the document was created, number of pages, and summaries were included. coordinates for the streets mentioned in the documents were also included. because erm uses issn numbers and titles as match points for record loads, "issn" numbers were also manufactured for each document and included in the spreadsheet. these homemade numbers were distinguished by using pdx as a prefix followed by collection and document numbers or letters, for example, pdx0022090 or pdxhdcoll. fortunately, erm accepted these dummy issns (figure 6). from this data spreadsheet, the system-required comma delimited coverage load file (*.csv) was also created. for this file, the system only allows a limited number of fields, and is very particular about the right terms, including correct capitalization, for the header row. individual document titles, the made-up issn numbers, individual urls to the documents, and a collection-specific resource id (provider) that connects all the documents from a collection to their respective resource record were included. the resource id is the same for all documents in one collection (figure 7).
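to make the coverage-load-file step concrete, the following is a minimal sketch of how such a file might be generated from a collection tracking spreadsheet. the spreadsheet column names, the output header terms, the numeric pattern of the dummy issns, and the file names are illustrative assumptions based on the description above; the exact header row required by iii's erm load profile is not reproduced in this paper, and this is not the script the authors used.

```python
import csv

RESOURCE_ID = "pdxhdcoll"  # hypothetical collection-level resource id ("provider")

def dummy_issn(collection_number, document_number):
    # manufacture an issn-like match point with the pdx prefix,
    # e.g. collection 2, document 2090 -> "pdx0022090"
    return "pdx" + str(collection_number).zfill(3) + str(document_number).zfill(4)

with open("harbor_drive_tracking.csv", newline="", encoding="utf-8") as src, \
     open("harbor_drive_coverage.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)      # tracking data from the digitization process
    writer = csv.writer(dst)
    # header row: the actual terms and capitalization are dictated by the load profile
    writer.writerow(["title", "issn", "url", "provider"])
    for number, row in enumerate(reader, start=1):
        writer.writerow([
            row["document_title"],
            dummy_issn(2, number),    # 2 is an invented collection number
            row["document_url"],      # link to the digitized document
            RESOURCE_ID,              # identical for every document in the collection
        ])
```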
in the first attempt, the system was set up to produce holdings and bibliographic records automatically, using the data from the spreadsheets. for the bibliographic records, a system-provided template was created that included some general subject headings, genre headings, an author field, and selected fixed fields, such as language, bibliographic level, and material type (figure 8). records for the harbor drive collection were loaded, and the system created brief bibliographic and holdings records and linked them to the harbor drive resource record. the records were globally updated to add the general material designator (gmd) "electronic resource" to the title as well as the phrase "digitized document" as a local "call number" to make these documents more visible in the browse screen of the online catalog (opac) (figure 9). the digitized documents now could be found in the library catalog by author, subject, or keyword. the brief bibliographic records (figure 10) allow the user to go either to the digitized document via url or to the resource record with more information on the resource itself and links to other items in the same collection. the resource record then provides links either to the new oscdl web site (via the oregon sustainable community digital library link at the bottom of the resource record), to the bibliographic description of the individual document, or to the digitized document (figure 11). however, the quality of the brief bibliographic records that had been batch generated through the system-provided template was not satisfactory (figure 8). it was decided that more document-specific data like summaries, number of pages, the dates the documents were created, geographical information, and document-level local subject headings should be included. these data were already available from the original spreadsheets. with limited time and staff resources, full bibliographic marc records were batch created using the spreadsheets, detailed templates adjusted slightly to each collection, microsoft mail merge, and finally, the marcedit program created by terry reese of oregon state university (http://oregonstate.edu/~reeset/marcedit/html/index.html). this gave maximum control over the data to be included and the way they would be included. it also eliminated the need to clean up the data following the record load (figure 12).
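the authors did this with collection-specific templates, microsoft mail merge, and marcedit; purely as an alternative illustration of the same batch step, here is a minimal sketch using the pymarc library (pymarc 5.x is assumed). the choice of marc tags, the spreadsheet column names, and the file names are assumptions made for the example, not the authors' actual template.

```python
import csv
from pymarc import Record, Field, Subfield

def make_record(row):
    # build one full bibliographic record from a spreadsheet row;
    # the tag selection below is illustrative only
    rec = Record()
    rec.add_field(
        Field(tag="022", indicators=[" ", " "],
              subfields=[Subfield("a", row["dummy_issn"])]),
        Field(tag="245", indicators=["0", "0"],
              subfields=[Subfield("a", row["document_title"]),
                         Subfield("h", "[electronic resource]")]),
        Field(tag="260", indicators=[" ", " "],
              subfields=[Subfield("c", row["date_created"])]),
        Field(tag="300", indicators=[" ", " "],
              subfields=[Subfield("a", row["pages"] + " p.")]),
        Field(tag="520", indicators=[" ", " "],
              subfields=[Subfield("a", row["summary"])]),
        Field(tag="856", indicators=["4", "0"],
              subfields=[Subfield("u", row["document_url"])]),
    )
    return rec

with open("harbor_drive_tracking.csv", newline="", encoding="utf-8") as src, \
     open("harbor_drive_full.mrc", "wb") as out:
    for row in csv.DictReader(src):
        out.write(make_record(row).as_marc())
```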
subsequently, full bibliographic records were created for the subcollections harbor drive, front street, and park blocks, to connect them to the next higher level, the bonner collection (figure 3). these records were also contributed to worldcat. mimicking the process used at the document level, a resource record was created for the bonner collection and the holdings records for the three subcollections were connected with their corresponding bibliographic records (figure 13). resource records with their corresponding item-level records for trimet, the city archives, and metro followed. the final step was then to add the resource record and the bibliographic record for the whole oscdl collection (figure 14). since this last bibliographic record is not connected to a collection above it, there is only a hyperlink to the oscdl resource record (figure 15). more subcollections and their corresponding digital documents are continually being added to oscdl. structures in psu library's opac are adjusted as these collections change. ■ conclusion according to salter, "digitizing, the current challenge that straddles the 20th and 21st centuries, has given archivists and librarians pause to reconsider access to their collections. the world of digitization is the catalyst for it people, librarians, and archivists to unify the way they do things."20 in this paper, a strategy has been offered for adapting a library system to traditional archival practice. by making use of some of the capabilities of the module in psu library's integrated library system that was originally designed for managing electronic resources, a method was developed for managing digital archival collections in a way that incorporates some of the features of a traditional finding aid. the contents of the various hierarchical levels of the collection are fully represented through the manipulation of the record structures available through psu's system. this technique provides for enhanced access to the individual items of a collection by giving the context of the item within the collection. links between the hierarchical levels facilitate navigation between the levels. although the records created for traditional library systems are not as rich as those found in traditional finding aids, or in ead, their electronic equivalent; and the visual arrangements are not as intriguing as a well-planned web site, the ability to show how items fit within the greater context of their respective collection(s) is a step toward reconciling traditional library and archival practices. enabling the library user to virtually browse through the overall resources offered by the library and then, if desired, through the various levels of a collection for relevant resources enhances the opportunities presented to the user for finding relevant information. references and notes 1. society of american archivists, "so you want to be an archivist: an overview of the archival profession," 2004, www.archivists.org/prof-education/arprof.asp (accessed apr. 24, 2006). 2. kent m. haworth, "archival description: content and context in search of structure," journal of internet cataloging 4, no. 3/4 (2001): 7–26. 3. antonio panizzi, "rules for the compilation of the catalogue," the catalogue of the british museum 1 (1841): v–ix. 4. joint steering committee for revision of aacr, anglo-american cataloguing rules, 2nd ed., 2002 revision (chicago: ala, 2002). 5. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004). 6. haworth, "archival description." 7. oliver w. holmes, "archival arrangement: five different operations at five different levels," american archivist 27, no. 1 (1964): 21–41; terry abraham, "oliver w. holmes revisited: levels of arrangement and description of practice," american archivist 54, no. 3 (1991): 370–77. 8. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004), xiii. 9. haworth, "archival description." 10. society of american archivists, describing archives: a content standard (chicago: society of american archivists, 2004); steven l.
hensen, comp., archives, personal papers, and manuscripts, 2nd ed. (chicago: society of american archivists, 1989). 11. peter carini and kelcy shepherd, "the marc standard and encoded archival description," library hi tech 22, no. 1 (2004): 18–27; steven l. hensen, "archival cataloging and the internet: the implications and impact of ead," journal of internet cataloging 4, no. 3/4 (2001): 75–95. 12. abraham, "oliver w. holmes revisited." 13. elizabeth j. weisbrod and paula duffy, "keeping your online catalog from degenerating into a finding aid: considerations for loading microformat records into the online catalog," technical services quarterly 11, no. 1 (1993): 29–42. 14. carini and shepherd, "the marc standard and encoded archival description." 15. see, for example, margaret f. nichols, "finding the forest among the trees: the potential of collection-level cataloging," cataloging & classification quarterly 23, no. 1 (1996): 53–71; and weisbrod and duffy, "keeping your online catalog from degenerating into a finding aid." 16. joint steering committee for revision of aacr, anglo-american cataloguing rules, d-2. 17. innovative interfaces inc., "electronic resources management," 2005, www.iii.com/pdf/lit/eng_erm.pdf (accessed apr. 24, 2006). 18. library of congress, marc 21 format for holdings data: including guidelines for content designation (washington, d.c.: cataloging distribution service, library of congress, 2000), appendix e–glossary. 19. carl abbott, "planning a sustainable portland: a digital library for local, regional, and state planning and policy documents—framing paper," 2005, http://oscdl.research.pdx.edu/framing.php (accessed apr. 24, 2006). 20. anne a. salter, "21st-century archivist," newsletter, 2003, www.lisjobs.com/newsletter/archives/sept03asalter.htm (accessed apr. 24, 2006). appendix. figures
figure 1. example of resource record from the psu library catalog (search conducted nov. 4, 2005)
figure 2. example of a bibliographic record for a journal title from the psu library catalog (search conducted nov. 4, 2005)
figure 3. partial diagram of the hierarchical levels of the collection
figure 4. resource record harbor drive with linked holdings records, bibliographic records, and original documents
figure 5. spreadsheet for tracking data
figure 6. data spreadsheet
figure 7. comma delimited coverage load file (*.csv)
figure 8. bibliographic records template
figure 9. browse screen in opac
figure 10. system-created brief bibliographic record in opac
figure 11. resource record with various links
figure 12. full bibliographic record in opac
figure 13. bonner resource record with linked holdings records, bibliographic records, and original documents
figure 14. outline of linked records in the collection
figure 15. bibliographic record for the oscdl collection
book reviews
proceedings of the 1968 clinic on library applications of data processing, edited by dewey e. carroll. urbana: university of illinois, 1969.
235 pp. $3.00. for all except inveterate institute participants, it must be difficult to decide to spend yet another week listening to a widely mixed series of papers and discussions on data processing in libraries, in the hope of finding something new or useful. to attract a wide audience, the offerings tend to range from simple introductions to technical discussions of specific programs or projects. the value of gathering the papers of such institutes into volumes of proceedings is questionable. material from the introductory papers would certainly find greater use in a comprehensive monograph, while the papers which report new developments or technical problems would have a better chance of reaching their proper audiences if published in journals. the repetitive "how-we-did-it" reports might best be left unpublished. the proceedings of the 1968 illinois clinic does have a number of articles which deserve wide readership. frederick g. kilgour's paper on initial system design for the ohio college library center is excellent, not so much for solutions, but because he raises the questions on the purpose of college libraries and the nature of regional systems which need to be raised before embarking on design. those who have had experience with automated operations will appreciate lawrence auld's listing of ten categories of library automation failure. (he omits one of the most common: lack of computer stability.) a technical article of considerable interest is alan r. benenfeld's paper on generation and encoding of the data base for intrex. those looking for reports of successful computer applications may find useful information in the papers by robert hamilton, of the illinois state library, on circulation; by james w. thomson and robert h. muller, of the university of michigan, on the u. of m. order system; by michael m. reynolds, of indiana university, on centralized technical processing for the university's regional campus libraries; by john p. kennedy, of georgia tech, on production of catalog cards; and by robert k. kozlow, of the university of illinois, on a computer-produced serials list. melvin j. voigt. planning library services. proceedings of a research seminar held at the university of lancaster, 9-11 july, 1969. edited by a. graham mackenzie and ian m. stuart. lancaster, england: university of lancaster library, 1969. 30 shillings. this volume offers fifteen papers presented in six sessions; each session had one or more papers and some discussion. the papers range from very general mathematical models to local problems of british legal codes and re-organization of local governments. the first session introduces the problems and some theoretical notions of how to deal with them. the next three sessions deal with analysis techniques. morely introduces some simple techniques of maximizing benefits for given resources. brookes presents a good quick introduction to statistics and distributions which occur frequently in information science. leimkuhler develops cost models for storage policies and woodburn analyzes the costs in hierarchical library systems. the mathematics in these latter papers, although not difficult, will probably put off a good many librarians and administrators. both are practitioners and impressed by results, not complex models; the equations developed by leimkuhler or woodburn are probably too complex to be successfully used by most librarians.
this might reflect the state of the librarian and not of the art, however. to quote cloote (from the paper by duchesne): "with only a very few notable exceptions, successful models have been so simple that an operational research specialist would disown them." the fifth session covers data collection and evaluation. duchesne comments on management information systems and operations research for librarians. conventional techniques of data collection are reviewed by ford, including sample forms and a note of warning about too many surveys. in the final session leimkuhler presents an overview which includes several choice comments on progress (or lack of it) in libraries. during the discussion period, mackenzie suggests that libraries should use up to five percent of their budgets for research. this reviewer feels that unless this suggestion is taken more seriously, most of the theory will never find an application. these proceedings would make an excellent companion to burkhalter's case studies in library systems analysis as more theoretically oriented readings for a course in operations research or administration in librarianship. some of the techniques presented could be adapted for immediate application in analyzing present systems. thus this collection of papers can be useful to both student and practitioner interested in research and development of library systems. arvo tars. libraries at large, edited by douglas m. knight and e. shepley nourse. new york: r. r. bowker, 1969. 664 p. $14.95. libraries at large is based on the materials which the national advisory commission on libraries employed in its deliberations. the commission appraised the adequacy of libraries and made recommendations designed "to ensure an effective, efficient library system for the nation." these materials are also useful to those engaged in the enrichment of present library programs and to those developing new library projects. the materials consist of papers and reports written for the commission and include essays, original investigations, and literature reviews, as well as reprints of material that has appeared elsewhere. some papers are of top quality; some are poor. nevertheless, the appearance of these materials in one volume adds a convenient source of information that will be useful to librarians for years to come. approximately half the book is devoted to problems related to the use of libraries and to the users of libraries. the second half contains discussions of government relationships of libraries and a series of useful appendixes. perhaps the most novel section of the book is william j. baumol's "the cost of library and informational services." this study investigates the economics of libraries in depth and the results are of great interest. this chapter on economics contains new material and brings together that which existed heretofore, so that it constitutes the major resource on library economics. this chapter alone is so valuable as to justify the recommendation that all libraries and most librarians should acquire libraries at large. the section on copyright is equally important, for it brings together data on a topic possessing cataclysmic potentials for librarianship. verner clapp's "copyright: a librarian's view" is the best statement that has appeared on the subject, and it is hoped that clapp's dissertation will awaken librarians to the peril that confronts them.
on the other hand, the chapter entitled "some problems and potentials of technology as applied to library informational services" is somewhat less than satisfying. the section starts off with mathews and brown's "research libraries and the new technology," which originally appeared in on research libraries. it is still an inadequate exposition. there follows a reprint of "the impact of technology on the library building," which educational facilities laboratories published in 1967. the statement is adequate, but more useful information exists. the last section of the chapter is a study, "technology in libraries," which the system development corporation produced. this paper is a useful review of technologies employed by libraries and recommends five important network and systems projects to be undertaken. the chapters on government relationships include discussions of those with the federal government and those at local, state and regional levels. germaine krettek and eileen d. cooke have provided a worthwhile appendix listing and abstracting library-related legislation at the national level. libraries at large is indeed a resource book, and those papers containing original investigations and literature reviews are of such high quality as to insure usefulness of this work to all thoughtful librarians. frederick g. kilgour. computers and their potential applications in museums. a conference sponsored by the metropolitan museum of art. new york: arno press, 1968. 402 pp. $12.50. computers and their potential applications in museums contains the published proceedings of a conference which was held in new york, 1968. sponsored by the metropolitan museum of art and supported by ibm, the conference was another attempt to involve art and related fields in computer technology. this book covers a broad range of issues and problems from information retrieval to creativity. experts from museums, educators, librarians and computer specialists discussed the possible uses and the implications of computers for the museum field. the diversity of the participants seems to represent the components of an exceedingly complex problem which is as monumental as the museum field itself. as an overall document it gives evidence of concern and insight into the many technical problems which some researchers have encountered. in many instances the non-technical experts were too global in their thinking, while the technologists were too local in their area of concern to communicate to anyone but technologists. this disparity between approaches, with the obvious difficulties presented, is a typical one whenever non-technical groups attempt to make use of computer technology. ambitious in scope, the conference had excellent participants, and several of the papers were stimulating and provocative. the interaction among the people who attended the conference may have been useful and it may have generated important ideas. for a reader of the published proceedings, one wishes there had been a final chapter which could have provided some guidelines for research and education in this field. there was an opportunity for the organizers of the conference or a small group of the participants to summarize the problems and to give some direction to solutions. several years and many conferences later we in the humanities have made little progress in use of the computer. it seems that we are still better at rhetoric than at problem solving. charles csuri
books for junior college libraries. pirie, james w., comp. chicago: american library assoc., 1969. 452 pp. $35.00. during the recent period of rapid growth and development of junior and community colleges, a bibliographic guideline has been long awaited. james w. pirie's books for junior college libraries, with its healthy potential for developing many basic collections and extending and updating others, fills that void. though it does not boast to be the single ideal bibliographic tool, it is a welcome addition to (and perhaps replacement for some of) its predecessors: frank bertalan's books for junior college libraries; charles l. trinkner's basic books for junior college libraries; hester hohman's readers adviser; helen wheeler's a basic book collection for the community college library; brodart foundation's the junior college library collection, edited by dr. bertalan; and the ever-present subject guide to books in print and books in print, from bowker. books for junior college libraries represents the cooperative efforts of some 300 expert consultants-subject specialists, faculty members and librarians-charged with the responsibility of producing a publication to serve as a book selection guide for new or established junior and community college libraries. approximately 20,000 titles are arranged by subject, broadly interpreted, with entries consisting of author, title, subtitle, edition (if other than the first), publisher, place of publication, date, pagination, price, and l.c. number. easy access is provided by the inclusion of an author and subject index. a comparative "table of subject coverage" appearing in the preface, tabulating the percentage of subject distribution to total volume for the lamont, michigan, and the more recent books for college libraries lists, indicates that books for junior college libraries maintains a comparable subject percentage distribution to total volume. only book titles have been included; foreign entries have been limited to a few major works, and out-of-print titles, in favor of titles readily available. paperbacks were listed, in the absence of card copy. though limited in its coverage of terminal and vocational courses, with emphasis toward the transfer or liberal arts program, books for junior college libraries does embrace all fields of knowledge that tend to be challenging and useful for the general education programs. it has been endorsed by the joint committee on junior colleges of aajc, ala, and the junior college section of acrl, and moves toward the recommendations of the ala standards for junior college libraries. this bibliographic guideline for junior college libraries should be welcomed by public schools as well as junior and community colleges for its assistance in developing new collections, as well as expanding and updating old collections, with quantity, quality, and economy working together. james i. richey. agricultural sciences information network development plan. educom research report, august 1969. 74 pp. the national agricultural library wants to implement its old plan of an agricultural science information network "based on the assumption that the land-grant libraries in the states are the natural nodes to this network." educom undertook a study which was submitted to and discussed by a symposium held in washington, d.
c., on february 10-12, 1970, with the participation of all agricultural libraries interested in "new and improved ways of exchanging information in support of agricultural research and education." the goal is "to develop a long-range plan for strengthening information, communication, and exchange among the libraries of land-grant institutions and the nal." according to the report, the network concept would constitute a "network of networks" and three basic components are envisioned: 1) land-grant libraries, 2) information analysis centers, and 3) telecommunications. all these components have their own aims and objectives described in this report. "nal's first course of action in the establishment of a system of information analysis centers is to develop a directory of existing analysis centers of interest to the agricultural community. the directory should be supported with a catalog detailing the services and products offered by these centers. nal should then establish cooperative agreements with these centers which would make them responsive to the needs of the users of the agricultural sciences information network. this should be supported with the installation of communications equipment to encourage and facilitate the use of a center." no doubt, the participants of the symposium will have thoroughly investigated and discussed this plan with serious consideration to its practical implementation. a new approach and improvement of information exchange is not only a necessity, but also long overdue, for those in agriculture. this information development plan would provide service for research workers at the experiment stations, scientists and teachers at the colleges, agricultural extension people at the land-grant institutions, and, last but not least, for the farmers who provide us with food and fibers in order to bring a fuller and better life on the farm and in rural and city homes. a detailed analysis of the performance, an evaluation and revision of this gigantic scientific information system, can only be made after it has been in operation for a few years. it is very promising that the national agricultural library-among its many objectives-has again taken the initiative. john de gara. cornell university libraries. manual of cataloging procedures. 2d ed. ithaca, n.y.: cornell university libraries, 1969. $18.00. editor robert b. slocum and his associates have produced a valuable manual useful to catalogers and persons involved in the administration of policies and procedures in technical services. as stated in the preface the manual is a supplement, not a substitute, for the anglo-american cataloging rules and its predecessors, lc list of subject headings and the lc classification schedules. the following directive is basic: "the revisers are always open for consultation on particularly difficult problems, but it must be assumed that a professional cataloger will have a thorough knowledge of the basic tools of his profession. . . . if this knowledge is in any way lacking, the cataloger has the obvious responsibility of acquiring it through diligent study and experience. he should not come to the reviser with questions whose answers are available in the aforementioned tools and in this manual." the format is loose-leaf, so that additions and revisions may be made easily to reflect new developments and techniques.
the sections include pre-cataloging procedures; general cataloging and classification procedures; recataloging and reclassification; cornell university college and department libraries-special collections and special catalogs; . . . serials and binding department; files and filing; typing, card production, book preparation; statistics; appendix (including abbreviations, romanization tables, etc. ) ; and index. the procedures and practices described are those adopted by a research library "conscious of the need for both quality and quantity in the work of its staff." this publication, weighing five pounds, is a great achievement and with its full index an indispensable contribution to the collection of worthwhile cataloging manuals. descriptions of local procedures may seem detailed but basic principles and policies are well covered. the final touch is the inclusion of a catalog card for the manual! margaret oldfather barnettellis 22 information technology and libraries | march 2005 the metascholar initiative of emory university libraries, in collaboration with the center for the study of southern culture, the atlanta history center, and the georgia music hall of fame, received an institute of museum and library services grant to develop a new model for library-museum-archives collaboration. this collaboration will broaden access to resources for learning communities through the use of the open archives initiative protocol for metadata harvesting (oaipmh). the project, titled music of social change (mosc), will use oai-pmh as a tool to bridge the widely varying metadata standards and practices across museums, archives, and libraries. this paper will focus specifically on the unique advantages of the use of oaipmh to concurrently maximize the exposure of metadata emergent from varying metadata cultures. t he metascholar initiative of emory university libraries, in collaboration with the center for the study of southern culture, the atlanta history center, and the georgia music hall of fame, received an institute of museum and library services grant to develop a new model for library-museum-archives collaboration to broaden access to resources for learning communities through the use of the open archives initiative protocol for metadata harvesting (oai-pmh).1 the collaborators of the project, entitled music of social change (mosc), are creating a subject-based virtual collection concerning music and musicians associated with social-change movements such as the civil-rights struggle. this paper will specifically focus on the advantages offered by oai-pmh in amalgamating and serving metadata from these institutional sources that are significantly different in kind.2 there has been a great deal of discussion within the library community as to the possibilities oai-pmh holds for harvesting, aggregating, and then disseminating research metadata. however, in reality, only a few of institutions (be they museum, archives, or libraries) have actually begun to utilize oai-pmh to this end. there are some practical, historical barriers to implementing any shared system for distributing metadata across institutions that are, more than in degree, different in kind. one of these significant differences is of metadata cultures and practices. libraries have traditionally incrementally assigned metadata at an item level within their collection(s). the strength of this model is that at least a minimal amount of metadata is assigned to a very high percentage of items within the collection. 
the challenge of such a system is that for such metadata records to interoperate within a shared database and through a common interface (for example, the traditional union catalog), the metadata fields have been quite rigidly defined compared to those within archival and museum environments. due to tradition as well as the sheer volume of items collected by libraries, metadata at an item level are not greatly detailed or contextualized. often, items within library collections lack robust relational mapping to other items within or outside of the collection, as is done, for example, in archival processing. content contextualization is highly valued by archival metadata practices and culture as the central tenet of metadata creation. items at a subcollection level almost always have metadata derivative from and deferential to that of the collection-level metadata. the great benefit of archival practices in metadata assignment is a contextualization of content that reflects the background, the topographic place in time and space of a given portion of a collection and its organic, emergent relationship to the whole. the weaknesses of this model are a great inconsistency in description details and variables (at the collection and subcollection levels), as well as very disparate levels of granularity within the hierarchy of the structure of a collection at which metadata are assigned. such disparities among institutional types feed an unnecessary level of misunderstanding by libraries of the metadata culture and aims of archives as well as those of museums. museums often have very skeletal documented (as opposed to undocumented) metadata about their collections or objects therein. often museums are not funded to make metadata on their collections freely available. it is common, in fact, for curatorial staff to view metadata as intellectual property to which they serve as gatekeepers, reflecting a professional value placed upon contextualizing materials for users. this is done on a user-by-user or exhibition-by-exhibition basis, depending on user background or the thesis of a given exhibition. additionally, museums perceive information on the aboutness of their collections to be a class of capital with which they can always potentially cost-recover or generate income. within the culture of museums, staff have traditionally been disinclined to make their collections available in an unmediated manner. additionally, there has been resistance to documenting information about collections in a systematic way. there is even greater resistance to adhering to any prescriptions on metadata as would be required for compliance with even the most minimally structured database. such regulation would discriminate against the nuanced information required for each and every object within a collection. the mosc project: using the oai-pmh to bridge metadata cultural differences across museums, archives, and libraries. eulalia roel. eulalia roel (eulalia.roel@gmail.com) is coordinator of information resources at the federal reserve, atlanta. why oai-pmh to bridge these cultures? oai-pmh was selected by the mosc project as a means to bridge some of these substantial disparities. the protocol is often mistakenly assumed to function only with metadata expressed as unqualified dublin core (dc). in fact, the protocol functions with any metadata format expressed by extensible markup language (xml); this is the minimal requirement for content to serve metadata through oai-pmh.
this includes those formats that have been well received by institutions other than libraries, such as xml encoded archival description (ead) as it is used in archives. as per 4.2 of the oai-pmh guidelines for repository implementers, communities are able to develop their own collection description xml schemas for use within description . . . elements. if all that is desired is the ability to include an unstructured textual description, then it is recommended that repositories use the dublin core description element. seven existing schemes are: dublin core, encoded archival description (ead), the eprints schema, rslp collection description schema, uddi/wsdl, marc21, and the branding schema.3 the oai protocol has often been partnered with unqualified dc metadata, as this is the most minimal metadata structure necessary for participation in an oai harvesting system. not only are these dc fields unqualified, no fields are actually required. no structure or regulations are codified outside of requiring metadata contributors to adhere to this unqualified metadata schema. therefore, the oai protocol requires minimal technology support and resources at any given contributing site (such support varying more widely across institutions than even their metadata practices themselves). this maximizes flexibility in metadata contribution, as well as maximizing interoperability between the collective data pool from which a user can search. granted, this unregulated framework does come at a cost of inconsistency in metadata detail and quality. however, the great advantage of such nominal requirements is that they enable contributors with minimal metadata-encoding practices to participate in the metadata collaborative. following is an example of a record as it may appear in the mosc collection:
<record>
  <header>
    <identifier>oai:atlantahistorycenter.com:10</identifier>
    <datestamp>2003-03-31</datestamp>
    <setSpec>south:blues</setSpec>
    <setSpec>south:mississippi-delta-region</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>long hall recordings</dc:title>
      <dc:creator>morris, william</dc:creator>
      <dc:subject>blues</dc:subject>
      <dc:description>comment: sound amateur recording</dc:description>
      <dc:date>2003-05-16</dc:date>
      <dc:type>sound recording</dc:type>
      <dc:identifier>http://atlantahistorycenter.com/porcelain/10</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>
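to make the harvesting side of the protocol concrete, the following minimal python sketch (not part of the mosc project's own software; the repository url is hypothetical) shows how a harvester might issue a listrecords request for unqualified dc and read back fields like those in the record above. a production harvester would also re-request with any resumptiontoken returned in order to page through large result sets.

import urllib.request
import xml.etree.ElementTree as ET

# hypothetical oai-pmh endpoint of a contributing repository
BASE_URL = "http://repository.example.org/oai"

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def list_records(base_url=BASE_URL):
    """Issue one ListRecords request for unqualified DC and yield (identifier, fields)."""
    url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
    with urllib.request.urlopen(url) as response:
        root = ET.parse(response).getroot()
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        fields = {}
        for element in record.iter():
            if element.tag.startswith(DC_NS):          # dc:title, dc:creator, ...
                name = element.tag[len(DC_NS):]
                fields.setdefault(name, []).append((element.text or "").strip())
        yield identifier, fields
    # note: a real harvester would follow any resumptionToken here

if __name__ == "__main__":
    for identifier, fields in list_records():
        print(identifier, fields.get("title"), fields.get("creator"))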
additionally, with no fields required by the dc schema, institutions can have absolute discretion as to what metadata are exposed if this is a concern (as may be for privacy considerations for archives or for intellectual-property concerns for museums). however, one of the great strengths of implementing oai-pmh is that, while the threshold for regulating metadata is low, the protocol can also handle any metadata format expressed by xml, including data formats significantly more structured than dc; for example, ead, text encoding initiative (tei), and tei lite-defined documents. scholars are then able to access these scholarly objects via one point, while still being able to collectively access and utilize all metadata objects available in all collections, from the most to the least robust. the aim of the mosc project participants in selecting oai-pmh is to maximize participation from fairly disparate kinds of organizations, with equally disparate kinds of metadata cultures and practices. in comparison to other, currently available methods of metadata aggregation, oai-pmh is maximally forgiving of discordant metadata suppliers. thereby, the hope is, metadata contributions are maximized. concurrently, the protocol allows for highly robust metadata formats. in some cases, metadata objects are stripped down as the cost of inclusion in aggregated systems; this need is eliminated when oai-pmh is utilized. the use of the protocol allows for the inclusion of objects consisting of the most skeletal unqualified dublin core elements, while still accommodating the most complicated metadata objects. optimally, this is a means to achieve a critical mass of contributed resources that will enable end users to utilize the mosc project as the premier site and a primary resource for information on materials about music and musicians associated with social-change movements. acknowledgment the author would like to express her sincerest gratitude to the institute of museum and library services for funding the music of social change project. references 1. "metascholar: an emory university digital library research initiative," emory university libraries web site. accessed sept. 1, 2004, http://metascholar.org/; "the center for southern culture," university of mississippi web site. accessed sept. 1, 2004, www.olemiss.edu/depts/south/; "atlanta history center," atlanta history center web site. accessed sept. 1, 2004, www.atlantahistorycenter.com/; "georgia music hall of fame," georgia music hall of fame web site. accessed sept. 1, 2004, www.gamusichall.com/home.html; "institute of museum and library services: library-museum collaboration," institute of museum and library services web site. accessed sept. 1, 2004, www.imls.gov/grants/l-m/index.htm. 2. "implementation guidelines for the open archives initiative protocol for metadata harvesting," open archives initiative web site. accessed sept. 1, 2004, www.openarchives.org/oai/openarchivesprotocol.html#introduction. 3. "4.2 collection and set descriptions," open archives initiative web site. accessed sept. 1, 2004, www.openarchives.org/oai/2.0/guidelines-repository.htm#setdescription. wikiwikiwebs: new ways to communicate in a web environment. chawner, brenda; lewis, paul h. information technology and libraries; mar. 2006; 25(1); proquest education journals, pg. 33.
bailey 116 information technology and libraries | september 2006 three critical issues—a dramatic expansion of the scope, duration, and punitive nature of copyright laws; the ability of digital rights management (drm) systems to lock down digital content in an unprecedented fashion; and the erosion of net neutrality, which ensures that all internet traffic is treated equally—are examined in detail and their potential impact on libraries is assessed. how legislatures, the courts, and the commercial marketplace treat these issues will strongly influence the future of digital information for good or ill. editor's note: this article was submitted in honor of the fortieth anniversaries of lita and ital. blogs. digital photo and video sharing. podcasts. rip/mix/burn. tagging. vlogs. wikis. these buzzwords point to a fundamental social change fueled by cheap personal computers (pcs) and servers, the internet and its local wired/wireless feeder networks, and powerful, low-cost software. citizens have morphed from passive media consumers to digital-media producers and publishers. libraries and scholars have their own set of buzzwords: digital libraries, digital presses, e-prints, institutional repositories, and open-access (oa) journals, to name a few. they connote the same kind of change: a democratization of publishing and media production using digital technology. it appears that we are on the brink of an exciting new era of internet innovation: a kind of digital utopia. gary flake of microsoft has provided one striking vision of what could be (with a commercial twist) in a presentation entitled “how i learned to stop worrying and love the imminent internet singularity,” and there are many other visions of possible future internet advances.1 when did this metamorphosis begin? it depends on who you ask. let’s say the late 1980s, when the internet began to get serious traction and an early flowering of noncommercial digital publishing occurred. in the subsequent twenty-odd years, publishing and media production went from being highly centralized, capital-intensive analog activities with limited and well-defined distribution channels, to being diffuse, relatively low-cost digital activities with the global internet as their distribution medium. not to say that print and conventional media are dead, of course, but it is clear that their era of dominance is waning. the future is digital.
nor is it to say that entertainment companies (e.g., film, music, radio, and television companies) and information companies (e.g., book, database, and serial publishers) have ceded the digital-content battlefield to the upstarts. quite the contrary. high-quality, thousand-page-per-volume scientific journals and hollywood blockbusters cannot be produced for pennies, even with digital wizardry. information and entertainment companies still have an important role to play, and, even if they didn’t, they hold the copyrights to a significant chunk of our cultural heritage. entertainment and information companies have understood for some time that they must adapt to the digital environment or die, but this change has not always been easy, especially when it involves concocting and embracing new business models. nonetheless, they intend to thrive and prosper—and to do whatever it takes to succeed. as they should, since they have an obligation to their shareholders to do so. the thing about the future is that it is rooted in the past. culture, even digital culture, builds on what has gone before. unconstrained access to past works helps determine the richness of future works. inversely, when past works are inaccessible except to a privileged minority, future works are impoverished. this brings us to a second trend that stands in opposition to the first. put simply, it is the view that intellectual works are property; that this property should be protected with the full force of civil and criminal law; that creators have perpetual, transferable property rights; and that contracts, rather than copyright law, should govern the use of intellectual works. a third trend is also at play: the growing use of digital rights management (drm) technologies. when intellectual works were in paper (or other tangible forms), they could only be controlled at the object-ownership or object-access levels (a library controlling the circulation of a copy of a book is an example of the second case). physical possession of a work, such as a book, meant that the user had full use of it (i.e., the user could read the entire book and photocopy pages from it). when works are in digital form and are protected by some types of drm, this may no longer be true. for example, a user may only be able to view a single chapter from a drm-protected e-book and may not be able to print it. the fourth and final trend deals with how the internet functions at its most fundamental level. the internet was designed to be content-, application-, and hardware-neutral. as long as certain standards were met, the network did not discriminate. one type of content was not given preferential delivery speed over another. one type of content was not charged for delivery while another was free. one type of content was not blocked (at least by the network) while another was unhindered. in recent years, network neutrality has come under attack. the collision of these trends has begun in courts, legislatures, and the marketplace. it is far from over. as we shall see, its outcome will determine what the future of digital culture looks like. strong copyright + drm + weak net neutrality = digital dystopia? charles w. bailey jr. charles w. bailey jr. (cbailey@digital-scholarship.com) is assistant dean for digital library planning and development at university of houston libraries. stronger copyright: 1790 versus 2006 copyright law is a complex topic. it is not my intention to provide a full copyright primer here.
(indeed, i will assume that the reader understands some copyright basics, such as the notion that facts and ideas are not covered by copyright.) rather, my aim is to highlight some key factors about how and why united states copyright law has evolved and how it relates to the digital problem at hand. three authors (lawrence lessig, professor of law at the stanford law school; jessica litman, professor of law at the wayne state university law school; and siva vaidhyanathan, assistant professor in the department of culture and communication at new york university) have done brilliant and extensive work in this area, and the following synopsis is primarily based on their contributions. i heartily recommend that you read the cited works in full. the purpose of copyright let us start with the basis of u.s. copyright law, the constitution’s “progress clause”: “congress has the power to promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries.”2 copyright was a bargain: society would grant creators a time-limited ability to control and profit from their works before they fell into the public domain (where works are unprotected) because doing so resulted in “progress of science and useful arts” (a social good). regarding the progress clause, lessig notes: it does not say congress has the power to grant “creative property rights.” it says that congress has the power to promote progress. the grant of power is its purpose, and its purpose is a public one, not the purpose of enriching publishers, nor even primarily the purpose of rewarding authors.3 however, entertainment and information companies can have a far different view, as illustrated by this quote from jack valenti, former president of the motion picture association of america: “creative property owners must be accorded the same rights and protections resident in all other property owners in the nation.”4 types of works covered when the copyright act of 1790 was enacted, it protected published books, maps, and charts written by living u.s. authors as well as unpublished manuscripts by them.5 the act gave the author the exclusive right to “print, reprint, publish, or vend” these works. now, copyright protects a wide range of published and unpublished “original works of authorship” that are “fixed in a tangible medium of expression” without regard for “the nationality or domicile of the author,” including “1. literary works; 2. musical works, including any accompanying words; 3. dramatic works, including any accompanying music; 4. pantomimes and choreographic works; 5. pictorial, graphic, and sculptural works; 6. motion pictures and other audiovisual works; 7. sound recordings; 8. 
architectural works.”6 rights in contrast to the limited print publishing rights inherent in the copyright act of 1790, current law grants copyright owners the following rights (especially notable is the addition of control over derivative works, such as a play based on a novel or a translation): ฀ to reproduce the work in copies or phonograph records; ฀ to prepare derivative works based upon the work; ฀ to distribute copies or phonograph records of the work to the public by sale or other transfer of ownership, or by rental, lease, or lending; ฀ to perform the work publicly, in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works; ฀ to display the copyrighted work publicly, in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work; and ฀ in the case of sound recordings, to perform the work publicly by means of a digital audio transmission.7 duration the copyright act of 1790 granted authors a term of fourteen years, with one renewal if the author was still living (twenty-eight years total).8 now the situation is much more complex, and, rather than trying to review the details, i’ll provide the following example. for a personal author who produced a work on or after january 1, 1978, it is covered for the life of the author plus seventy years.9 so, assuming 118 information technology and libraries | september 2006 an author lives an average seventy-five years, the work would be covered for 144 years, which is approximately 116 years longer than in 1790. registration registration was required by the copyright act of 1790, but very few eligible works were registered from 1790 to 1800, which enriched the public domain.10 now registration is not required, and no work enriches the public domain until its term is over, even if the author (or the author’s descendants) have no interest in the work being under copyright, or it is impossible to locate the copyright holder to gain permission to use his or her works (creating so-called “orphan works”). drafting of legislation by 1901, copyright law had become fairly esoteric and complex, and drafting new copyright legislation had become increasingly difficult. consequently, congress adopted a new strategy: let those whose commercial interests were directly affected by copyright law deliberate and negotiate with each other about copyright law changes, and use the results of this process as the basis of new legislation.11 over time, this increasingly became a dialogue among representatives of entertainment, high-tech, information, and telecommunications companies; other parties, such as library associations; and rights-holder groups (e.g., ascap). since these parties often had competing interests, the negotiations were frequently contentious and lengthy. the resulting laws created a kind of crazy quilt of specific exceptions for the deals made during these sessions to the ever-expanding control over intellectual works that copyright reform generally engendered. since the public was not at the table, its highly diverse interests were not directly represented, and, since stakeholder industries lobby congress and the public does not, the public’s interests were often not well served. (there were some efforts by special interest groups to represent the public on narrowly focused issues.) 
frequency of copyright term legislation with remarkable restraint, congress, in its first hundred years, enacted one copyright bill that extended the copyright term and one in its next fifty; however, starting in 1962, it passed eleven bills in the next forty years.12 famously, jack valenti once proposed that copyright “last forever less one day.”13 by continually extending copyright terms in a serial fashion, congress may grant him his wish. licenses in 1790, copyrighted works were sold and owned. today, many digital works are licensed. licenses usually fall under state contract law rather than federal copyright law.14 licensed works are not owned, and the first-sale doctrine is not in effect.15 while copyright is the legal foundation of licenses (i.e., works can be licensed because licensors own the copyright to those works), licenses are contracts, and contract provisions trump user-favorable copyright provisions, such as fair use, if the licensor chooses to negate them in a license. criminal and civil penalties in 1790 there were civil penalties for copyright infringement (e.g., statutory fines of “50 cents per sheet found in the infringer ’s possession”).16 now there are criminal copyright penalties, including felony violations that can result in a maximum of five years of imprisonment and fines as high as $250,000 for first-time offenders; civil statutory fines that can range as high as $150,000 per infringement (if infringement is “willful”), and other penalties.17 once the copyright implications of digital media and the internet sunk in, entertainment and information companies were deeply concerned: digital technologies made creating perfect copies effortless, and the internet provided a free (or low-cost) way to distribute content globally. congress, primarily spurred on by entertainment companies, passed several laws aimed at curtailing perceived digital “theft” through criminal penalties. under the 1997 no electronic theft (net) act, copyright infringers face “up to 3 years in prison and/or $250,000 fines,” even for noncommercial infringement.18 under the 1998 digital millennium copyright act (dmca), those who defeat technological mechanisms that control access to copyrighted works (a process called “circumvention”) face a maximum of five years in prison and $500,000 in fines.19 effect of copyright on average citizens in 1790, copyright law had little effect on citizens. the average person was not an author or publisher, private use of copyrighted materials was basically unregulated, the public domain was healthy, and many types of works were not covered by copyright at all. in 2006, ฀ virtually every type of work imaginable is under automatic copyright protection for extended periods of time; ฀ private use of digital works is increasingly visible and of concern to copyright holders; ฀ the public domain is endangered; and ฀ ordinary citizens are being prosecuted as “pirates” under draconian statutory and criminal penalties. digital dystopia | bailey 119 regarding this development, lessig says: for the first time in our tradition, the ordinary ways in which individuals create and share culture fall within the reach of the regulation of the law, which has expanded to draw within its control a vast amount of culture and creativity that it never reached before. the technology that preserved the balance of our history—between uses of our culture that were free and uses of our culture that were only upon permission—has been undone. 
the consequence is that we are less and less a free culture, more and more a permission culture.20 how has copyright changed since the days of the founding fathers? as we have seen, there has been a shift in copyright law (and social perceptions of it) from ฀ promoting progress to protecting intellectual property owners’ “rights”; ฀ from covering limited types of works to covering virtually all types of works; ฀ from granting only basic reproduction and distribution rights to granting a much wider range of rights; ฀ from offering a relatively short duration of protection to offering a relatively long (potentially perpetual) one; ฀ from requiring registration to providing automatic copyright; ฀ from drafting laws in congress to drafting laws in work groups of interested parties dominated by commercial representatives; ฀ from making infrequent extensions of copyright duration to making frequent ones; ฀ from selling works to licensing them; ฀ from relatively modest civil penalties to severe civil and criminal penalties; and ฀ from ignoring ordinary citizens’ typical use of copyrighted works to branding them as pirates and prosecuting them with lawsuits. (regarding lawsuits filed by the recording industry association of america against four students, lessig notes: “if you added up the claims, these four lawsuits were asking courts in the united states to award the plaintiffs close to $100 billion—six times the total profit of the film industry in 2001.”)21 complicating this situation further is intense consolidation and increased vertical integration in the entertainment, information, telecommunications, and other high-tech industries involved in the internet.22 this vertical integration has implications for what can be published and the free flow of information. for example, a company that publishes books and magazines, produces films and television programs, provides internet access and digital content, and provides cable television services (including broadband internet access) has different corporate interests than a company that performs a single function. these interrelated interests may affect not only what information is produced and whether competing information and services are freely available through controlled digital distribution channels, but corporate perceptions of copyright issues as well. one of the ironies of the current copyright situation is this: if creative works are by nature property, and stealing property is (and has always been) wrong, then some of the very industries that are demanding that this truth be embodied in copyright law have, in the past, been pirates themselves, even though certain acts of piracy may have been legal (or appeared to be legal) under then-existing copyright laws.23 lessig states: if “piracy” means using the creative property of others without their permission—if “if value, then right” is true—then the history of the content industry is a history of piracy. every important sector of “big media” today—film, records, radio, and cable tv—was born of a kind of piracy so defined. the consistent story is how last generation’s pirates join this generation’s country club—until now.24 let’s take a simple case: cable television. early cable television companies used broadcast television programs without compensating copyright owners, who branded their actions as piracy and filed lawsuits. 
after two defeats in the supreme court, broadcast television companies won a victory (of sorts) in congress, which took nearly thirty years to resolve the matter: cable television companies would pay, but not what broadcast television companies wanted; rather they would pay fees determined by law.25 of course, this view of history (big media companies as pirates in their infancy) is open to dispute. for the moment, let’s assume that it is true. put more gently, some of the most important media companies of modern times flourished because of relatively lax copyright control, a relatively rich public domain, and, in some cases, a societal boon that allowed them to pay statutory license fees— which are compulsory for copyright owners—instead of potentially paying much higher fees set by copyright owners or being denied use at all. today, the very things that fostered media companies’ growth are under attack by them. the success of those attacks is diminishing the ability of new digital content and service companies to flourish and, in the long run, may diminish even big media’s ability to continue to thrive as a permission culture replaces a permissive culture. several prominent copyright scholars have suggested copyright reforms to help restore balance to the copyright system. james boyle, professor of law at the duke university law school, recommends a twenty-year copyright term with “a broadly defined fair use protection for journalistic, teaching, and parodic uses—provided that those uses were not judged to be in bad faith by a jury applying the ‘beyond a reasonable doubt’ standard.”26 120 information technology and libraries | september 2006 william w. fisher iii, hale and dorr professor of intellectual property law at harvard university law school, suggests that “we replace major portions of the copyright and encryption-reinforcement models with . . . a governmentally administered reward system” that would put in place new taxes and compensate registered copyright owners of music or films with “a share of the tax revenues proportional to the relative popularity of his or her creation,” and would “eliminate most of the current prohibitions on unauthorized reproduction, distribution, adaptation, and performance of audio and video recordings.”27 lessig recommends that copyright law be guided by the following general principles: (1) short copyright terms, (2) a simple binary system of protected/not protected works without complex exceptions, (3) mandatory renewal, and (4) a “prospective” orientation that forbids retrospective term extensions.28 (previously, lessig had proposed a seventy-five-year term contingent on five-year renewals). he suggests reinstating the copyright registration requirement using a flexible system similar to that used for domain name registrations. he favors works having copyright marks, and, if they are not present, he would permit their free use until copyright owners voice their opposition to this use (uses of the work made prior to this point would still be permitted). 
litman wants a copyright law “that is short, simple, and fair,” in which we “stop defining copyright in terms of reproduction” and recast copyright as “an exclusive right of commercial exploitation.”29 litman would eliminate industry-specific copyright law exceptions, but grant the public “a right to engage in copying or other uses incidental to a licensed or legally privileged use”; the “right to cite” (even infringing works); and “an affirmative right to gain access to, extract, use, and reuse the ideas, facts, information, and other public-domain material embodied in protected works” (including a restricted circumvention right).30 things change in two hundred-plus years, and the law must change with them. since the late nineteenth century, copyright law has been especially impacted by new technologies. the question is this: has copyright law struck the right balance between encouraging progress through granting creators specific rights and fostering a strong public domain that also nourishes creative endeavor? if that balance has been lost, how can it be restored? or is society simply no longer striving to maintain that balance because intellectual works are indeed property, property must be protected for commerce to prosper, and the concept of balance is outmoded and no longer reflects societal values? ฀ drm: locked-up content and fine-grained control noted attorney michael godwin defines drm as “a collective name for technologies that prevent you from using a copyrighted digital work beyond the degree to which the copyright owner (or a publisher who may not actually hold a copyright) wishes to allow you to use it.”31 like copyright, drm systems are complex, with many variations. there are two key technologies: (1) digital marking (i.e., digital fingerprints that uniquely identify a work based on its characteristics, simple labels that attach rights information to content, and watermarks that typically hide information that can be used to identify a work), and (2) encryption (i.e., scrambled digital content that requires a digital key to decipher it).32 specialized hardware can be used to restrict access as well, often in conjunction with digital marking and encryption. the intent of this article is not to provide a technical tutorial, but to set forth an overview of the basic drm concept and discuss its implications. what is of interest here is not how system a-b-c works in contrast to system x-y-z, but what drm allows copyright owners to do and the issues related to drm. to do so, let’s use an analogy, understanding that real drm systems can work in other ways as well (e.g., digital watermarks can be used to track illegal use of images on the internet without those images being otherwise protected). for the moment, let’s imagine that the content a user wishes to access is in an unbreakable, encrypted digital safe. the user cannot see inside the safe. by entering the correct digital combination, certain content becomes visible (or audible or both) in the safe. that content can then be utilized in specific ways (and only those ways), including, if permitted, leaving the safe. if a public domain work is put in the safe, access to it is restricted regardless of its copyright status. 
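the encryption half of that analogy can be sketched in a few lines of python. this is purely illustrative and is not a description of any actual drm product; it assumes the third-party "cryptography" package. the point is simply that content locked with a key is unreadable until the same key is produced, whatever the work's copyright status.

# illustrative "digital safe": symmetric encryption with the cryptography package
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # the "combination", held by the rights holder
safe = Fernet(key)

# even a public-domain text becomes unreadable once it is locked in the safe
plaintext = b"a public-domain sentence placed inside the safe"
locked = safe.encrypt(plaintext)

print(locked[:32])                 # ciphertext: useless without the key
print(safe.decrypt(locked))        # with the key, the content is released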
bill rosenblatt, bill trippe, and stephen mooney provide a very useful conceptual model of drm rights in their landmark drm book, digital rights management: business and technology, summarized here.33 there are three types of content rights: (1) render rights, (2) transport rights, and (3) derivative-works rights. render rights allow authorized users to view, play, and print protected content. transport rights allow authorized users to copy, move, and loan content (the user retains the content if it is copied and gets it back when a loan is over, but does not keep a copy if it is moved). derivative-works rights allow authorized users to extract pieces of content, edit the content in place, and embed content by extracting some of it and using it in other works. each one of these individual rights has three attributes: (1) consideration, (2) extents, and (3) types of users. in the first attribute, consideration, access to content is provided for something of value to the publisher (e.g., money or personal information). content can then be used to some extent (e.g., for a certain amount of time or a certain number of times). the rights and attributes users have are determined by their user types. digital dystopia | bailey 121 for example, an academic user, in consideration of a specified license payment by his or her library, can view a drm-protected scholarly article—but not copy, move, loan, extract, edit, or embed it—for a week, after which it is inaccessible. we can extend this hypothetical example by imagining that the library could pay higher license fees to gain more rights to the journal in question, and the library (or the user) could dynamically purchase additional article-specific rights enhancements as needed through micropayments. this example is extreme; however, it illustrates the fine-grained, high level of control that publishers could potentially have over content by using drm technology. godwin suggests that drm may inhibit a variety of legitimate uses of drm-protected information, such as access to public-domain works (or other works that would allow liberal use), preservation of works by libraries, creation of new derivative works, conduct of historical research, exercise of fair-use rights, and instructional use.34 the ability of blind (or otherwise disabled) users to employ assistive technologies may also be prevented by drm technology.35 drm also raises a variety of privacy concerns.36 fair use is an especially thorny problem. rosenblatt, trippe, and mooney state: fair use is an “i’ll know it when i see it” proposition, meaning that it can’t be proscriptively defined. . . . just as there is no such thing as a “black box” that determines whether broadcast material is or isn’t indecent, there is no such thing as a “black box” that can determine whether a given use of content qualifies as fair use or not. anything that can’t be proscriptively defined can’t be represented in a computer system.37 no need to panic about scholarly journals—yet. your scholarly journal publisher or other third-party supplier is unlikely to present you with such detailed options tomorrow. but you may already be licensing other digital content that is drm-protected, such as digital music or e-books that require a hardware e-book reader. as the recent sony bmg “rootkit” episode illustrated, creating effective, secure drm systems can be challenging, even for large corporations.38 again, the reasons for this are complex. 
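before the academic-user example that follows, the shape of this rights model can be sketched as a small data structure; the class and field names below are invented for illustration and are not taken from rosenblatt, trippe, and mooney.

# a sketch of the render/transport/derivative-works rights model described above;
# all names are invented for illustration
from dataclasses import dataclass, field
from typing import List

@dataclass
class Grant:
    right: str           # e.g. "view", "print", "copy", "loan", "extract", "embed"
    consideration: str   # what the publisher receives, e.g. "annual license fee"
    extent: str          # how far the right goes, e.g. "7 days" or "3 renderings"
    user_type: str       # e.g. "academic", "public", "corporate"

@dataclass
class License:
    work_id: str
    grants: List[Grant] = field(default_factory=list)

    def permits(self, right: str, user_type: str) -> bool:
        """True if some grant covers this right for this class of user."""
        return any(g.right == right and g.user_type == user_type for g in self.grants)

article = License("journal-article-0042", [
    Grant("view", "annual library license fee", "7 days", "academic"),
])
print(article.permits("view", "academic"))   # True
print(article.permits("print", "academic"))  # False: no grant covers printing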
in very simple terms, it boils down to this: assuming that the content can be protected up to the point it is placed in a drm system, the drm system has the best chance of working if all possible devices that can process its protected content either directly support its protection technology, recognize its restrictions and enforce them through another means, or refuse access.39 anything less creates “holes” in the protective drm shell, such as the well-known “analog hole” (e.g., when drm-protected digital content is converted to analog form to be played, it can then be rerecorded using digital equipment without drm protection).40 ideally, in other words, every server, network router, pc and pc component, operating system, and relevant electronic device (e.g., cd player, dvd player, audiorecording device, and video-recording device) would work with the drm system as outlined previously or would not allow access to the content at all. clearly, this ideal end-state for drm may well never be realized, especially given the troublesome backwardcompatibility equipment problem.41 however, this does not mean that the entertainment, information, and hightechnology companies will not try to make whatever piecemeal progress that they can in this area.42 the trusted computing group is an important multiple-industry security organization, whose standards work could have a strong impact on the future of drm. robert a. gehring notes: but a drm system is almost useless, that is from a content owner’s perspective, until it is deployed broadly. putting together cheap tc components with a marketdominating operating system “enriched” with drm functionality is the most economic way to provide the majority of users with “copyright boxes.”43 seth schoen argues computer owners should be empowered to override certain features of “trusted computing architecture” to address issues with “anti-competitive and anti-consumer behavior” and other problems.44 drm could potentially be legislatively mandated. there is a closely related legal precedent, the audio home recording act, which requires that digital audiotape equipment include special hardware to prevent serial copying.45 there is currently a bill before congress that would require use of a “broadcast flag” (a digital marker) for digital broadcast and satellite radio receivers.46 last year, a similar fcc regulation for broadcast digital television was struck down by a federal appeals court; consequently, the current bill explicitly empowers the fcc to “enforce ‘prohibitions against unauthorized copying and redistribution.’”47 another bill would plug the analog-to-digital video analog hole by putting “strict legal controls on any video analog to digital (a/d) convertors.”48 whether these bills become law or not, efforts to mandate drm are unlikely to end. dmca strongly supports drm by prohibiting both the circumvention of technological mechanisms that control access to copyrighted works (with some minor exceptions) and the “manufacture of any device, composition of any program, or offering of any service” to do so.49 what would the world be like if all newly published (or released) commercially created information was in digital form, protected by drm? what would it be like if all old works in print and analog formats were only reissued in digital form, protected by drm? what would it be like if all hardware that could process that digital information had to support the information’s drm scheme or block any access to it because this was mandated by law? 
what would it be 122 information technology and libraries | september 2006 like if all operating systems had direct or indirect built-in support for drm? would “progress of science and useful arts” be promoted or squashed? ฀ weaker net neutrality lessig identifies three important characteristics of the internet that have fostered innovation: (1) edge architecture: software applications run on servers connected to the network, rather than on the network itself, ensuring that the network itself does not have to be modified for new or updated applications to run; (2) no application optimization: a relatively simple, but effective, protocol is utilized (internet protocol) that is indifferent to what software applications run on top of it, again insulating the network from application changes; and (3) neutral platform: the network does not prefer certain data packets or deny certain packets access.50 lessig’s conceptual model is very useful when thinking about net neutrality, a topic of growing concern. educause’s definition of net neutrality aptly captures these concerns: “net neutrality” is the term used to describe the concept of keeping the internet open to all lawful content, information, applications, and equipment. there is increasing concern that the owners of the local broadband connections (usually either the cable or telephone company) may block or discriminate against certain internet users or applications in order to give an advantage to their own services. while the owners of the local network have a legitimate right to manage traffic on their network to prevent congestion, viruses, and so forth, network owners should not be able to block or degrade traffic based on the identity of the user or the type of application solely to favor their interests.51 for some time, there have been fears that net neutrality was endangered as the internet became increasingly commercialized, a greater percentage of home internet users migrated to broadband connections not regulated by common carrier laws, and telecommunications mergers (and vertical integration) accelerated. some of these fears are now appearing to be realized, albeit with resistance by the internet community. for example, aol has indicated that it will implement a two-tier e-mail system for companies, nonprofits, and others who send mass mailings: those who pay bypass spam filters, those who don’t pay don’t bypass spam filters.52 critics fear that free e-mail services will deteriorate under a two-tier system. facing fierce criticism from the dearaol.com coalition and many others, aol has relented somewhat on the nonprofit issue by offering special treatment for “qualified” nonprofits. a second example is that an analysis of verizon’s fcc filings reveals that “more than 80% of verizon’s current capacity is earmarked for carrying its service, while all other traffic jostles in the remainder.”53 content-oriented net companies are worried: leading net companies say that verizon’s actions could keep some rivals off the road. as consumers try to search google, buy books on amazon.com, or watch videos on yahoo!, they’ll all be trying to squeeze into the leftover lanes on verizon’s network. . . . “the bells have designed a broadband system that squeezes out the public internet in favor of services or content they want to provide,” says paul misener, vice-president for global policy at amazon.com.54 a third example is a comment by william l. 
smith, bellsouth ‘s chief technology officer, who “told reporters and analysts that an internet service provider such as his firm should be able, for example, to charge yahoo inc. for the opportunity to have its search site load faster than that of google inc.,” but qualified this assertion by indicating that “a pay-for-performance marketplace should be allowed to develop on top of a baseline service level that all content providers would enjoy.”55 about four months later, at&t announced that it would acquire bellsouth, after which it “will be the local carrier in 22 states covering more than half of the american population.”56 finally, in a white paper for public knowledge, john windhausen jr. states: this concern is not just theoretical—broadband network providers are taking advantage of their unregulated status. cable operators have barred consumers from using their cable modems for virtual private networks and home networking and blocked streaming video applications. telephone and wireless companies have blocked internet telephone (voip—voice over the internet protocol) traffic outright in order to protect their own telephone service revenues.57 these and similar examples are harbingers of troubled days ahead for net neutrality. the canary in the net neutrality mine isn’t dead yet, but it’s getting very nervous. the bottom line? noted oa advocate peter suber analyzes the situation as follows: but now cable and telecom companies want to discriminate, charge premium prices for premium service, and give second-rate service to everyone else. if we relax the principle of net neutrality, then isps could, if they wanted, limit the software and hardware you could connect to the net. they could charge you more if you send or receive more than a set number of emails. they could block emails containing certain keywords or emails from people or organizations they disliked, and block traffic to or from competitor web sites. they could make filtered service the default and force users to pay extra for the digital dystopia | bailey 123 wide open internet. if you tried to shop at a store that hasn’t paid them a kickback, they could steer you to a store that has. . . . if companies like at&t and verizon have their way, there will be two tiers of internet service: fast and expensive and slow and cheap (or cheaper). we unwealthy users—students, scholars, universities, and small publishers—wouldn't be forced offline, just forced into the slow lane. because the fast lane would reserve a chunk of bandwidth for the wealthy, the peons would crowd together in what remained, reducing service below current levels. new services starting in the slow lane wouldn't have a fighting chance against entrenched players in the fast lane. think about ebay in 1995, google in 1999, or skype in 2002 without the level playing field provided by network neutrality. or think about any oa journal or repository today.58 is net neutrality a quaint anachronism of the internet’s distant academic/research roots that we would be better off without? would new internet companies and noncommercial services prosper better if it was gone, spurring on new waves of innovation? would telecommunications companies (who may be part of larger conglomerates), free to charge for tiered-services, offer us exciting new service offerings and better, more reliable service? 
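whatever the answers, the mechanics of a tiered network are easy to picture. the toy python sketch below (the sender names are invented) simply dequeues traffic from paying senders ahead of everyone else, which is the "fast lane/slow lane" behavior described above; it is an illustration of the concept, not of any carrier's actual equipment.

# toy illustration of tiered delivery: traffic from paying senders is dequeued first
import heapq

PREMIUM = {"bigsearch.example", "bigvideo.example"}   # hypothetical paying senders

arrivals = [
    ("oa-repository.example", "article chunk"),
    ("bigvideo.example", "movie chunk"),
    ("library.example", "catalog record"),
    ("bigsearch.example", "results page"),
]

queue = []
for order, (sender, payload) in enumerate(arrivals):
    tier = 0 if sender in PREMIUM else 1              # 0 = fast lane, 1 = slow lane
    heapq.heappush(queue, (tier, order, sender, payload))

while queue:
    tier, _, sender, payload = heapq.heappop(queue)
    print("fast" if tier == 0 else "slow", sender, payload)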
฀ defending the internet revolution sixties icon bob dylan’s line in “the times they are achangin’”—“then you better start swimmin’ or you’ll sink like a stone”—couldn’t be more apt for those concerned with the issues outlined in this paper. here’s a brief overview of some of the strategies being used to defend the freewheeling internet revolution. 1. darknet: j. d. lasica says: “for the most part, the darknet is simply the underground internet. but there are many darknets: the millions of users trading files in the shady regions of usenet and internet relay chat; students who send songs and tv shows to each other using instant messaging services from aol, yahoo, and microsoft; city streets and college campuses where people copy, burn, and share physical media like cds; and the new breed of encrypted dark networks like freenet. . .”59 we may think of the darknet as simply fostering illegal file swapping by ordinary citizens, but the darknet strategy can also be used to escape government internet censorship, as is the case with freenet use in china.60 2. legislative and legal action: there have been attempts to pass laws to amend or reverse copyright and other laws resulting from the counter-internet-revolution, which have been met by swift, powerful, and generally effective opposition from entertainment companies and other parties affected by these proposed measures. the moral of this story is that these large corporations can afford to pay lobbyists, make campaign contributions, and otherwise exert significant influence over lawmakers, while, by and large, advocates for the other side do not have the same clout. the battle in the courts has been more of a mixed bag; however, there have been some notable defeats for reform advocates, especially in the copyright arena (e.g., eldred v. ashcroft), where most of the action has been. 3. market forces: when commercial choices can be made, users can vote with their pocketbooks about some internet changes. but, if monopoly forces are in play, such as having a single option for broadband access, the only other choice may be no service. however, as the oa movement (described later) has demonstrated, a concerted effort by highly motivated individuals and nonprofit organizations can establish viable new alternatives to commercial services that can change the rules of the game in some cases. companies can also explore radical new business models that may appear paradoxical to pre-internet-era thinking, but make perfect sense in the new digital reality. in the long run, the winners of the digital-content wars may be those who are not afraid of going down the internet rabbit hole. 4. creative commons: copyright is a two-edged sword: it can be used as the legal basis of licenses (and drm) to restrict and control digital information, or it can be used as the legal basis of licenses to permit liberal use of digital information. by using one of the six major creative common licenses (ccl), authors can retain copyright, but significantly enrich society’s collective cultural repository with works that can be freely shared for noncommercial purposes, used, in some cases, for commercial purposes, and used to easily build new derivative creative works. 
for example, the creative commons attribution license requires that a work is attributed to the author; however, a work can be used for any commercial or noncommercial purpose without permission, including creating derivative works.61 there are a variety of other licenses, such as the gnu free documentation license, that can be used for similar purposes.62 5. oa: scholars create certain types of information, such as journal articles, without expecting to be paid to do so, and it is in their best interests for these works to be widely read, especially by specialists in their fields.63 by putting e-prints (electronic preprints or post-prints) of articles on personal home pages or in various types of digital archives (e.g., institutional repositories) in full compliance with copyright law and, if needed, in compliance with publisher policies, scholars can provide free global access to these works with minimal effort and at no (or little) cost to themselves. further, a new generation of free e-journals are being published on the internet that are being funded by a variety of business models, such as advertising, author fees, library membership fees, and supplemental products. these oa strategies make digital 124 information technology and libraries | september 2006 scholarly information freely available to users across the globe, regardless of their personal affluence or the affluence of their affiliated institutions. ฀ impact on libraries this paper’s analysis of copyright, drm, and network neutrality trends holds no good news for libraries. copyright the reach of copyright law constantly encompasses new types of materials and for an ever-lengthening duration. as a result, copyright holders must explicitly place their works in the public domain if the public domain is to continue to grow. needless to say, the public domain is a primary source of materials that can be digitized without having to face a complex, potentially expensive, and sometimes hopeless permission clearance process. this process can be especially daunting for media works (such as films and video), even for the use of very short segments of these works. j. d. lasica recounts his effort to get permission to use short music and film segments in a personal video: five out of seven music companies declined; six out of seven movie studios declined, and the one that agreed had serious reservations.64 the replies to his inquiry, for those companies that bothered to reply at all, are well worth reading. for u.s. libraries without the resources to deal with complicated copyright-related issues, the digitization clock stops at 1922, the last year we can be sure that a work is in the public domain without checking its copyright status and getting permission if it is under copyright.65 what can we look forward to? lessig says: “thus, in the twenty years after the sonny bono act, while one million patents will pass into the public domain, zero copyrights will pass into the public domain by virtue of the expiration of a copyright term.”66 (the sonny bono term extension act was passed in 1998.) digital preservation is another area of concern in a legal environment where most information is automatically copyrighted, copyright terms are lengthy (or endless), and information is increasingly licensed. simply put, a library cannot digitally preserve what it does not own unless the work is in the public domain, the work’s license permits it, or the work’s copyright owner grants permission to do so. or can it? 
impact on libraries

this paper's analysis of copyright, drm, and network neutrality trends holds no good news for libraries.

copyright. the reach of copyright law constantly expands to encompass new types of materials, and for an ever-lengthening duration. as a result, copyright holders must explicitly place their works in the public domain if the public domain is to continue to grow. needless to say, the public domain is a primary source of materials that can be digitized without having to face a complex, potentially expensive, and sometimes hopeless permission clearance process. this process can be especially daunting for media works (such as films and video), even for the use of very short segments of these works. j. d. lasica recounts his effort to get permission to use short music and film segments in a personal video: five out of seven music companies declined; six out of seven movie studios declined, and the one that agreed had serious reservations.64 the replies to his inquiry, for those companies that bothered to reply at all, are well worth reading. for u.s. libraries without the resources to deal with complicated copyright-related issues, the digitization clock stops at 1922, the last year we can be sure that a work is in the public domain without checking its copyright status and getting permission if it is under copyright.65 what can we look forward to? lessig says: "thus, in the twenty years after the sonny bono act, while one million patents will pass into the public domain, zero copyrights will pass into the public domain by virtue of the expiration of a copyright term."66 (the sonny bono term extension act was passed in 1998.) digital preservation is another area of concern in a legal environment where most information is automatically copyrighted, copyright terms are lengthy (or endless), and information is increasingly licensed. simply put, a library cannot digitally preserve what it does not own unless the work is in the public domain, the work's license permits it, or the work's copyright owner grants permission to do so. or can it? after all, the internet archive does not ask permission ahead of time before preserving the entire internet, although it responds to requests to restrict information. and that is why the internet archive is currently being sued by healthcare advocates, which says that it "is just like a big vacuum cleaner, sucking up information and making it available."67 if it is not settled out of court, this will be an interesting case for more digitally adventurous libraries to watch. as the cost of the hardware and software needed to do so effectively continues to drop, faculty, students, and other library users will increasingly want to repurpose content, digitizing conventional print and media materials, remixing digital ones, and/or creating new digital materials from both. with the "information commons" movement, academic libraries are increasingly providing users with the hardware and software tools to repurpose content. given that the wording of the u.s. copyright act section 108(f)(1) is vague enough that it could be interpreted to include these tools when they are used for information reproduction, is the old "copyright disclaimer on the photocopier" solution enough in the new digital environment? or—in light of the unprecedented transformational power of these tools to create new digital works, and their widespread use both within libraries and on campus—do academic libraries bear heavier responsibilities regarding copyright compliance, permission-seeking, and education? similar issues arise when faculty want to place self-created digital works that incorporate copyrighted materials in electronic reserves systems or institutional repositories. end-user contributions to "library 2.0" systems that incorporate copyrighted materials may also raise copyright concerns.

drm. as libraries realize that they cannot afford dual formats, their new journal and index holdings are increasingly solely digital. libraries are also licensing a growing variety of "born digital" information. the complexities of dealing with license restrictions for these commercial digital products are well understood, but imagine if drm were layered on top of license restrictions. as we have discussed, drm will allow content producers and distributors to slice, dice, and monetize access to digital information in ways that were previously impossible. what may be every publisher/vendor's dream could be every library's nightmare. aside from a potential surge of publisher/vendor-specific access licensing options and fees, libraries may also have to contend with publisher/vendor-specific drm technical solutions, which may:

■■ depend on particular hardware/software platforms,
■■ be incompatible with each other,
■■ decrease computer reliability and security,
■■ eliminate fair or otherwise legal use of drm-protected information,
■■ raise user privacy issues,
■■ restrict digital preservation to bitstream preservation (if allowed by license),
■■ make it difficult to assess whether to license drm-protected materials,
■■ increase the difficulty of providing unified access to information from different publishers and vendors,
■■ multiply user support headaches, and
■■ necessitate increased staffing.

drm makes solving many of these problems both legally and technically impossible. for example, under dmca, libraries have the right to circumvent drm for a work in order to evaluate whether they want to purchase it. however, they cannot do so without the software tools to crack the work's drm protection.
but the distribution of those tools is illegal under dmca, and local development of such tools is likely to be prohibitively complex and expensive.68 fostering alternatives to restrictive copyright and drm given the uphill battle in the courts and legislatures, ccls (or similar licenses) and oa are particularly promising strategies to deal with copyright and drm issues. copyright laws do not need to change for these strategies to be effective. it is not just a question of libraries helping to support oa by paying for institutional memberships to oa journals, building and maintaining institutional repositories, supporting oa mandates, encouraging faculty to edit and publish oa journals, educating faculty about copyright and oa issues, and encouraging them to utilize ccls (or similar licenses). to truly create change, libraries need to “walk the talk” and either let the public-domain materials they digitize remain in the public domain, or put them under ccls (or similar licenses), and, when they create original digital content, put it under ccls (or similar) licenses as well. as the oa movement has shown, using ccls does not rule out revenue generation (if that is an appropriate goal), but it does require facilitating strategies, such as advertising and offering fee-based add-on products and services. net neutrality there are many unknowns surrounding the issue of net neutrality, but what is clear is that it is under assault. it is also clear that internet services are more likely to require more, not less, bandwidth in the future as digital media and other high-bandwidth applications become more commonplace, complex, and interwoven into a larger number of internet systems. one would imagine that if a corporation such as google had to pay for a high-speed digital lane, it would want it to reach as many consumers as possible. so, it may well be that libraries’ google access would be unaffected or possibly improved by a two-tier (or multi-tier) internet “speed-lane” service model. would the same be true for library-oriented publishers and vendors? that may depend on their size and relative affluence. if so, the ability of smaller publishers and vendors to offer innovative bandwidth-intensive products and services may be curtailed. unless they are affluent, libraries may also find that they are confined to slower internet speed lanes when they act as information providers. for libraries engaged in digital library, electronic publishing, and institutional repository projects, this may be problematic, especially as they increasingly add more digital media, large-data-set, or other bandwidth-intensive applications. it’s important to keep in mind that net neutrality impacts are tied to where the chokepoints are, with the most serious potential impacts being at chokepoints that affect large numbers of users, such as local isps that are part of large corporations, national/international backbone networks, and major internet information services (e.g.,yahoo!). it is also important to realize that the problem may be partitioned to particular network segments. for example, on-campus network users may not experience any speed issues associated with the delivery of bandwidth-intensive information from local library servers because that network segment is under university control. 
remote users, however, including affiliated home users, may experience throttled-down performance beyond what would normally be expected due to speed-lane enforcement by backbone providers or local isps controlled by large corporations. likewise, users at two universities connected by a special research network may experience no issues related to accessing the other university’s bandwidth-intensive library applications from on-campus computers because the backbone provider is under a contractual obligation to deliver specific network performance levels. although the example of speed lanes has been used in this examination of potential net neutrality impacts on libraries, the problem is more complex than this, because network services, such as peer-to-peer networking protocols, can be completely blocked, digital information can be blocked or filtered, and other types of fine-grained network control can be exerted. ฀ conclusion this paper has deliberately presented one side of the story. it should not be construed as saying that copyright law should be abolished or violated, that drm can serve no useful purpose (if it is possible to fix certain critical deficiencies and if it is properly employed), or that no one has to foot the bill for content creation/marketing/distribution and ever-more-bandwidth-hungry internet applications. 126 information technology and libraries | september 2006 nor is it to say that the other side of the story, the side most likely to be told by spokespersons of the entertainment, information, and telecommunications industries, has no validity and does not deserve to be heard. however, that side of the story is having no problem being heard, especially in the halls of congress. the side of the story presented in this paper is not as widely heard—at least, not yet. nor does it intend to imply that executives from the entertainment, information, telecommunications, and other corporate venues lack a social conscience, are fully unified in their views, or are unconcerned with the societal implications of their positions. however, by focusing on short-term issues, they may not fully realize the potentially negative, long-term impact that their positions may have on their own enterprises. nor has this paper presented all of the issues that threaten the internet, such as assaults on privacy, increasingly determined (and malicious) hacking, state and other censorship, and the seemingly insolvable problem of overlaying national laws on a global digital medium. what this paper has said is simply this: three issues—a dramatic expansion of the scope, duration, and punitive nature of copyright laws; the ability of drm to lock-down content in an unprecedented fashion; and the erosion of net neutrality—bear careful scrutiny by those who believe that the internet has fostered (and will continue to foster) a digital revolution that has resulted in an extraordinary explosion of innovation, creativity, and information dissemination. these issues may well determine whether the much-touted information superhighway lives up to its promise or simply becomes the “information toll road” of the future, ironically resembling the pre-internet online services of the past. references and notes 1. gary flake, “how i learned to stop worrying and love the imminent internet singularity,” http://castingwords.com/ transcripts/o3/5073.html (accessed may 2, 2006). 2. 
lawrence lessig, free culture: the nature and future of creativity (new york: penguin, 2005), 130, www.free-culture.cc/ (accessed may 2, 2006). 3. ibid., 131. 4. ibid., 117–18. 5. william f. patry, copyright law and practice (washington, d.c.: bureau of national affairs, 2000), http://digital-law -online.info/patry (accessed may 2, 2006). 6. u.s. copyright office, copyright basics (washington, d.c.: u.s. copyright office, 2000), www.copyright.gov/circs/circl/ html (accessed may 2, 2006). 7. ibid. 8. lessig, free culture, 133. 9. barbara m. waxer and marsha baum, internet surf and turf revealed: the essential guide to copyright, fair use, and finding media (boston: thompson course technology, 2006), 17. 10. patry, copyright law and practice; lessig, free culture, 133. 11. jessica litman, digital copyright (amherst: prometheus books, 2001), 35–63. 12. lessig, free culture, 134. 13. ibid., 326. 14. association of american universities, the association of research libraries, the association of american university presses, and the association of american publishers, campus copyright rights & responsibilities: a basic guide to policy considerations (association of american universities, the association of research libraries, the association of american university presses, and the association of american publishers, 2006), 8, www.arl.org/info/ frn/copy/campuscopyright05.pdf (accessed may 2, 2006). 15. george h. pike, “the delicate dance of database licenses, copyright, and fair use,” computers in libraries 22, no. 5 (2002): 14, http://infotoday.com/cilmag/may02/pike .htm (accessed may 2, 2006). 16. patry, copyright law and practice. 17. computer crime and intellectual property section criminal division, u.s. department of justice, “prosecuting intellectual property crimes manual,” www.cybercrime.gov/ipmanual .htm (accessed may 2, 2006); u.s. copyright office, copyright law of the united states of america and related laws contained in title 17 of the united states code (washington, d.c.: u.s. copyright office, 2003), www.copyright.gov/title17/circ92.pdf (accessed may 2, 2006). 18. recording industry association of america, “copyright laws,” www.riaa.com/issues/copyright/laws.asp (accessed may 2, 2006). 19. kenneth d. crews, copyright law for librarians and educators: creative strategies and practical solutions, 2nd ed. (chicago: ala, 2006), 94. 20. lessig, free culture, 8. 21. ibid., 51. 22. lawrence lessig, the future of ideas: the fate of the commons in a connected world (new york: vintage bks., 2002), 165–66, 176. 23. lessig, free culture, 53–61. 24. ibid., 53. 25. ibid., 59–61. 26. james boyle, shamans, software, and spleens: law and the construction of the information society (cambridge: harvard univ. pr., 1996), 172. 27. william w. fisher iii, promises to keep: technology, law, and the future of entertainment (stanford, calif.: stanford univ. pr., 2004), 202. 28. lessig, free culture, 289–93. 29. litman, digital copyright, 179–80. 30. ibid., 181–84. 31. michael godwin, digital rights management: a guide for librarians (washington, d.c.: office for information technology policy, ala, 2006), 1, www.ala.org/ala/washoff/woissues/ copyrightb/digitalrights/drmfinal.pdf (accessed may 2, 2006). digital dystopia | bailey 127 32. ibid., 10–18. 33. bill rosenblatt, bill trippe, and stephen mooney, digital rights management: business and technology (new york: m&t bks., 2002), 61–64. 34. godwin, digital rights management: a guide for librarians, 2. 35. 
david mann, “digital rights management and people with sight loss,” indicare monitor 2, no. 11 (2006), www .indicare.org/tiki-print_article.php?articleid=170 (accessed may 2, 2006). 36. julie e. cohen, “drm and privacy,” communications of the acm 46, no. 4 (2003): 46–49. 37. rosenblatt, trippe, and mooney, digital rights management: business and technology, 45. 38. j. alex halderman and edward w. felten, “lessons from the sony cd drm episode,” feb. 14, 2006, http://itpolicy.princeton .edu/pub/sonydrm-ext.pdf (accessed may 2, 2006). 39. godwin, digital rights management: a guide for librarians, 18–36. 40. wikipedia, “analog hole,” http://en.wikipedia.org/ wiki/analog_hole (accessed may 2, 2006). 41. godwin, digital rights management: a guide for librarians, 18–20. 42. ibid., 36. 43. robert a. gehring, “trusted computing for digital rights management,” indicare monitor 2, no. 12 (2006), www.indicare .org/tiki-read_article.php?articleid=179 (accessed may 2, 2006). 44. seth schoen, “trusted computing: promise and risk,” www.eff.org/infrastructure/trusted_computing/20031001 _tc.php (accessed may 2, 2006). 45. pamela samuelson, “drm {and, or, vs.} the law,” communications of the acm 46, no. 4 (2003): 43–44. 46. declan mccullagh, “congress raises broadcast flag for audio,” cnet news.com, mar. 2, 2006, http://news.com .com/congress+raises+broadcast+flag+for+audio/2100-1028 _3-6045225.html (accessed may 2, 2006). 47. ibid. 48. danny o’brien, “a lump of coal for consumers: analog hole bill introduced,” eff deeplinks, dec. 16, 2005, www.eff .org/deeplinks/archives/004261.php (accessed may 2, 2006). 49. siva vaidhyanathan, copyrights and copywrongs: the rise of intellectual property and how it threatens creativity (new york: new york univ. pr., 2001), 174–75. 50. lessig, the future of ideas, 36–37. 51. educause, “net neutrality,” www.educause.edu/ c o n t e n t . a s p ? pa g e _ i d = 6 4 5 & pa r e n t _ i d = 8 0 7 & b h c p = 1 (accessed may 2, 2006). 52. electronic frontier foundation, “dearaol.com coalition grows from 50 organizations to 500 in one week,” mar. 7, 2006, www.eff.org/news/archives/2006_03.php#004461 (accessed may 2, 2006). 53. catherine yang, “is verizon a network hog?” businessweek, feb. 13, 2006, 58, www.businessweek.com/technology/ content/feb2006/tc20060202_061809.htm (accessed may 2, 2006). 54. ibid. 55. jonathan krim, “executive wants to charge for web speed,” washington post, dec. 1, 2005, d05, www.washingtonpost .com/wp-dyn/content/article/2005/11/30/ar2005113002109 .html (accessed may 2, 2006). 56. harold furchtgott-roth, “at&t, or another telecom takeover,” the new york sun, mar. 7, 2006. www.nysun.com/ article/28695 (accessed may 2, 2006). (see also: www.furchtgott -roth.com/news.php?id=87 (accessed may 2, 2006). 57. john windhausen jr., good fences make bad broadband: preserving an open internet through net neutrality (washington, d.c.: public knowledge, 2006), www.publicknowledge.org/ content/papers/pk-net-neutrality-whitep-20060206 (accessed may 2, 2006). 58. peter suber, “three gathering storms that could cause collateral damage for open access,” sparc open access newsletter, no. 95 (2006), www.earlham.edu/~peters/ fos/newsletter/ 03-02-06.htm#collateral (accessed may 2, 2006). 59. j. d. lasica, darknet: hollywood’s war against the digital generation (new york: wiley, 2005), 45. 60. john borland, “freenet keeps file-trading flame burning,” cnet news.com, oct. 28, 2002, http://news.com.com/2100 -1023-963459.html (accessed may 2, 2006). 61. 
creative commons, “attribution 2.5,” http://creativecommons.org/licenses/by/2.5/ (accessed may 2, 2006). 62. lawrence liang, “a guide to open content licenses,” http://pzwart.wdka.hro.nl/mdr/research/lliang/open_content_guide (accessed may 2, 2006). 63. peter suber, “open access overview: focusing on open access to peer-reviewed research articles and their preprints,” www.earlham.edu/~peters/fos/overview.htm (accessed may 2, 2006); charles w. bailey jr., “open access and libraries,” in mark jacobs, ed., electronic resources librarians: the human element of the digital information age (binghamton, n.y.: haworth, 2006), forthcoming, www.digital-scholarship.com/cwb/oalibraries.pdf (accessed may 2, 2006). 64. lasica, darknet, 72–73. 65. waxer and baum, internet surf and turf revealed, 17. 66. lessig, free culture, 134–35. 67. joe mandak, “internet archive’s value, legality debated in copyright suit,” mercury news, mar. 31, 2006, www.mercurynews.com/mld/mercurynews/news/local/states/california/northern_california/14234638.htm (accessed may 2, 2006). 68. arnold p. lutzker, primer on the digital millennium: what the digital millennium copyright act and the copyright term extension act mean for the library community (washington, d.c.: ala washington office, 1999), www.ala.org/ala/washoff/woissues/copyrightb/dmca/dmcaprimer.pdf (accessed may 2, 2006). the chamberlain group inc. v. skylink technologies inc. decision offers some hope that authorized users of drm-protected works could legally circumvent drm for lawful purposes if they had the means to do so (see: crews, copyright law for librarians and educators: creative strategies and practical solutions, 96–97).

copyright © 2006 by charles w. bailey jr. this work is licensed under the creative commons attribution-noncommercial 2.5 license. to view a copy of this license, visit http://creativecommons.org/licenses/by-nc/2.5/ or send a letter to creative commons, 543 howard st., 5th floor, san francisco, ca, 94105, usa.

appendix a: ncsu libraries catalog usability test tasks

known-item questions

1. “your history professor has requested you to start your research project by looking up background information in a book titled civilizations of the ancient near east.” a. “please find this title in the library catalog.” b. “where would you go to find this book physically?”

2. “for your literature class, you need to read the book titled gulliver’s travels written by jonathan swift. find the call number for one copy of this book.”

3. “you’ve been hearing a lot about the physicist richard feynman, and you’d like to find out whether the library has any of the books that he has written.” a. “what is the title of one of his books?” b. “is there a copy of this book you could check out from d. h. hill library?”

4. “you have the citation for a journal article about photosynthesis, light, and plant growth. you can read the actual citation for the journal article on this sheet of paper.” alley, h., m. rieger, and j.m. affolter. “effects of developmental light level on photosynthesis and biomass production in echinacea laevigata, a federally listed endangered species.” natural areas journal 25.2 (2005): 117–22. a. “using the library catalog, can you determine if the library owns this journal?” b. “do library users have access to the volume that actually contains this article (either electronically or in print)?”

topical questions
5. “please find the titles of two books that have been written about bill gates (not books written by bill gates).”

6. “your cat is acting like he doesn’t feel well, and you are worried about him. please find two books that provide information specifically on cat health or caring for cats.”

7. “you have family who are considering a solar house. does the library have any materials about building passive solar homes?”

8. “can you show me how you would find the most recently published book about nuclear energy policy in the united states?”

9. “imagine you teach introductory spanish and you want to broaden your students’ horizons by exposing them to poetry in spanish. find at least one audio recording of a poet reading his or her work aloud in spanish.”

10. “you would like to browse the recent journal literature in the field of landscape architecture. does the design library have any journals about landscape architecture?”

marc format simplification

d. kaye capen: university of alabama, university. this is a summary of a paper written on the consideration of the feasibility as well as the benefits, disadvantages, and consequences of simplification of the marc formats for bibliographic records.1 the original paper was commissioned in june 1981, by the arl task force on bibliographic control as one facet in exploring the perceived high costs of cataloging and adhering to marc formats in arl libraries. the conclusions and recommendations, however, are entirely those of the author, and the opinions and judgments stated here result from a wide-ranging canvass of technical services people, computer people, and/or library administrators. because the marc format has so many uses, the paper is divided into five perspectives from which the marc format can be viewed: history, standards, and codes; present purposes; library operations; computer operations; and online catalogs. the library of congress has already begun a review of the marc format and has distributed a draft document.2 the general thrust of that review is a close examination of the marc format in an attempt to begin to lay the foundation on which revised marc formats can firmly stand, particularly in regard to content designation (tags, indicators, and subfield codes used to identify and characterize the data explicitly). as that review deals with the very specific, this paper aims generally at attempting to paint with broad strokes a picture of today's marc in its many relationships, benefits, costs, and what the impact would be to the whole from any change to the part. perspective: marc history, standards, and codes relationships the original marc format document established conventions for encoding data for monographs. though it was understood that early applications were going to relate to the production of catalog cards, the marc designers looked ahead to an increasing emphasis on data retrieval applications. other design considerations included, for example, the necessity for providing for complex computer filing, allowance for a variety of data processing equipment, and an attempt to provide for some analytical work (more specific description of contents notes or other types of analysis).
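to make the notion of content designation concrete, the sketch below shows how a single bibliographic field is identified by a three-character tag, two one-character indicators, and coded subfields. the field chosen (245, the title statement, with subfields $a, $b, and $c and a second indicator giving the number of nonfiling characters) follows standard marc practice; the python representation and helper function are illustrative assumptions, not any particular system's internal format.

```python
# a minimal, illustrative representation of one marc variable field.
# tag: identifies the kind of data; indicators: two single characters that
# qualify it; subfields: code/value pairs introduced by a delimiter.
title_field = {
    "tag": "245",
    "indicators": ("1", "4"),   # 1 = make a title added entry; 4 = skip "the " when filing
    "subfields": [
        ("a", "the future of ideas :"),
        ("b", "the fate of the commons in a connected world /"),
        ("c", "lawrence lessig."),
    ],
}

def render_field(field: dict) -> str:
    """print the field the way cataloging documentation usually displays it,
    with $ standing in for the subfield delimiter character."""
    subfields = " ".join(f"${code} {value}" for code, value in field["subfields"])
    return f'{field["tag"]} {"".join(field["indicators"])} {subfields}'

print(render_field(title_field))
# 245 14 $a the future of ideas : $b the fate of the commons ... $c lawrence lessig.
```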
later the single marc ii format was transformed into a series of formats, and as time passed, those formats became inextricably tied to other developments at the national and international levels: the international standard bibliographic descriptions, the anglo-american cataloguing rules , 2d ed., unimarc, the national level bibliographic records, and the national and international communications standards; e.g., ansi z39.2-1979 and iso 2709. benefits the benefits of the marc formats and other standards and codes have been substantial both philosophically and pragmatically. the sharing of cataloging records through the computer-based, online networks have been shown in a variety of cost studies to have contained the rate of rise of per unit cost. a further benefit of the marc formats is the momentum its creation gave to the steady movement toward standardization which can benefit individuallibraries in a number of ways: first, bibliographic information can be exchanged among libraries and countries. second, in recent years we have moved steadily toward creating an environment in which the library of congress would become one of many authoritative libraries thus enhancing the shareability of records. costs the early costs of the development and implementation of the marc formats were borne by lc (aided by council on library resources funds). lc continues to bear most of the costs of marc formats, such as new marbi proposals, duplication and distribution of documentation, and so forth. direct investment of library dollars came through the purchase of the marc tapes and the development of systems to receive, process, and output data in marc formats. impact of change throughout the years of its use, the marc format content designation and content rules have been augmented or modified. in the beginning, however, databases were small and changes could be absorbed more readily. the number and complexity of the formats have increased, as have the interrelationships of the marc formats with other standards and codes resulting in a present environment in which the impact of change is felt more strenuously. perspective: present relationships and constraints relationships today's close interrelationships between the marc formats and other codes and standards affect both library and computer operations. though, for example, the general international standard bibliographic description was implemented by the library community prior to the adoption of aacr2, the second edition of the rules has firmly incorporated the isbds. when this format description system is combined with the machine-based marc formats, some isbd information will be supplied by humans and some generated by programmed machine manipulations. communications 287 as a second example, in the last couple of years, the library of congress has spearheaded the development of national level bibliographic record(s) which define the specific data elements that should be included by any organization creating cataloging records which may also be shared with other organizations or be acceptable for contribution to a national database. as the logical idea of a national database comes to fruition, it is necessary for the marc format to provide for greater specificity in the coding of originating library, modifying library, and so forth. benefits the benefits of the use of the marc format continue to lie in the ease with which bibliographic information can be shared and the concomitant beneficial impact on cost control. 
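the communications standards mentioned above (ansi z39.2 and iso 2709) are what make a marc record exchangeable regardless of local software: a fixed 24-character leader, a directory of 12-character entries (3-character tag, 4-character field length, 5-character starting position), and then the variable fields themselves. the python sketch below is a simplified reading of that layout; it assumes a single well-formed record and ignores character-set questions, and is not taken from any particular system's code.

```python
FIELD_TERMINATOR = "\x1e"    # ends the directory and each variable field
SUBFIELD_DELIMITER = "\x1f"  # introduces each subfield code within a field

def parse_marc_record(raw: str) -> list[tuple[str, str]]:
    """illustrative iso 2709 / z39.2 reader: return (tag, field content) pairs."""
    leader = raw[:24]
    base_address = int(leader[12:17])          # where the variable fields begin
    directory = raw[24:raw.index(FIELD_TERMINATOR)]
    fields = []
    for i in range(0, len(directory), 12):     # each directory entry is 12 characters
        entry = directory[i:i + 12]
        tag = entry[0:3]
        length = int(entry[3:7])
        start = int(entry[7:12])
        content = raw[base_address + start : base_address + start + length]
        fields.append((tag, content.rstrip(FIELD_TERMINATOR)))
    return fields
```

in practice each field's content is then split on the subfield delimiter into the coded subfields described above; the point here is only that the exchange structure itself is small and mechanical.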
in addition, the marc format supports a host of other standards and codes, and the benefit from these relationships has been consistency in and fostering of standards development. in the bibliographic arena, the more that standards are developed locally, regionally, nationally, and internationally, the more we will be able to transmit and share bibliographic data, thus controlling the costs of original cataloging. on the other hand, we also "pay" when we standardize. cost the two costs associated with increased standardization are the additional time and thus cost required to meet standards, and the increased expense of maintaining local practices, which may often be idiosyncratic. in relation to the latter, while many local idiosyncrasies are often unnecessary and counterproductive, there are generally some which have become an integral part of a large catalog database or upon which a major procedural activity is based. but, to benefit from compliance with standards, increasingly we will move away from local practices. in terms of the time required to adhere to the marc format, it is possible to continue to utilize the format (or participate in systems that use it) and yet control the amount of complexity with which one has to deal. both aacr2 and national level bibliographic record documents allow for "levels of description" which provide for more or less description; and various online networks allow, in a similar manner, for limited input standards. as we view the array of standards and codes which together make up today's bibliographic scene, we can see that each of the separate elements is consistent within itself, is understandable, and counts for only a portion of the costs associated with the cataloging process. the combination of elements, however, begins an accretion of complexity that for most requires an effort of organization and education in order to control work flow and meet standards. impact of change because the marc format is closely interwoven with a number of national and international codes and standards, changes to the format would have implications far beyond the local library. at the very least, discussions would have to involve a host of individuals and groups, all at different stages of development and implementation based upon the present marc format. perspective: library operations relationships in the library-operations perspective, any operations related to the marc format have to be viewed as only one of many elements which must be interfaced with daily work flow. let us look, for example, at the amount of time which might be expended in a typical large academic library by cataloging personnel in training and ongoing work activities required in marc-related operations. in those libraries which obtain access to cataloging databases as members of networks, contact with the marc format is filtered through the standards, requirements, marc implementation design, documentation, and other related training facilities of the network. libraries which maintain their own databases do the same kind of filtering, though staff may have somewhat more control of the user cordiality of the interface. the shared networking environment, however, generally seems to imply more standards and requirements because of the attempt to guarantee as much "shareability" as possible.
libraries participating in oclc, for example, must train staff in the following codes: aacri; aacr2; standard subject heading codes; standard classification codes; oclc/marc formats for each type of material being cataloged; oclc bibliographic input standards; oclc level i and level k input standards; oclc systems users guides; in some instances, input standards documents for regional or special-interest cooperatives; local library interpretations, procedures, and standards. any close review of the time library staff expend in the use of these tools for either training or ongoing operations reveals that marc per se requires only a limited proportion of a typical library staff person's day. while training may be intensive at either the beginning of a person's job or at the beginning of work with a new type/format of material, this portion of the cataloging unit cost is small. benefits, costs in the cataloging activity, the benefits from the use of the marc formats are at least two: first, the marc format as part of an online cataloging system permits the machine-production of catalog cards at a major savings over manual production. second, access to a shared cataloging database permits the use of "clerical" catalogers at an estimated unit cost saving per book of twenty dollars when compared to "original" cataloging.3 third, depending upon the information available in the cataloging record, the time required for decision making during the cataloging process can be decreased significantly. impact of change it was the general consensus of the technical services people i contacted that simplification of the formats through the consistent assignment of tags would make training and introduction to new formats somewhat easier, but that any savings of time would probably be trivial. there was no consensus that either simplification or shortening would result in any significant time or cost savings. to a certain extent, the use of the very specific marc formats has made the descriptive cataloging process (and the training to undertake it) clearer in that the logical relationships and description of the data elements are so clearly exposed through the assignment of tags and other codes. also, once initial familiarity with the format(s) is achieved, ongoing use becomes second nature. it is also possible for cataloging staff to control the complexity with which they will deal through the use of less than "full," but still nationally acceptable levels of cataloging and, hence, marc coding. finally, most technical services people believe that cataloging and maintenance activities in libraries have always been complex, requiring long and detailed procedures and intricate work flow . while membership in networks requires new skills and knowledge, it is the sum of the whole rather than the difficulty of any single portion which affects unit costs today. changing the marc format through either simplification or shortening would have only a slight effect on the total technical services operation and costs. perspective: the computer operations environment relationships in looking at computer operations, there are at least two major subdivisions: operations that serve only one client (e.g., alibrary system serving itself) or operations that serve many clients (e.g., rlin or blackwell/north america). the constraints differ for each operation and are further complicated by whether or not the computer operation must be able to produce as well as accept bibliographic records in a marc format. 
each computer facility, for example, can have distinct operating software depending upon the type and mix of computing equipment used. in addition, each computing facility translates the marc-formatted records into an internal processing format which may differ extensively from marc. too, further tailoring may be done for batch processing as opposed to online operations, and computer operations which serve a single user may not have to re-create records in the marc format and may even more radically redesign the marc-formatted records for internal use. as changes to the marc format occur over the years, each computer system will write additional software to incorporate those changes into the then existing system. in some instances, it may be too difficult to attempt to convert old databases to reflect changes in marc coding, and there will then exist an "old" database and a "new" database for that particular marc field or subfield. since changes have occurred in many fields, most databases are an amalgam of new and old interpretations (this is true in relation to cataloging codes, too) of marc coding, and original internal software design may reflect the same type of patchwork quilt. operating these computer systems is complicated, in addition, by the fact that a wide range of user library needs and desires must be accommodated. indeed, a report prepared by hank epstein for the conference to explore machine-readable bibliographic interchange (cembi) revealed, after an exhaustive review of the use of marc data elements, that there was no data element not used by someone!4 benefits benefits that accrue to computing operations as a result of the marc format include the use of what was called "a pretty decent general communications format," which facilitates communications, card/com production, and online information retrieval. as a communications format it is as coherent as any other structure for carrying bibliographic data. because the format allows for a very specific level of detail in description, computing operations can supply a variety of products to fill a variety of needs. costs while specific cost information was not available for inclusion in this paper, discussion does reveal some widely held generalizations. first, the marc format does not seem to be any more complex or costly to use than other variable field communications formats. beginning programmers are generally introduced first to the internal communications format of their particular computing system, and when they come to the marc tags they rapidly become familiar with the coding through experience. indeed, if the programmers know the structure of and have a specification for the format, they can work with that format even though they may be unfamiliar with it from the users' point of view. thus, the format itself, and training in its use, does not seem to be significantly costly. second, every change in the marc format requires some programming effort and may or may not require concomitant changes in the database. the consensus of the computer people with whom i spoke was that the sophistication and specificity of the marc formats was a good thing, but the inconsistencies among formats are problematical. the benefits of consistency can be important, but to justify changes financially, the major changes should be done at one time.
indeed, most individuals doubted whether or not there was sufficient capital in these straitened times to be able to implement consistently a major marc format changeand this is from the perspective of both the operations serving one and many users. impact of change without a philosophical and practical framework (or benchmark) against which to compare the benefits and costs of alternative solutions to marc format maintenance issues and without a better and more comprehensive description of the requirements of the internal processing formats of the computer operations, it is difficult to assess clearly the costs and benefits of marc format changes. it does seem to be the case presently that, once established, computer operations can deal with the complexity and specificity of the marc format without undue ongoing financial investment. the strength of the marc format for computer operations lies in its specificity. for the batch processing environment especially, the marc format is a reasonably efficient format and one that facilitates development. its inefficiencies are not drastic and its specificity buys valuable flexibility. severe cuts or major simplifications would be a mistake since discontinuing specificity is a one-way street-once it is gone, it cannot be retrieved. the ability of the machine to assist in editing is weakened by the loss of specificity and it then becomes more difficult to edit out poor data. simplification through consistency, rather than shortening, would produce the most beneficial impact-though it must be done carefully to be cost beneficial. perspective: online catalogs relationship the major difficulties facing us when we attempt to discuss the relationship of the marc format to online catalogs is that, first, we know so little about how people think when they use our card catalogs; and, second, we have so little experience with how those thought and use patterns might change when the online catalog replaces the card catalog. another aspect of online library system development is the combination of subsystems such as acquisitions, serials control, or authority control with the online catalog and the implications of such a combination for system design, the internal processing format, and compatibility with the marc format. the index design of most large online catalogs or information retrieval systems today relies upon precoordinated search keys in order to facilitate the large sorting activities that have to occur. the second indicator in the 700 field, for example, is designed for the purpose of formulating search keys, filing added entries or for selecting alternative secondary added entries. this type of specificity is necessary for both card production and online retrieval. taken together, all of these considerations make most systems and library technical people hesitate to recommend any major changes to the marc format at this time. benefits at this time, therefore, in terms of information retrieval, there does not seem to be any major force toward either simplifying or shortening the marc format to facilitate retrieval. this becomes an even more cogent sentiment when we consider that major development efforts have already been begun in the areas of online catalog access and information retrieval. delays in these development efforts now caused by ........ changes in the marc formats could be enormously wasteful of the time and effort already invested, and could postpone urgently needed implementation of new, easily maintainable online systems. 
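the precoordinated search keys mentioned above were typically "derived keys" built from fixed-length fragments of headings; oclc's 3,2,2,1 title key and 4,3,1 personal-name key are the usual historical examples. the python sketch below is an illustrative reconstruction of that general technique under stated assumptions (the function name and the normalization choices are mine), not the exact algorithm of any particular system:

```python
import re

def derived_key(heading: str, pattern: tuple[int, ...]) -> str:
    """build a derived search key by taking a fixed number of characters
    from each successive word of a normalized heading."""
    words = re.sub(r"[^a-z0-9 ]", "", heading.lower()).split()
    parts = []
    for size, word in zip(pattern, words):
        parts.append(word[:size])
    # pad with empty segments if the heading has fewer words than the pattern
    while len(parts) < len(pattern):
        parts.append("")
    return ",".join(parts)

# a 3,2,2,1-style title key and a 4,3,1-style personal-name key
print(derived_key("the future of ideas", (3, 2, 2, 1)))   # the,fu,of,i
print(derived_key("lessig, lawrence", (4, 3, 1)))         # less,law,
```

the design point such keys illustrate is the one made above: because the keys are precoordinated at indexing time, the specificity of the underlying marc coding (which heading goes in which field, with which indicators) determines what can be retrieved later.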
costs there is no firm cost data to guide us in considering the impact of marc format changes in the information retrieval environment. generally accepted assumptions are, however, that because of our lack of knowledge and experience in this area, it is simply too risky and potentially costly to experiment. impact of change overall, without more experience in this area, it is the general opinion that the fullest level of descriptive specificity of the marc format might be required to design and implement online catalogs/information retrieval systems which can be responsive to the needs of a variety of users and levels of information. interaction with other subsystems and formats is also incomplete, thus clouding our vision of the impact of change over the breadth of the library community. summary and conclusions the original purpose of the marc format is still a cogent and necessary one-that of allowing for a great variety of individual library needs for products, practices, and policies via a standardizing communications format. both catalog card production and online retrieval necessitate the same level of specificity, though particular tags, indicators, and subfield codes may vary. as we look toward a variety of authoritative cataloging sources the marc format, in addition to a specific coding of bibliographic information, might also have to specify descriptions of cataloging actions so that the greatest degree of "shareability" might exist. some of this related authoritytype information will either be carried as part of the marc format or in some manner as linked records. the computer operations that utilize the marc formats exist under the constraints of a variety of internal processing formats and design constraints. for each internal processing system, however, the specificity of the marc format offers flexibility and communications 291 efficiency for a number of different processes and products. taken by itself, the marc format is no more difficult to work with than any other standard or technique for both librarians and computer people. while it might be useful for librarians to implement training aids such as online documentation, access to library manuals (particularly that of the library of congress), and so forth, the benefits of aids such as these are trivial since the coding can be learned rather quickly through experience. for computing people, on the other hand, changes in the formats can be very expensive and disruptive. there is general agreement, moreover, that over the long term we have got to be able to maintain the marc format in response to experience with retrieval and other theoretical and technical advances. the main thrust of maintenance in the computing realm is consistency across formats, but approaching this type of simplification requires a number of preliminary steps if it is to be implemented effectively. we need to develop a vocabulary for jointly discussing the elements of the problem. in addition, a major review needs to be undertaken of the internal processing formats and design constraints of the major computer operations-both to serve as a benchmark for measuring the impact of format changes, and as a guideline for newly developing systems to assist in avoiding mistakes in the development of new computer operations. someone needs to be thinking about and designing the ultimate, comprehensive marc format-not to be implemented, but to serve as a springboard for discussion and for consideration of system design. 
we need to establish limitations on what we will handle with the marc formats and where we will begin to rely on underlying formats instead. the development of a comprehensive marc conceptualization would also provide a protocol for undertaking the improvement of marc and would serve as a benchmark against which local systems could be compared. at the very least, the steps described here would facilitate the consideration and implementation of making the formats consistent across types of material a goal which is seen by all to be highly desirable. 292 journal of library automation vol. 14/4 december 1981 we need a format which is consistent, easily maintainable without being uncontrollably disruptive, and responsive to changing needs which are likely to accelerate as we gain experience with online systems. rather than recommending or supporting the implementation of specific changes to the marc format, it is essential that the library community begin to establish the framework and benchmarks necessary to maintain the marc formats over the long term as well as to guide short-term considerations. arl and others can play an important role in undertaking and encouraging a broader approach to this pressing problem. such an approach will not only reduce the risk of decision making, but will also assist in the development of the cost/benefit data needed to enhance consideration of format changes. references 1. d. kaye capen, simplification of the marc format: feasibility, benefits, disadvantages, consequences (washington, d.c.: association of research libraries, 1981), 22p. 2. "principles of marc format content designation,'" draft (washington, d.c.: library of congress, 1981), 66p. 3. ichikot. morita and d. kaye capen, "a cost analysis of the ohio college library center on-line shared cataloging system in the ohio state university libraries," library resources & technical services 21:286302 (summer 1977). 4. council on library resources bibliographic interchange committee, bibliographic interchange report, no. i (washington, d.c.: the council, 1981). comparing fiche and film: a test of speed terence crowley: division of library science, san jose state university, san jose, california. introduction for more than a decade librarians have been responding to budget pressures by altering the format of their library catalogs from labor-intensive card formats to computer-produced book and microformats. studies at bath, 1 toronto, 2 texas, 3 eugene, 4 los angeles, 5 and berkeley, 6 have compared the forms of catalogs in a variety of ways ranging from broad-scale user surveys to circumscribed estimates of the speed of searching and the incidence of queuing. the american library association published a state-of-the-art reporf as well as a guide to commercial computer-output microfilm (com) catalogs pragmatically subtitled how to choose; when to buy. 8 in general, com catalogs are shown to be more economical and faster to produce and to keep current, to require less space, and to be suitable for distribution to multiple locations. primary disadvantages cited are hardware malfunctions, increased need for patron instruction, user resistance (particularly due to eyestrain), and some machine queuing. the most common types of library com catalogs today are motorized reel microfilm and microfiche, each with advantages and disadvantages. 
microfilm offers file-sequence integrity and thus is less subject to user abuse, i.e., theft, misfiling, and damage; in motorized readers with "captive" reels it is said to be easier to use. disadvantages include substantially greater initial cost for motorized readers; limits on the capacity of captive reels necessitating multiple units for large files; inexact indexing in the most widespread commercial reader; and eyestrain resulting from high-speed film movement. microfiche offers a more nearly random retrieval, much less expensive and more versatile readers, and unlimited file size. conversely, the file integrity of fiche is lower and the need for patron assistance in use of machines is said to be greater than for self-contained motorized film readers. the problem one of the important considerations not fully researched is that of speed of searching. the toronto study included a self-timed "look-up" test of thirty-two items "not in alphabetical order" given to thirty-six volunteers, of whom thirty finished the test. the researchers found the results "inconclusive" but noted that seven of the ten librarians found film searching the fastest method. "average" time reported for searching in card catalogs was 37.3 min

$c this subfield will contain all but the first character (or all but the first if a longer escape sequence is used) of every escape sequence found in the record. if the same escape sequence occurs more than once, it will be given only once in this subfield. the subfield is repeatable. this subfield does not identify the default character sets. example: $c)w identifies a record containing the iso extended cyrillic character set; $c)w$c)x identifies a record containing both the iso greek and extended cyrillic character sets. 3.4 discussion-other details. when a field has an indicator to specify the number of leading characters to be ignored in filing and the text of the field begins with an escape sequence, the length of the escape sequence will not be included in the character count. when fields contain escape sequences to languages written from right to left, the field will still be given in its logical order. for example, the first letter of a hebrew title would be the eighth character in a field (following the indicators, a delimiter, a subfield code, and a three-character escape sequence). the first letter would not appear just before the end-of-field character and proceed backwards to the beginning of the field. a convention exists in descriptive cataloging fields that subfield content designation generally serves as a substitute for a space. an escape sequence can occur within a word, after a subfield code, or between two words not at a subfield boundary. for simplicity, the convention that an escape sequence does not replace a space should be adopted. one other convention is also advocated: when a space, subfield code, or punctuation mark (except open quote, parenthesis, or bracket) is adjacent to an escape sequence, the escape sequence will come last. wayne davison of rlin raised the following issue. after the library of congress has prepared and distributed an entirely romanized cataloging record for a russian book, a library with access to automated cyrillic input and display capability will create a record for the same book with the title in the vernacular. (since aacr2 says to give the title in the original script "wherever practicable," the library could be said to be obligated to do so.)
in such an event the local record could have all the authoritative library of congress access points. to keep this record current when the library of congress record is revised and redistributed, it would be necessary to carry the lc control number in the local record. most automated systems are hypersensitive to the presence of two records with the same control number. the two records can be easily distinguished: in the library of congress record, the modified record byte in field 008 will be set to "o" and it will not have any 066, character sets present field. a comparison of oclc, rlg/rlin, and wln university of oregon library the following comparison of three major bibliographic utilities was prepared by the university of oregon library's cataloging objectives committee, subcommittee on bibliographic utilities. members of the subcommittee were elaine kemp, acting assistant university librarian for technical services; rod slade, coordinator of the library's computer search service; and thomas stave, head documents librarian. the subcommittee attempted to produce a comparison that was concise and jargonfree for use with the university community in evaluating the bibliographic utilities under consideration. the university faculty library committee was enlisted to review this document in draft form and held three meetings with the subcommittee for that purpose. the document was also shared with library faculty and staff in order to elicit suggestions for revision. 216 journal of library automation vol. 14/3 september 1981 a copy of the draft was sent to each utility with a request for suggestions for correction and/or clarification of the report. each of the utilities responded promptly, and their recommendations were reviewed by the subcommittee and have been incorporated into the report as it appears here. in reading this report two considerations should be kept in mind: (1) the information is current as of december 1980, and (2) the efforts at brevity and jargon-free comparison may have resulted in oversimplification in some areas. this report is one aspect of the sixmonths-long decision-making process that led the university of oregon library to select oclc, inc. (now the online computer library center). introduction an online bibliographic utility provides computer services to member libraries who, in turn, contribute computer-readable records to a common database. the database is a collection of catalog records input by the members and other sources such as the library of congress, the government printing office, and the national library of medicine. use of the database is online, meaning that each member library accesses the computer directly and carries out its work in an interactive, conversational manner through a computer terminal located in the library. communications with the central computer are carried over a leased long-distance telephone line. the bibliographic utility produces two primary products-catalog cards and magnetic tapes of a library's catalog records-and offers many other services for processing and bibliographic control in libraries. 
in addition to providing the products and services of a bibliographic utility through the research libraries information network (rlin), the research libraries group (rlg) has three other goals: (1) to provide a structure through which common research library problems can be addressed, (2) to provide scholars and others with increasingly sophisticated access to bibliographic and other forms of information, and (3) to promote, develop, and operate cooperative programs in collection development, preservation of library materials, and shared access to research materials. the purpose of this report is to provide an overview of considerations in selecting an online bibliographic utility and a comparison of the three utilities being reviewed by the university of oregon library. each consideration is accompanied by a brief definition or explanation, and a summary of each utility's capability in providing the necessary services or products. an attempt has been made to distinguish between currently available services and those that are planned for the future, but technological and organizational changes in the utilities have complicated this task and, in some cases, made it difficult for the subcommittee members to distinguish between operational and projected capabilities. basic characteristics history oclc oclc, inc., was founded in 1967 by the ohio college association as the ohio college library center, to be the first online shared cataloging network. it has since expanded beyond the confines of the state of ohio and is currently used by nearly 2,400 member libraries in the united states and abroad. in 1977 it adopted its present name. rlg/rlin the research libraries group, inc., was established in 1974 by four major research libraries. in 1978 it acquired from stanford university the ballots bibliographic data system, which became the foundation for rlin (research libraries information network), rlg's wholly owned bibliographic utility. besides being the basis for rlg's cooperative processing activities, rlin supports its other three programs: shared resources, cooperative collection development, and preservation. rlg presently has 23 owner-members. wln in 1975 the washington library network began testing its online system using as its base a computerized bibliographic database that several washington libraries had been building since 1972. wln is a project of the washington state library and presently has over 60 members, primarily in the northwest. membership configuration oclc oclc had 2,392 member libraries in early 1981, including about 1,300 college and university libraries, 330 public libraries, 250 federal libraries, 145 special libraries, 77 law libraries, 71 members of the association of research libraries, 168 medical libraries, 37 state libraries, and at least 48 art and architecture libraries. rlg/rlin in december 1980, there were 23 owner-members (21 university libraries, the new york public library, and the american antiquarian society), two associate members, two affiliate members, and several museum and three law library special members. libraries which formerly contracted for ballots cataloging services from stanford university are still being served by rlin. these include 52 libraries using rlin for online cataloging and 136 libraries using rlin on a search-only basis.
wln wln had 65 members in early 1981, including 34 college and university libraries, 21 public libraries, two special libraries, three state libraries, five law libraries, and the pacific northwest bibliographic center. governance methods of governance are of concern to libraries considering membership inasmuch as they determine to a great extent the responsiveness of the utilities to the needs of their members and the ability of members to participate in setting the direction and priorities for the utility. oclc a 15-member board of trustees holds the powers and performs the duties necessary for governance (including filling management vacancies and approving policy and budgets). a users' council, elected by the members, participates in the election of trustees and represents the interests of the membership in an advisory capacity. it also must ratify amendments to the oclc code of regulations and articles of incorporation. of the 69 delegates to the council, 44 are from academic libraries. various advisory groups exist representing the interests of special groups within the membership, including a research libraries advisory group. twenty regional networks contract with oclc to provide services to their members. oclc libraries in oregon participate through the oclc western service center, claremont, ca, and are served by oclc's portland office. rlg/rlin rlg/rlin operates through a board of governors consisting of one representative from each full member institution with the president as chief operating officer. standing committees for collection management, public services, preservation, and library technical systems & bibliographic control; and program committees for east asia, art, law, theology, and music are composed of appointees from member institutions and report to the president. wln an 11-member computer services council is elected directly by the online participant libraries. legal responsibility for wln resides with the washington state library commission. financial stability an indicator of a utility's financial stability is its proven ability to generate sufficient revenues to cover expenses with the least recourse to outside funding sources. financial stability in a utility is a concern to a library considering membership not only from the standpoint of a utility's mere survival, but because of its implications for future system developments, possible dramatic fee increases should outside funding evaporate, and maintenance of high quality services and products. oclc oclc, inc., is a not-for-profit corporation, with tax-exempt status having been granted under section 501(c)(3) of the internal revenue code. it is self-supporting, receiving no government or private subsidies and issuing no stock. its revenues alone support existing operations, expansion, and research and development activities. revenues result from fees charged member libraries for products and services. oclc's estimated assets for fiscal year 1980 were over $55 million and its revenues approximately $24 million. its revenue base is its 2,400 member institutions. rlg/rlin the research libraries group, inc., is a tax-exempt corporation owned by its 23 owner-member institutions. revenues result from fees charged members for use of the rlin database.
rlg currently must supplement this income with foundation grants and loans from stanford university, because of relatively high development costs and relatively low revenues. as of this year, nearly $5.25 million has been received in grants and a $2.2 million loan was obtained, to be repaid by august 1986. rlg has projected that in 1982-83 ongoing operating costs will be met by fee-generated income. rlg's board of governors recently approved a new income/expense structure to take effect september 1, 1981: "operating expenses matched by rates for services; system development matched by grants and loans; program and administration matched by a program partnership fee." this new program partnership fee will be a flat annual rate for full members in the range of $20,000 to $25,000. a decline in the number of units cataloged by member libraries (due in part to decreased acquisitions budgets), which is the basis for fees charged, forced the board to institute this new fee. rlg is encouraging member libraries to seek these additional funds from institutional sources outside the libraries' own budgets. the new financial structure appears to reflect a recognition of the need for outside resources to provide for research and development for at least the immediate future, and at the same time an effort to reconcile income and expense in the areas of operating expenses and program administration. its revenue base is its membership of 23 institutions. in the past rlg has estimated that financial stability would be reached when membership reached 35, but it is unclear how the new rate structure will affect that projection. wln the washington library network receives revenues in the form of fees for services and products. as a division of the washington state library, it also receives some funding from the state of washington. wln has been the recipient of some outside grants, but does not appear to rely heavily upon grant monies to meet ongoing expenses or system development costs. wln would like to lessen its dependency upon the state of washington, and has taken the first step by broadening the base of its advisory committee to include out-of-state members. its revenue base is its membership of approximately 60 libraries. the committee preparing this report does not have information as to the proportion of revenues generated by fees. however, a recent (july 1, 1980) 10% increase in service rates was put into effect for these stated purposes, among others: "to recover the cost of operation of the computer service" and to "allow a modest margin to insure stability." track record in meeting past system development deadlines past success or failure in meeting announced deadlines for system developments may be indicative of future performance in this regard. all three utilities are heavily engaged in research and development and, while we are primarily interested in the features that are presently available, it is also important to try to gauge what each system will look like several years from now. the amount of information available to the committee varied according to the utility, so these columns are not directly comparable, but merely suggestive. oclc oclc tries not to attach dates to its projections because of early failures to meet announced deadlines. however, its interlibrary loan system was implemented one year early and its searching improvements are claimed to be ahead of schedule.
the planned acquisitions subsystem had been scheduled for completion in summer 1980, and is currently being tested by a small number of member libraries. the conversion of oclc's database to accommodate the new cataloging rules and include new forms of names was completed on schedule in december 1980. the serials union listing capability was also completed on time. (see the serials check-in section below.) rlg/rlin a study dated august 1978 performed for the university of california listed planned ballots system developments with projected completion dates. this list follows, with actual completion dates or revised projections added (since 1978 the rlg board of governors has determined the order of priorities for research and development): • network file system (now called the "reconfigured database" by rlin): projected january 1979; revised projection april 1981. • serials cataloging: projected january 1979; actual completion late 1979. • authority control system, phase 1: projected january 1979; revised projection spring 1981. • authority linking and control, phase 2: projected fall 1979; revised projection spring 1981. • generalized acquisitions: projected fall 1979; revised projection (in two phases) june 1981 and october 1981. • serials control: projected 1980; revised projection post-1982. • library management information system: projected 1979; no projected date, no resources allocated. • book/com catalog interface: projected 1980; revised projection 1981. wln wln's present online system was one year late, and its acquisitions module was also late. the processing of retrospective conversion tapes, which had been three months behind, was current by early 1981, with the exception of two special projects. large-scale system adjustments to accommodate new cataloging rules were completed on schedule, as was implementation of roll-microfilm catalogs. database size and components the size and makeup of the utility's database is of concern to libraries considering membership because those factors have the greatest bearing on the library's likelihood of obtaining a large portion of its cataloging information from the system. oclc size. over 7.1 million bibliographic records (february 1981); books: 4.9 million (october 1979); serials: 341,000 (october 1979); other: 340,000 (october 1979); name authority records: 500,000 (est. by 1981). formats available. books, serials, films (av), maps, manuscripts, music recordings, music scores. sources of data. member-contributed records; library of congress-produced machine-readable cataloging records (marc) (1968 to date); government printing office-produced records (cataloged directly into oclc by gpo); conser records (conversion of serials, a project of 15 major libraries to produce machine-readable serials cataloging records; data are entered directly into oclc, then authenticated by the library of congress and the national library of canada); national library of medicine-produced records. additional sources include the following databases: canadian marc serials, minnesota union list of serials, pittsburgh regional library center serials. rlg/rlin size. over 3 million bibliographic records (june 1980); books: 2.5 million (june 1980); serials: 460,000 (june 1980); authority records: 1.6 million (early 1981). formats available. books, serials, films (av), maps, music recordings, music scores. sources of data.
member-contributed records; marc (excluding 1968-1972); gpo records (to be added spring 1981); conser records. cataloging records from columbia and yale universities and university of minnesota biomedical libraries, previously put into machine-readable form, have been added to rlin. records from the new york public library, northwestern and pennsylvania state universities will be added in the near future. additional sources include the avery index to architectural periodicals. wln size. 2 million bibliographic records (january 1981); authority records: 2.3 million (january 1981); holdings records: 2.3 million (december 1980). formats available. books, serials, films (av), music recordings, and music scores (music recordings and scores are awaiting implementation by the library of congress). sources of data. member-contributed records; marc (1968 to date); gpo records; conser records (except those not yet authenticated by the library of congress). machine-readable records from the university of illinois will be added to wln's database on a weekly basis by mid-1981. records from certain libraries in the southeastern library network (solinet) will be added in the future, as part of an arrangement whereby wln made its computer software package available for use by illinois and solinet. resource sharing interlibrary loan (ill) ill is the process by which library materials are lent and borrowed by libraries in the u.s. and foreign countries. a bibliographic utility provides two tools to aid in this process: an online union catalog used to determine which library owns the needed material, and a message switching system used to communicate among libraries and to carry out the transaction. ill at the university of oregon library is currently accomplished using a large number of printed union catalogs and is communicated by mail or western union teletype. a bibliographic utility will not completely replace ill transactions carried out in this manner. the number of requests for materials from the library collection will probably increase due to the "visibility" gained in the online union catalog. oclc the oclc database provides the largest online union catalog through a holdings record listed with each catalog entry. the ill message system transfers records from the database to the lending library in a request form, automatically sends the request to up to five libraries, generates records on the status of each request, and provides statistics on ill transactions. oclc ill transactions are generally faster than traditional methods of interlibrary loan because of the ability to move data directly from the online union catalog to the request form without re-typing and the ability to have requests automatically forwarded if a library is unable to fill the request immediately. oclc's ill subsystem has been in operation for a year and participating libraries have reported general satisfaction with its performance. rlg/rlin the rlin database provides an online union catalog through a holdings record listed with each catalog entry. materials not located in the rlin database may be referred to the bibliographic center at yale university for further manual searching through printed union catalogs. the rlg message system may be used to create and send ill requests to other rlg libraries, though this system is not specifically designed as a comprehensive ill support system.
the shared resources program committee has recently formed a task force charged with the responsibility to create a functional specification for an automated interlibrary loan system, and to determine the priority for its implementation. rlg resource sharing policy requires members to give priority to ill requests from other rlg members, to suspend fees to members, to provide on-site access to users from members' libraries' institutions, and to provide free photocopies of non-circulating materials. wln the wln database provides an online union catalog through a holdings record listed with each catalog entry. this online union catalog includes the local library call number and, for serials, the specific holdings of the library. the wln resource directory is a microfiche listing of the bibliographic and holdings information in the database. wln offers no message switching system for ill, though this is their highest priority for future development. in cooperation with pacific northwest bibliographic center, wln is planning experiments with a message switching system for interim use until the comprehensive ill system is developed. cooperative acquisitions cooperation in purchasing library materials is done in order to minimize the duplication of expensive purchases and to ensure that important works are easily available to users of the library, whether they are actually owned or not. oclc member libraries may search the database to determine the holdings of particular items by other member libraries, in order to avert undesirable duplicative purchases. rlg/rlin members actively coordinate purchases of certain categories of materials in designated fields in order to avoid extensive duplication and to ensure that at least one copy of every item of research value be acquired by a member institution. in support of this effort is an automated "cooperative purchase file," containing limited bibliographic information and acquisition decisions of rlg members for all new serials on order and for all expensive items ($500 or more). member institutions agree to develop conspectuses reflecting their level of holdings and development in certain fields (subjects, language, and formats). these conspectuses are time-consuming to develop. a survey of holdings in chinese, japanese, and korean languages has been finished by 12 members. older members have completed language and literature, fine arts, philosophy, and religion. history is expected by march 1981, to be followed by the hard sciences. based upon these conspectuses, rlg members will build a system-wide collection development policy. new members are expected to begin work on their conspectuses as soon as possible, but not necessarily immediately after joining rlg. wln members may search the database to determine the holdings of particular items by other member libraries, in order to avert undesirable duplicative purchases. libraries may also search the in-process file to determine if items are on order by one of the 23 libraries using wln's acquisitions subsystem. support for collection development activities a bibliographic utility is potentially useful for collection development in that it provides a large file of bibliographic records that may be searched to assist in a) determining the existence of published materials in specified categories (on a particular subject, by a particular author, in a particular series, for example), and b) obtaining
correct bibliographic information about specific items to assist in ordering them. important features in a utility in this regard are database size and variety of access points (subject, author, series titles, etc.). oclc useful access points by which the database may be searched include: • personal author • corporate author • title • series title • variant names (e.g., clemens or twain) • conference names. the database must be searched using a "search key" (a code based upon a sequence of initial letters in the words to be searched), not real words. rlg/rlin useful access points by which the database may be searched include: • personal author • corporate author • conference names • title • series title • subject heading or call number range (excluding items cataloged by the library of congress) • publisher, using a truncated isbn (international standard book number) [restricted to items cataloged by the library of congress]. a search of rlin is likely to produce multiple records for particular items because an item held by more than one member will be displayed for as many libraries as have cataloged it through the system. it is projected that by april 1981, rlin's "reconfigured database" will have solved that problem by attaching holdings information to one unified record. it will also have merged the two bibliographic subfiles (library of congress and member cataloging) so that access by subject heading, call number range, and isbn will be available for the entire database. wln useful access points by which the database may be searched include: • personal author • corporate author or corporate author keyword (keyword searching permits the user to search for items using either the full heading, american society for information science, or words from the heading, "society" and "information"; this capability is useful when the complete phrase is not known) • title • corporate or conference author/title series (keyword) • series title or truncated series title • subject heading and/or subdivision or truncated subject heading • corporate and conference name subject headings (keyword). preservation of library materials all bibliographic utilities, because of their function as a union catalog of their members' machine-readable cataloging information, have some usefulness for libraries making decisions about preservation priorities. a library may, for example, choose to give preservation treatment to item a rather than item b because item b is owned by several other libraries in the vicinity, whereas item a appears to be unique. it must be remembered, however, that many older items will not appear at all, because they were cataloged long before the utilities came into existence. oclc members may search holdings information in the database to determine the relative rarity of an item that is a candidate for preservation treatment. rlg/rlin members may search holdings information in the database to determine the relative rarity of an item that is a candidate for preservation treatment. a computerized list of members' micropreservation activities is provided. experimental programs are conducted to test new preservation technologies and applications of existing processes. preservation microfilming is being done for members by staff at yale and princeton. funds are provided to members for preservation activities. these activities are part of rlg's preservation program, one of its four major programs.
wln members may search holdings information in the database to determine the relative rarity of an item that is a candidate for preservation treatment. technical processing acquisitions the steps by which the library purchases books and other materials include: 1. pre-order searching to determine that a requested item is not already owned by the library or on order. 2. selecting a dealer likely to be able to supply the desired item. 3. placing the order. 4. receiving the item. 5. clearing the order records. 6. processing the invoice for payment. 7. maintaining precise accounting of all book funds. 8. inquiring about the status of items which are not received when expected. 9. cancelling orders and adjusting accounting records when items are not available. at the uo most acquisitions forms and files are created and maintained manually. in an automated acquisitions system the placing of the initial order generates an acquisition record for each item, which is updated as the item moves through the cycle outlined above. this eliminates the need for maintaining separate files according to the status of an order. oclc operational. oclc has an online name-address directory which presently can be searched while using other oclc subsystems. this file contains information about publishing, educational, library, and professional organizations and associations. this information will be automatically transferrable to forms being produced online. planned. oclc's acquisitions subsystem, which is presently being tested by selected member libraries, is projected to be generally available in spring 1981. when operational the acquisitions subsystem will permit users to: place orders for all types of bibliographic materials (forms generated will be sent directly to the supplier with a copy to the library); renew subscriptions; request publications or price quotations; create deposit account orders; send prepaid orders; cancel orders; create and adjust fund records; and receive periodic fund reports. rlg/rlin operational. rlin does not have an operational acquisitions subsystem. stanford university is continuing to use a system developed as part of ballots. planned. the rlg board of governors has approved functional specifications for an acquisitions subsystem to be introduced in two phases. by june 1981, rlin plans to have a centralized in-process file which will contain records of all new orders, gifts, subscriptions, etc. of members, and will be able to support non-accounting aspects of the acquisitions process. the capability to store and maintain an online book fund accounting system will be achieved in october 1981. rlin expects to be able to support all files, processing, and products necessary to establish, coordinate, and monitor materials acquisitions from the point of selection decision, request, order, or receipt through completion of technical processing activity. wln operational. wln's acquisitions subsystem, which has been operational since may 1978, is comprised of four files: 1. in-process file which supports the majority of acquisitions activities. 2. standing orders file which has records for subscriptions and other items which are renewed or reordered on a continuing or periodic basis. 3. name and address file which contains names and addresses of book dealers and other vendors, main libraries, branch libraries, etc. 4. account status file which provides the capability to maintain up-to-date accounting.
information keyed into the terminal during the day is entered against the accounts nightly and is reflected in the account totals available online the following day. records of completed transactions are transferred to a magnetic tape history file and can be used for generating statistical and other reports. with each step of the order cycle, appropriate forms and reports are generated. special system reports reflecting the status of the four files may be generated on request. instructions entered at the time of the initial order provide for automatic generation of notification forms for individuals requesting the specific item being ordered or inquiry notices for materials not received after a specified period. planned. further refinements of the procedures and capabilities of the system. cataloging the creation of a cataloging record involves: i. describing an item 2. assigning headings for names of persons or organizations and titles by which the user might be expected to seek the item in the catalog 3. assigning a unique call number which will place the item with others of a similar nature, and 4. assigning subject headings which reflect the content of the item. because most libraries collect many of the same materials, the concept of sharing the responsibility for cataloging was developed which makes materials available more quickly at reduced cost. with the establishment of national and international cataloging rules and standards, and the growth of large online computerized databases, it is becoming increasingly feasible to have each item cataloged only once with that cataloging information available for all libraries to use. the library of congress catalogs approximately 250,000 titles per year into machine-readable form . this cataloging is available through each of the bibliographic utilities and may be used for the creation of local catalogs. when the library of congress has not yet cataloged a specific item, a utility member library may prepare the cataloging according to specified standards and enter its cataloging into the database for use by other member libraries and for its own catalog. another aspect of the cataloging activity is the creation of a local database which can be used as the basis of not only the local library catalog, but also of a local circulation, acquisitions, and serials system, as well as for regional union catalogs. in order to provide total access to a library's collection in this machine-readable database, information concerning every item in the library must be entered into the system. this process is called retrospective conversion. during the retrospective conversion process the library can choose to eliminate existing inconsistencies in the treatment of library materials including reclassifying books so that most materials are retained in one main classification system. the university of oregon library has as a long-term goal completing total retrospective conversion of its collection so that all materials can be searched and located in an online catalog. oclc operational. oclc's online cataloging subsystem has been operational since 1971. based on the experience of similar libraries, the university of oregon library might expect to find entries in oclc's database for over 90 percent of the items searched . • these cataloging records can be modified online or accepted as is. the local library's symbol is added to indicate that it has used the cataloging record and then presorted, alphabetized catalog cards are ordered. 
the cards are printed overnight and shipped on a daily basis. many oclc libraries print their call number labels by means of a printer attached to their terminal. once a cataloging transaction has been completed, it is not possible to retrieve your local modifications online in the oclc system. the record of your transaction is stored and sent to your library on magnetic tape on a periodic basis. these magnetic archive tapes can be used by a vendor or local computing center to generate a local microform or online catalog, run a circulation system, etc. it is presently possible to catalog most types of materials in the oclc system including books, serials, microforms, motion pictures, music, sound recordings, maps, and manuscripts. increased emphasis has been placed on quality control and adherence to specified standards in the creation of cataloging records, but there is no official editing of cataloging records by oclc staff. in 1979-80 nearly 45 percent of the activity on oclc's cataloging subsystem was related to retrospective conversion. oclc's large database, extended hours of service, and special pricing schedules for retrospective conversion and reclassification make it attractive for these activities. oclc charges 60 cents per retrospective conversion record during hours of peak system activity (prime time) and five cents per retrospective conversion record during less busy hours (non-prime time). planned. oclc continues to explore means of improving quality control. after moving their central facility to new quarters in early 1981, oclc will reconsider the possibility of storing and displaying the number and location of local copies of a title. rlg/rlin operational. at this time the university of oregon might expect to find cataloging available for 70 to 90 percent of its ongoing work in rlin (see the note on search hit rates at the end of this section). a search of rlin's database retrieves multiple records because each library's records are stored separately. the library selects the desired record, modifies or accepts it, enters the library's symbol, and orders cards which are printed nightly and sent in presorted, alphabetized batches. no call number labels are produced, and it is not presently possible to print labels from the terminal. local library modifications are accessible online. magnetic tapes of cataloging transactions may be purchased and used to create local online or microform catalogs. most materials may be cataloged with rlin including books, serials, microforms, motion pictures, music, sound recordings, and maps. member libraries agree to catalog in conformity with rlin standards, but there is no formal editing of records by rlin staff on an ongoing basis. sample quality checking is the responsibility of a newly-created position of quality assurance specialist. with only 23 owner-members, rlg must carefully consider the impact on the system of allowing individual members to undertake retrospective conversion projects. each project must be approved by the board of governors, and members are encouraged to seek outside financial support rather than asking rlin for reduced rates. rlin has just received a $1.25 million grant including $600,000 to support retrospective conversion projects. rlin does not charge for retrospective records which are completely recataloged and upgraded with the book in hand. the prices for other levels of retrospective conversion cataloging range from fifty-five cents to $1.85 per record. planned.
in april 1981, rlin plans to reformat its database so that there will be only one copy of each cataloging record. member libraries' symbols and local cataloging information will be displayed with the appropriate records. wln operational. based on the experience of others, the university of oregon library might currently expect to find cataloging records available for 50 to 70 percent of its ongoing work in the wln database (see the note on search hit rates at the end of this section). libraries search wln's database, accept or modify the cataloging records, and order cards and labels which are printed nightly and shipped weekly. (card sets are not presorted for filing.) local cataloging information is accessible online through the library's wln terminal. magnetic tapes of a library's cataloging transactions may be purchased to run a local online or microform catalog. wln also provides microform catalogs on either microfilm or microfiche. books, serials, and audio-visual materials, but not music, sound recordings, and maps, may be cataloged on wln's system. libraries cataloging in wln must conform to well-defined wln standards. new cataloging records go through an edit cycle and are reviewed by central wln staff before being added to the wln database. presently this review takes about two weeks. during this period, the cataloging record may not be retrieved online. the wln batch retrospective conversion subsystem has been operational since august 1980. using this system a library enters brief cataloging records which are collected by the system and searched later as a unit through the wln database. records for which a match is found are billed at six cents. records not matched are billed at one cent and may be searched again at a later date. over 30 wln libraries are using this capability, which can be made available to non-members under special circumstances. planned. wln is considering dispersing among selected member libraries responsibility for editing member-created cataloging records. wln will make music cataloging available within the near future. a note on search hit rates: a wide range of success rates for searching each system is cited in the literature, each dependent on the sample procedures used. the university of oregon library had 100 items searched against each database. this sample excluded books with printed library of congress card numbers, and included books, serials, microforms, music scores, recordings, documents, and non-book materials. of this sample oclc found 96, rlin found 65, and wln found 38. the range of figures cited in this report allows for variation between studies cited in the literature, word-of-mouth reports from librarians using these systems, and the university of oregon library's own sample. an analysis of this sample is being prepared. recent comparisons of searching success are found in the following: linking the bibliographic utilities: benefits and costs, submitted to the council on library resources ... by donald a. smalley [and others], columbus, ohio: battelle, 1980; matthews, joseph r., "the four online bibliographic utilities: a comparison," library technology reports 15:6 (november-december 1979), p. 665-838; tracy, joan i. and remmerde, barbara, "availability of machine-readable cataloging: hit rates for ballots, bna, oclc, and wln for the eastern washington university library," library research 1:3 (fall 1979), p. 227-81.
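the per-record conversion charges and the hit rates quoted above differ enough across the three utilities that a small worked example may be useful. the sketch below is illustrative only: it uses the prices reported in this section (oclc at 60 cents prime time and five cents non-prime, wln at six cents per matched record and one cent per miss, and the midpoint of rlin's 55-cent to $1.85 range), together with the hit rates from the university of oregon sample in the note above. the function names, the 100,000-title backlog, and the assumption of one search per title are ours, not the utilities'.

```python
# illustrative cost sketch only; prices and hit rates are taken from the figures
# quoted in this report, and the simplifying assumptions are our own.

def oclc_cost(titles, hit_rate, non_prime_share=1.0):
    """oclc: 60 cents per converted record in prime time, 5 cents non-prime."""
    hits = titles * hit_rate
    return hits * (0.60 * (1 - non_prime_share) + 0.05 * non_prime_share)

def rlin_cost(titles, hit_rate, per_record=(0.55 + 1.85) / 2):
    """rlin: 55 cents to $1.85 per record; the midpoint is assumed here."""
    return titles * hit_rate * per_record

def wln_cost(titles, hit_rate):
    """wln batch conversion: 6 cents per matched record, 1 cent per miss."""
    hits = titles * hit_rate
    return hits * 0.06 + (titles - hits) * 0.01

if __name__ == "__main__":
    titles = 100_000  # hypothetical retrospective conversion backlog
    print(f"oclc (96% hit, all non-prime): ${oclc_cost(titles, 0.96):,.0f}")
    print(f"rlin (65% hit, midpoint price): ${rlin_cost(titles, 0.65):,.0f}")
    print(f"wln  (38% hit, batch rates):   ${wln_cost(titles, 0.38):,.0f}")
```

such a comparison only captures the utilities' charges; records that miss in any database must still be cataloged or keyed locally, which is usually the larger cost.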
serials check-in serials are publications issued in successive parts bearing numerical or chronological designations which are intended to be continued indefinitely. they include periodicals; newspapers; annual reports and yearbooks; journals, memoirs, proceedings, and transactions of societies; and numbered series. the average research library will have between 15,000 and 20,000 such titles. precise data must be maintained to enter each issue as received, to discover missing issues, to request replacements for missing issues, to monitor accounting information, to renew or cancel subscriptions, and to maintain binding information. serials files contain such information as title, relationship to earlier publications, name and address of publisher, volumes the library owns, call number and location, date, volume, and number of each issue, date each issue was received, subscription dates, price, etc. at the university of oregon library all of this information is maintained in manual files. once the serials check-in operation is computerized, it is possible to generate a wide variety of serials finding lists, analyses of serials subscriptions by subject, location, department, etc., and to provide current serials information online. oclc operational. oclc introduced its serials control subsystem in 1976 and improvements to the system in 1979. participants create online local data records with information necessary to monitor and control each issue of each serial received by the library. institutions can check in currently received issues online. a recent ancillary to this system is the ability to create and maintain online a cooperative record of serials owned by any group of institutions (a union list of serials). planned. oclc plans to continue upgrading the capabilities of its serials control subsystem as needed. rlg/rlin operational. none. planned. automated serials check-in is one of several items listed for consideration after current development activities are released, probably in late 1982. no resources are presently committed to this project. wln operational. while wln has no current serials check-in capabilities, it does support maintenance of serials subscriptions in the acquisitions subsystem, including automatic renewal and reorder reminders. wln also produces union lists of serials. planned. wln is investigating existing commercially-created check-in systems to see whether they can purchase an existing system to incorporate into wln's services. management information precise up-to-date information concerning library operations can be very useful in planning improvements in library services and in attaining efficient utilization of available personnel, resources, and materials. without the computer, the laborious record-keeping necessary to obtain useful management information almost negates the benefits of having the information. oclc operational. oclc produces cataloging, interlibrary loan, and serials check-in system use and system performance statistics on a regular basis. libraries can make local arrangements to create additional analyses of the information stored on subscription archival tapes of their local cataloging activity. oclc offers semimonthly, monthly, or quarterly accession lists of new materials cataloged by each library. these lists may be in call number or subject sequence. oclc has produced some special studies for institutions based on their cataloging records. planned.
when the acquisitions subsystem is operational, libraries may choose to receive a cumulative, monthly fund activity report and a periodic, cumulative fund commitment register. these reports will provide institutions with current financial control data. oclc plans to continue to develop its ability to provide management information. rlg/rlin operational. system use statistics are provided in the form of the monthly invoice, which may be used to monitor cataloging and public service activity, and may be broken down into appropriate accounts by pre-planning. lists in call number order of materials cataloged by a library into rlin could be produced from local printers attached to the terminal. planned. the generation of management information is a future development project; no special management reports are prepared presently. among the management reports included in the specifications for the acquisitions subsystem, projected for implementation by october 1981, are status reports on in-process files, materials awaiting receipt, materials received, and book fund balances. wln operational. wln produces aggregate system activity reports monthly, but does not analyze the cataloging activity or subject holdings. wln's acquisitions subsystem can be used to produce acquisitions-related management reports concerning account transactions, account history, standing orders, renewals and reorders, receipts, detailed encumbrances, etc. a microform accession list by title is available. a general-purpose text-editing facility may be used by management to maintain data not derived from wln operations and to produce formatted reports of this data. planned. wln is developing the capability to store and maintain detailed collection information for each library online, including copy numbers and location symbols for each copy of a title owned by a library. no specific management information plans have been outlined at this point. public services reference use of the utility's terminal a bibliographic utility has potential for use in library reference services in three major areas: 1. verification of bibliographic information. the utility's database may be searched for cataloging information not in the uo library catalog. a verification search is made to locate a complete catalog description of a specific, known item and is carried out most easily using one of the unique numbers assigned to a publication (library of congress card number, international standard book number, etc.). if one of these is not known, a combination of author and title words, or a "search key" based on author and title, is used to retrieve the information. verification places a greater reliance on the quality of bibliographic information in the utility's database than on search techniques used to locate the information. 2. compilation of subject bibliographies. the utility's database is searched through words in the titles and subject headings in a bibliographic record in order to produce a list of materials on a given subject. this subject query can be modified using the logical relationships and, or, and not to indicate, respectively, limitations, synonyms, or exclusions in the search. the ability to obtain a printed list of references is convenient, if not required. 3. compilation of author bibliographies. the database is searched to find all material created by a particular individual or corporate body.
the size of the utility's database is a major consideration, as is the source of the cataloging found in an author search. again, a printed list is necessary. oclc the oclc database can be searched in a variety of ways to support reference services, though there is no subject search capability in the system. (a search key is a code based on a certain number of characters drawn from a particular element in the bibliographic reference. for instance, to find a record for william manchester's american caesar, an author/title search key using the first four letters of the author's name and the first four letters in the title would be manc,amer. various combinations of letters are used to search author names, titles, or author/title combinations. a search key may not necessarily be unique to a given item, and may retrieve other items beside the one desired.) the following access points may be used in a search: 1. lc card number 2. international standard book number (isbn) 3. international standard serial number (issn) 4. coden (an abbreviation developed by chemical abstracts service for designating periodical titles) 5. government documents number 6. oclc identification number 7. personal author (search key, not full words) 8. corporate author (search key) 9. performer (search key) 10. title (search key) 11. author/title (search key) 12. series title (search key) 13. variant names (search key) 14. conference names (search key). searches may be restricted by year or by type of material, such as books, manuscripts, maps, etc. the logical operators and, or, and not are not used in oclc. the oclc search system is primarily based on search keys and is best utilized to locate a known item. local printing is available on any oclc terminal so equipped. there is one standard print format offered. rlg/rlin the following access points may be used in a search of the rlin database, though not all are currently active in each subfile of the database: 1. lc card number 2. isbn 3. issn 4. coden 5. government documents number 6. rlin identification number 7. call number (complete or truncated) 8. recording label number 9. personal author 10. corporate authors or conference names (keyword or phrase) 11. title words 12. subject headings (keyword or phrase) 13. music publisher. truncation (searching of partial entries) is available to aid in searching incomplete entries, and the logical operators and, or, and not may be used to broaden or restrict a search. local printers may be attached to the rlin terminals. a variety of print formats is offered. plans include unified search access points for all subfiles of the database as of april 1981. wln the following access points may be used to search the wln database: 1. lc card number 2. isbn 3. issn 4. wln identification number 5. personal author 6. corporate authors or conference names 7. title words 8. series title (complete or truncated) 9. corporate or conference author/title series (keyword) 10. subject headings (complete or truncated). for a variety of reasons, the wln search system is the most powerful of the three utilities. truncation is available and the logical operators and, or, and not may be applied to broaden or restrict a search. records may be printed locally in a variety of formats on any wln terminal so equipped. wln will also provide printing at the central computer for reference bibliographies. wln search software may be purchased for local database management applications (see the section on online public catalogs).
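because the derived search key is central to how oclc searching is described above, a short sketch may make the idea concrete. this is an illustration of the manc,amer example only, not oclc's actual key algorithm; the real system used several fixed key formats and its own normalization and stop-word rules, and the function name derive_author_title_key below is our own.

```python
import re

def _prefix(text, length):
    """keep letters and digits only, then take the first `length` characters."""
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    return cleaned[:length]

def derive_author_title_key(author_surname, title, author_len=4, title_len=4):
    """build a derived author/title search key like the manc,amer example.

    illustrative only: the production key formats and their handling of
    leading articles and punctuation differed in detail.
    """
    # skip leading articles in the title, as most derived-key schemes did
    words = [w for w in title.lower().split() if w not in ("a", "an", "the")]
    first_title_word = words[0] if words else ""
    return f"{_prefix(author_surname, author_len)},{_prefix(first_title_word, title_len)}"

# william manchester's "american caesar" -> "manc,amer"
print(derive_author_title_key("Manchester", "American Caesar"))
```

the trade-off the report notes follows directly from this design: a short fixed-length key is quick to type and index, but it is not guaranteed to be unique, so a search may retrieve several items besides the one wanted.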
links to other computerized services there are presently over 150 reference databases available through commercial computerized reference service vendors. during the last ten to fifteen years, standard bibliographic indexing and abstracting publications such as chemical abstracts, historical abstracts, and dissertation abstracts international have used computerized methods to organize and print references to periodical articles, reports, dissertations, conference papers, etc. the vendor creates a computer-searchable version of the reference database and makes it available to libraries for a fee based on their use of the computerized search system. membership in a bibliographic utility can provide two benefits in the use of other computerized reference services: 1. discounts on fees through membership in a large group contract administered by the utility. 2. access to the reference vendor's computer through the utility's terminal and communication network. oclc oclc's affiliated online services program provides access at discounted rates to the information services of bibliographic retrieval service (brs), lockheed information systems (lis), and the new york times information bank. oclc's communications network does not yet permit users to link to the hosts using an oclc terminal, though this capability is anticipated in the near future. rlg/rlin rlin does not offer a formal program in this area, though the rlg 40 terminal is compatible with other information retrieval systems. wln wln does not offer a program in this area, but anticipates offering access to brs, lis, and the new york times information bank. circulation none of the bibliographic utilities under consideration currently support circulation functions on their computers. however, each system can provide a machine-readable archive tape of our cataloging information to be used in developing a computerized circulation system. in order to keep track of circulation transactions, it is necessary to have complete retrospective conversion of the uo library catalog. another important consideration is the transferability of data between the utility's computer and the circulation computer. oclc oclc anticipates offering support for local circulation systems on their computer for member libraries and will demonstrate their system in mid-1981. oclc data has been successfully transferred to many local circulation systems. rlg/rlin rlin does not anticipate offering local circulation services for member libraries. rlin data has been successfully transferred to several local circulation systems. wln wln does not anticipate offering local circulation systems on their computer for member libraries. wln data has been successfully transferred to local circulation systems and an agreement has been reached with dataphase, a computerized circulation system vendor, to discount purchase of their system by wln member libraries.
oclc oclc does not currently encourage public access to their database and does not support use of local online catalogs on their computer due to the tremendous demand for computer resources exerted by 2,400 member libraries. oclc and rlg/rlin are participating in a study of user requirements for a public online catalog. oclc data has been successfully transferred to several local online catalogs, including eugene public library's circulation and online catalog system, ulisys. rlg/rlin rlin anticipates being able to offer public access to their database. they are participating in a study with oclc of user requirements for such a system, but no date has been announced for the development of this capability in rlin. rlin data has been successfully transferred to a local public online catalog at northwestern university. wln wln does not believe that a local online patron-accessed catalog should be provided through the wln computer, even though they anticipate having such a capability within one year. instead, they encourage libraries to develop local systems for public access to the online computerized catalog and to obtain data from the wln cataloging system. the university of illinois is adapting the wln computer search and database management software to provide a local online catalog and computer-assisted instruction in its use for the public.
checklist for cassette recorders connected to crts prepared by lawrence a. woods, purdue university libraries, west lafayette, indiana, for the technical standards for library automation committee, information science and automation section, library and information technology association. introduction a data cassette recorder connected to a printer port is an effective, low-cost method of collecting data in machine-readable form from display terminals such as the oclc 100/105. it is important that a data recorder be used rather than an audio recorder, although the cassette itself can be a good-quality audio tape. it is also important to note that the data recorded on the tape are not the same as the data originally transmitted to the display terminal, but are simply a line-by-line image of what appears on the screen. a typical installation will have a minimum of two devices: one attached to the display terminal to collect data, and one attached to a printer or an input device to another computer for playback of the data. there are more than 150 various data re
editor's comments bob gerrity library discovery circa 1974 our ongoing project to digitize back issues of information technology and libraries (ital) and its predecessor, journal of library automation (jola), provides frequent reminders of what's changed (and what hasn't) in library technology in the past several decades. the image above is from a 1974 advertisement in jola for the "rom ii book catalog on microfilm" from information design in menlo park, ca. the ad copy speaks for itself: all the advantages of a printed book catalog…none of the disadvantages. your staff and patrons can use the catalog simultaneously in many different locations. the user can scan a number of related titles on the same page, in contrast to the one-at-a-time viewing of catalog cards in trays. manual filing routines and maintenance are eliminated. easy to use…requires no instruction. an automatic index pointer shows your patron his position in the file. at the touch of a button he can scan forward or back at high speed. average look-up time is about twelve seconds. a staff member can insert an updated catalog totally cumulated on a single reel of microfilm in about one minute. your patrons never touch the film—your complete library catalog "locked-in"! bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia. my favorite bit is the sign on the front of the machine, proudly proclaiming: these are all the books in the library. this month's issue of ital looks at the current state of library discovery from a number of angles. will owen and sarah michalak describe efforts at unc chapel hill and partners within the triangle research libraries network to enhance the utility of the library catalog as a core tool for research, taking advantage of web-based search technologies while retaining many of the unique attributes of the traditional catalog. joseph deodato provides a useful step-by-step guide to evaluating web-scale discovery services for libraries. david nelson and linda turney analyze faceted navigation capabilities in library discovery systems and offer suggestions for improving their usefulness and potential. julia bauder and emma lange describe a new approach to subject searching, using an interactive, visual approach. yan quan liu and sarah briggs report on the current state of mobile services among the top 100 us university libraries. unrelated to discovery but certainly relevant to issues around library provision of access to information, jill ellern, robin hitch, and mark stoffan report on user authentication policies and practices at academic libraries in north carolina.
today's large academic libraries struggle, there is, nonetheless, room for criticism of library priorities. this study must be viewed as only a first step (largely tentative and exploratory) in relating automation with service attitudes. it suggests that online systems may be associated with managers more positive in their view of the management role and more positive in their attitudes toward users than batch- and manual-system managers. further research would be useful at this point to compare levels of automation (manual, batch, and online) with circulation-staff service attitudes or those of patrons using the systems. references 1. laurence miller, "changing patterns of circulation services in university libraries" (ph.d. dissertation, florida state university, 1971), p.iii. 2. ibid., p.149. 3. robert oram, "circulation," in allen kent and harold lancour, eds., encyclopedia of library and information science, v.5 (new york: marcel dekker, 1971), p.1. 4. william h. scholz, "computer-based circulation systems: a current review and evaluation," library technology reports 13:237 (may 1977). 5. robert oram, "circulation," p.2. 6. james robert martin, "automation and the service environment of the circulation manager" (ph.d.
dissertation, florida state university, 1980), p.22. statistics on headings in the marc file sally h. mccallum and james l. godwin: network development office, library of congress, washington, d.c. in designing an automated system, it is important to understand the characteristics of the data that will reside in the system. work is under way in the network development office of the library of congress (lc) that focuses on the design requirements of a nationwide authority file. in support of this work, statistics relating to headings that appear on the bibliographic records in the lc marc ii files were gathered. these statistics provide information on characteristics of headings and on the expected sizes and growth rates of various subsets of authority files. this information will assist in making decisions concerning the contents of authority files for different types of headings and the frequency of update required for the various file subsets. the national commission on libraries and information science supported this work. use of these statistics to assist in system design is largely system-dependent; however, some general implications are given in the last section of this paper. in general, counts were made of the number of bibliographic records, headings that appear in those records, and distinct headings that appear on the records. the statistics were broken down by year, by type of heading, and by file. in this paper, distinct headings are those left in a file after removal of duplicates. distinctness will not be used to imply that a heading appears only once in a source bibliographic file, although distinct headings may in fact have only a single occurrence. thus, a file of records containing the distinct headings from a set of bibliographic records is equivalent in size to a marc authority file of the headings in those bibliographic records. methodology these statistics were derived from four marc ii bibliographic record files maintained internally at lc: books, serials, maps, and films. the files contain updated versions of all marc records that have been distributed by lc on the books, serials, maps, and films tapes from 1969 through october 1979, and a few records that were then in the process of distribution. the files do not contain cip records. a total of 1,336,182 bibliographic records were processed, including 1,134,069 from the books file, 90,174 from the serials file, 60,758 from the maps file, and 51,176 from the films file. a file of special records, called access point (ap) records, was created that contains one record for the contents of each occurrence of the following fields in the bibliographic records:
type of heading       heading fields
personal name         100, 700, 400, 800, 600
corporate name        110, 710, 410, 810, 610
conference name       111, 711, 411, 811, 611
topical subject       650
geographic subject    651
uniform title         130, 730, 830, 630
only the 6xx subject fields that contained lc subject headings (i.e., second indicator = 0) were selected as ap records. the main entry data string was substituted for the pronoun in the series (4xx) fields that contained pronouns. the ap records also contained information from the bibliographic records that assisted in making the counts, such as the date of entry of the record on the file, the identity of the type of bibliographic file, and the language of the bibliographic record. a third file was derived from the ap file that contained a normalized character string for each ap record heading.
these normalized ap records were used to produce the counts of distinct headings by clustering like data strings. normalization included conversion of all characters to uppercase, and masking of diacritics, marks of punctuation, and other characters that do not determine the distinctness of a heading, but would interfere with machin~ determination of uniqueness. the subhelds included in the normalized string, hence used for all heading comparisons, are given below. only use-dependent subfields, such as the relator subfield, and those that belonged to title clusters in author/title headings were excluded. examples of the ap file field contents and the normalized forms are: ap field contents: chuang-tzu chuang-tzu [blaeu,joan] 1596-1673 blaeu, joan. 1596-1673 blaeu,joan, 1596-1673 byron, george gordon noel byron, baron, 1788-1824 byron, george gordon noel byron, baron, 1788-1824 byron, george gordon noel byron, baron, 1788-1824 byron, george gordon noel byron, baron, 1788.1824 communications 195 normalized forms: chuang tzu blaeu joan 1596 1673 byron george gordon noel byron baron 17881824 distinct headings for this study were determined by comparing on the following subfields: type of heading personal name corporate name sub fields a, b,c,d a, b, k, f, p, s, g conference name a, q, e topical subject a, b, x, y, z geographic subject a, b, x, y, z all occurrences of repeating subfields were included. the relator data of subfields were dropped from personal and corporate name headings as were the title subfields in author/title headings. a separate study will examine the occurrence of author/title headings. approximately 8 percent of the name headings in the files carry title subfields: 6 percent are series and 2 percent are author/title subjects or added entries. two types of distinct heading counts were generated for topical and geographic subject headings. one takes account only of main terms, the a and b subfields, excluding all subject subdivisions. the other compared the complete heading strings, including subject subdivisions. characteristics of the files the four bibliographic files from which the statistics were derived were begun in different years and are of unequal size. table 1 presents the number of bibliographic records added to each of the marc files by the year that the record was first entered into the file. the records added in the first months of 1979 have been eliminated from tables 1-3, thus the total number of records under consideration is 1,210,809. in the combined file, the records for books dominate the contributions from other forms of materials, representing 85 percent of the combined file records. after the addition of the films and serials records in 1972 and 1973 the total number of records added each year leveled off to around 115,000 but jumped to an average of slightly more than 150,000 records per year following the ad196 journal of library automation vol. 14/3 september 1981 table 1. number of records added to each file by year year entered book serial map film total 1968 11,812 0 0 0 11,812 1969 43,874 0 1, 104 0 44,978 1970 86,004 0 3,467 0 89,978 1971 105,390 0 8,857 6,280 114 ,247 1972 73,437 0 4,665 6,280 84,382 1973 92,512 3,720 5,566 8,929 110,727 1974 99,004 10,682 6,246 8,457 124,389 1975 86,527 15,866 6,721 8,604 117,718 1976 120,106 19,098 6,876 5,432 151,512 1977 140,011 17,999 7,011 4 ,797 169,818 1978 169,044 12,643 5,584 4,464 191,735 total 1,027,721 80,008 56,117 46,963 1,210,809 table 2. 
numbers of headings and distinct name headings added to all files by year number of headin gs number of distinct headin gs year personal corporate conference personal corporate conference entered names names names names names. names 1968 14,526 3,138 155 12,620 2,139 143 1969 53, 134 21,206 1,027 39,184 9,364 909 1970 104,365 42 ,798 2,175 63,037 14,286 1,769 1971 129,617 57,496 2,742 64,029 15,216 2,158 1972 91,040 45,768 1,942 41,246 9,891 1,402 1973 118,188 57,847 2,625 48,703 12,653 1,862 1974 127,588 73,303 2,972 51,623 17,129 1,983 1975 113,622 76,417 2,519 50,291 18,135 1,742 1;}76 154 ,7 18 88,207 3,454 73,182 23,120 2,306 1977 182,860 87,985 3,487 89,353 23,906 2,333 1978 218,535 97,042 4,192 99,780 24,280 2,831 total 1,308, 193 651,207 27,290 633,048 170, 119 19,438 table 3. numbers of subject headings and distinct subject headings added to all files by year number of distinct headings number of headings first terms only full headings year topical geographic topical geographic topical geographic entered subjects subjects subjects subjects subjects subjects 1968 10,615 1,857 4,390 489 7,775 1,512 1969 45,161 9,047 8,104 1,980 23,617 5,426 1970 89,304 21,054 8,170 4,263 34 ,526 10,179 1971 115,220 31,278 6,853 5,417 36,689 12,862 1972 92,247 20,760 4,236 2,597 26,201 7,074 1973 121 , 161 27,890 4,460 3,105 33,061 9,819 1974 137,843 31,814 4,524 3,553 39,262 11 ,4 13 1975 130,980 30,650 4,203 3,417 40,129 11 ,818 1976 168,840 39,886 5, 125 4,142 55,468 15,472 1977 185,331 44,973 5,718 4,194 59,529 16,676 1978 222,565 49,923 7,151 4,034 69,856 17,855 t otal 1,319,267 309,132 62,934 37,191 426, 113 120, 106 clition of major non-english roman alphabet language records in 1976. the increase is noticeable primarily in the books and serials files since the maps file had been adding those languages since 1969 and only a limited number of non-english-language audiovisual materials are cataloged. the unusually large number of records added to the books file in 1971 resulted from a special project to add retrospective titles to the file. the large increase in books records in 1978 was due to the co marc project in which retrospective lc records that had been converted to machine-readable form by other libraries were contributed to the lc marc file. approximately 12,000 comarc records were added in 1977 and 28,000 in 1978. the fall in numbers of film records produced in 1976-1978 reflects a general fall in production of instructional films in the united states. counts of items cataloged that are compiled by lc processing services from catalogers' statistics sheets show that lc cataloged approximately 225,000 titles in 1978; thus, approximately 73 percent of lc cataloging is currently going into machinereadable form. the principal exclusions are records for most nonroman material (only nonroman records for maps have been transliterated and added since 1969) and a few records for music, sound recordings, incunabula, and microforms. the portion being put into machine-readable form should rise significantly as the romanized records for items in several nonroman alphabets are added in the next year. name headings table 2 presents the number of occurrences of name headings in the marc bibliographic files and the number of distinct name headings, both by type of heading and by year. the number of distinct headings that were new to the file in a year was determined by comparing the headings added in a given year against those added in all previous years. 
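the distinct-heading counts rest on two operations described above: normalizing each heading string and then checking whether the normalized string has already appeared in any earlier year. the sketch below reuses the simplified ap-record shape from the earlier sketch; the string cleanup is only an approximation, since the exact masking rules applied at lc are not fully specified here, and the subfield lists follow the comparison table given in the text.

```python
import string
import unicodedata
from collections import defaultdict

# subfields used for heading comparison, following the table above
COMPARE_SUBFIELDS = {
    "personal":   set("abcd"),
    "corporate":  set("abkfpsg"),
    "conference": set("aqe"),
    "topical":    set("abxyz"),
    "geographic": set("abxyz"),
}

def normalize(ap_record):
    """build a normalized comparison string: keep only the comparison subfields,
    convert to uppercase, and mask diacritics and punctuation (an approximation
    of the masking described in the text)."""
    keep = COMPARE_SUBFIELDS[ap_record["type"]]
    text = " ".join(value for code, value in ap_record["subfields"] if code in keep)
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))      # drop diacritics
    text = "".join(" " if ch in string.punctuation else ch for ch in text)  # mask punctuation
    return " ".join(text.upper().split())

def new_headings_by_year(ap_records):
    """count headings that are new to the file in each year, i.e. normalized strings
    not seen in any earlier year -- the comparison described above."""
    by_year = defaultdict(set)
    for ap in ap_records:
        by_year[ap["year_entered"]].add((ap["type"], normalize(ap)))
    seen, new_counts = set(), {}
    for year in sorted(by_year):
        new_counts[year] = len(by_year[year] - seen)
        seen |= by_year[year]
    return new_counts

example = {"type": "personal", "subfields": [("a", "Blaeu, Joan,"), ("d", "1596-1673")]}
print(normalize(example))   # BLAEU JOAN 1596 1673, matching the normalized form shown above
```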
it is not surprising to find that 66 percent of name-heading occurrences are personal names, 33 percent are corporate, and only 1.4 percent are conference. the figures shift when considering the distinct names, where 77 percent are percommunications 197 sonal and only 21 percent are corporate. looking at ~he total figures in table 2, while 1 ,308,193 of the headings that appeared on the records were personal names, only 633,048 or 48 percent of these were distinct. of the rest, 52 percent were duplicates of the distinct headings. similarly, 26 percent of corporate names were distinct, with 74 percent being duplicates; and 71 percent of conference names were distinct, with only 29 percent being duplicates. in 1968, 87 percent and 68 percent of personal and corporate names, respectively, were distinct, i.e., 13 percent and 32 percent "had been used previously" when they appeared on a bibliographic record during the year. as the base file of names grows, the percentage of names appearing on new records but which "had been used previously" rises, to 60 percent and 77 percent in 1974. while the figures reported in table 2 indicate that the percentage of headings used that were repeats fell slightly again in 1977 (51 percent and 73 percent), this is probably due to the influx of new names with the addition of new languages in 1976-77. additional statistics gathered on english-language items show the percentage of repeating headings becoming steady after 1974. subject headings statistics concerning distinct topical and geographical subject headings were collected for main terms, excluding subdivisions, and for full subject heading strings. table 3 gives the numbers of headings and the numbers of distinct headings of each type found in the marc file. looking at the total figures, only 4.8 percent of topical first terms are distinct, the rest are duplicates. this indicates an average occurrence of 20.8 times for each first term. slightly more, 12 percent, of the geographic first terms are distinct. when the full headings with topical, period, form, and geographic subdivisions are considered, the percentage of headings that are distinct rises to 32.3 percent for topical subjects and 38.8 percent for geographic subjects. thus, 67.3 percent of topical and 61.2 percent of geographic are duplicates of existing headings. in the yearly figures, sub198 journal of library automation vol. 14/3 september 1981 ject headings show the same tendency as name headings in that the percentages of headings that appear on new records but which "had been previously used" rises as the stock of headings increases and then levels off. subjects were also affected by the addition of other roman alphabet languages in 1976-77 but not to a very large degree. for all access points, name headings and full string subject headings, name headings account for 55 percent of the headings that occur in the bibliographic records, with only 45 percent attributable to topical and geographical headings. it should be noted that 12 percent of the name headings that appear on the bibliographic records are names used as subjects. frequencies of occurrence counts were also made of the frequency with which name headings occurred in the bibliographic files. table 4 summarizes the frequency data: 66 percent of distinct personal names, 62 percent of distinct corporate names, and 84 percent of distinct conference names occur only once in the files. 
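the distinct-versus-duplicate percentages quoted in this section follow directly from the table 2 totals; a few lines of arithmetic reproduce them, along with the average number of occurrences per distinct heading.

```python
# figures as given in the table 2 totals above
totals = {
    "personal":   {"occurrences": 1_308_193, "distinct": 633_048},
    "corporate":  {"occurrences":   651_207, "distinct": 170_119},
    "conference": {"occurrences":    27_290, "distinct":  19_438},
}

for kind, t in totals.items():
    distinct_share = t["distinct"] / t["occurrences"]
    avg_occurrences = t["occurrences"] / t["distinct"]
    print(f"{kind:11s} distinct {distinct_share:5.1%}, duplicates {1 - distinct_share:5.1%}, "
          f"about {avg_occurrences:.1f} occurrences per distinct heading")
```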
the percent of corporate names with single occurrences is surprisingly close to that for personal; however, the percent of names having multiple occurrences falls more slowly for corporate than for personal names. while 5.47 percent of corporate names occur ten or more times, only 1.92 percent of personal names occur ten or more times. the figures for personal names roughly correspond to those obtained by william potter from a sample taken from the main catalog at the university of illinois at urbana-champaign. that study showed 63.5 percent of personal names occurred onlyonce. 1 the number of occurrences of different types of headings are compared in figure 1. the bars show the numbers of personal, corporate, conference, topical, and geographic headings that appear in the bibliographic files. the shaded areas represent the number of headings that are distinct, thus the upper part of each bar represents additional occurrences of the headings from the shaded area. for personal, corporate, and conference headings a further distinction is made between distinct headings that occur only once, the crosshatched area, and those that have multiple occurrences. thus the multiple occurrences of corporate names may be seen to come from a small table 4. frequency of occurrence of name headings in all files distinct distinct distinct number of personal names corporate names conference names occurrences number percent number percent number percent 1 456,328 65.65 116,250 62.02 18,02 1 83.90 2 119,68 1 17.22 30,185 16.10 2,049 9.54 3 46,247 6.65 11,563 6.17 587 2.73 4 23,951 3.45 6,814 3.64 289 1.35 5 13,820 1.99 4,109 2.19 163 .76 6 8,790 1.26 2,958 1.58 98 .46 7 5,827 .84 2,175 1.16 56 .26 8 4,056 .58 1,673 .89 48 .22 9 2,998 .43 1,395 .74 36 . 17 10 2,153 .31 10 ,037 .55 18 .08 11-13 4,116 .59 2,180 1.16 44 .20 14-20 3,748 .54 2,632 1.40 41 .19 2150 2,678 .39 2,901 1.55 23 .11 51-100 448 .06 936 .50 4 .02 101-200 149 .02 374 .20 2 .01 201-300 47 .01 109 .06 1 .00 301400 19 .00 46 .02 0 .00 401-500 11 .00 21 .01 0 .00 5011000 5 .00 53 .03 0 .00 1001 + 2 .00 18 .01 0 .00 total 695,074 99.99 187,429 99.98 21,480 100.00 number of distinct corporate headings, as was indicated by the slow decrease of the multiple-heading occurrence rate (i.e., a small group of corporate names have a very large number of occurrences). file growth as a bibliographic file grows and the stock of names and subjects that are contained in the associated authority file increases, the number of new-to-the-file 1400 1200 1000 "' <:> 800 z i5 : .. 0 a: w 600 id ::;: "' z 400 200 1,444,726 personal names corporate names communications 199 headings that are required for the new bibliographic records would be expected to fall. figure 2 illustrates that tendency and shows that there is a leveling off of the number of new-to-the-file headings per new bibliographic record after the bibliographic file reaches a certain size. for example, after approximately 700,000 bibliographic records are in the file, for every additional 100 bibliographic records approximately 298 name and subject headings 30,417 conference names 1.468,804 topical subjects geographic subjects d distinct headings distinct headings that occur -only once fig. 1. number of headin gs by type. 200 journal of library automation vol. 
14/3 september 1981 will be assigned, and, of these, approximately 53 will be new personal names, 14 new corporate names, 2 new conference names, 35 new topical subjects, and 10 new geographic subjects; the remaining 184 headings used will already be established in the authority file. thus after a certain bibliographic file size is reached, the growth of the authority file is approximately a linear function of the growth of the bibliographic file. implications the reoccurrence frequency of headings in a bibliographic file is often cited as a factor in designing bibliographic and authority-file configurations. discussion 1.2 ii i 0 0 .9 a: 0 u w a: .8 ~ ~ .7 z 5 :.\ " .6 ~ z 0 .5 a: w "' ~ . 4 z .3 centers on the necessity of carrying authority records for headings that occur only once in a bibliographic file . with reference to the name-heading data in table 4 and figure 1, carrying authority records only for headings that occur more than once could 'potentially reduce the size of the authority file from that indicated by the whole shaded area (including shaded and crosshatched) to the plain shaded area, i.e., from 903,983 records to 310,123, a 66 percent decrease. controlling multiple occurrences of a heading is, however, only one role of the authority record. more important perhaps is the control of cross-references connected with the heading. preliminary work with a • persona l names ---9 top~cal su8jects ... corporate names 2~ ----------~----~---------& geographi ca l subj ects 'y con ference n ames » ~~~~r=~~~~~==~==~==~~~==~==~==~==~-100 200 300 400 500 600 700 800 900 1000 11 00 1200 1300 number of bibliographic records cthousands) fig. 2 . n umber of n ew headings p er r eco rd for all files. random sample of personal names in the lc file indicates that less than 17 percent of personal names require cross-references. thus the personal name headings that occur only once but would require authority records because of cross-references could be less than 17 percent. the frequency data combined with reference structure data could have a significant impact on design. out of a total of 695,074 personal names in the authority files associated with the marc bibliographic files examined here, 456, 328, or 66 percent, occur only once. of these, fewer than 77,575 would be expected to have cross-references, thus the nameauthority file for personal names could be reduced in size from 695,074 records to 316,321, a 55 percent decrease. if separate authority records are a system requirement, the occurrence figures might then be useful for defining configurations that employ machine-generated provisional records for single-occurrence headings that do not have reference structures or that simplify in other ways the treatment of these headings. these figures may also be useful in making decisions on the addition of retrospective authority records to the automated files. reference 1. william gray potter, "when names collide: conflict in the catalog and aacr2," library resources & technical services 24:7 (winter 1980). rlin and oclc as reference tools douglas jones: university of arizona, tucson. the central reference department (social science, humanities, and fine arts) and the science-engineering reference department at the university of arizona library are currently evaluating the oclc and rlin systems as reference tools, to see if their use can significantly improve the effectiveness and efficiency of providing reference service. 
a significant number of the questions received by our librarians, and presumably by librarians elsewhere, incommunications 201 volve incomplete or inaccurately cited references to monographs, conference proceedings, government documents, technical reports, and monographic serials. if by using a bibliographic utility a librarian can identify or verify an item not found in printed sources, then effectiveness has been improved. once a complete and accurate description of the item is found, it is a relatively simple task to determine whether or not the library has the item, and if not, to request it through interlibrary loan. additionally, if the efficiency of the librarian can be improved by reducing the amount of time required to verify or identify a requested item, then the patron, the library, and, in our case, the taxpayer, have been better served. the promise of nearimmediate response from a computer via an online interactive terminal system is clearly beguiling when compared to the relatively time-consuming searching required with printed sources, which frequently provide only a limited number of access points and often become available weeks, months, or even years after the items they list. we realize, of course, that the promise of instantaneous electronic information retrieval is limited by a va):'iety of factors, and presently we view access to rlin and oclc as potentially powerful adjuncts tonot replacements for-printed reference sources. given that rlin and oclc have databases and software geared to known-item searches for catalog card production, our evaluation attempts to document their usefulness in reference service. a preliminary study conducted during the spring semester of 1980-81 indicated that approximately 50 percent of the questionable citations requiring further bibliographic verification could be identified on oclc or rlin. the time required was typically five minutes or less. successful verification using printed indexes to identify the same items ranged from 20 percent in the central reference department to 50 percent in science-engineering. time required per item averaged approximately fifteen minutes. based on our findings, we plan a revised and more thorough test during the fall semester of 1981-82, which will include an assessment of the enhancements to the hutchinson this study focuses on the adoption and use of wireless technology by medium-sized academic libraries, based on responses from eighty-eight institutions. results indicate that wireless networks are already available in many medium-sized academic libraries and that respondents from these institutions feel this technology is beneficial. w ireless networking offers a way to meet the needs of an increasingly mobile, tech-savvy student population. while many research libraries offer wireless access to their patrons, academic libraries serving smaller populations must heavily weigh both the potential benefits and disadvantages of this new technology. will wireless networks become essential components of the modern academic library, or is this new technology just a passing fad? prompted by plans to implement a wireless network at the houston cole library (hcl) (jacksonville state university’s [jsu’s] library), which serves a student enrollment close to ten thousand, this study was conducted to gather information about whether libraries similar in size and mission to hcl have adopted wireless technology. 
the study also sought to find out what, if any, problems other libraries have encountered with wireless networks and how successful they have perceived those networks to be. other questions addressed include level of technical support offered, planning, type of equipment used to access the network, and patron-use levels. � review of literature a review of the literature on wireless networks revealed a number of articles on wireless networks and checkout programs for laptop computers at large research institutions. seventy percent of major research libraries surveyed by kwon and soules in 2003 offered some degree of wireless access to their networks.1 no articles, however, specifically addressed the use of wireless networks in medium-sized academic libraries. many articles can also be found on wireless-network use in medical libraries and other institutions. library instruction using wireless classrooms and laptops has been another subject of inquiry as well. breeding wrote that there are a number of successful uses for wireless technology in libraries, and a wireless local area network (wlan) can be a natural extension of existing networks. he added that since it is sometimes difficult to install wiring in library buildings, wireless is more cost effective.2 a yearly survey conducted by the campus computing project found that the number of schools planning for and deploying wireless networks rose dramatically from 2002 to 2003. “for example, the portion of campuses reporting strategic plans for wireless networks rose to 45.5 percent in fall 2003, up from 34.7 percent in 2002 and 24.3 percent in 2001.”3 the use of wireless access in academia is expected to keep growing. according to a summary of a study conducted by the educause center for applied research (ecar), the higher-education community will keep investing in the technology infrastructure, and institutions will continue to refine and update networks. the move toward wireless access “represents a user-centered shift, providing students and faculty with greater access than ever before.”4 in an article on ubiquitous computing, drew provides a straightforward look at how wlans work, security issues, planning, and the uses and ramifications of wireless technology in libraries. he suggests, “perhaps one of the most important reasons for implementing wireless networking across an entire campus or in a library is the highly mobile lifestyle of students and faculty.” the use of wireless will only increase with the advent of new portable devices, he added. wireless networking is the best and least expensive way for students, faculty, and staff to take their office with them wherever they go.5 the circulation of laptop computers is a frequent topic in the available literature. the 2003 study by kwon and soules primarily focused on laptop-lending services in academic-research libraries. fifty percent of the institutions that responded to their survey provided laptops for checkout. the majority indicated moderate-to-high use of laptop services. positive user response and improved “public reputation, image, and relations” were the greatest advantages reported with laptop circulation. the major disadvantages associated with these services were related to labor and cost.6 a study of laptop checkout service at the mildred f. sawyer library at suffolk university in boston revealed that laptop usage was popular during the fall semester of 1999. students checked out the computers to work on group projects. 
a laptop area was set aside on one library floor to provide wired internet access for eight users. however, students wanted to use the laptops anywhere, not one designated place. the wired laptop areas were not popular, dugan wrote, adding that “few students used the wired area and the wires were repeatedly stolen or intentionally broken.” an interim phase involved providing wireless network cards for checkout wireless networks in medium-sized academic libraries: a national survey paula barnett-ellis and laurie charnigo paula barnett-ellis (pbarnett@jsucc.jsu.edu) is health and sciences librarian, and laurie charnigo (charnigo@jsucc .jsu.edu) is education librarian at houston cole library, jacksonville state university, alabama. wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 13 14 information technology and libraries | march 2005 to encourage patrons to use their own laptops, and, when a wireless network was put into place in the fall of 2000, demand exceeded the number of available laptops for checkout.7 � method a survey (see appendix) was designed to find out how many libraries similar in size and mission to hcl have adopted wireless networks, the experiences they have encountered in offering wireless access, and, most importantly, whether they felt the investment in wireless technology has been worth the effort.8 the national center for education statistic’s academic library peer comparison tool, a database composed of statistical information on libraries throughout the united states, was used to select institutions for this study. a search on this database retrieved eighty-eight academic libraries that met two criteria: full-time enrollments of between five thousand and ten thousand, and classification by the carnegie classification of higher education as master’s colleges and universities i.9 the survey was administered to those thought most likely to be responsible for systems in the library; they were selected from staff listings on library web sites (library systems administrator, information tech-nology [it] staff). if such a person could not be identified, the survey was sent to the head of library systems or to the library director. the survey was divided into the following sections: implementation of wireless network, planning and installation stages, user services, technical problems, and benefits specific to use of network. surveys were mailed out in march 2004. an internet address was provided in the cover letter if participants wished to take the survey online rather than return it by mail. an e-mail reminder with a link to the online survey was sent out three weeks after the initial survey was mailed. all letters and e-mails were personalized, and a self-addressed stamped envelope and a ballpoint pen with the jsu logo were included with the mail surveys. in the e-mail reminder, the authors offered to share the results of the project with anyone who was interested, and received several enthusiastic responses. � results a total of fifty-three completed surveys were returned, resulting in a response rate of 60 percent. the overwhelming majority (85 percent) responded that their library offered wireless-network access. even if the thirty-five surveys that were not returned had reported that wireless networks were not available, more than 50 percent would still have offered wireless networks. survey results also pointed to the newness of the technology. 
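the statement about non-respondents is simple arithmetic: even if every unreturned survey were counted as a "no," the libraries known to offer wireless access would still be a majority of all eighty-eight institutions surveyed. a short check:

```python
surveyed, returned = 88, 53
offering = round(0.85 * returned)   # roughly 45 libraries reported offering wireless access

print(f"response rate: {returned / surveyed:.0%}")                     # about 60%
print(f"known wireless share of all 88: {offering / surveyed:.0%}")    # about 51%, still a majority
```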
only four of the fifty-three institutions have had wireless networks for more than three years. the majority (73 percent) has implemented wireless networks just within the last two years. when asked to identify the major reasons for offering wireless networks to their patrons, the three responses most chosen were: (1) to provide greater access to users; (2) the flexibility of a network unfettered by the limitations of tedious wiring; and (3) to keep up with technological innovation (see table 1). least significant factors in the decision to implement wireless networks were cost; use by library faculty and staff; to aid in bibliographic instruction; and use for carrying out technical services (taking inventory). somewhat to the authors’ surprise, wireless use in bibliographic instruction was not high on the list of reasons for installing a wireless network, identified by only 9 percent of respondents. the benefits of wireless for library instruction was stressed in the literature by mathias and heser and patton.10 in addition to obtaining an instrument for gauging how many libraries similar in scope and size to hcl have implemented wireless networks and why they chose to do so, questions on the survey were also designed to gather information on planning and implementation, user services, technical problems, and perceived benefits. � planning and implementation although tolson mentions that some schools have used committees composed of faculty, staff, and students to look into the adoption of wireless technology, responses from this survey indicated that the majority (60 percent) of the libraries did not form committees specifically for the planning of their wireless networks.11 in addition, 49 percent of the libraries took fewer than six months to plan for implementation of a network, 37 percent required six months to one year, and 15 percent reported more than one to two years. actual time spent on installation and configuration of wireless networks was relatively short, 98 percent indicating less than one year (see table 2 for specific times). one of the most important issues to consider when planning to implement a wireless network is extent of coverage—where wireless access will be available. survey responses revealed varying degrees of wireless coverage among institutions. twenty percent had campus-wide access, 55 percent had some level of coverage throughout the entire library, 37 percent provided a limited range of coverage outside the building, and 20 percent offered access only in certain areas within the library. according to a bulletin published by ecar, institutions vary in their approaches to networking depending on enrollment. smaller colleges and universities with fewer than ten thousand students are “more likely to implement campuswide wireless networks from the start. larger institutions are more likely to implement wireless technology in specific buildings, consistent with a desire to move forward at a modest pace, as resources and comfort with the technology grow.”12 questions on the survey also queried respondents about the popularity of spaces in the library where users access the library’s wireless network. answers revealed that the most popular areas for wireless access are study carrels, tables, and study rooms. nineteen percent indicated that accessing wireless networks in the stacks is popular. of particular concern to hcl, a thirteen-story building, was how the environment of the library would accommodate a wireless network. 
a thorough site survey is important to locate the best spots within the library to install access points and to determine whether there are architectural barriers in the building that might interfere with access. the majority of survey respondents indicated that the site survey conducted in their library for a wireless network was carried out by their academic institution’s it staff (59 percent). while library staff conducted 35 percent of site surveys, only 17 percent were conducted by outside companies. � user services an issue to be addressed by libraries deciding to go wireless is whether laptop computers should also be provided for checkout in the library. after all, it might be hard to justify the usefulness of a wireless network if users do not have access to laptops or other hardware with wireless capabilities. while one individual reported working at a “laptop university” in which campuswide wireless networking exists and all students are required to own laptops, not all college students will have that luxury. in order to provide more equal access to students, checking out laptops has become an increasingly common service in academic libraries. seventy percent of this survey’s respondents whose institutions offered wireless access also made laptops available for checkout. comments made throughout the survey seemed to imply that while checking out laptops to patrons is an invaluable complement to offering wireless access, librarians should be prepared for a myriad of hassles that accompany laptop checkout. wear and tear of laptops, massive battery use, cost of laptops, and maintenance were some of the biggest problems reported. one participant, whose institution decided to stop offering laptops for checkout to patrons in the library, wrote, “it required too much staff time to maintain and we decided the money was better spent elsewhere. the college now encourages students to purchase a laptop [instead of] a full-sized pc.” one participant worried that the rising use of laptops in his library would lead to the obsolescence of its more than one hundred wired desktops, writing, “our desktops are very popular and we think having them is one of the reasons our gate count has increased in recent years. what happens when everyone has a laptop?” the number of laptops checked out in the libraries varied. the majority of libraries had purchased between one and thirty laptops available for checkout (see table 3). three institutions had more than forty-one laptops available for checkout. one library could boast that it had sixty laptops available for checkout with twelve pagers to notify students waiting in line to use laptops. when asked about the use of laptops in libraries, 46 percent table 1. 
main reasons for implementing a wireless network in absolute numbers and percentages reasons for implementing total number of percent of responses a wireless network responses out of total number provide greater access to users 36 67 flexibility (no wires, ease in setting up) 29 54 to keep up with or provide technological innovation 28 52 campuswide initiative 21 39 requests expressed by users 16 30 provide greater online access due to shortage of computers-per-user in the library 15 28 other 7 13 offer network access outside the library building 6 11 aid in bibliographic instruction 5 9 for use by library faculty and staff 5 9 low cost 5 9 to carry out technical services (such as inventory) 4 7 wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 15 16 information technology and libraries | march 2005 observed moderate use, while 32 percent reported heavy use of laptops. only 3 percent indicated that they hardly ever noticed use of laptops in the library. for those students who chose to bring their own laptop to access the library’s wireless network, half of the institutions surveyed required students to purchase their own network-interface cards for their laptops, while 19 percent allowed students to check them out from the library. in addition to laptops, personal digital assistants, (pdas) were listed by 37 percent of respondents as devices that may access wireless networks. one librarian indicated that cell phones could access the wireless network in his library. fiftysix percent of respondents indicated that users are able to print to a central printer in the library from their wireless device. an important consideration for implementing a wireless network is how users will authenticate. authentication protocol is defined by the microsoft encyclopedia of networking as “any protocol used for validating the identity of a user to determine whether to grant the user access to resources over a network.”13 authentication methods listed by the institutions surveyed varied greatly and the authors could not identify all of them. methods mentioned were lightweight directory access protocol (ldap), virtual private network (vpn), and media access control (mac) addresses, bluesocket, remote authentication dial in user service (radius), pluggable graphical identification and authentication (pgina), protected extensive authentication protocol (peap), and e-mail logins. out of the thirty-nine responses to this question, seven individuals indicated that they do not require any type of authentication at the present. although some individuals noted that they are planning to enable some type of authentication in the future, one participant suggested that there were ethical issues involved in requiring users to authenticate. this person argued that “anonymous access to information is valued” and praised his institution’s current policy of allowing “anyone who can find the network” to use it. a concern about offering wireless network access in the library is how library staff will be prepared to handle the flood of technical questions that are likely to ensue. the level of technical support offered to users varied among the institutions surveyed. more than half of the respondents indicated that users receive help specifically from it staff or from the campus computer center. thirtynine percent of users received help from the reference desk, while 19 percent received help from circulation staff. 
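returning for a moment to the authentication methods listed above, filtering on media access control (mac) addresses is the simplest to illustrate. the fragment below is a conceptual sketch only, not the configuration syntax of any particular access point, controller, or product, and the addresses shown are made up.

```python
# conceptual illustration of mac-address filtering; real deployments configure this
# on the access point or wireless controller, not in application code.

ALLOWED_MACS = {
    "00:1a:2b:3c:4d:5e",
    "00:1a:2b:3c:4d:5f",
}

def may_associate(client_mac):
    """grant network access only to clients whose hardware address is registered."""
    return client_mac.lower() in ALLOWED_MACS

print(may_associate("00:1A:2B:3C:4D:5E"))   # True  -- registered device
print(may_associate("66:77:88:99:aa:bb"))   # False -- unknown device is refused
```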
thirty-three percent of the responding institutions offered technical help from a web site, while 7 percent indicated that they did not offer any type of technical support to users. technical problems the technical problems most often encountered with wireless networks centered on architectural barriers that cause black-outs or slow-spots where wireless access fails. this confirms the importance of carrying out thorough site surtable 2. total length of time taken to completely configure and install the wireless network time to install and total number of percent of responses configure wireless network responses out of total number less than one month 12 28 one to two months 11 26 more than two months to four months 10 23 more than four months to six months 4 9 more than six months to one year 5 12 more than one year 1 2 table 3. total number of laptops available for checkout in the library total laptops total number of percent of responses available for checkout responses out of total number one to five 8 26 six to ten 5 16 eleven to fiften 1 3 sixteen to twenty 5 16 twenty-one to thirty 8 26 thirty-one to forty 1 3 more than forty 3 10 veys and testing prior to installation of access points. site surveys may be carried out by companies specially equipped and trained to determine where access points should be installed, the most appropriate type of antennae (directional or omnidirectional), and how many access points are needed to provide the greatest amount of coverage. configuration of the network was the second most highly reported problem associated with installing wireless networks, seeming to suggest the need for librarians to coordinate their efforts and rely on the knowledge provided by the it coordinator (or similar type of personnel) within their institution. lack of technical support available to users, slow speed, and authentication were also indicated as technical problems most encountered (see table 4). integrating the wireless network with the existing wired network was the least-mentioned problem associated with wireless networks. although security problems, particularly concerning wired equivalency protocol (wep) vulnerabilities, have been pointed out as one of the major drawbacks of a wireless network, the majority of users had not as yet experienced security problems. although one participant wrote, “don’t be too casual about the security risks,” another individual wrote, “talk to your networking department,” as many of them are overly worried about security. perceived benefits respondents reported that the number-one benefit of offering wireless access was user satisfaction. giving patrons the ability to use their laptops anywhere in the library and do multiple tasks from one machine is simply becoming what more and more users expect. the secondlargest benefit revolved around flexibility and ease of use due to the lack of wires. thirty-five percent indicated that allowing students to roam the stacks while accessing the network was a significant benefit. although a few studies have suggested the promise of wireless networks for aiding bibliographic instruction, only 9 percent of respondents indicated this as a benefit of wireless technology. use of wireless technology for instruction, it might be recalled, was not a significant factor noted by respondents in the decision to implement a wireless network. likewise, use of this type of network to carry out technical services (such as inventory) was also low on the scale of benefits. 
seventy-three percent of users claimed that wireless networks have thus far been worth the cost-benefit ratio. while 70 percent indicated moderate to heavy use of the wireless network, 27 percent reported low usage. when asked what advice they would give to others considering adopting wireless networks in their libraries, the overwhelming majority of responses were positive, recommending that hcl take the plunge. as one individual wrote, “offer it and they will come. it has really increased the usage of our library.” other individuals noted that it is simply necessary to offer wireless access to keep up with technological innovation, and that students expect it. the most significant warning, however, revolved around checkout and maintenance of laptops, which, from the results of this survey, seems be both a big advantage and a headache. several individuals echoed the importance of doing site surveys to test bandwidth limitations and access. one particularly energized participant, using multiple exclamations for emphasis, shared a plethora of advice. “throttle connection speeds! allow only http access! block ports and unnecessary protocols! secure your network and disallow unauthenticated users! use access control lists! establish policies that describe wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 17 table 4. technical problems encountered problems total number of percent of responses encountered responses out of total number architectural barriers 15 28 configuration problems 12 22 not enough technical help available to users when needed 10 19 slow speed 10 19 authentication problems 10 19 blackouts 6 11 problems installing drivers 6 11 security problems 6 11 difficulty signing on 6 11 problems with operating systems 5 9 other 3 6 problems integrating the wireless network with an existing wired network 2 4 18 information technology and libraries | march 2005 [wireless fidelity] wi-fi risks and liabilities on your part!” useful advice on wireless-access implementation gleaned from this survey fell under the following categories: � be aware of slower speed � create a policy and guide for users � do it because more users are going wireless, it is necessary to keep up with technological innovation, and because students love it � provide plenty of access points � install access points in appropriate places � ensure continuous connectivity by allowing overlap between access points � purchase battery chargers and heavy-duty laptops with extended warranties � get support from it staff for planning and maintenance � offering wireless will increase library usage � perform or have an expert perform a careful site survey and do lots of testing to locate dead or slow spots in the library due to architectural barriers � enable some type of authorization � be aware of security concerns � although the majority of participants’ networks (70 percent) support 802.11b (which allows for throughput up to 11 megabits per second), a few participants suggest using the 802.11g standard (up to 54 megabits per second) because it is “the fastest” and “backwards compatible to 802.11b” � conclusion though it is a relatively new technology, this study found that a surprisingly large number of medium-sized academic libraries are already offering wireless access. not only are they offering wireless access, but they are also providing patrons with laptops for checkout in the library. 
although actual use of the network by patrons was not determined through survey responses (as individuals were only asked about their observations of network use), the comments and answers were overwhelmingly positive and enthusiastic about this new technology. problems that have been encountered with wireless networks largely revolve around configuration, slow speed, and laptop checkout. although much of the literature focuses on security issues that accompany wireless networking, few individuals reported problems with security. college and university students, like the rest of society, are becoming increasingly mobile. more often, they want access to library networks and the internet wherever they happen to be studying or working on group projects, not merely in computer labs or designated study areas. the majority of the libraries in this study are accommodating these students’ needs by offering wireless access. according to breeding, wireless networking is a rapidly growing niche in the networking world, and mobile computer users will become a larger and larger part of any library’s clientele.14 to encourage patrons to continue visiting them, academic libraries, large and small, should attempt to meet the demand for wireless access if at all possible. references and notes 1. myoung-ja lee kwon and aline soules, laptop computer services: spec kit 275 (washington, d.c.: association of research libraries office of leadership and management services, 2003), 11. 2. marshall breeding, “the benefits of wireless technologies,” information today 19, no. 3 (mar. 2002): 42–43. 3. kenneth c. green, “the campus computing project.” accessed mar. 3, 2004, www.campuscomputing.net/. 4. educause center for applied research, “respondent summary: wireless networking in higher education in the u.s. and canada.” accessed dec. 4, 2003, www.educause.edu/ ir/library/pdf/ecar_so/ers/ers0202/ekf0202.pdf. 5. wilfred drew, “wireless networks: new meaning to ubiquitous computing,” journal of academic librarianship 29, no. 2 (mar. 2003): 102–106. 6. kwon and soules, laptop computer services, 11, 15–17. 7. robert e. dugan, “managing laptops and the wireless networks at the mildred f. sawyer library,” journal of academic librarianship 27, no. 4 (jul. 2001): 295–98. 8. questions on the survey did not distinguish as to whether wireless network installations were initiated by it or library personnel. 9. national center for education statistics, “compare academic libraries.” accessed mar. 10, 2004, http://nces.ed.gov/ surveys/libraries/academicpeer/. 10. molly susan mathias and steven heser, “mobilize your instruction program with wireless technology,” computers in libraries 22, no.3 (mar. 2002): 24–30; janice k. patton, “wireless computing in the library: a successful model at st. louis community college,” community & junior-college libraries 10, no. 3 (mar. 2001): 11–16. 11. stephanie diane tolson, “wireless laptops and local area networks.” accessed dec. 11, 2003, www.thejournal.com/ magazine/vault/articleprintversion.cfm?aid=3536. 12. raymond boggs and paul arabasz, “research bulletin: the move to wireless networking in higher education.” accessed dec. 4, 2003, www.educause.edu/ir/library/pdf/erb0207.pdf. 13. mitch tulloch, microsoft encyclopedia of networking (redmond, wash.: microsoft pr., 2002), 122. 14. marshall breeding, “a hard look at wireless networks,” library journal 127, no. 12 (summer 2002): 14–17. 1. has a wireless network been implemented in your library? __yes __no 2. 
if your library has not adopted wireless networking, are you currently planning or seriously considering it for the near future? __yes (please skip to question 4) __no (please fill out questions 2 and 3 only) 3. what are your primary concerns about implementing a wireless network? check all that apply. __the technology is still new __unsure of its benefits __no need for one __questions regarding security __cost __would not be able to provide technical support that might be needed __funds must primarily support other types of technology at the moment __have not noticed many users with laptops in the library __slow speed of wireless networks __other 4. how long has a wireless network been implemented in your library? __fewer than 6 months __6 months to 1 year __more than 1 to 2 years __more than 2 to 3 years __more than 3 years 5. what were the main reasons for implementing a wireless network? check all that apply. __provide greater access to users __campuswide initiative __offer network access outside the library building __provide greater online access due to shortage of computers per user in the library __flexibility (no wires, ease in setting up) __requests expressed by users __low cost __to keep up with or provide technological innovation __to carry out technical services (such as inventory) __aid in bibliographic instruction __for use by library faculty and staff __other 6. please describe the coverage of your network. check all that apply. __campuswide __library building and limited range outside the library building __inside the library (all areas) __select areas within the library 7. what areas of the library are most popularly used for access to the wireless network? check all that apply. __reference and computer media center areas __in the stacks __librarians and staff offices __carrels, tables, reading or study rooms __area outside the library building 8. please list standards your wireless network supports. check all that apply. __802.11b __802.11a __802.11g __bluetooth __other planning and installation 1. was a committee established to plan the implementation and service of the wireless network? __yes __no 2. how long did it take to plan for implementation of the wireless network? __fewer than 6 months __6 months to 1 year __more than 1 to 2 years __more than 2 years 3. how long did it take to install and configure the network? __less than a month __1 to 2 months __more than 2 to 4 months __more than 4 to 6 months __more than 6 months to 1 year __more than 1 year 4. who performed the site survey? check all that apply. __an outside company or contractor appendix. survey: implementation of wireless networks wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 19 20 information technology and libraries | march 2005 __institution’s own information technology coordinator or computer staff __library staff with technical expertise __no site survey was conducted 5. if the site surveyor was an outside company or contractor, please list their company name and whether you would recommend them. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ user services 1. how are users authenticated? 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 2. does the library check out laptops to users (for either wired or wireless use)? __yes __no 3. if laptops are available for checkout, do they have wireless capability? __yes __no 4. how many laptops do you have for checkout? __one to five __six to ten __eleven to fifteen __sixteen to twenty __twenty-one to thirty __thirty-one to forty __more than forty 5. how would you describe use of laptops in your library on the average day? __heavy—very noticeable use of laptops __moderate use of laptops __low use of laptops __not sure __hardly even notice laptops are used 6. how do users obtain wireless cards for the network? check all that apply. __check out from library __purchase from library __purchase from the campus computer center __must purchase on their own 7. if the library checks out wireless cards, how many were purchased for checkout? __one to five __six to ten __eleven to fifteen __sixteen to twenty __twenty-one to twenty-five __twenty-six to thirty __more than thirty 8. what type of technical support does the library provide to users? check all that apply. __help from reference or help desk __help from the information technology staff or campus computer center __circulation staff __other library staff __from a web site __no technical help is provided to users 9. has the library created a policy for the use of wireless networks? __yes __no 10. are users able to print from the wireless network in the library? __yes __no 11. which of the following may access the wireless network? check all that apply. __laptops __desktop computers __pdas __cell phones __other technical problems 1. what technical problems have you or your users encountered? check all that apply. __blackouts __architectural barriers __slow speed __problems integrating the wireless network with an existing wired network __configuration problems __security problems __authentication problems __problems with operating systems __difficulty signing on __not enough technical help available to users when needed __problems installing drivers __other 2. have you experienced security problems with the network? check all that apply. __have not experienced any security problems __problems with unauthorized people accessing the internet through the wireless network __problems with restricted parts of the network being accessed by unauthorized users __other 3. how were security problems resolved? benefits of use of network 1. what have been the biggest benefits of wireless technology? check all that apply. __user satisfaction __increased access to the internet and online sources __flexibility and ease due to lack of wires __has improved technical services (use for library functions) __has aided in bibliographic instruction __provides access beyond the library building __allows students to roam the stacks while accessing the network __other 2. how would you describe current usage of the network? __heavy __moderate __low 3. in your opinion, has this technology been worth the benefit-cost ratio thus far? __yes __no __not sure 4. what advice would you give to librarians considering this technology? (editorial continued from page 3) design and implementation of complex systems to serve our users. writing about that should not be solitary either. i hope to publish think-pieces from leaders in our field. 
i hope to publish more articles on the management of information technologies. i hope to increase the number of manuscripts that provide retrospectives. libraries have always been users of information technologies, often early adopters of leading-edge technologies that later become commonplace. we should, upon occasion, remember and reflect upon our development as an information-technology profession. i hope to work with the editorial board, the lita publications committee, and the lita board to find a way, and soon, to facilitate the electronic publication of articles without endangering—but in fact enhancing—the absolutely essential financial contribution that the journal provides to the association. in short, i want to make ital a destination journal of excellence for both readers and authors, and in doing so reaffirm the importance of lita as a professional division of ala. to accomplish my goals, i need more than an excellent editorial board, more than first-class referees to provide quality control, and more than the support of the lita officers. i need all lita members to be prospective authors, prospective referees, and prospective literary agents acting on behalf of our profession to continue the almost forty-year tradition begun by fred kilgour and his colleagues, who were our predecessors in volume 1, number 1, march 1966, of our journal. reference 1. walt crawford, first have something to say: writing for the library profession (chicago: ala, 2003). wireless networks in medium-sized academic libraries | barnett-ellis and charnigo 21 lib-mocs-kmc364-20140103103252 scope : a cost analysis of an automated serials record system 129 michael e. d. koenig, alexander c. finlay, joann g. cushman : technical information department, pfizer inc. , groton, conn., and james m. detmer: detmer systems co., new canaan, conn. a computerized serials record and control system developed in 1968/69 for the technical information department of pfizer inc. is described and subjected to a cost analysis. this cost analysis is conducted in the context of an investment decision, using the concept of net present value, a method not previously used in library literature. the cost analysis reveals a positive net present value and a system life break-even requirement of seven years at a 10% cost of capital. this demonstrates that such an automated system can be economically justifiable in a library of relatively modest size ( approx. 1,100 serial and periodical titles). it may be that the break-even point in terms of collection size required for successful automation of serial records is smaller than has been assumed to date. introduction the field of librarianship has in general not been characterized by an abundance of cost analysis articles. this is by no means a novel observation ( 1,2,3). library automation has been no exception, despite its more quantitative aura. in particular there has been an almost complete lack of any analysis of the cost of an automated system as an investment decision. 130 journal of library automation vol. 4/3 september, 1971 the bulk of material that has been written regarding costs and cost analysis has concentrated upon costs per unit of productivity of a functioning system, or upon comparison of such costs among various systems ( 4,5,6) . though still perhaps underrepresented, there is a growing core of such articles. indeed, jacob's article on standardized costs ( 7) indicates that a certain level of maturity has been reached. 
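the net-present-value framing used in the scope analysis can be summarized in a few lines. the cash-flow figures below are hypothetical placeholders, chosen only so that the break-even point echoes the seven-year figure quoted in the abstract; the article's actual development costs and yearly savings are not reproduced here.

```python
# a minimal sketch of evaluating an automation project as an investment decision.

def npv(rate, cash_flows):
    """net present value of yearly cash flows, where cash_flows[0] is the year-0
    amount (here, the negative one-time development cost)."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

development_cost = -20_000     # hypothetical one-time investment in the system
annual_saving    =   4_300     # hypothetical net yearly saving over the manual routine
rate             =   0.10      # the 10% cost of capital used in the analysis

# shortest system life for which the investment pays for itself
for life in range(1, 16):
    flows = [development_cost] + [annual_saving] * life
    if npv(rate, flows) >= 0:
        print(f"positive net present value at a system life of {life} years")
        break
```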
the analysis of library automation in terms of its justifiability as an investment decision is not an appropriate area for benign neglect. librarians, whether they be special, academic, or public, typically must justify their budgets to some higher authority, and the decision to automate must almost invariably be an investment decision, requiring an expenditure of funds above the normal operating budget. if librarians hope to be successful in justifying their pleas for an investment in automation, an "investment in the library's future," they should be prepared to justify their requests in terms of what they represent: investment decisions. the cost analysis described below is an example of such an analysis. it is an after-the-fact analysis, but the principle remains the same.

methods and materials

the scope (systematic control of periodicals) system was implemented in 1968 by the technical information department of pfizer, inc., at the medical research laboratories in groton, connecticut. the system is not radically different from others described in the literature (8, 9, 10). it is reasonably sophisticated in its handling of such features as claiming, binding, and budgeting. the basic design element of the system is the computer generation each month of a deck of ibm cards corresponding to anticipated receipts for that month. as an item is received, the corresponding card is pulled from the anticipated deck and is used to inform the system of the receipt of the item. this "tub file" feature, first used by the university of california at san diego (11), is the major design difference between scope and the university of minnesota bio-medical library system described by grosch (12) and strom (13), with which scope seems most comparable in terms of system sophistication and capability.

system description

the system was originally written in fortran iv for an ibm 1800 computer with two tape drives. a total of twelve programs were written. two of these programs are quite large (the weekly update and the monthly generation program), comprising about 600 statements each; the remainder average 200 statements. since that time the programs have been revised to operate on an ibm 360/30 computer using two 2400 tape drives and two 2311 disk drives. several more programs have also been written. fortran iv was chosen as a program language to render the system relatively immune to hardware changes and has fully justified itself. a listing of programs follows.

program number | function | core requirements (bytes)
epc01 | weekly update | 17060
epc02 | monthly card deck generation | 15992
epc03 | vendor listing | 5648
epc04 | periodical title evaluation & budget listing | 6916
epc05 | holdings listing | 6992
epc06 | scope file print | 6852
epc07 | psn file swap (to reassign psn & realphabetize) | 7638
epc08 | daily receipt listing | 1768
epc09 | binding listing | 2480
epc10 | short title vs. full title thesaurus | 3876
epc11 | skeleton binding punch | 3024
epc12 | copy tape file | 2008
epc13 | general skeleton punch | 2920
epc14 | cross index punch | 3300
epc15 | receipt edit | 1444
epc16 | purchase order analysis | 2796
epc17 | discipline analysis | 3024

file design

scope maintains a magnetic tape file in which each periodical is recorded in sequence by its periodical sequence number (psn). appearing once in the file for every psn are records giving title, cross-reference, holdings, and journal control information, including, for instance, "separate index." records for one or more copies then follow this basic information.
each copy within a psn consists of records for all current expected receipts ( xrs ), binding units ( bus ) not yet complete, as well as a trailer ( tl) summary. a file print program is provided which enables the library staff to inspect every item of data in the file. "anticipated" deck scope generates monthly a deck of approximately 2,500 80-column hollerith cards to be used for posting periodicals as received. a card is made for each receipt expected within the succeeding five weeks. for all regular known publication schedules, these cards are complete as to volume, issue ( including separate index) and publication date. for irregular or unknown publication schedules, one or more incomplete cards are provided in the deck. upon receipt of an issue, the proper card is pulled from the "anticipated" deck, the actual date of receipt is punched and the card used to prepare the daily receipts listing. the card is also used to update the tape file on a weekly cycle. unexpected issues require that a card be prepared manually by the library staff. issues which are omitted by the publisher require that the card be returned to the system as a "throwback." if an issue is 132 journal of library automation vol. 4/3 september, 1971 unexpectedly divided into two or more parts, separate cards are manually prepared and the original card deleted. claims in order to issue claims on a current basis, the tape file is updated weekly with receipts. every receipt will find a copy of itself on the scope tape (generated when the "anticipated" deck was produced) and a received code (r) and the current date will be posted to the record. consequently, any item not marked received becomes a claim as soon as the "claim delay" period is exceeded. a card to be used for claiming will be punched on the weekly cycle first exceeding "lag" and "claim delay," and once again every four weeks thereafter until resolved either by receipt or transfer to the missing issue file. the "lag" is the period in weeks lapsing between formal date of publication and earliest anticipated date of receipt. the "claim delay" period is calculated as the weeks elapsing between earliest anticipated date of receipt and latest normal date of receipt. "lag" and "claim delay" may be modified for each publication based on experience. binding binding units are created within the scope file during the monthly generation run. a unit is punched when all the issues comprising it are received or claimed (that is, when none of them is yet to be anticipated). if the unit is complete (no claims) it will be dropped from the tape file at the time it is punched and will not be punched again. binding units are formed whenever a volume changes or whenever the "issues per bind" factor is satisfied. receipts having been accumulated in the file from week to week are dropped at the time of the monthly generation after being counted for binding. from the binding unit cards a listing is prepared that is used by the library staff to make up bundles of periodicals for the binders. the binding unit card accompanies the shipment and is used by the binder. it includes information on issues included, indexes, color of binding, etc. file maintenance in addition to receipts and "throwbacks" the weekly update procedure allows add, change, and delete transactions to affect the scope file on a record-for-record basis. 
such transactions are needed to handle new periodicals, additional copies, closed series, discontinued copies, name changes, publication schedule changes, revised costs, vendor changes, and the like. the update operation is ordered by psn, copy number, record type, and (for xrs) volume and issue, in that order. an entire publication schedule may be added to the file in such cases as when the schedule is known but highly irregular (frequency code 99). after the receipt cards are processed by the update each week, they are filed in the "manual receipt file" together with copies of claims sent to ....... scope: a cost analysis/koenig, et al. 133 vendors. as binding units are created, copies of binding cards are filed in the same file, and receipt cards representing binding cards are discarded, as are earlier binding cards. this manual file corresponding to 1,000 journals requires about 5,000 cards and occupies three card file drawers. it is filed by psn and is therefore in order alphabetically by journal title. discards and additions to the manual file are about equal and hence it does not increase substantially in size. it permits rapid manual examination of the current status of each periodical. holdings list a program is provided that lists the complete scope file showing full title and abbreviated holdings statement for each psn. in addition, any cross reference/ history data and any desired holdings detail will be printed. since the file maintenance process insures an accurately updated file, this listing may be run at any time to provide an accurate reflection of library holdings. periodical title evaluation (scrutiny) a program is provided that lists all copies in the scope file requiring annual review prior to renewal. this procedure is controlled by the "value code" assigned individually to each copy within a psn. in addition to full title and abbreviated holdings statement, the listing shows by whom abstracted, the discipline codes associated with the periodical, and the annual cost. given this information, library users are requested to vote for retention of items for the next year. those not receiving sufficient votes are not renewed. separate programs not part of the scope system are used to prepare vote cards and tabulate results. budget list the program that prepares the periodical title evaluation list can be used to prepare lists by "department charged," a convenient budgetary tool used each fall to plan purchases for the following year. the lists may, of course, be run at any time. vendor order list a program is provided to prepare from the scope file a listing of all non-terminated copies associated with each requested vendor. a threecharacter vendor abbreviation is used to control this process and is coded into each copy control record. in addition to the short title, the list gives vendor reference (his identifier for the periodical ), pfizer purchase order number and date, and the estimated annual cost. each different condition (form of publication, such as periodical, microfilm) is listed with the number of copies ordered. although prices are not firm at the time of ordering, this listing nevertheless provides the detail needed for purchasing documents. as price 134 journal of library automation vol. 4/3 september, 1971 change information is made available and updated into the fil e, the listing may be rerun for checking out final billings from the vendor. 
similar lists can be produced by purchase order number, a convenient tool for resolving those financial complexities which inevitably occur. discipline list this program is used to prepare lists by discipline/subject, as microbiology, immunology, etc., a useful tool for maintaining collection balance, and for assuaging patrons' fears that their disciplines may not be adequately represented. system capacity present counts indicate approximately 9,000 tape records in the system, representing approximately 1,100 journals. about 200 issues are posted weekly. there are no restrictions on future expansion of the system as presently implemented. method the method of cost analysis used was the "net present value method." perhaps the clearest most readily available description of this concept is to be found in chapters 19 and 20 of shillinglaw's cost accounting, analysis and control ( 14). briefly the idea is that of comparing a given investment decision with what might reasonably be expected from an alternative use of that same money for another investment. an investment is typically defined as "an expenditure of cash or its equivalent in one time period or periods in order to obtain a net inflow of cash or its equivalent in some other time period or periods." ( 14, p. 564). the librarian typically thinks of investing in automation now in order to make possible a lessened expenditure in the future-at least a lessened expenditure in comparison to what would be necessary to accomplish the same level of operations in a non-automated fashion. conceptually these are the same; investment now in order to reap some future benefit. future savings can be treated as a future cash inflow. the concept of net present value is rather simple; it consists of converting all present and expected future cash flows (or their equivalents) to a present value and examining that value in comparison to alternative uses for the resources invested. the process of conversion is that of relating time and money. time does of course influence the worth of money. a dollar a year from now is worth less than a dollar today, for the dollar today can be invested and a year from now it will be worth more than a dollar, or at least the mathematical expectation of its worth is more than a dollar. the question is at what rate future cash flows should be discounted. business firms typically use their "cost of capital" (the cost which the business must pay to obtain capital) as the discount rate. a business d~cision should yield a positive net present value when the appropriate future cash flows are discounted at the cost of capital. if not, the investscope: a cost analysis/koenig, et al. 135 ment is a losing proposition, and the business would have been better off by not obtaining the capital, or by investing it elsewhere. the calculation of an appropriate cost of capital is a complicated exercise involving such things as debt capital, equity capital, etc. the figure of 10% is often cited as a good rule of thumb; happily it is appropriate in the case at hand and is the one used here. to the obvious question "is there any relevance in this net-presentvalue/cost-of-capital idea to an academic or a public library which does not obtain its funds in the same way, or have any explicit cost of capital?" the response is "yes." if a decision to automate, when analyzed in this fashion in comparison with alternative methods, should result in a negative net present value, then that decision is demonstrably poor. 
for if the money invested in automation were instead invested in the market, it could supply the alternative system's future greater operating costs with money left over to utilize elsewhere. this latter course might not be an option in fact, but the mere presence of its theoretical preferability would cast doubt on the desirability of any decision to automate. conversely, a positive net present value would argue for the desirability of automation, regardless of the source of the funds. the cost analysis that follows is expressed in terms of set-up cost outlays (investment) and projected savings (cash inflow). the investment expenses are of course reasonably well documented. the operational savings are based on 18 months' successful experience with the system.

set-up costs (including 1968 and 1969 parallel running costs)
systems analysis and programming (fees paid to consultant): $10,450
keypunching: 2,000
conversion reprogramming (ibm 1800 to ibm 360/30): 500
computer time: 4,000
personnel, opportunity costs (asst. librarian $4,000; tech. info. mgr. $6,000): 10,000
total set-up costs: $26,950

yearly running costs
system maintenance (retainer to detmer systems): $500
computer time (full costing): 5,000
allowance for machine conversion (based on an expectation of conversion at 3-yr. intervals at a cost of $750 each time): 250
total: $5,750

operational savings, 1970, per year (in comparison with continued running of the previous manual system)
posting (based on a saving of 8 hours per week of clerical work): $1,400
claiming (based on a saving of 10% of an assistant librarian's time): 1,050
binding (based on elimination of approximately 450 hours of overtime, clerk and assistant librarian, and 150 hours regular time per year): 2,700
replacement costs (represents decreased replacement costs due to rapid binding and consequent lower loss rate): 400
production of holdings list (based on a savings of 50 hours per year of assistant librarian's time): 250
ordering/bookkeeping (based on a savings of 250 hours per year of assistant librarian's time): 1,250
total: $7,050

savings resulting from control of the collection not previously practicable (see discussion below)
space saving per year: $750
subscription saving per year: 2,000
incremental overhead saving per year: 1,500
total: $4,250

total yearly savings: $11,300
yearly running costs: $5,750
difference (realized savings): $5,500

results

the net present value at the end of 1970, based on 10% cost of capital and a 15-year life expectancy, follows. the present value of one unit one year ago is 1.1052 at 10% cost of capital (assuming for simplicity that the 1968-70 set-up prices were paid in a lump one year prior to the end of 1970); 7.7688 is the present value of an annuity of one unit per year for 15 years at 10% cost of capital.

net present value
1968-1970 set-up costs: ($26,950) x (1.1052) = -$29,785
yearly savings, commencing 1970: ($5,500) x (7.7688) = +$43,117
net present value = $13,332

these findings indicate that the crude payback period ≈ 4.9 years (commencing january 1971). the system life required to break even at 10% cost of capital = 7 years. another way of looking at the matter is to calculate the discounted rate of return, that is, the rate of discount at which the sum of the positive present values equals the sum of the negative present values. in this case, the discounted rate of return = 17%.
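to make the arithmetic above easier to follow, here is a minimal javascript sketch of the same net-present-value test. it assumes the continuous-compounding convention implied by the quoted factors (e^0.10 ≈ 1.1052 for the lump-sum set-up cost and (1 - e^(-0.10 x 15))/0.10 ≈ 7.7688 for fifteen years of savings); the variable names are illustrative, and the rounded totals it prints may differ slightly from the figures printed above.

// sketch: net-present-value test for the scope investment, illustrative figures only
const costOfCapital = 0.10;   // 10% discount rate used in the article
const lifeYears = 15;         // assumed system life
const setUpCost = 26950;      // 1968-70 set-up costs, treated as a lump sum paid one year back
const yearlySaving = 5500;    // realized yearly savings (total savings minus running costs)

// factor that carries a payment made one year ago forward to the present (about 1.1052)
const lumpSumFactor = Math.exp(costOfCapital);
// present value of a stream of one unit per year for lifeYears (about 7.7688 under this convention)
const annuityFactor = (1 - Math.exp(-costOfCapital * lifeYears)) / costOfCapital;

const netPresentValue = yearlySaving * annuityFactor - setUpCost * lumpSumFactor;
const crudePaybackYears = setUpCost / yearlySaving;   // about 4.9 years

console.log("lump-sum factor", lumpSumFactor.toFixed(4));
console.log("annuity factor", annuityFactor.toFixed(4));
console.log("net present value $" + Math.round(netPresentValue));
console.log("crude payback " + crudePaybackYears.toFixed(1) + " years");

a positive value at the 10% rate is what argues for the investment; the break-even system life is simply the shortest life for which this value stays positive.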
in other words, since the discounted rate of return ( 17%) is significantly above that available for alternative uses of the resources (say 10%), this is a reasonable candidate for investment. discussion the net present value method has two inputs in addition to the raw data. the first one, already discussed, is the cost of capital. most large businesses can supply such a figure, or at least inform the librarian or information manager what approximation is used by that company (though surprisingly many otherwise sophisticated businesses do not use this method ) . in an academic environment, advice can usually be obtained from someone in the economics department or in the business school. in any case, 10% is a good rule of thumb. the second input is the expected life span. this is not as crucial as one might suppose, for the farther distant the cash flow, the less its net present value. the net present value factor in this case for 15 years' life expectancy was 7.7688; for ten years it would have been 6.3213, for 20 years 8.6466-not a great difference. as is invariably the case, many of the effects of scope were difficult to quantify. the most difficult were those in the sections "savings resulting from control of the collection not previously praclkable." since the collection can now be easily analyzed and scrutinized with only a minimum expenditure of research staff time, the rate of growth of the collection has been considerably tamed, while maintaining customer satisfaction. prior to scope, new subscriptions had been added at the rate of about 90 a year. when scope was implemented, this fell to 10, and has now risen to approximately 30. during its first year of operation, scope apparently resulted in 80 fewer periodical subscriptions, the second year, 60 fewer. continuing this progression, 80, 60, 40, 20, 0, one would arrive at the conclusion that a long-range reduction in collection size of 200 subscriptions was achievable. to be conservative, the calculation has been based 138 ]ourool of library automation vol. 4/3 september, 1971 on an estimate of a reduction of 100 subscriptions/year. even this estimate represents a saving of over $4,000 per year. the resulting space savings were based on a cost of $10 per square feet per year (standard occupancy charges adjusted for stack use) and a ten-year cycle in stack space enlargement. this scrutiny might have been done manually at a justifiable cost, but it had not been done, and more importantly probably would not have been done. the operational savings may be open to some criticism because, as is probably obvious to an experienced serials record librarian, the previous manual system was not strikingly efficient. it can well be argued that the most efficient possible manual system rather than the previous system should have been the alternative against which scope was evaluated. from the point of view of the organization, however, the relevant comparison is to actuality, not to what is theoretically possible, but in generalizing the results this specificity must be borne in mind. somewhat mitigating this circumstance, however, is the fact that the running costs of scope are probably overestimated. 
the computer cost is based on full costing, inappropriately high for the following reasons: 1) it includes programming overhead, but since scope was programmed externally, the scope project is being doubly charged for its programming; 2) the same double charging applies to program maintenance; 3) the costing makes no distinction between high priority jobs, and relatively low priority jobs such as scope, and presumably low priority is less expensive. since the distortions in the two paragraphs above are difficult to estimate and since they are to a degree counterbalancing, they are simply noted rather than quantified. the yearly operational savings ( $7,050) still intuitively appear surprisingly high. one's initial reaction is that even with overhead included, this is not a great deal less than the yearly cost of one library assistant. in point of fact, one library assistant has been transferred from the library to the rapidly expanding computer based information section (computer based sdi and retrospective searching ), with no apparent deterioration of library services. the library is in fact handling a greater work load than previously, with one less person. this cannot be entirely attributed to scope, as some other rationalization of library operations has b een introduced, but it does indicate that the calculated savings are not a grossly distorted reflection of reality. conclusion as pointed out in the introduction, almost any significant attempt at library automation will require an investment decision. librarians should be prepared to make analyses of their proposals in terms of their justifiability as investment decisions, both for reasons of politics and for their own satisfaction and confidence. the net present value method is a powerful, convenient, and useful tool for such analyses. it is hoped that this scope: a cost analysis/koenig, et al. 139 article will serve as a reasonable case study for the application of this technique to the problems of library automation. an automated serial records system for a relatively modest ( 1,100 serial and periodical titles ) special library has run successfully and achieved its objectives for more than a year and a half. one of the major objectives was to produce a system that allowed clerical help to be substituted for a librarian's scarce and costly time, thus allowing more effective utilization of the professional librarian's skills. this objective has been met. furthermore, a complete turnover of the personnel interfacing with the system has been accomplished easily and painlessly. no small part of the credit goes to the originators who designed and documented the system for such turnover. jt is an old chestnut, but well worth repeating-"design the systems not for yourself, but for the person who will be chosen to replace you." the cost analysis of the operations of the system indicate that its design, implementation, and operation are economically justified, and that capital investment will be paid off in approximately seven years. (the crude payback period was less than five years. ) the major implication of this economic justification lies in the relatively modest size of the library's operation. it may well be that the break-even point in terms of collection size required for successful and cost-effective automation of serial records is smaller than has heretofore been assumed. references 1. 
dougherty, richard m.: "cost analysis studies in libraries: is there a basis for comparison," library resources & technical services, 13 (winter 1969), 136-141. 2. fasana, paul j.: "determining the cost of library automation," a. l. a. bulletin, 61 (june 1967 ) 656-661. 3 . griffin , hillis l.: "estimating data processing costs in libraries," college and research libraries, 25 (sept. 1964), 400-403, 431. 4. kilgour, frederick g.: "costs of library catalog cards produced by computer," journal of library automation, 1 (june 1968), 121-127. 5. chapin, richard e.; pretzer, dale h.: "comparative costs of converting shelf list records to machine readable form," journal of library automation, 1 (march 1968), 66-74. 6. black, donald v.: "creation of computer input in an expanded character set," ] ournal of library automation, 1 (june 1968), 110-120. 7. jacob, m. e. l.: "standardized costs for automated library systems," ] ournal of library automation, 3 (september 1970), 207-217. 8. lebowitz, abraham 1.: "the aec library serial record: a study in library mechanization," special libraries, 53 (march 1967), 149-153. 9. scoones, m.: "the mechanization of serial records with particular reference to subscription control," as lib proceedings, 19 (february 1967)' 45-62. 140 journal of library automation vol. 4/3 september, 1971 10. pizer, irwin h.; franz, donald r. ; brodman, estelle: "mechanization of library procedures in the medium-sized medical library: the serial record," medical library association bulletin, ll (july 1963 ), 313-338. 11. university of california, san diego, university library: report on serials computer project (la jolla, cal., university library, 1962). 12. grosch, audrey n.: university of minnesota bio-medical library serials control system. comprehensive report (minneapolis, university of minnesota libraries, 1968) 91 p. 13. strom, karen d.: "software design for bio-medical library serials control system." in american society for information service, annual meeting, 20-24 oct. 1968, proceedings, vol. 5. (new york, greenwood publishing corp. 1968), 267-275. 14. shillinglaw, gordon: cost accounting analysis and control (homewood, illinois, richard d . irwin inc. 1967) 913 p. assignfast: an autosuggest-based tool for fast subject assignment rick bennett, edward t. o’neill, and kerre kammerer information technology and libraries | march 2014 34 abstract subject assignment is really a three-phase task. the first phase is intellectual—reviewing the material and determining its topic. the second phase is more mechanical—identifying the correct subject heading(s). the final phase is retyping or cutting and pasting the heading(s) into the cataloging interface along with any diacritics, and potentially correcting formatting and subfield coding. if authority control is available in the interface, some of these tasks may be automated or partially automated. a cataloger with a reasonable knowledge of faceted application of subject terminology (fast)1,2 or even library of congress subject headings (lcsh)3 can quickly get to the proper heading but usually needs to confirm the final details—was it plural? am i thinking of an alternate form? is it inverted? etc. this often requires consulting the full authority file interface. assignfast is a web service that consolidates the entire second phase of the manual process of subject assignment for fast subjects into a single step based on autosuggest technology. 
background faceted application of subject terminology (fast) subject headings were derived from the library of congress subject headings (lcsh) with the goal of making the schema easier to understand, control, apply, and use while maintaining the rich vocabula ry of the source. the intent was to develop a simplified subject heading schema that could be assigned and used by nonprofessional cataloger or indexers. faceting makes the task of subject assignment easier. without the complex rules for combining the separate subdivisions to form an lcsh heading, only the selection of the proper heading is necessary. the now-familiar autosuggest4,5 technology is used in web search and other text entry applications to help the user enter data by displaying and allowing the selection of the desired text before typing is complete. this helps with error correction, spelling, and identification of commonly used terminology. prior discussions of autosuggest functionality in library systems have focused primarily on discovery rather than on cataloging.6-11 rick bennett (rick_bennett@oclc.org) is a consulting software engineer in oclc research , edward t. o’neill (oneill@oclc.org) is a senior research scientist at oclc research and project manager for fast, and kerre kammerer (kammerer@oclc.org) is a consulting software engineer in oclc research, dublin, ohio. http://www.oclc.org/research/activities/fast.html http://www.loc.gov/catdir/cpso/lcc.html mailto:rick_bennett@oclc.org mailto:oneill@oclc.org mailto:kammerer@oclc.org information technology and libraries | march 2014 35 the literature often uses synonyms for autosuggest, such as autocomplete or type-ahead. since assignfast can lead to terms that are not being typed , autosuggest seems most appropriate and will be used here. the assignfast web service combines the simplified subject choice capabilities of f ast with the text selection features of autosuggest technology to create an in -interface subject assignment tool. much of a full featured search interface for the fast authorities, such as searchfast ,12 can be integrated into the subject entry field of a cataloging interface. this eliminates the need to switch screens, cut and paste, and make control character changes that may differ between the authority search interface and the cataloging interface. as a web service, assignfast can be added to existing cataloging interfaces. in this paper, the actual operation of assignfast is described , followed by how the assignfast web service is connected to an interface, and finally by a description of the web service construction. assignfast operation an authority record contains the established heading, see headings, and control numbers that may be used for linking or other future reference. the relevant fields of the fast record for motion pictures are shown here: control number fst01027285 established heading motion pictures see cinema see feature films -history and criticism see films see movies see moving-pictures in fast, the facet of each heading is known. motion pictures is a topical heading. the see references are unauthorized forms of the established heading. if someone intended to enter cinema as a subject heading, they would be directed to use the established heading motion pictures. 
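for readers who prefer to see the record as data, the following is a minimal sketch of how the motion pictures entry above might be represented in javascript; the field names and the small lookup helper are illustrative assumptions, not the actual marc or assignfast serialization.

// sketch only: one fast authority entry, with field names chosen for illustration
const motionPictures = {
  controlNumber: "fst01027285",
  facet: "topical",
  establishedHeading: "motion pictures",
  seeReferences: [
    "cinema",
    "feature films--history and criticism",
    "films",
    "movies",
    "moving-pictures"
  ]
};

// a cataloger who types any see reference should be steered to the established heading
function resolveHeading(record, input) {
  const normalized = input.trim().toLowerCase();
  if (record.seeReferences.some(ref => ref.toLowerCase() === normalized)) {
    return record.establishedHeading;
  }
  return input; // already the established form, or not covered by this record
}

console.log(resolveHeading(motionPictures, "cinema")); // -> "motion pictures"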
for a typical workflow, the subject cataloger would need to leave the cataloging interface, search for “cinema” in an authority file interface, find that the established heading was motion pictures, and return to the cataloging interface to enter the established heading. the figure below shows the same process when assignfast is integrated into the cataloging interface. without leaving the cataloging interface, typing only “cine” shows both the see term that was initially intended and the established heading in a selection list. assignfast: an autosuggest-based tool |bennett, o’neill, and kammerer 36 figure 1. assignfast typical selection choices. selecting “cinema use motion pictures” enters the established term, and the entry process is complete for that subject. figure 2. assignfast selection result. the text above the entry box provides the fast id number and facet type. information technology and libraries | march 2014 37 as a web service, assignfast headings can be manipulated by the cataloging interface software after selection and before they are entered into the box. for example, one option available in the assignfast demo is marcbreaker format.13 marcbreaker combines marc field tagging and allows diacritics to be entered using only ascii characters. using marcbreaker output, assignfast returns the following for “ ”: =651 7$abrazil$zs{tilde}ao paulo$0(ocolc)fst01205761$2fast in this case, the output includes marc tagging of 651 (geographic), as well as subfie ld coding ($z) that identifies the city within brazil, that it’s a fast heading, and the fast control number. the information is available in the assignfast result to fill one or multiple input boxes and to reformat as needed for the particular cataloging interface. addition to web browser interfaces as a web service, assignfast could be added to any web-connected interface. a simple example is given here to add assignfast functionality to a web browser interface using javascript and jquery (http://jquery.com). these technologies are commonly used, and other implementation technologies would be similar. example files for this demo can be found on the oclc developers network under assignfast.14 the example uses the jquery.autocomplete function.15 first, the script packages jquery.js, jqueryui.js, and the style sheet jquery-ui.css are required. version 1.5.2 of jquery and version 1.8.7 for jquery-ui was used for this example, but other compatible versions should be fine. these are added to the html in the script and link tags. the second modification to the cataloging interface is to surround the existing subject search input box with a set of div tags.
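the markup itself did not survive in this copy, so the following is only a plausible reconstruction of what such a wrapper might look like; the id values existingBox and extraInformation are taken from the javascript shown next, while the element names, label, and class are assumptions.

<!-- sketch: the subject input box wrapped in div tags so the autosuggest script can attach to it -->
<div class="ui-widget">
  <label for="existingBox">subject heading:</label>
  <input id="existingBox" type="text" name="subject" />
  <div id="extraInformation"></div>
</div>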
the final modification is to add javascript to connect the assignfast web service to the search input box. this function should be called when the page is loaded:

function setupPage() {
  // connect the autosubject to the input areas
  jQuery('#existingBox').autocomplete({
    source: autoSubjectExample,
    minLength: 1,
    select: function(event, ui) {
      jQuery('#extraInformation').html("FAST ID " + ui.item.idroot +
        " Facet " + getTypeFromTag(ui.item.tag));
    } //end select
  }).data("autocomplete")._renderItem = function(ul, item) {
    formatSuggest(ul, item);
  };
} //end setupPage()

the source: autoSubjectExample setting tells the autocomplete function to get its data from the autoSubjectExample function, which in turn calls the assignfast web service; this function is in the assignFASTComplete.js file (a sketch of such a callback appears at the end of this section). in the select function, the extraInformation text is rewritten with additional information returned with the selected heading; in this case, the fast number and facet are displayed. the generic _renderItem of the jquery autocomplete function is overwritten by the formatSuggest function (also found in assignFASTComplete.js) to create a display that differentiates the see headings from the authorized headings returned in the search. the version used for this example shows

see heading use authorized heading

when a see heading is returned, or simply the authorized heading otherwise.

web service construction

the autosuggest service for a fast heading was constructed a little differently than the typical autosuggest. for a typical autosuggest of the term motion picture from the example given above, you would index just that term. as the term was typed, motion picture and other terms starting with the text entered so far would be shown until you resolved the desired heading. for example, typing in "mot" might give

motion pictures
motion picture music
employee motivation
diesel motor
mothers and daughters

for the typical autosuggest, the term indexed is the term displayed and is the term returned when selected. for assignfast, both the established and see references are indexed. however, when typing resolves a see heading, both the see heading and its established heading are displayed. only the established heading is selected, even if you are typing the see heading. for assignfast, the "mot" result now becomes

features (motion pictures) use feature films
motion pictures
motorcars (automobiles) use automobiles
motion picture music
background music for motion pictures use motion picture music
motion pictures for the hearing impaired use films for the hearing impaired
documentaries, motion picture use documentary films
mother of god use mary, blessed virgin, saint

the headings in assignfast are ranked by how often they are used in worldcat, so headings that are more common appear at the top. to place the established heading above the see heading when they are similar, the established heading is also ranked higher than the see heading for the same usage. assignfast can also be searched by facet, so if only topical or geographic headings are desired, only headings from those facets will be displayed. the web service uses a solr16 search engine running under tomcat.17 this provides full text search and many options for cleaning and manipulating the terms within the index.
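the autoSubjectExample callback itself ships in assignFASTComplete.js and is not reproduced in this excerpt, so the following is only a rough sketch of how such a jquery ui source callback might query the assignfast service; the request parameters follow the service description below (table 1), but the mapping code, field choices, and row limit are assumptions.

// sketch only: a source callback for jQuery UI autocomplete that queries AssignFAST over JSONP
function autoSubjectExample(request, response) {
  jQuery.ajax({
    url: "http://fast.oclc.org/searchfast/fastsuggest",
    dataType: "jsonp",            // the service accepts a jsonp callback parameter
    data: {
      query: request.term,        // whatever the cataloger has typed so far
      queryindex: "suggestall",   // search all facets
      queryreturn: "suggestall,idroot,auth,tag,type",
      suggest: "autosubject",
      rows: 20
    },
    success: function(data) {
      // map each returned document to the fields the select and render callbacks expect
      var items = jQuery.map(data.response.docs, function(doc) {
        return {
          label: doc.auth,        // text shown in the drop-down
          value: doc.auth,        // text placed in the input box on selection
          idroot: doc.idroot,
          tag: doc.tag,
          type: doc.type
        };
      });
      response(items);
    }
  });
}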
the particular option used for assignfast is the edgengramfilter.18 this option is used for autosuggest and has each word indexed one letter at a time, building to its entire length. the index of "cinema" would then contain "c," "ci," "cin," "cine," "cinem," and "cinema." solr handles utf-8 encoded unicode for both input and output. the assignfast indexes and queries are normalized using fast normalization19 to remove punctuation, diacritics, and capitalization. fast normalization is very similar to naco normalization, although in fast normalization the subfield indicator is replaced by a space and no commas are retained. assignfast is accessed using a rest request.20 rest requests consist of urls that can be invoked via either http post or get methods, either programmatically or via a web browser.

http://fast.oclc.org/searchfast/fastsuggest?&query=[query]&queryindex=[queryindex]&queryreturn=[queryreturn]&suggest=autosuggest&rows=[numrows]&callback=[callbackfunction]

where
query: the query to search.
queryindex: the index corresponding to the fast facet. these include suggestall (all facets), suggest00 (personal names), suggest10 (corporate names), suggest11 (events), suggest30 (uniform titles), suggest50 (topicals), suggest51 (geographic names), and suggest55 (form/genre).
queryreturn: the information requested, as a comma-separated list. these include idroot (fast number); auth (authorized heading, formatted for display with "--" as subfield separator); type (alt or auth, indicating whether the match on the queryindex was to an authorized or a see heading); tag (marc authority tag number for the heading: 100 = personal name, 150 = topical, etc.); raw (authorized heading with subfield indicators; blank if identical to auth, i.e., no subfields); breaker (authorized heading in marcbreaker format; blank if identical to raw, i.e., no diacritics); and indicator (indicator 1 from the authorized heading).
numrows: headings to return; maximum restricted to 20.
callback: the callback function name for jsonp.

table 1. assignfast web service results description.

example response:

http://fast.oclc.org/searchfast/fastsuggest?&query=hog&queryindex=suggestall&queryreturn=suggestall%2cidroot%2cauth%2ctag%2ctype%2craw%2cbreaker%2cindicator&suggest=autosubject&rows=3&callback=testcall

yields the following response:

testcall({
  "responseheader":{
    "status":0,
    "qtime":148,
    "params":{
      "json.wrf":"testcall",
      "fl":"suggestall,idroot,auth,tag,type,raw,breaker,indicator",
      "q":"suggestall:hog",
      "rows":"3"}},
  "response":{"numfound":1031,"start":0,"docs":[
    {
      "idroot":"fst01140419",
      "tag":150,
      "indicator":" ",
      "type":"alt",
      "auth":"swine",
      "raw":"",
      "breaker":"",
      "suggestall":["hogs"]},
    {
      "idroot":"fst01140470",
      "tag":150,
      "indicator":" ",
      "type":"alt",
      "auth":"swine--housing",
      "raw":"swine$xhousing",
      "breaker":"",
      "suggestall":["hog houses"]},
    {
      "idroot":"fst00061534",
      "tag":100,
      "indicator":"1",
      "type":"auth",
      "auth":"hogarth, william, 1697-1764",
      "raw":"hogarth, william,$d1697-1764",
      "breaker":"",
      "suggestall":["hogarth, william, 1697-1764"]}]
  }})

table 3. typical assignfast json data return.

the first response heading is the use for heading hogs, which has the authorized heading swine. the second is the use for heading hog houses, which has the authorized heading swine--housing.
this authorized heading is also given in its raw form, including the $x subfield separator, which is unnecessary for the first heading. the third response matches the authorized heading for hogarth, william, 1697-1764, which is also given in its raw form. the breaker (marcbreaker) format is only added if it differs from the raw form, which is only when diacritics are present.

conclusions

subject assignment is a combination of intellectual and manual tasks. the assignfast web service can be easily integrated into existing cataloging interfaces, greatly reducing the manual effort required for good subject data entry and increasing the cataloger's productivity.

references
1. lois mai chan and edward t. o'neill, fast: faceted application of subject terminology, principles and applications (santa barbara, ca: libraries unlimited, 2010), http://lu.com/showbook.cfm?isbn=9781591587224.
2. oclc research activities associated with fast are summarized at http://www.oclc.org/research/activities/fast.
3. lois m. chan, library of congress subject headings: principles and application (westport, ct: libraries unlimited, 2005).
4. "autocomplete," wikipedia, last modified on october 1, 2013, http://en.wikipedia.org/wiki/autocomplete.
5. tony russell-rose, "designing search: as-you-type suggestions," ux magazine, article no. 828, may 16, 2012, http://uxmag.com/articles/designing-search-as-you-type-suggestions.
6. david ward, jim hahn, and kirsten feist, "autocomplete as research tool: a study on providing search suggestions," information technology & libraries 31, no. 4 (december 2012): 6-19.
7. jon jermey, "automated indexing: feeding the autocomplete monster," indexer 28, no. 2 (june 2010): 74-75.
8. holger bast, christian w. mortensen, and ingmar weber, "output-sensitive autocompletion search," information retrieval 11 (august 2008): 269-286.
9. elías tzoc, "re-using today's metadata for tomorrow's research: five practical examples for enhancing access to digital collections," journal of electronic resources librarianship 23, no. 1 (january-march 2011).
10. holger bast and ingmar weber, "type less, find more: fast autocompletion search with a succinct index," sigir '06: proceedings of the 29th annual international acm sigir conference on research and development in information retrieval (new york: acm, 2006), 364-71.
11. demian katz, ralph levan, and ya'aqov ziso, "using authority data in vufind," code4lib journal 11 (june 2011).
12. edward t. o'neill, rick bennett, and kerre kammerer, "using authorities to improve subject searches," in "beyond libraries—subject metadata in the digital environment and semantic web," special issue, cataloging & classification quarterly 52, no. 1/2 (in press).
13. "marcmaker and marcbreaker user's manual," library of congress, network development and marc standards office, revised november 2007, http://www.loc.gov/marc/makrbrkr.html.
14. "oclc developers network—assignfast," submitted september 28, 2012, http://oclc.org/developer/services/assignfast [page not found].
15. "jquery autocomplete," accessed october 1, 2013, http://jqueryui.com/autocomplete.
16. "apache lucene—apache solr," accessed october 1, 2013, http://lucene.apache.org/solr.
17. "apache tomcat," accessed october 30, 2013, http://tomcat.apache.org.
18.
"solr wiki—analyzers tokenizers tokenfilters," last edited october 29, 2013, http://wiki.apache.org/solr/analyzerstokenizerstokenfilters.
19. thomas b. hickey, jenny toves, and edward t. o'neill, "naco normalization: a detailed examination of the authority file comparison rules," library resources & technical services 50, no. 3 (2006): 166-72.
20. "representational state transfer," wikipedia, last modified on october 21, 2013, http://en.wikipedia.org/wiki/representational_state_transfer.

opensearch and sru | levan 151

not all library content can be exposed as html pages for harvesting by search engines such as google and yahoo!. if a library instead exposes its content through a local search interface, that content can then be found by users of metasearch engines such as a9 and vivísimo. the functionality provided by the local search engine will affect the functionality of the metasearch engine and the findability of the library's content. this paper describes that situation and some emerging standards in the metasearch arena that choose different balance points between functionality and ease of implementation. editor's note: this article was submitted in honor of the fortieth anniversaries of lita and ital.

the content provider's dilemma

consider the increasingly common situation in which a library wants to expose its digital content to its users. suppose it knows that its users prefer search engines that search the contents of many sites simultaneously, rather than site-specific engines such as the one on the library's web site. in order to support the preferences of its users, this library must make its contents accessible to search engines of the first type. the easiest way to do this is for the library to convert its contents to html pages and let harvesting search engines such as google and yahoo! collect those pages and provide searching on them. however, a serious problem with harvesting search engines is that they place limits on how much data they will collect from any one site. google and yahoo! will not harvest a 3-million-record book catalog, even if the library can figure out how to turn the catalog entries into individual web pages. an alternative to exposing library content to harvesting search engines as html pages is to provide a local search interface and let a metasearch engine combine the results of searching the library's site with the results from searching many other sites simultaneously. users of metasearch engines get the same advantage that users of harvesting search engines get (i.e., the ability to search the contents of many sites simultaneously), plus those users get access to data that the harvesting search engines do not have. the issue for the library is determining how much functionality it must provide in its local search engine so that the metasearch engine can, in turn, provide acceptable functionality to its users. the amount of functionality that the library provides will determine which metasearch engines will be able to access the library's content.
metasearch engines, such as a9 and vivísimo, are search engines that take a user’s query, send it to other search engines, and integrate the responses.1 the level of integration usually depends on the metasearch engine’s ability to understand the responses it receives from the various search engines it has queried. if the response is html intended for display on a browser, then the metasearch engine developers have to write code to parse through the html looking for the content. in such a case, the perceived value of the content determines the level of effort that the metasearch engine developers put into the parsing task; low-value content will have a low priority for developer time and will either suffer from poor integration or be excluded. for metasearch engines to work, they need to know how to send a search to the local search engine and how to interpret the results. metasearch engines such as vivísimo and a9 have staffs of programmers who write code to translate the queries they get from users into queries that the local search engines can accept. metasearch engines also have to develop code to convert all the responses returned by the local search engines into some common format so that those results can be combined and displayed to the user. this is tedious work that is prone to breaking when a local search engine changes how it searches or how it returns its response. the job of the metasearch engine is made much simpler if the local search engine supports a standard search interface such as sru (search and retrieve url) or opensearch. ฀ what does a metasearch engine need in order to use a local search engine? the search process consists of two basic steps. first, the search is performed. second, records are retrieved. to do a search, the metasearch engine needs to know: 1. the location of the local search engine 2. the form of the queries that the local search engine expects 3. how to send the query to the local search engine to retrieve records, the metasearch engine needs to know: 4. how to find the records in the response 5. how to parse the records opensearch and sru: a continuum of searching ralph levan ralph levan (levan@oclc.org) is a research scientist at oclc online computer library center in dublin, ohio. 152 information technology and libraries | september 2006 ฀ four protocols this paper will discuss four search protocols: opensearch, opensearch 1.1, sru, and the metasearch xml gateway (mxg).2 opensearch was initially developed for the a9 metasearch engine. it provides a mechanism for content providers to notify a9 of their content. it also allows rss (really simple syndication) browsers to display the results of a search.3 opensearch 1.1 has just been released. it extends the original specification based on input from a number of organizations, microsoft being prominent among them. sru was developed by the z39.50 community.4 recognizing that their standard (now eighteen years old) needed updating, they simplified it and created a new web service based on an xml encoding carried over http. the mxg protocol is the product of the niso metasearch initiative, a committee of metasearch engine developers, content providers, and users.5 mxg uses sru as a starting place, but eases the requirement for support of a standard query grammar. ฀ functionality versus ease of implementation a library rarely has software developers. the library’s area of expertise is, first of all, the management of content and, secondarily, content creation. 
librarians use tools developed by other organizations to provide access to their content. these tools include the library’s opac, the software provided to search any licensed content, and the software necessary to build, maintain, and access local digital repositories. for a library, ease of adoption of a new search protocol is essential. if support for the search protocol is built into the library’s tools, then the library will use it. if a small piece of code can be written to convert the library’s existing tools to support the new protocol, the library may do that. similarly, the developers of the library’s tools will want to expend the minimum effort to support a new search protocol. the tool developer’s choice of search protocol to support will depend on the tension between the functionality needed and the level of effort that must be expended to provide and maintain it. if low functionality is acceptable, then a small development effort may be acceptable. high functionality will require a greater level of effort. the developers of the search protocols examined here recognize this tension and are modifying their protocols to make them easier to implement. the new opensearch 1.1 will make it easier for some local search-engine providers to implement by easing some of the functionality requirements of version 1.0. similarly, the niso metasearch committee has defined mxg, a variant of sru that eases some of the requirements of sru.6 ฀ search protocol basics once again, the five basic pieces of information that a metasearch engine needs in order to communicate effectively with a local search engine are: (1) local search engine location, (2) the query-grammar expected, (3) the request encoding, (4) the response encoding, and (5) the record encoding. the four protocols provide these pieces of information to one degree or another (see table 1). the four protocols expose a site’s searching functionality and return responses in a standard format. all of these protocols have some common properties. they expect that the content provider will have a description record that describes the search service. all of these services send searches via http as simple urls, and the responses are sent back as structured xml. to ease implementation, opensearch 1.1 allows the content provider to return html instead of xml. all four protocols use a description record to describe the local search engine. the opensearch protocols define what a description record looks like, but not how it is retrieved. the location of the description record is discovered by some means outside the protocol (a priori knowledge). the description record specifies the location of the local search engine. the sru protocols define what a description record looks like and specifies that it can be obtained from the local search engine. the location of the local search engine is provided by a means outside the protocol (a priori knowledge again). each protocol defines how to formulate the search url. opensearch does this by having the local search-engine provider supply a template of the url in the description record. sru does this by defining the url. opensearch and mxg do not define how to formulate the query. the metasearch engine can either pass the user’s query along to the local search engine unchanged or reformulate the query based on information about the local search engine’s query language that it has gotten by outside means (more a priori knowledge). 
in the first case, the metasearch engine has to hope that some magic will happen and the local search engine will do something useful with the query. in the latter case, the metasearch engine’s staff has to develop a query translator. sru specifies a standard query grammar: cql (common query language).7 this means that the metasearch engine only has to write one translator for all the sru local search engines in the world. but it also means that all the sru local search engines have to support the cql query grammar. since there are no local search engines that support cql as their native query grammar, the content provider is left with the task of translating cql queries into their native query grammar. the query translation task has moved from the metasearch engine to the content provider. opensearch and sru | levan 153 opensearch 1.0, mxg, and sru define the structure of the query response. in the case of opensearch, the response is returned as an rss message, with a couple of extra elements added. mxg and sru define an xml schema for their responses. opensearch 1.1 allows the local search engine to return the response as unstructured html. this moves the requirement of creating a standard response from the content provider and leaves the metasearch engine with the much tougher task of finding the content embedded in html. if the metasearch engine doesn’t write code to parse the response, then all it can do is display the response. it will not be able to combine the response from the local search engine with the responses from other engines. sru and mxg require that records be returned in xml and that the local search engine must specify the schema for those records in the response. this leaves the content provider with the task of formatting the records according to the schema of their choice, a task that the content provider is probably best able to do. in turn, the metasearch engine can convert the returned records into some common format so that the records from multiple local search engines can be combined into a single response. because the records are encoded in xml, it is assumed that standard xml formatting tools can be used for the conversion. opensearch does not define how records should be structured. the opensearch response has a place for the title of the record and a url that points to the record. the structure of the record is undefined. this leaves the metasearch engine with the task of parsing the record that is returned. again, the effort moves from the content provider to the metasearch engine. if the metasearch engine does not or cannot parse the records, then it can at least display the records in some context, but it cannot combine them with the records from another local search engine. ฀ conclusion these protocols sit on a spectrum of complexity, trading the content provider’s complexity for that of the search engine. however, with lessened complexity for the metasearch engine comes increased functionality for the user. metasearch engines have to choose what content providers they will search. those that provide a high level of functionality can be easily combined with their existing local search engines. content providers with a lower level of functionality will either need additional development by the metasearch engine or will not be searched. not all metasearch engines require the same level of functionality, nor will they be prepared to accept content with a low level of functionality. 
content providers, such as digital libraries and institutional repositories, will have to choose the functionality they need to support to reach the metasearch engines they desire.
references and notes
1. joe barker, "meta-search engines," in finding information on the internet: a tutorial (u.c. berkeley: teaching library internet workshops, aug. 23, 2005 [last update]), www.lib.berkeley.edu/teachinglib/guides/internet/metasearch.html (accessed may 8, 2006).
2. a9.com, "opensearch specification," http://opensearch.a9.com/spec/ (accessed may 8, 2006); a9.com, "opensearch 1.1," http://opensearch.a9.com/spec/1.1/ (accessed may 8, 2006).
3. mark pilgrim, "what is rss?" o'reilly xml.com, dec. 18, 2002, www.xml.com/pub/a/2002/12/18/dive-into-xml.html (accessed may 8, 2006).
4. the library of congress network development and marc standards office, "z39.50 maintenance agency page," www.loc.gov/z3950/agency/ (accessed may 8, 2006).
5. national information standards organization, "niso metasearch initiative," www.niso.org/committees/ms_initiative.html (accessed may 8, 2006).
6. niso metasearch initiative task group 3, "niso metasearch xml gateway implementors guide, version 0.2," may 16, 2005, [microsoft word document] www.lib.ncsu.edu/nisomi/images/0/06/niso_metasearch_initiative_xml_gateway_implementors_guide.doc (accessed may 8, 2006); the library of congress, "sru: search and retrieve via url; sru version 1.1, 13 february 2004," www.loc.gov/standards/sru/index.html (accessed may 8, 2006).
7. the library of congress, "common query language; cql version 1.1, 13 february 2004," [web page] www.loc.gov/standards/sru/cql/index.html (accessed may 8, 2006).

table 1. comparison of requirements of four metasearch protocols for effective communication with local search engines

protocol feature               opensearch 1.1   opensearch 1.0   mxg        sru
local search engine location   a priori         a priori         a priori   a priori
request encoding               defined          defined          defined    defined
response encoding              none             rss              xml        xml
record encoding                none             none             xml        xml
query grammar                  none             none             none       cql

we do not have an information-prone society. when faced with a problem or interest, i suggest, we are more prone to ask, "what do i have to do?" rather than, "what do i have to know?" part of this reaction is probably due to the fact that when we ask "what do i have to know?" we are faced with another problem in addition to the initial one, i.e., where to get the information. this added effort simply confirms in us our indifference to information, and we take our best shot at solving the problem through decision and action. i sometimes think we have made a virtue of this information incapacity by the way we laud decision making as an indicator of ability. if the foregoing examples are reasonably accurate, we are then faced with a situation in which information is fundamentally important to societal and individual wellbeing, but is not perceived to be so by people in the conduct of their daily affairs. computer-supported telecommunications systems can be the instrument for accelerating information control by a few (this has been much of the trend so far, as indicated by corporate, research, and technical use of these systems), or they can be used to build information confidence, use, and desire throughout society. this option, i suggest, is central to the significance of telecommunications systems for a democratic society.
if the latter option is to be obtained, i suggest that information will have to be packaged and targeted so well on people's everyday problems and interests that it will be easier and more productive to say "what do i have to know?" before saying "what do i have to do?" a basic approach to articulating an information service of this kind consists of the following steps:
1. determine and prioritize the individual and societal problems and interests of a given community.
2. ascertain the information parameters of those problems and interests.
3. locate and obtain the information necessary to address those problems and interests.
4. organize this information so as to optimally target the specified problem or interest and to be as easily retrievable as possible. this requires an understanding of the context in which the information is used so that it is optimally relevant, and an understanding of the language and problem articulation common to the individuals in the community in order to ensure rapid retrieval.
a lesson in interactive television programming: the home book club on qube w. theodore bolton: oclc, inc., columbus, ohio. on december 1, 1977, warner communications christened what has become the most publicized and talked about technological development in the field of cable television: qube, its two-way interactive cable system. publicity posters claimed that this would be "a day you'll tell your grandchildren about," and broadcasters added the word "interactive" to their cocktail-party vocabulary. academicians who ten years ago forecast a technological revolution initiated by the marriage of computer to cable television smugly grinned and saw their dreams turn into reality. response to qube, however, has been mixed. participatory television brings, to some, futuristic images of instant democracy; others warn of its potential demagogic power.1 regardless of your critical persuasion, there now exists what former cbs executive turned warner amex2 consultant mike dann calls "a whole new utility."3 this whole new utility, whether in the form of qube cable television, or some other combination of computer, cable television, telephone, and standard over-the-air broadcasting, will change the way we conduct our lives and interact with other people. the history of the home book club early in 1979, the oclc, inc., research staff appraised the nature and context of the qube facilities (located in columbus, ohio, only five miles away). discussions, which at times centered around far-fetched and lofty ideas, eventually led to realistic and inventive concepts that made use of qube's interactive technology. the most promising of these concepts was a book discussion program where the audience determined the content and direction of the discussion itself. hoping to take advantage of this new technology, and at the same time expand library services available to the general public, oclc proposed a book discussion program to qube. in a previously released statement, qube vice-president harlan kleiman had stated that the polling capabilities of the qube system should be treated like a "time bomb."4 yet oclc's proposal indicated an interest in exploring these very same devices. this factor, coupled with qube's "closed door" policy toward outside researchers and scholars, seemed to indicate that the home book club research proposal would be rejected.
but qube executives did the unexpected: they agreed to air six home book club programs, one each month. and so, on july 18, 1979, at 7 p.m., the home book club premiered. an interactive book discussion what makes qube unique is its two-way, or upstream, capability. the qube technology is made up of three complementary computers that are used for monitoring, tabulation, and billing purposes. each qube console in a viewer's home has thirty channels to choose from and five response buttons to press when answering questions posed to home viewers on qube programs. by monitoring and tabulating data that show which tv sets are on, which programs viewers are watching, and which response buttons they last touched, qube therefore has a virtually error-free system of audience research. this allows a staggering amount of audience data to be compiled, theoretically every six seconds. apart from the thirty-channel capability of standard television, community programs, and pay-per-view feature films, the most intriguing aspect of qube is its five response buttons. oclc felt that the use of these buttons should be emphasized and the concept of interaction should be fully incorporated into the home book club. at the beginning of each home book club program, home viewers were asked to select, from three alternatives, the opening topic of conversation about the book. after the home viewers had "touched in" their preference on one of the prespecified buttons, the qube polling computer tallied and displayed the results. once the book discussion was under way, the home viewers were given additional opportunities to "democratically" determine whether the panelists should continue in a particular topic area, or move on to new topic areas. if a controversial issue emerged within the course of a discussion, the home book club panelists were encouraged to spontaneously pose interactive questions to home viewers. this form of instantaneous polling was extended to telephone participants, who were also periodically incorporated into the book discussion. a sampling of these opinion-type questions included: from the wifey program, "should sandy have left norman?"; from the metropolitan life program, "is this book too subjective for non-new yorkers?"; from the eye of the needle program, "was the violence portrayed a necessary part of this book?"; from the world according to garp program, "was this a feminist novel?" toward the end of each one-hour home book club program the qube system broke new ground in interactive television history: home viewers selected, from five alternatives, the book to be discussed on next month's program. in addition, home viewers were able to request a copy of the book to be sent to their home at no charge from the public library of columbus and franklin county (plcfc). these two transactions took place with a mere touch of the prespecified button on the qube console. plcfc provided a major contribution to the home book club. once the qube computers had compiled the names and addresses of those viewers who requested next month's book (earlier, all home viewers had been told that their names would be entered in the qube computer if they responded to a book request), the qube computer printed the names on mailing labels. these labels were forwarded to the plcfc books-by-mail office, which then filled each request. the total time from "touch-in request" to "in-home mail delivery" was usually two to three days.
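at bottom, the touch-in mechanism described above is a tally over five buttons. the short python sketch below shows the kind of computation the qube polling computer performed when viewers chose an opening topic or next month's book; the button assignments and response counts are invented for illustration and are not data from the home book club.

    from collections import Counter

    # hypothetical button presses captured during one polling cycle,
    # one entry per responding household (buttons are numbered 1-5)
    responses = [1, 3, 3, 2, 5, 3, 1, 4, 3, 2, 3, 1]

    # the five alternatives offered on screen for this question
    choices = {
        1: "chesapeake",
        2: "the world according to garp",
        3: "eye of the needle",
        4: "wifey",
        5: "metropolitan life",
    }

    tally = Counter(responses)
    total = len(responses)

    # display the percentage for each alternative, as the polling computer did
    for button, title in choices.items():
        count = tally.get(button, 0)
        print(f"{title:30s} {count:3d} ({100 * count / total:.0f}%)")

    # majority rules: the winning title is discussed next month
    winner = choices[tally.most_common(1)[0][0]]
    print("next month's selection:", winner)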
indeed, a form of electronic catalog ordering actually took place each time the home book club program was cablecast in columbus. it should be noted that home book club viewers were also given the opportunity to order the alternative book choices. who watched the home book club? an additional use of qube's two-way capability was also incorporated into the first six home book club programs. prior to selecting and ordering the next month's books, home viewers were asked to respond to a series of demographic-type questions. from these questions, a profile of the typical home book club viewer was compiled for plcfc and qube management. this portion of the program also provided the oclc research department with data with which to explore the market-research potential of an interactive television system. from the beginning of the home book club research project, a few obvious limitations of interactive polling became apparent. first, not all home viewers made use of, or were willing to participate in, qube's interactive technology. response rates ranged from 20 to 85 percent, with an approximate mean rate of 55 percent. second, only one viewer in a multiple-person household could respond. third, it can be logically assumed that certain kinds of people will and did interact more often than others. taking these limitations into consideration, a few generalizations could still be made regarding the home book club audience. the demographic data traced over the first six programs showed the audience to be primarily composed of younger (below thirty-nine years of age), college-educated (65 percent had college or postgraduate degrees), middle- to upper-income (60 percent earning $25,000 or more per year) females (approximately 70 percent of the interacting audience). these figures should not surprise anyone who is either familiar with previous profiles of general library users or who may in passing conjure a guess as to what kind of person might be interested in viewing a televised interactive book discussion. a closer inspection of the instantaneous audience demographics, however, led to some disappointing implications. can a democratic television program survive? as was pointed out earlier, home viewers were permitted to select the next month's book at the conclusion of a program. this was strictly a democratic process where the majority ruled. the world according to garp, the premier home book club book, was followed by eye of the needle and wifey for programs two and three respectively. the qube computer indicated that each of these programs was viewed by approximately 175 households, or almost 420 individuals. in a competitive structure where there are twenty-nine television program alternatives from which a viewer can choose, qube, oclc, and the plcfc felt that a successful programming concept had been born. qube management enthusiastically reported that the home book club had achieved audience levels that at times rivaled their more extravagant and broad-based entertainment/interview program, "columbus alive." this enthusiasm was short-lived as audience-level figures from program four came in. at the end of program three (wifey), the audience selected james michener's weighty novel chesapeake for the next month's program. the respectable figure of approximately 375 viewers for wifey dwindled to slightly less than 210 viewers for chesapeake. and to make matters worse, the audience-level figures did not improve for programs five and six.
there are several alternative and sometimes complementary explanations for this substantial loss in audience. first, many viewers may not have been able to get through the some one thousand pages of "maryland's eastern shore" history in chesapeake, and thus chose not to participate in the home book club. second, the new fall syndicated programs offered at that time by local network affiliates may have led many viewers to choose alternative programming. additional hypotheses can also be gleaned from the interactive demographic data: whereas in programs one through three approximately 40 percent of the audience indicated their educational level to be either some college or below, only 20 percent of the chesapeake audience (program four) fell into this category. this statistic remained constant for programs five and six of the home book club. in the democratic television environment that the home book club provides, what happens to the minority interest group? could this democratic television system be systematically eliminating specific viewer types? it might be that the outvoted minority-group book reader can withstand being overruled just so many times before ceasing to participate. what recourse does this minority interest group have other than to be dominated by higher-educated viewers who heavily stuff the electronic ballot box in favor of their own book preferences? quite clearly the recourse for the minority interest group was to select a competing television program, as evidenced by the declining audience-level figures. the loss of these viewers becomes especially disheartening because this particular audience segment may represent a group of individuals who never before participated in a book discussion. the future of the home book club given the somewhat disappointing results of the home book club reported thus far, one would expect the program to be recorded in history as a noble, but unsuccessful, attempt at interactive television programming. the books-by-mail program did send out some 760 paperback books as a result of the home book club (a 79 percent overall increase), and twenty-six new library cards (not a prerequisite) were issued to home book club viewers. but the fact remains that a for-profit company such as warner amex most definitely cannot justify the continuation of a program with audience ratings as low as the home book club's. ... or can it? not only has the home book club been continued (it is now in its twentieth month), but a morning edition of the home book club premiered in june 1980. what explanations can account for this somewhat bewildering corporate behavior? on a very idealistic level, warner amex could be fulfilling its obligation to serve all facets of the columbus community. the home book club certainly offers a viewing alternative to an often neglected segment of the viewing population. oclc, inc., and public libraries throughout the united states applaud this kind of responsible programming. on a more practical level, there may be other strategies behind the renewal of the home book club contract. a 1978 study completed by the argus research corporation concluded that "no profits are expected from qube until the system is successfully replicated in cities other than columbus, and at considerably lower costs."5 to replicate the qube system, warner amex must expand its cable territory into new communities throughout the united states.
this can at times be a very difficult task. the right for a company such as warner amex to wire a local municipality to its qube system is determined by local government. normally, a city council reviews and contrasts alternative cable systems in terms of the services each system proposes in return for franchising rights. the final decision usually is based on costs, the programming made available, and, most importantly, the kind of community service the cable system proposes to extend to its viewers. one definition of extended community service might be a televised book discussion program that involves the local public libraries. the alluring notion of an interactive book discussion may even be more appealing to community-minded city council members. in fact, qube is currently using an edited composite tape of home book club highlights in its franchising efforts. the success of such efforts remains to be seen. whether warner amex's motives are community- or commercial-minded, the fact remains that other communities may have the opportunity to develop a program of this kind. since local governments can legally specify what services the cable company must provide, the inclusion of a televised book discussion program could become part of a contract fulfillment. advice for those interested in developing alternative television programs for special-interest groups: don't be caught napping when your national cable representatives come knocking on your city council door. as for the home book club, qube and the public library of columbus and franklin county are working at reestablishing a solid baseline audience. as is the case for any television program, promotion is a key ingredient for success. when viewers were asked where they first found out about the home book club, more than half indicated they obtained program information through the free qube program guide. approximately 15 percent heard from a friend and 12 percent found information at the public library. a coordinated promotional effort is highly recommended for a public-service program of this nature. the future of interactive television qube must be thought of as more than just a two-way television system. in fact, it is more than interactive television. qube is actually a computer hooked to a cable communication system. that cable communication system is a network providing a pathway for a wide variety of services from a central facility to home subscribers. in the future, not only will systems such as qube provide "local loop" communications for these services, but they undoubtedly will be interconnected by satellite with other similar systems throughout the country and indeed the world. the five buttons on the existing qube consoles are just the first evidence of the future possibilities of interactive broadband communications systems currently delivering television. because the early applications of cable were to provide entertainment television, and more often than not were provided by people in the television business, cable television is naturally oriented toward the entertainment business. but the future of these broadband communications systems is in interactive retrieval of information as much as it is in entertainment. this goes far beyond the simple polled system so frequently used in a two-way mechanism: the talk show host asks how many people have read a particular book, the audience responds, and the net result has no effect on the program itself.
it is also a lot more than interactive television: the host asks what you want to discuss, the audience says the plot of the book, and the answer has an effect on the outcome of the show. in fact, these broadband communications systems have the potential for placing at the fingertips of americans a vast storehouse of information services about, for example, the best auto routes to your favorite spots, baby care, banking, buying a house, dressmaking, good buys, hobbies, jobs, legal facts, properties for sale or rent, sports scores, technology, and wine. as qube expands into its qube iii system with more than a hundred channels of services, it will be technically positioned to support all aspects of this burgeoning information age.6 besides simple information retrieval, a qube subscriber will be able to conduct banking and shopping transactions, to provide information such as who is on what side of community issues, and also (incidentally) to watch television. if all of that does not seem like enough, remember that cable "is really a very large pipe through which any variety of electronic information can be pushed." passive home security, fire alarm, and energy management are also services either in existence or contemplated by a number of cable operators. for that matter, there is no reason to believe that computer processing services can't be made available to individual subscribers. a subscriber could call up the program to balance his checkbook, to perform his small-business payroll calculations, or to complete a statistical analysis of data for a school project. most people thought (as we initially did) that interactive cable (qube) means interactive television. but oclc's research has shown that interactive television programs:
1. serve as an initial introduction to naive audiences of what a truly interactive system is all about;
2. are difficult to implement;
3. really aren't democratic;
4. are basically polling devices.
it has been said that railroads went out of business because they insisted that they were in the railroad business and wouldn't admit that they were in the transportation business. if cable operators insist that they are in the television business, they may well miss the opportunities that are possible in the communications business or, in fact, in the information business. by the same token, if libraries miss the significance of what cable television is bringing to their business, their role in the community will be diminished and libraries may go the way of railroads. modern communications and computers offer an opportunity for libraries to become the information choice in their community. in the near future, applications such as the home book club may well be a way to provide increased accessibility of library services to library patrons, and to "condition" those patrons to the coming electronic nature of libraries. over the long term, libraries, if they have the courage and the foresight, can be the focus of the coming information and telecommunications revolution. the message is quite clear: opportunities abound.
references
1. john wicklein, "wired city, u.s.a.: the charms and dangers of two-way tv," atlantic monthly 243:35-42 (feb. 1979).
2. warner amex represents a newly formed corporation resulting from the merger of warner communications and american express.
3. jonathan black, "brave new world of television," new times 11:41 (24 july 1978).
4. ibid., p.49.
"warner cable's qube: exploring the outer reaches of two-way tv," broadcasting 95:28 (31 july 1978). 6. "two-way converters hot ticket at ncta exhibits," broadcasting 97:72 (26 may 1980). an informal survey of the cti computer backup system joseph covino and sheila intner: great neck library, great neck, new york. in order to help decide whether or not to purchase computer backup systems from computer translation, inc. (cti), * for use when the clsi libs 100 automated circulation system is not operating, great neck library conducted an informal survey of libraries using both systems . eleven institutions, including both public and academic libraries, responded to a brief questionnaire. they were asked what size cti system they had purchased and why, how easily it was installed, how well it performed, how it was maintained, and if clsi acknowledged that the addition of the backup did not affect their libs 100 maintenance agreements . before summarizing the responses, the structure of the two systems and how they interact should be outlined. clsi libs 100 the clsi automated circulation system consists of a stand -alone minicomputer console with local and/or remote terminals connected to it through individual ports by means of electrical and/or dedicated telephone line hookups. when it operates, the terminals are online and interactive with the database, which is stored on one or more multiplatter disc packs. cti backup the cti backup system is based on an apple ii microcomputer with two minidisc drives, which take 5 1/4-inch floppy discs, a tv monitor, and a switching system that can be connected to the libs 100 console or its terminals . the cti system can also be used alone. when the libs 100 is down (inoperative), the cti system is connected to a terminal, and data is recorded on its discs for later dumping (data entry) into the database via a port connection . it *cti is a profit-making company wholly owned by brigham young university. the cti backup system was originally developed to support the clsi"installation at byu. lib-mocs-kmc364-20131012125101 174 oclc's database conversion: a user's perspective arnold wajenberg and michael gorman: university of illinois library, urbana-champaign this article describes the experience of a large academic library with headings in the oclc database that have been converted to aacr2 form. it also considers the use of lc authority records in the database. specific problems are discussed, including some resulting from lc practices. nevertheless, the presence of the authority records, and especially the conversion of about 40 percent of the headings in the bibliographic file, has been of great benefit to the library, significantly speeding up the cataloging operation. an appendix contains guidelines for the cataloging staff of the university of illinois, urbana-champaign in the interpretation and use of lc authority records and converted headings. the library of the university of illinois, urbana-champaign, is the largest library of a publicly supported academic institution, and the fifth largest library of any kind, in the united states. in the last year for which figures are available (1979-80), the library added more than 180,000 volumes representing more than 80,000 titles. the library is currently cataloging more than 8,000 titles a month; more than 80 percent of the records for these titles are derived from the oclc database (library of congress and oclc member copy). 
because our cataloging is of such volume and because we are actively engaged in the development of an online catalog, we decided to use the second edition of the anglo-american cataloguing rules (aacr2) earlier than the "official" starting date of january 1981. we began to use aacr2 for all our cataloging in november 1979. this early use of aacr2 has led to two consequences. first, we now have oclc archival tapes representing about 150,000 titles cataloged according to aacr2. this represents a valuable and continuously growing bibliographic resource that can be used without modification in our future online catalog. second, we have a considerable and unique collective experience in the practical application of aacr2. the minor problems of working with aacr2 in an aacr1 plus superimposition environment (until january 1981) were more than compensated for by these two positive results. oclc conversion with our practical background in the use of aacr2 and our continuing need for a high volume of cataloging, we were, naturally, keenly interested in the (to our mind) progressive decision of oclc to use machine matching techniques to convert the form of name and title headings in its database, the online union catalog (oluc), to conform to aacr2. we recognized the limitations of the project, essentially those defined by the capabilities of the computer for matching character by character, but felt that this was a major venture that would, when completed, produce major benefits. what follows is an assessment and analysis of the results of the project in the light of the experience of a library that is dedicated to achieving high-volume, quality cataloging. we deal with the lc authority file as well as the oclc headings because the lc file was the basis of the project and because, from the practical point of view, the two files are complementary aspects of the same service. the greatest value of the conversion, and its greatest claim to uniqueness, lies in the sheer size of the project in terms of headings checked and changed. our catalogers, and others who work with current materials, estimate that more than 40 percent of the name and title fields we use in our current cataloging have a w subfield indicating that the name or title has been changed to its aacr2 form. since oclc estimates that 39 percent of the name and title fields were affected by the conversion, it would appear that the headings that were changed are the headings that we are more likely to use. in other words, the project has brought us more than a 39 percent benefit. we are also greatly encouraged to find that the number of headings coded dn (meaning aacr2 "compatible," or, more bluntly, lc's modifications of the provisions of aacr2) is a very tiny minority of all converted headings. this means that when, in the future, this policy of "compatibility" is lessened or dropped, there will be relatively few changes to be made. lc authority records we also benefit from the presence of lc authority records in the oclc database when we establish headings that are new to our catalogs. there is one problem with the use of these records, which was revealed by a sample of new university of illinois authority records (see table 1). this sample of 368 new university of illinois records reveals that lc authority records are available relatively rarely for new headings.
this is not surprising as these new headings are established most often as part of the process of original cataloging, which, almost without exception, occurs in our library only when oclc copy is not available.

table 1. recently established headings

                                      no authority   record     record     record
                                      record         coded c*   coded d*   coded n*
given name headings                   13             5          0          1
single surname headings               212            26         2          2
  (initialisms expanded)              (132)          (7)        (1)        (2)
compound surname headings             29             12         0
  (initialisms expanded)              (2)            (0)        (0)        (0)
single surnames plus uniform titles   3              0          0          0
general corporate headings            34             12         0          0
general headings with subdivisions    7              2          0          0
government headings                   4              2          0
total                                 302            59         2          5

*key: c, in subfield w, indicates an aacr2 form as established by the library of congress; d indicates an aacr2 "compatible" form as established by the library of congress; n indicates that the input operator could not determine which set of rules governed the form of the heading.

it seems to us to be unfortunate that member libraries cannot contribute their authority records to the oclc database. our experience suggests that the online authority file would grow very rapidly if that were the case. to put it another way, the oclc conversion provides an enormous and valuable resource of aacr2 headings. it did not, and could not, provide new authority information. oclc will be complementing its valuable work in upgrading the retrospective file when it devises and implements a scheme for making available authority records for new headings derived from a wide range of sources. since so many headings were converted to aacr2, it may seem churlish and ungrateful to complain that more was not done. the following descriptions are not intended to form part of an attack on oclc's project or to minimize its achievement. form subdivisions the project failed to delete form subdivisions (such as "liturgy and ritual" and "laws, statutes, etc.") from added entry headings and subjects. the program correctly deleted them from main entry headings, but the inconsistencies resulting from their retention elsewhere make the job of ensuring consistency in a large copy cataloging operation that much harder. this inconsistency in treatment is illustrated by examples 1 and 2. example 1 originally was entered under
110 10 illinois. $k laws, statutes, etc.
the program correctly changed the main entry heading to
110 10 illinois
and added a subfield w, coded mn (the m indicates a conversion by machine to the aacr2 form; the n means "not applicable," and indicates that there is no title element in the heading). example 2 has as main entry
110 20 illinois community college board
but has as added entry
710 10 illinois. $k laws, statutes, etc. $t illinois public community college act
under aacr2, the subfield $k, "laws, statutes, etc.," should not be present in the heading. unfortunately, the program looked only at 110 fields, not at 710 fields, and so the heading was not corrected in the conversion. it must therefore be edited manually by every library that uses the record. program problems our direct use of the online authority file is somewhat hampered by the programming oversight that makes it impossible to search uniform titles.
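the manual cleanup of added entries described under "form subdivisions" above is the sort of change that today could be scripted. the sketch below, in python, assumes the pymarc library and a hypothetical file of oclc-derived records; it drops the "laws, statutes, etc." form subdivision ($k) from 710 fields, the edit the conversion program performed only on 110 main entries. the file names and the decision to rewrite every record are assumptions for illustration, not part of the oclc project.

    from pymarc import MARCReader, MARCWriter

    INPUT = "converted_records.mrc"    # hypothetical file of oclc-derived records
    OUTPUT = "cleaned_records.mrc"

    with open(INPUT, "rb") as infile, open(OUTPUT, "wb") as outfile:
        writer = MARCWriter(outfile)
        for record in MARCReader(infile):
            if record is None:
                continue  # skip records that fail to parse
            # look at corporate added entries only; main entries (110)
            # were already handled by the conversion program
            for field in record.get_fields("710"):
                # drop the form subdivision, as aacr2 requires
                while field.get_subfields("k"):
                    field.delete_subfield("k")
            writer.write(record)
        writer.close()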
of course, uniform titles that are accompanied by a 100 field (notably in music) can be retrieved by an author search, but those without 100 fields (anonymous classics, sacred scriptures, etc.) are virtually inaccessible. there were a handful of specific instances in which the specifications were inadequate or the programs seem to have malfunctioned. these resulted in some oddities such as the conversion of the subject "jesus christ" to "sermon on the mount" and the (surely not politically motivated) switch from "u.s. department of state" to "voice of america." oclc has been scrupulous in identifying and publicizing these errors. they are few in number and, though conspicuous, have rarely caused us many problems. as can be seen, the problems caused by what we see as failures on oclc's part are few and affect few cataloging circumstances. the remaining problems either result from the decisions and actions of the library of congress and, hence, are wholly or mostly out of oclc's control, or are of such a nature that they cannot be solved by computer matching techniques without extensive editorial intervention. whether such human intervention is possible and, if possible, cost-beneficial is not for us to say, though it must be recognized that to transform the oluc to pure aacr2 conformity would be a herculean task. that task would undoubtedly involve many of the hundreds of thousands of records that are seldom or never used. serials the most troublesome example of the kind of problem that cannot be resolved by machine matching is that of serials. the oclc conversion project was, quite properly, not concerned with choice of entry (aacr2, chapter 21). this seems a simple and clearly defined decision. when we come to consider serials, this clear distinction between choice and form of entry becomes blurred. the major change brought about by aacr2 (rule 21.1b2) is that many serials previously entered under the heading for a corporate body are to be entered under their titles. in fact, the great majority of serials will now be entered under title. the upshot of this is that the citation (or form of heading) for a serial changes from, for example,
national society for medical research. bulletin
to
bulletin / national society for medical research
the restriction of the oclc project to forms of heading means that most serials in oluc will be found under headings whose form may be correct but which are inappropriate for citations. this problem, which, of course, cannot be resolved by computer matching, has led to difficulties for us in copy cataloging, because a degree of expertise is needed to apply aacr2 rule 21.1b2 and to distinguish between the majority of serials where the 110 field should be changed to a 710 and the small minority where the 110 field should remain as it is. since most serials are to be entered under their titles, it occurs to us to suggest that the oclc conversion project could have changed all 110 fields to 710 fields in records identified as serials. by that method, the majority of serials would be correctly entered and the potential for mistaken citations greatly reduced. multiple personal names persons who write under more than one name (real names, pseudonyms, etc.) and who are not primarily identified by one of those names (aacr2 22.2c3) pose a special problem. under the provisions of aacr2, such persons are to be represented in the catalog (and the database) under two or more names.
despite the fact that "creasey, john" and "marric, j. j ." and" ashe, gordon" are all names used by the same man, they will appear as separate headings from now on. under aacrl plus superimposition one of those names ("creasey, john") was used as the heading for all works. within the confines of the oclc project, there was no method available to distribute the various records under the various headings. it occurs to us that some method based on matching the name found in the 245 $c subfield with the 100 field might, at least, have resulted in the project recognizing probable cases calling for multiple headings. for example: 100 a hibbert, eleanor 245 a bride of satan i $c jean plaidy 180 journal of library automation vol. 14/3 september 1981 could alert the system to a case for change. we recognize that this would call for more sophisticated computer matching techniques and that it would call for editorial intervention. a good example of the problem this has caused for us is the case of the danish author karen blixen. she wrote under that name and under the pseudonyms isak dinesen and pierre andrezel. records in the database that were added before 1981 will use "blixen, karen, 1885-1962" as the heading for all her works including those published under pseudonyms. since the blixen heading is a perfectly acceptable aacr2 form, the conversion program codes it as an aacr2 heading, which it is for the blixen books but is not for those published under other names. the authority record (example 3) includes a note identifying both pseudonyms as valid aacr2 headings, but, of course, the programs as written cannot interpret such a note and match them with appropriate records. corporate name changes corporate bodies present a similar problem when one is dealing with those that have changed their name. until1967, the library of congress used the latest name of such bodies with see references from the earlier names. both editions of aacr require that works issued under the earlier names be entered under those names and works issued under the latest name be entered under that name, the various names being connected by see also references. however, records in the oluc for earlier works cataloged before 1967 will show those works entered under a later name. for bib record enter t>1b display recd :o: end r~c: stat : n entrd : 80 11 :::1 u!;eooj : 80 1121 t·,p~ : = b1b lvl : g~vt a9n : ~ang : suurce : ~· t tt:o : 004 inlc: :.. en-: lvl : n h~ad ref: a h~ad : c•: i i~ ;;\ d s t ol. t •j s : ;1_ n-3-n~~.? : o:t mod h:~ c : a•j t h status : a 1 0 10 n 7~0077 1 9 '2 1(•0 10 bl1 ~~'' t:-:·~r:"no d t:::=:3c.j '=~62 . w r.001790::1~-:t.·'l.c.:t.nn----n r.n n :..; 4(h) 10 andrb'i-::'1?1· pj."'r' \!' w rp;.jo:::790::15a•:ht.snn----nnnd 4 4 0c"i t ll d1 n,~ '='1::n. ls·"l ~ w nl"l•) 37"''021 ~;,:toln-tnr.----nr.r.d 5 /;..(.7 th.: f-(.•11 r.• w lnq ps~ •jd•:•n•r•s. ~r i.· val l •j aa(r 2 h~adlrt:a~ : a ar.,jrbe,;:el, r" l ·:?r 1"' €'. u:::::5l ctt,~·~ .a d l l1(•!5-er. ~ i !.•-:\1 1 1f::35j '="162 w n 0047902 15>i<:ln0l nr. -·--r•nr.n no holding~. i n uiu for iioldh~c·. en ter .1h depress disf'lay recd ~·end f'\~ o: r t .:... t: ,en t r d: 7507 11 used : 8 10725 t,pe : ·"i:l btb lvl : m c• o vt r ~•ji• : _ l~r,9 : ..:tn9 ::::.;:, tj rce : ij i ll•js.: r~::pr: [i•·= 1·11: r c.:onf p•jb: _t-rr. : __ oat tp : _ m/f/b : _ _ _ ind ;.. : _mod rec : f~s.ts•: t-.r: _ c·:.or.t : d-?~·: : lr:.t lvl: 0dt~~ = 196:?.. _ l 0 1 0 63-11618 2 040 ~ orl 1 oc.l ~ ~.~: . ::: 0'::·0 0 pz 3. 
because those later names are valid aacr2 headings in terms of their form, they are coded cn (i.e., aacr2 validated) by the program, even though they may not be the right headings for the records to which they are attached. a good example of this problem is that of the "lutheran church-missouri synod." an earlier name of this religious body is "deutsche evangelisch-lutherische synode von missouri, ohio, und andern staaten." unfortunately, the authority record (example 4) does not even show that the earlier name is valid according to aacr2. the conversion program, on encountering the earlier name used as a heading, would change it to the later name and code that form as being the aacr2 heading.
[example 4. lc authority record for the lutheran church-missouri synod.]
another example of the problem is:
chamber of commerce of the united states of america. international department
this is identified as the aacr2 form (example 5) but, in fact, the department has changed its name to "international division."
[example 5. oclc record showing the heading chamber of commerce of the united states of america. international department.]
lc practice another problem we have encountered is that of the literal-mindedness of the computer programs in matching like with like. this problem is compounded by inconsistencies resulting from variations in lc cataloging practice. an example of this problem is that of the nigerian author chinua achebe. the heading "achebe, chinua" is marked as being aacr2 despite the fact that the authority record shows that he was born in 1930.
university" (the pre-aacr2 form) . since the aacr2 form as established by lc looks very much like the new name, "university of illinois at urbana-champaign," the authority card is very difficult to understand. nothing short of revising the note, and/or the use by lc of a less confusing qualifier than "(urbana-champaign campus)," will make the authority record intelligible. an example of how lc practice has affected the oclc program adversely is in the area of the so-called compatible headings. these are instances of when lc has chosen to depart from the provisions of aacr2 for one reason or another. leaving aside the utility and morality of such a policy, it presents a considerable problem to those of us who use oclc oclc's database conversion/w ajenberg and gorman 183 records. the example that follows is of the worst of these "compatible" practices. lc has decided to ignore the common form of name for persons who are not "famous or published under an american imprint." 1 thus, the writer p. c. boeren would be recorded as "boeren, p . c. (petrus cornelis), 1909" under the provisions of aacr2, but, because boeren is neither famous nor american, the "compatible" heading will be "boeren, petrus cornelis, 1909." this heading is not acceptable in an aacr2 catalog. scr-e-er. 1 qf 4 for bib record enter btb display recd senli rec stat: c entrd : 801122 us~d : 810718 fype: z bib lvl : ~ govt a9r : lang: source: site: 038 inl~: ~ enc lvl: n h@ad r~f: a he~d: ~ ~ie~d status : a name : a mod rec : auth ~t~tus: a 1 010 n 7904c•l04 2 110 20 untvt:-rsit-.~ of illtnc•ls chi•: a9o cir·cl€, artd the univer·sit -..-· .:·f illjr .. :.ts at th et-tedto:·a.l center~ o1e-r·f=' reorganized into equal administr·ative ca~puses with1n a university s . stem with a •:entral admintstr·attve staff in llr· ba,r,a . a wor•-s p•jbllshed by th~s~ b(•dles after the reorgantzatton tn 1966 are found under a un1v~rs1tv of llltnots at urbana-champat9n . a un1vers1ty of 11 l1no1s at ch1ca9o c trcle. a un1vers1ty ot lll1nots at the medical center . a untver·st t. of llltr.ots (srstem) a subject entry: wor• s about these bod1es are ~nte-red •jnder th ~?name o:·r· n>3m~s tn e .:..::: 1stence d•jrln~ the 1ate5-t period fc•r wht•:t. sijt•je:t c.·.vera9-e ts 91v~r.. in the c ase wher~ the required name is represent~d 1n tht~ ~ at~lo~ onl, un·j~r ~ later form of th e• rtame. ~r.tr··r 1s ma.j-e un·j ~r· tht.:o la t ~r f.:·r·m. w n010:3 t061c·a'lnur.n--nnrtn 14 667 llltno1s indus t r1al university. w n004790s=a4dnann----nnnn example6 184 journal of library automation vol. 14/3 september 1981 more, it is quite possible that if boeren' s works are published in america or if lc suddenly decides that boeren is "famous," the heading will be changed. this is an infrequently encountered problem for us but one where lc's peculiar policies have created problems that have nothing to do with oclc or aacr2. conclusion the problems that we have cited above are real but not numerically significant (except in the case of serials and multiple personal namesneither of which are under oclc's control). they are far outweighed by the tremendous value of the more than 40 percent of oclc headings that have been converted to their aacr2 form. the oclc conversion has made it possible for us to do aacr2 cataloging more quickly than in the period november 1979-december 1980. we have issued guidelines to our professional, paraprofessional, and clerical cataloging staff who deal with all the headings we encounter in using oclc (see appendix). 
problems such as those we have described are dealt with in our guidelines, and in practical terms now in day-to-day work. they may take some extra time, but overall our cataloging operation has been greatly speeded by oclc's conversion . reference 1. cataloging service bulletin , no.6:6(falll979) appendix university of illinois library at urbana champaign copy cataloguing guidelines authority records lc authority records, now available on oclc, can be very helpful in determining the correct aacr 2 form of headings, and should be cited on authority cards we prepare, when we use them in establishing headings. the tag numbers used on authority records sometimes have different meanings from the numbers used on bibliographic records. the meanings are: lxx heading 4xx see reference (i.e. from the form in this field to the form in the lxx field) 5xx see also reference (i.e. from the form in this field to the form in the lxx field) 6xx notes (e.g. the authority used by the lc cataloguer) each field concludes with a w subfield, consisting of 24 characters indicating in coded form various types of information about the heading. the 13th character, the 3rd past the six-character date, consists of one of five letters indicating the rules governing the form of heading in that field. the codes are: oclc's database conversion!w ajenberg and gorman 185 c aacr2 d compatible with aacr 2 b aacr, 1967 ed. a earlier rules (e.g., ala rules of 1949, etc.) n not applicable or not applied here is an example of an lc authority record, omitting the fixed field and some of the references: 010 n 790558820 110 20 state university of new york at buffalo. w n008801115aacann----nnnn 410 10 buffalo. b university w n002791105aaaann----nnna 41010 new york (state). b state university, buffalo. w n009801115aaaann----nnna 667 the following heading for an earlier name is a valid aacr 2 heading: university of buffalo. w n007791105aanann----nnnn when oclc carried out its aacr 2 conversion project, the data about the rules encoded in subfield w was added to headings in bibliographic records, if those headings were altered by the conversion. for bibliographic records in oclc, subfield w contains 2 characters, each of which must be one of the following: c (for aacr 2 heading) d (for accr 2 compatible heading) m (for machine converted heading) n (not applicable or not applied) the first character applies to the name portion of the heading; the second, to the title portion. obviously, in many cases there is no title portion, in which case the second character will ben. the code m (machine converted heading) is used when a heading is altered directly by program, rather than being extracted from an authority record. an example would be the elimination of subfield k laws, statutes, etc. 1. use of subfield win cataloguing since oclc does not want member libraries to apply the letter codes in subfield w for their original input, the presence of a cord in subfield w should always indicate an lc decision identifying an aacr 2 or aacr 2 compatible heading. supply subfield w for all cataloguing to be added to oclc's data base. 
the codes to be used are given in illinet's information bulletin #92, from which this table is copied: 1 aacr 2 form found in on-line lc name-authority file 2 aacr 2 compatible form in on-line lc name-authority file 3 aacr 2 form supplied by inputting institution with copy in hand and piece not in hand 4 aacr 2 form supplied by inputting institution with piece in hand 5 author or title portion of heading not converted to aacr 2 form. this subfield (#w) is always the last subfield in the field. it must contain a two character code. the first character applies to the name portion of the heading; the second character applies to the title portion of the heading. if the heading is a name heading and does not include a title portion, use "n" as the second part of the code. if the heading is a uniform title heading, use "n" as the first part of the code. examples: 700 10 day lewis, c. #q (cecil), #d 1904-1972 #win 600 10 schmidt, h. r. #q (heinrich rudolf) #w 4n 130 00 bible. #p n.t. #s authorized. #f 1974. #w n4 accept headings coded c in subfield was correct aacr 2 headings, unless the heading is for an author entered under surname who writes in a non-roman alphabet language. for such 186 journal of library automation vol. 14/3 september 1981 authors, use the form given only if it is a standard romanization of the name in the original alphabet. if a form other than the standard romanization is used, substitute the standard romanization, and trace an x ref. from the form coded c. 2. lc author headings without dates lc recently announced that it will not add dates to a heading already established without dates, unless the dates are needed to resolve a conflict. when there is no conflict, the dates will be recorded in the authority record in a 6xx field , but will not be added to the heading. dates will be routinely added to newly established headings at the time the headings are established, if the information is readily available. lc codes such headings c, not d, because aacr 2 does not require that a date be added to the heading, except to resolve a conflict. if such an lc authority record is available when a heading is being established, use the lc form , without adding dates to the heading, unless dates are needed to resolve a conflict in the new catalogue. record the dates on an authority card. if lc authority is not available wh(;ln a heading is being established, use dates in the heading if the information is readily available. if, later , lc authority is found that omits date from the heading, do not change the heading as already established for the uiuc new catalogue. since records in oclc may contain headings without dates for persons we have established with dates, some conflicts will be generated. these should be resolved by catalogue maintenance staff, who will add dates in pencil to headings on new cards that lack dates, but are otherwise identical with headings in the new catalogue. such conflicts in the machine record will be cleaned up gradually, after fbr is up . 3. acceptable dn forms headings coded d in authority records (dn in bibliographic records) are the aacr 2 "compatible" forms. in many cases, the difference from aacr 2 is trivial, and the form can therefore be used. in such cases, if lc authority is available, use the form as established by lc, and record the information on an authority card. if lc authority is not available when a heading is being established, follow aacr 2. 
if, later, lc authority is found that establishes a "compatible" form , do not change the form in the uiuc new catalogue to the lc "compatible" form. it will sometimes happen that "compatible" forms will be found on records in oclc (coded dn, usually) . such headings may be used only if they fall into one of the categories listed below . this will sometimes result in "compatible" forms and true aacr 2 forms both being used in the new catalogue. in some cases, the two forms can be interfiled; in other cases, catalogue maintenance staff will need to correct "compatible" headings in pencil. acceptable dn form s are: a . lc will omit hyphens between forenames if the heading has been established without hyphens, even though rule 22.102 would require hyphens. use the lc form , if found . catalogue maintenance will interfile headings identical except for the presence or absence of hyphens. b. lc will continue to place the abbreviation ca. after a date in the heading for a person, if the heading has already been established in that form , even though rule 22.18 specifies that the abbreviation should precede the date. use the lc form, if found . catalogue maintenance will interfile headings identical except for the placement of the abbreviation ca. c. lc will not correct the language of an addition to a personal name heading; i. e . will not change to the language used in the person's works. (e.g., a heading already established as louis antoine, father will not be changed to louis antoine, pere, even though the latter is the author's usage.) use the lc form, if found . catalogue maintenance will correct conflicts in pencil, to the lc form. d. lc will not change a personal name heading to a fuller form of the name, even if the shorter form is not predominant. use the lc form , if found. catalogue maintenance will correct conflicts created by personal name headings that vary in fullness to the form to which a "see" reference has been made. if there is no "see" reference, catalogue oclc's database conversion!w ajenberg and gorman 187 maintenance will refer the conflict to the appropriate cataloguing service. e. lc will continue to use additions to surname headings supplied by cataloguers, for headings already established with such additions. use the lc form, if available. catalogue maintenance will resolve conflicts by adding qualifiers in pencil to headings that are otherwise identical with the forms with qualifiers. f. lc will continue to use titles of honor, address, or nobility with headings that have already been established with such titles, even though the authors do not use such titles. use the lc form , if found. catalogue maintenance will resolve conflicts by adding qualifiers in pencil to headings that are otherwise identical to the forms with the qualifiers. g. lc will not use initial articles in uniform title and corporate headings, even when they are required by aacr 2. we will follow lc practice in this, and use the lc form when found. catalogue maintenance will interfile uniform title and corporate headings that are identical except for the presence or absence of initial articles. h . lc will continue to use the abbreviations bp. and abp. for personal name headings that have already been established with those abbreviations used as qualifiers, instead of spelling out the qualifiers in full. use the lc form, if found. otherwise, follow aacr 2 and spell out "bishop" and "archbishop". catalogue maintenance will resolve conflicts by correcting in pencil to the form spelled out in full. i. 
lc will not add terms of incorporation to corporate headings already established without them, nor delete them from corporate headings already established with them, even though lc interpretation of aacr 2 would require such adjustment. use the lc form, if available. otherwise, retain terms of incorporation in corporate name headings only if the term is an integral part of the name, or if, without the term, it would not be apparent that the heading is the name of a corporate body. catalogue maintenance will resolve conflicts by adding, in pencil, terms of incorporation to headings identical to established forms except for the absence of such terms. j. lc will not add geographic qualifiers to corporate headings established previously without such qualifiers, even though they have chosen to apply the option in rule 24.4 that allows qualifiers to be added when there is no conflict. use the lc form, if available. catalogue maintenance will resolve conflicts by adding qualifiers in pencil to headings identical to established headings except for the absence of such qualifiers. k. lc will not reduce the hierarchy of far eastern corporate headings, established before 1981, even though aacr 2 rules would require that intervening superior bodies would be omitted from the heading. use the lc form , if available. catalogue maintenance will refer conflicts to the appropriate cataloguing agency for resolution. the asian library cataloguer is the final authority for such headings. l. lc will not change the capitalization of acronyms and initialisms to conform to the usage of the corporate body, if the acronym has already been established with a different capitalization. use the lc form, if available. catalogue maintenance will resolve conflicts by interfiling acronyms and initialisms that are identical except for variations in capitalization. m. lc will not supply quotation marks around elements in a corporate heading that has already been established without quotation marks, even though this varies from the usage of the body. use the lc form, if available. catalogue maintenance will resolve conflicts by interfiling headings identical except for the presence or absence of quotation marks. n. if lc is attempting to resolve a conflict (i.e. two different people with identical author statements), and neither dates nor expanded initials are available to resolve the c:onflict, lc will add an unused name in parentheses to the heading if the information is available. e.g.: established heading: smith, elizabeth new author: elizabeth smith 188 journal of library automation vol. 14/3 september 1981 (new author's full name, ann elizabeth smith, is available) lc heading: for new author: smith, elizabeth (ann elizabeth) use lc forms if found in name authority file. catalogue maintenance will refer problems to the appropriate cataloguing agency. 4. unacceptable dn forms in a few cases, the aacr 2 "compatible" forms, coded d in authority records and dn in bibliographic records, are unacceptable in the uiuc library. instead, we will follow aacr 2 in constructing these headings, and record the lc form on authority cards when they are found. we will also make references from the lc forms, if they would file differently from the forms we use. for many of these, catalogue maintenance will have to refer conflicts to the appropriate cataloguing agency. in a few cases, catalogue maintenance can make the corrections on the cards. the unacceptable dn forms are: a. 
lc will sometimes, but not always, continue to use headings established prior to 1981 with names spelled out in full , when the authors represent some of those names with initials. follow aacr 2 in constructing headings for these names. use initials in conformity with the authors' usage, and add the corresponding full names in parentheses, in subfield q, when the information is available. whenever an element in a compound surname or a first forename is represented by an initial, make a reference from the fuller form. usually, a reference will not be needed if a forename other than the first is represented by an initial. b. lc will continue to add " pseud." to personal name headings already established with that qualifier. do not use the qualifier "pseud." when establishing personal name headings, and delete the term from oclc records that use it, including records added by lc. catalogue maintenance will resolve conflicts by lining out the qualifier "pseud." in headings. c. lc will continue to add 20th century fl. dates to personal name headings already established with such dates. do not use 20th century fl. dates when establishing personal name headings, and delete such dates from oclc records that use it, including recorded added by lc. catalogue maintenance will resolve conflicts by lining out 20th century fl. dates in headings. 5. 87x fields one part of the aacr 2 conversion project by oclc was the addition of fields tagged 870, 871, 872, or 873. these fields contain the pre-aacr 2 forms of headings that were changed by the conversion. oclc participants can add 87x fields to records they enter into the data base. however, we will not supply these fields in our cataloguing. 6. authority cards prepare authority cards whenever references are needed, and whenever an lc authority record for the heading is found , even if we do not use the lc form. citation of the authority record takes the form: "lc auth. rec." followed by the record number and the indication, in parentheses, of the code for rules given in subfield w. example: akademie der wissenschaften und der literatur (mainz, germany) lc auth . rec. 80076417 (en) if the lc form differs from the form used as the heading in muc, give the lc form in parentheses, following the sub field w code. example: abrahamson , max w. (max william) lc auth. rec. 78064817 ( dn) (ab-rahamson, max william) it will sometimes happen , when establishing the heading for a corporate body, that an lc oclc's database conversion/w ajenberg and gorman 189 authority record for a subdivision of the body you are establishing will give you the aacr 2 form of the body you are setting up. precede the citation to the authority record with the word "from". example: united states. environmental protection agency. region v. from lc auth . rec. 80159375 (en) (the lc authority record is for the water division of region v) 7. references the basic rule for making references is given in aacr 2, rule 26.1: "whenever the name of a person or corporate body or the title of a work is, or may reasonably be, known under a form .that is not the one used as a name heading or uniform title, refer from that form to the one that has been used. do not make a reference, however, if the reference is so similar to the name heading of uniform title or to another reference as to be unnecessary." ultimately, this decision depends on the cataloguer's judgement. usually, make a reference only if it would file differently from the established heading and from all other references. 
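the interfiling and reference decisions described above all turn on whether two character strings would occupy the same place in the card file. as a rough illustration only, the following python sketch shows one way such a filing-key test could be expressed; the normalizations applied (ignoring hyphens, the abbreviation ca., initial articles, capitalization, and quotation marks) are drawn from the acceptable dn categories in section 3, but the function names and the key construction are assumptions made for this example, not the uiuc library's actual filing specification.

```python
import re
import unicodedata

def filing_key(heading: str) -> str:
    # reduce a heading to a crude filing key so that forms differing only in
    # hyphens, placement of "ca.", initial articles, capitalization, or
    # quotation marks produce the same key and therefore interfile.
    key = unicodedata.normalize("NFKD", heading)
    key = "".join(c for c in key if not unicodedata.combining(c))    # ignore diacritics
    key = key.lower().replace('"', "").replace("-", " ")
    key = re.sub(r"\bca\.\s*", "", key)                              # ignore the abbreviation ca.
    key = re.sub(r"^(the|a|an|der|die|das|le|la|les)\s+", "", key)   # drop initial articles
    key = re.sub(r"[^\w\s]", " ", key)                               # drop remaining punctuation
    return re.sub(r"\s+", " ", key).strip()

def needs_reference(variant: str, established: str, traced: list[str]) -> bool:
    # trace a reference only if the variant would file differently from the
    # established heading and from all references already traced (the rule
    # of aacr 2, 26.1, as summarized above).
    v = filing_key(variant)
    if v == filing_key(established):
        return False
    return all(v != filing_key(r) for r in traced)

# headings identical except for hyphens (category a) interfile, so no reference is needed:
print(needs_reference("smith, jean paul", "smith, jean-paul", []))   # False
# a genuinely different form still warrants a reference:
print(needs_reference("schmidt, j. p.", "smith, jean-paul", []))     # True
```

a test of this kind is only a screening aid; the pencil corrections described above still depend on the cataloguer's judgement.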
refer from variant forms found in works catalogued for this library, and in standard reference sources. lc authority records will often suggest useful references. however, we may need references not traced by lc, and we may not need all of the references lc traces. notice especially that lc authority records will often give a reference from the pre-aacr 2 form, even when it would file with the aacr 2 form. for example, the authority record for akademie der wissenschaften und der literatur (mainz, germany) traces a reference from adakemie der wissenschaften und der literatur, mainz-the pre-aacr 2 form. these two forms would file together, so we do not need the reference. we will trace "see also" references from forms that can legitimately be used as headings, whether or not they have been used yet in the uiuc library. we will no longer observe the former restriction, which allowed "see also" references to be made only if both headings had been used. for further information on authority records and references, see the cataloguing manual, section a79. aw:lgo arnold wajenberg is principal cataloger and michael gorman is director, technical services, at the university of illinois library. 102 the recon pilot project: a progress report henriette d. a vram: project director, information systems office, library of congress, washington, d. c. a synthesis of the progress report submitted by the library of congress to the council on library resources under an officers grant to initiate the recon pilot project that gives an overview of the project and the progress made from august-november 1969 in the following areas: training, selection of material to be converted, investigation of input devices, and format recognition. introduction the recon pilot project is an effort to analyze the problems of largescale conversion of retrospective catalog records through the actual conversion of approximately 85,000 non-current records. this project has grown directly out of the implementation of the marc distribution service. libraries considering the use of machine readable records for their current materials have naturally begun to consider conversion of their older records as well. some libraries have even begun such conversion projects. since the library of congress is also interested in the feasibility of converting its own retrospective records, it seemed appropriate to explore the possibility of centralized conversion of retrospective cataloging records and their distribution to the entire library community from a central source. a proposal having been submitted by the library of congress to the council on library resources, inc. ( clr), the council granted funds for a study of this problem. an advisory committee was appointed to provide guidance, and direct responsibility for the study and report ( 1) was assigned to a working task force. recon pilot project/ avram 103 a recommendation of the working task force was the implementation of a pilot project to test the techniques suggested in the report in an operational environment. since any feasibility report, no matter how detailed, refers to a theoretical model, the recommended techniques should be tested to determine a most efficient method for a large-scale conversion activity. the advisory committee concurred with this recommendation. 
the library of congress submitted a proposal for a pilot project (hereinafter referred to as recon) to clr, and received an officer's grant in august 1969 to initiate recon while the council continued its evaluation of the full-sc'ale pilot project. . a progress report was submitted to clr by the library covering the period from mid-august to november 1, 1969. so that clr might have a clear understanding of the work in progress, the report addressed itself to both the areas of recon supported by the council and those activities supported by the library of congress. in december 1969, clr awarded the library the funds requested for the entire pilot project. to make the library community cognizant of recon as quickly as possible, clr granted permission to modify the progress report for publication.· overview of the recon pilot project the pilot project is concerned with the conversion and distribution of an estimated 85,000 english language titles: 22,000 titles cataloged in 1969 and not included in the marc distribution service, and 63,000 titles from 1968. the creation of this data base partially satisfies the conclusions and specific recommendations of the recon working task force as stated in the report ( 2) : 1) there should be no conversion of any category (language or form of material) of retrospective records until that category is being currently converted; 2) the initial conversion effort should be limited to english language monograph records issued from 1960 to date and converted into machine readable form in reverse chronological order. (marc distribution service covers current english language monographs cataloged by the library of congress) . in order to explore the problems encountered in encoding and converting cataloging records for older english language monographs, and monographs in other roman alphabet languages, 5,000 additional titles will be selected and converted. the library further intends to investigate, through the design and implementation of a format recognition program, the use of the computer to assist in the editing of cataloging records. this technique should significantly reduce the manpower needs of the present method of conversion and therefore have an impact on any future library of congress conversion activity, either of currently cataloged or retrospective titles. recon will include experimentation with microfilming and producing hard copy from the lc record set. the record set in the lc card division consists of a master copy of the latest version of every lc printed card, arranged by card series and, 104 journal of library automation vol. 3/2 june, 1970 within each series, by card number. although a specific time period can be selected for conversion, the primary disadvantage of the record set for this purpose is the fact that not all changes in cataloging made to the lc official catalog are reflected in the record set. after considering all the alternatives, the recon working task force recommended (3) that the record set be used for selection of titles, but that the titles be compared with the official catalog and updated to insure bibliographic accuracy and completeness. since the record set is in constant use by card division personnel, the selected titles for conversion must be reproduced, and the original file reconstituted, as quickly as possible. the state of the art of direct-read optical character recognition devices suitable for large-scale conversion will be monitored and experimentation will be conducted with a variety of input devices. 
recon is closely related to the lc card division mechanization project, which is based upon the availability of records in machine readable form. recon will be closely coordinated with the card division project, both in the design of specifications for implementation and in the investigation of a common hardware/software configuration. the project was organized during august 1969. the first group of records being edited are those cataloged by the library of congress in 1969. in june 1970, the editing of the 1968 records will begin. since these records will have to be compared with the lc official catalog to record any changes, present thinking includes the design of a print program (referred to as a two-up print program) to cut printing time by providing a listing with records arranged in card number sequence (the order of input) and in alphabetic sequence by main entry on the same page. the records will be arranged by main entry to reduce the effort of checking them against the official catalog and the changed records will be inserted in their proper place in sequence by lc card number. the process of manual editing may be greatly reduced, or perhaps even eliminated, by october 1970, when the format recognition program is scheduled for completion. after this time, the records will be input with little or no prior tagging and further editing will be performed by the computer. the resulting records will be examined by the marc editors both for accuracy in transcription and for correctness in the assignment of marc tags, indicators, and subfield codes. the duration of the pilot project will be twenty-four calendar months, august 1969-august 1971. it is anticipated that by november 1970 enough data should be available to determine whether a full-scale conversion project should be undertaken. an early evaluation of the project is advantageous in order to explore the funding possibilities of a conversion effort if the results of the pilot are affirmative. figure 1 is a calendar indicating the major milestones of recon as postulated during august 1969. [fig. 1. recon calendar: a month-by-month timeline, august 1969 through august 1971, of the major milestones, including the start of the project, hiring of production staff, organization of the iso staff, receipt of the 1969 and 1968 cards from the card division, investigation of input devices and of a common recon/card division hardware/software configuration, training of editors, the print index, the study of reproduction methods for catalog records, analysis and editing of the research titles, organization of cards for recon input, full editing of the 1969 titles (16,000 records), analysis of the system to convert the 1968 titles, full editing of the 1968 titles, ordering and hiring of new mtst typists, design and implementation of format recognition, use of format recognition on the remainder of the 1968 titles, conversion of marc i and interim marc ii records to marc ii, evaluation of the pilot project, planning for continuation of the project, and writing of the final report.] essentially the same advisory committee and working task force selected for the recon feasibility study have agreed to serve in their respective capacities for recon. the implementation of the library of congress' marc distribution service and the initiation of recon are providing the nucleus of a national bibliographic data base.
creation of this data base is not in itself a panacea for libraries but, in fact, amplifies the need to explore some of the larger issues at this time to provide the direction for future cohesive library systems. certain aspects of the problems were discussed in general terms in the recon report but time did not permit full analysis. during the two-year period of recon, the working task force will consider some of those issues (defined as four tasks listed below) under the grant from clr. the ability to complete all of the tasks described will be dependent on additional funding, which, it is hoped, may be available early in 1970. 1) any national data store should have a data base in which all records are consistent. it is possible, and highly probable, that libraries may convert bibliographic records for local use, which may not require the detail of a marc ii record. it is imperative that before levels of completeness of marc records are defined with respect to content and content designation, the implications of these definitions to future library networks be thoroughly explored. 2) any consideration of a national bibliographic data store in machine readable form should include the possibility of recording titles and holdings from other libraries. although the resolution of the problems associated with a machine readable national union catalog are enormous, it is time to begin an exploration of the problems to provide guidance for future design efforts. 3) several institutions have begun the conversion of their cataloging records into machine readable form. the possibility of utilizing these records in building a national bibliographic data store should be investigated. this will involve evaluating the difficulty and cost of converting and upgrading records converted by others to a marc format as opposed to preparing original records. 4) the library of congress maintains, and is considering the conversion into machine readable form, of its name and subject authority files. many libraries have expressed interest in receiving these records in the present marc distribution service. little thought has been given to the storage and maintenance of these large files in each library subscribing to marc distribution service. a library may not have in its collections a bibliographic record requiring either a name or subject cross reference record distributed by the library of congress. however, the library will keep the cross reference record because it cannot predict when a title will be added to the collection that does require the cross reference structure. the result will be the eventual storage and maintenance of the ~-==--------------------------------.... recon pilot project/ avram 107 entire lc name and subject reference files in each library. this problem should be explored to determine if there is a possible efficient method of libraries accessing these files from either a centralized source or several regional sources. 
progress-august 1969 to november 1969 organization the recon staff is divided into two sections: 1) the production section, responsible for the actual editing and keying of the records; and 2) the research and development section, responsible for liaison with the production section, determination of the criteria for the selection of the 1968 and 1969 titles, actual selection of the 5,000 research titles, investigation of input devices and photocopying techniques, liaison with the card division mechanization project, and the design and coding of special computer programs unique to recon. in addition, staff members of the marc project team in the information systems office (iso) are working in areas of format recognition and marc system programming that will affect recon. training the marc experience at the library of congress has demonstrated that staff members assigned to the editorial process of preparing catalog records for conversion to machine readable form must be exposed to cataloging fundamentals. phase i of the training program for the recon editors was a twoweek cataloging class conducted by the supervisor of the production section, a professional librarian with experience in teaching cataloging principles at the library of congress. each day was formally structured into reading, discussion, and practice. the editor-trainees applied the angloamerican cataloging rules ( 4) to practice problems and to actual cataloging of books. experience in using the lc subject heading list, filing rules, and classification schedules was provided to a lesser extent. in order to insure that the editor-trainees would have a wider range of experience in examining cataloging copy, the mnemonic marc tags and the more simple indicators and subfield codes were taught and used to identify explicitly cataloging elements on lc proofslips. phases ii and iii of the training, marc editing and correction procedures, were also taught by professional librarians. the editing class, which lasted two weeks, was divided into lecture sessions and laboratory sessions. each lecture period was from two to three hours; then, during the laboratory session, the instructions given in the lectures were applied to practice worksheets. the course covered input of variable and fixed fields, assignment of bibliographic codes for language and place of publication, and identification of diacritical marks included in the lc character set. phase iii of the training program, on correction procedures, 108 journal of library automation vol. 3/2 june, 1970 was a one-week class covering the addition, deletion, and conection of entire records or data elements at the field level. the training period was followed by an intensive practice period using marc input worksheets, which were reviewed by the experienced editors. selection of cards the actual selection of the 1968 and 1969 titles is a joint effort by the card division staff and the recon staff. the procedures for the selection of cards from the card division for recon differ from those described in the original report. since only cards for 1968 and 1969 titles are being selected, it is more expedient to draw the cards from the card division card stock than to microfilm the record set. these cards will include all titles cataloged by the library of congress during 1968 and 1969 regardless of language or form of material, which will yield approximately 250,000 cards. 
the cards are forwarded to the production section from the card division, where each record is inspected to determine whether it meets the criteria established for recon, i.e., all english language monographs with an lc catalog card number representing works cataloged by lc in 1968 and 1969 that are not already in machine readable form. the determination as to whether or not an item is in english is based upon the text, not the title page. an anthology of literature in spanish with a title page in english would not be included in recon; a book with text in english but title page in french would be included. if a book is multilingual (complete text in more than one language), the language of the first title determines inclusion or exclusion for recon. atlases are included, but not single maps or set maps. music or music scores are excluded, but books about music are included. records representing film strips, moving pictures, serials, and other kinds of materials not regarded as monographs are excluded. once the cards eligible for recon are selected and arranged in lc card number sequence, the cards are compared with the print index listing all records already in machine readable form. those records not in machine readable form are photocopied onto the input worksheet for editing and keying. to date, 60,000 cards have been selected by card division staff and forwarded to the production staff for further processing. selection of research titles an integral part of recon is the conversion of 5,000 titles to machine readable form for research purposes. ideally, these titles should serve not only the needs of recon but also be useful for some other purpose in the library of congress. these titles would include english language monographs cataloged before 1950, and foreign language material using the roman alphabet, and would be used to test various methods of input recon pilot projectjavram 109 and certain aspects of the format recognition program. the older material would represent records cataloged under earlier cataloging rules and would reveal problems in conversion in an area in which little information exists. two sources were initially considered for the selection of research titles: 1) titles in the main reading room collection for conversion into machine readable form for the production of book catalogs, and 2) the popular titles (cards ordered most frequently) of the card division mechanization project. a decision was made to study the titles in both sources with priority given to solution of conversion problems and to determine: 1) if overlap existed in records for both projects that would also serve the needs of recon; 2) if overlap did not exist, which titles (main reading room collection or card division popular titles) best served the needs of recon; and 3) if the titles in neither project were suitable, the method of selection to be used from the card division record set. the first task was a study of the characteristics of the main reading room collection. the collection consists of approximately 14,000 titles, and printed cards have been collected to compile a complete shelf-list catalog. these cards represent a wide range of material cataloged from 1900 to date. approximately one-fourth to one-third represent serials. 
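as a brief aside, the inclusion test that the production section applies to each 1968 and 1969 card, described at the start of this selection procedure, can be summarized in a short sketch; the record fields and values below are assumptions invented for the illustration and do not reflect the layout of the actual recon worksheets or the print index.

```python
from dataclasses import dataclass

@dataclass
class Card:
    # a hypothetical, minimal card record; field names are assumptions for this sketch.
    lc_card_number: str
    catalog_year: int               # year the title was cataloged by lc
    text_language: str              # judged from the text, not the title page;
                                    # for multilingual works, the language of the first title
    material_type: str              # "monograph", "atlas", "map", "score", "serial", "film", ...
    already_in_machine_form: bool   # true if the number appears in the print index

def eligible_for_recon(card: Card) -> bool:
    if card.already_in_machine_form:
        return False
    if card.catalog_year not in (1968, 1969):
        return False
    if card.text_language != "english":
        return False
    if card.material_type == "atlas":           # atlases are included
        return True
    return card.material_type == "monograph"    # maps, scores, serials, films, etc. are excluded

cards = [
    Card("68-004321", 1968, "english", "monograph", False),
    Card("68-009876", 1968, "spanish", "monograph", False),   # anthology in spanish: excluded
    Card("69-001234", 1969, "english", "monograph", True),    # already in machine readable form
]
print([c.lc_card_number for c in cards if eligible_for_recon(c)])   # ['68-004321']
```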
the collection includes material in most of the roman alphabet languages currently processed at the library, the more common non-roman alphabet languages, such as russian, japanese, hebrew, etc., and a number of "difficult" titles, such as encyclopedias, dictionaries, etc., that would present a variety of cataloging and editing problems. the second task was a study of the popular titles from the card division. the card division provided a printout of card numbers for titles with 25 or more orders. there were 4,765 such card numbers listed with their corresponding number of orders. only 210 of these were for pre-1950 cards, and 97 of the 210 cards were for serial titles. only 15 out of the 210 cards were for "difficult" titles. another list was produced which contained card numbers for titles with ten or more orders. this list (with 39,148 card numbers) did produce more titles that would meet the research needs of recon. a sampling technique was designed by the technical processes research office to determine the percentage of overlap of this list with the titles in the main reading room reference collection. the estimated number of matches (15.5%) indicated that not enough overlap existed to consider a selection of titles that would serve the needs of both projects (main reading room collection and card division) and recon. therefore, the research titles are being selected from records for the reference collection. iso is working closely with staff members of the reference department on this project. the reference department is providing local information (e.g., local call number to locate the item in the reference collection as opposed to the lc call number which locates the item in the general collection) for all titles. as this process is completed, the responsible recon staff member is selecting the research titles. to date, "local" information has been added to 2,000 records, and 400 recon titles have been selected from this group of records. computer programs the only computer program implemented to date is the print index program. this program was required to check the records meeting the manual selection criteria for inclusion in recon against records in existing machine readable data bases to avoid duplicate input. print index lists by card number all records in machine readable form in either the marc i or marc ii data bases. at a later date, the 1968 titles found on the marc i data base will be processed by a subset of the format recognition program and converted to the marc ii processing format. the print index program is made up of two routines. the lc catalog card number routine reads each record, extracts the lc card number and creates a magnetic tape file of numbers (called print index tape). the tape created contains a card number right justified for machine sorting, a card number in the same form (zeros deleted) as the number on the printed card, and a data base code indicating the file in which the record originally resided (e.g., marc ii data base, marc ii practice tape, marc i data base). a parameter card is used to indicate which format and data base is to be processed. the ibm sort is used to arrange the output of the lc catalog card number routine into the following order: all 6x-series numbers, all 6x-series numbers with alphabetic prefixes (by year of cataloging, i.e., 1968 followed by 1969), all 7-series numbers (disregarding the check digit, the second digit in the number).
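the card-number handling just described (a right-justified form for machine sorting, a zeros-deleted printed form, a data base code, and the three-group sort order) might look roughly like the following; the parsing of prefixes and check digits is deliberately simplified and is an assumption for illustration, not the print index program's actual logic.

```python
def split_number(card_number: str):
    # separate an optional alphabetic prefix, the year portion, and the serial portion.
    num = card_number.strip().lower()
    i = 0
    while i < len(num) and num[i].isalpha():
        i += 1
    prefix, rest = num[:i], num[i:]
    year, _, serial = rest.partition("-")
    return prefix, year, serial

def sort_key(card_number: str):
    prefix, year, serial = split_number(card_number)
    if year.startswith("7"):
        group = 2                       # 7-series numbers file last,
        year = year[0] + year[2:]       # disregarding the check digit (the second digit)
    elif prefix:
        group = 1                       # 6x-series numbers with alphabetic prefixes, by year
    else:
        group = 0                       # plain 6x-series numbers file first
    return (group, year, prefix, serial.rjust(12, "0"))   # right justified for machine sorting

def tape_entry(card_number: str, data_base_code: str) -> dict:
    prefix, year, serial = split_number(card_number)
    return {
        "sort_form": sort_key(card_number),
        "printed_form": f"{prefix}{year}-{serial.lstrip('0')}",   # zeros deleted, as on the printed card
        "data_base": data_base_code,    # e.g. marc i data base, marc ii data base, practice tape
    }

numbers = ["68-004321", "agr68-000123", "79-0012345", "69-000007"]
for n in sorted(numbers, key=sort_key):
    print(tape_entry(n, "marc ii"))
```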
the lc card number print routine prints the card numbers, which are in numeric sequence as described in the preceding paragraphs, from the print index tape. each page of the listing contains a heading, a running index, a date, and a page number. the program prints 200 card numbers and data base codes per page. the numbers are in ascending order, top to bottom, in four columns of 50 numbers each. format recognition the experience of the library in the creation of machine readable cataloging records during the marc pilot project and the marc distribution service has clearly demonstrated that the highest cost factor of conversion is the human editing and proofing. the editing presently consists of assigning tags and codes to the bibliographic record to explicitly identify the content of the record for machine manipulation. the library has completed a format recognition feasibility study which concluded that the probability of success of automatically assigning tags and codes by computer is high. since the format recognition feasibility study was only concerned with cataloging records for current english language monographs, the study must be extended to cover other roman alphabet languages and, as part of recon, records which were created according to different rules and conventions. although the progress report submitted to clr included the definition and status of each of the tasks that make up the format recognition program, these have been omitted to avoid duplication with an article recently published in the journal of library automation (5) describing format recognition concepts in some detail and elaborating on the tasks completed and projected at that time. investigation of input devices the investigation of input devices and the testing of several selected devices in an operational mode will continue throughout recon. a study of the use of a mini-computer operating in an on-line mode for input, editing, and formatting of marc records is in progress at the library and will supplement the recon effort and provide additional data. a preliminary investigation was begun of optical character readers commercially available and in the developmental phases. only those readers capable of reading numerous characters on many lines (page reader) as opposed to a limited number of characters or lines per document (document reader) were included in the study. the machines evaluated were considered as possible candidates if they were capable of processing upper- and lower-case alphabetic characters, numerals, standard punctuation, and some special symbols. each manufacturer has specifications for the type of paper required and the font style which can be recognized. paper handling is a major drawback of optical character readers. excessive handling of the paper or any type of smear, crease, or crinkle could cause rejection of a character or conversion of a character to some specified symbol indicating an invalid character. error rates for the devices considered range from one to 35 characters per 10,000 characters, and 80% of the errors are caused by paper handling. typewriters used to prepare the source document must be constantly cleaned and ribbons changed to keep impact keys free of dirt. frequent jamming appears to be a characteristic of most machines; unjamming these machines can be difficult and is highly dependent upon the skill of the operator.
ten companies that have various types of optical character recognition equipment commercially available were considered in the first study. five were immediately rejected because their devices did not meet the criteria as specified above. 112 journal of libmry automation vol. 3/2 june, 1970 the devices remaining had the following characteristics: control data corporation 915 page reader. accepts 2.5x4 to 12x14inch paper; ocr-a standard type font; recognizes upper-case alphas, numerals, and standard punctuation; through programming and use of special symbols, lower-case alphas can be coded. farrington model 3030. accepts 4.5x5.5 to 8.5x13.5inch paper; ocr-a standard and 12l (farrington) type fonts; recognizes uppercase alphas, numerals, standard punctuation and special symbols; through programming and use of special symbols, lower-case alphas can be coded. scan-data models 100/300. accepts 8.5xll-inch paper; multi-type fonts; recognizes upperand lower-case alphas, numerals, standard punctuation, and special symbols; has programmable unit for formatting. philco-ford general purpose reader. accepts 5.7x8.5x11 inch paper; multi-type fonts; recognizes upper-case alphas, numerals, standard punctuation and special symbols; through programming and use of special symbols, lower-case alphas can be coded. recognition equipment retina. accepts 3.25x4.88 to 14.14-inch paper; multi-type fonts; recognizes upperand lower-case alphas, numerals, standard punctuation, and special symbols; has a programmable unit for formatting. the possibility exists of using any of these five machines for the input of english language material. the keying of an extraneous character is required with the farrington and control data corporation equipment for lower-case and some special symbols. this is not necessary with philco-ford, scan-data, and recognition equipment machines. since the number of special symbols vary by machine, each machine must be studied to determine a method of coding the entire library character set as developed by the library of congress and this method must be evaluated in terms of the burden placed on the typist. with the added feature of lower-case recognition, the price of the machine increases substantially. adequate information has not been obtained from these companies to give an accurate accounting of cost. it should be noted that the rental price for the majority of optical character readers is high, a factor which will have to be taken into consideration at the time of selection of an input device. the most economic route to recon pilot project/ avram 113 conversion may be through a service bureau, depending on the volume of records to be converted. outlook it is too early in the life of the project to predict the outcome or to describe any factual conclusions. the library of congress is greatly encouraged by the interest expressed in the project and the assistance offered by the members of the advisory committee and the working task force. the scope of the assignments and the fact that all members of the working task force have responsible positions in their own institutions are clear evidence of the spirit of cooperation that has been exhibited by the working task force members and their parent organizations. other members of the library community have been and will continue to be contacted throughout the project for their expertise in certain facets of the many problems under exploration. 
several developing regional networks were requested to describe their plans in the hope that smaller scale efforts would shed some light on the problems involved on a national level. those organizations contacted have responded, and a continuing liaison will be maintained not only to avoid duplication of effort but, more important, to attain a better understanding of how to approach the requirements of future library systems in terms of what is possible today. the report submitted to clr described progress made to november 1, 1969. since that time, the recon production staff has selected all the 1969 titles from the card stock to be included in recon, 5,200 records have been edited, and the first 250 have been forwarded to a service bureau to test its procedures for keying. the staff has begun the selection of the 1968 titles, and out of approximately 26,000 records received to date from the card division 19,000 are recon candidates. the production section continues its training by the proofing of marc records until the recon records are processed through the marc system to provide the required diagnostics for the proofing process. procedures were set up for typing records without any editing and in accordance with the requirements for the format recognition program. sample records selected for testing the procedures were of above-average difficulty in order to include all types of data that might be encountered. the procedures will be continually evaluated until some optimal method is determined. the format recognition algorithms are being evaluated by having recon staff simulate a computer and follow through the logic of the algorithms on actual data. results of the simulation will provide the necessary feedback to adjust the algorithms prior to the coding of the computer programs. detailed design work has begun on the expansion of the marc system to include random access capability and on-line correction. this effort is being coordinated with the card division mechanization project and is considering the requirements of a large-scale conversion activity. although it has a long way to go, recon is on schedule, and for any project concerned with automation, that is an encouraging note. for the moment the future looks bright. acknowledgment the author wishes to thank the recon staff members of the library of congress for their respective reports which were incorporated into the progress report submitted to the council on library resources, inc., and as such, are significant contributions to this paper. without the aid of the council on library resources the recon project would not have become a reality. through three important grants the council has made a major contribution to the project: 1) the first was a grant in support of the recon feasibility study and the working task force that resulted in the recon report; 2) an officer's grant enabling the establishment of the recon production unit to create additional machine readable records not included in the marc distribution service; and 3), most importantly, a grant providing full funding for the two-year pilot project. references 1. library of congress, recon working task force: conversion of retrospective catalog records to machine readable form (washington: library of congress, 1969). 2. ibid., pp. 10-11. 3. ibid., pp. 20-38. 4. anglo-american cataloging rules (chicago: american library association, 1967). 5.
avram, henriette d., et al.: "marc program research and development: a progress report," journal of library automation, 2 (december 1969), 242-265. news and announcements first use of catvlib network: american red cross satellite telecast on may 21, 1981, the american red cross celebrated their one-hundredth birthday by ending their annual conference in washington, d.c., with a special two-hour nationwide satellite telecast. the pssc coordinated distribution of the telecast, which originated from constitution hall in washington, d.c., from 10 a.m. to noon. the program was carried on satcom i, transponder 16 (appalachian community service network), and made available to all cable systems able to receive this transponder. those areas not able to schedule the live program were offered a satellite-transmitted taped feed later in the day. the american red cross had encouraged all its local chapters to initiate program reception in their communities by approaching the local cable system about carrying the event. since the american red cross was offering a free program and trying to saturate as much of the united states as possible, use of the catvlib network in conjunction with this telecast was appropriate. pssc contacted 53 libraries in 23 states that were interested in assuming local coordination for bringing this event to their communities. as the local coordinator, the catvlibs' minimum responsibilities included alerting the cable systems to schedule receiving this program (if the local red cross chapter had not already approached the catv) and contacting the local red cross chapter to offer the catvlibs' facilities for their group viewing and concomitant local celebration. of these fifty-three catvlibs, only seven could not participate because of technical problems. schedule conflicts; lack of catv, red cross, or community interest; and red cross alternative plans were the major factors in prohibiting twelve others from directly participating in hosting the satellite-transmitted program. the remaining thirty-four catvlibs did host community residents in their facilities. evaluation forms revealed a variety of degrees of catvlib participation in coordinating their first satellite event. several catvlibs (though none came to the library for viewing) were instrumental in getting the program into the community and available to all local cable subscribers. advance publicity, birthday cakes and refreshments, sing-alongs, taping for multiple showings, and joint library/chapter pre- and post-event activities are but a few of the ways the individual catvlibs participated. all of the evaluation forms indicated that the catvlibs wanted to be contacted as a potential local site for future satellite events. the following list names the fifty-three catvlibs that were initially contacted to be local coordinators for the red cross one-hundredth birthday satellite telecast. though not all were successful, each catvlib made an effort to bring the program to its community.
colorado boulder public library, boulder connecticut thomaston public library, thomaston florida tarpon springs public library, tarpon springs georgia tri-county regional library, rome idaho pocatello public library, pocatello illinois pekin public library, pekin rockford public library, rockford indiana fort wayne public library, fort wayne monroe county public library, bloomington iowa kirkwood community college telecommunications center, cedar rapids iowa city public library, iowa city kansas abilene public library, abilene newton public library, newton kentu cky lexington public library, lexington louisville public library, louisville camden-carroll library, morehead state university , morehead massachusetts greenfield community college library, greenfield south hadley library system, south hadley minn esota anoka county library, fridley cloquet public library, cloquet crow river regional library, willmar international falls public library, international falls minnesota valley regional library, mankato marshall-lyon county library system, marshall western plains library system, montevideo rochester public library , rochester st. cloud public library, st. cloud missouri st. charles city county library, st. peters new j ersey burlington county college library , pemberton new york albany public library, albany amherst public library , willia msville bethlehem public library, delmar chautauqua-cattaraugus library system, jamestown gates public library, rochester mid-york library system, utica ridge road elementary school library, horseheads north carolina davidson county community college library, lexington ohio greene county district library, xenia public library of columbus and franklin county, columbus news and announcements 315 university of toledo library, toledo pennsylvania altoona area public library, altoona lancaster county library, lancaster monroeville public library , monroeville tennessee memphis/shelby county public library & information center, memphis utah merrill library and learning resources program , utah state university, logan weber county library, ogden virginia arlington county department of libraries, arlington washington edmonds community college library, lynnwood lynnwood public library, lynnwood mountlake terrace public library, mountlake terrace seattle public library, seattle wisconsin middleton public library, middleton nicolet college learning resource center, rhinelander who's who and what's what in library video and cable for librarians interested in who is doing what in video in libraries, or in how to do it themselves, a guidebook has been published by the video a nd cable communications section of the libra ry a nd information technology association . it is the 461-page video and cable guidelines. edited by leslie c hamberlin burk and roberto esteves-two of the most active libra rians in the video field-the book includes papers from donald sager, kandy brandt, arlene farber sirkin, anne hollingsworth , and by burk and esteves. among the topics covered are a description ofthe present operation, future plans, problems, and benefits of video in 250 libraries in the u.s. and canada. the book is spiral-bound and can be used conveniently as a manual for staff development programs. its price is $9. 75 . for additional information, or to order copies (prepaid orders only, please), contact lit a, ala, 50 e. huron st., chicago, il 60611 ; (3 12)944-6780. 316 journal of library automation vol. 
14/4 december 1981 elmig electronic mail arrives the "new arrival" to the library association family this summer is the electronic library membership initiative group. elmig is an organization of individuals established to ensure that electronically delivered information remains accessible to the general public. elmig promotes participation and leadership in the remote electronic delivery of information by publicly supported libraries and nonprofit organizations. the group's efforts are coordinated by richard sweeney, director of the public library of columbus and franklin county; neal kaske, director of oclc's office of research; and kenneth dow lin, director of the pikes peak library district. the first founding goals of elmig are: • identifying services and information best suited for the remote electronic access to and delivery of information; • planning, funding, and developing working demonstrations of library electronic information services; • communicating the availability of electronic library services to the community; • informing the library profession of trends, specific events, and future directions in remote electronic delivery of information; • creating coalitions with organizations in allied fields of interest. organizers of elmig are working within ala to foster interest in , and facilitate the needs of, the electronic library. ala has established a membership initiative group to address the concerns of this group. the electronic library membership initiative group will meet during the ala midwinter meeting in denver. interested individuals are encouraged to attend the meeting scheduled for monday, january 25, 1982, at 2 p.m. in room 2e of the auditorium. interest in elmig/ela has surfaced quickly. the membership group was formed in march, and gathered the 200 signatures needed for official recognition at the ala annual conference in san francisco. some 150 people met at that conference to discuss topics of concern. they decided to continue these discussions at the 1982 midwinter meeting and plan for an elmig program to be presented at philadelphia. elmig aims to address the issues concerning the electronic library on a continuing basis through ongoing interaction of its members. to facilitate this interaction, elmig will use an electronic mail system. further information on elmig and its members is available from richard sweeney at the public library of columbus and franklin county, 28 s. hamilton rd., columbus, oh 43213. see page 317 for subscriber agreement form. heynen to head arl microform project the association of research libraries has hired jeffrey heynen to head a two-year program designed to improve bibliographic access to microform collections in american and canadian libraries. the association has received $20,000 from the council on library resources to initiate the project, and additional funds are anticipated from other sources. heynen brings an extensive background in micrographics and publishing to the project as well as a long-standing commitment to improving the treatment , use, and bibliographic control of microforms in libraries. he has served as chair of the american library association's reproduction of library materials section, and was a participant in earlier groups that laid the foundation for the current arl project. currently president of information interchange corporation, heynen has held executive positions with congressional information service, greenwood press, and redgrave information resources. 
these positions have all included responsibility for the creation of large microform collections. heynen holds memberships in numerous standards-making bodies, including the international organization for standardization (iso), the american national standards institute, and the national micrographics association, and is a lecturer at the university of maryland college of library and information services. the arl microform project is based upon a planning study conducted for the association by richard boss of information systems consultants, inc. its purpose is to stimulate and coordinate the work of libraries, microform publishers, bibliographic utilities, and regional networks in providing bibliographic access to millions of monographic titles in microform that are now inadequately or insufficiently cataloged. since the development of the plan during 1980, there has been keen interest both in the elements of the plan and in the cooperative efforts needed to achieve them. a number of libraries, both arl and non-arl members, are planning to begin or are already entering catalog records for individual titles in microform sets into bibliographic databases. for example, three arl libraries have recently been awarded grants under title ii-c of the higher education act, strengthening research library resources, to catalog major microform sets, entering the resulting records into one of the major utilities. all three libraries (stanford university, university of utah, and indiana university) will be coordinating their efforts with the goals of the arl program. key to these efforts, however, is coordination to ensure that national standards are accepted and followed, to distribute the work load so that as many sets as possible are covered and duplication of effort is avoided, and to ensure that the records are available to all libraries that want to use them. the arl microform project will emphasize building on existing resources, coordinating efforts among the library and publishing communities and the bibliographic utilities, and, where possible, facilitating cooperative projects already planned or under way. heynen will be assisted by an advisory committee composed of representatives of both arl and non-arl libraries, the major bibliographic utilities, and microform publishers. the arl project will operate out of the office of information interchange corporation, 503 11th st., se, washington, dc 20003; (202)544-0291. libraries and publishers interested in participating in the project are urged to contact the project office. subscriber agreement: electronic library membership initiative group ____________ (ala member), applies for membership in the electronic library membership initiative group, electronic mail system, and states that: recitals: a. elmig is an association of individuals whose mission is to ensure that information delivered electronically remains accessible to the general public; and b. elmig seeks to promote participation and leadership in remote electronic delivery of information by publicly supported libraries and nonprofit organizations. now therefore, the above member and oclc agree that: 1. member will deposit with oclc a $100 contribution toward the cost of electronic mail service and attendant expenses for the first year of operation, which is to commence january 1, 1982. the member recognizes that the initial member contribution may not be sufficient to pay for a year of operation and agrees, when invoiced, to make additional payments of $100, or other agreed upon sums, to oclc for the continuation of service. 2. oclc agrees that by accepting member deposits, it will secure electronic mail service for the members of elmig; and 2.1 will place member deposits in a separate elmig account from which oclc will pay the cost of the electronic mail service, u.s. postal mailings, and any other expenses incurred in the administration of ems. 2.2 will provide a year-end accounting of contributions and expenditures to members within a reasonable time after december 31, 1981, and each year-end thereafter. member: by ____________ title ____________ date ____________
nominations sought for lita award nominations are being sought for the library and information technology association's award for achievement. the award is intended to recognize distinguished leadership, notable development or application of technology, superior accomplishments in research or education, or original contributions to the literature of the field. the award may be given to an individual or to a small group of individuals working in collaboration. organized institutions or parts of organized institutions are not eligible. nominations for the award may be made by any member of the american library association and should be submitted by january 15, 1982, to hank epstein, lita awards committee chairperson, 1992 lemnos dr., costa mesa, ca 92626. are these books on your shelf? the special library role in networks: proceedings of a conference, robert w. gibson, jr., ed., 296 p., 1980, isbn 0-87111-279-5, $10.50. reports on the current state of networking and presents a creative approach to special library involvement in network participation and management. special libraries, special issue on information technology and special libraries, april 1981, vol. 72, no. 2, $9.00. the entire issue of this journal is devoted to the technological transformation of the information industry. topics discussed are such advances as computer and telecommunications components, software developments, linking, and modes of access to information systems. bibliographic utilities: a guide for the special librarian, james k. webster, ed., 32 p., 1980, isbn 0-87111-280-7, $3.75. a comparative study of the services offered by the four major north american online bibliographic utilities. orders (prepaid for individuals) may be sent to the special libraries association, order department, box jla, 235 park avenue south, new york, new york 10003. assessing the treatment of patron privacy in library 2.0 literature michael zimmer information technology and libraries | june 2013 abstract as libraries begin to embrace web 2.0 technologies to serve patrons, ushering in the era of library 2.0, unique dilemmas arise regarding protection of patron privacy.
the norms of web 2.0 promote the open sharing of information—often personal information—and the design of many library 2.0 services capitalizes on access to patron information and might require additional tracking, collection, and aggregation of patron activities. thus embracing library 2.0 potentially threatens the traditional ethics of librarianship, where protecting patron privacy and intellectual freedom has been held paramount. as a step toward informing decisions to implement library 2.0 while adequately protecting patron privacy, we must first understand how such concerns are being articulated within the professional discourse surrounding these next-generation library tools and services. the study presented in this paper aims to determine whether and how issues of patron privacy are introduced, discussed, and settled, if at all, within trade publications utilized by librarians and related information professionals.

introduction

in today's information ecosystem, libraries are at a crossroads: several of the services traditionally provided within their walls are increasingly made available online, often by non-traditional sources, both commercial and amateur, thereby threatening the historical role of the library in collecting, filtering, and delivering information. for example, web search engines provide easy access to millions of pages of information; online databases provide convenient gateways to news, images, and videos, as well as scholarship; and large-scale book digitization projects appear poised to make roaming the stacks seem an antiquated notion. further, the traditional authority and expertise enjoyed by librarians have been challenged by the emergence of automated information filtering and ranking systems, such as google's algorithms or amazon's recommendation system, as well as amateur, collaborative, and peer-produced knowledge projects, such as wikipedia, yahoo! answers, and delicious. meanwhile, the professional, educational, and social spheres of our lives are increasingly intermingled through online social networking spaces such as facebook, linkedin, and twitter, providing new interfaces for interacting with friends, collaborating with colleagues, and sharing information. michael zimmer, phd (zimmerm@uwm.edu), a lita member, is assistant professor, school of information studies, and director, center for information policy research, university of wisconsin-milwaukee.

libraries face a key question in this new information environment: what is the role of the library in providing access to knowledge in today's digitally networked world? one answer has been to actively incorporate features of the online world into library services, thereby creating "library 2.0." conceptually, library 2.0 is rooted in the global web 2.0 discussion, and the professional literature often links the two concepts.
according to o’reilly, web 2.0 marks the world wide web’s shift from a collection of individual websites to a computing platform that provides applications for end users and can be viewed as a tool for harnessing the collective intelligence of all web users.1 web 2.0 represents a blurring of the boundaries between web users and producers, consumption and participation, authority and amateurism, play and work, data and the network, reality and virtuality.2 its rhetoric suggests that everyone can and should use new internet technologies to organize and share information, to interact within communities, and to express oneself. in short, web 2.0 promises to empower creativity, to democratize media production, and to celebrate the individual while also relishing the power of collaboration and social networks. library 2.0 attempts to bring the ideology of web 2.0 into the sphere of the library. the term is generally attributed to casey,3 and while over sixty-two distinct viewpoints and seven different definitions of library 2.0 have been advanced,4 there is general agreement that implementing library 2.0 technologies and services means bringing interactive, collaborative, and user-centered web-based technologies to library services and collections.5 examples include • providing synchronous messaging (through instant message platforms, skype, etc.) to allow patrons to chat with library staff for real-time assistance; • using blogs, wikis, and related user-centered platforms to encourage communication and interaction between library staff and patrons; • allowing users to create personalized subject headings for library materials through social tagging platforms like delicious or goodreads; • providing patrons the ability to evaluate and comment on particular items in a library’s collection through rating systems, discussion forums, or comment threads; • using social networking platforms like facebook or linkedin to create online connections to patrons, enabling communication and service delivery online; and • creating dynamic and personalized recommendation systems (“other patrons who checked out this book also borrowed these items”), similar to amazon and related online services. launching such library 2.0 features, however, poses a unique dilemma in the realm of information ethics, especially patron privacy. traditionally, the context of the library brings with it specific norms of information flow regarding patron activity, including a professional commitment to patron privacy (see, for example, american library association’s privacy policy, 6 foerstel,7 gorman,8 and morgan 9). in the library, users’ intellectual activities are protected by decades of established norms and practices intended to preserve patron privacy and confidentiality, most assessing the treatment of patron privacy in library 2.0 literature | zimmer 31 stemming from the ala’s library bill of rights and related interpretations.10 as a matter of professional ethics, most libraries protect patron privacy by engaging in limited tracking of user activities, having short-term data retention policies (many libraries actually delete the record that a patron ever borrowed a book once it is returned), and generally enable the anonymous browsing of materials (you can walk into a public library, read all day, and walk out, and there is no systematic method of tracking who you are or what you’ve read). these are the existing privacy norms within the library context. library 2.0 threatens to disrupt these norms. 
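to make concrete why a feature such as "other patrons who checked out this book also borrowed these items" cuts against these norms, the sketch below shows a minimal item-to-item co-occurrence recommender. it is purely illustrative (the loan-history data structure and function names are hypothetical, not drawn from any vendor's system), but it shows that even the simplest version of this feature only works if the library retains per-patron borrowing histories instead of deleting loan records at return.

```python
from collections import Counter, defaultdict

# hypothetical retained borrowing histories: patron id -> set of item ids.
# a library that deletes loan records at return could not build this table.
loan_histories = {
    "patron-001": {"item-a", "item-b", "item-c"},
    "patron-002": {"item-a", "item-c"},
    "patron-003": {"item-b", "item-c", "item-d"},
}

def cooccurrence_counts(histories):
    """count how often pairs of items appear in the same patron's history."""
    counts = defaultdict(Counter)
    for items in histories.values():
        for item in items:
            for other in items:
                if other != item:
                    counts[item][other] += 1
    return counts

def recommend(item_id, histories, top_n=3):
    """'patrons who borrowed this item also borrowed...' suggestions."""
    counts = cooccurrence_counts(histories)
    return [other for other, _ in counts[item_id].most_common(top_n)]

print(recommend("item-a", loan_histories))  # e.g. ['item-c', 'item-b']
```

the same dependence on accumulated patron data holds, in varying degrees, for the other examples listed above, which is the dilemma the next section takes up.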
in order to take full advantage of web 2.0 platforms and technologies to deliver library 2.0 services, libraries will need to capture and retain personal information from their patrons. revisiting the examples provided above, each relies on some combination of robust user accounts, personal profiles, and access to flows of patrons’ personal information: • providing synchronous messaging might necessitate the logging of a patron's name (or chat username), date and time of the request, e-mail or other contact information, and the content of the exchange with the librarian staff member. • library-hosted blogs or wikis will require patrons to create user accounts, potentially tying posts and comments to patron ip addresses, library accounts, or identities. • implementing social tagging platforms would similarly require unique user accounts, possibly revealing the tags particular patrons use to label items in the collection and who tagged them. • comment and rating systems potentially link patrons’ particular interests, likes, and dislikes to a username and account. • using social networking platforms to communicate and provide services to patrons might result in the library gaining unwanted access to personal information of patrons, including political ideology, sexual orientation, or related sensitive information. • creating dynamic and personalized recommendation systems requires the wholesale tracking, collecting, aggregating, and processing of patron borrowing histories and related activities. across these examples, to participate and benefit from library 2.0 services, library patrons could potentially be required to create user accounts, engage in activities that divulge personal interests and intellectual activities, be subject to tracking and logging of library activities, and risk having various activities and personal details linked to their library patron account. while such library 2.0 tools and services can greatly improve the delivery of library services and enhance patron activities, the increased need for the tracking, collecting, and retaining of data about patron activities presents a challenge to the traditional librarian ethic regarding patron privacy.11 despite these concerns, many librarians recognize the need to pursue library 2.0 initiatives as the best way to serve the changing needs of their patrons and to ensure the library’s continued role in information technology and libraries | june 2013 32 providing professionally guided access to knowledge. longitudinal studies of library adoption of web 2.0 technologies reveal a marked increase in the use of blogs, sharing plugins, and social media between 2008 and 2010.12 in this short amount of time, library 2.0 has taken hold in hundreds of libraries, and the question before us is not whether libraries will move towards library 2.0 services, but how they will do it, and, from an ethical perspective, whether the successful implementation of library 2.0 can take place without threatening the longstanding professional concerns for, and protections of, patron privacy. research questions recognizing that library 2.0 has been implemented, in varying degrees, in hundreds of libraries,13 and is almost certainly being considered at countless more, it is vital to ensure that potential impacts on patron privacy are properly understood and considered. 
as a step toward informing decisions to implement library 2.0 while adequately protecting patron privacy, we must first understand how such concerns are being articulated within the professional discourse surrounding these next-generation library tools and services. the study presented in this paper aims to determine whether and how issues of patron privacy are introduced, discussed, and settled—if at all—within trade publications utilized by librarians and related information professionals. specifically, this study asks the following primary research questions:

rq1. are issues of patron privacy recognized and addressed in literature discussing the implementation of library 2.0 services?

rq2. when patron privacy is recognized and addressed, how is it articulated? for example, is privacy viewed as a critical concern, as something that we will need to simply "get over," or as a non-issue?

rq3. what kind of mitigation strategies, if any, are presented to address the privacy issues related to library 2.0?

data analysis

the study combines content and textual analyses of articles published in professional publications (not peer-reviewed academic journals) between 2005 and 2011 discussing library 2.0 or related web-based services, retrieved through the library, information science, and technology abstracts (lista) and library literature & information science full text databases. the discovered texts were collected in winter 2011 and coded to reflect the source, author, publication metadata, audience, and other general descriptive data. in total, 677 articles were identified discussing library 2.0 and related web-based library services, appearing in over 150 different publications. of the articles identified, 50 percent appeared in 18 different publications, which are listed in table 1.

table 1. top publications with library 2.0 articles (2005–2011)

publication: count
computers in libraries: 51
library journal: 51
information today: 21
library and information update: 21
incite: 20
scandinavian public library quarterly: 18
american libraries: 16
electronic library: 15
online: 14
school library journal: 14
information outlook: 13
mississippi libraries: 13
college & research library news: 12
library hi tech news: 12
library media connection: 12
csla journal (california school library association): 10
knowledge quest: 10
multimedia information and technology: 8

each of the 677 source texts was then analyzed to determine if a discussion of privacy was present. full-text searches were performed on word fragments to ensure the identification of variations in terminology. for example, each text was searched for the fragment "priv" to include hits on both the terms "privacy" and "private." additional searches were performed for word fragments related to "intellectual freedom" and "confidentiality" in order to capture more general considerations related to patron privacy. of the 677 articles discussing library 2.0 and related web-based services, there were a total of 203 mentions of privacy or related concepts in 71 articles. these 71 articles were further refined to ensure the appearances of the word "privacy" and related terms were indeed relevant to the ethical issues at hand (eliminating false positives for mentions of "private university," for example, or mention of a publication's "privacy policy" that happened to be provided in the pdf searched).
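the fragment-based screening described above can be expressed in a few lines of code. the sketch below is a minimal illustration, not the author's actual procedure: the article texts are placeholders, and beyond "priv" the exact fragment strings for intellectual freedom and confidentiality are assumptions based on the method described in the text.

```python
import re

# placeholder corpus: article id -> full text (in the study, 677 texts retrieved
# from lista and library literature & information science full text).
articles = {
    "article-1": "the library's privacy policy requires limited tracking...",
    "article-2": "students at a private university used the new chat service...",
    "article-3": "tagging and blogs were added to the catalog last year...",
}

# word fragments screened for, following the method described above.
fragments = ["priv", "intellectual freedom", "confidential"]

def screen(texts, fragments):
    """return, per article, how many fragment matches occur (0 = no mention)."""
    hits = {}
    for art_id, text in texts.items():
        lowered = text.lower()
        hits[art_id] = sum(len(re.findall(re.escape(frag), lowered))
                           for frag in fragments)
    return hits

hits = screen(articles, fragments)
flagged = [a for a, n in hits.items() if n > 0]  # candidates for manual review
print(hits, flagged)
# false positives such as "private university" still require manual refinement,
# which is why the 71 flagged articles were reduced to a smaller relevant set.
```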
the final analysis yielded a total of 39 articles with relevant mention of patron privacy as it relates to library 2.0, amounting to only 5.8 percent of all articles discussing library 2.0 (see table 2). a full listing of the articles is in appendix a.

table 2. article summary

total articles discussing library 2.0: 677
articles with a hit in "priv" and related text searches: 71 (10.5 percent)
articles with relevant discussion of privacy: 39 (5.8 percent)

the majority of these articles were authored by practicing librarians in both public and academic settings and present arguments for the increased use of web 2.0 by libraries or highlight successful deployment of library 2.0 services. of the 39 articles, only 4 focus primarily on challenges faced by libraries hoping to implement library 2.0 solutions.14 a textual analysis of the 39 relevant articles was performed to assess how privacy was discussed in each. two primary variables were evaluated: the length of discussion and the level of concern. length of discussion was measured qualitatively as high (concern over privacy is explicit or implicit in over 50 percent of the article's text), moderate (privacy is discussed in a substantive section of the article), and minimal (privacy is mentioned, but not given significant attention). the level of concern was measured qualitatively as high (indicated privacy as a critical variable for implementing library 2.0), moderate (recognized privacy as one of a set of important concerns), and minimal (mentioned privacy largely in passing, giving it no particular importance). results of these analyses are reported in table 3.

table 3. length of discussion and level of concern

length of discussion: high 3, moderate 8, minimal 28
level of concern: high 9, moderate 13, minimal 16

of the 39 relevant articles, only three had lengthy discussions of privacy-related issues. as early as 2007, coombs recognized that the potential for personalization of library services would force libraries to confront existing policies regarding patron privacy.15 anderson and rethlefsen similarly engage in lengthy discussions of the challenges faced by libraries wishing to balance patron privacy with new web 2.0 tools and services.16 these three articles represent less than 1 percent of the 677 total articles identified that discussed library 2.0. while only three articles dedicate lengthy discussions to issues of privacy, over half the articles that mention privacy (21 of 39) indicate a high or moderate level of concern. for example, cvetkovic warns that while "privacy is a central, core value of libraries…the features of web 2.0 applications that make them so useful and fun all depend on users sharing private information with the site owners."17 and casey and savastinuk's early discussion of library 2.0 puts these concerns in context for librarians, warning that "libraries should remain as vigilant with protecting customer privacy with technology-based services as they are with traditional, physical library services."18 while 21 articles indicated a high or moderate level of concern over patron privacy, less than half of these provided any kind of solution or strategy for mitigating the privacy concerns related to implementing library 2.0 technologies. overall, 14 of the 39 relevant articles provided privacy solutions of one kind or another.
breeding, for example, argues that librarians must “absolutely respect patron privacy,” 19 and suggests any library 2.0 tools that rely on user data should only be implemented if users must explicitly “opt-in” to having their information collected, a solution also offered by wisniewski in relation to protecting patron privacy with location-based tools.20 rethlefsen goes a step further, proposing libraries take steps to increase the literacy of patrons regarding their privacy and the use of library 2.0 tools, including the use of classes and tutorials to help educate patrons and staff alike. 21 conversely, cvetkovic argues that “the place of privacy in our culture is changing,” and that while “in many ways our privacy is diminishing, but many people…seem not too concerned about it.” 22 as a result, while she argues for only voluntary participation in library 2.0 services, cvetkovic takes a position that information sharing is becoming the new norm, weakening any absolute position regarding protecting patron privacy above all. discussion rq1 asks if issues of patron privacy are recognized and addressed within literature discussing library 2.0 and related web-based library services. of the 677 articles published for professional audiences that discuss library 2.0, only 39 contained a relevant discussion of the privacy issues that stem from this new family of data-intensive technologies, and only 11 of these discussed the issue beyond a passing mention. rq2 asks how the privacy concerns, when present, are articulated. of the 39 articles with relevant discussions of privacy, only 11 make more than a minimal mention of privacy concerns. however, the discussion in 22 of the articles reveals a high or moderate level of concern. this suggests that while privacy might not be a primary focus of discussion, when it is mentioned, even minimally, its importance is recognized. finally, rq3 seeks to understand if any solutions or mitigation strategies related to the privacy concerns are articulated. with only 14 of the 39 articles providing a means for practitioners to address privacy issues, readers of library 2.0 publications are more often than not left with no real solutions or roadmaps for dealing with these vital ethical issues. taken together, the results of this study reveal minimal mention of privacy alongside discussions of library 2.0. less than 6 percent of all 677 articles on library 2.0 include mention of privacy; of these, only 11 make more than a passing mention of privacy, representing less than 2 percent of information technology and libraries | june 2013 36 all articles. of the 39 relevant articles, 22 express more than a minimal concern, but of these, only 9 provide any mitigation strategy. these results suggest that while popular publications targeted at information professionals are giving significant attention to potential for library 2.0 to be a powerful new option for delivering library content and services, there is minimal discussion of how the widespread adoption and implementation of these new tools might impact patron privacy and even less discussion of how to address these concerns. consequently, as the interest in, and adoption of, library 2.0 services increase, librarians and related information practitioners seeking information regarding these new technologies in professional publications will not likely be confronted with the possible privacy concerns, nor learn of any strategies to deal with them. 
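the few mitigation strategies that do surface in this literature, such as breeding's and wisniewski's insistence on explicit opt-in and the traditional practice of deleting loan records at return, can be made concrete with a small sketch. the function and field names below are hypothetical and are not drawn from any particular ils; the sketch only illustrates the pattern of collecting usage data solely for patrons who have consented and discarding it as soon as it is no longer needed.

```python
from datetime import datetime

# hypothetical patron preference store: patron id -> explicit opt-in flag.
opt_in = {"patron-001": True, "patron-002": False}

# loan log used to power personalized features (recommendations, history, etc.).
loan_log = []

def record_loan(patron_id, item_id):
    """log a loan only if the patron has explicitly opted in to data collection."""
    if opt_in.get(patron_id, False):          # default is no collection
        loan_log.append({"patron": patron_id,
                         "item": item_id,
                         "when": datetime.now().isoformat()})

def purge_on_return(patron_id, item_id):
    """mirror the traditional practice of deleting the loan record at return."""
    loan_log[:] = [rec for rec in loan_log
                   if not (rec["patron"] == patron_id and rec["item"] == item_id)]

record_loan("patron-001", "item-a")   # logged: patron opted in
record_loan("patron-002", "item-b")   # silently skipped: no consent
purge_on_return("patron-001", "item-a")
print(loan_log)                       # []
```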
this absence of clear guidance for addressing patron privacy in the library 2.0 era resembles what computer ethicist jim moor would describe as a “policy vacuum”: a typical problem in computer ethics arises because there is a policy vacuum about how computer technology should be used. computers provide us with new capabilities and these in turn give us new choices for action. often, either no policies for conduct in these situations exist or existing policies seem inadequate. a central task of computer ethics is to determine what we should do in such cases, that is, formulate policies to guide our actions. 23 given the potential for the data-intensive nature of library 2.0 technologies to threaten the longstanding commitment to patron privacy, these results show that work must be done to help fill this vacuum. education and outreach must be increased to ensure librarians and information professionals are aware of the privacy issues that typically accompany attempts to implement library 2.0, and additional scholarship must take place to help understand the true nature of any privacy threats and to come up with real and useful solutions to help find the proper balance between enhanced delivery of library services through web 2.0-based tools and the traditional protection of patron privacy. acknowledgements this research was supported by a ronald e. mcnair postbaccalaureate achievement program summer student research grant,and a uw-milwaukee school of information studies internal research grant. the author thanks kenneth blacks, jeremy mauger, and adriana mccleer for their valuable research assistance. assessing the treatment of patron privacy in library 2.0 literature | zimmer 37 references 1. tim o’reilly, “what is web 2.0? design patterns and business models for the next generation of software,” 2005, www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web20.html. 2. michael zimmer, “preface: critical perspectives on web 2.0,” first monday 13, no. 3 (march 2008), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2137/1943. 3. michael casey, “working towards a definition of library 2.0,” librarycrunch (october 21, 2005), www.librarycrunch.com/2005/10/working_towards_a_definition_o.html. 4. walt crawford, “library 2.0 and ‘library 2.0,’” cites & insights 6, no 2 (midwinter 2006): 1–32, http://citesandinsights.info/l2a.htm. 5. michael casey and laura savastinuk, “library 2.0: service for the next-generation library,” library journal 131, no. 14 (september 1, 2006): 40–42; michael casey and laura savastinuk, library 2.0: a guide to participatory library service (medford, nj: information today, 2007).; nancy courtney, library 2.0 and beyond: innovative technologies and tomorrow’s user (westport, ct: libraries unlimited, 2007). 6. american library association, “policy on confidentiality of library records,” www.ala.org/offices/oif/statementspols/otherpolicies/policyconfidentiality. 7. herbert n. foerstel, surveillance in the stacks: the fbi’s library awareness program (new york: greenwood, 1991). 8. michael gorman, our enduring values: librarianship in the 21st century (chicago: american library association, 2000). 9. candace d. morgan, “intellectual freedom: an enduring and all-embracing concept,” in intellectual freedom manual. (chicago: american library association, 2006). 10. 
library bill of rights, american library association, www.ala.org/advocacy/intfreedom/librarybill; american library association, “privacy: an interpretation of the library bill of rights,” www.ala.org/template.cfm?section=interpretations&template=/contentmanagement/conten tdisplay.cfm&contentid=132904 11. rory litwin, “the central problem of library 2.0: privacy,” library juice (may 22, 2006), http://libraryjuicepress.com/blog/?p=68. http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2137/1943 http://www.librarycrunch.com/2005/10/working_towards_a_definition_o.html http://citesandinsights.info/l2a.htm http://www.ala.org/offices/oif/statementspols/otherpolicies/policyconfidentiality http://www.ala.org/advocacy/intfreedom/librarybill http://www.ala.org/template.cfm?section=interpretations&template=/contentmanagement/contentdisplay.cfm&contentid=132904 http://www.ala.org/template.cfm?section=interpretations&template=/contentmanagement/contentdisplay.cfm&contentid=132904 http://libraryjuicepress.com/blog/?p=68 information technology and libraries | june 2013 38 12. zeth lietzau and jamie helgren, u.s. public libraries and the use of web technologies, 2010 (denver: library research service, 2011), www.lrs.org/documents/web20/webtech2010_closerlookreport_final.pdf. 13. ibid. 14. sue anderson, “libraries struggle to balance privacy and patron access,” alki 24, no. 2 (july 2008): 18–28; karen coombs, “privacy vs. personalization,” netconnect (april 15, 2007): 28; milica cvetkovic, “making web 2.0 work–from ‘librarian habilis’ to ‘librarian sapiens,’” computers in libraries 29, no. 9 (october 2009): 14–17, www.infotoday.com/cilmag/oct09/cvetkovic.shtml;, melissa l. rethlefsen, “tools at work: facebook’s march on privacy,” library journal 135, no. 12 (june 2010): 34–35. 15. coombs, “privacy vs. personalization.” 16. anderson, “libraries struggle to balance privacy and patron access.”; melissa l rethlefsen, “facebook’s march on privacy,” library journal 135, no. 12 (2010): 34–35. 17. cvetkovic, “making web 2.0 work.” 18. casey and savastinuk, “library 2.0: service for the next-generation library.” 19. marshall breeding, “taking the social web to the next level,” computers in libraries 30, no. 7 (september 2010): 34–37, www.librarytechnology.org/ltg-displaytext.pl?rc=15053. 20. jeff wisniewski, “location, location, location,” online 33, no. 6 (2009): 54–57. 21. rethlefsen, “tools at work: facebook’s march on privacy.” 22. cvetkovic, “making web 2.0 work,” 17. 23. james moor, “what is computer ethics?” metaphilosophy 16, no. 4 (october 1985): 266–75. http://www.lrs.org/documents/web20/webtech2010_closerlookreport_final.pdf http://www.infotoday.com/cilmag/oct09/cvetkovic.shtml http://www.librarytechnology.org/ltg-displaytext.pl?rc=15053 assessing the treatment of patron privacy in library 2.0 literature | zimmer 39 appendix a: articles with relevant mention of patron privacy as it relates to library 2.0 anderson, sue. “libraries struggle to balance privacy and patron access.” alki 24, no. 2 (july 2008): 18–28. balnaves, edmund. “the emerging world of open source, library 2.0, and digital libraries.” incite 30, no. 8 (august 2009): 13. baumbach, donna j. “web 2.0 and you.” knowledge quest 37, no. 4 (2009): 12–19. breeding, marshall. “taking the social web to the next level.” computers in libraries 30, no. 
7 (september 2010): 34–37. casey, michael e. and laura savastinuk. “library 2.0: service for the next-generation library.” library journal 131, no. 14 (september 1, 2006): 40–42. cohen, sarah f. “taking 2.0 to the faculty why, who, and how.” college & research libraries news 69, no. 8 (september 2008): 472–75. coombs, karen. “privacy vs. personalization.” netconnect (april 15, 2007): 28. coyne, paul. “library services for the mobile and social world.” managing information 18, no. 1 (2011): 56–58. cromity, jamal. “web 2.0 tools for social and professional use.” online 32, no. 5 (october 2008): 30–33. cvetkovic, milica. “making web 2.0 work—from ‘librarian habilis’ to ‘librarian sapiens.’” computers in libraries 29, no. 9 (october 2009): 14–17. eisenberg, mike. “the parallel information universe.” library journal 133, no. 8 (may 1, 2008): 22–25. gosling, maryanne, glenn harper, and michelle mclean. “public library 2.0: some australian experiences.” electronic library 27, no. 5 (2009): 846–55. han, zhiping, and yan quan liu. “web 2.0 applications in top chinese university libraries.” library hi tech 28, no. 1 (2010): 41–62. harlan, mary ann. “poetry slams go digital.” csla journal 31, no. 2 (spring 2008): 20–21. hedreen, rebecca c., jennifer l. johnson, mack a. lundy, peg burnette, carol perryman, guus van den brekel, j. j. jacobson, matt gullett, and kelly czarnecki. “exploring virtual librarianship: second life library 2.0.” internet reference services quarterly 13, no. 2–3 (2008): 167–95. information technology and libraries | june 2013 40 horn, anne, and sue owen. “leveraging leverage: how strategies can really work for you.” in proceedings of the 29th annual international association of technological university libraries (iatul) conference, auckland, nz (2008): 1–10, http://dro.deakin.edu.au/eserv/du:30016672/horn-leveragingleveragepaper-2008.pdf. huwe, terence. “library 2.0, meet the ‘web squared’ world.” computers in libraries 31, no. 3 (april 2011): 24–26. “idea generator.” library journal 134, no. 5 (1976): 44. jayasuriya, h. kumar percy, and frances m. brillantine. “student services in the 21st century: evolution and innovation in discovering student needs, teaching information literacy, and designing library, 2.0-based student services.” legal reference services quarterly 26, no. 1–2 (2007): 135–70. jenda, claudine a., and martin kesselman. “innovative library 2.0 information technology applications in agriculture libraries.” agricultural information worldwide 1, no. 2 (2008): 52–60. johnson, doug. “library media specialists 2.0.” library media connection 24, no.7 (2006): 98. kent, philip g. “enticing the google generation: web 2.0, social networking and university students.” in proceedings of the 29th annual international association of technological university libraries (iatul) conference, auckland, nz (2008), http://eprints.vu.edu.au/800/1/kent_p_080201_final.pdf. krishnan, yyvonne. “libraries and the mobile revolution.” computers in libraries 31, no. 3 (april 2011): 5–9. li, yiu-on, irene s. m. wong, and loletta p. y. chan. “mylibrary calendar: a web 2.0 communication platform.” electronic library 28, no. 3 (2010): 374–85. liu, shu. “engaging users: the future of academic library web sites.” college & research libraries 69, no. 1 (january 2008): 6–27. mclean, michelle. “virtual services on the edge: innovative use of web tools in public libraries.” australian library journal 57, no. 4 (november 2008): 431–51. oxford, sarah. 
“being creative with web 2.0 in academic liaison.” library & information update 5 (may 2009): 40–41. rethlefsen, melissa. “facebook’s march on privacy.” library journal 135, no. 12 (2010): 34–35. schachter, debbie. “adjusting to changes in user and client expectations.” information outlook 13, no. 4 (2009): 55. http://dro.deakin.edu.au/eserv/du:30016672/horn-leveragingleveragepaper-2008.pdf http://eprints.vu.edu.au/800/1/kent_p_080201_final.pdf assessing the treatment of patron privacy in library 2.0 literature | zimmer 41 shippert, linda crook. “thinking about technology and change, or, ‘what do you mean it’s already over?’” pnla quarterly 73, no. 2 (2008): 4, 26. stephens, michael. “the ongoing web revolution.” library technology reports 43, no. 5 (2007): 10–14. thornton, lori. “facebook for libraries.” christian librarian 52, no. 3 (2009): 112. trott, barry and kate mediatore. “stalking the wild appeal factor.” reference & user services quarterly 48, no. 3 (2009): 243–46. valenza, joyce kasman. “a few new things.” lmc: library media connection 26, no. 7 (2008): 10– 13. widdows, katharine. “web 2.0 moves 2.0 quickly 2.0 wait: setting up a library facebook presence at the university of warwick.” sconul focus 46 (2009): 54–59. wisniewski, jeff. “location, location, location.” online 33, no. 6 (2009): 54–57. woolley, rebecca. “book review: information literacy meets library 2.0: peter godwin and jo parker (eds.).” sconul focus 47, (2009): 55–56. wyatt, neal. “2.0 for readers.” library journal 132, no. 18 (2007): 30–33. a comparative analysis of the effect of the integrated library system on staffing models in academic libraries ping fu and moira fitzgerald information technology and libraries | september 2013 47 abstract this analysis compares how the traditional integrated library system (ils) and the next-generation ils may impact system and technical services staffing models at academic libraries. the method used in this analysis is to select two categories of ilss—two well-established traditional ilss and three leading next-generation ilss—and compare them by focusing on two aspects: (1) software architecture and (2) workflows and functionality. the results of the analysis suggest that the nextgeneration ils could have substantial implications for library systems and technical staffing models in particular, suggesting that library staffing models could be redesigned and key librarian and staff positions redefined to meet the opportunities and challenges brought on by the next-generation ils. introduction today, many academic libraries are using well-established traditional integrated library systems (ilss) built on the client-server computing model. the client-server model aims to distribute applications that partition tasks or workloads between the central server of a library automation system and all the personal computers throughout the library that access the system. the client applications are installed on the personal computers and provide a user-friendly interface to library staff. however, this model may not significantly reduce workload for the central servers and may increase overall operating costs because of the need to maintain and update the client software across a large number of personal computers throughout the library. 1 since the global financial crisis, libraries have been facing severe budget cuts, while hardware maintenance, software maintenance, and software licensing costs continue to rise. 
the technology adopted by the traditional ils was developed more than ten years ago and is evidently outdated. the traditional ils does not have sufficient capacity to provide efficient processing for meeting the changing needs and challenges of today’s libraries, such as managing a wide variety of licensed electronic resources and collaborating, cooperating, and sharing resources with different libraries.2 ping fu (pingfu@cwu.edu), a lita member, is associate professor and head of technology services in the brooks library, central washington university, ellensburg, wa. moira fitzgerald (moira.fitzgerald@yale.edu), a lita member, is access librarian and assistant head of access services in the beinecke rare book and manuscript library, yale university, new haven, ct. mailto:pingfu@cwu.edu mailto:moira.fitzgerald@yale.edu a comparative analysis of the effect of the integrated library system on staffing models in academic libraries | fu and fitzgerald 48 today’s libraries manage a wide range of licensed electronic resource subscriptions and purchases. the traditional ils is able to maintain the subscription records and payment histories but is unable to manage details about trial subscriptions, license negotiations, license terms, and use restrictions. some vendors have developed electronic resources management system (erms) products as standalone products or as fully integrated components of an ils. however, it would be more efficient to manage print and electronic resources using a single, unified workflow and interface. to reduce costs, today’s libraries not only band together in consortia for cooperative resource purchasing and sharing, but often also want to operate one “shared ils” for managing, building, and sharing the combined collections of members.3 such consortia are seeking a new ils that exceeds traditional ils capabilities and uses new methods to deliver improved services. the new ils should be more cost effective, should provide prospects for cooperative collection development, and should facilitate collaborative approaches to technical services and resource sharing. one example of a consortium seeking a new ils is the orbis cascade alliance, which includes thirty-seven universities, colleges, and community colleges in oregon, washington, and idaho. as a response to this need, many vendors have started to reintegrate or reinvent their ilss. library communities have expressed interest in the new characteristics of these next-generation ilss; their ability to manage print materials, electronic resources, and digital materials within a unified system and a cloud-computing environment is particularly welcome.4 however, one big question remains for libraries and librarians, and that is what implications the next-generation ils will have on libraries’ staffing models. little on this topic has been presented in the library literature. this comparative analysis intends to answer this question by comparing the nextgeneration ils with the traditional ils from two perspectives: (1) software architecture, and (2) workflows and functionality, including the capacity to facilitate collaboration between libraries and engage users. scope and purpose the purpose of the analysis is to determine what potential effect the next-generation ils will have on library systems and technical services staffing models in general. two categories of ilss were chosen and compared. the first category consists of two major traditional ilss: ex libris’s voyager and innovative interfaces’ millennium. 
the second category includes three nextgeneration ilss: ex libris’s alma, oclc’s worldshare management services (wms), and innovative interfaces’ sierra. voyager and millennium were chosen because they hold a large portion of current market shares and because the authors have experience with these systems. yale university library is currently using voyager, while central washington university library is using millennium. alma, wms, and sierra were chosen because these three next-generation ilss are produced by market leaders in the library automation industry. the authors have learned about these new products by reading and analyzing literature and vendors’ proposals, as well as information technology and libraries | september 2013 49 attending vendors’ webinars and product demonstrations. in the long run, yale university library must look for a new library service platform to replace voyager, verde, metalib, sfx, and other add-ons. central washington university library is affiliated with the orbis cascade alliance mentioned above. the alliance is implementing a new library management service to be shared by all thirty-seven members of the consortium. ex libris, innovative interfaces, oclc, and serials solutions all bid for the alliance’s shared ils. after an extensive rfp process, in july 2012 the orbis cascade alliance decided to choose ex libris’s alma and primo as their shared library services platform. the system will be implemented in four cohorts of approximately nine member libraries each over a two-year period, beginning in january 2013. the central washington university library is in the forth migration cohort, and their new system will be live in december 2014. it is important to emphasize that the next-generation ils has no local online public access catalog (opac) interface. vendors use additional discovery products as the discovery-layer interfaces for their next-generation ilss. specifically, ex libris uses primo as the opac for alma, while oclc’s worldcat local provides the front-end interface for wms. innovative interfaces offers encore as the discovery layer for sierra. as front-end systems, these discovery platforms provide library users with one-stop access to their library resources, including print materials, electronic resources, and digital materials. while these discovery platforms will also impact library organization and librarianship, they will have more impact on the way that end-users, rather than library staff, discover and interact with library collections. in this analysis, we focus on the effects that back-end systems such as alma, wms, and sierra will have on library organizational structure and staffing, rather than the end-user experience. as our sample only includes five ilss, the scope of the analysis is limited, and the findings cannot be universal or extended to all academic libraries. however, readers will gain some insight into what challenges any library may face when migrating to a next-generation ils. literature review a few studies have been published on library staffing models. patricia ingersoll and john culshaw’s 2004 book about systems librarianship describes vital roles that systems librarians play, with responsibilities in the areas of planning, staffing, communication, development, service and support, training, physical space, and daily operations. 5 systems librarians are the experts who understand both library and information technology and can put the two fields together to context. 
they point out that system librarians are the key players who ensure that a library stays current with new information technology. the daily and periodic operations for systems librarians include ils administration, server management, workstation maintenance, software and applications maintenance and upgrades, configuration, patch management, data backup, printing issues, security, and inventory. all of these duties together constitute the workloads of systems librarians. ingersoll and culshaw also emphasize that systems librarians must be proactive in facing constant changes and keep abreast of emerging library technologies. a comparative analysis of the effect of the integrated library system on staffing models in academic libraries | fu and fitzgerald 50 edward iglesias et al., based on their own experiences and observations at their respective institutions, studied the impact of information technology on systems staff.6 their book covers concepts such as the client-server computing model, web 2.0, electronic resource management, open-source, and emerging information technologies. their 2004 studies show that, tough there are many challenges inherent in the position, there are also many ways for system staff to improve their knowledge, skills, and abilities to adapt to the changing information technologies. janet guinea has also studied the roles of systems librarians at an academic library.7 her 2003 study shows that systems librarians act as bridge-builders between the library and other university units in the development of library-initiated projects and in the promotion of information technology-based applications across campus. another relevant study was conducted by marshall breeding at vanderbilt university in an investigation of the library automation market. his 2012 study compares the well-established, traditional ilss that dominate the current market (and are based on client-server computing architecture developed more than a decade ago) to the next-generation ilss deployed through multitenant software-as-a-service (saas) models, which are based on service-oriented architecture (soa).8 through this comparison, breeding indicates that next-generation ilss will differ substantially from existing traditional ilss and will eliminate many hardware and maintenance investments for libraries. the next-generation ils will bring traditional ils functions, erms, digital asset management, link resolvers, discovery layers, and other add-on products together into one unified service platform, he argues.9 he gave the next-generation ils a new term, library services platform.10 this term signifies that a conceptual and technical shift is happening: the next-generation ils is designed to realign traditional library functions and simplify library operations through a more inclusive platform designed to handle different forms of content within a unified single interface. 
breeding’s findings conclude that the next-generation ils provides significant innovations, including management of print and electronic library materials, reliance on global knowledge bases instead of localized databases, deployment through multitenant saas based on a service-oriented architecture, and the provision of a suite of application programming interfaces (apis) that enable greater interoperability and extensibility.11 he also predicts that the next-generation ils will trigger a new round of ils migration.12 method our method narrowed down the analysis for the implications of ilss on library systems and technical services staffing models to two major aspects: (1) software architecture, and (2) workflows and functionality, including facilitation of collaborations between libraries and user engagement. first, we analyzed two traditional ilss, voyager and millennium, which are built on a client-server computing model, deliver modular workflow functionality, and are implemented in our institutions. through the analysis, we determined how these two aspects affect library organizational structure and librarian positions designed for managing these modular tasks. then, information technology and libraries | september 2013 51 based on information we collected and grouped from vendors’ documents, rfp responses, product demonstrations, and webinars, we examined the next-generation ilss alma, wms, and sierra— which are based on soa and intended to realign traditional library functions and simplify library operations—to evaluate how these two factors will impact staffing models. to provide a more in-depth analysis, particularly for systems staffing models, we also gathered and analyzed online systems librarian job postings, particularly for managing the voyager or millennium system, for the past five years. the purpose of this compilation is to cull a list of typical responsibilities of systems librarians and then determine what changes may occur when they must manage a next-generation ils such as alma, wms, or sierra. data on job postings were gathered from online job banks that keep an archive of past listings, including code4lib jobs, ala joblist, and various university job listing sites. duplicates and reposts were removed. the responsibilities and duties described in the job descriptions were examined for similarities to determine a typical list. the data from all sources were gathered together in a single database to facilitate its organization and manipulation. specific responsibilities, such as administering an ils, were listed individually, while more general responsibilities for which descriptions may vary from one posting to another were grouped under an appropriate heading. to ensure complete coverage, all postings were examined a second time after all categories had been determined. we also used our own institutions as examples to support the analysis. the implications of ils software architecture on staffing models voyager and millennium are built on client-server architecture. libraries that use these ilss also use add-ons, such as erms and link resolvers, to manage their print materials and licensed electronic resources. the installation, configuration, and updates of the client software require a significant amount of work for library it staff. many libraries must allocate substantial staff effort and resources to coordinating the installation of the new software on all computers throughout the library that access the system. 
those libraries that allow staff to work remotely have experienced additional costs and it challenges. in addition, server maintenance, backups, upgrades, and disaster recovery also require excessive time and effort of library it staff. administering ilss, erms, and other library hardware, software, and applications is one of the primary responsibilities for a library systems department. positions such as systems librarian, electronic resource librarian, and library it specialist were created to handle this complicated work. at a very large library, such as yale university library, the systems group of library it is only responsible for voyager’s configuration, operation, maintenance, and troubleshooting. two other it support groups—a library server support group and a workstation support group—are responsible for installation, maintenance, and upgrade of the servers and workstations. specifically, the library server support group deals with the maintenance and upgrade of ils servers and the software and relational database running on the servers, while the workstation support group takes care of the installation and upgrade of the client software on hundreds of a comparative analysis of the effect of the integrated library system on staffing models in academic libraries | fu and fitzgerald 52 workstations throughout twenty physical libraries. at a smaller library, such as central washington university library, on the other hand, one systems librarian is responsible for the administration of millennium, including configuration, maintenance, backup, and upgrade on the server. another library it staff member helps install and upgrade the millennium client on about forty-five staff computers throughout its main library and two center campus libraries. comparatively, the next-generation ilss alma, wms, and sierra have a saas model designed by soa principles and deployed through a cloud-based infrastructure. oclc defines this model as “web-scale management services.”13 using this innovation, service providers are able to deliver services to their participating member institutions on a single, highly scalable platform, where all updates and enhancements can be done automatically through the internet. the different participating member institutions using the service can configure and customize their views of the application with their own brandings, color themes, and navigational controls. the participating member institutions are able to set functional preferences and policies according to their local needs. web-scale services reduce the total cost of ownership by spreading infrastructure costs across all the participating member institutions. the service providers have complete control over hardware and software for all participating member institutions, dramatically eliminating capital investments on local hardware, software, and other peripheral services. service providers can centrally implement applications and upgrades, integration across services, and system-wide infrastructure requirements such as performance reliability, security, privacy, and redundancy. thus participating member institutions are relieved from this burdensome responsibility that has traditionally been undertaken by their it staff.14 from this perspective, the next-generation ils will have a huge impact on library organizational structure, staffing, and librarianship. 
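the multitenant idea behind these web-scale platforms can be sketched conceptually: a single vendor-hosted application serves every participating institution, and each institution supplies only its own branding and policy settings rather than running its own servers and client installs. the structure below is a simplified illustration under that assumption; the setting names are hypothetical and do not correspond to actual alma, wms, or sierra configuration options.

```python
# one shared, vendor-hosted deployment; each tenant holds only local preferences.
TENANTS = {
    "university-a": {"branding": "crimson", "loan_period_days": 21, "fine_per_day": 0.25},
    "college-b":    {"branding": "forest",  "loan_period_days": 14, "fine_per_day": 0.10},
}

def loan_policy(institution_id):
    """resolve circulation policy for one tenant on the shared platform."""
    return TENANTS[institution_id]

def apply_platform_upgrade(new_feature):
    """an upgrade applied once by the vendor is visible to every tenant;
    there are no local servers to patch and no client installs to coordinate."""
    for settings in TENANTS.values():
        settings.setdefault("features", []).append(new_feature)

apply_platform_upgrade("unified-erm-workflow")
print(loan_policy("college-b"))
```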
since the next-generation ils is implemented through the cloud-computing model, there is no requirement for local staff to perform the functions traditionally defined as "systems" staff activities, such as server and storage administration, backup and recovery administration, and server-side network administration. for example, the entire interfaces of alma and wms are served via web browser; there is no need for local staff to install and maintain clients on local workstations. therefore, if an institution decided to migrate to a next-generation ils, the responsibilities and roles of systems staff within the institution would need to be readdressed or redefined. we have learned from attending oclc's webinars and product demonstrations that library systems staff would be required to prepare and extract data from their local systems during new systems implementation. they also would be required to configure their own settings, such as circulation policies. however, after the migration, a systems staff member would likely serve as a liaison with the vendor. this would require, according to oclc's proposal, only 10 percent of the systems staff's time on an ongoing basis. through attending ex libris's webinars and product demonstrations, we have learned that a local system administrator may be required to take on basic management processes, such as record loading or integrating data from other campus systems. similarly, we have learned from innovative interfaces' webinars and product demonstrations that sierra would still need local systems expertise to perform the installations of the client software on staff workstations. sierra would require library it staff to perform administrative tasks like user account administration and to support sierra in interfacing with local institution-specific resources. in general, as shown in table 1, local systems staff could be freed from the burdensome responsibility of administering the traditional ils because of the software architecture of the next-generation ils.

table 1. systems librarian responsibilities comparison for traditional ils and next-generation ils

systems librarian responsibility | workload percentage | traditional ils | next-gen ils
managing ils applications, including modules and the opac | 10 | x |
managing associated products such as discovery systems, erms, link resolver, etc. | 10 | x |
day-to-day operations, including management, maintenance, troubleshooting, and user support | 10 | x | x
server maintenance, database maintenance, and backup | 10 | x |
customizations and integrations | 5 | x | x
configurations | 5 | x | x
upgrades and enhancements | 5 | x |
patches or other fixes | 5 | x |
design and coordination of statistical and managerial reports | 5 | x | x
overall staff training | 5 | x | x
primary representative and contact to the designated library system vendors | 5 | x | x
keeping abreast of developments in library technologies to maintain current awareness of information tools | 5 | x | x
engaging in scholarly pursuit and other professional activities | 10 | x | x
serving on various teams and committees | 5 | x | x
reference and instruction | 5 | x | x
total | 100 | 100% | 60%

note: the systems librarian responsibilities and the approximate percentage of time devoted to each function are slightly readjusted based on the compiled descriptions of the systems librarian job postings we collected and analyzed from the internet and from vendors' claims.
a total of 47 position descriptions were gathered. the workload percentage is adopted from the job description of the systems librarian position at one of our institutions. our analysis shows that systems staff might reduce their workload by approximately 40 percent. therefore, library systems staff could use their time to focus on local applications development and other library priority projects. however, it is important to emphasize that library systems staff should reengineer themselves by learning how to use the apis provided by the next-generation ils so that they will be able to support the customization of their institutions' discovery interfaces and the integration of the ils with other local enterprise systems, such as financial management systems, learning management systems, and other local applications.

the implications of ils workflows and functionality on staffing models

the typical workflow and functionality of both voyager and millennium are built on a modular structure. major function modules, called client modules, include systems administration, cataloging, acquisitions, serials, circulation, and statistics and reports. additionally, the traditional ils provides an opac interface for library patrons to access library materials and manage their accounts. millennium has an erms module built in as a component of its ils, while ex libris has developed an independent erms as an add-on to voyager. the systems administration module is used to add system users and to set up locations, patron types, material types, and other library policies. the cataloging module supports the functions of cataloging resources, managing the authority files, tagging and categorizing content, and importing and exporting bibliographic records. the sophistication of the cataloging module depends primarily on the ils. the acquisitions module helps in the tracking of purchases and acquisition of materials for a library by facilitating ordering, invoicing, and data exchange with serial, book, and media vendors through electronic data interchange (edi). the circulation module is used to set up rules for circulating materials and for tracking those materials, allowing the library to add patrons, issue borrowing cards, and define loan rules. it also automates the placing of holds, interlibrary loan (ill), and course reserves. self-checkout functionality can be integrated as well. the serials module is essentially a cataloging module for serials. libraries are often dependent on the serials module to help them track and check in serials. the statistics and reports module is used to generate reports such as circulation statistics, age of collection, collection development, and other customized statistical reports. a typical traditional ils comprises a relational database, software to interact with that database, and two graphical user interfaces—one for patrons and one for staff. it usually separates software functions into discrete modules, each of them integrated with a unified interface. the traditional ils's modular design was a perfect fit for a traditional library organizational structure.
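a toy schema makes the "relational database plus discrete modules" design concrete. this is a deliberately simplified sketch, with invented table and column names rather than any vendor's actual schema, showing how cataloging, circulation, and acquisitions each operate on their own slice of one relational store through their own client module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- cataloging module owns bibliographic and item records
    CREATE TABLE bib  (bib_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, bib_id INTEGER, barcode TEXT);
    -- circulation module owns patrons and loans
    CREATE TABLE patron (patron_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE loan   (item_id INTEGER, patron_id INTEGER, due_date TEXT);
    -- acquisitions module owns orders
    CREATE TABLE purchase_order (po_id INTEGER PRIMARY KEY, bib_id INTEGER, vendor TEXT);
""")

# each staff client module exposes functions over its own tables only
def catalog_title(title):
    cur = conn.execute("INSERT INTO bib (title) VALUES (?)", (title,))
    return cur.lastrowid

def check_out(item_id, patron_id, due_date):
    conn.execute("INSERT INTO loan VALUES (?, ?, ?)", (item_id, patron_id, due_date))

bib_id = catalog_title("our enduring values")
conn.execute("INSERT INTO item (bib_id, barcode) VALUES (?, ?)", (bib_id, "0001"))
check_out(1, 1, "2013-10-01")
print(conn.execute("SELECT COUNT(*) FROM loan").fetchone())  # (1,)
```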
the staff at central washington university library, for example, under the library administration, are organized into three major groups: public services, including the reference and circulation departments; technical and technology services, including the cataloging, collection development, serials and electronic resources, and systems departments; and other library services and centers, including the government documents department, the music library, two center campus libraries, the academic and research commons, and the rare book collection and archive. each department has at least one professional librarian and other library staff members responsible for its daily operations. for example, the collection development librarian is responsible for the acquisition of print monographs and serials, while the electronic resources librarian is responsible for purchasing and managing licensed databases and e-journals. the next-generation ils, however, significantly enhances and reintegrates the workflow of traditional ils functions. its functionality is quite different from the traditional ils's modular structure: the design stresses two principles, modularity and extensibility. it brings together the selection, acquisition, management, and distribution of the entire library collection and provides a centralized data-services environment for unified workflows across all types of library assets. one of the big enhancements of the next-generation ils is the acquisitions module, which enables the management of both print and electronic materials within a single unified interface, with no need to move between modules or multiple systems for different formats and related activities. for example, according to oclc, wms streamlines selection and acquisition processes via built-in access to worldcat records and publisher data. vendor, local, consortium, and global library data share the same workflows. wms automatically creates holdings for both physical and electronic resources, the worldcat knowledge base simplifies electronic resource management and delivery, and order data from external systems can be uploaded automatically. for consortium users, wms's unified workflow and interface foster efficient resource sharing between institutions whose holdings share a common format. similarly, ex libris's alma has an integrated central knowledge base (ckb) that describes available electronic resources and packages, so there is no need to load additional descriptive records when acquiring electronic resources based on the ckb. the purchasing workflow manages orders for both print and electronic resources in a very similar way and handles aspects unique to electronic resources, such as license management and the identification of an access provider. staff users can start the ordering process by searching the ckb directly and ordering from there; this search is integrated into the repository search, allowing a staff user to search both his or her own institution and the community zone, which holds the ckb. the next-generation ils provides unified data services and workflows and a single interface to manage all physical, electronic, and digital materials. this will require libraries to rethink their acquisitions staffing models: small libraries, for example, could merge the acquisitions librarian and electronic resources librarian positions or reorganize the two departments.
another functionality enhancement of the next-generation ils is the ability for consortial users to manage local holdings and collections as well as shared resources. for example, wms's single shared knowledge base eliminates the need for each library to maintain a copy of a knowledge base locally, because all consortium members can easily see what is licensed by other members. cataloging records are shared at the consortium and global levels in real time, so each institution immediately benefits from original cataloging records added to the system and from enhancements to existing records. authority control is built into worldcat, so there is no need to do authority processing against local bibliographic databases, and with real-time circulation between libraries' collections there is no need to re-create bibliographic and item data in separate local systems. similarly, sierra enhances traditional technical services workflows by providing a shared bibliographic database: whenever a member library performs selection or ordering, it can determine whether other consortium members have already selected, ordered, and cataloged the title. this may influence a local selection, allowing consortium members to develop their individual collections more collectively and reduce duplication. alma's centralized metadata management service (mms) takes a very similar approach to wms and sierra, allowing several options for local control and shared cataloging, depending on an institution's needs, while ex libris maintains the authority files. very large institutions, for example, might manage some records in the local catalog and most records in a shared bibliographic database, while smaller institutions might manage all of their records in the shared bibliographic database. all of these approaches require more collaboration and cooperation between consortium members. according to the vendors' proposals to the orbis cascade alliance, small institutions might not need a professional cataloger, since the cataloging process is simplified and it is therefore easier for paraprofessional staff to copy bibliographic records from the knowledge bases of these ilss. in addition, the next-generation ils also allows library users to engage actively with ils software development. for example, by adding opensocial containers to the product, wms allows library developers to use apis to build social applications called gadgets and add these gadgets to wms. one example highlighted by oclc is a gadget in the acquisitions area of wms that shows the latest new york times best sellers and how many copies the library has available for each of those titles. similarly, sierra's open developer community will allow library developers to share ideas, reference code samples, and build a wide range of applications using sierra's web services, and sierra will provide a centralized online resource, the sierra developer sandbox, offering a comprehensive library of documented apis for library-developed applications. all these enhancements provide library staff with new opportunities to redefine their roles in a library.
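as a rough illustration of the best-seller gadget idea described above, the python sketch below checks a list of popular titles against library holdings through a web api. the best-seller feed, the holdings endpoint, and the field names are all hypothetical placeholders, and actual wms gadgets are built as opensocial javascript/xml components; the sketch only shows the underlying "look up each title and count available copies" logic.

import requests

# hypothetical endpoints; a real gadget would use the vendor's documented web services
BESTSELLER_FEED = "https://example.org/api/bestsellers/current"
HOLDINGS_API = "https://ils.example.edu/api/v1/holdings"

def available_copies(isbn):
    """ask the (hypothetical) ils holdings api how many copies are available."""
    resp = requests.get(HOLDINGS_API, params={"isbn": isbn, "status": "available"}, timeout=30)
    resp.raise_for_status()
    return len(resp.json()["items"])

def bestseller_report():
    """pair each current best seller with the number of available local copies."""
    feed = requests.get(BESTSELLER_FEED, timeout=30).json()
    return [(book["title"], available_copies(book["isbn"])) for book in feed["books"]]

if __name__ == "__main__":
    for title, copies in bestseller_report():
        print(f"{title}: {copies} copies available")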
conclusions and arguments

in summary, compared to the client-server architecture and modular design of the traditional ils, the next-generation ils has an open architecture and is more flexible and unified in its workflow and interface, which will have a huge impact on library staffing models. the traditional ils specifies clear boundaries between staff modules and workflows, while the next-generation ils has blurred these boundaries. the integration and enhancement of the functionality of the next-generation ils will help libraries streamline and automate workflows and processes for managing both print and electronic resources. it will increase libraries' operational efficiency, reduce the total cost of ownership, and improve services for users. in particular, it will free approximately 40 percent of library systems staff time from managing servers, software upgrades, client application upgrades, and data backups. moreover, the next-generation ils provides a new way for consortial libraries to collaborate, cooperate, and share resources. in addition, the web-scale services provided by the next-generation ils allow libraries to access an infrastructure and platforms that enable them to reach a broad, geographically diverse community while simultaneously focusing their services on meeting the specific needs of their end users. thus the more integrated workflows and functionality allow library staff to work with more modules, play multiple roles, and back each other up, which will bring changes to traditional staffing models.

however, the next-generation ils also brings libraries new challenges along with its clear advantages. librarians and library staff might have concerns about their job security and can be fearful of new technologies. they may feel anxious about how to reengineer their business processes, how to get training, how to improve their technological skills, and how to prepare for a transition. we argue here that library directors should think about these staff frustrations and find ways to address their concerns. libraries should provide staff more opportunities and training to help them improve their knowledge and skills. redefining job descriptions and reorganizing library organizational structures might be necessary to better adapt to the changes brought about by the next-generation ils. systems staff might invest more time in local application development, other digital initiatives, website maintenance, and other library priority projects. technical staff might reconsider their workflows and cross-train to expand their knowledge and improve their work efficiency; they might spend more time on data quality control and special collection development or interact more with faculty on book and e-resource selection. we hope this analysis will provide useful information and insights for libraries planning to move to the next-generation ils. the shift will require academic libraries to reconsider their organizational structures and rethink their manpower distribution and staffing optimization to better focus on library priorities, projects, and services critical to their users.

references

1. marshall breeding, "a cloudy forecast for libraries," computers in libraries 31, no. 7 (2011): 32–34.

2. marshall breeding, "current and future trends in information technologies for information units," el profesional de la información 21, no. 1 (2012): 11.
3. jason vaughan and kristen costello, "management and support of shared integrated library systems," information technology & libraries 30, no. 2 (2011): 62–70.

4. marshall breeding, "agents of change," library journal 137, no. 6 (2012): 30–36.

5. patricia ingersoll and john culshaw, managing information technology: a handbook for systems librarians (westport, ct: libraries unlimited, 2004).

6. edward g. iglesias, an overview of the changing role of the systems librarian: systemic shifts (oxford, uk: chandos, 2010).

7. janet guinea, "building bridges: the role of the systems librarian in a university library," library hi tech 21, no. 3 (2003): 325–32.

8. breeding, "agents of change," 30.

9. ibid.

10. ibid., 33.

11. ibid., 33.

12. ibid., 30.

13. sally bryant and grace ye, "implementing oclc's wms (web-scale management services) circulation at pepperdine university," journal of access services 9, no. 1 (2012): 1.

14. gary garrison et al., "success factors for deploying cloud computing," communications of the acm 55, no. 9 (2012): 62–68.

on-line acquisitions by lolita

frances g. spigai: former information analyst, oregon state university library; and thomas mahan: research associate, oregon state university computer center, corvallis, oregon.

the on-line acquisition program (lolita) in use at the oregon state university library is described in terms of development costs, equipment requirements, and overall design philosophy. in particular, the record format and content of records in the on-order file, and the on-line processing of these records (input, search, correction, output) using a cathode ray tube display terminal, are detailed.

the oregon state university library collection has grown by 15,000–20,000 new titles per year (corresponding to 30,000–35,000 volumes per year) for the past three years, to a total of approximately 275,000 titles (600,000 volumes); continuing serials account for a large percentage of annual "volume" growth. these figures indicate an average input of 60–80 new titles per day. on average, a corresponding number of records are removed each day upon completion of the processing cycle, and a like number of records are updated when books and invoices are received. in addition, approximately 200 searches per day are made to determine whether an item is being ordered or to determine the status of an order. since the mid-1960s, and with the introduction of time-sharing, a handful of academic libraries (1, 2, 3) and several library networks (4, 5, 6) have introduced the advantages (7) of on-line computer systems to library routines. most of the on-line library systems use teletypewriter terminals. use of visual displays for library routines has been limited, although stanford anticipates using visual displays with ibm 2741 typewriter terminals in a read-only mode (1), and the library of the ibm advanced systems development division at los gatos, sharing an ibm 360/50, uses an ibm 2260 display for ordering and receiving (8). in addition, an institute of library research study, focusing on on-line maintenance and search of library catalog holdings records, has concluded that even with the limited number of characters available on all but the most expensive display terminals,
"... the high volume of data output associated with bibliographic search makes it desirable to incorporate crt's as soon as possible, in order to facilitate testing on a basis superior to that achievable with the mechanical devices" (9). many academic libraries, during shelflist conversion or input of acquisition data, use a series of tags for bibliographic information. some of these tags are for in-house use, while others presumably are used to aid in the conversion of marc tape input to the library's own input format. the number of full-time staff required to design and operate automated systems in individual academic libraries typically ranges from seven to fifteen. this does not seem to be an inordinate range, since most departments of a medium-large to large academic library require a similar-size staff for operational purposes alone.

lolita (library on-line information and text access) is the automated acquisition system used by the oregon state university library. it operates in an on-line, time-shared, conversational mode, using a cathode ray tube (cdc-210) or a 35-ksr teletype as a terminal, depending upon the operation required. both types of equipment are in the acquisitions department of the library; each interacts with the university's main computer (cdc-3300, 91k core, 24-bit words), which in turn accesses the mass storage disk (cdc-814, capable of storing almost 300 million characters) through the use of lolita's programs in conjunction with the executive program, os-3 (10). under the os-3 time-sharing system, lolita shares the use of the central computer memory and processor with up to 59 other concurrent users; the use of the mass storage disk is also shared with other users of the university's computer center. (lolita will require approximately 11 million characters of disk storage.) lolita's programs are written in fortran and in the assembly language, compass, and are composed of two sets: those which maintain the outstanding order file, and those which produce printed products and maintain the accounting and vendor files.

several key factors have shaped the design of lolita. an on-line, time-sharing system has been operating at osu since july 1968, and on-line capabilities have been available for test purposes since the summer of 1967. programming efforts could be concentrated exclusively on the design of lolita and an earlier pilot project (11), for no time was needed to design, debug, or redesign the operating system software, as was necessary at washington state university and the university of chicago (2, 12). heavy reliance was put on assembly language coding for the usual reasons, plus the knowledge that the computer center's next computer is to be a cdc-3500, with an instruction set identical to that which the library now uses. in short, neither the os-3 operating system nor the assembly language will change for the next few years. an added motivation influencing program design was the desire to minimize response time for the user. in view of the transient nature of a university library's student and civil service staff, the need for an easily learned and maintained system is paramount. the flexible display format of the crt allows a machine-readable worksheet with a built-in, automatic tagging scheme; it obviates the need for a paper worksheet and thus eliminates a time-consuming, tedious, and error-prone conversion process. the book request slip contains the source information for input.
proofreading and correction are done on-line at time of input; alterations can be made at any later time as well. lolita has used from 1.5 to 3.0 fte through the period of design to operation. after an initial testing and data base buildup period, anticipated to last about six months, during which lolita will be run in parallel with the manual system, it is expected that the on-order/in-process, vendor, and accounting files will be maintained automatically and that reports and forms currently output by the acquisitions department staff will be generated automatically. specifically, records comprising three files will be kept on-line: 1) the outstanding order file (a slight misnomer, since it includes and will include three types of book request data: outstanding orders, desiderata of high priority, and in-process material); 2) names and addresses for those vendors of high use (approximately 200 of 2,500, or about 8 percent), and codes and use-frequency counts for all vendors; and 3) accounting data for all educational resource materials purchased by the oregon state university library. it should be kept in mind that, although lolita is designed for book order functions, the final edited record, after the item has been cataloged, will be captured on magnetic tape as a complete catalog record. thus, all statistics and information, except circulation data, will be available for future book acquisitions. this project is being undertaken for two reasons: 1) the oregon state university library is concerned that librarians achieve their potential as productive professionals through the use of data processing equipment for routine procedures, and that cost savings may be realized as the library approaches a total system encompassing all of the technical services routines; and 2) a uniquely receptive computer center and a successful on-line time-sharing facility are available.

record format and content

each book request is described by 27 data elements, which are grouped into three logical categories and displayed as three logical "pages" on the crt screen. the categories are 1) bibliographic information, 2) accounting information, and 3) inventory information; figures 1, 2, and 3 list the data elements in the same sequence as they appear on the crt screen. though most data elements listed are self-explanatory, eight require some description.

fig. 1. bibliographic information: order number, flag word, author, title, edition, id number, publisher, year published, notes.

fig. 2. accounting information: order number, date requested, date ordered, estimated price, number of copies, account number, vendor code, vendor invoice number, invoice date, actual price, date received, date 1st claim sent, date 2nd claim sent.

fig. 3. inventory information: order number, bib cit, date cataloged, volume, issue, location code, lc class number.

flag word. this data element indicates the status of a request. the normal order procedure needs no flag word; exceptions are dealt with automatically by entering an appropriate flag word. as more requests are added to the system, and as more exceptional instances are uncovered, more flag words will undoubtedly be added. to date there are twelve flag words, plus one data element which serves both as a data element and as a status signal. the flag words and the procedures they activate are described below.
conf.: confirming orders for materials ordered by phone or letter, and for unsolicited items which are to be added to the collection. the order form is not mailed, but is used only for processing internal to the library. accounting routines are activated.

gift: for gift or exchange items, a special series number prefixed by a "g" is assigned, and the printed purchase order is used internally only. this flag word also acts as a signal so that accounting routines will not encumber any money. the primary reason for assigning a purchase order number is to provide a record indexing mechanism (this is also true for held orders).

held: selected second-priority orders being held up for additional book budget funds. these order records are kept on line and are assigned a special series of purchase order numbers prefixed by an "h." no accounting procedures accompany these orders, although a purchase order is generated and manually filed by purchase order number.

live: held orders which have been activated. this word causes a reassignment of purchase order numbers to the next number in the main sequence (instead of an "h"-prefixed number) and sets up the natural chain of accounting events. the new purchase order number is then written or typed on the order form, the order date added, and the order mailed.

cash: orders for books from vendors who require advance payment. an expenditure, instead of an encumbrance, is recorded.

rush: used for books which are to be rush ordered and/or rush cataloged. rush will also be rubber-stamped on the purchase order for emphasis. no special procedures are activated within the computer programs; rush is an instruction for people.

docs: used when ordering items from vendors with whom the osu library maintains deposit accounts (e.g., the government printing office). this causes a zero encumbrance in the accounting scheme; cash is used to put additional money into deposit accounts.

canc: cancelled orders. unencumbers monies and credits accounts for cash orders.

reis: used to reissue an order for an item which has been cancelled. a new purchase order containing a new order number, vendor, etc., will automatically be issued. re-input is not necessary; however, changes in vendor number, etc., can be made.

part: denotes a partial shipment for one purchase order. no catalog date can be entered while part appears as the flag word. invo will replace part when the final shipment has been received; canc will replace part if the final shipment is not received, and the order is reissued for the portion received.

invo: when invoice information is entered into the file, invo is typed in as the flag word. this causes accounting information (purchase order number, vendor code, invoice number, actual price, invoice date, account number) to be duplicated in the accounting file.

kill: used to remove an inactive record from the file (cf. date cataloged).

date cataloged: a value entered for this data element signals the end of processing. the record is removed from the main file and transferred to magnetic tape. changes and additions to inventory and bibliographic data elements are anticipated at this final point, to bring the record into line with those of the catalog department.

author(s). all authors are to be included in this data element: corporate authors, joint authors, etc. the entry form is last name first (e.g., smith, john a.). for compound authors, a slash is used as the delimiter separating names (e.g., smith, john a. / jones, john paul).
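the flag words above form a small status vocabulary that drives the accounting routines. the following python sketch is our illustration of the dispatch just described, not lolita's actual fortran/compass code: each flag word maps to the accounting action it triggers, and flag words such as rush deliberately trigger nothing, since they are instructions for people rather than for the program.

def encumber(order):
    # normal orders, confirming orders, and activated held orders encumber the estimated price
    print("encumber", order["price"], "against account", order["account"])

def expend(order):
    # cash orders record an expenditure instead of an encumbrance
    print("record expenditure of", order["price"])

def unencumber(order):
    # cancellations release the encumbrance (and credit the account for cash orders)
    print("release encumbrance for", order["po_number"])

def copy_to_accounting(order):
    # invo duplicates invoice data into the accounting file
    print("copy invoice data for", order["po_number"], "to the accounting file")

def no_accounting(order):
    # gift, held, and docs orders deliberately touch no money
    pass

ACCOUNTING_ACTIONS = {
    None: encumber, "conf": encumber, "live": encumber,
    "cash": expend, "canc": unencumber, "invo": copy_to_accounting,
    "gift": no_accounting, "held": no_accounting, "docs": no_accounting,
}

def process(order):
    """apply the accounting side effect, if any, for the order's current flag word."""
    ACCOUNTING_ACTIONS.get(order.get("flag"), no_accounting)(order)

process({"flag": "cash", "po_number": "70-125", "price": 4.95, "account": "30-1061-6-20"})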
id number. standard book number, vendor catalog number, etc.

order number. the order number is automatically assigned to one of three series depending on the flag word: the main number series, with the fiscal year as prefix; the held order series, with an "h" prefix (stored in the order number index as 101; the "h" is what is printed on the order forms); and the gift series, with a "g" prefix (likewise stored in the order number index as 102).

vendor code. a sample of 18 months of invoice data (obtained from the comptroller's office) for the library resource account number indicates the use of 2,200 vendors during that period. by sorting by invoice frequency and dollar amount, about 200 vendors were identified who either invoiced the library more than 12 times during this period (since the invoices tended to contain more than one item for frequently used vendors, the number of purchase orders issued could easily be several times this amount) or whose invoices totalled over $110.00. of these, 171 have been selected for on-line storage. they will be assigned code numbers 1 to 171, and the names and addresses of these vendors will be included on the computer-generated purchase orders. authority files for all vendors are kept on rolodex units; one set is arranged alphabetically by vendor name, the other by vendor code.

account number. the library account to which the book is charged. the number is divided into four sections: 1) a two-digit prefix identification for osu, 2) a four-digit identification for osu library resource expenditures, 3) a one- or two-digit identification of the particular library resource fund account to be charged (e.g., science, humanities, serials, binding, etc.), and 4) a one- or two-digit code identifying the subject which most closely describes the request. from this data, statistics will be derived which describe expenditures by subject as well as by fund allocation. this will provide a powerful tool for collection building and may also be a political aid in governing departmental participation in book selection.

bibcit. bibliographic citation code, recording where acquisitions department personnel located the bibliographic data (l.c. copy, etc.). this information is included on the catalog work slip (fourth copy of the purchase order) so that duplicate searching by the catalog department can be avoided.

lc classification number. refers to the call number as it is assigned by the osu catalog department.

file organization

on-order record. the operating system for oregon state university's on-line, time-sharing system reads into memory a quarter page (or file block) of 510 computer words at a time. to best use this system, each on-order (outstanding order) record is composed of a block of 51 computer words (204 6-bit characters), or of linked lists of such blocks; each quarter page is thus divided into ten physical records of 51 computer words apiece. for records requiring more than one block, the nearest available block of 51 words within the same 510-word file block is used; if none is vacant within the same file block, the first available 51-word block in the file is used, and if none is free the file is lengthened to provide more blocks. a bit array is used to keep track of the status (in use, vacant) of records in the main file. in the bit array, each of 20 bits of each 24-bit computer word corresponds to a 51-word block in the main file.
as in figure 4, the 13th bit has a zero value, indicating a vacancy in the 13th 51-word block of the main file; the 14th bit has a value of 1, indicating that the 14th 51-word block in the on-order file is in use. a total of 10,120 block locations can be monitored by each file block of the bit array. records in this file are logically ordered by purchase order number, the arrangement effected by pointers which string the blocks together.

fig. 4. bit array monitor of record block use in the on-order file (each one-word entry of the bit array uses 20 bits, with 4 bits unused, to map the blocks of a 510-word file block).

access points

order number. the order number index is arranged by the main portion of the order number and, within that, in prefix number sequence. the sequence in figure 5 illustrates the order number index arrangement (as well as the logical arrangement of the on-order file). the order number index allows quick access to selected points within the main file. conceptually, the ordered main file is segmented into strings of records whose order numbers fall into certain ranges. more specifically, items whose sequence numbers range from 0 to 4 (ignoring the prefix of the order number) comprise the first segment, 5 to 9 the second, etc. the index itself merely contains pointers to the leading record in each (conceptual) segment. thus, among the records whose purchase order numbers are shown in figure 5, there would be pointers to the second (69-124) and sixth (70-125), but not to the others. to reach the fourth (101-124), one follows the index to the second and then follows the block pointers through the third to the fourth.

fig. 5. order number index sequence: 102-118, 69-124 (fiscal year 1969, order number 124), 70-124 (fiscal year 1970, order number 124), 101-124 (held order number 124 for the current year), 102-124 (gift order number 124 for the current year), 70-125, 102-125, 70-126. (note: the prefix "h," which is printed on the purchase orders, is represented as the number 101 for internal computer processing; likewise 102 represents the prefix "g.")

fig. 6. "on order" record organization: p.o. number, forward and backward pointers, time of last update, title forward and backward pointers, pointers to author(s), title, date of request, date ordered, encumbered price, number of copies, account number (two words), vendor number, flag word, publisher, date of publication, notes, edition, id number, bibcit, lc classification number, volume number, issue, location code, vendor's invoice number, invoice date, actual price, date received, date first claim sent, date second claim sent.

author(s). the author index is in the form of a multi-tiered inverted tree. the lowest tier is an inverted index containing the only representation of the authors' names (they are not stored in the on-order record, figure 6) and, for each author, pointers to the records of each of his books (figure 7). the entries for several authors may be packed into a single 51-word block, if space permits. each higher tier serves to direct the indexing mechanism to the proper block in the next tier below, and to this end as much as needed of an author's name is filed upwards into the higher tiers; this method is described in more detail by lefkovitz (13) as "the unique truncation variable length key-word key."
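to make the block-allocation scheme concrete, here is a small python sketch of our own (not lolita's compass code) of a bit-array free-block tracker along the lines described above: each word devotes 20 of its 24 bits to marking 51-word record blocks as in use or vacant. note that 510 words of 20 bits would cover 10,200 blocks; the article quotes 10,120, presumably because a few words of each bit-array file block are reserved for housekeeping.

BITS_PER_WORD = 20   # 20 of the 24 bits in each computer word are used as in-use flags

class BlockMap:
    """track which 51-word record blocks are in use (1) or vacant (0)."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.words = [0] * ((n_blocks + BITS_PER_WORD - 1) // BITS_PER_WORD)

    def _locate(self, block):
        return divmod(block, BITS_PER_WORD)   # (word index, bit index within the word)

    def mark_in_use(self, block):
        w, b = self._locate(block)
        self.words[w] |= 1 << b

    def mark_vacant(self, block):
        w, b = self._locate(block)
        self.words[w] &= ~(1 << b)

    def first_vacant(self):
        """return the index of the first vacant block, lengthening the file if it is full."""
        for block in range(self.n_blocks):
            w, b = self._locate(block)
            if not self.words[w] & (1 << b):
                return block
        # no vacancy anywhere: lengthen the file by one block, as the article describes
        self.n_blocks += 1
        if self.n_blocks > len(self.words) * BITS_PER_WORD:
            self.words.append(0)
        return self.n_blocks - 1

blocks = BlockMap(10_120)
blocks.mark_in_use(13)          # the 14th block (index 13) is in use, as in the figure 4 example
print(blocks.first_vacant())    # -> 0, the first vacant 51-word block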
fig. 7. author index organization and access to the on-order file: the lowest tier is an inverted author index whose entries carry a control word (number of characters in the record, number of characters in the author's full name, and number of titles in the on-order file) plus pointers into the on-order file; an author index directory above it steers the search to the proper block.

title. not yet programmed.

on-line record processing

record creation. after a number of new book requests have been searched to determine their absence from osu's collection, and after they have been bibliographically identified, they are batched for vendor assignment and readied for entry into the on-line file of book requests via the crt (figure 8).

fig. 8. book request processing (flowchart).

lolita's starting page is obtained by typing the word lolita on the crt screen. the text illustrated in figure 9 is then displayed on the screen of the crt. when "1" is typed in, indicating a wish to create a record, the first data element of the first page of input appears (figure 10). (since the majority of records do not need a flag word upon input, the flag word fill-in line appears only on a redisplay of this page, and the flag word may be inserted at that time.)

fig. 9. "starting" page of function choices (main file, please indicate a choice): 1. create a new entry; 2. locate an existing entry; 9. terminate all processing.

fig. 10. first data element displayed in the new record creation process: author(s), with examples: jones; dequincey, thomas; washington, booker t.; adams, john quincy / doe, john; american medical association.

at this point the user can go in one of two directions. the first page of input information may be entered one data element at a time, each element being requested in a tutorial fashion by lolita; alternatively, all of the first-page data may be input at once, with data elements separated by delimiters. the user can switch from one method to the other at any point. a control key (return) is the delimiter used to signal the end of each data element; at the same time, return repositions the cursor (which indicates the position of the next character to be typed on the crt screen) to the location of the next data element to be filled in. another control key (send) 1) serves as a terminal delimiter, 2) transmits the data on the screen to the computer, and thereby 3) triggers the continuation of processing until the next screen display is generated. thus, with page one, data elements are displayed, filled in, and sent one at a time in the tutorial approach, or all seven data elements are typed in at once, with a return following items 1-6 and a send after the last data element. return or send must be used with each data element, even those for which there is no information. this secures the sequence of element input, thus providing an easy (for the user) and automatic way of tagging elements for any future tape searches to provide statistics or analytical reports. in particular, this process obviates all content restrictions on variable (i.e., free-form) items. each of the pages is redisplayed after input, and corrections can be made at this time.
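the positional tagging idea described above (every element is terminated with a return or send even when empty, so its position alone identifies the field) can be illustrated with a short python sketch of our own; lolita itself implemented this in fortran and compass. the article does not list the seven operator-entered elements explicitly, so this sketch assumes they are the page-one fields other than the system-assigned order number and the flag word, which appears only on redisplay.

# assumed operator-entered page-one elements, in the fixed order they are typed on the crt
PAGE_ONE_FIELDS = ["author", "title", "edition", "id number", "publisher", "year published", "notes"]

def tag_page_one(raw_elements):
    """pair each free-form input string with its field name by position alone.

    empty strings stand for elements the operator skipped with a bare return,
    so no content restrictions or embedded tags are needed in the data itself.
    """
    if len(raw_elements) != len(PAGE_ONE_FIELDS):
        raise ValueError("every element needs its return/send, even when empty")
    return dict(zip(PAGE_ONE_FIELDS, raw_elements))

record = tag_page_one(["steinbeck, john", "the grapes of wrath", "", "", "viking", "1939", ""])
print(record["title"])   # -> the grapes of wrath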
the crt is used for all input, and its write-over capabilities are used for corrections, as compared with the "read-only" use planned for the crt displays of stanford's ballots (1). except for the flag word, all the data elements on the first page are variable in length and unrestricted as to content. the data elements on pages 2 and 3 (figures 2 and 3) are more fixed in length; with these pages, a whole page at a time is always filled in and sent, and the tutorial function is inherent in the display. the concluding display is shown in figure 11.

fig. 11. review option: "send if all done, type 1-3 to review pages."

because batched searching and input are assumed, when one search or input is finished the program recycles to continue searching or inputting without going back to the starting page (figure 9) each time.

record search. searching programs have been completed which will search by order number and by author. title searching will be implemented within the next few months, although a satisfactory scheme for title searching (improving on manual methods, yet economical) has not been uncovered; methods suggested or used by ames, kilgour, ruecking, and spires have been noted (14, 15, 16, 17). the procedure for searching within the outstanding order file begins with the display of choices shown in figure 9. one types a "2," indicating a desire to locate an existing entry, and the text shown in figure 12 is displayed on the crt screen.

fig. 12. display of search options: two fill-in lines, one for order number and one for author, with the instruction "supply one of the above (start on the appropriate line)."

at this point one chooses to search either by order number or by author. if one enters a valid order number representing a request record, the first page of that record, containing bibliographic information, is displayed, followed by the display shown in figure 11 so that accounting and inventory information may also be reviewed. for the user's convenience, the order number is displayed in the upper right-hand corner of each of the three pages, both upon record input and upon search redisplay. to search by author, one types the author's name on the second line of figure 12, using the same format as that used in record creation. if the author has only one entry in the outstanding order file, the first page of the entry will appear, as in the order number search above. if the author entered has more than one entry in the on-line file, information like that depicted in figure 13 is displayed on the screen of the crt.

fig. 13. display of multiple titles on file for one author: "enter number or 'nf' (not found)," followed by the numbered titles, e.g., 1. night of the iguana; 2. the milk-train doesn't stop here anymore; 3. cat on a hot tin roof; ... n. the glass menagerie.

if the requested title is one of the titles displayed, one types its number and the record for that title is displayed. if the title is not among those displayed, typing nf results in a redisplay of the text in figure 12 so that searching may continue. for personal authors, variant forms of the name may be located using the following procedure: the word others is entered at the top of the screen after an unsuccessful author search, so that a search for author j. p. jones would find all documents by john paul jones, joseph p. jones, j. peter jones, etc., as well as by j. p. jones, and a search for john p. jones would find all documents by j. p. jones, john jones, and j. peter jones as well as by john p. jones.
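the article does not spell out how the others search decides that two name forms match, so the following python sketch is only our guess at a plausible rule: a search name and a stored name are compatible when their surnames agree and each forename or initial of one is consistent with the corresponding forename or initial of the other.

def _name_parts(entry):
    """split 'jones, john paul' into ('jones', ['john', 'paul'])."""
    surname, _, rest = entry.partition(",")
    return surname.strip().lower(), [p.strip(". ").lower() for p in rest.split()]

def _compatible(a, b):
    """'j' is compatible with 'john'; 'john' is not compatible with 'joseph'."""
    return a.startswith(b) or b.startswith(a)

def variants_match(query, stored):
    """one plausible rule for the 'others' search described above (our assumption)."""
    q_surname, q_given = _name_parts(query)
    s_surname, s_given = _name_parts(stored)
    if q_surname != s_surname:
        return False
    # compare forenames/initials pairwise; a missing forename matches anything
    return all(_compatible(q, s) for q, s in zip(q_given, s_given))

authors_on_file = ["jones, john paul", "jones, joseph p.", "jones, j. peter", "jones, j. p."]
print([a for a in authors_on_file if variants_match("jones, j. p.", a)])
# -> all four name forms, as in the article's first example
print([a for a in authors_on_file if variants_match("jones, john p.", a)])
# -> matches john paul, j. peter, and j. p., but not joseph p., as in the second example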
record changes. additions and corrections to the original record are made by first locating the record (by order number, author, or, eventually, title), adding to the data elements or writing over them (for corrections), and transmitting the information. examples of this procedure include 1) entering the date received, 2) recording the vendor invoice number, invoice date, and actual price, and 3) inserting or changing a flag word. in addition, after an item has been cataloged, the record is revised to include catalog data and to exclude extraneous order notes.

output

aside from the crt displays, output is in three forms: off-line tape, printed forms, and on-line files (figure 14). examples of output are library purchase orders, accounting reports, vendor data, and records of cataloged items. the number of potential reporting uses is limited only by money and imagination.

fig. 14. output from the on-line on-order file.

fig. 15. library purchase order form, with fields for order number, date, id number, author, title, publisher, vendor name and address, volumes, edition, estimated price, number of copies, vendor code, account, date of publication, flag word, gift or held order number, and bibcit.

the purchase order, shown in figure 15, is composed of four copies: 1) the vendor's copy, to be retained by him; 2) a vendor "report" copy; 3) the copy which is kept as a record in the osu library; and 4) a catalog work slip to be forwarded to the catalog department with the book. purchase orders are printed on the library's teletype, which is equipped with a sprocket feed. orders can also be printed on the line printer in the computer center; while this is a slightly cheaper data processing procedure, since no terminal costs are incurred, convenience and security have produced a victory of "economics over economies" (18), and the librarian's time has been considered in the total scheme. for gift items, purchase orders are produced as the cheapest means of preparing a catalog work slip. held purchase orders are produced and manually filed in purchase order number sequence, but when their status is changed to live, the old numbers are automatically replaced by purchase order numbers in the main series. these new numbers are written onto the purchase orders, along with any other changes, and the orders are mailed; the flag word live also activates accounting procedures. there are two sets of accounting reports. the first is generated when the purchase orders are issued and contains tabulated information for the library's bookkeeper, the head of business records in the acquisitions department, and the comptroller of the oregon state system of higher education. the second, a summary report, is issued after the book and invoice have been received and contains additional information pertinent to the invoicing procedure; this report has the same distribution as the first. periodic reports are planned for the library's subject divisions summarizing expenditures by account number, reference area, and subject; programming for this has not yet been done. a frequency count will be stored with each vendor code, and periodic listings will be printed for use in retaining vendors.
after an item has been cataloged, the catalog work slip and a slip equivalent to a main-entry catalog card are sent to acquisitions, and all remaining information and changes are recorded in the on-line record. this record is then transferred to a file from which it is dumped onto magnetic tape. this off-line file will be used for statistical analyses and will be the start of a machine-readable data base. future plans will, of course, depend on funding; however, two logical steps which could follow immediately and require no additional conversion are 1) additional computer-generated paper products (charge cards, catalog cards, book spine labels, new-book lists, etc.) and 2) a management information system using acquisition and cataloging data. the construction of a central serial record in machine-readable form would produce many valuable by-products. a program for the translation of the marc ii test tape has been written which causes these records to be printed on the computer center's line printer; and since a subscription to the marc tapes is now available to osu for test purposes, its advantages and compatibility with lolita will be investigated as time permits.

unsolved problems, aside from those which everyone working in a data processing environment faces (e.g., system and hardware breakdown, continued project funding, and lengthy delivery times for hardware), include 1) the widely varying system response times (commonly from a fraction of a second up to 60 seconds; usually 2-15 seconds); 2) the lack of personnel skilled in both data processing and library techniques; 3) the limited print train currently available on the line printer (62-character set); and 4) bureaucratic policy, which can render the most sophisticated plans for automation unfeasible if properly applied. it is recognized that all these problems can be solved by money, time, and priorities. meanwhile, the period of in-parallel operation will be valued as a time to educate, to test, to gather statistics, and to further refine the programs and procedures which comprise lolita.

evaluation

preliminary input samples indicate that a daily average of from 8 hours, 20 minutes, to 10 hours and 45 minutes will be necessary for input, searches, updating, and corrections using the crt. an additional 3 hours per day of terminal time using the teletype will be required to produce the purchase orders, answer rush search questions if the crt is busy, and activate the daily batch programs (accounting reports, etc.). the sad economic plight of most libraries causes librarians to cast an especially suspicious eye on the costs of automation; a few words on osu's data processing costs may be of interest. the cost of the total development effort to produce lolita is under $90,000 (though considerably less was actually expended), or an average annual cost of $30,000 over a three-year period. this compares favorably with average annual incomes of from $50,000 to over $300,000 in federal funds alone for other on-line library acquisition projects in universities (19, 20, 21, 22). a total of 6.75 man-years was required to design lolita. the 6.75 man-years comprises 2.5 years of programming, 3.25 years of systems analysis, coordination, and documentation, and 1.0 year of clerical work, and represents the efforts of four students and six professional workers.
this total does not include the time spent by acquisitions department personnel in reviewing lolita's abilities or in learning to use the terminals. current data processing rates charged by the computer center include the following: crt rental, $100/mo.; cpu time, $300/hr.; terminal time, $2.00/hr.; on-line storage, 15 cents per 2,040 characters per month. the teletype has been purchased, thus only local phone line charges are incurred. the on-line system is available for use from 7:30 a.m. to 11:00 p.m. each weekday, and from 7:30 a.m. to 5:00 p.m. on saturday, which more than covers the 8-5 schedule of the acquisitions department.

acknowledgments

the work on which this paper is based was supported by the administration, the computer center, and the library of oregon state university. special mention is due robert s. baker, systems analyst, osu library, and lawrence w. s. auld, head, technical services, osu library, for their extensive participation in the lolita project and for their many suggestions which benefitted the final version of this paper. hans weber, head, business records, osu library, also contributed much to lolita's design.

references

1. veaner, allen b.: project ballots: bibliographic automation of large library operations using a time-sharing system. progress report, march 27, 1969–june 26, 1969 (stanford, california: stanford university libraries, 29 july 1969), ed-030 777.

2. burgess, thomas k.; ames, l.: lola: library on-line acquisition subsystem (pullman, washington: washington state university, systems office, july 1968), pb-179 892.

3. payne, charles: "the university of chicago's book processing system." in stanford conference on collaborative library systems development: proceedings, stanford, california, october 4–5, 1968 (stanford, california: stanford university libraries, 1969), ed-031 281, 119–139.

4. pearson, karl m.: marc and the library service center: automation at bargain rates (santa monica, california: system development corporation, 12 september 1969), sp-3410.

5. nugent, william r.: "nelinet: the new england library information network." in congress of the international federation for information processing (ifip), 4th: proceedings, edinburgh, august 5–10, 1968 (amsterdam: north holland publishing co., 1968), g28–g32.

6. blair, john r.; snyder, ruby: "an automated library system: project leeds," american libraries, 1 (february 1970), 172–173.

7. warheit, i. a.: "design of library systems for implementation with interactive computers," journal of library automation, 3 (march 1970), 68–72.

8. overmyer, lavahn: library automation: a critical review (cleveland, ohio: case western reserve university, school of library science, december 1969), ed-034 107.

9. cunningham, jay l.; schieber, william d.; shoffner, ralph m.: a study of the organization and search of bibliographic holdings records in on-line computer systems: phase i (berkeley: university of california, institute of library research, march 1969), ed-029 679, pp. 13–14.

10. meeker, james w.; crandall, n. ronald; dayton, fred a.; rose, g.: "os-3: the oregon state open shop operating system." in american federation of information processing societies: proceedings of the 1969 spring joint computer conference, boston, mass., may 14–16, 1969 (montvale, new jersey: afips press, 1969), 241–248.
11. spigai, frances; taylor, mary: a pilot: an on-line library acquisition system (corvallis, oregon: oregon state university, computer center, january 1968), cc-68-40, ed-024 410.

12. university of chicago library: development of an integrated, computer-based, bibliographical data system for a large university library (chicago, illinois: university of chicago library, 1968), pb-179 426.

13. lefkovitz, david: file structures for on-line systems (new york: spartan books, 1969), pp. 98–104.

14. ames, james lawrence: an algorithm for title searching in a computer based file (pullman, washington: washington state university library, systems division, 1968).

15. kilgour, frederick g.: "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science, 5 (new york: greenwood publishing corp., 1968), 133–136.

16. ruecking, frederick h., jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227–238.

17. parker, edwin b.: spires (stanford physical information retrieval system), 1967 annual report (stanford, california: stanford university, institute for communication research, december 1967), 33–39.

18. kilgour, frederick g.: "effect of computerization on acquisitions," program, 3 (november 1969), 100–101.

19. "university library systems development projects undertaken at columbia, chicago and stanford with funds from national science foundation and office of education," scientific information notes, 10 (april–may 1968), 1–2.

20. "grants and contracts," scientific information notes, 10 (october–december 1968), 14.

21. "university of chicago to set up total integrated library system utilizing computer-based data-handling processes," scientific information notes, 9 (june–july 1967), 1.

22. "washington state university to make preliminary library systems study," scientific information notes, 9 (april–may 1967), 6.

an automated music programmer (musprog)

david f. harrison, music director, wsui-ksui, and randolph j. herber, applications programmer, university computer center, the university of iowa, iowa city, iowa.

a system to compile programs of recorded music for broadcast by the university of iowa's radio stations. the system also provides a permanent catalog of all recorded music holdings and accurate inventory control. the program, which operates on an ibm 360/65, is available in fortran iv, cobol, and pl/1, with assembly language subroutines and external maintenance programs.

the state university of iowa (iowa city) owns and operates two broadcasting stations: wsui, at 910 kc, and ksui, at 91.7 mc. wsui was the first educational radio station in operation west of the mississippi and ranks among the oldest stations in the country; ksui was among the earliest of the frequency modulation outlets in the area to offer programming in multiplex stereo. in the spring of 1967, when it became necessary to completely reorganize the stations' recorded music libraries, an investigation was simultaneously under way to determine the feasibility of using automated data processing (a.d.p.) techniques in the discographic operations of the stations. at the time there were several working bibliographic applications (1), ranging from relatively simple record-keeping (where is . . . ?) to more ambitious cross-referencing and indexing operations, one of which uses the kwic (keyword-in-context) computer program to classify musical recordings (1).
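kwic indexing, mentioned above, rotates each title so that every significant keyword in turn leads a line with its surrounding context. the following short python sketch is a generic illustration of the technique, not the cited classification program, and the stop-word list and column width are arbitrary choices of ours.

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}   # arbitrary example list

def kwic(titles, width=30):
    """produce (keyword, rotated context) lines, one per significant word of each title."""
    lines = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            # rotate the title so the keyword leads, keeping the rest as trailing context
            rotated = " ".join(words[i:] + words[:i])
            lines.append((word.lower(), rotated.ljust(width)))
    return sorted(lines)                      # alphabetical by keyword

for key, context in kwic(["concerto for orchestra", "the art of the fugue"]):
    print(f"{key:12} {context}")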
on the basis of an awareness of these applications, and a belief that the intrinsic principles could be utilized and extended to cover somewhat different needs, it was proposed that the facilities of the university computer center be employed in the selection and updating of recorded music programs. in designing a coded set of instructions to perform these tasks, it was deemed necessary that any attempt at the selection or compilation of a series of music programs should be made in accordance with certain criteria supplied to the system by the user, and that these selection specification parameters should closely parallel those which would be employed were such an extraction from the total libraries to be performed manually. additional requisites were that provision be made for updating and enlarging the master file as new items were acquired, and that the coding of the programmed instructions should be sufficiently flexible to permit inclusion of supplemental criteria as they became desirable.

the above proposal met with a certain degree of opposition, the main bone of contention being that such an application would necessarily "dehumanize" music programming. there have been, and will continue to be, similar objections raised by those who are unaware of the advantages offered by a.d.p. and concomitantly unaware of the mental processes which result in what is commonly referred to as "artistic judgment." it is not the purpose of this article to attempt an exhaustive analysis of such processes, nor to castigate the objectors; it is rather simply to bring forth several basic observations dealing with the problem under discussion. a contemporary composer-theorist interested in the application of a.d.p. techniques to the process of musical composition has observed that no paradoxical "almighty force" exists in science, which, in actual fact, progresses by discrete steps that are at once limited but unpredictable (3, 4). the following conclusions, although relating specifically to the problems of machine-"created" music, find no less an application to the current problem. creative human thought is an aggregate of restrictions or choices in all fields of human activity, including the arts. certain aspects of these judgments can be mechanized and simulated by physical mechanisms currently extant, including the computer. the rapidity of calculation or decision by computer frees human beings from the long and arduous task of manually selecting, compiling, and checking programmed works; the time thus saved can be better spent on such amenities as scripting, with complete performance information and record data, and the always-too-necessary pronouncing aids. moreover, the computer program can be "exported" to any place similarly equipped, to be used by other individuals or altered by other programmers to meet their specific needs.

the automated music programmer (musprog) was conceived as a series of steps, the first of which specifies that complete music programs are to be selected in accordance with a table of specifications introduced as data, each card containing information pertinent to a discrete program.
the second step requires that each and every entry in the catalog be checked for availability to any program in the tables established in the preceding step, this status to be determined on the basis of a satisfactory comparison with the individual criteria supplied on the selection specification card. among these are "tests" (note that a failure to meet the requirements in any step disqualifies the item) to determine when the item was last selected, as well as the number of times selected; a check for allowable time length; a check for duplication of composer and/or title; a requirement that stereo recordings be used only for fm; a check for acceptable period, style, and type of composition; and the decision to update the master file. in the final operation of the program, each duplicate title of a work selected is also updated, simulating selection to prevent its selection during the next month. if each duplicate were given the date factor of the item actually selected, the latter would tend to appear much more frequently than its companions, because the program would continue to select the longest available item, and it is reasonably safe to assume that the selected item is the longest version of the title in question. it was necessary, therefore, to devise a means by which each version of a given work (indicated by both title and composer) be given equal weight for "fair" selection. a unidimensional array called item was constructed with ten positions as follows: item(10) / '0', '0', '0', '0', '1', '1', '1', '2', '2', '3' /. the index of the array was then selected by referencing a routine which generates random, positive integers in the range one through ten. the contents of that position in item are added to the date factor of the record selected, and the result is placed in the corresponding field of the duplicate title under scrutiny. thus there exists a 40% probability that the duplicate will have the same "weight" as the selected item, a 30% chance that the duplicate will be "pushed back" one month, 20% for two months, and a 10% probability that the date factor of the duplicate title will be increased by three months. when all the titles have been thus read or updated, the run concludes. figure 1 is the flowchart of the basic design of musprog, from which the computer program was coded. the program runs on an ibm 360/65 and is available in fortran iv, cobol, and pl/1 with assembly language subroutines and external maintenance programs. copies of these programs may be obtained from the national auxiliary publications service (naps #00278).

the machine-readable catalog system currently employed by the university's radio stations is, on the whole, independent of a record's origin or manufacture (the catalog number could be considered as nothing more than an indication of a discrete shelf space). the system was designed to make maximally efficient use of the 80 columns available on a punch card. by utilizing two alphabetic and two decimal characters, ranging from a00 through zz99, provision is made for identification of records and tapes in quantities somewhat in excess of seventy thousand individual discs or reels; the total of actual single titles possible to catalog in this manner is at least twice that number.
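the duplicate-title weighting described earlier in this section is easy to restate in code. the sketch below is a python paraphrase of ours, not the fortran/cobol/pl/1 source distributed through naps; it reproduces the 40/30/20/10 percent probabilities by drawing a random index into the same ten-position item array.

import random
from collections import Counter

# the ten-position item array from the article: four 0s, three 1s, two 2s, one 3
ITEM = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]

def push_back_duplicate(selected_date_factor):
    """return the date factor to store on a duplicate title of the work just selected.

    40% of the time the duplicate keeps the selected item's weight (offset 0),
    30% it is pushed back one month, 20% two months, and 10% three months.
    """
    index = random.randint(1, 10)          # random positive integer in the range 1..10
    return selected_date_factor + ITEM[index - 1]

# quick check of the probabilities over many trials
offsets = Counter(push_back_duplicate(0) for _ in range(100_000))
print({offset: round(count / 100_000, 2) for offset, count in sorted(offsets.items())})
# expected roughly {0: 0.40, 1: 0.30, 2: 0.20, 3: 0.10}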
the total of actual single titles possible to catalog in this manner is at least twice that number. the card catalog is made up along more or less standard, triple-reference lines on the familiar 3x5-inch card. these cards remain in the master card file, but are actually used only for reference purposes, rather than for actual selection. the "real" master library exists in the form of punched cards (later transferred to magnetic tape). each card image contains the following information, with blank columns separating contiguous fields:
columns 1-10 composer, or first ten characters if abbreviation is necessary.
columns 12-27 title, abbreviations standardized.
columns 29-33 duration of work in seconds.
columns 35-37 period of composition.
columns 39-40 type of composition.
columns 42-45 catalog number.
columns 47-57 physical location of item on cataloged disc or tape.
columns 59-64 date fields, used for updating and usage factors.
columns 66-69 seasonal key, a blank indicating general usefulness.
columns 71-80 field used by musprog for internal record-keeping.
operation
selection of music by the system is performed in accordance with a table of program specifications which includes information pertinent to the length of the desired program and the maximum permissible length of any single work within it, the type of music desired, and additional information such as the date, time and title of the program to be aired and an indication of the station for which the program is to be selected. all the selections for ksui (fm) are required to be stereophonic. classification into stereophonic and monophonic groups is a function of the catalog number, a00 through z99 being stereophonic and aa00 through zz99 being monophonic. a program selection card contains the following data:
column 1 station code: w for wsui, blank for ksui.
columns 2-6 duration of program in seconds.
columns 7-11 maximum duration of each item to be selected (0 or blank indicates the program may consist of but a single work equivalent in length to the program duration).
column 12 number of types being specified.
columns 13-27 three three-plus-two character fields to specify period and type (modia equals "twentieth-century, orchestra"). if any field is blank, musprog assumes anything acceptable.
columns 28-79 title of program to be selected, day and time.
as an example, the following specifications were made for a program called "aubade" which was aired at 10:00 a.m. on tuesday, july 30, by wsui. program duration was to be 3400 seconds (56:40), allowing 3:20 for continuity. maximum length of any single work within the program was to be 900 seconds (15:00). music could be chosen from the contemporary orchestral repertoire, any instrumental work from the classic period, or any type "3" work, i.e., soloist and piano, or chorus a cappella. figure 2 shows a printout of selections for two programs.
fig. 2. printout of selections (evening concert programs for july 25 and july 30, listing the works selected with catalog numbers, locations, and cumulative timings).
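the fixed-column card image described above maps naturally onto a small parser. the following python sketch is illustrative only: the column boundaries follow the layout given in the text (converted to zero-based string slices), while the sample card content and the helper names are made up for the example.

```python
# column layout of the master-file card image (1-based columns from the text,
# here converted to python slices on an 80-character card string).
FIELDS = {
    "composer":  (0, 10),   # columns 1-10
    "title":     (11, 27),  # columns 12-27
    "duration":  (28, 33),  # columns 29-33, seconds
    "period":    (34, 37),  # columns 35-37
    "type":      (38, 40),  # columns 39-40
    "catalog":   (41, 45),  # columns 42-45
    "location":  (46, 57),  # columns 47-57
    "dates":     (58, 64),  # columns 59-64, updating and usage factors
    "seasonal":  (65, 69),  # columns 66-69, blank = generally useful
    "internal":  (70, 80),  # columns 71-80, musprog record keeping
}

def parse_card(card: str) -> dict:
    """split one 80-column card image into named fields."""
    card = card.ljust(80)
    record = {name: card[start:end].strip() for name, (start, end) in FIELDS.items()}
    record["duration"] = int(record["duration"] or 0)
    return record

def make_card(values: dict) -> str:
    """inverse helper for the example: place field values into their columns."""
    card = [" "] * 80
    for name, (start, end) in FIELDS.items():
        text = str(values.get(name, ""))[: end - start]
        card[start:start + len(text)] = list(text)
    return "".join(card)

# hypothetical card image; the period and type codes are invented for the demo
sample = make_card({"composer": "beethoven", "title": "symphony 7",
                    "duration": "2280", "period": "cla", "type": "1a",
                    "catalog": "ab17", "location": "s1-2/e"})
print(parse_card(sample))
```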
an additional feature of musprog is provision for a periodic summary of library usage, affording the librarian a concise account of frequently played items, as well as an indication of those works which have been selected infrequently or ignored altogether. this report allows the programmer to assess more accurately the maximum number of times a selection may be programmed before it is declared unacceptable. the system also puts out printed lists of works extracted from the library in accordance with a user-specified table of reference fields: e.g., all symphonies, all works by bach, all works of under ten minutes in length, all christmas music; or conceivably, any symphonies by bach which are suitable for christmas and less than ten minutes long. this latter step could also include, with minor alterations in the computer program, provision for performances by one specific ensemble or artist only. an external program allows adding items to the master tape, deleting those no longer needed, and correcting any of the various fields within individual records; thus if mis-timings or other inaccuracies are noted, it becomes a relatively simple matter to correct them.
discussion
it can readily be seen that "the machine" neither possesses nor displays "taste" in any conventional sense of that word, since it can select only those types of music which the programmer has declared acceptable. it does not, indeed cannot, show any predilection toward certain types of music to the detriment or exclusion of others, save those which have been removed from the list of potential selections by the programmer. it performs no independent judgments. without doubt, then, there is no logical basis for the cry of "dehumanization," since the program was originally designed by human minds and is, at each step of the process of selection, governed by the human-designed control parameters and program specifications; therefore it cannot select music willy-nilly, but must be told what to do and how to do it. it also has been found that specifications cannot be "plugged in" at random, for the programs thus selected would prove little more than a conglomerate of sundry works bearing no relation to one another. organization and logic must be designed into each program if any coherent programming is to result. the machine does not "know" what to do unless told. it should be brought out that, because of a built-in logic and the order of titles on the master file, the program will tend to select the longest works available to fill the specified program time, making up the difference, if any, with progressively shorter pieces until the time is filled, or until no work of acceptable type and sufficient brevity can be located.
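the longest-first behavior described in the preceding sentence can be made concrete with a small sketch. this is not the published fortran; it is a simplified python illustration in which the catalog is a list of (title, duration) pairs and the selection criteria are reduced to a maximum item length - the real program also applies the period/type, stereo, duplication, and usage tests described earlier.

```python
def fill_program(catalog, program_seconds, max_item_seconds=None):
    """greedy fill: repeatedly take the longest acceptable work that still
    fits in the remaining time, until the program is full or nothing fits."""
    remaining = program_seconds
    chosen = []
    # work through the catalog from longest to shortest duration
    for title, duration in sorted(catalog, key=lambda rec: rec[1], reverse=True):
        if max_item_seconds and duration > max_item_seconds:
            continue                      # fails the allowable-length test
        if duration <= remaining:
            chosen.append((title, duration))
            remaining -= duration
        if remaining == 0:
            break
    return chosen, remaining              # remaining > 0 means unused time

# hypothetical catalog entries and a 3400-second program specification
catalog = [("symphony", 2400), ("tone poem", 1100), ("overture", 600),
           ("song cycle", 900), ("prelude", 300)]
programme, unused = fill_program(catalog, program_seconds=3400, max_item_seconds=2500)
print(programme, "unused:", unused)
```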
since the longer works tend to occur among certain types and/or styles of music, there may be some tenuous grounds for a suspicion of bias. it will be observed that musprog does not include information pertinent to performer, conductor, etc. one of the several reasons for this apparent oversight is that such information would, at the outset, have required the use of one to four additional data cards per title. since this information was not deemed absolutely essential to the immediate functions of the program, it was decided to postpone inclusion of such a refinement to some future date.
conclusion
musprog has been utilized by the state university of iowa since march, 1968, and has resulted in considerable time-saving. for example, the july, 1968, programming required one hundred and two programs varying in length from thirty minutes to somewhat over four hours, consisting of a variety of musical styles and representing a diversity of programming difficult to achieve efficiently by ordinary means. in three minutes and twelve seconds, musprog selected the programs, updated the catalog, checked for duplication of selections, timed each program, and printed out the resultant copy properly headed. at an approximate cost of $250.00 per hour of computer time, this comes to less than fifteen dollars per month to perform tasks which might normally require two persons, at perhaps two or three dollars per hour, to work an entire week or more. it is doubtful that even then each catalog entry could be examined and an accurate record of usage kept.
acknowledgments
a staff research grant from the graduate college, university of iowa, partially supported development and operation of this system. dean duane c. spriestersbach of the graduate college, professor gerard p. weeg, chairman of the department of computer science, and program supervisor robert e. irwin gave generous support and encouragement to the development of musprog.
references
1. wilhoit, g. cleveland: "computerized indexing for broadcast music libraries," journal of broadcasting, 11 (fall, 1967), 325-337.
2. brook, barry s.: "rilm, repertoire internationale de la litterature musicale," notes: the quarterly journal of the music library association, 23 (march, 1967), 462-467.
3. xenakis, iannis: "in search of a stochastic music," gravesano review, 11 (1958).
information technology and libraries | september 2006
usability testing of a large, multidisciplinary library database: basic search and visual search
jody condit fagan (faganjc@jmu.edu) is digital services librarian at carrier library, james madison university, harrisonburg, virginia.
visual search interfaces have been shown by researchers to assist users with information search and retrieval. recently, several major library vendors have added visual search interfaces or functions to their products. for public service librarians, perhaps the most critical area of interest is the extent to which visual search interfaces and text-based search interfaces support research. this study presents the results of eight full-scale usability tests of both the ebscohost basic search and visual search in the context of a large liberal arts university. like the web, online library research database interfaces continue to evolve. even with the smaller scope of library research databases, users can still suffer from information overload and may have difficulty in processing large results sets.
web search-engine research has shown that the number of searchers viewing only the first results page has increased from 29 percent in 1997 to 73 percent in 2002 for united states-based web searchengines users.1 additionally, the mean number of results viewed per query in 2001 was 2.5 documents.2 this may indicate either increasing relevance in search results or an increase in simplistic web interactions. visual alternatives to search interfaces attempt to address some of the problems of information retrieval within large document sets. while research and development of visual search interfaces began well before the advent of the web, current research into visual web interfaces has continued to expand.3 within librarianship, the most visual interface research seems to focus on those that could be applied to large-scale digital library projects.4 although library products often have more metadata and organizational structure than the web, search engine-style interfaces adapted for field searching and boolean operators are still the most frequent approach to information retrieval.5 yet research has shown that visual interfaces to digital libraries offer great benefit to the user. zaphiris emphasizes the advantage of shifting the user’s mental load “from slow reading to faster perceptual processes such as visual pattern recognition.”6 according to borner and chen, visual interfaces can help users better understand search results and the interrelation of documents within the result set, and refine their search.7 in their discussion of the function of “overviews” in visual interfaces, greene and his colleagues say that overviews can help users make better decisions about potential relevance, and “extract gist more accurately and rapidly than traditional hit lists provided by search engines.”8 several library database vendors are implementing visual interfaces to navigate and display search results. serials solutions’ new federated search product, centralsearch, uses technology from vivisimo that “organizes search results into titled folders to build a clear, concise picture for its users.”9 ulrich’s fiction connection web site has used aquabrowser to help one “discover titles similar to books you already enjoy.”10 the queens library has also implemented aquabrowser to provide a graphical interface to its entire library’s collections.11 xreferplus maps search results to topics by making visual connections between terms.12 comabstracts, from cios, uses a similar concept map, although one cannot launch a search directly from the tool. groxis chose a circular style for its concept-mapping software, grokker. partnerships between groxis and stanford university began as early as 2004, and grokker is now being implemented at stanford university libraries academic and information resources.13 ebsco and groxis announced their partnership in march 2006.14 the ebscohost interface now features a visual search tab as an option that librarians can choose to leave on (by default) or turn off in ebsco’s administrator module. figure 1 shows a screenshot of the visual search interface. within the context of library research databases, visual searching likely provides a needed alternative from traditional, text-based searching. to test this hypothesis, james madison university libraries (jmu libraries) decided to conduct eight usability sessions with ebscohost’s new visual search, in coordination with ebsco and groxis. 
while this is by no means the first published usability test of vendor interfaces, the literature understandably reveals a far greater number of usability tests on in-house projects such as library web sites and customized catalog interfaces than on library database interfaces.15 it is hoped that by observing users try both the ebsco basic search and visual search, more understanding will be gained about user search behavior and the potential benefits of a visual approach.
■ method
the usability sessions were conducted at jmu, a large liberal arts university whose student population is mostly drawn from virginia and the northeastern region. only 10 percent of the students are from minority groups. jmu requires that all freshmen pass the online information skills seeking test (isst) before becoming a sophomore, and the libraries developed a web tutorial, "go for the gold," to prepare students for the isst. therefore, usability-test participants were largely white, from the northeastern united states, and had exposure to basic information literacy instruction. jmu libraries' usability lab is a small conference room with one computer workstation equipped with morae software.16 audio and video recordings of user speech and facial expressions, along with "detailed application and computer system data," are captured by the software and combined into a searchable recording session for the usability tester to review. a screenshot of the morae analysis tool is shown in figure 2. the usability test script was developed in collaboration with representatives of ebsco and groxis. ebsco provided access to the beta version of visual search for the test, and groxis provided financial incentives for student participants. the test sessions and the results analysis, however, were conducted solely by the researcher and librarian facilitators. the visual search development team was provided with the results and video clips after analysis. usability study participants were recruited by posting an announcement to the jmu students' web portal. a $25 gift certificate was offered as an incentive, and more than 140 students submitted a participation interest form. these were sorted by the number of years the students had been at jmu to try to get as many novice users as possible. because so much of today's student work is conducted in groups, four groups of two, as well as four individual sessions, were scheduled, for a total of twelve students. jmu librarians who had received both human-subjects training and an introduction to facilitation served as facilitators for the usability sessions. their role was to watch the time and ask open-ended questions to keep the student participants talking about what they were doing. the major research question it was hoped the tests would answer was, "to what extent does ebsco's basic search interface and visual search interface support student research?" since the tests could not evaluate the entire research process, it was decided to focus on the development of the research topic.
specifically, the goal was to find out how well each interface supported the intellectual process of the students in coming up with a topic, narrowing their topic, and performing searches on their chosen subtopics. an additional goal was to determine how well users were able to find and use the interface widgets and how satisfied the students felt after using the interfaces. the overall session was structured in this order: a pretest survey about the students’ research experience; a series of four tasks performed with ebscohost’s basic search; a series of three tasks performed with ebscohost’s visual search; and a posttest interview. both basic and visual search interfaces were used with academic search premier. each of the eight sessions was recorded in entirety by the morae software, and each recording was viewed in entirety. to try to gain some quantitative data, the researcher measured the time it took to complete each task. however, due to variables such as facilitator involvement and interaction between group members, the numbers did not lend themselves to comparison. also, it would not have been clear whether greater numbers indicated a positive or negative sign. taking longer to come up with subtopics, for example, could as easily be a sign of exploration and interested inquiry as it might be of frustration or failure. as such, the data are mostly qualitative in nature. figure 1. screenshot of ebscohost’s visual search figure 2. screenshot of morae recorder analysis tool 142 information technology and libraries | september 2006 ฀ results the student participants were generally underclassmen. two of the students, group 2, were in their third year at jmu. all others were in their first or second year. while students were drawn from a wide variety of majors, it is regrettable that there was not stronger representation from the humanities. when asked, “what do you normally use to do research?” six students answered an unqualified “google.” three other students mentioned internet search engines in their response. only two students gave the brand or product names of library research databases: one said, “pubmed, wilsonomnifile, and ebsco,” while the other, a counseling major, mentioned psycinfo and cinahl. when shown a screenshot of basic search, half of the students said they had used an ebsco database before. all of the participants said they had never before used a visual search interface. the full results from the individual pretest interviews are shown in figures 3 and 4. to begin the usability test, the facilitator started internet explorer and loaded the ebscohost basic search, which was set to have a single input box. the scripts for each task are listed in figure 5. note that task 4 was only featured in the basic search portion of the test. for task 1 on the basic search—coming up with a general topic—all of the participants began by using their own topics rather than choosing from the list of ideas. also, although they were asked to “spend some time on ebsco to come up with a possible general topic,” all but group 6 fulfilled this by simply thinking of a topic (sometimes after some discussion within the groups of two) and typing it in. with the exception of group 6, the size of the result set did not inspire topic changes. figure 6 summarizes the students’ searches and relative success on task 1. in retrospect, the tests might have yielded more straightforward findings if the students had been directed to choose from the provided list of topics, or even to use the same topic. 
however, part of the intention was to determine whether either interface was helpful in guiding the students’ topic development. it was hoped that by defining the scenario as writing a paper for class, their topic selection would reflect the realities of student research. however, it probably would have been better to have used the same topic for each session. task 2 asked participants to identify three subtopics, and task 3 asked them to refine their search to one subtopic and limit it to the past two years. a summary of these tasks appears in figure 7. a surprising finding during task 2 was that students did go past the first page of results. four groups went past the first page of results, while two groups did not get enough results for more than one page. the other two groups did not choose to look past the first page of results. this contrasts with jansen and spink’s findings, figure 3. results from pretest interview, groups 1–4 figure 4. results from pretest interview, groups 5–8 usability testing of a large, multidisciplinary library database | fagan 143 in which 73 percent of web searchers only view the first results page.17 another pleasant surprise was that students spent some time actually reading through results when they were searching for ways to narrow their topic. five groups scanned through both titles and abstracts, which requires clicking on the article titles to display the citation view. one of these five additionally chose to open full-text articles and look at the references to determine relevance. two groups scanned through the results pages only, but looked at both article titles and the subjects in the left-hand column. group 5 seemed to only scan the titles in the results list. this user behavior is also quite different than that found with web search-engine users. in one recent study by jansen and spink, more than 90 percent of the time, search-engine users viewed five or fewer documents per query.18 the five groups that chose to view the citation/abstract view by clicking on the title (groups 1, 2, 3, 4, and 6) identified subtopics that were significantly more interesting and plausible than the general topic they had come up with. from looking at their results, these groups were clearly identifying their subtopics from reading the abstracts and titles rather than just brainstorming. although group 2 had the weakest subtopics, going from the world baseball classic to specific players’ relationships to the classic and the home-run derby, they were working with a results set of but eleven items. the three groups that relied on scanning only the results list succeeded to an extent, but as a whole, the new subtopics would be much less satisfying to the scenario’s hypothetical professor. after scanning the titles on two pages of results, group 5 (an individual) ended up brainstorming her subtopics (prevention, intervention, and what an eating disorder looks like) based on her knowledge of the topic rather than drawing from the results. group 7 (a group of two) identified their subtopic (sand dunes) from the lefthand column on the results list. group 8 (an individual) picked up his subtopics (steroids in sports, president bush’s stance on steroids, and softball) from reading keywords in the article titles on the first page of results. since the subjects in the left-hand column were a new addition to basic search, the use of this area was also noted. four groups used the subjects in the left-hand column without prompting. 
two groups saw the subjects (i.e., ran the mouse over them) but did not use them. the remaining two groups made no action related to the subjects. a worrisome finding of tasks 2 and 3 was that most students had trouble with the default search being set to phrase-searching rather than to a boolean and. this can easily be seen in looking at the number of results the students came up with when they tried to refine their topics (figure 7). even though most students had some limiter still in effect (full text, last two years) when they first tried their new refined search, it was the phrasesearching that really hurt them. luckily, this figure 6. task 1, coming up with a general topic using basic search figure 5. tasks posed for each portion of the usability test. 144 information technology and libraries | september 2006 is a customizable setting in ebsco’s administrator module, and it is recommended that libraries enable the “proximity” expander to be set “on” by default, which will automatically combine search terms with and. task 4, finding a “recent article in the economist about the october earthquake in kashmir,” was designed to test the usability of the ebscohost publication search and limiter. it was listed as optional in case the facilitator was worried that time was an issue. four of the student groups—1, 2, 5, and 7—were posed the task. of these four groups, three relied entirely on the publication limiter on the refine search panel. group 1 chose to use the publication search. all four groups quickly and successfully completed this task. ฀ ฀additional questions during basic search tasks at various points during the three tasks in ebsco’s basic search, the students were asked to limit their results set to only full-text results, to find one peer-reviewed article, and to limit their search to the past two years. seven out of the eight student groups had no problem finding and using the ebscohost “refine search” panel, including the full-text check box, date limiter, and peerreviewed limiter. group 7 did not find the refine search panel or use its limiters until specifically guided by the facilitator near the end. this group had found other ways to apply limits: they used the “books/monographs” tab on the results list to limit to full text, and the results-list sorting function to limit to the past two years. after having seen the refine search panel, group 7 did use the “peer reviewed” check box to find their peer-reviewed article. toward the end of the basic search portion, students were asked to “save three of their results for later.” three groups demonstrated full use of the folder. an additional three groups started to use the folder and viewed the folder but did not use print, save, or e-mail. it is unclear whether they knew how to do so and just did not follow through, or whether they thought they had safely stored the items. two students did not use the folder at all, acting individually on items. one group used the “save” function but did not save each article. ฀ visual search similar to task 1, when using the basic search, students did not discover general topics by using the interface, but simply typed in a topic of interest. only two groups, 1 and 8, chose to try the same topic again. in the interests of processing time, visual search limits the search to the first 250 results retrieved. since jmu has set the default sort results to display in chronological order, the most recent 250 results were returned during these usability tests. 
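to make concrete the phrase-searching problem noted a few paragraphs back, the sketch below contrasts the two query interpretations. it is a generic python illustration, not ebsco's actual matching logic: under phrase semantics the terms must appear contiguously and in order, while the boolean-and default recommended above only requires every term to appear somewhere in the record.

```python
def matches_phrase(query: str, text: str) -> bool:
    """phrase semantics: the query terms must appear as one contiguous run.
    (substring matching on normalized whitespace is a deliberate simplification.)"""
    return " ".join(query.lower().split()) in " ".join(text.lower().split())

def matches_all_terms(query: str, text: str) -> bool:
    """boolean-and semantics: every query term must appear somewhere."""
    words = text.lower().split()
    return all(term in words for term in query.lower().split())

# hypothetical record title echoing one of the student topics above
record = "prevention programs for college students with eating disorders"
query = "eating disorders prevention"

print(matches_phrase(query, record))     # False - terms are not contiguous
print(matches_all_terms(query, record))  # True - all terms are present
```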
figure 8 shows the students’ original search terms using visual search, the actions they took while looking for subtopics, and the subtopics they identified. additionally, if the subtopics they identified matched words on the screen, the location of those words is noted. three of the groups (1, 2, and 5) identified subtopics when looking at the labels on topic and subtopic circles. group 3 identified subtopics while looking at article titles as well as the subtopic circles. the members of group 6 identified subtopics while looking at the citation view and reading the abstract and full text, as well as rolling over article titles with their mice. it was not entirely clear where the student in group 4 got his subtopics from. two of the three subtopics did not seem to figure 7. basic search, task 2 and 3, coming up with subtopics. usability testing of a large, multidisciplinary library database | fagan 145 be represented in the display of the results set. his third subtopic was one of the labels from a subtopic circle. groups 7 and 8 both struggled with finding their subtopics. group 7 simply had a narrow topic (“jackalope”), and group 8 misspelled “steroids” and got few results for that reason. lacking many clusters, both groups tried typing additional terms into the title keyword box on the filter panel, resulting in fewer or zero results. for task 3, students were asked to limit their search to the last two years and to refine their search to a chosen subtopic (figure 9). particularly because the results set is limited to 250, it would have been better to have separated these two tasks: first to have them limit the content, then perhaps the date of the search. three groups, all groups of two, used the date limit first (2, 6, and 8). three groups (1, 3, and 6) narrowed the content of their search by typing a new search or additional keywords into the main search box. groups 2 and 4 narrowed the content of their search by clicking on the subtopic circles. note that this does not change the count of the number of results displayed in the filter panel. groups 5 and 7 tried typing keywords into the title keyword filter panel and also clicking on circles. both groups fared better with the latter approach. group 8 typed an additional keyword into the filter panel box to narrow his search. while five of the groups announced the subtopic to which they wanted to narrow their search before beginning to narrow their topic, groups 2, 7, and 8 began to interact with the interface and experiment with subtopics before choosing one. while groups 2 and 8 arrived at a subtopic and identified it, group 7 tried many experiments, but since their original topic (jackalope) was already narrow, they were not ultimately successful in identifying or searching on a subtopic. as with basic search, students were asked to save three articles for later. five of the groups (2, 4, 5, 6, and 8) used the “add to folder” function which appears in the citation view on the right-hand side of the screen. of these, three groups proceeded to “folder has items.” of these groups, two chose the “save” function. two groups used either “save” or “e-mail” to preserve individual items, rather than using the folder. one group experienced system slowness and was not able to load the full-record view in time to determine whether they would be able to save items for later. a concern that students may not realize is that in folder view or individually, the “save” button really just formats the records. 
the user must still use a browser function to save the formatted page. no student performed this function. figure 8. visual search, task 1 and 2, coming up with a general topic figure 9. visual search, task 3, searching on subtopic (before date limit, if possible) 146 information technology and libraries | september 2006 several students had some trouble with the mechanics of the filter panel, shown in figure 10. seven of the eight groups found and used the filter panel, originally hidden from view, without assistance. however, some users were not sure how the title keyword box related to the main search box. at least two groups typed the same search string into the title keyword box that they had already entered into the main search box. also, users were not sure whether they needed to click the search button after using the date limiter. however, in no case was a student unable to quickly recover from these areas of confusion. ฀ results of posttest interview at the end of the entire usability session, participants were asked several questions while looking at screenshots of each interface. a full list of posttest interview questions can be found in figure 11. when speaking about the strengths of basic search, seven of eight groups talked about the search options, such as field searching and limiters. the individual in group 1 mentioned “the ability to search in fields, especially for publications and within publications.” one of the students in group 3 mentioned that “i thought it was easier to specify the search for the full text and the peer reviewed—it had a separate page for that.” the student in group 4 added, “they give you all the filter options as opposed to the other one.” five of the eight groups also mentioned familiarity with the type of interface as a strength of basic search. since jmu has only had access to ebsco databases for less than a year, and half of the students admitted they had not used ebsco, it seemed their comments were with the style of interface more than their experience with the interface. the student in group 1 commented, “seems like the standard search engine.” group 2 noted, “it was organized in a way that we’re used to more,” and group 3 said, “it’s more traditional so it’s more similar to other programs.” half of the groups mentioned that basic search was clear or organized. group 6 explained, “it was nice how it was really clearly set out . . . like, everything’s in a line.” not surprisingly, visual search’s strengths surrounded the grouping of subtopics: seven of eight groups made some comment about this. the student in group 4 said, “it groups the articles for you better. it kinda like gives you the subtopics when you get into it and search it and that’s pretty cool.” the student in group 8 stated, “you can look and see an outline of where you want to go . . . it’s easy to pinpoint it on screen like that’s where i want to go with my research.” some of the other strengths mentioned about visual search were: showing a lot of information on one screen without scrolling (group 7) and the colorful nature of the interface. 
a student in group 2 added, “i like the circles and squares—the symbols register easily.” the only three weaknesses listed for basic search in response to the first question were: “not having a spot to put in words not to search for” (group 1); that, like internet search engines, basic search should have “a clip from the article that has the keyword in it, the line before and the line after” (group 6); and that basic search might be too broad, because “unless you narrow it, [you have to] type in keywords to narrow it down yourself” (group 7). figure 10. visual search filter panel figure 11. posttest interview questions usability testing of a large, multidisciplinary library database | fagan 147 with regard to weaknesses of visual search, half of the groups had some confusion about the content, partially due to the limited number of results. a student from group 7 declared, “it may not have as many results. . . . if you typed in ‘school’ on the other one, it might have . . . 8,000 pages [but] on this you have . . . 50 results.” the student in group 5 agreed, saying that with visual search, “they only show you a certain number of articles.” the student in group 1 said, “it’s kind of confusing when it breaks it up into the topics for you. it may be helpful for some other people, but for the way my mind works i like just having all my results displayed out like on the regular one.” half of the groups also made some comment that they were just not used to it. six of the groups were asked which one they would choose if they had class in one hour. (it is not clear why the facilitator did not ask this question of groups 3 and 8.) four groups (1, 2, 5, and 7) indicated basic search. one student in group 2 said, “i think it’s easier to use, but i don’t trust it.” the other in group 2 added, “it’s new and we’re not quite sure because every other search engine is you just type in words and it’s not graphical.” both students in group 7 commented that the familiarity of basic search was the reason they would use it for class in one hour. both groups 2 and 7 would later say that they liked the visual search interface better. two groups (4 and 6) chose visual search for the “class in one hour” scenario. the student in group 4 commented, “because it does cool things for you, makes it easier to find. otherwise you’re going through by title.” both these groups would later also say that they liked the visual search interface better. the students were also asked to describe two scenarios, one in which they would use basic search and one in which they would use visual search. four of the groups (1, 3, 5, and 6) said they would use basic search when they knew what information they needed. seven of the eight groups said they would use visual search for broad topics. all the students’ responses are given in figure 12. when asked which interface they preferred, the groups split evenly. comments from the four who preferred basic search (1, 3, 5, and 8) centered on the familiarity of the interface. the student in group 5 added, “the regular one . . . i like to get things done.” all four of these students had said they had used an ebsco database before. the two students who could list library research databases by name were both in this group. of the four who preferred visual search (2, 4, 6, and 7), three groups had never used ebsco before, though one of the students in group 7 thought he’d used it in the library web tutorial. group 2 commented, “it seemed like it had a lot more information . . . cool . . . 
futuristic.” the student in group 4 said, “it’s kind of like a little game. . . . like you’re trying to find the hidden piece.” group 7 commented that visual search was colorful and intriguing. the students in group 6 both stated “the visual one” in unison. one student said that visual search was more “[eye-catching] . . . it keeps you focused at what you are doing, i felt, instead of . . . words . . . you get to look at colors” and added later that it was “fun.” the other students in group 6 said, “i’m a very visual learner. so to see instead of having to read the categories, and say oh this is what makes sense, i see the circles like ‘abilities test’ or ‘academic achievement’ and i automatically know that’s what it is . . . and i can see how many articles are in it . . . and you click on it and it zooms in and you have all of them there.” the second student went on to add, “i’ve been teaching my mom how to use technology and the visual search would be so much easier for her to get, because its just looks like someone drew it on there like this is a general category and then it breaks it down.” other suggestions given during the free-comment portion of the survey were to have the filters from basic search appear on visual search (especially peer-reviewed); curiosity about when visual search would become available (at the time it was in beta test); and a suggestion to have generaleducation writing students write their first paper using visual search. figure 12. examples of two situations: one in which you would be more likely to use visual search, and one in which you would be more likely to use ebsco 148 information technology and libraries | september 2006 ฀ discussion this evaluation is limited both because most students chose different topics for each search interface, and because they only had time to research one topic in each interface. therefore, there could be an infinite number of scenarios in which they would have performed differently. however, this study does show that, for some students, or for some search topics, visual search will help students in a way that basic search may not. one hypothesis of this study was that within the context of library research databases, visual searching would provide a needed alternative from traditional, text-based searching. the success of the students was observed in three areas: the quality of the subtopics they identified after interacting with their search results; the improvement of the chosen subtopic over their chosen general topic, and the quality of the results they found for their subtopic search. the researcher made a best effort to compare topics and results sets and decide which interface helped the student groups to perform better. in addition, qualities that each interface seemed to contribute to the students’ search process were noted (figure 13). these qualities were determined by reviewing the video recordings and examining the ways in which either interface seemed to support the attitudes and behaviors of the students as they conducted their research tasks. when considering all three of these areas, four groups did not, overall, require visual search as an alternative to basic search (1, 3, 4, and 7). two of these groups (4 and 7) seemed to benefit from more focus when using the basic search interface. although visual search lent them more interaction and exploration (which may be why they said they preferred visual search), it seems the focus was more important to their performance. 
for the other two groups (1 and 3), basic search really supported the depth of inquiry and high interest in finding results. these two groups confirmed that they preferred basic search. for two groups (6 and 8), visual search seemed an equally viable alternative to basic search. for group 6, both interfaces seemed to support the group’s desire to explore; they said they preferred visual search. for the student in group 8, basic search seemed to orient him to the goal of finding results, while visual search supported a more exploratory approach. since, in his case, this exploratory approach did not turn out well in the area of finding results, it is not surprising that he ended up preferring basic search. the remaining two groups (2 and 5) performed better with visual search, upholding the hypothesis that an alternate search is needed. group 2 seemed bored and uninterested in the search process when using basic search even though they chose a topic of personal interest: “world baseball classic.” visual search caught their attention and sparked interest in the impersonal topic “global warming.” group 2 spent more time exploring while using the visual search interface, and in the posttest survey admitted that they preferred the visual search interface. the student in group 5 said she preferred basic search, and as a selfdescribed psycinfo user, seemed comfortable with the interface. yet for this test scenario, visual search made her think of new ideas and supported more real exploration during the search process. within each of the three areas, basic search appeared to have the upper hand for both the quality of the subtopics identified by the students, and in the improvement of the chosen subtopics over the general topics. this is at least partially explained by the limitation of visual search to the most recent 250 results. that is, as the students explored the visual search results, choosing subtopics would not relaunch a search on that subtopic, which would have engendered more and perhaps better subtopics. in the third area, the quality of the results set for the chosen topic, visual search seemed to have the upper hand if only because of the phrase-searching limitation present in jmu’s administrative settings for basic search. that is, students were often finding few or no results on their chosen subtopics in basic search. this study also had findings that seem to transcend figure 13: strengths of basic search and visual search in quality of subtopics, most improved topic, and result sets usability testing of a large, multidisciplinary library database | fagan 149 these interfaces and the underlying database. first, libraries should strongly consider changing their database default searching from phrase searching to a boolean and, if possible. (this is possible in ebsco using the administrative module.) second, most students did not have trouble finding or using the interface widgets to perform limiting functions, with the one exception being some confusion about the relationship between the visual search filters and main search box. unlike some research into web search behavior, students may well travel beyond the first page of results and view more than just a few documents when determining relevance. finally, the presence of subject terms in both interfaces proved to be an aid to understanding results sets. this study also pointed out some improvements that could be made to visual search. 
first, it would be great if visual search returned more than 250 results in the initial set, or at least provided an overview of the size, type, and extent of objects using available metadata.19 however, even with today’s high-speed connections, result-set size will need to be balanced with performance. perhaps, as students click on subtopics, the software could rerun the search so that the results set does not stay limited to the original 250. on a minor note, for both basic and visual search, greater care should be taken to make sure users understand how the save function works and alert users to the need to use the browser function to complete the process. it should be noted that ebsco has not stopped developing visual search, and many of these improvements may well be on their way. ebsco says it will be adding more support for limiters, display preferences, and contextual text result-list viewing at some point in the future. these feature sets can currently be viewed on grokker.com. an important area for future research is user behavior in library subscription databases. while these usability tests provide a qualitative evaluation of a specific interface, it would be worthwhile to have a more reliable understanding about students’ searching behavior in library databases across similar interfaces. since public service librarians deal primarily with users who have self-identified as needing help, their experience does not always describe the behavior of all users. furthermore, studies of web search behavior may not apply directly to searching in research databases. specifically, students’ use of subject terms in both interfaces could be explored. half of the student groups in this study chose to use the basic search subject clusters in the left-hand column on the results page, despite the fact that they had never seen them before (this was a beta-test feature). is this typical? would this strategy hold up to a variety of research topics? another interesting question is the use of a single search box versus several search boxes arrayed in rows (to assist in constructing boolean and field searching). in the ebsco administrative module, librarians can choose either option. based on research rather than anecdotal evidence, which is best? another option is the default sort: historically, at jmu libraries, this has been a chronological sort. does this cause problems for relevance-thinking students? finally, the issue of collaboration in student research using library research databases would be a fascinating topic. certainly, these usability recordings could be reviewed with a mind to capturing the differences between individuals and groups of two, but there may be better designs for a more focused study of this topic. ฀ conclusion if you take away one conclusion from this study, let it be this: do not hesitate to try visual search with your users! information providers must balance investments in cutting-edge technology with the demands of their users. libraries and librarians, of course, are a key user group for information providers. a critical need in librarianship is to become familiar with the newest technology solutions, particularly with regard to searching, in order to provide vendors with informed feedback about which technologies to pursue. by using and teaching new visual search alternatives, librarians will be poised to influence the further development of alternatives to text-based searching. references and notes 1. bernard j. 
jansen and amanda spink, “how are we searching the world wide web? a comparison of nine search engine transaction logs,” special issue, information processing and management 42, no. 1 (2006): 257. 2. bernard j. jansen and amanda spink, “an analysis of web documents retrieved and viewed,” in proceedings of the 4th international conference on internet computing (las vegas, 2003), 67. 3. aravindan veerasamy and nicholas j. belkin, “evaluation of a tool for visualization of information retrieval results,” sigir forum (acm special interest group on information retrieval) (1996): 85–93; katy börner and javed mostafa, “jodl special issue on information visualization interfaces for retrieval and analysis,” international journal on digital libraries 5, no. 1 (2005): 1–2; ozgur turetken and ramesh sharda, “clustering-based visual interfaces for presentation of web search results: an empirical investigation,” information systems frontiers 7, no. 3 (2005): 273–97. 4. stephen greene et al., “previews and overviews in digital libraries: designing surrogates to support visual information seeking,” journal of the american society for information science 51, no. 4 (2000): 380–93; panayiotis zaphiris et al., “exploring the use of information visualization for digital libraries,” new review of information networking 10, no. 1 (2004): 51–69. 5. katy börner and chaomei chen eds., visual interfaces to digital libraries, 1st ed. (berlin; new york: springer, 2003), 243. 150 information technology and libraries | september 2006 6. zaphiris et al., “exploring the use of information visualization for digital libraries,” 51–69. 7. börner and chen, visual interfaces to digital libraries, 243. 8. greene et al., “previews and overviews in digital libraries,” 380–93. 9. “vivisimo corporate profile,” in vivisimo, http://vivi simo.com/html/about (accessed apr. 19, 2006). 10. “aquabrowser library—fiction connection,” www.fic tionconnection.com/ (accessed apr. 19, 2006). 11. “queens library—aquabrowser library,” http://aqua .queenslibrary.org/ (accessed apr. 19, 2006). 12. “xrefer—research mapper,” www.xrefer.com/research (accessed apr. 19, 2006). 13. “stanford ‘groks,’” http://speaking.stanford.edu/back _issues/ soc67/library/stanford_groks.html (accessed apr. 19, 2006); “grokker at stanford university,” http://library.stan ford.edu/catdb/grokker/ (accessed apr. 19, 2006). 14. “ebsco has partnered with groxis to deliver an innovative visual search feature as part of ebsco,” www.groxis .com/service/grokker/pr29.html (accessed apr. 19, 2006). 15. michael dolenko, christopher smith, and martha e. williams, “putting the user into usability: developing customer-driven interfaces at west group,” in proceedings of the national online meeting 20 (medford, n.j.: learned information, 1999), 81–90; e. t. morley, “usability testing: the silverplatter experience,” cd-rom professional 8, no. 3 (1995); ron stewart, vivek narendra, and axel schmetzke, “accessibility and usability of online library databases,” library hi tech 23, no. 2 (2005): 265–86; nicholas tomaiuolo, “deconstructing questia: the usability of a subscription digital library,” searcher 9, no. 7 (2001): 32–39; b. hamilton, “comparison of the different electronic versions of the encyclopaedia britannica: a usability study,” electronic library 21, no. 6 (2003): 547–54; heather l. 
munger, "testing the database of international rehabilitation research: using rehabilitation researchers to determine the usability of a bibliographic database," journal of the medical library association (jmla) 91, no. 4 (2003): 478–83; frank cervone, "what we've learned from doing usability testing on openurl resolvers and federated search engines," computers in libraries 25, no. 9 (2005): 10–14; alexei oulanov and edmund f. y. pajarillo, "usability evaluation of the city university of new york cuny+ database," electronic library 19, no. 2 (2001): 84–91; steve brantley, annie armstrong, and krystal m. lewis, "usability testing of a customizable library web portal," college & research libraries 67, no. 2 (2006): 146–63; carole a. george, "usability testing and design of a library web site: an iterative approach," oclc systems & services 21, no. 3 (2005): 167–80; leanne m. vandecreek, "usability analysis of northern illinois university libraries' web site: a case study," oclc systems & services 21, no. 3 (2005): 181–92; susan goodwin, "using screen capture software for web-site usability and redesign buy-in," library hi tech 23, no. 4 (2005): 610–21; laura cobus, valeda frances dent, and anita ondrusek, "how twenty-eight users helped redesign an academic library web site," reference & user services quarterly 44, no. 3 (2005): 232–46. 16. "morae usability testing for software and web sites," www.techsmith.com/morae.asp (accessed apr. 19, 2006). 17. jansen and spink, "an analysis of web documents retrieved and viewed," 67. 18. ibid. 19. greene et al., "previews and overviews in digital libraries," 381.
the recon pilot project: a progress report, november 1969–april 1970
henriette d. avram, kay d. guiles, lenore s. maruyama: marc development office, library of congress, washington, d.c.
a synthesis of the second progress report submitted by the library of congress to the council on library resources under a grant for the recon pilot project. an overview of the progress made from november 1969 to april 1970 in the following areas: production, official catalog comparison, format recognition, research titles, microfilming, investigation of input devices. in addition, the status of the tasks assigned to the recon working task force is briefly described.
introduction
an article was published in the june 1970 issue of the journal of library automation (1) describing the scope of the recon pilot project (hereafter referred to as recon) and summarizing the first progress report submitted by the library of congress (lc) to the council on library resources (clr). recon is supported by the council, the u.s. office of education, and the library of congress. in order that all aspects of the project might be brought together as a meaningful whole, the various segments, regardless of the source of support, were covered in the second progress report and have been included in this article. in some instances, it has been necessary to introduce a section by repeating some aspects already reported in the june 1970 article in order to add clarity to the content of that section.
progress: november 1969 to april 1970
recon production
the production operations of the recon pilot project are being handled by the recon production unit in the marc editorial office of the lc processing department.
printed cards with 1968, 1969, and 7-series card numbers have been provided from the card division stock for recon input, and approximately 99,550 cards in the 1969 and 7-series have been received. using prescribed selection criteria the recon editors have sorted these cards and obtained approximately 27,150 eligible for recon input. approximately 150,000 cards in the 1968 series have also been received. the recon editors have sorted 60,000 of these cards and obtained approximately 24,000 records eligible for recon input. a large number of cards in these three series is already out of print, and replacement cards are being sent by the card division as soon as reprints are made. each card eligible for recon input from the above-mentioned selection process is also checked against a computer produced index of card numbers for records in machine readable form. each number in the print index has a corresponding code to show on which machine readable data base the record resides. the source codes are as follows: m1-marc i data base m2-marc ii, 1st practice tape m3-marc ii, 2nd practice tape m4-marc ii data base m5-marc ii residual data base (the two practice tapes contain records converted before the implementation of the marc distribution service to test the programs and input techniques.) the print index used for the final selection of the 1969 and 7-series card numbers contained only the records from m2-m5 (the marc i data base consists of the records converted during the marc pilot project which ended in june 1968). for the selection of the 1968 records, another print index had been produced which contains numbers for records on all five data bases. if the recon editors find a match on the print index, the appropriate source code is added to the printed card; these printed cards are then maintained in a separate file. (later in the project, the records in the data bases identified as m1 to m3 will be updated to conform with the current marc ii format and added to the recon data base.) the remaining cards for recon are reproduced on input worksheets and edited. to date, approximately 9,750 records in the 1969 and 7-series have been edited for recon. recon records in the 1969 and 7-series are being input by a service bureau. the contractor uses ibm selectric typewriters equipped with an ocr typing mechanism, and the hard-copy sheets are run through an 232 journal of library automation vol. 3/3 september, 1970 optical scanner. the output from the scatmer is a magnetic tape which is processed by the contractor's programs to produce a tape in the marc pre-edit format. this tape is then sent to lc and processed by the marc system programs to produce a full marc record. since the input for the retrospective conversion effort will be printed cards (or copies of printed cards from the card division record set), it will be necessary to compare these with their counterparts in the lc official catalog. the printed card for each main entry in the official catalog will show if any changes have been made which did not warrant reprinting these cards to incorporate these changes. items on a printed card that could be noted in this fashion include changed subject headings, added entries, and call numbers. since these will be important access points in a machine readable catalog record, it was felt that such revisions should be reflected in the recon records. 
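the card-number screening against the print index described above amounts to a simple lookup keyed on the lc card number. the python sketch below illustrates the idea; the index contents, card numbers, and function names are invented for the example, while the source codes m1 through m5 are those listed in the text.

```python
# source codes from the text: which machine readable data base holds a record
SOURCE_CODES = {
    "m1": "marc i data base",
    "m2": "marc ii, 1st practice tape",
    "m3": "marc ii, 2nd practice tape",
    "m4": "marc ii data base",
    "m5": "marc ii residual data base",
}

# hypothetical print index: lc card number -> source code
print_index = {"68-12345": "m4", "79-54321": "m2", "68-67890": "m5"}

def screen_cards(card_numbers, index):
    """split candidate cards into those already in machine readable form
    (kept in a separate file, tagged with their source code) and those
    still needing recon input worksheets."""
    already_converted, needs_input = [], []
    for number in card_numbers:
        code = index.get(number)
        if code:
            already_converted.append((number, code, SOURCE_CODES[code]))
        else:
            needs_input.append(number)
    return already_converted, needs_input

done, todo = screen_cards(["68-12345", "68-99999", "79-54321"], print_index)
print("already converted:", done)
print("edit for recon input:", todo)
```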
the recon report (2) contains a lengthy discussion of the various factors involved in the catalog comparison process, such as the percentage of change in relation to the age of the record, the difficulty in ascertaining any changes because of language, interpretation of cataloging rules, etc. to determine the most efficient and least costly method of catalog comparison, two recon editors were assigned to conduct an experiment to test eight different methods as follows:

1) print-out checked in alphabetic order - single group of 200 records.
2) proofsheets (already proofed) checked in worksheet (card number) order - group of 200 records in batches of 20.
3) proofsheets (not proofed) checked in worksheet (card number) order - group of 200 records in batches of 20.
4) proofsheets (already proofed) checked by mental alphabetization - group of 200 records in batches of 20.
5) proofsheets (not proofed) checked by mental alphabetization - group of 200 records in batches of 20.
6) worksheets before editing (not input) checked by mental alphabetization - group of 200 records in batches of 20.
7) worksheets before editing (not input) checked in alphabetical order - group of 200 records in batches of 20.
8) worksheets before editing (not input) checked in worksheet (card number) order - group of 200 records in batches of 20.

mental alphabetization means the searching of all the entries in a batch beginning with "a," then all the entries beginning with "b," etc., even though the batch is not in alphabetical order. each editor used 200 records for each method, made the necessary corrections, and recorded the time required as well as the number of corrections made.

figure 1 shows the average number of records checked in an hour using the eight different methods of catalog comparison. tables 1 and 2 give the estimated cost per record for each of the methods.

[figure 1. average number of records checked per hour for each of the eight catalog comparison methods; the bar labels repeat the method descriptions listed above.]

[table 4. input devices. the table compares keyboard input devices (key-to-cassette, key-to-computer-compatible-magnetic-tape, key-to-tape, key-to-disk, and related systems) from manufacturers including cybercom, data action, ibm, sycor, tycore, viatron, burroughs, honeywell, keymatic, mai, mohawk, motorola, potter, sangamo, vanguard, computer entry systems, computer machinery, general computer systems, inforex, penta associates, logic systems engineering, and logic corp., by keyboard configuration, display, record length in characters, purchase price, and monthly rental, with remarks on converters, poolers, and station counts. for the keymatic, the quoted price covers the basic 88 keys; 256 unique keys are available, as well as an optional printer. legend: k/t = key to magnetic tape system; k/d = key to disk system; k/c = key to cassette; k/m = key to computer compatible magnetic tape; kp = key punch; t/kp = typewriter or key punch; t = typewriter. backlight = a matrix consisting of all individual characters that can be keyed; each character, as keyed, is displayed one at a time in its particular position in the matrix. projection and light-emitting diodes = a one-character-position dot matrix; each character, as keyed, is displayed one at a time in the same position. bcd (bit) = lights displaying the bit positions (on, off) of individual characters; each character, as keyed, is displayed one at a time. (the prices quoted and the characteristics given for each device reflect the best information that could be obtained by the recon staff.)]

... could be assigned to single keys and translated to their proper value by software, thus reducing the amount of keystroking required. the keymatic appears worth further investigation; therefore, the library may rent a device for several months for testing and evaluation. a typist will be trained in current marc/recon procedures and assigned to the keymatic as soon as her training period has been completed. the first month will be spent training on the keymatic prior to the actual input of recon records to obtain production and error rates and cost evaluation for comparison purposes.

serious consideration was also given in the recon report to direct-read ocr equipment; however, at that time no equipment existed that offered the technical capability to perform the conversion of the lc record set. since then, preliminary investigation of the model 370 compuscan universal optical character reader proved interesting enough to continue further exploration of the device. the model 370 compuscan is a computer-directed flying-spot scanner which matches the scanned portion of a character with a character described in the core memory of the computer.
the manufacturer has examined a sample of lc printed cards selected at random over a period of twenty years and has concluded that although the hardware is sufficient to read the record set optically, significant software effort would be required. the results of the sampling indicated that the record set is not constituted entirely of "mint" cards, i.e., cards printed from the metal of the original linotype composition, but is composed of originals and reprints of the original. when the stock of the original printing is close to depletion, the card is reprinted by photographing the card, and duplicates are made by a photo-offset process. as this cycle is repeated, the card for any one title could be several generations removed from the original. in some instances, a microscopic examination of the cards seems to indicate that the matrices used in the linotype composition were worn. because of these factors, what might appear as the same character to the naked eye would represent different pattern configurations to the scanner's core memory.

the coarseness of the card surface may also cause variations in the same characters. lc cards have a high rag content in order to meet the archival standards required by libraries. the roughness of the surface does not affect the readability for the human but may cause variations in a given character when read by an optical scanner. another significant problem with lc cards concerns characters which touch, i.e., connections between what are intended to be distinct characters but are read by the scanner as one. for example, if a lower case "n" were next to a lower case "t" and the cross bar on the "t" touched the "n," the scanner would consider the combination of the "n" and the "t" as one character.

software must be written to handle the variant character and the touching character problems. in the case of the touching characters, the machine must recognize some allowable limit of reading a single character, and when this limit is exceeded, the pattern read must be divided and matched against single-character patterns held in core. programs can be written so that if either of the above conditions occurs, the output on magnetic tape will be flagged for later spot checking, permitting the scanner to continue to operate at throughput speeds without human intervention. the resultant magnetic tape would serve as input to the library's format recognition programs to reformat the scanner's output into the marc ii format. it has been estimated that the throughput speed of compuscan would be in the vicinity of 1800 cards per hour.

the lc record set will be microfilmed according to the specifications required by the scanner. since the scanner operates with negative film, a very dark background with a very clear, white image is necessary. a tentative cost estimate of the microfilming and reading has been computed at approximately fifty cents per 1000 characters output on magnetic tape (approximately three lc cards). this price does not include the cost of the software. original printed "mint" cards will be used to test the device without implementing the required software, and depending on the results, investigation may be continued.

the keying of the 1969 recon records has been performed by a contractor using an ibm selectric typewriter with the resulting hard copy fed through a farrington optical character reader. as part of the contractor's services to the library, production rates were monitored and reported.
this gave lc the basis to compare two devices, the key-to-cassette used at the library of congress for the marc distribution service and the equipment used by the contractor for recon records. to make the comparison in table 5, it was necessary to determine the costs for each method using the techniques developed in the recon report (9). some modifications of cost were made to the original recon estimates because actual figures are now available. marc costs were obtained by dividing the costs of the man-hours for typing and proofing in a given period by the number of records added to the marc master file in the same period. the equipment cost per record was also based on the number of records added to the master file. production rates associated with particular tasks were not used. the manpower figures supplied by the contractor were limited to hourly production rates; therefore, to obtain the cost per record for ocr typing it was necessary to project the hourly rate to cover a man-year. the estimated annual production of a typist was then divided into the annual salary of a gs-4 (step 1) typist incremented by 8.5% for fringe benefits. the ocr equipment costs were computed on the basis of figures supplied by the contractor, assuming ownership of the ocr-font typewriter and service bureau rental of the scanner.

table 5. input costs per record

1. manpower

key to cassette method:
typing $.45
proofing $.70
total $1.15

ocr method:
typing rate of contractor: 1,000 records in 104 hours, or 9.6 records per hour.
typing cost at lc: ($5,522 + 8.5% of $5,522) / (9.6 x 1,338) = $.466
proofing rate of recon editors at lc: 1,534 records proofed in 173 hours, or 8.9 records per hour; less 20% = 7.1 records per hour.
proofing cost at lc: ($6,882 + 8.5% of $6,882) / (7.1 x 1,338) = $.786
typing $.466
proofing $.786
total $1.25

2. equipment (costs do not include maintenance where applicable)

key to cassette:
key to cassette monthly rental $100.00
converter monthly rental prorated over 10 key-to-cassettes $26.00
total $126.00
hourly cost (assumes 132 hours a month) $.955
effective production rate of key to cassette: average weekly marc output of 1,005 records over 120 key-to-cassette hours (4 units) = 8.4 records/hour
record cost of key to cassette and converter: $.955 / 8.4 = $.114

ocr method:
ocr-font typewriter purchase price $500.00; 40-month amortization, $12.50/month; hourly cost (assumes 132 hours use) $.095
effective production rate of ocr typewriter: (9.6 records/hour x 1,338 hours) / (132 hours x 12 months) = 8.1 records/hour
record cost of ocr typewriter: $.095 / 8.1 = $.012
ocr scanner, service bureau hourly rental $50.00; at 10,000 lines/hour (approximately 18 lines per record), 555 records/hour
record cost of ocr scanner: $.09
total record cost for equipment: $.012 + $.09 = $.102

the cost of proofing in the ocr method was based on the recon experience at lc modified by contractor experience. in actual practice, ocr records are proofed and corrected by the contractor before they are proofed by recon editors. it was assumed that double proofing is unnecessary but that allowance should be made for the added difficulty of reading copy with a higher proportion of errors. (a preliminary study of errors on recon proofsheets has shown that there are fewer typographical errors on recon proofsheets than on current marc proofsheets.) for this reason, the number of recon records proofed in an hour has been decreased by 20% in the calculations.
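the per-record figures in table 5 follow from simple rate arithmetic. the short script below reproduces them as a sketch only; the salary, fringe, hours, and production figures are the ones quoted above, and the function and variable names are invented for the example:

```python
# reproduce the table 5 cost-per-record arithmetic from the figures quoted above.
FRINGE = 0.085        # 8.5% fringe benefits
ANNUAL_HOURS = 1338   # productive hours in a man-year, as used in the report

def manpower_cost_per_record(annual_salary, records_per_hour):
    """loaded annual salary divided by annual record output."""
    loaded = annual_salary * (1 + FRINGE)
    return loaded / (records_per_hour * ANNUAL_HOURS)

# ocr method, manpower (gs-4 typist $5,522; proofer $6,882)
typing = manpower_cost_per_record(5522, 9.6)    # contractor rate: 9.6 records/hour
proofing = manpower_cost_per_record(6882, 7.1)  # 8.9 records/hour less 20%
print(round(typing, 3), round(proofing, 3))     # 0.466 0.786

# ocr method, equipment
typewriter_hourly = (500 / 40) / 132                # $500 amortized over 40 months, 132 hours/month
typewriter_rate = 9.6 * ANNUAL_HOURS / (132 * 12)   # effective rate, about 8.1 records/hour
scanner_per_record = 50 / 555                       # $50/hour service bureau, 555 records/hour
equipment = typewriter_hourly / typewriter_rate + scanner_per_record
print(round(equipment, 3))                          # 0.102
```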
on the basis of the calculations in table 5, the comparative input costs are summarized as follows:

table 6. estimated input cost per record

                    key-to-cassette    ocr
manpower: typing         $.45          $.47
          proofing        .70           .78
equipment                 .11           .10
totals                  $1.26         $1.35

the final figures indicate that the two methods are very close in cost. as presently calculated, the key-to-cassette method is less expensive than the ocr method. it is easy to see that a slight change in any cost or production rate could make the ocr method less expensive. if the proofing rate of 8.9 records per hour were maintained instead of decreasing to 7.1 per hour, the ocr proofing cost would drop to $.63, and the total price for this proposed method would be $1.20. one way to test the assumption of the added difficulty of a single proofing would be to obtain uncorrected records from the contractor as a means of determining the actual proofing rate under that condition.

recon tasks

the four tasks that have been identified for study by the working task force are: 1) levels of completeness of marc records; 2) implications of a national union catalog in machine readable form; 3) conversion of existing data bases in machine readable form for use in a national bibliographic service; and 4) study of problems involved in any future distribution of name and subject cross reference control files. progress to date on the first three tasks is described in the following paragraphs.

task 1 has been completed, and an article summarizing the results of a report submitted to clr has been published in the journal of library automation, june 1970 (10). the following conclusions reached by this study are quoted from the article:

1) the level of a record must be adequate for the purposes it will serve.
2) in terms of national use, a machine readable record may function as a means of distributing cataloging information and as a means of reporting holdings to a national union catalog.
3) to satisfy the needs of diverse installations and applications, records for general distribution should be in the full marc ii format.
4) records that satisfy the nuc function are not necessarily identical with those that satisfy the distribution function.
5) it is feasible to define the characteristics of a machine readable nuc report at a lower level than the full marc ii format.

task 2 consists of an investigation of the implications of a national union catalog in machine readable form. a design of such a system is needed, and although the implementation of such a project is beyond the purview of the working task force, some of the technical and cost factors should be examined and defined for possible future research. as a framework for discussion purposes, a future reporting system for the national union catalog was postulated based on the present reporting system as follows:

contributor: lc. present report form: printed cards. future report form: lc marc data (for all records).
contributor: outside libraries. present report form: locally produced cards and lc cards. future report form: marc data (for all records), or records submitted to nuc to be keyed as machine readable records.

the problems of the control number and library location symbols were considered, but a tentative decision was made that recommendations should be forthcoming when the american national standards institute sectional committee z39 has completed its work on library identification codes.
the indicators and subfield codes to be included in the machine readable nuc records would depend on the optimum file arrangement of the suggested bibliographic listings. the library of congress is presently engaged in a filing rules study which should influence the inclusion or exclusion of particular content designators. task 2 is still in progress.

task 3 is the investigation of the possible utilization of other machine readable data bases for use in a national bibliographic store. the task was divided into several subtasks as follows: 1) identification of useful data bases for the purposes described (content and bibliographic completeness); 2) cost of the conversion from a local format to a marc ii record; 3) cost of updating records not already in the lc data base for consistency and missing data by comparing the records with the library of congress official catalog; 4) cost of comparing the records against the existing lc machine readable records to eliminate duplicate records.

to satisfy the first subtask, a questionnaire was sent to 42 organizations. the information requested included:

1) availability of data bases - maintained by library or service bureau, and permission to copy data base.
2) use of the data base - for acquisitions, production of book catalog, circulation system, etc.
3) composition of data base - monographs, serials, technical reports, etc.
4) composition of data base - number of titles, imprint dates (primarily current, retrospective, etc.), language of records.
5) source of catalog data - marc distribution service, lc catalog card, local cataloging.
6) data elements for monographs.
7) format used in identifying data elements - marc i format, marc ii format, etc.
8) character set used.

the results from this survey were analyzed, and a follow-up letter was sent to 22 of the organizations, requesting further information as follows:

1) an estimate of the number of monographs added to the data base each year.
2) representative group of twenty-five entries for monographs including both fiction and non-fiction.
3) details on the character set used in the machine readable data base.
4) detailed specifications of monographic record format.

responses from this last letter have been received and analyzed. this analysis should identify a limited number of machine readable data bases that will be subjected to further content and cost analysis.

outlook

the recon project continues to be on schedule. the working task force has met several times for deliberations on the assigned tasks; in addition, members have been briefed on the progress of the pilot project and their advice has been sought. thus, individuals interested in the problems of bibliographic conversion guide the project throughout its development. the library of congress recon staff continues to maintain liaison with individuals and organizations working in any facet of the project's scope, hoping to bring all expertise possible to bear on the problems involved.

it is significant, although not fully recognized at the onset of the recon project, that the solution to many of the problems under exploration will have impact on current conversion as well as retrospective conversion. this is evident at the library of congress where marc and recon, although staffed separately in the production area, share staff in the information systems office, and the project is known as marc/recon. coordination continues between the recon project and the card division mechanization project.
the recon project director is the technical adviser for the card division project, and under her general direction, a computer analyst in the information systems office has been assigned full time to the project. the analyst has been given a detailed orientation to the procedures and computer programs for marc/recon and the specifications for the card division project. this exposure is necessary to guarantee that there is no duplication of effort between the two projects and that the design work for the card division project includes the possibility of a future national service for machine readable cataloging, both current and retrospective. (the marc distribution service is such a national service for english language monograph cataloging data, but what is assumed here is a service of a much broader scope.)

although progress has been made in many of the tasks included in recon, several methods of input described in the recon report can only be fully evaluated when the format recognition programs are implemented. according to present estimates, this should take place toward the end of 1970. much remains to be accomplished. the library of congress will continue to make its progress known as rapidly as possible, because the results of the pilot project will have great ramifications for the entire library community.

acknowledgments

the authors wish to thank the staff members associated with the recon pilot project in the technical processes research office and the marc editorial office in the library of congress processing department, and those in the information systems office, for their respective reports, which were incorporated into the progress report submitted to the council on library resources and which provided significant contributions to this paper.

references

1. avram, henriette d.: "the recon pilot project: a progress report," journal of library automation, 3 (june 1970).
2. recon working task force: conversion of retrospective records to machine-readable form (washington, d.c.: library of congress, 1969), pp. 32-33.
3. avram, henriette d., et al.: "marc program research and development: a progress report," journal of library automation, 2 (december 1969), 250-253.
4. recon working task force: op. cit., p. 31.
5. national microfilm association: glossary of terms for microphotography and reproductions made from micro-images. 4th rev. ed. (annapolis, md.: national microfilm association, 1966), p. 8.
6. ibid.
7. ibid., p. 52.
8. hawken, william r.: copying methods manual (chicago: library technology program, american library association, 1966), p. 243.
9. recon working task force: op. cit., pp. 58-59, 86, 93.
10. recon working task force: "levels of machine-readable records," journal of library automation, 3 (june 1970).

design principles for a comprehensive library system

tamer uluakar, anton r. pierce, and vinod chachra: virginia polytechnic institute and state university, blacksburg, virginia. (manuscript received july 1980; accepted february 1981.)

this paper describes a project that takes a step-by-step or incremental approach to the development of an online comprehensive system running on a dedicated computer. the described design paid particular attention to present and predicted capabilities in computing as well as to trends in library automation. the resultant system is now in its second of three releases, having tied together circulation control, catalog access, and serial holdings.

perspective

the use of computers in libraries is no longer a speculative venture for the daring few.
rather, library automation has become the accepted prerequisite for effective library service. the question faced is not "if," but rather "how" and "when." the reasons for this evolution are diverse, but fundamental is the recognition of online computer processing as the most effective means of simultaneously handling inventory control, information retrieval, and networking of large, complex, and volatile stores of data.

most areas of current library practice could now benefit from effective computer-based control. mature and proven systems exist for cataloging, circulation, serials control, acquisitions, catalog access, and "reader guidance"; the latter by virtue of online literature searching facilities such as dialog, medlars, or brs. the challenge is to find or develop an optimal mix of capabilities.

two common limitations from which library automation projects suffer are the use of nonstandardized, incomplete records and the lack of functional integration of different tasks. in most cases these limitations are due to historic circumstances. the pioneering systems, say, those online systems introduced between 1967 and 1975, had to conserve carefully the available computing resources. a decade ago it was unthinkable for any library to store a million marc records online. mass storage costs alone precluded that option. to best realize the benefits of automation, short records, usually of fixed length, were employed.

there is little question that systems based on short records were helpful to their users. however, one characteristic of these systems was their proliferation within a particular library. after the first system was shown to be a success, it became compelling to try another. the problem was that these separate systems were usually not communicating directly with each other because of limitations imposed by program complexity and load on available resources. thus, the use of incomplete records breeds isolated, noncommunicating systems. however, system users have come to demand that all relevant data be available at a single terminal from a single system. it is not enough to know that a particular title is due back in twenty-five days; the user must also know that copy two has just been received, and that copy three is expected to arrive from the vendor in one week. that is, the functions of catalog access, circulation, and acquisitions must be brought together at a single place: the user's terminal. and while the importance of functional integration has been recognized for some time, only a very few report successful implementations.1,2 the kafkaesque alternative to functional integration becomes the library that has been "well computerized" but where the librarian must use five different terminals, one for each task.

as computer-based systems have grown to maturity, increasing stress has been placed on standardization. in library automation the measure of standardization is wide-scale use of the marc formats for documents and authorities; the use of bibliographic "registry" entries such as isbn, issn, or coden; the use of standard bibliographic description; and so forth. however, the application of common languages and standardized protocols, data description, and definition has been less pervasive. we find many applications that eschew use of the common high-level languages, database management systems, and standard "off-the-shelf" or general-purpose hardware.
the emergence of powerful and easy-to-use database management systems, the spectacular price reductions in hardware, and the concomitant, and equally spectacular, improvements in system capabilities have made it clear that it is practical to think ambitiously. perhaps the major articulation of these developments has been the pervasive shift from a central computer shared with nonlibrary users to the utilization of dedicated minicomputers.3

our analysis of the requirements of a comprehensive system led to recognition of the key role played by serials in research libraries. serials form the most critical factor in automating library service because of the complexity of their bibliographic, order, and inventory records, and because of their importance to research.4 a fundamental error in designing a comprehensive library system would involve focusing on the requirements of monographs and/or other "one-shot" forms of the literature. the reason is, simply, that monographs and other such publications can be treated as an easy limiting case of a continuing set of publications. this observation is borne out by christoffersson, who reports an application that extends the idea of seriality and develops a means to provide useful control and access to all classes of material.5

design philosophy

the concerns outlined above mean that a viable library system should meet the following design criteria:

functional integration. functional integration is simply the ability to conduct all appropriate inquiries, updates, and transactions on any terminal. this envisages a cradle-to-grave system wherein a title is ordered, has its bibliographic record added to the database, is received and paid, has its bibliographic record adjusted to match the piece, is bound, found by author, title, subject, series, etc., charged out, and, alas, flagged as missing. in this way a terminal linked to the system will be a one-stop place to conduct all the business associated with a particular title, subject, series, order, claim, vendor, or borrower.

completeness of data. if the system is to be functionally integrated, it is clear that it must carry the data required to support all functions. in particular, data completeness is required to satisfy the access and control functions. consider, for example, the problems associated with the cataloging function. a book is frequently known by several titles or authors. creating these additional access points is a large portion of the cataloger's responsibility. only systems that allow the user access to these additional entries utilize the effort spent in building the catalog record. such system capabilities must be present to allow the labor-intensive card catalog to be closed and, more important, to allow maintenance of the catalog within the system.

use of standardized data and networking. in an excellent article, silberstein reminds us that, in general, the primary rationale for adhering to standards is interchangeability.6 we give great importance to being able to project our data to whatever systems may develop in the future. we believe this consideration is of the highest priority because, fundamentally, the only thing that will be preserved into the future is the data itself.* without interchangeability of data, sharing of resources is impossible.

*this state of affairs seems to be true for all computer-based systems because their lifetime is, typically, no greater than ten years.
data interchangeability is, of course, a basic assumption that has been made in speculation concerning the national bibliographic network7 developing from the bibliographic utilities, notably oclc, inc., the research libraries group's rlin facility, the washington library network, and the university of toronto's utlas facility. today, nearly all research libraries participate in some utility. while their participation is primarily directed to utilization of the cataloging support services, we find an increasing amount of interest and use of additional capabilities, notably interlibrary loan. we expect a steady and continual growth of these library networking capabilities.

however, networking is not problem free. perhaps the biggest single problem in using the network is the misalignment between the record as found on the bibliographic database and the requirements of individual libraries. while such variability between the resource database record and the user's needed version is well understood,8 the local library frequently has a difficult time adjusting records to meet local needs. one example is oclc's inability to "remember" in the online database a particular library's version of a record. another example is the conser project's practice of "locking" very dynamic records as soon as they are authenticated. this locking frequently means that required updates cannot be made and users cannot share with one another corrections to the base record. after locking, each must, independently, go about bringing the record up to date. thus, as roughton notes, "the next library to call up the record loses the benefit of the previous library's work."9 this inhospitable state of affairs forces individual libraries to maintain their own records if they wish to change bibliographic records after initial entry.

the problem of local adjustment of bibliographic records in no way conflicts with the goal of standardized bibliographic data. standardized data provides a quick means of delivering an intelligible package to a variety of users who will adapt the package to meet their particular needs. standardization does not mean making adaptation inefficient or more costly than it need be; rather, standards provide a framework around which the details are filled in. these observations on standardized data formats imply that the library's data must be based on marc records for books, serials, authorities, etc., and on the ansi standards for summary serials holdings notation, book numbers, library addresses, and so forth.

microscopic data description. at this point, system administrators face a fundamental problem: many of the library's important records have no standard format. the most conspicuous example involves the notation for detailed serials holdings.10 the only alternative one has when trying to build a system without standardized formats is to rely on "microscopic" description. that is, each and every distinct type of data element that makes up (or can make up) a field in a record must be accounted for and uniquely tagged. in this way, whatever standard format is ultimately set, it will be possible, in principle, to assemble by algorithm the data elements into an arrangement that will be in conformity with the standard.
only if the library is using microscopic data description will the library be able to maintain its independence of particular lines of hardware or software. we are convinced that the use of untagged, free-form input will, in the long run, spell disaster.

use of general-purpose hardware and software. many strategies in dealing with library automation involve redesigning standard hardware or software. for example, one vendor has reported an interesting design of mass storage units that improved access time.11 we feel that future applications should, as much as possible, steer clear of such customized implementations because the standard capabilities of most affordable systems allow sufficient processing power and storage economies even if these capabilities are suboptimal for a particular application. the use of general-purpose hardware and system software promotes system sharing between different installations. moreover, an application based on general-purpose hardware and system software will be easier to maintain and far less vulnerable to changes in personnel. for turnkey installations, the greater the degree of use of general-purpose hardware and software, the better shielded will the installation be against changes in product line or the vendor's ultimate demise. a noteworthy application of this principle of compatibility is seen in the system being developed by the national library of medicine.12

system description

the functional capabilities of the virginia tech library system (vtls) have been developed in two software releases, with the third release soon to appear. the initial release met the needs associated with circulation control and also provided rudimentary access to the catalog and serials holdings. the present release has benefited from the use of the marc format, and allows vastly improved catalog access and control. release iii, the comprehensive library system now being developed, will draw together acquisitions, authority control, and serials control with the current capabilities.

vtls release i

the initial release of the system was developed in 1976 to meet needs generated by rapid library growth. circulation transactions had been increasing at about 10 percent annually for the previous decade and were straining the manually maintained circulation files beyond acceptable limits. the main library* at virginia tech is organized in subject divisions, each essentially "owning" one floor of a 100,000-square-foot facility. a 100,000-square-foot addition to the library had been approved. because virginia tech's library has only one card catalog, some means was necessary to distribute catalog information throughout a facility that was to double its size. after reviewing the alternative means of distributing the catalog, e.g., a duplicate card catalog, photographic reproduction of the catalog, or a com catalog, it was decided to attack both problems, circulation control and remote catalog access, within a single online system. vtls was installed on a full-time basis in august 1976.

*only two quite small branch libraries (architecture and geology) exist on campus. in addition there is a reserve collection located in the washington, d.c., area that supports off-campus graduate programs in the areas of education, business administration, and computer science. all these sites are linked to the system.
its first release ran continuously on the library's dedicated hewlett-packard 3000 minicomputer until december 1979. at that time the system held brief bibliographic data for approximately 325,000 monographs and 25,000 journals and other serial titles, records for about half the collection. while the first release ably met its goals, it became clear that it would prove to be an unsuitable host for additional modules involving acquisitions and serials control, primarily because of the brief, fixed-length bibliographic records. as a result of highly favorable price reductions in computer hardware and improvements in capability, it was possible to think in terms of storing one million marc records online as well as supporting the additional terminals required for a comprehensive library system.

vtls release ii

vtls runs under a single online program for all real-time transactions. the major goals in the design of this program were the following:

1. two conflicting requirements had to be accommodated: first, the program had to be easy to use for library patrons. this is requisite for a system that will eventually replace the card catalog. second, the program had to be practical, efficient, and versatile for its professional users. the keystrokes required had to be minimal, and related screens had to be easily accessible from one to another.
2. the response time had to be good, especially for more frequent transactions.
3. the contents of all screens had to be balanced to provide enough information without being overcrowded and difficult to read or comprehend. further, each screen of vtls had to follow some logical arrangement of the data it contains; for most screens this meant alphabetical sorting of the data according to ala rules.
4. the format of all screens, especially those to be viewed by the patrons, had to be visually pleasing. thus, the use of special symbols (which are so abundant on many computer system displays), nonstandard abbreviations, and locally (and often quite arbitrarily) defined terms were unacceptable.
5. the program had to have security provisions to restrict certain classes of users from addressing particular modules of the program.

considerable effort was spent to satisfy these goals. the first goal was achieved by the "network of screens" approach. the second goal, prompt system response, necessitated the use of the "data buffer method," which, in turn, proved to have other uses (both of these techniques are discussed below). to satisfy goals three and four, a committee of librarians and analysts spent months drafting and reviewing each screen until it was finally approved by the design group. goal five, security provisions, was reached without much difficulty.

network of screens

vtls's data-access system is designed to be used as easily as a road map. this is accomplished by the use of a "network of screens." the network of screens is much like a road map in which a set of related data (a screen displayed in one or more pages) acts as a "city," and the commands that lead from one set to another act as "highways." vtls has nineteen screens including various menu screens, bibliographic screens (see "the data buffer method" below), serial holdings screens, item (physical piece) screens, and screens for patron-related data. the user can "drive" from one "city" to another using system commands. the system commands are either "global" or "local."
global commands, as the name implies, may be entered at any point during the execution of the online program. a local command is peculiar to a given screen. global commands are of two types: search commands and processing commands. search commands are used to access the database by author, title, subject, added entries, call number, lc card number, isbn, issn, patron name, etc. processing commands, on the other hand, initiate procedures such as check-out, renewal, or check-in of items. the user first enters a global (search) command to access one of the screens in the network. from there, local commands that are specific to the current screen can be used. there are three different types of local commands: commands that take the user from one screen to another; commands that page within the current screen; and commands that update data related to the screen.

for example, it is possible to start by entering an author search command to access the network and then proceed not only to find what books the author has in the system but also the availability of each of the books. if the books are checked out, information about the patrons who have them can also be reached. this display is called the patron screen. from the patron screen, one can "drive" to the patron activity screen, which displays circulation information about the patrons. thus, each displayed screen leads to another. in fact, the searches can start at ten different screens and proceed in many different ways through the network.

database design

image/3000, hewlett-packard's database management system used by vtls, is designed to be used with fixed-length records. this fact, coupled with the need to sort entries on most screens, created serious problems in the early stages of the system design. but various techniques were devised to overcome these apparent roadblocks. figure 1 illustrates the breakdown of the bibliographic record in the database and the way it is linked with piece-specific data. bibliographic data are stored in three distinct groups for subsequent retrieval:

1. controlled vocabulary terms. (authority data set)
2. title and title-like data. (title data set)
3. all remaining bibliographic data, i.e., data that is not indexed. (marc-other data set)

this grouping of the marc record extends to subfields, thus splitting mixed fields such as author-title added entries. when individual fields are parsed in this way, a single field may contribute more than one access point, such as variant forms of author, title, series name, subject, and added entries. access by the standard bibliographic control numbers is effected by use of inverted files (not shown in the figure). a fundamental characteristic of this layout involves the storage of controlled vocabulary terms (i.e., authors and subjects). regardless of the number of references made to an authority term from different bibliographic records, the controlled vocabulary term is stored only once. the system assigns a unique number (authority id) to each such term and uses this number to keep records of the references made to it in a separate data set (authority-bibliographic linkage data set). this particular structure makes an authority control subsystem possible, speeds up online retrieval and display, and economizes mass storage.

the data buffer method

the system displays bibliographic records in two different formats.
[fig. 1. bibliographic layout of the vtls database (simplified), showing the authority, title, and marc-other data sets and the authority-bibliographic linkage data set.]

if the terminal used is designated for librarians, the records are displayed in the marc format (the resulting screen is referred to as the marc screen); otherwise, they are displayed in a screen that is formatted similar to a catalog card. before displaying these screens, the online program collects and formats the data to be displayed and stores it in one of the two "buffer" data sets. the records stored in the buffer data sets are called buffer records. buffer records can be edited, as required, by adding new lines, deleting, or modifying existing character strings. these updates can be executed quickly and without placing much load on the system since they involve little, if any, analysis, indexing, and sorting. thus, the buffer data sets store all bibliographic updates and new data entry of the day. at night, these records are transferred to the rest of the database by a batch program.

the data buffer method has had several pronounced effects on the system. by transferring periods of heavy resource demand to off-hours, the system can work with full marc records in a library that has a heavy real-time load of data entry, inquiry, and circulation. the data buffer approach also improves access efficiency because once a buffer record is prepared for a screen, subsequent searches for the same record are satisfied by the buffer record.

data entry and the oclc interface

the most frequently encountered method of entering marc records into a local computer involves use of tape in the marc ii communications format. alternative methods include the use of microprocessors or digital recorders which "play back" a marc-tagged screen image from oclc or some other bibliographic utility. these alternative methods have the strong advantage of shortening the delay introduced while waiting for a tape to be delivered. we have been able to link the utility's terminal to the data buffer.13 data flows from the utility to the buffer in real time. no intervention in the utility's terminal was required for the local processor to be able to capture the marc-tagged screen. batch programs running on the hp 3000 read records from printer ports of oclc terminals and pass them directly to the data buffer. once a record gets into the data buffer, it is accessible by oclc number so that subsequent editing and linkage to piece-specific data or serial holdings can be made right away in the local system. buffer records can also be created by direct keyboarding of the full array of fixed and variable fields using the vtls terminals.

circulation

as with most other online circulation systems, vtls uses machine-sensible bar-code labels to identify books and borrowers to the system. all efforts have been made to humanize the system. one consequence is that the system does not make decisions better made by responsible staff. thus, two kinds of circulation stations reside side by side. the first is staffed by students who typically work a ten-to-twenty-hour week and historically have shown high turnover. their circulation stations only deal with inquiries and with heavily used but nondiscretionary transactions: check-out, renewal, and check-in.
should problems arise, the borrower is directed to the adjacent station staffed by a full-time employee who, using the system, can articulate circulation policy to borrowers and make decisions with regard to any questions concerning fines, lost books, or reinstatement of invalidated or blocked privileges.

start-up

we found system start-up to be a relatively easy task. it was convenient to use the so-called rolling conversion in which items were labeled upon their initial circulation through the system. the greatest benefit was seen in the first year when the probability that items brought to the circulation desk were already known to the system increased exponentially. after six months this probability had risen to 65 percent with only 10 percent of the circulating collection having been labeled. at the end of the year the probability increased linearly at 0.7 percent per month. after three years of operation, the probability was 90 percent, with approximately 50 percent of the circulating collection having been labeled.

reference use

the ability to distribute catalog access as well as circulation information provides a powerful information tool. a subset of all functions previously described is available to the nonlibrarian users of the system through user-cordial screens. a "help" function may also be initiated at any screen to guide users through the network of screens.

current development

critical to the overall design of vtls is the system's ability to treat serials and continuations. without this capability, the modules being developed to support acquisitions, serials check-in and claiming, and binding will not function satisfactorily. equally important, the design lays the foundation for authority control by virtue of its use of a dictionary for all controlled vocabulary terms. thus a name or subject entry is carried internally as a four-byte code, which is translated to the authority entry upon display. another internally coded data element, the bib-id, is designed to handle many of the linkage problems associated with serials and continuations. the bib-id is unique for each marc record.

prior to establishing the serials control modules governing receipt, claiming, and binding, the coded holdings module must be functioning. this module will allow automatic identification of volume (or binding unit) closure and automatic identification of gaps in holdings or overdue receipts. thus, highest priority has been given to the development of this module so that these other modules can, in turn, develop. the holdings module serves two functions: first, it allows the detailed recording of serials holdings consistent with the principle stated earlier concerning microscopic data description; and second, these microscopic data are coded so that the system can recognize (and predict) particular pieces or binding units in terms of enumerative and chronological data.

the next three areas of development are modules for acquisitions and fund control, serials receipts and binding, and authority control. the final development will be comprehensive management reports. it should be noted that each one of these developments will result in a specific benefit to the user community. the project is incremental in that the development of area a does not mean that area b must be developed for a to have lasting value.
this incremental approach offers designers and administrators the advantages associated with an orderly growth in complexity and budget requirements. further, the capabilities of the host hardware and software are stressed in smaller steps than would be the case if the comprehensive system were written and then turned on. the key move appears to be predefining the scope and capabilities of each stage so that a useful product emerges at its completion, and so that it lays a foundation for the next.

references

1. velma veneziano and james s. aagaard, "cost advantages of total system development," in proceedings of the 1976 clinic on library applications of data processing (urbana, ill.: university of illinois press, 1976), p.133-44.
2. charles payne and others, "the university of chicago data management system," library quarterly 47:1-22 (jan. 1977).
3. audrey n. grosch, minicomputers in libraries (new york: knowledge industry press, 1979), 142p.
4. richard degennaro, "wanted: a mini-computer serials system," library journal 102:878-79 (april 15, 1977).
5. john g. christoffersson, "automation at the university of georgia libraries," journal of library automation 12:23-38 (march 1979).
6. stephen m. silberstein, "standards in a national bibliographic network," journal of library automation 10:142-53 (june 1977).
7. network technical architecture group, "message delivery system for the national library and information service network: general requirements," in david c. hartmann, ed., library of congress network planning paper, no. 4, 1978, 35p.
8. arlene t. dowell, cataloging with copy (littleton, colo.: libraries unlimited, 1976), 295p.
9. michael roughton, "oclc serials records: errors, omissions, and dependability," journal of academic librarianship 5:316-21 (jan. 1980).
10. tamer uluakar, "needed: a national standard for machine-interpretable representation of serial holdings," rtsd newsletter 6:34 (may/june 1981).
11. c.l. systems, inc., "the libs 100 system: a technological perspective," clsi newsletter, no. 6 (fall/winter 1977).
12. lister hill national center for biomedical communications, national library of medicine, "the integrated library system: overview and status" (lhc/ctb internal documentation, bethesda, md., october 1, 1979), 55p.
13. francis j. galligan to pierce, 11 feb. 1980.

tamer uluakar is manager of the virginia tech library automation project. anton r. pierce is planning and research librarian at the university libraries. vinod chachra is director of computing resources and associate professor of industrial engineering.

a scatter storage scheme for dictionary lookups

d. m. murray: department of computer science, cornell university, ithaca, new york

scatter storage schemes are examined with respect to their applicability to dictionary lookup procedures. of particular interest are virtual scatter methods which combine the advantages of rapid search speed and reasonable storage requirements. the theoretical aspects of computing hash addresses are developed, and several algorithms are evaluated. finally, experiments with an actual text lookup process are described, and a possible library application is discussed.

a document retrieval system must have some means of recording the subject matter of each document in its data base. some systems store the actual text words, while others store keywords or similar content indicators.
the smart system (1) uses concept numbers for this purpose, each number indicating that a certain word appears in the document. two advantages are apparent. first, a concept number can be held in a fixed-sized storage element. this produces faster processing than if variable-sized keywords were used. second, the amount of storage required to hold a concept number is less than that needed for most text words. hence, storage space is used more efficiently.

smart must be able to find the concept numbers for the words in any document or query. this is done by a dictionary lookup. there are two reasons why the lookup must be rapid. for text lookups, a slow scheme is costly because of the large number of words to be processed. for handling user queries in an on-line system, a slow lookup adds to the user response time.

storage space is also an important consideration. even for moderate-sized subject areas the dictionary can become quite large, too large for computer main memory, or so large that the operation of the rest of the retrieval system is penalized. in most cases a certain amount of core storage is allotted to the dictionary, and the lookup scheme must do the best possible job within this allotment. this usually means keeping the overhead for the scheme as low as possible, so that a large portion of the allotted core is available to hold dictionary words. the rest of the dictionary is placed in auxiliary storage and parts of it are brought in as needed. obviously the number of accesses to auxiliary storage must be minimized.

this paper presents a study of scatter storage schemes for application to dictionary lookup, methods which appear to be fast and yet conservative with storage. the next two sections describe scatter storage schemes in general. they are followed by a section presenting the results of various experiments with hash coding algorithms and a section discussing the design and use of a practical lookup scheme. the final sections deal with extensions and conclusions.

basic scatter storage method

a basic scatter storage scheme consists of a transformation algorithm and a table. the table serves as the dictionary and is constructed as follows: given a natural language word, the algorithm operates on its bit pattern to produce an address, and the concept number for the word is placed in the table slot indicated by this address. this process is repeated for every word to be placed in the dictionary. the generated addresses are called hash addresses; and the table, a hash table.

there are many possible algorithms for producing hash addresses (2,3,4). some of the most common are: 1) choosing bits from the square of the integer represented by the input word; 2) cutting the bit pattern into pieces and adding these pieces; 3) dividing the integer represented by the input word by the length of the hash table and using the remainder.

collisions

in an ideal situation every word placed in the dictionary would have a unique hash address. however, as soon as a few slots in the hash table have been filled, the possibility of a collision arises: two or more words producing the same hash address. to differentiate among collided entries, the characters of the dictionary words must be stored along with their concept numbers. during lookup, the input word can then be compared with the character string to verify that the correct table entry has been located.
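as a concrete illustration of the basic scheme, here is a minimal sketch using the third transformation listed above (division by the table length, taking the remainder). it is not the smart implementation; the table size, words, and concept numbers are invented for the example, and collisions are simply reported rather than resolved, since collision handling is taken up next.

```python
# basic scatter storage: a transformation algorithm turns a word into a hash
# address; the concept number is stored together with the word's characters so
# that colliding entries can be told apart at lookup time.

TABLE_SIZE = 101                     # illustrative hash table length
hash_table = [None] * TABLE_SIZE     # each slot holds (word, concept) or None

def hash_address(word):
    """division method: treat the word's bytes as an integer and take the
    remainder on division by the table length."""
    return int.from_bytes(word.encode("ascii"), "big") % TABLE_SIZE

def enter(word, concept):
    slot = hash_address(word)
    if hash_table[slot] is not None and hash_table[slot][0] != word:
        raise ValueError("collision -- must be resolved (see the chaining methods below)")
    hash_table[slot] = (word, concept)

def lookup(word):
    entry = hash_table[hash_address(word)]
    # compare the stored character string to verify the correct entry was found
    if entry is not None and entry[0] == word:
        return entry[1]
    return None                      # word never entered in dictionary

enter("library", 17)
enter("automation", 42)
print(lookup("library"), lookup("retrieval"))   # 17 None
```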
the problem of where to store the collided items has several methods of solution (3,5). the linear scan method places a collided item in the first free table slot after the slot indicated by the hash address; the scan is circular over the end of the table. the random probe method uses a crude algorithm to generate random offsets r(i) in the interval [1,h], where h is the length of the hash table. if the colliding address is a, slot a + r(1) mod h is examined, and the process is repeated until an empty slot is found. both of these methods work best when the hash table is lightly loaded, that is, when the ratio between the number of words entered and the number of table slots is small. in such cases the expected length of scan or average number of random probes is small.

chaining methods provide a satisfactory method of resolving collisions regardless of the load on the hash table. however, they require a second storage table, a bump table, for holding the collided items. when a collision occurs, both entries are linked together by a pointer and placed in the bump table. a pointer to this collision chain is placed in the hash table along with an identifying flag. further colliding items are simply added to the end of the collision chain.

table layout and search procedure

in the virtual scatter storage system described later, the hash table has a high load factor. hence the chained method (or rather a variation of it) is used to resolve collisions, and further discussion involves only scatter storage systems using collision chains. with this restriction, then, a scatter storage system consists of a hash table, a bump table, and the associated algorithm for producing hash addresses. a dictionary entry consists of a concept number and the character string for the word it represents. these entries are placed in the hash-bump table as described above. consequently there are three types of slots in the hash table: slots that are empty, slots holding a single dictionary entry, and slots containing a pointer to a collision chain held in the bump table. figure 1 is a typical table layout.

[figure 1. typical table layout: empty slots, single dictionary entries (concept number plus characters), and pointers to collision chains in the bump table.]

one of the advantages of scatter storage systems is that the search strategy is the same as the strategy for constructing the hash-bump tables. a word being given, its hash address is computed and the tables searched to find the proper slot. during construction, dictionary information is placed in the slot; during lookup, information is extracted from the slot. the basic search procedure is illustrated by the flow diagram in figure 2. the construction procedure is similar.

[figure 2. flow diagram for the lookup procedure in basic scatter storage systems: compute the hash address of the input text word, examine the addressed slot, follow the bump table chain if necessary, and either return the concept number or report that the word was never entered in the dictionary.]

theoretical expectations

an ideal transformation algorithm produces a unique hash address for each dictionary word and thereby eliminates collisions. from a practical point of view, the best algorithms are those which spread their addresses uniformly over the table space. producing a hash address is simply the process of generating a uniform random number from a given character string.
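the hash-table and bump-table organization of figures 1 and 2 can be sketched in a few lines. the following hypothetical python fragment is an illustration, not the paper's implementation: the bump table is a python list, chains are linked by list indices, new colliding items are appended to the end of the chain, and the hash function is a stand-in.

```python
H = 101                      # hash table size (illustrative)
hash_table = [None] * H      # None = empty, ("entry", word, concept),
                             # or ("chain", index of chain head in bump table)
bump_table = []              # each element: [word, concept, index of next or None]

def hash_address(word: str) -> int:
    # stand-in transformation algorithm (division method)
    return int.from_bytes(word.encode("ascii"), "big") % H

def insert(word: str, concept: int) -> None:
    a = hash_address(word)
    slot = hash_table[a]
    if slot is None:                         # empty slot becomes a single entry
        hash_table[a] = ("entry", word, concept)
    elif slot[0] == "entry":                 # first collision: move both to the bump table
        bump_table.append([slot[1], slot[2], None])
        head = len(bump_table) - 1
        bump_table.append([word, concept, None])
        bump_table[head][2] = len(bump_table) - 1
        hash_table[a] = ("chain", head)
    else:                                    # later collisions: append to end of chain
        i = slot[1]
        while bump_table[i][2] is not None:
            i = bump_table[i][2]
        bump_table.append([word, concept, None])
        bump_table[i][2] = len(bump_table) - 1

def lookup(word: str):
    a = hash_address(word)
    slot = hash_table[a]
    if slot is None:
        return None                          # word never entered in dictionary
    if slot[0] == "entry":
        return slot[2] if slot[1] == word else None
    i = slot[1]
    while i is not None:                     # walk the collision chain
        w, concept, nxt = bump_table[i]
        if w == word:
            return concept
        i = nxt
    return None
```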
if the addresses are truly random, a probability model may be used to predict various facts about the storage system. suppose a hash table has h slots and that n words are to be entered in the hash-bump tables. let h_i be the expected number of hash table slots with i entries, for i = 0, 1, ..., n. in other words, h_0 is the expected number of empty slots, h_1 is the expected number of single entries, and h_2, h_3, ..., h_n are the expected numbers of slots with various numbers of colliding items. even though collided items are physically located in the bump table, they may be considered to "belong" to the same slot in the hash table. it is expected that:

1) h = h_0 + h_1 + ... + h_n

2) n = h_1 + 2*h_2 + 3*h_3 + ... + n*h_n

now let x_ij = 1 if exactly i items occur in the jth slot and x_ij = 0 otherwise, for j = 1, 2, ..., h. then h_i = e[x_i1 + x_i2 + ... + x_ih], the sum of e[x_ij] over the h slots. assume that any chosen table slot is independent of the others, so that the probability of a given item landing in that slot is 1/h. then the probability of getting exactly i items in the slot is

3) p_i = c(n,i) (1/h)^i (1 - 1/h)^(n-i)

where c(n,i) is the binomial coefficient. then e[x_ij] = 1*p_i + 0*(1 - p_i) = p_i, and substituting into the above,

4) h_i = h*p_i = h c(n,i) (1/h)^i (1 - 1/h)^(n-i), for i = 0, 1, ..., n.

for the cases of interest h and n are large, and the poisson approximation can be used in equation 3: p_i ≈ e^(-n/h) (n/h)^i / i!. the ratio n/h is the load factor mentioned previously; it is usually designated by a, so that

5) h_i = h e^(-a) a^i / i!, for i = 0, 1, ..., n.

equation 5 is sufficient to describe the state of the scatter storage system after the entry of n items. most of the statistics of interest can be predicted using this expression; a few of them are listed in table 1.

the time required for a single lookup using a hash scheme depends on the number of probes into the table space, that is, on how many slots must be examined. suppose the word is actually found: if it is a single entry, only one probe is required; if the word is located in a collision chain, the number of probes is one (for the hash table) plus one additional probe for each element of the collision chain that must be examined. suppose instead that the word is not in the dictionary: if its hash address corresponds to an empty table slot, again only one probe is needed; if the address points to a collision chain, the number is one plus the length of the chain. for words found in the dictionary, the average number of probes per lookup is

6) p = 1 + (1/n) [ (0)h_1 + (1+2)h_2 + (1+2+3)h_3 + ... + (1+2+...+n)h_n ] probes,

and carrying out the sum with equation 5 gives the closed form p = 2 + a/2 - e^(-a) listed in table 1.

table 1. expected storage and search properties for basic scatter storage schemes (h = number of hash table slots; n = number of words to be entered)

load factor: a = n/h
number of empty table slots: h_0 = h e^(-a)
number of single entries: h_1 = n e^(-a)
number of collision chains with i entries: h_i = h e^(-a) a^i / i!, for i = 2, 3, ..., n
expected sums: h = h_0 + h_1 + ... + h_n; n = h_1 + 2*h_2 + ... + n*h_n
fraction of hash table empty: f_0 = h_0 / h = e^(-a)
fraction of table filled with single entries: f_1 = h_1 / h = a e^(-a)
fraction of hash table slots with i entries: f_i = h_i / h = e^(-a) a^i / i!, for i = 2, 3, ..., n
expected sums: f_0 + f_1 + ... + f_n = 1; f_1 + 2*f_2 + ... + n*f_n = a
number of collisions: n_c = h_2 + h_3 + ... + h_n = h - h_0 - h_1
number of entries in the bump table: b = n - h_1
total table slots required: s = h + b
average lookup time (probes): p = 2 + a/2 - e^(-a)
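as a quick numerical check on equation 5 and table 1, the hypothetical python snippet below (not from the paper) computes the expected storage and search figures for a given table size and word count. run with h = 2^15 slots and the 7,822-word adi dictionary used later in the paper, its output can be compared with the "expected" column reported later in table 5.

```python
import math

def expected_properties(h: int, n: int) -> dict:
    """expected storage/search figures for a chained scatter storage scheme
    (equation 5 and table 1), assuming uniformly random hash addresses."""
    a = n / h                                   # load factor
    empty = h * math.exp(-a)                    # h_0
    single = n * math.exp(-a)                   # h_1
    chains = h - empty - single                 # slots with two or more entries
    bump = n - single                           # entries pushed to the bump table
    probes = 2 + a / 2 - math.exp(-a)           # average probes, successful lookup
    return {"load factor": a, "empty slots": empty, "single entries": single,
            "collision chains": chains, "bump table entries": bump,
            "average probes": probes}

# adi dictionary (7822 words) in a 2**15-slot hash table, as in the paper's
# practical scheme; the printed values fall close to the expected column of table 5.
for name, value in expected_properties(2**15, 7822).items():
    print(f"{name}: {value:.3f}")
```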
virtual scatter storage method

from table 1, the expected number of collisions is n_c = h - h_0 - h_1 = h (1 - e^(-n/h) - (n/h) e^(-n/h)). for a fixed n, this number decreases as h increases. at the same time the number of empty hash table slots, h_0 = h e^(-n/h), increases as h increases. both of these results are expected: as the hash addresses are spread over a larger and larger table space (h slots), the number of collisions should decrease and the number of empties increase for a fixed number of entries (n).

a virtual scatter storage scheme tries to balance these opposing trends by combining hash coding with a sparse storage technique. large, or virtual, hash addresses are used to obtain the collision properties associated with a very large hash table, and the storage technique is used to achieve the storage and search properties of a reasonably sized hash table. if the virtual hash address is taken large enough, the expected number of collisions can be reduced to essentially zero. with no expected collisions, it is possible to dispense with verifying that a query word and the dictionary word are the same; it is enough to check that they produce the same virtual address. hence the character strings need not be stored in the hash-bump tables at all.

to implement the virtual scheme a large hash address is computed, say in the range (0,v), and the address is split into a major and a minor part. the major portion is used just as before, as an index on a hash table of size h. the minor portion is stored in the hash or bump table, in place of the character string. with this difference, the virtual scheme works just as the basic scheme does. the lookup procedure is identical, but the minor portions are used for comparison rather than character strings. all the results of the previous section apply as storage and timing estimates.

the advantage of virtual scatter storage systems is economy of storage space: the minor portion is much smaller than the character string it replaces. it is true that the virtual scheme assigns the same concept number to two different words if they have the same virtual address. this need not be disastrous for document retrieval applications. presumably v is chosen large enough to keep the number of collisions small. on the one hand, errors could be neglected because of their low probability of occurrence and their small effect on the total performance of the retrieval system. on the other hand, it is always possible to resolve detected collisions even in a virtual scheme. collisions may be detected during dictionary construction or updating, and the characters for the colliding words appended to the bump table; the hash or bump table entry must then contain a pointer to these characters along with an identifying flag. collisions occurring during actual lookups cannot be detected.
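a hypothetical python sketch of the major/minor split follows. the widths (a 29-bit virtual address cut into a 15-bit major and a 14-bit minor, as in the practical scheme described later) are taken from the paper, but the hash function is a stand-in and collision chains are kept inline in the slot rather than in a separate bump table, purely for brevity. no character strings are stored; lookups compare minors only.

```python
MAJOR_BITS = 15              # index into a 2**15-slot hash table
MINOR_BITS = 14              # stored in place of the character string
H = 1 << MAJOR_BITS

hash_table = [None] * H      # None, ("entry", minor, concept), or ("chain", [...])

def virtual_address(word: str) -> int:
    """stand-in for a good 29-bit hash; the paper uses a multiply-and-center
    algorithm on the s/360, here an arbitrary multiplicative mix is used."""
    key = int.from_bytes(word.encode("ascii"), "big")
    return (key * 2654435761) % (1 << (MAJOR_BITS + MINOR_BITS))

def split(word: str):
    v = virtual_address(word)
    return v >> MINOR_BITS, v & ((1 << MINOR_BITS) - 1)   # (major, minor)

def insert(word: str, concept: int) -> None:
    major, minor = split(word)
    slot = hash_table[major]
    if slot is None:
        hash_table[major] = ("entry", minor, concept)
    elif slot[0] == "entry":
        hash_table[major] = ("chain", [(slot[1], slot[2]), (minor, concept)])
    else:
        slot[1].append((minor, concept))

def lookup(word: str):
    major, minor = split(word)
    slot = hash_table[major]
    if slot is None:
        return None
    if slot[0] == "entry":
        return slot[2] if slot[1] == minor else None      # compare minors only
    for m, concept in slot[1]:
        if m == minor:
            return concept
    return None
```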
collision problem

in order to use a virtual hash scheme, the virtual table must be large enough to reduce the expected number of collisions to an acceptable level. from a practical point of view, a collision may be considered to involve only two words, rather than three, four, or more; it is assumed that the probability of these other types of collisions is negligible. let v be the size of the virtual hash table. then the expected number of collisions is simply n_c = h_2 = (v/2) a^2 e^(-a), where a = n/v. in this case v >> n, so that a is small and e^(-a) is approximately 1:

7) n_c = (v/2) a^2 = n^2 / (2v)

suppose, for example, the dictionary has n = 2^13 words. if the size of the virtual hash table is chosen to be v = 2^26, then the expected number of collisions is n_c = (2^13)^2 / (2 * 2^26) = 1/2.

suppose further that this table size is adopted for the dictionary, and that the hash code algorithm produces three collisions. the question arises whether the algorithm is a good one, that is, whether it produces uniform random addresses. the answer is found by extending the previous probability model. consider a virtual scatter storage scheme in which the virtual table size is v, and n items are to be entered into the hash-bump tables. again assume that collisions involve only two items. let p(i) = prob[i collisions] = prob[i table slots have 2 items and n - 2i slots have 1 item]. the number of ways of choosing the i pairs of colliding words is n! / (2^i i! (n - 2i)!), and there are v(v-1)...(v - n + i + 1) ways of placing the resulting n - i distinct groups in the hash table, out of v^n equally likely arrangements, so that

8) p(i) = [ n! / (2^i i! (n - 2i)!) ] * [ v(v-1)...(v - n + i + 1) / v^n ], for i = 0, 1, ..., floor(n/2).

in a form for hand computation,

9) p(0) = (1 - 1/v)(1 - 2/v) ... (1 - (n-1)/v)
   p(i) = p(i-1) * (n - 2i + 2)(n - 2i + 1) / (2i (v - n + i)), for i = 1, 2, ...

these results are exact, but the following approximations can be used with accuracy: log p(0) = sum of log(1 - j/v) for j = 1, ..., n-1, which is approximately -n^2 / (2v). let β = n^2 / (2v). terms linear in n may be neglected in equation 9, giving p(0) = e^(-β) and p(i) = (β/i) p(i-1). this is again a poisson distribution:

10) p(i) = e^(-β) β^i / i!, for i = 0, 1, 2, ..., floor(n/2).

this equation gives the approximate probability of i collisions for a virtual scatter storage scheme. it may be used to form a confidence interval around the expected number of collisions n_c = β. for the previous example, in which v = 2^26, n = 2^13, and n_c = 1/2, the following table of values can be made:

i = 0: p(i) = .607, cumulative = .607
i = 1: p(i) = .303, cumulative = .910
i = 2: p(i) = .076, cumulative = .986
i = 3: p(i) = .012, cumulative = .998

the probability is .986 that the number of collisions is less than or equal to 2. since the algorithm gave 3 collisions, it appears to be a poor one. the results for the collision properties are summarized in table 2.

table 2. expected collision properties for virtual scatter storage systems (v = virtual hash table size; n = number of words to be entered)

collision factor: β = n^2 / (2v)
expected number of collisions: n_c = β
probability of i collisions: p(i) = e^(-β) β^i / i!, for i = 0, 1, ..., floor(n/2)
probability that the number of collisions c lies in [a,b]: prob = p(a) + p(a+1) + ... + p(b)

experiments with algorithms for generating hash addresses

any scatter storage scheme depends on a good algorithm for producing hash addresses. this is especially true for virtual schemes in which collisions are to be eliminated. in these experiments three basic algorithms are evaluated for use in virtual schemes. the words in two dictionaries, the adi wordform and the cran 1400 wordform, are used. the hash-bump tables are filled using these words and the resulting collision and storage statistics compared with the expected values.

dictionaries

the adi wordform contains 7822 words pertaining to the field of documentation. it contains 206 common words (previously judged) averaging 3.93 characters. the remaining 7616 noncommon words average 8.00 characters. in all there are 61,712 characters. the cran 1400 wordform contains 8926 words dealing with aeronautics.
the common word list consists of that of the adi, plus four additional entries. the 8716 noncommon words average 8.40 characters. there is a total of 74,074 characters. figures 3 and 4 show the distribution of the length of the words versus percentage of collection. the abrupt end to the curves in figure 3 is due to truncation of words to 18 characters. both dictionaries have approximately the same size and proportions of words of various length. however, their vocabularies are considerably different. a good hash scheme should work equally well on both dictionaries. 184 journal of library automation vol. 3/3 september, 1970 1/) 0 common words "e ~ ~ adi >0 cran 1400 ~ 0 c 0 += 0 ·0 -0 -c q) ~ 8 q) a.. 0 2 4 6 8 10 14 word length fig. 3. distribution of dictionary words according to their lengths. >. '0 c .q -u 0 -0 q) .~ -a :; e ::j u \ scatter storage for dictionary lookups/murray 185 0 common words 6 adi 0 cran 1400 0 2 4 6 8 10 12 14 16 18 20 word length fig. 4. cumulative distribution of dictionary words according to th eir l engths. 186 journal of library automation vol. 3/3 september, 1970 hash coding algorithms by their nature, hash coding algorithms are machine dependent. the computer representation of the alphabetic characters, the way in which arithmetic operations are done, and other factors all affect the randomness of the generated address. the algorithms described below are intended for use on the ibm s /360. words are padded with some character to fill an integral number of s /360 full words. then the full words are combined in some manner to form a single fullword key, and the final hash address is computed from this key. in the experiments which follow, the blank is used as a fill character. this is an unfortunate choice because of the binary representation of the blank 01000000. in some algorithms the zeroes may propagate or otherwise affect the randomness. a good fill character is one that 1) is not available on a keypunch or teletype, 2) will not propagate zeroes, 3) will generate a few carries during key formation, and 4) has the majority of its bits equal to 0, so their positions may be filled. a likely candidate for the s/360 is 01000101. three basic methods of generating virtual hash addresses-addition, multiplication, and division-are studied. the first and second provide contrasting ways of forming the single fullword keys. the second and third differ in the way the hash address is computed from the key. variations of each basic method are also tested to try to improve speed, programming ease, or collision-storage properties. l. addition methods ac-addition and center the fullwords of characters are logically added to form the key. the key is squared and the centermost bits are selected as the major. the minor is obtained from bits on both sides of the major. as-addition with shifting same as ac, except the second, third, etc. fullwords are shifted two positions to the left before their addition in forming the key. (an attempt to improve collision-storage properties) am-addition with masking same as ac, except the second, third, etc. fullwords have certain nonsignificant bits altered by masks before their addition in forming the key. (an attempt to improve collision-storage properties) 2. multiplication methods mc-multiply and center the fullwords of characters are multiplied together to form the key. the center bits of the previous product are saved as the multiplier for the next product. the key is squared and the centermost bits selected as the major. 
the minor is obtained from the bits on both sides of the major. scatter storage fo1' dictionary lookups/murray 187 msl-multiply and save left same as mc, but during formation of the key, the high order bits of the products, rather than the center, are used as successive multipliers. (an attempt to improve speed) mlm-multiply with left major same as mc, but taking the major from the left half of the square of the key and the minor from the right half. (an attempt to improve speed) 3. division methods dp-divide by prime the fullwords of characters are multiplied together to form the key. the center bits of the previous product are saved as the multiplier for the next product. the key is divided by the length of the virtual hash table-a prime number in this case-and the remainder used as the virtual hash address. the major is drawn from the left end of the virtual address and the minor from the right. do-divide by odd number same as dp, except using a hash table whose length is odd. (an attempt to provide more flexibility of hash table sizes ) dt -divide twice same as dp, except two divisions are made. the major is produced by dividing the key by the actual hash table size. the minor results from a second division. primes are used throughout as divisors. (an attempt to improve storage-collision properties) evaluation in the experiments to evaluate each variation of the above hash schemes, the size of the virtual hash table varies from 220 to 228 slots. the actual hash table varies in size from 212 to 214 slots. bump table space is used as needed. the tables are filled by the words from either the adi or cran dictionaries and the collision and storage statistics taken. because good collision properties are most important, they are examined first. the storage properties are dealt with later. the number of collisions obtained from each scheme versus the virtual table length is plotted in figures 5 to 8. the adi dictionary is shown in figures 5 and 7, and the cran in figures 6 and 8. the circled lines correspond to curves generated from equations 7 and 10. the horizontal one shows the expected number of collisions and the lines above and below it enclose a 95% confidence interval about the expected curve. in other words, if an algorithm is generating random . addresses, the probability is 95% that the curve for that scheme lies between the heavy lines. consider figures 5 and 6 showing the results for all the addition methods and the mc variation of the multiplication variation. the ac and mc algorithms differ only in that addition is used in forming the key in the 188 journal of library automation vol. 3/3 september, 1970 -0 ooooooo theoretical curves (equations (7) and ( 1 0) experimental curves ---interpolated curves virtual hash table size (power of two) fig. 5. collisions in the adi dictionary for addition and multiplication hash schemes. first one and multiplication in the second one. yet the curves are spectacularly different. the result seems to have the following explanation. the purpose of a hash address computation is to generate a random number from a string of characters. if the bits in the characters are as varied as possible, then the algorithm has a headstart in the right direction. however, the s/360 bit patterns for the alphabet and numbers are: a to i 1100 xxxx j to r 1101 xxxx s to z 1110 xxxx 0 to 9 1111 xxxx scatter storage for dictiona1·y lookups/murray 189 en c: 0 en 0 (.) -0 ... 
q) .0 e ::t z 20 ooooooo theoretical curves (equations (7) and (l 0) experimental curves --interpolated curves 26 virtual hash tobl e size (power of two) fig. 6. collisions in the gran dictionary for addition and multiplication hash schemes. c 28 in each case the two initial bits of a character are l's, so that in any given word one-fourth of the bits are the same. in forming a key, the successive additions in the ac algorithm may obscure these nonrandom bits if a sufficient number of carries are generated. however, the number of additions performed is usually small-2 or 3and it appears that the pattems are not broken sufficiently. the mc algorithm uses multiplication to form its keys, which involves many additions-certainly enough to make the resulting key random. the multiplications in the mc algorithm are costly in terms of computation time. therefore the as and am algorithms are tried. these addition 190 journal of library automation vol. 3/3 september, 1970 en c 0 ·~ 0 u .... 0 ooooo theoretical curves experimental curves interpolated curves 22 20 virtual hosh table size (power of two} fig. 7. collisions in the adi dictionary for division and multiplication hash schemes. variants try to hasten the breakup of the nonrandom bits by shifting and masking respectively. although these variants reduce the number of collisions somewhat, none of the addition schemes could be called random. typically a few words are singled out at some point and continue to collide regardless of the length of the virtual address. several collision pairs are listed below. note the similarities between the words. count worth tolerated wheel -sound -forty -telemeter -sheet in 1: 0 ·;;; 0 0 ... 0 ... cl) .d e :i z 20 scatter storage for dictionary lookups/murray 191 0000000 \ \ ,---\ \ \ \ 26 theoretical curves (equations (7) and (1 0) experimental curves interpolated curves 28 virtual hash table size (power of two) fig. 8. collisions in the gran dictionary for division and multiplication hash schemes. consider the multiplication algorithms. during key formation, the process of saving the center of successive products adds to the computation time. the msl variation attempts to remedy this by saving only the high order bits between multiplications (on the s /360 this means saving the upper 32 bits of the 64-bit product) . this method is so inferior that its collision graph could not be included with the others. the poor results stem from the fact that characters at the end of fullwords have little effect on the key and that the later multiplications swamped the effects of the earlier ones. examples of collision pairs are given below. for convenience the fullwords are separated by blanks. 192 journal of library automation vol. 3/3 september, 1970 certainty prevented heaving expe nse charter certainly -presented -heat lng -expanse -chapter the mc and mlm variants are identical with respect to collision properties. in general these algorithms produce good results, reducing the number of collisions to zero in both dictionaries. the collision curve is always beneath the expected one. consider figures 7 and 8 showing the results for all division methods and the mc method. all of the division algorithms display a distinct rise in the number of collisions when the virtual table size is near 224-regardless of the dictionary. the majority of the colliding word pairs are 4-character words having the same two middle letters. this brings to light a curious fact about division algorithms. 
for virtual tables, the divisor of the key is large and the initial few bits determine the quotient, leaving the rest for the remainder. for words of less than 4 characters (which require no multiplications during key formation), dividing by 224 is equivalent to selecting the last 3 characters of the word as the hash address. because the divisors are not exactly equal to 224, only the two middle characters tend to be the same. examples are: deal -bear took -soon held -cell verb -term this phenomenon apparently continues for table sizes around 226 and 228, but there are few or no words of 4 characters or less which agree in 26 or 28 bits. for divisors smaller than 22 \ a larger part of the key determines the quotient and apparently breaks up the pattern. because the above effect occurs only for v = 22 \ these points are passed over on the graphs. in general, the dt algorithm is superior to the rest of the division methods, mostly because each of its two divisors is smaller than those used in other methods. prime numbers seem to produce better results than other divisors. on the basis of collision properties, the mc, mlm, dt, and possibly as algorithms are the best. storage-search evaluations are included for these methods only. the experiments with each hash coding method also include counting the frequency of various length of collision chains. here a collision chain refers to chains of words producing the same major. the frequency counts are compared with the expected counts given by equation 5. the comparison is in terms of a chi-square goodn ess-of-fit test with a 10 % level of 0 8 :;::: t/) -0 -(j) q) ~ 0 ;:) ct (j) i .c u 2 0 scatter storage for dictionary lookups/murray 193 x·---x·---x or----dt / dt a~as mlm as ----as,dt mlm--mlm mlm mc mc-mc ~c ----mc virtual hash table size (power of two) xcurve for 10% level of significance fig. 9. deviations of storage-search properties from expected values for selected hash schemes using the adi dictionary. significance. figures 9 and 10 show the results of this test for each dictionary. included in the graphs is the line corresponding to the 10% level of significance. if the major portions of the hash addresses are really random, there is a probability of 0.90 that the 10% line will lie above the curve for the algorithm tested. consider the mc and mlm algorithms which differ only in that the major is selected from the center and left of the virtual address. from the graphs, it is clear that the multiplication methods produce their most 194 journal of library automation vol. 3/3 september, 1970 .~ -.!:!! -0 -cj) ~ 0 8 ~ c:1' cj) i ..c (.) 6 4 mlm x--dt virtual hash table size (power of two) xcurve for 10% level of significance fig. 10. deviations of storage-search properties from expected values for selected i-iash schemes using the gran dictionary. random bits in the center of their product. this is somewhat as expected, because the center bits are involved in more additions than other bits. the division algorithm, which had fairly good collision properties, seems to have rather mediocre storage properties. this is probably due to the scatter storage for dictionary lookups/murray 195 same causes as the collision problems, but working at a lower level, and not affecting the results as much. the as curve is included simply for completeness. the scheme displays a well behaved storage curve, but it has poor collision properties. in summary, the mc scheme seems to be the best for both dictionaries in terms of collision and search properties. 
in terms of computing time, the method is more time consuming than the addition methods, but less expensive than the division methods. the difference in computation times is not an extremely big factor. all methods required from 35 to 55 microseconds for an 8-character word on the s/360/65. the routines are coded in assembly language and called from a fortran executive. the times above include the necessary bookkeeping for linkage between the routines. a practical lookup scheme general description the lookup scheme described below is designed for use with dictionaries of about 21 :. words. the virtual table size selected is 229 and the actual table size is 216• on the basis of the results presented in previous sections, when the dictionary is full, it is expected that 1) 36.8% of the hash table will be empty, 2) 36.8% of the hash table will be single entries, 3) the bump table will require ( 0.632 )215 entries, 4) 1 collision is expected, 5) the probability of 5 or fewer collisions is 0.999, and 6) the average lookup will require 2.13 probes. table layout in all previous discussions a dictionary entry has included a minor and a concept number. a concept number is simply a unique number assigned to each word. the hash address of a word is also unique, and hence can be used. there is no need to store and use a previously assigned concept number. a dictionary entry contains a 14-bit minor and a single bit indicating whether the word is common or noncommon: 1 2 15 ic minor c = 0 implies the word is common; c = 1 implies the word is noncommon. a hash table entry contains 16 bits arranged as : 0 1 15 i flag i information flag = 0 implies that the information is a dictionary entry; flag = 1 implies that the information is a pointer to the bump table. words that have the same major are stored in a block of consecutive 196 journal of library automation vol. 3/3 september, 1970 locations in the bump table. this eliminates the need for pointers in the collision "chains". a bump table entry also has 16 bits structured as: 0 1 2 w i end i c minor end= 0 implies that the entry is not the last in the collision block; end = 1 implies that the entry is the last in the block. some convention must be adopted to signify an empty hash table slot. a zero is most convenient in the above scheme. unfortunately a zero is also a legitimate minor. however, to cause trouble the word generating the zero minor would have to be a common word and a single table entry (zero minors in the bump table are no problem). hopefully this occurs rarely because of the size of the minor ( 14 bits) and the small number of common words. however, even if this combination of circumstances occurs, the common word could be placed in the bump table anyway. in designing the tables, it is important to make the hash table entries large enough to accommodate the largest pointer anticipated for the bump table. for the above scheme, the expected bump table size is less than 215 so that the 15 bits allocated for pointers is sufficient. search considerations the number of probes needed to locate any given word depends on the place that the word occupies in a collision block. the average search time is improved if the most common words occupy the initial slots in each block. a study of adi text yields the statistics given in tables 3 and 4. table 3. division of words by categm·y. number of words percent of total 17270 total words 100.0 8716 common words 50.5 8554 noncommon words 49.5 table 4. distribution of l engths. 
number of all common noncharacters words percent words percent common percent words 1-4 10145 58.8 8057 92.5 2097 24.5 5-8 4630 26.8 627 7.2 4003 46.8 9-12 2249 13.0 32 0.3 2217 25.9 13-16 221 1.3 0 0.0 221 2.6 17-20 11 0.1 0 0.0 11 0.1 21-24 5 0.0 0 0.0 5 0.1 totals 17270 100.0 8716 100.0 8554 100.0 av. length 6.3 4.3 8.3 scatter storage for dictionary lookups/murray 197 using the categorical information, it appears that in filling the hash-bump tables, the common words should be entered first. within each category, all words should be entered in frequency order if such information is known. if frequency information is not available, the distribution by lengths can be used as an approximation to it. for common words, this means entering the shorter words first. for noncommon words, the words of 5 to 8 characters should be entered first. the greater the number of single entries, the greater the average search speed. figure 11 shows the fraction of single entries ( f 1) and fraction of empty slots ( f o) for various load factors. the fraction of single entries .l: iji 0 i -0 c 0 -(.j 0 ~ u. 0 .4 .8 load factor fig. 11 . theoretical hash table usage. 0 fraction empty slots a fraction of single entries 1.6 198 journal of library automation vol. 3/3 september, 1970 f1=ae-a reaches a maximum for a= 1, but since the slope of the curve is small around this point, the load factor in the interval ( 0.8, 1.2 ) is practically the same. table usage is better, however, for the larger values of a. these facts imply that scatter storage schemes make most efficient use of space and time for a=l. most text words can be assumed to be in the dictionary. thus the order of comparisons during lookup should be: hash table scan 1) check minor assuming the text word is a common word 2) check minor assuming the word is non common 3) check if the entry is a pointer to the bump table 4) check if the entry is empty first bump table entry (must be at least two) 5) check minor assuming the word is a common word 6) check minor assuming the word is non common other bump table entries 7) check minor assuming the word is non common 8) check minor assuming the word is common 9) check if at end of collision block. the search pattern can be varied to take advantage of the storage conditions. for example, if all common words are either single entries or the first element of a collision block, then step 8 may be eliminated. performance the lookup system described above has been implemented and tested on the ibm s/360/65. a modified form of the mc algorithm is used to compute a 29-bit virtual address and divide it into a 15-bit major and a 14bit minor. the modification is the inclusion of a single left shift of the fullwords of characters during key formation. this breaks up certain types of symmetries between words such as wingtail and tailwing. without this, such words will always collide. the hash-bump tables were filled with entries from the adi dictionary-common words first, followed by noncommon words. the shortest words were entered first. table 5 gives comparison of the expected and actual results. table 5. lookup system results. 
a=.239 number of empty table slots number of single entries number of collision blocks longest collision block average length of collision blocks size of bump table number of collisions average probes per lookup expected 25810 6161 797 4 2.1 1663 .06 1.33 actual 25762 6250 756 4 2.1 1572 0 1.33 scatter storage for dictionary lookups/murray 199 to obtain the actual lookup times 627 words were processed. the words were read from cards and all punctuation removed. each word was passed to the lookup program as a continuous string of characters with the proper number of fill characters added. the resulting times are given in table 6 (in microseconds); a larger sample of the category of "not-found" words processed with less accurate timings indicates that the average time for words in this category is about 62 microseconds (standard deviation 26). table 6. lookup times category number of words of words all 627 common 288 noncommon 338 not found 1 percent of total 100.0 45.9 53.9 0.2 average time 57.9 49.9 64.7 53.1 standard deviation 11.7 6.7 10.7 0.0 average probes 1.18 1.12 1.24 1.00 the time to compute a hash address depends on the length of the word . let n be the number of s /360 full words needed to hold these characters. the time to form the initial address is i ( n) = 34.5 + 10.2 ( n-1) microseconds. the average total lookup time, then, is t = i(n) + cp where c is the average time per probe into the table space and p is the average number of probes. for the words in the experiment n = 2.32 (average), i ( n) = 40.3, and t = 57.9, so that each probe required about 15 microseconds. c ompadsons timing information for other lookup schemes is difficult to obtain. a treestructured dictionary is used for a similar purpose at harvard. published information indicates 6pq microseconds are needed to process p words in a dictionary of q entries. this time is for the ibm 7094. translating this time to the s/360/65, which is roughly four times faster, and using the adi dictionary ( q = 7822), it appears that each lookup averages 11,000 microseconds. exactly how much computation and input-output this includes is unknown. extensions larger dictionaries as more words are added to the dictionary, the size of the virtual address must increase in order to prevent collisions. as a result, the number of bits per table slot must also increase in order to accommodate the larger minors and pointers that are used. for a fixed-sized hash table, the number of entries in the bump table grows as new words are added. at some point the space required for tables will exceed the amount of core allotted for 200 journal of library automation vol. 3/ 3 september, 1970 dictionary use. to salvage the scheme, it may be possible to split the buinp table into parts-one part for more frequently used words and one for words in rather rare usage. during dictionary construction common words are entered first, then noncommon, then rare. when a rare word must be placed in a collision block, a marker is stored instead, and the item is placed in the secondary bump table. presumably the nature of the words in the second bump table will make its usage rather infrequent, thus saving access to auxiliary storage to fetch it. suffix removal many dictionary schemes store only word stems; the lookup attempts to match only the stem, disregarding suffixes in the process. this is not easily done with scatter storage schemes. one solution is to try to remove the suffix after an initial search has failed. 
each of the various possible stems must be looked up independently until a match is found. another solution is to use a table of correspondences between the various forms of a word and its stem. the concept number could be used as an index on th is table containing pointers to information about the actual stem. a thesauru s lookup can be handled the same way. application to library files library fil es-characterized by a large number of entries, personal and corporate names, foreign language excerpts, etc.-present special problems to lookups. with regard to size, there is no particular reason that scatter storage cannot b e extended to such files. the only genuine requirement is the ability to compute a virtual address long enough to insure a reasonably low number of collisions. as mentioned previously, table space can become a problem. for really large files, a two-stage process looks most promising. a small hash table is used to address high frequency items and a larger hash table is used for addressing all other data. lookup starts with the small tables and continues to the larger ones if the initial search fails. the same virtual address can be used in both lookups by shifting a few bits from the high-frequency minor to the low-frequency major. this two-stage technique should keep the amount of table shufbing to a minimum and provide rapid lookup for all textual data in titles, abstracts, etc. with respect to bibliographic information, personal and corporate names are bothersome because they can occur in several forms. unfortunately, scatter storage schemes do not guarantee that dictionary entries for r. a. jones and robert a. jones are near each other, so that if an initial lookup fails, the rest of the search can be confined to a local area of the file. there are two approaches to the problem : ( 1 ) standardization of names before input or ( 2) repeated lookups using variants of a name as it occurs in text. standardization, along with delimiting and formatting bibliographic data, is probably the most effective and least expensive approach. in addition, it reduces the amount of redundant data in the file. scatter storage for dictionary lookups/murray 201 phrases in foreign languages present a difficulty, since the character sets on most computing equipment are limited to english letters and symbols. however, if an encoding for such symbols is used, lookup can proceed normally. the problem of obtaining the dictionary entry for an english equivalent of a foreign word is a completely different matter and will not be dealt with here. conclusions virtual scatter storage schemes are well suited for dictionaries, having both rapid lookup and economy of storage. the rapid lookup is due to the fact that the initial table probe limits the search to only a few items. the space savings come from the fact that the actual character strings for words are not part of the dictionary. the schemes depend heavily on a good algorithm for producing random hash addresses. the theory developed in the first two sections of this paper gives a basis for judging the worth of proposed algorithms. for any particular application, the table organization may vary to suit different needs and to store different information. however, the advantages of scatter storage schemes are still present. references 1. salton, g.: "a document retrieval system for man-machine interaction." in association for computing machinery. proceedings of the 19th national conference, philadelphia, pennsylvania, august 25-27, 1964, pp. 
l2.3-l-l2.3-20. 2. mcilroy, m. d.: dynamic storage allocation (bell telephone laboratories, inc., 1965). 3. morris, r.: "scatter storage techniques," communications of the acm (january, 1968 ). 4. maurer, w. d.: "an improved hash code for scatter storage," communications of the acm (january, 1968) . 5. johnson, l. r.: "indirect chaining method for addressing on secondary keys," communications of the acm (may, 1961). reproduced with permission of the copyright owner. further reproduction prohibited without permission. using gis to measure in-library book-use behavior xia, jingfeng information technology and libraries; dec 2004; 23, 4; proquest pg. 184 that development rests with the local site. for the nwda consortium, this development , using the code base, ha s been manageable. the current stat e of interface dev elop ment for the nwda project can be reviewed at http :// nwda. wsulibs .wsu.edu / project_info /. conclusion in se lecting an ead searc handretrieval system, one important qu es tion for th e consortium was, which software solution had the best prosp ects for migration in the futur e? because of the inherent strength s of nativ e xml technology in comp ari son to the other product categories list ed in tabl e 1, a nati ve xml databas e appeared to be the best appro ach, and textml provided the best combination of licensi ng costs, software capabilities, and support. it is important to note that the distinctions betw een nativ e xml databas es and databases that support xml throu gh extensions (xmlenabled databa ses) ma y b eco me more difficult to dis cern over time, in part du e to the existi ng exper tise and in vestme nts in rdbms technologies. 16 nevertheless, capabilities central to native xml, such as the us e of an xml-based query language, are integral to th e success of such hybrid syst ems . references and notes 1. daniel pitti , "encoded archival de scriptio n: the development of a n encoding standard for archival finding aids ," the american archivist 60, no. 3 (summ er 1997): 269. 2. daniel pitti, "encod ed archival des cription: an introducti on and overvi ew," 0-lib magazine 5, no. 11 (nov. 1999). accessed nov. 2, 2004, www.dlib. org / dlib / november99 / 11 pitti.html. 3. daniel v. pitti and wendy m. duff (ed s.), "introduction," in encoded archival description on the internet (binghamton, n.y.: haworth, 2001), 3. 4. james m . roth, "serv ing up ead: an exp lorat o ry study on the deployment and utilization of encod ed archival description finding aids," the american archivist 64, no. 2 (fall /win ter 2001): 226. 5. sarah l. shreeves et al., "har ves ting cultural heritage metadata using the oai protocol," library hi tech 21, no. 2 (2003): 161. 6. nan cy fleck and michael sead le, "ead harvesting for the national ga llery of th e spoken word" (pap er present ed at th e coa liti on for netw orke d information fall 2002 task force meeting, san anton io, tex., dec. 2002). accessed nov. 2, 2004, www.cni .org/tfms/2002b. fall/handouts/h-ead-fleckseadle.doc. 7. anne j. gilliland -swe tland , "po pularizi ng th e finding aid : exploiting ead to enhance online discover y and retrieval," in encoded archival description on the internet (bing h a mton , n.y.: haworth, 2001), 207. 8. ibid, 210-14. 9. charlott e b. brown and brian e. c. schottlaender, "the onlin e arch ive of california: a consortia! approach to encode d archival descrip tion, " in encoded archival description on the internet (bingham ton, n .y.: haworth, 2001), 99. 10. ibid , 103-5. oac ava ilable at: www . 
o ac.c dlib.org / . accessed nov. 2, 2004. 11. christopher j. prom and thomas habing, "using the open archiv es initiative protocols w ith ead," in proceed ings of th e second acm/ieee-cs joint confe rence on digit al librari es (portland, ore., july 2002). accessed nov . 2, 2004, http:// dli .grai ng er.uiu c.edu / publications / jcdl20 02/ p14prom.pdf. 12. marc cyrenne, "go ing n at ive: wh en should you use a native xml database?" aim e-doc magazine 16, no . 6 (nov./dec. 2002), 16. accessed nov. 2, 2004, www. edo cmaga zine.com/ article_ n ew.as p?id=2 5421. 13. product categor y decisions ba sed upon definiti ons and classifications available from: ronald bourret, "xml database products." accessed nov. 2, 2004, www. rpbourret .com/ x ml / xmlda ta base prods .htm. 14. cyrenn e, "going native," 18. 15. bill stockting, "ead in a2a," microsoft power point pres entation. accessed n ov. 2, 2004, www.agad.a rchiwa . gov.pl/ ead /s tocking.pp t. 16. uwe hohenstein, "supp orting xml in oracle9i ," in akm a l b. chaudhri , 184 information technology and libraries i december 2004 awais rashid, and rob erto zic ar i (eds.), xml data management: native xml and xml-enabled database systems (boston: addison-wesley, 2003), 123-4. using gis to measure in-library book-use behavior jingfeng xia this article is an attempt to develop geographic information systems (gis) technology into an analytical tool for examining the relationships between the height of the bookshelves and the behavior of library readers in utilizing books within a library. the tool would contain a database to store book-use information and some gis maps to represent bookshelves. upon analyzing the data stored in the database, differen t frequencies of book use across bookshelf layers are displayed on the maps. the tool would provide a wonderful means of visualization through which analysts can quickly realize the spatial distribution of books used in a library. this article reveals that readers tend to pull books out of the bookshelf layers that are easily reachable by human eyes and hands, and thus opens some issues for librarians to reconsider the management of library collections. several years ago, when working as a library assistant reshelving books in a university librar y, the author noted that the majority of books used inside the library were from the mid-range laye rs of bookshelves. that is, b y proportion , few books pulled out by library readers were from the top or bottom layers. books on the layers that were easily reachable by readers were frequentl y utilized . such a book-us e distribution patt ern made the job of reshelving books easy, but created some inquiries: how could book locati ons influ ence th e choices of readers in selecting books? if this was not an isolated observation, it must have exposed an inter es ting reproduced with permission of the copyright owner. further reproduction prohibited without permission. phenomenon that librarians needed to pay attention to . then , by finding out the reasons , librarians might becom e capable of guiding, to some extent , us ers' selectiv eness on library books by deliberately arranging collections at design ated heights on book sh elves. a research study was designed to develop geographical information systems (gis) into an analytical tool to examine former casual observations by the author. the study was conducted in the mackimmie library at the university of calgary. 
thi s paper highlights th e results of the study that aimed at assessing th e behavior of library readers in pulling out books from bookshelves . thes e book s, when not checked out, are categoriz ed as "pickup books" becau se they are usually discarded inside a library after use and then picked up by library assistants for reshelving. like many other libraries , the mackimmie library does not encourage reasd ers to reshelve books th emse lves. arcview, a gis software, was selected to develop th e tool for this study because gis ha s the functions of dynamicall y analyzing and di splayin g spatial data. the research on library readers pullin g out books involv es the measur emen ts of bookshelf heights, and thu s deals with spatial coordinates. with the capability of presenting book shelves in different views on map s, gis is able to provide readers with an easy und erstanding of the anal ytical results in visual forms, which make any textu al description s wordy . at the same time, some gis products are available now in most academic libraries, thus giving develop ers convenient access to use. hypothesis when library users decide to check books out of a library, the se books are what the y think of as useful. peopl e are usually hesitant to carry home books that are of little or uncertain use, not only because of the limit on the numb er of check-out books , but also bec ause of the physical work required for carrying them. moreover, some items, such as periodicals and multimedia materials, are either designated as "refe rence only" or have a very short loan period . it is reasonable to beli eve that user s carefully select what they want from library collections and keep these book s for handy use outside the library. by contrast , in-library book use repre sents a different category of library readers' behavior . there are two general categories of in-library book us e: readers bringin g their own books into a library for use, and readers pulling out book s from bookshelves inside a librar y. the former is commonly seen when students study textbook s for examinations (not the topic of this study), whil e the latter is a little more complex. 1 as library users approach bookshelves to extract book s, th ey may or may not hav e a definit e target. when coming with call numb ers, peopl e will deliberately draw the books they want for reading, photoc opyi ng , or referencing. ho wever, there are time s when user s on ly wander in bookshelf aisles of desired collections, uncertain about singling out specific books . th ey may simply shelf-shop to randomly select whatever is interesting to them, or they may locate a subject of need and go to the storage position(s) to look for whatever books are there. no matter what these readers' intention s are, they roam among collections, pick book s for quick u se, and leave them inside the library after use, although some materials may also be checked out. because of such arbitrary selections from library collections , physical con venie nce sometimes influence s library users in takin g books from booksh elves-they ma y look around for books on bookshelf layers that are at a reach able height. the standard library bookshelf is hi gher than the average person's height and is structured to have five to eigh t layers. in aca demic libraries, "wood shelving is available in three heights: 82 in. (2050 mm), with a bottom shelf and six ad justabl e shelves; 60 in. (1500 mm), with a b ottom shelf and four adjustable she lves; and 42 in. 
(1050 mm), with a bottom shelf and two adjustable shel ves ." 2 for regular collections in mo st academic libraries, bookshelve s are usually about eightytwo inches high and hav e seven layers. books on the top lay er are out of reach for many reader s, requiring them to use a ladder to draw a book from it. many users are hesitant to use ladders. even worse, a reader will have to bend over or squ at down to view the contents of books on the bottom layer of a bookshelf . hence , the hypothe sis is that books used inside a library are primarily distribut ed among the mid-ranged layers of bookshelves. specifically, if a bookshelf ha s seven lay ers, books placed on layers two through six are most frequently consulted. this is the subject of this research paper . background a considerable number of studies have investigated the utilization of books that are checked out of a library. an esti mate made in 1967 pointed out that over seven hundred research results pertained to this topic. ' how ever, the situation of books used inside a library has not been given enough attention. one of the reasons for this seeming neglect comes from the belief that the records of library book s in circulation provide similar info rma tion as those of books used within libraries." thi s misunderstanding wa s lately criticized by other researchers who discov ere d the differences in use behavior between jingfeng xia (jxia@email.arizona.edu) is a student at the school of information resources and library science at the university of arizona, tucson. using gis to measure in-library book-use behavior i xia 185 reproduced with permission of the copyright owner. further reproduction prohibited without permission. libr ary readers takin g books h ome and those using books inside libraries. 5 research ers hav e now recog ni zed that correlations between the two sets of data are n ot as strong as they seemed to be. such reco gnition, unfortunately, ha s not resulted in mor e consequ ent work to explor e the issu e of in-lib rary book use. this is probabl y due to th e difficulties of co llecting data or the la ck of appropriate research methods .6 also, the majority of rele va nt surv eys w ere conducted several de cades ago and focu sed primaril y on exp loring a go od method of sam plin g in-library book us e.7 am ong the se studies , fu ssler and simon preferr ed to carry out researc h by distributing questionnaires am ong library reader s; drott u sed randomsampling m et hods to statisti ca lly examine th e importance of librar ybook use; and jain, as well as salv erso n, emphasized dividing th e survey time s into differ ent investi gation units when conducting res earch. simil a rly, m orse point ed out the compl ex ity of measurin g lib rarybook u se a t wo rk , advocating an involv ement of computerized operation s in librar y-book man ag ement. the sampling strategies and analy tical methods implemented in pa st studie s are still applicable to curr ent res earc h. non etheless, because many new technol ogie s ha ve come into view since th en, it is quite likel y tha t som e new ways of obtaining and analy zing th e d ata of in-library book use can now be developed. th e n ew app roac hes must have the capability of providing not only accurate m easurem ent of the data but also the me ans for easy manipulation . th eir result s must be able to enhance th e und ers tandin g of us er behavio r in expl ori ng th e reso urc es of existing collection inv entorie s . one of th e solutions is an analytic al tool. 
an analytical tool can control data collection and anal ysis by computerizati on . if the system is ab le to accumul ate const antly upd ated records ov er time, it will remedy the probl em of poor sampling th at man y researchers hav e encount ered, be cause an alysis will then b e done on all the data rather th an w ith certain isolated samples. the development of m odern technologi es makes such data collection and storage po ssible and easier than ever before. on e exampl e of the technologi es is the radio freque ncy identification (rfid) tag system that ha s been adopted b y some public and acade mic librar ies recently.8 thi s system stores a tag in each librar y item with the item's biblio gra phic information, and uses an antenna to keep tr ack of th e tag. by automatically communicating with dat a stored in the tags, the system can collect dat a on all librar y collections in a timel y manner and export them into pred esigned d atabases for easy man ag ement. data an a lys is and pres enta tion comprise ano ther p ar t of the an aly tical mechani sm. researc hers h ave to carefully evaluate existing technologies in order to select prop er produ cts or de ve lop parti cular pro gra ms to integrate with rfid (if used) and th e databases. it is fortunate th at gis techno log y is available with numerous functi ons for analyzing and demonstrating data , especiall y spatial data. da ta visuali za tion through gis produ cts has been very good, which giv es them advantages over other analytic al, stati stical, or repor tin g produ cts. combining rfid and gis into one system would seem to be th e perfect solution-the former can effective ly carry out dat a collection and th e latter can efficiently perfo rm data analysis and presentation. h ow ever, while gis products h ave been u sed in libraries in the unit ed states for more th an a dec ade, mo st academic libraries are hesitant to invest in rfid because of its high costs . gis technology alone, however , can still provide sufficient functions to be dev eloped into such an analytical tool. up to n ow, tho se librarie s that have provid ed gis serv ices only use the software that assists in the utilization of geospatial data and map186 information technology and libraries i december 2004 ping te chnologie s for users .9 gis is not expl oited enou gh to aid the manage ment of librari es them selves and the res earch of librar y collections. some commercial gis software, such as "lib rary decision" by civic-technologi es, has be en recently marketed to support the analysis of libraryuser d a ta for public libraries. 10 ho wev er, it only wor ks w ell on the data of conventional geographical nature, that is, th e distributi on and location of librari es and th eir users with the mapping of city bl ocks and streets . it does not app ly to a librar y an d its books, and especiall y not to the distribution of books us ed insid e th e librar y. such products are also not ap plicabl e to acad emic librari es that do not always concentrate on the ana lysis of geog rap hical area s of their us ers. even so, gis h as all the function s that such a propos ed analytical tool demands. it is suit able for assisting in the research of in-library book us e where library floor layout s or other facilities can be d raw n into maps on multiple-dimensional views. 
at the same time, bookshelves with individual layers can be treated as an innovative form of map by gis technology (see figure 1), making visible the relationship of book use to the height of the bookshelf. as soon as the presentation mechanism is linked to databases, any updates on book use will be mirrored visually.
method
this project is one of a series of projects for developing gis into a tool to manage and analyze the usage characteristics of library books. the other projects include using gis to measure book usability for the development of collection inventories; to assist in the management of library physical space and facilities; and to locate library items.11 in order to make gis workable for the subject of this paper, the focus was placed only on the exploration of correlations between bookshelf heights and book-use frequencies in an academic library environment.
figure 1. the front view of one bookshelf rack on the fifth floor of the university of calgary mackimmie library. eight bookshelves assemble the range. here, different shades of color represent the numbers of books used on each individual layer. the display is only for demonstration and not to actual scale.
there are two major steps to conducting this research: collecting data and developing a gis analytical tool. since mackimmie library did not invest in rfid at the time this research was undertaken, personal observations were made to record book-use data.12 the development of the gis tool involves creating a small database to store data and facilitate data analysis. it also requires creating several bookshelf and shelf-range maps to present analytical results in visualized forms. arcview, the most popular gis product in the world, was utilized for the development. this paper presents only a portion of collection areas at mackimmie library. part of the fifth floor, where some collections of humanities and social sciences are stored, was selected because this floor is among the busiest of the floors used by readers. it is filled with sixty-eight ranges of bookshelves containing books from call numbers b to du. the terms used in this paper include bookshelf, referring to one unit of furniture fitted with horizontal shelves to hold books; rack, which includes more than one bookshelf standing together in a line; and range, composed of two racks standing back-to-back. bookshelves on the fifth floor are arranged to surround a group of facility rooms in the central area. study corridors are set between bookshelves and the wall. each bookshelf range consists of two bookshelf racks, each of which in turn has eight individual bookshelves. all of the bookshelves are about eighty-two inches high and have seven layers. the layers, except for the top ones that are open, are equal in height, width, and length.
data collection
personal surveys were taken by the author to note down each call number of books that were not in their original positions on the shelves, but instead were found discarded on the floor, tables, chairs, sofas, or on top or in front of other stocked books. books on the shelving carts are also accounted for.
the surveys were separately conducted three times a day, morning, afternoon, and evening, in order to catch as many books used in a day as possible. to avoid recording the same book more than once, no duplicate call numbers were accepted for any single day even though the same book was found in different locations on that day. on the other hand, the same call number could be entered into the records on the second day although it was recorded the day before and remained in the same place without being picked up by library assistants. (this duplicate recording was very rare because of the routine work of book pickup by library assistants.) a period of two weeks was designated for the survey in the first half of december 2002. the final examination week was planned because it represents a week of heavy book use, although previous research found that readers in this week tended to use library collections less than their own study materials.13 a supplementary survey that also lasted two weeks, including a final examination week, was conducted in the library in late spring 2004. to simplify the research, some exceptions were established for data collection. periodicals were excluded because they have a very short loan period (generally one day). library users may prefer to read articles in journals within the library and thus will have a clear idea as to what materials to read.14 books belonging to other floors of the library, or books belonging to the fifth floor but found outside the area, were not included in the analysis. furthermore, due to the nature and time limit of these observations, books pulled out of targeted bookshelves were not distinguished from books taken from bookshelves at random. this information can only become available through interviews with library users, which can be another research project. each bookshelf layer was recorded with and signified by two call numbers: the start and end numbers of books. for example, the call numbers "bf1999 .k54" to "bh21 .b35 1965," representing books stored on a particular layer, were recorded to identify that layer. because book shifting can happen from time to time, such recording of start and end call numbers for individual bookshelf layers only reflects the conditions when this research was undertaken and may need updates whenever changes occur.
data manipulation and visualization
using a bookshelf layer as the recording unit is essential for the analysis of the relationship between book use and bookshelf height. each book used can be classified to fit in one unit according to the call number of the book. therefore, building a database with a table for layers will be an important part in the development of such an analytical tool. the layers table includes a data field as an identifier to stand for the sequence of each layer (1 for the top layer, 2 for the next layer down, and so on), in addition to storing the start and end call numbers of books for each layer. if more than one bookshelf in the library has seven layers, layer identifiers will iterate from bookshelf to bookshelf.
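the classification step just described, matching each recorded call number against a layer's start and end call numbers, can be pictured with a short sketch. the snippet below is a minimal illustration only, not part of the original study: the layer list, the sample call numbers, and the plain string comparison are all assumptions, and a production version would need proper lc call-number normalization before comparing.

    # hypothetical sketch: assign each observed call number to the bookshelf layer
    # whose recorded start/end call numbers bracket it. plain string comparison is
    # used here for illustration; real lc call numbers need normalization first.
    layers = [
        # (shelf_id, layer_id, start_call_no, end_call_no) -- illustrative values only
        ("shelf-01", 1, "BF1 .A1",     "BF1998 .Z9"),
        ("shelf-01", 2, "BF1999 .K54", "BH21 .B35 1965"),
    ]

    def assign_layer(call_no, layers):
        """return (shelf_id, layer_id) for the layer whose range contains call_no."""
        for shelf_id, layer_id, start, end in layers:
            if start <= call_no <= end:
                return shelf_id, layer_id
        return None  # book belongs to another floor or range and is excluded

    observed = ["BF2050 .M3", "BG100 .T7"]          # call numbers noted during a survey pass
    counts = {}
    for call_no in observed:
        unit = assign_layer(call_no, layers)
        if unit is not None:
            counts[unit] = counts.get(unit, 0) + 1  # one use tallied per layer
    print(counts)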
therefore, this table will also need an identifier for each individual bookshelf with which layers are associated. the database will also contain such information as bookshelf ranges, bookshelf racks, and books, all of which are individual database tables that are joined with each other by relational keys. among them, the ranges table is simply characterized by its identifier, and is designed to represent two racks of bookshelves that stand back to back. the bookshelves table is identified by the call numbers of the start and end books stored across individual bookshelves rather than on individual layers. furthermore, the books table is primarily filled with the data of individual book call numbers as well as book pickup times and book discard locations. gis has limited ability for organizing database structure. if necessary, other database management systems, such as microsoft access, can be incorporated. query codes are built to get summarized information for specific purposes, and the aggregated data are exported into gis databases for further spatial analysis or convenient visual presentation. data visualization can be shown at different levels: by layer, bookshelf, rack, and range. the first attempt at making a visual demonstration of this research is for the area of individual bookshelves at layer level (see figure 1). the following query will return necessary summarized information:
select count(b.call_no) as total_num, l.layer_id, l.shelf_id from (books b inner join layers l on b.some_id = l.some_id) where b.call_no > l.start_no and b.call_no < l.end_no group by l.layer_id, l.shelf_id order by l.shelf_id, l.layer_id.
at the same time, another attempt is made to demonstrate book numbers per layer, at bookshelf level, across multiple bookshelf ranges. this demonstration provides a better visualization in the gis display so that an overall view of the height distributions of book usage over certain collection areas can be presented (see figures 2 and 3). to achieve such visualization, data must be compared in order to get information about which layer of a bookshelf contains the most frequently used books and which holds those that are rarely visited. this demonstration indicates that any alternative selection of analytical-display units can be easily performed by making modifications on the query that works on aggregating data. technically, data visualization can be presented by using any gis software, although arcview is used here because it has been available in the systems of many academic libraries. bookshelf ranges on mackimmie library's fifth floor were drawn into map features. in order to show them with a three-dimensional view, each of the seven layers was given a sequential number as its height value, and all bookshelves were treated as having the same height. these height values are treated as the z values in any three-dimensional analysis. then, by associating the numbers of books from the database with the heights of layers on the map, arcview is able to sketch the height distributions of in-library book use in new perspectives, dramatically improving the understanding of book use. in order to implement the visualization of all layers across a bookshelf range, layers were drawn as map features (see figure 1). layer heights and widths are in appropriate proportion.
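because the paper names the tables and fields only in prose, the following sketch is one possible reading of that design, not the author's actual schema: the table and column names (layers, books, shelf_id, layer_id, start_no, end_no, call_no) are assumptions, and the count-per-layer query simply restates the aggregation described above in a form that can be exported to the gis display.

    import sqlite3

    # hypothetical sketch of the layers/books tables described in the text and the
    # layer-level aggregation that feeds the gis display; all names are illustrative.
    con = sqlite3.connect(":memory:")
    con.executescript("""
    create table layers (
        shelf_id text, layer_id integer,   -- 1 = top layer, 2 = next layer down, ...
        start_no text, end_no text         -- start/end call numbers stored on the layer
    );
    create table books (
        call_no text, pickup_time text, discard_location text
    );
    """)
    con.executemany("insert into layers values (?,?,?,?)", [
        ("shelf-01", 1, "B1 .A1",  "B9999 .Z9"),
        ("shelf-01", 2, "BC1 .A1", "BD9999 .Z9"),
    ])
    con.executemany("insert into books values (?,?,?)", [
        ("B1190 .M3", "2002-12-02 10:15", "table"),
        ("BC71 .K4",  "2002-12-02 14:40", "floor"),
        ("BC108 .P7", "2002-12-03 19:05", "shelving cart"),
    ])

    # count of used books per layer, per bookshelf -- the figure the maps are shaded by
    rows = con.execute("""
        select l.shelf_id, l.layer_id, count(*) as total_num
        from books b join layers l
          on b.call_no between l.start_no and l.end_no
        group by l.shelf_id, l.layer_id
        order by l.shelf_id, l.layer_id
    """).fetchall()
    print(rows)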
(individual books on each layer are for demonstration only, and thus are not in the exact shape and number.) figure 1 shows how a bookshelf rack has been presented as a gis map, which is a totally new idea in the applications of gis visualization. the database and visualization mechanism constitute what is referred to in this paper as the analytical tool. one will find that the development is relatively easy and the tool is incredibly simple. however, it is a dynamic device. if expanded into other parts of the library collections, this tool will become an integrated system that is able to assist in the management of library book use and
figure 2. a three-dimensional view of bookshelf ranges on the fifth floor at the mackimmie library. the height of each bookshelf represents the corresponding height of the layer from which most books were removed. this display is not to actual scale.
automation of acquisitions/carter
ming, because the computer center did not have the full-time personnel to support a major new effort. this was resolved by hiring a programmer on a special three-month contract running from april 15 to july 15, 1971. prior to implementation, the library was forced to rely on the availability of keypunch machines at the computer center. in september 1971, an ibm model 129 keypunch and verifier was installed in the technical services department of the library. a model 129 was chosen for the library in conformance with the initial requirement set by the director of the computer center: that all library data for the computer be verified. this has proven to be a wise decision, as we have had relatively limited problems with invalid or erroneous data.
requirements specification phase (analysis)
three weeks were allowed for identification and specification of all output desired from the initial system. many of these requirements were alluded to in the preliminary list of criteria for the system. to meet the library's needs we decided that the system must produce: purchase orders, individual order cards (including a copy used to order catalog cards from the library of congress), budget statements including all encumbrances and payments as well as other financial data, lists of all books on order or in process or cancelled, notices to vendors regarding items on order more than 120 days, notices to each faculty member of the additions to the collection of items they requested complete with call number, and a monthly accession list of all newly cataloged items that could be circulated to all faculty members.
development steps required | time | date to start | date to complete
i. requirements specifications | 3 weeks | feb. 15 | march 5
ii. detailed design-system flow | 3 weeks | march 8 | march 26
iii. detailed design-programming specifications | 10 weeks | march 29 | june 4
iv. programming-acquisitions | 10 weeks | april 15 | june 23
v. programming-materials accessioned | 3 weeks | june 24 | july 14
vi. computer program system test-acquisitions & materials accessioned | 2½ weeks | july 1 | july 26
vii. implementation | | july 1971 |
fig. 2.
time estimate for automation of acquisitions at parkland college as submitted in january 1971. a beginning and ending date for each phase is indicated and the actual time in weeks required is shown.
once it was known what forms were required, orders were placed for the necessary pre-printed forms. with some outside advice in the matter of forms suppliers, specifications for three new forms were delineated, two of which would be for use on the computer. the first form encountered in outlining the acquisitions process was a request form. the request form is used to make a record of all items ordered and to serve as a checklist in the searching process (see figure 3). later, it is stamped with a six-position control number and serves as the source document for keypunching new orders, which require three input cards per item ordered. the request form is then retained in control-number sequence until the item has completed its way through the technical services process. specifications for the purchase orders were drawn up by parkland's business manager. the machine-generated purchase orders used by parkland are almost identical to the conventional manual purchase orders used throughout the college. in this case, automation of the library's purchase orders is a likely precursor to automation of the purchase orders for the remainder of the college. the most complicated form to design, from the library's viewpoint, was the individual order form. this was required in five parts, including a copy complying with library of congress specifications for use with ocr equipment. (this is illustrated in figure 4.)
fig. 3. request form, used as a control record for each item ordered. (the form carries searching checkboxes plus fields for fund, vendor, format code, author, title/vol., publisher, year, no. of copies, series/edition, lc card no., requester, control no., order code, price, and sbn.)
fig. 4. copies one and two of the multiple-part order form.
it was important to determine forms requirements early, as it was anticipated that several months' time would elapse before they would be received.
naturally, it was desired that the forms be on hand by the time the programs would be ready for testing, which was planned for late june or early july. one of the most critical parts of the requirements specification phase was the determination of data elements to be included in the master records. perhaps the most perplexing of those possibilities considered was subject headings. since we wanted an open-ended system which would leave us some room for future development, without major modifications, a decision was made to include three 50-character subject headings in each record. here we were limited because of the decision made (for purposes of simplicity of design and programming) to confine the system to fixed-length records. it was considered desirable for storage purposes to keep the master record length within 400 characters. while the decision on subject headings may prove to be adequate in the long run, it does give parkland's library a good starting point for some projects using subject headings, such as developing bibliographies on demand. despite possible future modifications to the data base, all items going into the history (master) file included headings as defined above. additional determinations made in the initial phase regarded files to be maintained. here a crucial factor was the physical limitations of the college's computer system. as only two tape drives and two disk drives comprised the primary storage facilities, the capability for performing sorts was limited. in fact, one of the disk drives was reserved strictly for systems programs, and could not be utilized directly by the library. this contributed to the decision to maintain separate on-order and in-process files, as well as a history file on tape. the college vendor file and the library budget file are maintained on disk. a final area of effort in the initial phase was developing codes to be utilized throughout the system. naturally, many conditions would be indicated in the computer records by the use of a one- or two-position code. one example is the format code, a one-position code, which indicates the types of items used, such as: b=book, r=record, and s=filmstrip.
design phase-system flow
three weeks were allotted to developing the overall systems flow chart. this time was spent working out each separate program that would be required, and flow-charting the entire series of programs. a flow chart of the system (without minor additions dating after september 1971) is shown in figure 5. however, it does not necessarily indicate the sequence in which programs are run. in general, maintenance of each of the separate files is run prior to new data. this procedure has proved to work well.
fig. 5. system flow chart.
in most cases, pre-sorting of card input is provided. this decision was not based on optimum efficiency but on the compatibility with routine procedures and facilities in the computer center.
design phase-program specifications
one of the most significant parts of the development of parkland's automated library acquisitions system is the exhaustive documentation provided by detailed written specifications for each program in the system.
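the fixed-length master record described earlier in this section lends itself to a brief sketch before moving on to the program specifications. the layout below is hypothetical, not parkland's actual record design: the field names, widths, and order are invented for demonstration, and only the constraints stated in the article (three 50-character subject headings, a one-position format code, a total length held within 400 characters) are taken from the text; the sample values echo the order shown in figure 4.

    # hypothetical fixed-length record layout; field names and widths are invented,
    # except for the documented constraints (three 50-char subject headings,
    # one-position format code, total length kept within 400 characters).
    FIELDS = [
        ("control_no", 6), ("format_code", 1),            # e.g. b=book, r=record, s=filmstrip
        ("author", 40), ("title", 60),
        ("vendor", 20), ("fund", 12), ("order_date", 5),  # julian date, e.g. 72014
        ("subject_1", 50), ("subject_2", 50), ("subject_3", 50),
    ]
    RECORD_LEN = sum(width for _, width in FIELDS)
    assert RECORD_LEN <= 400, "master record must stay within 400 characters"

    def pack(record):
        """pad or truncate each field to its fixed width and concatenate."""
        return "".join(str(record.get(name, "")).ljust(width)[:width]
                       for name, width in FIELDS)

    def unpack(line):
        """slice a fixed-length line back into named fields."""
        out, pos = {}, 0
        for name, width in FIELDS:
            out[name] = line[pos:pos + width].rstrip()
            pos += width
        return out

    row = pack({"control_no": "103921", "format_code": "b",
                "author": "westheimer, david", "title": "lighter than a feather",
                "order_date": "72014", "subject_1": "fiction"})
    assert unpack(row)["title"] == "lighter than a feather"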
each program, including utilities such as sorts, was assigned a job number and then described under each of the following topics: purpose, frequency, definitions (any unusual terms), input, output, and method. a format was provided for each input and output, whether it was a card, tape, disk, list, or other printed report or form. these accompanied each individual program specification. the method section is particularly important. here the librarian-analyst stated the procedure used to arrive at the given output based on the given input. any necessary constants were defined. because the librarian-analyst has had programming training, these specifications are detailed to the point where the programmer does not have to do much more than code the problem, making it possible for programming to proceed quickly. this thorough problem definition for each program by the librarian-analyst was one of the major factors (perhaps the primary key) in our success in acquisitions being accomplished rapidly and efficiently. it had the advantage of obviating the need for a senior programmer, or for having someone from the computer center become highly involved in the analysis of library details. furthermore, and perhaps most important, is the fact that it provides the detailed documentation of the system. there should be no doubt as to the procedures within each program. an example of a specification for one of the programs in the parkland college library acquisition series is presented in the appendix. it should be mentioned that most of the programs are written in cobol. there are a few in assembler, and some minimal use is made of rpg.
testing of the program
the original plans called for testing with test data which would proceed simultaneously with programming. however, as things developed, most coding was done prior to very much testing. as a result, the period originally devoted to live-data testing of the whole system was instead devoted to testing the programs with test data. thus, in early july, we were about two weeks behind the original time estimate, and that is where it ended up. the usual problems showed up in testing with test data. moreover, during the first week of july, it was learned that the business office was changing the length of the account numbers from 9 to 11 positions. fortunately, space had been planned for up to a 12-position field, so the lengthened number could be easily accommodated by the system. however, the changing of numbers required modification of any program which edited data for valid account numbers. this was a minor problem and easily resolved. on july 15 the programmer completed the job for which he was hired, i.e., to complete a programming and systems test utilizing live data and to make appropriate changes as identified during testing. since not even test-data testing was complete on july 15, he stayed until july 20 and finished that work. meanwhile, the director of the computer center had already selected the individual to be the operator when the library's jobs were being run on a regular basis. this employee would also provide program maintenance. on july 21, this permanent staff member took over programming. for the next two weeks, while summer school classes were in session, most of the trial runs of the library series had to be done during evenings, nights, and on weekends. by the end of july, most of the major bugs appeared to be out of the programs.
impact on technical services
success on the first usable purchase order and order cards came on august 3. within the next day or two, a workable budget statement was produced along with a wits list (work in technical services). by august 13, when the vacation time came, nearly one thousand books had been ordered via the automated system. while a few bugs remained to be dealt with in september, the system was accomplishing its basic mission essentially on time. it took less than eight months to identify requirements, and design, program, and test a system consisting of twenty-seven programs in its original design! during the remainder of 1971, various bugs were found, and, it is to be hoped, eliminated from the system. more bugs occurred in the budget series than in any other single segment of the system. over a period of several months, these were worked out; as of march 1972, the budget sequence of programs worked smoothly.
implementation
following the implementation of the automated technical services system, several effects were evident. an obvious effect was the saving of two to three days per month formerly spent on bookkeeping. on the other hand, one permanent staff member was added to technical services because of the keypunching workload. this addition had two causes: the keypunching load, and the fact that many more books were ordered directly from publishers with a consequent major increase in processing in-house. therefore, much of what was expended in salary for the extra clerk was saved by eliminating most prepaid processing costs. for several months after implementation, some duplication of effort was required, especially by acquisitions personnel. thus, the total effect on changing the nature of work was not immediately obvious. by march 1972, duplication was essentially phased out, and more realistic assessments of the impact of automation in changing the nature of the workload are now being made. one of the most obvious changes is the increased number of bills to be approved for payment. by utilizing the computer to batch purchase orders and order cards, almost all materials are now ordered directly from publishers, rather than pre-processed from a jobber. although the speed by which items are received and processed has increased substantially, there has been a corresponding increase in paper work in this regard.
additional services
besides the immediate effects of the automation of acquisitions within technical services, other parts of the library and the college felt the impact. this is especially true of reference, which now has a weekly updated listing of all items on order, in process, or cataloged within the last month, in both author/main entry and title sequence. budget statements are now available to the director of the learning resource center and other personnel on a weekly rather than monthly basis. not only are they received sooner, but they provide more information than is present in the statement originating from the computer center. a useful fringe benefit is the availability of overdue notices to vendors when items have been on order more than 120 days. a computer-generated notice is sent each week to faculty members regarding items requested, cancelled, or cataloged. the response of the library staff and the rest of the faculty to the automated system has been very favorable.
cost
at this date (march 1972), costs are difficult to assess, but certainly seem minimal.
the only direct costs are the installation of a 129 keypunch, which rents for $170 per month, plus the salary of the extra staff member for keypunching. however, the extra salary is compensated for by no longer ordering items pre-processed at an average cost of $2.05 per item. naturally, there is some local cost for processing materials such as pockets and labels, but it is minor on a per-volume basis. in addition, by being processed locally, materials are available to the users much more rapidly. among other costs, the learning resource center had to pay a three-month salary for a programmer. other computer support, whether personnel or machine time, has not been directly billed to the library. analyst time is absorbed, in part, in general library salaries, as the librarian-analyst is also head of technical services and is responsible for original cataloging. about one-half of her time is devoted to automation activities. as an indirect cost of automation, it is reasonable to include the cost of a special summer project contract of about $1500 for the reference librarian to catalog a-v materials. this was necessary because the librarian-analyst was directly involved with automation, thus not able to keep up with all media of materials to be cataloged. purchase-order forms previously covered by the business office budget cost the library $900. however, it was a two-year supply which was paid for by money the college, if not the library, would have expended anyway. the multiple-order forms for computer use exceed the cost of more standard forms by several hundred dollars per year. the library also expends about $400 per year to buy punch cards and magnetic tape. some direct savings resulted from what are by-products of the automated system, but which were previously done manually. these include production of a monthly accession list and notices to faculty members of items they requested which were ordered, cancelled, or cataloged. the accession list was previously compiled by xeroxing in ten copies the shelflist card for all items added to the collection during a month. this involved both xerox charges and student assistant time. notices to faculty were previously sent out by both the order and processing sections. now these notices are consolidated, which produces savings in addressing time, as well as eliminating manual production of each notice. overall, in calculating costs and savings, direct and indirect, it appears at this point that parkland has automated many library routines very inexpensively, although specific cost figures remain to be determined. with the availability of a similar computer, many other libraries should be able to undertake automation of certain basic functions without large expenditures of either money or personnel time.
problems
as with all automated efforts, some problems were encountered at almost every stage of development. taken as a whole, these were minor and, for the most part, few hitches were encountered. however, so that others may profit from the library automation experience at parkland, those problems will be discussed. the major problem was the original programmer of the series. this person was not a regular employee of parkland and was not concerned with being retained. since he was not part of the staff, he worked erratically and frequently was hard to get hold of.
we were working on a tight time schedule, and it was very important to maintain close supervision of the progress being made, although sometimes this was difficult. in addition, even though it was strongly desired that tests be conducted throughout the three-month period, the programmer waited until all coding and compiling was completed before beginning even test-data testing with most programs. fortunately, it worked out satisfactorily, as the regular staff member of the computer center, who presently runs our jobs and does program maintenance, took over in mid-july and was available for live-data tests. all staff members directly involved with automation worked very hard the last two weeks of july and the first week of august to complete testing with live data. the programs were further refined during august and september, and most of the bugs were out by early fall. naturally, changes in specifications continued to be made, and our acquisitions system is definitely not static. the lesson we learned from the experience with the initial programmer is that, if a regular staff member of the institution can be assigned to the development of programs for the library, avoiding other assignments during that time period, a more satisfactory response can be achieved from the programmer. also, in such an operation it would be possible to monitor progress on a more regular basis. another group of problems arose in connection with the new forms required for the automated system. fortunately, these were not serious. the forms arrived later than they were promised, and, without exception, their cost was about 25 percent more than the original estimates. because custom forms can take a long time to be completed, it is wise to identify output requirements early in the development of an automated system, so that the forms can be completed and delivered when the system is ready for final testing and implementation. a few minor problems revolved around decisions made in file design. for conserving space and holding down the size of the master record, it was decided to pack numerical fields. this would have been satisfactory if packing had been limited to such fields as the julian date, such as 72001 rather than 01-01-72. (this form of the date was used to provide easy computation when calculating overdue orders.) unfortunately, fields such as the numerical part of the lc card number and the parkland college account numbers were also packed. no problem existed except when the lc card number was blank at order time; then the lc number printed as zeros. of course, these could be suppressed once the problem was identified, although it was decided to make space to unpack the field. it was learned that packed fields always print zero when unpacked, unless this is specifically suppressed, and also that it is impossible to debug packed fields on routine file dumps that are requested with provisions for unpacking and reformatting the dump. this is because packed fields print blank when they are dumped. other minor difficulties included: 1. the print chain did not print colons or semi-colons, except as zero; therefore, the library's records all contain commas instead. 2. in the midst of programming the account numbers, all the college's funds were changed, thus requiring the change of constants and edit criteria in many programs. 3.
as originally specified for input, the lc classification number did not sort in shelf-list order; for instance, bf 8 sorted after bf 21. this was eventually remedied by left-justifying the letters and right-justifying the numbers within separate fixed fields (a small sketch of this sort-key scheme follows at the end of this passage). 4. routine delays for machine repair and maintenance were a concern, since it is necessary to adhere to a tight schedule in systems development.
future development
as is so frequently the case, now that parkland is committed to automated functions within the library, more and more applications are seen. even the former skeptics on the staff are enthusiastic, and all the professionals have made suggestions for the future. several additions to the acquisitions system were made in the first six months following implementation of the system. these included a list of purchase orders sequenced by vendor and enlarging the machine-generated notices to faculty requestors to cover items ordered and cancelled. various additions have been made in several programs originally part of the system, which expand the services the system can provide for the library staff. many more minor modifications and supplementary features in acquisitions have been identified for inclusion in the system, and will be added as time permits. the first additional area to benefit directly by the computer availability has been periodicals. without involving complicated programming, the periodicals holdings have been converted to a card file which is then listed directly, card by card, without changes, except for suppression of a control and sequence number. nothing more is planned for periodicals in the near future, because the new card file enables the master holdings list of 800 titles to be updated in technical services by the periodicals assistant, who also keypunches one-half time. the time-consuming retyping of the holdings list is now eliminated, and multiple copies of up-to-date holdings lists can be produced more frequently with less effort. another new area for which programming specifications were released in december 1971 is reference. in this system it is hoped that subject bibliographies and holdings lists, based on library of congress classification, can be produced. this system will have a multitude of purposes, one of the primary ones being to give better service to our faculty members. we get many requests for copies of portions of our shelflist or other extracts of holdings. rather than filling these requests by xeroxing cards or tedious typing, a few extract specifications will permit computerized retrieval and printing. also, search time in the catalog will be cut down considerably. in the subject bibliographies, the library plans to be able to extract on any heading, stem of a heading, or any part of a heading, thus getting much more flexibility than in manual use of the card catalog. programming for this is currently under way, and after the system has been completed and is operational, some interesting results should be identified. by including three subject headings of fifty characters in our original file design, it was possible to design and program the reference series as a spin-off of the acquisitions-technical services system with a minimum of additional effort.
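the sort fix described in problem 3 above can be shown with a short sketch. it is a hypothetical reconstruction of the idea, not parkland's code (which was written in cobol for fixed fields): letters are left-justified and numbers right-justified in separate fixed-width fields, so that a plain character sort yields shelf-list order and bf 8 collates before bf 21.

    # hypothetical sketch of the call-number sort fix: split the class letters from
    # the class number, left-justify the letters and right-justify the number in
    # fixed-width fields, so plain character sorting yields shelf-list order.
    import re

    def sort_key(class_no, letter_width=3, number_width=6):
        m = re.match(r"([A-Z]+)\s*(\d+)", class_no.upper())
        letters, digits = m.group(1), m.group(2)
        return letters.ljust(letter_width) + digits.rjust(number_width)

    calls = ["BF 21", "BF 8", "B 72", "BF 199"]
    print(sorted(calls))                  # naive sort: ['B 72', 'BF 199', 'BF 21', 'BF 8']
    print(sorted(calls, key=sort_key))    # shelf-list order: ['B 72', 'BF 8', 'BF 21', 'BF 199']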
even if it is eventually decided to lengthen either the number or size of the subject headings contained in parkland's file, useful services will have been provided under the original design, as well as simply having provided a base for further decisions and developments. other projects which are being considered for future action are serials holdings (in parkland's case, mostly annuals and yearbooks which get cataloged), including an anticipation list, and management statistics consisting of holdings percentages by class letter versus collection additions and circulation figures by class letter. circulation itself will undoubtedly not be designed prior to actual residence on the permanent campus (anticipated for fall 1973), but all of the above are possibilities and some will receive attention in the immediate future. by building a data base which includes subject headings and call numbers, many future projects will be practical to consider, as the file maintenance programs and the data base will already exist. these, of course, may be modified from time to time to meet changing conditions and requirements. additionally, parkland's library staff has been following cooperative library automation efforts involving other libraries, and would happily consider participation in appropriate cooperative ventures.
conclusion
in the opinion of both the library and computer staff, the automation of acquisitions is a success. it was accomplished rapidly and essentially on time and economically, with few costs higher than originally anticipated. now that the system is operating smoothly, with only an occasional bug cropping up, the extra workload caused by parallel operations has been phased out and the total efficiency of the system should continue to improve. the system to date has been running on a weekly basis, and this has proved satisfactory to both the computer center personnel and the library. the library is among the first parts of parkland to be on a regular weekly schedule using the computer. most other processing is on a monthly and quarterly cycle. in approaching any automated systems development, a general attitude of flexibility combined with thoroughness is very important and will probably bring the best long-term results. by being flexible and open-ended, regardless of what portion of a library's functions were originally automated, the way will be paved to provide a data nucleus for other applications in areas of the library. thoroughness in design and attention to initial detail are also important, as sometimes it is harder to find the time to make the changes than was expected. there is probably a tendency to get along with an operational system as it is, rather than making minor non-crucial modifications in it, although such changes do get worked in as time permits. nonetheless, it is very important that in the initial stages a system be as comprehensively planned as feasible. the parkland college learning resource center is fortunate in that original specifications (on the whole) were well thought out and provided a cohesive unit, which is also characterized by built-in flexibility, and as a result is adaptable to future growth.
acknowledgments
numerous individuals have participated in and supported library automation efforts at parkland college. david l. johnson, director of the learning resource center, provided the initial inspiration and determination. robert o.
carr, director of the computer center, welcomed the library's commitment to automation and provided the technical advice where necessary. sandra lee meyer, acquisitions librarian, gave full cooperation, including tireless aid in clarification of requirements and debugging test results. since late july 1971, bill abraham has been the programmer-operator for the library system and has consistently given more than one hundred percent effort. jim whitehead from western illinois university contributed valuable advice based on his prior experience in acquisitions automation. finally, kathryn luther henderson, an inspirational teacher and friend, voluntarily spent many hours writing test data and offering the opportunity for many fruitful discussions.
design of library systems for implementation with interactive computers
i. a. warheit: program administrator, information systems marketing, international business machines, san jose, california
in the development of library systems, the movement today is toward the so-called "total" or integrated system.
this raises certain design and implementation questions, such as: what functions should be on-line, real-time, and what should be done off-line in a batch mode; should one operate in a time-share environment or is a dedicated system preferred; is it practical to design and implement a total system or is the selective implementation of a series of applications to be preferred. although it may not be feasible in most cases to design and install a total system in a single operation, it is shown how a series of application programs can become the incremental development of such a system. currently library mechanization is entering a new phase. the first phase, extending from 1936 to the mid-fifties, saw the development of a number of small, scattered, and essentially experimental automatic data processing (adp) library applications. these were punch card systems for purchasing, serials holdings lists, and circulation control. during the second phase, which has been running now about 15 years, a large number of library applications have been mechanized. these include the production of catalog cards, book catalogs, periodical check-in, serials holdings, circulation control systems, acquisitions programs, and searching of files, or information retrieval. systems librarians have been busy designing individual programs, building special computer-stored files, implementing conversion of records, and developing operating procedures for these various applications. more importantly, they have been studying the library from a systems point of view in order to have a better understanding of the individual tasks performed and how they can be best accomplished with the available tools. at first concern was limited to individual applications in the library. gradually some of the more perceptive systems analysts began to be concerned about integrating these various applications. some simple examples are the generation of book cards for process control and circulation control as a by-product during the order-receiving cycle; the combination of subscription renewal, claims, and binding control with the serials holding program; the development of authority lists in book catalog programs; the simultaneous updating of accession files and circulation control files, etc. the purpose of many of these partially integrated programs was to reduce redundancy and make multiple use of single inputs. the next step was to look at the library as a whole and consider it as a "total" or single, integrated system. rather than building a series of independent applications programs, a number of libraries began to plan total systems in which the individual applications would be integrated segments. in the past year or two such efforts have been undertaken by the university of chicago, stanford university, redstone arsenal, the national library of medicine, washington state university, university of toronto, system development corporation, ibm, and others (1, 2, 3, 4, 5, 6). it is this total systems concept which is the new and current development of library electronic data processing (edp). at first, a total integrated system was conceived as a series of separate application programs utilizing separate files, but whose records have similar formats and field designators allowing for the multiple use of single inputs.
a more advanced concept, however, calls for the construction of a single logical file, even though, physically, the individual record elements may be distributed over a number of tracks and storage devices. operating on this central file are a series of program modules performing functions involving file building, searching, computation, display, and printing. as each application is called for (that is, as the librarian prepares an order, receives an invoice, checks in a periodical, adds a call number, does some cataloging, charges out a book, etc.), the appropriate program functions are called into use. attached to the file are a number of indexes or access points. one such program, for example, provides some eighteen indexes: author, permuted title, subject heading, descriptor, call number, invoice number, publisher, serial i.d., l.c. card number, borrower, etc. it is not just coincidental that the development of the total integrated library system occurred at the same time that computer hardware became available that made it practical, especially in an economic sense, to operate a total library system. one of the basic elements of this hardware was the development of real-time, on-line, terminal-oriented, time-shared systems. at present, orders for on-line systems are increasing at such a rate that it is estimated in the june 23, 1969, edp weekly "that half of the computers installed by 1975 will be on-line systems." although there are a number of reasons why on-line, time-sharing, and terminal-oriented equipment made it feasible to build total library systems, the fundamental ones were that now the librarians could interact with their system and records and could, essentially simultaneously, perform a great variety of tasks. the scientific and business communities have been quick to take advantage of these new capabilities. a number of computer manufacturers, software firms, and service companies soon started to provide terminal-oriented, commercial time-share services. by the beginning of 1969 there were some 35 such services in existence, serving over 10,000 customers; by the end of 1969 it is estimated there will be over 30,000 users. although these systems are often used essentially for remote job entry, their main attraction for users has been their on-line, conversational, real-time capabilities. the interactive, man-computer techniques made possible by commercial time-sharing services have been extremely valuable for problem-solving applications, especially engineering and programming. however, the wide availability of text-editing packages has also opened up these services for libraries. one of the first academic libraries to use such a service for preparing bibliographic records was the state university of new york at buffalo (7, 8). many universities and industrial firms have developed their own time-sharing systems. a number of special libraries, notably those in ibm, were quick to take advantage of their in-house, time-share system to implement acquisitions, catalog input, and library bulletin programs (9). the defense documentation center over three years ago began preparing its bibliographic inputs on line. the suny biomedical network based in syracuse does the same (10). the washington state university library was one of the first academic libraries to implement an on-line acquisition program (11), and midwestern university (12) and bell laboratories (13) now have on-line circulation control systems.
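the permuted-title index mentioned above is easy to picture with a small sketch. the code below illustrates only the general idea (rotating a title so it can be found under any significant word) and is not the indexing routine of any of the systems named; the stop-word list, sample title, and record identifier are invented for demonstration.

    # hypothetical sketch of a permuted (rotated) title index: each significant word
    # becomes an access point, so an incomplete title can still be matched.
    STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to"}

    def permuted_entries(title):
        words = title.lower().split()
        for i, word in enumerate(words):
            if word not in STOP_WORDS:
                # rotate the title so the indexed word comes first
                yield word, " ".join(words[i:] + words[:i])

    index = {}
    for record_id, title in [(1, "handbook of data processing for libraries")]:
        for word, rotation in permuted_entries(title):
            index.setdefault(word, []).append((rotation, record_id))

    # a searcher who only remembers "data processing" still finds the record under "data"
    print(index["data"])   # [('data processing for libraries handbook of', 1)]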
with the advent of time-shared, on-line capabilities and the potentiality of building total, integrated systems, librarians today who are planning edp systems are faced with a number of design decisions: 1) should the system be a real-time, on-line system or an off-line, batch-mode operation, or a combination of both? 2) is it desirable to operate in a time-share environment or is a dedicated system to be preferred? 3) should one design a total, integrated system or should one selectively implement a number of individual applications? 4) if the decision is for an integrated system, how can it be incrementally implemented? it is recognized that a program must be tailored to fit the available resources and that it is not always possible to build an ideal system. nevertheless, design objectives must be established even though they cannot be immediately realized. if the ultimate objectives are understood, then the program development will be orderly and later reconversions will be kept to a minimum. therefore, even though the design objectives may not be achieved for a number of years, they should be established so that current implementation can be carried out in a rational manner with some assurance that the system will grow and develop.
real time or batch
library operations have always involved a variety of interactive real-time and batch-mode procedures. most operations dealing directly with the library patrons are, of course, in real time; reference question handling and charging of books are typical examples. some technical processes, such as cataloging and searching for acquisition, are also essentially interactive, real-time operations. this means that the librarian completely processes each item by creating or updating a record or servicing an inquiry, one at a time, with little or no attempt to batch the identical operations for a number of items or inquiries. other processing, however, such as preparing and mailing orders to vendors, sorting and filing charge-out cards, sending overdues, filing into the catalog, checking in periodicals, labeling, preparing binding, etc., is essentially done in the batch mode. in other words, batch and real-time operations complement each other, for whereas it is more effective to do some operations in real time, batching is more effective for other operations. librarians, therefore, expect and need both modes of operation. the actual distinction between these two modes is often lost in certain mechanized systems where everything is done in a non-interactive batch mode while interactive, real-time services are provided from printouts. many current library mechanized systems are really nothing more than processing techniques for producing the standard, hard-copy, bibliographic tools such as catalog cards, serials lists, book catalogs, orders, overdue notices, and the like. whenever the librarian wants to use the information generated by these programs, he consults the hard-copy files or lists. he does not interrogate a computer file directly. this approach has been typical of many other computer-based information systems. when the first direct-access devices (ramac) were made available for commercial and industrial inventory control, they were used primarily to update the records and to produce the inventory lists and card files which the user would consult for information.
later, as confidence developed in the machines, and terminals became available, the printout lists and files were abandoned and the user began consulting the computer store directly. typically today in libraries using computer systems, inputs are processed in batches and outputs are produced in batches. real-time services are provided from the print-outs: the catalogs, the on-order file, serials lists, and so on. even circulation control has been an off-line, batch operation. although the charge-out may be made through a data entry unit, all that is actually accomplished at the time is that the transaction is recorded. it is only later that the transactions are batched and processed, the files set up for the loans, the discharges pulled from the file, and the delinquencies handled. although librarians will not, in the immediate future at least, as readily give up their card catalogs and printed lists as business and industry are doing and as some enthusiasts believe librarians will (14, 15), since the queuing problem alone where the public must use the files would be very severe, some hard-copy files could be dispensed with in an on-line system. certainly hard-copy files of circulation records, periodical check-in records, authority lists, on-order records, and the like need not be maintained when these files are available via terminals. until now, practically all library machine processing, with a few exceptions, has been batched, off line, and not interactive. in a non-interactive system, records are created and modified by manual preparation of work sheets followed by keypunching for data entry. in a library environment, for example, this means that the acquisitions librarian fills out an order work sheet that is given to a keypunch operator, who either prepares a decklet of punch cards or punches a paper tape or makes a magnetic record on tape. the cards or tape are then fed into the computer, the input is edited and errors noted, and a proof copy is printed. the error messages and proof copy come back to the order librarian, who makes the necessary corrections. these are handed to the keypunch operator, who corrects and updates the record and inputs it again into the computer. if the operator has not introduced some new errors, the record is then processed. if she has, the record loops back again to the order librarian. the same story can, of course, be told about catalog records, journal and report records, and so on. in an interactive on-line system, the originator of the information (in this example the order librarian) could key his data directly into the computer or could prepare a work sheet for operator input. the editing would occur at once by the terminal responding to each entry, and verification or error messages would be returned immediately. the librarian or operator would enter the necessary corrections and upon acceptance of the record by the system would signal entry of the record into the file and the print queues as required. also, during the preparation of the entry, the librarian would be using the terminal (presumably a display-type terminal) to consult the files he needs, such as shelf list, orders outstanding, authority lists, etc. a simplified flow chart comparison of an off-line and an on-line cataloging process would look something like that shown in figure 1.
fig. 1. cataloging process: off-line and on-line. although only a few library applications and no total library system are as yet on-line operations, a number of analogous operations are being carried out in other industries, such as order entry, inventory control, production scheduling, insurance policy information, freight waybilling, etc., so that one can make a few tentative assessments (16, 17, 18). to begin with, in an on-line system a work sheet does not have to be prepared, and so the keypunch operator is eliminated. because of the interaction of the originator and the system, all corrections and editing are accomplished at once, so that the turn-around time is very much less. preparation of printed error messages and proof copy are eliminated and the total error rate is greatly reduced. thus, although the reading-in of the individual records is slower in the on-line mode than in the batch mode, appreciably fewer messages need be read to complete a record in the on-line mode, making for more economical machine time. to this, however, must be added terminal and communication costs as well as the terminal supervisor program and the fact that most on-line work is done during the prime shift, so that actual machine costs tend to be higher with the on-line system. some, however, dispute this, claiming that, on balance, machine costs are equal. labor costs, however, are very much lower with the on-line system. as a general rule, computer input costs are 85% labor and 15% machine. not only can a transcription clerk be eliminated, but the order librarian who prepares the original inputs on the terminal works very much more efficiently. consulting hard-copy files and lists is more time consuming and less informative than interrogating machine files. in an on-line system, the librarian's necessary tools are brought directly to him and displayed rapidly and efficiently. he does not have to walk to the shelf list, the catalog or the on-order file and copy information. in a well developed, sophisticated system some of the heavily used tools, such as the subject heading authority lists and class tables, would also be available from the terminal. not only does the librarian not have to spend time going to the physical files, but since the information is computer stored, it is brought to him in a greater variety of forms and sequences than is available in the hard-copy files. for example, titles are fully permuted so that incomplete title information can be searched. some systems librarians are proposing the use of codes and ciphers to search for entries, especially those with garbled titles (19, 20). all entries, including added authors, editors, vendors, etc., are immediately available even for uncataloged on-order items, so that searching is not restricted to main entries. it is not surprising, therefore, that clerks preparing computer inputs prefer working on line rather than off line. one interesting discovery is that since operators can do so much more with on-line systems they tend to take more time to turn out a better product. indications are "that significantly lower costs would have resulted if the time-sharing users had stopped work (i.e. gone to the next task) when they reached a performance level equal to that of batch users" (17).
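to make the contrast between the two input cycles concrete, the following is a minimal sketch, in python, of the kind of field-by-field validation loop an interactive order-entry terminal performs; the field names, validation rules, and prompts are illustrative assumptions and are not drawn from any of the systems discussed in this article.

    # illustrative sketch: interactive record entry with immediate validation,
    # in contrast to the batch keypunch / proof-copy / correction cycle.
    RULES = {
        "order_number": lambda v: v.isdigit() and len(v) == 6,
        "author":       lambda v: len(v) > 0,
        "title":        lambda v: len(v) > 0,
        "vendor_code":  lambda v: v.isalpha() and len(v) == 3,
    }

    def enter_record():
        record = {}
        for field, is_valid in RULES.items():
            value = input(f"{field}: ").strip()
            # the terminal reacts to each entry at once; an error is corrected
            # immediately instead of coming back later on a printed proof copy
            while not is_valid(value):
                print(f"  error in {field}; please re-enter")
                value = input(f"{field}: ").strip()
            record[field] = value
        return record  # an accepted record goes to the file and the print queues

    if __name__ == "__main__":
        print(enter_record())

the point of the sketch is only that each error is caught and corrected at the terminal, so no separate proof-and-correction pass through a keypunch operator is needed.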
even with a circulation control system, there is higher system efficiency with an on-line operation. every transaction, such as a charge-out or a discharge, is an actual inquiry into the file as to the status of the book and borrower and the answer is immediately available; therefore controls and audit procedures can be simpler. elaborate error correction routines do not have to be provided in the program to identify improper inputs as has to be done with an off-line system. incorrect loans are not made of restricted material, such as holds and reserves, or to delinquent borrowers. the system also acts as a locator tool for determining the location and availability of volumes. as a final note, on-line systems are necessary if effective networks are to be developed and decentralized services provided ( 21, 22). the basic conclusion is that an on-line system can handle more work and provide more services at greater machine costs but lower labor costs than a manual or an off-line machine system. in view of the fact that 72 journal of library a-utomation vol 3/1 march, 1970 machine costs are coming down rapidly, while labor costs and throughput demands are forever rising, the future of the on-line machine system in the library looks very promising. time share or dedicated system a number of librarians have had very unhappy experiences with data processing departments over which they had no control. machines have been changed, schedules dropped, library jobs delayed or dropped for "higher priority jobs" and so on. one tendency, therefore, has been to try to get a library's own computer facility. but, as de gennaro so succinctly summarizes it, "the economics of present day computer applications in libraries make it virtually impossible to justify an in-house machine of the capacity libraries will need dedicated solely or largely to library uses ... eventually, library use may increase to a point where the in-house machine will pay for itself, but during the interim period the situation will be uneconomical unless other users can be found to share the cost. in the immediate future, most libraries will have to depend on equipment located in computing or data processing centers . . . experience at the university of missouri suggests the future will see several libraries grouping to share a machine dedicated to library use . . . it seems reasonable to suppose that in the next few years sharing of one kind or another will be more common than having machines wholly assigned to a single library . . ." ( 23). it is true that the small computers are getting more powerful and it is quite possible the day will come when small stand-alone computers will have the capacity to do all the jobs required by the library. for the time being, however, an on-line system supporting a number of terminals for a variety of tasks in the library requires a computer of a size which cannot be economically justified except for the very large libraries. also, one thing that is often overlooked is that implementing a large library system requires data processing technical support that is very seldom available on the library's staff. one need only look at the information systems office of the library of congress, or the system analysis and data processing office of the new york public library to have some appreciation of the requirements for such technical support. also, a large central system often has backup capabilities which provide insurance against breakdowns and interruptions. 
the question really is not whether a library should time share or have a dedicated system, but rather whether or not the library has the necessary control over its segment of the total system. this segment is the library's property and its services are available to the library as set forth in the agreement made when the library became part of the data processing services. again, it must be emphasized that all this applies to systems which have to perform all library functions. most libraries, however, in order to get started and develop their programs, are beginning with small, stand-alone computers or are submitting batch jobs to a data processing center. later, as their programs develop, they will have to upgrade their computer capabilities. in view of the ultimate needs of a system which will support most of the major processing functions of a library, most libraries will have to have access to computer facilities whose full support they cannot economically justify. time sharing, certainly for the immediate future, will be required for any on-line library system. total integrated system or individual application it is more economical to handle a variety of library applications by using a single file and a standard set of functional programs, than it is to provide a separate file and a separate set of application programs for each application. not only is it more economical, but this total, integrated approach is, in its essential modularity, extremely flexible. functions can be added, changed, or removed, and sequences can be re-ordered, so that the system can grow and change with changing needs and capabilities. also, since the full record is available, if needed, for every application, added services, normally not feasible, are practical. for example, a circulation control system that, instead of having separate circulation files, keeps charge records in its central bibliographic file, can set a hold on all copies of a book, no matter where the copies are kept, as in the bellrel system (13). also, from a total record one can select various subsets and make different orderings to provide a variety of services. the library systems currently being designed are essentially mechanized versions of existing manual systems. however, as experience is gained with these new systems, as more advanced equipment is made available, and as research and development provide new insights, these systems will evolve and change. for example, in some cases a major part of descriptive cataloging is becoming a part of acquisitions. the former compartmentalization in libraries is already breaking down. one should, therefore, be prudent and not lock up the system into tightly compartmentalized segments on the assumption that current file subsets will remain unchanged. it is advisable that each library activity have potential access to all system functions and to all records. in the present context, an activity may have no need for all functions, nor does it need the total record, but as the system develops it might very well need these added capabilities. the problem, however, is that for a total, integrated system one must first build a complete structure including the file and all the functions, such as file building, search, compute, compare, display, print, etc., as well as set up all the access points which are essentially indexes.
in addition, all the overhead necessary for supervising the programs, managing the files, and monitoring the terminals must be provided for. to use an 74 journal of library automation vol. 3/ 1 march, 1970 analogy, one must first build foundation, walls and roof and install all plumbing and wiring before building any rooms. consequently, the start up or initial investment is far higher than for implementing a single application program. some who have undertaken the development of total systems did not fully appreciate this at first and have, as a result, had to replan their development programs. even if one could bring in a fully debugged program for a total system, there would still be the tasks of converting records, training staff, setting up operating manuals and working out procedures. only as machinable records became available and the file grew and developed could various applications become operable. from a practical point of view, the implementation of a total system would have to be incremental; that is, once the basic system is installed, applications would have to be implemented one at a time and in some rational order. this is even more true where the programs for a total system have not been written as yet or where the library's resources are such that it can only undertake one job at a time. from a practical point of view, one can develop and implement only one application at a time. furthermore, as is often the case, the available equipment is limited and cannot do everything the library will ultimately want. it is necessary, therefore, to develop single applications and to design them in such a way that they can become part of an integrated system. it is also necessary to have a strategy and a plan to move up through the various levels of mechanization. today there are many who, although accepting a total, on-line system as a desirable goal, feel that it is impractical to consider because of costs and unavailability of equipment. a full analysis of economic change in terms of wage-cost rise and machine-cost decrease, of technologic improvement and of demand for added services, goes far beyond the limits of this paper. there is developing, moreover, a literature on these subjects (24, 25, 26, 27, 28, 29). suffice it to say that an increasing number of librarians are becoming convinced that library mechanization is inevitable, that it will affect all operations of the library, that it will provide the highest level of service through direct, on-line, interactive systems and that, whatever today's limitations may be, these changes are coming so fast that plans must be made now. these individuals are also convinced that whatever is now undertaken in the way of mechanization will evolve into an integrated system with many basic functions operated in a real-time, on-line mode. implementation of an integrated library system typically a library mechanization project will start with a single, relatively uncomplicated application that will not impact library operations very much, will require only a small amount of systems design and programming, and will run in a batch mode on a small equipment configuration. a typical example is the preparation of a serials holdings list. from systems with interactive computers/warheit 75 this first job, the librarian and his staff will become acquainted with data processing, will introduce the data processing personnel to some library requirements and will, hopefully, begin to develop procedures for working with the computing center. 
having passed this introductory stage, many librarians continued, as a rule, simply by developing the next application. today, however, the more prescient ones are first assessing the total impact of mechanization and, having decided that their library will be mechanized, try to plan what their foreseeable goals are, then work out a plan to achieve these goals. having decided that the ultimate goal is a total integrated system for the whole library, which will provide real-time services and therefore must operate on line, the library planner will set priorities and work out a strategy to reach these goals. in some instances he can start designing a total system. in other situations, he does not have the resources to do so, but plans to make use of programs being developed for other libraries or of so-called standard, commercial packages, or programs which may be developed jointly with other libraries. he should realize that he can't just sit and wait for d-day when a total complete program will be wheeled in and a turnkey operation will be installed overnight. the lead time necessary for planning, training, conversion and installation is too often grossly underestimated, so that these preliminary preparations are neglected to the detriment of orderly growth and development. having established certain long-range goals, the librarian will tailor his current programs so that the library system will develop as smoothly as possible. he will try to keep the various subsystems and program segments as generalized and as modular as possible. he will structure his records so that they can ultimately be fitted together into a full bibliographic record. he will try to avoid using records so truncated that they will have to be discarded and recorded again later. he may, in fact, actually start with a full record that is comparable to his present shelf list or catalog card, even though there may be no need of the whole record for the current application. he will provide for a variety of print options, such as line width, number of lines, number of columns, etc., so that a separate print program will not have to be written for each product or to accommodate every change in style. he will try to organize his files so that the file structures and the record formats will not have to be radically changed when the system goes on line. he may store some of his records (his active on-order file, for example) on direct access storage devices. if he can, he will create access points to his large bibliographic file and store them on disk files too, even though he is currently operating off-line. such direct access storage of indexes makes economic sense when very large files (and library files are large and grow very fast) must be searched or sorted. aside from these immediate benefits, such a file organization requires little or no restructuring or record reformatting when the system ultimately goes on line and becomes terminal oriented. as early as possible, he will put his circulation control system on line. this is by far the cheapest and easiest on-line operation requiring the least investment and yet producing the most immediate benefits. again, aside from the immediate benefits, this on-line operation represents an important building block for the ultimate total system.
aside from the current improved services, the experience of working on line and the opportunity to develop and refine processes and procedures will pay important dividends in the design and implementation of the total on-line system. with knowledge of how he wants his system to develop, the librarian is now able to establish priorities and allocate his resources. the emphasis will be on file building, on capturing the record. acquisitions programs or circulation control systems will come first. work on the display terminal and communication will come later after searchable files have been built up. in other words, an attempt is made to have a controlled growth through several levels of mechanization. a start is made with a simple, off-line, batch job. then a beginning is made on building what is to become the main, central bibliographic file, the catalog. as soon as possible, parts of it are stored on direct access devices, so that it can be used more effectively and so that its structure will conform to the requirements of an ultimate on-line system. a simple on-line process is adopted as soon as feasible. each application program uses standard functional modules in macro form and so on. all this, of course, is highly oversimplified and may seem truistic to many. nevertheless, there has been too much evidence of programs undertaken without adequate planning and of programs that have lacked continuity because adequate guide lines have not been established. such failures are too often ascribed to changes in personnel or hardware. a project should be designed so that inevitable changes in personnel and hardware can be tolerated without its being wrecked. therefore, the establishment of long-range goals can have a profound effect on the shape and success of current operations. more and more librarians and systems personnel engaged in library projects are beginning to think in terms of total integrated systems. they are looking ahead and planning. they are designing and implementing their present applications not in a simple ad hoc way but as part of what is to become a total system. references 1. alexander, r. w.: "toward the future integrated library system," 33rd conference of fid and international congress on documentation, (tokyo: 1967). systems with interactive computers/ warheit 77 2. redstone scientific information center : automation in libraries (first atlis workshop) 15-17 november 1966, huntsville, ala.: redstone arsenal, (june 1967). report rsic625. 3. black, donald v.: "library information system time sharing: system development corporation's lists project," california school libraries, (march 1969), 121-6. 4. black, donald v.: library information system time-sharing on a large, general purpose computer. (system development corporation report sp-3135, 20 september 1968). 5. bruette, vernon r.; cohen, joseph; kovacs, helen : an on-line computer system for the storage and retrieval of books and monographs (brooklyn, new york : state university of new york downstate medical center, 1967). 6. fussier, herman h.; payne, charles t. : development of an integrated computer-based bibliographical data system for a large university library. (chicago : chicago university, 1968) . clearinghouse report pb 179 426. 7. balfour, frederick m.: "conversion of bibliographic information to machine readable form using on-line computer terminals," journal of library automation, 1 (december 1968), 217-26. 8. lazorick, gerald j.: "computer/ communications system at suny buffalo," educom. 
the bulletin of the interuniversity communications council, 4 (february 1969), 1-3. 9. bateman, betty b.; farris, eugene h.: "operating a multilibrary system using long-distance communications to an on-line computer," proceedings of asis, 5 (1968), 155-62. 10. pizer, i. h.: "regional medical library network," medical library association bulletin, 57 (april 1969), 101-15. 11. burgess, t.; ames, l.: lola library on-line acquisitions subsystem. (pullman, wash.: washington state university library, july 1968). 12. reineke, charles d.; boyer, calvin j.: "automated circulation system at midwestern university," ala bulletin, 63 (october 1969), 1249-54. 13. kennedy, r. a.: "bell laboratories' library real-time loan system (bellrel)," journal of library automation, 1 (june 1968), 128-46. 14. licklider, j. c. r.: libraries of the future (cambridge, massachusetts: m.i.t. press, 1965). 15. swanson, don r.: "dialogues with a catalog," library quarterly, 34 (january 1964), 113-25. 16. brown, robert r.: "cost and advantages of on-line dp," datamation, 14 (march 1968), 40-3. 17. gold, michael m.: "time-sharing and batch-processing; an experimental comparison of their values in a problem-solving situation," communications of the acm, 12 (may, 1969), 249-59. 18. sackman, h.: "time sharing versus batch processing: the experimental evidence," afips conference proceedings, 32, 1968 spring joint computer conference, 1-10. 19. nugent, william r.: "compression word coding techniques for information retrieval," journal of library automation, 1 (december 1968), 250-60. 20. ruecking, frederick h.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-38. 21. grosch, audrey n.: "implications of on-line systems techniques for a decentralized research library system," college & research libraries, 30 (march 1969), 112-18. 22. rayward, w. boyd: "libraries as organizations," college & research libraries, 30 (july 1969), 312-26. 23. de gennaro, richard: "the development and administration of automated systems in academic libraries," journal of library automation, 1 (march 1968), 75-91. 24. "the costs of library and informational services." knight, douglas m.; nourse, e. shepley, eds.: in libraries at large (new york: r. r. bowker, 1969), 168-227. 25. cuadra, carlos a.: "libraries and technological forces affecting them," ala bulletin, 63 (june 1969), 759-68. 26. culbertson, don s.: "the costs of data processing in university libraries: in book acquisition and cataloging," college & research libraries, 24 (november 1963), 487-89. 27. dolby, j. l.; forsyth, v.; and resnikoff, h. l.: an evaluation of the utility and cost of computerized library catalogs. final report project no. 7-1182, u. s. department of health, education and welfare. 10 july 1968, eric ed 022517. 28. kilgour, frederick g.: "the economic goal of library automation," college & research libraries, 30 (july 1969), 307-11. 29. knight, kenneth e.: "evolving computer performance," datamation, 14 (january 1968), 31-5. president's message cindi trainor information technologies and libraries | december 2013 hi, litans! forum 2013 i'm excited that 2014 is almost here. last month saw a very successful forum in louisville, in my home state of kentucky. there were 243 people in attendance, and about half of those were first-time attendees.
it's also typical of our yearly conference that there are a large number of attendees from the surrounding area; this is one of the reasons that it travels around the country. louisville's forum was the last of a few in the "middle" of the country--these included st. louis, atlanta, and columbus. next year, forum will move back out west, to albuquerque, nm. the theme for next year's conference will be "transformation: from node to network." see the lita blog (http://litablog.org/2013/11/call-for-proposals-2014-lita-forum/) for the call for proposals for concurrent sessions, poster sessions, and pre-conference workshops. goals of the organization at the board meeting in the fall, we took a stab at updating lita's major goal areas. the strategic plan had not been updated since 2010, so we felt it was time to update the goal areas, at least for the short term. the goals that we agreed upon will carry us through annual conference 2015 and will give us time to mount a more complete planning process in the meantime. they are: • collaboration & networking: foster collaboration and encourage networking among our members and beyond so the full potential of technologies in libraries can be realized. • education & sharing of expertise: offer education, publications, and events to inspire and enable members to improve technology integration within their libraries. • advocacy: advocate for meaningful legislation, policies, and standards that positively impact the current and future capabilities of libraries and that promote equitable access to information and technology. • infrastructure: improve lita's organizational capacity to serve, educate, and create community for its members. cindi trainor (cindiann@gmail.com) is lita president 2013-14 and community specialist & trainer for springshare, llc. midwinter activities in other governance news, the board will have an online meeting in january 2014, prior to midwinter conference. our one-hour meeting will be spent asking and answering questions of those who typically submit written reports for board meetings: the vice-president, the president, and the executive director. as always, look to ala connect for these documents, which are posted publicly. we welcome your comments, as well as your attendance at any of our open meetings. our midwinter meeting schedule is: • the week of january 13: online meeting, time and date tba • saturday, january 25, 1:30-4:30 p.m., pcc 107a • monday, january 27, 1:30-4:30 p.m., pcc 115a as always, midwinter will also hold a lita happy hour (sunday, 6-8 pm, location tba), the top tech trends panel (sunday, 10:30 a.m., pcc 204a), and our annual membership meeting, the lita town meeting (monday 8:30 a.m., pcc 120c). we look forward to seeing you, in philadelphia or virtually. make sure to check the midwinter scheduler (http://alamw14.ala.org/scheduler) for all the details, including the forthcoming happy hours location. it's the best party^h^h^h^h^h networking event at midwinter! i would be remiss if i did not mention lita's committees and igs and their midwinter meetings. many will be meeting saturday morning at 10:30 a.m. (pcc 113abc)--so you can tablehop if you like. expressing interest at midwinter is a great way to get involved. can't make it to philadelphia? no problem! fill out the online form to volunteer for a committee, or check out the connect groups of our interest groups.
some of the igs meet virtually before midwinter; some committees and igs also invite virtual participation at midwinter itself. join us! cumulating the supplements to the seventh edition of lc subject headings roy b. torkington: central library and documentation branch, international labour office. at the time of writing, the author was head, library systems department, university of california, san diego. the views expressed in this document may not be considered those of the international labour office. a description is presented of the project of the university of california library automation program to cumulate the 1966 through 1971 supplements to the library of congress subject headings. the university of california institute of library research marc processing software, bibcon, was used, with specially written programs. the resulting cumulation was edited, printed in book form, and made available to libraries. the final task involved merging six marc files into one file of over 125,000 records and then printing that file in a format similar to that of lc subject headings. the project was a cooperative effort with participation by people from several uc campuses. introduction the seventh edition of subject headings used in the dictionary catalogs of the library of congress was published with a cutoff date of june 30, 1964. the first supplement covered additions and changes from july 1, 1964 through december 31, 1965. subsequent supplements were issued annually, with each annual cumulating quarterly over a one-year period (1, 2). by 1972, when the supplement for 1971 was issued, it was necessary, when assigning or verifying subject headings, to use seven supplements (1964/65, 1966, 1967, 1968, 1969, 1970, 1971) in addition to the seventh edition. the supplement cumulation task of the university of california library automation program aimed at alleviating that problem by merging the 1966 through 1971 annual supplements into one cumulation. through the courtesy of the library of congress we were able to obtain unedited magnetic tape files of all supplements except the 1964/65; these files were in the lc internal format. the university of california library automation program undertakes cooperative automation programs which will benefit uc libraries. fig. 1. preliminary cumulation production and editing. fig. 2. annual supplement cumulation cycle.
the available software was bibcon, a group of marc processing programs developed and used by the uc institute of library research, berkeley, and uc santa cruz, for production of the 1963-1967 uc union catalog supplement and the uc santa cruz book catalog.8 the bibcon programs to be used were: sked, a sort key creator and editor; biblist and bibprint, record, column, and page formatters; and fix, a record corrector. essentially we considered the problem of cumulating the supplements as one of producing a book catalog by cumulating several annual catalogs. thus we needed a program to convert the lc internal format to uc marc (supcon) and a merge program ( supmrg) (see figures 1 and 2) . merge transactions a special merge program was necessary, because merging the supplement files is more complex than merely interfiling entries. frequent matches or <;0 < m [ai nu~bt• ui hor oui _10 nc . \. ii ill i u u 01 ' 110 ,,_; ,. 01 u n" " "'"""•"""" '"'" " ~~"~!"".~_'~~-~~ · ko l nn• ll k """" " kh 011 " "' '""""" " "'" fig. 2. circulation card. a mark-sense reproducer prepares the cards for the computer. this reproducer had been acquired for other college computer functions and the library was able to make use of it (2). under this plan the books are charged out by having the borrower write in his identification number, which serves as the borrower's number, and his name in the appropriate box on the ibm circulation card ( 3). the student assistant at the desk mark-senses the book card with the identification number; this is the one manual operation, but it has presented no problem. the marked circulation cards are sent three times a week to the data center, where the mark-senses are read and punched and the due date is gang-punched in. the 1401 computer generates a second circulation card, duplicating the accession number, call number, author and title. old and new circulation cards are machine filed together by accession number and returned to the circulation file, which is arranged by date and accession number. it was found that the accession number is easier to read than the library of congress number and is the truly unique number. a printed circulation listing, arranged by call number to facilitate use, is kept at the charge desk; it shows accession number, author and title, borrower's name, identification number and due date. it is also possible to prepare a daily circulation report by student identification number and name if required. the entire circulation is sent to the data center weekly to produce a cumulative print-out of all books in circulation. these print-outs provide daily and weekly totals of all outstanding circulated books. no data processing equipment is required for reserve circulation. charging out of books on reserve continues to be done by having the borrower write his identification and name on a blue reserve card to be kept at the desk. library mechanization at auburn/hilbert 17 when a book is returned, the pair of circulation cards are selected from the circulation file. the used charge card, which contains the borrower's identification number and due date, is marked "cancelled" with a rubber stamp. the new circulation card is inserted in the pocket of the book and the book is reshelved. cancelled circulation cards are kept and sorted later to provide statistical analyses by date and class number for each semester. this system was developed because it was felt a small library could not justify expensive charging machinery. 
acquisitions and shelf list once the reclassification operation was organized it was possible to set up automation procedures for processing current acquisitions. an ibm card was designed as a book request card (figure 3) to be filled in by staff or faculty member. information on it includes author's name, title, publisher, price, purchase order number, academic department, and requestor's name. at the data center the foregoing information is keypunched into the card, which then becomes a purchase order card. the purchase order number identifies the vendor and is gang-punched into the cards. a computer print-out produced from the purchase order cards is mailed directly to the dealer as a book order. order cards are kept in an "on order" file by dealer or purchase order number and then by author until the books are received. fig. 3. book request card. when the book and its library of congress cards have been received, the corresponding order cards are pulled and the following additional information is added to the purchase order card: actual cost (taken from the invoice), accession number (stamped on), and the library of congress call number (taken from the library of congress printed card). figure 4 is a flowchart of the acquisitions procedure. the books are then processed in the same manner as was used for reclassifying (1). fig. 4. acquisitions flow chart. order cards are sent to the data center to reproduce the shelf list cards, automatically transferring the pertinent information already punched in the order card and keypunching the additional information into the shelf list cards (figure 5). currently, provision is being made for inclusion of the library of congress card order number in the shelf list card to enable easy subsequent selection of the corresponding marc records. fig. 5. new books listing procedure. the shelf list cards are used to produce the new books list (figure 5). the shelf list is kept in the ibm card form, and a book catalog could easily be made if so desired. to compile a bibliography it is only necessary to take the punched cards from the shelf list in the wanted classification. the library's subject catalog and the library of congress subject headings are checked to determine the class numbers to be used. as depicted in the flowchart in figure 5, these cards are put through the computer to produce the print-out, and then returned to the punched shelf list file. this system was designed to produce a bibliographical record of the books in the library and to automate the technical processing of the books in as simple a method as possible so as not to defeat the purpose of automating.
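the bibliography run described above amounts to selecting shelf-list records whose class numbers fall in the wanted classification and printing them in order; the following minimal python sketch shows the idea, with field names, class prefixes, and sample records invented purely for illustration.

    # illustrative sketch: compiling a subject bibliography by pulling shelf-list
    # records in the wanted classification (all data and fields are assumed).
    shelf_list = [
        {"call_no": "QA76.5 .S6", "author": "smith, j.", "title": "computing for libraries"},
        {"call_no": "Z678.9 .A9", "author": "adams, h.", "title": "automation in practice"},
        {"call_no": "QA76.9 .D3", "author": "doe, r.",   "title": "data files"},
    ]

    def bibliography(shelf_list, class_prefixes):
        # return shelf-list records whose call number begins with a wanted class
        wanted = [rec for rec in shelf_list
                  if any(rec["call_no"].startswith(p) for p in class_prefixes)]
        return sorted(wanted, key=lambda rec: rec["call_no"])

    # e.g. a bibliography on data processing (class QA76 in this made-up sample)
    for rec in bibliography(shelf_list, ["QA76"]):
        print(f'{rec["call_no"]:<12} {rec["author"]}  {rec["title"]}')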
accounting (figure 6) the accounting system was designed to use the book request card after it has had department and cost punched into it. after the books are processed, accumulated request cards are sent periodically to the data center, where computer print-out is produced by department, listing the books purchased and the cost of each, with a summary showing all expenditures. copies are sent to department chairmen to keep them informed of their expenditures. these order cards are kept for a semester, then returned to the individual requesting faculty members after a cumulative accounting record has been made. by this means it is possible to keep track of each department's book budget and the library's total book budget, with the computer doing all the work. fig. 6. accounting procedure. overdues (figure 7) overdue notices are machine prepared from overdue circulation cards which are selected periodically from the charge-out file. the cards are passed through the computer, which generates second and third overdue cards to be used for discharging purposes. fig. 7. overdue procedure. gummed address labels that include the student identification number are produced using the college log of names and addresses. the appropriate label is applied to the reverse side of the circulation card using the i.d. as a guide. each notice card is stamped "overdue book, please return as soon as possible," then sent through the postage meter and placed directly in the mail. if several overdues are sent to the same person, the cards are mailed in an envelope, using the gummed label. the second and third notice cards are filed at the circulation desk until needed or until the book is returned. there is another file for borrowers who are seriously delinquent in returning their books. cards that have accumulated in this overdue file are processed as follows to generate further overdue notices: an overdue notice is sent to the borrower, the dean's letter to the borrower or to his parents, and the list of names to the dean and the student personnel office. at the end of each semester a list is prepared indicating all books held by individual faculty members for more than three months and the latter are notified. the time-consuming operation of preparing overdues has been considerably reduced (4). serials serial holdings have been converted to machine readable punched cards. the state university of new york, under the direction of dr. irwin pizer of the upstate medical center at syracuse, has recently published a union list containing the titles of all periodicals received in all units of the state university (5). it includes the serial holdings of auburn community college (approximately 400 titles), and punched cards for these holdings have been adapted for the library's own use. information on the card comprises title, inclusive dates, years on microfilm, department for which the periodical was ordered and the indexes in which the periodical is listed. each new serial title added to the holdings is keypunched with this information. the punched cards are used to print out an alphabetically arranged title listing and a departmental listing. adding or withdrawing titles is a simple matter, and up-to-date lists of periodical holdings are easily produced by the computer.
copies of the lists are sent to each faculty member and several copies are available at the desk and in the periodical room. costs since library use of the data center was considered to be similar to other college uses (e.g., that of the business office), the cost of library automation was absorbed by the data center and not charged to the library. an estimate of the cost, including rental time on the computer (about three hours per week), supplies, and data center staff time, is about $1500.00 a year for ongoing programs. conclusion the automated systems herein described have now been completely operational for over a year. converting data for a computer operation spotlighted inaccurate recording of information and afforded a good opportunity for correcting previous errors. periodically, progress and results have been reviewed and changes made, as will continue to be the case. the automated circulation system is providing the library with rapid, accurate, and efficient circulation control not possible for a manual system. ease and speed of performing routine library operations by the use of automation more than compensates for the cost of data processing. automated technical procedures provide for faster and more efficient processing of books, production of the library's monthly new books list (which previously took hours to type) and subject bibliographies. other important results of the mechanization project are the serial listings and departmental accounts, all of which make possible better library service. acknowledgments the programming was done in autocoder by, or under the supervision of, mr. richard klinger, chairman of the data processing department at auburn community college; to him is due most of the credit for the mechanization of the library. the library is grateful to mr. klinger for his encouragement and enthusiastic support and his willingness to assume the technical responsibilities of programming and systems design. references 1. international business machines: mechanized library procedures. (white plains, n. y.: ibm, n. d.). 2. international business machines: library processing for the albuquerque public school system (white plains, n. y.: ibm, n. d.). 3. dejarnett, l. r.: "library circulation." in international business machines corporation: ibm library mechanization symposium (endicott, n. y.: 1964), pp. 78-93. 4. eyman, eleanor g.: "periodicals automation at miami-dade junior college," library resources and technical services, 10 (summer, 1966), 341-61. 5. the union list of serials in the libraries of the state university of new york. (syracuse, n.y.: state university of new york upstate medical center, 1966). marc international richard e. coward: head of research and development, the british national bibliography, london, england the cooperative development of the library of congress marc ii project and the british national bibliography marc ii project is described and presented as the forerunner of an international marc network. emphasis is placed on the necessity for a standard marc record for international exchange and for acceptance of international standards of cataloging. this paper is an examination of two major operational automation projects.
these projects, the library of congress marc ii project and the british national bibliography (bnb) marc ii project, are the result of sustained and successful anglo-american cooperation over a period of three years during which there has been continuous evaluation and change. in 1969, for a brief period, the systems developed have been stabilised, partly to give time for library systems to examine ways and means of exploiting a new type of centralised service, and partly to give the library of congress and the british national bibliography the opportunity to look outwards at other systems being developed in other countries. there has, of course, already been extensive contact and exchange of views between the agencies involved in the planning and developing of automated bibliographic systems and the possibilities of cooperation and exchange have been informally discussed at many levels. the time has now come for the national libraries and cataloguing agencies concerned to look at what has been achieved and to lay the foundation for effective cooperation in the future. the history of the anglo-american marc project began at the library 182 journal of library automation vol. 2/ 4 december, 1969 of congress with an experiment in a new way of distributing catalogue data. the traditional method of distributing library of congress bibliographic information is to provide catalogue cards or proof sheets. these techniques will undoubtedly continue indefinitely into the future, but the rapid spread of automation in libraries has created a new demand for bibliographic information in machine readable form. the original marc project ( 1) was "an experiment to test the feasability of distributing library of congress cataloguing in machine readable form". the use of the word "cataloguing" underlines the essential nature of the marc i project; its end product was a catalogue record on magnetic tape. there is a very significant difference between a catalogue record on magnetic tape and a bibliographic file in machine form. the latter does not necessarily hold anything resembling a catalogue entry, although marc ii still reflects, both in the lc implementation ( 2,3) and in the bnb implementation ( 4,5), a preoccupation with the visual organisation of a catalogue entry. fortunately retention of the cataloguing ''framework" does not hinder the utilisation of lc or bnb marc data in systems designed to hold and exploit bibliographic information, as the whole project is designed as a method for communication between systems. the essence of the marc ii project is that it is a communications system, or a common exchange language between systems wishing to exchange bibliographic information. it is highly undesirable, in fact quite impossible, to plan in terms of direct compatability between systems. machines are different, programs are different, and local objectives are different. the exchange of bibliographic information in any medium implies some level of agreement on the best way to organise and present the data being exchanged. the need to use a fairly standard type of bibliographic structure on a catalogue card is obvious enough, and over the years a form of presentation, as best exemplified by a library of congress catalogue card, has been developed which holds all the essential data and also, by means of typographical distinctions and layout, conveys the information in a visually attractive style. 
when bibliographic information is transmitted in a machine readable form the question of visual layout does not arise but the question of structure is vitally important. this structure is called the machine format and the machine format holds the data. it literally does not matter in what order the various bits and pieces that make up a catalogue record appear on a magnetic tape. what does matter very much is that the machine should be able to recognise each data element: author, title, series, subject heading, etc. in practice, either each data element must be given an identifying tag that the machine can recognise, or each data element must occupy a predetermined place in the record. in view of the unpredictable nature of bibliographic information, the former method, that of tag identification, is now widely used and is the technique adopted in the marc system. the lc and bnb marc systems are two very closely related implementations of a communications format which in its generalised form has been carefully designed to hold any type of bibliographic information. the generalised format description is now being circulated by british standards institute and united states of america standards institute. it can be very briefly described as follows: leader | directory | control field(s) | data fields. the leader is a fixed field of 24 characters, giving the record length, the file status and details of the particular implementation. the directory is a series of entries each containing a tag (which identifies a data field), the length of the data field in the record, and its starting character position. this directory is a variable field depending on the number of data elements in the record. the control fields consist of a special group of fields for holding the main control number and any subsidiary control data. the data fields are designated for holding bibliographic data. each field may be of any length and may be divided into any number of subfields. a data field may begin with special characters, called indicators, which can be used to supply additional information about the field as a whole. it can be seen that the basis of marc ii is a very flexible file structure designed to hold any type of bibliographic record. once such a level of compatability is established it is possible to prepare general file handling systems (6) which will convert any bibliographic record to a local file format. there is certainly much scope for agreement on local file formats as well, but such formats will necessarily be conditioned by the type of machine available and the use to be made of the file. the establishment of a generalised file structure is a great step forward but by itself means very little unless a wide measure of agreement can be reached on the data content of the record to be exchanged. here the responsibility for cooperation and standardisation shifts from the automation specialist to the librarian, and particularly to those national libraries and cataloguing agencies who can by their practical actions assist libraries to implement the standards prepared for the profession. in order to appreciate the real importance of standardisation, particularly in the context of the marc project, it is necessary to look a few years into the future.
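before looking ahead, it may help to make the structure just described concrete. the following python sketch assembles a record in the spirit of the generalised format, with a fixed-length leader, a directory of tag, length, and starting-position entries, and variable-length data fields; the tags, delimiter character, and leader layout are simplified assumptions made for illustration and do not reproduce the character-level marc ii specification.

    # illustrative sketch of the generalised structure: leader | directory |
    # control field(s) | data fields. delimiters and lengths are simplified
    # assumptions, not the actual marc ii specification.
    FT = "#"   # assumed field terminator for this sketch

    def build_record(control_number, fields):
        # fields: list of (tag, data) pairs; returns (leader, directory, body)
        body = control_number + FT
        directory = []
        for tag, data in fields:
            start = len(body)            # starting character position of the field
            body += data + FT
            directory.append((tag, len(data) + 1, start))
        dir_str = "".join(f"{tag}{length:04d}{start:05d}"
                          for tag, length, start in directory)
        # leader: total record length plus a status code, padded to 24 characters
        leader = f"{24 + len(dir_str) + len(body):05d}n".ljust(24)
        return leader, dir_str, body

    leader, directory, body = build_record(
        "68123456",
        [("100", "coward, r. e."), ("245", "marc international")],
    )
    print(leader, directory, body, sep="\n")

given such a directory, a program can locate any field directly without scanning the whole record, which is the point of the tag-identification approach described above.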
it is inevitable that the rapid spread of automated systems in libraries will create a demand for machine readable bibliographic records and that in turn will lead to the setting up of bibliographic data banks in machine readable form in national and local centres. these data banks will be international in scope and will contain many millions of items. in the long run the only feasible way to maintain them is for each country or group of countries to develop automated centralised cata184 journal of library automation vol. 2/4 december, 1969 loguing systems for handling their own national outputs and to receive from all other countries involved in the network machine readable records of the latter's national outputs. countries cooperating on this basis must agree on standards of cataloguing (and ultimately on standards of classification and subject indexing), so that the general data bank presents a consistently compiled set of bibliographic data. there is no doubt that national data banks will be set up. libraries today are faced simultaneously with a rapid increase in book prices, a need to maintain ever-increasing book stocks to meet the basic requirements of their readers, and a persistent shortage of trained personnel to catalogue their purchases. these trends are already well established and in the united states, where they are most advanced, the result has been the massive and highly successful shared cataloguing program. historically the shared cataloguing program will probably be seen as the first and last attempt to provide a comprehensive bibliographic service by unilateral action. a large number of countries have cooperated in this attempt but the shared cataloguing program does not rest on the principle of exchange. it is doubtful if even the united states will be able to maintain and extend this programme in its present form. the shared cataloguing program must ultimately be replaced with an international exchange system. national machine readable bibliographic systems will be established, but there is a grave danger that those agencies responsible will be primarily concerned only with the immediate problem of producing records suitable for use in their own national context or for their own national bibliography, regardless of the fact that the libraries and information centres they need to serve are acquiring ever-increasing quantities of foreign material. the exchange principle will be downgraded to an afterthought, a bv-product of the fact that an automated system is being used. if this outcome is to be avoided, international standards must be prepared and national agencies must accept them instead of only paying lip service to them. in the past librarians have tended to be more concerned with codification than standardisation, but in the field of cataloguing at least a great breakthrough was made sixteen years ago when seymour lubetzky produced his "cataloguing rules and principles; a critique of the a.l.a. rules for entry and a proposed design for their revision" ( 7). the work of lubetsky led to the "paris principles" ( 8) published by ifla in 1963 and in due course to the preparation of the "anglo-american cataloguing rules" 1967 ( 9) . these rules, though unfortunately departing from lubetzky's principles in one or two areas provide a solid basis for standardisation. we are fortunate to have them available at such a critical moment in the history of librarianship. they must form the basis of an international marc project. 
of all the great libraries of the world, the library of congress has done more than any other to promote international cataloguing standards. it is now in a uniquely favourable position to promote these standards marc international/coward 185 through its own marc ii project. the lc marc ii project, together with the bnb marc ii project, can provide the foundation of the international marc network. these projects alone cover the total field of english language material and yet already the basic requirement of standardisation is absent. the library of congress finds itself unable, for administrative reasons, to adopt fully the code of rules it worked so hard to produce and which british librarians virtually accepted as it stood in the interests of international standardisation. that a great library should be in this position is understandable. what is less understandable is that the library of congress should transfer the non-standard cataloguing rules established by an internal administrative decision to prescription of cataloguing data in the machine readable record that it is now issuing on an international basis. one of the great advantages of machine readable records is that they can simultaneously be both standard and non-standard. there is no reason that the library of congress, or any national agency, should not provide for international exchange a standard marc record together with any local information the library might want. if as a result other national agencies are encouraged to do the same, it will not be long before the absurdity and expense of examining each record received via the international network in order to change a standard heading to a local variant, will become apparent. the british national bibliography has already accepted the anglo-american code and by this action has now done much to promote its acceptance in great britain. incomplete acceptance of the code is really the only significant difference between the two marc projects. at a detailed level there are differences in some of the subfield codes. these are chiefly due to the fact that the british marc committee was particularly concerned with the problems of filing bibliographic entries, and as no generally accepted filing code exists it was decided to provide a complete analysis of the fields in headings. this analysis will enable the bnb marc data base to be arranged in different sequences to test the rules now being prepared. the other difference, or extension, in the british marc format is the provision of cross references with each entry, on the assumption that in a marc system a total pack of cataloguing data should be provided. however these differences reflect the experimental nature of the british project, not the fundamental differences in opinion. in this paper an attempt has been made to look at the british and american marc projects not as systems for distributing bibliographic information but as the forerunners of an international bibliographic network. intensive efforts have been made to lay a foundation for this international network. the anglo-american code provides a sound cataloguing base, the generalised communications format provides a machine base, and the standard book numbering system provides an international identification 186 journal of library automation vol. 2/ 4 december, 1969 system. these developments are all part of a general move towards real cooperation in the provision of bibliographic services. they must now be brought together in an international marc network. 
references
1. avram, henriette d.: the marc pilot project (washington: library of congress, 1968).
2. u.s. library of congress. information systems office. the marc ii format: a communications format for bibliographic data. prepared by henriette d. avram, john f. knapp and lucia j. rather (washington, d.c.: 1968).
3. "preliminary guidelines for the library of congress, national library of medicine, and national agricultural library implementation of the proposed american standard for a format for bibliographic information interchange on magnetic tape as applied to records representing monographic materials in textual printed form (books)," journal of library automation 2 (june 1969): 68-83.
4. bnb marc documentation service publications, nos. 1 and 2 (london: council of the british national bibliography, ltd., 1968).
5. coward, r. e.: "the united kingdom marc record service," in cox, nigel s. j.; grose, michael w.: organization and handling of bibliographic records by computer (hamden, conn.: archon books, 1967).
6. cox, nigel s. m.; dews, j. d.: "the newcastle file handling system," in op. cit. (note 4).
7. lubetzky, seymour: code of cataloging rules ... prepared for the catalog code revision committee ... with an explanatory commentary by paul dunkin (chicago: american library association, 1960).
8. international federation of library associations. international conference on cataloguing principles, paris, 9th-18th october, 1961: report; edited by a. h. chaplin.
9. anglo-american cataloging rules. british text (london: library association, 1967).

editorial board thoughts: requiring and demonstrating technical skills for library employment
emily morton-owens (egmowens@upenn.edu), a member of the ital editorial board, is director of digital library development and systems, university of pennsylvania libraries, philadelphia, pennsylvania.
information technology and libraries | september 2016

recently i've been involved in a number of conversations about technical skills for library jobs, sparked by an ital article by monica maceli1 and a code4lib presentation by jennie rose halperin.2 maceli performed a text analysis of job postings on code4lib to reveal what skills are co-occurring and most frequent. halperin problematized the expense of the mls credential in comparison to the qualifications actually required by library technology jobs and the salaries offered for technical versus nontechnical work. this work has inspired many conversations about the shift in skills required for library work, the value placed on different kinds of labor, and how mls programs can teach library technology. during a period of hiring at my institution and through teaching a library school course in which many of the students are on the brink of graduation, my attention has been called particularly to one point in the library employment process: job postings. these advertisements are the first step in matching aspiring library staff with the real-life needs of libraries—where the rubber meets the road between employer expectations and new-grad experience. most libraries already use the practice of distinguishing between required and preferred qualifications, which is a good start, especially for technology jobs where candidates may offer strong learning proficiency yet lack a few particular tools. although there have been conflicting interpretations of the hewlett-packard research suggesting that men are more likely than women to apply to jobs when they don't meet all the requirements,3 i observe a general tendency among graduating students to err on the side of caution because they're not sure which qualifications they can claim.
among my students, for example, constant confusion attends the years of experience required. is this library experience? general job experience? experience at the same type of library? paid or unpaid? postings are often ambiguous and students may choose to apply or not. similarly, there are questions about what extent of experience qualifies someone to know a technology: mastering it through creating new projects at a paid job, experience maintaining it, or merely basic familiarity? not knowing who has been hired, and on the basis of what kind of experience, is a gap for researchers trying to close the loop on job advertisements. even when a job posting has avoided an overlong list of required technical skills, it might still be expressing a narrow sense of what's required to qualify. someone who understands subversion will be capable of understanding git, so we see plenty of job advertisements that ask for experience with "a version control system (e.g. git, subversion, or mercurial)." i recently polled staff in our department and found very few of us with bachelor's degrees in technical subjects. more of us had come to working in library technology through work experience or graduate programs. and yet, our job postings contained long statements that conflated education and experience, such as "bachelor's degree in computer science, information science, or other relevant field and at least 3 years of experience application development in object oriented and scripting languages or equivalent combination of education and experience. master's desirable." i edited our statement to more clearly allow a combination of factors that would show sufficient preparation: "bachelor's degree and a minimum of 3-5 years of experience, or an equivalent combination of education and experience, are required; a master's degree is preferred," followed by a separate description of technical skills needed. this increased the number and quality of our applications, so i'll remain on the lookout for opportunities to represent what we want to require more faithfully and with an open mind. meanwhile, on the other side of the table, students and recent grads are uncertain how to demonstrate their skills. first, they're wondering how to show clearly enough that they meet requirements like "three years of work experience" or "experience with user testing" so that their application is seriously considered. second, they ask about possibilities to formalize skills. recently, i've gotten questions about a certificate program in ux and whether there is any formal certification to be a systems librarian. surveying the past experience of my own network—with very diverse paths into technology jobs ranging from undergraduate or second master's degrees to learning scripting as a technical services librarian to pre-mls work experience—doesn't suggest any standard method for substantiating technical knowledge. once again, the truth of the situation may be that libraries will welcome a broad range of possible experience, but the postings don't necessarily signal that.
some advice from the tech industry about how to be more inviting to candidates applies to libraries too; for example, avoiding "rockstar"/"ninja" descriptions, emphasizing the problem space over years of experience,4 and designing interview processes that encourage discussion rather than "gotcha" technical tasks. at penn libraries, for example, we've been asking developer candidates to spend a few hours at most on a take-home coding assignment, rather than doing whiteboard coding on the spot. this gives us concrete code to discuss in a far more realistic and relaxed context. while it may be helpful to express requirements better to encourage applicants to see more clearly whether they should respond to a posting, this is a small part of the question of preparing new mls grads for library technology jobs. the new grads who are seeking guidance on substantiating their skills are the ones who are confident they possess them. others have a sense that they should increase their comfort with technology but are not sure how to do it, especially when they've just completed a whole new degree and may not have the time or resources to pursue additional training. even if we make efforts to narrow the gap between employers and jobseekers, much remains to be discussed regarding the challenge of readying students with different interests and preparation for library employment. library school provides a relatively brief window to instill in students the fundamentals and values of the profession, and it can't be repurposed as a coding academy. there persists a need to discuss how to help students interested in technology learn and demonstrate competencies rather than teaching them rapidly shifting specific technologies.

references
1. monica maceli, "what technology skills do developers need? a text analysis of job listings in library and information science (lis) from jobs.code4lib.org," information technology and libraries 34, no. 3 (2015): 8-21, doi:10.6017/ital./v23i3.5893.
2. jennie rose halperin, "our $50,000 problem: why library school?" code{4}lib, http://code4lib.org/conference/2015/halperin.
3. tara sophia mohr, "why women don't apply for jobs unless they're 100% qualified," harvard business review, august 25, 2014, https://hbr.org/2014/08/why-women-dont-apply-for-jobs-unless-theyre-100-qualified.
4. erin kissane, "job listings that don't alienate," https://storify.com/kissane/job-listings-that-don-t-alienate.

circulation systems past and present*
maurice j. freedman: school of library service, columbia university, new york city.
a review of the development of circulation systems shows two areas of change. the librarian's perception of circulation control has shifted from a broad service orientation to a narrow record-keeping approach and recently back again. the technological development of circulation systems has evolved from manual systems to the online systems of today. the trade-offs and deficiencies of earlier systems in relation to the comprehensive services made possible by the online computer are detailed.
*this article is adapted from a speech delivered at rutgers university. manuscript received november 1980; revised may 1981; accepted july 1981.
in her 1975 library technology reports study of automated circulation control systems, barbara markuson contrasted what she called "older" and "more recent" views of the circulation function. the "older" or traditional view was that circulation control centered on conservation of the collection and recordkeeping. the "more recent" attitude encompasses "all activities related to the use of library materials."1 it appears that this latter outlook is not as new as markuson had suggested. in 1927, jennie m. flexner's circulation work in public libraries described the work of circulation as the "activity of the library which through personal contact and a system of records supplies the reader with the [materials] wanted."2 flexner went on to characterize four major functions of circulation as follows: (1) the staff must know the books in the collection, and have a working familiarity with them. (2) the staff must know the readers; their wants, interests, etc. (3) the circulation staff must fully understand the library mission and policies and work harmoniously with those in related departments. (4) the circulation department has its own particular duty to perform ... effective routines and techniques must be established by the library and mastered by the staff if the distribution of books is to be properly accomplished and the public is to have the fullest use of the resources of the institution. the library must be able to locate books, on the shelves or in circulation; to know who is using material and how the reader can be traced, if he is misusing or unduly withholding the books drawn.3 the function of circulation has not changed since flexner's description. even within the context of online circulation systems, it is absolutely essential that the circulation system be seen in as broad a context as possible. it is not merely an electromechanical phenomenon staffed by automaton-clerks. circulation services involve that function which is ultimately one of the most fundamental: the satisfactory bringing together of the library user and the materials sought by that person. it follows, then, that the mechanism and means of delivery and control of the service are only a small part, and certainly not the most important part, of the circulation function. knowing your collection, your readers, and clearly knowing your library's mission are crucial prerequisites for the effective circulation of library materials. an examination of the history of circulation systems and their evolution to the present state reveals the change in outlook from a narrow view of the circulation function to a broader view. let us begin by establishing the basic elements of record keeping, upon which circulation control is based. there are three categories of records: 1. for the collection of materials, books, tapes, microforms, etc., comprising the library. 2. for the readers or users of the library service. 3. for the wedding or concatenation of the first two, i.e., the library user's use or borrowing of the library's materials. a minimal circulation model is a set of procedures or recordkeeping with respect to only the third category, i.e., records of the materials held by the library user outside of the library. a total or complete system would then be one that provides for all three categories.
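as a rough illustration of these three categories, the python sketch below (with invented call numbers and names) keeps a collection file, a borrower file, and a loan file; a minimal system keeps only the loans, while a total system can also join a loan back to the other two files.

# a minimal sketch, in python with invented data, of the three categories of records
# just described. a "minimal" circulation model keeps only the loan records; a "total"
# system also keeps the collection file and the borrower file and can join them.

collection = {                                  # 1. records for the materials themselves
    "629.8 f54": "flexner, circulation work in public libraries",
    "025.3 m37": "markuson, automated circulation control",
}

borrowers = {                                   # 2. records for the readers
    101: {"name": "a. reader", "address": "12 main st."},
}

loans = [                                       # 3. the concatenation of the first two
    {"call_no": "629.8 f54", "borrower": 101, "date_charged": "1981-11-02"},
]

def is_charged(call_no):
    """a minimal system can answer little more than this."""
    return any(loan["call_no"] == call_no for loan in loans)

def who_has(call_no):
    """a total system can also join a loan back to the borrower file."""
    for loan in loans:
        if loan["call_no"] == call_no:
            return borrowers[loan["borrower"]]
    return None

print(is_charged("629.8 f54"), who_has("629.8 f54"))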
using these criteria to judge the level of control provided by the various circulation systems of the past, let us review. the earliest method of circulation control was the chain method. in this case, "circulation" is not an accurate term; "use" of materials is more appropriate, as the collection did not circulate. books were chained to the wall and the user did not take the material outside of the library. the minimal circulation model is not met, and records were not required. several hundred years later, the ledger system's first iteration involved a simple notation into a ledger. the identification of the book (call number and/or author and title) and the borrower's identification were recorded. upon the return of the book, the borrower or the receiving clerk initialed the ledger entry or otherwise indicated the return of the item. minimal circulation control is met. a more developed or sophisticated ledger system exceeded this minimal circulation model. the new ledger had each page headed by a different borrower or registration number. consequently, a given user had all of his or her charges recorded on the given page indicated by the user's number. the economy of not having to write the borrower's name for every transaction was made possible through the creation of a file of patron records linked to the ledger page by common registration numbers. in effect, this was our first "automation." the use of a master file in support of a numbered page provided information that had previously been handwritten every time someone wished to borrow books from the library. the new ledger system also allowed for a more orderly control of charges. only the borrower's number was needed to get at the page of transactions relating to that borrower, as opposed to the former method (a benchmark method, in a sense) in which the transactions were chronologically entered and had no other ordering whatsoever. even with the improved ledger system, though, the only ordering was by borrower number and date of issue to the borrower. there was no arrangement that provided for sequencing or finding the books borrowed. the need to identify borrowed books led to the dummy system. every book had a concomitant dummy book (or large card) that had a ruled sheet of paper with the book identification information on it and the borrower's name and/or number. when a user wished to borrow a book, the dummy was pulled from a file and the borrower information was written on the sheet of paper. the dummy was then filed on the shelf occupying the space formerly occupied by the book itself. when the book was returned, it was reshelved, the dummy removed, and the circulation transaction was crossed out. this system is interesting in that it provides for a complete inventory control. either all items are on the shelf in proper sequence or a physical surrogate or record for circulating items is substituted and placed in proper sequence. one has instant and, in effect, "online" access to the presence or absence of materials if one has the call number and can go to the shelf. unlike most systems that can only tell whether or not the book is present, the dummy system tells who has the book and when it was charged. in terms of a minimal model, this system provided less and more than the ledger system. if a reader wanted a list of books he or she borrowed, the reader would have to view every dummy and see if the listed item was charged to him or her.
in contrast, the ledger system served such a request well, though every page of the ledger might have to be examined to find out who had borrowed a book not found on the shelf. leaping past several systems, let us now discuss the newark system, the overwhelmingly prevalent system in the united states today (if we include the mechanical or electromechanical versions of dickman, gaylord (the manual, not automated), and demeo). the newark system incorporated the best features of the systems already mentioned. a separate registration file was kept which provided both alphabetic access by patron and numeric access by patron registration number. consequently, the recording of the borrower's identification during circulation transactions only involved the notation of the number. for book identification, a card and matching pocket were placed in each book with the call number and/or author-title identification information. the circulation transaction involved the removal of the card from the pocket and the entering on it, as in the dummy system, of the date of the transaction and the borrower number. the cards for all of the books borrowed on a given day were aggregated and filed in shelflist sequence in a tray headed by the date of the transactions. resorting to computer jargon, the major or primary sort of the book cards (read circulation cards) was by date, but the minor sort was by call number. consequently, if one wanted to know the status of a given book and one had the call number, it would not take too long to search, even with a file as large as the one in the main branch of newark public library, by looking for the item in all of the different days' charges. when a book was returned, the clerk noted from the date-of-issue card inserted in the book's pocket the tray in which to search, and the matching call number on the pocket was used for discharging the book, i.e., removing the charge card from the tray and replacing it in the book. the combination of the books on the shelf plus the cards in the different trays in shelflist order constituted a complete inventory. additionally, the trays of cards comprised a comprehensive record of all current charges, i.e., all transactions by date, call number, and borrower, with borrower number pointing to fuller information in the registration file. looking back at our basic model, the newark system offered not just the minimum (a record of the item and the borrower who took it) but also introduced a major step toward inventory control. there was an inventory sequence involved, or, more accurately, several inventory sequences, one for each given collection (or day) of circulation transactions. what was still missing was a record by borrower of what was charged to him or her. in the original newark system, the borrower's card had entered upon it dates of issue and return of items. this way, even if the library could not tell the user what items (s)he had, the user's card would reflect the number of items outstanding. the handling of reserves, renewals, and overdue notices occurred as follows: a colored clip or some indicator on a circulation card would be used to indicate a reserve. a renewal would be handled the same as a return except the person would wait while the charge card was pulled from the appropriately dated tray, and assuming that no reserves had been placed on the circulation card, the book would be recharged (i.e., renewed) to the borrower. overdues automatically presented themselves by default.
cards left in a tray after a predetermined number of days represented charges for which overdues were to be sent. the tray was taken to the registration file and the numerically sequenced registration cards for the delinquent borrowers removed so that notices could be prepared and sent. then the registration slips and circulation cards had to be refiled at the completion of the process. essentially, most subsequent systems are variants on the newark system. the mcbee key-sort system involves the use of cards with prepunched holes around the edges, one of which can be notched to indicate the date an item is due. the cards are arranged by call number creating a single sequence. the insertion of a knitting-needle-like device through a given hole will allow all of the books overdue for a given date to fall free of the deck. this system is like the newark system in that it has inventory and date access, but unlike newark it places a horrible burden on the borrower. each card has (written by the borrower) the borrower's name and address and the call number, author, and title of the book. thus, the library is saved the labor of creating circulation cards and maintaining registration records for every patron: all of the information needed is on the charge card. but here, as marvin scilken has pointed out, the burden of the library's tasks is merely passed on to the users. this point should be emphasized. the next system to be considered is the photo-charge system. microphotos are taken of the borrower's card, which has the name and address on it, the book card (as in the newark book identification card), and a sequentially numbered date-of-issue or date-due slip. again, as with the mcbee, since the photo record includes the borrower's name and address, one can throw away registration files. also, a list or range of transaction numbers is kept by date used. since the numbered date-of-issue slip is placed in the book at the time of charging, and one removes it when the book is returned, it is a simple step to cross off or remove the number on the slip from its corresponding duplicate on the list of numbers for that day's transactions. overdue transactions are found by searching for unchecked transaction numbers on the numerically sequenced microfilm. this system does meet the criterion of the minimal model, a record of the user's use of the item. in terms of labor intensity, one has eliminated the maintenance of charge-card files and registration files by a single microfilm record. reserves, though, are terribly time-consuming with the photo-charge system: each returned book, before it can be returned to the shelf or renewed, must be searched against a call-numbered sequence of reserve cards. academic libraries would not use this kind of system because call-number access is a necessity, especially in relation to recalls of long-loaned items. the elimination of paper files is what so commended this system to public libraries over the newark-based systems. but, as was noted, one has virtually no way of determining who took a book out or when it is due back except, in principle, by searching all of the reels of microfilm. some variants on this microfilm system were developed. bro-dart marketed a system that thermographically produced eye-readable records instead of microimages. such was the state of circulation systems before computers began to be used.
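before turning to the computer-based systems, the newark filing logic described above can be restated in a few lines of python. this is only an illustrative sketch with invented data and an assumed loan period, not a description of any actual installation: charge cards are grouped by date of transaction (the primary sort), kept in call number order within each day's tray (the minor sort), and the overdues are simply the cards still sitting in a tray after the loan period has elapsed.

# a rough sketch, in python with invented data, of the newark filing logic described above.
from datetime import date, timedelta

loan_period = timedelta(days=28)                # assumed loan period, for illustration

trays = {                                       # one tray of (call_no, borrower_no) per charge date
    date(1981, 10, 1): [("025.3 m37", 102), ("629.8 f54", 101)],
    date(1981, 11, 2): [("973.7 l63", 103)],
}

def find_charge(call_no):
    """scan each day's tray, in call number order, for a charged book."""
    for charge_date in sorted(trays):
        for cn, borrower in sorted(trays[charge_date]):
            if cn == call_no:
                return charge_date, borrower
    return None

def overdue_trays(today):
    """the trays whose cards have been left past the loan period hold the overdues."""
    return {d: cards for d, cards in trays.items() if today - d > loan_period}

print(find_charge("973.7 l63"))
print(overdue_trays(date(1981, 11, 16)))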
the discussion of the involvement of computers that follows can be separated by the type of hardware: main frames, minicomputers, and microcomputers. the main-frame computer has been used primarily in the past as a processing unit for batches of circulation transactions collected and fed to it via punched cards, terminals, or minicomputers. call number, author and title (albeit brief), and user identification number were captured for each transaction. in the 1960s and into the early 1970s, this information would be batch-processed by the computer and a variety of reports would be produced. what the computer does, then, is keep track of numbers, their ranges, and the dates of the ranges. but the computer can do much more than this. it is capable, as none of the nonautomated systems were, of rearranging the data input and then comparing and tabulating them as desired and appropriate. consequently, the fact that the call number, author, and title are stored by the machine means that lists or files can be arranged by any of these elements. the same goes for date of transaction. as to borrower identification number, a master file much like the newark registration file is kept (only now in its machine-readable form), and the computer does the comparing at high speed instead of the clerk taking the charge record and going to the numeric file to find the name and address of the borrower. of course, the computer can then readily and quickly print out overdue notices with an obvious absence of clerical support and labor intensity. as we all know, the rate of increase of labor costs is increasing, and the rate of increase of computer costs is decreasing. two kinds of large batch-oriented computer systems have been used: one that kept track of items in circulation only (the absence system, in which only items absent from the collection were tracked), and one that kept track of the entire collection (the inventory system).4 normally, identification numbers were used for patrons in either system. although relatively rare in academic and public libraries, the mainframe-based online system is also in use. ohio state university is famous for its online system. what is meant here is that all transactions are immediately recorded and all files are instantly updated. printing is still necessary for overdue notices, but printed circulation lists are not necessary because of the online answers to queries regarding books or patrons now possible through terminals distributed to appropriate locations. the minicomputers came on the scene in two stages. clsi's entrance in 1973 utilized one of the early minicomputers, quite small by today's standards. for relatively small libraries that had not begun to dream of having their own computers, it became possible to have an entire inventory (in abbreviated form) and an entire patron file online. consequently, all of the access power of the newark system, and none of its labor intensity, was available online and much more besides. few libraries could afford the main-frame system of ohio state, but many could pay for clsi's, and indeed they did. in the last few years, minicomputers have grown several magnitudes above the capacity and speed of main-frame computers of the 1960s.
consequently, such firms as dataphase, systems control, geac, gaylord, and others offer these larger minis, which can now support online the needs of large branch systems with inventories of hundreds of thousands of books. incidentally, clsi, with a new mini line, can do this now as well. both the mini- and maxi-based systems do all of the basic work originally outlined: the whole inventory can be accessed online or with printed lists arranged by author, title, or call number (and, presently, some vendors offer online subject access and cross-references); access can also be made by patron's name. further, the basic transaction (item, borrower, and date) is recorded and checked for holds or delinquency before it is accepted. without overly extolling the present state of the art, it should be said that all of the information identified as important in the earliest systems is now not only available in a far quicker and more usable fashion, it can be manipulated by the machine in a variety of ways to meet and serve management objectives not considered practicable in the past. peter simmons showed how collection development could be aided by automatically generating purchase orders when reserves exceeded a specified acceptable level.5 all kinds of statistical data regarding collection and patron use can be generated that could not have been possible in a manual mode. while at the university of southwestern louisiana, william mcgrath was able to adjust book budget allocations in terms of collection use and undergraduate major in a most interesting fashion.6 the net result was an empirically based expenditure of book funds. now the microcomputer or microprocessor is the newly emerging phenomenon, and in many respects it is not unlike the minicomputer of the early 1970s. it is being used to perform single data-recording functions, and is also being seen as the link to the larger computer. so we have moved from chained books to microcomputers the size of a desk top. originally, a great deal of information was captured at great expense and laboriously maintained. certainly the handwritten and typed records of the newark system, although relatively comprehensive, were obtained and preserved at great cost. and, despite it all, there were real limitations of access. the succeeding mcbee and photo-charging systems appreciably cut out-of-pocket costs to the library, but either passed labor directly on to the user, or eliminated access altogether. book or patron access is virtually impossible with the photo-charging method. simply put, that system tells what is overdue, and that's all. the entry in the 1960s of the computer radically altered the ground rules. now all sequences of encoded elements are possible, and management information can be derived. important statistical data pertaining to collection use and library users can be obtained by further manipulating the data accumulated in the circulation process. it is now possible for all but the smallest and the very largest libraries to have access to and control of their materials through the current range of minicomputers on the market. jennie flexner told us that circulation had to be more than maintenance and record keeping of loan and borrower transactions. through the advances of the computer technology and its application to circulation control, we have finally seen what seems to be an optimization of the recordkeeping process and, by extension, an improvement in circulation service.
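simmons' threshold rule mentioned above can be sketched in a few lines of python; the acceptable ratio and the sample figures below are assumptions made for illustration, not his actual program.

# a minimal sketch, in python with invented data, of the kind of threshold rule simmons
# describes: when the reserve queue for a title passes an acceptable level relative to
# the copies held, flag the title for an additional purchase order.

reserves_per_copy = 4        # assumed acceptable ratio of outstanding reserves to copies

titles = [
    {"call_no": "629.8 f54", "copies": 2, "reserves": 11},
    {"call_no": "025.3 m37", "copies": 1, "reserves": 2},
]

def purchase_orders(titles, ratio=reserves_per_copy):
    """return the titles whose reserves exceed the acceptable level, with copies to order."""
    orders = []
    for t in titles:
        if t["reserves"] > ratio * t["copies"]:
            needed = -(-t["reserves"] // ratio) - t["copies"]   # ceiling division
            orders.append({"call_no": t["call_no"], "copies_to_order": needed})
    return orders

print(purchase_orders(titles))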
if instantaneous access to patron files, inventory files, and outstanding transaction files through a variety of modes and computer-developed management data does not constitute that optimization, it will have to do until the real thing comes along.
acknowledgment
the author is deeply indebted to susan e. bourgault for her editorial assistance.
references
1. barbara evans markuson, "automated circulation control," library technology reports (july and sept., 1975), p.6.
2. jennie m. flexner, circulation work in public libraries (chicago: american library assn., 1927), p.1.
3. ibid., p.2.
4. robert mcgee, "two types of design for online circulation systems," journal of library automation 5:185 (sept. 1972).
5. peter simmons, collection development and the computer (vancouver, b.c.: univ. of british columbia, 1971), 60p.
6. william e. mcgrath, "a pragmatic allocation formula for academic and public libraries with a test for its effectiveness," library resources & technical services 19:356-69 (fall 1975).
maurice j. freedman is an associate professor at the school of library service, columbia university, new york city.
dispelling five myths about e-books
james e. gall (james.gall@unco.edu) is assistant professor of educational technology at the university of northern colorado, greeley.
information technology and libraries | march 2005

some considered 2000 the year of the e-book, and due to the dot-com bust, that could have been the format's high-water mark. however, the first quarter of 2004 saw the greatest number of e-book purchases ever with more than $3 million in sales. a 2002 consumer survey found that 67 percent of respondents wanted to read e-books; 62 percent wanted access to e-books through a library. unfortunately, the large amount of information written on e-books has begun to develop myths around their use, functionality, and cost. the author suggests that these myths may interfere with the role of libraries in helping to determine the future of the medium and access to it. rather than fixate on the pros and cons of current versions of e-book technology, it is important for librarians to stay engaged and help clarify the role of digital documents in the modern library. although 2000 was unofficially proclaimed as the year of the electronic book, or e-book, due in part to the highly publicized release of a stephen king short story exclusively in electronic format, the dot-com bust would derail a number of high-profile e-book endeavors. with far less fanfare, the e-book industry has been slowly recovering. in 2004, e-books represented the fastest-growing segment of the publishing industry. during the first quarter of that year, more than four hundred thousand e-books were sold, a 46 percent increase over the previous year's numbers.1 e-books continue to gain acceptance with some readers, although their place in history is still being determined—fad? great idea too soon? wrong approach at any time? the answers partly depend on the reader's perspective. the main focus of this article is the role of e-book technologies in libraries. libraries have always served as repositories of the written word, regardless of the particular medium used to store the words. from the ancient scrolls of qumran to the hand-illuminated manuscripts of medieval europe to the familiar typeset codices of today, the library's role has been to collect, organize, and share ideas via the written word. in today's society, the written word is increasingly encountered in digital form.
writers use word processors; readers see words displayed; and researchers can scan countless collections without leaving the confines of the office. for self-proclaimed book lovers, the digital world is not necessarily an ideal one. emotional reactions are common when one imagines a world without a favorite writing pen or the musty-smelling, yellowed pages of a treasured volume from youth. one of the battle lines between the traditional bibliophile and the modern technologist is drawn over the concept of the e-book. some see this digital form of written word as an evolutionary step beyond printed texts, which have been sometimes humorously dubbed tree-books. although a good deal of attention has been generated by the initial publicity regarding newer e-book technologies, the apparent failures of most of them have begun to establish myths around the concept. abram points out that the relative success of e-books in niche areas (such as reference works) is in direct contrast with public opinion of those purchasing novels and popular literature through traditional vendors.2 crawford paraphrases lewis carroll in describing this confusion: "when you cope with online content about e-books, you can believe six impossible things before breakfast."3 incidentally, this article will attempt to dispel a mere five of the myths about e-books. the future of e-books and the critical role of libraries in this future are best served by uncovering these myths and seeking a balanced, reasoned view of their potential. a 2002 consumer survey on e-books found that 67 percent of respondents wanted to read an e-book, and 62 percent wanted that access to be from a library.4 underlying this position is the assumption that the ideas represented by the written word are of paramount importance to both writers and readers. it is also assumed that libraries will continue their critical role in collecting, organizing, and sharing information.
myth 1—e-books represent a new idea that has failed
many libraries have invested in various forms of e-book delivery with mixed results.5 sottong wisely warns of the premature adoption of e-book technology, which he dubs a false pretender as a replacement to printed texts.6 however, the last five years are but a small part of a longer history, and presumably, a still longer future. as is often the case with computer jargon, the term e-book has emerged and gained currency in a very short amount of time. however, the concept of providing written texts in an electronic format has existed for a long time, as demonstrated by bush's description of the memex.7 the gutenberg project put theory into practice by converting traditional texts into digital files as early as 1971.8 even if the e-book merely represents the latest incarnation of the concept, it does so tenuously. books in their present form have a history of hundreds of years, or thousands if their parchment and papyrus ancestors are included. this history is rich with successes and failures of technology.
for example, petroski presents an interesting historical examination of the problem of storing books when the one book–one desk model collapsed under the proliferation of available texts.9 similarly, a determination on the success or failure of e-books, or digital texts, based upon a relatively short period of time, is fraught with difficulty. rather, it is important to look at recent developments as merely a next step. the technology is clearly not ready for uncritical, widespread acceptance, but it is also deserving of more than a summary dismissal.
myth 2—e-books are easily defined
the term e-book means different things depending on the context. at the simplest, it refers to any primarily textual material that is stored digitally to be delivered via electronic display. one of the confusing aspects of defining e-books is that in the digital world, information and the media used to store, transfer, and view it are loosely coupled. an e-book in digital form can be stored on cd-rom or any number of other media and then passed on through computer networks or telephone lines. the device used to view an e-book could be a standard computer, a personal digital assistant (pda), or an e-book reader (the dedicated piece of equipment on which an e-book can be read; confusingly, also referred to as an e-book). technically, virtually any computing device with a display could be used as an e-book reader. from a practical point of view, our eyes might not tolerate reading great lengths of text on a wireless phone, and banks will not likely provide excerpts of chaucer during atm transactions. another important factor in defining e-books is the actual content. a conservative definition is that an e-book is an electronic copy or version of a printed text. this appears to be the predominant view of publishers. purists often maintain that a true e-book is one that is specifically written for that format and not available in traditional printed form.10 this was one of the categories of the short-lived (2000–2002) frankfurt e-book awards. of course, the multitude of textual materials that could be delivered via the technology exceeds these definitions. magazines, primary-source documents, online commentaries and reviews, and transcripts of audio or video presentations are just a short list of nonbook materials that are finding their way into e-book formats. one can note with some sense of irony that the technology behind the web was originally designed as a way for scientists to disseminate research reports.11 despite the web's popularity, reading research reports makes up an exceedingly small percentage of its use today. although there is a continuing effort to reach a common standard for e-books (see www.openebook.org/), the current marketplace contains numerous noncompatible formats. this noncompatibility is the result of both design and competitive tradeoffs. in the case of the former, there is a distinct philosophical difference between formats that attempt to retain the original look and navigation of the printed page (such as adobe's popular pdf files) versus those that retain the text's structure but allow variability in its presentation (as best exemplified by the free-flowing nature of texts presented as html pages). this difference can also be seen in the functionality built around the format. traditional systems provide readers with familiar book characteristics such as a table of contents, bookmarks, and margin notes, a view that could be named bibliocentric.
the alternative is one that takes more advantage of the new medium and could be labeled technocentric, and can most easily be seen in the extensive use of hyperlinking.12 the simplest use of hyperlinking provides an easy form of annotating texts and presenting related texts. on the other extreme, hyperlinks are used in the creation of nonlinear texts in which the followed links provide a unique context for building meaning on the part of the reader.13 it is interesting to note that a preliminary study of e-book features found that the most desirable features tended to reflect the functionality of traditional books and the least desirable features provided functionality not found there.14 competitive tradeoffs are a critical issue at the current point of e-book development. the current profit models of publishing entities and copyright concerns of authors seem naturally opposed to e-book formats in which texts were freely shared, duplicated, and distributed. for example, the open ebook forum is the most prominent organization devoted to the development of standards for e-book technologies. in late 2004, their web site listed seventy-six current members. although the american library association is a member, it is one of only six members representing library-oriented organizations. in comparison, thirty-five members (or 46 percent) are publishing organizations, and thirteen (or 17 percent) are technology companies.15 the number of traditional publishers versus technology companies on this list may suggest that a bibliocentric view of e-books would be more favored. this also appears to confirm one media prediction that traditional publishers would continue to dominate efforts with this new medium.16 however, the limited representation of libraries in this endeavor is troubling (despite the disclaimer of using an admittedly rough metric for measuring impact). it is clear that many industry formats attempt to limit the ability to distribute materials by keying files so that they may only be viewed on one device or a specific installed version of the reader software. this creates technological problems for entities like libraries that attempt to provide access to information for various parties. the concept of fair use of copyrighted materials has to be reexamined under an entirely new set of assumptions. another irony is that the availability of free, public-domain materials in e-book format can be viewed as negative by the publishing industry. after investing considerable time and effort in developing e-book technology, publishers would prefer that users continue purchasing new e-book material rather than spend time reading the vast library of free historical material. many of these content issues are currently being played out in courts and the marketplace, particularly with regard to digital music and video.17 although one can humorously imagine the so-called problems associated with a population obsessed with downloading and reading great literature, the precedents set by these popular media will have a direct impact on the future of digital texts. despite the labor required to scan or key entire print books into digital formats, there have been some reports of this type of piracy.18 other models for the dissemination of digital intellectual property that are not determined by traditional material concerns of supply and demand will continually be attempted.
for example, nelson predicted a hypertext-publishing scheme in which all material was available, but royalties were distributed according to actual access by end users.19 theoretically, such a system would provide a perfect balance between access and profitability. in nelson's words, "nothing will ever be misquoted or out of context, since the user can inquire as to the origins and native form of any quotation or other inclusion. royalties will be automatically paid by a user whenever he or she draws out a byte from a published document."20

myth 3—e-books and printed books are competing media

many, if not most, published articles regarding e-books follow classic plot construction; the writer must present a protagonist and an antagonist. bibliophiles cast the printed page as the hero and the e-book as the potential bane of civilization. proulx, one such author, was quoted as saying, "nobody is going to sit down and read a novel on a twitchy little screen—ever."21 technologists cast the e-book as the electronic savior of text, replacing the tired tradition of the printed word in the same way the printed word replaced oral traditions. hawkins quotes an author who claims that e-books are "a meteor striking the scholarly publication world." his own slightly more restrained view was that e-books had the potential "to be the most far-reaching change since gutenberg's invention."22 grant places this metaphorical battle at the forefront by titling an article "e-books: friend or foe?"23

before deciding which side to take, consider whether this clash of media is an appropriate metaphor. this author has introduced samples of current e-book technology in graduate classes he has taught. when presented with the technology as part of the coursework, students quickly declare their allegiances. bibliophiles most often suggest that the technology will never replace the love of curling up with a good book. the technologists ask how many pages can be stored in the device and then fantasize about the types of libraries they can carry and the various venues for reading that they will explore. however, after a few weeks of using the devices, both groups tend to move to a middle ground of practical use. at that point, the discussion turns to what materials are best left on the printed page (usually described as pleasure reading) and what would be useful in e-book format (reference works, course catalogs, how-to manuals). other instructors have reported similar patterns of use.24 at this point the observation is largely anecdotal, but it does call into question the perceived need for a decisive referendum on the value of e-books.

the issue is not whether e-books will replace the printed word. the concern of librarians and others involved in the infrastructure of the book should be on developing the proper role for e-books in a broader culture of information. unless this approach is taken, the true goal of libraries—disseminating information to the public—will suffer. the gap between bibliophile and technologist approaches can already be seen in the materials available in e-book format. the publishing industry in general treats the e-book as just another format, releasing the same titles in hardcover, book-on-tape, and e-book at the same time. on the opposite end of the spectrum, technologists have adopted various e-book formats for creating and transferring numerous reference documents.
given their preferences, it is easy to find e-book references on unix, html coding, and the like, but there is a scarcity of materials in philosophy, history, and the arts. librarians seem the most appropriate group for developing a shared understanding. publishers and e-book hardware and software manufacturers need to be concerned with the bottom line. libraries, by design, are concerned with the preservation of information and its continued dissemination long after the need to sell a particular book has passed. the hobby of creating and transferring texts to digital form is idiosyncratic and unorganized when viewed from the highest levels. libraries not only contain expertise in all areas of human endeavor, but also have strategies for categorizing and maintaining information in productive ways. in short, libraries are the best line of defense for maintaining the value of the printed page and promoting the value of digital texts.

myth 4—e-books are expensive

a common complaint about e-books is that they are expensive. on the surface, this seems clear. dedicated e-book readers seemed to bottom out at around $300, and a new bestseller in e-book format is priced about the same as the hardcover edition. add the immediate and long-term costs of rechargeable batteries and the electricity needed to power them, and the economic case against the e-book appears closed. what if we turn the same critical eye to the printed page? the manufacture and distribution of printed texts is highly developed and astounding in scale. when gutenberg succeeded in putting the christian bible in the hands of the moneyed public, he surely could not have comprehended the billions of copies that would eventually be distributed. even with the wealth of printed material at hand, one must still consider the high cost of the system.

the law of supply and demand rules books as a tangible product. the most profitable books are those that will reach the most readers. specialized texts have limited audiences and, therefore, will usually be priced higher. this produces problems for both groups. popular texts must be printed in high quantities and delivered to various outlets. unfortunately, the printed page does have maintenance costs. sellen and harper point out that the actual printing cost is insignificant compared with the cost of dealing with documents after printing. they cite one study that indicated that united states businesses spend about $1 billion per year designing and printing forms, but spend an additional $25 to $35 billion filing, storing, and retrieving them.25 books are no different; as any librarian knows, it costs money to maintain a collection and protect texts from the environment and the effects of age. in the retail arena, the competition is fiercer. books that do not sell are removed in favor of those that do. it is estimated that 10 percent of texts printed each year are turned to pulp, although, fortunately, many are recycled.26 the bbc reported that more than two million former romance novels were used in the construction of a new tollway.27 with more specialized texts, the problem is not wealth but scarcity. if a text is not profitable, it will probably go out of print, which is often synonymous with inaccessible. from the publisher's perspective, it is only cost-effective to commit to a printing when the demand is high enough.
a library is a good source of out-of-print texts, provided that it has been funded appropriately to acquire and maintain the particular works that are needed. e-books are not a panacea. other innovations, such as on-demand publishing, may be part of the answer in solving the economic issues regarding collections. however, e-books can help alleviate some of these issues. e-books are easily copied and distributed, which is a boon to the researcher and information consumer. in many cases, the goal is access to information, not possession of a book. easy copying could also benefit the author and publisher if appropriate reimbursement systems are put into place. as previously described, nelson originally envisioned his online hypertext system, xanadu, with a mechanism for royalties based on access—a supply-and-demand system for ideas, not materials.28 the systems used to manage access to digital materials continue to increase in complexity and have spawned a whole new business of digital rights management (drm).29 examples include reciprocal (www.reciprocal.com), overdrive (www.overdrive.com), and netlibrary (www.netlibrary.com). libraries are the specific target of netlibrary, which promotes an e-books-on-demand project that allows free access for short periods of time.30 the creation of a standard digital object identifier (doi) for published materials may also help online publishers and entities like libraries manage their digital collections more easily.31 online music systems, such as apple's itunes (www.itunes.com), strike a workable balance between quick and easy access to music and an economical model for reimbursing artists. e-books also have appeal for special audiences who already require assistive technologies for accessing print collections.32

having discussed the hidden costs of printed texts, it is worth examining another economic issue for e-books: a current trend in usage. despite the availability of dedicated e-book readers, the largest growth in e-book usage is surely in nondedicated devices. e-book-reading software is available for personal computers, laptops, and pdas. according to one source, microsoft had sold four million pocketpc e-book-enabled devices and had two million downloads of the ms reader for the personal computer; palm had sold approximately 20 million e-book-enabled devices; and adobe had more than 30 million acrobat readers downloaded.33 these numbers alone indicate some 24 million reader-capable pdas and 32 million reader-capable pcs, for a total of 56 million devices. although it is difficult to find data on actual use, one online bookseller reported some data on e-book use from an audience survey.34 although 88 percent had purchased books online, only 16 percent had read an e-book (11 percent using a pc, 3 percent on a handheld device, and 2 percent on both). it is presumed that in most cases this equipment was purchased for other reasons, with e-book reading being a secondary function. as such, it would be unfair to include the full cost of this equipment in any calculation of the cost of providing information in an e-book format; by the same logic, the cost of providing artificial lighting in any building where reading takes place would need to be calculated as part of the cost of the printed page. the potential user base for the e-book rises as more computers and pdas are sold, decreasing the need for special equipment. this does not mean that the dedicated e-book reader is obsolete.
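the installed-base arithmetic cited above is easy to retrace; the short python sketch below simply restates the figures quoted from the source and totals them. the variable names are illustrative and not drawn from any published dataset.

```python
# figures as cited in the text (circa 2004); variable names are illustrative only
pocketpc_devices = 4_000_000   # microsoft pocketpc e-book-enabled devices sold
palm_devices = 20_000_000      # palm e-book-enabled devices sold
ms_reader_pc = 2_000_000       # ms reader downloads for the personal computer
acrobat_pc = 30_000_000        # adobe acrobat reader downloads

reader_capable_pdas = pocketpc_devices + palm_devices     # 24,000,000
reader_capable_pcs = ms_reader_pc + acrobat_pc            # 32,000,000
total_devices = reader_capable_pdas + reader_capable_pcs  # 56,000,000

print(f"pdas: {reader_capable_pdas:,}, pcs: {reader_capable_pcs:,}, total: {total_devices:,}")
```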
by most commercial accounts, the apple newton was a failure. its bulky size and awkward interface were the subject of much ridicule. however, it did introduce the concept of the pda, and the success of the palm line of products owes much to the proof of concept provided by the newton. the makers of the portable gameboy videogame system are repositioning it for multimedia digital-content delivery and plan to pilot a flash-memory download system for various content types, including e-books.35 innovative products such as e-paper are already developed in prototype form.36 they are likely to lead to another wave of dedicated e-book readers or provide e-book-reading potential embedded in other consumer applications.

myth 5—e-books are a passing fad

it is trendy to list the failures of past media (such as radio, film, and television) to transform education despite great initial promise.37 however, all those media are still with us, having found particular niches within our culture. if the e-book is viewed as just an alternative format, comparisons with past experiences of library collections containing videotapes, record albums, and such are not appropriate.38 however, if e-books are viewed as a tool or way to access information, the questions change. instead of asking how digital formats will replace print collections, we can ask how an e-book version will extend the reach of our current collection or provide our readers with resources previously unavailable or unaffordable. when trying to locate a research article, one is generally not concerned with whether the local library has a loose copy, bound copy, microform, microfiche, or even has to resort to interlibrary loan. as long as the content is accessible and can be cited, it can be used. electronic access to journal content is becoming more common. perhaps dry journal articles do not conjure up the same romantic visions of exploring the stacks that may hinder greater acceptance of e-books.

a parallel can be drawn to the current work of film-restoration experts. the medium of film has reached an age where some of the earliest influential works no longer exist or are in a condition of rapid deterioration. according to one film site, more than half of the films made before 1950 have already been lost due to decay of existing copies.39 the work of restoration involves finding what remains of a great work in various vaults and collections. often, the only usable film is a second- or third-generation copy. from digitized copies, cleaning, color correction, and other painstaking work, a restored and—it is hoped—complete work emerges. ironically, once this laborious process is completed, a near-extinct classic is suddenly available to millions in the form of a dvd disc at a local retailer. what if the same attitude were taken with the world's collections of printed materials? jantz has described potential impacts of e-book technology on academic libraries.40 lareau conducted a study on using e-books to replace lost books at kent state university, but found that limited availability and high costs made the approach infeasible at the time.41 project gutenberg (www.gutenberg.net) and the electronic text center at the university of virginia (http://etext.lib.virginia.edu) are two examples of scholars attempting to save and share book content in electronic forms, but more efforts are needed. unfortunately, the shift to digital content has also contributed to the sheer volume of content available.
edwards has recently discussed issues in attempting to archive and preserve digital media.42 the web may be suffering from a glut of information, but the content is highly skewed toward the new and technology oriented. in a few years, we may find that nontechnology-related endeavors are no longer represented in our information landscape.

conclusion

the e-book industry is currently dominated by commercial-content providers, such as franklin, and software companies, most notably adobe, palm, and microsoft. traditional print-based publishers have also maintained continued interest in the medium. it is assumed that these publishers had the capital to weather the ups and downs of the industry better than new publishers dedicated solely to e-book delivery. although the contributions and efforts of these organizations are needed, the future of e-book content should not be left to their largesse. when the rocket e-book device was initially released, a small but loyal following of readers contributed thousands of titles to its online library. some of these titles were self-published vanity projects or brief reference documents, but many were public-domain classics, painstakingly scanned or keyed in by readers wishing to share their favorite reads. when gemstar purchased rocket, the software's ability to create non-purchased content was curtailed and the online library of free titles dismantled. apparently, both were viewed as limiting the profitability of the e-book vendor. however, gemstar recently announced that it was discontinuing its e-book reading devices, one assumes due to a lack of profitability. this can be seen as a cautionary tale for libraries, which often define success by the number of volumes available and accessed rather than units sold. committing to a technology that concurrently requires consumer success can be problematic.

bibliophile and technologist alike must take responsibility for the future of our collective information resources. the bibliophile must ensure that all aspects of human knowledge and creativity are nurtured and allowed to survive in electronic forms. the technologist must ensure that accessibility and intellectual-property rights are addressed with every technological innovation. parry provides three concrete suggestions for public libraries in response to new media demands: continue to acknowledge and respond to customer demands, revisit the library's mission statement for currency, and promote or accelerate shared agreements with other institutions to alleviate the high costs of accumulating resources.43 the proper frame of mind for these activities is suggested by levy: we make a mistake, i believe, when we fixate on particular forms and technologies, taking them in and of themselves, to be the carriers of what we want to embrace or resist. . . . it isn't a question, it needn't be a question, of books or the web, of letters or e-mail, of digital libraries or the bricks-and-mortar variety, of paper or digital technologies. . . .
these modes of operation are only in conflict when we insist that one or the other is the only way to operate.44

in the early 1930s, lomax dragged his primitive audio-recording equipment over the roads of the american south to capture the performances of numerous folk musicians.45 at the time, he certainly didn't imagine that at one point in history someone with a laptop computer, sitting in a coffee shop with wireless access, could download the performances of robert johnson from itunes. however, without his efforts, those unique voices in our history would have been lost. it is hoped that the readers of the future will be thanking the library professionals of today for preserving our print collections and enabling their access digitally via our primitive, but evolving, e-book technologies.

references

1. open e-book forum, "press release: record e-book retail sales set in q1 2004," june 4, 2004. accessed dec. 27, 2004, www.openebook.org.
2. stephen abram, "e-books: rumors of our death are greatly exaggerated," information outlook 8, no. 2 (2004): 14–16.
3. walt crawford, "the white queen strikes again: an e-book update," econtent 25, no. 11 (2002): 46–47.
4. harold henke, "consumer survey on e-books." accessed dec. 27, 2004, www.openebook.org.
5. sue hutley, "follow the e-book road: e-books in australian public libraries," aplis 15, no. 1 (2002): 32–37; andrew k. pace, "e-books: round two," american libraries 35, no. 8 (2004): 74–75; michael rogers, "librarians, publishers, and vendors revisit e-books," library journal 129, no. 7 (2004): 23–24.
6. stephen sottong, "e-book technology: waiting for the 'false pretender,'" information technology and libraries 20, no. 2 (2001): 72–80.
7. vannevar bush, "as we may think," atlantic monthly 176, no. 1 (1945): 101–108.
8. michael s. hart, "history and philosophy of project gutenberg." accessed dec. 27, 2004, www.gutenberg.net/about.shtml.
9. henry petroski, the book on the bookshelf (new york: vintage, 2000).
10. steve ditlea, "the real e-books," technology review 103, no. 4 (2000): 70–73.
11. tim berners-lee, weaving the web: the original design and ultimate destiny of the world wide web by its inventor (new york: harpercollins, 1999).
12. james e. gall and annmari m. duffy, "e-books in a college course: a case study" (presented at the association for educational communications and technology conference, atlanta, ga., nov. 8–10, 2001).
13. george p. landow, hypertext 2.0: the convergence of contemporary critical theory and technology (baltimore, md.: johns hopkins univ. pr., 1997).
14. harold henke, "survey on electronic book features." accessed dec. 27, 2004, www.openebook.org.
15. open e-book forum, "press release: record e-book retail sales set in q1 2004."
16. lori enos, "report: e-book industry set to explode," e-commerce times, 20 dec. 2000. accessed dec. 27, 2004, www.ecommercetimes.com/story/6215.html.
17. luis a. ubinas, "the answer to video piracy," mckinsey quarterly no. 1. accessed dec. 27, 2004, www.mckinseyquarterly.com.
18. mark hoorebeek, "e-books, libraries, and peer-to-peer file-sharing," australian library journal 52, no. 2 (2003): 163–68.
19. theodor h. nelson, "managing immense storage," byte 13, no. 1 (1988): 225–38.
20. ibid., 238.
21. jacob weisberg, "the way we live now: the good e-book," new york times, 4 june 2000. accessed dec. 27, 2004, www.nytimes.com.
22. donald t. hawkins, "electronic books: a major publishing revolution. part 1: general considerations and issues," online 24, no. 4 (2000): 14–28.
23. steve grant, "e-books: friend or foe?" book report 21, no. 1 (2002): 50–54.
24. lori bell, "e-books go to college," library journal 127, no. 8 (2002): 44–46.
25. abigail j. sellen and richard h. harper, the myth of the paperless office (cambridge, mass.: mit pr., 2002).
26. stephen moss, "pulped fiction," sydney morning herald, 29 mar. 2002. accessed dec. 27, 2004, www.smh.com.au.
27. bbc news, "m6 toll built with pulped fiction," bbc news uk edition, 18 dec. 2003. accessed dec. 27, 2004, http://news.bbc.co.uk.
28. nelson, "managing immense storage."
29. michael a. looney and mark sheehan, "digitizing education: a primer on e-books," educause 36, no. 4 (2001): 38–46.
30. brian kenney, "netlibrary, ebsco explore new models for e-books," library journal 128, no. 7 (2003).
31. stephen h. wildstrom, "a library to end all libraries," business week (july 23, 2001): 23.
32. terence cavanaugh, "e-books and accommodations: is this the future of print accommodation?" teaching exceptional children 35, no. 2 (2002): 56–61.
33. skip pratt, "e-books and e-publishing: ignore ms reader and palm os at your own peril," knowledge download, 2002. accessed dec. 27, 2004, www.knowledge-download.com/260802-e-book-article.
34. davina witt, "audience profile and demographics," mar./apr. 2003. accessed dec. 27, 2004, www.bookbrowse.com/media/audience.cfm.
35. geoff daily, "gameboy advance: not just playing with games," econtent 27, no. 5 (2004): 12–14.
36. associated press, "flexible e-paper on its way," associated press, 7 may 2003. accessed dec. 27, 2004, www.wired.com/news.
37. richard mayer, multimedia learning (cambridge, uk: cambridge university press, 2000).
38. sottong, "e-book technology."
39. amc, "film facts: read about lost films." accessed june 19, 2003, www.amctv.com/article?cid=1052.
40. ronald jantz, "e-books and new library service models: an analysis of the impact of e-book technology on academic libraries," information technology and libraries 20, no. 2 (2001): 104–15.
41. susan lareau, the feasibility of the use of e-books for replacing lost or brittle books in the kent state university library, 2001, eric, ed 459862. accessed dec. 27, 2004, http://searcheric.org.
42. eli edwards, "ephemeral to enduring: the internet archive and its role in preserving digital media," information technology and libraries 23, no. 1 (2004): 3–8.
43. norm parry, format proliferation in public libraries, 2002, eric, ed 470035. accessed dec. 27, 2004, http://searcheric.org.
44. david m. levy, scrolling forward: making sense of documents in the digital age (new york: arcade pub., 2001).
45. about alan lomax. accessed dec. 27, 2004, www.alan-lomax.com/about.html.

(president's column continued from page 2)

online." they have implemented several process improvements already and will complete their work by the 2005 ala annual conference. this past fall, michelle frisque, lita web manager, conducted a survey of our members about the lita web site. michelle and the web coordinating committee are already working on a new look and feel for the lita web site based on the survey comments, and the result promises to be phenomenal. on top of all of the current activities, new vision statement, strategic planning, and the lita web site redesign, mary taylor and the lita board worked with a graphic designer to develop a new lita logo. after much deliberation, the new logo debuted at the 2004 lita national forum with great enthusiasm. many members commented that the new logo expresses the "energy" of lita and felt the change was terrific.

with your help, lita had a very successful conference in orlando. although there were weather and transportation difficulties, the lita programs and discussions were of the highest quality, as always. the program and preconference offerings for the upcoming annual conference in chicago promise to be as strong as ever. don't forget, lita also offers regional institutes throughout the year. check the lita web site to see if there's a regional institute scheduled in your area. lita held another successful national forum in fall 2004 in st. louis, "ten years of connectivity: libraries, the world wide web, and the next decade." the three-day educational event included excellent preconferences, general sessions, and more than thirty concurrent sessions. i want to thank the wonderful 2004 lita national forum planning committee, chaired by diane bisom, the presenters, and the lita office staff who all made this event a great experience. the next lita national forum will be held at the san jose marriott, san jose, california, september 29–october 2, 2005. the theme will be "the ubiquitous web: personalization, portability, and online collaboration." thomas dowling, chair, and the 2005 lita national forum planning committee are preparing another "must attend" event.

next year marks lita's fortieth anniversary. 2006 will be a year for lita to celebrate our history, our future, and our many accomplishments. we are fortunate to have lynne lysiak leading the fortieth anniversary task force activities. i know we all will enjoy the festivities. i look forward to working with many of you as we continue to make lita a wonderful and vibrant association. i encourage you to send me your comments and suggestions to further the goals, services, and activities of lita.

biblios revisited
john c. kountz: library systems coordinator, california state university and colleges, los angeles. when this article was in preparation, the author was systems analyst, orange county public libraries, orange county, california.

in the following, orange county public library's earlier reports on its biblios system are updated. book catalog and circulation control modules are detailed, development and operation costs documented, and a cost comparison for acquisitions cited.

"in 1968 ala began publishing, through its information science and automation division, a journal of library automation. it is perhaps appropriate to note that in the first three quarterly issues only one public library project was described (1), and this was a project under contemplation, not one actually in operation." (2) this statement by dan melcher, made to substantiate his contention that library automation is suspect, is in itself suspect. the public library project alluded to as being contemplated in 1968 was brought to fruition by orange county (california) public library in 1969, and has functioned with startling success ever since. in addition, the finished system was reported to the library (3) and data processing (4) worlds in 1969 and 1970 respectively. orange county public library's biblios (book inventory building library information oriented system) is a system designed to fulfill all functional requirements of a multibranch library which is growing by leaps and bounds (5).
specifically, these functional requirements are: acquisitions, book processing, catalog maintenance, circulation control, and book fund accounting, in addition to management reporting on a level not practical in a manual system.

the functional system

the interrelation of these system elements is shown diagrammatically in figure 1. briefly, and from a user's point of view, the system works like this: a title is desired by someone, patron or staff member. the person refers to the book catalog, figure 2, to see if the item is in the collection. if it is and is not in circulation, he gets the book directly. if the item is in circulation, he can submit a request for it, to receive the book on its return. to update the catalog, a cumulative supplement is produced, keeping current the listing of the library's holdings. if the title is not found in the catalog or supplement, the monthly cumulative on-order list, figure 3, is consulted. if the title is listed, a request is submitted and, on receipt and processing, the book is released to the requester. if the title is cancelled, the requester is notified.

when a title wanted for the collection is not listed in either the catalog or the cumulative on-order list, a bibliographic information sheet (bis), figure 4, is completed and optically scanned into the system. this information is essentially a pre-cataloging bibliographic description of the desired material. once entered, these same data serve first to create purchase orders and related reports; then, once edited by the catalogers from the book in hand, to create book card and pocket sets (figure 5), book catalog entries, shown in figure 2, holding lists (shelf lists) for each branch, and a broad array of operational reports. it is a feature of biblios that the descriptive data (from the bis) are entered in their entirety only once. this means that a bibliographic description need not be initialized by each individual using it; rather, it need only be consulted and, if necessary, corrected or deleted. thus, an entry once in the system is immediately available for, among other purposes, ordering. this is especially significant since it means that each entry in the book catalog, the catalog supplement, the cumulative on-order list, etc., can be ordered against by simply using the key number for the desired item and the number assigned to the branch wishing to order. this poses the possibility of orders for materials which are o.p. or otherwise not readily available through the usual vendor channels. biblios addresses these potential errors by listing (pre-vend list, figure 6) all order requirements for review before they are used to create orders. by editing this list against books in print and/or publishers' catalogs and taking corrective action, orders for the unobtainable are short-stopped.

on placing an order, while a unique subpurchase order number is mechanically created, the key number continues to document the title for processing purposes. in this role the key number follows the order until it is filled or cancelled. thus, the key is used by biblios to update inventory automatically on receipt of an order and to create the card and pocket sets for those materials received. finally, the key number is used by the branches to report inventory changes and, as a subset of inventory, for circulation control.
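to make the ordering shortcut concrete, the python sketch below models an order request as nothing more than a key number plus a branch number, checked against entries already on file. the record layout, field names, and sample data are hypothetical illustrations, not the actual biblios file formats.

```python
# hypothetical, simplified model of the biblios ordering shortcut: a reorder needs
# only the key number of a title already on file plus the ordering branch's number.
title_master = {
    "73084452": {"description": "full bibliographic description, entered once"},
    "aa011379": {"description": "another entry already on file"},
}

pre_vend_list = []  # order requirements gathered for review before sub purchase orders are cut

def request_order(key, branch, quantity=1):
    """queue an order against an existing entry, identified only by its key."""
    if key not in title_master:
        raise KeyError(f"key {key} not on file; a bibliographic information sheet is required first")
    # the description is never re-keyed; the key alone documents the title
    pre_vend_list.append({"key": key, "branch": branch, "qty": quantity})

request_order("73084452", branch=12, quantity=2)
request_order("aa011379", branch=7)

for line in pre_vend_list:  # reviewed against books in print before orders are created
    print(line)
```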
since it is through the key number (or key, for short) for a bibliographic citation that the citation is used in the various functions performed by biblios, perhaps a little detail concerning the key is in order.

fig. 1. biblios-the functional system.

the key number

in figure 2, the key for 73084452 has been underlined.

fig. 2. a book catalog page featuring four columns.

the key number resembles the lc card order number. wherever an lc card order number is available, it is used. when no lc card order number is available, a unique orange county (oc) number is applied. the oc number consists of two alphabetic characters in the first two positions (at one time the numbers implied year) of the "traditional" number, followed by a six-digit sequential number. since the library of congress has certain idiosyncrasies about its card order number, the key also specifies the type of material it represents (for example, only book keys are in the book catalog), and identifies each volume, or edition, of a title which has a blanket lc card order number.
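a minimal python sketch of how such a key might be assembled, following the description above: the lc card order number is used when one exists, otherwise an oc number of two letters plus a six-digit sequence is assigned. the material-type code and volume suffix are only noted in a comment, since the article does not give their exact layout.

```python
from itertools import count
from typing import Optional

# hypothetical oc-number generator: two alphabetic characters followed by a
# six-digit sequential number, used only when no lc card order number exists.
_oc_sequence = count(1)

def make_key(lc_card_number: Optional[str], oc_prefix: str = "aa") -> str:
    """prefer the lc card order number; otherwise assign a locally unique oc number.
    the real key also carried a material-type code and a volume/edition suffix,
    whose exact layout is not given in the article and is omitted here."""
    if lc_card_number:
        return lc_card_number
    return f"{oc_prefix}{next(_oc_sequence):06d}"

print(make_key("73084452"))  # an existing lc card order number is used as-is
print(make_key(None))        # aa000001 -- locally assigned oc number
print(make_key(None))        # aa000002
```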
the selection of the lc card order number for this application was based on a suspicion that the bulk of materials in the collection were already assigned a number, a suspicion which was confirmed on completion of conversion through simple reporting of the keys on file. in short, after fifty years of operation of orange county's libraries, 92 percent of all titles in the collection had an "lc number," a factor one might weigh when trying to decide between isbn and lc card order number; nor has it been indicated that isbns will be developed retrospectively.

an update to the system

in the paper presented to the american society for information science in 1969 (6), neither the book catalog nor the circulation control modules had been implemented.

book catalog

in may 1971, the first edition of the biblios book catalog was released for public use. since that date, the cumulative supplement has been run six times. the module of biblios producing the book catalog and cumulative supplement is diagrammed in figure 7. input is the title-master file (the system's bibliographic data base) and a specification of the output required. the output options available to the library include the production of either a full catalog or a cumulative supplement (displaying all entries placed on file since production of the full catalog which have been edited by cataloging). in the case of full catalog production, the title-master file is updated to reflect the use of all qualifying entries for catalog production and the date of their use. this updating facilitates cumulative supplement production by precluding the display of these entries until the next full catalog run.
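the selection logic just described can be sketched in a few lines of python: a cumulative supplement lists entries added since the last full catalog that cataloging has edited, while a full catalog run date-stamps every qualifying entry so that it drops out of later supplements. the field names and date-stamping convention here are assumptions for illustration, not the actual title-master layout.

```python
from datetime import date

# illustrative title-master records; "used_in_catalog" holds the date an entry last
# appeared in a full catalog (None if it has not yet been picked up).
title_master = [
    {"key": "73084452", "edited_by_cataloging": True,  "used_in_catalog": date(1971, 5, 1)},
    {"key": "aa011379", "edited_by_cataloging": True,  "used_in_catalog": None},
    {"key": "aa012873", "edited_by_cataloging": False, "used_in_catalog": None},
]

def cumulative_supplement(master):
    """entries placed on file since the last full catalog and already edited by cataloging."""
    return [e for e in master if e["edited_by_cataloging"] and e["used_in_catalog"] is None]

def run_full_catalog(master, run_date):
    """a full catalog takes every edited entry and date-stamps it as used, which keeps
    it out of later supplements until the next full catalog run."""
    selected = [e for e in master if e["edited_by_cataloging"]]
    for entry in selected:
        entry["used_in_catalog"] = run_date
    return selected

print([e["key"] for e in cumulative_supplement(title_master)])  # ['aa011379']
run_full_catalog(title_master, date(1972, 6, 1))
print([e["key"] for e in cumulative_supplement(title_master)])  # []
```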
in addition to the type of catalog (full or supplement), the library designates the format of the output. either an off-line print-out or a print file designed to drive a mechanical photocomposition device, or both, can be requested. it is important to note that this print file is designed specifically to be hardware independent, e.g., it will run on rca, photon, alphanumeric, or comparable equipment with equal ease. hardware independence in its simplest terms means the computer program does not have to be rewritten each time a vendor goes out of business. and, coincidentally, this print file is in the sequence in which it is to be displayed. in short, the vendor only performs that processing necessary to make his device set type to the library's specification for layout, font style, and font size—a specification, it might be added, which calls for upper- and lower-case type from a file in upper-case only. this approach differs from what has become typical of book catalog production in that sorting, file maintenance, and all related processing are sustained by the library through biblios. the vendor only sets type, prints, and binds. the results spell savings since a potentially error-laden file does not have to be committed to the most expensive of all displays, photocomposition, before corrections can be made.
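one way to picture the hardware-independent print file is as a pre-sorted stream of typesetting records plus a separate layout specification that any vendor's device can interpret; only the final rendering pass is device-specific. the record fields, layout dictionary, and rendering stand-in below are invented for illustration and are not the actual biblios print-file format.

```python
# hypothetical sketch of the division of labor: the library delivers entries already
# sorted into display sequence plus a layout specification; only the vendor-specific
# rendering step knows anything about the typesetting device.
layout_spec = {
    "columns": 4,               # the book catalog page uses four columns
    "font": "roman, sizes per specification",
    "case": "upper-and-lower",  # vendor derives mixed case from an upper-case-only file
}

print_file = [                  # device-neutral, pre-sorted print records
    {"key": "73084452", "text": "EXAMPLE AUTHOR. EXAMPLE TITLE. PUBLISHER, 1971."},
    {"key": "aa011379", "text": "ANOTHER AUTHOR. ANOTHER TITLE. PUBLISHER, 1970."},
]

def render(entries, spec, device):
    """stand-in for the vendor's typesetting pass on a particular device."""
    for entry in entries:
        text = entry["text"].capitalize() if spec["case"] == "upper-and-lower" else entry["text"]
        print(f"[{device}] {text}")

# the same print file could drive rca, photon, or comparable equipment
render(print_file, layout_spec, device="photon")
```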
fig. 3. all outstanding titles are reported in the monthly cumulative on-order list.
fig. 6. before producing a sub-p.o., the pre-vend list is checked for o.p. materials, among other things.
fig. 9. the maintenance of manual shelflists is obviated by a biblios-produced holdings list for each branch.

fig. 10. biblios circulation control subsystem.

cards, cassette, or mini-reels). ideally, the elusive transactor should be able to "read" a label on the book as well as a patron card. kimball labels, "sunburst" tags, magnetically coded swatches, and the like have worked and continue to work in the retail trade; there is no reason why they shouldn't work for libraries. the only deterrent seems to be the reticence of their manufacturers to enter an unknown market where, following the melcher axiom, they are met with a "stubborn, 'show me' attitude when automation is proposed." (8)

the products designed into the circulation control module include: weed lists, patron "black lists," circulation profiles (graphically displaying patron use of each branch's collection), and automatic duplicate ordering. reports measure circulation from a manager's viewpoint, but not to the exclusion of such bread-and-butter products as overdue notices, registration lists, and related statistical recapitulations.

a word about documentation

for each program in each subsystem of biblios, forty unique programs in all, there is a formal package consisting of:
1. a program specification detailing the inputs, processing, outputs, idiosyncrasies, and edits of that program;
2. a listing of the cobol program itself;
3. an operations binder (notebook) section for set-up and run procedures;
4. a user's guide section relating requirements and diagnostics to the librarians using the program, including typical problems; and
5. assorted total system binders (notebooks).
while some might think this "overkill," in automation it is not the case. the biblios system has yet to fail a scheduled commitment.
further, it is suspected that the mere discipline of documentation caused many serious reconsiderations of program and procedural logic, at the time and on the spot, with the result that biblios is a reliable system requiring no major rework and continuing to respond to the library's functional requirements for over two years at this writing.

a word about development costs

both developmental and operational costs for biblios are known and documented. specifically, the costs to procure such a system are broken out in table 1, where each subsystem is examined in terms of the dollars it represents and the assorted tasks required to bring it into being. the totals represent all costs over approximately a three-year period beginning with rough specifications and yielding the first book catalog.

table 1. biblios development costs (including full conversion and publication of first book catalog).

                                        marc   bibliographic   book catalog &   acquisitions   circulation      total
                                                   inventory    locator guide
contractor
  program specifications & coding    $16,686         $54,299          $25,800        $72,305       $91,000   $260,090
orange co. public library
  analyst                              3,360           7,840            2,240         14,560         7,000     35,000
  coordination                         1,225           7,679              818          5,310         5,670     20,702
  implementation (k.p., machine
    time, etc.)                        4,772          12,263            4,635          7,879        10,110     39,659
  conversion/outside services            800          53,500           41,370                                  95,670
  subtotal                            10,157          81,282           49,063         27,749        22,780    191,031
total                                $26,843        $135,581          $74,863       $100,054      $113,780   $451,121

it must be noted that final program specifications and coding were performed for orange county by a contractor. this approach was chosen since a good job done on time was wanted. that the approach was valid is evidenced by the achievement of a successful system on schedule and within budget. this approach reflects a contention that librarians can specify their requirements if they "have a mind to," and that a contracted programming staff can satisfactorily perform to predetermined standards and timeframes if properly directed. in direct contrast to this approach are the incredible schedules developed when requirements are not specified (and frozen), and the suspected monumental costs hidden in lost staff time due to extended parallel operations or simply waiting until "they" get the " ... thing" to run right. the remaining cost components, briefly, reflect direct library analyst time, the cost of coordination meetings, direct key punch and machine time for programs, their test, debug, string test, and systems test, and, for the bibliographic and book catalog subsystems, conversion and catalog print file generation. the conversion/outside services include a marc subscription, the creation and use of a group of nine typists to optically scan the library's files to convert them to machine readable form (including error correction), and the contracted services of a photoreproduction house to mechanically compose, print, bind, and deliver 500 sets of the book catalog and 100 sets of the locator guide. these are the costs of setting the system up, staff training, and creating a single operational display: the book catalog.

a word about operating costs

early in 1965, as a prelude to implementing a book acquisition program, a time/cost study was performed to determine how much it cost the library to order a book (one title).
this study detailed and costed the typing, sorting, assignment of vendors, and the reduction of a diversity of paper requisite to creating a purchase order. excluding the cost of the purchase order form itself, the direct manual cost for this process was $1.56 per title, using a clerical rate of $2.10 per hour. in the intervening years three things have happened: first, clerical rates have increased to $2.79 per hour, which when applied to the unit cost of the 1965 acquisitions study means a direct outlay of $2.07 per title (as against the previous $1.56). second, the number of branches has increased, which implies that, if the manual system of 1965 could cope with the increased load, it would have required more people and therefore an increase in indirect costs, not to mention the probability of less efficiency due to increased direct costs. third, orange county has automated this function (as well as others). since orange county is wont to track costs, it so happens that the cost for creating a purchase order (subpurchase order under the new system) is available. specifically, orange county knows computer and peripheral costs and the exact time for processing from actual billings over the past two years. the reduction of these data to a per-unit-handled equivalent, while detailed, is not difficult. thus, it is possible to deduce the machine costs equitable to those for the earlier manual effort: creating a purchase order for one title, including the purchase order form, now costs $1.89. similar economies can readily be documented, as can the increases in service to our patrons at no increase in staff. the operating costs for those biblios subsystems in regular use are given in table 2. only two entries on this table are not self-explanatory.

table 2. typical processing costs for one title in orange county public library's biblios system.

                              marc   bibliographic(1)   inventory          acquisitions(2)        book catalog (weekly)
                                                                        order       receive(3)       b.c.      inventory
run cost                   $325.16            $300.40     $201.21   $1,244.94          $238.55    $238.00         $26.00
average items per period      1154              1,000       8,100         700              700       4000           4000
cost/entry                    2.83               0.30       0.025       $1.78            $0.34      0.059         0.0006
supplies                      0.13                          0.028        $.05
                                                                    (sub p.o.)
services                               0.02 (convelope)                  .06                         0.041         0.0028
                                                                    (opscan)                  (comp/print)   (comp/print)
total                        $2.96              $0.32      $0.053       $1.89            $0.34       $.10        $0.0034

example: cost of entry from initial input to display in book catalog (including convelope; excluding marc source): $2.77.
(1) 40% bibliographic. (2) 60% bibliographic. (3) includes invoice, vendor, and budget displays. (4) if all new entries to system came from marc.

marc

marc, which is indicated as processed weekly, has not been run for over a year. the explanation is simple economics. it costs $0.32 to manually place a bibliographic description on file (excluding the time spent to circle an entry in publishers weekly [pw]) vs. $2.96 to process the same entry from marc. this cost for marc includes the subscription cost prorated to selected entries, the translation and format of all marc entries, the automatic release of those entries of limited value to a public library, the cumulation of entries which may be of value, the extract and transfer of those entries selected, and the reporting via indices and full listings for the contents of the cumulated file.
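the per-title figures quoted in this section come straight from table 2, and the arithmetic behind them is simply a proration of run cost over the items handled plus any per-item supplies and services. the python sketch below restates that calculation for two of the table's columns; the function and its argument names are illustrative only.

```python
def cost_per_title(run_cost, items_per_period, supplies=0.0, services=0.0):
    """prorate a subsystem's run cost over the items handled, then add per-item extras."""
    return run_cost / items_per_period + supplies + services

# acquisitions "order" column of table 2: $1,244.94 over 700 orders, plus the
# sub purchase order form (supplies) and optical-scan service charges.
order = cost_per_title(1244.94, 700, supplies=0.05, services=0.06)

# bibliographic input column: $300.40 over 1,000 entries plus the convelope charge.
bibliographic = cost_per_title(300.40, 1000, services=0.02)

print(f"order: ${order:.2f}")                  # about $1.89 per title, as quoted
print(f"bibliographic: ${bibliographic:.2f}")  # about $0.32 per entry
```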
prorated bibliographic input

biblios works on pre-cataloged entries. the 60 percent bibliographic input shown under acquisitions relates to the full initial description for a title being entered by a book selector to effect its order and subsequent reporting; the 40 percent shown under bibliographic is for cataloger input to adjust the entry for title-page accuracy, consistency with existing files, and, for nonfiction, the assignment of call numbers and subject headings. it is important to note that for reorders against a title already in the system, no bibliographic input is required. in the case of reorders, the per-title cost is $0.88, including subpurchase order forms.

references

1. john c. kountz, "cost comparison of computer versus manual catalog maintenance," journal of library automation 1:159-77 (spring 1968).
2. daniel melcher, melcher on acquisition (chicago: american library association, 1971), p. 135.
3. john c. kountz and robert norton, "biblios-a modular approach to total library adp," proceedings of asis 6:39-50 (1969).
4. john c. kountz and robert e. norton, "biblios-a modular system for library automation," datamation 16:79-83 (feb. 1970).
5. orange county public library presently has twenty-six branches, three bookmobiles, and plans for at least three more branches and an additional bookmobile in the near future.
6. kountz and norton, "biblios-a modular approach."
7. the device affiliated with the book depends on the transactor. the only requirements are that it mechanically represent the key for the book, be practically indestructible, and that it can be prepared mechanically. this last consideration is an absolute when there are 800,000 volumes to convert.
8. melcher, melcher on acquisition, p. 135.

president's message: making an impact in the time that is given to us
rachel vacek
information technology and libraries | june 2015

in an early chapter in the fellowship of the ring, by j.r.r. tolkien, frodo laments having found the one ring and gandalf tries to console him by saying, "all we have to decide is what to do with the time that is given us." this is one of my favorite quotes in the lord of the rings series because it inspires us to rise to the occasion and perform to the best of our abilities. it also implies that we have a purpose to fulfill within a predetermined time period.

although my term in office is three years, i'm only lita president for one year. to set a vision and goals, establish a sense of urgency, generate buy-in, engage and empower the membership, implement sustainable changes, and remain positive and focused – all within one year while holding a full-time job – is challenging to say the least.
i’ve  been  very  fortunate  during  my  almost  eight-­‐year  tenure  at  the  university  of  houston  libraries   to  participate  in  numerous  professional  development  opportunities,  lead  change,  and  make  a   difference.  personal  and  professional  growth  has  always  been  very  important  to  me,  and  being  in   an  environment  that  encourages  me  to  become  a  better  librarian,  technologist,  manager,  and   leader  is  not  only  helpful  for  my  career,  but  also  extremely  rewarding  on  an  intellectual  level.  lita   has  benefited  that  training.   in  today’s  library  technology  landscape,  one  of  the  many  skills  leaders  need  to  possess  is  the   ability  to  effect  change.  as  lita  president,  i  have  put  many  changes  in  motion  and  am  happy  with   what  i  have  accomplished,  and  proud  of  our  board  and  the  members  who  volunteer  to  lead  and   effect  change.   as  i  reflect  over  the  past  year,  it’s  fair  to  say  that  lita,  despite  some  financial  challenges,  has  had   numerous  successes  and  remains  a  thriving  organization.  three  areas  –  membership,  education,   and  publications  –  bring  in  the  most  revenue  for  lita.  of  those,  membership  is  the  largest  money   generator.  however,  membership  has  been  on  a  decline,  a  trend  that’s  been  seen  across  ala  for   the  past  decade.  in  response,  the  board,  committees,  interest  groups,  and  many  and  individuals   have  been  focused  on  improving  the  member  experience  to  retain  current  members  and  attract   potential  ones.  with  all  the  changes  to  the  organization  and  leadership,  lita  is  on  the  road  to   becoming  profitable  again  and  will  remain  one  of  ala’s  most  impactful  divisions.       rachel  vacek  (revacek@uh.edu)  is  lita  president  2014-­‐15  and  head  of  web  services,  university   libraries,  university  of  houston,  houston,  texas.     president’s  message  |  vacek       doi:  10.6017/ital.v34i2.8804     4   the  board  has  taken  numerous  steps  to  stabilize  or  reverse  the  decline  in  revenues  that  has   resulted  from  a  steady  reduction  in  overall  membership.  at  ala  annual  2014,  the  financial   advisory  committee  was  established  to  respond  to  recommendations  from  the  financial   strategies  task  force,  adjusting  the  budget  to  make  a  number  of  improvements  while  planning  for   larger,  more  substantial  changes.   in  fall  2014  we  took  steps  to  improve  our  communications  by  establishing  the  communications  &   marketing  committee  and  appointing  a  social  media  manager  and  a  blog  manager.  the  blog  and   social  media  have  seen  a  steady  upward  trajectory  of  engagement  with  over  27,000  blog  views   since  september  2014  and  over  13,300  followers  on  twitter.  these  efforts  help  recruit  and  retain   members,  advertise  our  online  education  and  programming,  and  increase  attendance  at   conferences.       over  the  past  year,  nine  workshops  and  two  web  courses  were  offered,  many  of  which  sold  out   thanks  to  new  marketing  approaches.  the  forum  remains  popular  and  has  stellar  programming   and  keynote  speakers.  programs  and  workshops  at  ala  conferences  are  stronger  than  ever  and   continue  to  be  well  attended.  publications  also  remain  strong.  
although  only  three  lita  guides   were  published  this  year,  partially  due  to  a  change  in  publishers,  there  are  many  more  in  the   pipeline.       finally,  the  search  for  a  new  executive  director  is  underway,  and  with  a  new  leader  comes  fresh   ideas  and  perspectives.  i  am  excited  about  lita’s  future.  the  incoming  board,  along  with  a  new   executive  director,  has  an  opportunity  to  make  national  and  lasting  impact  as  well  as  collaborate   with  outstanding  librarians  and  staff  in  this  division  and  across  ala.  lita’s  challenges  and   successes  are  shared  amongst  a  dedicated  team  of  volunteers,  and  together  we’ve  made  significant   changes.  i  believe  that  lita  members  will  continue  to  rise  to  the  occasion  and  make  incredible   things  happen  with  “the  time  that  is  given  us.”  lita  is  an  amazing  organization  because  of  its   members  and  their  passion  and  dedication.  i  couldn’t  be  prouder.  it  has  been  an  honor  and  a   privilege  to  serve  as  your  president.           usability test results for encore in an academic library megan johnson information technology and libraries | september 2013 59 abstract this case study gives the results a usability study for the discovery tool encore synergy, an innovative interfaces product, launched at appalachian state university belk library & information commons in january 2013. nine of the thirteen participants in the study rated the discovery tool as more user friendly, according to a sus (standard usability scale) score, than the library’s tabbed search layout, which separated the articles and catalog search. all of the study’s participants were in favor of switching the interface to the new “one box” search. several glitches in the implementation were noted and reported to the vendor. the study results have helped develop belk library training materials and curricula. the study will also serve as a benchmark for further usability testing of encore and appalachian state library’s website. this article will be of interest to libraries using encore discovery service, investigating discovery tools, or performing usability studies of other discovery services. introduction appalachian state university’s belk library & information commons is constantly striving to make access to libraries resources seamless and simple for patrons to use. the library’s technology services team has conducted usability studies since 2004 to inform decision making for iterative improvements. the most recent versions (since 2008) of the library’s website have featured a tabbed layout for the main search box. this tabbed layout has gone through several iterations and a move to a new content management system (drupal). during fall semester 2012, the library website’s tabs were: books & media, articles, google scholar, and site search (see figure 1). some issues with this layout, documented in earlier usability studies and through anecdotal experience, will be familiar to other libraries who have tested a tabbed website interface. user access issues include the belief of many patrons that the “articles” tab looked for all articles the library had access to. in reality the “articles” tab searched seven ebsco databases. belk library has access to over 400 databases. another problem noted with the tabbed layout was that patrons often started typing in the articles box, even when they knew they were looking for a book or dvd. 
this is understandable, since when most of us see a search box we just start typing, we do not read all the information on the page. megan johnson (johnsnm@appstate.edu) is e-learning and outreach librarian, belk library and information commons, appalachian state university, boone, nc. mailto:johnsnm@appstate.edu usability test results for encore in an academic library | johnson 60 figure 1. appalachian state university belk library website tabbed layout search, december 2012. a third documented user issue is confusion over finding an article citation. this is a rather complex problem, since it has been demonstrated through assessment of student learning that many students cannot identify the parts of a citation, so this usability issue goes beyond the patron being able navigate the library’s interface, it is partly a lack of information literacy skills. however, even sophisticated users can have difficulty in determining if the library owns a particular journal article. this is an ongoing interface problem for belk library and many other academic libraries. google scholar (gs) often works well for users with a journal citation, since on campus they can often simply copy and paste a citation to see if the library has access, and, if so, the full text it is often is available in a click or two. however, if there are no results found using gs, the patrons are still not certain if the library owns the item. background in 2010, the library formed a task force to research the emerging market of discovery services. the task force examined summon, ebsco discovery service, primo and encore synergy and found the products, at that time, to still be immature and lacking value. in april 2012, the library reexamined the discovery market and conducted a small benchmarking usability study (the results are discussed in the methodology section and summarized in appendix a). the library felt enough improvements had been made to innovative interface’s encore information technology and libraries | september 2013 61 synergy product to justify purchasing this discovery service. an encore synergy implementation working group was formed, and several subcommittees were created, including end-user preferences, setup & access, training, and marketing. to help inform the decision of these subcommittees, the author conducted a usability study in december 2012, which was based on, and expanded upon, the april 2012 study. the goal of this study was to test users’ experience and satisfaction with the current tabbed layout, in contrast to the “one box” encore interface. the library had committed to implementing encore synergy, but there are options in layout of the search box on the library’s homepage. if users expressed a strong preference for tabs, the library could choose to leave a tabbed layout for access to the articles part of encore, for the catalog part, and create tabs for other options like google scholar, and a search of the library’s website. a second goal of the study was to benchmark the user experience for the implementation of encore synergy so that, over time, improvements could be made to promote seamless access to appalachian state university library’s resources. a third goal of this study was to document problems users encountered and report them to innovative. figure 2. appalachian state university belk library website encore search, january 2013. 
usability test results for encore in an academic library | johnson 62 literature review there have been several recent reviews of the literature on library discovery services. thomsettscott and reese conclude that discovery tools are a mixed blessing. 1 users can easily search across abroad areas of library resources and limiting by facets is helpful. downsides include loss of individual database specificity and user willingness to look beyond the first page of results. longstanding library interface problems, such as patrons’ lack of understanding of holding statements, and knowing when to it is appropriate to search in a discipline specific database are not solved by discovery tools.2 in a recent overview of discovery services, hunter lists four vendors whose products have both a discovery layer and a central index: ebsco’s discovery service (eds); ex libris’ primo central index; serials solutions’ summon; and oclc’s worldcat local (wcl). 3 encore does not have currently offer a central index or pre-harvested metadata for articles, so although encore has some of the features of a discovery service, such as facets and connections to full text, it is important for libraries considering implementing encore to understand that the part of encore that searches for articles is a federated search. when appalachian purchased encore, not all the librarians and staff involved in the decision making were fully aware of how this would affect the user experience. further discussion of this in the “glitches revealed” section. fagan et al. discuss james madison university’s implementation of ebsco discovery service and their customizations of the tool. they review the literature of discovery tools in several areas, including articles that discuss the selection processes, features, and academic libraries’ decisions process following selection. they conclude, the “literature illustrates a current need for more usability studies related to discovery tools.” 4 the most relevant literature to this study are case studies documenting a library’s experience with implementing a discovery services and task based usability studies of discovery services. thomas and buck5 sought to determine with a task based usability study whether users were as successful performing common catalog-related tasks in worldcat local (wcl) as they are in the library’s current catalog, innovative interfaces’ webpac. the study helped inform the library’s decision, at that time, to not implement wcl. beecher and schmidt6 discuss american university’s comparison of wcl and aquabrowser (two discovery layers), which were implemented locally. the study focused on user preferences based on students “normal searching patterns” 7 rather than completion of a list of tasks. their study revealed undergraduates generally preferred wcl, and upperclassmen and graduates tended to like aquabrower better. beecher and schmidt discuss the research comparing assigned tasks versus user-defined searches, and report that a blend of these techniques can help researchers understand user behavior better.8 information technology and libraries | september 2013 63 this article reports on a task-based study, in which the last question asks the participant to research something they had looked for within the past semester, and the results section indicates that the most meaningful feedback came from watching users research a topic they had a personal interest in. having assigned tasks also can be very useful. 
for example, an early problem noted with discovery services was poor search results for specific searches on known items, such as the book “the old man and the sea.” assigned tasks also give the user a chance to explore a system for a few searches, so when they search for a topic of personal interest, it is not their first experience with a new system. blending assigned tasks with user tasks proved helpful in this study’s outcomes. encore synergy has not yet been the subject of a formally published task-based usability study. allison reports on an analysis of google analytic statistics at university of nebraska-lincoln after encore was implemented.9 the article concludes that encore increases the user’s exposure to all the library’s holdings, describes some of the challenges unl faced and gives recommendations for future usability studies to evaluate where additional improvements should be made. the article also states unl plans to conduct future usability studies. although there are not yet formal published task-based studies on encore, at least one blogger from southern new hampshire university documented their implementation of the service. singley reported in 2011, “encore synergy does live up to its promise in presenting a familiar, user-friendly search environment.10 she points out, “to perform detailed article searches, users still need to link out to individual databases.” this study confirms that users do not understand that articles are not fully indexed and integrated; articles remain, in encore’s terminology, in “database portfolios.” see the results section, task 2, for a fuller discussion of this topic. method this study included a total of 13 participants. these included four faculty members, and six students recruited through a posting on the library’s website offering participants a bookstore voucher. three student employees were also subjects (these students work in the library’s mailroom and received no special training on the library’s website). for the purposes of this study, the input of undergraduate students, the largest target population of potential novice users, was of most interest. table 3 lists demographic details of the student or faculty’s college, and for students, their year. this was a task-based study, where users were asked to find a known book item and follow two scenarios to find journal articles. the following four questions/tasks were handed to the users on a sheet of paper: 1. find a copy of the book the old man and the sea. 2. in your psychology class, your professor has assigned you a 5-page paper on the topic of eating disorders and teens. find a scholarly article (or peer-reviewed) that explores the relation between anorexia and self-esteem. http://www.snhu.edu/ usability test results for encore in an academic library | johnson 64 3. you are studying modern chinese history and your professor has assigned you a paper on foreign relations. find a journal article that discusses relations between china and the us. 4. what is a topic you have written about this year? search for materials on this topic. the follow up questions where verbally asked either after a task, or asked as prompts while the subject was working. 1. after the first task (find a copy of the book the old man and the sea) when the user finds the book in appsearch, ask: “would you know where to find this book in the library?” 2. how much of the library’s holdings do you think appsearch/ articles quick search is looking across? 3. 
does “peer reviewed” mean the same as “scholarly article”? 4. what does the “refine by tag” block the right mean to you? 5. if you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend? participants were recorded using techsmith’s screen-casting software camtasia, which allows the user’s face to be recorded along with their actions on the computer screen. this allows the observer to not rely solely on notes or recall. if the user encounters a problem with the interface, having the session recorded makes it simple to create (or recreate) a clip to show the vendor. in the course of this study, several clips were sent to innovative interfaces, and they were responsive to many of the issues revealed. further discussion is in the “glitches revealed” section. seven of the subjects first used the library site’s tabbed layout (which was then the live site) as seen in figure 1. after they completed the tasks, participants filled in a system usability scale (sus) form. the users then completed the same tasks on the development server using encore synergy. participants next filled out a sus form to reflect their impression of the new interface. encore is locally branded as appsearch and the terms are used interchangeably in this study. the six other subjects started with the appsearch interface on a development server, completed a sus form, and then did the same tasks using the library’s tabbed interface. the time it took to conduct the studies was ranged from fifteen to forty minutes per participant, depending on how verbal the subject was, and how much they wanted to share about their impressions and ideas for improvement. jakob nielson has been quoted as saying you only need to test with five users: “after the fifth user, you are wasting your time by observing the same findings repeatedly but not learning much new.”11 he argues for doing tests with a small number of users, making iterative improvements, and then retesting. this is certainly a valid and ideal approach if you have full control of the design. in the case of a vendor-controlled product, there are serious limitations to what the information technology and libraries | september 2013 65 librarians can iteratively improve. the most librarians can do is suggest changes to the vendor, based on the results of studies and observations. when evaluating discovery services in the spring of 2012, appalachian state libraries conducted a four person task based study (see appendix a), which used university of nebraska at lincoln’s implementation of encore as a test site to benchmark our students’ initial reaction to the product in comparison to the library’s current tabbed layout. in this small study, the average sus score for the library’s current search box layout was 62, and for unl’s implementation of encore, it was 49. this helped inform the decision of belk library, at that time, not to purchase encore (or any other discovery service), since students did not appear to prefer them. this paper reports on a study conducted in december 2012 that showed a marked improvement in users’ gauge of satisfaction with encore. several factors could contribute to the improvement in sus scores. first is the larger sample size of 13 compared to the earlier study with four participants. 
another factor is that in the april study, participants were using an external site they had no familiarity with, and a first experience with a new interface is not a reliable gauge of how someone will come to use the tool over time. this study was also more robust in that it added the task of asking the user to search for something they had researched recently, and the follow-up questions were more detailed. overall it appears that, in this case, having more than four participants and a more robust design gave a better representation of user experience.

the system usability scale (sus)

the system usability scale has been widely used in usability studies since its development in 1996. many libraries use this tool in reporting usability results.12,13 it is simple to administer, score, and understand the results.14 sus is an industry standard with references in over 600 publications.15 an "above average" score is 68. scoring a scale involves a formula where odd items have one subtracted from the user response, and with even-numbered items, the user response is subtracted from five. the converted responses are added up and then multiplied by 2.5. this makes the answers easily grasped on the familiar scale of 1-100. due to the scoring method, it is possible that results are expressed with decimals.16 a sample sus scale is included in appendix d.

results

the average sus score for the 13 users for encore was 71.5, and for the tabbed layout, the average sus score was 68. this small sample set indicates there was a user preference for the discovery service interface. in a relatively small study like this, these results do not imply a scientifically valid statistical measurement. as used in this study, the sus scores are simply a way to benchmark how "usable" the participants rated the two interfaces. when asked the subjective follow-up question, "if you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend?" 100% of the participants recommended the library change to appsearch (although four users actually rated the tabbed layout with a higher sus score). these four participants said things along the lines of, "i can get used to anything you put up."

table 1. demographic details and individual and average sus scores.

participant | sus (encore) | sus (tabbed layout) | year and major or college | appsearch first
student a | 90 | 70 | senior / social work / female | no
student b | 95 | 57.5 | freshman / undeclared / male | yes
student c | 82.5 | 57.5 | junior / english / male | yes
student d | 37.5 | 92 | sophomore / actuarial science / female | yes
student e | 65 | 82.5 | junior / psychology / female | yes
student f | 65 | 77.5 | senior / sociology / female | no
student g | 67.5 | 75 | junior / music therapy / female | no
student h | 90 | 82.5 | senior / dance / female | no
student i | 60 | 32.5 | senior / political science / female | no
faculty a | 40 | 87.5 | family & consumer science / female | yes
faculty b | 80 | 60 | english / male | no
faculty c | 60 | 55 | education / male | no
faculty d | 97.5 | 57.5 | english / male | yes
average | 71.5 | 68 | |

discussion

task 1: "find a copy of the book the old man and the sea." all thirteen users had faster success using encore. when using encore, this "known item" is in the top three results. encore definitely performed better than the classic catalog in saving the time of the user.
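the sus scoring rule behind the figures in table 1 (odd-numbered items contribute the response minus one, even-numbered items contribute five minus the response, and the sum is multiplied by 2.5) is easy to express in code. a minimal sketch follows; python is used purely for illustration, it is not one of the study instruments, and the example responses are invented.

def sus_score(responses):
    """compute a system usability scale score from ten responses on the 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("sus requires exactly ten item responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        # odd items: response minus 1; even items: 5 minus response
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5  # converts the 0-40 sum to the familiar 0-100 scale

# invented example: a fairly positive participant
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 2]))  # 82.5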
in approaching task 1 from the tabbed layout interface, four out of thirteen users clicked on the books and media tab, changed the drop down search option to “title,” and were (relatively) quickly successful. the remaining nine who switched to the books and media tab and used the default keyword search for “the old man and the sea” had to scan the results (using this search method, the book is the seventh result in the classic catalog), which took two users almost 50 seconds. this length of time, for an “average user” to find a well-known book is not considered to be acceptable to the technology services team at appalachian state university. when using the encore interface, the follow up question for this task was, “would you know where to find this book in the library?” nine out of 13 users did not know where the book would be, or information technology and libraries | september 2013 67 how to find it. the three faculty members and student d could pick out the call number and felt they could locate the book in the stacks. figure 3. detail of the screen of results for searching for “the old man and the sea”. the classic catalog that most participants were familiar with has a “map it” feature (from the third party vendor stackmap), and encore did not have that feature incorporated yet. since this study has been completed, the “map it” has been added to the item record in appsearch. further research can determine if students will have a higher level of confidence in their ability to locate a book in the stacks when using encore. figure 3 shows the search as it appeared in december 2012 and figure 4 has the “map it” feature implemented and pointed out with a red arrow. related to this task of searching for a known book, student b commented that in encore, the icons were very helpful in picking out media type. figure 4. book item record in encore. the red arrow indicates the “map it” feature, an add-on to the catalog from the vendor stackmap. browse results are on the right, and only pull from the catalog results. when using the tabbed layout interface (see figure 1), three students typed the title of the book into the “articles” tab first, and it took them a few moments figure out why they had a problem with the results. they were able to figure it out and re-do the search in the “correct” books & usability test results for encore in an academic library | johnson 68 media tab, but student d commented, “i do that every time!” this is evidence that the average user does not closely examine a search box--they simply start typing. task 2: “in your psychology class, your professor has assigned you a five-page paper on the topic of eating disorders and teens. find a scholarly article (or peer-reviewed) that explores the relation between anorexia and self-esteem.” this question revealed, among other things, that seven out of the nine students did not fully understand the term scholarly or peer reviewed article are meant to be synonyms in this context. when asked the follow up question “what does ‘peer reviewed’ mean to you?” student b said, “my peers would have rated it as good on the topic.” this is the kind of feedback that librarians and vendors need to be aware of in meeting students’ expectations. users have become accustom to online ratings by their peers of hotels and restaurants, so the terminology academia uses may need to shift. further discussion on this is in the “changes suggested” section below. figure 5. typical results for task two. figure 5 shows a typical user result for task 2. 
the follow up question asked users “what does the refine by tag box on the right mean to you?” student g reported they looked like internet ads. other users replied with variations of, “you can click on them to get more articles and stuff.” in fact, the “refine by tag” box in the upper right column top of screen contains only indexed terms from the subject heading of the catalog. this refines the current search results to those with the specific subject term the user clicked on. in this study, no user clicked on these tags. information technology and libraries | september 2013 69 for libraries considering purchasing and implementing encore, a choice of skins is available, and it is possible to choose a skin where these boxes do not appear. in addition to information from innovative interfaces, libraries can check a guide maintained by a librarian at saginaw valley state university17 to see examples of encore synergy sites, and links to how different skins (cobalt, pearl or citrus) affect appearance. appalachian uses the “pearl” skin. figure 6. detail of screenshot in figure 5. figure 6 is a detail of the results shown in the screenshot for average search for task 2. the red arrows indicate where a user can click to just see article results. the yellow arrow indicates where the advanced search button is. six out of thirteen users clicked advanced after the initial search results. clicking on the advanced search button brought users to a screen pictured in figure 7. usability test results for encore in an academic library | johnson 70 figure 7. encore's advanced search screen. figure 7 shows the encore’s advanced search screen. this search is not designed to search articles; it only searches the catalog. this aspect of advanced search was not clear to any of the participants in this study. see further discussion of this issue in the “glitches revealed” section. information technology and libraries | september 2013 71 figure 8. the "database portfolio" for arts & humanities. figure 8 shows typical results for task 2 limited just to articles. the folders on the left are basically silos of grouped databases. innovative calls this feature “database portfolios.” in this screen shot, the results of the search narrowed to articles within the “database portfolio” of arts & humanities. clicking on the individual databases return results from that database, and moves the usability test results for encore in an academic library | johnson 72 user to the database’s native interface. for example, in figure 8, clicking on art full text would put the user into that database, and retrieve 13 results. while conducting task 2, faculty member a stressed she felt it was very important students learn to use discipline specific databases, and stated she would not teach a “one box” approach. she felt the tabbed layout was much easier than appsearch and rated the tabbed layout in her sus score with a 87.5 versus the 40 she gave encore. she also wrote on the sus scoring sheet “appsearch is very slow. there is too much to review.” she also said that the small niche showing how to switch results between “books & more” to article was “far too subtle.” she recommended bold tabs, or colors. this kind of suggestion librarians can forward to the vendor, but we cannot locally tweak this layout on a development server to test if it improves the user experience. figure 9. closeup of switch for “books & more” and “articles” options. 
task 3: “you are studying modern chinese history and your professor has assigned you a paper on foreign relations. find a journal article that discusses relations between china and the us.” most users did not have much difficulty finding an article using encore, though three users did not immediately see a way to limit only to articles. of the nine users who did narrow the results to articles, five used facets to further narrow results. no users moved beyond the first page of results. search strategy was also interesting. all thirteen users appeared to expect the search box to work like google. if there were no results, most users went to the advanced search, and reused the same terms on different lines of the boolean search box. once again, no users intuitively understood that “advanced search” would not effectively search for articles. the concept of changing search terms was not a common strategy in this test group. if very few results came up, none of the users clicked on the “did you mean” or used suggestions for correction in spelling or change in terms supplied by encore. during this task, two faculty members commented on load time. they said students would not wait, results had to be instant. but when working with students, when the author asked how they felt when load time was slow, students almost all said it was fine, or not a problem. they could “see it was working.” one student said, “oh, i’d just flip over to facebook and let the search run.” so perhaps librarians should not assume we fully understand student user expectations. it is also information technology and libraries | september 2013 73 worth noting that, for the participant, this is a low-stakes usability study, not crunch time, so attitudes may be different if load time is slow for an assignment due in a few hours. task 4: “what is a topic you have written about this year? search for materials on this topic.” this question elicited the most helpful user feedback, since participants had recently conducted research using the library’s interface and could compare ease of use on a subject they were familiar with. a few specific examples follow. student a, in response to the task to research something she had written about this semester, looked for “elder abuse.” she was a senior who had taken a research methods class and written a major paper on this topic, and she used the tabbed layout first. she was familiar with using the facets in ebsco to narrow by date, and to limit to scholarly articles. when she was using appsearch on the topic of elder abuse, encore held her facets “full text” and “peer reviewed” from the previous search on china and u.s. foreign relations. an example of encore “holding a search” is demonstrated in figures 10 and 11 below. student a was not bothered by the encore holding limits she had put on a previous search. she noticed the limits, and then went on to further narrow within the database portfolio of “health” which limited the results to the database cinahl first. she was happy with being able to limit by folder to her discipline. she said the folders would help her sort through the results. student g’s topic she had researched within the last semester was “occupational therapy for students with disabilities” such as cerebral palsy. she understood through experience, that it would be easiest to narrow results by searching for ‘occupational therapy’ and then add a specific disability. student g was the user who made the most use of facets on the left. 
she liked encore’s use of icons for different types of materials. student b also commented on “how easy the icons made it.” faculty b, in looking for the a topic he had been researching recently in appsearch, typed in “writing across the curriculum glossary of terms” and got no results on this search. he said, “mmm, well that wasn’t helpful, so to me, that means i’d go through here” and he clicked on the google search box in the browser bar. he next tried removing “glossary of terms” from his search and the load time was slow on articles, so he gave up after ten seconds and clicked on “advanced search” and tried putting “glossary of terms” in the second line. this led to another dead end. he said, “i’m just surprised appalachian doesn’t have anything on it.” the author asked if he had any other ideas about how to approach finding materials on his topic from the library’s homepage and he said no, he would just try google (in other words, navigating to the group of databases for education was not a strategy that occurred to him). usability test results for encore in an academic library | johnson 74 the faculty member d had been doing research on a relatively obscure historical event and was able to find results using encore. when asked if he had seen the articles before, he said, “yes, i’ve found these, but it is great it’s all in one search!” glitches revealed it is of concern for the user experience that the advanced search of encore does not search articles; it only searches the catalog. this was not clear to any participant in this study. as noted earlier, encore’s article search is a federated search. this affects load time for article results, and also puts the article results into silos, or to use encore’s terminology, “database portfolios.” encore’s information on their website definitely markets the site as a discovery tool, saying, it “integrates federated search, as well as enriched content—like first chapters—and harvested data… encore also blends discovery with the social web. 18” it is important for libraries considering purchase of encore that while it does have many features of a discovery service, it does not currently have a central index with pre-harvested metadata for articles. if innovative interfaces is going to continue to offer an advanced search box, it needs to be made explicitly clear that the advanced search is not effective for searching for articles, or innovative interfaces needs to make an advanced search work with articles by creating a central index. to cite a specific example from this study, when student e was using appsearch, with all the tasks, after she ran a search, she clicked on the advanced search option. the author asked her, “so if there is an advanced search, you’re going to use it?” the student replied, “yeah, they are more accurate.” another aspect of encore that users do not intuitively grasp is that when looking at the results for an article search, the first page of results comes from a quick search of a limited number of databases (see figure 8). the users in this study did understand that clicking on the folders will narrow by discipline, but they did not appear to grasp that the result in the database portfolios are not included in the first results shown. when users click on an article result, they are taken to the native interface (such as psych info) to view the article. users seemed un-phased when they went into a new interface, but it is doubtful they understand they are entering a subset of appsearch. 
if users try to add terms or do a new search in the native database they may get relevant results, or may totally strike out, depending on chosen database’s relevance to their research interest. information technology and libraries | september 2013 75 figure 10. changing a search in encore. another problem that was documented was that after users ran a search, if they changed the text in the “search” box, the results for articles did not change. figure six demonstrates the results from task 2 of this study, which asks users to find information on anorexia and self-esteem. the third task asks the user to find information on china and foreign relations. figure 10 demonstrates the results for the anorexia search, with the term “china” in the search box, just before the user clicks enter, or the orange arrow for new search. figure 11. search results for changed search. figure 11 show that the search for the new term, “china” has worked in the catalog, but the results for articles are still about anorexia. in this implementation of encore, there is no “new search button” (except in the advanced search page, there is a “reset search” button, see figure 7) and usability test results for encore in an academic library | johnson 76 refreshing the browser is had no effect on this problem. this issue was screencast19 and sent to the vendor. happily, as of april 2013, innovative interfaces appears to have resolved this underlying problem. one purpose of this study was to determine if users had a strong preference for tabs, since the library could choose to implement encore with tabs (one for access to articles, one for the catalog, and other tab options like google scholar). this study indicated users did not like tabs in general, they much preferred a “one box solution” on first encounter. a major concern raised was the user’s response to the question, “how much of the library’s holdings do you think appsearch/ articles quick search is looking across?” twelve out of thirteen users believed that when they were searching for articles from the quick search for articles tabbed layout, they were searching all the library databases. the one exception to this was a faculty member in the english department, who understood that the articles tab searched a small subset of the available resources (seven ebsco databases out of 400 databases the library subscribes to). all thirteen users believed appsearch (encore) was searching “everything the library owned.” the discovery service searches far more resources than other federated searches the library has had access to in the past, but it is still only searching 50 out of 400 databases. it is interesting that in the fagan et al. study of ebsco’s discovery service, only one out of ten users in that study believed the quick search would search “all” the library’s resources.20 a glance at james madison university’s library homepage21 suggests wording that may improve user confusion. figure 12. screenshot of james madison library homepage, accessed december 18, 2012. information technology and libraries | september 2013 77 figure 13. original encore interface as implemented in january 2013. given the results that 100% of the users believed that appsearch looked at all databases the library has access to, the library made changes to the wording in the search box. (see figure 7). future tests can determine if this has any positive effect on the understanding of what appsearch includes. figure 14. encore search box after this usability study was completed. 
the arrow highlights additions to the page as a result of this study. some other wording changes suggested were from the finding that only seven out of nine students fully understood that “peer reviewed” would limit to scholarly articles. a suggestion was made to innovative interfaces to change the wording to “scholarly (peer reviewed)” and they did so in early january. although innovative’s response on this issue was swift, and may help students, changing the wording does not address the underlying information literacy issue of what students understand about these terms. interestingly, encore does not include any “help” pages. appalachian’s liaison with encore has asked about this and been told by encore tech support that innovative feels the product is so intuitive; users will not need any help. belk library has developed a short video tutorial for users, and local help pages are available from the library’s homepage, but according to innovative, a link to these resources cannot be added to the top right area of the encore screen (where help is commonly located in web interfaces). although it is acknowledged that few users actually read “help” pages, it seems like a leap of faith to think a motivated searcher will understand things like the “database portfolios” (see figures 9) without any instruction at all. after implementation, the usability test results for encore in an academic library | johnson 78 librarians here at appalachian conducted internally developed training for instructors teaching appsearch, and all agreed that understanding what is being searched and how to best perform a task such as an advanced article search is not “totally intuitive,” even for librarians. finally, some interesting search strategy patterns were revealed. on the second and third questions in the script (both having to do with finding articles) five of the thirteen participants had the strategy of putting in one term, then after the search ran, adding terms to narrow results using the advanced search box. although this is a small sample set, it was a common enough search strategy to make the author believe this is not an unusual approach. it is important for librarians and for vendors to understand how users approach search interfaces so we can meet expectations. further research the findings of this study suggest librarians will need to continue to work with vendors to improve discovery interfaces to meet users expectations. the context of what is being searched and when is not clear to beginning users in encore one aspect of this test was it was the participants’ first encounter with a new interface, and even student d, who was unenthused about the new interface (she called the results page “messy, and her sus score was 37.5 for encore, versus 92 for the tabbed layout) said that she could learn to use the system given time. further usability tests can include users who have had time to explore the new system. specific tasks that will be of interest in follow up studies of this report are if students have better luck in being able to know where to find the item in the stacks with the addition of the “map it” feature. locally, librarian perception is that part of the problem with this results display is simply visual spacing. the call number is not set apart or spaced so that it stands out as important information (see figure 5 for a screenshot). 
another question to follow up on will be to repeat the question, “how much of the library’s holdings do you think appsearch is looking across?” all thirteen users in this study believed appsearch was searching “everything the library owned.” based on this finding, the library made small adjustments to the initial search box (see figures 14 and 15 as illustration). it will be of interest to measure if this tweak has any impact. summary all users in this study recommended that the library move to encore’s “one box” discovery service instead of using a tabbed layout. helping users figure out when they should move to using discipline specific databases will most likely be a long-term challenge for belk library, and for other academic libraries using discovery services, but this will probably trouble librarians more than our users. information technology and libraries | september 2013 79 the most important change innovative interfaces could make to their discovery service is to create a central index for articles, which would improve load time and allow for an advanced search feature for articles to work efficiently. because of this study, innovative interfaces made a wording change in search results for article to include the word “scholarly” when describing peer reviewed journal articles in belk library’s local implementation. appalachian state university libraries will continue to conduct usability studies and tailor instruction and e-learning resources to help users navigate encore and other library resources. overall, it is expected users, especially freshman and sophomores, will like the new interface but will not be able to figure out how to improve search results, particularly for articles. belk library & information commons’ instruction team is working on help pages and tutorials, and will incorporate the use of encore into the library’s curricula. references 1 . thomsett-scott, beth, and patricia e. reese. "academic libraries and discovery tools: a survey of the literature." college & undergraduate libraries 19 (2012): 123-43. 2. ibid, 138. 3. hunter, athena. “the ins and outs of evaluating web-scale discovery services” computers in libraries 32, no. 3 (2012) http://www.infotoday.com/cilmag/apr12/hoeppner-web-scalediscovery-services.shtml (accessed march 18, 2013) 4. fagan, jody condit, meris mandernach, carl s. nelson, jonathan r. paulo, and grover saunders. "usability test results for a discovery tool in an academic library." information technology & libraries 31, no. 1 (2012): 83-112. 5. thomas, bob., and buck, stephanie. oclc's worldcat local versus iii's webpac. library hi tech, 28(4) (2010), 648-671. doi: http://dx.doi.org/10.1108/07378831011096295 6. becher, melissa, and kari schmidt. "taking discovery systems for a test drive." journal of web librarianship 5, no. 3: 199-219 [2011]. library, information science & technology abstracts with full text, ebscohost (accessed march 17, 2013). 7. ibid, p. 202 8. ibid p. 203 9. allison, dee ann, “information portals: the next generation catalog,” journal of web librarianship 4, no. 1 (2010): 375–89, http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1240&context=libraryscience (accessed march 17, 2013) http://www.infotoday.com/cilmag/apr12/hoeppner-web-scale-discovery-services.shtml http://www.infotoday.com/cilmag/apr12/hoeppner-web-scale-discovery-services.shtml http://dx.doi.org/10.1108/07378831011096295 usability test results for encore in an academic library | johnson 80 10. singley, emily. 
2011 “encore synergy 4.1: a review” the cloudy librarian: musings about library technologies http://emilysingley.wordpress.com/2011/09/17/encore-synergy-4-1-areview/ [accessed march 20, 2013]. 11 . nielson, jakob. 2000. “why you only need to test with 5 users” http://www.useit.com/alertbox/20000319.html (accessed december 18, 2012]. 12. fagan et al, 90. 13. dixon, lydia, cheri duncan, jody condit fagan, meris mandernach, and stefanie e. warlick. 2010. "finding articles and journals via google scholar, journal portals, and link resolvers: usability study results." reference & user services quarterly no. 50 (2):170-181. 14. bangor, aaron, philip t. kortum, and james t. miller. 2008. "an empirical evaluation of the system usability scale." international journal of human-computer interaction no. 24 (6):574-594. doi: 10.1080/10447310802205776. 15. sauro, jeff. 2011. “measuring usability with the system usability scale (sus)” http://www.measuringusability.com/sus.php. [accessed december 7, 2012]. 16. ibid. 17. mellendorf, scott. “encore synergy sites” zahnow library, saginaw valley state university. http://librarysubjectguides.svsu.edu/content.php?pid=211211 (accessed march 23, 2013). 18. encore overview, “http://encoreforlibraries.com/overview/” (accessed march 21, 2013). 19. johnson, megan. videorecording made with jing on january 30, 2013 http://www.screencast.com/users/megsjohnson/folders/jing/media/0ef8f186-47da-41cf96cb-26920f71014b 20. fagan et al. 91. 21. james madison university libraries, “http://www.lib.jmu.edu” (accessed december 18, 2012). http://emilysingley.wordpress.com/ http://emilysingley.wordpress.com/2011/09/17/encore-synergy-4-1-a-review/ http://emilysingley.wordpress.com/2011/09/17/encore-synergy-4-1-a-review/ http://www.useit.com/alertbox/20000319.html http://www.measuringusability.com/sus.php http://librarysubjectguides.svsu.edu/content.php?pid=211211 http://encoreforlibraries.com/overview/ http://www.screencast.com/users/megsjohnson/folders/jing/media/0ef8f186-47da-41cf-96cb-26920f71014b http://www.screencast.com/users/megsjohnson/folders/jing/media/0ef8f186-47da-41cf-96cb-26920f71014b http://www.lib.jmu.edu/ information technology and libraries | september 2013 81 appendix a pre-purchase usability benchmarking test in april 2012, before the library purchased encore, the library conducted a small usability study to serve as a benchmark. the study outlined in this paper follows the same basic outline, and adds a few questions. the purpose of the april study was to measure student perceived success and satisfaction with the current search system of books and articles appalachian uses compared with use of the implementation of encore discovery services at university of nebraska lincoln (unl). the methodology was four undergraduates completing a set of tasks using each system. two started with unl, and two started at appalachian’s library homepage. in the april 2012 study, the participants were three freshman and one junior, and all were female. all were student employees in the library’s mailroom, and none had received special training on how to use the library interface. after the students completed the tasks, they rated their experience using the system usability scale (sus). in the summary conclusion of that study, the average sus score for the library’s current search box layout was 62, and for unl’s encore search it was 49. 
even though none of the students was particularly familiar with the current library’s interface, it might be assumed that part of the higher score for appalachian’s site was simply familiarity. student comments from the small april benchmarking study included the following. the junior student said the unl site had "too much going on" and appalachian was "easier to use; more specific in my searches, not as confusing as compared to unl site." another student (a freshman), said she has "never used the library not knowing if she needed a book or an article." in other words, she knows what format she is searching for and doesn’t perceive a big benefit to having them grouped. this same student also indicated she had no real preference between appalachian or the unl. she believed students would need to take time to learn either and that unl is a "good starting place." usability test results for encore in an academic library | johnson 82 appendix b instructions for conducting the test notes: use firefox for the browser, set to “private browsing” so that no searches are held in the cache (search terms to not pop into the search box from the last subject’s search). in the bookmark toolbar, the only two tabs should be available “dev” (which goes to the development server) and “lib” (which goes to the library’s homepage). instruct users to begin each search from the correct starting place. identify students and faculty by letter (student a, faculty a, etc). script hi, ___________. my name is ___________, and i'm going to be walking you through this session today. before we begin, i have some information for you, and i'm going to read it to make sure that i cover everything. you probably already have a good idea of why we asked you here, but let me go over it again briefly. we're asking students and faculty to try using our library's home page to conduct four searches, and then ask you a few other questions. we will then have you do the same searches on a new interface. (note: half the participants to start at the development site, the other half start at current site). after each set of tasks is finished, you will fill out a standard usability scale to rate your experience. this session should take about twenty minutes. the first thing i want to make clear is that we're testing the interface, not you. you can't do anything wrong here. do you have any questions so far? ok. before we look at the site, i'd like to ask you just a few quick questions. what year are you in college? what are you majoring in? roughly how many hours a week altogether--just a ballpark estimate--would you say you spend using the library website? ok, great. hand the user the task sheet. do not read the instructions to the participant, allow them to read the directions for themselves. allow the user to proceed until they hit a wall or become frustrated. verbally encourage them to talk aloud about their experience. usability test results for encore in an academic library | johnson 83 written instructions for participants. find the a copy of the book the old man and the sea. in your psychology class, your professor has assigned you a 5-page paper on the topic of eating disorders and teens. find a scholarly article (or peer-reviewed) that explores the relation between anorexia and self-esteem. you are studying modern chinese history and your professor has assigned you a paper on foreign relations. find a journal article that discusses relations between china and the us. what is a topic you have written about this year? 
search for materials on this topic. usability test results for encore in an academic library | johnson 84 appendix c follow up questions for participants (or ask as the subject is working) after the first task (find a copy of the book the old man and the sea) when the user finds the book in appsearch, ask “would you know where to find this book in the library?” how much of the library’s holdings do you think appsearch/ articles quick search is looking across? does “peer reviewed” mean the same as “scholarly article”? what does the “refine by tag” block the right mean to you? if you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend? do you have any questions for me, now that we're done? thank subject for participating. usability test results for encore in an academic library | johnson 85 appendix d sample system usability scale (sus) strongly strongly disagree agree i think that i would like to use this system frequently 1 2 3 4 5 i found the system unnecessarily complex 1 2 3 4 5 i thought the system was easy to use 1 2 3 4 5 i think that i would need the support of a technical person to be able to use this system 1 2 3 4 5 i found the various functions in this system were well integrated 1 2 3 4 5 i thought there was too much inconsistency in this system 1 2 3 4 5 i would imagine that most people would learn to use this system very quickly 1 2 3 4 5 i found the system very cumbersome to use 1 2 3 4 5 i felt very confident using the system 1 2 3 4 5 i needed to learn a lot of things before i could get going with this system 1 2 3 4 5 comments: lib-mocs-kmc364-20140106083930 198 an algorithm for compaction of alphanumeric data william d. schieber, george w. thomas: central library and documentation branch, international labour office, geneva, switzerland description of a technique for compressing data to be placed in computer auxiliary storage. the technique operates on the principle of taking two alphabetic characters frequently used in combination and replacing them with one unused special character code. such une-for-two replacement has enabled the ilo to achieve a rate of compression of 43.5% on a data base of approximately 40,000 bibliographic records. introduction this paper describes a technique for compacting alphanumeric data of the type found in bibliographic records. the file used for experimentation is that of the central library and documentation branch of the international labour office, geneva, where approximately 40,000 bibliographic records are maintained on line for searches done by the library for its clients. work on the project was initiated in response to economic pressure to conserve direct-access storage space taken by this particularly large file. in studying the problem of how to effect compaction, several alternatives were considered. the first was a recursive bit-pattern recognition technique of the type developed by demaine ( 1,2), which operates mdependently of the data to be compressed. this approach was rejected because of the apparent complexity of the coding and decoding algorithms, and also because early analyses indicated that further development of the second type of approach might ultimately yield higher compression ratios. compaction of alphanumeric datajschieber and thomas 199 the second type of approach involves the replacement, by shorter nondata strings, of longer character strings known to exist with a high frequency in the data. 
this technique is data dependent and requires an analysis of what is to be encoded. one such method is to separate words into their component parts: prefixes, stems and suffixes; and to effect compression by replacing these components with shorter codes. there have been several successful algorithms for separating words into their components. salton ( 3) has done this in connection with his work on automatic indexing. resnikoff and dolby ( 4,5) have also examined the problem of word analysis in english for computational linguistics. although this method appears to be viable as the basis of a compaction scheme, it was here excluded because ilo data was in several languages. moreover, dolby and resnikoff's encoding and decoding routines require programs that perform extensive word analysis and dictionary look-up procedures that ilo was not in a position to develop. the actual requirements observed were twofold: that the analysis of what strings were to be encoded be kept relatively simple, and that the encoding algorithm must combine simplicity and speed presumably by minimizing the amount of dictionary look-up required to encode and decode the selected string. one of the most straightforward examples of the use of this technique is the work done by snyderman and hunt ( 6 ) that involves replacement of two data characters by single unused computer codes. however, the algorithm used by them does not base the selection of these two-character pairs (called "digrams") on their frequency of occurrence in the data. the technique described here is an attempt to improve and extend the concept by encoding digrams on the basis of frequency. the possibility of encoding longer character strings is also examined. three other related discussions of data compaction appear in papers by myers et al. (7) and by demaine and his colleagues (8,9). the compression technique the basic technique used to compact the data file specifies that the most-frequently occurring digrams be replaced by single unused specialcharacter codes. on an eight-bit character machine of the type used, there are a total of 256 possible character codes (bytes ) . of this total only a small number are allocated to graphics (that is, characters which can be reproduced by the computer's printer). in addition, not all of the graphics provided for by the computer manufacturer appear in the user's data base. thus, of the total code set, a large portion may go unused. characters that are unallocated may be used to represent longer character strings. the most elementary form of substitution is the replacement of specific digrams. if these digrams can be selected on the basis of frequency , the compression ratio will be better than if selection is done independent of frequency. 200 journal of library automation vol. 4/4 december, 1971 this requires a frequency count of all digrams appearing in the data, and a subsequent ranking in order of decreasing frequency. once the base character set is defined, and the digrams eligible for replacement are selected, the algorithm can be applied to any string of text. the algorithm consists of two elements: encoding and decoding. in encoding, the string to be encoded is examined from left to right. the initial character is examined to determine if it is the first of any encodable digram. if it is not, it is moved unchanged to the output area. if it is a possible candidate, the following character is checked against a table to verify whether or not this character pair can be replaced. 
if replacement can be effected, the code representing the digram is moved to the output area. if not, the algorithm then moves on to treat the second character in precisely the same way as the first. the algorithm continues, character-by-character until the entire string has been encoded. following is a step-by-step description of the element. 1) load length of string into a counter. 2) set pointer to first character in string. 3) check to determine whether character pointed can occur in combination. if character does not occur in combination, point to next character and repeat step 3. 4) if character can occur in combination, check following character in a table of valid combinations with the first character. if the digram cannot be encoded, advance pointer to next character and return to step 3. 5) if the digram is codable, move preceeding non-codable characters (if any) to output area, followed by the internal storage code for the digram. 6) decrease the string length counter by one, advance pointer two positions beyond current value and return to step 3. in the following example assume that only three digrams are defined as codable: ab, be and de. assume also that the clear text to be encoded is the six-character string abcdef. after encoding the coded string would appear as: ab c de f a horizontal line is used to represent a coded pair, a dot shows a single (non-combined) character. the encoded string above is of length four. note that although bc was defined as an encodable digram, it did not combine in the example above because the digram ab was already encoded as a pair. the characters c and f do not combine, so they remain uncoded. note also that if the digram ab had not been defined as codable, the resultant combination would have been different in this case: a bc de f compaction of alphanumeric data j schieber and thomas 201 the decoding algorithm serves to expand a compressed string so that the record can be displayed or printed. as in the encoding routines, decoding of the string goes from left to right. bytes in the source string are examined one by one. if the code represents a single character, the print code for that character is moved to the output string. if the code represents a digram, the digram is moved to the output string. decoding proceeds byte-by-byte as follows until end of string is reached: 1 ) load string length into counter. 2 ) set pointer to first byte in record. 3 ) test character. if the code represents a single character, point to next source byte and retest. 4) if the code represents a digram: move all bytes ( if any ) up to the coded digram; and move in the digram. 5) increase the length value by one, point to next source byte and continue with step 3. application of the technique the algorithm, when used on the data base of approximately 40,000 records was found to yield 43.5% compaction. the file contains bibliographic records of the type shown in figure 1. 413.5 1970 70al350 warner m stone m the data bank societyorganizations, computers and social freedom. london, george allen and unwin, <1970>. 244 p. charts. /social research/ into the potential thrf.at to privacy and freedom f/human right/sl through thf misuse of /data bank/s examines /computer/ based /information ---~ieval/, the impact of computer technology on branches of the /public administration/ ann /health service/$ in the /usa/ ano the /uk/ ano co~cluoes that, in order to protect human dignity, the new powers must be kept tn chf.ck. /bibliography/ pp. 236 to 242 ano /reference/$. 
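the encoding and decoding elements just described translate directly into a short program. the sketch below is an illustrative reimplementation in python, not the original assembler routine; it collapses the two-level table look-up into a single dictionary of codable digrams, uses the three toy digrams from the abcdef example, and adds a helper that performs the digram frequency count and ranking the method requires.

    from collections import Counter

    # toy table from the example above; a real table would hold the selected digrams,
    # each mapped to an otherwise unused one-byte code.
    DIGRAM_TO_CODE = {"ab": "\x80", "bc": "\x81", "de": "\x82"}
    CODE_TO_DIGRAM = {code: dg for dg, code in DIGRAM_TO_CODE.items()}

    def most_frequent_digrams(records, n):
        """count overlapping digrams across the file and keep the n most frequent."""
        counts = Counter()
        for rec in records:
            counts.update(rec[i:i + 2] for i in range(len(rec) - 1))
        return [dg for dg, _ in counts.most_common(n)]

    def encode(text):
        """greedy left-to-right replacement of codable digrams (encoding steps 1-6)."""
        out, i = [], 0
        while i < len(text):
            pair = text[i:i + 2]
            if pair in DIGRAM_TO_CODE:       # codable pair found
                out.append(DIGRAM_TO_CODE[pair])
                i += 2                       # skip past both characters
            else:
                out.append(text[i])          # single, non-combined character
                i += 1
        return "".join(out)

    def decode(coded):
        """expand a compacted string byte by byte (decoding steps 1-5)."""
        return "".join(CODE_TO_DIGRAM.get(ch, ch) for ch in coded)

    assert len(encode("abcdef")) == 4            # ab and de combine; c and f do not
    assert decode(encode("abcdef")) == "abcdef"

as in the worked example, bc never combines here because ab has already been encoded as a pair.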
engl fig. 1 . sample record from test file. each record contains a bibliographic se gment as well as a brief abstract containing descriptors placed between slashes for computer identification. a large amount of blank space appears on the printed version of these records; however, the uncoded machine readable copy does not contain blanks, except between words and as filler characters in the few fields defined as fixed-length. the average length of a record is 535 characters ( 10) . 202 journal of library automation vol. 4/4 december, 1971 the valid graphics appearing in the data are shown in table 1, along with the percentage of occurrence of each character throughout the entire file. table 1. single-character frequency freq. freq. freq. freq. freq. graphic % graphic % graphic % graphic % graphic % b 14.87 i 4.32 h 1.58 0.63 8 0.31 e 7.63 c 3.48 1.52 w 0.50 ( 0.28 n 6.38 l 3.32 ' 1.52 2 0.42 ) 0.28 i 6.01 d 2.32 1 1.08 k 0.42 + 0.21 a 6.01 u 2.21 v 0.91 3 0.40 j 0.15 (/j 5.86 p 2.12 b 0.87 5 0.37 x 0.14 t 5.50 m 2.02 9 0.83 7 0.37 z 0.13 r 4.82 f 1.61 y 0.82 0 0.35 q 0.08 s 4.61 g 1.58 6 0.81 4 0.34 misc. 0.01 spec. as might be expected, the blank (b) occurs most frequently in the data because of its use as a word separator. the slash occurs more frequently than is normal because of its special use as a descriptor delimiter. it should also be noted that the data contains no lower-case characters. this is advantageous to the algorithm because it considerably le~sens the total number of possible digram combinations. as a result, a larger proportion of the file is codable in the limited set chosen as codable pairs, and because the absence of 26 graphics allows the inclusion of 26 additional coded pairs. in the file used for compaction there are 58 valid graphics. allowing one character for special functions leaves 197 unallocated character codes (of a total of 256 possible ). a digram frequency analysis was performed on the entire file and the digrams ranked in order of decreasing frequency. from this list the first 197 digrams were selected as those which were eligible for replacement by single-character codes. table 2 shows these "encodable" digrams arranged by lead character. the algorithm was programmed in assembler language for use on an ibm 360/40 computer. the encoding element requires approximately 8,000 bytes of main storage; the decoding element requires approximately 2,000 bytes. in order to obtain data on the amount of computer time required to encode and decode the file, the following tests were performed. to find the encoding time, the file was loaded from tape to disk. the tape copy of the file was uncoded, the disk copy compacted. loading time for 41,839 records was 52 minutes and 51 seconds. the same tape to disk operation without encoding took 28:08. the time difference ( 24:43) represents encoding time for 41,839 records, or .035 seconds per record. a decoding test was done by unloading the previously coded disk file to tape. the time taken was 41:52, versus a time of 20:20 for unloading compaction of alphanumeric dataischieber and thomas 203 an uncompacted file. the time difference (21:32) represents decoding time for 41,839 records, or .031 seconds per record. the compaction ratio, as indicated above, was 43.5 per cent. for purposes of comparison, the algorithm developed by snyderman and hunt ( 6) was tested and found to yield a compaction ratio of 32.5% when applied to the same data file. table 2. most frequently occuring digrams lead char. 
a b c d e f g h i l m n 0 p r s t u v w y b 1 i ) eligible digrams ab ac ad ag ai al am an ap ar as at ab bl bo ca ce ch ci cl co ct cu cb c. dedi du db dl ea ec ed ef el em en ep er es et ev eb el fe fifo fr f~ ge gl gr gb gl ha he hi ho hb la ic ie il in 10 is it iv la le li ll lo lu us ma me mi mm mu mhs na nc nd ne ng ni no ns nt nla nl oc od of og ol om on op or ou ov ol,a pa pe pl po pr p. ra re ri rk rn ro rs rt ru ry rb rl sa se sl so sp ss st su shs s, s. ta tc te th ti to tr ts tu ty tb t i uc ud ul un ur us ut va ve vi wo yhs yl lisa hsb bc bd be hsg lal lal bm bn bo hip l;6r bs hit l;6u l;6w };6};6 l/j i l/j-. l/j ( 19 1 a ; c je 11 / l ; m jp jr ; s jt jb 1, ,b .l/j -b ), possible extension of the algorithm currently the compression technique encodes only pairs of characters. there might be good reason to extend the technique to the encoding of longer strings-provided a significantly higher compaction ratio could be 204 journal of library automation vol. 4/4 december, 1971 achieved without undue increase in processing time. one could consider encoding trigrams, quadrigrams, and up to n-grams. the english wo~d ·'the", for example, may occur often enough in the data to make it worth coding. the arguments against encoding longer strings are several. prime among these is the difficulty of deciding what is to be encoded. doing an analysis of digrams is a relatively straightforward affair, whereas an analysis of trigrams and longer strings is considerably more costly, because of the fact that there are more combinations. furthermore, if longer strings are to be en'coded, the algorithms for encoding and decoding become more complex and time-consuming to employ. one approach to this type of extension is to take a particular type of character string, namely a word, and to encode certain words which appear frequently. a test of this technique was made to encode particular words in the data: descriptors . all descriptors (about 1200 in number) appear specially marked by slashes in the abstract field of the record. each descriptor (including the slashes) was replaced by a two-character code. after replacement, the normal compaction algorithm was applied to the record. a compaction ratio of 56.4% was obtained when encoding a small sample of twenty records ( 10,777 characters). the specific difficulty anticipated in this extension is the amount of either processing time or storage space which the decoding routines would require. if the look-up table for the actual descriptor values were to be located on disk, the time to retrieve and decode each record might be rather long. on the other hand, if the look-up table were to be in main storage at the time of processing, its size might exclude the ability to do anything else, particularly when on-line retrieval is done in an extremely limited amount of main storage area. a partial solution to this problem might be to keep the look-up tables for the most frequently occurring terms in main storage and the others on disk. at present further analysis is being done to determine the value of this approach. conclusions the compaction algorithm performs relatively efficiently given the type of data used in text data base (i.e. data without lower case alphabetics, having a limited number of special characters, in primarily english text ). the times for decoding individual records ( .031 sec/ record ) indicate that on a normal print or terminal display operation, no noticeable increase in access time will be incurred. 
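the descriptor-replacement extension described above can be layered in front of the digram routine sketched earlier. the fragment below is a hypothetical illustration: the descriptor table entries are invented (a real table would hold roughly 1,200 descriptors), and the ordinary digram encoder from the previous sketch is assumed to run afterwards.

    import re

    # hypothetical entries; each slash-delimited descriptor maps to a two-character code.
    DESCRIPTOR_TO_CODE = {
        "/computer/": "\x01a",
        "/data bank/": "\x01b",
        "/human right/": "\x01c",
    }

    def replace_descriptors(record):
        """replace each slash-delimited descriptor with its two-character code."""
        swap = lambda m: DESCRIPTOR_TO_CODE.get(m.group(0), m.group(0))
        return re.sub(r"/[^/]+/", swap, record)

    # the normal digram pass (encode() from the earlier sketch) is then applied:
    # compacted = encode(replace_descriptors(abstract_text))

decoding would reverse the two substitutions in the opposite order, which is where the look-up-table storage question raised above becomes the practical constraint.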
however several types of problems are encountered when treating other kinds of data. since the algorithm works on the basis of replacing the most-frequently occurring n-grams by single-byte codes, the compaction ratio is dependent on the number of codes that can be "freed up" for n-gram representation. the more codes that can be reallocated to n-grams, the better the compaction. data which would pose complications to the algorithm-as currently defined-can be separated for discussion as follows: compaction of alphanumeric datajschieber and thomas 205 1) data containing both upper and lower case characters (as well as a limited set of special characters), and 2) data which might possibly contain a wide variety of little-used special graphics. if lower-case characters are used, a possible way to encode data using this technique is to harken back to the time-honored method of representing lower-case with upper-case codes, and upper-case characters by their value, preceeded by a single shift code (e.g., #access for access). the shift code blank character digram would undoubtedly figure relatively high on the frequency list, making it eligible as an encodable digram. the second problem occurs when one attempts to compact data having a large set of graphics. a good example of this is bibliographic data containing a wide variety of little-used characters of the type now being provided for in the marc tapes ( 11) issued by the u. s. library of congress (such as the icelandic thorn). normally representation of these graphics is done by allocating as many codes as required from the possible 256-code set. since the compaction ratio is dependent on the number of unallocated internal codes, a possible solution to this dilemma might be to represent little-used graphics by multi-byte codes which would free the codes for representation of frequently occurring n-grams. further, it is noticeable that the more homogeneous the data the higher the compression ratio. this means that data all in one language will encode better than data in many languages. there is, unfortunately, no ready solution to this problem, given the constraints of this algorithm. in dealing with heterogeneous data one must be prepared to accept a lower compression factor. without doubt to be able to effect a savings of around 40% for storage space is significant. the price for this ability is computer processing time, and the more complex the encoding and decoding routines, the more time is required. there is a calculable break-even point at which it becomes economically more attractive to buy x amount of additional storage space than to spend the equivalent cost on data compaction. yet at the present cost of direct-access storage, compaction may be a possible solution for organizations with large data files. references 1. marron, b. a.; demaine, p. a. d.: "automatic data compression," communications of the acm, 10 (november 1967), 711-715. 2. demaine, p. a. d.; kloss, k.; marron, b. a.: the solid system iii: alphanumeric compression. (washington, d. c. : national bureau of standards, 1967 ) . (technical note 413 ) . 3. salton, g.: automatic information organization and retrieval (new york: mcgraw-hill, 1968 ). 4. resnikoff, h. l.; dolby, j. l.: "the nature of affixing in written english," mechanical translation, 8 (march 1965), 84-89. 206 journal of library automation vol. 4/4 december, 1971 5. resnikoff, h . l.; dolby, j. l.: "the nature of affixing in written english," mechanical translation, 9 (june 1966), 23-33. 6. 
snyderman, martin; hunt, bernard: "the myriad virtues of text compaction," datamation (december 1, 1970), 36-40. 7. myers, w.; townsend, m.; townsend, t.: "data compression by hardware or software," datamation (april 1966), 39-43. 8. demaine, p. a. d.; kloss, k.; marron, b. a.: the solid system ii. numeric compression. (washington, d. c.: national bureau of standards, 1967). (technical note 413 ). 9. demaine, p. a. d.; marron, b. a.: "the solid system i. a method for organizing and searching files." in schecter, g. (ed.): information retrieval-a critical view. (washington, d. c.: thompson book co., 1967). 10. schieber, w.: isis (integrated scientific information system; a general description of an approach to computerized bibliographical control). (geneva: international labour office, 1971) . 11. books: a marc format; specification of magnetic tapes containing monographic catalog records in the marc ii format. (washington, d. c.: library of congress, information systems office, 1970.) 10 high school library data processing betty flora: librarian, leavenworth high school, leavenworth, kansas and john willhardt: data processing instructor, central missouri state college, warrensburg, missouri. planning and operation of an automated high school library system is described which utilizes an ibm 1401 data processing system installed for teaching purposes. book ordering, shelf listing and circulation have been computerized. this paper presents an example of a small automated high-school library system which works efficiently. a great deal of emphasis to date in library automation has been on large university and college libraries, but the relatively few schools that have pioneered in the field of school library automation have demonstrated its feasibility and its potential. data processing is economically within the realm of large and medium-sized school districts. the port huron district, port huron, michigan, has an accounting machine, keypunch and verifier; among the operations performed are printing purchase orders and book cards. the port huron staff consists of one professional librarian, two clerks and two part-time working students. evanston township high school, evanston, illinois, has an automated library system processed with an ibm 1401 computer. other high schools using library data processing are the oak park-river forest high school in illinois; beverly hills, california; west hartford, connecticut; weston, massachusetts; and the burnt hills-ballston lake and bedford-mt. kisco school districts in new york state (1). there are a small number of high schools and vocational schools in kansas and missouri that have high school library edp/flora and willhardt 11 data processing equipment which is used for teaching purposes. names and addresses of these schools may be obtained from the missouri director of vocational education at jefferson city, missouri, and from the kansas state supervisor of technical training at topeka, kansas. introduction leavenworth senior high school, leavenworth, kansas, a campusstyle school comprising six buildings, has approximately 1350 students. the library, located in the main academic building, is presently being remodeled and enlarged. it contains approximately eighteen thousand volumes, including the professional collection; and fifteen hundred to two thousand new volumes are added each year. 
the library staff consists of one qualified librarian, two full-time clerical assistants, and twenty student assistants, each of the latter working one class period a day. the library is, in the true sense of the term, a media center. a mobile listening center is available, and there are large collections of recordings, cartridge and reel tapes, film strips, films, microfilms, reproductions of paintings, educational games, magazines and vertical file material. fortunately, there is a consistently substantial budget of more than eight dollars per student, including some federal funds, which makes additions to the collection possible in stable development. · data processing at leavenworth high school was made possible by the vocational education act of 1963, which provided for the secretary of health, education, and welfare to enter into agreements with the several state vocational education agencies to provide such occupational training as found to be necessary by the secretary of labor (2). under the provisions of the act, federal money is alloted to the states, which in turn allot a portion of this money to various school districts; a school system receiving such money must lease or purchase data processing equipment and use it mainly for teaching purposes. a data processing curriculum was initiated in the school year 1964-65 at leavenworth high school, under conditions and regulations set up by the state supervisor of technical training which gave first priority in the use of the data processing equipment to teaching. this has been adhered to strictly at leavenworth high school; the equipment is used over half of the school day for teaching purposes and adult education courses in data processing are offered at night. class time consists of lecture and application, with students having opportunity to operate, wire, program and test problems. data processing classes are scheduled first in the computer room; administrative and library operations are scheduled to be processed in the remaining hours during the school day and after school, each operation being assigned a specific time. although unit record equipment was initially leased, plans for a small computer were included in the original decision to offer data processing courses. equipment, plus salaries to those conducting the program, con12 1 ournal of librm·y automation vol. 2/1 march, 1969 stitute a major investment for a medium-sized public high school. consequently, although the classes are a valuable addition to the vocational training area of the curriculum, as many applications as possible are made of school operations, such as enrollment, record keeping, grade reports and payroll, in order to further justify the cost. for this reason, the superintendent of the leavenworth school system suggested that the library might, by using data processing in many of its procedures, both support the data processing instructional program and increase its own effectiveness. methods and materials to develop a system requires systems analysis, which necessitates a clear formulation of purposes and requirements independent of any particular design for implementation ( 3); and the development of procedural applications to be processed on a computer system should be a joint responsibility of both the systems staff and line management ( 4). furthermore, any conversion of library procedures to automation should be carefully planned in advance. 
proceeding in the fullest cooperation with a view to mutual benefits, the librarian at leavenworth and the head of data processing spent many hours working out the details of their joint effort. the librarian explained her needs and suggested methods of achieving the desired objectives. for his part, the head of data processing evaluated the possibilities from a technical point of view and suggested methods of achieving the desired objectives. together they worked out an initial plan, and the various phases were then programmed. the leavenworth data processing library system was set up to 1) order all new library books; 2) complete shelf cards and book checkout cards; 3) run shelf card listings; 4) correct and file shelf cards; 5) reproduce book checkout cards for books checked out; 6) run first and second overdue notices; and 7) provide library inventory, book count lists and book catalogs. all the lists, notices, and reproduced cards are done on the 1401 computer; computer programs for these operations are written in autocoder. the amount of computer time required for the processing of library data and reports is comparatively small in relation to other operations of the data processing department and was set up to run partly in the daily schedule and partly after school. time required for preparation of information for the computer is significant and must be scheduled more carefully. again, part of this time is fitted into the daily schedule and part of it is accomplished after classes. the high school leases the following ibm data processing equipment: two 024 card punches, one 026 printing card punch, one 082 sorter, one 548 interpreter, one 085 collator, and one 1401 computer with 4k and one disk storage drive. the 1401 computer consists of the 1401 central processing unit, a 1402 card reader punch, a 1403 printer and a 1311 disk storage drive. high school library edp /flora and willhardt 13 the following cards were developed for the procedure: shelf card a is punched from lists of books to be ordered and only the following information and columns are punched: author name (columns 14-35), title (columns 36-71), copyright date (columns 72-73), and purchase date (columns 79-80). when the book is received, this card is completed with the following information: shelf letter (column 1), dewey decimal number (columns 2-7), author number (columns 8-13) and accession number (columns 7 4-78). shelf card b is punched and filed behind shelf card a. only the following information and columns are punched: price (columns 8-13), publisher (columns 36-65), and an x-punch in column 80. the book checkout card (figure 1) is first reproduced from the completed shelf card a, and after that from book checkout cards when books have been checked out of the library. this card contains the shelf letter (column 1), dewey decimal number (columns 2-7), author number (columns 8-13), author name (columns 14-30), title (columns 31-66), student number (columns 68-73), accession number (columns 7 4-78), and an x-punch in column 80. !look titlt author i accept responsibility for this book ano should this book be lost, destroyed or stolen while checked out to me, i will pay the replacement cost of the book, i agree to pay nie fin~ for overdue book$ as follows: i to 5 days overdue 2¢ per day 6 to 10 da'is overdue !5¢ per day over 10 days ~l:rt!v£ 10¢ per da't ---=moc.,=-=t '""':=::-'""' __ ] fig. 1. book checkout card. 
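because the card formats above are fixed-column, the fields can be recovered by simple slicing. the sketch below is illustrative only; the dictionary keys are descriptive labels, not anything from the original autocoder programs. it reads an 80-column image of a completed shelf card a, shifting from the 1-based card columns given in the text to python's 0-based string indexes.

    def parse_shelf_card_a(card):
        """slice the fields of a completed shelf card a out of an 80-column card image."""
        card = card.ljust(80)                                      # pad short images to a full card
        field = lambda first, last: card[first - 1:last].rstrip()  # 1-based, inclusive columns
        return {
            "shelf_letter":   field(1, 1),    # b, k, p, r, s, or blank for fiction
            "dewey_number":   field(2, 7),
            "author_number":  field(8, 13),
            "author_name":    field(14, 35),
            "title":          field(36, 71),
            "copyright_date": field(72, 73),
            "accession_no":   field(74, 78),
            "purchase_date":  field(79, 80),
        }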
student tivmdir i a student finder card locates the student's name and parent's name and address on the computer disk pack. the biggest initial task was keypunching an ibm card for each book in the library, which at that time comprised 13,000 books. it was done by data processing students in the high school, working occasionally during class, but mostly after class and on saturdays on a voluntary basis. toward the end of the second semester, many of the procedures had been reviewed and discussed with students in the data processing classes as part of the vocational program. 14 journal of library automation vol. 2/1 march, 1969 aondb card i nte'rpret cards run listing on 1401 re-:-run list fig. 2. book order procedure. interpret book checkou cards interpret shelf card reproduce shelf card into book checkout card fig. 3. new book p1'0cessing. high school library edp/flora and willhardt 15 r ecej ve book · checkout cards from library reproduce new book checkout cards interpret new book checkout cards return old and new book check: out cards to ll brary fig. 4. book checkout procedure. cards for overdue books send finder cards to data processing run address labels return labels and finder cards to library fig. 5. overdue notice procedure. 16 journal of library automation vol. 2/1 march, 1969 book order (figure 2) the library furnishes the data processing department with request cards or lists of books to be ordered, giving author name, title, copyright date, price, publisher, and purchase date (year). data processing punches two cards for each book according to shelf cards a and b. these cards and batches must be kept in the order received from the library. the cards are interpreted, checked for correct punching and listed by batch. the library must check the number of copies ordered and the total amount of each group or batch. after verification and corrections, the cards are returned to data processing for rerunning of the number of copies necessary to send with the purchase order. new book processing (figure 3) when new books are received, the library staff discards shelf card b and writes the following information on shelf card a for punching in the columns indicated: shelf letter in column 1 ( b for biography, k for kansas, p for professional, r for reference, s for story collection, or a blank which indicates fiction); dewey decimal number in columns 2-7; author number in columns 8-13; and accession number in columns 74-78. these columns are interpreted on the 548 interpreter. shelf card a is used to reproduce the book checkout card. shelf cards are block sorted on column 1; each group is then sorted by author number and dewey decimal number. individual cards must be hand filed into the shelf list. the shelf list can be used to provide classification listings, inventory listings, library book counts and book catalogs. book checkout cards are interpreted, sorted by author name (columns 14-23 alpha), returned to the library and filed in the respective books. book checkout card reproduction (figure 4) as books are checked out of the library, the book checkout card (figure 1) is signed by the student and his number is written on it. once a week accumulated book checkout cards are sent to data processing to be reproduced into new book checkout cards which are interpreted and merged behind the old book checkout cards. each week's cards are kept separately. the old cards are for books due in the library in two weeks. 
the new cards are inserted in these books as they are returned and the old ones placed in a separate file for library circulation statistics. overdue notices (figure 5) the library is provided with a deck of student finder cards (on~ for each student), with student name, number and finder number on the card and in the address file on the disk. when books are overdue, finder cards are pulled by the library staff and sent to data processing,. where they are sorted by a disk accession number. address labels are run on high school library edp/flora and willhardt 17 the 1401 computer for those students with overdue books. these labels are presently attached to pre-printed envelope overdue notices (figure 6), but it is planned to replace the envelope with a continuous-form post card. the first notice is addressed to the student at his home and the second to his parents. leavenworth senior high school library if you have returned your overdue library materials disregard this notice •••• if not, please come to the library at your earliest convenience. 1 to 5 days overdue..................... 2¢ per day 6 to 10 days overdue ..................... 5¢ per day over 10 days overdue ..................... lo¢ per day fig. 6. overdue notice. discussion the book checkout and overdue notices procedures were the first concrete ones developed. these were initiated during the 1965-66 school year, and have proved to be quite successful in saving time and effort. one of the most useful purposes of the leavenworth system is that any portion of the shelf list can be easily provided for an instructor who wishes to assign special readings. also the system has simplified and accelerated preparation of lists for inventory purposes. the ordering process gives the librarian the opportunity to check the order lists before forwarding them to the business manager; this improves the accuracy of the order. 18 i ournal of library automation vol. 2/1 march, 1969 standardization of procedure and operation is essential for efficiency ( 5,6). basically the leavenworth procedure utilized two types of cards, sheh card a and the book checkout card, which are very similar in format. sheh card a is initiated when ordering books and is used to reproduce book checkout cards and to make sheh listings, inventory and book count listings. moreover the system was designed on the basis of having a minimum of skilled clerical workers. student help is used for correcting and filing sheh cards. the ability to provide a book catalog in the future is an advantage. a book catalog need not be confined to one area and may be done in multiple copies. different editions of a work may be more readily seen and compared on a printed page than in a card catalog, where only one entry can be examined at a time. also a book catalog may concentrate in a single easily handled volume entries which would occupy several heavy drawers in a card catalog (7). one of the problems associated with developing a system like the one here described is that of communication. as in all technical and professional areas, a specialized terminology develops, a kind of esoteric jargon which confuses meanings and impedes understanding. this difficulty naturally diminishes as each party to the cooperative effort becomes more familiar with the terminology of the other, and a little plain talking and clear thinking will soon eliminate it. 
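the fine schedule printed on the overdue notice above amounts to a small rate table. the notice does not say whether a single rate applies to every day overdue or whether the rates are marginal by bracket; the sketch below assumes the simpler reading, one rate chosen by the total number of days overdue.

    def overdue_fine_cents(days_overdue):
        """fine in cents, assuming one rate applies to all days overdue."""
        if days_overdue <= 0:
            return 0
        if days_overdue <= 5:
            rate = 2        # cents per day, 1 to 5 days overdue
        elif days_overdue <= 10:
            rate = 5        # cents per day, 6 to 10 days overdue
        else:
            rate = 10       # cents per day, over 10 days overdue
        return rate * days_overdue

    print(overdue_fine_cents(7))   # -> 35 cents under this reading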
the effectiveness of an automated library program depends, of course, upon the unqualified cooperation between the library and the data processing department. the librarian must establish a reasonable and acceptable schedule of work upon which the data processing department can depend, and she must assure that library material essential to that work is delivered according to schedule. conversely, the data processing department must undertake to complete the work promptly and accurately. evaluation certainly one of the most significant benefits of automation is the great saving of time. tedious and detailed tasks essential to the efficient operation of any library, tasks which formerly required many hours to complete and which had by their natures to be repeated periodically, are accomplished in a fraction of the time. consequently, the librarian is freed for more professional work; most importantly, she has more time to give to the students and their problems, which should be, above all, her first concern. the value of the leavenworth high school library system lies not only in greater accuracy and saving of time for the librarian and he~ staff, but also in the opportunity it provides for student help to learn and operate a system. it is apparent, finally, that automation, properly applied, can be an invaluable asset to the school library. like all systems it depends, in the high school library edp /flora and willhardt 19 final analysis, upon the human factors involved. so long as interests are mutual, and so long as efforts are equal, the library and data processing departments can work effectively together for the benefit of both. acknowledgments mr. jack spear, ksu, manhattan, kansas, advised on the initial planning of the system. the authors received cooperation and encouragement from mr. gordon yeargan, superintendent of schools in leavenworth, and mr. dino spigarelli, principal of leavenworth high school. mr. fred buis, data processing instructor at the high school, helped with the preparation of this paper and is continuing to develop the potential of the system. references 1. mccusker, sister mary lauretta: "implications of automation for school libraries part 2," school libraries, (fall, 1968), 15-22. 2. united states department of health, education and welfare: vocational and technical education (washington: government printing office, 1964). 3. markuson, barbara evans, ed.: libraries and automation (washington: library of congress, 1964). 4. elliott, orville c.; wesley, roberts.: business information processing systems (homewood, illinois: richard d. irwin, inc~ , 1968). 5. laden, h. n.; gildersleeve, t. r.: system design for computer applications (new york: john wiley & sons, inc., 1963). 6. dougherty, richard m.: "manpower utilization in technical services," library resources and technical services, 12 (winter, 1968), 79-80. 7. kingery, robert e.; tauber, maurice f., eds.: book catalogs (new york: the scarecrow press, inc., 1963). 112 journal of library automation vol. 14/2 june 1981 anyway because he is primarily getting suggested classification numbers in order to browse. the tucson public library could not have made the above decisions if it did not have a complete online file of all its holdings (including even reference materials that never circulate). 
but since this data did exist (after a five-year bar-coding effort) and since more than forty online terminals were already in place throughout the library system to access the online file, the decision not to include locations or holdings in the microform catalog seemed reasonable . in the longer-range future (1990?), it is very likely that the entire catalog will be available online. in the meantime, the tucson public library did not want to divide its resources maintaining two location records, but rather wanted to concentrate resources in maintaining one accurate record of locations available as widely as possible throughout the library system (by installing more online terminals for staff and public use). was this decision a sound one? we don't know. the microform catalog has not yet been introduced for public use. by the end of this year we should have some preliminary answers to this question. references 1. robin w. macdonald and j. mcree elrod, "an approach to developing computer catalogs," college & research libraries 34:202--8 (may 1973). a structure code for machine readable library catalog record formats herbert h. hoffman: santa ana college, santa ana, california. libraries house many types of publications in many media, mostly print on paper, but also pictures on paper, print and pictures on film, recorded sound on plastic discs, and others. these publications are of interest to people because they contain recorded information. more precisely said, because they contain units of intellectual, artistic, or scholarly creation that collectively can be called "works." one could say simply that library materials consist of documents that are stored and cataloged because they contain works. the structure of publications into documents (or "books") and works, the clear distinction between the concept of the information container as opposed to the contents, deserves more attention than it has received so far from bibliographers and librarians. the importance of the distinction between books and works has been hinted at by several theoreticians, notably lubetzky. however, the idea was never fully developed. the cataloging implications of the structural diversity among documents were left unexplored. as a consequence, librarians have never disentangled the two terms book and work . from the paris principles and the marc formats to the new second edition of the anglo-american cataloguing rules, the terms book and work are used loosely and interchangeably, now meaning a book, now a work proper, now part of a work , now a group of books. such ambiguity can be tolerated as long as each person involved knows at each step which definition is appropriate when the term comes up. but as libraries ease into the age of electronic utilities and computerized catalogs based on records read by machine rather than interpreted by humans, a considerably greater measure of precision will have to be introduced into library work. as one step toward that goal an examination of the structure of publications will be in order. the items that are housed in libraries, regardless of medium, are of two types. they are either single documents, or they are groups of two or more documents. items that contain two or more documents are either finite items (all published at once, or with a first and a last volume identified) or they are infinite items (periodicals, intended to be continued indefinitely at intervals). 
schematically, these three types of bibliographic items in libraries can be represented as shown in figure l. it should be noted that all publications, all documents, all bibliographic items in lid d ... d do __ _ fig. 1. three types of bibliographic items: top, single-document item; center, finite multiple-document item; bottom, infinite multipledocument item. braries, can be assigned to one of these three structures. there are no exceptions. all bibliographic items, furthermore, contain works. an item may contain one single work. but an item may also contain several works. schematically, the two situations can be represented as shown in figure 2. an item that is composed of several documents and contains several works may have one work in each document, or several per document. schematically, the two possibilities can be represented as shown in figure 3. it is possible, of course, for an item to fig . . 2. top, single-work document (example: a typical novel); bottom, multiple-work document (example: a collection of plays). communications 113 fig. 3. top, one work per document; bottom, several works per document . be composed of several documents but to contain only one work. figure 4 is a schematic representation of this case. mixed structures are also possible, as in the schematic shown in figure 5. ign oring the mixed structure that is only a combination of two "pure" structures, the foregoing information can be combined into a table that shows seven possible publication types that differ from each other in terms of structure (figure 6). all bibliographic items, whether composed of one document or many, are known by a title . these titles can be called item titles. in the case of a singledocument item (structures a and c), item title and document title are, of course, identical. but in the case of some multiple-document items (publications of types d, e, f, and g, for example), two possibilities exist: the documents that make up the item may or may not have their own individual document titles. for purposes of fig. 4. multivolume work (example: a very long novel in two volumes). fig. 5. finite multi-document item containing many works, mixed structure. 114 journal of library automation vol. 14/2 june 1981 one several documents document per item per item one \.jork per item a se veral several lo/orks works per item per c document one lo/ork per document fig . 6. publication types. the bibliographer or cataloger, items that consist of several documents bearing individual document titles can be described under one of two principles. the entire item can be treated as a unit. elsewhere i have coined a term for this treatment: the set description principle .1 but it is also possible to treat each document as a separate publication, to describe it under the book description principle . if we combine all these considerations we find that we can assign to each bibliographic item that is added to a library's collection one of the thirteen codes shown in figure 7. how can these codes be useful? taking a look into the future, let us imagine an online catalog system supported by a database that contains the records of a library's holdings . the records in such a database are entered in a definite format . in this format, whatever it will be called , there will be data fields for titles, authors, physical descriptions , subject headings, document numbers, and much else. i propose that to these fields one other be added: the structure code . 
the structure code would add a new dimension to the retrieval of recorded infinite infinite b d e f g formation. here are a few specific examples . consider a search for material on subject x. qualify the search argument by structure codes 1, 3, 7, and 12. result: the search will yield only major monographic works, defined as items of types a, b,f, and g. note that subject x assigned to such items is a true subject heading. the materials retrieved in this example would all be works dealing specifically with the topic x. but the same term assigned to an item coded, say, 6, would not be a true subject heading. the term here would only give a broad general summary of what the works in the item are about. the structure code adds sophistication to the retrieval process by enabling a searcher to distinguish between specific subject designators and mere summary subject headings. a search that excludes codes 2, 4, 5, and 6 limits output to materials that are not just collections of essays. the stratagem used in card catalogs to reach the same result is the qualification of a subject heading by terms denoting format, such as the subdivisions congresses or addresses, essays, lectures . this method of qualifying subject headings has never been done communications ll5 structure code publication type description principle: book (b) or set (s) schematic 1 a 2 c 3 b 4 d 5 d 6 d 7 f 8 f 9 e 10 e 11 e 12 g 13 g fig. 7. structure codes . consistently , however . the proposed structure code would ensure uniform treatment of all affected publications. qualify the search by codes 9, 10, 11, 13 and all periodicals can be excluded . in the card catalog, format qualifications such b b s b s, with individual document title s, without indiv . document title b s b s, with individual . document title s, without indiv. document title b s fwli ___ wgj ~--~ as periodicals, or societies, periodicals, etc ., or yearbooks are sometimes added to subject headings to reach similar results. again, the structure code would introduce uniformity and consistency. present-day card catalogs list publica116 journal of library automation vol. 14/2 june 1981 tions only. they do not list the individual works that may be contained in publications. if an analytic catalog were to be built into a computerized system at some time in the future , the structure code would be a great help in the redesign, because it makes it easy to spot items that need analytics, namely those that contain embedded works, or codes 2, 4, 5, 6, 8, 9, 10, 11, and 13. a searcher working with such an analytic catalog could use the code to limit output to manageable stages-first all items of type c, for example; then broadening the search to include those of type d; and so forth, until enough relevant material has been found. the structure code would also be useful in the displayed output. if codes 5 or 8 appeared together with a bibliographic description on the screen, this would tell the catalog user that the item retrieved is a set of many separately titled documents. a complete list of those titles can then be displayed to help the searcher decide which of the documents are relevant for him. in the card catalog this is done by means of contents notes . not all libraries go to the trouble of making contents notes, though, and not all contents notes are complete and rtliable . the structure code would ensure consistency and completeness of contents information at all times. 
codes 10 and 13 in a search output, analogously, would tell the user that the item is a serial with individual issue titles. there is no mechanism in the contemporary card catalog to inform readers of those titles. codes 4 and 7 would tell that the document is part of a finite set, and so forth. it has been the general experience of database designers that a record cannot have too many searchable elements built into its format. no sooner is one approach abandoned "because nobody needs it," than someone arrives on the scene with just that requirement. it can be anticipated, then, that once the structure code is part of the standard record format, catalog users will find many other ways to work the code into search strategies. it can also be anticipated that the proposed structure code, by adding a factor of selectivity, will help catalogers because it strengthens the authority-control aspect of machine-readable catalog files. if two publications bear identical titles, for example, and one is of structure 1, the other of structure 6, then it is clear that they cannot possibly be the same items. however, if they are of structures 1 and 7, respectively, extra care must be taken in cataloging, for they could be different versions of the same work. determination of the structure of an item is a by-product of cataloging, for no librarian can catalog a book unless he understands what the structure of that book is-one or more works, one or more documents per item, open or closed set, and so forth . it would therefore be very cheap at cataloging time to document the already-performed structure analysis and express this structure in the form of a code. references l. herbert h. hoffman, descriptive cataloging in a new light: polemical chapters for librarians (newport beach, calif.: headway publications, 1976), p.43. revisions to contributed cataloging in a cooperative cataloging database judith hudson: university libraries , state university of new york at albany. introduction oclc is the largest bibliographic utility in the united states. one of its greatest assets is its computerized database of standardized cataloging information . the database, which is built on the principle of shared cataloging, consists of cataloging records input from library of congress marc tapes and records contributed by member libraries. oclc standards ln. order to provide records contributed by member libraries that are as usable as those input from marc tapes, it is immitchell multimedia will have a profound effect on libraries during the next decade. this rapidly developing technology permits the user to combine digital still images, video, animation, graphics, and audio. it can be delivered in a variety of finished formats, including streaming video on the web, video on dvd/vcd, embedded digital objects within a web page or presentation software such as powerpoint, utilized within graphic designs, or printed as hardcopy. this article examines the elements of multimedia creation, as well as requirements and recommendations for implementing a multimedia facility in the library. t he term multimedia, which some may remember being used in the early 1970s as the name for slide shows set to music, now is used to describe “a number of diverse technologies that allow visual and audio media to be combined in new ways for the purpose of communicating.”1 almost all personal computers sold today are capable of viewing multimedia; many can, with minor modifications, also create multimedia. 
one of the most important features of multimedia is its flexibility. multimedia creation has several distinct elements—inputs, processes performed on those inputs, and outputs (see figure 1). each element can be described as follows. � inputs—new video can be recorded, or existing video, stored on a hard disk, cd/dvd, or tape can be imported. the same is true of audio, with the added flexibility of creating soundtracks or sound effects later, during the editing process. digital still images can be used, either shot on a camera or created by scanning an existing picture. digital artwork or animated sequences created in other software also can be brought in. � processing—regardless of the source, these digital inputs are loaded into the editing software. at this stage, the user will select and arrange the images and sounds, and the software may permit special effects to be created. in addition, the editing software may compress the file so that it is easier to use than the large file sizes used in raw video and audio recording. � outputs—at this point, the user has more choices to make. the new multimedia file can be sent to a program that will encode it for a streaming video in any one of a variety of popular formats, such as windows media, realmedia, or clipstream. then it can be mounted on a web site (either a regular page or within courseware such as webct or blackboard), or the file could be burned onto a cd or dvd, or it could be used within presentation software such as microsoft powerpoint. or the output file from the editing process could be encoded and embedded so that it is an avatar running as part of a web page with a product such as rovion bluestream. the possibilities are nearly endless. all of this is made possible by advances in technology on a variety of fronts. one of the happy anomalies in technology is that greater performance is frequently accompanied by lower costs. this is certainly the case with much of the activity surrounding multimedia. the following factors have fostered advances in multimedia: � increase in processing power and decrease in cost of computer hardware; � quality and affordability of video equipment; � compression of multimedia files; � consumer broadband internet access; and � current multimedia editing software the first two technology factors concern the equipment involved in multimedia production. leading off is the familiar, ever-increasing speed of processors and improved memory and hard-drive space, all delivered for less money. this trend is something that many people take for granted, but a reality check is sometimes in order. the processor in the typical desktop machine on advertised special today is approximately forty-four times as fast as the first pentium processor sold ten years ago, and is equipped with sixteen times as much ram and 117 times as much hard-drive space—at 20 percent of the cost of the old machine (not even adjusted for inflation!). the second factor is the incredible quality available in consumer-market video equipment at reasonable costs. while the images produced with consumer-grade video would not play well at the local megaplex movie theater, they look very good on the small screens found on computers, televisions, and classroom projectors. the third factor is that tremendous compression of multimedia files can be achieved during the editing process. 
an incoming raw-video file (in the standard .avi format) can be compressed with editing, encoding, and dedicated third-party compression software to an incredible 1 to 2 percent of its original size, and it will still retain very good quality as a digital object on the web and in other desktop viewing applications. the fourth factor is extremely critical for the success of multimedia web applications. home access is shifting away from dial-up access to broadband, with its greatly increased transfer rates. half of all united states homes with internet access are already using broadband, and the forecast is for steady increase in these numbers.2 although not all broadband is created equal, it is all significantly faster than dial-up access. the final technology factor concerns the software that is currently available to the multimedia web developer. a developer can achieve some quite professional results with even the most basic products, and then can grow into more complex software that supports increasing levels of expertise. once again, this software is being sold in the price range that typical consumers can afford.
gregory a. mitchell (mitchellg@utpa.edu) is assistant director, resource management at the university of texas—pan american library, edinburg, texas.
distinctive expertise: multimedia, the library, and the term paper of the future. gregory a. mitchell.
small really is beautiful
creating a multimedia lab in the library need not be a large, complex undertaking. in fact, it can be very low cost and as simple as a single workstation. so it is scalable, allowing the library to start small and build in complexity and cost as time, money, and human resources will permit. at the bare-bones minimum, a multimedia lab would consist of a workstation with the software necessary for acquiring, editing, and outputting the files. for practical purposes, though, the workstation should be equipped with a network connection, a cd/dvd burner, a scanner, and a webcam with microphone. another very useful option is an analog-digital bridge device, which enables the capture of analog input (such as vhs tape) into digital files for the editor. to achieve better-quality video when shooting original content, a digital-video camera, tripod, wireless microphone, and portable light kit would be recommended. since more time typically is spent at the editing station than with the camera, the lab can be expanded with additional workstations before investing in another camera. experience at the author's institution has shown that it is possible to operate a lab with ten workstations and only three video cameras and three still cameras. finally, output from the editing process will likely be printed, so a photo-quality printer is another convenient option. this illustrates that the entry into multimedia work need not be a large expense, especially if an existing workstation and any other equipment are already available. if a fairly recent workstation is available to dedicate to the project, the library's total startup cost could range from $200 to $1,000. not many new library services can be launched for as little as that. rather than dwell on equipment specifications, as that is not the intent of this discussion, the reader may consult the excellent tutorials available from desktop video and pc magazine's online product guide.3
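when planning a lab at this scale, it can also help to rough out how much disk space and transfer time finished projects will need. the following back-of-the-envelope python sketch works through the roughly 1 to 2 percent compression figure cited above; the raw-capture size, clip length, and connection speeds are illustrative assumptions, not measurements from this article:

# rough estimate of encoded clip size and web-transfer time.
# all constants below are illustrative assumptions, not figures from the article.
RAW_MB_PER_MINUTE = 200        # assumed size of one minute of raw .avi capture
COMPRESSION_RATIO = 0.015      # "1 to 2 percent of its original size" -> use 1.5 percent

def compressed_size_mb(minutes):
    """estimated size of an encoded clip, in megabytes."""
    return minutes * RAW_MB_PER_MINUTE * COMPRESSION_RATIO

def transfer_minutes(size_mb, kilobits_per_second):
    """time to move a file of size_mb megabytes over a link of the given speed."""
    return (size_mb * 8 * 1024) / kilobits_per_second / 60

clip_mb = compressed_size_mb(5)   # a hypothetical five-minute clip
print(f"encoded clip: about {clip_mb:.0f} mb")
print(f"dial-up (56 kbps): about {transfer_minutes(clip_mb, 56):.0f} minutes")
print(f"broadband (1,500 kbps): about {transfer_minutes(clip_mb, 1500):.1f} minutes")

even with generous assumptions, the difference between dial-up and broadband delivery is the difference between an unusable service and a usable one, which is why the fourth factor above matters so much for web delivery.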
finally, the creation of a studio is a worthwhile option. although some video will need to be shot on location, many times it is possible to set up and shoot in just one place. a studio is the best place in which to work because it is a controlled environment. it does not need to be large or complicated, and a quiet office or study room can be set up with little effort and expense. the studio gives the users control over the sound and the lighting, and involves minimal setup time for projects.
the research paper of the future
multimedia has begun to attract attention in the library community. joe janes, chair of library and information science at the information school at the university of washington and the person responsible for developing the internet public library, recently stated he foresees a growing role for multimedia in the library. it will replace much of the traditional, text-based communication that people are accustomed to. for example, multimedia projects can become the research paper of the future for students.4 it is the media in which many library customers will be working. experience from the author's institution with creating a multimedia lab would seem to confirm his observation. during the first year and a half of operation, use of the lab has steadily increased (see figure 2).
figure 2. university of texas—pan american library multimedia lab usage.
collaboration
the multimedia lab opens the doors to collaborative opportunities with faculty and students from a variety of disciplines across campus. this is because multimedia, like geographic information systems (gis) or other electronic information and communication technologies, is a tool and is not discipline-specific. as important as it is to make the connection with faculty, this media is something with which the students will frequently lead the way. they are, after all, the mtv generation, and multimedia has an incredible appeal to their visual orientation.
figure 1. multimedia creation process.
faculty themselves have used it to augment their web-based courses as well as traditional classroom instruction. the author's library has even initiated a multimedia résumé service for graduating students. the students can record a video introduction of themselves, encode this as a rovion bluestream avatar, and post it with their résumés on the web. this creates a much stronger impression than a standard résumé, hopefully giving the students an edge in promoting themselves on the job market. even more impressive is the variety of projects that are created in the lab by the students. one might expect to see interest from students in art and communications classes, but students come from many other disciplines as well. for example, business students have effectively used multimedia in their graduate-school business-plan presentations, while biology students like to use the graphics capabilities to study close-ups of slides. education students have employed it to produce multimedia instructional aids, and a sociology student put together a presentation on underserved, low-income neighborhoods. the library supplies the facility and instruction—only the imagination of students is needed. libraries have always been involved in the students' research and writing process, by providing content, instruction, and facilities for producing the final research product. the same is true in the multimedia environment, although implementing a multimedia lab calls for some new skills for librarians.
these include familiarity with basic principles of videography, learning how to use the cameras and other equipment, and gaining some mastery of the editing and encoding software.
why put it in the library?
in addition to the research-paper analogy, the author believes that librarians can point with pride to the values and value that libraries offer their communities. it is a central and neutral location—not in one department's or college's turf. libraries are conveniently open for many hours per week. many of the information resources that students might use to prepare the presentation are in the library. and librarians have a professional ethic that drives them to provide instruction and assistance for the services the library offers. since multimedia production does have a learning curve and most new users need help in mastering the technology, it does not fit very well with the typical 24/7 drop-in computer lab that the campus information technology (it) department often operates. this is a good opportunity for librarians to recognize some of their strengths and capitalize on them. in addition, this can be a breath of fresh air for librarians. here is an opportunity to learn about something new and creative. most people find that they have less room for creativity as time goes by.5 a multimedia lab in the building will offer librarians the opportunity to create multimedia productions for the library, besides assisting students and faculty with their projects.
potential problems
there are some obstacles to overcome, of course. they need not be seen as major, but it is best to be realistic when beginning any new venture. it is almost always a good idea to start small, with a pilot project that will yield valuable lessons before venturing into anything big.
• equipment—define what specifications are needed, see what is already available to use or borrow, then figure out what you will actually need to buy.
• software—check out the variety of software for editing and production; think about how you want to begin using multimedia (primarily on the web, in presentation software such as powerpoint, as standalone videos on cds and dvds).
• money—if funding permits, a library can invest several thousand dollars in a high-end multimedia computer, associated peripherals such as a color printer and one or more scanners, and a software suite to meet initial anticipated demands for multimedia creation and editing. if funding is scarce, you may want to investigate what existing equipment could be used in support of a pilot project.
• location—this needs some space of its own, accessible to students and monitored by staff. although the editing workstation could be in an area with other computers, a quiet area is needed for shooting video so that there will not be interference from noise and unwanted foot traffic through the shots.
• staffing and training—a multimedia lab is not a good candidate for self-service. librarians and staff who will provide the service need to learn how to use the equipment and software. make sure that they all have an acceptable level of competence and confidence so that the library can shine with its new service, but expect that everyone will need to continue to learn and grow in their proficiency. if your library plans to produce its own multimedia sessions as well, it would be a good investment to attend a class on television or video production.
• hours—how many hours per week will the new service be available? if it is the entire time the library is open, be prepared to train plenty of staff. repeat users will need less help as their skills increase (by the way, some of these students can be great work-study employees).
• instruction—plan to offer formal orientation and instruction sessions to faculty and their classes. if your lab is small, this is challenging, but it can be accomplished with some creativity. for example, a general instruction session on concepts can be done in a classroom, followed up by a series of small groups working by appointment for the applied-learning component in the multimedia lab. the author and a colleague have even done instruction outside the library using laptops and cameras, creating a de facto mobile studio.
• copyright—if there are already vcrs or photocopiers in the library, you have had to deal with this issue. the university of texas—pan american library does not allow people to use its lab to copy movies, which is a request that surely will come to you, and we post the usual copyright notices just as we do at our photocopiers. for some excellent information on copyright, visit the american library association web site (www.ala.org).6
• evaluation—plan on at least basic evaluation of the service. this can include an assessment of the effectiveness of the instruction sessions, a survey of satisfaction with the lab itself, a questionnaire on the intended uses of the multimedia projects, demographic data on the students, or other student input. logs of the number of uses and peak-demand periods are extremely useful for planning and for justifying further expenditures and staffing requests.
• flexibility for the future—whatever you do in a pilot phase, always keep in mind that you want to keep an open mind—you are trying to learn from the experience so that you can make good decisions for the direction of this new service. it may not go exactly the way you originally thought, because of serendipity, or changes in technology, or very strong demand from some segments of the campus instead of others, or other environmental factors.
conclusion
benefits to the library from the multimedia lab are many. one of the most important benefits is that it keeps the library involved in the process of academic communication, as the medium of the communication changes with technology. by being involved in this evolving medium at its early stages, the library is poised to pounce on opportunities to employ it to the benefit of the library in instruction and content delivery. the library also would position itself on campus as a key player in it and the leading local expert in the growing field of multimedia. since multimedia is a tool that crosses the entire range of subject disciplines on campus, it opens the doors of faculty to collaborate with librarians in exciting new ways. just as many campuses already have learning and collaborative communities that grew around their web courseware or gis endeavors, so too can one develop around multimedia. the appendix offers a list of multimedia web sites to consider. libraries are more than warehouses of books and periodicals. as more and more of our resources have been made available electronically, and indeed more of higher education has moved to electronic delivery, many libraries have been faced with declining gate counts, circulations, and reference statistics. as someone observed, we are victims of our own success. so what is the role of the library?
we are intrinsically involved in the process of instruction, academic research, and communication. as kling observed, "one important strategic idea is that libraries configure their it services and activities to emphasize the distinctive expertise of their librarians rather than simply concentrate on the size and character of the documentary collection."7 it is imperative therefore that libraries pick out the new trends that will allow them to excel by capitalizing on their traditional strengths.
references
1. scala, inc., multimedia directory, accessed apr. 21, 2004, www.scala.com/multimedia/multimedia-definition.html.
2. nielsen/netratings as of june 2004, accessed aug. 10, 2004, www.websiteoptimization.com/.
3. about.com, dvt101, accessed apr. 15, 2004, http://desktopvideo.about.com/library/weekly/aa040703a.htm; "anatomy of a video editing workstation," pc magazine, accessed apr. 16, 2004, www.pcmag.com/article2/0,1759,1264650,00.asp.
4. college of dupage, "joe janes and colleagues: preparing for the future of digital reference," a satellite broadcast from the college of dupage, 16 apr. 2004.
5. sandra kerka, creativity in adulthood (columbus, ohio: eric clearinghouse on adult career and vocational education, eric digest no. 204, ed429186, 1999).
6. american library association, "copyright issues, primer on the digital millennium," accessed may 10, 2004, www.ala.org/ala/washoff/woissues/copyrightb/dmca/dmcprimer.pdf.
7. rob kling, "the internet and the strategic reconfiguration of libraries," library administration & management 15, no. 3 (summer 2001): 144–51.
appendix. for further reading: a multimedia web-site tour
the following is a sampling of some of the most popular and interesting multimedia software, with examples of completed productions. this is not an official endorsement of any one product over another, whether listed here or not. a look at these sites will, however, give the reader an idea about the power and possibilities of multimedia communications.
adobe (www.adobe.com): the well-known makers of some of the most powerful and popular editing software packages for graphics and video.
camtasia (www.camtasia.com): easy to use, this is a good example of the type of software that does screen capture and recording, which is handy for producing online tutorials.
clipstream (www.clipstream.com): an excellent example of the type of newer encoding software that achieves incredible compression of video and delivers it over the web with no viewer or plug-ins required for the user.
finalcut pro (www.apple.com/finalcutpro): a perennial favorite among the mac crowd, this software is relatively easy to learn and lets the developer achieve dramatic results.
flashants (www.flashants.com): a handy program that converts flash animation into .avi video format so that you can integrate animated sequences into a video production.
macromedia (www.macromedia.com): the makers of flash and director, which are some of the most popular graphics, animation, and multimedia editing tools in the business.
pinnacle (www.pinnaclesys.com): what finalcut pro is to the mac, this package is for the pc environment. easy to use, yet sophisticated in the results achieved.
rovion (www.rovion.com): rovion bluestream is an encoder that enables the creation of avatar characters to appear live on your web page.
a plug-in is required for the user, but this approach definitely gets attention.
serious magic (www.seriousmagic.com): an award-winning software package that allows you to turn a workstation into a studio, complete with teleprompter capability, sound effects, graphics, and editing.
university of texas—pan american library (www.lib.panam.edu/libinfo/media.asp): links to multimedia projects at the author's institution, including productions made by staff and students.
technical communications
isad/solinet to sponsor institute
"networks and networking ii: the present and potential" is the theme of an isad institute to be held at the braniff place hotel on february 27-28, 1975, in new orleans. the sponsors are the information science and automation division of ala and the southeastern library network (solinet). this second institute on networking will be an extension of the previous one held in new orleans a year ago. the ground covered in that previous institute will be the point of departure for "networks ii." the purpose of the previous institute was to review the options available in networking, to provide a framework for identifying problems, and to suggest evaluation strategies to aid in choosing alternative systems. while the topics covered in the previous institute will be briefly reviewed in this one, some speakers will take different approaches to the subject of networking, while other speakers will discuss totally new aspects. in addition to the papers given and the resultant questions and answers from the floor, a period of round table discussions will be held during which the speakers can be questioned on a person-to-person basis. a new feature to isad institutes now being planned will be the presence of vendors' exhibits. arrangements are being made with the many vendors and manufacturers whose services are applicable to networking to exhibit their products and systems. it is hoped that many of them will be interested in responding to this opportunity. the program will include:
"a systems approach to selection of alternatives": resource sharing, components, communications options, planning strategy. joseph a. rosenthal, university of california, berkeley.
"state of the nation": review of current developments and an evaluation. brett butler, butler associates.
"the library of congress, marc, and future developments." henriette d. avram, library of congress.
"data bases, standards and data conversions": existing data bases, characteristics, standardization, problems. john f. knapp, richard abel & co.
"user products": possibilities for product creation, the role of user products. maurice freedman, new york public library.
"on-line technology": hardware and software considerations, library requirements, standards, cost considerations of alternatives. philip long, state university of new york, albany.
"publishers' view of networks": copyright, effect on publishers, effect on authorship, impact on jobbers, facsimile transmission. carol nemeyer, association of american publishers.
"national library of canada": current and anticipated developments, cooperative plans in canada, international cooperation. rodney duchesne, national library of canada.
"administrative, legal, financial, organizational and political considerations": actual and potential problems, organizational options, financial commitment, governance. fred kilgour, oclc.
registration will be $75.00 to members of ala and staff members of solinet institutions, $90.00 to nonmembers, and $10.00 to library school students.
for hotel reservation information and registration blanks, contact donald p. hammer, isad, american library association, 50 e. huron st., chicago, il 60611; 312-944-6780.
regional projects and activities
indiana cooperative library services authority
the first official meeting of the board of directors of the indiana cooperative library services authority (incolsa) was held june 4, 1974, at the indiana state library in indianapolis. a direct outgrowth of the cooperative bibliographic center for indiana libraries (cobicil) feasibility study project sponsored by the indiana state library and directed by mrs. barbara evans markuson, incolsa has been organized as an independent not-for-profit organization "to encourage the development and improvement of all types of library service." to date, contracts have been signed by sixty-one public, thirteen academic, fourteen school, and five special libraries, a total of ninety-three libraries. incolsa is being funded initially by a three-year establishment grant from the u.s. office of education, library services and construction act (lsca) title i funds. officers are: president, harold baker, head of library systems development, indiana state university; vice-president, dr. michael buckland, assistant director for technical services, purdue university libraries; secretary, mary hartzler, head of catalog division, indiana state library; treasurer, mary bishop, director of the crawfordsville book processing center; three directors-at-large, phil hamilton, director of the kokomo public library; edward a. howard, director of the evansville-vanderburgh county public library; and sena kautz, director of media services, duneland school corporation.
stanford's ballots on-line files publicly available through spires
september 16, 1974. the stanford university libraries automated technical processing system, ballots (bibliographic automation of large library operations using a time-sharing system), has been in operation for twenty-two months and supports the acquisition and cataloging of nearly 90 percent of all materials processed. important components of the ballots operations are several on-line files accessible through an unusually powerful set of indexes. currently available are: a file of library of congress marc data starting from january 1, 1972 (with a gap from may to august 1972); an in-process file of individual items being purchased by stanford; an on-line catalog (the catalog data file) of all items cataloged through the system, whether copy was derived from library of congress marc data, was input from non-marc cataloging copy, or resulted from stanford's own original cataloging efforts; and a file of see, see also, and explanatory references (the reference file) to the catalog data file. in addition, during september and october 1974, the 85,000 bibliographic and holdings records (already in machine-readable form on magnetic tape) representing the entire j. henry meyer memorial undergraduate library were converted to on-line meyer catalog data and meyer reference files in ballots. these files are publicly available through spires (stanford public information retrieval system) to any person with a terminal that can dial up the stanford center for information processing's academic computer services computer (an ibm 360 model 67) and who has a valid computer account.
the marc file can be searched through the following index points: lc card number; personal name; corporate/conference name; and title. the in-process, catalog data, and reference files for stanford and for meyer can also be searched as spires public subfiles through the following index points: ballots unique record identification number; personal name; corporate/conference name; title; subject heading (catalog data and reference file records only); call number (catalog data and reference file records only); and lc card number. the title and corporate/conference name indexes are word indexes; this means that each word is indexed individually. search requests may draw on more than one index at a time by using the logical operators "and," "or," and "and not" to combine index values sought. if you plan to use spires to search these files, or if you would like more information, a publication called guide to ballots files may be ordered by writing to: editor, library computing services, s.c.i.p.-willow, stanford university, stanford, ca 94305. this document contains complete information about the ballots files and data elements, how to open an account number, and how to use spires to search ballots files. a list of ballots publications and prices is also available on request. as additional libraries create on-line files using ballots in a network environment, these files will also be available. these additions will be announced in jola technical communications.
data base news
interchange of aip and ei data bases
a national science foundation grant (gn-42062) for $128,700 has been awarded to the american institute of physics (aip), in cooperation with engineering index (ei), for a project entitled "interchange of data bases." the grant became effective on may 1, 1974, for a period of fifteen months. the project is intended to develop methods by which ei and aip can reduce their input costs by eliminating duplication of intellectual effort and processing. through sharing of the resources of the two organizations and an interchange of their respective data bases, aip and ei expect to improve the utilization of these computer-readable data bases. the basic requirement for the development of the interchange capability for computer-readable data bases is the establishment of a compatible set of data elements. each organization has unique data elements in its data base. it will therefore be necessary to determine which of the data elements are absolutely essential to each organization's services, which elements can be modified, and what other elements must be added. after the list of data elements has been established, it will be possible to write the specifications and programs for format conversions from aip to ei tape format and vice versa. simultaneously, there will be the development of language conversion facilities between ei's indexing vocabulary and aip's physics and astronomy classification scheme (pacs). it is also planned to investigate the possibility of establishing a computer program which can convert aip's indexing to ei's terms and vice versa. with the accomplishment of the above tasks, it will be possible to create new services and repackage existing services to satisfy the information demands in areas of mutual interest to engineers and physicists, such as acoustics and optics.
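the announcement does not describe the actual tape layouts or data elements involved, but the core of such an interchange is a field-by-field mapping between the two record formats. the following is a minimal, purely hypothetical python sketch of that idea; every field name in it is invented for illustration and does not come from the aip or ei specifications:

# hypothetical field-level mapping between two bibliographic record layouts.
# none of these element names are taken from the actual aip or ei formats.
aip_to_ei = {
    "ti": "title",        # article title carried over unchanged
    "au": "authors",      # author names
    "jn": "journal",      # journal name
    "pacs": "subject",    # classification code mapped into an indexing-term field
}

def convert_aip_to_ei(record):
    """copy the elements both organizations need, renaming as required;
    elements with no equivalent in the target format are simply dropped."""
    return {ei_name: record[aip_name]
            for aip_name, ei_name in aip_to_ei.items()
            if aip_name in record}

sample = {"ti": "optical waveguides", "au": ["smith, j."], "jn": "applied optics", "pacs": "42.82"}
print(convert_aip_to_ei(sample))

a real conversion would also need the vocabulary mapping the project describes (ei indexing terms to and from pacs codes), which is a table-lookup problem of the same general shape.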
eric data base users conference
the educational resource information center (eric) held an eric data base users conference in conjunction with the 37th annual meeting of the american society for information science (asis) in atlanta, georgia, october 13-17, 1974. the eric data base users conference provided a forum for present and potential eric users to discuss common problems and concerns as well as interact with other components of the eric network: central eric, the eric processing and reference facility, eric clearinghouse personnel, and information dissemination centers. although attendees have in the past been primarily oriented toward machine use of the eric files, all patterns of usage were represented at this conference, from manual users of printed indexes to operators of national on-line retrieval systems. a number of invited papers were presented dealing with subjects such as:
• the current state and future directions of educational information dissemination. sam rosenfeld (nie), lee burchinal (nsf).
• what services, systems, and data bases are available? marvin gechman (information general), harvey marron (nie).
• the roles of libraries and industry, respectively, in disseminating educational information. richard de gennaro (university of pennsylvania), paul zurkowski (information industry association).
several organizations (national library of canada, university of georgia, wisconsin state department of education) were invited to participate in "show and tell" sessions to describe in detail how they are using the eric system and data base. a status report covering eric on-line services for educators was presented by dr. carlos cuadra (system development corporation) and dr. roger summit (lockheed). interactive discussion groups covered a number of subjects including:
• computer techniques: programming methods, use of utilities, file maintenance, search system selection, installation, and operation.
• serving the end user of educational information.
• introduction to the eric system: what tools, systems, and services are available and how are they used?
• beginning and advanced sessions on computer searching the eric files. online terminals were used to demonstrate and explain use of machine capabilities.
commercial services and developments
scope data inc. ala train compatible terminal printers
scope data inc. currently is offering a high-speed, nonimpact terminal printer for use in various interactive printing applications. capability can be included in the series 200 printer as an extra-cost feature to print the eight-bit ascii character set or the ala character set with 176 characters. for further information contact alan g. smith, director of marketing, scope data inc., 3728 silver star rd., orlando, fl 32808.
institute for scientific information puts life sciences data base on-line through system development corporation
the institute for scientific information (isi) has announced that it will collaborate with system development corporation (sdc) to provide on-line, interactive, computer searches of the life sciences journal literature. scheduled to be fully operational by july 1, 1974, the isi-sdc service is called scisearch® and is designed to give quick, easy, and economical access to a large life sciences literature file.
stressing ease of access, the sdc retrieval program, orbit, permits subscribers to conduct extremely rapid literature searches through two-way communications terminals located in their own facilities. after examining the preliminary results of their inquiries, searchers are able to further refine their questions to make them broader or narrower. this dialog between the searcher and the computer (located in sdc's headquarters in santa monica, california) is conducted with simple english-language statements. because this system is tied in to a nationwide communications network, most subscribers will be able to link their terminals to the computer through the equivalent of a local phone call. covering every editorial item from about 1,100 of the world's most important life sciences journals, the service will initially offer a searchable file of over 400,000 items published between april 1972 and the present. each month approximately 16,000 new items will be added until the average size of the file totals about one-half million items and represents two-and-one-half years of coverage. to assure subscribers maximum retrieval effectiveness when dealing with this massive amount of information, the data base can be searched in several ways. included are searches by keywords, word stems, word phrases, authors, and organizations. one of the search techniques utilized, citation searching, is an exclusive feature of the isi data base. for every item retrieved through a search, subscribers can receive a complete bibliographic description that includes all authors, journal citation, full title, a language indicator, a code for the type of item (article, note, review, etc.), an isi accession number, and all the cited references contained in the retrieved article. the accession number is used to order full-text copies of relevant items through isi's original article tear sheet service (oats®). this ability to provide copies of every item in the data base distinguishes the isi service from many others.
current library of congress catalog on-line for reference searches
information dynamics corporation (idc) has agreed to collaborate with system development corporation (sdc) to provide reference librarians, researchers, and scholars with on-line interactive computer searches of all library materials being cataloged by the library of congress. scheduled to be fully operational as of october 1, 1974, the sdc-idc service is called sdc-idc/libcon and is designed to give quick, easy, and economical access to a large portion of the world's scholarly library materials. as in the isi service described above, the data base can be searched in several ways. included are compound logic searches by keywords, word stems, word phrases, authors, organizations, and subject headings for most english materials. one of the search techniques utilized, string searching, is an exclusive feature of sdc's orbit system. keyword searching of cataloged items, including all foreign materials processed by the library of congress, is an exclusive feature of the idc data base not currently available in other on-line marc files. for individual items retrieved through a search, subscribers can receive a bibliographic description that includes authors, full title, an idc accession number, the lc classification number, and publisher information.
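neither announcement explains how these index files are organized internally, but the search features they describe (keyword, author, and citation look-ups combined with compound logic) all rest on the same inverted-index idea. the following is a small, hypothetical python sketch of that idea; the records, titles, and cited references in it are invented for illustration:

# tiny, hypothetical inverted-index sketch: keyword and cited-reference look-up,
# with results combined by "and" / "or" / "and not" as set operations.
records = {
    1: {"title": "laser cooling of atoms", "cites": ["smith 1970"]},
    2: {"title": "cooling towers in power plants", "cites": ["jones 1968"]},
    3: {"title": "atoms and molecules", "cites": ["smith 1970", "jones 1968"]},
}

keyword_index, citation_index = {}, {}
for rid, rec in records.items():
    for word in rec["title"].split():
        keyword_index.setdefault(word, set()).add(rid)   # word index: each word indexed individually
    for ref in rec["cites"]:
        citation_index.setdefault(ref, set()).add(rid)   # citation index: who cites a given reference

def keyword(word):
    return keyword_index.get(word, set())

def cited(ref):
    return citation_index.get(ref, set())

print(keyword("cooling") & keyword("atoms"))     # "and"     -> {1}
print(keyword("cooling") | keyword("atoms"))     # "or"      -> {1, 2, 3}
print(cited("smith 1970") - keyword("cooling"))  # "and not" -> {3}

production systems of the period layered much more on top of this (word stems, phrase searching, string searching), but the combination of per-index posting lists with set operations is the common core.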
standards
the isad committee on technical standards for library automation invites your participation in the standards game
editor's note: the tesla reactor ballot will be provided in forthcoming issues. to use, photocopy the ballot form, fill out, and mail to: john c. kountz, associate for library automation, office of the chancellor, the california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036.
the procedure
this procedure is geared to handle both reactive (originating from the outside) and initiative (originating from within ala) standards proposals to provide recommendations to ala's representatives to existing, recognized standards organizations. to enter the procedure for an initiative standards proposal you must complete an "initiative standards proposal" using the outline which follows:
initiative standard proposal outline. the following outline is designed to facilitate review by both the committee and the membership of initiative standards proposals and to expedite the handling of the initiative standard proposal through the procedure. since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable). note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced, on 8 1/2" x 11" white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number.
i. title of initiative standard proposal (title).
ii. initiator information (forward). a. name. b. title. c. organization. d. address. e. city, state, zip. f. telephone: area code, number, extension.
iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process.
iv. purpose. state the purpose of the standard proposal (scope and qualifications).
v. description. briefly describe the standard proposal (specification of the standard).
vi. relationship of other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks).
vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification).
viii. specifications (optional). specify the standard proposal using record layouts, mechanical drawings, and such related documentation aids as required, in addition to text exposition where applicable (specifications of the standard).
kindly note that the outline is designed to enable standards proposals to be written following a generalized format which will facilitate their review.
in addition, the outline permits the presentation of background and descriptive information which, while important during any evaluation, is a prerequisite to the development of a standard.
tesla reactor ballot (form fields):
identification number for standing requirement.
reactor information: name; title; organization; address; city; state; zip; telephone (area code, number, extension).
need (for this standard): for / against.
specification (as presented in this requirement): for / against.
can you participate in the development of this standard? no / yes.
reason for position (use format of proposal; additional pages can be used if required).
the reactor ballot is to be used by members to voice their recommendations relative to initiative standards proposals. the reactor ballot permits both "for" and "against" votes to be explained, permitting the capture of additional information which is necessary to document and communicate formal standards proposals to standards organizations outside of the american library association. as you, the members, use the outline to present your standards proposals, tesla will publish them in jola-tc and solicit membership reaction via the reactor ballot. throughout the process tesla will ensure that standards proposals are drawn to the attention of the applicable american library association division or committee. thus, internal review usually will proceed concurrently with membership review. from the review and the reactor ballot tesla will prepare a "majority recommendation" and a "minority report" on each standards proposal. the majority recommendation and minority report so developed will then be transmitted to the originator, and to the official american library association representative on the appropriate standards organization, where it should prove a source of guidance as official votes are cast. in addition, the status of each standards proposal will be reported by tesla in jola-tc via the standards scoreboard. the committee (tesla) itself will be nonpartisan with regard to the proposals handled by it. however, the committee does reserve the right to reject proposals which after review are not found to relate to library automation.
input to the editor
we have been asked by the members of the ala interdivisional committee on representation in machine readable form of bibliographic information (marbi) to respond to your editorial in the june 1974 issue of the journal of library automation. this editorial dealt with the council of library resources' [sic] involvement in a wide range of projects, ranging from the sponsorship of a group which is attempting to develop a subset of marc for use in inter-library exchange of bibliographic data (cembi), to management of a project which has as its goal the creation of a national serials data base (conser), and, more recently, to the convening of a conference of library and a&i organizations to discuss the outlook for comprehensive national bibliographic control. you raised several legitimate questions: 1) has sufficient publicity been given to these activities of the council so that all, not just a few, libraries are aware of what is happening and have an opportunity to exert an influence on developments? and, 2) is the council bypassing existing channels of operation and communication?
you also suggest that proposals from groups such as cembi be channeled through an official ala committee such as marbi for intensive review and evaluation. it should be pointed out that marbi is not charged with the development of standards. it acts to monitor and review proposals affecting the format and content of machine readable bibliographic data, where that data has implications for national or international use. this applies to proposals emanating from cembi and conser as well as from other concerned groups. all indications to date are that the council is fully aware of marbi's role and will not bypass marbi. a number of members of marbi are also members of cembi, and marbi is represented on the conser project. also reassuring is the fact that, unless we allow lc to fall by the wayside in its role as the primary creator and distributor of machine readable data, any standards for format or content developed by a council-sponsored group will eventually be reflected in the marc records distributed by lc. the library of congress has issued a statement, published in the june 1974 issue of jola, to the effect that it will not implement any changes in the marc distribution system which are not acceptable to marbi. marbi and lc have worked out a procedure whereby all proposed changes to marc are submitted to marbi. they are then published in jola and distributed to members of the marc users discussion group for comments. comments are collected and evaluated by marbi and a report submitted to lc, with its recommendations. the marbi review process does not guarantee perfection and there is no assurance that everyone will be satisfied. compromise and expediency are the name of the game in this extremely complicated and uncharted area of standards for machine readable bibliographic data. however, the council has undoubtedly learned from the isbd(m) experience that it cannot make decisions which affect libraries without the greatest possible involvement of librarians. it is the feeling of the marbi committee members that the council intends to work with marbi in future projects which fall into marbi's area of concern. velma veneziano, marbi past chairperson; ruth tighe, chairperson.
editor's note: it is gratifying to note that marbi's response reflects the opinions expressed in the june 1974 editorial. the library community will doubtless be pleased to learn of clr's intention to work closely with marbi.-skm
to the editor: as briefly discussed with you, your editorial in the june 1974 issue of jola is both admirable and disturbing (to me, at least). the problem of national leadership in the area of library automation is a critical problem indeed. being in the "boondocks" and far removed from the scene of action, i can only express to you my perception as events and activities filter through to me. i can remember as far back as 1957 when adi had a series of meetings in washington, d.c., trying to establish a national program for bibliographic automation. i have been through eighteen years of meetings, committees, conferences, etc., concerned with trying to develop a national plan for bibliographic automation and information storage and retrieval systems. i have worked with nsf, usoe, department of commerce, u.s. patent office, engineering and technical societies, dod agencies, the entire spectrum. i spent a good many years working in adi and asis, sla, and most recently ala.
at no time were we able to make significant progress towards a national system. even the great airlie house conference did not produce any significant changes in the fragmented, competitive "non-system." it has only been in the recent past, since clr has taken an aggressive posture, that i am able to see the beginning of orderly development of a national automated bibliographic system. i certainly agree that any topic as critical as those being discussed by cembi should be in the public domain, but i also believe that the progress made by cembi would not have been possible without clr taking the initiative in getting these key agencies together. thank goodness someone quit talking and started doing something at the national level! i sincerely believe that in the absence of a national library and with the current lack of legally derived authority in this arena, clr provides a genuine service to the total library community in establishing cembi. hopefully, your very excellent article (in the same issue of jola) on "standards for library automation . . ." will help to put the entire issue of bibliographic record standards into perspective. as a former chemist and corrosion engineer, i am fully aware of the absolute necessity for technical standards. i am also fully aware of the necessity of developing technical standards through the process you outlined in your article. hopefully, clr action with cembi will expedite this laborious process and help to push our profession forward into the twentieth century. since we ourselves have not been able to do it through all these years, i am personally grateful that some group such as clr took the initiative and forced us to do what we should have done years ago. maryann duggan, slice office director.
editor's note: positive action and progressive movement are, of course, desirable and are often lacking in large organizations. however, positive action without communication of this action to the affected population can only be detrimental. on issues of the complexity of those addressed by cembi and conser, review by the library community is always useful, even though action may be temporarily delayed.-skm
to the editor: on page 233 of the september issue of jola there is a report from the information industry association's micropublishing committee chairman (henry powell). he states that ". . . the committee spelled out several areas of concern to micropublishers which will be the subject of committee action. . . ." one of the concerns of the committee is that a z39 standards committee has recommended "standards covering what micropublishers can say about their products." (emphasis mine.) as chairman of the z39 standards subcommittee which is developing the advertising standard referred to, i wish to point out that there is no intention on the part of the subcommittee to tell micropublishers what they can say nor what they may say about their products. the subcommittee, which is composed of representatives from three micropublishing concerns, two librarians, and myself, has from the beginning taken the view that the purpose of the standard would be to provide guidance for micropublishers and librarians alike. we are most anxious that no one feel that the subcommittee has any intention of attempting to use the standards mechanism to tell any micropublisher how he must design his advertisements. in addition it should be noted that no ansi standard is compulsory. carl m.
spaulding, program officer, council on library resources.
president's message. thomas dowling. information technology and libraries | september 2015. doi: 10.6017/ital.v34i3.8966
fall has arrived, faster than expected (as it always does). it seems like ala annual just wrapped up in san francisco, but we're already well underway with the coming year's activities. the national forum 2015 will be here before you know it. in a fall season crowded with good technology conferences, lita forum consistently proves its value as a small, engaging, and focused meeting. technologists, strategists, and front-line librarians come together to discuss the tools they make and use to provide cutting edge library services. in addition to great lita programming, this year we're working with colleagues from llama (the library leadership and management association) to provide a set of programs focused on the natural cooperation of management and technologies in libraries. there are great preconferences on makerspaces and web analytics, keynote addresses, over 50 concurrent sessions, and a lot of networking opportunities. click on over to litaforum.org, and i hope to see you in minneapolis, november 12-15. not too long after forum, we'll be in boston for midwinter, and then annual in orlando. the program planning committee is already at work selecting the best programs for annual. next summer is also the start of lita's 50th anniversary celebrations! of course, not everything we do involves travel and in-person meetings. lita's fall schedule of webinars includes sessions on patron privacy, creative commons, personal digital archiving, and a second iteration of top technologies every librarian needs to know. on the staff side, we are happy to say that jenny levine has started as lita's new executive director. jenny comes to us from ala's it and telecommunications services department, where she is still putting in some time bringing a new version of ala connect online. jenny and the governing board are already working together virtually: we are about to select our emerging leaders for the year and are working on an exercise to set divisional priorities, with an eye toward drafting a new strategic plan. the board will hold two online meetings this fall. as always, these are open meetings, so if you're interested in your association's governance, you're welcome to sit in. watch the board's area in connect for details, or look for upcoming posts to lita-l and litablog.org. and if you need to contact the board, you can reach us at http://www.ala.org/lita/about/board/contact. i hope to meet as many lita members as possible this year, at one of the upcoming in-person meetings or online, or just drop me a line on connect. it's going to be a great year for lita. thomas dowling (dowlintp@wfu.edu) is lita president 2015-16 and director of technologies, z. smith reynolds library, wake forest university, winston-salem, north carolina.
fulfill your digital preservation goals with a budget studio. yongli zhou. information technology and libraries | march 2016.
abstract
to fulfill digital preservation goals, many institutions use high-end scanners for in-house scanning of historical print and oversize materials. however, high-end scanner prices do not fit in many small institutions' budgets. as digital single-lens reflex (dslr) camera technologies advance and camera prices drop quickly, a budget photography studio can help to achieve institutions' preservation goals. this paper compares images delivered by a high-end overhead scanner and a consumer-level dslr camera, discusses pros and cons of using each method, demonstrates how to set up a cost-efficient shooting studio, and presents a budget estimate for a studio.
introduction
colorado state university libraries (csul) are regularly engaged in a variety of digitization projects. materials for some projects are digitized in-house, while items from selected projects are sometimes outsourced. most fragile materials that require professional handling are digitized in-house using an expensive overhead scanner. however, the overhead scanner has been occasionally unstable since it was purchased, and this has delayed some of our digitization projects. as digital photography technologies advance, image quality delivered by digital single-lens reflex (dslr) cameras is improving, and camera prices have lowered to an affordable level. in this paper, i will compare images produced by a scanner and a camera side-by-side, list pros and cons of using each method, illustrate how to establish a shooting studio, and present a budget estimate for that studio.
literature review
there are many online guidelines and manuals for digitizing print materials. some universities and museums have information about their digitization equipment online. most articles focus on either high-end scanners or customized scanning stations. these articles are very helpful for universities and museums that are relatively well funded. however, there is almost no literature discussing how to use inexpensive digital cameras and photography equipment to produce high-quality digitized images. this article will use a case study to show that a low-budget studio can produce high-quality digitized images.
comparison of scanned and photographed images
the test camera set was chosen because it was the one the author used for general purposes. the camera was also chosen by many professional photographers because of its quality and affordability. to avoid dispute, the overhead scanner's make and model are not revealed.
yongli zhou (yongli.zhou@colostate.edu) is digital repositories librarian, colorado state university libraries, fort collins, colorado.
test equipment
budget studio: nikon d800; nikon af micro-nikkor 60mm f/2.8d lens; manfrotto 055cxpro3 3-section carbon fiber tripod legs; really right stuff bh-40 lr ii ballhead; nonreflective glass; book cradles; x-rite original colorchecker card; natural daylight. total cost: $4,500 and no maintenance fees (priced in 2014).
overhead scanner: our overhead scanner; nonreflective glass; book cradles. purchase price: $55,000 (purchased in 2007); $8,000 annual maintenance (2013 price).
table 1. test equipment.
focus and sharpness
a quality digitized image needs to have a good focus.
a well-focused image shows details better and can produce better optical character recognition (ocr) results for text-based documents. at csul, we have no control over the automatic focus on our overhead scanner and have noticed that sometimes one page is sharply focused but the next page is slightly out-of-focus. during the scanning process, our overhead scanner does not indicate if a shot is focused or not. a dslr camera can beep or display a flashing dot on the viewfinder when in focus.
illustration
the following two figures compare images produced by our test dslr and overhead scanner. both images were originals and have not been enhanced by software. in addition to this image, we tested nine other illustrations. following our comparison study, we concluded that a semiprofessional dslr camera produces sharper images than our expensive overhead scanner. in figure 1, at 100 percent zoom, the left image has a better focus, contains more details, and has colors closer to the original. the left image was taken using a nikon d800 + nikkor 60mm macro lens and under natural lighting. the right image was produced by our overhead scanner. in figure 2, at 200 percent zoom, the left image (taken using the dslr) shows much more detail than the image on the right (taken with the overhead scanner).
figure 1. comparative images from dslr (left) and overhead scanner (right), at 100 percent zoom. image from samuel m. janney, the life of william penn; with selections from his correspondence and auto-biography (philadelphia: hogan perkins & co, 1852), plate between pages 296 and 297.
figure 2. comparative images from dslr (left) and overhead scanner (right), at 200 percent zoom. image from samuel m. janney, the life of william penn; with selections from his correspondence and auto-biography (philadelphia: hogan perkins & co, 1852), frontispiece, print.
at csul, the process of digitizing a text document includes scanning pages, converting them into portable document format (pdf) files, and applying an ocr process. in general, a well-focused image of text produces better ocr results, although software such as adobe acrobat can tolerate fuzzy images and produce reasonably accurate ocr text. our ocr tests from a slightly out-of-focus image and a well-focused image show no significant difference; however, from preservation and usability standpoints, we prefer well-focused images.
figure 3. the left image was produced by our test dslr camera and has a better focus. the right image was produced by our overhead scanner. samuel m. janney, the life of william penn; with selections from his correspondence and auto-biography (philadelphia: hogan perkins & co, 1852), 300, print.
figure 4. we ran the ocr process on the above two images. the top image was produced by our test dslr camera and the bottom image was produced by our overhead scanner. samuel m.
yet, "he was not a man of stron sense." " on one or two points of high importance, he bad notions more correct than were, in his day, common, even arnong men of e1~larged minds, and he had the rare good fortune of being able to carry his theories into practice without any compromise." yet, "he was not a man of strong sense." table 2. ocr results comparison these test results are very close because of the forgiveness of the adobe acrobat software. however, we have seen that for some other pages, a better-focused image generates improved ocr results. photograph a 6.5 inches by 4.5 inches silver print was used for this test. our tests show that the test dslr camera produced a sharper image of this historic photograph. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 31 figure 5. tested 6.5 inches by 4.5 inches photograph. the red square indicates the enlarged area for figure 6. historical photograph from colorado state university archives and special collections. figure 6. screen view at 100 percent zoom of a silver print. the top image was produced by the test dslr camera and the bottom one was produced by our overhead scanner. historical photograph from colorado state university archives and special collections. oversize materials for oversized materials, overhead scanners and dslr cameras have their drawbacks, so we do not think either option is ideal for them. our library uses a map scanner to scan oversize maps and posters. however, a map scanner is expensive and may not fit many libraries’ budgets. a map scanner also is not suitable for fragile maps or posters. our overhead scanner’s maximum scanning area is 24 inches by 17 inches, and the test map’s size is 25 inches by 26 inches. we had to scan the map in four sections and stitch them together using adobe photoshop. each section image has a files size of 313 mb. because of large file sizes, the stitching process is extremely slow. also stitching images is not recommended because there are always some degrees of mismatching errors created by lens distortion. a camera can capture any material size, but the details of the photographed images diminish as the material’s size increases. the photo of the entire map taken by our test dlsr has a file size of 35.8 mb. the image produced by camera has a lower resolution and less detail. information technology and libraries | march 2016 32 figure 7. oversized materials screen view at 100 percent zoom. the top image was photographed by the test dslr. the bottom image was scanned by our overhead scanner. historical map from colorado state university archives and special collections. small prints one big advantage of a dslr camera is that it can be set farther away to take pictures of oversized materials or very close to smaller objects to take close-up pictures. comparatively, the distance of lens and scanning platform on our overhead scanner is fixed, so no close-up images can be produced, and everything is reproduced at scale of 1:1. for the following example, we used a 5.5 inches by 3.5 inches drawing as our test subject. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 33 figure 8. a 5.5 inches by 3.5 inches fine drawing. a historical booklet from colorado state university archives and special collections. figure 9. small prints screen view at 100 percent zoom. the left image is produced by a dslr with a macro lens and the right image was scanned by our overhead scanner. 
a historical booklet from colorado state university archives and special collections. information technology and libraries | march 2016 34 the image produced by our overhead scanner has a resolution of 3,427 pixels by 2,103 pixels. the camera produces a 6,776 pixels by 4,240 pixels image. the higher pixel count allows users to see more details at the same zoom level. the image produced by camera is not only sharper but also contains more details. it also is good for making enlarged prints for promotion materials. for smaller maps, a dslr camera also produces superior images. for the following sample, we tested a 15 inches by 9.5 inches map. figure 10. a 15 inches by 9.5 inches map. the blue square indicates the enlarged area for figure 11. historical map from colorado state university archives and special collections. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 35 figure 11. small map screen views at 100 percent zoom. the left image was photographed by a dslr camera with a macro lens and the right image was produced by our overhead scanner. historical map from colorado state university archives and special collections. post-processing use of a sharpening filter our tests showed that a main drawback of our overhead scanner is that images produced are outof-focus. some digitization guidelines recommend minor post-processing for delivered images files to improve image quality. one might argue that to fix our overhead scanner’s out-of-focus problem, sharpening can be applied. technical guidelines for digitizing cultural heritage materials: creation of raster image master files recommends doing minor post-scan adjustment to optimize image quality and bring all images to a common rendition.1 this is good advice, but it is not applicable in real-world practice. to get the best result, each image would need to be evaluated and have a sharpening filter applied separately because when an improper sharpening setting is applied to an image, it often creates haloing artifacts and an unnatural look. the application of a sharpening filter to each image process will be extremely time-consuming. the haloing artifact is also called chromatic aberration (ca) effect. ca appears as unsightly color fringes near high contrast edges. chromatic aberrations are typically only visible when viewing the image on-screen at higher zoom levels or on large prints. information technology and libraries | march 2016 36 the following example shows that the ca may not appear at lower zoom levels, such as 50 percent or 100 percent. the left image has no sharpening filter applied and the right image has a sharpening filter applied. at 100 percent zoom, chromatic aberration is almost not identifiable, and the right image appears to be superior in turns of sharpness. figure 12. sharpening filter comparison sample at 100 percent zoom. the left image has no sharpening filter applied and the right image has been applied a sharpening filter. historical map from colorado state university archives and special collections. at a higher zoom level, we see ca, visible in the right image of figure 13. the extra colors are introduced by the software. figure 13. comparison of sharpening filter applied to images and at 500 percent zoom. the left image has no sharpening filter applied and the right image has sharpening filter applied. historical map from colorado state university archives and special collections. 
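for readers who want to see how quickly halo artifacts can appear, a minimal sketch using pillow's unsharp mask filter is shown below. the file names and filter settings are placeholders, not the settings used in our tests, and, as the following paragraph recommends, this kind of filtering is better avoided on master images.

# Minimal sketch of batch sharpening with Pillow.
# File names and filter settings are illustrative only.
from PIL import Image, ImageFilter

def sharpen(path_in, path_out, radius=2, percent=150, threshold=3):
    """Apply an unsharp-mask filter and save the result.

    Aggressive radius/percent values are what produce the haloing
    (fringing along high-contrast edges) described above, which is why
    a single global setting rarely suits every image.
    """
    img = Image.open(path_in)
    img.filter(ImageFilter.UnsharpMask(radius=radius,
                                       percent=percent,
                                       threshold=threshold)).save(path_out)

# Example: a conservative pass on one scan; each result would still need
# review at high zoom for halo artifacts.
# sharpen("scan_0001.tif", "scan_0001_sharpened.tif", radius=1, percent=80)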
fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 37 we recommend not applying sharpening filters to original scanned images; instead, attempt to obtain well-focused images from the beginning. for this reason, the test dslr camera outperformed our overhead scanner for most materials. color balance have you seen a scanned color image or color photograph with colors very different from the original image? for example, a white area appears to be bluish, or it has an orange cast? when scanning or photographing an image under different lighting, the output image can have very different colors. in the following figure, the left image was shot at a correct white balance (wb) setting. wb is the process of removing unrealistic color casts so that objects that appear white in person are rendered white in your photo.2 the center image has a blue color cast, which was caused by a lower kelvin setting, and the right image was shot at a higher kelvin setting. a camera may create images with the wrong colors, but so will a scanner if it is not calibrated correctly. figure 14. images shot under different white balance settings. we pay an $8,000 annual service fee for overhead scanner maintenance, which includes scanner color calibration. in general, image colors rendered by the machine are close to original colors but not exact. we have noticed that some images have a very light green overcast and other others are overly yellow; sometimes images appear to be darker than they should be. because we are not certified to calibrate the overhead scanner, we only use the prescribed settings set by technicians. also, we have no control over maintaining a fading light bulb, which will affect correct exposure. wb adjustment on photographs taken in a studio can be very precise. most dslr contains a variety of preset white balances. in general, auto wb works well, but does not deliver the best results. custom wb allows fine-tuning of colors. if a shooting studio is set up properly, the lighting should be consistent, so ideally one setting found most desirable can be used repeatedly. however, professional photographers do test shots at the beginning of each shooting session. once they find information technology and libraries | march 2016 38 the optimal test shot, they will use the exact settings for the batch. later, they will do minor color adjustment on the chosen test shot to ensure precise color representation, and then apply the adjustment settings on all other photos of the same batch. because many small variations can be present for each shooting session, they do not use the settings from the previous shooting. it may seem arduous to do test shots for each shooting, but it ensures accurate color reproduction. many professional photographers use colorchecker passport,3 which is a commercial product to help with quick and easy capture of accurate colors. i will demonstrate briefly a useful trick i learned from a professional photography seminar how to utilize colorchecker passport to apply correct white balance a group of images. 4 step 1: place an 18 percent gray card or a colorchecker passport card on top of a page. choose the correct exposure and take the photo. use the same exposure setting to take additional photos. for demonstration purposes, we deliberately used a very low and high kelvin setting for sample images. the low kelvin setting created cool and blue tones and the high kelvin setting created a tone that was too warm. 
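for readers curious what the gray-card click in step 2 below does conceptually, the sketch that follows scales the color channels so that a patch known to be neutral comes out with equal red, green, and blue values. this is a rough numpy approximation for illustration only, not lightroom's actual algorithm; the function name and array layout are assumptions.

# Illustrative only: a simplified neutral-patch ("gray card") correction.
# Real raw converters work in camera-specific color spaces with more
# sophisticated models; this just shows the basic idea.
import numpy as np

def white_balance_from_patch(image, patch_box):
    """Scale R, G, B so the selected patch becomes neutral (R = G = B).

    image      -- float array of shape (H, W, 3), values 0.0-1.0
    patch_box  -- (top, bottom, left, right) pixel coordinates of a gray area
    """
    top, bottom, left, right = patch_box
    patch_mean = image[top:bottom, left:right].reshape(-1, 3).mean(axis=0)
    gains = patch_mean.mean() / patch_mean   # per-channel multipliers
    return np.clip(image * gains, 0.0, 1.0)

# The same gains can then be reused for every frame shot under the same
# lights, which is the point of synchronizing settings across a batch.
# corrected = [white_balance_from_patch(f, (100, 140, 200, 240)) for f in frames]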
note that the test shot with colorchecker board was not taken with exactly the correct white balance setting. figure 15. sample images for white balance adjustment. rocky mountain collegian 3–4 (1893), 118, colorado state university archives and special collections. step 2: in adobe lightroom, select the test target image and switch to “develop” mode. select the white balance tool, move the cursor over a gray area, try to find a spot where the red, green, and blue (rgb) values are close. if you can find a place with equal rgb values, it will be ideal. this simple click will set the test image’s white balance to an almost perfect setting. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 39 figure 16. applying a white balance in adobe lightroom 4 step 3. synchronize other images’ settings with the target image. select the target image and all other images, click the sync button, and select settings you would like to synchronize. make sure the wb button is checked. figure 17. synchronize settings in adobe lightroom 4 information technology and libraries | march 2016 40 figure 18. synchronized images with correct white balance. rocky mountain collegian 3–4 (1893), 118, colorado state university archives and special collections. recently, i had the opportunity to visit the spencer museum of art’s digitization lab. they have a different workflow to ensure even more scientifically correct colors. if you are interested in their approach, you can contact their information technology manager or photographer. color space one very important thing to understand is color space when you use a dslr camera. many dslr cameras support adobe rgb and srgb. srgb reflects the characteristics of the average cathode ray tube (crt) display. this standard space is endorsed by many hardware and software manufacturers, and it is becoming the default color space for many scanners, low-end printers, and software applications. it is the ideal space for web work but not recommended for prepress work because of its limited color gamut. adobe rgb (1998) was designed to encompass most of the colors achievable on cmyk printers, but only by using rgb primary colors on a device such as your computer display.5 it is recommended to use this color space if you need to do print production work with a broad range of colors. many scanning vendors deliver images in adobe rgb color space. prophoto rgb contains all colors that are in adobe rgb, and adobe rgb contains nearly every color that is in srgb. this color space covers more colors than the human eye can see. it can only be used for images in raw format and in 16-bit mode. common file formats that support 16-bit images are tiff and psd. most printers do not support 16-bit format. this color space normally is used by photographers who have a specific workflow and who print on specific high-end inkjet printers. when converting from 16-bit to 8-bit, some images will have banding or posterization problems. banding is a digital imaging artifact. a picture with banding problem shows horizontal or vertical lines. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 41 figure 19. an example of colour banding, visible in the sky in this thotograph.6 posterization of an image entails conversion of a continuous gradation of tone to several regions of fewer tones, with abrupt changes from one tone to another.7 figure 20. 
an example of posterization.8 while it is a good idea to capture images using adobe rgb to preserve a wide range of colors, you should convert images to srgb when delivering to unknown users and displaying on the web. currently, srgb is the only appropriate choice for images uploaded to the web, since most web browsers don’t support any color management. adobe rgb images that are uploaded to websites without conversion to srgb generally appear dark and muted.9 if they were printed on printers that do not support adobe rgb format, colors will be dull too. setting up a budget studio commercial approach bookdrive pro is a commercially available digitization unit. it uses two digital cameras and built-in flash lights. it may be the optimal solution for your projects, but it also may not fit your library’s information technology and libraries | march 2016 42 budget. the unit also is not suitable for oversized material such as large maps and posters. for more information about this product, please visit http://pro.atiz.com/. sample budget studio setup a digitization lab can have three rooms or areas, one for oversized materials, one for smaller prints or 3-d objects, and one for computers. the area for shooting oversized materials should have black walls and floor. you can either use one flash light to bounce light off the ceiling or use two flash lights to shine lights directly onto the materials. for fragile materials, the first approach is more appropriate. the area for shooting smaller prints or 3-d objects should have a stable table and black or white background paper. for this room or area, black walls and floor are not required. for shooting equipment, i will use the set chosen by the photographer from the university of kansas spencer museum of art as my example. 
item name: sample item; price; purchasing url

• dslr camera: nikon d810; $2,996.95; http://www.bhphotovideo.com/c/search?atclk=camera+model_nikon+d810&ci=6222&n=4288586280+3907353607
• macro lens: nikon af micro-nikkor 60mm f/2.8d lens; $429.00; http://www.bhphotovideo.com/c/product/66987-grey/nikon_1987_af_micro_nikkor_60mm_f_2_8d.html
• heavy-duty mono stand: arkay 6jrcw mono stand jr with counterweight, 6'; $678.50; http://www.bhphotovideo.com/c/product/2727-reg/arkay_605138_6jrcw_mono_stand_jr.html
• strobe: broncolor g2 pulso, 1600 watt/second focusing lamphead with 16' cord; $3,053.68; http://www.bhphotovideo.com/c/product/259745-reg/broncolor_32_115_07_g2_pulso_with_16.html
• power pack: broncolor senso a4 2,400w/s power pack; $3,629.92; http://www.bhphotovideo.com/c/product/745060-reg/broncolor_31_051_07_senso_a4_2_400w_s_power.html
• reflector: broncolor p65 reflector, 65 degrees, 11" diameter, for broncolor pulso 8, twin and hmi; $513.52; http://www.bhphotovideo.com/c/product/7162-reg/broncolor_33_106_00_p65_reflector_65_degrees.html
• reflector: broncolor softlight reflector, 20" diameter, for broncolor primo, pulso 2/4 & hmi heads; $501.76; http://www.bhphotovideo.com/c/product/7167-reg/broncolor_33_110_00_softlight_reflector_20_for.html
• light stand: impact air-cushioned light stand; $44.99; http://www.bhphotovideo.com/c/product/253067-reg/impact_ls10ab_air_cushioned_light_stand.html
• light meter: sekonic l-308s flashmate, digital incident, reflected, and flash light meter; $199.00; http://www.bhphotovideo.com/c/product/368226-reg/sekonic_401_309_l_308s_flashmate_light_meter.html
• book cradle: book exhibition cradles; $30.00; http://www.universityproducts.com/cart.php?m=product_list&c=1115&primary=1&parentid=1271&navtree%5b%5d=1115
• background paper: savage seamless background paper (both white and black); $45.00 x 2 = $90.00; http://www.bhphotovideo.com/c/product/45468-reg/savage_1_12_107_x_12yds_background.html
• nonreflective glass: 1/4" optiwhite starphire purified tempered single lite clear glass; $75.00; can be purchased at a local glass store
• white balancing accessory: x-rite original colorchecker card; $69.00; http://www.bhphotovideo.com/c/product/465286-reg/x_rite_msccc_original_colorchecker_card.html
• software: adobe lightroom 5; $150.00; http://www.adobe.com/products/photoshop-lightroom.html

table 3. list of items needed to prepare a budget studio

the total cost for a "budget" shooting studio ranges from $10,000 to $15,000, and there is no annual maintenance expense.
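as a quick sanity check on that range, the line items in table 3 can be totaled. the figures below are simply the prices listed above, assuming one of each item (background paper counted twice, as listed).

# Quick check of the Table 3 line items against the stated $10,000-$15,000 range.
items = {
    "Nikon D810 body":                2996.95,
    "AF Micro-Nikkor 60mm f/2.8D":     429.00,
    "Arkay 6JRCW mono stand":          678.50,
    "Broncolor G2 Pulso head":        3053.68,
    "Broncolor Senso A4 power pack":  3629.92,
    "P65 reflector":                   513.52,
    "Softlight reflector":             501.76,
    "Impact light stand":               44.99,
    "Sekonic L-308S light meter":      199.00,
    "Book exhibition cradles":          30.00,
    "Seamless background paper (x2)":   90.00,
    "Nonreflective glass":              75.00,
    "X-Rite ColorChecker card":         69.00,
    "Adobe Lightroom 5":               150.00,
}

total = sum(items.values())
print(f"itemized total: ${total:,.2f}")   # roughly $12,461, inside the stated range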
figure 21. the university of kansas spencer museum of art digitization lab setup for oversized materials

figure 22. steelworks museum of industry and culture's digitization lab setup for oversized materials

figure 23. the university of kansas spencer museum of art digitization lab setup for smaller prints and 3-d objects

figure 24. steelworks center of the west's digitization lab setup for 3-d objects

functions of some elements in the sample shooting studio

1. macro lens: it allows close-up shooting of objects. it is especially useful when photographing small prints and small 3-d objects. it can also be used to photograph regular and oversized materials.
2. heavy-duty mono stand: it replaces a traditional tripod. it is very stable and allows quick adjustment of camera height and location.
3. strobe, power pack, and reflector: together they generate consistent and homogeneous light distribution. recommended further reading: "introduction to off-camera flash: three main choices in strobe lighting."10
4. light stand: it holds the strobe and reflector.
5. light meter: hand-held exposure meters measure light falling onto a light-sensitive cell and convert it into a reading that enables the correct shutter speed and/or lens aperture settings to be made.11
6. book cradles: they help to minimize the stress on bookbindings and minimize page curvature problems.
7. nonreflective glass: it helps to flatten a photographed page and reduce reflection. however, it does not completely eliminate glass reflection. one very useful trick to reduce glass reflection is to place a black board with a hole above a page and shoot through the hole. this approach does not actually eliminate the reflection; rather, the glass reflects the black board, so when the photograph is reviewed on a computer it appears as if no reflection occurred.

figure 25. the university of kansas spencer museum of art digitization lab setup for materials that need to be pressed down by glass.

many librarians believe that digitizing print materials using a digital camera requires a professional photographer, but this is not necessarily true. a professional photographer or even an art student can act as a consultant to help set up a shooting studio and provide basic training. also, many museums have professional photographers and have set up shooting studios for digitization. they are very willing to share their experience and even provide training. i believe the learning curve for operating a shooting studio is no greater than the learning curve for operating an overhead scanner and its software.

pros and cons

no digitization equipment or system is perfect. they all have trade-offs in image quality, speed, convenience of use, quality of accompanying software, and cost. our tests show that for most archival materials a dslr camera will do a better job than an overhead scanner.

pros of overhead scanner

• the scanner is a complete scanning station. it can be connected to a computer and scanning can start immediately.
materials can be placed on the scanning surface, so no equipment adjustments are required while scanning. • it can scan and save images in bitmap format directly, while a dslr camera can only shoot in grayscale or color. • built-in book cradles help to scan thick books and those that cannot be fully opened. • book curve correction functionality is provided by the accompanying software. cons of overhead scanner • high cost. the overhead scanner we have cost more than $50,000, with an annual maintenance contract of $8,000. • high replacement cost. when a scanner is outdated or broken, the entire machine has to be replaced. • instability. our overhead scanner is unstable even when placed on a sturdy table and handled only by professionals. from april 2010 to october 2010, the scanner was down for a total of forty-two working days (sixty calendar days). the company fixed the machine onsite many times, but it continues to have minor problems and has not been completely reliable. • the autofocus feature does not work consistently. • special training is needed to operate the machine and associated software. • file formats supported are limited. most scanners only support tiff, jpeg, jpeg 2000, windows bmp, and png. • unsupported outdated software: our overhead scanner’s software can only be run on an older operating system (windows xp) because there is no updated software for this model. pros of budget studio • stable. under normal use dslr cameras are much less likely to break down than scanners. for example, i have had an older dslr, nikon d200, for seven years. it has survived numerous backpacking trips, multiple drops, and extreme weather conditions. the camera still functions as needed. • fast and accurate focus. dslr cameras are designed to focus quickly, and their focus indicators provide instant feedback to the operators so they know that the image is focused. if operated properly, images delivered by dslr cameras can be sharper than ones delivered by scanners. • less expensive. a good quality dslr camera and a lens can be purchased for fewer than $4,000 and last for years. as technologies advance, dslr cameras’ prices will continue to drop. • ability to save files in more formats. in addition to tiff and jpeg formats, most dslr cameras can save photos in raw file format. some cameras can directly save images in digital negative (dng) format, and others deliver images in proprietary formats that can be information technology and libraries | march 2016 48 converted into dng using a computer program. editing raw images is nondestructive, while editing of tiff and jpeg images is irreversible. • accurate wb and exposure. by using right shooting and post-processing techniques, photographs can have exact color reproduction. on the other hand, calibrating an overhead scanner most likely can only be performed by a company’s trained technician. proper exposure and wb are not guaranteed. • the raw file format usually provides more dynamic range. overexposed and underexposed images can be fixed by adjusting exposure compensation via software; thus lost shadow or highlight detail can be restored. • can photograph 3-d objects. archival collections often have materials other than books, such as art pieces. these materials are better to be photographed than scanned. • versatile. cameras can perform on-site digitization, while overhead scanners are too bulky to be moved around. • faster and better preview. 
images can be viewed instantly on a computer when proper software, such as adobe lightroom, is used. operators can compare multiple shoots on a screen side-by-side and decide which photo to retain. • more accessible technical support. the number of dslr camera users is much higher than overhead scanner users. technical questions can often be answered through online forums. • easy to find replacement parts. when a piece in a shooting studio break down, it is easy to find replacement piece and replace by staff. • easy software updates. software used in a studio is independent from equipment. cons of budget studio • there is learning curve for setting up a shooting studio, operating the studio, and mastering new image processing techniques. • a dslr camera with a lower pixel setting will not be sufficient for scanning large-format materials, such as posters and maps. • no built-in book curve correction is provided by adobe photoshop or lightroom. however, our experience proves that the automatic book curve function does not always work well. we normally use a home-made book cradle to help lay a page flat and use one or two weights to hold down the other side of book. for some books, if flatness is hard to achieve, we place a piece of glass on the top to ensure the flatness. • security concern: since a dslr camera is highly portable, it can be stolen easily. fulfill your preservations goals with a budget studio | zhou doi: 10.6017/ital.v35i1.5704 49 figure 26. scanning setup using a book cradle. conclusion the technology of dslr cameras has advanced very quickly in the past ten years. newer dslr cameras can handle higher resolutions and have very little image noise even at a high iso setting. the higher demand for dslr cameras and accompanying image-editing software results in more rapid technology advances compared to low-demand and high-end overhead scanners. high consumer demand drives dslr camera prices much lower than prices for overhead scanners. in addition, the wide range of consumers purchasing dslr cameras and software prompts companies to offer more user-friendly interfaces. as you can see from our tests, for most library materials a dslr camera can produce superior images. if you do not have a budget for high-end overhead scanners, you can still fulfill your digitization preservation goals with a budget studio. acknowledgement i would like to thank robert hickerson and ryan waggoner, the university of kansas spencer museum of art, tim hawkins, and steelworks center of the west for showing their digitization labs and sharing experience with me. references 1. federal agencies digitization guidelines initiative, “technical guidelines for digitizing cultural heritage material: creation of raster image master files,” august 2010, http://www.digitizationguidelines.gov/guidelines/digitize-technical.html 2. “tutorials: white balance,” cambridge in colour, accessed march 9, 2016, http://www.cambridgeincolour.com/tutorials/white-balance.htm. http://www.cambridgeincolour.com/tutorials/white-balance.htm information technology and libraries | march 2016 50 3. “colorchecker passport user manual,” x-rite incorporated, accessed march 9, 2016, http://www.xrite.com/documents/manuals/en/colorcheckerpassport_user_manual_en.pdf. 4. scott kelby, “scott kelby's editing essentials: how to develop your photos,” pearson education, peachpit, accessed march 9, 2016, http://www.peachpit.com/articles/article.aspx?p=2117243&seqnum=3. 5. “srgb vs. 
adobe rgb 1998," cambridge in colour, accessed march 9, 2016, http://www.cambridgeincolour.com/tutorials/srgb-adobergb1998.htm. 6. "colour banding," wikipedia, accessed march 9, 2016, http://en.wikipedia.org/wiki/colour_banding. 7. "posterization," wikipedia, accessed march 9, 2016, http://en.wikipedia.org/wiki/posterization. 8. "image posterization," cambridge in colour, accessed march 9, 2016, http://www.cambridgeincolour.com/tutorials/posterization.htm. 9. richard anderson and peter krogh, "color space and color profiles," american society of media photographers, accessed march 9, 2016, http://dpbestflow.org/color/color-space-and-color-profiles. 10. tony roslund, "introduction to off-camera flash: three main choices in strobe lighting," fstoppers (blog), accessed march 9, 2016, https://fstoppers.com/originals/introduction-camera-flash-three-main-choices-strobe-lighting-40364. 11. "introduction to light meters," b & h foto & electronics corp., accessed march 9, 2016, http://www.bhphotovideo.com/find/product_resources/lightmeters1.jsp.


journal of library automation vol. 4/1 march, 1971

recon pilot project: a progress report, april-september 1970

henriette d. avram and lenore s. maruyama: marc development office, library of congress, washington, d.c.

a synopsis of the third progress report on the recon pilot project submitted by the library of congress to the council on library resources. an overview is given of the progress made from april through september 1970 in the following areas: recon production, format recognition, research titles, microfilming, and investigation of input devices. in addition, the status of the tasks assigned to the recon working task force is briefly described.

introduction

the recon pilot project was established in august 1969 to test various techniques for retrospective conversion in an operational environment and to convert a useful body of records into machine readable form. it is being supported with funds from the council on library resources, the u.s. office of education, and the library of congress. this article summarizes the third progress report of the pilot project submitted by the library of congress to the council and addresses all aspects of the project, regardless of the source of funding, in order to present a meaningful document.
two previous articles in the journal of library automation summarized the first and second progress reports, respectively ( 1), ( 2). this article describes the activities occurring april through september 1970. progress-april through september 1970 recon production at the present time, the recon data base contains approximately 20,000 records. it appears that the original estimates on the number of titles to be input during the recon pilot project were considerably higher than the actual number found to be eligible. this situation occurred because of the following circumstances: recon pilot project/ avram and maruyama 39 1) the original estimates were derived from the number of english language monographs cataloged during 1968 and 1969. since the marc distribution service began in march 1969, it was felt that the number of titles eligible for recon in the 1969 and 7-series of card numbers would be equal to the number cataloged during january-march 1969. in actuality, the titles cataloged during this period were primarily records with 1968 card numbers. 2) the estimate of records with 1968 card numbers was higher because it was thought that many more of these titles had been through the cataloging system than were actually processed prior to the beginning of the marc distribution service. instead of being included in recon, these records have been input into the marc distribution service. in order to obtain 85,000 records for conversion, several alternatives, including the conversion of english language monographs in the 1967 card series, are being studied. format recognition format recognition is a technique that will allow the computer to process unedited catalog records by examining data strings for certain keywords, significant punctuation, and other clues to determine the proper content designators. this technique should eliminate substantial portions of the manual editing process and, if successful, should represent a considerable savings in the cost of creating machine readable records. the logical design for format recognition has been completed, and the manual simulation to test the efficiency of the algorithms was described in an earlier article ( 3). completion date for the programs is expected in february 1971. the programs were designed in several modules so that they could be adapted for different input procedures without disturbing the logic. once the programs have been implemented, tests may show that certain fields should be pretagged because the error rate is too high or the occurrence of the field is too low to justify the processing time. the complete logical design for format recognition has been published as a separate report by the american library association ( 4). as part of a manual simulation to test the format recognition algorithms, one hundred fifty records for english language monographs were typed on an mt/st, a typewriter-to-magnetic tape device. the mt/st hardcopy output was used as the raw data for the simulation. the results of the test were analyzed for possible changes to the algorithms, keyword lists, or input specifications. then the records with the content designators assigned by the format recognition algorithms were retyped and processed by the existing marc system programs. proofsheets were produced and given to the recon editors for proofing, a process to verify content designators and bibliographic information. 40 journal of library automation vol. 
4/1 march, 1971 each editor proofed all of the format recognition records; their hourly numbers of records proofed were as follows: highest, 9.3; lowest, 5.3; average, 6.8. the average number of current marc records edited and proofed in an hour is 4.8. when format recognition is implemented, present workflow-editing, typing, computer processing, proofing-will be replaced by a new onetyping, format recognition, proofing. in comparing production rates in the two systems, time needed to proof format recognition records must be compared against time needed to edit and proof in the current system. several factors should be considered when evaluating this portion of the simulation experiment. although all the records chosen for the test were of english language monographs, they were generally more difficult than those encountered in a normal day's work for both editors and typists. in addition, numerous errors were made by the human simulators, such as omission of subfield codes, delimiters, or fixed field codes. format recognition does appear to have reduced the amount of time spent in the combined editing and proofing process, but the success of the program depends heavily on the following factors: 1) extensive training for the input typists with greater emphasis placed on their role in this project; and 2) extensive training for the editors to alert them to kinds of errors the format recognition programs might make. proofing time for the test was greater than anticipated. with fewer errors from the typing input and the elimination of human errors from the simulation, it is possible that the proofing rate will be higher under actual work conditions. editors might reach an average of 9.3 records proofed, or double the number presently done in a combined editing/ proofing process. two programs are being written to support the format recognition project. format recognition test data generation (fortgen) will provide test data for format recognition by stripping marc records of delimiters, indicators, and subfield codes, and reformatting the data to be identical with the product from the initial input program. thus, a large quantity of high quality test data can be provided without additional keystroking. the keyword list maintenance program ( klmp) maintains approximately sixty keyword lists used by the format recognition program in processing bibliographic data. these lists are maintained as a separate data set on a 2314 disk pack. the actual lists themselves, alon~ with associated control data, are referred to as "keyword list structures. ' the general function of klmp is to read the entire set of keyword list structures from the file on disk, modify them as specified by parameter cards to klmp, and write a new file on disk. the individual actions performed by klmp are as follows: 1) create a list; 2) remove a list; 3) add a keyword; 4) delete a keyword; 5) augment a table (translation tables to recon pilot project/ avram and maruyama 41 generate codes such as geographic area code, language, place of publication); and 6) list structures (printout of all or selected portions of a list). since the keyword lists will be dynamic in nature, this program provides the flexibility required to change or update them without recataloging the entire format recognition program. new lists will be added as format recognition is extended to other languages, and keywords will be added to or deleted from existing lists as experience is gained in the use of format recognition. 
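to make the keyword-list idea behind format recognition concrete, a toy sketch follows: an untagged data string is examined for keywords and punctuation and assigned a guessed field. the field names, keyword lists, and rules below are invented for illustration only; the actual library of congress algorithms were far more extensive and are described in the published design (4).

# Toy illustration of format-recognition-style tagging: guess a field for an
# untagged string from keywords and punctuation. Keyword lists and rules are
# invented for illustration, not the LC algorithms.
import re

COLLATION = re.compile(r"\d+\s*p\.")             # e.g. "xii, 418 p. 26 cm."
SERIES_KEYWORDS = ("series", "library", "studies")
IMPRINT_KEYWORDS = ("press", "& co", "printed for")

def guess_field(text):
    lowered = text.lower()
    if COLLATION.search(lowered):
        return "collation"
    if lowered.startswith("(") and any(k in lowered for k in SERIES_KEYWORDS):
        return "series statement"
    if any(k in lowered for k in IMPRINT_KEYWORDS) and re.search(r"\b1[6-9]\d\d\b", text):
        return "imprint"
    return "unidentified"                         # would be flagged for an editor

print(guess_field("Philadelphia, Hogan Perkins & Co., 1852."))   # imprint
print(guess_field("xii, 418 p. 26 cm."))                          # collation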
research titles since the production operations of the recon pilot project have been limited to english language monographs in the 1968, 1969, or 7 -series of card numbers, it was recognized that many problems concerning retrospective records would not be revealed in the conversion of relatively current titles. for this reason, a project to identify and analyze 5,000 research titles was included as part of the pilot project. these research titles would consist of records for older english language monographs and foreign language monographs in roman alphabets and would be studied for problems in the following areas: 1) earlier cataloging rules which caused certain elements to be omitted from the record or transcribed in a different style; 2) different printed card formats which placed elements in different locations; 3) difficulty in working with foreign languages when converting records to machine readable form; 4) problems arising from shared cataloging records; and 5) problems arising when expanding the format recognition algorithms to cover these kinds of records. the selection of these records was described in an earlier article ( 5). the initial analysis of the research titles has been completed, and a few of the problems encountered are listed as follows: 1) ellipses at the beginning of a title field ( • . . dictionnaire-manuelillustre des ecrivains et des litteratures) were used frequently on older cataloging records. since they are no longer prescribed by the present cataloging rules unless they appear on the title page at the beginning of a title, it was recommended that such ellipses be deleted from the machine record because they would affect the format recognition algorithms. 2) card numbers without digits representing the year (f-3144) were assigned during 1901. generally, these numbers appear with an alphabetic prefix representing the language of the publication or the classification number. it has been recommended that such numbers be revised to read "f01-3144" for the machine record. 3) records cataloged under the 1908 a. l.a. catalog rules included in the series statement such information as the editor of the series or the location of the series statement (half-title: everyman's library, ed. by ernest rhys. reference). it has been recommended that such information he deleted from the machine record. 4) an asterisk preceding personal name added entries (i. 0 spence, 42 journal of library automation vol. 4/1 march, 1971 lewis, 1874joint author.) indicated that the name had appeared in a fuller form at an earlier date; if this name were used as the main entry, there would have been a corresponding full name note at the bottom of the catalog card. it has been decided that this asterisk will be deleted from the machine record. 5) the national bibliographies from which shared cataloging copy is derived use punctuation conventions which differ from the aa rules. for example, the west german bibliography uses parentheses to indicate that the data are not on the title page, brackets to indicate the data are not in the publication, and angled brackets to indicate that the data are enclosed in parentheses on the title page ( <22.-27. mai 1967>. koln ([-ehrenfeld] bundesinstitut fur ostwissenschaftliche und internationale studien) 1967). such conventions would affect the expansion of the format recognition algorithms to foreign languages. this is an area in which the standard bibliographic description would be of great value. 
6) in the marc ii format, each place of publication is a separate subfield so that when each place is connected by hyphens (milano-romanapoli ... ,), there would be a problem in inputting the data and having the data printed out in the same fashion. it has been recommended that each place of publication be separated with a comma instead of a hyphen (and the ellipsis deleted from the imprint statement). 7) conjunctions have been used between places of publication on records cataloged according to the 1908 rules and on some shared cataloging copy (london, glasgow and bombay) (neuwied a. rh. u. berlin). in the machine record, each place is a separate subfield, and the presence of a conjunction means that one subfield contains non-essential data. it has been recommended that conjunctions be omitted from the machine record and that places of publication be separated by commas. 8) the a. l.a. cataloging rules for author and title entries states that with certain well-known persons, dates of birth and death can be omitted when the heading is followed by a subject subdivision ( 1. shakespeare, william-language-glossaries, etc.). since the rules provide a list of such persons, it has been recommended that when such names are used as subject headings, they should include dates of birth and death in the machine record. 9) a collation statement like the following ( 25 p., 27-204 p. of ill us., 205-232 p., 233-236 p. of illus., 237-247 p. 28 em.) would cause the format recognition algorithms some difficulty in identifying the proper subfields. this is another area in which the adoption of a standard bibliographic description would aid format recognition programs. 10) both east and west german bibliographies give information about illustrations in the title paragraph rather than in the collation (title paragraph: [mit] 147 abbildungen und 71 tabellen. collation: xii, 418 p. 26 em.). the cataloging policy at the library has been revised so that recon pilot pro]ect/avram and maruyama 43 on current cataloging records information about illustrations is also repeated in the collation. it has been recommended that for retrospective records the data should be input as it appears on the catalog card. in this example, the machine record would not contain illustration information in the collation. 11) the method of transcribing non-lc subject headings has been changed in recent years, and the marc ii format reflects this change. in previous years, the following conventions were used: subscript brackets enclosed headings or portions of headings that were not the same as the lc form; subscript parentheses enclosed portions of headings that were the lc form but not the contributing library's; if two headings had the same number, the lc form was listed first; if both forms of the heading were the same, there would be only one number, and the heading itself would not have the subscript brackets or parentheses. it has been recommended that either the non-lc forms be deleted from the machine record or the transcription of such subject headings be revised to follow the current practice. 12) nlm subject hearings have different capitalization conventions from those used by lc, and the geographic subject subdivisions are often in a form different from that which the library of congress uses ( [dnlm: 1. public health administration-u.s.s.r. w6 p3]). 
in analyzing these research titles in terms of possible problems with format recognition, it was discovered that nlm subject headings would be incorrectly identified for the above reasons. format recognition depends heavily on capitalization and keyword lists; in this example, the heading "public health administration" would be identified as a corporate name because of the capitalization. examination of the research titles showed the similarity of the cataloging of the older records (pre-1949) and the current foreign language records based on shared cataloging copy. certain stylistic conventions, such as the use of ellipses or the transcription of imprint statements, were similar for both kinds of material. it would be necessary to have a thorough knowledge of the ala catalog rules (published in 1908) in order to interpret the data on the older printed cards correctly during a conversion project. the experience of the editors in the recon production unit has been that retrospective records, even those cataloged during the last two years, require a considerable amount of interpretation in order to assign the correct content designators in the fixed fields. for pre-1949 records, the problem becomes more acute when one attempts to apply the procedures and techniques for current material to older records. it is very likely that a higher level of personnel would be required to process these records because in many instances the changes would be similar to recataloging the entire record. the expansion of format recognition to foreign languages would be extremely difficult without a greater degree of consistency in shared cataloging copy. each national bibliography, from which the cataloging copy is derived, has its own rules and style of cataloging, so that although the language of the works may be the same, e.g., german, the entries from the west german, east german, austrian, and swiss bibliographies may differ in terms of punctuation or style of cataloging. these problems have been compounded by printer's errors on the printed cards as the result of conventions that differ from the aa rules. the adoption of the standard bibliographic description (6) would be a tremendous aid in interpreting cataloging data by both humans and format recognition programs.

microfilming techniques

the library's photoduplication service is supporting the recon pilot project by providing the cost estimates for the various alternatives of microfilming techniques and providing technical guidance as required. several discussions with them confirmed that the method of filming a portion of the record set containing the subset of records to be converted first and selecting the appropriate records afterward would be more advantageous than selection prior to microfilming (7). it was considered unrealistic to attempt to project microfilming costs for the entire recon effort. because of the paper handling problems involved in the management of input worksheets, the microfilming rate should be in reasonable proportion to the actual conversion rate. there is no point in providing a huge supply of input worksheets which will not be used in actual conversion for a long time. the data may become "dated," and there may be storage and handling problems. in addition, cost estimates provided by the photoduplication service can only be expected to prevail over the next twelve months.
beyond that period, any quotation given is likely to be higher because of the general trend of rising costs. any projection of costs should be based on a manageable portion of the whole. just what this portion should consist of has yet to be determined. assuming a modus operandi as described above, there is needed a determination of the "rate floor," which is defined as the minimum number of records that must be microfilmed to achieve the maximum cost benefits resulting from a relatively high volume job. once the rate floor is determined, it should probably be translated into year equivalents, i.e., if the rate floor is 100,000 and the catalog card production is 50,000, then two years· worth of cards would be microfilmed. estimates would be obtained for the following alternatives: microfilming for ocr device specifications; microfilming for reader-printer specifications; microfilming for reader specifications; and microfilming for xerox copyflo printouts of the lc printed cards onto recon worksheets. certain ground rules were assumed for the actual microfilming process. the selected drawers of the record would be "frozen" for a day or two prior to being filmed, i.e., the file would be complete and no one would recon pilot project/avram and maruyama 45 remove cards from the file while filming was in process. the filming would take place during the day. assuming that 100,000 cards for the year 1965 would be used as a base figure and that approximately 5,000 cards per day can be filmed with a planetary camera, it would take twenty working days to film the collection of cards for one year in the record set (rate floor as defined above). all cost estimates will include quality control; i.e., quotations would indicate degree of inspection of film for technical quality and degree of preparation of the file before filming. input devices during 1969 the library of congress conducted an investigation to determine the feasibility and desirability of using a mini-computer for marc/recon input functions (original input and corrections). this study was performed with contractual support and consisted of three basic tasks: 1) analysis of present operations to determine functional requirements, to measure workloads, and to identify problem areas; 2) survey and analysis of mini-computers that are potentially capable of meeting the requirements of the present operations; 3) evaluation of available hardware and software capabilities relative to marc data preparation requirements and determination of economic feasibility based on present and projected workloads. the intent of this study was to provide a basis for future planning and procurement activities by the library of congress relative to improvement of the marc/recon man-machine interface. the survey of hardware was not intended to be all-inclusive. there were time and funding limitations, and in addition it was recognized that the mini-computer field was a rapidly expanding one; therefore, it was not possible at any cut-off point to have surveyed the totality. six firms were included in the survey, and the machines considered were the burroughs tc-500, the digital equipment corporation pdp-8/i, the honeywell pdp-516, the ibm 1800, the lnterdata model 4, and the xds sigma 3. of these, the dec pdp8/1 and the honeywell pdp-516 were determined to have the highest potential for meeting marc/recon requirements. additional analysis revealed that software availability for mini-computers is minimal. 
manufacturers covered in this investigation supplied an assembler as well as testing and editing routines. some provided a fortran, algol, or basic compiler and an operating system with foreground/background processing. systems that support fortran and the operating system are quite substantial, generally requiring 16,000 words of core, memory protect, disc, etc. the cost of this kind of system is generally a minimum of $10,000. few low-cost peripheral devices are available for use with mini-computers. high-speed tape readers, punches, and punched card readers are the most inexpensive input/output devices available. the addition of a magnetic tape unit to most systems significantly increases the overall cost. the conclusion reached as a result of this investigation was that there is no gain, either technically or economically (considering the hardware configuration of the library of congress), to using a mini-computer in performing present marc/recon functions.

another input device investigated during this reporting period was the keymatic data system model 1093, which was selected for a two-month test and evaluation period because it appeared to have the following advantages for the recording of bibliographic data: 1) this device has 256 unique codes; 2) data is recorded directly on computer compatible magnetic tape; 3) through manufacturer supplied software, the user may assign to certain keys, called expandables, the value of whole strings of characters; thus a single key would equate to a marc tag; 4) correction procedures are built into the device, i.e., the ability to delete a character, word, sentence, or entire record; and 5) the single character display screen obviates the necessity for hard copy. it is often claimed that hard-copy output is scanned by the typist unintentionally to the detriment of typing rates. the machine tested was specifically set for the library's requirements. four separate keyboards contained 184 keys, of which 103 had upper- and lower-case capability, and the remaining 81 had only a single case. the 256 possible codes were divided into the following categories: 1) 94 were used as expandables and assigned to those marc tags and data strings (correction and modification symbols) that appear most frequently; 2) 10 were used as machine function codes; 3) 150 were assigned unique values in the marc character set; and 4) 2 were left unused. the keys on the four keyboards were assigned values such that the most frequently used keys were located in a strong stroke area. the main character keyboard was designed to be closely compatible with the device currently in use at the library to lessen the training requirements for the typist. therefore, the typist had only to learn the expandable keys and some lesser used special characters. the program supplied by the manufacturer was modified for code conversion and output format acceptable to the marc system and to conform to the library's computer system assignments. the two typists selected to participate in the test were both experienced marc production typists. both typists were given individual instruction on the machine and spent three weeks practicing; at the same time, their performance was being analyzed and discussed with them. during the official evaluation period, the typists spent two weeks working full time on the machine. when the typists began their practice period, their speeds were relatively slow, 6-7 records per hour.
as time progressed, their speed increased, leveling off to approximately 11-12 records per hour by the end of the test period. each typist reported problem areas during the official evaluation. one problem was the hesitation which resulted when the typist had to determine whether to use an expandable key or actually type the data, character by character. if she chose the former, the expandable key had to be found. the number and different combination of tags caused some confusion. the opinion of both typists concerning the keyboard arrangement was that they would rather type the tags character by character than search for the expandable key. more experience on this device might eliminate this problem. the absence of hard copy was felt to cause another problem. when a typist intuitively feels that she has made an error in current marc/recon typing operations, she uses the hard copy to verify that a mistake has actually been made prior to taking corrective action. the lack of hard copy did not allow for this verification, and the typists reported that this detracted from their efficiency. the following table lists the results of the official evaluation period. the average production rate of these two typists on the mt/st is also listed. the figures for mt/st production have been calculated for a particular three-week period.

                                   typist a    typist b    total      mt/st
new records                        505         540         1045       1995
correction records                 323         278         601
verified records                   58          537         595
average records/hour (new)         10.1        14.0        12.1       14.6
average records/hour (corrected)   21.3        27.7        24.5
keystrokes, total                  238,435     259,630     498,065
expandables used                   12,280      14,646      26,926

the keymatic model used for the test rents for $768.25 per month (july 1970 pricelist). it is a fully equipped model with several options not required for the marc system. without these options, a less expensive model could be used. keymatic does have a 24-month lease plan in which the basic machine could be rented for $368.00 per month. this is an increase of $258.00 per month per machine over the current method of input. costs per record were computed for the keymatic device and for the mt/st based on the average record statistics of both typists. although the same records were not actually typed on the mt/st, extensive experience with production and error rates on that device made it valid to use average production rates for purposes of comparison. for purposes of computing the cost per record, the hourly cost per machine was calculated by dividing the cost per machine by 160 working hours. the 24-month leasing price of $368.00 per month was used for the keymatic, resulting in a machine cost per hour of $2.30. the mt/st rental cost is $110.00 per month, resulting in an hourly cost of $.69. (the cost of the mt/st listed in a previous article (8) as being $100.00 was in error.) on the basis of 12.1 records per hour on each device, the cost per record for the keymatic is $.19 and $.06 for the mt/st. in the context of the library of congress marc/recon project, the addition of a digi-data to translate mt/st output to computer compatible tape adds an incremental cost to each input device. for the purposes of this report, it was assumed that the project required five input devices. on this basis, the prorated digi-data cost per hour is $.33, which makes the total machine cost per hour for the mt/st $1.02. thus, the cost per record for the mt/st becomes $.08.
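for ease of reference, the arithmetic behind the per-record figures just given can be restated in one place; every number below is taken directly from the rates and rentals reported above:

$$
\begin{aligned}
\text{keymatic: } & \$368.00 / 160\ \text{hr} = \$2.30\ \text{per hour}; \quad \$2.30 / 12.1\ \text{records per hour} \approx \$0.19\ \text{per record} \\
\text{mt/st: } & \$110.00 / 160\ \text{hr} \approx \$0.69\ \text{per hour}; \quad \$0.69 / 12.1 \approx \$0.06\ \text{per record} \\
\text{mt/st with digi-data: } & \$0.69 + \$0.33 = \$1.02\ \text{per hour}; \quad \$1.02 / 12.1 \approx \$0.08\ \text{per record}
\end{aligned}
$$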
the results of the test indicated that the keymatic used in the library of congress environment did not substantially increase production rates or decrease error rates. thus, no savings in cost were demonstrated. the complex data to be typed and the construction and quality of the worksheets at the library of congress impose severe constraints on all machines. (the manuscript card reproduced on the marc/recon worksheet results in a source document that is difficult to work with for the following reasons: 1) loss of legibility during the copying process; 2) position of tags in relation to content; and 3) combination of typed and handwritten data as recorded by the catalogers.) in order to make a fair comparison between the keymatic and the mt/st, the manuscript card was used for the test rather than the printed card. if, on evaluation, the keymatic proved to be more efficient than the mt/st using the manuscript card, it would be even more effective if the printed card were used, since the latter is a far more legible source document. keymatic does have a new machine, model k-103, which has an 80-character visual display option which might correct one of the objections raised by the typists, i.e., lack of hard copy; however, this model requires the use of a converter as does the mt/st. this device is less expensive than the machine used in the test and may be evaluated during the recon project at a later date.

an investigation of the model 370 compuscan was continued following the initial findings reported in a previous article (9). twenty-five letterpress library of congress printed cards representing english language titles and containing no diacritical marks in the content were sent to the firm for input. this allowed the machine to be evaluated and problems noted within an "ideal" test environment. depending on these results, further testing could be performed. since existing compuscan software was used to conduct the library of congress test, the entire lc card could not be read but only that portion that contained fonts already built into the existing configuration. the printed cards were blocked out, except for the area covering the body of the entry, i.e., title through imprint, prior to microfilming for subsequent scanning. operator intervention was required on approximately 1%-25% of the characters on each card. in addition to the problems offered by variant and touching characters, fine lines in certain characters caused a misreading by the machine. this was particularly true with the letter "e" being interpreted as the letter "c." compuscan felt this problem might be resolved by increasing the size of the comparison matrix of the hardware. in some instances, a period was generated in the middle of a word due to the coarseness of the card stock that was microfilmed. initial discussions have begun on the possibility of testing a retyped version of the printed card. the only rationale behind this test would be to investigate if typing for a scanner that could read upper- and lower-case and special characters made any significant difference in speed and/or error rate compared to costs and production rates of typing for a scanner which could read only upper-case characters. the latter was described in an earlier article on recon (10).

recon working task force

the working task force continued the discussion on the implications of a national union catalog in machine-readable form.
from the postulated reporting system for a future nuc described in an earlier article (11), several items were isolated for further consideration. these included: 1) grouping of records in a register (by language, alphabet, etc.) to allow for a segmented approach to computer-produced book catalogs (a register is defined as a printed document containing the full bibliographic descriptions of works sequenced by unique identification numbers. as each record is added to the register, it is added at the end and assigned the next sequential identification number); 2) the need for additional indexes to the register by lc card number and classification number (the class number was not included in the list of data elements required for the machine-readable nuc); 3) the requirement to include the author statement in the title index versus using the main entry in all cases; and 4) clarification of subject index to mean only topical or geographic subjects. the following tasks were outlined for further consideration: 1) format of the printed nuc (graphic design and printing, size, style, typographic variation, etc.); 2) physical size of the volume depending on pattern of distribution (monthly, bimonthly, etc.); 3) input (relationship to marc input, use of format recognition, problems of languages in terms of selection for input); 4) output (cost of production for register and indexes, cost of sorting, costs of selection, etc.); 5) cumulation patterns in terms of cost and utility (number of characters in an average entry, number of items on a page, rate of increase, etc.); 6) the use of com (computer output microfilm) as an alternative to photocomposition for printed output.

work on task 3, the investigation of the possible use of existing data bases in machine readable form for a national bibliographic service, has been continued. phase 1 of this task consisted of a survey of existing machine readable data bases. selection of data bases for analysis was based on the following criteria: 1) the data base had to include monograph records. 2) any data base known to have predominantly lc marc records was excluded. 3) the data base had to be potentially available to recon (security organizations or commercial vendors might not be willing to give their files to a recon effort). 4) data bases of less than 15,000 records were excluded. a data analysis worksheet was prepared to reduce the documentation to a standardized form for each system studied in the survey. it was initially anticipated that once documentation was received from the various institutions, additional contact would be made via telephone or on-site visits. this proved to be unnecessary, as the submitted documentation was generally sufficient. since many of the formats submitted were complicated, errors could have been made in interpretation; however, this possibility was not considered important enough to affect the findings of this task. if necessary, additional information can be requested from the library systems at a later date. the analysis of the submitted documentation was difficult for the following reasons: 1) the amount of documentation ranged from extremely detailed to very sparse; 2) neither the technical nor the bibliographic terminology was consistent for all organizations; 3) in some instances, the format descriptions were more detailed with respect to control and housekeeping data fields than bibliographic data fields.
the formats were ranked according to three broad categories: low potential, medium potential, and high potential. to arrive at a ranking, the data fields of each format were compared to the marc ii format. comparison was made on the following basis: 1) present in both formats; 2) not present in local format and not capable of generation by format recognition algorithms; or 3) not present in local format but capable of generation by format recognition. the result of this analysis distributed the twenty-two institutions into the following ranked order: 1) low potential-s; 2) medium potential-s; 3) high potential-h. the figure for the number of low potential data bases is in addition to the eight out of the eleven originally rejected due to a small data base or very limited content in the record. it is significant to note that although no attempt was made at an all-inclusive survey of machine readable data bases, the total number of records in machine readable form reported by the respondents amounted to approximately 3.7 million of all types. of this figure, about 2.5 million represented monograph records. the phase 1 study included procedures required to transform a record into a certified recon record, thus outlining the areas requiring cost analysis to compare the economics of using existing files for a national bibliographic store, as opposed to original input. (certification in this context means comparing the record of the local institution to the record in the lc official catalog and, if required, making the record consistent with the lc cataloging as well as upgrading it to the bibliographic completeness of the lc record. input in this sense includes the editing of the record as well as the keying.) the results of the study, prior to any further analysis, seem to indicate that the next phases of task 3 will concentrate on a very large data base with a high degree of compatibility with marc ii (high potential) and another data base with a format differing from marc ii both in level of explicit identification and in bibliographic completeness (medium potential). the first data base tests the most favorable situation; the latter a much less favorable situation. the carry-on phases of task 3 will include: 1) a determination of a cut-off point at which a particular data base would not be included in future studies (although the composition and the format of the records in the data base might fit the selection criteria, the number of records in the file might be insufficient to warrant the costs of the hardware/software for the conversion effort); 2) investigation of the hardware and software effort involved; and 3) determination of the costs of comparing the records with the lc official catalog and the resultant updating costs to bring the records up to the level of the records in the lc machine readable marc/recon data base.

acknowledgments

the authors wish to thank the staff members associated with the recon pilot project in the marc development office, the marc editorial office, and the technical processes research office in the library of congress for their contributions to this report. the lc photoduplication service provided valuable assistance in certain phases of this project. work on the recon pilot project has continued to be supported by the council on library resources and the u.s. office of education.

references
1. avram, henriette d.: "the recon pilot project: a progress report," journal of library automation, 3 (june 1970), 102-114.
2. avram, henriette d.; guiles, kay d.; maruyama, lenore s.: "the recon pilot project: a progress report, november 1969-april 1970," journal of library automation, 3 (september 1970), 230-251.
3. ibid., p. 235.
4. u.s. library of congress. information systems office. format recognition process for marc records: a logical design. chicago: ala, 1970.
5. avram, henriette d.; guiles, kay d.; maruyama, lenore s. op. cit., p. 236.
6. ibid.
7. ibid., p. 237.
8. ibid., p. 246.
9. ibid., pp. 244-245.
10. ibid., pp. 245-248.
11. ibid., p. 248.

identifying key steps for developing mobile applications and mobile websites for libraries
devendra dilip potnis, reynard regenstreif-harms, and edwin cortez
information technology and libraries | september 2016

abstract

mobile applications and mobile websites (mamw) represent information systems that are increasingly being developed by libraries to better serve their patrons. because of a lack of in-house it skills and the knowledge necessary to develop mamw, a majority of libraries are forced to rely on external it professionals who may or may not help libraries meet patron needs but instead may deplete libraries' scarce financial resources. this paper applies a system analysis and design perspective to analyze the experience and advice shared by librarians and it professionals engaged in developing mamw. this paper identifies key steps and precautions to take while developing mamw for libraries. it also advises library and information science graduate programs to equip their students with the specific skills and knowledge needed to develop and implement mamw.

introduction

the unprecedented adoption and ongoing use of a variety of context-specific mobile technologies by diverse patron populations, the ubiquitous nature of mobile content, and the increasing demand for location-aware library services have forced libraries to "go mobile." mobile applications and mobile websites (mamw), that is, web portals running on mobile devices, represent information systems that are increasingly being developed and used by libraries to better serve their patrons. however, a majority of libraries often lack the in-house human resources necessary to develop mamw. because of a lack of staff equipped with the requisite it skills and knowledge, libraries are often forced to partner with and rely on external it professionals, potentially losing control over the process of developing mamw.1 partnerships with external it professionals do not always help libraries meet the information needs of their patrons but instead can deplete their scarce financial resources. it then becomes necessary for librarians to understand the process of developing mamw in order to evaluate mamw for better serving library patrons.

devendra dilip potnis (dpotnis@utk.edu) is associate professor, school of information sciences; reynard regenstreif-harms (reynardrh@gmail.com) is project archives technician, great smoky mountains national park, gatlinburg, tennessee; and edwin cortez (ecortez@utk.edu) is professor, school of information sciences, university of tennessee at knoxville.
one possibility is to re-educate themselves through continuing education or other professional development activities. another solution would be to see library and information science (lis) schools strengthen their curriculum in the area of management, evaluation, and application of mamw and related emerging technologies. issues, challenges, and strategies for providing librarians with these opportunities are abundant and have been debated for more than thirty years, especially since libraries started experiencing the impact of microchip and portable technologies.2 any practical and immediate guidance could help librarians in charge of developing mamw.3 however, a majority of the practical guidance available for developing mamw for libraries is limited to specific settings or patron populations. also, the practical guidance is not theoretically validated, curtailing its generalizability for diverse library settings. for instance, a number of librarians and it professionals share their experience and stories of mamw development to serve a specific patron population in a specific library setting.4,5 these accounts typically describe successes in developing mamw, the lessons learned during the development of mamw, or advice for developing mamw. this paper applies a system analysis and design perspective from the information systems discipline to examine the experience and advice shared by librarians and it professionals for identifying the key steps and precautions to be taken when developing mamw for libraries. system analysis and design, a branch of the information systems discipline, is the most widely used theoretical knowledge base available for developing information systems.6 according to the system analysis and design perspective, development, planning, analysis, design, implementation, and maintenance are the six phases of building any information system.7 the next section synthesizes our method for this secondary research. the following section discusses the key steps we identified for developing, planning, analyzing, designing, implementing, and maintaining mamw for libraries. the concluding section presents the implications of this study for libraries and lis graduate programs.

method

we began this study with a practitioner's handbook guiding libraries to use mobile technologies for delivering services to diverse patron populations.8 to search the literature relevant to our research, we devised many key phrases, including but not limited to "mobile technolog*," "mobile applications for libraries," and "mobile websites for libraries." as part of our active information-seeking process, we applied a snowball sampling technique to collect more than seventy-five scholarly research articles, handbooks, ala library technology reports, and books hosted on ebsco and information science source databases. our passive information-seeking was helped by article suggestions from emerald insight and elsevier science direct, two of the most widely used journal hosting sites, in response to the journal articles we accessed there.
we applied the following four criteria to establish the relevancy of publications to our research: accuracy of facts; duration of publications (i.e., from 2000 to 2014); credibility of authors; and content focused on problems, solutions, advice, and tips for developing mamw. several research articles published by information technology and libraries and library hi tech, two top-tier journals covering the development of mamw for libraries, built the foundation of this secondary research. we analyzed the collected literature using the qualitative data presentation and analysis method proposed by miles and huberman.9 we developed microsoft excel summary sheets to code the experience and advice shared by librarians and it professionals. the coded data was read repeatedly to identify and name patterns and themes. each relevant publication was analyzed individually and then compared across subjects to identify patterns and common categories. the inter-coder reliability between the two authors who analyzed data was 85 percent. data analysis helped us identify the key steps needed for planning, analyzing, designing, implementing, and maintaining mamw for libraries.

findings and discussion

key steps for planning mamw

forming and managing a team

building teams of people with the appropriate skills, knowledge, and experience is one of the first steps suggested by the existing literature for planning mamw. it is essential for team members to be aware of new developments and trends in the market.10 for instance, developers should be aware of print resources on relevant technologies such as apache, asp, javascript, php, ruby on rails, and python; online resources such as detectmobilebrowser.com and the w3c mobileok checker to test catalogs, design functionality, and accessibility on mobile devices; and various online communities of developers who could provide peer support when needed.11 team members are also expected to keep up with new developments in mobile devices, platforms, operating systems, digital rights management terms and conditions, and emerging standards for content formats.12 periodic delegation of various tasks could help libraries develop mamw effectively.13 libraries should also form productive, financially feasible partnerships with external stakeholders such as internet service providers and network administrators for hosting mamw on appropriate internet servers that meet desired safety and security standards.14,15

requirements gathering

requirements for developing mamw can be collected through empirical research and secondary research. typically, the goal of empirical research is to help libraries
■■ gather patron preferences for and expectations of mamw,16,17
■■ stay abreast of the continual evolution of patron needs,18
■■ periodically (e.g., quarterly, annually, biannually) gather and evaluate user needs,19
■■ index the content of mamw,20
■■ investigate the acceptance of the library's use of mamw by patrons,21 and
■■ understand user needs and identify top library services requested by patrons.
empirical research in the form of usability testing, functional validation, user surveys, etc., should be carried out before developing mamw to inform the development process and/or after developing mamw to study their adoption by library patrons.
empirical research typically involves the identification of patrons and other stakeholders who are going to be affected by mamw. this step is followed by developing data-collection instruments, collecting data from patrons and other stakeholders, and analyzing qualitative and quantitative data using appropriate techniques and software.22 secondary research mainly focuses on scanning and assessing existing literature. for instance, using appropriate datasets on mobile use, librarians may be able to identify the factors responsible for the adoption of mobile technologies.23 typically, such factors include but are not limited to cognitive, affective, social, and economic conditions of potential users. mamw developers could also scan the environment by examining existing mamw and reviewing the literature to create sets of guidelines for replacing old information systems by developing new, well-functioning mamw.24 librarians could also scan the market for free software options to conserve financial resources.25

making strategic choices

mobile applications or mobile websites?

one of the most important strategic decisions libraries need to make during this phase is whether to use a mobile app or a mobile website (that is, a web portal running on mobile devices) for offering services to patrons. mobile websites are web browser-based applications that might direct mobile users to a different set of content pages, serve a single set of content to all patrons while using different style sheets or templates reformatted for desktop or mobile browsers, or use a site transcoder (a rule-based interpreter), which resides between a website and a web client and intercepts and reformats content in real time for a mobile device.26,27 mobile apps are more challenging to build than mobile websites because they require separate and specific programming for each operating system.28 mobile apps burden users and their devices. for instance, users are expected to remember the functionality of each menu item, and a significant amount of memory is required to store and support apps on mobile devices. however, potential profitability, better mobile-device functionality, and greater exposure through app stores can make mobile apps an economical option over mobile websites.29

buy or build?

in the planning phase, libraries also need to decide whether to buy commercial, off-the-shelf (cots) mamw or build customized mamw. mamw need to be evaluated in terms of customer support and service, maintenance, the ability to meet patron needs, and library needs when making this choice.30 sometimes libraries purchase cots products and end up customizing them, benefiting from both options. for example, some libraries first purchase packaged mobile frameworks to create simple, static mobile websites and subsequently develop dynamic library apps specific to library services.31

managing scope

many libraries have limited financial resources, which makes it necessary for their staff to manage the scope of mamw development. the ability to prioritize tasks and identify mission-critical features of mamw are some of the most common activities undertaken by libraries to manage this scope.32 for instance, it is not practical to make entire library websites mobile because libraries would end up serving only those patrons who access their sites over mobile alone. instead, libraries should determine which part of the website should go mobile.
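one low-effort way to act on that scoping decision, offered here only as a minimal illustrative sketch and not as a technique prescribed by the studies cited above, is to detect small screens in the browser and send those visitors to a separate mobile-optimized homepage; the /m/ path, the 600-pixel breakpoint, and the fullsite override parameter below are hypothetical assumptions.

```javascript
// minimal sketch: route small-screen visitors to a separate mobile-optimized
// homepage instead of retrofitting the full site. the /m/ path, the 600 px
// breakpoint, and the "fullsite" override are illustrative assumptions only.
(function () {
  var onMobileSite = window.location.pathname.indexOf("/m/") === 0;
  var wantsFullSite = /(^|[?&])fullsite=1/.test(window.location.search);
  var smallScreen = window.matchMedia &&
                    window.matchMedia("(max-width: 600px)").matches;
  if (smallScreen && !onMobileSite && !wantsFullSite) {
    // keep the full site reachable for patrons who append ?fullsite=1
    window.location.replace("/m/");
  }
})();
```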
a growing trend of using products that follow mobile-first design, in which a mobile version of a website is designed first and then worked up to a larger desktop version, could help librarians better manage the scope of mamw development. alternatively, jeff wisniewski, a leading web services librarian in the united states, advises libraries to create a new mobile-optimized homepage alone, which is faster than trying to retrofit the library's existing homepage for mobile.33 this advice is highly practical because no webmaster has any interest in trying to maintain two distinct versions of the library's webpages with details such as hours of operation and contact information.

selecting the appropriate software development method

there are three key methods for developing mamw: structured methodologies (e.g., waterfall or parallel), rapid application prototyping (e.g., phased, prototyping, or throwaway prototyping), and agile development, an umbrella term used to refer to the collection of agile methodologies like crystal, dynamic systems development method, extreme programming, feature-driven development, and scrum. there is a bidirectional relationship between these mamw development methods and the resources available for their development. project resources such as funding, duration, and human resources influence and are affected by the type of software development method selected for developing mamw. however, studies rarely pay attention to this important dimension of the planning phase.34

key steps in the analysis phase

requirements analysis

after collecting data from patrons, the next natural step is to analyze the data to inform the process of conceptualizing, building, and developing mamw.35 the requirements-analysis phase helps libraries achieve user-centered design of mamw and assess the return on investment in mamw. the context and goals of the patrons using mobile devices, and the tasks they are likely and unlikely to perform on a mobile device, are the key considerations for developing user-centered mamw for library patrons.36 it is critical to gather, understand, and review user needs.37 surveys can be developed on paper or online and can be analyzed using advanced statistical techniques or qualitative software.38,39 the analysis allows the following questions to be answered: which library services do patrons use most frequently on their mobile devices? what is their level of satisfaction for using those services? what types of library services and products would they like to access with their mobile phones in the future? survey analyses can help librarians predict which mobile services patrons will find most useful;40 they can also help librarians classify users on the basis of their perceptions, experience, and habits when using mobile technologies to access library services.41 as a result, libraries can identify and prioritize functional areas for their mamw deployment.42 mamw developers can learn from their users' humbling and/or frustrating experience of using mobile devices for library services.
in addition, libraries can keep track of their patrons' positive and negative observations, their information-sharing practices, and how they create group experiences on the platform provided by their libraries.43 to improve existing mamw, libraries could also use google analytics, a free web metrics tool, for identifying the popularity of mamw features and analyzing statistics on how they are used.44 to develop operating system-specific mobile apps, google analytics can be used to learn about the popularity of mobile devices used by patrons.45 ideally, libraries should calculate and document roi before investing in the development of mamw.46 for instance, libraries can run a cost-benefit analysis on the process of developing mamw and compare various library services offered over mobile devices.47 typically the following data could help libraries run the cost-benefit analysis: specific deliverables (e.g., features of mamw), resources (e.g., resources needed, available resources, etc.), risks (e.g., types of risks, level of risks, etc.), performance requirements, and security requirements for developing mamw. this analysis would help libraries make decisions on service provisions such as specific goals to be set for developing mamw, the feasibility of introducing desired features of mamw, and how to manage available resources to meet the set goals.48 libraries should also examine what other libraries have already done to provide mobile services.49

communication/liaising with stakeholders

effective communication between developers and stakeholders influences almost every aspect of developing information systems. however, existing studies do not emphasize the significance of communication with stakeholders. for instance, several studies vaguely refer to the translation of user needs into technology requirements,50 but few studies point out the precise modeling technique (e.g., entity relationship diagrams, unified modeling language, etc.) for converting user needs into a language understood by software developers. developers should communicate best practices and suggestions for the future implementation of mamw in libraries,51 which involves the prediction and selection of appropriate mamw for libraries,52 the demonstration of what is possible and how services are relevant, and how new resources can help create value for libraries.53,54 communication with users is also critical for creating value-added services for patrons who use different mobile technologies to meet their needs related to work, leisure, commuting, etc.55 however, the existing literature on mamw development for libraries does not mention the significance of this activity.

key steps for designing mamw

prototyping

prototyping refers to the modeling or simulation of an actual information system. mamw can have paper-based or computer-based prototypes. prototyping allows developers to directly communicate with mamw users to seek their feedback. developers can correct or modify the original design of mamw until users and developers are in agreement about the system design. building consensus between mamw developers and potential users is another key challenge to overcome during this phase, which may put a financial burden on mamw development projects. it requires skilled personnel to manage the scope, time, human resources, and budget of such projects.
wireframing is one of the most prominent prototyping techniques practiced by librarians and it professionals for developing mamw for libraries.56 this technique depicts schematic on-screen blueprints of mamw, lacking style, color, or graphics, focusing mainly on functionality, behavior, and priority of content.

selecting hardware, programming languages, platforms, frameworks, and toolkits

existing literature on the development of mamw for libraries covers the selection and management of software; software development kits; scripting languages like javascript; data management and representation languages such as html and xml, and their text editors; and ajax for animations and transitions. the existing literature also guides libraries in training their staff to use mamw to better serve patrons.57 a few studies also provide guidance on selecting cots products such as webkit, an open source web browser engine that renders webpages on smartphones and allows users to view high-quality graphics on data networks with faster throughput.58 however, it might be a good idea to use licensed open source cots products because licensed software allows libraries to legally distribute software within their organizations as covered by the licensing agreement. libraries that use software-licensing agreements may also be able to seek expert help and advice whenever they have a concern or query. in the authors' experience, librarians have shared a few effective strategies for designing mamw. one key strategy is to purchase reliable device emulators and cross-compatible web editors. these technologies allow the user to work with the design at the most basic level, save documents as text, transfer the documents between web programs, and direct designers toward simple solutions.59 sample cross-compatible web editors include, but are not limited to, notetab pro (http://www.notetab.com/), codelobster (http://www.codelobster.com/), and bluefish (http://bluefish.openoffice.nl).

hybrid mobile app frameworks like bootstrap, ionic, mobile angular ui, intel xdk, appcelerator titanium, sencha, kendo ui, and phonegap use a combination of web technologies like html, css, and javascript for developing mobile-first, responsive mamw. a majority of these frameworks use a drag-and-drop approach and do not require any coding for developing mobile apps. one-click api connect further simplifies the process. user-interface frameworks like jquery mobile and topcoat eliminate the need to design user interfaces manually. importantly, mamw developed using such frameworks can support many mobile platforms and devices. toolkits like github, skyronic, crudkit, and hawhaw enable developers to quickly build mobile-friendly crud (create/read/update/delete) interfaces for php, laravel, and codeigniter apps. such mobile apps also work with mysql and other databases, allowing users to receive and process data and display information to users. table 1 categorizes specific hardware and software features recommended for mamw to better serve library patrons.
areas of information systems/it and specific features recommended for developing mamw for libraries:

1. human-computer interaction (hci)
behavioral, cognitive, motivational, and affective aspects of hci:
■■ design responsive websites for libraries to enhance user experience60
■■ design a user interface meeting the expectations and needs of potential users (e.g., a menu with the following items: library catalog, patron accounts, ask a librarian, contact information, listing of hours, etc.)61
■■ design meaningful mobile websites based on user needs, documenting and maintaining mobile websites62
usability engineering:
■■ design concise interfaces with limited links, descriptive icons, and home and parent-link icons63
■■ create a user-friendly site (e.g., the dok library concept center in delft, netherlands, offers a welcome text message to first-time visitors)64
■■ effectively transition from traditional websites to mobile-optimized sites with responsive design65
■■ create user-friendly interface designs66
■■ present a clean, easy-to-navigate mobile version of search results67
information visualization:
■■ automatically maintain reliable and stable fundamental information required by indoor localization systems68
■■ save time by redesigning existing sites69,70

2. web programming
html, xml, etc.:
■■ design sites with a complete separation of content and presentation71
■■ code html and css for better user experiences72
■■ create and shorten links to make them easier to input using small or virtual keyboards73
using client-side and server-side scripting, such as javascript object notation, etc.:
■■ design and develop mashups74
■■ develop mamw using client-server architecture, accessible on mobile devices75
without scripting:
■■ implement widgetization to facilitate the integration of mobile websites, developing a widget library for mobile-based web information systems76

3. open source
■■ design mobile websites that allow users to leverage the same open source technology as the main websites77
■■ design mobile websites linking to other existing services like libraryh3lp and library catalogs with mobile interfaces such as mobilecat78

4. networking
■■ design a mobile website capable of exploiting advancements in technology such as faster mobile data networks79
■■ identify and address technology issues (e.g., connectivity, security, speed, signal strength, etc.) faced by patrons when using mamw80

5. input/output devices
■■ use a mobile robot to determine the location of fixed rfid tags in space81
■■ design mamw capable of processing data communicated using radio frequency identification devices, near-field communication technology, and bluetooth-based technology like ibeacons82
■■ offer innovative services using augmented-reality tools83

6. databases
■■ integrate a back-end database of metadata with front-end mobile technologies84
■■ integrate the front-end of mamw with the back-end of standard databases and services85
7. social media and analytics
■■ integrate social media sites (e.g., foursquare, facebook places, gowalla, etc.) with existing checkout services for accurate and information-rich entries86
■■ implement google voice or a free text-messaging service87
■■ use google analytics for a mobile-optimized website by copying the free javascript code generated from google analytics and pasting it into library webpages to gain insight into what resources are used and who used them88
■■ integrate a geo-location feature with mobile services89

table 1. mamw with specific hardware and software features

from the above table, which is based on the analysis of the literature on developing mobile applications and mobile websites for libraries, it becomes clear that web programming and hci are the two leading technology areas that shape the development of mamw and consequently the services offered by them.

designing user interfaces of mamw

librarians and it professionals engaged in developing mamw for libraries make the following recommendations.

use two style sheets: css plays a key role in offering a uniform display of user interfaces for all webpages. studies recommend designing two style sheets, namely mobile.css and iphone.css, when developing mamw, since most of the time smartphones ignore mobile stylesheets.90 in that case, iphone.css could direct itself to browsers of a specific screen-width, helping those mobile devices that are not directed to the mobile website by the mobile.css stylesheet.91

minimize use of javascript: javascript is instrumental in detecting what mobile device is being used by patrons and then directing them to the appropriate webpage, with options including full website, simple text-based, and touch-mobile-optimized. however, it is critical to minimize the use of javascript on library mobile websites because not every smartphone offers the minimum level of support required to operate it.92

handle images intelligently: to help patrons optimize their bandwidth use, image files on mobile sites should be incorporated with css rather than html code; also, to ensure consistency in the appearance of user interfaces of mobile websites, images should be kept to the same absolute size.93

key steps for implementing mamw

programming for mamw

programming is at the heart of developing mamw. as shown in table 1 above, web programming enables developers to build mamw with a number of value-added features for patrons. for instance, a web-application server running on cold fusion can process data communicated via web browsers on mobile devices; this feature allows mamw users to access search engines on library websites via smartphones.94 also, client-side processing of classes (with a widget library) allows patrons to use their mobile devices as thin clients, thereby optimizing the use of network bandwidth.95

testing mamw

past studies recommend testing the content, display/design, and functionality of mamw in a controlled environment (e.g., a usability lab) or in the real world (i.e., in libraries).
content: librarians are advised to set up testing databases for testing image presentation, traditional free text search, location-based search, barcode scanning for isbn search, qr encapsulation, and voice search.96

display/design: librarians can review and test mamw on multiple devices to confirm that everything displays and functions as intended.97 they can also test a beta version of their mobile website with varying devices to provide guidance regarding image sizing;98 beta versions are also useful in testing mobile websites for their display on different browsers and devices.99

functionality: librarians can set up testing practices and environments for the most heavily used device platforms (e.g., hci incubators such as eye-testing software, which is a combination of virtual emulators and mobile devices not owned by libraries).100,101 they can also use the user agent switcher add-on for firefox to test a mobile website and use web-based services like deviceanywhere and browsercam, which offer mobile emulation, to test the functionality of mamw.102

training patrons

unless patrons realize the significance of a new information system for managing information resources, they will hardly use it. however, training patrons to use a newly developed mamw is almost completely missing from the studies describing the process of developing mamw for libraries. joe murphy, a technology librarian at yale university, identifies the significance of user training in managing the change from traditional to mobile search and advises librarians to explore the mobile literacy skills of their patrons and educate them on how to use new systems.103

data management

mamw cannot function properly without clean data. cleaning up data, curating data, and addressing other data-related issues are some of the least mentioned activities in the literature for developing mamw. however, it is necessary for librarians engaged in developing mamw to identify and address common challenges for managing data when used for mamw. for example, it might be a good strategy for librarians to study the best practices for managing data-related issues when offering reference services using sms.104
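as a small, concrete illustration of what "clean data" can mean in this context, the sketch below normalizes and checks an isbn captured by a mobile barcode scan (a feature mentioned under content testing above) before it is handed to a catalog search. this is only a hedged example: the function name is hypothetical, and the validation shown is the standard isbn-10/isbn-13 check-digit arithmetic rather than a procedure drawn from the studies cited above.

```javascript
// minimal sketch: clean up an isbn captured by a mobile barcode scan before
// querying the catalog. the name is hypothetical; only the standard isbn
// check-digit rules are assumed.
function normalizeIsbn(raw) {
  var isbn = String(raw).replace(/[^0-9Xx]/g, "").toUpperCase();
  if (isbn.length === 10) {
    var sum = 0;
    for (var i = 0; i < 10; i++) {
      sum += (10 - i) * (isbn[i] === "X" ? 10 : parseInt(isbn[i], 10)); // weights 10..1
    }
    return sum % 11 === 0 ? isbn : null;   // a valid isbn-10 sums to 0 mod 11
  }
  if (isbn.length === 13 && isbn.indexOf("X") === -1) {
    var total = 0;
    for (var j = 0; j < 13; j++) {
      total += parseInt(isbn[j], 10) * (j % 2 === 0 ? 1 : 3);           // weights 1,3,1,3,...
    }
    return total % 10 === 0 ? isbn : null; // a valid isbn-13 sums to 0 mod 10
  }
  return null;                             // anything else is rejected as unclean input
}
```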
skills needed for maintaining mamw

documentation and version control of software

past studies recommend developing a mobile strategy for building a mobile-tracking device and evaluating mobile infrastructure to ensure the continued assessment and monitoring of mobile usage and trends among patrons.105 however, past studies do not report or provide many details about the maintenance of mamw, which leads us to infer that maintenance of mamw, involving documentation and version control, is a neglected aspect of their development. open source software development is increasingly becoming a common practice for developing mamw. implementing version-control software (e.g., subversion and github) to accommodate the needs of developers distributed across the world is a necessity for developing mamw. version-control software provides a code repository with a centralized database for developers to share their code, which minimizes errors associated with overwriting or reverting code changes and maximizes software development collaboration efforts.106

conclusion

there are various forces driving change in the knowledge and skills area for information professionals: technologies, changing environments, and the changing role of it in managing and providing services to patrons. these forces affect all levels of it-based professionals, those responsible for information processing and those responsible for information services. this paper has examined the key steps and precautions to be taken by libraries while developing mamw to better serve their patrons. after analyzing the existing guidance offered by librarians and it professionals from the system analysis and design perspective, we find that some of the most ignored activities in mamw development are selecting appropriate software development methodologies, prototyping, communicating with stakeholders, software version control, data management, and training patrons to use newly developed or revamped mamw. the lack of attention to these activities could hinder libraries' ability to better serve patrons using mamw. it is necessary for librarians and it professionals to pay close attention to the above activities when developing mamw. our study also shows that web programming and hci are the two most widely used technology areas for developing mamw for libraries. to save their scarce financial resources, which otherwise could be invested in partnering with external it professionals, libraries could either train their existing staff or recruit lis graduates equipped with the skills and knowledge identified in this paper to develop mamw (see table 2).

key steps for developing mamw and the skills and knowledge required for developing mamw:

a. planning phase
1. forming and managing team: human resource management
2. making strategic choices: time management; cost management; quality management; human resource management (e.g., staff capacity)
3. requirements gathering: research (empirical and secondary)
4. managing scope (e.g., managing financial resources, prioritizing tasks, identifying mission-critical features of mamw, etc.): scope management
5. selecting an appropriate software development method: time management; cost management; quality management

b. analysis phase
6. requirements analysis: research (empirical and secondary)
7. communication/liaising with stakeholders: communications management

c. design phase
8. prototyping: software development (hci)
9. selecting hardware and programming languages and platforms: software development (web programming and hci)
10. designing user interfaces of mamw: software development (hci)

d. implementation phase
11. programming for mamw: software development (web programming, e.g., android, ios, visual c++, visual c#, visual basic, etc.)
12. testing mamw: software development (web programming and hci)
13. training patrons: human resource management
14. data management (e.g., cleaning up data, curating data, etc.): data management

e. maintenance phase
15. documentation and version control of software: software development (web programming and hci)

table 2. skills and knowledge necessary to develop mamw
the management of scope, time, cost, quality, human resources, and communication related to any project is known as project management.107 in addition to the skills and knowledge related to project management, librarians would also need to be proficient in software development (with an emphasis on hci and web programming), data management, and the proper methods for conducting empirical and secondary research for developing mamw. if lis programs equip their graduate students with the skills and knowledge identified in this paper, the next generation of lis graduates could develop mamw for libraries without relying on external it professionals, which would make libraries more self-reliant and better able to manage their financial resources.108

this paper assumes a very small number of scholarly publications to be reflective of the real-world scenarios of developing mamw for all types of libraries. this assumption is one of the limitations of this study. also, the sample of publications analyzed in this study is not statistically representative of the development of mamw for libraries around the world. in the future, the authors plan to interview librarians and it professionals engaged in developing and maintaining mamw for their libraries to better understand the landscape of developing mamw for libraries.

references

1. devendra potnis, ed cortez, and suzie allard, "educating lis students as mobile technology consultants" (poster presented at the 2015 association for library and information science education annual meeting, chicago, january 25–27), http://f1000.com/posters/browse/summary/1097683.
2. edwin michael cortez, "new and emerging technologies for information delivery," catholic library world 54 (1982): 214–18.
3. kimberly d. pendell and michael s. bowman, "usability study of a library's mobile website: an example from portland state university," information technology & libraries 31, no. 2 (2012): 45–62, http://dx.doi.org/10.6017/ital.v31i2.1913.
4. godmar back and annette bailey, "web services and widgets for library information systems," information technology & libraries 29, no. 2 (2010): 76–86, http://dx.doi.org/10.6017/ital.v29i2.3146.
5. hannah gascho rempel and laurie bridges, "that was then, this is now: replacing the mobile optimized site with responsive design," information technology & libraries 32, no. 4 (2013): 8–24, http://dx.doi.org/10.6017/ital.v32i4.4636.
6. june jamrich parsons and dan oja, new perspectives on computer concepts 2014: comprehensive, course technology (boston: cengage learning, 2013).
7. ibid.
8. andrew walsh, using mobile technology to deliver library services: a handbook (london: facet, 2012).
9. matthew b. miles and a. michael huberman, qualitative data analysis (thousand oaks, ca: sage, 1994).
10. bohyun kim, "responsive web design, discoverability and mobile challenge," library technology reports 49, no. 6 (2013): 29–39, https://journals.ala.org/ltr/article/view/4507.
11. james elder, "how to become the 'tech guy' and make iphone apps for your library," the reference librarian 53, no. 4 (2012): 448–55, http://dx.doi.org/10.1080/02763877.2012.707465.
3 (2012): 313–21, http://dx.doi.org/10.1080/02763877.2012.679195. 13. pendell and bowman, “usability study.” 14. lisa carlucci thomas, “libraries, librarians and mobile services,” bulletin of the american society for information science & technology 38, no. 1 (2011): 8–9, http://dx.doi.org/10.1002/bult.2011.1720380105. 15. elder, “how to become the ‘tech guy.’” 16. kim, “responsive web design.” 17. chad mairn, “three things you can do today to get your library ready for the mobile experience,” the reference librarian 53, no. 3 (2012): 263–69, http://dx.doi.org/10.1080/02763877.2012.678245. 18. rempel and bridges, “that was then.” 19. rachael hu and alison meier, “planning for a mobile future: a user research case study from the california digital library,” serials 24, no. 3 (2011): s17–25. 20. kim, “responsive web design.” http://dx.doi.org/10.6017/ital.v32i4.4636 https://journals.ala.org/ltr/article/view/4507 http://dx.doi.org/10.1080/02763877.2012.707465 http://dx.doi.org/10.1080/02763877.2012.679195 http://dx.doi.org/10.1002/bult.2011.1720380105 http://dx.doi.org/10.1080/02763877.2012.678245 identifying key steps for developing mobile applications & mobile websites for libraries | potnis, regenstreif-harms, and cortez |doi:10.6017/ital.v35i2.8652 58 21. lorraine paterson and boon low, “student attitudes towards mobile library services for smartphones,” library hi tech 29, no. 3 (2011): 412–23, http://dx.doi.org/10.1108/07378831111174387. 22. jim hahn, michael twidale, alejandro gutierrez and reza farivar, “methods for applied mobile digital library research: a framework for extensible wayfinding systems,” the reference librarian 52, no. 1-2 (2011): 106–16, http://dx.doi.org/10.1080/02763877.2011.527600. 23. patterson and low, “student attitudes.” 24. gillian nowlan, “going mobile: creating a mobile presence for your library,” new library world 114, no. 3/4 (2013): 142–50, http://dx.doi.org/10.1108/03074801311304050. 25. elder, “how to become the ‘tech guy.’” 26. matthew connolly, tony cosgrave, and baseema b. krkoska, “mobilizing the library’s web presence and services: a student-library collaboration to create the library’s mobile site and iphone application,” the reference librarian 52, no. 1-2 (2010): 27–35, http://dx.doi.org/10.1080/02763877.2011.520109. 27. stephan spitzer, “make that to go: re-engineering a web portal for mobile access,” computers in libraries 3 no. 5 (2012): 10–14. 28. houghton, “mobile services.” 29. cody w. hanson, “mobile solutions for your library,” library technology reports 47, no. 2 (2011): 24–31, https://journals.ala.org/ltr/article/view/4475/5222. 30. terence k. huwe, “using apps to extend the library’s brand,” computers in libraries 33, no. 2 (2013): 27–29. 31. edward iglesias and wittawat meesangnill, “mobile website development: from site to app,” bulletin of the american society for information science and technology 38, no. 1 (2011): 18– 23. 32. jeff wisniewski, “mobile usability,” bulletin of the american society for information science & technology 38, no. 1 (2011): 30–32, http://dx.doi.org/10.1002/bult.2011.1720380108. 33. jeff wisniewski, “mobile websites with minimal effort,” online 34, no. 1 (2010): 54–57. 34. hahn et al., “methods for applied mobile digital library research.” 35. j. michael demars, “smarter phones: creating a pocket sized academic library,” the reference librarian 53, no. 3 (2012): 253–62, http://dx.doi.org/10.1080/02763877.2012.678236. 
http://dx.doi.org/10.1108/07378831111174387 http://dx.doi.org/10.1080/02763877.2011.527600 http://dx.doi.org/10.1108/03074801311304050 http://dx.doi.org/10.1080/02763877.2011.520109 https://journals.ala.org/ltr/article/view/4475/5222 http://dx.doi.org/10.1002/bult.2011.1720380108 http://dx.doi.org/10.1080/02763877.2012.678236 information technology and libraries | september 2016 59 36. kim griggs, laurie m. bridges, and hannah gascho rempel, “library/mobile: tips on designing and developing mobile websites,” code4lib no. 8 (2009), http://journal.code4lib.org/articles/2055. 37. demars, “smarter phones.” 38. hahn et al., “methods for applied mobile digital library research.” 39. beth stahr, “text message reference service: five years later,” the reference librarian no. 52, no. 1-2 (2011): 9–19, http://dx.doi.org/10.1080/02763877.2011.524502. 40. patterson and low, “student attitudes.” 41. ibid. 42. ibid. 43. hanson, “mobile solutions for your library.” 44. stahr, “text message reference service.” 45. spitzer, “make that to go.” 46. allison bolorizadeh et al., “making instruction mobile,” the reference librarian 53, no. 4 (2012): 373–83, http://dx.doi.org/10.1080/02763877.2012.707488. 47. maura keating, “will they come? get out the word about going mobile,” the reference librarian no. 52, no. 1-2 (2010): 20-26, http://dx.doi.org/10.1080/02763877.2010.520111. 48. patterson and low, “student attitudes.” 49. hanson, “mobile solutions for your library.” 50. patterson and low, “student attitudes.” 51. hanson, “mobile solutions for your library.” 52. cody w. hanson, “why worry about mobile?,” library technology reports no. 47, no. 2 (2011): 5–10, https://journals.ala.org/ltr/article/view/4476. 53. keating, “will they come?” 54. spitzer, “make that to go.” 55. kim, “responsive web design.” 56. wisniewski, “mobile usability.” 57. elder, “how to become the ‘tech guy.’” http://journal.code4lib.org/articles/2055 http://dx.doi.org/10.1080/02763877.2011.524502 http://dx.doi.org/10.1080/02763877.2012.707488 http://dx.doi.org/10.1080/02763877.2010.520111 https://journals.ala.org/ltr/article/view/4476 identifying key steps for developing mobile applications & mobile websites for libraries | potnis, regenstreif-harms, and cortez |doi:10.6017/ital.v35i2.8652 60 58. sally wilson and graham mccarthy, “the mobile university: from the library to the campus,” reference services review 38, no. 2 (2010): 214–32, http://dx.doi.org/10.1108/00907321011044990. 59. brendan ryan, “developing library websites optimized for mobile devices,” the reference librarian 52, no. 1-2 (2010): 128–35, http://dx.doi.org/10.1080/02763877.2011.527792. 60. kim, “responsive web design.” 61. connolly, cosgrave, and krkoska, “mobilizing the library’s web presence and services.” 62. demars, “smarter phones.” 63. mark andy west, arthur w. hafner, and bradley d. faust, “expanding access to library collections and services using small-screen devices,” information technology & libraries 25 (2006): 103–7. 64. houghton, “mobile services.” 65. rempel and bridges, “that was then.” 66. elder, “how to become the ‘tech guy.’” 67. heather williams and anne peters, “and that’s how i connect to my library: how a 42second promotional video helped to launch the utsa libraries’ new summon mobile application,” the reference librarian 53, no. 3 (2012): 322–25, http://dx.doi.org/10.1080/02763877.2012.679845. 68. hahn et al., “methods for applied mobile digital library research.” 69. 
danielle andre becker, ingrid bonadie-joseph, and jonathan cain, “developing and completing a library mobile technology survey to create a user-centered mobile presence,” library hi-tech 31, no. 4 (2013): 688–99, http://dx.doi.org/10.1108/lht-03-2013-0032. 70. rempel and bridges, “that was then.” 71. iglesias and meesangnill, “mobile website development.” 72. elder, “how to become the ‘tech guy.’” 73. andrew walsh, “mobile information literacy: a preliminary outline of information behavior in a mobile environment,” journal of information literacy 6, no. 2 (2012): 56–69, http://dx.doi.org/10.11645/6.2.1696. 74. back and bailey, “web services and widgets.” 75. ibid. 76. ibid. 77. spitzer, “make that to go.” http://dx.doi.org/10.1108/00907321011044990 http://dx.doi.org/10.1080/02763877.2011.527792 http://dx.doi.org/10.1080/02763877.2012.679845 http://dx.doi.org/10.1108/lht-03-2013-0032 http://dx.doi.org/10.11645/6.2.1696 information technology and libraries | september 2016 61 78. iglesias and meesangnill, “mobile website development.” 79. bohyun kim, “the present and future of the library mobile experience,” library technology reports 49, no. 6 (2013): 15–28, https://journals.ala.org/ltr/article/view/4506. 80. pendell and bowman, “usability study.” 81. hahn et al., “methods for applied mobile digital library research.” 82. andromeda yelton, “where to go next,” library technology reports 48, no. 1 (2012): 25–34, https://journals.ala.org/ltr/article/view/4655/5511. 83. ibid. 84. hahn et al., “methods for applied mobile digital library research.” 85. houghton, “mobile services.” 86. ibid. 87. mairn, “three things you can do today.” 88. ibid. 89. tamara pianos, “econbiz to go: mobile search options for business and economics— developing a library app for researchers,” library hi tech 30, no. 3 (2012): 436–48, http://dx.doi.org/10.1108/07378831211266582. 90. demars, “smarter phones.” 91. ryan, “developing library websites.” 92. pendell and bowman, “usability study.” 93. ryan, “developing library websites.” 94. michael j. whitchurch, “qr codes and library engagement,” bulletin of the american society for information science & technology 38, no. 1 (2011): 14–17. 95. back and bailey, “web services and widgets.” 96. jingru hoivik, “global village: mobile access to library resources,” library hi tech 31, no. 3 (2013): 467–77, http://dx.doi.org/10.1108/lht-12-2012-0132. 97. elder, “how to become the ‘tech guy.’” 98. ryan, “developing library websites.” 99. west, hafner and faust, “expanding access.” 100. hu and meier, “planning for a mobile future.” 101. iglesias and meesangnill, “mobile website development.” https://journals.ala.org/ltr/article/view/4506 https://journals.ala.org/ltr/article/view/4655/5511 http://dx.doi.org/10.1108/07378831211266582 http://dx.doi.org/10.1108/lht-12-2012-0132 identifying key steps for developing mobile applications & mobile websites for libraries | potnis, regenstreif-harms, and cortez |doi:10.6017/ital.v35i2.8652 62 102. wisniewski, “mobile usability.” 103. joe murphy, “using mobile devices for research: smartphones, databases and libraries,” online 34, no. 3 (2010): 14–18. 104. amy vecchione and margie ruppel, “reference is neither here nor there: a snapshot of sms reference services,” the reference librarian 53, no. 4 (2012): 355–72, http://dx.doi.org/10.1080/02763877.2012.704569. 105. hu and meier, “planning for a mobile future.” 106. wilson and mccarthy, “the mobile university.” 107. 
project management institute, a guide to the project management body of knowledge (pmbok guide) (newtown square, pa: project management institute, 2013). 108. devendra potnis et al., "skills and knowledge needed to serve as mobile technology consultants in information organizations," journal of education for library & information science 57 (2016): 187–96.

cataloging geometry

robert s. hazelton: school of library science, case western reserve university, cleveland, ohio

a scheme is suggested for the physical arrangement of the contents of a library, in which the library as well as the books are considered as three-dimensional entities, and classification is revised to reflect this concept.

don juan needs no bed, being far too impatient to undress, nor do tristan and isolda, much too in love to care for so mundane a matter, but unmythical mortals require one, and prefer to take their clothes off, if only to sleep. that is why bedroom farces must be incredible to be funny, why peeping toms are never praised, like novelists or bird watchers, for their keenness of observation: where there's a bed, be it a nun's restricted cot or an emperor's baldachined and nightly-redamselled couch, there are no effable data. (1)

libraries are not beds-but the images are revealing and useful. that there is an information explosion going on, we are told too often by the impatient. and the very grammar of the situation reassures us: unlike tristan and isolda or a bomb, the information explosion can never explode or be exploding. for all its information (or is it just the patterns and inscriptions of a nominalist?) the library is so mundane it hardly merits a peeping tom. the information needs of the dons are seldom met by the library. unmythical mortals, however, swear by the local branch. the repositories of effable library data are small, and still far from full or accurate. perhaps most surprising is that information processing machines are still essentially foreigners in the repositories of information. and that is incredible. given this background i would like to make a suggestion based on some ideas from computer processing with a very mundane practicability: a workable compromise between the catalog and possible computer manipulation in n dimensions. i take it that the "linearity of the catalogue" is an abstraction; "... linearity is dictated by the physical form of the book and the characteristics of library architecture. in effect, a library is one continuous shelf of books, and each particular book represents a specific point in that line. it must follow, therefore, that any classification that can be applied to such an assemblage of units must necessarily exhibit a linear sequence of its terms." (2) the crucial words are "in effect." few libraries are in fact one continuous shelf. i know of one-it is my daughter's and suffers from a long shelf and only thirty-seven volumes.
both the best and the worst of this view are exhibited as it is pushed to the extreme: "the failure of our present systems of book classification in no way condemns the act of classification as a fundamental bibliographic technique. book classification, as we have used it in the past, has failed for two reasons: one, because it has been based upon the book as a physical entity without taking into consideration the inherent character of the book as a composite intellectual product; two, because of limitations arising from the properties of our hierarchical systems of classification. jevons was right, for library classification as he knew it was indeed "a logical absurdity." by this he meant, of course, that the content of books is polydimensional, which is logically incompatible with the traditional hierarchical schematization of knowledge, which is a linear progression from general to specific. the book, then, as a physical unit, and irrespective of the dimensions of its content, must be forced into a monodimensional system in which it has only linear position. this limitation alone destroys most of the utility of traditional book classifications as instruments for the effective subject organization of library materials." (3)

the best of this, i think, is the recognition that the current schemes are inadequate and that one of the major limitations is the notion that each book must be classified only as a linear position. more generally, this idea of linearity points to the absurdity in classification schemes. and now the worst: linearity is not just an abstraction; it is a myth and a fraud. it has not adequately represented the book as a physical object and has been constrained by the error, not the book.

let us look more closely at the geometry of the book. the aspect apparently most startling to the classificationist (but not to the librarian, who never has enough room) is its solidity, its three-dimensionality. it is impossible to build a book of less than three dimensions! the same problem exists for unmythical libraries: three dimensions are essential. practice does not easily square with the theory of one-dimensional libraries. the points on the line are far more arbitrary than you imagine. why does that line start on floor one, jump to three, back to one, up to two, and die in the basement? and having traversed the line, we will usually not have found any newspapers, any fiction, any children's books, and few journals. does that line ever flash through the shelf of brightly colored new books resplendent in the lobby, impressing the children and trustees?

allow the line to run through every shelf now. what most characterizes the scheme? for all the complexities, for all the work of dewey, la fontaine, and ranganathan, it is simplicity! books about the same subject or in some congenial category are, insofar as possible, physically proximate. by congenial category i mean a grouping according to a concept which is not a subject classification. the difficulty encountered in one dimension is purely physical. logically, any finite number of dimensions can be mapped into the integers (i.e., one dimension) as long as the members of each dimension set are denumerable. ordered pairs are easily mapped into the integers by the following formula:

    i(x, y) = ½[(x + y)² + 3x + y]

this yields the progression of pairs <0, 0>, <0, 1>, <1, 0>, <0, 2>, <1, 1>, <2, 0>, <0, 3>, <1, 2>, etc. ordered triples are handled as ordered pairs of ordered pairs and the integers: t(x, y, z) = i(x, i(y, z)).
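to make the mapping concrete, here is a minimal sketch in python (illustrative only; the original article contains no code, and the function names are ours). it implements the diagonal enumeration i(x, y) = ½[(x + y)² + 3x + y] and the pair-of-pairs construction for triples:

```python
def pair(x: int, y: int) -> int:
    """Map an ordered pair of non-negative integers to a single integer.

    Implements i(x, y) = ((x + y)**2 + 3*x + y) / 2, which enumerates the
    pairs diagonal by diagonal: <0,0>, <0,1>, <1,0>, <0,2>, <1,1>, <2,0>, ...
    """
    return ((x + y) ** 2 + 3 * x + y) // 2


def triple(x: int, y: int, z: int) -> int:
    """Map an ordered triple to an integer by treating it as a pair of pairs."""
    return pair(x, pair(y, z))


if __name__ == "__main__":
    # reproduce the progression quoted in the article
    expected = [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0), (0, 3), (1, 2)]
    assert [pair(x, y) for x, y in expected] == list(range(8))
    print(triple(2, 1, 3))  # any finite tuple of "shelf coordinates" gets one index
```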
and so on. because we have a denumerable set of books we can accomplish a linear mapping by both subject and category. in fact, the problem is trivial because there are only a finite number of books. physically, however, neither subject nor category will remain together. to suit the library the mapping must be physically simple, but it can be abstractly complex. for all his protestations, the classificationist cannot eschew the physical library. if he could-or wished to-the way is open. as i understand classification, it is vacuous without reference to its ability as a finding tool. it must concern itself with the polydimensional aspect of content but cannot disregard the codex. in answer to the question form "where is the book about ... ?" an appropriate and total response type is "at location (x, y, z)." here x, y, and z are the spatial coordinates relative to a particular library, both as to origin and values. the dewey or lc numbers of the book are incomplete answers in that they presume a knowledge of the classification structure as well as knowledge of the architecture of the building. i have suggested that a classification scheme must not disregard the codex, but must insofar as possible not be subservient to physical form. the following scheme takes advantage of the codex form, is as easily automated or computerized as current one-dimensional schemes, advances beyond one dimension, and is very relevant to finding: a library is considered as a three-dimensional entity. conventions are adopted for run-on from room to room and floor to floor as for the linear scheme. each book is classified in all three dimensions, the dimensions being independent. the interpretation of each dimension is left to the discretion of the individual library. thus each book has a relative position in each dimension. (this is not an alexandrian scheme relying on absolute location.) the following example illustrates the relevant concepts: choose a subject classification (as commonly understood) for the x dimension; for example, let dewey numbers be arranged from left to right on the x axis. choose a category scheme for the y dimension; one could assign degrees of difficulty from one to seven, for example. [figure: shelving grid with "subject" on the x axis and "difficulty" on the y axis] choose a category scheme for the z dimension; one could assign numbers between one and seven running from most general to most specific. this has the following effect: standing in front of the near shelf (i.e., z = 1) one can choose a subject by moving laterally. the general books will appear first, with difficult items at the top, easy ones at the bottom. if the items are too general, merely move one stack forward and try again. this approach presents an unusually usable instructional layout for circular libraries. a reading lounge can be put dead center with the most subject-specific books ranged about the circumference. level of difficulty is easily adjusted by looking up or down. given this apparatus you may wish to change the subject classification scheme. why not put solid state physics behind general physics instead of to the right or left? the card catalog can now be used with greater meaning. there is no reason why it cannot be a map of the shelves. the axes can be translated for ease of searching (e.g., interchange x and y for the card catalog). of particular interest is the relation between this scheme and those of a. d. booth (4), where access time is minimized by arranging books in the inverse order of their frequency of use.
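the scheme is simple enough to state as a procedure. the following is a minimal, hypothetical sketch (python; the record layout, the one-bay-per-ten-dewey-classes rule, and the function names are assumptions of ours, not hazelton's) of how a book might be assigned its (x, y, z) shelf coordinates:

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    dewey: float      # x dimension: subject classification
    difficulty: int   # y dimension: 1 (easy) .. 7 (hard)
    generality: int   # z dimension: 1 (general) .. 7 (specific)


def shelf_coordinates(book: Book) -> tuple[int, int, int]:
    """Return (x, y, z) shelf coordinates for a book.

    x: which bay along the wall, derived from the Dewey number;
    y: which shelf within the bay (level of difficulty);
    z: which stack, counted from the reading lounge outward (generality).
    """
    x = int(book.dewey // 10)  # one bay per ten Dewey classes (illustrative choice)
    return (x, book.difficulty, book.generality)


catalog = [
    Book("general physics", 530.0, 3, 1),
    Book("solid state physics", 530.41, 6, 5),
]
for b in catalog:
    print(b.title, shelf_coordinates(b))
```

a card catalog entry would record the same triple, making the catalog a literal map of the shelves; and one axis could just as well carry booth's use-frequency ordering, which is the combination taken up next.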
further refinements consider nonstandard shelf layouts (radial, circular, spiral). one misgiving about shelving by inverse frequency expressed by librarians is that one no longer knows where to look for a particular book in the sense that one knows when using standard schemes. this objection is easily overcome by combining the three-dimensional and frequency schemes. one dimension can be used for frequency, leaving two dimensions in which to group books by subject, difficulty, generality, color, length, or whatever you please. access time is reduced while physical grouping is retained. 16 journal of library automation vol. 5/1 march, 1972 one difficulty that will be encountered is the classification of books that are not subject-oriented-poetry and fiction, for example. these areas are not adequately dealt with in linear schemes and they could easily be left as they are. that is, two dimensions could be constants. on the other hand, it seems plausible that, given three dimensions in which to work, someone could discover congenial physical groupings that would be reasonable yet impossible in one dimension. rather than being a problem, threedimensional classification offers opportunities to cope with literatures that are not subject specific. each dimension of this scheme can be criticized on the same grounds as the current linear classification. but, taken as a whole, it provides a more powerful, much needed tool for the classificationist while allowing new approaches by automaters. its simplicity is assured because it is closer to our intuitive notions of information storage. three dimensions are necessary! references 1. w. h. auden, "the cave of nakedness," about the house ( new york: random house, 1965), p.32. 2. jesse h. shera, "classification-current functions and applications to the subject analysis of library material," in libraries and the organization of knowledge (connecticut: archon books, 1965), p.97 -98. 3. jesse h. shera, "classification as the basis of bibliographic organization," in libraries and the organization of knowledge (connecticut: archon books, 1965) , p.84, 85. 4. a. d. booth, "on the geometry of libraries," journal of documentation 25:28-42 (march 1969). 112 information technology and libraries | june 2006 book review debra shapiro, editor strategic planning and management for library managers by joseph r. matthews. westport, conn.: libraries unlimited, 2005. xiv, 150p. $40 (isbn 1-59158-231-8). the reality for most librarians is that, sometime in their career, they will be involved in strategic management and planning. while library school courses occasionally deal with this topic, it is from a theoretical perspective only. most librarians are promoted or coerced into leadership and management roles, often with little or no training or resources at their disposal to assist them with the transition or change of responsibilities. strategic planning is one of those duties assigned to library managers and leaders that often get pushed to the lowest-priority list, mainly because there are few guidelines and handbooks available in this area. since the publication of donald riggs’s strategic planning for library managers (oryx, 1984), little attention has been given to this vital topic. 
matthews’s book attempts to provide information on how to explore strategies; demystify false impressions about strategies; how strategies play a role in the planning and delivery of library services; broad categories of library strategies that can be used; and identification of new ways to communicate the impact of strategies to patrons. as the author states in the introduction, the focus of libraries has moved from collections to encompass the arena of change itself. finding strategies to enable operation in a fluid environment can mean the difference between relevance and irrelevance in today’s competitive information marketplace. the book is divided into three major sections: (1) what is a strategy, and the importance of having one; (2) the value of and options for strategic planning; and (3) the need to monitor and update strategies. the first four chapters make up the first section. chapters 1 and 2 go through the semantics and the need for strategies, as well as the realities and limitations of strategies. chapter 3 provides brief introductions to schools of strategic thought. these include the design school, the planning school, the positioning school, the entrepreneurial school, the cognitive school, the learning school, the power school, the cultural school, the environmental school, and the configuration school. chapter 4 introduces types of strategies: operational excellence, innovative services, customer intimacy, and the concept of strategic options. section 2 consists of chapters 5 through 8 and provides information on what strategic planning is, what its value is, process options such as planning alternatives and critical success factors, and implementation. section 3, comprised of chapters 9 and 10, focuses on the culture of assessment; monitoring and updating strategies; and tools available for managing the library. two appendixes are provided: one containing sample library strategic plans, and another with a critique of a library strategic plan. overall, the book is very straightforward and understandable, with numerous illustrations, process workflows, and charts. i found the information very interesting and useful, and the final section on assessment and measurement of strategic planning is essential for libraries to implement and monitor in today’s marketplace. the various explanations related to schools of strategic thought were especially helpful. this book should be read by every library manager and director involved in strategic planning and process.—brad eden, associate university librarian for technical services and scholarly communication, university of california, santa barbara ebsco cover 3 lita 107, 111, covers 2 and 4 index to advertisers lib-mocs-kmc364-20140106083630 development of a technical library to support computer systems evaluation 173 patricia munson malley: librarian, u. s. army computer systems support and evaluation command, washington, d.c. this paper reports on the development and growth of the united states army computer systems support and evaluation command (usacssec) technical reference library from a collection of miscellaneous documents related to only fifty computer systems to the present collection of approximately 10,000 hardware/software technical documents related to over 200 systems from 70 manufacturers. 
special emphasis is given to the evolution of the filing system and retrieval techniques unique to the usacssec technical reference library, i.e., computer listings of available documents in various sequences, and development uf the cataloging system adaptable to computer technology. it is hoped that this paper will be a contribution toward a standard approach in cataloging adp collections. the advent of the computer has created a situation which has been labeled the "information explosion." through automatic data processing, managers of all types can have available to them information previously impossible. many authors have addressed this situation from many aspects. however, little has been said of the explosive growth of information about computers themselves and of ways to cope with it. this paper is intended to help overcome this void. it is a description of the system installed by the united states army computer systems support and evaluation command ( usacssec) to provide controls on its extensive library of technical literature pertaining to automatic data processing equipment. the usacssec has the mission of selecting and procuring this equipment to satisfy requirements of the army, a process that involves analyzing and evaluating technical proposals made by computer manufacturers. the analysts of the command require immediate access to detailed technical literature on all aspects of commercially available adp hardware and 174 journal of library automation vol. 4/4 december, 1971 software. this literature is maintained in the command technical reference library. in form it ranges from single-page summaries to multi-volume bound collections. it includes periodicals, books, brochures, and reference works. in approximately five years, the library's vendor documentation has grown from approximately 200 to 10,000 manuals on over 200 computer systems. the library's holdings also include information on peripheral equipment from over 170 manufacturers, e.g., printers, magnetic tape transports, microfilm, platters, memories, etc.; standards; gsa federal supply schedules; programmed instruction courses published by vendors; and major reference works with monthly supplements. in the early days of the library's existence, one librarian was able to catalog and shelve the material manually with no difficulty. however, the rapid growth in the availability and use of adp brought with it a flood of technical literature which threatened to inundate the librarian and the manual filing methods. it was recognized early that some form of automation assistance for the library was necessary. the system described in this paper is the one which evolved and is now successfully employed. system description the system, named access (automated catalog of computer equipment and software systems), used by the usacssec is characterized by simplicity. it is built around a master list of all holdings, and the key to its uniqueness and success is the cataloging scheme. manufacturers have various methods of identifying their literature, some having structured stock numbers, some using only the document title, and others ranging between these extremes. the only common identifier is document title, which offers inadequate access to the collection. an efficient cataloging scheme is therefore of primary importance as a means of identifying and retrieving documents. searches made by the analysts for whom the library is maintained usually fall into one of three types: . 
1) location of a specifically identified document (e.g., the cobol programming manual for the univac 1108 computer system); 2) location of all documents pertaining to specific aspects of a particular computer system (e.g., technical descriptions of all output devices for the burroughs b3500 system); 3) location of all documents pertaining to particular aspects of a number of different computer systems (e.g., technical descriptions of line printers for ibm system 360, burroughs b3500, honeywell 200, rca spectra 70, and univac 1108). in 1966, since approximately 75% of the literature in the library was ibm oriented, ibm's index of system's literature, which categorizes documents by subject, was used as an initial model to classify literature of other manufacturers. since that time a more sophisticated, explicit and expanded subject index has been developed. table 1 shows a complete list of categories, together with an explanation of them. computer systems libraryjmalley 175 table 1. representative subject categories and codes hardware categorization subject code (tab) 00 01 03 05 07 08 09 abbreviated title general information machine system input/output magnetic tape units and controls direct access storage units and controls analog equipment auxiliary equipment subject category content systems summaries, bibliographies, configurators, publications guide, brochures on systems where no technical documentation is provided and price lists not in the gsa federal supply schedule. ex: publications guide with addendas. principles of operation, operator manuals, operating procedures, reference and system manuals. ex: processor systems information manual, operating manual. component descriptions of unit record equipment, e.g., line printers, paper tape readers, card readers, etc. ex: printers reference manual, card punch style manual. component descriptions and operation of the units. ex: magnetic tape unit operating manual. component descriptions and operation procedures. ex: disc storage subsystem and reference manual. information related to analog computers. also includes the interface equipment for connecting to digital computers. ex: integrated hybrid subsystem. includes plotters, digitizers, optical character readers, all nonstandard i/ 0 devices. interface equipment. ex: graph plotters. 176 journal of library automation vol. 4/4 december, 1971 10 13 15 19 20 21 24 communications and remote terminal equipment special and custom features physical planning specifications original equipment manufacturers information component descriptions of communication control devices and remote terminals. ex: a. voice response unit. b. visual display unit. c. teletype, typewriter terminals. d. graphic display units. special feature descriptions and custom feature descriptions. (those devices that must be custom built.) ex: a. satellite coupler. b. programmed peripheral switch. c. special feature channelto-channel adapter. d. european communication line terminal. installation and physical planning manuals. ex: site preparation and installation manual. devices subcontracted from other manufacturers. ex: component subleased from one manufacturer for use on own vendors equipment. software categorization programming systemsgeneral assembler cobol general concepts and systems summary related to the software of the system. ex: a. catalog of programs. b. programmer's guide. reference and programming manuals on the assembly language ( s) of the system. ex: a. assembler language. b. card assembler reference manual. 
reference and programming manuals on the cobol language. ex: cobol reference manual. 25 26 28 30 31 32 computer systems libraryjmalley 177 fortran other languages report program generator input/output control systems data management systems literature on the utility programs reference and programming manuals on the fortran language (includes basic). ex: fortran iv operations manual. reference and programming manuals on other higher-order general purpose languages such as algol, jovial, etc. ex: a. algol programmers' guide. b. jovial compiler reference manual. reference and programming manuals on report program generator ( rpg) languages. ex: report program generator reference manual. information related to the software facilities for the control and handling ·of input/output operations. ex : a. operating systems basic locs. b. computer systems input/output package. information related to generalized information processing systems which include the functions of information storage, retrieval, organization, etc. ex: a. ibm -gis b . burroughs forge c. ge-lds standard routines used to assist in the operation of the computer; e.g., a conversion routine, sorting routine or a printout routine. ex: a. utility system general information manual. b. utility systems programming manual. 178 journal of library automation vol. 4/4 december, 1971 33 35 36 37 48 sort/merge systems simulators/ emulators language translators operating systems, supervisors-monitors automatic testing programs miscellaneous programs information related to software facilities whose major functions are to sequence data in a disciplined order according to defined rules. ex: a. sort /merge timing tables. b. general information sort/ merge routines. information related to techniques, hardware or software, utilized to make one computer operate as nearly as possible like some other computer. ex: a. flow simulator information manual. b. emulation information manual. information related to the programs of a system which are responsible for scheduling, allocating and controlling the system resources and application programs. ex: a. disk / tape operating system operation manual. b. operating system programmers. interpretive diagnostic techniques which provide analysis of hardware components or of software programs; e.g., hardware autotest programs, software trace routines. ex: a. program writing and testing bulletin. b. system test monitor diagnostic. information related to special techniques or application programs. ex: a. apt general information manual. computer systems library/malley 179 documents are shelved (in loose-leaf notebooks) by manufacturer, computer system, subject category and numerical publication identification. the user is aided in his searches by the following three types of listings of holdings: 1) listing by manufacturer (figure 1 ): major sort field, manufacturer; intermediate sort field , computer system nomenclature; intermediate sort field, subject code (tab); and minor sort field, publication number. that is, a document is listed by publication number, within subject code, within the computer system, within the manufacturer. this list serves as an index to the library's holdings. ~ ibm ibm ibm i bm ibm ibm ibm ibm ibm ibm ibm ibm ibh system tab sys/3 70 00 sys/370 00 sys/370 01 sys / 370 01 sys/370 01 sys / 370 01 sys/370 01 sys / 370 03 sys / 370 03 sys/370 07 sys / 370 07 sys/370 15 sys/370 15 usacssec technical reference library catalog as of june 71 ibm corporation library listing by mfr by system mrs . 
malley, librarian pub no publication title a33-3006-0l sys / 370 model 135 configurator 710300 n20 -0360 71 *srl newsletter index of publications + programs 701231 a22 693500 sys/370 mod 165 functional characteristics 700600 a22-6942 00 sys/370 hod 155 functional characteristics 700600 a22 -700000 sys / 370 principles of operation 700600 c20 -172900 a guide to system / 370 model 165 700600 c20 173400 *a guide to the ibm system/ 370 model 145 700900 a21 9124 -0l 3505 card reader, 3525 card punch subsystem 710300 a24 3550 -0l 3215 -1 console printerkeyboard comp descr 700700 a26-1592-00 3830 stg contrl / 3330 disk storage comp desc 700600 a26 -160600 2319 disk storage component sumhary 700900 a22 697000 system/ 370 model 15 5 installation man phys plan 700600 a22 6971 -00 system/ 370 model 165 installation man phys plan 700600 *indicates new entries since last catalog. fig. 1. sample index listing by manufacturer name and system. 2) listing by subject code (figure 2) : major sort field, subject code (tab); intermediate sort field, manufacturer; and minor sort field, computer system nomenclature. that is, a manual is listed by computer system, within the manufacturer, within the subject code. within each subject code, or tab, all manuals pertaining to this subject area are listed. 3) listing by manufacturer name and publication number (figure 3) : major sort field, manufacturer; intermediate sort field, publication number. that is, a document is listed by publication number within the manufacturer. 180 journal of library automation vol. 4/ 4 december, 1971 mfr system tab cdc 6000 24 cdc 6000 24 602s3000b 60191200a pu blication title *6000 series cobol 3 reference manual 64/6s/6600 cobol reference manual 700700 690900 rca spec70 24 ec 001 s00 *ansi cobol language translator (ucolt)prog pub 701200 rca 3301 24 940sooo realcom cobol 660soo un! 1108 24 fsd 20s l *fd ansi cobol prog ref man 700s04 un! 1108 24 up 7626 r2 *cobol exec 2 & exec 8 supplementary ref 700911 uni 9200 24 up 7s43 r2 *cobol supplementary ref-see 9300 24 700s11 uni 9300 24 up 7s43 r2 *cobol supplementary ref 700s11 uni 9300 24 up 7820 *9200/9300 cobol summary card 700917 uni 9400 24 up 7709 rl *9400 cobol su pplementary ref 700630 uni 9400 24 up 7797 *9400 cobol summ~ry card 700707 xds sigmas 24 901s01a cobol6s operations 680700 xds sigmas 24 90 1sooa cobol 6s reference 680700 *indicates new entries s ince l ast catalog. fig. 2. sample index listing by subject code (tab), manufacturer name i and system . mfr system tab pub no publication title pub date ibm sys/370 01 a22 6935 -00 sys / 370 mod 165 functional characterstics 700600 ibm sys / 370 00 a22 6944 -0l model 195 configurator 691100 ibm sys/370 01 a22 6962 -00 sys/370 mod 155 channel characteristics 700600 ibm sys/370 15 a22 6971 -00 system/370 model 165 i nstallation man phys plan 700600 ibm sys /360 19 a22 6974 -00 sys/360 370 i i o interface channel 710200 ibm sys/370 01 a22 7000-00 sys / 370 principles of operation 700600 ibm 7070 t 7074 01 a22 7003 -06 7070/7074 pri!iciples of operation 620000 ibm 1401/1460 00 a24 -140l -02 1401 system summary 650900 ibm sys/370 07 a26 1606-00 2319 disk storage component summary 700900 ibm sys/370 01 c20 1738 0l a guide to systej>i/370 model 135 710 300 ibm sys/370 15 c22 7004 00 sys / 370 installation manual physical planning 710 100 ibm sys/360 26 320 1011 01 call/360 & pl/1 subroutine ver 2 700200 ibm sys/360 25 320 1054-00 call/360 fortran reference manual 700200 figure 3. 
sample i ndex l i st i ng by manufacture r name and publ i cation numbe r . fig. 3. sample index listing by manufacturer name and publication number. computer systems library/malley 181 the manufacturer needs only to list his documents pertaining to a proposal and an analyst can find them immediately by using this listing. this listing also aids the manufacturer in updating his documents on file in the library, as most manufacturers publish their own index of publications in numerical order. the above lists are generated by sorting and listing a master file. the latter is maintained on magnetic tape and updated with punch cards. four card formats are employed, one for each of the following: 1) addition of publications, 2) deletion of publications, 3) change of title or date of a publication in the file, and 4) change of other information. tables 2 through 5 show the format for each type of card. it should be noted that in table 3, information in columns 1-26 must be identical to that in the entry to be deleted, and that the publication title and publication date are not changed by the card described in table 5. table 2. punch card format for addition of a publication card columns information 1-3 manufacturer (abbreviated) 4-12 system number 13-14 subject code 15-26 publication number 0 27 the letter 'a' (key for adding 28-74 75-80 a publication) publication title publication date table 3. punch card format for deletion of a publication card columns information 1-3 manufacturer 4-12 system number 13-14 subject code 15-26 publication number 0 27 the letter 'd' (key for deleting a publication) table 4. punch card format for change of title or publication date columns information remarks 1-3 manufacturer identical 4-12 system number to 13-14 subject code listing 15-26 publication number 0 27 the letter 'c' 28-74 the new title if applicable 75-80 the new publication date if applicable 182 journal of library automation vol. 4/4 december, 1971 table 5. punch card format for change of manufacturer, system, tab, or publication columns information remarks 1-3 manufacturer identical 4-12 system number to 13-14 subject code listing 15-26 publication number 27 the letter 'x' 28-30 new manufacturer name 31-39 new system number 40-41 new subject code 42-53 new publication number a simple program written in cobol for the univac ll08 is used to implement access. data cards are read into memory, and the master tape file is updated. errors such as "no match" or incorrect format are identified during the update process. the updated master file is sorted to provide the three types of output listings described above. system development the present system evolved over a five-year period. the initial catalogs were prepared and maintained manually, and some of the better features of the early attempts were carried forward into the automated system. because of this evolution, it is difficult to determine the actual development cost of access. much of the detailed design was done in connection with development of the computer program. approximately seven man-months were required for preparation and debugging of the program. during this period, a total of approximately two hours of univac ll08 system time was required. negligible time has been spent on program maintenance since installation of access. not unexpectedly the greatest effort was expended in collecting and preparing data for the initial master file. the library in 1967 contained over 3,000 documents, and a punch card had to be prepared for each. 
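the access update program itself was written in cobol for the univac 1108 and is not listed in the article. the fragment below is a rough, illustrative re-imagining in python that follows the column layout of tables 2 through 5; everything else (the in-memory master file and the function names) is assumed:

```python
def parse_card(card: str) -> dict:
    """Split an 80-column ACCESS transaction card into its fields (Tables 2-5)."""
    return {
        "mfr": card[0:3].strip(),
        "system": card[3:12].strip(),
        "tab": card[12:14].strip(),
        "pub_no": card[14:26].strip(),
        "action": card[26],            # 'A' add, 'D' delete, 'C' change title/date, 'X' change key
        "title": card[27:74].strip(),
        "date": card[74:80].strip(),
    }


def apply_card(master: dict, card: str) -> None:
    """Apply one transaction to the master file, keyed on (mfr, system, tab, pub_no)."""
    f = parse_card(card)
    key = (f["mfr"], f["system"], f["tab"], f["pub_no"])
    if f["action"] == "A":
        master[key] = {"title": f["title"], "date": f["date"]}
    elif f["action"] in ("D", "C", "X") and key not in master:
        print("no match:", key)        # error reporting, as in the original update run
    elif f["action"] == "D":
        del master[key]
    elif f["action"] == "C":
        master[key].update({k: f[k] for k in ("title", "date") if f[k]})
    elif f["action"] == "X":
        # columns 28-53 carry the new mfr/system/tab/pub_no; title and date are kept
        new_key = (card[27:30].strip(), card[30:39].strip(),
                   card[39:41].strip(), card[41:53].strip())
        master[new_key] = master.pop(key)
```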
the major adpe manufacturers cooperated in this undertaking, by providing properly punched cards for individual documents. cards were prepared by the usacssec for documents provided by small manufacturers and for miscellaneous documents in the library. the major manufacturers have continued their assistance in maintaining the data base, providing punch cards with all new documents delivered to the library. nevertheless, it cannot be stressed too strongly that the updating and maintenance of this library file is a very difficult and tedious task representing the work of a full-time librarian, library assistant and computer systems library /malley 183 clerk the library may receive 600 new documents and/or page changes, with or without cards, during a thirty-day period. the master file is updated and new listings produced every sixty to ninety days. more frequent runs would prove more beneficial to the users and require less manpower on the part of the staff. each run requires approximately ten minutes of univac 1108 system time. it is an interesting fact that communication was a problem during detailed design of access. adp system analysts and programmers thought and spoke in terms of codes, fields, sorts and files; the librarian operated in a context of documents, catalog cards, and indexes. a period of mutual education was necessary before effective communication transpired and the system design progressed. results the library today contains almost 10,000 hardware/software equipment documents on over 200 computer systems from 70 manufacturers. the flexibility inherent in access permitted the library to absorb this rapid growth with minor perturbation. during one six-month period documents describing the mini-computers of twenty manufacturers were added. the subject codes accommodated all documents, and the only modification required to the system was the addition of codes for these new manufacturers. the value of access was demonstrated when ibm and rca announced the new system 7. documentation on the available hardware and software was delivered on the day of announcement together with punch cards, and within one week this large addition to the collection was completely integrated into the catalog. adpe manufacturers also have benefitted from access. the army requires that adpe vendors, to be eligible for contracts, must maintain current technical documentation of their proposed systems in the usacssec library. manufacturers are provided copies of the listings pertaining to their equipment to check for compliance with the requirement. some manufacturers have even accepted the access cataloging scheme for use in their own libraries. access has met the objectives established for it. benefitting from the evolutionary nature of the cataloging scheme, the system has required a minimum of modifications to date. none of these has been substantive, falling more in the category of debugging rather than in that of design change. although access was initiated and installed to satisfy the unique requirements of the usacssec, it has general application. it brings order to the conglomeration of technical information on adp systems and equipment. the three listings that it produces become, in effect, axes for the multi-dimensional volume of information. 184 journal of library automation vol. 4/4 december, 1971 conclusion the usacssec technical library is recognized as having the most extensive holdings of adpe manufacturer's literature in the washington area. 
no libraries of equal or greater size are known to exist anywhere. it was planned initially that only usacssec analysts and technicians would have access to the information in the usacssec library. however, the resulting interest of various organizations of the department of defense (dod), and the fact that this collection provided information that was otherwise unavailable, prompted the command to open the library to a selected group of dod users. this initial relaxation has gradually evolved into provision for all government and military personnel receiving prior clearance from command headquarters usacssec to utilize the library for research. unfortunately, because of the type of material collected, the quantity available, and the constant demand, it has not been possible to permit the lending of materials. at present, approximately eighty personnel from other government agencies use the library each month for research. some of the agencies use it each month for evaluation and selection of computers. user reaction is amazement that such a collection of adp materials exists. it is not unusual for relatively new and thoroughly dynamic fields of interest to progress so rapidly that efforts to document them adequately lag behind the latest developments. the problem is particularly acute in the information processing field, whose large amount of technical literature is of little value without an efficient cataloging system. usacssec has solved some of the information problems in the computer field by examining in detail the special on-the-job requirements of computer system analysts in general. by developing its library in terms of the computer industry, rather than specifically to one command's requirements, a generalized library system in adp has evolved. it is hoped that this paper will be a contribution toward a standard approach in cataloging adp collections and creation of a commonality among adp technical libraries. 105 application of the variety-generator approach to searches of personal names in bibliographic data bases-part 1. microstructure of personal authors' names dirk w. fokker and michael f. lynch: postgraduate school of librarianship and information science, university of sheffield, england. conventional approaches to processing records of linguistic origin for storage and retrieval tend to regard the data as immutable. the data generally exhibit great variety and disparate frequency distributions, which are largely ignored and which entail either the storage of extensive lists of items or the use of complex numerical algorithms such as hash coding. the results in each case are far fmm ideal. the variety-generator approach seeks to reflect the microstructure of data elements in their description for storage and search, and takes advantage of the consistency of statistical characteristics of data elements in homogeneous data bases. in this paper, the application of the variety-generator approach to the description of personal author names from the inspec data base by means of small sets of keys is detailed. it is shown that high degrees of partitioning of names can be obtained by key-sets generated from the initial characters of surnames, fmm the terminal characters of surnames, and from the initials. the implications of the findings for computer-based bibliographical information systems are discussed. 
introduction the application of computer technology to the storage of bibliographic data bases and to the selection of items from them on the basis of the content of specified data elements poses considerable problems. among the most important of these, from the viewpoint of the efficiency of computer use, is the fact that many of the individual data elements exhibit great variety (i.e., lists of their contents are extensive), and show relatively disparate distributions. this behavior is encountered in different degrees in regard to items such as words in the titles of monograph or periodical ar106 ]oumal of library automation vol. 7/2 june 1974 ticles, assigned subject headings, authors' names, and citations.14 such distributions have been extensively studied in various contexts by bradford, zip£, and mandelbrot.4-6 in general, the distributions are approximately hyperbolic, so that a small proportion of items may account for a substantial proportion of occurrences, while the majority of items occur only infrequently. the studies have been well reviewed by fairthorne.7 of all the data elements, personal author names exhibit a distribution which is at its most exh·eme in one direction. as is shown later in this paper, the most frequent author name in a file of 50,000 names occurred only sixteen times, while over 35,000 of the names, or over 70 percent of the file, occurred once only. a simple and general strategy for dealing with searches of data elements, the contents of which show large variety and disparate distributions, is under development by the research unit at the sheffield school, and has thus far been elaborated in regard to searches of chemical structures and of natural-language data bases. 8• 9 based on information-theoretic principles, it involves a two-stage search procedure in which in the first and rapid stage the majority of items which cannot possibly fulfill the search criteria are eliminated, while those which meet the criteria are examined for an exact match at the second stage. the criteria (or attributes) are selected on the basis of an examination of the microstructure of the items in the data base, and are chosen so that their frequencies are approximately equal. the number of criteria or attributes chosen for description of the items is variable within a wide range; with their aid, the variety of items can be described so as to facilitate discrimination among them. in the context of substructure searching, the attributes are representations of fragments of chemical structures,10 while in the case of text, they are strings of characters which are variable in length. these strings are long when the characters comprising them represent frequent combinations, and short when the characters are infrequent.11 since the sets of attributes can generate, in an approximate manner, the variety of items encountered in the data base, they are termed variety generato1·s. they are intermediate in number between the primitive set of symbols ( alphanumeric characters in the case of text, atoms and bonds in that of chemical structures) and the actual variety of items in the collection (words or word fragments in text in the first instance, and molecules in the second). the variety-generator approach involves recognition of the fact that the statistical properties of specific data elements within homogeneous data bases are relatively constant, and that the primitive symbols of the data elements themselves usually show hyperbolic distributions. 
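the two-stage procedure can be pictured with a small sketch (python, ours rather than the sheffield group's; the screening key used here is a deliberately crude stand-in for the key-sets developed later in the paper): a cheap first pass discards records whose keys cannot match the query, and only the survivors are compared exactly.

```python
def surname_key(name: str) -> str:
    """Stand-in screening key: simply the first three letters of the surname."""
    return name.split(",")[0][:3].upper()


def two_stage_search(query: str, records: list[str]) -> list[str]:
    # stage 1: rapid elimination on the key alone
    q_key = surname_key(query)
    candidates = [r for r in records if surname_key(r) == q_key]
    # stage 2: exact match on the full record
    return [r for r in candidates if r == query]


names = ["FOREMAN, J.A.", "FORD, M.", "LYNCH, M.F."]
print(two_stage_search("FOREMAN, J.A.", names))
# "FORD, M." survives stage 1 (same key "FOR") but is rejected at stage 2
```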
because these distributions are so skewed, new symbol sets can be defined, consisting of sequences of primitive symbols chosen so that their frequencies of occurrence become comparable. the new symbol sets then constitute the attributes which are employed, singly or in combination, to represent the items within a search file. these symbol sets approximate to the ideal of equifrequency postulated by shannon for optimal efficiency in communication.12 only an approximation can be obtained, however, since the distributions of the newly defined symbols still cover a relatively wide range, and since they are seldom entirely independent of one another in statistical terms and may often be strongly associated.

the variety-generator concept is not entirely novel. indeed, it was anticipated most closely, in precisely the present context, by merrill and by cutter, with a view to subdividing a library's holdings into equal groups of items.13, 14 however, the greater flexibility of computer techniques would appear to make its use today even more attractive. this paper thus describes a study of a large file of authors' names with a view to identifying attributes of the names which can be used for efficient retrieval purposes. assessment of the effectiveness of the attributes in retrieval is described in part 2 of this series (to appear in the september 1974 issue of the journal of library automation).

the main terms used here are n-gram, key, and key-set, where an n-gram is a string of n adjacent characters. a key consists of an n-gram, and keys are chosen so that the frequencies of a set of keys (or key-set) are approximately equivalent in a given file. the measures used in assessing frequency distributions are shannon's expression for the entropy of a sequence of symbols,

    h = -Σ (i = 1 to n) p_i log2 p_i

and the relative entropy,

    h_r = h_actual / h_maximum.

h_maximum is reached when the probabilities of occurrence of the symbols of the sequence are equal; its value is the binary logarithm of the variety of symbols, since

    h = n * ( -(1/n) log2 (1/n) ) = log2 n.

the value of the relative entropy is thus a measure of the degree of equifrequency of a set of symbols, and is independent of their variety.

characteristics of name file

the file studied was a collection of 100,000 personal names taken from ten issues of the inspec data base dating from the period 1969 to 1972. the names are represented in variable-length format: surname followed by a comma and a space, and initials each followed by a period. for the present purpose, case and diacritic shift symbols were ignored. subsets of the file were first sorted into sequence on the basis of the full names, and distributions determined both for surnames and initials together and for surnames alone, as shown in table 1 for the subset of 50,000 names. since the great majority of full names occur once only, the relative entropy of this distribution, at 0.975 (computed with respect to the 50,000 names, i.e., h_max = log2 50,000), is high, while that for surnames alone is lower, at 0.904. an analysis of the ratio of unique surnames to the total number of entries in files of 25,000, 50,000, 75,000, and 100,000 names showed that the proportion of different surnames added to the file as it increases in size is predictable: the relationship between the number of different surnames (d) and the total number of entries (n) conforms to the expression

    d = a * n^beta, where a = 5.89 and beta = 0.78.
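for readers who want to reproduce these measures, the following short python fragment (ours, not code from the original study) computes h, h_maximum, and the relative entropy h_r for an observed frequency distribution:

```python
from collections import Counter
from math import log2


def entropy(counts: Counter) -> float:
    """Shannon entropy h = -sum(p_i * log2 p_i) of an observed distribution."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def relative_entropy(counts: Counter) -> float:
    """h_r = h_actual / h_maximum, where h_maximum = log2(number of distinct symbols)."""
    h_max = log2(len(counts))
    return entropy(counts) / h_max if h_max > 0 else 1.0


# toy example: first letters of a handful of surnames
letters = Counter("ssmmbkhfl")
print(round(entropy(letters), 3), round(relative_entropy(letters), 3))
```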
table 1. distribution of full names and surnames alone in a file of 50,000 inspec names.
full names, given as no. of names with frequency f (% of the file), in order of increasing f: 35,187 (70.37), 4,768 (19.07), 1,060 (6.36), 302 (2.42), 88 (0.88), 34 (0.41), 16 (0.22), 7 (0.11), 3 (0.05), 1 (0.03), 2 (0.05), 1 (0.03). total number of different full names = 41,469; h = 15.22, h_max = 15.61 (log2 50,000), h_r = 0.9753.
surnames, given as no. of surnames with frequency f (% of the file), for f = 1 to 20 and f > 20: 19,894 (39.79), 4,258 (17.03), 1,597 (9.58), 706 (5.65), 395 (3.75), 235 (2.82), 134 (1.88), 104 (1.66), 68 (1.22), 54 (1.08), 36 (0.79), 39 (0.94), 36 (0.94), 28 (0.78), 24 (0.72), 24 (0.77), 15 (0.51), 19 (0.68), 16 (0.61), 9 (0.36), 112 (8.44). total number of different surnames = 27,803; h = 14.11, h_max = 15.61 (log2 50,000), h_r = 0.9042.

next, the frequencies of characters at different positions in the surnames and of the initials were determined. the most important positions in the surname are the first and last characters, as will be seen shortly. the distributions of these characters and of the first and second initials are shown in table 2. the relative entropy of the first initial is, interestingly, the highest of the four; the highest ranking initial is j, which is one of the least frequent characters in english text. thereafter follow the first and last letters of the surname, and the second initial. the low relative entropy of the last is partly accounted for by the fact that a single initial occurred in 37 percent of the entries.

table 2. distributions of first and last characters of surname and of initials in 50,000 inspec names.
first character of surname: s .113, b .083, m .080, k .076, h .056, g .055, p .053, c .052, r .047, l .047, d .044, t .040, w .040, a .036, f .034, n .025, v .025, e .018, j .017, o .016, z .013, i .013, y .011, u .005, q .001, x. h = 4.309, h_max = 4.700 (log2 26), h_r = 0.917.
last character of surname: n .164, r .102, a .084, s .082, i .074, e .068, v .067, y .043, t .042, o .041, l .040, h .037, k .033, d .030, g .026, z .013, m .013, u .013, f .006, c .005, w .005, p .004, x .004, b .003, j .001, q .0002. h = 4.039, h_max = 4.700 (log2 26), h_r = 0.859.
first initial: j .100, a .083, r .081, m .064, g .058, v .051, d .050, h .050, s .047, e .043, p .042, w .038, k .036, l .036, c .035, t .033, b .032, n .026, f .026, i .023, y .023, o .010, space .005, z .005, u .004, q .0002, x .0001. h = 4.374, h_max = 4.755 (log2 27), h_r = 0.920.
second initial: space .371, a .066, m .045, j .043, s .035, l .033, e .033, r .031, p .031, g .030, c .030, w .028, v .028, h .027, d .026, i .026, f .024, n .024, k .022, b .020, t .013, y .007, o .005, z .002, u .001, q .0002, x .0001. h = 3.688, h_max = 4.755 (log2 27), h_r = 0.776.

distributions were also obtained for the second and subsequent characters of the surname. these, and also the distributions of the first character, are in general agreement with the results of earlier studies by bourne and ford, and by ohlman, and indicate that consonants predominate in the first position, vowels in the second position, while thereafter the distributions become less disparate.15,16 however, due to the variable lengths of names, the dominant character at the sixth and subsequent positions of the surname is the space character.

key-set generation technique
the basic key-set generation technique involves creating fixed-length n-grams from some point or points of reference within each record, the strings generated being initially of length greater than those anticipated within the key-set. these strings are sorted into lexicographic order and counted. (the resultant distribution of the fixed-length strings is again hyperbolic.) the frequencies are compared with a predetermined threshold frequency; at the first stage none of the string frequencies should exceed this value. the strings are then shortened by truncation of the right-hand character, and the frequencies of the strings which have become identical through truncation are accumulated. the new n-gram frequencies are compared with the threshold value; any strings which exceed the value are noted. the procedure is repeated until the single characters are reached. two types of analysis are possible, redundant and nonredundant. in the latter, any string exceeding the threshold value is removed from the list and not processed further, while in the former they continue to the next processing stage. while redundant analysis is valuable at the exploratory stage, the nonredundant type is preferred for key-set generation. the procedure was first applied to strings of characters starting with the first character of each surname, as illustrated in figure 1.

fig. 1. successive right-hand truncations of a surname during key-set generation.
n-gram:    foreman   forema   forem   fore   for   fo     f
frequency:      11       13      24     98   143  214  1685

here the frequency of the surname foreman in a file of 50,000 names is eleven. when successively shortened, other surnames with the same initial n-gram are included in the count. comparison of the count with a threshold value results in selection of a key. here, if the threshold were 100, the key selected would be for. application of the procedure to the surnames of the 50,000 name file (the name records had a maximum of eighteen characters, left-justified and space-filled if less than this length), with a threshold frequency of 300 (i.e., a probability of 0.006), gave a key-set consisting of eighty-seven keys, including all the alphabetic characters. the key-set is shown, in alphabetic order, together with the probabilities, in table 3. it is clear that the most frequent characters at the beginning of the surname have produced most keys, s and m with eight keys each, b with seven, k with six, and h, g, p, and r each with five keys. whereas the relative entropy of the initial surname letter was 0.917, that of the key-set is 0.977. the probabilities of no less than seventy of the eighty-seven keys now lie between 0.005 and 0.015. the key-set itself consists of the twenty-six alphabetic characters (one of these, x, is not represented in the collection), fifty-
table 3. key-set of 87 keys produced from 50,000 surnames from inspec files.
key p1'0bability key probability key probability key probability a .023 ga .009 m .001 ro .016 al .007 go .011 ma .022 s .027 an .006 gr .012 mar .008 sa .016 b .012 gu .007 mc .007 sch .014 ba .013 h ,006 me .010 se .008 bar .006 ha .021 mi .012 sh .016 be .017 he .010 mo .012 si .010 bo .014 ho .012 mu .008 so .007 br .014 hu .007 n .011 st .016 bu .009 i .013 na .008 t .030 c .013 j .010 ni .006 ta .010 ca .011 jo .007 0 .017 u .005 ch .016 k .015 p .011 v .015 co .013 ka .018 pa .014 va .010 d .015 ki .008 pe .011 w .011 da .009 ko .017 po .010 wa .011 de .013 kr .008 pr .006 we .008 do .007 ku .010 q .001 wi .010 e .018 l .013 r .007 x f .025 la .012 ra .011 y .011 fr .008 le .014 re .008 z .013 g .015 li .009 ri .006 h=6.2952 hmax = 6.443 (log,87) h, =0.977 eight digram keys, and the three trigram keys bar, mar, and sch. the predominance of vowels as the second character of keys is noticeable; forty-nine of the sixty-one n-grams have a vowel in the second position. the size of the key-set produced from a given data base can be varied arbitrarily by changing the threshold value. an approximately hyperbolic relation obtains between the value of the threshold and the number of keys selected. as the size of the key-set increases, the length of the longest n-gram in the key-set increases, and the distribution of n-grams shifts toward higher values, as shown in figure 2. stability of the key-sets with increase in file size is clearly an important factor. to determine the extent of this, successive portions of the entire file of 100,000 surnames were subjected to the analysis at a threshold value of 0.005. as illustrated in table 4, the key-sets are remarkably stable in regard to total key-set size, the number of keys of each length, and to the actual keys. table 4. stability of size and composition of keys with increasing file size. number of number of number of number of total size entries in file characters digrams trigrams of key-set 25,000 26 76 10 112 50,000 26 74 9 109 75,000 26 74 10 110 100,000 26 75 10 111 no, of keys common to key-sets 26 73 9 108 112 ]ou1'nal of library automation vol. 7/2 june 1974 400 300 number of n-grams 200 100 1 2 3 4 5 6 7 8 9 length of n-grams key-set size a 184 b 332 c 572 d 1034 threshold probability 0.0025 0.0015 0.0010 0.0007 10 11 12 13 fig. 2. distribution characteristics of n-grams generated from 10,000 surnames from inspec for four different threshold values as the size of the key-set increases, the range of probabilities represented among the keys narrows, and the relative entropy of the distribution increases, becoming eventually asymptotic with the value of one. this i~ illustrated in figure 3, for the surnames in a file of 50,000 entries. beyond a key-set size of about 100, increases in the relative entropy of the resultant distribution are marginal. furthermore, with increasing key-set size, the va1'iety-gene1'ato1' appmachjfokker and lynch 113 shorter and more frequent surnames begin to appear in their entirety as keys. as an alternative to increasing the variety of the keys, the production of keys from character positions after the first letter of the surname was considered. the problem of variations in name length, as well as the very different distributions of the characters at these positions, were not encouraging, and instead the production of key-sets from the last letter of the sur1 .99 .98 .97 .96 .95 .94 .93 hr .92 .91 .90 .89 .88 .87 .86 0 20 40 60 80 100 total number of keys for the front of surnames fig. 3. 
increase in relative entropy with increase in key-set size; keys generated from 50,000 surnames 114 j oumal of library automation vol. 7/2 june 1974 name was investigated, and proved much more ath·active, since it is largely independent of surname length. key-sets from the end of the surname for this purpose, each surname in the file was reversed within a record and subjected to key-generation. the relative entropy of the last character of the surname is substantially lower than that of the first character, at 0.860. accordingly, the key-sets have a higher proportion of longer keys than those produced from the front of the surname, as shown in table 5. this key-set consists of the twenty-six characters, seventy-eight digrams, table 5. key-set of 155 n-grams produced from last letter of 50,000 inspec surnames at threshold of 0.003. key p1'obability key p!'obability key probability key probability a .012 vich ,005 ein .005 is .012 ca .003 gh .003 kin .007 ns .006 da .008 sh .003 lin .005 ins .003 ka .006 th .005 tin .003 os .004 ma .007 ith .004 nn .010 rs .006 na .003 i .014 on .009 ss .005 ina .004 ai .004 son .013 ts .004 ra .010 hi .007 lson .004 us .004 ta .008 ii .009 nson .006 t .012 va .004 vskii .005 rson .004 dt .003 ova .010 ki .006 ton .009 et .004 wa .004 ski .005 0 .017 nt .004 ya .005 wski .004 ko .003 rt .003 b .003 li .005 nko .010 ert .004 c .005 ni .007 no .004 st .004 d .009 ri .005 to .007 tt .005 ld .005 ti .004 p .004 ett .003 nd .006 j .001 q .001 u .013 rd .009 k .010 r .005 v .001 e .020 ak .006 ar .006 ev .018 de .003 ck .009 er .016 ov .012 ee .004 ek .004 ber .003 kov .008 ge .004 ik .004 der .006 ikov .004 ke .006 l .007 ger .005 lov .005 le .008 al .006 nger .003 nov .006 ne .008 el .012 her .006 anov .006 re .006 ll .004 ier .005 rov .006 se .005 all .004 ker .007 sov .003 te .004 ell .008 ler .007 w .005 f .003 m .008 ller .005 x .004 ff .003 am .005 mer .003 y .017 g .004 n .009 ner .010 ay .004 ng .004 an .017 ser .003 ey .006 ang .003 man .014 ter .008 ley .007 ing .007 rman .003 or .004 ky .004 rg .007 yan .003 s .016 ry .005 h .004 en .018 as .007 z .007 ch .009 sen .007 es .011 tz .006 ich .003 in .019 nes .004 h=7.059 hmax = 7.276(log.155) hr = 0.970 va1'iety-generator approachjfokker and lynch 115 1 .99 .98 .97 .96 .95 .94 .93 .92 hr .91 .90 .89 .88 .87 .86 0 40 80 120 160 200 total number of keys for the end of sumames e!g. 4. increase in relative entropy with increase in key-set size; keys generated from 50,000 surnames forty trigrams, ten tetragrams, and a single pentagram. the breakdown of the individual terminal characters of the surname is also more extreme, since the distribution is more skew. thus n, the most frequent last character, has no fewer than nineteen different keys in this set, closely followed by r, with seventeen keys. the relative entropy of the distribution is again high, at 0.970 for this key-set. figure 4 shows the relation between key-set size and relative entropy, and indicates that a larger number of keys from the last character of the surname is required to reach the same relative en116 journal of library automation vol. 7/2 june 197 4 tropy as keys from the first character. there is an anomalous section of the curve, which may well derive from the much greater prevalence of suffixes than prefixes in personal names. 
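the truncation procedure lends itself to a compact implementation. the sketch below is a rough rendering of the nonredundant analysis described above, not the authors' program: the function names, the space-filled length of six, and the toy surname list are assumptions made for the example, and the from_end option simply reverses each surname so that keys grow from its last character, as in the end-of-surname key-sets just described.

from collections import Counter

def generate_keys(surnames, threshold, max_len=6, from_end=False):
    """sketch of nonredundant key-set generation by right-hand truncation.

    surnames are space-filled to max_len, counted as fixed-length n-grams,
    and repeatedly shortened; any n-gram whose accumulated count exceeds
    the threshold is promoted to the key-set and drops out of further
    processing (the nonredundant analysis). remaining single characters
    are kept, so every surname maps onto some key.
    """
    def orient(s):                       # show end-of-surname keys the right way round
        return s[::-1] if from_end else s

    prepared = [s[::-1] if from_end else s for s in surnames]
    counts = Counter(s.ljust(max_len)[:max_len] for s in prepared)
    keys = {}
    for _ in range(max_len - 1):         # truncate down to single characters
        survivors = Counter()
        for ngram, freq in counts.items():
            if freq > threshold:
                keys[orient(ngram).strip()] = freq
            else:
                survivors[ngram[:-1]] += freq
        counts = survivors
    for ngram, freq in counts.items():   # leftover single characters become keys
        keys[orient(ngram)] = freq
    return keys

# toy file: 50 copies each of a handful of surnames, threshold 100
surnames = ["foreman", "forbes", "ford", "fox", "smith", "smythe", "schmidt"] * 50
print(sorted(generate_keys(surnames, threshold=100)))                  # ['f', 'for', 's']
print(sorted(generate_keys(surnames, threshold=100, from_end=True)))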
conclusions

this study has demonstrated the feasibility of devising partial representations of author names by applying the variety-generator approach to overcome the substantial frequency variations encountered in their distributions. it has also been shown that within a homogeneous file, i.e., one of consistent provenance, there exists a substantial level of consistency in terms of character distributions, as illustrated in table 4. the characteristics may vary substantially between data bases of different provenance, e.g., as between inspec and marc files.17

conventional approaches to processing records comprising linguistic data tend to disregard the statistical properties of the items, and attempt to overcome the resultant problems either by storage of extensive lists of items or by using complex numerical algorithms. typical of this latter approach, in the present context, is the use of truncated search keys for access to bibliographical files in direct access stores, in which fixed-length character strings are the keys, as, for instance, in the system in operation at the ohio college library center.18 the problems encountered in the use of fixed-length truncated author and title search keys for monograph data are indicated by the fact that the search files using hash-addressing are operated, on average, at a density of only 62.5 percent. once the density reaches 75 percent, the proportion of collisions and the resultant degradation in performance are such that the files are recreated at a density of only 50 percent. fixed-length keys from author and title entries are demonstrably inefficient in performance since the information content is low. the distribution of the initial trigrams of 50,000 names from the inspec file provides corroboration of this fact. the number of possible combinations of three characters is 17,576 (26^3), yet only 3,285 trigrams were represented in the file, or 18.7 percent of the total variety. moreover, the relative entropy of the trigrams is much lower than that of the initial characters of the surnames, at 0.73. performance figures for precision illustrate this point.19

the present work, together with other studies of the scope for application of the variety-generator approach, thus stands in considerable contrast to prior work, and must be viewed as a means whereby the microstructure of particular data elements is fully reflected in their manipulation, affording substantial advantages.20 part 2 of this paper illustrates this in regard to searches of personal names.

acknowledgments

we thank m. d. martin of the institution of electrical engineers for provision of a part of the inspec data base and of file-handling software, and the potchefstroom university for c.h.e. (south africa) for awarding a national grant to d. fokker to pursue this work. we also thank dr. i. j. barton and dr. g. w. adamson for valuable discussions, and the former for n-gram generation programs.

references

1. p. b. schipma, term fragment analysis for inversion of large files (chicago: illinois institute of technology research institute, 1971).
2. j. c. costello and e. wall, "recent improvements in techniques for storing and retrieving information," in studies in co-ordinate indexing, vol. 5 (washington, d.c.: documentation inc., 1959).
3. l. h. thiel and h. s. heaps, "program design for retrospective searches on large data bases," information storage and retrieval 8:1-20 (feb. 1972).
4. s. c. bradford, documentation (london: crosby-lockwood, 1948).
5. g. k. zipf, human behaviour and the principle of least effort (cambridge, mass.: addison-wesley, 1949).
6. b. mandelbrot, "an informational theory of the statistical structure of language," in w. jackson, ed., communication theory (london: butterworth, 1953), p. 486-501.
7. r. a. fairthorne, "empirical hyperbolic distributions (bradford-zipf-mandelbrot) for bibliometric description and prediction," journal of documentation 25:319-43 (dec. 1969).
8. m. f. lynch, "the microstructure of chemical data-bases, and their representation for retrieval," proceedings, nato advanced study institute on computer representation and manipulation of chemical information (in press).
9. i. j. barton, s. e. creasey, m. f. lynch, and m. j. snell, "an information-theoretic approach to text searching in direct-access systems," communications of the acm (in press).
10. g. w. adamson, j. cowell, m. f. lynch, a. h. w. mclure, w. g. town, and a. m. yapp, "strategic considerations in the design of screening systems for substructure searches of chemical structure files," journal of chemical documentation 13:153-57 (aug. 1973).
11. a. c. clare, e. m. cook, and m. f. lynch, "the identification of variable-length, equifrequent character strings in a natural language data base," computer journal 15:259-62 (aug. 1972).
12. c. e. shannon, "a mathematical theory of communication," bell system technical journal 27:398-403 (1948).
13. w. c. b. sayers, a manual of classification for librarians and bibliographers (london: grafton, 1926).
14. c. a. cutter, c. a. cutter's alphabetic order table ... altered and fitted with three figures by kate e. sanborn (boston: boston library bureau, 1896).
15. c. p. bourne and d. f. ford, "a study of the statistics of letters in english words," information & control 4:48-67 (1961).
16. h. ohlman, "subject word letter frequencies; applications to superimposed coding," proceedings of the international conference of scientific information, vol. 2 (washington, d.c.: national academy of science, 1959), p. 903-16.
17. d. w. fokker and m. f. lynch, "a comparison of the microstructure of author names in the inspec, chemical titles and b.n.b. marc data-bases" (in preparation).
18. f. g. kilgour, p. l. long, a. l. landgraf, and j. a. wyckoff, "the shared cataloging system of the ohio college library center," journal of library automation 5:157-83 (sept. 1972).
19. f. g. kilgour, p. l. long, and e. b. leiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the asis 7:79-82 (1970).
20. i. j. barton, m. f. lynch, j. h. petrie, and m. j. snell, "variable-length character string analysis of three data-bases, and their application for file compression," proceedings, 1st informatics conf., durham, 1973 (in press).

editorial board thoughts: information technology in libraries: anxiety and exhilaration
mark cyzyk
information technology and libraries | september 2015

a few weeks ago a valued colleague left our library to move his young family back home to pittsburgh. insofar as we were a two-man department, i spent the weeks following the announcement of his imminent departure picking his brain about various projects, their codebases, potential rough spots, existing trouble tickets, etc.
he left, and i immediately inherited nine years' worth of projects and custom code, including all the "micro-services" that feed into our various well-designed, high-profile, and high-performing (thanks to him) websites.

this was all, naturally, anxiety-producing.

almost immediately, things began to break.

early on, a calendar embedded in a custom wordpress theme crucial to the functioning of two of our revenue-generating departments broke. the external vendor simply made disappear the calendar we were screenscraping. poof, gone. i quickly created an ok-but-less-than-ideal workaround and we were back in business, at least for the time being.

then, two days before the july 4 holiday, our calendar managers started reporting that our google-calendar-based system was disallowing a change to "closed" for that saturday. i somehow forced a closed notification, at least for our main library building, but no matter what any of us did we could not get such a notification to show up for a few of our other facilities. i spent quite a bit of time studying the custom middleware code that sits between our google calendars and our website, and could see where the magic was happening. i now think i know what to do -- and all i have to do is express it in that nutty programming language/platform that the kids are using these days, ruby on rails. i've never written a line of ruby in my life, but it's now or never.

a little voice inside me keeps saying, "you're swimming in the deep end now -- paddle harder, and try not to sink."

while these surprise events were happening, we also switched source code management systems, so a migration was in order there, my longingly-awaited new workstation came in (and i'm sure you all know how painstaking it is to migrate all idiosyncratic data/apps/settings to a new workstation and ensure it's all present, functioning, and secure before dban-nuking your old drives), we decommissioned a central service that had been in production since 2006, we fully upgraded our wordpress multisite including all plugins and themes, fixing what broke in the upgrade, and i got into the groove of working on any and all trouble tickets/change requests that spontaneously appeared, popping up like mushrooms in the verdant vale of my worklife.

this was all largely in addition to my own job.

so now i find myself surgically removing/stitching up code in recently-diseased custom wordpress themes, adding ruby code to a crucial piece of our website infrastructure, and learning as much as i can -- but quick -- about the wonderful and incredibly powerful bootstrap framework upon which most of our sites are built.

surely it's anxiety-producing? you bet.

mark cyzyk (mcyzyk@jhu.edu), a member of the ital editorial board, happily works and ages in the sheridan libraries, johns hopkins university, baltimore, maryland, usa.
but  it's  thrilling  and  exhilarating  was  well.    i'm  paddling  hard,  and  so  far  my  head  remains   above  water.    many  days,  i  just  can't  wait  to  get  to  work  and  start  paddling.       this  aging  it  guy  suddenly  feels  ten  years  younger!   (but  isn't  all  this  paddling  supposed  to  somehow  result  in  a  swimmer's  body?    patiently   waiting...)     reproduced with permission of the copyright owner. further reproduction prohibited without permission. prospector: a multivendor, multitype, and multistate western union catalog bush, carmel;garrison, william a;machovec, george;reed, helen i information technology and libraries; jun 2000; 19, 2; proquest pg. 71 prospector: a multivendor, multitype, and multistate western union catalog the prospector project represents a unique union catalog. the origin, goals, and design of the union catalog that uses the inn-reach system are presented. challenges of the union catalog include the integration of records from libraries that do not use the innovative interfaces system and the development of best practices for participating libraries. t he prospector project is a union catalog of sixteen libraries in colorado and wyoming built around the inn-reach software from innovative interfaces, inc. (iii).1 in 1997, the colorado alliance of research libraries (the colorado alliance) and the university of northern colorado submitted a joint grant proposal to create a regional union catalog for many of the major academic and public libraries in the region. the project would allow users to view library holdings and circulation information with a single query of the central database. the union catalog also would allow patrons to request items from any of the participating libraries and have them delivered to a nearby local library. however, unlike many of the other union catalogs in the country, prospector has several unique elements: • it is multistate (colorado and wyoming). • it is multisystem (incorporating systems from innovative interfaces and carl corporation; plans call for voyager from endeavor). • it is multi-library-type (academic, public, and special libraries). regional union catalogs representing the cataloged collections of libraries that are related by geography, subject, or library type have been extant for many years. early leaders in the field spearheaded locally developed systems such as the university of california's melvyl system and the illinois library computer systems organization's (ilcso) illinet online system, which became operational in 1980.2 the commercial integrated library system market began to emerge in the late 1980s and the 1990s with such vendors as innovative interfaces and its work with ohiolink through its inn-reach union catalog product, and the carl system.3 many major vendors now have union catalog solutions for a single physical union catalog, although most have the requirement that participating libraries all use the same integrated library system. an alternative approach that is also becoming popular, because of the heterogeneous nature of the ils marketplace and the widespread implementation of z39.50, is for libraries to create virtual union catalogs through broadcast searching. this solution is available from many ils vendors as well as through organizations such as oclc and its webz software. carmel bush, william a. garrison, george machovec, and helen i. 
reed there is not a single "right" answer for whether regional catalog searching and document delivery is best accomplished through a physical or virtual union catalog. each solution has benefits and drawbacks that must be balanced against the mix of vendors, economics, politics, and technical issues within a state. prospector is somewhat unusual in that it does create a single physical union catalog but allows for the incorporation of other library systems, made possible through a published specification from innovative interfaces. i prospector history, funding, and project goals colorado has a long history of resource sharing through a variety of programs, including use of the colorado library card statewide borrower's card and access to individual libraries' online catalogs through the access colorado library information network (aclin) and other regional catalogs. the colorado alliance has taken a leadership role within the state in promoting cooperation among major academic and public libraries in the areas of automation, joint acquisitions, and other cooperative endeavors. existing online catalog software enabled patrons to easily search individual online catalogs, but searching several catalogs was a tedious task requiring many steps. it has long been a goal of the alliance to have a true union catalog of holdings for all member libraries. to forward this goal, in 1997 the colorado alliance of research libraries and the university of northern colorado jointly applied for and received a grant from the colorado technology grant and revolving loan program to establish the colorado unified catalog, a unified catalog of holdings for sixteen of the major academic, public, and special libraries in colorado.4 the university of wyoming was included in the project through separate funding. the grant of $640,000 was used to develop a union catalog that would support searching and patron borrowing from a single database. the colorado alliance carmel bush (cbush@manta.library.colostate.edu) is assistant dean for technical services at the colorado state university libraries, fort collins; william a. garrison (garrisow@ spot.colorado.edu) is head of cataloging at the university of colorado at boulder (colo.) libraries; george machovec (gmachove@coalliance.org) is the associate director of the colorado alliance of research libraries, denver; and helen i. reed (hreed@unco.edu) is associate dean, university of northern colorado libraries, greeley. prospector i bush, garrison, machovec, and reed 71 reproduced with permission of the copyright owner. further reproduction prohibited without permission. and the university of northern colorado contributed an additional $189,500 of in-kind services to the unified catalog project. additionally, the colorado alliance contributed $119,000 of in-kind funds to support purchase of distributed system software. the colorado unified catalog project, later named prospector, was based upon the inn-reach software developed by innovative interfaces, inc. 
it included all innovative interfaces sites in colorado as of december 1996 as well as the carl system sites that were members of the nonprofit colorado alliance of research libraries.s the colorado unified catalog project had two major goals: • the development of a global catalog database containing the library holdings of the largest public and academic libraries in the region; and • the development of an automated borrowing system so that users at any of the participating libraries could easily request materials electronically from any other participating libraries.6 the union catalog would allow users to view library holdings and circulation information on titles with a single query of the global database. once titles were located, patrons could request available items and have them delivered to their home library. the grant proposal identified four major goals and outcomes of the project: access, equity, connections, and content and training. by creating a global catalog, the colorado unified catalog project would provide students, faculty, staff, and patrons free and open access to the union catalog via the internet. patrons from all participating libraries would have equal access to the combined holdings of all sixteen participating libraries, thus greatly enhancing resources available to patrons without the necessity of travel across the state. connectivity was greatly enhanced by the installation of high-speed internet access in the colorado alliance office where the union catalog server was housed. the unified catalog project amassed, in one place, the complete cataloged collections of the major libraries in the region creating a single, easy-to-use public interface. training for the catalog would be conducted in each library so that it could be integrated into the standard training and reference services of each participating library.? addressing statewide goals for libraries, the colorado unified catalog was designed to dovetail with an existing project in colorado called the access colorado library and information network (aclin) in several ways. the goal of aclin was to provide statewide searching of several hundred library catalogs in colorado through broadcast 239.50 searching. however, because of the large number of online library catalogs (too many z39.50 targets cause broadcast searching to be slow) and 72 information technology and libraries i june 2000 poor network infrastructure in some parts of the state, the creation of physical union catalogs, such as prospector, greatly enhanced the ability for a project such as aclin to be successful. as stated in the grant proposal it will: • make aclin more efficient since sixteen libraries will be grouped together and can be accessed via a single search, thus saving alcin users steps in searching; • enhance aclin's document delivery plans since patrons can make requests themselves; • offer both web and character interfaces for various levels of users; • provide access via aclin's dial-in ports as well as via the internet; and • support alcin's future developments based on a 239.50 environment.s work on the development of the colorado unified catalog began in mid-1997. even while contract negotiations were underway in midto late 1997, groups were busy undertaking discussions on the design and structure of the unified catalog. work on development of profiling and system specifications continued through july 1998. this data was entered onto the server at the colorado alliance office and a test database was created in august 1998. 
testing was completed in november 1998 and the first records were loaded in december 1998. the creation of the database for the first twelve libraries took seven months. during the database load the catalog was available for searching, although most participating libraries did not highlight the system in their local opacs. innovative interfaces, inc. conducted training on the actual patron requesting and circulation functions at three sites over the period from may through july 1999. as of january 2000 the catalog included more than 3.6 million unique bibliographic records of the twelve largest libraries in colorado (more than 6.6 million marc records have been contributed, which has resulted in 3.6 million unique records after de-duplication). with the database in place and opac and circulation training complete, prospector went "live" for patron-initiated requests in the first eight libraries on july 23, 1999. as of december 31, 1999, all twelve innovative sites were "live" in prospector. the final programming for loading the records from carl-system sites will be completed in spring 2000. it is anticipated that carl-system library records will be loaded in late spring 2000 and will bring the database to more than five million unique marc records, with more than ten million item records. since the receipt of the grant, two participating libraries have selected endeavor as their online integrated system . contract negotiations are underway between innovative interfaces and the reproduced with permission of the copyright owner. further reproduction prohibited without permission. colorado alliance to come to an agreement on loading records for the endeavor libraries into prospector. i politics and marketing of prospector planning and policy making are inherently political processes in which participants choose among goals and options in order to make decisions and to direct actions. for prospector the diverse makeup of multitype libraries and multisystems augured for different perspectives on implementation from the onset. nearly every department in member libraries would have an impact from the project. to be successful in carrying out their charges, the work of the task forces appointed to implement prospector had to address how these staff could influence the process and how local practices would be affected. the challenge was to engage staff in the process since the task force structure precluded representation from every member library. meeting this challenge would be vital to ensuring input and fostering buy-in and advocacy for prospector in member institutions. consequently, in addition to reviewing standards or best practices and focusing on the goals stipulated in the grant, obtaining factual knowledge about member practices and resources and encouraging communications served as key ingredients in planning and policy development. general process profiling prospector, a main charge for the cataloging/ reference task force, illustrates the general process employed in planning and how key ingredients were applied to gain input and produce results. the first step involved the task force's review of the grant's aims for the unified catalog. with that framework as a basis, a planning process was outlined and shared with participants. the prospector web site detailed the specification development process, including the schedule and opportunities for input. 
next the task force surveyed participants for information on their systems: bibliographic indexing rules, types of indexes, characters indexed in phrase indexes, indexes on which authority control performed, and suppliers of authority records. using this data, the task force identified the commonalties and differences to determine what to create in the unified catalog. members also consulted innovative interfaces and reviewed what previous innreach customers had established. draft recommendations for indexing, indexes, record overlay, and record display specifications were then posted on the prospector web site and participants requested to review and provide input. a notice in data/ink: the alliance newsletter (www.coalliance.org/ datalink) also referenced the site. at the same time, testing was performed using draft specifications in order to assess them and to check for other concerns that testing might reveal. because of the importance of the recommendations, an open forum was held to receive additional comments. following the forum, the task force members made final adjustments to the specifications. after the period for public comment ended, the specifications were submitted as recommendations to the prospector steering committee for approval. once approved, the specifications became official and were referenced in all site visits. issues because of the design of inn-reach, participants must make decisions about contribution of records, priorities for what record would serve as the master record, order of loading, indexing, indexes, and displays for the unified catalog. circulation functions require decisions about services for patron types, circulation statuses, loan periods, numbers of loans, renewals, recalls, checkouts, holds, overdues, fines, notices, pick-up locations, and billing. in the case of prospector, expectations regarding what would be controversial met with a few surprises. for example, the master record, the bibliographic record from one participating library to which holdings of other libraries are displayed, is based upon encoding level and the library priority list. the latter determines if the incoming record should replace an existing level; a record with a higher level will replace a lower one. based upon the data collected from libraries, a proposal categorized libraries into the following order: large, special, and "all others." the order was further factored by a member library's application of authority control and participation in program for cooperative cataloging programs. the proposal drew minimal comment from libraries. pride of ownership was not an obstacle. everyone was committed to the fullest authorized form of the record. how many loans an individual could request was the subject of early debate. there were concerns about discrepancies between local limits for borrowing and the possible setting of a higher number of loans on prospector. a corollary concern was that a high number might result in depleting a member library's collection. previous experience with borrowing by a subset of members shed light on the issue; there were no problems with loan limits. in fact, inn-reach supports "load leveling" across participating libraries randomly as well as by prospector i bush, garrison, machovec, and reed 73 reproduced with permission of the copyright owner. further reproduction prohibited without permission. precedence tables thus avoiding systematic checkout from one library only. 
members decided that they could always pass a request on to another owning library if necessary and monitor loans to determine if any abuses would develop. with these options, it then became possible to establish a forty checkout limit for individual patrons in prospector. differences in cataloging practices engendered more discussion because of the potential for a policy that might affect local practice. in the course of comparing practices of institutions, the cataloging/reference task force identified multiple records for the same serial titles that reflected differences in forms of entry and multiple versions treated either in separate records or on the same record. there was wide variety in statements of holdings. these differences warranted gathering further information on holdings; multiple versions, especially those involving electronic versions; and successive/latest entry for cataloging. the task force decided to hold a focus group on serials and invited staff in member libraries from serials, cataloging, and reference to attend. in the meantime, visits to participating libraries were instituted, the first of the roadshows, to discuss serials practices, their implications for overlays and displays, and options for handling them. the focus group attracted a large attendance and proved useful in gathering information about practices and the concerns of participating libraries regarding serials. most libraries reported individual practices for recording holdings. although participants expressed a desire for consistency, attendees also shared that resources are not available to retroactively change them. instead attendees encouraged development of a best practice recommendation that would follow the niso standards for those libraries wishing to change practices. with the exception of electronic versions of serials, focus group participants had no problem with multiple formats in the same bibliographic record as long as it was clear to users. electronic versions prompted a lot of questions about what to do with 856 links to restricted access resources and about changes in software. it was clear that this issue would need further investigation by the task force. the hottest area, successive or latest entry cataloging of serials, registered strong preferences by proponents. attendees did not welcome changing practice in either direction. instead there were questions asked about possible system changes and about the conduct of use studies to determine what problems might arise from latest entry records in the system. with the information gained from the focus group meeting, the task force assigned priority to the areas and pursued latest/ successive entry as the top priority. 74 information technology and libraries i june 2000 already the task force had consulted innovative interfaces, inc. and received a negative reply to possible changes to matching algorithms, loading programs, and record values that could deal with practices of participants because of the software structure. it was technically impossible for a latest entry and successive entry record to load separately given their match on the oclc number. the predominant use of successive entry and its status as the current national standard persuaded the task force initially to recommend coding latest entry in a special way so that the record for such an entry would not be the master record in the system unless it was unique. 
this interim measure led to the policy recommendation that successive entry serve as the standard for prospector. as a part of the recommendation, members are asked to not undertake retroactive conversion/ recataloging projects to change existing latest entry records. up to the meeting of the prospector board of directors, the serials policy was argued. the approval by the board illustrates that controversial issues may require that leadership commit their libraries to policies. marketing marketing incorporates an overall strategy of identifying patrons' needs, designing products to meet those needs, implementing the products, and promoting and evaluating them. the twin goals of prospector are: (1) one-stop shopping and expanded access regardless of location, and (2) an automated borrowing system to facilitate fast delivery of materials that addressed problems experienced by patrons in searching and obtaining materials. the grant proposal outlined a plan for member libraries to meet these goals through inn-reach software and the cooperative efforts of participating members. with the implementation of the unified catalog and patron-initiated borrowing, the next pieces of the strategy, promotion and evaluation, come into play. member libraries commitment to a cooperative venture takes time and energy. the support for prospector at the library director and dean level had to be translated to staff in member libraries whose efforts would be necessary to support the unified catalog and patron-initiated loans. staff members had to become acquainted with how prospector would benefit patrons and their work. hence internal promotion was a necessary component throughout planning and policy development and with implementation to users. because of the numbers of staff in member libraries, no one method would assure awareness of developments for prospector. the approach involved the alliance's newsletter (datalink), a prospector web site, electronic reproduced with permission of the copyright owner. further reproduction prohibited without permission. discussion lists, e-mail, correspondence, phone calls, documentation, training sessions, and many site visits. the site visits facilitated interaction across institutional lines and were important for discussing critical issues at the local level. in arranging for site visits, it was important to clarify what the staff members wanted to discuss. a general update on prospector might be followed by other technical sessions such as preparing the library's database for load into the prospector system. participants' questions emphasized the importance of sharing the plan for developing prospector and the basic concepts guiding the implementation planning and policy process as listed below. these concepts bore repeating because a staff member could have been hearing about prospector for the first time. • decisions and directions are guided by data and input gathered from participants, standards/best practices, system capabilities, and the aims for prospector described in the grant. • relatively few local practices are affected by participating in prospector. • inclusiveness in record contributions would build prospector into a rich resource for users; however, participating libraries can exert control over contributions. • global policies are developed for prospector only; local sites define their own local policies. • assistance is available to participating libraries in coming up with solutions for special circumstances. 
• prospector is not reinventing the wheel. although the multitype library and multisystem involvement would produce a new model of inn-reach, other inn-reach sites could serve as models. • think globally but act locally. more than a catchphrase, this statement acknowledges the reality of individual library circumstances and the balancing of prospector goals to maximize access and use of resources by patrons. patrons the design of the pac, a promotional brochure, and individual library public relations efforts all served to promote prospector's availability to users. prospector provides access via telnet and the web. the impetus, however, was to examine member webpacs and create a prospector webp ac that exemplified the best in menu design including caption descriptions, navigational aids, and consistency in display of elements among search screens. special attention was paid to providing example searches that would have appeal for the diversity of patrons served by the membership. after mulling over several name possibilities, the alliance staff suggested the name prospector for the unified catalog, connoting the rich mining history of the rocky mountain area. this identity found its depiction in a classic picture of a gold miner supplied by the colorado historical society. representing the user, the miner is the center panning for gold, an apt image for users exploring the richness of resources from the unified catalog. the incorporation of the image as the logo on the web site and the catalog was followed by its adoption for the entire cooperative venture. name recognition spread quickly. to facilitate promotion at member libraries, the alliance staff designed a brochure. the design features a brief description of the unified catalog, a list of members and information for patrons on how to connect, what's available on prospector, how to use the self-service borrowing, and how to view their circulation record. many libraries have web-mounted guides or paper handouts in their instructional service, using the alliance-designed brochure as a model. finally, staff in member libraries exercised individual approaches to promote prospector to users. denison library describes and provides a link to prospector on its web list of databases and help guides. colorado state university libraries devoted the front page of its library newsletter to "hunting for hidden gold," the introduction of prospector. a special newsletter for auraria's history faculty highlighted prospector in its database news section. the university libraries of the university of colorado at boulder describes the unified catalog in its web site on its state services page. more introductions came from instructional classes held by every member library. profile of participating libraries prospector is unique since it is multistate, multi type, and multisystem. of the sixteen members (see appendix a), almost all are located along the front range of the rocky mountains extending from laramie, wyoming, southward to colorado springs, colorado. only fort lewis college is located on the western slope of the mountains. despite the distances, a network of courier service connects all members. within the membership are eleven public and private academic libraries, three special libraries representing law and medicine, and two public libraries that serve almost one million registered patrons. twelve of the libraries operate innopac and are loaded into prospector. two libraries on the carl system are slated for loading in mid-2000. 
two other libraries are migrating to the voyager system by endeavor information systems in the summer of 2000. hopes are to incorporate them into the system in 2001. prospector i bush, garrison, machovec, and reed 75 reproduced with permission of the copyright owner. further reproduction prohibited without permission. description of how inn-reach works the inn-reach software is designed to provide a union catalog with one record per title with all of the libraries holding a title represented. after databases are loaded initially, the software automatically queues transactions that occur to bibliographic, item, order, or summary serial holdings records and sends those transactions up to the central catalog. staff in the local library has no extra work or steps to take to send transactions to the union catalog. the union catalog uses a "master" record to maintain only one bibliographic record per title. the "owner" of the master record is determined by several factors. a bibliographic record with only one holding library automatically has that library as the owner of the master record. if more than one library holds a title, the system uses an algorithm to determine which record coming into the system has the highest encoding level. the library that has the record with the highest encoding level becomes the owner of the record, and its version of the record is displayed and indexed in the catalog. in addition, a table is created which has a list of the libraries in priority order for determining the master record if two or more matching records enter the system with the same encoding level. for the prospector catalog, a survey was conducted of the participating institutions to determine which libraries might have the best or fullest records. questions in the survey included size of database, source of bibliographic records, participation in national projects (e.g., program for cooperative cataloging, oclc enhance), amount of authority work done and level of authority control in the local database, level of cataloging given to records, and type of institution. the task force charged with designing the catalog examined these surveys and determined a priority order of the participating institutions for selecting bibliographic records. the system also uses a set of match points each time a bibliographic record is added to the union catalog. whenever a match occurs, the system examines the encoding level of the incoming record and the library from which the record is coming to determine if a change in the master record is required. the existing record is overlaid by the incoming record if the master record holder is changed. the first check is done on the oclc record number. if there is a match on that, the system adds the holdings to the existing record. if there is no match on the oclc number, the system attempts to match on the isbn or issn in combination with the title in the 245 field. again, if a match occurs, the system adds the holdings to the existing record. if no match occurs, a new bibliographic record is added to the catalog. in addition, each library that has a local innovative interfaces system has the ability to exclude bibliographic, item, order, or check-in records from being sent to the 76 information technology and libraries i june 2000 union catalog. suppression may occur in each of these record types. 
the library may also choose to send a record to the union catalog but exclude it from public display in the union catalog or to suppress a record from displaying in the public catalog both locally and centrally. the inn-reach system has no central database maintenance module, though it does provide a staff mode in which to view records, to create lists, and to monitor transaction queues. the staff module that is available via a telnet connection allows authorized users to view those records that have been contributed to the union catalog but are not displayed to the public in the union catalog. for example, a library may contribute its order records to the union catalog but choose to suppress those records from public display; however, authorized staff may view these records in the inn-reach staff mode or create lists for collection development purposes that include those order records. circulation status of individual items and volumes also appears to the user. the prospector member libraries with local innovative interfaces systems also maintain a set of circulation or item status codes that display various messages to users of their individual public catalogs. the inn-reach system also has a set of circulation or item status codes. agreement was reached on what the status codes were to be in the central catalog, and each member library then had to map its local codes to the codes used in the central catalog to ensure proper message display in the union catalog. in some cases, the member libraries had to adjust local status codes. indexes for the prospector catalog were determined during the profiling process. in general, there are more indexes in the union catalog than are available in the member libraries' local catalogs. indexes in prospector include author, author/ title, library of congress subject headings, medical subject headings, library of congress children's subject headings, journal title, keyword, library of congress classification numbers, national library of medicine classification numbers, dewey decimal classification numbers, government documents numbers, oclc numbers, and special numbers (e.g., isbn, issn, music publisher numbers, etc.). the classification number indexes are derived using the classification numbers that appear in the defined marc tags for the various classification schemes in the bibliographic record and do not represent local call numbers. local call numbers are always stored at the item record level in the union catalog. it was decided that many local marc fields that are defined for local notes or local access would not transfer from the local catalog to the union catalog (e.g., 59x, 69x, 79x, 9xx) to avoid ambiguities and excessive heading conflicts. therefore, there may be access points or index entries in the local catalog that may not be available in the union catalog; the local reproduced with permission of the copyright owner. further reproduction prohibited without permission. catalog may still contain "richer" or "fuller" searching than the union catalog. the local catalog may have materials accessible in it as well that do not appear in the union catalog. patrons using a local catalog may transfer their searches up to prospector simply by clicking on a button in their local public catalogs and have the search automatically occur in the union catalog. patrons may access prospector directly either via the world wide web or via telnet. 
navigation between local catalogs and prospector as well as navigation within prospector has been designed to be clear and simple. patrons may also go from prospector either back to their local catalog or to the local catalogs of other member libraries. when a patron locates an item that he or she wishes to borrow from prospector, he or she may initiate the request for the item online. the borrowing and lending process is described below. prospector member libraries have been asked to be as inclusive as possible in contributing bibliographic records to the union catalog. member libraries have been asked to contribute the following: • items that users may borrow, including all monographic materials that circulate, and other material types as specified by individual institutions that are listed as available for circulation. • items that users may not borrow but may use onsite, including reference materials, archival materials, rare books, and others as determined by individual institutions. virtual items, such as electronic journals, which have ip limiting and authentication are included in this category. • items that are owned virtually which have urls or ip addresses that are open and unrestricted include government publications and selected home pages as determined by the local institution. bibliographic records that are contributed should have as full cataloging as possible for identification and retrieval. materials that are on reserve and other locally defined special materials (e.g., materials that have use restrictions placed upon them) may be excluded from prospector. the prospector union catalog will also include bibliographic and circulation information from libraries that do not use innovative interfaces as their local system vendor. i the integration of non-innovative libraries into inn-reach one of the major efforts in the prospector project was to be able to incorporate bibliographic, item, summary serial holdings, and acquisitions records from other vendors with the inn-reach union catalog software. in 1997, when the grant was written, it was envisioned that the system would incorporate libraries using two ils vendors-innovative interfaces, inc. and carl corporation-two of the major vendors in colorado at the time. twelve libraries used innovative interfaces and four used the carl system (denver public library, regis university, colorado school of mines, and the university of wyoming). however, in late 1999, the colorado school of mines and the university of wyoming decided to migrate to the voyager system by endeavor information systems (this is occurring in 2000). both of these institutions have still expressed an interest in being part of prospector, so they will need to be integrated in 2001 after they are stable on their new system. the remaining carl sites will be fully integrated in 2000. the integration of records that allows document requests from different vendors is being accomplished as follows: • innovative interfaces, inc. has published a set of specifications for how bibliographic, item, summary serial holdings, and acquisitions order records should be formatted to be loaded into the union catalog. • published specifications were also created for patron verification and for how document requests are to be transferred. • the alliance office is developing the software to package usmarc bibliographic records, item records, summary serial holding records, and order records to transfer to prospector. 
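the contribution guidelines above reduce to a per-record decision rule. the fragment below sketches that rule in python; the category names and record fields are assumptions for illustration, not the loader actually used by the member libraries.

# illustrative decision rule for whether a bibliographic record is sent to
# the union catalog, following the contribution guidelines listed above.
EXCLUDED_LOCATIONS = {"reserve", "restricted use"}   # assumed locally defined categories

def contribute(record):
    """return True if the record should be contributed to prospector."""
    if record.get("suppressed"):                     # the library chose to exclude it
        return False
    if record.get("location") in EXCLUDED_LOCATIONS:
        return False
    # circulating items, onsite-use items, and virtual items all qualify
    return record.get("category") in {"circulating", "onsite-use",
                                      "virtual-open", "virtual-restricted"}

print(contribute({"category": "circulating", "location": "stacks"}))   # True
print(contribute({"category": "circulating", "location": "reserve"}))  # False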
work is also being done so that document requests may be relayed between the different systems using an intermediate unix server running an sql database with a web interface for circulation to ill staff. because the carl and endeavor systems are built differently, the record updating may be done on a "batch" basis several times a day. patron verification, to determine if a carl or endeavor patron is in good standing before allowing a document request, will be done in realtime. i administrative and committee structures under provisions of the grant, the dean of libraries at the university of northern colorado provides administrative management for the project while the colorado alliance of research libraries houses the server, maintains the union catalog software, provides network connectivity, prospector i bush, garrison, machovec, and reed 77 reproduced with permission of the copyright owner. further reproduction prohibited without permission. develops the software to integrate the non-innovative sites into the union catalog, and provides ongoing system administration support for the project. a prospector steering committee comprised of deans and directors of three participating libraries provided general overview for the project during the initial stages. to carry out the initial work of the project, two task forces were appointed with responsibility for detailed design and implementation of the system: the catalog/reference task force and the circulation/document delivery task force. the catalog/reference task force was charged with making all bibliographic and display decisions relating to the catalog. this included establishing the criteria for determining which institution's bibliographic record displays in the catalog, developing display and overlay hierarchies for bibliographic records coming into the system, and identifying marc fields that would be indexed and displayed in the catalog. membership on this task force included both public services and technical services personnel, but did not include representation from every participating library.9 the circulation/document delivery task force was charged with developing common circulation policies to be applied in the union catalog including loan periods, fines, renewals, holds, recalls, checkout limits, and patron blocks. the task force was also responsible for developing the precedence table for routing patron requests. the members of this task force represented each participating library, and several libraries had representation from both their circulation and interlibrary loan department.lo these two task forces conducted meetings from july 1997 through december of 1999. the stage was set for the task forces' work at a training session held by innovative interfaces, inc. on system operation and functionality. each group received direction on what policy issues needed to be determined to lay the groundwork for establishing the codes that drive system functionality. after the initial training, each task force met several times a month, often consulting with innovative interfaces, inc. and/ or their local libraries as their planning and deliberations continued. communication was an important component during the development of the system. soon after the grant was awarded, staff from the alliance office visited each participating library and met with library personnel to explain the overall goals of the project and how work would be conducted. 
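the intermediate server just described is, in effect, a shared request queue plus a real-time verification call. the sketch below shows that division of labor in python, with an in-memory list standing in for the sql database; the row layout and function names are assumptions, and the calls into the local systems are stubs rather than real vendor interfaces.

# sketch of the intermediary: document requests are queued for batch relay
# several times a day, while patron verification happens in real time.
import datetime

request_queue = []   # rows the batch job will relay to the lending system

def patron_in_good_standing(patron_id, home_system):
    """real-time check against the patron's home system (stubbed object here)."""
    return home_system.verify(patron_id)            # placeholder for a vendor-specific call

def submit_request(patron_id, home_system, bib_id, pickup_location):
    if not patron_in_good_standing(patron_id, home_system):
        return "request refused: patron not in good standing"
    request_queue.append({
        "patron": patron_id, "bib": bib_id, "pickup": pickup_location,
        "queued_at": datetime.datetime.now(), "status": "pending",
    })
    return "request queued"

def relay_batch(lending_system):
    """run on a schedule: push pending rows to the lending system."""
    for row in request_queue:
        if row["status"] == "pending":
            lending_system.place_request(row)       # placeholder for the other ils
            row["status"] = "relayed"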
as detailed development progressed, open forums were held in central locations to keep representatives of all libraries apprised of progress and to get feedback regarding specific policy issues. completed work from the task forces was mounted on the prospector web site. in addition, regular articles appeared in data/ink, the alliance monthly newsletter. specific training sessions were conducted both by the task forces and by innovative interfaces. 78 information technology and libraries i june 2000 as the actual database loading process began, the catalog/reference task force conducted sessions at each prospector library. these sessions were twofold in purpose: to provide an opportunity for a general overview of how the database structure and indexing worked for all library personnel, and to train technical services personnel in how local coding of records impacted the display of their local records in the global catalog. in preparation for going live with patron requesting, innovative interfaces, inc. conducted pac searching and circulation training sessions at several central locations for frontline staff from all institutions. in addition, the circulation/ document delivery task force held a central session for representatives from all libraries to discuss issues relating to the flow of materials among libraries. during system implementation, it became apparent that some ongoing structure would be required for ongoing maintenance and development of the global catalog. in completion of their charges, each task force prepared a final report, which was submitted to the steering committee and to the prospector directors group. each task force recommended its own termination but outlined a structure to address ongoing issues. as approved by the prospector directors group, the ongoing governance structure is multilayered with frontline operations groups, broader planning and policy-setting committees, an advisory committee, a directors group, and electronic discussion lists for communication. monitoring of the day-to-day work of the cataloging and circulation/ document delivery operations is handled by frontline staff via e-mail, electronic discussion lists, and/ or telephone. broader planning and policy issues are addressed through smaller, representative standing committees. the advisory committee and directors group operate at a policy level. the new structure includes: • a catalog site liaison group comprised of one representative from each participating library and charged with serving as the point of contact for inquiries regarding catalog maintenance, access and record merging; • a catalog/reference committee comprised of members selected from the participating libraries and charged with responsibility for all bibliographic and display issues relating to prospector. this includes monitoring details of the current implementation as well as addressing ongoing policy issues, recommending system enhancements, testing new system functionality, and training staff at new sites coming into the system; • a document delivery site liaison group comprised of one or more representatives from each participating institution with responsibility to reproduced with permission of the copyright owner. further reproduction prohibited without permission. 
serve as a point of contact for other prospector libraries that have inquiries concerning issues, lost books, courier delivery, or related topics;
• a circulation/document delivery committee comprised of representatives selected from the participating libraries and responsible for issues relating to the courier delivery service, circulation load-balancing, monitoring member compliance with circulation policies, recommending system enhancements, testing new system functionality, and the year-end reconciliation of lost book charges; and
• a prospector advisory committee comprised of twenty-four deans and directors from participating libraries to address issues requiring quick response relating to project specifications and operating rules.
the prospector directors group is comprised of the deans/directors of all participating libraries and is charged with making recommendations on high-level policy and admission of new participants. since prospector is a project of the nonprofit colorado alliance of research libraries consortium, all final high-level decisions and financial commitments are subject to the approval of the board of directors of the consortium. at present, five of the sixteen prospector libraries are not part of the formal consortium but participate in this one project. the newly formed committees will continue to address broad policy and operational issues such as the load-balancing tables for routing patron requests to owning libraries, will document best practices for local libraries to follow in implementing certain functionality within their local system to achieve maximal results in the central catalog, will identify enhancements to the system, and will test new release functionality.
borrowing and lending policies and specifications
as a prelude to its work, the circulation/document delivery task force examined borrowing and lending practices from other innovative interfaces inn-reach sites and reviewed the borrowing policies for consortial borrowers that were developed and agreed to by a subset of alliance libraries (university of northern colorado, auraria library, and denver public library) several years ago. the first major duty of the task force was to establish circulation and document delivery policies that would govern those functions within the prospector system. these common circulation and document delivery policies were based on a series of assumptions:
• the task force policies apply to the unified catalog only; local sites define local policies;
• local workflow remains local purview;
• policies should be kept simple;
• circulating materials are commonly circulated materials, primarily books, at each site;
• the task force will work within the confines of the inn-reach system;
• if a patron is blocked locally, he or she will be blocked at the global level;
• for routing purposes, each institution (rather than branch) is the routing site; and
• local sites will determine when their items are declared lost.
the task force established a series of recommendations for policies that applied to the prospector system. the proposed policies were discussed within the local institutions as well as with various administrative groups.
the final policies for prospector lending as adopted and implemented in the system are: • loan period : twenty -one days • renewals: one • number of holds allowed : forty • checkout limit: forty items • recalls: none, except for academic library reserve collections • lost book charge: $100, which is comprised of a $75 refundable lost book charge and a $25 nonrefund able processing fee • libraries establish their own local rules for overdue fines on prospector materials . key features of the inn-reach software that were emphasized with each local library during training sessions are: • libraries have local control over what is loaned through the global catalog. • libraries have local control over which of their patrons can borrow materials through the global catalog. • if the local copy is checked out or missing, a copy may be requested through prospector. • the system is sensitive to multivolume works and allows particular volumes to be selected. the ongoing document delivery committee has developed a series of "best practices" that establish benchmark policies that each library is urged to adopt in the spirit of uniform cooperation among participating libraries. individual libraries, however, may choose not to adopt these practices. prospector i bush, garrison, machovec, and reed 79 reproduced with permission of the copyright owner. further reproduction prohibited without permission. system functionality the actual steps for a patron to request an item within the prospector system are simple and self-explanatory. once a patron has identified an item they wish to order, the following steps take place: • the user is prompted for institutional affiliation, name, and library card number. • the system checks local system to ensure that the patron is in good standing. • the user selects a pick-up location from those offered by their home institution. • the system forwards the patron request to an owning library with an available circulation status doing load balancing among the libraries with available copies. once the patron request is forwarded to a lending library, the request goes into the queue of requested items from that library. each library has established its own workflow for handling requests; however, that workflow must include interaction with the system to record the status of the request. once the item is located by the lending library, it is checked out to the requesting patron's "home" library and is sent, via courier, to that library. the "home" library then receives the item in the system and holds it pending pick-up by the patron. when the patron arrives to borrow that item, it is checked out to that patron's record according to the prospector loan rules. having a common set of loan rules for all prospector loans provides consistency for the patron. the patron may still have multiple due dates on items checked out at the same time depending on the loan rules for local checkouts. the system maintains statistics on several elements of the borrowing and lending processes. it tracks the total number of items borrowed and loaned and calculates the ratio of borrowing to lending per institution. in addition, it tracks the number of items cancelled and the reason why, the number of holds filled and cancelled, and several other groupings. i challenges and issues with the building of prospector still underway and public access available only since late july 1999, prospector is doing a respectable volume of loans in its infancy. 
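read as an algorithm, the requesting steps above are: verify the patron against the local system, collect the libraries holding an available copy, and choose one so that load is spread among them. the python sketch below mirrors those steps under the common prospector loan rules; the random choice reflects the load-balancing configuration noted in appendix b, and all names are illustrative rather than part of the inn-reach software.

# sketch of the prospector request flow described above: patron check,
# candidate selection among libraries with an available copy, and a simple
# random load-balancing choice (precedence tables were not yet in use).
import random

def route_request(patron, item_holdings, pickup_location, verify):
    """item_holdings maps library -> circulation status for the requested title."""
    if not verify(patron):                       # local system confirms good standing
        return None
    candidates = [lib for lib, status in item_holdings.items()
                  if status == "available"]
    if not candidates:
        return None
    lender = random.choice(candidates)           # random load balancing
    return {"lender": lender, "pickup": pickup_location,
            "loan_period_days": 21, "renewals": 1}   # the common prospector loan rules

request = route_request(
    patron="p123",
    item_holdings={"library_a": "available", "library_b": "checked out"},
    pickup_location="main desk",
    verify=lambda p: True,
)
print(request)   # routed to library_a under the common loan rules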
over ten thousand items were delivered during the first six months of operation. this number is expected to rise dramatically as the system grows and as local libraries promote the service. this auspicious start provides a sense of accomplishment tempered by recognition that there is more to do. some of the major challenges facing the project include:
• development is underway to integrate records for the carl system libraries into the central catalog and provide borrowing capabilities for their patrons.
• as member libraries choose other online system providers, ideally, these systems likewise need to be interfaced with the prospector system. coming to agreements with all vendors involved will require careful negotiation and wording of contracts. discussions are underway with innovative interfaces and endeavor information systems for merging endeavor libraries into inn-reach.
• monitoring how the fiscal accounting for the first end-of-year reconciliation of lost books will work is planned.
• developing best practices and evaluating software enhancements for inn-reach are necessary.
• we need to determine how to handle electronic resources and multiple formats, and to load records from commercial electronic resources, for example, netlibrary.
• we must improve matching within the system and make additional enhancements to the prospector web site.
• with growth of the system, full-time operations and management staff may be required. securing funding for the new ventures and new staffing will require development efforts or a sharing of costs by members; there is no state-based funding for ongoing maintenance and new product acquisition.
• with the increasing flow of materials between libraries, the courier delivery service must be monitored on an ongoing basis. the statewide courier service has recently been restructured and was contracted based on pre-prospector activity levels for interlibrary loan materials. with the ever-growing popularity of prospector, there will be a corresponding increase in volume for the courier. service levels need to be monitored closely to ensure that the speed of delivery is maintained and that the loss and incorrect routing rates stay within acceptable limits.
the balance of borrowing and lending will have financial impacts on some of the participating libraries. through a legislative allocation, the state library of colorado provides funding on a per-transaction basis to libraries that are net lenders, that is, libraries that loan more materials than they borrow. most libraries are considering prospector transactions as equivalent to interlibrary loan transactions and counting them toward the payment-for-lending program. it is anticipated that the inclusion of prospector activity in the interlibrary loan borrowing and lending statistics will significantly alter the balance of payment for lending among the prospector libraries.
already prospector has shown that it is changing behaviors. the cooperation between libraries has been impressive. in member libraries, staff are factoring prospector into their plans and realizing that keeping prospector operations staff informed of problems is a good habit. user searching and document delivery patterns are changing. margaret landrum, director at the fort lewis college library, predicts that prospector will have a dramatic effect on researchers in the geographic area.
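the payment-for-lending question turns on whether a library lends more than it borrows. the small python fragment below shows the arithmetic; the sample figures are merely of the same order as those reported in appendix b.

# lending-to-borrowing ratio and net-lender test, as used in the discussion
# of payment for lending; the figures below are illustrative six-month counts.
def lending_to_borrowing_ratio(loaned, borrowed):
    return loaned / borrowed

def is_net_lender(loaned, borrowed):
    return loaned > borrowed

loaned, borrowed = 3120, 1520
print(round(lending_to_borrowing_ratio(loaned, borrowed), 2))   # 2.05
print(is_net_lender(loaned, borrowed))                          # True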
its start has given all members a share in that expectation. i the future and interesting spin-offs union catalog projects often take on a "life of their own" far beyond what was originally envisioned. some of the future spin-offs may include: • the addition of other research libraries in nearby states. • collection overlap studies and improved coordination on acquisition and weeding projects between libraries. • with the full implementation of the union catalog, there are opportunities for resource sharing at a broader level. the central catalog has the functionality to support bibliographic records for and access to "consortia!" resources, thus enabling libraries to jointly purchase resources and provide centralized access to them. • as database and online information providers develop new methodologies for access to their resources, there will be opportunities to easily link from either the local or central catalog to these online resources, a process which is cumbersome and/or impossible in the nonglobal environment. for instance, where databases are centrally mounted at the alliance office with shared ownership, the link to serial holdings feature is pointed to prospector, thus providing patron access to consortiawide holdings. • use of the system as a central repository for cataloged metadata for electronic resources on the web. • encouraging innovative interfaces, inc. to allow document requests that "fail" in the system to be forwarded to national ill subsystems or commercial document suppliers using national standards. i conclusion prospector dramatically alters the bibliographic landscape in colorado, offering patrons easy access to the bibliographic wealth of the state. patrons will be easily able to move from a local catalog to this regional system and request materials. librarians will find the system useful for collection overlap studies, improved coordination on acquisitions and weeding projects, z39.50 links with other indexing/ abstracting services for serials holdings information (e.g., ovid or silverplatter), and expedited book delivery. the high level of cooperation among the diverse nature of the participating libraries is exemplary. the incorporation of public and private universities, public libraries, and special libraries offers a model for cooperation. references 1. anthony j. dedrick, "the colorado union catalog project," college and research libraries news 59, no. 10 (1998): 754-55; george machovec, "prospector: a regional union catalog," colorado libraries 25, no. 2 (1999): 43-45. 2. clifford a. lynch, "the next generation of public access information retrieval systems for research libraries: lessons from ten years of the melvyl system," l!'.formation technology and libraries 11, no. 4 (1992): 405-15; bernie sloan, "testing common assumptions about resource sharing," information technology and libraries 17, no. 1 (1998): 18-29. 3. thomas dowling, "ohiolink-the ohio library and information network," library hi tech 15, no. 3 / 4 (1997): 136-39; lindy naj, "the carl system at the university of hawaii uhm library," library software review 12, no. 1 (1993): 5-11. 4. gary pitkin and george machovec, colorado union catalog. senate bill 96-197. technology grant and revolving loan program. excellence in learning through technology. december 1996. grant proposal by the university of northern colorado and the colorado alliance of research libraries. 5. gary pitkin, colorado union catalog-prospector. final report. july 27, 1999. 6. 
machovec, "prospector: a regional union catalog." 7. ibid. 8. ibid. 9. prospector staff web site, www.coalliance.org/prospector. 10. ibid.
appendix a
general statistics about prospector:
• sixteen libraries (see below)
• twelve innovative interfaces sites (went live in fall 1999)
• two carl sites (to go live in 2000)
• two voyager endeavor sites (to be incorporated in 2001 pending final negotiations with both vendors)
• 3.6 million unique marc records as of january 2000, which are expected to grow to more than 5 million after the incorporation of the carl and endeavor sites.
• 9 million item records, which are expected to grow to more than 12 million after the incorporation of the carl and endeavor sites.
• currently 61 percent of the records in the system are held by only one library.
• more than 1 million registered patrons are possible users. denver public library has over 500,000 patrons and jefferson county public library has over 300,000 patrons.
• prospector url for public use: http://prospector.coalliance.org
• prospector staff url, which includes policies, committee minutes, and profiling tables: www.coalliance.org/prospector
prospector libraries and their web sites:
auraria library: http://carbon.cudenver.edu/public/library
colorado college: http://www.coloradocollege.edu/library
colorado school of mines: http://www.mines.edu/academic/library
colorado state university: http://manta.library.colostate.edu
denver public library: http://www.denver.lib.co.us
fort lewis college: http://library.fortlewis.edu
jefferson county public library: http://www.jefferson.lib.co.us
regis university: http://www.regis.edu/lib/wlibhome.htm
university of colorado at boulder: http://www.libraries.colorado.edu
university of colorado/colorado springs: http://web.uccs.edu/library
university of colorado/health sciences: http://www.uchsc.edu/library/index.html
university of colorado/law library: http://www.colorado.edu/law/lawlib
university of denver: http://www.penlib.du.edu
university of denver/law library: http://www.law.du.edu/library
university of northern colorado: http://www.unco.edu/library
university of wyoming: http://www-lib.uwyo.edu
appendix b
early borrowing/lending data
the borrowing and lending patterns in prospector will be of interest to monitor because of the wide variety of participating libraries in the system. the incorporation of both academic and public libraries has the potential for different use patterns than those seen in more homogeneous academic union catalogs. the following data represent some of the very early borrowing and lending patterns in prospector. all of the libraries in the table went "live" in terms of borrowing and lending in late july or august 1999, with the exception of jefferson county public library, which went live in november 1999. history with other similar projects has shown that use will grow dramatically as libraries and users gain familiarity with the service. the incorporation of denver public library in 2000 should have a significant impact on the service. at present (and in the accompanying table), prospector has been configured to do random load balancing without the use of any precedence tables to force document requests to one site or another.
prospector fulfillments report, august 1999 through february 14, 2000. borrowing sites run across the columns in the order aur, ccc, csu, cul, cub, du, dul, ftl, jcpl, uccs, uchsc, unc; each row gives the lending (owning) site, its ratio, its total, and its fulfillments to each borrowing site (each row omits the lending site's own column).
totals: 1879 930 2301 225 1520 1132 129 946 1775 882 364 2063
aur (ratio 0.89, total 1667): 108 282 33 232 187 17 113 234 128 70 263
ccc (ratio 0.72, total 673): 114 109 11 96 57 66 89 53 10 68
csu (ratio 0.86, total 1985): 267 156 29 272 221 18 130 288 134 55 415
cul (ratio 0.55, total 123): 24 9 20 5 11 12 3 10 7 3 19
cub (ratio 2.05, total 3120): 396 231 590 26 260 21 246 420 233 56 641
du (ratio 2.07, total 2341): 361 153 464 42 315 20 163 279 131 69 344
dul (ratio 1.12, total 145): 27 7 14 27 15 25 3 11 6 4 6
ftl (ratio 0.54, total 511): 66 36 130 3 66 36 7 72 31 11 53
jcpl (ratio 0.54, total 962): 187 81 201 11 154 65 11 64 33 38 117
uccs (ratio 1.02, total 900): 170 65 148 12 130 65 5 3 137 15 90
uchsc (ratio 0.83, total 301): 63 5 49 5 26 31 3 5 32 36 46
unc (ratio 0.69, total 1422): 219 81 291 27 207 153 13 89 222 90 30
monocle
marc chauveinc: conservator, university library of grenoble, saint-martin d'heres, france
a new processing format, based on marc ii and some of bnb's elaborations of marc ii. it further enlarges marc ii to encompass french cataloging practices and filing arrangements in french catalogs.
when the bibliotheque universitaire de grenoble, section sciences, wished to transform its card catalog into a book catalog and later into an on-line catalog, the first necessity was to build up a format fitted for the handling of complex records and the filing of non-alphabetical headings. after several personal assays at a format, the librarian at grenoble translated the marc ii and bnb formats into french (1, 2) to give french librarians the opportunity to become acquainted with them; finding these two formats the most flexible and complete of those reviewed, he also began the work of adapting them to french cataloging rules. the marc format is a standard format designed purely for communication of bibliographic records on magnetic tape; marc ii is a marc format containing library of congress cataloging data disseminated by the marc distribution service of the library of congress. the marc ii format is not intended as a local processing format; indeed, even the library of congress uses its own internal processing format and not marc ii. most centers using marc ii records have designed their own processing formats and file structures from which, if the center is to participate in a network, it must be possible to regenerate records in a communications format. the bnb format, one of the derivatives of marc, contains british national bibliography cataloging data. translation of the two formats was done in january 1969. subsequently a first french adaptation of them was discussed by a group of experts from the bibliotheque nationale and the direction des bibliotheques and was judged not good enough; deeper work was necessary to analyze the marc format and test its compatibility with french cataloging practices. the resultant new processing format, called monocle (projet de mise en ordinateur d'une notice catalographique de livre), was published in june 1970 (3).
programs
meanwhile, in order to test the format and to prepare the operational work as soon as possible, programmers attached to the institute of applied mathematics at the university began to write several programs in cobol.
cobol was chosen because the institute had good practice in that language, having worked with it for several years; because it can be easily modified if there is a change in format; and because it can be used with several types of computer, enabling other libraries to use it. the programs are still in the process of being written, but since the beginning of january 1970 all books cataloged by the library according to current practice have also been cataloged according to the new system and their records entered into the computer, so that both systems are now working simultaneously. the author catalog program, which is the most difficult and sophisticated, is not yet ready, but most of the following that were foreseen as necessary are actually working: 1) a test program (tstanaly) that checks the logical structure of the records at the input stage and displays on the printout any errors (fields missing, length of tags, of indicators, subfield codes, logical links between fields and information codes, etc. ) ; 2) a program ( expcreat) that creates the files, computes the directory and puts the records at their places on the disks; 3) a program ( tstnot ab ) for producing an alphabetical printed index containing author plus abridged title plus the address of the record on the disk; 4) a program for sorting records according to udc numbers and for printing them on a two-column weekly list; 5) a program to correct and update the created files; 6) a program for sorting records alphabetically in an annual catalog; 7) a program giving a list of udc numbers with the corresponding subject headings and vice-versa ; 8) several small modular programs for supplying statistics on the number of books and volumes, and expenditure in total and by subjects. input and output the institute of applied mathematics has two computers: an ibm monoclejchauveinc 115 360/40 and an ibm 360/67 that work together in a conversational mode during the day and in batch processing during the night. the library uses both of these modes. the conversational mode is controlled by a system called cp jcms (cambridge monitoring system) for the input of data through an ibm 1050 terminal with a paper-tape puncher and a reader, and the batch-processing mode by os (operating system) for the production of lists and statistics. on-line input through the terminal is very convenient for corrections, because of quick access to non-created provisory files of 100 records and the printed list that can be proofread. it has some inconveniences, however, the first of which is that it is a slow system. a typist punches the paper tape at an average rate of twenty records a day. taking into account the time of reply, errors of transmission, and breakdowns of the system, it is not possible to read more than fifty records in a morning, although theoretical speed of reading is forty records an hour. then the files have to be read through the tstanaly program, printed on the line printer, then controlled by the librarians, recalled and corrected on the 1050 terminal, and then again listed, controlled and so on until they are correct. it can take several days before a file of fifty records is ready. though paper is a convenient means of storing data in secmity in case of destruction of the files, it is a slow means of transmitting data and, because it may cause errors in transmission, is not very reliable. the 1050 terminal, although a typewriter, does not have a character set sufficient for library work. 
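the checking done by tstanaly is essentially structural validation of each record before it is accepted. the sketch below shows the kind of tests involved, written in python rather than in the cobol used at grenoble; the required fields and the exact rules are simplified assumptions for illustration.

# illustrative structural checks of the kind performed by tstanaly;
# the required fields and code rules here are simplified assumptions.
REQUIRED_TAGS = {"100", "245"}        # e.g., main entry and title must be present

def check_record(record):
    """record: list of (tag, indicators, [(subfield_code, value), ...]) tuples.
    returns a list of error messages for the printout."""
    errors = []
    tags_present = {tag for tag, _, _ in record}
    for tag in sorted(REQUIRED_TAGS - tags_present):
        errors.append(f"field missing: {tag}")
    for tag, indicators, subfields in record:
        if len(tag) != 3 or not tag.isdigit():
            errors.append(f"bad tag: {tag!r}")
        if len(indicators) != 2:
            errors.append(f"bad indicator length in {tag}: {indicators!r}")
        for code, value in subfields:
            if len(code) != 1 or not value:
                errors.append(f"bad subfield in {tag}: {code!r}")
    return errors

sample = [("100", "10", [("a", "durand"), ("m", "charles")]),
          ("245", "0", [("a", "la chimie")])]       # faulty indicator length
print(check_record(sample))   # ["bad indicator length in 245: '0'"]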
it was necessary to create multipunch codes for diacritical marks. because the foregoing is also an expensive means of input, the library is experimenting with a new one. using an ibm 72 tape typewriter already in the library, the corrections will be made off line with the two tape boxes existing on the machine, and when several tapes are correct they will be sent to an ibm service bureau to be translated into a computer magnetic tape. the translation program, which will be written by ibm staff, is not very expensive. output is on an ibm 1403 n1 line printer on which is used a special print train sn with upperand lower-case roman alphabet and to which diacritical marks have been added. products are 1) weekly lists of accessions according to the universal decimal classification, 2) weekly lists of books according to acquisition number, 3) weekly lists of books according to call number, 4) a monthly catalog by authors, 5) an annual catalog by authors, 6) an irregular catalog of periodicals, 7) an irregular catalog of serials, 8 ) an irregular catalog of theses, and 9) regular statistics on the work of the library. it was felt that for several years catalogs in book form would be less expensive and more useful than a system of on-line inquiry that would require display terminals to be used by untrained people. 116 journal of library automation vol. 4/3 september, 1971 format although it will be possible later on to transform monocle's internal format into one suitable for information retrieval , the system in use at grenoble is mainly conceived for printing of the lists enumerated above. this goal led to the consideration of the major problems of filing records and building an internal format to allow easy programming of correct filing , even if this correct filing is rather complicated for the computer. there were two possible ways to achieve this aim: one was to build a simple format and provide complex programming to introduce lists of dead words, tables of transcodification and translation (as "me" to "mac," "van nostrand" to "vannostrand" ) ; the other was to build a more complex format to make programming more simple and generalized and computer processing less expensive. the latter way was followed by the library of congress and the british national bibliography in their communications formats, so a start was made from these two projects, keeping most of their structure, tags and subfield codes. the system to be built, however, required a working format, not a communications format, which led to th e first modifications. two files were created, each containing leader, directory and variable fields. the two parts of each record can be reassembled into one marc record for a communications format. record files the first file, called the index (figure 1) , contains the leader slightly modified; field 008 of the marc format , put in fixed positions and having 69 characters; and the directory, built in a different way from the marc directory. since there will never be a field length of 9999 characters and a starting character position of 9999, length was reduced to 999 characters and the starting character position to 999. since twelve characters are too much for a normal field, these two numbers are only used for computation and are put in binary and both reduced to two bytes. this permits the insertion of three pieces of information between the tag and the field length: the subrecord indicator (two characters), the repeat indicator (one character) and the indicators (two characters). 
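as described above, a directory entry occupies twelve characters: tag, subrecord indicator, repeat indicator, indicators, then field length and starting character position held as two-byte binary numbers. the python fragment below illustrates that packing; it is an illustration of the layout only, not the original cobol record handling.

# packing and unpacking a monocle-style directory entry as described above:
# 3-byte tag, 2-byte subrecord indicator, 1-byte repeat indicator,
# 2-byte indicators, then length and starting position as 2-byte binary integers.
import struct

ENTRY = struct.Struct(">3s2s1s2sHH")    # 12 bytes in all

def pack_entry(tag, subrec, repeat, indicators, length, start):
    assert length <= 999 and start <= 999          # limits chosen for monocle
    return ENTRY.pack(tag.encode(), subrec.encode(), repeat.encode(),
                      indicators.encode(), length, start)

def unpack_entry(raw):
    tag, subrec, repeat, indicators, length, start = ENTRY.unpack(raw)
    return (tag.decode(), subrec.decode(), repeat.decode(),
            indicators.decode(), length, start)

raw = pack_entry("100", "00", "1", "00", 42, 118)
print(len(raw))                 # 12
print(unpack_entry(raw))        # ('100', '00', '1', '00', 42, 118)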
the directory takes the following form: tag (characters 1-3), subrecord indicator (characters 4-5), repeat indicator (character 6), indicators (characters 7-8), field length (characters 9-10, in binary), and starting character position (characters 11-12, in binary). bnb marc allows one digit for the subrecord indicator, which makes possible nine codes for nine subrecords. since monocle will require more than nine subrecords, two digits are used, thereby permitting 99 subrecords. the repeat indicator of one digit is necessary if several identical fields are repeated in one record (e.g., in the case of several editors). a cross reference can be directed towards one of these fields, and to prepare the sort field it is easier for the programs to look only for the tags than to test every "$a" in a field, which requires testing every character in the field. the repeat indicator has another function, that of linking several fields to be associated in the processing.
fig. 1. map of index and cataloging data records.
on the worksheet (figure 2), tags and indicators are written in the following order: tags, indicators, subrecord indicators, repeat indicators (e.g., 100 00 001). on the magnetic disk, however, the order is as follows: tag, subrecord indicator, repeat indicator, indicator (e.g., 100 001 00).
fig. 2. worksheet.
the second file is the main file. records in this file have the same general design as the marc ii communications records, and monocle has retained all the fields designed by the library of congress. each field begins with a two-character subfield code. grenoble does not use fields 001 to 009, but since the bibliotheque nationale will use these fields, monocle retains them. another characteristic of the second file is that records are input in random order and are given identification numbers that are their physical addresses on the disk. the address, which is put in the leader, is made up of ten digits, of which one is the number of the disk, four the number of the track and five the number of the record. access to every record is simple, since the identification number is also the physical address.
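because the identification number doubles as the physical address, locating a record is a matter of composing and decomposing ten digits, one for the disk, four for the track, and five for the record. the helpers below are a minimal python illustration of that scheme; the function names are my own.

# the monocle identification number is the record's physical address:
# one digit for the disk, four for the track, five for the record number.
def make_address(disk, track, record):
    return f"{disk:01d}{track:04d}{record:05d}"

def split_address(ident):
    return int(ident[0]), int(ident[1:5]), int(ident[5:10])

ident = make_address(2, 317, 48)
print(ident)                  # '2031700048'
print(split_address(ident))   # (2, 317, 48)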
a printed abridged alphabetical list giving author, title and this number indexes a printout of the main file. additions and corrections are made on this printout and then added to the computer file through a correction tape. the identification number is the access point. no supplementary internal index is needed, nor is any sequential search. there is direct access to every record in the file.
some fields have been added for monocle, some deleted, and some modified. the main field deleted is field 130 (main entry uniform title heading) because its place was considered to be in the group of title fields. accordingly fields 630, 730 and 930 are deleted. that is to say, they are kept in the format, but not used, as is the case with many other fields. field 008 contains codes different from those of the marc format. these 69 codes (see figure 1) are put in fixed position just after the leader and before the directory. this permits various studies and manipulations (statistics, sorts, etc.) without going to the main file, which is in a variable-length form and whose contents are therefore less easily accessible than those of fixed fields.
field 080 for universal decimal classification was not developed by the library of congress or bnb. for monocle it has been given a structure that permits differentiation of the call number (when the book is classified on the shelves according to the udc) from the udc number, which is only used for the card catalogs. in this structure "$a" represents the call number and "$b" represents the continuation of the udc number, as shown in the following example:
080 00 $a dur 539.143 $b (083) : 547.1
the colon instructs the computer to make a cross reference from the second number to the first.
in field 100, main entry author personal name, the general layout was retained, but the subfield codes were changed for filing purposes. as a matter of fact, the filing rules for personal names at the bibliotheque nationale differ in many respects from american library association rules. in designing monocle, the library tried all along to give filing value to subfield codes in order to simplify programming. for instance, the filing order for the same name is: saint, pope, emperor, kings of france, kings (other countries), forename, single surname plus forename. this gives:
john, saint
john, king of england
john
john, bishop of chartres
john, peter
john, peter, ed.
john, peter, advocate
therefore the following subfield codes have been adopted:
names $a
saint $b
pope $c
emperor $d
king of france $e
other kings $f (alphabetized by name of kingdom)
relator $g
date $h
numeration $i
precedent epithet $k
filing epithet $l
forename $m
this structure is closer to that of the bnb than to marc's, but an important change has been made in the indicators. marc and bnb indicators for this field were chosen for communications purposes and are therefore not necessarily convenient for internal processing. in fact, the program had to test every character and take action on some of them (delete a blank, transform a hyphen into a blank, etc.), which takes a lot of computer time. to facilitate construction of sort keys, a change of indicators was made that assigned to each of them a specific action. for first indicator 1 no action is assigned.
that is to say that a name monocle/chauveinc 121 is filed exactly as it is, whether it is a single surname or a compound surname: 100 10 $a durand $m charles " smith $m john ,, castro calvo $m frederico hoa tien su santa cruz $m alonso de eighty percent of names are put under this indicator and put in the sorting field without any test, which saves much computer time. first indicator 2 changes a hyphen into a blank in a compound name. the internal hyphen becomes a blank because it is filed as a blank: martin-chauffier martin chauffier pasteur vallery-radot pasteur vallery radot first indicator 3 is used for the compound names in which a character (blank, hyphen, apostrophe) is deleted: la fontaine (filed as lafontaine) mac innis (filed as macinnis ) o'neil (filed as oneil) von nostrand (filed as van nostrand) there seems nowhere a clear explanation of the reasons for creating a special field for family names (the use of this indicator in marc ii). for french libraries it is useless for filing purposes, family name being filed as a surname. first indicator 4 is used when a complex filing is necessary, that is to say, when the technique of inserting vertical bars (or any other characters) is used in the way proposed by r. coward. the use of this specific indicator for these three bars enables the program to test for them only when this indicator is present. this means that there is just one test per name instead of ten or twenty on each character of every name. as this indicator is in the directory, the processing of the names before the sorting itself is hastened. martin i du card i ducard dupon i de la cueriviere i lacueriviere me alester i macalester i me craw-hill i maccraw hill i muller i mueller i first indicator 0 also has a filing function. as names of saints and kings will be a small part of the files, and in order to file them correctly, three bars are inserted to mark omissions for alphabetization. 100 00 $a therese i d' ii a villa $b sainte 100 00 $a therese de i' ii enfant jesus $b sainte $k marie francoise therese martin in field llo the subfield codes of the communications format were not sufficient for a good filing. first, there seemed no reason to separate name (inverted) and name (direct order) because there is no difference in the 122 journal of library automation vol. 4/3 september, 1971 filing of these names, which is strictly alphabetical. there is also no logical difference between them. so monocle retains only two of these indicators: 10, for name of a corporate body entered under the name of a place and 20, for other corporate bodies. this will be useful either for research purposes or for giving priority in filing to the name of place following upon the other name. as there are the same filing problems as in the author field, the indicator 40 has been added, which means that the three vertical lines are used. 110 40 $c martin i von ii wagner universitat the subfield coding is rather succinct in the marc format, and a change was made from the bnb coding because french practice does not use form subheading and "treaty" subheading. moreover, under the name of a corporate body there can be a subheading such as "conference." this subheading has to be interfiled with a subheading of subordinate department and then should have a different code. library association. londres. conference. library association. londres. 
cataloging group the subfield codes are: $a french name of the corporate body ~ uniform title used by $b place i the bibliotheque nationale $c name $g relator $h name of congress or conference $1 subordinate department $j additional designation (number of the congress) $k date of the congress $m place of the congress $n remainder of the title $o type of jurisdiction $p name of larger geographic entity $q inverted element monocle does not use the "$t" proposed in marc, and the same is true with many other fields ( 410, 610, 710, 910). monocle makes important changes in the title fields, following british marc but going a little further. tags have been assigned to titles in the following order: 240 collective filing title (complete works) 241 uniform title ( bible) 242 original title 243 translated title (used only for the filing of russian or greek words according to the roman alphabet) 244 romanized title 245 title a book may have several titles, in which case they are filed under the name of the author in the numerical sequence of the tags. a collective monocle/chauveinc 123 title (the complete work ) is filed before a uniform title (if it exists), and the latter before an original title, which is in turn filed before an actual title. classical works of which there are many translations have to be regrouped under the original title, but this may not be true of scientific works or of popular novels, which are filed under actual title. moreover, filing of titles can be different in different libraries and for different books in the same library, which is why the filing order will not be determined on the worksheet, but by the program. this problem in filing order was raised by the bibliotheque nationale, which does not want to have determined in the record itself which of several titles will be the filing title; titles will be put under their respective tags according to their nature, and the program will, according to certain tests, choose the filing title. however, a completely satisfying solution to achieving flexibility and unambiguity in filing has not been arrived at. monocle now uses only sequences 240, 241 and 245, using about the same indicators as the marc format but with a slightly different meaning. the first indicators in field 241 have also been changed in order to achieve proper filing whether or not a conventional title contains a personal name. for example "exposition chagall" will be filed before "exposition bibliotheque nationale." the second indicator set to 't ' shows that there should be a cross reference from this title to the title used for filing (actual title to original title, alternative title to main title ). the second indicator set to "9" shows that the title is not significant and will not be used in a title catalog; field 900 is thus not used and repetition of the cross reference is avoided. monocle also employs in title fields the indicator "4" used in field 100 for complex names and an added indicator "5" for title without personal names. subfield codes have also been modified in such a way as to use their alphabetical value as filing value as well as to identify data elements within a field. the following codes are used in fields 240, 241, 242, 243, 244 and in corresponding fields 440-444, 7 40-7 44, 940-944 ) : $a title $b filing number for a logical order of the bible, koran, etc. 
$c adaptation or extract $d remainder of the title $e filing number for languages $f language $g filing number for dates $h dates $k name of person $1 epithet $m forename $p place $q corporate body the following are examples of this subfield code use: 124 journal of library automation vol. 413 september, 1971 241 50 $a bible $b 03 $d a. t. pentateuque, genese $c extraits $e 7 $f francais $h 1967 241 50 $a exposition $p paris $q bibliotheque nationale $h 1967 241 10 $a exposition $k chagall $m marc $h 1963 for field 245 marc indicators have been retained and "40" added for title with complex filing. these titles use the three vertical lines. 245 40 $a i le xxeme i vingtieme i siecle for more simple filing the virgule or slash is used to eliminate articles at the beginning of titles. this is more flexible than the use of one indicator to determine the number of characters to avoid in filing, especially as there can be more than nine characters to avoid. 245 00 $a the i chemistry of life the foregoing two techniques are used in all the fields x4y of monocle ( 445, 945, etc. ) . there are slight modifications in other fields. for example, in the "collation" field the american and british formats do not make any mention of volumes. as it comes first in monocle collation, the subfield codes of 260 are modified as follows: $a volumes $b height $c pagination $d illustration this situation may change if an international standardized catalog description is agreed upon. in fields 400, 600, 700 and 900 the marc and bnb marc projects have foreseen only one subfield "$t" to put the title after the name, and only one field, 740 or 940 for titles alone. to permit filing author-title series or an author-title added entry with titles of works of the same author, the following title fields were constructed in exactly the same way as fields 240-245: 440, 640, 740, 940. the following fields were added, with the same indicators and subfield codes as 240-245: 441, 442, 443, 444, 741, 742, etc. the repeat indicator is used to link the author to the title in order to make one entry, since author entry and title entry may be quite independent. 410 20 001 $c national research council 445 00 001 $a i publications $y 1708 100 00 $a meynell $m esther 241 00 $a the i little chronicle of anna magdalena bach $f francais $h 1957 245 01 $a la i petite chronique d'anna magdalena bach $c trad. par m. e. buchet 700 11 $a buchet $m m. e. $g trad. 900 10 001 $a bach $m anna magdalena $g auteur suppose 945 00 001 $a la petite chronique $r voir $z 241 000 945 00 002 $a laipetite chronique d'anna magdalena bach $r voir $z 241 000 monoclejchavveinc 125 this is a very useful tool, which permits generalization of the program to interfile records of books published by an institution with records of series published by the same institution, something not possible if one is under "$t" and the other under 245. the technique is not used, however, when the name is part of the title, as in "holden day series in mathematics." it is also useful because monocle treats large handbooks as series, which is more simple than using "$d" and "$e" in the 245 field and repeating the name of the treatise in every record or using the subrecord technique. field 502 has also been modified to permit filing dissertations by subject, towns, date and number. the details of the indicators and subfield codes can be found in monocle (3). one of the main problems encountered was the processing of multivolume sets. 
it was thought necessary to develop a provision to permit interfiling volumes of a multivolume set. there are three cases, the most simple being that in which volumes are simply numbered 1, 2, 3 ... with or without a title and a date by volume. field 505 is used in this case, with subfield codes slightly modified: $y volume number $a title $b subtitle $e remainder (date, pagination) following is an example: 505 00 $y 1 $a the practice of kinetics $e 1969, 450 p. $y 2 sa the theory of kinetics $e 1969, 436 p. in the second case, when each volume has authors, title, and date, the subrecord technique can be used, each volume having its own subrecord. this is possible only for treatises with few volumes, since the complete record cannot be too long. for very complicated handbooks the series technique is employed. a record is made for the main title as a guide record, and other records are made for each volume, the name of the main treatise being repeated in fields 400-445. this case could be treated by the subrecord technique, but this would give very long and complicated records, too long to be processed by computer and difficult to correct each time a new volume comes in. although the technique used is not very logical, the guide record is made only once, and a record is made for the volume only when it comes in, without any modification to the records already in the computer. when the records are sorted in alphabetical order, one entry will be made to the individual volume and by the "series note" will find its place under the guide record ( 3). there is of course no logical link internal to the file between records of different books of the same series, nor of them with their guide record. if there is a multivolume work as part of a series, in which each volume bears a different number in the series, there are two possibilities: either to use field 505 and 445 for each volume, linking them by the repeat indicator, or to use the subrecord technique. monocle 126 journal of library automation vol. 4/3 september, 1971 makes a choice according to the complexity of the records. at the request of the bibliotheque nationale and of some documentalists wishing to use the format for bibliographies of articles, some fields were added. field 270 contains name of the printer, the place and date of printing. indicators 00 subfield codes $a place $b printer's name $c date field 545 is the title of a periodical from which is extracted the article in the main entry. this tag was chosen because 500 is the note number (the title of the periodical is not an entry ) and 45 is the title number and can be constructed as a title field. indicators 00 subfield codes $a title $b subtitle $c year $d month $e day $y volume $f issue $g pagination $h bibliographical references "$y" was kept for volume for the sake of consistency throughout the format. since it was undesirable to alter marc fields 660 and 670, monocle employs 680-682 for french subject headings. however, name subject heading tags were retained as 600, 610 and 611, but with modified subfield coding. as in french filing geographical names are filed before topical names, the following tags were assigned: 680 geographical names 681 topical names 682 topical names for indexes only the last tag was created in order to differentiate between subject headings for information retrieval and headings for printed indexes only. if there is a relation between two headings, the slash is used between them to tell the computer to make an inverted entry. 
for example, 680 04 $a chemistry j physics gives two entries, one under chemistry and the other under physics. to allow each library to have its own subject heading system the second indicator is used to indicate this system: for example, 04 is for bibliotheque de grenoble. codes for monocle are partially taken from the british codes instead of the american ones because they are given a filing value. they are, however, slightly different, in that there is no form subdivision. subfield codes are as follows: $a heading $t chronological subdivision $u geographic subdivision $w general subdivision, 1st level $x general subdivision, 2nd level $y general subdivision, 3rd level $z general subdivision, 4th level monocle/chauveinc 127 the levels have been requested for some information retrieval systems that have multilevel thesauri. as a general rule, the attempt was to give a filing value to most of the subfield codes in order to simplify and hasten processing without any table of translation. the latter is always possible, but burdens the program. the library of congress has published a special format for serials. thinking it not very useful, and feeling that serials could be processed by the marc format for books, the librarians at grenoble simply added to the monocle format some fields specifically for serials, as follows: 030 coden 210 abbreviated title 515 525 not used 555 in monocle 503, bibliographic history, is used for the "followed by" and "following" notes of a periodical, because they are simply notes and not added entries. fields 780 and 785 are not necessary, since in a catalog an entry is usually not made for these titles. most periodicals are processed by the format without any trouble. the holdings of the library are put under 090 $b, as shown in the following example: 090 00 $a cbp. 185 $b 1, 1967$c 5732s. $a call number $b holdings $c location summary as stated at the beginning, the library of congress in its marc ii communications format has published the most comprehensive and the most detailed analysis of a bibliographical record. some, mostly documentalists, do not agree with the marc ii complexity in coding, but their aims are not the same as those of librarians who want, first, to catalog books and catalog records according to rules required for a catalog of a large stock of books. a simple, alphabetical sort on the author names is not adequate and is quite unusable by a reader. however, an arrangement that is good for a weekly bibliography may not be sufficient for a complete catalog. the british national bibliography made a thorough study of catalog entries and produced a better filing structure in accordance with the anglo-american rules. 128 journal of library automation vol. 4/3 september, 1971 monocle translated the marc format with slight modifications, but subsequent trials led to more modifications. monocle format has been made from a librarian's point of view, but sometimes a programmer's view of the system has brought about an improvement in it. monocle is working, but not without difficulties. these difficulties come not from the format itself but from the on-line system, which is not working as well as expected. the system organization may not be of the best and perhaps needs a thorough study before being put into operation. the format is not completely satisfactory and needs improvement. documentalists are right when they say it is too complex and expensive. 
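the slash device for subject headings described above, by which a 680 field such as "chemistry / physics" produces one entry under each term, can be sketched roughly as follows. the function name and data shapes are assumptions of this illustration, not monocle code.

def inverted_entries(heading):
    # "chemistry / physics" yields an entry under chemistry and an inverted entry under physics
    parts = [p.strip() for p in heading.split("/")]
    if len(parts) == 2:
        a, b = parts
        return [a + " / " + b, b + " / " + a]
    return [heading]

print(inverted_entries("chemistry / physics"))
# ['chemistry / physics', 'physics / chemistry']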
synthesis between the documentalist format, which is too simple, and the monocle format will be undertaken to simplify the worksheet and speed up input time. from the librarian's point of view there are still problems to be solved. processing of complex titles is not easy, elegant and clear. the analysis should go deeper to determine more logical relations between data, avoidance of duplication of information in the record, and speeding up of processing at every stage. the technique of links between fields and records is not developed in monocle as it is in other systems. it may be helpful to connect data by use of pointers and to do away with repetition of series notes that are already input elsewhere. hierarchical links between records should be useful. hence, there is much work still to be done, but the most immediate goal is to make the monocle format operational not only for the library of grenoble university for also for the bibliotheque nationale, which has adopted it for the automation of the bibliographie de la france. the philosophy behind the modifications introduced in converting the marc communications format to the monocle processing format can and should be discussed, but they have all been made in order to improve the structure of the record not only for an internal processing but also for the interfiling of records, which is much more complicated. until now work has been done only on descriptive cataloging and on author-title filing. subject indexing and information retrieval are quite another job. references 1. avram, henriette d.; knapp, john f. ; rather, lucia j.: the marc ii format: a communications format for bibliographic data (washington, d. c.: library of congress, 1968). 2. bnb marc documentation service publication no.1 (london: council of the british national bibliography, ltd., 1968) . 3. chauveinc, marc: monocle ; protect de mise en ordinateur d'une notice catalographique de livre (grenoble: universitaire de grenoble, 1970) . 242 marc program research and development: a progress report henriette d. avram, alan s. crosby, jerry g. pennington, john c. rather, lucia j. rather, and arlene whitmer: library of congress, washington, d. c. a description of some of the research and development activities at the library of congress to expand the capabilities of the marc system. gives details of the marc processing format used by the library and then describes programming work in three areas: 1) automatic tagging of data elements by format recognition programs; 2) file analysis by a statistical program called genesis; and 8) information retrieval using the marc retriever. the marc system was designed as a generalized data management system that provides flexibility in converting bibliographic descriptions of all forms of material to machine readable form and ease in processing them. the foundation of the system is the marc ii format (hereinafter simply called marc), which reached its present form after many months of planning, consultation, and testing. implementation of the system itself has required development of a battery of programs to perform the input, storage, retrieval, and output functions necessary to create the data base , for the marc distribution service. these programs are essentially like those of the marc interim system described in the report of the marc pilot project ( 1). 
briefly, they perform the following tasks: marc research and development/ avram 243 1) a pre-edit program converts records prepared on an mt /st to a magnetic tape file of ebcdic encoded record segments. 2) a format edit program converts the pre-edited tape file to a modified form of the marc processing format. 3) a content edit program generates records in the final processing format. at this stage, mnemonic tags are converted to numeric form, subfield codes may be supplied, implicit fixed fields are set, etc. 4) ibm sort program arranges validated content-edit output records by lc card number. this program is also used later in the processing cycle. 5) a generalized file maintenance program (update 1) allows addition, deletion, replacement, or modification of data at the record, field, or subfield levels before the record is posted to the master file. a slightly different version (update 2) is used to update the master file. 6) a print index program generates a list of control numbers for a given file. the list may also include status, date of entry, or date of last transaction for each record. 7) a general purpose print program produces a hardcopy to be used to proofread the machine data against the original input worksheet. since the program is table controlled, it can be modified easily to yield a great variety of other formats and it can be extended routinely to handle other data bases in the marc processing format. 8) two additional programs select new records from the marc master file and convert them from the processing format to the communications format on both sevenand nine-track tapes for general distribution. as the basic programs became operational, it was possible to investigate other aspects of the marc system that would benefit from elaboration and refinement. reports of some of this activity have found their way into print, notably a description of the marc sort program and preliminary findings on format recognition (2, 3), but much of the library·s research and development effort in programming is not well known. the purpose of this article is to give a progress report on work in three significant areas : 1) automatic tagging of data elements by format recognition programs; 2) file analysis by a statistical program called genesis; and 3) information retrieval using the marc retriever. in the following descriptions, the reader should bear in mind that all of the programs are written to accommodate records in the marc processing format. a full description of the format is given to point up differences between it and the communications format. all of the programs are written in assembly language for the ibm s360/ 40 functioning under the disk operating system (dos ) . the machine file is stored on magnetic tape and the system is operated in the batch mode. at present, the programs described here are not available for general distribution, but it is expected that documentation for some of them may 244 journal of library automation vol. 2/4 december, 1969 be filed with the ibm program information department in the near future. meanwhile, the library of congress regrets that it will be unable to supply more detailed information. it is hoped that the information in this article will answer most of the questions that might be asked. marc processing format the marc data base at the library of congress is stored on a ninechannel magnetic tape at a density of 800 bpi. 
the file contains records in the undefined format; each record is recorded in the marc processing format (sometimes called the internal format). data in the processing format are recorded in binary, packed decimal, or ebcdic notation depending on the characteristics of the data and the processing required. the maximum length of a marc processing record is 2,048 bytes. the magnetic tape labels follow the proposed standard developed by subcommittee x3.2 of the united states of america standards institute. a marc record in the processing format is composed of six parts: record leader (12 bytes), communications field (12 bytes), record control field (14 bytes), fixed fields (54 bytes), record directory (variable in length, with each directory entry containing 12 bytes), and variable data fields (variable length). all records are terminated by an end-of-record (eor) character.

record leader (character positions 0-11):
1) record length, 2 characters, positions 0-1 -- total number of bytes in the logical record, including the bytes of the record length itself; given in binary notation.
2) date, 3 characters, positions 2-4 -- date of last transaction (i.e., the date the last action was taken upon the whole record or some part of the record), recorded in the form yymmdd, with each digit represented by a four-bit binary-coded decimal digit packed two to a byte.
3) status, 1 character, position 5 -- a code in binary notation to indicate a new, deleted, changed, or replaced record.
4) not used, 1 character, position 6 -- contains binary zeros.
5) record type, 1 character, position 7 -- an ebcdic character to identify the type of record that follows (e.g., printed language material).
6) bibliographic level, 1 character, position 8 -- an ebcdic character used in conjunction with the record type character to describe the components of the bibliographic record (e.g., monograph).
7) not used, 3 characters, positions 9-11 -- contains binary zeros.

communications field (character positions 12-23):
1) record directory location, 2 characters, positions 12-13 -- the binary address of the record directory relative to the first byte in the record (address zero).
2) directory entry count, 2 characters, positions 14-15 -- the number of directory entries in the record, in binary notation. there is one directory entry for every variable field in the record.
3) record source, 1 character, position 16 -- an ebcdic character to show the cataloging source of the record.
4) record destination, 1 character, position 17 -- an ebcdic character to show the data bank to which the record is to be routed.
5) in-process type, 1 character, position 18 -- a binary code to indicate the type of action to be performed on the data base. the in-process type may signify that a new record is to be merged into the existing file; that a record currently in the file is to be replaced, deleted, or modified in some form; or that it is verified as being free of all error.
6) in-process status, 1 character, position 19 -- a binary code to show whether the data content of the record has been verified.
7) not used, 4 characters, positions 20-23 -- contains binary zeros.

record control field (character positions 24-37):
1) library of congress catalog card number, 12 characters, positions 24-35 -- on december 1, 1968, the library of congress initiated a new card numbering system. numbers assigned prior to this date are in the "old" system; those assigned after that date are in the "new" system (4). the library of congress catalog card number is always represented by 12 bytes in ebcdic notation, but the data elements depend upon the system. in the old numbering system: prefix, 3 characters, positions 24-26 -- an alphabetic prefix is left justified with blank fill; if no prefix is present, the three bytes are blanks; year, 2 characters, positions 27-28; number, 6 characters, positions 29-34; supplement number, 1 character, position 35 -- a single byte in binary notation to identify supplements with the same lc card number as the original work. in the new numbering system: not used, 3 characters, positions 24-26 -- contains three blanks; initial digit, 1 character, position 27 -- initial digit of the number; check digit, 1 character, position 28 -- "modulus 11" check digit; number, 6 characters, positions 29-34; supplement number, 1 character, position 35 -- see above.
2) not used, 1 character, position 36 -- contains binary zeros.
3) segment number, 1 character, position 37 -- used to sequentially number the physical records contained in one logical record. the number is in binary notation.

fixed fields (character positions 38-91): the fixed field area is always 54 bytes in length. fixed fields that do not contain data are set to binary zeros. data in the fixed fields may be recorded in binary or ebcdic notation, but the notation remains constant for any given field.

record directory (the first entry occupies character positions 92-103):
1) tag, 3 characters, positions 92-94 -- an ebcdic number that identifies a variable field. the tags in the directory are in ascending order.
2) site number, 1 character, position 95 -- a binary number used to distinguish variable fields that have identical tags.
3) not used, 3 characters, positions 96-98 -- contains binary zeros.
4) action code, 1 character, position 99 -- a binary code used in file maintenance to specify the field-level action to be performed on a record (i.e., added, deleted, corrected, or modified).
5) data length, 2 characters, positions 100-101 -- length (in binary notation) of the variable data field indicated by a given entry.
6) relative address, 2 characters, positions 102-103 -- the binary address of the first byte of the variable data field relative to the first byte of the record (address zero).
7) directory end-of-field sentinel, 1 character -- since the number of entries in the directory varies, the character position of the end-of-field terminator (eof) also varies.

variable data fields (indicators, delimiter, subfield codes, delimiters, data, terminator):
1) indicator(s), variable length -- a variable data field may be preceded by a variable number of ebcdic characters which provide descriptive information about the associated field.
2) delimiter, 1 character -- a one-byte binary code used to separate the indicator(s) from the subfield code(s). when there are no indicators for a variable field, the first character will be a delimiter.
3) subfield code(s) and data, variable length -- variable fields are made up of one or more data elements (5). each data element is preceded by a delimiter; a lower-case alphabetic character is associated with each delimiter to identify the data element.
these alpha characters are grouped. all variable fields will have at least one subfield code. each data element in a variable field is preceded by a delimiter. all variable fields except the last in the record end with an endof-field te1minator ( eof); the last variable field ends with an end-of-record terminator (eor). 250 journal of library automation vol. 2/4 december, 1969 format recognition the preparation of bibliographic data in machine readable form involves the labeling of each data element so that it can be identified by the machine. the labels (called content designators) used in the marc format are tags, indicators, and subfield codes; they are supplied by the marc editors before the data are inscribed on a magnetic tape typewriter. in the current marc system, this tape is then run through a computer program and a proofsheet is printed. in a proofing process, the editor compares the original edited data against the proofsheet, checking for errors in editing and keyboarding. errors are marked and corrections are reinscribed. a new proofsheet is produced by the computer and again checked for errors. when a record has been declared error-free by an editor, it receives a final check by a high-level editor called a verifier. verified records are then removed from the work tape and stored on the master tape. the editing process in which the tags, indicators, sub:field codes, and :fixed :field information are assigned is a detailed and somewhat tedious process. it seems obvious that a method that would shift some of this editing to the machine would in the long run be of great advantage. this is especially true in any consideration of retrospective conversion of the 4.1 million library of congress catalog records. for this reason, the library is now developing a technique called "format recognition." this technique will allow the computer to process unedited bibliographic data by examining the data string for certain keywords, significant punctuation, and other clues to determine the proper tags and other machine labels. it should be noted that this concept is not unique to the library of congress. somewhat similar techniques are being developed at the university of california institute of library research ( 6) and by the bodleian library at oxford. a technique using typographic cues has been described by jolliffe ( 7 ) . the format recognition technique is not entirely new at the library of congress. the need was recognized during the development of the marc ii format, but pressure to implement the marc distribution service prevented more than minimal development of format recognition procedures. in the current marc system a few of the fields are identified by machine. for example, the machine scans the collation statement for keywords and sets the appropriate codes in the illustration fixed field. in general, however, machine identification has been limited to those places where the algorithm produces a correct result 100 percent of the time. the new format recognition concept assumes that, after the unedited record has been machine processed, a proofsheet will be examined by a marc editor for errors in the same way as is done in the current marc system. since each machine processed record will be subject to human review, it will be possible to include algorithms in the format recognition program that do not produce correct tagging all of the time. 
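a minimal sketch, assuming the byte layout of the processing format given above, of unpacking one 12-byte directory entry: the tag in three ebcdic bytes, a one-byte site number, three unused bytes, a one-byte action code, then two-byte binary data length and relative address. this is a present-day illustration, not library of congress code.

import struct

def parse_directory_entry(entry):
    # entry: exactly 12 bytes taken from the record directory of a processing-format record
    assert len(entry) == 12
    tag = entry[0:3].decode("cp037")                      # cp037 is an ebcdic code page
    site, _unused, action, length, address = struct.unpack(">B3sBHH", entry[3:12])
    return {"tag": tag, "site": site, "action": action,
            "data_length": length, "relative_address": address}

# a fabricated entry: tag 245, site 0, no action, a 70-byte field starting at byte 104
sample = "245".encode("cp037") + struct.pack(">B3sBHH", 0, b"\x00\x00\x00", 0, 70, 104)
print(parse_directory_entry(sample))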
marc research and development/ avram 251 the format recognition algorithms are exceedingly complex, but a few examples will be given to indicate the nature of the logic. in all the examples, it is assumed that the record is typed from an untagged manuscript card (the work record used as a basis for the library of congress catalog card) on an input device such as a paper tape or a magnetic tape typewriter. the data will be typed from left to right on the card and from top to bottom. the data are input as fields, which are detectable by a program because each field ends with a double carriage return. each field comprises a logical portion of a manuscript card; thus the call number would be input as a single field, as would the main entry, title paragraph, collation, each note, each added entry, etc. it is important to note that the title paragraph includes everything through the imprint. identification of variable fields call number. this field is present in almost every case and it is the first field input. the call number usually consists of 1-3 capital letters followed by 1-4 numbers, followed by a period, a capital letter, and more numbers. there are several easily identifiable variations such as a date before the period or a brief string of numbers without capital letters following the period. the delimiter separating the class number from the book number is inserted according to the following five-step algorithm: 1) if the call number is law, do not delimit. 2) if the call number consists simply of letters followed by numbers (possibly including a period), do not delimit. example: hf5415.13 if this type of number is followed by a date, it is delimited before the blank preceding the date. example: ha12f 1967 3) h the call number begins with 'kf' followed by numbers, followed by a period, then: a) if there are one or two numbers before the period, do not delimit. example: kf26.l354 1966a b) if there are three or more numbers before the period, delimit before the last period in the call number. example: kfn5225f.z9f3 4) if the call number begins with 'cs71' do not delimit unless it contains a date. in this case, it is delimited before the blank preceding the date. example: cs7l.s889f 1968 5) in all other cases, delimit before the last capital letter except when the last capital letter is immediately preceded by a period. in this latter case, delimit before this preceding period. examples: ps3553.e73fw6 e595.f6fk4 1968 pz10.3.u36fsp tx652.5f.g63 1968 name main entry. the collation statement is the first field after the call number that can 252 journal of library automation vol. 2/4 december, 1969 be easily identified by analyzing its contents. the field immediately preceding the collation statement must be the title paragraph. if there is only one field between the call number and the collation, the work is entered under title (tagged as 245) and there is no name main entry. if there are two or three fields, the first field after the call number is a name main entry (tagged in the 100 block). when three fields occur between the call number and collation, the second field is a uniform title (tagged as 240). further analysis into the type of name main entry and the subfield code depends on such clues as location of open dates ( 1921) , date ranges covering 20 years or more ( 1921-1967), identification of phrases used only as personal name relators ( ed., tr., comp. ), etc. the above clues strongly indicate a personal name. 
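stepping back to the call number field, the five-step delimiting rule above can be approximated as follows. this is a rough, hypothetical rendering: "^" stands in for the marc delimiter, the call number is assumed to be typed in upper case as on the manuscript card, and the real format recognition program certainly handled more cases than this sketch does.

import re

def delimit_call_number(cn):
    date = re.search(r" \d{4}[A-Z]?$", cn)
    body = cn[:date.start()] if date else cn
    if cn == "LAW":                                           # step 1: never delimited
        return cn
    if re.fullmatch(r"[A-Z]{1,3}\d+(\.\d+)?", body):          # step 2: letters and numbers only
        return body + ("^" + date.group() if date else "")
    if cn.startswith("KF"):                                   # step 3: kf numbers
        digits = re.match(r"KF[A-Z]*(\d+)\.", cn)
        if digits and len(digits.group(1)) >= 3:
            i = cn.rfind(".")
            return cn[:i] + "^" + cn[i:]
        return cn
    if cn.startswith("CS71"):                                 # step 4: delimit only before a date
        return body + "^" + date.group() if date else cn
    # step 5: delimit before the letter group that opens the book number,
    # or before the period immediately preceding it
    i = [m.start() for m in re.finditer(r"[A-Z]+", body)][-1]
    if body[i - 1] == ".":
        i -= 1
    return body[:i] + "^" + body[i:] + (date.group() if date else "")

for cn in ["HF5415.13", "HA12 1967", "KF26.L354 1966A", "PS3553.E73W6", "TX652.5.G63 1968"]:
    print(delimit_call_number(cn))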
identification of an ordinal number preceded by punctuation and a blank followed by punctuation is strongly indicative of a conference heading. in the course of processing, delimiters and the appropriate subfield codes are inserted. subfield code "d" is used with dates in personal names; subfield code "e" with relators. example: mepsfde smith, john,f1902-1967,fed. analysis for fixed fields publisher is main entry indicator. this indicator is set when the publisher is omitted from the imprint because it appears as the main entry. the program will set this indicator whenever the main entry is a corporate or conference name and there is no publisher in the imprint statement. this test will fail in the case where there is more than one publisher, one of which is the main entry, but occurrences of this are fairly rare (less than 0.2 percent). biography indicator. four different codes are used with this indicator as follows: a = individual autobiography; b = individual biography; c = collected biography or autobiography; and d = partial collected biography. the "n' code is set when 1) "autobiographical", "autobiography", "memoirs", or "diaries" occurs in the title statement or notes, or 2) the surname portion of a personal name main entry occurs in the short title or the remainder of the title subfields. the "b" code is set when 1 ) "biography" occurs in the title statement, 2) the surname portion of a personal name subject entry occurs in the short title or the remainder of the title subfields, or 3) the dewey number contains a "b" or a 920. the "c" code is set when 1) "biographies" occurs in the title statement or 2) a subject entry contains the subdivision 'oiography." there appears to be no way to identify a "d" code situation. despite this fact, the biography indicator can be set correctly about 83 percent of the time. marc research and development/ avram 253 implementation schedule work on the format recognition project was begun early in 1969. the first two phases were feasibility studies based on english-language records with a certain amount of pretagging assumed. since the results of these studies were quite encouraging, a full-scale project was begun in july 1969. this project is divided into five tasks. task 1 consisted of a new examination of the data fields to see if the technique would work without any pretagging. new algorithms were designed and desk-checked against a sample of records. it now seems likely that format recognition programs might produce correctly tagged records 70 percent of the time under these conditions. it is possible that one or two fixed fields may have to be supplied in a pre-editing process. tasks 2 through 5 remain to be done. task 2 will provide overall format recognition design including 1) development of definitive keyword lists, 2) typing specifications, 3) determination of the order of processing of fields within a record, and 4) description of the overall processing of a record. when the design is completed, a number of records will go through a manual simulation process to determine the general efficiency of the system design. task 3 will investigate the extension of format recognition design to foreign-language titles in roman alphabets. task 4 will provide the design for a format recognition program based on the results of tasks 2 and 3 with detailed flowcharts at the coding level. the actual coding, checkout, and documentation will be performed as task 5. 
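the biography-indicator rules given above lend themselves to a loose approximation like the following. this is an illustration only: the real program worked on the full marc record rather than a dictionary of assumed field names, and the order of the tests matters because "biographies" contains "biography".

def biography_code(record):
    title = record.get("title", "").lower()
    notes = record.get("notes", "").lower()
    text = title + " " + notes
    surname_main = record.get("main_entry_surname", "").lower()
    surname_subj = record.get("subject_surname", "").lower()
    dewey = record.get("dewey", "").lower()

    # code a: individual autobiography
    if any(w in text for w in ("autobiographical", "autobiography", "memoirs", "diaries")) \
            or (surname_main and surname_main in title):
        return "a"
    # code c: collected biography or autobiography
    if "biographies" in title or record.get("subject_subdivision", "").lower() == "biography":
        return "c"
    # code b: individual biography
    if "biography" in title or (surname_subj and surname_subj in title) \
            or "920" in dewey or dewey.strip() == "b":
        return "b"
    return None   # code d (partial collected biography) is not detectable from these clues

print(biography_code({"title": "memoirs of a country doctor", "main_entry_surname": ""}))   # a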
according to current plans, the first four tasks are scheduled for completion early in 1970 and the programming will be finished later in the year. outlook it is apparent that a great deal of intellectual work must be done to develop format recognition algorithms even for english-language records and still greater ingenuity will be required to apply these techniques to foreign-language records. nevertheless, on the basis of encouraging results of early studies, there is evidence that the human effort in converting bibliographic records to machine readable form can be materially reduced. since reduction of human effort would in tum reduce costs, the success of these studies will have an important bearing on the rate at which current conversion activities can be expanded as well as on the economic feasibility of converting large files of retrospective cataloging data. genesis early in the planning and implementation of automation at the library of congress it became apparent that many tasks require information about the frequency of data elements. for example, it was helpful to know about the frequency of individual data elements, their length in characters, and the occurrence of marks of punctuation, diacritics, and specified 254 journal of library automation vol. 2/4 december, 1969 character strings in particular data elements. in the past, most of the counting has been done manually. once a sizable amount of data was available in machine readable form, it was worthwhile to have much of this counting done by computer. therefore, the generalized statistical program (genesis) was done as a general purpose program to make such counts on all forms of material in the marc processing format on magnetic tape files. any of a variety of counts can be chosen at the time of program execution. there are three types of specifications required for a particular run of the program: selection criteria; statistical function specifications; and output specifications. selection criteria record selection criteria are specified by statements about the various data fields that must be present in the records to be processed. field selection criteria specify the data elements that will actually be analyzed. processing by these techniques operates logically in two distinct stages: 1) the record is selected from the input file; i.e., the program must determine if a particular record is to be included in the analysis; and 2) if the record is eligible, the specified function is performed on selected data fields. it should be noted that records may be selected for negative as well as positive reasons. the absence of a particular field may determine the eligibility of a record and statistical processing can be performed on other fields in the record. record selection is optional; if no criteria are specified, all records on the input file will be considered for processing. since both record selection and field selection reference the same elements, specifications are input in the same way. selection of populations can be designated by tagging structure (numeric tags, indicators, subfield codes or any combination of these three), specified character strings, and specified characters in the bibliographic data. the following queries are typical of those that can be processed by genesis. how many records with an indicator set to show that the volume contains biographic information also have an indicator set to show that the subject is the main entry? 
how many records with a field tagged to show that the main entry is the name of a meeting or conference actually have the words "meeting" or "conference" in the data itself? table 1 shows the operators that can be used with record and field select statements. statistical function specification the desired statistical function is specified via a function statement. four functions have been implemented to date. they involve counts of occurrences of specified fields, of unique data within specified fields given a range of data values, of data within a specified range, and of particular data characters. in addition to counting the frequency of the specified element, genesis calculates its percentage in the total population.

table 1. operators of genesis, with examples of usage:
equals -- count all occurrences where the data represented by tag 530 equals "bound with".
not equal -- count all occurrences where the publication language code is not equal to "eng".
greater than or equal to -- count all occurrences of, and output, records that are greater than or equal to 1,000 characters.
less than or equal to -- count all occurrences of records entered on the marc data base before june 1, 1968 (less than or equal to 680601).
and -- count all occurrences where the publication equals "s" and the publication date is greater than or equal to 1960.
or -- count all occurrences of a personal name main entry (tag 100) with a relator (subfield code "e") that equals "ed." or "comp."

the first function counts occurrences per record of specified field selection criteria. this answers queries concerning the presence of given conditions within the selected records; for example, a frequency distribution of personal name added entries (tag 700). this type of count results in a distribution table of the number of records with 0 occurrences, 1 occurrence, 2 occurrences, and so forth. the second function, which counts occurrences of unique data values within a specified range, answers queries when the user does not know the unique values occurring in a given field but can state an upper and lower value. for example, the specific occurrences of publishing dates between 1900 and 1960 might be requested. the output in response to this type of query consists of each unique value lying within the range specified, with its frequency count. in addition, separate counts are given for values less than the lower bound and for values greater than the upper bound. the function is performed by maintaining in computer memory an ordered list of unique values encountered, together with their respective counts. as selected fields are processed, each new value is compared against the entries in the list. if the new value already appears in the list, its corresponding count is incremented. otherwise, the new value is inserted in the list in its proper place and the remainder of the list is pushed down by one entry. the amount of core storage used during a particular run is directly related to the number of unique occurrences appearing within the specified range. since the length of each entry is determined by the length of the bounds specified, the number of entries which can be held in free storage can vary from run to run. thus it is possible that the number of unique entries may fill memory before a run has been completed. when this happens, the value of the last entry in the list will be discarded and its count added to the "greater than upper bound" count.
in this way, while the user may not obtain every unique value in the specified range, he will obtain all unique values from the lower bound which can be contained in memory. he is then in a position to make subsequent runs using, as a beginning lower bound value, the highest unique value obtained from the preceding run. the third function processes queries concerning counts within specified ranges. when this function is used, unique values are not displayed. instead, the occurrences are counted by specified ranges of values. more than one range can be processed during a single run. on output, the program provides a cumulative count of values encountered within each range as well as the counts of those less than and those greater than the ranges. function four counts occurrences of particular data characters. an individual character may be specified explicitly or implicitly as a member of a group of characters. this allows the counting of occurrences of various alphabetic characters within specified fields. the current list of character classes that can be counted are: alpha characters, upper-case letters, lowercase letters, numbers, punctuation, diacritics, blanks, full (all characters included in above classes), nonstandard special characters, and any particular character using hex notation. it should be noted that there are various ways of specifying particular characters. for example, an "a" might be designated causing totals to accumulate for all alphabetics; or, a "u" and an "l" might be specified causing separate totals to be accumulated for upperand 1ower-case characters. in addition to the total counts for each class, individual counts of characters occurring within any class can be obtained for display along with the total count. output specifications formatted statistical information is output to the line printer. optionally, the selected records can be output on magnetic tape for later processing. limitations for the purpose of defining a query, more than one field may be specified for record and field selection, using as many statements as necessary. at present, however, the statistical processing for a particular run is performed on all of the run-criteria collectively. for example, separate runs of the program are required to obtain each frequency distribution. it is important to note that genesis is essentially a means of making marc research and development/ avram 257 counts. the statistical analysis of data is a complex task that requires sophisticated techniques. genesis does not have the capability to analyze data in terms of standard deviation, correlation, etc. but the output does constitute raw data for those kinds of analyses. although the four functions of genesis implemented to date do not, in themselves, provide a complete statistical analysis, they greatly lessen the burden of counting; and techniques for designating data elements to be counted suffice to describe extremely complex patterns. continued use of the program will no doubt provide guidelines for expansion of its functions. use of the program genesis has already provided analyses that are helpful in the design of automated procedures at the library of congress, as is indicated by the following instances. a frequency distribution of characters was made to aid in specifying a print train. an analysis of certain data characteristics has determined some of the specifications for the format recognition program described in an earlier section. 
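a small sketch of the second genesis function described above: counting unique values between a lower and an upper bound, with separate counters for out-of-range values and a cap standing in for a full memory. the names and the cap parameter are assumptions of this illustration, and a python dictionary replaces the ordered pushdown list of the original.

def count_unique_in_range(values, lower, upper, max_entries=1000):
    counts = {}
    below = above = 0
    for v in values:
        if v < lower:
            below += 1
        elif v > upper:
            above += 1
        elif v in counts or len(counts) < max_entries:
            counts[v] = counts.get(v, 0) + 1
        else:
            # the original instead discards the highest entry in the list and
            # adds its count to the "greater than upper bound" total
            above += 1
    return dict(sorted(counts.items())), below, above

dates = ["1899", "1923", "1923", "1957", "1957", "1957", "1968"]
print(count_unique_in_range(dates, "1900", "1960"))
# ({'1923': 2, '1957': 3}, 1, 1)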
genesis is providing many of the basic counts for a thorough analysis of the material currently being converted for the marc distribution service to determine frequency patterns of data elements. the findings should be valuable for determining questions about storage capacity, file organization, and retrieval strategy. although genesis is a new program in the marc system, there is little doubt that it is a powerful tool that will have many uses. marc retriever since the marc distribution service has been given the highest priority during the past two years, the emphasis in the implementation of the marc system has been on input, file maintenance, and output with only minimum work performed in the retrieval area. it was recognized, moreover, that as long as marc is tape oriented, any retrieval system put into effect at the library of congress would be essentially a research tool that should be implemented as inexpensively as possible. it did seem worthwhile, however, to build retrieval capability into the marc system to enable the lc staff to query the growing marc data base. query capability would answer basic questions about the characteristics of the data that arise during the design phases of automation efforts. in addition, it seemed desirable to use the data base in an operational mode to provide some needed experience in file usage to assist in the file organization design of a large bibliographic data base. the specifications of the system desired were: 1) the ability to process the marc processing format without modification; 2) the ability to query every data element in the marc record, alone or in combination (fixed fields, variable fields, the directory, subfield codes, indicators); 3) the ability to count the number of times a particular element was queried, to accumulate this count, print it or make it available in punched card 258 journal of library automation vol. 2/4 december, 1969 form for subsequent processing; and 4) the ability to format and output the results of a query on magnetic tape or printer hardcopy. to satisfy these requirements it was decided to adapt an operational generalized information system to the specifications of the library of congress. the system chosen was aegis, designed and implemented by programmatics, inc. the modification is known as the marc retriever. general description the marc retriever comprises four parts: a control program, a parser, a retrieval program, and a utility program. queries are input in the form of punched cards, stacked in the core of the ibm s /360, and operated on as though all queries were in fact one query. thus a marc record will be searched for the conditions described by all queries, not by handling each query individually and rewinding the input tape before the next query is processed. the control program is the executive module of the system. it loads the parser and reads the first query statement. the parser is then activated to process the query statement. on return from the parser, the control program either outputs a diagnostic message for an erroneous query or assigns an identification number to a valid query. after the last query statement has been parsed, the control program loads the retrieval program and the marc input tape is opened. as each record on the marc tape is processed, the control program checks for a valid input query. if the query is valid, the control program branches to the retrieval program. 
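a schematic sketch of the single-pass idea just described: every query is prepared first, and each record read from the file is then tested against all of them, so the tape is read only once. the predicate and output-field representation is an assumption of this illustration, not the aegis design.

def run_queries(records, queries):
    # queries: (predicate, fields_to_list) pairs standing in for parsed query strings
    results = {qid: [] for qid in range(len(queries))}
    for record in records:                                   # one pass over the "tape"
        for qid, (predicate, fields_to_list) in enumerate(queries):
            if predicate(record):
                results[qid].append({f: record.get(f) for f in fields_to_list})
    return results

queries = [
    (lambda r: r.get("mcpdate1") == "1967", ["mcrcnumb"]),
    (lambda r: "destouches" in r.get("100", "").lower(), ["245"]),
]
records = [{"mcrcnumb": "   68001234", "mcpdate1": "1967",
            "100": "destouches, l.", "245": "a sample title statement"}]
print(run_queries(records, queries))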
on return from the retrieval program, the control program writes the record on an output tape if the record meets the specifications of the query. after the last marc record has been read from the input tape, the control program branches to the retrieval program for final processing of any requested statistical function (hits, ratio, sum, avg) that might be a part of the query. the output tapes are closed and the job is ended. the parser examines each query to insure that it conforms to the rules for query construction. if the query is not valid, an error message is returned to the control program giving an indication as to the nature of the error. valid query statements are parsed and converted to query strings in polish notation, which permits mathematical expressions without parentheses. the absence of embedded parentheses allows simpler compiler interpretation, translations, and execution of results. the retrieval program processes the query strings by comparing them with the marc record data elements and the results of the comparison are placed in a true/false stack table. if the comparison result is true, output is generated for further processing. if the result is false, no action · takes place. if query expressions are linked together with "or" or "and'' connectors, the results in the true/false stack table are ored and anded together resulting in a single true or false condition. marc research and development/ avram 259 the utility program counts every data element (fixed field, tag, indicator, sub field code, data in a variable field) that is used in a query statement. the elements in the search argument are counted separately from those in the output specifications. after each run of the marc retriever, the counts can be printed or punched for immediate use, or they can be accumulated over a longer period and processed on demand. query language general. query statements for the marc retriever must be constructed according to a precisely defined set of rules, called the syntax of the language. the language permits the formation of queries that can address any portion of the marc record (fixed fields, record directory, variable fields and associated indicators and subfields). queries are constructed by combining a number of elements: marc retriever terms, operators, fixed field names, and strings of characters (hereafter called constants). the following sections describe the rules for constructing a query and the query elements with examples of their use. query formation. a query is made up of two basic parts or modes: the if mode which specifies the criteria for selecting a record; and the list mode which specifies which data elements in the record that satisfy the search criteria are to be selected for printing or further processing. in general, the rules that apply to constructing if-mode expressions apply to constructing list-mode expressions except that the elements in the list mode must be separated by a comma. a generalized query has the following form: if if-mode expression list list-mode expression; where: if if-mode expression list list-mode expression signals the beginning of the if mode. specifies the search argument. signals the beginning of the list mode. specifies the marc record data element( s) that are to be listed when the search argument specified in the if-mode expression is satisfied. the format of the query card is flexible. columns 1 through 72 contain the query which may be continued on subsequent cards. no continuation indicator is required. 
columns 73 through 80 may be used to identify the query if desired. the punctuation rules are relatively simple: one or more blanks must be used to separate the elements of a query, and a query must be terminated by a semicolon. queries that involve fixed fields take the following form:

if fixed-field-name1 = constant list fixed-field-name2;

where fixed-field-name1 is the name of a fixed field; = is any operator appropriate for the query; constant is the search argument; and fixed-field-name2 is the fixed field to be output if a match occurs. to query or specify the output of a variable field, the following general expression is used:

if scan (tag = nnn) = constant list scan (tag = nnn);

where scan indicates that a variable field is to be referenced; tag indicates that the tag of a variable field is to follow; = is the only valid operator; nnn specifies the tag of the variable field that is to be searched or output; and constant specifies the character string of data that is the search argument. the marc retriever processes each query in the following manner. each record in the data base is read from tape into core and the data elements in the marc record specified in the if-mode expression are compared against the constant(s) in the if-mode expression. if there is a match, the data element(s) specified in the list-mode expression are output.

key terms. the terms used in a query statement fall into two classes. the first group instructs the program to perform specified functions: scan, hits, avg, ratio, sum. the second group relates to elements of the record structure. the most important key terms in this class are: indic (indicator), ntc (subfield code), record (the entire bibliographic record), and tag (variable field tag). these terms are used to define a constant; e.g., tag = 100.

operators. operators are characters that have a specific meaning in the query language. they fall into two classes. the first contains relational operators, such as equal to and greater than, indicating that a numeric relationship must exist between the data element in the marc record and the search argument. the second class comprises the logical operators "and" and "or". the operators of the marc retriever are shown in table 2. in the definitions, c is the query constant and d is the contents of a marc record data element.

table 2. operators of the marc retriever:
= -- c equals d
> -- c is greater than d
≥ -- c is greater than or equal to d
< -- c is less than d
≤ -- c is less than or equal to d
≠ -- c is not equal to d
& -- "and" (both conditions must be true)
| -- "or" (at least one condition must be true)

a constant is either a string of characters representing the data itself (e.g., poe, edgar allan) or a specific variable field tag, indicator(s), and subfield code(s). constants may take the following forms:

cc -- where cc is an alphabetic or numeric character or the pound sign "#". when this form is used, the marc retriever will convert all lower-case alphabetic characters in the data element of the marc record being searched to upper-case before a comparison is made with the search argument. this conversion feature permits the use of a standard keypunch that has no lower-case capability for preparation of queries.

'cc' -- where cc can be any one of the 256 characters represented by the hexadecimal numbers 00 to ff.
this form allows nonalphabetic or nonnumeric characters not represented on the standard keyboard to be part of the search argument. when this form is used, the marc retriever will also convert all lowercase alphabetic characters in the data elements in the marc record being searched to upper-case before a comparison is made. @cc@ where cc can be any one of the 256 characters represented by the hexadecimal numbers 00 to ff. when this form is used, characters in the data element of the marc record being searched will be left intact and the search argument must contain identical characters before a match can occur. # the pound sign indicates that the character in the position it occupies in the constant is not to take part in the comparison. for example, if the constant were #ank, tank, rank, bank would be considered matches. more than one pound sign can be used in a constant and in any position. 262 journal of library automation vol. 2/ 4 december, 1969 specimen queries. the following examples illustrate simple query statements involving fixed and variable fields. if mcpdate1 = 1967 list mcrcnumb ; the entire marc data base would be searched a record at a time for records that contained 1967 in the first publication date field ( mcpdate1). the lc card number (mcrcnumb) of the records that satisfied the search argument would be output. if scan(tag= 100) = destouches list scan(tag=245); the personal name main entry field (tag 100) of each marc record would be searched for the surname destouches. if the record meets this search argument, the title statement (tag 245) would be output. in addition to specifying that a variable field is to be searched, the scan function also indicates that all characters of the variable field are to be compared and a match will result at any point in the variable field where the search argument matches the variable field contents. for example, if the if-mode expression is scan(tag = 100) =smith a match would occur on the following examples of personal name main entries (tag 100) : smith, john; smithfield, jerome; jones-smith, anthony. it is possible to include the indicators associated with a variable field in the search by augmenting the constant of the scan function as follows: if scan(tag = 100&indic = 10) = destouches list scan(tag = 245); where: indic 1 0 specifies that indicators are to be included. specifies that the first indicator must be set to 1 (the name in the personal name main entry [tag 100] is a single surname, specifies that the second indicator must be set to zero (main entry is not the subject). the personal name main entry field (tag 100) of each record would be searched and a hit would occur if the indicators associated with the field were 1 and 0 and the contents of the field contained the characters "destouches." if the record met these search criteria, the title statement (tag 245) would be output. it is also possible to restrict the search to the contents of one or more subfields of a variable field. for example: if scan ( tag = loo&indic = 10&ntc = a) =destouches list scan(tag=245); where: ntc a indicates that a subfield code follows. specifies that only the contents of subfield a are to be included in the search. note that in this form the actual subfield code "a" is converted to "a" by the program (see section on constants) . marc research and development/ avram 263 special rules. so far the discussion has concerned rules of the query language that apply to either the if mode or the list mode. 
this section and the remaining sections will discuss those rules and functions that are unique to either the if mode or the list mode. in the if mode, fixed and variable field expressions can be anded or ored together using the logical operators & and j. for example: if mcpdate1 = 1967&scan(tag = 100) = destouches list scan(tag = 245); this query would search for records with a publication date field (mcpdate1) containing 1967 and a personal name main entry field ( tag 100) containing des touches. if both search criteria are met, the title statement field (tag 245) would be printed. in the list mode more than one fixed or variable field can be listed by a query as long as the fixed field names or scan expressions are separated by commas. for example: if scan(tag = 100) = destouches list scan(tag = 245) , mcrcnumb; the list mode offers two options, list and listm, which result in different actions. list indicates that the data elements in the expressions are to be printed, and listm indicates that the data elements in the expression are to be written on magnetic tape in the marc processing format. it is often desirable to list a complete record either in the marc processing format using listm or in printed form using list. in either case, the listing of a complete record is activated by the marc retriever key term record. for example: if scan (tag= 100) = destouches list record; the complete record would be written on magnetic tape in the marc processing format instead of being printed out if listm were substituted for list in the above query. four functions can be specified by the list mode. hits signals the marc retriever to count and print the number of records that meet the search criteria. for example: if scan(tag=650) = automation list hits; ratio signals the marc retriever to count both the number of records that meet the search criteria and the number of records in the data base and print both counts. the remaining two list functions permit the summing of the contents of fixed fields containing binary numbers. sum causes the contents of all specified fields in the records meeting the search criteria to be summed and printed. for example : if mcrcnumb = ·~~~68 # #####' list sum ( mcrlgth ); the data base would be searched for records with lc card number field 264 journal of library automation vol. 2/4 december, 1969 ( mcrcnumb) containing three blanks and 68 in positions one through five. the remaining positions would not take part in the query process and could have any value. if a record satisfied this search argument, the contents of the record length field (mcrlgth) would be added to a counter. when the complete data base had been searched, the count would be printed. avg performs the same function as sum and also accumulates and prints a count of the number of records meeting the search criteria. use of the program the marc retriever has been operational at the library of congress since may 1969 and selected staff members representing a cross-section of lc activities have been trained in the rules of query construction. the applications of the program to the marc master file include: identification of records with unusual characteristics for the format recognition study; selection of titles for special reference collections; and verification of the consistency of the marc editorial process. 
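pulling the pieces of the query language together, the following sketch evaluates a query that has been reduced to postfix ("polish") form with a true/false stack, using a scan-style comparison that upper-cases the field data, matches anywhere in the field, and treats "#" as a one-character wildcard. the token representation is invented for this example and is not the aegis internal form.

def scan_match(field_data, constant):
    # match the constant at any position in the field, "#" matching any single character
    data = field_data.upper()
    n = len(constant)
    return any(all(c == "#" or c == w for c, w in zip(constant, data[i:i + n]))
               for i in range(len(data) - n + 1))

def eval_query(postfix_tokens, record):
    stack = []
    for tok in postfix_tokens:
        if tok in ("&", "|"):
            b, a = stack.pop(), stack.pop()
            stack.append((a and b) if tok == "&" else (a or b))
        else:
            field, constant = tok            # a comparison leaf, e.g. ("100", "DESTOUCHES")
            stack.append(scan_match(record.get(field, ""), constant))
    return stack.pop()

# if scan(tag=100) = destouches & scan(tag=245) = #ank, in postfix: both leaves, then "&"
tokens = [("100", "DESTOUCHES"), ("245", "#ANK"), "&"]
print(eval_query(tokens, {"100": "Destouches, L.", "245": "tanks and their history"}))   # True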
as the file grows, it is expected that the marc retriever will be useful in compiling various kinds of bibliographic listings, such as translations into english, topical bibliographies, etc., as well as in making complex subject searches. the marc retriever is not limited to use with the marc master file; it can query any data base that contains records in the marc processing format. thus, the legislative reference service is able to query its own data base of bibliographic citations to produce various outputs of use to its staff and members of congress. because the marc retriever is designed to conduct searches from magnetic tape, it will eventually become too costly in terms of machine processing time to operate. it is difficult to predict when the system will be outgrown, however, because its life span will be determined by the growth of the file and the complexity of the queries. meanwhile, the marc retriever should provide the means for testing the flexibility of the marc format for machine searching of a bibliographic file.

references
1. u.s. library of congress. information systems office: the marc pilot project (washington, d.c.: 1968), pp. 40-51.
2. rather, john c.; pennington, jerry g.: "the marc sort program," journal of library automation, 2 (september 1969), 125-138.
3. recon working task force: conversion of retrospective catalog records to machine-readable form (washington, d.c.: library of congress, 1969).
4. u.s. library of congress. information systems office: subscribers guide to the marc distribution service, 3d ed. (washington, d.c.: 1969), pp. 31-31b.
5. ibid., p. 40.
6. cunningham, jay l.; schieber, william d.; shoffner, ralph m.: a study of the organization and search of bibliographic holdings records in on-line computer systems: phase i (berkeley, calif.: institute of library research, university of california, 1969), pp. 85-94.
7. jolliffe, john: "the tactics of converting a catalogue to machine-readable form," journal of documentation, 24 (september 1968), 149-158.

a random sample of personal names in the lc file indicates that less than 17 percent of personal names require cross-references. thus the personal name headings that occur only once but would require authority records because of cross-references could be less than 17 percent. the frequency data combined with reference structure data could have a significant impact on design. out of a total of 695,074 personal names in the authority files associated with the marc bibliographic files examined here, 456,328, or 66 percent, occur only once. of these, fewer than 77,575 would be expected to have cross-references; thus the name-authority file for personal names could be reduced in size from 695,074 records to 316,321, a 55 percent decrease. if separate authority records are a system requirement, the occurrence figures might then be useful for defining configurations that employ machine-generated provisional records for single-occurrence headings that do not have reference structures, or that simplify in other ways the treatment of these headings. these figures may also be useful in making decisions on the addition of retrospective authority records to the automated files.

reference
1. william gray potter, "when names collide: conflict in the catalog and aacr2," library resources & technical services 24:7 (winter 1980).

rlin and oclc as reference tools
douglas jones: university of arizona, tucson.
the central reference department (social science, humanities, and fine arts) and the science-engineering reference department at the university of arizona library are currently evaluating the oclc and rlin systems as reference tools, to see if their use can significantly improve the effectiveness and efficiency of providing reference service. a significant number of the questions received by our librarians, and presumably by librarians elsewhere, involve incomplete or inaccurately cited references to monographs, conference proceedings, government documents, technical reports, and monographic serials. if by using a bibliographic utility a librarian can identify or verify an item not found in printed sources, then effectiveness has been improved. once a complete and accurate description of the item is found, it is a relatively simple task to determine whether or not the library has the item, and if not, to request it through interlibrary loan. additionally, if the efficiency of the librarian can be improved by reducing the amount of time required to verify or identify a requested item, then the patron, the library, and, in our case, the taxpayer, have been better served. the promise of near-immediate response from a computer via an online interactive terminal system is clearly beguiling when compared to the relatively time-consuming searching required with printed sources, which frequently provide only a limited number of access points and often become available weeks, months, or even years after the items they list. we realize, of course, that the promise of instantaneous electronic information retrieval is limited by a variety of factors, and presently we view access to rlin and oclc as potentially powerful adjuncts to, not replacements for, printed reference sources. given that rlin and oclc have databases and software geared to known-item searches for catalog card production, our evaluation attempts to document their usefulness in reference service.

a preliminary study conducted during the spring semester of 1980-81 indicated that approximately 50 percent of the questionable citations requiring further bibliographic verification could be identified on oclc or rlin. the time required was typically five minutes or less. successful verification using printed indexes to identify the same items ranged from 20 percent in the central reference department to 50 percent in science-engineering. time required per item averaged approximately fifteen minutes.

based on our findings, we plan a revised and more thorough test during the fall semester of 1981-82, which will include an assessment of the enhancements to the rlin system scheduled to be operational this summer. the proposed test will involve eight members of the reference staff, four from each department, who will be trained to search on oclc and rlin. those selected will include both librarians and library assistants who regularly provide reference assistance. the results obtained from such a representative group will better enable us to assess the impact on the whole reference staff should we later decide to fully implement the service. they will be the only ones involved in sampling questions and conducting comparative searches. the test will have two components, the first of which will be a twenty-week period to collect at least 400 sample questions.
during their regularly scheduled reference hours, the eight specially trained librarians 'will collect samples of reference requests for materials that, based on the information initially given by the patron, cannot be identified in the card catalog. after checking the catalog, the librarian will then complete the top portion of a two-page selfcarbon form with all of the information that is known about the requested item. then, at regular intervals during the semester, the pages of each form will be separated and distributed to other members of the test staff for batch-mode searching. the manual oclc and rlin searching for each query will be done by different staff members to eliminate crossover effects. each request will be searched on both oclc and rlin with the following information being recorded: 1. date of the material requested (if known). 2. type of material (e.g., conference proceeding). 3. amount of time required to do the search. 4. success or failure of the search. this information will then be cumulated in a statistical table, and the results of each search will be keypunched for computerized analysis using the bmdp (biomedical computer programs) statistical package to determine whether or not effectiveness and efficiency have been improved significantly. in addition, on twenty-four randomly selected days during the semester the trained searchers will count the total number of questions received by them on that day that would have been appropriate to search on rlin or oclc. by using these data it will be possible to extrapolate the potential usefulness of the systems for the entire semester. the second component of the test will be a two-week real-life test during which all questions requiring further verification would be searched immediately on rlin , oclc, and in the appropriate printed sources to compare time required, success rate, and type of material requested. this sort of test would permit the searcher to continue to negotiate with the patron as the search progressed, which is the usual situation. also, this would provide the only opportunity to have the patron judge the value of subject searches done on rlin. if funding is received, preliminary results should be available in early 1982. anyone conducting similar or otherwise relevant studies is asked to contact the author. replicating the washington library network computer system software thomas p. brown: manager of computer services, and raymond deb use: manager of development and library services, washington library network, olympia. the washington library network (wln) computer system supports shared cataloging and catalog maintenance, retrospective conversion, reference, com catalog production, acquisitions, and accounting functions for libraries operating within a network. the system offers both full marc and brief catalog records as well as linked authority control for a ll traced headings. it contains more than 250,000 lines of pl/1 and ibm bal code in more than 1,100 program modules and runs on ibm or ibm-compatible hardware with ibm operating systems (mvs,os/vs1). all database management functions are provided by adxbas, a product of software a.g. of north america. the online system runs unbook reviews 211 acknowledgment the work reported in this paper was supported in part by a grant from the u.s. office of education, oeg-7-071140-4427. references 1. 
ruecking, frederick h., jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-38. 2. lipetz, ben-ami; stangl, peter: "user clues in initiating searches in a large library catalog," in american societ'l for information science, proceedings, 5. annual meeting, october 20-24, 1968, columbus, ohio, p. 137-139. book reviews conceptual design of an automated national library system, by norman r. meise. metuchen, n.j.: scarecrow press, 1969. 234 pp. $5.00. this is a very confusing book. and it is too bad, because this reviewer kept feeling that the author, norman meise, had something to present. the trouble is that he does not communicate. this, i think, is the result of two things. first, the book reflects the naivete of engineers when they come to deal with what are basically social systems like libraries. this does not mean it can't be done, but such a task needs clarity and purpose, which this book does not have. the second springs from this failure. the masses of data, assumptions, and commentary in the book are poorly organized and interrelated. it is not enough to write strings of words; those strings must communicate and relate backward and forward in the text. although never explicitly stated, the book evidently grew out of a study performed by the united aircraft corporate systems center in 1965-66 for the development and implementation of a connecticut library research center (see eric document ed 0221512). the latest reference in the book is 1966. in a field, i.e. library networks, where a fair amount of work and discussion has taken place in the last three years (e.g. the edunet conference in 1966), a book like this quickly loses its impact. the purpose of the book, according to the author, is "to show the feasibility of a system concept rather than provide a detailed engineering design." the system is "an automated national library system" using the state of connecticut as a model. the author then adds (spoiling the whole introduction) : "if these functions (bibliographic searching, acquisition, cataloging, circulation) can be economically automated, the major problems associated with our information explosion will be solved." as anatole france once said: "it is in the ability to deceive oneself that the greatest talent is shown." 272 journal of library automation vol. 2/ 4 december, 1969 basically the system is made up of three levels : local libraries, the regional center, and the "national library central." these are interconnected either by teleprinter, at 75 bits per second, or crt consoles, as 1200 to 2400 bits per second. mr. meise develops extensive tables, using connecticut as a model, for (a) estimated message traffic, real-time and batch; (b) allocation of communication traffic to segments of circuit route; (c ) cumulative communications traffic; (d) number of circuits required versus circuit speed. he discusses bibliographic coupling ( 78-82), the itek memory centered processor, disc packs and file organization, ( 100-118, 162-179). i cite these tables and data (there are many more) merely to show the approach. at one point he talks about packages such as books, at another about papers. the whole system is based on statistics for which there is no discussion. 
item: "the local library should satisfy a large percentage of the user's needs ( 90-99% ) ; however, some portion of these needs ( 1-10%) should be obtained from other libraries to keep system costs within reasonable range" ( p.32). where does "90-99%" come from? how do we know that this level will "keep system costs within reasonable range"? item: "the state of connecticut is about the right size for a regional center from the point of view of expected user load" ( p.118). whose hat did he pull this one out of? there is no discussion of right size, nor really any of what "size" means -population? geographic area? cultural makeup? one suggested region (arizona, nevada, utah) has about the same population as connecticut, but is 62 times the size. certainly the communications costs are entirely different and the two regions are not comparable. figures suddenly appear in the text, e.g. 9,610,000 vols. (p.98) and others, and the reader does not know where they came from. they may be right. they may even have been discussed somewhere in the text, but on page 98 one does not remember. and the index is of no value: two pages, hastily organized. this is all too bad, because mr. meise evidently put a good deal of effort into this. instead of discussing the statistical assumptions necessary for network planning, we are presented with raw and unevaluated data. instead of a thorough analysis of the "feasibility of a system concept", we are presented with a grandiose scheme. buried in the pile, however, are data, which while poorly organized and presented, are necessary for practical network planning. what is needed is a coherent and basic statement of the kinds of data available, of the kinds of data that are unavailable or imprecise, of the conditions under which these kinds of data hold, and of the relative usefulness of such data at varying systems levels. perhaps it is unfair to criticize mr. meise for not writing this kind of book. yet my criticism is precisely that, because he writes as though these data already exist in organized form. they don't. he has built a house of cards on air. robert s. taylor book reviews 273 thesaurus of eric descriptors, 2d edition, washington, d.c.: educational resources information center, bureau of research, office of education, 1969. 289 pp. one of the principal problems associated with the review of a new thesaurus is that the thesaurus usually serves simultaneously to exemplify the use and misuse of the basics of thesaurus construction. the thesaurus of eric descriptors is no exception. for the purposes of this review, it is necessary to distinguish between a thesaurus and an authority list. both are designed to improve communication between the user and the information storage and retrieval system. a thesaurus is usually used in conjunction with free-vocabulary indexing (and retrieval) while the authority list must be used only with controlledvocabulary indexing. hence a thesaurus, in the words of the engineers' joint council guide to indexing and abstracting, " . .. is not meant to specify the words in which information is to be recorded, but rather to establish the semantic and generic interrelationships between such words". the indexer uses the thesaurus as a means of "enriching" his indexing, i.e., as a guideline for effective indexing. the searcher uses the thesaurus to aid in phrasing or clarifying his search question. in neither use is there demanded the use of a particular term in preference to any other. 
an authority list, on the other hand, must be composed entirely of system terminology (except for the use-use for relationships, although the non-preferred term cannot be used profitably as a search term) which the indexer and searcher are constrained to employ. the thesaurus of eric descriptors is, by its own admission, an authority list ("only those descriptors actually used for indexing are placed in the thesaurus .. . ", p. vii). a thesaurus may be used with either free or controlled vocabulary indexing/ retrieval; an authority list may be used only with a controlled vocabulary. it is time we started using the correct terms for these two types of communication device. apart from the confusion as to the exact nature of the document it inu·oduces, the introduction to the thesaurus of eric descriptors does provide a good discussion of the problems of indexing and "thesaurus" development, especially concerning the need for multi-term entries. the descriptor listing, to which are added a rotated descriptor display, a descriptor group display, and descriptor scope notes, is well constructed (especially commendable is the rotated descriptor display). however, i question the value of the descriptor groups, which serve to grossly classify the eric descriptors, since they tend to detract from the cross-concept nature of the authority list. finally, the formats of the various listings in this document are well done and provide a very readable and usable authority list. james e. rush 274 journal of library automation vol. 2/4 december, 1969 announced reprints. vol. 1, feb. 1969. microcard editions. 52 pp. $30.00 per year. this journal complements guide to reprints. announced reprints lists forthcoming reprints that have been announced but not yet produced. published quarterly, its scope includes books, journals, and other materials originating both in the united states and abroad. each issue will cumulate all previous issues except that following the november issue all titles that have been published will be dropped. books are entered by author. entries include author, title and original date of publication. journals and sets are entered by title and include volume numbers. each entry includes in brackets the date of the first inclusion of an item in announced reprints. titles preceded by an asterisk are those that have been published subsequent to being listed as a forthcoming title. a title that appears in the february issue, for example, as an announced title which is then published in march will appear in the may, august and november issues preceded by an asterisk. following the november issue it is dropped. prices are included, in some cases being in the currency of the country. prepublication prices may be listed but the deadline is not. there is an alphabetical listing of publishers known to be active in the reprinting business, but of the 218 publishers so listed 124 did not supply announced reprints with titles. among the nonrespondents was kraus reprint corporation, one of the larger houses. exactly what need this journal answers is not completely clear . the guide to reprints provides an annual, cumulative list of books, journals and other materials that have been reprinted. as an acquisition tool it is self-evident. but since the period between the time a title is announced and the time it is actually reprinted is variable, one can only suppose that the publishers hope to fix their market by having their forthcoming titles listed in announced reprints. 
if they get expressions of interest from many libraries they may actually reprint. since announced reprints gives the date of the first time a reprint title is listed, eventually librarians will learn which publishers are reliable and which are not in following through on the promise of publishing a reprint. john demo~ union list of serials in the libraries in the miami vauey, sue brown, editor. 2d edition. dayton, ohio: wright state university library, 1969. $20.00. it's hard to review a union list of serials because such a publication is obviously a very useful thing to have and to use. being intimately connected with the production of a similar list for the cincinnati area, i can book reviews 275 only commend the librarians of the dayton-miami valley consortium for producing this second edition in as short a time as they did. (the first edition containing 8880 titles held by 35 libraries was published in the spring of 1968.) this edition contains the holdings of tlrree more libraries than did the first, and nearly 900 more titles are included. there are a few minor points about which one might quibble, such as the listing of the computer output on the lined side of the paper, making the pages of the published list a bit lined and grey looking; the use of corporate entries for the titles, which is o.k. if the list is used only by librarians and others used to that form of entry, but confusing to the average patron who is, i am convinced, used to looking up holdings information by the running title that he picked up in a citation somewhere; listing holdings under the latest title of a periodical with notes as to the title variations over the years (although i can't complain too loudly about this, as it is the same way we are doing the cincinnati area list, although with less information as to title changes than in this list); the use of library name "codes., that are the same as, or similar to, those used in the "union list of serials," which causes a great string of oda-to run down each page. there are, naturally, a few missed cross references to the latest title as well as a few keypunching errors. these detract little from the usefulness of the volume, which should be great, especially in the area near western ohio. the list is available for $20 from the acquisitions department, library, wright state university, colonel glenn highway, dayton, ohio 45431. thomas h. rees, ]r. current contents; education. 1 (june 17, 1969) philadelphia, institute for scientific information. subscription price varies. the rise in need for librarians to build their own offprint files has intensified searching for current, relevant references. current contents; education facilitates that search, for it reproduces contents pages of some 350 journals in the field of education and related fields. this new publication includes over a dozen library journals, including the journal of library automation. the various sections of current contents have established a well deserved reputation for timeliness. indeed, some librarians have complained that their users receive reprinted contents pages in current contents before libraries receive the journals. since each issue contains an author index and address directory, it is easy to request an offprint, and thereby expend minimal effort in keeping up as well as building a personal offprint collection. the subscription price can vary from $100 for a single non-educational subscription to less than $1.50 for multi-year subscriptions in groups of 200 or more. 
f d ·kg k"l re enc . j gour 276 journal of library automation vol. 2/ 4 december, 1969 standardization for documentation, bernard houghton, ed. hamden, conn.: archon books, 1969. 93 pp. $4.00. the editor has brought together in this tight little book an illuminating collection of six useful papers prepared initially for a conference held in liverpool, england, in november, 1968. the announced goal of this conference was to "isolate and consider some of the areas in which the adoption of universal standards is of immediate relevance." inasmuch as the authors are all british, the volume will have greater interest abroad than in the u.s. nonetheless, there is universal recognition that standards in various areas of documentation are desperately needed and that a great deal remains to be done. an especially clear exposition of the british standards institution's work in this field is the work of c. w. paul-jones. he relates the methodology and the work of bsi to that of the international standards organization (iso ) and touches briefly on each standards committee and its program of work. his concise outline of standards in being and in progress, and the place of each standards-involved organization in the framework of universal standards is thoroughly competent. k. i. porter, the editor of the british union catalogue, touches on a variety of problems encountered in his work and discusses the potential of standards in the area of serial publications. a wryly humorous essay on standards for book production is the work of peter wright. he seems not very hopeful of changing the methods of book-trade production through standards, but believes in the usefulness of the effort to establish them. the essay of k.g.b. bakewell takes up classification, cataloging, and other devices for organizing library material and providing access to it. he deplores the inchoate british development in these areas and cites considerably greater standardization elsewhere. his review of known systems is helpful. d. martin's paper, "standards for a mechanized information system," reviews the practical problems of one who has to subject information to the unthinking mind of machines. he too enumerates needed standards for coding, indexing, data elements, etc., and concludes (properly) that "it is too early to start talking in terms of solutions: standards activity is only now beginning to gather momentum." the final paper, that of john maclachlan, is an ordinary how-we-do-it job, describing an abstracting service in one specific field and the local standards applied. these six papers taken as a whole constitute an informative and cogent source of information on the present status of standards work in the information field. despite the british emphasis, the case for multi-national and international standards is clearly set forth. this book should be required reading for students and workers in the fields of information science and documentation. ] errold orne book reviews 277 cataloging u.s.a., by paul s. dunkin. chicago: american library association, 1969. 159 pp. this book is, quite simply, a survey of the development of cataloguing in america, and of the present situation of cataloguing in america. it deals with all aspects of author cataloguing, descriptive cataloguing and subject cataloguing (both subject headings and classification). the method used by the author is didactic and expository rather than critical mr. 
dunkin seeks to analyse and to display the situation rather than to arrive at startling insights or to propose radical modification. the book is addressed to "the beginning student ... the experienced cataloger ... the public service librarian ... the library administrator". it is, mr. dunkin says, not a "how to do it book" but a "why do it book". it is certainly true that any member of mr. dunkin's readership will be enlightened by being shown the roots of modem cataloguing, and by having the perennial problems of cataloguing discussed in an admirably clear manner. mr. dunkin does not fail to illuminate each problem, and such illumination is, of course, half way to a solution. where he does fail, i feel, is in not providing any firm answers to these problems. perhaps in cataloguing there are no firm answers. this book seems to me, as an english cataloguer, to epitomise the "other directed" nature of american cataloguing. in reading this, as other american textbooks, i find a somewhat reverent attitude towards the great figures (principally cutter), the great institutions ( principally the library of congress) and the "sacred texts" (the various codes). the english tradition seems to me much more "inner directed", much more concerned with what is best for the individual catalogue, much less concerned with the necessity for standardisation and consistency between catalogues. this is not to say that either approach has a monopoly of virtue, or that one can fault mr. dunkin's book on this account. mr. dunkin has chosen his readership and his method, and within his self-imposed limits has produced a practical and useful book. furthermore the book is written with a clarity and ease unusual in cataloguing literature. michael gorman systems analysis in libraries, by f. robinson, c. r. clough, d. s. hamilton, and r. winter. newcastle upon tyne: oriel press lin1ited, 1969. 55 pp. 15s. ( symplegades, number i, a series of papers on computers, libraries and information processing). if symplegades once diligently guarded the entrance to the bosphorus, it has now gratefully allowed this simple book to survive its peril. the authors explain that the title is somewhat misleading-the book 278 journal of library automation vol. 2/4 december, 1969 has nothing to do with library systems and little in terms of system analysis that does not relate to computerization. the two purposes of this work are the need for stressing clarity in defining objectives and for emphasizing the extent and depth of the work involved in systems analysis. a book that could achieve such simple but difficult objectives and do it intelligently would indeed be welcome in our discipline. this volrune, however, does not obtain its objectives. it does provide something just as important in that it is readable (with dashes of humor) with a simple presentation of the basic tenets of systems analysis as it applies to libraries. it assumes that the reader knows nothing about systems analysis and its application to the computer. for the professional neophyte or the old graduate who has finally faced up to the realities of the future, this book should be a definite beginning point. the structure of the book and the presentation of the text contains the same simplicity as the message and book conveys. the presentation is in the form of the message. the book does contain one point of view which seems invalid. it suggests that systems analysis is only undertaken in connection with computerization. 
there are other shortcomings, like the unexplained and unlabeled figures and the use of acronyms without explanation or definition. one also wonders about a technical book without the use of sourcing. irene braden

on research libraries; statement and recommendations of the committee on research libraries of the american council of learned societies; submitted to national advisory commission on libraries, november, 1967. cambridge, mass.: the m.i.t. press, 1969. 104 pp. this report presents the problems of research libraries and puts forth eleven major recommendations to solve these problems. in summary the recommendations are for a national library structure presided over by a national commission on libraries to cope with various problems, including automation; financial support from federal, state and private sources; and study and revision of the copyright law. none of the recommendations is novel. edwin e. williams of the harvard university library contributed a skillful summary of problems related to "bibliographic control and physical dissemination." m. v. mathews and w. s. brown of the bell telephone laboratories prepared a section entitled "research libraries and the new technology," which discusses computers and microcopying. the discussion of library computer applications is less than helpful. the authors propose a catalog for a university library on 80 reels of magnetic tape and propose "complete resorting of the catalog." no one with any experience whatsoever in library computerization would dream, even in his worst nightmare, of such a monstrous arrangement. yale's ralph s. brown, jr., has furnished an appendix, "copyright problems of research libraries," that is most perceptive and informative. brown concludes that although copyright revision must move on, "the costs of using copyright works [must be] bargained out" and that congress "must for a while attempt the difficult feat of standing still on a tightrope." the verso of the title page of on research libraries sharpens this point, for it carries the prohibition that "no part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without permission in writing from the publisher." bargaining, if it can be called that, is surely here. frederick g. kilgour

geographic information and technologies in academic libraries: an arl survey of services and support
ann l. holstein
information technology and libraries | march 2015

abstract
one hundred fifteen academic libraries, all current members of the association of research libraries (arl), were selected to participate in an online survey in an effort to better understand campus usage of geographic data and geospatial technologies, and how libraries support these uses. the survey was used to capture information regarding geographic needs of their respective campuses, the array of services they offer, and the education and training of geographic information services department staff members. the survey results, along with review of recent literature, were used to identify changes in geographic information services and support since 1997, when a similar survey was conducted by arl.
this  new  study  has  enabled  recommendations  to  be  made  for  building  a   successful  geographic  information  service  center  within  the  campus  library  that  offers  a  robust  and   comprehensive  service  and  support  model  for  all  geographic  information  usage  on  campus.   introduction   in  june  1992,  the  arl  in  partnership  with  esri  (environmental  systems  research  institute)   launched  the  gis  (geographic  information  systems)  literacy  project.  this  project  sought  to   “introduce,  educate,  and  equip  librarians  with  the  skills  necessary”  to  become  effective  gis  users   and  to  learn  how  to  provide  patrons  with  “access  to  spatially  referenced  data  in  all  formats.”1   through  the  implementation  of  a  gis  program,  libraries  can  provide  “a  means  to  have  the   increasing  amount  of  digital  geographic  data  become  a  more  useful  product  for  the  typical   patron.”2     in  1997,  five  years  after  the  gis  literacy  project  began,  a  survey  was  conducted  to  elucidate  how   arl  libraries  support  patron  gis  needs.  the  survey  was  distributed  to  121  arl  members  for  the   purpose  of  gathering  information  about  gis  services,  staffing,  equipment,  software,  data,  and   support  these  libraries  offered  to  their  patrons.  seventy-­‐two  institutions  returned  the  survey,  a  60%   response  rate.  at  that  time,  nearly  three-­‐quarters  (74%)  of  the  respondents  affirmed  that  their   library  administered  some  level  of  gis  services.3  this  indicates  that  the  gis  literacy  project  had  an   evident  positive  impact  on  the  establishment  of  gis  services  in  arl  member  libraries.   since  then,  it  has  been  recognized  that  the  rapid  growth  of  digital  technologies  has  had  a   tremendous  effect  on  gis  services  in  libraries.4  we  acknowledge  the  importance  of  assessing     ann  l.  holstein  (ann.holstein@case.edu)  is  gis  librarian  at  kelvin  smith  library,  case  western   reserve  university,  cleveland,  ohio.     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   39   how  geographic  services  in  academic  research  libraries  have  further  evolved  over  the  past  17   years  in  response  to  these  advancing  technologies  as  well  as  the  increasingly  demanding   geographic  information  needs  of  their  user  communities.     method   for  this  study,  115  academic  libraries,  all  current  members  of  arl  as  of  january  2014,  were   invited  to  participate  in  an  online  survey  in  an  effort  to  better  understand  campus  usage  of   geographic  data  and  geospatial  technologies  and  how  libraries  support  these  uses.  similar  in   nature  to  the  1997  arl  survey,  the  2014  survey  was  designed  to  capture  information  regarding   geographic  needs  of  their  respective  campuses,  the  array  of  services,  software.  and  support  the   academic  libraries  offer,  and  the  education  and  training  of  geographic  information  services   department  staff  members.  our  aim  was  to  be  able  to  determine  the  range  of  support  patrons  can   anticipate  at  these  libraries  and  ascertain  changes  in  gis  library  services  since  the  1997  survey.   a  cross-­‐sectional  survey  was  designed  and  administered  using  qualtrics,  an  online  survey  tool.  
it   was  distributed  in  january  2014  via  email  to  the  person  identified  as  the  subject  specialist  for   mapping  and/or  geographic  information  at  each  arl  member  academic  library.  when  the  survey   closed  after  two  weeks,  54  institutions  had  responded  to  the  survey.  this  accounts  for  47%   participation.  responding  institutions  are  listed  in  the  appendix.   results   software  and  technologies   we  were  interested  in  learning  about  what  types  of  geographic  information  software  and   technologies  are  currently  being  offered  at  academic  research  libraries.  results  show  that  100%  of   survey  respondents  offer  gis  software/mapping  technologies  at  their  libraries,  36%  offer  remote   sensing  software  (to  process  and  analyze  remotely  sensed  data  such  as  aerial  photography  and   satellite  imagery),  and  36%  offer  global  positioning  system  (gps)  equipment  and/or  software.   nearly  all  (98%)  said  that  their  libraries  provide  esri  arcgis  software,  with  83%  also  providing   access  to  google  maps  and  google  earth,  and  35%  providing  qgis  (previously  known  as  quantum   gis).  smatterings  of  other  gis,  remote-­‐sensing,  and  gps  products  are  also  offered  by  some  of  the   libraries,  although  not  in  large  numbers  (see  table  1  for  full  listing).     the  fact  that  nearly  all  survey  respondents  offer  arcgis  software  at  their  libraries  comes  as  no   surprise.  arcgis  is  the  most  commonly  provided  mapping  software  available  in  academic  libraries,   and  in  2011,  it  was  determined  that  2,500  academic  libraries  were  using  esri  products.5  esri   software  was  most  popular  in  1997  as  well,  undoubtedly  because  they  offered  free  software  and   training  to  participants  of  the  gis  literacy  project.6         information  technology  and  libraries  |  march  2015   40   software/technology   type   %  of  providing   libraries   esri  arcgis   gis   98   google  maps/earth   gis   83   qgis   gis   35   autocad   gis   19   erdas  imagine   remote  sensing   19   grass   gis   15   envi   remote  sensing   15   geoda   gis   6   pci  geomatica   remote  sensing   6   garmin  map  source   gps   6   simplymap   gis   4   trimble  terrasync   gps   4   table  1.  geographic  information  software/mapping  technologies  provided  at  arl  member   academic  libraries  (2014)   google  maps  and  google  earth,  launched  in  2005,  have  quickly  become  very  popular  mapping   products  used  at  academic  libraries—a  close  second  only  to  esri  arcgis.  in  addition  to  being  free,   their  ease  of  use,  powerful  visualization  capabilities,  “customizable  map  features  and  dynamic   presentation  tools”  make  them  attractive  alternatives  to  commercial  gis  software  products.7     since  1997,  many  software  programs  have  fallen  out  of  favor.  mapinfo,  idrisi,  maptitude,  and   sammamish  data  finder/geosight  pro  were  gis  software  programs  listed  in  the  1997  survey   results  that  are  not  used  today  at  arl  member  academic  libraries.8  instead,  open  source  software   such  as  qgis,  grass,  and  geoda  are  growing  in  popularity.  they  are  free  to  use  and  their  source   code  may  be  modified  as  needed.   
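the popularity of google maps and google earth noted above rests largely on how little effort a basic interactive map now takes, and the open-source tools mentioned in the same paragraph offer similar convenience. as an illustration only, not something reported by the survey, the few lines of python below use the open-source folium package, which writes a leaflet/openstreetmap map to a plain html file; the coordinates and labels are placeholders invented for this example.

import folium  # open-source python wrapper around the leaflet javascript mapping library

# placeholder campus locations; the coordinates and names are invented for this example
m = folium.Map(location=[41.5045, -81.6084], zoom_start=16)
folium.Marker([41.5073, -81.6096], popup="main library / gis services").add_to(m)
folium.Marker([41.5012, -81.6075], popup="geography department gis lab").add_to(m)
m.save("campus_gis_map.html")  # a self-contained html file viewable in any web browser

the point is not the particular package but how low the barrier to producing a shareable web map has become, which helps explain the shift away from the commercial-only lineup reported in 1997.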
gps  equipment  lending  can  be  very  beneficial  to  students  and  campus  researchers  who  need  to   collect  their  own  field  research  locational  data.  the  2014  survey  found  that  30%  of  respondents   loan  recreational  gps  equipment  at  their  libraries  and  10%  loan  mapping-­‐grade  gps  equipment.   the  high  cost  of  mapping-­‐grade  gps  equipment  (several  thousand  dollars)  may  be  a  barrier  for   some  libraries;  however,  this  is  the  type  of  equipment  recommended  in  best-­‐practice  methods  for   gathering  highly  accurate  gps  data  for  research.  in  addition  to  expense,  complexity  of  operation  is   another  consideration.  while  it  is  “fairly  simple  to  use  a  recreational  gps  unit,”  a  certain  level  of   advanced  training  is  required  for  operating  mapping-­‐grade  gps  equipment.9  a  designated  staff   member  may  need  to  take  on  the  responsibility  of  becoming  the  in-­‐house  gps  expert  and  routinely   offer  training  sessions  to  those  interested  in  borrowing  mapping-­‐grade  gps  equipment.     location     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   41   at  36%  of  responding  libraries,  the  geographic  information  services  area  is  located  where  the   paper  maps  are  (map  department/services);  19%  have  separated  this  area  and  designated  it  as  a   geospatial  data  center,  gis,  or  data  services  department;  13%  integrate  it  with  the  reference   department;  and  just  4%  of  libraries  house  the  gis  area  in  government  documents.  table  2  lists  all   reported  locations  for  this  service  area.  not  surprisingly,  in  1997,  government  documents  (39%)   was  just  as  popular  a  location  for  this  service  area  as  within  the  map  department  (43%).10   libraries  identified  government  documents  as  a  natural  fit,  keeping  gis  services  within  close   proximity  to  spatial  data  sets  recently  being  distributed  by  government  agencies,  most  notably  the   us  government  printing  office  (gpo).  these  agencies  had  made  the  decision  to  distribute  “most   data  in  machine  readable  form,”11  including  the  1990  census  data  as  topographically  integrated   geographic  encoding  and  referencing  (tiger)  files.12  gis  technologies  were  needed  to  access  and   most  effectively  use  information  within  these  massive  spatial  datasets.     location   %  of  libraries  (1997)   %  of  libraries  (2014)   map  department/services   43   36   government  documents   39   4   reference   10   13   geospatial  data  center,  gis,  or  data  services   3   19   not  in  any  one  location   -­‐   9   digital  scholarship  center   -­‐   6   combined  area  (i.e.,  map  dept.  &  gov.  docs.)   -­‐   6   table  2.  location  of  the  geographic  information  services  area  within  the  library  (1997  and  2014)   at  59%  of  responding  libraries,  geographic  information  software  is  available  on  computer   workstations  in  a  designated  area,  such  as  within  the  map  department.  however,  many  do  not   restrict  users  by  location  and  have  the  software  available  on  all  computer  workstations   throughout  the  library  (37%)  or  on  designated  workstations  distributed  throughout  the  library   (33%).  
a  small  percentage  (7%)  loan  laptops  to  patrons  with  the  software  installed,  allowing  full   mobility  throughout  the  entire  library  space.   staffing   most  professional  staff  working  in  the  geographic  information  services  department  hold  one  or   more  postbaccalaureate  advanced  degrees.  of  113  geographic  services  staff  at  responding   libraries,  65%  had  obtained  an  ma/ms,  mls/mlis,  or  phd;  43%  have  one  advanced  degree,  while   22%  have  two  postbaccalaureate  degrees.  half  (50%)  hold  an  mls/mlis,  31%  hold  an  ma/ms,   and  6%  hold  a  phd.  nearly  one-­‐third  (31%)  have  obtained  a  ba/bs  as  their  highest  educational   degree,  3%  had  a  two-­‐year  technical  degree,  and  2%  had  only  earned  a  ged  or  high  school   diploma.  in  1997,  84%  of  gis  librarians  and  specialists  at  arl  libraries  had  an  mls  degree.13  at   that  time,  the  incumbent  was  most  often  recruited  from  within  the  library  to  assume  this  new  role,     information  technology  and  libraries  |  march  2015   42   whereas  today’s  gis  professionals  are  just  as  likely  to  come  from  nonlibrary  backgrounds,   bringing  their  expertise  and  advanced  geographic  training  to  this  nontraditional  librarian  role.     figure  1.  highest  educational  degree  of  geographic  services  staff  (2014)   on  average,  this  department  is  staffed  by  two  professional  staff  members  and  three  student  staff.   student  employees  can  be  a  terrific  asset,  especially  if  they  have  been  previously  trained  in  gis.   students  are  likely  to  be  recruited  from  departments  that  are  the  heaviest  gis  users  at  the   university  (i.e.,  geography,  geology).  some  libraries  have  implemented  “co-­‐op”  programs  where   students  can  receive  credit  for  working  at  the  gis  services  area.  these  dual-­‐benefit  positions  are   quite  lucrative  to  students.14     campus  users   in  a  typical  week  during  the  course  of  a  semester,  responding  libraries  each  serve  approximately   sixteen  gis  users,  four  remote  sensing  users,  and  three  gps  users.  these  users  may  obtain   assistance  from  department  staff  either  in-­‐person  or  remotely  via  phone  or  email.     on  average,  undergraduate  and  graduate  students  compose  the  majority  (75%)  of  geographic   service  users  (32%  and  43%,  respectively).  faculty  members  compose  14%  of  the  users,  followed   by  staff  (including  postdoctoral  researchers)  at  7%.  some  institutions  also  provide  support  to   public  patrons  and  alumni  (4%  and  1%,  respectively).  in  1997,  it  was  estimated  that  on  average,   63%  of  gis  users  were  students,  22%  were  faculty,  8%  were  staff,  and  8%  were  public.15   ged/hs   2%   2yr  tech   3%   ba/bs   31%   ma/ms/mlis   58%   phd   6%     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   43     figure  2.  comparison  of  the  percentage  of  geographic  service  users  by  patron  status  (1997  and   2014)   the  top  three  departments  that  use  gis  software  at  arl  campuses  are  environmental   science/studies,  urban  planning/studies,  and  geography.  the  most  frequent  remote  sensing   software  users  come  from  the  departments  of  environmental  science/studies,  geography,  and   archaeology.  
gps  equipment  loan  and  software  usage  is  most  popular  with  the  departments  of   environmental  science/studies,  geography,  biology/ecology  and  archaeology  (see  table  3  for  full   listing).  some  departments  are  heavy  users  of  all  geographic  technologies,  while  others  have   shown  interest  in  only  one.  for  example,  the  departments  of  psychology  and  medicine/dentistry   have  used  gis  but  have  expressed  little  or  no  interest  in  using  remote-­‐sensing  or  gps  technologies.   support  and  services   the  campus  community  is  supported  by  library  staff  in  a  variety  of  ways  with  regards  to  gis,   remote-­‐sensing,  and  gps  technology  and  software  use.  nearly  all  (94%)  libraries  provide   assistance  using  the  software  for  specific  class  assignments  and  projects,  and  78%  are  able  to   provide  more  in-­‐depth  research  project  consultations.  more  than  one-­‐quarter  (27%)  of  reporting   libraries  will  make  custom  gis  maps  for  patrons,  although  there  may  be  a  charge  depending  on  the   library,  project,  and  patron  type  (10%).  most  (90%)  offer  basic  use  and  troubleshooting  support;   however,  just  39%  offer  support  for  software  installation,  and  55%  offer  technical  support  for   problems  such  as  licensing  issues  and  turning  on  extensions.  the  campus  computing  center  or   information  technology  services  (its)  at  arl  institutions  most  likely  fields  some  of  the  software   installation  and  technical  issues  rather  than  the  library,  thus  accounting  for  the  lower  percentages.     a  variety  of  software  training  may  be  offered  to  the  campus  community  through  the  library;  80%   of  responding  libraries  make  visits  to  classes  to  give  presentations  and  training  sessions,  69%  host   workshops,  47%  provide  opportunities  for  virtual  training  courses  and  tutorials,  and  4%  offer   certificate  training  programs.     0   10   20   30   40   50   60   70   80   students   faculty   staff   public   alumni   1997   2014     information  technology  and  libraries  |  march  2015   44   department   gis   remote  sensing   gps   anthropology   24   10   8   archaeology   24   14   13   architecture   24   1   6   biology/ecology   32   10   13   business/economics   23   1   3   engineering   18   9   11   environmental  science/studies   41   22   16   forestry/wildlife/fisheries   21   12   10   geography   35   22   15   geology   31   12   10   history   27   2   2   information  sciences   14   1   0   nursing   8   1   2   medicine/dentistry   9   0   0   political  science   25   3   5   psychology   4   0   0   public  health/epidemiology/  biostatistics   30   3   9   social  work   2   0   1   sociology   22   0   3   soil  science   17   5   4   statistics   8   3   0   urban  planning/studies   36   7   9   table  3.  number  of  arl  libraries  reporting  frequent  users  of  gis,  remote-­‐sensing,  or  gps   software  and  technologies  from  a  campus  department  (2014)     often,  the  library  is  not  the  only  place  people  can  go  to  obtain  software  support  and  training  on   campus.  most  (86%)  responding  libraries  state  that  their  university  offers  credit  courses,  and  41%   of  campuses  have  a  gis  computer  lab  located  elsewhere  on  campus  that  may  be  utilized.  
its  is   available  for  assistance  at  29%  of  the  universities,  and  continuing  education  offers  some  level  of   training  and  support  at  14%  of  campuses.     data  collection  and  access   most  (85%)  of  responding  libraries  collect  geographic  data  and  allow  an  annual  budget  for  it.   “libraries  that  have  invested  money  in  proprietary  software  and  trained  staff  members  will  tend   to  also  develop  and  maintain  their  own  collection  of  data  resources.”16  of  those  collecting  data,  26%   spend  less  than  $1,000  annually,  15%  spend  between  $1,000  and  $2,499,  17%  spend  between   $2,500  and  $5,000,  while  41%  spend  more  than  $5,000.  in  1997,  79%  of  libraries  spent  less  than   $2,000  annually,  and  only  9%  spent  more  than  $5,000.17       geographic  information  and  technologies  in  academic  research  libraries  |  holstein   45     figure  3.  annual  budget  allocations  for  geographic  data  (2014)   a  dramatic  shift  has  occurred  over  the  years  with  budget  allocations  for  data  sets.  no  longer  are   academic  libraries  just  collecting  free  government  data  sets  as  was  typically  the  case  back  in  1997,   but  they  are  investing  much  more  of  their  materials  budget  into  building  up  the  geographic  data   collection  for  their  users.     data  is  made  accessible  to  campus  users  in  a  variety  of  ways.  a  majority  (84%)  offer  data  via   remote  access  or  download  from  a  networked  campus  computer,  using  a  virtual  private  network   (vpn)  or  login.  more  than  half  (62%)  of  responding  libraries  provide  access  to  data  from   workstations  within  the  library,  and  64%  lend  cd-­‐roms.   roughly  one-­‐quarter  (26%)  of  responding  libraries  provide  users  with  storage  for  their  data.  of   those,  29%  have  a  dedicated  geographic  data  server,  14%  use  the  main  library  server,  29%  point   users  to  the  university  server  or  institutional  repository,  and  36%  allow  users  to  store  their  data   directly  onto  a  library  computer  workstation  hard  drive.   internal  use  of  gis  in  libraries   geographic  information  technologies  may  be  used  internally  to  help  patrons  navigate  the  library’s   physical  collections  and  efficiently  locate  print  materials.  of  the  survey  respondents,  60%  use  gis   for  map  or  air  photo  indexing,  27%  use  the  technology  to  create  floor  maps  of  the  library  building,   and  15%  use  it  to  map  the  library’s  physical  collections.  “the  use  of  gis  in  mapping  library   collections  is  one  of  the  non-­‐traditional  but  useful  applications  of  gis.”18  gis  can  be  used  to  link   library  materials  to  simulated  views  of  floor  maps  through  location  codes.19  this  enables  patrons   to  determine  the  exact  location  of  library  material  by  providing  them  with  item  “location  details   such  as  stacks,  row,  rack,  shelf  numbers,  etc.”20  the  gis  system  can  become  a  useful  tool  for   collection  management  and  can  be  a  tremendous  time-­‐saver  for  patrons,  especially  those   unfamiliar  with  the  cataloging  system  or  collection  layout.     
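the passages above describe linking library materials to floor maps through location codes so that a patron can be directed to a particular stack, row, and shelf. the cited sources do not specify an implementation, so the python sketch below is only one minimal way such a lookup might work: a small table mapping call-number ranges to shelving locations and floor-map coordinates, with every range, location, and coordinate invented for illustration.

# hypothetical call-number ranges mapped to shelving locations and floor-map coordinates (x, y in metres)
shelf_index = [
    ("GA1", "GB9999", {"floor": 2, "stack": "geography and maps", "row": 4,  "xy": (12.5, 30.0)}),
    ("QE1", "QE9999", {"floor": 3, "stack": "geology",            "row": 11, "xy": (40.0, 8.5)}),
    ("Z1",  "Z9999",  {"floor": 1, "stack": "bibliography",       "row": 2,  "xy": (5.0, 14.0)}),
]

def locate(call_number):
    # return the shelf record whose range contains the call number; plain string comparison is used
    # here, though a real system would normalize lc call numbers before comparing them
    for low, high, place in shelf_index:
        if low <= call_number.upper() <= high:
            return place
    return None

print(locate("QE511.4 .H65 2015"))  # -> floor 3, geology stack, row 11, plus map coordinates

linked to a floor plan drawn in a gis, the returned coordinates could be highlighted on the map a patron sees, which is the kind of time saving for unfamiliar users that the paragraph above describes.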
discussion   recommendations  for  building  a  successful  geographic  information  service  center   0   5   10   15   20   25   30   35   40   45   percent  (%)     information  technology  and  libraries  |  march  2015   46   the  geographic  information  services  area  is  often  a  blend  of  the  traditional  and  modern.  it  can   extend  to  paper  maps,  atlases,  gps  equipment,  software  manuals,  large-­‐format  scanners,  printers,   and  gis.  gis  services  may  include  a  cluster  of  computers  with  gis  software  installed,  an  accessible   collection  of  gis  data  resources,  and  assistance  available  from  the  library  staff.  the  question  for   academic  libraries  today  is  no  longer  “whether  to  offer  gis  services  but  what  level  of  service  to   offer.”21  every  university  has  different  gis  needs,  and  the  library  must  decide  how  it  can  best   support  these  needs.  there  is  no  set  formula  for  building  a  geographic  information  service  center   because  each  institution  “has  a  different  service  mission  and  user  base.”22  every  library’s  gis   service  program  will  be  designed  with  its  unique  institutional  needs  in  mind;  however,  they  each   will  incorporate  some  combination  of  hardware,  software,  data,  and  training  opportunities   provided  by  at  least  one  knowledgeable  staff  member.23     “gis  represents  a  significant  investment  in  hardware,  software,  staffing,  data  acquisition,  and   ongoing  staff  development.  either  new  money  or  significant  reallocation  is  required.”24   establishing  new  or  enhancing  gis  services  in  the  library  requires  the  “serious  assessment  of  long-­‐ term  support  and  funding  needs.”25  commitment  of  the  university  as  a  whole,  or  at  least  support   from  senior  administration,  “library  administration,  and  related  campus  departments”  is  crucial  to   its  success.26  receiving  “more  funding  will  mean  more  staff,  better  trained  staff,  a  more  in-­‐depth   collection,  better  hardware  and  software,  and  the  ability  to  offer  multiple  types  of  gis  services.”27     once  funding  for  this  endeavor  has  been  secured,  it  is  of  utmost  importance  to  recruit  a  gis   professional  to  manage  the  geographic  information  service  center.  to  be  most  effective  in  this   position,  the  incumbent  should  possess  a  graduate  degree  in  gis  or  geography;  however,   depending  on  what  additional  responsibilities  would  be  required  of  the  candidate  (i.e.,  reference,   cataloging,  etc.)  a  second  degree  in  library  science  is  strongly  recommended.  this  staff  member   should  possess  mapping  and  gis  skills,  which  include  experience  with  esri  software  and  remote   sensing  technologies.  employees  in  this  position  may  be  given  a  job  titles  such  as  “gis  specialists,   gis/data  librarians,  gis/map  librarians,  digital  cartographers,  spatial  data  specialists,  and  gis   coordinators.”28     with  the  new  staff  member  on  board,  hereafter  referred  to  as  “gis  specialist,”  decisions  such  as   what  software  to  provide,  which  data  sets  to  collect,  and  what  types  of  training  and  support  to   offer  to  the  campus  can  be  made.  
consulting  with  research  centers  and  academic  departments  that   currently  use  or  are  interested  in  using  gis  and  remote  sensing  technologies  is  a  good  place  to   learn  about  software,  data,  and  training  needs  and  to  determine  the  focus  and  direction  of  the   geographic  information  services  department.29  campus  users  often  come  from  academic   departments  that  “have  neither  staff  nor  facilities  to  support  gis,”  and  “may  only  consist  of  one  or   two  faculty  and  a  few  graduate  students.  these  gis  users  need  access  to  software,  data,  and   expertise  from  a  centralized,  accessible  source  of  research  assistance,  such  as  the  library.”30     at  minimum,  esri  arcgis,  google  maps  and  google  earth  should  be  supported,  with  additional   remote  sensing  or  open  source  gis  software  depending  on  staff  expertise  and  known  campus     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   47   needs.  when  purchasing  commercial  software  licenses,  such  as  for  esri  arcgis,  discounts  for   educational  institutions  are  usually  available.  additionally,  negotiating  campus-­‐wide  software   licenses  may  be  a  good  option  to  consider  as  the  costs  are  usually  far  less  than  purchasing   individual  or  floating  licenses.  costs  for  campus-­‐wide  licensing  are  typically  determined  by  full-­‐ time  equivalent  (fte)  students  enrolled  at  the  university.     facilitating  “access  to  educational  resources  such  as  software  tools  and  applications,  how-­‐to-­‐ guides  for  data  and  software,”  and  tutorials  is  crucial.31  the  gis  specialist  must  be  familiar  with   how  gis  software  can  be  used  by  many  disciplines,  the  availability  of  “training  courses  or  tutorials,   sources  or  extensible  gis  software,  and  hundreds  of  software  and  application  books.”32  tutorials   may  be  provided  direct  from  a  software  vendor  (i.e.,  esri  virtual  campus)  or  developed  in-­‐house   by  the  gis  specialist.  creating  “gis  tutorials  on  short,  task-­‐based  techniques  such  as   georeferencing  or  geocoding”  and  making  them  readily  available  online  or  as  a  handout  may  save   time  having  to  repeatedly  explain  these  techniques  to  patrons.33   geospatial  data  collection  development  is  a  core  function  of  the  geographic  information  services   department.  to  effectively  develop  the  data  collection,  the  gis  specialist  must  fully  comprehend   the  needs  of  the  user  community  as  well  as  possess  a  “fundamental  understanding  of  the  nature   and  use  of  gis  data.”34  this  is  often  referred  to  as  “spatial  literacy.”35  it  is  crucial  to  keep  abreast  of   “recent  developments,  applications,  and  data  sets.”36   the  gis  specialist  will  spend  much  more  time  searching  for  and  acquiring  geographic  data  sets   than  selecting  and  purchasing  traditional  print  items  such  as  maps,  monographs,  and  journals  for   the  collection.  a  budget  should  be  established  annually  for  the  purchase  of  all  geographic   materials,  both  print  and  digital.  a  great  challenge  for  the  specialist  is  to  acquire  data  at  the  lowest   cost  possible.  
while  a  plethora  of  free  data  is  available  online  from  government  agencies  and   nonprofit  organizations,  other  data,  available  only  from  private  companies,  may  be  quite   expensive  because  of  the  high  production  costs.  a  collection  development  policy  should  be  created   that  indicates  the  types  of  materials  and  data  collected  and  specifies  geographic  regions,  formats,   and  preferred  scales.37  the  needs  of  the  user  community  must  be  carefully  considered  when   establishing  the  policy.     the  expertise  of  the  gis  specialist  is  needed  not  only  to  help  patrons  locate  the  appropriate   geographic  data,  but  also  to  use  the  software  to  process,  interpret,  and  analyze  it.  “only  the  few   library  patrons  that  have  had  gis  experience  are  likely  to  obtain  any  level  of  success  without   intervention  by  library  staff”;38  thus,  for  any  mapping  program  installed  on  a  library  computer,   “staff  must  have  working  knowledge  of  the  program”  and  must  be  able  to  provide  support  to   users.39  furthermore,  the  gis  specialist  must  be  able  to  train  patrons  to  use  the  software  to   complete  common  tasks  such  as  file  format  conversion,  data  projection,  data  manipulation,  and   geoprocessing.  these  geospatial  technologies  involve  a  steep  learning  curve,  and  unfortunately   “hands-­‐on  training  options  outside  the  university  are  often  cost-­‐prohibitive”  for  many.40  the   campus  community  requires  training  opportunities  to  be  both  convenient  and  inexpensive.     information  technology  and  libraries  |  march  2015   48   teaching  hands-­‐on  geospatial  technology  workshops,  from  basic  to  the  advanced,  is  fundamental   to  educating  the  campus  community.  workshops  will  “vary  from  institution  to  institution,  with   some  offering  students  an  introduction  to  mapping  and  others  focusing  on  specific  features  of  the   program,  such  as  georeferencing,  geocoding,  and  spatial  analysis.  some  also  offer  workshops  that   are  theme  specific,”  such  as  “working  with  census  data”  or  “digital  elevation  modeling.”41  custom   workshops  or  training  sessions  can  be  developed  to  meet  a  specific  campus  need,  tailored  for  a   specific  class  in  consult  with  an  instructor,  or  designed  especially  for  other  library  staff.     today’s  geographic  information  service  center   the  academic  map  librarian  from  the  1970s  or  1980s  would  hardly  recognize  todays’  geographic   information  service  center.  what  was  once  a  room  of  map  cases  and  shelves  of  atlases  and   gazetteers  is  now  a  bustling  geospatial  center.  computers,  powerful  gis  and  remote-­‐sensing   technologies,  gps  devices,  digital  maps,  and  data  are  now  available  to  library  patrons.  every   library  surveyed  provides  gis  software  to  campus  users,  and  85%  also  actively  collect  gis  and   remotely  sensed  data.  with  the  assistance  of  expertly  trained  library  staff,  users  with  no  or  limited   experience  using  geospatial  technologies  are  enabled  to  analyze  spatial  data  sets  and  create   custom  maps  for  coursework,  projects,  and  research.  
nearly  all  surveyed  libraries  (94%)  have   staff  that  can  assist  students  specifically  with  software  use  for  class  assignments  and  projects,   while  90%  provide  assistance  with  more  generalized  use  of  the  software.  a  majority  of  libraries   also  offer  a  variety  of  software  training  sessions,  workshops,  and  give  presentations  to  the  campus   community.  all  this  is  made  possible  through  the  library’s  commitment  to  this  service  area  and  the   availability  of  highly  trained  professional  staff,  most  who  hold  a  masters  or  doctoral  degree.  the   library  has  truly  established  itself  as  the  go-­‐to  location  on  campus  for  spatial  mapping  and  analysis.   this  role  has  only  strengthened  in  the  years  since  the  launch  of  the  arl  gis  literacy  project  in   1992.   references   1.     d.  kevin  davie  et  al.,  comps.,  spec  kit  238:  the  arl  geographic  information  systems  literacy   project  (washington,  dc:  association  of  research  libraries,  office  of  leadership  and   management  services,  1999),  16.   2.   ibid.,  3.   3.   ibid.,  i.   4.   abraham  parrish,  “improving  gis  consultations:  a  case  study  at  yale  university  library,”   library  trends  55,  no.  2  (2006):  328,  http://dx.doi.org/10.1353/lib.2006.0060.     5.     eva  dodsworth,  getting  started  with  gis:  a  lita  guide  (new  york:  neal-­‐schuman,  2012),  161.   6.   davie  et  al.,  spec  kit  238,  i.     geographic  information  and  technologies  in  academic  research  libraries  |  holstein   49   7.   eva  dodsworth  and  andrew  nicholson,  “academic  uses  of  google  earth  and  google  maps  in  a   library  setting,”  information  technology  &  libraries  31,  no.  2  (2012):  102,   http://dx.doi.org/10.6017/ital.v31i2.1848.   8.   davie  et  al.,  spec  kit  238,  8.   9.   gregory  h.  march,  “surveying  campus  gis  and  gps  users  to  determine  role  and  level  of   library  services,”  journal  of  map  &  geography  libraries  7,  no.  2  (2011):  170–71,   http://dx.doi.org/10.1080/15420353.2011.566838.   10.   davie  et  al.,  spec  kit  238,  5.     11.   george  j.  soete,  spec  kit  219:  transforming  libraries  issues  and  innovation  in  geographic   information  systems.  (washington,  dc:  association  of  research  libraries,  office  of   management  services,  1997),  5.   12.   camila  gabaldón  and  john  repplinger,  “gis  and  the  academic  library:  a  survey  of  libraries   offering  gis  services  in  two  consortia,”  issues  in  science  and  technology  librarianship  48   (2006),  http://dx.doi.org/10.5062/f4qj7f8r.   13.   davie  et  al.,  spec  kit  238,  5.   14.   soete,  spec  kit  219,  9.   15.   davie  et  al.,  spec  kit  238,  10.   16.   dodsworth,  getting  started  with  gis,  165.   17.   davie  et  al.,  spec  kit  238,  9.   18.   d.  n.  phadke,  geographical  information  systems  (gis)  in  library  and  information  services  (new   delhi:  concept,  2006),  36–37.   19.   ibid.,  13.   20.   ibid.,  74.   21.   rhonda  houser,  “building  a  library  gis  service  from  the  ground  up,”  library  trends  55,  no.  2   (2006):  325,  http://dx.doi.org/10.1353/lib.2006.0058.   22.   melissa  lamont  and  carol  marley,  “spatial  data  and  the  digital  library,”  cartography  and   geographic  information  systems  25,  no.  3  (1998):  143,   http://dx.doi.org/10.1559/152304098782383142.     
information  technology  and  libraries  |  march  2015   50   23.   carolyn  d.  argentati,  “expanding  horizons  for  gis  services  in  academic  libraries,”  journal  of   academic  librarianship  23,  no.  6  (1997):  463,   http://dx.doi.org/10.1559/152304098782383142.   24.   soete,  spec  kit  219,  11.   25.   carol  cady  et  al.,  “geographic  information  services  in  the  undergraduate  college:   organizational  models  and  alternatives,”  cartographica  43,  no.  4  (2008):  249,   http://dx.doi.org/10.3138/carto.43.4.239.   26.   houser,  “building  a  library,”  325.   27.   r.  b.  parry  and  c.  r.  perkins,  eds.,  the  map  library  in  the  new  millennium  (chicago:  american   library  association,  2001),  59–60.   28.  patrick  florance,  “gis  collection  development  within  an  academic  library,”  library  trends  55,   no.  2  (2006):  223,  http://dx.doi.org/10.1353/lib.2006.0057.   29.   houser,  “building  a  library,”  325.   30.   ibid.,  323.   31.   ibid.,  322.   32.   parrish.  “improving  gis,”  329.   33.   ibid,  336.   34   florance,  “gis  collection  development,”  222.   35.    soete,  spec  kit  219,  6.   36.    dodsworth,  getting  started  with  gis,  165.   37.   soete,  spec  kit  219,  8.   38.   gabaldón  and  repplinger,  “gis  and  the  academic  library.”   39.   dodsworth,  getting  started  with  gis,  164.   40.   houser,  “building  a  library,”  323.   41.   dodsworth,  getting  started  with  gis,  161–62.         geographic  information  and  technologies  in  academic  research  libraries  |  holstein   51   appendix   responding  institutions   arizona  state  university  libraries   university  of  michigan  library   auburn  university  libraries   michigan  state  university  libraries   boston  college  libraries   university  of  nebraska–lincoln  libraries   university  of  calgary  libraries  and  cultural  resources   new  york  university  libraries   university  of  california,  los  angeles,  library   university  of  north  carolina  at  chapel  hill  libraries   university  of  california,  riverside,  libraries   north  carolina  state  university  libraries   university  of  california,  santa  barbara,  libraries   northwestern  university  library   case  western  reserve  university  libraries   university  of  oregon  libraries   colorado  state  university  libraries   university  of  ottawa  library   columbia  university  libraries   university  of  pennsylvania  libraries   university  of  connecticut  libraries   pennsylvania  state  university  libraries   cornell  university  library   purdue  university  libraries   dartmouth  college  library   queen’s  university  library   duke  university  library   rice  university  library   university  of  florida  libraries   university  of  south  carolina  libraries   georgetown  university  library   university  of  southern  california  libraries   university  of  hawaii  at  manoa  library   syracuse  university  library   university  of  illinois  at  chicago  library   university  of  tennessee,  knoxville,  libraries   university  of  illinois  at  urbana-­‐champaign  library   university  of  texas  libraries   indiana  university  libraries  bloomington   texas  tech  university  libraries   johns  hopkins  university  libraries   university  of  toronto  libraries   university  of  kansas  libraries   tulane  university  library   mcgill  university  library   vanderbilt  university  library   university  of  manitoba  libraries   university 
of waterloo library university of maryland libraries university of wisconsin–madison libraries massachusetts institute of technology libraries yale university library university of miami libraries york university libraries

reports and working papers

inclusion of nonroman character sets

the following document was prepared by staff of the library of congress as a working paper for discussions on incorporating the techniques described into the marc communications format. the document defines the principles for inclusion of nonroman alphabet character sets in the marc communications format and the procedural changes needed to allow implementation of the principles. this technique was agreed upon at the marbi committee meeting on february 2, 1981. any questions on the description of the inclusion of nonroman character sets in the marc communications format should be addressed to: library of congress, processing services, attention: mrs. margaret patterson, washington, dc 20540.

1. introduction

the cataloging rules followed by american libraries favor recording the title page data in the original script when possible. this helps those who consult catalogs to read the most essential information about the book. (reading his or her name in romanized form is just as difficult for someone who knows arabic as reading your name when it's written in arabic.) the new cataloging rules also specify that names and titles in notes be given in their original script, aacr2 1.7a.3. technological advances have made it possible to provide many, if not all, nonroman alphabets in machine-readable cataloging records. oclc and rlin are in the process of enhancing their systems so they can handle some nonroman writing systems. the library of congress has entered into a cooperative agreement with rlin for the development and use of an augmented rlin system for east asian (i.e., chinese, japanese, and korean) bibliographic data. although the library itself will not be creating and distributing marc records with nonroman characters in the near term, the goal of this proposal is to define how these data can be included now so others can do so soon. the technique known as an escape sequence announces that the codes which follow will represent letters in a specific different alphabet instead of the roman letters the codes would otherwise stand for.

2. principles

the following principles will govern inclusion of other alphabets in marc records. note that these deal only with the marc communications format record, not the details of its processing (keying, sorting, display, etc.) by any bibliographic agency or utility. these principles are a slightly revised version of ones reviewed and approved in principle by the marbi character set committee in 1976. the earlier version was also distributed that year as working paper n77 of iso tc46/sc4/wg1.
(1) standard character sets should be used when available.
(2) standard escape sequences should be used when available.
(3) escape sequences should be used only when needed.
(4) escape sequences are locking within a subfield but revert at any delimiter or field or record terminator code.
example (for demonstration purposes only, ec represents escape to cyrillic, ea escape to ascii, and f the field terminator):
245 10$aecrussian title proper :$becrussian subtitle.f
not
245 10$aecrussian title proper :ea$becrussian subtitle.eaf
and not
245 10$aecrussian title proper :$brussian subtitle.f
(5) records which contain an escape sequence will also contain a special field which specifies what unusual character sets are present.

3. implementation

the following will be done to realize these principles.
• the ala character set will be redefined (see table 1).
• a new character sets present field will be defined.
• details of application such as distribution, filing indicator values, etc., will be defined.

3.1 discussion: ala character set

a character set is a list of characters with the code used to represent each one. using this definition, the ala character set as given in appendixes iii.b and iii.c of marc formats for bibliographic data actually consists of eight character sets.
(1) ascii and ala diacritics and special characters with their eight-bit code.
(2) superscript zero to nine, plus, minus, open and close parentheses with their eight-bit code.
(3) subscript zero to nine, plus, minus, open and close parentheses with their eight-bit code.
(4) greek lowercase alpha, beta, and gamma with their eight-bit code.
(5-8) the same characters with their six-bit codes.

table 1. proposed revised ala character set (code chart not reproduced)

the six-bit character sets are used to distribute marc records on seven-track tapes. there are very few subscribers. it is unlikely that a method can be devised for distribution of nonroman character set records on such tapes. the present seven-track subscribers should be asked if they know of any way to do so. if they do not, the alternatives are to cease distribution of seven-track tapes entirely or to limit them to those records containing only roman alphabet characters (those without a character sets present field). in the latter case, they should pay proportionately less for their subscription.

the present four eight-bit character sets and their escape sequences do not conform to present standards. the present standards did not exist when the character sets were being defined. to avoid creating and distributing records containing both standard and nonstandard character sets and stan- […]. escape sequences would be given where needed in data fields. if necessary, it is permissible to embed escape sequences within a word. for example, a latin diacritic might be needed with an extended cyrillic letter to represent a letter in one of the nonslavic languages of central asia which uses the cyrillic alphabet.
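to make the locking behavior of principle (4) concrete, the sketch below walks a field's data and tracks which character set is in force, reverting to ascii at every subfield delimiter or terminator, and it collects the distinct escape sequences that the character sets present field of principle (5) would report. it is a simplified python illustration: the marc control characters are the standard ones, but the two-character sequence bodies and the set names attached to them are assumptions for demonstration, not registered iso escape sequences.

```python
# simplified illustration of principles (4) and (5): escape sequences lock
# within a subfield, revert at any delimiter or terminator, and each distinct
# sequence found would be reported once in a character sets present field.
# the sequence bodies ")w" / ")x" and their names are assumptions.
ESC = "\x1b"   # escape
US = "\x1f"    # subfield delimiter
FT = "\x1e"    # field terminator

SET_NAMES = {")w": "extended cyrillic", ")x": "greek"}

def scan_field(field_data):
    """return (character, active set) pairs plus the distinct escape sequences seen."""
    active, annotated, seen = "ascii", [], []
    i = 0
    while i < len(field_data):
        ch = field_data[i]
        if ch == ESC:                         # locking shift to another set
            body = field_data[i + 1:i + 3]    # assume two-character sequence bodies
            active = SET_NAMES.get(body, "unknown")
            if body not in seen:
                seen.append(body)             # each sequence reported only once
            i += 3
            continue
        if ch in (US, FT):                    # revert at delimiter or terminator
            active = "ascii"
        annotated.append((ch, active))
        i += 1
    return annotated, seen

# the escape is repeated in $b, as in the correct example under principle (4)
field = (US + "a" + ESC + ")w" + "russian title proper :" +
         US + "b" + ESC + ")w" + "russian subtitle." + FT)
annotated, present = scan_field(field)
print(present)   # [')w'] -> what a character sets present field would carry
```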
in addition to escape sequences for nonroman alphabets described above, in which one code stands for one letter, the escape standards also define escape sequence procedures for changing to multiple-byte character sets. because the ideographic writing systems of east asia use thousands of different characters, it will be necessary to use two or three bytes/codes to identify a single specific character uniquely. the japanese industrial standard character set, jis 6226, uses two bytes per character, and it has been submitted to iso to obtain a registered escape sequence. the first volume of the chinese character code for information interchange, cccii, has been issued; the second is expected in december. it uses three bytes per character. in all probability the lc/rlin east asian cooperative project will adopt either these character sets and their escape sequences or machine-reversible adaptations of them. the need to expand east asian character sets constantly to provide for infrequently used characters poses problems whose solutions cannot be predicted at this time.

table 3. escape sequence character sets: gost 13052-67 russian and iso dis 5427 extended cyrillic (code charts not reproduced)

3.3 discussion: character sets present field

as specified in the sixth principle, there is need for a special field which specifies what character sets are present whenever a set other than ascii and the ala extension of ascii is present in a record. the proposed field will use tag 066 and be defined as follows:

066 character sets present
this field specifies what character sets are present in the record other than ascii and the ala extension of ascii. the field is not repeatable. both indicators are unused and will contain blanks.
$a this subfield will contain all but the first character of the escape sequence to the default character set in columns 2-7 whenever the default character set is not ascii. this is not likely to occur in records created in the united states. since there can only be one default character set, the subfield is not repeatable.
$b this subfield will contain all but the first character of the escape sequence to the default character set in columns 10-15 whenever the default character set is not the ala extension of ascii. this is not likely to occur in records created in the united states. since there can be only one default extension character set, this subfield is not repeatable.
$c this subfield will contain all but the first character (or all but the first if a longer escape sequence is used) of every escape sequence found in the record. if the same escape sequence occurs more than once, it will be given only once in this subfield. the subfield is repeatable. this subfield does not identify the default character sets.

example (indicators blank): a record containing the iso extended cyrillic character set would carry $c)w; a record containing both the iso greek and extended cyrillic character sets would carry $c)w$c)x.

3.4 discussion: other details
when a field has an indicator to specify the number of leading characters to be ignored in filing and the text of the field begins with an escape sequence, the length of the escape sequence will not be included in the character count. when fields contain escape sequences to languages written from right to left, the field will still be given in its logical order. for example, the first letter of a hebrew title would be the eighth character in a field (following the indicators, a delimiter, a subfield code, and a three-character escape sequence). the first letter would not appear just before the end of field character and proceed backwards to the beginning of the field. a convention exists in descriptive cataloging fields that subfield content designation generally serves as a substitute for a space. an escape sequence can occur within a word, after a subfield code, or between two words not at a subfield boundary. for simplicity, the convention that an escape sequence does not replace a space should be adopted. one other convention is also advocated: when a space, subfield code, or punctuation mark (except open quote, pareports and working papers 215 renthesis or bracket) is adjacent to an escape sequence, the escape sequence will come last. wayne davison of rlin raised the following issue. after the library of congress has prepared and distributed an entirely romanized cataloging record for a russian book, a library with access to automated cyrillic input and display capability will create a record for the same book with the title in the vernacular. (since aacr2 says to give the title in the original script "wherever practicable," the library could be said to be obligated to do so.) in such an event the local record could have all the authoritative library of congress access points. to keep this record current when the library of congress record is revised and redistributed, it would be necessary to carry the lc control number in the local record . most automated systems are hypersensitive to the presence of two records with the same control number. the two records can be easily distinguished: in the library of congress record, the modified record byte in field 008 will be set to "o" and it will not have any 066, character sets present field. a comparison of oclc, rlg/rlin, and wln university of oregon library the following comparison of three major bibliographic utilities was prepared by the university of oregon library's cataloging objectives committee, subcommittee on bibliographic utilities. members of the subcommittee were elaine kemp, acting assistant university librarian for technical services; rod slade, coordinator of the library's computer search service; and thomas stave, head documents librarian. the subcommittee attempted to produce a comparison that was concise and jargonfree for use with the university community in evaluating the bibliographic utilities under consideration. the university faculty library committee was enlisted to review this document in draft jorm and held three meetings with the subcommittee for that purpose. the document was also shared with library faculty and staff in order to elicit suggestions for revision. president's column 114 information technology and libraries | september 2006 b eing president of a dynamic organization like lita is truly a humbling experience. every day i am awestruck by the dedication, energy, creativity, and excitement exhibited by lita’s members. 
i see it in everything that lita does, from its stellar publications and communications—including this journal, ital—to its programming and contribution to standards and system development. none of this would be possible without the hard work of all the dedicated members who volunteer their time not only to advancing their own professional development, but also to advancing the profession. thank you all. for forty years now, lita members have been dedicated to the association’s work, and we have been celebrating our fortieth anniversary throughout 2006. the celebration continues as we prepare to convene in nashville for the ninth lita national forum, october 26– 29, 2006. lita has had a long tradition of providing quality conferences. the first, held in 1970, was the conference on interlibrary communications and information networks, more familiarly known as the “airlie conference,” which had published proceedings. the second was a cooperative effort held in 1971 with the library education division and the american society for information science (asis), entitled “directions in education for information science: a symposium for educators.” in later years, lita held three national conferences: baltimore (1983), boston (1988), and denver (1992). in 1996, lita and the library administration and management association (lama) held a joint conference in pittsburgh. while the national conferences were very successful, the idea of a more informal, intimate event to be held annually took form, and in 1998 lita held its first annual national forum. next year we will continue the tradition of successful conference programming as we celebrate the tenth anniversary of the lita national forum in denver. this year’s theme is “netville in nashville: web services as library services.” we have an exciting lineup of keynote and concurrent-session speakers as well as several poster-session presenters who will stimulate lively discussions in all of the wonderful, informal networking opportunities this small conference offers. the sponsor showcase allows plenty of time for attendees to talk to our valued sponsors and learn more about their products. the two preconference programs offer in-depth experiences: “opensource installfest” and “developing best project management practices for it projects.” lita bloggers will be out in force producing summaries and reactions to it all. one of lita’s strongest membership benefits is the personal networking opportunities it provides. by providing an informal and enjoyable atmosphere, the national forum is one of the best places to network with others dealing with the same issues as you. i hope to see you there. besides the national forum (just one of lita’s many educational programs), one of the things i like most about lita is its flexibility to quickly accommodate programming to cover the latest issues and trends. lita’s programming at ala annual conferences attracts attendees from all divisions for this reason. every year, the highly successful top technology trends attracts more and more people who come to listen to the experts speak on the latest trends. the lita interest groups, like the technologies they focus on, also exhibit great flexibility because they can come and go—it’s easy to locate a few other members to create a new group where interested parties can come together for focused discussions or formal presentations. 
since its inception, lita has had traveling educational programs to provide programming opportunities for people who cannot attend the ala conferences. these in-depth programs, now called the regional institutes, focus on a topic and are offered as long as that issue is relevant. look for new electronic delivery of lita programs in the future. of course, lita’s publications provide a very lasting educational component. lita launched journal of library automation (jola), the predecessor of ital, in 1968, one year after the formation of the new division of ala. jola and, later, ital have consistently been a place for library information technologists to publish in a peer-reviewed scholarly journal. these well-respected publications have had a wonderful group of editors and editorial boards over the years. we are pleased that ital is now available online for members from the moment of publication. i want to thank all the people who work so hard to produce this publication on a quarterly basis. i also want to thank all the authors who submit their research for publication here and make a lasting contribution to the profession. all of these programs are just a sampling of what lita provides its members. is it any wonder i am awed by it all? i hope you are as well. i also hope that, in my year as your president, you will communicate with me in an open dialogue on the lita blog, via e-mail, or in person at conferences regarding how lita can better meet your needs as a member. we have been focusing a great deal on our educational goal because that is what we have heard you want out of lita. i encourage you to let me and the rest of the lita board know how we can best deliver a quality set of educational programs. president’s column bonnie postlethwaite bonnie postlethwaite (postlethwaiteb@umkc.edu) is lita president 2006/2007 and associate dean of libraries, university of missouri–kansas city. data center consolidation at the university at albany rebecca l. mugridge and michael sweeney information technology and libraries | december 2015 18 abstract this paper describes the experience of the university at albany (ualbany) libraries’ migration to a centralized university data center. following an introduction to the environment at ualbany, the authors discuss the advantages of data center consolidation. lessons learned from the project include the need to participate in the planning process, review migration schedules carefully, clarify costs of centralization, agree on a service level agreement, communicate plans to customers, and leverage economies of scale. introduction data centers are facilities that house servers and related equipment and systems. they are distinct from data repositories, which collect various forms of research data, although some data repositories are occasionally called data centers. many colleges and universities have data centers or server rooms distributed across one or more campuses, as does the university at albany (ualbany). this paper reports on the experiences of the libraries at ualbany as the libraries’ application and storage servers were consolidated into a new, state-of-the-art, university data center in a new building on campus. the authors discuss the advantages of consolidation, the planning process for the actual move, and lessons learned from the migration. background the university at albany is one of four university centers that are part of the state university of new york (suny) system. 
founded in 1844, ualbany has approximately 13,000 undergraduates, 4,500 graduate students, and more than 1,000 faculty members. it offers 118 undergraduate majors and minors, and 138 master’s, doctoral, and certificate programs. ualbany resides on three campuses: uptown (the main campus), downtown, and east.1 the uptown campus was built in the 1960s on grounds formerly owned by the albany country club. the campus was designed by noted architect edward durell stone in 1962–63 and was built in 1963–64. the campus buildings include four residential quadrangles surrounding a central “academic podium” consisting of thirteen three-story buildings connected on the surface by an overhanging canopy and below ground by a maze of tunnels and offices. many of the university’s classrooms, lecture halls, academic and operational offices, and infrastructure are housed within the podium on the basement or subbasement levels. this includes the university’s original data center, which is located in a basement room in the center of the podium. rebecca l. mugridge (rmugridge@albany.edu) is interim dean and director and associate director for technical services and library systems, and michael sweeney (msweeney2@albany.edu) is head, library systems department, university libraries, university at albany, albany, new york. mailto:rmugridge@albany.edu mailto:msweeney2@albany.edu data center consolidation at the university of albany | mugridge and sweeney | doi: 10.6017/ital.v34i4.8650 19 while visually striking and unique, the architectural design of the podium presented many challenges since its construction, one of which is regular flooding of the basement and subbasement levels. the original data center was flooded many times, to the extent that any heavy rainstorm had the potential to disrupt functionality and connectivity. when the university was first built in the 1960s it was not known to what extent computing would become part of the university’s infrastructure, and the room that the data center was housed in was not built to today’s standards for environmental control, such as the need for cooling. at the same time, server rooms sprouted all over the university, with many of the colleges and other units purchasing servers and maintaining server rooms in less than ideal conditions. these included server rooms in the college of arts and sciences, the school of business, the athletics department, the university libraries, and many other units. university libraries’ server room the university libraries maintained its own server room with two racks full of equipment that supported all of the libraries’ computing needs. these servers supported our website, mssql and mysql databases, ezproxy, illiad (interlibrary loan service), ares (electronic reserve service), and our search engine appliance (google mini). they also included our domain controller, intranet, and several servers used for backup. two servers and a storage area network housed our virtual environment, containing an additional nine virtual servers. these included servers to support library blogs, wikis, file storage, development and test servers, and additional backup servers. the only library servers not housed in the libraries’ server room were the integrated library system (ils) servers that were maintained primarily by the university’s information technology services (its) staff, our backup domain controllers, and a server holding backups of our virtual servers. 
the ils production server was housed in ualbany’s data center and the ils test/backup server was housed in the alternate data center in another building on campus. also, two of the libraries’ backup servers for other applications were housed in the university data center. the libraries’ server room consisted of a 340 square foot room on the third floor of the main campus library that was networked to support servers housed in two racks protected by a fire suppression system. there were two ceiling-mounted air conditioning units that cooled the room sufficiently for optimum performance. the libraries’ windows system administrator’s office was nearby and had a connecting door to the server room, giving him ready access to the servers when needed. data center consolidation data center consolidation is defined as “an organization's strategy to reduce it assets by using more efficient technologies. some of the consolidation technologies used in data centers today include server virtualization, storage virtualization, replacing mainframes with smaller blade server systems, cloud computing, better capacity planning and using tools for process automation.”2 in addition to the investigation and use of these technologies, the planning for a information technology and libraries | december 2015 20 new data center often involves the construction of a new building or the renovation of a current building. there were several drivers behind the ualbany’s decisions to build a new data center. in addition to the concerns mentioned above about the potential flooding risk of the current data center, the ability to manage optimum temperature was also a factor. the current data center was built to house 1960s-era equipment and was not able to keep up with the cooling requirements of the more extensive computing equipment in use in the twenty-first century. the current data center also occupied what is considered prime real estate at the university, at the center of campus and near the lecture center, which experiences high foot traffic during the academic year. the new data center was constructed near the edge of campus, with little foot or auto traffic, allowing the space previously occupied by equipment to be repurposed in a way that better meets the university’s needs. like many other universities, ualbany is increasingly making use of cloud computing capabilities. for example, the email and calendaring system are cloud-based. nevertheless, this movement is being made in a deliberate and thoughtful way, leaving many of our administrative computing needs reliant on the use of physical servers. ualbany and the libraries have decreased the number of physical servers necessary by relying on a virtualized environment, and part of the project to move to the new data center included a conversion from physical to virtual servers. the libraries’ ils production and test servers remain physical, as do several of the other libraries’ application servers. many of the libraries’ backup servers are now virtual. while there was no official mandate to consolidate all of the distributed server rooms across campus into the new data center, everyone involved understood that this was a direction the university administration supported. the libraries’ dean and director also supported this effort on behalf of the libraries and charged libraries’ staff to collaborate with its to make this happen. 
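a consolidation of this kind typically rests on a machine-readable inventory of servers, the applications they host, and the dependencies between those applications. the sketch below is a minimal, hypothetical illustration of using such an inventory to order a migration so that each application is brought back up only after the services it depends on; the application names and dependency list are invented for illustration (the ares-on-blackboard dependency echoes one mentioned later in this article) and do not describe ualbany's actual inventory.

```python
# hypothetical inventory: application -> the applications it depends on.
DEPENDS_ON = {
    "blackboard": [],
    "ares": ["blackboard"],   # electronic reserves integrated into the course system
    "ils": [],
    "website": ["ils"],
    "ezproxy": [],
}

def move_order(depends_on):
    """return an order in which to restore applications so that every
    dependency is back up before the application that needs it."""
    order, done = [], set()

    def visit(app, path=()):
        if app in done:
            return
        if app in path:
            raise ValueError(f"circular dependency involving {app}")
        for dep in depends_on.get(app, []):
            visit(dep, path + (app,))
        done.add(app)
        order.append(app)

    for app in depends_on:
        visit(app)
    return order

print(move_order(DEPENDS_ON))
# ['blackboard', 'ares', 'ils', 'website', 'ezproxy']
```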
some of the drivers behind this decision include the promise of a better environment, improved security, backup generators for computing equipment, the use of its’s virtual environment, the automation of server management, a faster network, the ability to repurpose the libraries’ server room, and more. these drivers are described in more detail later in this paper. construction planning for ualbany’s new data center began in the mid-2000s and included the identification of funding and the architectural design of the new building, later to be named the information technology building (itb). the actual construction began in 2013, with an estimated completion date of february 2014 and occupancy in april 2014. unexpected challenges during construction delayed the timeline somewhat, and the construction was not completed until may 2014. the certificate of occupancy was granted in fall 2014. the data center is certified as tier iii by uptime institute,3 and the building is designated leed gold. data center consolidation at the university of albany | mugridge and sweeney | doi: 10.6017/ital.v34i4.8650 21 alternate data center simultaneously with the construction of the new data center, the university entered into an agreement with another suny institution to house our alternate data center. this center was originally housed in another building on the ualbany campus, less than a mile from both the main data center and itb, in a building leased by ualbany. this situation left some environmental issues out of our control, not an ideal situation. for example, an air conditioner failure in fall 2013 caused our backup and test ils server to be down for six days, affecting our ability to use that server for other purposes and holding up several projects. in addition, data center best practice calls for an alternate data center to be housed at a distance from the main data center. in february of 2014, the servers in the alternate data center were moved to their new location. this included the libraries’ backup ils server as well as two backup servers formerly housed in the main data center. advantages to the libraries moving to the university data center there were many advantages to the libraries moving to a centralized data center. many of these advantages also applied to the other units considering a move to the new data center, but for the purposes of this paper, we are addressing them in the context of the libraries’ experience. repurpose space the libraries’ server room occupied a large office that could be repurposed to house multiple staff offices or student spaces. the libraries have many group study rooms available for student use; however, they are in great demand, and the possibility of gaining more space for student use was seen as an advantage to making the move to a new data center. climate control the new data center is built on a raised floor that allows better air circulation. hundreds of servers and other pieces of equipment create a lot of excess heat, and raised floor construction allows for better circulation of air. new racks have chimneys that exhaust heat from high-density computing environments. air conditioners supply a constant stream of air that will maintain the optimum temperature for computing equipment. censors continually monitor humidity and keep it at an optimal level. 
this was an improvement over the libraries’ current server room, which had sufficient air conditioning for our relatively small number of physical servers but did not have backup generators to keep equipment running during a power outage. backup generators the new data center was built with two backup generators. if the building suddenly loses power, the backup generators will immediately start and provide a seamless source of energy. a secondary benefit to the university is that the backup generators can also provide a source of energy to other buildings on that side of campus; this area did not previously have a backup source of energy. again, the libraries’ server room did not have a redundant electrical supply. in information technology and libraries | december 2015 22 the event of a power outage, battery units would allow the servers to shut down properly if the outage lasted more than forty-five minutes. security with server rooms scattered all over the university, security issues were a concern. now that the servers are housed in one location, the university can provide a highly secure environment in a more cost effective way. the new data center has card-swipe access to the building and biometric access to the data center itself. there are also cameras installed in the building as a further security measure. virtual environment although the libraries have made strides toward moving into a virtualized environment in the past few years, we had many constraints on our ability to keep up with developments. the libraries’ virtual environment was two versions behind ualbany’s virtual environment, and the storage needs of the libraries’ virtual environment were at capacity. part of the incentive to moving into the new data center was the ability to downsize some of our physical equipment and migrate some of our physical servers to virtual equivalents. automation of server management one of the benefits of consolidating servers into one environment is that they are in a secure location, but it is still possible to manage them from a distance. the virtual environment has a web-based console that allows system administrators to connect and manage them, and the physical servers can be managed over the network as well. even though the servers are centralized, our system administrator can work from an office in the library, or from home if needed. faster network part of the project to construct a new data center included the installation of an additional fiber network across campus. the new fiber network connects all buildings on campus with each other and the new data center. all of the network equipment was upgraded, providing faster connections and response time. the additional fiber network is fault tolerant: if the primary network fails, the second fiber network can immediately take its place with no loss of service. staging and work room the new data center was designed to include a staging and work room. this can be used by any of the system administrators who are responsible for equipment housed in the data center, and it allows them to work on equipment in a room adjacent to the locked and secure data center. data center consolidation at the university of albany | mugridge and sweeney | doi: 10.6017/ital.v34i4.8650 23 equipment inventory part of the planning for the migration involved creating a detailed inventory of equipment. the libraries already had a server inventory, but the information collected for the migration went far beyond just a list of the servers. 
this helped us identify who was responsible for physical and virtual servers and who was responsible for the services and applications that ran on those servers. creating the equipment inventory also allowed us to consolidate and decommission equipment that was no longer needed, and additionally helped us determine a prioritization and time line for the move. applications inventory in addition to creating an equipment inventory, the libraries created an applications inventory that included information about the dependencies that applications had on each other. for example, the libraries’ electronic resources reserve application (ares) had been integrated into the university’s course management system, blackboard. that meant that when blackboard was inaccessible, ares was as well. all of these dependencies had to be taken into account when planning the schedule for the move. disadvantages to the libraries moving to the university data center the libraries have noted few disadvantages to moving to the new data center. what might seem like disadvantages are in reality just a change in the way we do our work. for example, we have been asked to inform someone in its before we go to the new data center to work on a server. this is a simple step, and has not hindered our work at any time. another change is the need to use a tool created by its to configure our virtual servers, and we found that the tool has been configured to give us fewer administrative options than what its staff have. this has reinforced our understanding that we need to be present and proactive in representing the libraries’ interests in managing all of our computing equipment and software. migration days the majority of ualbany’s servers were moved from the main data center to the new one in itb on august 9, 2014. however, we were unwilling to move all of the libraries’ servers on that day, which fell in the middle of the summer session. a compromise was reached between the libraries and its that allowed many of the libraries’ less mission-critical servers to be moved on the same day as the university’s servers. these servers were primarily ones that were used for development and backup purposes, one exception being the server that supported the libraries’ electronic reserves service. this server was dependent on the university-supported blackboard server, which was being moved on august 9, so the libraries’ agreed to move this server that day so there would not be two downtimes for the electronic reserve system. the libraries’ most critical servers were moved to itb on august 18, 2014. this was the first day of intersession and would affect students and faculty the least. there were many people involved information technology and libraries | december 2015 24 in the move, including the library systems staff, the migration consulting firm staff, the professional moving company that was hired to carry out the move, and its staff who were responsible for the network and other support. move activities included shutting down and backing up applications, powering off the servers, and packing the equipment. at itb the equipment was unpacked, placed in its assigned rack location, plugged in, and powered on. then each server had to be started, and applications tested. all of this activity began at 3:00 a.m. and continued until early afternoon. the day concluded with a conference call between all parties involved to confirm that everything was up and running as expected. 
lessons learned participate in the process the libraries were invited to participate in the planning for a new data center early in the process. its, ul, and other units with significant server collections met and discussed their computing needs and respective computing infrastructures. once the construction of itb began, the planning ramped up and monthly meetings of stakeholders became weekly meetings. agendas for these meetings included round robin reports about • construction project oversight; • migration consulting; • partnerships (with other units on campus, including the libraries); • status of our alternate data center (housed 10 miles away at another suny institution); • campus fiber network; • internal wiring and network design; • administrative computing planning and move; • research computing planning and move; • systems management (storage and virtual environment) planning and move; • data center advisory structure; and • campus notification and public relations. these meetings gave us an opportunity to learn about and understand all aspects of the data center migration project. participants reviewed project timelines and other documents that were housed on a shared wiki space. after the data center migration consultants were hired, they began to use the microsoft onedrive collaboration space to share and distribute documents. meeting regularly with all project participants allowed us to ask questions to clarify priorities and timelines and to advocate for the libraries’ needs. review schedules carefully as with many construction projects, unexpected delays in the construction of the data center delayed all of our plans. originally the building was to be completed in february; this was later data center consolidation at the university of albany | mugridge and sweeney | doi: 10.6017/ital.v34i4.8650 25 changed to april and then may. after the construction was complete, the building had to be commissioned, which means that every system within the building had to be tested independently by outside inspectors. coordination of this work is very time consuming, and the completion of the commissioning delayed occupancy by another few months. the university was finally given permission to move equipment into the data center in july. in the meantime, our consultants were working feverishly to develop timelines for the move, identify, and secure a contract with a professional it moving company, and create “playbooks” for each move. the playbook is a document that includes • names and contact information of everyone involved with the move; • sequence of events: an hour-by-hour description of all activities; • server overview: including the name, make, model, rack location and elevation, and contact person for each server; and • schematics of both old and new server locations including details about each server as well as rack locations and elevations. library staff became concerned when the original date scheduled to move most of the university’s application servers, including the libraries’ ils server, was in the middle of the summer session. although the projected downtime was only to be twelve hours (and probably fewer), library staff were not willing to have twelve hours’ downtime during a short four-week summer session. there were concerns that downtime, not only to the online catalog, but also to all of the libraries’ databases, the website, online reference service, electronic reserves, and other resources would present a severe hardship to faculty and students. 
we also recognized the risk, however small, of something going wrong during the move that would cause a lengthier downtime. at the same time the university was concerned about pushing the move too close to the start of the fall semester, as well as the increased cost of scheduling a second move date. during these negotiations it became apparent that the libraries’ needs are different from administrative computing needs. whereas the middle of a semester is a poor time for libraries’ servers to experience downtime, it can be a better time for administrative computing, which is often busier during intersession when grading reports are being run and personnel databases are being updated. ultimately, the libraries advocated for and secured an agreement for a second move date, scheduled for the first work day after the end of the summer session. similarly, its was encouraging all of its partners across the campus to move as much computing as possible into their virtual environment. this is a worthwhile goal, but again the libraries had to negotiate to make this change according to the schedule best for the libraries and its users. the its virtual environment was a more current release of the virtual machine (vm) software than the libraries were using, so the libraries were faced with not only a migration, but also an upgrade. ultimately, we postponed the vm migration until after the physical migration, and we have information technology and libraries | december 2015 26 benefited from waiting. other partners have had to work through a number of kinks in the process, and the libraries’ vm migration has benefitted from the other partners’ experience. clarify costs of centralization when ualbany began to consider and plan for a centralized data center, one of the concerns raised by the various data center managers from units other than its was the cost of centralizing their servers in another location. centralized data centers have many costs: heating, cooling, security, staffing, cleaning, backup energy sources, networking costs, and more. the question on everyone’s mind was who was going to pay for these costs. would each unit have to pay toward the maintenance of the data center? some objected to the idea of having to pay to be a tenant in a centralized data center, when they already had their own data center or server room at what seemed like no cost. the only cost they experienced was an opportunity cost of what else they could use the server room for. in the libraries’ case, the server room could be used for group study, office space, or other purposes, but it did not cost the libraries money to use it as a server room because utilities are covered centrally by the university. on the other hand, by migrating some of our computing to the its virtual environment, we may save money in the long run because we will not have to replace hardware and pay warranty fees. after much negotiation the university settled on a five-year commitment to no charges for the partnering units on campus, including the libraries. this agreement was documented in a partnership agreement drafted by a group of representatives from all of the key units involved. contribute to the development of a service level agreement library staff contributed to the development of a service level agreement (sla) for our participation in a centralized data center. having an sla in place ensures that all parties to the agreement understand their rights and responsibilities. 
we began by searching other universities’ websites for samples of slas, which we shared with its staff who were assigned to this project. the establishment of a centralized data center includes several major elements: data center as a service (dcaas), infrastructure as a service (iaas), as well as the network that connects it all. the sla that was developed, still in draft form, has elements that address the following: • the length of the agreement • network uptime • infrastructure as a service o server/storage environment and technical support o access to iaas o file backup and retention o maintenance of partner systems o its scheduled maintenance o data ownership, security, responsibility, and integrity data center consolidation at the university of albany | mugridge and sweeney | doi: 10.6017/ital.v34i4.8650 27 o business continuity, tiering, and disaster recovery o availability and response time of its staff • data center as a service o environment and support o building access and security o physical rack space o deliveries o scheduled maintenance o communications • glossary we recommend that institutions considering data center consolidation projects complete their sla and other agreements before moving servers into a shared environment. in our case, however, we were unable to finalize the sla prior to the actual move. this was not because of any particular demands that the libraries were making, but was primarily because of the rapid approach of the deadline for moving into the new data center. it had to be completed before the beginning of fall semester, and preferably with a few weeks to spare in case anything went wrong. while the planning for the data center construction and migration seemed to stretch over a long period of time, the final few months turned into a frenzy of activity that ranged from last-minute construction details to nailing down the exact order in which thousands of pieces of equipment would be moved. although not every detail was ironed out at the time of the move, the intentions and spirit of the sla have been documented and it will be completed during 2015. communicate developments and plans during the planning and development for the data center migration project, we recognized that it would be important to communicate any changes to the libraries’ systems availability to our users. its also recognized the need to communicate such changes. both its and the libraries took a many-pronged approach to communicating developments and plans related to the migration. within the libraries we shared updates at library faculty meetings as well as meetings of the library policy group (the dean’s administrative policy team). we sought feedback from many groups on proposed move dates, establishing intersession as the preferred time to move any libraries’ servers that would affect access to resources used by faculty or students. as the moves got closer the communication efforts were ramped up. within the libraries, we posted alerts on the libraries’ webpage that linked to charts indicating what services would be unavailable and when. we also included slides on the libraries’ main webpage with the same information. the same slide was posted on all three libraries’ flat-screen monitors, on which we post important news and dates. we sent mass emails to all libraries’ staff that reminded them when services would be down. staff members who were responsible for specific services made an effort to contact their customers directly. 
for example, the head of access services contacted faculty members about the scheduled interruptions of ares, our electronic resources reserves system. some of the downtime affected just users, and other downtime also affected staff who could not work in the ils during the move. we planned alternate activities for staff members who could not work during the downtime and had a productive division cleanup day instead. its also made great efforts to communicate to the university community about the moves and any potential downtime. their efforts included mass emails to all faculty, staff, and students. its created and posted slides to the libraries’ flat-screen monitors, as well as other monitors throughout the university. its also formed a team of liaisons from each school and college, using that group as yet another conduit to communicate changes. they shared draft schedules, seeking input on the effect of downtime on the university’s functions.

leverage economies of scale

one of the challenges of maintaining a distributed data center environment is that each system administrator or unit had to manage its own servers singlehandedly. in the case of the libraries at ualbany, we had moved in the direction of using the power of virtualization to manage many of our servers. virtualization refers to the process of creating virtual servers within one physical server, thereby multiplying the value of a single server many times. the libraries had virtualized a number of library servers, saving money by not having to purchase additional costly physical servers. however, its, with its greater purchasing power, was using more current and advanced virtualization software, hardware, and services than the libraries. its created a suite of services that allows system administrators access to the virtual environment so they can manage their virtual servers from their own offices. by moving into the its virtual environment (iaas), the libraries are able to leverage the economies of scale that it presents.

conclusion

the consolidation of distributed data centers or server rooms on university campuses offers many advantages to their owners and administrators, but only minimal disadvantages. the university at albany carried out a decade-long project to design and build a state-of-the-art data center. the libraries participated in a two-year project to migrate their servers to the new data center. this included the hiring of a data center migration consulting firm and the development of a migration plan and schedule for the physical move, which took place in late summer 2014. the authors have found that there are many advantages to consolidating data centers, including taking advantage of economies of scale, an improved physical environment, better backup services and security systems, and more. lessons learned from this experience include the value of participating in the process, reviewing migration schedules carefully, clarifying the costs of consolidation, contributing to the development of an sla, and communicating all plans and developments to the libraries’ customers, including faculty, staff, and students. as other university libraries consider the possibility of consolidating their data centers, the authors hope that this paper will provide some guidance to their efforts.
technical note

help: the automated binding records control system

an interesting new aspect of library automation has been the appearance of commercial ventures established to provide for an effective use of the new ideas and techniques of automation and related fields. some of these ventures have offered the latest in information science research and development techniques, such as systems analysis, management planning, and operations research. others have offered services based on new procedures, for example, computer-produced book catalogs, selective dissemination of information services, indexing and abstracting activities, mechanized acquisitions, and catalog card production systems. one innovation is a new technique devised for libraries to reduce the clerical effort required to prepare materials for binding and to maintain the necessary related records. the technique is called help, the heckman electronic library program. it was developed by the heckman bindery of north manchester, indiana, with the cooperation of the purdue university libraries. it was recognized by heckman's management that the processing of 10,000 to 20,000 periodicals weekly and the maintenance of over 250,000 binding patterns would soon become too unwieldy and costly unless more efficient procedures were developed. it was additionally realized that any new system should also be designed as a means to aid libraries with their interminable record-keeping problems. the latter purpose could be accomplished by providing a library with detailed and accurate information regarding each periodical it binds, and by simplifying the library's method of preparing binding slips for the bindery. in the fall of 1969, after a detailed analysis, the heckman bindery management began the development and programming of a computerized binding pattern system. this system was a result of a team effort involving management, sales, and production departments. john pilkington, data processing manager, directed the installation of the system, and earl beal performed the necessary programming functions. in december of 1971 approximately 700 libraries were using the system, and about 100,000 binding patterns were in the data file. as the system was developed, a library's binding pattern data were converted to machine-readable form, which then made it possible for the bindery automatically to provide nearly complete binding slips for each periodical title bound. in addition, the system provides an up-to-date pattern record for the libraries' files, and the bindery maintains the resultant data bank of pattern records as the library notifies it of additions, changes, and deletions. in this manner, the bindery expects to establish an efficient method for purging the file of out-of-date information. the system revolves around four forms: the binding pattern index card, the binding slip, the variable posting sheet, and the binding historical record. the binding pattern index card (figure 1) is a 5" x 8½" card, pink in color, which is a computer printout.
one of these cards is retained in the library as its pattern record for each set of each periodical bound by the library. the data given on the card are essentially the same as those maintained by most libraries in their manual pattern files, except that more detail is provided by the help system, and the library does not maintain the record in machine-readable form; the bindery does. as changes are made to the patterns, the library clerk simply crosses out the old data on the appropriate binding slip and writes in the new data. when the bindery receives the binding slip, a new index card is produced, among other records, and forwarded to the library with the returned shipment of bound volumes. the system also provides for one-time changes that do not affect the pattern record. the data contained on the index cards include the library account number, the library branch or department code, the pattern number, color, type size, stamping position, title (vertical or horizontal spine positions), labels, call number, library imprint, and collating instructions. the collating instructions, which are listed in the instruction manual provided by the bindery, are given as a series of numeric codes. asterisks are used to indicate the end of a print line. the binding slips are also 5" x 8½" forms, but they are four-part multiple forms, of which three parts are sent to the bindery with the periodical to be bound, and one part, a card form, is retained by the library as its "at bindery" record. the information required by the binding slip is essentially the same as that included on the index card. the library, however, must provide the variable data such as volume number(s), date(s), month(s), or whatever information is required to identify a specific volume. the variable posting sheet (figure 2) is an 8½" x 11" form that is used by the library when it sends several volumes or copies of a volume to the bindery at the same time. since the bindery cannot determine beforehand the number of physical volumes of a title a library will want to send for binding at a given time, it sends to the library only one printed-out binding slip to be used for the next volume of a given serial.

[figure 1. binding pattern index card.]

[figure 2. variable posting sheet.]

if multiple volumes of
a set are to be bound, the library clerk provides the variable information for the first volume by using the single binding slip, and the variable data for each additional volume of the same title are posted by the clerk on the posting sheet. the bindery will automatically produce from its pattern data bank the binding slips necessary for binding the additional volumes that are listed on the posting sheet. the binding historical record (figure 3) is a form provided for the use of the library if it desires a permanent record of every volume bound. the use of this form is not required by the system; it is simply a convenience record for the library binding staff. the form is printed on the back of the pattern index card. spaces are provided for volume, year, and date sent to the bindery, and most of the back of the card is available for posting. all data fields are of fixed length, with the maximum size of the records at 328 characters. some of the data formats are shown in figure 4. a few of the data fields in the example need additional explanation. the fifth field, labeled "print," refers to the color of the spine stamping, i.e., gold, black, or white. the "trim #1 & 2" fields are for bindery use only, and indicate volume size within certain groups for printing purposes. the "spine" field is also for bindery use, and it indicates the size of type that can be used according to the width of the spine. "product no." refers to certain types of publications such as magazines, matched sets, or items which will be pamphlet (inexpensively) bound.

[figure 3. binding historical record. the form provides spaces for the title and publisher's address and columns for volume, year, and date sent.]

[figure 4. data formats, shown on 96-column print/punch program control card layouts.]
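the fixed-length record layout lends itself to straightforward machine processing. the python sketch below shows how such a record might be sliced into named fields; the field names echo those discussed above, but the widths and offsets are hypothetical, since the published layouts in figure 4 do not survive legibly in this copy.

```python
# hypothetical fixed-width layout for a HELP-style binding pattern record.
# the article states only that fields are fixed length and that a record is
# at most 328 characters; these names and widths are illustrative.
FIELDS = [
    ("account_no", 6),        # library account number
    ("library_code", 2),      # branch or department code
    ("pattern_no", 6),
    ("color", 3),
    ("print", 1),             # spine-stamping color: gold, black, or white
    ("trim_1", 2),            # bindery use: volume-size group
    ("trim_2", 2),
    ("spine", 2),             # bindery use: type size allowed by spine width
    ("product_no", 2),
    ("title", 60),
    ("call_number", 30),
    ("collating_codes", 40),  # numeric codes; asterisks end a print line
]

def parse_pattern_record(record: str) -> dict:
    """slice a blank-padded, fixed-width record into named fields."""
    values, offset = {}, 0
    for name, width in FIELDS:
        values[name] = record[offset:offset + width].rstrip()
        offset += width
    return values
```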
[figure 5. pattern printing setup.]

one additional form used in the system is for heckman's internal operations. that is a data input form known as the "pattern printing setup" (figure 5). this form is used by the bindery's input clerks to prepare new binding patterns for conversion to machine-readable form. the data prescribed by the form is much like that required by the binding pattern index card, except that data tags are shown for keypunching purposes. the system operates on an ibm system 3 computer with two 5445 disk drives and a 1403 n1 printer. the disk drives provide a total of 40,000,000 characters of on-line storage in addition to the 7,500,000 usable characters provided by the system 3 itself. five 5496 data recorders are used for data conversion. the programs are written in rpg2. the development of computer-oriented commercial services for libraries suggests that, perhaps, if librarians wait long enough, they will not have to automate their libraries, as commercial ventures will do it for them. the rapid appearance of systems-analysis firms, commercial and societal abstracting and indexing services, management and planning consulting groups, and data processing service bureaus tends to bear this theory out. at the very least, libraries will not be able to automate internally without providing for the incorporation of such ready services into their systems. when a service such as help is made available at no additional charge, there is no way for libraries to avoid automation.

donald p. hammer

donald p. hammer is associate director for library and information systems, university of massachusetts library, amherst. at the time the system described in this article was developed, mr. hammer was the head of libraries systems development at purdue university.

title-only entries retrieved by use of truncated search keys

frederick g. kilgour, philip l. long, eugene b. liederman, and alan l. landgraf: the ohio college library center, columbus, ohio.

an experiment testing utility of truncated search keys as inquiry terms in an on-line system was performed on a file of 16,792 title-only bibliographic entries. use of a 3,3 key yields eight or fewer entries 99.0% of the time.

a previous paper (1) established that truncated derived search keys are efficient in retrieval of entries from a name-title catalog. this paper reports a similar investigation into the retrieval efficiency of truncated keys for extracting entries from an on-line, title-only catalog; it is assumed that entries retrieved would be displayed on an interactive terminal.
earlier work by ruecking (2), nugent (3), kilgour (4), dolby (5), coe (6), and newman and buchinski (7) investigated search keys designed to retrieve bibliographic entries from magnetic tape files. the earlier paper in this series and the present paper investigate retrieval from on-line files in an interactive environment. similarly, the work of rothrock (8) inquired into the efficacy of derived truncated search keys for retrieving telephone directory entries from an on-line file. since the appearance of the previous paper, the ohio state university libraries have developed and activated a remote catalog access and circulation control system employing a truncated derived search key similar to those described in the earlier paper. however, osu adopted a 4,5 key consisting of the first four characters of the main entry and the first five characters of the title, excluding initial articles and a few other nonsignificant words. whereas the osu system treats the name and title as a continuous string of characters, the experiments reported in this and the previous paper deal only with the first word in the name and title, articles always being excluded. the bell system has also recently activated a large traffic experiment in the san francisco bay area. the master file in this system contains 1,300,000 directory entries. the system utilizes truncated derived keys like those investigated in the present experiments.

materials and methods

the file used in this experiment was described in the earlier paper (1), except that this experiment investigates the title-only entries. the same programs used in the name-title investigation were used in this experiment; the title-only entries were edited so that the first word of the title was placed in the name field and the remaining words in the title field. as was the case formerly, it was necessary to clean up the file. single-word titles often carried in the second, or title, field such expressions as "one year subscription" or "vol 16 1968." in addition there were spurious character strings that were not titles, and in such cases the entire entry was removed from the file. thereby, the original 17,066 title entries were reduced to 16,792. the truncated search keys derived from these title-only entries consist of the initial characters of the first word of the title and of the second word of the title. if there was no second word, blanks were employed. if either the first or second word contained fewer characters than the key to be derived, the key was left-justified and padded out with blanks. to obtain a comparison of the effectiveness of truncated search keys derived from title-only entries as related to keys derived from name-title entries, a name-title entry file of the same number of entries (16,792) was constructed. a series of random numbers larger than the number of entries in the original name-title file (132,808) was generated, and one of the numbers was added to each of the 132,808 name-title entries in sequence. next the file was sorted by number so that a randomized file was obtained. then the first 16,792 name-title entries were selected. the same program analyzed keys derived from this file.

results

table 1 presents the maximum number of entries to be expected in 99% of replies for the file of 16,792 title-only entries as well as for the name-title file containing the same total of entries.
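the key derivation and the tabulation behind table 1 are both simple to express in code. the python sketch below is illustrative only: the comma in printed keys such as "jou,of" is treated as a display convention, only initial articles are stripped, and the 99% figure is computed over distinct keys, which is an assumption on our part since the paper speaks of putting random requests to the file.

```python
from collections import Counter

ARTICLES = {"a", "an", "the"}

def derive_key(title: str, m: int = 3, n: int = 3) -> str:
    """first m characters of the first title word and first n of the second,
    each left-justified and blank-padded, with initial articles excluded."""
    words = title.lower().split()
    while words and words[0] in ARTICLES:
        words.pop(0)
    first = words[0] if words else ""
    second = words[1] if len(words) > 1 else ""
    return first[:m].ljust(m) + "," + second[:n].ljust(n)

def reply_size_at(titles, m, n, coverage=0.99):
    """smallest reply size r such that roughly `coverage` of the distinct
    m,n keys derived from `titles` retrieve r or fewer entries."""
    counts = Counter(derive_key(t, m, n) for t in titles)   # entries per key
    sizes = sorted(counts.values())
    index = min(int(coverage * len(sizes)), len(sizes) - 1)
    return sizes[index]

assert derive_key("Journal of Library Automation") == "jou,of "
assert derive_key("Nature") == "nat,   "   # no second word: blank padding
```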
for example, when a large number of random requests are put to the title-only file using a 3,3 search key, the prediction is that 99.0% of the time, eight or fewer replies will be returned. however, in the case of the name-title file, only two replies will be returned 99.3% of the time. the 3,3 key produced only thirteen replies (.12% of the total number of 3,3 keys) containing twenty-one or more entries. the highest number of entries for a single reply for the 3,3 key was 235 ("jou,of" derived from "journal of"). the next highest number of entries for a single reply was 88 ("adv,in" for "advances in").

table 1. maximum number of entries in 99% of replies

                 title-only entries           name-title entries
search key   max. entries   percent       max. entries   percent
             per reply      of time       per reply      of time
2,2               ?             ?              7           99.0
2,3               ?             ?              4           99.6
2,4              11           99.0             3           99.5
3,2               9           99.1             3           99.2
3,3               8           99.0             2           99.3
3,4               8             ?              2           99.5
4,2               8           99.1             2           99.2
4,3               7           99.0             2           99.6
4,4               7           99.1             2           99.7
(entries marked ? are illegible in the source)

discussion

the two words from which the keys are derived in name-title entries constitute a two-symbol markov string of zero order, since the name string and title string are uncorrelated. however, the two words from which keys are derived in the title-only entry are first-order markov strings, since they are consecutive words from the title string and are correlated. the consequence of these two circumstances on the effectiveness of derived keys is clearly presented in table 1. the keys from name-title entries consistently produce fewer maximum entries per reply. therefore, it is desirable to derive keys from zero-order markov strings wherever possible. the ohio state university libraries contain over two and a quarter million volumes, but on 9 february 1971 there were only 47,736 title-only main entries in the catalog. the file used in the present experiment is 35% of the size of the osu file. since 99% of the time the 3,3 key yields eight or fewer titles, it is clear that such a key will be adequate for retrieval for library on-line, title-only catalogs. the 3,3 key also possesses the attractive quality of eliminating the majority of human misspelling, as pointed out in the earlier paper (1). there remains, however, the unsolved problem of the efficient retrieval of such titles as those beginning with "journal of" and "advances in". it appears that it will be necessary to devise a special algorithm for those relatively few titles that produce excessively high numbers of entries in replies. in the previous investigation it was found that a 3,3 key yielded five or fewer replies 99.08% of the time from a file of 132,808 name-title entries. table 1 shows that for a file of only 16,792 entries the 3,3 key produces two or fewer replies 99.3% of the time. these two observations suggest that as a file of bibliographic entries increases, the maximum number of entries per reply does not increase in a one-to-one ratio, since the maximum number of entries rose from two to five while the total size of the file increased roughly eightfold. further research must be done in this area to determine the relative behavior of derived truncated keys as their associated file sizes vary.

conclusion

this experiment has produced evidence that a series of truncated search keys derived from a first-order markov word string in a bibliographic description yields a higher number of maximum entries per reply than does a series derived from a zero-order markov string.
however, the results indicate that the technique is nonetheless sufficiently efficient for application to large on-line library catalogs. use of a 3,3 search key yields eight or fewer entries 99.0% of the time from a file of 16,792 title-only entries.

acknowledgment

this study was supported in part by national agricultural library contract 12-03-01-5-70 and by office of education contract oec-0-72-2289 (506).

references

1. f. g. kilgour, p. l. long, and e. b. leiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7 (1970): 79-82.
2. f. h. ruecking, jr., "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation 1 (december 1968): 227-38.
3. w. r. nugent, "compression word coding techniques for information retrieval," journal of library automation 1 (december 1968): 250-60.
4. f. g. kilgour, "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science 5 (1968): 133-36.
5. j. l. dolby, "an algorithm for variable-length proper-name compression," journal of library automation 3 (december 1970): 257-75.
6. m. j. coe, "mechanization of library procedures in the medium-sized medical library: x. uniqueness of compression codes for bibliographic retrieval," bulletin of the medical library association 58 (october 1970): 587-97.
7. w. l. newman and e. j. buchinski, "entry/title compression code access to machine readable bibliographic files," journal of library automation 4 (june 1971): 72-85.
8. h. i. rothrock, jr., computer-assisted directory search; a dissertation in electrical engineering (philadelphia: university of pennsylvania, 1968).

open search environments: the free alternative to commercial search services

adrian o’riordan

information technology and libraries | june 2014

abstract

open search systems present a free and less restricted alternative to commercial search services. this paper explores the space of open search technology, looking in particular at lightweight search protocols and the issue of interoperability. a description of current protocols and formats for engineering open search applications is presented. the suitability of these technologies and issues around their adoption and operation are discussed. this open search approach is especially useful in applications involving the harvesting of resources and information integration. principal among the technological solutions are opensearch, sru, and oai-pmh. opensearch and sru realize a federated model to enable content providers and search clients to communicate. applications that use opensearch and sru are presented. connections are made with other pertinent technologies such as open-source search software and linking and syndication protocols. the deployment of these freely licensed open standards in web and digital library applications is now a genuine alternative to commercial and proprietary systems.

introduction

web search has become a prominent part of the internet experience for millions of users. companies such as google and microsoft offer comprehensive search services to users free with advertisements and sponsored links, the only reminder that these are commercial enterprises.
businesses and developers on the other hand are restricted in how they can use these search services to add search capabilities to their own websites or for developing applications with a search feature. the closed nature of the leading web search technology places barriers in the way of developers who want to incorporate search functionality into applications. for example, google’s programmatic search api is a restful method called google custom search api that offers only 100 search queries per day for free.1 the limited usage restrictions of these apis mean that organizations are now frequently looking elsewhere for the provision of search functionality. free software libraries for information retrieval and search engines have been available for some time allowing developers to build their own search solutions. these libraries enable search and retrieval of document collections on the web or offline. web crawlers can harvest content from multiple sources. a problem is how to meet users’ expectations of search efficacy while not having adrian o’riordan (a.oriordan@cs.ucc.ie) is lecturer, school of computer science and information technology, university college, cork, ireland. open search environments: the free alternative to commercial search services | o’riordan 46 the resources of the large search providers. reservations about the business case for free open search include that large-scale search is too resource-hungry and the operational costs are too high; but these suppositions have been challenged.2 open search technology enables you to harvest resources and combine searchers in innovative ways outside the commercial search platforms. further prospects for open search systems and open-source search lie in areas such as peer-to-peer, information extraction, and subject-specific technology.3 many search systems unfortunately use their own formats and protocols for indexing, search, and results lists. this makes it difficult to extend, alter, or combine services. distributed search is the main alternative to building a “single” giant index (on mirrored clusters) and searching at one site, a la google search and microsoft bing. callan describes the distributed search model in an information retrieval context.4 in his model, information retrieval consists of four steps: discovering databases, ranking databases by their expected ability to satisfy the query, searching the most relevant databases, and merging results returned by the different databases. distributed search has become a popular approach in the digital libraries field. note that in the digital libraries literature distributed search is often called federated search.5 the federated model has clear advantages in some application areas. it is very hard for a single index to do justice to all the possible schemas in use on the web today. other potential benefits can come from standardization. the leading search engine providers utilize their own proprietary technologies for crawling, indexing, and the presentation of results. the standardization of result lists would be useful for developers combining information from multiple sources or pipelining search to other functions. a common protocol for declaring and finding searchable information is another desirable feature. standardized formats and metadata are key aspects of search interoperability, but the focus of this article is on protocols for exchanging, searching, and harvesting information. 
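callan's four-step loop (discover, rank, search, merge) maps directly onto code. in the python sketch below, the source type and its relevance estimate are hypothetical stand-ins for whatever protocol a real deployment would use to reach each collection; no particular api is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Source:
    name: str
    # step 2: an estimate of how well this database can satisfy the query
    expected_relevance: Callable[[str], float]
    # step 3: run the query, returning (document id, score) pairs
    search: Callable[[str], List[Tuple[str, float]]]

def federated_search(query: str, sources: List[Source], top_sources: int = 3):
    """rank the databases, search only the most promising ones, merge by score."""
    ranked = sorted(sources, key=lambda s: s.expected_relevance(query), reverse=True)
    merged: List[Tuple[str, float]] = []
    for source in ranked[:top_sources]:
        merged.extend(source.search(query))
    # step 4: a production system would normalize scores before merging
    return sorted(merged, key=lambda hit: hit[1], reverse=True)
```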
in particular, this article focuses on lightweight protocols, often rest (representational state transfer)–style applications. lightweight protocols place less onerous overheads on development in terms of adapting existing systems and additional metadata. they are also simpler. the alternative is heavyweight approaches to federated search, such as using web services or service-oriented architectures. there have been significant efforts at developing lightweight protocols for search, primary among which is the opensearch protocol developed by an amazon subsidiary. other protocols and services of relevance are sru, mxg, and the oai-omh interoperability framework. we describe these technologies and give examples of their use. technologies for the exchange and syndication of content are often used as part of the search process or in addition. we highlight key differences between protocols and give instances where technologies can be used in tandem. information technology and libraries | june 2014 47 this paper is structured as follows. the next section describes the open search environment and the technologies contained therein. the following section describes open search protocols in detail, giving examples. finally, summary conclusions are presented. an open search environment a search environment (or ecosystem) consists of a software infrastructure and participants. the users of search services and the content providers and publishers are the main participants. the systems infrastructure consists of the websites and applications that both publish resources and present a search interface for the user, and the technologies that enable search. technologies include publishing standards for archiving and syndicating content, the search engines and web crawlers, the search interface (and query languages), and the protocols or glue for interoperability. baeza-yates and raghavan present their vision of next-generation web search highlighting how developments in the web environment and search technology are shaping the next generation search environment.6 open-source libraries for the indexing and retrieval of document collections and the creation of search engines include the lemur project (and the companion indri search engine),7 xapian,8 sphinx,9 and lucene (and the associated nutch web crawler).10 all of these systems support web information retrieval and common formats. from a developer perspective they are all crossplatform; lemur/indri, xapian, and sphinx are in c and c++ whereas lucene/nutch is in java. xapian has language bindings for other programming languages such as python. the robustness and scalability of these libraries support large-scale deployment, for example the following large websites use xapian: citebase, die zeit (german newspaper), and debian (linux distribution).11 middleton and baeza-yates present a more detailed comparison of open-source search engines.12 they compare twelve open-source search engines including indri and lucene mentioned above across thirteen dimensions. features include license, storage, indexing, query preprocessing (stemming, stop-word removal), results format, and ranking. apache solr is a popular open-source search platform that additionally supports features such as database integration, real-time indexing, faceted search, and clustering.13 solr uses the lucene search library for the core information retrieval. 
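solr exposes this retrieval functionality over a simple http interface. the sketch below queries the standard /select handler of a hypothetical local core named "catalog" and reads the json response; only stock request parameters (q, rows, wt) are used.

```python
import requests

def solr_search(query: str, rows: int = 10):
    """query a local solr core over its http/json api and return the hit list."""
    response = requests.get(
        "http://localhost:8983/solr/catalog/select",   # hypothetical core name
        params={"q": query, "rows": rows, "wt": "json"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["response"]["docs"]          # solr wraps hits in "response"

# for doc in solr_search("open access repositories"):
#     print(doc.get("id"), doc.get("title"))
```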
solr’s rich functionality and the provision of restful http/xml and json apis makes it an attractive option for open information integration projects. in a libraries context singer cites solr as an open-source alternative for next-generation opac replacements.14 solr is employed, for example, in the large-scale europeana project.15 the focus of much of this paper is on the lightweight open protocols and interoperability solutions that allow application developers to harvest and search content across a range of locations and formats. in contrast, the delosdlms digital library framework exemplifies a heavyweight approach to integation.16 in delosdlms, services are either loosely or tightly coupled in a serviceoriented architecture using web service middleware. open search environments: the free alternative to commercial search services | o’riordan 48 particular issues for open search are the location and collection of metadata for searchable resources and the creation of applications offering search functionality. principal among these technological solutions are the opensearch and sru protocols. both implement what alipourhafezi et al. term a federated model in the context of search interoperability, wherein providers agree that their services will conform to certain standard specifications.17 the costs and adoption risk of this approach are low. technologies such as opensearch occupy an abstraction layer above existing search infrastructure such as solr. interoperability interoperability is an active area of work in both the search and digital library fields. interoperability is “the ability of two or more systems or components to exchange information and to use the information that has been exchanged.”18 interoperability in web search applies to resource harvesting, meta-search, and to allow search functions interact with other system elements. interoperability in digital libraries is a well-established research agenda,19 and it has been described by paepcke et al. as “one of the most important problems to solve [in dls].”20 issues that are common to both web search and digital-library search include metadata incompatibilities, protocol incompatibilities, and record duplication and near-duplication. in this paper, the focus is on search protocols; for a comprehensive survey of technology for semantic interoperability in digital library systems, see the delos report on same.21 a comprehensive survey of methods and technology for digital library interoperability is provided in a dl.org report.22 formats and metadata free and open standard formats are extensive in web technology. standard or de facto formats for archiving content include plain text, rich text format (rtf), html, pdf, and various xml formats. document or resource identification is another area where there has been much agreement. resource identification schemes need to be globally unique, persistent, efficient, and extensible. popular schemes include urls, persistent urls (purls), the handle system (handle.net), and dois (digital object identifiers). linking technologies include openurl and coins. openurl links sources to targets using a knowledge base of electronic resources such as digital libraries.23 contextobjects in spans (coins), as used in wikipedia for example, is another popular linking technology.24 applications can use various formats for transporting and syndicating content. syndication and transport technologies include xml, json, rss/atom, and heavyweight web service-based approaches. 
much of the metadata employed in digital libraries is in xml formats, for example in marcxml and the metadata encoding and transmission standard (mets). the world wide web consortium defined rdf (resource description framework) to provide among other goals “a mechanism for integrating multiple metadata schemes.”25 rdf records are defined in an xml namespace. in rdf, subject-predicate-object expressions represent web information technology and libraries | june 2014 49 resources and typically identified by means of an uri (universal resource identifier). json (javascript object notation) is a lightweight data-interchange format that has become popular in web-based applications and is seeing increasing support in digital library systems.26 harvesting web content is harvested using software called web crawlers (or web spiders). a crawler is an instance of a software application that runs automated tasks on the web. specifically the crawler follows web links to index websites. there was been little standardization in this area except the robot exclusion standard and use of various xhtml meta-elements and http header fields. there is a lot of variability in terms of the policy for the selection of content sources, policy for following links, url normalization, politeness, depth of crawl, and revisit policy. consequently, there are many web crawling systems in operation; open-source crawlers include datapartsearch, grub, heritrix, and the aforementioned nutch. harvesting and syndication of metadata from open repositories is the goal of the open archives initiative protocol for metadata harvesting (oai-pmh), originally developed by los alamos national laboratory, cornel, and nasa in 2000 and 2001.27 resource harvesting challenges include scale, keeping information up-to-date, robustness, and security. oai-pmh has been adopted by many digital libraries, museums, archives, and publishers. the latest version, oai-pmh 2.0, was released in 2012. oai-pmh specifies a general application-independent model of network-accessible repository and client harvester that issues requests using http (either get or post). metadata is expressed as a record in xml format. an oai-phm implementation must support dublin core, with other vocabularies as additions. oai-pmh is the key technology in the harvesting model of digital library interoperability described by van de sompel et al.28 an oai-pmh-compliant system consists of harvester (client), repository (network accessible server), and items (constituents of a repository). portal sites such as europeana and oaister use oai-pmh to harvest from large numbers of collections.29 there are online registries of oai-compliant repositories. the european commission’s europeana allow users to search across multiple image collections including the british library and the louvre online. another portal site that uses oai-pmh is culturegrid, operated by the uk collections trust. culturegrid provides access to hundreds of museum, galleries, libraries, and archives in the uk. the apache software foundation has developed a module, mod_oai, for apache webservers that helps crawlers to discover content. syndication and exchange here we outline lightweight options for syndication and information exchange. heavyweight web services-based approaches are outside the scope of this article. web syndication commonly uses rss (really simple syndication) or its main alternative, atom. 
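before turning to the syndication formats themselves, the oai-pmh request pattern described above can be made concrete. the sketch below issues ListRecords requests for dublin core records and follows resumptionToken elements; the repository url is hypothetical, while the verb, the metadataPrefix parameter, and the namespace come from the protocol specification.

```python
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
BASE = "http://repository.example.org/oai"   # hypothetical endpoint

def list_records(base_url=BASE, metadata_prefix="oai_dc"):
    """yield record elements, paging through the repository via resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(requests.get(base_url, params=params, timeout=30).text)
        for record in root.iter(OAI + "record"):
            yield record
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        # subsequent requests carry only the token, per the protocol
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```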
atom is a proposed ietf standard.30 rss 2.0 is the latest version in the rss family of specifications, a simple yet highly extensible open search environments: the free alternative to commercial search services | o’riordan 50 format where content items contain plain text or escaped html.31 atom, developed to counter perceived deficiencies in rss, has a richer content model than rss and is more reusable in other xml vocabularies.32 both rss and atom use http for transport. rss organizes information into channels and items, atom into feeds and entries. extension as modules allows rss to carry multimedia payload (rss enclosures) and geographical information (georss). atom has an associated publishing protocol called atompub. syndication middleware, which supports multiple formats, can serve as an intermediary in application architectures. information and content exchange (ice) is a protocol that aims to “automate the scheduled, reliable, secure redistribution of any content.”33 twice is a java implementation of ice. ice automates the establishment of syndication relationships and handles data transfer and results formatting. this gives content providers more control over delivery, schedule, and reliability than simple web syndication without deploying a full-scale web services solution. the open archives initiative—object reuse and exchange (oai-ore) protocol provides standards for the description and exchange of aggregations of web resources.34 this specification standardizes how compound digital objects can combine distributed resources of multiple media types. ore introduces the concepts of aggregation, resource map, and proxy resource. resource providers or curators can express objects in rdf or atom format and assign http uris for identification. ore supports resource discovery so crawlers or harvesters can find these resource maps and aggregates. ore can work in partnership with oai-pmh. we outline some additional lightweight technologies for information exchange to conclude this section. opml (outline processor markup language) is a format that represents lists of web feeds for services such as aggregators.35 it is a simple xml format. feedsync and rome support formatneutral feed formats that abstract from wire formats such as rss 2.0 and atom 1.0 for aggregator or syndication middleware. these technologies are described in the literature.36 lockss (lots of copies keep stuff safe) is a novel project that users a peer-to-peer network to preserve and provide access to web resources. for example, the metaarchive cooperative uses lockss for digital preservation.37 meta-search meta-search is where multiple search services are combined. such services have a very small share of the total search market owing to the dominance of the big players. metacrawler, developed in the 1990s, was one of the first meta-search engines and serves as a model for how such systems operate.38 a meta-search engine utilizes multiple search engines by sending a user request to multiple sources or engines aiming to improve recall in the process. a key issue with meta-search is how to weight search engines and how to integrate sets of results into a single results list. figure 1 shows a general model of meta-search where the meta-search service chooses which search engines and content providers to employ. active meta-search engines on the web include dogpile, yippy, ixquick, and info.com. 
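the "combine" step in the general model of figure 1 admits many fusion rules. the sketch below shows one simple possibility, a weighted reciprocal-rank vote over the url lists returned by each engine; the engine names and weights are illustrative and do not describe any deployed service.

```python
from collections import defaultdict
from typing import Dict, List

def fuse(results: Dict[str, List[str]], weights: Dict[str, float]) -> List[str]:
    """merge per-engine ranked url lists into one list, weighting each engine."""
    scores: Dict[str, float] = defaultdict(float)
    for engine, ranking in results.items():
        w = weights.get(engine, 1.0)
        for rank, url in enumerate(ranking, start=1):
            scores[url] += w / rank        # earlier ranks contribute more
    return sorted(scores, key=scores.get, reverse=True)

# fuse({"engineA": ["u1", "u2"], "engineB": ["u2", "u3"]},
#      {"engineA": 1.0, "engineB": 0.5})   # -> ["u1", "u2", "u3"]
```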
note that meta-search websites of this kind appear, change names, and disappear frequently. currently meta-search services use various implementation methods such as proprietary protocols and screen scraping.

[figure 1. general model of meta-search: a meta-search service chooses among search engines and content providers, forwards the query, and combines the returned results.]

metasearch xml gateway (mxg) is a meta-search protocol developed by the niso metasearch initiative, a consortium of meta-search developers and interested parties.39 mxg is a message and response protocol that enables meta-search service providers and content providers to communicate. a goal of the design of mxg was that content providers should not have to expend substantial development resources. mxg, based on sru, specifies both the query and search results formats. combining results, aggregation, and presentation are not part of the protocol and are handled by the meta-search service. the standard defines three levels of compliance, allowing varying degrees of commitment and interoperability.

search protocols

we describe opensearch and sru, along with applications, in the following subsections. after that, we detail related technologies.

opensearch

opensearch is a protocol that defines simple formats to help search engines and content providers communicate. it was developed by a9, a subsidiary of amazon, in 2005.40 it defines common formats for describing a search service, query results, and operation control. it does not specify content formats for documents or queries. the current specification, version 1.1, is available with a creative commons license. it is an extensible specification, with extensions published on the website. both free open systems and proprietary systems use opensearch. in particular, many open-source search engines and content management systems support opensearch, including yacy, drupal, and plone cms. opensearch consists of a description file for a search source and a response format for query results. descriptors include the elements url, query, syndicationright, and language. resource identification can be by urls, dois, or a linking technology such as openurl. responses describe a list of results and can be in rss, atom, or html formats. additionally there is an auto-discovery feature to signal that an html page is searchable, implemented using an html 4.0 link element. opensearch makes very few assumptions about the types of sources, the type of query, or the search operation. it is ideal for combining content from multiple disparate sources, which may be data from repositories, webpages, or syndicated feeds. for illustrative purposes, listing 1 gives an example opensearch description for harvesting book information from an example digital library called diglib. the root node includes an xml namespace attribute, which gives the url for the standard version. the url element specifies the content type (a mime type), the query (book in this case), and the index offset where to begin. the rel attribute states that the result is a collection of resources.

[listing 1. xml opensearch description record. the markup is lost in this copy; the surviving element text reads "diglib", "harvests book items", "en-us", and "utf-8".]
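a client can drive a description like listing 1 mechanically: fetch the document, read the url template, substitute the query terms, and parse the result feed. the python sketch below assumes a hypothetical description url and an rss 2.0 response; the {searchTerms} placeholder, optional parameters such as {startIndex?}, and the namespace come from the opensearch 1.1 specification.

```python
import re
import requests
import xml.etree.ElementTree as ET

OSD_NS = "{http://a9.com/-/spec/opensearch/1.1/}"

def opensearch_query(description_url: str, terms: str):
    """fetch an opensearch description, fill its url template, and parse rss results."""
    osd = ET.fromstring(requests.get(description_url, timeout=10).text)
    template = osd.find(f"{OSD_NS}Url").attrib["template"]   # first Url element
    url = template.replace("{searchTerms}", requests.utils.quote(terms))
    url = re.sub(r"\{[^}]+\?\}", "", url)   # drop unused optional parameters
    feed = ET.fromstring(requests.get(url, timeout=10).text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in feed.iter("item")]  # rss 2.0 items are unnamespaced
```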
next, we describe some deployed applications that use opensearch. ojax uses web technologies such as ajax (asynchronous javascript) to provide a federated search service for oai-pmh-compatible open repositories.41, 42 ojax also supports the discovery feature of opensearch, as described in the opensearch 1.1 specification, for auto-detecting that a repository is searchable. stored searches are in atom format. open-source meta-search engines can combine the results of opensearch-enabled search engines.43 a system built as a proof of concept uses four search sources: a9.com, yacy, mozdex, and alpha. a user can issue a text query (word or phrase) with boolean operators and several modifiers. users can prefer or bias particular engines by setting weights. the system ranks results, which are combined using a voting algorithm and implemented with the lucene library. opensearch can be employed to specify the search sources and as a common format when results are combined. as levan points out, "the job of the meta-search engine is made much simpler if the local search engine supports a standard search interface."44 levan also mentions mxg in this context. nguyen et al. describe an application where over one hundred search engines are used in experiments in federated search.45 the search sources were mostly opensearch-compliant search engines. an additional tool scrapes results from noncompliant systems. intersynd uses opensearch to help provide a common protocol for harvesting web feeds. intersynd is a syndication system that harvests, stores, and provides feed recommendations.36 it uses java.net’s rome (rss and atom utilities) library to represent feeds in a format-neutral way. intersynd is syndication middleware that allows sources to post and services to fetch information in all major syndication formats (see figure 2). its feed-discovery module, disco, uses the nutch crawler and the opensearch protocol to harvest feeds. nutch is an open-source library for building search engines that supports opensearch. nutch builds on the lucene information retrieval library, adding web specifics such as a crawler, a link-graph database, and parsers for html.

[figure 2. opensearch in intersynd.]

opensearch 1.1 allows returned results in either rss 2.0 or atom 1.0 format or an opensearch format, the "bare minimum of additional functionality required to provide search results over rss channels" (quoted from the a9 website). listing 2 below shows a disco results list in rss 2.0 format. opensearch fields appear in the channel description. the nutch fields appear within each item (not shown). an opensearch namespace is specified in the opening xml element. the following additional opensearch elements appear in the example: totalresults, itemsperpage, and startindex.

[listing 2. results produced using nutch with opensearch. the rss markup is lost in this copy; the surviving values include the channel title "nutch: metasearch", the description "nutch search results for query: metasearch", the request url http://localhost/nutch-1.6dev/opensearch?query=metasearch&start=0&hitspersite=2&hitsperpage=10, and the opensearch elements totalresults (282), startindex (0), and itemsperpage (10).]

we mention one more application of opensearch here. a series of nasa projects to develop a set of interoperable standards for sharing information employs various open technologies for sharing and disseminating datasets, including opensearch for its discovery capability.46 discovery of document and data collections is by keyword search, using the opensearch protocol. there are various extensions to opensearch.
for example, an extension to handle sru allows sru (search and retrieval via url) queries within opensearch contexts. other proposed extensions include support for mobility, e-commerce, and geo-location. sru (search/retrieval via url) a technology with some similarities to opensearch but more comprehensive is sru (search/retrieval via url).47 sru is an open restful technology for web search. the current version is sru 2.0, standardized by the organization for the advancement of structured information standards (oasis) as searchretrieve. version 1.0. sru was developed to provide functionality similar to the widely deployed z39.50 standard for library information retrieval updated for the web age.48 sru addresses aspect of search and retrieval by defining models: a data model, a query model, and processing model, a result set model, a diagnostics model and a description-and-discovery model. sru is extensible and can support various underlying low-level technologies. both lucene and dspace implementations are available. the oclc implementation of sru supports both rss and atom feed formats and the atom publishing protocol. sru uses http as the application transport and xml formats for messages. requests can be in the form of either get or post http methods. sru supports a high-level query language called contextual query language (cql). cql is a human-readable query language consisting of search clauses. sru operation involves three parts: explain, search/retrieve and scan. explain is a way to publish resource descriptions. search/retrieve entails the sending of requests (formulated in cql) and the receiving of responses over http. the optional sru scan enables software to query the index. the result list is in xml schema format. the meta-search service mxg uses sru, but relaxes the requirement to use cql.39 srw (search/retrieve web service) is a web services implementation of sru that uses soap as the transfer mechanism. hammond combines opensearch with sru technology in an application for nature publishers.49 he also points out the main differences between the protocols such as sru’s use of a query specification language and differences in the results records. as well as supporting opensearch data formats (rss and atom), the nature application also supports json (javascript object notation). opensearch is used for formatting the result sets whereas sru/cql is used for querying. this search application launched as a public service in 2009. listing 3 below is an example from the nature application showing cql search queries ( tags) used in an opensearch description document. note how both the sru querytype and the information technology and libraries | june 2014 55 opensearch searchterms attributes appear in the query. further details on how to use sru and opensearch together are on the opensearch website. nature.com opensearch interface for nature.com the nature.com opensearch service nature.com opensearch sru listing 3. example using sru and opensearch. other technologies here we more briefly survey some additional technologies of relevance to open-search interoperability. xml-based approaches to information integration, such as the use of xquery, are an option but do not present a loose integration. chudnov et al. describes a simple api for a copy function for web applications to enable syndication, searching, and linking of web resources.50 called unapi, it requires small changes for publishers to add the functionality to web resources such as repositories and feeds. 
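to illustrate the request side of sru, the sketch below issues a searchRetrieve request with a cql query over http get. the base url is hypothetical; operation, version, query, and maximumRecords are standard sru request parameters, and the srw namespace used below is the one defined for sru 1.1/1.2 responses.

```python
import requests
import xml.etree.ElementTree as ET

SRW = "{http://www.loc.gov/zing/srw/}"

def sru_search(base_url: str, cql: str, maximum_records: int = 10):
    """run an sru searchRetrieve request and return (total hits, record elements)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql,                     # e.g. 'dc.title any "open search"'
        "maximumRecords": maximum_records,
    }
    root = ET.fromstring(requests.get(base_url, params=params, timeout=10).text)
    total = root.findtext(f"{SRW}numberOfRecords")
    return total, root.findall(f".//{SRW}record")

# total, records = sru_search("http://sru.example.org/catalog", 'dc.title any "search"')
```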
developers can layer unapi over sru, opensearch, or openurl.51 announced in 2008, yahoo!’s searchmonkey technology, also called yahoo!’s open search platform, allowed publishers to add structured metadata to yahoo! search results. searchmonkey divided the problem into two parts: metadata extraction and result presentation. in is not clear how much of this technology survived yahoo! and microsoft’s new search alliance, signed in 2010.52 mika described a search interface technology called microsearch that is similar in nature.53 in microsearch, semantic fields are added a search and search result presentation enriched with open search environments: the free alternative to commercial search services | o’riordan 56 metadata extracted from retrieved content. govaerts et al. described a federated search and recommender system that operates as a browser add-on. the system is opensearch-compliant and all results are in the atom format.54 the corporation for national research initiatives (cnri) digital object architecture (doa) provides a framework for managing digital objects in a networked environment. it consists of three parts: a digital object repository, a resolution mechanism (handle system), and a digital object registry. the repository access protocol (rap) proves a means of networked access to digital objects, which supports authentication and encryption.55 summary and conclusions a rich set of formats and protocols and working implementations show that open search technology is an alternative to the dominant commercial search services. in particular, we discussed the lightweight opensearch and sru protocols as suitable glue to create loosely coupled search-based applications. these can complement other developments in resource discovery and description, open repositories, and open-source information retrieval. the flexibility and extensibility offers exciting opportunities to develop new applications and new types of applications. the successful deployment of open search technology shows that this technology has matured to support many uses. a fruitful area of further development would be to make working with these standards easier for developers and even accessible to the nonprogrammer. references 1. google custom search api, https://developers.google.com/custom-search/v1/overview. 2. mike cafarella and doug cutting, “building nutch: open source search: a case study in writing an open source search engine,” acm queue 2, no. 2 (2004), http://0dl.acm.org.library.ucc.ie/citation.cfm?doid=988392.988408. 3. wray buntine et al., “opportunities from open source search,” in proceedings, the 2005 ieee/wic/acm international conference on web intelligence, 2–8 (2005), http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1517807. 4. jamie callan, “distributed information retrieval,” advances in information retrieval 5 (2000): 127–50. 5. péter jacsó, “internet insights—thoughts about federated searching,” information today 21, no. 9 (2004): 17–27. 6. ricardo baeza and prabhakar raghavan, “next generation web search,” in search computing (berlin heidelberg: springer, 2010): 11–23, http://link.springer.com/chapter/10.1007/9783-642-12310-8_2. 
7. trevor strohman et al., “indri: a language model-based search engine for complex queries,” in proceedings of the international conference on intelligent analysis 2, no. 6 (2005): 2–6.
8. xapian project website, http://xapian.org/.
9. andrew aksyonoff, introduction to search with sphinx: from installation to relevance tuning (sebastopol, ca: o’reilly, 2011).
10. rohit khare, “nutch: a flexible and scalable open-source web search engine,” oregon state university, 2004, p. 32, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.105.5978.
11. “xapian users,” http://xapian.org/users.
12. christian middleton and ricardo baeza-yates, “a comparison of open source search engines,” 2007, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.119.6955.
13. apache solr, http://lucene.apache.org/solr/.
14. ross singer, “in search of a really ‘next generation’ catalog,” journal of electronic resources librarianship 20, no. 3 (2008): 139–42, http://www.tandfonline.com/doi/pdf/10.1080/19411260802412752.
15. europeana portal, http://www.europeana.eu/portal/.
16. maristella agosti et al., delosdlms—the integrated delos digital library management system (berlin heidelberg: springer, 2007).
17. mehdi alipour-hafezi et al., “interoperability models in digital libraries: an overview,” electronic library 28, no. 3 (2010): 438–52, http://www.emeraldinsight.com/journals.htm?articleid=1864156.
18. institute of electrical and electronics engineers, ieee standard computer dictionary: a compilation of ieee standard computer glossaries (new york: ieee, 1990).
19. clifford lynch and hector garcía-molina, “interoperability, scaling, and the digital libraries research agenda,” in iita digital libraries workshop, 1995.
20. andreas paepcke et al., “interoperability for digital libraries worldwide,” communications of the acm 41, no. 4 (1998): 33–42.
21. manjula patel et al., “semantic interoperability in digital library systems,” 2005, http://delos-wp5.ukoln.ac.uk/project-outcomes/si-in-dls/si-in-dls.pdf.
22. georgios athanasopoulos et al., “digital library technology and methodology cookbook,” deliverable d3.4, 2011, http://www.dlorg.eu/index.php/outcomes/dl-org-cookbook.
23. herbert van de sompel and oren beit-arie, “open linking in the scholarly information environment using the openurl framework,” new review of information networking 7, no. 1 (2001): 59–76, http://www.tandfonline.com/doi/abs/10.1080/13614570109516969.
24. daniel chudnov, “coins for the link trail,” library journal 131 (2006): 8–10.
25. lois mai chan and marcia lei zeng, “metadata interoperability and standardization—a study of methodology, part ii,” d-lib magazine 12, no. 6 (2006), http://www.dlib.org/dlib/june06/zeng/06zeng.html.
26. json (javascript object notation), http://www.json.org/.
27. the open archives initiative protocol for metadata harvesting, http://www.openarchives.org/oai/openarchivesprotocol.html.
28. herbert van de sompel et al., “the ups prototype: an experimental end-user service across e-print archives,” d-lib magazine 6, no. 2 (2000), http://www.dlib.org/dlib/february00/vandesompel-ups/02vandesompel-ups.html.
29. oaister, http://oaister.worldcat.org/.
30. mark nottingham, ed., “the atom syndication format, rfc 4287,” memorandum, ietf network working group, 2005, http://www.ietf.org/rfc/rfc4287.
31. rss 2.0 specification, berkman center for internet & society at harvard law school, july 15, 2003, http://cyber.law.harvard.edu/rss/rss.html.
32. “rss 2.0 and atom 1.0 compared,” http://www.intertwingly.net/wiki/pie/rss20andatom10compared.
33. jay brodsky et al., eds., “the information and content exchange (ice) protocol,” working draft, version 2.0, 2003, http://xml.coverpages.org/icev20-workingdraft.pdf.
34. open archives initiative object reuse and exchange, http://www.openarchives.org/ore/.
35. opml (outline processor markup language), http://dev.opml.org/.
36. adrian p. o’riordan and m. oliver o’mahoney, “engineering an open web syndication interchange with discovery and recommender capabilities,” journal of digital information 12, no. 1 (2011), http://journals.tdl.org/jodi/index.php/jodi/article/viewarticle/962.
37. vicky reich and david s. h. rosenthal, “lockss: a permanent web publishing and access system,” d-lib magazine 7, no. 6 (2001): 14, http://mirror.dlib.org/dlib/june01/reich/06reich.html.
38. erik selberg and oren etzioni, “multi-service search and comparison using the metacrawler,” in proceedings of the fourth int'l www conference, boston, 1995. [pub info?]
39. niso metasearch initiative, metasearch xml gateway implementers guide, version 1.0, niso rp-2006-02, 2006, http://www.niso.org/publications/rp/rp-2006-02.pdf.
40. dewitt clinton, “opensearch 1.1 specification, draft 5,” http://opensearch.org/specifications/opensearch/1.1.
41. judith wusteman, “ojax: a case study in agile web 2.0 open source development,” aslib proceedings 61, no. 3 (2009): 212–31, http://dx.doi.org/10.1108/00012530910959781.
42. judith wusteman and padraig o’hlceadha, “using ajax to empower dynamic searching,” information technology & libraries 25, no. 2 (2013): 57–64, http://0-www.ala.org.sapl.sat.lib.tx.us/lita/ital/sites/ala.org.lita.ital/files/content/25/2/wusteman.pdf.
43. adrian p. o’riordan, “open meta-search with opensearch: a case study,” technical report hosted in the cora.ucc.ie repository, 2007, http://dx.doi.org/10468/982.
44. ralph levan, “opensearch and sru: a continuum of searching,” information technology & libraries 25, no. 3 (2013): 151–53, https://napoleon.bc.edu/ojs/index.php/ital/article/view/3346.
45. dong nguyen et al., “federated search in the wild: the combined power of over a hundred search engines,” in proceedings of the 21st acm international conference on information and knowledge management (maui, hawaii: acm press, 2012): 1874–78, http://dl.acm.org/citation.cfm?id=2398535.
46. b. d. wilson et al., “interoperability using lightweight metadata standards: service & data casting, opensearch, opm provenance, and shared sciflo workflows,” in agu fall meeting abstracts 1 (2011): 1593, http://adsabs.harvard.edu/abs/2011agufmin51c1593w.
47. library of congress, “sru—search/retrieve via url,” www.loc.gov/standards/sru.
48. the library of congress network development and marc standards office, “z39.50 maintenance agency page,” www.loc.gov/z3950/agency.
49. tony hammond, “nature.com opensearch: a case study in opensearch and sru integration,” d-lib magazine 16, no. 7/8 (2010), http://mirror.dlib.org/dlib/july10/hammond/07hammond.print.html.
50. daniel chudnov et al., “introducing unapi,” 2006, http://ir.library.oregonstate.edu/xmlui/handle/1957/2359.
51. daniel chudnov and deborah england, “a new approach to library service discovery and resource delivery,” serials librarian 54, no. 1–2 (2008): 63–69, http://www.tandfonline.com/doi/abs/10.1080/03615260801973448.
52. “news about our searchmonkey program,” yahoo! search blog, 2010, http://www.ysearchblog.com/2010/08/17/news-about-our-searchmonkey-program/.
53. peter mika, “microsearch: an interface for semantic search,” in semantic search, international workshop located at the 5th european semantic web conference (eswc 2008) 334 (2008): 79–88, http://ceur-ws.org/vol-334/.
54. sten govaerts et al., “a federated search and social recommendation widget,” in proceedings of the 2nd international workshop on social recommender systems ([pub info?], 2011): 1–8.
55. s. [first name?] reilly, “digital object protocol specification, version 1.0,” november 12, 2009, http://dorepository.org/documentation/protocol_specification.pdf.
multipurpose cataloging and indexing system (cain) at the national agricultural library
vern j. van dyke: chief, computer applications, national agricultural library, and nancy l. ayer: computer systems analyst, national agricultural library, beltsville, maryland.

a description of the cataloging and indexing system (cain) which the national agricultural library has been using since january 1970 to build a broad data base of agricultural and associated sciences information. with a single keyboarding, bibliographic data is input, edited, manipulated, and merged into a permanent base which is used to produce many types of printed or print-ready end products. presently consisting of five subsystems, cain utilizes the concept of controlled authority files to facilitate both information input and its retrieval. the system was designed to provide maximum computer services with a minimum of effort by users.

introduction
this article describes an interactive system in operation at the national agricultural library which, with a single keyboarding of data, provides all necessary catalog cards, book catalogs, bibliographies, and related internal reports, as well as a computer data base for information retrieval. operating primarily in batch mode, the system can run on an ibm 360 with 256k memory using os, six magnetic tape drives, a card reader, and a line printer.

background
the national agricultural library (nal), as one of the three national libraries, is responsible for the collection and dissemination of agricultural information on a national and worldwide basis. in this pursuit publications are obtained through gifts, exchange agreements, and by purchase of items in many languages. titles of those items in non-roman alphabets are transliterated, and all non-english titles are translated. the volume of publications handled by nal in 1969 was in the neighborhood of 600,000, of which approximately 275,000 were added to the collection. this volume was sufficiently large to pose a serious problem for nal's staff, and thus computer assistance was clearly a logical and necessary arrangement. in 1964 a computer group was formed in nal; it became active in developing systems to prepare voluminous indexes for the bibliography of agriculture, the complete pesticides documentation bulletin, and the categorical and alphabetical issues of the agricultural/biological vocabulary. during 1969 these systems were consolidated and expanded so as to process all input data within one coordinated set of parameters. in january 1970 the new cataloging and indexing (cain) system was implemented.

system design
cain is a complex and comprehensive computer system which has been engineered to handle up to five (5) simultaneous but separate users who share the same controlled authority files. the basic precept in development of computer applications at nal is to make input and output simple and convenient for the users, with the computer assuming as much detail and data manipulation as is technically feasible. at nal the current users providing input data are the new book section, cataloging, indexing, and agricultural economics.
operating in parallel, cain also services the herbicides data base of the agricultural research service and the international tree disease data base of the forest service, and in 1971 will be installed in the library of the technion-israel institute of technology in haifa, israel.

the master data record is variable in length, with a fixed portion of 173 characters and up to fifty-seven additional segments of 65 characters each. the fixed portion includes basic data plus a directory of the data contained in the variable portion. the data elements in cain are:

a. file code - delineates the various files.
b. identification number - on cataloged items this embodies the accession number. all identification numbers include the year of accession, a parallel run code, plus a unique control number.
c. source code.
d. user codes - specific identification of up to five users.
e. english indicator - language of text.
f. translation code - availability of an english translation.
g. language, if other than english.
h. proprietary restrictor - identifies classified records.
i. title tracing indicator - for catalog cards.
j. main entry - designates the main entry if not in normal sequence.
k. document type - whether journal article, monograph, serial, etc.
l. filing location - if other than in the library stacks.
m. categories (two) - general area of coverage of subject matter.
n. new book description - if the title is not sufficiently explanatory.
o. titles - three types: (1) vernacular or short, (2) alternate or holdings, and (3) translated title (english).
p. personal authors - up to 10; names plus identifying data.
q. corporate authors - maximum of two.
r. major personal author affiliation.
s. abbreviated journal title if the item is a journal article; imprint if a monograph or serial.
t. collation/pagination.
u. dates - two: search date, and date on publication if different.
v. call number.
w. subject terms - may be nested; up to 45.
x. general notes.
y. special purpose numbers - patent, grant, analysis, contract, technical, or report.
z. series statement.
aa. abstract/extract.
bb. tracings not otherwise normally generated by the system.
cc. nonvocabulary cross-references.

the total number of individual elements is limited only by the maximum record size. the nal-produced software is written in cobol. the data base is maintained on tape which is nine-track, 800 bpi, blocked 2, in ebcdic, with standard ibm 360 header and trailer labels. the total system presently consists of forty programs, some of which are multipass. in addition, throughput is sorted twenty-five times during the full computer run. these, of course, include the search and retrieval programs and sorts which are run only on request. the ultimate system toward which nal is working, and for which the basic design is already substantially complete, is an on-line full library document locator and control system which may be linked via dial-up service to an international and national science and technology information network. each portion of cain is developed with this broader picture in mind. it was this factor which weighed heavily in selecting cathode ray tube (crt) terminals for the proposed data gathering subsystem, inasmuch as crt's will be the predominant type of terminal in the future network.
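as a rough illustration of the master-record layout just described, the following python sketch splits a record into its 173-character fixed portion and its 65-character variable segments. the fixed-portion length, segment length, and segment limit come from the text; the function name and the error handling are assumptions.

# sketch of the cain master data record layout: a 173-character fixed
# portion followed by up to 57 segments of 65 characters each.
FIXED_LEN = 173
SEGMENT_LEN = 65
MAX_SEGMENTS = 57

def split_master_record(record: str):
    """return (fixed_portion, [variable_segments]) for one record."""
    if len(record) < FIXED_LEN:
        raise ValueError("record shorter than the fixed portion")
    fixed = record[:FIXED_LEN]
    rest = record[FIXED_LEN:]
    if len(rest) % SEGMENT_LEN != 0:
        raise ValueError("variable portion is not a whole number of segments")
    segments = [rest[i:i + SEGMENT_LEN] for i in range(0, len(rest), SEGMENT_LEN)]
    if len(segments) > MAX_SEGMENTS:
        raise ValueError("too many variable segments")
    return fixed, segments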
for convenience in discussion, the system will be described by its subsystems: data gathering, edit and update, publication, search, and controlled authorities.

data gathering subsystem
from its inception the input to cain was in the form of punched cards, a method which has proved to be slow and error prone. in order to eliminate double keyboarding and excessive time lag, as well as to reduce error rates, it was decided to perform this input function in the library with trained library personnel. to accomplish this, nal proposes to implement an "on-line" type of input subsystem using crt's. although this form of entry is not yet in use, the subsystem should operate substantially as follows. the documents are to be marked by catalogers and indexers and passed to library technicians who will enter the data through crt's into an on-line storage file. to do this, the technician will call up prestored formats from the hardware as desired and fill in the data elements required. these formats use english terms and for the most part call for data rather than codes. in addition, data are to be entered in normal upper- and lowercase without diacritics, thus improving visual scanning for errors. an average of four formats will be needed to enter one item. by use of an algorithm, the system would store formatted records for each id in such a manner as to permit recall singly or collectively. the physical documents are then to be passed on to an editor who can recall any or all formatted records for review. with the document in hand, stored records will be reviewed and corrected if necessary. when acceptable, the records will then be transmitted to magnetic tape. variations on this procedure could include input direct to tape, storage to tape without recall to a crt by an editor, cancellation of actions, and a direct purge of the entire storage file without loss of the controlling matrix. the expertise of the library technicians inputting the data should insure far more accuracy than could be expected from multihandling and multikeyboarding. in addition, the system has been designed to accomplish basic pre-cain editing of such factors as numeric or alphabetic characters in certain fields and overall lengths of the fields. errors in these categories will be promptly identified by the computer by a blinking feature on the crt screen. another major benefit of this direct approach is that documents can be processed through the system so as to reach the stacks twenty-four days faster than under the current keypunch method. magnetic tapes created by the data gathering system will be periodically converted from ascii to ebcdic and processed into the edit and update subsystem of cain. the present nal time schedule for updating master cain files is weekly; this is not a requirement of the system but an administrative decision based on other deadlines. the data gathering system as prescribed by nal will be composed of sixteen crt's, a large on-line storage file, and one nine-track 800 bpi magnetic tape drive. this configuration will be either a hard-wired "black-box" approach or controlled by a dedicated mini-computer. the hardware prescribed for this subsystem is not included as a requirement of cain inasmuch as transactions can be entered on 80-column cards if desired. an additional feature of this subsystem will be the generation of management information feedback. this will encourage elimination of manual counts and provide accurate throughput volume statistics on a timely basis.
through this means the supervisor will be in a better position to evaluate workload, individual performance, and hardware utilization.

edit and update subsystem
the first step in the acceptance of transactions is a thorough validation of each data element. the computer is used to relieve librarians of the voluminous and time-consuming editing of the many individual elements having predetermined limits; thus, only a cursory review of the proof-listed records is necessary by a librarian before acceptance. the system cannot, of course, detect logical or typographical errors, but it can determine the absence of necessary information, codes in invalid ranges, and the incorrect placement of data. elements for which the system supplies authority files are not only verified against the file, but additional transactions are also generated from the authority file to assure uniformity in output. this also eliminates the necessity for librarians to enter those elements which have a direct, predictable relationship to another element. further validations are performed at the point of building new records or updating records already in the master file. the two "master" files are (1) the temporary set of unselected records and (2) the permanent set of those records which have been approved and selected for publication in some form. data elements specified as required within each record are reviewed. if one or more is missing, the system refuses to approve the record, and a notice is produced concerning this reversal of human input. fields can be deleted, in whole or in part, replaced, or added. three types of output from this subsystem are:

• new updated master files. records which have been added or altered during the update run are proof-listed for cursory review by a team of professional librarians. corrections and/or approvals are submitted in a subsequent update run.
• activity notices. every action, whether submitted by the user or system-generated, which has been accepted for processing is reported.
• error notices. all error and warning messages from this subsystem are compiled into one listing. this includes errors on individual elements, system-discovered errors of omission, and warnings of computer overriding of submitted actions.

through the use of control cards various handling options are possible. one of these is proof-listing of a specific range or ranges of masters by identification numbers or dates. subject headings are assigned by professional librarians for monographs and new serial titles. for journal articles, however, the system analyzes the title of the article and creates subject index terms, using single words, combinations of two words not separated by stop words, and singular and plural variations. the generated terms are then processed against the controlled authority file; those accepted as valid are inserted in the record for searching purposes.
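the title-analysis step just described lends itself to a small sketch. the following python fragment generates candidate index terms from a title (single words, adjacent two-word pairs not separated by stop words, and naive singular/plural variants) and keeps only those found in a controlled vocabulary. the stop-word list, the pluralization rule, and the sample vocabulary are hypothetical simplifications, not the actual cain rules.

# hypothetical sketch of generating subject index terms from an article
# title, in the spirit of the cain title-analysis step described above.
import re

STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "for", "with"}  # assumed list

def candidate_terms(title: str):
    words = re.findall(r"[a-z]+", title.lower())
    terms = set()
    for i, w in enumerate(words):
        if w in STOP_WORDS:
            continue
        # single words plus naive singular/plural variants
        terms.update({w, w + "s" if not w.endswith("s") else w[:-1]})
        # two-word combinations not separated by a stop word
        if i + 1 < len(words) and words[i + 1] not in STOP_WORDS:
            terms.add(w + " " + words[i + 1])
    return terms

def index_terms(title: str, authority: set):
    """keep only candidates that appear in the controlled authority file."""
    return sorted(candidate_terms(title) & authority)

# example with a made-up vocabulary
vocab = {"soybeans", "soil", "nitrogen", "soil nitrogen"}
print(index_terms("Nitrogen in the soil and the growth of soybeans", vocab))
# -> ['nitrogen', 'soil', 'soybeans']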
publication and distribution subsystem
each data element of a bibliographic item is captured only once and at the earliest possible time in the receipt process. master records which have successfully passed the edit and update phase become candidates for various types of publications and other user services. six major modes of publication products are produced by cain, at various times and in a variety of both formats and media. preliminary to the production of formal output there is a screening for records designated as fully acceptable by the edit and update subsystem. as mentioned above, any record may be identified as being applicable to any combination of from one to five users. by a method of control cards the system is informed as to which users are scheduled for publication/distribution, and the maximum quantity to be selected in each case. this subsystem reviews each record to ascertain its appropriateness for selection. records meeting the criteria are siphoned off for individual handling. no record is dropped from the temporary file until it has been selected by all applicable users.

a new book shelf listing may be printed on photocopy paper on request. on preparation, it is ready to be matted, photographed, printed, and distributed throughout the department of agriculture. only as many new book entries are selected by the computer at one time as will fit on three sheets of a four-page publication. approved cataloged records are selected weekly. each record is analyzed for applicability to any or all of the eight major files for which catalog cards are prepared. each card file has its own criteria, both in content and in the number and types of cards produced for it. the system produces a separate record for each card required, sorts together the records for each file, and alphabetizes within that file. leading articles (regardless of language) are printed but are excluded in the sorting procedure (a small sketch of this filing rule follows below). cards are printed two-up in upper- and lowercase in the format prescribed by the anglo-american cataloging rules. after printing, the cards are distributed to the appropriate organizations and sections, where they may be filed with a minimum of additional effort.

monthly, a book catalog is compiled. this contains not only a listing by main entry but also indexes of personal authors, corporate authors, subjects, and titles. a biographic index (major personal author affiliation) capability is available, although not presently used by nal in the book catalog. this catalog is printed in varying numbers of columns, changeable by control card option for each index. again photocopy paper is used with a standard upper- and lowercase (tn) print train. an alternate option is magnetic tape output formatted for direct input to a computer-driven linotron. see bibliographic description for more detail. semiannually the index portions of the book catalog are cumulative. main entry listings are not repeated. multiyear accumulations may also be produced. the book catalogs are presently being published from photocopy printout by rowman and littlefield, inc., new york. bibliographies, either scheduled or special, can be produced with the same indexes as those in the book catalog. these are normally prepared for printing via the linotron. this magnetic tape record contains all formatting requirements with the exception of word divisions. document title, page, and columnar (subject category) headers are provided by nal; running headers are inserted by the linotron. through predetermined codes, the cain tape specifies the print style, print size, and print format. bibliographies may also be computer printed on photocopy paper similar to the book catalog.
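the card-filing rule mentioned above (leading articles are printed but ignored when alphabetizing) can be expressed as a small sort key. the article list below covers only a few languages and is a hypothetical stand-in for whatever table cain actually used.

# sort catalog headings while ignoring leading articles, as in the
# card-filing step described above. the article list is illustrative only.
LEADING_ARTICLES = ("the ", "a ", "an ", "der ", "die ", "das ", "le ", "la ", "les ")

def filing_key(heading: str) -> str:
    key = heading.lower().lstrip()
    for article in LEADING_ARTICLES:
        if key.startswith(article):
            return key[len(article):]
    return key

headings = ["The Soil Survey", "An Atlas of Wheat Rusts", "Die Landwirtschaft"]
print(sorted(headings, key=filing_key))
# -> ['An Atlas of Wheat Rusts', 'Die Landwirtschaft', 'The Soil Survey']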
once a month, each record selected for publication is processed through a merge and adjustment program. at this point published records not previously on the permanent master file are added to it. those which are already on it are compared, and the resident record is adjusted to include the new user for whom the record has just been published. the term field is also verified and updated if necessary. each term is also used to generate posting records for the subject authority file. the permanent (published) cain data base is available on magnetic tape in either the master format or a print format of the linear proof (a listing of each data element). only records not previously published are added to the monthly sale tapes. these tapes may be ordered individually (new monthly selections) or collectively (whole file) at the cost of reproduction only. the tape is nine-track, 800 bpi, ebcdic, with standard ibm 360 header and trailer labels. one of the purchasers of the cain tape is the ccm information corporation of new york, which has published the bibliography of agriculture from it starting in 1970. current purchasers include private corporations and universities, both in the united states and abroad. the last type of output is normal computer printout of numerous internal reports in a variety of customized formats.

search subsystem
the search capability of the cain system is not being used by nal on its own data base at the present time. it is utilized, however, by other organizations who run the cain system on a parallel basis, maintaining their own data bases. the following description, therefore, pertains to the programmed system rather than to its use on the nal data base. this subsystem permits identification and retrieval of records in cain format based on search statements applied to almost every data element or combination thereof. such searches may use simple statements or a complex series of nested boolean parameters. questions may also be absolute or weighted to give more precise results. the weight factors, if used, are normally assigned to each statement within a search question, with a threshold weight assigned to the overall question. the total weight of all true statements must be equal to or greater than the threshold weight for the full query in order to be considered as meeting the search criteria; if such is not the case, the record will not be selected. since cain uses a controlled vocabulary, query statements on subject terms are first matched against that authority file. at this point each invalid (use) term is replaced by a corresponding valid (uf) term if appropriate. in addition, if the query statement so specifies, the requested terms may be expanded one level in the hierarchy. in other words, the system can generate additional statements requesting all broader, narrower, or related terms as specified, if such structure is present for the subject within the vocabulary. because subject terms comprise the largest percentage of all search elements, an algorithm was developed whereby queries on this type of element are first processed against an inverted file. identification numbers are extracted for all terms matching the query, and only those candidate records are searched using the full query. on a serial file such as cain, this concept provides a substantial savings in computer run time. the print options of retrieval output allow either for normal sequence by identification number or for a specific sequence as requested by the originator. the printout may contain all data elements or only those selected, all others being suppressed. at the present time this subsystem is used infrequently by nal, and only for internal high-priority searches, due to the extremely limited subject indexing terms present.
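the weighted-query rule described above (the weights of all true statements must reach the question's threshold) is easy to sketch. the record and statement representations below are hypothetical; only the threshold rule itself comes from the text.

# sketch of cain-style weighted query evaluation: a question is a list of
# (predicate, weight) pairs plus a threshold; a record matches when the
# summed weights of the true statements reach the threshold.
def matches(record: dict, statements, threshold: float) -> bool:
    score = sum(weight for predicate, weight in statements if predicate(record))
    return score >= threshold

# hypothetical record and question
record = {"language": "english", "terms": {"soil", "nitrogen"}, "year": 1971}
statements = [
    (lambda r: "nitrogen" in r["terms"], 2),
    (lambda r: "wheat" in r["terms"], 2),
    (lambda r: r["language"] == "english", 1),
]
print(matches(record, statements, threshold=3))   # 2 + 1 >= 3 -> True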
the search subsystem is used more extensively on the parallel operation established for the international tree disease register maintained for the u.s. forest service.

authority files subsystem
this subsystem updates, generates, expands, and maintains three types of authority files: subject terms with their associated hierarchy, call numbers of indexed journals with abbreviated titles, and a subject-term inverted file carrying the identification number of each record using that term. each transaction to add, change, or delete any data is both edited and reversed before entering the updating sequence. thus the addition of a narrower term (for example, horse) to a base term (for example, animal) will automatically generate another transaction to add the broader term animal to a base term (new or existing) of horse. this precludes having to enter both sides of an action manually, as well as assuring reciprocity of entries. due to the flexibility of the search subsystem of cain, this hierarchical continuity is of great importance. if an item is changed, the same procedure is followed. in the instance of deletion, a broader precept is involved: the term is deleted from all entries in other hierarchies but is itself left on the authority file and marked as being no longer valid. it is thus available for search purposes but is not allowed to be used on subsequent cain data records. during a normal cain data run, each call number or subject term in a record is verified against the appropriate file. each element on these files is carried in two forms, one in stripped uppercase and the other in preferred print form. when an incoming term is found on the authority file, the system substitutes the proper form. this includes substituting a valid term for an invalid term, as in the "use-use for" relationship, as well as generating the appropriate abbreviated journal title for a given call number. in order to keep the authority file up to date, the transactions generated by the publication subsystem are now used to insert the record identification number into the inverted file as well as to increase the number of postings per term. this assists search specialists in formulating queries in the manner which will reduce computer processing time to the greatest degree. when published, the authority files themselves can be printed in a special format which displays the entire hierarchy of each term. in addition, up to ten levels of increasingly narrower terms can be listed for each term.

summary
cain is a broad-based, comprehensive, batch-mode system which meets many library requirements. its flexibility is apparent from the fact that it has already been expanded to select each newly cataloged serial record for transmission in marc ii communication format to the national serials data bank being created by the three national libraries. still more capabilities will undoubtedly be built into it before the nal ultimate on-line system is implemented. the major thrust of the systems design has been to concentrate on simplifying the user interface while imposing stringent and extensive service requirements on the computer system itself. due to its inherent fluidity, cain is being retained as an in-house system; it is so complex that a single change in one subsystem may have radial effects in any or all of the other portions. continuing efforts are underway to simplify input, accelerate throughput, and expand its already generous services both to the staff of the national agricultural library and to those organizations utilizing output from the cain system.
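as a closing illustration of the reciprocal authority-file maintenance described in the cain authority files subsystem above, the following python sketch adds and deletes narrower-term relationships so that the broader-term side is always generated automatically; the data structure and method names are assumptions, not the cain implementation.

# sketch of reciprocal thesaurus maintenance in the spirit of the cain
# authority files subsystem: adding "horse" as a narrower term of "animal"
# also records "animal" as a broader term of "horse"; deleted terms are
# kept on file but flagged as no longer valid.
class AuthorityFile:
    def __init__(self):
        self.narrower = {}    # term -> set of narrower terms
        self.broader = {}     # term -> set of broader terms
        self.invalid = set()  # deleted terms, retained for searching only

    def add_narrower(self, base, narrower):
        # one transaction generates its reciprocal automatically
        self.narrower.setdefault(base, set()).add(narrower)
        self.broader.setdefault(narrower, set()).add(base)

    def delete(self, term):
        # remove the term from other hierarchies but keep it, marked invalid
        for terms in self.narrower.values():
            terms.discard(term)
        for terms in self.broader.values():
            terms.discard(term)
        self.invalid.add(term)

af = AuthorityFile()
af.add_narrower("animal", "horse")
print(af.broader["horse"])   # {'animal'} -- generated reciprocally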
a computer-accessed microfiche library
r. g. j. zimmermann: department of engineering-economic systems, stanford university, stanford, california. at the time this article was written, the author was a member of the technical staff, space photography laboratory, california institute of technology, pasadena, california.

this paper describes a user-interactive system for the selection and display of pictorial information stored on microfiche cards in a computer-controlled viewer. the system is designed to provide rapid access to photographic and graphical data. it is intended to provide a library of photographs of planetary bodies and is currently being used to store selected martian and lunar photography.

introduction
information is often most usefully stored in pictorial form. photography, for example, has become an important means of recording data, especially in the sciences. a major reason for this importance is that photographs can be used to record information collected by instruments and not normally observable by the unaided eye. such photographs, especially in large quantities, may present a barrier to their use because of the inconvenience of reproducing and handling them. it is apparent that a system to compactly store and to speed access to these photographs would be very useful. such a system, utilizing a microfiche viewer directly controlled by a user-interactive computer program, has been developed to support a library of photographs taken from space. in the past fifteen years, the national aeronautics and space administration has conducted many missions to photograph planetary bodies. these missions have provided millions of pictures of the earth, moon, and mars. a large number of additional pictures are expected to be taken in the near future. the space photography laboratory of the california institute of technology is establishing, under nasa auspices, a microfiche library of a selection of these photographs. the library currently contains the photographs of mars taken by the mariner 9 spacecraft as well as lunar photographs taken by the lunar orbiter series. the library is expected to be expanded as time and resources permit. it has been operating, with various versions of the control program, since june 1972. the program is currently being further developed by mr. david neff and miss laura horner of the space photography laboratory at the california institute of technology.

hardware
the photographs are kept on 105-by-148 mm microfiche cards, sixty frames to a card. this format provides the least reduction of any standard microfiche format and was used to retain the highest possible resolution. the cards are displayed by a microfiche viewer (image systems, culver city, california) which can store up to about 700 cards and has the capability of selecting a card and displaying any frame on it within a maximum of about four seconds. (throughout this paper, "viewer" will be used to refer to the microfiche viewing device.) the viewer can be equipped with a computer interface which allows the picture display to be directly computer controlled.
an installation consists of the viewer with interface, any standard input/output (i/o) terminal, and the control program, running, in this case, on a time-shared computer. the terminal is used for communication with the control program; the user enters all commands by typing on the terminal keyboard. the viewer is designed to be plugged in between the computer and the i/o terminal. the computer transmits all information on the circuit to which normally (without the viewer) only the terminal is attached. this information includes the viewer picture display control codes, which are recognized and intercepted by the viewer; all other information is passed on to the terminal. no further special equipment is necessary. the system described has been implemented on a digital equipment corporation system 10 medium-scale computer with a time-sharing operating system. the program is written mainly in fortran with some assembly language subroutines. it runs in 12k words (36 bits/word) of core memory. the program will not run without conversion on any computer other than the dec system 10.

software
the control program is user-interactive, that is, it accepts information and commands from the user. these commands allow him to indicate what he desires and to control the action taken by the program. the program permits the user to indicate what characteristics he wishes the pictures to have, selects the pictures that satisfy his criteria, and then allows him to control the display of the selected pictures and to obtain any additional information he may need to interpret the pictures. to guide the user, instructions for use of the system, as well as other information the user may need, are displayed on the viewer as they are required. all user responses are extensively checked for validity. any uninterpretable response is rejected with a message indicating the source of the trouble, and may be reentered in corrected form. it is always possible to return to a previous state, so it is impossible to make a "catastrophic" error. in designing the system, particular attention was paid to integrating the viewer and computer to utilize the unique capabilities of each. for example, most instructions are presented on the viewer, where they can be shown quickly and can be scanned easily by the user. only short messages need to be sent and received by the i/o terminal.
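the in-band arrangement just described, in which viewer control codes travel on the same line as ordinary terminal text and are filtered out by the viewer, can be sketched as a simple stream filter. the framing bytes used here are invented for illustration; the article does not specify the actual control-code format.

# sketch of in-band filtering in the spirit of the viewer described above:
# characters framed by (hypothetical) markers are treated as viewer control
# codes, and everything else is passed through to the terminal.
CTRL_START, CTRL_END = "\x01", "\x02"   # assumed framing, not from the article

def split_stream(stream: str):
    terminal_text, viewer_codes = [], []
    in_code = False
    for ch in stream:
        if ch == CTRL_START:
            in_code = True
            viewer_codes.append("")
        elif ch == CTRL_END:
            in_code = False
        elif in_code:
            viewer_codes[-1] += ch
        else:
            terminal_text.append(ch)
    return "".join(terminal_text), viewer_codes

text, codes = split_stream("please enter name\x01F123\x02 of file desired")
print(text)    # -> please enter name of file desired
print(codes)   # -> ['F123']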
data base
a picture is described by a number of characteristics, called parameters. for every picture stored in the viewer, the value for each of these parameters is stored in a disc file. in this application, parameters are mainly used to describe characteristics that are available without analyzing the picture for content; in science, these are the experimental conditions, such as viewing and lighting conditions for space photography. because space photographs are taken by missions with different objectives and equipment, it was necessary to design a library system to include pictures with widely varying selection characteristics. in order to accommodate sets of pictures with widely differing characteristics, without wasting storage space or requiring the elimination of useful descriptors, the computer storage has been structured to allow pictures to be grouped into picture sets, each of which is described by its own set of parameters. conversely, any group of pictures for which the same selection parameters are used forms a picture set. the characteristics of each such set of pictures are also stored, and the program reconfigures itself to these characteristics whenever a new picture set is encountered. such an organization allows the control program to be used on groups of totally different kinds of pictures.

operation
in selecting a picture set the user is guided along a series of decisions presented on the viewer. at each step the control program directs the viewer to display a frame with a set of possible choices. the user enters his response on the i/o terminal, and the control program uses this response to determine which frame the viewer should be commanded to display next. when the user has selected a set, he is shown the available parameters and appropriate values for these parameters. after he has specified acceptable values for the parameters he is interested in, the computer program compares these values with the known values in its records for the picture set. the pictures selected by the program are then available for display. as will be described, the user may, at any time, select another picture set or change his parameter specifications. he may also indicate which pictures of those selected by the computer during the comparison search he wishes to have remain available after the next comparison search. this allows comparison of pictures in different picture sets. appendix 1 shows an example of a typical search. the action of the control program can be separated into five phases of operation, each with a distinct function. the functions of three of these phases involve user interaction. transfer between phases may also be accomplished by user command. a different group of commands is employed for each of the user-interactive phases. in addition, there is a group of commands which may be used any time a user response is requested; they are listed in appendixes 3 and 4. there are no required commands or sequences of commands. the user proceeds from one phase to another as he desires. in each phase allowing user interaction, the user can enter any valid command at any time. figure 1 shows the phases and possible transfers between phases. a more detailed description of what occurs in each phase will be given after the data organization is described.

fig. 1. phases and control transfers (the five phases are picture set selection, parameter specification, search optimization, comparison search, and picture display and information access; bold lines enclose user-interactive phases, arrows indicate possible directions of control transfer, and bold arrows are control transfers made by user commands).

description of software
data base organization
as has been stated, the pictures of the library are grouped into picture sets. the data base may contain any number of picture sets. each such set has a picture file associated with it. this picture file is on disc storage and contains all the known information stored for a set of pictures. each picture in the set has an associated picture record in the file. in addition, the first record in a picture file, known as the format record, contains all the file-specific information about that file. whenever a new picture file is called for, the format record for that file is read from disc storage into main memory and kept for reference. figure 2 shows the organizational structure of the data base.

fig. 2. picture file organization (each picture file consists of a format record followed by the picture records; the data base may contain as many picture files as required).
picture records consist of a fixed- and a variable-length portion. the variable-length portion contains the known values, for the associated picture, of the specification parameters. since the number of parameters can vary from file to file, the length of this portion varies from file to file. (however, all picture records within a particular file have the same length and form.) the maximum number of parameters for a system is determined by array dimensions set when the program is compiled; currently these dimensions are set for a maximum of fifty parameters for any file in the system. the fixed-length portion contains (generally) the same type of information for all files. it includes the information needed to display a picture and to obtain interpretive information. when, during the comparison search, a picture is selected on the basis of information in the variable data, the fixed-length portion is copied into a table and kept for use during the picture display phase. each selected picture is represented by an entry in this table. the contents of the fixed-length portion are presented in table 1. as an example, the contents of a picture record for the mariner 9 photographs are given in appendix 5.

table 1. the fixed-length portion of a picture record (field - use)
fiche code - control code output by the control program to the viewer to display the frame associated with this picture record.
file name - the file name of the picture file; this and the picture number uniquely identify the picture record and allow it, and specifically the contents of the variable portion, to be refound.
picture number - a sequence number assigned to each picture record in the file in increasing order.
unit number - the viewer in which the picture associated with this picture record is stored.
id number - the identification number referred to by the user. if the picture has been given an id number by which it is commonly known, it will be kept in this field.
auxiliary codes (3 fields) - viewer control codes for frames containing different versions of, or auxiliary data for, the picture. the actual contents of these fields vary with the picture file, as determined from the contents of the format record of that file.

a picture file's format record describes the file by all characteristics that are allowed to vary from file to file. the format records for all picture files have the same form; each is divided into a number of fields supplying information for a particular function. these fields can be separated into two categories: those which describe the picture records and those which apply to the file as a whole. for fields of the first type, each parameter has an entry in the field. for example, one such field contains the location, in a picture record, of the value for each of the parameters; another field has a ten-letter description of each parameter. see appendix 2 for a description of the format field.
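a compact python rendering of the fixed-length portion in table 1 follows. the field names come from the table; the types and the example values (loosely modeled on the appendix 1 transcript) are assumptions.

# sketch of the fixed-length portion of a picture record (table 1 above).
from dataclasses import dataclass
from typing import Tuple

@dataclass
class PictureRecordFixed:
    fiche_code: str            # control code sent to the viewer to show the frame
    file_name: str             # with picture_number, uniquely identifies the record
    picture_number: int        # sequence number within the picture file
    unit_number: int           # which viewer holds the picture
    id_number: str             # identification number referred to by the user
    auxiliary_codes: Tuple[str, str, str]  # frames with other versions or auxiliary data

example = PictureRecordFixed(
    fiche_code="2-E-2 1-A",
    file_name="MMIX",
    picture_number=2,
    unit_number=0,
    id_number="9557769",
    auxiliary_codes=("2-E-2 5-A", "2-E-2 5-K", ""),
)
print(example.file_name, example.id_number)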
operation of the control program
the following is a brief technical description of the control program; detailed documentation is available. the control program is modularly constructed. each phase consists of a major subroutine and its subsidiary subroutines. at the completion of a phase, control is transferred to a main program which determines which phase is to be performed next and transfers control to it. the user-interactive (interrogation) subroutines ask for a user response, attempt to interpret the response and perform the desired function, then ask for another response. an important subroutine used by all the interrogation subroutines collects the characters of the user response into groups of similar characters to form alphabetic keywords, numbers, punctuation marks, relational operators, etc. when an interrogation subroutine is ready for a user request, it calls this "scanning" subroutine. the scanning subroutine outputs an asterisk, indicating it is ready, to the user i/o terminal. the scanning subroutine supplies the groups of characters, along with a description of each group, to the interrogation subroutine. the interrogation subroutine then attempts to interpret the character groups by comparing them with acceptable responses. if the response is not in one of the acceptable forms, an error message is given to the user and he can try again. the error message includes an indication of where the error was found and describes the error. some commands do not need to be interpreted by the interrogation subroutines; the function they request is the same throughout the program. these are called immediate commands and are listed in appendix 3. these commands are interpreted, and their functions performed, by the scanning subroutine.

picture set selection
in selecting a picture set the user is asked to make a series of decisions. for each decision, a frame listing the possible choices is displayed on the viewer. all possible decisions form an inverted tree structure (see figure 3). the user may also return to a previous decision point. the tree structure is implemented in a table in computer storage. there is an entry in this table corresponding to each decision point in the tree. when a decision is made, the entry corresponding to the new decision point is obtained. an entry at the bottom of the tree identifies the picture file associated with the picture set selected. in general, an entry contains: (1) the viewer control code of the frame displaying the choices; (2) a pointer to the entry from which this node was reached; (3) the number of possible decisions which can be made at this decision point (to check for valid decisions); and (4) pointers to the entries for the decision points reached.

fig. 3. example of a tree (decision points branch from martian and lunar photography, through orbital and surface missions such as mariner, viking, apollo, lunar orbiter, ranger, and surveyor, with further branches for venus and mercury flybys).
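the decision-point table just described can be sketched directly: each entry holds the viewer control code of the choice frame, a pointer back to its parent, the number of valid choices, and pointers to the child entries. the node names, frame codes, and file names below are illustrative (the file names echo the mmix and orbit files that appear in appendix 1), not the actual table contents.

# sketch of the picture-set selection table described above. each decision
# point stores the frame to display, its parent, and its children; entries
# with no children name a picture file. all values here are made up.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DecisionPoint:
    frame_code: str                      # viewer control code of the choice frame
    parent: Optional[str] = None         # pointer to the entry we came from
    children: list = field(default_factory=list)   # pointers to next decision points
    picture_file: Optional[str] = None   # set only at the bottom of the tree

    @property
    def n_choices(self) -> int:          # used to validate the user's decision
        return len(self.children)

tree = {
    "root":    DecisionPoint("F000", children=["martian", "lunar"]),
    "martian": DecisionPoint("F100", parent="root", children=["mmix"]),
    "lunar":   DecisionPoint("F200", parent="root", children=["orbit"]),
    "mmix":    DecisionPoint("F110", parent="martian", picture_file="MMIX"),
    "orbit":   DecisionPoint("F210", parent="lunar", picture_file="ORBIT"),
}

def choose(node: str, decision: int) -> str:
    """follow the user's decision (1-based) to the next decision point."""
    entry = tree[node]
    if not 1 <= decision <= entry.n_choices:
        raise ValueError("invalid decision")
    return entry.children[decision - 1]

print(choose("root", 1))     # -> martian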
parameter specification
once the user has made a decision selecting a set of pictures, he is presented with a list of the available parameters and acceptable values for them. for each parameter in which the user is interested, he specifies the parameter number and the values or range of values acceptable to him. this information is stored in two tables which are referred to when the comparison search is made. one table, the parameter table, contains an entry for each parameter specified; this table is cleared whenever a new picture set is called for. an entry in the table includes: (1) the parameter number; (2) a code indicating which of several methods is to be used in processing the parameter; (3) a code providing information on how the user-specified values are to be interpreted; and (4) a pointer to the location in a second table, the values table, where the first of the specified values is stored. all additional values are placed in the values table following the addressed value. the processing code (number (2) above) allows each parameter to be processed by a unique method. a standard method for a given parameter is kept in a field of the format record; the user can also specify a method other than the standard one. if an entry already exists for a just-entered parameter, the old entry is updated rather than a new one created.

search optimization
this phase determines the most efficient way to conduct the comparison search from among a set of alternatives. whenever possible, the search is restricted to only a part of the picture file. for each picture file there is a number of parameters for which additional information is available. specifically, if a list of pictures ordered by increasing value of a parameter is available, the pictures which have a particular value of that parameter can be found more quickly through this list than by searching through the whole file for that value of the parameter. if the position, in this ordered list, of the picture at the low end of a range of values (of the parameter the list is ordered on) can be found easily, the search can be started at this point and need only be continued until the picture at the high end has been reached. note that the picture records for the intervening pictures must nonetheless be compared with the user specifications, since the restriction is only made on the basis of one parameter whereas more than one may have been specified. a binary search is the method used to search the list for the first picture in a range of values. to use this method, of a set of n picture records the n/2th is chosen and its value of the parameter is compared with the desired one. since the list of records is in order of the value of this parameter, it is clear in which half of the list a picture with the desired value of the parameter would have to be. this interval can then be divided and the process continued until the remaining interval consists of only one picture. the main picture file is itself usually arranged in order of at least one parameter. for other parameters, control lists of picture numbers ordered by value of these parameters can be used for binary searches. however, it is not practical to create these lists for all parameters, as they require a fair amount of storage. an entry in such a list contains two words, the value of the parameter and the picture number of the corresponding picture. the picture number is a sequence number which determines the position of the picture record relative to the beginning of the picture file. each picture file has a table in its format record containing identifiers for the parameters for which the binary search technique can be used. if more than one of these has been specified (as stored in the parameter table), it must be determined which parameter restricts the search the most. to do this the upper and lower limits of the specified values of each such parameter are found (from the values table), and from this the expected number of picture records to be compared is computed. this number is multiplied by a factor indicating the speed of the type of search to be used relative to the speed of the simplest type of search. the parameter with the lowest expected elapsed time of search is selected for the search.
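in python, the control-list binary search described above reduces to finding the first entry whose value is at least the low end of the requested range and scanning forward until the high end is passed. the (value, picture number) pairs below are invented; only the method follows the text.

# sketch of the search-optimization step described above: binary-search an
# ordered control list of (value, picture_number) pairs for the low end of
# a range, then scan forward until the high end is passed.
import bisect

def candidates(control_list, low, high):
    """control_list is sorted by value; yield picture numbers with low <= value <= high."""
    values = [value for value, _ in control_list]
    start = bisect.bisect_left(values, low)        # the binary search
    for value, picture_number in control_list[start:]:
        if value > high:
            break
        yield picture_number

# hypothetical control list ordered by, say, orbit number
control = [(100, 17), (150, 4), (222, 31), (222, 32), (300, 9)]
print(list(candidates(control, 150, 250)))   # -> [4, 31, 32]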
comparison search
for each picture to be compared, the appropriate picture record is found and the specified parameter values are compared with those in the picture record. a control list, selected in the search optimization phase, may be used to determine which picture records are to be compared. for each selected picture an entry containing a portion of the picture record is made in a picture table. the picture table has a limited capacity which is set when the program is compiled; for our application there is currently room for up to 100 entries. if the picture table is filled before the search is finished, the search is suspended and can be continued by a command in the display phase.

picture display, information access
this phase accepts commands to control display of the selected pictures and provide access to interpretive information. the picture table entries provide the information needed, either directly or by referring back to the picture record. any of the selected pictures can be viewed at any time. in addition, the user can "mark" preferred pictures to differentiate them from the others. these marked pictures are set apart in the sense that many viewing and information access commands refer optionally to only these pictures. the pictures themselves are the primary source of information, but the user will often want information that is not available from the picture in order to interpret the picture. there are commands that request the control program to type out on the i/o terminal the information in a picture record. these commands optionally refer to the picture currently displayed, the marked pictures, or all the selected pictures. other commands call for the display of data frames associated with a picture. these frames can contain large volumes of data that need not be kept in computer storage. the viewer control codes for these frames are kept in the picture table. the keyword commands to display data frames can vary from file to file; the valid commands for a file are kept in the file's format record. there are other commands to transfer control to other phases and to keep desired pictures available for display with those selected by the next comparison search. there is also a provision for adding file-specific commands to perform any other function. the commands and their functions are listed in appendix 3.

performance and costs
a typical simple search consisting of logging in, picture selection, parameter specification, search, and display might take five to ten minutes and cost one to two dollars for compute time. most of this is time spent by the user in entering commands. command execution is usually almost immediate, as it does not involve a major amount of computation. most of the compute time is accumulated during the comparison search phase. to search through the entire mariner 9 picture file of around 7,000 pictures (about 200,000 words) takes about forty seconds elapsed time and costs about two dollars. a more typical search, however, will allow some search optimization and cost about thirty cents with an elapsed time of ten seconds.
of course, these timing and cost figures should only be used as estimates, even for other dec system 10 systems, as elapsed time depends on system load and this, as well as the rates charged, varies considerably. total monthly compute costs for a system depend entirely on use. likewise, storage costs depend on actual storage space used. for the 200,000-word mariner 9 file our cost is about seventy-five dollars per month. only the most-used picture files actually need be kept on disc; the rest can be copied from magnetic tape if they are needed. all files are backed up on magnetic tape in any case. the rates listed in this paper are those charged by our campus time-sharing system. dec system 10 computer time is available from commercial firms at somewhat higher rates. the cost for a microfiche viewer with computer interface (image systems, culver city, california, model 201) is around $7,000. a thirty-characters-per-second i/o terminal sells for $1,500 and leases for $90 per month. in addition, an installation may require a microfiche camera and other photographic equipment and supplies. photographic services are also available from the viewer manufacturer. the hardware cost for an independent system implemented on a minicomputer with 12k to 20k of core and five million words of disc memory is estimated at an additional $30,000 (exclusive of development and photographic costs).

implementing a library system
in implementing a library system to use the hardware and software described in this paper, two major areas of effort are required. first, the pictorial information must be converted to microfiche format; that is, it must be photographed, or possibly rephotographed if already in photographic form. in addition, a computer data base must be created. if information about the photographs is already available in computer-readable form, this involves writing a program to convert the data to the structure required by the control program. if this type of information is not available, the pictures may need to be investigated and the information coded, and presumably punched onto computer cards, for further processing. the major difficulties we encountered were coordinating the photographic and data base generation tasks, achieving the high resolution we required to retain the detail of the original photographs, and using early versions of the microfiche viewer (which had a tendency to jam cards).

conclusion
a system for rapid access to pictorial information, the computer accessed microfiche library (caml), has been described. caml has been designed to integrate, in an easy-to-use system, the storage capacity and capability for fast retrieval of a special microfiche viewer with the manipulating ability and speed of a computer. it is believed that this system will help overcome the barriers to the full utilization of photographs in large quantities, as well as have applications in the retrieval of other types of pictorial information.

acknowledgments
the work described in this paper was supported by nasa grant #ngr 05-002-117. the author is grateful to dr. bruce murray and the staff of the space photography laboratory at caltech for their support and advice; he also wishes to acknowledge the efforts of mr. james fuhrman, who assisted in the programming task and contributed many valuable ideas.

appendix 1
the following is an example of a typical search. numbers in the left margin indicate when a new frame is displayed on the viewer.
these were added later to clarify the interaction between viewer and terminal. user responses and commands are identified by lines beginning with an asterisk. (the control program types asterisks when it is ready for input.) in this demonstration, most keywords were completely typed out. it is possible, however, to abbreviate any keyword to the shortest form that will be unique among the acceptable keywords. after the user enters a standard "log in" procedure to identify his account number and verify that he is an authorized user of this account, the control program is automatically initiated. the viewer displays a picture (1) of the installation and the user is asked to enter his name. the name, charges, and time of use will later be added to an accounting file. the user now enters the picture set selection phase. in the current system, only two files (picture sets) are stored and the user is simply presented with a frame (2) listing the file names and giving a short description of what is contained in each. the user types the desired file name (mmix-mariner 9 mars photographs) and thus enters the parameter specification phase. the available selection parameters and acceptable values are now shown (3). the user specifies some parameters.

[terminal listing, frames 1 through 13: log-in; entry of the user's name; selection of file mmix; specification of the parameters orbit 222, camera a, and latitude -45 to 45; a search selecting 2 pictures; display, mark, and type parameters commands; an example of an erroneous command and the resulting error message; respecification and a restart against the orbit file, selecting 22 pictures; a charges inquiry; and log-off.]

if not used, file name is assumed to refer to the file last searched. if the parameters are not enumerated, those specified for the picture selection are typed out. the parameters to be typed out can be enumerated or the specification parameters called for. if neither of these is done, the values of all parameters are typed out. parameters typed out are identified by column headings.

phase transfer commands and their functions:
respecify: allows respecification of selection parameters; only those parameters which are reentered are changed, and previously specified parameters retain their values.
search: similar to respecify, except that only those pictures in the present list are candidates for selection. this is more efficient than again searching through all the pictures.
continue: if the search was terminated before all pictures had been processed, the search is continued from where it had been suspended.
restart: to view another set of pictures (all specified parameter values are deleted).

appendix 5
mariner 9 picture records (fields numbered 1-22 and 23-28)
fixed-length portion: fiche code; data code; file name; id number (das); unit #; picture number; footprint code; unused.
variable portion: das time; orbit; latitude; longitude; solar lighting angle; phase angle; viewing angle; slant range; camera; resolution; local time; filter; exposure time; roll and file of filter version on roll film; comments (content descriptors).

journal of library automation vol. 4/1 march, 1971
book reviews
computerized library catalogs: their growth, cost, and utility, by j. l. dolby, v. j. forsyth, [and] h. l. resnikoff. cambridge, mass.: the m.i.t. press, 1969. 164 pp. $10.00. on the verso of the title page of this book extolling the benefits of computer stored information we read: "no part of this book may be reproduced in any form or by any means, . . . , or by any information storage and retrieval system, without permission in writing from the publisher." this is ironic evidence of the inner contradiction between the pioneering aspirations of new technological development and the interests of an existing industry which feels threatened by unfounded fear of obsolescence and pursues claims to undesirable universal control of information. it is a vivid indication of an urgent need for statutory regulation of the right to information as opposed to the right of profit-motivated control of information. behind this disturbing title page are found seven chapters, three of which constitute a specially important contribution to the very scarce literature on quantitative aspects of bibliographic administration.
the other four chapters deal at some length with various cost-related aspects of bibliographic record conversion in machine readable form and with computer use for the production of library catalogs from such records: a chapter analyzing the user costs, costs of programming, hardware costs, and record conversion costs; another chapter on the effect of type face design and page format characteristics on the cost of printed catalogs; a chapter on automated error detection in bibliographic record processing; and a chapter on the use of machine readable catalog data in the production of bibliographies. the three chapters on statistical analysis of machine readable bibliographic data are chef-d'oeuvres of library literature. they demonstrate the wealth of quantitative information inherent in bibliographic record files, which with application of appropriate statistical methodology can yield most important information for library management. the introductory chapter illustrates a methodology of analysis of book publication trends, and comparing these with the gross national product and other economic indicators, points out forcibly the extent of vital quantitative information available to the administrator if he cares to analyze. this is a brilliant essay on the topic of the growth rate of library collections! the case study of the fondren library shelf list is a further elaboration of this theme, especially in terms of title vs. volume ratios and class distri· bution, leading to the third analytical essay on the similarities between the economic growth of nations and archival acquisition rates. this essay trio should not be missed by anyone concerned with the objectives and rational management of libraries. it should be obligatory book reviews 53 reading for administrators who want food for their creative vision. these essays not only are informative, illuminating and stimulating, they also attest the virtuosity of their authors in the area of imaginative statistical analysis. we put this book on our ready reference shelf with the conviction that its authors should be persuaded to give us a most needed textbook on the methodology of library statistical analysis. ritvars bregzis library use of computers, an introduction, edited by gloria l. smith and robert s. meyer. new york: special libraries association, 1969. 114 pp. $5.00. ( sla monograph no. 3). this little paperback book is the result of a lecture series held during the spring of 1965 and sponsored by sla's san francisco bay region chapter and the university extension, university of california, berkeley. according to the introduction, it is intended to be "a librarian's primer for computers: briefly, how they work, and basically what kinds of output can be got from them for use in libraries." the book includes most aspects of library automation in a very generalized manner. the chapters include programming, systems analysis, hardware, applications, reference services, conversion, and current trends and future applications. the most informative chapters are the ones on application and current trends. the other chapters matter-of-factly present their information, but because they fail to raise questions do not challenge the beginner to seek additional answers. in addition, most of the papers do not include a bibliography or reading list and therefore do not give the reader guidance to further his interests. 
the more substantive papers in the book confront the reader with provocative statements, such as, "it should be the cost to the patron which the library should worry about. what is the cost to the ~atron of not knowing something, or the cost of having to find out?" or ' . . . library science . . . today is really only a technology, much like medicine was before biology was developed, namely a handbook-cookbook world. we must continue to search for underlying principles so that librarianship may become a science and grow out of its technological phase." the papers without doubt met the needs of the lecture series for which they were originally intended, and the authors are experts in the field. most of them are well known for their work in library automation and mechanized information retrieval the question nevertheless arises-must ~very spoken word subsequently appear in print? very little contribution is made to the literature by this book as most of it has appeared elsewhere, many times, before-even down to many of the illustrations. as a group of elementary instructional lectures designed to stimulate intere!jt, most of these papers make the grade, but as a contribution to the literature, a fllw of the papers may have better served as journal articles. donald p. hamme1' 54 journal of library automation vol. 4/1 march, 1971 the career of the academic librarian-a study of the social origins, educational attainments, vocational experiences, and personality characteristics of a group of american academic librarians, by perry d. morrison. chicago: american library association, 1969. viii, 165 pp. $4.50. this acrl monograph no. 29 is a revised and condensed version of a dissertation for the degree of doctor of library science at the university of california at berkeley, published ten years after the data were collected. in 1958 the author sent questionnaires to two groups of academic librarians: head librarians of american college and university libraries earning $6,000 or more, which he calls the "primary group," and a "control group" from the same institutions selected from the 1955 edition of who's who in library service. the findings are quite interesting, often expected, but sometimes surprising. it would be expected that the study would support the theory that the true leader has interests broader than those whom he leads, that participation in professional and scholarly organizations is directly correlated with position and salary, and that willingness to move around to different positions is an important ingredient in the formula of "success". but this reviewer, and perhaps others, are surprised to learn that librarians, psychologists and business leaders tend to come from families of men who are better educated than the average, and that it is more advantageous for a woman than a man to hold a master's degree in a subject field and/or to possess the old-type master's in library science. the chapter on "implications of findings" has some interesting comments from respondents, among which is the notion that rewarding specialized competence, as opposed to general administrative ability, is essential if a maximum contribution is to be secured from both men and women. not many academic library administrators would quarrel with this point of view, but might quarrel with the heavy burden which the respondents lay on the library schools to solve the problems of academic librarianship, rather than sharing them with the libraries. 
criticism of the study is directed toward the rather pessimistic, probably unrealistic, attitude taken by the author on the question of faculty status, and the omission of any substantial inclusion about information science and information specialists, although one needs to recall that these were not prominent issues in 1958, though they were in 1969. the wealth of information contained in this little volume can be useful in recruiting to the profession and to individual libraries, helpful to the young librarian just starting out in the practice of his profession, and of value to administrators of academic libraries. the appendices include a copy of the questionnaire, a statement of the statistical treatment, suggestions for further research and an extensive bibliography. lewis c. branscomb book reviews 55 the computer in the public service: an annotated bibliography 19661969. compiled by public administration service. chicago: public information service, 1970. 74 pp. $8.00. it seemed useful to approach this review from the point of view of a designer of automated library systems, to see how well this bibliography covered the use of computers in that sector of the public service. thus approached, the computer in the public service is disappointing in the extreme. nearly a quarter of the volume is used to explain the classification system, wherein one finds libraries lumped together with museums and the promotion of science and research. this over-generalization is reflected throughout the work and results in very thin coverage, at least as concems the field of library automation. for example, in the four-year period covered, one finds a single article by henriette a vram. one finds neither marc nor recon in the subject index. the indexes, by the way, refer one to the numbers of the sections of the classification code, rather than to page numbers. this takes some getting used to. both journal articles and monographic works are included, but one is hard put to find reference to most of the major systems problems now being tackled by library systems designers, and is struck by the fact that most imprints included seem to be 1966-67. but to return to the real problem with this work: covering four years of the use of computers in the public service in forty-seven pages of citations naturally leaves wide gaps in the literature. the person wishing to learn all about the use of computers in the public service, especially in library systems, will do well to look beyond this meagre list of citations. it can perhaps be best characterized as too little and too late. lawrence g. livingston computer programming for business and social science, by paul l. emerick and joseph wilkinson. homewood, ill.: richard d. irwin, 1970. xviii, 429 pp. $13.25. this book is devoted principally to business programming, with a few of the examples being taken from educational administration or social ~cience. the computer language used throughout most of the chapters ~s fortran-not the current fortran iv, which is covered briefly m an appendix, but the older fortran ii. one chapter is devoted to cobol and another to an "overview of other languages" including basic, algol, and pl/i. it is a satisfactory introduction to programming fundamentals, generally clear and well presented. furthermore, the ?umerous exercises in business programming would seem to be a useful mtroduction to that field. 
however, neither the language nor the type of ~xample used gives the book any special relevance for persons approaching library automation. foster m. palmer 56 ] oumal of library automation vol. 4/1 march, 1971 consortiwn of universities of the washington metropolitan area: union list of serials, edited by bruce dack. 2d edition. washington, d.c.: distributed by the catholic university of america press, 1970. $20.00. this edition represents 23,787 diherent serial titles located in american, catholic, george washington, georgetown and howard universities. the scope has been enlarged to include monographic series. among the subjects emphasized are africana, astronomy, canon law, chemistry, classics, latin americana, law, linguistics, medicine, physics, semitics, theology and the american negro. although the citations lack the complete bibliographic data that was included in the first edition, the cross references are more than adequate to get from variant forms of the entry to the latest form. in most instances only the title is noted along with the holdings of the various libraries. the print is not as good as that of the first edition. even though the quality does not quite come up to that of the first edition, the list is still worthwhile for people doing research in the washington metropolitan area. sue brown the case for faculty status for academic librarians, edited by lewis c. branscomb. chicago: american library association, 1970. 122p. $5.00. this volwne is available at a most opportune time because of the current interest in faculty status for librarians in academic institutions. fourteen articles and statements examine the various aspects of the subject and generally support faculty status for librarians. although only two of the articles have not appeared in colleges and research libraries during the past decade, this compilation effectively brings together the relevant statements on the subject. faculty status for librarians is a burning issue on many campuses where recognition is being sought, threatened, or questioned. unfortunately, many librarians take faculty status for granted or perhaps do not completely understand or appreciate it. this book will be useful for many groups and situations because of its comprehensiveness. some of the subjects discussed are priviliges and obligations of faculty status, definition of professional duties, criteria for appointment and promotion, opportunities for study and research, and the granting of tenure. on individual campuses faculty status for librarians must be secured and retained in terms locally acceptable and in a frequently unfriendly, if not hostile, atmosphere. the article, "institutional dynamics of faculty status for librarians," by robert h . muller very effectively describes the advantages of faculty status for the institution and explains the forces that tend to work against faculty status for librarians. ]ames t. dodson book reviews 57 monocle; projet de mise en ordinateur d'une notice catalographique de livre, [by] marc chauveinc. grenoble: bibliotheque universitaire, 1970. (publications de ia bibliotheque u niversitaire de grenoble, iii) 156p., [32 leaves] this attractively printed volume is an adaptation, not a translation, of lc marc ii and bnb marc by the librarian of the university of grenoble. as m. chauveinc points out, the format established in a country is based on the cataloging rules used there, since the latter determine the cataloging elements, their functions and relationships. 
element by element, code by code, everything in marc had to be redefined for monocle in the context of the french rules. because of a commitment to international standardization, care was taken not to alter the basic structure of the content designators used in marc. when a marc variable field is not needed in monocle, e.g., 650 ( l. c. topical subject headings) it is left unused and a new tag is created for the equivalent french field. french topical subject headings are thus 681. marc is a communications format; monocle is a processing format for an individual library which wishes to place its bibliographic records in memory, to produce printed catalogs, and to keep statistics that will be useful in administering the library. the structure of monocle divides each record into two parts: an index file and a library or principal file. this latter is a continuous string of variable length fields which contain the data of a catalog record. the fields and subfields are differentiated in this file only by delimiters and subfield codes. the tags and indicators appear in the directory. the index file in its turn is divided into three parts: a leader, information codes, and a directory which is similar to, though not identical with, the marc directory. the nineteen-byte leader and the sixty-nine-byte information codes area together form the legend. the leader is based on the marc leader and 001 (control number) field while the information codes are an expansion and replacement of marc's 008 field, with provision for serials as well as monographs in this one format. some of the information indicated in the legend can be found in full in the variable fields. however, in line with monocle's avowed objective of automating to improve the running of a library, this information is repeated in the legend, since information placed there, being coded and in fixed positions, can be accessed and sorted more easily, rapidly, and economically. one of monocle's most striking characteristics is the great concern exhibited throughout for sorting and filing arrangement; one example is the effort made to give sorting values to subfield codes. in this respect monocle follows the example of bnb marc rather than lc marc, 58 journal of library automation vol. 4/1 march, 1971 which uses subfield codes only as a means of identifying distinct elements within a field. everyone interested in the problems of bibliographic formatting, or sorting for filing, should give monocle close attention, both for its specific provisions (such as its tagging conventions, its search code, its treatment of titles, subrecords, and references, etc.) and for the light it throws on marc. while lc marc is very succinct in its commentaries and rru:ely justifies its codes, monocle has much fuller explanations of its provisions and its reasons for agreeing with or differing from marc. judith hopkins system scope for library automation and generalized information storage and retrieval at stanford university. stanford, calif.: stanford university, 1970. 157 pp. (available from eric document reproduction service. ed 038153 mf $0.75; hc $7.70) "the purpose of this document is to define the scope of a manual-automated system to serve the libraries and the teaching and research community of stanford university." 
the automated system considered is not one, but the joint development of two major bibliographic projects; ballots (bibliographic automation of large library operations on a timesharing system) and spires (stanford physics information retrieval system). the development activity falls into three areas; applications unique to ballots, applications unique to spires, and common facilities that are used by both applications, such as executive and communications software and a text editor. the document is roughly divided in two, with hah being devoted to the scope statement and the other half a myriad collection of appendices. the scope portion of the document defines a second phase of development for the system, as prototype applications have been in operation. the objectives of the applications are redefined in system level detail in view of experience learned from phase one. hardware is evaluated and there are indications that it is inadequate to effectively handle even the prototype system. the appendices include a glossary for the uninitiated, sample documentation of the present library operations, a comment on how the law library could use the system, a review by louise addis of the stanford linear accelerator center's experience with spires, and a tutorial on information retrieval. because of the audience this publication is intended for (librarian users, system developers, and administrators) library automation specialists and information scientists will not find much to put their teeth into. the document seems to be intended mainly for internal use rather than external distribution. alan d. hogan book reviews 59 advances in librarianship, edited by melvin j. voigt. volume 1. new york: academic press, 1970. 294 pp. this volume is a most welcome addition to the literature of librarianship, and the prospect of its annual reappearance is indeed cheering. for decades, there were few major innovations in librarianship, but about 1960 there occurred a series of events in libraries, including user-operated photocopying machines, radical improvements and extensions of microphotography, and computerization, that are leading to formulation of new objectives, new systems, new techniques for printing, new media of communication, and new knowledge. up until a dozen years ago, a librarian could keep abreast of new knowledge in his field by skimming a few journals and reading precious few articles. today to keep up he should read a couple of abstract journals and request offprints, read most of the artic1es in at least four or five journals, and advances in libmrianship. this first volume contains eleven chapters by different authors that · discuss topics in a broad span of librarianship : cataloging; acquisitions; costs; academic, school and public libraries; bibliotherapy; and developing countries. although the standard observation by reviewers of such volumes is that "as is to be expected, the quality of the papers is uneven, with some falling below others," it would be accurate to state of this volume that the quality of some papers is higher than others. the editor is to be commended for having produced an excellent publication that should be on the personal shelves of every librarian who wishes to keep abreast of advances throughout his profession. frederick g. kilgour folkbiblioteken och adb. en introduktion i automatisk databehandling. av sten henriksson. datorn som hjalpmedel vid utlan och katalogisering. av claes axelson, lund: bibliotekstjanst, 1969. 86 p., ills., reg. 
the general service bureau for swedish public libraries, bibliotekstjanst, has edited this introduction to computers and library automation ( circulation and cataloguing). the book is written for persons who know the functions of a library and have perhaps more interest in than knowledge of computers and computer technique. the introduction about computers by sten henriksson from the university of lund is not loaded with numeric statements but carefully explains the computer technique. for non-mathematicians this is an extremely good introduction. claes axelson from bibliotekstjanst writes about the computer's use in circulation and cataloguing. the described circulation system is based on some experiments carried out at a branch library in malmo (stock: 15,000 vols., circulation: 50,000 vols per year). borrower's card and book-card are matched, and batch processed once a week. in case any cards are lost, 60 journal of library automation vol. 4/1 march, 1971 new ones are punched at the desk. reservations were troublesome to handle (a more serious objection if the systems had been intended for use in research and special libraries). cataloguing is described without referring to any experiments. all possibilities are mentioned, book catalogues, card catalogues, databanks, microfiche. specific problems about book cataloguing are treated too. both parts of the book are well written and illustrated with taste and economy. there is an index and a bibliography, but one will find ]ola and program missing. for librarians in scandinavian countries this book is a very useful one-though some problems are related only to the swedish public library world. mogens weitemeyer latin american literature. harvard university library (widener library shelhist, 21). cambridge, massachusetts : harvard university library, 1969. 498 pp. $20.00. the harvard library initiated its monumental project for publishing the widener shelf list in 1965 with the appearance of crusades. making available a classed listing of the holdings of one of the world's largest scholarly libraries is a major contribution to scholarship, as reviewers of the early volumes gratefully acknowledged. however, this review is not concerned with content of this remarkable publication, but rather with the typography of the 21st volume. the early volumes ( 1-8) were reproduced by photo-offset from computer produced copy that was all in upper-case characters. to be sure, one does not read a shelf list, one consults it. nevertheless, literature is in lower case, and using large compilations in upper case is tiresome because of the low legibility of upper-case characters. the plates of these first volumes contained an average of 55 entries. the harvard library improved its procedures about a year after the volumes began to appear, so that the computer printout was in an expanded character set that included lower-case characters and diacritics. the newer volumes were far more legible and therefore far more comfortable to use. each page carried a single column of entries; pages averaged 85 entries. beginning with volume 21, computerized phototypesetting techniques are being employed, and legibility is greatly improved. economy has also improved, for each page now averages over 130 entries per page-nearly two-and-one-half times the content of pages in the early volumes. volume 21 has two columns of entries per page-a format that enhances the number of entries as well as legibility. 
harvard is to be congratulated on taking advantage of computer developments during the first five years of publication of the widener shelflist. thereby, the aesthetics and economy of a major bibliographic publication have been gratifyingly enhanced. frederick g. kilgofjf june_ital_rubel_final picture perfect: using photographic previews to enhance realia collections for library patrons and staff dejah t. rubel information technology and libraries | june 2017 59 abstract like many academic libraries, the ferris library for information, technology, and education (flite) acquires a range of materials, including learning objects, to best suit our students’ needs. some of these objects, such as the educational manipulatives and anatomical models, are common to academic libraries but others, such as the tabletop games, are not. after our liaison to the school of education discovered some accessibility issues with innovative interfaces' media management module, we decided to examine all three of our realia collections to determine what our goals in providing catalog records and visual representations would be. once we concluded that we needed photographic previews to both enhance discovery and speed circulation service, choosing processing methods for each collection became much easier. this article will discuss how we created enhanced records for all three realia collections including custom metadata, links to additional materials, and photographic previews. introduction ferris state university’s full-time enrollment for fall 2015 was 14,715 students. of these students, 10,216 are big rapids residents and the other 4,499 are either kendall college of art and design students or at other off-campus sites across michigan.1 during the 2014-2015 school year, flite had 14,647 check-outs including 2,558 check-outs of items in reserves, which is where our realia collections are located.2 however, reserves includes other items in addition to these collections, thus making analysis of circulation statistics problematic. another problem with conducting such an analysis is that the educational manipulative collection already had photographic previews and the tabletop game collection is a pilot project, so there is no clear before and after comparison. we can, however, demonstrate that enhancing the catalog records for our anatomical model collection had an incredibly significant impact, jumping from a handful of check-outs from 2014-2015 to almost 450 in 2016. literature review although there are very few libraries using photographic previews for their realia collections, the ones that do described similar limitations with bibliographic records and goals that only dejah t .rubel (rubeld@ferris.edu) is the metadata and electronic resources management librarian, ferris state university, big rapids, mi. picture perfect: using photographic previews to enhance realia collections for library patrons and staff | rubel | https://doi.org/10.6017/ital.v36i2.9474 60 photographic previews could meet. most realia collections that warranted this extra effort are either curriculum materials or anatomical models, which is not surprising considering how difficult they are to describe. as butler and kvenild noted in their article on cataloging curriculum materials, “patrons struggled to identify which game or kit they sought based on the…information in the online catalog,” because “discovering curriculum materials in the catalog and getting a sense of the item are not easy when using traditional catalog descriptions...”3. 
as they continue, “the inventory and retrieval problems…were compounded by the fact that existing catalog records were not as descriptive as they should be.”4 this was also a problem for our collections because our names and descriptions were often not intuitive or precise. in addition, as loesch and deyrup discovered while cataloging their curriculum materials collection, “…there was great inconsistency among the oclc records regarding the labeling of the format…,”5 which was another issue we needed to address. although the general material designation (gmd) has since been rendered obsolete, flite continues to use it to highlight certain material. this choice is due to some limitations with our library management system as well as our discovery layer, namely the lack of good mapping or use of the 33x fields. until this is rectified with a more modern system, we have it found it easier to retain certain gmds like “sound recording”, “electronic resource”, and “realia”. thus, we needed to standardize our terms for each collection. another problem that our predecessors indicated photographic previews might resolve was missing objects or pieces of objects.6 this becomes especially important for our tabletop games collection because most of those pieces are very small and too numerous for a piece count upon return. fortunately, “previews…can aid users in making better decisions about potential relevance, and extract gist more accurately and rapidly than traditional hit lists provided by search engines.”7 ideally, a preview will display an appropriate level of information about the object it represents in order “…to support users in making a correct judgement about the relevance of that object to the user’s information need.”8 greene goes further by listing the main roles for previews of which the first two are the most applicable for photographic previews: aiding retrieval and aiding users in quickly making relevance decisions.9 for these uses, photographic previews of realia are ideal because users can examine the object without needing to see its details and they expect them to be abstract, not exhaustive, unlike digital surrogates that an archive would use.10 as greene also notes, the high-level goal of any preview is to "...communicate the level and scope of objects to users so that comprehension is maximized and disorientation is minimized."11 a common finding among all the previous projects was that even a single photograph provides more readily comprehensible information than several lines of description. as moeller states regarding their journal project, "they [previews of each issue's cover] give the researcher or student an immediate idea of the nature of the journal."12 he goes further to give the example of an innocuous journal title for a propagandist serial whose political nature is transparent once you view its imagery. from a staff perspective, photographic previews can also easily illustrate the number of information technology and libraries | june 2017 61 pieces and an object's condition or orientation. this can be very useful in determining whether something is missing or damaged without having to do a time-consuming individual piece count upon check-in. but as butler and kvenild discuss, layout within each photograph is key for illustrating missing pieces.13 unfortunately, aside from a few small projects mentioned in butler and kvenild's article, there are not many examples of photographic previews for realia collections currently being used by academic libraries. 
one reason might be software limitations. innovative's media management module is still unique among ils/lms software in that most vendors either provide a separate digital repository for special collections digital surrogates or they incorporate images into the catalog using third party software like syndetic solutionstm. another reason for the lack of photographic previews within catalogs may simply be the rarity of realia in academic libraries. every library certainly has a few unique pieces, like a skeleton for the pre-medical students, but often not enough to consider them an entire collection much less a complex enough collection to warrant the extra effort to create photographic previews of each item. at flite, we had already crossed that threshold of complexity. therefore, this article will start by discussing our educational manipulative collection, which provided the basis for how we would catalog and process the tabletop games and anatomical models. educational manipulative collection our first foray into creating photographic previews was completed by the previous cataloger with over 300 items cataloged in 2004 and another 30-40 added to the collection over the next decade. unlike the other realia collections, the educational manipulatives were cataloged using innovative’s course reserves module, so no attempt was made to find or create oclc records. nevertheless, the minimal metadata is very consistent across the collection, which supports greene’s recommendation “…that it was important to define a set of consistent attributes at the high level of the collection if any effective browsing across the collections was to be provided.”14 in our case, we rely on a combination of the gmd ([realia]), a custom call number prefix (toys box #), and a limited amount of local subject headings as shown below with “manipulatives” as the common subject for the entire collection. 690 = (d) current local subject headings in use as of 12/3/15: art. infant/toddler. block props. magnets. boards. manipulatives. cognitive. music. discovery box. oversize books. discovery. posters. dramatics. puppets. finger puppets. story apron. flannel board. story props. gross motor. woodworking. picture perfect: using photographic previews to enhance realia collections for library patrons and staff | rubel | https://doi.org/10.6017/ital.v36i2.9474 62 due to the nature of descriptive metadata, photographic previews of the educational manipulatives made logical sense because “the images…are not the content. they are the metadata, the description of the materials.”15 as moeller describes, innovative’s media management module links images and many other file types directly to bibliographic records without requiring users to click an additional link unless they want to view a larger image of a thumbnail.16 similar to butler and kvenild’s project, all of our photos were 900 pixels wide by 600 pixels tall, which is slightly smaller than their default width of 1000 pixels.17 one advantage of using the media management module is its ability to automatically create thumbnails 185 pixels wide by 85 pixels tall. a bigger advantage is that the images are hosted on the same server that runs our catalog, which allows us to freely distribute the images in an intuitive manner (thumbnails instead of links) without having to worry about authentication to a shared folder from off-campus, unlike our pdf files. 
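as a rough illustration of the image sizes discussed above, the following python sketch produces a 900 x 600 preview and a 185 x 85 thumbnail from a source photograph. the use of the pillow library and the file names are assumptions for the example; the article does not say what software produced the images, and media management generates its own thumbnails.

```python
from PIL import Image

PREVIEW_SIZE = (900, 600)    # preview size used for the manipulative photographs
THUMBNAIL_SIZE = (185, 85)   # thumbnail size generated by media management

def make_preview_and_thumbnail(source_path, preview_path, thumbnail_path):
    """resize one source photograph to the preview and thumbnail sizes."""
    with Image.open(source_path) as img:
        img.resize(PREVIEW_SIZE).save(preview_path)
        img.resize(THUMBNAIL_SIZE).save(thumbnail_path)

# hypothetical file names for one item in the collection
make_preview_and_thumbnail("toys_box_042.jpg",
                           "toys_box_042_preview.jpg",
                           "toys_box_042_thumb.jpg")
```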
unfortunately, our liaison to the school of education recently discovered some accessibility issues with media management that forced us to consider whether we should change the embedded photographic previews to external links. the most significant of these problems is simply the language of the proprietary viewer software. because it is written in java, if you click on a thumbnail for a larger image, many browsers, like chrome, will not run it and those that will often require a security exception to do so. we have attempted to ameliorate some of these issues by providing an faq entry on which browsers are best for viewing these images and how to add a security exception for our website, but unless or until innovative rewrites this software in a different language, these accessibility issues will persist because java is being phased out of many browsers. butler and kvenild also noted its slow response time compared to their own server.18 another issue they mentioned was that the thumbnails would not be visible in their consortial catalog, so they needed to add links in the 856 field for these users.19 this is less of an issue for us because we do not contribute any of our realia records to our consortia catalog, but moeller’s concern that in general “…enhancements involving scanned images…will not be easily shared with other libraries,”20 is entirely valid. unlike oclc records, there is no way to share attached or embedded images as part of the metadata and not the content. contrariwise, butler and kvenild’s concerns regarding catalog migration are very pertinent because we are considering moving to a new lms within the next few years.21 although we acknowledge that “utilizing 856 tags is an indirect method of accessing the images, as users must take the intiative to follow the links,” we will eventually have to move and link our photographic previews to ensure accessibility after migration.22 tabletop game collection unlike the educational manipulatives, the majority of the tabletop game collection was previously cataloged in oclc, so finding good bibliographic records was easy. once downloaded, we decided to add a unique gmd ([game]), custom call number prefix (board game box #), and local subject heading “tabletop games”. however, our emerging technlogies librarian who coordinated this information technology and libraries | june 2017 63 pilot project felt that the single subject heading was not descriptive enough. so he gave us a spreadsheet with more specific subject headings such as “deck building”, “historical”, and “resource management” that we added as genre/form subject headings in the 655_4 field. he also suggested that we add links to the rule books, which we did using the 856 field and the link text “connect to rule book (pdf)”. because tabletop games are commercial products, finding images online was also easy. at first, we had some concerns about copyright, but we are not reselling these products or using the image as a replacement for the item. so, we concurred with butler and kvenild that “…the images in our project fall under copyright fair use.”23 another plus to using commercial images is that we could use more than one to show various aspects of setup and play. the downside to this benefit is image sizes and content photographed varied widely, so we used our best judgement in creating labels and tried to keep them as consistent as possible. 
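the record enhancements described for the games can be shown with a small sketch. the python fragment below writes the genre/form headings (655, second indicator 4) and the rule book link (856) as plain marc-style text strings; the helper function, the sample data, the 856 indicators, and the choice of the $z subfield for the link text are assumptions made for illustration.

```python
def game_marc_fields(genres, rulebook_url, link_text="connect to rule book (pdf)"):
    """build 655 genre/form headings and an 856 rule-book link as text strings."""
    fields = []
    for genre in genres:
        # 655 with second indicator 4 (source not specified), as described above
        fields.append(f"655 _4 $a {genre}")
    # 856 first indicator 4 = http; second indicator and $z for link text are assumptions
    fields.append(f"856 4_ $u {rulebook_url} $z {link_text}")
    return fields

for field in game_marc_fields(["deck building", "resource management"],
                              "https://example.edu/rulebooks/sample-game.pdf"):
    print(field)
```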
to ensure consistency across the collection, we decided that the first image should always be the top of the game’s box labeled “box cover” or “box cover – front” if there was a “box cover – back” image. (we only displayed the back of the box cover if there was significant information about the game printed on it.) then we added up to five additional images showing parts of the game like “card examples”, “game pieces”, and “game set-up”. overall, this number of images worked very well in both encore’s attached media viewer and the classic catalog/web opac, but there is a slight duplication in images by syndetic solutionstm for a few games. this results in a larger version of the box top image displaying to the right of the title and above the smaller thumbnails of images we added using media management. in regards to piece counts, we presumed that we would need photographic previews to aid in piece counting upon return of a tabletop game. however, our emerging technologies librarian assured us that because we are an educational institution, we could contact the vendor for free replacement pieces at any time. he also emphasized that unlike the educational manipulatives or the anatomical models, this was a pilot collection, so extensive processing would not be a good investment of our labor. fortunately, the anatomical model collection would require images for piece counts as well as several other cataloging customizations to increase discoverability and speed circulation. anatomical model collection similar to our educational manipulative collection, but not nearly as extensive, our anatomical model collection has been a part of flite since its inception. unlike the manipulatives, which are used primarily by the early childhood education students, the anatomical models support a range of allied health programs including but not limited to dental hygiene, radiology, and nursing. the majority of our two dozen models were purchased in the 20th century and, like the manipulatives, the majority were cataloged using innovative’s course reserves module. unfortunately, none of these records were very descriptive, some being so poor as to be merely a title like “jawbones” and a barcode. so, the first task was to match objects with oclc records. fortunately, this task picture perfect: using photographic previews to enhance realia collections for library patrons and staff | rubel | https://doi.org/10.6017/ital.v36i2.9474 64 became easier once we discovered that it was easier to match the object to the vendor’s catalog image and then search oclc by vendor model name or number than it is to decipher written descriptions if you do not know human anatomy. once good bibliographic records were downloaded, we decided to add one of three gmds depending on the type of model ([model], [chart], or [flash card]), a custom call number prefix (model #), and one or more of the local subject headings shown below. 690 = (d) anatomy model. anatomy chart. anatomy models. anatomy charts. dental hygiene model. dental model. dental hygiene models. dental models. technically, all dental models could be used as anatomical models, but not vice versa. therefore, the common subject headings for the collection are “anatomy model” and “anatomy models”. to make things easier to shelve, retrieve, and inventory, we also designed numeric ranges for the call numbers, as shown below, so we would know what type of model we should expect when referring to a specific model number. 
099 = (c) model #00x, following this hierarchy:
001-099 anatomical charts and flash cards
100-199 articulated skeletons
200-299 disarticulated skeletons and bone kits
300-399 organs
400-499 skulls (anatomical and dental hygiene)
500-599 other dental models (dental studies, dental decks)
we also scanned and linked pdfs of the heavily worn model keys with the link text "connect to key pdf" before washing and rehousing all the models. once they were clean, they were ready for their shoot with ferris state university's media production team. due to winter break, media production was able to shoot the majority of the collection fairly quickly. they returned to us high-resolution tiffs the same size as those for the manipulatives, 900 pixels by 600 pixels. in case of java viewer failure, we requested that there be one top-level image that showcases exactly what the model contains, with images of individual pieces or drawers as the succeeding images. for example, our disarticulated skeletons are housed in small plastic carts with three drawers in each cart. therefore, the first image would be a shot of all the pieces of the disarticulated skeleton and the second image would be the contents of the top drawer, the third image the contents of the middle drawer, and the last image the contents of the bottom drawer. in this specific example, we re-used the images that we posted in the catalog record by pasting them on top of the cart to show circulation staff what to expect in each drawer upon check-in. overall, photographic previews for this collection appear to be working very well for both catalog users and circulation staff "…to inform users about size, extent, and availability of collections or objects."24 in fact, they have been working so well for this collection that usage has increased exponentially compared to previous years.

figure 1. circulation statistics 2014-2016 (manipulatives: 367, 317, and 114 check-outs in 2014, 2015, and 2016; models: 10, 1, and 444; games: 24 in 2016)

conclusions and future directions
although we implemented photographic previews for three realia collections, we could not define any standard workflow for the process beyond correcting or downloading the metadata first and adding the images second. part of this is due to our working primarily with legacy collections, because we often discovered issues, like the model keys, while working through another issue. the other part is due to the nuances involved in processing realia in general. even with good, readily available catalog records like those for the tabletop games, time still had to be spent separating, organizing, and rehousing game pieces as well as hunting down useful images. unfortunately, any type of realia processing, even if it is just textual description, is much more time-consuming than the majority of academic library cataloging. adding in the extra steps to create, upload, and link a photographic preview can nearly double that labor investment. notwithstanding, as butler and kvenild advocate, "…not supplying images as metadata for items that most need them (i.e. kits, games, and models) is to make them nearly irretrievable.
providing bare-bones traditional metadata for these items is analogous to delegating them to the backlog shelves of yesteryear."25

unfortunately, neither the library management system nor the third-party catalog enhancement market currently provides a good solution to this problem. considering how great an impact photographic previews have had in the online retail market, this lack of technical support is surprising. yes, syndetic solutionstm is a great product for cover images and tables of contents for books. however, once you go beyond traditional resources, there is a great need to allow institutions to submit their own images as part of catalog record enhancement and not to serve as separate digital surrogates in a digital repository. this could be done either within the library management system, like the media management module, or as an option for catalog enhancement where libraries could add images to either a shared database or their own database using standard identifiers on a third-party platform like syndeticstm. further research on photographic previews is also sorely needed. as of this writing, we only have a handful of case studies and some guiding philosophy on the use of previews. consultation with internet retailers and literature on online marketing might be more applicable than library science research to evaluate their impact, but research into their direct impact vs. textual descriptions on catalog use would be ideal.

references
1. fact book 2015 – 2016 (big rapids, mi: ferris state university institutional research & testing, 2016), http://www.ferris.edu/htmls/admision/testing/factbook/factbook15-162.pdf, 47. 2. ibid, 12. 3. marcia butler and cassandra kvenild, "enhancing catalog records with photographs for a curriculum materials center," technical services quarterly 31 (2014): 122-138, https://doi.org/10.1080/07317131.2014.875377, 122-124. 4. ibid, 126. 5. martha fallahay loesch and marta mestrovic deyrup, "cataloging the curriculum library: new procedures for non-traditional formats," cataloging & classification quarterly 34, no. 4 (2002): 79-89, https://doi.org/10.1300/j104v34n04_08, 82. 6. butler and kvenild, "enhancing catalog records with photographs," 128. 7. stephan greene, gary marchionini, catherine plaisant, and ben shneiderman, "previews and overviews in digital libraries: designing surrogates to support visual information seeking," journal of the american society for information science 51, no. 4 (2000): 380-393, https://doi.org/10.1002/(sici)1097-4571(2000)51:4<380::aid-asi7>3.0.co;2-5, 381. 8. ibid. 9. ibid, 384. 10. ibid, 385. 11. ibid. 12. paul moeller, "enhancing access to rare journals: cover images and contents in the online catalog," serials review 33, no. 4 (2007): 231-237, https://doi.org/10.1016/j.serrev.2007.09.003, 235. 13. butler and kvenild, "enhancing catalog records with photographs," 128. 14. greene et al., "previews and overviews in digital libraries," 388. 15. butler and kvenild, "enhancing catalog records with photographs," 124. 16. moeller, "enhancing access to rare journals," 234. 17. butler and kvenild, "enhancing catalog records with photographs," 129. 18. ibid, 132. 19. ibid, 126. 20.
moeller, "enhancing access to rare journals," 237.
21. butler and kvenild, "enhancing catalog records with photographs," 131.
22. ibid, 135.
23. ibid, 134.
24. greene et al., "previews and overviews in digital libraries," 386.
25. butler and kvenild, "enhancing catalog records with photographs," 136.

techniques for special processing of data within bibliographic text

paula goossens: royal library albert i, brussels, belgium.

an analysis of the codification practices of bibliographic descriptions reveals a multiplicity of ways to solve the problem of the special processing of certain characters within a bibliographic element. to obtain a clear insight into this subject, a review of the techniques used in different systems is given. the basic principles of each technique are stated, examples are given, and advantages and disadvantages are weighed. simple local applications as well as more ambitious shared cataloging projects are considered.

introduction

effective library automation should be based on a one-time manual input of the bibliographic descriptions, with multiple output functions. these objectives may be met by introducing a logical coding technique. the higher the requirements of the output, the more sophisticated the storage coding has to be. in most cases a simple identification of the bibliographic elements is not sufficient. the requirement of a minimum of flexibility in filing and printing operations necessitates the ability to locate certain groups of characters within these elements. it is our aim, in this article, to give a review of the techniques solving this last problem. as an introduction, the basic bibliographic element coding methods are roughly schematized in the first section. according to the precision in the element identification, a distinction is made between two groups, called respectively field level and subfield level systems. the second section contains discussions on the techniques for special processing of data within bibliographic text. three basic groups are treated: the duplication method, the internal coding techniques, and the automatic handling techniques. the different studies are illustrated with examples of existing systems. for the field level projects we confined ourselves to some important german and belgian applications. in the choice of the subfield level systems, which are marc ii based, we tried to be more complete. most of the cited applications, for practical reasons, only concern the treatment of monographs. this cannot be seen as a limitation because the methods discussed are very general by nature and may be used for other material. each system which has recourse to different special processing techniques is discussed in terms of each of these techniques, enabling one to get a realistic overview of the problem. in the last section, a table of the systems versus the techniques used is given. the material studied in this paper provided us with the necessary background for building an internal coding technique in our internal processing format.

bibliographic element codification methods

field level systems

the most rudimentary projects of catalog automation are limited to a coarse division of the bibliographic description into broad fields. these are marked by special supplied codes and cover the basic elements of author, title, imprint, collation, etc.
in some of the field level systems, a bibliographic element may be further differentiated according to a more specific content designation, or according to a function identification. for instance, the author element can be split up into personal name and corporate name, or a distinction can be made between a main entry, an added entry, a reference, etc. this approach supports only the treatment of each identified bibliographic element as a whole for all necessary processing operations, filing and printing included. this explains why, in certain applications, some of the bibliographic elements are duplicated, under a variant form, according to the subsequent treatments reflected in the output functions. details on this will be discussed later. here we only mention as an example the deutsche bibliographie and the project developed at the university of bochum.1-4 it is evident that these procedures are limited in their possibilities and are not economical if applied to very voluminous bibliographic files. for this reason, at the same time, more sophisticated systems, using internal coding techniques, came into existence. these allow one to perform separate operations within a bibliographic element, based on a special indication of certain character strings within the text. as there is an overlap in the types of internal coding techniques used in the field level systems and in the subfield level systems, this problem will later be studied as a whole. we limit ourselves to citing some projects falling under this heading. as german applications we have the deutsche bibliographie and the bikas system.5 in belgium the programs of the quetelet fonds may be mentioned.6,7

subfield level systems

in a subfield level system the basic bibliographic elements, separated into fields, are further subdivided into smaller logical units called subfields. for instance, a personal name is broken into a surname, a forename, a numeration, a title, etc. such a working method provides access to smaller logical units and will greatly facilitate the functions of extraction, suppression, and transposition. thus, more flexibility in the processing of the bibliographic records is obtained. as is well known, the library of congress accomplished the pioneering work in developing the marc ii format: the communications format and the internal processing format.8-11 these will be called marc lc and a distinction between the two will only be made if necessary. the marc lc project originated in the context of a shared cataloging program and immediately served as a model in different national bibliographies and in public and university libraries. in this paper we will discuss bnb marc of the british national bibliography, the nypl automated bibliographic system of the new york public library, monocle of the library of the university of grenoble, canadian marc, and fbr (forma bibliothecae regiae), the internal processing format of the royal library of belgium.12-21 in order to further optimize the coding of a bibliographic description, the library of congress also provided for each field two special codes, called indicators. the function of these indicators differs from field to field. for example, in a personal name one of the indicators describes the type of name, to wit: forename, single surname, multiple surname, and name of family. some of the indicators may act as an internal code.
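to make the field and subfield structure concrete, the following minimal sketch represents a subfield-level field in python; the tags and subfield codes echo the marc convention just described, but the class layout, function names, and sample values are illustrative assumptions rather than any cited system's actual record format.

# a minimal sketch of a subfield-level field: a tag, two one-character
# indicators, and a list of (subfield code, value) pairs. the sample data
# and helper function are illustrative only.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Field:
    tag: str                                   # e.g. "100" personal name, "245" title
    indicators: Tuple[str, str] = (" ", " ")   # e.g. name type, or a filing indicator
    subfields: List[Tuple[str, str]] = field(default_factory=list)

# a personal name split into surname and date subfields; the first indicator
# "1" stands here for "single surname" (cf. the name-type indicator above).
author = Field("100", ("1", " "), [("a", "martin du gard"), ("d", "1881-1958")])
title = Field("245", (" ", " "), [("a", "l'automation des bibliotheques")])

def subfield(f: Field, code: str) -> str:
    """return the first subfield with the given code, or an empty string."""
    return next((v for c, v in f.subfields if c == code), "")

print(subfield(author, "a"))   # -> martin du gard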
in spite of the well-considered structuring of the bibliographic data in the subfield level systems, not all library objectives may yet be satisfied. to reduce the remaining limitations, some approaches similar to those elaborated in field level systems are supplied. some (nypl, marc lc internal format, and canadian marc) have, or will have, in a very limited way, recourse to a procedure of duplication of subfields or fields. all cited systems, except nypl, use to a greater or lesser degree internal coding techniques. finally some subfield level systems automatically solve certain filing problems by computer algorithms. this option was taken by nypl, marc lc, and bnb marc. each of these methods will be discussed in detail in the next section.

techniques for special processing of data

methods for special treatment of words or characters within bibliographic text were for the most part introduced to support exact file arrangement procedures and printing operations. in order to give concrete form to the following explanation, we will illustrate some complex cases. each example contains the printing form and the filing form according to specific cataloging practices for some bibliographic elements. consider the titles in examples 1, 2, and 3, and the surnames in examples 4, 5, and 6.

example 1: printing form "l'automation des bibliotheques"; filing form "automation bibliotheques"
example 2: printing form "bulletino della r. accademia medica di roma"; filing form "bolletino accademia medica roma"
example 3: printing form "ibm 360 assembler language"; filing form "i b m three hundred sixty assembler language"
example 4: printing form "mc kelvy"; filing form "mackelvy"
example 5: printing form "van de castele"; filing form "vandecastele"
example 6: printing form "martin du gard"; filing form "martin dugard"

we do not intend, in this paper, to review the well-known basic rules for building a sort key (the translation of lowercase characters to uppercase, the completion of numerics, etc.). our attention is directed to the character strings that file differently than they are spelled in the printing form. the methods developed to meet these problems are of a very different nature. for reasons of space, not all the examples will be reconsidered in every case; only those most meaningful for the specific application will be chosen.

duplication methods

we briefly repeat that this method consists of the duplication of certain bibliographic elements in variant forms, each of them exactly corresponding to a certain type of treatment. in bochum, the title data are handled in this way. one field, called "sachtitel," contains the filing form of the title followed by the year of edition. another field, named "titelbeschreibung," includes the printing form of the title and the other elements necessary for the identification of a work (statements of authorship, edition statement, imprint, series statement, etc.). to apply this procedure to examples 1, 2, and 3, the different forms of each title respectively have to be stored in a printing field and in a sorting field. analogous procedures are, in a more limited way, employed in the deutsche bibliographie. for instance, in addition to the imprint, the name of the publisher is stored in a separate field to facilitate the creation of publisher indexes. the technique of the duplication of bibliographic elements has also been considered in subfield level systems. the nypl format furnishes a filing subfield in those fields needed for the creation of the sort key. this special subfield is generally created by program, although in exceptional cases manual input may be necessary.
in the filing subfield the text is preceded by a special character indicating whether or not the subfield has been introduced manually. marc lc (internal format) and canadian marc opt for a more flexible approach in which the filing information is specified with the same precision as the other information. the sorting data are stored in complete fields containing, among others, the same subfields as the corresponding original field. because in most subfield level systems the number of different fields is much higher than in field level systems, the duplication method becomes more intricate. provision of a separately coded field for each normal field which may need filing information is excluded. only one filing field is supplied, which is repeatable and stored after the other fields. in order to link the sorting fields with the original fields, specific procedures have been devised. marc lc, for instance, reserves one byte per field, the sorting field code, to announce the presence or the absence of a related sorting field. the link between the fields themselves is placed in a special subfield of the filing field.22 in the supposition that examples 3 and 4 originate from the same bibliographical description, this method may be illustrated schematically as follows:

tag   sorting field code   sequence number   data
100   x                    1                 $a$mc kelvy
245   x                    1                 $a$ibm 360 assembler language
880                        1                 $ja$1001$mackelvy
880                        2                 $ja$2451$i b m three hundred sixty assembler language

as is well known, the personal author and title fields are coded respectively as tag 100 and tag 245. tag 880 defines a filing field. in the second column, the letter x identifies the presence of a related sorting field. the third column contains a tag sequence number needed for the unequivocal identification of a field. in the last column the sign $ is a delimiter. the first $ is followed by the different subfield codes. the other delimiters initiate the subsequent subfields. in tag 100 and 245, the first subfields contain the surname and the short title respectively. in tag 880 the first subfield gives the identification number of the related original field. the further subfield subdivision is exactly the same as in the original fields. in canadian marc a slightly different approach has been worked out. note that in neither of the last two projects has this technique been implemented yet. for an evaluation of the duplication method different means of application must be considered. if not systematically used for several bibliographic elements, the method is very easy at input. the cataloger can fill in the data exactly as they are; no special codes must be imbedded in the text. but it is easy to understand that a more frequent need of duplicated data renders the cataloging work very cumbersome. in regard to information processing, this method consumes much storage space. first, a certain percentage of the data is repeated; second, in the most complete approach of the subfield level systems, space is needed for identifying and linking information. for instance, in marc lc, one byte per field is provided containing the sorting field code, even if no filing information at all is present. finally, programming efforts are also burdened by the need for special linking procedures. in order to minimize the use of the duplication technique, the cited systems reduce their application in different ways. bochum simplified its cataloging rules in order to limit its use to title information.
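the linked sorting-field arrangement illustrated in the schematic above can be sketched roughly as follows; the dictionary layout, field names, and look-up function are hypothetical stand-ins for the byte-level format, intended only to show how a filing form is located through the (tag, sequence number) link.

# sketch of the linked sorting-field arrangement: filing forms sit in
# repeatable 880 fields and are tied to their original field by
# (tag, sequence number). the record below is illustrative only.
record = {
    "fields": [
        {"tag": "100", "seq": 1, "has_sort": True, "data": "mc kelvy"},
        {"tag": "245", "seq": 1, "has_sort": True, "data": "ibm 360 assembler language"},
        {"tag": "880", "seq": 1, "link": ("100", 1), "data": "mackelvy"},
        {"tag": "880", "seq": 2, "link": ("245", 1), "data": "i b m three hundred sixty assembler language"},
    ]
}

def filing_form(record, tag, seq):
    """return the filing form for (tag, seq): the linked 880 field if one
    exists, otherwise the original field's own data."""
    for f in record["fields"]:
        if f["tag"] == "880" and f.get("link") == (tag, seq):
            return f["data"]
    for f in record["fields"]:
        if f["tag"] == tag and f["seq"] == seq:
            return f["data"]
    return ""

print(filing_form(record, "100", 1))  # -> mackelvy
print(filing_form(record, "245", 1))  # -> i b m three hundred sixty assembler language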
as will be explained further, the deutsche bibliographie also has recourse to internal coding techniques. nypl, marc lc, and canadian marc only call on it if other more efficient methods (see later) fail. they also make an attempt to adapt existing cataloging practices to an unmodified machine handling of nonduplicated and minimally coded data.

internal coding techniques

separators

separators are special codes introduced within the text, identifying the characters to be treated in a special way. a distinction can be made among four procedures.

1. simple separators. with this method, each special action to be performed on a limited character string is indicated by a group of two identical separators, each represented as a single special sign. illustration on examples 2, 3, 4, and 6 gives:

example 2: £bolletino £¢bulletino della r. ¢accademia medica ¢di ¢roma
example 3: £i b m three hundred sixty £¢ibm 360 ¢assembler language
example 4: m£a£c¢ ¢kelvy
example 6: martin du¢ ¢gard

the characters enclosed between each group of two corresponding codes £ must be omitted for printing operations. in the same way the characters enclosed between two corresponding codes ¢ are to be ignored in the process of filing. in the case that only the starting position of a special action has to be indicated, one separator is sufficient. for instance, if in example 1 we limit ourselves to coding the first character to be taken into account for filing operations, we have:

example 1: l' /automation des bibliotheques

where a slash is used as the sorting instruction code. the simple separator method has tempting positive aspects. occupying a minimum of storage space (maximum two bytes for each instruction), the technique gives a large range of processing possibilities. indeed, excluding the limitation on the number of special signs available as separators, no other restrictions are imposed. this argument will be rated at its true worth only after evaluation of the multiple function separators method and of the indicator techniques. the major disadvantage of the simple separator method lies in its slowness of exploitation. in fact, for every treatment to be performed, each data element which may contain special codes has to be scanned, character by character, to localize the separators within the text and to enable the execution of the appropriate instructions. for example, in the case of a printing operation, the program has to identify the parts of the text to be considered and to remove all separators. the sluggishness of execution was for some, as for canadian marc, a reason to disapprove this method.23 as already mentioned, another handicap with cataloging applications is the loss of a number of characters caused by their use as special codes. it is self-evident that each character needed as a separator cannot be used as an ordinary character in the text. for bochum this was a motive to reject this method. many of the field level systems with internal codes have recourse to simple separators. we mention the deutsche bibliographie, in which some separators indicate the keywords serving for automatic creation of indexes and others give the necessary commands for font changes in photocomposition applications. in order to reduce the number of special signs, the deutsche bibliographie also duplicates certain bibliographic data. bikas uses simple separators for filing purposes.
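as a rough illustration of the simple separator technique described above, the sketch below derives the printing and filing forms of example 4 by stripping the spans enclosed by the £ and ¢ codes; the function names are assumptions, not part of any cited system.

# sketch of the simple-separator technique: text between a pair of '£' codes
# is dropped for printing, text between a pair of '¢' codes is dropped for
# filing, and all separator signs are removed from the output.
def strip_spans(text: str, sep: str) -> str:
    """remove every span enclosed by a pair of `sep` codes (and the codes)."""
    out, skip = [], False
    for ch in text:
        if ch == sep:
            skip = not skip          # toggle at each separator of this kind
        elif not skip:
            out.append(ch)
    return "".join(out)

def printing_form(coded: str) -> str:
    return strip_spans(coded, "£").replace("¢", "")

def filing_form(coded: str) -> str:
    return strip_spans(coded, "¢").replace("£", "")

coded = "m£a£c¢ ¢kelvy"              # example 4 as coded above
print(printing_form(coded))          # -> mc kelvy
print(filing_form(coded))            # -> mackelvy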
the technique is also employed in subfield level systems. in monocle each title field contains a slash, indicating the first character to be taken into account for filing.

2. multiple function separators. designed by the british, the technique of the multiple function separators was adopted in monocle. the basic idea consists of the use of one separator characteristic for instructing multiple actions. in the case of monocle these actions are printing only, filing only, and both printing and filing. in order to give concrete form to this method we apply it to examples 3, 4, and 6, using a vertical bar as special code.

example 3: |ibm 360 |i b m three hundred sixty |assembler language
example 4: m|c |ac|kelvy
example 6: martin du| ||gard

the so-called three-bar filing system divides a data element into the following parts: data to be filed and printed | data to be printed only | data to be filed only | data to be filed and printed. in comparison with the simple separator technique, this method has the advantage of needing fewer special characters. a gain of storage space cannot be assumed directly. as is the case in example 6, if only one special instruction is needed, the set of three separators must still be used. on the other hand, one must note that a repetition of identical groups of multiple function separators within one data element must be avoided. subsequent use of these codes leads to very unclear representations of the text and may cause faulty data storage. this can well be proved if the necessary groups of three bars are inserted in examples 1 and 2. of the studied systems, monocle is the only one to use this method.

3. separators with indicators. as mentioned in the description of subfield level systems, two indicators are added for each field present. in order to speed up the processing time in separator applications, indicators may be exploited. in monocle the presence or the absence of three bars in a subfield is signalled by an indicator at the beginning of the corresponding field. this avoids the systematic search for separators within all the subfields that may contain special codes. the number of indicators being limited, it is self-evident that in certain fields they may already be used for other purposes. as a result, some of the separators will be identified at the beginning of the field and others not. this leads to a certain heterogeneity in the general system concept which complicates the programming efforts. under this heading, we have mentioned the use of indicators only in connection with multiple function separators. note that this procedure could be applied as well in simple separator methods. nevertheless, none of the subfield level systems performs in this fashion because it is not necessary for the particular applications. this method is not followed in the field level systems as no indicators are provided.

4. compound separators. a means of avoiding the second disadvantage of the simple separator technique is to represent each separator by a two-character code: the first one, a delimiter, identifies the presence of the separator and is common to each of them; the second one, a normal character, identifies the separator's characteristic. taking the sign £ as delimiter and indicating the functions of nonprinting and nonfiling respectively by the characters a and b, examples 2 and 4 give in this case:

example 2: £abolletino £a£bbulletino della r.
£baccademia medica £bdi £broma
example 4: m£aa£ac£b £bkelvy

thus the number of reserved special characters is reduced to one, independent of the number of different types of separators needed. in none of the considered projects is this technique used, probably because of the amount of storage space wasted.

indicators

as the concept of adding indicators in a bibliographic record format is an innovation of marc lc, the methods described under this heading concern only subfield level systems. although at the moment of the creation of marc lc one did not anticipate the systematic use of indicators for filing, its adherents made good use of them for this purpose.

1. personal name type indicator. as mentioned earlier, in marc lc one of the indicators, in the field of a personal name, provides information on the name type. this enables one to realize special file arrangements. for example, in the case of homonyms, the names consisting only of a forename can be filed before identical surnames. using the same indicator, an exact sort sequence can be obtained for single surnames, including prefixes. knowing that the printing form of example 5 is a single surname, the program for building the sort key can ignore the two spaces. the systems derived from marc lc developed analogous indicator codifications adapted to their own requirements. this seems to be an elegant method for solving particular filing problems in personal names. nevertheless, its possibilities are not large enough to give full satisfaction. for instance, example 6 gives a multiple surname with a prefix in the second part of the name. the statement of multiple surname in the indicator does not give enough information to create the exact sort form. because of this shortcoming, monocle had recourse to the technique called "separators with indicators."

2. indicators identifying the beginning of filing text. bnb marc reserves one indicator in the title field for identification of the first character of the title to be considered for filing. this indicator is a digit between zero and nine, giving the number of characters to be skipped at the beginning of the text. applying this technique to example 1, the corresponding filing indicator must have the value three. without having recourse to other working methods, this title sorts as:

example 1: automation des bibliotheques

notice that the article des still remains in the filing form. this procedure has the advantage of being very economical in storage space and in processing time. moreover the text is not cluttered with extraneous characters. on the other hand we must disapprove of the limitation of this technique to the indication of nonfiling words at the beginning of a field. the possibility of identifying certain character strings within the text is not provided for. taking examples 2 and 3 we observe that the stated conditions cannot be fulfilled. another negative side is the number of characters to be ignored, which may not exceed nine. also, one indicator must be available for this filing indication. after bnb marc, marc lc and canadian marc also introduced this technique.

3. separators with indicators. the use of indicators in combination with separators has been treated above.
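before turning to the next technique, here is a minimal sketch of the filing indicator described under point 2 above: a single digit giving the number of leading characters to skip when the sort key is built. the stored title is assumed to be "l' automation des bibliotheques" so that a count of three skips the article and the following space, as in the text; the function name is illustrative.

# sketch of the "beginning of filing text" indicator: a single digit giving
# the number of leading characters to skip when the sort key is built.
def sort_key(title: str, nonfiling: int) -> str:
    """build a crude sort key by skipping the first `nonfiling` characters."""
    if not 0 <= nonfiling <= 9:        # the indicator is one digit, 0 to 9
        raise ValueError("nonfiling count must fit in a single digit")
    return title[nonfiling:].upper()

# example 1 with an assumed stored form "l' automation ..." and a count of 3
print(sort_key("l' automation des bibliotheques", 3))
# -> AUTOMATION DES BIBLIOTHEQUES  (the article "des" still files, as noted above)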
pointers

a final internal coding technique which seems worth studying is the one developed at the royal library of belgium for the creation of the catalogs of the library of the quetelet fonds, a field level system. the pointer technique is rather intricate at input but has many advantages at output. because there is inadequate documentation of this working method, we will try to give an insight into it by schematizing the procedures to be followed to create the final storage structure. at input, the cataloger inserts the necessary internal codes as simple separators within the text. these codes are extracted by program from the text and placed before it, at the beginning of each field. each separator, now called pointer characteristic, is supplemented with the absolute beginning address and the length of its action area within the text. in the quetelet fonds the pointer characteristic is represented by one character, the address and length occupy two bytes each. the complete set of pointers (pointer characteristics, lengths, and addresses) is named the pointer field. this field is incorporated in a sort of directory, starting with the sign "&" identifying the beginning of the field, followed by the length of the directory, the length of the text, and the pointer field itself. this is illustrated in figure 1. note that each field contains the first five bytes, even if no pointers are present. in the quetelet fonds, pointers are used for the following purposes: nonfiling, nonprinting, kwic index, indication of a corporate name in the title of a periodical, etc. examples 2, 3, and 4 should be stored in this system as represented in figure 2.

fig. 1. structure of directory with pointer technique: representation of the structure of a field in the internal processing format of the quetelet fonds system. the codes respectively represent: &: field delimiter; ld: length of directory; lt: length of text; x, y, ...: pointer characteristics; ax, ay, ...: addresses of the beginning of the related action area inside the text; lx, ly, ...: length of these action areas.

the advantages of the pointer technique are numerous. first, we must mention the relative rapidity of the processing of the records. in fact, in order to detect a specific pointer, only the directory has to be consulted. all subsequent instructions can be executed immediately. in contrast with most of the other methods discussed, there is no objection to using pointers for all internal coding purposes needed. this enables one to pursue homogeneity in the storage format, facilitating the development of programs. further, the physical separation of the internal codes and the text allows, in most cases, a direct clean text representation without any reformatting. finally, unpredictable expansions of internal coding processes can easily be added without adaptation of the existing software. a great disadvantage of the pointer technique lies in the creation of the directory. the storage space occupied by the pointers is also great in comparison with the place occupied by internal codes in other methods. a further handicap is the limitation imposed at input due to the use of simple separators.
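a rough sketch of the pointer technique follows: separators supplied at input are stripped from the text and converted into pointers (characteristic, starting address, length) held apart from the clean text. the python representation below compresses the byte-level directory of figures 1 and 2 into a tuple and is an assumption for illustration only.

# sketch of the pointer technique: simple separators supplied at input are
# stripped from the text and turned into pointers (characteristic, start,
# length) kept in a small directory ahead of the clean text.
def build_pointer_field(coded: str, characteristics: str = "£¢"):
    """return (clean_text, pointers); each pointer is (char, start, length).
    the separator sign itself is used as the pointer characteristic."""
    clean, pointers, open_at = [], [], {}
    for ch in coded:
        if ch in characteristics:
            if ch in open_at:                     # closing separator
                start = open_at.pop(ch)
                pointers.append((ch, start, len(clean) - start))
            else:                                 # opening separator
                open_at[ch] = len(clean)
        else:
            clean.append(ch)
    return "".join(clean), pointers

text, ptrs = build_pointer_field("m£a£c¢ ¢kelvy")
print(text)   # -> mac kelvy   (clean text, no internal codes)
print(ptrs)   # -> [('£', 1, 1), ('¢', 3, 1)]  nonprinting 'a', nonfiling ' '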
fig. 2. pointer technique as applied to bibliographic data: representation of examples 2, 3, and 4 in the quetelet fonds format. a represents the pointer characteristic for nonprinting data; b is the pointer characteristic for nonfiling data.

in spite of these negative arguments, we see a great interest in this method, and wish to give some suggestions in order to relieve or to eliminate some of them. initially we must realize that the creation of a record takes place only once, while the applications are innumerable. the possibility of automatically adding some of the codes may also be considered. data needing special treatment expressed in a consistent set of logical rules can be coded by program. only exceptions have to be treated manually. in considering the space occupied by the directory, some profit could be imagined by trying to reduce the storage space occupied by the addresses and the lengths. there is also a solution to be found in not having systematically to provide pointer field information. one must realize that only a small percentage of the fields may contain such codes. finally, the restrictions at input may be removed by using compound separators. such a change does not have any repercussion on the directory. as far as we know, the pointer technique has not been used in a subfield level system. at our library an internal processing format of the subfield level type, called fbr, is under development, in which a pointer technique based on the foregoing is incorporated.

automatic handling techniques

in order to give a complete review of the methods of handling data within bibliographic text, we must also treat the methods in which both the identification and the special treatment of these data are done during the execution of the output programs. the working method can easily be demonstrated with example 1. only the printing form must be recorded. the program for building the sort key processes a look-up table of nonfiling words including the articles l' and des. the program checks every word of the printing form for a match with one of the words of the nonfiling list. the sort key is built up with all the words which are not present in this table. to treat example 4, an analogous procedure can be worked through. an equivalence list of words for which the filing form differs from the printing form is needed. if, during the construction of the sort key, a match is found with a word in the equivalence list, the correct filing form, stored in this list, is placed in the sort key. the other words are taken in their printing form. in our case, using the equivalence list, mc should be replaced by mac. in order to speed up the look-up procedures, different methods of organization of the look-up tables can be devised.
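a minimal sketch of the automatic handling approach just described: the sort key is built at output time from the printing form alone, using a look-up table of nonfiling words and an equivalence list. both word lists below are tiny illustrative assumptions, not the tables of any of the cited systems.

# sketch of automatic handling: the sort key is derived from the printing
# form alone, using a nonfiling stop list and an equivalence list of words
# whose filing form differs from their printing form.
NONFILING = {"l'", "des", "the", "der", "die"}       # illustrative stop list
EQUIVALENCE = {"mc": "mac", "me": "mac"}             # illustrative equivalences

def sort_key(printing_form: str) -> str:
    words = []
    for word in printing_form.lower().split():
        if word in NONFILING:
            continue                        # dropped from the sort key
        words.append(EQUIVALENCE.get(word, word))
    return " ".join(words).upper()

print(sort_key("mc kelvy"))                         # -> MAC KELVY
print(sort_key("l' automation des bibliotheques"))  # -> AUTOMATION BIBLIOTHEQUES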
other types of automatic processing techniques can be illustrated by the special filing algorithms constructed for a correct sort of dates. for instance, in order to be able to sort b.c. and a.d. dates in chronological order, the year 0 is replaced by the year 5000; b.c. and a.d. dates are respectively subtracted from or added to this number. thus dates back to 5000 b.c. can be correctly treated. this technique, introduced by nypl, is also used at lc. the advantages of automatic handling techniques are many. no special arrangements must be made at input. only the bibliographic elements must be introduced under the printing form and no special codes have to be added. there is no storage space wasted for storing internal codes. as negative aspects we ascertain that not all cataloging rules may be expressed in rigid systematic process steps. examples 2 and 3 illustrate this point. one must also recognize that the special automatic handling programs must be executed repeatedly when a sort key is built up, increasing the processing time. this procedure may give some help for filing purposes, but we can hardly imagine that it really may solve all internal coding problems. think of the instructions to be given for the choice of character type while working with a typesetting machine. the automatic handling technique is very extensively applied in the nypl programs, marc lc has recourse to it for treating dates, and bnb marc for personal names.24 none of the field level systems considered here uses this method.

summary and conclusions

table 1 presents, for the discussed systems, a summary of the methods used for treating data in a bibliographic text. the duplication and indicator techniques have the most adherents. however, we must keep in mind that in most of the systems the duplication of data only represents an extreme solution. on the other hand, indicators are very limited in their possibilities. as far as the flexibility and application possibilities are concerned, the simple separators and the pointers present the most interesting prospects. automatic handling techniques may produce good results for use in well-defined fields or subfields. from the evaluations given for the different methods, we conclude that for a special application the choice of a method depends greatly on the objectives, namely the sort of special processing facilities needed, the volume of data to be treated, and the frequency of execution.

table 1. review of the techniques for special processing of data within bibliographic text used or planned in the discussed systems

deutsche bibliographie: duplication; simple separators
bochum: duplication
bikas: simple separators
quetelet fonds: pointers
marc lc: duplication; personal name type indicator; beginning-of-filing-text indicator; automatic handling
bnb marc: personal name type indicator; beginning-of-filing-text indicator; automatic handling
nypl: duplication; automatic handling
monocle: simple separators; multiple function separators; separators with indicators; personal name type indicator
canadian marc: duplication; personal name type indicator; beginning-of-filing-text indicator
fbr: pointers

references

1. rudolf blum, "die maschinelle herstellung der deutschen bibliographie in bibliothekarischer sicht," zeitschrift für bibliothekswesen und bibliographie 13:303-21 (1966).
2. die zmd in frankfurt am main; herausgegeben von klaus schneider (berlin: beuth-vertrieb gmbh, 1969), p.133-37, 162-67.
3. magnetbanddienst deutsche bibliographie, beschreibung für 7-spur-magnetbänder (frankfurt on the main: zentralstelle für maschinelle dokumentation, 1972).
4. ingeborg sobottke, "rationalisierung der alphabetischen katalogisierung," in elektronische datenverarbeitung in der universitätsbibliothek bochum; herausgegeben in verbindung mit der pressestelle der ruhr-universität bochum von günther pflug und bernhard adams (bochum: druck- und verlagshaus schürmann & klagges, 1968), p.24-32.
5. datenerfassung und datenverarbeitung in der universitätsbibliothek bielefeld: eine materialsammlung; hrsg. von elke bonness und harro heim (munich: pullach, 1972).
6. michel bartholomeus, l'aspect informatique de la catalographie automatique (brussels: bibliothèque royale albert ier, 1970).
7. m. bartholomeus and m. hansart, lecture des entrées bibliographiques sous format 80 colonnes et création de l'enregistrement standard; publication interne: mecono b015a (brussels: bibliothèque royale albert ier, 1969).
8. henriette d. avram, john f. knapp, and lucia j. rather, the marc ii format: a communications format for bibliographic data (washington, d.c.: library of congress, 1968).
9. books, a marc format: specifications for magnetic tapes containing catalog records for books (5th ed.; washington, d.c.: library of congress, 1972).
10. "automation activities in the processing department of the library of congress," library resources & technical services 16:195-239 (spring 1972).
11. l. e. leonard and l. j. rather, internal marc format specifications for books (3d ed.; washington, d.c.: library of congress, 1972).
12. marc record service proposals (bnb documentation service publications no.1 [london: council of the british national bibliography, ltd., 1968]).
13. marc ii specifications (bnb documentation service publications no.2 [london: council of the british national bibliography, ltd., 1969]).
14. michael gorman and john e. linford, description of the bnb marc record: a manual of practice (london: council of the british national bibliography, ltd., 1971).
15. edward duncan, "computer filing at the new york public library," in larc reports vol.3, no.3 (1970), p.66-72.
16. nypl automated bibliographic system overview, internal report (new york: new york public library, 1972).
17. marc chauveinc, monocle: projet de mise en ordinateur d'une notice catalographique de livre. deuxième édition (grenoble: bibliothèque universitaire, 1972).
18. marc chauveinc, "monocle," journal of library automation 4:113-28 (sept. 1971).
19. canadian marc (ottawa: national library of canada, 1972).
20. format de communication du marc canadien: monographies (ottawa: bibliothèque nationale du canada, 1973).
21. to be published.
22. private communications (1973).
23. private communications (1972).
24. private communications (1973).

file structure for an online catalog of one million titles

j. j. dimsdale: department of computing science, university of alberta, edmonton, canada, and h. s. heaps: department of computer science, sir george williams university, montreal, canada.

a description is given of the file organization and design of an on-line catalog suitable for automation of a library of one million books. a method of virtual hash addressing allows rapid search of the indexes to the catalog file. storage of textual material in a compressed form allows considerable reduction in storage costs.

introduction

an integrated system for on-line library automation requires a number of computer accessible files.
it proves convenient to divide these files into three principal groups: those required for the on-line catalog subsystem, those required for the acquisition subsystem, and those required for the on-line circulation subsystem. the present paper is concerned with the files for the catalog subsystem. files required for the circulation subsystem will be discussed in a future paper. the files for an on-line catalog system should contain all bibliographic details normally present in a manual catalog, and the file should be organized to allow searches to be made with respect to title words, authors, and library of congress (lc) call numbers. it may also be desired to search on other bibliographic details, in which instance the appropriate files may be added to those described in the present paper. the file organization should be such as to support economic searching with respect to questions in which terms are connected by the logic operations and, or, and not. it should also allow question terms to be connected by operations of adjacency and precedence, and it should allow question terms to be weighted and the search made with reference to a specified threshold weight. it may be desirable for the file organization to include a thesaurus that may be used either directly by the user or by the search program to narrow, or broaden, the scope of the initial query or to ensure standardization of the question vocabulary. the file organization and search strategy should ensure that the user of the on-line catalog system receives an acceptable response time to his queries, although it is likely that some of the operations required by the circulation system will be given a higher priority. thus the integrated system must time-share between search queries, circulation transactions, and other tasks that originate from a number of separate terminals or from batch input. such tasks might arise from acquisitions, and from update and maintenance of the on-line catalog. the system should be a special purpose time-sharing system such as the time-sharing chemical information retrieval system described by lefkovitz and powers and by weinberg.1,2 in this system the queries time-share disk storage as well as the central processor. since an on-line catalog is a large file, and hence expensive to store in computer accessible form, it is desirable to store it in as compact a form as possible. for example, a catalog file for one million titles is likely to involve between 2 x 10^8 and 5 x 10^8 alphanumeric characters. if stored character by character the required storage capacity would be equivalent to that supplied by from seven to sixteen ibm 2316 disk packs. it is also important to design the frequently accessed files so as to minimize the number of disk, or data cell, accesses required to process each query. the files described in the present paper include ones stored in compressed form and organized for rapid access. throughout the present paper the term title is used in a general sense. it may include periodical titles as well as book titles. however, it is supposed that frequently changing information, such as periodical volume ranges, will be stored as part of the circulation subsystem rather than the catalog subsystem.

overall file organization

the complete bibliographic entries of the catalog may be stored in a serial (sequential) file so that any record may readily be read and displayed in its entirety.
however, as indicated by curtice, use of an inverted file is to be preferred for purposes of searching.3 an alternative to the simple serial file is one organized in the form of a multiple threaded list (multilist) in which all records that contain a particular key are linked together by pointers within the records themselves. the first record in each list is pointed to by an entry in a key directory as described by lefkovitz, holbrook, dodd, and rettenmayer.4-7 for very small collections of documents divett and burnaugh have attempted to organize on-line catalogs by use of ring structured variations of the multilist technique.8,9 neither file organization is feasible for a collection of a million documents because of the long length of the threads involved. many disk accesses would be needed in order to retrieve all elements of a list, and hence there would be a very slow response to queries. the cellular multilist structure proposed by lefkovitz and powers, or the cellular serial structure proposed by lefkovitz, may well prove to be a viable alternative to the organization proposed in the present paper.10,11 however, as indicated by lefkovitz, the inverted organization provides shorter initial, and successive, response times in answer to queries.12 in the present paper it is supposed that the on-line catalog file consists of both a serial file of complete bibliographic entries and an inverted file organized with respect to search keys such as title words, subject terms, author names, and call numbers. such a two-level structure is often assumed and has been termed a "combined file" by warheit, who concluded it to be superior to either a single serial file or a threaded list organization.13-17 the file structure described in the present paper uses indexes based on the virtual scatter table as described by morris and murray, the scatter index table discussed by morris, and the bucket as treated by buchholz.18-20 the attractiveness of a similar structure for use in the ohio college library center has been analyzed by long et al.21 the basic elements of the file organization are shown in figure 1. it is supposed that the access keys are title words, but a similar file structure is used for access with respect to keys of other types.

fig. 1. overall file organization (key, e.g. title word, passed to a hashing function that points into the hash table file).

any key may be operated on by a hashing function which transforms it into a pointer to an entry in a hash table file. this file contains pointers to both a dictionary file of title words and an inverted index which is stored in a compressed form. entries within the compressed inverted index serve as pointers to the catalog file of complete bibliographic entries. terms, such as title words, within the catalog file are coded to allow a compressed form of storage. the codes used in the compressed catalog file serve as pointers to the uncoded terms stored in the dictionary file. there would be a separate hashing function, hash table file, dictionary file, and compressed inverted file for use with each different type of key. however, there is only one compressed catalog file. for a search scheme that allows use of a thesaurus of synonyms, narrower terms, broader terms, and so forth, a thesaurus file may be added (figure 2). the files must be organized to allow for ease of updating. as further bibliographic entries are added it is necessary to add additional pointers from the inverted index.
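the flow of figure 1 can be sketched as follows, with plain python dictionaries standing in for the disk-resident hash table, dictionary, inverted index, and catalog files; all identifiers and sample data are assumptions made for illustration.

# sketch of the overall organization: a search key is hashed to a hash-table
# entry, which points both to the dictionary entry for the key and to its
# posting list in the inverted index; postings point into the catalog file.
dictionary = {0: "automation", 1: "libraries"}
inverted_index = {0: [101, 102], 1: [102]}                   # key id -> record ids
catalog = {101: "l'automation des bibliotheques ...",
           102: "automation in libraries ..."}
hash_table = {hash("automation"): 0, hash("libraries"): 1}   # hash -> key id

def search(word: str):
    key_id = hash_table.get(hash(word))
    if key_id is None or dictionary[key_id] != word:   # guard against mismatch
        return []
    return [catalog[r] for r in inverted_index[key_id]]

print(search("automation"))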
also, whenever a new key occurs in a bibliographic entry it must be added to the dictionary, assigned a code for storage in the compressed catalog file, and entered into the compressed inverted index.

fig. 2. file organization with inclusion of a thesaurus.

structure of the hash table file

in order to locate the set of inverted index pointers that corresponds to a given search key k, the key is first operated on by a hashing function that transforms it into a bit string of length v bits. each such bit string is said to represent a virtual hash address, and is regarded as the concatenation of two substrings of length r and v-r bits. the two substrings are respectively said to constitute the major and the minor m(k) of the virtual hash address. the major is further divided into two bit strings b(k) and i(k) that define a bucket number b(k) of a bucket β(k), and an index number i(k) of an entry within the bucket. the major that represents the pair of numbers b(k), i(k) is said to constitute a real hash address. the hash table file is divided into portions, or buckets, of equal length. each bucket is further divided into an index section, a content section, and a counter section (figure 3). the index sections of all buckets have the same length. similarly, all content sections are of equal length, and so are all counter sections. as the hash table is created, entries are added sequentially into the content section so that any unfilled portion is at the end. in contrast, the index section of any bucket may contain unfilled entries at random positions and hence constitutes a scatter table. the hash table file is created as follows. the various keys are transformed by the hashing function into bit strings b(k), i(k), m(k). in the bucket β(k) of number b(k) an entry as described below is added to the content section, and the vacancy pointer within the counter section is incremented to point to the beginning of the unfilled portion of the content section. the i(k)th entry number in the index section is then set to point to the position of the entry added to the content section. the entry placed in the content section includes the minor m(k) and a dictionary pointer to where the key is placed in the dictionary file as well as a pointer to an entry in the compressed inverted index.
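a rough sketch of the virtual hash address follows: the hashed key is split into a bucket number b(k), an index number i(k), and a minor m(k). md5 stands in for the unspecified hashing function, and the bit widths assume the 18-bit major and 33-bit virtual address derived later in the text, with 64 index slots per bucket.

# sketch of the virtual hash address: a v-bit hash split into bucket number,
# index number, and minor. the widths below are assumptions taken from the
# figures adopted later in the text.
import hashlib

R_BITS, V_BITS = 18, 33             # major = 18 bits, virtual address = 33 bits
INDEX_BITS = 6                      # 64 index slots per bucket (an assumption)
BUCKET_BITS = R_BITS - INDEX_BITS   # remaining bits of the major select the bucket

def virtual_address(key: str):
    h = int.from_bytes(hashlib.md5(key.encode()).digest(), "big") % (1 << V_BITS)
    major, minor = h >> (V_BITS - R_BITS), h & ((1 << (V_BITS - R_BITS)) - 1)
    b = major >> INDEX_BITS                    # bucket number b(k)
    i = major & ((1 << INDEX_BITS) - 1)        # index number i(k)
    return b, i, minor                         # minor m(k)

print(virtual_address("automation"))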
there is said to have occulted a collision at the virtual hash address b(k), i(k), m(k). the last three entries included in the counter section shown in figure 3 are optional but are useful for monitoring the performance of the hashing function with respect to bucket overflows and so forth. a bucket becomes full when there is no remaining unfilled space in its content section. if a further chain pointer is required from a content entry, its preceding overflow bit qc is set to 1 to indicate that the pointer is to another bucket. likewise, if a further entry is required in the index section its preceding overflow bit qr is set to 1 to indicate that it refers to an entry within another bucket. the bucket is then said to have overflowed. methods of handling bucket overflow, and choice of the new bucket, are discussed in a subsequent section. it should be noted that use of a hash table as described above retains most of the advantages of the usual scatter index method in which the in42 journal of library automation vol. 6/1 march 1973 dex entries and content entries are stored in two separate files. it has the further important advantage that in most instances a single disk access is sufficient to locate both the index entry and the corresponding content entry. as noted by buchholz and reising~ if it is known that certain keys are likely to appear with high frequency in search queries then it is advantageous to enter them at the start of creation of the hash file. 22 • 23 they will then tend to appear near the beginnings of the content entry chains and hence require little cpu time for their subsequent location. furthermore, they will tend to appear in the same bucket as their corresponding index entries, and hence their location will usually require only a single disk access. number of bits for virtual hash address suppose the hashing function is chosen so that the majors of the transformed keys are uniformly distributed among the r slots available for real hash addresses b,i. if there are n keys then a = n / r may be termed the load factor. it is the average number of keys that are transformed into any given real hash address. the probability that any given real hash address corresponds to k keys is given by m urra y24 as (i) pk = e-a ak j kl hence, for any given real address the probability of a collision occurring is n (2) c = ~ pk = 1 po p1 = 1 (1 + a)e-a. k= 2 if a collision occurs at a particular real hash address, the expected length of the required chain within the content section is n (3) l = ~ kp~r/c k=2 n = (1 / c) (~ kpk pd k:o = ( l j c) (a ae-a) _ a (e" 1) ea.1 a it may be noted that if the load factor a is equal to 1 then l = 2.43. if all the transformed keys are distributed uniformly among the v possible virtual addresses b, i, m then the expected total number of collisions at virtual addresses is given by murray25 as (4) p = n2/ 2v provided v" n. the expected relative frequency of collisions at virtual addresses is therefore (5) f = n / 2v. file structure for an on-line catalog j dlmsdale 43 it proves convenient to regard n, f, and a as basic parameters in terms of which may be determined the number r of bits required in the major 1 and the number v of bits required in the virtual hash addresses. the value of r must be at least as large as lo~r = lo~(n/a), and hence r may be chosen according to the formula (6) r = r log2 (n/ a) where r means "the smallest integer greater than or equal to." the value of v must be at least as large as (7) v = r lo~v = r lo~ (n/2£). 
choice of bucket capacity

with an 8-bit byte-oriented computer, such as the ibm 360, it proves convenient to use 8 bits of storage for each entry number plus overflow bit within the index section. if a value of zero is used to indicate an unused index entry there remain up to 127 possible values for entry numbers. thus the number c of entries in the content section must be less than or equal to 127. suppose there are b slots for index entries in each bucket. the total number of index entries in the entire file is R. it follows from the results of schay and spruth,26 tainter,27 and heising28 that the probability p(b, c) of overflow of any bucket is given by

(10) p(b, c) = sum(k=c+1 to infinity) e^(-ab) (ab)^k / k!

for selected values of b, beyer's tables of the poisson distribution have been used to compute p(b, c) and to determine the smallest value of c for which p(b, c) ≤ 0.01.29 the results are shown in table 1 for the instance in which a = 1. a similar table has been computed by buchholz30 for the instance in which c = b and a ranges from 0.1 to 1.2. as is apparent from table 1, an increase in the value of b allows use of a smaller ratio c/b and hence permits more economical use of storage. with b = 64 the allowed value of c/b is 1.33 and hence c may be chosen equal to 85. the reduction in access time that results from structuring the file so that each bucket contains both index and content entries is, of course, effected at the expense of additional storage costs. for example, if c/b = 1.33 then the space allocated for storage of content entries is 33 percent greater than if content entries are stored in a separate file. relaxation of the condition p(b,c) ≤ 0.01 allows a reduction in c/b, but the increased number of bucket overflows will cause additional disk accesses to be required.

table 1. values of b, c, and c/b for which p(b,c) ≤ 0.01 when a = 1

b     c     c/b
1     5     5.00
2     6     3.00
3     8     2.66
4     10    2.50
5     11    2.20
6     13    2.17
7     14    2.00
8     15    1.88
9     17    1.89
10    18    1.80
11    19    1.73
12    20    1.67
13    22    1.69
14    23    1.64
15    24    1.60
16    25    1.56
17    27    1.59
18    28    1.55
19    29    1.53
20    30    1.50
60    80    1.33
100   125   1.25
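the entries of table 1 can be approximated by a direct evaluation of equation 10; the sketch below sums the poisson tail for a given b and c and searches for the smallest capacity meeting the one percent criterion. it is an illustration of the computation, not a reproduction of beyer's tables, and the function names are assumptions.

# sketch of equation (10): the probability that a bucket with b index slots
# and room for c content entries overflows, assuming poisson-distributed
# keys with load factor a, and the smallest c that keeps it at or below 1%.
import math

def overflow_probability(b: int, c: int, a: float = 1.0) -> float:
    lam = a * b
    # p(b, c) = 1 - sum_{k=0}^{c} e^(-lam) lam^k / k!
    return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(c + 1))

def smallest_capacity(b: int, a: float = 1.0, limit: float = 0.01) -> int:
    c = 0
    while overflow_probability(b, c, a) > limit:
        c += 1
    return c

print(overflow_probability(64, 85))   # should come out below the 0.01 criterion
print(smallest_capacity(64))          # smallest c for b = 64 under this criterion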
treatment of bucket overflows

when a new key is found to map into a bucket whose content section is full then some means must be found to provide space in some other bucket. the particular procedure that should be used depends on the extent to which the entire set of buckets contains unfilled portions. suppose that many buckets are almost full and that the number c of allowed content entries is less than 127. the entire hash file may then be expanded with the same index sections but with longer content sections. if many buckets are almost full and c = 127 then the entire file may be expanded in such manner that each bucket is replaced by a pair of buckets that contain the same number b of allowable index entries, but whose number c1 of allowable content entries is chosen to ensure that p(b, c1) ≤ 0.01. such doubling of buckets also doubles the number of index entries but it does not double the storage required for the entire file. each key k that corresponds to an entry in the original bucket is associated with an entry in the first, or second, of the new buckets according as the leading bit of either its index address i(k) or its minor m(k) is equal to 0 or 1. the effect is to shift one bit from i(k) or m(k) into the bucket address b(k). this method is based on a suggestion of morris.31

suppose that few buckets are almost full. then a suitable means of determination of an unfilled bucket for storage of the minor is through use of some overflow algorithm that determines a sequence of bucket numbers b_0(k), b_1(k), b_2(k), etc., corresponding to any given full bucket β_0(k). suppose there are N_B buckets. a quadratic residue algorithm

(11) b_j(k) = [b_0(k) + a j + b j^2] mod N_B

has been considered by maurer and by bell for use with in-core hash tables, but it suffers from the disadvantage that the existence of a full bucket β_0(k) will divert entries into the particular buckets β_1(k), β_2(k), etc., and hence cause them to fill more rapidly than other buckets which may contain fewer entries.32,33 it is believed that a more desirable form of the quadratic residue algorithm is

(12) b_j(k) = {b_0(k) + f_j[i(k)]} mod N_B

where f_j is a suitably chosen function. letting b_j(k) depend, through f_j, on both j and i(k), instead of on j alone, allows reduction of the tendency to fill a particular set of buckets. to prevent a tendency to overflow particular buckets it is also desirable for the overflow algorithm to produce bucket numbers that are uniformly distributed among all possible bucket numbers. among the more promising forms to be chosen for the f_j[i(k)] are the following:

(13a) f_j[i(k)] = i'(k) j, where j = 1, 2, ..., N_B - 1, and i'(k) denotes i(k) if i(k) is odd, but denotes i(k) + 1 if i(k) is even. since N_B is a power of 2 such a choice of i'(k) ensures that i'(k) and N_B have no common factors, and hence that b_j(k) steps through the sequence β_0(k), β_1(k), etc., covering every bucket in the file.

(13b) f_j[i(k)] = i'(k) j^2, where j = 1, 2, ..., ⌈√N_B⌉ - 1, and ⌈ ⌉ means "the least integer greater than or equal to."

(13c) f_j[i(k)] = r_j[i'(k)], where j = 1, 2, ..., N_B, and r_j[i'(k)] denotes a number output by a pseudorandom number generator of the form suggested by morris34 with an initial input of i'(k) instead of 1.

it may be remarked that use of equation 13a requires the least number of machine instructions, and the least cpu time per step, but it has a strong tendency to cluster the β_j(k) immediately after the β_0(k) and hence it is likely to be the least effective of the three methods. use of equation 13b produces less clustering, but the sequence does not include all buckets of the file. use of equation 13c requires the largest number of instructions and cpu time per step, but the β_j(k) are less likely to cluster and they are uniformly distributed among all possible buckets. thus equation 13c produces shorter chains of overflow buckets and hence requires fewer disk accesses.
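a minimal sketch of the overflow algorithm of equation 12 with the step function of equation 13a follows; the bucket count and sample values are assumptions, and N_B is taken to be a power of two as the text requires.

# sketch of equation (12) with the step of (13a): candidate buckets
# b_j(k) = (b_0(k) + i'(k) * j) mod N_B, where i'(k) is i(k) forced odd so
# the sequence visits every bucket when N_B is a power of two.
def overflow_sequence(b0: int, i: int, n_buckets: int):
    """generate the chain of overflow bucket numbers after a full bucket b0."""
    i_prime = i if i % 2 == 1 else i + 1     # odd step, coprime to a power of 2
    for j in range(1, n_buckets):
        yield (b0 + i_prime * j) % n_buckets

# first few candidates after bucket 5 for an index value of 12, 4096 buckets
seq = overflow_sequence(5, 12, 4096)
print([next(seq) for _ in range(4)])          # -> [18, 31, 44, 57]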
if a new key k maps into a full bucket β0(k) then the following procedure is used to determine the bucket into which the minor of k is to be inserted:

(i) the chain of pointers from the i(k)th entry of the bucket β0(k) is followed, possibly through overflow buckets given by equation 12, in order to locate the terminal entry of the chain. suppose this terminal entry is within a bucket βj(k).

(ii) if there is available space in bucket βj(k) then the minor m(k) is entered and chained as described previously.

(iii) if bucket βj(k) is full, but there is space in βj+1(k), then the minor m(k) is entered into βj+1(k) and chained as described previously.

(iv) if buckets βj(k) and βj+1(k) are both full, and bucket βj+1(k) contains at least one nonempty index entry i(k') whose chained content entries are all contained within βj+1(k), then the minor m(k) is stored according to the following displacement algorithm: the terminal member of the chain from i(k') is displaced to an overflow bucket βr(k') determined by use of equation 12, except that if both βr(k') and βr+1(k') are full then a further bucket is determined by use of the displacement algorithm. the minor m(k) is substituted for the displaced entry in bucket βj+1(k) and is chained appropriately.

(v) if application of step (iv) leads to a bucket βj+1(k), or βr+1(k'), that contains no nonempty index entry whose chained content entries are all contained within it, then the entire hash file must be expanded by use of one of the procedures described at the beginning of the present section.

it should be emphasized that, although step (iv) is necessary for completeness, the probability of its use is very low. with a probability of less than 0.01 for a bucket overflow, the probability of use of step (iv) is less than (0.01)^3.

search phase and problem of mismatch

in the previous sections the structure of the hash index file has been discussed with emphasis on details of its creation and update. during search of the catalog files by use of the inverted index, each search key is processed by the following search algorithm:

step 1: the search key k is transformed by the hashing function into a virtual hash address b(k), i(k), m(k).

step 2: the bucket β(k) is read into core.

step 3: the index entry specified by i(k) is examined. if it is empty then the search key is not present in the data base. if it is not empty then step 4 is performed.

step 4: the overflow bit of the index entry specified by i(k) is examined. if it is equal to 1 then step 5 is performed. if it is equal to 0 then step 6 is performed.

step 5: the overflow algorithm is used to determine the address of the required overflow bucket, which is then read into core, and step 6 is executed.

step 6: the minor of each entry in the chain of content entries is compared to the minor of the search key's virtual hash address until either a match is found or the chain is exhausted. whenever the chain leads to an overflow bucket then step 5 is performed.

step 7: if a match is found for m(k) then the collision bit of the entry is examined. if it is equal to 0 then step 9 is performed. if it is equal to 1 then step 8 is performed.

step 8: the dictionary entry that corresponds to each content entry in the virtual address collision is read into core and compared to the search key k. if no match is found then the search key is not present in the index.

step 9: this step is included because there is a small probability that a misspelled search key, or one not present in the hash file, may be transformed into the same virtual address as some key already included in the file. the step consists of reading the corresponding dictionary entry into core and comparing it with the search key. for reasons discussed later in the present section it is desirable to omit this step.

it should be noted that in most instances the search algorithm will not require execution of steps 5 and 8.
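read as code, steps 1 through 8 of the search algorithm look roughly as follows (step 9 is omitted, as the paper recommends). this is only an illustrative python sketch: the attribute names and the four injected routines for hashing, bucket i/o, the overflow algorithm, and dictionary i/o are assumptions, not part of the original program.

def search(key, hash_fn, read_bucket, overflow_bucket_of, read_dictionary_entry):
    b, i, m = hash_fn(key)                         # step 1: virtual hash address b(k), i(k), m(k)
    bucket = read_bucket(b)                        # step 2: read the bucket into core
    entry = bucket.index[i]                        # step 3: examine the i(k)-th index entry
    if entry is None:
        return None                                #         empty: key not in the data base
    if entry.overflow_bit:                         # step 4: overflow bit set?
        bucket = read_bucket(overflow_bucket_of(b, i))   # step 5: fetch the overflow bucket
    for content in bucket.chain(entry):            # step 6: walk the chain of content entries,
        if content.minor != m:                     #         re-entering step 5 whenever the chain
            continue                               #         crosses into another overflow bucket
        if not content.collision_bit:              # step 7: unique minor, no collision
            return content
        # step 8: a virtual-address collision; disambiguate via the dictionary entry
        if read_dictionary_entry(content.dictionary_pointer) == key:
            return content
    return None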
in fact, with the hash index files designed as described in the previous sections, the probability of execution of step 5 is about 0.01 and the probability of execution of step 8 is about 2^-16. consequently, if step 9 is also omitted the number of disk accesses required to find the index entry corresponding to a search key is approximately 1.01.

the mismatch problem, which gives rise to step 9 of the search algorithm, is less serious than might be expected. suppose the hash function distributes the transformed keys uniformly over all hash addresses. the probability that a new, or misspelled, key maps into an existing entry is given by

(14) pc = n/v.

the probability that a search leads to a mismatch is therefore

(15) pm = ps n/v

where ps is the probability that the search key is misspelled or not in the hash table. thus, for a hash table of n = 2^16 = 65,536 title words and v = 2^31, an assumption of ps = 0.1 leads to pm = 3 x 10^-6. because pm is extremely small, and because each execution of step 9 requires up to two disk accesses, it is desirable to omit this step. if experience shows that particular new or misspelled search keys occur frequently, and cause mismatches, they may themselves be entered into the hash index file. in fact, some degree of automatic spelling correction may be provided if some common misspellings are included in the hash files and chained to the content entries that correspond to the correctly spelled keys. correct, but alternative, spellings of search keys may also be treated in the same manner.

size of hash file for title words

suppose the document collection contains t different titles that comprise a total of w words of which there are n different words. let w̄ = w/t denote the average number of words in each title. reid and heaps35 have reported word counts on the 57,800 titles included on the marc tapes between march 1969 and may 1970 and have noted that

(16) w̄ = 5.5

(17) log10 n = 0.6 log10 w + 1.2.

examination of other data bases has led to the conclusion that log n is likely to be a linear function of log w over the range 0 ≤ w ≤ 10^6. for a library of one million titles the equations 16 and 17 may therefore be used to predict that when t = 10^6 then

(18) w ≈ 5.5 x 10^6 and n = 1.8 x 10^5.

it follows from equation 6 that if a = 1 the number of bits required in the major is

(19) r = 18.

according to equation 7, in order to reduce the frequency f of collisions at virtual addresses to 2^-16 the number of bits required in the entire virtual address is

(20) v = ⌈log2 (1.8 x 10^5) + 16 - 1⌉ = 33.

consequently, the number of bits in the minor is

(21) m = v - r = 15.

however, with such a choice of r there are 2^18 hash addresses and the value of the load factor is, in fact,

(22) a = n/2^18 ≈ 0.7.

it follows from equation 4 that the expected total number p of collisions at virtual addresses is equal to approximately 2. it may be further noted that murray36 has derived the following approximation for the probability that the number of collisions at virtual hash addresses lies within the range a to d:

(23) p(a,d) = Σ (i = a to d) e^(-p) p^i / i!, where 0 ≤ a ≤ d ≤ n.

when p = 2 the equation gives a value of 0.9998 for the probability that the total number of collisions lies between 0 and 8. thus the above choice of r, v, and m leads to a title word hash table file with excellent virtual address collision properties.
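the sizing arithmetic of equations 19 through 22 is easy to reproduce. the sketch below treats the bracketing of equation 20 as ordinary rounding-up and stands in a birthday-type estimate for equation 4, since both formulas are garbled or outside this excerpt; those two choices are assumptions. the same routine reproduces the lc call number and author name figures given later in the paper.

import math

def hash_file_sizing(n_keys, gamma=16):
    # r: bits in the major (eq. 6 with a = 1); v: bits in the virtual address
    # (eq. 7 as reconstructed here, with f = 2**-gamma); m: bits in the minor (eq. 9)
    r = math.ceil(math.log2(n_keys))
    v = math.ceil(math.log2(n_keys) + gamma - 1)
    m = v - r
    load = n_keys / 2 ** r                          # eq. 22
    collisions = n_keys ** 2 / 2 ** (v + 1)         # birthday estimate standing in for eq. 4
    return r, v, m, round(load, 2), round(collisions, 1)

print(hash_file_sizing(1.8e5))     # title words:     (18, 33, 15, 0.69, 1.9)
print(hash_file_sizing(1e6))       # lc call numbers: (20, 35, 15, 0.95, 14.6)
print(hash_file_sizing(2.56e5))    # author names:    (18, 33, 15, 0.98, 3.8)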
use of equation 10 with b = 64 and a = 0.7 leads to the result that the probability of bucket overflow may be reduced to 0.01 by choosing c = 62. in view of the above value of m it proves convenient to allocate 10 bytes of storage for each content entry. each entry consists of a 2-byte portion to contain the 15-bit minor preceded by a collision bit, a 1-byte portion to contain a 7-bit chain pointer preceded by an overflow bit, a 3-byte dictionary pointer, and a 4-byte pointer to an inverted index. the 64 one-byte index entries, the 62 ten-byte content entries, and 4 one-byte counters constitute buckets of length 688 bytes.

the entire hash file consists of r entries, and hence r/b = 2^12 buckets. its storage requirement is therefore 2^12 x 688 = 2.82 x 10^6 bytes. it may be remarked that nine 688-byte buckets may be stored unblocked in one track of an ibm 2316 disk pack, and that the entire hash file occupies 11.38 percent of the disk pack. when the disk and channel are idle the average time to access such a bucket is the sum of the average seek time, the average rotational delay, and the record transmission time. for storage on an ibm 2314 disk drive the average bucket access time is therefore 60 + 12.5 + 2.8 = 75.3 milliseconds. the average access time for a sequence of accesses could be reduced by suitable scheduling.

size of hash file for lc call numbers

for a library of one million titles the number n of call numbers is 10^6. if a = 1 and f = 2^-16 it follows from equations 6, 7, 9, and 4 that

(24) r = 20, v = 35, m = 15, p = 16.

with such a choice of r the load factor is approximately equal to 1. equation 23 gives a probability of 0.9998 that the total number of virtual address collisions lies between 0 and 34. use of equation 10 with b = 64 and a = 1.0 shows that the probability of bucket overflow may be reduced to 0.01 by choosing c = 85.

the content entries for lc call numbers may be arranged as for title words except that the 4-byte pointer to an inverted index is replaced by a 3-byte pointer to the compressed catalog file. the bucket length is therefore 64 + 85 x 9 + 4 = 833 bytes. the storage requirement for the hash file is (2^20/2^6) x 833 = 13.65 x 10^6 bytes, which may be stored in 2,184 tracks, or 54.6 percent, of an ibm 2316 disk pack. the average time to access a bucket is 60 + 12.5 + 3.3 = 75.8 milliseconds.
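the byte arithmetic behind the bucket and file sizes just described is worth making explicit. a minimal sketch, with the layout figures taken from the text above; the author name bucket worked out in the next section follows the same pattern with 85 ten-byte entries per bucket.

def bucket_bytes(index_slots, content_entries, bytes_per_entry, counters=4):
    # one byte per index entry, fixed-length content entries, plus counter bytes
    return index_slots + content_entries * bytes_per_entry + counters

# title words: 64 index slots and 62 ten-byte entries (2-byte minor + collision bit,
# 1-byte chain pointer + overflow bit, 3-byte dictionary pointer, 4-byte inverted-index pointer)
title_bucket = bucket_bytes(64, 62, 10)          # 688 bytes
# lc call numbers: the 4-byte inverted-index pointer becomes a 3-byte catalog pointer
call_bucket = bucket_bytes(64, 85, 9)            # 833 bytes

print(title_bucket, 2**12 * title_bucket)        # 688, 2,818,048 bytes (about 2.82 x 10^6)
print(call_bucket, 2**14 * call_bucket)          # 833, 13,647,872 bytes (about 13.65 x 10^6)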
size of hash file for author names

in the present section the term "author" will be used to include personal names, corporate names, editors, compilers, composers, translators, and so forth. it will be assumed that for personal names only surnames are entered into the author dictionary. a search query that includes specification of authors with initials is first processed as if initials were omitted, and the resulting retrieved catalog entries are then scanned sequentially to eliminate any entries whose authors do not have the required initials. it will also be supposed that each word of a corporate name is entered separately into the author dictionary, and that the inverted index contains an entry for each term.

in the absence of reliable statistics regarding the distributions of author surnames, words within corporate names, and so forth, the following assumptions have been made in order to estimate the size of the author dictionary and hash file for a library of one million titles:

(i) the personal author names contain 2 x 10^5 different surnames of average length 7 characters.

(ii) the corporate author names include 4 x 10^4 different words of average length 6 characters.

(iii) the author names include 1.6 x 10^4 different acronyms such as ibm, aslib, and so forth; their average length is 4 characters.

it is thus supposed that n = 2.56 x 10^5 entries are required in the author hash files. calculations similar to those of the previous section show that

(25) r = 18, v = 33, m = 15, p = 4, a = 1.0.

equation 23 gives a probability of 0.9999 that the total number of virtual address collisions lies between 0 and 13. the probability of bucket overflow may be reduced to 0.01 by choosing c = 85. content entries of 10 bytes may be arranged as previously described for title words. hence each bucket requires 918 bytes of storage. the storage requirement for the hash file is (2^18/2^6) x 918 = 3.76 x 10^6 bytes, which may be stored in 586 tracks, or 14.6 percent, of an ibm 2316 disk pack. the average time to access a bucket is 76.1 milliseconds.

structure of dictionary files

the structure of the dictionary files for title words and author names is as described by thiel and heaps.37, 38 each dictionary file contains up to 128 directories, each of which points to up to 128 term strings that may each contain space for storage of 128 terms of equal length. thus each dictionary file may contain up to 2^21 different terms (2^14 for each directory). the dictionary pointers in the hash files are essentially the codes stored instead of alphanumeric terms in the catalog file.

the most frequent 127 title words are assigned dictionary pointers of the form

(26) 10000000 10000000 1xxxxxxx
                        (pt)

and do not have corresponding entries in the inverted index file. the last byte forms the code used to represent the title word within the compressed catalog file. the next most frequent 16,384 title words are assigned dictionary pointers of the form

(27) 00000000 1xxxxxxx 1xxxxxxx

or

(28) 10000000 0xxxxxxx 1xxxxxxx

according as there is, or is not, a corresponding entry in the inverted index. the last 2 bytes are used as codes in the compressed catalog file. the remaining title words are assigned dictionary pointers of the form

(29) 0xxxxxxx 0xxxxxxx 1xxxxxxx
     (pd)     (ps)     (pt)

they all have corresponding entries in the inverted index file, and the 3 bytes are used as codes in the catalog file. the reason that terms coded in the form 26 or 28 do not have corresponding entries in the inverted index file is that very frequently occurring terms form very inefficient search keys. also, previous results suggest that omission of corresponding entries in the inverted index allows its size to be reduced by about 50 percent.39, 40

the codes of type pt, (ps, pt), and (pd, ps, pt) are used respectively for approximately 50 percent, 45 percent, and 5 percent of the title words. the average length of the coded title words in the compressed catalog file is therefore 1.55 bytes. associated with each dictionary file there is a directory of length 512 bytes whose entries point to the beginnings of term strings within the dictionary file and also indicate the lengths of the terms.
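the three-byte pointer forms 26 through 29 can be told apart from their leading bits alone. the following sketch decodes a pointer into its form, whether the term has an inverted index entry, and its (pd, ps, pt) components; the bit layouts follow the forms as reconstructed above and should be read as an illustration rather than as the production record format.

def classify_pointer(pointer):
    # pointer is a 3-byte sequence; returns (form, has_inverted_index_entry, (pd, ps, pt))
    b0, b1, b2 = pointer
    pt = b2 & 0b01111111                              # the last byte always carries pt
    if b0 == 0b10000000 and b1 == 0b10000000:
        return 26, False, (None, None, pt)            # one of the 127 most frequent words
    if b0 == 0b00000000 and b1 & 0b10000000:
        return 27, True, (None, b1 & 0b01111111, pt)  # frequent word with an inverted index entry
    if b0 == 0b10000000 and not b1 & 0b10000000:
        return 28, False, (None, b1, pt)              # frequent word without an inverted index entry
    return 29, True, (b0, b1, pt)                     # full pd, ps, pt pointer

print(classify_pointer(bytes([0b10000000, 0b10000000, 0b10000011])))   # (26, False, (None, None, 3))
print(classify_pointer(bytes([0b00001011, 0b00000101, 0b10000011])))   # (29, True, (11, 5, 3))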
within the hash table file a dictionary pointer of the form pd, ps, pt points to the pt-th term of the ps-th term string in the dictionary associated with the pd-th directory. there is a single directory associated with each set of pointers of type pt and ps, pt.

the average length of the 1.8 x 10^5 different title words is 7.6 characters, and hence the entire set of term strings requires 1.8 x 10^5 x 7.6 = 1.37 x 10^6 bytes for storage of title words. since twelve directories occupying 12 x 512 = 6,144 bytes will be required, and since some term strings will contain unfilled portions, the storage requirement of the dictionary file will be slightly larger. if the title word dictionary is stored on disk in 1,000-byte records then the storage requirement is 238 tracks, or 5.95 percent, of an ibm 2316 disk pack. the assumptions made previously regarding author names imply an author dictionary size of 1.70 x 10^6 bytes and sixteen directories whose total storage requirement is 16 x 512 = 8,192 bytes. using an ibm 2316 disk pack the storage requirement is for 286 tracks, or 7.15 percent.

on completion of a search through use of the inverted index file there results a set of sequence numbers that indicate the position of the relevant items in the compressed catalog file. before such items are displayed to a user of the system, each term must be decoded through access to the directory and dictionary to which it points. the time required to decode a catalog item depends on how the directories and dictionaries are partitioned between disk and core memory. several partitioning schemes for title words have been analysed, and the results are summarized in table 2.

table 2. average time to decode a title word of the compressed catalog file.

core resident directories   core resident term strings   average number of accesses   average decode time (milliseconds)   dedicated core memory (bytes)
none                        none                         1.50                         115                                  0
pt                          pt                           1.01                         77                                   989
pt, (ps,pt)                 pt                           0.55                         42                                   1501
all                         pt                           0.50                         39                                   7133
pt, (ps,pt)                 pt, (ps,pt)0                 0.49                         38                                   2474
all                         pt, (ps,pt)0                 0.44                         34                                   8106

(ps,pt)0 signifies the 128 most frequent of the codes ps, pt.

in the calculations used to obtain table 2 it is assumed that title words occur with the frequencies listed by kucera and francis.41 it is supposed that both the directory and term strings corresponding to codes of form pt are stored in a single physical record, that every other directory is contained wholly within a physical record, and that each dictionary term may be located by a single access to a term string. any required cpu time is regarded as insignificant compared to the time needed for file accesses. from the results shown in table 2 it appears that the best partition between core and disk is probably that which gives an average decode time of 42 milliseconds while requiring a dedicated 1,501 bytes of core memory. this results when core is used to store both the directories and term strings for terms that correspond to pointers of type pt, and the directories only for terms that correspond to pointers of type ps, pt.

compressed catalog file

since the title word codes stored in the compressed catalog file have an average length of 1.55 bytes, whereas uncoded title words and their delimiting spaces have an average length of 6.5 characters, the compressed title fields occupy only 24 percent of the storage required for uncompressed words. uncoded author names and their delimiting spaces have an average length of 7.6 characters and are coded to occupy not more than 3 bytes; hence coding of author names effects an average compression factor of less than 3/7.6 = 40 percent. for lc call numbers the compression factor is less than 30 percent. clearly, subject headings, publisher names, and series statements may be coded with even more effective compression factors.
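the compression figures quoted above follow from simple arithmetic on the mix of code lengths; a short sketch is given below. the small residual differences from the paper's rounded storage totals are to be expected.

# 50 percent of title words use 1-byte codes, 45 percent 2-byte, 5 percent 3-byte
avg_coded_length = 0.50 * 1 + 0.45 * 2 + 0.05 * 3      # 1.55 bytes
title_factor = avg_coded_length / 6.5                   # about 0.24 of the uncoded length
author_factor = 3 / 7.6                                 # under 0.40, using the 3-byte ceiling

title_words = 5.5e6                                     # one million titles at 5.5 words each
uncoded_bytes = title_words * 6.5                       # roughly 3.6 x 10^7 bytes
coded_bytes = title_words * avg_coded_length            # roughly 8.5 x 10^6 bytes
print(avg_coded_length, round(title_factor, 2), round(author_factor, 2), uncoded_bytes, coded_bytes)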
the saving in space through compression of the catalog file may be translated into a cost saving as follows. if there are an average of 5.5 words in each title then one million titles include 5.5 x 10^6 title words and delimiting spaces which, if stored in the catalog file in uncoded form, would require 3.63 x 10^7 bytes.42 when stored in coded form the requirement is for 8.54 x 10^6 bytes. charges for disk space vary considerably with different computing facilities. at the university of alberta users of the ibm 360 model 67 are charged a monthly rate of $.50 for each 4,096 bytes of disk storage. thus, for title words alone the advantage of storing the catalog file in compressed form is to allow the monthly storage cost to be reduced from $4,440 to $950.

concluding remarks

the results reported in the present paper indicate that a satisfactory structure for a catalog file may be designed to use the concept of virtual hash addressing and storage of terms in compressed form. access and decoding times may be reduced to acceptable amounts.

it may prove advantageous to arrange the items in the catalog file in the order of their call numbers. this will tend to reduce the number of disk accesses needed to retrieve catalog items in response to queries since it will tend to group relevant items. however, the benefits should be weighed against the additional expense required to maintain and update the ordered file.

the present paper has omitted discussion of the form of the query language or the search algorithm that operates on the elements of the inverted index. a formal definition of one form of query language has been discussed by dimsdale.43 details of a search algorithm and structure of a compressed form of inverted index have been discussed by thiel and heaps.44 it may be noted that each content entry in the hash table file has 4 bytes reserved for a pointer to a bit string of the inverted index. whenever the bit string is less than 4 bytes in length it is stored in the content section and no pointer is required. storage of such bit strings within the content entries significantly reduces the storage requirements of the inverted index and also reduces the number of required disk accesses in the search phase of the program.

acknowledgment

the authors wish to express their appreciation to the national research council of canada for their support of the present investigation.

references

1. d. lefkovitz and r. v. powers, "a list-structured chemical information retrieval system," in g. schecter, ed., information retrieval (washington, d.c.: thompson book co., 1967), p.109-29.
2. p. r. weinberg, "a time sharing chemical information retrieval system" (doctoral thesis, univ. of pennsylvania, 1969).
3. r. m. curtice, "experimental retrieval systems studies. report no. 1. magnetic tape and disc file organization for retrieval" (master's thesis, lehigh univ., 1966).
4. d. lefkovitz, file structures for on-line systems (new york: spartan books, 1969).
5. i. b. holbrook, "a threaded-file retrieval system," journal of the american society for information science 21:40-48 (jan.-feb. 1970).
6. g. g. dodd, "elements of data management systems," computer surveys 1:117-33 (june 1969).
7. j. w. rettenmayer, "file ordering and retrieval cost," information storage and retrieval 8:19-93 (april 1972).
8. r. t. divett, "design of a file structure for a total system computer program for medical libraries and programming of the book citation module" (doctoral thesis, univ. of utah, 1968).
9. h. p. burnaugh, "the bold (bibliographic on-line display) system," in g. schecter, ed., information retrieval (washington, d.c.: thompson book co., 1967), p.53-66.
10. lefkovitz and powers, "a list-structured chemical information," p.109-29.
11. lefkovitz, file structures for on-line systems, p.141.
12. ibid., p.177.
13. f. g. kilgour, "concept of an on-line computerized catalog," journal of library automation 3:1-11 (march 1970).
14. j. l. cunningham, w. d. schieber, and r. m. shoffner, a study of the organization and search of bibliographic holdings records in on-line computer systems: phase i (berkeley: univ. of california, 1969).
15. r. s. marcus, p. kugel, and r. l. kusik, "an experimental computer stored, augmented catalog of professional literature," in proceedings of the 1969 spring joint computer conference (montvale: afips press, 1969), p.461-73.
16. j. w. henderson and j. a. rosenthal, eds., library catalogs: their preservation and maintenance by photographic and automated techniques; m.i.t. report 14 (cambridge, mass.: m.i.t. press, 1968).
17. i. a. warheit, "file organization of library records," journal of library automation 2:20-30 (march 1969).
18. r. morris, "scatter storage techniques," communications of the acm 11:38-44 (jan. 1968).
19. d. m. murray, "a scatter storage scheme for dictionary lookups," journal of library automation 3:173-201 (sept. 1970).
20. w. buchholz, "file organization and addressing," ibm systems journal 2:86-111 (june 1963).
21. p. l. long, k. b. l. rastogi, j. e. rush, and j. a. wyckoff, "large on-line files of bibliographic data: an efficient design and a mathematical predictor of retrieval behavior," in information processing 71 (north holland publishing company, 1972), p.473-78.
22. buchholz, "file organization," p.102-3.
23. w. p. heising, "note on random addressing techniques," ibm systems journal 2:112-16 (june 1963).
24. murray, "a scatter storage scheme," p.178.
25. ibid., p.181.
26. g. schay and w. g. spruth, "analysis of a file addressing method," communications of the acm 5:459-62 (august 1962).
27. m. tainter, "addressing for random-access storage with multiple bucket capacities," journal of the acm 10:307-15 (july 1963).
28. heising, "note on random addressing," p.112-16.
29. w. h. beyer, handbook of tables for probability and statistics (cleveland: the chemical rubber company, 1966).
30. buchholz, "file organization," p.99.
31. morris, "scatter storage," p.42.
32. w. d. maurer, "an improved hash code for scatter storage," communications of the acm 11:35-38 (jan. 1968).
33. j. r. bell, "the quadratic quotient method: a hash code eliminating secondary clustering," communications of the acm 13:107-9 (feb. 1970).
34. morris, "scatter storage," p.40.
35. w. d. reid and h. s. heaps, "compression of data for library automation," in canadian association of college and university libraries: automation in libraries 1971 (ottawa: canadian library association, 1971), p.2.1-2.21.
36. murray, "a scatter storage scheme," p.183.
37. l. h. thiel and h. s. heaps, "program design for retrospective searches on large data bases," information storage and retrieval 8:1-20 (jan. 1972).
heaps, "program design for retrospective searches on large data bases," information storage and retrieval8:1-20 (jan. 1972) . 38. h. s. heaps, "storage analysis of a compression coding for document data bases," infor 10:47-61 (feb. 1972) . 39. thiel and heaps, "program design," p.l5-16. 40. reid and heaps, "compression of data," p.2.1-2.21. 41. h. kucera and w. n. francis, computational analysis of present-day american english (providence: brown university press, 1967). 42. reid and heaps, "compression of data," p.2.4. 43. j. j. dimsdale, "application of on-line computer systems to library automation" (master's thesis, univ. of alberta, 1971), p.50-68. 44. thiel and heaps, "program design," p.l-20. lib-s-mocs-kmc364-20141005043558 56 highlight of minutes information science and automation division board of directors meeting 1973 midwinter meeting washington, d. c. monday, january 29, 1973 the meeting was called to order by president ralph shoffner at 8:10a.m. the following were present: board-ralph m. shoffner (chairman ) , richard s. angell, don s. culbertson (!sad executive secretary), paul j. fasana, donald p. hammer, susan k. martin, and bemiece coulter, secretary, isad. committee chairman-stephen r. salmon. guestscharles stevens and david weisbrod. report of national commission on library and information science. mr. charles stevens, executive director of the national commission on library and information science, discussed the commission's priorities and objectives for planning libmry and information services for the nation. the commission has identified six areas of activity in which to conduct investigations in relation to its charge which is to study " ... library and information services adequate to meet the needs of the people of the united states." these six: areas are: ( 1) understanding information needs of the users; ( 2) adequacies and deficiencies of current library and information services; ( 3) pattems of organization; ( 4) legal and financial restrictions on libraries; ( 5) technology in library and information systems; and ( 6) human resources. report to ala planning committee. the report to the ala planning committee on !sad's long range plans was deferred until after the !sad objectives committee report is received in june. objectives committee interim report. mr. stephen salmon, chairman, provided an interim report of the committee. the committee will recommend that the division continue to exist and will list its proposed objectives, which may differ from the original objectives. at the request of louise giles, chairman of the information technology discussion group, special attention will be given to that group's interests in formulating the statement of objectives. membership survey committee. mr. shoffner relayed ms. pope's report that the membership survey will cost $700.00, which is not available in the current budget. mr. culbertson said that the cost could be decreased by surveying a sample of 1,000 members. the decision was to highlights of minutes 57 request the full amount for the survey, to be performed in the coming fiscal year. asidic representative. mr. peter t. watson, through correspondence with mr. shoffner, reported that asidic is interested in liaison with ala, and was concerned with the possibility of accomplishing this through isad. mr. culbertson reported that asidic could become an affiliate of ala for a $40.00 fee, but that isad could recommend a formal liaison, especially if isad and asidic had similar interests. motion. 
it was moved by paul fasana that this matter of asidic liaison with ala be passed on to the executive director, mr. robert wedgeworth, and that the president of isad write and inform him of such. seconded by richard angell. carried. policy statement on privacy of data processing records. mr. culbertson had been approached about !sad's making a statement on broad issues of data processing, including privacy. a need has been made known by the ala washington office for having such a statement on which to base their stand in certain hearings. mr. hammer felt it very appropriate that the association (ala) take a position on it. mr. weisbrod mentioned that !sad could be involved because of the vulnerability of machine-readable files due to the large quantity of data processed. motion. it was moved by paul fasana that the isad board recommend to the ala council that it (ala) develop some policy expressing its membership's attitude toward the privacy of machine-readable data. seconded by donald hammer. carried. ]ola editor. mr. shoffner reported, concerning the appointment of an editor, that two contacts were outstanding and he would report to the board on wednesday. mr. culbertson has been serving as temporary editor. mr. fasana noted that the schedule for 1972 was for four issues, but only one had appeared. he asked what plans there were to catch up or cancel. mr. culbertson said that legally isad could not cancel any issues, and that a statement had been written for the "memo to members" section of american libraries. he also mentioned the previous board action to have ]ola te chnical communications become a part of the 1973 volume. wednesday, january 31, 1973 mr. shoffner called the meeting to order at 10:00 a.m. those present were: board-ralph m. shoffner (chairman), richards. angell, dons. culbertson (!sad executive secretary), paul j. fasana, donald p. hammer, susan k. martin, and berniece coulter, secretary, isad. committee chairmen-brigitte kenney, ronald miller, and velma veneziano. guest-peter watson. conference planning committee report. mrs. susan 58 journal of library automation vol. 6/ 1 march 1973 martin, chairman, reported that the 1972 seminar on telecommunications had been successful, and the april seminar with the national microfilm association in detroit was proceeding as scheduled. the seminar on the national libraries, originally scheduled for january, and the seminar on netm works which was to be in march had both been postponed until the next fiscal year. planning of the las vegas preconference program is continuing smoothly; the institute is to be concerned with a review of the state-of-the-art of library automation. it will update the !sad preconference institute of 1967. isad / led education committee report. a written report was submitted. (see exhibit 1.) rtsd / isad / rasd representation in machine-readable form of bibliographic information committee report. chairman vehna veneziano reported that as a result of a ]ola technical communications announcement that the committee meeting was open and that there would be discussion of the controversial international standard bibliographic description ( isbd), 2()0-300 persons attended the committee meeting. the committee felt that changes such as isbd in the marc records by the library of congress should take into account the users of the marc distribution service. committee action on the isbd was delayed until the isbd for serials proposal was further along. 
it was stated that the isbd for serials should be as consistent as possible with the isbd for monographs. the committee suggested that each division publish these standards in its journal. motion. it was moved by paul fasana that the !sad board suggest to the jola editorial board that discussion drafts of standards be published in the ] ournal of library automation. seconded by donald hammer. carried. mrs. veneziano pointed out that a resolution was passed concerning the formation of an ad hoc task force for a period of two years. the task force would work with emerging standards relating to character sets: greek and cyrillic alphabets; mathematical and logical symbols; and control characters relating to communications. three persons were suggested for the task force: charles payne of the university of chicago, david weisbrod of yale, and michael malinconico of the new york public library in addition to lucia rather and henriette avram of the library of congress. the task force would report back to the board through the committee. motion. it was moved by paul fasana that !sad consider the creation of a task force to work with emerging standards relating to character sets and the insertion of a fund request in the isad budget for $1,060 ( $700 for 2 trips for 3 persons and $360 per diem for 3 persons for 2 days for each of 2 trips). seconded by donald hammer. carried. highlights of minutes 59 the committee wished to go on record that since rtsd had recently formed a committee on computer filing that computer filing rules was a function of the interdivisional committee on representation in machinereadable form of bibliographic information. the subject of library codes was discussed. bowker was assigning numeric codes to libraries, book publishers, and book dealers. the committee is concerned about standards and does not wish to see the creation of systems of incompatible codes. telecommunications committee report. brigitte kenney, chairman, submitted a written 18-month report of the committee (exhibit 2). miss kenney announced that she was resigning as chairman of the committee and that no present member was available to assume the chairmanship. mr. hammer, as president-elect, was charged with appointing the next chairman. the function statement of the telecommunications committee has been grouped into four areas: ( 1) communication to members; ( 2) training; ( 3) legislative matters; and ( 4) research. she pointed out that both ]ola technical communications and american libraries had said in writing that they would accept articles on telecommunications, particularly cable tv, and had accepted none. also, she ltad attempted for a year and a half to assemble an information packet at ala headquarters, but did not know the status of the project. headquarters had requested guidelines on cable policy from the committee; she stated that they had not succeeded in completing this task. no guidelines had been provided. mter !sad and ala sources did not respond to a request to publish a cable newsletter, the american society for information science was approached. the asis council approved this the previous friday and she had obtained seed money from the markle foundation. miss kenney referred to the resolution introduced that afternoon in council that an ad hoc ala committee be established to address itself exclusively to cable matters and be representative of all units of ala, and that it take on very specific tasks with clearly delineated time limits. 
she further stated that she had not felt that !sad had given adequate support to the isad telecommunications committee's activities, and thought that the board would have to decide if this was an appropriate committee for !sad. if so, was the function statement too broad? should it be narrowed to just data transfer? miss kenney also suggested that the committee be expanded in size to include more people involved in telecommunications. in the discussion which followed it was indicated that it could take from two to three years to set up a committee in ala as an interdivisional committee. it was decided that a committee chairman should be found and that 60 journal of library automation vol. 6/1 march 1973 the board could then work with the chairman in the definition of the tasks to be performed. publishing of minutes. it was decided that the board of directors express to the editorial board their desire that the minutes of board meetings be published in the journal. seminar and institute topics committee report. ronald miller~ chairman, enumerated the following points of the committee's meeting: that ( 1) a long range plan for seminar programs be written to cover the period from july 1974 through june 1978; (2) part of the money from the institutes be budgeted to support a professional staff person at ala headquarters to handle the burden of the work; ( 3) policy be established concerning commercial groups using isad programs for a marketing channel, particularly products of use to libraries; ( 4) institutes or seminars be regionalized in the u.s. and canada; and ( 5) liaison efforts be utilized (a) within the network of ala, (b) through subcontractors, and (c) through continuing education programs of library schools or other institutes of higher education. in the discussion by the board it was agreed that a written document, both specific and general, be put before the isad membership concerning future seminar and institute topics in order to obtain reactions. ]ola editor appointed motion. it was moved by donald hammer that the board approve the appointment of susan k. martin as editor of the journal of library automation. seconded by paul fasana. carrjed. tribute to don culbertson. "the board commended don s. culbertson for long, energetic and useful service to isad." exhibit 1 january 23, 1973 !sad/led education committee report the isad/led education committee met sunday, january 28, at 9:30 a.m. in the garden restaurant of the shoreham hotel. present were members james liesener, robert kemper, gerald jahoda, edward heiliger, and (ex officio ) ralph shoffner. absent were ann painter and duane johnson. discussion focused on disc (developmental information science curriculum), what has been achieved by the disc contingent working under the aegis of asis, and how isad/ led could contribute to achieving the disc objective of producing transferable "modules" or packaged programs for information science teaching. it was decided that to reach this objective what would be required were: ( 1) an overall structure or frame of reference which could be used to coordinate modules developed by interested and dedicated individuals. (2) specifications for module construction. re 2lt was decided to await the completion of modules currently b eing developed by charles davis and david buttz and to examine these (at las vegas) as providing guidelines for module specifications. 
highughts of minutes 61 re l-it was suggested by ralph shoffner that a frame of reference might be achieved, with some dispatch, by drawing up a list of about 20 questions in the area of information science, which library schools might expect their graduating students to answer, each question being answerable in no more than an hour. the idea was that modules might be designed around these questions. also, it was seen that these questions might serve a useful purpose in organizing information science teaching in light of professional program evaluation and accreditation. the suggestion of "questions" was enthusiastically received and the following day gerald jahoda, edward heiliger and charles davis drew up a "sample" list of questions and outlined the following procedure: ( 1) the sample list of questions is sent to isad /led education committee members as well as to asis sig/eis and asis education committee members for recommendations in the way of additions, deletions and word revisions. by february 15, 1973. (2} the questions are revised and edited by an ad hoc committee consisting of interested members of the three committees involved. by march 30, 1973. ( 3) the revised list of questions is sent to accredited library schools in the u.s. and canada for additions, deletions and word revisions. by april 15. ( 4) !sad/led education committee members together with invited members from the asis committees involved revise the question list at las vegas. (5) designating potential module constructors for each of the questions on the final question list. formulation of module specifications at las vegas. immediately after las vegas the designated module constructors will be solicited. they will be sent a "question" together with module specification. this is where we are january 29, 1973. exhibit 2 respectfully submitted, elaine svenonius telecommunications committee annual report 1972/ 73 1. communications: a. cable newsletter: after exhausting every possible avenue within ala (amlibs, lola technicaj. commtmications, headquarters clearinghouse, information packet) the chairman received the mandate considered necessary to go ahead with plans for an effective communications medium. the mandate came in the form of a unanimous resolution from the 104 attendees at the cable institute, held in september, to produce such a newsletter. !sad board approval/endorsement was obtained, and lbe chairman approached asis which will publish the newsletter. start-up money was obtained from the markle foundation for the first promotional issue, which will receive widest distribution. based on response to the initial mailing, the newsletter will continue on a subscription basis, provided 750 subscriptions are obtained. the chairman and two other people will volunteer their time as coeditors. b. the chairman has been operating a clearinghouse on cable information out of her office, which has become incredibly time-consuming. it is impossible for one person to do all that is needed; innumerable letters have been written and phone conversations held with people and groups wanting advice on dozens of issues connected with cable. it is hoped that the newsletter, the proceedings of the cable institute, and a soon-to-be-established task force within srrt on cable will lessen the almost impossible load. c. specific letters were written in response to requests from the rocky mountain 62 ]oumal of library automation vol. 
6/1 march 1973 federation (justification of library use of the ats-f satellite), senator mike gravel (introduced several bills on telecommunications, wanted to know what libraries could do with this medium), and a presentation will be made to the national commission hearings in new york. d. a librarian-representative was located, suggested, and subsequently appointed to the fcc federal-state-local advisory committee on cable. (a first for librarians!) e. liaison was maintained with nonlibrary groups: publicable, of which the chairman is a member, the mitre symposium on cable, to which the chairman was invited, and, as a result of that meeting, the aspen workshop on cable in palo alto, which the chairman attended by invitation from douglass cater, together with eight other people, to decide on the direction this activity should take. at all three meetings the chairman attempted to represent the library viewpoint on cable. f. a las vegas program was to be planned, together \vith acrl and the av committee. plans did not materialize, and the committee is being approached by the soon-to-be-established srrt task force on cable to cosponsor a program on cable at las vegas. 2. training: 1. institute on cable television for librarians: held september 1720, 1972, and attended by over 100 librarians from thirty-four states, representing public and state libraries primarily, this was directed by the chairman, and funded by usoe. russell shank and frank norwood, consultants to the tc committee presented major talks. the entire institute was videotaped and the tapes are available. proceedings will be issued in march as a double issue of the drexel library quarterly. the institute was designed to provide a format and material (including videotaped presentations) to allow others to do their own institutes. 2. telecommunications seminar: conducted by russell shank, consultant to the tc committee, it presented an overview over various aspects of telecommunications. held in washington september 25-26, 1972, it, too, was attended by almost 100 ji. brarians from all types of libraries. the chairman and frank norwood, consultant, participated in the presentation of papers. 3. legislative matters: the committee expressed its concern to the ala legislation committee about the lack of sufficient personnel to keep abreast of legislative and regulatory matters affecting telecommunications. the chairman of the legi~lation committee responded by stating that the ala washington office had been trying to do their best, in the absence of funding for additional personnel, and would continue to do so. the committee attempts to follow legislative and regulatory developments in the telecommunications area, and works closely with the washington office in this activity, providing persons to testify, and supplying two of the four members of the subcommittee on copyright (shank and kenney) . the committee participated actively in the revision of the ala policy booklet, concerning itself with matters pertaining to networks and telecommunications. all recommendations were incorporated in the final draft of this document. 4. research: the telecommunications requirements study, long ago proposed, is dormant. shank and kenney are actively working on putting together a proposal to respond to a call for proposals from nsf in the area of telecommunications policy research. the committee will discuss the proposal during the midwinter meeting, 1973. 
respectfully submitted, brigitte kenney the library of congress view on its relation to the ala marc advisory committee henriette d. avram: marc development office, libraq of congress. 119 this paper is a statement of the library of congress' 1'ecommendation that a marc advisory committee be appointed within the present structure of the rtsd jisad jrasd committee on representation in machine-readable form of bibliographic information (marbi) and describes the library's proposed relation to such a committee. the proposals and recommendations suggested were adopted by the marbi committee dming its deliberations at ala midwinter, janua1'y 1974, and a1'e now in effect. introduction during ala midwinter, january 1973, the library of congress (lc) suggested to the rtsd/isad/rasd committee on representation in machine-readable form of bibliographic information that a marc advisory committee be formed to work with the marc development office regarding changes made to the various marc formats. the primary interest of the committee would be the serial and monograph formats, though the committee should have interest in and responsibility for reviewing changes in any of the marc formats to insure that the integrity and compatibility of marc content designators are preserved. the marbi committee decided that it would be the marc advisory committee and asked that a paper be prepared proposing how such a committee would operate in relationship to the marc development office. prior to a discussion of marc changes, it appears appropriate to make certain basic statements regarding marc changes and the difficulties experienced by the marc development office in evaluating the significance of a change for the marc subscriber. it would be naive to assume, in a dynamic situation, that even in the best of all worlds a marc subscriber would never have to do any reprogramming. changes in procedures, changes in cataloging, experience in providing the knowledge for more efficient ways to process information, additional requirements from users, etc., have always been factors creating the 120 ] ournal of library automation vol. 7/2 june 197 4 need to both modify andjor expand an automated system. programming installations always require personnel to maintain ongoing systems. situations creating changes locally must exist and, likewise, they also exist at lc. staff of the marc development office give serious consideration to every proposed marc change and its impact on the marc subscribers. however, it must be realized that it is not possible to evaluate fully the impact of each change because the significance of a change is directly dependent on the use made of the elements of the record and the programming techniques used by each subscriber. marc staff cannot possibly know the details of use and programming techniques and capabilities at every user installation. each marc subscriber evaluates a change in light of his operational requirements. since the uses made of the data are varied among users, there is rarely a consensus as to the pros and cons of a change. marc staff are aware of the expenses imposed by changes to software and have made an attempt to solicit preferences in some cases for one technique over another from marc subscribers when changes were required. in the case of the isbd implementation, ten replies were received from questions submitted to the then sixty-two marc users. 
the remainder of this paper describes what is included in the term "change," the various stimuli that initiate changes, and recommendations of how lc and the marc advisory committee should interact in regard to changes. the appendix summarizes in chart form the addenda to books: a marc fo1·mat since the initiation of the marc service. an examination of the chart will reveal that the number and the types of changes have not been too significant. marc changes the term "change" is used throughout this paper in the broad sense, i.e., the term includes additions, modifications, and deletions of content data (in both fixed and variable fields) and content designators (tags, indicators, and subfield codes) made to the format as well as additions, modifications, and deletions made to the tape labels. the concern is with changes made to all records where applicable or groups of records but not with the correction or updating of individual records as part of the marc distribution service. changes as described above fall into several broad types: 1. addition of new fields, indicators, or subfield codes to the format. 2. implementation of aheady defined but unused tags, indicators, subfield codes, or fixed fields. 3. modification of content data of fields (fixed and variable). 4. changes in style of content in records, e.g., punctuation. 5. cessation in use of existing fields, indicators, and subfield codes. library of congress view/ avram 121 the following paragraphs are divided into two sections. section "a" describes the stimulus for a change and the rationale for making it. section "b" describes the lc position regarding the change and, where applicable, a recommendation to the marc advisory committee. changes made to marc records may be divided into the following categories: category 1: changes resulting from a change in cataloging rules or systems. a. cataloging rules or systems fall into two distinct types: those made in consultation with ala (resources & technical services division/cataloging & classification section/descriptive cataloging committee), and those made by the subject cataloging division to the subject cataloging system without consultation with ala. lc follows aacr. since the marc record is the record used for lc bibliographical control as well as the source record for the lc printed card and lc book catalogs (for those items presently within the scope of marc), cataloging changes (descriptive and subject) are necessarily reflected in marc. if the cataloging change is such that the retrospective records can reasonably be modified by automated techniques, these records are modified to reflect the change. prior to marc, this updating could not be provided to subscribers to lc bibliographic products and is one of the advantages of a machine-readable service. it has the effect of maintaining a consistent data base for all marc users. b. changes made in cataloging rules or systems will be made by the appropriate agencies. once changes in cataloging rules have been made by the ala (rtsdjccsjdcc) committee, lc will consult with the marc advisory committee with respect to their implementation in those cases affecting the marc format.'~* wherever possible, depending upon resources available, the number of records affected, and the type of change, the retrospective flies will be updated and made available in one of two ways: if the number of records is small (to be decided by lc), the records will be distributed as corrections through the normal channels of the marc distribution service. 
if the number of records is large, the records will be sold by the lc card division. category 2: changes made to satisfy a requirement of the library of congress. a. since lc uses the marc records for its own purposes, situations do arise in which lc has a requirement for a change. in most cases, lc feels that the change would also be beneficial to the users. under these circumstances lc has carefully evaluated the im""format change is used in this context to mean a change affecting the tags, indicators, subfield codes, addition or deletion of fixed fields, or change to the leader. 122 i oumal of libmry automation vol. 7/2 june 197 4 plication of the change to the marc subscribers and, in some cases, solicited their preferences and advice. b. if lc has a requirement to make a change to marc, the proposed change and the reason for the change will be referred to the marc advisory committee. the marc advisory committee will solicit opinions from marc users as to whether or not to include the change in the marc distribution service, and lc will abide by the committee's recommendation. if this decision is not to include the change, lc will implement the change only in its own data base.t category 3: changes made to satisfy subscribers' requests. a. subscribers sometimes request that a change be made to a marc record. where possible, within the limitation of lc resources, these requests are complied with. lc, when considering such a request, has sought the opinion of the marc subscribers, and if sufficient numbers of users were interested in the change, the change was implemented. b. changes requested by subscribers will be evaluated by lc, and if considered possible to implement, the proposed change will be submitted by lc to the marc advisory committee to solicit opinions from marc users. if the committee recommends, lc will implement the change. catego1·y 4: changes made to support international standardization. a. lc plays a significant role in international activities in the area of machine-readable cataloging records. much of the future expansion of marc depends upon standards in formats, data content, and cataloging. in all these activities, lc firmly supports aacr and current marc formats. occasionally, in order to arrive at complete agreement with agencies in other countries, it becomes necessary for all to compromise. however, in all cases lc does not agree to changes in cataloging rules until the recommendation has been approved by the appropriate ala committee. b. changes resulting from international meetings will fall principally into two areas: 1. cataloging-if the change required is the result of a change in cataloging rules and the ala (rtsdjccsjdcc) has approved the aacr modifications, the marc change falls into category 1. 2. all other changes affecting the format-since lc is the agency in the u.s. that will exchange machine-readable bibliographic records with other national agencies, lc will consider these t an exception to this statement will be those changes to lc practice which must be reflected on cards and in the marc record and which cannot exist in optional form. an example of the above would be abolition of the check digit in the lc card number. libmry of congress viewj avram 123 changes an internal lc requirement; therefore, they can be considered under the proposal described in category 2. lc will submit the proposed changes to the marc advisory committee. category 5: changes made to expand the marc program to include additional services. a. 
if the marc service were static, changes to expand the service would not be possible. an example of an additional service is the cataloging in publication data available on marc tapes. since these cataloging data are available four to six months prior to the publication of the item, it was determined to be of value to marc subscribers and'changes were made to the marc record to make these data available in machine-readable form. b. if a new service is under consideration at lc that will cause a change to marc records, e.g., cataloging in publication, lc will submit the proposal to the marc advisory committee for their action as described in category 2. other lc recommendations for the marc advisory committee 1. time fmme fo1' changes. in order to prevent consultation on changes from taking an inordinate length of time, lc proposes that the marc advisory committee be given two months to solicit comments from marc users, to arrive at a consensus, and to respond to proposed changes. if there is no response during that time, lc will implement the proposed change. lc will notify the marc subscribers two months prior to including the change in the marc distribution service. 2. consultation with the marc advisory committee. the marc development office will submit the recommendation for change and any other information required to evaluate the change to the marc advisory committee. the marc advisory committee will be responsible for submitting the proposal to the marc users and notifying the marc development office of the committee's recommendation. 3. test tapes. the marc advisory committee, on consultation with the marc development office, will consider the requirement for a test tape to reflect the change made to the marc record (the requirement for a test tape is dependent on the type of change made). appendix a addenda to books: a marc format stimul~ for change date change 1. cataloging rules and cataloging system changes 1972 u.s./gt. brit. changed to united states and great britain. comments change made to facilitate machine filing. 124 journal of library automation vol. 7/2 june 1974 appendix a-continued stimulus for change date change 1972 isbd. 1973 isbd-additional information. comments cataloging change based on an international agreement. 2. subscribers requests 1972 government publication code 3. initiated at lc: a. addition or deletion of fields added to fixed field. 1969 abolishment of 653-political jurisdiction (subject) and 750-proper name not capable of authorship.' these little-used fields proved difficult to define and of little value. 1970 addition of encoding level to implemented for use for leader. recon records. 1970 addition of geographic area code field, tag 043. 1971 addition of superintendent of documents field, tag 086. this field has been widely used by lc and subscriber libraries. information added to lc catalog cards (and thus to marc records) at the request of outside libraries. b. additions of indicators 1971 addition of filing indicators. or subfields information needed to allow lc to ignore initial articles in arranging its computerproduced book catalog. c. addition or change of codes or data to existing fields 1972 addition of "q" subfield to fields for conferences entered under place. 1969 code added to modified record indicator in fixed field to indicate shortened records. 1969 code for phonodiscs added to illustration fixed field. 
1970: code added to modified record indicator in fixed field to indicate that the dashed-on entry on the original lc card was not carried in the marc record.
1971: "questionable condition" codes deleted from country of publication code.
1971: geographic area code: guidelines for implementation modified slightly and 23 new codes added.
1971: microfilm call numbers carried in lc call number field. (description of what such call numbers looked like.)
1971: abolished lc card number check digit. (numbers available using check digit too limited.)
d. explanations or corrections
1970: use of "b" subfield with topical subjects (field 650) and geographic subjects (field 651). (subfield and its use inadvertently omitted from books: a marc format; it occurs rarely in marc records.)
1971: use of "revision date" as suffix to lc card number. (explanation of what this information means at lc and how subscribers use it.)
1971: indicators used with romanized title. (explanation of use of indicators with this field omitted from books: a marc format.)
e. changes to labels
1972: change to label to reflect new computer system at lc.

4. national and international agreement
1970: standard book number (9 digits) changed to international standard book number (10 digits) to conform to an international standard.
1971: entry map added to leader to conform to national standard. (adoption of the ansi z39 format for exchange of bibliographic information interchange.)
1971: change to label to conform to ansi standard.

5. new services at lc
1969: changes to label and status codes for cumulated tapes. (to provide for cumulative quarterly and semiannual tapes.)
1971: cip records: addition of codes to encoding level and record status.

technical communications

announcements

panel discussion on "government publications in machine-readable form"
this meeting will be held on july 10 from 8:30 to 10:30 p.m. as a part of the american library association's 1974 new york conference. the meeting is cosponsored by the government documents round table's (godort) machine-readable data file committee, the federal librarians round table (flirt), the rasd information retrieval committee, and the rasd/rtsd/asla public documents committee. the moderator is gretchen dewitt of columbus public library and the panelists are peter watson of ucla, mary pensyl of mit, judith rowe of princeton, and billie salter of yale. mr. watson will discuss the general issues concerning the acquisition and use of bibliographic data files and provide a brief description of some of the files now publicly available; miss pensyl will describe the workings of the project now underway to make these files available to mit users. mrs. rowe will discuss the ways in which government-produced statistical files supplement the related printed reports and will indicate some of the types and sources of files now being released; miss salter will discuss a program for integrating these and other research files into yale's social science reference service. representatives of several federal agencies will display materials describing and documenting both bibliographic and statistical data files.
the purpose of the program is to acquaint reference librarians, particularly those now handling printed documents, with the uses of both types of files, the advantages and disadvantages of these reference tools, and the techniques and policy changes necessary for their use in a library environment. the recent release of the draft proposal produced by the national commission on libraries and information science makes more timely than ever an open discussion of the place of bibliographic and numeric data files in a reference collection. all librarians must be acquainted with these growing resources in order to continue to provide full service to their patrons. for further information, contact judith rowe, computer center, princeton university, 87 prospect ave., princeton, nj 08540.

ninth annual educational media and technology conference to be hosted by university of wisconsin-stout, july 22-24, 1974
aect past president dr. jerry kemp, coordinator of instructional development services for san jose state university (california), and film consultant ralph j. amelio, media coordinator and english instructor at willowbrook high school, villa park, illinois, will headline the university of wisconsin-stout's 9th annual educational media and technology conference to be held in menomonie, wisconsin, on july 22-24, 1974. "educational technology: can we realize its potential?" will be the subject of kemp's presentation on monday evening, while amelio, speaking on tuesday, july 23, will challenge participants with the subject "visual literacy: what can you do?". seven concurrent workshops will be held on monday afternoon: library automation; sound for visuals; making the timesharing computer work for you; new developments in photography; what's new in graphics; selecting and evaluating educational media; and instructional development: how to make it work! individuals leading the three-hour workshops will include: alfred baker, vice-president of science press; john lord, technical service manager for the dukane corporation; william daehling, weber state college, ogden, utah; and several media specialists from learning resources, university of wisconsin-stout. about fifty exhibitors will show and demonstrate both hardware and software during the conference. six case studies will be given of exemplary media programs at the public school, vocational-technical, and college level. further information may be obtained by contacting dr. david p. bernard, dean of learning resources, university of wisconsin-stout, menomonie, wi 54751.

report of recon project published
the library of congress has published in recon pilot project (vii, 49p.) the final report of a project sponsored by lc, the council on library resources, inc., and the u.s. office of education to determine the problems associated with centralized conversion of retrospective catalog records and distribution of these records from a central source. in the marc pilot project, begun in november 1966, the library of congress distributed machine-readable catalog records for english-language monographs, and the success of that project led to the implementation in march 1969 of the marc distribution service, in which over fifty subscribers have by now received more than 300,000 marc records representing the current english-language monograph cataloging at the library of congress.
as coverage is extended to catalog records for foreign-language monographs and for other forms of material, libraries will be able to obtain machine records for a large number of their current titles. more research was needed, however, on the problems of obtaining machine-readable data for retrospective cataloging, and the council on library resources made it possible for lc to engage in november 1968 a task force to study the feasibility of converting retrospective catalog records. the final report of the recon (for retrospective conversion) working task force was published in june 1969. one of the report's recommendations was that a pilot project test various conversion techniques, ideally covering the highest priority materials, english-language monograph records from 1960-68; and with funds from the sponsoring agencies lc initiated a two-year project in august 1969. the present report covers five major areas examined in that period:
1. testing of techniques postulated in the recon report in an operational environment by converting english-language monographs cataloged in 1968 and 1969 but not included in the marc distribution service.
2. development of format recognition, a computer program which can process unedited catalog records and supply all the necessary content designators required for the full marc record.
3. analysis of techniques for the conversion of older english-language materials and titles in foreign languages using the roman alphabet.
4. monitoring the state-of-the-art of input devices that would facilitate conversion of a large data base.
5. a study of microfilming techniques and their associated costs.
recon pilot project is available for $1.50 from the superintendent of documents, u.s. government printing office, washington, dc 20402. stock no. 300000061.

library of congress issues recon working task force report
national aspects of creating and using marc/recon records (v, 48p.) reports on studies conducted at the library of congress by the recon working task force under the chairmanship of henriette d. avram. they were made concurrently with a pilot project by the library to test the feasibility of the plan outlined in the task force's first report entitled conversion of retrospective records to machine-readable form (library of congress, 1969) and in recon pilot project (library of congress, 1972). both the pilot project and the new studies received financial support from the council on library resources, inc., and the u.s. office of education. the present volume describes four investigations: (1) the feasibility of determining a level or subset of the established marc content designators (tags, indicators, and subfield codes) that would still allow a library using it to be part of a future national network; (2) the practicality of the library of congress using other machine-readable data bases to build a national bibliographic store; (3) implications of a national union catalog in machine-readable form; and (4) alternative strategies for undertaking a large-scale conversion project. the appendices include an explanation of the problems of achieving a cooperatively produced bibliographic data base, a description of the characteristics of the present national union catalog, and an analysis of library of congress card orders for one year.
although the findings and recommendations of this report are less optimistic than those of the original recon study, they reaffirm the need for coordinated activity in the conversion of retrospective catalog records and suggest ways in which a large-scale project might be undertaken. the report provides a basis for realistic planning in a critical area of library automation. national aspects of creating and using marc/recon records is available for $2.75 from the superintendent of documents, u.s. government printing office, washington, dc 20402. stock no. 300000062.

isad official activities

tesla information
editor's note: use of the following guidelines and forms is described in the article by john kountz in this issue of jola. the tesla reactor ballot will also appear in subsequent issues of technical communications for reader use, and the tesla standards scoreboard will be presented as cumulated results warrant its publication. to use, photocopy or otherwise duplicate the forms presented in jola-tc, fill out these copies, and mail them to the tesla chairman, mr. john c. kountz, associate for library automation, office of the chancellor, the california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036.
(the tesla reactor ballot form requests: reactor information, consisting of name, title, organization, address, city, state, zip, and telephone; the identification number of the standard requirement; a position for or against; and the reason for the position, with additional pages used if required. the tesla standards scoreboard lists, for each proposal, the title/i.d. number, the representative, and dates for receipt, screen, division, rejection/acceptance, publish, tally, and target.)

initiative standard proposal outline
the following outline and forms are designed to facilitate review by both the isad committee on technical standards for library automation (tesla) and the membership of initiative standards requirements and to expedite the handling of the initiative standard proposal through the procedure. since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable). note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced, on 8½" x 11" white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number.
i. title of initiative standard proposal (title).
ii. initiator information (forward). a. name b. title c. organization d. address e. city, state, zip f. telephone: area code, number, extension
iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process.
iv. purpose.
state the purpose of standard proposal (scope and qualifications).
v. description. briefly describe the standard proposal (specification of the standard).
vi. relationship of other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks).
vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification).
viii. specifications. specify the standard proposal using record layouts, mechanical drawings, and such related documentation aids as required in addition to text exposition where applicable (specification of the standard).

research and development

system development corporation awarded national science foundation grant to study interactive searching of large literature data bases
santa monica, california: the national science foundation has awarded system development corporation $98,500 for a study of man-machine system communication in on-line retrieval systems. the study will focus on interactive searching of very large literature data bases, which has become a major area of interest and activity in the field of information science. at least seven major systems of national or international scope are in operation within the federal government and private industry, and more systems are on the drawing boards or in experimental operation. the principal investigator for the project will be dr. carlos cuadra, manager of sdc's education and library systems department. the project manager, who will be responsible for the day-to-day operation of the fifteen-month effort, is judy wanger, an information systems analyst and project leader with extensive experience in the establishment and use of interactive bibliographic retrieval services. ms. wanger is currently responsible for user training and customer support on sdc's on-line information service. the study will use questionnaire and interview techniques to collect data related to: (1) the impact of on-line retrieval usage on the terminal user; (2) the impact of on-line service on the sponsoring institution; and (3) the impact of on-line service on the information-utilization habits of the information consumer. attention will also be given to reliability problems in the transmission chain from the user to the computer and back. the major elements in this chain include: the user; the terminal; the telephone instrument; local telephone lines and switchboards; long-haul communications; the communications-computer interface hardware; the computer itself; and various programs in the computer, including the retrieval program.

reports on regional projects and activities

california state university and colleges system union list system
the library systems project of the california state university and colleges has recently completed a production union list system. this system, comprised of eight processing programs to be run in a very modest environment (currently a cdc 3300), is written in ansi cobol and is fully documented. included in the documentation package are user worksheets for bibliographic and holding data, copies of all reports, file layouts, program descriptions, etc. output from this system are files designed to drive graphic quality photocomposition or com devices. the system is available for the price of duplicating the documentation package.
and, for those so desiring, the master file containing some 25,000 titles and titles with references is also available for the cost of duplication. interested parties (bona fides only, please) should contact john c. kountz, associate for library automation, california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036, for further details.

solinet membership meeting
the annual membership meeting of the southeastern library network (solinet) was held at the georgia institute of technology in atlanta, march 14. it was announced that charles h. stevens, executive director of the national commission on libraries and information science, has been named director of solinet effective july 1. john h. gribbin, chairman of the board, will serve as interim director. it was also announced that solinet will be affiliated with the southern regional education board. sreb will provide office space, act as financial disbursing agent, and will be available at all times in an advisory capacity. negotiations are underway for a tie-in to the ohio college library center (oclc) and a proposed contract is in the hands of the oclc legal counsel. it is anticipated that a contract soon will be signed. additional to the tie-in, solinet will proceed with the development of its own permanent computer center in atlanta. this center will eventually provide a variety of services and will be coordinated carefully with other developing networks, looking toward a national library network system. elected to fill three vacancies on the board of directors were james f. govan (university of north carolina), gustave a. harrar (university of florida), and robert h. simmons (west georgia college). they will assume office on july 1. anyone desiring information about solinet should write to 130 sixth st., nw, atlanta, ga 30313.

reports: library projects and activities

new book catalog for junior college district of st. louis
the three community college libraries of the junior college district of st. louis have been using computerized union book catalogs since 1964. formerly maintained and produced by an outside contractor, the catalogs are now one product of a new catalog system recently designed and implemented by instructional resources and data processing staff of the district. known as "ir catalog," the system presently has a data base of approximately 65,000 records describing the print and nonprint collections of the district's three college instructional resource centers. in addition to photocomposed author, subject, and title indexes, the system also produces weekly cumulative printouts which supplement the phototypeset "base" catalog. other output includes three-by-five-inch shelflist cards (which include union holdings information), a motion picture film catalog, subject and cross reference authority lists, and various statistical reports.

hawaii state library system to automate processing
the state board of education in hawaii has approved a proposal for a computerized data processing system for the hawaii state library. the decision allows for the purchase of computer equipment for automating library operations. the state library centrally processes library materials for all public and school libraries in the state. teichiro hirata, acting state education superintendent, told board members a computerized system will speed book selection, ordering, and processing, and will improve interlibrary loan and reference services.
he also pointed out it would facilitate a general streamlining of all technical administrative operations. the system's total cost will be $187,000, of which $58,000 will be spent for computer software. the "biblios" system, designed and developed at orange county public library in california and marketed by information design, inc., was selected as the software package.

the caltech science library catalog supplement
the use of catalog supplements during the necessary maturation period required to take full advantage of the national program for acquisitions and cataloging is obviously an idea whose time has come. the program developed at the california institute of technology, however, differs in several important respects from that previously described by nixon and bell at u.c.l.a.1 for reasons based primarily on faculty pressure, holding books in anticipation of the cataloging copy has never been the practice at the institute. the solution, while hardly unique, is to assign the classification number (dewey) and depend on a temporary main entry card to suffice until the lc copy is available. while this procedure has the distinct advantage of not requiring the presence of the book to complete the cataloging process, it does, however, prevent the user from finding the newest books through a search of the subject added entry cards. the use of computer-based systems is an obvious solution to this aspect of the program but raises several additional problems which formerly seemed to defy solution. as has been pointed out by mason, library-based computer systems can rarely be justified in terms of cost effectiveness, and computer-based library catalogs are no exception.2 part of this problem arises from the natural inclination to repeat in machine language what has been standard practice in the library catalog. this reaction overlooks the very different nature of catalogs and catalog supplements. as catalogs serve as the basis for the permanent record and their cost can be prorated over several decades, the need for a careful description of the many facets of a book is quite properly justified. in the case of catalog supplements, however, where the record will serve quite likely for only a few months, any attempt at detailed description of the book cannot be justified. one solution to this dilemma that has been developed here at caltech is a brief listing supplement which allows searching for a given book by either the first author or editor's last name, a key word from the title, or the first word of a series entry. these elements form the basis of a simple kwoc index (see figure 1), which supplements the bibliographic listing (shown in figure 2).
fig. 1. sample entries from the kwoc index.
fig. 2. sample entries from the bibliographic listing.
fig. 3. sample entries from the weekly list of newly added books.
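the indexing logic just described is simple enough to sketch in a few lines. the following is a minimal, modern illustration (written in python rather than the batch programs actually used at caltech) of building a kwoc-style index from brief title records; the record layout and stop-word list are assumptions made for the example, and the sample data are taken from the figures above.

# minimal kwoc-style index sketch (hypothetical records, not the caltech programs).
# each record mimics the "title card" / "author card" split described above:
# full bibliographic data plus derived index terms.

STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}  # assumed stop list

records = [
    {"title": "chemisorption and catalysis", "author": "hepple",
     "call_no": "541.395 he 1970", "library": "ch", "series": None},
    {"title": "techniques in partial differential equations", "author": "chester",
     "call_no": "517.6 ch 1971", "library": "ch", "series": None},
    {"title": "protein turnover", "author": None,
     "call_no": "612.39 pr 1972", "library": "bi",
     "series": "ciba foundation symposium, 9"},
]

def index_terms(rec):
    """derive kwoc terms: author surname, title keywords, first word of the series."""
    terms = set()
    if rec["author"]:
        terms.add(rec["author"].lower())
    for word in rec["title"].lower().split():
        if word not in STOPWORDS:
            terms.add(word)
    if rec["series"]:
        terms.add(rec["series"].lower().split()[0])
    return terms

def build_kwoc(recs):
    """map each index term to the brief records that should file under it."""
    kwoc = {}
    for rec in recs:
        for term in index_terms(rec):
            kwoc.setdefault(term, []).append(rec)
    return kwoc

if __name__ == "__main__":
    kwoc = build_kwoc(records)
    for term in sorted(kwoc):
        for rec in kwoc[term]:
            print(f'{term:15} {rec["title"]}  {rec["call_no"]} {rec["library"]}')

each significant title word, author surname, and first series word becomes a heading under which the brief entry files, which is all a short-lived supplement needs.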
all books received in the chemistry, physics, and biology libraries are represented in the catalog supplement. weekly lists of newly added books (shown in figure 3) are annotated to show the index terms prior to keypunching. the unit record consists of a "title" card or cards (which contain the full title, author/editor, call number, library designation, and series information) and an "author" card (which contains the index terms). edited material is added accessionally to the card file data base and batch processed on the campus ibm 370/155 computer. the catalog supplement is currently published on 8½-by-11-inch sheets as a result of reducing the computer printout on a xerox 7000 copier. lists are given a vello-bind and delivered to the respective libraries. weeding the catalog supplement is still unresolved. at the present time additions are less than 1,000 per year, so that it may be possible after five years to replace the subject sections of the respective divisional catalogs with the catalog supplement. the "library" at caltech consists of several divisional libraries, each with their own card catalog. these divisional card catalogs are supplemented by a union catalog, which serves all libraries on campus and, because of the strong interdisciplinary nature of the divisional libraries, is much the better source for subject searches. the project is so facile and the costs so minimal that this approach might be of value to many small libraries. it is particularly applicable to the problems recently discussed by patterson.3 books in series, even if they are distinct monographs, are often lost to the user from a subject approach. with this system each physical volume added to the library can be analyzed for possible inclusion in the catalog supplement.
1. roberta nixon and ray bell, "the u.c.l.a. library catalog supplement," library resources & technical services 17:59 (winter 1973).
2. ellsworth mason, "along the academic way," library journal 96:1671 (1971).
3. kelly patterson, "library think vs library user," rq 12:364 (summer 1973).
dana l. roth
millikan library
california institute of technology

commercial activities

richard abel & company to sponsor workshops in library automation and management
one of the most effective forms of continuing education is state-of-the-art reporting. recognizing the need for more such communication, the international library service firm of richard abel & company plans to sponsor two workshops for the library and information science community. the first workshop will deal with the latest techniques in library automation. it will precede the 1974 american library association conference in new york city, july 7-13. the second will present advances in library management, and will be scheduled to precede the 1975 ala midwinter meeting, january 19-25. the workshops will include forums, lectures, and open discussions. they will be presented by recognized leaders in the fields of library automation, management, and consulting. each workshop will probably be one or two days long. there will be no charge to attend either of the workshops, but attendance will be limited, to provide a good discussion atmosphere. for the management workshop, attendance will be limited to librarians active in library management.
similarly, the automation workshop is intended for librarians working in library automation. maintaining the theme of state-of-the-art reporting, the basic content of the workshops will consist of what is happening in library management and automation today. looking to the future, there will also be discussions and forecasts of what is to come. persons interested in further information or in participating in either workshop should contact abel workshop director, richard abel & company, inc., p.o. box 4245, portland, or 97208.

idc introduces bibnet on-line services
the introduction of bibnet on-line systems, a centralized computer-based bibliographic data service for libraries, has been announced by information dynamics corporation. demonstrations are planned for the ala annual conference in new york, july 7-13. according to david p. waite, idc president, "during 1973, bibnet service modules were interconnected over thousands of miles and tested for on-line use with idc's centralized computer-based cataloging data files. this is the culmination of a program that began two years ago. it is patterned after advanced technological developments similar to those recently applied to airline reservation systems and other large scale nationwide computing networks used in industry." idc, a new england-based library systems supplier, will provide a computer-stored cataloging data base of more than 1.2 million library of congress and contributed entries. initially it will consist of all library of congress marc records (now numbering over 430,000 titles), plus another 800,000 partial lc catalog records containing full titles, main entries, lc card numbers, and other selected data elements. as a result, bibnet will provide on-line bibliographic searching for all 1,250,000 catalog records produced by the library of congress since 1969. to enable users to produce library cards from those non-marc records for which only partial entries are kept in the computer, idc will mail card sets from its headquarters and add the full records to the data base for future reference. subscribing libraries will have access to the data base using a minicomputer cathode ray tube (crt) terminal. using this technique of dispersed computing, each bibnet terminal has programmable computer power built in. this in-house processing power, independent of the central computer, allows computer processes like library card production to be performed in the library. this also eliminates waiting for catalog cards to arrive in the mail. bibnet terminals communicate with the central computer over regular telephone lines, eliminating the high costs of dedicated communication lines. therefore, thousands of libraries throughout the united states and canada can avail themselves of on-line services at low cost. bibnet users will have several methods of extracting information from the idc data base. the computer can search for individual records by title, main entry, isbn, or keywords. here's how it works: the operator types in any one of the search items or, if a complete title is not known, a keyword from the title may be used. the cataloging information is then displayed on the crt, where the operator may verify the record. at the push of a button, the data is stored on a magnetic cassette tape which is later used for editing and production of catalog cards by the user library.
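the lookup-and-save workflow described above can be sketched as follows. this is a hypothetical illustration in python, not idc's actual software; the record fields and the tiny in-memory data set are invented for the example, and a plain list stands in for the cassette tape.

# hypothetical sketch of the kind of lookup just described: find a catalog record
# by title keyword, main entry, or isbn, then set it aside for card production.

records = [
    {"lccn": "73-12345", "isbn": "0000000000",
     "main_entry": "example, author a.", "title": "an example title for illustration"},
    {"lccn": "74-67890", "isbn": "1111111111",
     "main_entry": "sample, writer b.", "title": "another illustrative sample record"},
]

def search(query, field):
    """return records whose given field contains the query (case-insensitive)."""
    q = query.lower()
    return [r for r in records if q in (r.get(field) or "").lower()]

def search_any(query):
    """mimic searching by title keyword, main entry, or isbn in a single call."""
    hits = []
    for field in ("title", "main_entry", "isbn"):
        for rec in search(query, field):
            if rec not in hits:
                hits.append(rec)
    return hits

if __name__ == "__main__":
    saved_for_cards = []                      # stands in for the cassette tape
    for rec in search_any("illustration"):    # operator types a title keyword
        print(rec["lccn"], rec["title"])      # operator verifies the displayed record
        saved_for_cards.append(rec)           # "push of a button": save for card production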
the bibnet demonstration in new york will highlight one of many bibliographic service modules available from idc and stress the fact that these services can be utilized by individual libraries and organized groups of libraries.

license for new information retrieval concept awarded to boeing by xynetics
an exclusive license for manufacture and marketing to the government sector of systems incorporating a completely new concept in information storage and retrieval has been awarded to the boeing company, seattle, washington, by xynetics, inc., canoga park, california, it was announced jointly by dr. r. v. hanks, boeing program manager, and burton cohn, xynetics board chairman. the system is said to be the first image storage and retrieval system which offers response times and costs comparable to those of digital systems. the heart of the system is a device of proprietary design, the flat plane memory, which provides rapid access to massive amounts of data stored in high resolution photographic media. the photographic medium enables low cost storage of virtually any type of source material (documents, correspondence, drawings, multitone images, computer output, etc.) while eliminating the need for time-consuming, costly conversion of pre-existing information into a specialized (e.g., digital) format. by virtue of its extremely rapid random access capability, the data needs of as many as several thousand users can be served at remote video terminals from a single memory with near real time response (1-3 seconds, typically). the high speed, high accuracy, and high reliability of the flat plane memory is accomplished primarily through the use of the patented xynetics positioner, which generates direct linear motion at high speeds and with great precision and reliability instead of converting rotary motion. as a result, the positioners eliminate the gears, lead screws, and other mechanical devices previously utilized, and thus achieve the requisite speed, accuracy, and reliability. the xynetics positioners are already being used in automated drafting systems produced by the firm, and in a wide variety of other applications, including the apparel industry and integrated circuit test systems. the new approach could eliminate many of the problems associated with multiple reproductions and distribution of large data files. in addition to many government applications, the system is expected to have major applications in the commercial marketplace.

appointments

charles h. stevens appointed solinet director
charles h. stevens, executive director, national commission on libraries and information science, has been appointed director of the southeastern library network (solinet), effective july 1. the announcement was made at a meeting of solinet in atlanta, march 14, by john h. gribbin, board chairman. composed of ninety-nine institutional members, solinet is headquartered in atlanta. a librarian of acknowledged national stature and an expert on the technical aspects of information retrieval systems, mr. stevens brings to solinet a valuable combination of experience and abilities. concerned with national problems of libraries and information services, he will develop a regional network and move toward a cohesive national program to meet the evolving needs of u.s. libraries. a forerunner in library automation, mr. stevens served for six years as associate director for library development, project intrex, at massachusetts institute of technology.
from 1959-1965 he was director of library and publications at mit's lincoln laboratory, lexington, massachusetts. at purdue university, he was aeronautical engineering librarian and later director of documentation of the thermophysical properties research center. mr. stevens is a member of the council of the american library association, the american society for information science, the special libraries association, and other professional organizations. he is the author of approximately forty papers in the field, lectures widely, and consults on library activities for a number of universities. mr. stevens holds a b.a. in english from principia college, elsah, illinois, and master's degrees in english and in library science from the university of north carolina. mr. stevens has done further study in engineering at brooklyn polytechnic institute. mr. stevens is married and has three sons.

input to the editor:
international scuttlebutt informs us that those in the bibliothecal stratosphere are attempting to formulate a communications format for bibliographical records acceptable on a worldwide basis. we on the local scene unite in wishing them "huzzah!" and "godspeed!" nomenclature must be provided, of course, to designate particular applications; and the following suggestions are offered as possible subspecies of the genus supermarc:
deutschmarc: for records distributed from bonn and/or wiesbaden
rheemarc: for south korean records, named in honor of the late president of that country
bismarc: for records of stage productions which have been produced by popular demand from the top balcony; especially pertinent for wagnerian operas
benchmarc: for records of generally unsuccessful football plays
minskmarc: for byelorussian records
sachermarc: for austrian records, usually representing extremely tasteful concoctions
trademarc: for records pertaining to manufactured products, especially patent medicines
goldmarc: for records representing hungarian musical compositions (v. karl goldmark, 1830-1915)
ectomarc, endomarc, mesomarc (from the italian, mezzomarc): for skinny, fat, and medium-sized records, respectively
landmarc: for records of historic edifices; sometimes (erroneously) applied to records for local geographical regions
feuermarc: for records representing charred or burned documents
montmarc: 1. for records representing works by or about parisian artists; 2. for records representing publications of the french academy
watermarc: for records representing documents contained in bottles washed up on the beach.
joseph a. rosenthal
university of california, berkeley

tv white spaces in public libraries: a primer
kristen radsliff rebmann, emmanuel edward te, and donald means
information technology and libraries | march 2017
kristen radsliff rebmann (kristen.rebmann@sjsu.edu) is associate professor, san jose state university school of information, san jose, ca. emmanuel edward te (emmanueledward.te@sjsu.edu) is a graduate student, san jose state university school of information, san jose, ca. donald means (don@digitalvillage.com) is co-founder and principal of digital village associates, sausalito, ca.

abstract
tv white space (tvws) represents one new wireless communication technology that has the potential to improve internet access and inclusion. this primer describes tvws technology as a viable, long-term access solution for the benefit of public libraries and their communities, especially for underserved populations. discussion focuses first on providing a brief overview of the digital divide and the emerging role of public libraries as internet access providers. next, a basic description of tvws and its features is provided, focusing on key aspects of the technology relevant to libraries as community anchor institutions.
several tvws implementations are described, with discussion of how tvws was set up in several public libraries. finally, consideration is given to first steps that library organizations must take when contemplating new tvws implementations supportive of wi-fi applications and crisis response planning.

introduction
tens of millions of people rely wholly or in part on libraries to provide access to the internet. many lack access to the federal communications commission (fcc) recommended standard of 25 mbps (megabits per second) download speed and 3 mbps upload speed.1 though the fcc reclassified high-speed internet as a public utility under title ii of the telecommunications act to ensure that broadband networks are "fast, fair, and open" in 2015,2 the "digital divide" still remains. one in four community members does not have access to the internet at home. accounting for age and education level, households with the lowest median incomes have service adoption rates of around 50%, compared to those with higher incomes, with rates of 80 to 90%.3 a recent pew research center survey on home broadband adoption found that 43% of those surveyed reported cost being their main reason for non-adoption.4 individuals with low quality or no access are more likely to be digitally disadvantaged, tend to use library computers more frequently, and are less equipped to interact and compete economically as more services and application processes move online.5 this article highlights tv white space (tvws), a new wireless communication technology with the potential to assist libraries in addressing digital access and inclusion issues. this primer provides first a brief overview of the digital divide and the emerging role of public libraries as internet access providers, highlighting the need for cost-efficient, technological solutions. we go further to provide a basic description of tvws and its features, focusing on key aspects of the technology relevant to libraries as community anchor institutions. several tvws implementations are described with discussion of how tvws was set up in several public libraries. finally, we extend consideration to first steps library organizations must consider when contemplating new implementations including everyday applications and crisis response planning.

digital access and inclusion
the term "digital divide" describes the gap between people who can easily access and use technology and the internet, and those who cannot.6 as kinney observes, "there has not been one single digital divide, but rather a series of divides that attend each new technology."7 digital divides are exacerbated by various factors including: socioeconomic status, education, geography, age, ability, language, and especially availability and quality.8 in recent years, the language describing this issue has changed, but the inequalities stay consistent and widen among different dimensions with each emerging technology.
the most recent public policy term “digital inclusion” promotes digital literacy efforts for unserved and underserved populations.9 the progression from the term “digital divide” to “digital inclusion” represents a shift in focus from issues of access exclusively toward contexts and quality of participation and usage. along these lines, the language of digital inclusion reframes the issue by making visible that simply focusing on internet access can obscure the fact that divides associated with quality and effectiveness remain.10 in response to the digital divide, public libraries have become the “unofficial” providers of internet access, stemming from libraries’ access to broadband infrastructure, maintenance of publiclyavailable computers, and services providing assistance and training.11 a pew research center survey on perceptions of libraries found that most respondents reported viewing public libraries as important parts of their communities, providing resources and assisting in decisions regarding what information to trust.12 however, many public libraries are facing an “infrastructure plateau” of internet access due to few computer workstations and slower broadband connection speeds that can support a growing number of users,13 on top of insufficient funding, physical space, and staffing.14 previous surveys show that although public libraries are connected to the internet and provide public access workstations and wireless access, nearly 50% of public libraries only offer wireless access that shares the same bandwidth as their workstations.15 this increased usage strains existing network connections and infrastructure, resulting in slower connections for everyone connected to the public library’s network. many public libraries cannot accommodate more workstations, support the power requirements of both workstations and patrons’ laptops, and afford workstation upgrades and bandwidth increases to move past their insufficient connectivity speeds. libraries often lack the it skills, time, and funds to upgrade their information technology and libraries | march 2017 38 infrastructure.16 typical wireless access via wi-fi is relegated to distances within library buildings, which may extend to exterior spaces and is available only during operating hours. despite these challenges, public libraries continually provide access and “at-the-point-of-need” training and support for their patrons, especially for those who do not have easy access to the internet and computers.17 subsidized by federal funding, libraries represent key access providers and technology trainers for the public without internet access.18 the fcc classifies libraries as “community anchor institutions” (cais), organizations that “facilitate greater use of broadband by vulnerable populations, including low-income, the unemployed, and the aged.”19 recent surveys show that users have a positive view of libraries, providing opportunities to spend time in a safe space, pursue learning, and promote a sense of community. 
librarians offer internet skills training programs more often than other community organizations though (at around 75% of the time) training occurs informally.20 in particular, 29% of respondents to a library use survey reported going to libraries to use computers, the internet, or the wi-fi network; 7% have also reported using libraries’ wi-fi signals outside when libraries are closed.21 the majority of these users are more likely to be young, black, female, and lower income, utilizing library technology resources for school or work (61%), checking email or sending texts (53%), finding health information (38%), and taking online courses or completing certifications (26%).22 public libraries are already exploring creative approaches to providing internet access for these underserved communities. the mobile hotspot lending program in public library systems in new york city and kansas city are just two examples.23 yet libraries must do more by supporting innovation and providing leadership by partnering with other community organizations and their stakeholders to enhance resilience in addressing access and inclusion. the emergence of tvws wireless technology presents an opportunity for libraries to explore expanding the reach of their wireless signals beyond library buildings and extend 24/7 library wi-fi availability to community spaces such as subsidized housing, schools, clinics, parks, senior centers, and museums. tvws basics tv whitespace (tvws) refers to the unoccupied portions of spectrum in the vhf/uhf terrestrial television frequency bands.24 television broadcast frequency allocations traditionally assumed that tv station transmissions operating at high power needed wide spectrum separation to prevent interference between broadcasting channels, which led to the specific spectrum allocation of these frequency “guard bands.”25 research discovered that low-power devices can operate within these spaces, which led the federal communications commission (fcc) to field test tvws applications to wireless communications and (ultimately) promote tvws neutrality.26 in 2015, the federal communications commission (fcc) made a portion of these very valuable tvws bands of spectrum available for open, shared public use, like wi-fi. yet, unlike wi-fi, with a reach measured in 10s of meters, the range of tvws is measured in 100s or even 1000s of meters. tvws has good propagation characteristics, which makes it an extremely valuable license-exempt radio spectrum.27 it is a relatively stable frequency that does not change over time, allowing for tv white spaces in public libraries: a primer | rebmann, te, and means | https://doi.org/10.6017/ital.v36i1.9720 39 spectrum availability estimates to remain reliable and valid, which in turn promotes its various applications.28 radio spectrum is considered a “common heritage of humanity,”29 as radio waves “do not respect national borders.”30 the fcc recently made a portion of these tvws bands of spectrum available for open, shared public use.31 tvws availability and application are contextual and dependent on many key factors. 
availability is influenced by frequency (the idle channels purposely planned in tv bands, varying across regions), deployment (the height and location of the tvws transmit antenna and its installation sites in relation to nearby surrounding tv broadcasting reception), space and distance (geographical areas outside the current planned tv coverage, including no present broadcasting signals), and time (off-air availability of licensed broadcasting transmitters during specific periods of time, subject to change by the broadcaster).32 as tvws existed as fragmented “safety margins” between broadcast services, tvws is typically more abundant in rural areas that have less broadcast coverage and in larger contiguous blocks rather than in highly dense urban areas.33 assigned spectrum is not always used efficiently and effectively by licensees, and exclusive or nonexclusive sharing can alleviate pressure on these resources.34 this “spectrum crunch” of the inefficient use of scarce spectrum resources can be alleviated with dynamic spectrum access (dsa) and spectrum sharing. tvws availability is small where digital television has been deployed, with the potentials for aggregate interference (from tvws users in relation to primary tv service) and self-interference (within the tvws network), which may lead to a “mismatch situation” where there is high demand for bandwidth but very low tvws bandwidth supply.35 as most spectrum frequencies have been organized through some form of exclusive access in which only the licensee can use the specific spectrum, technologies such as cognitive radios can enable new modes of spectrum access, supporting autonomous, self-configuring, self-planning networks which rely on up-to-date tvws availability databases. the limited distribution (in many areas) of basic broadband infrastructure and relatively high cost of access often prevents individuals with lower incomes from participating in the digital revolution of information access and its opportunities.36 despite these challenges to broadband availability, tvws excels in areas with low broadband coverage. rural regions possess greater frequency availability due to lower density of spectrum licensing. in comparison to other frequencies operating higher up on the spectrum band, tvws does not require direct line-of-sight between devices for operation, and has lower deployment costs. equipment market costs are comparable to wi-fi equipment currently on the market.37 importantly, tvws can address access and inclusion by having relatively low start-up costs and no ongoing services fees. as a public resource, it can work with existing services to create new, potentially mobile connections to the internet that ensure the continuation of vital services in the event of service interruptions.38 in urban areas with fewer channels available, new efficient spectrum sharing policies will be necessary. 
assigned spectrum is not always used efficiently and effectively by licensees, and exclusive or non-exclusive sharing or “recycling” of bands for more information technology and libraries | march 2017 40 effective spectrum use by multiple parties with changing spectrum needs can alleviate pressure on these resources.39 tvws for public libraries tvws is a viable medium for applications from internet access, content distribution within a given location, tracking (people, animals, and assets), task automation, and public safety and security,40 as well as remote patient monitoring and other telemedicine applications.41 tvws complements existing networks that use other parts of the spectrum for access points, mobile communications, and home media devices.42 analyses of a recent digital inclusion survey suggest that technology upgrades can have significant impact on the ability of libraries to expand programs and services.43 as community anchor institutions (cais), public libraries can use tvws systems to expand and improve access to their services for their users, especially for underserved populations. library-led collaborations to deploy tvws networks in other cais and public spaces have numerous benefits. in conjunction with building-centered wi-fi, tvws can redistribute network users from congested library spaces to other community sites, thereby distributing network usage across the community. from an existing broadband connection, libraries can extend their networks of internet access strategically across their communities. yet, unlike networks which solely use limited-range wi-fi, far-reaching tvws can improve the coverage and inclusion of patrons in accessing library programs, services and the broader internet.44 the portability of the access points allows libraries to extend their reach by providing wireless connections in the shortterm, for cultural or civic events like fairs, markets, or concerts, and in the long-term, for use at popular public areas. recent tvws pilot installations have proven to be very stable in kansas, colorado, mississippi, and delaware. manhattan public library (kansas)’s tvws project began in fall 2013. though there were a few delays in the installation and testing process, the tvws equipment was successfully implemented and welcomed by the community in early 2014. it staff report that their remote locations have shown that this library service fills a community need, especially for underserved populations.45 delta county libraries (colorado) are conducting trials with two public hotspots to support “guest” access and potentially provide library patrons with more bandwidth access.46 tvws implementations in the pascagoula school district (mississippi)47 and delaware public libraries48 show successful initial pilot usage in providing wireless internet service directly to community-distributed access points. though there are contextual differences across these sites, the strength of public libraries as cais providing internet access via tvws systems is evident and promising. first steps any library can take the initiative in setting up a tvws network on its own. the first step is to assess availability of spectrum in the library’s geographic location. access to tvws frequencies is free and requires no subscription fees other than the initial equipment investment. 
public tv white spaces in public libraries: a primer | rebmann, te, and means | https://doi.org/10.6017/ital.v36i1.9720 41 databases of tvws availability are easily accessible and have been tested by the fcc since 2011;49 google also has posted its own spectrum database as well.50 from this setup, the library gains access to public tvws frequencies by which they can broadcast and receive internet connections from paired tvws-enabled remote hotspots. once it is determined that there is available spectrum/channels in the desired area, libraries can then explore how their current broadband and wireless connections might be expanded to include several community spaces where internet access is needed. next, the library works with a tvws equipment supplier to design and install a tvws network consisting of a base station that is integrated with their wired connection to the internet. finally, the library places tvws-enabled remote hotspots in (previously identified) community-based spaces where wi-fi access is needed by underserved populations. given a high quality backhaul (i.e., fiber optic cable high speed connection), tvws can spread that signal and provide access from the library, which is able to propagate and penetrate multiple barriers and geographical features with a signal up to 10 times stronger than current wi-fi. depending on the context (geographical features, tvws availabilities, etc.), hotspots can be installed up to six miles (10 km) away and do not require line-of-sight between the base station and hotspots. this ability is superior to current wi-fi networks that only cover patrons in the immediate vicinity of the library. these tvws remote hotspots also can be easily (and strategically) moved to support occasional community needs (such as neighborhood-wide or city events) or in response to crisis situations. tvws, libraries, and emergency response public libraries provide leadership as “ready access point, first choice, first refuge, and last resort” for community services in everyday matters and in emergencies.51 they have assisted residents in relief efforts during hurricanes katrina and rita, and other natural and man-made disasters.52 …the provision of access to computers and the internet was a wholly unique and immeasurably important role for public libraries… the infrastructure can be a tremendous asset in times of emergencies, and should be incorporated into community plans.53 they have likewise provided immediate and long-term assistance to communities and aid workers, providing physical space for recovery operations for emergency agencies, communication technologies, and emotional support for the community. 
in previous library internet usage surveys, nearly one-third of libraries reported that their computers and internet services would be used by the public in emergencies to access relief services and benefits.54 such activities include finding and communicating with family and friends, completing online fema forms and insurance claims, and checking news sites regarding information of their affected homes.55 yet, despite the admirable and successful efforts of many public libraries, their infrastructures are not always built to meet the increased demand of user needs and e-government services in emergency contexts.56 jaeger, shneiderman, fleischmann , preece, qu, and wu propose the concept of community response grids (crgs), which utilize the internet and mobile communications devices so that emergency responders and residents in a disaster area can information technology and libraries | march 2017 42 communicate and coordinate accurate, appropriate responses.57 this concept relies on social networks, both in person and online, to enable residents and emergency responders to work together in a multi-directional communication scheme. crgs provide residents tailored, localized information and a means to report pertinent disaster related information to emergency responders, who in turn can synthesize and analyze submitted information and act accordingly.58 due to their existing role as community anchor institutions (cais), public libraries are uniquely positioned for crg involvement. libraries can assist in facilitation of internet access with portable tvws network connection points. by virtue of their portability, tvws hotspots can provide essential digital access in times of crisis by moving along with their affected populations. emergency operations and communications in a crisis occur throughout networks comprised of various technologies. information management before, during, and after a disaster affects how well a crisis is managed.59 broadband internet can be one access route in the event that phone and radio transmissions are affected, and vice versa, as part of a “mixed media approach” to get messages to those that need it in an emergency.60 yet one must remember that internet communications are double-edged: the internet provides relevant material on demand and near instant sharing and collaborating, but these very features can compound a crisis with misinformation.61 despite these concerns, the potential of the integration of wireless devices and other technologies into a multi-technology, collaborative response system can solve the problem of existing communication structures that lack coordination and quality control.62 the proliferation of smartphones, laptops, and other portable wireless devices makes such technology ideal for emergency communications, especially in how users’ familiarity with their own devices will help them navigate crg communications while under stress.63 conclusion supporting internet access and inclusion in public libraries and having equal, affordable, and available access to information is a necessary component to bridging the digital divide. technology has become “an irreducible component of modern life, and its presence and use has significant impact on an individuals’ ability to fully engage in society.”64 as cohron argues, this principle represents more than providing people with internet access: it is about “leveling the playing field in regards to information diffusion. 
the internet is such a prominent utility in people's lives that we, as a society, cannot afford for citizens to go without."65 broadband access is the first step; digital literacy training is also a necessity. access alone is not enough to ensure quality and effective use, however, as the digital divide is representative of broader social inequalities that computer and internet access cannot fully remedy.66 this is a complex problem that requires a multi-faceted solution. as kinney states, "the digital divide is a moving target, and new divides open up with new technologies. libraries help bridge some inequities more than others, and substantial disparities exist among library systems."67 internet access also becomes a necessity when the internet is to play a role in emergency communications.68 it is problematic to suggest that public libraries can be simultaneously promoted as the solution to digital divide issues while facing cuts to funding. policy makers, community advocates, and the community members themselves are stakeholders in the success of their communities, and must also take responsibility for access and inclusion via public libraries.69 as public agencies automate to increase equality and save money, they exacerbate digital divides by excluding those without access. suggesting that community members simply visit the library to ensure access to public services places additional pressure on libraries, yet these efforts may go unsupported and unacknowledged. public libraries are already valuable community access points to resources, especially in emergencies, though many suffer from a lack of concerted disaster planning. along similar lines, many libraries are ill-equipped to accommodate the bandwidth needs of growing and oftentimes sparsely connected populations. as communications and government services move increasingly online, it becomes imperative to build strong, cost-effective information infrastructures. tvws connections can arguably help in breaking down the barriers that challenge ubiquitous access and inclusion. tvws-enabled remote access points in daily use around communities are ideally situated to provide everyday wi-fi and for rapid redeployment to damaged areas (as pop-up hotspots) to provide essential communication and information resources in times of crisis. in short, tvws can augment the technological infrastructure of public libraries toward further developing their roles as cais and leaders serving their communities well into the future. references 1. wireline competition bureau, "2016 broadband progress report," federal communications commission, january 29, 2016, https://www.fcc.gov/reports-research/reports/broadbandprogress-reports/2016-broadband-progress-report. 2. office of chairman wheeler, "fcc adopts strong, sustainable rules to protect the open internet," federal communications commission, february 26, 2015, https://apps.fcc.gov/edocs_public/attachmatch/doc-332260a1.pdf. 3. "here's what the digital divide looks like in the united states," the white house, july 15, 2015, https://www.whitehouse.gov/share/heres-what-digital-divide-looks-united-states. 4. john b. horrigan and maeve duggan, "home broadband 2015," pew research center, december 21, 2015, http://www.pewinternet.org/files/2015/12/broadband-adoptionfull.pdf.
this 43% is further divided between the 33% who report the monthly subscription cost as their main reason and the 10% who report the expensive cost of a computer as their reason for non-adoption. 5. bo kinney, "the internet, public libraries, and the digital divide," public library quarterly 29, no. 2 (2010): 104-161, https://doi.org/10.1080/01616841003779718. 6. madalyn cohron, "the continuing digital divide in the united states," the serials librarian 69, no. 1 (2015): 77-86, https://doi.org/10.1080/0361526x.2015.1036195. 7. kinney, "the internet, public libraries, and the digital divide." 8. paul t. jaeger, john carlo bertot, kim m. thompson, sarah m. katz, and elizabeth j. decoster, "the intersection of public policy and public access: digital divides, digital literacy, digital inclusion, and public libraries," public library quarterly 31, no. 1 (2012): 1-20, https://doi.org/10.1080/01616846.2012.654728. 9. brian real, john carlo bertot, and paul t. jaeger, "rural public libraries and digital inclusion: issues and challenges," information technology and libraries 33, no. 1 (2014): 6-24, https://doi.org/10.6017/ital.v33i1.5141. 10. jaeger et al., "the intersection of public policy and public access." 11. john carlo bertot, paul t. jaeger, lesley a. langa, and charles r. mcclure, "public access computing and internet access in public libraries: the role of public libraries in e-government and emergency situations," first monday 11, no. 9 (2006), https://doi.org/10.5210/fm.v11i9.1392. 12. john b. horrigan, "libraries 2016," pew research center, september 9, 2016, http://www.pewinternet.org/2016/09/09/libraries-2016/. 13. real et al., "rural public libraries and digital inclusion." 14. john carlo bertot, charles r. mcclure, and paul t. jaeger, "the impacts of free public internet access on public library patrons and communities," library quarterly 78, no. 3 (2008): 285-301, https://doi.org/10.1086/588445. 15. charles r. mcclure, paul t. jaeger, and john carlo bertot, "the looming infrastructure plateau? space, funding, connection speed, and the ability of public libraries to meet the demand for free internet access," first monday 12, no. 12 (2007), https://doi.org/10.5210/fm.v12i12.2017. 16. ibid. 17. bertot et al., "public access computing and internet access in public libraries." 18. ibid.; jaeger et al., "the intersection of public policy and public access." 19. wireline competition bureau, "wcb cost model virtual workshop 2012 community anchor institutions," federal communications commission, june 1, 2012, https://www.fcc.gov/newsevents/blog/2012/06/01/wcb-cost-model-virtual-workshop-2012-community-anchorinstitutions. 20. jennifer koerber, "ala and ipac analyze digital inclusion survey," library journal 141, no. 1 (2016): 24-26. 21. horrigan, "libraries 2016." 22. ibid. 23. timothy inklebarger, "bridging the tech gap," american libraries, september 11, 2015, https://americanlibrariesmagazine.org/2015/09/11/bridging-tech-gap-wi-fi-lending. 24. andrew stirling, "white spaces – the new wi-fi?," international journal of digital television 1, no. 1 (2010): 69–83, https://doi.org/10.1386/jdtv.1.1.69/1; cristian gomez, "tv white spaces: managing spaces or better managing inefficiencies?," in tv white spaces a pragmatic approach, eds.
ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 67-77. 25. steve song, “spectrum and development,” in tv white spaces a pragmatic approach, eds. ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 35-40. 26. robert horvitz, “geo-database management of white space vs. open spectrum,” in tv white spaces a pragmatic approach, eds. ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 7-17. 27. julie knapp, “fcc announces public testing of first television white spaces database,” federal communications commission, september 14, 2011, https://www.fcc.gov/newsevents/blog/2011/09/14/fcc-announces-public-testing-first-television-white-spacesdatabase. 28. horvitz, “geo-database management of white space vs. open spectrum.” 29. ryszard strużak and dariusz więcek, “regulatory issues for tv white spaces,” in tv white spaces a pragmatic approach, eds. ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 19-34. 30. horvitz, “geo-database management of white space vs. open spectrum,” 8. 31. engineering & technology bureau, “fcc adopts rules for unlicensed services in tv and 600 mhz bands,” federal communications commission, august 11, 2015, https://apps.fcc.gov/edocs_public/attachmatch/fcc-15-99a1_rcd.pdf. 32. gomez, “tv white spaces: managing spaces or better managing inefficiencies?,” 68. 33. stirling, “white spaces – the new wi-fi?.” 34. linda e. doyle, “cognitive radio and africa,” in tv white spaces a pragmatic approach, eds. ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 109-119. 35. gomez, “tv white spaces: managing spaces or better managing inefficiencies?,” 72. 36. mike jensen, “the role of tv white spaces and dynamic spectrum in helping to improve internet access in africa and other developing regions,” in tv white spaces a pragmatic information technology and libraries | march 2017 46 approach, eds. ermanno pietrosemoli and marco zennaro (trieste: abdus salam international centre for theoretical physics t/ict4d lab, 2013), 83-89. 37. song, “spectrum and development.” 38. ibid. 39. doyle, “cognitive radio and africa,” 113. 40. stirling, “white spaces – the new wi-fi?.” 41. afton chavez, ryan littman-quinn, kagiso ndlovu, and carrie l kovarik, “using tv white space spectrum to practice telemedicine: a promising technology to enhance broadband internet connectivity within healthcare facilities in rural regions of developing countries,” journal of telemedicine and telecare 22, no. 4 (2015): 260-263, https://doi.org/10.1177/1357633x15595324. 42. stirling, “white spaces – the new wi-fi?.” 43. koerber, "ala and ipac analyze digital inclusion survey." 44. chavez et al., “using tv white space spectrum to practice telemedicine.” 45. kerry ingersoll, june 22, 2015, google+ comment to the gigabit libraries network, https://plus.google.com/107631107756352079114/posts/l4y8ci8sg5y. 46. delta county libraries, “super wi-fi pilot,” accessed november 1, 2016, http://www.deltalibraries.org/super-wi-fi-pilot/. 47. pascagoula tv white spaces facebook group, accessed november 1, 2016, https://www.facebook.com/psdtvws/. 48. 
"delaware libraries white space pilot update, january 2015," accessed november 1, 2016, http://lib.de.us/files/2015/01/delaware-libraries-white-space-pilot-update-jan-2015.pdf. 49. knapp, "fcc announces public testing of first television white spaces database." 50. see https://www.google.com/get/spectrumdatabase/. 51. bertot et al., "public access computing and internet access in public libraries." 52. bertot et al., "the impacts of free public internet access." see also horrigan, "libraries 2016." 53. paul t. jaeger, lesley a. langa, charles r. mcclure, and john carlo bertot, "the 2004 and 2005 gulf coast hurricanes: evolving roles and lessons learned for public libraries in disaster preparedness and community services," public library quarterly 25, no. 3/4 (2007): 199-214. 54. ibid. 55. bertot et al., "public access computing and internet access in public libraries." 56. ibid. 57. paul t. jaeger, ben shneiderman, kenneth r. fleischmann, jennifer preece, yan qu, and philip fei wu, "community response grids: e-government, social networks, and effective emergency management," telecommunications policy 31 (2007): 592-604, https://doi.org/10.1016/j.telpol.2007.07.008. 58. ibid., 595. 59. laurie putnam, "by choice or by chance: how the internet is used to prepare for, manage, and share information about emergencies," first monday 7, no. 11 (2002), https://doi.org/10.5210/fm.v7i11.1007. 60. ibid. 61. ibid. 62. jaeger et al., "community response grids," 598. jaeger et al. describe how the internet combines the best of one-to-one, one-to-many, many-to-one, and many-to-many in terms of the flow and quality of information. one-to-one communication is slow; many-to-one only benefits the central network, while outsiders reporting emergencies do not learn what others are reporting; one-to-many is inefficient, limited, and assumes the broadcaster has the appropriate information and can get it to those that need it most; many-to-many can create "information overload" of questionable content. 63. ibid., 599. 64. jaeger et al., "the intersection of public policy and public access," 3. 65. cohron, "the continuing digital divide in the united states," 84. 66. kinney, "the internet, public libraries, and the digital divide," 120. 67. ibid., 148. 68. jaeger et al., "community response grids," 599. 69. bertot et al., "the impacts of free public internet access," 299. linking libraries to the web: linked data and the future of the bibliographic record brighid m. gonzales abstract the ideas behind linked data and the semantic web have recently gained ground and shown the potential to redefine the world of the web. linked data could conceivably create a huge database out of the internet linked by relationships understandable by both humans and machines. the benefits of linked data to libraries and their users are potentially great, but so are the many challenges to its implementation. the bibframe initiative provides the possible framework that will link library resources with the web, bringing them out of their information silos and making them accessible to all users.
introduction   for  many  years  now  the  marc  (machine-­‐readable  cataloging)  format  has  been  the  focus  of   rampant  criticisms  across  library-­‐related  literature,  and  though  an  increasing  number  of  diverse   metadata  formats  for  libraries,  archives,  and  museums  have  been  developed,  no  framework  has   shown  the  potential  to  be  a  viable  replacement  for  the  long-­‐established  and  widely  used   bibliographic  format.  over  the  past  decade,  web  technologies  have  been  advancing  at  a   progressively  rapid  pace,  outpacing  marc’s  ability  to  keep  up  with  the  potential  these   technologies  can  offer  to  libraries.  standing  by  the  marc  format  leaves  libraries  in  danger  of  not   being  adequately  prepared  to  meet  the  needs  of  modern  users  in  the  information  environments   they  currently  frequent  (increasingly,  search  engines  such  as  google).   new  technological  developments  such  as  the  ideas  behind  linked  data  and  the  semantic  web  have   the  potential  to  bring  a  host  of  benefits  to  libraries  and  other  cultural  institutions  by  allowing   libraries  and  their  carefully  cultivated  resources  to  connect  with  users  on  the  web.  though  there   remains  a  host  of  obstacles  to  their  implementation,  linked  data  has  much  to  offer  libraries  if  they   can  find  ways  to  leverage  this  technology  for  their  own  uses.  libraries  are  slowly  finding  ways  to   take  advantage  of  the  opportunities  linked  data  present,  including  initiatives  such  as  the   bibliographic  framework  initiative,  known  as  bibframe,  which  may  have  the  potential  to  be  the   bibliographic  replacement  for  marc  that  the  information  community  has  long  needed.  such  a   change  may  help  libraries  not  only  to  stay  current  with  the  modern  information  world  and  stay   relevant  in  the  minds  of  users,  but  also  reciprocally  create  a  richer  world  of  data  available  to   information  seekers  on  the  web.   brighid  gonzales  (brighidmgonzales@gmail.com),  a  recent  mlis  recipient  from  the  school  of   library  and  information  science,  san  jose  state  university,  is  winner  of  the  2014  lita/ex  libris   student  writing  award.     linking  libraries  to  the  web  |  gonzales       11   the  limitations  of  marc   much  has  been  written  over  the  years  about  the  issues  and  shortcomings  of  the  marc  format.   nonetheless,  marc  formatting  has  been  widely  used  by  libraries  around  the  world  since  the   1960s,  when  it  was  first  created.  this  long-­‐established  and  ubiquitous  usage  has  resulted  in   countless  legacy  bibliographic  records  that  currently  exist  in  the  marc  format.  to  lose  this   carefully  crafted  data  or  to  expend  the  finances,  time,  and  manual  effort  required  to  convert  all  of   this  legacy  data  into  a  new  format  may  be  a  cause  for  reservation  in  the  community.   but  the  fact  remains  that  in  spite  of  its  widespread  use,  there  are  many  issues  with  the  marc   format  that  make  it  a  candidate  for  replacement  in  the  world  of  bibliographic  data.   
andresen  describes  several  different  versions  of  marc  that  have  largely  been  wrapped  together  in   the  community’s  mind,  reminding  us  that  “although  marc21  is  often  described  as  an  international   standard,  it  is  only  used  in  a  limited  number  of  countries.”1  in  actuality,  what  we  often  refer  to   simply  as  marc  could  be  marc21,  ukmarc,  unimarc  or  even  danmarc2.2  this  lack  of  a  unified   standard  has  long  been  an  issue  with  this  particular  format.   then  there  is  marc’s  notorious  inflexibility.  originally  created  for  the  description  of  printed   materials,  marc’s  rigidly  defined  standards  can  make  it  unsuited  for  the  description  of  digital,   visual,  or  multimedia  resources.  andresen  writes  that  “the  lack  of  flexibility  means  that  local   additions  might  hinder  exchange  between  local  systems  and  union  catalogue  systems.”3  tennant   has  also  expressed  frustration  with  marc’s  inflexibility,  particularly  its  inability  to  express   hierarchical  relationships.  tennant  posits  that  where  the  marc  format  is  “flat,”  expressing   relationships  involving  hierarchy,  such  as  in  a  table  of  contents,  “would  be  a  breeze  in  xml,”  which   is  the  format  he  recommends  moving  toward  for  its  greater  extensibility.4  marc’s  rigidity  may  also   be  a  reason  why  the  format  is  not  generally  used  outside  of  the  library  environment;  thus   information  contained  in  marc  format  cannot  be  exchanged  with  information  from  nonlibrary   environments.5   inconsistencies,  errors,  and  localized  practices  are  also  issues  frequently  cited  in  detailing  marc’s   inherent  shortcomings.  with  shared  cataloging,  inconsistencies  may  be  less  common,  but  there   remains  the  fact  that  with  any  number  of  individual  catalogers  creating  records,  the  potential  for   error  is  still  great.  and  any  localized  changes  can  also  create  inconsistency  in  records  from  library   to  library.  tennant  gives  as  an  example  recording  the  editor  of  a  book,  which  “should  be  encoded  in   a  700  field,  with  a  $e  subfield  that  specifies  the  person  is  the  editor.  but  the  $e  subfield  is   frequently  not  encoded,  thus  leaving  one  to  guess  the  role  of  the  person  encoded  in  the  700  field.”6   when  it  comes  to  issues  with  marc  in  the  modern  computing  environment,  however,  one  of  the   biggest  and  seemingly  insurmountable  problems  is  its  inability  to  express  the  relationships   between  entities.  andresen  points  out  that  it  is  “difficult  to  handle  relations  between  data  that  are   described  in  different  fields,”7  while  tennant  writes  that  “relationships  among  related  titles  are   problematic.”8  alemu  et  al.  also  write  of  marc’s  “document-­‐centric”  structure,  which  prevents  it     information  technology  and  libraries  |  december  2014   12   from  recognizing  relationships  between  entities  that  might  be  possible  in  a  more  “actionable  data-­‐ centric  format.”9   though  tennant  advocates  the  embrace  of  xml-­‐based  formats  as  a  way  to  transition  from  marc,   breeding  writes  that  even  marcxml  “cannot  fully  make  intelligible  the  quirky  marc  coding  in   terms  of  semantic  relationships.”10  alemu  et  al.  
also  note  that  marc  may  continue  to  be  widely   used  mainly  because  alternatives,  including  xml,  have  not  yet  been  found  to  be  an  adequate   replacement.11   it  is  clear  that  if  libraries  and  their  carefully  crafted  bibliographic  records  are  to  remain  relevant   and  viable  in  today’s  modern  computing  world,  a  more  modern  metadata  format  that  addresses   these  issues  will  be  required.  clearly  needed  is  a  more  flexible  and  extensible  format  that  allows   for  the  expression  of  relationships  between  points  of  data  and  the  ability  to  link  that  data  to  other   related  information  outside  of  the  presently  insular  library  catalog.   linked  data  and  the  semantic  web   linked  data  works  as  the  framework  behind  the  semantic  web,  an  idea  by  world  wide  web   inventor  tim  berners-­‐lee,  which  would  turn  the  internet  into  something  closer  to  one  large   database  rather  than  simply  a  disparate  collection  of  documents.  since  the  internet  is  often  the   first  place  users  turn  to  for  information,  libraries  should  take  advantage  of  the  concepts  behind   linked  data  to  both  put  their  resources  out  on  the  web,  where  they  can  be  found  by  users,  and  in   turn  bring  those  users  back  to  the  library  through  the  lure  of  authoritative,  high-­‐quality  resources.   in  the  world  of  linked  data,  the  relationships  between  data,  not  just  the  documents  in  which  they   are  contained,  are  made  explicit  and  readable  by  both  humans  and  machines.  with  the  ability  to   “understand”  and  interpret  these  semantically  explicit  connections,  computers  will  have  the  power   to  lead  users  to  a  web  of  related  data  based  on  a  single  information  search.  underpinning  the   semantic  web  are  the  web-­‐specific  standards  xml  and  rdf  (resource  description  framework).   these  work  as  universal  languages  for  semantically  labeling  data  in  such  a  way  that  both  a  person   and  a  computer  can  interpret  their  meaning  and  then  distinguishing  the  relationships  between  the   various  data  sources.   these  relationships  are  expressed  using  rdf,  “a  flexible  standard  proposed  by  the  w3c  to   characterize  semantically  both  resources  and  the  relationships  which  hold  between  them.”12  baker   notes  that  rdf  supports  “the  process  of  connecting  dots—of  creating  “knowledge”—by  providing   a  linguistic  basis  for  expressing  and  linking  data.”13  rdf  is  organized  into  triples,  expressing   meaning  as  subject,  verb,  and  object  and  detailing  the  relationships  between  them.  an  example  is   the  catcher  in  the  rye  is  written  by  j.  d.  salinger,  where  the  catcher  in  the  rye  acts  as  the  subject,  j.   d.  salinger  is  the  object  and  the  “verb”  is  written  by  expresses  the  semantic  relationship  between   the  two,  naming  j.  d.  salinger  as  the  author  of  the  catcher  in  the  rye.  by  using  this  framework,   computers  can  link  to  other  rdf-­‐encoded  data,  leading  users  to  other  works  written  by  j.  d.   salinger,  other  adaptations  of  the  catcher  in  the  rye,  and  other  related  data  sources  from  around     linking  libraries  to  the  web  |  gonzales       13   the  web.   
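the catcher in the rye example above can be written down directly as rdf. the short sketch below uses the python rdflib library and dbpedia uris purely for illustration; the property chosen (dcterms:creator) is one common way to say "written by," not the only possible one.

```python
from rdflib import Graph, URIRef
from rdflib.namespace import DCTERMS

# uris act as globally unique, stable names for the book and its author
book = URIRef("http://dbpedia.org/resource/The_Catcher_in_the_Rye")
author = URIRef("http://dbpedia.org/resource/J._D._Salinger")

g = Graph()
# one triple: subject (the book), predicate ("written by"), object (the author)
g.add((book, DCTERMS.creator, author))

# serialize the graph as turtle so both humans and machines can read it
print(g.serialize(format="turtle"))
```

given more triples of this kind, a client can follow the creator link from the novel to salinger, and from salinger to every other work linked to him, which is exactly the chain of hops described above.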
rdf  gives  machines  the  ability  to  “understand”  the  semantic  meaning  of  things  on  the  web  and  the   nature  of  the  relationships  between  them.  in  this  way  it  can  make  connections  for  people,  leading   them  to  related  information  they  may  not  have  otherwise  found.  the  use  of  xml  allows  developers   to  create  their  own  tags,  adding  an  explicit  semantic  structure  to  their  documents  that  they  can   exploited  using  rdf.   the  semantic  web  is  based  on  four  rules  explicated  by  web  inventor  tim  berners-­‐lee.  the  rules   for  the  semantic  web  are  as  follows:   1. use  uris  (uniform  resource  identifiers)  as  names  for  things.   2. use  http  uris  so  that  people  can  look  up  those  names.   3. when  someone  looks  up  a  uri,  provide  useful  information,  using  the  standards  (rdf*,   sparql).   4. include  links  to  other  uris  so  that  they  can  discover  more  things.14   uris  act  as  a  permanent  signpost  for  things,  both  on  and  off  the  web.  using  consistent  uris  allows   data  to  be  linked  between  and  back  to  certain  places  on  the  web  without  the  worry  of  broken  or   dead  links.  rdf  triples  map  the  relationships  between  each  thing,  which  can  then  be  linked  to   more  things,  opening  up  a  wide  world  of  interrelated  data  for  users.     the  concept  behind  linked  data  would  allow  for  the  integration  of  library  data  and  data  from   other  resources,  whether  from  “scientific  research,  government  data,  commercial  information,  or   even  data  that  has  been  crowd-­‐sourced.”15  however,  to  create  an  open  web  of  data  facilitated  by   linked  data  theories,  open  standards  such  as  rdf  must  be  used,  making  data  interoperable  with   resources  from  various  communities.  this  interoperability  is  key  to  being  able  to  mix  library   resources  with  those  from  other  parts  of  the  web.   interoperability  helps  to  make  “data  accessible  and  available,  so  that  they  can  be  processed  by   machines  to  allow  their  integration  and  their  reuse  in  different  applications.”16  in  this  way,   machines  would  be  able  to  understand  the  relationships  and  connections  between  data  contained   within  documents  and  thus  lead  users  to  related  data  they  may  not  have  otherwise  found.  using   linked  data  would  bring  carefully  crafted  and  curated  library  data  out  of  the  information  silos  in   which  they  have  long  been  enclosed  and  connect  them  with  the  rest  of  the  web  where  users  can   more  easily  find  them.   benefits  for  libraries   libraries  and  their  users  have  much  to  gain  from  participation  in  the  linked  data  movement.  in  an   age  when  google  is  often  the  first  place  users  turn  when  searching  for  information,  freeing  library   data  from  their  insulated  databases  and  getting  them  out  onto  the  web  where  the  users  are  can   help  make  library  resources  both  relevant  and  available  for  users  who  may  not  make  the  library     information  technology  and  libraries  |  december  2014   14   the  first  place  they  look  for  information.  this  can  lead  not  only  to  increased  use  by  library  patrons   and  nonpatrons  (who  would  now  be  potential  library  patrons)  alike,  but  also  to  increased  visibility   for  the  library.  
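the four rules above are easy to exercise by hand: requesting an http uri while asking for an rdf serialization should return useful, machine-readable statements that in turn point at further uris. a minimal sketch, assuming the uri supports content negotiation (as dbpedia and id.loc.gov, for instance, generally do):

```python
import requests

uri = "http://dbpedia.org/resource/The_Catcher_in_the_Rye"

# ask for rdf (turtle) instead of the human-readable html page
response = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=30)
response.raise_for_status()

# the returned triples mention further uris that a client can follow (rule 4)
print(response.text[:1000])
```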
creating  and  using  linked  data  technologies  also  opens  the  door  for  libraries  to   share  metadata  and  other  information  in  a  way  previously  limited  by  marc.  libraries  also  have   the  potential  to  add  to  the  richness  of  data  that  is  available  on  the  web,  creating  a  reciprocal   benefit  with  the  semantic  web  itself.   coyle  writes  that  “every  minute  an  untold  number  of  new  resources  is  added  to  our  digital  culture,   and  none  of  these  is  under  the  bibliographic  control  of  the  library.”17  indeed,  the  world  wide  web   is  a  participatory  environment  where  anyone  can  create,  edit  or  manipulate  information  resources.   libraries  still  consider  themselves  the  province  of  quality,  reliable  information,  but  users  don’t   necessarily  go  to  libraries  when  searching  and  don’t  necessarily  have  the  internet  acumen  to   distinguish  between  authoritative  information  and  questionable  resources.  coyle  also  notes  that   “the  push  to  move  libraries  in  the  direction  of  linked  data  is  not  just  a  desire  to  modernize  the   library  catalog;  it  represents  the  necessity  to  transform  the  library  catalog  from  a  separate,  closed   database  to  an  integration  with  the  technology  that  people  use  for  research.”18  using  linked  data,   libraries  can  still  create  the  rich,  reliable,  authoritative  data  they  are  known  for  while  also  making   it  available  on  the  web,  where  potentially  anyone  can  find  it.   much  has  been  written  about  libraries’  information  silos,  and  many  researchers  are  finding  in   linked  data  the  possibility  to  free  this  information.  for  the  information  contained  in  the  library   catalog  to  be  significantly  more  usable  it  “must  be  integrated  into  the  web,  queryable  from  it,  able   to  speak  and  to  understand  the  language  of  the  web.”19  alemu  et  al.  write  that  linking  library  data   to  the  web  “would  allow  users  to  navigate  seamlessly  between  disparate  library  databases  and   external  information  providers  such  as  other  libraries,  and  search  engines.”20  users  are  likely  to   find  the  world  of  linked  data  immeasurably  more  useful  than  individually  searching  library   databases  one-­‐by-­‐one  or  relying  on  google  search  results  for  the  information  they  need.   linked  data  also  allows  for  the  possibility  of  serendipity  in  information  searching,  of  finding   information  one  didn't  even  know  they  were  looking  for,  something  akin  to  browsing  the  library   shelves.21  linked  data  “allows  for  the  richer  contextualization  of  sources  by  making  connections   not  only  within  collections  but  also  to  relevant  outside  sources.”22  tillett  adds  that  linked  data   would  allow  for  “mashups  and  pathways  to  related  information  that  may  be  of  interest  to  the  web   searcher—either  through  showing  them  added  facets  they  may  wish  to  consider  to  refine  their   search  or  suggesting  new  directions  or  related  resources  they  may  also  like  to  see.”23   the  use  of  linked  data  is  not  just  beneficial  to  users  though.  libraries  are  also  likely  to  see   increased  benefits  in  the  sharing  of  metadata  and  other  resources.  alemu  et  al.  
write  that  “making   library  metadata  available  for  re-­‐use  would  eliminate  unnecessary  duplication  of  data  that  is   already  available  elsewhere,  through  reliable  sources.24  tillett  also  writes  about  the  reduced  cost   to  libraries  for  storage  and  data  in  a  linked  data  environment  where  “libraries  do  not  need  to   replicate  the  same  data  over  and  over,  but  instead  share  it  mutually  with  each  other  and  with     linking  libraries  to  the  web  |  gonzales       15   others  using  the  web,”  reducing  costs  and  expanding  information  accessibility.25  byrne  and   goddard  also  note  that  “having  a  common  format  for  all  data  would  be  a  huge  boon  for   interoperability  and  the  integration  of  all  kinds  of  systems.”26   in  addition  to  the  reduced  cost  of  shared  resources,  something  with  which  libraries  are  already   very  familiar,  the  linking  of  data  from  libraries  to  one  another  and  to  the  web  would  also  allow  for   an  increased  richness  in  overall  data.  from  metadata  that  may  need  to  be  changed  or  updated   periodically  to  user-­‐generated  metadata  that  is  more  likely  to  include  current,  up-­‐to-­‐date   terminology,  the  “mixed  metadata”  approach  allowed  by  linked  data  would  be  “better  situated  to   provide  a  richer  and  more  complete”  description  of  various  resources  that  could  more  accurately   provide  for  the  variety  of  interpretation  and  terminology  possible  in  their  description.27   a  new  bibliographic  framework   one  of  the  most  important  ways  libraries  are  moving  toward  the  world  of  linked  data  is  with  the   bibliographic  framework  initiative,  known  as  bibframe,  which  was  announced  by  the  library  of   congress  in  2011.  since  then,  though  bibframe  is  still  in  development,  rapid  progress  has  been   made  that  suggests  that  bibframe  may  be  the  long-­‐awaited  replacement  for  the  marc  format   that  could  free  library  bibliographic  information  from  its  information  silos  and  allow  it  to  be   integrated  with  the  wider  web  of  data.   the  bibframe  model  comprises  four  classes:  creative  work,  instance,  authority,  and  annotation.   in  this  model,  creative  work  represents  the  “conceptual  essence”  of  the  item.  instance  is  the   “material  embodiment”  of  the  creative  work.  authority  is  a  resource  that  defines  relationships   reflected  by  the  creative  work  and  instance,  such  as  people,  places,  topics,  and  organizations.   annotation  relates  the  creative  work  with  other  information  resources,  which  could  be  library   holdings  information,  cover  art,  or  reviews.28  these  are  similar  in  a  way  to  the  frbr  (functional   requirements  for  bibliographic  records)  model,  which  uses  work,  expression,  manifestation,  and   item.29  indeed,  bibframe  is  built  with  rda  (resource  description  and  access)  as  an  important   source  for  content,  which  was  in  turn  built  around  the  principles  in  frbr.  
despite  this,  bibframe   “aims  to  be  independent  of  any  particular  set  of  cataloging  rules.”30   realizing  the  vast  amounts  of  information  that  is  still  recorded  in  marc  format,  the  bibframe   initiative  is  also  working  on  a  variety  of  tools  that  will  help  to  transform  legacy  marc  records  into   bibframe  resources.31  these  tools  will  be  essential  as  “the  conversion  of  marc  records  to   useable  linked  data  is  a  complicated  process.”32  where  marc  allowed  for  libraries  to  share   bibliographic  records  without  each  having  to  constantly  reinvent  the  wheel,  bibframe  will  allow   library  metadata  to  be  “shared  and  reused  without  being  transported  and  replicated.”33   bibframe  would  support  the  linked  data  model  while  also  incorporating  emerging  content   standards  such  as  frbr  and  rda.34  the  bibframe  initiative  is  committed  to  compatibility  with   existing  marc  records  but  would  eventually  replace  marc  as  a  bibliographic  framework  “agnostic   to  cataloging  rules”35  rather  than  intertwined  with  them  as  marc  was  with  aacr2.  also  unlike     information  technology  and  libraries  |  december  2014   16   marc,  which  is  rigidly  structured  and  not  amenable  to  incorporation  with  web  standards,   bibframe  would  enable  library  metadata  to  be  found  on  the  web,  freeing  it  from  the  information   silos  that  have  contained  it  for  decades.  whereas  marc  is  not  very  web-­‐compatible,  “bibframe  is   built  on  xml  and  rdf,  both  ‘native’  schemas  for  the  internet.  the  web-­‐friendly  nature  of  these   schemas  allows  for  the  widest  possible  indexing  and  exposure  for  the  resources  held  in   libraries.”36   backed  by  the  library  of  congress,  bibframe  already  has  a  great  deal  of  support  throughout  the   information  community,  though  it  is  not  yet  at  the  stage  of  implementation  for  most  libraries.   however,  half  a  dozen  libraries  and  other  institutions  are  acting  as  “early  experimenters”  working   to  implement  and  experiment  with  bibframe  to  assist  in  the  development  process  and  get  the   framework  library  ready.  participating  institutions  include  the  british  library,  george  washington   university,  princeton  university,  deutsche  national  bibliothek,  national  library  of  medicine,  oclc,   and  the  library  of  congress.37  though  not  yet  fully  realized,  bibframe  seems  to  offer  a   substantial  step  toward  the  implementation  of  linked  data  to  connect  library  bibliographic   materials  with  other  resources  on  the  web.   the  challenges  ahead   the  road  to  widespread  use  of  the  semantic  web,  linked  data,  and  even  possible  implementations   such  as  bibframe  is  not  without  obstacles.  for  one,  knowledge  and  awareness  is  a  major  concern,   as  well  as  the  intimidating  thought  of  transitioning  away  from  marc,  a  standard  that  has  been  in   widespread  use  for  as  long  as  many  of  the  professionals  using  it  have  been  alive.  there  is  also  the   challenge  and  significant  resources  required  for  converting  huge  stores  of  legacy  data  from  marc   format  to  a  new  standard.  
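the scale of that conversion problem is easy to underestimate: even turning a single legacy record into triples forces choices about identifiers and target vocabularies. the sketch below is a deliberately naive illustration, assuming the pymarc and rdflib libraries (pymarc's record["245"]["a"] style of access), a local file of marc records, and a made-up namespace rather than the actual bibframe vocabulary.

```python
from pymarc import MARCReader
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, RDF

EX = Namespace("http://example.org/catalog/")   # illustrative namespace only
g = Graph()

with open("records.mrc", "rb") as fh:            # assumed local marc file
    for i, record in enumerate(MARCReader(fh)):
        work = EX[f"work/{i}"]                   # mint a uri for each record
        g.add((work, RDF.type, EX.Work))
        title = record["245"]["a"] if record["245"] else None
        if title:
            g.add((work, DCTERMS.title, Literal(title)))
        author = record["100"]["a"] if record["100"] else None
        if author:
            # a real conversion would link to an authority uri, not a string
            g.add((work, DCTERMS.creator, Literal(author)))

print(g.serialize(format="turtle"))
```

nothing here approaches the richness of a real bibframe conversion; it only makes visible why converting billions of such records, and agreeing on the vocabulary they should map to, is the expensive part.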
in  addition,  linked  data  has  its  own  set  of  specific  concerns,  such  as   legality  and  copyright  issues  involved  in  the  sharing  of  information  resources,  as  well  as  the   willingness  of  institutions  to  share  metadata  that  they  may  have  invested  a  great  deal  of  time  and   money  in  creating.   many  organizations  may  be  hesitant  to  make  the  move  toward  linked  data  without  a  clear  sign  of   success  from  other  institutions.  chudnov  writes  that  “a  new  era  of  information  access  where   library-­‐provided  resources  and  services  rose  swiftly  to  the  top  of  ambient  search  engines’  results   and  stayed  there”  is  what  may  be  necessary,  as  well  as  “tools  and  techniques  that  make  it  easier  to   put  content  online  and  keep  it  there.”38  byrne  and  goddard  also  note  that  “linked  data  becomes   more  powerful  the  more  of  it  there  is.  until  there  is  enough  linking  between  collections  and   imaginative  uses  of  data  collections  there  is  a  danger  librarians  will  see  linked  data  as  simply   another  metadata  standard,  rather  than  the  powerful  discovery  tool  it  will  underpin.”39    alemu  et  al.  concur  that  making  linked  data  easy  to  create  and  put  online  is  necessary  before   potential  implementers  will  begin  to  use  it.  “it  is  imperative  that  the  said  technologies  be  made   relatively  easy  to  learn  and  use,  analogous  to  the  simplicity  of  creating  html  pages  during  the   early  days  of  the  web.”40  the  potential  learning  curve  involved  in  linked  data  may  be  a  great   barrier  to  its  potential  use.  tennant  writes  in  an  article  about  moving  away  from  marc  to  a  more     linking  libraries  to  the  web  |  gonzales       17   modern  bibliographic  framework  that  users  “must  dramatically  expand  our  understanding  of  what   it  means  to  have  a  modern  bibliographic  infrastructure,  which  will  clearly  require  sweeping   professional  learning  and  retooling.”41   even  without  considering  ease-­‐of-­‐use  difficulties  or  the  challenges  in  teaching  practitioners  an   entirely  new  bibliographic  system,  the  fact  remains  that  transitioning  away  from  marc  toward  any   new  bibliographic  infrastructure  system  will  require  a  great  deal  of  resources,  time  and  effort.   “there  are  literally  billions  of  records  in  marc  formats;  an  attempt  at  making  the  slightest  move   away  from  it  would  have  huge  implications  in  terms  of  resources.”42  breeding  also  writes  of  the   potential  trauma  involved  in  shifting  away  from  marc,  which  is  currently  integral  to  many  library   automation  systems.43  a  shift  to  anything  else  would  require  not  just  the  cooperation  of  libraries   but  also  of  vendors,  who  may  see  no  reason  to  create  systems  compatible  with  anything  other  than   marc.  as  tennant  writes,  “anyone  who  has  ever  been  involved  with  migrating  from  one  integrated   library  system  to  another  knows,  even  moving  from  one  system  based  on  marc/aacr2  to  another   can  be  daunting.”44  moving  from  a  marc/aacr2-­‐based  system  to  one  based  on  an  entirely  new   framework  may  be  more  of  a  challenge  than  many  libraries  would  like  to  take  on.   
a  move  to  something  such  as  bibframe  may  be  fraught  with  even  more  difficulty,  though  it  is   impossible  to  say  before  such  an  implementation  has  been  fully  realized.  library  system  software   is  not  yet  compatible  with  bibframe,  and  as  kroeger  writes,  “most  libraries  will  not  be  able  to   implement  bibframe  because  their  systems  do  not  support  it,  and  software  vendors  have  little   incentive  to  develop  bibframe  integrated  library  systems  without  reasonable  certainty  of  library   implementation  of  bibframe.”45  this  catch-­‐22  situation  may  be  difficult  to  remedy  without  a   large  cooperative  effort  between  libraries,  vendors,  and  the  entire  information  community.   another  potential  obstacle  to  bibframe  implementation  that  kroeger  suggests  is  the  possible   difficulty  in  providing  interoperability  with  all  of  the  many  other  metadata  standards  currently  in   existence.46  this  is  an  issue  that  tennant  also  considers  in  his  recommendations  that  a  new   bibliographic  infrastructure  compatible  with  modern  library  and  information  needs  must  be   versatile,  extensible,  and  especially  interoperable  with  other  metadata  schemes  currently  in  use.47   xml  has  proven  to  be  useful  for  a  wide  variety  of  metadata  schemas,  but  bibframe  would  need  to   be  able  to  make  library  data  held  in  a  huge  variety  of  metadata  standards  available  for  use  on  the   web.   another  issue,  cited  by  byrne  and  goddard,  is  that  of  privacy.  “librarians,  with  their  long  tradition   of  protecting  the  privacy  of  patrons,  will  have  to  take  an  active  role  in  linked  data  development  to   ensure  rights  are  protected.”48  issues  of  copyright  and  ownership,  something  libraries  already   grapple  with  in  the  licensing  of  various  library  journals,  databases,  and  other  electronic  resources,   may  be  insurmountable.  “libraries  no  longer  own  much  of  the  content  they  provide  to  users;   rather  it  is  subscribed  to  from  a  variety  of  vendors.  not  only  does  that  mean  that  vendors  will  have   to  make  their  data  available  in  linked  data  formats  for  improvements  to  federated  search  to   happen,  but  a  mix  of  licensed  and  free  content  in  a  linked  data  environment  would  be  extremely     information  technology  and  libraries  |  december  2014   18   difficult  to  manage.”49  again,  overcoming  obstacles  such  as  these  would  require  intense   negotiation  and  cooperation  between  libraries  and  vendors.  a  sustainable  and  viable  move  to  a   linked  data  environment  would  need  to  be  a  cooperative  effort  between  all  involved  parties  and   would  have  to  have  the  full  support  and  commitment  of  everyone  involved  before  it  could  begin  to   move  forward.   moving  libraries  toward  linked  data   making  the  move  toward  the  use  of  linked  data  and  modern  bibliographic  implementations  such   as  bibframe  will  require  a  great  deal  of  cooperation,  sharing,  learning,  and  investigation,  but   libraries  are  already  starting  to  look  toward  a  linked  future  and  what  it  will  take  to  get  there.   
libraries  will  need  to  begin  incorporating  the  principles  of  linked  open  data  in  their  own  catalogs   and  online  resources  as  well  as  publishing  and  sharing  as  much  data  as  possible.  libraries  also   need  to  put  forth  a  concerted  effort  to  encourage  vendors  to  move  toward  library  systems  which   can  accommodate  a  linked  data  environment.   alemu  et  al.  write  that  cooperation  and  collaboration  between  all  of  the  involved  stakeholders  will   be  a  crucial  piece  to  the  transfer  of  library  metadata  from  catalog  to  web.  in  the  process,  and  as   part  of  this  cooperative  effort,  libraries  will  have  to  wholeheartedly  adopt  the  rdf/xml  format,   something  alemu  et  al.  deem  “mandatory.”50  this  would  support  the  “conceptual  shift  from   perceiving  library  metadata  as  a  document  or  record  to  what  coyle  (2010)  terms  as  actionable   metadata,  i.e.,  one  that  is  machine-­‐readable,  mash-­‐able  and  re-­‐combinable  metadata.”51   chudnov  adds  that  libraries  will  need  to  follow  “steady  url  patterns”  for  as  much  of  their   resources  as  possible,  one  of  the  key  rules  of  linked  data.  52  he  also  notes  that  we  will  know  we   have  made  progress  on  the  implementation  of  linked  data  when  “link  hubs  at  smaller  libraries   (aka  catalogs  and  discovery  systems)  cross  link  between  local  holdings,  authorities,  these  national   authority  files,  and  peer  libraries  that  hold  related  items,”  though  the  real  breakthrough  will  come   when  “the  big  national  hubs  add  reciprocal  links  back  out  to  smaller  hub  sites.”53  before  this  can   happen,  however,  libraries  must  make  sure  that  all  of  their  own  holdings  link  to  each  other,  from   the  catalog  to  items  in  online  exhibits.  chudnov  also  advocates  adding  user-­‐generated  knowledge   into  the  mix  by  allowing  users  to  make  new  connections  between  resources  when  and  where  they   can.54   borst,  fingerle,  and  neubert,  in  their  conference  report  from  2009,  write  that  libraries  and   projects  using  linked  data  need  to  regard  the  catalog  as  a  network,  publish  their  data  as  linked   data  using  the  semantic  web  standards  laid  out  by  tim  berners-­‐lee,  and  link  to  external  uris.55   they  also  suggest  libraries  use  and  help  to  further  develop  open  standards  that  are  already   available  rather  than  rely  on  in-­‐house  developments.56  in  their  final  recommendation,  they  write   that  while  libraries  need  to  publish  their  data  as  open  linked  data  on  the  web,  they  should  also  try   to  do  so  with  the  “least  possible  restrictions  imposed  by  licences  in  order  to  ensure  widest  re-­‐ usability.”57     linking  libraries  to  the  web  |  gonzales       19   conclusion   the  theories  behind  linked  data  and  the  semantic  web  are  still  in  the  process  of  being  drawn  out,   but  it  is  clear  that  at  this  point  they  are  more  than  hypotheticals.  linked  data  is  the  possible  future   of  the  web  and  how  information  will  be  organized,  searched  for,  discovered,  and  retrieved.  
as   search  algorithms  continue  to  improve  and  users  continue  to  turn  to  them  first  (and  sometimes   entirely)  for  their  information  needs,  libraries  will  need  to  make  major  changes  to  ensure  the  data   they  have  painstaking  created  and  curated  over  the  decades  remains  relevant  and  reachable  to   users  on  the  web.  linked  data  provides  the  opportunity  for  libraries  to  integrate  their   authoritative  data  with  user-­‐generated  data  from  the  web,  creating  a  rich  network  of  reliable,   current,  far-­‐reaching  resources  that  will  meet  users’  needs  wherever  they  are.   libraries  have  always  been  known  to  embrace  technology  to  stay  at  the  forefront  of  user  needs   and  provide  unique  and  irreplaceable  user  services.  to  stay  current  with  shifts  in  modern   technology  and  user  behavior,  libraries  need  to  be  a  driving  force  in  the  implementation  of  linked   data,  embrace  semantic  web  standards,  and  take  full  advantage  of  the  benefits  and  opportunities   they  present.  ultimately,  libraries  can  leverage  the  advantages  created  by  linked  data  to  construct   a  better  information  experience  for  users,  keeping  libraries  both  a  relevant  and  more  highly  valued   part  of  information  retrieval  in  the  twenty-­‐first  century.   references     1.   leif  andresen,  “after  marc—what  then?”  library  hi  tech  22,  no.  1  (2004):  41.   2.     ibid.,  40-­‐51.   3.     ibid.,  43.     4.     roy  tennant,  “marc  must  die,”  library  journal  127,  no.  17  (2002):  26–28,   http://lj.libraryjournal.com/2002/10/ljarchives/marc-­‐must-­‐die/#_.     5.     andresen,  “after  marc—what  then?”   6.   tenant,  “marc  must  die.”   7.     andresen,  “after  marc—what  then?”,  43.   8.     tenant,  “marc  must  die.”     9.     getaneh  alemu  et  al.,  “linked  data  for  libraries:  benefits  of  a  conceptual  shift  from  library-­‐ specific  record  structures  to  rdf-­‐based  data  models,”  new  library  world  113,  no.  11/12   (2012):  549-­‐570,  http://dx.doi.org/10.1108/03074801211282920.     10.    marshall  breeding,  “linked  data:  the  next  big  wave  or  another  tech  fad?,”  computers  in   libraries  33,  no.  3  (2013):  20-­‐22,  http://www.infotoday.com/cilmag/.     11.    alemu  et  al.,  “linked  data  for  libraries.”     information  technology  and  libraries  |  december  2014   20     12.    mauro  guerrini  and  tiziana  possemato,  “linked  data:  a  new  alphabet  for  the  semantic  web,”   italian  journal  of  library  &  information  science  4,  no.  1  (2013):  79-­‐80,     http://dx.doi.org/10.4403/jlis.it-­‐6305.   13.    tom  baker,  “designing  data  for  the  open  world  of  the  web,”  italian  journal  of  library  &   information  science  4,  no  1  (2013):  64,  http://dx.doi.org/10.4403/jlis.it-­‐6308.   14.    tim  berners-­‐lee,  “linked  data,”  w3.org,  last  modified  june  18,   2009,  http://www.w3.org/designissues/linkeddata.html.     15.    karen  coyle,  “library  linked  data:  an  evolution,”  italian  journal  of  library  &  information   science  4,  no  1  (2013):  58,  http://dx.doi.org/10.4403/jlis.it-­‐5443.   16.    gianfranco  crupi,  “beyond  the  pillars  of  hercules:  linked  data  and  cultural  heritage,”  italian   journal  of  library  &  information  science  4,  no.  1  (2013),  36,    http://dx.doi.org/10.4403/jlis.it-­‐ 8587.     17.    
coyle,  “library  linked  data:  an  evolution,”  56.       18.    ibid.,  56-­‐57.     19.    crupi,  “beyond  the  pillars  of  hercules,”  35.       20.   alemu  et  al.,  “linked  data  for  libraries,”  562.   21.    ibid.   22.   thea  lindquistet  al.,  “using  linked  open  data  to  enhance  subject  access  in  online  primary   sources,”  cataloging  &  classification  quarterly  51  (2013):  913-­‐928,   http://dx.doi.org/10.1080/01639374.2013.823583.   23.    barbara  tillett,  “rda  and  the  semantic  web,  linked  data  environment,”  italian  journal  of   library  &  information  science  4,  no.  1  (2013):  140,  http://dx.doi.org/10.4403/jlis.it-­‐6303.     24.    alemu  et  al.,  “linked  data  for  libraries.”     25.    tillett,  “rda  and  the  semantic  web,  linked  data  environment,”  140.     26.    gillian  byrne  and  lisa  goddard,  “the  strongest  link:  libraries  and  linked  data,”  d-­‐lib   magazine  16,  no.  11/12  (2010),  http://dx.doi.org/10.1045/november2010-­‐byrne.   27.    alemu  et  al.,  “linked  data  for  libraries,”  560.   28.    library  of  congress,  bibliographic  framework  as  a  web  of  data:  linked  data  model  and   supporting  services,  (washington,  dc:  library  of  congress,  november  21  2012),   http://www.loc.gov/bibframe/pdf/marcld-­‐report-­‐11-­‐21-­‐2012.pdf.   29.    barbara  tillett,  “what  is  frbr?  a  conceptual  model  for  the  bibliographic  universe,”  library  of     linking  libraries  to  the  web  |  gonzales       21     congress,  2003,    http://www.loc.gov/cds/downloads/frbr.pdf.   30.    “bibframe  frequently  asked  questions,”  library  of  congress,   http://www.loc.gov/bibframe/faqs/#q04.   31.    ibid.   32.    lindquist  et  al.,  “using  linked  open  data  to  enhance  subject  access  in  online  primary   sources,”  923.   33.    alan  danskin,  “linked  and  open  data:  rda  and  bibliographic  control.”  italian  journal  of   library  &  information  science  4,  no.  1  (2013):  157,  http://dx.doi.org/10.4403/jlis.it-­‐5463.   34.    erik  t.  mitchell,  “three  case  studies  in  linked  open  data.”  library  technology  reports  49,  no.  5   (2013):  26-­‐43.  http://www.alatechsource.org/taxonomy/term/106.   35.    angela  kroeger,  “the  road  to  bibframe:  the  evolution  of  the  idea  of  bibliographic  transition   into  a  post  marc  future,”  cataloging  &  classification  quarterly  51  (2013):  881,   http://dx.doi.org/10.1080/01639374.2013.823584.   36.    jason  w.  dean,  “charles  a.  cutter  and  edward  tufte:  coming  to  a  library  near  you,  via   bibframe,”  in  the  library  with  the  lead  pipe,  december  4,  2013,   http://www.inthelibrarywiththeleadpipe.org/2013/charles-­‐a-­‐cutter-­‐and-­‐edward-­‐tufte-­‐ coming-­‐to-­‐a-­‐library-­‐near-­‐you-­‐via-­‐bibframe/  .     37.    “bibframe  frequently  asked  questions,”  library  of  congress,   http://www.loc.gov/bibframe/faqs/#q04.   38.    daniel  chudnov,  “what  linked  data  is  missing,”  computers  in  libraries  31,  no.  8  (2011):  35-­‐ 36,http://www.infotoday.com/cilmag.   39.    byrne  and  goddard,  “the  strongest  link:  libraries  and  linked  data.”   40.    alemu  et  al.,  “linked  data  for  libraries,”  557.   41.    roy  tennant,  “a  bibliographic  metadata  infrastructure  for  the  twenty-­‐first  century,”  library   hi  tech  22,  no.  2  (2004):  175-­‐181,  http://dx.doi.org/10.1108/07378830410524602.   42.    alemu  et  al.,  “linked  data  for  libraries,”  556. 
    43.    breeding,  “linked  data.”   44.    tennant,  “a  bibliographic  metadata  infrastructure  for  the  twenty-­‐first  century.”   45.    kroeger,  “the  road  to  bibframe,”  884-­‐885.   46.    ibid.   47.    tennant,  “a  bibliographic  metadata  infrastructure  for  the  twenty-­‐first  century.”     information  technology  and  libraries  |  december  2014   22     48.    byrne  and  goddard,  “the  strongest  link:  libraries  and  linked  data.   49.    ibid.   50.    alemu  et  al.,  “linked  data  for  libraries.”   51.    ibid.,  563.     52.    chudnov,  “what  linked  data  is  missing.”   53.    ibid.   54.    ibid.   55.    timo  borst,  birgit  fingerle,  and  joachim  neubert,  “how  do  libraries  find  their  way  onto  the   semantic  web?”  liber  quarterly  19,  no  3/4  (2010):  336–43,   http://liber.library.uu.nl/index.php/lq/article/view/7970/8271.     56.    ibid.   57.   ibid.,  342-­‐343.   automatic extraction of figures from scientific publications in high-energy physics piotr adam praczyk, javier nogueras-iso, and salvatore mele information technology and libraries | december 2013 25 abstract plots and figures play an important role in the process of understanding a scientific publication, providing overviews of large amounts of data or ideas that are difficult to intuitively present using only the text. state-of-the-art digital libraries, which serve as gateways to knowledge encoded in scholarly writings, do not yet take full advantage of the graphical content of documents. enabling machines to automatically unlock the meaning of scientific illustrations would allow immense improvements in the way scientists work and the way knowledge is processed. in this paper, we present a novel solution for the initial problem of processing graphical content, obtaining figures from scholarly publications stored in pdf. our method relies on vector properties of documents and, as such, does not introduce additional errors, unlike methods based on raster image processing. emphasis has been placed on correctly processing documents in high-energy physics. the described approach distinguishes different classes of objects appearing in pdf documents and uses spatial clustering techniques to group objects into larger logical entities. many heuristics allow the rejection of incorrect figure candidates and the extraction of different types of metadata. introduction notwithstanding the technological advances of large-scale digital libraries and novel technologies to package, store, and exchange scientific information, scientists’ communication pattern has changed little in the past few decades, if not the past few centuries. the key information of scientific articles is still packaged in a form of text and, for several scientific disciplines, in a form of figures. new semantic text-mining technologies are unlocking the information in scientific discourse, and there exist some remarkable examples of attempts to extract figures from scientific publications,1 but current attempts do not provide a sufficient level of generality to deal with figures from high energy physics (hep) and cannot be applied in a digital library like inspire, which is our main piotr adam praczyk (piotr.praczyk@gmail.com) is a phd student at universidad de zaragoza, spain, and research grant holder at the scientific information service of cern, geneva, switzerland. 
javier nogueras-iso (jnog@unizar.es) is associate professor, computer science and systems engineering department, universidad de zaragoza, spain. salvatore mele (salvatore.mele@cern.ch) is leader of the open access section at the scientific information service of cern, geneva, switzerland. automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 26 point of interest. scholarly publications in hep tend to contain highly specific types of figures (as any type of graphical content illustrating the text and referenced from it). in particular, they contain a high volume of plots, which are line-art images illustrating a dependency of a certain quality on a parameter. the graphical content of scholarly publications allows much more efficient access to the most important results presented in a publication.2,3 the human brain perceives the graphical content much faster than reading an equivalent block of text. presenting figures with the publication summary when displaying search results would allow more accurate assessment of the article content and in turn lead to a better use of researchers’ time. enabling users to search for figures describing similar quantities or phenomena could become a very powerful tool for finding publications describing similar results. combined with additional metadata, it could provide knowledge about evolution of certain measurements or ideas over time. these and many more applications created an incentive to research possible ways to integrate figures in inspire. inspire is a digital library for hep,4 the application field of this work. it provides a large-scale digital library service (1 million records, fifty-thousand users), which is starting to explore new mechanisms of using figures in articles of the field to index, retrieve, and present information.5,6 as a first step, direct access to graphical content before accessing the text of a publication can be provided. second, a description of graphics (“blue-band plot,” “the yellow shape region”) could be used in addition to metadata or full-text queries to retrieve a piece of information. finally, articles could be aggregated into clusters containing the same or similar plots in a possible alternative automated answer to a standing issue in information management. the indispensable step to realize this vision is an automated, resilient, and high-efficiency extraction of figures from scientific publications. in this paper, we present an approach that we have developed to address this challenge. the focus has been put on developing a general method allowing the extraction of data from documents stored in portable document format (pdf). the results of the algorithm consist of metadata, raster images of a figure, but also vector graphics, which allows easier further processing. the pdf format has been chosen as the input of the algorithm because it is a de facto standard in scientific communication. in the case of hep, mathematics, and other exact sciences, the majority of publications are prepared using the latex document formatting system and later compiled into a pdf file. the electronic versions of publications from outstanding scientific journals are also provided in pdf. the internal structure of pdf files does not always reveal the location of graphics. in some cases, images are included as external entities and easily distinguishable from the rest of a document’s content, but other times they are mixed with the rest of the content. 
therefore, to miss any figures, the low-level structure of a pdf had to be analyzed. the work described in this paper focuses on the area of hep. however, with minor variations, the described methods could be applicable to a different area of knowledge. information technology and libraries | december 2013 27 related work over years of development of digital libraries and document processing, researchers came up with several methods of automatically extracting and processing graphics appearing in pdf documents. based on properties of the processed content, these methods can be divided into two groups. the attempts of the first category deal with pdf documents in general, not making any assumptions about the content of encoded graphics or document type. the methods from the second group are more specific to figures from scientific publications. our approach belongs to the second group. tools include command line programs like pdf-images (http://sourceforge.net/projects/pdfimages/) or web-based applications like pdf to word (http://www.pdftoword.com/). these solutions are useful for general documents, but all suffer from the same difficulties when processing scientific publications: graphics that are recognized by such tools have to be marked as graphics inside pdf documents. this is the case with raster graphics and some other internally stored objects. in the case of scholarly documents, most graphics are constructed internally using pdf primitives and thus cannot be correctly processed by tools from the first group. moreover, general tools do not have the necessary knowledge to produce metadata describing the extracted content. with respect to specific tools for scientific publications it must be noted first that important scientific publishers like springer or elsevier have created services to allow access to figures present in scientific publications: the improvement of the sciverse science direct site (http://www.sciencedirect.com) for searching images in the case of elsevier7 and the springerimages service (http://www.springerimages.com/) in the case of springer.8 these services allow searches triggered from a text box, where the user can introduce a description of the required content. it is also possible to browse images by categories such as types of graphics (image, table, line art, video, etc.). the search engines are limited to searches based on figure captions. in this sense, there is little difference between the image search and text search implemented in a typical digital library. most of existing works aiming at the retrieval and analysis of figures use the rasterized graphical representation of source documents as its basis. browuer et al. and kataria et al. describe a method of detecting plots by means of wavelet analysis.9,10 they focus on the extraction of data points from identified figures. in particular, they address the challenge of correctly identifying overlapping points of data in plots. this problem would not manifest itself often in the case of vector graphics, which is the scenario proposed in our extraction method. vector graphics preserve much more information about the documents content than simple values of pixel colours. in particular, vector graphics describe overlapping objects separately. raster methods are also much more prone to additional errors being introduced during the recognition/extraction phase. 
the methods described in this paper could be used with kataria’s method for documents resulting from a digitization process.11 http://sourceforge.net/projects/pdf-images/) http://sourceforge.net/projects/pdf-images/) http://www.pdftoword.com/). http://www.sciencedirect.com/ http://www.springerimages.com/ automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 28 liu et al. present a page box-cutting algorithm for the extraction of tables from pdf documents.12 their approach is not directly applicable, but their ideas of geometrical clustering of pdf primitives are similar to the ones proposed in our work. however, our experiments with their implementation and hep publications have shown that the heuristics used in their work cannot be directly applied to hep, showing the need for an adapted approach, even in the case of tables. a different category of work, not directly related to graphics extraction but useful when designing algorithms, has been devoted to the analysis of graph use in scientific publications. the results presented by cleveland describe a more general case than hep publications.13 even if the data presented in the work came from scientific publications before 1984, included observations—for example, typical sizes of graphs—were useful with respect to general properties of figures and were taken into account when adjusting parameters of the presented algorithm. finally, there exist attempts to extract layout information from pdf documents. the knowledge of page layout is useful to distinguish independent parts of the content. the approach of layout and content extraction presented by chao and fan is the closest to the one we propose in this paper.14 the difference lies in the fact that we are focusing on the extraction of plots and figures from scientific documents, which usually follow stricter conventions. therefore we can make more assumptions about their content and extract more precise data. for instance, our method emphasizes the role of detected captions and permits them to modify the way in which graphics are treated. we also extract portions of information that are difficult to be extracted using more general methods, such as captions of figures. method pdf files have a complex internal structure allowing them to embed various external objects and to include various types of metadata. however, the central part of every pdf file consists of a visual description of the subsequent pages. the imaging model of pdf uses a language based on a subset of the postscript language. postscript is a complete programming language containing instructions (also called operators) allowing the rendering of text and images on a virtual canvas. the canvas can correspond to a computer screen or to another, possibly virtual, device used to visualize the file. the subset of postscript, which was used to describe content of pdfs, had been stripped from all the flow control operations (like loops and conditional executions), which makes it much simpler to interpret than the original postscript. additionally, the state of the renderer is not preserved between subsequent pages, making their interpretation independent. to avoid many technical details, which are irrelevant in this context, we will consider a pdf document as a sequence of operators (also called the content stream). 
every operator can trigger a modification of the graphical state of the pdf interpreter, which might be drawing a graphical primitive, rendering an external attached object, or modifying a position of the graphical pointer15 or a transformation matrix.16 the outcome of an atomic operation encoded in the content stream depends not only on parameters of the operation, but also on the way previous operators modified information technology and libraries | december 2013 29 the state of the interpreter. such a design makes a pdf file easy to render but not necessarily easy to analyze. figure 1 provides an overview of the proposed extraction method. at the very first stage, the document is pre-processed and operators are extracted (see “pre-processing of operators” below). later, graphical17 and textual18 operators are clustered using different criteria (see “inclusion of text parts” and “detection and matching of captions” below), and the first round of heuristics rejects regions that cannot be considered figures. in the next phase, the clusters of graphical operators are merged with text operators representing fragments of text to be included inside a figure (see “inclusion of text parts” below). the second round of heuristics detects clusters that are unlikely to be figures. text areas detected by the means of clustering text operations are searched for possible figure captions (see “detection and matching of captions” below). captions are matched with corresponding figure candidates, and geometrical properties of captions are used to refine the detected graphics. the last step generates data in a format convenient for further processing (see “generation of the output” below). figure 1. overview of the figure extraction method. additionally, it must be noted that another important pre-processing step of the method consists of the layout detection. an algorithm for segmenting pages into layout elements called page divisions is presented later in the paper. this considerably improves the accuracy of the extraction method because elements from different page divisions can no longer be considered to belong to the same cluster (and subsequently figure). this allows the method to be applied separately to different columns of a document page. automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 30 pre-processing of operators the proposed algorithm considers only certain properties of a pdf operator rather than trying to completely understand its effect. considered properties consist of the operators’ type, the region of the page where the operator produces output and, in the case of textual operations, the string representation of the result. for simplicity, we suppress the notion of coordinate system transformation, inherent for the pdf rendering, and describe all operators in a single coordinate system of a virtual 2-dimensional canvas where operations take effect. transformation operators19 are assigned an empty operation region as they do not modify the result directly but affect subsequent operations. in our implementation, an existing pdf rendering library has been used to determine boundaries of operators. rather than trying to understand all possible types of operators, we check the area of the canvas that has been affected by an operation. if the area is empty, we consider the operation to be a transformation. 
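as an illustration, this classification step can be sketched as follows (the check against a list of text-showing operators, described in the next paragraph, is included here for completeness). this is a minimal sketch under stated assumptions, not the extractor's actual code: the Operator record, the Box type, and the hasStringArgument flag stand in for whatever the underlying pdf rendering library reports about an operation; only the text-showing operator names Tj, TJ, ', and " come from the pdf specification.

import java.util.Set;

// Minimal sketch of classifying PDF operators by the canvas area they affect.
public class OperatorClassifier {

    enum Kind { TRANSFORMATION, TEXTUAL, GRAPHICAL }

    // Bounding box of the canvas area touched by an operator; a degenerate or
    // missing box means the operator produced no visible output.
    record Box(double x0, double y0, double x1, double y1) {
        boolean isEmpty() { return x1 <= x0 || y1 <= y0; }
    }

    // name = PDF operator name, box = affected area (assumed to be reported
    // by the rendering library), hasStringArgument = a string was found in
    // the operator's argument list.
    record Operator(String name, Box box, boolean hasStringArgument) { }

    // Text-showing operators from the PDF specification (Tj, TJ, ' and ").
    private static final Set<String> TEXT_OPERATORS = Set.of("Tj", "TJ", "'", "\"");

    static Kind classify(Operator op) {
        if (op.box() == null || op.box().isEmpty()) {
            return Kind.TRANSFORMATION;   // no output: only modifies the state
        }
        if (TEXT_OPERATORS.contains(op.name()) && op.hasStringArgument()) {
            return Kind.TEXTUAL;          // renders glyphs on the canvas
        }
        return Kind.GRAPHICAL;            // everything else draws graphics
    }

    public static void main(String[] args) {
        Operator cm = new Operator("cm", null, false);
        Operator tj = new Operator("Tj", new Box(10, 10, 80, 22), true);
        System.out.println(classify(cm) + " " + classify(tj)); // TRANSFORMATION TEXTUAL
    }
}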
if there exists a non-empty area that has been changed, we check if the operator belongs to a maintained list of textual operators. this list is created based on the pdf specification. if so, the operators argument list is scanned searching for a string and the operation is considered to be textual. an operation that is neither a transformation nor a textual operation is considered to be graphical. it might happen that text is generated using a graphical operator. however, such a situation is unusual. in the case of operators triggering the rendering of other operators, which is the case when rendering text using type-3 fonts, we consider only the top-level operation. in most cases, separate operations are not equivalent to logical entities considered by a human reader (such as a paragraph, a figure, or a heading). graphical operators are usually responsible for displaying lines or curve segments while humans think in terms of illustrations, data lines, etc. similarly, in the case of text, operators do not have to represent complete or separate words or paragraphs. they usually render parts of words and sometimes parts of more than one word. the only assumption we make about the relation between operators and logical entities is that a single operator does not trigger rendering of elements from different detected entities (figures, captions). this is usually true because logical entities tend to be separated by a modification of the context—there is a distance between text paragraphs or an empty space between curves. clustering of graphical operators the clustering algorithm the representation of a document as a stream of rectangles allows the calculation of more abstract elements of the document. in our model, every logical entity of the document is equivalent to a set of operators. the set of all operators of the document is divided into disjoint subsets in the process called clustering. operators are decided to belong to the same cluster based on the position of their boundaries. the criteria for the clustering are based on a simple but important observation: information technology and libraries | december 2013 31 operations forming a logical entity have boundaries lying close to each other. groups of operations forming different entities are separated by empty spaces. algorithm 1. the clustering algorithm. the clustering of textual operations yields text paragraphs and smaller objects like section headings. however, in the case of graphical operations, we can obtain consistent parts of images, but usually not complete figures yet. outcomes of the clustering are utilized during the process of figures detection. algorithm 1 shows the pseudo-code of the clustering algorithm. the input of the algorithm consists of a set of pre-processed operators annotated with their affected area. the output is a division of the input set into disjoint clusters. every cluster is assigned a boundary equal to the smallest rectangle containing boundaries of all included operations. in the first stage of the algorithm (lines 6–20), we organize all input operations in a data structure of forest of trees. every tree describes a separate cluster of operations. the second stage (lines 21– 29) converts the results (clusters) into a more suitable format. 
1: input: operationset input_operations {set of operators of the same type}
2: output: map {spatial clusters of operators}
3: intervaltree tx ← intervaltree()
4: intervaltree ty ← intervaltree()
5: map parent ← map()
6: for all operation op ∈ input_operations do
7:     rectangle boundary ← extendbymargins(op.boundary)
8:     repeat
9:         operationset int_opsx ← tx.getintersectingops(boundary)
10:        operationset int_opsy ← ty.getintersectingops(boundary)
11:        operationset int_ops ← int_opsx ∩ int_opsy
12:        for all operation int_op ∈ int_ops do
13:            rectangle bd ← tx[int_op] × ty[int_op]
14:            boundary ← smallestenclosing(bd, boundary)
15:            parent[int_op] ← op
16:            tx.remove(int_op); ty.remove(int_op)
17:        end for
18:    until int_ops = ∅
19:    tx.add(boundary, op); ty.add(boundary, op)
20: end for
21: map results ← map()
22: for all operation op ∈ input_operations do
23:    operation root_ob ← getroot(parent, op)
24:    rectangle rec ← tx[root_ob] × ty[root_ob]
25:    if not results.has_key(rec) then
26:        results[rec] ← list()
27:    end if
28:    results[rec].add(op)
29: end for
30: return results

the clustering of operations is based on the relation of their rectangles being close to each other. definition 1 formalizes the notion of being close, making it useful for the algorithm.

definition 1: two rectangles are considered to be located close to each other if they are intersecting after expanding their boundaries in every direction by a margin.

the value by which rectangles should be extended is a parameter of the algorithm and might be different in various situations. to detect if rectangles are close to each other, we needed a data structure allowing the storage of a set of rectangles. this data structure was required to allow retrieving all stored rectangles that intersect a given one. we have constructed the necessary structure using an important observation about the operation result areas: in our model, all bounding rectangles have their edges parallel to the edges of the reference canvas on which the output of the operators is rendered. this allowed us to reduce our problem from the case of 2-dimensional rectangles to the case of 1-dimensional intervals. we can assume that the edges of the rectangular canvas define the coordinate system. it is easy to prove that two rectangles with edges parallel to the axes of the coordinate system intersect if and only if both their projections onto the axes intersect, and the projection of a rectangle onto an axis is always an interval. this observation has allowed us to build the required 2-dimensional data structure from two 1-dimensional data structures that store a number of intervals and, for a given interval, return the set of intersecting ones. such a 1-dimensional data structure is provided by interval trees.20 every interval inside the tree has an arbitrary object assigned to it, which in this case is a representation of the pdf operator. this object can be treated as an identifier of the interval. the data structure also implements a dictionary interface, mapping objects to actual intervals. at the beginning, the algorithm initializes two empty interval trees representing projections on the x and y axes, respectively. those trees store projections of the largest cluster areas calculated so far rather than of particular operators. each cluster is represented by the most recently discovered operation belonging to it.
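to make definition 1 and the projection argument concrete, the closeness test can be sketched as below. this is an illustrative sketch, not the extractor's code: the Rect record and its grow/intersects helpers are assumed stand-ins for the structures used in the implementation, and the margin value is the tunable parameter mentioned above.

// Minimal sketch of Definition 1: two axis-aligned rectangles are "close"
// when they intersect after being expanded by a margin, and the 2-D
// intersection test reduces to two 1-D interval tests.
public class CloseRectangles {

    record Rect(double x0, double y0, double x1, double y1) {

        Rect grow(double margin) {   // expand the boundary in every direction
            return new Rect(x0 - margin, y0 - margin, x1 + margin, y1 + margin);
        }

        // Axis-aligned rectangles intersect exactly when both their x and y
        // projections (intervals) intersect.
        boolean intersects(Rect other) {
            boolean xOverlap = x0 <= other.x1 && other.x0 <= x1;
            boolean yOverlap = y0 <= other.y1 && other.y0 <= y1;
            return xOverlap && yOverlap;
        }
    }

    static boolean close(Rect a, Rect b, double margin) {
        return a.grow(margin).intersects(b.grow(margin));
    }

    public static void main(String[] args) {
        Rect a = new Rect(0, 0, 10, 10);
        Rect b = new Rect(14, 0, 24, 10);        // 4 units to the right of a
        System.out.println(close(a, b, 1));      // false: gap larger than the margins
        System.out.println(close(a, b, 2));      // true: expanded boxes touch
    }
}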
during the algorithm execution, each operator from the input set is considered only once. the order of processing is not important. the processing of a single operator proceeds as follows (the interior of the outermost “for all” loop of the algorithm). 1. the boundary of the operation is extended by the width of margins. the spatial data structure described earlier is utilized to retrieve boundaries of all already detected clusters (lines 9–10) 2. the forest of trees representing clusters is updated. the currently processed operation is added without a parent. roots of all trees representing intersecting clusters (retrieved in previous step) are attached as children of the new operation. information technology and libraries | december 2013 33 3. the boundary of the processed operation is extended to become the smallest rectangle containing all boundaries of intersecting clusters and the original boundary. finally, all intersecting clusters are removed from the spatial data structure. 4. lines 9–17 of the algorithm are repeated as long as there exist areas intersecting the current boundary. in some special cases, more than one iteration may be necessary. 5. finally, the calculated boundary is inserted into the spatial data structure as a boundary of a new cluster. the currently processed operation is designed to represent the cluster and so is remembered as a representation of the cluster. after processing all available operations, the post–processing phase begins. all the trees are transformed into lists. the resulting data structure is a dictionary having boundaries of detected clusters as keys and lists of belonging operations as values. this is achieved in lines 21–29. during the process of retrieving the cluster to which a given operation belongs, we use a technique called path compression, known from the union-find data structure.21 filtering of clusters graphical areas detected by a simple clustering usually do not directly correspond to figures. the main reason for this is that figures may contain not only graphics, but also portions of text. moreover, not all graphics present in the document must be part of a figure. for instance, common graphical elements not belonging to a figure include logos of institutions and text separators like lines and boxes; various parts of mathematical formulas usually include graphical operations; and in the case of slides from presentations, the graphical layout should not be considered part of a figure. the above shows that the clustering algorithm described earlier is not sufficient for the purpose of figures detection and it yields a results set wider than expected. in order to take into account the aforementioned characteristics, pre-calculated graphical areas are subject to further refinement. this part of the processing is highly domain-dependent as it is based on properties of scientific publications in a particular domain, in this case publications of hep. in the course of the refinement process, previously computed clusters can be completely discarded, extended with new elements, or some of their parts might be removed. in this subsection we discuss the heuristics applied for rejecting and splitting clusters of graphical operators. there are two main reasons for rejecting a cluster. the first of them is a size being too small compared to a page size. the second is the figure candidate having its aspect ratio outside a desired interval of values. 
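a minimal sketch of these two rejection criteria follows. the numeric thresholds are illustrative placeholders only; as explained below, the real thresholds are configurable properties of the extractor whose values were assigned experimentally.

// Minimal sketch of the two cluster-rejection criteria: too small relative to
// the page, or an aspect ratio outside an accepted interval.
public class ClusterFilter {

    static boolean keep(double width, double height,
                        double pageWidth, double pageHeight) {
        double minRelativeArea = 0.02;   // placeholder: at least 2% of the page area
        double minAspect = 0.2;          // placeholder lower bound on width/height
        double maxAspect = 5.0;          // placeholder upper bound on width/height

        double relativeArea = (width * height) / (pageWidth * pageHeight);
        if (relativeArea < minRelativeArea) {
            return false;                // small decorations, formula fragments
        }
        double aspect = width / height;
        if (aspect < minAspect || aspect > maxAspect) {
            return false;                // thin separators, fraction bars, rules
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(keep(300, 200, 595, 842));  // plausible figure: true
        System.out.println(keep(400, 2, 595, 842));    // horizontal rule: false
    }
}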
the first heuristic is designed to remove small graphical elements appearing for example inside mathematical formulas, but also small logos and other decorations. the second one discards text separators and different parts of mathematical equations, such as a line-separating numerator from a denominator inside a fraction. the thresholds used for filtering are provided as automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 34 configurable properties of the algorithm and their values are assigned experimentally in a way maximising the accuracy of figures detection. additionally, the analysis of the order of operations forming the content stream of a pdf document may help to split clusters that were incorrectly joined by algorithm 1. parts of the stream corresponding to logical parts of the document usually form a consistent subsequence. this observation allows the construction of a method of splitting elements incorrectly clustered together. we can assign content streams not only to entire pdf documents or pages, but also to every cluster of operations. the clustering algorithm presented in algorithm 1 returns a set of areas with a list of operations assigned to each of them. the content stream of a cluster consists of all operations from such a set ordered in the same manner as in the original content stream of the pdf document. the usage of the original content stream allows us to define a distance in the content stream as follows: definition 2. if o 1 and o 2 are two operations appearing in the content stream of the pdf document, by the distance between these operations we understand the number of textual and graphical operations appearing after the first of them and before the second of them. to detect situations when a figure candidate contains unnecessary parts, the content stream of a figure candidate is read from the first to the last operation. for every two subsequent operations, the distance between them in the sense of the original content stream is calculated. if the value is larger than a given threshold, the content stream is split into two parts, which become separate figure candidates. for both candidates, a new boundary is calculated. this heuristic is especially important in the case of less formal publications such as slides from presentations at conferences. presentation slides tend to have a certain number of graphics appearing on every page and not carrying any meaning. simple geometrical clustering would connect elements of page style with the rest of the document content. measuring the distance in the content stream and defining a threshold on the distance facilitates the distinction between the layout and the rest of the page. this technique also might be useful to automatically extract the template used for a presentation, although this transcends the scope of this publication. clustering of textual operators the same algorithm that clusters graphical elements can cluster parts of text. detecting larger logically consistent parts of text is important because they should be treated as single entities during subsequent processing. this comprises, for example, inclusion inside a figure candidate (e.g., captions of axes, parts of a legend) and classification of a text paragraph as a figure caption. inclusion of text parts the next step in figures extraction involves the inclusion of lost text parts inside figure candidates. 
information technology and libraries | december 2013 35 at the stage of operations clustering, only the operations of the same type (graphical or textual) were considered. the results of those initial steps become the input to the clustering algorithm that will detect relations between previously detected entities. by doing this, we move one level farther in the process of abstracting from operations. we start from basic meaningless operations. later we detect parts of graphics and text, and finally we are able to see the relations between both. not all clusters detected at this stage are interesting because some might consist uniquely of text areas. only those results that include at least one graphical cluster may be subsequently considered figure candidates. another round of heuristics marks unnecessary intermediate results as deleted. applied methods are very similar to those described in “filtering of clusters” (above), only thresholds deciding on the rejections must change because we operate on geometrically much larger entities. also the way of application is different—candidates rejected at this stage can be later restored to the status of a figure. instead of permanently removing, heuristics of this stage only mark figure candidates as rejected. this happens in the case of the candidates having incorrect aspect ratio, incorrect sizes or consisting only of horizontal lines (which is usually the case with mathematical formulas but also tables). in addition to using the aforementioned heuristics, having clusters consisting of a mixture of textual and graphical operations allows the application of new heuristics. during the next phase, we analyze the type of operations rather than their relative location. in some cases, steps described earlier might detect objects that should not be considered a figure, such as text surrounded by a frame. this situation can be recognized by the calculation of a ratio between the number of graphical and textual operations in the content stream of a figure candidate. in our approach we have defined a threshold that indicates which figure candidates should be rejected because they contain too few graphics. this allows the removal of, for instance, blocks of text decorated with graphics for aesthetic reasons. the ratio between numbers of graphical and textual operations is smaller for tables than for figures, so extending the heuristic with an additional threshold could improve the table–figure distinction. another heuristic analyzes ratio between the total area of graphical operations and the area of the entire figure candidate. subsequently, we mark as deleted the figure candidates containing horizontal lines as the only graphical operations. these candidates describe tables or mathematical formulas that have survived previous steps of the algorithm. tables can be reverted to the status of figure candidates in later stages of processing. figure candidates that survive all the phases of filtering are finally considered to be figures. figure 2 shows a fragment of a publication page with indicated text areas and final figure candidates detected by the algorithm. automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 36 figure 2. a fragment of the pdf page with boxes around every detected text area and each figure candidate. dashed rectangles indicate figure candidates. solid rectangles indicate text areas. 
detection and matching of captions the input of the part of the algorithm responsible for detecting figure captions consists of previously determined figures and all text clusters. the observation of scientific publications shows that, typically, captions of figures start with a figure identifier (for instance see the grammar for figure captions proposed by bathia, lahiri, and mitra.22 the identifier usually starts with a word describing a figure type and is followed by a number or some other unique identifier. in more complex documents, the figure number might have a hierarchical structure reflecting, for example, the chapter number. the set of possible figure types is very limited. in the case of hep publications, the most usual combinations include words “figure”, “plot,” and different variations of their spelling and abbreviating. information technology and libraries | december 2013 37 during the first step of the caption detection, all text clusters from the publication page are tested for the possibility of being a caption. this consists of matching the beginning of the text contained in a textual cluster with a regular expression determining what is a figure caption. the role of the regular expression is to elect strings starting with one of the predefined words, followed by an identifier or beginning of a sentence. the identifier is subsequently extracted and included in the metadata of a caption. the caption detection has to be designed to reject paragraphs of the type “figure 1 presents results of (. . .)”. to achieve this, we reject the possibility of having any lowercase text after the figure identifier. having the set of all the captions, we start searching for corresponding figures. all previous steps of the algorithm take into account the division of a page into text columns (see “detection of the page layout” below). when matching captions with figure candidates, we do not take into account the page layout. matching between figure candidates and captions happens at every document page separately. we consider every detected caption once, starting with those located at the top of the page and moving down toward the end. for every caption we search figure candidates lying nearby. first we search above the caption and, in the case of failure, we move below the caption. we take into account all figure candidates, including those rejected by heuristics. in the case of finding multiple figure candidates corresponding to a caption, we merge them into a single figure, treating previous candidates as subfigures of a larger figure. we also include small portions of text and graphics previously rejected from figure candidates that lie between figure and caption and between different parts of a figure. these parts of text usually contain identifiers of the subfigures. the amount of unclustered content that can be included in a figure is a parameter of the extraction algorithm and is expressed as a percentage of the height of the document page. it might happen that captions are located in a completely different location, but this case is rare and tends to appear in older publications. the distance from the figure is calculated based on the page geometry. the captions should not be too distant from the figure. generation of the output the choice of the format in which data should be saved at the output of the extraction process should take into account further requirements. 
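returning briefly to the caption detection described above, the matching of a text cluster against a caption pattern might look like the sketch below. the exact regular expression used by the extractor is not reproduced here, so this pattern (a keyword such as "figure" or "plot", an identifier, and the rejection of an immediately following lowercase word, as in "figure 1 presents ...") is an assumption for illustration.

import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of testing whether a text cluster starts like a figure caption
// and of extracting the figure identifier.
public class CaptionMatcher {

    // "figure"/"fig."/"plot", an identifier like "3", "3a" or "2.1", then either
    // punctuation, end of text, or anything that is not a lowercase word.
    private static final Pattern CAPTION = Pattern.compile(
            "^(?i:fig(?:ure)?\\.?|plot)\\s+(\\d+[a-zA-Z]?(?:\\.\\d+)*)\\s*+(?:[:.\\-]|$|(?![a-z]))");

    static Optional<String> captionIdentifier(String text) {
        Matcher m = CAPTION.matcher(text.strip());
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(captionIdentifier("Figure 3: Invariant mass distribution")); // Optional[3]
        System.out.println(captionIdentifier("figure 1 presents results of the fit"));  // Optional.empty
    }
}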
the most obvious use case of displaying figures to end users in response to text-based search queries does not yield very sophisticated constraints. a simple raster graphic annotated with captions and possibly some extracted portions of metadata would be sufficient. unfortunately, the process of generating raster representations of figures might lose many important pieces of information that could be used in the future for an automatic analysis. automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 38 to store as much data as possible, apart from storing the extracted figures in a raster format (e.g., png), we also decided to preserve their original vector character. vector graphics formats, similarly to pdf documents, contain information about graphical primitives. primitives can be organized in larger logical entities. sometimes rendering of different primitives leads to a modification of the same pixel of resulting image. such a situation might happen, for example, when circles are used to draw data points lying nearby on the same plot. to avoid such issues, we convert figures into scalable vector graphics (svg) format.23 on the implementation level, the extraction of vector representation of a figure proceeds in a manner similar to regular rendering of a pdf document. the interpreter preserves the same elements of the state and allows their modification by transformation operations. a virtual canvas is created for every detected figure. the content stream of the document is processed and all the transformation operations are executed modifying the interpreter’s state. the textual and graphical operators are also interpreted, but they affect only the appropriate canvas of the figure to which the operation belongs. if a particular operation does not belong to any figure, no canvas is affected. the behaviour of graphical canvases used during the svg generation is different from the case of raster rendering. instead of creating graphical output, every operation is transformed into a corresponding primitive and saved within an svg file. pdf was designed in such a manner that the number of external dependencies of a file is minimized. this design decision led to the inclusion of the majority of fonts in the document itself. it would be possible to embed font glyphs in the svg file and use them to render strings. however, for the sake of simplicity, we decided to omit font definitions in the svg output. a text representation is extracted from every text operation, and the operation is replaced by a svg text primitive with a standard font value. this simplification affects what the output looks like, but the amount of formatting information that is lost is minimal. moreover, this does not pose a problem because vector representations are intended to be used during automatic analysis of figures rather than for displaying purposes. a possible extension of the presented method could involve embedding complete information about used glyphs. finally, the generation of the output is completed with some metadata elements. an exhaustive categorization of the metadata that can be compiled for figures could be the customization of the one proposed by liu et al. 
for table metadata.24 in the case of figures, the following categories could be distinguished: (1) environment/geography metadata (information about the document where the figure is located); (2) affiliated metadata (e.g., captions, references, or footnotes); (3) layout metadata (information about the original visualization of the figure); (4) content data; and (5) figure-type metadata. for the moment, we compile only environment/geography metadata and affiliated metadata. the geography/environment metadata consists of the document title, the document authors, the document date (creation and publication), and the exact location of a figure inside a publication (page and boundary). most of these elements are provided by simply referencing the original publication in the inspire repository. the affiliated metadata consists of the text caption and the exact location of the caption in the publication (page and boundary). in the future, metadata from other categories will be annotated for each figure.

detection of the page layout

figure 3. sample page layouts that might appear in a scientific publication. the black color indicates areas where content is present.

in this section we discuss how to detect the page layout, an issue which has been omitted in the main description of the extraction algorithm but which is essential for an efficient detection of figures. figure 3 depicts several possibilities of organising content on the page. as mentioned in previous sections, the method of clustering operations based on their geometrical position may fail in the case of documents having a complex page layout. the content appearing in different columns should never be considered as belonging to the same figure. this cannot be assured without enforcing additional constraints during the clustering phase. to address this difficulty, we enhanced the figure extractor with a pre-processing phase of detecting the page layout. being able to identify how the document page is divided into columns enables us to execute the clustering within every column separately. it is intuitively obvious what can be understood as a page layout, but to provide a method of calculating one, we need a more formal definition, which we give below. by the layout of a page, we understand a particular division of a page into areas called columns. each area is a sum of disjoint rectangles. the division of a page into areas must satisfy a set of conditions summarized in definition 3.

definition 3: let p be a rectangle representing the page. the set d containing subareas of the page is called a page division if and only if

\[
\bigcup_{q \in d} q = p, \qquad
\forall x, y \in d,\; x \neq y : x \cap y = \emptyset, \qquad
\forall q \in d : q \neq \emptyset,
\]
\[
\forall q \in d \;\; \exists r = \{\, x : x \text{ is a rectangle},\ \forall y \in r \setminus \{x\} : y \cap x = \emptyset \,\} : \; q = \bigcup_{x \in r} x .
\]

every element of a division is called a page area. to be considered a page layout, borders of areas from the division must not intersect the content of the page. definition 3 does not guarantee that the layout is unique: a single page might be assigned different divisions satisfying the definition. additionally, not all valid page layouts are interesting from the point of view of figures detection. the segmentation algorithm calculates one such division, imposing additional constraints on the detected areas. the layout-calculation procedure utilizes the notion of separators, introduced by definition 4.
definition 4: a vertical (or horizontal) line inside a page or on its borders is called a separator if its horizontal (vertical) distance from the page content is larger than a given constant value. the algorithm consists of two stages. first, the vertical separators of a sufficient length are detected and used to divide the page into disjoint rectangular areas. each area is delimited by two vertical lines, each of which forms a consistent interval inside of one of the detected vertical separators. at this stage, horizontal separators are completely ignored. figure 4 shows a fragment of a publication page processed by the first stage of the layout-detection. the upper horizontal edge of one of the areas lies too close too close to two text lines. with the constant of the definition 4 chosen to be sufficiently large, this edge would not be a horizontal separator and thus the generated division of the page would require additional processing to become a valid page layout. the second stage of the algorithm transforms the previously detected rectangles into a valid page layout by splitting rectangles into smaller parts and by joining appropriate rectangles to form a single area. information technology and libraries | december 2013 41 figure 4. example of intermediate layout-detection results requiring the refinement. algorithm 2 shows the pseudo-code of the detection of vertical separators. the input of the algorithm consists of the image of the publication page. the output is a list of vertical separators aggregated by their x-coordinates. every element of this list consists of two elements: an integer indicating the x-coordinate and the list of y-coordinates describing the separators. the first element of this list indicates the y-coordinate of the beginning of the first separator. the second element is the y-coordinate of the end of the same separator. the third and fourth elements describe the second separator and the same mechanism is used for the remaining separators (if they exist). the algorithm proceeds according to the sweeping principle known from the computational geometry.25 the algorithm reads the publication page starting from the left. for every xcoordinate value, a set of corresponding vertical separators is detected (lines 9–18). vertical separators are searched as consistent sequences of blank points. a point is considered blank if all the points in its horizontal surrounding of the radius defined by the constant from definition 5 are of the background colour. not all blank vertical lines can be considered separators. short, empty spaces usually delimit lines of text or different small units of the content. in line 11 we test detected vertical separators for being long enough. automatic extraction of figures from scientific publications in high-energy physics | praczyk, nogueras-iso, and mele 42 if a separator has been detected in a particular column of a publication page, the adjacent columns also tend to contain similar separators. lines 19–31 of the algorithm are responsible for electing the longest candidate among the adjacent columns of the page. the maximization is performed across a set of adjacent columns for which at least one separator exists. algorithm 2. detecting vertical separators. the detected separators are used to create the preliminary division of the page, similar to the one from the example of figure 4. as with the previous step, separators are considered one by one in the order of increasing x coordinate. 
at every moment of the execution, the algorithm maintains a division of the page into rectangles. this division corresponds only to the already detected vertical separators. updating the previously considered division is facilitated by processing separators in a particular, well-defined order.

1: input: the page image
2: output: vertical separators of the input page
3: list separators ← ∅
4: int max_weight ← 0
5: boolean maximizing ← false
6: for all x ∈ {minx … maxx} do
7:     emptyb ← 0, current_eval ← 0
8:     empty_areas ← list()
9:     for all y ∈ {0 … page_height} do
10:        if point at (x, y) is not blank then
11:            if y − emptyb − 1 > heightmin then
12:                empty_areas.append(emptyb)
13:                empty_areas.append(y = page_height ? y : y − 1)
14:                current_eval ← current_eval + y − emptyb
15:            end if
16:            emptyb ← y + 1
17:        end if
18:    end for
       {we have already processed the entire column. now we are comparing with adjacent, already processed columns}
19:    if max_weight < current_eval then
20:        max_weight ← current_eval
21:        max_separators ← empty_areas
22:        maxx ← x
23:    end if
24:    if maximizing then
25:        if empty_areas = ∅ then
26:            separators.add(⟨maxx, max_separators⟩)
27:            maximizing ← false, max_weight ← 0
28:        end if
29:    else
30:        maximizing ← (empty_areas ≠ ∅)
31:    end if
32: end for
33: return separators

before presenting the final outcome, the algorithm must refine the previously calculated division. this happens in the second phase of the execution. all the horizontal borders of the division are moved along adjacent vertical separators until they become horizontal separators in the sense of definition 4. typically, moving the horizontal borders results in dividing already existing rectangles into smaller ones. if such a situation happens, both newly created parts are assigned to different page-layout areas. sometimes, when moving separators is not possible, different areas are combined, forming a larger one.

tuning and testing

the extraction algorithm described here has been implemented in java and tested on a random set of scientific articles coming from the inspire repository. the testing procedure has been used to evaluate the quality of the method, but it also allowed us to tweak the parameters of the algorithm to maximize the outcomes.

preparation of the testing set

to prepare the testing set, we randomly selected 207 documents stored in inspire. in total, these documents consisted of 3,728 pages, which contained 1,697 figures altogether. the records have been selected according to a uniform probability distribution across the entire record space. this way, we have created a collection that is representative of the entire inspire, including historical entries. currently, inspire consists of: 1,140 records describing publications written before 1950; 4,695 between 1950 and 1960; 32,379 between 1960 and 1970; 108,525 between 1970 and 1980; 167,240 between 1980 and 1990; 251,133 between 1990 and 2000; and 333,864 in the first decade of the twenty-first century. in total, up to july 2012, inspire manages 952,026 records. it can be seen that the rate of growth has increased with time and most inspire documents come from the last decade. the results on such a testing set should accurately estimate the efficiency of extraction for existing documents, but not necessarily for new documents being ingested into inspire. this is because inspire contains entries describing old articles which were created using obsolete technologies or scanned and encoded in pdf.
the extraction algorithm is optimized for born-digital objects. to test the hypothesis that the extractor provides better results for newer papers, the testing set has been split into several subsets. the first set consists of publications published before 1980. the rest of the testing set has been split into subsets corresponding to decades of publication. to simplify the counting of correct figure detections and to provide a more reliable execution and measurement environment, every testing document has been split into a number of pdf documents consisting of a single page. subsequently, every single-page document has been manually annotated with the number of figures appearing inside.

execution of the tests

the efficient execution of the testing was possible thanks to a special script executing the plots extractor on every single page separately and then computing the total number of successes and failures. the script allows the execution of tests in a distributed, heterogeneous environment and allows dynamic connection and disconnection of computing nodes. in the case of a software failure, the extraction request is resubmitted to a different computation node, allowing the avoidance of problems related to a worker-node configuration rather than to the algorithm implementation itself. during the preparation of the testing set, we manually annotated all the expected extraction results. subsequently, the script compared these metadata with the output of the extractor. using aggregated numbers from all extracted pages allowed us to calculate efficiency measures of the extraction algorithm. as quality measures, we used recall and precision.26 their definitions are included in the following equations:

\[
\text{recall} = \frac{\#\ \text{correctly extracted figures}}{\#\ \text{figures present in the test set}}
\qquad\qquad
\text{precision} = \frac{\#\ \text{correctly extracted figures}}{\#\ \text{extracted figures}}
\]

at every place where we needed a single comparable quality measure rather than two semi-independent numbers, we have used a harmonic average of the precision and the recall.27 table 1 summarizes the results obtained during the test execution for every subset of our testing set. figure 5 shows the dependency of recall and precision on the time of publication. the extractor parameters used in this test execution were chosen based on intuition and a small number of manually triggered trials. in the next section we describe an automatic tuning procedure we have used to find the most optimal algorithm arguments.

                                          –1980   1980–90   1990–2000   2000–10   2010–12
number of existent figures                  114        60         170       783       570
number of correctly detected figures         59        53         164       703       489
number of incorrectly detected figures       26        78          65        40        73
total number of pages                        85       136         760      1919       828
number of correctly processed pages          20        44         712      1816       743

table 1. results of the test execution.

figure 5. recall and precision as functions of the decade of publication.

it can be seen that, as expected, the efficiency increases with the increasing time of publication. the total recall and precision for all samples since 1990, which constitute a majority of the inspire corpus, were both 88 percent. precision and recall based on the correctly detected figures do not give a full image of the algorithm efficiency, because the extraction has been executed on a number of pages not containing any figures.
the correctly extracted pages not having any figures do not appear in the recall and precision statistics because, in their case, the expected and detected numbers of figures are both equal to 0. besides recall and precision, figure 5 also depicts the fraction of pages that have been extracted correctly. taking into account the samples since 1990, 3,271 pages out of 3,507 have been detected completely correctly, which makes a 93 percent success rate counted by number of pages. as can be seen, this measure is higher than both the precision and the recall. the analysis of the extractor results in the case of failure shows that in many cases, even if the results are not completely correct, they are not far from the expectation. there are different reasons for the algorithm failing. some of them may result from a non-optimal choice of algorithm parameters, others from the document layout being too far from the assumed one. in some rare cases, even manual inspection of the document does not allow an obvious identification of figures.

the automatic tuning of parameters

in the previous section we have shown the results obtained by executing the extraction algorithm on a sample set. during this execution we were using extractor arguments which seemed to be the most correct based on our observations but also on other research (typical sizes of figures, margin sizes, etc.).28 this way of configuring the algorithm was useful during development, but it is not likely to yield the best possible results. to find better parameters, we have implemented a method of automatic tuning. the metrics described in the previous section provided a good method of measuring the efficiency of the algorithm for a given set of parameters. the choice of optimal parameters can be relative to the choice of documents on which the extraction is to be performed. the way in which the testing set has been selected allowed us to use it as representative of hep publications. to tune the algorithm, we have used a subset of the testing set from the previous step as a reference. the subset consisted of all entries created after 1990. this allowed us to minimize the presence of scanned documents which, by design, cannot be correctly processed by our method. the adjustment of parameters has been performed by a dedicated script which has executed the extraction using various parameter values and read the results. the script has been configured with a list of tuneable parameters together with their types and allowed value ranges. additionally, the script had knowledge of the believed best value, which was the one used in previous testing. to decrease the complexity of training, we have made several assumptions about the parameters. these assumptions are only an approximation of the real nature of the parameters, but practice has shown that they are good enough to permit the optimization:

• we assume that the precision and recall are continuous with respect to the parameters. this allows us to assume that the efficiency of the algorithm for parameter values close to a given one will be close. the optimization has proceeded by sampling the parametric space in a number of points and executing tests using the selected points as parameter values. having n parameters to optimize and dividing the space of every parameter into m regions leads to the execution of m^n tests.
execution of every test is a time-consuming operation due to the size of the training set.

• we assume that parameters are independent from each other. this means that we can divide the problem of finding an optimal solution in the n-dimensional space of n configuration arguments into finding n solutions in 1-dimensional subspaces. such an assumption seems intuitive and considerably reduces the number of necessary tests from o(m^n) to o(m⋅n), where m is the number of samples taken from a single dimension.

in our tests, the parametric space has been divided into 10 equal intervals in every direction. in addition to checking the extraction quality in those points, we have executed one test for the so-far best argument. in order to increase the level of fine-tuning of the algorithm, each test has been re-executed in the region where the chances of finding a good solution were considered the highest. this consisted of a region centred around the highest result and having a radius of 10 percent of the parameter space. figure 6 and figure 7 show the dependency of the recall and the precision on an algorithm parameter. the parameter depicted in figure 6 indicates what minimal aspect ratio the figure candidate must have in order to be considered a correct figure. it can be seen that tuning this heuristic increases the efficiency of the extraction. moreover, the dependency of recall and precision on the parameter is monotonic, which is the most compatible with the chosen optimization method. the parameter of figure 7 specifies which fraction of the area of the entire figure candidate has to be occupied by graphical operations. this parameter has a lower influence on the extraction efficiency. such a situation can happen when more than one heuristic influences the same aspect of the extraction. this is contradictory with the assumption of parameter independence, but we have decided to use the present model for simplicity.

figure 6. effect of the minimal aspect ratio on precision and recall.

figure 7. effect on the precision and recall of the area fraction occupied by graphical operations.

after executing the optimization algorithm, we have managed to achieve a recall of 94.11 percent and a precision of 96.6 percent, which is a considerable improvement compared to the previous results of 88 percent.

conclusions and future work

this work has presented a method for extracting figures from scientific publications in a machine-readable format, which is the main step toward the development of services enabling access and search of images stored in scientific digital libraries. in recent years, figures have been gaining increasing attention in the digital libraries community. however, little has been done to decipher the semantics of these graphical representations and to bridge the semantic gap between content that can be understood by machines and content that is managed by digital libraries. extracting figures and storing them in a uniform and machine-readable format constitutes the first step towards the extraction and the description of the internal semantics of figures.
storing semantically described and indexed figures would open completely new possibilities for accessing the data and discovering connections between different types of publishing artefacts and different resources describing related knowledge.29 our method of detecting fragments of pdf documents that correspond to figures is based on a series of observations of the character of publications. however, tests have shown that additional work is needed to improve the correctness of the detection. also, the performance should be reevaluated once we have a large set of correctly annotated figures, confirmed by users of our system. the heuristics used by the algorithm are based on a number of numeric parameters that we have tried to optimize using automatic techniques. the tuning procedure has made several arbitrary assumptions about the nature of the dependency between parameters and extraction results. a future approach to the parameter optimization, requiring much more processing, could involve the execution of a genetic algorithm that would treat the parameters as gene samples.30 this could potentially allow the discovery of a better parameter set because a smaller set of assumptions would be imposed on the parameters. a vector of algorithm parameters could play the role of a gene, and random mutations could be introduced to previously considered and subsequently crossed genes. the evaluation and selection of surviving genes could be performed using the metrics described previously. another approach to improving the quality of the tuning could involve extending the present algorithm with a discovery of mutually dependent parameters and the usage of special techniques (relaxing the assumptions) to fine-tune in the subspaces spanned by these parameters. all of our experiments have been performed using a corpus of publications from hep. the usage of the extraction algorithm on a different corpus would require tuning the parameters for the specific domain of application. for the area of hep, we can also consider preparing several sets of execution parameters varying by decade of document publication or by other easy-to-determine characteristics. subsequently, we could decide which set of extraction parameters to use based on those characteristics. in addition to a better tuning of the existing heuristics, there are improvements that can be made at the level of the algorithm. one example is extending the process of clustering text parts. in the current implementation, the margins by which textual operations are extended during the clustering process are fixed as algorithm parameters. this approach proved to be robust in most cases. however, distances between text lines tend to differ depending on the currently utilized style, and every text portion tends to have one style that dominates. an improved version of the text-clustering algorithm could use local rather than global properties of the content. this would not only make it possible to correctly handle a document written using several different text styles, but would also help to manage cases of single paragraphs differing from the rest of the content. another important, not-yet-implemented improvement related to figure metadata is the automatic extraction of figure references from the text content. important information about figure content might be stored in the surroundings of the place where the publication text refers to a figure.
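the genetic-algorithm tuner is proposed above as future work and is not implemented in the paper. the sketch below is one minimal way it could look under the stated assumptions: a parameter vector plays the role of a gene, random mutations are applied to previously considered and crossed genes, and selection is driven by the extraction metrics. it reuses the hypothetical run_extraction, f_score, and PARAM_RANGES names from the earlier sketches; the population size and mutation rate are arbitrary illustrative values, and every fitness evaluation is a full extraction run, which is what makes this approach much more processing-intensive.

    import random

    def random_gene():
        """a gene is a full parameter vector drawn uniformly from the allowed ranges."""
        return {name: random.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

    def crossover(a, b):
        """for each parameter, inherit the value from one of the two parent genes."""
        return {name: random.choice((a[name], b[name])) for name in PARAM_RANGES}

    def mutate(gene, rate=0.1):
        """with a small probability, re-draw a parameter from its allowed range."""
        return {name: (random.uniform(*PARAM_RANGES[name]) if random.random() < rate else value)
                for name, value in gene.items()}

    def evolve(run_extraction, population_size=20, generations=10):
        population = [random_gene() for _ in range(population_size)]
        for _ in range(generations):
            # evaluation and selection: keep the half of the population with the best f-score
            scored = sorted(population,
                            key=lambda g: f_score(*run_extraction(g)), reverse=True)
            survivors = scored[: population_size // 2]
            # recombination and mutation to refill the population
            children = [mutate(crossover(*random.sample(survivors, 2)))
                        for _ in range(population_size - len(survivors))]
            population = survivors + children
        return max(population, key=lambda g: f_score(*run_extraction(g)))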
furthermore, the metadata could be extended through the use of some type of classifier that would assign a graphics type to the extracted result. currently, we only distinguish between tables and figures, based on simple heuristics involving the number and type of graphical areas and the text inside the detected caption. in the future, we could distinguish line plots from photographs, histograms, and so on. such a classifier could be implemented using artificial intelligence techniques such as support vector machines.31 finally, partial results of the figure-extraction algorithm might be useful in performing other pdf analyses:
• the usage of clustered text areas could allow a better interpretation and indexing of textual content stored in digital libraries with full-text access. clusters of text tend to describe logical parts like paragraphs, section and chapter titles, etc. a simple extension of the current schema could allow the extraction of the predominant formatting style of the text encoded in a page area. text parts written in different styles could be indexed in a different manner, giving, for instance, more importance to segments written with a larger font.
• we mentioned that the algorithm detects not only figures but also tables. a heuristic is being used in order to distinguish tables from different types of figures. our present effort concentrates on the correct treatment of figures, but a useful extension could allow the extraction of different types of entities. for instance, another type of content ubiquitous in hep documents is the mathematical formula. thus, in addition to figures, it would be important to extract tables and formulas in a structured format that allows further processing. the internal architecture of the implemented prototype of the figure extractor allows easy implementation of extension modules which can compute other properties of pdf documents.

acknowledgements

this work has been partially supported by cern, and the spanish government through the project tin2012-37826-c02-01.

references

1. saurabh kataria, "on utilization of information extracted from graph images in digital documents," bulletin of ieee technical committee on digital libraries 4, no. 2 (2008), http://www.ieee-tcdl.org/bulletin/v4n2/kataria/kataria.html.
2. marti a. hearst et al., "exploring the efficacy of caption search for bioscience journal search interfaces," proceedings of the workshop on bionlp 2007: biological, translational and clinical language processing: 73–80, http://dl.acm.org/citation.cfm?id=1572406.
3. lisa johnston, "web reviews: see the science: scitech image databases," sci-tech news 65, no. 3 (2011), http://jdc.jefferson.edu/scitechnews/vol65/iss3/11.
4. annette holtkamp et al., "inspire: realizing the dream of a global digital library in high-energy physics," 3rd workshop conference: towards a digital mathematics library, paris, france (july 2010): 83–92.
5. piotr praczyk et al., "integrating scholarly publications and research data—preparing for open science, a case study from high-energy physics with special emphasis on (meta)data models," metadata and semantics research—ccis 343 (2012): 146–57.
6. piotr praczyk et al., "a storage model for supporting figures and other artefacts in scientific libraries: the case study of invenio," 4th workshop on very large digital libraries (vldl 2011), berlin, germany (2011).
7. "sciverse science direct: image search," elsevier, http://www.info.sciverse.com/sciencedirect/using/searching-linking/image.
8. guenther eichhorn, "trends in scientific publishing at springer," in future professional communication in astronomy ii (new york: springer, 2011), doi: 10.1007/978-1-4419-8369-5_5.
9. william browuer et al., "segregating and extracting overlapping data points in two-dimensional plots," proceedings of the 8th acm/ieee-cs joint conference on digital libraries (jcdl 2008), new york: 276–79.
10. saurabh kataria et al., "automatic extraction of data points and text blocks from 2-dimensional plots in digital documents," proceedings of the 23rd aaai conference on artificial intelligence, chicago (2008): 1169–74.
11. saurabh kataria, "on utilization of information extracted from graph images in digital documents," bulletin of ieee technical committee on digital libraries 4, no. 2 (2008), http://www.ieee-tcdl.org/bulletin/v4n2/kataria/kataria.html.
12. ying liu et al., "tableseer: automatic table metadata extraction and searching in digital libraries," proceedings of the 7th acm/ieee-cs joint conference on digital libraries (jcdl '07), vancouver (2007): 91–100.
13. william s. cleveland, "graphs in scientific publications," american statistician 38, no. 4 (1984): 261–69, doi: 10.1080/00031305.1984.10483223.
14. hui chao and jian fan, "layout and content extraction for pdf documents," document analysis systems vi, lecture notes in computer science 3163 (2004): 213–24.
15. at every moment of the execution of a postscript program, the interpreter maintains many variables. some of them encode current positions within the rendering canvas. such positions are used to locate the subsequent character or to define the starting point of the subsequent graphical primitive.
16. transformation matrices are encoded inside the interpreter's state. if an operator requires arguments indicating coordinates, these matrices are used to translate the provided coordinates to the coordinate system of the canvas.
17. graphical operators are those that trigger the rendering of a graphical primitive.
18. textual operations are the pdf instructions that cause the rendering of text. textual operations receive the string representation of the desired text and use the current font, which is saved in the interpreter's state.
19. operations that do not produce any visible output, but solely modify the interpreter's state.
20. herbert edelsbrunner and hermann a. maurer, "on the intersection of orthogonal objects," information processing letters 13, nos. 4–5 (1981): 177–81.
21. thomas h. cormen, charles e. leiserson, and ronald l. rivest, introduction to algorithms (cambridge: mit electrical engineering and computer science series, 1990).
22. sumit bhatia, shibamouli lahiri, and prasenjit mitra, "generating synopses for document-element search," proceedings of the 18th acm conference on information and knowledge management, new york (2009): 2003–6, doi: 10.1145/1645953.1646287.
23.
jon ferraiolo, ed., "scalable vector graphics (svg) 1.0 specification," w3c recommendation, 1 september 2001, http://www.w3.org/tr/svg10/.
24. liu et al., "tableseer."
25. cormen, leiserson, and rivest, introduction to algorithms.
26. ricardo a. baeza-yates and berthier ribeiro-neto, modern information retrieval (boston: addison-wesley, 1999).
27. ibid.
28. cleveland, "graphs in scientific publications."
29. praczyk et al., "a storage model for supporting figures and other artefacts in scientific libraries."
30. stuart russell and peter norvig, artificial intelligence: a modern approach, 3rd ed. (prentice hall, 2009).
31. sergios theodoridis and konstantinos koutroumbas, pattern recognition, 3rd ed. (boston: academic press, 2006).

the free software alternative: freeware, open-source software, and libraries

james e. corbly

james e. corbly (james.corbly@gmail.com), from austin, minnesota, has been studying freeware and open-source software for over a decade.

abstract

this paper will introduce the reader to the world of freeware and open-source software. following a brief introduction, the author presents an overview of these types of software. next comes a discussion of licensing issues unique to freeware and open-source software, which leads directly to issues of registration. the author then offers several strategies readers can adopt to locate these software packages on the web. the author then addresses questions regarding the use of freeware and open-source software before offering a few closing thoughts.

introduction

i first recognized the potential savings in time, money, and labor offered to librarians and others by freeware and open-source software while i was head of technical and automated services at st. ambrose university in davenport, iowa. among other responsibilities, i oversaw the cataloging and processing of all new library materials. normally, i created original records on oclc whenever i cataloged. one of the tools i needed to complete this work (particularly with foreign language materials) was an ascii chart to provide instructions for making characters not found in english, such as ß, ¿, and ç. today a search for such symbols is relatively easy (most can be found on any character map), but twenty years ago, it presented more of a challenge. i spent many fruitless hours looking for the implement i needed. after much searching, i discovered the right tool—david lord's ascii chart. this freeware program featured several tables: one for ansi characters, one for control characters, an ebcdic table (extended binary coded decimal interchange code), a palette (for colors), and a list of ibm pc characters. the first and last charts were the ones i referred to the most. whenever i needed a special symbol in a windows-based program, the ibm pc chart gave me the formula i needed to make it. any time i was in dos or a dos-based utility (as i was with oclc), i consulted the ansi chart to form a diacritical mark. ascii chart was a great piece of software that saved me many hours of work and helped me formulate accurate documents and records.
in another instance, shortly after i assumed my duties as the director of library services at kansas wesleyan university, the time clock that registered the work hours of student workers broke down. it was an ancient machine that demanded frequent repairs. and although it dutifully printed check-in and check-out hours on time cards, it did not calculate the number of hours worked, nor could it prevent such abuses as students punching in and out for one another. rather than taking the machine to a repair shop, i sought an alternative method of record keeping. i found one: a computer program from guia international called picture timeclock,1 which not only registers each student's work hours but also totals them, so that monthly summaries of each student's work record can be compiled with increased speed, ease, and accuracy. the software lists among its features a photo identification module that prohibits one student from signing in or out for another. picture timeclock is freeware, so no monies were involved in its procurement. i then took this process a step farther. each month, i digitized the work record sheets required by the business office so i could employ another freeware program, a pdf editor, to transfer the records from picture timeclock onto the digitized time sheets. together, these two programs reduced the amount of time needed to document student worker hours by 88% while enhancing the correctness of the submitted records. as an added bonus, i obtained a permanent digital copy of each student's work record which could be readily accessed whenever necessary. today, i still rely on freeware and open-source software to solve many of my workaday problems. i utilize these packages for a variety of purposes including web content mining, manipulating pdf documents, and keeping my computers functioning at their optimal level.

what are freeware and open-source software?

among the major types of software (commercial, shareware, adware, etc.), freeware and open-source software are unique. freeware is copyrighted software given away by its owner (normally the author) for others to use. the author retains sole possession of the copyright, so users cannot alter the software. freeware authors allow individuals to use their productions in any legal manner, but they do not allow anyone to sell the software for a profit. additionally, many freeware suppliers impose boundaries on the use of their products, restricting their application in commercial endeavors, for example. open-source software carries the concept of freeware to its ultimate conclusion. with open source products, the copyright holder gives others the right to study, modify, and distribute the software free of charge to anyone for any purpose. quite often, open source products result from the collaborative efforts of contributors living in numerous locations around the world. raw program code, along with the compiled program, is available to anyone who is willing to obtain it, scrutinize it, and make additions or improvements to it, in the expectation that the combined efforts of many people will result in a product increasingly useful and reliable to end users.
although some open source products lack documentation, many (if not most) have active user groups or communities which serve as sources of assistance to users. to be considered open source, a software product must meet several criteria, among which are the following: the free software alternative| corbly 67 • the software must be freely available without cost, royalties, or fees of any kind. • the program must be distributed as source code (for programmers) and compiled code (for end users). • end users and programmers may alter the program’s code. • the modified source code must be redistributed under the same conditions as the license for the original version of the software.2 one must guard against the temptation to equate freeware and open-source software with publicdomain software. this latter category of software is not copyrighted; hence, it is free of all costs and can be employed without any restrictions whatsoever.3 freeware and open-source software possess copyrights. although many users may be unfamiliar with the concepts of freeware and open-source software, they nonetheless rely on them every day in their routine tasks. if one uses the web, one needs to use a web browser. those who employ microsoft’s internet explorer are using one of the most popular freeware products available,4 while firefox users rely on an open source package.5 many freeware and open source systems are standards in their fields. ccleaner (freeware) has won numerous awards for its efficiency not only as a cleaner and system optimization tool but also as a guardian of user privacy.6 audacity (open source) is a sound recorder and editor employed not only by amateurs but also by professionals in the field.7 access to the web would be impossible without the use of apache (open source), the number one http server.8 it is also important to note that software can change type. freeware can become shareware, commercial software can morph into freeware, etc. licensing licensing of open source products is rather straightforward. although there are over sixty-five different open source licenses, one predominates the general public license (gpl). this is the most popular license used for open-source software. the gpl serves as the license for approximately 70% of open source products. the gpl first appeared in 1989. richard stallman, formerly of the massachusetts institute of technology and founder of the nonprofit free software foundation, authored it. modifications made to the gpl helped keep it vibrant as the years passed and the information technology world changed. as of this writing, the gpl in use is the third version, which came out on june 29, 2007. the foundation of the gpl consists of four principles: 1. the right of individuals to use software for any purpose; 2. the right of individuals to alter software to meet individual needs; 3. the right of individuals to share software with others; and information technology and libraries | september 2014 68 4. the right of individuals to freely distribute the changes one makes to software.9 to these ends, the gpl gives end users the right to freely reproduce and distribute copies of a software program’s source code, providing that each copy displays a copyright notice, a disclaimer of warranty, a copy of the gpl, and gpl notices. the right to modify the software’s code and freely distribute it to others, taking care to list all modifications made to the code and insuring that every condition outlined above is met. 
commentators often refer to gpl as “copyleft” licensing. copyleft is a method of making software freely available and requires all modified versions to meet the criteria already listed. one can read the text of this license at: https://www.gnu.org/licenses/gpl-3.0.txt.10 in addition to the gpl, open-source software authors distribute their work under other arrangements such as the berkeley software distribution licenses (bsdl), the mozilla public license, the nasa open source agreement, and the common public license. freeware licensing is not nearly as uniform as open source. there is no freeware equivalent to the gpl and the rights and responsibilities of the copyright owner vary from program to program. having noted this, there are clauses that many freeware licenses hold in common. among these are the following: • the copyright to the software is retained either by the author of the software or its provider. • end users may install the software and use it for any legal purpose. • one may install and use the software only on a specified number of computers. • users may copy and distribute the software provided the original copyright remains intact. • one cannot charge a fee for copies of the freeware save for distribution costs. • the software is provided “as is” the copyright owner assumes no liability for any damage caused by product usage. • freeware frequently has usage limits. many freeware licenses permit only personal or noncommercial use of the product. there may be limits on use of the freeware with other software packages and restrictions on freeware use over a network. with such a variety of clauses such as these, it is hardly surprising that freeware licenses vary in length. one freeware program i regularly use has a license consisting of three small paragraphs, while another boasts a license five pages in length. due to these characteristics of freeware licensing, i always copy the text of a freeware license to a blank page of my word processor. i keep this document in the master file of the software so that it is readily available should it be required for any exigency. https://www.gnu.org/licenses/gpl-3.0.txt the free software alternative| corbly 69 software registration the above section on licensing leads directly to questions regarding registration. in this arena, freeware and open-source software differ from other types of software. when employing freeware and open source products, one often finds differences between “personal” and “business” use. many of these packages allow unlimited use of the software as long as its use is strictly personal. in other words, if one downloads a software program, installs it on one’s computer, and utilizes it strictly for one’s own, non-commercial purposes, then use of the software is free. personal use refers to all usage that does not generate financial income, such as scrap-booking, creating personal websites, personal blogs, and print jobs such as flyers, posters, and t-shirts for grade schools and local food banks. in this case, one would simply register the software in one’s name. additionally, charities and other non-profit institutions (such as public, academic, and school libraries) may ordinarily employ freeware and open-source software under the same precincts as individuals, i.e., as personal users. however, suppose one works at a for-profit enterprise and desires to install the software on his/her office computer. 
naturally, since the firm owns all its office computers, one usually registers the software in the company’s name. under such conditions, many freeware packages oblige the user to seek permission from the software owner for such use. commercial use may also require a fee from the user to the software owner. these caveats also apply to individuals who obtain this software for engagement in their own moneymaking endeavors (such as freelancing). the end-user license agreement, included with all freeware packages, contains information to guide users in such contingencies. to avoid worrying about these details, some freeware and open source users simply treat these programs as if they were commercial products and register their use accordingly. this is a safe practice to adopt. one may be surprised to learn that numerous freeware and open source packages do not require registration at all; others regard user registration as a voluntary exercise. both stipulations presume non-commercial use of the software. locating freeware and open-source software finding freeware and open-source software is a rather simple process. a good place for the neophyte to start searching for them is at datamation.11 datamation is a periodical that began life in hard copy format in 1957 but morphed into an e-journal in the 1990s (the final print edition appeared in february 1998). to access the list of open source replacements for popular commercial security tools, for example, simply click on the “open source” heading of the menu bar on the home page of the website. scroll down the next page to discover the list of alternatives to information technology and libraries | september 2014 70 commercial security software in such categories as anti-malware, backup, and browsers, until finally, the reader reaches the last category, web filtering. another important resource comes from pcmag.com. each month, the site presents a list of the best free software available in a particular category such as firewalls, video conferencing, antivirus, and presentation software. both freeware and open-source software are included in each category. last year, experts in the field examined over one hundred free software packages from nine categories. most of the evaluated packages operate on either windows 7 or windows 8, although programs designed for other platforms, such as mac os and android, were included in the lists. also assayed are free cloud-based web applications which run in a browser. note that the lists of free software from 2012 and 2013 are readily available.12 john corpuz’s “45 free and useful windows applications” is a slideshow presenting detailed information on useful applications from a variety of software categories. embedded within each software description is a link which, when clicked, takes the reader to a window where that software may be downloaded.13 dottech offers one of the most comprehensive lists of freeware for windows available. this list consists of individual reports organized around nine categories: cleaning and maintenance, communication, files & documents, work & productivity, pdf tools, privacy and security, multimedia, network/internet tools, and miscellaneous. each report is well-written, concise, and features the type of information computer users need. although i disagree on some of the choices labeled as “best,” i cannot but appreciate the exhaustive nature of these documents. 
since dottech continuously revises and expands the number of reports in this set, wise users make periodical visits to this site.14 a quick search of the web will bring one to the open-source software directory where one will find open source applications listed by broad categories on the left side of the home page, categories which are then separated into topics before being subdivided by function. to discover network management software, for example, find “administrators” (category), then “networking” (topic), and finally, click on “network management” (function). in this manner, one will obtain a list of open source products matching this description.15 techradar provides another register of recommended free software.16 this site provides detailed descriptions of seventy-six freeware packages, ranging from productivity software to games. a link takes users to techradar’s download channel where one may read information on freeware and open-source software categorized by function and specific name.17 one cannot discuss freeware without mentioning the world’s largest supplier of freeware, microsoft. veteran users of its windows operating system and its office suite are undoubtedly cognizant of the templates and other helps the firm offers them. those implements represent merely a sampling of the valuable and diverse tools the company makes available to the computing world. microsoft provides access to their abundance of freeware via a download site on the web;18 the free software alternative| corbly 71 however, many people find the site difficult to navigate. for this reason, many individuals consult a friendlier, third party site that opens the doors to this unique collection of freeware. one of the better examples of these sites can be found at gizmo’s freeware.19 internet download directories offer an easy and convenient avenue for one to obtain freeware and open-source software. there are a plethora of such sites on the web; unfortunately, they are not all of equal value. some sites are simply better than others. here is a brief list of criteria one may employ to judge these sites: • ease of use. is the site easy to navigate? does it feature categories that enable one to search for a specific type of software? if it contains categories, does their organization enable one to find software quickly and efficiently? • language. can one easily comprehend the language used to describe the software? does the language express complex concepts in laypeople’s terms or in a manner targeted to information technology professionals? • are the software packages accurately described? do the descriptions detail desirable and undesirable traits of the software? do they clearly indicate the operating system required by the software? what other prerequisites does the software require for effective operation? are alternatives to that software specified? • are software reviews presented? if so, who wrote the reviews laypeople or information technology professionals? do the reviews offer download statistics? does the site contain a ratings system easily understood by the user? • are aids available to help users make optimum use of the software? if aids are available, what format are they in videos, documentation, other? • look for other features which may prove useful, such as: is there a link to the software’s home page? does the download directory provide access to more than one download site? 
among the download sites meeting these criteria are the following: cnet download.com http://download.cnet.com/windows/ majorgeeks.com http://www.majorgeeks.com/ softpedia http://www.softpedia.com/ filehorse.com http://www.filehorse.com/ softonic http://en.softonic.com/ filehippo.com http://www.filehippo.com/ freewareweb http://www.freewareweb.com/ http://download.cnet.com/windows/ http://www.majorgeeks.com/ http://www.softpedia.com/ http://www.filehorse.com/ http://en.softonic.com/ http://www.filehippo.com/ http://www.freewareweb.com/ information technology and libraries | september 2014 72 freeware guide http://www.freeware-guide.com/ freeware files http://freewarefiles.com/ sourceforge http://sourceforge.net/ this brief list does not even begin to exhaust the number of internet download directory websites available for use. frequent visits to these and similar websites will prove amply rewarding. another method of finding this software is to simply look for it on the web via a search engine. to do this most efficiently, one will need the proper name of the software one is trying to obtain. however, if one has that element, one will find this an effective technique of obtaining freeware and open-source software. how does one seek freeware and open source equivalents to commercial software and shareware? first, one may consult a website entitled alternativeto.20 this website will present one with a list of alternatives to specifically named commercial software packages. one has only to click on the search dialog box, key in the name of the commercial product one wishes to find alternatives to, and press the search icon. the list of results will include freeware, open-source software, and commercial products. clicking on the name of the product with transport one to that product’s home page, where one will gain additional information on that product and be given an opportunity to download the product onto one’s computer. secondly, one may consult any of several lists of software equivalents from the web. one of the better of these registers is “100 open source apps to replace everyday software,” by cynthia harvey.21 this list provides not only names of individual open source projects and the commercial packages they replace but also links to homepages of these projects. library-specific freeware and open-source software in the past several years, an increasing number of librarians have turned to freeware and opensource software to help them fulfill the duties they are obligated to discharge. one of the most renowned examples of library-specific open-source software is koha, an integrated library system.22 programmers and librarians developed koha fifteen years ago for the horowhenua library trust in new zealand. since then, libraries of all types, including public, academic, and school libraries, have adopted koha as their integrated library system. numerous consortia across the globe also employee koha to meet their needs and those of their users. one way novices may apprise themselves of this software is to consult websites such as the creative librarian, which features a page entitled “open-source software for libraries,” where software is enumerated by library function.23 any type of library, irrespective of the library’s size, can utilize the software described on this page. 
http://www.freeware-guide.com/ http://freewarefiles.com/ http://sourceforge.net/ the free software alternative| corbly 73 questions regarding the use of freeware and open-source software all this is not to say that freeware and open-source software do not have their challenges. for instance, in one anecdote i related in the introduction in this paper, i did not name the pdf editor i used at kansas wesleyan. that is because the software is no longer available; unfortunately, neither is david lord’s ascii chart. whenever one spots a piece of freeware or open-source software that may be useful, it is imperative to download it immediately. the availability of many packages is limited, and once gone, they are usually difficult, if not impossible, to obtain. another concern regards documentation. when i first obtained picture timeclock, for example, a complete set of instructions was available from the software provider. that is no longer the case. although an increasing number of freeware and open source packages offer documentation, many do not. however, as noted earlier in this report, many open-source software products have user groups called communities that exist not only to improve the software but also to provide technical assistance to those who use the software. downloading freeware and open-source software presents its own quandaries. even though most providers of these packages go to great lengths to insure the cleanliness of their product, it is nevertheless true that viruses and malware sometimes attach themselves to this software. whenever my security software activates during a download, i immediately cease the downloading process and make a note of the site for future reference. additionally, i always run my security software against all software downloads before installing them in order to keep my system free of any potential threats. issues also arise from employing freeware and open-source software in business offices. individuals bring most of this software into enterprise environments. since the organization itself doesn’t procure this software, the corporation’s information technology personnel may be reluctant to provide support. indeed, the corporation’s it department may not even permit an individual to download any outside software whatsoever onto a system under their domain. before one attempts to install such software (regardless of type of software) on one’s business unit, one should check with the company’s it people to obtain their views on the proposed installation. closing thoughts one question remains: why bother with freeware and open-source software? are librarians searching for new software programs to master? do they need an additional task to add to their todo lists? are freeware and open-source software worthy of the attention of already overworked and stressed-out librarians? yes, they are worthy of our attention. why? for three key reasons. first, freeware and open-source software are cost-effective. for no monies whatsoever, freeware and open-source software offer librarians the opportunity to add important new tools to the arsenal of implements at their information technology and libraries | september 2014 74 disposal. that means that badly needed funds can be more strategically used by the library to help enable it to fulfill its mission to its clientele. secondly, freeware and open-source software enable librarians to make increased use of computer hardware. 
computers are machines: they require software to not only tell them which tasks to execute but also to provide instructions for performing those tasks. with this software, the range of computer capabilities not only expands in terms of numbers but also increases in terms of efficiency. the bottom line: freeware and open-source software enhance the value of computer hardware to the library and its patrons. finally, with the assistance of freeware and open-source software, librarians become better librarians. they manage their time more effectually, make better use of the resources at their disposal, and elevate the degree of customer service at all levels of the organization. freeware and open-source software can expedite the handling of routine assignments and make possible the fulfillment of other jobs that, due to time and human limitations, are difficult, if not impossible, to address. freeware and open-source software are good for librarians, good for the library, and good for those who depend on the library for the fulfillment of their information needs. they truly foster what many individuals refer to as a “win-win situation” in the world of information acquisition, organization, preservation, and retrieval. urls cited 1. “picture timeclock,” guia international corporation, http://workschedules.com/store/product/picture_time_clock.aspx. 2. “the open source definition,” open source initiative, http://opensource.org/docs/osd. 3. “public-domain software,” webopedia: online tech dictionary for it professionals, http://www.webopedia.com/term/p/public_domain_software.html. 4. “fast and fluid for windows 7: get internet explorer 11,” microsoft, http://windows.microsoft.com/en-us/internet-explorer/download-ie. 5. “firefox web browser,” mozilla, http://www.mozilla.org/en-us/firefox/new/. 6. “ccleaner,” piriform, http://www.piriform.com/ccleaner. 7. “audacity,” audacity: free audio editor and recorder, http://audacity.sourceforge.net/. 8. “apache,” the apache http server project, http://httpd.apache.org/. 9. “a quick guide to gplv3,” gnu operating system, http://www.gnu.org/licenses/quick-guidegplv3.html. 10. “gnu general public license,” gnu operating system, https://www.gnu.org/licenses/gpl3.0.txt. http://workschedules.com/store/product/picture_time_clock.aspx http://opensource.org/docs/osd http://www.webopedia.com/term/p/public_domain_software.html http://windows.microsoft.com/en-us/internet-explorer/download-ie http://www.mozilla.org/en-us/firefox/new/ http://www.piriform.com/ccleaner http://audacity.sourceforge.net/ http://httpd.apache.org/ http://www.gnu.org/licenses/quick-guide-gplv3.html http://www.gnu.org/licenses/quick-guide-gplv3.html https://www.gnu.org/licenses/gpl-3.0.txt https://www.gnu.org/licenses/gpl-3.0.txt the free software alternative| corbly 75 11. “about us: datamation,” datamation, http://www.datamation.com/about/. 12. “the best free software,” pcmag.com, http://www.pcmag.com/article2/0,2817,2381528,00.asp. 13. “45 free and useful applications,” tom’s guide: tech for real life, http://www.tomsguide.com/us/pictures-story/286-39-best-free-windows-apps.html. 14. “best windows free software,” dottech, http://dottech.org/best-free-windows-software. 15. “open-source software directory,” http://www.opensourcesoftwaredirectory.com/. 16. “the best free software for your pc: essential pc programs you should download today,” techradar, http://www.techradar.com/us/news/software/the-best-free-software-for-yourpc-1221029 . 17. 
“newest downloads,” techradar, http://www.techradar.com/us/downloads. 18. “microsoft download center,” microsoft corporation, http://www.microsoft.com/enus/download/. 19. “best free microsoft downloads,” gizmo’s freeware: the best freeware reviewed and rated, http://www.techsupportalert.com/content/best-free-microsoft-downloads.htm. 20. “alternativeto,” http://alternativeto.net/. 21. “100 open source apps to replace everyday software,” datamation, http://www.datamation.com/open-source/100-open-source-apps-to-replace-everydaysoftware-1.html. 22. “koha library software,” official website of koha library software, http://kohacommunity.org/. 23. “open-source software for libraries,” the creative librarian, http://creativelibrarian.com/library-oss/. http://www.datamation.com/about/ http://www.pcmag.com/article2/0,2817,2381528,00.asp http://www.tomsguide.com/us/pictures-story/286-39-best-free-windows-apps.html http://dottech.org/best-free-windows-software http://www.opensourcesoftwaredirectory.com/ http://www.techradar.com/us/news/software/the-best-free-software-for-your-pc-1221029 http://www.techradar.com/us/news/software/the-best-free-software-for-your-pc-1221029 http://www.techradar.com/us/downloads http://www.microsoft.com/en-us/download/ http://www.microsoft.com/en-us/download/ http://www.techsupportalert.com/content/best-free-microsoft-downloads.htm http://alternativeto.net/ http://www.datamation.com/open-source/100-open-source-apps-to-replace-everyday-software-1.html http://www.datamation.com/open-source/100-open-source-apps-to-replace-everyday-software-1.html http://koha-community.org/ http://koha-community.org/ http://creativelibrarian.com/library-oss/ lib-mocs-kmc364-20131012113604 233 lit a a ward, 1980: maurice j. freedman s. michael malinconico this is the third presentation of the lit a award for outstanding achievement. the first two honored individuals whose achievements can be said to have created the discipline we know as library automation. the first award went to fred kilgour whose vision, daring, and entrepreneurial and managerial skills changed the way libraries operate almost overnight, and may in the increasingly stringent economic times ahead have helped ensure the economic viability of libraries. the second award went to henriette avram, whose untiring efforts on behalf of the marc formats and their promulgation is only just short of legendary. this year's winner distinguished himself in a somewhat different manner. his contributions did not lead to the development of new automated systems or services. rather, his outstanding achievement lies in the creative and pioneering use he made of technology in support of a clear vision of effective library service. his contribution comes from the depth of sensitivity and understanding he brought to the application of technology to library service. much to our go<;>d fortune, he has chosen to share with us through his many writings the insights he has found in his study of the fit between technology and the delivery of effective library service. this year's winner shares the distinction, with the two previous winners, of being a former president of the division. in fact, he presided over the change from the venerable acronym isad to the new name of the division: library and information technology association (lita). it gives me particular pleasure to present this year's award, as it goes not simply to an esteemed colleague but to a valued friend. 
i first met maurice (mitch) freedman at the first ala conference i attended-the midwinter meeting of 1972. the first session i attended at that conference was a meeting of the committee on library automation (cola). i had gone to that meeting to report on nypl's automated cataloging system, which had that month become fully operational with the publication of the book catalogs of the research libraries and of the mid-manhattan library. following the cola program, mitch approached me, introduced himself, and inquired about the possibility of using the nypl system to produce hennepin county's catalog. the consequences of that afternoon were most salutary both for the hennepin county library (hcl) and for me personally. hcl acquired at no cost an automated bibliographic control system, and i gained a friendship that has endured for nearly a decade. thus, rather than dwelling on mitch's professional accomplishments-which are already well known to you-1 would prefer to say a few words about the man himself. perhaps the best way to characterize him is to describe to you his office at 234 journal of library automation vol. 14/3 september 1981 maurice freedman (left) receiving 1980 lita award presented by s. michael malinconico (right). columbia university. prominently displayed on the walls are two enormous posters, one of bertrand russell and another of lenny bruce. a perhaps odd pair until one realizes that these men had one important attribute in common: neither of them accepted, without incontrovertible proof, truths supported by conventional wisdom alone. mitch, like the philosopher and satirist whose images grace the walls of his office, is an iconoclast who insists on more than the endorsement of reigning authority before he will embrace an idea; and he will work tirelessly to change the prevailing wisdom if he finds that it serves to frustrate rather than aid the delivery of the kind and quality of library service to which he feels the patrons of libraries are entitled. likewise, though he was among the pioneers who helped introduce sophisticated technologies such as automation and micrographics into the operation of libraries, he has always maintained a healthy skepticism, which has prevented him from being seduced by the dry voices of the hollow men who proclaim marvels that are in reality only gilded figures of straw. just as lenny bruce refused to accept contemporary conventions regarding language and behavior, mitch freeman has refused to accept the sanctity of lc subject terminology. he, sanford berman, and joan marshall have served for more than a decade as lc's conscience, prodding our phlegmatic, de facto national library to action. just as bertrand russell returned to the axioms of giuseppe peano in an attempt to secure the foundation of mathematics in formal logic and to lita award 235 free that discipline of fuzzy thinking, mitch has returned to the principles articulated by antonio panizzi and seymour lubetzky, as the tests by which to judge the claims of the self-assured mountebanks who regale us with newly coined bibliographic wisdom. in this regard i anxiously await the completion of his doctoral dissertation, in which he explores the philosophical underpinnings of theories of bibliographic control (a work that would have proved most useful during the protracted emotional debate that surrounded aacr2). i expect that it must be particularly gratifying for mitch to accept his award in this particular city. 
although his physical roots are in the northeast, i rather think his intellectual and spiritual roots are here, or more precisely, in the city across the bay-berkeley. it was just about twenty years ago that mitch, after graduating from rutgers university, newark, enrolled as a graduate student in philosophy at the university of california, berkeley. while at berkeley, his sense of social justice and utter disdain for unsupported dogma-could one expect less of a student of philosophy?led him to become active in the free speech movement. thus, we find very early in his career a concern for social issues, a concern that reemerged in his active involvement with the social responsibilities round table shortly after joining the library profession. before leaving berkeley, mitch earned his degree in library science. thus, he earned his degree from one of the most prestigious library schools on the west coast, and now plies his trade as associate professor at one of the most prestigious library schools on the east coast, the columbia university school of library service. if he is only moderately successful in conveying to his students his dedication to the delivery of quality library service, his steadfast conviction that technical services is in reality the first step in the provision of effective public service, and a respect for the supremacy of principle over expedience, his graduating classes will constitute a more lasting and meaningful award than this simple gesture conferred upon him by his professional colleagues. lib-mocs-kmc364-20140103102946 103 book reviews libraries in new york city, edited by molly herman. new york: columbia university school of library service, 1971. 214 pp. $3.50. this guide to libraries in new york is comprehensive, and the description of each library is thorough. pages 184 and 185 list libraries in which there are active and significant automation projects. frederick g. kilgour cobol logic and programming, by fritz a. mccameron, homewood, ill.: richard d. irwin, inc., 1970. 254 pp. $6.00. this book provides a good introduction to cobol, although the author implies that cobol logic is different from ·other computer language logics. however, many examples are included in the text to illustrate new commands and there are numerous review questions, exercises and problems in each chapter. the problems of later chapters build on the logical designs presented earlier. thus, the reader can follow a problem from analysis through solution. the book would be a more useful self-instruction guide as well as textbook if the answers to recall questions and exercises were given. a sound understanding of cobol should be gained from solving the fairly sophisticated problems at the end of the book. one unique and useful idea is the inclusion of coding sheet, punch card, printout, test data and output facsimiles. the most serious drawback of this book in regard to library automation is its obvious slant toward business applications. while the cobol commands presented are sufficient for most applications, there is no mention of character manipulation commands such as examine, with tallying and replacing options. in addition, problems are oriented toward bookkeeping and inventory controls. valerie ]. ryder die universitii.tsbibliothek auf der industrieausstellung: 1. wissen auf abruf. 16 pp. 2. dokumentation-lnformation. 16 pp. berlin: universitatsbibliothek der technischen universitiit berlin, 1970. no price. 
this constitutes a report (in two parts) of the contribution of the library of the technical university of west berlin to the official german industrial exposition held september 27 to october 6, 1968. the library's special exhibit was part of a section labeled: "quality through research and development." it attempted to give a synoptic view of modern library proce104 journal of library automation vol. 4/2 june, 1971 dures and their value for improving science library services. the examples demonstrated emphasized document acquisition procedures and the various readers' services. a total area of approximately 600 square feet was divided into two rooms, one showing technical equipment and the other, besides housing a twx-terminal, was furnished as a reference reading room. the terminal connected the exhibit area with the reference department of the library of the technical university. graphic charts on the walls explained functions of the typical science library in germany and the kinds of services offered. no fundamental differences from the situation in other western countries, especially the usa, can be pointed out. it may be mentioned here that west germany has an efficient organization of union catalogs, one for almost every state (bavaria, wiirtemberg baden including palatinate, hessen, nordrhein-westphalia, hamburg, and west-berlin). interlibrary loan requests go first to a region's union catalog and from there, when the item is traced within the region, to the appropriate lending library, which forwards the item or copy to the requesting library. non-traceable titles are automatically sent on to a neighboring state's union catalog, and so on, until the item is found and sent to the requesting library. reader/ copier machines for different systems of micronized text material were displayed and could be operated by the visitors. under the title "document circulation" the application of edp methods were shown, using machine readable paper tape for borrowing records. the system described was an off-line one, using (presumably daily) lists of the updated circulation master file. other graphic charts described the automated document retrieval system installed at the library of the technical institute of delft, netherlands, and the integrated library system of euratom in ispra, italy, which includes a selected dissemination of information service. computer generated bookform catalogs of monographic and serials records of other west german science libraries were on display, together with information dealing with the european translations center in delft, which records all scientific translations and publishes "world index of scientific translations." a film showing the operation of the national lending library of great britain was demonstrated. literature analysis, recording, storing, and retrieval are the topics of the second part of the report. electromechanical documentation methods using punch cards, and more often punch paper tape, with their corresponding machinery for selecting and writing back records, were shown under operating conditions. a computer based automatic information retrieval system, developed by siemens on the hardware of the current rca spectra 70 computer series was also exhibited. the system named "golem" claims to have some advantages over the medlars i system of the national library of medicine. it is operational at siemens/ edp headquarters in munich. richard a. 
polacsek book reviews 105 marc manuals used by the library of congres~> , prepared by the information systems office, library of congress. 2d ed. chicago: information science and automation division, american library association, 1970. 70, 318, 26, 18 p. this second edition contains the same four manuals as did the first, issued in 1969, although the titles of some of the individual manuals have been changed. the manuals are: 1. books: a marc format. 4th ed., april 1970 (formerly the subscriber's guide to the marc distt·ibution service. 3d ed.) 2. data preparation manual: marc editors. 3d ed., april 1970. 3. transcription manual: marc typists. 2d ed., april 1970. 4. computer and magnetic tape unit usability study. the fourth manual has been reproduced unchanged from the 1969 edition. the third, which contains the keyboarding procedures designed to convert bibliographic data into machine readable form, has been given a subtitle and completely revised to apply to a different keying device, the ibm mt /st, model v. it is the first two manuals, however, which will attract the widest continuing study outside of the library of congress. both manuals have been updated. significant changes from the previous edition of each are indicated in the margin by a double asterisk at the point where the revision was made. no indication is made of deletions, however. thus, users who look for field 652, which was described in the earlier edition, will not find it; nor will they find any instructions directing them to fields 651 and 610, which contain the material formerly placed in that discontinued field, although both 651 and 610 are provided with o o to indicate that they contain new material. among the additions to the first manual are provisions for greek, subscript, and superscript characters, and a revision of the 001 field to take into account both the old and the new l.c. card numbering systems. among the deletions is the table showing the ascii 8-bit hex and 6-bit octal in ebcdic hex sequence. the editors' manual contains the procedures followed by the marc editors in preparing data for conversion to machine readable form. while the first edition of the marc manuals contained the first edition of this particular manual, a second edition was issued in july 1969 for internal use within the library of congress. this third edition is essentially the same as the second edition with minor revisions such as the addition of examples and clarifying statements, a few new instructions, and corrections of typographical errors. the double asterisks in this manual refer to changes from the second edition, not from the first, so that owners of the first edition will have to make their own comparisons to see where the third edition differs from the first. among the new, non-asterisked, materials included that did not appear in the first edition are a discussion of other (non-lc) subject headings on 106 journal of library automation vol. 4/2 june, 1971 pp. 111-114 and of romanized titles on pp. 131-132. the third edition also contains several new appendices covering diacritics and special characters, sequence numbers, and correction procedures. while the editors' manual is designed chiefly for use by the editors at l.c., it has great value for marc users. in many places it provides an expansion and explanation of material treated much more briefly in the first manual, books: a marc format. 
examples of this clarification are the discussion of fixed fields in the editors' manual and its explanation of the alternative entry indicator in the 700 fields, which is merely listed in the first manual. the editors' manual also contains material that does not appear in the first manual, such as the alphabetic alternatives for the numeric tags (which i find more confusing and less memorable than the numeric ones). while only a year intervened between the appearance of the first and second editions of the marc manuals, enough changes have been made to make the new edition a necessary purchase for all those actively involved in the use of marc records. provision of an index would, however, have facilitated its use. judith hopkins computers in knowledge based fields, by charles a. myers. cambridge, mass.: the mit press, 1970. 136 pp. $6.95. a joint project of the industrial relations section, sloan school of management, mit and the inter-university study of labor problems in economic development. the author has written previously on the impact of computers on management. in the current study on the implications of technological change and automation he has selected five areas-formal education and educational administration; library systems; legal services; medical and hospital service; national and centralized local data banks. in this book he is trying to answer such questions as what needs prompted the use of computers, what are the initial applications and what problems were encountered, what affect does the use of computers have on the work performed and what resistance was encountered to their introduction. he also posed the question: can anything be said about comparative costs of computer based programs as compared with other programs? the answer appears to be "no" or "not yet." the chapter on libraries deals primarily with project intrex and thus fails to give an overview of developments in library systems which are operational. the other chapters offer a review of planned and operational projects as of 1968-69. stephen e. furth book reviews 107 libraries and cultural change, ronald c. benge ( hamden, connecticut and london:) archon books & clive bingley ( 1970 ). 278 pp. $9.00. this work is intended primarily to serve library students as an introduction to a consideration of the place of the library in society, with suggestions for further reading. the author is hopeful that it may be of interest to a wider audience, and it is. mr. benge has taught in library schools in the caribbean, west africa, england and wales. this experience is reflected in his approach to a discussion of the social background of library work. although, as he points out, it is possible to establish connections of many kinds, and libraries might be convincingly connected with witchcraft or the illegitimacy rate or prehistoric man, yet more meaningful connections must be sought, and he has selected not only culture, but cultural change, as the basis. further, in his several discussions he has tried to commence with the cultural background and then to note the possible implications for librarianship, rather than to follow the more usual method of commencing with libraries and showing the relevance to them of social forces and other institutions. a listing of a few of mr. benge's fourteen chapter-headings will suggest his development of his theme: "the clash of cultures", "mass communications", "censorship", "the impact of technology", "philosophies of librarianship". 
each chapter is an urbane essay in the editor's easy-chair manner, a monolog in which the author introduces the reader to that part of the universe that can be viewed through the arch over which the particular chapter-title is inscribed, and relates it to the work of the library. mr. benge is informative (he is up-to-date on all manner of matters; e.g., he has been reading library college and he knows about high john), he is occasionally witty and often convincing. as the basis for class-room discussion his work is perhaps also as stimulating as a propaedeutic should be, but lacking such discussion i doubt this attribute. i find that to stimulate, a book must organize the field of discussion. for me mr. benge fails to do so. i find his essays agreeable, with occasional bons mots ("young people, like books, must be preserved for the future"; "guinea pigs are happy creatures") but, like other conversational literature, it leaves me with a general euphoria but unsatisfied logic. for example, the final chapter ("philosophies of librarianship") starts out bravely by questioning the relevance of theory but concludes feebly that what is needed to explain librarianship is perhaps a new integration of traditional custodial principles, the missionary approach, and the rationale of a personal reference service. references from other than the anglo-american culture-sphere are few; the book would have gained greatly from more. we here in jola are naturally interested to hear what mr. benge has to say on "the impact of technology". in this chapter, regrettably, he abandons his method of social background first and relevance for libraries afterward, and simply notes the direct impact of technology on libraries, mainly in the uk. he concludes that "there can be no doubt that the information crisis does exist and that traditional reference or retrieval methods have not solved it. there is chaos, duplication and waste. what i have tried to suggest here is that on the evidence to date, we cannot yet be sure that machine retrieval is the answer" (p. 175). there are misprints, to be sure, neither unusually numerous nor serious, with one exception. dr. vannevar bush's name (p. 182) has been mangled, and is, moreover, omitted from the name index. verner w. clapp serial publications in large libraries, edited by walter c. allen. urbana, ill.: graduate school of library science, university of illinois, 1970. 194 pp. $4.50. handling of serial publications was the topic of the sixteenth allerton park institute held in november 1969; the papers are published in this slim volume. almost every paper offers a number of controversial and provocative ideas which must have evoked interested and interesting reactions. the subsequent discussions are not reported. problems of serials, the librarian's basket of snakes, are identified and analyzed from selection and acquisition through check-in, cataloging, binding, shelf arrangement, abstracting and indexing, to machine applications. the papers cover this gamut well and in most cases provide a good view of the state of the art. recurrent themes are the significant role of serials in today's information flow, the urgency of the problems (though the content is long on agony and short on therapy), and the necessity for bearing in mind the user's rather than the librarian's convenience where both cannot be accommodated when reaching for solutions.
donald hammer's paper on computer-aided operations provides a good introduction and overview of automated serials systems, with some helpful hints to beginners in the field. microfilm technology and machine-readable commercial abstracting and indexing services are touched on by warren kuhn and bill woods, but each topic deserves more thorough treatment in separate papers. too few of the speakers proposed specific research in their areas; where such long-standing problems exist, some well-directed suggestions might elicit useful studies. the book should be useful to library schools as good coverage of a seldom detailed problem operation, to librarians entering the challenging maelstrom of serials handling, and to those already overinvolved who might be refreshed by the longer view. the poor proofreading is a minor flaw. mary jane reed training in indexing: a course of the society of indexers, edited by g. norman knight. cambridge, massachusetts: the m. i. t. press, 1969. 219 pp. $7.95. to this reviewer, who had struggled through the compilation of one annual index to the journal of library automation with the aid of scarce, unrelated, and out-of-date books and periodicals on the subject of indexing, this thorough, well-written volume, aimed at the neophyte indexer, came as a godsend. it comprises a series of lectures, by master practitioners of the craft, sponsored by the society of indexers. that authors and audience were chiefly british detracts not a whit from the book's usefulness to americans. two introductory chapters by robert l. collison on the elements of book indexing are followed by twelve on specific treatment of those elements and of different types of material. chapters on indexing periodicals and scientific and technical material will particularly interest readers of jola. exercises, a selected bibliography, and an index that also serves as an illustration of points in the text enhance the usefulness of this book to the beginner. it should be equally useful to an indexer of no matter how much experience, for, as collison emphasizes in his opening statement, indexing is still in an elementary stage, there are no common rules on which all indexers agree, and everyone considers himself his own authority on how an index should be arranged and what should go into it. in treating a subject that might seem to the layman to lend itself all too readily to the cut-and-dried approach, the authors have brought a delightful measure of flexibility, wit and imagination. at no point do they lose sight of the fact that the indexing of books, like the writing of them, is a very human endeavor. eleanor m. kilgour reader in library services and the computer, edited by louis kaplan. washington, d. c.: ncr microcard editions, 1971. 239 pp. $9.95. this volume contains a couple of dozen reprints, mostly of articles. the reader is not intended for those doing research and development in library automation, but rather for librarians and library students who wish to familiarize themselves with the subject. the quality of the articles is high. in general, they present a conservative position, which is not to say that they oppose library automation. rather, they inform the reader of positive action to be taken and in so doing impart understanding. within this conservative framework, however, various viewpoints are expressed.
seven subjects group the articles: the challenge ( three articles); varieties of response (six ); theory of management (one); new services (six) ; catalogs and the computer ( two ); copyright (one ); and information retrieval testing ( six ) . the r eader is not a book in the sense that a book 110 journal of library automation vol. 4/ 2 june, 1971 contains a central theme. it is likely that the r eader will be used for its sections rather than in its entirety, but that is the manner in which one expects to use a reader. anyone who so uses it will be enlightened. the reader has but one serious shortcoming. it is devoid of an index. this deficit will seriously hamper consultation of the book. frederick g. kilgour automation management: the social perspective, ed. by ellis l. scott and roger w. bolz. athens, ga., center for the study of automation and society, university of georgia, 1970. (second annual georgia-reliance symposium) $5.75. sixteen papers are presented at this symposium by a variety of authors from labor, management, academe, etc. as in all collections of papers, they are uneven in quality. the preface of the symposium states that the "1970 symposium focused on the problem of automation management, from a social perspective, as it relates to industry, education, labor and government." the papers reflect ideas concerning the need for training and retraining, and for preparing people for automation by having them participate in the decision-making process. three papers on the effects of automation use economic analysis based upon the gross national product and other labor and business indicators and find that the changes predicted for automation in terms of joblessness and increased productivity are unfounded, although some questions are asked about the validity of the figures used to make these assumptions. there are interesting formulations on the nature of change and innovation and the time lag between basic research and industrial application. gordon carson's paper expressly attacks the issue of automation in libraries and in education. dr. carson sees one of the problems as the library's print media orientation when the other senses, such as hearing, could also be used. libraries are also attacked on the basis of how they measure effectiveness, i.e., the number of volumes on the shelf, rather than "the speed with which information can be retrieved from that library and placed in the hands of him who needs to use it." this methodology for measuring effectiveness is changing presently, so that the need expressed by dr. carson may be met. in conclusion, dr. carson states that there are "three essential areas in which automation can be exceptionally helpful in higher education. these are as follows: 1 ) improved teaching techniques including autodidactic learning systems; 2 ) registration, fee payment and curriculum planning .. . ; 3) libraries-information retrieval." although in a way many papers in this volume skirt the periphery of the effects of change and how to create it, it is worthwhile reading on the whole. henry voos book reviews 111 interlibrary loan involving academic libraries, by sarah katharine thomson. chicago: american library association, 1970. (acrl monograph, 32). viii/127 pp. $5.00. interlibrary loan pmcedure manual, by sarah katharine thomson. chicago: american library association, 1970. xi/116 pp. $4.50. 
interlibrary loan involving academic libraries is a summary version of "a normative survey of current interlibrary loan practices in academic libraries in the united states." it makes surprisingly compulsive reading for anyone who has worked much with interlibrary loans, and might be an eye-opener for those who haven't. (the original, complete version appeared in 1967 as a columbia university dls dissertation.) much of it documents or corroborates the feelings (or suspicions) of busy, experienced interlibrary loan staff; some of it is new and surprising; and doubtless many of the same patterns and trends hold true today. dr. thomson, working primarily with data reported by academic libraries to the u.s. office of education in 1963-64, results of intensive analysis of a sample of 5895 interlibrary loan requests (drawn from a total of 60,000 received by eight major university libraries in 1963-64 and 1964-65 ), and information from several questionnaires, presents a clear picture of who borrowed what from whom, how often; staffing and time required; distribution patterns of requests by size and location of library, type of reader; sources of difficulty, delay and failure; factors predictive of fast and efficient service; and a number of other variables. her results and conclusions are presented clearly, with supportive or illustrative statistics, graphs, correlations, and other tables. chapter 14 offers recommendations of librarians for increasing the proportion of interlibrary loan requests fill ed. suggestions and recommendations resulting from dr. thomson's study were incorporated in, or influenced the drafters of, the 1968 national interlibrary loan code, the model regional or state code, and the 1968 interlibrary loan request form. dr. thomson estimates that interlibrary loan requests involving academic libraries are well over the million mark by now, and refers to a 1965 study which reports large libraries estimating they are unable to fill about onethird of the requests they receive. it is to be hoped that some of the worst faults in interlibrary loan requests have been mitigated by the revised codes, revised form s, and better education of interlibrary loan assistants. the new procedures manual should help, too. perusing this monograph should foster greater awareness and understanding of the dimensions and problems of interlibrary loan service. now, if only we had an up-to-date cost study .... who profits from the appearance of the interlibrary loan procedure manual? not merely ill novices, whether new clerical assistants or young librarians faced with setting up, reorganizing, or streamlining interlibrary loan routines. it has value for the old ill hand, checking up on established routines to be sure no sloppiness has crept in ; for the library school student, as an early exposure to good library cooperation manners, 112 journal of library automation vol. 4/2 june, 1971 as well as a basic step-by-step indoctrination in "how to do it"; for recipient libraries, whose time and patience would be much less strained were all requestors to follow these elementary, commonsense, too often ignored recommendations; and last, not least, the library's patron, whose needs will be filled faster, more economically, with fewer false starts. a wealth of practical detail has been packed into these pages-a plethora of detail, some might complain, confusing the beginner and boring the experienced. but a procedure manual by definition tries to incorporate every stroke and serif of a to z. 
simple solutions to that complaint are re-reading and/or judicious scanning. the manual includes annotated texts of the 1968 national interlibrary loan code and the model regional or special code; primer-type instructions for borrowing and requesting libraries (including concise sections on special puzzlers such as academic theses, government publications, technical reports, and materials in non-roman alphabets); and consideration of related, often problematical areas such as photocopy, copyright and reprinting, location requests, teletype requests, purchase of dissertations, and international loans. useful appendices (e.g., sample forms, some library policy statements, the text of the ifla international loan code), a bibliography, and a detailed index complete the work. chapter levels vary of necessity. for the novice, the teletype request chapter may seem too brief or confusing, yet several appendices (for instance) will be of interest even to the seasoned ill assistant. throughout, the effort has been for clarity, coverage, and explicitness. the cost of an interlibrary loan transaction is too great to indulge sloppy, inefficient, or idiosyncratic procedures; this manual is therefore required reading for all involved in interlibrary loans, and a copy should be at the elbow of every new clerical assistant. elizabeth rumics exploratory subject searching in library catalogs: reclaiming the vision julia bauder and emma lange information technology and libraries | june 2015 abstract: librarians have had innovative ideas for ways to use subject and classification data to provide an improved online search experience for decades, yet after thirty-plus years of improvements in our online catalogs, users continue to struggle with narrowing down their subject searches to provide manageable lists containing only relevant results. this article reports on one attempt to rectify that situation by radically reenvisioning the library catalog interface, enabling users to interact with and explore their search results in a profoundly different way. this new interface gives users the option of viewing a graphical overview of their results, grouped by discipline and subject. results are depicted as a two-level treemap, which gives users a visual representation of the disciplinary perspectives (as represented by the main classes of the library of congress classification) and topics (as represented by elements of the library of congress subject headings) included in the results. introduction: reading library literature from the early days of the opac era is simultaneously inspiring and depressing. the enthusiasm that some librarians felt in those days about the new possibilities that were being opened by online catalogs is infectious.
elaine svenonius envisioned a catalog that could interactively guide users from a broad single-word search to the specific topic in which they were really interested.1 pauline cochrane conceived of a catalog that could group results on similar aspects of a given subject, showing the user a "systematic outline" of what was available on the subject and allowing the user to narrow their search easily.2 marcia bates even pondered whether "any indexing/access apparatus that does not stimulate, intrigue, and give pleasure in the hunt is defective," since "people enjoy exploring knowledge, particularly if they can pursue mental associations in the same way they do in their minds. . . . should that not also carry over into enjoying exploring an apparatus that reflects knowledge, that suggests paths not thought of, and that shows relationships between topics that are surprising?"3 however, looking back thirty years later, it is dispiriting to consider how many of these visions have not yet been realized. the following article reports on one attempt to rectify that situation by radically reenvisioning the library catalog interface, enabling users to interact with and explore their search results in a profoundly different way. the idea is to give users the option of viewing a graphical overview of their results, grouped by discipline and subject. this was achieved by modifying a vufind-based discovery layer to allow users to choose between a traditional, list-based view of their search results and a visualized view. in the visualized view, results are depicted as a two-level treemap, which gives users a visual representation of the disciplinary perspectives (as represented by the main classes of the library of congress classification [lcc]) and topics (as represented by elements of the library of congress subject headings [lcsh]) included in the results. an example of this visualized view can be seen in figure 1. figure 1. visualization of the results for a search for "climate change." subsequent sections of this paper summarize the library-science and computer-science literature that provides the theoretical justification for this project, explain how the visualizations are created, and report on the results of usability testing of the visual interface with faculty, academic staff, and undergraduate students. julia bauder (bauderj@grinnell.edu) is social studies and data services librarian, and emma lange (langemm@grinnell.edu) is an undergraduate student and former library intern, grinnell college, grinnell, iowa.
literature review: exploratory subject searching in library catalogs. since charles ammi cutter published his rules for a printed dictionary catalogue in 1876, most library catalogs have been premised on the idea that users have a very good idea of what they are looking for before they begin to interact with the catalog.4 in this classic view, users are either conducting known-item searches—they know the titles or the author of the books they want to find—or they know the exact subject on which they are interested in finding books. yet research has shown that known-item searches are only about half of catalog searches,5 and that users often have a very difficult time expressing their information needs with enough detail to construct a specific subject search. instead, much of the time, users approach the catalog with only a vaguely formulated information need and an even vaguer sense of what words to type into the catalog to get the resources that would solve their information need.6 even in the earliest days of the opac era, librarians were aware of this problem. some of them, including elaine svenonius and pauline cochrane, speculated about better use of subject and classification data to try to help users who enter too-short, overly broad searches focus their results on the information that they truly want. one of cochrane's many ideas on this topic was to use subject and classification data "to present a systematic outline of a subject," which would let users see all of the different aspects of that subject, as reflected in the library's classification system and subject headings, and the various locations where those materials could be found in the library.7 svenonius suggested using library classifications to help narrow users' searches to appropriate areas of the catalog. for example, she suggests, if a user enters "freedom" as a search term, the system might be programmed to present to the user contexts in which "freedom" is used in the dewey decimal classification, such as "freedom of choice" or "freedom of the press." once the user selects one of these phrases, svenonius continued, the system could present the user with additional contextual information, again allow the user to specify which context is desired, and then guide the user to the exact call number range for information on the topic.
she concluded, "thus by contextualizing vague words, such as freedom, within perspective hierarchies, the computer might guide a user from an ineptly or imprecisely articulated search request to one that is quite specific."8 ideas such as these had little impact on the design of production library catalogs until the late 1990s, when a dutch company, medialab solutions, began developing aquabrowser, which features a word cloud composed of synonyms and other words related to the search term and allows users to refocus their search by clicking on these words.9 aquabrowser became available in the united states in the mid-2000s, shortly before north carolina state university launched its endeca-based catalog in 2006.10 while aquabrowser's word cloud is certainly visually striking, the feature that these and most of the subsequent "next-generation" library catalogs implement that has had the most impact on search behavior is faceting. facets, while not as sophisticated as the systems envisioned by svenonius and cochrane, are partial solutions to the problems they lay out. facets can serve to give users a high-level overview of what is available on a topic, based on classification, format, period, or other factors. they can also help guide a user from an impossibly broad search to a more focused one. various studies have shown that faceted interfaces are effective at helping users narrow their searches, as well as helping them discover more relevant materials than they did when performing similar tasks on nonfaceted interfaces.11 however, studies have also shown that users can become overwhelmed by the number and variety of facets available and the number of options shown under each facet.12 visual interfaces to document corpora: while librarians were pondering how to create a better online library catalog, computer scientists were investigating the broader problem of helping users to navigate and search large databases and collections of documents effectively. visual interfaces have been one of the methods computer scientists have investigated for providing user-friendly navigation, with perhaps the most prominent early advocate for visual interfaces being ben shneiderman.13 in recent years, shneiderman and other researchers have built and tested various types of experimental visual interfaces for different forms of information-seeking.14 however, with a few exceptions, most of these visual interfaces have remained in a laboratory rather than a production setting.15 with the exception of the "date slider," a common interface feature that displays a bar graph showing dates related to the search results and allows users to slide handles to include or exclude times from their search results, few current document search systems present users with any kind of visual interface.
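the interface described in the method section below is driven by facet counts of exactly this kind, returned by the search engine alongside (or instead of) the result list. the following is only a rough, minimal sketch in browser-side javascript: the endpoint path and the field names "callnumber-first" and "topic_facet" are assumptions chosen for illustration, not a description of any particular catalog's configuration.

```javascript
// sketch: ask a solr index for facet counts by lcc main class and by
// subject-heading element. url and field names are assumed, not authoritative.
async function fetchFacets(query) {
  const params = new URLSearchParams({
    q: query,
    rows: "0",          // counts only; no bibliographic records needed here
    wt: "json",
    facet: "true",
    "facet.limit": "30"
  });
  // request counts for two fields: one per treemap level
  params.append("facet.field", "callnumber-first"); // assumed lcc main-class field
  params.append("facet.field", "topic_facet");      // assumed lcsh-element field

  const response = await fetch("/solr/biblio/select?" + params.toString());
  const data = await response.json();
  // solr returns each field's counts as a flat [value, count, value, count, ...] array
  return data.facet_counts.facet_fields;
}

// example: counts for a deliberately broad search, before any narrowing
fetchFacets("climate change").then(fields => console.log(fields));
```

narrowing the search to one class or topic would then be a matter of re-issuing the request with a filter query (an fq parameter) on the chosen value.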
method: the grinnell college libraries use vufind, open-source software originally developed at villanova university, as a discovery layer over a traditional ils. vufind in turn makes use of apache solr, a powerful open-source indexing and search platform, and solrmarc, code developed within the library community that facilitates indexing marc records into solr. using solrmarc, marc fields and subfields are mapped to various fields in the solr index; for example, the contents of marc field 020, subfield a, and field 773, subfield z, are both mapped to a solr index field called "isbn." more than fifty solr fields are populated in our index. our visualization system was built on top of vufind's solr index and visualizes data taken directly from the index. the visualizations are created in javascript using the d3.js visualization library, and they are designed to implement shneiderman's visual information seeking mantra: "overview first, zoom and filter, then details-on-demand."16 the goal was to give users the option of viewing a graphical overview of their results, grouped by disciplinary perspective and topic, and then allow them to zoom in on the results from specific perspectives or on specific topics. once they have used the interactive visualization to narrow their search, they can choose to see a traditional list of results with full bibliographic details about the items. this would, ideally, provide a version of the systematic outline that cochrane envisioned. it should also support users as they attempt to narrow down their search results and focus on a specific aspect of their chosen subject without overwhelming them with long lists of results or of facets. currently, we are visualizing values of two fields, one containing the first letter of the items' library of congress classification (lcc) numbers and the other containing elements of the items' library of congress subject headings (lcsh). this data is visualized as a two-level treemap.17 first, large boxes are drawn representing the number of items matching the search within each letter of the lcc. within the largest of these boxes, smaller boxes are drawn showing the most common elements of the subject headings for items matching that search within that lcc main class. less common subject heading elements are combined into an additional small box, labeled "x more topics"; clicking on that box zooms in so that users only see results from one lcc main class, and it displays all of the lcsh headings applied to items in that group. similarly, users can click on any of the smaller lcc boxes, which do not contain lcsh boxes in the original visualization, to zoom in on that lcc main class and see the lcsh subject headings for it. both the large and the small boxes are sized to represent what proportion of the results were in that lcc main class or had that lcsh subject heading. this is easier to explain with a concrete example, which follows the brief sketch below.
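as a minimal sketch of how such a two-level treemap might be drawn with d3.js (v6-style api), consider the following. the facet counts, class labels, topic labels, and layout parameters are invented placeholders illustrating the approach just described; they are not the production code behind figure 1.

```javascript
// sketch: a two-level treemap of lcc main classes and lcsh topic elements.
// the data object stands in for facet counts pulled from the solr index.
const facetData = {
  name: "results",
  children: [
    { name: "q - science", children: [
      { name: "climatic changes", value: 410 },
      { name: "global warming",   value: 230 },
      { name: "97 more topics",   value: 560 }
    ]},
    { name: "s - agriculture", children: [
      { name: "crops and climate", value: 90 },
      { name: "34 more topics",    value: 150 }
    ]}
  ]
};

const width = 900, height = 500;

// box areas are proportional to the number of matching items
const root = d3.hierarchy(facetData)
  .sum(d => d.value)                    // leaf sizes come from the facet counts
  .sort((a, b) => b.value - a.value);   // largest classes and topics first

d3.treemap()
  .size([width, height])
  .paddingOuter(4)    // space around each lcc main-class box
  .paddingTop(18)     // room for the class label
  .paddingInner(2)    // space between topic boxes
  (root);

const svg = d3.select("body").append("svg")
  .attr("width", width)
  .attr("height", height);

// one group per node, for both main-class boxes and topic boxes
const node = svg.selectAll("g")
  .data(root.descendants().filter(d => d.depth > 0))
  .join("g")
  .attr("transform", d => `translate(${d.x0},${d.y0})`);

node.append("rect")
  .attr("width",  d => d.x1 - d.x0)
  .attr("height", d => d.y1 - d.y0)
  .attr("fill",   d => d.depth === 1 ? "#cfe2f3" : "#6fa8dc")
  .attr("stroke", "#fff")
  // in the real interface a click refines the search; here it only logs the choice
  .on("click", (event, d) => console.log("zoom to", d.data.name));

node.append("text")
  .attr("x", 4)
  .attr("y", 13)
  .text(d => `${d.data.name} (${d.value})`);
```

in practice the click handler would re-query the index with the selected class or topic applied as a filter and redraw the treemap, which is how the "zoom and filter" step of shneiderman's mantra is realized; the "details-on-demand" step is the switch back to the traditional results list.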
let's say a student were to search for "climate change" and click on the option to visualize the results. you can see what this looks like in figure 1. instead of seeing a list of nearly two thousand books, the student now sees a visual representation of the disciplinary perspectives (as represented by the main classes of the lcc) and topics (as represented by elements of the lcsh) included in the results. users could click to zoom in on any main class within the lcc to see all of the topics covered by books in that class, as in figure 2, where the student has zoomed in on "s – agriculture." or users could click on any topic facet to see a traditional results list of books with that topic facet in that main class. at any zoom level, users could choose to return to the traditional results list by clicking on the "list results" option.18 we launched this feature in our catalog midway through the spring 2014 semester. formal usability testing was completed with five advanced undergraduates, three staff, and two faculty members in the summer of 2014. (see appendix a for the outline of the usability test.) one first-year student completed usability testing in the fall 2014 semester. the usability study asked participants to complete a set list of nine specific, predetermined tasks. some tasks involved the use of now-standard catalog features, such as saving results to a list and emailing results to oneself, while about half of the tasks involved navigation of the visualization tool, which was entirely new to the participants. each participant received the same tasks and testing experience regardless of their status as a student, faculty, or staff member, and each academic division was represented among the participants. figure 2. visualization of the results for a search for "climate change," filtered to show only results with library of congress classification numbers starting with s. results: usability testing revealed no major obstacles in the way of users' ability to navigate the visualization feature; the visualized search results were quickly deciphered by the participants with the assistance of the context set by the study's outlined tasks. familiarity with library catalogs in general, and the grinnell college libraries catalog in particular, showed no marked impact on users' performance. no particular user group performed as an outlier in regard to users' general ability to complete tasks or the time required to do so. the most common issue to arise during the sessions concerned the visualization's truncated text, which appears in the far left column of results when the descriptor text contains too many characters for the space allocated. (an example of this truncated text can be seen in figure 1.)
the subject boxes appearing in the furthest left column contain the fewest results, and therefore receive the least space within the visualization. this limited space sometimes results in truncated text. the full text can be viewed by hovering over the truncated text box, but few users discovered this capability. another common concern involved a participant's ability to switch their search results from the default list view to the visualized view. all participants were capable of selecting the "visualize these results" button required to produce the visualization, but a handful of participants expressed that they feared they would not find that option if they were not prompted to do so. participants remarked that the visualization initially appeared daunting, but they quickly became comfortable navigating the results. most participants, including staff, stated that they found the tool useful and intended to use it in the future during the course of their typical work at the college. conclusion: librarians have had innovative ideas for ways to use subject and classification data to provide an improved online search experience for decades, yet after thirty-plus years of improvements in online catalogs, users continue to struggle with narrowing down their searches to produce manageable lists containing only relevant results.19 computer scientists have been advocating for interfaces to support visual information-seeking since the 1980s. finally, hardware and software have improved to the point where many of these ideas can be implemented feasibly, even by relatively small libraries. now is the time to put some of them into production and see how well they work for library users. the particular visualizations reported in this article may or may not be the best possible visualizations of bibliographic data, but we will never know which of these ideas might prove to be the revolution that library discovery interfaces need until we try them. appendix a. usability testing instrument. introductory questions: before we look at the site, i'd like to ask you just a few quick questions. —have you searched for materials using the grinnell college libraries' website before? if so, what for and when? (for students only: could you please estimate how many research projects you've done at grinnell college using the library catalog?) in the grinnell college libraries, we're testing out a new tool in our catalog that presents search results in a different way than you are used to. now i'm going to read you a short explanation of why we created this tool and what we hope the tool will do for you before we start the test. research is a conversation: a scholar reads writings by other scholars in the field, then enters into dialogue with them in his or her own writing.
most  of  the  time,  these  conversations  happen  within   the  boundaries  of  a  single  discipline,  such  as  chemistry,  sociology,  or  art  history,  even  when  many   disciplines  are  discussing  similar  topics.  but  when  you  do  a  search  in  a  library  catalog,  writings   that  are  part  of  many  different  conversations  are  all  jumbled  together  in  the  results.  it’s  like  being   thrown  into  one  big  room  where  all  of  these  scholars,  from  all  of  these  different  disciplines,  are   talking  over  each  other  all  at  once.  our  new  visualization  tool  aims  to  help  you  sort  all  of  these   writings  into  the  separate  conversations  in  which  they  originated.     scenarios   now  i  am  going  to  ask  you  to  try  doing  some  specific  tasks  using  3search.  you  should  read  the   instructions  aloud  for  all  tasks  individually  prior  to  beginning  each.  and  again,  as  much  as  possible,   it  will  help  us  if  you  can  try  to  think  out  loud  as  you  go  along.     please  begin  by  reading  the  first  scenario  aloud  and  then  begin  the  first  scenario.  if  you  are  unsure   whether  you  finished  the  task  or  not,  please  ask  me.  i  can  confirm  if  the  task  has  been  completed.   once  you  are  done  with  scenario  1,  please  continue  onto  scenario  2  by  reading  it  aloud  and  then   beginning  the  task.  continue  this  process  until  all  scenarios  are  finished.  if  you  cannot  complete  a   task,  please  be  honest  and  try  to  explain  briefly  why  you  were  unsuccessful  and  continue  to  the   next.     1. pretend  that  you  are  writing  a  paper  about  issues  related  to  privacy  and  the  internet.  do  a   search  in  3search  with  the  words  “privacy  internet.”   2. please  select  the  first  worldcat  result  and  attempt  to  determine  whether  you  have  access   to  the  full  text  of  this  book.  if  not,  please  indicate  where  you  could  request  the  full  text   through  the  interlibrary  loan  service.   3. go  back  to  your  initial  search  results.  please  choose  “explore  these  results”  of  the  ebsco   database  results.  choose  an  article.  if  you  have  unlimited  texting,  have  the  article’s     information  technology  and  libraries  |  june  2015     100   information  texted  to  your  cell  phone.  then,  add  the  article  to  a  new  list  for  future   reference  throughout  this  project.   4. go  back  to  your  initial  search  results.  for  grinnell  college’s  collections  results,  click  on  the   “explore  these  results”  link.  then  click  on  the  “visualize  results”  link  to  visualize  the   results.  which  disciplines  appear  to  have  the  greatest  interest  in  this  topic?   5. when  privacy  and  the  internet  are  discussed  in  the  context  of  law,  what  are  some  of  the   topics  that  are  frequently  covered  in  these  discussions?   6. one  specific  topic  you  are  considering  is  the  legal  issues  around  libel  and  slander  on  the   internet.  how  many  resources  do  the  libraries  have  on  that  specific  topic?   7. click  on  “q  –  science,”  to  see  the  results  authored  by  theoretical  computer  scientists.  based   on  these  results,  what  are  some  of  the  topics  that  are  frequently  covered  in  their   discussions  when  these  computer  scientists  discuss  privacy  and  the  internet?   8. 
pretend  that  you  are  writing  this  paper  for  a  computer  science  class  and  you  are  supposed   to  address  your  topic  from  a  computer  science  perspective.  please  narrow  your  results  to   only  show  results  that  are  in  the  format  of  a  book.  based  on  this  new  visualization,  what   might  be  some  good  topics  to  consider?   9. add  one  of  these  books  to  the  list  you  created  in  step  3.  please  email  all  of  the  items  on  this   list  to  yourself.   debriefing   thank  you.  that  is  it  for  the  computer  tasks.  i  have  a  few  quick  questions  for  you  now  that  you   have  gotten  a  chance  to  use  the  site.   1. what  do  you  think  about  3search?  is  it  something  that  you  would  use?  why  or  why  not?   2. what  is  your  favorite  thing  about  3search?   3. what  is  your  least  favorite  thing  about  3search?   4. did  you  find  the  visualization  function  useful?  why  or  why  not?   5. do  you  have  any  recommendations  for  changes  to  the  way  this  site  looks  or  works?         exploratory  subject  searching  in  library  catalogs:  reclaiming  the  vision  |  bauder  and  lange   doi:  10.6017/ital.v34i2.5888   101   references     1.     elaine  svenonius,  “use  of  classification  in  online  retrieval,”  library  resources  &  technical   services  27,  no.  1  (1983):  76–80,    http://alcts.ala.org/lrts/lrtsv25no1.pdf.     2.     pauline  a.  cochrane,  “subject  access—free  or  controlled?  the  case  of  papua  new  guinea,”  in   redesign  of  catalogs  and  indexes  for  improved  online  subject  access:  selected  papers  of  pauline   a.  cochrane  (phoenix:  oryx,  1985),  275.  previously  published  in  online  public  access  to  library   files:  conference  proceedings:  the  proceedings  of  a  conference  held  at  the  university  of  bath,  3– 5  september  1984  (oxford:  elsevier,  1985).   3.     marcia  bates,  “subject  access  in  online  catalogs:  a  design  model,”  journal  of  the  american   society  for  information  science  37,  no.  6  (1986):  363,  http://dx.doi.org/10.1002/(sici)1097-­‐ 4571(198611)37:6<357::aid-­‐asi1>3.0.co;2-­‐h   4.     charles  ammi  cutter,  rules  for  a  printed  dictionary  catalog  (washington,  dc:  government   printing  office,  1876).   5.     david  ward,  jim  hahn,  and  kirsten  feist,  “autocomplete  as  a  research  tool:  a  study  on   providing  search  suggestions,”  information  technology  &  libraries  31,  no.  4  (2012):  6–19,   http://dx.doi.org/10.6017/ital.v31i4.1930;  suzanne  chapman  et  al.,  “manually  classifying   user  search  queries  on  an  academic  library  web  site,”  journal  of  web  librarianship  7  (2013):   401–21,  http://dx.doi.org/10.1080/19322909.2013.842096.   6.     n.  j.  belkin,  r.  n.  oddy,  and  h.  m.  brooks,  “ask  for  information  retrieval:  part  i.  background   and  theory,”  journal  of  documentation  (1982):  61–71,  http://dx.doi.org/10.1108/eb026722;   christine  borgman,  “why  are  online  catalogs  still  hard  to  use?,”  journal  of  the  american   society  for  information  science  (1996):  493–503,  http://dx.doi.org/10.1002/(sici)1097-­‐ 4571(199607)47:7<493::aid-­‐asi3>3.0.co;2-­‐p;  karen  markey,  “the  online  library  catalog:   paradise  lost  and  paradise  regained?,”  d-­‐lib  magazine  13,  no.  1/2  (2007),   http://www.dlib.org/dlib/january07/markey/01markey.html.   7.     cochrane,  “subject  access—free  or  controlled?,”  275.  
 8.     svenonius,  “use  of  classification  in  online  retrieval,”  78–79.   9.     jasper  kaizer  and  anthony  hodge,  “aquabrowser  library:  search,  discover,  refine,”  library  hi   tech  news  (december  2005):  9–12,  http://dx.doi.org/10.1108/07419050510644329.   10.    kristen  antelman,  emily  lynema,  and  andrew  pace,  “toward  a  twenty-­‐first  century  library   catalog,”  information  technology  &  libraries  25,  no.  3  (2006):  128–39,   http://dx.doi.org/10.6017/ital.v25i3.3342.   11.    tod  olson,  “utility  of  a  faceted  catalog  for  scholarly  research,”  library  hi  tech  (2007):  550– 61,  http://dx.doi.org/10.1108/07378830710840509;  jody  condit  fagan,  “usability  studies  of     information  technology  and  libraries  |  june  2015     102     faceted  browsing:  a  literature  review,”  information  technology  and  libraries  29,  no.  2   (2010):  58-­‐66,  http://dx.doi.org/10.6017/ital.v29i2.3144.   12.    kathleen  bauer,  “yale  university  library  vufind  test—undergraduates,”  november  11,  2008,   accessed  september  9,  2014,   http://www.library.yale.edu/usability/studies/summary_undergraduate.doc.   13.    see,  for  example,  ben  shneiderman,  “the  future  of  interactive  systems  and  the  emergence  of   direct  manipulation,”  behaviour  &  information  technology  1  (1982):  237–56,   http://dx.doi.org/10.1080/01449298208914450;  ben  shneiderman,  “dynamic  queries  for   visual  information  seeking,”  ieee  software  11  (1994):  70–77,   http://dx.doi.org/10.1109/52.329404.   14.    see,  for  example,  aleks  aris  et  al.,  “visual  overviews  for  discovering  key  papers  and   influences  across  research  fronts,”  journal  of  the  american  society  for  information  science  &   technology  60  (2009):  2219–28,  http://dx.doi.org/10.1002/asi.v60:11;  furu  wei  et  al.,   “tiara:  a  visual  exploratory  text  analytic  system,”  in  proceedings  of  the  16th  acm  sigkdd   international  conference  on  knowledge  discovery  and  data  mining  (washington,  dc:  acm,   2010),  153–62,  http://dx.doi.org/10.1145/1835804.1835827;  cody  dunne,  ben  shneiderman,   robert  gove,  judith  klavans,  and  bonnie  dorr,  “rapid  understanding  of  scientific  paper   collections:  integrating  statistics,  text  analysis,  and  visualization,”  journal  of  the  american   society  for  information  science  &  technology  63  (2012):  2351–69,   http://dx.doi.org/10.1002/asi.22652.   15.    the  most  notable  exception  is  carrot2  (http://search.carrot2.org),  a  search  tool  that  will   automatically  cluster  web  search  results  and  display  visualizations  of  those  clusters.   16.    ben  shneiderman,  “the  eyes  have  it:  a  task  by  data  type  taxonomy  for  information   visualizations,”  september  1996,  accessed  april  27,  2014,   http://drum.lib.umd.edu/bitstream/1903/5784/1/tr_96-­‐66.pdf.   17.    ben  shneiderman,  “treemaps  for  space-­‐constrained  visualization  of  hierarchies:  including   the  history  of  treemap  research  at  the  university  of  maryland,”  institute  for  systems   research,  accessed  october  6,  2014,  http://www.cs.umd.edu/hcil/treemap-­‐history.   18.    to  explore  this  feature  in  our  catalog,  go  to  https://libweb.grinnell.edu/vufind/search/home,   do  a  search,  and  click  on  the  “visualize  results”  link  in  the  upper  right.   19.    
a recent project information literacy report found that the two aspects of research that first-year students found most difficult were "coming up with keywords to narrow down searches" and "filtering and sorting through irrelevant results from online searches." alison j. head, learning the ropes: how freshmen conduct course research once they enter college (project information literacy, december 5, 2013), http://projectinfolit.org/images/pdfs/pil_2013_freshmenstudy_fullreport.pdf, 15. book reviews the proceedings of the international conference on training for information work, rome, italy, 15th-19th november 1971, edited by georgette lubock. joint publication of the italian national information institute, rome, and the international federation for documentation, the hague; f.i.d. publ. 486; sept. 1972, rome, 510 p. let's face it: there is something about any proceedings that elicits a very personal reaction in many of us: "here are papers that either, a) got their authors a trip to the conference city; b) tell how we did good at our place; or c) unabashedly present h.b.i.'s (half-baked ideas)." i personally like proceedings that have many papers under category c); such papers make me think (or laugh). the great majority of papers in these rome proceedings fall basically under category b), i.e. 'how we done it good,' and some quite obviously under a), i.e. 'have paper will travel'; well, it was rome, italy, after all. however, there is a smattering of papers that fall under c), i.e. h.b.i.'s. so for those interested in the topic, these proceedings offer among other things some food for speculative thought. for these other things let us start at the beginning. the contents consist of prefatory sections, one opening address, sixty-six papers, a set of twenty brief conclusions, three closing addresses, a summary of work at the conference, an author index, and a list of participants and authors' addresses. the papers are organized according to two major sessions: one on "training of information specialists" (nine invited and forty-two submitted papers) and another on "training of information users" (six invited and nine submitted papers). the larger number of papers on training of specialists vs. training of users probably represents a good assessment of real education interests in the field. the conference was truly international: authors came from four continents, twenty countries, and four international organizations. most represented were: italy as host country with fifteen papers, usa with eight, great britain with seven, and france with six papers. the concern for information science education is indeed worldwide; however, if the presented papers are any measure, such education is in big trouble, because one is left with the impression that information science education is in some kind of limbo: the bases, relations, and directions are muddled or nonexistent. but then isn't all contemporary higher education in big trouble, and in limbo? the conceptions of what information science education is all about differ so widely from paper to paper that the question of this difference in itself could be a subject of the next conference. it is my impression that the differences are due to a) widely disparate preconceptions of the nature of "information problems," and b) incompetence of a number of authors in relation to the subjects.
accomplishments in some other field or, even worse, a high administrative title does not necessarily make for competence in information science education. the proceedings offer a fascinating picture of information science education by countries and by various facets. they also offer frustration due to unbelievably unhygienic semantic conditions in the treatment of concepts, including a confusion from the outset of "training" and "education." the first business of the field should be toward clearing its own semantic pollution; such a conclusion can be derived even after a most cursory examination of the papers. my own choices for the three most interesting papers are: v. slamecka and p. zunde, "science and information: some implications for the education of scientists" (usa); s. j. malan, "the implications for south african education in library science in the light of developments in information science" (south africa); and w. kunz and h. w. j. rittel, "an educational system for the information sciences" (germany). the editing of the proceedings is exemplary; the editors and conference organizers worked hard and conscientiously. the proceedings also provide the best single source published so far from which one could gain a wide international overview not only of information science education but also of information science itself, including implicitly the problems the field faces. in this lies the main worth of the proceedings. tefko saracevic computer processing of library files at durham university; an ordering and cataloging facility for a small collection using an ibm 360/67 machine. by r. n. oddy. durham, england: university library, 1971. 202 p. £1.75. the task of the book is to guide the reader in the use of the lfp (library file processing) system developed by the durham university library. the lfp system orders items and prints book catalogs in various sequences for a small collection of items with the aid of an electronic digital computer. the system is batch with card input and printed output; the programs are written in pl/1. "the lfp system was designed to be flexible and easy to operate for small files, and is less suitable for files larger than 10,000 items because there are then other problems which it does not attempt to solve." (p. 10). the book fulfills its assigned task well; it is an excellent example of explanations and instructions for the personnel charged with the day-to-day operations of the particular system described. the book includes excellent introductory chapters on job control language, how computers operate, file maintenance, etc. outside of the durham university library, however, the book has little use except as a model of a well done operations guide. kenneth j. bierman isad ad hoc committee reports. introduction: after seven years of operation as a division, it is an appropriate time to take stock of the division objectives and to describe desirable future activities of the division. to this end we established in july 1972 four ad hoc committees whose charges were to provide overviews of the division from slightly different perspectives. the first of these, the committee on objectives, was to review the activities of isad since its founding and to recommend future objectives and general activities.
second, the seminar and institute topics committee was charged with reviewing past isad-sponsored institute activities and with recommending topics for future seminars and institutes which would be of most value both for the isad membership and for general library personnel. the third committee, research topics, was charged with assembling a priority list of isad-related research and development needs of libraries. and, the fourth committee, membership survey, was charged with determining by a survey important characteristics of the current isad membership, i.e., their employment, experience, education, interests, and expectations. it can be seen from their charges that there is an interrelationship among the committees and thus, in the work to meet their charges, the committees would certainly wish to know the results of the work of the other committees. however, because of problems of timing and communication, it was recognized that the initial committee work would have to be carried out largely independent of the work of the other committees and that the task of knitting these committee results together would have to be carried out after the committees finished this stage of the work. the report of the committee on objectives was reviewed and accepted by the isad board of directors at the national meeting in las vegas. the work of the seminar topics committee and the research topics committee has not been reviewed by the board of directors. they are presented here in order to elicit thought and comment by the isad membership. the membership survey committee has established the survey to be carried out, but the survey has been delayed until funding could be obtained. this funding has been made available and analysis of survey results will be provided to the membership in the coming year. i should like to thank not only members of the ad hoc committees for their contributions to the division during the past year, but also, the members of the standing committees, the representatives and the chairmen of the discussion groups, all of whom contributed to an active, interesting and useful year for the division.-ralph m. shoffner, past-president, information science and automation division.

report of the committee on objectives

background

on january 27, 1966, the council of the american library association voted to establish ala's fourteenth division, the information science and automation division, which would concern itself, said the council, "with the development and application of electronic data processing techniques and the use of automated systems in all areas of library work." before this date, there was no membership unit within ala with the sole responsibility for library automation, so there was no effective way for librarians involved with automation to communicate, or to learn from each other's experiences. there was also no way for the national professional association to provide leadership for those in need of information and guidance in the field, since responsibility for the area (to the extent it had been recognized at all) lay fragmented among various units within ala.
two of the most important objectives of the division at the beginning, therefore, were the establishment of professional leadership at a national level (largely through the office of the executive secretary) to those libraries and librarians needing advice and help in the field of library automation, particularly smaller libraries unable to afford competent staff members with assigned responsibility in this area; and provision of a forum for discussion of library automation problems and experiences, and other means of communicating information in the field. a number of specific activities were also suggested at the beginning, including the following:
1. the establishment of a journal which would pull together articles on library automation, which at that time were appearing in many different places;
2. provision of a clearinghouse for information on library automation projects;
3. creation of a "bank" of computer programs and related documentation for use by other libraries planning similar applications;
4. evaluation of library automation equipment and applications;
5. tutorial seminars, preconference meetings and other educational programs.
other concerns or matters suggested to the ala committee on organization for immediate attention by the new division included:
1. standardization of coding systems;
2. interlibrary distribution of bibliographic data in machine-readable form;
3. shared programming;
4. establishment of library communication networks;
5. automated searching techniques;
6. compatibility of equipment and programming;
7. use measurement and user studies involving automated systems;
8. the financing of cooperative automation projects; and
9. various social and legal problems relating to automation in libraries.
during the first few years of the division's existence, most of these activities were pursued, some of them extensively (the seminars, for example) and others for only a brief period of time, until the need no longer seemed to exist. early committees also considered the need for a special library programming language, the computer aspects of the new copyright bill (then being drafted), and the method of designating periodical publications known as coden.

the committee on objectives

in the spring of 1972, ralph shoffner, the incoming president of isad, appointed an ad hoc committee on objectives to consider and present formal recommendations regarding the future objectives and activities of isad. part of the thought behind appointment of the committee was that since isad had been in existence for five years it would be an appropriate time to review both its activities and objectives, and even to consider whether the division should continue to exist; it also seemed likely that some of the original objectives of the division might have been attained, or might no longer be appropriate, and a revised set of objectives might be needed for the forthcoming years. besides this writer, the committee has included frederick g. kilgour of the ohio college library center; henriette avram of the library of congress; john mcgowan of northwestern university; john knapp of richard abel & company; joseph treyz of the university of wisconsin; and pauline atherton of the school of library science at syracuse university.
the membership of the committee was intended not only to incorporate knowledge of isad's history and informed judgment of the need for various possible activities in the future, but also to be as representative as possible of the various types of library activity and the types of institutions presumably served by the division.

first meeting and tentative conclusions

the committee held its first meeting on wednesday, june 28, 1972 during the ala annual conference in chicago. after considerable discussion, it was clear that the consensus of the committee members, based on their own experiences and the comments of colleagues to whom they had talked, was that:
1. the education function has been important, and should be continued, especially the seminars, which have been very useful (at least 2,000 people have attended the marc institutes co-sponsored by isad and the library of congress, and hundreds more have attended seminars on other topics at local, regional and national meetings);
2. the journal of library automation and technical communications have also been useful, and should not only be continued but improved and expanded;
3. the executive secretary provides a focus for inquiries from libraries needing help and advice, and has provided much useful information to many libraries and librarians;
4. the idea of a computer program "bank" is not practical at this time, largely because of the costs to ala in terms of staff and operating expenses that such a project would entail;
5. evaluation of equipment and applications is still needed, but would require more staff and expertise than the association can fund at this time;
6. the division has provided useful forums for discussion, particularly such forums as the marc users group which meets twice a year to exchange ideas and ask questions of staff members from the marc office at lc;
7. the division has also served a useful function in the promotion and development of standards, and in determining whether the standards have broad support in the library profession;
8. the division has helped to coordinate the concerns of other parts of ala regarding automation and information science, both through the executive secretary's office and through such devices as the preconference institute on data bases.
the general but tentative conclusion was that the division should continue, but with a revised set of objectives, emphasizing the educative function, the provision of current information through various means, and the development and promotion of standards.

requests for comment and responses

the committee then decided to test these tentative conclusions by asking the opinions of a number of librarians and others who appeared to be in a position to judge the effectiveness of isad (or the lack of it), or who could be expected to reflect a useful variety of viewpoints. each was sent a letter outlining the purposes of the committee and the tentative conclusions, followed by a request for written comments; each was also invited to attend the next meeting of the committee in january 1973 at the midwinter meeting in washington. the responses by and large agreed with the tentative conclusions, but in many cases went further.
for example, it was suggested that the division should take a more active stance, and should emphasize such things as making services available (especially through cooperation) that could not be made available before, reducing library operating costs, making the work of library staff more meaningful, minimizing the impact of library automation on people and jobs, encouraging research in library automation, and encouraging instruction in library automation for all students in library schools. it was also suggested that more emphasis be placed on reporting new developments at the annual meetings, after the model of the american society of physics, and that the division should be a vehicle for a "new librarianship" in which librarians would participate more fully in education and research rather than acting solely as a "service" organization. another respondent commented that more active liaison was needed with other professional groups in the information processing field, e.g., asis, afips, jcet, etc.; that transferability of programs and applications should continue to be encouraged; and that library schools should be encouraged to offer more courses in library automation. many of those who received requests for comment accepted the committee's invitation to attend the midwinter meeting in washington, and added other points of view. among them were suggestions that isad should do more toward developing the utilization of machine-readable data bases, including the production of indexes to such data bases; foster and encourage computer-based library networks; and promote methods of accountability for librarians and libraries through development of techniques for measuring unit output of products and services.

information technology

one new and large area of activity for isad was also formally suggested at this meeting: leadership in, and organizational responsibility for, audiovisual and related educational technologies, including cable television. the origins of this suggestion date back some months before the washington meeting, and since adoption of this suggestion would entail a major enlargement of isad's area of responsibility it may be useful to provide the background. at present, ala has an audiovisual committee established during the reorganization of the late fifties, plus five divisional subcommittees officially tied to the audiovisual committee, and approximately nineteen other committees actively concerned with audiovisual matters but with no official connection to the ala audiovisual committee. this structure has not provided a "home" for media specialists or a focus for their interests and activities, and ala has been criticized for years for its sporadic and disjointed attempts to give proper attention to audiovisual matters. in 1971, don s. culbertson, then executive secretary of isad, expressed formal concern over this issue to the isad board and proposed the establishment of an educational technology section within isad. following this, isad and the ala audiovisual committee agreed to a discussion group within isad to determine the extent of interest and need for such a membership "home." the isad information technology discussion group, formed as a result of this initiative, met first on june 28, 1972, at the chicago conference.
those present discussed areas of need and alternative organizational approaches, including the present organization of committees and subcommittees; the present organization, but with all committees reporting to the ala audiovisual committee; a round table; and affiliation with isad. becoming a part of isad seemed to this group to offer the most advantages: isad was an established division, capable of accommodating diverse interests, and with many concerns in common with the media specialists; it was already active in many areas of interest to media specialists, particularly standards development and educational programs; and affiliation would offer access to the two divisional publications, jola and jola technical communications. louise giles, leader of the discussion group, and pearce grove, chairman of the ala audiovisual committee, then met with the isad board and requested formal affiliation. the board referred the group to the committee on objectives, and on january 30, 1973, mrs. giles, mr. grove, and others interested in this request attended the midwinter meeting of the committee. discussion of this group's petition merged with the discussion of other activities of the division, as reported above, but the committee was unanimous in feeling that the group's request should be granted, and that the activities of isad should include audiovisual and related educational technology.

recommendations

as a result of its deliberations, and based on the comments and testimony of many interested and affected ala members, the committee now recommends the following:
1. that the division continue to exist;
2. that its area of responsibility include audiovisual and related educational technology;
3. that the objectives of the division be:
a. advancement of the state of the art of librarianship via research in, and application of, information science and library automation;
b. professional leadership at the national level in the fields of information science and library automation;
c. education and communication of information in these fields for libraries, librarians, and other interested parties;
d. provision of expertise and assistance in these fields to other units of ala, and to other professional organizations;
4. that among other activities the division pursue, or continue to pursue, the following:
a. publication of the journal of library automation, technical communications, and other publications that may from time to time appear necessary, appropriate, and feasible;
b. provision of forums for communication of information in its fields of responsibility, including local, regional, and national seminars, institutes, regularly scheduled meetings, discussion groups, and special programs on specific topics;
c. promotion of the development and use of appropriate standards;
d. investigation, largely through committees, of matters of immediate, even though temporary, concern to the profession, within the division's area of responsibility;
e. encouragement through various means of techniques, approaches, and specific activities outside the division which are desirable for the profession (such as computer-based library networks, increased instruction in library automation and information technology in library schools, cost-effectiveness in automated library systems, and development of the utilization of machine-readable data bases);
f. liaison with other professional organizations in its fields of responsibility.
respectfully submitted, pauline atherton, henriette avram, frederick g. kilgour, john knapp, john mcgowan, joseph h. treyz, stephen r. salmon, chairman

report of the committee on research topics

introduction

we take it as axiomatic that the fundamental purpose of library automation is to increase the productivity of people who work in libraries. a necessary concomitant of this is that we must know what it is that people who work in libraries do. the question of what they should do is a deeper question, but not one that experts in automation are likely to shed much light on. what is new in the elicitation of the contents of library work is the need for specification at a level of detail not needed when one must only specify for human direction. at the first, most elementary, level much of this has now been done. at least 25 or 30 major academic and public libraries have well-tested, working systems in operation and perhaps as many as 100 more have significant operations underway. the fact that some operations have failed outright, or fallen far short of promise, or been far too expensive, does not detract from the fundamental fact that a number of promising, economic systems are now in place and can be replicated throughout the library world at will. in this category we would place book catalog systems, basic circulation and acquisitions systems, and, more recently, card catalog production systems. such performance would not have been possible without great attention to detail and without the accumulation of a more precise understanding of how librarians work on these tasks. the question at hand is, given the developments of the past five years, where do we go from here? at least a portion of the answer lies in the observation that one of the by-products of the initial phase of automation is the creation of substantial data bases in machine-readable form which can be used to provide greater insight into the next higher level of library work. analysis of this unprecedented mass of data on library activity must therefore be placed high on any priority list of future activity. we shall consider some of the possibilities in traditional terms by enumerating some of the problems in three broad areas of library work: acquisitions, cataloging, and circulation.

acquisitions

the fundamental problem in acquisitions is the problem of selectivity. some rough estimates may help put things in perspective: although we know of no formal estimates of the total number of monographs residing in archives in, say, the english speaking world, it must surely be in excess of twenty million volumes. we do know from the u.s. office of education that the median-sized university library has something in the order of 750,000 volumes; thus with the possible exception of the library of congress, the british museum, and (literally) a handful of major university and public libraries there is little hope of ever obtaining a "complete" collection. the problem is not new and librarians have grappled with it for decades-if not centuries-with varying degrees of success. the several faces of the problem can perhaps best be seen by examining the questions posed by the several members of this committee:
1. how can libraries, with a minimum expenditure of time and money, determine how well they are serving their intended audiences?
2. what methodologies exist, or can be created, for dividing collection responsibility among members of a library consortium?
3. is it possible to develop criteria which libraries of varying size can use to identify not just the subject matter of materials to be purchased, but actually place priorities on serial titles, perhaps some monographs, government documents, etc. which should reside in a given library?
if we presuppose the existence of automated circulation, acquisition, and cataloging systems-with accompanying statistical packages to simplify the routine processing of machine-readable data-the following sorts of studies might help to illuminate these questions:
1. detailed statistics about who is borrowing what kinds of books, and how these borrowing patterns change within a year and from year to year, provide useful information about one aspect of library service. acquisitions data on the number and types of books ordered or suggested by patrons and the turnaround time necessary to obtain such items, and comparative studies of books obtained at patron request versus books obtained by other means, may be helpful.
2. if two or more libraries are to act together in planning their acquisitions, it seems reasonable to suggest that each library should first carefully describe its collection and borrowing pattern at a rather detailed level. even libraries not formally allied in a consortium would do well to publish more information about their holdings to allow individual users to better judge which of the several libraries available to the user is most likely to contain the desired information.
3. better circulation data cannot help but be useful in selecting those categories of publications that are likely to circulate well. periodic examination of citation indexes can shed insight into the problem of selecting periodicals. individual monograph selection might be improved by accumulating circulation data across a number of similar libraries to provide a "best-circulator" list to go along with the "best-seller" lists already available.
many such studies have been made in the past. the joint availability of computers and machine-readable records of transactions makes it rather inevitable that their number will increase-with or without the benefit of further research. given this, it seems reasonable to suggest that in addition to a frontal attack on the problem of selectivity in acquisitions, there should be a substantive effort towards improving the statistical methodology involved in analyzing the available data as well as an attempt to make more readily available those statistical techniques that have already been shown to have application in this area.

cataloging

if the key word in acquisitions is "selectivity," the corresponding term in cataloging is "access." some of the earliest automation efforts in libraries were directed to the production of machine-readable catalogs and associated programs to produce from such data bases printed book catalogs. because of the large cost in converting the retrospective catalog, such efforts were largely limited to small collections and had, perhaps, their best application in public library systems where multiple copies of the book catalog had obvious application. in such applications, access was enhanced by the reduced cost of multiple copies which enabled the system not only to maintain complete catalogs at each location but also to make copies available to neighboring institutions such as local school districts.
more recently, the coming into being of the marc data base has led to the creation of regional and national services for the more rapid creation of catalog card images. in these applications, the primary improvement in access is that provided by time gains that make it possible to get items on the shelf faster; however, some of these services, e.g. oclc, provide useful by-products such as the creation of an on-line union catalog of the various libraries using the system, thus facilitating interlibrary loan services. similarly, some automated circulation systems provide increased access to holdings through the use of on-line and/or telephone access to author and/or title catalogs. (ohio state university, with one of the more sophisticated systems, notes that in each of the first two years of operation, circulation increased 15 percent-considerably more than the growth of the campus community.) computers have also been used to multiply the number of available orderings of a shelflist, increase access to titles by permuted title lists, develop use of citations through citation indexes, and increase subject access through accumulation of book indexes in consolidated volumes; in many cases access is further increased by publication-and multiplication of the access through multiple copies. extensions and refinements will almost certainly continue through the coming decade. what research, then, is necessary? the following topics seem worthy of consideration:
1. develop measures of cost effectiveness of various access systems, particularly with regard to the relative merits of on-line, telephone, and printed copies of access systems.
2. systematically enumerate the various types of information requests placed on libraries to obtain a better understanding of what libraries can do to supply needed information as well as documents and bibliographic references.

circulation

some of the most successful automation efforts have been obtained in circulation control, together with some of the most useful by-products. as costs come down, it seems natural to hope that similar usage information might be made available in the noncirculating areas. specifically:
1. determine the feasibility of entering records of replacement of documents that were removed from the shelf, but not borrowed, into the existing circulation system.
2. develop economic means for entering information about "information requests" into the same system.

general

in addition to those requests that have been more, or less, accumulated under the traditional headings, we would like to present the following general recommendations for future library studies:
1. determine the needs in terms of coordinated planning, cooperation, and hardware and software transferability which should be confronted before the fact, rather than after, as more and more regional operations take shape.
2. determine how libraries can develop the problem-solving and idea-producing capabilities of their staffs to the maximum.
3. develop a continuing education program for librarians covering isad-related topics that can be presented throughout the u.s. at a reasonable price per person per class.

summary

the first phase of library automation required a significant effort to develop information on the basic chores of running a library.
as the studies moved from the testing ground to full operation they have started to generate a significant amount of information that can and should be exploited in turn to determine how library operations can be improved. such "research" will generally tend to be "applied" rather than "pure." they will tend to concentrate on cost-effectiveness at least as much as on novelty. the studies themselves need not be-and almost undoubtedly should not be-multimillion dollar efforts. although hardware and software developments will no doubt occur, the concentration appears to be more on planning and evaluating existing and proposed methods rather than on system "breakthroughs." respectfully submitted, don l. bosseau, michael d. cooper, douglas ferguson, james l. dolby, chairman

committee on research topics: members reports

. . . concerning isad-related research and development needs, my first instinct was to try to think of needs occurring in hardware and software areas, but the influence of my new administrative duties has caused me to forego my old interest in systems work, by suggesting projects that are more planning oriented than technical. (1) determine the needs in terms of coordinated planning, cooperation, and hardware and software transferability which should be confronted before the fact, rather than after, as more and more regional oclc type operations take shape. with centers now on the drawing boards for texas, one in the southeast at atlanta or possibly tulane university, the california state university and colleges project, the university of california bibcenter, etc., it is possible that many of the duplications of effort of the past (involving individual libraries) could take place again, only on a larger scale. of course, this gets into "networks," but there is a similarity with the problems posed in the past which were due to incompatibilities between individual library automation efforts, and the potential for the same problems occurring with the oclc type of operations. again, standard formats and other recent accomplishments will help alleviate the magnitude of the problems. (2) with the implementation of a growing number of automated circulation systems, inventory control and processing systems, and the combined effects of tightening funding and inflated costs of library materials, libraries will, for the first time, have both a compelling reason to tighten up their collection development mechanism while also having available some of the statistical data required to determine the scope of material being collected. thus a library could scientifically direct the nature of its acquisitions. specifically (though perhaps not clearly) i am suggesting that there is a need for research to study and perhaps develop criteria which academic libraries of varying sizes can use to identify not just the subject matter of materials to be purchased, but actually place priorities on serial titles, perhaps some monographs, government documents, etc. which should reside in a given library. such a set of criteria can be developed using circulation use statistics, statistical analysis of citation indexes, knowledge of the number of volumes and value of materials being published by subject areas, and perhaps other factors.
using statistics derived from citation indexes, as an example, we evaluated our journal collection in several subject areas in an effort to determine whether we were deriving maximum benefit in terms of cost and coverage from our existing journal collection. to do so we utilized a recent article by eugene garfield ("citation analysis as a tool in journal evaluation," science, 178:471-79, 3 november 1972) which ranked journals by frequency and impact of citations. it was interesting to note that in some areas we were providing effective coverage of the literature for our faculty and research programs, whereas in other areas our collection was picking up only 1 percent to 2 percent of the useful information as evaluated on the basis of frequency of citation. of course what i am proposing is to do research into how this type of information can be stored and extracted, and evaluated to provide general academic libraries with scientifically based guides upon which to base their acquisitions programs. (3) where should libraries go with automation after inventory control, internal housekeeping functions, and mis activities become routine? in other words, if there is going to be a phase iii (?) in library automation, where will its emphasis be, or better yet, where should it be? i hope that this will provide at least something to think about and perhaps lead to a more clearly defined set of research topics.-dlb

. . . a number of research topics seem to me worth exploring. however i have no idea whether they fall within the scope of isad. (1) economics of depository storage facilities. should twelve million volume libraries be built, or should we have secondary storage facilities? (2) methodologies for dividing collection responsibility among members of a library consortium. (3) how to predict the usage of individual monographs, not classes of material. (4) develop a continuing education program for librarians covering isad-related topics. present throughout the united states and at a price less than $10 per person per class. (5) undertake a study to determine a new editorial policy for the journal of library automation. my personal view would be that jola should move away from system descriptions and toward research topics.-mdc

libraries require research that will develop the capacity to improve existing operations and respond innovatively with new services, programs, and products. this requires applied research that produces immediately usable tools that can be applied by a library staff with a minimum of outside or specialist help. the focus is on the library staff at all levels (managers, supervisors, librarians, support personnel) and the aim is to enhance their ability to identify problems and formulate solutions within realistic constraints. the only way to move from the generality of these considerations to the concreteness of what i consider important is to state the questions to which applied research should address itself. (1) how can libraries do more to supply needed information as well as documents and bibliographic references? (2) how can libraries, with a minimum expenditure of time and money, determine how well they are serving their intended audience(s)? (3a) how can libraries develop the problem-solving and idea-producing capabilities of their staffs to the maximum? (3b) how can a library staff develop a cost-consciousness combined with an aggressive approach for funds for projects with demonstrable results?
comments

it should be clear that these questions assume that libraries can better serve their patrons if existing staff skills are developed at the same time as the library becomes more actively involved with those it serves. technological development and multimillion dollar research are not needed. managerial and staff cookbooks, library-based demonstration projects and on-the-job training programs are needed. such tools and activities would help develop the flexible, mobile, and aggressive counterpart to the library's equally important passive, conserving, and stabilizing role. for example the proposed research might have the following kinds of results. it might show how library managers (directors etc.) use noneconomic rewards in the library work system to foster cost control and reduction, and new service ideas and opportunities to get in touch with library users. research might produce a how-to-do-it manual on user studies that might show how to use existing data or quickly gather data, perhaps on a sampling basis, to evaluate performance of a service or operational unit.-df

report of the committee on seminar and institute topics

the committee

the mission of the ad hoc committee as defined by the committee with the concurrence of ralph shoffner, president of isad, was: "to propose a plan for a program of seminars and institutes within the interests and educational needs of the information science and automation division (isad) of the american library association. the plan should cover the period commencing july 1974 and ending june 1978." the members of the committee are: pauline atherton (syracuse university), brett butler (information design, inc.), jay cunningham (university of california-berkeley), paul fasana (new york public library), diana ironside (ontario institute for studies in education), sue martin (harvard university), ron miller (new england library information network), elaine svenonius (university of denver).

the approach

the committee began its work by placing the fulfillment of its mission firmly within the constraints of one major working assumption: that the educational function which the plan should serve must be directed toward accomplishing the objectives and needs of the isad membership. three parallel activities were undertaken as a result of our adherence to that assumption. first, a review of the extant data which resulted from seminars and institutes held by isad over the past several years was undertaken in order to identify characteristics of success or inadequacy which could be helpful to the committee. second, the deliberations of a parallel group, the isad committee on objectives, were obtained to provide the context within which the plan will function. third, the chairman of the isad membership survey committee was interviewed for the purpose of including questions pertinent to seminar and institute topics in the proposed survey. the remaining resources, external to the committee, were combined with the considerable experience and insight of the committee members. this experience includes continuing education techniques, information science research and education, automation of libraries and information services, computer and graphic technology, library cooperation and institute planning. some members play important roles in isad conference planning.
it was hoped that the combination of these resources could then be directed toward the intriguing prospect of developing some reasonable prescience about how technology may be applied to libraries and information science activities during the period ending june 1978. the result of the first activity-a review of the historical data about isad seminars and institutes-is discussed below, followed by a series of recommendations formulated by the committee within the context of the objectives proposed and accepted by the isad board of directors.

seminar and institute activities: historical

table 1 summarizes some of the data available to the committee about thirty-two seminar and institute programs held since 1968:

table 1. isad institutes 1967-1972.
dates | programs | location | attendance
1967 june | state of the art of library automation | san francisco | 800
1968 june | automated circulation systems | kansas city | 600
1968 july | marc institute | seattle | 94
1968 aug. | marc institute | denver | 99
1968 sep. | marc institute | new york | 112
1968 oct. | marc institute | chicago | 128
1968 nov. | marc institute | boston | 126
1968 dec. | marc institute | atlanta | 123
1969 feb. | marc institute | cleveland | 120
1969 mar. | marc institute | los angeles | 120
1969 mar. | marc institute | honolulu | 52
1969 apr. | marc institute | houston | (100+)?
1969 sep. | marc institute | san francisco | 208
1970 jan. | tutorial on library automation | washington, d.c. | 106
1970 mar. | marc institute | washington, d.c. | 167
1970 mar. | tutorial on library automation | seattle | 79
1970 apr. | marc institute | evanston, il | 92
1970 apr. | tutorial on library automation | cambridge, ma | 107
1970 may | tutorial on library automation | new york | 156
1970 june | marc institute | cambridge, ma | 105
1970 oct. | tutorial on library automation | philadelphia | 123
1970 nov. | tutorial on library automation | san francisco | 72
1970 dec. | library automation for school libraries | dallas | 38
1971 feb. | library automation for school libraries | atlanta | 80
1971 mar. | tutorial on library automation | elgin, il | 77
1971 apr. | marc institute | new york | 100
1971 may | administration & management | dedham, ma | 54
1971 nov. | directions in information science education | denver | 67
1972 feb. | marc institute | washington, d.c. | 139
1972 may | microforms in library automation | new york | 53
1972 june | administration & management | new york | 52
1972 sep. | seminar on telecommunications | washington, d.c. | 110
total seminars/institutes: 32; estimated attendance: 4,460

by the end of september 1972, thirty-two seminars or institutes had been held across the united states from boston to honolulu. fifty percent of the sessions were concerned directly with the exposition and use of either the marc i or marc ii communications formats. the remaining seminars dealt with topics such as the introduction to library automation in general and to school libraries in particular, as well as library automation management, information science education, micrographic and telecommunications technologies. as a group, the marc-related institutes reached approximately 1,900 people (not allowing for multiple attendance), or 43 percent of the total attendance over the five-year period. the most heavily attended institutes were those held either immediately before or after annual conferences of ala or asis. the northeast region (boston area, new york, philadelphia, and washington, d.c.)
hosted fourteen institutes; the midwest region (chicago area, cleveland, and kansas city) hosted five; the southwest (denver, houston, and dallas) hosted four; the west coast (los angeles and san francisco) also hosted four; the northwest (seattle) and southeast (atlanta) regions each were the site of two institutes. one institute was held in honolulu, and another was canceled. if pre- and post-conference institutes are excluded, the arithmetic mean attendance at a typical institute was approximately 98. the range was from a low of 38 (library automation for school libraries in dallas, december 1970) to a high of 167 (marc institute in washington, d.c., in march of 1970). it is risky to conjecture why any single institute was more successful than another purely on the basis of attendance. attendance figures are probably more nearly a function of the expectations of the target population, the magnetism of the topic, and how much physical and financial investment is required to attend. there is little doubt, however, that the best attendance can be obtained if seminars are held immediately before or after an annual meeting of ala or isad in the same city. some post-institute evaluative data, also provided by isad headquarters, seemed to indicate that when dissatisfaction with institutes was registered by participants, it was derived from the diverse backgrounds of participants who attended institutes. for some, the content was too elementary; for others, it was too advanced. some attendees were technologists while others were managers. both large and small libraries of all types were represented. another factor relating to the apparent effectiveness of a particular institute was the perceived competence or teaching ability of particular institute faculty members. these factors do not reveal themselves as startling revelations, but do imply that some care should be used to describe and reach particular target populations, and to select instructional leaders. another important fact: between two-thirds and three-quarters of the participants of a typical institute were not members of isad. it is readily apparent, therefore, that these programs have reached an audience drawn from a broad segment of the library professional community. this condition is no doubt a good one, but it may imply a differential registration fee structure as well as a fruitful context in which to recruit new division members. with these considerations of the past in mind, the committee offers the comments and recommendations which follow. it should not be assumed that these recommendations are necessarily endorsed unanimously by the committee members.

seminar and institute activities: recommendations

1. the isad committee on seminar and institute topics and the isad committee on objectives fully concur that the seminar/institute program has been valuable as a division activity in service of the profession. the committee therefore recommends that a program of seminars and institutes be continued and broadened by the division during the next five years.
2. a history of fruitful conjoining of the division's seminars and institutes program with the library of congress, the american society for information science and other institutes and associations has been mutually beneficial.
the committee therefore recommends that a conscious effort be made to continue the division's seminars and institutes program in cooperation with appropriate activities within the library of congress and other federal library-oriented activities (such as the federal library cooperative center), and expand its target population and faculties to include members of asis, acm, iia, afips and jcet.
3. the geographic distribution of previous seminars and institutes based upon population concentrations has appeared to be reasonably sound. the committee therefore recommends that the division's seminars and institutes program continue to disperse and replicate seminars among six geographic regions: the northeast, southeast, midwest, southwest, west and northwest. the gulf coast area should be considered as a central location for programs which the division may co-sponsor with the southeastern library association and the southwestern library association. programs occurring in the northern half of the continental united states should coordinate with our canadian counterparts.
4. technological advances which may be applicable to the concerns of the division are developing with astonishing speed. the committee therefore recommends that an "alert group" be formed from technically aware people for the purpose of formulating topics which should be considered by the planning group assigned to monitor the division's seminars and institutes program. a secondary source of topical input to a planning group should be a "topic alert form" included in the division's publications.
5. planning and implementing a series of seminars and institutes is a severe time burden on volunteers. the committee therefore recommends that the reliance upon voluntary workers to design and implement seminars and institutes should diminish, the major burden of such work passing to staff responsible for this work at ala headquarters. the staff salary should be substantially reimbursed from income derived from the program. further, subcontractors whose products are educational programs, as opposed to systems, hardware or supplies, should be considered as supplemental manpower to both voluntary committee work and isad staff investment in particular seminars or institutes.
6. commercial interests may have particular products to sell to the isad community and view the division's seminars and institutes program as a channel through which their products can be marketed. the committee therefore recommends that a policy be formally adopted which permits the participation of representatives of commercial enterprises in the seminars and institutes program under the condition that appropriate competitors have equal and simultaneous opportunity to offer alternative products or points of view. the division must maintain its professional distance from any one technological solution to a particular set of problems.
7. dissatisfaction on the part of some institute participants has been attributed to (1) participant heterogeneity, (2) misplaced expectations, (3) inadequacies in seminar faculties. the committee therefore recommends that the target populations be clearly defined and subgroups be given special attention; that expectations be clearly spelled out in publicity announcements and that faculty members be selected with care.
8. the most useful measure of success in any educational program is purported to be evidenced by some behavioral or attitudinal change on the part of the "educatee" occurring after a defined educational experience. the committee therefore recommends that, where possible, evaluation techniques be used on a sampling basis to obtain such evidence after a period of time has elapsed beyond the occurrence of the seminar experience.
9. a limited number of people have been able to attend particular seminars because of the barriers of time, cost, and space. the committee therefore recommends that a supplemental program using edited audio or video media be attempted to promulgate institute segments in shortened time frames.
10. several colleges and universities are committed to continuing education in information science and library automation. the committee therefore recommends that, when appropriate, the talent of selected graduate schools be considered as resources for faculty and for program planning. one prerequisite to maximize the success of this type of alliance is evidence that the objectives of the isad seminars and institutes program are coincident with the objectives of the continuing education programs of such graduate schools.
11. information science has not received much attention per se within the seminars and institutes program, and the committee regards this state of affairs as insufficient to attain the objectives of the division. the committee therefore recommends that a working definition of information science be promulgated by the division to provide guidance for planning seminars and institutes for its practitioners. in this connection, close liaison with asis and special interest groups in other organizations is a natural procedure to follow.
12. the technologies which appear to have a growing impact upon library and information center operations and services are: computers, micrographics, media systems, and telecommunications. the committee therefore recommends that the division's seminars and institutes program planners seek to provide a series of tutorials not only on the operations and applications of these technologies taken separately, but also on the inter-relationships which are possible among them.
13. based upon suggestions contributed by the committee members and others, the following topical areas are recommended for consideration in planning for future seminars and institutes.
a. interlibrary cooperation
1. technological options
2. how to choose a network
3. variant forms of marc (e.g., serials, music, etc.)
b. technology
1. the interrelationships of computers, micrographics, instructional media systems and telecommunications
2. mini-computers
3. telecommunications
4. non-print audio and visual technology
5. techniques of data base storage, search and retrieval: trade-offs
6. the emergence of commercial jobbers in support of library automation
c. internetwork compatibility
1. network transferability
2. data base interchange
3. methods of interconnection through telecommunications
4. standards and protocols
d. management and people
1. impact of automation on people and jobs
2. automation, networks and educational needs of librarians
3. contract negotiation
4. problem solving for library managers
5. the impact of nonlibrarians on library automation and information science
6. the theory and application of interinstitutional cooperation: quid pro quo revisited
7. why automation projects fail.
respectfully submitted, pauline atherton, brett butler, jay cunningham, paul fasana, diana ironside, sue martin, elaine svenonius, ronald miller, chairman

harvesting information from a library data warehouse

siew-phek t. su and ashwin needamangala

data warehousing technology has been defined by john ladley as "a set of methods, techniques, and tools that are leveraged together and used to produce a vehicle that delivers data to end users on an integrated platform."1 this concept has been applied increasingly by industries worldwide to develop data warehouses for decision support and knowledge discovery. in the academic sector, several universities have developed data warehouses containing the universities' financial, payroll, personnel, budget, and student data.2 these data warehouses across all industries and academia have met with varying degrees of success. data warehousing technology and its related issues have been widely discussed and published.3 little has been done, however, on the application of this cutting edge technology in the library environment using library data.

motivation of project

daniel boorstin, the former librarian of congress, mentions that "for most of western history, interpretation has far outrun data."4 however, he points out "that modern tendency is quite the contrary, as we see data outrun meaning." his insights tie directly to many large organizations that long have been rich in data but poor in information and knowledge. library managers are increasingly recognizing the importance of obtaining a comprehensive and integrated view of the library's operations and the services it provides. this view is helpful for the purpose of making decisions on the current operations and for their improvement. due to financial and human constraints for library support, library managers increasingly encounter the need to justify everything they do, for example, the library's operation budget. the most frustrating problem they face is knowing that the information needed is available somewhere in the ocean of data but there is no easy way to obtain it. for example, it is not easy to ascertain whether the materials of a certain subject area, which consumed a lot of financial resources for their acquisition and processing, are either frequently used (i.e., high rate of circulation), seldom used, or not used at all, or whether they satisfy users' needs. as another example, an analysis of the methods of acquisition (firm order vs. approval plan) together with the circulation rate could be used as a factor in deciding the best method of acquiring certain types of material. such information can play a pivotal role in performing collection development and library management more efficiently and effectively. unfortunately, the data needed to make these types of decisions are often scattered in different files maintained by a large centralized system, such as notis, that does not provide a general querying facility, or by different file/data management or application systems. this situation makes it very difficult and time-consuming to extract useful information. this is precisely where data warehousing technology comes in.
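the kind of cross-file question raised above (how circulation differs between firm-order and approval-plan titles) becomes a single relational query once acquisitions and circulation data share one warehouse. the sketch below is illustrative only: the table and column names (acquisitions, circulation, acq_method, charges, bib_id) are hypothetical, and sqlite stands in for the microsoft access warehouse actually used in this project.

```python
# a minimal sketch of a decision-support query over a hypothetical warehouse
# schema; names are illustrative, not the project's actual tables.
import sqlite3

def circulation_by_acquisition_method(db_path):
    """average charges per title, grouped by acquisition method."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        """
        SELECT a.acq_method,
               COUNT(*) AS titles,
               AVG(COALESCE(c.charges, 0)) AS avg_charges
        FROM acquisitions AS a
        LEFT JOIN circulation AS c ON c.bib_id = a.bib_id
        GROUP BY a.acq_method
        ORDER BY avg_charges DESC
        """
    ).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for method, titles, avg_charges in circulation_by_acquisition_method("warehouse.db"):
        print(f"{method}: {titles} titles, {avg_charges:.2f} charges per title")
```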
the goal of this research and development work is to apply data warehousing and data mining technologies in the development of a library decision support system (ldss) to aid the library management's decision making. the first phase of this work is to establish a data warehouse by importing selected data from separately maintained files presently used in the george a. smathers libraries of the university of florida into a relational database system (microsoft access). data stored in the existing files were extracted, cleansed, aggregated, and transformed into the relational representation suitable for processing by the relational database management system. a graphical user interface (gui) is developed to allow decision makers to query the data warehouse's contents using either some predefined queries or ad hoc queries. the second phase is to apply data mining techniques on the library data warehouse for knowledge discovery. this paper covers the first phase of this research and development work. our goal is to develop a general methodology and inexpensive software tools which can be used by different functional units of a library to import data from different data sources and to tailor different warehouses to meet their local decision needs. for meeting this objective, we do not have to use a very large centralized database management system to establish a single very large data warehouse to support different uses.

local environment

the university of florida libraries has a collection of more than two million titles, comprising over three million volumes. it shares a notis-based integrated system with nine other state university system (sus) libraries for acquiring, processing, circulating, and accessing its collection. all ten sus libraries are under the consortium umbrella of the florida center for library automation (fcla).

siew-phek t. su (pheksu@mail.uflib.ufl.edu) is associate chair of the central bibliographic services section, resource services department, university of florida libraries, and ashwin needamangala (nsashwin@grove.ufl.edu) is a graduate student at the electrical and computer engineering department, university of florida.

library data sources

the university of florida libraries' online database, luis, stores a wealth of data, such as bibliographic data (author, title, subject, publisher information), acquisitions data (price, order information, fund assignment), circulation data (charge out and browse information, withdrawn and inventory information), and owning location data (where item is shelved). these voluminous data are stored in separate files. the notis system as used by the university of florida does not provide a general querying facility for accessing data across different files. extracting any information needed by a decision maker has to be done by writing an application program to access and manipulate these files. this is a tedious task since many application programs would have to be written to meet the different information needs. the challenge of this project is to develop a general methodology and tools for extracting useful data and metadata from these disjointed files, and to bring them into a warehouse that is maintained by a database management system such as microsoft access.
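the first-phase workflow just described (extract from separately maintained files, cleanse, transform, and load into relational tables) can be illustrated with a short sketch. the input layout (a tab-delimited export of bib_id, title, fund, and price) and the target table are assumptions made for the example, not the project's actual file formats, and sqlite again stands in for microsoft access.

```python
# a minimal extract-cleanse-load sketch under the assumptions stated above.
import csv
import sqlite3

def load_acquisitions(export_path, db_path):
    con = sqlite3.connect(db_path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS acquisitions (
               bib_id TEXT PRIMARY KEY, title TEXT, fund TEXT, price REAL)"""
    )
    with open(export_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 4:                     # skip malformed export lines
                continue
            bib_id, title, fund, price = (field.strip() for field in row[:4])
            try:
                price_value = float(price.lstrip("$"))
            except ValueError:
                price_value = None               # cleanse: unparseable price becomes NULL
            con.execute(
                "INSERT OR REPLACE INTO acquisitions VALUES (?, ?, ?, ?)",
                (bib_id, title, fund.upper(), price_value),
            )
    con.commit()
    con.close()
```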
the selection of access and pc hardware for this project is motivated by cost consideration. we envision that multiple special purpose warehouses be established on multiple pc systems to provide decision support to different library units. the library decision support system (loss) is developed with the capability of handling and analyzing an established data warehouse. for testing our methodology and software system, we established a warehouse based on twenty thousand monograph titles acquired from our major monograph vendor. these titles were published by domestic u.s. publishers and have a high percentage of dlc/dlc records (titles cataloged by the library of congress). they were acquired by firm order and approval plan, the publication coverage is the calendar year 1996-1997. analysis is only on the first item record (future project will include all copy holdings). although the size of the test data used is small, it is sufficient to test our general methodology and the functionality of our software system. fcla d82 tables and key list most of the data from the twenty-thousand-title domain that go into the loss warehouse are obtained from the db2 tables maintained by fcla. fcla developed and maintains the database of a system called ad hoc report request over the web (arrow) to facilitate querying and generating reports on acquisitions activities . the data are stored in 0b2 tables. 5 for our research and development purpose, we needed db2 tables for only the twenty-thousand titles that we identified as our initial project domain. these titles all have an identifiable 035 field in the bibliographic records (zybp1996, zybcip1996, zybp1997 or zybpcip1997). we used the batchbam program developed by gary strawn of northwestern university library to extract and list the unique bibliographic record numbers in separate files for fcla to pick up. 6 using the unique bibliographic record numbers, fcla extracted the 0b2 tables from the arrow database and exported the data to text files. these text files then were transferred to our system using the file transfer protocol (frp) and inserted as tables into the loss warehouse. bibliographic and item records extraction fcla collects and stores complete acquisitions data from the order records as db2 tables. however, only brief bibliographic data and no item record data are available . bibliographic and item record data are essential for inclusion in the loss warehouse in order to create a viable integrated system capable of performing cross-file analysis and querying for the relationships among different types of data. because these required data do not exist in any computer readable form, we designed a method to obtain them. using the identical notis key lists to extract the targeted twenty-thousand bibliographic and item records, we applied a screen scraping technique to scrape the data from the screen and saved them in a flat file. we then wrote a program in microsoft visual basic to clean the scraped data and saved them as text-delimited files that are suitable for importing into the loss warehouse. screen scraping concept screen scraping is a process used to capture data from a host application. it is conventionally a three-part process: • displaying the host screen or data to be scraped. • finding the data to be captured. • capturing the data to a pc or host file, or using it in another windows application. 
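to make the idea of coordinate-based capture concrete, the short java sketch below pulls one field out of a captured 80-column host screen by its row and column position; the sample screen text and the field positions are illustrative assumptions, not the hostexplorer macro used in this project (which is described next).

```java
import java.util.List;

/** minimal illustration of coordinate-based screen scraping:
 *  given the text of one captured 80-column host screen,
 *  pull out a field by its row and column position. */
public class ScreenScrapeDemo {

    /** extract the text at (row, col) of the given width from a captured screen. */
    static String field(List<String> screenLines, int row, int col, int width) {
        String line = screenLines.get(row);
        if (col >= line.length()) {
            return "";
        }
        int end = Math.min(col + width, line.length());
        return line.substring(col, end).trim();
    }

    public static void main(String[] args) {
        // two illustrative "screen" lines; a real capture would hold 24 rows of 80 characters.
        List<String> screen = List.of(
            "AKR9234                 BIBLIOGRAPHIC RECORD                    PAGE 1 OF 2",
            "010:: |a 96012345        035:: |a zybp1996");

        // notis record number printed near the top-left of the screen (assumed position).
        String notisNumber = field(screen, 0, 0, 8);
        System.out.println("record: " + notisNumber);   // -> record: AKR9234
    }
}
```

as the next section explains, the project ultimately captured entire screens rather than individual coordinates, because field positions shift from record to record.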
in other words, we can capture particular data on the screen by providing the corresponding screen coordinates to the screen scraping program. numerous commercial applications for screen scraping are available on the market. however, we used an approach slightly different from the conventional one. although we had to capture only certain fields from the notis screen, there were other factors that we had to take into consideration. they are: • the location of the various fields with respect to the screen coordinates changes from record to record . this makes it impossible for us to lock a particular field with a corresponding screen coordinate. 18 information technology and libraries i march 2000 reproduced with permission of the copyright owner. further reproduction prohibited without permission. • the data present on the screen are dynamic because we are working on a "live" database where data are frequently modified. for accurate query results, all the data, especially the item record data where the circulation transactions are housed, need to be captured within a specified time interval so that the data are uniform. this makes the time taken for capturing the data extremely important. • most of the fields present on the screen needed to be captured. taking the above factors into consideration, it was decided to capture the entire screen instead of scraping only certain parts of the screen. this made the process both simpler and faster . the unnecessary fields were filtered out during the cleanup process . i system architecture the architecture of the loss system is shown in figure 1 and is followed by a discussion on its components' functions. notis notis (northwestern online totally integrated system) was developed at the northwestern university library and introduced in 1970. since its inception, notis has undergone many versions. university of florida libraries is one of the earliest users of notis. fcla has made many local modifications of the notis system since uf libraries started using it. as a result, the uf notis is different from the rest of the notis world in many respects . notis can be broken down into four subsystems: • acquisitions • cataloging • circulation • online public access catalog (opac) at the university of florida libraries, the notis system runs on an ibm 370 main frame computer that runs the os/390 operating system . host explorer host explorer is a software program that provides a tcp /ip link to the main frame computer . it is a terminal emulation program supporting the ibm main frame, as/400, and vax hosts . host explorer delivers an enhanced user environment for all windows nt platforms, windows 95 and windows 3.x desktops. exact tn3270e, tn5250, vt420/320/220/101/100/52, wyse 50/60 and ansi-bbs display is extended to leverage the wealth of the windows desktop. it also supports all db2tables loss host explorer data cleansing and extraction warehouse graphical user interface figure 1. loss architecture and its components tcp /ip based tn3270 and tn3270e gateways. the host explorer program is used as the terminal emulation program in loss. it also provides vba compatible basic scripting tools for complete desktop macro development. users can run these macros directly or attach them to keyboard keys, toolbar buttons, and screen hotspots for additional productivity. the function of host explorer in the loss is v ery simple. 
it has to "visit" all screens in the notis system corresponding to each notis number present in the batchbam file, and capture all the data on the screens. in order to do this, we wrote a macro that read the notis number one at a time from the batchbam file and input the number into the command string of host explorer . the macro essentially performed the following functions: • read the notis numbers from the batchbam file. • inserted the notis number into the command string of host explorer . • toggled the screen capture option in host explorer so that data are scraped from the screen only at necessary times. • saved all the scraped data into a flat file. after the macro has been executed, all the data scraped from the notis screen reside in a flat file. the data present harvesting information from a library data warehouse i su and needamangala 19 reproduced with permission of the copyright owner. further reproduction prohibited without permission. in this file have to be cleansed in order to make them suitable for insertion into the library warehouse. a visual basic program is written to perform this function. the details of this program will be given in the next section. i data cleansing and extraction this component of the loss is written in the visual basic programming language. its main function is to cleanse the data that have been scraped from the notis screen. the visual basic code saves the cleansed data in a text-delimited format that is recognized by microsoft access. this file is then imported into the library warehouse maintained by microsoft access. the detailed working of the code that performs the cleansing operation is discussed below. the notis screen that comes up for each notis number has several parts that are critical to the working of the code. they are: • notis number present in the top-right of the screen (in this case, akr9234) • field numbers that have to be extracted. example: 010::, 035:: • delimiters. the " i " symbol is used as the delimiter throughout this code. for example, in the 260 field of a bibliographic record, "i a" delimits the place of publication, " i b" the name of the publisher and, "i c" the date of publication. we shall now go step by step through the cleansing process. initially we have the flat file containing all the data that have been scraped from the notis screens. • the entire list of notis numbers from the batchbam file is read into an array called bam_number$. • the file containing the data that have been scraped is read into a single string called bibrecord$. • this string is then parsed using the notis numbers from the bam_number$ array. • we now have a string that contains a single notis record. this string is called single_record$. • the program runs in a loop till all the records have been read. • each string is now broken down into several smaller strings based on the field numbers. each of these smaller strings contains data pertaining to the corresponding field number. • a considerable amount of the data present on the notis screen is unnecessary from the point of view of our project. we need only certain fields from the notis screen. but even from these fields we need the data only from certain delimiters. therefore, we now scan each of these smaller strings for a certain set of delimiters, which was predefined for each individual field. the data present in the other delimiters are discarded. • the data collected from the various fields and their corresponding delimiters are assigned to corresponding variables. 
some variables contain data from more than one delimiter concatenated together. the reason for this can be explained as follows. there are certain fields which are present in the database only for informational purposes and will not be used as a criteria field in any query. since these fields will never be queried upon, they do not need to be cleansed as rigorously as the other fields, and therefore we can afford to leave the data of these fields as concatenated strings. example: the catalog_source field, which has data from "|a" and "|c", is of the form "|a dlc |c dlc", while the lang_code field, which has data from "|a" and "|h", is of the form "|a eng |h rus"; but we split this into two fields: lang_code_1 containing "eng" and lang_code_2 containing "rus." • the data collected from the various fields are saved in a flat file in the text-delimited format. microsoft access recognizes this format. a screen dump of the text-delimited file, which is the end result of the cleansing operation, is shown in figure 2. (figure 2, a text-delimited file, lists one quoted, comma-separated record per notis number, beginning with the record number, system control number, cataloging source, and language codes.) the flat file, which we now have, can be imported into the library warehouse. i graphical user interface in order to ease the tasks of the user (i.e., the decision maker) to create the library warehouse and to query and analyze its contents, a graphical user interface tool has been developed. through the gui, the user can enact the following processes or operations through a main menu: • connection to notis • screen scraping • data cleansing and extracting • importing data • viewing collected data • querying • report generating the first option opens hostexplorer and provides a connection to notis. it provides a shortcut to closing or minimizing ldss and opening hostexplorer. the screen scraping option activates the data scraping process. the data cleansing and extracting option filters out the unnecessary data fields and saves the cleansed data in a text-delimited format. the importing data option imports the data in the text-delimited format into the warehouse. the viewing collected data option allows the user to view the contents of a selected relational table stored in the warehouse. the querying option activates ldss's querying facility that provides wizards to guide the formulation of different types of queries, as discussed later in this article. the last option, report generating, is for the user to specify the report to be generated. i data mining tool a very important component of ldss is the data mining tool for discovering association rules that specify the interrelationships of data stored in the warehouse. many data mining tools are now available in the commercial world. for our project, we are investigating the use of a neural-network-based data mining tool developed by limin fu of the university of florida.7 the tool allows the discovery of association rules based on a set of training data provided to the tool. this part of our research and development work is still in progress. the existing gui and report generation facilities will be expanded to include the use of this mining tool. i library warehouse fcla exports the data existing in the db2 tables into text files. as a first step towards creating the database, these text files are transferred using ftp and form separate relational tables in the library warehouse.
the data that are scraped from the bibliographic and item record screens result in the formation of two more tables. characteristics data in the warehouse are snapshots of the original data files. only a subset of the data contents in these files is extracted for querying and analysis, since not all the data are useful for a particular decision-making situation. data are filtered as they pass from the operational environment to the data warehouse environment. this filtering process is necessary particularly when a pc system, which has limited secondary storage and main memory space, is used. once extracted and stored in the warehouse, data are not updateable. they form a read-only database. however, different snapshots of the original files can be imported into the warehouse for querying and analysis. the results of the analyses of different snapshots can then be compared. structure data warehouses have a distinct structure. there are summarization and detail structures that demarcate a data warehouse. the structure of the library data warehouse is shown in figure 3. (figure 3, structure of the library data warehouse: notis screens, via screen scraping, and fcla db2 tables are imported into the warehouse tables ufbib, ufpay, ufinv, ufcirc, and uford, from which bibliographic, circulation, and pay data views are presented to the user.) the different components of the library data warehouse as shown in figure 3 are: • notis and db2 tables. bibliographic and circulation data are obtained from notis through the screen scraping process and imported into the warehouse. fcla maintains acquisitions data in the form of db2 tables. these are also imported into the warehouse after conversion to a suitable format. • warehouse. the warehouse consists of several relational tables that are connected by means of relationships. the universal relation approach could have been used to implement the warehouse by using a single table. the argument for using the universal relation approach would be that all the collected data fall under the same domain. but let us examine why this approach would not have been suitable. the different data collected for import into the warehouse were bibliographic data, circulation data, order data, and pay data. now, if all these data were incorporated into one single table with many attributes, it would not be of any exceptional use, since each set of attributes has its own unique meaning when grouped together as a bibliographic table, circulation table, and so on. for example, if we group the circulation data and the pay data together in a single table, it would not make sense. however, the pay data and the circulation data are related through the bib_key. hence, our use of the conventional approach of having several tables connected by means of relationships is more appropriate. • views. a view in sql terminology is a single table that is derived from other tables. these other tables could be base tables or previously defined views. a view does not necessarily exist in physical form; it is considered a virtual table, in contrast to base tables whose tuples are actually stored in the database. in the context of the ldss, views can be implemented by means of the ad hoc query wizard. the user can define a query/view using the wizard and save it for future use. the user can then define a query on this query/view. • summarization. the process of implementing views falls under the process of summarization.
summarization provides the user with views, which make it easier for users to query on the data of their interests. as explained above, the specific warehouse we established consists of five tables. table names including "_wh" indicates that it contains current detailed data of the warehouse. current detailed data represents the most recent snapshot of data that has been taken from the notis system. the summarized views are derived from the current detailed data of the warehouse. since current detailed data of the warehouse are the basic data of the 22 information technology and libraries i march 2000 reproduced with permission of the copyright owner. further reproduction prohibited without permission. application, only the current detailed data tables are shown in appendix a. i decision support by querying the warehouse the warehouse contains a set of integrated relational tables whose contents are linked by the common primary key, the bib_key (biblio_key). the data stored across these tables can be traver sed by matching the key values associated with their tuples or records . decision makers can issue all sorts of sql-type queries to retrieve useful information from the warehouse. two general types of queries can be distinguished : predefined queries and ad hoc queries . the former type refers to queries that are frequently used by decision makers for accessing information from different snapshots of data imported into the warehouse . the latter type refers to queries that are exploratory in nature. a decision maker suspects that there is some relationship between different types of data and issues a query to verify the existence of such a relationship. alternatively, data mining tools can be applied to analyze the data contents of the warehouse and discover rules of their relationships (or associations). predefined queries below are some sample queries posted in english. their corresponding sql queries can be processed using loss. l. number and percentage of approval titles circulated and noncirculated. 2. number and percentage of firm order titles circulated and noncirculated . 3. amount of financial resources spent on acquiring noncirculated titles. 4. number and percentage of dlc/dlc cataloging records in circulated and noncirculated titles . 5. number and percentage of "shared" cataloging records in circulated and noncirculated titles. 6. numbers of original and "shared" cataloging records of noncirculated titles. 7. identify the broad subject areas of circulated and noncirculated titles . 8. identify titles that have been circulated "n" number of times and by subjects . 9. number of circulated titles without the 505 field. each of the above english queries can be realized by a number of sql queries. we shall use the first two english queries and their corresponding sql queries to explain how the data warehouse contents and the querying facility of microsoft access can be used to support decision making. the results of sql queries also are given . the first english query can be divided into two parts (see figure 4), each realized by a number of sql queries as shown below . 
sample query outputs query 1: number and percentage of approval titles circulated and noncirculated. result:
total approval titles: 1172
circulated: 980 (83.76%)
noncirculated: 192 (16.24%)
similar to the above sql queries, we can translate the second english query into a number of sql queries, and the result is given below. query 2: number and percentage of firm order titles circulated and noncirculated. result:
total firm order titles: 1829
circulated: 1302 (71.18%)
noncirculated: 527 (28.82%)
report generation the results of the two predefined english queries can be presented to users in the form of a report:
total titles: 3001
approval: 1172 (39%), of which circulated 980 (83.76%) and noncirculated 192 (16.24%)
firm order: 1829 (61%), of which circulated 1302 (71.18%) and noncirculated 527 (28.82%)
from the above report, we can ascertain that, though 39 percent of the titles were purchased through the approval plan and 61 percent through firm orders, the approval titles have a higher rate of circulation, 83.76 percent, as compared to firm order titles at 71.18 percent. it is important to note that the result of the above queries is taken from only one snapshot of the circulation data. analysis from several snapshots is needed in order to compare the results and arrive at reliable information. we now present a report on the financial resources spent on acquiring and processing noncirculated titles. in order to generate this report, we need the output of queries four and five listed earlier in this article. the corresponding outputs are shown below. query 4: number and percentage of dlc/dlc cataloging records in circulated and noncirculated titles. result:
total dlc/dlc records: 2852
circulated: 2179 (76.40%)
noncirculated: 673 (23.60%)
query 5: number and percentage of "shared" cataloging records in circulated and noncirculated titles. result:
total "shared" records: 149
circulated: 100 (67.11%)
noncirculated: 49 (32.89%)
in order to come up with the financial resources, we need to consider several factors which contribute to the amount of financial resources spent. for the sake of simplicity, we consider only the following factors: 1. the cost of cataloging each item with a dlc/dlc record 2. the cost of cataloging each item with a shared record 3. the average price of noncirculated books 4. the average pages of noncirculated books 5. the value of shelf space per centimeter because the value of the above factors differs from institution to institution and might change according to more efficient workflow and better equipment used, users are required to fill in the values for factors 1, 2, and 5. ldss can compute factors 3 and 4. the financial report, taking into consideration the value of the above factors, could be as shown below.
processing cost of each dlc title = $10.00; 673 x $10.00 = $6,730.00
processing cost of each shared title = $20.00; 49 x $20.00 = $980.00
average price paid per noncirculated item = $48.00; 722 x $48.00 = $34,656.00
average size of book = 288 pages = 3 cm; average cost of 1 cm of shelf space = $0.10; 722 x $0.30 = $216.60
grand total = $42,582.60
again it is important to point out that several snapshots of the circulation data have to be taken to track and compare the different analyses before deriving reliable information.
figure 4. example of an english query divided into two parts:
approval titles circulated
sql query to retrieve the distinct bibliographic keys of all the approval titles:
select distinct bibscreen.bib_key from bibscreen right join payl on bibscreen.bib_key = payl.bib_num where (((payl.fund_key) like "*07*"));
sql query to count the number of approval titles that have been circulated:
select count(appr_title.bib_key) as countofbib_key from (bibscreen inner join appr_title on bibscreen.bib_key = appr_title.bib_key) inner join itemscreen on bibscreen.bib_key = itemscreen.biblio_key where (((itemscreen.charges)>0)) order by count(appr_title.bib_key);
sql query to calculate the percentage:
select cnt_appr_title_circ.countofbib_key, int(([cnt_appr_title_circ]![countofbib_key])*100/count([bibscreen]![bib_key])) as percent_apprcirc from bibscreen, cnt_appr_title_circ group by cnt_appr_title_circ.countofbib_key;
approval titles noncirculated
sql query for counting the number of approval titles that have not been circulated:
select distinct count(appr_title.bib_key) as countofbib_key from (appr_title inner join bibscreen on appr_title.bib_key = bibscreen.bib_key) inner join itemscreen on bibscreen.bib_key = itemscreen.biblio_key where (((itemscreen.charges)=0));
sql query to calculate the percentage:
select cnt_appr_title_noncirc.countofbib_key, int(([cnt_appr_title_noncirc]![countofbib_key])*100/count([bibscreen]![bib_key])) as percent_appr_noncirc from bibscreen, cnt_appr_title_noncirc group by cnt_appr_title_noncirc.countofbib_key;
ad hoc queries alternatively, if the user wishes to issue a query that has not been predefined, the ad hoc query wizard can be used. the following example illustrates the use of the ad hoc query wizard. assume the sample query is: how many circulated titles in the english subject area cost more than $35? we now take you on a walk-through of the ad hoc query wizard, starting from the first step till the output is obtained. figure 4 depicts step 1 of the ad hoc query wizard. the sample query mentioned above requires the following fields: • biblio_key for a count of all the titles which satisfy the given condition. • charges to specify the criteria of "circulated title". • fund_key to specify all titles under the "english" subject area. • paid_amt to specify all titles which cost more than $35. step 2 of the ad hoc query wizard (figure 5) allows the user to specify criteria and thereby narrow the search domain. step 3 (figure 6) allows the user to specify any mathematical operations or aggregation functions to be performed. step 4 (figure 7) displays the user-defined query in sql form and allows the user to save the query for future reuse. the output of the query is shown below in figure 8. the figure shows the number of circulated titles in the english subject area that cost more than $35.
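the sql that the wizard saves for this sample query is not reproduced in the text, but as a hedged sketch of how the same question could be posed programmatically against the warehouse tables named in this article (bibscreen, itemscreen, payl), the snippet below issues an equivalent query through the open source ucanaccess jdbc driver; the driver, the .mdb path, and the fund_key pattern standing in for the english fund are assumptions for illustration rather than part of ldss.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** hedged sketch: pose the sample ad hoc query ("circulated titles in the
 *  english subject area that cost more than $35") against the warehouse
 *  tables described in the article, via the ucanaccess jdbc driver.      */
public class AdHocQueryDemo {
    public static void main(String[] args) throws Exception {
        // path to the access warehouse file is an assumption for illustration.
        String url = "jdbc:ucanaccess://C:/ldss/warehouse.mdb";

        // count titles that have circulated (charges > 0), were paid from an
        // "english" fund, and cost more than $35. the test warehouse keeps only
        // the first item record per title, so the join yields one row per title.
        // the LIKE pattern for the english fund_key is hypothetical.
        String sql =
            "SELECT COUNT(*) " +
            "FROM (bibscreen INNER JOIN itemscreen " +
            "        ON bibscreen.bib_key = itemscreen.biblio_key) " +
            "     INNER JOIN payl ON bibscreen.bib_key = payl.bib_num " +
            "WHERE itemscreen.charges > 0 " +
            "  AND payl.fund_key LIKE '%eng%' " +
            "  AND payl.paid_amt > 35";

        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            if (rs.next()) {
                System.out.println("circulated english titles over $35: " + rs.getInt(1));
            }
        }
    }
}
```

saving the same statement as a named access query mirrors what the wizard's final step does when the user saves the query for future reuse.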
alternatively, the user might wish to obtain a listing of these 33 titles. figure 9 shows the listing. (figures 4 through 9 show steps 1 through 4 of the ad hoc query wizard, the query output, and the listing of the query output.) i conclusion in this article, we presented the design and development of a library decision support system based on data warehousing and data mining concepts and techniques. we described the functions of the components of ldss. the screen scraping and data cleansing and extraction processes were described in detail. the process of importing data stored in luis as separate data files into the library data warehouse was also described. the data contents of the warehouse can provide a very rich information source to aid the library management in decision making. using the implemented system, a decision maker can use the gui to establish the warehouse, and to activate the querying facility provided by microsoft access to explore the warehouse contents. many types of queries can be formulated and issued against the database. experimental results indicate that the system is effective and can provide pertinent information for aiding the library management in making decisions. we have fully tested the implemented system using a small sample database. our ongoing work includes the expansion of the database size and the inclusion of a data mining component for association rule discovery. extensions of the existing gui and report generation facilities to accommodate data mining needs are expected. i acknowledgments we would like to thank professor stanley su for his support and advice on the technical aspect of this project. we would also like to thank donna alsbury for providing us with the db2 data, daniel cromwell for loading the db2 files, and nancy williams and tim hartigan for their helpful comments and valuable discussions on this project. references and notes 1. john ladley, "operational data stores: building an effective strategy," data warehouse: practical advice from the experts (englewood cliffs, n.j.: prentice hall, 1997). 2. information on harvard university's adapt project. accessed march 8, 2000, www.adapt.harvard.edu/; information on the arizona state university data administration and institutional analysis warehouse. accessed march 8, 2000, www.asu.edu/data_admin/wh-1.html; information on the university of minnesota clarity project. accessed march 8, 2000, www.clarity.umn.edu/; information on the uc san diego darwin project. accessed march 8, 2000, www.act.ucsd.edu/dw/darwin.html; information on university of wisconsin-madison infoaccess. accessed march 8, 2000, http://wiscinfo.doit.wisc.edu/infoaccess/; information on the university of nebraska data warehouse, nulook. accessed march 8, 2000, www.nulook.uneb.edu/.
3. ramon barquin and herbert edelstein, eds., building, using, and managing the data warehouse (englewood cliffs, n.j.: prentice hall, 1997); ramon barquin and herbert edelstein, eds., planning and designing the data warehouse (upper saddle river, n.j.: prentice hall, 1996); joyce bischoff and ted alexander, data warehouse: practical advice from the experts (englewood cliffs, n.j.: prentice hall, 1997); jeff byard and donovan schneider, "the ins and outs (and everything in between) of data warehousing," acm sigmod 1996 tutorial notes, may 1996. accessed march 8, 2000, www.redbrick.com/products/white/pdf/sigmod96.pdf; surajit chaudhuri and umesh dayal, "an overview of data warehousing and olap technology," acm sigmod record 26(1), march 1997. accessed march 8, 2000, www.acm.org/sigmod/record/issues/9703/chaudhuri.ps; b. devlin, data warehouse: from architecture to implementation (reading, mass.: addison-wesley, 1997); u. fayyad and others, eds., advances in knowledge discovery and data mining (cambridge, mass.: the mit pr., 1996); joachim hammer, "data warehousing overview, terminology, and research issues." accessed march 8, 2000, www.cise.ufl.edu/~jhammer/classes/wh-seminar/overview/index.htm; w. h. inmon, building the data warehouse (new york, n.y.: john wiley, 1996); ralph kimball, "dangerous preconceptions." accessed march 8, 2000, www.dbmsmag.com/9608d05.html; ralph kimball, the data warehouse toolkit (new york, n.y.: john wiley, 1996); ralph kimball, "mastering data extraction," dbms magazine, june 1996. (provides an overview of the process of extracting, cleaning, and loading data.) accessed march 8, 2000, www.dbmsmag.com/9606d05.html; alberto mendelzon, "bibliography on data warehousing and olap." accessed march 8, 2000, www.cs.toronto.edu/~mendel/dwbib.html. 4. daniel j. boorstin, "the age of negative discovery," cleopatra's nose: essays on the unexpected (new york: random house, 1994). 5. information on the arrow system. accessed march 8, 2000, www.fcla.edu/system/intro_arrow.html. 6. gary strawn, "batchbaming." accessed march 8, 2000, http://web.uflib.ufl.edu/rs/rsd/batchbam.html. 7. li-min fu, "domrul: learning the domain rules." accessed march 8, 2000, www.cise.ufl.edu/~fu/domrul.html.
appendix a. warehouse data tables
ufcirc_wh: bib_key text(50); status text(20); enum/chron text(20); midspine text(20); temp_locatn text(20); pieces number; charges number; last_use date/time; browses number; value text(20); invnt_date date/time; created date/time.
uford_wh: id autonumber; ord_num text(20); ord_div number; process_unit text(20); bib_num text(20); order_date date/time; mod_date date/time; vendor_code text(20); vndadr_order text(20); vndadr_claim text(20); vndadr_return text(20); vend_title_num text(20); ord_unit text(20); rcv_unit text(20); ord_scope text(20); pur_ord_prod text(20); action_int number; libspec1 text(20); libspec2 text(20); libspec3 text(20); libspec4 text(20); vend_note text(20); ord_note text(20); source text(20); ref text(20); copyctl_num number; medium text(20); piece_cnt number; div_note text(20); acr_stat text(20); rel_stat text(20); lst_date date/time; action_date text(20); encumb_units number; currency text(20); est_price number; encumb_outs number; fund_key text(20); fiscal_year text(20); copies number; xpay_method text(20); vol_isu_date text(20); title_author text(20); db2_timestamp date/time; series text(20).
ufpay_wh: inv_key text(20); ord_num text(20); ord_div number; process_unit text(20); bib_key text(20); ord_seq_num number; inv_seq_num number; status text(20); create_date date/time; lst_update date/time; currency text(20); paid_amt number; usd_amt number; fund_key text(20); exp_class text(20); fiscal_year text(20); copies number; type_pay text(10); text text(20); db2_timestamp date/time.
ufinv_wh: inv_key text(20); create_date date/time; mod_date date/time; approv_stat text(20); vend_adr_code text(20); vend_code text(20); action_date text(20); vend_inv_date date/time; approval_date date/time; approver_id text(20); vend_inv_num text(20); inv_tot number; calc_tot_pymts number; calc_net_tot_pymts number; currency text(20); discount_percent number; vouch_note text(20); official_vend text(20); process_unit text(20); internal_note text(20); db2_timestamp text(20).
ufbib_wh: bib_key text(20); system_control_num text(50); catalog_source text(20); lang_code_1 text(20); lang_code_2 text(20); geo_code text(20); dewey_num text(20); edition text(20); pagination text(20); size text(20); series_440 text(20); series_490 text(20); content text(20); subject_1 text(20); subject_2 text(20); subject_3 text(20); authors_1 text(20); authors_2 text(20); authors_3 text(20).
book reviews networks and disciplines; proceedings of the educom fall conference, october 11-13, 1972, ann arbor, michigan. princeton: educom, 1973. 209p. $6.00. as with so many conferences, the principal beneficiaries of this one are those who attended the sessions, and not those who will read the proceedings. except for a few prepared papers, the text is the somewhat edited version of verbatim, ad lib summaries of a number of workshop sessions and two panels that purport to summarize common themes and consensus. since few people are profound in ad lib commentaries, the result is shallow and repetitive. the forest of themes is completely lost among a bewildering array of trees. the conference was, i am sure, exciting and thought-provoking for the participants. it was simply organized, starting with statements of networking activities in a number of disciplines, i.e., chemistry, language studies, economics, libraries, museums, and social research.
the paper on economics is by far the best organized presentation of the problems and potential of computers in any of the fields considered, and perhaps the best short presentation yet published for economics. the paper on libraries was short, that on chemistry lacking in analytical quality, that on language provocative, that on social research highly personal, and that on museums a neat mixture of reporting and interpreting. much of the information is conditional, that is, it described what might or could be in the realm of the application of computers to the various subjects. the speakers all directed their papers to the concept of networks, interpreted chiefly as widespread remote access to computational facilities. the papers are followed by very brief transcripts of the summaries of workshops in which the application of computers to each of the disciplines was presumably discussed in detail. much of each summary is indicative and not really informative about the discussions. the concluding text again is the transcript of two final panels on themes and relationships among computer centers. the only description for this portion of the text is turgid. in the midst of all this is the banquet paper presented by ed parker, who as usual was thoughtful and insightful, and several presentations by national science foundation officials that must have been useful at the time to guide those relying on federal funding for computer networks in developing proposals. i can't think of another reference that touches on the potential of computers in so many different disciplines, but it is apparent from the breadth of ideas and the range of suggested or tested applications that a coherent and analytical review should be done. this volume isn't it. russell shank smithsonian institution the analysis of information systems, by charles t. meadow. second edition. los angeles: melville publishing co., 1973. a wiley-becker & hayes series book. this is a revised edition of a book first published in 1967. the earlier edition was written from the viewpoint of the programmer interested in the application of computers to information retrieval and related problems. the second edition claims to be "more of a textbook for information science graduate students and users" (although it is not clear who these "users" are) . elsewhere the author indicates that his emphasis is on "software technology of information systems" and that the book is intended "to bridge the communications gap among information users, librarians and data processors." the book is divided into four parts: language and communication (dealing largely with indexing techniques and the properties of index languages) , retrieval of information (including retrieval strategies and the evaluation of system performance), the organization of information (organization of records, of ffies, file sets), computer processing of information (basic file processes, data access systems, interactive information retrieval, programming languages, generalized data management systems). the second two sections are, i feel, . much better than the first. these are the areas in which the author has had the most direct experience, and the topics covered, at least in their information retrieval applications, are not discussed particularly well or particularly fully elsewhere. it is these sections of the book that make it of most value to the student of information science. 
i am less happy about meadow's discussion of indexing and index languages, which i find unclear, incomplete, and inaccurate in places. the distinction drawn between pre-coordinate and post-coordinate systems is inaccurate; meadow tends to refer to such systems simply as keyword systems, although it is perfectly possible to have a post-coordinate system based on, say, class numbers, which can hardly be considered keywords, while it is also possible to have keyword systems that are essentially precoordinate. in fact, meadow relates the characteristic of being post-coordinate to the number of terms an indexer may use (" ... permit their users to select several descriptors for an index, as many as are needed to describe a particular document"), but this is not an accurate distinction between the two types of system. the real difference is related to how the terms are used (not how many are used), including how they are used at the time of searching. the references to faceted classification are also confusing and a number of statements are made throughout the discussion on index languages that are completely untrue. for example, meadow states (p. 51) that "a hierarchical classification language has no syntax to combine descriptors into terms." this is not at all accurate since several hierarchical classification schemes, including udc, do have synthetic elements which allow combination of descriptors, and some of these are highly synthetic. in fact, meadow himself gives an example (p. 3839) of this synthetic feature in the udc. it is also perhaps unfortunate that the student could read all through meadow's discussion of index languages without getting any clear idea of the structure of a thesaurus for information retrieval and how this thesaurus is applied in practice. book reviews 151 moreover, meadow used medical subject headings as his example of a thesaurus (p. 33-34), although this is not at all a conventional thesaurus and does not follow the usual thesaurus structure. my other criticism is that the book is too selective in its discussion of various aspects of information retrieval. for example, the discussion on automatic indexing is by no means a complete review of techniques that have been used in this field. likewise, the discussion of interactive systems is very limited, because it is based solely on nasa's system, recon. the student who relied only on meadow's coverage of these topics would get a very incomplete and one-sided view of what exists and what has been done in the way of research. in short, i would recommend this book for those sections (p. 183-412) that deal with the organization of records and files and with related programming considerations. the author has handled these topics well and perhaps more completely, in the information retrieval context, than anyone else. indexing and index languages, on the other hand, are subjects that have been covered more completely, clearly, and accurately by various other writers. i would not recommend the discussion on index languages to a student unless read in conjunction with other texts. f. w. lancaster university of illinois application of computer technology to librm·y processes, a syllabus, by joseph becker and josephine s. pulsifer. metuchen, n.j.: scarecrow press, 1973. 173p. $5.00. 
despite the large number of institutions offering courses related to library automation, including just about every library school in north america, accredited or not, there is a remarkable shortage of published material to assist in this instruction. with the publication of this small volume a light has been kindled; let us hope it will be only the first of many, for larger numbers of better educated librarians must surely result in higher standards in the field. this syllabus covers eight topics related 152 journal of library automation vol. 7/2 jtme 1974 to the use of computers in libraries, titled as follows: bridging the gap (librarians and automation); computer technology; systems analysis and implementation; marc program; library clerical processes (which encompasses acquisitions, cataloging, serials, circulation, and management information) ; reference services; related technologies; and library networks. each topic is treated as a unit of instruction, and each receives the identical treatment as follows. the units each start with an introductory paragraph, explaining what the field encompasses, and indicating the purpose of teaching that topic. the purpose of systems analysis, for example, is "to develop the sequence of steps essential to the introduction of automated systems into the library." a series of behavioral objectives are then listed, to show what the student will be able to do (after he has learned the material) that he presumably was unable to do before. for example, there are seven behavioral objectives in the unit on computer technology, of which the first four are: "1) the student will be able to discuss the two-fold requirement to represent data by codes and data structures for purposes of machine manipulation, 2) the student will be able to identify the basic components of computer systems and describe their purposes, 3) the student will be able to differentiate hardware and software and describe briefly the part that programming plays in the overall computer processing operation, 4) the student will be able to define the various modes of computer operation and indicate the utility of each in library operations." the remaining three objectives refer to the student's ability to enumerate and compare types of input, output, and storage devices. then an outline of the instructional material is presented, followed by the detailed and well-organized material for instruction. in no case can the material presented here be considered all that an instructor would need to know about the field, but a surprising amount of specific detail is included, along with a carefully organized framework within which to place other knowledge. the end result is to present to the instructor a series of outlines that would encompass much of the material included in a basic introductory course in library automation. every instructor would, presumably, want to add other topics of his own in addition to adding other material to the topics treated in this volume, but he has here an extremely helpful guide to a basic course, and the only work of its kind to be published to date. peter simmons school of librarianship university of british columbia the larc reports, vol. 6, issue 1. online cataloging and circulation at western kentucky university: an approach to automated instructional resources ~anagement. 1973. 78p. 
this is a detailed account of the design, development, and implementation of online cataloging and circulation which have been in operation at western kentucky university for several years. the library's reasons for using computers are similar to those of many college and university libraries that experienced rapid growth during the 1960s. the faculty of the division of library services first prepared a detailed proposal with appropriate feasibility studies and cost analyses to reclassify the collection from dewey decimal to library of congress classification. the proposal was approved by the administration of the university, and the decision was made to utilize campus computer facilities via online input techniques for reclassification, cataloging, and circulation. "project reclass" was accomplished during 1970-71 using ibm 2741 ats/360 terminals. a circulation file was subsequently generated from the master record file. the main library is housed in a new building and has excellent computer facilities within the library that are connected to the university computer center. cataloging information is input directly into the system via ats terminals; ibm 2260 visual display terminals are used for inquiry into the status of books and patrons; and ibm 1031/1033 data collection terminals are used to charge out and check in books. catalog cards and book catalogs in upper/lower case are produced in batch mode on regular schedule. the on-line circulation book record file is used in conjunction with the on-line student master record and payroll master record files for preparation of overdue and fine notices. apparently the communication between library staff and computer personnel has been well above average, and cooperation of the administration and other interested parties has been outstanding. the attention given to planning, scheduling, training, and implementation is impressive. what has been accomplished to date is considered very successful, and plans are book reviews 153 underway to develop on-line acquisitions ordering and receiving procedures. the report has some annoying shortcomings such as referring to the library of congress as "national library"; frequent use of the word "xeroxing," which the xerox corporation is attempting to correct; "inputing" for "inputting"; and several other misspelled words. some parts are poorly organized and unclear, but the report does provide rriany useful details for those considering a similar undertaking. lavahn overmyer school of library science case western reserve university digitization of text documents using pdf/a yan han and xueheng wan information technology and libraries | march 2018 52 yan han (yhan@email.arizona.edu) is full librarian, the university of arizona libraries, and xueheng wan (wanxueheng@email.arizona.edu) is a student, department of computer science, university of arizona. abstract the purpose of this article is to demonstrate a practical use case of pdf/a for digitization of text documents following fadgi’s recommendation of using pdf/a as a preferred digitization file format. the authors demonstrate how to convert and combine tiffs with associated metadata into a single pdf/a-2b file for a document. using real-life examples and open source software, the authors show readers how to convert tiff images, extract associated metadata and international color consortium (icc) profiles, and validate against the newly released pdf/a validator. 
the generated pdf/a file is a self-contained and self-described container that accommodates all the data from digitization of textual materials, including page-level metadata and icc profiles. providing theoretical analysis and empirical examples, the authors show that pdf/a has many advantages over the traditionally preferred file format, tiff/jpeg2000, for digitization of text documents. background pdf has been primarily used as a file delivery format across many platforms in almost every device since its initial release in 1993. pdf/a was designed to address concerns about long-term preservation of pdf files, but there has been little research and few implementations of this file format. since the first standard (iso 19005 pdf/a-1), published in 2005, some articles discuss the pdf/a family of standards, relevant information, and how to implement pdf/a for born-digital documents.1 there is growing interest in the pdf and pdf/a standards after both the us library of congress and the national archives and records administration (nara) joined the pdf association in 2017. nara joined the pdf association because pdf files are used as electronic documents in every government and business agency. as explained in a blog post, the library of congress joined the pdf association because of the benefits to libraries, including participating in developing pdf standards, promoting best-practice use of pdf, and access to the global expertise in pdf technology.2 few articles, if any, have been published about using this file format for preservation of digitized content. yan han published a related article in 2015 about theoretical research on using pdf/a for text documents.3 in this article, han discussed the shortcomings of the widely used tiff and jpeg2000 as master preservation file formats and proposed using the then-emerging pdf/a as the preferred file format for digitization of text documents. han further analyzed the requirements mailto:yhan@email.arizona.edu mailto:wanxueheng@email.arizona.edu digitization of text documents using pdf/a | han and wan 53 https://doi.org/10.6017/ital.v37i1.9878 of digitization of text documents and discussed the advantages of pdf/a over tiff and jpeg2000. these benefits include platform independence, smaller file size, better compression algorithms, and metadata encoding. in addition, the file format reduces workload and simplifies postdigitization processing such as quality control, adding and updating missing pages, and creating new metadata and ocr data for discovery and digital preservation. as a result, pdf/a can be used in every phase of a digital object in an open archival information system (oais)—for example, a submission information package (sip), archive information package (aip), and dissemination information package (dip). in summary, a pdf/a file can be a structured, self-contained, and selfdescribed container allowing a simpler one-to-one relationship between an original physical document and its digital surrogate. in september 2016, the federal agencies digital guidelines initiative (fadgi) released its latest guidelines for digitization related to raster images: technical guidelines for digitizing heritage materials.4 the de-facto best practices for digitization, these guidelines provide federal agencies guidance and have been used in many cultural heritage institutions. 
both the pdf association and the authors welcomed the recognition of pdf/a as the preferred master file format for digitization of text documents such as unbound documents, bound volumes, and newspapers.5 goals and tasks since han has previously provided theoretical methods of coding raster images, metadata, and related information in pdf/a, the goals of this article are threefold: 1. present real-life experience of converting tiffs/jpeg2000s to pdf/a and back, along with image metadata 2. test open source libraries to create and manipulate images, image metadata, and pdf/a 3. validate generated pdf/as with the first legitimate validator for pdf/a validation the tasks included the following: ● convert all the master files in tiffs/jpeg2000 from digitization of text documents into single pdf/a files losslessly. one document, one pdf/a file. ● evaluate and extract metadata from each tiff/jpeg2000 image and encode it along with its image when creating the corresponding pdf/a file. ● demonstrate the runtimes of the above tasks for feasibility evaluation. ● validate the pdf/a files against the newly released open source pdf/a validator verapdf. ● extract each digital image from the pdf/a file back to its original master image files along with associated metadata. ● verify the extracted image files in the back-and-forth conversion process against the original master image files choices of pdf/a standards and conformance level this article demonstrates using pdf/a-2b as a self-contained self-describing file format. currently, there are three related pdf/a standards (pdf/a-1, pdf/a-2, and pdf/a-3), each with information technology and libraries | march 2018 54 three conformance levels (a, b, and u). the reasons for choosing pdf/a-2 (instead of pdf/a-1 or pdf/a-3) are the following: ● pdf/a-1 is based on pdf 1.4. in this standard, images coded in pdf/a-1 cannot use jpeg2000 compression (named in pdf/a as jpxdecode). one can still convert tiffs to pdf/a-1 using other lossless compression methods such as lzw. however, the spacesaving benefits of jpeg2000 compression over other methods would not be utilized. ● pdf/a-2 and pdf/a-3 are based on pdf 1.7. one significant feature of pdf 1.7 is that it supports jpeg2000 compression, which saves 40–60 percent of space for raster images compared to uncompressed tiffs. ● pdf/a-3 has one major feature that pdf/a-2 does not have, which is to allow arbitrary files to be embedded within the pdf file. in this case, there is no file to be embedded. the authors chose conformance level b for simplicity. ● b is basic conformance, which requires only necessary components (e.g., all fonts embedded in the pdf) for reproduction of a document’s visual appearance. ● a is accessible conformance, which means b conformance level plus additional accessibility (structural and semantic features such as document structure). one can add tags to convert pdf/2b to pdf/2a. ● u represents a conformance level with the additional requirement that all text in the document have unicode equivalents. this article does not cover any post-processing of additional manual or computational features such as adding ocr text to the generated pdf/a files. these features do not help faithfully capture the look and feel of original pages in digitization, and they can be added or updated later without any loss of information. in addition, ocr results rely on the availability of ocr engines for the document’s language, and results can vary between different ocr engines over time. 
ocr technology is getting better and will produce better results in the future. for example, current ocr technology for english gives very reliable (more than 90 percent) accuracy. in comparison, traditional chinese manuscripts and pashto/persian give unacceptably low accuracy (less than 60 percent). the cutting edge of ocr engines has started to utilize artificial intelligence techniques, and the authors believe that a breakthrough will happen soon.

data source

the university of arizona libraries (ual) and the afghanistan center at kabul university (acku) have been partnering to digitize and preserve acku's permanent collection held in kabul. this collaborative project created the largest afghan digital repository in the world. currently the afghan digital repository (http://www.afghandata.org) contains more than fifteen thousand titles and 1.6 million pages of documents. digitization of these text documents follows the previous version of the fadgi guidelines, which recommended scanning each page of a text document into a separate tiff file as the master file. these tiffs were organized by directories in a file system, where each directory represents a corresponding document containing all the scanned pages of that title. an example of the directory structure can be found in han's article.

pdf/a and image manipulation tools

there are a few open source and proprietary pdf software development kits (sdks). adobe pdf library and foxit sdk are the most well-known commercial tools to manipulate pdfs. to show readers that they can manipulate and generate pdf/a documents themselves, open source software, rather than commercial tools, was used. currently, only a very limited number of open source pdf sdks are available, including itext and pdfbox. itext was chosen because it has good documentation and provides a well-built set of apis to support almost all the pdf and pdf/a features. itext was initially written by bruno lowagie (who was in the iso pdf standard working group) in 1998 as an in-house project; lowagie later started up his own company, itext, and published itext in action with many code examples.6 moreover, itext has java and c# coding options with good code documentation. it is worth mentioning that itext has different versions. the authors used itext 5.5.10 and 5.4.4. using an older version in our implementation generated a noncompliant pdf/a file because it was not aligned with the pdf/a standard.7 for image processing, there were a few popular open source options, including imagemagick and gimp. imagemagick was chosen because of its popularity, stability, and cross-platform implementation. our implementation identified one issue with imagemagick: the current version (7.0.4) could not retrieve all the metadata from tiff files, as it did not extract certain information such as the image file directory and color profile. these metadata are critical because they are part of the original data from digitization. unfortunately, the authors observed that some image editors were unable to preserve all the metadata from the image files during the conversion process. hart and de vries used case studies to show the vulnerability of metadata, demonstrating that metadata elements in a digital object can be lost or corrupted by use or conversion of a file to another format. they suggested that action is needed to ensure proper metadata creation and preservation: all types of metadata must be captured and preserved to achieve the most authentic, consistent, and complete digital preservation for future use.8
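a simple way to guard against this kind of silent metadata loss is to dump each file's metadata before and after a conversion and compare the two reports. the java sketch below is an illustration only, not the authors' production code: it assumes that exiftool (discussed in the next section) is installed and on the system path, the file names are hypothetical, and the -x option is used to request rdf/xml output for a master tiff and for the jpeg2000 derived from it so the two reports can be diffed.

import java.io.File;
import java.io.IOException;

/*
 * sketch: dump image metadata (including the icc profile block) with exiftool
 * so that pre- and post-conversion reports can be compared for losses.
 * assumptions: exiftool is installed and on the path; file names are hypothetical.
 */
public class MetadataCheck {

    static void dumpMetadata(String imagePath, String reportPath)
            throws IOException, InterruptedException {
        // -X asks exiftool for RDF/XML output, which is easy to archive and diff
        ProcessBuilder pb = new ProcessBuilder("exiftool", "-X", imagePath);
        pb.redirectOutput(new File(reportPath));
        pb.redirectErrorStream(true);
        if (pb.start().waitFor() != 0) {
            throw new IOException("exiftool failed for " + imagePath);
        }
    }

    public static void main(String[] args) throws Exception {
        // one scanned page and the derivative produced from it (hypothetical names)
        dumpMetadata("page001.tif", "page001.tif.xml");
        dumpMetadata("page001.jp2", "page001.jp2.xml");
        // compare the two reports (e.g., with a diff tool) to confirm that tags
        // such as the icc profile survived the conversion
    }
}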
metadata extraction tools and color profiles

as we digitize physical documents and manipulate images, color management is important. the goal of color management is to obtain a controlled conversion between the color representations of various devices such as image scanners, digital cameras, and monitors. a color profile is a set of data that controls the input and output of a color space. the international color consortium (icc) standards and profiles were created to bring various manufacturers together, and embedding color profiles into images is one of the most important color management solutions. image formats such as tiff and jpeg2000 and document formats such as pdf may contain embedded color profiles. the authors identified a few open source tools to extract tiff metadata, including exiftool, exiv2, and tiffinfo. exiftool is an open source tool for reading, writing, and manipulating metadata of media files. exiv2 is another free metadata tool supporting different image formats. the tiffinfo program is widely used on the linux platform, but it has not been updated for at least ten years. our implementations showed that exiftool was the tool that most easily extracted the full icc profiles and other metadata from tiff and jpeg2000 files. imagemagick and other image processing software were examined in van der knijff's article discussing jpeg2000 for long-term preservation.9 he found that icc profiles were lost in imagemagick. our implementation showed that the current version of imagemagick has fixed this issue. a metadata sample can be found in appendix a.

implementation

converting and ordering tiffs into a single pdf/a-2 file

when ordering and combining all individual tiffs of a document into a single pdf/a-2b file, the authors intended to preserve all information from the tiffs, including raster image data streams and metadata stored in each tiff's header. the raster image data streams are the main images reflecting the original look and feel of these pages, while the metadata (including technical and administrative metadata such as bitspersample, datetime, and make/model/software) tells us important digitization and provenance information. both are critical for delivery and digital preservation. the tiff images were first converted to jpeg2000 with lossless compression using the open source imagemagick software. our tests of imagemagick demonstrated that it can handle different color profiles and will convert images correctly if the original tiff comes with a color profile. this gave us confidence that past concerns about jpeg2000 and imagemagick had been resolved. these images were then properly sorted into their original order and combined into a single pdf/a-2 file. an alternative is to code the tiff's image data stream directly into a pdf/a file, but this approach would miss one benefit of pdf/a-2: tremendous file size reduction with jpeg2000.
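this ordering-and-conversion step can be scripted around imagemagick. the following java sketch is an illustration under stated assumptions rather than the authors' code: it assumes the imagemagick convert command is on the system path and that page order is encoded in the tiff file names of a hypothetical document directory. because the option that requests lossless jpeg2000 coding differs between imagemagick builds and delegates, it is flagged as a placeholder in the comments instead of being asserted here.

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/*
 * sketch: recover page order from a document directory of per-page tiffs
 * and convert each page to jpeg 2000 by shelling out to imagemagick.
 * assumptions: the "convert" command is on the path; the option that requests
 * lossless jpeg 2000 coding varies by imagemagick build, so it is not asserted here.
 */
public class TiffToJp2 {

    static List<File> sortedTiffs(File documentDir) {
        File[] pages = documentDir.listFiles((dir, name) -> name.toLowerCase().endsWith(".tif"));
        if (pages == null) {
            return new ArrayList<>();
        }
        Arrays.sort(pages); // page order is encoded in the file names
        return new ArrayList<>(Arrays.asList(pages));
    }

    static void convertToJp2(File tiff) throws IOException, InterruptedException {
        String jp2Path = tiff.getPath().replaceAll("\\.tif$", ".jp2");
        // add the lossless-compression option appropriate to the installed
        // imagemagick/jpeg 2000 delegate before using this in production
        ProcessBuilder pb = new ProcessBuilder("convert", tiff.getPath(), jp2Path);
        pb.inheritIO();
        if (pb.start().waitFor() != 0) {
            throw new IOException("imagemagick conversion failed for " + tiff);
        }
    }

    public static void main(String[] args) throws Exception {
        for (File page : sortedTiffs(new File(args[0]))) {
            convertToJp2(page);
        }
    }
}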
the following is the pseudocode for ordering and combining all the tiffs in a text document into a single pdf/a-2 file.

createpdfa2(queue tifflist) {
    create an empty queue xmlq;
    create an empty queue jp2q;
    /* tifflist is a pre-sorted queue based on the original page order */
    /* convert each tiff to jpeg2000 losslessly, then add each jpeg2000 and its metadata into a queue */
    while (tifflist is not empty) {
        string tifffilepath = tifflist.dequeue();
        string xmlfilepath = tiff metadata extracted using exiftool;
        xmlq.enqueue(xmlfilepath);
        string jp2filepath = jpeg2000 file location from tiff converted by imagemagick;
        jp2q.enqueue(jp2filepath);
    }
    /* convert each image's metadata to xmp, add each jpeg2000 and its metadata into the pdf/a-2 file based on its original order */
    document pdf2b = new document();
    /* create pdf/a-2b conformance level */
    pdfawriter writer = pdfawriter.getinstance(pdf2b, new fileoutputstream(pdfafilepath), pdfaconformancelevel.pdf_a_2b);
    writer.createxmpmetadata(); // create root xmp
    pdf2b.open();
    while (jp2q is not empty) {
        image jp2 = image.getinstance(jp2q.dequeue());
        rectangle size = new rectangle(jp2.getwidth(), jp2.getheight()); // pdf page size setting
        pdf2b.setpagesize(size);
        pdf2b.newpage(); // create a new page for a new image
        byte[] bytearr = xmpmanipulation(xmlq.dequeue()); // convert original metadata based on the xmp standard
        writer.setpagexmpmetadata(bytearr);
        pdf2b.add(jp2);
    }
    pdf2b.close();
}

converting pdf/a-2 files back to tiffs and jpeg2000s

to ensure that we can extract raster images from the newly created pdf/a-2 file, the authors also wrote code to convert a pdf/a-2 file back to the original tiff or jpeg2000 format. this implementation was the reverse of the above operation. once the reverse conversion process was completed, the authors verified that the image files created from the pdf/a-2 file were the same as before the conversion to pdf/a-2. note that we generated md5 checksums to verify the image data streams. the image data streams are the same, but the location of metadata can vary because of the inconsistent tiff tags used over the years; when converting one tiff to another tiff, imagemagick has its own implementation of metadata tags. the code can be found in appendix b.

pdf/a validation

pdf/a is one of the most recognized digital preservation formats, specially designed for long-term preservation and access. however, no commonly accepted pdf/a validator was available in the past, although several commercial and open source pdf preflight and validation engines (e.g., acrobat) were available. validating a pdf/a file against the pdf/a standards is a challenging task for a few reasons, including the complexity of the pdf and pdf/a formats. the pdf association and the open preservation foundation recognized the need and started a project to develop an open source pdf/a validator and build a maintenance community. their result, verapdf, is an open source validator designed for all pdf/a parts and conformance levels. released in january 2017, verapdf aims to become the commonly accepted pdf/a validator.10 our generated pdf/as have been validated with verapdf 1.4 and adobe acrobat pro dc preflight. both products validated the pdf/a-2b files as fully compliant. our implementations showed that verapdf 1.4 verified more cases than acrobat dc preflight. figure 1 shows a pdf file structure and its metadata.

figure 1. a pdf object tree with root-level metadata.
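as an illustration of how such validation can be folded into a batch workflow, the java sketch below shells out to the verapdf command-line tool for every generated file. it is a sketch under stated assumptions, not the authors' code: it assumes verapdf is installed and on the system path, that the installed release accepts a --flavour option for selecting pdf/a-2b validation, and that a hypothetical pdfa-output directory holds the files to check; exit-code semantics and the report format should be confirmed against the verapdf documentation.

import java.io.File;
import java.io.IOException;

/*
 * sketch: batch-validate generated files with the verapdf command-line tool.
 * assumptions: verapdf is installed and on the path, the installed release
 * accepts a --flavour option, and the directory name is hypothetical.
 */
public class ValidatePdfA {

    static boolean validate(File pdf) throws IOException, InterruptedException {
        // request pdf/a-2b validation and keep the xml report next to the file
        ProcessBuilder pb = new ProcessBuilder("verapdf", "--flavour", "2b", pdf.getPath());
        pb.redirectOutput(new File(pdf.getPath() + ".verapdf-report.xml"));
        pb.redirectErrorStream(true);
        // a zero exit code is treated as success here; in practice the xml
        // report should be parsed, since exit-code behavior can vary by release
        return pb.start().waitFor() == 0;
    }

    public static void main(String[] args) throws Exception {
        File[] pdfs = new File("pdfa-output").listFiles((d, n) -> n.endsWith(".pdf"));
        if (pdfs == null) {
            return;
        }
        for (File pdf : pdfs) {
            System.out.println(pdf.getName() + (validate(pdf) ? ": passed" : ": needs review"));
        }
    }
}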
runtime and conclusion

the time complexity of our code is o(n log n) because of the sorting algorithm used. tiffs were first converted to jpeg2000. when jpeg2000 images are added to a pdf/a-2 file, no further image manipulation is required because the generated pdf/a-2 uses jpeg2000 directly (in other words, it uses the jpxdecode filter). tables 1 and 2 show the performance comparison running in our computer hardware and software environment (intel core i7-2600 cpu @ 3.4 ghz, 8 gb ddr3 ram, 3 tb 7,200-rpm 64 mb-cache hard disk running ubuntu 16.10).

table 1. runtimes of converting grayscale tiffs to jpeg2000s and to pdf/a-2b

no. of files   total file size (mb)   image conversion runtime, tiffs to jp2s (seconds)   total runtime, tiffs to jp2s to a single pdf/a-2b (seconds)
1              9.1                    3.61                                                 3.98
10             91.1                   35.63                                                36.71
20             182.2                  71.83                                                73.98
50             455.5                  179.06                                               184.63
100            910.9                  358.3                                                370.91

table 2. runtimes of converting color tiffs to jpeg2000s and to pdf/a-2b

no. of files   total file size (mb)   image conversion runtime, tiffs to jp2s (seconds)   total runtime, tiffs to jp2s to a single pdf/a-2b (seconds)
1              27.3                   14.80                                                14.94
10             273                    150.51                                               151.55
20             546                    289.95                                               293.21
50             1,415                  741.89                                               749.75
100            2,730                  1490.49                                              1509.23

the results show that (a) the majority of the runtime (more than 95 percent) is spent in converting a tiff to a jpeg2000 using imagemagick (see figure 2); (b) the average runtime of converting a tiff scales roughly linearly with the file's size (see figure 2); (c) in comparison, the runtime of converting a color tiff is significantly higher than that of converting a greyscale tiff (see figure 2); and (d) it is feasible in terms of time and resources to convert existing master images of digital document collections to pdf/a-2b. for example, converting 1 tb of color tiffs would take 552,831 seconds (153.5 hours; 6.398 days) using the above hardware. the authors have already processed more than 600,000 tiffs using this method. the authors conclude that adopting pdf/a as the preferred master file format for digitization of text documents gives institutions advantages over tiff/jpeg2000. the above implementation demonstrates the ease, the reasonable runtime, and the availability of open source software to perform such conversions. from both the theoretical analysis and empirical evidence, the authors show that pdf/a has advantages over the traditionally preferred file format, tiff, for digitization of text documents. following best practice, a pdf/a file can be a self-contained and self-described container that accommodates all the data from digitization of textual materials, including page-level metadata and icc profiles.

summary

the goal of this article is to demonstrate empirical evidence of using pdf/a for digitization of text documents. the authors evaluated and used multiple open source software programs for processing raster images, extracting image metadata, and generating pdf/a files. these pdf/a files were validated using the up-to-date pdf/a validators verapdf and acrobat preflight. the authors also calculated the time complexity of the program and measured the total runtime in multiple testing cases. most of the runtime was spent on image conversions from tiff to jpeg2000. the creation of the pdf/a-2b file with associated page-level metadata accounted for less than 5 percent of the total runtime.
the runtime of converting a color tiff was much higher than that of converting a greyscale one. our theoretical analysis and empirical examples show that using pdf/a-2 presents many advantages over the traditionally preferred file formats (tiff/jpeg2000) for digitization of text documents.

figure 2. file size, greyscale and color tiffs and runtime ratio.

appendix a: sample tiff metadata with icc header

the appendix reproduces an exiftool report for one master scan: a 3400 × 4680 pixel, 8-bits-per-sample, uncompressed rgb tiff at 400 × 400 dpi in chunky (interleaved) planar configuration, with an embedded epson srgb icc display profile (copyright seiko epson corporation); binary blocks such as the icc profile are reported as "(binary data ..., use -b option to extract)".

appendix b: sample code to convert pdf/a-2 back to jpeg2000s

/* assumption: the pdf/a-2b file was specifically generated from image objects converted from tiff images with jpxdecode along with page-level metadata */
public static void parse(string src, string dest) throws ioexception {
    pdfreader reader = new pdfreader(src);
    pdfobject obj;
    int counter = 0;
    for (int i = 1; i <= reader.getxrefsize(); i++) {
        obj = reader.getpdfobject(i);
        if (obj != null && obj.isstream()) {
            prstream stream = (prstream) obj;
            byte[] b;
            try {
                b = pdfreader.getstreambytes(stream);
            } catch (unsupportedpdfexception e) {
                b = pdfreader.getstreambytesraw(stream);
            }
            pdfobject pdfsubtype = stream.get(pdfname.subtype);
            fileoutputstream fos = null;
            if (pdfsubtype != null && pdfsubtype.tostring().equals(pdfname.xml.tostring())) {
                fos = new fileoutputstream(string.format(dest + "_xml/" + counter + ".xml", i));
                system.out.println("page metadata extracted!");
            }
            if (pdfsubtype != null && pdfsubtype.tostring().equals(pdfname.image.tostring())) {
                counter++;
                fos = new fileoutputstream(string.format(dest + "_jp2/" + counter + ".jp2", i));
            }
            if (fos != null) {
                fos.write(b);
                fos.flush();
                fos.close();
                system.out.println("jpeg2000 conversion from pdf completed!");
            }
        }
    }
}
/* then use the imagemagick library to convert jpeg2000s to tiffs */

references

1 pdf-tools.com and pdf association, "pdf/a—the standard for long-term archiving," version 2.4, white paper, may 20, 2009, http://www.pdf-tools.com/public/downloads/whitepapers/whitepaper-pdfa.pdf; duff johnson, "white paper: how to implement pdf/a," talking pdf, august 24, 2010, https://talkingpdf.org/white-paper-how-to-implement-pdfa/; alexandra oettler, "pdf/a in a nutshell 2.0: pdf for long-term archiving," association for digital standards, 2013, https://www.pdfa.org/wp-content/until2016_uploads/2013/05/pdfa_in_a_nutshell_211.pdf; library of congress, "pdf/a, pdf for long-term preservation," last modified july 27, 2017, https://www.loc.gov/preservation/digital/formats/fdd/fdd000318.shtml.
2 library of congress, "the time and place for pdf: an interview with duff johnson of the pdf association," the signal (blog), december 12, 2017, https://blogs.loc.gov/thesignal/2017/12/the-time-and-place-for-pdf-an-interview-with-duff-johnson-of-the-pdf-association/.

3 yan han, "beyond tiff and jpeg2000: pdf/a as an oais submission information package container," library hi tech 33, no. 3 (2015): 409–23, https://doi.org/10.1108/lht-06-2015-0068.

4 federal agencies digital guidelines initiative, technical guidelines for digitizing cultural heritage materials (washington, dc: federal agencies digital guidelines initiative, 2016), http://www.digitizationguidelines.gov/guidelines/fadgi%20federal%20%20agencies%20digital%20guidelines%20initiative-2016%20final_rev1.pdf.

5 duff johnson, "us federal agencies approve pdf/a," pdf association, september 2, 2016, https://www.pdfa.org/new/us-federal-agencies-approve-pdfa/.

6 bruno lowagie, itext in action, 2nd ed. (stamford, ct: manning, 2010).

7 "itext 5.4.4," itext, last modified september 16, 2013, http://itextpdf.com/changelog/544.

8 timothy robert hart and denise de vries, "metadata provenance and vulnerability," information technology and libraries 36, no. 4 (2017), https://doi.org/10.6017/ital.v36i4.10146.

9 johan van der knijff, "jpeg 2000 for long-term preservation: jp2 as a preservation format," d-lib magazine 17, no. 5/6 (2011), https://doi.org/10.1045/may2011-vanderknijff.

10 pdf association, "how verapdf does pdf/a validation," 2016, https://www.pdfa.org/how-verapdf-does-pdfa-validation/.
let's get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects

tanya m. johnson

tanya m. johnson (tmjohnso@gmail.com), a recent mlis degree graduate from the school of communication & information, rutgers, the state university of new jersey, is winner of the 2016 lita/ex libris student writing award.

information technology and libraries | june 2016

abstract

three-dimensional objects are important sources of information that should not be ignored in the increasing trend towards digitization. previous research has not addressed the evaluation of digitized versions of three-dimensional objects. this paper first reviews research concerning such digitization, in both two and three dimensions, as well as public access in this context. next, evaluation criteria for websites incorporating digital versions of three-dimensional objects are extrapolated from previous research. finally, five websites are evaluated, and suggestions for best practices to provide public access to digital versions of three-dimensional objects are proposed.

introduction

much of the literature surrounding the increased efforts of libraries and museums to digitize content has focused on two-dimensional forms, such as books, photographs, or paintings. however, information does not only come in two dimensions; there are sculptures, artifacts, and other three-dimensional objects that have been unfortunately neglected by this digital revolution. as one author stated, "while researchers do not refer to three-dimensional objects as commonly as books, manuscripts, and journal articles, they are still important sources of information and should not be taken for granted" (jarrell 1998, 32). the importance of three-dimensional objects as information that can and should be shared is not a new phenomenon; indeed, as early as 1887, museologists and educators forwarded the view that "museums were in effect libraries of objects" that provided information not supplied by books alone (given and mctavish 2010, 11). however, it is only recently, with the advent of newer technological mechanisms, that such objects could be shared with the public on a larger scale. no longer do people need to physically visit museums to experience and learn from three-dimensional objects. rather, various techniques have been utilized to place digital versions of such objects on the websites of museums and archives, and projects have been created by various universities in order to enhance that digital experience. nevertheless, as newell (2012) states:

collections-holding institutions increasingly regard digital resources as additional objects of significance, not as complete replacements for the original. digital technologies work best when they enable people who feel connected to museum objects to have the freedom to deepen these
relationships and, where appropriate, to extend outsiders' understandings of the objects' cultural contexts. the raison d'être of museums and other cultural institutions remains centred on the primacy of the object and in this sense continues to privilege material authenticity. (303)

in this regard, three-dimensional visualization of physical objects can be seen as the next step for museums and cultural heritage institutions that seek to further patrons' connection to such objects via the internet. indeed, in this digital age, the goals of museums and archives are changing, converging with those of libraries to focus more efforts on providing information to the public, and, along with the growing trend to digitize information contained within libraries, there has been a concomitant trend to digitize the contents of museums in order to provide greater public access to collections (given and mctavish 2010). in light of this progress, this paper will review various methods of presenting three-dimensional objects to the public on the internet and, based on an evaluation of five digital collections, attempt to provide some advice as to best practices for museums or institutions seeking to digitize such objects and present them to the public via a digital collection.

literature review

two-dimensional digitization

there are many ways to present digital versions of three-dimensional objects on a webpage, ranging from simple two-dimensional photography to complicated three-dimensional scanning and rendering. beginning on the simpler end of the scale, bincsik, maezaki, and hattori (2012) describe the process of photographing japanese decorative art objects in order to create an image database of objects from multiple museums. specifically, the researchers explain that they need high quality photographs showing each object in all directions, as well as close-up images of fine details, in order to recreate the physical research experience as closely as possible. they also note that, for the same reason, the context of each object must be recorded, including photographs of any wrapping or storage materials and accompanying documentation. for this project, the researchers utilized nikon professional or semi-professional cameras, with zoom and macro lenses, and often used small apertures to increase depth-of-field. at times, they also took measurements of the objects in order to assist museums in maintaining accurate records. the raw image files were then processed with programs such as adobe photoshop, saved as original tif files, and converted into jpeg format for upload. despite the success of the project, the researchers also noted the limitations of digitizing three-dimensional objects:

with decorative art objects some information is inevitably lost, such as the weight of the object, the feeling of its surface texture or the sense of its functionality in terms of proportions and balance. digital images clearly can fulfill many research objectives, but in some cases they can only be used as references. one objective of the decorative arts database is to advise the researcher in selecting which objects should be examined in person. (bincsik, maezaki, and hattori 2012, 46)

one difficulty with photography, particularly when digitizing artwork, is that color is a function of light.
thus, a single object will often appear to be different colors when photographed in different lighting conditions using conventional digital cameras, which process images using rgb filters. information technology and libraries | june 2016 41 more accurate representations of objects can be acquired using multispectral imaging, which uses a higher number of parameters (the international standard is 31, compared to rgb’s 3) in order to obtain more information about the reflectance of an object at any particular point in space (novati, pellegri, and schettini 2005). multispectral imaging, however, is very expensive and, despite some researchers’ attempts to create affordable systems (e.g., novati, pellegri, and schettini 2005), the acquisition of multispectral images is generally limited to large institutions with considerable funding (chane et al. 2013). the use of two-dimensional photography to digitize objects is not limited to the arts; in the natural sciences, different types of photographic equipment have been developed to document existing collections and enhance scientific observation. gigapixel imaging, for example, has been utilized to allow museum visitors to virtually explore large petroglyphs located in remote locations as well as for documentation and viewing of dinosaur bone specimens that are not on public display (louw and crowley 2013). this technology consists of taking many, very high resolution photographs that are then, via computer software, “aligned, blended, and stitched” together to create one extremely detailed composite image (louw and crowley 2013, 89–90). robotic systems, such as gigapan, have been developed to speed up the process and permit rapid recording and processing of the necessary area. once the gigapixel image is created, it can then be uploaded and displayed on the web in dynamic form, including spatial navigation of the image with embedded text, audio, or video at specific locations and zoom levels to provide further information (louw and crowley 2013). various types of gigapixel imaging, including the gigapan system, have also been used to digitize important collections of biological specimens, particularly insects, which are often stored in large drawers. one study examined the documentation of entomological specimens by “whole-drawer imaging” using various gigapixel imaging technologies (holovachov, zatushevsky, and shydlovsky 2014). the researchers explained that different gigapixel imaging systems (many of which are commercial and proprietary) utilize different types of cameras and lenses, as well as different types of software for processing. however, despite the expensive cost of some commercially available systems, it is possible for museums and other institutions to create their own, economically viable versions. the system created by holovachov, zatushevsky, and shydlovsky utilized a standard slr camera, fitted with a macro lens and attached to an immovable stand. the researchers manually set up lighting, focus, aperture, and other settings, and moved the insect drawer along a pre-determined grid pattern in order to obtain the multiple overlapping photographs necessary to create a large gigapixel image. 
they used a freely available stitching software program and manually corrected stitching artifacts and color balance issues that resulted from the use of a non-telecentric lens.1 despite the lower cost of their individualized system, however, the researchers noted that the process was much more time-consuming and necessitated more labor from workers digitizing the collection. moreover, technologically speaking, the researchers emphasized the limits of two-dimensional imaging, given that the "diagnostic characteristics of three-dimensional insects," as well as the accompanying labels, are often invisible when a drawer is only photographed from the top. thus, the researchers concluded that, ultimately, "the whole-drawer digitizing of insect collections needs to be transformed from two-dimensions to three-dimensions by employing complex imaging techniques (simultaneous use of multiple cameras positioned at different angles) and a digital workflow" (holovachov, zatushevsky, and shydlovsky 2014, 7).

1 the difference between telecentric and non-telecentric lenses is explained by the researchers: "contrary to ordinary photographic lenses, object-space telecentric lenses provide the same object magnification at all possible focusing distances. an object that is too close or too far from the focus plane and not in focus, will be the same size as if it were in focus. there is no perspective error and the image projection is parallel. therefore, when such a lens is used to take images of pinned insects in a box, all vertical pins will appear strictly vertical, independent of their position within the camera's field of view" (holovachov, zatushevsky, and shydlovsky 2014, 7).

three-dimensional digitization

given the goal of obtaining as accurate a representation as possible when digitizing objects, many researchers have turned to the use of various techniques in order to obtain three-dimensional data. acquiring a three-dimensional image of an object takes place in three steps:

1. preparation, during which certain preliminary activities take place that involve the decision about the technique and methodology to be adopted as well as the place of digitization, security planning issues, etc.
2. digital recording, which is the main digitization process according to the plan from phase 1.
3. data processing, which involves the modeling of the digitized object through the unification of partial scans, geometric data processing, texture data processing, texture mapping, etc. (pavlidis et al. 2007, 94)

steps 2 and 3 have been more technically described as (2) obtaining data from an object to create point clouds (from thousands to billions of x,y,z coordinates representing loci on the object); and (3) processing point clouds into polygon models (creating a surface on top of the points), which can then be mapped with textures and colors (metallo and rossi 2011). there are several techniques that can be utilized to acquire three-dimensional data from a physical object. table 1 explains the four general methods most commonly used by museums.

laser scanning. description: a laser source emits light onto the object's surface, which is detected by a digital camera; geometry of the object is extracted by triangulation or time-of-flight calculations. positives: high accuracy in capturing geometry; can capture small objects and entire buildings (using different hardware). negatives: limited texture and color captured; shiny surfaces refract the laser. approximate price range: $3,000–$200,000.

white light (structured light) scanning. description: a pattern of light is projected onto the object's surface, and deformations in that pattern are detected by a digital camera; geometry is extracted by triangulation from deformations. positives: captures texture details, making it very accurate; can capture color. negatives: dark, shiny, or translucent objects are problematic. approximate price range: $15,000–$250,000.

photogrammetry. description: three-dimensional data is extracted from multiple two-dimensional pictures. positives: can capture small objects and mountain ranges; good color information. negatives: need either precise placement of cameras or more precise software to obtain accurate data. approximate price range: cameras: $500–$50,000; software: free–$40,000.

volumetric scanning. description: magnetic resonance imaging (mri) uses a strong magnetic field and radio waves to detect geometric, density, volume and location information; computed tomography (ct) uses rotating x-rays to create two-dimensional slices, which can then be reconstructed into three-dimensional images. positives: both types can view the interior and exterior of an object; ct can be used for reflective or translucent objects; mri can image soft tissues. negatives: no color information; mri requires object to have high water content. approximate price range: $200,000–$2,000,000.

table 1. description of four general methods of acquiring three-dimensional data about physical objects (table information compiled by reference to pavlidis et al. 2007; metallo and rossi 2011; abel et al. 2011; and berquist et al. 2012).

the type of three-dimensional digitization used can ultimately depend upon the types of objects to be imaged or the type of data needed. for example, in digitizing human skeletal collections, one study explained that three-dimensional laser scanning was an advantageous technique to create models of bones for preservation and analysis, but cautioned that ct scans would be needed to examine the internal structures of such specimens (kuzminsky and gardiner 2012). another study utilized several techniques in an attempt to decipher graffiti inscriptions on ancient roman pottery shards, ultimately concluding that high-resolution photography (similar to gigapixel imaging) and three-dimensional laser scanning both provided detailed and helpful data (montani et al. 2012). additionally, sometimes multiple types of digitization can be used for the same objects with similar results. one study, for example, obtained virtually equivalent three-dimensional models of the same object using laser scanning and two types of photogrammetry (lerma and muir 2014). most recently, researchers have been utilizing combinations of digitization techniques to obtain the most accurate representations possible. chane et al. (2013), for example, examined methods of combining three-dimensional digitization with multispectral photography in order to obtain enhanced information concerning the physical object in question.
the researchers explained that combining the two processes is difficult because, in order to obtain multispectral textural data that is mapped to geometric positions, the object must be imaged from identical locations by multiple scanners/cameras or else the data processing that combines the two types of data becomes extremely complex. as a compromise, the researchers created a system of optical tracking based on photogrammetry techniques that permits the collection and integration of geometric positioning data and multispectral textures utilizing precise targeting procedures. however, the researchers noted that most systems integrating multispectral photography with threedimensional digitization tended to be quite bulky, did not adapt easily to different types of objects, and needed better processing algorithms for more complex three-dimensional objects (chane et al. 2013). public access to three-dimensionally digitized objects despite museums’ growing focus on increasing public access to collections via digitization (given and mctavish 2010), there is very little literature addressing public access to three-dimensionally digitized objects. indeed, studies in this realm tend to focus on the technological aspects of either the modeling of specific objects or collections or website viewing of three-dimensional models. for example, abate et al. (2011) described the three-dimensional digitization of a particular statue from the scanning process to its ultimate depiction on a website. the researchers explained in detail the particular software architecture utilized in order to permit the remote rendering of the three-dimensional model on users’ computers via a java applet without compromising quality or necessitating download of potentially copyrighted works. by contrast, literature concerning the digital michelangelo project, during which researchers three-dimensionally digitized various michelangelo works, focused on the method used to create an accurate three-dimensional model, complete with color and texture mapping, and a visualization tool (dellepiane et al. 2008). one study did describe a project that was designed to place three-dimensional data about various cultural artifacts in an online repository for curators and other professionals (hess et al. 2011). this repository was contained within database management software, a web-based interface was designed for searching, and user access to three-dimensional images and models was provided via an activex plugin. despite the potential of the prototype, however, it appears that the project has ceased,2 and the institution’s current three-dimensional imaging project is focused on the design 2see http://www.ucl.ac.uk/museums/petrie/research/research-projects/3dpetrie/3d_projects/3d-projects-past/ecurator. http://www.ucl.ac.uk/museums/petrie/research/research-projects/3dpetrie/3d_projects/3d-projects-past/e-curator http://www.ucl.ac.uk/museums/petrie/research/research-projects/3dpetrie/3d_projects/3d-projects-past/e-curator information technology and libraries | june 2016 45 of a traveling exhibition incorporating, among other things, three-dimensional models of artifacts and physical replicas created from such models.3 studies that do address public access directly tend to focus on the improvement of museum websites generally. 
for example, in terms of user expectations of museum websites, one study found that approximately 63 percent of visitors to a museum’s website did so in order to search the digital collection (kravchyna and hastings 2002). another study found four types of museum website users, who each had different needs and expectations of sites. relevantly, educators sought collections that were “the more realistic the better,” including suggestions like incorporating three-dimensional simulations of physical objects so that students could “explore the form, construction, texture and use of objects” (cameron 2003, 335). further, non-specialist users “value free choice learning” and “access online collections to explore and discover new things and build on their knowledge base as a form of entertainment” (cameron 2003, 335). similarly, some studies have addressed the incorporation of web 2.0 technologies into museum websites. srinivasan et al. (2009), for example, argue that web 2.0 technologies must be integrated into museum catalogs rather than simply layered over existing records because users’ interest in objects is increased by participation in the descriptive practice. an implementation of this concept is found in hunter and gerber’s (2010) system of social tagging attached to threedimensional models. this paper is an effort to address the gap between the technical process of digitizing and presenting three-dimensional objects on the web and the user experience of such. through the evaluation of five websites, this paper will provide some guidance for the digitization of threedimensional objects and their presentation in digital collections for public access. methodology and evaluative criteria evaluations of digital museums are not as prevalent as evaluations of digital libraries. however, given the similar purposes of digital museums and digital libraries, it is appropriate to utilize similar criteria. for digital libraries, saracevic (2000) synthesized evaluation criteria into performance questions in two broad areas: (a) user-centered questions, including how well the digital library supports the society or community served, how well it supports institutional or organizational goals, how well it supports individual users’ information needs, and how well the digital library’s interface provides access and interaction; and (b) systemcentered questions, including hardware and network performance, processing and algorithm performance, and how well the content of the collection is selected, represented, organized, and managed. xie (2008) focused on user-centered evaluation and found five general criteria that exemplified users’ own evaluations of digital libraries: interface usability, collection quality, service quality, system performance, and user satisfaction. parandjuk (2010) used information architecture to construct criteria for the evaluation of a digital library, including the following: • uniformity of standards, including consistency among webpages and individual records; • findability, including ease of use and multiple ways to access the same information; • sub-navigation, including indexes, sitemaps, and guides; 3see http://www.3dencounters.com. 
• contextual navigation, including simplified searching and co-location of different types of resources;
• language, including consistency in labeling across pages and records and appropriateness for the audience; and
• integration of searching and browsing.

this system is particularly appropriate in the context of digital museums, as it emphasizes the curatorial or organizational aspect of the collection in order to support learning objectives. in one comprehensive evaluation of the websites of art museums, pallas and economides (2008) created a framework for such evaluation, incorporating six dimensions: content, presentation, usability, interactivity and feedback, e-services, and technical. each dimension then contained several specific criteria. many of the criteria overlapped, however, and three-dimensional imaging, for example, was placed within the e-services dimension, under virtual tours, although it could have been placed within presentation, with other multimedia criteria, or even within interactivity, with interactive multimedia applications. the problem in trying to evaluate a particular part of a museum's website, namely, the way it presents three-dimensional objects in digital form, is that the level of specificity almost renders many of the evaluation criteria from previous studies irrelevant. as hariri and norouzi (2011) suggest, evaluation criteria should be based on the objective of the evaluation. hence, based on portions of the above-referenced studies, this author has created a more focused evaluation framework, concentrating on criteria that are particularly relevant to museums' digital presentations of three-dimensional objects. this framework is detailed in table 2, below.

functionality: what technology is used to display the object? how well does it work? must programs or files be downloaded? are the loading times of displays acceptable?
usability: how easy is the site to use? what is the navigation system? are there searching and browsing functions, and how well does each work? how findable are individual objects?
presentation: how does the display of the object look? what is the context in which the object is presented? are there multiple viewing options? is there any interactivity permitted?
content: does the site provide an adequate collection of objects? for individual objects, is there sufficient information provided? is there additional educational content?

table 2. summary of evaluative criteria

five digital collections, specified below, will be evaluated based on these criteria. this will be done in a case study manner, describing each website based on the above criteria and then using those evaluations to make suggestions for best practices.

results

it is difficult to compare different types of digital collections, particularly when the focus is on different types of technology utilized to display similar objects. however, because the goal here is to determine the best practices for the digital presentation of three-dimensional objects, it is important to evaluate a variety of techniques in a variety of fields. thus, the following digital collections have been chosen to illustrate different ways in which such objects can be displayed on a website.
museum of fine arts, boston (mfa) (http://www.mfa.org/collections) the mfa, both in person and online, boasts a comprehensive and extensive collection of art and historical artifacts of varying forms. the website is very easy to navigate, with well-defined browsing options and easy search capabilities, allowing for refinement of results by collection or type of item. there are many collections, which are well organized and curated into separate exhibits and galleries. in addition, when viewing each gallery, suggestions are linked for related online exhibitions as well as tours and exhibits at the physical museum. each item record contains a detailed description of the item as well as its provenance. thus, the mfa website attains a very high rating for usability and content. however, individual items are represented by only single pictures of varying quality. some pictures are color, some are black and white, and no two pictures appear to have the same lighting. additionally, despite being slow to load, even the pictures that appear to be of the best quality cannot be of high resolution, as zooming in makes them slightly blurry. accordingly, the mfa website receives a medium rating for functionality and a low rating for presentation. digital fish library (dfl) (http://www.digitalfishlibrary.org/index.php) the dfl project is a comprehensive program that utilizes mri scanning to digitize preserved biological fish samples from a particular collection housed at the scripps institution of oceanography. after mri scans of a specimen are taken, the data is processed and translated into various views that are placed on the website, accompanied by information about each species (berquist et al. 2012). navigating the dfl website is very intuitive, as the individual specimen records are organized by taxonomy. it is easy to search for particular species or browse through the clickable, pictorial interface. records for each species include detailed information about the individual specimen, the specifics of the scans used to image each, and broader information about the species. individual records also provide links to other species within the taxonomic family. thus, the dfl website attains high ratings in both usability and content. for functionality and presentation, however, the ratings are medium. although for each item there are videos and still images obtained from threedimensional volume renderings and mri scans, they are small in size and have low resolution. there is no interactive component, with the possible exception of the “digital fish viewer” that supposedly requires java, but this author could not get it to work despite best efforts. one nice feature, shown in figure 1 below, is that some of the specimen records have three-dimensional renderings showing and explaining the internal structures of the species. http://www.mfa.org/collections http://www.digitalfishlibrary.org/index.php let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 48 figure 1. 
annotated three-dimensional rendering of internal structures of hammerhead shark, from the digital fish library (http://www.digitalfishlibrary.org/library/viewimage.php?id=2851) the eton myers collection (http://etonmyers.bham.ac.uk/3d-models.html) the eton myers collection of ancient egyptian art is housed at eton college, and a project to threedimensionally digitize the items for public access was undertaken via collaboration between that institution and the university of birmingham. digitization was accomplished with threedimensional laser scanners, data was then processed with geomagic software to produce point cloud and mesh forms, and individual datasets were reduced in size and converted into an appropriate file type to allow for public access (chapman, gaffney, and moulden 2010). usability of the eton myers collection website is extremely low. the initial interface is simply a list of three-dimensional models by item number with a description of how to download the appropriate program and files. another website from the university of birmingham (http://mimsy.bham.ac.uk/info.php?f=option8&type=browse&t=objects&s=the+eton+myers+col lection) contains a more museum-like interface, but contains many more records for objects than are contained on the initial list of three-dimensional models. moreover, most of the records do not even include pictures of the items, let alone links to the three-dimensional models, and the records that do include pictures do not necessarily include such links. even when a record has a link to the three-dimensional model, it actually redirects to the full list of models rather than to the individual item. there is no search functionality from the initial list of three-dimensional models, and no way to browse other than to, colloquially speaking, poke and hope. individual items are only identified by item number, and, aside from the few records that have accompanying pictures on the university of birmingham site, there is no way to know to what item any given number refers. the http://www.digitalfishlibrary.org/library/viewimage.php?id=2851 http://etonmyers.bham.ac.uk/3d-models.html http://mimsy.bham.ac.uk/info.php?f=option8&type=browse&t=objects&s=the+eton+myers+collection http://mimsy.bham.ac.uk/info.php?f=option8&type=browse&t=objects&s=the+eton+myers+collection information technology and libraries | june 2016 49 website attains only a low rating for content; although it seems that there may be a decent number of items in the collection, it is impossible to know for certain given the problems with the interface and the fact that individual items are virtually unidentified. the eton myers collection website also receives a low rating for functionality. in order to access three-dimensional models of items, users must download and install a program called meshlab, then download individual folders of compressed files, then unzip those files, and finally open the appropriate file in meshlab. despite compression, some of the file folders are still quite large and take some time to download. presentation of the items is also rated low. even for the high resolution versions of the three-dimensional renderings, viewed in meshlab, the geometry of the objects seems underdeveloped (e.g., hieroglyphics are illegible) and surface textures are not well mapped (e.g., colors are completely off). this is evident from a comparison of the threedimensional rendering with a two-dimensional photograph of the same item, as in figure 2, below. figure 2. 
comparison of original photograph (left) and three-dimensional rendering (right) of item number ecm 361, from the eton myers collection (http://mimsy.bham.ac.uk/detail.php?t=objects&type=ext&f=&s=&record=0&id_number=ecm+3 61&op-earliest_year=%3d&op-latest_year=%3d). notably, chapman, gaffney, and moulden (2010) indicate that the detailed three-dimensional imaging enabled them to identify tooling marks and read previously unclear hieroglyphics on certain items. thus, it is possible that the problems with the renderings may be a result of a loss in quality between the original models and the downloaded versions, particularly given that the files were reduced in size and converted prior to being made available for download. http://mimsy.bham.ac.uk/detail.php?t=objects&type=ext&f=&s=&record=0&id_number=ecm+361&op-earliest_year=%3d&op-latest_year=%3d http://mimsy.bham.ac.uk/detail.php?t=objects&type=ext&f=&s=&record=0&id_number=ecm+361&op-earliest_year=%3d&op-latest_year=%3d let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 50 epigraphia 3d project (http://www.epigraphia3d.es) the epigraphia 3d project was created to present an online collection of various historical roman epigraphs (also known as inscriptions) that were discovered and excavated in spain and italy; the physical collection is housed at the museo arqueológico nacional (madrid). digital imaging was accomplished using photogrammetry, free software was utilized to create three-dimensional object models and renderings, and photoshop was used to obtain appropriate textures. finally, the three-dimensional model was published on the web using sketchfab, a web service similar to flickr that allows in-browser viewing of three-dimensional renderings in many different formats (ramírez-sánchez et al. 2014). the epigraphia 3d website is intuitive and informative. browsing is simple because there are not many records, but, although it is possible to search the website, there is no search function specifically directed to the collection. thus, usability is rated as medium. despite the fact that the website provides descriptions of the project and the collection, as well as information about epigraphs generally, the website attains a medium rating for content in light of the small size of the collection and the limited information given for each individual item. however, the epigraphia 3d website receives very high ratings for functionality and presentation. the individual threedimensional models are detailed, legible, and interactive. individual inscriptions are transcribed for each item. the use of sketchfab to display the models is effective; no downloading is necessary, and it takes an acceptable amount of time to load. when viewing the item, users can rotate the object in either “orbit” or “first person” mode, as well as view it full-screen or within the browser window. users can also display the wireframe model and the textured or surfaced rendering, as shown in figure 3 below. figure 3. three-dimensional textured (left) and wireframe (middle) renderings from the epigraphia 3d project (http://www.epigraphia3d.es/3d-01.html), as compared to an original twodimensional photograph of the same object (right) (http://edabea.es/pub/record_card_1.php?refpage=%2fpub%2fsearch_select.php&quicksearch=dapynus&r ec=19984). 
http://www.epigraphia3d.es/ http://www.epigraphia3d.es/3d-01.html http://eda-bea.es/pub/record_card_1.php?refpage=%2fpub%2fsearch_select.php&quicksearch=dapynus&rec=19984 http://eda-bea.es/pub/record_card_1.php?refpage=%2fpub%2fsearch_select.php&quicksearch=dapynus&rec=19984 http://eda-bea.es/pub/record_card_1.php?refpage=%2fpub%2fsearch_select.php&quicksearch=dapynus&rec=19984 information technology and libraries | june 2016 51 smithsonian x 3d (http://3d.si.edu) the smithsonian x 3d project, although affiliated with all of the smithsonian’s varying divisions, was created to test the application of three-dimensional digitization techniques to “iconic collection objects” (http://3d.si.edu/about). the website provides significant detail concerning the project itself, mostly in the form of videos, and individual items, many of which are linked to “tours” that incorporate a story about the object. content is rated as medium because, despite the depth of information provided about individual items, there are still very few items within the collection. the website also receives a medium rating for usability, given the simple browsing structure, easy navigation, and lack of a search feature (all likely due at least in part to the limited content). functionality and presentation, however, are rated high. the x3d explorer in-browser software (powered by autodesk) does more than simply display a three-dimensional rendering of an object; it also permits users to edit the model by changing color, lighting, texture, and other variables as well as incorporates detailed information about each item, both as an overall description and as a slide show, where snippets of information are connected to specific views of the item. the individual three-dimensional models are high resolution, detailed, and wellrendered, with very good surface texture mapping. however, it must be noted that the x3d explorer tool is in beta and, as such, still has some bugs; for example, this author has observed a model disappear while zooming in on the rendering. table 3, below, summarizes the results of the evaluation. functionality usability presentation content mfa medium very high low very high dfl medium high medium high eton myers low low low low epigraphia 3d very high medium very high medium smithsonian x 3d high medium high medium table 3. summary of evaluation results for each website by individual criteria discussion based on the evaluation of the five websites described above, some suggested best practices for the digitization and presentation of three-dimensional objects become apparent. when digitizing, the museum should utilize the method that best suits the object or collection. for example, while mri scanning is likely the best method for three-dimensionally digitizing biological fish specimens, it is not going to be effective or feasible for digitizing artwork or artifacts (abel et al. 2011; berquist et al. 2012). regardless of the method of digitization used, however, the people conducting the imaging and processing should fully comprehend the hardware and software necessary to complete the task. additionally, although financial restraints must be considered, museums should note that some three-dimensional scanning equipment is just as economically feasible as standard digital cameras (metallo and rossi 2011). 
however, if a museum chooses to utilize only two-dimensional imaging, http://3d.si.edu/ http://3d.si.edu/about let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 52 each item should be photographed from multiple angles in high resolution, to avoid creating a website, like the mfa’s, on which everything other than the object itself is presented outstandingly. further, museums deciding on two-dimensional imaging should explore the possibility of utilizing photogrammetry to create three-dimensional models from their twodimensional photographs, like the epigraphia 3d project. there is free or inexpensive software that functions to permit the creation of three-dimensional object maps from very few photographs (ramírez-sánchez et al. 2014). finally, compatibility is a key issue when conducting threedimensional scans; the museum should ensure that the software used for rendering models is compatible with the way in which users will be viewing the models. in the context of public access to the museum’s digital collections, the website should be easy and intuitive to navigate. the mfa website is an excellent example; browsing and search functions should both be present, and reorganization of large numbers of objects into separate collections may be necessary. where searching is going to be the primary point of entry into the collection, it is important to have sufficient metadata and functional search algorithms to ensure that item records are findable. furthermore, remember that the website is simply a way to access the museum itself. hence, the collections on the website, like the collections in the physical museum, should be curated; there should be a logical flow to accessing object records. the museum may also want to have sections that are similar to virtual exhibitions, like the “tours” provided by the smithsonian x 3d project. finally, museums should ensure that no additional technological know-how (beyond being able to access the internet) is required to access the three-dimensional content in object records. users should not be required to download software or files to view records; epigraphia 3d’s use of sketchfab and the smithsonian’s x 3d explorer tool are both excellent examples of ways in which three-dimensional content can be viewed on the web without the need for extraneous software. museums and cultural heritage institutions are increasing the focus on providing public access to collections via digitization and display on websites (given and mctavish 2010). in order to do this effectively, this paper has attempted to provide some guidance as to best practices of presenting digital versions of three-dimensional objects. in closing, however, it must be noted that this author is not a technician. although this paper has tried to contend with the issues from the perspective of a librarian, there are complicated technical concerns behind any digitization project that have not been adequately addressed. in addition, this paper has not examined the role of budgetary constraints on digitization or the concomitant issues of creating and maintaining websites. moreover, because this paper has been treated as a broad overview of the digitization and presentation for public access of three-dimensional objects, the five websites evaluated were from varying fields of study. 
museums should look to more specific comparisons in order to appropriately digitize and present their collections on the web. conclusion there may not be a direct substitute for encountering an object in person, but for people who cannot obtain physical access to three-dimensional objects, the digital realm can serve as an adequate proxy. this paper has demonstrated, through an evaluation of five distinct digital collections, that utilizing three-dimensional imaging and presenting three-dimensional models of physical objects on the web can serve the important purpose of increasing public access to otherwise unavailable collections. information technology and libraries | june 2016 53 references abate, d., r. ciavarella, g. furini, g. guarnieri, s. migliori, and s. pierattini. “3d modeling and remote rendering technique of a high definition cultural heritage artefact.” procedia computer science 3 (2011): 848–52. http://dx.doi.org/10.1016/j.procs.2010.12.139. abel, r. l., s. parfitt, n. ashton, simon g. lewis, beccy scott, and c. stringer. “digital preservation and dissemination of ancient lithic technology with modern micro-ct.” computers and graphics 35, no. 4 (august 2011): 878–84. http://dx.doi.org/10.1016/j.cag.2011.03.001. berquist, rachel m., kristen m. gledhill, matthew w. peterson, allyson h. doan, gregory t. baxter, kara e. yopak, ning kang, h.j. walker, philip a. hastings, and lawrence r. frank. “the digital fish library: using mri to digitize, database, and document the morphological diversity of fish.” plos one 7, no. 4: (april 2012). http://dx.doi.org/10.1371/journal.pone.0034499. bincsik, monika, shinya maezaki, and kenji hattori. “digital archive project to catalogue exported japanese decorative arts.” international journal of humanities and arts computing 6, no. 1– 2 (march 2012): 42–56. http://dx.doi.org/10.3366/ijhac.2012.0037. cameron, fiona. “digital futures i: museum collections, digital technologies, and the cultural construction of knowledge.” curator: the museum journal 46, no. 3 (july 2003): 325–40. http://dx.doi.org/10.1111/j.2151-6952.2003.tb00098.x. chane, camille simon, alamin mansouri, franck s. marzani, and frank boochs. “integration of 3d and multispectral data for cultural heritage applications: survey and perspectives.” image and vision computing 31, no. 1 (january 2013): 91–102. http://dx.doi.org/10.1016/j.imavis.2012.10.006. chapman, henry p., vincent l. gaffney, and helen l. moulden. “the eton myers collection virtual museum.” international journal of humanities and arts computing 4, no. 1–2 (october 2010): 81–93. http://dx.doi.org/10.3366/ijhac.2011.0009. dellepiane, m., m. callieri, f. ponchio, and r. scopigno. “mapping highly detailed colour information on extremely dense 3d models: the case of david's restoration.” computer graphics forum 27, no. 8 (december 2008): 2178–87. http://dx.doi.org/10.1111/j.14678659.2008.01194.x. given, lisa m., and lianne mctavish. “what’s old is new again: the reconvergence of libraries, archives, and museums in the digital age.” library quarterly 80, no. 1 (january 2010): 7– 32. http://dx.doi.org/10.1086/648461. hariri, nadjla, and yaghoub norouzi. “determining evaluation criteria for digital libraries’ user interface: a review.” the electronic library 29, no. 5 (2011): 698–722. http://dx.doi.org/10.1108/02640471111177116. hess, mona, francesca simon millar, stuart robson, sally macdonald, graeme were, and ian brown. “well connected to your digital object? 
e-curator: a web-based e-science platform for museum artefacts.” literary and linguistic computing 26, no. 2 (2011): 193– 215. http://dx.doi.org/10.1093/llc/fqr006. http://dx.doi.org/10.1016/j.cag.2011.03.001 http://dx.doi.org/10.1371/journal.pone.0034499 http://dx.doi.org/10.3366/ijhac.2012.0037 http://dx.doi.org/10.1111/j.2151-6952.2003.tb00098.x http://dx.doi.org/10.1016/j.imavis.2012.10.006 http://dx.doi.org/10.3366/ijhac.2011.0009 http://dx.doi.org/10.1111/j.1467-8659.2008.01194.x http://dx.doi.org/10.1111/j.1467-8659.2008.01194.x http://dx.doi.org/10.1086/648461 http://dx.doi.org/10.1108/02640471111177116 http://dx.doi.org/10.1093/llc/fqr006 let’s get virtual: examination of best practices to provide public access to digital versions of three-dimensional objects | johnson | doi:10.6017/ital.v35i2.9343 54 holovachov, oleksandr, andriy zatushevsky, and ihor shydlovsky. “whole-drawer imaging of entomological collections: benefits, limitations and alternative applications.” journal of conservation and museum studies 12, no. 1 (2014): 1–13. http://dx.doi.org/10.5334/jcms.1021218. hunter, jane, and anna gerber. 2010. “harvesting community annotations on 3d models of museum artefacts to enhance knowledge, discovery and re-use.” journal of cultural heritage 11, no. 1 (2010): 81–90. http://dx.doi.org/10.1016/j.culher.2009.04.004. jarrell, michael c. “providing access to three-dimensional collections.” reference & user services quarterly 38, no. 1 (1998): 29–32. kravchyna, victoria, and sam k. hastings. “informational value of museum web sites.” first monday 7, no. 4 (february 2002). http://dx.doi.org/10.5210/fm.v7i2.929. kuzminsky, susan c. and megan s. gardiner. “three-dimensional laser scanning: potential uses for museum conservation and scientific research.” journal of archaeological science 39, no. 8 (august 2012): 2744–51. http://dx.doi.org/10.1016/j.jas.2012.04.020. lerma, josé luis, and colin muir. “evaluating the 3d documentation of an early christian upright stone with carvings from scotland with multiples images.” journal of archaeological science 46 (june 2014): 311–18. http://dx.doi.org/10.1016/j.jas.2014.02.026. louw, marti, and kevin crowley. “new ways of looking and learning in natural history museums: the use of gigapixel imaging to bring science and publics together.” curator: the museum journal 56, no. 1 (january 2013): 87–104. http://dx.doi.org/10.1111/cura.12009. metallo, adam, and vince rossi. “the future of three-dimensional imaging and museum applications.” curator: the museum journal 54, no. 1 (january 2011): 63–69. http://dx.doi.org/10.1111/j.2151-6952.2010.00067.x. montani, isabelle, eric sapin, richard sylvestre, and raymond marquis . “analysis of roman pottery graffiti by high resolution capture and 3d laser profilometry.” journal of archaeological science 39, no. 11 (2012): 3349–53. http://dx.doi.org/10.1016/j.jas.2012.06.011. newell, jenny. “old objects, new media: historical collections, digitization and affect.” journal of material culture 17, no. 3 (september 2012): 287–306. http://dx.doi.org/10.1177/1359183512453534. novati, gianluca, paolo pellegri, and raimondo schettini. “an affordable multispectral imaging system for the digital museum.” international journal on digital libraries 5, no. 3 (may 2005): 167–78. http://dx.doi.org/10.1007/s00799-004-0103-y. pallas, john, and anastasios a. economides. “evaluation of art museums' web sites worldwide.” information services and use 28, no. 1 (2008): 45–57. http://dx.doi.org/10.3233/isu2008-0554. 
parandjuk, joanne c. “using information architecture to evaluate digital libraries.” the reference librarian 51, no. 2 (2010): 124–34. http://dx.doi.org/10.1080/02763870903579737. http://dx.doi.org/10.5334/jcms.1021218 http://dx.doi.org/10.1016/j.culher.2009.04.004 http://dx.doi.org/10.5210/fm.v7i2.929 http://dx.doi.org/10.1016/j.jas.2012.04.020 http://dx.doi.org/10.1016/j.jas.2014.02.026 http://dx.doi.org/10.1111/cura.12009 http://dx.doi.org/10.1111/j.2151-6952.2010.00067.x http://dx.doi.org/10.1016/j.jas.2012.06.011 http://dx.doi.org/10.1177/1359183512453534 http://dx.doi.org/10.1007/s00799-004-0103-y http://dx.doi.org/10.3233/isu-2008-0554 http://dx.doi.org/10.3233/isu-2008-0554 http://dx.doi.org/10.1080/02763870903579737 information technology and libraries | june 2016 55 pavlidis, george, anestis koutsoudis, fotis arnaoutoglou, vassilios tsioukas, and christodoulos chamzas. “methods for 3d digitization of cultural heritage.” journal of cultural heritage 8, no. 1 (2007): 93–98, http://dx.doi.org/10.1016/j.culher.2006.10.007. ramírez-sánchez, manuel, josé-pablo suárez-rivero, and maría-ángeles castellano-hernández. “epigrafía digital: tecnología 3d de bajo coste para la digitalización de inscripciones y su acceso desde ordenadores y dispositivos móviles.” el profesional de la información 23, no. 5 (2014): 467–74. http://dx.doi.org/10.3145/epi.2014.sep.03. saracevic, tefko. “digital library evaluation: toward an evolution of concepts.” library trends 49, no. 3 (2000): 350–69. srinivasan, ramesh, robin boast, jonathan furner, and katherine m. becvar. “digital museums and diverse cultural knowledges: moving past the traditional catalog.” the information society 25, no. 4 (2009): 265–78, http://dx.doi.org/10.1080/01972240903028714. xie, hong iris. “users’ evaluation of digital libraries (dls): their uses, their criteria, and their assessment.” information processing and management 44, no. 3 (may 2008): 1346–73, http://dx.doi.org/10.1016/j.ipm.2007.10.003. http://dx.doi.org/10.1016/j.culher.2006.10.007 http://dx.doi.org/10.3145/epi.2014.sep.03 http://dx.doi.org/10.1080/01972240903028714 http://dx.doi.org/10.1016/j.ipm.2007.10.003 introduction microsoft word 9526-16430-5-ce.docx president’s message: reflections on lita’s past and future aimee fifarek information technologies and libraries | september 2016 3 when i reached out to ital editor bob gerrity about my first president’s column, he graciously provided copies of past lita presidents’ columns to get me started. it reminded me once again of the illustrious company i am in, starting with stephen r. salmon, the first president of the information services and automation division, as we were known until 1977. i am proud to be at the head of lita as it begins to celebrate its 50th anniversary year. a half century ago when lita was founded the world was experiencing an era of profound technological change. the us and soviet union were battling to be first in the space race, and an increasing number of world powers were engaging in nuclear testing. while civil rights demonstrations and the fighting in vietnam dominated the news, we were imagining peace via the technologically-driven future depicted in a new tv series called star trek. with tv focused on the stars, we were able to go to the movies and explore the strange new world of inner space in fantastic voyage. technology was poised to enter our daily lives as well, with diebold demonstrating the first atm1 and ralph h. 
baer writing the 4-page paper that would lay the foundation for the video game industry.2 heady times for technology indeed, and the fact that libraries were sufficiently advanced to require an association dedicated to supporting technologists is hardly surprising. by the time of lita’s founding at the 1966 midwinter meeting in chicago, library automation had been in development for over a decade.3 marc was just being invented, with the first tapes from the library of congress scheduled to go to the sixteen pilot libraries later that year. membership in the only organization that existed, the committee on library automation (cola), was restricted to the handful of professionals who either developed or managed existing library systems. but technology was beginning to impact many more librarians than just those rarified few. according to president salmon, “it was clear that large numbers of librarians who didn't meet cola's standards for membership were in need of information on library automation and wanted leadership.”4 the first meeting of our division on july 14, 1966 at the ala annual conference in new york was attended by several hundred librarians interested in information sharing, technology standards, and technology training for library staff. this group created the first mission, vision, and bylaws that set us on a 50-year path of success. lita is well positioned to take the first steps into our next 50 years. thanks to the efforts of last year’s lita board, we are on the verge of adopting a new two-year strategic plan that is designed aimee fifarek (aimee.fifarek@phoenix.gov) is lita president 2016-17 and deputy director for customer support, it and digital initiatives at phoenix public library, phoenix, az. president’s message | fifarek doi: 10.6017/ital.v35i3.9526 4 to guide us through the current transitional period. it will be accompanied by a tactical plan that will allow us to document our accomplishments and set the stage for an ongoing culture of continuous planning. also, jenny levine has proven to be extremely capable as she completes her first year as lita executive director. she has just the right combination of ala experience, technology know-how, and calm competence to guide us through the retooling and reimagining that is required to take a middle-aged association into the next phase of its life. the four areas of focus in the new strategic plan will help us to balance our efforts between preserving the strengths of our past and adapting our organization for a successful future. the first area of focus, member engagement, shows that our primary commitment needs to be to lita members. without you, lita would not exist. one of the key efforts is to increase the value of lita for members who are unable to travel to conferences. with travel budgets down and staying low, online member engagement is an area all of ala needs to improve, and who better to lead in this area than lita. the next area, organizational sustainability, is all about keeping the infrastructure of the organization strong, much of which happens in the domain of lita staff. budgeting, quality communication, and strategic planning all live here. the section on education and professional development recognizes the important role that webinars, online courses, online journal, and print publications play in allowing lita members to share their knowledge on both cutting edge and practical topics with the rest of the association and ala in general. 
we are already doing great work here and we need to better support and expand these efforts. the last focus area, advocacy and information policy, represents a future growth area for lita. now that everyone in the library world "does" technology to a certain extent, lita needs to think about how we will differentiate ourselves as outside competencies increase. our advantage is that we have been doing and thinking about technology for much longer than anyone else. with our vast wealth of experience, it's appropriate that we work to become thought leaders and implementers in the information policy realm. in this, as always, we return to where we started: our members. lita has thrived over the last 50 years because of this, our most important resource. lita was founded on the concept of sharing information about technology through conversation, publications, and knowledge creation. we endure because you, the committed, passionate information professionals are willing to share what you know with those who come after. and like our founders, there are always individuals who are willing to take on the mantle of leadership, whether through getting elected to lita board, becoming a committee or interest group chair, serving in key editorial roles for our monographs, journal, and blog, or joining the all-important lita staff. thanks to all of you who make lita’s future happen every day. i am proud to be in your company. information technologies and libraries | september 1016 5 references 1 . alan taylor, “50 years ago: a look back at 1966,” the atlantic photo, march 23, 2016, http://www.theatlantic.com/photo/2016/03/50-years-ago-a-look-back-at-1966/475074/, photo 46. 2. “take me back to august 30, 1966,” http://takemeback.to/30-august-1966#.v8szitlrtaq. 3. “library technology timeline,” http://web.york.cuny.edu/~valero/timeline_reference_citations.htm. 4. stephen r. salmon, “lita’s first 25 years, a brief history,” http://www.ala.org/lita/about/history/1st25years. institutional political and fiscal factors in the development of library automation, 1967-71 allen b. veaner: stanford university, stanford, california. 5 this paper (1) summarizes an investigation into the political and financial factors which inhibited the ready application of computers to individual academic libraries during the period 1967-71, and (2) presents the author's speculations on the future of libraries in a computer dominant society.il> technical aspects of system design were specifically excluded from the investigation. twenty-four institutions were visited and approximately 100 pe1·sons interviewed. substantial future change is envisaged in both the structure and function of the library, if the eme1·ging trend of coalescing libraries and computerized «information processing centers" continues. summary of major factors which inhibited the application of computers to library problems, 1967-71 major factors which inhibited the application of the computer to the library during the period 1967-71 can be categorized under three broad headings: (a) governance, organization, and management of the computer facility; (b) personnel in the computer facility; and (c) deficiencies in the library environment. a. governance, organization, and management of the computer facility 1. 
uncertainty over who was in charge of the computer facility.-this problem was partly attributable to the fact that the goals and objectives of the facility were imprecisely stated or not stated at all often there was no charter, no systematic procedures for establishing priorities, and excessive autonomy by the computer facility. these factors often permitted the facility to operate as a self-directing, self-sustaining entity, responsible to no informed, upper level manager. '~> the paper is based on a clr fellowship report to the council on library resources, inc., for the period january-june 1972. 6 journal of lihra1·y automation vol. 7/1 march 1974 2. effect of high level administrative changes.-in a few instances, the library automation effort was instigated by the president of the institution. he could, in effect, personally direct the allocation of resources. however, whenever a high administrative official leaves, the resulting vacuum is quickly filled by other interests, the atmosphere changes, and his personal program goals dissolve. 3. management inadequacies.-the effects of domination by a technician or special interest group are described below in more detail. although more and more organizations are putting together influential user groups to point the way toward better management, decision-making responsibility and authority continued to be misplaced in a few institutions which vested authority for technical decisions in a committee of deans who were somewhat remote from current trends in computing because of their administrative responsibilities. (in one institution, it was half jokingly stated that a dean in any hard science could be characterized as suffering from a minimum technological time lag of two years.) 4. lack of long-range planning inclusive of attention to community priorities.-few facilities visited had any written long-range plans, either for the acquisition of hardware, the conversion of older programs, or the involvement of users in systems design. ad hoc arrangements were prevalent. 5. system instability.-this was more the rule than the exception, especially in software, operating systems, hardware configuration, and pricing. wherever an academic computing facility was used for library development, the same broken record always seemed to be playing: the facility was always being taken apart and put together again. of course library development was not the only user affected; complaints arose from all users. 6. biased pricing algorithms.-in the academic facility, student and research use were competitive. hence systems were typically geared to distribute computing resources around the clock in some equitable and rational way. for instance, short student jobs were sometimes given a high priority for rapid turnaround, while long, grinding calculation work was pushed off to the evening or night shift by means of variable pricing schedules or algorithms. a pricing algorithm is basically a load leveling device to smooth out the peaks of over-demand and the valleys of under-utilization which would have occurred in the absence of such controls. devising pricing algorithms is by no means a simple task, since many factors must be taken into account: the kinds of machine resources available, their respective costs, the data rates at which they can function, market demand, hardware and software available, and system overhead, to name but a few. library jobs tended to suffer in both batch and on-line processing. 
in the former case, because batch jobs on large data bases took so much institutional political and fiscal factors/veaner 7 time, library work generally could not be done during the prime shift; in the latter case, an on-line library system made substantial demands upon a facility's storage equipment and telecommunications support, and competed with all other on-line users. 7. sense of competition with the library for hard dollars.-this problem, which is related to pricing bias, is detailed further on page 21. 8. scheduling problems.-many of the institutions visited had systems or charts for scheduling production, development, and maintenance. but conversations with system users often verified that schedules were either not met or had been unrealistically established. this was especially the case with development work b. personnel in the computer facility 1. selection and evaluation.-inasmuch as the library often did not have the competence to judge personnel nor the ability to generate meaningful specifications, there was generally very little protection from incompetence in this area. 2. elitism: the notion that the masters of the computer are inherently superior to and have better judgment than computer customers.-elitism is a paradox: it can be positive or negative-positive when the best brains produce software designs of true genius with respect to function, performance, economy, and reliability-but in its negative manifestation, reminiscent of the girl with the curl in the middle of her forehead: "when she was good, she was very, very good; when she was bad, she was horrid." during the boom years when computer facilities were expanding faster than the supply of competent staff, elitism seemed fairly common in the computer center. the excitement of rapid development, the seemingly unlimited intellectual challenge presented by the powerful apparatus, and high strung dispositions sometimes caused tempers to flare or immaturity to sustain itself beyond a reasonable time. strange hours, strange habits, bizarre behavior, all seemed to conspire against ordered and rational development. fortunately, as the field matures, the negative aspects of elitism are dying; managers now can concentrate on staff development work to turn top intellectual talents toward productive achievement. 3. disinterest.-this factor may be allied to elitism. in some instances, the computer center's staff gave considerable attention to the library during the period immediately following machine installation, when utilization was low. later, the staff's keen interest became "dulled" at the thought of operating a production system. "more interesting jobs" were .challenging the programmers and beginning to fill up the machine. 4. fear of the unknown big user.-it was recognized early that the library could be among the computer facility's largest potential customers, perhaps the largest. in some facilities, this recognition may have induced 8 journal of library automation vol. 7/1 march 1974 a fear of being taken over or overwhelmed by the user, who would then be in a position to dominate and dictate the direction of further development and operations. 5. fears of an unknown production environment-simply expressed, a production environment removes much of the stimulus for creative approaches to problem solving unless continuous development is maintained for new systems and new applications. 
many of the best programmers did not wish to lose their freedom to innovate and actively resisted participation in establishment of a production environment, with its concomitant requirement of "dull" maintenance support work. c. deficiencies within the library environment 1. failure to understand in full detail the current manual system.even where the manual system was understood, there was often an inability to describe it in the clear, unambiguous style essential to system design work. these deficiencies were further compounded by the unwillingness of some librarians to learn how to communicate adequately with computer personnel. 2. inability to communicate design specification.-many did not understand how to put together a specification document; particularly they did not know how to account exhaustively for all possible cases or alternatives. librarians were unaccustomed to defining their data processing requirements quantitatively or with precision-both absolutely indispensable to the computer environment. also, as much as the computer facility changed its software environment, many library development efforts were constantly changing their system requirements-a condition which made it all but impossible to program efficiently. 3. failure to understand the development process.-development is a new phenomenon in libraries. most librarians were not educated to comprehend development as an iterative process, characterized by experimentation, error, feedback, and corrective measures. accustomed to the relative stability of long-established procedures-some of which had stood for generations, even centuries-some librarians were baffled by the rapidly changing new technology, others showed impatience and a low tolerance for frustration. many expected development projects to resemble turnkey operations, and the failure of the process to accommodate these expectations produced disappointment and an inability to cope with the computer environment. 4. failure to recognize the computer as a finite resource.-both librarians and early facility managers seemed to look upon the computer as an inexhaustible resource, the former through lack of sophistication and the latter apparently through myopia or possibly ambition. some managers must have told their users that there was "no way" their equipment could be saturated in the foreseeable future. apparently some library users were naive enough to believe. institutional political and fiscal factorsjveaner 9 5. excessive or unrealistic performance expectations.-few library users understood the relationship between the system specifications and functional results, and fewer still understood the significance of performance specifications. the situation was not assisted by notions of "instantaneous" retrieval pushed by salesmen or the popular press. (the writer recalls vividly how one salesman told him the library could have a crt device for $1 a day! and indeed, the device itself was $1 per day if one cared to do without the keyboard, without cables, installation, control units, teleprocessing overhead, a computer, software, etc.) 6. lack of an established tradition of research and development ( r & d) and the lack of venture capital in the library community.the challenge of the computer may have been largely responsible for activating research and development as a serious and continuous effort in librarianship. 
inexperience in raising and managing funds for r & d, as well as a general lack of knowledge of computer cost factors inhibited progress or tended to make the development effort inefficient and full of surprises. 7. human problems.-some libraries having prior experience with small batch systems underestimated the scale of effort for contributing to the design of the large system, selling it to the users, installing it, and training the users. 8. insufficient support from top management.-in some instances, library management did not accord the automation effort the kind and degree of support essential to success. in particular, some librarians seemed to feel that automation was a temporary affair, definitely of less importance and significance than current manual operations. some did not recognize the sacrifices in regular production that would be necessary and some did not appreciate the continuing nature of development work. background two important prerequisites to progress in library automation were money and technical readiness. the government supplied the first, industry the second. the announcement by ibm in 1964 of its system 360 occurred at a fortunate time for the american library community. president johnson's administration had launched enormous programs in support of education. the library services and construction act was soon to channel millions of dollars into library plant expansion and, perhaps more significantly, the higher education act of 1965 was to sponsor research, which ui1til then had only the support of limited funds from the council on library resources, inc., and the national science foundation. (support from the national science foundation was largely, although not exclusively; directed toward discipline-oriented information services; one of the largest nsf grants went to the university of chicago library.) it was the right time to invest in library automation. important milestones were already behind the library community: the national library 10 journal of lihm1'y automation vol. 7/1 march 1974 of medicine's medlars program was well underway, the airlie conference on library automation had been held and its report published ("the white book"), and the library of congress automation feasibility study ("the red book") had appeared. 1 • 2 the first marc format was being tested in the field. in computer technology, third generation equipment represented major increases in computing power, processing speed, reliability, and capacity to store data in machine-readable form. ibm's sales force was successful beyond imagination in getting system 360's installed in large universities, as well as in business and government. ibm promised a new kind of software-time-sharing-which would virtually eliminate the tremendous mismatch of data processing speed between the human being and the machine. the new methods of spreading computer power through teleprocessing and time-sharing promised to make the computer at least competitive with and possibly an improvement over "antiquated" manual systems of providing rapid access to large and complex data files. within this relatively unknown environment, universities and libraries entered the software development process, which if successful, could enable them to catch up where they had been hopelessly falling behind. circulation, book purchasing, and technical processing loads in many libraries seemed to double and triple overnight as the country's schools and their programs grew to accommodate expanding enrollments. 
manual systems that had been reasonably workable and responsive in environments characterized by slow growth demonstrated significant and disturbing defects -the inability to deal with peak loads, or rapidly changing loads. the same effects were felt in administrative and academic computing: a bigger and more complex payroll, more students to register, construction contracts to monitor, more research grants which demanded bigger computers, and so on. these were truly boom years. but in the academic community there was still another force developing which was ultimately to be of even greater significance for libraries than the inconveniences of being unable to handle the housekeeping load: a dramatic rise in the expectations of patrons, especially in the academic community, where computers already abounded. libraries had come to be felt by some as strongholds of conservatism and expensive luxuries; librarians were faulted for not "putting the card catalog onto magnetic tape," for not implementing automated circulation systems, or otherwise failing to take advantage of new and powerful data processing techniques. the libraries were caught amidst a variety of sometimes conflicting, sometimes complementary factors: the visionary ignorance of the computer salesman, the senior academic officer possessed by the computer dybbuk, a lack of sympathy or understanding among some computer center managers, a lack of appreciation by students and faculty of the complexity of identifying, procuring, and cataloging unique copies of what must be the least standardized product known to man, and their own lukeinstitutional political and fiscal facto1'sjveaner 11 warm commitment to undertake the hard work required to learn how to use the computer resource. anxieties about jog displacement caused some library staff to look upon computers with trepidation, thus further placing the librarian in a defensive position. while these forces were taking shape, the library's bibliographic activities continued to be seriously hampered by inadequate international bibliographic control.~~ some essential computer hardware, especially the programmable crt terminal with an adequate character set, was either nonexistent or totally unsuitable to library applications. in this institutional context librarians entered the world of computers and data processing. t purpose it is the purpose of this report to examine in some detail how internal institutional factors affected the development of computerized bibliographic systems, and especially to consider nontechnical, negative factors: what slowed down or inhibited the applications of computers in librarianship? this report is not concerned with the merits or demerits of specific systems or their features; indeed, the investigator did not inquire about system specifications. major questions centered about the factors which fostered or hindered the development p1'ocess, regardless of the merit of a project or system. scope investigation was limited almost solely to those institutions considered likely to have large scale, in-house development projects using third generation computer equipment. the majority of places visited were large academic libraries. the time span included in the survey begins approximately in 1967 and ends in 1971. a total of twenty-four institutions was visited and some 100 persons interviewed; a list of the institutions visited is in appendix 1. 
methodology site visits and i nte1'views arrangements were made to visit four types of individuals: the director of libraries, the head of the library's system development department, the director of the computation center, and whatever principal institutional officer was managerially and/ or financially responsible for campus computing. considerable variation was found in the type of person assigned this last responsibility-it could be the provost, the vice-president u implementation of the library of congress' shared cataloging program under title ii6f the higher education act of 1965 was soon to alter this situation dramatically. t the painful trauma libraries and librarians experienced in getting into computers is too well documented to summarize here. perhaps the best summary has been done by stuart-stubbs. a 12 ] oumal of library automation vol. 7/1 march 197 4 for academic affairs, or the vice-president for business/ financial affairs. choice of the major institutional official to be interviewed was often determined by the pattern of computing in a particular institution, or the facility which supported the development effort. at first the investigator attempted to utilize a structured questionnaire for interviewing. this very quickly broke down, as the interviewees were generally voluble and ranged widely over many related topics or items which they would have been asked about later. accordingly, after the first few interviews, the formal questionnaire approach was dropped and a simple checklist of major questions kept on a few cards to make sure that each major issue had been addressed. every interviewee received the investigator graciously and none was unwilling to talk; indeed, if anything the opposite was the case-most persons seemed to be eagerly waiting for an opportunity to air their views. visits and interviews occurred during the period january-april1972. literature searches searching the literature on this topic has been extremely frustrating. in the literature of computer science and management, there are many articles on pricing algorithms, machine resource allocation schemes, and issues of managing the computer facility, but none specific to the topic of this report. besides scanning professional literature, the author has regularly conducted for the past year monthly computer searches via the ucla center for information service's sdi service. abstracts and citations were searched in research in education (rie) and current index to journals in education (cije). with respect to problems faced by the library in acquiring computer services, the results have been nil in both cases. the author reluctantly concludes that no major recent studies have yet been published in this sensitive area, although two papers by canadian librarians are very helpful. 3• 4 the national academy of sciences/computer science and engineering board's information systems panel appears to have come closest to identifying the issues in its report, library and information technology: a national systems challenge. still, the comments in that report are highly generalized and do not grapple with specifics. 5 structure of educational computing most of the visited institutions maintained separate facilities for administrative and academic computing, while a few ran combined facilities or were in the throes of consolidating their facilities. the differences between administrative and academic computing have historical roots deeply embedded in institutional soil. 
administrative computing is usually an outgrowth of punched card installations first set up for payroll and financial reporting. academic computing, on the other hand, has its origins within the institution's instructional and research programs. typically it has been supported by external grants and contracts and has been oriented toward institutional political and fiscal facto1'sjveaner 13 the "hard" sciences. until the recent dropoff in federal support of higher education, academic computing was a money maker (through the overhead on grants and contracts) while administrative computing was a money spender. administrative computing typically very little computational work is done in administrative applications; most of the computer work is associated with input, update, reading records, writing records, and printing reports. except for the pay.roll application, the consumer group has tended to be somewhat smaller and less transient than the academic group. but to university administrators the computer could do much more than write checks and pay bills. many significant administrative applications had already been installed on second generation equipment: faculty-staff directories, inventories of space, supplies, and equipment, records of grades, course consumption reports, etc. all these tended to expand the user group, increasing competition for the resource. the advent of third generation equipment made it attractive for administrators to think about applications centered around the so-called "integrated data base." this led to a demand for further new services for the registrar, fund raising and gift solicitation, student services, purchasing, etc. conventional administrative computing-particularly that part of it which generated regular reports-lent itself naturally to batch processing, and indeed many of the early computer installations actually continued established punched card operations, merely using the computer as a faster calculator and printer. the administrative computing shop is typically characterized by (or hopes to be characterized by) great systems stability and dependability, a cautious and measured rate of innovation, and in the opinion of some academic computing types, not much imagination. file integrity, backup and recovery, and timely delivery of its products are prime goals in an administrative computing system. the administrative computing facility very much resembles the library in two important aspects: ( 1) it is a production system; and ( 2) it is almost entirely an overhead function, i.e., there is little or no attempt at cost recovery from system users for its services. academic computing academic computing is a much different world. it serves a large, vociferous, .influential, and mostly technological user community, many of whom ~~e not only competent in programming, but more importantly, possess ready cash. but this is changing: as academic computing expands to service users in the humanities and social sciences rather than mainly those in the "hard" sciences, the user group is growing and it will probably not be long before it embraces the total academic community. in hard science applications, the academic facility typically performs an 14 journal of library automation vol. 7/1 march 1974 enormous amount of computing ("number crunching") with a relatively small amount of output. 
system backup and recovery is important to the academic computing facility, but file integrity responsibility may often be assigned to the user since such a center sometimes does not maintain the data base but merely provides a service for manipulating it. the main components of academic use are departmentor discipline-oriented research and student instruction, the latter being particularly strong if there is a well-established computer science department. software development has customarily played a major role in academic computing and the usual practice was to actively seek out imaginative systems programmers for whom change and system improvement are food and drink. consequently, instability, both in hardware and software, has been more the rule than the exception in the recent past, although as the management of computer facilities matures, this too is changing. currenttrendsandstatus it is obvious from the above that administrative and academic computing have been characterized by diametrically opposed machine and managerial requirements. where they have been combined in the same facility, tensions have prevailed and neither user was happy. in a few instances known to the writer, such combinations have been abortive and a reversion made to divided facilities. but as computing matures it is becoming evident that operational stability is needed for all types of computing, not just administrative computing. additionally, the financial crises now prevalent in institutions of higher education have brought more realistic attitudes to the fore in understanding just what kinds of facilities can be afforded, and how they should be managed. additionally, the economies of scale, the increasing flexibility of hardware and growing sophistication of software are now combining to form an environment which can better satisfy all potential users of computers. there are clear indications that a unified, well-managed shop with competent staff might now economically and efficiently serve a variety of applications, including administrative and academic-on the same facility. however, this is a developing trend and does not correspond with what the writer actually observed during his visits. in situ he saw much evidence that anthony oettinger's observations of some years ago were still valid: ... routine scheduled administrative work and unpredictable experimental work coexist only very uneasily at best, and quite often to the serious detriment of both. where the demands of administrative data processing and education require the same facilities at precisely the same time, the argument is invariably won by whoever pays the bills. finances permitting, the loser sets up an independent installation. 6 indeed, it would not be unreasonable to conclude from the interviews that in most places visited, computing during the period 1967-71 was in a institutional political and fiscal factorsjveaner 15 state of disarray. there is abundant and disagreeable evidence of technical incompetence, lack of management ability, ill spent money, communication failures, and naive and disillusioned users. but it would be a mistake to conclude that the failures in library automation are attributable primarily to computer-oriented personnel or hardware problems-librarians in their own way displayed many of these same failures. it would be another mistake to dwell excessively on the high failure rates observed. 
in any complex technological endeavor, the rate of failure is dramatically high at the beginning; there is ample evidence here from the aircraft and space industries. indeed, the likelihood of a first success in anything complex-library automation is complex, as we have learned the hard way-is practically nil. organization and management problems: the academic computing environment early academic computing facilities were typically run by faculty members in engineering, applied mathematics, computer science, or related fields. this arrangement was satisfactory when computers were small, relatively primitive, and the user community was confined to those few people who could program in machine language or assembly language. as equipment became bigger and more powerful, and as higher level programming languages developed, more and more people learned programming. correspondingly, the task of managing the computer facility grew rapidly in size and scope. the budget of a large computer center in a modern university can easily run to several millions of dollars annually. the manager must balance seemingly innumerable, complex forces: personnel, management, government and vendor relationships, demands from vocal users, establishing priorities, the challenge of hardware advances, marketing, pricing services, balancing the budget, etc. it soon became clear that few faculty members possessed either the multifaceted talents or the experience required for effective management. as the center's budget grew, and particularly as the shift was made from second to third generation equipment, th,e faculty member tended to be replaced by the technician as manager. unf01tunately for many of the facility users, the technician tended to promote his own technical interests in software development or hardware utilization. in some instances, the user community felt that the facility was being run more for the benefit of the staff than for the users. the technician-manager often looked at the computer as his personal machine, much as some faculty members had earlier felt the computer to be their own private preserve. the vice-president of one university expressed the view that the technician-manager doesn't really have an institutional loyalty tied to the goals and objectives of the academic programs; he is more loyal to the machine or the software. in a school with a long history of computer utilization, there had been no tech16 journal of library automation vol. 7/1 march 1974 nician in charge of the computer facility for a decade. yet in a school not too far away, an officer indicated that his institution had "made the same mistake twice in a row" by hiring a technician to manage the computer facility. the technician-manager represents a highly personalized management style, one in which goodwill, friendship, or personal interest is the key to effective service. it can hardly represent an arrangement for the successful development and implementation of computerized bibliographic systems. in the third and current organization and management phase of academic computer facilities, the professional manager is in charge. schools are now beginning to see the need to develop formal charters for their computing centers, quasi-legal instruments which will lay out their specific responsibilities as service agencies. 
a professionally managed service agency eliminates one of the most irritating elements in the allocation of computer resources: personal judgment by the faculty or technician-manager as to the worth of a project, which was so prevalent during earlier management stages. at the time of the interviews, very few institutions actually had such charters, but their need was being recognized. it is now universally accepted that the computer center can no longer be the plaything of the faculty nor the expensive toy of the technician. organization and management: the administrative environment because of its historical development the administrative computing facility was usually first run by someone with an accounting or financial background. (academic computing persons occasionally put disparaging labels on such people as "edp-types" or characterized them as having a "punched card mentality.") the nature of the workload virtually meant that the administrative shop would be set up mainly for batch processing and any data base services provided for other users would involve printed lists. such facilities were found satisfactory by a number of libraries even for applications such as circulation, which produced gigantic lists-probably because it represented a vast improvement over an antiquated, poorly designed, or overloaded manual system. however, there was at least one major technical consideration which had direct political and financial implications for the library which turned to the administrative computing facility for its computer support. this was the library's need to support and manipulate a data base with nearly every data element of variable length-a requirement that was practically nonexistent in administrative computing. some facilities were unable or unwilling to meet this requirement. the move from tape-oriented systems to mixed disc and tape systems on third generation equipment necessitated an upgrading of programming staff, and brought into the administrative shop the same clearcut distinction between system programmers and application programmers which had institutional political and fiscal factorsjveaner 17 emerged earlier in the academic shop. this change in turn demanded appointment of more knowledgeable facility managers, many of whom were drawn from business and industry rather than the ranks of in-house accounting staff. this transitional period was characterized by two enormously challenging parallel efforts: the conversion of existing programs to run on third generation equipment and the development of new applications. to an extent these responsibilities were competitive, and from this viewpoint it was certainly not a propitious time to embark upon anything as complex as bibliographic data processing. yet numerous workable systems emerged for circulation, book catalogs, ordering and accounting systems, and serials lists. these were not accomplished without anguish as the library did not control the machine resources and often did not control the human resources -the facility manager tended to make his pliority decisions to please his boss who was certainly not the librarian. besides, no application could really take precedence over payroll or accounting in the administrative shop. to the librarian it was more like borrowing another person's car than renting or owning a car: when the resource was urgently needed someone else had first call. 
organization and management: the library automation endeavor

a detailed study of this subject is not within the scope of this investigation. however, it will be useful to note that the organization and management of library automation activities demonstrate development phases which closely parallel those in the computing environment:

1. a stage in which the user himself (cf. accountant or faculty member) undertakes to perform the activity. in this stage individual librarians learned programming, did their own design work, and wrote, debugged, and ran programs themselves. (this was possible in the "open shop" environment prevalent in many early computer facilities.)

2. a stage in which the technician-in this case a librarian with appropriate public service expertise (for circulation applications) or technical processing knowledge (for acquisitions, cataloging, or serials)-took charge of an organized development effort, hired his own programmers and systems analysts, and negotiated directly with the computer facility.*

3. a stage in which the professional system development manager is hired to oversee the total effort. such a person is sometimes drawn from business or industry, is a seasoned project manager, and has broad knowledge of computers, especially in the area of costs. such an appointment is more common in the large library, the consortium, or the network.

*the technical person need not be a librarian. northwestern university represents a significant instance where a faculty member in computer sciences and electrical engineering undertook the development effort.

human problems associated with rapid change in institutions

some institutions, particularly in their administrative functions, became embroiled in a seemingly endless round of internal psycho-social problems which did not make the environment conducive to problem solving. the move to computerizing manually oriented functions, whether in the library or in other parts of an institution, was found to be extremely threatening to established departmental structures. it was consistently reported that the political and emotional aspects of system conversion, both in the library and elsewhere, were much more aggravating than the technical aspects. the problem simply showed up first outside the library because applications of computers occurred there earlier. departments were sometimes unwilling to give up data for computer manipulation for fear that computerization would take jobs away. this phenomenon is not unknown in librarianship, where some professionals take an extremely proprietary attitude toward bibliographic data. now pressures from governments, legislatures, and the academic community at large are gradually establishing the concept that some categories of data are corporate, and do not belong to a specific individual or department, or even to an institution, but should be shared through networking or other mechanisms. but the rapidity of microsocial change and its upsetting emotional consequences caught some library leaders unawares. a considerable reeducational process for both management and labor is required to smooth the transition to the new view.

motivation problems

it is difficult to elicit sound comment concerning motivation (or lack thereof) as a deterrent to progress in library automation. it is an emotional subject, and neither the librarians nor the programmers come out "clean."
the prima donna computer programmer, much in evidence in the early days of computer center development, is very much on the wane these days. like the spoiled child, the prima donna programmer could only exist where personal interests were permitted to take precedence over social goals-or perhaps where institutional goals for the computer facility had not been clearly articulated or had not yet come into focus. some prima donnas, partly out of ignorance, partly through a stereotyped image of library activities, were inclined to dismiss library applications disdainfully as "trivial" and to demand "really challenging" assignments. but the librarians had their prima donnas, too. some had learned enough programming to be a little dangerous, and they then felt like peers who could tell the computer center not only what to do but how to do it. at first, few members of the library staff were willing to learn how to articulate their specifications and requirements to the management of a computer facility. most librarians expected some kind of miraculous magic, akin to a wave of the hand, to bring a computer system to reality. very few understood the heuristic nature of development. so there were barriers of status, depth of knowledge, and language-any one of which would have sufficed to kill the development of the good motivation essential to breaking new ground. in the wrong combination they could present an overwhelming conspiracy, for their mutual interaction could only produce polarization and intransigence.

the library and the computer facility
the role of similarities and differences

for a long time the library has been the "heart of the university." until the advent of the computer, little could challenge the supremacy of the library as the principal resource of an educational institution. even the faculty could be put into second place, since it was difficult to attract high quality faculty without good library resources, and the faculty were to a greater degree transient, while the library was considered "permanent," an investment for all time. the computer represents a new and challenging force in the arena where shrinking resources are allocated among competing academic users. both the library and the computer facility have experienced exceedingly rapid growth in the recent past, concurrent with an expanded demand for services which can easily outstrip available resources. among some of the larger academic libraries, the staff of the computer center may be half or more than half that of the library. important differences between the two services have recently come into focus. first, most of the services and benefits of the library are intangible. because of this it has always been difficult to measure the cost benefit of the library as an institution, and it is well known that counts of the number of people entering the door or the number of circulations are far from true measures of the library's functional success. the computer, on the other hand, is a relentless accounting engine; computer facilities can produce endless statistics on the number of jobs run, lines printed, terminal hours provided to users, turnaround time, cards punched, etc. the computer's output is extremely tangible and can be more directly and easily related to academic achievement than can library use. a second major difference lies in apparently different financial roles within the institution.
in most organizations, the library is run as an overhead expense, without any attempt to charge back to users or departments the proportional costs of utilization. like air, the library resource is there for anyone to use as much or as little as he pleases; the library gets a "free ride," but the computer center is expected to pay its own way. this dichotomy is often explicitly designated as the "library-bookstore" duo model. furthermore, since the library does not generate much in the way of research grants and contracts, it is looked upon as a consumer rather than a producer of financial resources. in fact, those who support computing in preference to books point to the fact that overhead income generated by computer-related research grants and contracts is shared with the library, which may have done little to contribute toward the acquisition of such income! in some institutions the situation has become critical indeed because of the recent substantial reductions in federal support. much political in-fighting has been necessary to maintain current levels of computer activity, and not all such efforts have been successful. some institutions have been forced to cut back on computing power, merge facilities, or combine resources with other institutions. several years ago when the national science foundation imposed an expenditure ceiling on grants, associated overhead income was correspondingly reduced. one computer center director was reported to have suggested that the effect of this overhead cut could be nullified by a simple, internal reallocation of funds, say by taking the needed amount from the budget of another agency on campus of less significance to researchers and scientists, such as the library. this attitude is clear evidence that the library has lost its sacred cow status as a "good thing" on the campus. it too must justify itself. close examination of the library and the computer facility gives clear evidence that both deal with the same commodity: information. within the recent past several computer facilities have changed their designations to "information processing" facilities or centers. several institutions, notably the university of pittsburgh and columbia university, have coalesced the library and the computer center organizationally or have both units reporting to a vice-president for information services. the recognition and furtherance of this natural linkage may do much to reduce the potentially destructive competition which can characterize the relationship between the two units. there are remarkable growth parallels between the two facilities-the library acquiring and processing more and more books in response to expanded publication patterns, more users, and the growth of new disciplines and interdisciplinary research, while the computation facility moves rapidly from one generation of software and hardware to the next. the expansion of both organizations produces seemingly equal capital-intensive and labor-intensive pressures: library processing staff doubles and triples, while the newly acquired books demand more in the way of housing, whether of the traditional library type or warehouse space; the computer center moves toward more sophisticated hardware, especially terminals and communications, which need to be supported by greater numbers of still more highly qualified systems programmers, communication experts, and user services staff.
both services have a marketing problem; but the computation facility, being relatively more dynamic and more interactive (because of terminal services), can be more sensitive and responsive, financially and technically, to its clientele than can the library. only now, with the emphasis upon computerized bibliographic networking, has the library as an institution begun to approach the marketing strategies and the effective user feedback already well developed in computation facilities.

service capacity, resource utilization and sharing

differences both in service capacity and resource utilization represent a key political issue affecting the future of both libraries and computer facilities. in major universities, the budget for the computer facility is now not far from the library budget in size, and in a few institutions it exceeds the library budget. with the diminution of external grants and contracts, the two organizations compete for the same hard dollars. this economic competition can either drive the two facilities apart, dividing the campus, or cause them to coalesce-as has been the case at columbia and pittsburgh. despite its high operating costs, from the viewpoint of resource utilization the well-managed computer facility can almost always point to an excellent record. (in fact, if a computer resource is not much used and isn't "carrying its weight," it can be disposed of, by sale if purchased, or by cancellation if leased.) no matter how well managed, the research library can never make this claim in the context of its current materials and processing expenditures, much of which by definition is aimed at filling future needs. the library and its patrons cannot "use" all the resources at their command; the library could not even service all the patrons should they demand the use of "all" the resources. in contrast, the computer facility (particularly large on-line systems with interactive capabilities) can be very efficiently utilized even when demand is heavy. thus, to the "objective" eye, it would appear that in the computer facility both the institution and the individual patron get more value for their dollar than they do in the library, which in comparison resembles a bottomless financial pit. one may counter that apples and oranges are being compared, but the institution which pays their bills nevertheless makes the comparison.

flexibility, inflexibility, and the future

besides better resource utilization, the computer facility offers the patron far greater flexibility of resource use than can the library. there is no way a large collection of books on the celtic language or the military history of the austro-hungarian empire can help a professor of structural engineering, a student of marine biology, or a researcher in modern urban problems. even the books these people actually need and use cannot easily assist others, as relevant data in them is not indexed or readily available for computer manipulation. the point is that, unlike the library, the computer is a highly elastic universal tool, one that each user can temporarily shape to his own need, replicate the shape later, or, if he wishes, change the shape at will. the traditional library has no such flexibility; its main bibliographic retrieval
device-the card catalog-is especially noted for its high maintenance cost, its limited ability to respond to complex queries, and a general fixity of organization and structure that is ever at variance with changing patron expectations and interests. (if computers can be flexible, why can't the library?) there is much in the library that is not used because it is inaccessible-locked up in an inflexible retrieval tool or unavailable because the state of the art (both in bibliography and computer science) or staffing does not yet permit far deeper access via "librarian-negotiators" and patrons at terminals interacting with large and deeply indexed data bases. as long as major portions of the library budget and staff are devoted to housekeeping and internal technical processing, the library will look less good, less "cost-beneficial" to the academic community than does the computer facility. but there is growing recognition that both institutions deal with information processing which covers a wide spectrum of time. true, the storage formats differ, but this may be a temporary phenomenon. as progress is made on improved, less expensive conversion of data from analog to digital form and vice versa, the day may arrive when the library and the computer facility are indistinguishable.

will the library become an information utility?

computer utilities are an important developing trend, and it is sometimes suggested that library services could be delivered within the utility model. utilities and libraries as they exist today have very different characteristics. a utility can be defined as a system providing a relatively undifferentiated but tangible service to a mass consumer group, with use charges set in accordance with a pricing structure designed for load leveling (i.e., optimization of resource utilization). typically, a utility both wholesales and retails its services. within this definition, a conventional library cannot be construed as a utility: its services are generally intangible and very highly differentiated-indeed, chiefly unique, for rarely is one book "just as good as another"; its clientele is not the general public but a highly select group which itself contains highly unequal concentrations of users; almost no libraries impose user charges in the interest of cost recovery; and, practically speaking, there is only one united states wholesaler of bibliographic data-the library of congress. this situation is changing in several respects. first, the establishment of practical, computerized bibliographic networks has introduced among participating institutions cost sharing schemes closely resembling the load leveling or rate averaging algorithms prevalent among utilities. (an example of rate averaging is the practice of the ohio college library center to lump total telecommunication cost and prorate it into the membership fee, in effect creating a distance-independent tariff; this arrangement does not hold outside of ohio.) these new ideas have been readily accepted by libraries and could even become the basis for balancing more equitably the costs of interlibrary loan traffic. second, specialized "information centers" have evolved in certain fields, partially as a consequence of lack of responsiveness (or slow turnaround) by conventional library services, and "for profit" commercial services have been set up. examples of the latter include the european s'il vous plait and its american counterpart, f.i.n.d.
(often such commercial services do not hire librarians, as they are considered too tradition bound.) a third force which is rather inchoate at the moment may soon take on a recognizable shape: facilities management. under such a scheme, the complete management responsibility for all or part of a function is contracted to an outside vendor. for instance, it is conceivable that some libraries in the near future may have no in-house staff for technical processing. services would be purchased totally from a vendor or obtained from his resident staff, much as computer centers buy specialized expertise through the "resident s.e." (systems engineer). the gradual buildup of computerized bibliographic services offers an excellent opportunity for commercial ventures into turnkey bibliographic operations for libraries. this would bring libraries one step closer to the utility concept, as they buy a complete package from a wholesaler who probably services many customers. the traditional library service concepts we know today may undergo drastic changes in financing and in methods of delivery. beyond the commercialized or contractual arrangement for technical processing, which is only one component of the total information flow, lie unknown territory and little explored concepts: use charges for library services (the bookstore model), the "for profit" library, the complete information delivery system integrated with computers, communication satellites, and cable tv. if the computer-based library is to become an information utility, a major accommodation will be needed in the financing arrangements, perhaps in the form of user charges-for no utility can survive without regulated demand. an unlimited, uncontrolled demand for any product or service is untenable, for without regulation (i.e., pricing) demand rapidly outruns supply. in the traditional library, where theoretically every user has the "right" to unlimited demand, this never happens, for several reasons: (1) not all potential patrons elect to use the resource; (2) the users must usually go to the library to access the bibliographic apparatus and obtain the materials held by the library; (3) every item in a library collection does not have an equal probability of use; and (4) there is a finite rate at which human beings can "use the resource," i.e., people can read just so fast. none of these self-limiting factors applies to, say, electric power, radio and tv broadcasting, telecommunication services, or similar utilities. the library picture could become quite different if these limitations were removed or mitigated. suppose the patron could access the bibliographic apparatus through his home computer terminal attached to his tv in the "wired city." further suppose that he could receive selected, short items (where time of delivery is important to him) directly at his tv set, or longer items having less time value as microforms or hard copy delivered by mail or private delivery systems. given such possibilities, the collecting policies of individual "libraries" (if they continue to be called by that name) might well change drastically, so that nationally, collections might become much more standardized or "homogenized"-increasing the likelihood that individual holdings will have more nearly equal use probabilities.
this would imply the need for one or more national and/or regional centers for servicing the less used materials, along with appropriate delivery systems and pricing schedules.

conclusion

work on library automation has proceeded during a highly developmental period in the history of computing. in this sense, librarianship has suffered no worse than any other computer application, nearly all of which have gone through traumas of design, installation, redesign, reprogramming, etc. the main distinction is that in many of these other applications-government, military, industrial, or commercial-there have been far greater resources available to the task and vastly greater experience with the development process. despite the obstacles, progress in computerized bibliographic work has been far more significant and has achieved far more than many librarians-especially those unaccustomed to the development cycle-can appreciate. the snowballing growth of practical consortia and networks, along with the successful installation and operation of several on-line bibliographic systems, has already changed the face of librarianship in a very short time. like the breaking of the sonic barrier, once the initial difficulty is overcome, further progress is easier. the computer has successfully achieved what librarians have until recently only paid lip service to: cooperation and wide sharing of an expensive and large resource. though the linear growth model in libraries has been dead for some time, the recognition of this fact has not yet penetrated the entire profession. if libraries are to survive as viable institutions throughout this century and into the next, their leaders must solve the financial, space, and human communication problems inherent in growth. local autonomy, local self-sufficiency, and the "freedom" to avoid, evade, and even undermine national standards now show up as expensive and dangerous luxuries-potentially self-destructive. only through the computer will true library cooperation be possible; only the development of regional and national bibliographic networks, with the assistance of substantial federal funding, can really "save" the library. the computer is actually the library's life insurance and blood plasma. a failure to respond to the challenge of the computer could be fatal, for it is increasingly apparent that patrons growing up in the computer era will not patiently interact with library systems geared to nineteenth-century methods. nothing in the educational system exists to force people to use a given resource; people use the resources which are effective, responsive, and economical. if the computer is a better performer than the library, patrons will go to the computer. this will be particularly the case as computer services become broader in coverage and simpler to use, and as unit prices continue to decline. despite the serious and irritating problems associated with learning to use the computer, librarians must continue aggressively to support computer applications; indeed, library leaders can impart no more important message than this to their community leaders.

acknowledgments

i wish to thank the following persons for their support: dr. e. howard brooks, who was vice-provost for academic affairs in 1971, and david c. weber, director of libraries, respectively, stanford university, for granting the leave of absence which enabled me to undertake this project.
i acknowledge with thanks the contributions of the following persons who reviewed early drafts of the paper, in many cases making valuable suggestions and in other instances helping me ward off errors: mrs. henriette d. avram, head, marc development office, library of congress; hank epstein, director of project ballots and associate director for library and administrative computing, stanford center for information processing; frederick g. kilgour, executive director, ohio college library center; peter simmons, professor of library science, university of british columbia; carl m. spaulding, program officer, council on library resources, inc.; david c. weber, director of libraries, stanford university.

references

1. barbara evans markuson, ed., libraries and automation; conference on libraries and automation, warrenton, va., 1963 (washington, d.c.: library of congress, 1964).
2. u.s. library of congress, automation and the library of congress; a survey sponsored by the council on library resources, inc. (washington, d.c.: library of congress, 1963).
3. basil stuart-stubbs, "trial by computer: a punched card parable for library administrators," library journal 92:4471-74 (15 dec. 1967).
4. dan mather, "data processing in an academic library: some conclusions and observations," pnla quarterly 32:4-21 (july 1968).
5. libraries and information technology: a national systems challenge; a report to the council on library resources, inc., by the information systems panel, computer science and engineering board (washington: national academy of sciences, 1972).
6. anthony oettinger, run, computer, run (cambridge, mass.: harvard university press, 1969), p.196. (these same comments were cited in allen b. veaner's earlier article, "major decision points in library automation," college & research libraries :299-312.)

appendix 1
list of institutions visited

university of alberta
university of british columbia
university of chicago
cleveland public library
the college bibliocentre, ontario
university of colorado
columbia university
cornell university
harvard university
university of illinois
indiana university
massachusetts institute of technology
university of michigan
new york public library
northwestern university
ohio college library center
university of pennsylvania
pennsylvania state university
university of pittsburgh
purdue university
simon fraser university
syracuse university
university of toronto
yale university

subject access to a data base of library holdings

alice s. clark: assistant dean for readers' services, university of new mexico general library, albuquerque. at the time this research was undertaken, the author was head of the undergraduate libraries at ohio state university.

as more academic and public libraries have some form of bibliographic description of their complete collection available in machine-readable form, public service librarians are devising ways to use the information for better retrieval. research at the ohio state university tested user response to paper and com output from selected areas of the shelflist. results indicated users at remote locations found such lists helpful, with some indication that paper printout was more popular than microfiche.
while many of the computer applications in special libraries were designed to improve subject access to the collections, the systems adopted in academic and public libraries have often been those which would handle various file operations and improve control of circulation or technical processing functions. once some of the data describing the items in the collection became available in machine-readable form, reference librarians have been tempted to find ways to use it for subject retrieval. in november 1970, the ohio state university (osu) libraries began to use its automated circulation system with a data base representing its complete shelflist and carrying limited information on each title:

field no. / field
1 call number
2 author
3 title
4 lc number, or nolc if none available
5 title number
6 publication date (if available)
7 ser: serial indicator. when present, indicates the title is a serial.
8 neng: non-english indicator. when present, indicates the title is non-english.
9 size: oversize indicator. when present, indicates the book is an oversize book.
10 portxxxx: portfolio number in which the book is located (main library only).
11 mono: monographic set indicator. when present, indicates the title has been designated a monographic set.
12 number of holdings (not displayed if copy 1, main library)
13 reference line number
14 volume number
15 copy number
16 holdings condition code
17 library location
18 patron identification
19 number of specific saves for the copy
20 circulation status
21 date charged in the form of year, month, day
22 date due in the form of year, month, day

the system, modified from time to time, provided access by call number, record number, or author-title, with an algorithm consisting of the first four letters of the author's name plus the first five letters of the title. a title search was also possible by entering four letters of the first significant word and five letters of the second significant word, or five dashes. as soon as the system was implemented, it was immediately evident that the search option was one of the most important features of the system. the circulation clerk at any location, either in the main library or in any department library, could search the author and title and find: (1) if the osu libraries had the book; (2) where it was regularly housed; and (3) its status (charged out, missing, lost, or available for circulation). all of this was possible without checking the card catalog except when problems of identifying the main entry existed. the immediate lack was, of course, the subject approach. as use of the system continued and library personnel became more sophisticated, various procedures offering some kind of subject approach were developed. the title search option is one possibility for finding subject access. for example, to find a book on "evolution" one can enter the title search command tls/evol---- and receive a report that there are 757 titles in which evolution is the first significant word. the terminal will then print out items as follows:

tls/evol----   page 1   757 matches   0 skipped (not all retrieved)
01 lan, h. j.   evolutie   1946
02 moody, paul amos, 1903-   introduction to evolution   1970
03 brosseau, george e   evolution   1967
04 adler, irving   evolution   1965
05 lotsy, j. p.   evolution
06 smith, john maynard, 1920-   on evolution   1972
07 miller, edward   evolution   1917
08 watson, j. a. s.   evolution   19-
09 kellogg, v. l.   evolution   1924
10 shull, a. franklin   evolution   1951
when the user types in pg2 or pg3, more titles will come up, and if more than thirty titles are desired, the original command can be reentered with a /skip 30 option to display others, including all 757 titles if necessary. it is also possible to manipulate this option further, since this first search may turn up the name of an author recognized as an authority on the subject. in this case, when thomas huxley's evolution and ethics appears, the terminal attendant changes to an author-title search, ats/huxlevolu, and finds eight matches, four books by thomas huxley and four by julian sorell huxley on the same subject:

ats/huxlevolu   page 1   8 matches   0 skipped (all retrieved in 1)
01 huxley, thomas henry   evolution and ethics, and other essays   1970
02 huxley, thomas henry   evolution and ethics and other essays   1916
03 huxley, julian   evolution, the modern synthesis   1942
04 huxley, thomas henry   evolution and ethics and other essays   1897
05 huxley, julian sorell   evolution as a process   1954
06 huxley, thomas henry   evolution and ethics and other essays   1896
07 huxley, julian sorell   evolution in action 1st ed   1953
08 huxley, julian sorell   evolution as a process 2d ed   1958

to find the call number of any of these, the attendant merely enters a detailed line search dsl/1:

dsl/1
hm106h91896a   huxley, thomas henry   evolution and ethics, and other   nolc   902452   1970   1 01 001   3week   und   page 1 end

the ability to search by a word in the title, which in the above example gives a form of kwic subject index, is even more specific if two words are used. for example, the attendant may enter tls/chilpsych to bring up titles containing the words "child" and "psychology" as follows:

tls/chilpsych   page 1   52 matches   0 skipped (not all retrieved)
01 jersild, arthur thomas, 1902-   child psychology. 4th   1954
02 jersild, arthur thomas, 1902-   child psychology 5th ed   1960
03 thompson, george greene, 1914-   child psychology   1952
04 kanner, leo   child psychiatry 3d ed   1957
05 curti, margaret (wooster)   child psychology   1930
06 clarke, paul a   child-adolescent psychology   1968
07 greenberg, harold a   child psychiatry in the commun   1950
08 english, horace bidwell   child psychology   1951
09 chess, stella   an introduction to child psych   1969
10 curti, margaret (wooster)   child psychology 2d ed   1938

the obvious subject approach is, of course, by call number. the system contains an option that permits a search on the general call number. the operator may enter either a real or an imaginary call number and receive the fifteen titles preceding and the fifteen titles subsequent to it in the shelflist. for example, with the command sps/hm106h9, using the call number from the previous example, the following ten titles will appear with that call number as the central item:

sps/hm106h9
11 hm106g77   graubard, man the slave and master
12 hm106h3   haycraft, darwinism and race progress
13 hm106h57   herter, c. biological aspects of human problems
14 hm106h6   hill, g. c. heredity and selection in sociology
15 hm106h63   hoagland, evolution and man's progress
16 *hm106h9
17 hm106h91896   huxley, evolution and ethics and other essays
18 hm106h91896a   huxley, evolution and ethics and other essays
19 hm106h91897   huxley, evolution and ethics and other essays
20 hm106h91916   huxley, evolution and ethics and other essays
21 hm106k29   keller, societal evolution; a study of the evolutionary basis
page 2   input: hm106h9

entering pg1 will bring up the ten preceding titles and pg3 the ten subsequent titles.
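the derivation of these search keys is simple enough to sketch in a few lines of code. the fragment below is an illustrative reconstruction in python, not part of the osu programs: the four-plus-five truncation rule comes from the description above, while the handling of punctuation and the small stopword list used to pick "significant" words are assumptions made here for the example.

def author_title_key(author_surname, title):
    # ats-style key: first four letters of the author's surname plus
    # the first five letters of the title, ignoring spaces and punctuation
    surname = "".join(ch for ch in author_surname.lower() if ch.isalpha())[:4]
    title_part = "".join(ch for ch in title.lower() if ch.isalpha())[:5]
    return surname + title_part

def title_key(title, stopwords=("a", "an", "the", "of", "and", "to")):
    # tls-style key: four letters of the first significant word plus five
    # letters of the second significant word, padded with dashes when the
    # title has no usable second word
    words = [w for w in title.lower().split() if w not in stopwords]
    first = (words[0][:4] if words else "").ljust(4, "-")
    second = (words[1][:5] if len(words) > 1 else "").ljust(5, "-")
    return first + second

print(author_title_key("huxley", "evolution and ethics"))  # huxlevolu
print(title_key("child psychology"))                        # chilpsych
print(title_key("evolution"))                               # evol-----

with keys this short, many different works necessarily map to the same key, which is why the terminal displays a page of matches for the operator to scan rather than a single record.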
one of the best features of this system is that the patron may call in by telephone and have at least some of this information read to him; if he is at a circulation area, he may receive a printout as an instant bibliography. recently an attempt has been made to use the file of data in other ways. in an attempt to provide better access to the main campus collection for the people at the five regional campuses of the university, an experiment was tried using a computer printout of certain selected parts of the shelflist. since microfiche is less expensive and more compact to handle, there were good reasons for using this form rather than the paper printout form. this was an obvious application for computer output microfiche (com). once a master frame has been produced by com, the cost of additional copies is negligible. in order to test acceptance of form more accurately, it was decided to provide a list in each form to test on sample populations. to cover some of the subjects taught at the agricultural and technical institute at wooster, a total of 20,672 titles were selected in the following areas:

agricultural economics   hd1401-2210   2,121 titles
botany   qk10-942   1,039 titles
agriculture   s   17,157 titles
agricultural machinery   tj1480-1496   6 titles
wood technology   ts800-937   197 titles
woodworking   tt130-200   152 titles

these titles were printed in a hard-copy printout in the following format, with a program designed by gerry guthrie of the research and development division of the osu libraries:

call number = tj1496c3a3
title number = 196795
author = caterpillar tractor company
title = fifty years on tracks
publ. date = 1954
holdings = cool com regular
lc number = 55-20529

the physical form of the resulting documents varied somewhat due to the fact that each subject area was put in one cover. this meant "agriculture" (s), with 17,157 titles, was too bulky to carry around, but "wood technology" was compact and easily carried to one's office or home for leisurely browsing. a brief questionnaire was used to test the reaction to the list. responses were received from 6 percent of the students and faculty at the agricultural and technical institute. with the usual assumption that some students are not library users, there was some validity to the sample. results tabulated from these questionnaires fell into three categories: (1) nature of use; (2) value of the list; and (3) response to its form and format. since some questions were left blank, the totals were often less than 100 percent.

nature of use

the responses turned out to be evenly divided between faculty and students, 46 percent for each, with some leaving this question blank. the faculty indicated that two-thirds of their use was for themselves and one-third for the students. students, of course, used it totally for their own purposes. the actual purpose of the list had been envisioned as access to the main campus collection, and increases in interlibrary loans indicated that it was effective. loans during the month of october 1973 totaled four, while november's loans totaled thirty-four, showing a marked difference after the delivery of this search tool on october 31. the questionnaire showed that 77 percent indicated they used the information for this purpose.
it should not have been a surprise to librarians to find that 34 percent of the sample population used the information to order a duplicate copy for the wooster ati library, an indication of readers' known proclivity for wanting their material close at hand.

users' evaluation

the increase in interlibrary loans was probably a better reflection of the users' approval than the actual questionnaire results, although the results themselves were also highly positive. seventy-seven percent checked that they found it valuable, against 15 percent who did not. eighty-five percent said they wanted more lists. requests for additional suggestions included a request to keep it up to date and a request to limit it to just recently published items, while another person asked for all of the titles located in the agricultural engineering library. the requests indicated that several additional subject areas were wanted: communication skills, personnel management, human relations, use of airplanes in agriculture, irrigation and drainage engineering, and environmental pollution.

suitability of form and format

some attempt was made to determine how people react to the admittedly inconvenient form of a computer printout. since financial considerations limited the possibilities to either this form or microfiche, those options were presented in the questionnaire. preference for the paper form was expressed by the users of the list in this form-84 percent to 8 percent who would have preferred microfiche. the population was evenly divided as to whether or not they wished to have the list in this call number order-50 percent wanted it by straight shelflist or call number order and 50 percent wanted it alphabetically by author. the latter response may very well reflect the proportionally large number of respondents who were faculty and who supposedly would know the authors in their fields and do not use a subject approach when seeking materials. while the original purpose of the research was to provide better subject access to a remote collection, it was also important to find out more about the user's response to microfiche if he could be given an improvement in service or a service he did not previously have. microfiche would be both more compact and less expensive if lists of this type were to be provided in many subjects and continually updated. for the microfiche section of the research project the library of congress classifications covering classics and related fields were chosen, partly on the basis that faculty in these areas had agreed to participate and encourage their students to use the list. included were:

de1-de98   history-the mediterranean world
df101-df289   history-greece
dg11-dg209   history-italy
n5630-n5790   history of art-greek and roman
na200-na335   architecture-history-greek and roman
pa (all)   language and literature of greece and rome
z7001-z7005   bibliographies in linguistics, roman and greek literature, teaching languages

this subset produced about eleven thousand titles. the format of the com was the same as that on the paper printout, with general titles appearing at the top of each sheet or frame, e.g., shelflist-classics-greece. this took twenty-two microfiche with sixty-nine frames each, listing seven or eight titles per frame. the last frame on each fiche was an index to that fiche. a nonreduced (eyeball) character at the top listed the first call number on the fiche.
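the fiche layout just described lends itself to a small worked sketch. the python fragment below is illustrative only, not the com production program used at osu: the capacities (eight titles to a frame, sixty-eight data frames plus one index frame per fiche) follow the figures quoted above, and the record layout, function names, and index structure are assumptions made for the example. at a uniform eight titles per frame about twenty-one fiche would hold the eleven thousand classics titles; the project's actual mix of seven and eight titles per frame accounts for the twenty-two fiche reported.

import math

TITLES_PER_FRAME = 8          # "seven or eight titles" per frame in the text
DATA_FRAMES_PER_FICHE = 68    # sixty-nine frames, the last one reserved as an index

def chunk(items, size):
    # split a list into consecutive pieces of at most `size` items
    return [items[i:i + size] for i in range(0, len(items), size)]

def lay_out_fiche(records):
    # records: (call_number, title) tuples already in shelflist order.
    # returns one dict per fiche: the nonreduced "eyeball" label (first call
    # number on the fiche), the data frames, and an index frame mapping each
    # frame number to the first call number it carries.
    fiche_set = []
    for data_frames in chunk(chunk(records, TITLES_PER_FRAME), DATA_FRAMES_PER_FICHE):
        index_frame = {n + 1: frame[0][0] for n, frame in enumerate(data_frames)}
        fiche_set.append({
            "eyeball": data_frames[0][0][0],
            "frames": data_frames,
            "index": index_frame,
        })
    return fiche_set

# rough count of fiche needed for about 11,000 titles at these capacities
print(math.ceil(math.ceil(11000 / TITLES_PER_FRAME) / DATA_FRAMES_PER_FICHE))  # 21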
it was envisioned that the user might know the general classification number, search for it by the eyeball character, then consult the index in the last frame to locate the proper frame for a specific class. in this way the user could browse through the subject area. the chief advantage of com lay in the fact that the small envelope of microfiche and a portable reader were easy to check out of the library and carry home or to an office, where the user could browse through the library shelflist at a leisurely pace. since initial reaction was negative, a subject index was prepared to make the list more usable to undergraduate students. this index was made up of the appropriate entries which appeared in the library of congress classification schedules, with all entries consolidated into one alphabet.1 using this index to find an entry-for example, "caesar, c. julius"-the student would find two areas to search: dg261-267 and pa6235-6269. he would find these areas on the microfiche with the eyeball characters, then search the index frame to find the appropriate pages. the classics list with its index and instructions was packaged in neat, loose-leaf notebook form and, together with a portable reader, presented to classics faculty at two regional campuses. a set was also available in the library. the results were completely negative. reliance upon the cooperation of too small a number of cooperating teachers may have invalidated this part of the research, but the contrast in response to the similar printed list raised serious questions about user response to microfiche in an index or reference book situation.2 it had been anticipated that a population in the humanities or social sciences would have had more need than the science group for what was essentially a book list, since serial titles did not include holdings. the complete lack of interest from the faculty in the field of classics was an unexpected disappointment, but no firm conclusions could be drawn without a research strategy designed to remove any possible variables.

conclusion

increased use of marc cataloging through such systems as oclc and ballots will mean many more libraries will have their total holdings in machine-readable form, with the capability of using their records in new ways. programs for distributing microfiche copies of library catalogs, such as georgia tech's lends program, provide inspiration for public service librarians to make use of the data and technology that technical services automation projects are supplying.3 this experiment in manipulating machine-readable library records for use in subject searching was an attempt toward better retrieval of a library's collection and indicated that such programs would be useful to extend service outside a single library location.

references

1. it may soon be possible to do this in a much simpler fashion by using the combined indexes to the library of congress classification schedules (washington, d.c.: u.s. historical documents institute, 1974).
2. doris bolef, "computer-output microfilm," special libraries 65:169-75 (april 1974). in describing the use of com at the washington university school of medicine, bolef said, "there is, however, an additional disadvantage, namely, the resistance of users to the use of microforms because of their inconvenience. patrons will sometimes choose not to read a publication when told it is available in some sort of microform only.
it is assumed that librarians are not quite as reluctant, but it would be a mistake not to take this reluctance into consideration. this resistance by both librarians and patrons is stronger than is usually reported by com manufacturers and service bureaus" (p.170-71).
3. the georgia tech library's complete card catalog is now available in microfiche form, brochure (atlanta: price gilbert memorial library, georgia institute of technology, 1972).

technical communications

isad announcements

please note a change of address for the editor of technical communications: send all future news releases, technical communications, and announcements to don l. bosseau, director of libraries, emory university, atlanta, ga 30322.

technological inroads

artificial intelligence

transistors and other circuit elements of the new generation of computers are so tiny and fitted so closely together that it becomes feasible to combine thinking circuits and memory units on a single chip. thus, one cell in the computer's memory bank can both remember and reason. this is a major step closer to artificial intelligence. in july 1971, the japanese government earmarked $100,000,000 for an eight year study of artificial intelligence. japanese industry accepts the conclusion that it could be increasingly dependent on "intelligent" computers. a technical report dated february 1971 reads: "the development of these tiny chips presages a time when the electronic brain will rival the human brain in complexity and memory. the identity of the fully educated computer may become blurred with that of its programmer-teacher! it may exhibit esthetic and artistic judgments of an interesting degree of subtlety. responses akin to feeling and emotion need not be excluded from its training if they may enhance its performance." along with artificial intelligence will come electronic voice recognition. voice recognition by the computer-in other words, a computer that will respond to oral command-is making significant engineering progress. rca reports that its voice command machine responds to twenty-eight of the basic sounds in the english language. (extracted from advertising age, march 19, 1973.)

cbs laboratories invents way to produce microfilm pictures by laser

dr. william e. glenn, jr., director of research at cbs laboratories, a division of columbia broadcasting system, inc., has been granted a u.s. patent for an improved method of recording and reproducing information from microfilm. by means of a split-beam laser, pictorial or printed information is transferred to a metal master. this metal master disk is similar to the type used in the record industry and, from this disk, duplicates can be stamped at low cost. "the market potential," dr. glenn stated, "will not only include the cassette and film industry, but it can be an asset to libraries and government printing as well. this system," he further stated, "is designed for recording and reproducing picture information. it uses diffraction gratings that are modulated in accordance with the picture information. reproduction is effected by directing light through the medium. the zero-order diffracted light is modulated in accordance with the picture information." the patent, assigned to columbia broadcasting system, inc., will offer reduced costs for recording on microfilm. it has potential for use in the motion picture film industry, libraries, and cassette recording.
cbs laboratories has made other outstanding advances in laser technology, which include the laser color film recorder, holography, and the holographic scanner.

microimagery-solution to the information explosion

tomorrow's busy businessman will have the information necessary to do his job right at his fingertips, due to the growing acceptance of microimagery as the solution to the information explosion. "in every area of business today, the need for information is increasing faster than any individual can keep up," says walter steel, bell & howell's vice-president of microimagery marketing. "university courses are now teaching kids to be generalists and how to find the information on what they need to know. they're learning that the vehicle to the access of information sometimes is more important than the knowledge," steel says. the seventies will be known as the decade of microfilm, just like the sixties for the copier and the fifties for the computer, according to steel. microfilm is halfway between the computer and the copier as a support to business, because it includes copies and peripherals to the computer. soon the copier will become peripheral to microfilm, steel states. steel calls microimagery "the immediate communication tool." it is the new medium that fits the new world of business. soon companies will be saying to their customers, "we'll send you our computer once a week." technical journals will simply send their subscribers a paper newsletter that hits the high spots, along with a deck of microfiche and a new index, plus a retrospective new index each month, steel forecasts. "microfilm won't ever totally replace paper," says steel, "but it will replace file cabinets and storage areas, plus it will simplify the filing system in any size office." steel says that the potential for microfilm is greatest in the business records market. the bank market was the base for the microfilm business, but it is no longer predominant, according to steel. "the basic unique value of microimagery is that it saves money. our goal at bell & howell is to be able to provide a complete microfilm system for the small office market for under $1,000. that would include a camera, microfilm processor and viewer," he stated. in light of increasing postage costs, many publishers are actively investigating microimagery. ten pounds of printed matter are reduced in microforms to an ounce or less. with the development of microfiche having a 50 to 1 ratio (i.e., 510 images on a 4 x 6 inch fiche), 90 percent of the books published could be available on a single microfiche each. the book of the month club could become the fiche of the month club. in every profession there is new technology that the successful manager must have access to in order to continue his success. microimagery can put that knowledge at his fingertips.

reports-regional projects and activities

new uc library automation office established

berkeley-coordination of multicampus automation projects serving the university of california's libraries has been placed in a central office under a director of the university-wide library automation program (ulap). jay l. cunningham, a project manager in uc's institute of library research, has been appointed to the director's position. library automation has been underway for several years at the university of california, which is considered one of the pioneers in this field.
each of the nine campus libraries has specialists for automation on its staff, and a central staff has also been working on such problems in the university-wide institute of library research (ilr). with growing emphasis on automation, coordination of the various campus projects becomes increasingly important to insure that applications are compatible. uc also maintains close contact with similar efforts at the california state university and colleges. coordination of a number of library functions by these two segments of public higher education may be greatly facilitated by automated procedures. in recent years, such coordinating tasks have fallen more and more on ilr, an organized research unit directed by a professor in berkeley's school of librarianship, charles p. bourne, who also served as acting ulap director for the past eighteen months. since the primary task of such units is research in support of the university's educational function, the responsibility for development and operation of university-wide automated procedures has been made into a full-time assignment, with cunningham taking over as ulap director from professor bourne. a close working relationship will be maintained between the two groups. among projects well under way are the following:

university of california union catalog supplement. the berkeley and ucla catalogs published in book form in 1963 have been recently supplemented by a forty-seven-volume set showing all monographs cataloged by all nine campuses during the five years 1963 through 1967. preparation and printing of the more than 750,000 titles was done by semiautomatic methods.

union list of serials. all serial publications, including book series and scholarly journals, to which uc libraries subscribe are entered in another list that is to be continually updated, a task greatly simplified by the computer. scholars and other library users will be able to determine immediately which uc campus subscribes to any serial and how complete its holdings are.

bibliographic center. in addition to housing the above two projects, this center helps in processing newly acquired books by printing catalog cards by computer at the ulap headquarters. cards can be ordered by a uc library in full sorted sets, including multiple sets if needed for branch libraries. the new system supplements the present method of ordering cards separately from outside vendors or producing them on each campus.

among projects envisaged for the future are automated circulation procedures, under which each borrower would be given a machine-readable card and the charge slip in the back of the book would likewise be machine-readable, such as a punched card. this method would speed the checking out of books and facilitate statistical studies. other projects include a clearinghouse that would indicate instantaneously whether a new book recommended for purchase has been already ordered by another campus, and the streamlining of library accounting procedures. cunningham is a graduate of cornell university and holds the master of library science degree from berkeley. before joining the ilr staff, he served as a library systems specialist at the library of congress, and as a u.s. air force officer for four years. the new director will report directly to vice-president-academic affairs angus e. taylor, a university-wide official.
committee undertakes implementation of program which will afford university-wide "direct access"

state university of new york students will soon benefit from more direct access to the 7.5 million books and 6.2 million slides, films, recordings, and other research materials contained in libraries on the university's thirty-four state campuses. that the university is moving to provide faculty and students with walk-in privileges at any of the libraries at the twenty-nine state-operated campuses and the five statutory colleges at alfred and cornell universities was announced recently by university chancellor ernest l. boyer. the proposed system, which has the endorsement of the faculty senate of the university, will greatly improve upon the university's current interlibrary loan program, under which books at cooperating libraries can be borrowed through the mails. working in cooperation with state university librarians, chancellor boyer has announced the formation of a committee of librarians and administrators to develop a timetable and procedures to implement the program. the committee will be chaired by willis bridegam, director of libraries at the university center at binghamton. the other members of the panel are dr. philip sirotkin, vice-president for academic affairs at the university center at albany; don cook of the university center at stony brook; mary cassata of the university center at buffalo; george cornell, college at brockport; and henry murphy of cornell university. in addition to developing a program timetable and procedures, the committee will also explore the future possibility of extending access privileges to the faculty and students at the thirty-eight locally sponsored community colleges. the expanded library access policy is seen as an essential step in the university's efforts to use its library resources more effectively, particularly since the cost of acquiring books and periodicals has grown at an extraordinary rate in recent years. some publications costs have increased at the rate of 15 percent per year. state university of new york is the first major multicampus system to introduce such a reciprocal program on so wide a scale, although the library system of the state university of illinois has a similar policy, limited to faculty and graduate students. the growing use of modern computer and data processing techniques is another cost control program the university has implemented in administration of its libraries. shared cataloging techniques and the compilation of lists of university-wide locations will be developed to enable library users expeditiously to locate books and reference tools. the policy will be particularly beneficial to students of the university's empire state college, since they are not campus-based and must rely heavily on library collections near their homes or places of employment. the policy will also make it much more convenient for students and faculty to conduct research and complete reference assignments in other parts of the state during vacation and intersession periods. collectively, the libraries at the university's state campuses comprise one of the greatest collections of titles and reference materials in the world.
holdings for the 197172 academic year included 7,551,333 volumes, 237,428 microfilms, another 5,115,584 units in other forms of microtext, 20,587 slides, 71,007 recordings, 86,662 maps, 90,694 periodical titles, 29,334 additional serial titles, and 541,007 printed government documents-for a grand total of 13,743,636 entries. potpourri u.s. experts study soviet science information system and services eight united states information specialists from government, universities, professional societies, and private industry participated in the first u.s.-u.s.s.r. symposium on scientific and technical information, organized under the u.s.-u.s.s.r. agreement on cooperation in the fields of science and technology, in moscow, june 18-19. the group led by dr. lee g . burchinal, head, office of science information service, national science f oundation, also spent ten days in the soviet union visiting key information organizations in moscow, novosibirsk, yerevan, and kiev. the purpose of the symposium and subsequent site visits was to give the u.s. group an opportunity to learn more about the soviet system for providing science and industry with needed scientific and technological information, and to explore feasible areas for possible future cooperation. in addition to dr. burchinal, members of the group included william t. knox, director, national technical information service, department of commerce; melvin s. day, deputy director, national library of medicine; dale b. baker, director, chemical abstracts service; scott adams, science communications division, the george washington university; dr. vladimir slamecka, director, school of information and computer science, georgia institute of technology; bart holm, manager, systems development section, information services division, e. i. dupont de nemours & co.; and jerome luntz, senior vice-president, mcgraw-hill publications co. the group was hosted by engineer n. b. arutiunov, director of the information directorate, state committee for science and technology (scst), council of ministers of the u.s.s.r. the symposium featured four presentations by soviet specialists on the following topics: 1. state scientifl.c and technical information system of the u.s.s.r. (dr. 0. v. kedrovskiy) 2. viniti's integrated information system for the u.s.s.r. (dr. a. i. chernyy) 3. specialized system of scientific and technical information services in instrument making (dr. v. a. rukhadze and dr. v. m. baikovsky) 4. psychological aspects in charting the pathways of scientific and technical information development (prof. dr. g. t. artamonov) on june 20-23, the u.s. group visited the all-union institute for scientific and technical information (viniti), the allunion scientific and technical information center (vntitsentr), the all-union research institute of medical and medicotechnical information (vniimi) and the state public library of the u.s.s.r. for science and technology (gpntb-sssr). on june 24-29, the u.s. group visited the siberian branch of the u.s.s.r. academy of sciences and the novosibirsk center of scientific and technical information, the armenian research institute of scientific and technical information and technico-economic studies ( armniinti), the ukrainian research institute of scientific and te chnical information and technico-economic studies (ukrniinti), and the institute of cybernetics of the ukrainian academy of sciences. although about .five years behind the u.s. in applications of technology, especially computer and microform systems, dr. 
burchinal said, the soviets have established a strong base for rapid future growth. reflecting their style of centralized, national planning, the soviets are well advanced toward development of an integrated national information system embracing both science and technology. the major components of the emerging technical communications 121 integrated national system are { 1) centralized policy, planning review, and methodological guidance provided by the state committee for science and technology ( scst) ; ( 2) concentration of national backup resources in all-union (national) institutions; ( 3) eighty-two ''branch" information networks established by the industrial ministries; ( 4) development by the fifteen republic and regional information institutes of "interbranch" or interdisciplinary dissemination services to serve local industries and planning bodies. a major feature of this national information system is emphasis on the active dissemination ("propaganda") of information about technological innovations throughout the soviet economy. the u.s. group, d. burchinal said, was particularly struck by the importance attached to information services by the highest levels of scientific and technological management in the u.s.s.r. and in the constituent republics. their commitment is reflected in the resources being assigned to development of improved information services. four new buildings are being constructed in moscow alone for all-union scientific and technological information services; staffs are being expanded; third-generation computer systems will be installed at numerous sites beginning in early 197 4; and new buildings are underway or were recently completed for n early a dozen republic and interbranch services. in short, the soviets know where they want to go, and they are devoting considerable resources to achieve their national objectives. the second half of the symposium begun in moscow was held in washington on october 1-2. at that time u.s. and u.s.s.r. representatives sought agreement on areas of continued cooperation which will be reported to the joint u.s.-u.s.s.r. commission on cooperation in science and technology when it meets in moscow. a report of the june visit by the u.s. team to the u.s.s.r. will be available through the national technical information service. 122 journal of library automation vol. 6/2 june 1973 pertinent publications isad cable tv information packet now available from the american library association's information science and automation division is a thirteen-piece packet of materials on cable television. included in this information kit of articles, bibliographies, policy statements and suggestions are the following: • annotated bibliography on cable television for librarians, brigitte l. kenney and susan bunting • catv: visual library service, brigitte l. kenney and frank w. norwood • cable television-a bibliographic review, james schoenung • cable television: state-of-the-art and franchise recommendations, advisory memorandum by nowell leitzke • a glossary of terms for cable television and other broadband communications, merry sue smaller • guidelines for planning a cable te1evision franchise, sidney dean, jr. • letter to joe fischer, jr., from c. lamar wallis, director of libraries, memphis public library and information center • metropolitan library service agency (melsa) position paper on cable television, jon shafer • planning for urban telecommunications, kas kalba • public-cable, inc. 
statement • a report on cable communications and the district of columbia public library, lawrence e. molumby • san francisco public library video center policy statement • video/ cable activities in libraries, brigitte l. kenney and susan bunting packets are available for $2.50 each. send order to: cable tv packet, donald p. hammer, !sad, american library association, 50 e. huron st., chicago, il 60611. please make checks payable to the american library association. lib-mocs-kmc364-20140106084141 221 book reviews introduction to information science, tefko saracevic, ed. new york: bowker 1970, 776 pp. $25.00 the editor has put together a large volume consisting of 776, 8~ x 11 pages and weighing almost 5 pounds. it comprises 66 different articles written by almost as many authors and covers the period from 1953 to 1970. two-thirds of the articles were written during the period 1966-1969. in short, it is a collection of a large number of papers mostly from the last few years having to do in some way with information science or more properly, with information systems. the papers generally are good ones and in some cases have already become acknowledged classics. in a few cases i am a bit puzzled about their inclusion in a volume of this type. in the few months since i have had this book i have already found numerous occasions to consult several of the articles. some of the other papers which i have not seen recently i have enjoyed reading again. the book is divided into four parts, which are further subdivided into thirteen chapters. the four parts are basic phenomena, information systems, evaluation of information systems, and a unifying theory. although the chapter headings are too numerous to list they include such topics as notions of information, communication processes, behavior of information users, concept of relevance, testing, as well as economics and growth. by virtue of the parts, chapters, and articles the editor has provided a type of classificalion system or structure for information science without attempting to define information science. interspersed between each of the parts and chapters is up to a page of introductory and explanatory material provided by the editor. in a volume of this type it is important to recognize what the volume is and what it is not. as i have mentioned, it is a good anthology of important articles related to information. it is not, as the title implies, an introduction to information science. the papers are by and large unrelated to each other and the introductory comments by the editor do little to provide a unifying relationship. furthermore, the overall scope of the articles is generally quite limited and, although the editor implies it is not so, tends to equate information science to information systems. the final paper in the volume by professor william goffmann is listed by the editor as part four-a unifying theory. the precise title of the chapter is somewhat less ambitious: namely, "a general theory of communication." the paper is an unpublished one (although similar papers by the author have been published elsewhere) and relates communication in a general sense to the theory of epidemic processes. although the theory is an 222 journal of library automation vol. 4/4 december, 1971 interesting one, it would hardly qualify as a unifying theory for information science. it certainly does not provide the unifying relationships among the various articles included in the text. 
my guess would be that .other qualified individuals, in putting together a similar volume, would have included many different articles. this, however, is the nature of the field at this time. by comparison note the recently published volume key papers in inf01·mation science, arthur w. elias editor. this book, although admittedly serving a somewhat different purpose, contains 19 papers with only a single paper in common with those of this particular volume. in summary, this is a good collection of relevant and useful articles in information science. it is probably desirable that they be included in a single volume. serious students, educators, and research workers will find this volume to be of interest. as a reference book it will be quite useful. the book is not, however, an introduction to information science. the novice, the student, and the casual reader will probably be disappointed and confused, and in some cases might even be misled. marshall c. yovits information processing letters. north-holland publishing company, amsterdam. vol. 1, no. 1, 1971. english. bi-monthly. $25.00. this journal is published by a most reputable company and has a most impressive international list of editors and references. the affiliation of editors illuminates the orientation of the journal: six of them are from departments of mathematics, computer science or cybernetics and two are from ibm laboratories. understandably, the journal is devoted basically to computer theory, software and applications, with a heavy accent on mathematically expressed theory related to the solution of computing problems, algorithms, etc. it is directed toward basic and applied scientists and not toward practitioners. people interested in library automation may, from time to time, find in it theoretical articles broadly related to their work, but they will have to do the "translating" themselves. this journal follows the tradition of "letters" journals in physics, biology and some other disciplines. the papers are short; publication is rapid; work reported generally tends to be very specific, preliminary to or a part of some larger research project; usually small items of knowledge are reported. the "letters" journals are received in the fields where they appear with mixed emotions. for instance, ziman (nature 224:318-324, 1969) questions very much the need for these publications. on the other hand, they are a useful outlet for authors who otherwise would not publish these often useful bits of specific knowledge. recommended for research libraries related to computer science. t efko saracevic book reviews 223 handbook of data processing for libraries. by robert m. hayes and joseph becker. new york: becker & hayes, inc., 1970. 885 pages. $19.95. to write a universal handbook in a field so full of complex intellectual pro.blems and simultaneously satisfy every potential reader is an impossible assignment. therefore the authors cannot be faulted for failing to satisfy everyone. they have succeeded in writing for a very important audience -administrators and decision makers. for this group, they have presented difficult technical material in a clear readable fashion-a reflection of ' their extensive teaching experience. for many library administrators, this handbook arrives five years too late. had it been available earlier, a large number of current automation projects might never have been authorized by management, or at least might have been conducted on a sounder basis. 
following a very conservative approach, the authors generally remain within the limitations of the current state of the art, being careful to distinguish that which is feasible (i.e., practical ) from that which is possible. over and over again, they warn librarians about the limitations of computers and caution against excessively high expectations. for administrators, the most useful material is in chapter 3, "scientific management of libraries," and in chapter 8, "system implementation." a reading of chapter 8 alone suffices to convey to the administrator the magnitude and complexity of even the most seemingly routine computer application in libraries. this chapter, the most important and useful in the entire book, covers planning, organization, staffing, hardware, site preparation, programming, data conversion, phase-over, staff orientation, and training. each of these topics-deserving of complete chapters in themselves-is treated briefly, but in enough detail to communicate the complexity of each component in the long stream of system development activities, all of which must be completed to the last detail for success. there are three useful appendices: a glossary, an inventory of machine readable data bases, and a list of 115 sources for keeping up to date. bibliographic footnotes abound and each chapter ends with a list of suggested readings. however, it is surprising how many references are five or more years old; in fact, there is a scarcity of current references. for example, ballou's well-known guide to microreproduction equipment, now entering its 5th edition, is cited in the first edition of 1959. the authors have been badly served by their proofreaders. the book is marred by an incredible number of spelling errors in text, tables, footnotes and references, especially with personal names, plus incomplete citations. the index contains many entries too broad to be useful, such as: utilization of computer ( 1 entry ), time sharing ( 1 entry), hardware ( 3 entries), technical services ( 3 entries). lacking from the index are name references to distinguished contributors to the literature, such as avram, cuadra, degennaro, fasana, and others. many of these names appear only in footnotes. 224 journal of library automation vol. 4/4 december, 1971 the book is rich in tabulated data and specifications for a variety of equipment. unfortunately, much of this equipment is inapplicable to library use, or the tabulated data is in error. table 12.25lists several defunct or never marketed equipments, such as ibm's walnut and eastman kodak's minicard, without indication of non-availability. in table 11.22 there are extensive listings of crt terminals, most of which are unsuitable for library applications by reason of deficient character sets or excessive rentals. nine of the units listed showed rentals of over $1,000 per month, and two of these were virtually at $5,000 per month, clearly beyond the reach of any library. table 12.2 suggests the access time to one of 10,000 pages in microfiche is half a second, a figure that is off by an order of magnitude for mechanical equipment and by two orders of magnitude for manual systems. (more nearly correct figures are given in the text on page 396). table 12.21 lists several microfilm cameras designed expressly for non-library applications and not adaptable to any library purpose. from a broader perspective, one misses several other features. is a "handbook" for the practitioner? if so, this volume is too elementary. 
can it be used as a textbook in a course in library automation and information science? the book contains no problems for students to attack, and except for references, no aids to the instructor. possibly it can serve as supplementary reading, for it contains far too much tutorial material (yet only ten pages of nearly 900 are devoted to flow charting). one wishes for more specifics drawn from the real world. a hypothetical case study in chapter 11 is illustrative: a 5% error rate is assumed for input of a 300,000 record bibliographic data base to be converted to machine readable form. not revealed in the example is that a relatively low error rate in keyboarding may result in a very high percentage of records which must be reprocessed to achieve a high quality data base. each reprocessed record will consume computer resources: cpu time, core, disc i/0, tape reading and writing, etc. we know from marc and recon that the ratio of the total records processed to net yield is on the order of 3:2; i.e., each record must be processed on the average of one and a half times to get a "clean" record. the cost of this reprocessing is far beyond the 5% lost by faulty keyboarding. the handbook will be a useful decision making tool for the generalist, a less helpful aid to the practitioner. it is hoped that a revised edition is in preparation, and particularly that the tabular material will be corrected and brought up to date. chapter 8, the heart of the book, should be greatly expanded. for the next edition, some consideration might be given to a two-volume work: the first volume for the administrator, and the second containing much more technical detail for the practitioner. if the two volume pattern is followed, a loose-leaf format with regular updating would be most helpful for the second half. allen b. v eaner book reviews 225 l~brary automation: experience, methodology, and technology of the lzbrary as an information system, by edward w. heiliger. new york: mcgraw-hill book co., 1971. xii, 333 pp. the need for a handbook and/or general introductory text on the topics of automation and systems analysis in libraries has been sorely felt for quite som.e time. during the past year, three have appeared (chapman and st. pierre, library systems analysis guidelines, wiley, 1970; hayes and becker, handbook of data processing for libraries, wiley, 1970 and the book here reviewed.) unfortunately, none is completely satisfactory, for different reasons. a serious student wanting a reasonably comprehensive, systematic, and balanced treatment of these subjects will, i'm afraid, be forced to have to use all three of these titles and, even then, will have constant need to use supplementary materials for a number of aspects. the title being considered in this review by heiliger and henderson, if one judged only by the authors' intent as expressed in the preface, would be exactly the kind of work that we've all felt the need for. as they state on page vii, the purpose "is to provide a perspective of the library functions that have been or might be mechanized or automated, an outline of the methodology of the systems approach, an overview of the technology available to the library, and a projection of the prospects for library automation." and, indeed, if one looks at the table of contents there are four parts that closely parallel this statement of purpose. 
the parts themselves though, when inspected more closely, reveal not a systematic treatise or even an in-depth treatment of these topics, but rather a loosely connected series of essays, each on a fairly superficial level, discoursing on a variety of aspects associated with, or tangental to these topics. this indicates, at least to this reviewer, that the genesis of the book was a series of lectures presented and refined over a period of time by the authors. although not in itself a bad thing, here it is unfortunate to some degree because not enough effort was expended in amplifying the material with additional data, library-oriented examples, and illustrations, nor in logically integrating the various parts. part i, entitled "experience in library automation," begins by broadly citing a number of library automation projects mostly dating from the early 60's. the level is extremely superficial and the presentation not very enlightening, since only three or four projects are mentioned, and then only in passing. immediately following are several excellent chapters describing traditional library activities (e.g., acquisition, cataloging, reference,. etc. ) in functional terms. the approach, though extremely simple, is for the most part effective and is only marred by occasional, overly condescending statements such as "library filing is a very complicated matter" or "reference librarians use serials literature extensively." unfortunately, in the 104 pages of this section there is not one illustration. 226 journal of library automation vol. 4/4 december, 1971 part ii, "methodology of library automation," attempts to describe the general approach and techniques of systems analysis. in a number of ways, this is the best part of the book. unfortunately, the concepts that are so simply and succinctly described are only indifferently related to activities that will be familiar to librarians. as a brief essay on the objectives and concepts of systems analysis, it is quite adequate, but as a discussion of how they relate to library problems, it is totally inadequate and often misleading. part iii, "technology for library automation," is probably the least informative part of the book, giving the reader virtually no practical information. all of the important and obvious technological concepts are listed, but are dismissed with what oftentimes is little more than a brief definition. the one exception to this is chapter 13, entitled for no apparent reason, "concepts." this chapter is in fact an innovative and thoughtprovoking view of a library as a data-handling system. one wishes that this chapter had been amplified and treated more fully. part iv, "prospects for library automation," is the least effective part of the book, having in my mind only one merit: it doesn't tack on a hollywood-style happy ending. the authors' view of the 70's, as far as can be inferred from this too short section, is cautious and mundane. these will be, i'm convinced, the overriding characteristics of automation efforts for the next several years. i only wish that the authors had elaborated more fully on these points and presented their views more coherently. the book is augmented with a 61-page bibliography ( 1,029 citations), which, though reasonably current, is of dubious worth because it is neither annotated nor particularly well balanced. 
certain classics, such as bourne's methods of information handling, or information storage and retrieval by becker and hayes, and certain current, basic items, such as cuadra's annual review of infonnation science and technology and the journal of library automation, are not listed. each chapter is accompanied by a "suggested reading list" wherein materials more or less pertinent to the subject of the chapter are listed. a glossary of terms in three parts (a total of 36 pages) is also included and, though difficult to use because it is in three alphabets and interspersed with the text, provides short but very adequate definitions. unfortunately, several jargon terms used in the text itself are not included; one that was most irritating to this reviewer is the term "gigabyte" which to my knowledge has very little currency among the cognoscenti. on balance, library automation is a title that should be recommended for a wide range of readers. though it will probably have little to offer experts in the field, it does have value as a text for library students or a general introduction for the average, non-technical librarian. paul ]. fasana book reviews 227 sistema colombiano de informacion cientifica y t echnica (sicoldic). a colombian network for scientific information, by joseph becker et al. quirama, colombia: may-june 1970. 59 p. mimeo. the task of the study team which produced this report was to present "an implementation plan for strengthening the scientific communication process in colombia by providing a permanent systematic mechanism to function in the context of colombia's internal needs for scientific and technical information in government, industry, and among the research activities in higher education." more specifically, the expressed goal of such a mechanism is "to develop a network which will permit any scientific or technical researcher, in government, industry, or university, to access the total information resources of the country without regard for his own physical location." the study was comple ted in two months (according to the cover dates) and comprised four areas of investigation, namely: 1) to elucidate the advantages of d eveloping a centrally administered national network including three levels of network nodes and a technical communications plan; 2) make an inventory of universities, institutes, telecommunications and computer facilities in colombia; 3) recommend a mix of these factors to produce specific services, and 4) propose a seven-year budget. the republic of colombia is about the size of texas and california combined, and its population is about 1 million less than new york state. most scientific and technical workers are located in five major cities, and the country is divided into six administrative zones. within these zones twenty universities and forty-four institutes were inventoried by the study team with respect to specialization, faculty, book collections and the like. from these universities and institutes, five primary and seven secondary nodes were named to be connected by means of a telex communications system. the telex connections are not to be computer-mediated in the forseeable future, but used for interlibrary loan and other messages. ( there were two teleprocessing systems operating in colombia at the time of the study.) 
basic recommendations are: that a governmental unit be established with responsibility for directing the development of sicoldic; that this unit, with a high echelon board of directors, should produce several directories, bibliographies and union lists, and publish a monthly catalog of government-sponsored scientific and technical research. in addition, a manual for use of the telecommunications system should be produced. the proposed budget is about $250,000. ( 4.5 million pesos ) for the first year, graduating to a 25-fold increase by 1976. in some aspects the sicoldic plan follows the pattern of some state library development plans being implemented in the u.s. the advantage of central control of information resources planning and fund control b y the sicoldic group, with fairly direct access to high governmental 228 journal of library automation vol. 4/4 december, 1971 authority, provides reasonable insurance for support of the plan, especially since these services contribute to the economic and scientific advance of colombia. there is no indication of the acceptance of the plan by colciencas, the governmental unit which commissioned it. of the sixty references in the bibliography, spanish publications predominate. ronald miller cooperation between types of libraries 1940-1968: an annotated bibliography, by ralph h. stenstrom. chicago: american library association, 1970. this bibliography is an effort to sift, organize and describe the literature of library cooperation produced during the period 1940-1968. two criteria governed the selection of the 348 books and monographs listed: 1) they must deal with cooperative programs involving more than one type of library, and 2) they must describe programs in actual operation or likely to be implemented. although most of the cooperative projects described are located in the united states, other countries are represented when the material about them is written in english. cooperative programs in the audio-visual field are included. the annotations explain the nature of the cooperative projects and give the names of participating libraries. an appendix describes briefly about 35 recent cooperative ventures not yet reported in the literature, which the editor learned about through an appeal published in professional journals. entries are arranged chronologically to facilitate direct access to the most recent developments and to permit tracing the evolution of a particular project over a period of time. three indexes provide approaches to the material by 1) name of author, cooperative project or library organization, 2) type of c ~ .... c ~ .... c;· ~ < ~ c.o ......... ~ t) (!) @ o(!) v'"l ..... :s 0 marc based sdi service j bierman and blue 311 low and high values are entered into the system and put directly into the lc table for searching of the marc tape. table 3 presents some lc classification numbers as they might be keypunched and entered into the system and a brief explanation of what records will be pulled as hits (matches). lc table entries are in the form of aann,nn where aa stands for the two possible initial letters and nnnn stands for the four numbers following the initial letter( s) and immediately preceding the first decimal point or next alphabetic character. zero is the lowest value and z is the highest. mter all classification number cards have been converted to table entries, the marc tape is read, the lc and dewey numbers are pulled from each record, and both tables are searched for hits (matches). 
the dewey classification number from the marc record is read and converted into a fixed-length 10-position numeric field. for example, the classification number 020/.6234/5456 from the marc tape would be converted to 0206234545 and the number 025.3/02 would be converted to 0253020000 before dewey table searching. if a classification number card had been 020-029 (see table 2), both of these records would have been a hit. the lc classification number read from the marc record is first converted to the form aannnn and then searched against the lc table. for example, the classification number z665.h45 from the marc tape would be converted to z00665 and z678.3.k39 would be converted to z00678 and then searched against the lc table. if the last entry in table 3 had been input into the system, these records would both be hits, as their lc numbers lie between zooool and zolooo. if a match is found in either table, the marc record is transferred in the original marc format to the output tape with the list code. mter odl-07 is completed, control passes to odl-07x. odl-fp7x program inputs are the header tape from the previous run and the detail tape containing the selected records from the previous ( odl-07) run. outputs are the sdi listings by subject areas (list code). figure 3 is a detail flow chart for odl-07x. the first record is read from the header tape and the detail tape is then searched for matching list codes. when a match is found, the marc record is formatted and printed. when the entire tape has been searched, the next header is read, the detail tape is rewound and the process is repeated. this continues · until all header and detail records have been matched and printed. the result is a series of sdi lists, each in lc card number sequence. see figure 4 for a sample of two printed records from a library science list. presently, the weekly lists are being printed on two-up, three-part, perforated teletype size (8~" x 5~") paper, one record ( sdi notice) to each separable form. 312 journal of library automation vol. 3/4 december, 1970 hskp at end at end yes construct and print record no fig. 3. odl-07x detail flow chart. discussion the· sdi system was written with flexibility as one of the main considerations. dewey classification number cards in ahnost any format can be machine converted to the intended table entry. both ranges and individual classification numbers are allowed. any number of dewey and lc entries and any number of lists can be handled simultaneously, the only limit being core size. the selection tables, not being built into the programs, can be changed at any time, weekly if desired. the print format generally follows traditional catalog card arrangement, the major difference being that each subject heading and added entry appears on a new line and is not numbered. the print program can be easily adapted to any conversion table desired; delimiters, field terminators, etc. are referred to symbolically. there is an optional feature which allows marc based sdi servicefbierman and blue 313 09/03170 ll erary science stevens, ~ary elizabeth• autc~atic indexing, a state•cf•the•art report. reissued with additions a.no ccrrecticns. washington, u.s. national bureau of standards, for sale by the supt. of docs., u.s. ggvt. print. off., 1970. vit 290 p. 26 cm. 2.25 (national bureau of standards ~onograph 91) •a united . states department of co~~erce publication.• includes eiblicgraphies. automatic indexing. t.s. national bureau cf standards. monograph 91 cc1co.u556 no. 
91, l97c 029.5 73•t07239 marc • oklahoma oklahoma d.,artm£nt of lo1uoon sdj usu inr<>rmatoot< s••voco 09/03170 library sci ehce librarianship and literature, essays in honour of jack paffcrc. ecitec by a. t. milne. london, athlone p., 1970. viii ., ·141 p., 4 plates. illus., port. 23 cm . 40/incluoes eibliographical references. £r. j. ~. p. paffcrot by a. t. mjlne.--1. the british museum in recent times, by sir f. fra~cis.--2. the education cf a librarian, by r. irwin.••3. library cd-operation in great britain, by s. p. l. filcn.••4. the development of british university libraries, by j. w. scott.--5. problems of a special library, by r. tho~as.••6. t~e growth of literary societies, by a. brown.••7. the editor and the literary textt requi~e~ents ano opportunities, by ~. f. brooks.••b. some leaves frcm a thirteen•centurv illuminated manvscript in the university of london library, ay f. wormal0.••9. a bibliography of j. h. p. paffort, by j. harries and r. wo pound. library science••acdresse·s, essays, lectures. pafford, john henry pyle. ~ilne, ~lex~niler taylor, eo. pafford, john ~enry pyle. z6~5.l57 c20/.9~2 10~477193 ~85111179 marc • oklahoma . oklahoma o .. artm£nt 01' liuaiifs sol u ... infoiimation suvicf fig. 4. sample sdi notices. 314 journal of library automation vol. 3/4 december, 1970 any character or characters to be deleted and the resulting gap closed; this is desirable for diacriticals until better techniques for handling them are devised. both line and page length are referred to symbolically and can be easily changed to fit any form desired. line spacing and indentation are built into the present program, but even these can be changed. the major disadvantage of the sdi system as it now exists is that it allows selection by classification numbers only. unlike the marc i experimental sdi system at indiana university (16), which allowed for selection by weighted terms (both classification number and subject heading), this system allows for classification number selection only. programming difficulties, expense, and the necessity for additional processing time inhibit searching on subject headings. for selection of detailed subjects, subject heading searching is essential; however, for making subject searches in subject areas classification number searching seems more expedient, as it would be difficult to determine, and expensive to input, all of the subject headings for the field of law, for example. ideally, a marc-based sdi system would be able to provide selection based on classification numbers and/or subject descriptors. computer, language and cost the computer for which the programs were written was an ibm 360/30, 32k core size, one card read/punch, four tape drives, two disk drives and one printer. the programs have also been successfully run on an ibm 360/25 with one card read/punch, two tape drives and one printer. in the latter case, the first program was modified slightly because only two tape drives were available, whereas the sdi system normally requires three. modification was easily accomplished by having the header records punched rather than written in odl-07. the programs are written in cobol for the 360, operating under dos. very little modification would be required to operate under os. being written in cobol, the programs are easily adapted from one machine to another; they have been successfully run on a rca spectra, for example. 
they also are easily adapted and changed, the symbolic names and procedure division paragraph headings having been carefully selected to build in as much documentation as possible. following is a breakdown of the charges to the department of libraries for programming and machine time for development; department of libraries' staff time, overhead costs, and operating costs are not included. programming and debugging ------------------------$2,941.00 machine & operator costs for testing ___________ 452.00 operating costs are more difficult to determine and nearly impossible to evaluate meaningfully. the total amount of computer time required (and therefore the major cost) is primarily a function of the number of records on the marc tape being searched and the number of selected marc based sdi servicejbierman and blue 315 and printed records. if the marc tape contains 1,200 records, it takes about twelve minutes (clock time) of computer time (ibm 360/30, 32k) to select the desired records ( odl-07). as the total of classification numbers being searched increases (that is, as the dewey and lc tables grow), the computer time for selection does not appear to increase significantly. the print program ( odl-07x) is directly a function of the number of lists being produced (the number of times the detail tape must be rewound and re-read) and the total number of records being printed. as an example, ' if six different lists are being produced and a total of 375 records are being printed out, the computer time is 25 minutes. therefore, producing six weekly lists with an average of 62 records for each list takes approximately 37 minutes (clock time) each week. at the rate of $60.00 an hour, this is $37.00, or approximately 10c per record selected and sdi notice printed. table 4 presents a detailed analysis of five weekly runs. the total computer time is the number of minutes which were charged to the department of libraries by the computer center. since the department is charged one dollar per minute, this is also the dollar cost to the department for computer and operator costs for that weekly run. unfortunately, the total time given includes time for set-up and other factors. therefore, meaningful patterns are difficult to discern, as one week it may take several minutes longer to get the forms inserted and lined up in the printer, forms may break another week, etc. the remainder of table 4 is exactly accurate. it is interesting to note how much variance there is from week to week in the number of sdi notices for each subject list. for example, out of 889 marc records on the marc tape run on july 23, 16 were library science titles. however, the marc tape run on august 6 contained 1,201 records but only 12 were library science titles. in addition, notice that the library science list was reprinted seven times, and for the last two weeks reprinted five times, to get the total number of copies needed for the 25 subscribers to the list. current uses the uses to which the system is presently being put are in three general areas: 1) sdi lists for internal use of the department, 2) sdi lists for state government, and 3) sdi lists for other libraries. the department currently produces subject lists primarily for its own use in the areas of law and political science. since the department maintains specialty collections in these two subject areas, it is anxious to obtain the most current information on materials published in them for selection purposes. 
because the marc record comes out before the corresponding proof slip is distributed ( 17), use of the marc file has been a most successful means of obtaining complete and verified bibliographic information for the purpose of ordering new books. in addition, complete lc cataloging information is available should the proof not have arrived at the time the book is received. because the lists are currently being printed on three2'able 4. sample run times and list lengths. r"' cl' -8 a .. " ~ n 3 '" ~ .. " 1 1 1 1 75 92 118 22 /:j yl. lis ll. 1 1 1 1 7 1 r~ 100 ?n 71 83 100 20 1 1 1 1 60 65 73 21 60 65 73 21 1 1 1 1 61 80 89 29 61 80 89 29 1 1 1 1 80 113 106 31 80 113 106 31 c )> c .. ~ .. "!?"!?-0 "' 2. 2. (') ~ ~ .. g; .. !1. a (;' a 1 1 -15 41 -i:j 41 -1 1 -r u -8 44 -1 i 1 5 34 16 5 34 16 1 1 1 11 38 17 11 38 17 1 1 1 11 42 22 11 42 22 number of print runs number of marc records selected number of sol notices printed number of print runs nun>t>e_r of marc records selected number of sol notices printed number of pri nt runs number of marc records selected number of sol noli ces printed number of print runs number of marc records se ected number of 501 notices printed number of pri nt runs number of marc records selected number of sol not ices printed c.;, ...... :: "'! ;i ~ .q.. ~ ~ ~ "'i ~ > >:: cs .... ;::s a ~· ~ ~ c.;, .......... ~ t::1 ('t) (') ('t) s 0" j~ ...... ~ marc based sdi servicejbierman and blue 317 part teletype paper, one record per sheet, it is easy to separate the record to be ordered and send one copy to acquisitions, retaining one copy for the files, and sending one to the interested individual in state government with a note that the book is on order. the department also produces a special list of many different subjects which are of interest to the legislature for the legislative reference division of the governmental services branch. the legislative reference division can then order particularly useful materials quickly and route a copy of the sdi printout to the interested legislator or legislative committee. the department has prepared profiles of the state agencies having a large planning and research role. lists are prepared weekly for the department of education, department of corrections, department of vocational-technical education, department of welfare, industrial development commission, department of highways, and several small agencies, and are sent to the person responsible for planning and research within the department. he can then request books from the lists by returning one copy of the sdi notice to the department of libraries with a note to order, retaining the other copy for his files or routing it to a researcher particularly interested in the subject. certain lists are being produced and shared with libraries around the state. the law and political science lists are being sent to two law schools in oklahoma. the library science and bibliography lists are being sent to the library school and the two largest public library systems, as well as the two state universities. over 25 libraries outside oklahoma are receiving weekly library science, political science or law lists ( 18). a cooperative acquisitions program is evolving whereby certain libraries agree to specialize in certain subject areas so that every subject area would be covered by one library for specialized materials not needed by all libraries. 
currently, the program involves the two major public libraries and the department of libraries wherein the state teletype network (otis) is used to transmit rapidly information on expensive materials for cooperative acquisitions. selected lists in the specialized subject areas can be produced each week for each of the cooperating libraries to aid them in their selection, acquisition and cataloging of the materials. the uses currently being made have excited the imagination of many people, both within and without the department of libraries. a great deal has been accomplished since the system became operational early in february 1970; however, the possibilities have barely been identified. as mentioned above, one can envision this being the foundation of a cooperative acquisitions program. such a system could form a node of library service to business and industry; currently, some thought is being given to producing weekly lists of materials in automation and computer science (systems analysis, etc.) both for the many state agencies which have automated equipment and for businesses and industries around the state which utilize computer technology. 318 journal of library automation vol. 3/4 december, 1970 conclusion marc is an exciting and potentially valuable innovative new tool available to the library community, useful to improve both its own internal operations and, more importantly, its service to others. nonetheless, before extensive meaningful use of marc will occur, its potential uses must be identified and explored. this article has attempted to give a picture of one such experimental project to improve library service for others within the framework of a particular institution's resources and functions. much more research is needed on potential and operating uses of marc and the results of this research need to be disseminated to the library community. in addition, it is the opinion of the authors that for reasons both of available financial resources and expertise much of the research and development with marc must be a cooperative venture among many different libraries. some work has been done with marc cooperatively throughout the country (nelinet (19), oclc (20), clsd (21), for example) but much more needs to be done. the future of meaningful uses of marc is bright; however, much research and development is yet to be done which can best be done as a cooperative effort. programs and additional information sdi computer programs and services available from the department of libraries to other libraries are described in a publication called "sdi services and costs," available from the oklahoma department of libraries, 109 state capitol, oklahoma city, oklahoma 73105. additional progress reports on the sdi project, as well as other automation projects in oklahoma are reported in the bi-monthly oklahoma department of libraries automation newsletter, which is available on request. references 1. cuadra, carlos a., editor: annual review of information science and technology, 4 (chicago: encyclopedia britannica, 1969), 249-258. 2. studer, william joseph: computer-based selective dissemination of information (sdi) service for faculty using library of congress machine-readable catalog (marc) records (ph.d dissertation, graduate library school, indiana university, september, 1968 ), 1. 3. studer, william j.: "book-oriented sdi service provided for 40 faculty." in avram, henriette d.: the marc pilot profect; final report on a project sponsored by the council on library resources, inc. 
(washington: library of congress, 1968), 180. 4. cuadra: op. cit., 243-258. 5. ibid:. 263-270. 6. bloomfield, masse: "current awareness publications; an evaluation," special libraries, 60 (october 1969), 514-520. marc based sdi servicejbierman and blue 319 7. bottle, robert t.: "title indexes as alerting services in the chemical and life sciences," journal of the american society for information science, 21 (january-february 1970), 16-21. 8. brannon, pam barney; et al.: "automated literature alerting system," american documentation, 20 (january 1969), 16-20. 9. brown, jack e.: "the can/sdi project; the sdi program of canada's national science library," special libraries, 60 (october 1969), 501-509. 10. davis, charles h.; hiatt, peter: "an automated current-awareness service for public libraries," journal of the american society for information science, 21 (january-february 1970), 29-33. 11. housman, edward m.: "survey of current systems for selective dissemination of information ( sdi) ." in proceedings of the american society for information science, 6 (westport, connecticut: greenwood publishing corporation, 1969), 57-61. 12. martin, dohn h.: "marc tape as a selection tool in the medical library," special libraries, 61 (april 1970), 190-193. 13. bierman, kenneth john; blue, betty jean: "processing of marc, tapes for cooperative use," journal of library automation, 3 (march 1970), 36-64. 14. recon working task force: conversion of retrospective catalog records to machine-readable form; a study of the feasibility of a national bibliographic service (washington d.c.: library of congress, 1969). 15. bierman, kenneth john: "marc-oklahoma data base maintenance project," oklahoma department of libraries automation newsletter, 2 ( october 1970). 16. studer, william j.: (op. cit., note 2), 23-37. 17. payne, charles t.; mcgee, robert s.: "comparisons of lc proofslip and marc tape arrival dates at the university of chicago library," journal of library automation, 3 (june 1970 ), 115-121. 18. bierman, kenneth john: "marc-oklahoma cooperative sdi project report no. 1," oklahoma department of libraries automation newsletter, 2 (june & august 1970), 10-14. 19. nugent, william r.: nelinet: the new england library information network. paper presented at the international federation for information processing, ifip congress 68, edinburgh, scotland, august 6, 1968. (cambridge, mass: inforonics, inc., 1968). 20. kilgour, frederick g.: "a regional networkohio college library center" datamation, 16 (february 1970 ), 87-89. 21. the collaborative library systems development project (clsd): chicago-columbia-stanford. unpublished paper presented at the marc ii special institute, san francisco, september 29-30, 1969. lib-s-mocs-kmc364-20141005043847 87 on-line and back at s.f.u. m. sanderson: simon fraser university simon fraser university library began operation with an automated circulation system. after deliberation, it mounted the first phase of a two-phase o~line circulation system. a radically revised loan pol·icy caused the system design and assumptions to be called into question. a cheaper, simpler, and more effective off-line system eventually replaced the on-line system. the systems, fiscal, and administrative implications of this decision are reviewed. the original system when simon fraser university ( sfu) library opened in 1965, circulation of materials was handled by an automated system. 
briefly the method of operation was as follows: to borrow a book, the patron presented a laminated plastic card which had his borrower number and borrower class (faculty, staff, graduate, undergraduate) punched in it. the book itself cont:'lined a keypunched card holding the book's class number and brief author and title information. the book card and the patron's badge were fed into an ibm 1031 data collection terminal. the terminal transmitted the information to an ibm 1034 card punch which punched out a card containing the information from the book card, the patron's borrower number, and the date borrowed. at the end of the day, these transaction cards were used to update the loan master file. the loan master file produced daily a list of all material on loan, and fine and overdue notices for dispatch to patrons. payment cards for fines were also produced daily by the system; these cards were used to cancel fines from the file upon payment of the fine. the loan master file and the daily circulation listing also contained records of all materials on reserve. separate listings were available weekly showing reserve books and reserve photocopied material. at the end of each semester a list was produced of all students owing more than $2 in fines for the purpose of withholding grades until such time as fines were paid. reasons for going on-line the possibility of implementing an on-line system in one of the sfu departments was first discussed in early summer 1968. it was accepted by the computing centre management and the nonacademic department heads that: 1. the use of on-line processing generally was increasing rapidly. 2. the level of sophistication of these systems was not high. 88 ]oumal of libra-ry automation vol. 6 / 2 june 1973 3. there was a shortage of people competent to design, implement, and maintain sophisticated on-line systems. 4. a demand for on-line processing at sfu would develop. 5. sfu would probably move with the general trend toward increased use of on-line systems, and an on-line system ought to be initiated to develop local expertise in anticipation of demand. after further discussion, it was agreed that the department wishing to develop the first on-line system must be able to satisfy the following prerequisites: l. the system should encompass the beginning and the end of a clearly defined process. 2. the system should require the simultaneous use of one or more files by two or more terminals. 3. the system should use relatively large files with a high inquiry and update rate. 4. the system should satisfy genuine objectives of the application department. a survey of the departments showed that the library was the logical choice because: l. it could satisfy the prerequisites. 2. it had experience with automated systems. 3. batch-processing in the loan division could be extended to the on-line mode using the existing line of equipment. 4. the library administration was prepared to make an immediate commitment of resources to the project. the library's objectives were as follows: l. inventory conj1·ol-to gain statistics about the use of the collection. such data were available under batch processing for the general collection, but not for the reserve collection, which, with its loan periods of two hours, four hours, one day, and three days, was handled manually. 2. inventory usefulness-to determine how the library is being used and by whom. 
this information is essential in order to ensure that collection building is a reflection of the realities of the education process of the institution. 3. increased service-by definitiqn, the library is a service institution. if the automated system in batch mode allowed us to speed up the transaction process to handle large volume circulation, and allowed us to produce overdue notices, bills, and statistics, thereby increasing both the efficiency and service of the loan division, then we were satisfying a built-in library objective by implementing data processing in batch mode in the loan division. if the on-line system could give our users instant information on the status of books, then that function becomes a service objective. at sfu, the loan period and penalties for overdue books are the same for all classes of borrowers. the library has never been an enthusiastic supporter of the fines system because on-line and back at s.f.u./sanderson 89 of the general antagonism it creates and because it favors the borrower who can afford to pay. unfortunately, there was no acceptable way to force faculty to pay fines. it yvas thought that the on-line system was the only way to support a system of suspension of borrowing privileges for failure to return books, in lieu of the fines system. 4. cooperation-it was agreed between the three universities of british columbia (simon fraser university, university of victoria, and university of british columbia) that the storage of low-use material in a cooperatively supported lending/storage facility would save in the order of $800,000 per year. it was felt that the on-line system would provide useful statistics for this purpose. 5. future development-it was thought that the on-line system, with its statistics-gathering potential, was a necessary preliminary to the cooperative shelflist conversion of the three universities, in turn thought necessary to provide the kind of bibliographic information to allow collaborative collection building. the reasons why the above justifications later turned out to be invalid are given in a subseq11ent section. phase i of the on-line system (abbreviated system flowcharts of the various stages are shown in appendix 4) the purpose of phase i was to put the general collection on-line in enquiry mode only with batch updating every three hours-on-line updating was to wait until phase ii. in april1969, one full-time programmer analyst and one part-time systems analyst began work on the first phase of the on-line system, using three ibm 2260 graphic display terminals. problems with pgam, the pl/i graphic access method interface program, and multitasking support allowing the use of more than one terminal at a time (it was easy to get one terminal going) meant that by april 1970 the system was just struggling into life. there followed a period of parallel running which was unexpectedly long as a result of some of the problems peculiar to on-line systems (e.g. system down-time; designing a 1'eally effective back-up system to prevent loss of data). this phase lasted until october 1970. by july 1971 it had become apparent that the system was not cost-effective and in august 1971 the system was taken down and replaced by a revised version of the old batch system. the reasons and costs are given in a later section. 
there were three display terminals in the loan division, two for patrons, one for staff, giving the following capabilities:

patron - when the patron typed in the class number of the book he was looking for, according to instructions appearing on the terminal's screen, the information was transmitted to the computer program which searched the on-line loan master file for the required class number. if the book was on loan, a message appeared on the screen giving the class number, borrower number, due date, and whether a hold had been placed on the book. if the book was not on loan or on reserve, or being repaired, or in cataloging, a message to this effect was displayed. if the patron made any errors in his use of the terminal, error routines in the program displayed messages giving corrective procedures.

staff - by use of a special password, staff members could access different modules of the enquiry program. a status query by a staff member would result in all copies of a particular class number being displayed serially on the screen, and since fines and overdues were held on the master file, this type of information was also displayed. other routines available to staff members allowed holds to be placed on books or removed, renewals to be made, and the passwords to be altered. although passwords were a closely guarded secret, it was felt necessary to be able to change passwords in the event of their being learned by unauthorized users.

since on-line updating was not to be incorporated until phase ii, the 1034 transaction cards were input every three hours and the loan master file updated in batch mode. file structure for phase i was based on an indexed sequential type of access to a loan master file which contained one 100-byte record per book on loan, one record per fine, and one record per reserve book. in this way, the loan master file was in the same format as in the batch system. access to the file began with a program check of a small table held in core storage which gave ranges of class numbers with entry points to an index table. taking the appropriate entry point, the index table stored on disc was accessed. this gave the class number which headed each track of the loan master file. the index table was scanned for the appropriate track. each track of the loan master file contained fifty-four records with eighteen spaces for updates. whenever a record was changed or a new loan inserted, the new record was inserted in the update area. at the end of the day, the file was stripped of its update records and the old batch update program was used to update the loan master file. the loan master file was rewritten to disc the following morning, ready for the day's updates. total file space allocated was fifty cylinders.
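the access path just described (a small in-core table of class-number ranges pointing into a disc-resident index of the class numbers heading each track, followed by a scan of the fifty-four fixed records and eighteen update slots on the selected track) can be summarized in a short sketch. what follows is a modern, purely illustrative python rendering of that lookup logic, not the original pl/i implementation; the table layouts, field names, and the read_track helper are assumptions made only for the example.

    # illustrative sketch of the phase i two-level lookup: core table -> disc index -> track scan.
    # layouts here are invented for the example; the real file held 100-byte records, with
    # fifty-four primary records plus eighteen update slots per track.

    def find_loan_record(class_number, core_table, disc_index, read_track):
        """core_table: list of (low_class_number, index_entry_point) ranges, sorted ascending.
        disc_index: list of class numbers heading each track, sorted ascending.
        read_track: function returning (primary_records, update_records) for a track."""
        # 1. scan the small in-core table for the range covering this class number
        entry_point = 0
        for low, point in core_table:
            if class_number >= low:
                entry_point = point
        # 2. scan the disc-resident index from that entry point for the right track
        track_no = entry_point
        for i in range(entry_point, len(disc_index)):
            if disc_index[i] <= class_number:
                track_no = i
            else:
                break
        # 3. read the track; check the update area first, since it holds the newest versions
        primary, updates = read_track(track_no)
        for record in updates + primary:
            if record["class_number"] == class_number:
                return record
        return None   # no record: the book is not on loan, on reserve, or carrying a fine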
phase ii and the demerit system

phase ii was to see the on-line processing of loans and returns, the master file being updated at the time of the transaction instead of in three-hour batches. the reserve collection was to be automated and go on-line. the recording of holds and the production of hold slips for patrons and books was to be fully automated. detailed statistics of the use of the reserve collection were to be obtained. one of the major objectives of phase ii was the replacement of the fines system by a demerit system. under the demerit system a patron would accrue penalty points for the length of time a book was overdue. after a certain level was reached, a warning notice was to be sent out informing him that his privileges would be suspended if a particular level of points were exceeded. if he then exceeded this level, his borrowing privileges would be suspended, and whenever he subsequently presented his library card to take out books, the checking procedure in the program would find his borrower number invalid, prevent the transaction being recorded, and print a message on a 2741 terminal giving the reason for suspension. after a given period, borrowing privileges would be restored, provided that overdue materials had been returned. at exam times, penalty points would accumulate more rapidly, as they would also for reserve materials which had short loan periods.
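the proposed demerit scheme is essentially a point-accrual rule with two thresholds. the following is a minimal python sketch written only to make that rule concrete; the point values, thresholds, and multipliers are invented for the example, since the article does not specify them, and the design was never implemented at sfu.

    # hypothetical demerit accrual: all constants below are illustrative assumptions.
    WARNING_LEVEL = 20      # points at which a warning notice would be sent
    SUSPENSION_LEVEL = 40   # points at which borrowing privileges would be suspended

    def demerit_points(days_overdue, is_reserve=False, is_exam_period=False):
        """points accrue with the length of time a book is overdue; reserve items and
        exam-time lateness accrue faster, as the proposed rules required."""
        rate = 1
        if is_reserve:
            rate *= 3
        if is_exam_period:
            rate *= 2
        return days_overdue * rate

    def borrower_status(total_points):
        if total_points > SUSPENSION_LEVEL:
            return "suspended"        # borrower number treated as invalid at the terminal
        if total_points > WARNING_LEVEL:
            return "warning"          # warning notice produced
        return "in good standing"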
file organization for phase ii was to be altered from that of phase i principally to allow easier retrieval and updating. a master index file would contain a brief record (26 bytes for class number, 4 bytes for relative address) for every cataloged book in the library. this index file would lead into the loan master file, which would consist of variable-length records: one fixed-length portion holding the class number and author-title, followed by varying numbers of fixed-length sections giving details of the loan transactions. the number of transaction sections would depend on the number of copies of the book which were on loan. anticipated file sizes were 60 cylinders for the master index and 30 to 40 cylinders for the loan master file. the increase in file-handling efficiency and in restarting with no lost data after system down-time were seen to compensate for the increase in space allocation.

loan policy changes

problems with the system of fines and proposals such as the demerit system led to the suggestion that a survey should be made of campus opinion on the library loan policy. an examination of the results of the questionnaire and the comments obtained led to the submission of a somewhat different loan policy to the senate library committee. this policy, briefly, was a recall system with the two-week loan period changed to a semester loan period for general loan material, and retention of the current fines system for reserve materials until the implementation of phase ii. failure to respond to recall was to be penalized by suspension of library service. the system was to be experimental for two semesters. the decision to adopt a recall system had an immediate impact on system development for phase ii:
1. specifications for phase ii needed to be reworked.
2. the demerit system was no longer required.
3. interim procedures were required to handle the recall system until the inception of phase ii.
4. file size growth became unpredictable because it was not known whether all books would stay out until the end of the semester or be returned at more frequent intervals. this could indicate a file size of between 30,000 and 80,000.

revision of thinking on on-line circulation

two significant developments made it advisable for the library to reconsider its need for an on-line system in terms of both its benefits for the library and its economic justification. the first development was, as indicated, the radical revision of library loan policy, namely the proposed adoption of a semester loan period supported by a recall system. the second was a detailed costing of the equipment requirements for phase ii of the on-line system, weighing the relative merits and costs of two alternative manufacturers. these costs have turned out to be significantly higher than originally anticipated. consequently, it was seen that the costing done for phase ii should be done again in the light of the new developments. the original benefits of the on-line system were also reexamined.
1. inventory control - this still applied as far as the reserve collection was concerned. these statistics would have to be gained in some other way insofar as they are additional to the statistics now collected manually.
2. inventory usefulness - this was no longer a justification. by this time we had developed collection analysis programs which give a fine breakdown of the collection into separate disciplinary areas and give total volumes and book usage by borrower class in these areas. further development of these programs could give more information; e.g., referencing the registration system files could give information correlating students, courses, and book usage.
3. increase in service - this was no longer a justification. (a) the implementation of the recall system with its attendant suspension of privileges does not demand an on-line system for its operation as would the previously proposed demerit system. with a suspension of privileges for those owing over $25 tested in early 1971, we were operating a manual system of borrower control successfully, leading us to assume that the recall system's control system would similarly function well. (b) nobody ever complained that the information on the batch system was too old (eighteen hours old at maximum). we had even had messages (anonymous) left by frustrated users of the on-line terminals which could be paraphrased as: "what was wrong with the old system?"
4. cooperation - this was no longer a justification. extensions of the work on collection analysis mentioned in 2 above could help in the identification of high- and low-use items and thus provide an alternative way to save the estimated $800,000 per year. work on collection comparison between the three british columbia universities is already underway in a tri-university task force.
5. future development - this was no longer a justification. shelflist conversion should have been hastened by the abandoning of the on-line loan system insofar as resources would be freed to work on the conversion, which is of far greater importance to the future information-handling capability of the library than knowing within four seconds whether or not a book is on loan, especially as the time taken in reshelving of books makes this loan information prone to inaccuracy.

it thus appeared that the reasons used to justify an on-line system were no longer valid, if, indeed, they ever were. when examining the cost figures again in view of the proposed recall system, the amortization of the development and equipment costs no longer seemed possible. the cost of the batch and on-line system equipment is shown in figures 1 and 2 for both ibm and colorado instruments (now mohawk). it can be seen that the difference in equipment costs between the proposed batch system and phase ii would have been over $15,000 per year. (some of the savings in equipment rental has been used to microfilm the subject catalog for distribution to three floors in the library which do not have easy access to this catalog.)
the manual procedures involved with fines which phase ii was to eliminate are now considerably reduced by the recall system. the development costs of phase ii have been replaced with the cost of returning to the old batch system in a slightly improved form. the cost of this, at the computing centre, was $2,123.76. it had been predicted that writing phase ii in minerva and mark iv (two high-level program language packages) would make considerable savings in the impact on computing centre operations. however, even taking this into account, there still remain the development costs and at least $15,000 per year for extra equipment (the difference between the equipment costs for phase i, figure 1, and phase ii, figure 2). see the appendixes for cost comparison and projections.

fig. 1. equipment costs (1971), ibm vs. colorado, phase i and off-line

colorado (3-year lease), monthly:
  3 c-deks @ $131.29: $393.87
  3 c-dek cable terminals @ $2.14: 6.42
  1 central controller: 137.25
  1 controller cable terminal box: 2.25
  2 mag tape-recorders: 268.20
  subtotal: 807.99
  less service-free discount @ 12%: 100.00; net: 707.99
  installation: probably free
  service contract: approximately 122.00
  total colorado monthly: $829.99

ibm, monthly:
  1 1031a terminal @ $100.34: $100.34
  1 1031a terminal: 105.35
  1 1031b terminal: 64.12
  1 1034 card punch: 328.73
  (includes educational discount)
  subtotal: 598.54
  installation: equipment already on site
  total ibm monthly: $598.54

fig. 2. equipment costs (1971), ibm vs. colorado, phase ii

colorado (3-year lease), monthly:
  data collection: 5 c-dek 3213 @ $131.39: $656.95; 5 c-dek cable terminals @ $2.14: 10.70; 1 3216 central controller: 137.25; 1 controller cable terminal: 2.25; 1 interface coupler: 112.50; subtotal: 919.65; less 12% discount: 110.36; net: 809.29
  library share of memorex 1270: base (1/32 of $1,011): 31.00; line adapter (1/4 of $28): 7.00; modem: 33.00
  back-up: 9-track mag-tape recorder with free back-up switching rpq: 134.10
  printers: 2 2741 @ $90.70: 181.40
  display terminals: 4 2260 @ $46.74: 186.96; share of 2848: 311.10
  systems equipment total: $1,693.85
  service contract (prime shift only): 195.00
  total monthly cost: $1,888.85
  equipment freight charges: approximately $100.00

ibm, monthly:
  data collection: 2 1031a terminals @ $100.34: $200.64; 2 1031b terminals @ $64.12: 128.24; 1 1031a terminal @ $105.35: 105.35; 1 2711 data set: 115.00; subtotal: 549.23
  additional 2703 attachments: 1 4879 600 bps: 12.00; 1 4697 type ii control: 40.00; 1 3205 data line set: 86.00; 2 4790 line adapters @ $12: 24.00; 1 7506 @ $86 (library pays half?): 43.00; subtotal: $205.00
  back-up: switching rpq: 36.00; clock: 98.21; 1034 card punch: 328.73
  printers: 2 2741 @ $90.70: 181.40
  display terminals: 4 2260 @ $46.74: 186.96; share of 2848: 311.10
  systems equipment total: $1,896.63
  service contract: nil
  total monthly cost: $1,896.63
  installation and check-out: $1,390.00

the present recall system

the recall system has been in operation since august 1971. its principal features are as follows: that books be loaned for a period of one semester; that they be subject to recall after a period of two weeks from borrowing; that they become due on the last day of exams; that there be a penalty for failure to respond to recall; that there be a penalty for failure to return books after exams; that the penalty be suspension of library privileges plus a $5.00 fine. in the case of failure to respond to recall, the $5.00 fine is levied five days after the recall notice is sent. in the case of failure to return books after exams, the fine is $1.00 per day to a maximum of $25.00, starting at the end of the semester. listings of overdue books will be run during this period only, and a fine payment card produced and kept in the loan division. as in the first system, the fine payment card is used to cancel fines upon payment. the fine system and checking of delinquent borrowers is being successfully handled manually. finally, privileges will be restored only when the patron has both returned the books and paid the fine.
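the two penalty cases above reduce to a small amount of date arithmetic. the sketch below is a modern, purely illustrative python rendering of the schedule exactly as the article states it ($5.00 levied five days after an unanswered recall notice; $1.00 per day to a maximum of $25.00 after the end-of-semester due date); the function and parameter names are ours, not sfu's.

    from datetime import date, timedelta

    RECALL_GRACE = timedelta(days=5)   # $5.00 fine levied five days after the recall notice
    RECALL_FINE = 5.00
    DAILY_FINE = 1.00                  # $1.00 per day after the end-of-semester due date
    MAX_FINE = 25.00

    def recall_fine(recall_sent, returned):
        """fine for failure to respond to a recall notice."""
        return RECALL_FINE if returned > recall_sent + RECALL_GRACE else 0.00

    def semester_end_fine(due_date, returned):
        """fine for failure to return a book after exams."""
        days_late = (returned - due_date).days
        return min(max(days_late, 0) * DAILY_FINE, MAX_FINE)

    # example: a book due 1971-12-17 and returned 1972-01-13 accrues the $25.00 maximum
    print(semester_end_fine(date(1971, 12, 17), date(1972, 1, 13)))  # 25.0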
the automated part of the system is similar to the original system described earlier, except that fine and overdue notices are produced only at the semester end as mentioned. the reaction of the staff handling the recall system has been favorable, as has been the reaction of patrons. initial fears that a high percentage of the books in the collection would be out all semester and be returned en masse at the end have proved unfounded. the number of books out at any one time is often less than under the previous system. people seem to be returning books when they have finished with them and taking out fewer at a time; thus, browsing and usage are not affected. books began returning at 2,000 per day on november 30, 1971, in anticipation of the december 17 due date (the master file standing at around 34,000 books on loan at this point). on december 19 only 4,864 books had not been returned. by december 29 this was down to 2,169, and by january 13, 1972, down to 394. recalls have fluctuated between 35 and 130 per day, and of these an average of 8 recalls per day have not been picked up by the recaller. by contrast, under the fines system, the daily production of fines, overdue, and hold notices was between 500 and 700. the total amount of fines from september 1, 1970 to november 17, 1970 was $11,021.32. from september 6, 1971 to november 17, 1971 the figure was $2,405.03, a difference of $8,616.29. thus, although people are making similar use of the library, judging by the circulation statistics, it is not costing them as dearly.

costs

comparative computer operating costs are shown in table 1.

table 1. comparative computer operating costs
  period  | system               | average monthly computer cost | computer model
  1969-70 | old batch system     | $3,100                        | ibm 360-40
  1970-71 | phase i, on-line     | $3,851                        | ibm 360-50
  1971    | recall system, batch | $1,178                        | ibm 360-50
  1972-73 | recall system, batch | $514                          | ibm 370-155

the annual average cost of computer processing is now $6,168 rather than the $19,320 projected in appendix 1. staff salaries have risen in the two years since august 1971, and loans staff costs are now $33,200 instead of $21,267. total annual cost is now $6,168 (computer time) + $7,182 (equipment) + $33,200 (loan staff and materials) = $46,550. this is less than the projected annual cost of $57,994. the recall system certainly seems so far to be making the predicted savings, and the increase in good will in the university community is something we must also take into account on the credit side.

conclusion

as is stressed so often in systems analysis theory, and sinned against so often in practice, a clear statement of objectives is required and a thorough cost/benefit analysis of all alternative solutions is needed to prevent unwanted solutions of unreal problems. a first question should be: "what are we really trying to achieve here?" rather than: "i wonder if we could apply system x in this situation?" automation is one of many possible solutions to a problem. an on-line system is one of many possible automated solutions.
the management aspects of the decisions in setting up an on-line system were referred to in "reasons for going on-line." the thought of taking the on-line system down again was born of a number of factors. in the first place, feeling on campus caused the loan policy to evolve in a way not predictable at the time of system design. in the second place, we learned that on-line systems are not to be treated lightly. they require a great deal of careful design and technical competence if they are to be as efficient as they are impressive. they embody concepts as different from batch processing as batch processing is from the manual system it may replace. for us, the result was escalating costs, and an on-line system design that could have been better and less costly. the solution finally adopted was the result of considering what were seen to be the real requirements: maximum availability of materials with maximum convenience, and, against the background of the library's general objectives, maximum cost-effective service in an era of tight budgets.

appendix 1. annual circulation system cost summary (as of august 1971)

predicted annual costs are shown for the present batch system (with recall), the proposed on-line phase i (without recall), and phase ii (with recall), together with the savings and losses of the batch system compared with each phase.

  item                              | batch     | phase i   | phase ii  | saving vs i | saving vs ii | loss vs i | loss vs ii
  machine time                      | $19,320   | $37,150   | $42,000   | $17,830     | $22,680      |           |
  forms: overdue notices            | 950       | 3,250     | 950       | 2,300       |              |           |
  forms: fine notices               | 15        | 48        | 15        | 33          |              |           |
  printouts                         | 3,200     | 960       | 1,000     |             |              | $2,240    | $2,200
  postage for overdues and fines    | 3,530     | 12,000    | 3,530     | 8,470       |              |           |
  envelopes                         | 70        | 250       | 70        | 180         |              |           |
  postage for holds/recalls         | 1,200     | -         | 1,200     |             |              | 1,200     |
  punch cards                       | 1,260     | 1,260     | -         |             |              |           | 1,260
  loans staff: fines                | 2,000     | 6,000     | 2,000     | 4,000       |              |           |
  loans staff: stuffing envelopes   | 600       | 2,000     | 600       | 1,400       |              |           |
  loans staff: looking up addresses | 400       | 1,200     | 400       | 800         |              |           |
  reserves staff                    | 18,267    | 18,267    | 14,763    |             |              |           | 3,504
  equipment: 1030 system            | 7,182.48  | 7,182.48  | 14,606.04 |             | 7,423.56     |           |
  equipment: 2260 terminals         | -         | 1,682.64  | 2,243.52  | 1,682.64    | 2,243.52     |           |
  equipment: share of 2848          | -         | 3,733.20  | 3,733.20  | 3,733.20    | 3,733.20     |           |
  equipment: 2741 terminals         | -         | -         | 2,176.80  |             | 2,176.80     |           |
  totals                            |           |           |           | $40,428.84  | $38,257.08   | $3,440    | $6,964

net saving in annual cost of the batch system over: phase i, $36,988; phase ii, $31,293.

appendix 2. gross computer operating costs during phase i

costs shown include all circulation runs.

  month     | cost      | cpu hrs.
  nov. 1970 | $5,402.57 | 36.0241
  dec. 1970 | 4,265.12  | 28.4410
  jan. 1971 | 3,605.33  | 24.0419
  feb.      | 3,937.78  | 26.2595
  march     | 4,349.41  | 29.0043
  april     | 2,981.39  | 19.8820
  may       | 2,421.39  | 16.1487

average monthly operating cost of phase i over seven months: $3,851.85
average monthly operating cost of former batch system: $3,100.00
appendix 3. development costs for phase ii

completion: present system (phase i) converted to minerva, with new file organization, etc., and interface to batch system.
  programming and systems: computing centre (2 months): $1,800; library (7 days): 200; subtotal: 2,000
  programming and systems tests (est., 7 months): 5,600
  pacific western consulting (minerva) at $150 per day (5 days): 750
  subtotal: 8,350
  computer time (est.): 1,500
  forms, staff training: 50
  parallel runs: 350
  minerva total: $10,250

phase ii on-line:
  programming and systems: computing centre (13 months): 11,700; ibm support (48 days): 1,400; library personnel: 1,900; subtotal: 15,000
  programming and systems tests (21 months): 16,800
  pacific western consulting (10 days): 1,500
  subtotal: 33,300
  computer time (est.): 10,000
  forms, staff training: 1,000
  parallel runs (33 days at $35 per day): 1,155
  equipment rental (@ $1,200 per month additional): 1,300
  total development, phase ii: 46,755
  total system development: $57,005 (already spent, in addition: $11,576)

appendix 4 (abbreviated system flowcharts)
(a) original circulation system: 1030 system with 1031 badge-card readers and ibm 1034 card punch; daily circulation listing; circulation cards; payment cards, lost book bills, reserve bills, etc.; reserves listing by course.
(b) phase i: create on-line loan master; inquiry and update program (status, holds, and renewals); 1031 badge-card readers, three in general loans and two in reserves; reserves listing by course (weekly); back-up 1034 card punch.
(c) proposed phase ii: create on-line loan master.

fig. 1. dump of contracts & grants office master tape record

original specifications for these project records had in fact included a gesture toward information retrieval in the form of a 5-digit "discipline" code; this code had quickly become null when problems of maintenance and interpretation revealed themselves. however, the regular monthly entry and updating of other descriptive data was already twelve months underway at the time the cooperative project was suggested.

library index

the original proposal for a library index to the c&g file was production of a standard kwic (keyword-in-context) index to project titles, to be based on use of an ibm share library program developed by computer center staff. after review of available library programs and output, the product was finally specified to be a kwoc (keyword-out-of-context) index to project titles, using the chief investigator's name as key to a second "bibliographic" section of project summaries (figures 2, 3). a third section was added to list project summaries indexed by campus department name (figure 4). the c&g file included 12 elements which the library considered to be of general campus interest. these elements, comprising the project summaries, are: (1) project title; (2) chief investigator's name; (3) award status (i.e., funded or proposed); (4) project site (i.e., campus or affiliated institution); (5) project type (e.g., training, basic research, applied research); (6) grant number; (7) total project duration to date; (8) award period; (9) award amount; (10) granting agency name; (11) campus department; and (12) school. items 10, 11, and 12 of this list exist as numeric codes in the records,
with decoding tables comprising a separate file at the beginning of the tape volume. these items, together with items 3, 4, and 5, are coded on input but are not uniformly edited by the regular c&g update program.

fig. 2. keyword subject section
fig. 3. chief investigator section

program requirements

necessary program functions included the selection of active grant records; the editing, decoding, and formatting of selected data elements for printing; and the extraction and sorting of index terms. these functions were divided between a main routine coded by library staff, an indexing subroutine and print program written by a computer center staff member, and an ibm utility sort. local programs were written in pl/1, using the newly installed pl/1 optimizing compiler, and provided several tests of the compiler's capacities. only projects currently in "award" status, or which have outstanding proposals during the preceding twenty months, are selected for printing, approximately two-thirds of the file at this time. these conditions are tested on various data fields in the c&g record, and data from selected records are reformatted into a partial print line and passed to the keyword extraction subroutine.

fig. 4. campus department section
(sample error listing: file 2 read errors; agency codes not in c&g table; department codes not found; sub codes not found)
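the kwoc indexing step described above (extract each significant word of a project title and list the title under that word, with the chief investigator's name serving as the key into the summary section) can be illustrated with a short sketch. this is a modern python illustration of the general kwoc technique, not the original pl/1 routine; the stop-word list and record fields are assumptions made for the example, while the sample titles are drawn from the figures.

    # illustrative kwoc (keyword-out-of-context) extraction; stop words and fields are assumed.
    STOP_WORDS = {"of", "the", "and", "in", "for", "a", "an", "on", "to", "with", "its", "their"}

    def kwoc_entries(projects):
        """projects: list of dicts with 'title' and 'investigator' keys.
        returns (keyword, title, investigator) tuples, sorted for the keyword subject section."""
        entries = []
        for p in projects:
            for word in p["title"].split():
                word = word.strip(".,;:()").lower()
                if word and word not in STOP_WORDS:
                    entries.append((word, p["title"], p["investigator"]))
        return sorted(entries)

    sample = [{"title": "metabolic routes and their control", "investigator": "singer, t"},
              {"title": "studies of bile pigment metabolism", "investigator": "schmid, r"}]
    for keyword, title, investigator in kwoc_entries(sample):
        print(f"{keyword:12} {title}  --  {investigator}")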
once the appropriate pieces of html code had been replaced with corresponding include statements, the changeover was complete. from this point forward, changes in such things as database names, coverage periods, and descriptive material will be made to one .txt file. the change will immediately be reflected across all subject pages with no additional work involved for the librarians responsible for those pages. note that the sul server has been configured so that it parses all web pages. this is necessary because most of the library's web pages have some ssi. this configuration means that the web page extensions remain .html. if the server is not configured in this manner, then all pages containing ssi must end in a .shtml extension. this is a subject that requires discussion with automation librarians or the department responsible for the library's server.

advantages

obviously, the biggest advantage to this method is the time saved for individual librarians. there is now no need for librarians to do any maintenance work for links to information housed in the alphabetical list. static html pages referencing gale's infotrac onefile database, for instance, would have required updates to approximately forty subject pages; now, one librarian can correct one .txt file and simultaneously update all forty subject pages. time saved can be used in collecting and editing the list of web sites that are a part of each subject page; this is a task that has been pushed back in the past, in favor of making more urgent database information changes.
fig. 1. html code for academic search elite using a purl called eb-ase (the stored .txt snippet is a table row linking to the database, an image indicating "some full text," and the note "multi-disciplinary database; includes some scholarly articles")
fig. 2. database names, .txt file names, and resultant include commands (e.g., accessible archives / accessible.txt)

in addition, librarians who are using this simple technique do not need extensive training. the creation of the excel database of include commands allows for quick additions to an existing page, or the creation of new subject pages. librarians using the include commands can simply copy and paste them; there is no need for them to understand the syntax or to be able to repeat it. this makes using ssi particularly attractive to staff who do not want the added burden of further training in html. the librarian responsible for creating the .txt files and the excel database of statements demonstrated the copying and pasting of the include statements to all the other librarians who edit html pages in a one-time ten-minute training session. the only additional training issue has involved page structure. since the library uses a table structure for the subject pages, all table tags are included in the database .txt files. making sure that librarians understand that they do not need to recreate the table tags has been the only additional training issue for the department. as librarians begin to use these commands, links to resources across subject pages will look the same and will provide the user with the same information. this increased uniformity results in a more professional appearance for the web site as a whole.

disadvantages

this revolution in the maintenance of subject pages has not been without its disadvantages. the primary complaint by librarians using ssi include commands is that they cannot preview their changes in their html editors. sul's department uses the coffeecup html editor, which allows previews, but the previews are not visible for items that are retrieved using ssis. this is because the page is not fully assembled until the server assembles it. when the librarian views the page in the editor,
of course, the size of sul's reso urce list makes this kind of soluart & tec h ebsco tion feasible ; certainly, if the librar y were working with hundreds of resources, it would be more likely that a datab ase -driv en strategy would be ad op ted . the simplicity and elegance of the ssi include command process has encourage d adoption, and sul ha s seen no ill effects from the us er side of operations. librarian web au th ors qui ckly overcame any slight di sco mfort with the new proc ess and are now able to devote a portion of editing time to other, less m ono tonous tasks. references and notes 1. carla dun smore, "a qualitative study of web-mounted pathfinders created by academic business libraries," libri 52, no . 3 (sept. 2002): 140-41. 2. charles w. dea n , "th e public electronic libr ary : web-based subj ec t guides," library hi tech 16, no. 3-4 (1998): 80-88; gary rob erts , "designi ng a database-driven web site, or, the evolution of the infoiguan a," computers in libraries 20, no. 9 (oct. 2000): 26-32; bryan h. davidson, "database-driven, dynamic content delivery: providing an d managing access to online resources using microsoft access and ac ti ve server pages," oclc systems and services 17, no . 1 (2001): 34-42; marybeth grimes and sara e. morris , "a co mp ari so n of academic librarie s' webliographies, " internet reference services quarterly 5, no . 4 (2001): 69-77; laur a ga lv an -estra da, "moving towards a user-cent ere d, database-driven web site at th e ucsd libraries," index to advertisers 179 200 lita internet reference services quarterly 7, no. 1-2 (2002): 49-61. 3. roberts, "infoiguana "; davidson, "da tabase driven"; galvanestrada, "user -cen tered, database-driv en web site." 4. davidson, "database driven," und er " int roduction ." 5. ibid., under "developm ent conside ra tions." 6. roberts, "infoiguana ," 32. 7. ga lvan-estrada, " u ser -centered, database-driven web site, " 55-56. 8. jody co ndit fagan, "server -side includ es made sim ple, " the electronic library 20, no. 5 (2002): 382-83 . 9. michelle mach, "the service of serv er -side includes," information technology and libraries 20, no. 4 (2001): 213. 10. greg r. notess, "serv er side includes for site management," online 24, no. 4 (july 2000): 78, 80. 11. ibid. 12. mach, "se rvice of server-side includ es," 216. 13. ibid., 214. 14. fagan, "server -side includ es m ade simple," 387. 15. ibid., 383. 16 . ibid. 17. ibid. 18. apache httpd server project, "apac h e http server version 1.3: secu rity tips for server configurati on," th e apache softwar e foundation. accessed oct. 29, 2003, http: / / httpd. apac he.org/ docs / misc / sec urity _tips .html. 19. an th on y baratta, e-mail to th elis t mailing list, may 16, 2003, accessed nov . 4, 2003, http:/ / lists.evolt.or g/ archive/ week-of-mon-20030512/140824.html. 20. mach, "service of serv er -side includ es," 217. cover 2, 191, covers 3--4 using server-side include commands i northrup, cherry, and darby 197 letter from the editor kenneth j. varnum information technology and libraries | june 2018 1 in this june 2018 issue, we continue our celebration of ital’s 50th year with a summary by editorial board member sandra shores of the articles published in the 1970s, the journal’s first full decade of publication. the 1970s are particularly pivotal in library technology, as it marks the introduction of the personal computer, as a hobbyist’s tool, to society. the web is still more than a decade away, but the seeds are being planted. 
with this issue, we introduce a new look for the journal, thanks to the work of lita's web coordinating committee, and in particular kelly sattler (also a member of the editorial board), jingjing wu, and guy cicinelli. the new design is much easier on the eyes and more legible, and sports a new graphic identity for ital.

board transitions

june marks the changing of the editorial board. a significant number of board members' terms expire this june 30, and i'd like to take this opportunity to thank those departing members for their years of service to information technology and libraries, and the support they have offered me this year as i began as editor. each has ably and generously contributed to the journal's growth over the last years, and i thank them for their service to the journal and to ital:
• mark cyzyk (johns hopkins university)
• mark dehmlow (notre dame university)
• sharon farnel (university of alberta)
• kelly sattler (michigan state university)
• sandra shores (university of alberta)

these are big shoes to fill, but i am excited about the new members who have been appointed for two-year terms beginning july 1, 2018. in march, we extended a call for volunteers for two-year terms on the editorial board. we received almost 50 applications, and ultimately added seven new members:
• steven bowers (wayne state university)
• kevin ford (art institute of chicago)
• cinthya ippoliti (oklahoma state university)
• ida joiner (independent consultant)
• breanne kirsch (university of south carolina upstate)
• michael sauers (do space, omaha, nebraska)
• laurie willis (san jose public library)

readership survey summary

over the past three months, we ran a survey of the ital readership to try to understand a bit more detail about who you are, collectively. the survey received 81 complete responses out of about 11,000 views of pages with the survey link on it. here are some brief summary results:
• nearly half (46%) of respondents have attended at least one lita event (in-person or online).
• three quarters (75%) of respondents are from academic libraries. public, special, and lis programs make up an additional 20%.
• the majority (56%) are librarians, with the remaining spread across a number of other roles.
• almost two thirds (63%) of respondents have never been lita members, a quarter (25%) are current members, and the remainder are former members.
• about four fifths (81%) of responses came from the current issue (either the table of contents or individual articles).

an invitation

what can you share with your library colleagues in relation to technology? if you have interesting research about technology in a library setting, or are looking for a venue to share your case study, get in touch with me at varnum@umich.edu.

sincerely,
kenneth j. varnum, editor
varnum@umich.edu
june 2018

who will steer the ship?

during 1973, the existence of two study groups sponsored by the council on library resources became informally known in the library automation community. by the time of the 1974 ala midwinter meeting, the lack of formal identification of these groups, their goals, and their relation to clr provoked some spontaneous and possibly faulty responses from the ala members present. in the march 1974 issue of jola, ms. ruth tighe analyzed with perception and accuracy the behavior of information scientists attending that meeting.
however, we feel that further attention should be paid to the precise situation in which we find ourselves. the council on library resources has, for eighteen months, funded a small group with the acronym of cembi. informal communication had it that this group of library automation experts originally was to devise a standardized subset of the marc monograph format; however, a full year passed without public announcement of this work. unable to come to an agreement, the group seems to have turned to specific strategies for interchange of machine-readable bibliographic data. these goals are, of course, valid and worthy of pursuit. clr also announced at midwinter the intent to administer a system plan for large-scale serials conversion to create a national serials data base. this project draws upon the considerable efforts of the ad hoc "toronto" group, which provided a status report of its efforts in the december 1973 issue of jola. in addition, an invitational conference was sponsored in april 1974 by clr, to discuss national bibliographic control, with a small number of conference attendees and with a total absence of publicity. there are three major problems inherent in the situation described above, affecting not only the library automation community but libraries as a whole.

first, there is no apparent justification for the air of secrecy which has surrounded clr's direction of these worthy efforts. surely it must be abundantly clear to all administrators these days that in order to implement a far-reaching program it is necessary to inform if not consult with the target population. those librarians who are not associated with clr do not necessarily have axes to grind or home-grown systems to foist upon the world. they do wish to be kept informed of discussion and developments which may eventually have a direct effect upon their work. while it is perfectly reasonable to foster technical progress in a difficult area of study by forming a closed working group of skilled professionals, there seems to be little gained by avoiding recognition of such a group.

second, the approach of these projects has sidestepped all the existing channels of operation and communication which we have been striving for over a decade to create. should clr wish a certain task performed, it should be able to contract with the library of congress or to fund an existing ala committee to carry out the work. under the present circumstances, these established channels are likely to find their deliberations bypassed and superseded by these ad hoc groups.

third, when determining issues (ad hoc standards for local input of marc-like monograph and serials records) which are of long-range concern to many libraries, it is particularly important not to bias a development effort toward the needs of one type of library. the approach of the large research library, while important, is not the only vantage point from which to perceive the problems of nationwide bibliographic systems. we recommend that the council on library resources find an alternate method of accomplishing its goals, a method which includes provision for adequate communication and which takes advantage of existing channels.
such a method, for the cembi group, might be to declare its deliberations to be a user-group standards proposal for submission to the rtsd/isad/rasd representation in machine-readable form of bibliographic information committee (marbi), the appropriate ala committee. if clr wishes more intensive review of this or any other proposal from marbi, it could fund the necessary expenses for more frequent meetings of marbi. an analogous method would establish the serials project as a funded program, with the desired task goals, within the library of congress, the national serials data program, or an appropriate library union serials organization. clr has done many good deeds for the library world in its lifetime; it would indeed be unfortunate were it to inadvertently allow the growth of professional confusion and resentment that were evident at the midwinter meeting. susan k. martin

an interactive computer-based circulation system: design and development
james s. aagaard: departments of computer sciences and electrical engineering, northwestern university, evanston, illinois.

an on-line computer-based circulation control system has been installed at the northwestern university library. features of the system include self-service book charge, remote terminal inquiry and update, and automatic production of notices for call-ins and books available. fine notices are also prepared daily and overdue notices weekly. important considerations in the design of the system were to minimize costs of operation and to include technical services functions eventually. the system operates on a relatively small computer in a multiprogrammed mode.

introduction

although the northwestern university library had given some consideration to the adoption of data processing techniques over a period of many years, it was not until planning for a new library building started that this consideration became serious. an associate university librarian and a systems analyst were added to the staff with specific responsibilities in the "automation" area. the recommendation of the systems analyst was that an on-line system should be designed to integrate all library functions. two areas were isolated for initial development: technical services, including ordering and cataloging, and circulation control. several other decisions were made at about the same time (fall 1967). perhaps most important of these was the choice of computer. the acquisition of a dedicated library computer was ruled out on the basis of cost, leaving the choice to be made between a control data 6400 soon to be installed in the university's computing center, or an ibm 360/30 in the administrative data processing department. it was clear that the ibm 360 would have to be upgraded considerably to handle an on-line system, but the decision was made to use it, based on the facts that it was already installed and operating, that the machine itself was more adaptable to text processing applications, and that the library was an administrative application. a small programming staff was available, and it was decided to use that staff rather than have the library develop its own programming capability. the university's engineering and science libraries were administratively divorced from the rest of the evanston campus libraries to serve as a pilot location for development and testing. one final decision was made by the programming staff.
since there was reason to believe that the use of a real-time system by the library might generate similar requests from other users of data processing services, the system should be capable of extension to other applications, if possible. design then began on a general-purpose file maintenance system. a detailed description of this system will be presented in another paper. actual programming started in spring 1968, and in about a year the teleprocessing system was essentially complete and work was started on various subsidiary programs, to be run on a daily or weekly basis. these included programs for producing catalog cards, purchase orders, and similar materials. however, at this time the realization came that the opening of the new building was less than a year away (construction was on schedule, unlike the situation with several other buildings), and the new library administration felt very strongly that it would be desirable to have an operational circulation system. work then was suspended on the technical services part of the system, but it is important to note that the system developed up to that point provides the on-line inquiry capability to the circulation operation. it is also true that this capability is more sophisticated than would be needed for circulation applications only. after the basic system design for the circulation system was completed in spring 1969, the massive job of preparing nearly a million punched cards for the books was started in the summer. this was done using student operators, working from the shelflist. the most expensive and time-consuming part of the job, however, proved to be the insertion of the cards in the books, and this was not completely finished before the new building opened in january 1970. the computer circulation system was not ready for operation until december, so that it was tested in the pilot library for only a three-week period, hardly enough for a complete cycle of book charges and discharges. operation in the new building was complicated by several factors besides a new and unfamiliar circulation system. the building itself was not quite finished, all of the books were not in place, all of the remote terminals were not installed, and there was a large backlog of work which had accumulated during the moving period. the most serious problem, however, was that a decision had been made to continue the old manual circulation system in parallel with the new one, and with the other problems this became too much of a burden on the library staff. when it appeared that there were no problems with the new system which could not be worked out, the manual system was quickly abandoned. after this point operations began to improve rapidly, and within a few months the system was running quite smoothly. the systems and programming staff has now returned to the implementation of the technical services system.

general description

functionally, the northwestern university library circulation system may be viewed as consisting of three parts. the first of these is a book charge/discharge operation using the ibm 1030 series of terminals. the second part is the general-purpose file maintenance system, originally developed for technical services; and the third part is a group of programs which are run in "batch" mode and thus have no direct interaction with the remote terminals. the teleprocessing program operates in a partition of 36,864 bytes of storage on a 65,536-byte computer.
the ibm disk operating system is used, which requires 8,240 bytes, leaving 18,432 bytes for batch programs. the basic telecommunications access method is used for remote terminal input-output operations. data storage is on an ibm 2321 data cell. the present terminal configuration consists of five pairs of 1031 input terminals and 1033 printers, and four 2740 model 2 typewriter terminals (two of which are used for technical services development). the partition size is probably adequate for one or two more terminals. all of the 1030 terminals share a common telephone line, as do all of the 2740 terminals. two of the 1030 terminals are master units, connected directly to the telephone line, while three 1030s are satellites, operating through a master terminal. one master 1030 and one 2740 are located in the technological institute library; the remaining terminals are in the main university library.

book charge system

each of the 1031 terminals will accept an 80-column punched card which is kept in the book pocket, and also a punched plastic user identification badge. the three satellite terminals are located in the stack area of the library, one on each of three floors adjacent to the elevators. they are used for self-service charge of books. a master terminal, which also includes a manual entry keyboard, is located at the circulation desk on the main floor and is operated by library staff. the keyboard on this terminal allows the staff to perform additional functions, such as charging books to users without badges, charging for periods other than the standard loan period, processing renewals, and discharging books which have been returned. the 1033 printer associated with each terminal is really just a modified electric typewriter. when a book is charged and the transaction is accepted by the computer, the printer creates a date due slip which shows the call number of the book, the identification number of the user, and the date due. this is placed in the book pocket and serves as the borrower's pass to carry the book past the exit guards. to make this a reasonably secure system, the guard must verify that the call number printed on the slip corresponds with the number on the book, and the user number on the slip corresponds with the number on a valid university identification badge. note that this system permits a user to bring the book back into the library and take it out again as often as he wishes during the loan period. the 1030 terminals are associated with a small, specialized part of the computer teleprocessing program which accepts the information from the terminal, reformats it so that it is compatible with the general-purpose file maintenance system, and enters it in the file, checking, of course, that the same book is not already in the file. (transactions which are invalid for one reason or another result in an "unprocessed" message on the printer, and the user must then go to the circulation desk to have the problem resolved.) this portion of the system also processes renewals and discharges from the terminal at the circulation desk.
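the charge-handling step just described (accept the badge and book card, verify that the book is not already charged, then either file the loan and print a date due slip or reject the transaction as "unprocessed") is simple enough to sketch. the following is a modern, purely illustrative python rendering of that flow, not northwestern's actual teleprocessing code; the field names, sample call number, and loan period are assumptions made for the example.

    from datetime import date, timedelta

    LOAN_PERIOD = timedelta(weeks=2)   # illustrative standard loan period

    def charge_book(book_card, badge, loan_file, today=None):
        """book_card/badge: dicts read from the 1031 terminal; loan_file: dict keyed by call number.
        returns the text printed on the 1033 printer: a date due slip, or an 'unprocessed' message."""
        today = today or date.today()
        call_number = book_card.get("call_number")
        borrower = badge.get("borrower_number")
        if not call_number or not borrower:
            return "unprocessed"                  # unreadable card or badge
        if call_number in loan_file:
            return "unprocessed"                  # same book already in the file
        due = today + LOAN_PERIOD
        loan_file[call_number] = {"borrower": borrower, "due": due}
        # the slip is the borrower's pass: call number, user number, and date due
        return f"{call_number}  {borrower}  due {due.isoformat()}"

    loans = {}
    print(charge_book({"call_number": "510.78 a11i"}, {"borrower_number": "123456"}, loans))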
inquiry system

inquiry requests use the general-purpose file maintenance system originally developed for technical services. this gives an operator at the circulation desk the capability to query the file about the status of any book, and also to make certain changes in records, such as indicating renewals and saves. the terminal used for this purpose is the ibm 2740, which is similar to a typewriter in operation. in order to facilitate the inquiry operations, the key to each record is the actual call number of the book, rather than some arbitrary accession number. the call number is divided into two parts, which we call the search key and the key extension. the search key includes the dewey classification number, cutter number, and up to four work letters; the key extension includes other information such as edition number, volume number, or copy number. a few compromises were necessary with the key extension in order to adapt it to the limited character set that the 1030 terminals can process, but these changes have not caused any difficulty. when a record is first entered in the file, the search key is used to calculate a position in the file. this is done by taking the alphanumeric characters in the search key, performing some mathematical operations to reduce the number of characters to four, and then treating the result as if it were a number. this resultant number is divided by a constant number which is chosen to be the largest prime number which is smaller than the number of tracks on the storage device for the file (the data cell). the remainder from this randomizing computation gives a location where an attempt is made to place the record. it is quite possible that this place in the file will be occupied by some other record, which might have the same search key or a quite different one. additional steps are provided to find an alternate location in such cases. additional information about the file organization is given in the appendix to this article.
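the randomizing computation described above is a hash-and-probe scheme: fold the search key down to a small numeric value, take its remainder modulo the largest prime smaller than the number of tracks, and look for an alternate location if the chosen track is occupied. the sketch below is a modern python illustration of that general technique; the folding rule, track capacity, and forward-probing step are assumptions made for the example, since the article does not give northwestern's exact arithmetic.

    # illustrative hash placement: fold the search key to a number, divide by the largest prime
    # below the track count, use the remainder as the home track, and probe forward on collision.
    # all constants here are assumed for the example, not northwestern's actual values.

    TRACK_CAPACITY = 8   # assumed number of records per track, for the example only

    def largest_prime_below(n):
        for candidate in range(n - 1, 1, -1):
            if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
                return candidate
        return 2

    def fold_search_key(search_key):
        """reduce the alphanumeric search key to a four-character (32-bit) value, treated as a number."""
        value = 0
        for ch in search_key:
            value = ((value << 5) ^ ord(ch)) & 0xFFFFFFFF
        return value

    def home_track(search_key, track_count):
        divisor = largest_prime_below(track_count)
        return fold_search_key(search_key) % divisor

    def place_record(search_key, tracks):
        """tracks: list of track buckets. probe forward from the home track until one has room."""
        start = home_track(search_key, len(tracks))
        for offset in range(len(tracks)):
            slot = (start + offset) % len(tracks)
            if len(tracks[slot]) < TRACK_CAPACITY:
                tracks[slot].append(search_key)
                return slot
        raise RuntimeError("file full")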
the save request triggers a call-in notice, discussed below. this procedure has the minor disadvantage that a save cannot be entered so that the first copy of a particular book which is returned will be held; the operator must select a copy if more than one is out (or enter the save on all copies). this has not proved to be a serious problem. the overall teleprocessing system includes a file of records, called the "transaction file," which is written sequentially. records from other files which can be accessed by remote terminals are written in the transaction file under certain circumstances. in the case of circulation records accessed by the inquiry terminal, writing of the transaction file is under the control of the terminal operator, who will always request that the book record be written in the transaction file when entering a save on a record. records processed by the 1030 terminals are entered in the transaction file only when they represent books which are discharged and which possibly involve a fine or contain a save. 8 journal of library automation vol. 5/1 march, 1972 batch processing the third part of the complete system includes a number of "batch" programs, which are run periodically and are independent of the real-time program. however, they may be, and usually are, run at times when the real-time program is also operating. these programs use either the transaction file or the main circulation file. each weekday morning, a series of programs is run which processes the data entered into the transaction file since the previous run. three types of printed notices are prepared by these programs as shown in the accompanying figures. one is a fine notice for books which were overdue when returned but for which a fine was not collected. (if a fine was collected the fact is indicated by a keyboard entry on the 1030 terminal when the book is discharged.) there also may be circumstances when the regular fine was collected, but a penalty fine is due because the book was called in but not returned in the specified time. the second type of notice is the call-in notice. this is prepared from records which were placed in the transaction file as a result of a save entered by the inquiry terminal operator. the third type of notice is the book-available notice, which results when a book with a save is discharged. (when the actual discharge is performed the real-time computer program prints a message on the 1033 printer so that the book will not be returned to the stacks. ) after the selection of records from the transaction file, name and address information is added from student and personnel files maintained by the data processing department, and a four-part notice is prepared. the same printed form is used for all three types of notices; the additional copies of the fine notice are used in a manual follow-up system if the fine is not p-aid immediately. processing of these notices generally takes less than ten minutes of computer time a day. on a weekly basis, another series of programs is used to process the entire file of outstanding books. a considerably longer time is required because of the size of the file which must be examined. at this time all records in the file are also transferred to a backup file, which provides some protection in case of damage to the prime file (it has not yet been necessary to use it). information is also extracted about books which have had transactions in the past week, and a circulation statistics report is prepared. 
finally, records for books which are overdue are extracted and processed similarly to the daily notices, resulting in a one-part overdue notice. on a quarterly basis the entire file is examined and lists are prepared of all entries which otherwise do not qualify for overdue notices. these include charges to reserve, lost and missing, other libraries, carrels within the library, and to faculty (who are not charged fines). these lists are distributed for verification that the books are actually located where the file says they are and then returned to the circulation department for any further processing which might be necessary. backup system in designing a real-time circulation system it was obviously necessary to provide for those occasions when some equipment malfunction prevents normal processing. it was not felt necessary to provide any backup to the inquiry part of the system, and book discharges can be allowed to wait for several days if absolutely necessary. to process charges during a period when the computer system is not operational, a standard register source record punch is used. this device can accept the same plastic user badge and book card as the 1030 terminal, and transfers the punched information to a two-part form. the first part of the form serves as the date due slip, while the second part, a standard 80-column card, is used to enter the transaction through the 1031 when the real-time system is again operational. this system has the advantage that it is completely independent of the real-time system, except of course for the building electric power. if for some reason the standard register punch cannot be used for a transaction (or for a book without a book card under any circumstances), the missing information can be handwritten on the two-part form. the second card part is then keypunched and the transaction entered through the 1031 terminal. this ultimate backup system is avoided, however, as it is very susceptible to transcription errors. costs any attempt to determine the cost of the on-line circulation system is complicated by several factors. the cost of the terminals is an obvious item, and fairly easy to determine and allocate. however, the cost of the communications adapter which connects the telephone lines to the computer, as well as the cost of running the teleprocessing program, must be shared by all users. these now include technical services as well as circulation services, and in the future may include other nonlibrary university users. finally, even if this allocation could be made, there is still the problem of separating the costs of the teleprocessing program and the batch programs being run in the data processing department. however, since poor information may be better than none at all, the following figures are presented. they include monthly charges for the real-time program for both circulation and the part of technical services which is now operating, but do not include any charges for running of batch programs.

1030 terminals:
  master 1031a & 1033 with manual entry, 2 @ $251 ..............  $  502
  satellite 1031b & 1033, 3 @ $155 ..............................     465
2740 model 2, 600 bps, 4 @ $170 .................................     680
2701 data adapter with 2 lines ..................................     450
telephone line charges ..........................................      25
data cell space, 5 cells (1 circulation; 4 technical services) ..   1,400
core storage allocated exclusively to teleprocessing ............   1,700
estimated share of cpu and disk costs ...........................   1,400
special operator charge .........................................     350

future plans although the present real-time program includes a list containing a limited number of invalid user numbers (lost badges or users who are guilty of repeated violations of library rules), it would be more satisfactory to have a list of valid numbers instead. this might even be expanded into a self-regulating system, where users with a sufficient number of "demerits" would be prevented from charging additional books, and which might reduce the need for fines. another desirable addition to the system would be if some simple and inexpensive display terminals could be obtained and placed in the stack area to provide a self-service inquiry capability. this would relieve the circulation desk staff of some additional work, as well as reduce the number of trips the users must make to and from the stack area. acknowledgments the success of this circulation system is due in no small part to the constant support and encouragement from mr. howard f. smith, director of administrative data processing, and mr. john p. mcgowan, university librarian. mrs. velma veneziano was responsible for establishing the library's requirements and rendered invaluable help during the implementation of the system. [editor's note: mrs. veneziano is preparing an updated (1972) summary of the actual application of this design, which it is hoped will be published shortly.] appendix 1 detailed description of file structure the following information is included in each circulation file record:

key
  dewey classification number-11 characters. (the decimal point is included in the record but not punched on book cards.)
  cutter number-5 characters.
  work letters-4 characters.
key extension
  volume, editor, copy-17 characters.
location code-2 characters. this code indicates whether the book is in the main library or in a branch. it also provides for a subsidiary location code if required.
large book indicator-1 character.
charge code and date-3 characters.
borrower number-5 characters.
renewal code and date-3 characters.
discharge code and date-3 characters.
save date-2 characters.
saver number-5 characters.
due date-4 characters.
terminal identification code-1 character.
reserved for system use-1 character.

all dates and user identification numbers are packed to save space. because of the random organization of the file it is necessary that storage space be allocated for considerably more records than the expected maximum file size. this allocation conceivably might have been based on a simulation study of the file operation, but this could not be justified since not even a good estimate of maximum file size was available.
(an important unknown factor was the change in usage patterns expected to result from the move to a new building.) the only available estimates indicated a maximum file size of about 60-70,000 records, and on this basis sufficient space was allocated to hold 144,000 records. with twelve records to a track on the ibm 2321 data cell, the file requires 12,000 tracks, or 60 percent of one cell. an equivalent would be about one pack on an ibm 2314 disk storage unit. of this total space, five percent (about 7,200 records) is set aside as an overflow area; the remainder is the prime area. when the randomizing algorithm leads to a cylinder (twenty tracks) which is completely filled, the record is written in the overflow area and the cylinder at the prime location is flagged. using this system, failure due to "running out of space" will be gradual; it is likely that system performance will be seriously degraded before the overflow area is completely filled. after more than a year of operation the file contains almost 63,000 records, of which about 230 are in the overflow area. consideration is being given to minor changes in the handling of overflow records which should reduce the time required to search the overflow area. book review free culture: how big media uses technology and the law to lock down culture and control creativity by lawrence lessig. new york: penguin, 2004. 240p. $24.95 (isbn 1-59420-006-8). this is the third book by stanford law professor larry lessig, and the third in which he furthers his basic theme: that the ancient regime of intellectual property owners is locked in a battle with the capabilities of new technology. lessig used his first book, code and other laws of cyberspace (basic books, 1999), to explain that the notion of cyberspace as free, open, and anarchic is simply a myth, and a dangerous one at that: the very architecture of our computers and how they communicate determine what one can and cannot do within that environment. if you can get control of that architecture, say by mandating filters on content, you can get substantial control over the culture of that communication space. in his second book, the future of ideas: the fate of the commons in a connected world (random, 2001), lessig describes how the change from real property to virtual property actually means more opportunity for control, not less. the theme that he takes up in free culture is his concern that certain powerful interests in our society (read: hollywood) are using copyright law to lock down the very stuff of creativity: mainly, past creativity. lessig himself admits in his preface that his is not a new or unique argument. he cites richard stallman's writings in the mid-1980s that became the basis for the free software movement as containing many of the same concepts that lessig argues in his book. in this case, it serves as a kind of proof of concept (that new ideas build on past ideas) rather than a criticism of lack of originality.
stallman's work is not, however, a substitute for lessig's; not only does lessig address popular culture where stallman addresses only computer code, but lessig has one key thing in his favor: h e is a mast er story-tell er and a darned good writer, not something one usually expec ts in an academic and an expert in constitutional law. his book opens with the first flight osf the wright brothers and the death of a farmer's chick ens, followed by buster keaton's film steamboat bill and disney's famous mouse . th e next chapter traces the history of photography and how the law once considered that snapping a picture could require prior permission from the owners of any property caught in th e viewfinder. later he tells how an improvement to a sea rch engin e led one college student to owe the recording industry association of america $15 million. throughout the book lessig illustrates copyright through the lives of real people and uses histor y, science, and the arts to mak e this law come to life for the reader . lessig explains that intellectual property differ s from real property in the eye of the law. unlike real property, where th e property owner has near total control over its uses, the only control offered to authors originally was the control over who could make copies of the work and distribut e them. in addition, that right-the "copy right" -lasted only a short time. the original length of copyright in the united states was fourteen years, with the right to renew for another fourteen years. so a total of twenty-eight years stood betwe en an author's rights and the public domain, and those rights were limited to publishing copies. others could quote from a work, even derive other works from it (such as turning a no ve l into a play) , all within a law that was designed to promote science and the arts. fast forward to the present day and we have a very different situation. not only has there been a change in th e length of time that copyright applies to a work; a major change in 198 information technology and libraries i december 2004 tom zillner, editor copyright law in 1976 extended copyright to works that had not previously b een covered. in the earli es t u.s. copyright regimes of the late 18th century, only works that were registered with the copyright office were afforded the prot ection of copyright law, and only about five perc en t of works produc ed were so registered. th e rest were in the public domain. later, actual registration with the copyright office was unnecessary but the author was required to place a copyright notice on a work (e.g ., "© 2004, karen coyle") in order to claim copyright in it. copyright holder s had to renew works in order make use of the full term of protection, and renewal rates were actually quite low. in 1976, all such requirements were removed, and the law was amended to state that any work in a fixed m edium automatically receives copyright protection, and for the full term. that is true even if the author do es not want that protection . so although many saw the great exchange of ideas an d information on the internet as being a huge commons of knowledge, to be shared and sha red alike, a ll of it has, in fact, alwa ys been covered by copyright law-every word out there belongs to someone. that chang e, combined with a much earlier change that gave a copyright holder control over derivative works, puts creators into a deadlock. 
they cannot safely build on the work of others without permission (thus lessig's argument that we are becoming a "permission culture"). yet, we have no mechanism (such as registration of works that would result in a database of creators) that would facilitate getting that permission. if you find a work on the internet and it has no named author or no contact information for the author, the law forbids you to reuse the work without permission, but there is nothing that would make getting that permission a manageable task. of course, even if you do know who the rights holder is, permission is not a given. for example, you hear a great song on the radio and want to use parts of that tune in your next rap performance. you would need to approach the major record label that holds the rights and ask permission, which might not be granted. you could go ahead and use the sample and, if challenged, claim "fair use." but being challenged means going to court in a world where a court case could cost you in the six digits, an amount of money that most creators do not have. lessig, of course, spends quite a bit of time in his book on the length of copyright, now life of the author plus seventy years. it was exactly this issue that he and eric eldred took to the supreme court in 2003. lessig argued before the court that if congress can seemingly arbitrarily increase the length of copyright, as it has eleven times since 1962, then there is effectively no limit to the copyright term. yet "for a limited time" was clearly mandated in the u.s. constitution. lessig lost his case. you might expect him to spend his efforts explaining how the supreme court was wrong and he was right, but that is not what he does. right or wrong, they are the supreme court, and his job was to convince them to decide in favor of his client. instead, lessig revises his estimation of what can be accomplished with constitutional arguments and spends a chapter outlining compromises that might, just might, be possible in the future. to the extent that eldred v. ashcroft had an effect on lessig's thinking, and there is evidence that the effect was profound, it will have an effect on all of us because lessig is one of the key actors in this arena. throughout the book, lessig points out the difference between copyright law and the actual market for works. there is a great irony in the fact that copyright law now protects works for a century or more while most books are in print for one year or less. it is this vast storehouse of out-of-print and unexploited works that makes a strong argument for some modification of our copyright law. he also recognizes that there are different creative cultures in our society, with different views of the purpose of creation. here he cites academic movements like the public library of science as solutions for the sector of society that has a low or nonexistent commercial interest but a need to get its works as widely distributed as possible. for these creators, and for "sharers" everywhere, lessig promotes the creativecommons solution (at www.creativecommons.org), a simple licensing scheme that allows creators to attach a license to their work that lets others know how they can make use of it. in a sense, creativecommons is a way to opt out of the default copyright that is applied to all works.
when i first received my copy of free culture, i did two things: i looked up libraries in the index, and i looked up the book online to see what other reviewers had said. online, i found a web site for the book (http://free-culture.org) that pointed to two very interesting sites: one that lists free, downloadable full-text copies of the book in over a dozen different formats; and one that allows you to listen to the chapters being read aloud by volunteers and admirers. (i did listen to a few chapters and generally they are as listenable as most nonfiction audio books. in the end, though, i read the hard copy of the book.) lessig is making a point by offering his work outside the usual confines of copyright law, but in fact the meaning of his gesture is more economic than legal. although he, and cory doctorow before him (down and out in the magic kingdom, tor books, 2003), brokered agreements with their publishers to publish simultaneously in print with free digital copies, few authors and publishers today will choose that option for fear of loss of revenue, not because of their belief in the sanctity of intellectual property. if there were sufficient proof that free online copies of works increased sales of hard copies, this would quickly become the norm, regardless of the state of copyright law. as for libraries-unfortunately, they do not fare well. he dedicates a short chapter to brewster kahle and his way-back machine as his example of the need to archive our culture for future access. i admit that i winced when lessig stated: but kahle is not the only librarian. the internet archive is not the only archive. but kahle and the internet archive suggest what the future of libraries or archives could be. (114) lessig also mentions libraries in his arguments about out-of-print and inaccessible works, but in this case he actually gets it wrong: after it [a book] is out of print, it can be sold in used book stores without the copyright owner getting anything and stored in libraries, where many get to read the book, also for free. (113) since we know that lessig is very aware that books are sold and lent even while they are still in print, we have to assume that the elegance of the argument was preferred over precision. but he makes this error more than once in the book, leaving libraries to appear to be a home for leftovers and remaindered works. that is too bad. we know that lessig is aware of libraries; anyone active in the legal profession depends on them. he has spoken at library-related conferences and events. yet he does not see libraries as key players in the battle against overly powerful copyright interests. more to the point, libraries have not captured his imagination, or given him a good story to tell. so here is a challenge for myself and my fellow librarians: whether it means chatting up lessig after one of his many public performances, becoming active in creativecommons, or stopping by palo alto to take a busy law professor to lunch, we need to make sure that we get on, and stay on, lessig's radar. we need him; he needs us.-karen coyle, digital libraries consultant, http://kcoyle.net booth library on-line circulation system (bloc) paladugu v. rao: automation and systems librarian, and b.
joseph szerenyi: director of library services, eastern illinois university, charleston, illinois an on-line circulation system developed at a relatively small university library demonstrates that academic libraries with limited funds can develop automated systems utilizing parent institution's computer facilities in a time-sharing mode. in operation sinte september 1968, using an ibm 360 j 50 computer and associated peripheral equipment, it provides control over all stack books. this article describes the history, analysis and design, and operational experience of the booth library on-line circulation system (bloc). since september 1968, when it went into operation, it has constantly been evaluated and modified to make it as perfect a system as possible. articles in library literature describing on-line circulation systems in operation at various libraries include hamilton ( 1 ) , heineke ( 2), kennedy ( 3), and bearman and harris ( 4). bloc differs considerably from those reported systems and has some unique characteristics that deserve the attention of the library profession. it is one of the pioneering circulation systems in which on-line real-time inquiries are being made into the computer files by use of a cathode ray tube display tenninal. it is not a prototype or model system to be interpreted as the optimum circulation system, but rather it is a dynamic system which will and should ------------------......... on-line circulation system j rao and szerenyi 87 be modified to achieve th~ best possible system in accordance with the latest developments in computer hardware and software. its analysis and design were influenced by the needs of an academic library. however, with little or no modification this system can be adopted by public and school libraries. environment eastern illinois university, a state-supported institution located in charleston, has developed a comprehensive curriculum that offers programs in liberal arts, teacher education and other professional fields, and a graduate school. the enrollment for the academic year 1970-71 is 8,600 students, and the number of the faculty is 711. the goal of the university is to provide an excellent education in an atmosphere of high faculty-student ratio and generally small classes, characterized by intellectual dialog and daily contact among students, faculty and administrators. instructors require heavy use of library materials. booth library, the main library of th~ university, contains 235,000 volumes in its collection at present. it has just finalized a five-year development plan to keep pace with the growth of the institution and will increase the collection to over 400,000 volumes by the end of 1975. bloc was designed to satisfy the library's present and future requirements. planning analysis phase in order to improve services to its patrons through the utilization of modern technology, booth library started planning for library automation as early as 1965. early experiments used unit record equipment in such areas as the ordering of library of congress printed catalog cards, acquisitions and serials control. initial difficulties prevented these projects achieving full operational status, however, and subsequently all were abandoned. the primary benefit library staff gained from these early experiments was education in planning carefully for subsequent automation projects, one of which is the bloc system. 
initial planning for the latter began in 1966; however, the original plan, which was for closed stacks, had to be modified considerably to make bloc compatible with more recent developments in booth library operations. the library switched to open stack operation in 1967. while the bloc planning was going on, there were also plans underway to expand booth library's physical facilities and its resources to meet the needs of an expanding campus. the volume of circulation had already been increasing at the rate of 15% per year. the circulation staff had to be increased to cope with the situation, and even then quality of service had to be sacrificed to quantity demands. furthermore, it was determined that the proposed growth in enrollment and the anticipated increase in library materials would increase the volume of circulation even more and 88 journal of library automation vol. 4/2 june, 1971 impose additional work on already overburdened circulation staff. the call-slip circulation system in use at that time no longer seemed adequate, and the file maintenance associated with the call-slip system had turned into a time-consuming and cumbersome task. thus the need for an improved and simplified circulation system became evident to the administration of the library. the professional librarians held several informal discussions to identify and develop a circulation system that would adequately meet both present and future requirements of the library. several existing types of circulation systems were considered and comprehensively reviewed, but the librarians did not agree upon any of them. however, the review did result in the formation of a task force, consisting of representatives from the administration, the data processing center, and the library. after thorough investigation, this task force recommended a computerized on-line circulation system as a possible solution to the library's problem, and the administration authorized the task force to prepare a detailed analysis and design proposal. design phase in developing its detailed proposal, members of the task force took into consideration the fact that the new circulation system would use the existing computer facilities on the campus. they aimed at a system that would provide the best possible service at least cost in the long run, and one that would allow for incorporation of future developments in computer technology. main design objectives were to l) eliminate borrower participation in the check-out process, 2 ) speed and simplify circulation procedure, 3) eliminate manual file maintenance, 4) permit identification of the status of any book within the system, 5) provide accurate and up-to-date statistics concerning use of library materials, including the number of times a given book is used, 6 ) provide guidance from the system in case of human error in conducting a transaction, and 7) relieve professional librarians from clerical chores. development hardware the computer system on the campus operates in a time-sharing mode, concurrently performing several on-line and batch processing jobs for the registrar's office, business office, textbook library and booth library. at the present time it is an ibm s/360 model 50 with 262k bytes central core and related peripheral equipment. it functions under the supervision of operating system os. on-line circulation systemjrao and szerenyi 89 figure 1 shows a schematic of the system's ibm equipment and data how among the various components, as applicable to bloc. 
among the components shown in the schematic, two 029 keypunches, one 059 verifier, two 1031 terminals, one 1033 printer and one 2260 cathode ray tube display terminal are exclusively used by bloc and are located in the library.
fig. 1. booth library on-line circulation (bloc).
the other components, located in the computer center, are shared by bloc along with the other systems in operation on the campus. it should be pointed out also that this schematic represents only the equipment used exclusively or in a shared mode by bloc and does not represent the university's total computing system configuration. software there are 25 different applications programs written in pl/1 (f level) to support the bloc system. these do not include the system programs written in assembler language to perform certain basic machine functions. there are two main data files that are required for the operation of bloc. these two files are stored on a 2314 disk storage facility and are available to the system on an on-line, random-access basis. the programs required to process the bloc transactions are also stored on the same disk storage facility, and these programs are loaded as needed by the operating system. of the two data files the first one is the patron file, which contains identification data of persons eligible to borrow books from booth library. the second one is the booth master file, containing identification information for each physical volume located in booth library. the patron file is a combination of employee and student files that were created to serve the usual business needs of a university. this file is arranged in the indexed-sequential method (5) by the patron's social security number. each student record in this file is 408 bytes long and each employee record in this file 304 bytes long. at present this file contains over 19,000 records, including some inactive student records. to process the transactions bloc borrows such information as name, address and telephone number of the patron from this file as needed. updating of this file is done by the computer center with the aid of the university administrative offices. the booth master file was created exclusively for the operation of bloc. creation of this file, which took one and one-half years, was done by converting the booth library shelf list into punched cards and then transferring the information from the punched cards to a disk file. one master card was punched for each physical volume in the library. after verification, information from these cards was loaded onto the disk file through the s/360. layout of the master cards is given below:

field                              card columns   explanation
transaction code                   1              a=new record, c=change record, d=delete record
accession number                   2-7
format code                        8              oversize, etc.
call number                        9-28
edition, year, series              29-31
volume number                      32-35
part, index, supplement number     36-38
copy number                        39-40
location code                      41-42          reference library, etc.
author                             43-52
title                              53-79
end of card code                   80             12-4-8 punches

any blank space in the above fields is filled in by a slash (/). creation of book cards (those used in circulation transactions) from the master cards is explained in file updating procedure. the booth master file on the disk is arranged in the indexed-sequential method (5) by the first ten characters of the call number and by the accession number, which has a fixed length of six characters.
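as a small illustration, the sixteen-character record key just described (the first ten characters of the call number followed by the six-character accession number) might be assembled as in the python sketch below; the slash padding for short call numbers borrows the card-punching convention noted above and is an assumption here, as are the example values.

```python
# a sketch of building the booth master file record key: first ten characters
# of the call number plus the six-character accession number. padding choices
# and example values are illustrative assumptions, not taken from the article.

def booth_record_key(call_number: str, accession_number: str) -> str:
    call_part = call_number[:10].ljust(10, "/")
    accession_part = accession_number.rjust(6, "0")
    return call_part + accession_part

# booth_record_key("QA76.5F34", "182929") -> "QA76.5F34/182929"
```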
each record in this file is 124 bytes long. the layout for a record is given below:

field                              byte positions   explanation
os control                         1
call number                        2-11             first ten characters
accession number                   12-17
call number                        18-27            remainder of call number
edition, year, series              28-30
volume number                      31-34
part, index, supplement number     35-37
copy number                        38-39
location code                      40-41
author                             42-51
title                              52-78
control byte                       79
number of check outs               80-82            cumulative number of check outs
status of book                     83               in or out
borrower social security #         84-92
borrower status                    93               1=student, 3=faculty, etc.
due date                           94-99
format code                        100              oversize, etc.
save social security #             101-109          ss# of the patron that requested save
save type                          110              1=student, 3=faculty, etc.
save status                        111              is there a save or not?
unused bytes                       112-124          for future use

average access time of a record from this file is 75 milliseconds. as a security measure, a copy of the booth master file is kept separately on a magnetic tape, from which the disk file can always be recreated. operation updating the booth master file is updated nightly with records of the new books acquired by booth library. after processing in the catalog department, the new books go to the keypunch section, where a master card is punched and verified for each new book. the master cards are then sent to the computer center, where each master card is reproduced and interpreted into two book cards. the layout of the book card is identical to that of the master card with two exceptions. a 't' is punched in column one as a transaction code for the system, and the end-of-card code is moved to column 19 to expedite transaction processing, accession number and class number being adequate for locating a book record from the booth master file. the identification data appearing on the book card is similar to that on the master card. however, during the interpretation process printing on the book card is rearranged in a format more suited to visual verification (figure 2).
fig. 2. bloc book card.
after interpretation the book cards are run through a stamping process. in this process the machine reads the accession number from each card and stamps the number on the back of same card, across the 3¼-inch dimension of the card and near the top. this allows the circulation staff to compare the accession number with the number stamped on the book pocket, to insure that the card is put in the right book. after being stamped the book cards are sorted into two identical decks and sent back to the library's keypunch section. one deck of the cards is put into book pockets and the books are shelved in the stack area ready for circulation. the second deck goes to the circulation department for interfiling in call number sequence into a duplicate book card file. cards from this file are used as replacements for the original book cards in the book pockets as needed. whenever a card is removed from this file, the information is noted on a special card, so that another duplicate can be punched and placed in the file for future use. late at night, after the library is closed, the master cards received on that day in the computer center are used to update the booth master file before the new books are put into actual circulation.
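the 124-byte record layout above maps naturally onto fixed-position slicing. the python sketch below is one way to read such a record, assuming it arrives as a plain character string; only a few representative fields are extracted, and the class and function names are illustrative rather than taken from any published bloc program.

```python
# a sketch of reading the 124-byte booth master file record described above.
# the one-based, inclusive slicing mirrors the published byte positions.

from dataclasses import dataclass


@dataclass
class BoothMasterRecord:
    call_number: str        # bytes 2-11 plus 18-27
    accession_number: str   # bytes 12-17
    status_of_book: str     # byte 83, in or out
    borrower_ssn: str       # bytes 84-92
    due_date: str           # bytes 94-99
    save_ssn: str           # bytes 101-109


def parse_booth_record(raw: str) -> BoothMasterRecord:
    def field(start: int, end: int) -> str:
        return raw[start - 1:end]   # table positions are one-based and inclusive

    return BoothMasterRecord(
        call_number=field(2, 11) + field(18, 27),
        accession_number=field(12, 17),
        status_of_book=field(83, 83),
        borrower_ssn=field(84, 92),
        due_date=field(94, 99),
        save_ssn=field(101, 109),
    )
```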
transaction p1'ocessing circulation transactions are processed through the ibm 1030 data collection system, whose configuration consists of two 1031 card badge on-line circulation systemjrao and szerenyi 93 readers, one 1033 printer and one 1034 card punch. the entire 1030 system is controlled by a 2701 data adapter unit (figure 1). if the computer is not in the on-line mode, the 2701 routes the transaction information to the 1034 card punch to be punched into cards that will be used later to update the disk files. the 1030 system transactions are monitored by a special program called "1030 analyzer". this program is written in basic assembler language ( bal) and has its own partition of about 50k in memory. the 1030 analyzer controls five overlay programs which actually process the transactions and make necessary file modifications. each overlay is a segment of the transaction processing program and processes a specific routine, such as determining type of patron and loan period, calculating due date, etc. when the information for a transaction is transmitted to the 1030 partition, the 1030 analyzer determines which overlays are needed to process the particular transaction and calls those overlays. the overlays access the required records from the patron and booth master files and do the necessary processing, then the master records representing the latest transaction information are written back in their storage locations. the 1030 analyzer program and the associated overlays were written locally for bloc. check-out each person associated with the university who is eligible to borrow books from booth library is issued a badge by the appropriate administrative office. a patron is expected to present this badge, along with the books he wishes to check out, to the circulation desk. though transactions can be processed without the badge, this is done only in exceptional cases. into each badge are punched the person's social security number and a one-digit status code to indicate student, faculty, etc. the badge reader in the terminal reads the social security number and transmits it to the system, which interprets it as the record address for the particular person in the patron file and takes the necessary information from that address. the status code enables the system to determine the loan period. after receiving the books to be checked out and the badge from the patron, the circulation attendant first compares the accession number stamped on the book pocket against the accession number stamped on the back of the book card. if the numbers match, she proceeds with the processing; otherwise she first pulls the right book card from the duplicate book card file before proceeding. mter comparison of accession numbers, the badge is inserted into the badge slot on the 1031 terminal. the reset switch is set to "non-reset" to charge out more than one book to the same patron. the book card is then fed into the card input slot, face down, notch edge first. if the terminal is not able to read the card the first time a "repeat" light comes on, in which case the card is taken from the exit slot and fed into the input slot again. if the "repeat" light comes on more than twice for the same card, 94 journal of library automation vol. 4/2 june, 1971 it is assumed that there is a punching error in the card and the transaction is completed by manually recording the necessary information on a special card that is later punched and used to update the disk files. 
if the terminal reads the card without any problem the "card" light comes on, indicating that the terminal is ready to process the next card. the attendant takes the book card from the exit slot and puts it in the book pocket, then stamps due date on the date due slip for a student check-out, or inserts a prestamped date due card for a faculty check-out. when all book cards for one patron have been run through the terminal, the badge control switch is set to "re-set," which releases the badge to be returned to the patron. if the transaction is not normal, the deviation is communicated to the attendant by the system on the 1033 printer with one of the following messages: terminal address message ace.# class patron code 1) "#terminal-s no master record 137335 9197 3233815871" class# ace.# 2) "#da428-h21 001253 message you just tried to check out -is already out-check it in and try again" the first message is given for a book that got into circulation before its master card was loaded onto the disk file by the computer center; in this case the transaction is completed with a special card through manual recording. the second message is given for a book that has not gone through the check-in process upon arrival in the library from a previous check-out; here the attendant simply checks in the book, then checks it out again. check-in at frequent intervals books deposited in return bins are placed on a truck and taken to a terminal to be checked in. the check-in badge is inserted into the badge slot and the reset switch is set to non-reset mode. when a book is taken from the truck the accession number on the back of the book card is compared with the accession number on the book pocket. if the card is the right one, it is then run through the terminal and replaced in the book pocket, which completes the check-in process for a book. if the book card is not the right one or is missing from the pocket, the transaction is completed using the book card from the duplicate file. circulation list each night after the library is closed a cumulative circulation list is printed giving all books checked out up to the closing hour of 11 p.m. two copies of this list are delivered to the circulation department the next morning. one copy is placed at the card catalog to enable patrons to find out whether books they want are in or out. the second copy is kept at the circulation desk for staff use. on-line circulation systemfrao and szerenyi 95 this list, printed in call-number order, sho:ws the identification data of the book, its due date, the patron's social security number and his status. for faculty and special badge (mending etc.) check-outs, the transaction date is printed instead of the due date. at present a faculty member can check out a book for a whole academic year. however, the circulation librarian may recall a book from a faculty member after thirty days if it is needed by another patron. the transaction date helps the circulation librarian to recall books in accordance with this policy. since the loan period for special badge check-outs can not be predetermined, the transaction date is printed for these check-outs. the circulation list acts as a back-up to permit circulation staff to answer questions when the system is not in the on-line mode. when the system is in the on-line mode the circulation list reduces the demand on the 2260 terminal inquiries during peak periods. the circulation list printing will be eliminated when two more 2260 terminals become available for the use of the system. 
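the due-date column rule for the circulation list is simple enough to state in code. the python sketch below assumes the one-digit status code described earlier (1 = student, 3 = faculty, etc.) and treats every non-student code the way faculty and special-badge check-outs are treated; the function and argument names are illustrative only.

```python
# a small sketch of the circulation-list date rule: student check-outs show
# the stamped due date, while faculty and special-badge check-outs show the
# transaction date instead. the status-code mapping is an assumption.

def circulation_list_date(status_code: str, due_date: str, transaction_date: str) -> str:
    if status_code == "1":      # student
        return due_date
    return transaction_date     # faculty, mending, and other special badges
```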
transaction file each check-in and check-out transaction· is recorded on a tape in addition to being on the booth master file on disk. this transaction tape is used to generate the circulation list. it also enables restoration of the disk file should something happen to the latter. the transaction tape is cumulated into weekly, monthly and annual tapes to generate a variety of statistical reports and also overdue lists. batch processing most of the time this system operates in an on-line, real-time mode. occasionally it has to operate in batch mode because of some mechanical malfunctioning. in this mode the computer system is not able to service the circulation system. consequently the 2701 data adapter unit routes the transaction information to the 1034 card punch (figure 1) to receive data and punch this data into cards in a pre-designated format. for each transaction conducted in this mode the 1034 card punch punches one card with appropriate data; this is later used to update the transaction file. in this mode the circulation staff cannot make on-line inquiries and cannot get guidance from the system in case of errors. to alert them there is a light on the terminal that hashes whenever the system operates in this mode. in case of a complete breakdown of the system, transactions are processed manually using special cards. later, information from these cards is punched in 1034 card format and the files are updated. during the two years of its operation the system went into this mode only once for two hours because of engineering difficulties. over dues a cumulative overdue list is printed once a week listing books overdue on that date. it shows identification data of a book and the address of its 96 journal of library automation vol. 4/2 june, 1971 borrower. for each overdue book a mail notification card is also printed, addressed to the borrower and containing identification data for the book. when an overdue book is checked in, the system prints out a message for the attendant. for example the message "lb1051-s62-131313 checked in by 320-46-0785 1 was 20 days overdue" means that the book identified as "lb1051-s62-131313", brought back by a borrower whose number is "320-46-0785", was 20 days overdue. mter check-in the overdue book is turned over to a clerk for necessary action. personal reserves a patron wishing to obtain a book that is checked out places a reserve on the book at the circulation desk. reserves are placed in on-line, real-time mode in the bloc system. the circulation attendant merely keys in the identification data of the book and the requestor, along with the reserve code, using the 2260 display terminal. this information is sent to the system in the following order: 1) start symbol indicating the beginning of an inquiry. 2) inquiry code "br" (for booth reserve). 3) identification data of the book (call and accession numbers). 4) identification data of the requestor (social security number) . 5) requestor's status code (no.3 for faculty members, etc.). 6) end of the message code ( "_" underscore ) . when this information is entered, the system places the book on reserve for the requestor and displays the necessary information on the screen for the attendant's visual verification. whenever a book on reserve is checked in, the system prints a message such as "qa76-5-f34-182929 is saved for 138-32-0044 3." alerted by this message, the attendant places the book on the reserve shelf and notifies the requestor, usually by telephone. 
meanwhile, if another person inadvertently tries to check out the book, the system prints the message as "qa76-5-f34-182929 is saved for 138-32-0044 3 do not check out." if the requestor cancels his reserve on the book, it can be taken off reserve status by sending the appropriate code and identification data of the book via the 2260 terminal. on-line inquiries one of the main advantages of bloc is that it enables the library to obtain answers to a variety of questions in seconds. the circulation staff can tell easily the status of any book, and can obtain the list of books borrowed by a patron. on-line, real-time inquiries can be made on this system using the 2260 display terminal. the 2260 inquiry processing is controlled by a special program called the 2260 analyzer. this program is written locally in pl/i and has its own partition (about 95k) in the memory of the computer. altogether it on-line citculation system j rao and szerenyi 97 services thirteen terminals located at various places on the campus. only two of these terminals accept the circulation inquiries: the master terminal in the computer center and the terminal at the circulation desk. the rest are used in connection with the other computer applications on the campus. when a circulation inquiry is transmitted to its partition, the 2260 analyzer determines the type of inquiry and calls in the appropriate overlays (at present there are 20 ) to access the needed records from the files and to process the inquiry, and then send the response back to the inquiry originating terminal. after processing, the records representing the latest modifications, if any, are written back in their previous storage locations. inquiry response time is less than a second. to know how to make a certain type of inquiry all one has to do is key in the letters "in" onto the screen and enter them into the system, then the system displays formats for various types of inquiries on the screen. this feature enables new operators to make inquiries on the terminal with minimum training. the reserve and clear inquiries have already been explained. the other circulation inquiries include: name, student or employee master file, book display, book scan, and unclear. name inquiry the social security number of a patron may be obtained by keying in his last name preceded by code letters "na"; if unsure of the spelling of the d esired last name the operator merely keys in a part of the last name. when either a name or segment thereof is entered, the screen displays twelve names in alphabetical order (beginning with the last name or part of the last name entered) along with corresponding social security numbers. if the desired name is not within these twelve, the operator can get the next twelve by pressing the "next" key. this procedure may be repeated until the desired name and corresponding social security number are located. she can then select that social security number and enter it into the system to get the address of that person. student and employee master file inqui1·ies these inquiries are being made to find the addresses and telephone numbers of patrons as needed by the circulation department. whenever a person's social security number, preceded by code letters "sm" (for students ) or "em" (for employees ), is entered into the system, it displays his campus and home addresses and telephone numbers. book display inquiry this inquiry enables the circulation staff to know the status of any book within the system. 
when the call number and accession number ( which is usually obtained from the duplicate book card fil e at the circulation desk), preceded by code letters "bd", are entered through the terminal, the system displays the following information on the book: call number; acces98 journal of library automation vol. 4/2 june, 1971 sion number; copy number; author and title; status, as checked in or checked out; if checked out, when; how many times it has been checked out so far; if checked out, the name and address of the person who has it; if on reserve, name and address of the reserve requestor. book scan inquiry through this inquiry the books in a given class can be scanned, one after another. whenever a class number, or part of it, preceded by code letters "bs" (for book scan), is entered into the system, it displays the information about the first book in that class; then, by pressing the "next" button on the terminal keyboard, the operator can have displayed information about the next book in that class. this procedure may be repeated as many times as necessary. this class access method is a very important feature of the bloc system; through it, one may discover rather quickly what books are available in the library on a given subject, and simultaneously it can be found whether a book is in the library or checked out. in this inquiry mode the system also keeps track of how many books are scanned for a given class and displays this information on the screen. the difference between the "bd" and the "bs" inquiries is that the "bd" inquiry is made when information about a specific book is needed and when its unique record address (call and accession numbers) is known. the "bs" inquiry is made when only part of the record address (such as class number portion of the call number) is known and when browsing through a given class is desirable. unclear inquiry the university library has to clear withdrawing or graduating students, and leaving employees. this is very easily accomplished through this system, by the operator's merely entering the patron's social security number, preceded by the code letters "bu" (for book unclear), into the system. if the patron has no books out as of that minute, the system displays "patron xxx-xx-xx~:x has no books checked out." otherwise the system displays the call and accession numbers for books checked out and not yet returned by the patron. the system can display up to ten titles at a time; if the patron has more than ten books out a "continue message" appears at the end of the top line on the screen, and otherwise a "final message" appears. discussion benefits gained by the circulation department have already been discussed. following are benefits gained by booth library as a whole: 1) booth library can now provide subject listings, arranged in callnumber order (sorting and printing takes only a few hours) as required by various academic departments. the listings have been extremely helpful in pointing out the library's resources to various accreditation committees. on-line circulation systemjrao and szerenyi 99 2) physical book inventory taking was greatly facilitated by printing the booth master file in segments and with indication whether a book was in or out at that time. 3) periodic listings of books charged out to special badges, as binding, lost, etc., have been printed to facilitate follow-up activities by the respective departments. 4) the booth master file acts as a security back-up to the shelf list. 
should something happen to the shelf list it could be recreated from the booth master file within two days. similarly, if need arises, the departmental shelf lists can be created overnight. 5) the library committee can now make book budget allocations on a more scientific basis by reviewing the annual statistical reports, which give more accurately than before the volume of circulation in various subject fields. in addition to the above benefits there are interesting possibilities for doing a variety of things, of which only a few are mentioned below: 1) periodic listings of new books received, on the basis of area of interest, can be printed to provide selective dissemination of information service. 2) since both students' and circulation records are in machine readable form, a variety of research tasks could be undertaken with readily available data to find out the reading habits of students at academic level and at age level. these reading habits could be correlated to academic achievement with the aid of data in the student records. 3) when the booth library marc implementation project is completed, most book cards can be generated directly from the marc tapes. punching of master cards will be necessary only for those books not entered on marc tapes. 4) the booth master file can be used to create data bases (partially or completely, depending on the size and characteristics of a library) at other libraries for similar applications. costs booth library does not differ from other libraries when an attempt is made to collect data on costs. there are no figures available on planning cost, system design cost, or on writing and testing the programs. bloc was developed through the collaborative efforts of library and computer center staff. many people have devoted time to the planning and development effort, working for bloc on an additional duty basis. only two people were hired to work full time for the project: one keypunch operator and one ibm machine operator; their combined annual salary was $9,000.00 in 1968 and in 1969. the machine operator position was terminated at the completion of the basic file conversion in 1969. present operating costs of the system are not yet available. since all programmers' and operators' time is devoted to maintaining and operating a number of systems, it is difficult to determine how much the operation and maintenance of bloc costs. so far staff members have been busy improving the performance of bloc and have not had time to do an in-depth cost study. however, there are some costs which can be directly charged to bloc. at present the full time of one keypunch operator and 270 hours of student help, at a total cost of $745.00 per month, are exclusively devoted to bloc. the breakdown listed in table 1 gives an estimate of the percentage of use on each of the items of equipment used for library purposes and the monthly cost calculated from the percentage of use. terminal cable and magnetic tape costs are not included in the total. table 1. equipment use and cost.
Table 1. Equipment Use and Cost

Qty.  Item no.  Item description           % of use by BLOC   Proportional monthly rental charged to BLOC
 2    029       card punch                  100                $117
 1    059       card verifier               100                58
 1    083       sorter                        1                1
 1    088       collator                      1                4
 1    519       reproducer                    1                2
 1    557       interpreter                   1                1
 2    1031      input station               100                159
 1    1033      printer                     100                76
 1    1034      card punch                   67                206
 1    1052      printer keyboard              5                3
 1    1403      printer                       2                16
 1    2050      central processing unit      10                1,188
 1    2260      visual display              100                41
 1    2314      disk storage facility        13                689
 1    2316      disk cartridge              100                17
 1    2401      magnetic tape unit           50                138
 1    2540      card read/punch               5                28
 1    2701      data adapter unit            67                157
 1    2821      control unit                  5                46
 1    2848      display control unit         13                96
Total monthly cost of equipment: $3,043
Total yearly cost of equipment: $36,516

It should be pointed out that the costs shown in Table 1 are averaged on the basis of the total number of units rented and the amount paid in connection with all computer applications, not on the basis of equipment used by BLOC alone. If the above-listed equipment had been rented for utilization by BLOC alone, the rental costs would have been much higher. Moreover, the costs in Table 1 do not include salaries of computer personnel.

Utilization of the BLOC system has not produced any payroll savings. No library position was eliminated by installing it, but it is a certainty that more personnel would have been needed to discharge all duties at the circulation desk in the future without this system. Using the computer allows a 20% increase in loans to be processed without an increase in personnel cost.

Expansion

The capacity of BLOC has by no means been exhausted; its flexibility allows for more innovations, so that every possible circulation need can be met. The utilization of the BLOC system is limited only by the ingenuity of its users. Two new features are to be added to it in the near future. One of these is the installation of an IBM 2741 communications terminal to generate date-due slips, so that the present method of stamping the due date in a book can be eliminated. The 2741 terminal was chosen because it can be operated in either on-line or off-line mode, enabling the circulation staff to type date-due slips manually when the system is in off-line mode. The second new feature will be the installation of an additional 2260 terminal for public use near the card catalog. This terminal will accept only "BD" and "BS" inquiries, and the "BD" inquiry on this terminal can be made by call number alone, which is readily available in the card catalog. Privileged inquiries, such as placing books on reserve, will continue to be the prerogative of the terminal at the circulation desk. This new feature will provide patrons with up-to-the-minute information concerning the availability of library materials. The design phase for these new features has been completed and the programming effort is under way. It is expected that these new features will be added to BLOC by the fall of 1971.
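A minimal sketch of how inquiry codes of this kind might be dispatched, including the planned public-terminal restriction to "BD" and "BS", is given below. The data structures and function names are invented for illustration; this is not the BLOC implementation.

```python
# Illustrative dispatch of BLOC-style inquiry codes; data and names are invented.

MASTER_FILE = {
    # (call number, accession number) -> book record
    ("331.2 R215", "104211"): {"title": "Sample Title", "status": "checked in"},
}

def book_detail(key):
    """BD inquiry: one specific book, addressed by call and accession number."""
    return MASTER_FILE.get(key, "no record for that call/accession number")

def book_scan(class_number):
    """BS inquiry: browse every record whose call number begins with the class."""
    return [rec for (call, _acc), rec in MASTER_FILE.items()
            if call.startswith(class_number)]

def book_unclear(patron_id):
    """BU inquiry: list books still charged to a patron (circulation desk only)."""
    return []  # placeholder: a real system would scan the charge file for patron_id

PUBLIC_CODES = {"BD", "BS"}  # the public catalog terminal accepts only these
HANDLERS = {"BD": book_detail, "BS": book_scan, "BU": book_unclear}

def dispatch(code, argument, public_terminal=False):
    if public_terminal and code not in PUBLIC_CODES:
        raise PermissionError("this inquiry is reserved for the circulation desk terminal")
    return HANDLERS[code](argument)

print(dispatch("BS", "331", public_terminal=True))
```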
Conclusion

It can be said that a relatively small university library with limited funds can start and develop automated systems if the parent institution obtains a computer for instructional and administrative purposes. This was the case with Booth Library's circulation system. To keep pace with Eastern Illinois University's anticipated growth, it was decided in 1964 to develop a data processing center. It has grown rapidly in terms of services rendered to the university. Its main purpose initially was to serve the academic departments, but its services have spread to several administrative functions, such as admissions, student records, registration, and personnel services, to name a few. It was not difficult for the librarians to convince the university's administration of the necessity and usefulness of the computer for library purposes. Relatively little extra expenditure for hardware was needed. Understanding and cooperation from the staff of the reorganized computer center helped to develop the library's circulation system.

What was the dream of the librarians a few years ago is now an actual operation, working well and giving better service to the library's patrons. The major advantage is the saving of time on all necessary operations. The system also freed the staff from routine manual work. It eliminated the large call-slip files and the inevitable human errors in those files. Patrons were freed from filling out call slips, and the circulation staff was freed from the tiresome task of decoding the unreadable "scribbling" of many patrons. Check-out and check-in of books was speeded. There is no longer a line of waiting students at the circulation desk, and, on average, it takes less than five seconds to check out a book. A variety of reports containing computer analysis of circulation records are available at regular intervals. They are an aid in ordering additional copies of heavily used titles and in surveying the collection for weak spots. After more than two years of operational experience, it can be said with confidence that the BLOC system has fully satisfied all its design objectives and even exceeded them by providing some additional benefits that were not in the original planning.

Regional Numerical Union Catalog on Computer Output Microfiche

William E. McGrath, Director of Libraries, and Donald Simon, Systems Analyst and Computer Programmer, University of Southwestern Louisiana Library, Lafayette, Louisiana.

A union catalog of 1,100,000 books on computer output microfiche (COM) in twenty-one Louisiana libraries is described. The catalog, called LNR for Louisiana Numerical Register, consists not of bibliographic information but primarily of the LC card number and letter codes for the libraries holding the book. The computer programs, the data bank, and the output are described. The programs provide the capability for listing over two million entries. Also described are the statistical tabulations which are a by-product of the system and which provide a rich source for analysis.

Twenty-one Louisiana libraries have produced on computer output microfiche (COM) a union catalog containing locations for 1,100,000 books.
About 150,000 of these are current acquisitions (books acquired in the last two years); the rest are volumes in the retrospective collections of ten of the twenty-one libraries. The Numerical Register of Books in Louisiana Libraries, as the catalog is now entitled, is the second step toward what is hoped will be a comprehensive current and retrospective list of over two million volumes, the estimated holdings of the participating libraries. The first was a conventionally printed register of 550,000 books, issued in 1971 and distributed to fifty Louisiana libraries.

The new register is not a bibliography. It includes no bibliographic information. It is a location device for books whose bibliographic information is already known, and it includes nothing that is not also listed by the Library of Congress. The title was deliberately chosen to distinguish it from an older bibliographic Louisiana union catalog. All books listed in the register are those having a Library of Congress (LC) card number; indeed, the LC card number is the entry. The term "numerical" was chosen because we anticipate using other numbers besides the LC number, e.g., the Mansell number and the International Standard Book Number (ISBN). The LC card number is the most widely used book number we now have. This fact is put to good use by the Library of Congress in its own NUC Register of Additional Locations. There are other LC number indexes, but they are not union catalogs. (The Mansell number, of course, will be very useful when publication of the NUC pre-1956 imprints is complete.)

Many more titles can be represented on a page by number codes than by complete bibliographic data, at a ratio of perhaps 600 to 9. Unit costs are, therefore, much less. The first edition (1971), containing 550,000 volumes, was produced for an estimated total cost of $22,600 ($8,600 grant plus $14,000 absorbed). One hundred copies of the register were printed in hard-copy form, with approximate overall unit costs for keypunching, computer, travel, salaries, and printing as follows:

                     In terms of actual expenditures   In terms of total funds,
                     (grant funds)                     expended plus absorbed
Per title entry      2.5 cents                         6.0 cents
Per volume entry     1.6 cents                         3.8 cents

The second edition (November 1972) contains over 1,100,000 volumes and, in terms of the second grant, was produced on computer output microfiche for an estimated total cost of $31,200, i.e., $10,000 grant plus $21,200 absorbed. (Reproduction costs for the COM are negligible. For an original copy of 5 fiche, containing all 1,100,000 volumes, we were charged $25 by a commercial firm, and for extra copies, $3 each. Copies for distribution will be sold at a slightly higher price.) Unit costs for the COM edition are:

                     In terms of actual expenditures   In terms of total funds,
                     (second grant funds)              second grant expenditures plus absorbed
Per title entry      1.8 cents                         5.6 cents
Per volume entry     0.9 cents                         2.8 cents

Unit costs computed on the basis of total costs to date suggest that they remain relatively constant from cumulation to cumulation.

The concept of a numerical register is not new. The idea was discussed at length in a proposal by Harry Dewey (1) almost a generation ago, in which he espoused all the essential ideas, and again in 1965 by Louis Schreiber (2). Both argued that if the bibliographic data, including the LC card number, were already in hand, one could then merely look up the number in a numerical union catalog to determine a location.
Goldstein and others (3) have also studied what they called the "Schreiber catalog" and have produced a sample computer printout of LC numbers. Computer output microfiche, on the other hand, was not anticipated in the original concept. It has made reproduction and distribution cheap, fast, and eminently feasible. The history of the register and its rationale have been discussed more fully by McGrath (4).

Programs Comprising the Union Catalog System

The union catalog data record is shown in Table 1. The first three fields are the familiar LC card number, and the fourth, the library location.

Table 1. The Data Record

(1) Alpha series   (2) Year or numeric series   (3) Serial number within numeric series   (4) Library
    AGR                69                           2354                                      C

(1) Alpha series prefix: this data field may contain from 1 to 4 alphabetic characters denoting a special series.
(2) Numeric series prefix: this data field may contain 1 or 2 digits.
(3) Serial number: this data field may contain up to 6 numeric digits.
(4) Alphabetic library designation code: this field contains a preassigned alphabetic code (up to 26) designating the participating library.

The three programs which use this data record and comprise the union catalog system are shown in Figure 1 and described below.

Fig. 1. Flow chart of the programs comprising the register system.

LNREDT Program

LNREDT is an editing program which examines all card input data to determine whether they are acceptable or not. Each data field as shown above is examined as follows: field 1, for the presence and rejection of nonalphabetic characters, and also to determine if the alphabetic code is a member of the accepted set of codes obtained from the Library of Congress (the accepted records are transferred, after checking all fields, to a magnetic tape file for subsequent use; rejected data records are printed and visually scanned for the source of error); fields 2 and 3, for the presence and rejection of nonnumeric characters; field 4, to determine if alphabetic.

LNRSRT Program

LNRSRT sorts all records on the above-mentioned tape file. The major sort key is the numeric prefix, field 2. The minor sort keys, in order of the sort sequence, are: field 1, the alphabetic special series indicator; field 3, the book serial number; and field 4, the library code designation.

LNRLST Program

LNRLST is the main program, which uses the sorted data tape to:

a. Create a single record for each unique LC number containing the library code designation of each library having this particular book;
b. Produce a listing of the above records in LC card number order;
c. Generate records of unique titles in combinations of libraries owning the titles;
d. Enter into a memory matrix the combinations of libraries created in part (c). Combinations are then counted: each time a combination is encountered, the matrix is searched for a match; if a match is found, the corresponding matrix position is incremented by one; if no match is found, a new matrix position is created with the new combination and the corresponding count initialized to one. This routine also provides a total count of each library's contributions plus a grand total of all libraries' contributions;
e. Tabulate, from the data compiled in (d) above, several elaborate tables of summary statistics; these statistics are described later in this paper.

The number of libraries the program LNRLST can accommodate is a variable and is entered as an execution-time parameter along with the library names and code designations. The main program occupies approximately 150,000 bytes of core memory.

The Output

A sample of the register entries appears in Figure 2. A simple one-letter designation was used to identify each library, rather than the usual National Union Catalog (NUC) designation, in order to save space in the printout. These letters appear alphabetically to the right of each LC number. A typical page of the register contains ten columns of up to six-digit LC numbers, with the two-digit series number appearing only once at the beginning of each series. Thus each page contains about 600 LC numbers. The latest cumulation of 1,100,000 volumes (560,000 LC numbers) consists of nearly 1,000 pages. The entire output was produced on five pieces of fiche directly from the cumulated tape. The COM program was written by the commercial firm which contracted to run it. The computer output microfiche was issued on five 4x6-inch pieces at 42x reduction. Each piece contains 208 frames, and each frame contains an average of 1,126 volumes and 573 titles. The data can be produced on 24x fiche as well as roll film.

Statistical Summary

The large samples of holdings (from an initial 5,000 volumes, through successive cumulations to 90,000 and, the most recent, 1,100,000) provide an excellent data base for statistical analysis. We believe the samples may be the largest title-by-title comparison of monographs ever tabulated in this format. Very little analysis is presented in this paper, but the data base and its format will be explained. Even without analysis, many interesting observations can be made.

Fig. 2. Portion of a typical page of the computer printout, showing the two-digit 76 and 77 series, a typical prefix (PS), the serial numbers within the series, and letter codes to the right of each serial number.
For example, library A has the book 77-5; seven libraries (A, B, C, M, N, O, and Z) hold the book 77-75937. Each page contains ten columns; only five are shown.

Most of the tabulations are designed to throw light on the various aspects of the overlap problem, since a decisive factor in determining the utility of the register is a knowledge of the number of titles held in common by all the libraries. Over the years there has been continuing interest in overlap. Probably the first and most elaborate of the early studies was by LeRoy Merritt (5), and one of the most recent by Leonard, Maier, and Dougherty (6). Continuing interest is expressed in such proclamations as that by Ellsworth Mason, where he claims that materials are "being acquired in duplications that are rather staggering across the country" (7).

The following statistics were tabulated from input for current acquisitions, the most recent being a total of 90,302 volumes, rather than from the retrospective and current totals in the production runs. The 90,302 volumes were acquired for the most part during the two-year period, fall 1969 to fall 1971. The statistics show holdings for sixteen libraries.

The Basic Tabulation: Titles Held in Common by Unique Combinations of Libraries

The basic tabulation, sections of which are shown in Table 2, actually fills seven pages of computer printout. The tabulation is designed so that each unique and actual combination of libraries is separately listed, and the books held by each combination are counted.

Table 2. Titles held in common by each unique combination of libraries.

Thus, in the table, although the total number of books held in common by libraries A and B is 127, the number of books held in common by them and no other library is only 52. The number of books held by libraries A, B, and Z, and no other library, is 18. None of these 18 is included in the count of 52, and none of the 52 in the 18; they are mutually exclusive. But the 18, plus the 52, plus the small counts in each of the other combinations in which A and B share holdings, is 127. The percentage of common holdings for each combination is also given, except when the percentage is less than .01. Thus libraries A and B have .48 percent in common of their total combined holdings of 10,688 volumes. It is interesting to note that of the 65,535 possible combinations, in only 444 combinations did the percentage of common holdings exceed .01 percent, and in only 8 did the percentage exceed 1 percent. Of these, the highest is 5.43 percent (A and Z). This 5.43 percent means that 678 of A and Z's common holdings were held by no other library. The total of A and Z's common holdings that were also held by other libraries is 1,315, or about 10.5 percent of 12,470. Again, this is the highest percentage of any combination.
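The tally just described can be made concrete with a short sketch. It is an illustration under simplified assumptions (invented sample records, no editing or sorting step), not the LNRLST code itself; the distinction between the exclusive count (the 52) and the total shared count (the 127) corresponds to the two tallies at the end.

```python
# Sketch of the combination tally performed by the "memory matrix" routine.
from collections import defaultdict

# Invented (lc_number, library_code) input records.
records = [
    ("77-5", "A"), ("77-75937", "A"), ("77-75937", "B"), ("77-75937", "C"),
    ("77-75937", "M"), ("77-75937", "N"), ("77-75937", "O"), ("77-75937", "Z"),
    ("76-112600", "J"),
]

# (a) one entry per unique LC number, holding the codes of the owning libraries
holdings = defaultdict(set)
for lc_number, library in records:
    holdings[lc_number].add(library)

# (c)/(d) count titles per unique combination of libraries
combination_counts = defaultdict(int)
for libraries in holdings.values():
    combination_counts[frozenset(libraries)] += 1

# Titles held by A and B and no other library (the mutually exclusive counts)
exclusive_ab = combination_counts[frozenset({"A", "B"})]
# All titles A and B share, regardless of other holders (the "127"-style figure)
shared_ab = sum(1 for libs in holdings.values() if {"A", "B"} <= libs)
print(exclusive_ab, shared_ab)
```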
Summary of Titles Held in Common

The basic tabulation of titles held in common is summarized in Table 3. Column 1 is the number of libraries, from 1 to 16, in each combination. Column 2 is the total number of titles counted in all combinations. For example, 59,907 titles exist in unique copy, thus there were only 59,907 copies (column 3), but there were only 8 titles which as many as 9 libraries held, for a total of 72 copies (column 3). Column 4 shows that all 16 libraries contributed unique titles and that there were 117 different combinations of two libraries, out of a possible 120 (column 5). Thus there were 3 combinations of 2 libraries which had no titles in common. It is also most interesting that there were only 7 combinations of 9 libraries out of a possible 11,440, and no combinations of 10 or larger. According to the binomial distribution, there are 65,535 theoretical ways that 16 libraries can combine (total, column 5), whereas in this sample only 1,198 combinations occurred (total, column 4). Column 6 is the result of column 2 divided by column 4. Thus 3,774.19 is the average number of unique titles contributed by each library, 74.92 is the average number held by any combination of 2 libraries, and 6.89 is the average held by any combination of 3.

Table 3. Summary of titles held in common by unique combinations of libraries (spring 1971 tabulation)

No. of libraries   Total no. of titles   Total no. of copies   No. of times a        Theoretical no. of times     Average title
in each            in all                in all                combination           a combination can occur      overlap per
combination        combinations          combinations          occurred              (binomial distribution)      combination
 1                 59,907                59,907                   16                     16                       3,774.19
 2                  8,766                17,532                  117                    120                          74.92
 3                  2,453                 7,359                  356                    560                           6.89
 4                    782                 3,128                  360                  1,820                           2.17
 5                    279                 1,395                  214                  4,368                           1.30
 6                     84                   504                   75                  8,008                           1.12
 7                     43                   301                   41                 11,440                           1.04
 8                     13                   104                   12                 12,870                           1.08
 9                      8                    72                    7                 11,440                           1.14
10                      0                     0                    0                  8,008                           0.00
11                      0                     0                    0                  4,368                           0.00
12                      0                     0                    0                  1,820                           0.00
13                      0                     0                    0                    560                           0.00
14                      0                     0                    0                    120                           0.00
15                      0                     0                    0                     16                           0.00
16                      0                     0                    0                      1                           0.00
Totals             72,335                90,302                1,198                 65,535                          60.38

Summary of Each Library's Multiplicated Titles

The administrators of each library are especially interested to know how many of their own titles are also held by other libraries. This information for total input (i.e., for titles with LC prefixes from 1900 to the present) is given in Table 4.

Table 4. Summary of each library's multiplicated titles (1900-1971 imprints)
(The last two columns are column 5 divided by column 3, and column 5 divided by the total of column 3, respectively.)

Library                                        Code   Volumes       % of total   Titles also held      Multiplicated       Multiplicated
                                                      contributed   volumes      by other libraries    titles as % of      titles as % of
                                                                                                        own titles          grand total
Louisiana State Library                         A      4,708          5.21         2,497                 53.03                2.76
Louisiana Tech University                       B      5,980          6.62         2,378                 39.76                2.63
University of Southwestern Louisiana            C      6,353          7.03         1,932                 30.41                2.13
Louisiana State University-Baton Rouge          E     29,186         32.32         6,190                 21.20                6.85
Louisiana State University Medical Center       F        580           .64           168                 28.96                 .18
Grambling                                       G      1,606          1.77           471                 29.32                 .52
Centenary                                       H      4,472          4.95         2,061                 46.08                2.28
Louisiana State University-Alexandria           I      2,765          3.06         1,087                 39.31                1.20
Southeastern Louisiana                          J      4,153          4.59         1,849                 44.52                2.04
Northwestern Louisiana                          K        563           .62           230                 40.85                 .25
Northeastern Louisiana                          L      4,891          5.41         1,980                 40.48                2.19
Loyola-New Orleans                              M      3,803          4.21         1,744                 45.85                1.93
Louisiana State University-Shreveport           N      4,291          4.75         1,749                 40.75                1.93
Louisiana State University-New Orleans          O      5,968          6.60         1,783                 29.87                1.97
Nicholls                                        P      3,221          3.56         1,048                 32.53                1.16
New Orleans Public                              Z      7,762          8.59         3,228                 41.58                3.57
Totals                                                90,302        100.00        30,395
Average                                                5,644          6.25         1,900                 37.78                2.09

(Tables were also produced giving the same kind of information by decade and for the last two years, but they are not reproduced here.) The column labels are self-explanatory, but it may be observed that the total in column 5, 30,395, equals the difference between the total copies, 90,302 (column 3, Table 3), and the number of titles held by one library only, 59,907 (columns 2 and 3, Table 3).

Distribution of Titles Published and Multiplicated by Decade

Table 5 shows that the very largest overlap, in current acquisitions, occurs among books with recent imprints. This is to be expected, since these figures do not compare older books recently acquired by one library with those already in another library, and since the acquisition of older books is from a much larger universe than that for current books.

Table 5. Distribution of contributed titles published and multiplicated by decade (titles acquired from 1969 to 1971)

Imprint period   No. of titles contributed   % of titles contributed   No. of volumes multiplicated   % of total volumes multiplicated
1900-1909          1,483                       2.05                          23                            .13
1910-1919          1,049                       1.45                          29                            .16
1920-1929          1,180                       1.63                          22                            .12
1930-1939          1,816                       2.51                          74                            .41
1940-1949          2,539                       3.51                         102                            .57
1950-1959          5,353                       7.40                         361                           2.01
1960-1971         58,915                      81.45                      17,356                          96.59
Totals            72,335                     100.00                      17,967                         100.00

Other Summary Statistics

The foregoing tables illustrate the kind of tabulations that can be made with this type of data. More detailed tables can be compiled, and indeed were, e.g., tables giving the percentage of books acquired for each year and each decade for each library, with ten-year totals and averages. Other possibilities would be frequency distributions and summaries for clusters of similar libraries. This material awaits analysis; we believe it contains many heretofore unsuspected insights.

Future Plans

Since the data can be updated so readily, plans are being made to provide funds for the extraction and keypunching of LC numbers in the remaining retrospective collections of the participating libraries. These libraries contain an estimated total of two million volumes. Succeeding cumulations will be readily produced on COM. Most of the cost has been for extracting retrospective numbers from card catalogs. Once the remaining retrospective collections are cumulated, costs for cumulating current input will be negligible. Any final catalog, of course, can never list complete holdings, since each library has many titles without LC numbers. Those titles could be listed in more conventional form; since they are in a minority, the expense would be far more reasonable than it would be to reproduce entire holdings in conventional form.
We have said nothing about other aspects of the project. In committee discussions, however, much has been said about the feasibility of using the LC card number to access the information in other major projects such as MARC, and possibly even the data bank in the Ohio College Library Center. Technically, it is feasible to print a conventional bibliographic catalog by matching up our LC numbers with titles listed in the current MARC tapes; pragmatically and economically, of course, it is another matter. Other possibilities are the printing of a list of specialized holdings by accessing the subject headings on the MARC tapes, the assignment of specialized acquisitions, and the gathering of information which might affect development of a joint processing center.

Acknowledgments

This project was supported in part by Library Services and Construction Act Title III funds administered by the Louisiana State Library. The authors wish to give special thanks to Miss Sallie Farrell, Louisiana State Librarian, for her enthusiastic support and fine advice. We wish also to thank the other members of the L.L.A. Committee on the Union Catalog: Mr. Sam Dyson, Louisiana Tech University; Mrs. Jane Kleiner, Louisiana State University, Baton Rouge; Mrs. Elizabeth Roundtree, Louisiana State Library; Dr. Gerald Eberle, Louisiana State University, New Orleans; Mrs. Hester Slocum, New Orleans Public Library; Mr. Charles Miller, Tulane University, New Orleans; Mr. Ronald Tumey, Rapides Parish Library, Alexandria; and finally, Mr. John Richard, past president of the Louisiana Library Association, who saw the importance of the project and who appointed the original committee. Complete documentation for this project, including computer programs, has been deposited with the ERIC Clearinghouse on Library and Information Science (8).

References

1. Harry Dewey, "Numerical Union Catalogs," Library Quarterly 18:33-34 (Jan. 1948).
2. Louis Schreiber, "A New England Regional Catalog of Books," Bay State Librarian 55:13-15 (Jan. 1965).
3. Samuel Goldstein, et al., Development of a Machine Form Union Catalog for the New England Library Information Network (NELINET) (Wellesley, Mass.: New England Board of Higher Education, 1970) (U.S. Office of Education final report, project no. 9-0404). ED 043 367.
4. William E. McGrath, "LNR: Numerical Register of Books in Louisiana Libraries," Louisiana Library Association Bulletin 34:79-86 (Fall 1971).
5. LeRoy C. Merritt, "The Administrative, Fiscal, and Quantitative Aspects of the Regional Union Catalog," in Union Catalogs in the United States (Chicago, Ill.: American Library Association, 1942).
6. Lawrence E. Leonard, Joan M. Maier, and Richard M. Dougherty, Centralized Processing: A Feasibility Study Based on Colorado Academic Libraries (Metuchen, N.J.: Scarecrow Press, 1969).
7. Ellsworth Mason, "Along the Academic Way," Library Journal 96:1671-76 (15 May 1971).
8. William E. McGrath and Donald J. Simon, LNR: Numerical Register of Books in Louisiana Libraries; Basic Documents (Lafayette, La.: Louisiana Library Association, Dec. 1972) (U.S. Office of Education). ED 070 470, ED 070 471.
Communications

Using a Native XML Database for Encoded Archival Description Search and Retrieval

Alan Cornish

Alan Cornish (cornish@wsu.edu) is Systems Librarian, Washington State University Libraries, Pullman.

The Northwest Digital Archives (NWDA) is a National Endowment for the Humanities-funded effort by fifteen institutions in the Pacific Northwest to create a finding-aids repository. Approximately 2,300 finding aids that follow the Encoded Archival Description (EAD) standard are being contributed to a union catalog by academic and archival institutions in Idaho, Montana, Oregon, and Washington. This paper provides some information on the EAD standard and on search and retrieval issues for EAD XML documents. It describes native XML technology and the issues that were considered in the selection of a native XML database, Ixiasoft's TextML, to support the NWDA project.

Pitti, one of the founders of the EAD standard, noted the primary motivation behind the creation of EAD: "to provide a tool to help mitigate the fact that the geographic distribution of collections severely limits the ability of researchers, educators, and others to locate and use primary sources."1 Pitti expanded on this need for EAD in a 1999 D-Lib article:

The logical components of archival description and their relations to one another need to be accurately identified in a machine-readable form to support sophisticated indexing, navigation, and display that provide thorough and accurate access to, and description and control of, archival materials.2

In a more recent publication, Pitti and Duff noted a key advantage offered by EAD that relates to the focus of this article, the development of an EAD union catalog:

EAD makes it possible to provide union access to detailed archival descriptions and resources in repositories distributed throughout the world. . . . Libraries and archives will be able to easily share information about complementary records and collections, and to "virtually" integrate collections related by provenance, but dispersed geographically or administratively.3

In a 2001 American Archivist article, Roth examined EAD history and deployment methods used up to the 2001 time period. Importantly, two of the most prominent delivery systems described by Roth, DynaText (a server-side solution) and Panorama (a client-side solution), were by 2003 obsolete products for EAD delivery. This is indicative of the rapid pace of change in EAD deployment, in part due to the migration from SGML to XML technologies. Roth described survey results obtained on EAD deployment that underscore the recognized need at that time for a "cost-effective server-side XML delivery system." The lack of such a solution motivated institutions to choose HTML as a delivery method for EAD finding aids.4

Articles like Roth's that describe specific EAD search-and-retrieval implementation options are in short supply. One such option, the University of Michigan DLXS XPAT software, is employed for the search and retrieval of EAD and other metadata in the University of Illinois at Urbana-Champaign (UIUC) Cultural Heritage Repository.5 Another option, harvesting EAD records into machine-readable cataloging (MARC) to establish search and retrieval access in an integrated library system, was described by Fleck and Seadle in a 2002 Coalition for Networked Information task force briefing. Using an XML harvester product created by Innovative Interfaces, MARC records are generated based upon MARC encoding analogs included in the EAD markup and loaded into an Innovative Interfaces INNOPAC system.6 This product has been used to create access to EAD finding aids in the catalog for Michigan State University's Vincent Voice Library.
In a 2001 article, Gilliland-Swetland recommended several desirable features for an EAD search-and-retrieval system. She emphasized the challenge of EAD search and retrieval by noting the nature of finding aids themselves:

Archivists have historically been materials-centric rather than user-centric in their descriptive practices, resulting in the finding aid assuming a form quite unlike the concise bibliographic description with name and subject access most users are accustomed to using in other information systems such as library catalogs, abstracts, and indexes.7

Without describing specific software tools, Gilliland-Swetland argued for a user-centric approach to the search and retrieval of finding aids by examining the needs of specific user communities such as genealogists, K-12 teachers, and historians.8

Several initiatives similar to the NWDA effort are described in the professional literature. The Online Archive of California (OAC), which was founded in the mid-1990s, is a consortium of California special-collections repositories. A number of key consortium functions are centralized, including "monitoring to ensure consistency of EAD encoding across all OAC finding aids" according to agreed-upon best practices, a critical need in the creation of a union catalog.9 Brown and Schottlaender also describe the integration of the OAC into the California Digital Library, which enables linkages between EAD finding aids and digitized copies of original materials.10 Finally, one important development area is the possibility of integrating EAD documents into Open Archives Initiative (OAI) services in order to enhance resource discovery. A 2002 paper written by Prom and Habing, both of whom work with the UIUC Cultural Heritage Repository, explored the possibility of mapping EAD to OAI, the latter of which is based upon the fifteen-element Dublin Core metadata set (unqualified). While noting, "we do not propose that the full capabilities of EAD finding aids could be subsumed by OAI," Prom and Habing suggested that it is possible to map the top-level and component portions of EAD into OAI, resulting in multiple OAI records from a single EAD finding aid. In this scenario, a single OAI record is created from the collection-level information and multiple records from the component-level information in an EAD document.11
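As a rough sketch of the Prom and Habing idea, one unqualified Dublin Core record can be derived from the collection-level description and one from each component. The element paths and sample markup below are simplified assumptions (a flat component structure, two Dublin Core elements only), not their implementation.

```python
# Sketch: derive one collection-level record and one record per component.
import xml.etree.ElementTree as ET

SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle><unitdate>1900-1950</unitdate></did>
    <dsc>
      <c01><did><unittitle>Correspondence</unittitle><unitdate>1900-1925</unitdate></did></c01>
      <c01><did><unittitle>Photographs</unittitle><unitdate>1926-1950</unitdate></did></c01>
    </dsc>
  </archdesc>
</ead>
"""

def dc_record(did):
    """Map a <did> block to a minimal unqualified Dublin Core dictionary."""
    return {"title": did.findtext("unittitle"), "date": did.findtext("unitdate")}

root = ET.fromstring(SAMPLE_EAD)
records = [dc_record(root.find("archdesc/did"))]          # collection-level record
records += [dc_record(c.find("did"))                      # one record per component
            for c in root.findall("archdesc/dsc/c01")]
print(records)
```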
Evaluation of EAD Search and Retrieval Products

In order to identify a software solution for supporting a union catalog of EAD finding aids, the consortium conducted a product evaluation. The strengths and weaknesses of the native XML technology employed by the consortium can be best understood by looking at alternative XML products and product categories. Table 1 shows the products considered during an evaluation period that consisted of both product research and actual trials. In approaching the evaluation, the consortium and its union-catalog host institution, the Washington State University Libraries, had several specific needs in mind. First, the licensing and support costs for the product needed to fit within the consortium's budget. Second, the search-and-retrieval software had to support several basic functions: keyword searching across all union-catalog finding aids; specific field searching based upon elements or attributes in the EAD document; an ability to customize the look and feel of the interface and search-results screens; and the ability to display search term(s) in the context of the finding aid.

As noted in the table, three of the evaluated products are native XML databases. Cyrenne provides a definition of native XML as a database with these features: the XML document is stored intact ("the XML document is preserved as a separate, unique entity in its entirety"); "schema independence," that is, "any well-formed XML document can be stored and queried"; and the query language is XML-based ("native XML database vendors typically use a query language designed specifically for XML," as opposed to SQL).12

Of the three native XML products, only the licensing costs of Ixiasoft's TextML and the open-source Xindice software fell within the available project funding. Both packages were extensively tested, with TextML proving superior at handling the large (sometimes in the MB-size range) and structurally complex EAD documents created by consortium members. One key strength of TextML that met an NWDA consortium need involved field searching. In TextML, it is possible to map a search field to one or more XPath statements, enabling the creation of search fields based upon the precise use of an element or attribute in EAD documents. The importance of this capability is shown by an EAD element that can appear both at the collection level and at the subordinate component level in a document; with TextML, using its limited XPath support, it is possible to reference a specific, contextual use of such an element.

In addition to the native XML solutions, several other product types were considered. An XML query engine, Verity Ultraseek, was tested and produced good results when used for the search and retrieval of consortium documents.13 Ultraseek can be used to search discrete XML files, supports the creation of custom interfaces for the search-and-retrieval system, and has strong documentation. Probably the most obvious limitation in this XML query-engine product concerned the creation of search fields. To contrast Ultraseek with a native XML solution: Ultraseek 5.0 (used during the product trial) lacked XPath support. Instead, it required a unique element-attribute combination for the creation of a database search field. Returning to the example, contextual uses of the element could not be indexed without recoding consortium documents to create a unique element-attribute combination on which to index. An XML-enabled database, DLXS XPAT, has been successfully used in several EAD projects, including OAC. One disadvantage of this product is that it requires a Unix operating system for the server. Additionally, XPAT, as a supporting toolset for digital-library collection building, provides functionality that duplicates other media tools at the host institution (specifically, OCLC/DiMeMa CONTENTdm).
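To make the field-searching contrast described above concrete, the sketch below defines search fields as context-specific paths, in the spirit of (but not using) TextML's index definitions. The <unittitle> element is chosen here purely as an illustration of an element that occurs at both the collection and the component level; real finding aids nest components more deeply than the flat paths assumed in this sketch.

```python
# Sketch: map named search fields to context-specific element paths.
import xml.etree.ElementTree as ET

FIELD_PATHS = {
    "collection_title": "archdesc/did/unittitle",          # collection-level use
    "component_title":  "archdesc/dsc/c01/did/unittitle",  # first-level component use
}

def build_index(ead_xml):
    """Return {field name: [values]} by evaluating each field's path."""
    root = ET.fromstring(ead_xml)
    return {name: [el.text for el in root.findall(path)]
            for name, path in FIELD_PATHS.items()}
```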
The use of a relational database management system (RDBMS) to establish search and retrieval for EAD XML documents was considered as well. The advantage of this approach is that it would enable the use of coding techniques built up through other web-based media delivery projects at the host institution. The most obvious negative issue is the need to map XML elements or attributes to tables and fields in an RDBMS, which, as Cyrenne notes, "is often expensive and will most likely result in the loss of some data such as processing instructions, and comments as well as the notion of element and attribute ordering."14 The use of native XML avoids the task of exploding XML data into the table and field structures of an RDBMS.

Table 1. NWDA project: evaluated search and retrieval products

Product              Vendor                  Product category                        License
MySQL/PHP            n/a                     relational database management system   open source
Tamino XML Server    Software AG             native XML database                     commercial
TextML               Ixiasoft                native XML database                     commercial
Ultraseek            Verity                  XML query engine                        commercial
Xindice              n/a                     native XML database                     open source
XML Harvester        Innovative Interfaces   integrated library system               commercial
XPAT                 DLXS                    XML-enabled database                    commercial

Finally, another approach considered was the use of an integrated library system product. This was a realistic option for NWDA because consortium member institutions had decided to include MARC encoding analogs for selected elements in union-catalog finding aids. Innovative Interfaces produces an XML harvester that can be used to generate MARC records from EAD finding aids that include MARC encoding analogs. For this project, a local (or self-contained) catalog could have been created and populated with MARC records containing metadata for the EAD documents, including a URL for online access. This approach offers important strengths and weaknesses. On the positive side, it is a relatively easy method for enabling search-and-retrieval access to EAD finding aids. In contrast to the interface coding requirements for TextML, the XML harvester provided an almost turnkey approach to XML search and retrieval. On the negative side, two factors stood out during the evaluation. First, it would be difficult to fully customize search-and-retrieval interfaces as needed for the project. Second, using the XML harvester, there is no ability to display search terms in the context of the finding aid. Search and retrieval is based upon the metadata extracted from the finding aid using the MARC analogs; in Michigan State's Voice Library implementation of this solution, the finding aid is an external resource with no highlighting of search terms.

Strengths and Weaknesses of the TextML Approach

Each project has its own specific needs; thus, there is no one correct approach to establishing search and retrieval for EAD XML documents. Taking the needs and resources of the NWDA consortium into account, Ixiasoft's TextML, a native XML product, provided the best fit and was licensed for use. The use of TextML enables the creation of customized interfaces for an XML database (or document base, to use the TextML terminology) and provides support for keyword and field searching of consortium documents.
The qualified XPath support in TextML enables search fields to be built upon precise element or attribute combinations within EAD documents. The existence of a major finding-aids Internet site employing TextML was also a factor in the project's selection of the software. The Access to Archives (A2A) site, accessible from the URL www.a2a.pro.gov.uk/, provides an excellent model for a publicly searchable finding-aid site. The A2A site supports keyword searching and searching by archival facility; provides multiple views of search results (a summary records screen, search terms in context, and the full record); highlights search term(s) in the displayed finding aid; and supports the presentation of large finding-aid documents. While A2A uses General International Standard Archival Description, or ISAD(G), as opposed to EAD for its description standard, the similarities between the two standards make the A2A site a valuable example for development.15

One weakness of TextML is the implementation model supported by Ixiasoft, which assumes significant local development of the application or web interface. The relationship between software capabilities and local development was considered with each of the products listed in Table 1. As noted, the Innovative Interfaces solution was the most straightforward approach, assuming the existence of the MARC analogs in EAD markup, but provided the least flexibility in terms of customization and of establishing a true linkage between the search system and the actual document. In contrast, while Ixiasoft makes available a base set of Active Server Pages using Visual Basic Script (ASP/VBScript) code for TextML application development and provides very good training and support services, the responsibility for
daniel pitti, "encod ed archi va l descrip tion: an introducti on and overview," d-lib magazine 5, no. 11 (nov. 1999). accessed nov. 2, 2004, www.dlib. org / dlib / novemb er99 / 11 pi tti.h tml. 3. daniel v. pitti and wendy m. duff (eds.), "introdu ction ," in encoded archival description on the internet (binghamton, n.y.: haworth, 2001), 3. 4. james m. roth, "serving up ead: an exploratory stud y on the deployment and utili zation of encoded archiva l description findin g aids, " the american archivist 64, n o. 2 (fall/ winter 2001): 226. 5. sarah l. shreeves et al., "h arvesting cultural her itage metada ta using the oai protocol," library hi tech 21, no. 2 (2003): 161. 6. nanc y fleck and michael seadle, "ead harv es ting for the na tional gallery of the spoken word" (pap er present ed at th e coalition for netw orked information fall 2002 task force meeting, san antoni o, tex., dec. 2002). accessed nov. 2, 2004, www.cn i.org/ tfms/20 02b. fall/ handout s/ h-ead-fleckseadl e.doc . 7. anne j. gilliland -swetland, "popularizing the finding aid : exploiting ead to enhance online discovery and retrieval," in encoded archival description on the internet (binghamton, n.y.: h aw orth, 2001), 207. 8. ibid , 210-14. 9. charlotte b. brown and brian e. c. schottlaender, "the online archive of california: a consor tia! approach to encoded archival description ," in encoded archival description on the internet (binghamton, n .y.: haworth , 2001), 99. 10. ibid, 103-5. oac available at: www. oac.cd lib. o rg/. accessed nov . 2, 2004 . 11. christ ophe r j. prom and thomas habing, "using the op en archives initiative protocols with ead," in proceed ings of the second acm/ ieee-cs joint conference on digital libraries (portland, ore., july 2002). accessed nov. 2, 2004, http:// dli .grainger. ui uc.ed u / publ ications/ jcdl20 02/ pl4prom .pdf . 12 . marc cyre nn e, "going n ative : wh en should you use a nativ e xml database?" aim e-doc magazine 16, no. 6 (nov./ dec. 2002), 16. accessed nov. 2, 2004, www .edocmag az ine.com / ar ticle_ new.asp?id=25421. 13. product categor y decisions based up on definitions and classifications available from : ronald bourret, "xml database products." accessed nov. 2, 2004, www. rp bourret.com / xml / xmld a t a b a se prods.htm. 14. cyrenne, "going native, " 18. 15. bill stockting, "ead in a2a," microsoft powerpoint present at ion. accessed n ov. 2, 2004, www.agad .archiwa. gov.pl/ ead / stocking.ppt. 16. uw e ho henst ein, "supporting xml in oracl e9i," in akmal b. chaudhri, 184 information technology and libraries i december 2004 awais rashid, and roberto zicari (eds.), xml data management: native xml and xml-enabled database systems (boston: add ison -wesley, 2003), 123-4 . using gis to measure in-library book-use behavior jingfeng xia this article is an attempt to develop geographic information syst ems (gis) technologi; into an analytical tool for examining the relationships between the height of the bookshelves and the behavior of library readers in utiliz ing books within a library. the tool would contain a database to store book-use information and some gis maps to represent bookshelves. upon analyzing the data stored in the database, different frequ encies of book use across bookshelf layers are displayed on the maps. the tool would provide a wonderful means of visualization through which analysts can quickly realize the spatial distribution of books used in a library. 
This article reveals that readers tend to pull books out of the bookshelf layers that are easily reachable by human eyes and hands, and it thus opens some issues for librarians to reconsider in the management of library collections.

Several years ago, when working as a library assistant reshelving books in a university library, the author noted that the majority of books used inside the library were from the mid-range layers of bookshelves. That is, proportionally, few books pulled out by library readers were from the top or bottom layers. Books on the layers that were easily reachable by readers were frequently utilized. Such a book-use distribution pattern made the job of reshelving books easy, but it raised some questions: how could book locations influence the choices of readers in selecting books? If this was not an isolated observation, it must have exposed an interesting

has shown that interactive television programs:
1. serve as an initial introduction to naive audiences of what a truly interactive system is all about;
2. are difficult to implement;
3. really aren't democratic;
4. are basically polling devices.

It has been said that the reason that railroads went out of business was that they insisted that they were in the railroad business and wouldn't admit that they were in the transportation business. If cable operators insist that they are in the television business, they may well miss the opportunities that are possible in the communications business or, in fact, in the information business. By the same token, if libraries miss the significance of what cable television is bringing to their business, their role in the community will be diminished and libraries may go the way of railroads. Modern communications and computers offer an opportunity for libraries to become the information choice in their community. In the near future, applications such as the home book club may well be a way to provide increased accessibility of library services to library patrons, and to "condition" those patrons to the coming electronic nature of libraries. Over the long term, libraries, if they have the courage and the foresight, can be the focus of the coming information and telecommunications revolution. The message is quite clear: opportunities abound.

An Informal Survey of the CTI Computer Backup System

Joseph Covino and Sheila Intner: Great Neck Library, Great Neck, New York.

In order to help decide whether or not to purchase computer backup systems from Computer Translation, Inc. (CTI),* for use when the CLSI LIBS 100 automated circulation system is not operating, Great Neck Library conducted an informal survey of libraries using both systems. Eleven institutions, including both public and academic libraries, responded to a brief questionnaire.
They were asked what size CTI system they had purchased and why, how easily it was installed, how well it performed, how it was maintained, and whether CLSI acknowledged that the addition of the backup did not affect their LIBS 100 maintenance agreements. Before summarizing the responses, the structure of the two systems and how they interact should be outlined.

CLSI LIBS 100

The CLSI automated circulation system consists of a stand-alone minicomputer console with local and/or remote terminals connected to it through individual ports by means of electrical and/or dedicated telephone line hookups. When it operates, the terminals are online and interactive with the database, which is stored on one or more multiplatter disc packs.

CTI Backup

The CTI backup system is based on an Apple II microcomputer with two minidisc drives, which take 5 1/4-inch floppy discs, a TV monitor, and a switching system that can be connected to the LIBS 100 console or its terminals. The CTI system can also be used alone. When the LIBS 100 is down (inoperative), the CTI system is connected to a terminal, and data is recorded on its discs for later dumping (data entry) into the database via a port connection. It appears to the public and to the library staff member operating the backup-terminal combination that the terminal is working. There is, however, no connection between the backup unit and the database in this mode. When the LIBS 100 is up (operating) once again, the backup is connected and data is automatically dumped. Naturally the port cannot be used by both the CLSI terminal and the backup unit at the same time without the addition of other hardware. The terminals attached to other ports may operate normally while dumping is completed. The CLSI and CTI software, which operate compatibly, are owned by the respective companies, not the library.

*CTI is a profit-making company wholly owned by Brigham Young University. The CTI backup system was originally developed to support the CLSI installation at BYU.

The Responses

1. Size of system: CTI systems are available in two sizes, 32K and 48K. Two libraries purchased the smaller system, nine purchased the larger system, and one purchased both. The greater programming capabilities of the larger system were considered its greatest asset.

2. Reason for purchase: Five libraries indicated they use the backup for other purposes in addition to substituting for the LIBS 100 when it is down. Among these other purposes were development of a community information database, personnel and financial reports and files, use as an RLIN terminal, use as a bookmobile terminal, and use as an aid in converting short-title bibliographic records to expanded format.

3. Installation: Respondents were unanimous in having no problems with installation. Seven did their own installation, while CTI gave instructions over the phone. Three were installed by CTI, who also trained the library staff in its operation. One library indicated the accompanying documentation was enough to install the system without assistance.

4. Performance: All eleven respondents were enthusiastic about system performance. Some comments were, "it's the best thing since buttered popcorn," and "we love it dearly . . . it saves hours . . . works just fine." Many commented on the slow dumping time as the biggest drawback, but noted that increased accuracy over manual entry and decreased pressure on their circulation staff during downtime were assets.

5.
the responses.
1. size of system: cti systems are available in two sizes, 32k and 48k. two libraries purchased the smaller system, nine purchased the larger system, and one purchased both. greater programming capabilities of the larger system were considered its greatest asset.
2. reason for purchase: five libraries indicated they use the backup for other purposes in addition to substituting for the libs 100 when it is down. among these other purposes were development of a community information database, personnel and financial reports and files, use as an rlin terminal, as a bookmobile terminal, and as an aid in converting short-title bibliographic records to expanded format.
3. installation: respondents were unanimous in having no problems with installation. seven did their own installation, while cti gave instructions over the phone. three were installed by cti, who also trained the library staff in its operation. one library indicated the accompanying documentation was enough to install the system without assistance.
4. performance: all eleven respondents were enthusiastic about system performance. some comments were, "it's the best thing since buttered popcorn," and "we love it dearly ... it saves hours ... works just fine." many commented on the slow dumping time as the biggest drawback, but noted that increased accuracy over manual entry and decreased pressure on their circulation staff during downtime were assets.
5. maintenance: backup system maintenance is not uniform. six respondents said that software was maintained by cti, but hardware was maintained by an apple dealer; or they were undecided about who would be responsible for hardware repairs. a seventh library contracted with an apple dealer for hardware repairs, but was contending over software maintenance with cti. three libraries answered that cti was maintaining the system, but did not specify both hardware and software. the last respondent expected to take hardware repairs to an apple dealer and did not mention software.
6. clsi maintenance agreements: one library stated that they had written assurance from clsi that the installation of the backup system would not affect their libs 100 maintenance contract. three more said they had verbal assurances. five respondents indicated no assurances from clsi that the libs 100 contract was not affected. one library sent a copy of a clsi letter defining company policy in this area. it said, in part: "clsi does not prohibit the attachment of foreign devices to the systems. ..." qualifications to this statement involved an institution's attempt to repair the libs 100 itself, to hold clsi responsible for damage resulting from the attachment of the device, or to have clsi maintain the device.

the great neck library decided to purchase two cti backup systems for use when the libs 100 is down. experience bears out the findings of the survey; i.e., it is easy to install the system with only telephone assistance; it works well; and, though data transmission to the main unit is slow, it is accurate and removes some of the desperation from a downtime situation. great neck library is also planning to use the apples for other functions, which, it is hoped, will be implemented soon.

multimedia catalog: com and online
kenneth j. bierman: tucson public library, tucson, arizona.
like many public libraries, the tucson public library (tpl) is closing its card catalog and implementing a vendor-supplied microform catalog. unlike most of these other libraries, however, the tpl microform catalog will not include location or holding information. the indication of where copies of a particular title are actually available (i.e., which of the fifteen possible branch locations) will be available only by accessing a video display terminal connected to the online circulation and inventory control system. conceptually, the tpl catalog will be in two parts, with each part intended to serve different functions.1 the microform catalog (copies available in both film and fiche format) will fulfill the bibliographic function of the catalog. this catalog will contain bibliographic description and provide the traditional access points of author, title, and subject. the online catalog (online terminals are in place at all reference desks and a few public access terminals will also be available) will fulfill the finding or locating function of the catalog. this catalog will contain very brief bibliographic description, will only be searchable by author, title, author/title, and call number, and will contain the current status of every copy of every title in the library system (i.e., on shelf, checked out, at bindery, reported missing, etc.). why did the tucson public library make this decision? there are two major reasons: 1. accuracy. the location information, if provided in the microform catalog, would always be inaccurate and out of date.
assuming that the locations listed in the latest edition of the microform catalog were completely accurate when the catalog was first issued (an unrealistic assumption to begin with, as anyone who has ever worked with location information at a public library with many branches well knows!), the location information would become increasingly less accurate with each day because of the large number of withdrawals, transfers, and added-copy transactions that occur (more than 100,000 a year). in addition, at any given time, one-quarter to one-third of the materials in busy branches are not on the shelf because they are either checked out or waiting to be reshelved. thus, the microform catalog would indicate that these materials were available at specific branches when a significant percentage would in fact not be available at any given time. in short, even in the best of circumstances, easily half of the location information would be incorrect in telling a user where a copy of a title was actually available at that moment. 2. cost. a study done at the tucson public library indicated that close to half of the staff time of the cataloging department was spent dealing with location and holding information. this time includes handling transfers, withdrawals, and added copies. all of this record keeping is already being done as a part of the online circulation and inventory control system (the tucson public library has no card shelflist containing copy and location information but rather relies completely on the online file for this type of information). to "duplicate" the information in the microform catalog would cost an estimated $40,000 to $60,000 a year, and the information in the microform catalog would never be accurate or up to date for the reasons outlined above. figure 1 is a brief summary of how the bibliographic system will work. would the system in figure 1 be improved if holdings were included in the microform catalog? on the surface, the obvious answer is yes-more information is

managing metadata for philatelic materials
megan ozeran
abstract. stamp collectors frequently donate their stamps to cultural heritage institutions. as digitization becomes more prevalent for other kinds of materials, it is worth exploring how cultural heritage institutions are digitizing their philatelic materials. this paper begins with a review of the literature about the purpose of metadata, current metadata standards, and metadata that are relevant to philatelists. the paper then examines the digital philatelic collections of four large cultural heritage institutions, discussing the metadata standards and elements employed by these institutions. the paper concludes with a recommendation to create international standards that describe metadata management explicitly for philatelic materials.

introduction
postage stamps have existed since great britain introduced them in 1840 as a way to prepay postage. historian and professor winthrop boggs (1955) points out that postage stamps have been collected by individuals since 1841, just a few months after the first stamps were issued (5). to describe this collection and research, the term philately was coined by a french stamp collector, georges herpin, who "combined two greek words philos (friend, amateur) and atelia (free, exempt from any charge or tax, franked)" (boggs 1955, 7).
thus postage stamps and related materials, such as the envelopes to which they have been affixed, are considered philatelic materials. in the united states, numerous societies have formed around philately, such as the american philatelic society, the postal history society, the precancel stamp society, and the sacramento philatelic society (in northern california). the definitive united states authority on stamps and stamp collecting for nearly 150 years has been the scott postage stamp catalogue, which was first created by john walter scott in 1867 (boggs 1955, 6). the scott catalogue "lists nearly all the postage stamps issued by every country of the world" (american philatelic society 2016). philately is a massively popular hobby, and cultural heritage institutions have amassed large collections of postage stamps through collectors' donations. in this paper, i will examine how cultural heritage institutions apply metadata to postage stamps in their digital collections. libraries, archives, and museums have obtained specialized collections of stamps over the decades, and they have used various ways to describe these collections, such as through creating finding aids. only recently have institutions begun to digitize their stamp collections and make the collections available for online review, as digitization in general has become more common in cultural heritage institutions. megan ozeran (megan.ozeran@gmail.com), a recent mlis degree graduate from san jose state university school of information, is winner of the 2017 lita/ex libris student writing award.

problem statement
textual materials have received much attention in regards to digitization, including the creation and implementation of metadata standards and schemas. philatelic materials are not like textual materials, and are not even like photographic materials, which have also received some digitization attention. in fact, there is very little literature that currently exists describing how metadata is or should be applied to philatelic materials, even though digital collections of these materials already exist. therefore, the goal of this paper is to examine exactly how metadata is applied to digital collections of philatelic materials. several related questions drove the research about this topic: as institutions digitize stamp collections, what metadata schema(s) are they using to do so? are current metadata standards and schemas appropriate for these collections, or have institutions created localized versions? what metadata elements are most crucial in describing philatelic materials to enhance access in a digital collection?

literature review
while there is abundant literature regarding the use of metadata for library, archives, and museum collections, there is a dearth of literature that specifically discusses the use of metadata for philatelic materials. indeed, there is no literature at all that analyzes best practices for philatelic metadata, despite the fact that several large institutions have already created digital stamp collections. even among the many metadata standards that have been created, very few specify metadata guidelines for philatelic collections. it is clear that philatelic collections have not been highlighted in discussions over the last few decades about digitization, so best practices must be inferred based on the more general discussions that have taken place.
the purpose and quality of metadata
when considering why metadata is important to digital collections (of any type), it is crucial to remember, as david bade (2008) puts it, "users of the library do not need bibliographic records at all. ... what they want is to find what they are looking for" (125). in other words, the descriptive metadata in a digital record is important only to the extent that it facilitates the discovery of materials that are useful to a researcher. as arms and arms (2004) point out, "most searching and browsing is done by the end users themselves. information discovery services can no longer assume that users are trained in the nuances of cataloging standards and complex search syntaxes" (236). echoing these sentiments, chan and zeng (2006) write, "users should not have to know or understand the methods used to describe and represent the contents of the digital collection" (under "introduction"). when creating digital records, then, institutions need to consider how the creation, display, and organization of metadata (especially within the search system) make it easier or more difficult for those end users to effectively search the digital collection. how effective metadata is in facilitating user research is ultimately dependent upon the quality of that metadata. bade (2007) notes that information systems are essentially a way for an institution to communicate with researchers, and that this communication is only effective if metadata creators understand what the end users are looking for in the content and style of communication (3-4). thus, in somewhat circular fashion, metadata quality is dependent upon understanding how best to communicate with end users. to help define discussions of metadata quality, bruce and hillmann (2004) suggest seven factors to consider: "completeness, accuracy, provenance, conformance to expectations, logical consistency and coherence, timeliness, and accessibility" (243). deciding how to prioritize one or several factors over the others will depend on the resources and goals of the institution, as well as the ultimate needs of the end users.
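bruce and hillmann's factors are criteria rather than an algorithm, but a rough checklist can make the idea concrete. the python sketch below scores a record for one factor, completeness, against a locally chosen set of required fields; the field names and the scoring rule are illustrative assumptions for this example, not part of bruce and hillmann's framework.

QUALITY_FACTORS = [
    "completeness", "accuracy", "provenance", "conformance to expectations",
    "logical consistency and coherence", "timeliness", "accessibility",
]

# fields a hypothetical institution decides every stamp record must carry
REQUIRED_FIELDS = ["title", "date_of_issue", "place_of_issue", "description", "identifier"]

def completeness(record: dict) -> float:
    """fraction of locally required fields that are present and non-empty."""
    present = sum(1 for name in REQUIRED_FIELDS if record.get(name))
    return present / len(REQUIRED_FIELDS)

sample = {"title": "penny black", "date_of_issue": "1840", "identifier": "EX-0001"}
print(f"completeness: {completeness(sample):.0%}")  # -> completeness: 60%

how the remaining six factors are weighted is the institution's call, which is exactly the prioritization decision described above.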
the state of standards
standards are created by various organizations to define the rules for applying metadata to certain materials in certain settings. standards generally describe a metadata schema, "a formal structure designed to identify the knowledge structure of a given discipline and to link that structure to the information of the discipline through the creation of an information system that will assist the identification, discovery and use of information within that discipline" (cc:da 2000, under "charge #3"). essentially, a metadata schema standard demonstrates how best to organize and identify materials to enhance discovery and use of those materials. such standards are helpful to catalogers and digitizers because they define rules for how to include content, how to represent content, and/or what the allowable content values are (chan and zeng 2006, under "metadata schema"). unfortunately, very few current metadata standards even mention philatelic materials, despite their unique nature. the only standard that appears to do so with any real purpose is the canadian rules for archival description (rad), created by the bureau of canadian archivists in 1990 and revised in 2008. thirteen chapters comprise the first part of the rad, and these chapters describe the standards for a variety of media. philatelic materials are given their own focus in chapter 12, which discusses general rules for philatelic description as well as specifics for each of nine areas of description: title and statement of responsibility, edition, issue data, dates of creation and publication, physical description, publisher's series, archival description, note, and standard number. the rad therefore provides a decent set of guidelines for describing philatelic materials. the encoded archival description tag library created by the society of american archivists (ead3, updated in 2015) mentions philatelic materials only in passing. there is no specific section discussing how to properly apply descriptive metadata to philatelic materials. the single mention of such materials in the entire ead3 documentation is in the discussion of the tag, where it is noted that "jurisdictional and denominational data for philatelic records" (257) may be recorded. other standards don't appear to mention philatelic materials at all, so implementers of those standards must extrapolate based on the general information provided. for example, describing archives: a content standard (dacs), also published by the society of american archivists (2013), does not discuss philatelic materials in any way. it does note, "different media of course require different rules to describe their particular characteristics…" (xvii), but the recommendations for specific content standards for different media listed in appendix b still leave out philately (141-142). institutions using dacs for philatelic materials need to determine how to localize the standard. although marc similarly does not include specific guidelines for philatelic materials, peter roberts (2007) suggests ways to effectively use it for cataloging philatelic materials. for instance, in the marc 655 field he suggests using the getty art and architecture thesaurus terms to describe the form of the materials and the library of congress subject headings to describe the subjects (genres) of the materials (86-87). in similar ways, most standards could potentially be applied to philatelic materials if an institution were to provide additional local rules for how to best implement the standard.
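roberts's 655 suggestion can be made concrete with a small sketch. the snippet below formats marc 655 genre/form fields as display strings in python; the specific terms shown are examples only, and the formatting helper is an assumption made for illustration rather than a quotation of roberts's own examples or of any marc toolkit.

def marc_655(term: str, source: str) -> str:
    """render a 655 genre/form field for display.
    second indicator 0 means the term comes from lcsh; indicator 7 plus a $2
    subfield names another controlled vocabulary (here, the getty aat)."""
    if source == "lcsh":
        return f"655  0 $a {term}"
    return f"655  7 $a {term} $2 {source}"

# form of the material from the art & architecture thesaurus,
# subject/genre from library of congress subject headings (example terms only)
fields = [
    marc_655("postage stamps", "aat"),
    marc_655("postage stamps -- collectors and collecting", "lcsh"),
]
print("\n".join(fields))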
the metadata that philatelists want
there are actually a good number of resources for determining what metadata is important to philatelic researchers. boggs (1955) suggests that a philatelist may want to "study the methods of production; the origin, selection, and the subject matter of designs; their relation to the social, political and economic history of the country of issue; the history of the postal service which issued them" (1-2). these few initial research suggestions can provide some insight into what metadata elements would be most useful in a digital record. david straight (1994) suggests the most basic crucial items are the date and country of issue for an item (75). roberts (2007) provides significant background about philatelic materials and research, and indicates multiple metadata elements that will be helpful for researchers. he reiterates that dates are extremely useful, and are often identified on the materials themselves; when specific dates are not visible, a stamp itself may provide evidence of an approximate year based on when the stamp was issued (75). he notes that many of the postal markings also "indicate the time and place of origin, route, destination, and mode of transportation" (78), which will also be of interest to philatelic researchers. if any information is available about the original collector, dealer, or exhibitor of the stamp before it was acquired by a cultural heritage institution, this may also be of great interest to a researcher (81). roberts also suggests that the finding aids for philatelic collections are more crucial places for description than for specific item records, and that controlled vocabulary subject terms are important in these descriptions (86). because the scott postage stamp catalogue is the leading united states authority on stamps, it can also suggest the metadata elements that primarily concern philatelic researchers. each listing includes a unique scott number, paper color, variety (e.g., perforation differences), basic information, denomination, color of the stamp, year of issue, value used/unused, any changes in the basic set information, and the total value of the set (scott publishing co. 2014, 14a). the scott catalogue also describes a variety of additional components that researchers may be interested in, including the type of paper used, any watermarks, inks used, separation type, printing process used, luminescence, and gum condition (19a-25a). one additional interesting source for deciding what metadata is important to researchers (aside from directly surveying them, of course) is a piece of software that was created to help philatelists catalog their own private collections. stampmanage is available in united states and international versions, and it is largely based on the scott postage stamp catalogue in creating the full listing of stamps that may be available to a collector. it includes a wide variety of metadata elements for cataloging stamps, such as the scott number, country of origin, date of issue, location of issue, type of stamp, denomination, condition, color, brief description, presence and type of perforations, category, plate block size, mint sheet size, paper type, presence and type of watermark, gum type, and so forth (liberty street software 2016). as a product that is sold to stamp collectors, stampmanage is likely to have a confident grasp of all the metadata that could possibly be important to its customers. this literature review helps create a holistic view of the issues faced by cultural heritage institutions with digitized stamp collections. although little progress has been made in the literature to describe how best to apply metadata to philatelic materials, there are ways that institutions can extrapolate guidelines from the literature that does exist.

methodology
to explore my research questions, i interviewed (over email) representatives of several large institutions with digitized stamp collections. the information provided by these institutions sheds light on the current state of metadata and metadata schemas for philatelic collections. note that there are other institutions with online collections of postage stamps that are not discussed in this paper (e.g., the swedish postal museum, https://digitaltmuseum.se/owners/s-pm). due to my own language limitations, this paper is limited to analysis of online collections that are described in english.
additional research into institutions with non-english displays would support greater analysis of how cultural heritage institutions are currently creating and providing philatelic metadata.

results
smithsonian national postal museum
in the united states, the largest publicly accessible digital collection of philatelic materials is from the smithsonian national postal museum. i discussed the metadata for this collection with elizabeth heydt, collections manager at the museum. ms. heydt stated that the stamps are primarily identified "by their country and their scott number" (e. heydt, pers. comm., october 5, 2016). for digital collections, the smithsonian national postal museum uses a gallery systems database called the museum system, which includes the getty art and architecture thesaurus as an embedded thesaurus. ms. heydt noted that aside from this embedded thesaurus, they "do not use any additional, formalized data standards such as the dublin core, mods," or the like. of note, the museum system does allege compliance with "standards including spectrum, cco, cdwa, dacs, chin, lido, xmp, and other international standards" (gallery systems 2015, 4). the end user interface that pulls data from the museum system is called arago, which has "an internal structure that built on the scott catalogue system and some internal choices for grouping and classifying objects for the philatelic and the postal history collections." users can search and browse the entire digital collection through arago, but ms. heydt did note that arago "is in stasis right now as we are in the planning stages for an updated version sometime in the near future." based on an example record (http://arago.si.edu/record_145471_img_1.html), the descriptive metadata currently available for end users include a title, scott number, detailed description (including keywords), date of issue, medium, museum id (a unique identifier), and place of origin. digital images of the stamps are also included. a set of "breadcrumb" links at the top of the page also allows a user to browse each level of the digital collection, from an individual stamp record up to the entire museum collection as a whole.

library and archives canada
i discussed the library and archives canada (lac) online philatelic collection with james bone, archivist at the lac. he explained that the philatelic collection has had a complicated history:

our philatelic collection largely began with the dissolution of the national postal museum … in 1989 and the subsequent division and transfer of its collection to the canadian postal museum for artifacts/objects at the former canadian museum of civilization (now the canadian museum of history) and to the canadian postal archives at the former national archives (which was merged with the national library in the mid-2000s to create library and archives canada). as a side note, both the canadian postal museum and the canadian postal archives are themselves now defunct – although lac still acquires philatelic records and records related to philately and postal administration, these functions are no longer handled by a dedicated section but rather by archivists within our government records branch and our private records branch (the latter being me). (j. bone, pers. comm., october 11, 2016)

regarding the collection's metadata, mr.
bone confirmed that the archival records at the lac all conform to the rad standard (discussed in the literature review above), and that philatelic materials are all given "at least a minimum level of useful file level or item level description for philatelic records based on chapter 12 of rad," the chapter that specifically discusses philatelic materials. unfortunately, to his knowledge, the online database for these records does not use a common metadata standard such as oai-pmh that enables "external metadata harvesting or querying," so the system is not searchable outside of the lac website. mr. bone also pointed out that there are fields visible on the back end of the lac online database that are not visible to end users, and the most notable of these omissions is the scott number (the number assigned to every stamp by the scott catalogue). he wrote that it seemed "bizarre" to not have the scott number visible, "as that's definitely an access point that i would expect philatelic researchers to use to narrow down a result set to the postage stamp issue of interest." however, it appears this invisibility was a decision consciously made by the lac, based on mr. bone's review of an internal lac standards document. based on an example record (http://collectionscanada.gc.ca/pam_archives/index.php?fuseaction=genitem.displayitem&lang=eng&rec_nbr=2184475), the following fields are available for end users to view: title, place of origin, denomination, date of issue, title of the collection of which it is a part, extent of item, language, access conditions, terms of use, mikan number (a unique identifier), itemlev number (deprecated), and any additional relevant information such as previous exhibitions of the physical item.

the postal museum
the postal museum in london is set to open its physical doors in 2017, but much of the collection is already available for browsing and searching online. stuart aitken, curator, philately, explained to me that the online collection uses the general international standard archival description, second edition, as the primary metadata schema, but the online collection also includes "non isad(g) fields for certain extra-specific data for our archive material, including philatelic material" (s. aitken, pers. comm., december 1, 2016). based on my own review of the isad(g) standards document (international council on archives 1999) and an example record from the postal museum's online collection (http://catalogue.postalmuseum.org/collections/getrecord/gb813_p_150_06_02_011_01_001#current), it appears nearly all the fields are based on the isad(g) standards. these fields include information such as date, level of description, extent of item, language, description, and conditions for access and reproduction. only the field for "philatelic number" appears to be extra. there may be additional non-isad(g) fields that are not included in the example record above, but are included in other records when the extra information is available and relevant. each digital record also allows end users to submit tags for help with identification and search. no tags had yet been submitted on the example record reviewed above, but this is likely because the online collection is still rather new. of note, digital records are created at each archival level, from the broadest collection category down to the individual item (similar to the smithsonian national postal museum collection).
to provide an additional way to browse the collection, a sidebar in each digital record shows where it exists in the hierarchy of collections and provides links to each broader collection of which the current record is a part.

the british museum
i reached out to the folks at the british museum to discuss the application of metadata to their online records for postage stamps, but at the time of this writing i have not received any response. however, some information can be gleaned from examining the website. unlike the other institutions reviewed in this paper, the british museum's online collection includes a wide variety of objects. postage stamps are therefore identified in the online collection by specifying "postage-stamp" in the "object type" field, which likely uses a controlled vocabulary. based on an example record (http://www.britishmuseum.org/research/collection_online/collection_object_details.aspx?objectid=1102502&partid=1&searchtext=postage+stamp&page=1), each record for a postage stamp lists the museum number (a unique identifier), denomination, description, date issued, country of origin, materials, dimensions, acquisition name and date, department, and registration number (which appears to be the same as the museum number). digital images of the stamps are occasionally included. the collection website notes that the british museum is "continuing every day to improve the information recorded in it [the digital collection] and changes are being fed through on a regular basis. in many cases it does not yet represent the best available knowledge about the objects" (trustees of the british museum 2016a, under "about these records"). therefore, end users are encouraged to read the information in any given record with care, and to provide feedback if they have any additional information or corrections about an object. the online collection also is offered in machine-readable format, via linked data and sparql, to encourage wider accessibility and use. the website advises:

the use of the w3c open data standard, rdf, allows the museum's collection data to join and relate to a growing body of linked data published by other organisations around the world interested in promoting accessibility and collaboration. the data has also been organised using the cidoc crm (conceptual reference model) crucial for harmonising with other cultural heritage data. the cidoc crm represents british museum's data completely and, unlike other standards that fit data into a common set of data fields, all of the meaning contained in the museum's source data is retained. (trustees of the british museum 2016b)

each digital object has rdf and html resources, as well as a sparql endpoint with an html user interface.
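to show what that machine-readable access makes possible, here is a hedged python sketch that queries a sparql endpoint with the sparqlwrapper library. the endpoint address and the object uri below are placeholders rather than confirmed values, and the museum's actual graph structure (cidoc crm classes and properties) should be checked against its own documentation before relying on any particular query.

from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

ENDPOINT = "http://collection.britishmuseum.org/sparql"                # placeholder address
OBJECT_URI = "http://collection.britishmuseum.org/id/object/EXAMPLE"   # hypothetical object uri

sparql = SPARQLWrapper(ENDPOINT)
sparql.setReturnFormat(JSON)
# list every property/value pair asserted about one object; this generic query
# works against any sparql endpoint without assuming a particular ontology
sparql.setQuery(f"""
    SELECT ?property ?value
    WHERE {{ <{OBJECT_URI}> ?property ?value }}
    LIMIT 25
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["property"]["value"], "->", row["value"]["value"])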
discussion
the information from the four institutions above provides a starting point for examining best practices for philatelic metadata. in the following discussion, i will review the information in light of the research questions: important metadata elements, the standards that were implemented, and whether the standards that currently exist have been sufficient. as explained in the literature review above, relevant metadata are crucial for enhancing end user research of digital records. this suggests that similarity of metadata across collections of the same type will improve users' ability to conduct their research. unfortunately, there are only a few descriptive metadata fields used across all four of the institutions reviewed in this paper. these fields include a title (sometimes used very loosely), the date of issue, the place of issue, a description, and a unique identifier (a sketch of a record built on this minimum appears at the end of this discussion). these fields certainly seem to be the absolute minimum necessary for identifying (and searching for) a postage stamp, since they are among the fields discussed in the literature review as being important to philatelic researchers. other fields that are included in some but not all of the above collections, such as stamp denomination and access conditions, are nonetheless quite relevant to online collections of postage stamps. interestingly, although the scott catalogue is recognized as a premier stamp catalogue, only one institution (the smithsonian national postal museum) currently uses the scott identification number as part of the standard philatelic metadata. as noted above, the library and archives canada does include the scott number in the behind-the-scenes metadata, but it does not display the scott number to end users. the postal museum and the british museum don't use the scott number at all. it appears that only the smithsonian believes the scott number is useful to end users, either for search or identification purposes. of the four institutions, it appears that only the british museum uses metadata standards that increase the accessibility of the online collection beyond its own website. the implementation of rdf for linked data creates an open collection that is machine-readable beyond the internal database used by the museum. the smithsonian national postal museum, library and archives canada, and the postal museum do not appear to use any similar metadata standard for data harvesting or transmission, which means that these collections can only be searched from within their respective websites. the most important thing to note in reviewing the online collections for these four institutions is the fact that each institution uses different standards to apply metadata in a different way. frankly, this is not a surprise. as discussed in the literature review above, although metadata standards exist for a variety of materials, philatelic materials are simply not considered. only the canadian rules for archival description explicitly include information about philatelic materials; accordingly, the library and archives canada utilizes these rules when creating its online records of postage stamps. no similar standard exists in the united states or internationally, leaving individual institutions with the task of deciding what generic metadata standard to use as a jumping off point, and then modifying it to meet local needs. as described above, the smithsonian national postal museum uses the metadata schema that comes with their collection management software, and has created an end-user interface based off of internal metadata decisions. the postal museum based their metadata primarily off of isad(g), an international metadata standard with no specific suggestions for philatelic materials. i was unable to confirm the base metadata schema the british museum employs, although it is clear they use rdf to make the collection's digital records more widely available. each institution appears to be using a different base metadata standard, essentially requiring them to reinvent the wheel upon deciding to digitize philatelic materials. this is what happens when there is no single, unified standard available for the type of material being described.
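as a way of writing that shared minimum down, the sketch below models a philatelic record in python. it is one possible local structure, not a proposed standard and not any institution's actual schema; the optional fields simply echo elements mentioned earlier in the paper (scott number, denomination, access conditions).

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PhilatelicRecord:
    # the fields shared by all four collections reviewed above
    title: str
    date_of_issue: str            # kept as text, since stamps are often dated only approximately
    place_of_issue: str
    description: str
    identifier: str               # the institution's unique id

    # elements present in some collections and in the philatelic literature
    scott_number: Optional[str] = None
    denomination: Optional[str] = None
    access_conditions: Optional[str] = None
    subjects: list[str] = field(default_factory=list)

example = PhilatelicRecord(
    title="penny black",
    date_of_issue="1840",
    place_of_issue="great britain",
    description="the first adhesive postage stamp, issued to prepay postage",
    identifier="EXAMPLE-0001",
)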
conclusion
as this paper has shown, metadata standards are sorely lacking when it comes to philatelic materials. other kinds of materials have received special considerations because more and more institutions decided it would be important to digitize them, so various groups came together to create standards that provide some guidance. it is time for this to happen for philatelic materials as well. there aren't many cultural heritage institutions that currently manage digital collections of philatelic materials, so this is an opportunity for those who plan to digitize their collections to consider what has been done and what makes sense to pursue. it is clear that philatelic digitization is still nascent, but as with other kinds of materials, it is only likely that more and more institutions will attempt digitization projects. it is hoped that this paper can serve as a jumping off point for institutions to discuss the creation of international metadata standards specifically for philatelic materials.

acknowledgements
many thanks are owed to the people who took time out of their very busy lives to respond to the unrefined inquiries of an mlis grad student: stuart aitken (curator, philately, the postal museum); james bone (archivist, private archives branch, library and archives canada); and elizabeth heydt (collections manager, smithsonian national postal museum). their expertise and responsiveness are immensely appreciated.

references
aape (american association of philatelic exhibitors). 2016a. "aape join/renew your membership." http://www.aape.org/join_the_aape.asp.
–––––. 2016b. "exhibits online." http://www.aape.org/join_the_aape.asp.
american philatelic society. 2016. "stamp catalogs: your guide to the hobby." accessed december 8. http://stamps.org/how-to-read-a-catalog.
arms, caroline r., and william y. arms. 2004. "mixed content and mixed metadata: information discovery in a messy world." in metadata in practice, edited by diane i. hillman and elaine l. westbrooks, 223-37. chicago, il: ala editions.
bade, david. 2007. "structures, standards, and the people who make them meaningful." paper presented at the 2nd meeting of the library of congress working group on the future of bibliographic control, chicago, il, may 9, 2007. https://www.loc.gov/bibliographicfuture/meetings/docs/bade-may9-2007.pdf.
bade, david. 2008. "the perfect bibliographic record: platonic ideal, rhetorical strategy or nonsense?" cataloging & classification quarterly 46 (1): 109-33. https://doi.org/10.1080/01639370802183081.
boggs, winthrop s. 1955. the foundations of philately. princeton, nj: d. van nostrand company.
bruce, thomas r., and diane i. hillmann. 2004. "the continuum of metadata quality: defining, expressing, exploiting." in metadata in practice, edited by diane i. hillman and elaine l. westbrooks, 238-56. chicago, il: ala editions.
bureau of canadian archivists. 2008. rules for archival description. rev. ed. ottawa, canada: canadian council of archives. http://www.cdncouncilarchives.ca/archdesrules.html.
cc:da (american library association committee on cataloging: description and access). 2010. "task force on metadata: final report." american library association. https://www.libraries.psu.edu/tas/jca/ccda/tf-meta6.html.
chan, lois m., and marcia l. zeng. 2006. "metadata interoperability and standardization – a study of methodology part i: achieving interoperability at the schema level." d-lib magazine 12 (6). https://doi.org/10.1045/june2006-chan.
gallery systems. 2015. "tms: the museum system." http://go.gallerysystems.com/abouttms.html.
international council on archives. 1999. isad(g): general international standard archival description. 2nd ed. stockholm, sweden: international council on archives. http://www.icacds.org.uk/eng/isad(g).pdf.
liberty street software. 2016. "stampmanage: the best way to catalog your stamp collection." http://www.libertystreet.com/stamp-collecting-software.htm.
roberts, peter j. 2007. "philatelic materials in archival collections: their appraisal, preservation, and description." the american archivist 70 (1): 70-92. https://doi.org/10.17723/aarc.70.1.w3742751w5344275.
scott publishing co. 2014. scott 2015 standard postage stamp catalogue. vol. 3, countries of the world, g-i. sidney, oh: scott publishing co.
society of american archivists. 2013. describing archives: a content standard. 2nd ed. chicago, il: society of american archivists. http://files.archivists.org/pubs/dacs2e-2013_v0315.pdf.
society of american archivists. 2015. encoded archival description tag library, version ead3. chicago, il: society of american archivists. http://www2.archivists.org/sites/all/files/taglibrary-versionead3.pdf.
straight, david. 1994. "adding value to stamp and coin collections." library journal 119 (10): 75-78. accessed december 8, 2016. http://libaccess.sjlibrary.org/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=ulh&an=9406157617&site=ehost-live&scope=site.
trustees of the british museum. 2016a. "about the collection database online." accessed december 8. http://www.britishmuseum.org/research/collection_online/about_the_database.aspx.
–––––. 2016b. "british museum semantic web collection online." accessed december 8. http://collection.britishmuseum.org/.

book reviews
information ecologies: using technology with heart by bonnie a. nardi and vicki l. o'day. cambridge: mit pr., 1999. 232p. $27.50 (isbn 0-262-14066-7).
the media equation: how people treat computers, television, and new media like real people and places by byron reeves and clifford nass. cambridge: cambridge univ. pr., 1996 and 1999. 305p. $28.95 (isbn 1-575-86052-x); paper, $15.95 (isbn 1-575-86053-8).
the books i am reviewing this month are interrelated because they both focus on information technology and our changing world, with the two volumes looking at different levels of the picture. the broader, and to me more intriguing, view is presented by nardi and o'day in their wonderful book information ecologies. although it is not clear from the capsule biographies of the dust jacket, nardi and o'day are anthropologists who study the world of technology in a number of locales, and they here report the findings from their field work. among the case studies they discuss are an examination of the activities of reference librarians at two corporations and a look at a virtual world created for and by elementary school students. but they do much more than simply present case studies, although these alone make the book a worthwhile read.
in addition, they argue that the most useful way to look at information technology is through the metaphor of "information ecologies," "system[s] of people, practices, values, and technologies in ... particular local environment[s]." they adopt this biological metaphor after carefully considering the most commonly employed information technology metaphors: technology as tool, text, or system. in turn, they find each of these metaphors wanting. it is particularly important to choose carefully the metaphorical lenses through which technological developments are viewed. each particular metaphor has consequences for how sanguinely we view a technology, and it is often worthwhile to use multiple metaphors to enhance our world view. the information ecology metaphor is particularly appropriate for an anthropological view of local "habitats" and their inhabitants and artifacts. in turn, an anthropological view is particularly apt for capturing the human side of technology (thus the subtitle: using technology with heart). this is a side of things that can be overlooked in other metaphorical views, particularly since it requires that the sticky issue of values be considered. unfortunately for all of us, there is a reluctance to talk of human values when considering technology. as nardi and o'day note, there is a tendency to either enthusiastically applaud new technology without regard to its effects, or to condemn all new technology as inherently debasing to humanity, or to simply resign oneself pessimistically to the inevitable development of technology and our lack of control over it. nardi and o'day tend to be cautious optimists, claiming that we can control technology, and the way to exercise that control is through our own local encounters with information ecologies. thus, rather than bemoaning the dehumanizing effects of the internet, information ecologies explores the successful use of internet technologies to set up a virtual world for students and the elderly in phoenix, arizona. instead of thinking or acting globally, exploit the technology locally, but do so in a way that makes sense in terms of human values. on the taxonomic scale of technology views, ranging from gloom and doom (e.g., the views of clifford stoll) to perpetual optimism (e.g., nicholas negroponte), i place nardi and o'day somewhere in the middle, but as i suggested, leaning toward cautious optimism. in fact, they spend several chapters discussing the views of others and offering prescient criticism of the deficiencies of those views. of particular interest to me was their analysis of the french sociologist jacques ellul, who apparently sounded the alarm concerning the stress to mind and soul of constant technological change in 1954, well before the current crop of doomsayers. nardi and o'day find ellul's views, as articulated in the technological society, to be compelling. yet, they claim, the rise of the internet can counteract the trend that ellul saw toward monotonous sameness and lack of diversity in the face of technological efficiency. perhaps so. one thing that i was looking for in information ecologies was some practical tools for engaging in the kind of exploration of information habitats that nardi, o'day, and other anthropologists engage in.
there is a spate of interest lately in the role of anthropologists in the design and deployment of new technologies, and i would like to determine its applicability to my modest software development projects. unfortunately, i was mainly disappointed on this score. in fairness to the authors, they did not set out to spell out the anthropological methodology of exploring information ecologies in any detail. the purpose of the book is rather to argue that viewing the world of technology as a set of interconnected information ecologies is useful and accurate, and in many cases superior to other metaphorical views. they succeed in this goal. now i want them to go on to write a book on using anthropological methods in these ecologies without necessarily becoming a professional anthropologist. nardi and o'day do touch extremely briefly on a few conventions of interviewing subjects, with their most important technical discussion centering on what they call "strategic questioning," which they present in the context of evolving information ecologies. they provide useful categories of questions to be asked, and specific examples. although it may seem obvious to ask penetrating questions of members of an information habitat, this is one area in which software developers in particular fail miserably. another seemingly obvious pointer is to pay attention. again, its obviousness is deceptive, since most of us are poor observers who make many assumptions about the characteristics of a work activity without observational evidence. as evidence that people introducing new technologies to an ecology do not follow these simplest pieces of advice, you can turn to the chapter "a dysfunctional ecology" to see how badly technology can fail for nontechnological reasons. this case study deals with a major teaching hospital that introduced a monitoring system into its neurosurgical operating suites that captured instrument readings as well as complete audio and video. the system was installed to aid neurophysiologists, experts who are called in to advise neurosurgeons at key points during complex surgeries to ensure that patient neurological function is not compromised. the neurosurgeons and neurophysiologists at this hospital decided that it would be more efficient for the neurophysiologists to be able to remotely monitor multiple surgeries simultaneously. both groups failed to consult with the other constituencies among the operating team, the nurses and anesthesiology staff. these groups believed that their privacy was being compromised, particularly since it was possible to tape any procedures at multiple workstations throughout the hospital. i can easily envision similar sorts of problems due to lack of communication in introducing new or modified technology into other milieus, e.g., libraries. although the consequences might not lead to the potentially life-threatening situations that could arise in an operating suite, there are certainly possible outcomes where service to users could be undermined. despite the book being not exactly what i (rather selfishly) want, information ecologies is a first-rate read and an important starting point for those concerned with better controlling technological change in the world of information.
turning from an anthropological point of view to a psychological one, the media equation offers another important basis for technological design and implementation, particularly of computer software and multimedia. the release last year of a paperback edition of this volume, first published in 1996, provides a convenient pretext for reviewing this work. reeves and nass have supervised years of study and experimentation that have consistently demonstrated the truth of what they call the "media equation": that our relations with media, including computers and multimedia, are identical in key ways to our relationships with other human beings. this is true of all of us, even those of us sophisticated enough to understand that we are dealing with devices and human artifacts rather than people. reeves and nass quite entertainingly present the technique they've used over the years to perform their research, on a step-by-step basis:
1. pick a research finding on how people respond to each other or their environment.
2. find the summary of the social or natural rule that the study has yielded.
3. replace the words "person" or "environment" in the summary with media of some sort (television, movies, computers, etc.).
4. find the research procedure.
5. substitute media for one of the people or the environment in the procedure.
6. run the experiment.
7. draw conclusions.
although this may sound facetious, it is in fact the recipe that produced the startling conclusions that we all tend to behave toward media much as we do toward other people. what's perhaps more important is that reeves and nass point toward techniques that practitioners can use to produce more effective media, including computer software. as a simple example, consider politeness. reeves and nass discovered that people treated computers with the same sort of politeness that they would other human beings, and in turn reeves and nass suggest that people respond better to "polite media." they then provide some fairly straightforward advice on producing polite computer programs, starting with grice's maxims, a set of politeness rules assembled by h. paul grice, a philosopher and psychologist. these center around truth telling, appropriate quantity of information (neither too much nor too little), relevance, and clarity. all of this is fairly unsurprising, but the authors spell out just how the maxims can be applied to the construction of computer programs. further, they go on to suggest some rules of thumb of their own. for example, some computer programs produce verbal output but expect the user to key in his or her responses. this may be perceived by the user, possibly subconsciously, as forcing an impolite response, since mixing communications modalities is a faux pas. thus, they suggest that if text input is required, perhaps only text output should be supplied. this should provide you with some of the flavor of the media equation, and in turn you may be able to see a set of potential ethical dilemmas that can arise from utilizing techniques that result from the research of reeves and nass. this set of problems can be seen most clearly in the chapter "subliminal images," where they discuss how subliminal messages could be inserted into new media to advertise products or to attempt to bolster employee morale. in fact, they say, "...
it might be easier to accomplish subliminal intrusions with a computer than with a television, because software can respond to the particular input of individual users and timing is more precise." they immediately temper this insight with the caution that "... ethical and legal issues abound." indeed. although some of the techniques that can be applied to new media do lead to ethical problems, i think that most of what reeves and nass talk about are just elements of good design. subliminal suggestion seems to most of us to be out of bounds because it unfairly manipulates user response in a powerful way. the unfairness is that someone can be manipulated without his or her knowledge to do something outside of the person's normal behavior. although the other techniques tend to subtly alter behavior, they don't generally result in an anomalous action by the user. if you think this is a kind of philosophical hairsplitting, you're right. the onus is upon the programmer or multimedia designer to use these techniques with great care. in a past professional life i wrote computerized patient interviews for the psychiatry department of the university of wisconsin. researchers there and elsewhere found that people were generally more candid with the computer than they were with human clinicians. so the findings of reeves and nass were not quite as surprising to me as they might be to others. what did surprise me, however, is that the media equation is not a phenomenon solely of the naive or inexperienced media and computer users. on the contrary, all of us, no matter how conversant we are with underlying technology, are susceptible to the effects described in the media equation. this vastly increases the power of computer programs and other media for both good and ill. i want to emphasize that not all of the possible effects of human-media interaction are pernicious. most are simply innocuous, and if techniques that benefit users can result from these effects there should be no harm in applying them in software or multimedia. in general, it's desirable to make user experiences of software and media pleasanter and more productive, and reeves and nass do an excellent job of providing pointers throughout the book. there are suggestions with regard to personality, emotion (including arousal), social roles, and form (e.g., image size, fidelity of sound, and video). none of them comes close to being as controversial as subliminal suggestion, although it continues to make me uncomfortable that people react to media as if they were dealing directly with other human beings. this is a disquieting finding, but it should not dissuade us from our jobs of designing good systems for users. all in all, information ecologies and the media equation are both first-rate books that belong in our libraries and on our professional bookshelves. both provide methodologies and techniques for making user interactions with automated systems a better experience, both in terms of accomplishing tasks efficiently and in terms of user satisfaction. -tom zillner
digital resource sharing and library consortia in italy
tommaso giordano
interlibrary cooperation in italy is a fairly recent and not very widespread practice. attention to the topic was aroused in the eighties with the italian library network project. more recently, under the impetus toward technological innovation, there has been renewed (and more pragmatic) interest in cooperation in all library sectors. sharing electronic resources is the theme of greatest interest today in university libraries, where various initiatives are aimed at setting up consortia to purchase licenses and run digital products. a number of projects in hand are described, and emerging trends analyzed. tommaso giordano (giordano@datacomm.iue.it) is deputy director of the library at the european university institute, florence.

the state of progress and the details of implementation in various countries of initiatives to share digital information resources obviously depend-apart from current investment policies to develop the information society-on many factors of a historical, social, and cultural nature that have determined the evolution and consolidation of cooperation practices specific to each context. before going to the heart of the specific subject of this article, in order to foster an understanding of the environment in which the trends and problems that we shall be considering are set, i feel it best to give a quick (and necessarily summary) sketch of the library cooperation position in italy. the word "cooperation" became established in the language of italian librarians only toward the mid-'70s, when in the sector of public libraries-which were transferred in those years from central government to local authorities-the "territorial library systems" were taking shape: this was a form of cooperation provided for and encouraged by regional laws that brought together groups of small and medium-sized libraries, often around a system centre supplying shared services. a few years later, in the wake of the new information technologies and in line with ongoing trends in the most advanced countries, in italy, too, the term "cooperation" became increasingly associated with the concept of computerized library networks. the decisive impulse in this direction came from a project of the national library service (sbn), the national network of italian libraries, then in a gestation stage, which also had the merit of speeding up the opening of the italian librarianship profession to experiences underway in the most advanced countries.1 in the '80s, cooperation, together with automation, was the dominant theme at conferences and in italian professional literature. however, the heat of the debate had no satisfactory counterpart in terms of practical implementation, because of both resistance attributable to a noninnovative administrative culture and the polarization of the bulk of the investments around a single major project (the sbn network), the technical and organizational choices of which were shared by only part of the libraries, while others remained completely outside this programme. many librarians, while recognizing the progress over the last fifteen or twenty years (including the possibility of accessing the collective catalogue of sbn libraries through the internet), maintain that results obtained in the area of cooperation are well below expectations, or energy involved.
i am touching here on one of the most sensitive, controversial points in the ongoing professional debate, which i do not wish to dwell on except to note the split that came in italian libraries following the vicissitudes of a project that ought, instead, to have united them and stimulated large-scale cooperation.2 i shall now seek to summarize the cooperation position in italy in relation to the subject of this article. very schematically (and arbitrarily) i have grouped the experiences i feel most significant under three heads: sbn network, territorial library systems, and sectoral cooperation.
sbn brings together some eight hundred large, medium-sized, and small libraries (national, local-authority, university, and research-institute). the programme, funded by the central government, supports cooperation in the following main sectors:
• hardware sharing,
• development and maintenance of library software packages,
• network administration,
• shared cataloguing, and
• interlibrary loans.
the sbn is a star network with its central node consisting of a database (the so-called "index") containing the collective catalogue of the participating libraries (currently some four million bibliographic titles and 7.5 million locations). to the index are linked the thirty-seven local systems, single libraries or multiple libraries, that apply the computerized procedures developed by the sbn programme. thus the sbn is a closed network: only those libraries agreeing to adopt the automation systems distributed by the central institute for the union catalogue (iccu), the central office coordinating the programme, take part. from the organizational viewpoint, the sbn can be regarded as a de facto consortium (i.e., not in the legal sense of the term), even if the management bodies, participation structures, and funding mechanisms differ considerably from consortia that have been set up in other countries. in fact, libraries join the sbn through an agreement among state, regions, and universities, and the governing bodies represent not the libraries but their parent institutions. participating libraries receive the services free, and funding for developing the systems and network administration comes from the central government, which coordinates the technical level of the project through iccu.3 currently, ideas are moving toward evolving the sbn into an open network system and reorganizing its management bodies: if this provision becomes a reality, the sbn will have potential for taking on an important role in developing digital cooperation.
the territorial library systems, developed especially in the central and northern regions, consist of small groups of public libraries cooperating in one or more sectors of activity such as:
• sharing computer systems,
• cataloguing,
• centralized management of purchases,
• interlibrary loans, and
• professional training and other activities.
the library systems are based on conventions and formal or informal agreements between local institutions (the municipalities) and receive support from the provincial and regional administrations. in more recent years some systems (e.g., abano terme, in the veneto) have formed themselves into formal, legal consortia.
the most advanced experiences in this sector-for example, the libraries in the valseriana (an industrial valley in lombardy), which have been operating on the basis of an informal consortium for some twenty years now-have reached a high level of efficiency comparable with the most developed european situations and may rightly be regarded as reference models for the organization of cooperation. however, given their limited size, they are unlikely to achieve economies of scale in the digital context unless they develop broader alliances. it is not unlikely that these consortia, given their capacity to work together, will in the near future develop broader forms of cooperation suited to tackling current technological challenges.
sectoral cooperation (cooperation by area of specialization) is meeting today with steadily increasing interest, though it did not fare very well in the past. among the rare initiatives embarked upon by university and research libraries in this direction, particular importance in our context attaches to the national coordination of architectural libraries (cnba), started some twenty years ago, which became an association in 1991. the cnba has various projects on its programme and can be regarded as an established reference point for cooperation among architectural libraries. we should also mention one of the "oldest" cooperation projects among research libraries: the italian periodicals catalogue promoted by the national research council (cnr), recently made available online by the university of bologna.4 to complete this sketch, at least a mention should be made of the participation of italian libraries in the european commission's technical programme in favor of libraries. this programme, which since 1991 has mobilized the world of libraries in the european union, not only favors and guides the expansion of technologies into libraries in accordance with preset objectives, but also has the aim of encouraging cooperation among libraries in the various countries. the programme-the latest edition of which includes not just libraries but also archives and museums-has secured significant participation from many italian libraries. over and above the validity of the projects already carried out or under way (important as that is), this programme has been very valuable to italian libraries in terms of exchanges of experience and of opening up professional horizons, especially as regards cooperation practice.5
digital cooperation
recently, following the expansion of electronic publishing, university libraries have been displaying renewed interest in cooperation activities with particular reference to acquiring licenses and sharing electronic resources. this movement is at present in full swing and is giving rise to manifold cooperation initiatives. to get an idea of the trends under way, one may leaf through the proceedings of a session on database networking in italian universities at the aib congress in genoa.6 on that occasion a group of universities presented a "draft proposal of agreement on access to electronic information." the document is divided into two parts, the first defining the purposes and object of university cooperation in the sphere of electronic information. the second part indicates operational objectives for cooperation in acquiring electronic information and proposes a model contract for purchasing licenses, to which member universities are expected to adhere.
the content of this second part coincides with the recommendations and understandings signed by associations, consortia, and groups of libraries in other countries, and largely follows the indications and recommendations issued by the european bureau of library information and documentation associations (eblida), the organization that brings together the library associations of the various european countries; by the international coalition of library consortia (icolc); and by other library organizations. there is no point here in listing all initiatives under way in italian libraries, in part because most of them are only just started or in the experimental stage. i shall mention a few only to bring out the trends that seem, from my point of view, to be emerging.
development of digital collections
at the moment initiatives in this sector are much fewer and less substantial than in other industrialized countries. among them the biblioteca telematica italiana stands out: in it, fourteen italian and two foreign universities digitize, archive, and put online works in italian. the project is based on a consortium, the italian interuniversity library center for the italian telematic library (cibit), supported by funds from the national research council (cnr) and made up of the fourteen italian and two foreign universities that have signed the agreement. technical support is provided by the cnr institute for computer linguistics, located in pisa.7 in this context we must also note, especially for the consequences it may have for the future growth of digital collections, an agreement between the national central library in florence and the publishers and authors associations aimed at accomplishing the national legal depository for electronic publishing project, which also provides for production of a section of the italian national bibliography to be called bni-documenti elettronici.8 the publishers who have signed the agreement undertake to supply a copy of their electronic products to the national central library in florence. the latter undertakes to guarantee conservation of the electronic products deposited, and to make them accessible to the public in accordance with the agreements reached.
description of electronic resources
in this area the bulk of the initiatives are still in an embryonic stage. in the sector of periodicals index production (i.e., tocs), mention should be made of the economic social science periodicals (essper), a cooperation project on italian economics periodicals launched by the libero istituto universitario carlo cattaneo (castellanza, varese), to which some forty libraries are contributing.9 recently the project has been extended to italian legal journals. essper is a cooperative programme based on an informal agreement among the libraries, each of which undertakes to supply in good time the tocs of the periodical titles it has undertaken to monitor. the programme does not benefit from any outside funds, being supported entirely by the participating libraries, which have recently been endeavouring to evolve into a more structured form of cooperation.
administration of electronic resources and licenses
in this sphere there have been numerous initiatives recently, particularly by university libraries.
one may note, first, a certain activism by university data-processing consortia (big computing centres created at the start of the computer era to support applications in scientific and then university and library administration areas). the interuniversity consortium for automation (cilea) in milan, which has for some time been operating in the area of library systems and electronic information distribution (especially in the biomedical sector), has extended its activities by offering services to nonmembers of the consortium too. recently cilea, in connection with a broader programme, cdl (cilea digital library), has been negotiating with a number of major publishers the distribution of electronic journals and online bibliographic services on the basis of needs expressed by the libraries in the consortium. caspur (the university computing consortium in rome) is working on several projects, among them shared management of electronic resources on cd-rom in a network among five universities of the centre-south. caspur, too, has opened its services to libraries not in the consortium and is negotiating with a number of major publishers the licenses for establishing a mirror site for electronic periodicals. the university of genoa, through csita, its computing services centre, has concluded an agreement with an italian distributor of electronic services to enable multisite license-sharing for biomedical databases by institutions operating on the territory of liguria. very recently the universities of florence, bologna, modena, genoa, and venice and the european university institute in florence have initiated a pilot project (cipe) for shared administration of electronic periodicals, and have begun negotiations with a number of publishers. let us now seek to draw some conclusions from this initial, brief consideration of current initiatives:
• initiatives in the area of digital cooperation are coming mainly from the world of university and research-institute libraries.
• no projects are big enough to achieve economies of scale, with most initiatives in hand having a very limited number of partners and often being experimental in nature.
• projects under way do not provide for the formation of proper consortia, most likely because the legal form of the consortium is hard to set up in italy because of the burdens involved, especially the complexity and length of the decision-making processes needed to constitute such an organization.
• librarians prefer decentralized forms of cooperation, partly because, shaken by experiences of the past, they fear losing autonomy and efficiency and finding themselves caught up in the bureaucracy of centralized organizations. "however, there can also be a correlation between the amount of autonomy that the individual institution retains and the ability of the consortium to achieve goals as a group."10 this observation by allen and hirshon obviously holds for italy too. it is no coincidence, in fact, that the university computing consortia, which have centralized staff and funds available, are able to carry out more incisive actions in this sector.
• except for the biblioteca telematica italiana, no initiatives seem to have been incentivized by ad hoc government programmes or funds.
• a part of the cooperation projects concerns sharing of databases on cd-roms.
the traditional italian resistance to online materials would seem to be due partly to the still inadequate network infrastructures in our country; improvements in this sector might bring a quick turnaround here.
• some initiatives in hand have been inspired more by suppliers than by librarians: the risk is to cooperate in distributing a particular product, not to enhance libraries' bargaining power. without wishing to deny anything to the suppliers, who today play an essential part in terms of professional information too, i feel that keeping the roles clearly separate may help to develop clear, upright, and mutually advantageous cooperation.
• some major projects are being led by university computing consortia that have begun to take an interest in the library sector. the university computing consortia would indeed have some of the requirements to play a first-rank role in this sphere if they can manage to bring themselves into their most natural position, i.e., to operate as agents of libraries rather than as distributors of services on behalf of the commercial suppliers. moreover, it ought to be clear that the computing consortia should act as partners with the library consortia and not as substitutes for them, otherwise the libraries risk limiting their autonomy of decision.
• some attention is turning toward university electronic publishing, though at the present stage it does not seem there are practical projects for cooperation in this area.
• finally, one has to note the low level of initiative by libraries (compared with other countries) in developing content and in storing digital collections.
the analysis i have rapidly summarized here is the basis for an initiative which has in recent months been stimulating the debate on digital cooperation in italy. i am referring to the italian national forum on electronic information resources (infer), a coordination group initially promoted by the european university institute, the university of florence, and a number of universities in the centre-north, which is today extending beyond the sphere of university and research libraries. the forum's chief mission is to cooperate to promote efficient use of electronic information resources and facilitate access by the public. to this end it encourages libraries to set up consortia and other types of agreement on acquisition and management of electronic resources and access to them. infer's objectives can be summarized as follows:
• to act as a reference and linkage point and develop initiatives to promote activities and programmes in the area of library electronic resource sharing;
• to enhance awareness both at institutional and political levels (ministries, universities, local authorities, etc.) and among librarians and end users;
• to facilitate dialogue and mutual collaboration between libraries and all others in the knowledge production and distribution chain, to help them all (authors, publishers, intermediaries, end users) to take advantage of the opportunities offered by the information society; and
• to maintain contacts with similar initiatives under way in other countries.
infer has immediately embarked on a rich programme of activities which is giving appreciable results, especially in terms of raising awareness of the problem and coordinating initiatives in the area. we shall here briefly mention some of the actions in hand that seem to us most important.
dissemination of information.
infer has developed a web site where, as well as information on the forum's activities, important documents can be found relating to the consortia, the negotiations and licenses, and in general the digital resource-sharing programmes in italy and around the world.11 a discussion list for infer members has also been activated.
seminars and workshops.
this activity is aimed at further exploration of themes of particular interest (e.g., legal aspects of license contracts, or programmes under way in other countries).
data collection.
the two main programmes coming under this heading are: (a) monitoring of italian cooperation initiatives under way in the digital sector; and (b) collecting data on acquisitions of electronic information resources in university libraries. this information will enable the libraries to have a more exact picture of the situation, so as to assess their bargaining power and achieve the necessary support to adopt the most appropriate strategies.
indications and recommendations.
as well as translating and distributing documents from the most important associations operating in this area (such as eblida, icolc, and ifla), infer is developing a model license for the italian consortia. infer was set up in may 1999 and currently has some forty members, most of them representatives of university library systems, university computing consortia or research libraries, or university professors. one of infer's aspirations is to persuade decision-makers to develop a programme of incentives on a national scale for the creation of library consortia.
critical factors
as to the delay we note in terms of shared management of electronic resources, weight clearly attaches to the fact that cooperation is not well established, nor are the national structures that ought to have supported it. it would be all too easy, and perhaps also more fun, to attribute this situation to the so-called individualism of italians and to abandon inquiry into the structural limitations that may have determined it. first of all, except in very few cases, libraries have no administrative autonomy, or only very little, with hardly any decision-making powers. this factor favors interference in decision-making processes, complicates them, slows down procedures, and strips librarians of their responsibility. one of the reasons why the sbn has not managed to generate cooperation is to be sought in the mechanisms for joining and participating in the programme. in other words, many libraries have joined the sbn following decisions taken from above, at the political and administrative levels, and not on the basis of an autonomous, weighted assessment of attitudes, needs, and alternatives. these experiences have augmented libraries' reluctance to embark on centrally steered national programmes. on the other hand, the low administrative autonomy they have prevents them from implementing truly effective alternative solutions, i.e., ones able to realize economies of scale. another factor is the administrative fragmentation of libraries. the big universities have fifty or so libraries each (often one per department).
some universities have an office coordinating the libraries, but only in very few cases does this structure have the powers and the necessary support to coordinate; more often it acts as a mediation office with no real administrative powers. in short, the result is that since (perhaps also because of a misunderstood sense of departmental autonomy) there is no decision-making centre for libraries in each university, decision-making processes prove slow and cumbersome. clearly, all this brings many problems in establishing understandings and cooperative programmes with other libraries and weakens the universities in negotiating licenses. this position, while objectively favoring suppliers in the short term, in the long term risks facing them with difficulties, given an increasingly impoverished, uncertain market because of the fragmentation and the limited capacity of possible purchasers. another limit is the insufficient awareness, especially on the academic side, of the challenges of electronic information. in early 1999 the french daily le monde published an extensive feature on scientific publishing, showing how current publishing production mechanisms, while assuring a few big publishers of ample profit margins, are suffocating libraries and universities under the continuous rises in prices for scientific journals.12 the argument, immediately taken up by the spanish el país and other european newspapers, met with very little response in italy. clearly, in italy today, the conditions do not exist to embark on initiatives like the incisive open letter to publishers sent by the kommission des deutschen bibliotheksinstituts für erwerbung und bestandsentwicklung in germany, supported by similar swiss, austrian, and dutch organizations.13 the lack of an adequate national policy in the area of electronic information is probably the direct consequence of the problems i have just mentioned. in this context, however praiseworthy the initiatives, they tend, in the absence of reference points and practical support, to break up or fritter away. under the ministry for universities there are no leadership or action bodies in the area of academic information, like the joint information systems committee in britain that stimulates programmes aimed at developing and utilizing information technologies in university and research libraries. these observations are valid for the state libraries and public libraries, too, where the central (ministry for cultural affairs) and regional authorities could play a more effective part in promoting digital cooperation.
conclusions
the picture i have presented is not very rosy. however, it does reveal considerable elements of vitality and greater awareness of the problems emerging, starting with a few representatives of academic sectors who might be able to wield influence and bring about a turnaround. at the moment, the consortium movement to share electronic resources chiefly involves university libraries, but a few initiatives by public libraries are starting to appear, especially in the multimedia products sector. no specific lines of action are yet emerging at the level of the national authorities-especially the ministry for education and research and the ministry of cultural activities, on which the national libraries and many research libraries depend.
it is likely that in the near future the entry of these agencies may be able to modify the current scenario and considerably influence the approach to cooperation. from this viewpoint, the impression is that a few consortium initiatives that have been flourishing in recent months on the part of both libraries and suppliers have the principal aim of proposing cooperation models to guide future choices. in conclusion, we are only at the outset, and the game is still waiting to be played.
references and notes
1. michel boissel, "l'organisation automatisee de la bibliotheque de l'institut universitaire europeen de florence," bulletin des bibliotheques de france 24, no. 5 (1979): 231-39. for an overall picture of the debate, see: la cooperazione: il servizio bibliotecario nazionale: atti del 30th congresso dell'associazione italiana biblioteche, giardini naxos, november 21-24, 1982 (messina: universita di messina, 1986).
2. tommaso giordano, "biblioteche tra conservazione e innovazione," in giornate lincee sulle biblioteche pubbliche statali, roma, january 21-22, 1993 (roma: accademia nazionale dei lincei, 1994): 57-65. for the most recent developments in the debate, see the articles by antonio scolari, "a proposito di sbn," giovanna mazzola merola, "lo studio sull'evoluzione del servizio bibliotecario nazionale," and claudio leombroni, "sbn: un bilancio per il futuro," bollettino aib 37, no. 4 (1997): 437-66.
3. further information on sbn can be found at www.iccu.sbn.it/sbn.htm, accessed oct. 27, 1999, where the collective catalogue of participating libraries is also accessible.
4. catalogo italiano dei periodici (acnp), www.cib.unibo.it/cataloghi/infoacnp.htm, accessed sept. 19, 1999.
5. there is a considerable literature on the european commission's "libraries programme": for a summary of projects in the programme, see telematics for libraries: synopses of projects (luxembourg: office for official publications of european communities, 1998). updated information on the latest version of the programme can be found at www.echo.lu/digicult, accessed oct. 26, 1999. on italian participation in the programme see: ministero per i beni culturali e ambientali, l'osservatorio dei programmi internazionali delle biblioteche 1995-1998 (roma: mbac, 1999).
6. associazione italiana biblioteche (aib), xliv congresso nazionale aib, genova, 1998: www.aib.it/aib/congr/co98univ.htm, accessed oct. 27, 1999.
7. more information about cibit can be found at www.ilc.pi.cnr.it/pesystem/19.htm, accessed may 19, 2000.
8. progetto eden: deposito legale editoria elettronica nazionale, www.bncf.firenze.sbn.it/progetti.html, accessed sept. 29, 1999.
9. more information about essper may be found at www.liuc.it/biblio/essper/default.htm, accessed may 19, 2000.
10. barbara mcfadden allen and arnold hirshon, "hanging together to avoid hanging separately: opportunities for academic libraries and consortia," information technology and libraries 17, no. 1 (1998): 37-44.
11. the infer web page can be found on the universita di roma i site, www.uniroma1.it/infer, accessed may 19, 1999.
12. le monde, 22 jan. 1999: a whole page is devoted to this topic. see especially the article titled "les journaux scientifiques menacés par la concurrence d'internet," accessed feb. 4, 1999, www.lemonde.fr/nvtechno/branche/journo/index.html. the point was taken up again by el país, 27 jan. 1999; see the article titled "las revistas científicas, amenazadas por internet."
13.
the letter, signed by werner reinhardt, dbi president, is available at www.ub.uni-siegen.de/pub/misc/offener_brief-engl.pdf, accessed feb. 4, 1999.
selective dissemination of marc: a user evaluation
lorne r. buhr: murray memorial library, university of saskatchewan, saskatoon, saskatchewan
(material appearing in this paper was originally presented at the third annual meeting of the american society for information science, western canada chapter, banff, alberta, october 4, 1971.)
after outlining the terms of reference of an investigation of user reaction to the selective dissemination of marc records, a summary of the types of users is given. user response is analyzed and interpreted in the light of recent developments at the library of congress. implications for the future of sdi of marc in a university setting conclude the paper.
introduction
f. w. lancaster (1968) in his detailed study of medlars makes the following statement, which has application to all sdi work: "in order to survive, a system must monitor itself, evaluate its performance, and upgrade it wherever possible." (1) since seldom operates in a fairly new field, sdi for current monographs, an evaluation is most important. to a great extent it must be made without reference to other systems, since most of the operational sdi services deal with tape services in various fields of scientific journals, and although there are some parallels, there are numerous differences. whereas services such as can/sdi cater primarily to the natural and applied sciences, seldom opens up the possibilities for sdi in the humanities and social sciences. the background to the seldom project at the university of saskatchewan has been outlined earlier by smith and mauerhoff (1971) and will not be repeated here. (2) after five months of operation a major questionnaire was sent out to each of 121 participants in the experimental seldom service. this questionnaire was based almost entirely on the one used by studer (1968) in his dissertation at indiana state university. (3) the general purpose of the study was to elicit user reaction to seldom, their evaluation of its usefulness, time necessary to scan the weekly output, suggestions regarding continuance of the service, etc. besides this general purpose, the gathering and analyzing of data on seldom will be useful to the library administration in determining the future of an sdi service of this nature. a separate cost study is being prepared in this connection. several factors prompt a cautionary stance in assessing the value of an sdi system on the basis of one questionnaire: (1) there is no control situation to which we can compare seldom, i.e., there was no systematic service for current awareness in the field prior to the advent of seldom. faculty and researchers were dependent on their ingenuity to ferret out information on new books which were pertinent to their field of research and instruction. seldom is therefore being compared to a conglomeration of ad hoc methods which may be as numerous as the individuals using them. therefore, we must be cautious or we will tend to say, "something in the field of current awareness is better than nothing," when we really do not know what that "nothing" is.
(2) although seldom had been operational for some twenty weeks when evaluation began, this is a relatively short period on which to base an assessment. on the other hand, studer's evaluation was based on the experiences of thirty-nine users and covered only eight weekly runs against the marc tapes, scheduled on an every-other-week basis. (3) seldom was implemented without any study to determine the adequacy of the ad hoc approaches, to which i have already referred, nor to assess the patterns of recommendation for purchase. it was assumed that there was a need for seldom, and some of the response would indicate that this is a fairly valid assumption, since almost 90 percent of the respondents wanted the service continued. a random investigation in mid-august of 748 current orders in the acquisitions department for books with an imprint of 1969 or later revealed that ninety-five, or 12.7 percent, referred to seldom as the source of information for a particular recommendation to purchase. this may or may not be significant since there is no way of assessing whether these items would have been recommended anyway, only later perhaps. one by-product of orders based on seldom information is that correct lc and isbn numbers are given, and with the capabilities of the tesa-1 cataloging/acquisitions system such orders can be expedited more quickly and can also be cataloged sooner than non-marc materials, thus ostensibly getting the desired item to the requestor in less time than previously. seldom is valuable in our university setting, therefore, not only as a means of awareness of new items, but also in the actual retrieval of the item for the user, in this case through acquisition. our analysis, however, must be directed to the effectiveness of seldom as an awareness service, vis-a-vis the ad hoc approach.
user group
of 121 questionnaires sent out, seventy-seven or 63.5 percent were returned. six of these had to be rejected for the purposes of this study since either only a few questions had been answered or a general letter had been sent instead of answering the questionnaire. thus, the data presented in this study will be based on seventy-one completed questionnaires, or a 58.6 percent return. three additional verbal comments were made to the writer, and thus we in fact heard from eighty, or 66 percent, of the users. the term "users" will designate the seventy-one who completed their questionnaires, although comments from the other nine individuals will also be referred to. the users have been grouped into three categories according to table 1.
table 1
i. library and information science: a. on-campus 12; b. off-campus 17; total 29
ii. social sciences and humanities: a. on-campus 15; b. off-campus 2; total 17
iii. natural and applied sciences: a. on-campus 23; b. off-campus 2; total 25
categorization was along fairly traditional lines, with category i being necessary because of the large number of people falling into this area. the seventeen off-campus users coming under designation (i) represent the library schools in canada as well as librarians/information scientists in canada and the united states. the on-campus users are library department heads and heads of branch libraries. included in the social sciences and humanities are the fields of psychology, sociology, history, economics, english, commerce, classics, etc. the natural and applied sciences include all the health sciences plus physical education, since the two profiles in that area are tending toward the health sciences.
engineering, poultry science, physics, chemistry, biology, etc., are represented here.
observations
a sample of the questionnaire used appears on p. 47-50 and includes a tally of the number of responses for each possible alternative answer to each question. in some cases the total number of replies for a question is less than seventy-one. this is explained by the fact that some questions on some questionnaires were not answered or were answered ambiguously, so they could not be tallied. generally speaking, users found seldom to be good to very good in providing sdi for new english monographs. 25.8 percent of the users found the lists very useful while 48.5 percent said they were useful. six users said the listings were inconsequential for their purposes; in several instances this may be due to poor profiling or profiling for a subject area in which little would appear on the marc data base. 23.6 percent of the users indicated that in most cases items of interest found on the seldom lists were previously not known to them. 45.8 percent said that "of interest" items were frequently new. 76 percent of the group believed that the proportion of "of interest" items which also were new was satisfactory, a percentage which speaks well for the currency and effectiveness of an sdi capability. one of the chief drawbacks for which sdi services are often cited is the absence of evaluative commentary or abstract material to accompany the citations. some tape services do provide either an abstract or a good number of descriptors, and this has proved to be an asset in helping the subscriber. seldom is based on the marc tapes, which provide complete cataloging data but do not give either evaluations or a multiplicity of descriptors. (some indications are that the information now available in publishers' weekly might at some time in the future be added to the marc tapes.) interestingly enough, 83.5 percent of the users said the information included in the entries was adequate to determine whether an item was of interest or not. predictably, title, author/editor, and subject headings were the three indicators, in that order, which were found most useful in making evaluations. this is significant since titles in the humanities and some of the social sciences, particularly, are often not as specific in describing the contents of a work as are titles in the physical sciences. 63.5 percent of the users indicate that seldom information is used for recommending titles for acquisition by the library. as a result it is quite possible that purchasing in the areas covered by seldom profiles may increase and the tendency to broaden the collection should increase. unfortunately, no pattern of pre-seldom recommending for purchase is known. some instructors use the weekly printouts to keep current bibliographies on hand both for teaching purposes and for research purposes. since over half the users (55.8 percent) needed no more than ten minutes per week to scan the printouts, there is no indication that excessive time is taken up in the use of such an sdi service. in reply to the question, "would you be willing to increase the number of irrelevant notices received in order to maximize the number of relevant ones?" opinions were nearly balanced, with 58 percent replying in the affirmative and 42 percent answering negatively.
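the trade-off behind that question can be pictured as a simple matching of interest profiles against each week's marc records: the looser the match a profile will accept, the fewer relevant titles it misses and the more irrelevant notices it produces. the sketch below is a hypothetical illustration in a modern scripting language, not seldom's actual tape-based implementation; the record fields, the scoring rule, and the match_threshold parameter are assumptions introduced only to make the recall/precision knob concrete.

    # hypothetical sketch: matching user interest profiles against one week's
    # marc-derived records; field names and scoring rule are illustrative only
    from dataclasses import dataclass

    @dataclass
    class Record:
        title: str
        subjects: list      # subject headings as supplied on the tape
        lc_class: str       # e.g. "pr2895"

    @dataclass
    class Profile:
        user: str
        terms: set          # keywords of interest, lower-cased
        lc_prefixes: tuple  # lc class stems of interest, e.g. ("pr",)

    def score(record, profile):
        # count profile terms found in the title or subject headings,
        # plus one if the lc class falls under a requested class stem
        text = " ".join([record.title] + record.subjects).lower()
        hits = sum(1 for term in profile.terms if term in text)
        if record.lc_class.lower().startswith(profile.lc_prefixes):
            hits += 1
        return hits

    def weekly_run(records, profiles, match_threshold=2):
        # lowering match_threshold raises recall (fewer relevant titles missed)
        # at the cost of more irrelevant notices; raising it does the reverse
        for profile in profiles:
            notices = [r for r in records if score(r, profile) >= match_threshold]
            yield profile.user, notices

    # minimal usage against a single record from one weekly tape
    records = [Record("shakespeare manual", ["shakespeare, william"], "pr2895")]
    profiles = [Profile("example user", {"shakespeare", "drama"}, ("pr",))]
    for user, hits in weekly_run(records, profiles, match_threshold=1):
        print(user, [r.title for r in hits])

running the same weekly file with a lower threshold surfaces more titles per user and more "not of interest" notices, which is exactly the choice that question five of the questionnaire puts to respondents.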
increases in the marc data base expected some time in 1972, when other roman-alphabet language imprints and records for motion pictures and filmstrips are added, did not seem problematic: only 25 percent of users asked that an upper limit be placed on the quantity of material retrieved by their profiles. numerous individuals (thirty) responded favorably to the prospect of wider language coverage by marc. on the other hand, several individuals commented that non-english output on seldom would not enhance the service for them, and this likely reflects language capabilities more than a lack of non-english material in their subject area. the question regarding format brought interesting comments, especially from library personnel and off-campus librarians: "computer type format is often confusing." "a book designer should be consulted to improve the format." "spacing could be improved to separate title and imprint information from subject headings and notes at foot of entry. would make scanning easier." questions fourteen, nineteen, and twenty-one provide an overall summary of user reaction. 88.6 percent of users want the service to continue. overall value of seldom was rated "very high" by 11.3 percent, "high" by 33.8 percent, "medium" by 42.2 percent, and "low" by 12.7 percent. seldom served to demonstrate the possibility of sdi for monographs "amply" according to 36.6 percent of users, "adequately" to 50.6 percent of users, and "poorly" to 12.65 percent of users. there was less certainty on how such a program should be administered or costed, particularly since a long-range cost study was not yet available. clearly those who were impressed with seldom's effectiveness and future possibilities wanted other faculty to have the same opportunities, yet they cautioned against a blanket service. one comment sums this up best: "it should be available to anyone who has a perceived need for it-but require them to at least make the effort of setting up the profiles, etc." many of the less than enthusiastic comments about seldom could be correlated with little or no user feedback to the search editor in order to improve relevancy and recall. user education in this regard is crucial in order that all users fully understand the possibilities and limitations of the sdi service. the success of any existing sdi service in the periodical literature has hinged on a good data base and up-to-date, specific profiling, according to smith and lynch (1971). (4) the effectiveness of the profiling is a direct function of the ingenuity and persistence of the user and the profile editor.
discussion
this study has attempted to weigh the usefulness of an sdi service primarily with regard to its utility as a current awareness service. seldom, in order to be worthwhile, must either be faster or broader in its coverage than existing services. two comparisons readily arise out of the commentary of the users. some library science professors felt that the lc proofslip service was just as fast as seldom and thus there was no advantage in having the latter when the former was available. a study done at the university of chicago by payne and mcgee (1970) repudiates this argument fairly effectively. (5) findings at chicago show that marc is faster than the corresponding proofslips. a number of users rely heavily on publishers' blurbs and prepublication notices and find that often books for which records appear on seldom are already on the library shelves.
this observation is not altogether an indictment of seldom, since another user observed that he appreciated being able to have the hard copy immediately; and in some cases he might not even have known about the item except for seldom. some users mentioned that waiting for evaluative reviews could put one at least a year behind just in placing the order for the book, let alone receiving it. seldom has the virtue of informing individuals of the existence of new books, but the delay in having the actual item might be problematic, so one question was directed to this consideration. some people felt that it was at least worth something to know that a book existed even if one could not consult it immediately. numerous complaints were aired regarding the slowness of obtaining items ordered through a library's acquisitions department. in fact one user said this slowness meant he had to purchase personal copies of items he wanted/needed. as indicated earlier in the introduction, the tesa-1 acquisitions-cataloging routine at the university of saskatchewan library does have the capability to speed up actual receipt of books by the patron. a recent development at the library of congress has definite implications for the future of seldom and any other marc-based sdi programs. the cip (cataloging in publication) program initiated this summer means that lc will now be able to make available cataloging information, except for collation, for books about to be published, at a time factor of up to six weeks before publication. such marc records will have a special tag designating them as cip material. furthermore, cip records will appear only on marc; the number predicted is 10,000 for the first year and 30,000 by the third year, a figure which would include all american imprints. (6) marc-oklahoma has already surveyed the subscribers to its sdi project to determine whether they would prefer to receive both cip marc records and regular marc records or only one of the two categories. users preferred to receive both types of information, and appropriate changes have been made to the oklahoma sdi programs. (7) beginning with september, marc cip records will appear and present information on books thirty to forty-five days before they are published. several library personnel appreciated the usefulness of seldom as an outreach service of the university library into the academic community. they see seldom as a public relations tool. numerous efforts are at the present time being made by librarians to alert individuals to materials in their several fields of interest, and seldom can play an important role in providing an active dissemination of information on a systematic basis. this is the direction in which we need to move so that our role becomes both that of a collector of information and a disseminator of information. special librarians have been doing this kind of thing for years, and seldom allows for specialized service to a larger user group.
implications and conclusions
1. an sdi service based on marc can be helpful in building a balanced library collection, depending on the efforts of faculty and/or bibliographers in setting up their profiles and maintaining them. the article by ayres (1971) is particularly good on this aspect. (8) the parameters of the marc data base must constantly be kept in mind, just as the constraints of the ad hoc methods must be considered in any comparisons. publishers' blurbs in journals have the limitation of not systematically covering all the publications in a given subject area; book reviews tend to appear too late to allow users to receive current information on new books; seldom corrects the first shortcoming at the expense of not having the evaluations appearing in book reviews. on the other hand, marc tapes do represent the cataloging of books in the english language by one of the largest national libraries in the world, and thus provide a coverage which is hard to duplicate by any one other alerting service.
2. comments, especially from users in the social sciences and humanities, indicate that an sdi system for new monographs has greater pertinence in their area than perhaps in the natural and applied sciences, simply because of the nature of research done in the two areas. a recent study by j. l. stewart (1970) substantiates this factor for the field of political science. (9) his detailed analysis of the patterns of citing in the writings appearing in a collective work in political science indicated that 75 percent of such citations were from monographs, leading him to the obvious conclusion that "monographs provide three times as much material as do journals" in the field of political science. by contrast, journals are likely more crucial for the fields of natural and applied science, and provide the key access point for vital information.
3. sdi of marc, most users felt, should demand a fair amount of effort on the part of users to assure that the service would obtain optimum return for money invested. a blanket service to all faculty would be wasteful, since many faculty would not have a perceived need for it and others would not use it enough if it was simply offered free to everyone. comments tended to favor making contact through the departmental library representative and channelling weekly printouts through this individual. a cost study will help determine whether it is economically feasible to operate seldom in an academic setting with at least 100 users. if current subscription costs for sdi services such as those offered by can/sdi of the national science library, ottawa, can be maintained, and early indications are that they can, a cost of $100 per profile per year may be feasible, bringing the annual expenditure for 100 users to $10,000. a chief variable which makes effective costing difficult is the variation in the number of records appearing on each weekly tape, and this is a variable which can only be dealt with by prediction on the basis of the number of records on past tapes.
4. seldom has the virtue of adding a major role of dissemination of information to libraries, which up until now have primarily operated as storers of information.
sample seldom printout entry: 822.33 - fleay, frederick gard, 1831-1909. shakespeare manual. new york: ams press <1970>. xxiii, 312 p. 19 cm. lc 76-130621. pr2895. subject: shakespeare, william.
seldom evaluation questionnaire
1. what is your feeling about the sdi lists as a source for finding out about the existence of newly published works in your fields of interest? would you say that the lists provided a source which was: (a) very useful 18 (b) useful 34 (c) moderately useful 12 (d) inconsequential 6
2. do you feel that the sdi lists brought to your attention works of interest which are not generally cited by other sources that you use to learn of new publications?
publishers' blurbs in journals have the limitation of not systematically covering all the publications in a given subject area; book reviews tend to appear too late to allow users to receive current information on new books; seldom corrects the first shortcoming at the expense of not having the evaluations appearing in book reviews. on the other hand marc tapes do represent the cataloging of books in the english language by one of the largest national libraries in the world, and thus provide a coverage which is hard to duplicate by any one other alerting service. 2. comments, especially from users in the social sciences and humanities, indicate that an sdi system for new monographs has greater pertinence in their area than perhaps in the natural and applied sciences simply because of the nature of research done in the two areas. a recent study by j. l. stewart ( 1970) substantiates this factor for the field of political science. ( 9) his detailed analysis of the fatterns of citing in the writings appearing in a collective work in politica science indicated that 75 percent of such citations were from monographs leading him to the obvious conclusion that "monographs provide three times as much material as do journals" in the field of political science. by contrast, journals are likely more crucial for the fields of natural and applied science, and provide the key access point for vital information. 3. sdi of marc, most users felt, should demand a fair amount of effort on the part of users to assure that the service would obtain optimum return for money invested. a blanket service to all faculty would be wasteful since many faculty would not have a perceived need for it and others would not use it enough if it was simply offered free to everyone. comments tended to favor making contact through the departmental library representative and channel weekly printouts through this individual. a cost study will help determine whether it is economically feasible to operate seldom in an academic setting with at least 100 users. if current subscription costs for sdi services such as those offered by can/sdi of the national science library, ottawa can be maintained, and early indications are that they can, a cost of $100 per profile per year may be feasible bringing the annual expenditure for 100 users to $10,000. a chief variable which makes effective costing difficult is the variation in the number of records appearing on each weekly tape and this is a variable which can only be dealt with by prediction on the basis of the number of records on past tapes. 4. seldom has the virtue of adding a major role of dissemination of information to libraries which up until now have primarily operated as starers of information. selective dissemination of marc/buhr 47 822 33~ shakespeare william fleay. frederick garo. 1831-1909. shakespeare manual. new york.ams press<1970> xxiii. 312 p. 19 cm. lc 76-130621 pr2895 p1002 en 01 tw 000 wt 000 s r0252 fc leng 822.33 isbn 0~0~02~08~ seldom evaluation questionnaire l. what is your feelin g about the sdi lists as a source for finding out about the existence of newly published works in your fields of interest? would you say that the lists provided a source which was: (a) very useful (b) useful ( c ) moderately useful (d) inconsequential 18 34 12 6 2. do you feel that the sdi lists brought to your attention works of interest which are not generally cited b y other sources that you use to learn of new publications? 
(a) many works (b ) some works (c) a few works (d) none 10 39 19 2 3. how would you characterize your feeling about the relative proportions of the items "of interest" ( relevant items) and "those not of interest" (irrelevant items) included in the sdi lists? (a) the proportion of relevant items in the lists was satisfactory. 57 (b) the proportion of irrelevant items in the lists was too high. 13 48 journal of library automation vol. 5/1 march, 1972 4. it is inevitable that some "not-of-interest" items are included in the sdi lists. was the inclusion of irrelevant notices bothersome to you? (a) yes ( b ) no 6 65 reasons: 5. on the other hand, it is possible that for any given search run, some relevant items in the file are missed. the chance of relevant items being missed can generally be minimized by certain search adjustments, but with a resulting increase in irrelevant notices. would you be willing to increase the number of irrelevant notices received in order to maximize the number of relevant ones? ( a) yes ( b ) no 40 29 reasons: 6. the sdi lists notified you of an average of--items per list which you judged to be "of interest." on a purely quantitative basis, would you say that this number was satisfactory, or for some reason too small or too large? (a) satisfactory 48 ( b ) too small 16 ( c ) too large 1 7. when the input to the marc file is increased, your sdi output would also likely increase. do you feel that you would like to be able to set some arbitrary upper limit on the quantity of items included in each sdi list even at the risk of missing a number of relevant items? (a) yes (b) no if yes , maximum number__ _____________ _ 17 51 reasons: 8. the sdi lists alerted you to a number of items which you judged to be "of interest." would you say that "of interest" items were new to you? (a) in most cases (b) frequently (c) occasionally (d) seldom 17 33 17 5 9. do you feel that the proportion of items "of interest" which were also "new" to you was: (a) satisfactory (b) too low 54 17 10. would you say that, in general, information given for the entries in the sdi lists is adequate to judge whether an item is or is not of interest to you? ( a) yes ( b ) no 58 10 11. what elements of the entry did you most often find useful in making evaluations? (a) author/ editor (b) title (c) publisher (d) series note (e) sub38 55 9 4 35 ject headings (f) classification numbers (g) other (please specify) 8 1 selective dissemiootion of marc /buhr 49 12. what is the primary use to which you put the sdi information? (a) recommendation for library acquisition (b) personal purchase of 51 12 item (c) other (please specify) 15 13. if your recommendation originates the library order for a publication, it will be some time before the work is available; and even if already on order, most of the publications included in your lists were probably too new to be available from the library at the same time you received the list. do you feel that this diminishes the value of the sdi service? (a) significantly (b) somewhat (c) negligibly 2 w ~ for what reasons? 14. a potential value of sdi service, based on the large volume of newly published works cataloged by and for the library of congress, is to bring together in one list timely notices for those works in the file which correspond to your several fields of interest. do you feel that the experimental sdi service demonstrated this capacity? (a) amply (b) adequately (c) poorly 26 36 9 15. is the format of the sdi notices satisfactory? 
(a) yes 61 (b) no 9. if not, what format would you suggest?
16. is the distribution schedule of once a week satisfactory? (a) yes 71 (b) no 0
17. on the average, how much time would you estimate it took to examine an sdi list? roughly, minutes: (a) 5: 23 (b) 5-10: 16 (c) 10: 9 (d) 10-15: 11 (e) 15: 5 (f) 15-20: 1 (g) 20: 5
18. a possible by-product of this sdi service is the building up of a cumulative marc tape file which can be searched in various ways by computer. would you make use of such a file? (a) yes 40 (b) no 18. if no, for what purposes?
19. judging from your total experience with the sdi service, would you characterize its overall value to you as: (a) very high 8 (b) high 24 (c) medium 30 (d) low 9
20. the marc file at present represents english monographs cataloged by the library of congress on a week-by-week basis. sometime in 1972, the library of congress will begin to add some non-english monographs to the marc file. keeping in mind the forthcoming expanded marc file on which future sdi service would be based, do you feel that its value to you would then be: (a) increased 30 (b) the same 33 (c) less 7
21. do you personally want this sdi service to be continued? (a) yes 62 (b) no 3 (c) it doesn't matter 5
22. do you feel that this sdi service should be offered to the entire faculty? (a) yes 42 (b) no 14. reasons:
23. do you feel that this sdi service should appropriately be made available by the university, i.e., that the university should organize and administer the service? (a) yes 36 (b) no 5 (c) don't know 23
24. do you feel that the university alone should pay for this faculty sdi service? (a) yes 30 (b) no 6 (c) don't know 25
25. optional: general comments, pros and cons, elucidation of above replies, attitudes, suggestions, etc., concerning the sdi service.
technical communications
announcements
with this issue we begin the process of shifting the emphasis and content of technical communications. some of the newsletter features of technical communications will be dropped due to the fact that as a quarterly publication it cannot be satisfactorily used to disseminate certain kinds of temporal information (e.g., short lead-time announcements, notification of institutes, seminars, meetings, etc.). instead, brief articles, letters, or comments on jola articles, and pertinent information about technical developments will hopefully assume a larger percentage of the allotted pages for technical communications. the isad editorial board, in approving these changes, voiced the opinion that technical communications would be much more useful as a result. concise technical communications and information notes featuring any aspect of the application of computers, systems analysis, or other technological developments (hardware, software, or techniques) pertinent to libraries are solicited. the design is also meant to provide a forum for the more rapid dissemination of information that will sometimes serve as the basis for the longer, more detailed articles which are published in jola. thus, the salient findings in a study, or the important developments taking place in a project or ongoing operation, can be made known long before they might otherwise be brought out in a formal presentation. these changes should become evident by the march 1974 issue, and to insure that this type of material does begin making its appearance, please send your letters, notes, and technical communications
to the editor of technical communications. (see cover sheet, page ii.)
technological inroads
catv library application
in mobile, alabama, a cable television subscriber can telephone the public library's reference department and turn to the library service channel to see the information requested over the telephone. the library installation costs are reported to be less than $500. the spectrum of patrons making use of the service includes financial analysts (looking at charts and graphs), illustrators and advertising personnel (obtaining pictorial representations), technicians requesting information from manuals, teachers, and even tourists looking for directional information. business applications loom important in the future and are already underway. it is now possible to offer a centralized microfilm storage with coded access to various documents. similarly, it was noted that retrieval and transmission of videotapes for the use of realtors will be explored. this would provide real estate agents with the ability to give a videotape tour of properties for sale. transmission time of a tape could be metered and billed to the appropriate realtor. other possible applications encompass such library activities as story hours, instruction for children in schools, and live telecast of library functions. (extracted from the american city, march 1973)
peacesat (pan pacific education and communication experiments by satellite)
populations in the pacific basin are often small in size and divided by great distances, making it impossible for many to sustain adequate levels of education, health care, and technically based services. inadequate communications constitute a principal barrier to development.
a start on the project was made in december 1970 when president harlan cleveland approved a grant from the u diversity's innovative program. in february 1971, nasa approval for use of the ats-1 was granted. dr. paul yuen, professor of electrical engineering, and icatashi nose, associate professor of physics, had two prototype ground terminals available when the federal communications commission approved licenses for the experiment. in phase i of the project, beginning april 1971, ground terminals constructed at the university were successfully testoperated and utilized between hawaii community college in hila and the manoa campus of the university of hawaii. the hawaii state legislature emphasized its support of the project by appropriating $75,000 in april 1971. the international network began in january 1972 with terminals at wellington polytechnic in wellington, new zealand, and the university of the south pacific in suva, fiji, joining the system. additional terminals have been established at maui community college, kahului, maui ( hawaii); papua new guinea institute of technology, lae, png; the university of south pacific centre, nuku'alofa, tonga ; and the department of education, pago pago, american samoa. operating terminals are being established at saipan and truk in the trust territory of the pacific islands. the project is administered by the university of hawaii with the assistance of the governor's committee on pan pacific educational communications, appointed by governor john burns and headed by uh president harlan cleveland. a faculty advisory committee assists development at the university of hawaii. recommendations for long range planning in medical research are provided by a medical communications study advisory committee project director is dr. john bystrom, assisted by james mcmahon, system coordinator. technical design and development is under the direction of dr. paul yuen. key to the system is a small inexpensive ground terminal designed and constructed at the university by katashi nose. each of the educational institutions which have terminals have their own autonomous staff and organization which operate the equipment and develop educational uses of the system. management of the peacesat terminal on the manoa campus is under carol misko, terminal manager. during its relatively short existence, the peacesat system has been utilized in a wide variety of educational and scientific programs. the east-west center used a receiving station on the ocean liner president wilson to conduct orientation sessions with its arriving grantees. hamilton library on the manoa campus has demonstrated exchange of materials with other locations via peacesat. doctors of the pacific research section of the national institute of health consult with doctors at the bethesda, maryland, national library of medicine. the hawaii cooperative extension service has used the system to conduct seminars with specialists from new zealand, fiji, tonga, and hawaii locations. faculty and students at the various campuses of the system have utilized the communication channels made available by peacesat. a few among the many disciplines they represent are political science, english, spanish, education, indonesian languages, physics, oceanography, computer science, journalism, urban planning, and speech-communication. it was the peacesat system which carried the world's first regularly scheduled class of instruction via satellite. 
within the pacific basin keen interest has been shown in the development of this project, as evidenced by discussion of peacesat at meetings of the south pacific forum and the south pacific commission. the peacesat network recently provided the means for south pacific poets to exchange their works with one another. amonv those joining in the wellreceived poetry series was the poet laureate of tonga. in april 1972 the national library of medicine awarded the university of hawaii a contract for a study of medical networking in the pacific, incorporating demonstrations of library and professional exchanges. hours of operation for the network are currently 9:00-10:00 a.m. and 4:306:00 p.m., monday through friday {honolulu time). the manoa exchange center is located in george hall (212) on the technical communications 65 campus of the university of hawaii. peacesat project, program in communication, university of hawaii, honolulu, hi 96822. phones: ( 808) 948-8848, ( 808) 948-8771, rca telex #723597. tomorrow's library: spools of tape libraries with ranks of musty tomes and files of catalog cards may be difficult to flnd in the future. books probably will be in museums; libraries will be on spools of computer tape. library users might push a button for a no-deposit, no-return paperback printout, instead of standing in line for a hardback from the stacks. movement in these directions has already begun at the university of georgia where a staff of 110 and $9 million of computer hardware provide the following type of service: a professor sits before a crt and types out the chemical names of ddt on the keyboard. almost immediately, the television screen above the keyboard displays a list of 176 scientific references to ddt. this information is the result of an electronic search of about 40,000 issues of chemical abstracts, a title compilation on computer tape of all published scientifl.c papers in chemistry. similar abstracts are available in other scientific fields, and three large foundation grants will enlarge these holdings to include literature in engineering, education, and the humanities. the information retrieval system allows a user to "browse as he would in a li· brary." but the browsing is done through one of 37 remote terminals. the number of remote terminals is expected to more than quadruple in future years, giving a total of some 200 individual outlets. (extracted from couege management) library projects and programs microfiche catalog by tulsa city-county library the tulsa city-county library computer output microfiche catalog was published in early march, according to ruth blake, director of technical services, tulsa city-county library. the catalog is in 66 journal of library automation vol. 6/1 march 1973 register-index format. the register, arranged by number, contains full bibliographic information for each title. adult and juvenile indexes contain brief bibliographic entries, location information, and a reference to the register number of each title. both indexes are in dictionary form, with authors, titles, and subjects in a single alphabet. minnesota bio-medical mini-computer project the university of minnesota bio-medical library has received a $361,729 three-year grant from the national library of medicine to provide support during the development of a low cost, stand alone, library dedicated computer system. 
the system will employ on-line terminals for data entry and file query functions, and will be based on an integrated system design of a processing system which would be suitable for use in other libraries of a similar size. the premise of the development is that an integrated acquisitions, accounting, in-process control system for all library materials coupled with an on-line catalog/ circulation control system can be operationally affordable by a library or system of libraries in the 200,000 volume class using its own computer system. a digital equipment corp. pdp 11/ 40 system has been selected. the cpu features 16k core, 16 bit word, power fail/ automatic restart, programmable real time clock, extended instruction set, and memory management option which permits access to 124k of memory. a dec writer data terminal will be used as console and initial terminal on the system. two 9 channel 800 bpi tape drives and one 40 million character moving head disk pack drive comprise the system's initial mass storage. a 132 column, 96 character set line printer completes the initial hardware configuration. before the system is installed, suitable crt type terminals and communications interfaces will be chosen. six of these terminals will be required when the system is fully operational. memory expansion in the cpu and additional mass storage may be acquired depending upon needs, although the design efforts will be to minimize the amount of core required for the system and most efficiently use the mass storage available. one of the problems of using a minicomputer system to service an interactive on-line library system is a lack of a suitable operating system which can require minimal residency in core, yet contain only the functions needed on a library system. current timesharing operating systems provide some parts of a system, such as device handlers, but require too great allocation of core, or programming in a compiler level language such as basic. this approach has been deemed unsatisfactory if system costs for hardware are to be kept reasonable. during the development period a pdp 11140 dos operating system will be used to assist in writing a hybrid operating system and utilities using the pdp 11/40 assembler language. also under development will be the file design, the system common modules, and system dictionary. these elements of the system will be required to then design and program the individual system applications. since the grant does not provide any support for data conversion, the circulation application will be developed and installed for the reserve materials. these only number a few thousand and involve short loan periods and other complexities which will provide an excellent test of a circulation control system for general library-wide use. other application systems, such as acquisitions and serials already are computer supported and therefore have existing machine-readable data files. the project staff includes glenn brudvig, director of the bio-medical library as principal investigator; audrey n. grosch of the university libraries systems division as project director and the following systems specialists: bob denney, carl sandberg, eugene lourey, and don norris. pertinent recent publications nationwi.de survey of library automation -phase i. the california state university and colleges has published the final report of phase i of its nationwide survey of library automation. 
this comprehensive survey performed for the chancellor's office-library systems project by inforonics, inc. covers over twenty-five library automation projects in the united states and canada. those interested in obtaining a copy should write, enclosing a check in the amount of $5.00 (californians remember the 6 percent tax) to chancellor's office; the california state university and colleges; 5670 wilshire blvd., suite 900; los angeles, ca 90036. a survey of commonplace problems in library automation, compiled by frank s. patrinostro. this survey documents actual library experiences concerning problems encountered, their causes, and what steps were taken to solve the problems. order from larc press, ltd.; 105-117 w. fourth avenue; peoria, il 61602. survey of commercitdly available computer-readable bibliographic data bases, edited by john h. schneider, marvin technical communications 61 gechman, and stephen e . furth. pub-· lished by asis. this reference tool provides descriptions of eighty-one machine-readable data bases. key papers on the use of computerbased bibliographic services, edited by stella keenan. published jointly by the national federation of abstracting and indexing services (nfais) and asis. contains selected papers on the use and evaluation of computer-based services. cost reduction for special libraries an.d information centers, edited by frank slater. published by asis. the four sections of the book cover an overview of recent literature on costing for 1ibraries; general cost reduction considerations; show and tell-special cost reduction efforts; and real costs for information managers. (the three preceding publications are available from publications division, american society for information science, 1140 connecticut ave., n.w., washington, dc 20036.) lib-s-mocs-kmc364-20140601052239 101 an interactive computer-based circulation system for northwestern university: the library puts it to work velma veneziano: systems analyst, northwestern university library, evanston, illinois northwestern university library's on-line circulation system has resulted in dramatic changes in practices and procedures in the circulation services section. after a hectic period of implementation, the staff soon began to adjust to the system. over the past year and a half, they have devised ways to use the system to maximum advantage, so that manual and machine systems now mesh in close harmony. freed from time-consuming clerical chores, the staff have been challenged to use their released time to best advantage, with the result that the "service" in "circulation services" is much closer to being a reality. the transition from a manual to an automated system is never easy. northwestern university library's experience with an automated circulation system was no exception. the first three months of operation were especially harrowing; there were times when only the realization that the bridges back to the old system were burned kept the staff plugging away with a system which often seemed in imminent danger of collapse. that they survived this period is a tribute to their persistence and optimism as well as to the merit of the system . the impressive array of obstacles was offset by a number of positive factors. even though there were mechanical problems with terminals, the on-line computer programs worked flawlessly from the first. the climate for change was favorable. 
the automation project had the complete support of library administration; the head of circulation services, although new to the department and untrained in automation, was completely committed to the system and was able to transmit his enthusiasm to his staff. 102 journal of library automation vol. 5/2 june, 1972 within three months, the systems analyst, who had been available for advice and trouble-shooting, began to fade from the scene. only an occasional minor refin ement is now necessary. maintenance problems, both in programs and procedures, are minimal. basically the system has proved itself workable. in a previous paper by dr. james s. aagaard (lola , mar. 1972 ), the development of the system is traced and the system is described in terms of its logical design, program, and hardware components. the present paper will describe how the system operates in the library environment. the system accomplishes the traditional library tasks connected with circulation, but the methods used have changed radically. the development of effective procedures must in large part be credited to the circulation staff. these procedures have in a real sense spelled the difference between an adequate system and a good one. it is these procedures on which we will concentrate. the author wishes to thank the head of circulation services, rolf erickson, and his assistants, mrs. eleanor pederson and mrs. lillian hagerty, for supplying th e information to bring her up-to-date on procedures as they have evolved over the past three years. book identification almost 100 percent of the 900,000 books in the main library's circulation collection contain punched cards. accurately punched book cards, available in all books, can make the difference between success and failure of a circulation system. the book cards contain only the call number and location code. there is no doubt that, if conversion funds had been less limited, we might have elected to capture author/title data. however an analysis of the amount of data which could be carried on an 80-column card, added to the fact that this would quadruple the cost, led to the decision to omit author/ title. as a result, key punch costs were exceptionally low-1.1 cent per card. in spite of our fears, the complaints by users because overdue and other notices do not contain author / title have been surprisingly few. cards for new books are, with a few exceptions, produced automatically as output from the technical services module. all book cards are also on magnetic tape and constitute a physical inventory of the entire circulating collection, which is updated at intervals and listed. user identification the system requires a unique numeric identification number for each borrower. for faculty and evanston campus students, this is their social security number; for special users it is a five -digit number assigned by the library from a list of sequential numbers. the number is supplemented by a one-digit code which identifies the type of user. ,. interactive circulation syste m j veneziano 103 the university's division of student finance has responsibility for issuance of punched plastic badges for students. each spring at preregistration time, data are gathered and pictures taken for students planning to return to the university in the fall. badges are ready for distribution as soon as school opens. for incoming freshmen, transfers, and returnees, data are gathered and badges punched at registration time in the fall. 
a temporary paper badge is used during the several weeks required for badge preparation. an outside contractor prepares and punches the badges. there were initial problems with the accuracy of punching but these have been resolved. the library now has a small ibm 015 badge punch, which it uses for punching special user badges and badges for carrel holders. student badges are valid for one year. the user code is changed each year to prevent use of an expired badge. faculty and staff badges are issued by the personnel department of the university, and are good for three years. these are also produced by an outside firm. book security exit guards examine all books taken from the library to ensure that they are properly charged. the call number on the book and on the date-due slip are compared; the user number on the date-due slip and the user's badge are compared. this need not be a character-by-character comparison. a few selected characters will suffice. student badges contain their pictures, which should bear at least a resemblance to the holder of the badge. initially, students were not required to show their badges. after a rash of book thefts resulting from the use of lost or stolen badges, this policy was changed. the book-check routine sometimes slows exit from the building during peak periods ; however it is considered a necessary security measure. the problem of lost badges is a serious one. users tend to leave badges in the terminals. usually such badges are turned in at the main circulation desk by the next user; the owner is notified to come in and pick it up. if a student loses his badge, he must report it to the circulation desk as soon as possible. he is issued a special use r badge, and the computer center is notified to "block" his regular user number. if someone then tries to use the badge, an "unprocessed" message will appear in lieu of a valid date-due slip. the problem is timing. "blocking" is done only once a day. a determined thief can charge out a considerable number of books before the number can be blocked. for this reason, a check of the photograph on the badge is important. the maximum number of user numbers which can be blocked is fifty. fortunately, except for faculty /staff badges which are good for three years, student badges automatically become invalid at the end of each school year, and special user badges expire at the end of each quarter. behind the decision to go on-line was the belief that a university library, 104 journal of library automation vol. 5/ 2 june, 1972 to effectively serve its patrons, needs to be able to determine the status of a book without delay. all books which are not in their places on the shelves as indicated by the card catalog are, in theory, retiected in the computer circulation file. out of a circulating collection of 900,000 items, the number of records in the file at any one time will range from 30,000 to 60,000. this includes books temporarily located in the reserve room, books being bound, and books which have been sent to the catalog department. it also includes books which are lost or missing but which have not yet been withdrawn from the catalog. a single 2740 typewriter terminal, located at the main circulation desk, is used for inquiry into the circulation file. a library user, having obtained the call number of a book from the catalog, looks for it in the stacks. if he is unable to find it, he inquires at the terminal. 
the operator enters a command "search," followed by the abbreviated call number of the book (the key ) . if one or more records with this key are in the file, the file address, plus the balance of the call number (the key extension), are typed back from the computer for each such record. if one of the listed records is the desired one, the operator then asks for a display of the record. the display includes the due date, type of charge, user number, and, if there is one, the saver number. the ability to use an abbreviated call number to access the file has proved invaluable. the operator can in effect "browse" among all the various editions, copies, and volumes of a particular book which are in circulation. the technique also facilitates finding a record, such as a volume in a serial, where the format is often quite variable, and not always obvious from the call slip supplied by the user. if a large number of books all with the same key are in the file, there is sometimes a considerable wait while the typewriter types out the addresses and key extensions for all the records. once such a listing begins, there is no way at present to cut it off in mid-point. this is a minor inconvenience; it could be remedied quite easily if computer core were not such a precious commodity. the single 2740 terminal is heavily used and plans are under way to substitute a cathode ray tube in the near future. book locate procedures if a search on the 27 40 terminal reveals that a book is not in circulation, the individual may ask that it be "located." a form is filled out and the book is searched nightly in the stacks. ( it is also again searched in the 27 40 since it may have been charged out to another user after the inquiry. ) if it is found , it is brought down and placed on the "save" shelf, and the inquirer is notified that it is available. if it is not found , the form is held for two weeks and searched again, both in the 2740 and the stacks. if it is not found on the second search, it is interactive circulation systemjveneziano 105 entered into the file as a "missing book." the circulation section has found that entering missing books into the file as soon as possible saves them time, because a search for a single book is often duplicated needlessly for a number of different individuals. save procedures when a user is informed that a book is in the circulation file, he may ask that it be called in for him, provided it is not on loan to the reserve koom and provided it is not already "saved" for someone else. the 2740 operator calls in the record and adds the saver's identification number to the record. each weekday morning, '·book needed notices" are sent over from the computer center for books "saved" since the last notice run. the notices are stuffed in window envelopes and mailed. even though the number of saves is small, in relation to the total number of books charged out, this feature has contributed to the library's and the user's satisfaction with the system. initially there was some consideration given to providing for multiple saves on the same book. a study of the frequency of multiple saves indicated that the increased system complexity did not warrant it. moreover, a student usually cannot wait too long for his turn at a book. a better solution in a university library is either to buy more copies or place high demand books in the reserve room, or both. the standard loan period is four weeks. 
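the abbreviated-key lookup and save placement just described can be summarized in a short sketch. the following python fragment is illustrative only: the record layout, the sample call numbers, and the function names are assumptions made for this example, not the actual northwestern programs (the rule that a book on loan to the reserve room cannot be saved is also omitted here).

```python
# illustrative sketch of the circulation-file lookup described above; field
# names, sample data, and function names are assumptions, not the actual code.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ChargeRecord:
    key: str                        # abbreviated call number used as the file key
    key_extension: str              # balance of the call number
    user_number: str                # borrower's identification number
    due_date: str
    charge_type: str
    saver_number: Optional[str] = None

# the on-line circulation file, sketched here as a simple in-memory list
circulation_file: List[ChargeRecord] = [
    ChargeRecord("510.8 q3", " v.2", "123456789", "1972-04-15", "regular"),
    ChargeRecord("510.8 q3", " v.3", "987654321", "1972-04-22", "regular"),
]

def search(key: str) -> List[Tuple[int, str]]:
    """return (file address, key extension) for every record under the key, so
    the operator can browse all editions, copies, and volumes in circulation."""
    return [(address, rec.key_extension)
            for address, rec in enumerate(circulation_file) if rec.key == key]

def place_save(address: int, saver_number: str) -> bool:
    """add the saver's identification number, provided the book is not already
    saved for someone else."""
    rec = circulation_file[address]
    if rec.saver_number is not None:
        return False
    rec.saver_number = saver_number
    return True
```

the value of the abbreviated key is that a single lookup returns every record sharing the stem, which is what lets the operator "browse" among editions, copies, and volumes whose exact format is not obvious from the call slip.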
a save on a book causes the due date to be recalculated either to two weeks, or to five days from the date of the save, whichever is later. this variable loan period increases the number of users who can use a book in high demand, without inconveniencing the user of a book which no one else needs. to succeed, such a call-in policy must be backed with enough force to ensure that a called-in book is returned promptly. if a book is returned after the revised due date, the user incurs a penalty fine of $1.00 per day in addition to the regular 10 cents per day fine. expired call-ins result in a weekly computer-generated reminder. when a book which is saved is discharged, the terminal printer issues a message to this effect, and the book can be placed on the "save" shelf instead of being sent to the stacks. each night "book available notices" are produced for all such books discharged since the last notice run. the first copy of the notice is mailed to the saver; the second part is inserted in the book. the saver is given five weekdays to pick up the book. book charges self-service charges during the regular school year, from 1000 to 1200 books per day are charged out through the system. most of these charges are processed by the users, on the self-service terminals. 106 ]ounwl of library automation vol. 5/2 june, 1972 a basic objective in the design of the system was to make it easy for the user to charge out books. initially it was planned to have manned chargeout terminals. however, as the design of the system progressed, it became evident that the vast bulk of charge-out transactions would consist of three simple steps: ( 1) insert the user badge, ( 2) insert the book card, and ( 3) tear off the date-due slip. the idea arose: if the procedure was so simple, why not let the user himself do it, thus saving the cost of terminal operators? there was some concern over user reaction, but it was decided it was worth the risk. a simple set of illustrated instructions is attached to the terminal. since the terminal will not accept badges or book cards unless they are inserted in the proper direction, the user soon gets the idea. the terminal will also refuse a seriously off-punched badge or book card. if everything is done properly, the printer produces a date-due slip containing the user number, the book ca11 number, and the date due. this is detached and placed in the book pocket. if, instead of a valid date-due slip, the user receives a slip from the printer containing the word "unprocessed," he is instructed to take all materials to the main circulation desk. this condition will occur if the individual tries to take out a book which is already charged out (perhaps to the reserve room or a carrel). it also happens if the badge or book card has fewer than the required complement of characters or if the user code on the badge has expired. it also happens if the user's number has been "blocked." although readers had no difficulty mastering the technique of using the 1031 badge/card reader, the 1033 printer was another story. despite the printed and illustrated injunction to "tear the slip forward ," the users insisted on pulling the paper upward. the result-the continuous roll of paper would start to skew and the paper would eventually jam. to alleviate the skewing problem, we had pin-feed platens installed in the typewriters. these prevented the skew, but the upward pull on the paper caused the pin-feed holes to tear and get out of alignment with the pins. the result-a paper jam. 
the ibm field engineers valiantly tried to overcome the condition but to no avail. ibm was unwilling to make any major modification of the paper feed mechanism, and no amount of argument that such an improvement would increase their sales to other libraries had any effect. in desperation, the library fina11y took its problem to the physics shop in northwestern university's technological institute. the technicians there designed and built a hooded feed to channel the paper upward and forward at the desired angle. a hand-actuated knife blade was installed to cut and dispense the ticket-type slip. in spite of these heroic efforts, paper jams still occur with enough frequency to be annoying. since the terminals are isolated in the stacks, a jam often goes undetected until a user comes down to the main circulation desk interactive circulation systemfveneziano 107 with a complaint. for this reason, we have plans to install a "ticket printer," which will automatically cut and eject a ticket with no user intervention. unlike the 1033 printer, there has been very little down-time due to malfunctioning of the 1031 badge/card reader. due to their isolation on the stack floors, there was some early tampering with the terminals. now that the newness has worn off, the terminals seem to have lost their appeal to pranksters, except that the photographs used to illustrate procedures have a way of disappearing. everything taken into consideration, the self-service concept has proved completely feasible. it saves staff time and user time. the time required to charge out a book ranges from ten to fifteen seconds. carrel charges each quarter, the circulation section assigns carrels to individuals, mostly graduate students and faculty. carrel holders may charge out books for use in their carrels. a special loan code is entered which results in the date-due slip bearing the word "carrel." the user cannot take these books from the building. carrel charges are subject to call-in after two weeks but are not subject to fines. at the end of each quarter, unless the carrel has been reassigned to the same individual, any remaining books in the carrel are picked up and discharged. once a quarter, the carrel user receives a computer-printed list of books charged to his carrel. carrel holders tend to charge large numbers of books. for saving time on their part and on the part of staff, plastic badges are issued. these will contain the carrel number, the carrel code, and an expiration date. carrel holders may then use the self-service terminals in the stacks. charges to the reserve room the reserve room does not use the circulation system for charges to individuals, since the loan period is so limited. however the circulation file contains a record of all books located in the reserve room. when a book is charged to the reserve room, the identification number of the reserve room is entered in the 1031 slides, together with a loan code indicating an indefinite loan period. processing of large batches of books is speeded up by suppressing the printing of date-due slips for all intralibrary charges. after charging, the punched book card is removed and held until the book is ready to be returned to the stacks, at which time the book is discharged in the regular manner. if a book needed for reserve cannot be found in the stacks, it is searched in the 2740 terminal. if it is in the file , a save is placed on the record which generates a book-needed notice. the user is given five days to return the book. 
when the book is returned and discharged, a printer message alerts the discharger, who places the book on the shelf for pick-up by the reserve room. if the book is not in the file, it goes through the "book locate procedure," after which, if it is not found, it is processed as a "missing" book. if such a missing book turns up, it can be immediately identified as needed by the reserve room. a quarterly listing, in call number order, is received from the computer center for all books charged out to the reserve room. this list serves as the reserve room's shelf list.
bindery charges
if a book is found to be in bad condition, it is set aside for a bindery decision. if it is beyond repair, it is charged out to the catalog department to be replaced or withdrawn. (after it is withdrawn it is deleted from the file.) if it can be repaired in-house, it is charged out to the mending section. if it must be sent to a commercial binder, it is charged to the bindery. the bindery section prepares an extra copy of the bindery ticket for all periodicals and unbound items, which it sends to the bindery. this ticket is used to keypunch a book card, which is then used to charge the book to the bindery. whenever a book is back from mending or binding, it is discharged before being sent to the stacks.
renewals
all renewals are processed at the main circulation desk. the procedure is identical to a regular charge except that a slide on the terminal is set to "renew." the new date-due slip will contain the phrase "renew to." in theory, the self-service terminals could be used for renewals. in practice, unless elaborate precautions were taken, a user could renew a book before it became due and then return it for discharge, leaving one slip in the book and keeping the other. after the book reached the stacks, the user could insert the extra date-due slip and walk out undetected. as protection against this, the original date-due slip must be in the book when it is renewed. phone renewals are not accepted. however, if the user mails or brings in his date-due slips, the renewal is processed on the 2740. in the renewal of a book via the 2740, the record is called in and modified to change the date due and enter the correct renewal code. the original date-due slip is stamped with the new date and the phrase "renewed." the slip is mailed to the user. although record modification via the 2740 is a valuable and necessary feature, it must be used with discretion, since the generalized file management system governing the 2740 does not have the controls contained in the circulation-specific portions of the program which handle data from the 1030's; for example, automatic calculation of date due, rejection of renewals on saved books, validation of codes, etc.
book discharge
returned books are left in book bins, one inside the building and one outside. it became very evident during the implementation phase of the system that the success of the system depended on a thorough screening before discharge, for purposes of detecting and deflecting potential problems before they got to the discharge terminal. books are first placed on dated trucks and then screened.
books without punched book cards
if the punched book card is missing, there will usually be a hand-written date slip in the book (the result of a manual charge). the screener pulls the matching book card from the "book-cards-pending" file.
( after a manual charge, book cards arc punched and filed in this file to await the return of the hook.) the book is then ready for regular discharge. if there is no book card waiting, the hook must be held until a card is ready. this is done to avoid the charge being made after the discharge. books with incorrect book cards all book cards are checked to see that they match the call number on the book pocket. sometimes cards get switched between two books by the user when he charges them; sometimes the error was made when the card was originally matched with the book. if a book is found to contain an incorrect card, sometimes the correct card will he found in the "cards-pending" file. if so, it is pulled and inserted and the hook sent for regular discharge. ( the incorrect card becomes a "snag". ) if the correct card is not found, the record is searched in the 27 40 under both call numbers ( the one on the card and the one on the book ) . if the record is under both call numbers, the record which matches the book is deleted; the book is sent to keypunching; the unmatched book card is filed in the "cards-pending" file to await the return of the book which matches it. , if the record is found under only one of the two call numbers, it is deleted. the book is sent to keypunching; the unmatched book card becomes a ''snag." "snag" cards will be searched in the shelf list and, if they represent valid books, will be searched in the stacks. this is done to determine if a matching book can be found. books without date due slips the presence of a date clue slip in a hook usually indicates that the book should be in the circulation file. a slip will be missing if the user nen'r charged it out or if he lost (or removed ) the slip after charging it out. such books arc searched on the 2740. if no record is found, the book is sent to the stacks. if a record is found, it is deleted. however, we wish to llo journal of library automation vol. .5 / 2 june, 1972 guard against the user returning to insert the date due slip and walk out with it; thus, the book is not sent to the stacks until the date due is past. regular 1031 discharges the speed and accuracy of discharge are features which have contributed much to the success of the system. a book with a date-due slip and book card which matches the book go to the 1030 terminal at the main circulation desk for discharge. one slide is set to either "fine paid" or "fine not paid." (if the user paid a fine at the time he returned the book, a "fine paid" flag will be in the book. ) another slide is set to "book returned today," or "book returned yesterday," or "book returned prior to yesterday." if the last condition applies, the date of return is also set in the slides. once set, these slides need not be reset until there is a change of date or fine condition. for minimizing the resetting of slides, books are segregated into groups all of the same type. discharging is the essence of simplicity. the book card is inserted in the reader; it feeds down and out and is replaced in the book. the date-due slip is discarded and the book is ready for shelving. for the purpose of speeding up discharge, no printer message is received unless there is an error (record not in file), or unless the book has a save on it, or is a "found" lost book. one operator can discharge five to six books per minute. books are almost always discharged within one day of return and usually within three or four hours. 
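the exception-only printer messages just described amount to a small dispatch rule; a minimal sketch, with field names and message strings assumed for illustration, might look like this.

```python
# minimal sketch of the discharge messages described above; the field names
# and message texts are assumptions for illustration only.
def discharge_message(record, lost_book_found=False):
    """return a printer message only in the exceptional cases; a routine
    discharge is silent so the operator is not slowed down."""
    if record is None:
        return "record not in file"   # book is routed to the 2740 operator
    if record.get("saver_number"):
        return "save"                 # book goes to the save shelf, not the stacks
    if lost_book_found:
        return "found lost book"      # routed to the staff member for lost/missing books
    return None                       # no message: slip discarded, book ready for shelving
```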
if a large number of books should pile up after a period of computer down-time (fortunately rare), a massive discharge campaign is launched. two operators, working together on the terminal, can discharge books at the rate of one every eight seconds. if at the time of discharge a "save" message appears on the printer, the book is placed on the save shelf instead of being sent to the stacks. if a lost book is "found," the message alerts the operator to send the book to the staff member in charge of lost and missing books. if a message is received to the effect that no record exists in the file, the book is routed to the 2740 operator. occasionally the 1031 terminal will misread a card, usually due to improper folding. if a card is folded outside the punched area it causes no trouble. unfortunately, some of the original cards were folded in the middle which sometimes results in a punch being missed. this, in some cases, cannot be detected by the computer program. if the error resulted from a mis-read card, the terminal operator can usually determine, from the date-due slip and the error-message slip, the key under which the record exists. the record is deleted and the book sent to have a replacement card punched. an occasional cause of the "record-not-in-file" condition results when the charge was processed on the standard register punch (the mechanized interactive circulation systemjveneziano 111 back-up system). this punch has a disconcerting habit of dropping a punch from badges which have a slight defect. there is no warning when this happens, and the error is often not detected until the transaction is later processed through the 1030 terminal. since it is impossible to identify the user with certainty, such cards are simply discarded without processing on the assumption that most users are basically honest and will return the book. the 27 40 operator, seeing a date-due slip with a short identification number, is safe in assuming the record never got in the file. sometimes the "record-not-in-file" condition is the result of a discharger absent-mindedly discharging a book twice. if the 2740 operator cannot find a record, she gives up and sends the book to the stacks. during the early days of operation, when much of the charging was being done on the source record punch, the "record-not-in-file" condition was often due to the book being "discharged" b efore the charge was processed. the very small amount of down-time now, coupled with careful scheduling when it does occur, has almost eliminated this source of error. overdue books overdue notices for students and special individual users are prepared once a week. to avoid sending out large numbers of notices for books only a few days overdue, an overdue notice is prepared only if the book is at least four days overdue. a second notice is prepared two weeks after the first; a third and final notice is prepared two weeks after that. if there is no response to the final notice within two weeks, a "delinquent" notice is prepared which is not sent out but is used to prepare a bill for a "lost" book. the overdue-notice run also produces reminders of expired call-ins. fines and fine collection faculty and staff are fine-exempt. students and other individual users pay a 10 cents per day fine for books overdue more than three days. in addition , if a reader does not respond to a call-in by the revised due date, he is charged a $1.00 per day penalty fine. 
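the fine rules above, together with the revised due date produced by a save (described earlier under the save procedures), reduce to simple arithmetic. the sketch below is illustrative only: it assumes a three-day grace period on the regular fine (one reading of "overdue more than three days"), reads "two weeks" for a save as two weeks from the original charge date, and invents the function names for the example.

```python
# illustrative arithmetic for the rules described above; the grace-period and
# "two weeks from the charge" readings, and the function names, are assumptions.
from datetime import date, timedelta
from typing import Optional

def revised_due_on_save(charge_date: date, save_date: date) -> date:
    """due date after a save: two weeks from the charge or five days from the
    save, whichever is later (the standard loan is otherwise four weeks)."""
    return max(charge_date + timedelta(weeks=2), save_date + timedelta(days=5))

def fine_due(returned: date, due: date, revised_due: Optional[date] = None,
             faculty_or_staff: bool = False) -> float:
    """fine in dollars for one returned book."""
    if faculty_or_staff:
        return 0.0                              # faculty and staff are fine-exempt
    fine = 0.0
    days_late = (returned - due).days
    if days_late > 3:                           # assumed three-day grace period
        fine += 0.10 * days_late                # regular fine: 10 cents per day
    if revised_due is not None and returned > revised_due:
        fine += 1.00 * (returned - revised_due).days   # call-in penalty: $1.00 per day
    return round(fine, 2)

# a called-in book returned ten days late and three days past the revised due date
print(fine_due(date(1972, 5, 20), date(1972, 5, 10), date(1972, 5, 17)))  # 4.0
```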
a user may elect to pay a fine on an overdue book at the time he returns it, in which case a "fine-paid" flag is inserted to alert the discharger to set the proper slide. no fine notice will result if this slide is set. for all other books returned late, fine notices are computer-prepared each weekday. these are on four-part forms; one copy is inserted in a window envelope and mailed; the other three parts are filed alphabetically by name. when the user pays his fine, the extra slips are discarded. if the fine is not paid in a reasonable period, one of the extra copies is sent as a follow-up notice. if no response to the follow-up is received, and if the total bill exceeds $3.00, the bill is sent to the department of student finance for collection. 112 journal of library automation vol. 5/2 june, 1972 sometimes the receiver of an overdue notice will come in to report that he ( l) returned the book, ( 2) lost it, or ( 3) never had it. in such cases the book is searched in the 27 40 because the book may have been returned since the overdue notice was prepared. if the record is still in the file, the item is verified in the shelf list. in some instances an incorrectly punched card is responsible for the item not being properly discharged. if a call number on a notice cannot be found in the shelf list, there is no alternative except to delete the record and absolve the reader of responsibility. if the call number on the notice represents a valid book, it is searched in the stacks and if found, is brought down and discharged, with a resultant fine notice. when the book cannot be found, the reader is usually held responsible for it, unless it was a case of a lost badge which was reported promptly, in which case the library is usually lenient. if no lost badge was involved, the book is processed as a "lost" book and the user is billed. a book is also considered lost (and the user billed) if the user does not respond to three overdue notices. weekly overdue notices are not prepared for faculty. instead a once-aquarter computer-produced memo is prepared informing the individual of the books charged to him. he is asked to return them or notify the library by carbon copy of the list that he wishes to retain them. if a faculty member does not return the list, the library calls in the books individually. as part of this quarterly memo run, listings of books charged to carrels and to departments (reserve room, bindery, cataloging, etc. ) are produced. these listings have proved very valuable in maintaining control over books charged out on a long-term basis. lost books when a book is determined to be "lost," a duplicate book card is prepared. the history of the loss, including the name and address of the individual involved, is entered on the card. if the reader is held responsible, the book is priced and a bill is prepared. the original record is left in the file until all the documents are prepared. then it is deleted via the 27 40, and a duplicate card is immediately used to charge the book out to the "lost" category. the duplicate card is then filed in the "lost/missing" file, by call number. another category of books is known as "missing." these are books which, although not charged out to anyone, cannot be found in the stacks. a duplicate card is prepared and used to charge the item out to the "missing" category. the card is filed in the "lost/missing" file. once a quarter, a computer-produced listing of lost/missing books is received. 
using this list, the stacks can be searched to see if the books have turned up. the list of books lost or missing for more than two years it turned over to the catalog department for withdrawal. after official withdrawal, the record will be deleted from the file. interactive circulation systemjveneziano 113 the fact that all lost/ missing books are reflected in the file has aided in detecting them if they turn up. if such a book is discharged, a printed message alerts the operator who routes the book to the person in charge of lost/missing books. the duplicate card is then pulled from the lost/missing file. since the card contains the name of the responsible individual, it is possible to trace down the original bill in case an adjustment is necessary. lost/ missing books also turn up if someone tries to charge them out. the "unprocessed" message which is printed instead of a date-due slip will usually cause the reader to bring the book to the main circulation desk where the proper action can be taken to reinstate it in the collection. manual charges the system had to be designed so a book could be charged out even if it did not have a punched book card. such books are brought to the main circulation desk where a two-part form is hand-prepared. one part becomes a date-due slip; the other part goes to keypunching. a composite card containing the call number, the user number, and the loan code is punched, which is then fed through the 1030 to create a charge record. also keypunched at this time is a regular book card, which is filed in the "cards-pending" file to await the return of the book. such manual charges are very unsatisfactory. call numbers and identification numbers are often illegible or miscopied. keypunch errors are not uncommon. care must be taken that the composite cards are processed through the 1030 before the book is returned for discharge. fortunately, books without cards are now a rarity. mechanized charges although the amount of computer down-time is very slight, some means had to be devised to charge out books during such periods. the manual charge procedure could have been used; however the high error rate in copying and punching, coupled with the delay in keypunching any substantial volume of cards, caused us to reject this as a back-up system. a standard register source record punch is used. this punch reads the badge and book card and transfers the data, plus data from a series of internal slides, to produce a printed date-due slip and a punched composite card. when the computer comes back on, the composite card is fed through the 1031 to set up the charge record. since only one machine can be justified from a cost standpoint, the process of charging books out in this fashion is slow. long lines of people often form, waiting for service. resetting the internal slides between one loan code and another is awkward and error-prone. the machine is extremely sensitive to badge quality and often misses a punch. however, as with manual charges, the most significant disadvantage is that charges are made "blind." there is no way to determine whether a book is not already in the file, or, if it is being renewed, that it has a save 114 journal of library automation vol. 5/2 june, 1972 on it. the user's number may be one of those "blocked" from use; this fact is not detected until it is too late. as with manual charges, care must be taken that all such mechanized charges are processed through the 1030 before any discharging is done. 
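the ordering constraint just noted (every backed-up charge must reach the file before discharging resumes) can be sketched as a small replay step. the transaction layout, the dictionary standing in for the circulation file, and the function name below are assumptions for illustration, not the actual 1030/1031 processing.

```python
# illustrative sketch of replaying back-up transactions in the required order;
# the data layout and the dictionary-based "file" are assumptions.
def replay_backlog(transactions, circulation_file):
    """transactions: list of ('charge' | 'discharge', call_number, user_number)
    tuples collected while the computer was down. all charges are applied
    before any discharge, so a discharge never reaches the file ahead of its
    own charge."""
    ordered = sorted(transactions, key=lambda t: 0 if t[0] == "charge" else 1)
    for kind, call_number, user_number in ordered:
        if kind == "charge":
            circulation_file[call_number] = user_number
        else:
            circulation_file.pop(call_number, None)
    return circulation_file
```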
in spite of its defects, the source record punch has proved useful as a system back-up. the error rate in transfer, while higher than on the 1030, is significantly less than the error rate of manually prepared and keypunched charges. although slow, records get into the file much faster than if they had to be keypunched. the impact of the system on the library the new system has had a profound impact on the operation of the circulation services section, but other departments have also been affected, particularly technical services. tighter control of cataloging is now maintained. no longer is it feasible for small uncataloged collections or collections with off-beat cataloging to exist in virtual isolation from the rest of the library. regulations as to depth of classification have had to be adopted; the formation of the cutter number and work letters must be carefully regulated; the assignment of volume and edition numbers must be uniform. location symbols require careful control; no longer can books be casually passed from one collection to another without official transfer. withdrawal of lost and missing books must be systematically performed. the system gives maximum flexibility-books may circulate on lc class numbers or document numbers as well as on a dewey number. ways of handling non-standard cutter numbers and work letters have been improvised. at the same time, the system operates to prevent unnecessary haphazard and shortsighted practices. within the circulation services section, the computerization of circulation has not resulted in fewer personnel; it has, however, resulted in the same number of staff members being able to handle a much larger volume of circulation and to handle it more efficiently. in addition, cirrulation services has taken on a number of tasks which in the past were either not its responsibility or, if they were, were given only perfunctory attention. a comprehensive inventory of the entire collection of 1,200,000 books in the main library is in progress. errors both on books and in the catalog are being corrected. the physical condition of the collection is being attended to. the content and quality of the collection are receiving increased attention. incomplete serial holdings are being brought to light for possible acquisition. books in the stacks which are candidates for inclusion in the "special collections" department are being detected. so far as circulation proper is concerned, it can be said without reservation that the system saves a great deal of clerical effort. staff time spent in charging out books is very small. discharging an average day's books interactive circulation system j veneziano 115 requires three or four man-hours. filing has almost disappeared as has most of the typing formerly required. a 2740 operator is required for inquiry and processing of mail renewals for the better part of the day and evening. the collection and follow-up on fines and bills is still a time-consuming job, although the extra forms available for follow-up have supplied some relief. the system is not perfect. there are certain improvements-such as on-line validation of users and automatic regulation of loan privilegeswhich would be made if the time and money were available for them. however, considering the modest cost of developing and operating the system, the imperfections are bearable. not the least of the benefits derived from the system is a somewhat intangible one. the role of the circulation librarian, and that of his staff, has changed. 
no longer are they chained to mountains of cards which, as soon as they are filed, must be unfiled. staff members have been challenged to use their released time to the best advantage. much thought and ingenuity has gone into setting up procedures to achieve maximum efficiency and accuracy. for the first time, perfection is seen as an attainable goal. each day the staff develops more sophistication and gets a step closer to that goal.
figure 1. user inserts identification badge and punched book card in self-service circulation terminal.
fig. 2. specially designed attachment is used to cut off printed
fig. 3. user inserts date-due slip in book pocket, completing charge procedure.
fig. 4. terminal at circulation desk has manual entry unit, which can be set to process charges without an identification badge, renewals, or discharges.
fig. 5. typewriter terminal is used for inquiry into file, placing saves on books, and occasionally for renewals.
a library in the palm of your hand: mobile services in top 100 university libraries
yan quan liu and sarah briggs
information technology and libraries | june 2015
abstract
what is the current state of mobile services among academic libraries of the country's top 100 universities, and what are the best practices for librarians implementing mobile services at the university level? through in-depth website visits and survey questionnaires, the authors studied each of the top 100 universities' libraries' experiences with mobile services. results showed that all of these libraries offered at least one mobile service, and the majority offered multiple services. the most common mobile services offered were mobile sites, text messaging services, e-books, and mobile access to databases and the catalog. in addition, chat/im services, social media accounts and apps were very popular. survey responses also indicated a trend towards responsive design for websites so that patrons can access the library's full site on any mobile device. respondents recommend that libraries considering offering mobile services begin as soon as possible, as patron demand for these services is expected to increase.
introduction
mobile devices, such as smart phones, tablets, e-book readers, handheld gaming tools and portable music players, are practically omnipresent in today's society. according to walsh (2012), "mobile data traffic in 2011 was eight times the size of the global internet in 2000 and, according to forecasts, mobile devices will soon outnumber human beings".1 studies have revealed that use of mobile devices is widespread and continues to increase. as of 2013, 56% of americans owned a smart phone (smith 2013). this number is even higher among people ages 18 to 29.2 however, peters (2011) points out that mobile phones at least can be found among people of all ages, nationalities and socioeconomic classes.
he  writes,  “we  truly  are  in  the  midst  of  a  global  mobile   revolution.”3  in  2012,  the  acrl  research  planning  and  review  committee  found  that  55%  of   undergraduates  have  smart  phones,  62%  have  ipods,  and  21%  have  some  kind  of  tablet.  over  67%   of  these  students  use  their  devices  academically.4  elmore  and  stephens  (2012)  write,  “academic   libraries  cannot  afford  to  ignore  this  growing  trend.  for  many  students  a  mobile  phone  is  no   longer  just  a  telephonic  device  but  a  handheld  information  retrieval  tool.”5       yan  quan  liu  (liuy1@southernct.edu)  is  professor  in  information  and  library  science  at   southern  connecticut  state  university,  new  haven,  ct,  and  special  hired  professor  at  tianjin   university  of  technology,  tianjin,  china.  sarah  briggs  (sjg.librarian@gmail.com)  is   library/media  specialist  at  jonathan  law  high  school,  milford,  ct.     a  library  in  the  palm  of  your  hand:  mobile  services  in  the  top  100  university  libraries  |     liu  and  briggs  |  doi:  10.6017/ital.v34i2.5650   134   it  is  clear  from  these  studies  that  academic  libraries  can  expect  their  patrons  to  be  accessing  their   services  via  mobile  devices  in  growing  numbers  and  need  to  adapt  to  this  reality.  however,  the   sheer  number  of  mobile  devices  on  the  market  and  the  myriad  ways  libraries  could  offer  mobile   services  can  be  daunting.  additionally,  offering  mobile  services  requires  investing  time,  money,   and  personnel.  in  order  to  give  libraries  a  starting  point,  this  paper  examines  the  current  status  of   mobile  services  in  the  united  states’  top  100  universities’  libraries  as  a  model,  specifically  what   services  are  being  offered,  what  are  they  being  used  for,  and  what  challenges  libraries  have   encountered  in  offering  mobile  services.  in  doing  so,  this  paper  attempts  to  answer  two  questions:   what  is  the  state  of  mobile  services  among  academic  libraries  of  the  country’s  top  ranked   universities,  and  what  can  the  experiences  of  these  libraries  teach  us  about  best  practices  for   mobile  services  at  the  university  level?     literature  review   current  status  of  mobile  services  in  academic  libraries   there  is  not  a  lot  of  data  regarding  the  prevalence  of  mobile  services  in  academic  libraries.  a  2010   study  found  that  35%  of  the  english  speaking  members  of  the  association  of  research  libraries   had  a  mobile  website  for  either  the  university,  the  library,  or  both  (canuel  and  crichton  2010).6  a   study  of  chinese  academic  libraries  revealed  that  only  12.8%  surveyed  had  a  section  of  their  web   pages  devoted  to  mobile  library  service  (li  2013).7  in  2010,  canuel  and  crichton  found  that  13.7%   of  association  of  universities  and  colleges  of  canada  members  had  some  mobile  services,   including  websites  and  apps.8  in  the  united  states,  a  2010  survey  found  that  44%  of  academic   libraries  offered  some  type  of  mobile  service.  39%  had  a  mobile  website,  and  36%  had  a  mobile   version  of  the  library’s  catalog.  half  of  libraries  which  did  not  offer  mobile  services  were  in  the   planning  process  for  creating  a  mobile  website,  catalog,  and  text  notifications.  
additionally,  40%   planned  on  implementing  sms  reference  services,  and  54%  wanted  the  ability  to  access  library   databases  on  mobile  devices  (thomas  2010).9  however,  it  is  widely  assumed  that  mobile  services   will  expand  rapidly  in  the  future  (canuel  and  crichton  2010).10  more  recently,  a  2012  survey  of   academic  libraries  in  the  pacific  northwest  found  that  50%  had  a  mobile  version  of  the  library’s   website  and/or  catalog,  40%  used  qr  codes,  38%  had  a  text  messaging  service,  and  18%  replied   “other”  with  mobile  interfaces  for  databases  being  a  popular  offering.  however,  31%  of  survey   respondents  still  did  not  have  any  mobile  services  (ashford  and  zeigen  2012).11  osika  and   kaufman  (2012)  surveyed  community  and  junior  colleges  nationwide  to  determine  what  mobile   services  were  being  offered.  73%  offered  mobile  catalog  access,  62%  offered  vendor  database   apps,  two  were  creating  a  mobile  app  for  the  library,  and  14.7%  had  a  mobile  library  website.12         definition  and  types  of  mobile  services   although  there  are  dozens  of  different  mobile  devices  on  the  market,  la  counte  (2013)  aptly  and   succinctly  defines  them  as  follows:  “the  reality  is  that  mobile  devices  can  refer  to  essentially  any   device  that  someone  uses  on  the  go”  (vi).13  smart  phones,  netbooks,  tablet  computers,  e-­‐readers,     information  technologies  and  libraries  |  june  2015   135   gaming  devices  and  ipods  are  examples  of  mobile  devices  that  are  now  commonplace  on  college   campuses.  barnhart  and  pierce  (2012)  define  these  devices  as  “…networked,  portable,  and   handheld…”14  additionally,  these  devices  may  be  used  to  read,  listen  to  music,  and  watch  videos   (west,  hafner  and  faust  2006).15  according  to  lippincott  (2008),  libraries  should  consider  all   their  patron  groups  as  potential  mobile  library  users,  including  faculty,  distance  education   students,  on-­‐campus  students,  students  placed  in  internships  or  doing  other  kinds  of  fieldwork,   and  students  using  mobile  devices  to  work  on  collaborative  projects  outside  of  school.16     the  most  common  mobile  services  discussed  in  the  literature  are  mobile-­‐friendly  websites  or  apps,   mobile-­‐friendly  access  to  the  library’s  catalog  and  databases,  text  messaging  services,  qr  codes,   augmented  reality,  e-­‐books,  and  information  literacy  instruction  facilitated  by  mobile  devices.   these  services  fall  into  one  of  two  categories:  traditional  library  services  amended  to  be  available   with  mobile  devices  and  services  created  specifically  for  mobile  devices.     common  library  services  that  have  been  updated  to  be  mobile-­‐friendly  include  a  mobile  website   (either  as  a  mobile  version  of  the  library’s  regular  site,  an  app,  or  both),  mobile-­‐friendly  interfaces   for  the  library’s  catalog  and  databases,  access  to  books  in  electronic  format,  and  information   literacy  instruction  which  makes  use  of  mobile  devices.  regarding  mobile  websites  and  apps,   walsh  (2012)  writes,     “if  a  well-­‐designed  app  is  like  a  top-­‐end  sports  car,  a  mobile  website  is  more  like  a  family  run-­‐ around.  
it  may  not  be  as  good  looking,  but  it  is  likely  to  be  cheaper,  easier  to  run  and   accessible  to  more  people.”17     it  is  not  feasible  to  replicate  the  entire  website  in  a  mobile  version,  so  libraries  must  know  what   patrons  find  most  important  and  address  that  information  through  the  mobile  site  (walsh  2012).18   according  to  a  2012  survey  of  academic  libraries  in  the  pacific  northwest,  the  most  popular  types   of  information  found  on  mobile  websites  are  links  to  the  catalog,  a  way  to  contact  a  librarian,  links   to  databases,  and  hours  of  operation  (ashford  and  zeigen  2012).19  many  libraries  are  also   providing  mobile  access  to  their  catalogs  and  databases.  this  is  sometimes  difficult  because  often   third-­‐party  vendors  are  responsible  for  the  catalogs  and/or  databases,  and  libraries  must  rely  on   these  vendors  to  provide  mobile  access  (iglesias  and  meesangnil  2011).20  however,  many  vendors   already  offer  mobile-­‐friendly  interfaces;  libraries  must  be  aware  when  this  is  the  case  and  provide   links  to  these  interfaces.  when  a  vendor  does  not  provide  a  mobile-­‐friendly  interface,  the  library   should  encourage  the  vendor  to  do  so  (bishoff  2013,  p.  118).21     there  is  a  growing  expectation  that  libraries  will  provide  e-­‐books  to  patrons  as  e-­‐books  become   increasingly  popular.  walsh  (2012)  states  that  the  proportion  of  adults  in  the  united  states  who   own  an  e-­‐book  reader  doubled  between  november  2010  and  may  2011.22  according  to  bischoff,   ruth,  and  rawlins  (2013),  29%  of  americans  owned  a  tablet  or  e-­‐reader  as  of  january  2012.23  this   has  presented  challenges  for  libraries,  mainly  in  two  areas:  format  and  licensing.  there  is  risk   involved  in  choosing  a  format  that  will  only  work  with  one  product,  i.e.  a  nook  or  a  kindle,     a  library  in  the  palm  of  your  hand:  mobile  services  in  the  top  100  university  libraries  |     liu  and  briggs  |  doi:  10.6017/ital.v34i2.5650   136   because  not  every  patron  will  own  the  same  device,  and  ultimately  one  device  might  become  the   most  popular,  rendering  books  purchased  for  other  devices  obsolete.  on  the  other  hand,  formats   that  work  with  multiple  devices  tend  to  have  only  basic  functionality  and  do  not  provide  an  ideal   user  experience  (walsh  2012).24  walsh  (2012)  recommends  epub,  which  works  well  with  many   different  devices,  is  free,  and  supports  the  addition  of  a  digital  rights  management  layer.25   licensing  is  also  an  issue  as  libraries  and  publishers  strive  to  find  a  method  of  loaning  e-­‐books   amenable  to  both.  no  one  model  has  emerged  which  is  mutually  satisfactory  (walsh  2012).26             libraries  are  increasingly  integrating  mobile  technologies  into  information  literacy  instruction   and  other  forms  of  instruction.  for  example,  services  such  as  skype  and  facetime,  which  walsh   (2012)  describes  as  “a  window  to  another  world”  (p.  105),  can  be  used  for  distance  learning,   including  reference  and  instruction.27  when  interactions  do  not  need  to  take  place  live,  many   mobile  devices  have  the  capability  to  take  pictures,  record  video,  and  record  audio  (walsh  2012,  p.   
97).28  this  allows  class  events,  including  lectures  and  discussions,  to  be  broadcast  to  people  and   spaces  beyond  the  physical  classroom.  walsh  (2012)  notes  that,  when  constructing  podcasts  or   vodcasts,  it  is  important  to  make  mobile-­‐friendly  versions  of  these  available,  bearing  in  mind   different  platforms  and  screen  sizes  people  might  be  using  to  access  the  content.29      text  messaging,  qr  codes,  and  augmented  reality  are  examples  of  library  services  that  were   created  expressly  for  mobile  devices.  text  messaging  in  particular  has  become  a  very  popular   mobile  service  offering;  as  thomas  and  murphy  (2009)  write,  “interacting  with  patrons  through   text  messaging  now  ranks  among  core  competencies  for  librarians  because  sms  increasingly   comprises  a  central  channel  for  communicating  library  information.”30  a  common  use  of  text   messaging  is  a  ‘text  a  librarian’  service.  walsh  (2012)  recommends  launching  such  a  service  even   if  the  library  currently  offers  no  other  mobile  services,  noting,  “it  can  be  quick,  easy  and  cheap  to   introduce  such  a  service  and  it  is  an  ideal  entry  into  the  world  of  providing  services  via  mobile   devices”  (p.  45).31  peters  (2011)  points  out  that  the  shorter  the  turnaround  time  (he  recommends   less  than  ten  minutes)  the  better.  he  notes  that  many  questions  arise  as  the  result  of  a  situation   the  questioner  is  currently  in.  he  writes,  “if  you  do  not  respond  in  a  matter  of  minutes,  not  hours,   the  context  will  be  lost  and  the  need  will  be  diminished  or  satisfied  in  other  ways.”32   qr  codes  have  become  popular  in  libraries  offering  mobile  services.  qr  codes  encode  information   in  two  dimensions  (vertically  and  horizontally),  and  thus  can  provide  more  information  than  a   barcode.  the  applications  necessary  for  using  qr  codes  are  usually  free,  and  they  can  be  read  by   most  mobile  devices  with  cameras  (little  2011).33  the  most  common  uses  of  qr  codes  in   academic  libraries,  according  to  elmore  and  stephens  (2012),  are  linking  to  the  library’s  mobile   website  and  social  media  pages,  searching  the  library  catalog,  viewing  a  video  or  accessing  a  music   file,  reserving  a  study  room,  and  taking  a  virtual  tour  of  the  library  facilities.34     augmented  reality  may  not  currently  be  used  as  often  in  libraries  as  other  services  such  as  mobile   sites  and  text  messaging,  but  many  libraries  are  finding  unique  and  compelling  ways  to  use  ar.  ar   applications  link  the  physical  with  the  digital,  are  interactive  in  real  time,  and  are  registered  in  3-­‐d.     
hahn (2012) defines ar as follows: "in order to be considered a truly augmented reality application, an app must interactively attach graphics or data to objects in real time, to achieve the real and virtual combination of graphics into the physical environment."35 he notes that such applications are excellent additions to libraries' mobile services because they connect physical and digital worlds, much like libraries.36 one example of augmented reality is north carolina state university's wolfwalk, which is advertised as "…a historical walking tour of the nc state campus using the location-aware campus map" (ncsu libraries).37 to create the tour, the ncsu libraries special collections research center provided over one thousand photographs of the campus from the 19th century to the present (ncsu libraries).38

research design
to make sure the information gathered was current and valid, this study employed two approaches, website visits and survey investigation, to determine the state of mobile services at the top 100 universities' libraries. the website visits explored what mobile services are being offered and how they are being offered at these university libraries. the survey sent via email inquired how the libraries are providing mobile services and what their results have been regarding challenges, successes, and best practices. the survey data was analyzed and compared to the data obtained via website exploration to form a more comprehensive picture of mobile services at these universities.

participants
university libraries' patrons are frequent users of mobile technology. according to osika and kaufman (2012), studies have found that 45% of 18- to 29-year-olds who have internet-capable cell phones do most of their browsing on their devices.39 kosturski and skornia (2011) note that people of this age group are "…leaders in mobile communication…the traditional college-age student."40 because these institutions are the nation's leaders in undergraduate and graduate programs and academic research, an examination of the status of the top 100 university libraries' mobile services can provide useful service patterns and a benchmark for the service improvements that would benefit academic programs. based on the u.s. news & world report's national university rankings, this study selected the top 100 universities in the 2014 rankings.41

procedure
website visits, the first step, were conducted from march 2, 2014 to march 16, 2014. each library's home page was carefully examined for the most common mobile services named in the literature, using these categorized items: 1) a mobile website or app, 2) mobile access to the library's catalog and databases, 3) text messaging services, 4) qr codes, 5) augmented reality, and 6) e-books. to assess each site, we first visited the site via a nexus 7 to see if it had a mobile version. next, we viewed each library's full site on a laptop computer.
we browsed through each page of the site looking for mention or use of each of these categories. we also searched for these items via the library's site map or site search functions whenever available. the results were tabulated with a codebook in the established categorization through microsoft excel.

although the website visits place great value on gathering quantitative data about what mobile services are offered at these libraries, this method has its limitations. firstly, it locates only those mobile services that appear on a library's website, but services the library provides which are not mentioned on the website can be overlooked. also, the use of mobile devices or services in library instruction, a very commonly mentioned mobile service in the literature, cannot generally be determined via a website visit. in addition, the website visit provides only a snapshot of the current state of mobile services; university libraries may be planning to implement or even be in the process of implementing mobile services. lastly, website visits evaluate what is publicly available, but it is not possible to access password-protected information meant only for a university's students and faculty to assess mobile content. to address these shortcomings, we created a survey using surveymonkey to complement the data supplied from the website visits. we sent out the survey via email to each of the top 100 universities' libraries. the survey was conducted from april 10, 2014, to april 24, 2014.

results and analysis
study results presented compelling evidence that mobile services are already ubiquitous among the country's top universities. the most recognized ones are mobile sites, mobile apps, mobile opacs, mobile access to databases, text messaging services, qr codes, augmented reality, and e-books. these service forms confirm those commonly named in the literature as library mobile services.

what basic types of mobile services do the libraries provide?
the results showed all of the libraries offered one or more of the specific mobile services in chart 1, with multiple entries allowed, presenting the modernized service patterns the university libraries provide to meet the needs and demands of university communities in this digital era.

chart 1. percentage of libraries offering specific mobile services (multiple entries allowed).

it is clear from both the survey results and the website visits that almost all libraries at the top 100 universities are offering multiple mobile services, with mobile websites, mobile access to the library's catalog, mobile access to the library's databases, e-books, and text messaging services being the most common. qr codes and especially augmented reality are not as common.
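as a rough illustration of how such a website-visit codebook can be tallied, the sketch below assumes the excel codebook has been exported to a csv file with one row per library and one 0/1 column per service category; the file name, the column names, and any figures it prints are the sketch's own illustrative assumptions, not the study's data or instruments.

import csv
from statistics import mean

# hypothetical export of the website-visit codebook: one row per library,
# one 0/1 flag per service category examined during the visits.
SERVICES = ["mobile_site", "mobile_app", "mobile_opac", "mobile_databases",
            "text_messaging", "qr_codes", "augmented_reality", "e_books"]

def tally(path="codebook.csv"):
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return {}, {}, 0.0
    n = len(rows)
    # percentage of libraries offering each service (cf. chart 1)
    pct = {s: 100 * sum(int(r[s]) for r in rows) / n for s in SERVICES}
    # distribution of how many services each library offers (cf. table 1)
    per_library = [sum(int(r[s]) for s in SERVICES) for r in rows]
    dist = {k: per_library.count(k) for k in range(len(SERVICES) + 1)}
    return pct, dist, mean(per_library)

if __name__ == "__main__":
    pct, dist, avg = tally()
    for s, p in sorted(pct.items(), key=lambda kv: -kv[1]):
        print(f"{s:<20} {p:5.1f}%")
    print("services per library:", dist, "average:", round(avg, 2))

a tally of this kind yields both the per-service percentages reported in chart 1 and the per-library distribution reported in table 1 below.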
of the eight main mobile services we looked for via the website visits and survey (mobile site, mobile app for the site, mobile opac, mobile access to databases, text messaging, qr codes, augmented reality, and e-books), all libraries surveyed offer between one and seven of these services. no universities have none of these services, and no universities have all of these services. only one university has one service, none have two, seven have three, thirteen have four, twenty-four have five, forty-six have six, and eight have seven. to make this information easy to read, we summarized it in table 1 below.

table 1. number of mobile services offered.
number of mobile services offered   number of libraries   percentage of libraries
no mobile services                          0                     0%
1 mobile service                            1                     1%
2 mobile services                           0                     0%
3 mobile services                           7                     7%
4 mobile services                          13                    13%
5 mobile services                          24                    24%
6 mobile services                          46                    46%
7 mobile services                           8                     8%
8 mobile services                           0                     0%

chart 1 data (percentage of libraries offering specific mobile services): e-books 92.6%, mobile opac 88.0%, mobile databases 81.7%, mobile website 81.6%, text messaging 77.2%, qr codes 58.7%, mobile app for site 29.2%, augmented reality 5.0%.

such a data pattern demonstrates not only that mobile services are very widespread at these universities' libraries, but also that the vast majority of these libraries offer multiple mobile services. in other words, libraries do not appear to be offering mobile services in isolation; they have taken several of their most popular services (such as websites, reference, and search functions) and mobilized all of them. in fact, the average number of mobile services offered among the eight services we examined is 5.31.

although results collected from the two research methods (website visits and survey) are almost identical for mobile websites and mobile opacs, and are very comparable for text messaging, qr codes, and augmented reality, there is a bit of a gap between results from the website visits and the survey regarding mobile databases (92.9% vs. 70.59%); perhaps the libraries that responded to the survey simply offer mobile access to databases less often than the libraries as a whole.

it is interesting that we located e-books on 100% of the websites we visited, but only 85.29% of respondents mention offering them. perhaps this discrepancy can be explained by a clarification in terms. we looked for the presence of books in electronic format that could be accessed online; perhaps survey respondents only considered e-books specifically formatted for smart phones or tablets as a mobile service. also, later in the survey several respondents mention communication issues as an ongoing challenge in offering mobile services, specifically, not always knowing what other library departments are offering in terms of mobile services.
it is possible that some survey respondents are not responsible for the e-book collection and thus did not mention it as a mobile service.

another discrepancy exists between the results for mobile apps for the library's site (20.2% for the website visits versus 38.24% for the survey). these results indicate that mobile apps for libraries' sites are more common than we had previously thought. perhaps these apps are being advertised in places other than the library's website, and therefore a website visit is not the best way to discover them.

the website visits did not look for mobile library instruction, mobile book renewal, or mobile interlibrary loan, but through our website visits we saw these services mentioned several times and thus included them in the survey. they turned out to be somewhat common among libraries surveyed; 41.18% of respondents offer mobile book renewal, 20.59% offer mobile interlibrary loan, and 32.35% offer mobile-friendly library instruction.

table 2 below compares the data collected from both the website visits and the survey among these 100 universities, ranking from high to low percentages. in most cases, they are very similar.

table 2. data comparison of specific mobile services between website visits & survey (percentage of libraries offering each service).
mobile service        website visits   survey
e-books                   100%          85.29%
mobile databases         92.90%         70.59%
mobile opac              87.80%         88.24%
mobile website           80.80%         82.35%
text messaging           80.80%         73.53%
qr codes                 61.60%         55.88%
mobile app for site      20.20%         38.24%
augmented reality         7.00%          2.94%

what content do the mobile sites offer?
in addition to assessing whether libraries had a mobile site, the survey asked libraries that already have a mobile site what is included on the site. 100% of libraries with mobile sites include library hours on their site, making this the most common feature. the next two most common features are library contact information and a search function for the catalog, which both received 96.67%. searching within mobile-friendly databases, such as ebscohost mobile, jstor and pubmed, is the next most popular feature, although it trailed a little behind library hours, contact information, and catalog searching at 70%. book renewal received 56.67%, and access to patron accounts received 53.33%. interlibrary loan is the least common feature by far, offered by only 26.67% of respondents. this information is summarized in chart 2 below.

chart 2. components of libraries' mobile sites: search the catalog 96.67%, library contact information 96.67%, search the databases 70.00%, book renewal 56.67%, access to patron accounts 53.33%, interlibrary loan 26.67%.

these results are interesting as, overall, they reflect higher percentages for specific mobile services than question 1 on the survey, which asked which mobile services libraries offer.
for example, in question 1, 88.24% of respondents offer mobile access to the library's catalog, whereas for libraries with mobile sites, 96.67% offer access to the catalog on the mobile site. the ability to search mobile-friendly versions of databases the library subscribes to was almost the same for both groups, with 70.59% of respondents to question 1 offering this and 70% of respondents having this as a component of their mobile sites. mobile book renewal is much more common among libraries with mobile sites: 56.67% of respondents with mobile sites compared to 41.18% of total respondents. a slightly higher percentage of respondents with mobile sites offer mobile interlibrary loan (26.67%) compared to all respondents (20.59%). this data suggests that, on the whole, libraries with mobile sites are more likely to offer other mobile services as well, specifically mobile access to the catalog, mobile book renewal, and mobile interlibrary loan.

what mobile reference services do libraries provide?
the survey also looked for information on virtual and/or mobile reference services. 81.25% of survey respondents offer text/sms messaging, 100% offer chat/im, and 21.88% offer reference services via a social media account. these results showing popular reference services in these top universities are summarized in chart 3 below.

chart 3. popular virtual/mobile reference services: chat/im 100%, text/sms 81%, social media 22%.

chat/im is obviously the most popular method of providing virtual/mobile reference services; all survey respondents offer this service. text/sms is also very popular, indicating that the majority of libraries see value in providing both despite their similar functions. the fact that social media does not compare favorably to either texting or chat/im services is curious, because most social media platforms have a mobile version available that libraries can take advantage of for free. however, this may not be the best medium for reference. one respondent commented on this question, "our 'ask a librarian' service is available from desktop facebook, but not on mobile facebook."

what apps do libraries use or provide for patrons?
although the website visits and survey results indicated that apps for a library's site are not very common, both tools revealed that use of apps for various purposes is widespread. the most commonly mentioned app is browzine, which is used for accessing e-journals. several respondents mentioned apps developed in-house for using library services, such as an app for reserving a study room, accessing university archives, and sending catalog records to a mobile device. another respondent stated that the university's app has a library function.
several respondents mentioned vendor-provided or third-party apps, such as apps for accessing pubmed, sciencedirect, naxos music library, accessmylibrary (for gale resources), a mobile medical dictionary, and the american chemical society. one respondent noted that the library loans ipads preloaded with popular apps to support student research, such as endnote, notability, goodreader, pages, numbers, and keynote, among others. finally, these apps were named at least once as an app libraries either use or provide access to: iresearch (for storing articles locally), boopsie (for building a library mobile app), ebrary (for accessing e-books), and safari (for accessing books and videos online). these results indicate that the use of apps is fairly robust and diverse among these libraries. additionally, from these results, it seems more common for libraries to use and/or provide apps created by third parties than to develop an in-house app, perhaps due to the expertise and expense involved in creating and maintaining an app.

what mobile services will be added in the future?
the final question of the survey asks libraries if there are any plans to offer a mobile service not currently provided. responses are summarized in chart 4 below.

chart 4. percentage of the libraries seeking to add specific mobile services: mobile library instruction 62%, mobile website 46%, mobile interlibrary loan 38%, mobile book renewal 15%, mobile databases 15%, mobile opac 15%, augmented reality 8%, e-books 8%, mobile app(s) 8%, qr codes 0%, text messaging services 0%.

the most common selection is mobile-friendly library instruction, with 61.54%. the next most common is a mobile website (46.15%). mobile interlibrary loan was chosen by 38.46% of respondents. less common services planned include adding mobile access to the library's opac, mobile access to the library's databases, and mobile book renewal, each of which were chosen by 15.38% of respondents. 7.69% of respondents are planning to add mobile apps, e-books, and augmented reality, respectively. no one indicated plans to add text messaging services or qr codes. these results indicate that libraries expect demand for traditional library services in a mobile-friendly format to continue to expand; mobile-friendly library instruction was only offered by 32.35% of respondents, yet 61.54% have plans to offer this service in the future. mobile interlibrary loan is currently offered by 20.59% of respondents, so the fact that 38.46% would like to add it represents a significant change. not surprisingly, mobile websites are likely to remain a very popular mobile service.
the  fact  that   82.35%  of  respondents  already  have  a  mobile  website  and  46.15%  who  do  not  have  one  wish  to   add  one  in  the  near  future  means  that  mobile-­‐friendly  sites  are  well  on  their  way  to  becoming   ubiquitous,  at  least  among  libraries  at  the  top  100  universities,  and  may  reasonably  be  expected  to   take  their  place  among  websites  in  general  as  a  necessity  to  maintain  institutional  viability.   additionally,  several  respondents  mentioned  moving  towards  responsive  design,  in  which  their   websites  are  fully  functional  regardless  of  whether  they  are  accessed  on  mobile  devices  or   desktops.   what  are  challenges  and  strategies  for  offering  mobile  services?   in  addition  to  looking  for  the  presence  or  absence  of  mobile  services  being  offered  at  top  100   university  libraries,  the  survey  also  examined  libraries’  experiences  in  implementing  mobile   services,  including  challenges,  successes,  and  best  practices.  several  themes  emerged  in  response   to  these  questions.  the  most  common  challenge  among  respondents  was  having  the  time,   expertise,  staffing  and  money  to  support  mobile  services,  especially  apps  and  mobile  sites.  to   solve  this  problem,  respondents  mention  relying  on  vendors  and  third-­‐party  providers  supplying   apps  to  access  their  resources,  but  this  does  not  give  libraries  the  flexibility  and  specificity  of  an   in-­‐house  app.     another  common  challenge  mentioned  by  several  respondents  involved  technical  issues,  such  as   difficulties  with  off  campus  access  to  resources  via  a  proxy  server  and  compatibility  issues  among   different  browsers  and  especially  different  devices.  a  lack  of  communication  and/or  support  is   another  issue  for  libraries.  one  respondent  reported  a  lack  of  support  from  the  campus  computing   center  for  mobile  services.  one  respondent  discussed  the  difficulty  of  having  a  coordinated  mobile   effort  when  the  library  has  a  large  number  of  departments,  and  each  department  may  or  may  not   be  aware  of  what  the  others  are  doing  in  regards  to  mobile  services.  survey  results  revealed  that   few  libraries  have  policies  in  place  to  support  mobile  services.     coming  up  with  a  specific  plan  for  implementing  such  services  can  help  libraries  work  towards   promoting  effective  communication  and  garnering  support.  one  respondent  wrote,  “the  biggest   challenges  have  been:  (1)  developing  a  strategy  (2)  developing  a  service  model  (3)  having  a   systematic  model  for  managing  content  for  both  mobile-­‐  and  non-­‐mobile  applications.  we've  had     information  technologies  and  libraries  |  june  2015   145   success  with  the  first  two  and  are  making  great  progress  on  the  third.”  interestingly,  several   respondents  noted  that  underuse  is  an  issue  for  some  services.  one  respondent  mentioned  that   qr  codes  are  not  used  often,  and  another  mentioned  that  the  library’s  text-­‐a-­‐librarian  service  is   much  underutilized.  several  respondents  cited  the  need  to  market  mobile  services  as  an  antidote   to  this  problem.  seeking  regular  feedback  from  the  user  community  regarding  mobile  services   wants  and  needs  is  another  recommended  solution.   
other issues include the fact that not all library services are mobilized. however, libraries are actively looking for solutions to this. there is a trend among respondents towards developing a site that is responsive to all devices, including desktops, laptops, tablets, and phones; this will take the place of a separate mobile site. as one respondent states, "at the moment, our library mobile website only has a fraction of the services available via our desktop website. we are in the process of moving everything to responsive design, with the expectation that all services will be equally available in mobile and desktop." in reading through these responses, one message is clear: mobile services are a must. several respondents noted that demand for mobile services is growing, with one writing, "get started as soon as possible. our analytics show that mobile use is continuing to increase."

conclusion
this study confirms that as of spring 2014 mobile services are already ubiquitous among the country's top 100 universities' libraries and are likely to continue to grow. while the most common services offered are e-books, chat/im, mobile access to databases, mobile access to the library catalog, mobile sites, and text messaging services, there is also a trend towards responsive design for websites so that patrons can access the library's full site on any mobile device.

the experiences of these libraries demonstrate the value of creating a plan for providing mobile services, allotting the appropriate amount of staffing, time, and funding, communicating among departments and stakeholders to coordinate mobile efforts, marketing services, and regularly seeking patron feedback. however, there is no one approach to offering mobile services, and each library must do what works best for its patrons.

references
1. andrew walsh, using mobile technology to deliver library services (maryland: scarecrow press, 2012), xiv.
2. "smartphone ownership 2013," last modified june 5, 2013, http://www.pewinternet.org/2013/06/05/smartphone-ownership-2013/.
3. thomas a. peters, "left to their own devices: the future of reference services on personal, portable information, communication, and entertainment devices," reference librarian 52 (2011): 88-97, doi:10.1080/02763877.2011.520110.
4. acrl research planning and review committee, "top ten trends in academic libraries," college & research libraries news 73 (2012): 311-320.
5. lauren elmore and derek stephens, "the application of qr codes in uk academic libraries," new review of academic librarianship 18 (2012): 26-42, doi:10.1080/13614533.2012.654679.
6. robin canuel and chad crichton, "canadian academic libraries and the mobile web," new library world 112 (2011): 107-120, doi:10.1108/03074801111117014.
7.
aiguo li, "mobile library services in key chinese academic libraries," journal of academic librarianship 39 (2013): 223-226, doi:10.1016/j.acalib.2013.01.009.
8. robin canuel and chad crichton, "canadian academic libraries," 107-120.
9. lisa carlucci thomas, "gone mobile? (mobile libraries survey 2010)," library journal 135 (2010): 30-34.
10. robin canuel and chad crichton, "canadian academic libraries," 107-120.
11. "mobile technology in libraries survey," last modified 2012, http://www.ohsu.edu/xd/education/library/about/staff-directory/upload/mobile_survey_academic_final.pdf.
12. brittany osika and cate kaufman, "'mobilizing' community college libraries," searcher 20 (2012): 36-46.
13. scott la counte, "introduction," in mobile library services: best practices, ed. charles harmon and michael messina (maryland: scarecrow press, 2013), v-vii.
14. fred d. barnhart and jeannette e. pierce, "becoming mobile: reference in the ubiquitous library," journal of library administration 52 (2012): 559-570, doi:10.1080/01930826.2012.707954.
15. mark andy west, arthur w. hafner, and bradley d. faust, "expanding access to library collections and services using small-screen devices," information technology & libraries 25 (2006): 103-107.
16. joan k. lippincott, "mobile technologies, mobile users: implications for academic libraries," arl: a bimonthly report on research library issues & actions 261 (2008): 1-4.
17. walsh, using mobile technology, 58.
18. ibid.
19. "mobile technology in libraries survey."
20. edward iglesias and wittawat meesangnil, "mobile website development: from site to app," bulletin of the american society for information science and technology 38 (2011): 18-23, doi:10.1002/bult.2011.1720380108.
21. joshua bishoff, "going mobile at illinois: a case study," in mobile library services: best practices, ed. charles harmon and michael messina (maryland: scarecrow press, 2013), 107-121.
22. walsh, using mobile technology.
23. helen bischoff, michele ruth, and ben rawlins, "making the library mobile on a shoestring budget," in mobile library services: best practices, ed. charles harmon and michael messina (maryland: scarecrow press, 2013), 43-54.
24. walsh, using mobile technology.
25. ibid.
26. ibid.
27. ibid., 105.
28. ibid., 97.
29. ibid.
30. "go mobile: use these strategies and increase your mobile literacy and your patrons' satisfaction," last modified november 1, 2009, http://libraryconnect.elsevier.com/articles/technology-content/2009-11/go-mobile.
31. walsh, using mobile technology, 45.
32. peters, "left to their own devices."
33. geoffrey little, "keeping moving: smart phone and mobile technologies in the academic library," journal of academic librarianship 37 (2011): 267-269, doi:10.1016/j.acalib.2011.03.004.
34. elmore and stephens, "the application of qr codes."
35.
jim hahn, "mobile augmented reality applications for library services," new library world 113 (2012): 429-438, accessed june 21, 2014, doi:10.1108/03074801211273902.
36. ibid.
37. "wolfwalk: explore nc state history right on your phone," http://www.lib.ncsu.edu/wolfwalk/.
38. ibid.
39. osika and kaufman, "mobilizing community college libraries."
40. kate kosturski and frank skornia, "handheld libraries 101: using mobile technologies in the academic library," computers in libraries 31 (2011): 11-13.
41. "national university rankings," http://colleges.usnews.rankingsandreviews.com/best-colleges/rankings/national-universities/spp+50.

entry/title compression code access to machine readable bibliographic files
william l. newman: systems analyst and programmer, and edwin j. buchinski: assistant to the librarian, systems and planning, university of saskatchewan, saskatoon.

an entry/title compression code is proposed which will fulfill the following requirements at the library, university of saskatchewan: 1) entry/title access to marc tapes; 2) entry/title access to the acquisitions and cataloguing in-process file; and 3) entry/title duplicate order edit within the acquisitions and cataloguing in-process file. the study which produced the code and applications for the code are discussed.

introduction
the determination and design of access points, or keys, to machine readable bibliographic files is a major problem faced by libraries planning computer assisted processing. alphabetic keys, i.e. truncations of title and/or author variable fields, are inadequate, since minor differences in spelling, punctuation, or spacing between master key and request key cause difficulties in accessing records. numeric keys, such as library of congress card numbers, isbn, purchase order numbers, etc., are therefore usually employed for searching machine readable library files. more sophisticated means must be developed in order to maximize the usefulness of these files, since a searcher, even with book in hand, may not be able to provide the numeric key necessary to obtain the book's machine readable data. this problem may be solved through the use of compression codes generated from author/title, or other bibliographic information. studies of compression codes and their performance have been reported by ruecking (1), kilgour (2), and the university of chicago (3). this approach has been endorsed by the library of congress (4) in the recon study. studies at the library, university of saskatchewan, were initiated with the hope of producing a compression code that would provide machine duplicate order edit in the acquisitions and cataloguing in-process file, and retrieve entries, using unverified or verified bibliographic information as input, either from a partially unverified file, such as the acquisitions and cataloguing in-process file, or from an authoritative machine readable data base, such as marc ii. in addition, the desired code would have to minimize the effect of errors in punctuation and spelling in order to achieve a high retrieval percentage, yet produce a low volume of duplicate codes for dissimilar works.
construction of the data base
since june, 1969, the marc data base has been used at the library to generate unit cards which have been used as source data for unit card masters in the cataloguing department. approximately 300 of these were drawn at random. at the same time the original order request forms for these items were searched. of the 300 items, 254 requisition forms were found in the manually maintained acquisitions in-process file. the lc card numbers were used to retrieve the corresponding marc records from the library's history marc tape. an additional 4,128 marc records were placed on the same tape as the 254 marc records for which order request information existed. the lc card numbers, order entry, title, and, if present, date of publication were keypunched from the 254 requisition forms. this bibliographic information formed the data base from which search codes were produced for the acquisitions department records.

code generation
a computer program performed the following modifications on all input data prior to generating the actual compression codes. first, all the lowercase alphabetics were converted to upper-case alphabetics. then all punctuation was eliminated from the title field except for periods and apostrophes within a word. a word compaction routine then eliminated periods from within abbreviations and apostrophes from within words. the entries from the 4,382 marc records and the 254 requisition forms were categorized according to personal name, and corporate or conference name. the first comma delimited the portion of the personal name to be used in the compression routine. spaces, diacritics, periods, apostrophes, and hyphens were all eliminated from the personal name.

the first two codes used in the project were labelled imaginatively code type 1 and code type 2, where code type 1 was a slight modification of the code developed by frederick h. ruecking (1). code type 2 was based on a modified university of chicago experimental search code (3), incorporating ideas from some of ruecking's studies.

code type 1
title compression (16 characters): see ruecking (1) for the rules which were used to construct the four-character compressions.
entry compression (12 characters): three four-character compressions were used for corporate or conference names instead of ruecking's four. one four-character compression was produced for personal names.
date of publication (3 characters): if the year of publication was available, the last three digits were used; otherwise, the date was left blank.
the total length of code type 1 is 31 characters.

code type 2
title compression (6 characters):
1) "a", "an", "the", "and", "by", "if", "in", "of", "on", "to" were deleted from the title.
2) the first word containing two consonants was located and the first two consonants appearing in the word were used for the search code.
3) step 2 was repeated with a second and third word of the short title, whenever these were available.
4) if three words with two consonants were not available, the balance of the six characters needed for the code were supplied by those characters immediately after the last character used (except for blanks).
entry compression (6 characters):
a) personal name.
1) only the surname, or the forename if there was no surname.
2) if the name had six or fewer characters, the entire name was used.
otherwise, vowels were deleted from the name (working backwards on the name) until the six-character compression was formed, or the second consonant was located.
3) if the six-character compression was not formed by step 2, then the first four characters and the last two characters were used for the six-character compression.
b) corporate and conference entries: the rules for title compression to form the six-character code were followed.
date of publication (3 characters): the last three digits of the date of publication, as in code type 1, were used.
in either of the codes, if the title was the main entry, a code was generated with the entry field blank.

examples of code generation
title: factors in the transfer of technology.
entries: 1) m.i.t. conference on the human factor in the transfer of technology, endicott house, 1966. 2) gruber, william h. 3) marquis, donald george. 4) massachusetts institute of technology.
date of publication: 1969.

code type 1 compressions (b denotes a blank position; each code is shown with the date and with the date position blank):
1) FACTTRSFTCHNbbbbMITbCOFRHUMb969 / FACTTRSFTCHNbbbbMITbCOFRHUMbbbb
2) FACTTRSFTCHNbbbbGRBRbbbbbbbb969 / FACTTRSFTCHNbbbbGRBRbbbbbbbbbbb
3) FACTTRSFTCHNbbbbMAQSbbbbbbbb969 / FACTTRSFTCHNbbbbMAQSbbbbbbbbbbb
4) FACTTRSFTCHNbbbbMATTINTTTCHN969 / FACTTRSFTCHNbbbbMATTINTTTCHNbbb

code type 2 compressions:
1) FCTRTCMTCNHM969 / FCTRTCMTCNHMbbb
2) FCTRTCGRUBER969 / FCTRTCGRUBERbbb
3) FCTRTCMARQUS969 / FCTRTCMARQUSbbb
4) FCTRTCMSNSTC969 / FCTRTCMSNSTCbbb

procedure and results
the two types of codes were generated from the 4,382 marc records using publication date, short title, main entry, and added entries. another program was written to generate codes from the acquisitions department data on cards and to write them on a separate tape using publication date if available, entry, and the first four significant words of the title and/or the words of the title up to the first punctuation mark. the two tapes containing codes were sorted in ascending code sequence, then compared. if the code generated from the acquisitions data, hereafter called the unverified code, was exactly the same as the code generated from marc tape, hereafter called the verified code, the codes and corresponding lc card numbers were printed as a hit.
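the following is a minimal python sketch, not the authors' pl/i and cobol programs, of how the code type 2 construction and the tape comparison just described could be approximated; the function names are the sketch's own, and several rules (diacritic handling, the "second consonant" stopping condition, the padding step) are simplified.

import re

STOPWORDS = {"a", "an", "the", "and", "by", "if", "in", "of", "on", "to"}
VOWELS = set("aeiou")

def normalize(text):
    # keep only letters, digits and spaces (a rough stand-in for the
    # punctuation stripping and word compaction described above)
    return re.sub(r"[^a-z0-9 ]", "", text.lower())

def two_consonants(word):
    return "".join(c for c in word if c not in VOWELS)[:2]

def title_code(title):
    # code type 2 title compression (simplified): drop stop words, then take
    # the first two consonants of up to three qualifying words
    words = [w for w in normalize(title).split() if w not in STOPWORDS]
    parts = [two_consonants(w) for w in words if len(two_consonants(w)) == 2]
    code = "".join(parts[:3])
    leftover = "".join(words)          # crude padding source if < 3 words
    return (code + leftover)[:6].ljust(6).upper()

def personal_entry_code(name):
    surname = normalize(name.split(",")[0]).replace(" ", "")
    if len(surname) <= 6:
        return surname.ljust(6).upper()
    chars = list(surname)
    # delete vowels working backwards until six characters remain
    for i in range(len(chars) - 1, -1, -1):
        if len(chars) <= 6:
            break
        if chars[i] in VOWELS:
            del chars[i]
    s = "".join(chars)
    return (s[:4] + s[-2:]).upper() if len(s) > 6 else s.ljust(6).upper()

def code_type_2(title, entry=None, personal=True, year=None):
    entry_part = ("".ljust(6) if entry is None
                  else personal_entry_code(entry) if personal
                  else title_code(entry))
    date_part = str(year)[-3:] if year else "   "
    return title_code(title) + entry_part + date_part

def match(unverified, verified):
    # both arguments map code -> lc card number; identical codes with the same
    # lc number count as retrievals, identical codes with different numbers
    # are false drops
    hits = set(unverified) & set(verified)
    retrievals = [c for c in hits if unverified[c] == verified[c]]
    false_drops = [c for c in hits if unverified[c] != verified[c]]
    return retrievals, false_drops

if __name__ == "__main__":
    # reproduces the shape of the published example: FCTRTCGRUBER969
    print(code_type_2("factors in the transfer of technology",
                      "gruber, william h.", personal=True, year=1969))

under these simplifications the sketch yields the same six-character title and entry compressions as the worked example above, which is the essential property the comparison step relies on.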
as an immediate consequence of the analysis of tables 1 and 2, the publication date was eliminated from the codes and the comparison program rerun producing the results given in table 3. table 1. code performance retrievals false drops code type 1 200 0 code type 2 206 0 table 2. non-retrieval a nalysis numbe r of no n-retrievals error type a code type 1 b c d e table 3. code performance rett·ievals code type 1a code type 2a 220 226 9 10 8 7 20 false drops 0 0 percent retrieval 78.74 81.10 code type 2 9 7 8 4 20 percent retrieval 86.61 88.98 no duplicate codes existed within the unverified code tape. from the 4,382 marc records, 6,828 cod es were produced for each of code type 1a entry/title compression code /n ewman and buchinski 77 and code type 2a. works having the same author and title, but different imprint, were not considered duplicates even though the program listed them as such. seven duplicates, one triplicate and one quadruplicate occurred in code type 1a; and eight duplicates, two triplicates and two quadruplicates in code type 2a. government publications were responsible for all but one of the duplicate codes. code type 2b a graph of the number of duplicate codes vs. the number of source records was drawn for code type 1a and code type 2a (fig. 1 ). as a result of this graph code type 2b was proposed. this code employed the same rules for construction as code type 2a, except that four significant words from the title and four significant words from corporate or conference entries were used to generate the compression. the total length of code type 2b is thus sixteen characters. six duplicates, one triplicate and one quadruplicate appeared when the comparison program was run using code type 2b. figure 1 is a graph of the result. ~ < (.) 15 ~ 10 rx. 0 5 looo 2000 3000 number of marc records 4000 2a la 2b fig. 1. numb er of duplicates vs number of source records for code types la, 2a and 2b. 78 journal of library automation vol. 4/2 june, 1971 the performance of code type 2b is summarized in tables 4 and 5. table 4. code performance retrievals code type 2b 223 table 5. non-retrieval analysis error type applications marc tapes a b c d f al:>e drops 0 percent retrieval 87.80 number of nonretrievals code type 2b 9 10 8 4 a marc code tape was recently created and is being maintained at the university of saskatchewan, as flowcharted in figure 2. each record on the tape consists of a compression code and an lc card number. approximately 100,000 entry /title keys, plus series statement and sbn keys, have been created from the 65,000 records on the current marc history tape. figure 3 illustrates how these access points are used to provide unit card printouts. figure 4 shows a sample output from the matching step in figure 3. this printout indicates the results of the search, and serves as a link between the request and the catalog card printed from the marc tape. in the printout, entry / title requests that have found more than one lc card number do not necessarily indicate a false drop. so far, these multiple finds have resulted from the same publications appearing on marc with different imprints. it is a simple matter to select the catalog card with the appropriate imprint. the discrepancy in table 6 between marc records found and titles verified is due to the above, and to multiple hits on a single record when requests for that record were submitted in more than one form, i.e. s.b.n. and author/ title. 
table 6 presents a summary of the results of submitting unverified requests over a four-week period against the marc code tape. during that time, 563 english language monographs with potential 1969 and 1970 imprints were searched. desired marc records were found for 184 titles, or 32.7% of these requests. the source data for the requests was supplied from title-pages and order recommendations. this data was not verified because the compression code access technique partially solves the problem of non-retrieval due to human errors in the submission. entry/ title compression code / newman and buchinski 79 i i i new new update codes codes .., i i i i i r-update i i i i i i i occasi onal -----1 i i i i i i i i i old i i split codes i i i i i update r _____ ., __ .j fig. 2. marc tape processing. 80 journal of library automation vol. 4/ 2 june, 1971 generate codes lc card number ~uests cataloo cards add to fi le codes fig. 3. marc a ccess programs. table 6. marc ret1·ieval form of request author/ corporate title title author / title number of requests marc records 546 29 130 found 173 2 11 titles verified 148 2 6 false drops 0 0 2 sbn 139 36 20 0 codes series total 11 855 18 240 8 184 6 8 entry/title compression code/ newman and buchinski 81 l i 8uit't • fntii:yifl'lf ~fou£""s"fsfofl: "'l ui(. cll.uog (.uos ii',u;t; ) m ilic i'oi !io ~oj l ii hf:~ mimu ages i~ ~nglan o 1216 1185 shah ~neujl l u!)jol of uniyfr'i i tt tf,i,c.hinc fltf frlll( lu~ i n i)jt: puge ino l i ii.fnf'>s ullll~ius 1 1'11 'lufl: f ii i.gf •no llltf'-f~s huttani!y l'«) qivin1f'l' 1111 ft.ll111,. ......... 0 .. 11-au"s nf south iii fst sariwu !holf 'tal 4y su fkcloskf't' .. (f4 fh41cs and noit "'a!..!y.lfthi n tc.f fl' , in i'lu') f ll fa l i lat i on in joii netfhhh cfnt~'f fu fl:tlp! l(f flt il' 1ndlls ta.iili14tion in l 9_th cfn tl,. y euii.o pf valf f:ncll sh cas(.o "'v val( fhclish g.t.sco"'y llot't 14h huit l "if'/n fncl i sh ho"'f lt't'ltfs a"tf".l,o saj:c)til "0t1'fity l>t_!l_ httlf mfnt nf f "'gl&no ')ii'~(. i( , ,.. 1."10"ty"100s ii'utianf:nu'ty d!aity 10 seat:ch coof lc caiii.o numiuu i ·=-------------------cu tht ogs nciil l ks n no lc caito nu"' iuiis found c u i'igliihi4dvu.in1ts 10460121 s'ec~t cu nnp~oiiihsp£(11( no lt co\11:0 hu~ 8fiii:5 fou !ro'd &"f &'ii"'ny iii()j$ paiili&mh11.!.!3!_qj'.lllty 17.0"'-'-''------------l(l 'lr. anhii:c tic jo'inv)n tntiiioplltj ion to \ovih lfg&l. sy\_ffl't 'u..neii lt(lt 1fic5 o\no "'ll. l 1 'tu10'1h co .. i'u'f c u l'l" lnfc mtuit nflt fifo lc cuo nu'!&u5 founo cu i' m5 u.fltn no lt caito nu muu found fig. 4. results of entry / title search for marc unit card printouts. manual searching of the nuc catalogs was employed to verify titles that could not be located on the marc tape. ten titles were found with marc notations after failing to be retrieved by compression code matching. type a and type c errors were primarily responsible for this non-retrieval. however, two of these titles could not be retrieved from the marc tape following manual verification, since the verified entries in the nuc preceded their counterparts on the marc tape. thus the performance of the compression codes can be evaluated as 184 of a possible 192 hits, or a 95.8% retrieval rate. during the four-week period the keypunchers formul ated 52% more entry / title requests than there were titles for verification. 
this is due mainly to the need for submitting more than one author/ title request whenever the portion of the title which comprises the short title is in doubt, since the code is formulat ed from the short title only. additional experience should decrease the number of redundant requests. only 8 false drops have been received in the above submissions. retrieval of series entries is likely to engender the greatest number of false drops because series statements are treated as titles in the code generation procedure. acquisitions and cataloguing during the past two years, the technical services department at the 82 journal of library automation vol. 4/2 june, 1971 library and the computation centre have designed, and are currently testing tesa i (technical services automation-phase i), an automated acquisition and cataloguing system ( 5), the primary objectives of which were to pursue a total library system concept and to provide for conversion from a batch system to an on-line operation when sufficient computer facilities become available. at the same time that work proceeded towards these objectives, status codes and receiving reports were employed as used in washington state university's lola system (6) and (7). however, marc tapes and compression codes comprise an integral part of the system. if a marc record can be located before an order is entered, a tremendous amount of keying is saved. one 64-character in-process transaction will supply the ordering information and transfer the bibliographic data from the current marc history tape to the direct access acquisitions and cataloguing in-process file (ibm's basic direct access method). minimal cataloguing updates are necessary before catalog card sets can be produced. entry /title access ensures that only a small percentage of needed marc records will slip through tesa i's fingers at order initiation time. another code application as illustrated in figure 5 will exploit the fact that the same code construction rules are used in the marc system as in tesa i. items requiring bibliographic information will be flagged in the in-process file. when a new marc tape arrives, the in-process code file (ibm's index sequential access method) will be automatically matched with the marc codes created from the new weekly tape. a sample printout from these matches is provided in figure 6. after verifying which marc records are needed, marc bibliographic information will be transferred to the appropriate in-process records. each record in the direct-access isam compression code file consists of a compression code (or sbn or lc card number) and the key ( purchase order number) to the corresponding in-process record. a threaded list structure exists within the in-process file to handle the possibility of one code accessing several items. thus an in-process record may be directly accessed by entry /title, series statement, sbn, lc card number or purchase order number. a fast edit routine built into the direct-access write detects whether or not the compression code about to be written is a duplicate of a code already in the file. if the code is unique the code record is written on disc and a single item list is created within the corresponding in-process record. if the code is not unique, the code record cannot be written. in this case the list structure for the code is updated to include the key of the in-process record being added. 
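the duplicate check described above can be pictured in miniature. in the sketch below the two dictionaries stand in for the direct-access isam code file and the in-process file; the field names, purchase order numbers and example code are illustrative only, not tesa i's actual record layouts. the purchase order numbers returned by the routine are what would feed the warning described next.

```python
# a simplified sketch of writing a compression code to the code file.  a unique code
# is written and anchors a single-item list; a duplicate code is not written a second
# time, and the new item is instead threaded onto the list running through the
# in-process records.  structures and names are illustrative, not tesa i's own.

code_file = {}      # compression code -> purchase order number of first item with that code
in_process = {}     # purchase order number -> in-process record

def add_in_process_item(po_number, code, record):
    """file an in-process record and write (or thread) its compression code."""
    in_process[po_number] = dict(record, next_po=None)      # single-item list by default
    if code not in code_file:
        code_file[code] = po_number                         # unique code: write the code record
        return []
    duplicates, po = [], code_file[code]                    # duplicate: walk the threaded list
    while po is not None:
        duplicates.append(po)
        if in_process[po]["next_po"] is None:
            in_process[po]["next_po"] = po_number           # thread the new item onto the end
            break
        po = in_process[po]["next_po"]
    return duplicates            # existing purchase orders that may duplicate the new item

add_in_process_item("P-1001", "corerema########", {"title": "conversion of retrospective records"})
print(add_in_process_item("P-1042", "corerema########", {"title": "conversion of retrospective records"}))
# ['P-1001'] -> potential duplicates to print in the warning to the acquisitions staff
```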
a message, together with the purchase order numbers of items which may be duplicates, is printed to warn the acquisitions staff that a potential duplicate is being added to the in-process file. traditional duplicate checking of in-process items thus becomes an exception.

[fig. 5. search of weekly marc tape for records needed in the in-process file (flowchart).]

[fig. 6. in-process items for which bibliographic information may exist on the newly arrived marc tape (sample printout listing item number, the lc card number it may correspond to, author, and title).]

remote access to marc

an experiment was conducted in which entry/title requests were submitted from the ibm s360/40 computer at the university of saskatchewan, regina campus, computer centre, over a communication link to the saskatoon campus ibm s360/50 computer. the marc access program was read into the regina computer, sent to saskatoon's computer, spooled in the saskatoon job queue and executed; then the results of the search were sent to regina to be printed. the entire process took approximately the same time as if the program had actually been executed in regina. no data transmission errors were encountered in transmitting either the requests or the retrieved marc unit cards over this 150-mile communication link.

conclusion

there is an inverse relationship between retrieval performance and number of duplicate codes produced. a high retrieval code such as code type 2a results in more duplicates than a code such as ruecking's, which has a slightly lower retrieval performance. code type 2b fulfills the requirements for a code short in length and easy to construct that produces a low number of duplicates and has high retrieval capability. for an index to a library holdings file, or to a national data base, a code such as ruecking's, with four or more significant words from title and corporate or conference entries, and with different rules for personal author compression, would perhaps be suitable.

acknowledgments

the authors thank the library staff for their assistance in the study. they are also grateful to the library and computation centre administrations, in particular, d. c. appelt, g. c. burgis, and n. e. glassel for the allotment of computer time and their encouragement.

references

1. ruecking, frederick h. jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-238.
2. kilgour, frederick g.: "retrieval of single entries from a computerized library catalog." in american society for information science, annual meeting, columbus, o., 20-24 oct. 1968: proceedings, 5 (1968), 133-136.
3.
"university of chicago experimental search code." in avram, henriette d.; knapp, john f.; rather, lucia j.: the marc ii format: a communications format for bibliographic data (washington, d.c., library of congress, 1968), pp. 129-131. 4. "computer requirements for a national bibliographic service." in recon working task force: conversion of retrospective records to machine-readable form (washington, d.c.: library of congress, 1969), pp. 183-226. entry/title compression codejnewman and buchinski 85 5. newman, w. l.: technical services automation-phase i acquisitions and cataloguing (computation centre, university of saskatchewan, saskatoon, november, 1969) , mimeographed. 6. burgess, t.; ames, l.: lola; library on-line acquisitions sub-system (pullman, wash.: washington state university library, july, 1968 ). 7. mitchell, patrick c.: lola, library on-line acquisitions sub-system, (washington state university, june, 1969) unpublished. metadata provenance and vulnerability timothy robert hart and denise de vries information technology and libraries | december 2017 24 timothy robert hart (tim.hart@flinders.edu.au) is phd researcher and denise de vries (denise.devries@flinders.edu.au) is lecturer of computer science, college of science and engineering, flinders university, adelaide, australia. abstract the preservation of digital objects has become an urgent task in recent years as it has been realised that digital media have a short life span. the pace of technological change makes accessing these media increasingly difficult. digital preservation is primarily accomplished by main methods, migration and emulation. migration has been proven to be a lossy method for many types of digital objects. emulation is much more complex; however, it allows preserved digital objects to be rendered in their original format, which is especially important for complex types such as those comprising multiple dynamic files. both methods rely on good metadata to maintain change history or construct an accurate representation of the required system environment. in this paper, we present our findings that show the vulnerability of metadata and how easily they can be lost and corrupted by everyday use. furthermore, this paper aspires to raise awareness and to emphasise the necessity of caution and expertise when handling digital data by highlighting the importance of provenance metadata. introduction unesco recognised digital heritage in its “charter on the preservation of digital heritage,” adopted in 2003, stating, “the digital heritage consists of unique resources of human knowledge and expression. it embraces cultural, educational, scientific and administrative resources, as well as technical, legal, medical and other kinds of information created digitally, or converted into digital form from existing analogue resources. where resources are ‘born digital’, there is no other format but the digital object.” 1 born-digital objects are at risk of degradation, corruption, loss of data, and becoming inaccessible. we combat this through digital preservation to ensure they remain accessible and useable. the two main approaches to preservation are migration and emulation. migration involves migrating digital objects to a different and currently supported file type. emulation involves replicating a digital environment in which the digital object can be accessed in its original format. both methods have advantages and disadvantages. 
migration is the more common method because it is simpler than emulation and the risks can often be neglected. these risks include potential data loss or change, in which the effects are permanent. emulation is complex, but it offers the better means to access preserved objects, especially complex file types comprising multiple dynamic files that must be constructed correctly. emulation also allows users to handle digital objects as closely to the “look and feel” as originally intended. 2 mailto:tim.hart@flinders.edu.au mailto:denise.devries@flinders.edu.au metadata provenance and vulnerability | hart and de vries 25 https://doi.org/10.6017/ital.v36i4.10146 accurate and complete metadata is central to both migration and emulation; thus, it is the focus of this paper. metadata are needed to record the migration history of a digital object and to record contextual information. they are also necessary to accurately render digital objects in emulated environments. emulated environments are designed around a digital object’s dependencies , which typically include, but are not limited to, drivers, software, and hardware. 3 the metadata describe the attributes of the digital object from which we can derive the type of system in which it can run (e.g., the operating system), the versions of any software dependencies, and other criteria that are crucial for accurate creation of an emulated environment. while metadata are being used to support the preservation of digital objects, there is another equally important role it should be playing. it is not enough to preserve the object so it can be accessed and used in the future. what of the history and provenance of the digital object? what about search and retrieval functionality within the archive or repository the digital object is held in? one must consider how these preserved objects will be used in the future, and by whom. preserving digital objects is difficult if adequate metadata is not present, especially if the item is outdated and no longer supported. looking to the future, we should try to ensure metadata are processed correctly for the lifecycle of the digital object. this means care must be taken at the time of creation and curation of any digital objects because although some metadata are typically generated automatically, many elements that will play a pivotal role later must be created manually. digital objects also commonly go through many changes, which is something that must be captured, as the change history will reveal what has happened to the object over of its lifecycle. the changes may include how the object has been modified, migrations to different formats, and what software created or changed the object—all of which is considered when emulating an appropriate environment. examples of these changes can be found in case studies presented in the paper. metadata types the common and more widely used metadata types include, but are not restricted to, administrative, descriptive, structural, technical, transformative, and preservation metadata. each metadata type describes a unique set of characteristics for digital objects. administrative metadata include information on permissions as well as how and when an object was created. transformative metadata includes logs of events that have led to changes to a digital object. 4 structural metadata describe the internal structure of an object and any relationships between components. 
technical metadata describe the digital object with attributes such as height, weight, format, and other technical details. 5 preservation metadata support digital preservation by maintaining authenticity, identity, renderability, understandability, and viability. they are not bound to any one category as they comprise multiple types of metadata, not including descriptive or contextual metadata. however, unlike the common metadata types, preservation metadata are unique from the other metadata types and are often ambiguous. 6 in 2012, the developers of version 2.2 of the premis data dictionary for preservation metadata saw descriptive metadata as less crucial for preserving digital objects; however, they did state it was important for discovery and decision making. 7 while version 2.2 allowed descriptive information technology and libraries | december 2017 26 metadata to be handled externally through existing standards such as dublin core, the latest version (2017) of the dictionary allows for “intellectual entities” to be created within premis that can capture descriptive metadata. 8 thus, while digital preservation does not require all types of metadata, the absence of contextual metadata limits the future possibilities for the preserved object. hart writes that because the multimedia objects are dynamic and interactive, and often composed of multiple image, audio, video, and software files, descriptive metadata are increasingly important because they can be used to describe, organise, and package the files. 9 it is also stressed that content description is of great importance because digital objects are not self-describing, which makes identifying semantic-level content difficult; without description metadata, context is lost. 10 for example, without description metadata to provide context, an image’s subject information and search and retrieval functionality is lost. without this information, verifying whether an object is the original, a copy, or a fabricated or fraudulent item is impossible in most cases. metadata vulnerability—case studies digital objects that are currently being created often go through several modifications, making it difficult to identify the original or authentic copy of the object. verifying and validating authenticity is important for preserving, conserving, and archiving objects. the digital preservation coalition defines authenticity as the digital material is what it purports to be. in the case of electronic records, it refers to the trustworthiness of the electronic record as a record. in the case of “born digital” and digitised materials, it refers to the fact that whatever is being cited is the same as it was when it was first created unless the accompanying metadata indicates any changes. confidence in the authenticity of digital materials over time is particularly crucial owing to the ease with which alterations can be made. 11 tests were undertaken to discover how vulnerable metadata can be in digital files that are subject to change, which can lead to loss, addition, and modification. the tests were conducted using the file types jpeg, pdf, and docx (word 2007). the tests revealed what metadata can be extracted and what metadata could be present in the selected file types. furthermore, they revealed how specific metadata can verify and validate the authenticity of a file such as an image. for each test, the metadata were extracted using exiftool (http://owl.phy.queensu.ca/~phil/exiftool/). 
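the extraction step itself is easy to reproduce. the sketch below shows one way to call exiftool and load its output into a dictionary, using exiftool's json output switch; the file name is a placeholder, and exiftool must be installed separately.

```python
# a minimal sketch of the extraction step used in the case studies: run exiftool with
# its json output option and read the result back as a {tag: value} dictionary.
import json
import subprocess

def extract_metadata(path):
    """return exiftool's metadata for one file as a dictionary."""
    out = subprocess.run(["exiftool", "-j", path],
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)[0]     # "-j" emits a json array with one object per file

original = extract_metadata("original.jpg")          # placeholder file name
print(len(original), "metadata elements extracted")
```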
alternative browser-based tools were tested and provided similar results; however, exiftool was selected as the primary testing tool because it produced the best results and had the best functionality. some of the files tested provided extensive sets of metadata that are too large to include, but subsets can be found in hart (2009). note that only subsets are included because some metadata was removed for privacy and relevance reasons. the process and method for each test was conducted in the following manner: http://owl.phy.queensu.ca/~phil/exiftool/ metadata provenance and vulnerability | hart and de vries 27 https://doi.org/10.6017/ital.v36i4.10146 • case study 1—jpeg o original metadata extracted for comparison o image copied, metadata extracted from copy and examined for changes o file uploaded to social media, downloaded from social media, extracted and examined against original • case study 2—jpeg (modified) o original metadata extracted for comparison o image opened and modified in photo editing software (adobe photoshop), metadata extracted from new version and examined against original • case study 3—pdf o basic metadata extraction performed to establish what metadata are typically found in pdf files and what types of metadata could be possible • case study 4—docx o original metadata extracted for comparison o file saved as pdf through microsoft word and metadata compared to original o file converted to pdf through adobe acrobat and metadata compared to original case study 1 this case study investigated the everyday use of digital files, the first being simply copying a file. it was revealed that copying a file creates an exact copy of the original file and no changes in metadata aside from the creation and modification time/date. thus, the copy could not be identified against the original unless the original creation time/date was known. the second everyday use was uploading an image to facebook. the metadata-extraction tests revealed that the original file had approximately 265 metadata elements. (the approximation is caused by the ambiguity of certain elements that may be read as singular or multiple entries.) these elements included, but were not limited to, the following: • dates • technical metadata • creator/author information • color data • image attributes • creation-tool information • camera data • change • software history many of the metadata elements had useful information for a range of situations. even so, several metadata elements were missing that would require a user input for creation. once the file had been uploaded to and then downloaded from social media, approximately 203 metadata elements were lost, included date, color, creation-tool information, camera data, change, and software history. it can be argued that removing some of this metadata would help keep user information private, but certain metadata should be retained, such as change and software history. these information technology and libraries | december 2017 28 metadata make it easier to differentiate fabricated images from authentic images and to know which modifications have been made to a file. for preservation purposes, the missing metadata is what may be needed to provide authenticity. this case study aims to make users aware of the significant risk of metadata loss when dealing with digital objects. if metadata are not identified and captured before the object is processed within a repository, the loss could be irreversible. 
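the comparison underlying this case study can be expressed as a simple set difference over the dictionaries returned by the extraction sketch above; the file names below are placeholders for the original image and the copy downloaded from social media.

```python
# compare two exiftool dictionaries and report which elements were lost and which changed.

def compare_metadata(before, after):
    """return (lost tags, changed tags) between two metadata dictionaries."""
    lost = sorted(set(before) - set(after))
    changed = sorted(t for t in set(before) & set(after) if before[t] != after[t])
    return lost, changed

original = extract_metadata("original.jpg")                    # from the sketch above
downloaded = extract_metadata("downloaded_from_facebook.jpg")  # placeholder file name
lost, changed = compare_metadata(original, downloaded)
print(f"{len(lost)} elements lost, {len(changed)} elements changed")
print("sample of lost elements:", lost[:10])
```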
case study 2 the second case study revealed how the change and software history metadata can be used to easily identify when a file has been modified. in the test conducted, it was evident by visually comparing the images that changes were made; however, modifications are not always obvious as some changes can be subtle, such as moving an element in the image that completely changes what the image is conveying. the following example displays the change history from the image used in case study 1, revealing how the metadata can easily identify modification: • history action—saved, saved, saved, saved, converted, derived, saved • history when—the first saved was at 2010:02:11 21:59:05, the last saved was at 2010:02:11 22:12:01 with each action having its own timestamp • history software agent—adobe photoshop cs4 windows for each action • history parameters—converted from tiff to jpeg further testing was conducted with simple photo manipulation using an original image to see firsthand the issues described in the initial test. the image contained approximately 178 metadata elements, including the typical metadata that were found in the first case study. once the image was processed and modified with adobe photoshop cs5, the metadata were no longer identical. the modified image had approximately 201 metadata elements. the new elements included photoshop-specific data, change, and software history. however, extensive camera data were lost. it can be argued that the camera data are not important for digital preservation because the lack of it will not hinder the preservation process. however, once the file is preserved and those data are lost, important technical and descriptive information can never be regained. for example, consider a spectacular digital image that captures an important moment in history. if that image is preserved for twenty years, in that time cameras and perhaps photography itself will have advanced dramatically. how digital images are captured and processed might be completely different and will most likely provide different results. should someone wish to know how that preserved image was captured, they would need to know what camera was used, lens and shutter speed data, lighting data, and other technical information. preserving those metadata can be almost as important as preserving the file itself because each metadata element has importance and meaning to someone. as most viewers of online media are aware, photos are often modified, especially on social media. this is often performed on “selfies,” pictures taken of oneself. these can be modified to make the person in the photo look better or to hide features they see as flawed. small modifications, such as covering some blemishes or improving the lighting have little effect on the image’s context, but some modifications and manipulations that can mislead people. these manipulated images often metadata provenance and vulnerability | hart and de vries 29 https://doi.org/10.6017/ital.v36i4.10146 take the form of viral hoax images circulating around the web. for example, figure 1 displays how two images can be combined into a composite image that changes the context of the image. figure 1. composite image. “photo tampering throughout history,” fourandsix technologies, 2003, http://pth.izitru.com/2003_04_00.html. the two images side by side are original photos taken in basra of a british soldier gesturing to iraqi civilians to take cover. 
in the right image, the iraqi man is holding a child and seeking help from the solider; as you can see, this soldier does not interpret this as a hostile act. the image above is a composite of the two that changes the story. in this image, the soldier appears to be responding with hostility toward the man approaching. with basic photo manipulation, this soldier who is protecting innocent civilians is portrayed holding them against their will. images like this circulate through media of all types, and although the exchangeable image file format (exif) metadata may not identify what has been done to the image, it would eliminate any doubt that the image has been modified. unfortunately, these data are not made available. making users aware of this vulnerability may improve detection of file manipulation at the time of ingest to better ensure only accurate and authentic material is being considered for preservation. donations received by digital repositories such as libraries must be scrutinised by trained individuals. with this awareness and knowledge of metadata, they can perform their duties to a much higher standard. case study 3 the pdf metadata extraction provided interesting results. over a range of tests on academic research papers, the main metadata identified consisted of pdf version, author, creator, creation date, modification date, and xmp (adobe extensible metadata platform) data. these metadata http://pth.izitru.com/2003_04_00.html information technology and libraries | december 2017 30 were not present in every pdf tested; in fact, the majority of pdf files seemed to be lacking important metadata. the author and creator fields were generally listed as “administrator” or “user” and bibliographic metadata was usually missing. however, pdf openly supports xmp embedding, therefore, bibliographic metadata could be embedded into the pdf. through further testing, bibliographic metadata linked to the pdfs were discovered stored in online databases. bibliographic software such as endnote and zotero allow metadata extraction, which enables users to import pdf files and automatically generate the appropriate bibliographic metadata. for example, zotero performs this extraction by first searching for a match for the pdf on google scholar. if this search does not return a match, zotero uses the embedded digital object identifier (doi) to perform the match. this method is not consistent: it often fails to retrieve any data, and in rare cases it retrieves the wrong data, which leads to incorrect references. given what we saw happen to metadata when a file is uploaded such as in case study 1 and the nature of a pdf’s journey through template selection, editing, and publishing, it is no surprise that metadata are lost or diluted along the way. case study 4 the fourth case study conducted on docx files provided an extensive set of metadata, some of which are unique to this file type. creating a new word document via the file explorer context menu and attempting to extract metadata resulted in an error as there were no readable metadata to extract until the file was accessed and saved. once the file had some user input and was saved, the metadata were created and could be extracted. microsoft office files contain external xml files that holds information about the document, such as formatting data, user information, edit history, and information about the document’s page count, word count, etc. picture a docx file as an uncompressed directory. 
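for pdf files, the document-information and xmp fields discussed in this case study can also be inspected directly from a script. the sketch below uses the pypdf package purely for illustration (it is not the tool used in the study), and the file name is a placeholder; on many pdfs the author and creator fields come back as generic values such as "administrator".

```python
# a small sketch of reading a pdf's document-information dictionary and checking for
# embedded xmp metadata.  pypdf is an illustrative choice of library, not the study's tool.
from pypdf import PdfReader

reader = PdfReader("paper.pdf")          # placeholder file name
info = reader.metadata or {}
print("author: ", info.get("/Author"))   # frequently just "administrator" or "user"
print("creator:", info.get("/Creator"))
print("created:", info.get("/CreationDate"))
print("xmp metadata present:", reader.xmp_metadata is not None)
```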
however, using exiftool on the docx file allowed retrieval of the metadata from all the hidden files. the metadata included creation, modification, and edit information, such as number of edits and total edit time. every element within the document (e.g., text, images, tables, etc.) has its own metadata attached that are crucial for preserving the format of the document. the next step in the test involved converting the docx file into pdf using the following two methods: (1) converting the document via the “publish” save option within microsoft word; and (2) “right clicking” the document and selecting the option to convert to an adobe pdf. the results of the two methods varied slightly. method 1 stripped all the metadata from the document and generated only default pdf metadata consisting of system metadata (file size, date, time, permissions) and the pdf version, author details, and document details. method two behaved the same way except that some xmp metadata were created. both methods resulted in no informative metadata remaining as the majority of the xmp elements were empty fields or contained generic values such as the computer name as the author. all formatting and metadata unique to microsoft word was lost. this case study is an enlightening example of what can happen to metadata when a file is changed from one format to another. metadata provenance and vulnerability | hart and de vries 31 https://doi.org/10.6017/ital.v36i4.10146 human intervention the human element is a requirement in digital preservation as certain metadata, such as descriptive and administrative metadata, can only be created by humans. in fact, as hart notes, user input is needed to record the majority of the digital preservation metadata. 12 the process can be tedious, as described by wheatley. 13 one of the examples described included following the processes in a repository from ingest to access, beginning with the creation of metadata and the managerial tasks that are necessary. these tasks include using extraction tools and automation where possible. using frameworks to record changes to metadata is required, and in some cases metadata must be stored externally to their digital objects. this allows multiple objects of the same type to utilise a generic set of metadata to avoid redundant data. however, although using a generic metadata set is convenient, a large collection of digital objects could be affected if the metadata is lost or damaged. the human element increases the risk of error drastically because there are numerous steps to metadata creation. misconduct is also possible. therefore, the less digital preservation is reliant on humans (and the easier the tasks are that require human input), the better. this can only be achieved by automating most process and training people to ensure they handle their responsibilities accurately, consistently, and completely. learning the results from the case studies like those described in this paper will better prepare users working with digital objects. discussion to achieve the most authentic, consistent, and complete digital preservation, institutions must revise their preservation workflows and processes. this entails ensuring the initial processes within workflows are correct before processing digital content. the content must come from a credible source and have its authenticity approved. participation from the donor of the digital content might be beneficial if they can provide information and metadata about the content. 
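one practical response to the points above is to make the recording of change history as cheap and as automatic as possible. the sketch below shows a minimal change-history log, loosely in the spirit of a premis event record; the field names and the json-lines layout are illustrative assumptions, not a premis-conformant serialisation.

```python
# append provenance events for a digital object to a simple json-lines change-history log.
import json
from datetime import datetime, timezone

def record_event(log_path, object_id, event_type, agent, detail):
    """append one change-history event to the log."""
    event = {
        "object": object_id,
        "event_type": event_type,                      # e.g. "migration", "modification"
        "agent": agent,                                 # software or person responsible
        "detail": detail,
        "datetime": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")

record_event("history.jsonl", "image-0001", "migration", "adobe photoshop cs4",
             "converted from tiff to jpeg")
```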
this information could provide additional context for the content as well as identify its history (e.g., format migration or modification). this is not always possible as the donor is not always be the creator of the digital content. if the original source is no longer available, as much information as possible should be gathered from the donor about the acquisition of the content and any information regarding the original source. this should be considered and carefully monitored throughout the lifecycle of digital content. granted, if no changes are needed, devices such as write blockers can ensure this as they restrict users and any systems from making unwanted changes or “writes.” however, changes are sometimes unavoidable and (although it may not affect the content) detrimental. when changes are required, it is crucial to maintain the digital history by capturing all metadata added, removed, or modified during processing, commonly known as the “change history.” donor participation should be stipulated in a donor agreement, something that each institution offers to all donors, sometimes in the form of agreements through communication and often with a structured document. donor-agreement policies differ for each institution: some are quite detailed, allowing donors to carefully stipulate their conditions, whereas others place most of the information technology and libraries | december 2017 32 responsibility on the receiving institution. when dealing with sensitive or historic data of importance, policies should be in place to capture adequate data from the donor. when the content does not fall into this category, standard procedures, which should be present in all donor agreements and institution policies, can be followed. institutions must also consider when to apply these steps as some transactions between donor and institution can follow standard protocol; others are more complex, such as donations of content with diverse provenance issues. conclusion we have presented four case studies that illustrate how vulnerable digital-object metadata are. these examples show that common methods of handling files can cause irretrievable loss of important information. we discovered significant loss of metadata when uploading photos to social media and when converting a file to another format. the digital footprint left behind from photo manipulation was also exposed. we shed light on the bibliographic-metadata generation of pdf files, how they are obtained, and the surrounding issues. action is needed to ensure proper metadata creation and preservation for born-digital objects. librarians and archivists must place a greater emphasis on why digital objects are preserved as well as how and when users may need to access them. therefore, all types of metadata must be captured to allow users from all disciplines to take advantage of historical data in many years to come. given the rate of technological change, we must be prepared; observing first-hand the vulnerability of metadata is a step toward a safer future for our digital history. references 1 “charter on the preservation of digital heritage,” unesco, october 15, 2003, http://portal.unesco.org/en/ev.phpurl_id=17721&url_do=do_topic&url_section=201.html. 2 k. rechert et al., “bwfla—a functional approach to digital preservation,” pik—praxis der informationsverarbeitung und kommunikation 35, no. 4 (2012), 259–67. 3 k. 
rechert et al., design and development of an emulation-driven access system for reading rooms, archiving conference, 2014, 126–31, society for imaging science and technology, 2014.
4 m. phillips et al., the ndsa levels of digital preservation: explanation and uses, archiving conference, 2013, 216–22, society for imaging science and technology, 2013.
5 "premis: preservation metadata maintenance activity," library of congress, accessed march 10, 2016, http://www.loc.gov/standards/premis/.
6 r. gartner and b. lavoie, preservation metadata (2nd edition) (york, uk: digital preservation coalition, 2013), 5–6.
7 premis editorial committee, premis data dictionary for preservation metadata, version 2.2 (washington, dc: library of congress, 2012), http://www.loc.gov/standards/premis/v2/premis-2-2.pdf.
8 premis editorial committee, premis schema, version 3.0 (washington, dc: library of congress, 2015), http://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf.
9 timothy hart, "metadata standard for future digital preservation" (honours thesis, flinders university, adelaide, australia, 2015).
10 j. r. smith and p. schirling, "metadata standards roundup," ieee multimedia 13, no. 2 (april-june 2006): 84–88.
11 "glossary," digital preservation coalition, accessed august 5, 2016, http://handbook.dpconline.org/glossary.
12 timothy hart, "metadata standard for future digital preservation" (honours thesis, flinders university, adelaide, australia, 2015).
13 paul wheatley, "institutional repositories in the context of digital preservation," microform & digitization review 33, no. 3 (2004): 135–46.

president's message: ux thinking and the lita member experience
rachel vacek
information technologies and libraries | september 2014

my mind has been occupied lately with user experience (ux) thinking in both the web world and in the physical world around me. i manage a web services department in an academic library, and it's my department's responsibility to contemplate how best to present website content so students can easily search for the articles they are looking for, or so faculty can quickly navigate to their favorite database. in addition to making these tasks easy and efficient, we want to make sure that users feel good about their accomplishments. my department has to ensure that the other systems and services that are integrated throughout the site are located in meaningful places and can be used at the point of need. additionally, the site's graphic and interaction design must not only contribute to but also enhance the overall user experience. we care about usability, graphic design, and the user interfaces of our library's web presence, but these are just subsets of the larger ux picture. for example, a site can have a great user interface and design, but if a user can't get to the actual information she is looking for, the overall experience is less than desirable.
jesse james garrett is considered to be one of the founding fathers of user-centered design, the creator of the pivotal diagram defining the elements of user experience, and author of book, the elements of user experience. he believes that “experience design is the design of anything, independent of medium, or across media, with human experience as an explicit outcome, and human engagement as an explicit goal.”1 in other words, applying a ux approach to thinking involves paying attention to a person’s behaviors, feelings, and attitudes about a particular product, system, or service. someone who does ux design, therefore, focuses on building the relationship between people and the products, systems, and services in which they interact. garrett provides a roadmap of sorts for us by identifying and defining the elements of a web user experience, some of which are the visual, interface, and interaction design, the information architecture, and user needs.2 in time, these come together to form a cohesive, holistic approach to impacting our users’ overarching experience across our library’s web presence. paying attention to these more contextual elements informs the development and management of a web site. let’s switch gears for a moment. prior to winning the election and becoming the lita vicepresident/president-elect, i reflected on my experiences as a new lita member and before i became really engaged within the association. i endeavored to remember how i felt when i had joined lita in 2005. was i welcomed and informed, or did i feel distant and uninformed? was the path clear to getting involved in interest groups and committees, or were there barriers that rachel vacek (revacek@uh.edu) is lita president 2014-15 and head of web services, university libraries, university of houston, houston, texas. mailto:revacek@uh.edu president’s message | vacek 2 prevented me from getting engaged? what was my attitude about the overall organization? how were my feelings about lita impacted? luckily, there were multiple times when i felt embraced by lita members, such as participating in bigwig’s social media showcase, teaching pre-conferences, hanging out at the happy hours, and attending the forums. i discovered ample networking opportunities and around every corner there always seemed to be a way to get involved. i attended as many lita programs at annual and midwinter conferences as i could, and in doing so, ran into the same crowds of people over and over again. plus, the sessions i attended always had excellent content and friendly, knowledgeable speakers. over time, many of these members became some of my friends and most trusted colleagues. unfortunately, i’m confident that not every lita member or prospective member has had similar, consistent, or as engaging experiences as i’ve had, or as many opportunities to travel to conferences and network in-person. we all have different expectations and goals that color our personal experiences in interacting with lita and its members. one of my goals as lita president is to enhance the member experience. i want to apply the user experience design concepts that i’m so familiar with to effect change and improve the overall experience for current members and those who are on the fence about joining. to be clear, when i say lita member, i am including board members, committee members and chairs, interest group members and chairs, representatives, and those just observing on the sidelines. 
we are all lita members and deserve to have a good experience no matter the level within the organization. so what does “member experience” really mean? don norman, author of the design of everyday things and the man attributed with actually coining the phrase user experience, explains that "user experience encompasses all aspects of the end-user's interaction with the company, its services, and its products.” 3 therefore, i would say that the lita member experience encompasses all aspects of a member’s interaction with the association, including its programming, educational opportunities, publications, events, and even other members. i believe that there are several components that define a good member experience. first, we have to ensure quality, coherence, and consistency in programming, publications, educational opportunities, communications and marketing, conferences, and networking opportunities. second, we need to pay attention to our members’ needs and wants as well as their motivations for joining. this means we have to engage with our members more on a personal level, and discover their interests and strengths, and help them get involved in lita in ways that benefit the association as well assist them in reaching their professional goals. third, we need to be welcoming and recognize that first impressions are crucial to gaining new members and retaining current ones. think about how you felt and what you thought when you received a product that really impressed you, or when you started an exciting new job, or even used a clean and usable web site. if your initial impression was positive, you were more likely to connect with the product, environment, or even a website. if prospective and relatively new lita information technologies and libraries | september 1014 3 members experience a good first impression, they are more likely to join or renew their membership. they feel like they are part of a community that cares about them and their future. that experience became meaningful. finally, the fourth component to a good member experience is that we need to stop looking at the tangible benefits that we provide to users as the only things that matter. sure, it’s great to get discounts on workshops and webinars or be able to vote in an election and get appointed to a committee, but we can’t continue to focus on these offerings alone. we need to assess the way we communicate through email, social media, and our web page and determine if it adds or detracts from the member experience. what is the first impression someone might have in looking at the content and design of lita’s web page? do the presenters for our educational programs feel valued? does ital contain innovative and useful information? is the process for joining lita, or volunteering to be on a committee, simple, complex, or unbearable? what kinds of interactions do members have with the lita board or the lita staff? these less tangible interactions are highly contextual and can add to or detract from our current and prospective members’ abilities to meet their own goals, measure satisfaction, or define success. as lita president, and with the assistance of the board of directors, there are several things we have done or intend to do to help lita embrace ux thinking: • we have implemented a chair and vice-chair model for committees so that there is a smoother transition and the vice-chair can learn the responsibilities of the chair role prior to being in that role. 
• we have established a new communications committee that will create a communication strategy focused on communicating the lita’s mission, vision, goals, and relevant and timely news to lita membership across various communication channels. • we are encouraging our committees to create more robust documentation. • we are creating richer documentation that supports the workings of the board. • we are creating documentation and training materials for lita representatives to compliment the materials we have for committee chairs. • we have disbanded committees that no longer serve a purpose at the lita level and whose concerns are now addressed in groups higher within ala. • the assessment and research committee is preparing to do a membership survey. the last one was done in 2007. • we are going to be holding a few virtual and in-person lita “kitchen table conversations” in the fall of 2014 to assist with strategic planning and to discuss how lita’s goals align with ala’s strategic goals of information policy, professional development, and advocacy. • the membership development committee is exploring how to more easily and frequently reach out, engage, appreciate, acknowledge, and highlight current and prospective members. they will work closely with the communications committee. president’s message | vacek 4 i believe that we’ve arrived at a time where it’s crucial that we employ ux thinking at a more pragmatic and systematic level and treat at it as our strategic partner when exploring how to improve lita and help the association evolve to meet the needs of today’s library and informational professionals. garrett summarizes my argument nicely. he says, “what makes people passionate, pure and simple, is great experiences. if they have great experience with your product [and] they have great experiences with your service, they’re going to be passionate about your brand, they’re going to be committed to it. that’s how you build that kind of commitment.”4 i personally am very passionate about and committed to lita, and i truly believe that our ux efforts will positively impact your experience as a lita member. references 1. http://uxdesign.com/events/article/state-of-ux-design-garrett/203, garrett said this quote in a presentation entitled “state of user experience” that he gave during ux week 2009, a very popular conference for ux designers. 2. http://www.jjg.net/elements/pdf/elements.pdf 3. http://www.nngroup.com/articles/definition-user-experience/ 4. http://www.teresabrazen.com/podcasts/what-the-heck-is-user-experience-design, garret said this quote in a podcast interview with teresa brazen, “what the heck is user experience design??!! (and why should i care?)” http://uxdesign.com/events/article/state-of-ux-design-garrett/203 http://www.jjg.net/elements/pdf/elements.pdffunctional http://www.nngroup.com/articles/definition-user-experience/ http://www.teresabrazen.com/podcasts/what-the-heck-is-user-experience-design lib-s-mocs-kmc364-20140601053153 184 ]oumal of library automation vol. 5/3 september, 1972 two types of designs for on-line circulation systems rob mcgee: systems development office, university of chicago libra,ry on-line circulation systems divide into two types. one type contains records only for charged or otherwise absent items. the other contains a file of records for all titles or volumes in the library collection, regardless of their circulation status. 
this paper traces differences between the two types, examining different kinds of files and terminals, transaction evidence, the quality of bibliographic data, querying, and the possibility of functions outside circulation. aspects of both operational and potential systems are considered. introduction a literature survey was made of on-line circulation systems ( 1 ). to qualify for study, a system needed to perform any major circulation function on-line. charging and querying were common. some systems were also found to perform some acquisitions, cataloging, and reference work. criteria used to examine systems have been presented in an earlier paper as key factors of circulation system analysis and design (2). this paper conceptualizes the survey findings, and goes on to consider general problems and alternatives of designing on-line circulation systems. the survey shows that on-line circulation systems divide into two types, according to the scope of their bibliographic records. we give the term "absence file" to a set of records for only those items that have been charged or otherwise removed from their assigned locations. the name "item file" is given to what is, or approaches being, a comprehensive file of records for all titles or volumes in the library collection, regardless of their circulation status. each on-line circulation system either does or does not have an item file. systems without an item file must contain an absence file, and are two types of designs/mcgee 185 therefore called "absence systems." systems with an item file are called "item systems." (an item system may also have an absence file, depending upon its design.) note that an "absence file" and an "item file" are each conceptually or logically defined as a single file, whereas in some operational systems either may be stored as more than one physical file. two other basic files generally appear in operational systems: a user file of complete records for users; and a transaction file that may be variously used for data collection, system update, system backup, and batch generation of notices. we can now generalize common but not exclusive file definitions for the two design types. absence systems usually contain three main logical files: 1) a user file; 2) an absence file that contains records only for charged or otherwise absent items; and 3) a transaction file. user identification number and complete item data (all the item data the system is to hold) are input at transaction time to create charge records. these data are typically collected from machine-readable sources such as punched cards or magnetic strips; the surveyed systems use punched cards. time data, such as charge date or due date, and circumstantial data, such as charging location, may also be collected. during batch processing, user records are accessed by identification number to obtain name, address, and so forth. examples of absence systems are found at west sussex county library (3,4,5 ), illinois state library ( 6,7 ), midwestern university library ( 8,9,10), queen's university library ( 11,12,13,14,15), northwestern university library ( 16), and bucknell university library ( 17). 
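to make the file definitions concrete, the sketch below models the three logical files of an absence system and a charge transaction that creates an absence record from data read at charge time. the record layouts and field names are illustrative, not those of any of the surveyed systems.

```python
# a schematic sketch of an absence system: records exist in the absence file only for
# charged or otherwise absent items, and all item data the system will hold is input
# at transaction time.  layouts and names are illustrative only.
from datetime import date, timedelta

user_file = {"U1001": {"name": "a. borrower", "address": "main campus"}}
absence_file = {}       # call number -> charge record, for absent items only
transaction_file = []   # data collection, system update, backup, batch notices

def charge(user_id, item_data, loan_days=14, location="main desk"):
    """create a charge record from user number, item data, time and circumstantial data."""
    record = dict(item_data, user_id=user_id,
                  charge_date=date.today(),
                  due_date=date.today() + timedelta(days=loan_days),
                  location=location)
    absence_file[item_data["call_number"]] = record
    transaction_file.append(("charge", record))
    return record

def discharge(call_number):
    """a discharge only needs a key that matches an existing absence record."""
    record = absence_file.pop(call_number)
    transaction_file.append(("discharge", record))
    return record

charge("U1001", {"call_number": "Z699.A1", "author": "mcgee", "short_title": "two types of designs"})
print(sorted(absence_file))      # only the charged item is represented
```

an item system differs chiefly in that the bibliographic data would already sit in an item file before the charge, so the charge transaction could carry little more than a user number and an item key.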
item systems are characterized by three or four major files: 1) a user file; 2) an item file of bibliographic records for all library volumes or titles, or for as many as machine records can feasibly be created and stored; 3) a transaction file that may be used for update of the item file, data collection and analysis, and perhaps notice generation; and optionally 4) an absence file of records for circulating items, if transaction data for them are more efficiently kept here than in the item file. records in both the user and item files contain either full data or at least enough data to address messages to users and to adequately describe items. if an absence file is used in an item system, it may copy bibliographic data from the item file, or the two files may be linked to avoid data redundancy. item systems are in operation at bell laboratories library ( 18,19 ), eastern illinois university library ( 20), ohio state university libraries ( 21,22 ), and the technical library of the manned spacecraft center, houston ( 23). (the manned spacecraft center library, alone among the systems we have surveyed, does not have a user file. instead, user's last name, initials, and address code are input at charge time.) since the basic distinction between absence and item systems is whether descriptive data for an item are machine-held prior to its charge time, item file records are limited primarily by the costs of conversion and machine storag~, but can have full, even marc-like formats; whereas absence sys186 journal of library automation vol. 5/3 september, 1972 tern records are restricted by the quantity of data that can be input at charge time, i.e. by the capacities of source record coding and data transfer techniques. basic approaches to on-line circulation system development three approaches to the design of on-line circulation systems have originated from different notions of circulation control ( 24). first is the view that circulation control is a separate library function, or one with minimal relationships to other library data processing. exclusive requirements for user and item data are formulated; the format of bibliographic data and the design of data management capabilities are developed explicitly for circulation control, to the exclusion of other library data processing requirements. absence systems have been developed with this approach, but thus far item systems have not. the second approach is to create a circulation system that is operationally independent of other library data processing activities, but designed with a view toward the possibility of shared usage of bibliographic and other data, and of general library data processing facilities. compatibility with other functions is provided, to aid later combination. either system design can take this approach, but item systems can take better advantage of the integration of functions. a third approach is to add circulation control to other large file processes (such as a cataloging system), or to develop them concurrently. this follows an integrated view of library data processing that sees a circulation system operating with many of the same data and processing requirements as other library functions, all of which are handled by a general library data management system. the broad range of library data processing activities needs to be addressed, and an item system design is likely to be preferred. two concepts underlie these approaches: 1 ) an integrated library system, and 2) a remotely accessible library catalog. 
an integrated view of the library is one of a total operating unit with a variety of operations that are logically interrelated and interconnected by their mutual requirements for data and processing ( 25) . the term "integrated system" usually implies a system in which centralized, minimally-redundant files undergo shared processing by different library functions. it is not clear exactly how the concept of a remotely accessible catalog should be defined, or exactly what the phrase means to various users. if we take it to mean the capability to access information from a given catalog at remote locations, then a variety of systems may qualify: e.g., telephone access to a group that performs manual card catalog lookups ; multiple locations of book catalogs or microform catalogs; and terminal access to an on-line, computerized catalog. the last is pertinent to our discussion. two types of designs/mcgee 187 how an integrated system is implemented determines if its central bibliographic file is accessible from multiple remote locations; how an on-line, remotely accessible catalog is used determines if the system is integrated. recently the ohio state university libraries circulation system has come to be explicitly called a remote catalog access system. we have not yet found reports of any on-line system that has integrated all of technical services and circulation. the addition of circulation control to existing on-line cataloging systems has been planned for the shawnee mission system ( 26) and mentioned for the system at western kentucky university, bowling green ( 27). the ohio college library center has not yet decided how it will handle circulation. as long as we define an integrated system on the basis of multiple uses of nonredundant data (among other characteristics), and a remote catalog access system upon physical accessibility, various systems may qualify as either or both. recognizing these two concepts helps to show how the three approaches to on-line circulation system development capsulize broader trends in library systems. first, the redundancy of bibliographic data in operationally separate but conceptually related functions has characterized traditional manual systems, batch computer systems, and now some on-line systems. second, the construction of individual, independent subsystems, while planning for their eventual combination, has been called an evolutionary approach to a total system, by de gennaro ( 25). he has also defined a third, integrated approach, in which computerized systems are designed to take advantage of interrelationships among different subsystems, for example, by accepting one-time inputs of given data, and processing them for multiple library functions and outputs. these three trends have been widely experienced in changing relationships among traditional and innovative systems for acquisitions and cataloging. the pattern is repeating now in the evolution of on-line systems with respect to technical services processing and circulation control. large on-line systems are emerging to perform acquisitions, cataloging, circulation control, and reference functions with shared processing facilities and data bases. terminal devices most on-line circulation systems do not perform all functions on-line, although the following are possibilities: charge, discharge, inquiry, and other record creations and updates, such as reserving items in circulation, renewing loans, recording fines payments, and even converting files to machine-readable form. 
what do their input/output requirements imply for terminal devices? inputs for charges may be minimal user and item identification numbers, or full borrower and book descriptions. evidence of valid charges may be produced; printouts of user number, call number, short author and title, and due date are common. there are also special "security systems" that switch two-state devices in books (such as sensitized plates ~r labels) to record valid charge, but as yet no such system has been 188 journal of library automation vol. 5/3 september, 1972 coupled with on-line charging. discharge inputs need only to match existing records; simple access keys such as call number or accession number are adequate. querying, too, may be accomplished with simple search keys, or with bibliographic inputs such as author and title . all these functions can be performed with keyboard input and output display of alphanumeric data. absence systems although not a requirement by our definition, most absence systems feature machine-readable user and item cards (the queen's university system does not); the terminals used for charge and discharge must have card reading capabilities. thus for on-line tasks charge stations need card readers and the ability to produce charge evidence, usually in hard copy; querying by bibliographic search keys requires keyboard input and output display of alphanumeric data; discharge stations need a card-reading capability and a display mechanism to identify reserved items; and file creation requires inputs of alphanumeric data in a character set that may range from minimum to full. there are at least two problems in choosing terminal equipment for absence systems. fi1·st, any single terminal or configuration that satisfies input and output requirements for all basic functions may be too expensive to install at every library location of these activities. second, the combination of separate hardware units (such as keyboards, printers, and card readers) may require special hardware or software interfaces that prove difficult and expensive ( 16, 28). alternatively, separate circulation stations with different terminal devices can be established for specific functions. this solution may introduce problems of hardware and personnel redundancy and backup. difficulties with terminal devices explain in part why most systems perform not all but only selected functions on-line. item systems it is possible, in systems with user and item files, to access records by using search keys that are either keystroked or machine-read from cards. the use of machine-readable cards involves the same problems as those described for absence systems. however, choosing keyboard entry of accession or call numbers eliminates card reading, and simplifies requirements so that keyboard devices with display capabilities can perform all basic functions. the feasibility of keyboarding the inputs at transaction time has been demonstrated by the systems at queen's university library (an absence system without machinereadable cards), ohio state university libraries, bell laboratories, and the technical library of the manned spacecr:1ft center. a system based on a single terminal device that handles all realtime functions offers attractive simplifications for hardware and teleprocessing software. the primary disadvantages also center on the device itself. 
factors such as input error, transmission and printing rates, character set, special function keys, noise, and cost have various implications two types of designs/mcgee 189 for system design and operations. obviously, in a system based on a single terminal device, the characteristics of that device are influential. kilgour has stated that the two most important factors in configuring a computer system for an item file design are, first, the nature of secondary memory, and second, the kind of terminal device to be employed ( 29). the need to quickly access large stores of data is basic. as for keyboard devices, one often finds that typewriter terminals may require far more computation of a central computer than do cathode ray tube terminals, because many crt systems have substantial computing power of their own, giving the effect of a satellite computer. this can be important for systems that will run on time-shared machines, or transmit data over long distances. the problems we have described for circulation terminals can be overcome; appropriate devices can be built. too many library systems have been designed around unsuitable hardware; there has been little choice but to develop circulation systems (both on-line and batch) with data collection devices designed for industrial applications. their influence-frequently bad-is fundamental to the nature of resulting systems. in fairness, suppliers need both direction and marketing potential. the deeper fault is with librarians, who have inadequately documented requirements and not proven the existence of a market. the integrated approach to library automation ultimately visualizes all library functions using a single set each of bibliographic, user, and other kinds of records, although different pieces of data for different purposes. similarly, one can say that different sets of terminal requirements arise from the different input/ output specifications among library tasks and not so much from the nature of bibliographic and other data. as functional requirements of different activities (e.g., acquisitions, cataloging, and circulation) overlap, the opportunity to use identical or similar terminals in a variety of library processes is enhanced. extending the integrated approach to libraryrelated hardware fits well with the concepts of modular hardware design and add-on features. take, for example, a basic keyboard/display screen terminal to which modules can be added to read book and borrower identification and to produce hardcopy printout. transaction evidence a variety of transactions may occur between a library and its users: charging and discharging books, placing reserves on circulating material, paying fines, etc. evidence may be provided to verify transaction accuracy and to furnish receipts for users. this evidence can be in various formats: it might be a hardcopy record (or worksheet) of transaction inputs, or a printout or screen display of system responses. printed charge evidence is a familiar example, and is sometimes used for inspection of items that library users carry from the building. two kinds of charge evidence may be defined ( 2). simple evidence contains no more user and item data than are input at transaction time. complex evidence contains user or item data other 190 ]oumal of library automation vol. 5/3 september, 1972 than charge time input, and requires the system to extract data from machine-held file( s). 
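the distinction between simple and complex evidence is easy to make concrete. the sketch below is a hypothetical python illustration, not the procedure of any surveyed system: simple evidence is formatted entirely from the charge-time inputs, while complex evidence obliges the system to reach into a machine-held item file for data (here author and title) that were never keyed or read in at the terminal.

from datetime import date, timedelta

def simple_evidence(user_number: str, item_number: str, loan_days: int = 14) -> str:
    # only charge-time inputs and a computed due date appear on the slip
    due = date.today() + timedelta(days=loan_days)
    return f"user {user_number}  item {item_number}  due {due.isoformat()}"

def complex_evidence(user_number: str, item_number: str, item_file: dict,
                     loan_days: int = 14) -> str:
    # the central item file must be consulted, lengthening the response
    rec = item_file[item_number]
    due = date.today() + timedelta(days=loan_days)
    return (f"user {user_number}  {rec['author']}: {rec['title']}  "
            f"call no. {rec['call_number']}  due {due.isoformat()}")

item_file = {"a123456": {"author": "smith, j.", "title": "data management",
                         "call_number": "qa76 .s55"}}
print(simple_evidence("u0042", "a123456"))
print(complex_evidence("u0042", "a123456", item_file))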
printed evidence typically contains an item due date that may be calculated from either user or item criteria, or both; or directly specified at the time of charge. let us look at the implications of printed charge evidence for the two system types.

absence systems

in most absence systems user identification number and full item data are transferred into the system from machine-readable cards at charge time. there are various ways of printing simple transaction evidence; the following are illustrative. one technique is to transmit data directly to a computer that formats them for output, calculates a due date, and returns them to a printing device. another method is to process source record data with a terminal system that can buffer and format them, select a due date, and output the evidence on a printer. shifting functions from the computer to a terminal system may simplify teleprocessing software, save time at the central processing unit, and permit nearly normal charge operations during computer downtime. if more elaborate user data than identification number are required, there are two obvious solutions. central user records may be accessed to provide complex evidence, possibly increasing central processing unit time and response time. or, user cards (such as magnetically encoded ones) that contain fuller data could be employed with a terminal system that handles them independently of the central computer.

item systems

in item systems as in absence systems it is possible to use machine-readable cards, with the same implications for printing charge evidence. however, if 1) user and item numbers are keystroked, 2) these are considered sufficient borrower and book information, and 3) decision rules for loan periods are simple, then little or no computer response is required for charging. due date may be returned and printed to signal a completed transaction, or predated date due slips may be used. alternatively, special terminal features may be added to select and print a due date. this complicates otherwise simple terminal requirements. sophistications such as status checks on the borrower (e.g., any outstanding fines?) and item (e.g., is it reserved for another user?) will of course require more extensive processing and responses. if charge time inputs are indeed keystroked, user and item numbers with check digits are desirable, to minimize the effects of input error. for complex evidence response time is important, especially if terminals are typewriter-like devices. the time required is determined by the sources of response data, their access times, how much data must be transmitted, and the transmission and terminal display rates. through careful design the time required to obtain charge evidence and complete the transaction can be minimized. for example, if the user number carries a code for borrower class, then a due date can be quickly selected and printed, while the item file is accessed for needed author/title data. it is clear that in an item system containing a user file, only very simple inputs are required to record a charge transaction. the additional requirements for charge evidence, status checks on user and item, and so forth determine how elaborate and slow system responses may become.

availability, holdings, and absence information

one may take the view that a library should provide the following kinds of responses to users. if a title is requested, library holdings for it should be given.
if a specific item is wanted, either its absence or presumed location should be reported. if the item cannot be immediately provided, the library should determine its future availability and inform the user. the terms "availability," "holdings," and "absence information" have special meanings. the availability of a specific item to a library's users is mapped onto the universe of items by the library's acquisitions, cataloging, circulation control, and interlibrary borrowing functions. availability information obtains from all these sources, but particularly from the public catalog of library holdings. absence information, in contrast, corresponds only to a subset of library holdings-it tells the locations of library-owned items when they are absent from the locations indicated by the catalog. absence information therefore corresponds to a subset of holdings information; holdings information is a subset of availability information. in the context of our discussion an absence system provides full absence information and only partial holdings and availability information. an item system can provide full holdings and absence information, but only partial availability information, since items not owned may be ordered, or borrowed from another library. such considerations strengthen the argument that circulation control shares a functional unity with other library processes, and should therefore be considered as one of several integrated functions. the provision of absence and availability information is the essence of circulation system querying requirements. figure 1 shows that different query keys access different subsets of availability information. note the wider utility of some keys than for just circulation control. absence systems in on-line circulation systems built around an absence file, the data representing each physical item may range from a simple accession or item call number (as in the queen's university and northwestern university systems) to larger records containing as much data as may be stored and transferred with a machine-readable card (for example, a hollerith-punched book card). if availability information is to be obtained from a library's public catalog, and not from the circulation system, then access to the file of absence information may be with any key shown in the catalog records: 192 journal of library automation vol. 5/3 september, 1972 e.g., author and title, title and author, ca11 number, accession number. only the simpler keys, item call number and accession number, have been used in absence systems developed so far. consequently their requirements for file organization and access software are minimal. in most systems these keys permit exact matches to only single records, but in the northwestern university system a call number query may cause display of a set of related records ( 16). item systems the query function in item systems is bound by different constraints than those constraints of absence systems. the amount of bibliographic data is not restricted by the storage capacities of machine-readable cards, transaction response time, or the transfer rates of charge-time inputs. however, the following questions arise: how many and which records from the library's data base (e.g., its shelflist) must be converted to provide a sufficient item file? for each record converted, how much and which data are required? what functions shall such a data base ultimately support? what kinds of absence and availability information will be provided? 
these deserve special discussion before we consider querying in item systems.

item system bibliographic data

how much data are required for item file records? in an integrated system full records are ultimately produced by the cataloging process. should one use full, variable-length, marc-like records for circulation control? the conversion, on-line storage, management, and access of a large file of full bibliographic records are expensive propositions. one may be compelled toward a lesser effort. under much of the popular data management software it is easier to organize and store fixed length records than variable length records with different combinations of fixed and variable length fields. the files of current item systems hold less-than-full bibliographic records: bell laboratories library utilizes two basic fixed length formats of 155 and 188 characters (18, 19); the eastern illinois item file consists of 124-byte records (20); although the ohio state system contains variable length records, they are less-than-full bibliographically, averaging 103 bytes (22); the manned spacecraft center system has fixed length records of 168 characters (23). if not a full, marc-like record, then what? two questions may be asked: how much data should be converted for each record? and: how much of these data should be put initially into an item file? if one believes a fully integrated system may eventually take over some public catalog functions, then traditional author-title-subject accesses must be maintained, at least until proven unneeded. the minimum genuinely useful set of bibliographic data elements needed for futuristic information retrieval from library catalogs has not been proven; the safe but expensive answer is to convert full records.

fig. 1. possible access keys to sets of availability information. (the figure relates the universe of all items to three classes of access key: standard bibliographic description; standard but noncomprehensively applied single-element unique identifiers, e.g., isbn, ssn; and library-assigned keys such as item call number and accession number. for each key it indicates whether the key retrieves all or only some members of a set.)

initially, however, one might want no more data in an item file than are functionally justified. how much are actually needed? the four existing item systems provide traditional information in new ways that have dramatically improved services to users. they answer several basic kinds of questions on-line: does the library have book ___? is it available now? what books do i have charged out? such queries can be answered by nonsubject, descriptive bibliographic data, and by circulation status information that shows if items are absent, and when they may become available. for this an item file needs records only for items that are used, in contrast to a comprehensive on-line shelflist. which records to include becomes a problem remarkably similar to deciding what books to put into low-access, compact storage, or to discard. the two university libraries with item systems chose comprehensive conversions: eastern illinois for 235,000 volumes (20), and ohio state for 800,000 titles (22). what are the potential advantages to users of an item system? if one only wants to know what books are charged, an absence system will suffice.
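the appeal of the less-than-full, fixed-length record is easy to demonstrate. the sketch below is our own illustration of the general technique, not the actual layout of the bell laboratories, eastern illinois, or manned spacecraft center files: an abbreviated description is packed into a fixed 120-character slot with python's struct module, so that record n can be located by simple arithmetic (byte offset n * 120) without variable-length field management.

import struct

# hypothetical 120-byte layout: call number (24), author (40), title (48),
# circulation status (1), filler (7); real systems defined their own fields
layout = struct.Struct("24s40s48s1s7s")

def pack(call_no: str, author: str, title: str, status: str = "i") -> bytes:
    enc = lambda s, n: s.encode("ascii", "replace")[:n].ljust(n)
    return layout.pack(enc(call_no, 24), enc(author, 40), enc(title, 48),
                       status.encode("ascii"), b" " * 7)

def unpack(raw: bytes):
    call_no, author, title, status, _ = layout.unpack(raw)
    return (call_no.decode().rstrip(), author.decode().rstrip(),
            title.decode().rstrip(), status.decode())

record = pack("qa76 .s55", "smith, j.", "data management systems", "c")
assert layout.size == 120    # record n begins at byte n * 120 of the file
print(unpack(record))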
both the penalties and promise of an item system lie in its bibliographic store-in the records it holds (scope ), and in the data these records contain (content ) . unless real-time querying of an item file can substitute for at least some manual searches of the public catalog, and in an improved way, its bibliographic data offer no direct advantages to users of a circulation system; an item system will provide no direct circulation services that an absence system could not. applying this as a test to the utility of a noncompreh ensive item file (a file of records only for items that are, or are likely to b e, in use ), we find perhaps the key question for development of item systems among libraries with very large catalogs: to what extent may 194 journal of library automation vol. 5/3 september, 1972 a noncomprehensive item file substitute for accesses to a comprehensive public catalog? although related, this is not the same question as what proportion of a library's book stock circulates. this is a question of how the public catalog is used: by whom , and for what? lipetz's study of the card catalog in yale university's sterling memorial library gives insight to at least that institution's catalog use (30). he found that 73 percent of the users attempting searches were looking for particular documents (known items ). overall, users' approaches to catalog searches were: author, 62 percent; title, 28.5 percent; subject, 4.5 percent; and editor, 4 percent. this may encourage one to believe that an item file which is accessible by author and title can handle a significant portion of manual catalog lookups. if so, developers of item systems may want to consider strategies similar to the following. if it is shown that satisfactory author /title access can be provided by an item file, then perhaps a large library is justified in dividing its card catalog and retaining only the subject-access portion. the argument is that author/ title access can be provided by an item file of partial records containing nonsubject descriptive data, whereas the requirements for subject access involve still more data that are likely to change as subject descriptions do. however, if a manual card catalog for subjects were maintained , this would facilitate updates of subject headings, and at the same time permit the most efficient format and smallest set of machine-held item file records to be kept. through the use of machine-held subject authority files, maintenance instructions and replacement heading cards could be computer-produced for update of the manual subject catalog. (distribution of machine-readable subject headings is being considered by the library of congress marc development office.) reduction in the maintenance and use of a full manual author-title-subject catalog by library technical services departments could produce significant savings, aside from whatever direct improvements in access that machine files might provide. if the item file were noncomprehensive, or contained retrospective records only for those items that circulated, then author/title accesses would of course be limited to the contents of that file. this would require maintenance of full manual catalogs for noncirculating items. two general alternatives to a comprehensive item file of full records come to mind. one is to utilize records as they are created by the cataloging process, complemented by partial-record conversion (conveniently, inhouse) for only those retrospective items that circulate. 
another is to create a special circulation-only item file of partial records. this kind of system would use an item file primarily as an alternative to machine-readable book cards. absence and holdings querying would be supported, but not acquisitions or cataloging functions. a system like this, with an item file of partial records, may be the most reasonable answer for large research libraries (28). it should be able to give the same circulation services as an absence system, in addition to satisfying certain kinds of public catalog searches. the simplifications for data conversion, data management software, and terminals are worth special evaluation as a middle or simple approach to on-line circulation system development, with an item system design.

item system querying, bibliographic data structure, and file organization

the querying capabilities featured by each item system differ somewhat, and are explained in part by differences in bibliographic data structure and file organization. the data and design of an information-providing system are fundamental to the kinds of services it can provide. a useful conceptual model is the traditional manual library system in which separate files are used for different functions: an in-process file for technical processing, a shelflist for the official holdings, a public catalog for reference, and a circulation file to control item absences. among these the file for circulation contains less bibliographical data than the others, since even a single data element such as the call number can uniquely identify a physical item and relate it to a fuller description, such as a shelflist record. a circulation file of this nature is in effect a manual absence file, and serves no major purpose other than circulation control. were the processing requirements not impractical, circulation status could be more usefully recorded in public catalog records, in the manner of an item file. the eastern illinois university system has an indexed sequential item file organized by item call number plus accession number. it may be queried by this key to get an exact match to a single record, or by a classification number to get a file scan of corresponding records. query by user number displays charges to the user. the ohio state system has a read-only item file that is randomized by item call number, but it may also be accessed by an author/title key that consists of the first four characters of the main entry plus the first five characters of the first significant word or words of the title. the second five characters can be blanked to provide author-only access. the file is also accessible by an item record number that is assigned sequentially to new records entering the file. the bell laboratories system provides access by item number to its item file, and uses a set of twelve query codes to obtain status and other factual information on users and items. user number and item number are the query keys. the item file is also used to produce a book catalog that gives the item numbers by which queries can be made. the item file of the manned spacecraft center library is sequentially organized by item number, and can also be queried by call number and user number. these systems demonstrate alternatives for bibliographic data structure and file organization and access methods that are summarized by the author in a separate work (1) and explained by the references for each system (18, 19, 20, 21, 22, 23).
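a derived search key of the kind just described for the ohio state file is simple to compute. the function below is our own reconstruction of the general "4,5" technique (four characters of the main entry plus five characters of the first significant title word); the stopword list and normalization rules are assumptions for the sake of illustration, not ohio state's published algorithm.

stopwords = {"a", "an", "the", "of", "and"}     # assumed; actual practice varied

def derived_key(main_entry: str, title: str, author_only: bool = False) -> str:
    """build a 4,5-style key; blanking the title half gives author-only access."""
    norm = lambda s: "".join(ch for ch in s.lower() if ch.isalnum() or ch == " ")
    a = norm(main_entry).replace(" ", "")[:4].ljust(4)
    if author_only:
        return a + "     "
    words = [w for w in norm(title).split() if w not in stopwords]
    t = (words[0][:5] if words else "").ljust(5)
    return a + t

print(derived_key("kilgour, frederick g.", "the evolution of book catalogs"))
# -> "kilgevolu"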
briefly, the eastern illinois, bell laboratories, and manned spacecraft center systems use a fixed-length item record structure, and charge data are written directly to item record fields that are defined for this purpose. the ohio state system has a variable length, read-only item record. transaction data are recorded in an absence file, and linked to the item file. in the bell laboratories system what is conceptually a single item file is actually two separately organized physical files with different record formats. fixed-length book records are organized sequentially, and each contains fields for three loans and two reserves; all copies and volumes are represented. journal records are organized by an indexed sequential method, and do not contain copy and volume data, which must be added at transaction time. in the eastern illinois and manned spacecraft center systems the item file contains a separate record for each physical volume in the library. the ohio state item file contains one record per title. although it is difficult to tell without detailed programming knowledge of these systems, the bell laboratories data structure seems to enable exact matches to single records for status queries (e.g., what is the status of title number ___? what is the status of copy ___?) in ways that the eastern illinois and ohio state systems can only accomplish through a terminal operator's interpretation of a displayed set of matching records. the bell laboratories system can therefore conduct queries of this nature with keyboard/printer terminals, whereas the eastern illinois and ohio state systems require crt devices to display large amounts of information. it can also ask what overnight loans are still out, possibly a function of its journal file's data structure. the software implications of these various capabilities will not be discussed here. suffice it to say that absence systems require simpler accesses and data management than do the kind of item systems we have discussed, and that as item files are designed to replace all or selected public catalog functions, their data management and user interface requirements become greater.

special aspects of the charge function

two aspects of the charge function have special significance for on-line systems: patron self-charging, and a telephone and mail or delivery service. among the on-line systems we surveyed, only the one at northwestern university is reported to be self-charging. to have patron self-charging requires that charge transactions be simple and convenient. data transfer methods that require little effort are therefore preferred, and the use of machine-readable user and item identifications seems to be the best current choice. the northwestern system uses hollerith-punched user badges and book cards. other methods of data entry such as magnetic card reading and optical scanning are often mentioned for circulation control, but as of december 1971 we know of none that has resulted in a practical terminal-based system for on-line charging. two of the item systems promote a telephone and delivery service: the bell laboratories system and the ohio state university system. in each system inquiries can be directed to operators, who may conduct on-line searches of library holdings and circulation information for specific items. the kinds of questions that can be asked are "does the library have ___?" and "is it charged?"
we noted earlier that a catalog can be made "remotely accessible" in several ways: e.g., by a special group that performs manual card catalog lookups for telephoned requests, or by users' consulting multiple copies of book or microform catalogs. in principle, a variety of catalogs and circulation systems can be used together in a telephone inquiry system of this nature. for example, the library of the georgia institute of technology has recently implemented an "extended catalog access" and delivery service that is based on microfiche copies of its catalog at thirty-six campus locations, coupled with telephone inquiry to a manual circulation system (31). readers look up wanted items and telephone the library to request them. the manual circulation file is checked: available items are charged for delivery, or reserves may be placed for items that are already loaned. presumably, the currency of information and quickness of response times are better in an on-line circulation system than in any other type. an item system can furnish both holdings and absence information. an absence system needs to be coupled to another system to furnish holdings information: a requirement is that the holdings information must contain a key by which the corresponding absence records can be accessed. these are basic considerations in providing a telephone and delivery service.

system backup

the problems considered here derive from two conditions: unexpected system downtimes and scheduled periods when the system is not in operation. at these times a system cannot execute on-line tasks. two classes of backup problems are: 1) provision of service to users during the downtimes; and 2) updating system files to record downtime transactions. the latter are termed recovery problems. one way to back up the query function is to periodically print a list of circulating items. the frequency and ease of access (e.g., number of copies, their locations, telephone access to them) of such a list can pose substantial problems. an alternative to scheduled printings is an arrangement for quick printouts of a frequently copied backup tape on a redundant computer system. the basic recovery problem is how to enter data into the system for transactions that took place during downtimes. presumably, if unexpected downtimes are not inordinately long, discharges and other file updates may be postponed. this simplification is helpful, since transaction sequences among different kinds of updates can become quite complicated, e.g., discharges undo charges, and confusing the sequence causes problems. although other kinds of system updates have their own special problems, the following paragraphs only briefly discuss the backup and recovery of charging activities.

absence systems

the provision of transaction evidence in off-line mode has already been suggested for absence systems that have the necessary terminal capabilities. similarly, there are configurations which, in off-line mode, read user and item cards and produce machine-readable transaction records that can be read in during post-downtime recovery procedures. the northwestern university system has a special backup terminal for this purpose. the provision of automatic recovery facilities is an attractive feature. alternatively, multiple part manual transaction records can be made for charges during downtimes.
one part may serve as transaction evidence; the other can be used for manual input of recovery data, when the system is up again. exactly how this is done depends upon other details of the particular system. item systems since inputs of user and item identification numbers are sufficient to record charges in item systems, the recovery problem can be simpler than for absence systems. typewriter-like terminals with card or paper tape punches or magnetic recorders can be used to create machine-readable recovery data. the requirements for transaction evidence may be crucial. perhaps the solution to the worst case is the use of a two-part manual transaction form: one copy for transaction evidence, and the other, as above, for post-downtime recovery inputs. we can summarize three hardware solutions for transaction backup in either system type: 1) total system redundancy, 2) backup at the terminal level, and 3) a backup facility between terminal and computer. the cost of full system redundancy makes it unlikely. a facility to log transactions during downtimes is more feasible; there are several choices. one such alternative is to record transaction data off-line in machine-readable form at each data collection point: e.g., to punch paper-tape or cards. another alternative is to record data from several terminals with a single device, such as a magnetic recorder, or a control unit that coordinates a multiterminal system. a third solution, a variation on the second, is a mini-computer which links terminals, and handles telecommunications with a larger machine that holds system files. this approach has been taken by bucknell university. it affords more comprehensive backup than merely capturing transaction data. other functions, such as checks for user validity and reserved items, can be performed on a relatively reliable mini-computer dedicated to circulation. two types of designs/mcgee 199 conclusion on-line library catalogs are now a reality, but not yet for the exotic information retrieval work once popularly projected. instead, relatively straightforward accesses by author, title, and call number are supporting circulation, reference, and technical processing functions. the needs for better circulation systems and network processing of shared cataloging data have stimulated developments of large-scale operational (not experimental) systems around resident files of on-line bibliographic records. developers have not waited for solutions to fundamental problems of automatic indexing and information retrieval ; they have put large bibliographic files on-line and provided relatively simple, multiple access keys. the advances that have been made are in methods of physical access to bibliographic records, not in the intellectual or subject access to information. no new information is being retrieved, but familiar processes are being performed in better ways. improvements in the ease and time of accessing library files have dramatically upgraded the library's responses for its own routine work and to the public in general. we are experiencing the first of a new generation of practical systems that perform traditional functions with on-line rather than manual files, with as much benefit as possible short of better subject access. the new systems are transcending the barriers to convenient use that have been imposed by the size, complexities, and awkwardness of large manual systems. historically, it has been impractical to add circulation information to each record in the public catalog for an item. 
with on-line files of single records per item this is now possible. state-of-the-art computing affords multiple access keys to a record, instead of duplicating it for additional entries as in manual catalogs. how many and which keys are furnished largely determines the extent to which an on-line catalog can replace a traditional one. difficult cost and technical problems explain the current approaches. full requirements of a public catalog have been avoided; simpler files have been built to handle explicit processing functions. the advantages are simplified records and fewer access points. full bibliographic records are variable length, often large, and sometimes eccentric-and therefore relatively expensive to handle in machine form. in principle the overhead for access is the same as for manual files: the more entries that are provided, the greater the storage, processing, and cost. systems with simpler files than the public catalog have therefore been built. there have been no machine equivalents of large library catalogs; so we have studied manual ones to theorize ideal characteristics. in some cases this model may have supplied a misleading bias. studies of the new on-line systems at work could possibly revise our notions of what is needed. the kinds of systems now emerging are answers for the foreseeable futur e. the tradition of separately organizing and managing public and technical services will be challenged by the integrated systems. their centralized files , data handling, and access methods transcend functional boundaries which 200 journal of library automation vol. 5/3 september, 1972 grew between library tasks that used different but redundant manual files and evolved separate units and procedures to accomplish virtually the same basic data processing functions. the profession has yet to widely appreciate the new overview and managerial changes that are invited. reaction to them may be projected as a fourth and perhaps painful trend. in sofar as no fully integrated systems have yet been developed, it is likely that as they emerge they will force substantial changes to traditional patterns of library organization and management. acknowledgments this work was supported by the university of chicago library systems development office under clr/neh grant no. e0-262-70-4658 from the council on library resources and the national endowment for the humanities, for the d evelopment and operational testing of a library data management system. references 1. rob mcgee, a literatu1'e survey of operational and emerging online library circulation systems (university of chicago library systems development office, feb. 1972). available as eric/clis ed 059 752. mf$0.65, hc$3.29. 2. , "key factors of circulation system analysis and design," college and research libraries 33:127-140 ( mar. 1972 ). 3. h. k. g. bearman, "library computerisation in west sussex," program: news of computers in british libraries 2:53-58 (july 1968). 4. , "west sussex county library computer book issuing system," assistant libra1·ian 61:200-202 ( sept. 1968 ) . 5. richardt. kimber, "an operational computerised circulation system with on-line interrogation capability," program: news of computers in british librm·ies 2 :75-80 (oct. 1968 ) . 6. homer v. ruby, "computerized circulation at illinois state library," illinois libraries 50:159-162 ( feb . 1968 ). 7. robert e. hamilton, "the illinois state library 'on-line' circulation control system," in: proceedings of the 1968 clinic on librm·y applications of data processing. 
(urbana, ill.: university of illinois graduate school of library science, 1969 ) p. 11-28. 8. ibm corp., on-line library circulation control syste m, moffet library, midwestern university , wichita falls, t exas. application bri ef k-20-0271-0. ( white plains, n.y.: ibm corp., data processing div. , 1968) 14 p . 9. calvin j. boyer and jack frost, "on-lin e circulation controlmidwestern university library's system using an ibm 1401 computer in a 'time-sharing' mode," in: proceedings of the 1969 clinic on two types of designs/mcgee 201 library applications of data processing. (urbana, ill.: university of illinois graduate school of library science, 1970) p. 135-145. 10. charles d. reineke and calvin j. boyer, "automated circulation system at midwestern university," ala bulletin 63:1249-1254 (oct. 1969). 11. belfast, queen's university, school of library studies, study group on the library applications of computers, first report of the working party (belfast university, july 1965) 18 p. 12. richard t. kimber, "studies at the queen's university of belfast on real-time computer control of book circulation," journal of documentation 22:116-122 (june 1966) . 13. , "conversational circulation," libri 17:131-141 ( 1967). 14. ___ ,"the cost of an on-line circulation system," program: news of computers in british libraries, 2:81-94 (oct. 1968). 15. ann h. boyd and philip e. j. walden, "a simplified on-line circulation system," program: news of compute1·s in libraries 3:47-65 (july 1969). 16. velma veneziano and joseph t. paulukonis, "an on-line, real-time time circulation system." [this documentation of the northwestern university library system was made specially available to the author. a later version with the same title appears in larc reports 3:7-48 (winter 1970-71)]. 17. h . rivoire and m. smith, library systems automation reports 1971a-2, bucknell library on-line circulation system (blocs). ellen clarke bertrand library ( 15 mar. 1971) 19 p. 18. r. a. kennedy, "bell laboratories' library real-time loan system (bellrel)," lola, 1:128-146 (june 1968). 19. , "bell laboratories' on-line circulation control system: one year's experience," in : proceedings of the 1969 clinic on library applications of data processing. (urbana, ill.: university of illinois graduate school of library science, 1970) p. 14-30. 20. paladugu v. rao and b. joseph szerenyi, "booth library on-line circulation system (bloc)," ]ola, 4:86-102 (june 1971). 21. richard h. stanwood, "monograph and serial circulation control," a paper for the international congress of documentation, buenos aires, sept. 21-24, 1970. national council for scientific and technical researcb, buenos aires ( 1970) 23 p. 22. ibm corp., data processing division, functional specifications: a circulation system for the ohio state university libraries, gaithersburg, maryland (november 26, 1969) various paginations. [this and other technical documentation were made specially available to the author. this is now available through eric/clis as: on-line remote catalog access and circulation control system. part i: functional specifications. part ii: user's manual. november 1969. 151 p. ed 050 792. mf $0.65, hc $4.00] 202 journal of library automation vol. 5/3 september, 1972 23. edward e. shumilak, an online interactive book-library-management system. nasa technical note nasa tn d-7052. national aeronautics . and space administration, washington, d.c. ( march 1971 ) 40 p. 
[this document is available through the national technical information service under document number n71-20526] 24. university of chicago library, a proposal for the development and operational testing of a library data management system, herman h. fussler and fred h. harris, principal investigators. (chicago, ill.: 1970) 44 p. 25. richard de gennaro, "the development and administration of automated systems in academic libraries," jola, 1:75-91 (mar. 1968). 26. ellen w. miller and b. j. hodges, "shawnee mission's on-line cataloging system," jola 4:13-26 (mar. 1971). 27. simon p. j. chen, "on-line and real-time cataloging," american libraries 3:117-119 (feb. 1972). 28. university of chicago library, development of an integrated, computer-based, bibliographical data system for a large university library, annual report 1967/68. by herman h. fussler and charles t. payne. university of chicago library, chicago, illinois (1968) 17 p. + appendixes. 29. frederick g. kilgour, letter to the author, 23 november 1971. 30. ben-ami lipetz, user requirements in identifying desired works in a large library, final report, grant no. sar/oeg-1-71071140-4427, u.s. department of health, education, and welfare, office of education, bureau of research. (new haven, conn.: yale university library, june 1970) 73 p. + appendixes. 31. "library extends catalog access and new delivery service," [4 p.] a brochure issued by price gilbert memorial library, georgia institute of technology, atlanta, georgia, 1972.

editorial board thoughts: rise of the innovation commons
tod colegrove
information technology and libraries | september 2015

that the practice of libraries and librarianship is changing is an understatement. throughout their history, libraries have adapted and evolved to better meet the needs of the communities served. from content collected and/or archived, to facilities and services provided, a constant throughout has been the adoption, incorporation, and eventual transition away from technologies along the way: clay tablets and papyrus scrolls giving way to the codex; the printing press and eventual mass production and collection of books yielding to information communication technology such as computer workstations and the internet. indeed, the rapid and widespread adoption of the internet has enabled entire topologies of information to change – morphing from ponderous print tomes into digital databases, effectively escaping the walls of libraries and archives altogether.1 in reflection of end-users' growing preference for easily accessible digital materials, libraries have responded with the creation of new spaces and services. repositioning physical, digital, human, and social resources to better meet the needs of the communities supported, the information commons2 that is the library begins to acquire a more technological edge.
the concept of a library service or area referred to specifically as an information commons can be traced to as early as 1992 with the opening of the information arcade at the university of iowa – specifically designed to provide end-users technology tools, with a stated mission "to facilitate the integration of new technology into teaching, learning, and research, by promoting the discovery of new ways to access, gather, organize, analyze, manage, create, record, and transmit information."3 first mentioned in the literature in 1994, discussion of the idea itself waited another five years, with donald beagle writing about the theoretical underpinnings of "the new service delivery model" in 1999, defined as "a cluster of network access points and associated it tools situated in the context of physical, digital, human, and social resources organized in support of learning." a flurry of articles followed, with the idea seeming to have caught the collective imagination of libraries generally by 2004. information commons as named spaces within libraries made "… sudden, dramatic, and widespread appearance in academic and research libraries across the country and around the world."4 scott bennett went further, in 2008 asking flatly: "who would today build or renovate an academic library without including an information commons?"5 this proliferation and transition has not been limited to academic libraries; for decades, libraries of all type, shape, and size, have been similarly provisioning resources and technology in the context of end-user access and learning.

patrick "tod" colegrove (pcolegrove@unr.edu), a member of the ital editorial board, is head of the delamare science & engineering library at the university of nevada, reno, nv.

by 2006, a new variation of the information commons had entered the vernacular: the learning commons, defined by beagle as the result of information commons resources "organized in collaboration with learning initiatives sponsored by other academic units, or aligned with learning outcomes defined through a cooperative process."6 a subset of the broader concept, when the library collaborates with stakeholders external to the library to collaboratively achieve academic learning outcomes, it becomes operationally a learning commons. one can easily conceive of the learning commons more broadly by considering learning outcomes desirable within the context of particular library types: school libraries with offerings and programs in alignment with broader k-12 curricula; public libraries in support of lifelong learning and participatory citizenship; special libraries in alignment with other niche-specific learning outcomes. note that not all information commons are learning commons. as defined, the learning commons depends on the actions and involvement of other units that establish the mission, and associated learning goals, of the institution. others must join with the library's effort in order to create and nourish such spaces in a way that is deeply responsive to the aspirations of the institution: "the fundamental difference between the information and the learning commons is that the former supports the institutional mission while the latter enacts it." (bennett 2008, emphasis added) at a time when libraries are undergoing such rapid and significant transformation, it's hard to dismiss such collaborative effort as merely trendy – such spaces, and the library by extension, become of even more fundamental relevance to the broader organization.

in short, resources are provisioned in the information commons so that learning can happen; collaborative effort with stakeholders beyond the library, but within the organization, ensures that learning does happen.

drawing a parallel, what if the library were to go beyond simply repositioning resources in support of learning – indeed, beyond working with other units of the organization to collaboratively align and provision resources in support of achieving organizational learning outcomes? to go beyond strategic alignment with the aspirations of the institution, involving stakeholders from beyond the immediate organization in the creation and support of such spaces? provisioning library spaces and services that are deeply responsive to the aspirations of the greater community? arguably this is where the relatively recent introduction of makerspaces into the library fits in. the annual environmental scan performed by the new media consortium (nmc) has for a number of years identified makerspaces to be on its short-term adoption horizon – the 2015 library edition goes further, identifying a core value: "the introduction of makerspaces in the academic library is inspiring a mode of learning that has immediate applications in the real world. aspiring inventors and entrepreneurs are taking advantage of these spaces to access tools that help them make their dreams into concrete products that have marketable value."7

aspects of the information commons are present in library makerspace – not only in the access to traditional library resources, but also in the shift toward providing support of 21st-century literacies in the creation, design, and engineering of output. with the acquisition and use of these literacies in collaboration with and in support of the goals of the greater institution, it is also a learning commons; for example, in the case of a school or public library where makerspace activities and engagement collaboratively meet and support learning outcomes including increased engagement with science, technology, engineering, the arts, and math (steam) disciplines. consider the further example of university students leveraging makerspace technology as part of ste(a)m outreach efforts to local middle schools in the hope of kindling interest, or partnering with the local discovery museum in the production of a mini maker-faire to carry that interest forward. alternatively, a team of students conceiving, then prototyping and patenting a new technology with the active and direct support of the library commons, going on to eventually launch as a business. to the extent the library can springboard off the combination of makerspace with information or learning commons to engage stakeholders from beyond the institution, it can go beyond – becoming something broader, and potentially transformative; even as it enables progress toward collaboratively achieving community goals, outcomes, and aspirations.

the hallmark of community engagement with such library facilities is a spontaneous innovation that seems to flow naturally. library? information or learning commons? arguably such spaces are more accurately named innovation commons. beyond solidifying the library's place as a hub of access, creation, and engagement across disciplinary and organizational boundaries, the direct support of innovation – the process of going from idea to an actual good or service with a real perceived value – is in potential alignment with the aspirations of the broader community. in collaboration with stakeholders from across the community, from economic development and government representatives to businesses and private individuals, broader outcomes and aspirations of the greater community can be identified and supported. nevertheless, simply adding makerspace technology to an information or learning commons does not automatically create an innovation commons. it is in the broader conversation, along with the catalyzation, identification of and support for the greater aspirations of the community, that the commons begins to assume its proper role in the greater ecosystem. leveraging the deliberate application of information, with imagination, and initiative, enabling end-users to go from idea all the way to useful product or service is something that community stakeholders see as a tangible value. the library as innovation commons becomes a natural partner in the local innovation ecosystem, working collaboratively to achieve community aspirations and economic impact. traditional business and industry reference support ramps up to another level, providing active and participatory support of coworking, startup companies, and etsypreneur8 alike – patent searches taking on an entirely new light in support of innovators using makerspace resources to rapidly prototype inventions. actualized, the library joins forces in a deeper way with the community in the creation of new technologies, jobs, and services, taking an ever more active role in building the futures of the community and its members.
references

1. morgan currie, "what we call the information commons," institute of network cultures blog, july 8, 2010, http://networkcultures.org/blog/2010/07/08/what-we-call-the-information-commons/
2. the word commons reflects the shared nature of a resource held in common, such as grazing lands.
3. robert a. seal, "issue overview," journal of library administration, 50 (2010), 1-6. http://www.tandfonline.com/doi/pdf/10.1080/01930820903422248
4. charles forrest & martin halbert, a field guide to the information commons. lanham, md: scarecrow, 2009.
5. scott bennett, "the information or the learning commons: which will we have?," the journal of academic librarianship, 34, no. 3 (2008), 183-185.
6. donald robert beagle, donald russel bailey, & barbara tierney, the information commons handbook, xviii. new york: neal schuman, 2006.
7. larry johnson, samantha adams becker, victoria estrada, and alex freeman, nmc horizon report: 2015 library edition, 36. austin, tx: the new media consortium, 2015.
8. the combination of etsy, a peer-to-peer e-commerce website that focuses on selling handmade, vintage, or unique items, and entrepreneurship. the word "etsypreneur" refers to someone who is in the "etsy business" – namely, selling such items via the website. http://etsypreneur.com/the-hidden-danger-of-the-internet-opportunity/

reference rot in the repository: a case study of electronic theses and dissertations (etds) in an academic library
mia massicotte and kathleen botter
information technology and libraries | march 2017

abstract

this study examines etds deposited during the period 2011-2015 in an institutional repository, to determine the degree to which the documents suffer from reference rot, that is, linkrot plus content drift. the authors converted and examined 664 doctoral dissertations in total, extracting 11,437 links, finding overall that 77% of links were active, and 23% exhibited linkrot. a stratified random sample of 49 etds was performed which produced 990 active links, which were then checked for content drift based on mementos found in the wayback machine. mementos were found for 77% of links, and approximately half of these, 492 of 990, exhibited content drift. the results serve to emphasize not only the necessity of broader awareness of this problem, but also to stimulate action on the preservation front.

introduction

a significant proportion of material in institutional repositories is comprised of electronic theses and dissertations (etds), providing academic librarians with a rich testbed for deepening our understanding of new paradigms in scholarly publishing and their implications for long-term digital preservation. while academic libraries have long collected and preserved hard copy theses and dissertations of the parent institution, the shift to mandatory electronic deposit of this material has conferred new obligations and curatorial functions not previously incorporated into library workflows.
by highlighting etds as a susceptible collection deserving of specific preservation actions, we draw attention to some unique responsibilities for libraries housing university-produced content, particularly as scholarly information continues its shift away from commercial production and distribution channels. as teper and kraemer point out in their discussion of etd program goals, "without preservation, long-term access is impossible; without long-term access, preservation is meaningless."1

mia massicotte (mia.massicotte@concordia.ca) is systems librarian, concordia university library, montreal, quebec, canada. kathleen botter (kathleen.botter@concordia.ca) is systems librarian, concordia university library, montreal, quebec, canada.

what is reference rot, and why study it?

in addition to linkrot (where a link sends the user to a webpage which is no longer available), there are webpages that remain available but whose contents have undergone change over time, known as content drift. this dual phenomenon of linkrot plus content drift has been characterized as reference rot by the hiberlink project team,2 and has important implications for digital preservation. since theses and dissertations are original works born digital by virtue of mandatory deposit programs, a university's etd program is effectively a digital publishing initiative, accompanied by a new universe of responsibility for its digital preservation. due to the specialized nature of graduate-level research, etds frequently include links to resources on the open web, for example, personal blogs, project websites, and commercial entities. digital object identifiers (dois), useful in the context of published literature, do not apply to urls on the free web, which are doi-indifferent. open web links also fall outside the scope of preservation initiatives such as lockss (lots of copies keep stuff safe),3 which aim to safeguard the published literature. with increasing frequency, researchers are citing newer forms of scholarship which do not readily fall under the rubric of published literature. moreover, since thesis preparation is conducted over a period of time typically measured in years, links cited therein are likely to be more vulnerable to linkrot and content drift by the time of manuscript submission. yet despite the surfeit of anecdotal daily evidence that urls vanish and result in dead links, phillips, alemneh, and ayala point out that "by and large academic libraries are not capturing and maintaining collections of web resources that provide context and historical reference points to the modern theses and dissertations held in their collections."4 since an etd comprises a unique form of scholarly output produced by universities, simultaneously satisfying the parent institution's degree-granting apparatus and reflecting its academic stature on the international stage, the presence of reference rot in this body of literature is of particular concern and worthy of immediate attention.

smoking guns

there has been no shortage of evidence reporting on the linkrot phenomenon over the last two decades.
koehler, whose initial study on linkrot appeared in jasis in 1999, periodically revisited, analyzed, and reported on the same set of 360 urls collected in his original study.5,6,7 in 2015, upon the twenty-year benchmark of the original data collection, oguz and koehler reported in jasis that only 2 of the original links remained active.8 a number of foundational studies, including casserly and bird,9 spinellis,10 sellitto,11 falagas, karveli, and tritsaroli,12 and wagner et al.13 have reported on linkrot occurring in professional literature. sanderson, phillips, and van de sompel provide a table of 17 well-known linkrot studies, comparing overall benchmarks, and supplying a succinct summary of the scope of each study.14 linkrot also gained further important exposure with the harvard law school study by zittrain, albert, and lessig, which found that 70% of 3 harvard law journal references, and 49.9% of urls in supreme court opinions examined, no longer pointed to their originally cited sources.15 information technology and libraries | march 2017 13 members of the hiberlink project, which set out to examine “a vast corpus of online scholarly publication in order to assess what links still work as intended and what web content has been successfully archived using text mining and information extracting tools” have been pivotal in making the case for reference rot.16 hiberlink demonstrated that failure to link to cited sources was due not only to linkrot, but also to web page content which changed over time.17 a new dimension of the digital preservation universe was thrown into sharp relief with follow-up study by klein et al. (2014), which examined one million web references extracted from 3.5 million science, technology, and medicine (stm) articles published in elsevier, pubmed central, and arxiv, between the years 1997 and 2012. the study concluded that one in five articles suffers from reference rot.18 though the study focused on stm articles, its authors drew attention to theses and dissertations as a susceptible class of material. analyzing the same set of links extracted from this large stm corpus, jones et al. (2016) recently reported that 75% of referenced open web pages demonstrated changes in content.19 etds — a susceptible collection the digital preservation part of institutionally mandated etd deposit has yet to have its dots fully connected to the rest of the diagram. after four years of research into academic institutions’ etd programs, halbert, skinner, and schultz reported that close to 75% of respondents surveyed had no preservation plan for their etd collections.20 despite the prevalence of linkrot studies, linkrot in etds has not been subjected to similar scrutiny, and the implications of disappearance of content is underappreciated. while mandatory deposit programs have become relatively commonplace, focus has largely remained on policy and implementation aspects, metadata quality, interoperability and conformance to standards.21,22 there are few studies which focus on institutional repository link content. the study conducted by sanderson, philips, and van de sompel (2011) was a large-scale examination of two repositories.23 400,144 papers deposited in arxiv, and 3,595 papers in the university of north texas (unt) digital library repository were studied, and more than 160,000 urls examined. 
links were analyzed for persistence and the availability of mementos, that is, whether prior versions of the page existed in a public web archive, such as the internet archive's wayback machine. for 72% of unt urls, either mementos were available, or the resource still existed at its original location, or both. although 54% (9,880) were available in one or more international web archives, 28% (5,073) of unt's etd links were found to no longer exist, nor had they been archived by the international archival community. phillips, alemneh, and ayhala looked at overall general patterns and trends of url references in repository etds, examining 4,335 etds between the years 1999-2012 in the unt repository.24 the team analyzed 26,683 unique urls in 2,713 etds containing one or more links, finding an overall average of 10.58 unique urls per etd with one or more links. the unt team provided a a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 14 breakdown of domain and subdomain occurrence frequency, and indicated areas of future investigation into content-based url linking patterns of etds. etd link decay was studied by sife and bernard, who performed a citation analysis on urls in 83 theses published between 2007 and 2011 at tanzania's sokoine national agricultural library.25 15,468 citations were examined, 9.6% (1,487) of which were open web citations. urls were considered active if found at the original location, or available after a url redirect. the authors manually tested urls over a period of seven days to record their accessibility, noting down inaccessible urls error messages and domains, and analyzing the types of errors encountered. the authors calculated that it took only 2.5 years for half of the web citations to disappear. at the etd2014 conference,26 an important study of 7,500 etds in 5 u.s. universities was presented. of 6,400 etds defended between 2003 and 2010, approximately 18% of open web link content was confirmed as lost, and a further 34% at risk of loss, that is, live links which lacked an archived copy.27 though the results of that particular study have not been formally published, it was briefly summarized in a session held at the 38th uksg annual conference in glasgow, scotland in march 2015, an account of which was subsequently published by burnhill, mewissen, and wincewicz in insights.28 given the scarcity of published literature on link content as found in etds, this present study which examines reference rot in etds in an academic institutional repository is unique, draws attention to an important digital collection which is vulnerable to loss, and highlights need for action. background and context concordia university is a comprehensive university located in montreal, with a student population of 43,903 full-time equivalents in 2015, of which 7,835 were graduate students. 27 phd programs were offered in 2015,29 and 43 programs at the masters level. faculties of arts and science, engineering and computer science, fine arts, and business have a thesis requirement, and produce upwards of 350 masters and 150 phd dissertations annually. the broad disciplines, and the departmental clusters used in this study are shown in table 1. prior to the thesis deposit mandate, concordia university library housed hard copy versions of theses and dissertations in the collection. 
in 2009, the library launched spectrum, concordia’s eprints institutional repository, playing a leadership role in spectrum's implementation and policy development, and providing training and support to the school of graduate studies regarding submission and management of theses for deposit. following a successful pilot project, the graduate studies office ceased accepting paper manuscripts, and mandated electronic deposit of all theses and dissertations into spectrum as of spring 2011. information technology and libraries | march 2017 15 discipline department discipline department arts applied linguistics communication economics educational technology history hist and phil of religion humanities philosophy sociology political science psychology religion business* decision sciences and mis finance management marketing engineering** building engineering civil engineering computer science comp sci & software eng electrical and comp eng industrial engineering info systems security mechanical engineering fine arts art education art history film and moving image studies industrial design fine arts performing arts science biology chemistry mathematics physics exercise science table 1. summary of departmental clusters used in this study * john molson school of business ** engineering & computer science methodology we concentrated on phd dissertations (henceforth etds) in spectrum in order to limit the scope of the project; master's theses were excluded. a 5-year period was chosen, beginning with the first semester of mandatory deposit, spring 2011, through fall 2015, a total of 720 etds. since concordia etds are released for publication immediately following convocation, the university's official convocation dates were used to identify the set of documents to be downloaded and examined. we proceeded in phases: first downloading etds from spectrum and converting to a text format that could be examined for patterns; then extracting links from each and testing programmatically a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 16 for linkrot; then drawing a stratified random sample of active urls and visiting them to determine if content drift had taken place. our methodology for link extraction was similar to those described by klein et al.,30 and zhou, tobin, and grover.31 during the dissertation download stage, 36 etds with embargoed content were encountered and eliminated. etds were then converted from existing pdf/a format to xml. a further 20 documents failed to convert due to nonstandard or complex formatting which resulted in unreadable, garbled characters. these documents resisted multiple conversion attempts, and since they could not be mined, had to be eliminated. a final total of 664 etds were successfully converted using three different tools: 97% (644) were converted using pdftohtml,32 the remaining 3% by either givemetext (14)33 or adobe acrobat (3). a spot check of documents was sufficient evidence that many links occurred throughout the text body. since we intended to extract urls to the open web, we wanted to err on the side of detecting more links, rather than easily-identifiable well-formed urls. links were mined from the body of the text in a manner similar to the study carried out at unt.34 we wanted a regular expression which would catch as many urls as possible, expecting to manually clean the link output before further processing. 
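a minimal sketch of what such a conversion-and-mining pass might look like is shown below, assuming python 3.9 or later and poppler's pdftohtml available on the path; the url pattern is a deliberately simplified stand-in for the "@gruber v2" expression discussed next, and the directory names are illustrative rather than taken from the study.

    import re
    import subprocess
    from pathlib import Path

    # deliberately liberal url pattern; a simplified stand-in for the
    # "@gruber v2" expression the authors cite. it also accepts links
    # that lack an http:// prefix, provided they begin with "www."
    URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)

    def convert_etd(pdf_path: Path, out_dir: Path) -> Path:
        # convert a pdf/a dissertation to an xml text rendition with
        # poppler's pdftohtml (-xml: xml output, -i: ignore images)
        out_dir.mkdir(parents=True, exist_ok=True)
        out_file = out_dir / (pdf_path.stem + ".xml")
        subprocess.run(["pdftohtml", "-xml", "-i", str(pdf_path), str(out_file)],
                       check=True)
        return out_file

    def extract_links(text_path: Path) -> list:
        # return every candidate url in the converted document, counting
        # repeated links as separate instances, as in the study
        text = text_path.read_text(encoding="utf-8", errors="replace")
        text = re.sub(r"-\s*\n\s*", "", text)   # rejoin urls broken across lines
        return [m.group(0).rstrip(".,;)") for m in URL_PATTERN.finditer(text)]

    if __name__ == "__main__":
        links = []
        for pdf in Path("etds").glob("*.pdf"):   # "etds" is an assumed folder name
            links.extend(extract_links(convert_etd(pdf, Path("converted"))))
        print(len(links), "candidate links extracted")

as in the study, repeated links are kept as separate instances, and trimming trailing punctuation is only a first pass at the manual cleanup described above.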
we tested multiple regular expressions35 against a small sample of our converted etds and compared the results. we selected one which seemed well-suited for our purpose, as it was liberal in detecting links throughout the text, was able to extract links which contained obvious omissions and problems — for example, those that lacked http:// prefixes — but also caught non-obvious errors, such as ellipses in long urls. we considered how deduplication of extracted links might affect the outcome, and opted to count each link as an individual instance. manual cleanup included catching urls that broke across new lines, identifying false hits such as titles containing colons and dois, and adding escape encoding characters for "&" and "%" in order to generate a clean url for use in the next step of the process.

methodology — linkrot collection

a script programmatically used the curl command line tool to visit each link and fetch the http response code in return.36 an output listing was produced for each doctoral dissertation, comprised of the original urls, the final urls, and the http response codes. link output for each of the converted 664 etds was collected from december 2015 to january 2016, with the fall 2015 semester checked in march 2016. 76% (504 of 664) of etds contained one or more links, the highest number of links (5,946) falling into the arts group. 24% (160 of 664) of etds contained no links. for the 5-year period, the broad discipline breakdown of documents examined, the number of etds with links, and the number of links extracted are shown in table 2. converted etds by publication year, broken out by broad disciplines, are shown in figure 1.

discipline | number of phd etds in spectrum | etds converted* | contain no links | contain links | number of links extracted
arts | 210 | 195 | 31 | 164 | 5,946
business | 45 | 43 | 12 | 31 | 210
engineering | 351 | 326 | 82 | 244 | 3,259
fine arts | 28 | 25 | 2 | 23 | 1,728
science | 86 | 75 | 33 | 42 | 294
total | 720 | 664 | 160 | 504 | 11,437

table 2. 5-year period, 2011-2015, summary of documents examined and links extracted
* 56 documents in total eliminated (36 embargoed, plus 20 which failed to convert)

figure 1. converted etds by publication year and broad discipline

the 11,437 links extracted were checked for linkrot, each link accessed and its http response code recorded. 77% (8,834 of 11,437) of links returned an active 2xx http response code. 23% (2,603) of links could not be reached, returning a response code other than in the 2xx range. this includes 102 links in the 3xx range which failed to reach a destination after 50 redirects and were considered linkrot. numbers of links, total link response, and link response by year broken down by broad discipline are shown in figure 2, with accompanying data provided in table 3 and discussed in the findings section.
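before turning to the year-by-year breakdown in figure 2 and table 3, a sketch of such a status-collection pass may be useful. the authors' script is not published; the following is an illustration only, shelling out to the curl tool they cite,36 with the input and output file names and the 30-second timeout being assumptions.

    import csv
    import subprocess

    def check_link(url: str) -> tuple:
        # visit the link with curl, following up to 50 redirects and giving
        # up after 30 seconds; return (final_url, http_code) as strings
        result = subprocess.run(
            ["curl", "-s", "-L", "-o", "/dev/null",
             "--max-redirs", "50", "--max-time", "30",
             "-w", "%{url_effective} %{http_code}",
             url],
            capture_output=True, text=True,
        )
        final_url, _, code = result.stdout.rpartition(" ")
        return final_url or url, code or "000"

    if __name__ == "__main__":
        # links.txt holds one cleaned url per line (assumed file name)
        with open("links.txt") as src, open("link_report.csv", "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["original_url", "final_url", "http_code"])
            for line in src:
                url = line.strip()
                if url:
                    writer.writerow([url, *check_link(url)])

curl reports a code of 000 when no usable response is obtained, which corresponds roughly to the "empty response" category reported later in table 5.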
figure 2. link http response codes, by broad discipline and year

discipline | response code | 2011 | 2012 | 2013 | 2014 | 2015 | total | % active & rotten**
arts | 2xx | 691 | 864 | 800 | 1,108 | 1,093 | 4,556 | 77%
arts | all other* | 320 | 428 | 131 | 293 | 218 | 1,390 | 23%
business | 2xx | 14 | 52 | 17 | 22 | 50 | 155 | 74%
business | all other | 9 | 19 | 5 | 9 | 13 | 55 | 26%
engineering | 2xx | 302 | 702 | 638 | 482 | 404 | 2,528 | 78%
engineering | all other | 134 | 172 | 180 | 196 | 49 | 731 | 22%
fine arts | 2xx | 165 | 143 | 504 | 467 | 94 | 1,373 | 79%
fine arts | all other | 74 | 56 | 118 | 98 | 9 | 355 | 21%
science | 2xx | 77 | 34 | 58 | 39 | 14 | 222 | 76%
science | all other | 25 | 23 | 10 | 11 | 3 | 72 | 24%
subtotal | 2xx | 1,249 | 1,795 | 2,017 | 2,118 | 1,655 | 8,834 | 77% active
subtotal | all other | 562 | 698 | 444 | 607 | 292 | 2,603 | 23% rotten
% rotten | | 31% | 28% | 18% | 22% | 15% | 23% |
total | | 1,811 | 2,493 | 2,461 | 2,725 | 1,947 | 11,437 | 100%

table 3. breakdown by year and discipline showing active (2xx) and rotten (all others) response codes
* all other = 0, 1xx, 3xx (unresolved after 50 redirects), 4xx and 5xx response codes combined
** active and rotten rates based on total links per discipline

methodology — content drift

for the content drift phase, we wanted to sample documents from each of the five disciplines. etds which did not contain any links were excluded from the sample. using only documents with one or more active links, a stratified random sample of 10% was drawn for a final sample of 49 etds containing a total of 990 links. a snippet of text surrounding each link was then also extracted from each etd, along with any "date accessed" or "date viewed" information if present. each link was manually visited, assessed for content drift, and observations recorded. the breakdown of the content drift sample is shown in table 4.

discipline | etds with links | etds with active links (2xx) | etds sampled for content drift* | number of links extracted for sample
arts | 164 | 156 | 16 | 668
business | 31 | 28 | 3 | 12
engineering | 244 | 235 | 24 | 154
fine arts | 23 | 23 | 2 | 136
science | 42 | 40 | 4 | 20
total | 504 | 482 | 49 | 990

table 4. breakdown of sample pool of etds for content drift analysis
* 10% sample drawn from each discipline's pool of etds; only etds with urls relevant for content drift assessment.

visited links were benchmarked against the existence of a memento, an archived snapshot of that page located in the wayback machine.37 since the university sets a strict thesis submission deadline of 3 months prior to convocation, mementos prior to submission deadline would be sought. based on the occurrences of "date accessed" and discursive information found in the snippets, we arrived at the supposition that links were likely to have been checked the closer the student approached final stages of manuscript preparation, although this is not verifiable. we set ourselves a soft window for locating an archived snapshot using a date 6 months prior to the convocation date as the benchmark; that is, for each semester's deadline date, an additional 3 months was added, arriving at a 6-months-prior-to-publication marker. since programmatic analysis of 990 links required time, expertise, and resources not available to us, we approached the problem heuristically. assuming that online consultations are not linear, active links occurring multiple times in a document were given equal weight.
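the manual checking procedure described next could in principle be approximated programmatically: the internet archive exposes a public availability endpoint (http://archive.org/wayback/available) that, given a url and a target timestamp, returns the closest archived capture. this was not the method used in the study, where links were checked by hand; the six-month offset and the example values below are assumptions for illustration only.

    import json
    from datetime import date, timedelta
    from typing import Optional
    from urllib.parse import urlencode
    from urllib.request import urlopen

    def closest_memento(url: str, convocation: date) -> Optional[dict]:
        # ask the wayback machine for the capture closest to a benchmark
        # date set roughly six months before the convocation date
        benchmark = convocation - timedelta(days=182)
        query = urlencode({"url": url, "timestamp": benchmark.strftime("%Y%m%d")})
        with urlopen("http://archive.org/wayback/available?" + query, timeout=30) as resp:
            payload = json.load(resp)
        # returns e.g. {"url": ..., "timestamp": ..., "available": true}, or None
        return payload.get("archived_snapshots", {}).get("closest")

    if __name__ == "__main__":
        # example values only; not drawn from the study
        hit = closest_memento("http://example.org/project-page", date(2013, 6, 10))
        print(hit["url"] if hit else "no memento archived")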
each link was manually checked in the wayback machine using "date viewed" if provided; if no date was provided (the majority of cases), wayback was checked to see if an archived version existed as close to our 6 month soft marker as possible. if a memento was not found within a month earlier/later than the soft marker, then the nearest neighboring older memento was selected, if one existed. the original url, the date the url was visited, and whether a snapshot was located in wayback was recorded. all links were checked during july-august 2016. if the initial web browser failed to access, a second and sometimes third browser was tried, using safari, chrome, and internet explorer (ie) in that order. unsuccessful attempts to reach wayback were rechecked in september. the question as to whether, and to what degree content drift had occurred was assessed, and is discussed in the next section. information technology and libraries | march 2017 21 findings and discussion linkrot findings of 664 etds examined for linkrot, 77% of links tested returned an active http response code in the 2xx range -roughly three-quarters overall. numbers of links by broad discipline varied greatly, as shown in figure 2 (healthy links in green, linkrot shown in red). linkrot rates ranged from 21% in fine arts, to 26% in business, as seen in last column of table 3. it should be noted that 2xx response codes are also returned for pages that disguise themselves as active links. for example, a url returns an active status code when a domain has been parked (e.g. purchased to reserve the space), or when a customized 404-page-not-found is encountered. since we had no mechanism in place to treat false positives, these were flagged during the linkrot phase as candidates for subsequent content drift analysis. 23% (2,604 of 11,437) of all links, returned a response code of something other than in the 2xx-range and considered linkrot -roughly one-quarter. response codes in the 4xx range alone, including 404-page-not-found errors, comprised 17% (1,916 of 11,437) of all links. table 5 shows the breakdown of the total number of links that were visited in the spring of 2016 for linkrot determination. http response code category meaning of http response code* number of links percent of total links (%) 0 empty response** 507 4% 1xx informational 2 0% 2xx successful 8,834 77% 3xx redirection† 102 1% 4xx client error 1,916 17% 5xx server error 76 1% total 11,437 100% table 5. breakdown of http response codes received * we used http protocol definitions at http://www.w3.org/protocols/rfc2616/rfc2616-sec10.html ** unofficial http response code due to request timing out † failure to resolve after 50 redirects http responses ranged from a high of 85% active in 2015, to a low of 69% active in 2011, the oldest publication year. to put it differently, the most recent year exhibited a linkrot rate of 15%. consistent with other studies, linkrot manifests itself quickly after publication and increases over time, as indicated by percentages shown in figure 2. content drift findings of the 990 links visited to check for the presence of content drift, 764 (400 + 364), or 77%, had a wayback memento compared 226 (92+134), or 23%, which did not. 
slightly more than half of links with mementos, 52% (400 of 764), demonstrated some level of content drift when the a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 22 memento was compared to the current active link, while 48% (364 of 764) with mementos did not exhibit content drift. the presence of content drift by discipline, with/without mementos showing numbers of links tested, appears in table 6. discipline number of links tested content drift detected no content drift memento found memento not found total memento found memento not found total arts 668 261 60 321 254 93 347 business 12 5 0 5 4 3 7 engineering 154 74 10 84 55 15 70 fine arts 136 55 22 77 38 21 59 science 20 5 0 5 13 2 15 total 990 400 92 492 364 134 498 table 6. presence of content drift by discipline, with/without mementos for links that had no memento in wayback, content drift assessment was based on the presence of an observable date in the current active link, including copyright, and/or other details which positively correlated against our extracted snippet information. for example, some links retrieved a .pdf or other static file which correlated with the snippet, there being no reason to conclude its content had undergone change since publication, despite the lack of a memento. snippets were also used in cases where a robots.txt file at the target url had prevented wayback from creating a memento. occasional examination of the dissertation text was conducted to validate information extracted in the snippet. the 23% (226) which lacked mementos remain at significant risk and will fall prey to further drift as time passes. as seen in table 7, of 492 urls manifesting content drift, 11% (54 of 492) were completely lost, linking to web domains that had been sold or were currently up for sale, and webpages replaced or removed. 9% (42 of 492) of web pages exhibited major change such that there was little correlation with snippets, or where website overhauls made assessment difficult, but not impossible. 36% (179 of 492) web links exhibited minor drift, primarily pages that differed somewhat from a memento in visual appearance, such as header and footer differences, changes in background theme or style, or changes in navigation or search functionality which did not represent a high degree of impairment. 7% (34 of 492) linked to continually updating websites, such as wikipedia and news organizations, and 7% (35 of 492) were customized 404-page-notfound, distinctive enough to warrant separate categories. a full 30% (148 of 492) exhibited a multiplicity of changes of uncertain nature which we grouped together, such as pages where graphic or audio components had been removed or could not be retrieved, broken javascript that impeded access, browser failure, mementos not accessible after repeated attempts -indicative of a range of issues affecting the quality of web archives and hence preservation.38 the types of information technology and libraries | march 2017 23 content drift encountered, broken down by broad discipline and numbers of links, and percentage, is shown in table 7. type of content drift arts business engineeri ng fine arts science total % of type lost 45 0 3 6 0 54 11% major but findable 22 0 9 9 2 42 9% minor – redesigned but recognizable 128 2 30 17 2 179 36% ongoing updating website 25 3 5 0 1 34 7% custom 404 23 0 4 8 0 35 7% other 78 0 33 37 0 148 30% total 321 5 84 77 5 492 100% table 7. 
types of content drift encountered, number of links by broad discipline though difficulties encountered during content drift assessment made further extrapolation problematic, the presence of reference rot was confirmed. our 10% stratified random sample examined 990 active links, finding that roughly half (492 of 990) manifested some degree of content drift. for 364 links, or 36% overall, a benchmark memento was found and no content drift detected. although many content drift changes can arguably be characterized as minor, it is not possible to ascertain where the content drift scale tips irremediably for any particular reader. what can be said with certainty is that 11% of active links which did not exhibit linkrot, and were quite live and accessible, fell into a small but unsettling group where the context of the cited web source is irrevocably lost. of the 498 links which did not exhibit any evidence of content drift, 134, approximately one-third, have no memento archived and continue to remain at high risk. a focused and deeper analysis of active links which might lead to a typology of content drift types would be a possible area of future study, though even the well-resourced study by jones et al. which utilized a strict "ground truth" for comparing textual mementos over time, points out that classifying links would certainly be challenging.39 a larger sample size might also allow closer analysis of disciplinary differences, which may lead to a better understanding of these types of content drift variations. conclusion reference rot in the form of linkrot and content drift were observed in etds in spectrum, our institutional repository, and this confirmation should give pause for those charged with a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 24 stewardship of etd collections. theses and dissertations have long been viewed as material which contribute overall to academic scholarly output, and carry unique status within the academy. in august 2016, opendoar registered 1600 institutional repositories with etds,40 up from 1,100 institutions as reported in 2012 by grey literature specialist schoepfel.41 academic libraries have, in large part, facilitated the transition from paper to etd with widespread adoption of institutional repository deposit programs, and along with that adoption comes a range of long-term preservation issues. yet as ohio state’s strategic digital initiatives working group pointed out, “even in digital library communities, preservation all too often stands in for or is used interchangeably with byte level backup of content.”42 for long-term access, focus can productively be shifted to offset the immediate threat of incompleteness and inadequate capture.43 not much has changed since hedstrom wrote back in 1997: “with few exceptions, digital library research has focussed on architectures and systems for information organization and retrieval, presentation and visualization, and administration of intellectual property rights … the critical role of digital libraries and archives in ensuring the future accessibility of information with enduring value has taken a back seat to enhancing access to current and actively used materials.”44 our understanding and discussion of digital preservation must be broadened, and attention turned to this key area of responsibility in the preservation life-cycle. 
the authors maintain that etd content and link preservation is an editorial, not individual, imperative. encouraging individual authors to perform their own archiving is doomed to fall short of even reasonable expectations. instituting measures such as perma, a distributed, redundant method of capturing and archiving web site content as part of the citation process must be pro-actively sought and built into library, and hence repository, workflows.45 browser plugins and automated solutions which use the memento protocol for capturing and archiving web site content as part of the citation process do exist,46 but naturally have to be implemented before they can take effect. either way, efforts to operationalize existing mechanisms which are designed to reduce future loss would be extremely productive. responsibility for insuring not only current, but continuing future access to etd content rests with those who maintain curatorial function of the repository. academic librarians have assumed a prominent and de facto role as curators, facilitating the role of university publication and emphasizing its break away from previous ties with commercial entities. we collectively bear greater responsibility for this body of scholarly work, and need to move forward from a position of benign neglect to one of informed curation and pro-active preservation of an important collection of scholarly output which is at risk. information technology and libraries | march 2017 25 references 1. thomas h. teper and beth kraemer, “long-term retention of electronic theses and dissertations,” college & research libraries 63, no. 1 (january 1, 2002), 64, https://doi.org/10.5860/crl.63.1.61. 2 the term “reference rot” was introduced by the hiberlink team. “hiberlink – about,” accessed march 31, 2016, http://hiberlink.org/about.html. 3. lockss: lots of copies keep stuff safe, accessed december 6, 2016, http://www.lockss.org/about/what-is-lockss/. 4. mark edward phillips, daniel gelaw alemneh, and brenda reyes ayala, “analysis of url references in etds: a case study at the university of north texas,” library management 35, no. 4/5 (june 3, 2014), 294, https://doi.org/10.1108/lm-08-2013-0073. 5. wallace koehler, “an analysis of web page and web site constancy and permanence,” journal of the american society for information science 50, no. 2 (january 1, 1999): 162–80, https://doi.org/10.1002/(sici)1097-4571(1999)50:2<162::aid-asi7>3.0.co;2-b. 6. wallace koehler, “web page change and persistence—a four-year longitudinal study,” journal of the american society for information science & technology 53, no. 2 (january 15, 2002): 162–71, http://doi.org/10.1002/asi.10018. 7. wallace koehler, "a longitudinal study of web pages continued: a consideration of document persistence." information research 9, no. 2 (2004): 9-2, http://www.informationr.net/ir/92/paper174.html. 8. fatih oguz and wallace koehler, “url decay at year 20: a research note,” journal of the association for information science and technology 67, no. 2 (february 1, 2016): 477–79, https://doi.org/10.1002/asi.23561. 9. mary f. casserly and james bird, “web citation availability: analysis and implications for scholarship,” college and research libraries 64, no. 4 (july 2003): 300–317, http://crl.acrl.org/content/64/4/300.full.pdf. 10. diomidis spinellis, “the decay and failures of web references,” communications of the acm 46, no. 1 (january 2003): 71–77, https://doi.org/10.1145/602421.602422. 11. 
carmine sellitto, “a study of missing web-cites in scholarly articles: towards an evaluation framework,” journal of information science 30, no. 6 (december 1, 2004): 484–95, https://doi.org/10.1177/0165551504047822. a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 26 12. matthew e. falagas, efthymia a. karveli, and vassiliki i. tritsaroli, “the risk of using the internet as reference resource: a comparative study,” international journal of medical informatics 77, no. 4 (april 2008): 280–86, https://doi.org/10.1016/j.ijmedinf.2007.07.001. 13. cassie wagner et al., “disappearing act: decay of uniform resource locators in health care management journals,” journal of the medical library association 97, no. 2 (april 2009): 122– 30, https://doi.org/10.3163/1536-5050.97.2.009. 14. robert sanderson, mark phillips, and herbert van de sompel, “analyzing the persistence of referenced web resources with memento,” arxiv:1105.3459 [cs], may 17, 2011, http://arxiv.org/abs/1105.3459. 15. jonathan zittrain, kendra albert, and lawrence lessig, “perma: scoping and addressing the problem of link and reference rot in legal citations,” legal information management 14, no. 2 (june 2014): 88–99, https://doi.org/10.1017/s1472669614000255. 16. “hiberlink about,” accessed march 31, 2016, http://hiberlink.org/about.html. 17. “hiberlink our research,” accessed march 31, 2016, http://hiberlink.org/research.html. 18. martin klein, herbert van de sompel, robert sanderson, harihar shankar, lyudmila balakireva, ke zhou, richard tobin. “scholarly context not found: one in five articles suffers from reference rot,” plos one 9, no. 12 (december 26, 2014), https://doi.org/10.1371/journal.pone.0115253. 19. shawn m. jones, herbert van de sompel, harihar shankar, martin klein, richard tobin, claire grover. “scholarly context adrift: three out of four uri references lead to changed content,” plos one 11, no. 12 (december 2, 2016): e0167475, https://doi.org/10.1371/journal.pone.0167475. 20. martin halbert, katherine skinner, and matt schultz, “preserving electronic theses and dissertations: findings of the lifecycle management for etds project,” text, (august 6, 2015), 2, http://educopia.org/presentations/preserving-electronic-theses-anddissertations-findings-lifecycle-management-etds. 21. for a recent overview, see sarah potvin and santi thompson, “an analysis of evolving metadata influences, standards, and practices in electronic theses and dissertations,” library resources & technical services 60, no. 2 (march 31, 2016): 99–114, https://doi.org/10.5860/lrts.60n2.99. 22. joy m. perrin, heidi m. winkler, and le yang, “digital preservation challenges with an etd collection — a case study at texas tech university,” the journal of academic librarianship 41, no. 1 (january 2015): 98–104, https://doi.org/10.1016/j.acalib.2014.11.002. 23. sanderson, phillips, and van de sompel, “analyzing the persistence of referenced web resources with memento,” http://arxiv.org/abs/1105.3459. information technology and libraries | march 2017 27 24. phillips, alemneh, and ayala, "analysis of url references," https://doi.org/10.1108/lm-082013-0073. 25. alfred s. sife and ronald bernard, “persistence and decay of web citations used in theses and dissertations available at the sokoine national agricultural library, tanzania,” international journal of education and development using information and communication technology 9, no. 
2 (2013): 85–94, http://eric.ed.gov/?id=ej1071354. 26. “etd2014 — university of leicester,” university of leicester, accessed january 27, 2016, http://www2.le.ac.uk/library/downloads/etd2014. 27. edina, university of edinburgh, “reference rot: threat and remedy,” (education, 04:54:38 utc), http://www.slideshare.net/edinadocumentationofficer/reference-rot-and-linkeddata-threat-and-remedy. 28. peter burnhill, muriel mewissen, and richard wincewicz, “reference rot in scholarly statement: threat and remedy,” insights the uksg journal 28, no. 2 (july 7, 2015): 55–61, https://doi.org/10.1629/uksg.237. 29. concordia university university graduate programs, accessed april 7, 2016, http://www.concordia.ca/academics/graduate.html. 30. klein et al., "scholarly context not found," https://doi.org/10.1371/journal.pone.0115253. 31. ke zhou, richard tobin, and claire grover, “extraction and analysis of referenced web links in large-scale scholarly articles,” in proceedings of the 14th acm/ieee-cs joint conference on digital libraries, jcdl ’14 (piscataway, nj, usa: ieee press, 2014), 451–452, http://dl.acm.org/citation.cfm?id=2740769.2740863. 32. pdftohtml v0.38 win32, meshko (mikhail kruk), http://pdftohtml.sourceforge.net/ accessed september 20, 2015. (actual download is at http://sourceforge.net/projects/pdftohtml/). 33. give me text! open knowledge international, accessed october 26, 2015-march 7, 2016, http://givemetext.okfnlabs.org/. 34. phillips, alemneh, and ayala, "analysis of url references," https://doi.org/10.1108/lm-082013-0073. 35. “in search of the perfect url validation regex,” accessed december 7, 2015, https://mathiasbynens.be/demo/url-regex. we selected "@gruber v2" for our extraction. 36. curl v7.45.0, "command line tool and library for transferring data with urls," accessed october 18, 2015, http://curl.haxx.se/. 37. we have used the term "memento" in lowercase to denote a snapshot souvenir page, to distinguish from an automated service utilizing the memento protocol. a case study of electronic theses and dissertations (etds) in an academic library | massicotte and botter | https://doi.org/10.6017/ital.v36i1.9598 28 38. for a good overview of the types of problems, see michael l. nelson, scott g. ainsworth, justin f. brunelle, mat kelly, hany salaheldeen and michele weigle, "assessing the quality of web archives" 1 vol., computer science presentations, book 8 (old dominion university. odu digital commons, 2014). http://digitalcommons.odu.edu/computerscience_presentations/8. 39. shawn m. jones, et al. “scholarly context adrift," https://doi.org/10.1371/journal.pone.0167475. 40. opendoar search of institutional repositories with theses at http://www.opendoar.org/find.php, accessed august 26, 2016. 41. joachim schöpfel, "adding value to electronic theses and dissertations in institutional repositories." d-lib magazine 19, no. 3 (2013): 1. https://doi.org/10.1045/march2013schopfel. 42. strategic digital initiatives working group. implementation of a modern digital library at the ohio state university. (apr 2014). https://library.osu.edu/documents/sdiwg/sdiwg_white_paper.pdf. (published). 43. tim gollins. “parsimonious preservation: preventing pointless processes! (the small simple steps that take digital preservation a long way forward),” in online information proceedings uk national archives, 2009. available at http://www.nationalarchives.gov.uk/documents/information-management/parsimoniouspreservation.pdf. 44. 
margaret hedstrom, "digital preservation: a time bomb for digital libraries." computers and the humanities 31, no. 3 (1997): 189-202. https://doi.org/10.1023/a:1000676723815. 45. zittrain, albert, and lessig, "perma," https://doi.org/10.1017/s1472669614000255. 46. herbert van de sompel, michael l. nelson, robert sanderson, lyudmila l. balakireva, scott ainsworth, and harihar shankar, “memento: time travel for the web,” arxiv:0911.1112 [cs], november 5, 2009, http://arxiv.org/abs/0911.1112. 240 i ournal of library automation vol. 7 i 3 september 197 4 book reviews case studies in lihm1·y computer systems, by richard phillips palmer. new york: r. r. bowker, 1973. 214p. $10.95. surely one of the most annoying and disappointing aspects of the literature of library automation is the complete lack of uniformity or standards for reports of individual accomplishments. thus one reads the continuing stream of reports of automated processes in individual libraries with only the remotest idea of which of the projects described are actually operating, which are in the process of being implemented, and which are merely proposals that still exist exclusively in the minds of their creators. in this volume, richard palmer has brought together a number of descriptions of operating systems, upon which he has imposed his own standards of presentation. in all, six circulation, eight serials, and six acquisitions systems are described; in each case the description is divided into six parts. first, in a section entitled "environment," the library, its collections, and its users are briefly described. some idea is provided of the library's total budget, or at least its materials budget, and unusual features of the library are given. next, the objectives of the automated system are stated, generally with some indication of what prompted automation to be considered and what features of the previous manual system were less than satisfactory. a section entitled "the computer" describes the hardware used in some detail (and this information is summarized in a table at the end of the book) , and the next section, "the system," gives a lengthy and detailed description of how the system works. the last section in each case is devoted to observations by palmer, indicating the significance to the library of the automated system, and often pointing out problems that have been noted. the least satisfactory section of the book is the final chapter, "summary and observations," in which palmer lays out the stated costs of each system in such a way that they may be directly compared, even though he knows the figures have been derived in various manners and are therefore not directly comparable. palmer's warning to the reader that "unit costs ... should not be compared without noting that they were not computed on a standard basis" makes even more mystifying his arrangement of those costs in tabular form. a second area that seems weak is the suggestion that the book constitutes an effective rebuttal to the criticisms of ellsworth mason. it seems unlikely that anything short of a very thorough systems analysis, showing all of the problems, alten1atives, costs, and benefits of both manual and machine systems, will satisfy mason. despite these very minor reservations, the book is well worthy of study. 
it presents, in nontechnical language, some of the most carefully and honestly described systems descriptions to be found in the literature, suggesting by example that many of the individual applications described in the journals, including lola, might well be better than they are. pete1· simmons school of librarianship university of british c@lumbla information systems, se1'vices and centers, by herman m. weisman. new york, n.y.: wiley-becker-hayes, 1972. 265p. $10.95. isbn: 0-471-92645-0. weisman states that his work "is not a text on automated information technology," and mechanization is pretty well dismissed in one page of critical discussion. ellsworth mason is singled out for his "amusing, facetious and bitter account of a [sic] melancholy experience at mechanization." use of automated services for information work is covered in less than a page. the work is supposed to be a university-level text and "reference source" on "the practices of information transfer and use" on the "retailer" level. it is almost entirely limited to industrial and government scientific and technical information services. libraries are defined in passing as a "specific type of information system . . . largely limited with some few exceptions to the passive repository function. . . ." however, "if the organization has a library, consultation with the librarian and use of his mechanisms for acquisition and purchase are advisable." it is also suggested that acquisitions are "recommended by the systems advisory committee . . . selected and purchased by the director of the information system and the documentation unit head," and the onorder file is maintained as a list (to be distributed monthly, perhaps) and in card form. the section on cataloging is equally instructive in advising that the acquisition process has provided "subject" as one of three elements needed for descriptive cataloging. the book swings dizzily back and forth from this lilliputian (or is it laputan?) perspective to the more olympian outlook suggested by a seventeen-page appendix which is the text of a charter for the united engineering information service with an expected annual budget of $1.2 million. it also seesaws from the uselessly general to the exquisite detail of an operations manual with hardly a pause for breath. we are told at the start of the chapter on "documentation practicesinformation services" that it "is more efficient to provide [information dissemination services] than to have individuals scurrying about searching for information." a summary of the "procedural flow" follows immediately: "1. ... all requests and inquiries no matter how received or to whom addressed are logged and assigned a control number. 2 .... the head of inquiry services is responsible for monitoring all requests and inquiries. . . . all incoming requests are entered on the inquiry form .... " most examples appear to be drawn from the author's experience as manager of information services, national bureau of standards. some are useful. strauss' scientific and technical libraries: their organization and administration was another wiley-becker-hayes volume issued during the same year. it is impossible to avoid imagining the publisher's marketing division people counting the respective memberships of the special libraries association and the american book reviews 241 society for information science as distinct markets for the two works. 
however, the first three-quarters of weisman's work is a duplication distinguished from strauss mainly by the shallowness of its coverage and the poverty of its prose. weisman's only notable contribution is thirty pages about information analysis centers, which might be worth a school reading assignment. the assignment will be at some risk, depending .on students' toleration for such words as "essentialness," "beneficialness," "collaborationists," and such phrases as "parameters of data points," as in "an indexed bibliography becomes a more useful document, since it can indicate to a user exactly the type of data contained as well as parameters of data points." "relevance," as weisman notes, "is not always synonymous with competence." justine roberts university of california san francisco k1wwing books and men; knowing computers, too, by jesse h. shera. littleton, colo.: libraries unlimited, 1973. 363p. $13.50. isbn: 0-87287-073-1. the only clumsy thing about this book is its pretentious title, which not only gives little indication of the book's contents but is discordant with the lucid and vigorous style of the writing. kbam;kc,t is a selection of writings and speeches by dr. shera, done between 1931 and 1972, all but one previously published. but only a few are reprinted unchanged; most have undergone revision to some significant extent, and one has been almost doubled in length in revision. even the oldest papers are not unduly "dated," and the author's reflections on the use and abuse of computers in libraries are as timely now as when first written. the twenty-nine papers published here are presented under six headings, each representing an area of librarianship in which dr. shera has been a major influence: philosophy of librarianship, library history, reference work in the library, documentation, the academic library, and library education. most of lola's readers, it is hazarded, will find 242 journal of library automation vol. 7/3 september 1974 the section on documentation of most interest. reviewing kbam;kc,t is no fit occasion for attempting to evaluate jesse shera's contributions to librarianship. he is established, and this selection from his writings contains many of his important and influential papers and others, inevitably, less weighty. throughout, however, they bear shera' s characteristic combination of clarity, intelligence, vision, and a forthrightness bordering on truculence, the mix spiced judiciously with attic salt. in a disarming preface shera suggests that the collection may be "more of an addition to library shelves than to library literature." be that as it may, many of the writings were originally published in somewhat obscure journals, and it is helpful to have them gathered in this convenient form. george piternick school of librarianship the university of british columbia a library management game: a report on a research project, by p. brophy, m. k. buckland, g. ford, a. hindle, and a. g. mackenzie; with an appendix by l. c. guy. (university of lancaster occasional papers, no.7) university of lancaster library, 1972. 90p. £ 1.00. isbn: 0901699-14-4. in the context of the need for greater managerial expertise in libraries, the state of managerial education in library schools, and the place of games in this education, the authors describe in this document the development of a simplified probablistic model of a loan and duplication system. 
while it is perhaps the novelty of concept exhibited by the game which first attracts attention, closer examination reveals that the game is but the vehicle upon which is carried a far-ranging analysis of the state of library management. a dynamic model utilizes three input variables-loan period, titles bought, and duplicates purchased-and three output measures-satisfaction level, document exposure, and collection bias-of the effective manipulation of the former within the constraint of budget to illustrate complex interactions within a library system. sufficient flexibility (e.g., variation of loan periods according to popularity of volumes and/ or status of user) enables different policies to be selected to effect the stated objectives of the player (library "manager"). comparison of selected outputs illustrates that while choosing and implementing policies may be simple (a "game editor" interprets a player's decisions to the computer), judging their merits is not. policy (l) decreases collection bias at the expense of average document exposure per issue, while policy ( q) has the opposite effect, for similar costs and total issues; policy (t) increases satisfaction level and decreases collection bias in comparison with policy ( q), at a cost of 8,000 units of expenditure. evaluation of the policy decision rests on a value judgment (as in the real world) . although description of the game and probabilities upon which it is based occupies a considerable portion of the volume, the authors considered not only the practicability of such a game but also its usefulness in teaching and cost of utilization. an appendix devoted to an in-depth study of education for library management concludes that: in britain and, to a lesser extent, the united states this aspect of library education needs considerable strengthening; games such as that described are most suited to specialized courses for experienced librarians but there is a place for similar ones in firstlevel courses; and a larger proportion of the profession needs to comprehend the concepts put forward in this and other studies before better management techniques will be applied to libraries. this volume is an important contribution to the literature of library management, illustrating that the effect that computers can have on the practice of librarianship goes far beyond the mere substitution of machines for clerical workers. george ]. snowball sir george williams unive1·sity library montreal, canada lib-s-mocs-kmc364-20141005045627 257 technical communications announcements isad institute on bibliographic networking information science and automation division (is ad) of the american library association will hold an institute in new orleans on february 28-march 1, 1974 at the monteleone hotel in the french quarter. the subject of the institute will be "alternatives in bibliographic networking, or how to use automation without doing it yourself." the seminar will review the options available in cooperative cataloging and library networks, provide a framework for identifying problems and selecting alternative cataloging systems on a functional basis, and suggest evaluation strategies and decision models to aid in making choices among alternative bibliographic networking systems. the institute is designed to assist the participant in solving problems and in selecting the best system for a library. 
methods of cost analysis and evaluation of alternative systems will be presented and special attention will be given to comparing on-line systems with microfiche-based systems. the speakers and panelists are recognized authorities in bibliographic networking and automated cataloging systems and will include: james rizzolo, new york public library; maryann duggan, slice; jean l. connor, new york state library; maurice freedman, hennepin county library, minneapolis; brett butler, information design, inc.; and michael malinconico, new york public library. the cost will be $60 for ala members and $75 for nonmembers. for hotel reserv~tion information and a registration blank, write to donald p. hammer· isad; american library association; 50 e. huron st.; chicago, il 60611. p.s. mardi gras is february 26! isad forms committee on technical standariu for library automation (tesla) the information science and automation division of the american library association now has a committee on technical standards for library automation (tesla). tesla, recently formed with the approval of the isad board of directors will act primarily as: a clearinghouse fo; technical standards relating to library automation; a focal point for information relating to automation standards; and a coordinator of standards proposals with appropriate organizations, e.g., the american national standards institute the electronic industries association, n~tional association of state information systems. the committee's initial work will be to formulate areas and priorities in which standards are required, to document existing standards sources, and to develop a "library" of applicable standards to be drawn upon by the membership of ala. according to the new committee's chairman, john kountz, california state universities and colleges, "it is auspicious that this time be selected for the implementation of a standards committee for library automation. with the current introduction en masse of production library automation systems and the fading of research and development activities, such standards will come into good use as they may be developed for library automation. in addition, the close linkage with new developments such as the information industries association and the availability of standardized data bases, hardware, and communication standards are becoming requirements. the standards which shall be emphasized in the committee activities are those relating to areas of interestfor administrators and automators 258 journal of library automation vol. 6/ 4 december 1973 alike. these standards are intended to fill the void for future library automation operations." the committee efforts should be measured in terms of facilitating the automation of library functions as required on an individual library basis. information relating to the standards committee activities and its scope, or general information relating to library information technical standards, should be addressed to: ala/ isad committee on technical standards for library automation, john kountz, chairman, 5670 wilshire blvd., suite 900, los angeles, ca 90036. formation of an ad hoc disetl$sion group on serials data bases as a result of an informal meeting held during the ala conference in las vegas to discuss the problems associated with the establishment and maintenance of union lists of serials, an ad hoc discussion group on serials data bases was formed, with richard anable acting as interim coordinator. 
the council on library resources agreed to fund a meeting of the group's steering committee on september 21, 1973 at york university in toronto, canada. many of the major union list activities on this continent will be represented as well as the national libraries and isds national centers from both canada and the united states. a list of the subgroups that have been formed gives a good idea of the individual problem areas which the group is tackling: a. record format comparison b. minimum record data element requirements c. cooperative conversion arrangements d. organizational relationships and grant support e. holding statement notation f. bibliographic standards g. authority files h. software evaluation and exchange a detailed description of the history and activities of the discussion group can be found on page 207 of this issue. for further information contact: richard anable, york university, downsview, ontario, canada, m3j 2r2, (416) 667-3789. technical exchanges file conversion using optical scanning: a comparison of the systems employed by the university of minnesota and the university of california, berkeley by this time most large libraries in the u.s. have converted into machine-readable form at least some of their files. most of them, however, have used relatively inefficient techniques (such as key-punching) or relatively expensive ones (such as on-line data entry). it was with pleasure, then, that i read ms. grosch's recent article ("computer-based subject authority files at the university of minnesota libraries," journal of library automation, dec. 1972) describing a conversion technique that she, like the library at the university of california at berkeley, has found to be extremely cost effective, namely optical character recognition using a cdc 915 scanner. berkeley has used (and still is using) this technique in its efforts to create what will soon be among the largest machine-readable serials files of any university in the world. that file currently contains records for over 50,000 serials (in the marc structure). it is expected to contain records for about 90,000 unique titles (approximately 30 million characters) before the end of the current fiscal year. based on our experience in this undertaking, i would like to offer the following comments on the use of the cdc 915 scanner as it is used in minneapolis and in berkeley. costs-it should be crystal clear that the main reason for using the scanner is the cost of the keyboarding device. that is, the keyboarding device for the cdc 915 scanner is an ordinary ten pitch selectric typewriter which can be purchased for under $500.00 or rented for from $11.00 to $30.00 per month. when not used as a computer input device the machine functions as a normal office typewriter. a device like an mt/st that rents for about $110.00 a month costs about $.60 an hour for every hour it is used, or ten times as much. keyboard operators for a typewriter are easily obtained since there is no need to train an operator in the idiosyncrasies of keypunch cards, crt terminals, magnetic or paper tape devices, etc. keyboarding is fast and easy, especially when compared to a key punch. mistakes are easily corrected by, for example, merely crossing out the character(s) in error. keyboarding on a selectric for a scanner and keyboarding on a device like the mt/st both require a "converter" (the scanner itself or the mt/st-to-computer-tape converter).
these "converters" are equally available and the decision to use one keyboarding device over another should not hinge on the "availability" of such "converters," as is usually the case. in addition to selecting a cost-effective keyboarding device, minnesota has also operated a system that delivers the data to the keyboarding device in an efficient manner: the typing is done from the source document itself, rather than from a copy of that document that has been transcribed onto a "coding sheet" or a photocopy of that document. ms. grosch points out that photocopying the source document would have raised the project costs by about 50 percent. in addition, keyboarding from photocopied documents would probably have been much slower and less accurate. the berkeley typists also keyboard from the original document, even when that document is a public catalog card that must be temporarily marked up in order to resolve ambiguities for the typists. supplies-it is true that the ordinary selectric typewriter (without the pin-feed platen) performs satisfactorily. thus, one does not need continuous forms for the typewriter. indeed, it is not necessary even to use a "stock form"; plain 20 pound white long grain paper will do. we use zellerbach's hammermill bond 820, which costs $2 a ream. at minnesota, using this paper instead of the "stock form" would probably have reduced the supplies cost from $400 to less than $25. had a keypunch been used, the operation would probably have required about $150 worth of ibm cards. scanner throughput-careful design of the format of the data on the typed sheet can substantially improve throughput on the cdc 915 scanner. with double spaced typing (three lines per inch), the cdc scanner is capable of reading data at the rate of over a half million characters an hour, or about twice as fast as was actually achieved at minnesota. thus, with altered design of the input format, about half of the cost of the "converter" (the scanner) could have been saved, representing an additional savings of $500. the principle applied to maximize throughput on a scanner such as the cdc 915 is to enter as much data as possible on a line and as many lines as possible on a page without crowding the data so much as to cause the machine to misread. (the machine enforces stricter tolerances as its capabilities are pushed to their limits.) one wants to get as much as possible on a line for the same reason that one wants to get as much as possible onto a punched card: there is a fair amount of machine overhead involved in advancing to the next line and/or page. the berkeley system uses a sheet of paper that is 8½ x 14 inches in size, and the typists type each line a full 6¾ inches long. typing is double-spaced (even though the machine is capable of handling single-spaced typing) because this increases the vertical skew tolerance from ½ of a character height to a full character height. figure 1 is an example of a page typed at berkeley. at berkeley, more than one field may be placed on a line, each field being separated by the "fork" character. like minnesota, typists identify each field by a one-character code at the beginning of the field (a for author, t for title, h for holdings, c for call number, b for branch library location, etc.). typists are instructed to type until the margin locks. the beginning of each logical record is identified by the "chair" character plus the typist's initials at the beginning of the line.
thus the entire line is utilized, and the machine is not required to read a large number of blank spaces at the beginning of the line (which, as ms. grosch points out, it has trouble doing since it cannot readily tell whether six blanks may, in fact, be really five or seven blanks). fig. 1, berkeley optical scanner input, reproduces sample typed lines in this format; a representative line reads: nssya=faoytproceedings of a symposium on man made forests and their industrial importance, canberra, 1967yh1-3, 1967//ycsd118.fs. we generally do not proofread the sheets after they are typed. we have found that when proofreading is necessary (usually during training), it is not difficult to proofread data typed in the format that we use. data element identification-at berkeley, as at minnesota, the typist identifies the data element (e.g., the author or the title) rather than relying on a computer algorithm of the kind used by the library of congress or the institute of library research (automatic format recognition). this approach was selected because it was felt (a) that the typist could perform this task better than the computer could, and (b) that the routine nature of the typing job necessitated the insertion of more meaningful tasks for the typists. the data presented to the typists for interpretation can be in a wide variety of languages and may be transcribed on the source document according to any one of the conventions used by the library during the past several decades. typing throughput-the berkeley conversion system includes the use of certain "super abbreviations" that typists may use in place of commonly occurring words or phrases. all such abbreviations are two or three characters in length and are preceded by an equal sign. for example, "=fao" is translated into "food and agriculture organization of the united nations" by the computer software. although this substantially improves keyboarding throughput, its chief advantage is the insurance that the long phrase is entered into the file correctly and consistently. i personally find the requirement that the typist at minnesota type the "format recognition line" at the top of each sheet in order to avoid the necessity of a "complete rerunning of the job" to be not only wasteful, but playing brinkmanship with systems design. expanding the character set-although the cdc 915 scanner is capable of reading only the ocr a font (an all upper case font), it is relatively simple to produce upper-and-lower case output from data input via the cdc 915. two alternatives are: 1. have the typist key a special character that means "next character is to be capitalized" before each upper case character (the technique used by typists throughout the western world, in the form of the shift key). if, for the cdc scanner, the dollar sign were chosen to be that special character, then "$john" would represent "John" while "john" alone would represent "john." this technique can be used to expand the keyboard to include diacritical marks. a berkeley typist keys "espan%eol" to produce "español," since the computer translates %e into a tilde over the preceding character. 2. do all capitalization by logic contained within the software. a primitive computer algorithm might simply say "capitalize the first word of every sentence plus the following proper nouns . . . ." the berkeley library currently uses such a technique for the capitalization of words in serial entries. this has been done in order to print out the serial entries following standard rules of style, rather than the traditional rules of librarianship, namely that every significant word in the title is capitalized. (did the library practice arise because early typewriters had shift keys that were hard to use?) our computer algorithm says essentially "capitalize all words in the entry except the following insignificant ones . . . ." this technique has created an upper-lower case file without having typists use the shift key, or its equivalent, at least a half million times. figure 2, a page from berkeley's serials key word index, illustrates the results of this system. the real problem-i do not mean to imply that everything is rosy in file conversion land. a file conversion is a messy, difficult and essentially unproductive task, no matter how well done, because it merely transforms existing data into another form and in so doing exposes, for all to see, the "many ancient errors" which we do not want to see. it also exposes the "ambiguities" that were perhaps better left ambiguous, not to mention the inconsistencies that have cropped up as library practices varied. i would suggest that any file conversion that works from files that have been built up over some time period requires more in the way of resources for the "cleansing" than for the conversion. that is, in the case of the subject authority files at minnesota, i would guess that far more than $5,296.21 (the total amount spent on typists, keyboards, computers, supplies, etc.) was spent resolving ambiguities (before the drawer was handed to the typist) and "cleansing" the data in the one year between the time when the data had been converted and the time that they were put to use. this has been our experience at berkeley. stephen silberstein university of california, berkeley
6/4 december 1973 reports-library projects and activities bucknell university plans entire bibliographic file to go on-line bucknell university's already strong computer-usage program is expected to be strengthened in 1973/74 to permit stu~ dents and faculty to conduct fast, accurate searches of the university library from any of thirty-five campus terminals. a $28,000 grant to the bucknell university library from the council on library resources is supporting this program. seventy-five percent of bucknell's students already use the campus computer in course work. and bucknell's on-line library data base includes records of ap-proximately 25,000 of the library's 200,000 books. the council grant will enable additional computer storage to be rented to permit the entire bibliographic .file at bucknell to go on~line. the complete flle is already in machine-readable form. while bucknell's current system enables a search of the on-line files by authortitle, title alone, and library of congress (lc) number, its enlarged plan calls for subject search capability as well. using lc classification numbers, a user will be able to ask the computer to locate and display the authors and titles associated with the subject of interest, examine the near neighbors of his original hit in the file, or he may pick an author's name from the response and enter the system again on the author's name to see what else the author may have written. stanford university data file directory the stanford university data file di~ rectory, compiled by douglas ferguson, is available as an example of a libraryproduced access publication for computer~ ized data files on a university campus. the directory lists and describes colle~ tions of social, economic, political, and scientific research data on punched cards, computer tape, and disk, located on the stanford campus. each file description directs the user to documentation and published research in the university library collection or elsewhere. access to each data file is controlled by the owner and is listed in each file description. the di~ rectory is available, for prepayment of $4, from the financial office, stanford university libraries, stanford, ca 94304. standards editor note: the recent flurry of activity concerning standards which affect l~ brary automation, dilta bases, etc., is pointed up in the several actions reported in the last issues of tc. perhaps the futility of keeping up with standards and the need for a clearinghouse type of operation is best recognized by noting a sample of some recently adopted stan.. dilrds which now have or will potentially have ramifications in library automation. the following list does not represent a complete accounting of all pertinent standards due to lack of a comprehensive source. selected ansi standards many ansi standards published in the ansi categories of "information processing systems" and "information systems" may be of interest to isad members. selected items are listed below. the new american national standards institute (ansi) catalog is available free of charge from the institute's sales department at 1430 broadway, new york, ny 10018. the catalog lists " iso standards" and "iso recommendations" as well. 
x3.14 recorded magnetic tape for information interchange (200 cpi, nrzi) (revision of ansi x3.14-i969)--provides the standard technique for recording american nation~ al standard code for information interchange (ascii), x3.4-l968, on magnetic tape at 200 characters per inch ( cpi) using nonretum-to-zero-change on ones (nrzi) recording techniques. approval date: december 12, 1972. x3.38 computer code for states-x3.38-i912 provides two-digit numeric codes and two-character alpha------~------------------betic abbreviations for both the states and the district of columbia. the numeric codes will allow the states and the district of columbia to be sorted into alphabetic sequence. ansi x3.38-1972 may be obtained from the american national standards institute at $1.25 per copy. it was developed under the secretariat of the business equipment manufacturers association. x3.31 structure for the identification of the counties of the united states for information interchange (new standard )-identifies a three-digit numeric code structure for the counties of the states of the united states, including the district of columbia. supersedes the listing which appeared in the march 26, 1971 issue of standards action. approval date: march 14, 1973. x3.39 recorded magnetic tape for information interchange (1600 cpi, phase encoded) (new standard)-presents the standard technique for recording the coded character set provided in american national standard code for information interchange, x3.4-1968 (ascii) on magnetic tape at 1600 characters per inch (cpi) using phase recording techniques. approval date: march 7, 1973. x3.40 unrecorded magnetic tape for information interchange (9-track 200 and 800 cpi, nrzi, and 1600 cpi, pe) (new standard)-presents the minimum requirements for the physical and magnetic interchangeability requirements of ~-inch wide magnetic tape and reels between information processing systems, communication systems, and associated equipment using american national standard code for information interchange, x3.4-1968 (ascii). approval date: march 5, 1973. bsr x3.41 code extension techniques for use with the 7-bit coded character set for ascii (ansi x3.4-1968) (new ,proposed standard)-provides means for augmenting the standard repertory of 128 technical communications 263 characters of american national standard code for information interchange, x3.41968 (ascii), with additional graphics or control functions, by extending the 7-bit code while remaining in a 7-bit environment, or increasing to an 8-bit environment in which ascii is a subset. order from: business equipment manufacturers association; 1828 l st., nw; washington, dc 20036. single copy price: free. bsr x3.47 identification of named populated places and related entities of the states of the united states, structure for the (new proposed standard)-provides the structure for an unambiguous, five digit code for named populated cities, towns, villages, and similar communities and for several categories of named entities similar to these in one or more important respects. order from: business equipment manufacturers association; 1828 l st., nw, washington, dc 20036. single copy price: free. 
bsr x11 .6 operational data processing applications containing constitutionally protected data, documentation requirements for (new proposed standard)-provides all those involved with operating electronic data processing applications, involving constitutionally protected data, with a list of minimum documentary requirements which apply to such applications. order from: society of c ertified data processors, 38 main st., hudson, ma 01749. single copy price: $2.00. bsr xll.l categories of errorcreating characteristics of various data storage systems used with electronic data processing applications (new proposed st andard)provides the consumers of electronic data processing applications and the suppliers and implementors of such applications with a technique for defining the error-generating capabilities that exist in the data st orage system used to hold the consumer data. it is one of a series of data storage stan264 journal of library automation vol. 6/4 december 1973 dards being prepared by the society of certified data processors technical standards committee, to provide a method whereby the application implementor ·and the application consumer may communicate easily, allowing the application consumer to take the responsibility for the accuracy of the maintenance of the data base by electronic data processing systems. . order from: society of certilled data processors, attn: chairman, technical standards committee, 38 main st., hudson. ma 01749. single copy price: $2.00. bsr x11.2 data items stored in general data bases, classification . of (new proposed standard)-provides the suppliers of data to a general data base \vith a means of communication with the operation of the base regarding the characteristics of the data items being supplied. order from: society of certified data p~o~ssors, attn: chainnan, technical standards committee, 38 main st., hudson,.ma01749. single copy price: $2.00. bsr xl1.3 data base processing activities based on data items used, categories of (new proposed standard)-provides the application designers of data base applications and the operators of several data bases with a means of describing the characteristics of the data items stored in the data base. order from: society of certified data processors, attn: chairman, technical standards committee, 38 main st., hudson, ma 01749. single copy price: $2.00. bsr x2.3.4-1959 charting paperwork procedures , metho d of -this standard was one of the original input docume nts considered in the developm.ent of american national standard fi.owchart symbols and their usage in information processing, x3.5-1970 (originally ansi x3.5-1966). however, ansi x2.3.4-1959 was not considered sufficiently useful to serve the needs of the community which now uses ansi x3.5, nor at that time did x3 have responsibility for ansi x2.3.4 or feel that it should initiate action to modify the older standard. the subject standard was subsequently as~ signed to american national standards committee x3 for review and revision, reaffirmation or withdrawal. current review finds no interest in this standard, either in the form of users of the standard or of an organization desiring to assume its maintenance. order from: american national standards institute, dept. bsr, 1430 broadway, new york, ny 10018. single copy price: $1.00. sc/ 20 standard serial coding-the american national standard identi.scation number· for serial publications, .z39.9-1971 is available from ansi at $2.25 per copy. 
in june 1970, iso/ tc 46/wg 1· accepted the system as outlined in z39.9-1971 -as the basis for the international standard numbering system. a final issn standard was presented to the plenary session on tg 46 in october 1972 at the hague. the international center (ic) of the international serials data system (isds) is responsible for the administration of the issn as a central authority. the ic-isds was established with headquarters in the bibliotheque nationale with financial support being shared by the french government and unesco. the national serials data program (nsdp) has been selected to serve as the united states national center and as such is the sole agency responsible for the control and assignment of issn in the u.s. (note-the ansi !stab (information systems technical advisory board) rejected the proposed ansi z219.1-1971 , use of coden for periodical title abbreviations. this proposal had been submitted to ansi by the american society for testing and materials in 1971 for approval as an american national standard; z39 members were asked to comment on it during the public review in july and august 1971. after considerable discussion the 1st ab came to the conclusion that the proposed standard was in conflict with 239.9-1971, the. ansi identification number for serial publications.) sc/ 2 machine input records -the members of sc/ 2 have agreed that this standard cannot be written at this time. the purpose of the proposed standard was for general information interchange at the interface between data processing terminal equipment {such as data processors, data media input/ output devices, office machines, etc.) and data communications equipment (such as data sets, modems, etc.) . the decision was based on the fact that the problem of designing a format is not being addressed here (that standard already exists, namely z39.2-1971) but rather the problem of ne twork protocol. therefore, the transmission of the bibliographic record itself, taken in this context, is only a small part of the total picture. subcommittee 2 has concluded, however, that in the light of future developments in network protocol, bibliographic data should be transmitted in the z39.21971 interchange format standard. in order to further this recommendation, the present z39.2-1971, the american national standard for bibliographic information interchange on magnetic tape, will be revised by sci 2 to reflect a broader scope, i.e., information interchange in digital form, with appropriate sections in the document describing the existing standards for different media (the first of these would be magnetic tape since this standard already exists). this should have the effect of using the standard format in future systems via telecommunications as well as via magnetic tape. the additional sections discussing various media will aid the user of the format regardless of the media involved. input to the editor: say it isn't so. tell me that, as editor of technical communications, you are not responsible for the item on page 65 of val. 6, no. 1. i refer to the squib headed "tomorrow's library: spools of tape." i am particularly offended to see this kind of outdated foolishne ss promoted after noting two pages earlier that the new directions for technical communications will involve pertinent information about technical developments. 
how could a technical communications 265 publication entitled college management possibly contribute technically significant information about such a specialized and sophisticated area as library automation? in general, i think blue sky articles are inappropriate for tc. carl m. spaulding council on library resources the new format and content of technical communications is expected to evolve, and thf18 no step function change was anticipated. in the meantime, while operating on an accelerated publication schedule i have attempted to find pertinent (if not completely appropriate) articles for tc. i would like to see more contributions of hardcore technical communications from the field, but until people accept the new design for tc and contribute to it, the selections wiu be scarce. incidentally, i have received some comment to the contrary, that perhaps a " bltte sky?" category of news notes in tc would serve the useful purpose af providing another perspective, or putting "far out" items into context. certainly, contributions of the type submitted by stephen silberstein in this issue and justine roberts in the last issue of tc represent the directions envisaged for tc's content. in most technical fields there's a place for the proposed tc type of forum, and r m confident library automation and technology have a similar need. i would appreciate more readers' comments, and more importantly, brief write-ups of the technical aspects of your accomplishments ami. findings which would be of interest to isad members.-dlb potpourri unisist international serials data system the international serials data system { isds ) as establi~hed within the framework of the unisist program, is an international network of operational centers, jointly responsible for the creation and maintenance of computer-based data banks. 266 journal of library automation vol. 6/ 4 december 1973 the objectives of the isds system are: a. to develop and maintain an international register of serial publications containing all the necessary information for the identification of the serials. b. to define and promote the use of a standard code (issn) for the unique identification of each serial. c. to facilitate retrieval of scientific and technical information in serials. d. to make this information currently available to all countries, organizations, or individual users. e. to establish a network of communications between libraries, secondary information services, publishers of serial literature, and international organizations . f. to promote international standards for bibliographic description, communication formats, and information exchange in the area of serial publications. the isds is designed as a two-tier system consisting of: an international centre (ic) national and regional centres the isds-international centre is established in paris by agreement between unesco and the french government. it is temporarily located at the bibliotheque nationale. the isds-ic will establish an international file of serials from all countries. this file will be limited, initially, to scientific and technical publications, and will be gradually extended to include all disciplines. each serial will receive an international standard serial number (issn), which has been developed by the international organization for standardization (iso). products which could be derived from the international serials data system are as follows: titles index; issn index; isds register of . 
periodicals (register); classified titles index ( cti); new and amended titles index (n & at); cumulated new titles (cnt); permuted index; microform reference file (mrf). a magnetic tape service will be provided of the current master file, and of the new and amended titles. the responsibility for the establishment of national or regional centres belongs to unesco member states, and associate members who wish to participate in the unisist program. upon establishment each national centre will obtain a block of issns from the international centre and will gradually take over the responsibility for the registration of serials published in its territory. a regular information exchange program will be established between the national centers and the international center. the international register will thus be a regularly updated cumulation of the initial file established by the ic and the national or regional files. serials published in countries with no national or regional centres will be registered by the international centre, which will endeavor to obtain the necessary information. the relationship with users of isds is primarily through national or regional centres, but this general rule does not exclude direct contact with the international centre. the building of a consistent international file of serials implies close cooperation between all members of isds. the work in all countries will be based on a common set of rules concerning: bibliographic description, communication format, character sets, abbreviations, transliteration, etc. coordination between all members of the system is one of the main tasks of the international centre. close cooperation has also been established with various international organizations, the objectives of which are closely related to those of isds. in november 1972 the director-general of unesco informed member states of the creation of the international centre and has invited them to cooperate in isds by establishing national or regional centers. to assist in the creation of these national or regional centers provisional guideunes were made available. these guidelines are at present being finalized and will shortly be widely distributed in english, french, spanish, and russian. the response of member states was most encouraging and to date the following countries have set up or are in the process of setting up national or regional centers: argentina, australia, austria, canada, colombia, dahomey, france, federal republic of germany, guatemala, india, italy, malta, new zealand, nigeria, philippines, union of soviet socialist republics, united kingdom, and united states of america. for further information and issn assignment contact is os-intemational centre, bibliotheque nationale, 58 rue de richelieu, paris 2eme, france. adl to condtjct study of the data base publishing industry arthur d. little, inc., the cambridge, massachusetts, consulting firm, is launching a major study of the data base publishing industry. the study, which will be available on a subscription basis, will cover present and future technology utilization, economics, markets, and business and competitive structure. 
more specifically, the study will: • characterize typical data base publishing activities in terms of markets, products, sales strategies, methods of data base collection, distribution, etc.; • identify the current and expected roles of private industry sectors, government, and professional associations; • analyze existing and latent markets for data base publishing ventures and estimate market growth over the next five years; • describe criteria for analyzing the economics of data base publishing services and pricing them; • review hardware, software, and developments likely to affect the industry in the next five years, including emergence of lower-cost switched data networks; • describe the probable impacts of technical communications 267 public policy and regulatory developments, including copyright legislation, patentability of software, and concern over protection of confidentiality of personal information, and • characterize the reasons for past failures of certain data base publishing ventures and propose strategies for successful involvement. the study will be directed by vincent giuliano and robert kvaal. dr. giuliano has extensive experience working with major information dissemination systems, ranging from libraries to telecommunications-based computer systems. he has led a variety of systems development, systems analysis, evaluation, and market research projects at adl. mr. kvaal has focused his recent work on strategic planning issues facing computer services companies, and on assisting computer users in financial institutions, retail and distribution companies. this work has included operational and management audits, planning and implementation assistance, management information systems development, and the overall design of a nationwide teleprocessing system. according to giuliano and kvaal, data base publishing enterprises tend to evolve through well-defined stages of automation and business development: maintenance of manual data bases (reports, clippings, etc.) and the manual preparation of conventional printed products; partial computerization of the data base and some computer usage in preparation of conventional printed products; considerable automation of the data base and output process; offering of information retrieval and specialized search services on an overnight or phone call basis; and offering direct access to the data base via remote computer terminals. "but," giuliano and kvaal note, "the growing tendency of data base enterprises to evolve along this scale is creating dislocations in many of them, while at the same time, offering new opportunities for participants and suppliers. this uncertainty makes a study such as adl's especially useful at this point in the industry's development." 268 journal of library automation vol. 6/ 4 december 1973 the results of adl's study will be presented to clients in published form and in group meetings held in appropriate locations. the cost to each subscriber is $2,000. additional information may be obtained from philip a. untersee (617864-5770). pertinent publications new 1973 acm publication catalog the new expanded thirty-four page publication catalog of the association for computing machinery has been released. the catalog covers technical publications in over thirty major segments of the computing and automation field. copies are available upon request by writing to: publication services department, association for computing machinery, 1133 avenue of the americas, new york, ny 10036. 
proceedings of 1973 national computer conference the proceedings of the 1973 national computer conference & exposition are now available from the american federation of information processing societies, inc. (afips). the conference proceedings, volume 42, contains more than 160 technical papers and abstracts covering a wide range of topics in computer science & technology and methods & applications featured at the recent '73 ncc, june 4-8 in new york. the price of the 920 page hard-cover volume is $40. a reduced rate of $20 is available for prepaid orders from members of the afips' constituent societies stating their affiliation and membership number. copies of the proceedings may be ordered from afips press, 210 summit ave., montvale, nj 07645. computerized serials systems the larc association announces a new publication series entitled computerized serials systems. each volume in the series will consist of six issues published · at bimonthly intervals in both paperback and hardbound editions. each issue will be authored and edited by a person directly affiliated with the project reported, and each issue will be devoted to papers relating to an automated serials project undertaken by a specific library. the format of the new series is designed to promote understanding through clear narrative description and extensive illustrative materials. for details concerning the purchase of individual issues or a subscription to the complete volume contact larc press, 105-117 west fourth ave., peoria, il 61602. -the binary vector as the basis of an inverted index file donald r. king: rutgers university, new brunswick, new jersey. 307 the inverted index file is a frequently used file structure for the storage of indexing information in a document retrieval system. this paper describes a novel method for the computer storage of such an index. the method not only offers the possibility of reducing storage requirements fot an index but also affords more mpid processing of query statements expressed in boolean logic. introduction the inverted index file is a frequently used file structure for the storage of indexing information in document retrieval systems. an inverted index file may be used by itself or with a direct file in a so-called combined file system. the inverted index file contains a logical record for each of the subject headings or index terms which may be used to describe documents in the system. within each logical record there is a list of pointers to those documents which have been indexed by the subject heading in question. the individual pointers are usually in the form of document numbers stored in fixed-length digital form. obviously, the length of the lists will vary from record to record. the purpose of this paper is the presentation of a new technique for the storage of the lists of pointers to documents. it will be shown that this technique not only reduces storage requirements, but that in many cases the time required to search the index is reduced. the technique is useful in systems which use boolean searches. the relative merits of boolean and weighted term searches are beyond the scope of this paper, as are the relative merits of the various possible file structures. the binary vector as a storage device the exact form of each document pointer is immaterial to the user of a document retrieval system as long as he is able to obtain the document he desires. the standard form for these pointers in most automated systems is a document number. 
note that each pointer is by itself a piece of information. however, if one thinks of a "peek-a-boo" system, the document 308 journal of library automation vol. 7/4 december 1974 pointer becomes simply a hole punched in a card. in this case the position of the pointer, not the pointer itself, conveys the information. the new technique presented in this paper is an extension of the "peeka-boo" concept. a vector or string of binary zeroes is constructed equal in length to the number of documents expected in the system. the position of each vector element corresponds to a document number. that is, the first position in a vector corresponds to document number one and the tenth vector position corresponds to document number ten. a vector is constructed for each subject heading in the system. as a document enters the system, ones are inserted in place of the zeroes in the positions corresponding to the new document number in the vectors for the subject headings used to describe the document. as an example, assume the following document descriptions are presented to a system using binary vectors: document number 1 2 3 subject headings a,b,d c,e a,c the binary vectors for terms a, b, c, d, and e before the insertion of the indexing data would be as follows: subject heading a b c d e vector 000 ... 0 000 ... 0 000 ... 0 000 ... 0 ooo ... ·o after the insertion of the indexing information, the same vectors would appear as follows: subject heading a b c d e vector 101 ... 0 100 ... 0 011 ... 0 100 ... 0 010 ... 0 the binary vector seems to have several advantages over the standard form of storage of document numbers in an inverted file. first, the records are of fixed length since the vectors are all equal in length to the expected number of documents in the system. space may be left at the end of each vector for the addition of new documents. periodic copying of the file may be used to expand the index records with additional zeroes added at the end of each record during the process. consequently, unless binary vector/king 309 there are limitations of size imposed by the equipment, only one access to the storage device will be needed to retrieve the index record for a term. the second advantage offered by the binary vector method appears in the search process. most modern computers have a built-in capability of performing boolean logical manipulations on binary digit vectors or strings. thus, when boolean operations are specified as part of a query, the implementation of the operations within the· computer is considerably easier and faster for binary vectors than for the standard form of inverted files. other investigators of the use of the binary digit patterns or vectors have not fully explored its advantages and disadvantages. bloom suggests, without an explanation or evaluation, the use of bit patterns as the storage technique for inverted files in large data bases in the area of management information systems.1 davis and lin, again in the area of management information systems, propose bit patterns as the means of locating pertinent records in a master file. 2 they do not compare the method with other possible techniques. sammon discusses briefly the use of binary vectors as a storage technique, but dismisses it on the basis that the two-valued approach obviates the possible assignment of weights to index terms in describing documents. 
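for readers who want to see the mechanics, the following short sketch builds the binary vectors for the three-document example above and evaluates a boolean query with a single and operation. python is used purely for illustration (the model systems described later in the paper were written in pl/1 for an ibm 360/67), and the vector capacity and helper names are assumptions made for the example.

# illustrative sketch of the binary vector idea, using python integers as bit
# strings; this is not the author's implementation. bit position i (counting
# from the left) corresponds to document number i, as in the paper.

CAPACITY = 16  # assumed number of document slots reserved per vector

def set_posting(vector: int, doc_number: int) -> int:
    """turn on the bit whose position corresponds to doc_number (1-based)."""
    return vector | (1 << (CAPACITY - doc_number))

def show(vector: int) -> str:
    return format(vector, f"0{CAPACITY}b")

# the example from the text: doc 1 -> a, b, d; doc 2 -> c, e; doc 3 -> a, c
index = {term: 0 for term in "abcde"}
for doc_number, terms in [(1, "abd"), (2, "ce"), (3, "ac")]:
    for term in terms:
        index[term] = set_posting(index[term], doc_number)

for term in "abcde":
    print(term, show(index[term]))   # a -> 101..., b -> 100..., c -> 011..., etc.

# a boolean query "a and c" is a single machine and-operation on the vectors;
# the positions of the remaining one-bits are the matching document numbers.
hits = index["a"] & index["c"]
matches = [i + 1 for i, bit in enumerate(show(hits)) if bit == "1"]
print("a and c ->", matches)         # document 3

the point of the illustration is that the query step is one and of two bit strings, regardless of how many postings each term carries, which is exactly the advantage claimed above.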
gorokhov discusses the use of a modified binary vector approach in a document retrieval system implemented on a small soviet computer.4 faced with the need to minimize storage requirements for his inverted file, gorokhov concentrated on developing a technique for locating and removing strings of zeroes occurring in the binary vectors used within the system. since these zeroes represent the absence of information they could be removed if there were a way to indicate the position in the original vector of the ones that remained. he proposed the removal of strings of zeroes and the inclusion of numeric place values with the remaining vector elements. his result is a file with variable-length index records. the abandoning of the pure binary vector obviates the process, and gorokhov found it necessary to expand the vector elements into the original vector before logical operations could be applied. even though he does not state so explicitly, gorokhov seems to have found his method more efficient than the standard inverted file. gorokhov's suggestion has led to the development of an algorithm for the compression of binary vectors. heaps and thiel have also discussed the use of compressed binary vectors as the basis of an inverted index file.5, 6 aside from a brief description of the method for implementing the concept, they offer no comparison of the binary vector with the standard inverted file. storage requirements an immediate reaction to the concept of binary vectors is to state that they will obviously take more storage space than the standard inverted file. a closer study shows that this is not always the case. the storage requirements for the two types of files may be calculated as follows: 1. mbv = (d · n) / 8 bytes (binary vector file) 2. msi = d · i · k bytes (standard inverted file) where: m = storage requirements in bytes; d = number of documents in the system; n = number of index terms in the system; i = average depth of indexing in the system; k = size in bytes of a document number stored in the file. using equations 1 and 2 we find that the storage requirements for the binary vector file are, in fact, less than the requirements for the standard inverted file if n < 8 · i · k. it is well known that the distribution of the use of index terms follows a logarithmic curve. in simple terms, one might say that a few terms are used very frequently and many terms are used infrequently. this condition implies that in a binary vector file the records for many terms will contain segments in which there are no "ones" in any byte. a method for removing these "zero" bytes is called compression. compression algorithm the technique for the compression of binary vectors as described here is designed specifically for the ibm 360 family of computers and similar machines. the extension to other machines should be obvious. within the ibm 360 the byte, which contains eight binary digits, is the basic storage unit, and with the eight binary digits it is possible to store a maximum integer value of 255. for the purpose of describing a proposed compression algorithm for the binary vector in the ibm 360, the term subvector will be defined as a string of contiguous bytes chosen from within the binary vector. a zero subvector will be a subvector each of whose bytes contains eight binary zeroes. a nonzero subvector will be a subvector each of whose bytes contains at least one binary one.
to compress a binary vector in the ibm 360 the following steps may be taken: 1. divide the binary vector into a series of zero subvectors and nonzero subvectors. subvectors of either type may have a maximum length of 255 bytes. for zero subvectors longer than 255 bytes, the 256th byte is to be treated as a nonzero byte, thus dividing the long zero subvector. 2. each nonzero subvector is prefixed with two bytes. the first of the prefix bytes contains the count of zero bytes which precede the nonzero subvector in the uncompressed vector. the second prefix byte contains a count of the bytes in the nonzero subvector. 3. the compressed vector then consists of only the nonzero subvectors together with their prefix bytes. 4. a two byte field of binary zeroes will end the compressed vector. binmy vector/king 311 the compression of the vectors creates variable-length records and removes the advantage of having records which are directly amenable to boolean manipulation. the effect of file compression on such manipulation in the search process is not as severe as it might appear. for the search process, the compressed vector may be expanded into its original form. the process of expansion of the binary vectors is relatively simple, and since only those index term records which are used in a query need to be expanded at the search time, the search time is not significantly affected. as an example of the use of the compression algorithm consider the following binary vector. 01100000/10000000/ seven zero bytes j00000001j10000000j ... the slashes indicate the division of the vector into bytes. the vector might be read as indicating the following list of document numbers: 2, 3, 9, 80, and 81. in a standard inverted file with each document number assigned three bytes of storage, fifteen bytes would be required to store these numbers. the compressed vector which results from the application of the algorithm is the following: 00000000j00000010j01100000/10000000j00000111/00000010/ 00000001/10000000/ ... again the slashes separate the vector into bytes. for the purpose of the following discussion consider each byte in a vector to be numbered sequentially beginning with byte one at the left. in the uncompressed vector bytes one and two form a nonzero subvector. consequently, the first four bytes in the compressed vector can be interpreted as follows: byte one. binary zero indicating that no zero bytes were removed preceding this subvector. byte two. binary two indicating that the following nonzero subvector is two bytes long. bytes three, four. bytes one and two of the original vector. bytes three through nine of the original vector are a zero subvector, and bytes ten and eleven form a second nonzero subvector. consequently, the second four bytes of the compressed vector are interpreted as follows: byte five. binary seven indicating that a zero subvector of seven bytes has been removed. byte six. binary two indicating that the following two bytes are a nonzero subvector. bytes seven, eight. bytes ten and eleven of the original vector. thus the binary vector has been reduced from eleven bytes to eight 312 journal of library automation vol. 7/4 december 1974 bytes while the space required to record the document numbers in the standard inverted file remains fifteen bytes. 
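the four steps above translate almost directly into code. the sketch below is a python paraphrase of the algorithm, again for illustration only: the function names are mine, the ibm 360 byte layout is simulated with python bytes objects, and it simply compresses and re-expands the eleven-byte example just worked through.

# a sketch of the compression scheme described above; not the paper's pl/1 code.
# edge cases (for example, zero runs longer than 255 bytes) are handled only in
# the simple way the text suggests, by splitting the run.

def compress(vector: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(vector):
        zeros = 0
        while i < len(vector) and vector[i] == 0 and zeros < 255:  # zero subvector
            zeros += 1
            i += 1
        start = i
        while i < len(vector) and vector[i] != 0 and i - start < 255:  # nonzero subvector
            i += 1
        chunk = vector[start:i]
        if chunk:
            out += bytes([zeros, len(chunk)]) + chunk     # two prefix bytes, then data
        elif i < len(vector):
            # a zero run of 255 bytes: treat the next byte as if it were nonzero
            out += bytes([zeros, 1, vector[i]])
            i += 1
    out += bytes([0, 0])                                  # two-byte terminator
    return bytes(out)

def expand(compressed: bytes, length: int) -> bytes:
    """rebuild the original vector; length restores any trailing zero bytes."""
    out = bytearray()
    i = 0
    while i + 1 < len(compressed):
        zeros, count = compressed[i], compressed[i + 1]
        if zeros == 0 and count == 0:                     # terminator
            break
        out += bytes(zeros)                               # re-insert the removed zero bytes
        out += compressed[i + 2 : i + 2 + count]
        i += 2 + count
    out += bytes(length - len(out))
    return bytes(out)

# the eleven-byte example from the text: bytes 1-2 nonzero, seven zero bytes,
# bytes 10-11 nonzero (documents 2, 3, 9, 80, and 81)
vector = bytes([0b01100000, 0b10000000]) + bytes(7) + bytes([0b00000001, 0b10000000])
packed = compress(vector)
assert expand(packed, len(vector)) == vector
print(len(vector), "bytes ->", len(packed), "bytes")

note that this sketch also appends the two-byte terminator, so the packed example comes to ten bytes rather than the eight prefix-and-data bytes counted in the text; the saving relative to the fifteen bytes of the standard inverted file record is the same either way.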
memory requirements for the standard inverted file and the binary vector file to compare memory requirements for the standard inverted file and the compressed binary vector file, we base our comparison on the total number of postings in the file. in the standard inverted file the storage space for the postings is equal to the number of postings times the length of a single posting, which is usually two, three, or five bytes. memory requirements for the compressed binary vector file are more difficult to estimate because the distribution of document numbers within the record for each index term is not known. the fact that a single byte in the binary vector file may contain between zero and eight postings is extremely important. the worst possible case occurs if the postings in the binary vector are spaced in such a way that each nonzero byte contains only one posting, and these bytes are separated by zero bytes. consider the following example: ... /00000000/00010000/00000000/00000100/ ... in this case the compression algorithm will remove the zero bytes, but will add two bytes (the prefix bytes) for each nonzero byte. the resulting compressed vector will be essentially the same length as the standard inverted file record if each posting is three bytes long in the standard inverted file. it might seem that the distribution of one posting per byte for the entire vector represents an even worse situation. it is clear that the compression algorithm will, in this case, not reduce the size of the vector. however, it must be remembered that in the standard inverted file each posting will require at least two bytes and perhaps three bytes. thus, the length of the record in the standard inverted file is two or three times longer than the corresponding binary vector regardless of compression. in data used in two model retrieval systems prepared to compare the standard inverted file and the binary vector file there are 6,121 documents with a total of 94,542 postings. an examination of the binary inverted file for the model systems discloses that there are only 55,311 nonzero bytes in the binary vector file. thus there seems to be some form of clustering of the document numbers in each index term record. if each nonzero byte in this binary vector is isolated by zero bytes, two prefix bytes would be added for each byte. thus the total memory requirements for the postings in the compressed file would be 165,933 bytes. less storage space is required if some nonzero bytes are contiguous. on the other hand, the standard inverted file will require 189,084 bytes if a two-byte posting is used, or 283,626 bytes if a three-byte posting is used. further study of the clustering phenomenon is needed. model retrieval systems to test some of the conjectures about the differences between the standard inverted file and the binary vector file, two model systems were prepared for operation on an ibm 360/67. details of the systems and pl/1 program listings are available elsewhere.7 the data base used was obtained from the institute of animal behavior at rutgers university. in the data base 6,121 documents were indexed by 1,484 index terms. a total of 94,542 postings in the system gives an average depth of indexing of 15.4 terms per document. both inverted files were stored on ibm 2314 disc storage devices. to ease the problem of handling variable-length records in both files the logical records for each index term were divided into chains of fixed-length physical records. for the standard inverted file a physical record size of 331 bytes was chosen. the entire file required 702,713 bytes including record overhead. for the uncompressed binary vector file a physical record size of 1,286 bytes was chosen to include overhead and space for up to 10,216 document numbers. when the compression algorithm was applied, with a physical record length of 130 bytes, the memory requirements for the binary vector file were reduced to 281,450 bytes, or 41 percent of the space required to store the standard inverted file. a series of forty searches of varying complexities were run against both files. the "time" function of pl/1 made it possible to accumulate timing statistics which excluded input/output functions. search times for the binary vector file include expansion of the compressed vectors, boolean manipulation of the vectors, and conversion of the resultant vector into digital document numbers. the times for the standard inverted file are for the boolean manipulation of the lists. the following points were noted in the analysis of the times: 1. in twenty-two of the forty queries for which comparative timings were obtained, the search of the binary vector file was faster, in one case by a factor of thirty-five. in the eighteen cases in which the search of the standard inverted file was faster, the search of the standard inverted file was at most 6.17 times faster. 2. the range of the total times for the binary vector file was .79 seconds to 9.72 seconds. the range for searching the standard inverted file was .15 seconds to 202.98 seconds. the fact that the search times for the binary vector file are within a fairly narrow range, in contrast to the wider range of times for searching the standard inverted file, has important implications for the design of an on-line interactive document retrieval system. in such a system it is important that the computer respond to users' requests not only rapidly but consistently. the narrower range of the search times provided by the binary vector file will assist in producing consistent times. 3. the search times for the binary vector file, exclusive of expansion and conversion times, are unaffected by the number of postings contained in the index terms used in a query. on the other hand, the number of postings in the records used from the standard inverted file appears to cause the differences in search times for that file. to test the conjectures that 1. search times for the binary vector file are related to the number of index terms in the query, and 2. search times for the standard inverted file are related to the number of postings in the index terms in the query, a correlation analysis was performed. the following correlation coefficients (r) were obtained: number of terms in query and search times for the binary vector file, r = .960; number of postings in query terms and search times for the standard inverted file, r = .979. the relationships indicated above are significant at the .001 level. no attempt was made to compute an average search time per term for the binary vector file or average search time per posting for the standard inverted file. such times would have meaning only for the model systems. summary the binary vector is suggested as an alternative to the usual method of storing document pointers in an inverted index file. the binary vector file can provide savings in storage space, search times, and programming effort. references
bloom, "some techniques and trade-offs affecting large data base retrieval times," proceedings of the acm 24 ( 1969). 2. d. r. davis and a. d. lin, "secondary key retrieval using an ibm 7090-1310 system," communications of the acm 8:243-46 (april1965). 3. john w. sammon, some mathematics of information storage and retrieval (technical report radc-tr-68-178 [rome, new york: rome air development center, 1968]). 4. s. a. gorokhov, "the 'setka-3' automated irs on the 'minsk-22' with the use of the socket associative-address method of organization of information" (paper presented at the all-union conference on information retrieval systems and automatic processing of scientific and technical information, moscow, 1967. translated and published as part of ad 697 687, national technical information service). 5. h. s. heaps and l. h. thiel, "optimum procedures for economic information retrieval," information storage & retrieval6:131-53 (1970). 6. l. h. thiel and h. s. heaps, "program design for retrospective searches on large data bases," information storage & retrieval8:1-20 (1972). 7. d. r. king, "an inverted file structure for an interactive document retrieval system" (ph.d. dissertation, rutgers university, 1971). microsoft word december_ital_farnell_final.docx editorial board thoughts: metadata training in canadian library technician programs sharon farnel information technologies and libraries | december 2016 3 the core metadata team at my institution is small but effective. in addition to myself as coordinator, we include two librarians and two full-time metadata assistants. our metadata assistant positions are considered to be similar, in some ways, to other senior assistant positions within the organization which require or at least prefer that individuals have a library technician diploma. however, neither of our metadata assistants has such a diploma. their credentials, in fact, are quite different. in part, this difference is driven by the nature of the work that our metadata assistants do. they work regularly with different metadata standards such as mods, dc, ddi in addition to marc. the perform operations on large batches of metadata using languages such as xslt or r. this is quite different in many ways than the work of their colleagues who work with the ils, many of whom do have a library technician diploma. as we prepare for an upcoming short-term leave of one of our team members, i have been thinking a great deal about the work our metadata assistants do and whether or not we would find an individual who came through a librarian technician program who had the skills and knowledge we need a replacement to have. and i have also been reminded of conversations i have had with recently graduated library technicians who felt their exposure to metadata standards, practices, and tools beyond rda and marc had been lacking in their programs. this got me thinking about the presence or absence of metadata courses in library technician programs in canada. i reached out to two colleagues from macewan university—norene erickson and lisa shamchuk—who are doing in-depth research into library technician education in canada. they kindly provided me with a list of canadian institutions that offer a library technician program so i could investigate further. now, i must begin with two caveats. one, this is very much a surface level scan rather than an indepth examination, although this is simply the first step in what i hope will be a longer term investigation. 
second, although several francophone institutions in canada offer library technician programs, i did not review their programs; i was concerned that my lack of fluency in the french language could lead to inadvertent misrepresentations. sharon farnel (sharon.farnel@ualberta.ca), a member of the ital editorial board, is metadata coordinator, university of alberta libraries, edmonton, alberta. editorial board thoughts | farnel https://doi.org/10.6017/ital.v35i4.9601 4 canadian institutions offering a library technician program (by province) are: alberta ● macewan university (http://www.macewan.ca/wcm/schoolsfaculties/business/programs/libraryandinforma tiontechnology/) ● southern alberta institute of technology (http://www.sait.ca/programs-and-courses/fulltime-studies/diplomas/library-information-technology) british columbia ● langara college (http://langara.ca/programs-and-courses/programs/library-informationtechnology/) ● university of the fraser valley (http://www.ufv.ca/programs/libit/) manitoba ● red river college (http://me.rrc.mb.ca/catalogue/programinfo.aspx?progcode=libifdp®ioncode=wpg) nova scotia ● nova scotia community college (http://www.nscc.ca/learning_programs/programs/plandescr.aspx?prg=lbtn&pln=libin ftech) ontario ● algonquin college (http://www.algonquincollege.com/healthandcommunity/program/library-andinformation-technician/) ● conestoga college (https://www.conestogac.on.ca/parttime/library-and-informationtechnician) ● confederation college (http://www.confederationcollege.ca/program/library-andinformation-technician) ● durham college (http://www.durhamcollege.ca/programs/library-and-informationtechnician) ● seneca college (http://www.senecacollege.ca/fulltime/lit.html) ● mohawk college (http://www.mohawkcollege.ca/ce/programs/community-services-andsupport/library-and-information-technician-diploma-800) information technologies and libraries | december 2016 5 quebec ● john abbott college (http://www.johnabbott.qc.ca/academics/careerprograms/information-library-technologies/) saskatchewan ● saskatchewan polytechnic (http://saskpolytech.ca/programs-andcourses/programs/library-and-information-technology.aspx) my method was quite simple. using the program websites listed above, i reviewed the course listings looking for ‘metadata’ either in the title or in the description when it was available. of the fourteen (14) programs examined, nine (9) had no course with metadata in the title or description. two (2) programs had courses where metadata was listed as part of the content but not the focus: langara college as part of “special topics: creating and managing digital collections” and seneca college as part of “cataloguing iii” which has a partial focus on metadata for digital collections. three (3) of the programs had a course with metadata in the title or description; all are a variation on “introduction to metadata and metadata applications”. (importantly, the three institutions in question conestoga college, confederation college, and mohawk college are all connected and share courses online). so, what do these very preliminary and impressionistic findings tell us? it seems that there is little opportunity for students enrolled in library technician programs in canada to be exposed to the metadata standards, practices, and tools that are increasingly necessary for positions involved in work with digital collections, research data management, digital preservation, and the like. admittedly, no program can include courses on all potentially relevant topics. 
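to make the kind of work mentioned earlier more concrete (batch manipulation of mods, dc, and marc metadata with tools such as xslt or r), here is a small illustrative sketch. python is used only to keep the examples in this document in one language, and the folder name, element choices, and output format are assumptions for the example, not a description of any particular program's curriculum or workflow; the same operation could equally be written in xslt or r.

```python
# hypothetical batch job: pull titles, identifiers, and dates out of a folder
# of mods records and write a simple dublin core summary spreadsheet.
import csv
import pathlib
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}
FIELDS = ["dc:title", "dc:identifier", "dc:date", "source_file"]

def first_text(root, path):
    """return the text of the first matching element, or an empty string."""
    el = root.find(path, NS)
    return el.text.strip() if el is not None and el.text else ""

rows = []
for f in sorted(pathlib.Path("mods_records").glob("*.xml")):   # assumed folder
    root = ET.parse(f).getroot()
    rows.append({
        "dc:title": first_text(root, "mods:titleInfo/mods:title"),
        "dc:identifier": first_text(root, "mods:identifier"),
        "dc:date": first_text(root, "mods:originInfo/mods:dateIssued"),
        "source_file": f.name,
    })

with open("dc_summary.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```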
in addition, formal course work is only one aspect of training and education that can prepare graduates for their career; practica and work placements and other more informal activities during a program are crucial, as are the skills and knowledge that can only be developed once hired and on the job. nevertheless, based on the investigation above, one would be justified in asking if we are disadvantaging students by not working to incorporate additional coursework focused on metadata standards, application, and tools, as well as on basic skills in manipulation of metadata in large batches. scripting languages or equivalent combination of education and experience. master’s desirable.” i edited our statement to more clearly allow a combination of factors that would show sufficient preparation: “bachelor’s degree and a minimum of 3-5 years of experience, or an equivalent combination of education and experience, are required; a master’s degree is preferred,” followed by a separate description of technical skills needed. this increased the number and quality of our editorial board thoughts | farnel https://doi.org/10.6017/ital.v35i4.9601 6 applications, so i’ll remain on the lookout for opportunities to represent what we want to require more faithfully and with an open mind. meanwhile, on the other side of the table, students and recent grads are uncertain how to demonstrate their skills. first, they’re wondering how to show clearly enough that they meet requirements like “three years of work experience” or “experience with user testing” so that their application is seriously considered. second, they ask about possibilities to formalize skills. recently, i’ve gotten questions about a certificate program in ux and whether there is any formal certification to be a systems librarian. surveying the past experience of my own network—with very diverse paths into technology jobs ranging from undergraduate or second master’s degrees to learning scripting as a technical services librarian to pre-mls work experience—doesn’t suggest any standard method for substantiating technical knowledge. once again, the truth of the situation may be that libraries will welcome a broad range of possible experience, but the postings don’t necessarily signal that. some advice from the tech industry about how to be more inviting to candidates applies to libraries too; for example, avoiding “rockstar”/ “ninja” descriptions, emphasizing the problem space over years of experience,1 and designing interview processes that encourage discussion rather than “gotcha” technical tasks. at penn libraries, for example, we’ve been asking developer candidates to spend a few hours at most on a take-home coding assignment, rather than doing whiteboard coding on the spot. this gives us concrete code to discuss in a far more realistic and relaxed context. while it may be helpful to express requirements better to encourage applicants to see more clearly whether they should respond to a posting, this is a small part of the question of preparing new mls grads for library technology jobs. the new grads who are seeking guidance on substantiating their skills are the ones who are confident they possess them. others have a sense that they should increase their comfort with technology but are not sure how to do it, especially when they’ve just completed a whole new degree and may not have the time or resources to pursue additional training. 
even if we make efforts to narrow the gap between employers and jobseekers, much remains to be discussed regarding the challenge of readying students with different interests and preparation for library employment. library school provides a relatively brief window to instill in students the fundamentals and values of the profession and it can’t be repurposed as a coding academy. there persists a need to discuss how to help students interested in technology learn and demonstrate competencies rather than teaching them rapidly shifting specific technologies. references 1. erin kissane, “job listings that don’t alienate,” https://storify.com/kissane/job-listings-thatdon-t-alienate. june_ita_pekala_final privacy and user experience in 21st century library discovery shayna pekala information technology and libraries | june 2017 48 abstract over the last decade, libraries have taken advantage of emerging technologies to provide new discovery tools to help users find information and resources more efficiently. in the wake of this technological shift in discovery, privacy has become an increasingly prominent and complex issue for libraries. the nature of the web, over which users interact with discovery tools, has substantially diminished the library’s ability to control patron privacy. the emergence of a data economy has led to a new wave of online tracking and surveillance, in which multiple third parties collect and share user data during the discovery process, making it much more difficult, if not impossible, for libraries to protect patron privacy. in addition, users are increasingly starting their searches with web search engines, diminishing the library’s control over privacy even further. while libraries have a legal and ethical responsibility to protect patron privacy, they are simultaneously challenged to meet evolving user needs for discovery. in a world where “search” is synonymous with google, users increasingly expect their library discovery experience to mimic their experience using web search engines.1 however, web search engines rely on a drastically different set of privacy standards, as they strive to create tailored, personalized search results based on user data. libraries are seemingly forced to make a choice between delivering the discovery experience users expect and protecting user privacy. this paper explores the competing interests of privacy and user experience, and proposes possible strategies to address them in the future design of library discovery tools. introduction on march 23, 2017, the internet erupted with outrage in response to the results of a senate vote to roll back federal communications commission (fcc) rules prohibiting internet service providers (isps), such as comcast, verizon, and at&t, from selling customer web browsing histories and other usage data without customer permission. less than a week after the senate vote, the house followed suit and similarly voted in favor of rolling back the fcc rules, which were set to go into effect at the end of 2017.2 the repeal became official on april 3, 2017 when the president signed it into law.3 this decision by u.s. lawmakers serves as a reminder that today’s internet economy is a data economy, where personal data flows freely on the web, ready to be compiled and sold to the highest bidder. continuous online tracking and surveillance has become the new normal. shayna pekala (shayna.pekala@georgetown.edu) is discovery services librarian, georgetown university library, washington, dc. 
privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 49 isps are just one of the many players in the online tracking game. major web search engines, such as google, bing, and yahoo, also collect information about users’ search histories, among other personal information.4 by selling this data to advertisers, data brokers, and/or government agencies, these search engine companies are able to make a profit while providing the search engines themselves for “free.” in addition to profiting off of user data, web search engines also use it to enhance the user experience of their products. collecting and analyzing user data enables systems to learn user preferences, providing personalized search results that make it easier to navigate the ever-increasing sea of online information. the collection and sharing of user data that occurs on the open web is deeply troubling for libraries, whose professional ethics embody the values of privacy and intellectual freedom. a user’s search history contains information about a user’s thought process, and the monitoring of these thoughts inhibits intellectual inquiry.5 libraries, however, would be remiss to dismiss the success of web search engines and their use of data altogether. mit’s preliminary report on the future of libraries urges, “while the notion of ‘tracking’ any individual’s consumption patterns for research and educational materials is anathema to the core values of libraries...the opportunity to leverage emerging technologies and new methodologies for discovery should not be discounted.”6 this article examines the current landscape of library discovery, the competing interests of privacy and user experience at play, and proposes possible strategies to address them in the future design of library discovery tools. background library discovery in the digital age the advent of new technologies has drastically shaped the way libraries support information discovery. while users once relied on shelf-browsing and card catalogs to find library resources, libraries now provide access to a suite of online tools and interfaces that facilitate cross-collection searching and access to a wide range of materials. in an online environment, many paths to discovery are possible, with the open web playing a newfound and significant role. today’s library discovery tools fall into three categories: online catalogs (the patron interface of the integrated library system (ils)), discovery layers (a patron interface with enhanced functionality that is separate from an ils), and web-scale discovery tools (an enhanced patron interface that relies on a central index to bring together resources from the library catalog, subscription databases, and digital repositories).7 these tools are commonly integrated with a variety of external systems, including proxy servers, inter-library loan, subscription databases, individual publisher websites, and more. for the most part, libraries purchase discovery tools from third-party vendors. while some libraries use open source discovery layers, such as blacklight or vufind, there are currently no open source options for web-scale discovery tools.8 information technology and libraries | june 2017 50 outside of the library, web search engines (e.g. google, bing, and yahoo), and targeted academic discovery products (e.g. 
google scholar, researchgate, and academia.edu) provide additional systems that enable discovery.9 in fact, web search engines, particularly google, play a significant role in the research process. both students and faculty use google in conjunction with library discovery tools. students typically use google at the beginning of the research process to get a better understanding of their topic and identify secondary search terms. faculty, on the other hand, use google to find out how other scholars are thinking about a topic.10 unsurprisingly, google and google scholar provide the majority of content access to major content platforms.11 the data economy and online privacy concerns in an information discovery environment that is primarily online, new threats to patron privacy emerge. in today’s economy, user data has become a global commodity. commercial businesses have recognized the value of data mining for marketing purposes. bjorn bloching, et. al. explain, “from cleverly aggregated data points, you can draw multiple conclusions that go right to the heart and mind of the customer.”12 along the same lines, the ability to collect and analyze user data is extremely valuable to government agencies for surveillance purposes, creating an additional data-driven market.13 the increasing value of user data has drastically expanded the business of online tracking. in her book, dragnet nation, journalist julia angwin outlines a detailed taxonomy of trackers, including various types of government, commercial, and individual trackers.14 in the online information discovery process, multiple parties collect user data at different points. consider the following scenario: a user executes a basic keyword search in google to access an openly available online resource. in the fifteen seconds it takes the user to get to that resource, information about the user’s search is collected by the internet service provider (isp), the web browser, the search engine, the website hosting the resource, and any third-party trackers embedded in the website. the search query, along with the user’s internet protocol (ip) address, become part of the data collector’s profile on the user. in the future, the data collector can sell the user’s profile to a data broker, where it will be merged with profiles from other data collectors to create an even more detailed portrait of the user.15 the data broker, in turn, can sell the complete dataset to the government, law enforcement, commercial businesses, and even criminals. this creates serious privacy concerns, particularly since users have no legal right over how their data is bought and sold.16 privacy protection in libraries libraries have deeply-rooted values in privacy and strong motivations to protect it. intellectual freedom, the foundation on which libraries are built, necessarily requires privacy. in its interpretation of the library bill of rights, the american library association (ala) explains, “in a library (physical or virtual), the right to privacy is the right to open inquiry without having the subject of one’s interest examined or scrutinized by others.”17 many studies support this idea, privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 51 having found that people who are indiscriminately and secretly monitored censor their behavior and speech.18 libraries have both legal and ethical obligations to protect patron privacy. 
while there is no federal legislation that protects privacy in libraries, forty-eight states have regulations regarding the confidentiality of library records, though the extent of these protections varies by state.19 because these statutes were drafted before the widespread use of the internet, they are phrased in a way that addresses circulation records and does not specifically include or exclude internet use records (records with information on sites accessed by patrons) from these protections. therefore, according to theresa chmara, libraries should not treat internet use records any differently than circulation records with respect to confidentiality.20 the library community has established many guiding documents that embody its ethical commitment to protecting patron privacy. the ala code of ethics states in its third principle, “we protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”21 the international federation of library associations and institutions (ifla) code of ethics has more specific language about data sharing, stating, “the relationship between the library and the user is one of confidentiality and librarians and other information workers will take appropriate measures to ensure that user data is not shared beyond the original transaction.”22 the library community has also established practical guidelines for dealing with privacy issues in libraries, particularly those issues relating to digital privacy, including the ala privacy guidelines23 and the national information standards organization (niso) consensus principles on user’s digital privacy in library, publisher, and software-provider systems.24 additionally, the library freedom project was launched in 2015 as an educational resource to teach librarians about privacy threats, rights, and tools, and in 2017, the library and information technology association (lita) released a set of seven privacy checklists25 to help libraries implement the ala privacy guidelines. personalization of online systems while user data can be used for tracking and surveillance, it can also be used to improve the digital user experience of online systems through personalization. because the growth of the internet has made it increasingly difficult to navigate the continually growing sea of information online, researchers have put significant effort into designing interfaces, interaction methods, and systems that deliver adaptive and personalized experiences.26 angsar koene, et. al. explain, “the basic concept behind personalization of on-line information services is to shield users from the risk of information overload, by pre-filtering search results based on a model of the user’s preferences… a perfect user model would…enable the service provider to perfectly predict the decision a user would make for any given choice.”27 the authors continue to describe three main flavors of personalization systems: 1. content-based systems, in which the system recommends items based on their similarity to items that the user expressed interest in; information technology and libraries | june 2017 52 2. collaborative-filtering systems, in which users are given recommendations for items that other users with similar tastes liked in the past; and 3. 
community-based systems, in which the system recommends items based on the preferences of the user’s friends.28 many popular consumer services, such as amazon.com, youtube, netflix, google, etc., have increased (and continue to increase) the level of personalization that they offer.29 one such service in the area of academic resource discovery is google scholar’s updates, which analyzes a user’s publication history in order to predict new publications of interest.30 libraries, in contrast, have not pressed their developers and vendors to personalize their services in favor of privacy, even though studies have shown that users expect library tools to mimic their experience using web search engines.31 some web-scale discovery services do, however, allow researchers to set personalization preferences, such as their field of study, and, according to roger schonfeld, it is likely that many researchers would benefit tremendously from increased personalization in discovery.32 in this vein, the american philosophical society library recently launched a new recommendation tool for archives and manuscripts that uses circulation data and user-supplied interests to drive recommendations.33 opportunities for user experience in library discovery a major challenge in today’s online discovery environment is that the user is inhibited by an overwhelming number of results. this leads to users rely on relevance rankings and to fail to examine search results in depth. creating fine-tuned relevance ranking algorithms based on user behavior is one remedy to this problem, but it relies on the use of personal user data.34 however, there may be opportunities to facilitate data-driven discovery while maintaining the user’s anonymity that would be suitable for library (and other) discovery tools. irina trapido proposes that relevance ranking algorithms could be designed to leverage the popularity of a resource measured by its circulation statistics or by ranking popular or introductory materials higher than more specialized ones to help users make sense of large results sets.35 michael schofield proposes “context-driven design” as an intermediary solution, whereby the user opts in to have the system infer context from neutral device or browser information, such as the time of day, business hours, weather, events, holidays, etc.36 jason clark describes a search prototype he built that applies these principles, but he questions whether these types of enhancements actually add value to users.37 rachel vacek cautions that personalization is not guaranteed to be useful or meaningful, and continuous user testing is key.38 discussion there are several aspects to consider for the design of future library discovery tools. the integrated, complex nature of the web causes privacy to become compromised during the information discovery process. library discovery tools have been designed not to retain borrowing records, but have not yet evolved to mask user behavior, which is invaluable in today’s data economy. it is imperative that all types of library discovery tools have built-in functionality to privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 53 protect patron privacy beyond borrowing records, while also enabling the ethical use of patron data to improve user experience. 
even if library discovery tools were to evolve so that they themselves were absolutely private (where no data were ever collected or shared), other online parties (isps, web browsers, advertisers, data brokers, etc.) would still have access to user data through other means, such as cookies and fingerprinting. the operating reality is such that privacy is not immediately and completely controllable by libraries. laurie rinehart-thompson explains, “in the big picture, privacy is at the mercy of ethical and stewardship choices on the part of all information handlers.”39 while libraries alone cannot guarantee complete privacy for their patrons, they can and should mitigate privacy risks to the greatest extent possible. at the same time, ignoring altogether the benefits of using patron data to improve the discovery user experience may threaten the library’s viability in the age of google. roger schonfeld explains, “if systems exclude all personal data and use-related data, the resulting services will be onedimensional and sterile. i consider it essential for libraries to deliver dynamic and personalized services to remain viable in today's environment; expectations are set by sophisticated social networks and commercial destinations.”40 libraries must find ways to keep up with greater industry trends while adhering to professional ethics. recommendations while libraries have traditionally shied away from collecting data about patron transactions, these conservative tendencies run counter to the library’s mission to provide outstanding user experience and the need to evolve in a rapidly changing information industry. as the profession adopts new technologies, ethical dilemmas present themselves that are tied into their use. while several library organizations have issued guidance for libraries about the role of user data in these new technologies, this does not go far enough. the niso privacy principles, for instance, acknowledge that its principles are merely “a starting point.”41 examining the substance of these guidelines is important for confronting the privacy challenges facing library discovery in the 21st century, but there are additional steps libraries can take to more fully address the competing interests of privacy and user experience in library discovery and in library technologies more generally. holding third parties accountable libraries are increasingly at the mercy of third parties when it comes to the development and design of library discovery tools. unfortunately, these third parties not have the same ethical obligations to protect patron privacy that librarians do. in addition, the existing guidance for protecting user data in library technologies is directed towards librarians, not third party vendors. the library community must hold third parties accountable for the ethical design of library discovery tools. one strategy for doing this would be to develop a ranking or certification process for discovery tools based on a community set of standards. the development of hipaa-compliant information technology and libraries | june 2017 54 records management systems in the medical field sets an example. 
because healthcare providers are required by law to guarantee the privacy of patient data,42 they must select electronic health records systems (erms) that have been certified by an office of the national coordinator for health information technology (onc)-authorized body.43 in order to be certified, the system must adhere to a set of criteria adopted by the department of health and human services,44 which includes privacy and security standards.45 another example is the consumer reports standard and testing program for consumer privacy and security, which is currently in development. consumer reports explains the reason for developing this new privacy standard, “if consumer reports and other public-interest organizations create a reasonable standard and let people know which products do the best job of meeting it, consumer pressure and choices can change the marketplace.”46 libraries could potentially adapt the consumer reports standards and rating system for library discovery tools and other library technologies. engaging in ux research & design libraries should not rely on third parties alone to address privacy and user experience requirements for library discovery tools. libraries are well-poised to become more involved in the design process itself by actively engaging in user experience research and design. the opportunities for “context-driven design” and personalization based on circulation and other anonymous data are promising for library discovery but require ample user testing to determine their usefulness. understanding which types of personalization features offer the most value while preserving privacy is key to accelerating the design of library discovery tools. the growth of user experience librarian jobs and the emergence of user experience teams and departments in libraries signals an increasing amount of user experience expertise in the field, which can be leveraged to investigate these important questions for library discovery. illuminating the black box when librarians adopt new discovery tools without fully understanding their underlying technologies and the data economy in which they operate, this does not serve users. librarians have ethical obligations that should require them to thoroughly understand how and when user data is captured by library discovery tools and other web technologies, and how this information is compiled and shared at a higher level. not only do librarians need to understand the technical aspects of discovery technologies, they also need to understand the related user experience benefits and privacy concerns and the resulting ethical implications. as technology continues to evolve, librarians should be required to engage in continued learning in these areas. such technology literacy skills could be incorporated in the curriculum of library and information science degree programs, as well as in ongoing professional development opportunities. empowering library users because information discovery in an online environment introduces new privacy risks, communication about this topic between librarians and patrons is paramount. librarians should privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 55 proactively discuss with patrons the potential risks to their privacy when conducting research online, whether they are using the open web or library discovery tools. 
it is ultimately up to the patron to weigh their needs and preferences in order to decide which tools to use, but it is the librarian’s responsibility to empower patrons to be able to make these decisions in the first place. conclusion with the rollback of the fcc privacy rules that prohibit isps from selling customer search histories without customer permission, understanding digital privacy issues and taking action to protect patron privacy is more important than ever. while privacy and user experience are both necessary and important components of library discovery systems, their requirements are in direct conflict with each other. an absolutely private discovery experience would mean that no user data is ever collected during the search process, whereas a completely personalized discovery experience would mean that all user data is collected and utilized to inform the design and features of the system. it is essential for library discovery tools to have built-in functionality that protects patron privacy to the greatest extent possible and enables the ethical use of patron data to improve user experience. the library community must take action to address these requirements beyond establishing guidelines. holding third party providers to higher privacy standards is a starting point. in addition, librarians themselves need to engage in user experience research and design to discover and test the usefulness of possible intermediary solutions. librarians must also become more educated as a profession on digital privacy issues and their ethical implications in order to educate patrons about their fundamental rights to privacy and empower them to make decisions about which discovery tools to use. collectively, these strategies enable libraries to address user needs, uphold professional ethics, and drive the future of library discovery. references 1. irina trapido, “library discovery products: discovering user expectations through failure analysis,” information technologies and libraries 35, no. 3 (2016): 9-23, https://doi.org/10.6017/ital.v35i3.9190. 2. brian fung, “the house just voted to wipe away the fcc’s landmark internet privacy protections,” the washington post, march 28, 2017, https://www.washingtonpost.com/news/the-switch/wp/2017/03/28/the-house-justvoted-to-wipe-out-the-fccs-landmark-internet-privacy-protections. 3. jon brodkin, “president trump delivers final blow to web browsing privacy rules,” ars technica, april 3, 2017, https://arstechnica.com/tech-policy/2017/04/trumps-signaturemakes-it-official-isp-privacy-rules-are-dead/. 4. nathan freed wessler, “how private is your online search history?” aclu free future (blog), https://www.aclu.org/blog/how-private-your-online-search-history. 5. julia angwin, dragnet nation (new york: times books, 2014), 41-42. information technology and libraries | june 2017 56 6. mit libraries, institute-wide task force on the future of libraries (2016), 12, https://assets.pubpub.org/abhksylo/futurelibrariesreport.pdf. 7. trapido, “library discovery products,” 10. 8. marshall breeding, “the future of library resource discovery,” niso white papers, niso, baltimore, md, 2015, 4, http://www.niso.org/apps/group_public/download.php/14487/future_library_resource_dis covery.pdf. 9. christine wolff, alisa b. rod, and roger c. schonfeld, ithaka s+r us faculty survey 2015 (new york: ithaka s+r, 2016), 11, https://doi.org/10.18665/sr.277685. 10. 
deirdre costello, “students and faculty research differently” (presentation, computers in libraries, washington, d.c., march 28, 2017), http://conferences.infotoday.com/documents/221/a103_costello.pdf. 11. roger c. schonfeld, meeting researchers where they start: streamlining access to scholarly resources (new york: ithaka s+r, 2015), https://doi.org/10.18665/sr.241038. 12. björn bloching, lars luck, and thomas ramge, in data we trust: how customer data is revolutionizing our economy (london: bloomsbury publishing, 2012), 65. 13. angwin, 21-36. 14. ibid., 32-33. 15. natasha singer, “mapping, and sharing, the consumer genome,” new york times, june 16, 2012, http://www.nytimes.com/2012/06/17/technology/acxiom-the-quiet-giant-ofconsumer-database-marketing.html. 16. lois beckett, “everything we know about what data brokers know about you,” propublica, june 13, 2014, https://www.propublica.org/article/everything-we-know-about-what-databrokers-know-about-you. 17. “an interpretation of the library bill of rights,” american library association, amended july 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy. 18. angwin, dragnet nation, 41-42. 19. anne klinefelter, “privacy and library public services: or, i know what you read last summer,” legal references services quarterly 26, no. 1-2 (2007): 258-260, https://doi.org/10.1300/j113v26n01_13. 20. theresa chmara, privacy and confidentiality issues: guide for libraries and their lawyers (chicago: ala editions, 2009), 27-28. 21. “code of ethics of the american library association,” american library association, privacy and user experience in 21st century library discovery | pekala https://doi.org/10.6017/ital.v36i2.9817 57 amended january 22, 2008, http://www.ala.org/advocacy/proethics/codeofethics/codeethics. 22. “ifla code of ethics for librarians and other information workers,” international federation of library associations and institutions, august 12, 2012, http://www.ifla.org/news/ifla-code-of-ethics-for-librarians-and-other-informationworkers-full-version. 23. “privacy & surveillance,” american library association, approved 2015-2016, http://www.ala.org/advocacy/privacyconfidentiality. 24. national information standards organization, niso consensus principles on users’ digital privacy in library, publisher, and softwareprovider systems (niso privacy principles), published on december 10, 2015, http://www.niso.org/apps/group_public/download.php/15863/niso%20consensus%20pr inciples%20on%20users%92%20digital%20privacy.pdf. 25. “library privacy checklists,” library and information technology association, accessed march 7, 2017, http://www.ala.org/lita/advocacy. 26. panagiotis germanakos and marios belk, “personalization in the digital era,” in humancentred web adaptation and personalization: from theory to practice, (switzerland: springer international publishing switzerland, 2016), 16. 27. ansgar koene et al., “privacy concerns arising from internet service personalization filters,” acm sigcas computers and society 45, no. 3 (2015): 167. 28. ibid., 168. 29. ibid. 30. james connor, “scholar updates: making new connections,” google scholar blog, https://scholar.googleblog.com/2012/08/scholar-updates-making-new-connections.html. 31. schonfeld, meeting researchers where they start, 2. 32. roger c. schonfeld, does discovery still happen in the library?: roles and strategies for a shifting reality (new york: ithaka s+r, 2014), 10, https://doi.org/10.18665/sr.24914. 33. 
abigail shelton, “american philosophical society announces launch of pal, an innovative recommendation tool for research libraries,” american philosophical society, april 3, 2017, https://www.amphilsoc.org/press/pal. 34. trapido, “library discovery products,” 17. 35. ibid. 36. michael schofield, “does the best library web design eliminate choice?” libux, september information technology and libraries | june 2017 58 11, 2015, http://libux.co/best-library-web-design-eliminate-choice/. 37. jason a. clark, “anticipatory design: improving search ux using query analysis and machine cues,” weave: journal of library user experience 1, no. 4 (2016), https://doi.org/10.3998/weave.12535642.0001.402. 38. rachel vacek, “customizing discovery at michigan” (presentation, electronic resources & libraries, austin, tx, april 4, 2017), https://www.slideshare.net/vacekrae/customizingdiscovery-at-the-university-of-michigan. 39. laurie a. rinehart-thompson, beth m. hjort, and bonnie s. cassidy, “redefining the health information management privacy and security role,” perspectives in health information management 6 (2009): 4.s 40. marshall breeding, “perspectives on patron privacy and security,” computers in libraries 35, no. 5 (2015): 13. 41. national information standards organization, niso consensus principles. 42. joel jpc rodrigues, et al., “analysis of the security and privacy requirements of cloud-based electronic health records systems,” journal of medical internet research 15, no. 8 (2013), https://www.ncbi.nlm.nih.gov/pmc/articles/pmc3757992/. 43. office of the national coordinator for health information technology, guide to privacy and security of electronic health information, april 2015, https://www.healthit.gov/sites/default/files/pdf/privacy/privacy-and-security-guide.pdf. 44. office of the national coordinator for health information technology, “health it certification program overview,” january 30, 2016, https://www.healthit.gov/sites/default/files/publichealthitcertificationprogramovervie w_v1.1.pdf. 45. office of the national coordinator for health information technology, “2015 edition health information technology (health it) certification criteria, base electronic health record (ehr) definition, and onc health it certification program modifications final rule,” october 2015, https://www.healthit.gov/sites/default/files/factsheet_draft_2015-10-06.pdf. 46. consumer reports, “consumer reports to begin evaluating products, services for privacy and data security,” consumer reports, march 6, 2017, http://www.consumerreports.org/privacy/consumer-reports-to-begin-evaluatingproducts-services-for-privacy-and-data-security/. lib-s-mocs-kmc364-20141005045055 207 the ad · hoc discussion group on serials data bases: its history, current position, and future richard anable: coordinator, york university libraries, toronto, ontario, canada. history the ad hoc discussion group on serials data bases was formed as a result of an informal meeting held during the american library association's conference in las vegas on june 26, 1973. those in attendance were primarily interested in the generation and maintenance of machine-readable union files of serials. (this author's involvement in that meeting and the later activities of the group stems from a contract between the national library of canada and york university concerning an investigation of the problems associated with machine-readable serials files.) it was intended to be a relatively small and informal meeting of about ten individuals. 
the meeting was by no means closed, but it was not widely advertised. however, twenty-five individuals representing twenty institutions on the national (both the united states and canada), regional, and local levels attended. at the meeting there was a great deal of concern expressed about: 1. the lack of communication among the generators of machine-readable serials files. 2. the incompatibility of format andjor bibliographic data among existing files. 3. the apparent confusion about the existing and proposed bibliographic description and format "standards." there was general agreement that something should and could be done about these problems, and that the formation of a group specifically concerned with the generation and maintenance of machine-readable serials data bases would at least improve the communications aspect of the overall problem. (poor communication was thought by some to be at the root of the other problems.) it was also suggested that such a group could lay the groundwork for solving some of the compatibility problems, by presenting proposals on various aspects of the overall problem. these proposals might be used as guidelines for any new projects or revisions of existing ones. it 208 journal of library automation vol. 6/ 4 december 1973 was felt very strongly that the time factor was crucial if the efforts of such a group were to be useful, particularly to several of the institutions represented at the meeting. there was also a concern that the activities of the group should not parallel or duplicate any work already being undertaken by other groups. while various ala committees were dealing with some aspects of the overall problem, no one committee seemed to be addressing its entire scope. the association of research libraries was conducting a study of the existing serials data bases held by their member institutions, but was not currently addressing the overall problem, particularly with regard to the union list activities. it was suggested that direct communication with that committee be established. the net result of this first meeting was that the discussion group was formed and several meetings were scheduled. cynthia pugsley from the university of waterloo libraries, jay cunningham from the university-wide library automation program, university of california, and this writer were requested to prepare a position paper outlining the need for such a group. in july, the minutes of the june 26 meeting and the position paper were distributed. in the meantime a steering committee was arbitrarily selected. the council on library resources agreed to fund a meeting of that committee to be held september 21 at york university in toronto. the steering committee was made up of representatives from the council on library resources (clr), northwestern university, the canadian union catalogue taskgroup and its subgroup on the serials union list, the state university of new york (suny), the university of california university-wide library automation program ( ulap), the association of research libraries (arl), the joint committee on the union list of serials ( jculs), ohio college library center ( oclc), the national serials data program (nsdp), the library of congress (lc), the national library of canada (nlc), universite laval, international serials data system ( isds) /canada, and an observer from the british library. the purpose of the meeting was: 1. 
to establish a mechanism for creating a set of "agreed-upon practices for converting and communicating machine-readable serials data." 2. to establish a mechanism for cooperatively converting a comprehensive retrospective bibliographic data base of serials. to further these ends, the following subcommittees were established: 1. holding statement notation 2. working communications conventions 3. authority files 4. cooperative conversion mechanism the steering committee recognized the need for swift action on the development of "agreed-upon practices." consequently, this job was delegatad hoc discussidn group i anable 209 ed to the holding statement notation and the working communications conventions subcommittees. it deferred action on the question of a cooperative conversion effort until a report was received from that subcommittee at the next steering committee meeting scheduled for october 22, 1973 during the american society for information science meeting in los angeles. on october 10, three of the four subcommittees met for very brief sessions at the library of congress. the most significant results came from the cooperative conversion subcommittee which recommended that: ( 1) a proposal for a cooperative project be prepared as soon as possible; and (2)· that the conversion vehicle for such a project be the oclc facilities. at the next steering committee meeting these recommendations were accepted and the coordinator was asked to prepare a draft of a proposal for review by the cooperative conversion subcommittee. at this time the proposal is being prepared. the question of the need for formal affiliation with one or more of the existing professional organizations had repeatedly been raised at the various meetings. it was initially decided to inform the appropriate organizations of our existence and intentions, and to cooperate whenever and wherever our activities overlapped. when the group decided to prepare a proposal for a cooperative conversion project, the need for such· affiliation increased dramatically. at the october 22 meeting, the association of research libraries indicated a positive interest in our exploring that possibility further with them. they asked for a more detailed definition of our goals and plans, which is also being prepared. · generally the reaction of the group toward some kind of organizational arrangement with arl, if assurance could be made regarding the participation of non-arl institutions, was favorable. another question that lingers is whether it would be advisable to have a formal dual affiliation with both arl and a second professional organization. at this point the question is still open. current position thus far the activities of the group have addressed the problems of: 1. the improvement of communications among institutions engaged in the generation or maintenance of serials data bases. 2. the establishment of a set of "agreed-upon practices.'' 3. the investigation of future means of cooperative or coordinated serials record conversion of retrospective titles. the reasons for these efforts are obvious. we are currently all spending much time and money on noncooperative and uncoordinated local and regional conversion, and few of us are satisfied with the results. through improved communications among conversion efforts, we hope 210 journal of library automation vol. 6/4 december 1973 to establish a set of "agreed-upon practices" which should increase the interfiling compatibility. this in turn should reduce the cost to each institution. 
the use of a common centralized data collection vehicle will minimize redundant conversion. the problems associated with the generation and maintenance of union files of serials have multiplied in the last decade with the introduction of the anglo-american cataloguing rules ( aacr), establishment of the international serials data system ( isds), the presentation of the international standard bibliographic description for serials ( isbds) proposal, the distribution of the library of congress marc serials records, and the increasing role played by the indexing and abstracting services as points of access to serials lists of all types. individually our institutions cannot comprehensively attack all of these aspects. if attacked independently, there is little chance of similarity of approach; if attacked jointly, through the establishment of a set of "agreed-upon practices," similarity will be greater. if attacked jointly through a cooperative conversion effort, the resulting file will be equally usable to all the participants. it is the primary objective of the cooperative conversion project to establish a relatively comprehensive bibliographic data base of serials titles within a time frame which would eliminate the necessity for redundant and costly conversion efforts on the local and regional levels. the prime use of the resulting data base is intended to be the support of union list of serials activities. the secondary objectives are: i. to assist the national libraries of both countries (canada and the united states) in the establishment of a computer-maintained (and hopefully remotely accessible) serials data system. this will be accomplished partly by the very existence of the resulting data base, and partly by the experience gained in its establishment. 2. to assist in the definition of the roles of the regional or resource centers in such enterprises. 3. to provide a source data base for use within the international serials data system, and to seek the active participation of the canadian and united states national centers. the intention of the cooperative conversion project is to establish a comprehensive data base of serials titles in such a way as to accommodate the past, present, and future standards for format, description, and identification, when they can be identified. it is not the intention of this group to establish any new standards in any area. the proposed record structure will be a composite record complying with the iso /2709 format standard on level one (structure), and will attempt to reconcile the minor conflicts among the international serials data system's guidelines, the national serials data program internal format, the library of congress' marc-s format, and the draft of the caad hoc discussion group/ anable 211 nadian marc serials format on levels two and three (content designators and content) . the problems here appear to be technical in nature and by no means insurmountable. thus far the communication among the par· ticipants (including representatives from three of the four areas) has been most encouraging. the record will be based on a minimum set of data elements established to provide enough data to support the union list functions. however, this is a minimum and not a maximum set. it is basically a convention below which a record will be considered incomplete and above which it will be considered acceptable. there probably will be two additional categories of data elements besides those that are required: ( 1) required if readily avail. 
able; and ( 2) not required by the system, but acceptable. "required if readily available" covers those situations where complete bibliographic data are available at the time of conversion, such as where library of con· gress data are present. if the data are there, it is cheaper to convert at that point than at a later date. for this category and the "not required but ac· ceptable" category, a set of agreed·upon practices will be in effect to en· sure that if a data element is converted it will be consistent in content with similarly tagged fields. since at the time of this writing the proposed working communication conventions have not been finalized, it is not possible for the reader to judge whether the minimum set of data elements will meet his local or regional requirements. at this stage it appears that the set will probably in· elude over thirty elements and will have as a subset the isds data element requirements. the conversion project is intended not to compete with any existing or planned programs at either the library of congress or the national li· brary of canada. in fact, it is intended to complement activities in which these two institutions might be engaged. the distribution of the lc marc-s records, and the similar proposed service by the national library of canada, deal primarily with new titles or title changes, and not with the conversion of retrospective titles. while it is the stated intent of the nsdp to attack this area (retrospective titles), thus far it has not been fund· ed to do so. in fact the active involvement of both national libraries and their isds centers is anticipated since their contribution would be inappropriate to duplicate. it is intended that the resulting data base be made available to the isds international center and thus the rest of the international li· brary community. while the direct participation in the conversion effort may well be limited to a manageable number of institutions, this should not deter any institution from direct involvement in the deliberations of the group. what is requested is that the prospective participants have a serious interest in the solving of problems within the short time frame allowed. 212 journal of library automation vol. 6/4 december 1973 future to repeat, the basic goals of the group are: 1. to improve the communication among the generators of serials data bases. 2. to establish a set of "agreed-upon practices." 3. to establish a mechanism for the cooperative conversion of a comprehensive serials data base. the first goal will be an on-going effort, probably carried through by one or more existing organizations. the second goal, we hope to have partially completed by the time ala meets in chicago in january 1974, through the presentation to the steering committee of the reports of the various subcommittees. the third goal will be accomplished in stages. by ala midwinter we hope to have a concrete proposal that can be presented to all prospective participants, to funding agencies, and to the library community as a whole. since time is one of the items to be optimized, we feel that we should have the project launched no later than the end of the second quarter of 1974. the basic approach being proposed in this document can be characterized in the following ways: 1. a limited number of large institutions ( 5-15), or centers representing large institutions, will use a single on-line data collection facility, such as the ohio college library center ( oclc), to convert their retrospective serials files. 2. 
one or more large bibliographic files will be used as a base file (possibly the minnesota union list of serials file) to which new records or fields can be added. 3. the conversion requirement will be based on: (a) the building of a composite record incorporating the aacr, isds, and proposed isbd(s) requirements. (b) a minimum set of data elements, basically for union list of serials purposes. (c) the concept of an expandable record able to incorporate: (1) variant entry approaches, and (2) available (but not required) data elements. such an approach is a series of compromises, the first of which deals with the trade-off between time and cost. one argument which has been offered against the concept of collecting an "incomplete" serials record is that the total cost to the library community in the long run will be greater than if a complete record conversion were to be done initially. this argument is a carryover from the similar discussion concerning monographs. however, we must recognize the following: 1. serials records are of a dynamic nature; what is true for a title this year probably will not be true next year. the more comprehensive the data element set, the more true this becomes. 2. the cost curve dramatically increases as the number and type of data elements increase. 3. the increased time required to collect, edit, and control an exhaustive data element set will seriously protract the time frame of such an effort. 4. the time frame is one of the prime targets for optimization. any massive additional data collection requirement will compromise that goal. there are no conclusive studies in the area of serials conversions which suggest that the "total" record conversion approach would be less expensive in the long run than a base record conversion approach. we do not know what the total conversion effort is now, but it is guessed to be significant. the utilization of most of the existing files is primarily for catalogs or for location services, which the proposed minimum data set will accommodate. only a limited number of institutions have experimented with serials check-in or other functions requiring more complete records. the building of a composite record incorporating the various bibliographic standards is easily justified. such a record must be accessible via past, current, future, and popular practices. the ability to incorporate alternative applications of the same standards is also important, particularly in those cases where the rule is open to interpretation. this is very important if there is no centralized authority to control the application of a specific standard. the ability to convert additional data elements which are readily available but not required is also an important capability, since it will produce a more complete record at a reasonable cost. keeping the number of contributing institutions to a relatively small number simplifies the control aspects of the project. using a central on-line (remotely accessible) system such as oclc reduces the amount of software development required and reduces the degree of redundant conversion. it also will enable us to start conversion in a time frame otherwise impossible. the use of at least one large bibliographic file such as muls decreases the amount of original conversion, thus shortening the total time frame. the use of multiple starting data bases increases the matching requirements of similar records among the data bases but further reduces the original effort.
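the composite-record convention described above (a minimum set of required elements, a "required if readily available" category, and a "not required but acceptable" category, together with the need to match similar records across several starting data bases) can be pictured concretely. the following python sketch is a hypothetical illustration only: the actual minimum element set had not been finalized at the time of writing, so the field names, the three category lists, and the matching key are invented stand-ins rather than the project's specification.

```python
# illustrative sketch only: field names, category lists, and the match key are
# hypothetical stand-ins, not the project's (unfinalized) minimum element set.
REQUIRED = {"title", "issn", "holding_library"}             # below this, a record is "incomplete"
IF_READILY_AVAILABLE = {"publisher", "place", "frequency"}  # convert now if already in hand
ACCEPTABLE = {"subject_headings", "notes"}                  # not required by the system, but kept

def classify_record(record):
    """label a converted record against the convention and list optional elements present."""
    status = "incomplete" if (REQUIRED - record.keys()) else "acceptable"
    extras = sorted((IF_READILY_AVAILABLE | ACCEPTABLE) & record.keys())
    return status, extras

def match_key(record):
    """crude key for recognizing 'the same' title across several base files."""
    issn = record.get("issn", "")
    if issn:
        return issn.replace("-", "")
    return "".join(ch for ch in record.get("title", "").lower() if ch.isalnum())

incoming = {
    "title": "journal of library automation",
    "issn": "0022-2240",
    "holding_library": "participant a",
    "publisher": "american library association",  # a "required if readily available" element
}
print(classify_record(incoming), match_key(incoming))
# -> ('acceptable', ['publisher']) 00222240
```

in practice the choice of match key (issn where present, a normalized title otherwise) is exactly where the extra matching effort among multiple base files arises.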
the problem of selecting data bases is being studied. summary. i have attempted to define in this article the history, the current position, and the future plans of the ad hoc discussion group on serials data bases. we have tried to include in the deliberations of the group as many of the interested parties as possible. omissions do exist, not by intent, but because of a lack of complete information and poor communication. i have tried to act with speed because of the need expressed by the participants. the group is not a closed shop. any institution seriously desiring to make contributions is welcome. please consider this an open invitation. any documentation desired is readily available from me, as the coordinator. the willingness of the participants to cooperate and to make compromises has thus far exceeded all expectations, particularly in those areas where problems were expected. it has truly been a group effort. i would like to especially thank the national library of canada, the library of congress, and the national serials data program for the cooperation they have given to the regional organizations which have been the backbone of this effort. up against the clock: migrating to libguides v2 on a tight timeline. brianna buljung and catherine johnson, information technology and libraries | june 2017. abstract. during fall semester 2015, librarians at the united states naval academy were faced with the challenge of migrating to libguides version 2 and integrating libanswers with libchat into their service offerings. initially, the entire migration process was anticipated to take almost a full academic year, giving guide owners considerable time to update and prepare their guides. however, with the acquisition of the libanswers module, library staff shortened the migration timeline considerably to ensure both products went live on the version 2 platform at the same time. the expedited implementation timeline forced the ad hoc implementation teams to prioritize completion of the tasks that were necessary for the system to remain functional after the upgrade. this paper provides an overview of the process the staff at the nimitz library followed for a successful implementation on a short timeline and highlights transferable lessons learned during the process. consistent communication of expectations with stakeholders and prioritization of tasks were essential to the successful completion of the project. introduction. academic libraries all over the united states have migrated from libguides version 1 to the new, sleeker, responsive design of version 2. approaches to the migration can differ vastly depending on library size, staff capabilities, and the time frame available for completing the project. in 2015, the nimitz library at the united states naval academy began planning both to upgrade libguides to version 2 and to acquire libanswers with libchat. the web team and reference department partnered to migrate the libguides platform and integrate libanswers into the library's web presence. the library first adopted springshare's libguides in 2009. by 2015, the subscription had grown to 61 published guides with 10,601 views. the libguides collection was modified and expanded during two web site upgrades and several staffing changes.
throughout 2014 and 2015, library staff periodically discussed the possibility of upgrading to the version 2 interface, but timing, brianna buljung (bbuljung@mines.edu) is instruction & research librarian, colorado school of mines, golden, co. catherine johnson (cjohnson@usna.edu) is head of reference & instruction at the united states naval academy, annapolis, md. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 69 staffing vacancies and the priority of other projects kept the migration from taking place. in late summer 2015, with the acquisition of springshare’s libanswers with libchat pending, staff determined that it was finally time to migrate to the new libguides interface. initially, the migration team planned to spend nearly a full academic year completing the migration process. this timeline would provide guide owners with ample time for staff training, revising guides, conducting usability testing and preparing the migrated guides to go live without distracting from their other duties. however, right before starting the project, the library finalized the acquisition of springshare’s libanswers with libchat which they decided to launch with the version 2 interface. the team pushed up the libguides migration by several months to keep from confusing patrons with multiple interfaces and launch dates. the migration of libguides and the implementation of libanswers would take place during the fall semester and both products would go live in the version 2 interface before the start of the spring semester. this paper provides an overview of the process that the staff at nimitz library followed for a successful implementation on a short timeline and highlights transferable lessons learned during the process. the authors also include a post-implementation reflection on the process. literature review much of the currently available literature on migration of platforms, especially the libguides platform is published informally. librarians from universities across the country have created help guides, checklists and best practices for surviving the migration. most migration help-guides are tailored to each specific institution but they can still provide helpful suggestions that can be adapted by another.1 springshare also provides extensive help content and checklists, including a list of the most important steps for administrators to complete.2 however, little of the available literature discusses the minimally acceptable amount of work needed to be completed by guide authors. this type of information was crucial to the nimitz library team after drastically shortening the migration timeline. a clearly delineated list of required and optional tasks was needed for guide owners, given time constraints and other job duties. in addition to the informally published help materials, several articles have been published on various aspects of research guide design and evaluation. a few articles examine the migration process. hernandez and mckeen offer advice for libraries contemplating migration; including setting goals and performing usability testing against the new guides.3 duncan et al provide a case study of the implementation process at the university of saskatchewan.4 some articles discuss the basics of guide design and usage in the library. these best practices can be adapted to different platforms, web sites and user populations. 
they discuss the importance of various web design elements such as word choice and page layout.5 another aspect the literature exposes is student use of the guides.6 finally, usability of research guides is one of the most important and widely discussed topics in the literature. creating and maintaining guide content depends on the user’s ability to locate and use the guides in their research.7 most often, research guides are designed up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 70 with the student in mind; to assist them in beginning a project, researching when a librarian is unavailable or as a reference for follow-up after an instruction session.8 as pittsley and memmott discuss, navigation elements can impact a student’s use of research guides.9 the process as preparations for the migration began, it became immediately apparent that the web team and reference department would have to divide the project into manageable segments to complete the work without overwhelming guide owners. three ad hoc teams, made up of librarians from several different departments, were created to take the lead on different elements of the project. the migration team was responsible for researching, organizing and supervising the migration of libguides to version 2. the libanswers team learned about libanswers and how to effectively integrate the product into the library’s web site. the libchat team tested the functionality of libchat and determined how it would fit into the library’s reference desk staffing model. dividing the project into manageable segments allowed each team to focus on the execution of their area of responsibility. the team approach allowed the library to draw on individual strengths and staff willingness to participate without depending on one single staff member to manage the entire migration and implementation process on such a short timeline. migration team the migration team was responsible for determining the tasks that were mandatory for guide owners to complete, the amount of training they would need to use the new interface and how each product should be incorporated into the library’s web site. the libguides migration team relied heavily on advice from other libraries and the documentation from springshare to guide them in determining mandatory tasks. the engineering and computer science librarian reached out to the asee engineering libraries division listserv for advice from peer libraries that had already completed migration. the team also made use of the springshare help guides and best practices guides posted by other universities. ultimately, the migration team created checklists and spreadsheets to help guide owners prepare their guides for migration. a pre-migration checklist (appendix a) was shared with guide owners; containing all of the required and optional tasks that needed to be completed before the migration took place in early november. tasks such as deleting outdated or unused images and evaluating low use guides for possible deletion were required for guide owners to complete. other tasks such as checking each guide for a friendly url or checking database descriptions for brevity and jargon free language were encouraged but considered optional. the team determined that items directly related to the ability of post-migration guides to function properly made the required list, while more cosmetic or stylistic tasks could be completed on a time-allowed basis. 
a post-migration checklist (appendix b) was created for guide owners following the migration. this list included portions of the guides that had to be checked to ensure widgets, links and other assets had up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 71 migrated properly. both checklists were accompanied by tips, screenshots, deadlines and indicated which team member to contact with questions. clear explanation of the expectations for the project, and accommodating the guide owners’ busy schedules made the migration more successful. the migration team gave the new, more robust a-z list significant attention. libguides version 2 allows the a-z list to be sorted by subject, type and vendor. it also allows a library to tag “best bets” databases in each subject area. the databases categorized as best bets display more prominently in the list of databases by subject. using google sheets, the electronic resources librarian quickly and easily solicited feedback from liaison librarians about which databases to tag as best bets for each subject area. google sheets also made it easy for librarians to edit the list of databases related to their subject expertise. some databases had been incorrectly categorized and, in some subjects, newer subscriptions didn’t appear on the list. libguides version 2 allows users to sort databases by type, but doesn’t provide a predetermined list of types. in order to create the list of material types into which all databases would be sorted, the migration team examined lists found on other library web sites. several lists were combined and duplicates, or irrelevant types were removed. an additional military specific type was added to address the most common research conducted by midshipmen. then, the liaison librarians were solicited for input on the language used to describe each type and which databases should be tagged by each type. name choices are a matter of local preference, such as having a single type category for both dictionaries and encyclopedias, or two separate categories. to keep the list of material types to a manageable length, the team decided that each type must contain more than one or two databases. it takes time to get well defined lists of subjects and types. staff working with patrons are able to gather informal feedback about the categorizations in their current form, and make suggestions, corrections, or additions based on patron feedback. the migration of libguides and acquisition of libanswers provided the reference department and web team with an opportunity to update policies and establish new best practices for guide owners. one important cosmetic update included more encouragement for guide owners to use a photo in their profiles. profile pictures had been used inconsistently in the first libguides interface, and several guide owners used the default grey avatar. guide owners who were reluctant to have a headshot on their profile were encouraged to take advantage of stock photos made available through the naval academy’s public affairs office. a photo shoot was also organized for guide owners. on a voluntary basis, guide owners spent about an hour helping each other to take pictures in and around the library. the event helped to get a collection of more professional photos for guide owners to choose from. another important update was the re-evaluation of libguides policies in light of the new functionality available in version 2. 
the guide owners gathered for a meeting midway through the pre-migration guide cleanup process to troubleshoot problems and consider best practices for the up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 72 new interface. guide owners discussed the standardization of tab names in the guides, the information important to include in author profile boxes, and potential categories for the “types” dropdown in the a-z database list. the meeting provided a great opportunity to discuss the options available to guide owners and to solicit feedback on interface mock-ups and guide templates created by the systems librarian. many items from the discussion were incorporated into the update libguides policies for guide owners. libanswers and libchat teams integrating libanswers with libchat, an additional springshare product, at the same time as the migration to libguides version 2 is not necessary. because the acquisition of libanswers coincided with the need to upgrade to version 2, the library staff determined that the two should be done at the same time in order to minimize disruption for patrons. the ad hoc teams tasked with implementing libanswers and libchat met regularly to learn about the new products and to consider how these products would fit into the library’s existing points of service. while the libanswers and libchat teams began as two distinct groups, it became increasingly clear that the functionality of these two systems is interwoven so closely that they must be reviewed and discussed together. the teams spent considerable time learning the functionality of the new systems, considering how the new service points would integrate into the existing offerings, and creating draft policies to provide guidance to staff. the teams developed a set of tips and guidelines to address staff concerns and provide guidance on how the new system should be used (see appendix c). the teams also held training sessions focused on providing opportunities for staff to explore and practice using the new products. although the implementation of libanswers with libchat was not necessary to upgrade to libguides version 2, undertaking all of these upgrades at once allowed the ad hoc groups to collaborate with ease, define policies and procedures that would help these products integrate seamlessly with existing services, and prevent change fatigue within the library. updating the library website the final element of migration and implementation the teams had to consider was integration into the library’s existing web site. many elements of the library’s site are dictated by the broader university web policy and content management system. however, working within guidelines the teams were able to take advantage of the new libguides interface, especially the more robust a-z list of databases, to provide users with multiple ways of accessing the new tools. the library makes use of a tabbed box to provide entry to summon, the catalog, the list of databases and libguides. the new functionality of libguides version 2 enabled the team to provide easier access directly to the alphabetical listing of databases. the libguides tab was also updated to provide a drop down list of all the guides and a link to browse by guide owner, subject or type of guide. these enhancements saved time for the user and cut down on the number of clicks needed to access database content licensed by the library. 
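the subject, type, and "best bets" tagging gathered through the shared spreadsheets, and the alphabetical database listing exposed on the web site, together amount to a small structured data set. the python sketch below is a hypothetical illustration of how such a tagged a-z list could be sorted so that best-bet databases surface first for a subject; the database names, subjects, and tags are invented and are not the nimitz library's actual data.

```python
# hypothetical a-z data; names, subjects, and tags are invented for illustration.
databases = [
    {"name": "database a", "subjects": ["history"], "types": ["primary sources"],
     "best_bet_for": ["history"]},
    {"name": "database b", "subjects": ["history", "political science"],
     "types": ["journal articles"], "best_bet_for": []},
    {"name": "database c", "subjects": ["mechanical engineering"],
     "types": ["technical reports"], "best_bet_for": ["mechanical engineering"]},
]

def databases_by_subject(subject):
    """list databases for a subject, tagged 'best bets' first, then alphabetical."""
    matches = [d for d in databases if subject in d["subjects"]]
    matches.sort(key=lambda d: (subject not in d["best_bet_for"], d["name"]))
    return [d["name"] for d in matches]

print(databases_by_subject("history"))  # ['database a', 'database b']
```

sorting on the pair (not a best bet, name) keeps the tagged best bets at the top while leaving the rest alphabetical, mirroring the more prominent display described above.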
up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 73 integrating the libanswers product into the site was achieved by providing several different ways for patrons to access it. an faq tab was added to the main tabbed box to provide quick access to libanswers, complete with a link to submit questions. the “contact us” section on the site home page was updated to include a link to libanswers as well as newer, more modern icons for the different contact methods. all guide owners were instructed to update the contact information on their guides to include a libanswers widget. a great source of inspiration on integrating the tools into the library site came from looking at other library web sites. the teams worked from the list of libguides community members provided on the springshare help site and by viewing the sites of known peer libraries. working through an unfamiliar web site can be a quick way to find design ideas and work flows that are successful and attractive. team members found wording, icons and placement ideas that could be adapted for use on the nimitz library site. advice for managing a short migration timeline while on a short implementation timeline or with a small staff that has to accomplish this project in addition to their regular duties, it's important to consider a few strategies that can make the process simpler and less stressful. first, communicate expectations with everyone involved in the project at all steps of the process. determine which stakeholders need to know about the various checklists and upcoming deadlines. communicating needs and expectations throughout the entirety of the project reduces confusion and enables teams and individual guide owners to complete the project on time. although libguides had predominantly been the domain of the nimitz reference department, projects of this scale also impacted other parts of the library, from systems to the electronic resources librarian. email communication and short notices in the library’s weekly staff update were the primary means of communication with stakeholders. documents were shared via google drive to provide guide owners with a centralized file of help materials. also, the point of contact for questions with each element of the migration was clearly identified on each checklist and tip sheet. this single addition to the checklists helped guide owners to quickly and easily get questions and technical issues addressed. on a short timeline it is also important to consider the elements that are crucial for completion and those that can be delayed. some critical needs in a libguides migration include deleting guides that are no longer being used, checking for boxes that will not migrate and deleting bad links. these tasks must be completed by guide owners or administrators to ensure that the migrated data formats properly. careful attention to these tasks also save the staff unnecessary work on updating and fixing the new guides before going live. other elements of guide design and migration are merely nice to have. they complement the user’s experience with the final product but neglecting them will not affect basic functionality. these secondary tasks can be completed as time allows. 
for guide owners, optional tasks include shortening link descriptions, checking for a up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 74 guide description and friendly url and other general updates to the guides. the migration was broken into manageable tasks by giving guide owners a clear list of required and optional items. team leaders will also need to manage expectations. it can be difficult to remember that web pages, especially libguides, are living documents. they can be updated fairly easily after the system has gone live. on a short timeline, in the midst of other duties and responsibilities, it is acceptable for a guide to be just good enough. there is rarely enough time for each guide to reach a state of perfection prior to going live. a guide that is spell-checked and contains accurate information can be edited and made more aesthetically pleasing as time allows after the entire site has gone live. while additional edits are taking place, students still have access to the information they need for their academic work. lists, such as the subjects and material types in the a-z list, are always a work in progress based on feedback from service points and usability testing. updates and edits should be made as patrons interact with the products. regular use can help library staff identify problems with or confusion about the products that might not be anticipated prior to going live. stress on guide owners can be greatly reduced by communicating expectations throughout the process. post-implementation nimitz library successfully went live with both libguides version 2 and libanswers with libchat in early january 2016, right before midshipmen returned to campus for the spring semester. libanswers with libchat was introduced to the campus community with a soft launch at the beginning of the spring semester due to staffing levels and shifts at the reference desk. the librarian on duty at the reference desk was also responsible for answering any chats or libanswers questions initiated during their shift. the volume of questions remained fairly low during the semester. on average, the library received two synchronous and 1.5 asynchronous incoming questions per week via libanswers with libchat. the low volume was beneficial in that it allowed librarians to become familiar with answering questions and editing faqs. they were able to handle both face-to-face interactions with patrons in the library and the web traffic. however, the volume was so low that it became apparent more marketing of the service was needed. at the start of the fall 2016 semester, the library made an effort to increase awareness of the new libanswers products by emailing all students, mentioning the service in every instruction session, and creating fliers advertising the service and distributing them around the library. though data is preliminary, statistics have shown that use of these services has more than tripled in the first month of the new semester. as discussed above, the expedited implementation timeline forced the ad hoc teams to prioritize completion of the tasks that were necessary for the system to remain functional after the upgrade. this meant other necessary, but not urgent, updates to guides were left untouched during the migration. 
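one of the critical pre-migration tasks named above is deleting bad links. as a rough illustration of how a guide owner or administrator might produce that kind of bad-link report outside the platform, here is a minimal python sketch using only the standard library; the urls are placeholders, and this is not springshare's built-in link checker or the report the nimitz team actually received.

```python
# minimal bad-link report using only the standard library; urls are placeholders.
import urllib.error
import urllib.request

guide_links = [
    "https://example.edu/database-one",
    "https://example.edu/old-tutorial",
]

def check(url, timeout=10.0):
    """return a one-line status for a guide link."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"{url}\tok ({resp.status})"
    except urllib.error.HTTPError as err:
        return f"{url}\tbroken (http {err.code})"
    except OSError as err:  # urlerror, dns failures, timeouts, etc.
        return f"{url}\tunreachable ({err})"

for line in map(check, guide_links):
    print(line)
```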
given the amount of effort needed to prepare the guides for migration, it is understandable that guide owners had grown tired of making libguides updates and found it up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 75 necessary to move on to other projects. with this fatigue in mind, the team leaders will continue to remind guide authors that libguides are living pages in need of constant attention. the team leaders will also take advantage of user feedback to promote continued updates to libguides. throughout the migration process team leaders solicited feedback from staff and users in a variety of ways. first, reference staff wereinformed of design and implementation changes made throughout the migration. they were given time to view and evaluate the master guide template prior to the migration. the team solicited feedback on the names and organization of categories in the a-z list. after the products went live, the team gathered informal feedback through reference desk interviews, in information literacy instruction sessions and in conversations with faculty and students. student volunteers participated in usability testing during the spring semester. they were asked to complete a series of tasks related to the different aspects of the new interface. their feedback, especially from thinking aloud while completing the tasks, revealed to librarians how students actually use the guides. both formal and informal feedback helped librarians adapt and improve the guides. based on the feedback, the systems librarian made global changes to improve system functionality. in one instance, users were having difficulty submitting a new libanswers question when they could not find an appropriate faq response. the systems librarian made the “submit your question” link more prominent for users in that situation. the libguides continue to be evaluated by staff for currency and ease of use. in discussing the first round of usability test results it was determined that more testing during the fall semester of 2016 would be helpful. during the upgrade to version 2 and implementation of libanswers with libchat, librarians focused on the functions in the system that were most essential or most desired. all of these products contain additional functionality that was not implemented during the upgrade. after a brief rest, the reference department and library web team explored the products’ additional functionality and determined what avenues to explore next. conclusions migration of any platform can be an extensive and time consuming task for library staff. preparations and post-migration clean up can interrupt staff workflows and strain limited resources. using migration teams was a successful strategy on a short timeline because it helped spread the workload by delegating specific learning and tasks to specific people. those people, in turn, became experts in their area of focus and served as a resource for others in the library. this model cultivated a sense of ownership in the migration across many stakeholders that might not have otherwise existed. that sense of ownership in the project, coupled with checklists and spreadsheets full of discrete tasks in need of completion made it possible for a small staff to complete the migration quickly and successfully. migrating on a short timeline can be especially stressful but careful planning and good communication of expectations helps stakeholders focus on the end goal. 
upon completion of the project there was a very real sense of fatigue with this up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 76 project. as a result, tasks that were listed as optional because they weren’t critical for migration went unattended for quite some time after the migration. slowly, months later, guide owners are ready to revisit guides and continue making improvements. if given more time, this migration may have been completed more methodically and with the intent of having everything perfect before moving on to the next step. instead, working on a tight timeline forced us to continue moving forward, making necessary changes, and making note of changes to be made in the future. ultimately, it was a constant reminder that our online presence is and should be a constant work in progress, not the subject of a big, occasional update. references 1. luke f. gadreau, “migration checklist for guide owners,” last modified april 3, 2015, https://wiki.harvard.edu/confluence/display/lg2/migration+checklist+for+guide+owners; leeanne morrow et al., “best practice guide for libguides,” accessed november 17, 2016, http://libguides.ucalgary.ca/c.php?g=255392&p=1703394; rebecca payne, “updating libguides & preparing for libguides v2,” last modified november 18, 2014, https://wiki.doit.wisc.edu/confluence/pages/viewpage.action?pageid=85630373; julia furay, “libguides presentation: migrating from v1 to v2 (julia),” last modified september 29, 2015, http://guides.cuny.edu/presentation/migration. 2. anna burke, “libguides 2: content migration is here!” last modified april 30, 2014, http://blog.springshare.com/2014/04/30/libguides-2-content-migration-is-here/; springshare, “on your checklist: five tips & tricks for migrating to libguides v2,” last modified february 18, 2016; http://buzz.springshare.com/springynews/news-27/springytips; springshare, “migrating to libguides v2(and going live!),” last modified november 7, 2016, http://help.springshare.com/libguides/update/whyupdate. 3. lauren mckeen and john hernandez, “moving mountains: surviving the migration to libguides 2.0,” online searcher 39 (2015): 16-21, http://www.infotoday.com/onlinesearcher/articles/features/moving-mountains-survivingthe-migration-to-libguides--102367.shtml. 4. vicky duncan et al., “implementing libguides 2: an academic case study,” journal of electronic resources librarianship, 27 (2015): 248-258, https://dx.doi.org/10.1080/1941126x.2015.1092351 5. jimmy ghaphery and erin white, “library use of web-based research guides,” information technology and libraries 31 (2012): 21-31, http://dx.doi.org/10.6017/ital.v31i1.1830; danielle a becker; “libguides remakes: how to get the look you want without rebuilding your website,” computers in libraries 34 (2014): 19-22, http://www.infotoday.com/cilmag/jun14/index.shtml; michal strutin, “making research guides up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 77 more useful and more well used,” issues in science and technology librarianship 55(2008), https://dx.doi.org/10.5062/f4m61h5k. 6. ning han and susan l. hall, “think globally! 
enhancing the international student experience with libguides,” journal of electronic resources librarianship 24(2012): 288-297, https://dx.doi.org/10.1080/1941126x.2012.732512; gabriela castro gessner et al., “are you reaching your audience?: the intersection between libguide authors and libguide users,” reference services review 43(2015): 491-508, http://dx.doi.org/10.1108/rsr-02-2015-0010. 7. luigina vileno, “testing the usability of two online research guides,” partnership: the canadian journal of library and information practice and research 5 (2012), https://dx.doi.org/10.21083/partnership.v5i2.1235; rachel hungerford et., “libguides usability testing: customizing a product to work for your users,” http://hdl.handle.net/1773/17101; alec sonsteby and jennifer dejonghe, “usability testing, user-centered design, and libguides subject guides: a case study,” journal of web librarianship 7(2013): 83-94, https://dx.doi.org/10.1080/19322909.2013.747366. 8. mardi mahaffy, “student use of library research guides following library instruction,” communications in information literacy 6(2012): 202-213, http://www.comminfolit.org/index.php?journal=cil&page=article&op=view&path%5b%5d=v 6i2p202. 9. kate a pittsley and sara memmot, “improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides,” information technology and libraries 31 (2012): 52-64, https://dx.doi.org/10.6017/ital.v31i3.1880. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 78 appendix a: libguides pre-migration checklist if there are issues, contact the head of reference & instruction. required before migration: due date task check when complete 26 october 2015 review attached report of guides that have not been updated in the last year. delete or consolidate unneeded, practice, or backup guides.* 26 october 2015 review attached report of guides with fewer than 500 hits. delete or consolidate unneeded, practice, or backup guides.* 26 october 2015 review all links to all databases included on your guides and make sure the links are mapped to the a-z list. 26 october 2015 review all guides for links not included in the current a-z list. list any links that you think should be included in the a-z list moving forward on the shared spreadsheet (a-z additions and best bets). be sure to include all necessary information, including subject and type. mid october 2015 & 28 october 2015 review forthcoming reports about broken links. anticipate one report on october 13, and one october 26. 26 october 2015 review the databases by subject page of the a-z list and make sure everything that should be included in your subject is there. add anything you’d like removed from your subject to up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 79 the shared spreadsheet (tab 2). identify 3 “best bets” databases for each of your subject areas on the shared spreadsheet (tab 3). 26 october 2015 ensure all images have an alt tag 26 october 2015 delete outdated or unused images in your image collection 26 october 2015 convert all tables to percentages, not pixels 26 october 2015 review attached report of boxes that will not migrate into version 2. 
(this won’t apply to everyone) 26 october 2015 email the chair of the web team if you have guides with boxes containing custom formatting or code (this is only necessary if you manually adjusted the html or css, or use a tabs within a box on your guide). we are keeping a master list to double check after migration. 26 october 2015 check all links to the catalog in your guides to make sure they are accurate 26 october 2015 check all widgets (like catalog search boxes) to ensure they function properly, delete any widgets you don’t need, and keep a list of widgets to check post-migration to make sure they still function. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 80 optional before migration: due date task check when complete consider turning links in ‘rich text’ boxes to a ‘links and list’ box. review all guides to ensure they have a friendly url, are assigned to a subject, have assigned tags, and a brief guide description. shorten database descriptions to one to two sentences. consider including dates of coverage and why it’s useful for this particular subject. helpful hints: *if you’d like to hold on to content from guides you plan to delete, create an unpublished “master guide” where you can store content you plan to use in the future. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 81 appendix b: libguides post-migration checklist and guide clean up note: now that migration is complete, if you make an update to your version 1 guides, your change will not transfer to version 2. this means broken links will need to be fixed in both versions. if there are issues or questions contact the head of reference & instruction (general questions), the systems librarian (technical issues), or the electronic resources librarian (database assets and a-z list). clean up and check content 1) check boxes to make sure content is correctly displayed on all your guides. check all boxes closely, as some had the header end up below the first bullet point. for example: to fix an issue like this click on at the bottom of the box you are working on. then click on “reorder content”. you can move the links down and the text up 2) ensure all guides have a friendly url, are assigned to a subject, have assigned tags if you didn’t do this pre-migration. see the premigration handout for help. in version 2 this information will display at the top of our guides in edit mode and at the bottom of our guides on the public interface. 3) ensure images are resized to fit general web guidelines see this guide for help http://guidefaq.com/a.php?qid=12922 4) check all your widgets to ensure they still function properly 5) add a guide type to each of your guides. this is a new feature in libguides version 2. it is under the gear on the right side of your guide while in edit mode. this will help us sort and organize them in the list of guides. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 82 add new libguides 2 content 1) make a box pointing to related guides. research has shown that a box on the guide home page pointing to related guides can be very helpful to students. link to other subject guides that would be of interest and any course guides for that subject. 
for example: the box on the mechanical engineering guide contains links to em215 and nuclear engineering (which is part of the mechanical engineering department). to do this go to the bottom of your welcome box, click the add/reorder button, and then on guide list, your first option is to manually choose guides to add to the list. 2) add a tab to every guide that is named citing your sources and redirects to the citing your sources libguide. to do this: a. create a blank page named citing your sources at the bottom of your left side navigation b. on your blank page click on the to open the options for editing the page. c. click on redirect url and paste the link to the citing sources guide in the box. d. it is also a good idea to mark the open in a new window box as well e. if you’ve completed it successfully your citing your sources tab will look like this in edit mode. since the citing your sources guide is still a work in progress it is unpublished and you will get an error when you preview it. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 83 f. finally, remove the plagiarism and citing sources box from your guides. 3) now is a good time to take advantage of new functionality and to update the content of your guides. you can now combine multiple types of information into the same box, you can also take advantage of tabbed boxes. see this libguide for further assistance: http://support.springshare.com/libguides/migration/v2cleanup-regular 4) create your new profile box at the meeting on oct 20th, the reference & instruction department agreed that the following elements should be consistent in the profile box: box name: librarian image: a stock photo or a personal photo (picture day coming soon) in the contact box: title nimitz library xxxx dept. office # xxx 410-293-xxxx email address and your subjects will be displayed below up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 84 appendix c: tips & guidelines for libanswers with libchat what modes of inquiry will be available to users? using the libanswers platform, users will be able to submit questions via chat or by using the question form within libanswers. users will also be able to ask questions as they did before: at the reference desk, via askref@usna.edu, and by calling 410-293-2420. what are “best practices” or guidelines for libanswers w/ libchat? see the tips for responding to tickets at the bottom of this document. see the tips for creating/maintaining faq at the bottom of this document. see the tips for responding to chat questions at the bottom of this document. what priority should i give responses coming through various modes of inquiry? reference staff will have to use their professional judgement when deciding what priority to give questions coming in through various modes of inquiry. while the addition of chat and tickets may seem overwhelming at first, the same rules you’ve applied in the past will work. if a chat comes in while you’re helping someone face-to-face, use that as an opportunity to advertise the chat service. explain to the patron that you also help users via chat and you’re going to let the chatter know that you’ll be with them shortly. the same can apply if you’re finishing up a chat when a face-to-face user walks up. simply explain that the library also offers a chat service and you’re just finishing up a question. 
remember to get comfortable with and take advantage of the canned messages in chat, let the phone go to voicemail if necessary, and explain to face-to-face users what’s happening. during the pilot phase you should also keep track of strategies that worked well for you, or times when the various modes of inquiry became too overwhelming. we’ll take all of that into consideration when we reexamine this service. chat, phone, and face-to-face interactions are synchronous modes of communication, so users expect responses immediately. tickets are asynchronous modes of communication and should be dealt with on a first come, first served basis. respond to tickets when you have time. when responding to tickets, respond to the oldest tickets first as that user has been waiting the longest for an answer. however, feel free to use your judgement and, if you choose, respond to questions with quick answers right away. how should i prioritize questions from usna v. non-usna users? priority should be given to midshipmen, faculty, and staff. if an outside user makes use of the chat or ticket service, feel free to explain to them that this service is primarily for faculty/staff/students and they should direct their question to askref@usna.edu. if you are free and have time, feel free to assist outside patrons via the chat or ticket system. up against the clock: migrating to libguides v2 on a tight timeline | buljung and johnson https://doi.org/10.6017/ital.v36i2.9585 85 how should i handle remaining questions during a change in shifts? handle them in the same manner that you would a face-to-face question with a student, faculty, or staff member. finish up quickly if you can, advise the patron that you need to leave and offer to handle the question when you return, or transfer the chat to another librarian. if there are remaining tickets in the queue, simply notify the next librarian on duty. what are the expected turnaround times for responding to patron inquiries? chat, face-to-face, and phone inquiries should be responded to as immediately as possible. tickets should be responded to within a business day. who can i contact for help and troubleshooting? if you have questions, your first stop should be the libanswers faq, provided by springshare (available in the “help” section when logged into libapps). if you can’t find the answer to your question there, feel free to contact the head of reference and instruction, who will work to resolve the problem with you. guidelines for responding to libanswers tickets*: ● keep in mind that when you are responding to tickets, you are a jack of all trades. that means even if the question is outside of your subject area, you should do your best to provide the user information that will get them started. in that email you may also suggest that the user contact the subject specialist. ● respond to libanswers tickets in the same way you would respond to an email inquiry from a user. ● if you provide a factual response, be sure to include the source from which that information came. guidelines for creating/maintaining faqs*: ● the faq database is a public-facing, searchable collection of questions and answers. the intent is to empower our users to find their answers. any question that might be considered a frequently asked question should be included in the faq. this might include questions about the library, the collections, how to find specific types of information, how to start research on specific and recurring assignments etc. 
● when creating an faq from a ticket, remember that you can edit the question. do your best to format the question in a way that would be applicable and relevant to the most users. ● when creating an faq from a response you've already written, be sure to edit out any personally identifiable information (pii) about the person who initially asked the question. be sure to check the question and response for any pii. ● if you want to modify an faq: if a member of the staff notices incomplete or incorrect information in an faq response, he/she should use professional judgement in deciding how to handle the situation. if it's an error that may have been caused by a typo, he/she may choose to edit the response immediately. however, if the edit impacts the substantive content of the response, he/she may choose to consult with the librarian who initially wrote the response. guidelines for libchat*: ● if you refer a question, alert the librarian to whom the user is being referred. ● remember the person you're chatting with can't see you, so if you leave (to conduct a search, to check a book, to help someone else, etc.) let them know you'll be right back. ● sometimes chat questions can seem rushed, so it may be tempting to answer only the initial question. remember, like face-to-face interactions, clarifying queries save time for the user and the librarian, allowing for the provision of more accurate and efficient answers. ● when providing responses, remember that as an academic library, our mission is to provide the information needed and to instruct our users so they may become self-reliant; chat challenges us to balance providing answers and instruction. do your best to find an appropriate balance. ● as the transaction is ending, remain courteous, check that all the user's questions have been addressed, and encourage them to use the service again. * note: these guidelines are drafts and will evolve as the staff learns more about this system throughout the pilot phase. name-title entry retrieval from a marc file. philip l. long, head, automated systems research and development, and frederick g. kilgour, director, ohio college library center, columbus, ohio. a test was made of the validity of earlier findings on 3,3 search-key retrieval from an in-process file, applied here to retrieval from a marc file. the probability of the number of entries retrieved per reply is essentially the same for both files. this study was undertaken to test the applicability of previous findings on retrieval of name-title entries from a technical processing system file (1) to retrieval from a marc file; the technique for retrieval employs truncated 3,3 search keys. materials and methods. the study cited above employed a file of 132,808 name-title entries obtained from the yale university library's machine-aided technical processing system. bibliographic control was not maintained for the generation of records in this file, with the result that the file contained errors that simulated errors in the requests library users put to catalogs. the marc file employed in the present study contains 121,588 name-title entries that are nearly error free. whereas the marc file possesses few records bearing foreign titles, the yale file has a significantly higher percentage of such titles, as would be expected for a large university library.
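the truncated 3,3 search key at the center of both studies pairs the first three characters of the name with the first three characters of the title. the python sketch below illustrates one plausible derivation; the normalization shown (lowercasing, dropping punctuation, stripping only english initial articles) is an assumption for illustration, and the exact rules used in the yale and marc programs may have differed.

```python
# approximate sketch of a truncated 3,3 name-title search key; the exact
# normalization used in the yale and marc programs may have differed.
ENGLISH_ARTICLES = ("the ", "a ", "an ")

def normalize(text):
    """lowercase and keep only letters, digits, and spaces."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch == " ").strip()

def search_key_3_3(name, title):
    title = normalize(title)
    for article in ENGLISH_ARTICLES:  # only english initial articles, as in the marc test
        if title.startswith(article):
            title = title[len(article):]
            break
    name = normalize(name).replace(" ", "")
    title = title.replace(" ", "")
    return f"{name[:3]},{title[:3]}"

print(search_key_3_3("kilgour, frederick g.", "the economics of book catalog production"))
# -> 'kil,eco'
```

grouping a file of entries by such keys and counting how many entries share each key is what produces the entries-per-reply distribution reported in table 1 below.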
initial articles were deleted in yale titles, but only english articles in marc titles, because the language of foreign-language titles is not identified in marc. design of the program used to analyze the marc file was the same as that for the program employed in the previous study. however, the new program runs on a xerox data systems sigma 5 computer. the test employed the 3,3 search key to make possible comparison with previous results. results. table 1 presents the percentage of time that up to five replies can be expected, assuming equal likelihood of key choice. inspection of the table reveals that there is no significant difference between the findings from the yale and the marc files.

table 1. probability of number of entries per reply using 3,3 search key

number of replies    cumulative probability percentage
                     yale file    marc file
1                    78.58        79.98
2                    92.75        93.28
3                    96.83        96.93
4                    98.40        98.26
5                    99.08        98.91

discussion. the same result was expected for the marc file that had been obtained earlier from the yale file. possible influences that might have led to different results were the existence of errors in the yale file, a significant proportion of foreign titles in the yale file as compared to the nearly all-english marc file, and the inability to mechanically delete the initial articles in the few foreign-language marc titles. it is most unlikely that the effects of these differences are masking one another. conclusion. the findings of a previous study on the effectiveness of retrieval of entries from a large bibliographic file (1) by use of a truncated 3,3 search key have been confirmed for a similarly large marc file. reference. 1. kilgour, frederick g.; long, philip l.; leiderman, eugene b.: "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7 (1970): 79-81. technology skills in the workplace: information professionals' current use and future aspirations. monica maceli and john j. burke, information technology and libraries | december 2016. abstract. information technology serves as an essential tool for today's information professional, and ongoing research is needed to assess the technological directions of the field over time. this paper presents the results of a survey of the technologies used by library and information science practitioners, with attention to the combinations of technologies employed and the technology skills that practitioners wish to learn. the most common technologies employed were email, office productivity tools, web browsers, library catalog- and database-searching tools, and printers, with programming topping the list of most-desired technology skills to learn. similar technology usage patterns were observed for early- and later-career practitioners. findings also suggested the relative rarity of emerging technologies, such as the makerspace, in current practice. introduction. over the past several decades, technology has rapidly moved from a specialized set of tools to an indispensable element of the library and information science (lis) workplace, and today it is woven throughout all aspects of librarianship and the information professions.
information professionals engage with technology in traditional ways, such as working with integrated library systems, and in new innovative activities, such as mobile-app development or the creation of makerspaces.1 the vital role of technology has motivated a growing body of research literature, exploring the application of technology tools in the workplace, as well as within lis education, to effectively prepare tech-savvy practitioners. such work is instrumental to the progression of the field, and with the rapidly-changing technological landscape, requires ongoing attention from the research community. one of the most valuable perspectives in such research is that of the current practitioner. understanding current information professionals’ technology use can help in understanding the role and shape of the lis field, provide a baseline for related research efforts, and suggest future monica maceli (mmaceli@pratt.edu) is assistant professor, school of information, pratt institute, new york. john j. burke (burkejj@miamioh.edu) is library director and principal librarian, gardner-harvey library, miami university middletown, middletown, ohio. technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 36 directions. the practitioner perspective is also valuable in separating the hype that often surrounds emerging technologies from the reality of their use and application within the lis field. this paper presents the results of a survey of lis practitioners, oriented toward understanding the participants’ current technology use and future technology aspirations. the guiding research questions for this work are as follows: 1. what combinations of technology skillsets do lis practitioners commonly use? 2. what combinations of technology skillsets do lis practitioners desire to learn? 3. what technology skillsets do newer lis practitioners use and desire to learn as compared to those with ten-plus years of experience in the field? literature review the growth and increasing diversity of technologies used in library settings has been matched by a desire to explore how these technologies impact expectations for lis practitioner skill sets. triumph and beile examined the academic library job market in 2011 by describing the required qualifications for 957 positions posted on the ala joblist and arl job announcements websites.2 the authors also compared their results with similar studies conducted in 1996 and 1988 to see if they could track changes in requirements over a twenty-three-year period. they found that the number of distinct job titles increased in each survey because of the addition of new technologies to the library work environment that require positions focused on handling them. the comparison also found that computer skills as a position requirement increased by 100 percent between 1988 and 2011, with 55 percent of 2011 announcements requiring them. looking more deeply at the technology requirements specifically, mathews and pardue conducted a content analysis of 620 jobs ads from the ala joblist to identify skills required in those positions.3 the top technology competencies required were web development, project management, systems development, systems applications, networking, and programming languages. they found a significant overlap of librarian skill sets with those of it professionals, particularly in the areas of web development, project management, and information systems. 
riley-huff and rholes found that the most commonly sought technology-related job titles were systems/automation librarian, digital librarian, emerging and instructional technology librarian, web services/development librarian, and electronic resources librarian.4 a few years later, maceli added to this list with newly popular technology-relating titles, including emerging technologies librarian, metadata librarian, and user experience/architect librarian.5 beyond examining which specific technologies librarians should be able to use, researchers have also pondered whether a list of skills is even possible to create. crawford synthesized a series of blog posts from various authors to discuss which technology skills are essential and which are too specialized to serve as minimum technology requirements for librarians.6 he questioned whether universal skill sets should be established given the variety of tasks within libraries and the unique backgrounds of each library worker. crawford also questioned the expectation that every librarian information technology and libraries | december 2016 37 will have a broad array of technology skills from programming to video editing to game design and device troubleshooting. partridge et al. reported on a series of focus groups held with 76 librarians that examined the skills required for members of the profession, especially those addressing technology.7 in the questions they asked the focus groups, the authors focused on the term “library 2.0” and attempted to gather suggestions on skills that current and future librarians need to assist users. they concluded that the groups identified that a change in attitudes by librarians was more important to future library service than the acquisition of skills with specific technology tools. importance was given to librarians’ abilities to stay aware of technological changes, be resilient and reflective in the face of them, and to communicate regularly and clearly with the members of their communities. another area examined in the studies is where the acquisition of technology skills should and does happen for librarians. riley-huff and rholes reported on a dual approach to measure librarians’ preparation for performing technology-related tasks.8 the authors assessed course offerings for lis programs to see if they included sufficient technology preparation for new graduates to succeed in the workplace. they then surveyed lis practitioners and administrators to learn how they acquired their skills and how difficult it is to find candidates with enough technology preparation for library positions. their findings suggest that while lis programs offer many technology courses, they lack standardization, and graduates of any program cannot be expected to have a broad education in library technologies. further research confirmed this troubling lack of consistency in technology-related curricula. singh and mehra assessed a variety of stakeholders, including students, employers, educators, and professional organizations, finding widespread concern about the coverage of technology topics in lis curricula.9 despite inconsistencies between individual programs, several studies provided a holistic view of the popular technology offerings within lis curricula. 
programs commonly offered one or more introductory technology courses, as well as courses in database design and development, web design and development, digital libraries, systems analysis, and metadata.10,11,12 as researchers have emphasized from a variety of perspectives, new graduates could not realistically be expected to know every technology with application to the field of information.13 there was widespread acknowledgement that learning in this area can, and must, continue in a lifelong fashion throughout one’s career. riley-huff and rholes reported that lis practitioners saw their own experiences involving continuing skill development on the job, both before and after taking on a technology role.14 however, literature going back many decades suggests that the increasing need for continuing education in information technology has generally not been matched by increasing organizational support for these ventures. numerous deterrents to continuing technology education were noted, including lack of time,15 organizational climate, and the perception of one’s age.16 while studies in this area have primarily focused on mls-level positions, jones reported on academic library support staff members and their perceptions of technology use over a ten-year period and found that increased technology responsibilities added technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 38 to workloads and increased workplace stress.17 respondents noted that increasing use of technology in their libraries has increased their individual workloads along with the range of responsibilities that they hold. method to build an understanding of the research questions stated above, which focus on the technologies currently used by information professionals and those they desired to learn, we designed and administered a thirteen-question anonymous survey (see appendix) to the subscribers of thirty library-focused electronic discussion groups between february 25 and march 13, 2015. the groups were chosen to target respondents employed in multiple types of libraries (academic, public, school, and special) with a wide array of roles in their libraries (public services librarians, systems staff members, catalogers, and so on). we solicited respondents with an email sent to the groups asking for their participation in the survey and with the promise to post initial results to the same groups. the survey included closed and open-ended questions oriented toward understanding current technology use and future aspirations as well as capturing demographics useful in interpreting and generalizing the results. the survey questions have been previously used and iteratively expanded over time by the second author, first in the fall of 2008, then spring of 2012, with summative results presented in the last three editions of the neal-schuman library technology companion. we obtained a total of 2,216 responses to the question, “which of the following technologies or technology skills are you expected to use in your job on a regular basis?” of these responses, 1,488 (67 percent) of the respondents answered the question regarding technologies they would like to learn: “what technology skill would you like to learn to help you do your job better?” we conducted basic reporting of response frequency for closed questions to assess and report the demographics of the respondents. 
to analyze the open-ended survey question results in greater depth, we conducted a textual analysis using the r statistical package (https://www.r-project.org/). we used the tm (text mining) package in r (http://cran.rproject.org/package=tm) to calculate frequency, correlation of terms, generate plots, and cluster terms. results the following section will first present an overview of survey responses and respondents, and then explore results as related to the stated research questions. the lis practitioners who responded to the survey reported that their libraries are located in forty us states, eight canadian provinces, and forty-three other countries. academic libraries were the most common type of library represented, followed by public, school, special, and other (see table 1).

table 1. the types of libraries in which survey respondents work
library type | number of respondents | percentage of all respondents
academic | 1,206 | 54.4
public | 545 | 24.6
school | 266 | 12.0
special | 138 | 6.2
other | 61 | 2.8

respondents also provided their highest level of education. a total of 77 percent of responding lis practitioners have earned a library-related or other master's degrees, dual master's degrees, or doctoral degrees. from these reported levels of education, it is likely that more respondents are in librarian positions than in library support staff positions. however, individuals with master's degrees serve in various roles in library organizations, so the percentage of graduate degree holders may not map exactly to the percentage of individuals in positions that require those degrees. significantly fewer respondents (16 percent) reported holding a high school diploma, some college credit, an associate degree, or a bachelor's degree as their highest level of education. another aspect we measured in the survey was tasks that respondents performed on a regular basis. the range of tasks provided in the survey allowed for a clearer analysis of job responsibilities than broad categories of library work such as "public services" or "technical services." some respondents appeared to be employed in solo librarian environments where they are performing several roles. even respondents who might have more focused job titles such as "reference librarian" or "cataloger" may be performing tasks that overlap traditional roles and categories of library work. the tasks offered in the survey and the responses to each are shown in table 2.
table 2. tasks performed on a regular basis by survey respondents
task | number of respondents | percentage of respondents
reference | 1,404 | 63.4
instruction | 1,296 | 58.5
collection development | 1,260 | 56.9
circulation | 917 | 41.4
cataloging | 905 | 40.8
electronic resource management | 835 | 37.7
acquisitions | 789 | 35.6
user experience | 775 | 35.0
library administration | 769 | 34.7
outreach | 758 | 34.2
marketing/public relations | 722 | 32.6
library/it systems | 672 | 30.3
periodicals/serials | 659 | 29.7
media/audiovisuals | 566 | 25.5
interlibrary loan | 518 | 23.4
distance library services | 474 | 21.4
archives/special collections | 437 | 19.0
other | 209 | 9.4

while public services-related activities lead the list, with reference, instruction, collection development, and circulation as the top four task areas, technical services-related activities are well represented; the next three in rank are cataloging, electronic resource management, and acquisitions. the overall list of tasks shows the diversity of work lis practitioners engage in, as each respondent chose an average of six tasks. the results also suggest that the survey respondents are well acquainted with a wide variety of library work rather than only having experience in a few areas, making their uses of technology more representative of the broader library world. the survey also questioned the barriers lis practitioners face as they try to add more technology to their libraries, and 2,161 respondents replied to the question, "which of the following are barriers to new technology adoption in your library?" financial considerations proved to be the most common barrier, with "budget" chosen by 80.7 percent of respondents, followed by "lack of staff time" (62.4 percent), "lack of staff with appropriate skill sets" (48.5 percent), and "administrative restrictions" (36.7 percent). what combinations of technology skillsets do lis practitioners commonly use? responses from survey question 8, "which of the following technologies or technology skills are you expected to use in your job on a regular basis?," were analyzed to build an understanding of this research question. a total of 2,216 responses to this question were received. survey respondents were asked to select from a detailed list of technologies/skills (visible in question 8 of the appendix) that they regularly used. the top answers respondents chose for this question were: email, word processing, web browser, library catalog (public side), and library database searching. the full list of the top twenty-five technology skills and tools used is detailed in figure 1, with the list of the bottom fifteen technology skills used presented in figure 2.

figure 1. top twenty-five technology skills/tools used by respondents (n = 2,216), in descending order of use: email, word processing, web browser, library catalog public side, library database searching, spreadsheets, printers, web searching, teaching others to use technology, presentation software, windows os, laptops, scanners, library management system staff side, downloadable ebooks, web based ebook collections, cloud based storage, technology troubleshooting, teaching using technology, online instructional materials/products, tablets, web video conferencing, educational copyright knowledge, library website creation or management, and cloud-based productivity apps

figure 2. bottom fifteen technology skills/tools used by respondents (n = 2,216): mac os, audio recording and editing, technology equipment installation, computer programming or coding, assistive adaptive technology, rfid, chromebooks, network management, server management, statistical analysis software, makerspace technologies, linux, 3d printers, augmented reality, and virtual reality

text analysis techniques were then used to determine the frequent combinations of technology skills used in practice. first, a clustering approach was taken to visualize the most popular technologies that were commonly used in combination (figure 3). clustering helps in organizing and categorizing a large dataset when the categories are not known in advance, and, when plotted in a dendrogram chart, assists in visualizing these commonly co-occurring terms.
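a minimal r sketch of the clustering step just described, using the tm package named in the method section; the object name skill_text, the sparsity cutoff, and the distance and linkage settings are illustrative assumptions rather than the authors' actual script.

```r
# illustrative sketch: cluster commonly co-occurring technology skills with the tm package.
# assumes skill_text is a character vector, one element per respondent, listing the
# skill tokens each respondent selected (e.g., "email word_processing web_browser printers").
library(tm)

corpus <- VCorpus(VectorSource(skill_text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, stripWhitespace)

# document-term matrix of respondents x skills, keeping only frequently reported skills
dtm <- DocumentTermMatrix(corpus)
dtm <- removeSparseTerms(dtm, 0.90)

# hierarchical clustering on distances between skill columns; the dendrogram groups
# skills that tend to be selected together by the same respondents
skill_matrix <- t(as.matrix(dtm))          # rows = skills, columns = respondents
fit <- hclust(dist(skill_matrix, method = "euclidean"), method = "ward.D2")
plot(fit, main = "co-occurring technology skills")
```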
the authors numbered the clusters identified in figure 3 for ease of reference. from left to right, the first cluster focuses on communication and educational tools, the second emphasizes devices and software, the third contains web and multimedia creation tools, the fourth contains office productivity and public-facing information retrieval tools, and the fifth cluster has a diverse collection of responsibilities including systems-oriented responsibilities (from operating systems to specific hardware devices), working with ebooks, teaching with technology, and teaching technology to others.

figure 3. cluster analysis of most frequent technology skills used in practice, with red outlines on each numbered cluster

notably, the list of top skills used (figure 1) falls more on the end-user side of technology; skills more oriented toward systems work (e.g., linux, server management, computer programming or coding) were less frequently mentioned, and several were among the lowest reported (figure 2). of the 2,216 respondents, 15 percent used programming or coding skills regularly in their job (which is of interest as programming or coding was the skill most desired to learn by respondents; this will be discussed further in the context of the next research question). plotting the correlations between the more advanced technology skillsets can provide a picture of the work such systems-oriented positions are commonly responsible for, particularly as they are less well represented in the responses as a whole. figure 4 plots the correlated terms for those tasked with "server management." it is fair to assume someone with such responsibilities falls on the highly technical end of the spectrum.

figure 4. terms correlated with "server management," indicating commonly co-occurring workplace technologies for highly technical positions

the more common task of "library website creation or management," which fell to those with a broad level of technological expertise, had numerous correlated terms. figure 5 demonstrates a wide array of technology tools and responsibilities.

figure 5. terms correlated with "library website creation or management," indicating commonly co-occurring technologies used on the job

and lastly, teaching using technology and teaching technology to others is a long-standing responsibility of librarians and library staff. the following plot (figure 6) presents the skills correlated with "teaching others to use technology."

figure 6. terms correlated with "teaching others to use technology," indicating commonly co-occurring technologies used on the job

what combinations of technology skillsets do lis practitioners desire to learn? we analyzed responses to survey question 10, "what technology skill would you like to learn to help you do your job better?," to explore this research question.
as summarized in burke18—and consistent with the prior year's findings—coding or programming remained the most desired technology skillset, mentioned by 19 percent of respondents. the raw text analysis yielded a fuller list of the top terms mentioned by participants (table 3 and visualized in figure 7).

table 3. terms mentioned by 5 percent or more of survey respondents
technology term | number of respondents | percentage of respondents
coding or programming (combined for reporting) | 292 | 19.59
web | 178 | 11.96
software | 158 | 10.62
video | 112 | 7.53
apps | 106 | 7.12
editing | 105 | 7.06
design | 85 | 5.71
database | 76 | 5.11

figure 7. wordcloud of responses to "what technology skill would you like to learn to help you do your job better?"

we then explored the deeper context of responses and individually analyzed responses specific to the more popular technology desires. first, we assessed the responses mentioning the desire to learn coding or programming. of these responses, the most common specific technologies mentioned were html, python, css, javascript, ruby, and sql, listed in decreasing order of interest. although most participants did not describe what they would like to do with their desired coding or programming skills, of those that did, the responses indicated interest in
● becoming more empowered to solve their own technology problems (e.g., "i would like to learn the [programming languages] so i don't have to rely on others to help with our website," "i'm one of the most tech-skilled people at my library, but i'd like to be able to build more of my own tools and manage systems without needing someone from it or outside support.");
● improving communication with it (e.g., "how to speak code, to aid in communication with it," "to better identify problems and work with it to fix them");
● creating novel tools and improving system interoperability (e.g., "coding for app and api creation"); and
● bringing new technologies to their library and patrons (e.g., "coding so that i can incorporate a hackerspace in my library").
next, we took a clustering approach to visualize the terms commonly desired in combination. figure 8 describes the clustered terms that we found within the programming or coding responses. the terms "programming" and "coding" form a distinct cluster to the right of the diagram, indicating that many responses contained only those two terms.

figure 8. clustering of terms present in responses indicating the desire to learn coding or programming

the remaining portion of the diagram begins to illustrate the specific technologies mentioned for those respondents that answered in greater detail or expanded on their general answer of programming or coding. other related desired technology-skill areas become apparent: database management, html and css (as well as the more general "web design," which appeared in the top terms in table 3), php and javascript, python and sql, and xml creation, among others. the bulleted list presented in the previous paragraph illustrates some of the potential applications participants envisioned these skills being useful in, but the majority did not provide this level of detail in their response.
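the frequency count behind table 3 and the wordcloud in figure 7 could be produced along these lines; a minimal sketch assuming a character vector learn_text of the open-ended responses and the add-on wordcloud package, with illustrative frequency cutoffs.

```r
# illustrative sketch of the term-frequency count and wordcloud for the open-ended
# "what technology skill would you like to learn..." responses. learn_text and the
# cutoffs below are assumptions for the example, not the published analysis script.
library(tm)
library(wordcloud)

corpus <- VCorpus(VectorSource(learn_text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# terms appearing at least as often as 5 percent of respondents (cf. table 3)
head(freq[freq >= 0.05 * length(learn_text)])

# wordcloud of the most frequently mentioned terms (cf. figure 7)
set.seed(42)
wordcloud(names(freq), freq, min.freq = 25, random.order = FALSE)
```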
editing was another prominent term that appeared across participant responses and was largely meant in the context of video editing. because of the vagueness of the term “editing,” a closer look was necessary to determine other technology desires. looking at terms highly correlated with “editing” revealed both video and photo editing to be important to respondents. several of the topappearing terms were used more generally: “database” and mobile “apps” were mentioned without specifying the technology tool or scenario of use, such that a more contextual analysis could not be conducted. these responses can be particularly difficult to interpret as the term “databases” can have a technical meaning (e.g., working with sql) or it can refer to the use of library databases from an end user perspective. information technology and libraries | december 2016 49 what technology skillsets do newer lis practitioners use and desire to learn as compared to those with ten-plus years experience in the field? of the 2,216 survey responses, 877 stated they had worked in libraries for ten or fewer years. we analyzed these responses separately from the remaining 1,334 respondents who had worked in libraries for more than ten years. of this group, 644 had worked in libraries for twenty-plus years (figure 9). a handful of participants did not answer the question and were omitted from the analysis. figure 9. number of survey responses falling into the various categories for number of years working in libraries the top technology skills used in the workplace did not differ significantly between the different groups. the top skills, as discussed earlier and presented in figure 1, were well represented and similarly ordered. a few small percentage points of difference were noted in a handful of the top skills (figure 10). those newer to the field were slightly more likely to teach others to use technology, use cloud-based storage, and use cloud-based productivity apps. more experienced practitioners regularly used the library management system (on the staff side) more than those that were newer to the field. 0 100 200 300 400 500 600 700 0-2 3-5 6-10 11-15 16-20 21+ technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 50 figure 10. top twenty-five technology skills used by respondents in the zero to ten years’ experience (dark blue) and eleven-plus years experience (light blue) groups for the question regarding technologies they would like to learn, 69 percent of the participants with zero to ten years’ experience answered the question compared to a slightly smaller 65 percent of the participants with more than ten-years’ experience. top terms for both groups were very similar, including coding or programming, software, web, video, design, and editing. these terms were not dissimilar to the responses taken as a whole (table 3), indicating that respondents were generally interested in learning the same sorts of technology skills regardless of how long they had been in the field. 
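a hedged sketch of the experience-group comparison described in this section; the responses data frame, its column names, and the group labels are assumptions for illustration, not the survey's actual field names.

```r
# illustrative sketch: share of each experience cohort whose free-text answer mentions
# coding or programming. assumes a data frame `responses` with a `years_group` factor
# (survey question 7 categories) and a `wants_to_learn` free-text column.
wants_coding <- grepl("coding|programming", tolower(responses$wants_to_learn))

# proportion mentioning coding/programming, by years-in-libraries group
tapply(wants_coding, responses$years_group, mean, na.rm = TRUE)

# collapse into the 0-10 vs. 11+ years split used in the article
newer <- responses$years_group %in% c("0-2", "3-5", "6-10")
round(100 * tapply(wants_coding, ifelse(newer, "0-10 years", "11+ years"), mean), 1)
```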
a few noticeable differences between the two groups emerged. the most popular skills mentioned, coding or programming, were mentioned by 28 percent of the respondents with zero to ten years' experience, and by 15 percent of the respondents with eleven-plus years' experience. there was slightly more interest (by a few percentage points) in databases, design, python, and ruby in the zero to ten years' experience group. a closer look at the different year ranges within the zero to ten years' experience group revealed that those with three to five years of experience were most likely to be interested in learning coding or programming skills.

figure 11. percentage of respondents interested in learning coding or programming in the groups with ten or fewer years' experience

of the participants that answered the question at all, several stated that there were no technology skills they would need or like to learn for their position, either because they were comfortable with their existing skills or were simply open to learning more as needed (but nothing specific came to mind). combined with those who did not answer the question (and so presumably did not have a particular technology they were interested in learning), 28 percent of the zero to ten years' experience group and 31 percent of the eleven-plus years' experience group did not have any technologies that they desired to learn at the moment. discussion as detailed earlier, the most common technologies employed by lis practitioners were email, office productivity tools, web browsers, library catalog and database searching tools, and printers. generally similar technology usage patterns were observed for early and later-career practitioners, and programming topped the list of most-desired technology skills to learn. the cluster analysis presented in figure 3 suggests that a relatively small percentage of practitioners have technology-intensive roles that would require skills such as programming, working with databases, systems administration, etc. rather, the cluster analysis showed common technology skillsets focused on the end-user side of technology tools. in fact, most of the top ten skills used—email, office productivity tools (word processing, spreadsheets and presentation software), web browsers, library catalog and database searching, printers, and teaching others to use technology—are fairly nontechnical in nature. a potential exception is that of teaching technology.
figure 6 suggests that teaching others to use technology entails several hardware devices (for example, laptops, tablets, smartphones, and scanners) as well as online and digital resources, such as ebooks. however, most of the popular skills used would be considered baseline skills for information workers in any domain. as suggested by tennant, programming and other advanced technical skills do not necessarily need to be a core skill for all information professionals, but knowledge of the potential applications and possibilities of such tools is required.19 this idea was echoed by partridge et al., whose findings emphasized the need for awareness and resilience in tackling new technological developments.20 these skills alone would obviously be too little for lis practitioners explicitly seeking a high-tech role, as discussed in maceli.21 however, further research directed toward exploring the mental models and general technological understanding of information professionals would be helpful in understanding the true level of practitioner engagement with technology, to complement the list of relatively low-tech tools employed. programming has been a skill of great interest within the information professions for many years and the respondents’ enthusiasm and desire to learn in this area was readily apparent from the survey results, with nearly 20 percent of participants citing either “programming” or “coding” as a skill they desired to learn. in the context of their current responsibilities, 15 percent of respondents overall mentioned “computer programming or coding” as a regular technological skill they employed (figure 2). there was a slight difference between the librarians with fewer than eleven years of experience—19 percent coded regularly—compared to 13 percent of those with eleven or more years of experience. within the years-of-experience divisions, the newer practitioners were more interested in learning programming, with the peak of interest at three to five years in the workplace (figure 11). the relatively low interest or need to learn programming in the newest practitioners potentially indicates a hopeful finding—that their degree program was sufficient preparation for the early years of their career. prior research would contradict this finding. for example, choi and rasmussen’s 2006 survey found that, in the workplace, librarians frequently felt unprepared in their knowledge of programming and scripting languages.22 in the intervening years, curriculum has shifted to more heavily emphasize technology skills, including web development and other topics covering programming,23 perhaps better preparing early career practitioners. overall, information technology and libraries | december 2016 53 programming remains a popular skill in continuing education opportunities as well as in job listings,24 which aligns well with the respondents’ strong interest in this area. the skills commonly co-occurring with programming in practice included working with linux, database software, managing servers, and webpage creation (figure 4). taken as a whole, these skills indicate job responsibilities falling toward the systems side, with webpage creation a skill that bridged intensely technical and more user-focused work (as also evident in figure 4).this indicates that, though programming may be perceived as highly desirable for communicating and extending systems, as a formal job responsibility it may still fall to a relatively small number of information professionals in any significant manner. 
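the term-correlation plots discussed above (figures 4-6) rest on co-occurrence within the skills-in-use responses; a minimal sketch using the tm package's findAssocs, where skill_text, the single-token term names, and the 0.25 correlation threshold are illustrative assumptions.

```r
# illustrative sketch of the term-correlation step behind figures 4-6: terms from the
# skills-in-use responses that co-occur with a chosen skill. assumes the same skill_text
# vector sketched earlier; term spellings and the threshold are assumptions.
library(tm)

dtm <- DocumentTermMatrix(VCorpus(VectorSource(skill_text)))

# skills most strongly correlated with server management and with programming
findAssocs(dtm, terms = c("server", "programming"), corlimit = 0.25)
```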
makerspace technologies and their implementation possibilities within libraries have garnered a great deal of excitement and interest in recent years, with much literature highlighting innovative projects in this area (such as american library association25 and bagley26). fourie and meyer provided an overview of the existing makerspace literature, finding that most research efforts focus on the needs and construction of the physical space.27 given the general popularity of the topic (as detailed in moorefield-lang),28 it is interesting to note that such technologies were infrequently mentioned by survey participants, both in those desiring to learn these tools and those who were currently using them. the most infrequent skills used (figure 2) included makerspace technologies, 3d printers, augmented, and virtual reality. only a small number of respondents currently used this mix of makerspace-oriented and emerging technologies, and only 3 percent of respondents mentioned interest in learning makespace-related skills. despite many research efforts exploring the particulars of unique makerspaces in a case-study approach (for example, moorefield-lang),29 little data exists on the total number of makerspaces within libraries, and the skillset is largely absent from prior research describing lis curriculum and job listings. this makes it difficult to determine whether the low number of participants that reported working with makerspace technologies is reflective of the small number of such spaces in existence or simply that few practitioners are assigned to work in this area, no matter their popularity. in either case, these findings provide a useful baseline with which to track the growth of makerspace offerings over time and librarian involvement in such intensely technological work. despite the interest and clear willingness to learn and use technology, several workplace challenges became apparent from participant responses. as prior research explored (notable riley-huff and rholes),30 practitioners assumed they would be continually learning and building skills on the job throughout their career to stay current technologically. as described in the earlier results section, many participants mentioned that, although they were highly willing and able to learn, the necessary organizational resources were lacking. as one participant noted, “i’d like to learn anything but the biggest problem seems to be budget (time and monetary).” several participants expressed feeling overwhelmed with their current workload. new learning opportunities, technological or otherwise, were simply not feasible. although the survey results indicated that practitioners of all ages were roughly equally interested in learning new technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 54 technologies, a handful of responses mentioned that ageist issues were creating barriers. though few, these respondents described being dismissed as technologists because of their age. these themes have long been noted in the large body of continuing-education-related literature going back several decades. 
stone’s study ranked lack of time as the top deterrent to professional development for librarians, and it appears little has changed.31 chan and auster noted that organizational climate and the perception of one’s age may impair the pursuit of professional development, among other impediments.32 however, research has noted a generally strong drive in older librarians to continue their education; long and applegate found a preference in latercareer librarians for learning outlets provided by formal library schools and related professional organizations, but a lower interest in generally popular topics such as programming.33 these findings were consistent with the participant responses gathered in this survey. finally, as detailed in the results section, a significant percent of respondents (33 percent) did not answer the question regarding what technologies they would like to learn. as is a limitation with survey research, it is difficult to know what the respondent’s intention was in not answering the question, i.e., are they comfortable with their current technology skills? do they lack the time or interest in pursuing further technology education? and of those that did answer, many did not specify their intended use of the technologies they desired to learn. so a deeper exploration of what technologies lis practitioners desire to learn and why would be of value as well. these questions are worth pursuing in more depth through further research efforts. conclusion this study provides a broad view into the technologies that lis practitioners currently use and desire to learn, across a variety of types of libraries, through an analysis of survey responses. despite a marked enthusiasm toward using and learning technology, respondents described serious organizational limitations impairing their ability to grow in these areas. the lis practitioners surveyed have interested patrons, see technology as part of their mission, and are not satisfied with the current state of affairs, but they seem to lack money, time, skills, and a willing library administration. though respondents expressed a great deal of interest in more advanced technology topics, such as programming, the majority typically engaged with technology on an end-user level, with a minority engaged in deeply technical work. this study suggests future work in exploring information professionals’ conceptual understanding of and attitudes toward technology, and a deeper look at the reasoning behind those who did not express a desire to learn new technologies. information technology and libraries | december 2016 55 references 1. marshall breeding, “library technology: the next generation,” computers in libraries 33, no. 8 (2013): 16–18, http://librarytechnology.org/repository/item.pl?id=18554. 2. therese f. triumph and penny m. beile, “the trending academic library job market: an analysis of library position announcements from 2011 with comparisons to 1996 and 1988,” college & research libraries 76, no. 6 (2015): 716–39, https://doi.org/10.5860/crl.76.6.716. 3. janie m. mathews and harold pardue, “the presence of it skill sets on librarian position announcements,” college & research libraries 70, no. 3 (2009): 250–57, https://doi.org/10.5860/crl.70.3.250. 4. debra a. riley-huff and julia m. rholes, “librarians and technology skill acquisition: issues and perspectives,” information technology and libraries 30, no. 3 (2011): 129–40, https://doi.org/10.6017/ital.v30i3.1770. 5. 
monica maceli, “creating tomorrow’s technologists: contrasting information technology curriculum in north american library and information science graduate programs against code4lib job listings,” journal of education for library and information science 56, no. 3 (2015): 198–212, https://doi.org/10.12783/issn.2328-2967/56/3/3. 6. walt crawford, “making it work perspective: techno and techmusts,” cites and insights 8, no. 4 (2008): 23–28. 7. helen partridge et al., “the contemporary librarian: skills, knowledge and attributes required in a world -f emerging technologies,” library & information science research 32, no. 4 (2010): 265–71, https://doi.org/10.1016/j.lisr.2010.07.001. 8. riley-huff and rholes, “librarians and technology skill acquisition.” 9. vandana singh and bharat mehra, “strengths and weaknesses of the information technology curriculum in library and information science graduate programs,” journal of librarianship and information science 45, no. 3 (2013): 219–231, https://doi.org/10.1177/0961000612448206. 10. riley-huff and rholes, “librarians and technology skill acquisition.” 11. sharon hu, “technology impacts on curriculum of library and information science (lis)—a united states (us) perspective,” libres: library & information science research electronic journal 23, no. 2 (2013): 1–9, http://www.libres-ejournal.info/1033/. 12. singh and mehra, “strengths and weaknesses of the information technology curriculum.” 13. see, for example, crawford, “making it work perspective”; partridge et al., “the contemporary librarian.” technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 56 14. riley-huff and rholes, “librarians and technology skill acquisition.” 15. elizabeth w. stone, factors related to the professional development of librarians (metuchen, nj: scarecrow, 1969). 16. donna c. chan and ethel auster, “factors contributing to the professional development of reference librarians,” library & information science research 25, no. 3 (2004): 265–86, https://doi.org/10.1016/s0740-8188(03)00030-6. 17. dorothy e. jones, “ten years later: support staff perceptions and opinions on technology in the workplace,” library trends 47, no. 4 (1999): 711–45. 18. john j. burke, the neal-schuman library technology companion: a basic guide for library staff, 5th edition (new york: neal-schuman, 2016). 19. roy tennant, “the digital librarian shortage,” library journal 127, no. 5 (2002): 32. 20. partridge et al., “the contemporary librarian.” 21. monica maceli, “what technology skills do developers need? a text analysis of job listings in library and information science (lis) from jobs.code4lib.org,” information technology and libraries 34, no. 3 (2015): 8–21, https://doi.org/10.6017/ital.v34i3.5893. 22. youngok choi and edie rasmussen, “what is needed to educate future digital libraries: a study of current practice and staffing patterns in academic and research libraries,” d-lib magazine 12, no. 9 (2006), http://www.dlib.org/dlib/september06/choi/09choi.html. 23. see, for example, maceli, “creating tomorrow's technologists.” 24. elías tzoc and john millard, “technical skills for new digital librarians,” library hi tech news 28, no. 8 (2011): 11–15, https://doi.org/10.1108/07419051111187851. 25. american library association, “manufacturing makerspaces,” american libraries 44, no. 1/2 (2013), https://americanlibrariesmagazine.org/2013/02/06/manufacturing-makerspaces/. 26. caitlin a. 
bagley, makerspaces: top trailblazing projects, a lita guide (chicago: american library association, 2014). 27. ina fourie and anika meyer, “what to make of makerspaces: tools and diy only or is there an interconnected information resources space?,” library hi tech 33, no. 4 (2015): 519–25, https://doi.org/10.1108/lht-09-2015-0092. 28. heather moorefield-lang, “change in the making: makerspaces and the ever-changing landscape of libraries,” techtrends 59, no. 3 (2015): 107–12, https://doi.org/10.1007/s11528-015-0860-z. information technology and libraries | december 2016 57 29. heather moorefield-lang, “makers in the library: case studies of 3d printers and maker spaces in library settings,” library hi tech 32, no. 4 (2014): 583–93, https://doi.org/10.1108/lht-06-2014-0056. 30. riley-huff and rholes, “librarians and technology skill acquisition.” 31. stone, factors related to the professional development of librarians. 32. chan and auster, “factors contributing to the professional development of reference librarians.” 33. chris e. long and rachel applegate, “bridging the gap in digital library continuing education: how librarians who were not ‘born digital’ are keeping up,” library leadership & management 22, no. 4 (2008), https://journals.tdl.org/llm/index.php/llm/article/view/1744. technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 58 appendix. survey questions 1. what type of library do you work in? 2. where is your library located (state/province/country)? 3. what is your job title? 4. what is your highest level of education? 5. which of the following methods have you used to learn about technologies and how to use them? please mark all that apply. • articles • as part of a degree i earned • books • coworkers • face-to-face credit courses • face-to-face training sessions • library patrons • online credit courses • online training sessions (webinars, etc.) • practice and experiment on my own • web resources i regularly check (sites, blogs, twitter, etc.) • web searching • other: 6. which of the following skill areas are part of your responsibilities? please mark all that apply. • acquisitions • archives/special collections • cataloging • circulation • collection development • distance library services • electronic resource management • instruction • interlibrary loan information technology and libraries | december 2016 59 • library administration • library it/systems • marketing/public relations • media/audiovisuals • outreach • periodicals/serials • reference • user experience • other: 7. how long have you worked in libraries? • 0–2 years • 3–5 years • 6–10 years • 11–15 years • 16–20 years • 21 or more years 8. which of the following technologies or technology skills are you expected to use in your job on a regular basis? please mark all that apply • assistive/adaptive technology • audio recording and editing • augmented reality (google glass, etc.) • blogging • cameras (still, video, etc.) • chromebooks • cloud-based productivity apps (google apps, office 365, etc.) • cloud-based storage (google drive, dropbox, icloud, onedrive, etc.) • computer programming or coding • computer security and privacy knowledge • database creation/editing software (ms access, etc.) • dedicated e-readers (kindle, nook, etc.) 
• digital projectors technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 60 • discovery layer/service/system • downloadable e-books • educational copyright knowledge • e-mail • facebook • fax machine • image editing software (photoshop, etc.) • laptops • learning management system (lms) or virtual learning environment (vle) • library catalog (public side) • library database searching • library management system (staff side) • library website creation or management • linux • mac operating system • makerspace technologies (laser cutters, cnc machines, arduinos, etc.) • mobile apps • network management • online instructional materials/products (libguides, tutorials, screencasts, etc.) • presentation software (ms powerpoint, prezi, google slides, etc.) • printers (public or staff) • rfid (radio frequency identification) • scanners and similar devices • server management • smart boards/interactive whiteboards • smartphones (iphone, android, etc.) • software installation • spreadsheets (ms excel, google sheets, etc.) • statistical analysis software (sas, spss, etc.) • tablets (ipad, surface, kindle fire, etc.) • teaching others to use technology information technology and libraries | december 2016 61 • teaching using technology (instruction sessions, workshops, etc.) • technology equipment installation • technology purchase decision-making • technology troubleshooting • texting, chatting, or instant messaging • 3d printers • twitter • using a web browser • video recording and editing • virtual reality (oculus rift, etc.) • virtual reference (text, chat, im, etc.) • word processing (ms word, google docs, etc.) • web-based e-book collections • web conferencing/video conferencing (webex, google hangouts, goto meeting, etc.) • webpage creation • web searching • windows operating system • other: 9. which of the following are barriers to new technology adoption in your library? please mark all that apply. • administrative restrictions • budget • lack of fit with library mission • lack of patron interest • lack of staff time • lack of staff with appropriate skill sets • satisfaction with amount of available technology • other: 10. what technology skill would you like to learn to help you do your job better? 11. what technologies do you help patrons with the most? 12. what technology item do you circulate the most? technology skills in the workplace: information professionals’ current use and future aspirations | maceli and burke | https://doi.org/10.6017/ital.v35i4.9540 62 13. what technology or technology skill would you most like to see added to your library? use of language-learning apps as a tool for foreign language acquisition by academic libraries employees articles use of language-learning apps as a tool for foreign language acquisition by academic libraries employees kathia ibacache information technology and libraries | september 2019 22 kathia ibacache (kathia.ibacache@colorado.edu) is the romance languages librarian at the university of colorado boulder. abstract language-learning apps are becoming prominent tools for self-learners. this article investigates whether librarians and employees of academic libraries have used them and whether the content of these language-learning apps supports foreign language knowledge needed to fulfill library-related tasks. 
the research is based on a survey sent to librarians and employees of the university libraries of the university of colorado boulder (ucb), two professional library organizations, and randomly selected employees of 74 university libraries around the united states. the results reveal that librarians and employees of academic libraries have used language-learning apps. however, there is an unmet need for language-learning apps that cover broader content including reading comprehension and other foreign language skills suitable for academic library work. introduction the age of social media and the advances in mobile technologies have changed the manner in which we connect, socialize, and learn. as humans are curious and adaptive beings, the moment mobile technologies provided apps to learn a foreign language, it was natural that self-regulated learners would immerse themselves in them. language-learning apps’ practical nature, as an informal educational tool, may attract self-learners such as librarians and employees of academic libraries to utilize this technology to advance foreign language knowledge usable in the workplace. the academic library employs a wide spectrum of specialists, from employees offering research consultations, reference help, and instruction, to others specialized in cataloging , archival, acquisition, and user experience, among others. regardless of the library work, employees utilizing a foreign language possess an appealing skill, as knowing a foreign language heightens the desirability of employees and strengthens their job performance. in many instances, librarians and employees of academic libraries may be required to have reading knowledge of a foreign language. therefore, for these employees, acquiring knowledge of a foreign language might be paramount to deliver optimal job performance. this study aims to answer the following questions: 1) are librarians and employees of academic libraries using language-learning apps to support foreign language needs in their workplace? and 2) are language-learning apps addressing the needs of librarians and employees of academic libraries? for purposes of this article, mobile language apps are those accessed through a website, and apps downloaded onto portable smartphones, tablets, desktops, and laptops. mailto:kathia.ibacache@colorado.edu use of language-learning apps | ibacache 23 https://doi.org/10.6017/ital.v38i3.11077 background mobile-assisted language learning (mall) has a user-centered essence that resonates with users in the age of social media. librarians and employees of academic libraries needing a foreign language to fulfill work responsibilities are a target group that can benefit from using languagelearning apps. these apps provide a multifaceted capability that offers time and space flexibility and adaptability that facilitates the changeable environment favored by self-learners. kukulskahulme states that it is customary to have access to learning resources through mobile devices. 1 in the case of those individuals working in academic libraries, language-learning apps may present an opportunity to pursue a foreign language accommodating their self-learning style, time availability, space, and choice of device. considering the features of language-learning apps, some have a more personal quality where the device interacts with one user while other apps emulate social media characteristics connecting a wide array of users. 
for instance, users learning a language through the hello talk app can communicate with native speakers all around the world. through this app, language learners can send voice notes, corrections to faulty grammar, and use the built-in translator feature. therefore, language-learning apps may not only provide self-learners a vehicle to communicate remotely, but also to interact using basic conversational skills in a given language. in the case of those working in academic libraries, this human connectedness among users may not be as relevant as the interactive nature of the device, its mobility, the convenience of the virtual learning, and the flexibility of the mobile technology. kukulska-hulme notes that the ubiquity of mobile learning is affecting the manner in which one learns.2 although there is abundant literature referring to mobile language technologies and their usefulness in students’ language learning in different school levels including higher education, scholarship regarding the use of language-learning apps by professionals is scarce.3 broadbent refers to self-regulated learners as those who plan their learning through goals and activities. 4 the author concurs that to engage in organized language learning through a language-learning app, one should have some level of organizational learning or as a minimum enough motivation to engage in self-learning. in this context, some scholars believe that the level of self-management of learning will determine the level of learning success.5 moreover, learners who possess significant personal learning initiative (pli) have the foundation to accomplish learning outcomes and overcome difficulties.6 pli may be one factor affecting learners’ motivation to learn a language in a virtual environment and away from the formal classroom setting. this learning initiative may play a significant role in the learning process, as it may influence the level of engagement and positive learning outcome. in terms of learning outcomes, language software developers may also play a role by adapting and broadening content based on learning styles and considering the elements that would provide a meaningful user experience. in this sense, bachore conveys that there is a need to address language-learning styles when using mobile devices.7 bachore also notes that as interest in mobile language learning increases, so does the different manners in which mobile devices are used to implement language learning and instruction.8 similarly, louhab refers to context dimensions as the parameters in mobile learning that consider learners’ individuality in terms of where the learning takes place, individual personal qualities and information technology and libraries | september 2019 24 learning needs, and the features of their mobile device.9 bradley also suggests that learning is part of a dialogue between the learners and their devices as part of a sociocultural context where thinking and learning occur.10 in addition. bradley infers that users are considered when creating learning activities and when improving them.11 for these reasons, some researchers address the need to focus on accessibility and developing content designed for different types of users, including differently abled learner s.12 furthermore, adaptation, according to the learner’s style, may be considered as a pivotal quality of languagelearning apps as software developers try to break the gap between formal instruction and a learner-oriented mobile learning platform. 
undoubtedly, the technological gap, which includes the cost of the device, interactivity, screen size, and input capabilities, among others, matter when centering on implementing language learning supported by mobile technologies. however, learning style is only one aspect in the equation. a learner’s need is another. for example: the needs of a learner who seeks to acquire familiarity with a foreign language because of an upcoming vacation may be substantially distinct from the needs of professionals such as academic librarians, who may need reading, writing, or even speaking proficiency in a given language. a user-centered approach in language-learning software design may advance the adequacy of these apps connecting them with a much wider set of learning needs. when referring to mobile apps for language learning, godwin-jones asserts that while the capability of devices is relevant, software development is paramount to the educational process.13 therefore, language-learning software developers may consider creating learning activities that target basic foreign language-learning needs and more tailored ones suitable for people who require different content. kukulska-hulme refers to “design for learning” as creating structured activities for language learning.14 although language-learning apps appear to rely on learning activities built on basic foreign language learning needs, these apps should desire to rely more on learners’ evaluative insights to advance software development that meets the specific needs of learners. although mobile technologies as a general concept will continue to evolve, its mobile nature will likely continue focusing on user experience satisfying those who prefer the freedom of informal learning. methodology instrument the author used a 26-question qualtrics survey approved by the institutional review board at the university of colorado boulder (ucb). the survey was open for eight weeks and received 199 total responses. however, the number of responses to each question varied depending on the question. the data collected was both quantitative and qualitative in nature, seeking to capture respondents’ perspectives and measurable data that could be used for statistics. the survey consisted of twelve general questions for all respondents that reported working in an academic library, then branched into either nine questions for respondents who had used a languagelearning app, and five questions for those who had not. the respondents answered via text fields, standard single and multiple-choice questions, and a single answer likert matrix table. qualtrics provided a statistical report, which the author used to analyze the data and create the figures. use of language-learning apps | ibacache 25 https://doi.org/10.6017/ital.v38i3.11077 participants the survey was distributed through an email to librarians and employees of ucb’s university libraries. the author also identified 74 university libraries in the united states from a list of members of the association of research libraries, and distributed the survey via email to ten randomly selected library employees from each of these libraries.15 the recipients included catalogers, subject specialists, archivists, and others working in metadata, acquisition, reference, and circulation. 
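the per-library random selection described above could be reproduced in a few lines of r; a minimal sketch assuming a hypothetical staff_directory data frame with library and email columns, which is not part of the published study.

```r
# illustrative sketch: ten randomly selected staff members from each of the 74 identified
# libraries. staff_directory (columns `library` and `email`) is an assumed structure,
# and each library is assumed to list at least ten staff members.
set.seed(2015)
recipients <- do.call(rbind, lapply(split(staff_directory, staff_directory$library),
                                    function(lib) lib[sample(nrow(lib), 10), ]))
nrow(recipients)   # 740 invitations across the 74 libraries
```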
the survey was also distributed to the listservs of two library organizations: the seminar on the acquisition of latin american library materials (salalm) and reforma, the national association to promote library and information services to latinos and the spanish speaking. these organizations were chosen due to their connection with foreign languages.
results
use of foreign language at work
of the respondents, 172 identified as employees of academic libraries (66 percent). of these, a significant percentage reported using a foreign language in their library work. the respondents belonged to different generational groups; however, most were in the 30-39 and 40-49 age groups. the respondents performed a variety of duties within the categories presented. due to incomplete survey results, varying numbers of responses were collected for each question. of 110 respondents, 82 identified their gender as female. in addition, of 105 respondents, 62 percent reported being subject specialists, 56 percent worked in reference, 54 percent identified as instruction librarians, 30 percent worked in cataloging and metadata, 30 percent worked in acquisition, 10 percent worked in circulation, 2 percent worked in archiving, and 23 percent reported doing "other" types of library work.
figure 1. age of respondents (n=109): 20-29 years old (9.17 percent), 30-39 (29.36 percent), 40-49 (30.28 percent), 50-59 (12.84 percent), 60 years or older (18.35 percent).
figure 2. foreign language skills respondents used at work (multiple responses allowed, n=106): reading (102), writing (65), speaking (49), listening (49).
as shown in figure 2, respondents used different foreign language skills at work; however, reading was used with significantly more frequency. when asked, "how often do you use a foreign language at work?" 38 respondents out of 105 used it daily, 29 used it weekly, and 21 used it monthly. in addition, table 1 shows that a large percentage of respondents noted that knowing a foreign language helped them with collection development tasks and reference services. the respondents who chose "other" stated in a text field that knowing a foreign language helped them with translation tasks, building management, creating a welcoming environment, attending to foreign guests, communicating with vendors, researching, processing, and having a broader, more empathetic perspective of the world. these respondents also expressed that knowing a foreign language helped them work with materials in other languages and on digital humanities projects and offer library tours and outreach to the community.
table 1. types of librarian work benefiting from knowledge of a foreign language (multiple responses allowed, n=104).
type of librarian work / expressed benefit (%)
collection development / 61.5
reference / 57.6
communication / 56.7
instruction / 41.3
cataloging and metadata / 41.3
acquisition / 40.3
other / 19.2
figure 3. languages respondents studied using an app (multiple responses allowed, n=51): spanish (22), french (13), portuguese (13), german (9), italian (8), japanese (5), other (26).
as shown in figure 3, spanish was the most prominent language studied. thirteen of the 51 respondents studied french, and thirteen studied portuguese. additionally, respondents stated in the text field "other" that they had also used these apps to study english, mandarin, arabic, malay, hebrew, swahili, korean, navajo, turkish, russian, greek, polish, welsh, indonesian, thai, and tamil.
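for readers curious how the multiple-response percentages in tables 1 and 2 are derived, the short sketch below shows the usual calculation (respondents who selected each option divided by the number answering the question). it is a generic illustration with invented data, not the author's qualtrics report.

# illustrative tabulation of a multiple-response survey question.
# the data below are invented; percentages can exceed 100 percent in total
# because each respondent may check several options.
from collections import Counter

responses = [
    {"collection development", "reference"},
    {"reference", "communication", "instruction"},
    {"collection development"},
]  # one set of checked options per respondent who answered the question

counts = Counter(option for answer in responses for option in answer)
n = len(responses)

for option, count in counts.most_common():
    print(f"{option}: {count / n:.1%}")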
regardless, apps were not the sole means of language acquisition. some respondents specified using books, news articles, pimsleur cds, television shows, internet radio, conversations with family members and native speakers, formal instruction, websites, dictionaries, online tutorials, audio tapes, online laboratories, flashcards, podcasts, movies, and youtube videos. over a third of 49 respondents used a language-learning app for 30 hours or more, and less than a quarter used one between 11 and 30 hours. concerning the device preferred to access the apps, most respondents used a smartphone (63.27 percent), followed by a laptop (16.33 percent) and a tablet (14.29 percent). table 2 shows the elements of language-learning apps that 48 respondents found most satisfactory. they selected "learning in own time and space" as the most desired element, followed by "vocabulary" and "translation exercises." participants were less captivated by "pronunciation capability" (29.1 percent) and "dictionary function" (16.6 percent).
table 2. most satisfactory aspects of language-learning apps (multiple responses allowed, n=48).
element of a language-learning app / percentage finding satisfactory (%)
learning in own time and space / 64.5
vocabulary / 56.2
translation exercises / 56.2
making mistakes without feeling embarrassed / 54.1
responsive touch screen / 52
self-testing / 52
reading and writing exercises / 43.7
game-like features / 37.5
voice recognition capability / 37.5
comfortable text entry / 37.5
grammar and verb conjugation exercises / 35.4
pronunciation capability / 29.1
dictionary function / 16.6
figure 4. most unsatisfactory elements of language-learning apps (n=30): content (13), flexibility/interface (10), grammar (5), payment (2).
conversely, 30 respondents described unsatisfactory elements on the survey. these elements were grouped into the categories shown in figure 4. the elements were payment restrictions, lack of grammatical explanations, monocentric content focused on travel, vocabulary-centric content (although opinions varied on this issue), and poor interfaces. respondents also mentioned a lack of flexibility that inhibited learners from reviewing earlier lessons or moving forward as desired, unfriendly interfaces, and limited scope. other respondents alluded to technical issues with entering diacritics, non-intuitive software, and repetitive exercises. while these elements relate to the language apps themselves, one respondent mentioned missing human interaction and another reported the lack of a system to prompt learners to be accountable for their own learning process.
figure 5. reasons participants had not used a language-learning app (multiple responses allowed, n=53): other (54.71 percent), lack of time (37.73 percent), prefer traditional setting (32.07 percent), screen too small (1.88 percent).
figure 5 shows that time restriction (i.e., availability of time to use the app) was the most prevalent specific reason why respondents had not used a language-learning app. however, a larger percentage of respondents answered "other" to expand on the reason they had not tried this technology. the explanations provided included: a lack of content suitable for work; already having sufficient proficiency; preferring books, dictionaries, google translate, and podcasts; lacking interest; and having different priorities.
similarly, when asked whether they would use a language-learning app if given an opportunity, a large percentage of 52 respondents answered "maybe" (65.38 percent). however, when 51 respondents answered the question "what elements facilitated your language learning?," 66.6 percent responded that they preferred having an instructor, 54.9 percent liked being part of a classroom, and 41.1 percent liked language activities with classmates.
discussion
library employee use of language-learning apps
the data revealed that a large number of respondents used a foreign language in their library work and that reading and writing were the most needed skills. however, only about half of the respondents had used a language-learning app. there appears to be interest in language-learning apps, but use is not widespread at this time. overall, respondents felt language-learning apps did not offer a curriculum that supported foreign language enhancement for the workplace, especially the academic library workplace. this factor may be one reason why respondents stopped using the apps and why this technology was not utilized more extensively. interestingly, the majority of the respondents were in their thirties and forties. one might surmise that young millennials in their twenties would be more inclined to use language-learning apps; however, the data showed a slight lead by respondents in their forties. this information may corroborate the author's inference that generational distinctions among employees of academic libraries do not limit the willingness to seek, and even prefer, learning another language through apps. moreover, a pew research center study showed that generations older than millennials have also embraced technology, and gen xers even had a 10 percent lead over millennials in tablet ownership.16 referring to the device used to interact with the language app, most respondents preferred a smartphone; only a small fraction preferred a tablet, laptop, or desktop. these data may attest to the portability of language-learning apps valued by self-learners and to the notion that language learning may happen outside the classroom setting. however, while smartphones provide ubiquity and a sense of independence, so can tablets. what is it, then, about smartphones that ignites preference from a user experience perspective? is it their ability to make calls, their portability, fast processors, wi-fi signal, or cellular connectivity that makes the difference? since tablets can also be considered portable, and their larger screens and web surfing capabilities are desirable assets, is it the "when and where" that determines the device? while not all respondents reported using an app to learn a language, those who did expressed satisfaction with learning in their own space and time and with translation exercises. nevertheless, it is noteworthy that few respondents deemed important the ability of the software to help learners with the phonetic aspect of the language. this diminished interest in pronunciation may be connected with the type of language learning needed in the academic library profession. as respondents indicated, language-learning apps tend to focus on conversational skills rather than reading and text comprehension.
in addition to those respondents who used an app to learn a new language, one respondent reported reinforcing skills in a language already acquired. a compelling matter to consider is the frequency with which respondents used a foreign language in their work. about a third of the respondents used a foreign language at work on a daily basis, and approximately a quarter used one weekly. this finding suggests that foreign languages play a significant role in academic library work. since the respondents fulfilled different responsibilities in their library work, one may deduce that foreign languages are used in a variety of settings beyond strictly desk tasks. in fact, as stated before, respondents reported using foreign languages for multiple tasks, including communicating with vendors and foreign guests and providing a welcoming environment, among others. even though 59 respondents stated that knowing a foreign language helped them with communication, respondents appeared to be more concerned with reading comprehension and vocabulary. it is likely that reading comprehension ranked high in importance because library jobs that require foreign language knowledge tend to rely heavily on reading comprehension skills. nonetheless, the author wonders whether subject specialists, especially those who provide instruction, use more skills related to listening and communication in a foreign language; it is therefore curious that they did not prioritize these skills. perhaps this topic could be a subject for future research. notwithstanding these results, language-learning apps appear to center on content that improves listening and basic communication instead of reading comprehension. therefore, the question remains whether mobile language apps have enough capabilities to provide a relevant learning experience to librarians and staff working in academic libraries.
are language-learning apps responding to the language needs of employees working in academic libraries?
the survey results indicate that language-learning apps are not sufficiently meeting respondents' foreign language needs. qualitative data showed that several elements may affect the compatibility of language-learning apps with the needs of employees working in academic libraries; however, the findings were not conclusive due to the limited number of responses. when respondents were asked to identify the unsatisfactory elements in these apps, 65.9 percent of 47 respondents identified an issue with language-learning apps, while 23 percent answered "none." according to respondents, the main problems with the apps were a lack of content and scope suitable for employees of academic libraries, along with shortcomings in flexibility and grammar. perhaps mobile language-app developers assume that some learners still use a formal classroom setting for foreign language acquisition and therefore leave more advanced curricula to that setting. it is also possible that developers consider the market centered on travel and basic conversation more dominant; this may explain why these apps do not address foreign language needs at the professional level. finally, these academic library employees appear to perceive a need for these apps to explore and offer a curriculum and learning activities that benefit those seeking deeper knowledge of a language.
conclusion
mobile language learning has changed the approach to language acquisition. its mobility, portability, and ubiquity have established a manner of instruction that provides a sense of freedom and self-management that suits self-learners. moreover, as app technology has progressed, features have been added to devices that facilitate a more meaningful user experience with language-learning apps. employees of academic libraries who have used foreign language-learning apps are cognizant of the language-learning activities that support their foreign language needs for work, such as reading comprehension and vocabulary. however, language-learning apps appear to market to conversational needs, providing exercises that focus on travel more than lessons that center on reading comprehension and deeper areas of language knowledge. this indicates a lack of language-learning content that would be more appropriate for those working in academic libraries. finally, academic library employees who require a foreign language in their work are a target group that may benefit from mobile language learning. presently, this target group feels language-learning apps are too basic to cover broader professional needs. therefore, as language-learning app developers consider serving wider groups of people, it would be beneficial for these apps to expand their lesson structure and content to address the needs of academic library professionals.
endnotes
1 agnes kukulska-hulme, "will mobile learning change language learning?" recall 21, no. 2 (2009): 157, https://doi.org/10.1017/s0958344009000202.
2 ibid., 158.
3 see florence martin and jeffrey ertzberger, "here and now mobile learning: an experimental study on the use of mobile technology," computers & education 68 (2013): 76-85, https://doi.org/10.1016/j.compedu.2013.04.021; houston heflin, jennifer shewmaker, and jessica nguyen, "impact of mobile technology on student attitudes, engagement, and learning," computers & education 107 (2017): 91-99, https://doi.org/10.1016/j.compedu.2017.01.006; yoon jung kim, "the effects of mobile-assisted language learning (mall) on korean college students' english-listening performance and english-listening anxiety," studies in linguistics, no. 48 (2018): 277-98, https://doi.org/10.15242/heaig.h1217424; jack burston, "the reality of mall: still on the fringes," calico journal 31, no. 1 (2014): 103-25, https://www.jstor.org/stable/calicojournal.31.1.103.
4 jaclyn broadbent, "comparing online and blended learner's self-regulated learning strategies and academic performance," internet and higher education 33 (2017): 24, https://doi.org/10.1016/j.iheduc.2017.01.004.
5 rui-ting huang and chung-long yu, "exploring the impact of self-management of learning and personal learning initiative on mobile language learning: a moderated mediation model," australasian journal of educational technology 35, no. 3 (2019): 118, https://doi.org/10.14742/ajet.4188.
6 ibid., 121.
7 mebratu mulato bachore, "language through mobile technologies: an opportunity for language learners and teachers," journal of education and practice 6, no. 31 (2015): 51, https://files.eric.ed.gov/fulltext/ej1083417.pdf.
8 ibid., 50.
9 fatima ezzahraa louhab, ayoub bahnasse, and mohamed talea, "considering mobile device constraints and context-awareness in adaptive mobile learning for flipped classroom," education and information technologies 23, no.
6 (2018): 2608, https://doi.org/10.1007/s10639-018-9733-3.
10 linda bradley, "the mobile language learner: use of technology in language learning," journal of universal computer science 21, no. 10 (2015): 1270, http://jucs.org/jucs_21_10/the_mobile_language_learner/jucs_21_10_1269_1282_bradley.pdf.
11 ibid.
12 tanya elias, "universal instructional design principles for mobile learning," the international review of research in open and distance learning 12, no. 2 (2011): 149, https://doi.org/10.19173/irrodl.v12i2.965.
13 robert godwin-jones, "emerging technologies: mobile apps for language learning," language learning & technology 15, no. 2 (2011): 3, http://dx.doi.org/10125/44244.
14 kukulska-hulme, "will mobile learning change language learning?," 158.
15 "membership: list of arl members," association of research libraries, accessed april 5, 2019, https://www.arl.org/membership/list-of-arl-members.
16 jingjing jiang, "millennials stand out for their technology use," pew research center (2018), https://www.pewresearch.org/fact-tank/2018/05/02/millennials-stand-out-for-their-technology-use-but-older-generations-also-embrace-digital-life/.
technical communications
announcements
new cola chairman
brian aveney, of the richard abel co., has been elected chairman of the cola discussion group, effective january 1974. prior to his present position with the design group at richard abel, mr. aveney was head of the systems office at the university of pennsylvania libraries. the cola discussion group traditionally meets on the sunday afternoon preceding each ala conference. meetings are open, and all are invited to attend.
and a book review editor
a member of the university of british columbia graduate school of library science faculty, peter simmons, has been appointed book review editor of the journal of library automation. mr. simmons is the author of the "library automation" chapter in the annual review of information science and technology, volume 8, the most recent of his publications. authors and publishers are requested to send relevant literature to mr. simmons at the graduate school of library science, university of british columbia, vancouver, british columbia, for review.
missing issues?
the rapid publication sequence of the 1972 and 1973 volumes of the journal of library automation has created problems for some isad members and subscribers. if your address changed during 1973, or if your ala membership suffered any quirk, you are especially likely to have missed one or more of the issues due you. if this is the case, please write to the membership and subscription records department of the american library association, 50 e. huron st., chicago, il 60611. indicate which issues you are missing, and every attempt will be made to forward them to you as quickly as possible.
new eric clearinghouse
stanford university's school of education has been awarded a one-year contract by the national institute of education (nie) to operate the newly formed eric clearinghouse on information resources under the direction of dr. richard clark. the new clearinghouse will be part of the stanford center for research and development in teaching. the clearinghouse on information resources is the result of a merger of two previous clearinghouses: the one on media and technology, formerly located at the stanford center for research and development in teaching, and the one on library and information sciences, formerly located at the american society for information science in washington, d.c. the new clearinghouse is responsible for collecting information concerning print and nonprint learning resources, including those traditionally provided by school and community libraries and those provided by the growing number of technology-based media centers. the clearinghouse collects and processes noncopyright documents on the management, operation, and use of libraries, the technology to improve their operation, and the education, training, and professional activities of librarians and information specialists. in addition, the clearinghouse is collecting material on educational media such as television, computers, films, radio, and microforms, as well as techniques which are an outgrowth of technology: systems analysis, individualized instruction, and microteaching.
library automation activities, international
computerized system at the james cook university of north queensland library. the system design phase of an integrated acquisitions/cataloging system for the library at the james cook university of north queensland has been completed by a firm of computer consultants, ian oliver and associates, and programming has commenced.
history
the system, known as catalist, is a batch system to be operated on the university's central computer, a pdp-10. it will be programmed in fortran and macro, the assembly language of the pdp-10.
description
the system will cover all aspects of cataloging/acquisitions procedures for all library material apart from serials, including: (a) production of orders, follow-ups, and reports; (b) budget control; (c) fund accounting; (d) routing slips; (e) accessions lists; (f) in-process and catalog supplements (author/title and added entry) and subject catalog supplement, shelflist and supplement; (g) catalogs (author/title and subject); and (h) union catalog cards. some features of the system include the maintenance of average book prices in all subject areas. these are continually updated by the system to reflect current fluctuations in the trade. this information will be used together with machine-based arrival predictions to control the budget and fund allocations. marc data will be used as much as possible, with records for individual items being supplied from external sources on request.
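the budget-control idea just described (a continually updated average book price per subject area, combined with arrival predictions) can be sketched in a few lines. the following python fragment is illustrative only; catalist itself was to be written in fortran and macro on a pdp-10, and all names and figures here are invented.

# hypothetical sketch of the bookkeeping described above: keep a running
# average price per subject area and use it to project outstanding
# commitments against a fund. not the actual catalist implementation.
averages = {}  # subject area -> (invoices seen, running average price)

def record_invoice(subject, price):
    """update the running average price for a subject area."""
    count, avg = averages.get(subject, (0, 0.0))
    count += 1
    avg += (price - avg) / count  # incremental mean, tracking trade prices
    averages[subject] = (count, avg)

def projected_commitment(subject, titles_on_order):
    """estimate funds tied up in orders not yet arrived."""
    _, avg = averages.get(subject, (0, 0.0))
    return titles_on_order * avg

record_invoice("chemistry", 18.50)
record_invoice("chemistry", 22.00)
print(projected_commitment("chemistry", 40))  # 40 titles still on order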
technical communications 57 the in-process catalogues, which will contain items on order, items arrived, and items cataloged since the previous edition of the catalog, will contain added entries for all material where such information fs available. the catalogs will be produced on com. roll film will be used for public catalogs and fiche for in-house use. data for the national union catalogue will be submitted on minimally-formatted computerproduced cards. for further information contact ms. c. e. kenchfngton, systems librarian, post office, james cook university of north queensland, australia 4811. technical exchanges editor's note: the two following articles, prepared by the library of congress and the council on libtary resources, respectively, have been distribttted through various lc publications. due to the importance of the two documents, however, and to the fact that they may not have reache.d the entire libtary community, it seemed therefore appropriate to publish the papers again in journal of library automation. sharing machine-readable bibliographic data: a progress report on a series of meetings sponsored by the council on library resources beginning in december 1972 and continuing since that date, the council on library resources has convened a series of meetings of representatives of several organizations to discuss the implications of bibliographic data bases being built around the country and the possibilities of sharing these resources. although the deliberations are not yet completed, the council, as well as all participants in the meetings, felt that it was timely to make the progress to date known to the community. since publication in the 58 i ournal of library automation vol. 7/1 march 197 4 open literature implies a long waiting period between completion of a paper and the actual publication date, it was decided that this paper should be written and distributed as expeditiously as possible. since the library of congress has vehicles for dissemination of information in its marc distribution service, i nfotmation bulletin, and cataloging setvice bulletin, lc was asked to assume the responsibility for the preparation of a paper to be distributed via the above mentioned channels as well as sending copies to relevant associations. the institutions participating in the deliberations have been included as an appendix to this paper. the bibliographic data bases under consideration at individual institutions contain both marc records from lc as well as records locally encoded and transcribed. these local records represent: ( 1) titles in languages not yet within the scope of marc; (2) titles in languages cataloged by lc prior to the onset of the marc service; ( 3) titles not cataloged by lc; and ( 4) titles cataloged by lc and recataloged when the lc record cannot be found locally. the first two categories, in many instances, are being encoded and transcribed by institutions using lc data as the source, i.e., proofsheets, nuc records, and catalog cards. these are referred to for the remainder of this paper as lc source data and the third and fourth categories as original cataloging. all participants agreed that the stmcture of the format for the interchange of bibliographic data would be marc but several participants questioned if a subset of lc marc could not be established for interchange for all transcribing libraries other than lc. 
1• 2 although lc had reported its survey regarding levels of completeness of marc records and the conclusions reached by the recon working task force, namely, "to satisfy the needs of diverse installations and applications, records for general distribution should be in the full marc format," it appeared worthwhile to once more make a survey to see if agreement could be reached on a subset of data elements. 3 the survey ineluded only those institutions participating in the clr meetings. the result of the survey again demonstrated that considered collectively, institutions need the complete marc set of data elements. the decision was made that the lc marc format was to be the basis of the further deliberations of the participants. attention was then turned to any additional elements of the format or modifications to present elements that may be required in order to interchange bibliographic data among institutions. all concmned recognized that although networks of libraries, in the true sense, still do not exist today, much has been learned since the development of the marc format in 1968. certain ground rules were established and are given below: 1. the material under consideration is to be limited to monographs. 2. the medium considered for the transmission of data is magnetic tape. 3. data recorded at one institution and transmitted to another in machinereadable form is not to be retransmitted by the receiving institution as part of the receiving institution's data base to still another institution.4 4. any additions or changes required to the marc format for "networking" arrangements are not to substantially impact lc procedures. 5. any additions or changes required to the marc format for "networking" arrangements are not to substantially affect marc users. long discussions took place concerning modifications to lc source data by a transcribing library and the complexity involved in transmitting information as to which particular data elements were modified. ground mle 6 was established stating that if any change is made to the bibliographic content of a record copied from an lc source document (other than the lc call number), the transcribing library would be considered the cataloging source, i.e., the machine-readable record would no longer be considered an lc cataloging record. any errors detected in lc marc records are to be reported to lc for correction. a subcommittee was formed to study what marc format additions and modifications were required. the subcommittee met on several occasions and made the following proposals to the parent committee: 1. fixed field position 39 and variable field 040, ·cataloging source, should be expanded to include information defini.ng the cataloging library, i.e., the hbrary responsible for the cataloging of the item, and the transcribing library, i.e., the library actually doing the input keying of the cataloging data. 2. lc should include the lc card number in field 010 as well as in field 001. when the lc card number is known by an agency transcribing cataloging data, field 001 should contain that agency's control number and field 010 should contain the lc card number. 3. variable field 050 should not be used for any call number other than the lc call number. transcribing agencies should always put the lc call number in this field if known. 4. 
a new variable field 059, contributed classification, should be defined to allow agencies other than lc to record classification numbers such as lc classification, dewey, national agricultural library classification, etc., with indicators assigned to provide the information as to what classification system was recorded and whether the cataloging or transcribing agency provided this data. 5. variable field 090, local call number should follow the same indicator sys~ tern as defined in field 059. (090 contains the actual call number used by either the cataloging or transcribing library while 059 would contain additional classification numbers assigned by the cataloging or transcribing library.) 6. lc would assume the responsibility of distributing any agreed upon additions or modifications as either an technical communications 59 addendum to or a new edition of books: a marc format. discussions following the presentation of these proposals indicated concern regarding three principal areas: 1. the modifications of any data element in an lc source document other than the addition of a local call number dictated that the institution performing the modification of the record assume the position of the cataloging source. this resulted in the possibility that a large number of records would undergo minor changes and consequently the knowledge that the record was actually an lc record would be lost. this loss was considered a critical problem. 2. the creation of a marc record implied that each fixed field and all content designators should be present if applicable for any one record. during the lc recon project, it was recognized that certain fixed fields could not be coded explicitly because the basic premise in the recon effort was the encoding of existing cataloging records without inspecting the book. consequently, the value of certain fixed fields such as indicating the presence or absence of an index in the work, could not be known. participants felt that a "fill" character was needed to describe to the recipient of machiner~adable cataloging data that a particular fixed field, tag, indicator, or subfield code could not be properly encoded due to uncertainty. the "fill" character will be a character in the present library character set but one not used for any purpose up to this time. 3. although networking is not clearly defined at this time, participants felt that the marc format should have the capability to include location symbols to satisfy any future requirement to transmit this information in order to expedite the sharing of library resources. majority opinion indicated there was a 60 journal of library automation vol. 7/1 march 1974 need to guarantee the recognition of an lc source record, that a "fill" character could serve a useful function, and that a method of transmitting location symbols was required. three position papers were written on the topics outlined above giving the rationale for the requirement and describing a proposed methodology for implementation. these papers were reviewed at a meeting of the participants and are presently undergoing modification taking into account recommendations made. the revised papers are to be distributed prior to the next meeting in january 1974. following this meeting, another paper will be prepared for publication which will include a definitive account of the modifications and additions recommended for the marc format as well as describing the rationale for the additions and modifications. 
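to make the proposed conventions concrete, the sketch below builds a toy record that follows them: field 001 carries the transcribing agency's control number, 010 the lc card number, 040 identifies the cataloging and transcribing libraries, 050 is reserved for the lc call number, 059 holds contributed classification, 090 the local call number, and a stand-in fill character marks a value that could not be determined from the source record. the dictionary layout, helper function, and all values are hypothetical illustrations, not lc's or any participant's actual implementation.

# toy record following the field conventions proposed above; illustrative only.
FILL = "|"  # stand-in for the proposed fill character; the real code point was
            # to be chosen from characters unused in the library character set

record = {
    "001": "xyz-000123",                       # transcribing agency control number
    "010": "74-000000",                        # lc card number, when known (placeholder)
    "040": {"cataloging": "DLC", "transcribing": "XYZ"},
    "050": "Z699.A1 J67",                      # lc call number only
    "059": {"scheme": "ddc", "source": "transcribing", "number": "025.3"},
    "090": "Z699 .J6 1974",                    # call number actually used locally
    "index_present": FILL,                     # fixed field that could not be verified
}

def cataloging_source(rec):
    """per the ground rule above: if bibliographic content was changed,
    the transcribing library becomes the cataloging source."""
    return rec["040"]["transcribing"] if rec.get("modified") else rec["040"]["cataloging"]

print(cataloging_source(record))  # prints DLC, since the lc content was not changed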
at that time the proposals will be submitted to the library community for its review and acceptance. if the additions and changes are approved by the marbi5 committee of the american library association, lc will proceed to amend or rewrite the publication books: a marc format. however, the points elaborated below deserve emphasis toward the understanding of the issues described in this paper. 1. the meetings were concerned with a national exchange of data, not international. 2. the additions and modifications recommended for the marc format, with one exception, affect organizations other than the library of congress exchanging machine-readable cataloging data. except for distributing records with the lc card number in field 010 as well as 001, the marc format at lc will remain intact. 3. lc will investigate the use of the fill character in its own records, both retrospective and current, and for records representing all types of materials. henriette d. avram marc development office library of congress references 1. the marc format has been adopted as both a national and international format by ansi and iso respectively. 2. subset in this context includes· both the data content of the record (fixed and variable fields) and content designators (tags, indicators, and subfield codes) . 3. recon working task force, "levels of machine-readable records," in its national aspects of creating and using marc/recon reco1'ds (washington, d.c.: library of congress, 1973), p.4-6. 4. this rule did not extend to a subscriber to the lc marc service duplicating an lc tape for another institution. one can readily see the chaos that would result if institution a sent its records to institutions b and c, b then selected all or part of a's records for inclusion in its data base, and then transmitted its records to a and c. the result of the multitransmission of the same records, modified or not, would create useless duplication and confusion. 5. rtsd/isad/rasd representation in machine-readable form of bibliographic information committee. appendix 1 list of organizations participating in the clr sponsored meetings library of congress national agricultural library national library of medicine national serials data program new england library information network new york public library the ohio college library center stanford university libraries university of chicago libraries washington state library university of western ontario library a composite effort to build an on-line national serials data base (a paper for presentation at the arl midwinter meeting, chicago, 19 january 1974) an urgent requirement exists for a concerted effort to create a comprehensive national serials data base in machine-readable form. neither the national serials data program nor the marc serials distribution service, at their current rate of data base building, will solve the problem quickly enough. because of the absence of a sufficient effort at the national level, several concerted efforts by other groups are under way to construct serials data bases. these institutions have been holding in abeyance the development of their automated serials systems, some for several years, waiting for sufficient development at the national level to provide a base and guidance for the development of their individual and regional systems. this has not been forthcoming, and local pressures from their users, their administrators, and their own developing systems are forcing these librarians to act without waiting for the national effort. 
these efforts are exemplified by the work of one group of librarians, described below. what has now come to be known as the "ad hoc discussion group on serials" had its beginnings in an informal meeting during the american library association's conference in las vegas last june. you will also hear this discussion group referred to as the "toronto group." this is because its prime mover has been richard anable of york university, toronto, and because the first formal meeting occurred in that city. the expenses of the toronto and subsequent meetings have been borne by the council on library resources, and council staff have been involved in each meeting. a fuller exposition of the origins, purposes, and plans of the toronto group has been written by mr. anable for the journal of libm1'y automation. it appeared in the december 1973 issue. quoting from anable: "at the meeting [in las vegas] there was a great deal of concern expressed about: 1. the lack of communication among the generators of machine-readable serials files. 2. the incompatibility of format and/ or bibliographic data among existing files. 3. the apparent confusion about the technical communications 61 existing and proposed bibliographic description and format 'standards'." end of quote. the toronto group agreed that something could and should be done about these problems. if nothing else, better communications among those libraries and systems creating machine-readable files would allow each to enhance its own systems development by taking advantage of what others were doing. as the discussions progressed, several points of consensus emerged. among them were: 1. the marc serials distribution service of the library of congress and the national serials data program together were not building a national serials data base in machine-readable form fast enough to satisfy the requirements of developing library systems. this systems development was, in several places, at the point where it could no longer wait on serials data base development at the national level as long as progress remained at the current rate. 2. the marc serials format developed at lc offered the only hope for machine format capability. every system represented planned to use it. for the purpose of building a composite data base outside lc, the marc serials format would probably require minor modification, principally by extension. these extensions could and should be added on so as to do no violence to software already developed to handle marc serials. 3. there existed some difference between the lc marc serials format and that used by the national serials data program. these differences arose from several circumstances. for example, the marc serials format predated the international serials data system (isds), the national serials data program, and the key title concept. when these three came along, the requirement existed that the nsdp abide by the conventions of the isds. since the key title 62 journal of librm·y automation vol. 7/1 march 1974 is not yet a cataloging title, but is the title to which the international standard serial number is assigned, it is natural that the approach to serial record creation by nsdp should be different from that of a library cataloging serials by conventional methods. a working group under the auspices of the ifla cataloguing secretariat has devised an international standard bibliographic description for serials. 
the working group's recommendations are to be distributed for trial, discussion, and recommendation for change in february. when the isbd ( s) is accepted into cataloging practice, some of the differences in marc usage and nsdp procedure will disappear. others will still remain and they must be reconciled. we cannot continue with two serial records, both of which claim to be national in purpose but which are incompatible with each other. a good exposition of the differences in these serials records from the point of view of the marc development office is in an article by mrs. josephine pulsifer in the december 1973 issue of the journal of libmry automation. 4. major canadian libraries are active in cooperative work on serials and these two national efforts should be coordinated. several other circumstances bear on the problem. for example, the national serials data program is a national commitment of the three national libraries. in addition to the funding from the three national libraries, there are excellent chances that the nsdp will receive funds from other sources to expedite its activities. the nsdp is responsible for the issn and key title and for relationships with the international serials data system. ultimately, the issn and key title will be of great importance to serials handling in all libraries. for all of these reasons it is imperative that the activities of the nsdp be channeled into the comprehensive data base building effort described in this paper. when it was realized at the council on library resources that the toronto group was serious and that a data base building effort would result, it was obvious that this had enormous significance for the library of congress and other library systems because the result would be a de facto national serials data base. accordingly, a paper was prepared and sent to lc, urging that an effort be made in washington to coordinate the efforts of the marc serials distribution service, the national serials data program, and this external effort. in addition, it was felt that lc should take a hard look at its own several serials processing flows and attempt to reconcile them better with each other and with the external effort. to do this, lc was urged to do a brief study of lc serials systems, using lc staff and one person from clr. lc agreed and the study is now very nearly complete. the written guidance given the study group members was quite specific. they were to study all serials flow at lc and make their recommendations based on what lc should be doing, rather than being constrained by what lc is doing. the overall objectives of the study were to aim for the creation of serials records as near the source as possible and onetime conversion of each record to machine-readable form to serve multiple uses. specifically to be examined were the serials processing flows of the copyright office, the order division, the serial record division, new serials titles, and the national serials data program. while all of this was going forward, the toronto group had some more meetings. oclc was tentatively selected as the site for the data base building effort. it is understood by everyone that this is a temporary solution; eventually a national-level effort must be mounted which will provide a post-edit capability to bring the composite data base up to nationally acceptable standards. a permanent update capability is also required. this permanent activity, hopefully, will be based at the library of congress. 
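building one composite file out of records arriving from many libraries turns on recognizing when an incoming serial is already in the base. the fragment below is a hypothetical sketch of such a matching step (by issn when present, otherwise by a normalized title), written in python for illustration; it is not the oclc system's actual logic, and the sample records are invented.

# illustrative matching step for a composite serials file: search first,
# update an existing record rather than adding a duplicate. a real system
# would also need conflict review and the post-edit described above.
def normalize(title):
    return " ".join(title.lower().split())

base = {}           # issn -> record
base_by_title = {}  # normalized title -> record

def add_or_update(record):
    key = record.get("issn")
    existing = base.get(key) if key else base_by_title.get(normalize(record["title"]))
    if existing:
        for field, value in record.items():
            existing.setdefault(field, value)  # fill only fields still empty
        return existing
    if key:
        base[key] = record
    base_by_title[normalize(record["title"])] = record
    return record

add_or_update({"issn": "0000-0000", "title": "journal of library automation"})
add_or_update({"title": "Journal of Library Automation", "publisher": "ala"})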
oclc was chosen as the interim site for several reasons, but especially for its proven capability to produce network software and support which will work. within a very short time oclc will have on-line serials cataloging and input capability which will extend to some two hundred libraries. no other system is nearly so far advanced. the toronto group has assured itself that the data record oclc intends to use is adequate and is now working on the conventions required to insure consistency in input and content, to include some recommendations for minor additions to the marc serials format. during their deliberations, the toronto group realized that, to be effective, their efforts needed formal sponsorship, and discussions to this end were begun. initially, several agencies were considered to be candidates for this management role. various considerations quickly narrowed the list down to the library of congress, the association of research libraries, and the council on library resources, and representatives of these three met to discuss the matter further. during the discussions, clr was asked to assume the interim management responsibility until a permanent arrangement could be worked out. clr was selected because, as an operating foundation under the tax laws, it can act expeditiously in matters of this kind. clr can also deal with all kinds of libraries and has no vested interest in any particular course of action. meanwhile, certain institutions in the toronto group had indicated that they were ready to pledge $10,000 among themselves for the specific purpose of hiring mr. anable as a consultant to continue his coordinating activities. the group asked clr to act as agent to collect and disburse these funds. clr is ready to assume the initial responsibility for the management of this cooperative data base building effort, if that is the will of the leadership in the library community. clr is prepared to commit one staff member full time to the project who is well versed in the machine handling of marc serials records. this is mr. george parsons, and other staff members will assist as appropriate. mr. anable has agreed to act as a consultant to help coordinate these activities. clr would aim for the most complete, accutechnical c01nmunications 63 rate, and consistent serial record in the lc marc serials format which can be had under the circumstances. during the effort, clr will act as the point of contact between oclc and the participating libraries, assisting in negotiating contracts and other agreements as required. the composite data base will be made available to all other libraries at the least possible cost for copying. initially at least, the costs of this effort will have to be shared by the participating libraries, since no additional funds are presently available. the goal is to build 100,000 serial records the first year, another 100,000 the second year, and design and implement the permanent mechanism the third year, while file-building continues. as the project gets under way, it will work like this: a set of detailed written guidelines for establishing the record and creating the input will be promulgated, and agreement to abide by them will be a prerequisite to participation. selected libraries with known excellence in serial records will be asked to participate; others may request participation. those selected who already have or can arrange for terminals on the oclc system will participate on line. 
this is the preferred method, but it may be possible to permit record creation off line, such records to be added to the data base in a batch mode. it is very difficult to merge serial files from different sources in this way, so an attempt will be made to find a large serials data base in machine-readable form for use as a starting point. this file would be read into the oclc system. a participating library wishing to enter a record would first search to see whether it existed in the initial data base. if a record is found, it would be updated insofar as this is possible, within the standards chosen for the system. it may be further updated by other participants, still within the system standards, but at some point update on a record in the system will reach a point of diminishing returns and the record will remain static until a post-edit at the national level can be performed. these records will be for use as their recipients see fit, but their prime purpose is to support the development of automated serials sys64 journal of library automation vol. 7/1 march 1974 terns while eliminating duplication of effort. details of how to hag these records in the oclc data base as they are being created by this effort will be worked out, as will be the relationship between this effort and the rest of oclc activities. clr will, from time to time, report progress to the community. it would be the hope of clr that the toronto group will continue to assist in the technical and detailed aspects of the project. in addition, and after consultation with the appropriate people, an advisory group will be appointed to advise clr in this effort. lawrence living8ton council on library resou1'ces input to the editor: re: file convm·sion using optical scanning at berkeley and the university of minnesota discussed by stephen silber8tein, jola technical communications, december 1973. it is rewarding to find someone who has actually read in detail one's published work (grosch, a. n. "computer-based subject authority files at the university of minnesota libraries"), i generally agree with mr. silberstein's observations regarding the use of optical scanning for library file conversion. however, several points were raised by mr. silberstein on which i feel further comment is needed. perhaps in my article i should have cautioned the reader that when developing procedure and programs for the cdc 915 page reader, there is a great variance in these machines depending upon: 1. how early a serial number unit, i.e., vintage of machine, 2. what version of the software system grasp is being used, 3. what degree of machine maintenance is performed out, and 4. what kinds of other customers are using the scanner. it was our misfortune to have a cdc 915 page reader that had many peculiarities about it which could or would not be resolved by a maintenance engineer. in addition it was not heavily used and what use it did receive was mostly nonrepetitive conversion jobs dealing mostly with mailing address file creation and freight billing. in our initial testing we tried to use various stock bond paper and had various reading difficulties. in talking with others who had used this particular machine we found that the choice of paper stock was critical on this scanner. i might add that we did not actually use $400 worth of paper on this as i sold half of the stock we had ordered to another user locally who was going to use this device. 
it might be worth mentioning that we had a failure of a potentially large conversion project reported to us. this project tried to use this equipment but could not create a suitable input format because of a specific uncorrected peculiarity of not being able to read lines of greater than six inches without repeated rejects. we were aware of this from our experience, which is why we kept our lines short, using the ro to terminate reading of the line at the last character position. also, our input was double spaced, not single spaced as you seem to infer in your comments. with this particular device we also found that the format recognition line was easily lost, necessitating greater time spent in re-running the job. therefore, even though this was a great commission of sin on our part according to mr. silberstein, i must plead guilty to using expedient methods to turn a bad situation into an acceptable one. i might also point out that this solution had been employed at various times by some past users we contacted. in fact, i have since found out that such a technique has occasionally been resorted to in one of our other local user installations on a much newer machine. i do not wish to imply that our conversion achieved maximum throughput, but in any case it was a cost-effective way to proceed. with a small file conversion such as this one, which is to be done on a one-shot basis, it seemed foolish to me to spend much time optimizing, but rather to find a way that worked as our difficulties were encountered. if this had to be a continuing job, we would have had to get a better maintained scanner and invest more time and money in the project. i take the view that we wish to couple modest human costs with modest projects and reserve more optimized procedures for greater projects of a continuing nature. i agree that file cleansing is undoubtedly the most costly operation, but i cannot say by just what amount, since my responsibilities did not include such work; this was later performed by our technical services department. our general point in writing about this project was to convey our broad experiences using this technique on a subject authority system, as we had not seen such use reported in the literature previously. i would hope your comments and mine here serve to illustrate that one's systems problems must be solved in light of the conditions and not always according to what we term the best theory or practice. to this end i hope others will profit from both of our comments. audrey n. grosch, university of minnesota libraries
the lc/marc record as a national standard
the desire to promote exchange of bibliographic data has given rise to a rather cacophonous debate concerning marc as a "standard" and the definition of a marc-compatible record. much of the confusion has arisen out of a failure to carefully separate the intellectual content of a bibliographic record, the specific analysis to which it is subjected in an lc/marc format, and its physical representation on magnetic tape. in addition, there has been a tendency to obscure the different requirements of users and creators of machine-readable bibliographic data. in general, the standards-making process attempts to find a consensus among both groups based on existing practice. the process of standardization is rarely one which relies on enlightened legislation. rather, a more pragmatic approach is taken, based on an evaluation of the costs to manufacturers weighed against costs to consumers.
even this modest approach is not invested with lasting wisdom. ansi standards, for example, are subject to quinquennial review. standards, as already pointed out, have as their basis common acceptance of conventions. thus, it might prove useful to examine the conventions employed in an lc/marc record. the most important of these is the anglo-american cataloging rules as interpreted by lc. the use of these rules for descriptive cataloging and choice of entry is universal enough that they may safely be considered a standard. similar comments may be made concerning the subject headings used in the dictionary catalog of the library of congress. the physical format within which machine-readable bibliographic data may be transmitted is accepted as a codified national and international standard (ansi z39.2-1971 and iso 2709-1973 (e)). this standard, which is only seven pages in length, should be carefully read by anyone seriously concerned with the problems of bibliographic data interchange. ansi z39.2 is quite different from the published lc/marc formats. it defines little more than the structure of a variable-length record. simply stated, ansi z39.2 specifies only that a record shall contain a leader specifying its physical attributes, a directory for identifying elements within the record by numeric tag (the values of the tags are not defined), and, optionally, additional designators which may be used to provide further information regarding fields and subfields. this structure is completely general. within this same structure one could transmit book orders, a bibliographic record, an abstract, or an authority record by adopting specific conventions regarding the interpretation of numeric tags. thus, we come to the crux of the problem, the meanings of the content designators. content designators (numeric tags, subfields, delimiters, etc.) are not synonymous with elements of bibliographic description; rather, they represent the level of explicitness we wish to achieve in encoding a record. it might safely be said that in the most common use of a marc record, card production, scarcely more than the paragraph distinctions on an lc card are really necessary. if we accept such an argument, then we can simply define compatibility with lc/marc in terms of a particular class of applications, e.g., card, book, or crt catalog creation. a record may be said to be compatible with lc/marc if a system which accepts a record as created by lc produces from the compatible record products not discernibly different from those created from an lc/marc record. thus, what is called for is a family of standards, all downwardly compatible with lc/marc, employing ansi z39.2 as a structural base. this represents the only rational approach. the alternative is to accept lc/marc conventions as worthy of veneration as artistic expression. s. michael malinconico
computer assisted circulation control at health sciences library sunyab
jean k. miller: associate health sciences librarian for circulation and dissemination
a description of the circulation system which the health sciences library at the state university of new york at buffalo has been using since october 1970. features of the system include automatic production of overdue, fine, and billing notices; notices for call-in of requested books; and book availability notices.
remote operation and processing on the ibm 360/40 and cdc 6400 computers are accomplished via the administrative terminal system (ats) and terminal job entry (tje). the system provides information for management of the collection and improved service to the user.

introduction

the health sciences library of the state university of new york at buffalo (sunyab) serves the teaching, research, and clinical programs of the five schools of health sciences at the university (medicine, dentistry, pharmacy, nursing, and health related professions) as well as the department of biology. it is the biomedical resource library for the five teaching hospitals affiliated with sunyab and for the health professionals within the nine counties of the lakes area regional medical program. service demands had increased steadily since 1961 with the incorporation of the university within the state of new york. this was apparent in the circulation department, where statistics indicated a 21 percent increase in the circulation of materials between fy 1967/68 and 1968/69. the circulation system in use was inefficient and time-consuming for both the user and the clerical staff. the user was required to fill out a charge card for each book, giving his name, address, and status; and the title, author, year of publication, volume, copy number, and call number of the book. the card was stamped with the due date and filed alphabetically by main entry. problems resulted from illegible handwriting, selection of incorrect main entry, and incorrect filing. control of library materials was inadequate. the system to be described was adopted following consideration of the requirements of an effective system of circulation control and of the resources available to the library. planning for the development and implementation of the automated circulation system began in the fall of 1969. funding was provided by a medical library resource grant and the office of the provost of health sciences of sunyab. system design began in february 1970; programming was accomplished during june and july; implementation started in august; and the system was operational in october 1970. costs of operation have been provided by the university libraries of sunyab since april 1971.

computer facilities

the health sciences library shares the facilities provided by the department of computer services on campus. the current installation is an ibm 360/40 h with an eight-disk drive 2319 unit, six magnetic tape devices, card read and punch unit, and a 1100 line-per-minute printer. it includes a 2703 telecommunications unit supporting forty 2741 terminals and a 2701 unit with parallel data adapter unit interfacing a channel-to-channel adapter to a cdc 6400 computer. the ibm operating system, scope 3.2.0 version 1.1, is used.

processing

the library's circulation system was designed to use the administrative terminal system (ats) and terminal job entry (tje) for remote operation and processing on the ibm 360/40 and cdc 6400 computers. programs are written in fortran for cdc 6400-6500-6600 (version 2.3). the program modules comprising the circulation system require from 1k to 60k and from 0 to 2 tape units for processing. ats documents are used rather than punched card decks as program and data input media. the system incorporates several large data bases which are updated at regular intervals.
a file of current circulation transactions (80 characters per record) is maintained on both magnetic tape and in ats storage. this file is merged daily with new transactions. names and addresses of university personnel and students are maintained on magnetic tape. a file of inactive circulation records (50 characters per record) is also maintained on tape. other smaller files are stored in ats and are updated daily and/or weekly. no permanent disk storage is used. input of data and programs is made from the ibm 2741 terminal located in the circulation office of the library. data are entered daily by the clerical staff via the ats terminal. storage, retrieval, and text editing are performed as required. processing of data is initiated by the library staff. a properly sequenced assemblage of ats documents consisting of data and programming instructions (the tje input file) is input from the ibm 2741 terminal. this input file is submitted through tje for execution on the cdc 6400 computer. the data are processed in accordance with the specific job command entered at the terminal. after processing, the output is stored as a single ats document. in some instances, the clerical staff divides the ats stored output into discrete output files for storage and subsequent use. selected segments of the output (notices, save lists, etc.) are produced in hard copy format and delivered to the library by the computer center (figure 1).

hsccirc system

the health sciences library circulation system (hsccirc) provides:
1. a file (query file) of all monographs off the shelves which includes a record of:
   a. books charged out.
   b. books on interlibrary loan.
   c. books on reserve.
   d. books at the bindery.
   e. books on the "hold shelf" which have been returned upon the request of another patron.
   f. books on the "new book shelf."
   g. books which have been declared lost and are in the process of being replaced.
2. overdue notices to all borrowers.
3. billing notices to students for those books not returned after a second overdue has been sent.
4. a file (fine file) indicating the amount owed by individual students for overdues.
5. fine notices to students if an overdue book is returned but the fine is not paid.
6. notices to users having books requested by other patrons.
7. hold shelf notices alerting library personnel to those books which have been reserved for library patrons.
8. book availability notices to users who have made "save" requests.
9. a file (history file) containing records of inactive transactions.
10. daily and cumulative (fiscal year-to-date) statistics of the transactions.

the foregoing lists the information which the system provides on a routine basis. other modules of the system permit access to additional information as required. for example, lists may be prepared of books currently in circulation to interlibrary loan, on reserve, or at the bindery. these lists are used by the staff involved in processing these materials and may be updated at their request.

[fig. 1. system overview: flowchart of query file, fine file, and transaction file processing, together with history file analysis, address file update, semester faculty/staff letter, and special run sequences (ill, reserve, bindery).]
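the daily merge of the master transaction file with the day's new transactions, described above, can be pictured with a minimal sketch. the file names and the assumption that the call number occupies the first columns of each 80-character record are illustrative only; the article reports the record length but not the field layout, and the production programs were written in fortran rather than the python used here.

    # a minimal sketch, not the sunyab implementation: merge yesterday's master
    # file of 80-character circulation records with today's transactions and
    # keep the result in call number order for the printed query file.
    def read_records(path):
        with open(path) as handle:
            return [line.rstrip("\n") for line in handle if line.strip()]

    def merge_daily(master_path, new_path, out_path, key_width=24):
        merged = read_records(master_path) + read_records(new_path)
        merged.sort(key=lambda record: record[:key_width])  # assumed: call number first
        with open(out_path, "w") as out:
            for record in merged:
                out.write(record + "\n")

    merge_daily("circ_master.dat", "todays_transactions.dat", "circ_master_new.dat")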
the history file is analyzed quarterly. the analysis provides a statistical breakdown, by user categories, of the transactions which occurred since the last analysis. the total number of charges, renewals, and save requests for each of the five user categories are tallied. the call numbers of the books borrowed by members of each user category are listed. multiple charges of the same book are incremented and recorded. this information on book usage and borrowing patterns assists in library management decisions. it is possible to identify high usage of specific volumes or subject areas and to determine whether the demand is from the faculty, staff, or graduate or undergraduate student body. records of heavy demand and multiple save requests aid in decisions to purchase additional copies of a monograph. at the end of each semester, faculty/staff letters are prepared and mailed. each notice lists the call number and due date for overdue books currently charged to the faculty or staff member. the notice requests return of the book(s) before the beginning of the next semester. statistics generated by the system (figure 2) are used in the preparation of monthly, quarterly, and annual reports. they have been used as a basis for decisions on policy such as that resulting in a change in the length of the circulation period in april 1971. subsequent statistics have been used to evaluate such changes. in addition, the system permits rapid, easy consultation of the query file to detect the location of any book off the shelves. this is accomplished through use of the printed query file (figure 3) which is arranged in call number sequence. it contains one line of information for each transaction changing the status of a book.

                         42572   year to date
new chrgs                  112           2225
holds                        5            185
spcl chrgs (ill)             5             85
spcl chrgs (bnd)             0             48
spcl chrgs (res)             1            116
renewals                    12            308
save requests                3             74
recall letters               2             63
hold letters                 3             99
books overdue               48           1230
1st overdue                  0            773
2nd overdue                  0            364
bills                        0            122
lost books                   2             61
discharges                 173           2837
discharges (hld shlf)        5            154
fig. 2. circulation statistics.
[fig. 3. query file (excerpt): one line per transaction in call number sequence, e.g. *ql698/h25/1 *c10 *t 61972 *d 71972 *p 138551 *u3. legend: * call number; *c transaction code; *t date of transaction; *d due date or date next notice will be generated; *p patron identification number; *u user category.]

for example, when a book is charged, the query file contains one line of information relating to the charge. if the book becomes overdue, a second line of information is automatically generated indicating the overdue status of the book. a two-digit transaction code defines the status. the transaction code is entered as part of the input (as code 10 when charging a book); or it is generated by the system, as occurs when a book becomes overdue and initial and subsequent overdue, billing, and/or fine notices are produced (codes 51, 52, 53, 54). this same information may be obtained through on-line query of the circulation file from the ibm 2741 terminal during the hours of operation of ats. access to the query file is either by call number of the book or identification number of the user. the latter is used when producing lists of items out on loan to a borrower and in detecting delinquent borrowers.

overview

comparison of statistics between fy 1969/70 and fy 1971/72 showed a 12 percent increase in circulation. during the same period there was a 61 percent increase in the number of people using the library. the circulation department has been able to handle the increased workload more efficiently because of the automated system. a decrease in clerical time required for carrying out the tasks of the department has been realized. the circulation records are now updated five times per week and notices are issued promptly. previously, updating was possible only once in every seven to ten days.
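to make the query-file format concrete, here is a minimal sketch that reads one line using the legend printed with figure 3. the date handling (month, two digits of day, two digits of year) and the cutoff date are assumptions for illustration; only the field tags and transaction code 10 (a charge) come from the article.

    # parse one printed query-file line per the figure 3 legend:
    # * call number, *c transaction code, *t transaction date, *d due date,
    # *p patron identification number, *u user category.
    import re
    from datetime import date

    FIELD = re.compile(r"\*([ctdpu])\s*(\S+)")

    def parse_line(line):
        call_number, rest = line.split(" *", 1)
        entry = {"call": call_number.lstrip("*").strip()}
        entry.update(FIELD.findall("*" + rest))
        return entry

    def to_date(digits):
        digits = digits.zfill(6)            # "60772" -> "060772", i.e. 6/07/72 (assumed)
        return date(1900 + int(digits[4:]), int(digits[:2]), int(digits[2:4]))

    sample = "*ql698/h25/1 *c10 *t 61972 *d 71972 *p 138551 *u3"
    record = parse_line(sample)
    if record["c"] == "10" and to_date(record["d"]) < date(1972, 8, 1):
        print("overdue candidate:", record["call"], "patron", record["p"])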
service to the user is much faster and more accurate in charging books and in providing information on book status. control of items loaned to users is more effective. information for management of the collection and provision of improved service is available. system disadvantages are related to the mode of data input and lack of author and title information on records. transcription errors occur during manual capture of data at the time the transaction occurs and when the data are entered by the clerical staff from the terminal. correction of errors requires rekeying and reentry of the corrected data for reprocessing. this increases cost in terms of personnel time and equipment use. author and title information is not provided in the query file or on notices sent to users. this is an inconvenience to the user and requires checking of the shelf list by library personnel to provide the information when required. these potential disadvantages were recognized at the time the system was planned. however, they were not considered serious drawbacks. the decision was made to adopt the system and, when additional funds were forthcoming, to provide machine readable input and add author and title information to the records.

costs

the cost of the system during its first year of operation was $10,590.65. this included monthly charges for rental of equipment, use of ats, storage of records, computer time, and print costs.

ibm 2741 terminal (including phone line)     $1,082.86
ats sign-on time                              1,042.08
ats storage                                   3,187.24
computer time and print costs                 5,278.47
total                                       $10,590.65

unit cost figures are imperfect, but over 69,000 transactions were processed and over 20,000 notices generated at an average unit cost of 11.6 cents. clerical time is not included in this figure. the number of clerical assistants remained constant although, as noted, all phases of the work of the circulation department increased.

future development

in the future, the library hopes to be able to take greater advantage of the on-line query capability of the present system. additional ibm 2741 terminals at selected locations in the library could provide instantaneous file query. while non-routine queries are made on-line, the library now uses printed listings for most routine queries. the installation of automatic data input devices, such as ibm 1030 equipment, would permit reading of coded book cards and patron identification cards with direct transmission of data to ats storage. the hardware and software modification required to implement this additional capability is technically feasible and not financially prohibitive. the present system is to be installed soon in another library on the sunyab campus. implementation should require only minimal software modifications to identify and keep separate the records of the other library. adoption is simplified by the fact that book cards are not required and that the circulation file consists only of charged materials and not a record of complete library holdings.

acknowledgments

the following individuals contributed their varied talents and support to the development and implementation of the system: mr. erich meyerhoff, former librarian of the health sciences library; gerald lazorick, systems design programmer, former director, technical information dissemination bureau, sunyab; mrs. jean risley, programmer/analyst; mr.
mark fennessy, former library intern at the health sciences library; and the clerical staff of the circulation department, especially barbara helminiak and evelyn hufford.

of the people, for the people: digital literature resource knowledge recommendation based on user cognition

wen lou, hui wang, and jiangen he

information technology and libraries | september 2018

wen lou (wlou@infor.ecnu.edu.cn) is an assistant professor in the faculty of economics and management, east china normal university. hui wang (1830233606@qq.com) is a graduate student in the faculty of economics and management, east china normal university. jiangen he (jiangen.he@drexel.edu) is a doctoral student in the college of computing and informatics, drexel university.

abstract

we attempt to improve user satisfaction with the effects of retrieval results and visual appearance by employing users' own information. user feedback on digital platforms has been proven to be one type of user cognition. by constructing a digital literature resource organization model based on user cognition, our proposal improves both the content and presentation of retrieval systems. this paper takes powell's city of books as an example to describe the construction process of a knowledge network. the model consists of two parts. in the unstructured data part, synopses and reviews were recorded as representatives of user cognition. to build the resource category, linguistic and semantic analyses were used to analyze the concepts and the relationships among them. in the structural data part, the metadata of every book was linked with each other by informetrics relationships. the semantic resource was constructed to assist with building the knowledge network. we conducted a mock-up to compare the new category and knowledge-recommendation system with the current retrieval system. thirty-nine subjects examined our mock-up and highly valued the differences we made for the improvements in retrieval and appearance. knowledge recommendation based on user cognition was tested to be positive based on user feedback. there could be more research objects for digital resource knowledge recommendations based on user cognition.

introduction

the concept of user cognition originates in cognitive psychology. this concept principally explores the human cognition process through information-processing methods.1 the concept characterizes a process in which a user obtains unknown information and knowledge through acquired information. as information-science workers, we may explore the psychological activities of users by analyzing their cognitive processes when they are using information services.2 a knowledge-recommendation service based on user cognition has become essential since it emphasizes facilitating collaborations between humans and computers and promotes the participation of users, which ultimately improves user satisfaction. a knowledge-recommendation system is based on a combination of information organization, a retrieval system, and knowledge visualization.3 however, when exploring digital online literature resources, it is difficult to quickly and precisely find what we want because of the problem of information organization and retrieval. most search results only display a one-by-one list view.
thus, adding visualization techniques to an interface could improve user satisfaction. furthermore, the retrieval system and visualizations rely on information organization. only if information is well designed can the retrieval system and visualization be useful. therefore, we attempt to improve retrieval efficiency by proposing a digital literature resource organization model based on user cognition to improve both the content and presentation of retrieval systems. taking powell's city of books as an example, this paper proposes user feedback as first-hand user information. we will focus on (1) resource organizations based on user cognition and (2) new formats of search results based on knowledge recommendations. we will purposefully employ data from users' own information and give knowledge back to users in accordance with the quote "of the people, for the people."

related work

user cognition and measurement

user cognition usually consists of a series of processes, including feeling, noticing, temporary memory, learning, thinking, and long-term memory.4 feeling and noticing are at an inferior level, while learning, thinking, and memory are comparatively superior. researchers have so far tried to identify user cognition processes by analyzing user needs. there are four levels of user needs according to ma and yang5 (see figure 1). in turn, user interests normally reflect potential user needs. users who retrieve information on their own show feeling needs. users who give feedback show expression needs. users who ask questions show knowledge needs, which is the highest level. the methods to quantify user cognition require visible and measurable variables. existing studies have commonly used website log analysis or user surveys. website log analysis has been proven to be a solid data source to record and analyze both user interests and information needs.6 user surveys, including online questionnaires and face-to-face interviews, have been widely used to comprehend user feelings and user satisfaction.7 user surveys generally measure two kinds of relationship: between users and digital services and between users and the digital community.8 with a survey, we can make the most of statistics and assessment studies to analyze user satisfaction with an array of standards and systems of existing service platforms, service environments, service quality, and service personnel, which provides some references and suggestions for future study of user experience quality, platform elements, interaction process, and more.9 however, neither log data nor surveys can obtain first-hand user information in real-life settings. eye tracking and the concept-map method can be used to understand user behavior in the course of user testing.10 however, these approaches are difficult to adapt to a large group of users. therefore, a linguistic-oriented review analysis has become an increasingly important method. user content, including reviews and tags, can be analyzed through text mining and becomes a valuable data source for learning users' preferences for products and services in the areas of electronic commerce and digital libraries.11 this type of data has been called "more than words."12

figure 1. understanding user cognition by analyzing user needs.
user-oriented knowledge service model

the user-oriented service model includes user demand, user cognition, and user information behavior. a service model based on user demand chiefly concentrates on the motives, habits, regularities, and purposes of user demand to identify the model of user demand so that the appropriate service is adopted.13 service models based on user cognition attach importance to the process of user cognition, the influences that users are facing,14 and the change of library information services under the effects of a series of cognitive processes (such as feeling, receiving, memorizing, and thinking).15 a service model based on user information behavior focuses on interactive behavior in the process of library information services that users participate in, such as interactions with academic librarians, knowledge platforms,16 and others. studies have paid more attention to the pre-process of the user-oriented service model, which analyzes information habits and user behaviors.17 studies have also proposed frameworks of knowledge services, design innovations,18 or personalized systems and frames of the knowledge service model, but they have not succeeded in implementing them or performing user testing.

knowledge service system construction

most studies of knowledge service system construction are in business areas. numerous studies have explored knowledge-innovation systems for product services.19 cheung et al. proposed a knowledge system to improve customer service.20 vitharana, jain, and zahedi composed a knowledge repository to enhance the knowledge-analysis skills of business consultants.21 from the angle of user demand, zhou analyzed the elements of service-platform construction and found that crucial platforms should serve knowledge service system construction.22 scholars proposed basic models for knowledge management and knowledge sharing, but they did not simulate their applications.23 knowledge management from the library-science perspective is very different from that in the business area. library knowledge management usually refers to a digital library, especially a personal digital library.24 others explore and attempt to construct a personalized knowledge service system,25 while fewer studies about system designs are based on the results of documented user surveys. we rarely see a user-feedback study combined with the method of using users' own knowledge. users themselves know what they desire. if user-oriented studies separate the system design from user-needs analysis or the other way around, the studies may miss the purpose. therefore, we propose a resource-organization method based on users' own knowledge to close the distance between the users and the system.

resource-organization model based on user cognition

there are normally two ways to construct a category system. one method gathers experts to determine categories and assign content to them; the category system comes first and the content second. the other method is to derive a category tree from the content itself, as we propose in this paper. in this way, the content takes priority over the categorization system. in this paper, we focus on this second way to organize resources and index content. resource organization requires a series of steps, including information processing, extraction, and organization. figure 2 shows the resource-organization model based on user cognition.
this model fits the needs of digital resources with comments and reviews. the model has two interrelated parts. one is for indexing the content, and the other is for knowledge recommendations. for the first part, the model integrates all the comments and reviews of all literature in an area or in the whole resource. the core concepts and the relationships among the concepts are extracted through natural language processing. the relationships between concepts are either subordination or correlation. a triple consists of two core concepts and their relationship. the triple set includes all triples. next, all books are indexed by taxonomy in the new category system. however, the indexing of every book is not based upon the traditional method, which is to manually determine each category by reading the literature. we use a method based on the books' content. in parallel with extracting the core concepts from all books, we extract the core concepts from every individual book by the same semantic-analysis methods and build up triples for that book. then the triples of this book can be matched against the triple set in the new category system. once a triple in a single book yields a maximum matching value, the core concepts in the triple set will be indexed as the keywords of the book. a few examples of the matching process will be discussed in the empirical study (in the section "indexing books"). the first part is about comments and reviews, which are unstructured data. the second part is to make use of structural data in the bibliography to build a semantic network. structural data, including titles, keywords, authors, and publishers, is stored separately. we calculate the informetrics relationships among the entities. the relationships can be among different entities, such as between one author and another or between an author and a publisher. then two entities and their relationship compose a triple. the components in triples are linked to each other, which makes them semantic resources. furthermore, the keywords in the structural data are not the original keywords from before the new category system but the modified keywords. finally, the reindexed resources (books in the new category) and semantic resources (the triples from structural data) are both used to build the knowledge network.

figure 2. resource-organization model based on user cognition.

however, why is it important to use both unstructured data and structural data? the reason is to cover the entire content of a literature resource. neither of them can fully represent the whole semantics of a literature resource. structural data lacks subjective content, and unstructured data lacks basic information. thus, a full semantic network can be built using both kinds of data.

resource-organization experiment

object selection

located in portland, oregon, powell's city of books (hereafter referred to as "book city") is one of the largest bookstores in the united states, with 200 million books in its inventory. book city caught our eye for four reasons. (1) the comments and reviews of books on book city's website are well constructed and plentiful. the national geographic channel established it as one of the ten best bookstores in the world.26 atlantis books, pendulo, and munro's books are also on the list.
among these bookstores, only book city and munro's books have indexed the information of comments and reviews. since user reviews are fundamental to this study, we restricted ourselves to bookstores that provided user reviews. (2) we excluded libraries because literature resources have been well organized in libraries; it might not be necessary to reorganize them according to user cognition. however, we may revisit this topic in a future study. (3) book city is a typical online bookstore that also has a physical bookstore. unlike amazon, book city, indigo, barnes & noble, and munro's books have physical bookstores. however, they all have technological limitations in retrieval-system and taxonomical construction compared to amazon. thus, it is necessary to investigate these bookstores' online systems and optimize them. (4) the location was geographically convenient to the researchers. the authors are more familiar with book city than with other bookstores. moreover, we planned on conducting face-to-face interviews for the user study, which is doable only if the authors can get to the bookstore and the users who live there. in all, we chose book city as a representative object.

data collection and processing

on december 22, 2015, we randomly selected the field "cooking and food" and downloaded bibliographic data for 462 new and old books that included title, picture, synopsis and review, isbn, publication date, author, and keywords. in our previous work we described how metadata for all kinds of literature can be categorized into one of three types: structural data, semistructural data, and unstructured data27 (see table 1). title, isbn, date, publisher, and author are classified as structural data. titles can be seen as structural data or unstructured data depending on the need. titles will be considered an indivisible entity in this paper, as titles need to retain their original meanings. keywords are considered semistructural data for two reasons: (1) normally one book is indexed with multiple keywords, which are natural language; and (2) keywords are separated by punctuation. each keyword can individually exist with its own meaning. however, in the current category system, keywords are the names of categories and subcategories. since we are about to reorganize the category system, the current keywords will not be included in the following steps. we use the field "synopsis and review" in the downloaded bibliographic records as the source of user cognition. synopses and reviews are classified as unstructured data. all synopses and reviews of a single book are first incorporated into one paragraph, since some books contain more than one review. structural data will be stored for constructing a knowledge network. unstructured data will be part-of-speech tagged and word segmented by the stanford segmenter. all the books' metadata are stored into the three defined data types and separate fields. each field is linked by the isbn as the primary key.

category organization

first, the frequencies of words in all books are separately calculated after word segmenting so that core concepts are identified by the frequencies of words. in total, 29,370 words appeared 43,675 times, after excluding stop words. the 206 words in the sample that occurred more than 105 times appeared 34,944 times. this subset was defined as the core words according to the pareto principle.
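a minimal sketch of this core-word step is given below. the frequency threshold of 105 occurrences comes from the passage above; the tiny stop-word list and the whitespace tokenizer are stand-ins for the stop-word handling and the stanford segmenter that the authors actually used.

    # count words across all synopses and reviews and keep only those above the
    # frequency threshold; these become the core words of the new category system.
    from collections import Counter

    STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for"}  # illustrative subset

    def core_words(review_texts, threshold=105):
        counts = Counter()
        for text in review_texts:
            for token in text.lower().split():          # stand-in for the stanford segmenter
                token = token.strip(".,;:!?\"'()")
                if token and token not in STOP_WORDS:
                    counts[token] += 1
        return {word: n for word, n in counts.items() if n > threshold}

    # e.g. reviews = [book["synopsis_and_review"] for book in records]
    #      print(sorted(core_words(reviews)))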
table 1. data sample.
title (structural data): a modern way to eat: 200+ satisfying vegetarian recipes
isbn (structural data): 9781607748038
date (structural data): 04/21/2015
publisher (structural data): ten speed press
author (structural data): anna jones
kwds (semistructural data): cooking and food-vegetarian and natural
synopsis and review (unstructured data): a beautifully photographed and modern vegetarian cookbook packed with quick, healthy, and fresh recipes that explore the full breadth of vegetarian ingredients (grains, nuts, seeds, and seasonal vegetables) from jamie oliver's london-based food stylist and writer anna jones. how we want to eat is changing. more and more people cook without meat several nights a week and are constantly seeking to . . .

we are inspired by zhang et al., who described a linguistic-keywords-extraction method by defining multiple kinds of relationships among words.28 the relationships include direct relationship, indirect relationship, part-whole relationship, and related relationship.
- direct relationship. two core words have a relationship directly to each other.
- indirect relationship. two core words are related and linked by another word acting as an intermediary.
- part-whole relationship. the "is a" relation. one core word belongs to the other. it is the most common relationship in context.
- related relationship. two core words have no relationship but they both appear in a large context.
the first two relationships can be mixed with the second two relationships. for instance, a part-whole relationship can have either a direct relationship or an indirect relationship. for this study, we combined every two core words into pairs for analysis. for example, the sentence "a picnic is a great escape from our day-to-day and a chance to turn a meal into something more festive and memorable" would result in several core-word pairs, including "picnic" and "meal," "picnic" and "festive," and "meal" and "festive." for "picnic" and "meal," there is an obvious part-whole relationship in this context. we observed all their relationships in all books and determined their relationship to be a direct part-whole relationship because 67 percent of their relationships are part-whole relationships, 80 percent are direct relationships, and the others are related relationships. this is the case when two core words are in the same sentence. for two words in different sentences but within one context, we define the words' relationship as a sentence relationship. for example, "ingredient" and "meat" in one review in table 1 have an indirect relationship because they are connected by other core concepts between them. therefore, the relationship between "ingredient" and "meat" is an indirect part-whole one in this context. for other cases, two concepts are either related if they appear in the same context or not related if they do not appear in the same review. thus, all couples of concepts are calculated and stored as semantic triples.

figure 3. parts of a modified category in "cooking and food" based on user cognition.

the next step is to build up a category tree (figure 4). a direct part-whole relationship is that between a parent class and a child class. an indirect part-whole relationship is the relationship between a parent class and a grandchild class. a related relationship is the relationship between sibling classes.
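as a concrete illustration of how such pairs become triples, the sketch below classifies a core-word pair by majority vote over all the contexts in which the two words co-occur, mirroring the "picnic"/"meal" example above (67 percent part-whole, 80 percent direct). the data structures and the voting rule are assumptions made for illustration, not the authors' published algorithm.

    # classify every co-occurring pair of core words and store the result as a
    # semantic triple (word_a, relationship, word_b).
    from collections import Counter

    def classify_pair(observations):
        # observations: (kind, link) tuples from each context where the pair co-occurs,
        # e.g. ("part-whole", "direct") or ("related", "related")
        kinds = Counter(kind for kind, _ in observations)
        links = Counter(link for _, link in observations)
        kind = kinds.most_common(1)[0][0] if kinds else "related"
        link = links.most_common(1)[0][0] if links else "related"
        return link + " " + kind

    def build_triples(pair_observations):
        triples = []
        for pair, observations in pair_observations.items():
            word_a, word_b = sorted(pair)
            triples.append((word_a, classify_pair(observations), word_b))
        return triples

    # build_triples({frozenset({"picnic", "meal"}):
    #                [("part-whole", "direct")] * 8 + [("related", "related")] * 2})
    # -> [("meal", "direct part-whole", "picnic")]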
compared to the modified category system (figure 3), the current hierarchical category system (figure 4) has two major issues. first, some categories' names are duplicated. for example, the child class "by ingredient" contains "fruit," "fruits and vegetables," and "fruits, vegetables, and nuts." second, there are categories without semantic meaning, such as "oversized books." these two problems brought about disorderly indexing and recalled many irrelevant results. for example, the system asks you to refine your search first if you type one word in the search box, but the refinement is confusing because parent classes and child classes are mixed. searching for "diet" books as an example, the system suggests you refine your search from five subcategories of "diet and nutrition" under three different parent classes. the modified category system, however, avoids the duplicated keywords. furthermore, the hierarchical system based on users' comments maintains meaning.

figure 4. parts of the current category system in "cooking and food."

indexing books

we found that the list of keywords was confusing due to the inefficiency of the previous category system. it is therefore necessary to re-index the keywords of each book based on the modified category system. we rely on a data-oriented indexing process. the method to detect the core concepts of each book is the same as that used for all books in the "category organization" step. taking the book a modern way to eat as an example, triples are extracted from the book, including "grain-direct part whole-ingredient," "nut-direct part whole-ingredient," "vegetarian-related-health," and so on. using all triples of the book to match against the triple set built from all books, we index this book to categories by the best matching parent class. in this case, 5 out of 9 triples of a modern way to eat are matched with the parent class "ingredient." another two are matched with "natural" and "technique," and the other two cannot correctly match with the triple set. then, a modern way to eat will be indexed with "cooking and food-ingredient," "cooking and food-natural," and "cooking and food-technique."

semantic-resource construction

the semantic resource is constructed based on the structural data that was prepared at the beginning. the informetrics method (specifically co-word analysis) is used to extract the precise relationships among the bibliography of books, as we previously proposed.29 we construct all structural data together and build co-word matrices between each title, publisher, date, author, and keyword. for example, the author "anna jones" co-occurred with many keywords to varying degrees. the author co-occurred with the keyword "natural" four times and "person" seven times. according to qiu and lou, the precise relationship needs to be divided by thresholds and formatted as literal words.30 therefore, among the degrees of all relationships between "anna jones" and other keywords, the relationship between "anna jones" and "natural" is highly correlated, and the relationship between "anna jones" and "person" is extremely correlated. triples are composed of two concepts and their relationship. then a semantic resource is finally constructed that can be used for knowledge retrieval.

figure 5. an example of the knowledge network.

once the semantic resource is ready, the knowledge network is presentable.
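a minimal sketch of that co-word step follows. the two count thresholds that separate "highly correlated" from "extremely correlated" are invented for illustration (the article defers to qiu and lou for the actual cut-offs); only the idea of counting co-occurrences among structural-data entities and turning the counts into literal relationship words comes from the text above.

    # count co-occurrences among structural-data entities (title, author, publisher,
    # keyword, date) and convert each count into a literal relationship label.
    from collections import Counter
    from itertools import combinations

    def cooccurrence(records):
        counts = Counter()
        for record in records:                       # record: dict of structural fields
            entities = sorted((field, value) for field, value in record.items() if value)
            for a, b in combinations(entities, 2):
                counts[(a, b)] += 1
        return counts

    def label(count, high=3, extreme=6):             # assumed thresholds
        if count >= extreme:
            return "extremely correlated"
        if count >= high:
            return "highly correlated"
        return "related"

    def semantic_triples(records):
        return [(a[1], label(n), b[1]) for (a, b), n in cooccurrence(records).items()]

    # seven records pairing author "anna jones" with keyword "person" would yield
    # ("anna jones", "extremely correlated", "person").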
we adopted d3.js to display the knowledge network (figure 5). the net view automatically exhibits several books related to the author william davis, who is placed in a conspicuous position on the screen. the force-directed map re-forms when users drag any book with the mouse, and the dragged book becomes the visible center of the other books. the network can connect with the database and the website.

user-experience study on knowledge display and recommendation

there are two common ways to evaluate a retrieval system. one is to test statistical results, such as recall and precision. the other is a user study. since our aim is "of the people, for the people," we chose to conduct two user-experience studies rather than report statistical results. as such, we can learn what users suggest and how they comment on our approach.

user-experience study design

in february 2016, with the help of friends, we recruited volunteers by posting fliers in portland, oregon. fifty volunteers contacted us. thirty-nine responses were received by the end of march 2016 because the other eleven volunteers were not able to enroll in the electronic test. since we needed to test the feasibility of both the new indexing category and the knowledge recommendation, we set up the user study in two parts: the comparison of simple retrieval and the knowledge recommendation. first, we requested permission to use the data source and website frame from book city. however, we could not construct a new website for book city due to intellectual-property issues. therefore, we constructed a practical mock-up to guide users through a simulated retrieval experiment. following the procedure of user experience design, we chose mockingbot (https://mockingbot.com) as the mock-up builder. mockingbot allows demo users to experience a vivid version of a system that will be developed later. the mock-up supports every tag that can be linked with other pages, so that subjects could click on the mock-up just as they would on a real website. the demo is expected to help us (1) examine whether our changes would meet the users' satisfaction and (2) gather information for a better design. then we performed face-to-face, user-guided interviews to first gather experience with the previous retrieval system and then compare it with our results. we concurrently recorded the answers and scores of users' feedback. in the following sections, we describe the interview process and present the feedback results.

study 1: comparison of simple retrieval

first, subjects were asked to search for related books written by "michael pollan" at powells.com (figure 6). as expected, all subjects used the search box based on their instincts. then they were asked to find a new hardcover copy of a book named cooked: a natural history of transformation. we paid attention to the ways that subjects located the target. only five of them used keyboard shortcuts to find the target. however, thirteen subjects stated their concerns regarding the absence of refinement options. furthermore, we noticed that six subjects swept (moused over) the refinement area and then decided to continue eye screening. in the meantime, we recorded the time they spent looking for the item. after they found the target, all subjects gave us a score from one to ten that represented their satisfaction with the current retrieval system.
figure 6. screenshot of retrieval results in the current system.

in the comparison experiment, we placed our mock-up in front of subjects and conducted the same test as above. in the mock-up, we used the basic frame of the retrieval system but reframed the refinement area. in the new refinement area (figure 7), we added an optional box with refinement keywords in the left column to narrow the search scope. the logic of the refined keywords comes from the indexing category, as we mentioned in the section on indexing books. "michael pollan" was indexed in six categories: "biographies," "children's books," "cooking and food," "engineering manufactures," "hobby and leisure," and "gardening." thus, when subjects clicked the "cooking and food" category, they could refine the results to only twelve books rather than the seventy books in the current system. users can obtain accurate retrieval results faster. after the subjects completed their tasks, they gave us a score from one to ten representing their satisfaction with the modified retrieval system.

figure 7. refinement results in the modified category-system mock-up.

study 2: knowledge recommendation

in this experiment, we conducted two tests for two functions of knowledge visualization. one tested the preferences for the net view, and the other tested the preferences for the individual recommendation. for the net view, we guided subjects to search for "william davis" in the mock-up and reminded them to click the net view button after the system recalled a list view. then, the subjects could see the net view results in figure 5. we recorded the scores that they gave for the net view. as for the recommendation on individual books, we adopted multiple layers of associated retrieval results for every book. users could click on one book and another related book would show in a new tab window. we asked subjects to conduct a new search for "william davis." then they could browse the website and freely click on any book. once they clicked on davis's book wheat belly: lose the wheat, lose the weight, and find your path back to health, the first recommendation results popped up (figure 8). the recommendation results about wheat in the field of "grain and bread" showed up, including good to the grain: baking with whole grain flours and bread bakers apprentice: mastering the art of extraordinary bread. others about health and losing weight showed up also, such as paleo lunches and breakfasts on the go. all related books appeared because the first book is about both wheat and a healthy diet. a new window showing relevant authors and titles would pop up if the mouse glided over any picture. we asked the subjects about their thoughts on the new recommendation format and recorded the scores.

figure 8. an example of knowledge recommendation.

users' feedback

as a result, knowledge organization and retrieval received a positive response (tables 2 and 3). first, subjects complained about the inefficiency of the current retrieval system in that it took so long to find one book without using shortcut keys (ctrl-f). three quarters of them were not satisfied with the original search style due to the length of search time. however, 67 percent of the subjects gave a score of eight points or more for the refined search results of our new system.
only two of them thought that it was useless, since they were the two users who took fewer than ten seconds to target the exact result. second, 67 percent and 74 percent of the subjects, respectively, thought that the knowledge recommendation and net view were useful and gave them six points or more. however, five subjects gave scores of one point because they maintained that it was not necessary to build a new viewer system.

table 2. the time to find the exact result in the current system.
answers                  # of users
fewer than 10 seconds            2
10 to 30 seconds                 4
30 seconds to 1 minute          12
more than 1 minute              21

table 3. statistics of quantitative questions in the questionnaire (number of subjects per score).
question                                  10   9   8   7   6   5   4   3   2   1   total
satisfied with original results            0   0   0   0   1   9  14   9   4   2      39
preference of refined results              2  10  14   6   5   0   0   0   0   0      37
preference of results in net view          1   8  10   6   4   1   2   3   1   3      39
preference of knowledge recommendation     3   6   4   8   5   6   0   3   1   2      38

during the interviews, subjects who gave scores of more than eight points spoke positively about the vivid visualization of the retrieval results, using words such as "innovative" and "creative." for instance, user 11 said, "bravo changes for powell, that'd be the most innovative experience for the locals." among the subjects who gave scores of more than six points, the comments were mostly "interesting idea." for instance, user 17 commented, "this is an interesting idea to explore my knowledge. i had no idea powell could do such an improvement." some users offered suggestions to improve the system. for example, user 12 suggested that the test was not comprehensive enough to confidently assess whether the modified category system was better than the previous system. user 25 (a possible professional) was very concerned about recall efficiency, since the system might use many matching algorithms.

discussion and conclusion

in this paper, a digital literature resource organization model based on user cognition is proposed. this model aims to let users exert their own initiative. we noticed a significant difference between the previous category system and the new system based on user cognition. our aim, "of the people, for the people," was fulfilled. taking powell's city of books as an example, we described how to construct a knowledge network based on user cognition. the user-experience study showed that this network implements an optimized exhibition of digital-resource knowledge recommendation and knowledge retrieval. although user cognition includes many other processes of user behavior, we used only users' written expressions; this turned out to be a positive and practical way to reveal user cognition. we find that there is considerable room to extend digital resource knowledge recommendation based on user cognition to other objects. for one, in this paper we took only the familiar book city as a study object and books as experiment objects and found favorable positive effects, which indicates that the digital resource knowledge link can be applied to physical libraries and bookstores or other types of literature. even though libraries have well-developed taxonomy systems, they can be compared with or combined with new ideas. for another, users adore visual effects and user functions.
the results show promise in actualizing improvements to book city's website or even to other digital platforms. the concerns will be how to optimize the retrieval algorithm and reduce the time costs in the next study.

acknowledgements

we thank carolyn mckay and powell's city of books for their great help with the questionnaire networking and all participants for their feedback. this work was supported by the national social science foundation of china [grant number 17ctq025].

references and notes

1 peter carruthers, stephen stich, and michael siegal, the cognitive basis of science (cambridge: cambridge university press, 2002).
2 sophie monchaux et al., "query strategies during information searching: effects of prior domain knowledge and complexity of the information problems to be solved," information processing and management 51, no. 5 (2015): 557–69, https://doi.org/10.1016/j.ipm.2015.05.004.
3 hoill jung and kyungyong chung, "knowledge-based dietary nutrition recommendation for obese management," information technology and management 17, no. 1 (2016): 29–42, https://doi.org/10.1007/s10799-015-0218-4.
4 dandan ma, liren gan, and yonghua cen, "research on influence of individual cognitive preferences upon their acceptance for knowledge classification recommendation service," journal of the china society for scientific and technical information 33, no. 7 (2014): 712–29.
5 haiqun ma and zhihe yang, "study on the cognitive model of information searchers from the perspective of neuro-language programming," journal of library science in china 37, no. 3 (2011): 38–47.
6 paul gooding, "exploring the information behaviour of users of welsh newspapers online through web log analysis," journal of documentation 72, no. 2 (2016): 232–46, https://doi.org/10.1108/jd-10-2014-0149.
7 munmun de choudhury and scott counts, "identifying relevant social media content: leveraging information diversity and user cognition," in ht '11: proceedings of the 22nd acm conference on hypertext and hypermedia (new york: acm, 2011), 161–70, https://doi.org/10.1145/1995966.1995990; carol tenopir et al., "academic users' interactions with sciencedirect in search tasks: affective and cognitive behaviors," information processing and management 44, no. 1 (2008): 105–21, https://doi.org/10.1016/j.ipm.2006.10.007.
8 young han bae, jong woo jun, and michelle hough, "uses and gratifications of digital signage and relationships with user interface," journal of international consumer marketing 28, no. 5 (2016): 323–31, https://doi.org/10.1080/08961530.2016.1189372.
9 claude sicotte et al., "analysing user satisfaction with the system in use prior to the implementation of a new electronic inpatient record," in proceedings of the 12th world congress on health (medical) informatics; building sustainable health systems (amsterdam: ios press, 2007), 1779–1784; zhenzheng qian et al., "satiindicator: leveraging user reviews to evaluate user satisfaction of sourceforge projects," in proceedings - international computer software and applications conference 1 (2016): 93–102, https://doi.org/10.1109/compsac.2016.183.
10 christina merten and cristina conati, "eye-tracking to model and adapt to user meta-cognition in intelligent learning environments," in proceedings of the 11th international conference on intelligent user interfaces (iui '06) (new york: acm, 2006), 39–46, https://doi.org/10.1145/1111449.1111465; weidong zhao, ran wu, and haitao liu, "paper recommendation based on the knowledge gap between a researcher's background knowledge and research target," information processing & management 52, no. 5 (2016): 976–88, https://doi.org/10.1016/j.ipm.2016.04.004.
11 haoran xie et al., "incorporating sentiment into tag-based user profiles and resource profiles for personalized search in folksonomy," information processing and management 52, no. 1 (2016): 61–72, https://doi.org/10.1016/j.ipm.2015.03.001; francisco villarroel ordenes et al., "analyzing customer experience feedback using text mining: a linguistics-based approach," journal of service research 17, no. 3 (2014): 278–95, https://doi.org/10.1177/1094670514524625; yujong hwang and jaeseok jeong, "electronic commerce and online consumer behavior research: a literature review," information development 32, no. 3 (2016): 377–88, https://doi.org/10.1177/0266666914551071.
12 stephan ludwig et al., "more than words: the influence of affective content and linguistic style matches in online reviews on conversion rates," journal of marketing 77, no. 1 (2012): 1–52, https://doi.org/10.1509/jm.11.0560.
13 jun yang and yinglong wang, "a new framework based on cognitive psychology for knowledge discovery," journal of software 8, no. 1 (2013): 47–54.
14 alan baddeley, "on applying cognitive psychology," british journal of psychology 104, no. 4 (2013): 443–56, https://doi.org/10.1111/bjop.12049.
15 aidan moran, "cognitive psychology in sport: progress and prospects," psychology of sport and exercise 10, no. 4 (2009): 420–26, https://doi.org/10.1016/j.psychsport.2009.02.010.
16 john van de pas, "a framework for public information services in the twenty-first century," new library world 114, no. 1/2 (2013): 67–79, https://doi.org/10.1108/03074801311291974.
17 enrique frias-martinez, sherry y. chen, and xiaohui liu, "evaluation of a personalized digital library based on cognitive styles: adaptivity vs. adaptability," international journal of information management 29, no. 1 (2009): 48–56, https://doi.org/10.1016/j.ijinfomgt.2008.01.012.
18 shing lee chung et al., "an integrated framework for managing knowledge-intensive service innovation," international journal of services technology and management 13, no. 1/2 (2010): 20, https://doi.org/10.1504/ijstm.2010.029669.
19 koteshwar chirumalla, "managing knowledge for product-service system innovation: the role of web 2.0 technologies," research-technology management 56, no.
2 (2013): 45–53, https://doi.org/10.5437/08956308x5602045; koteshwar chirumalla et al., “knowledgesharing network for product-service system development: is it a typical?,” in international conference on industrial product-service systems (2013): 109–14; fumiya akasaka et al., “development of a knowledge-based design support system for product-service systems,” computers in industry 63, no. 4 (2012): 309–18, https://doi.org/10.1016/j.compind.2012.02.009. 20 c. f. cheung et al., “a multi-perspective knowledge-based system for customer service management,” expert systems with applications 24, no. 4 (2003): 457–70, https://doi.org/10.1016/s0957-4174(02)00193-8. 21 padmal vitharana, hemant jain, and fatemeh zahedi, “a knowledge based component/service repository to enhance analysts’ domain knowledge for requirements analysis,” information and management 49, no. 1 (2012): 24–35, https://doi.org/10.1016/j.im.2011.12.004. 22 baihai zhou, “the construction of library interdisciplinary knowledge sharing service system,” in 2014 11th international conference on service systems and service management (icsssm), june 25–27, 2014, https://doi.org/10.1109/icsssm.2014.6874033. 23 rusli abdullah, zeti darleena eri, and amir mohamed talib, “a model of knowledge management system for facilitating knowledge as a service (kaas) in cloud computing environment,” 2011 international conference on research and innovation in information systems, november 23–24, 2011, 1–4, https://doi.org/10.1109/icriis.2011.6125691. 24 alan smeaton and jamie callan, “personalisation and recommender systems in digital libraries,” international journal on digital libraries 5, no. 4 (2005): 299–308, https://doi.org/10.1007/s00799-004-0100-1. 25 yanwen wu et al., “research on personalized knowledge service system in community elearning,” lecture notes in computer science (berlin: springer, 2006), https://doi.org/10.1007/11736639_17; shu-chen kao and chienhsing wu, “pikipdl. a personalized information and knowledge integration platform for dl service,” library hi tech 30, no. 3 (2012): 490–512, https://doi.org/10.1108/07378831211266627. 26 national geographic, destinations of a lifetime: 225 of the world’s most amazing places (washington d.c.: national geographic society, 2016). 27 wen lou and junping qiu, “semantic information retrieval research based on co-occurrence analysis,” online information review 38, no. 1 (january 8, 2014): 4–23, https://doi.org/10.1016/j.ijinfomgt.2008.01.012 https://doi.org/10.1504/ijstm.2010.029669 https://doi.org/10.5437/08956308x5602045 https://doi.org/10.1016/j.compind.2012.02.009 https://doi.org/10.1016/s0957-4174(02)00193-8 https://doi.org/10.1016/j.im.2011.12.004 https://doi.org/10.1109/icsssm.2014.6874033 https://doi.org/10.1109/icriis.2011.6125691 https://doi.org/10.1007/s00799-004-0100-1 https://doi.org/10.1007/11736639_17 https://doi.org/10.1108/07378831211266627 of the people, for the people | lou, wang, and he 83 https://doi.org/10.6017/ital.v37i3.10060 https://doi.org/10.1108/oir-11-2012-0203; junping qiu and wen lou, “constructing an information science resource ontology based on the chinese social science citation index,” aslib journal of information management 66, no. 2 (march 10, 2014): 202–18, https://doi.org/10.1108/ajim-10-2013-0114; fan yu, junping qiu, and wen lou, “library resources semantization based on resource ontology,” electronic library 32, no. 3 (2014): 322–40, https://doi.org/10.1108/el-05-2012-0056. 
28 lei zhang et al., “extracting and ranking product features in opinion documents,” in international conference on computational linguistics (2010): 1462–70. 29 lou and qiu, “semantic information retrieval research,” 4; qiu and lou, “constructing an information science resource ontology,” 202; yu, qiu, and lou, “library resources semantization,” 322. 30 qiu and lou, “constructing an information science resource ontology,” 202.

catalog records retrieved by personal author using derived search keys. alan l. landgraf and frederick g. kilgour: the ohio college library center.

this investigation shows that search keys derived from personal author names possess a sufficient degree of distinctness to be employed in an efficient computerized interactive index to a file of marc ii catalog records having 167,745 personal author entries.

previous papers in this series and experience at the ohio college library center have established that truncated derived search keys are efficient for retrieval of entries by name-title and title from large on-line computerized files of catalog records.1–4 experiments reported in the earlier papers were “… based on the assumption that each key had a probable use equal to all other keys.”5 however, guthrie and slifko have shown that random selection of entries, rather than keys, yields results closer to actual experience but with a higher number of entries per reply.6 for example, they found on retrieving from a file of 857,725 records using a 4,5 (four characters of main entry, five characters of title) key that when the basis of the search was random keys there was one entry per reply 81.3 percent of the time, but when the basis was random records, there was one entry per reply 55.7 percent of the time. this paper presents the results of experimentation with search keys to be used in constructing an author index to a large file of on-line catalog records. an interactive environment is assumed, with the interrogator employing a remote terminal. a companion paper describes the findings of an investigation into retrieval efficiency of search keys derived from corporate author names.7

materials and methods. the investigation employed a marc ii file containing approximately 200,000 monographic records from which a computer program extracted 167,745 personal-name keys. the program extracted these keys from main entry, series statement, added entry, and series added entry fields. the basic key structure consisted of sixteen characters: the first eight from the surname, the first seven from the forename, and the first character from the middle name (8,7,1). if the surname and forename contained fewer characters than the key segment to be derived, the segment was left-justified and padded out with blanks. if there was no middle name or middle initial, a blank was used.
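to make these derivation rules concrete, the sketch below builds fixed-length keys in the style just described (an 8,7,1 key by default, or a shorter structure such as 4,2,1) and computes the degree of distinctness used later in the paper; the sample names, function names, and toy file are illustrative and are not drawn from the experiment.

```python
# a minimal sketch of deriving truncated search keys (8,7,1 by default, or a
# shorter structure such as 4,2,1) following the rules described above: each
# segment is truncated, left-justified, and padded with blanks, and a missing
# middle name or initial contributes a single blank. the sample names and the
# degree-of-distinctness helper are illustrative, not data from the experiment.

def derive_key(surname, forename, middle, lengths=(8, 7, 1)):
    """Build one fixed-length key from name parts, e.g. lengths=(4, 2, 1)."""
    parts = (surname, forename, middle)
    return "".join(part[:n].ljust(n) for part, n in zip(parts, lengths))

def distinctness(keys):
    """Degree of distinctness: distinct keys divided by total entries, x 100."""
    return 100.0 * len(set(keys)) / len(keys)

names = [
    ("kilgour", "frederick", "g"),
    ("kilgour", "frances", ""),
    ("landgraf", "alan", "l"),
]

keys_421 = [derive_key(s, f, m, lengths=(4, 2, 1)) for s, f, m in names]
print(keys_421)                 # ['kilgfrg', 'kilgfr ', 'landall']
print(distinctness(keys_421))   # 100.0 for this toy file
```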
another program derived shorter keys from the 8,7,1 structure ranging from 3,0 to 5,2,1. next, a sort program arranged the shorter keys in alphabetical order. a statistics collection program then processed the alphabetical file. this program counted the number of distinct keys, built a frequency distribution of names per distinct key, and built cumulative frequency distributions of names per distinct key in percentile groups.

[fig. 1. number of names retrieved 90, 99, and 99.5 percent of the time for different key structures; the grid of counts, arranged by number of characters extracted from the surname (3–6) and from the forename, is garbled in the source.]

results. figure 1 presents the findings at three levels of likelihood for retrieving n or fewer names when a variety of search key combinations were employed ranging from three to six characters from the surname, zero to three characters from the first name, and with or without the middle initial. table 1 is an extraction from figure 1 and contains the number of names retrieved at a level of 90 percent likelihood for the various search keys employed.

table 1. number of names retrieved with 90 percent likelihood (key structure: names retrieved)
3 characters: 3,0: >200
4 characters: 4,0: >200; 3,1: >200
5 characters: 5,0: >200; 3,2: 26; 4,1: 25; 3,1,1: 16
6 characters: 6,0: 171; 5,1: 18; 3,3: 17; 4,2: 12; 3,2,1: 8; 4,1,1: 8
7 characters: 6,1: 16; 5,2: 9; 5,1,1: 6; 3,3,1: 5; 4,2,1: 5

figure 2 has the same structure as figure 1 but contains the degree of distinctness as percentages, (no. of distinct keys / no. of entries) × 100 percent. table 2 records distinctness arranged by number of characters per key. figure 3 is a graphical representation of the degrees of distinctness of the various keys. in this figure, different types of lines connect points representing key structures that contain an equal number of characters. the bottom line in table 1 may be read as saying that 90 percent of the time a 4,2,1 key will retrieve five or fewer names from a file of 167,745 personal name keys. the bottom line of table 2 states that from the same file the 4,2,1 key yields a single name 64.1 percent of the time.

discussion. this experiment has shown the degree of distinctness (that is to say, the number of distinct keys divided by the total number of entries from which all keys were derived) to be a useful tool in determining what key structures may be efficiently used. as seen by comparing figure 1 with figure 2 and table 1 with table 2, there is a high degree of correlation between distinctness and the likelihood of retrieving a certain number of names 90, 99, and 99.5 percent of the time.

14. gadd, “the use and misuse of early english books online,” 683. 15. “about eebo.” 16. details on the estc are provided by the british library at http://www.bl.uk/reshelp/findhelprestype/catblhold/estccontent/estccontent.html, viewed march 12, 2017. 17. gadd, “the use and misuse of early english books online,” 685–86. 18. gadd, “the use and misuse of early english books online,” 686. 19.
eebo, “frequently asked questions,” accessed february 18, 2017. http://eebo.chadwyck.com/help/faqs.htm 20. association of research libraries, microform sets in u.s. and canadian libraries, (washington, d.c.: association of research libraries, 1984), j-3. 21. martin d. joachim, “cooperative cataloging of microform sets,” in cooperative cataloging: past, present, and future (new york: the haworth press, 1993), 111. information technology and libraries | september 2017 46 22. gadd, “the use and misuse of early english books online,” 686. 23. british library, “catalogs of british library holdings: english short title catalogue content,” accessed february 18, 2017. http://www.bl.uk/reshelp/findhelprestype/catblhold/estccontent/estccontent.html 24. the british libraries estc codes for filmed copy locations are difficult to translate. see meaghan j. brown’s finding aid, “stc location code transcription” wherein she offers details on stc and estc location codes and the problem her finding aid addresses. brown explains, “… it is currently possible to search the estc for items using marc codes, but not the location codes familiar from the stc,” accessed february 18, 2017. http://www.meaghanbrown.com/stc-location-codes/ 25. text creation partnership, accessed january 25, 2017. http://www.textcreationpartnership.org/home/ 26. text creation partnership, accessed january 25, 2017. http://www.textcreationpartnership.org/catalog-records/ 27. oclc’s form is available at https://www.oclc.org/content/dam/support/knowledgebase/ocn_report.xlsx, accessed october 18, 2016. 28. see appendix 1 for the procedures 29. with streamlined kbart search features introduced by a metadata services department colleague, it’s expected this time may be reduced moving forward. 30. a june 9, 2015 email from an oclc staff member to the kb-l@oclc.org listserv reported on oclc’s efforts to match ocn in its kbart files to english language of cataloging records, when available. 31. um libraries’ staff use this metadata in the equivalent oclc microfilm and e-version and eebo resource records as match points. staff do not verify that the images linked to the eebo version records correspond to those in the aforementioned bibliographic records. it is hoped that proquest will investigate the case described in this paper in which the eebo resource differs from its corresponding record. 32. “510 citation/reference note,” oclc, bibliographic formats and standards. 4th edition, last revised august 22, 2016. https://www.oclc.org/bibformats/en/5xx/510.html 33. as of january 29, 2017, the marc 510 field has not been indexed by oclc. see http://www.oclc.org/support/help/searchingworldcatindexes/#05_fieldsandsubfields/5xx _fields.htm 34. e.g., oclc indexes “internet resources” using a combination of marc data elements. these are laid out in “searching worldcat indexes” at http://www.oclc.org/support/help/searchingworldcatindexes/#06_format_document_typ e_codes/format_document_type_codes.htm. marc 21 bibliographic at a case study on the path to resource discovery | guay | doi:10.6017/ital.v36i3.9966 47 https://www.loc.gov/marc/bibliographic/bdleader.html provides the leader position 06 code for “language material.” marc code list for languages (http://www.loc.gov/marc/languages/) contains the language codes contained in the language of cataloging field/subfield (marc 040 field, subfield “b”). 35. “024 other standard identifier,” in oclc, bibliographic formats and standards, 4th edition, accessed january 25, 2017. 
https://www.oclc.org/bibformats/en/0xx/024.html 36. ibid. 37. oclc. searching worldcat indexes, accessed february 18, 2017. http://www.oclc.org/support/help/searchingworldcatindexes/#05_fieldsandsubfields/0xx _fields.htm%3ftocpath%3dfields%2520and%2520subfields%7c_____2 38. see oclc bibliographic formats and standards, fourth edition. 024 other standard identifier https://www.oclc.org/bibformats/en/0xx/024.html, viewed january 25, 2017 39. an oct. 18, 2016 review of oclc’s all-collections-list, available at https://www.oclc.org/content/dam/support/knowledge-base/all-collections-list.xlsx indicates that 38.5% percent of the 129,498 resources on the eebo kbart file have oclc number coverage. 40. http://experimental.worldcat.org/marcusage/510.html 41. http://experimental.worldcat.org/marcusage/024.html 42. kbart phase ii working group, knowledge bases and related tools (kbart): recommended practice: niso rp-9-2014 (baltimore, md: niso 2014), 18. http://www.niso.org/workrooms/kbart 43. https://www.oclc.org/worldcat/data-strategy.en.html, viewed jan. 26, 2017 44. the image of the linked data view of figure 14 was captured on february 18, 2017. 45. carl stahmer, “making marc agnostic: transforming the english short title catalogue for the linked data universe,” in linked data for cultural heritage, (chicago: ala editions), p. 23-25. 46. the assertion that the estc transformation of marc 510 field metadata is solely based on carl stahmer, “the estc as a 21st century research tool,” presentation given at the 2014 conference of the text encoding initiative, viewed february 19, 2017. https://figshare.com/articles/estc21_at_tei_2014/1558057 47. roger p. bristol, supplement to charles evans' american bibliography (charlottesville: university press of virginia, 1970). 48. dianne hillmann, gordon dunsire, and jon phipps, “maps and gaps: strategies for vocabulary design and development,” in dcmi international conference on dublin core and metadata applications, 2013: 88, accessed february 18, 2017. http://dcpapers.dublincore.org/pubs/article/view/3673/1896 information technology and libraries | september 2017 48 49. see reference 14 above. 50. a discussion and invitation to collaborate on this work took place in late 2016 on the oclc worldcat kb listserv (see http://listserv.oclc.org/scripts/wa.exe?subed1=kb-l&a=1). to date, the preus library, luther college, will be working with the libraries on this project. trope or trap? roleplaying narratives and length in instructional video amanda s. clossen information technology and libraries | march 2018 27 amanda s. clossen (asc17@psu.edu) is learning design librarian, pennsylvania state university. abstract a concern that librarians face when creating video is whether users will actually watch the video they are directed to. this is a significant issue when it comes to how-to and other point-of-need videos. how should a video be designed to ensure maximum student interest and engagement? many of the basic skills demonstrated in how-to videos are crucial for success in research but are not always directly connected to a class. whether a video is selected for inclusion by an instructor or viewed after it is noticed by a student depends on how viewable the video is perceived to be. this article will discuss the results of a survey of more than thirteen hundred respondents. 
this survey was designed to establish the broad preferences of the viewers of instructional how-to videos, specifically focusing on the question of whether the length and presence of a role-playing narrative enhances or detracts from the viewer experience, depending on demographic. literature review length since the seminal 2010 study by bowles-terry, hensley, and hinchliffe established emerging best practices for pace, length, content, look and feel, and video versus text, a variety of works compiling best practices for video have been created.1 the very successful library minute videos from arizona state university resulted in a collection of how-tos and best practices by rachel perry.2 these included tips on addressing an audience, planning, content, length, frugality, and experimentation. in 2014 coastal carolina nursing students were surveyed for their preferences in video, resulting in another set of best practices. these focused on video length, speaking pace, zoom functionality, and use of callouts.3 martin and martin’s extensive 2015 review covers content, compatibility, accessibility, and audio.4 the recommended length listed in these best practices varies widely. thirty-seconds to a minute is recommended by bowles-terry, hensley, and hinchliffe, while perry recommends no longer than ninety seconds.5 the coastal carolina study and seminole state review recommend no longer than three minutes.6 nearly all the articles reviewed stress that complicated concepts should be broken into more easily comprehensible chunks to avoid overwhelming student cognitive load. mailto:asc17@psu.edu trope or trap? role-playing narratives and length in instructional video | clossen 28 https://doi.org/10.6017/ital.v37i1.10046 narrative roleplay scenario the typical roleplay involves a hypothetical student who needs some sort of assistance and is helped through the process using library resources. often there is also a hypothetical guide, who can be a librarian, friend, or professor. these hypothetical situations are recorded in a variety of ways: from live-action video recordings, to screencast voice-overs, to text. the efficacy of such tools in library video have been explored little, if at all. devine, quinn, and aguilar’s 2014 study explores the usage and effectiveness of microand macro-narratives in resident information literacy instruction,7 but there is no question that this instructional scenario is very different than how-to instructional videos. the interplay between student interest and such narratives is addressed by emotional interest theory, which states that adding unrelated but interesting material increases attention by energizing the learner. these unrelated pieces of engaging material are known as seductive details. this “highly interesting and entertaining information . . . 
is only tangentially related to the topic but is irrelevant to the author’s intended theme.”8 exploration of this concept through experimental study has indicated that seductive details are detrimental to learning.9 some evidence indicates that learners are more likely to remember these details than the important content itself thanks to cognitive load issues.10 however, there have also been cases where seductive details have improved recall.11 in their 2015 study, park, flowerday, and brünken argue that the format and presentation of seductive details have varying effect on learning processes and that they can be used to positive effect.12 in this paper, the seductive details to be studied are those of the roleplay narrative used to frame instruction in how-to videos. methods survey design the survey was designed to explore three questions: • does the length of the video affect a user’s willingness to watch it? • do users prefer videos that are pure instruction or those that use a roleplay narrative to deliver content? • does the demographic of the viewer affect a video’s viewability? the survey was revised in collaboration with a survey design and statistical specialist at the penn state library’s data learning center. the completed survey was then entered into qualtrics for implementation. implementation implementation and subject-gathering was done through a survey-research sampling company that provided both a wide demographic and rapid data collection. this was sponsored by an institutional grant. subjects from a variety of institution types and geographic locations were solicited via email invitation to complete a survey that explored their perspectives on instructional videos. information technology and libraries | march 2018 29 the twenty-question survey was focused on respondents of a traditional college age. implementation resulted in 1,305 responses out of 1,528 surveys. after implementation, results were compiled and analyzed by a statistical expert at the institutional data center. nearly all the analyses to follow are simple cross-tabulations of respondent choices as correlations between demographics and preference were minor based on a multivariate analysis of variance (manova) test. results and discussion demographics the survey, which was limited to a traditionally college-aged population (eighteen to twentyfour), produced a nearly 1:1 gender distribution (figure 1). figure 1. age and gender distribution. the survey had around 64 percent student participants, 77 percent of these attending school full time. of those full-time students, 60 percent were resident students, and only 9 percent were solely online students. unemployed participants were more likely to be full-time resident students whereas online students were more likely to be employed full-time. (see figures 2 and 3.) trope or trap? role-playing narratives and length in instructional video | clossen 30 https://doi.org/10.6017/ital.v37i1.10046 figure 2. employment and student status distribution. figure 3. resident versus online status distribution. information technology and libraries | march 2018 31 information and video confidence the distribution of confidence in information-seeking ability hovered around 90 percent. however, at most, only half of respondents had any familiarity with google scholar (see figure 4). 
this tells us several things, the most important being that what librarians consider appropriate confidence in information-seeking is very different from what the college-aged layperson considers appropriate. this supports colón-aguirre and fleming-may’s 2012 study that indicates that students are likely to use free online websites that require the least effort for their research.13 figure 4. information-seeking confidence. video length length of a video does play a role for most. about 70 percent of participants indicated that they are either more likely to watch a video with a timestamp or will rarely watch unless the time is indicated (see figure 5). timestamp is easily provided by most video players. the mean maximum time for college-age participants’ willingness to watch was about four and a half minutes. the median was approximately three minutes. in general, shorter appears better: three to four minutes is around the maximum length that most eighteen to twenty-nine year olds are willing to watch. this contradicts all the referenced best practices but those proffered by baker, who described thirty to ninety seconds as ideal video viewing time. her study found that 41 percent of her students preferred videos that were one to three minutes long, but 24 percent preferred three to five minutes. because of this, she recommends videos that are three minutes or less.14 trope or trap? role-playing narratives and length in instructional video | clossen 32 https://doi.org/10.6017/ital.v37i1.10046 figure 5. perspective on viewing time. instructions versus roleplay the bulk of the survey was questions related to two videos. both videos were under three minutes long and were produced using techsmith’s camtasia screencast software. the screencast video simply explained how to complete a research task—searching google scholar for an article addressing a theme in shakespeare’s romeo and juliet. viewers were guided through the process of finding articles on this topic by a single narrator. no dramatized roleplay situation was presented. the narrative video guided the participants through a hypothetical situation dramatized by two actors. the scenario was a common one—a student procrastinating on a paper and asking her roommate for assistance at the last minute. the roommate guided the student through use of google scholar, completing the same tasks as the screencast video. participants watched both videos and answered a series of questions on their reactions. number of views was tracked on the media player, verifying that both videos were viewed. screencasts while watching the screencast video, most participants found that the narrator was trustworthy and that they were learning. only 15 percent felt the video needed an example scenario. though there were mixed experiences as to the length of the video, the timing of the video seemed on information technology and libraries | march 2018 33 point, as only 11.6 percent strongly believed that the video took too long and 7.5 percent strongly felt that went too quickly. (see figure 6.) figure 6. screencast reactions. when asked an open-ended question about what struck them the most in the screencast video, respondents most frequently stated that they found it to be informative and interesting, or at least neutral. however, a variety of responses were observed, both negative and positive, or even contradictory. it is worth noting that within this open-ended format, dislike of the narrator’s voice was independently assigned as one of the top three issues. 
this stresses the importance of coherent and pleasant narration, as it is something that viewers will likely notice. trope or trap? role-playing narratives and length in instructional video | clossen 34 https://doi.org/10.6017/ital.v37i1.10046 figure 7. open-ended questions: screencast. narrative while watching the narrative video, participants found that they could relate to the characters or scenario and found that they were learning as much as they were when watching the screencast (see figure 8). however, there were mixed responses regarding video length and credibility of the narrator. when compared across demographics, employed respondents and students were more likely to agree that they could relate to the scenario than unemployed and nonstudents. male respondents and employed were more likely to think that the video went too fast than female and unemployed respondents. when asked an open-ended question on what most struck them about the narrative video, respondents most often stated that they found it to be boring and long, though a good number also indicated it was interesting and informative (see figure 9). just as with the screencast video, a variety of responses, both negative and positive, were observed, some even conflicting. information technology and libraries | march 2018 35 figure 8. narrative reactions. figure 9. open-ended questions: narrative. trope or trap? role-playing narratives and length in instructional video | clossen 36 https://doi.org/10.6017/ital.v37i1.10046 in addition, 13.5 percent of respondents were unsatisfied with the content of the video. just as with the screencast video, a variety of responses, both negative and positive, were observed, some even conflicting. screencast versus narrative the screencast video tended to be preferred by respondents, with higher average scores in content, engagement, learning value, and narrator trustworthiness. in contrast, respondents also thought that the screencast video moved too quickly compared to the narrative video. additionally, participants were more impatient during the narrative video (see figure 10). figure 10. screencast versus narrative. to observe differences between the screencast and narrative videos with regards to respondent reactions within specific population demographics, manova test was performed. this test revealed that none of the p-values were significant (at α = .05), leaving no correlation between student status, employment status, and reaction to each video. a more liberal interpretation of the data from this analysis might conclude that differences in impatience across student status were possibly significant (α = .10), with students being more likely to exhibit a smaller difference in *score defined as 1 = “not very much” to 5 = “very much”, with difference = screencast score – narrative score. red rows indicate higher scores for the narrative video. statistics for differences in screencast and narrative* (n=1305) information technology and libraries | march 2018 37 impatience for the two video styles. the preferences for screencast over narrative video did not change when the demographics were spliced. conclusions it is impossible to please everyone all the time—at least that is what survey results suggest. there are several takeaways to this study: video length matters, especially as a consideration before the video is viewed. timestamps should be included in video creation, or it is highly likely that the video will not be viewed. 
the video player is key here, as some video players include video length, while others do not. videos that exceed four minutes are unlikely to be viewed unless they are required. voice quality in narration matters. although preference in type of voice inevitably varies, the actor’s voice is noticed over production value. it is important that the narrator speaks evenly and clearly. for brief how-to videos, there is a small preference for screencast instructional videos over a narrative roleplay scenario. the results of the survey indicate that roleplay videos should be wellproduced, brief, and high quality. however, what constitutes high quality is not very well established.15 finally, screencast videos should include an example scenario, however brief, to ground the viewer in the task. suggestions for further study next steps for research might include a more refined survey focusing on the results of this study. of equal value would be a series of focus groups that are given both a screencast and narrative video and asked to discuss their preferences. though a wide variety of students were surveyed, limits of this dataset prevented the exploration of specific correlations among students attending different institution types or among those pursing different majors. further research addressing the differences among these student bodies would be a welcome addition to the literature. references 1 melissa bowles-terry, merinda kaye hensley, and lisa janicke hinchliffe, “best practices for online video tutorials in academic libraries: a study of student preferences and understanding,” communications in information literacy 4, no. 1 (january 1, 2010): 17–28. 2 anali maughan perry, “lights, camera, action! how to produce a library minute,” college & research libraries news 72, no. 5 (2011): 278–83. trope or trap? role-playing narratives and length in instructional video | clossen 38 https://doi.org/10.6017/ital.v37i1.10046 3 ariana baker, “students’ preferences regarding four characteristics of information literacy screencasts,” journal of library & information services in distance learning 8, no. 1–2 (january 2, 2014): 67–80, https://doi.org/10.1080/1533290x.2014.916247. 4 nichole a. martin and ross martin, “would you watch it? creating effective and engaging video tutorials,” journal of library & information services in distance learning 9, no. 1–2 (january 2, 2015): 40–56, https://doi.org/10.1080/1533290x.2014.946345. 5 bowles-terry, hensley, and hinchliffe, “best practices,” 23; perry, “lights, camera, action!,” 282. 6 baker, “students’ preferences,” 76; martin and martin, “would you watch it?,” 48. 7 jaclyn r. devine, todd quinn, and paulita aguilar, “teaching and transforming through stories: an exploration of macroand micro-narratives as teaching tools,” reference librarian 55, no. 4 (october 2, 2014): 273–88, https://doi.org/10.1080/02763877.2014.939537. 8 shannon f. harp and richard e. mayer, “the role of interest in learning from scientific text and illustrations: on the distinction between emotional interest and cognitive interest,” journal of educational psychology 89, no. 1 (1997): 92–102, https://doi.org/10.1037//00220663.89.1.92. 9 suzanne hidi and valerie anderson, “situational interest and its impact on reading and expository writing,” in the role of interest in learning and development, ed. by k. ann renniger (hillsdale, nj: l. erlbaum associates, 1992), 213–14. 10 babette park et al., “does cognitive load moderate the seductive details effect? 
a multimedia study,” in “current research topics in cognitive load theory,” special issue, computers in human behavior 27, no. 1 (january 1, 2011): 5–10, https://doi.org/10.1016/j.chb.2010.05.006. 11 annette towler et al., “the seductive details effect in technology-delivered instruction,” performance improvement quarterly 21, no. 2 (january 1, 2008): 65–86, https://doi.org/10.1002/piq.20023. 12 babette park, terri flowerday, and roland brünken, “cognitive and affective effects of seductive details in multimedia learning,” computers in human behavior 44 (march 1, 2015): 267–78, https://doi.org/10.1016/j.chb.2014.10.061. 13 mónica colón-aguirre and rachel a. fleming-may, “‘you just type in what you are looking for’: undergraduates’ use of library resources vs. wikipedia,” journal of academic librarianship 38, no. 6 (november 1, 2012): 391–99, https://doi.org/10.1016/j.acalib.2012.09.013. 14 baker, “students’ preferences,” 76. 15 towler et al., “the seductive details,” 71.

transitioning from xml to rdf: considerations for an effective move towards linked data and the semantic web. juliet l. hardesty. information technology and libraries | march 2016.

introduction. metadata, particularly within the academic library setting, is often expressed in extensible markup language (xml) and managed with xml tools, technologies, and workflows. software tools such as the oxygen xml editor and querying languages such as xpath and xquery have over time become capable of supporting that management. however, managing a library’s metadata currently takes on a greater level of complexity as libraries are increasingly adopting the resource description framework (rdf). semantic web initiatives are surfacing in the library context with experiments in publishing metadata as linked data sets, bibframe development using rdf, and software developments such as the fedora 4 digital repository using rdf. examples of transitions from xml into rdf make the challenges evident and show the need for communication and coordination among efforts to incorporate and implement rdf. this article outlines these challenges using different use cases from the literature and first-hand experience. the follow-up discussion considers ways to progress forward from metadata formatted in xml to metadata expressed in rdf. the options explored are targeted not only to metadata practitioners considering this transition but also to programmers, librarians, and managers.

literature review and concepts. as an initial example of the challenges faced when considering rdf, clarifying terminology is still a helpful activity. rdf focuses on sets of statements describing relationships and meaning. these statements consist of a subject, a predicate, and an object (i.e., an article, has an author, jane smith).
these statement parts are also referred to as a resource, a property, and a property value. since there are three parts to rdf statements, they are referred to as triples. the predicate or property of an rdf statement defines the relationship between the subject and the object. rdf ontologies are sets of properties for a particular domain. for example, darwin core has an rdf ontology to express biological properties,1 and ebucore has an rdf ontology to express properties about audiovisual materials.2 pulling apart the many issues involved in moving from xml to rdf is an exploration into the juliet l. hardesty (jlhardes@iu.edu) is metadata analyst at indiana university libraries, bloomington, indiana. mailto:jlhardes@iu.edu transitioning from xml to rdf | hardesty doi: 10.6017/ital.v35i1.9182 52 purpose of metadata, the tools available and their capabilities, and the various strategies that can be employed. poupeau rightly states that xml provides structural logic in its hierarchical identification of elements and attributes, where rdf provides data logic declaring resources that relate to each other using properties.3 these properties are ideally all identified with single reference points (uniform resource identifiers or uris) rather than a description encased in an encoding. a source of honest confusion, however, is that rdf can be expressed as xml. lassila’s note regarding the resource description framework specification from the world wide web consortium (w3c) states, “rdf encourages the view of ‘metadata being data’ by using xml (extensible markup language) as its encoding syntax.”4 so even though rdf can use xml to express resources that relate to each other via properties, identified with single reference points (uris), rdf is itself not an xml schema. rdf has an xml language (sometimes called, confusingly, rdf, and from here forward called rdf/xml). additionally, rdf schema (rdfs) declares a schema or vocabulary as an extension of rdf/xml to express application-specific classes and properties.5 simply speaking, rdf defines entities and their relationships using statements. there are various ways to make these statements, but the original way formulated by the w3c is using an xml language (rdf/xml) that can be extended by an additional xml schema (rdfs) to better define those relationships. ideally, all parts of that relationship (the subject, predicate, object, or the resource, property, property value) are uris pointing to an authority for that resource, that property, or that property value. an additional concept worth covering is serialization. this term is used as a way to describe how rdf data is expressed using various formatting languages. rdf/xml, n-triples, turtle, and jsonld are all examples of rdf serializations.6 describing something as being in rdf really means the framework of subject, predicate, object is being used. describing something as being expressed in rdf/xml or json-ld means that the rdf statements have been serialized into either of those formatting languages. using “rdf” to refer not only to the framework to describe something (rdf) but also the serialization of that description (rdf/xml) can easily muddle the discussion. other thoughts about the difference between xml and rdf or moving metadata from xml into rdf point to the difference in perspective and the change in thinking that is required to manage such a move. 
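as a concrete illustration of these terms, the sketch below (written with the python rdflib library) builds the article-has-author-jane-smith statement as a triple and then serializes the same graph in several of the formats named above; the example uris are invented placeholders rather than real identifiers.

```python
# one rdf statement built with rdflib, then serialized in several formats;
# the uris are invented placeholders, not real identifiers.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS, FOAF

EX = Namespace("http://example.org/")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("foaf", FOAF)

article = EX["article/1"]          # subject (the resource)
author = EX["person/jane-smith"]   # object (the property value)

g.add((article, DCTERMS.creator, author))         # predicate (the property)
g.add((author, FOAF.name, Literal("Jane Smith")))

print(g.serialize(format="turtle"))    # compact turtle
print(g.serialize(format="xml"))       # the same graph as rdf/xml
print(g.serialize(format="json-ld"))   # json-ld (built into recent rdflib releases)
```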
in an online discussion about rdf in relation to tei (text encoding initiative), cummings talks about the need for both xml and rdf, using xml to encode text and rdf to extract that data and make it more useful.7 yee, in her in-depth look at bibliographic data as part of the semantic web, points out that rdf is designed to encode knowledge, not information.8 the rdf primer 1.0 also states “rdf directly represents only binary relationships.”9 xml describes what something is by encoding it with descriptive elements and attributes. rdf, on the other hand, constructs statements about something using direct references—a reference to the thing itself, a reference to the descriptor, and a reference to the descriptor’s value. as farnel discussed in her 2015 open repositories presentation about the university of alberta’s move to rdf, they learned they were moving from a records-based framework in xml to a things-based framework in rdf.10 what is pointed out here time and again is something else farnel discussed—moving from xml to information technology and libraries | march 2016 53 rdf is not simply a conversion between encoding formats; it is a translation between two different ways of organizing knowledge. it involves understanding the meaning of the metadata encoded in xml and representing that meaning with appropriate rdf statements. the tools most commonly employed for reworking xml into rdf are openrefine when accompanied by its rdf extension; a triplestore database such as openlink virtuoso,11 apache fuseki,12 or sesame13; oxygen xml editor14; and protégé,15 an ontology editor. openrefine is, according to the website, “a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.”16 the rdf extension, called rdf refine, allows for importing existing vocabularies and reconciling against sparql endpoints (web services that accept sparql queries and return results).17,18 sparql is similar to sql as a language for querying a database, but the syntax is specifically designed to allow for querying data formatted in triple statements instead of tables with columns.19 triplestore databases such as openlink virtuoso can store and index rdf statements for searching as a sparql endpoint, offering a way to retrieve information and visualize connections across a collection of triples. oxygen xml editor has proven helpful in formulating extensible stylesheet language (xsl) transformations to move metadata from a particular xml schema or format into rdf/xml or other serializations such as json-ld (javascript object notation for linking data).20 protégé is a tool developed by stanford university that supports the owl 2 web ontology language and has helped to convert xml schemas to rdf ontologies and establish ways to express xml metadata in rdf. these tools provide the technical means to take metadata expressed in xml and physically reformat it to metadata expressed in an rdf serialization. what that reformatting also encompasses, however, is a review of the information expressed in xml and a set of decisions as to how to express that information as rdf statements. strategic approaches and ideas for handling data transformations into rdf have involved the xml schema or document type definition (dtd). 
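before turning to those schema- and dtd-driven strategies, the xslt route mentioned above can be sketched briefly: the example below uses python’s lxml library to apply a small stylesheet that rewrites a simple descriptive xml record as rdf/xml. the record format, the stylesheet, and the chosen dublin core properties are invented for illustration and are far simpler than a real mods or vra core transformation.

```python
# a minimal sketch of an xsl transformation from descriptive xml to rdf/xml,
# applied with lxml; the record format and the stylesheet are illustrative only.
from lxml import etree

record = etree.XML(b"""<record id="http://example.org/item/1">
  <title>A Sample Pamphlet</title>
  <creator>Jane Smith</creator>
</record>""")

xslt = etree.XML(b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:template match="/record">
    <rdf:RDF>
      <rdf:Description rdf:about="{@id}">
        <dc:title><xsl:value-of select="title"/></dc:title>
        <dc:creator><xsl:value-of select="creator"/></dc:creator>
      </rdf:Description>
    </rdf:RDF>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(xslt)
print(etree.tostring(transform(record), pretty_print=True).decode())
```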
these include thuy, lee, and lee’s approach to map an xml schema (the xsd) to rdf, associating simpletype’s xsd in xml with properties in rdf, defining complextype’s xsd in xml as classes in rdf, and handling a hierarchy of xml schema elements with top levels as domains and lower-level elements and attributes as container classes or subproperties in those domains.21 thuy et al. earlier worked on a method to transform xml to rdf by translating the dtd using rdfs (elements in the dtd are rdf classes or subclasses, attlists are rdf properties, and entities—preset variables in the dtd—are called up for use in rdf as encountered).22 similarly, hacherouf, bahloul, and cruz translate an xml schema into an owl ontology.23 klein et al. point out that while ontologies serve to describe a domain, xml schemas are meant to provide constraints on documents or structure for data so it can be advantageous to work out an rdf expression this way.24 tim berners-lee puts it simply: “the same rdf tree results from many xml trees,” meaning the same single statement in rdf (an article has an author jane smith) can be expressed in many ways in xml and can vary on the basis of the source of the xml, any schemas involved, and the people creating the metadata.25 transitioning from xml to rdf using the xml schema might serve to ensure all xml elements are transitioning from xml to rdf | hardesty doi: 10.6017/ital.v35i1.9182 54 replicated in rdf but does not necessarily establish the relationships meant by that xml encoding without additional evaluation. there is no single strategy that will always work to move xml metadata into rdf, even within the same set of tools (such as fedora/hydra) or the same area of concern (libraries, archives, or museums). use cases for rdf the following use cases explain approaches to transition to rdf taken from two differing perspectives. the first set describes efforts to express xml schemas or standards as rdf ontologies. the second set describes efforts by various library or cultural-heritage digital collections to transform metadata records into rdf statements. they also show that strategies to transform xml to rdf cannot occur without a shift in view from structure to relationships and, likewise, from descriptive encoding to direct meaning. moving an xml schema/standard to an rdf ontology as a graduate student at kent state university, mixter took on converting the descriptive metadata standard vra core 4.0 from an xml schema to an rdf ontology.26 using the vra data standards committee guidelines to ensure all minimum fields were included,27 mixter mapped vra xml elements and attributes to schema.org, foaf, void, and dc terms ontologies. this process is known as “cherry-picking,” or combining various ontologies that already exist to represent properties or relationships (the predicates in rdf statements) as rdf instead of creating new proprietary rdf properties. using owl and rdfs as metavocabularies in protégé, this created an ontology that could “retain the granularity required to describe library, archive, or museum items” of vra core 4.0’s design in xml without being a straight conversion of vra core 4.0 from xml to rdf.28 the outcome was an xslt stylesheet that was tested on vra core 4.0 xml records to produce that same information as rdf statements. one point that seemed to help in testing was the fact that all controlled vocabulary terms had reference identifiers in the xml (ready-made uris). 
something not discussed in the outcomes was that dates resulted in complex rdf (rdf statements that encompass additional rdf statements or blank nodes) and there was no discussion about this complexity or its effect on using those particular rdf statements. vra core 4.0 now has an rdf ontology in draft form, with mixter as one of its authors.29 the owl ontology still points to schema.org, foaf, and void for equivalent classes and properties, but everything is now named within a vra rdf ontology and namespace and translates to such when vra core 4.0 xml is transformed to rdf. another case in the category of going from an xml standard to an rdf ontology is the development of the bibframe model for bibliographic description from the library of congress. the bibframe model is expressed as rdf. according to the bibframe site, “in addition to being a replacement for marc, bibframe serves as a general model for expressing and connecting bibliographic data.”30 marc has its own format of expression with numbered fields and subfields but can be expressed or serialized in xml and is often shared that way. the bibframe model, information technology and libraries | march 2016 55 while revamping the way a bibliographic record is described on the basis of work, instance, authority, and annotation, also provides tools to transform records from marc/xml to the rdf statements of bibframe.31 a single namespace serves the bibframe model and is explained as a long-term strategy to ensure namespace persistence over the next forty-plus years.32 the transformations produced from library of congress marc records and local marc records contain complex hierarchical rdf statements, particularly when ascribing authority sources to names, subjects, and types of identifiers. as it is still a work in progress there are no tools making use of bibframe records in rdf. an additional example is the work happening with pbcore, the public broadcasting metadata standard managed by the corporation for public broadcasting.33 public broadcasting stations and other institutions across the united states provide descriptive, technical, and structural metadata for audiovisual materials using this xml standard. in boston, wgbh’s use of pbcore coincides with its digital asset management system, hydradam, built on fedora 3 and the hydra technology stack (based on blacklight, solr, and the fedora digital repository).34 fedora 3 does not natively support rdf statements as properties on objects like fedora 4. building off an interest to move hydradam to fedora 4 and leverage rdf for metadata about audiovisual collections, wgbh began exploring transitioning the pbcore xml metadata standard into an rdf ontology. 
ebucore, the european broadcasting union’s metadata standard, is already expressed as an rdf ontology.35 a comparison between the xml standard of pbcore and the classes and properties expressed in ebucore revealed that most pbcore elements were covered by the ebucore ontology.36 efforts are ongoing to offer pbcore 3.0 as an rdf ontology that uses ebucore with the addition of a smaller set of properties along with a way to transform pbcore xml to pbcore 3.0 in rdf.37 the hydra community, in an effort to help the transition from fedora 3 with its xml binary files of descriptive metadata to fedora 4 using rdf statements as properties on objects, is working on a recommendation and transformation to move descriptive metadata in mods xml into rdf that is usable in fedora 4.38 the mods standard has a draft of an rdf ontology and a stylesheet transformation available,39 but the complex hierarchical rdf produced from this transformation is unmanageable with the current fedora 4 architecture. the hydra mods and rdf descriptive metadata subgroup is attempting to reflect the mods elements in simple rdf statements that can be incorporated as properties on a fedora 4 digital object.40 led by steven anderson at the boston public library, this group is moving through mods element by element, asking the question, “if you had to express this mods element from your metadata in rdf today, how would you do that?” participating institutions are reviewing their mods records and exploring the possible rdf predicates that could be used to represent the meaning of that information. some are even considering how to construct those rdf statements so that mods xml can be re-created as close to the original mods as possible (this is called “round tripping”). there are still questions as to whether every single mods element will be reflected in this transformation, how exactly fedora 4 will make use of these descriptive rdf statements, and if the original mods xml will need to be preserved as part of the digital object in fedora, but this group is recognizing that moving from transitioning from xml to rdf | hardesty doi: 10.6017/ital.v35i1.9182 56 fedora 3 to fedora 4 requires a major shift in thinking about descriptive metadata. this transformation tool is an effort to help make that transition possible. the avalon media system is an open source system for managing and providing access to large collections of digital audio and video.41 it is built on fedora 3 and the hydra technology stack and uses mods xml to store descriptive metadata. as development progresses and the available descriptive fields expand, maintaining the workflow to update xml records in fedora and reindexing objects in the hydra interface becomes increasingly complicated. each time an update is made to descriptive information about an audiovisual item through the avalon interface, the entire xml record for that object, stored as a binary text file, is rewritten in fedora 3 and reindexed in solr. in considering advantages to using fedora 4, it appears that descriptive metadata properties stored in rdf are easier to manage programmatically (updating content, adding new fields, more focused reindexing) because descriptive information would not be stored in a single binary file but as individual properties on the object. 
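to illustrate the element-by-element decision the subgroup is working through, the sketch below maps two elements from a small mods record to flat rdf statements that could sit directly on a single object; the predicate choices (dcterms:title, dcterms:creator) are one possible answer to the question posed above, not the subgroup’s recommendation, and the record and object uri are invented.

```python
# a sketch of mapping a couple of mods elements to flat rdf statements on one
# resource, in the spirit of "if you had to express this mods element in rdf
# today, how would you do that?"; the record, the object uri, and the predicate
# choices are illustrative only.
from lxml import etree
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

MODS = {"m": "http://www.loc.gov/mods/v3"}

mods_xml = b"""<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Field Recordings, 1952</title></titleInfo>
  <name type="personal"><namePart>Smith, Jane</namePart></name>
</mods>"""

record = etree.XML(mods_xml)
obj = URIRef("http://example.org/object/demo-1")  # hypothetical repository object

g = Graph()
g.bind("dcterms", DCTERMS)

for title in record.xpath("//m:titleInfo/m:title/text()", namespaces=MODS):
    g.add((obj, DCTERMS.title, Literal(title)))

for name in record.xpath("//m:name/m:namePart/text()", namespaces=MODS):
    # kept as a literal until the name is reconciled to an authority uri
    g.add((obj, DCTERMS.creator, Literal(name)))

print(g.serialize(format="turtle"))
```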
turning xml metadata into rdf or linked data for publishing, search and discovery, and management as southwick describes the process, the library at the university of nevada las vegas (unlv) took a collection with descriptive records from contentdm and published them as a single rdf linked open data set.42 after cleaning up controlled vocabulary terms across collections and solidifying locally controlled vocabularies, they exported tab-delimited csv records from contentdm. these records were brought into openrefine with its rdf extension where they reviewed the data and mapped to various properties within the europeana data model (edm). controlled vocabulary terms were in text form and had to be reconciled against a sparql endpoint, either locally from downloaded data or from the controlled vocabulary service, to gather the uris to use as the object or value in the rdf statement. openrefine was then used to create rdf files that were uploaded to a triplestore (first mulgara then openlink virtuoso). this provided public access to the linked open data set and a sparql endpoint for querying the data set. after publishing the data set they experimented with pivotviewer from openlink virtuoso and relfinder to see what kinds of connections and relationships could be visualized from the data as linked open data. the outlined steps are clear and the outcomes are described, but interestingly the data set itself no longer appears to be available online.43 although the unlv use case relies on csv instead of xml as the data source, the tools and workflows enlisted to transform the data set into rdf linked open data are still applicable. openrefine can import xml just as it imports csv, so this described case shows the tools that can be used and decisions to be made in processing that data into rdf statements. in oregon digital,44 xml from qualified dublin core, vra core, and mods at two different institutions (university of oregon and oregon state university) were mapped as linked open data and stored in a triplestore to be served up in a new web application using the hydra technology stack.45 an inventory of metadata fields across all collections was first mapped to existing linked information technology and libraries | march 2016 57 data terms, or properties (those with available uris), then properties that were needed in the new web application but did not have available corresponding uris were mapped to a newly devised local namespace for oregon digital. any properties that were not used were kept in the original static xml file for the record as part of the digital object in fedora. the focus here appears to be on mapping properties without as much detail provided on whether the objects were kept as text or mapped to uri values where possible. from the sample record provided the objects appear to be text and not uris. the real power of this project is finding common properties to describe objects from diverse collections and institutions. what also comes out in the example mappings is the use of many different namespaces or ontologies (dc terms, marc relators, but also mods and mads that produce complex rdf). 
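the reconciliation step described above, turning a text heading into a uri before it becomes the object of an rdf statement, can be sketched roughly as follows using the python sparqlwrapper package; the endpoint url, the skos query pattern, and the sample label are assumptions for illustration, and openrefine’s rdf extension performs this step interactively rather than in code.

```python
# a rough sketch of reconciling a text label against a sparql endpoint to get
# a uri for use as the object of an rdf statement; the endpoint url and query
# pattern are assumptions, and real vocabularies model labels differently.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://example.org/sparql"  # placeholder for a vocabulary's endpoint

def reconcile(label):
    """Return candidate concept uris whose skos:prefLabel matches the text."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery("""
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        SELECT ?concept WHERE {
          ?concept skos:prefLabel ?label .
          FILTER (lcase(str(?label)) = lcase("%s"))
        } LIMIT 5
    """ % label.replace('"', '\\"'))
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [row["concept"]["value"] for row in results["results"]["bindings"]]

# e.g., turn a free-text heading from the exported csv into candidate uris
print(reconcile("hotels"))
```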
the university of alberta also combined a variety of xml metadata from different sources into a new digital asset management system, the education and research archive, based on fedora 4 and the hydra technology stack.46 reporting on the experience at open repositories 2015, farnel described the process as working in phases.47 beginning with item types, languages, and licenses, then moving to place names and controlled subject terms, and finally person names and free-form subjects, they made multiple passes converting xml metadata into rdf statements and incorporating uris whenever possible. they are combining all of this into a single data dictionary,48 making use of several rdf ontologies to cover the various metadata properties being described about objects and collections.

the university of california at san diego (ucsd) has developed a local data model using a mix of external (mads, vra core, darwin core, premis) and local ontologies. they published a data dictionary and are working on a substantially different revision as part of the metadata workflow they use to bring digital objects into their digital asset management system from a variety of source metadata formats, including xml.49 this allows metadata to be created from disparate source formats and makes it possible to bring them together as rdf for delivery, management, and preservation.

discussion

if metadata is in xml form and the desire is to express it as rdf, this is not merely a transformation from one xml schema to another. it is changing the expression of that data and changing its use. having metadata in xml means information is encoded in a specific way that allows for interchange and sharing. having metadata in rdf means making statements that have direct meaning and can be used independently. there are two different perspectives involved when approaching rdf: those that manage metadata standards (the xml standard side) and those that have metadata encoded using those xml standards (the data management side). depending on the desired outcomes, the needs of these two perspectives can conflict.

when managing a metadata standard, the rdf transition tends to follow certain patterns:
• transform an xml standard into a new rdf ontology
o examples: dublin core (dc), darwin core (dwc), mods, vra core
• establish a move to rdf that incorporates another existing ontology
o examples: pbcore, hydra community

from the data management side, different patterns occur. these scenarios often start by reviewing the needed outcome, deciding how much metadata needs to be expressed in rdf, and determining what works best to get the metadata to that point. cases include the following commonalities:
• creating new search and discovery end-user applications
o examples: oregon digital, university of alberta
• publishing linked data sets
o examples: unlv, university of alberta
• managing metadata using software that supports rdf
o examples: university of alberta, ucsd, hydra community

conflicts occur when the needed outcome on the data management side is not supported by the rdf ontology transitions that have occurred for the xml standards being used. an example of this is how rdf is handled in fedora 4. when rdf is complex (the object of one statement is another entire rdf statement), fedora produces blank nodes as new objects within the repository.
while not technically problematic, descriptive metadata with complex rdf can result in a situation where a digital object ends up referencing a blank node that then points to, for example, a subject or a genre. that subject or genre has been created as its own object within the digital repository even though it is only meant to provide meaning for the digital object. mods rdf produces this complexity and thus is not workable with fedora 4. in contrast, other standards such as dc or dwc in rdf produce simple statements that fedora 4 can apply to a digital object without any additional processing.

complications in transitioning from xml to rdf also occur when the original xml does not include uris or authority-controlled sources. converting this metadata to rdf can mean locally minting uris or bringing data over as literals (strings of text) without using uris at all. ideally, the result is somewhere in the middle, with externally controlled vocabularies incorporated as much as possible and literals or locally minted uris used only where absolutely necessary. translating strings to authoritative sources is intensive work. if the xml standard cannot be expressed as a single rdf ontology, the work is further complicated by the need to map xml elements to different rdf ontologies using logic that is often decided locally. while it is possible to transition xml to rdf, the process is not uniform and the pathway involves a lot of labor.

one way to reduce this labor would be a more user-centered approach by xml standard bodies that considers the ways their standards will be used when translated into rdf ("users" in this context meaning the users of the standards, not the end users searching and discovering digital content). triplestores can manage queries for complex rdf, but digital repository systems are not there yet. those that support rdf for description of objects do so on the basis of simple property statements; a complex rdf ontology is going to be a challenge to support over time. another way forward is for the data management side to focus efforts on showing, in an end-user search and discovery format, what is currently possible when xml is transitioned into rdf. published linked data sets need interfaces for access and use, showing the value of what is currently available and any needs or gaps that remain. libraries and cultural-heritage organizations engaged in this work should also openly share the processes that work and those that do not, so others contemplating this transformation can consider how to forge ahead themselves, and they should provide feedback to xml standard bodies regarding the usefulness or complications of any rdf transitional help an xml standard might provide. technologies for incorporating rdf into web applications and truly connecting triples across the web also require further work. triplestores have so far been the main way to expose data sets but have not been incorporated into common library or cultural-heritage end-user search and discovery web applications. additionally, triplestore use does not seem to extend to management or long-term storage of complete data about digital objects.
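the contrast between simple statements and complex rdf is easier to see in a small example. the sketch below, again using rdflib and placeholder uris, is not drawn from any particular repository; it simply shows a flat subject statement next to a nested one of the kind complex mods rdf yields, where the subject heading becomes a blank node, an intermediate resource of its own, rather than a direct value on the object.

# an illustrative sketch, assuming rdflib is installed; uris are placeholders
from rdflib import Graph, URIRef, Literal, BNode, Namespace
from rdflib.namespace import DCTERMS, RDF

MADS = Namespace("http://www.loc.gov/mads/rdf/v1#")
obj = URIRef("http://example.org/objects/item-42")

# simple form: one triple that a repository can treat as a plain property
simple = Graph()
simple.add((obj, DCTERMS.subject, Literal("shipbuilding")))

# complex form: the subject becomes an anonymous node with its own type and label
complex_form = Graph()
complex_form.bind("mads", MADS)
node = BNode()
complex_form.add((obj, DCTERMS.subject, node))
complex_form.add((node, RDF.type, MADS.Topic))
complex_form.add((node, MADS.authoritativeLabel, Literal("shipbuilding")))

print(simple.serialize(format="turtle"))
print(complex_form.serialize(format="turtle"))

in a repository like fedora 4, the second form means the system has to hold that intermediate node somewhere, which is the behavior described above.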
there seems to be a decision to either reduce the data stored in a triplestore down to simple statements or to use the triplestore more like an isolated index or sparql endpoint and manage the complete metadata record separately (in a static text file or in a separate database). that aligns triples in rdf more with relational database storage than with catalog records. triple statements focus on relationships, not the complete unique details of the thing being described. triplestores can handle complex hierarchical rdf graphs and provide responses to queries against those complexities,50 but triplestores do not appear to be taking over as either the main search and discovery mechanism for online digital resources or the main mechanism for digital object management. software using rdf natively is also not currently widespread. a project such as the bibframe initiative that plans to incorporate rdf needs to make sure the complexity of its data model in rdf is manageable by any tools it produces and that it is possible for vendors and suppliers to encompass the data model in their software development.

conclusion

the reasons for deciding metadata should transition to rdf are just as important as determining the best process for implementing that transition. reasons for transitioning to rdf are conceptually based around making data more easily shareable and setting up data to have meaning and relationships, as opposed to local static description that requires programmatic interpretation. the use cases outlined in this article show the reality does not quite yet match the concept. transitioning an xml standard to rdf does not make that data more shareable or more easily understood unless there are end-user applications for using that data in rdf. publishing linked data involves going through transitional steps, but the endpoint seems to be more of a byproduct; the real goal is going through the process of producing linked data to learn how that works. self-contained projects that aim to express collections in rdf for the purpose of a new search and discovery interface are more successful in implementing rdf that has that new level of meaning and relationship. beyond the borders of these projects, however, the data is not being shared or used.

the use cases described above show some examples of what is happening now when transitioning from xml to rdf. approaches include xml standards converting to rdf expression as well as digital collections with metadata in xml that have an interest in producing that metadata as rdf. software that incorporates rdf is still developing and maturing. helping that process along by providing a pathway from xml to functionally usable rdf improves the chances of the semantic web becoming a real and useful thing. it is vital to understand that transitioning from xml to rdf requires a shift in perspective from replicating structures in xml to defining meaningful relationships in rdf. metadata work is never easy, and for metadata to move from encoded strings of text to statements with semantic relationships requires coordination and communication. how best to achieve this coordination and communication is a topic worth engaging as the move to use rdf, produce linked data, and approach the semantic web continues.

bibliography

berners-lee, tim. "linked data." linked data design issues, june 18, 2009. http://www.w3.org/designissues/linkeddata.html.
———. "why rdf model is different from the xml model." semantic web, september 1998. http://www.w3.org/designissues/rdf-xml.html.
estlund, karen, and tom johnson. "link it or don't use it: transitioning metadata to linked data in hydra," july 2013. http://ir.library.oregonstate.edu/xmlui/handle/1957/44856.
farnel, sharon. "metadata at a crossroads: shifting 'from strings to things' for hydra north." slideshow presented at open repositories, indianapolis, indiana, 2015. http://slideplayer.com/slide/5384520/.
hacherouf, mokhtaria, safia nait bahloul, and christophe cruz. "transforming xml documents to owl ontologies: a survey." journal of information science 41, no. 2 (april 1, 2015): 242–59. doi:10.1177/0165551514565972.
klein, michel, dieter fensel, frank van harmelen, and ian horrocks. "the relation between ontologies and xml schemas." in linköping electronic articles in computer and information science, 2001. doi:10.1.1.14.1037.
lassila, ora. "introduction to rdf metadata." w3c, november 13, 1997. http://www.w3.org/tr/note-rdf-simple-intro-971113.html.
manola, frank, and eric miller. "rdf primer 1.0, section 2.3 structured property values and blank nodes." w3c recommendation, february 10, 2004. http://www.w3.org/tr/2004/rec-rdf-primer-20040210/#structuredproperties.
mixter, jeff. "using a common model: mapping vra core 4.0 into an rdf ontology." journal of library metadata 14, no. 1 (january 2014): 1–23. doi:10.1080/19386389.2014.891890.
poupeau, gautier. "xml vs rdf: logique structurelle contre logique des données (xml vs rdf: structural logic versus data logic)." les petites cases, august 29, 2010. http://www.lespetitescases.net/xml-vs-rdf.
"rdf and tei xml," october 13, 2010. https://listserv.brown.edu/archives/cgi-bin/wa?a2=ind1010&l=tei-l&d=0&p=28928.
southwick, silvia b. "a guide for transforming digital collections metadata into linked data using open source technologies." journal of library metadata 15, no. 1 (march 2015): 1–35. doi:10.1080/19386389.2015.1007009.
thuy, pham thi thu, young-koo lee, and sungyoung lee. "a semantic approach for transforming xml data into rdf ontology." wireless personal communications 73, no. 4 (2013): 1387–1402. doi:10.1007/s11277-013-1256-z.
thuy, pham thi thu, young-koo lee, sungyoung lee, and byeong-soo jeong. "transforming valid xml documents into rdf via rdf schema." in next generation web services practices, international conference on, 0:35–40. los alamitos, ca: ieee computer society, 2007. doi:10.1109/nwesp.2007.23.
"xml rdf." w3schools. accessed september 30, 2015. http://www.w3schools.com/xml/xml_rdf.asp.
yee, martha m. "can bibliographic data be put directly onto the semantic web?" information technology and libraries 28, no. 2 (march 1, 2013): 55–80. doi:10.6017/ital.v28i2.3175.

notes

1. "darwin core," darwin core task group, biodiversity information standards, last modified may 5, 2015, http://rs.tdwg.org/dwc/.
2. "metadata specifications," european broadcasting union, https://tech.ebu.ch/metadataebucore.
3. gautier poupeau, "xml vs rdf: logique structurelle contre logique des données (xml vs rdf: structural logic versus data logic)," les petites cases (blog), august 29, 2010, http://www.lespetitescases.net/xml-vs-rdf.
4. ora lassila, "introduction to rdf metadata," w3c, november 13, 1997, http://www.w3.org/tr/note-rdf-simple-intro-971113.html.
5. "xml rdf," w3schools, accessed september 30, 2015, http://www.w3schools.com/xml/xml_rdf.asp.
6. see "serialization formats" in "resource description framework," wikipedia, march 18, 2016, https://en.wikipedia.org/wiki/resource_description_framework#serialization_formats.
7. "rdf and tei xml," email thread on tei-l@listserv.brown.edu, october 13–18, 2010, https://listserv.brown.edu/archives/cgi-bin/wa?a2=ind1010&l=tei-l&d=0&p=28928.
8. martha m. yee, "can bibliographic data be put directly onto the semantic web?" information technology and libraries 28, no. 2 (march 1, 2013): 57, doi:10.6017/ital.v28i2.3175.
9. frank manola and eric miller, "rdf primer 1.0, section 2.3 structured property values and blank nodes," w3c recommendation, february 10, 2004, http://www.w3.org/tr/2004/rec-rdf-primer-20040210/#structuredproperties.
10. sharon farnel, "metadata at a crossroads: shifting 'from strings to things' for hydra north" (slideshow presentation, open repositories, indianapolis, indiana, 2015), http://slideplayer.com/slide/5384520/.
11. http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/main/.
12. https://jena.apache.org/documentation/fuseki2/.
13. http://rdf4j.org.
14. http://www.oxygenxml.com.
15. http://protege.stanford.edu.
16. http://openrefine.org.
17. https://en.wikipedia.org/wiki/sparql.
18. http://refine.deri.ie.
19. https://jena.apache.org/tutorials/sparql.html.
20. http://json-ld.org.
21. pham thi thu thuy, young-koo lee, and sungyoung lee, "a semantic approach for transforming xml data into rdf ontology," wireless personal communications 73, no. 4 (2013): 1392–95, doi:10.1007/s11277-013-1256-z.
22. pham thi thu thuy et al., "transforming valid xml documents into rdf via rdf schema," in next generation web services practices, international conference on, vol. 0 (los alamitos, ca: ieee computer society, 2007), 37, doi:10.1109/nwesp.2007.23.
23. see mokhtaria hacherouf, safia nait bahloul, and christophe cruz, "transforming xml documents to owl ontologies: a survey," journal of information science 41, no. 2 (april 1, 2015): 242–59, doi:10.1177/0165551514565972.
24. michel klein et al., "the relation between ontologies and xml schemas," section 5 in linköping electronic articles in computer and information science 6 (2001), doi:10.1.1.108.7190.
25. tim berners-lee, "why rdf model is different from the xml model," semantic web road map, september 1998, http://www.w3.org/designissues/rdf-xml.html.
26. see jeff mixter, "using a common model: mapping vra core 4.0 into an rdf ontology," journal of library metadata 14, no. 1 (january 2014): 1–23, doi:10.1080/19386389.2014.891890.
27. the document currently labeled "how to convert version 3.0 to version 4.0" contains a recommendation for a minimum set of elements for "meaningful retrieval" in vra core: http://www.loc.gov/standards/vracore/convert_v3-v4.pdf.
28. mixter, "using a common model," 2.
29. "vra core rdf ontology available for review," visual resources association, october 7, 2015, http://vraweb.org/vra-core-rdf-ontology-available-for-review/.
30. "bibliographic framework initiative," library of congress, https://www.loc.gov/bibframe/.
31. see "marc to bibframe transformation tools" under "tools," bibframe, http://bibframe.org/tools/.
32. "why a single namespace for the bibframe vocabulary?" library of congress, bibframe frequently asked questions, https://www.loc.gov/bibframe/faqs/#q06.
33. "pbcore 2.1," public broadcasting metadata dictionary project, http://pbcore.org.
34. "wgbh," hydra community partners, http://projecthydra.org/community-2-2/partners-and-more/wgbh/.
35. "metadata specifications," european broadcasting union, https://tech.ebu.ch/metadataebucore.
36. see notes from pbcore hackathon part 2, held in june 2015, showing an element-by-element analysis of pbcore against ebucore: "pbcore hackathon part 2," june 15, 2015, https://docs.google.com/document/d/1pwdfyizhpfjcn5rwj1fiowexg5rirxudxcwkbq5bmla/.
37. "join us for the pbcore sub-committee meeting at amia!" public broadcasting metadata dictionary project blog, november 11, 2015, http://pbcore.org/join-us-for-the-pbcore-sub-committee-meeting-at-amia/.
38. "mods and rdf descriptive metadata subgroup," last modified march 19, 2016, https://wiki.duraspace.org/display/hydra/mods+and+rdf+descriptive+metadata+subgroup.
39. "mods rdf ontology," library of congress, https://www.loc.gov/standards/mods/modsrdf/.
40. "mods and rdf descriptive metadata subgroup," last modified march 19, 2016, https://wiki.duraspace.org/display/hydra/mods+and+rdf+descriptive+metadata+subgroup.
41. "avalon media system," http://www.avalonmediasystem.org.
42. see silvia b. southwick, "a guide for transforming digital collections metadata into linked data using open source technologies," journal of library metadata 15, no. 1 (march 2015): 1–35, http://dx.doi.org/10.1080/19386389.2015.1007009.
43. the url for information is a blog with no links to a data set (https://www.library.unlv.edu/linked-data), and the collection site seems to still be based on contentdm (http://digital.library.unlv.edu/collections).
44. "oregon digital," http://oregondigital.org.
45. see karen estlund and tom johnson, "link it or don't use it: transitioning metadata to linked data in hydra," july 2013, http://ir.library.oregonstate.edu/xmlui/handle/1957/44856, accessed from scholarsarchive@osu.
46. "era: education & research archive," https://era.library.ualberta.ca.
47. farnel, "metadata at a crossroads."
48. https://docs.google.com/spreadsheets/d/1hsd6kf4abm-m8vtynyqfjgtizg7bljq3fwrbf_nvoiw/edit#gid=1362636241.
49. the substantially revised data model is not available online yet, but the following shows some of the progress toward an rdf data model: "overview of dams metadata workflow," uc san diego, may 21, 2014, https://tpot.ucsd.edu/metadata-services/mas/data-workflow.html; "dams4 data dictionary," https://htmlpreview.github.io/?https://github.com/ucsdlib/dams/master/ontology/docs/data-dictionary.html, retrieved from github.
50. see the apache jena sparql tutorial for an example of complex rdf with sample queries against that complexity: "sparql tutorial data formats," the apache software foundation, https://jena.apache.org/tutorials/sparql_data.html.
public libraries leading the way

delivering: automated materials handling for staff and patrons

carole williams

information technology and libraries | september 2021
https://doi.org/10.6017/ital.v40i3.13xxx

carole williams (williamscar@ccpl.org) is amh and self-service coordinator, charleston county public library. © 2021.

"you've made libraries cool again!" "wow, super techie—we're fascinated by the book return!" with enthusiastic comments like these from our visitors, the staff at charleston county public library (ccpl) knew that we were delivering patron engagement while providing an effective book return system. thanks to county residents overwhelmingly approving a referendum to build five new libraries and renovate thirteen others, from june 2019 to november 2020 charleston county opened four new library branches, each with an automated materials handler (amh), and moved our support and administrative staff into a renovated support services building that now houses a 32-chute amh central sorter with smart transit check-in technology. (side note: yes, i know what you're thinking, and yes, we did open two of the new branches during the pandemic—definitely fodder for another article.)

the branch amhs have interior and exterior return windows and sit along a glass wall so patrons can watch their items ride the conveyor belts and drop into sorting bins. the staff side has an inductor for items being returned from or sent to other branches, so there is almost always something for the public to watch (see figure 1). men, women, and children, young and old, enjoy watching the amh and asking questions. some patrons bring their out-of-town guests (even a nun visiting from ireland) to see the amh in action. this spontaneous interaction bolsters our connection with visitors and subconsciously reinforces the concept of "library as safe exploration." a frequent question is "how does this work?" our explanation of tags and coding is the perfect opportunity to suggest books, point out games, and promote upcoming classes. we follow a roving customer service model.
because an amh is an efficient tool that checks in items and deposits them in pre-determined bins for easy shelving, we have freed up hours of staff time that can now be spent in the stacks, helping patrons find items and answering questions as needed. delivering an excellent amh experience for staff has been more complicated. as befits a port county, we went full steam ahead with new technology, new locations, and increased services. this required all staff to simultaneously learn new systems and change many of our in-house procedures while continuing with daily operations. every detail, from how to sort for shelving to labeling shipments, needed to be re-examined. the biggest changes came with bringing the central sorter online.

some of the changes were technical. for example, we use rfid tags as an identification number and place a matching barcode on each item. rfid is excellent technology; tagging all our items has completely changed and streamlined our process. most items come pre-processed, and the amhs are set to read only the rfid. an unintended but useful consequence is that we have become more aware of vendor processing errors where tags and barcodes don't match. (side note: we are working on some system-wide solutions to locate discrepancies between barcode numbers and rfid tags in the ils; rfid is another topic entirely, so stay tuned.)

figure 1. children returning books to the automated materials handler at a branch of the charleston public library.

another benefit of the amh is that our library collections acquisitions and technical services (lcats) department realized that as they processed new orders, they could now send those items out daily instead of waiting to accumulate enough individual branch materials for a separate shipment—a win for patrons (new materials every day) and for lcats (storage space). the unexpected twist: our adult, ya, and children's librarians are accustomed to receiving new materials separately from returns so they can familiarize themselves with the titles before the items are shelved. with the central sorter, new items go out daily, mixed in with the rest of the daily shipment. spine tape makes it easy for circulation staff to separate the new adult items, but we still needed a solution for children's and ya materials. after several sort changes and many discussions, we went old school, recycling used paper into book flags. the flagging doesn't cause a problem with the amh, is quick for technical services to place in each new book, and is easy for circulation to spot and put aside at the receiving location.

some of the changes were electrical. only the four new branch locations and the support services facility have an amh, while the other fourteen branches check in items by hand. we added a tote check-in server (tcs) system to the central sorter. this feature creates a manifest of the items in each crate, so branches can now receive the contents of a crate by entering a 4-digit barcode instead of scanning individual items. an unintended consequence of our new internet-dependent system that we had not anticipated was electricity. the coast has frequent thunderstorms that can cause power outages and flooding. if the power is out, there is no way to sort or receive the items in delivery. luckily this doesn't happen often, and so far power has been restored quickly.
some of the changes were physical. our delivery drivers also process the shipment when they return each day. in their previous workflow, most of the shipment was delivered to the downstream libraries, and the parts of the shipment that they did process had printed routing slips placed in each item so staff could all be sorting the shipment at the same time. now their department has become logistics, which is a more encompassing title and better covers the wider variety of tasks the staff have added to their day. in addition to delivery and mail duties, logistics also manages and maintains the amh and tcs equipment, troubleshooting problems that arise, scanning barcodes, and processing an average of 3,000 items daily with the amh. most of the shipment is now coming through the central sorter—staff handle an average of 157 crates each weekday, moving items from support services and to and from branches. we have electric forklifts that hold three crates at a time to help with the increase in physically shifting the crates. now one person inducts the shipment while others scan and stack the crates on the loading dock. this procedure is much faster than the previous paper-slip method, and processing is usually finished in a couple of hours.

other changes were mental and emotional. new locations, renovations, technologies, and procedures can be exciting, but they can also lead to change fatigue. fortunately, everyone retained their job this past year, but in order to operate a new branch built in a previously unserved community, we had to reassign staff from locations closed for renovation. ccpl's vision is for our library to be the path to our cultural heritage, a door to resources of the present, and a bridge to opportunities in the future. we are doers, creators, servers, and teammates, not only to the community but to our coworkers. we are all in for our shared vision, but whew... some days we all experience mental eye rolling and collective sighs of "another change?" our director's mantra is "we are the calm." and it is true. by fall we will have three of the renovated branches reopened, three more under renovation, and another staffing shift. with some grace and encouragement to one another, we will handle whatever comes next.

consortia building: a handshake and a smile, island style

patricia j. cutright

information technology and libraries | june 2000

in the evaluation of consortia and what constitutes these entities, the discussion runs the gamut: from small, loosely knit groups interested in cooperation for the sake of improving services to large membership-driven organizations addressing multiple interests, all recognize the benefits of partnerships. the federated states of micronesia are located in the western pacific ocean and cover 3.2 million square miles. throughout this scattering of small islands exists an enthusiastic library community of staff and users that has changed the outlook of libraries since 1991. motivated by the collaborative efforts of this group, a project has unfolded over the past year that will further enhance library services through staff training and education while utilizing innovative technology.
in assessing the library needs of the region, this group crafted the document "the federated states of micronesia library services plan, 1999-2003," which coalesces the concepts, goals, and priorities put forward by a broad-based contingent of librarians. the compilation of the plan and its implementation demonstrate an understanding of the issues and exhibit the ingenuity, creativity, and willingness to solve problems on a grand scale, addressing the needs of all libraries in this vast pacific region.

patricia j. cutright (cutright@eou.edu) is library director of the pierce library at eastern oregon university.

the basic philosophy inherent in librarianship is the concept of sharing. the dissemination of information through material exchange and interlibrary communication has enriched societies for centuries. there are few institutions other than libraries that are better equipped or suited for such cooperation and collaborative endeavors. with service as the lifeblood that runs through its inky veins, the library has the potential to be the driving force in any community toward partnerships that afford mutual benefit for all. the examination of the literature exposes a wide range of perceptions as to the definition of what is a consortium. the term "consortia" conjures up impressions that span the spectrum from highly organized, membership-driven groups to loosely knit cadres focusing on improving services to their patrons however they can make it happen. in kopp's paper "library consortia and information technology: the past, the present, the promise" he presents information from a study conducted by ruth patrick on academic library consortia. in that study she identified four general types of consortia:
• large consortia concerned primarily with computerized large-scale technical processing;
• small consortia concerned with user services and everyday problems;
• limited-purpose consortia cooperating with respect to limited special subject areas; and
• limited-purpose consortia concerned primarily with interlibrary loan or reference and network operations.1
with this distinction in mind, this paper will focus on the second category, typifying a small, less structured organization. while on a visiting assistantship in the federated states of micronesia (fsm), i worked with a partnership of libraries that believe in order for cooperation to succeed, results for the patron must be the goal—not equity between libraries or some magical balance between resources lent by one library and resources received from another library.2 a unified effort to provide service to the patron is the key. the libraries on a small, remote island situated in the western pacific ocean exhibit this grassroots effort that defines the true meaning of consortia—demonstrating collaboration, cooperation, and partnerships. it is a multitype library cooperative that not only encompasses interaction among libraries but also between agencies as well as governments.
the librarians on the island of pohnpei, micronesia, and all the islands throughout the federated states of micronesia have embraced this consortial attitude while achieving much through these collaborative efforts:
• the joint work done on crafting the library services plan, 1999-2003 for the libraries throughout the federated states of micronesia
• initiating successful grant-writing efforts which target national goals and priorities
• implementing a collaborative library automation project which is designed to evolve into a national union catalog
• the implementation of a viable resource-sharing and document delivery service for the nation

background and socioeconomic overview

micronesia, a name meaning "tiny islands," comprises some 2,200 volcanic and coral islands spread throughout 3.2 million square miles of pacific ocean. lying west of hawaii, east of the philippines, south of japan, and north of australia, the total land mass of all these tropical islands is fewer than 1,200 square miles, with a population base estimated at no more than 111,500.3 it is a location-unique region, but nonetheless still plagued with all the problems associated with any geographically remote, economically depressed area found anywhere in the united states or elsewhere in the world. the federated states of micronesia is a small-island, developing nation that is aligned with the united states through a compact of free association, making it eligible for many u.s. federal programs. the economic base is centered around fisheries and marine-related industries, tourism, agriculture, and small-scale manufacturing. the average per capita income in 1996 was $1,657 for the four states of the fsm: kosrae, pohnpei, yap, and chuuk. thirteen major languages exist in the country, with english as the primary second language. the 607 different islands, atolls, and islets dot an immense expanse of ocean; this geographic condition presents challenges in implementing and enhancing library services and technology.4 despite the extreme geographic and economic conditions, the college of micronesia-fsm national campus, in collaboration with the librarians throughout the states, has been successful in implementing nationwide projects. these endeavors have resulted in technical infrastructure and the foundation for information technology instruction, supported through awards from the u.s. department of education, the title iii program, and the national science foundation.

collaboration: building bridges that cross the oceans

the libraries in micronesia have shown an ongoing commitment to librarianship and cooperation since the establishment of the pacific islands association of libraries and archives (piala) in 1991. the organization is a micronesia-based regional association committed to fostering awareness and encouraging cooperation and resource sharing among libraries, archives, museums, and related institutions. piala was formed to address the needs of pacific islands librarians and archivists, with a special focus on micronesia; it is responsible for the common-thread cohesiveness shared by the librarians over the past eight years.
the organization has grown to become an effective champion of the needs of libraries and librarians in the pacific region.5 when piala was established, the most pressing areas of concern within the region were development of resource-sharing tools and networks among the libraries, archives, museums, and related institutions of the pacific islands. the development of continuing education programs and the promotion of technology and telecommunications applications throughout the region were areas targeted for attention. those concerns have changed little since the group's inception.

building upon that original premise, in january 1999 a group of interested parties from throughout the federated states of micronesia met to draft a document they envisioned would lay the groundwork for library planning over the next five years. this strategic plan encompasses all library activity: services, staffing, and the impact technology will have on libraries in the region. the document, "the federated states of micronesia library services plan, 1999-2003," coalesces the concepts, goals, and priorities put forward by a broad-based contingent. in this meeting, the group addressed basic issues of library and museum service, barriers and solutions to improve service delivery, and additional funding and training resources for libraries and museums.6 the compilation of the plan crafted at the gathering demonstrated a thorough understanding of the issues that face the librarians of the vast region. it exhibits the ingenuity, creativity, and willingness to problem-solve on a grand scale in a way that addresses the needs of all libraries in the pacific region.

the goals set forward by the writing session group illustrate the concerns impacting library populations throughout the fsm. the fsm has now established six major goals to carry out its responsibilities and the need for overall improvement in and delivery of library services:
1. establish or enhance electronic linkages between and among libraries, archives, and museums in the fsm.
2. enhance basic services delivery and promote improvement of infrastructure and facilities.
3. develop and deliver training programs for library staff and users of the libraries.
4. promote public education and awareness of libraries as information systems and sources for lifelong learning.
5. develop local and nationwide partnerships for the establishment and enhancement of libraries, museums, and archives.
6. improve quality of information access for all segments of the fsm population and extend access to information to underserved segments of the population.

priorities

the following are general priorities for the fsm library services plan. the priorities represent needs for overall improvement of the libraries, museums, and archives. they are based on the fact that libraries, museums, and archives development is currently in its infancy in the fsm. specific priorities will change from year to year as programs are developed.
1. establishment of new libraries and enhancement of existing library facilities to increase accessibility of all fsm citizens to library resources and services. outer islands and remote areas generally have no access to libraries or information sources. new facilities or mechanisms need to be established to provide access to information resources for the public.
existing public and school library facilities often lack adequate staffing, climate control, and electrical connections needed to meet the needs of the community. existing public and school libraries also need to improve their facilities and services delivery to meet the needs of disabled individuals and other special populations.
2. provide training and professional development for library operation and use of new information technologies. a survey held during the writing session indicated that public and school library staff do not currently possess the skills needed to effectively provide assistance in the use of new information technologies. well-designed training programs with mechanisms for follow-up technical assistance and support need to be developed and implemented.
3. promote collaboration and cooperation among libraries, museums, and archives for sharing of holdings and technical ability. limited holdings, financial capacity, and human resources are major barriers to improving library services. collaboration and cooperation are needed among libraries, museums, and archives to maximize scarce resources.
4. develop recommended standards and guidelines for library services in the fsm. the ability to share resources and information could be significantly increased by development and implementation of recommended standards and guidelines for library services. standardization could assist with sharing of holdings and holdings information, increase availability of technical assistance, and provide guidance as new libraries and library services are set up.
5. increase access to electronic information sources. existing public and school libraries have limited or no access to electronic linkages, including basic services such as e-mail and connections to the internet. the priority need is to establish basic electronic linkages for all libraries, followed by extending access to electronic information to all users.7

shifting into action

with the drafting of this five-year plan, the librarians stated emphatically the need and desire to move ahead with haste and determination. as the plan was conceptualized and documented, a small cadre of librarians from the college of micronesia-fsm national campus, the public library, and the high school library crafted two successful grant proposals which addressed:
• a cooperative library automation project which is designed to evolve into a national union catalog (goal 1; priorities 3, 5);
• the installation of internet services that would link the college of micronesia-fsm campuses, the public library, and high school library (goals 1, 2, 6; priorities 1, 2, 3, 5);
• the development and delivery of training programs for library staff and users of the libraries (goals 3, 4, 6; priority 2); and
• the implementation of a viable resource-sharing and document delivery service for the nation (goals 1, 2, 5, 6; priorities 3, 4, 5).

over the past year the awarding of grant funds has shifted the library community into high gear with the design and implementation of project activities that will fulfill the targeted needs.

the automation project and internet connectivity

a collaborative request submitted by the bailey olter high school (bohs) library and the pohnpei public library provided the funding necessary to computerize the manual card catalog system at bohs and upgrade the dated automated library system at pohnpei public library.
since the college of micronesia-fsm campuses are automated, it was important for the high school library and the public library to install like systems to achieve a networkable automated system, facilitating the development of a union catalog for all the libraries' holdings. this migration to an automated system promoted cooperation and resource sharing for the island libraries, opening a wealth of information for all island residents. the project entailed purchasing a turnkey cataloging and circulation system that will facilitate the cataloging and processing of new acquisitions for each library as well as the conversion of approximately five thousand volumes of material already owned by the public and high school libraries. through internet connectivity, which was integral to the project, the system would also serve as public access to the many holdings of the libraries for students, faculty, and town patrons through a union catalog to be established in the future.

the development and delivery of training programs for library staff and users is linked to the implementation of a viable resource-sharing and document delivery service for the nation. as stated earlier, the librarians of the federated states of micronesia accepted the challenge facing them in ramping up for the twenty-first century. their prior experience laid the groundwork needed to implement the training programs necessary to bring the library community the knowledge and skills required. a survey administered during the writing session indicated that few public and school librarians have significant training in or use of electronic linkages or information technologies, nor are they actively using such technologies at present. of the fourteen public and school librarians in the four states of micronesia, none hold a master's degree from an accredited library school or library media specialist certification. an exception is the library staff at the com-fsm national campus, where two-thirds of the librarians hold professional credentials. significant effort is needed on a sustained basis for effective training in the understanding and use of information systems throughout the nation. where training has occurred, it has often been of an infrequent, short variety with little support for ensuring implementation at the work site. additionally, there are often no formal systems for getting answers to questions when problems do arise.

in addressing the information needs of this population it is apparent that education is the key component for continued improvement of library services. this concern is evident in a paper by daniel barron, which reports that only 54 percent of librarians and 19 percent of staff in libraries serving communities considered to be rural (i.e., 25,000 people or fewer) have an ala-mls.8 and dowlin poses even more perplexing questions: "how can a staff with such an educational deficit be expected to accomplish all that will be demanded to enable their libraries to go beyond being a warehouse of popular reading materials? how can we expect them to change from pointers and retrievers to organizers and facilitators?"9 micronesia is no different than any other state or country in wanting its population to have access to qualified staff, current resources, and services.
it recognizes that its libraries are inadequately staffed and that many have staff who are seriously undereducated to meet the expanded information needs of the people in their communities. if these libraries are to seize the opportunities suggested by the developing positive view, develop services to support this view, and market such a view to a wider range of citizens in their communities, they must invest in the intellectual capital of their staffs. in order to carry out this charge, the following activities were designed to address the educational and training needs of the librarians in the fsm. as outlined in a recently funded institute of museums and library services (imls) national leadership grant, preparation has begun with the following activities, which will address the staffing and technology concerns described in fsm libraries:
1. recruit and hire an outreach services librarian to survey training needs, coordinate and plan training, and deliver or arrange for needed training.
2. develop a skills profile for all library, museum, and archival staff positions.
3. identify a training contact or coordinator for each state.
4. develop and provide periodic updates to operational manuals for school and public libraries, museums, and archives.
5. recruit local students and assist them in seeking out scholarships for professional training off island.
6. design and implement programs to provide continuous training and on-site support in new technological developments and information systems (provided on-site and virtually).
7. establish a summer training institute offering training based on needs as determined by the outreach services librarian in collaboration with state coordinators and recruiting on- and off-island expertise as instructors.
8. design and develop programs for orientation and training of users of information systems (provided on-site and virtually).
9. develop and implement a "train the trainer" program, which will have representation from all four states, that will ensure continuity and sustainability of the project for the years to come.10

the primary requisite to initiating this project is the recruitment and hiring of the outreach services librarian, who will then begin the activities as listed. a beginning cadre of librarians gleaned from the summer institute will become the trainers of the future, perpetuating a learning environment enhanced with advanced technology. breakthroughs in distance education, aided by advances in telecommunications, will significantly impact this project. on-site training will be imperative for the initial cadre of summer institute attendees to provide sound teaching skills and a firm understanding of the material at hand. follow-up training will be presented on each island by the trainer either on location or virtually with available technology. products such as web course in a box, webct, or nicenet will be analyzed for appropriate utilization as teaching tools. these products will take advantage of newly established internet connections on each island and, more importantly, will provide the interactive element that distinguishes this learning methodology from the "talking head" or traditional correspondence course approach. a web site designed for this project will provide valuable information and connectivity not only for the pacific library community but for anyone worldwide who may be interested in innovative methods of serving remote populations.
using computer conferencing and virtual communities technology, a video conferencing system such as 8 x 8 technologies will be used, allowing face-to-face interaction between trainer and student in an intra-island situation (interisland telephone rates are too expensive for regular use as a teaching tool). to enhance the learning experience and information retrieval component for these librarians and the population they serve, the project also incorporates implementation of a viable resource-sharing, document delivery system capitalizing on a shared union catalog and using a service such as research library group's ariel product. with library budgets reflecting the critical economic climate of the nation, collaborative collection development and resource sharing become even more crucial to satisfying the needs of the library user. to maintain cost-effective communication and build a sense of community among the librarians, the messaging software icq has been installed on all participant hardware and is utilized for group meetings, question and answer, and general correspondence. since icq operates over the internet, this package allows low-cost communication with maximum benefit in connecting the group. this technology will also be used as the primary mechanism for communication with an outside advisor who will provide expertise in the area of outreach services for rural populations.

the realm of outreach services in libraries has always presented unique challenges that can now benefit greatly from current and emerging technologies. the definition of "outreach" is truly a matter of perspective, with the more traditional sense relating to a specific library serving its own user or patron, but current practice regards "outreach" as an extension of services to all users, whether they are a registered patron, a colleague, or a peer. micronesia is a country where the proverbial phrase "the haves and the have-nots" is amplified. the recent (and ongoing) installation of internet services in the region has made possible many basic changes, but there still exists the reality that some of the sites for proposed services have nothing more than a common analog line and rudimentary services. as an example of the realities that exist, only 38 percent of the approximately 180 public schools in the fsm have access to reliable sources of electricity. another challenge for these libraries is the climate and environment, which has a significant impact on library facilities, equipment, and holdings. the fsm lies in the tropics, with temperatures ranging daily from 85 to 95 degrees and humidity normally 85 percent or higher.11 the high salt content in the ocean air wreaks havoc upon electrical equipment, and the favorable environs inside a library often entice everything from termites in the wooden bookcases to nesting ants in keyboards. from these examples it is apparent that the problems that trouble these libraries are not going to be solved with the magic bullet of technology. this reality constitutes the need for varying strategies and different approaches to address the training requirements of the library staff.

summary

the fsm library group, in particular the pohnpeian librarians, have accomplished much in the past year.
the flurry of activity that enveloped the libraries on pohnpei was spurred by the collaborative writing session in january 1999. a week-long "meeting of the minds" from libraries throughout micronesia produced the blueprint that will map the future of libraries and library service for years to come. these librarians stated their primary issues in delivering library services and came to a consensus on activities needed to address the issues. the "federated states of micronesia library services plan, 1999-2003" was crafted as a working document, a strategic plan for improving library services in the pacific region, and a commitment to achievement through collaboration. while in micronesia i observed the impact that the unification of ideas can have on the citizens of a community. in my fourteen-year tenure at eastern oregon university i have been exposed to the benefits of the "consortium attitude" that comes from cooperation and partnerships. time and again the university demonstrates the positive effects of what is referred to as the "politics of entanglement." shepard describes the overriding philosophy that has been the recipe for success: the politics are really quite simple. we maintain an intricate pattern of relationships, any one of which might seem inconsequential. yet there is strength in the whole that is largely unaffected if a single relationship wanes. rather than mindlessly guarding turf, we seek to involve larger outside entities and, in the ensnaring, to turn potential competitors into helpful partners.12 just as eastern oregon university has discovered, the libraries of the federated states of micronesia are learning the merits of entanglement.
references and notes
1. james j. kopp, "library consortia and information technology: the past, the present, the promise," information technology and libraries 17 (mar. 1998): 7-12.
2. jan ison, "rural public libraries in multi-type library cooperatives," library trends 44 (summer 1995): 29-52.
3. pacific islands association of libraries and archives, www.uog.edu/rfk/piala.html, accessed june 6, 2000.
4. division of education, department of health, education and social affairs, federated states of micronesia, "federated states of micronesia, library services plan 1999-2003" (march 3, 1999): 2.
5. pacific islands association of libraries and archives, www.uog.edu/rfk/piala.html, accessed june 6, 2000.
6. division of education and others, "library services plan," 4.
7. ibid., 6.
8. daniel d. barron, "staffing rural public libraries: the need to invest in intellectual capital," library trends 44 (summer 1995): 77-88.
9. k. e. dowlin, "the neographic library: a 30-year perspective on public libraries," in libraries and the future: essays on the library in the twenty-first century, f. w. lancaster, ed. (new york: haworth pr., 1993).
10. patricia j.
cutright and jean thoulag, college of micronesia-fsm national campus, "institute of museums and library services, national leadership grant" (mar. 19, 1999).
11. division of education and others, "library services plan," 2.
12. w. bruce shepard, "spinning interinstitutional webs," aahe bulletin 49 (feb. 1997): 3-6.
china academic library and information system: an academic library consortium in china
longji dai, ling chen, and hongyang zhang
since its inception in 1998, the china academic library and information system (calis) has become the most important academic library consortium in china. calis is centrally funded and organized in a tiered structure. it currently consists of thirteen management or information centers and seventy member libraries serving 700,000 students. after more than a year of development in information infrastructure, a calis resource-sharing network is gradually taking shape. like their counterparts in other countries, academic libraries in china are facing such thorny problems as shrinking budgets, growing patron demands, and rising costs for purchasing books and subscribing to periodicals. it has thus become increasingly difficult for a single library to serve its patrons to their satisfaction. under these circumstances, the idea of resource sharing among academic libraries was born. library consortia provide an organizational form for libraries to share their resources. the georgia library learning online (galileo), the virtual library of virginia (viva), and ohiolink are among the well-known library consortia in the united states.1 traditionally, the primary purpose of establishing a library consortium is to share physical resources such as books and periodicals among members. more recently, however, advances in computer, information, and telecommunication technologies have dramatically revolutionized the way in which information is acquired, stored, accessed, and transferred. sharing electronic resources has rapidly become another important goal for library consortia.
what is calis?
in may 1998, as one of the two public service systems in "project 211," the china academic library and information system (calis) project was approved by the state development and planning commission of china after a two-year feasibility study by experts from academic libraries across the country. calis is a nationwide academic library consortium.
longji dai is director, peking university library, and deputy director, calis administrative center; ling chen is deputy director, calis administrative center; and hongyang zhang is deputy director, reference department, peking university library.
funded primarily by the chinese government, calis is intended to serve multiple resource-sharing functions among the participating libraries - including online searching, interlibrary loan, document delivery, and coordinated purchasing and cataloguing - by digitizing resources and developing an information service network.
structure and management of calis
a library consortium is an alliance formed by member libraries on a voluntary basis to facilitate resource sharing in pursuit of common interests. whether a consortium can operate successfully depends in large part on how it is managed. calis differs from library consortia in the united states in that it is a national network. it resembles multistate consortia in the united states with respect to geographic distribution of member libraries, but it is like tightly knit or even centrally funded statewide ones in terms of management.2 the calis members are distributed in twenty-seven provinces, cities, and autonomous regions in china, making entirely centralized management difficult. after surveying some of the major library consortia in the united states, europe, and russia, calis adopted an organizational mode characterized by a combination of both centralized and localized management - that is, a three-tiered structure (figure 1). in order to improve management efficiency and maximize the sharing of various resources, including funds, calis has established a coordination and management network comprising one national administrative center (which also serves as the north regional center), five national information centers (see table 1), and seven regional information centers (see table 2). the thirteen centers are maintained by full-time staff members provided by the libraries in which these centers are located. the national administrative center (located in peking university) - overseen by officials from the concerned office at the ministry of education and the presidents of peking and tsinghua universities and advised by an advisory committee consisting of experts from major member libraries - is responsible for the construction and management of calis, makes policies and regulations, and prepares resource-sharing agreements. the center has an office handling routine management needs and several specialized work groups overseeing calis' national projects, such as those for the development of databases for union catalogues, current chinese periodicals, and calis' service software. under the guidance of the national administrative center, five national information centers are each responsible for building and maintaining an information system in one of five general areas - humanities, social science, and science; engineering and technology; agriculture and forestry; medicine; and national defense - in coordination with regional centers and member libraries. the host libraries where these centers are located possess relatively abundant collections in their respective areas.
these centers, which are intended to be information bases that cover all major disciplines of science, are responsible for importing databases for sharing, constructing resource-sharing networks among member libraries, and providing searching and document delivery services to member libraries.
figure 1. the three-tiered structure of calis: five national information centers, eight regional information centers, and seventy member libraries.
depending on their location, academic libraries in china are divided into eight groups, with each forming a regional library consortium. each regional consortium is overseen by a regional management center, except for the consortium in the north, which is directly managed by the national management center. the regional centers not only participate in nationwide projects in coordination with the national centers and other regional centers, but they also are responsible for promoting cooperation among libraries in their particular regions. all the centers are located in member universities and staffed by the host universities. the concerned vice president or library director of a host university is in charge of the associated center. the regional centers also are assisted by regional coordination committees and advisory committees of provincial and municipal officials in charge of education; university presidents; library directors; and senior librarians in the concerned regions. these committees serve a coordinating role in the regions.
table 1. five national information centers (area of specialization: location)
humanities, social science, and science: peking university, beijing
engineering and technology: tsinghua university, beijing
agriculture and forestry: china agricultural university, beijing
medicine: beijing medical university, beijing
national defense: haerbin industrial university, haerbin, heilongjiang
table 2. regional information centers and areas of their jurisdiction (name, location: jurisdiction)
national administrative center, beijing: beijing, tianjin, hebei, shanxi, and inner mongolia
southeast (south) regional center, shanghai: shanghai, zhejiang, fujian, and jiangxi
southeast (north) regional center, nanjing: jiangsu, anhui, and shandong
central regional center, wuhan: hubei, hunan, and henan
south regional center, guangzhou: guangdong, hainan, and guangxi
southwest regional center, chengdu: sichuan, chongqing, yunnan, and guizhou
northwest regional center, xi'an: shaanxi, gansu, ningxia, and xinjiang
northeast regional center, jilin: jilin, liaoning, and heilongjiang
funding
the development and operation of calis has been funded in large part by the chinese government. the sources of funding for calis at the present time are as follows:
• government grants. much of the funds for the calis project during the first phase of construction came from the government. because of the demonstrated benefits of the ongoing project, it is expected that the government will provide funds for the second phase of calis construction. these government funds have been used in the purchase of software and hardware for the calis centers and commercial databases, development of service software and databases, training of staff members, etc.
• local matching funds.
according to prior agreements, a province or city that desires to have a regional center is required to provide funds to supplement the government funds for the construction of its local center.
• member library funds. these funds, primarily derived from the university budgets, have been used to purchase electronic resources and cover the expenses incurred from the use of the calis service software platforms.
although calis is currently funded by the government, the future expansion and operation of the system is expected to rely in large part on other sources of funds. the funding needs for calis may be met by operating the system in a commercial mode.
principles for cooperation among members
the successful operation of a library consortium clearly depends on good working relationships among members and between members and the consortium. at calis, all members are required to adhere to a set of principles (see below) in dealing with these relationships. it is based on these principles, known as the calis principles for cooperation among members, that calis policies and rules are made.
• the common interests of calis are above those of individual member libraries.
• member libraries should not cooperate at the expense of the interests of others.
• calis provides services to member libraries for no profit.
• member libraries are all equal and enjoy the same privileges.
• larger member libraries are obliged to make more contributions.
what has been achieved?
when it was first established, calis had sixty-one member libraries from major universities participating in "project 211." later, as many other major universities were interested in joining the alliance, the number of calis members climbed to seventy. at present, calis serves about 700,000 students. construction of calis is a long-term, strategic undertaking. the system provides service functions as they become available and is constantly being improved in the process. in the first phase (1998 to 2000) of the project, calis successfully started the following information-sharing functions in its member libraries:
• primary and secondary data searching;
• interlibrary borrowing and lending;
• document delivery;
• coordinated purchasing; and
• online cataloguing.
the following tasks have been completed:
• purchase of computer hardware (e.g., sun e~s00);
• construction of a cernet- or internet-based information-sharing network connecting academic libraries across the country; and
• group purchase of databases, such as umi, ebsco, ei village, inspec, elsevier, and web of science, that are shared among member libraries either directly online or indirectly through requested service/document delivery.
calis also has completed development of a number of databases, including:
• union catalogues. these databases currently contain 500,000 bibliographic records of the chinese and western language books and periodicals in all member libraries.
• dissertation abstracts and conference proceedings. these databases now contain abstracts of doctoral dissertations (12,000 bibliographic records) and proceedings of national and international conferences (8,000 records) collected from more than thirty member libraries. the databases are expected to have 40,000 records in total by the end of 2000.
• current chinese periodicals.
these databases (5,000 titles, 1.5 million bibliographic records) contain contents and indexes of current chinese periodicals from about thirty member libraries.
• key disciplines databases. calis has sponsored the development of twenty-five discipline-specific databases by member libraries. each of these databases contains about 50,000 to 100,000 records.
the first three classes of databases are prepared in the usmarc, unimarc, or ccfc format for ease of use by patrons and cataloguing staff and in data exchange. clients from member libraries may perform a web-based search of the above databases, most of which contain secondary documents and abstracts, and access calis online resources using browsers.
development of software platforms includes the following:
• cooperative online cataloguing systems. the systems include z39.50 protocol-based search and uploading servers and terminal software platforms for cataloguing staff. acquisition and cataloguing staff in each member library may participate in cooperative online cataloguing using the terminal software platforms on their local system. the systems have been used for the development and operation of the union catalogue databases.
• systems for database development. these systems can be used in the development of shared databases containing secondary data information in usmarc, unimarc, ccfc, or dublin core format. the systems for database development in the usmarc, unimarc, or ccfc formats are equipped with a search server based on the z39.50 protocol to permit use by cataloguing staff and for data exchange.
• an interlibrary loan system. the system, developed based on the iso 10160/10161 protocol, consists of ill protocol machines and client terminals. these systems, located in member libraries, are interconnected to form a calis interlibrary loan network. primary document delivery software based on the ftp protocol also has been developed for the delivery of scanned documents between libraries.
• an opac system. the system has both web/z39.50 and web/ill gateways. patrons may visit the system using common browsers, search all calis
databases, and send search results directly to the calis interlibrary loan service. patrons also may access an ill server through web/ill, tracking the status of submitted interlibrary loan requests, inquiring about fees, and so on. the databases that are centrally located and those that are distributed at various locations, as well as service platforms in member libraries, form a calis information service network.
future considerations
in a period of just over a year, considerable progress has been made in forming a nationwide resource-sharing library consortium in china. however, because member libraries vary in size, available funds, staff quality, and automation level, calis has yet to realize its potential. there are a number of problems that remain to be solved. for example, the calis union catalogue databases do not work well on some of the old automation systems in member libraries, and the calis service platforms are incompatible with a dozen automation systems currently in use; as a result, the union catalogues cannot tell the real-time circulation status in all member libraries, affecting interlibrary loan service. in addition, primary resources are not sufficiently abundant. therefore, the extent to which resources are shared among member libraries remains quite limited. in the next phase of development, calis will improve service systems (including hardware and software platforms) and the distribution of shared databases. at the same time, calis will develop more electronic resource databases and be actively involved in the research and development of digital libraries, expanding the scale and extent of resource sharing.
references
1. barbara a. winters, "access and ownership in the 21st century: development of virtual collection in consortial settings," in electronic resources and consortia (taiwan: science and technology information center, 1999), 163-80; katherine a. perry, "viva (the virtual library of virginia): virtual management of information," in electronic resources and consortia (taiwan: science and technology information center, 1999), 93-114; delmus e. williams, "living in a cooperative world: meeting local needs through ohiolink," in electronic resources and consortia, ching-chin chen, ed. (taiwan: science and technology information center, 1999), 137-61.
2. jordan m. scepanski, "collaborating on new missions: library consortia and the future of academic libraries," in proceedings of the international conference on new missions of academic libraries in the 21st century, duan xiaoqing and he zhaohui, eds. (peking: peking univ. pr., 1998), 271-75.
circulation control: off-line, on-line, or hybrid
michael k.
buckland, bernard gallivan: library research unit, university of lancaster, england
the requirements of a computer-aided circulation system are described. the characteristics of off-line systems are reviewed in the light of these requirements. on-line systems are then reviewed and their economic viability queried. a "hybrid" system (involving a dedicated mini-computer in the library, used in conjunction with a larger machine) appears to be more cost-effective than conventional on-line working.
introduction
an important feature of a very small library is the close contact between the librarian, his collections, and his users. over the years, as collections and library usage have both increased enormously, librarians have gradually been losing this important "contact." the trend toward increased book use is a desirable one, but the sheer pressure of transactions has necessitated the adoption of manual and photographic circulation control systems which concentrate on a restricted range of information about borrowing - notably when a book is due back and who has a given book. computer-based circulation systems offer the prospect of regaining detailed knowledge of book usage - at a price. this paper reviews three approaches. the desirable features of a circulation control system are that:
1. it should "marry" borrower, book, and date information together rapidly and accurately.
2. it should enable rapid, easy consultation of the issue files at any time in order to detect the location of any book.
3. it should be able to immediately detect and register the fact that an item just returned from loan has been requested by another reader. this ability should not be dependent on whether or not the person returning the book has also remembered to bring in a recall notice.
4. it should prepare suitable overdue notices for books retained too long.
5. it should be possible to produce lists of items out on loan to any given borrower and also to signal "over borrowing" (i.e., having an excessive number of books out on loan at any given time).
6. it should be able to detect delinquent borrowers at the point of issue.
7. when material is returned from loan, the system should amend the circulation records promptly and permit the calculation of any fine.
8. it should facilitate the collection, analysis, and presentation of the "management information" needed to maintain effective stock control and high standards of service.
9. it should perform these tasks reliably and economically.
these requirements vary in importance from library to library, but, with some differences in emphasis, they appear to apply equally to both public and university libraries.
off-line
the commonest approach to computer-aided circulation control is to operate in the off-line mode. well-documented examples are ibm 357's (southern illinois) (2), southampton university library (phase ii), using friden equipment (3), and the current automated library systems (als) ltd. equipment (4). these systems can perform the basic operations of issuing and discharging books in an economical manner, but because they are operating in an off-line manner they experience difficulties in
maintaining an up-to-date overview of their collections and in detecting reservations. they cannot detect delinquent readers at the point of issue. in order to solve some of these problems als has been developing an off-line system with a certain amount of storage attached to it. this "trapping store," indicated by dashed lines in figure 1, can contain the numbers of reserved books and delinquent readers to facilitate immediate identification at the point of issue.
fig. 1. off-line circulation control: data collection units (cards, badges, dials) feeding a transmitter and receiver, with an optional trapping store (als only); the systems shown are the ibm 357, the collectadata 30 (friden), and the als.
this system has proved to be quite popular and at least fifteen have been installed in university and public libraries. it is still not able to provide any better currency of information than is possible with a basic off-line system, and the als system will handle only numeric information. books are identified by number only, so that if one receives an overdue reminder, it is because books 341672, 816649, and 654321 are overdue - unless there is a substantial matching operation against a complete catalog file. in contrast, a system using alpha-numeric characters could include brief author, title, and call-number information on, say, an 80-column book card. this would permit the production of lists by author, etc., without reference to a complete catalog file. it may be noted by reference to table 1 that an outstanding virtue of the als system is the low cost of installing additional data collection points. a notable gap in library automation is the apparent lack of a simple, inexpensive data collection unit capable of reading alpha-numeric book cards. if relatively expensive equipment is used (e.g., ibm 357 or friden collectadata), there may be difficulties in coping economically with the inherent peakiness of library borrowing.
table 1. off-line circulation system costs
ibm 357: basic two-transmitter-and-receiver system $13,000; maintenance (p.a.) $655; trapping store --; total $13,655
friden collectadata 30 (6-reader system): basic system $14,000; maintenance (p.a.) $900; trapping store --; total $14,900
als: basic system $10,000; maintenance (p.a.) $800; trapping store $11,000; total $21,800
notes: 1. the specifications are intended to represent two service points. since als equipment uses separate card readers for borrowing and return, the provision for two borrowing points and two return points would, in fact, have a higher traffic capability than the other two systems. 2. figures represent british prices expressed as u.s. dollars at $2.40 = £1. 3. collectadata 30 hardware is at 1967 price. 4. approximate cost of each als reader is $500.
attempts to mould library use to suit the machinery are unlikely to prove satisfactory. notably, a general lengthening of loan periods will result in a lower standard of library service in terms of immediate availability (1) and, almost certainly, a decrease in actual book usage. in management information terms, the symptoms of this would be an increase in the size of the issue file compared with the borrowing rate and a drop in issues per capita.
on-line
since the deficiencies of off-line working are serious, various attempts have been made to develop on-line circulation systems (see figure 2). this is the second main formula, of which illinois state library (5), queens university, belfast (6), and midwestern university library (7) provide well-documented examples. such a system is able to maintain a completely up-to-date picture of the issue files.
such a system can detect both reserved material and delinquent readers immediately and appears to provide the complete answer to the library's needs - until its technical requirements are examined. in order to control the circulation system in an on-line manner, the library requires at least ten hours of on-line working to be available to it each working day. as more than one university library has already discovered, this number of hours of on-line working is very rarely available at present when computer facilities are being shared with many other users, as in a university environment. furthermore, it is unlikely to become available for quite some years in the future because, with present machines and techniques, on-line working is an inefficient mode of operation unless the computer system is running well below capacity.
fig. 2. on-line circulation control: input-output data collection units connected through a multiplexer to an on-line computer with dedicated storage.
a further obstacle when sharing facilities is the amount of dedicated storage that must be made available to the library. storage is a much prized commodity and computer centers are unwilling to forfeit valuable storage for any length of time. it should also be noted that no average-sized library will be able to afford or justify possession of its own dedicated computer adequate for on-line working. a library's requirements for storage, printing facilities, and so on would make such an independent system an extravagance, since its power would have to be considerable to handle the vast quantities of data input to it, but it would constitute a grossly under-utilized investment compared with the sharing of the facilities provided in a university or local government computing center. the data collection units could be teleprinters or card reading stations with some printing or display facilities. the number and type of such dcu's will depend on the local work load, but we will consider a system using two alpha-numeric card reading stations with printout facilities plus an interrogating printer. an interface into the main computer and a multiplexing device will also be required. in order to answer queries and to control the circulation in a completely on-line manner, the dedicated disk must be large enough to hold the issue file, and having gone to the expense of controlling the issue on-line, it would seem inconsistent to be satisfied with a number-only system. if we plan for an expected maximum number of 50,000 records in our issue file at any one time and allow 100 characters per record (i.e., author/title, class or call number, borrower number, date due back, code to describe the type of loan, i.e., long or short, etc.), the disk must be capable of storing 5 million characters. since it is usual to store the bulk of the circulation control programs on the same disk and to allow certain parts of the disk to be used as work areas, a total store area of 6 million characters, at least, will be required. the cost of providing adequate dedicated disk storage will depend on the local situation but could well cost anything between $30,000 and $50,000 to purchase. the remaining equipment is likely to cost $20,000, and development costs will be greater than with off-line.
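the disk sizing above follows directly from the figures quoted in this paragraph. a minimal sketch of the arithmetic, with illustrative variable names and an assumed one-million-character allowance for programs and work areas (chosen only to reproduce the "at least 6 million characters" total), might look like this:

    # back-of-the-envelope sizing of the dedicated disk for a fully on-line
    # issue file, following the figures quoted above; names are illustrative.
    MAX_RECORDS_ON_LOAN = 50_000   # expected maximum issue-file size at any one time
    CHARS_PER_RECORD = 100         # author/title, call number, borrower, date due, loan code

    issue_file_chars = MAX_RECORDS_ON_LOAN * CHARS_PER_RECORD      # 5,000,000 characters

    # programs and disk work areas share the same dedicated disk; the extra
    # million characters here is an assumption that reproduces the article's
    # "at least 6 million characters" total.
    OVERHEAD_CHARS = 1_000_000
    total_chars = issue_file_chars + OVERHEAD_CHARS

    print(f"issue file: {issue_file_chars:,} characters")
    print(f"dedicated store required: at least {total_chars:,} characters")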
a "hybrid" circulation control system it is possible to meet all the requirements of a library circulation system in a cost-effective manner by exploiting and combining the main advantages of on-line and off-line working in a hybrid system. the basic structure of such a system is shown in figure 3 (see p. 36) . as can be seen, the mini-computer is sited in the library building and has the various data collection terminals attached locally to it. the mini-computer is also conhybrid circulation control/buckland and gallivan 35 nected via a line to the main computer into and from which it can send and receive data. the important differences between this system and the conventional onor off-line systems are that: 1. the mini-computer spools up the transactions as they occur, into its own storage (either tape or disk). 2. the on-line link to the main computer is only used two or three times each day. this is important, since it implies that the hybrid system does not require continuous on-line facilities. 3. supplementary or full listings show the state of the issue file at a particular time. 4. the recent transactions stored by the mini·computer can be interrogated and, in conjunction with the listings, gives the immediately current state of the issue file. 5. reserved books and delinquent readers have their identifiers stored in the core of the mini-computer to enable their immediate identifi· cation at the point of issue. 6. the necessity for dedicated equipment on the main computer (such as a dedicated disk) is avoided. a fairly heavily used library will be handling approximately 5,000 transactions each day. since these transactions will be either issue or return transactions, in the main, if we allow 100 characters worth of information to identify an issue and 20 characters to identify a return, then on the average we will be handling 300,000 characters worth of information each day. in the hybrid system the mini~computer is acting as a controller to the data collection devices and is spooling this information up onto a magnetic tape until such time as the storage space is becoming full or until a sufficient time has elapsed since the last updating of the records. at this time the mini-computer passes the information on its magnetic tape to the main computer via the on-line link. the duration of the on-line connection might be ten to fifteen minutes owing to line speed limitations. in order . to operate a hybrid system, the library would need two periods of on-line working each day of approximately ten to fifteen minutes each. alternatively the magnetic tape could be physically replaced, the fresh one continuing to record transactions while the full one is carried manually to the main computer. provided that the tapes can be read by the main computer, on-line facilities would not be required. the recent transactions, having been passed to the main computer, will be sorted and merged with the rest of the issue file which would be kept on magnetic tape. the precise nature of the listings produced by the main computer will depend on local factors, such as the duration of the loan period, etc., but could be either a fully revised complete listing or a listing of the most recent transactions to supplement an earlier complete listing. 36 journal of library automation vol. 
hybrid computer costs
basic computer (includes teleprinter): $8,650
extra 4k of store: $3,600
tape controller: $7,200
dual tape transport: $5,700
data break interface: $600
d.c.u.'s @ $3,100 (2 off): $6,200
interface d.c.u.-mini-computer: $2,000
interface mini-main computer: $1,200
total: $35,150
it is worth noting that the most widely adopted computer-aided circulation system in great britain is the als system, which, if purchased with a "trapping store," is the nearest equivalent to the hybrid system outlined in figure 3. the chief differences are that (1) the als system operates on numbers only, which, in our view, makes it less suitable for university library applications; (2) the "trapping store" is inflexible in its capability when compared with a mini-computer of similar cost. in order to utilize the on-line facilities provided by a mini-computer to the full, it would be possible to handle the "short loan" reserve collections of popular texts (commonly borrowable for a few hours only) in a completely on-line manner. in this respect there would be no reliance on the main machine. it might also be appropriate to use the mini-computer to handle other library data processing tasks.
fig. 3. simplified hypothetical "hybrid" circulation control system: library terminals and a console typewriter or vdu attached to the mini-computer in the library, linked to the main computer, which produces author lists, call-number lists, notices, etc.
last year at the university of lancaster the average cost per issue was 12.72 cents. since the university of lancaster is a new university in great britain, it is in the middle of a period of growth and student numbers are expected to increase from 3,000 to 5,400 in the next five years. this university has also researched the influence of duplication and loan period adjustment on the availability of stock to prospective users, and with the present level of duplication and grades of loan period there is a per capita borrowing rate approaching 80 issues per annum. this figure is expected to increase in the next five years. with these figures as a basis, at least 2 million issues are expected in the next five years. even allowing for the cost of data conversion and the amortization of hardware over five years, the use of a hybrid circulation control system could be expected to result in an average cost per issue of just under 12 cents.
conclusion
the costs already mentioned can be tabulated thus:
off-line: $13,000-$22,000
on-line: $70,000
hybrid: $35,000
this suggests that a hybrid system offers complete control over library circulation in a highly cost-effective manner compared with on-line working. whether or not a hybrid system is also to be preferred to off-line working will depend on the individual library context.
the trade-off between the marginal advantages and the marginal increases in cost and complexity will depend on the detailed costs and value-judgments specific to each situation. if our diagnosis is correct, then most attempts to progress from off-line to on-line working are ill-judged and would appear to have no justification in cost-effectiveness. in our view these developments are unlikely to become fully operational. if they do, their life will probably be short or restricted to limited hours unless exceptional circumstances prevail. such circumstances would include continuing subsidies for research and development or the existence of a system justified on other grounds (e.g., police records).
references
1. michael k. buckland and others, systems analysis of a university library; final report on a research project, university of lancaster library occasional papers, 4 (lancaster, england: university of lancaster library, 1970).
2. r. e. mccoy, "computerised circulation work: a case study of the 357 data collection system," library resources & technical services 9:59-65 (winter 1965).
3. b. a. j. mcdowell and c. m. phillips, circulation control system. automation project report no. 1 (soul/aprl) (southampton, england: university of southampton library, 1970).
4. lorna m. cowburn, "university of surrey automated issue system," program 2:70-88 (may 1971).
5. robert e. hamilton, "the illinois state library 'on-line' circulation control system," in dewey e. carroll, ed., proceedings of the 1968 clinic on library applications of data processing, university of illinois graduate school of library science, urbana, illinois (london: bingley, 1969), p. 11-28.
6. richard t. kimber, "an operational computerised circulation system with on-line interrogative capability," program 2:75-80 (oct. 1968).
7. calvin j. boyer and j. frost, "on-line circulation control - midwestern university library's system using an ibm 1401 computer in a 'timesharing' mode," in dewey e. carroll, ed., proceedings of the 1969 clinic on library applications of data processing, university of illinois graduate school of library science, urbana, illinois (london: bingley, 1970), p. 135-45.
a marc ii-based program for retrieval and dissemination
georg r. mauerhoff: head, tape services, national science library, and richard g. smith: analyst/programmer, research and planning branch, national library, ottawa, canada (formerly with library and computation center, university of saskatchewan)
subscriptions to the library of congress' marc tapes number approximately sixty. the uses to which the weekly tapes have been put have been minimal in the area of selective dissemination of information (sdi) and current awareness. this paper reviews work that has been performed on batched retrieval/dissemination and provides a description of a highly flexible cooperative sdi system developed by the library, university of saskatchewan, and the national science library. the system will permit searching over all subject areas represented by the english language monographic literature on marc.
introduction
with subscriptions to the library of congress' marc ii tapes numbering approximately sixty (1), the utilization of standardized bibliographic information in machine readable form has reached an all-time high.
numerous subscribers have written programs to access the tapes in order to produce acquisitions and cataloging products, but, unfortunately, the search techniques in these programs have been limited to searching of fixed-length information, such as lc card numbers, standard book numbers (sbn's), and compression codes. accelerated developments of searching mechanisms have been made by those involved with on-line bibliographic systems, but work on marc information retrieval in the batch mode has been evolving very slowly. that is, the proffering of an assortment of remedies for one of the oldest library problems, that of current awareness and selective dissemination of information (sdi) using marc, has not received the emphasis it should. the library of the university of saskatchewan has been utilizing the marc tapes since their weekly distribution began on 1 april 1969, with areas of usefulness so far having been restricted by the kinds of searching methods available. concern has therefore been shown for a far greater exploitation of the marc records. since no algorithms other than time decay have been established locally for limiting the size of the file to items which have a high degree of usefulness, and since the cost of updating and storing the weekly files has to be incurred, it is only fitting that as many bibliographic records as possible be monitored and disseminated to those sections of the university where they can be most effectively used. a program package for current awareness/sdi is the most likely method for achieving this. collaborative efforts are now the only realistic means of exploiting marc. costs can be spread over a large user group, and at the same time personalized services are assured to those taking part. it is for this reason that the office of technical services (ots), library, university of saskatchewan, has been cooperating with the national science library (nsl), national research council of canada, on the development of such a current awareness/dissemination system. known by the acronym seldom (selective dissemination of marc), the program represents cooperation in the true sense of the word, in that the ots's experiences with marc are being coupled with nsl's expertise in nation-wide sdi. this paper will describe in detail the evolution of seldom, with a future paper to document user reaction to the seldom program.
history
the university of saskatchewan is not alone in the investigation of marc-based retrieval/dissemination programs. the oklahoma department of libraries, under the coordination of k. j. bierman (2, 3, 4, 5), has been operating a weekly marc sdi service since february of 1970 and has found its reception overwhelming. over twenty user groups in the united states and canada are presently experimenting with this current awareness service in various subject fields, using the dewey decimal and the library of congress classification numbers as search keys. oklahoma's efforts followed the study by william j. studer (6) and the aerospace research applications center (arac) at indiana university. studer's hypothesis was "that an sdi system concerned with book-type material would be of significant benefit to faculty in keeping them alerted to what is being published in their fields of interest - especially faculty in the non-technical areas where books are probably still as vital, if not more important, a medium of information and ideas as periodical and report literature (7)."
in his experiment, studer translated participants' interests into profiles consisting of weighted library of congress subject headings and classification numbers. henriette avram (8) of the library of congress' marc development office reported on information retrieval using the marc retriever, a modification of programmatics inc.'s system known as aegis. regarded as "essentially a research tool that should be implemented as inexpensively as possible," the marc retriever is tape based and able to accept almost any kind of bibliographic query. unfortunately, it is only operational at the library of congress. along similar lines are syracuse university's l.c. marc on molds and leep projects (9, 10, 11, 12). the interactive retrieval capabilities, which are used in both batch and on-line modes, permit a variety of queries over their marc data bases. additional projects reporting on the subject approach to marc tapes in a batch environment are not numerous. dohn martin (13) at the washington university school of medicine describes a searching method by l.c. classification numbers, in which a pl/1 program is used to produce selection lists for the medical library. this is along the same lines as the work reported by j. g. veenstra (14) of the university of florida, d. l. weisbrod (15) of the yale university library, and f. m. palmer (16) of harvard university library. in sweden, bjorn tell (17) has run a marc ii test tape in his integrated information retrieval system called abacus, while in edmonton, canada, doreen heaps (18) reports on author and title searches of marc tapes in a chemical titles format. in england, related research is being contemplated by f. h. ayres (19) for bnb marc tapes. in ireland (20), also, plans are in the offing for sdi services based on bnb marc tapes, while in the united states, the first commercial venture is underway by richard abel and company (21), which is contemplating selective dissemination of announcements.
background
the national science library has been providing an sdi service for canada's scientific and technical information (sti) community since april 1969, spinning a variety of machine readable indexing and abstracting services on a regular basis. a questionnaire (22) was sent out by can/sdi project officials in may 1970, asking its subscribership to suggest where subject expansion should take place in the future. although the responses emphasized the life sciences, e.g., biological abstracts' ba previews and medlars, the nsl was nevertheless quite enthusiastic about adding the library of congress' marc ii tapes to their present sdi service, especially if the project programming could be accomplished elsewhere. twenty-one subscribers responded to the marc ii tapes, indicating the existence of a good user group, although not one of top priority. the university of saskatchewan library expressed a willingness to perform the systems work and project programming, which was estimated to require less than four man-months, making seldom operational by february 1971.
the seldom program
facilities and programming languages
in order for the ots to make use of the pl/1 and assembler programs, an ibm s360 computer configuration consisting of at least 100k of memory and a pl/1 compiler was deemed necessary. this presented no problem because the library had at its disposal an ibm s360/50 with 256k bytes of memory.
additional hardware specifications include four tape drives, a 2314 disk, and two 1403 printers, one with a tn option. the latter is soon to be replaced with the ala-approved library print train. now, however, because of the addition of large core storage (lcs), large bibliographic files such as marc will be processed much more easily. release 19 of os mft was also implemented in order to effectively utilize this additional million bytes of lcs memory. this more than modest memory has great utility, although serious investigations of automated library systems such as this one can take place even with small memories. as can be imagined, the switchover to release 19 came at an inopportune time as far as the seldom programs were concerned. implementation of the new release affected the scheduling and turn-around times.
the seldom record format
several years ago, the national science library decided to adopt a standard marc ii-like format and design programs to convert suppliers' tapes to this standard format. when a decision is made to add a new tape service, such as biological abstracts' ba-previews, to the present inventory of can/sdi tapes, the nsl personnel select those bibliographic items which will find use in an sdi environment. selected items are then pulled from the input tape by the conversion program and structured into an nsl format. this, then, was the first of many tasks facing the ots: determining which fields should be utilized from the lc marc tape for searching and printing. of approximately fifty marc tags, fixed and variable, only 32 contain information that might be of interest to users of the system for searching. these tags, however, can be grouped into analytical units, i.e., units of like information. arranged in six term types, they are: personal name, corporate name, classification, title, geographic area code, and date. the abbreviations for the term types are p, b, k, t, g, and d respectively. users then will be able to request information from the system in many ways, whether it be for a title term or a combination of categories such as classification number and geographic area code. the twenty-three fields and five subfields chosen, along with their respective analytics, are shown in table 1, where [ ] marks fields that are not searched and ° marks ots calculations. percentages of occurrence, the criterion used for selection of the tags, are also indicated in the table. all the 500 tags were omitted because nsl and the ots do not wish to search abstracts, annotations, or bibliographic notes at this time. where frequencies were not available from the library of congress' publication entitled format recognition process for marc records: a logical design, the ots conducted its own counts over a tape selected at random. the tape chosen (volume 2, number 23) for the counts contained 881 records.
table 1. search field definitions (figures are % of occurrence per record; [ ] = not searched; ° = ots calculations)
personal name (p): 100, 84.7; [400], <0.1; 600, 12.1; 700, 22.4; [800], 0
corporate name (b): 110, 11.7; 260$b, 97.9; 410$a, 2.4; 610, 4.8; 710, 11.1; 810$a, 4.6
title (t): 111, 1.5; 130, 0.2; 240, 4.3°; [241], 0.1°; 245, 100.0; 410$t, 2.4; [411], <0.1; 440, 6.0; [611], 0.1; 630, 0.9; 650, 95.9; 651, 17.5; 711, 0.2; 730, 0.8; 740, 4.1°; 810$t, 4.6; [811], 0.1; 840, 0.5
classification (k): 050, 105.1; 051, 0.9°; 082, 95.8
geographic code (g): 043, 34.0°
date (d): 009 (i.e., 008), 100.0
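for illustration only, the searched entries in table 1 could be transcribed into a small driver table keyed by term type; the structure below is a hypothetical sketch in python, not seldom's actual pl/1 tables:

    # searched fields and subfields from table 1, grouped by seldom term type
    # (p, b, t, k, g, d); bracketed, unsearched tags are omitted.  this is an
    # illustrative transcription, not the program's actual data structure.
    SEARCH_FIELDS = {
        "p": ["100", "600", "700"],                                  # personal name
        "b": ["110", "260$b", "410$a", "610", "710", "810$a"],       # corporate name
        "t": ["111", "130", "240", "245", "410$t", "440", "630",
              "650", "651", "711", "730", "740", "810$t", "840"],    # title
        "k": ["050", "051", "082"],                                  # classification
        "g": ["043"],                                                # geographic area code
        "d": ["009"],                                                # date (derived from 008)
    }

    # twenty-three fields plus five subfields give the 28 searchable data
    # elements mentioned in the text.
    assert sum(len(tags) for tags in SEARCH_FIELDS.values()) == 28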
4/3 september, 1971 the fact that only 28 data elements were chosen for searching purposes proved highly useful, since the national science library's search module was designed, for the sake of efficiency, to accommodate a maximum of 32 search field definitions. the program can handle this many fields, but on the average it makes use of approximately twelve fields per record. there may be occasions, however, when as few as seven or as many as twenty-two directory entries will be handled, not counting subfields. table 2 is a distribution of directory entries for the sample marc ii file tape. the mean of the distribution of entries is 13, and the median 12. table 2. distribution of directory entries #of dir. entries l6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ::::,..23 #records 0 2 0 28 74 116 160 149 123 106 69 25 19 5 3 1 1 0 881 % 0 .23 0 3.18 8.40 13.17 18.16 16.91 13.96 12.03 7.83 2.84 2.16 .57 .34 .11 .11 0 100.00 at the same time that d ecisions were being made regarding the inclusion of certain search fields, print field definitions were structured. although the programs can accommodate any number of directory items, only 31 are required for satisfactory and meaningful output. the analytics for these definitions make up table 3, where o are ots calculations. frequency statistics are again included. description of programs the seldom software is comprised of four modules. these modules (a, b, c, d) are easily identified in the system flowchart (figure 1) and are: "a" the translation and conversion of marc; "b" the searching of files; "c" the outputting of the search results , and "d" the compiling of profiles. two ibm utility programs are also used. marc 11-based retrievaljmaverhoff and smith 147 table 3. print field definitions definition term ( s) causing retrieval main entry title statement edition statement imprint collation statement series statement/notes bibliographic price subject added entries lc card number profile number expression number threshold weight weight source form of content language lc class number dewey decimal number isbn %of occurrence per record 98.6 100.0 4.1 100.0 100.0 13.6 39.7 ° 131.6 100.0 53.2 ° 100.0 105.1 ° 95.8 ° 39.7 ° translation and conversion program ( lconv) the conversion program, called lconv, converts the weekly marc tape into a seldom marc ii-like format tape. the input records see the following changes: "%" used as field terminator, "$" used as subfield delimiter, "@" used as record terminator, upperand lower-case ascii translated to upper case ebcdic, diacritics removed, text compressed, and unromanized characters that can't be approximated removed. the program is driven by two tables, one of which consists of the marc tags in which the ots is interested, and the other, the processes to which the selected tags will be subjected. currently, all tags can be handled by one of four processes: 1) process 1 extracts the language and the form of content code from marc tag 008, and creates a new field 008 consisting of only these two units. instead of a one-character code for form of content, a four-letter abbreviation delimited by "$a" is used. language of publication is delimited by "$b". process 1 also extracts the first publication date from the original tag 008, and sets up a new field , tagged 009 and delimited "$a". 2) process 2 handles the library of congress ( 051, 052) and dewey decimal classification ( 082). it utilizes only the first subfield, compresses out slashes, and limits the length of these fields to 20 characters. 
the searching program (srchpro)

the searching program accepts as input compiled profiles, the converted marc tape from lconv, and parameter cards specifying the data base and up to 32 search field definitions. each field definition consists of a term type code, tag, and delimiter of the field or subfield to be searched. six term types are allowed, although additions, deletions, and changes to these six may be performed upon request. all terms except date may be truncated on the right, with title terms also benefiting from left truncation. the right truncation feature reduces storage and search time requirements.

the searches are conducted over the converted tape according to the boolean expressions which connect symbols representing profile words. profile words are simply entered into core until the allotted core is filled, and the source tape is sequentially passed against the profiles; i.e., each of the records on the tape precipitates a search of the profile words in core. if all of the profile words were not entered into core, the source tape is rewound and another search is conducted. this continues until all profiles have been searched. an output tape is created containing the seldom record retrieved, with a prefix consisting of the profile number, threshold weight, weight, expression number, hit number, and the terms which caused retrieval of the record.

users also have the option of applying a weight (-99 to +99) to each profile word. each time profile words match terms in a record, the weight value of each of the words found is tallied. upon completing the search of that record and upon satisfying the expression logic, the total of the weight values is compared to a threshold value. thus, if the total is greater than or equal to the threshold value (-999 to +999), that particular record is retrieved. another option available to the user is a hit option, in which the user may specify the maximum number of records he would like various expressions in the program to retrieve for him.
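a rough sketch of the weighting and threshold logic just described. this is not srchpro; the profile representation, the in-core matching, and the example profile are simplifications and assumptions, and only right truncation (a trailing asterisk) is illustrated.

```python
def term_matches(profile_word, record_term):
    # linear match of one profile word against one term from a record;
    # a trailing "*" marks right truncation, as in the seldom profiles
    if profile_word.endswith("*"):
        return record_term.startswith(profile_word[:-1])
    return record_term == profile_word

def search_record(profile, record_terms):
    # profile: {"words": {symbol: (term, weight)}, "expression": callable,
    #           "threshold": int}
    # record_terms: terms extracted from one converted marc record
    # returns the profile words causing retrieval, or None
    hits, total_weight = {}, 0
    for symbol, (word, weight) in profile["words"].items():
        matched = any(term_matches(word, t) for t in record_terms)
        hits[symbol] = matched
        if matched:
            total_weight += weight            # tally weights of words found
    # the boolean expression connects the symbols; the threshold (-999..+999)
    # must also be met for the record to be retrieved
    if profile["expression"](hits) and total_weight >= profile["threshold"]:
        return [w for s, (w, _) in profile["words"].items() if hits[s]]
    return None

# example profile modeled loosely on the pollution profile shown in figure 5
profile = {
    "words": {"a": ("pollut*", 10), "b": ("contaminat*", 10), "c": ("environment*", 5)},
    "expression": lambda h: (h["a"] or h["b"]) and h["c"],
    "threshold": 15,
}
print(search_record(profile, ["pollution", "environmental", "mercury"]))
```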
the output programs

the output from the search program is sorted by calling up the ibm sort utility, which sorts the records on the prefix. the sorted output is then input to the print program along with the address file. the latter is a separate file that is merely updated using the ibm utility iebupdte. it is in this address file, however, that several options can be specified. duplicate printouts can be obtained, such that the left and right sides of the page carry identical output, with the right side carrying a feedback mechanism. two-up printouts, notes, and, if necessary, punched-card output can be requested. on the whole, the record printed out (see figure 2) is similar in format to a 3 x 5 catalog card, the only differences being the fixed format, the term or terms causing retrieval, the lack of name added entries and notes, and the control information at the bottom of each printout.

[fig. 2. sample profile notices: computer-printed notices dated march 19, 1971, addressed to the shortt library, c/o murray memorial library, university of saskatchewan, saskatoon. each notice carries a full bibliographic citation (the examples are a geological survey of canada bulletin by e. j. w. irish and robert cundy's beacon six), the terms causing retrieval, and a control line identifying the seldom project, the marc tape issue, and the profile, threshold, and weight values.]

profile compilation

because of the library's bibliographic responsibility to the university, an alerting service such as seldom will vastly improve user awareness of the published monographic resources. first, users, in house and out, would not only be alerted to many works to be acquired by the library, but would also be alerted to items that are currently not being purchased. secondly, they would be assured of personalized services. users of seldom will not receive listings of just new books, but will be notified of the latest books which are presumed to be relevant to their interests.

profiling

when a prospective user (group) wishes to search a weekly marc tape, his (its) interests are entered onto profile formulation sheets. these sheets (see figures 3 and 4) contain a description of the user's subject interests, several references to the monographic literature, and a listing of the profile words with logical connectives. the profile words may number as many as 500. figure 5 shows three of the approximately eighty profiles currently running under seldom. the profiles are formulated by search editors using words that appear in the user's narrative and references. additional words are sought in the library of congress' list of subject headings. classification numbers that express the appropriate areas are incorporated; depending upon the information need, personal names, corporate names, geographic area codes, and date are also prescribed. according to mauerhoff (23), approximately twenty-seven hours per year are required of an information specialist/search editor in order to accurately capture and maintain a user's need for information. this figure incorporates interviewing time, user education, analyses of user feedback, and revision time. the success of this system, or of any information retrieval system, therefore depends on having sufficient profiling staff.

[fig. 3. sample profile formulation sheet: narrative and references. the sheet for profile 0003 (reference department, murray memorial library, university of saskatchewan) states the search request (current monographs likely to interest the reference department, such as dictionaries, encyclopedias, handbooks, and catalogues) and lists four sample references, including hayes and becker's handbook of data processing for libraries.]

[fig. 4. sample profile formulation sheet: terms and logic. profile words such as almanac*, dictionar*, directory, encycloped*, glossar*, guide*, handbook*, index*, reference, review*, catalog*, abstract*, and yearbook* are listed against alpha codes and connected by boolean expressions with weights.]

[fig. 5. computer version of profiles. three stored profiles are shown: one built on canadian geographic terms and area codes, one on audio-visual and instructional-media terms, and one on pollution terms (pollut*, contaminat*, environment*).]
the compile program (compro)

compro, the profile compilation program, edits the profile transactions for sequence, syntax, and semantics, and generates codes for the boolean operators in the search expressions. the program flags incorrect data base specifications, incorrect term types, and incorrect alpha codes (i.e., the symbols corresponding to the profile words). profile transactions are by way of card input, and can consist of profile additions, profile updates, and profile deletes. listings accompany all transactions.

operation and costs of seldom

from the time that seldom became operable on a day-to-day basis, cost information has been gathered, and since seldom is composed of four modules, the recording of items of cost has been easily done. for example, lconv computer charges are presently $0.019 per record converted, based on weekly files ranging in size from 1194 to 2399 records. this breaks down to about 1939 records per week, and averages out to about $37 per marc tape. following the preparation of the tape for searching, the srchpro-sort routine is run. the average computer cost has been about $0.186 per profile per issue. seldom's user group presently numbers 81, with profile terms numbering 1121, or 14 terms per profile, and questions or expressions numbering 273, or about 4 per profile.
prinpro was formerly running under stream-oriented transmission, at a total computer cost of $1.70 per 1000 lines of output. a shift to record-oriented transmission has lowered charges to $1.50 per 1000 lines. with profiles having averaged about 832 lines of output, the total cost of printing out search results has been about $1.25 per profile.

overall costs for the 81 profiles are presently about $2.23 per profile per tape, or $116.00 per profile per year. since the profiles require updating at frequent intervals, charges of $0.37 per profile per tape have been incorporated into this figure to take care of changes in terms and addresses. costs which have not been included in the calculations are such items as marc tape subscriptions, forms, and staff time.
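the cost figures quoted in the last few paragraphs are straightforward unit-cost arithmetic; the short sketch below simply reproduces those calculations so they can be checked.

```python
# back-of-the-envelope check of the charges quoted above
print(round(1939 * 0.019, 2))        # lconv: ~36.84, i.e. about $37 per weekly marc tape
print(round(832 * 1.50 / 1000, 2))   # prinpro: ~1.25 dollars of printing per profile
print(round(2.23 * 52, 2))           # overall: ~115.96, quoted as $116.00 per profile per year
print(round(1121 / 81, 1), round(273 / 81, 1))   # ~13.8 terms and ~3.4 expressions per profile
```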
discussion

the ots and the nsl have at their disposal a program package that is highly flexible. for instance, search keys can be added or deleted at will. fields from the marc tapes can either be incorporated into or removed from the directory. any number of fields and subfields can be searched on tape, and any new directory items may be created, with the srchpro limit, however, being 32. this number was chosen because it satisfies 99% of the users' needs. almost every procedure in the program is table driven, the result being that variations can easily be introduced into the programs. in consequence, if and when bnb marc tapes are made available, and if and when a canadian marc service becomes a reality, searching of these tapes would present no problems whatsoever.

the benefits to be derived from seldom go beyond the concept of sdi, because seldom can produce outputs for a wide variety of applications. sdi and current awareness have received considerable emphasis in the literature by those providing search services over a spectrum of scientific-technical tape services. since marc ii has also elicited a tremendous response, especially from kenneth bierman of the oklahoma department of libraries, these utilities do not merit additional treatment in this paper. seldom, however, is unique in that it is the only marc-based sdi system capable of searches using six coordinated entry points, linear matching, truncation, weighting, and output options.

from the point of selection, marc has great appeal. since the majority of the university's acquisitions (i.e., almost 80%) are english-language monographs, faculty and staff who have the responsibility for book selection would benefit from regular alerting services based on their areas of interest. apart from receiving verified bibliographic information, the participants benefit from the timeliness of the records. at the same time, selection costs per record will be brought down significantly, especially now that this selection process becomes tied in to tesa-1, the library's automated marc-based acquisitions and cataloguing system. it has been suggested that selection and ordering could be done for the cost of selection alone. the only problem areas envisaged are the lack of canadian imprints and the lack of other non-english monographs, such as french, german, spanish, and portuguese. a partial solution to this problem may take the form of a canadian marc project. a more complete solution is on its way, since marc coverage for other languages is anticipated by the beginning of 1972.

collection rationalization, an area receiving considerable attention along regional and national lines, can also benefit from seldom. devising divisions of responsibility in the acquisition of library materials will enable libraries to acquire, organize, store, and make available to the public comprehensive monographic collections. marc deselection, where practised by subscribers, is being pursued mainly along the lines of time decay. the university of chicago (21) has so far exhibited the only deselection algorithm employing a subject and intellectual-level approach in addition to date. they eliminate records from their file if they fall outside of their collection policy, by using classification numbers. the ots will be able to perform the same function, but much more rigorously, since its deselection criteria can consist of six elements. in this way, file size can be kept to a reasonable level, and update and storage charges will not be so high.

internal library data and information services will be along the lines of sdi, current awareness, demand bibliographies, and management statistics. these in-house utilities, which are already being obtained, have been very useful. the reference department, for instance, receives a bibliography each week of marc ii reference sources. another profile, for one of the catalogers, is monitoring the publications of modern-day novelists and poets.

outlook

seldom has been operational for only several months. while it has tremendous potential in the library field, and although immediate interest has been keen, the system will have to undergo considerable acceptance testing. attention will have to be given to costs and to the user and his evaluation of the service. how seldom fits into a library's patron or reference services will be especially important, since the system will be integrated into a library's current accessions program and also the card catalog service.

acknowledgments

major credit for the existence of the seldom project is due to the systems analysts and programmers at the national research council of canada, messrs. p. h. wolters, r. a. green, j. heilik, and miss r. smith, and to dr. j. e. brown, national science librarian.

references

1. personal communication with henriette avram, marc development office, library of congress, washington, d.c.

2. bierman, k. j.: "sdi service," jola-technical communications, 1 (october 1970), 3.

3. bierman, k. j.; blue, betty j.: "a marc-based sdi service," journal of library automation, 3 (december 1970), 304-319.

4. bierman, k. j.: "an operating marc-based sdi system: some preliminary services and user reactions," proceedings of american society for information science, 7 (1970), 87-90.

5. bierman, k. j.: statements of progress of cooperative sdi project. in oklahoma department of libraries: automation newsletter, 2 (february 1970), 3-4; 2 (june-august 1970); 2 (september 1970), 16, 25-26; 2 (december 1970), 34-35; 3 (february 1971), 1-3.

6. studer, william j.: computer-based selective dissemination of information (sdi) service for faculty using library of congress machine-readable catalog (marc) records. (ph.d. dissertation, graduate library school, indiana university, september 1968).

7. studer, william j.: "book-oriented sdi service provided for 40 faculty." in avram, henriette: the marc pilot project, final report (washington, d.c.: library of congress, 1968), p. 179-183.
also in random bits, 3:3 (november 1967), 1-4; 3:4 (december 1967), 1-4, 6.

8. avram, henriette: "marc program research and development: a progress report," journal of library automation, 2 (december 1969), 257-265.

9. atherton, pauline: "lc/marc on molds: an experiment in computer-based, interactive bibliographic storage, search, retrieval, and processing," journal of library automation, 3 (june 1970), 142-165.

10. atherton, pauline; wyman, john: "searching marc tapes with ibm/document processing system," proceedings of american society for information science, 6 (1969), 83-88.

11. atherton, pauline; tessier, judith: "teaching with marc tapes," journal of library automation, 3 (march 1970), 24-35.

12. hudson, judith a.: "searching marc/dps records for area studies: comparative results using keywords, lc and dc class numbers," library resources and technical services, 14 (fall 1970), 530-545.

13. martin, dohn h.: "marc tape as a selection tool in the medical library," special libraries, (april 1970), 190-193.

14. veenstra, j. g.: "university of florida." in avram, henriette d.: the marc pilot project, final report (washington, d.c.: library of congress, 1968), pp. 137-140.

15. weisbrod, d. l.: "yale university." in avram, henriette d.: the marc pilot project, final report (washington, d.c.: library of congress, 1968), pp. 167-173.

16. palmer, foster m.: "harvard university library." in avram, henriette d.: the marc pilot project, final report (washington, d.c.: library of congress, 1968), pp. 103-111.

17. tell, b. v.; larsson, r.; lindh, r.: "information retrieval with the abacus program: an experiment in compatibility," proceedings of a symposium on handling of nuclear information (vienna: 16-20 february 1970), p. 184.

18. heaps, d.; shapiro, v.; walker, d.; appleyard, f.: "search program for marc tapes at the university of alberta," proceedings of the annual meeting of the western canada chapter of the american society for information science (vancouver: september 14-15, 1970), 83-94.

19. ayres, f. h.: "making the most of marc; its use for selection, acquisitions, and cataloguing," program, 3 (april 1969), 30-37.

20. dieneman, w.: "marc tapes in trinity college library," program, 4 (april 1970), 70-75.

21. "marc ii and its importance for law libraries," law library journal, 63 (november 1970), 505-525.

22. wolters, peter h.; brown, jack e.: "can/sdi system: user reaction to a computerized information retrieval system for canadian scientists and technologists," canadian library journal, 28 (january-february 1971), 20-23.

23. mauerhoff, georg r.: "nsl profiling and search editing," proceedings of the annual meeting of the western canada chapter of the american society for information science (vancouver: september 14-15, 1970), 32-53.

journal of library automation vol. 7/4 december 1974

book reviews

current awareness and the chemist, a study of the use of ca condensates by chemists, by elizabeth e. duncan. metuchen, n.j.: scarecrow press, 1972. 150p. $5.00.

this book starts with a five-page foreword by allen kent entitled "kwic indexes have come a long way-or have they?" kent is always interesting, but when one detects that his foreword is becoming almost an apologia, one wonders just what is to come. the remainder of the book (apart from the index) appears already to have been presented as dr. duncan's ph.d. thesis at the university of pittsburgh.
the first two chapters are the usual sort of stuff, taking us from alexandria in the third century to columbus, ohio in 1970, with undistinguished reviews of user studies and the history of the chemical abstracts service. the remaining sixty-four pages of text report and discuss a study of the use of ca condensates by quite a small sample of academic and industrial chemists in the pittsburgh area. the objective appears to have been to compare profile hits with periodical holdings and interlibrary loan requests at the client's library so that a decision model for the acquisition of periodicals could be developed. on the author's own admission, this objective was not achieved. a certain amount of data is presented but it is difficult to draw many conclusions from it, other than the fact that chemists do not appear to follow up the majority of profile hits that they receive nor do they use the current issues of chemical abstracts very frequently. it is difficult to understand why this material was published in book form. it could have been condensed to one or possibly two papers for ].chem.doc. or perhaps even left for the really diligent seeker to find on the shelves of university microfilms-but, as the old testament scribe bemoaned, "of making many books there is no end." at the bottom of page 118 a reference is made to the paper by abbott et al. in aslib proceedings (feb. 1968); at the top of page 119 the same paper's date is given as january 1968. other errors are less obvious, but one really questions whether the provision of a short foreword and an index makes even a good thesis worth publishing in hard covers. r. t. bottle the city university london, u.k. computer-based reference service, by m. lorrai'ne mathies and peter g. watson. chicago: american library assn., 1973. 200p. $9.95. the archetypal title and model for all works of explication is ....... without tears. lorraine mathies and peter watson have attempted the praiseworthy task of explaining computer-produced indexes to the ordinary reference librarian, but for a number of reasons, some of them probably beyond the control of the authors, the tears will remai'n, perhaps one difficulty is that this book was, in its beginnings at least, the product of a committee. back in 1968 the information retrieval committee of the reference services division of the ala wanted to present to "working reference librarians the essentials of the reference potential of computers and the machine-readable data they produce" (p.xxix). the proposal worked its way (not untouched, of course) through several other groups and eventually resulted in a preconference workshop on computer-based reference service being given at the dallas convention of 1971. the present book is based on the tutor's manual which mathies and watson prepared for that workshop but incorporates revisions suggested by the ala publishing services as well as changes initiated by the authors themselves. with so many people getting into the planning act, it is not surprising that the various parts of the book should end up by working at cross purposes to each other. unfortunately, the principal conflicts come at just those points where a volume of exposition needs to be most definite and precise: just what is the book trying to do and for whom? at the original workshop, the eric data base was chosen as a "model system" since educational terminology was more likely to be understood than that of the sciences. 
and because the participants were to learn by doing, they were told a great deal about eric so as to be able to "practice" on it. the trouble is that these objectives do not translate well from workshop to print. the detafls about eric, which may have been necessary as tutors' instructions, seem misplaced in book form. almost half the present book is devoted to a laborious explanation of how eric works and this is a great deal more than most workaday reference librarians will want to know about it. moreover, it is no longer clear whether mathies and watson aim to train "producers" or "consumers." the welter of detail suggests that they expect their readers to learn hereby to construct profiles and to program searches but it is highly doubtful that skills of this kind can or should be imparted on a "teach yourself" basis. once mathies and watson leave eric behind, they seem on surer ground. part ii (computer searching: principles and strategies) begins with a fairly routine chapter on binary numeration which is perhaps unnecessary since this material is easily available elsewhere. however, the section quickly moves on to an excellent explanation of boolean logic and weighting, describes their application in the formulation of search strategies, and ends with an admirably succinct and demystifying account of how one evaluates the output (principles of relevance and recall). the reader might well have been better served if the book had indeed begun with this part. the last section (part iii: other machine readable data bases) is also very useful, particularly for the "critical bibliography" (p.153) in which the authors describe and evaluate ten of the major bibliographic data bases. this critical bibliography is apparently a first of its kind, which makes the authors' perceptive and frank comments all the more welcome. part iii also contains chapters on marc and the 1970 census but, sh·angely enough, does not include a final resume and conclusions. it is true that in each book reviews 325 chapter there is a paragraph or so of summary but this is hardly a satisfactory substitute for the overall recapitulation one would expect. in the final analysis, indeed, one's view of the book will depend on just thatwhat one expects of it. if "working reference librarians" expect to read this book in order to be no longer "intimidated by these electronic tools" (p.ix), they are apt to be disappointed. the inordinate emphasis on eric, the rather dense language, and the fact that the main ideas are never pulled together at the end will all prevent easy enlightenment. however, if our workaday reference librarians are willing to work their way through a fairly difficult manual on computer-based indexing as in effect a substitute 'for a workshop on the subject, they will find this book a worthwhile investment of their time-and tears. samuel rothstein school of lihl'arianship university of british columbia the circulation system at the university of missouri-columbia library: an evolutionary approach. sue mccollum and charles r. sievert, issue eds. the larc reports, vol. 5, issue 2, 1972. 101p. in 1958 the university of missouri-columbia library was one of the first libraries to mechanize circulation by punching a portion of the charge slip with book and borrower and/ or loan information. in 1964 an ibm 357 data collection system utilizing a modified 026 keypunch was installed, but not until 1966 was 026 output processed on the library owned and operated ibm 1440 computer. 
however, budgetary constraints forced a transfer of operations in 1970 to the data processing center, which undertook rewriting of library programs in 1971. after explanation of hardware changes and an overview of the circulation department organization and data processing center operation, this report deals in depth with the major files of the circulation system-circulation master flle and location master file-and the main components of the circulation system-edit, update, overdues, fines, interlibrary loans, 326 journal of libmry automation vol. 7/4 december 1974 address file, location file, reserve book, listing of files, special requests, and utility programs. many examples of report layouts are included, particularly those accomplished by utilizing data gathered from main collection and reserve book loans. although this off-line batch processing circulation system is limited in that it does not handle any borrower reserve or lookup (tracer) routines, both of which are possible in off-line systems, the university of missouri-columbia system has merit as a pioneer system which influenced other university library circulation system designs in the 1960s. detailed reference given throughout the report to changes in the original library programs not only makes it of value as a case history for any library interested in circulation automation but also indicates the important fact that library programs do change and evolve in response to new demands and technological capabilities. lois m. kershnm university of pennsylvania libraries national science information systems, a guide to science information systems in bulgaria, czechoslovakia, hungary, poland, rumania, and yugoslavia, by david h. kraus, pranas zunde, and vladimir slamecka. (national science information series) cambridge, mass.: the m.i.t. press, 1972. 325p. $12.50. as indicated by the title, this volume provides a comparative description and analysis of the various organizational or political structures which have been adopted by six counb·ies of central and eastern europe in their attempts to develop effective national systems for the dissemination of scientific and technical information. for each country there is a detailed account of the national information system now existing, with a brief outline of its antecedents, a directory of information or documentation centers, a list of serials published by these centers, and a bibliography of recent papers dealing with the development of information systems in that country. this main section of the book is preceded by a brief review of the common characteristics of the six national systems and an outline of steps being taken to achieve international cooperation for the exchange of information in specific subjects. of particular interest is the description of the international center of scientific and technical information established in moscow in 1969, and which is now linked to five of these national systems. no attempt is made to describe the techniques being used to store, retrieve, and disseminate information. the authors point out that the six countries being examined "have experimented intensely with organizational variants of national science information systems." unfortunately, they do not attempt to indicate which of these organizational structures was most effective in bringing about the desired results. 
undoubtedly, this would have been an impossible task and probably not worth the effort, since a successful type of organization in a socialist country would not necessarily be effective in a democracy. the book will be of interest to political scientists and to those seeking the most effective ways of coordinating the information processing efforts of all types of government bodies. it will be only of academic interest to the information specialist concerned primarily with information processing techniques. jack e. brown national science library of canada ottawa information retrieval: on-line, by f. w. lancaster and e. g. fayen. los angeles: melville publishing co., 1973. 597p. lc: 73-9697. isbn: 0-471-51235-4. have you been reading the asis annual review of information science and technology year after year and wishing for a compendium of the best information and examples of the latest systems, user manuals, cost data, and other facts so that you would not have to go searching in a library for the interesting reports, journal articles, and books? well, if you have (and who hasn't), your prayers have been answered if you are interested in online bibliographic retrieval systems. the authors of the handy reference book have collected and reprinted, among other things, the complete dialog terminal users reference manual, the supars user manual, the user instructions for aim-twx, obar, and the caruso tutorial program. each of these systems, and several others (arranged alphabetically from aim-twx [medline] to to xi con [toxline]), is described and illustrated. features and functions of on-line systems, such as vocabulary control and indexing, cataloging, instruction of users, equipment, and file design, are all covered in a straightforward manner, simply enough for the uninformed and carefully enough so that a system operator could compare his system's features and functions with the data provided. richly illustrated with tables, charts, graphs, and figures, up-to-date bibliographies (only serious omission noticed was the afips conference proceedings edited by d. walker), and subject and author indexes, this volume will stand as another landmark in the state-of-the-art review series which the wiley-becker & hayes information science series has come to represent. emphasis has been placed on the design, evaluation, and use of on-line retrieval systems rather than the hardware or programming aspects. several of the chapters have a broader base of interest than on-line systems, covering as they do performance criteria of retrieval systems, evaluating effectiveness, human factors, and cost-performance-benefits factors. easy to use and as up to date and balanced a book as any in a rapidly changing field can be, lancaster and fayen have given students of information studies and planners and managers of information services a very valuable reference aid. pauline a. atherton school of information studies syracuse university national library of australia. australian marc specification. canberra: national library of australia, 1973. 83p. $2.50. isbn: 0-642-99014-x for those readers who are familiar with book reviews 327 the library of congress marc format, the australian marc specification will be, for the most part, self-explanatory. the intent of the document is to describe the basic format structure and to list the various content designators that are used in the format. no effort was made to include any background information or explanation of data elements. 
because of this, the reviewer found it necessary to refer to other documents, e.g., precis: a rotated subject index system, by derek austin and peter butcher, in order to complete a comparative analysis of the australian format with other similar formats. perhaps the value of reviewing a descriptive document of this type lies in discovering how the format it describes compares to other existing formats developed for the same purpose.

the international organization for standardization published a format for bibliographic information interchange on magnetic tape in 1973, international standard iso 2709. the australian format structure is the same throughout as the international standard. the only variance is in character positions 20 and 21 of the leader, which the australian format left undefined. a comparison of content designators cannot be made with the international standard because it specifies only the position and length of the identifiers in the structure of the format, but not the actual identifiers (except for the three-digit tags 001-999 that identify the data fields).

the best comparison of content designators can be made with the lc marc format, since the australian format uses many of the same tags, indicators, and subfield codes for the same purposes. the australian format has assigned to the same character positions the same fixed-length data elements as the lc format, except for position 38, which is the periodical code in the australian format and the modified record code in the lc format. in the fixed-length character positions for form of contents, publisher (government publication in lc marc), and literary text (fiction in lc marc), the australian format assigned different codes than lc.

in general, the australian format uses the same three-digit tags as lc to identify the primary access fields in their records, e.g., 100, 110, 111 for main entries; 400, 410, 411, 440, 490 for series notes; 600, 610, 611, 650, 651 for subject headings; and 700, 710, 711 for added entries. for the remaining bibliographic fields there are some variations in tagging between the two formats. the australian marc has chosen a different method of identifying uniform titles, and has identified five more note fields in the 5xx series of tags than has lc. the australians have also added some manufactured fields to their record. these fields do not contain actual data from the bibliographic record, but rather consist of data created by program for control and manipulation purposes, or from lists such as the precis subject index. the australian format has also included, as part of its record, a series of cross-reference fields identified by 9xx tags. lc has reserved the 9xx block of tags for local use.

the use of indicators differs in most instances between the two formats. both allow for two indicator positions in each field, as specified by the international standard format structure. however, the information conveyed by the indicators differs, except where the first indicator conveys form of name for personal and corporate name headings. within each block of tags, lc has made an effort to remain consistent in the use of indicators; e.g., in the 6xx block for subject headings, the first indicator specifies form of name where a form of name can be discerned. where no form of name is discernible, such as in a topical subject heading (tag 650), a null indicator or blank is used, which means no intelligence is carried in this position. in the australian format the indicators in the 6xx block of tags have three different patterns.
inconsistency of this kind does not tend to destroy compatibility with other coding systems using the same format structure, as long as sufficient explanation and examples are given from which conversion tables may be developed by the institutions with whom one wants to exchange, or interchange, bibliographic data.

an even greater degree of difference exists between the two formats in the subfield codes used to identify data elements. the australian marc has identified some data elements that lc has not; e.g., in personal name main entries, the australian record identifies first names with subfield code "h," whereas lc does not identify parts of a personal name, only the form of the name, i.e., forename form, single surname, family name, etc. in most of the fields the two formats have defined some of the same data elements, but each uses a different subfield code to represent the element. in the australian document, under each field heading, the subfield codes are listed alphabetically with a data element following each code. this arrangement causes the data elements to fall out of their normal order of occurrence in the field, i.e., name, numeration, titles, dates, relator, etc. for example:

personal name main entry (tag 100)

subfield code   australian marc entry element             lc marc entry element
a               (name)                                    (name)
b               relator                                   numeration
c               dates                                     titles (honorary)
d               second or subsequent additions to name    dates
e               numeration                                relator
f               additions to name other than date         date (of a work)

the example demonstrates the need for precise definition and documentation of data elements for the purpose of conversion or translation when interchanging data with other institutions. the australian format has included the capability of identifying analytical entries by using an additional digit (called the level digit) placed between the tag and the indicators to identify the analytical entries. a subrecord directory (tag 002) is present in each record containing data for analytical entries. the australian document includes appendixes for the country of publication codes, language codes, and geographical area codes that were developed by the library of congress. their only deviation from lc marc usage is in the country of publication codes, where the australians have added entities and codes for australian first-level administrative subdivisions.

patricia e. parker, marc development office, library of congress

engine of innovation: building the high performance catalog

will owen and sarah c. michalak

information technology and libraries | june 2015

abstract

numerous studies have indicated that sophisticated web-based search engines have eclipsed the primary importance of the library catalog as the premier tool for researchers in higher education. we submit that the catalog remains central to the research process.
through  a  series  of  strategic   enhancements,  the  university  of  north  carolina  at  chapel  hill,  in  partnership  with  the  other   members  of  the  triangle  research  libraries  network  (trln),  has  made  the  catalog  a  carrier  of   services  in  addition  to  bibliographic  data,  facilitating  not  simply  discovery,  but  also  delivery  of  the   information  researchers  seek.   introduction in  2005,  an  oclc  research  report  documented  what  many  librarians  already  knew—that  the   library  webpage  and  catalog  were  no  longer  the  first  choice  to  begin  a  search  for  information.  the   report  states,   the  survey  findings  indicate  that  84  percent  of  information  searches  begin  with  a  search   engine.  library  web  sites  were  selected  by  just  1  percent  of  respondents  as  the  source  used  to   begin  an  information  search.  very  little  variability  in  preference  exists  across  geographic   regions  or  u.s.  age  groups.  two  percent  of  college  students  start  their  search  at  a  library  web   site.1   in  2006  a  report  by  karen  calhoun,  commissioned  by  the  library  of  congress,  asserted,  “today  a   large  and  growing  number  of  students  and  scholars  routinely  bypass  library  catalogs  in  favor  of   other  discovery  tools.  .  .  .  the  catalog  is  in  decline,  its  processes  and  structures  are  unsustainable,   and  change  needs  to  be  swift.”2     ithaka  s+r  has  conducted  national  faculty  surveys  triennially  since  2000.  summarizing  the  2000– 2006  surveys,  roger  schonfeld  and  kevin  guthrie  stated,  “when  the  findings  from  2006  are   compared  with  those  from  2000  and  2003,  it  becomes  evident  that  faculty  perceive  themselves  as   becoming  decreasingly  dependent  on  the  library  for  their  research  and  teaching  needs.”3   furthermore,  it  was  clear  that  the  “library  as  gateway  to  scholarly  information”  was  viewed  as   decreasingly  important.  the  2009  survey  continued  the  trend  with  even  fewer  faculty  seeing  the       will  owen  (owen@email.unc.edu)  is  associate  university  librarian  for  technical  services  and   systems  and  sarah  c.  michalak  (smichala@email.unc.edu)  is  university  librarian  and  associate   provost  for  university  libraries,  university  of  north  carolina  at  chapel  hill.     engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   6   gateway  function  as  critical.  these  results  occurred  in  a  time  when  electronic  resources  were   becoming  increasingly  important  and  large  google-­‐like  search  engines  were  rapidly  gaining  in   use.4     these  comments  extend  into  the  twenty-­‐first  century  more  than  thirty  years  of  concern  about  the   utility  of  the  library  catalog.  through  the  first  half  of  this  decade  new  observations  emerged  about   patron  perceptions  of  catalog  usability.  even  after  migration  from  the  card  to  the  online  catalog   was  complete,  the  new  tool  represented  primarily  the  traditionally  cataloged  holdings  of  a   particular  library.  providing  direct  access  to  resources  was  not  part  of  the  catalog’s  mission.   manuscripts,  finding  aids,  historical  photography,  and  other  special  collections  were  not  included   in  the  traditional  catalog.  
journal  articles  could  only  be  discovered  through  abstracting  and   indexing  services.  as  these  discovery  tools  began  their  migration  to  electronic  formats,  the   centrality  of  the  library’s  bibliographic  database  was  challenged.   the  development  of  google  and  other  sophisticated  web-­‐based  search  engines  further  eclipsed  the   library’s  bibliographic  database  as  the  first  and  most  important  research  tool.  yet  we  submit  that   the  catalog  database  remains  a  necessary  fixture,  continuing  to  provide  access  to  each  library’s   particular  holdings.  while  the  catalog  may  never  regain  its  pride  of  place  as  the  starting  point  for   all  researchers,  it  still  remains  an  indispensable  tool  for  library  users,  even  if  it  may  be  used  only   at  a  later  stage  in  the  research  process.   at  the  university  of  north  carolina  at  chapel  hill,  we  have  continued  to  invest  in  enhancing  the   utility  of  the  catalog  as  a  valued  tool  for  research.  librarians  initially  reasoned  that  researchers   still  want  to  find  out  what  is  available  to  them  in  their  own  campus  library.  gradually  they  began   to  see  completely  new  possibilities.  to  that  end,  we  have  committed  to  a  program  that  enhances   discovery  and  delivery  through  the  catalog.  while  most  libraries  have  built  a  wide  range  of   discovery  tools  into  their  home  pages—adding  links  to  databases  of  electronic  resources,  article   databases,  and  google  scholar—we  have  continued  to  enhance  both  the  content  to  be  found  in  the   primary  local  bibliographic  database  and  the  services  available  to  students  and  researchers  via  the   interface  to  the  catalog.   in  our  local  consortium,  the  triangle  research  libraries  network  (trln),  librarians  have   deployed  the  search  and  faceting  services  of  endeca  to  enrich  the  discovery  interfaces.  we  have   gone  beyond  augmenting  the  catalog  through  the  addition  of  marcive  records  for  government   documents,  by  including  encoded  archival  description  (ead)  finding  aids  and  selected  (and  ever-­‐ expanding)  digital  collections  that  are  not  easily  discoverable  through  major  search  engines.  we   have  similarly  enhanced  services  related  to  the  discovery  and  delivery  of  items  listed  in  the   bibliographic  database,  including  not  only  common  features  like  the  ability  to  export  citations  in  a   variety  of  formats  but  also  more  extensive  services  such  as  document  delivery,  an  auto-­‐suggest   feature  that  maximizes  use  of  library  of  congress  subject  headings  (lcsh),  and  the  ability  to   submit  cataloged  items  to  be  processed  for  reserve  reading.     information  technology  and  libraries  |  june  2015     7   both  students  and  faculty  have  embraced  e-­‐books,  and  in  adding  more  than  a  million  such  titles  to   the  unc-­‐chapel  hill  catalog  we  continue  to  blend  discovery  and  delivery,  but  now  on  a  very  large   scale.  
coupling  catalog  records  with  a  metadata  service  that  provides  book  jackets,  tables  of   contents,  and  content  summaries,  cataloging  geographic  information  systems  (gis)  data  sets,  and   adding  live  links  to  the  finding  aids  for  digitized  archival  and  manuscript  collections  have  further   enhanced  the  blended  discovery/delivery  capacity  of  the  catalog.   we  have  also  leveraged  the  advantages  of  operating  in  a  consortial  environment  by  extending  the   discovery  and  delivery  services  among  the  members  of  trln  to  provide  increased  scope  of   discovery  and  shared  processing  of  some  classes  of  bibliographic  records.  trln  comprises  four   institutions  and  content  from  all  member  libraries  is  discoverable  in  a  combined  catalog   (http://search.trln.org).  printed  material  requested  through  this  combined  catalog  is  often   delivered  between  trln  libraries  within  twenty-­‐four  hours.   at  unc,  our  search  logs  show  that  use  of  the  catalog  increases  as  we  add  new  capacity  and  content.   these  statistics  demonstrate  the  catalog’s  continuing  relevance  as  a  research  tool  that  adds  value   above  and  beyond  conventional  search  engines  and  general  web-­‐based  information  resources.  in   this  article  we  will  describe  the  most  important  enhancements  to  our  catalog,  include  data  from   search  logs  to  demonstrate  usage  changes  resulting  from  these  enhancements,  and  comment  on   potential  future  developments.   literature  review   an  extensive  literature  discusses  the  past  and  future  of  online  catalogs,  and  many  of  these   materials  themselves  include  detailed  literature  reviews.  in  fact,  there  are  so  many  studies,   reviews,  and  editorials,  it  becomes  clear  that  although  the  online  catalog  may  be  in  decline,  it   remains  a  subject  of  lively  interest  to  librarians.  two  important  threads  in  this  literature  report  on   user-­‐query  studies  and  on  other  usability  testing.  though  there  are  many  earlier  studies,  two   relatively  recent  articles  analyze  search  behavior  and  provide  selective  but  helpful  literature   surveys.5     there  are  many  efforts  to  define  directions  for  the  catalog  that  would  make  it  more  web-­‐like,  more   google-­‐like,  and  thus  more  often  chosen  for  search,  discovery,  and  access  by  library  patrons.   these  articles  aim  to  define  the  characteristics  of  the  ideal  catalog.  charles  hildreth  provides  a   benchmark  for  these  efforts  by  dividing  the  history  of  the  online  catalog  into  three  generations.   from  his  projections  of  a  third  generation  grew  the  “next  generation  catalog”—really  the  current   ideal.  he  called  for  improvement  of  the  second-­‐generation  catalog  through  an  enhanced  user-­‐ system  dialog,  automatic  correction  of  search-­‐term  spelling  and  format  errors,  automatic  search   aids,  enriched  subject  metadata  in  the  catalog  record  to  improve  search  results,  and  the   integration  of  periodical  indexes  in  the  catalog.  
as  new  technologies  have  made  it  possible  to   achieve  these  goals  in  new  ways,  much  of  what  hildreth  envisioned  has  been  accomplished.6       engine  of  innovation:  building  the  high-­‐performance  catalog  |  owen  and  michalak       doi:  10.6017/ital.v34i2.5702   8   second-­‐generation  catalogs,  anchored  firmly  in  integrated  library  systems,  operated  throughout   most  of  the  1980s  and  the  1990s  without  significant  improvement.  by  the  mid-­‐2000s  the  search   for  the  “next-­‐gen”  catalog  was  in  full  swing,  and  there  are  numerous  articles  that  articulate  the   components  of  an  improved  model.  the  catalog  crossed  a  generational  line  for  good  when  the   north  carolina  state  university  libraries  (ncsu)  launched  a  new  catalog  search  engine  and   interface  with  endeca  in  january  2006.  three  ncsu  authors  published  a  thorough  article   describing  key  catalog  improvements.  their  endeca-­‐enhanced  catalog  fulfilled  the  most  important   criteria  for  a  “next-­‐gen”  catalog:  improved  search  and  retrieval  through  “relevance-­‐ranked  results,   new  browse  capabilities,  and  improved  subject  access.”7     librarians  gradually  concluded  that  the  catalog  need  not  be  written  off  but  would  benefit  from   being  enhanced  and  aligned  with  search  engine  capabilities  and  other  web-­‐like  characteristics.   catalogs  should  contain  more  information  about  titles,  such  as  book  jackets  or  reviews,  than   conventional  bibliographic  records  offered.  catalog  search  should  be  understandable  and  easy  to   use.  additional  relevant  works  should  be  presented  to  the  user  along  with  result  sets.  the   experience  should  be  interactive  and  participatory  and  provide  access  to  a  broad  array  of   resources  such  as  data  and  other  nonbook  content.8     karen  markey,  one  of  the  most  prolific  online  catalog  authors  and  analysts,  writes,  “now  that  the   era  of  mass  digitization  has  begun,  we  have  a  second  chance  at  redesigning  the  online  library   catalog,  getting  it  right,  coaxing  back  old  users  and  attracting  new  ones.”9   marshall  breeding  predicted  characteristics  of  the  next-­‐generation  catalog.  his  list  includes   expanded  scope  of  search,  more  modern  interface  techniques,  such  as  a  single  point  of  entry,   search  result  ranking,  faceted  navigation,  and  “did  you  mean  .  .  .  ?”  capacity,  as  well  as  an  expanded   search  universe  that  includes  the  full  text  of  journal  articles  and  an  array  of  digitized  resources.10     a  concept  that  is  less  represented  in  the  literature  is  that  of  envisioning  the  catalog  as  a   framework  for  service,  although  the  idea  of  the  catalog  designed  to  ensure  customer  self-­‐service   has  been  raised.11  michael  j.  
michael j. bennett has studied the effect of catalog enhancements on circulation and interlibrary loan.12 service and the online catalog have a new meaning in morgan's idea of "services against texts," supporting "use and understand" in addition to the traditional "find and get."13 lorcan dempsey commented on the catalog as an identifiable service and predicted new formulations for library services based on the network-level orientation of search and discovery.14 but the idea that the catalog has moved from a fixed, inward-focused tool to an engine for services—a locus to be invested with everything from unmediated circulation renewal and ordering delivery to the "did you mean" search aid—has yet to be addressed comprehensively in the literature.

enhancing the traditional catalog

one of the factors that complicates discussions of the continued relevance of the library catalog to research is the very imprecision of the term in common parlance, especially when the chief point of comparison to today's ils-driven opacs is google or, more specifically, google scholar. from first-year writing assignments through advanced faculty research, many of the resources that our patrons seek are published in the periodical literature, and the library catalog, the one descended from the cabinets full of cards that occupied prominent real estate in our buildings, has never been an effective tool for identifying relevant periodical literature.

this situation has changed in recent years as products like summon, from proquest, and ebsco discovery service have introduced platforms that can accommodate electronic article indexing as well as marc records for the types of materials—books, audio, and video—that have long been discovered through the opac. in the following discussion of "catalog" developments and enhancements, we focus initially not on these integrated solutions, but on the catalog as more traditionally defined. however, as electronic resources become an ever-greater percentage of library collections, we shall see a convergence of these two streams that will portend significant changes in the nature and utility of the catalog.

much work has been done in the first decade of the twenty-first century to enhance discovery services and, as noted above, north carolina state university's introduction of their endeca-based search engine and interface was a significant game-changer. in the years following the introduction of the endeca interface at ncsu, the triangle research libraries network invested in further development of features that enhanced the utility of the endeca software itself. programmed enhancements to the interface provided additional services and functionality. in some cases, these enhancements were aimed at improving discovery. in others, they allowed researchers to make new and better use of the data that they found or made it easier to obtain the documents that they discovered.
faceting and limiting retrieval results

perhaps the most immediately striking innovation in the endeca interface was the introduction of facets. the use of faceted browsing allowed users to parse the bibliographic record in new ways (and more ways) than had preceding catalogs. there were several fundamentally important ways faceting enhanced search and discovery.

the first of these was the formal recognition that keyword searching was the user's default means of interacting with the catalog's data. ncsu's initial implementation allowed for searches using several indexes, including authors, titles, and subject headings, and this functionality remains in place to the present day. however, by default, searches returned records containing the search terms "anywhere" in the record. this behavior was more in line with user expectations in an information ecosystem dominated by google's single search box.

the second was the significantly different manner in which multiple limits could be placed on an initial result set from such a keyword search. the concept of limiting was not a new one: certain facets worked in a manner consistent with traditional limits in prior search interfaces, allowing users to screen results by language, or date of publication, for example.

it was the ease and transparency with which multiple limits could be applied through faceting that was revolutionary. a user who entered the keyword "java" in the search box was quickly able to discriminate between the programming language and the indonesian island. this could be achieved in multiple ways: by choosing between subjects (for example, "application software" vs. "history") or clearly labeled lc classification categories ("q – science" vs. "d – history"). these limits, or facets, could be toggled on and off, independently and iteratively.

the third and highly significant difference resulted from how library of congress subject headings (lcsh) were parsed and indexed in the system. by making lcsh subdivisions independent elements of the subject-heading index in a keyword search, the endeca implementation unlocked a trove of metadata that had been painstakingly curated by catalogers for nearly a century. the user no longer needed to be familiar with the formal structure of subject headings; if the keywords appeared anywhere in the string, the subdivisions in which they were contained could be surfaced and used as facets to sharpen the focus of the search. this was revolutionary.

utilizing the power of new indexing structures

the liberation of bibliographic data from the structure of marc record indexes presaged yet another far-reaching alteration in the content of library catalogs. to this day, most commercial integrated library systems depend on marc as the fundamental record structure. in ncsu's implementation, the multiple indexes built from that metadata created a new framework for information.
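the iterative narrowing described above can be illustrated in a few lines of code. the sketch below is not the endeca api; the record structure, field names, and sample data are invented purely to show how a keyword result set is progressively limited by toggling facet values such as subject terms or lc classes.

```python
# a minimal sketch of faceted narrowing over a keyword result set.
# the record shape and field names are illustrative, not the endeca data model.
from collections import Counter

records = [
    {"title": "learning java", "subjects": ["application software", "java (computer program language)"], "lc_class": "q - science"},
    {"title": "a history of java", "subjects": ["history", "java (indonesia)"], "lc_class": "d - history"},
    {"title": "java cookbook", "subjects": ["application software"], "lc_class": "q - science"},
]

def keyword_search(records, term):
    """default 'anywhere in the record' keyword match."""
    term = term.lower()
    return [r for r in records if term in str(r).lower()]

def facet_counts(results, field):
    """count candidate facet values (subjects, lc classes, and so on)."""
    values = []
    for r in results:
        v = r[field]
        values.extend(v if isinstance(v, list) else [v])
    return Counter(values)

def apply_facet(results, field, value):
    """toggle a limit on: keep only records carrying the chosen facet value."""
    def has(r):
        v = r[field]
        return value in v if isinstance(v, list) else v == value
    return [r for r in results if has(r)]

hits = keyword_search(records, "java")
print(facet_counts(hits, "lc_class"))                       # e.g. {'q - science': 2, 'd - history': 1}
programming = apply_facet(hits, "subjects", "application software")
print([r["title"] for r in programming])                    # narrows to the programming-language titles
```

it was this recasting of the record into multiple purpose-built indexes, rather than any one feature, that set the stage for the developments described next.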
this change made possible the integration of non-marc data with marc data, allowing, for example, dublin core (dc) records to be incorporated into the universe of metadata to be indexed, searched, and retrieved. there was no need to crosswalk dc to marc: it sufficed to simply assign the dc elements to the appropriate endeca indexes. with this capacity to integrate rich collections of locally described digital resources, the scope of the traditional catalog was enlarged.

expanding scopes and banishing silos

at unc-chapel hill, we began this process of augmentation with selected collections of digital objects. these collections were housed in a contentdm repository we had been building for several years at the time of the library's introduction of the endeca interface. image files, which had not been accessible through traditional catalogs, were among the first to be added. for example, we had been given a large collection of illustrated postcards featuring scenes of north carolina cities and towns. these postcards had been digitized and metadata describing the image and the town had been recorded. other collections of digitized historical photographs were also selected for inclusion in the catalog. these historical resources proved to be a boon to faculty teaching local history courses and, interestingly, to students working on digital projects for their classes. as class assignments came to include activities like creating maps enhanced by the addition of digital photographs or digitized newspaper clippings, the easy discovery of these formerly hidden collections enriched students' learning experience.

other special collection materials had been represented in the traditional catalog in somewhat limited fashion. the most common examples were manuscripts collections. the processing of these collections had always resulted in the creation of finding aids, produced since the 1930s using index cards and typewriters. during the last years of the twentieth century, archivists began migrating many of these finding aids to the web using the ead format, presenting them as simple html pages. these finding aids were accessible through the catalog by means of generalized marc records that described the collections at a superficial level. however, once we attained the ability to integrate the contents of the finding aids themselves into the indexes underlying the new interface, this much richer trove of keyword-searchable data vastly increased the discoverability and use of these collections.

during this period, the library also undertook systematic digitization of many of these manuscript collections. whenever staff received a request for duplication of an item from a manuscript collection (formerly photocopies, but by then primarily digital copies), we digitized the entire folder in which that item was housed. we developed standards for naming these digital surrogates that associated the individual image with the finding aid.
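that naming standard does the real work of association. a minimal sketch of the idea follows; the filename pattern is hypothetical rather than unc's actual convention, but it shows how a batch of scan filenames can be grouped back to the collection and folder entries of a finding aid.

```python
# a minimal sketch of grouping digital surrogates by finding-aid folder.
# the filename convention (collection_folder_image.jp2) is hypothetical.
import re
from collections import defaultdict

SURROGATE = re.compile(r"^(?P<collection>\d{5})_(?P<folder>folder_\d{4})_(?P<image>\d{3})\.jp2$")

def group_by_folder(filenames):
    """map (collection, folder) identifiers to their digitized images."""
    folders = defaultdict(list)
    for name in filenames:
        m = SURROGATE.match(name)
        if m:
            folders[(m.group("collection"), m.group("folder"))].append(name)
    return folders

scans = ["04045_folder_0012_001.jp2", "04045_folder_0012_002.jp2", "04045_folder_0031_001.jp2"]
print(group_by_folder(scans))
```

a small script, or the short javascript include described next, can then surface these groupings as links at the matching folder entries of the online finding aid.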
it then became a simple matter, involving the addition of a short javascript string to the head of the online finding aid, to dynamically link the digital objects to the finding aid itself.

other library collections likewise benefited from the new indexing structures. some uncataloged materials traditionally had minimal bibliographic control provided by inventories that were built at the time of accession in desktop database applications; funding constraints meant that full cataloging of these materials (often rare books) remained elusive. the ability to take the data that we had and blend it into the catalog enhanced the discovery of these collections as well.

we also have an extensive collection of video resources, including commercial and educational films. the conventions for cataloging these materials, held over from the days of catalog cards, often did not match user expectations for search and discovery. there were limits to the number of added entries that catalogers would make for directors, actors, and others associated with a film. many records lacked the kind of genre descriptors that undergraduates were likely to use when seeking a film for an evening's entertainment. to compensate for these limitations, staff who managed the collection had again developed local database applications that allowed for the inclusion of more extensive metadata and for categories such as country of origin or folksonomic genres that patrons frequently indicated were desirable access points. once again, the new indexing structures allowed us to incorporate this rich set of metadata into what looked like the traditional catalog.

each of the instances described above represents what we commonly call the destruction of silos. information about library collections that had been scattered in numerous locations—and not all of them online—was integrated into a single point of discovery. it was our hope and intention that such integration would drive more users to the catalog as a discovery tool for the library's diverse collections and not simply for the traditional monographic and serials collections that had been served by marc cataloging. usage logs indicate that the average number of searches conducted in the catalog rose from approximately 13,000 per day in 2009 to around 19,000 per day in 2013. it is impossible to tell with any certainty whether there was heavier use of the catalog simply because increasingly varied resources came to be represented in it, but we firmly believe that the experience of users who search for material in our catalog has become much richer as a result of these changes to its structure and content.

cooperation encouraging creativity

another way we were able to harness the power of endeca's indexing scheme involved the shared loading of bibliographic records for electronic resources to which multiple trln libraries provided access.
trln’s  endeca  indexes  are  built  from  the  records  of  each  member.  each   institution  has  a  “pipeline”  that  feeds  metadata  into  the  combined  trln  index.  duplicate  records   are  rolled  up  into  a  single  display  via  oclc  control  numbers  whenever  possible,  and  the   bibliographic  record  is  annotated  with  holdings  statements  for  the  appropriate  libraries.   we  quickly  realized  that  where  any  of  the  four  institutions  shared  electronic  access  to  materials,  it   was  redundant  to  load  copies  of  each  record  into  the  local  databases  of  each  institution.15  instead,   one  institution  could  take  responsibility  for  a  set  of  records  representing  shared  resources.   examples  of  such  material  include  electronic  government  documents  with  records  provided  by   the  marcive  documents  without  shelves  program,  large  sets  like  early  english  books  online,  and   pbs  videos  streamed  by  the  statewide  services  of  nc  live.   in  practice,  one  institution  takes  responsibility  for  loading,  editing,  and  performing  authority   control  on  a  given  set  of  records.  (for  example,  unc,  as  the  regional  depository,  manages  the   documents  without  shelves  record  set.)  these  records  are  loaded  with  a  special  flag  indicating   that  they  are  part  of  the  shared  records  program.  this  flag  generates  a  holdings  statement  that   reflects  the  availability  of  the  electronic  item  at  each  institution.  the  individual  holdings   statements  contain  the  institution-­‐specific  proxy  server  information  to  enable  and  expedite  access.   in  addition  to  this  distributed  model  of  record  loading  and  maintenance,  we  were  able  to  leverage   oai-­‐pmh  feeds  to  add  selected  resources  to  the  searchtrln  database.  all  four  institutions  have   access  to  the  data  made  available  by  the  inter-­‐university  consortium  for  political  and  social   research  (icpsr).  as  we  do  not  license  these  resources  or  maintain  them  locally,  and  as  records   provided  by  icpsr  can  change  over  time,  we  developed  a  mechanism  to  harvest  the  metadata  and   push  it  through  a  pipeline  directly  into  the  searchtrln  indexes.  none  of  the  member  libraries’   local  databases  house  this  metadata,  but  the  records  are  made  available  to  all  nonetheless.   while  we  were  engaged  in  implementing  these  enhancements,  additional  sources  of  potential   enrichment  of  the  catalog  were  appearing.  in  particular,  vendors  began  providing  indexing   services  for  the  vast  quantities  of  electronic  resources  contained  in  aggregator  databases.     information  technology  and  libraries  |  june  2015     13   additionally,  they  made  it  possible  for  patrons  to  move  seamlessly  from  the  catalog  to  those   electronic  resources  via  openurl  technologies.  indeed,  services  like  proquest’s  summon  or   ebsco’s  discovery  service  might  be  taken  as  another  step  toward  challenging  the  catalog’s   primacy  as  a  discovery  tool  as  they  offered  the  prospect  of  making  local  catalog  records  just  a   fraction  of  a  much  larger  universe  of  bibliographic  information  available  in  a  single,  keyword-­‐ searchable  database.   
it remains to be seen, therefore, whether continuing to load many kinds of marc records into the local database is an effective aid to discovery even with the multiple delimiting capabilities that endeca provides. what is certain, however, is that our approach to indexing resources of any kind has undergone a radical transformation over the past few years—a transformation that goes beyond the introduction of any of the particular changes we have discussed so far.

promoting a culture of innovation

one important way endeca has changed our libraries is that a culture of constant innovation has become the norm, rather than the exception, for our catalog interface and content. once we were no longer subject to the customary cycle of submitting enhancement requests to an integrated library system vendor, hoping that fellow customers shared similar desires, and waiting for a response and, if we were lucky, implementation, we were able to take control of our aspirations. we had the future of the interface to our collections in our own hands, and within a few years of the introduction of endeca by ncsu, we were routinely adding new features to enhance its functionality.

one of the first of these enhancements was the addition of a "type-ahead" or "auto-suggest" option.16 inspired by google's autocomplete feature, this service suggests phrases that might match the keywords a patron is typing into the search box. ben pennell, one of the chief programmers working on endeca enhancement at unc-chapel hill, built a solr index from the ils author, title, and subject indexes and from a log of recent searches. as a patron typed, a drop-down box appeared below the search box. the drop-down contained matching terms extracted from the solr index in a matter of seconds or less. for example, typing the letters "bein" into the box produced a list including "being john malkovich," "nature—effects of human beings on," "human beings," and "bein, alex, 1903–1988." in the drop-down display, the letters matching what the patron has typed are highlighted in a different color. in the case of terms drawn directly from an index, the index name appears, also highlighted, on the right side of the box. for example, the second and third terms in the examples above are tagged with the term "subject." the last example is an "author."

in allowing for the textual mining of lcsh, the initial implementation of faceting in the endeca catalog surfaced those headings for the patron by uniting keyword and controlled vocabularies in an unprecedented manner. there was a remarkable and almost immediate increase in the number of authority index searches entered into the system. at the end of the fall semester prior to the implementation of the auto-suggest feature, an average of around 1,400 subject searches were done in a week.
approximately one month into the spring semester, that average had risen to around 4,000 subject searches per week. use of the author and title indexes also rose, although not quite as dramatically. in the perpetual tug-of-war between precision and recall, the balance had decidedly shifted.

another service that we provide, which is especially popular with students, is the ability to produce citations formatted in one of several commonly used bibliographic styles, including apa, mla, and chicago (both author-date and note-and-bibliography formats). this functionality, first introduced by ncsu and then jointly developed with unc over the years that followed, works in two ways. if a patron finds a monographic title in the catalog, simply clicking on a link labeled "cite" produces a properly formatted citation that can then be copied and pasted into a document. the underlying technology also powers a "citation builder" function by which a patron can enter basic bibliographic information for a book, a chapter or essay, a newspaper or journal article, or a website into a form, click the "submit" button, and receive a citation in the desired format.

an additional example of innovation that falls somewhat outside the scope of the changes discussed above was the development of a system that allowed for the mapping of simplified chinese characters to their traditional counterparts. searching in non-roman character sets has always offered a host of challenges to library catalog users. the trln libraries have embraced the potential of endeca to reduce some of these challenges, particularly for chinese, through the development of better keyword searching strategies and the automatic translation of simplified to traditional characters.

since we had complete control over the endeca interface, it proved relatively simple to integrate document delivery services directly into the functionality of the catalog. rather than simply emailing a bibliographic citation or a call number to themselves, patrons could request the delivery of library materials directly to their campus addresses. once we had implemented this feature, we quickly moved to amplify its power. many catalogs offer a "shopping cart" service that allows patrons to compile lists of titles. one variation on this concept that we believe is unique to our library is the ability for a professor to compile such a list of materials held by the libraries on campus and submit that list directly to the reserve reading department, where the books are pulled from the shelves and placed on course-reserve lists without the professor needing to visit any particular library branch. these new features, in combination with other service enhancements such as the delivery of physical documents to campus addresses from our on-campus libraries and our remote storage facility, have increased the usefulness of the catalog as well as our users' satisfaction with the library.
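the simplified-to-traditional mapping just mentioned is, at its core, a character-level substitution applied to the query before it is searched. the sketch below illustrates the idea with a three-character table; a production mapping is far larger, must handle characters whose traditional form depends on context, and nothing here reflects the trln implementation itself.

```python
# a minimal sketch of expanding a simplified-chinese query with its traditional form.
# the mapping table is only illustrative; a real one is much larger and sometimes one-to-many.
SIMPLIFIED_TO_TRADITIONAL = {
    "图": "圖",
    "书": "書",
    "馆": "館",
}

def to_traditional(query):
    return "".join(SIMPLIFIED_TO_TRADITIONAL.get(ch, ch) for ch in query)

def expanded_queries(query):
    """search both forms so either cataloging practice is matched."""
    traditional = to_traditional(query)
    return [query] if traditional == query else [query, traditional]

print(expanded_queries("图书馆"))  # ['图书馆', '圖書館']
```

small affordances of this kind, accumulated release after release, lie behind the usage and satisfaction gains described above.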
we believe that these changes have contributed to the ongoing vitality of the catalog and to its continued importance to our community.

in december 2012, the libraries adopted proquest's summon to provide enhanced access to article literature and electronic resources more generally. at the start of the following fall semester, the libraries instituted another major change to our discovery and delivery services through a combined single-search box on our home page. this has fundamentally altered how patrons interact with our catalog and its associated resources. first, because we are now searching both the catalog and the summon index, the type-ahead feature that we had deployed to suggest index terms from our local database to users as they entered search strings no longer functions as an authority index search. we have returned to querying both databases through a simple keyword search.

second, in our implementation of the single search interface we have chosen to present the results from our local database and the retrievals from summon in two side-by-side columns. this has the advantage of bringing article literature and other resources indexed by summon directly to the patron's attention. as a result, more patrons interact directly with articles, as well as with books in major digital repositories like google books and hathitrust. this change has undoubtedly led patrons to make less in-depth use of the local catalog database, although it preserves much of the added functionality in terms of discovering our own digital collections as well as those resources whose cataloging we share with our trln partners. we believe that the ease of access to the resources indexed by summon complements the enhancements we have made to our local catalog.

conclusion and further directions

one might argue that the integration of electronic resources into the "catalog" actually shifts the paradigm more significantly than any previous enhancements. as the literature review indicates, much of the conversation about enriching library catalogs has centered on improving the means by which search and discovery are conducted. the reasonably direct linking to full text that is now possible has once again radically shifted that conversation, for the catalog has come to be seen not simply as a discovery platform based on metadata but as an integrated system for delivering the essential information resources for which users are searching.

once the catalog is understood to be a locus for delivering content in addition to discovering it, the local information ecosystem can be fundamentally altered. at unc-chapel hill we have engaged in a process whereby the catalog, central to the library's web presence (given the prominence of the single search box on the home page), has become a hub from which many other services are delivered.
the most obvious of these, perhaps, is a system for the delivery of physical documents that is analogous to the ability to retrieve the full text of electronic documents. if an information source is discovered that exists in the library only in physical form, enhancements to the display of the catalog record facilitate the receipt by the user of the print book or a scanned copy of an article from a bound journal in the stacks.

in 2013, ithaka s+r conducted a local unc faculty survey. the survey posed three questions related to the catalog. in response to the question, "typically when you are conducting academic research, which of these four starting points do you use to begin locating information for your research?," 41 percent chose "a specific electronic research resource/computer database." nearly one-third (30 percent) chose "your online library catalog."17

when asked, "when you try to locate a specific piece of secondary scholarly literature that you already know about but do not have in hand, how do you most often begin your process?," 41 percent chose the library's website or online catalog, and 40 percent chose "search on a specific scholarly database or search engine." in response to the question, "how important is it that the library . . . serves as a starting point or 'gateway' for locating information for my research?," 78 percent answered extremely important.

on several questions, ithaka provided the scores for an aggregation of unc's peer libraries. for the first question (the starting point for locating information), 18 percent of national peers chose the online catalog compared to 30 percent at unc. on the importance of the library as gateway, 61 percent of national peers answered very important compared to the 78 percent at unc.

in 2014, the unc libraries were among a handful of academic research libraries that implemented a new ithaka student survey. though we don't have national benchmarks, we can compare our own student and faculty responses. among graduate students, 31 percent chose the online catalog as the starting point for their research, similar to the faculty.18 of the undergraduate students, 33 percent chose the library's website, which provides access to the catalog through a single search box.19

a finding that approximately a third of students began their search on the unc library website was gratifying. oclc's perceptions of libraries 2010 reported survey results regarding where people start their information searches. in 2005, 1 percent said they started on a library website; in 2010, not a single respondent indicated doing so.20

the gross disparity between the oclc reports and the ithaka surveys of our faculty and students requires some explanation.
the libraries at the university of north carolina at chapel hill are proud of a long tradition of ardent and vocal support from the faculty, and we are not surprised to learn that students share their loyalty. for us, the recently completed ithaka surveys point out directions for further investigation into our patrons' use of our catalog and why they feel it is so critical to their research.

anecdotal reports indicate that one of the most highly valued services that the libraries provide is delivery of physical materials to campus addresses. some faculty admit with a certain degree of diffidence that our services have made it almost unnecessary to set foot in our buildings; that is a trend that has also been echoed in conversations with our peers. yet the online presence of the library and its collections continues to be of significant importance—perhaps precisely because it offers an effective gateway to a wide range of materials and services.

we believe that the radical redesign of the online public access catalog initiated by north carolina state university in 2006 marked a sea change in interface design and discovery services for that venerable library service. without a doubt, continued innovation has enhanced discovery. however, we have come to realize that discovery is only one function that the online catalog can and should serve today. equally if not more important is the delivery of information to the patron's home or office. the integration of discovery and delivery is what sets the "next-gen" catalog apart from its predecessors, and we must strive to keep that orientation in mind, not only as we continue to enhance the catalog and its services, but as we ponder the role of the library as place in the coming years. far from being in decline, the online catalog continues to be an "engine of innovation" (to borrow a phrase from holden thorp, former chancellor of unc-chapel hill) and a source of new challenges for our libraries and our profession.

references

1. cathy de rosa et al., perceptions of libraries and information resources: a report to the oclc membership (dublin, oh: oclc online computer library center, 2005), 1–17, https://www.oclc.org/en-us/reports/2005perceptions.html.

2. karen calhoun, the changing nature of the catalog and its integration with other discovery tools, final report, prepared for the library of congress (ithaca, ny: k. calhoun, 2006), 5, http://www.loc.gov/catdir/calhoun-report-final.pdf.

3. roger c. schonfeld and kevin m. guthrie, "the changing information services needs of faculty," educause review 42, no. 4 (july/august 2007): 8, http://www.educause.edu/ero/article/changing-information-services-needs-faculty.
4. ross housewright and roger schonfeld, ithaka's 2006 studies of key stakeholders in the digital transformation in higher education (new york: ithaka s+r, 2008), 6, http://www.sr.ithaka.org/sites/default/files/reports/ithakas_2006_studies_stakeholders_digital_transformation_higher_education.pdf.

5. xi niu and bradley m. hemminger, "beyond text querying and ranking list: how people are searching through faceted catalogs in two library environments," proceedings of the american society for information science & technology 47, no. 1 (2010): 1–9, http://dx.doi.org/10.1002/meet.14504701294; and cory lown, tito sierra, and josh boyer, "how users search the library from a single search box," college & research libraries 74, no. 3 (2013): 227–41, http://crl.acrl.org/content/74/3/227.full.pdf.

6. charles r. hildreth, "beyond boolean; designing the next generation of online catalogs," library trends (spring 1987): 647–67, http://hdl.handle.net/2142/7500.

7. kristen antelman, emily lynema, and andrew k. pace, "toward a twenty-first century library catalog," information technology and libraries 25, no. 3 (2006): 129, http://dx.doi.org/10.6017/ital.v25i3.3342.

8. karen coyle, "the library catalog: some possible futures," journal of academic librarianship 33, no. 3 (2007): 415–16, http://dx.doi.org/10.1016/j.acalib.2007.03.001.

9. karen markey, "the online library catalog: paradise lost and paradise regained?" d-lib magazine 13, no. 1/2 (2007): 2, http://dx.doi.org/10.1045/january2007-markey.

10. marshall breeding, "next-gen library catalogs," library technology reports (july/august 2007): 10–13.

11. jia mi and cathy weng, "revitalizing the library opac: interface, searching, and display challenges," information technology and libraries 27, no. 1 (2008): 17–18, http://dx.doi.org/10.6017/ital.v27i1.3259.

12. michael j. bennett, "opac design enhancements and their effects on circulation and resource sharing within the library consortium environment," information technology and libraries 26, no. 1 (2007): 36–46, http://dx.doi.org/10.6017/ital.v26i1.3287.

13. eric lease morgan, "use and understand; the inclusion of services against texts in library catalogs and discovery systems," library hi tech (2012): 35–59, http://dx.doi.org/10.1108/07378831211213201.

14. lorcan dempsey, "thirteen ways of looking at libraries, discovery, and the catalog: scale, workflow, attention," educause review online (december 10, 2012), http://www.educause.edu/ero/article/thirteen-ways-looking-libraries-discovery-and-catalog-scale-workflow-attention.

15. charles pennell, natalie sommerville, and derek a. rodriguez, "shared resources, shared records: letting go of local metadata hosting within a consortium environment," library resources & technical services 57, no. 4 (2013): 227–38, http://journals.ala.org/lrts/article/view/5586.
16. benjamin pennell and jill sexton, "implementing a real-time suggestion service in a library discovery layer," code4lib journal 10 (2010), http://journal.code4lib.org/articles/3022.

17. ithaka s+r, unc chapel hill faculty survey: report of findings (unpublished report to the university of north carolina at chapel hill, 2013), questions 20, 21, 33.

18. ithaka s+r, unc chapel hill graduate student survey: report of findings (unpublished report to the university of north carolina at chapel hill, 2014), 47.

19. ithaka s+r, unc chapel hill undergraduate student survey: report of findings (unpublished report to the university of north carolina at chapel hill, 2014), 39.

20. cathy de rosa et al., perceptions of libraries, 2010: context and community: a report to the oclc membership (dublin, oh: oclc online computer library center, 2011), 32, http://oclc.org/content/dam/oclc/reports/2010perceptions/2010perceptions_all.pdf.

a technology-dependent information literacy model within the confines of a limited resources environment

ibrahim abunadi

ibrahim abunadi (i.abunadi@gmail.com) is an assistant professor, college of computer and information sciences, prince sultan university, riyadh, saudi arabia.

abstract

the purpose of this paper is to investigate information literacy as an increasingly evolving trend in computer education. a quantitative research design was implemented, and a longitudinal case study methodology was conducted to measure tendencies in information literacy skill development and to develop a practical information literacy model. it was found that both students and educators believe that the combination of information literacy with a learning management system is more effective in increasing information literacy and research skills where information resources are limited. based on the quantitative study, a practical, technology-dependent information literacy model was developed and tested in a case study, resulting in fostering the information literacy skills of students who majored in information systems. these results are especially important in smaller universities with libraries having limited technology capabilities, located in developing countries.

introduction

many different challenges arise during a graduate's career. moreover, professional life can involve numerous situations and problems that university students are not prepared for during their college studies.1 the use of internet sources to find solutions to real problems depends on students'/graduates' information literacy skills.2 a strong aid to students' learning is the ability to search, analyze, and apply knowledge from different sources, including literature, databases, and the internet.3 one of the issues students face concerning technology is its continuous evolution. although students learn survival skills in their professional lives, they also require special coping skills. a skill that should be considered for all technology-related courses is information literacy. lin defines information literacy as a "set of abilities, skills, competencies, or fluencies, which enable people to access and utilize information resources."4 these are part of the lifelong learning skills of students, which put the power of continuous education in their hands.
another issue is the exclusive allocation of the responsibility for information literacy skill development in smaller educational institutes to librarians or to instructors who majored in library science.5 this paper has taken another approach to information literacy skill development whereby specialized educators, such as capable information systems faculty members, facilitate this skill development. a learning management system (lms) is a widely used form of technology for course delivery and the organization of subject material. blackboard, desire2learn, sakai, moodle, and angel, as common lms platforms, provide an integrated guidance system to deliver and analyze learning. these systems can be used to support information literacy instruction. standard features include assignments and quizzes, while other systems offer tools that allow students to view and comment on other students' portfolios or work, depending on the lms's features.6 before the 1990s, face-to-face learning was common within the educational domain. however, the lms emerged in the twenty-first century as the internet became a suitable alternative to traditional learning. moodle, an open-source lms, is an acronym that stands for "modular object-oriented dynamic learning environment." this online education system is intended to make learning available with the necessary guidance for educators. web services available through moodle are based on a well-organized structural outline, and they are widely used to perform educational tasks and to analyze statistics helpful to instructors.7

peter et al. (2015) presented an approach related to information literacy instruction in universities and colleges that combines traditional classroom instruction and online learning; this is known as "blended learning."8 this involves only one seminar in the classroom; thus, it can replace traditional sessions at universities and colleges with education involving information literacy instruction. it has been recommended that a time-efficient method should be adopted by augmenting classroom seminars and literacy instruction through the addition of online materials. however, the findings of this study showed that students who only use online materials do not show greater progress in their learning than those who follow the blended approach. another study, by jackson, examined how educational services could be integrated more effectively into learning management systems and library resources.9 jackson suggested that better implementation was required, and recommended using the blackboard lms to include information literacy and scaffolding activities in subject-specific courses.

this study intends to determine the most effective method of information literacy education. it evaluates instructors' and students' perceptions of the effectiveness of traditional teaching in comparison to electronic teaching in information literacy. in this study, a quantitative research investigation was conducted with participants. a research model and questionnaire were developed for this purpose with three underlying latent variables. the participants were asked to describe their understanding of learning systems and their preferences in information literacy education.
their requirements varied with their continuing education levels and past educational activities, based on which software or website appeared to be more supportive and compatible with them.10 this study considered the research results, developed an information literacy intervention model, and applied it to a case study.

literature review

previously, educational institutions were limited to face-to-face teaching techniques or classroom-based teaching. face-to-face teaching is the traditional method still used in most educational institutions. in classrooms, the subject is explained, and books or other paper-based materials are read out of class to enhance understanding.11 face-to-face learning or teaching is limited by the number of physical resources available. therefore, it becomes difficult to accommodate the widespread interest in information literacy through face-to-face learning.12 gathering information using only physical resources can lead to information deficiencies.13 education has evolved to benefit from advances in technologies by using lms and online sources. the effective usage of lms and online sources requires the development of information literacy.

information literacy

information literacy includes technological literacy, information ethics, online library skills, and critical literacy.14 technological literacy is defined as the ability to use common software, hardware, and internet tools to reach a specific goal. this ability is an important component of information literacy that enables a graduate to seek answers by using the internet and digital resources.15 hauptman defines information ethics as "the production, dissemination, storage, retrieval, security, and application of information within an ethical context."16 this skill is essential to preserve the original rights of researchers cited in a study, based on the ethical standards of the graduate conducting the study. another important component of information literacy comprises online library skills, which can be defined as the ability to use online digital sources, including digital libraries, to effectively seek different knowledge resources by using search engines, correctly locating required information, and using online support when needed.17 critical literacy is a thorough evaluation of online material that allows for the appropriate conclusion to be reached on the suitability of the material for the required investigation.18 seeking answers from appropriate sources is important to allow graduates to find and report on accurate and valid data. these components of information literacy enable information extraction from topics related to the desired course or field of research. students, professors, instructors, employees, learners, and educational policy administrators are the major knowledge seekers who use information literacy skills.19

with improved online resources available for learning, many learning requirements are moving toward providing services that are exclusively online.20 gray and montgomery studied an online information literacy course.21 they found that teaching with the aid of information literacy is helpful for students in obtaining improved instruction. the authors also compared an online information literacy course and face-to-face instruction, focusing primarily on the behaviors and attitudes of teachers and college students toward the online course.
the students agreed that the application of information literacy techniques would be particularly helpful to them in clarifying their understanding of complicated instructions. the teachers also indicated that an information literacy course would result in better regulation of academic processes than face-to-face learning. dimopoulos et al. (2013) measured student performance within an online learning environment, finding that the online learning environment has direct relevance for the completion of challenging tasks within academic settings.22 the findings further indicated that an lms could improve teaching activities. as an lms, moodle was also helpful for students to ensure their development of collaborative problem-solving skills. they concluded that moodle includes different useful modules, such as digital resource repositories, interactive wikis, and external add-in tools, that have been related to student learning when incorporated into the lms environment, resulting in better performance. hernández-garcía and conde-gonzález focused on learning analytics tools within engineering education, noting that engineering students are more likely to understand complicated concepts better; the application of the information literacy model therefore resulted in better performance.23 further, educating students about information sources was found to be helpful for the instructors in enhancing the students' learning by improving their online information retrieval skills. this study indicated that students can develop their learning traits more effectively through online learning than through face-to-face learning.

many researchers in this area have developed models that are only theoretical.24 however, this paper develops a practical information literacy model that can be tested for improvement in information literacy skills. this is especially relevant for computer and information systems courses, which can sometimes fall outside the purview of library-related training or education in universities with limited resources. the inclusion of information literacy training within computer and information systems courses is not regularly done in the information literacy field.25 additionally, although some information literacy interventions have been implemented practically in research, no other study has developed a practical information literacy model based on educators' and students' information literacy dispositions as well as both information literacy theory and practice.26

moodle as an lms

moodle is a useful and accommodating open-source platform with a stable structure of website services that allows instructors and learners to implement a range of helpful plugins. it can be used as a lively online education community and an enhancement to the face-to-face learning process.27 moodle is used in around 190 countries and offers its services in over seventy languages. it acts as a mediator for instruction and is widely adopted in many institutions. moodle provides services such as assignments, wikis, messaging, blogs, quizzes, and databases.28 it can provide a more flexible teaching platform than traditional teaching. health science educational service providers facilitate self-assurance in their learners. several educational campuses operate by using face-to-face learning strategies, whereby learners obtain their training at on-campus locations.
the objective of moodle is to enable the education of learners through internet access.29 xing focused on the broad application of the moodle lms for developing educational technology within academic settings, suggesting that academic organizations should promote technology as a solution for common problems with students' learning processes.30 such suggestions have been supported by costa et al. (2012) who found that moodle is significantly helpful for developing an e-learning platform for students. they emphasized that engineering universities must use the moodle lms to provide students with extensive technical knowledge.31 costello et al. (2012) stated that moodle, if used, will significantly help students improve their skills and knowledge effectively.32

methodology

in information literacy skill development, there are studies that support using only face-to-face education or only using an lms. for example, churkovich and oughtred found that face-to-face learning leads to better results in information literacy tutorials than online learning.33 at the same time, anderson and may concluded that the use of an lms is viewed by students as a better method than face-to-face instruction in information literacy.34 to test which educational pedagogy (traditional or technology) is better regarding information literacy, the following two hypotheses were posited:

h1: face-to-face learning has a significantly positive influence on information literacy disposition.

h2: moodle learning has a significantly positive influence on information literacy disposition.

to provide a better understanding of the most effective method of information literacy instruction, a quantitative research design was used. the wording of the questionnaire items (shown in table 1) was inspired by the studies of ng, horvat et al., abdullah, and deng and tavares.35 online questionnaires were prepared and distributed to students, teachers, trainers, and professors as well as administrative departments in a small private university located in the arabian gulf region. initially, a pilot study was conducted to test the instrument. this pilot study involved forty-nine participants and fifteen questions on information literacy. it also included demographic questions.

variables / code / item wording

face-to-face education disposition (fed)
fed1  information literacy skills are polished through face-to-face learning
fed2  face-to-face learning accommodates information literacy requirements
fed3  face-to-face learning is easier than learning management systems
fed4  face-to-face learning is better than learning management systems

moodle usage disposition (mud)
mud1  moodle is more easily accessible than other online resources
mud2  moodle is an effective web server for information literacy
mud3  moodle is more reliable than other online resources
mud4  moodle enables the provision of an extensive amount of useful information
mud5  moodle is used to overcome language, understanding, and communication gaps

information literacy preference (il)
il1  students and teachers prefer online resources
il2  inauthentic websites are helpful for students and teachers
il3  authentic websites are useful for students and teachers
il4  students and teachers prefer published articles, journals, and books
il5  online learning is more effective
il6  information is essential for individuals' knowledge

table 1. item coding.
after the pilot study, a full-scale study was conducted, in which the participants were students, professors, and educational administrators. an online questionnaire was sent to the management of an academic institution in the arabian gulf region to assess the instruction methodology to improve students' information literacy skills. the language used in the survey was arabic, and the questionnaire was translated into english for this article by a professional translator. a total of five hundred questionnaires were sent, and 398 of them were received with complete responses. the following criteria were used to filter questionnaires that were not appropriate for this study:

inclusion criteria
• people currently involved in the education system.
• students, teachers, or members of an academic department.
• people who understand information literacy. a question was added in the survey about whether the participant was familiar with information literacy; if not, the participant was removed from the sample.

exclusion criteria
• people who were not involved in the education system.
• people who were not aware of online learning systems.
• staff with no role in learning or teaching.

gender          frequency   percent
male            186         46.73
female          212         53.27
total           398         100

qualification   frequency   percent
undergraduate   181         45.48
graduate        98          24.62
masters         119         29.90
total           398         100

designation     frequency   percent
student         216         54.27
instructor      90          22.61
administrator   92          23.12
total           398         100

table 2. demographic information.

question   agree   neutral   disagree   don't know
face-to-face education disposition (fed)
fed1       46.8    22.8      21.3       9.1
fed2       10      74.5      14.2       1.3
fed3       1.5     12.8      75.8       9.9
fed4       32      30        26         12
information literacy preference (il)
il1        38.8    21.3      1.5        38.4
il2        0.3     1         98.7       -
il3        15      31        53.3       -
il4        49.5    30        13.0       7.5
il5        48      29.8      -          22.2
il6        74      11.5      1.8        12.7

table 3. questionnaire response distribution for fed and il.

question   yes     no
moodle usage disposition (mud)
mud1       65      35
mud2       73.3    26.8
mud3       67      33
mud4       66      34
mud5       63.7    36.3

table 4. responses to mud.

the reliability statistics showed a high level of consistency for the pilot test because the cronbach's alpha for the fifteen items was 0.901, which is above the recommended level of 0.7.36 cronbach's alpha is a widely used coefficient measuring the internal consistency of items as a unified group.37
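the alpha reported above can be reproduced directly from a response matrix. the sketch below shows the standard computation on made-up data; the example matrix and the 0.7 benchmark comment are illustrative, not the study's actual responses.

```python
# a minimal sketch of cronbach's alpha on an items-by-respondents matrix.
# rows are respondents, columns are questionnaire items; the data are invented.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

example = np.array([
    [4, 5, 4, 4, 5],
    [3, 3, 4, 3, 3],
    [5, 5, 5, 4, 5],
    [2, 3, 2, 3, 2],
], dtype=float)
print(round(cronbach_alpha(example), 3))  # values above 0.7 are conventionally acceptable
```

the same computation, applied to the full-scale data, underlies the reliability check reported in the next section.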
results

the research hypotheses were tested using structural equation modeling (sem) with the analysis of moment structures (amos) software. sem comprises statistical methods and computer algorithms that assess latent variables along with observed variables. sem also indicates the relationships among latent variables, showing the effects of the independent variables on the dependent variables.39 one well-regarded sem implementation is amos, a multivariate technique that can concurrently assess the relationships between latent variables and their corresponding indicators (the measurement model) as well as the relationships among the model's variables.40 highly cited information systems and statistics guidelines were followed for the sem to ensure the validity and reliability of the data analysis.41

measurement and structural model

the measurement model contained fifteen items representing three latent variables: face-to-face education disposition, moodle usage disposition, and information literacy preference. before proceeding to this analysis, the data need to show normality so that the robustness of this parametric sem can be trusted. curran et al. suggested absolute skewness below 2 and absolute kurtosis below 7 as indicating acceptable normality.42 all items' absolute values of skewness and kurtosis were below these cutoffs, showing a suitable level of normality for conducting the sem analysis. the overall measurement model showed a high level of fit: gfi = 0.99, agfi = 0.98, nfi = 0.98, cmin/df = 0.86, and rmr = 0.39. the first three indices indicate that the theoretical model fits the empirical data well when they are above 0.95; cmin/df and rmr follow different cutoffs, with cmin/df expected to be less than 3 and rmr less than 0.5.43 table 5 shows that all items loaded on their corresponding latent variables above the suggested cutoff (0.5), except il6, which did not load clearly on its latent variable and was therefore dropped from further analysis.44 an additional check was the significance of the item loadings, which were all significant at the 0.001 level, indicating that all retained items loaded on their latent variables.45 the indices of the measurement model therefore suggested that the instrument's psychometric properties were adequate and that the analysis could proceed to the structural model.

table 5. item loadings.
face-to-face education disposition (fed): fed4 0.71, fed3 0.52, fed2 0.66, fed1 0.89
moodle usage disposition (mud): mud5 0.93, mud4 0.92, mud3 0.92, mud2 0.73, mud1 0.93
information literacy preference disposition (il): il6 0.32, il5 0.91, il4 0.72, il3 0.86, il2 0.81, il1 0.83

the next step was to assess the structural model, which was used to evaluate the hypothesized relations between the independent variables (face-to-face education disposition [fed] and moodle usage disposition [mud]) and the dependent variable (information literacy preference [il]). both education methods were tested in the hypotheses to identify the most suitable information literacy delivery mode for students. both hypotheses were supported, which indicates that no single method of information literacy delivery (face-to-face instruction or lms) is exclusively preferred, and a combined model can be suggested. both hypotheses were supported at the 0.001 level, with an effect size for face-to-face education disposition of 0.32, indicating a medium impact on information literacy preferences, while moodle usage disposition had an effect size of 0.70, which is considered large (hair et al. 2010).
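the same measurement-plus-structural analysis can be sketched in code. the sketch below is illustrative only: the study was run in amos, not python, and the data file, column names, and the semopy package (with its lavaan-style model syntax) are assumptions of this example rather than part of the article.

```python
# illustrative re-expression of the reported analysis; assumes a csv of item-level
# responses with columns named as in table 1 and the semopy package installed.
import pandas as pd
from scipy.stats import skew, kurtosis
import semopy

df = pd.read_csv("responses.csv")  # hypothetical data file

# normality screen per curran et al.: |skewness| < 2 and |kurtosis| < 7
for col in df.columns:
    print(col, round(skew(df[col]), 2), round(kurtosis(df[col]), 2))

# measurement model (=~) and structural regression (~) for h1 and h2;
# il6 is omitted because of its low loading in the reported results
model_desc = """
FED =~ fed1 + fed2 + fed3 + fed4
MUD =~ mud1 + mud2 + mud3 + mud4 + mud5
IL =~ il1 + il2 + il3 + il4 + il5
IL ~ FED + MUD
"""

model = semopy.Model(model_desc)
model.fit(df)
print(model.inspect())           # loadings, path coefficients, p-values
print(semopy.calc_stats(model))  # fit statistics comparable to gfi, cmin/df, rmr
```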
finally, the model's explanatory power for information literacy preferences was determined by r2, which was high (0.85). based on this analysis, it can be said that a single method of information literacy delivery is insufficient in developing countries. thus, a different model for information literacy was developed (figure 1), whose impact on students' related competencies is examined below.

figure 1. information literacy intervention model.

as shown in figure 1, the model includes conducting weekly information literacy sessions that focus on educating students about technological literacy, information ethics, online library skills, and critical literacy. after each session is concluded, the instructor creates weekly assignments using an lms that test the students' information literacy abilities regarding the subject material. the instructor follows up on the students' overall performance and fills any identified gaps in subsequent information literacy sessions and assignments. after one month the instructor reviews the students' performance and provides feedback. finally, a "real case project assignment" is used to teach students to solve real problems using the skills they have learned. the instructor can further extend reflection on the process of grading the "real case project" by creating a course exit survey that asks students about their acquired level of information literacy skills.

longitudinal case study

a small technical university in the arabian gulf region faces difficulties in providing adequate library resources to its students because of its limited capabilities. the university has about 4,500 students and five hundred employees. the university library and the information technology department lack adequate staff and resources, resulting in insufficient support for student learning. this has caused a lack of information literacy education, which is evident in the assignments students submit. for example, students are not accustomed to citing the materials used in their assessments, so these undergraduates are viewed suspiciously by their educators when they use online materials. not knowing how to paraphrase and then cite relevant online materials costs students learning opportunities. information literacy is a skill that should be considered for all technology-related courses.46 the outcomes of this course will be used to improve the education of students and place the power of learning in their hands.47 therefore, the objective of this case study is to determine the influence of information literacy practices on improving student performance in solving organizational problems, especially when technology and library resources are scarce. this longitudinal case study was conducted over two semesters: the first was conducted traditionally without the information literacy intervention model, whereas in the second semester the intervention model was introduced. finally, the performance and opinions of students in the two semesters were compared using a case study assignment and a course exit survey. the information literacy intervention model was implemented by providing a series of practical tutorials at the beginning of the semester showing students how to use information from the internet.
then, the students applied the information and used information literacy skills to solve weekly assessments for an enterprise-architecture (ea) course. this course is taught in the information systems program at a private university, and students enrolling in it are in their second year or higher. the information literacy assessments require students to search for reliable sources of information and to cite and reference them. this builds the habit of critically examining sources of information and of grasping, analyzing, and using these sources to solve problems. the technology-supported information literacy pedagogy was followed to improve students' knowledge of how to learn.48 the students were educated through a series of classes on how to use the university's databases, e-books, and internet resources to solve real-life organizational problems and to apply concepts in different situations, as shown in figure 1. the students were given ten small assessments in the moodle lms, in each of which a concept taught in class needed to be applied after students searched for it and learned more about it from different sources. this included looking in the correct places for reliable resources, online scholarly databases, and online videos that could be of use. then, students were taught how to critically examine resources and determine which of them could be considered reliable. for example, students were shown that highly cited papers are more reliable than less cited papers and that online videos from professional organizations (e.g., ibm or gartner) are more reliable than personal videos. students were also taught how to use in-text citations and how to create reference lists. in the last quarter of the semester, a case study assignment was provided with real-life problems that students were required to solve using different sources, including the internet. the performance of semester-1 students (no intervention) was compared with that of semester-2 students (information literacy intervention) taking the same course, with an improvement in grades considered an indicator of success. the comparison point was a major project that required students to solve real-life organizational problems and demanded greater information literacy. some of the ea concepts taught in the class required practice to apply. for example, the as-is organizational modeling that is needed before implementing ea would be difficult to understand unless students actually conducted modeling of selected organizations; doing so enabled students to understand how the concepts related to the real world. the concepts focused on were related to business tools in information systems (e.g., business process management and requirements elicitation) that are widely used for analysis within organizations. the theory behind these tools was explained in class; applying the theory required students to search many sources of information, including online books and research databases. students were unaware of these resources until the instructors explained their availability on the internet and in the library. the students were provided with regular information literacy sessions to improve their skills in this respect. they were shown how to search; for instance, if they could not find a specific term, they could look for synonyms.
they were instructed on how to use search engines and research databases and were shown the relevant electronic journals and books that can aid in solving the weekly assessments. the use of internet multimedia is also important in education.49 the students were shown relevant youtube channels (e.g., those of harvard and khan academy) and relevant massive open online courses (e.g., free courses on coursera.com and udemy.com). weekly tests required students to use these resources to solve the assessment problems. an important outcome of this intervention was an improvement in students' ability to use different digital resources. this was evident in semester-2 students' use of suitable reference lists and in-text citations, compared with a lack of such usage by semester-1 students. an additional measure was the higher average score in semester 2 (4.15/5) than in semester 1 (3.2/5) for the course exit survey item most relevant to information literacy: "illustrate responsibility for one's own learning." the students were continually taught that information literacy grants a power that comes with responsibility, and no incidents of plagiarism were reported during the semester in which the intervention was conducted. referencing became a habit through the weekly information literacy assessments. the students' grades in the final project were also better than in the previous academic semester: the average project grade for semester 1 was 15.5/20, while that for semester 2 was 17/20, a difference that was statistically significant at the 0.10 level. the students could use digital library databases, and some were interested in using external online books. it became habitual for students to use in-text citations, and their references became diversified. some students, however, still confined suitable references to only some paragraphs; this feedback was delivered to the students so that they could address the issue in other courses.
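the article reports the semester-to-semester project-grade difference as significant at the 0.10 level but does not name the test used, so the sketch below simply illustrates one conventional way such a comparison could be run; the grade lists and the choice of an independent two-sample t-test are assumptions of this example, not data from the study.

```python
# illustrative only: hypothetical grade lists standing in for the two cohorts'
# project grades (out of 20); the article reports means of 15.5 and 17.0.
from scipy import stats

semester1 = [14.0, 15.0, 15.5, 16.0, 16.5, 15.0, 16.0]   # no intervention
semester2 = [16.0, 17.0, 17.5, 18.0, 16.5, 17.0, 17.5]   # with intervention

t_stat, p_value = stats.ttest_ind(semester1, semester2, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print("significant at the 0.10 level" if p_value < 0.10 else "not significant at 0.10")
```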
discussion and conclusion

this study was conducted to investigate the most effective mode of information literacy delivery. the study focused on smaller universities because they do not have adequate library facilities and technological capabilities to provide students with sufficient information literacy competencies during course delivery. a survey was conducted to determine the most suitable form of information literacy delivery; it found that moodle and face-to-face methods were both favored for information literacy. thus, the information literacy intervention model was developed and tested in a case study so that students' performance would improve. the results of this study have shown that the combination of technology and information literacy instruction is an effective way to improve student skills in using digital resources to seek knowledge. it was found that both face-to-face learning and the use of an lms increase student performance in assessments that require information literacy: face-to-face learning is required to explain information literacy concepts, while the lms is used to disseminate the necessary digital resources and to create assessment modules. thus, combining theory and practice in information literacy resulted in better understanding and implementation in knowledge seeking and problem-solving related to information systems. the inclusion of information literacy instruction, along with the use of an lms for information literacy assessments within information systems courses, has reduced the pressure on libraries that lack technological resources (such as pcs) and qualified staff.

the results with regard to this study's hypotheses are in agreement with those of previous studies.50 hypothesis 1, which posited that face-to-face learning has a significantly positive influence on information literacy disposition, is congruent with the research of churkovich and oughtred.51 their research focused on student information literacy skill development using library facilities instead of faculty, which is a different approach from the one followed in the present study; however, both studies found that face-to-face instruction leads to improved student performance. hypothesis 2, which posited that moodle learning has a significantly positive influence on information literacy disposition, aligns with the research of anderson and may,52 who found that using an lms is more effective than face-to-face instruction for information literacy instruction. similar to churkovich and oughtred (and in contrast to the present study), anderson and may relied on librarians to deliver information literacy instruction online, although they also relied on faculty in addition to librarians.

there are two noteworthy outcomes of the first study. first, the questionnaire measurement model showed that the development of this instrument was successful and that the items and their latent variables can be used in further studies. second, the results for the structural model indicated that both face-to-face instruction and moodle use influenced information literacy preferences. other studies support these results: peter et al. (2015) agree with the present finding that the combination of face-to-face instruction and lms use leads to improved student performance.53 peter et al., whose participants were psychology students, focused on the time-efficiency of information literacy instruction; in contrast, the present study considers information literacy skill development a progressive, long-term process. the information literacy intervention model is not only a learning medium but an interactive method of teaching that adapts to student learning patterns.

the primary limitations of the study were the nature of the sample, the exclusion of some potentially relevant variables, and the simplification of the study's findings. the sample was limited to students, professors, and people who were aware of the learning programs; it is highly possible that they were more familiar with such technological innovations than the general population. future studies could retest the study's hypotheses in a more comprehensive manner and impose more control on the respondents. the interaction between people while visiting a site is itself an activity worthy of examination, but it must be either controlled or measured for us to understand the role it plays in shaping attitudes and behaviors.
future studies can apply the developed theoretical model in different settings to determine its interaction with other variables in the information systems field. a quantitative instrument can be developed based on the information literacy intervention model. alternatively, the model can be applied with qualitative interviews in future studies to develop theoretical themes based on instructors' and students' responses.

references

1 harry m. kibirige and lisa depalo, "the internet as a source of academic research information: findings of two pilot studies," information technology and libraries 19, no. 1 (2000): 11–15; debbie folaron, a discipline coming of age in the digital age (philadelphia: john benjamins, 2006); n. n. edzan, "tracing information literacy of computer science undergraduates: a content analysis of students' academic exercise," malaysian journal of library & information science 12, no. 1 (2007): 97–109.
2 heinz bonfadelli, "the internet and knowledge gaps," european journal of communication 17, no. 1 (2002): 65–84, http://journals.sagepub.com/doi/abs/10.1177/0267323102017001607; kibirige and depalo, "the internet as a source of academic research information," 11–15.
3 laurie a. henry, "searching for an answer: the critical role of new literacies while reading on the internet," the reading teacher 59, no. 7 (2006): 614–27.
4 peyina lin, "information literacy barriers: language use and social structure," library hi tech 28, no. 4 (2010): 548–68, https://doi.org/10.1108/07378831011096222.
5 michael r. hearn, "embedding a librarian in the classroom: an intensive information literacy model," reference services review 33, no. 2 (2005): 219–27.
6 hui hui chen et al., "an analysis of moodle in engineering education: the tam perspective" (paper presented at the 2012 ieee international conference on teaching, assessment and learning for engineering (tale)).
7 n. n. edzan, "tracing information literacy of computer science undergraduates: a content analysis of students' academic exercise," malaysian journal of library & information science 12, no. 1 (2007): 97–109.
8 johannes peter et al., "making information literacy instruction more efficient by providing individual feedback," studies in higher education (2015): 1–16, https://doi.org/10.1080/03075079.2015.1079607.
9 pamela alexandra jackson, "integrating information literacy into blackboard: building campus partnerships for successful student learning," the journal of academic librarianship 33, no. 4 (2007): 454–61, https://doi.org/10.1016/j.acalib.2007.03.010.
10 manal abdulaziz abdullah, "learning style classification based on student's behavior in moodle learning management system," transactions on machine learning and artificial intelligence 3, no. 1 (2015): 28.
11 catherine j. gray and molly montgomery, "teaching an online information literacy course: is it equivalent to face-to-face instruction?," journal of library & information services in distance learning 8, no. 3–4 (2014): 301–9, https://doi.org/10.1080/1533290x.2014.945876.
12 william sugar, trey martindale, and frank e. crawley, "one professor's face-to-face teaching strategies while becoming an online instructor," quarterly review of distance education 8, no. 4 (2007): 365–85.
13 stephann makri et al., "a library or just another information resource? a case study of users' mental models of traditional and digital libraries," journal of the association for information science and technology 58, no. 3 (2007): 433–45.
14 christine susan bruce, "workplace experiences of information literacy," international journal of information management 19, no. 1 (1999): 33–47, https://doi.org/10.1016/s0268-4012(98)00045-0; michael b. eisenberg, carrie a. lowe, and kathleen l. spitzer, information literacy: essential skills for the information age (westport, ct: greenwood publishing group, 2004).
15 andy carvin, "more than just access: fitting literacy and content into the digital divide equation," educause review 35, no. 6 (2000): 38–47.
16 robert hauptman, ethics and librarianship (jefferson, nc: mcfarland, 2002).
17 janae kinikin and keith hench, "poster presentations as an assessment tool in a third/college level information literacy course: an effective method of measuring student understanding of library research skills," journal of information literacy 6, no. 2 (2012), https://doi.org/10.11645/6.2.1698; stuart palmer and barry tucker, "planning, delivery and evaluation of information literacy training for engineering and technology students," australian academic & research libraries 35, no. 1 (2004): 16–34, https://doi.org/10.1080/00048623.2004.10755254.
18 lauren smith, "towards a model of critical information literacy instruction for the development of political agency," journal of information literacy 7, no. 2 (2013): 15–32, https://doi.org/10.11645/7.2.1809.
19 melissa gross and don latham, "what's skill got to do with it?: information literacy skills and self-views of ability among first-year college students," journal of the american society for information science and technology 63, no. 3 (2012): 574–83, https://doi.org/10.1002/asi.21681.
20 bala haruna et al., "modelling web-based library service quality and user loyalty in the context of a developing country," the electronic library 35, no. 3 (2017): 507–19, https://doi.org/10.1108/el-10-2015-0211.
21 catherine j. gray and molly montgomery, "teaching an online information literacy course: is it equivalent to face-to-face instruction?," journal of library & information services in distance learning 8, no. 3–4 (2014): 301–9, https://doi.org/10.1080/1533290x.2014.945876.
22 ioannis dimopoulos et al., "using learning analytics in moodle for assessing students' performance" (paper presented at the 2nd moodle research conference, sousse, tunisia, 4–6, 2013).
23 ángel hernández-garcía and miguel á. conde-gonzález, "using learning analytics tools in engineering education" (paper presented at lasi spain, bilbao, 2016).
24 michael r. hearn, "embedding a librarian in the classroom: an intensive information literacy model," reference services review 33, no. 2 (2005): 219–27, https://doi.org/10.1108/00907320510597426; thomas p. mackey and trudi e. jacobson, "reframing information literacy as a metaliteracy," college & research libraries 72, no. 1 (2011): 62–78; s. serap kurbanoglu, buket akkoyunlu, and aysun umay, "developing the information literacy self-efficacy scale," journal of documentation 62, no. 6 (2006): 730–43, https://doi.org/10.1108/00220410610714949.
25 michelle holschuh simmons, "librarians as disciplinary discourse mediators: using genre theory to move toward critical information literacy," portal: libraries and the academy 5, no. 3 (2005): 297–311, https://doi.org/10.1353/pla.2005.0041; sharon markless and david r. streatfield, "three decades of information literacy: redefining the parameters," change and challenge: information literacy for the 21st century (blackwood, south australia: auslib press, 2007): 15–36; meg raven and denyse rodrigues, "a course of our own: taking an information literacy credit course from inception to reality," partnership: the canadian journal of library and information practice and research 12, no. 1 (2017), https://doi.org/10.21083/partnership.v12i1.3907.
26 joanne munn and jann small, "what is the best way to develop information literacy and academic skills of first year health science students? a systematic review," evidence based library and information practice 12, no. 3 (2017): 56–94, https://doi.org/10.18438/b8qs9m; sheila corrall, "crossing the threshold: reflective practice in information literacy development," journal of information literacy 11, no. 1 (2017): 23–53, https://doi.org/10.11645/11.1.2241.
27 liping deng and nicole judith tavares, "from moodle to facebook: exploring students' motivation and experiences in online communities," computers & education 68 (2013): 167–76, https://doi.org/10.1016/j.compedu.2013.04.028.
28 ana horvat et al., "student perception of moodle learning management system: a satisfaction and significance analysis," interactive learning environments 23, no. 4 (2015): 515–27, https://doi.org/10.1080/10494820.2013.788033.
29 cary roseth, mete akcaoglu, and andrea zellner, "blending synchronous face-to-face and computer-supported cooperative learning in a hybrid doctoral seminar," techtrends 57, no. 3 (2013): 54–59, https://doi.org/10.1007/s11528-013-0663-z.
30 ruonan xing, "practical teaching platform construction based on moodle—taking 'education technology project practice' as an example," communications and network 5, no. 3 (2013): 631, https://doi.org/10.4236/cn.2013.53b2113.
31 carolina costa, helena alvelos, and leonor teixeira, "the use of moodle e-learning platform: a study in a portuguese university," procedia technology 5 (2012): 334–43, https://doi.org/10.1016/j.protcy.2012.09.037.
32 eamon costello, "opening up to open source: looking at how moodle was adopted in higher education," open learning: the journal of open, distance and e-learning 28, no. 3 (2013): 187–200, https://doi.org/10.1080/02680513.2013.856289.
33 marion churkovich and christine oughtred, "can an online tutorial pass the test for library instruction? an evaluation and comparison of library skills instruction methods for first year students at deakin university," australian academic & research libraries 33, no. 1 (2002): 25–38, https://doi.org/10.1080/00048623.2002.10755177.
34 karen anderson and frances a. may, "does the method of instruction matter? an experimental examination of information literacy instruction in the online, blended, and face-to-face classrooms," the journal of academic librarianship 36, no. 6 (2010): 495–500, https://doi.org/10.1016/j.acalib.2010.08.005.
35 wan ng, "can we teach digital natives digital literacy?," computers & education 59, no. 3 (2012): 1065–78, https://doi.org/10.1016/j.compedu.2012.04.016; horvat et al., "student perception of moodle learning management system," 515–27, https://doi.org/10.1080/10494820.2013.788033; manal abdulaziz abdullah, "learning style classification based on student's behavior in moodle learning management system," transactions on machine learning and artificial intelligence 3, no. 1 (2015): 28; liping deng and nicole judith tavares, "from moodle to facebook: exploring students' motivation and experiences in online communities," computers & education 68 (2013): 167–76, https://doi.org/10.1016/j.compedu.2013.04.028.
36 j. f. hair, william c. black, and barry j. babin, multivariate data analysis: a global perspective, 7th ed. (upper saddle river, nj: pearson, 2010).
37 l. j. cronbach, "test validation," in educational measurement, ed. r. l. thorndike, 2nd ed. (washington, dc: american council on education, 1971).
38 b. tabachnick and l. fidell, using multivariate statistics, 5th ed. (new york: allyn and bacon, 2007).
39 hair, black, and babin, multivariate data analysis.
40 b. m. byrne, structural equation modeling with amos: basic concepts, applications, and programming, 2nd ed. (new york: taylor & francis group, 2010); hair, black, and babin, multivariate data analysis.
41 t. a. brown, confirmatory factor analysis for applied research (methodology in the social sciences) (new york: guilford, 2006); byrne, structural equation modeling with amos; d. gefen, d. straub, and m. boudreau, "structural equation modeling and regression: guidelines for research practice," communications of the association for information systems 4, no. 7 (2000): 1–77; hair, black, and babin, multivariate data analysis: a global perspective.
42 p. j. curran, s. g. west, and j. f. finch, "the robustness of test statistics to nonnormality and specification error in confirmatory factor analysis," psychological methods 1, no. 1 (1996): 16–29, https://doi.org/10.1037/1082-989x.1.1.16.
43 byrne, structural equation modeling with amos.
44 brown, confirmatory factor analysis for applied research; byrne, structural equation modeling with amos.
45 hair, black, and babin, multivariate data analysis: a global perspective.
46 michael b. eisenberg, carrie a. lowe, and kathleen l. spitzer, information literacy: essential skills for the information age (westport, ct: greenwood publishing group, 2004).
47 james elmborg, "critical information literacy: implications for instructional practice," the journal of academic librarianship 32, no. 2 (2006): 192–99, https://doi.org/10.1016/j.acalib.2005.12.004.
48 ibid.
49 anderson and may, "does the method of instruction matter?," 495–500; horvat et al., "student perception of moodle learning management system," 515–27, https://doi.org/10.1080/10494820.2013.788033.
50 horvat et al., "student perception of moodle learning management system," 515–27, https://doi.org/10.1080/10494820.2013.788033; anderson and may, "does the method of instruction matter?," 495–500; raven and rodrigues, "a course of our own."
51 churkovich and oughtred, "can an online tutorial pass the test for library instruction?," 25–38.
52 anderson and may, "does the method of instruction matter?," 495–500.
53 peter et al., "making information literacy instruction more efficient," 1–16.
president's message: imagination and structure in times of change
bohyun kim
information technology and libraries | december 2018

bohyun kim (bohyun.kim.ois@gmail.com) is lita president 2018-19 and chief technology officer & associate professor, university of rhode island libraries, kingston, ri.

in my last column, i talked about the discussion that lita had begun regarding forming a new division to achieve financial sustainability and more transparency, responsiveness, and agility. this proposed new division would merge lita with alcts (association for library collections and technical services) and llama (library leadership and management association). when this topic was brought up and discussed at an open meeting at the 2018 ala annual conference in new orleans, many members of these three divisions expressed interest and excitement. at the same time, there were many requests for more concrete details. you may recall that, as a response to those requests, the steering committee, which consists of the presidents, presidents-elect, and executive directors of the three divisions, decided to form four working groups with the aim of providing more complete information about what the new division would look like.

today, i am happy to report that the work of the steering committee and the four working groups is well underway. the operations working group that i have been chairing for the last two months submitted its recommendations on november 23. the activities working group finished its report on december 5. the budget and finance working group also submitted its second report. the communications working group continues to engage members of all three divisions by sharing new updates and soliciting opinions and suggestions. most recently, it started gathering input and feedback on potential names for the new division.1 you can see the charges, member rosters, and current statuses of these four working groups on the 'current information' page of the 'alcts/llama/lita alignment discussion' community on the ala connect website (https://connect.ala.org/communities/allcommunities/all/all-current-information).2

to give you a glimpse of our work preparing for the proposed new division, i would like to share some of my experience leading the operations working group. the operations working group consisted of nine members, three from each division, in addition to myself as the chair and one staff liaison. we quickly became familiar with the organizational and membership structures of the three divisions. the three divisions are similar to one another in size, but they have slightly different structures. lita has 18 interest groups (igs), 25 committees, and 4 (current) task forces; llama has 7 communities of practice (cops) and 46 discussion groups, committees, and task forces; alcts has 5 sections, 42 igs, and 61 committees (20 at the division level and 41 at the section level). all committees and task forces in lita are division-level, while alcts and llama have committees that are either division-level or section/cop-level. alcts is unique in that it elects section chairs, who serve on the division board alongside alcts directors-at-large. alcts also has a separate executive committee in addition to the board.
llama has self-governed cops, which are formed by the board's approval. among all three, lita has the flattest and simplest structure, due to its intentional efforts in the past. for example, there are neither sections nor communities of practice in lita, and the lita board eliminated the executive committee a few years ago.

the steering committee of the three divisions agreed upon several guiding principles for the potential merger. these include (i) open, flexible, and straightforward member engagement, (ii) simplified and streamlined processes, and (iii) a governance and coordinating structure that engages members and staff in meaningful and productive work. the challenge is how to translate those guiding principles into a specific organizational structure, membership structure, and bylaws. clearly, some shuffling of the existing sections, cops, and igs in the three divisions will be necessary to make the new division as effective, agile, and responsive as promised. however, when and how should such consolidation take place? furthermore, what kind of guidance should the new division provide for members to re-organize themselves into a new and better structure? these are not easy questions, nor can they be answered immediately. some changes may require going through multiple stages before they are completed.

this may concern some members. they may prefer all these questions to have definitive answers before they decide whether they will support the proposed new division. people often assume that a change takes place after a big vision is formed, and that the change is then executed by a clear plan that directly translates that vision into reality in an orderly fashion. however, that is rarely how a change takes place in reality. more often than not, a possible change builds up its own pressure, showing up in a variety of forms on multiple fronts and through many different people while getting stronger, until the idea of the change gains enough urgency. finally, some vision of the change is crafted to give that idea a form. the vision for a change also does not materialize in one fell swoop. it often begins with incomplete details and ideas that may even conflict with one another in its first iteration. it is up to all of us to sort them out and make them consistent, so that they become operational in the real world.

recently, the steering committee reached an agreement regarding the final version of the mission, vision, and values of the proposed new division. i hope these resonate with our members and guide us well in navigating the challenges ahead if the membership votes in favor of the proposal.

the new division's mission: we connect library and information practitioners in all career stages and from all organization types with expertise, colleagues, and professional development to empower transformation in technology, collections, and leadership, and to advocate for access to information for all.

the new division's vision: we shape the future of libraries and catalyze innovation across boundaries. the new division [name to be determined] amplifies diverse voices and advocates for equal and equitable access to information for all.
the new division's values: shared and celebrated expertise; strategically chosen work that makes a difference; transparent, equitable, flexible, and inclusive structures; an empowering framework for experimental and proven approaches; intentional amplification of diverse perspectives; expansive collaboration to become better together.

in deciding on all operational and logistical details for the new division, the most important criteria will be whether a proposed change advances the vision and mission of the new division and how well it aligns with the agreed-upon values and guiding principles. the steering committee and the working groups are busy finalizing the details of the new division. those details will first be reviewed by the board of each division and then shared with the membership at the midwinter meeting for feedback.

i did not anticipate that during my service as the lita president-elect and president, i would be leading a change as great as dissolving lita and forming a new division with two other divisions, alcts and llama. it has been an adventure filled with many surprises, difficulties, and challenges, to say the least. this adventure taught me a great deal about leading a change for an organization at a high level. when we move from the high-level vision of a change to the matter of details deep in the weeds, it is easy to lose sight of the original aspiration and goal that led us to the change in the first place. trying to pin down as many logistical details as possible becomes tempting to those in a leadership role, because we all want to reassure people in our organizations at a time of uncertainty and to make the transition smooth. however, creating a new division is itself a huge change at the highest level. it would be wrong to backtrack on the original goal in order to make the transition smooth, for it is the original goal that requires the transition, not vice versa. i believe those in a leadership role should accept that their most important work during a time of change is not to try to wrangle logistics at all levels but to keep things on track and moving in the direction of the original aspiration and goal. lita and the two other divisions have many talented and capable members who will be happy to lend a hand in developing new logistics. the responsibility of leaders is to create space where those people can achieve that freely and swiftly and to provide the right amount of framework and guidance. i hope that all lita members and those associated and involved with lita see themselves in the vision, mission, and values of the new division, embrace changes from the lowest to the highest level, and work towards making the new vision into reality together.

1 you can participate in this process at https://connect.ala.org/communities/community-home/digestviewer/viewthread?groupid=109804&messagekey=625e8823-21e0-419c-ab2b-1cb4a82b8d09 and http://www.allourideas.org/newdivisionname.
2 this 'current information' page will be updated as the plans for the new division develop.
journal of library automation vol. 5/2, june 1972

book reviews

book catalogs. by maurice f. tauber and hilda feinberg. metuchen, n.j.: scarecrow press, 1971. 572 p. $15.00

in 1963 kingery & tauber published a collection entitled book catalogs. this is a much larger follow-up, containing twenty papers published between 1964 and 1970 and eight previously unpublished pieces. not surprisingly, nearly all of them are concerned with computer-produced book catalogs in academic, special, county, public, and school libraries. although nearly all of the previously published papers appeared in well-known journals, it is useful to have them collected together; the older ones are now of mainly historical interest, but, taken as a whole, they form a valuable record of trial and error, and also of progress. it would be unfair to single out any of the published articles for special praise or blame. in a rapidly changing field, even the good is soon improved upon. it is the examples, the costings, and above all the mistakes that are so helpful. there is no excuse now for running into problems that have in the past led to the total scrapping of some computer systems: unforeseen filing difficulties, insufficient computer storage, bad economic estimating, and inability to produce an acceptable product. one major problem is still unsolved and indeed has not really been tackled systematically: the pattern of output (main sequence and supplements) that provides maximum usability at minimal cost, a problem surely amenable to operations research (or) techniques. as a reviewer from the united kingdom, i would like to have seen a little more on relevant events there than is provided by frederick g. kilgour's general review: the smaller budgets of british libraries have generally enforced much more careful planning and, although there may be fewer successes, there are also very few failures. the introduction and the three final pieces, all specially written, are of great value, particularly hilda feinberg's "sample book catalogs and their characteristics" (some samples are unbelievably horrible). for good measure there is a bibliography, a (computer-produced) index, and the listing of "book form catalogs" reprinted from lrts. i would hazard a guess that it is with com (computer output microform) that the future lies for many libraries. the next collection of papers, for which i hope we shall not have to wait eight years, must surely be entitled "book and microform catalogs."
maurice b. line

an introduction to pl/1 programming for library and information science. library and information science series. by thomas h. mott, jr., susan artandi, and leny struminger. new york: academic press, 1972. 231 p.

the importance of this text rests in the authors' assumption that the acquisition of programming skills by the library student is an essential component of his education in the fields of library automation and information retrieval.
such skills should enable the student to examine critically the relevance of automated information handling for the library, to experiment with some basic methods of manipulating machine-readable textual material, and "to acquire an understanding of the role of the programmer in the development of ... information handling techniques." the selection of a programming language for this text deserves some comment. pl/1 has been recognized as a particularly suitable language for the processing of textual material and data base management applications. its extensive and powerful repertoire of bit, character, string, array, record, and file manipulation capabilities argues strongly in favor of its adoption for library and other information handling applications. students should be encouraged by the selection of pl/1 for this text, for it offers the novice great flexibility and ease in constructing and manipulating even the most complex types of information structures. this title constitutes the first published attempt to tailor an introductory programming text to the needs of the library student. as such, it possesses several characteristics which distinguish it from other basic programming books, including other pl/1 texts. the language features receiving the greatest share of attention in the present title are the set of built-in functions in pl/1 designed to facilitate the manipulation of strings of both binary and character data. discussion of four of these functions (bool, unspec, verify, and translate) is usually omitted from general introductory pl/1 textbooks. although the discussions of the bool and unspec functions are reasonably complete, the explanations of verify and translate fail to indicate the scope of their applications. for example, the utility of the verify function as an index function for ranges is completely ignored. a more illuminating example of the power of the translate function could have explored its usefulness in converting ascii characters to the corresponding characters of the ebcdic set. this might have clarified the section entitled "internal representation of pl/1 characters," which contains an equivalence table for the pl/1 character set in ascii and ebcdic without indicating its purpose. additional use of this example could have been made in the presentation of the marc material, where the practical value of such a function could be stressed. another desirable feature of this text for the instructor and the library student is the inclusion of sample problems and exercises which, since they refer exclusively to text processing, library automation, and information retrieval, should be readily understandable. unfortunately, the present volume omits any mention of the picture attribute and its uses. as a powerful device facilitating the interchange of data between numeric and character variables and the uncomplicated editing of numeric fields prior to output time, its inclusion would have proved valuable to the text-handling programmer. however, it should be emphasized that this appears to be the single instance in the text in which a generally acknowledged basic language feature has been entirely excluded.
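for readers unfamiliar with the two built-in functions the reviewer singles out, the following sketch shows roughly what pl/1's verify and translate do. it is a python analogue written for this edition, not code from the reviewed textbook, and the sample strings are invented for illustration.

```python
# python analogue of two pl/1 string built-ins discussed in the review; illustrative only.
def verify(s: str, charset: str) -> int:
    """pl/1 verify: 1-based position of the first character of s not found in
    charset, or 0 if every character of s appears in charset."""
    for i, ch in enumerate(s, start=1):
        if ch not in charset:
            return i
    return 0

def translate(s: str, to: str, frm: str) -> str:
    """pl/1 translate: each character of s that occurs in frm is replaced by the
    character at the same position in to; all other characters pass through."""
    return s.translate(str.maketrans(frm, to))

# the "index function" use the reviewer mentions: find where a run of digits ends.
print(verify("1972 june", "0123456789"))   # -> 5 (position of the first non-digit)

# a character-mapping use analogous to an ascii-to-ebcdic conversion table
# (a toy uppercase mapping stands in for the real code-point table here).
print(translate("marc record",
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                "abcdefghijklmnopqrstuvwxyz"))  # -> "MARC RECORD"
```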
it seems to me that too much of the text (15-25 percent) is devoted to developing some of the elementary concepts of boolean algebra and constructing a theoretical model of document retrieval based on these concepts. one possible explanation for this emphasis is the fact that the material for the book was drawn from a graduate seminar in programming theory for information handling. although these chapters are informative and the exposition of ideas is straightforward, they should have been omitted. the space which they occupy could have been used more successfully to explore those pl/1 features essential for information handling but excluded or treated too briefly in the present volume. a list of such topics would include: an expanded discussion of program interrupts and the on condition, a description of pl/1 record formats emphasizing the variable-length record, and a guide to the use of the varying structure method of writing variable-length records. the deficiencies of this text are its overemphasis of information retrieval theory and applications, and its failure to stress those features of pl/1 which would enable the student to appreciate the file-handling capabilities of the language. however, for many instructors the availability of programming examples which should be easily grasped by the library student may strongly outweigh these disadvantages.
howard s. harris

guidelines for library automation; a handbook for federal and other libraries. by barbara evans markuson, judith wagner, sharon schatz, and donald black. santa monica, calif.: system development corporation, 1972. 401 p. $12.50

this handbook is the result of a 1970 study on the status of federal library automation projects which was conducted under the auspices of the federal library committee's task force on automation. the survey was carried out by the system development corporation and funded by the u.s. office of education. it is one of two reports generated from the study data, the other report being automation and the federal library community. the study consisted of a questionnaire survey of 2,104 federal libraries, of which 964 responded. of that number, 57 libraries had one or more functions automated and ten had one or more functions in various stages of development or planning. the survey revealed that, among other activities, 27 cataloging systems (presumably "cataloging" means catalog card production), 25 serials systems, and 13 circulation systems were operational. the handbook purports to help the federal librarian answer the question: "is it feasible to use automation for my library?" it attempts to do this by presenting step-by-step guidelines "from the initial feasibility survey through systems analysis and design to fully operational status." that material more or less follows a pattern of discussion on automation procedure followed by a checklist of the procedures in chart form. the areas covered include "feasibility guidelines" concerning such points as equipment, personnel, budget, and existing files; and "systems development guidelines," which include planning, analysis, design, implementation, and operation. the discussions include brief reviews of the various aspects of automation development, and statements describing the experiences of federal librarians as reported in the study. in this fashion, the reader is informed of the steps that should be considered with each aspect of automation development and, additionally, he is informed of what his colleagues have previously done about each phase and/or problem.
much of this material is too general and too brief to do more than call the reader's attention to the fact that certain requirements must be met in the successful development of an automation project. a large portion of the book is taken up with descriptions of automation projects in 59 federal libraries. this overview of the federal sector provides limited descriptive information about each library and reviews the various applications in terms of system descriptions, equipment, programs, future plans, documentation, etc. the reviews are not consistent in that not all of the above points are included in every review; this, however, is the result of the data submitted to the survey by the respondents. approaches have been provided to this survey material by automated application, form of publication, type of equipment used, and by the special features of each system. surprisingly, there is no approach by name of library. at least one very important library is not represented, i.e., livermore, but for some reason a similar library, los alamos, is included. the final section of the book is a potpourri of information about nonfederal automation activities and is the weakest section of the volume. it includes a list of "automated libraries" that was published before and is very incomplete and poorly defined. additionally, it briefly discusses data bases and commercial ventures, and for no apparent reason suddenly includes 22 pages of information on microforms in libraries. it just as suddenly reverts back to automation and proceeds to provide 23 pages of data on input/output hardware in libraries. the final section is a selected bibliography that seems almost as aimless as the section before it. the items included "have been selected on the basis of their particular interest and applicability to federal libraries," it is stated. they range over the whole spectrum of library automation, and some items have nothing to do with automation at all. there is no index to the book as a whole, and a fair number of errors are present. in summary, the book includes a limited amount of rather old information, most of which is available in other places in far greater detail. it appears that sdc had some rather weak survey data that seemed like it should be used! as a book of "guidelines" it does succeed in providing information in uncluttered and simplified form, but it is a very disappointing publication that leaves much to be desired both in substance and in organization.
donald p. hammer

canadian marc; a report of the activities of the marc task group resulting in a recommended canadian marc format for monographs and a canadian marc format for serials. recommended to the national librarian. by dr. guy sylvestre. ottawa: the national library of canada, 1972.

canada's approach to the realization of a proposed format for machine-readable cataloging data was influenced by several factors. first and foremost was the fact that canada is bilingual, dictating the requirement for the possible representation of data in both french and english. in addition, the national library of canada wanted to continue its interaction with the library of congress and also to coordinate the development of a canadian marc with international developments. the formats recommended are for the communication of machine-readable cataloging data. the processing of the data by local libraries was not ignored.
it was recognized that this could involve (1) expansion of the format to accommodate processing data (e.g., for acquisitions, serial control); and (2) the development of data-format-independent software for effective data storage and retrieval (e.g., a data management system with logical and physical characteristics of data described independently of specific applications software). the marc task group was established as a result of the recommendations of the conference on cataloguing standards held at the national library of canada in may 1970. the mission of the task group was to study the requirements for a format for machine-readable bibliographic records to be used in canada. the group was not to concern itself with cataloging standards as such, since these were to be considered by the task group on cataloguing standards. the marc task group limited its attention to monographs and serials because this was the greatest need at the time. it was felt that after development of these two basic formats, i.e., monographs and serials, other formats for films, manuscripts, maps, etc., could be more logically developed. recognizing that canada has two official languages and that this creates specific bibliographic needs, the task group's first recommendation was that the national library of canada assume the responsibility for developing a distinctive canadian marc format. variations from the library of congress format are to be kept to a minimum, due to:
• economic considerations.
• dedication of the canadian library community (in common with the library of congress) to the full application of the aacr, american edition, and the "version française."
• willingness of canada to continue heavy reliance upon the library of congress for answering its bibliographical needs in both the traditional way and in machine-readable form.
• readiness of canada to accept future bibliographic developments and amendments proposed by the library of congress, e.g., new filing rules.
it is further recommended that:
• the development of a separate canadian marc be coordinated with international developments such as isbd (international standard bibliographic description) and isds (international serials data system).
• the national library of canada adopt the precis (preserved context index system) developed for bnb for the purpose of adding subject data to marc records for canadian publications in the form of descriptors.
• any new data elements and varying levels of completeness of data introduced into the format in the future (for other media, specialized collections, or retrospective conversions) not conflict with the basic specifications recommended for canadian marc.
several studies were made by the task group. one addressed the need for marc formats and the user requirements for such formats, keeping in mind the need for bilingual content, in the perspective of an international marc, with respect to data for author, title, collation and notes, geographic names, and subject. format requirements were based on a comparison of the united states and united kingdom formats and the examination of the italian and other national marc formats. an intensive study was made of the proposed library of congress format for serials. the implications and requirements for a marc format to be used in conjunction with information retrieval and indexing systems were also examined. the best formats were then defined and recommended to the national librarian.
5/2 june , 1972 the format recommended for monographs may be summarized as follows: 1. the tags are mainly from the library of congress marc-ii, with adoptions from bnb and monocle. particular attention was paid to avoiding conflict with any of the national formats. the library of congress 900 tags were expanded to provide canadian libraries the option of selecting data in bi-lingual content, i.e., the data for the secondary entry fields could be represented in either the french or english equivalent. 2. the indicators specified in the library of congress format have been retained. some additional ones from bnb and monocle have been added. 3. the subfield codes of the library of congress format have been used most often with additional ones from bnb. there is no basic conflict with the library of congress marc. canadian marc is more specific and the more precise specifications are hospitable to the library of congress format. it was felt that the subfielding for filing values or relationships found in monocle could be met by software. 4. descriptive and bibliographic content are not altered in any way since they are dealt with by cataloging codes. however, for codified content (e.g., codes for language, geographic area, bibliographic area, intellectual level), use of standard international codes is recommended. meanwhile, library of congress marc-ii codes will be used for some fields, e.g., languages, geographic area . for serials, it was the intention of the task group to maintain compatibility with the canadian marc format for monographs. however, it was necessary to study the proposed formats for serials issued by the library of congress, mass-a marc-based automated serials system proposed in the united kingdom by the birmingham libraries co-operative mechanisation project, and the french monocle. the proposed canadian marc format for serials has been based on the recommendation for the processing of serials issued by the task group on cataloguing standards. data elements were isolated to meet special applications such as: 1. the preparation of union lists for serial holdings with minimal bibliographic data (e.g., by broad subject groupings, by form division). 2. the bibliographic description of canadian serials for a national bibliography. 3. the development of local library in-house systems for acquisition, processing, and control of serials. 4. the preparation of a canadian serials directory incorporating a minimum of data and with a constant update facility. book reviews 153 this diversity of requirements led the task group to state several beliefs. first, the isolation of data elements for local library in-house systems and the compatibility of these data elements to allow for the exchange of computer programs can best be done by allocating a tag structure in a format separate to the main serials communication format. second, there is a requirement for the relating of entries in the serial and monograph format (e.g., monographs in series which may appear in either format). if an exchange of data between the two formats is necessary, there may be a need to have an additional tag or a more extensive tagging structure for titles and series title entries. the specific recommendations for serials were that the national library should: 1. participate in the unesco proposals for an international serials data system in which the isolation of data elements for international exchange will have a direct bearing on the elements in a canadian marc serials format. 2. 
immediately initiate any action deemed advisable within the international proposals to provide standard serial numbers for canadian serial publications. 3. consider the preparation of a canadian serials directory as a separate project. 4. initiate a pilot project with other libraries to test the proposed canadian serials format prior to full implementation. 5. on the basis of the above recommendations, explicitly state which data elements are necessary. (the proposed format for serials has those elements asterisked that the task group believed were not necessary; these are all processing control-oriented, e.g., frequency control, publication patterns, and indexing and abstracting coverage.) the report includes three comparative tables to be used in evaluating the proposed canadian marc formats. table 1 compares, for monographs, the library of congress, united kingdom, french (monocle), and italian formats against the format proposed for canada. table 2 compares the library of congress proposed format and the mass format for serials against the format proposed for canada. table 3 compares the canadian format for monographs against the canadian format for serials. copies of table 1 were submitted to the united states, the united kingdom, france, and italy for review and comments. the resulting revisions were not incorporated in the report since this would have delayed publication. the tagging structure, therefore, may be slightly revised when the canadian marc user's manual is finalized. however, those interested in the compatibility of the canadian formats with the library of congress formats and the implications of the canadian formats for an international marc format will find the tables sufficient. lillian h. washington

monocle: projet de mise en ordinateur d'une notice catalographique de livre. publications de la bibliotheque universitaire de grenoble, 4. [par] marc chauveinc. 2eme ed. grenoble: bibliotheque interuniversitaire, 1972. 197 p. plus 25 annexes and errata.

a review of the 1st edition of monocle appeared in jola in march 1971 (v. 4, no. 1, pp. 57-58). readers are referred to that review and to the article by m. chauveinc in the september 1971 issue of jola (v. 4, no. 3) for a description of the structure of monocle. the format has undergone little change in essentials, but many changes in detail have been made. new fields have been added (249: abridged title of periodical; 270: printer's imprint; 545: note showing title of periodical analyzed), subfield codes have been changed or added, new indicators have been created (see below), and the names (and therefore the contents) of some fields have been changed (cf. 241 and 242). the leader has been enlarged from 19 to 24 bytes to show more exactly the address of the index related to a particular bibliographic record (4 new bytes) and to show the current number of fields in the record (2 new bytes) and the current length of the record (2 new bytes) as well as the initial number of fields and the initial length. the length of the index is no longer given. thus the leader makes use of 8 new bytes and has discontinued 2 (only 18 of the original 19 were utilized). what has remained unchanged is the emphasis on coding for filing arrangement and on the use of tags to identify not only the nature of a field but its different functions and its relationship with other data.
there is increased emphasis, however, on the importance of the integration and collaboration of several libraries in automation activities and, therefore, on the need for monocle to be generalized so that it is usable by institutions with other goals, hardware, and processing languages than the university of grenoble. mention is made throughout the volume of the variant approach of the bibliotheque nationale, which uses monocle to prepare the bibliographie de la france. one change in the second edition is the increased awareness of the complexities involved in dealing with subrecords. the use of the subrecord technique has therefore been limited to works meeting certain requirements. the requirements are so strict that, for all practical purposes, grenoble does not use subrecords; instead, it uses secondary entries, or series headings, or contents notes. an important change has been made in the first indicator position of personal name fields (100, 400, 600, 700, 800, 900), which, in the 1st edition, was similar to marc. a new indicator structure has been created to facilitate construction of sort keys. a first indicator of '0' is used for forenames of saints, popes, and emperors. a '1' indicates a name that is to be filed exactly as given, whether it is a forename, simple surname, or multiple surname. a '2' is used for multiple surnames containing a hyphen that is to be replaced by a blank, e.g., saint-exupery. a '3' is used when a name contains a blank, apostrophe, or hyphen that is to be deleted, e.g., la fontaine. a '4' is used for complex names, whether simple or multiple, in which it is necessary to keep some blanks and/or letters and to delete others. for this purpose, monocle makes use of three vertical bars to distinguish text to be printed and used for sorting from text to be printed only from text (supplied) to be used only for sorting. since the three bars are used only in fields with a 1st indicator of '0' or '4', the use of these indicators enables the program to test for them only when these indicators are present instead of in every field. the 1st indicator of '4' is used for complex arrangements utilizing the three bars in other fields as well: 110, 111, 241, 243, 245, 410, 411, 441, 443, 445 and the equivalent 6xx, 7xx, 8xx, and 9xx fields. the errors in this volume are minor. monocle still lists field 653 (proper names incapable of authorship) as an lc subject field, although this field was discontinued almost as soon as it was created, so that it doesn't even appear in the 1st edition (1969) of the marc manuals. in a discussion of the use of terminals to catalog books, it footnotes 'the library' of 'ohio college' rather than 'the libraries' affiliated with the ohio college library center. the review of the 1st edition pointed out that one of the values of monocle for american librarians was the light it threw on marc. that statement still holds true. to facilitate such use, an english-language translation might be of value. judith hopkins
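before leaving monocle, the filing convention described in the review above can be made concrete with a short sketch. the python code below is not taken from monocle; it is a hypothetical, present-day illustration of how the simpler indicator values ('1', '2', '3') might be turned into sort keys, with the three-vertical-bar markup used by indicators '0' and '4' left as a comment because its exact segmentation is not spelled out above.

    # illustrative sketch only; not monocle's actual code.
    # derives a filing (sort) key for a personal-name field from its
    # first indicator, following the conventions described in the review.
    def filing_key(name, first_indicator):
        if first_indicator == "1":
            # file exactly as given (forename, simple or multiple surname)
            return name
        if first_indicator == "2":
            # multiple surname: the hyphen is replaced by a blank,
            # e.g. "saint-exupery" files as "saint exupery"
            return name.replace("-", " ")
        if first_indicator == "3":
            # blanks, apostrophes, and hyphens are deleted,
            # e.g. "la fontaine" files as "lafontaine"
            return "".join(c for c in name if c not in " '-")
        # indicators "0" and "4" rely on monocle's three vertical bars,
        # which separate text that is printed and sorted, printed only,
        # and (supplied) sorted only; that parsing is omitted here.
        return name

    print(filing_key("saint-exupery", "2"))   # saint exupery
    print(filing_key("la fontaine", "3"))     # lafontaine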
key papers in information science. edited by arthur w. elias. washington, d.c.: american society for information science, 1971. 223 p. $6.00.

when i re-read the articles making up this volume for the purpose of writing this review, a strong feeling of nostalgia welled up. as a reader who has lived through the years of speculation, exploration, experiment, development, and debate that they embody, i couldn't help but feel again the spirit of excitement that i and others felt at the time. these are indeed "key papers," and it's valuable to have them together. oh, of course some names are missing and are missed (mooers, taube, fairthorne, perry and kent, bar-hillel, bush, shaw), but enough of them are here to give a full flavor of the times. the question is whether, as a collection, this set of papers has value beyond nostalgia. before turning to that question, however, let's see what they consist of.
the volume groups nineteen papers into four categories: (1) background and philosophy, (2) information needs and systems, (3) organization and dissemination of information, and (4) other areas of interest. the first includes papers by borko, by shera, and by otten and debons that attempt to define information science, its relationship to librarianship, and its potential as an independent discipline. the second includes papers by weinberg, by murdock and liston, by taylor, by parker and paisley, and by kertesz that outline the purposes and functions of information transfer, especially for the sciences. the third includes papers by doyle, by fischer, by conner, and by rees that present some of the techniques which have been developed for handling, organizing, and presenting information, especially mechanized ones such as kwic indexes, automatic indexing and abstracting, and sdi. the final section presents a potpourri of topics: a paper by lipetz on information storage and retrieval, one by de gennaro on library automation, one by garvin on natural language, one by borko on systems analysis, and one by heilprin on technology and copyright. the defined purpose of this collection is to serve students and instructors in introductory courses in information science, by making these key papers readily available as assigned readings. they indeed are useful readings, and the organization imposed on them by the editor, elias, adds greatly to their usefulness, making them far more than a simple chronological listing. despite this, however, i must confess that, as the instructor in an introductory course in which we used the key papers for the purpose for which it was intended, it fell short of meeting the needs. since then, i've tried to evaluate why. recognizing that the difficulties may have been due to the style of the instructor and the form of the course, the fact is that any collection of readings, valuable though the items individually may be, has many deficiencies. i suppose they can all be summed up as follows: a collection of papers has the appearance of a book without being a book. it lacks congruity; it lacks balance; it lacks inherent structure in contrast to that which is imposed; it lacks a theme or point to be made; it lacks a consistent style. as a sometime publisher, as an editor of a series of books, and as a reviewer of prospective manuscripts, i have felt that these things are as important in evaluation as substance and content. beyond this, a more important fact is that these papers, "key" though they are, represent the past, not the present. an introduction to information science requires reading assignments in the work of today, not just those of historical importance. on the other hand, the fact remains that these are important papers, ones with which students should become familiar, and not simply for historical purposes, and that most instructors and classes should find this a useful volume. robert m. hayes, becker & hayes, inc.

analyzing digital collections entrances: what gets used and why it matters
paromita biswas and joel marchesoni
information technology and libraries | december 2016

abstract

this paper analyzes usage data from hunter library's digital collections using google analytics for a period of twenty-seven months from october 2013 through december 2015. the authors consider this data analysis to be important for identifying collections that receive the largest number of visits.
we argue this data evaluation is important in terms of better informing decisions for building digital collections that will serve user needs. the authors also study the benefits of harvesting to sites such as the digital public library of america, and they believe this paper will contribute to the literature on google analytics and its use by libraries. introduction hunter library at western carolina university (wcu) has fourteen digital collections hosted in contentdm—a digital collection management system from oclc. users can enter the collections in various ways—through the library’s contentdm landing pages,1 search engines, or sites such as the digital public library of america (dpla) where all the collections are harvested.2 since october 2013, the library has collected usage data from its collections’ websites and from dpla referrals via google analytics. this paper analyzes this usage data covering a period of approximately twenty-seven months from october 2013 through december 2015. the authors consider this data analysis important for identifying collections receiving the largest number of visits, including visits through harvesting sites such as the dpla. the authors argue that such data evaluation is important because it can better inform decisions taken to build collections that will attract users and serve their needs. additionally, this analysis of usage data generated from harvesting sites such as the dpla demonstrates the usefulness of harvesting in increasing digital collections’ usage. lastly, this paper contributes to the broader literature on google analytics and its use by libraries in data analysis. literature review using google analytics to study usage of electronic resources is common; a considerable amount of material exists describing the use of google analytics in marketing and business fields.3 paromita biswas (pbiswas@email.wcu.edu) is metadata librarian and joel marchesoni (jmarch@email.wcu.edu) is technology support analyst, hunter library, western carolina university, cullowhee, north carolina. analyzing digital collections entrances: what gets used and why it matters | biswas and marchesoni | https://doi.org/10.6017/ital.v35i4.9446 20 however, the published literature offers little about the use of this software for studying usage of collections consisting of unique materials digitized and placed online by libraries and cultural heritage organizations. for example, betty has written about using google analytics to track statistics for user interaction with librarian-created digital media such as quizzes and video tutorials.4 fang discusses using google analytics to track the behavior of users who visited the rutgers-newark law library website.5 fang looked at the number of visitors, what and how many pages they visited, how long they stayed on each page, where they were coming from, and which search engine or website had referred them to the library’s website. findings were evaluated and used to make improvements to the library’s website. for example, fang mentions using google analytics data for tracking the percentage of new and returning visitors before and after the website redesign. among articles that discuss using web analytics to learn how users access digital collections, most have focused on a comparison between third-party platforms, online search engines, and the traditional library catalog to find preferred modes of access and whether results call for a shift in how libraries share their digital collections. 
for example, in their article on the impact of social media platforms such as historypin and pinterest on the discovery and access of digital collections, baggett and gibbs use google analytics for tracking usage of digital objects on the library’s website as well statistics collected from historypin’s and pinterest’s first-party analytics tools.6 the authors conclude that while neither historypin nor pinterest drive users back to the library’s website, they help in the discovery of digital collections and can enhance user access to library collections. schlosser and stamper compare the effects on usage of a collection housed in an institutional repository and reposted on flickr.7 whether housing a collection on a third-party site had an adverse effect on attracting traffic to the library’s website was not as important as ensuring users accessed the collection somewhere. likewise, o’english demonstrates how data from web analytics were used to compare access to archival materials via online search engines as opposed to library catalogs using marc records for descriptions.8 o’english argues library practices should change accordingly to promote patron access and use. ladd’s article on the access and use of a digital postcard collection from miami university uses statistics from google analytics, contentdm, and flickr over a period of one year.9 ladd’s findings reveal that few users came to the main digital collections website to search and browse; instead, most arrived via external sources such as search engines and social media sites. the resulting increase in views makes it imperative, ladd asserts, that regular updates both in contentdm and flickr are important for promoting access and use of the postcards. articles on using google analytics for tracking digital collection usage have explored tracking the geographic base of users. for example, herold uses google analytics to demonstrate usage of a digital archival collection by users at institutional, national, and international levels.10 herold looks at server transaction logs maintained in google analytics, onand off-campus searching counts, user locations, and repeat visitors to the archival images representing cultural heritage materials related to orang asli peoples and cultures of malaysia. she uses these data to ascertain information technology and libraries | december 2016 21 the number of users by geographic region and determine that, while most visitors came from the united states, malaysia ranked second. the data supported, according to herold, that this particular digital collection was able to reach another target audience: users from malaysia. herold’s findings indicate that digitization of unique materials makes them available to a worldwide audience. whether harvesting has increased usage of digital collections available via dpla or its hubs has received limited exploration in the literature. most writings on harvesting digital collections have focused more on the technical aspects of the process, like the dpla’s ingestion method, the quality and scalability of metadata remediation and enhancement,11 and large metadata encoding.12 for example, gregory and williams write about the north carolina digital heritage center as one of the service hubs of the dpla. the service hubs are centers that aggregate digital collection metadata provided by institutions for harvesting by the dpla. 
the authors discuss metadata requirements, software review, and establishment of workflow for sending large metadata feeds to the dpla.13 boyd, gilbert, and vinson, in their article on the south carolina digital library (scdl), another service hub for dpla, describe the planning behind setting up the scdl, its management, and the technology involved in metadata harvesting.14 freeland and moulaison discuss the missouri hub as a model for “institutions with similar collective goals for exposing and enriching their data through the dpla.”15 according to them, by harvesting their metadata to the dpla, institutions are able to share their digital collections with the broader public. additionally, institutions that harvest metadata to the dpla get value-added services like geocoding of locationbased metadata and expression of contributed metadata as linked data. data collection parameters hunter library digital collections usage data included information on item views16 and referrals17 for each of the collections including dpla referrals. the authors also considered keyword search terms18 across all referrals, and within contentdm specifically, that brought users to the library’s collections. the authors considered the most frequently occurring keywords to be representing the subjects of collections that were most used. repeat visitors to the library’s digital collections’ website were also tracked. finally, sessions19 were traced by the geographic area20 of the users. hunter library’s collections vary in size. the library’s largest and one of the oldest collections, craft revival [note: collections are set in roman and capitalized] showcases documents, photographs, and craft objects housed in hunter library and smaller regional institutions. the collection’s items represent the late nineteenth and early twentieth century (1890s–1940s) craft revival movement in western north carolina, which was characterized by a renewed interest in handmade objects, including cherokee arts and crafts. the craft revival collection began in 2005 and includes 1,982 items. the second largest collection, great smoky mountains, which highlights efforts that went into the establishment of the park and includes photographs on the landscape and flora and fauna in the park, began in 2012 and consists of 1,829 items. not all digital analyzing digital collections entrances: what gets used and why it matters | biswas and marchesoni | https://doi.org/10.6017/ital.v35i4.9446 22 collections were harvested to the dpla at the same time. while some older collections were harvested to the dpla in 2013, smaller, institution-specific collections started later were also harvested later. for example wcu—oral histories, a collection of interviews collected by students of one of wcu’s history classes documenting the history and culture of western north carolina and the lives of wcu athletes or artists’ like josephina niggli who taught drama at wcu; highlights from wcu, a collection of unique items from wcu’s mountain heritage center and other departments on campus, including letters from the library’s special collections transcribed by wcu’s english department students; and wcu—fine art museum, showcasing art work from the university’s fine art museum, were harvested to the dpla in 2015. 
as these smaller collections were started later, their total item views and referral counts would likely be less than some of the library's older collections; however, these newer collections were included as they might provide valuable data regarding harvesting referrals and returning visitors. table 1 shows the years the collections were started, the number of items included in each collection, and the year they were harvested to the dpla.

collection name | start year | collection size (number of items) | harvested since
cherokee traditions | 2011 | 332 | 2013
civil war | 2011 | 68 | 2013
craft revival | 2005 | 1,982 | 2013
great smoky mountains | 2013 | 1,829 | 2013
highlights from wcu | 2015 | 39 | 2015
horace kephart | 2005 | 552 | 2013
picturing appalachia | 2012 | 972 | 2013
stories of mountain folk | 2012 | 374 | 2013
travel western north carolina | 2011 | 160 | 2013
wcu—fine art museum | 2015 | 87 | 2015
wcu—herbarium | 2013 | 91 | 2013
wcu—making memories | 2012 | 408 | 2013
wcu—oral histories | 2015 | 67 | 2015
western north carolina regional maps | 2015 | 37 | 2015
table 1. collections by year

collecting data using google analytics

the library has had google analytics set up on online exhibits—websites outside of contentdm that provide additional insight into the collection—since 2008 and began using google analytics to track its contentdm materials with the 6.1.2 release in october 2013. contentdm version 6.4 introduced a configuration field that allowed the authors to enter a google analytics id and automatically generate the tracking code in pages to simplify the setup. following that software update, oclc made google analytics the default data logging mechanism. the library set up google analytics such that online exhibits are tracked together with their contentdm collections. this is accomplished by using custom tracking on all webpages and a custom script in contentdm. this allows the library to link its contentdm and wcu.edu domains within google analytics so that sessions can be viewed across all online digital collections. data were collected from google analytics using several tools. google provides an online tool called query explorer (https://ga-dev-tools.appspot.com/query-explorer/) that can create and execute custom searches against google analytics. this application was used to craft the queries. microsoft excel was primarily used to download data, using the custom plugin rest to excel library (http://ramblings.mcpher.com/home/excelquirks/json/rest) to parse information from google analytics into worksheets. the excel add-on works well, but requires knowledge of microsoft visual basic for applications (vba) programming to use effectively. this limitation prompted the authors to look for a simpler way of retrieving data. the authors found openrefine (https://github.com/openrefine/openrefine) to collect, sort, and filter data, with excel used for results analysis. once in excel, formulas were used to mine data for specific targets.

results analysis

the data collected using google analytics spanned a period of approximately twenty-seven months, from october 2013 through december 2015. table 2 and graph 1 show each collection's item views, item referrals, and size (number of items in the collection). these numbers were calculated for each collection as a percentage of total item views, total item referrals, and total number of items for all collections together. in table 2, the top five collections in terms of item views and referrals are highlighted.
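the query explorer and excel workflow described in the preceding section can also be reproduced programmatically. the sketch below is not the authors' method; it is a hypothetical python alternative that calls google's analytics reporting api (v4) through the google-api-python-client library. the credential file name and view id are placeholders, and ga:pagePath, ga:source, and ga:pageviews are standard universal analytics dimension and metric names.

    # hypothetical sketch: pulling per-page pageview counts for the study
    # window directly from the analytics reporting api, as an alternative
    # to the query explorer / excel plugin workflow described above.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_file(
        "service-account.json",          # placeholder credential file
        scopes=["https://www.googleapis.com/auth/analytics.readonly"],
    )
    analytics = build("analyticsreporting", "v4", credentials=creds)

    response = analytics.reports().batchGet(body={
        "reportRequests": [{
            "viewId": "12345678",        # placeholder view id
            "dateRanges": [{"startDate": "2013-10-01", "endDate": "2015-12-31"}],
            "metrics": [{"expression": "ga:pageviews"}],
            "dimensions": [{"name": "ga:pagePath"}, {"name": "ga:source"}],
            "pageSize": 10000,
        }]
    }).execute()

    # each row pairs a page path and traffic source with its pageview count
    for row in response["reports"][0]["data"].get("rows", []):
        path, source = row["dimensions"]
        views = row["metrics"][0]["values"][0]
        print(path, source, views)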
graph 1, a graphical representation of table 2, displays more starkly the differences between collections in terms of views and referrals.

collection name | item views as percentage of total views | item referrals as percentage of total referrals | number of items as percentage of total items for all collections
cherokee traditions | 6.38 | 6.12 | 4.74
civil war | 1.89 | 0.88 | 0.97
craft revival | 41.35 | 52.39 | 28.32
great smoky mountains | 7.50 | 6.34 | 26.14
highlights from wcu | 0.23 | 0.08 | 0.56
horace kephart | 11.67 | 7.62 | 7.89
picturing appalachia | 10.03 | 9.99 | 13.89
stories of mountain folk | 3.51 | 2.45 | 5.344
travel western north carolina | 7.87 | 9.57 | 2.29
wcu—fine art museum | 0.19 | 0.08 | 1.24
wcu—herbarium | 0.71 | 0.45 | 1.30
wcu—making memories | 7.13 | 2.64 | 5.83
wcu—oral histories | 0.80 | 1.08 | 0.96
western north carolina regional maps | 0.26 | 0.11 | 0.53
total | 100.00 | 100.00 | 100.00
table 2. collections by percentage

graph 1. collections by percentage

as demonstrated in the preceding table and graph, craft revival, one of the library's oldest and largest collections, contributes more than 28 percent of all digital collections' items and garners close to 42 percent of all item views and 53 percent of all item referrals. great smoky mountains, the second largest collection, contributes a little more than 26 percent of items but receives only about 8 percent of all item views and 7 percent of all referrals. the horace kephart collection, focusing on the life and works of horace kephart—author, librarian, and outdoorsman who made the mountains of western north carolina his home later in life—is the library's fourth largest collection. it receives almost 12 percent of all item views and about 8 percent of all item referrals. picturing appalachia, the third largest collection—consisting of photographs showcasing the history, culture, and natural landscape of southern appalachia in the western north carolina region—makes up 14 percent of items and receives approximately 10 percent of all referrals and views. travel western north carolina—visual journeys of western north carolina communities through three generations—contributes fewer than 3 percent of items but scores high on both item views and referrals. wcu—making memories, which highlights the people, buildings, and events from wcu's history, and stories of mountain folk (somf), which is a collection of radio programs from western north carolina non-profit catch the spirit of appalachia and archived at hunter library, are similar in size, with each receiving fewer than 3 percent of all item referrals. however, wcu—making memories receives more than 7 percent of all item views compared to somf's almost 4 percent. these findings are not surprising, as the making memories collection documents western carolina university's history and may receive many views from within the institution. overall, however, the craft revival collection can be considered the library's most popular collection. the horace kephart collection appears to be the second most popular collection. and, not surprisingly, cherokee traditions, a collection of art objects, photographs, and recordings similar in content to the craft revival collection in terms of its focus on cherokee culture and history, is quite popular and receives more item referrals than both wcu—making memories and somf and more item views than somf (table 2).
an analysis of keyword searches within contentdm and keyword searches across all referral sources reiterates these findings. as part of the analysis, data collected for this twenty-seven-month period for the top keyword searches within contentdm and the top keyword searches counting all referrals was recorded in an excel spreadsheet and then uploaded to openrefine. openrefine allows text and numeric data to be sorted by name (alphabetical) and count (highest to lowest occurring). once the excel spreadsheet was uploaded to openrefine, keywords were sorted numerically and clustered. openrefine has a "cluster" function to bring together text that has the same meaning but differs by spelling or capitalization (for example, "Cherokee," "cherokee," "cheroke") or by order (for example, "jane smith," "smith, jane"). the clustering function provides a count of the number of times a keyword was used regardless of exact spelling. after identifying keywords belonging to a cluster (for example, a cluster of the word "cherokee" spelled differently), the differently spelled or organized keywords in each cluster were merged in openrefine with their most accurate counterparts. finally, it should be noted that keywords including "!" and "+" symbols were most likely generated from either using multiple search terms within contentdm's advanced search or from curated search links maintained on some of our online exhibit websites. these links take users to commonly used result sets within the collection. tables 3 and 4 provide a listing of the ten most frequently searched keywords within contentdm and across all referrals and the names of collections that are most relevant to these searches.

keywords | occurrence count | relevant collection(s)
cherokee | 187 | craft revival; cherokee traditions
cherokee language | 107 | craft revival; cherokee traditions
southern highland craft guild | 98 | craft revival
basket!object | 96 | craft revival; cherokee traditions
indian masks—appalachian region, southern | 83 | craft revival; cherokee traditions
basket!photograph postcard | 82 | craft revival; cherokee traditions
w.m. cline company | 78 | picturing appalachia; craft revival
cherokee +indian! photograph | 72 | craft revival; cherokee traditions
wood-carving—appalachian region, southern | 70 | craft revival
indian wood-carving—appalachian region, southern | 69 | craft revival
table 3. top keyword searches within contentdm

keywords | number of sessions | relevant collection(s)
cherokee traditions | 442 | craft revival; cherokee traditions
horace kephart | 185 | horace kephart; great smoky mountains; picturing appalachia
cherokee pottery | 55 | craft revival; cherokee traditions
kephart knife | 50 | horace kephart
amanda swimmer | 37 | craft revival; cherokee traditions
appalachian people | 36 | craft revival; cherokee traditions; great smoky mountains; wcu—oral histories
cherokee indian pottery | 36 | craft revival; cherokee traditions
cherokee baskets | 34 | craft revival; cherokee traditions
weaving patterns | 33 | craft revival; cherokee traditions
basket weaving | 26 | craft revival; cherokee traditions
table 4. top keyword searches across all referrals
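openrefine's cluster function, used above to merge variant spellings, is built on key-collision methods such as fingerprint keying. the python below is not part of the authors' workflow; it is a rough sketch of the fingerprint idea, in which terms that normalize to the same key are grouped together.

    # rough illustration of fingerprint (key-collision) clustering, the
    # default method behind openrefine's cluster function.
    import re
    import unicodedata
    from collections import defaultdict

    def fingerprint(term):
        # strip accents, lowercase, drop punctuation, then sort the unique
        # tokens so that word order and capitalization no longer matter
        term = unicodedata.normalize("NFKD", term).encode("ascii", "ignore").decode()
        term = re.sub(r"[^\w\s]", " ", term.lower())
        return " ".join(sorted(set(term.split())))

    def clusters(keywords):
        groups = defaultdict(list)
        for kw in keywords:
            groups[fingerprint(kw)].append(kw)
        return [group for group in groups.values() if len(group) > 1]

    print(clusters(["Cherokee", "cherokee", "Jane Smith", "Smith, Jane", "cheroke"]))
    # "Cherokee"/"cherokee" and the two name orders collide on one key; the
    # misspelling "cheroke" does not, and would need one of openrefine's
    # nearest-neighbor methods rather than key collision.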
tables 3 and 4 show that top searches relate to arts and crafts from the western north carolina region ("baskets," "indian masks," "indian wood carving," "cherokee pottery"), artists ("amanda swimmer"), or topics relating to cherokee culture ("cherokee," "cherokee language"). searches relating to the horace kephart collection ("horace kephart," "kephart knife") are also popular, explaining the fact that the kephart collection, which accounts for fewer than 8 percent of the library's digital collections' items, scores highly in terms of item views (second) and referrals (fourth). the popularity of topics related to western north carolina is reiterated in the geographic base of the users. graph 2 shows north carolina accounts for most of the searches, with cities in western north carolina (asheville, franklin, cherokee, waynesville) accounting for more than 40 percent of sessions.

graph 2. cities by session count

the majority of item referrals come from search engines such as google, bing, and yahoo! graph 3 shows the percentage of item referrals from these external searches.21 however, the dpla also generates a fair amount of incoming traffic to the collections. for example, while all collections get referrals from the dpla, harvesting to the dpla is particularly useful for smaller collections such as highlights from wcu, wcu—fine art museum, and civil war collection. each of these collections gets 17 percent of referrals from the dpla, making dpla the largest referral source following the search engines for the highlights and fine art museum collections. graph 4 shows referrals each collection receives via the dpla as a percentage of total referrals. this indicates the usefulness of harvesting to the dpla. a trend seems also to show there is an increase in total referrals from dpla per month the longer items are in dpla (graph 5).

graph 3. percentage of search engine item referrals (google, bing, and yahoo!)
analysis of the transaction logs indicates that while all collections likely benefitted from harvesting, craft revival, cherokee traditions, and horace kephart (collections focusing on the culture and history of western north carolina) were the most heavily used and most visitors came from the state of north carolina and from the region in particular. search terms in the transaction logs also indicated a strong interest in items related to cherokee culture and horace kephart. as herold, who traced the second largest group of users of the orang asli digital image archive to malaysia notes, the geographic base of a collection’s users can be indicative of the popularity of a subject area.22 likewise, matusiak asserts that users’ comments can be indicative of the relevance of collections to users’ needs and provide direction for the future development of digital collections.23 as neither the craft revival, cherokee traditions, nor horace kephart collection includes items that relate specifically to the university’s history—unlike other institution-specific collections mentioned earlier—it is possible collection users may be more representative of the larger public than the university. these findings point to the need for questioning identification of an academic information technology and libraries | december 2016 31 library’s user base as mainly students and faculty of the institution and whether librarians should give greater consideration to the needs of a wider audience.24 data supporting the existence of this user base, whose true import or preferences might not be captured in surveys and questionnaires, can serve as a valuable source of information for individuals responsible for building digital collections. in an informal survey of hunter library faculty carried out by hunter library’s digital initiatives unit in september of 2014, respondents considered collections such as craft revival to be more useful to users external to the university. while the survey could allude to the nature of the user base of a collection like craft revival, it understandably could not capture the scale of the item views and referrals garnered by this collection as well as a usage data analysis could. on the other hand, analysis of usage data, as demonstrated in this paper, indicated that certain collections— highlights from wcu, wcu—fine art museum, and wcu—oral histories—possibly served a niche audience. these smaller and more recently established collections consisting of universitycreated materials attracted more returning visitors (see graph 6). these returning visitors were likely internal users whose visits indicated, as fang points out, a loyalty to these collections.25 in the paper “a framework of guidance for building good digital collections,” authored by the national information standards organization framework advisory group, the authors point out that while there are no absolute rules for creating quality digital collections, a good collection should include data pertaining to usage.26 the authors point to multiple assessment matrixes including using a combination of observations, surveys, experiments, and transaction log analyses. as the wcu digital collections findings demonstrate, a careful analysis of the popularity of collections can indicate the need for balancing quantitative data with more qualitative survey and interview data. 
these findings also indicate that usage data analysis can be very valuable in identifying the extent of collection usage by visitors who may not have significant survey representation. results from the small (fewer than ten respondents) wcu survey indicate that some respondents question the institutional usefulness of collections such as craft revival. these results show the importance of taking multiple factors into account when assessing user needs and interests in digital collections. conclusion the authors feel future projects might stem from this data analysis. for example, local subject fields based on the highest recurring keywords that were mined from the transaction logs can be added for all of hunter library’s digital collections. usage statistics at a later period could be evaluated to study if addition of user generated keywords increased use of any collection. as matusiak points out in her article on the usefulness of user-centered indexing in digital image collections, social tagging—despite its lack of synonym control or misuse of the singular and plural—is a powerful form of indexing because of “close connection with users and their language,” as opposed to traditional indexing.27 the terms users assign to describe images are also the ones they are most likely to type while searching for digital images. likewise, according to walsh, a analyzing digital collections entrances: what gets used and why it matters | biswas and marchesoni | https://doi.org/10.6017/ital.v35i4.9446 32 study conducted by the university of alberta found more than forty percent of collections reviewed used a locally developed classification for indexing and searching their collections, and many of these schemes could work well for searches within the collection by users who are familiar with the culture of the collection.28 usage-data analysis can constitute useful information that guides decisions for building digital collections that better serve user needs. it can identify a library’s digital collections’ users and what they want. these are important considerations to keep in mind if library services are to be all about engaging and building relationship with the users.29 harvesting to a national portal such as the dpla is beneficial for hunter library’s collections. at the same time, the library’s institution-specific collections receive more return visits, likely because of sustained interest from the large user base of the university’s students and employees, an assessment supported by survey findings. conversely, collections not so directly tied to the institution receive the most onetime item views and referrals. items that get used are a good indication of what users want and, as this paper demonstrates, the focus of academic digital library collections should consider the needs of both the university audience and the general public. references 1. a landing page refers to the homepage of a collection. 2. the dpla provides a single portal for accessing digital collections held by cultural heritage institutions across the united states. “history,” digital public library of america, accessed may 19, 2016, http://dp.la/info/about/history/. 3. paul betty, “assessing homegrown library collections: using google analytics to track use of screencasts and flash-based learning objects,” journal of electronic resources librarianship 21, no. 1 (2009): 75–92, https:// doi.org/10.1080/19411260902858631. 4. ibid. 5. 
wei fang, “using google analytics for improving library website content and design: a case study,” library philosophy and practice (e-journal), june 2007, 1-17, http://digitalcommons.unl.edu/libphilprac/121. 6. mark baggett and rabia gibbs, “historypin and pinterest for digital collections: measuring the impact of image-based social tools on discovery and access,” journal of library administration 54, no. 1 (2014): 11–22, https:// doi.org/10.1080/01930826.2014.893111. 7. melanie schlosser and brian stamper, “learning to share: measuring use of a digitized collection on flickr and in the ir,” information technology and libraries 31, no. 3 (september 2012): 85–93, https:// doi.org/10.6017/ital.v31i3.1926. information technology and libraries | december 2016 33 8. mark r. o’english, “applying web analytics to online finding aids: page views, pathways, and learning about users,” journal of western archives 2, no. 1 (2011): 1–12, http://digitalcommons.usu.edu/westernarchives/vol2/iss1/1. 9. marcus ladd, “access and use in the digital age: a case study of a digital postcard collection,” new review of academic librarianship 21, no. 2 (2015): 225–31, https://doi.org/10.1080/13614533.2015.1031258. 10. irene m. h. herold, “digital archival image collections: who are the users?” behavioral & social sciences librarian 29, no. 4 (2010): 267–82, https://doi.org/10.1080/01639269.2010.521024. 11. mark a. matienzo and amy rudersdorf, “the digital public library of america ingestion ecosystem: lessons learned after one year of large-scale collaborative metadata aggregation,” in 2014 proceedings of the international conference on dublin core and metadata applications (dcmi, 2014), 1–11, http://arxiv.org/abs/1408.1713. 12. oskana l. zavalina et al., “extended date/time format (edtf) in the digital public library of america’s metadata: exploratory analysis,” proceedings of the association for information science and technology 52, no. 1 (2015), 1–5, http://onlinelibrary.wiley.com/doi/10.1002/pra2.2015.145052010066/abstract. 13. lisa gregory and stephanie williams, “on being a hub: some details behind providing metadata for the digital public library of america,” d-lib magazine 20, no. 7/8 (july/august 2014): 1–10, https://doi.org/10.1045/july2014-gregory. 14. kate boyd, heather gilbert, and chris vinson, “the south carolina digital library (scdl): what is it and where is it going?” south carolina libraries 2, no. 1 (2016), http://scholarcommons.sc.edu/scl_journal/vol2/iss1/3. 15. chris freeland and heather moulaison, “development of the missouri hub: preparing for linked open data by contributing to the digital public library of america,” proceedings of the association for information science and technology 52, no. 1 (2015): 1–4, http://onlinelibrary.wiley.com/doi/10.1002/pra2.2015.1450520100105/abstract. 16. a single view of an item in a digital collection. 17. visits to the site that began from another site with an item page being the first page viewed. 18. keywords are words visitors used to find the library’s website when using a search engine. google analytics provides a list of these keywords. 19. a session is defined as a “group of interactions that take place on a website within a given time frame” and can include multiple kinds of interactions like page views, social interactions, and economic transactions. 
in google analytics, a session by default lasts thirty minutes, though analyzing digital collections entrances: what gets used and why it matters | biswas and marchesoni | https://doi.org/10.6017/ital.v35i4.9446 34 one can adjust this length to last a few seconds or several hours. “how a session is defined in analytics,” google, analytics help, accessed may 20, 2016, https://support.google.com/analytics/answer/2731565?hl=en. 20. locations were studied in terms of mostly cities and states. 21. the percentage is based on the total referral count a collection gets—for example, a 44 percent referral count for cherokee traditions would mean that the search engines account for 44 percent of the total referrals this collection gets. 22. herold, “digital archival image collections,” 278. 23. krystyna k. matusiak, “towards user-centered indexing in digital image collections,” oclc systems & services: international digital library perspectives 22, no. 4 (2006): 283–98, https://doi.org/10.1108/10650750610706998. 24. ladd, “access and use in the digital age,” 230. 25. fang points out that the improvements made to the rutgers-newark law library website could attract more return visitors and thus achieve loyalty. fang, “using google analytics for improving library website,” 11. 26. niso framework advisory group, a framework of guidance for building good digital collections, 2nd ed. (bethesda, md: national information standards organization, 2004), https://chnm.gmu.edu/digitalhistory/links/cached/chapter3/link3.2a.niso.html. 27. matusiak, “towards user-centered indexing,” 289. 28. john walsh, “the use of library of congress subject headings in digital collections,” library review 60, no. 4 (2011), https://doi.org/10.1108/00242531111127875. 29. lynn silipigni connaway, the library in the life of the user: engaging with people where they live and learn, (dublin: oclc research, 2015), http://www.oclc.org/research/publications/2015/oclcresearch-library-in-life-of-user.html. 10592 20190318 galley the map as a search box: using linked data to create a geographic discovery system gabriel mckee information technology and libraries | march 2019 40 gabriel mckee (gm95@nyu.edu) is librarian for collections and services at the institute for the study of the ancient world at new york university. abstract this article describes a bibliographic mapping project recently undertaken at the library of the institute for the study of the ancient world (isaw). the marc advisory committee recently approved an update to marc that enables the use of dereferenceable uniform resource identifiers (uris) in marc subfield $0. the isaw library has taken advantage of marc’s new openness to uris, using identifiers from the linked data gazetteer pleiades in marc records and using this metadata to create maps representing our library’s holdings. by populating our marc records with uris from pleiades, an online, linked open data (lod) gazetteer of the ancient world, we are able to create maps of the geographic metadata in our library’s catalog. this article describes the background, procedures, and potential future directions for this collection-mapping project. 
introduction since the concept of the semantic web was first articulated in 2001, libraries have faced the challenge of converting their vast stores of metadata into linked data.1 though bibframe, the planned replacement for the marc (machine-readable cataloging) systems that most american libraries have been using since the 1970s, is based on linked-data principles, it is unlikely to be implemented widely for several years. as a result, many libraries have delayed creating linked data within the existing marc framework. one reason for this delay has been the absence of a clear consensus in the cataloging community about the best method to incorporate uniform resource identifiers (uris), the key building block of linked data, into marc records.2 but recent developments have added clarity to how uris can be used in marc, clearing a path for projects that draw on uris in library metadata. this paper describes one such project undertaken by the library of the institute for the study of the ancient world (isaw) that draws on uris from the linked-data gazetteer pleiades to create maps of items in the library’s collection. a brief history of uris in marc over the last decade, the path to using uris in marc records has become more clear. this process began in 2007, when the deutsche nationalbibliothek submitted a proposal to expand the use of a particular marc subfield, $0 (also called “dollar-zero” or “subfield zero”), to contain control numbers for related authority records in main entry, subject access, and added entry fields.3 the proposal, which was approved on july 13, 2007, called for these control numbers to be recorded with a particular syntax: “the marc organization code (in parentheses) followed immediately by the number, e.g., (cabvau)2835210335.”4 this marc-specific syntax is usable within the marc environment, but is not actionable for linked-data purposes. a dereferenceable uri—that is, an identifier beginning with “http://” that links directly to an online resource or a descriptive information technology and libraries | march 2019 41 representation of a person, object, or concept—could be parsed and reconstructed, but only with a significant amount of human intervention and a high likelihood of error.5 in 2010, following a proposal from the british library, $0 was redefined to allow types of identifiers other than authority record numbers, in particular international standard name identifiers (isni), using this same parenthetical-prefix syntax.6 that same year, the rda/marc working group issued a discussion paper proposing the use of uris in $0, but no proposal regarding the matter was approved at that time.7 the 2010 redefinition made it possible to place uris in $0, provided they were preceded by the parenthetical prefix “(uri)”. however, this requirement of an added character string put marc practice at odds with the typical practices of the linked data community. not only does the addition of a prefix create the need for additional parsing before the uri can be used, the prefix is also redundant, since dereferenceable uris are self-identifying. 
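the syntactic difference at issue can be seen in a short illustration, written in conventional marc field notation. the fields below are hypothetical pairings of headings and identifiers, shown only to contrast the three shapes subfield $0 content has taken: the 2007 control-number convention, the parenthetical "(uri)" prefix required between 2010 and 2016, and the bare dereferenceable uri permitted after the change described below.

    100 1# $a chauveinc, marc $0 (cabvau)2835210335
        (marc organization code plus control number, per the 2007 definition)
    651 #0 $a rome (italy) $0 (uri)https://pleiades.stoa.org/places/423025
        (dereferenceable uri carrying the "(uri)" prefix required from 2010 to 2016)
    651 #0 $a rome (italy) $0 https://pleiades.stoa.org/places/423025
        (bare dereferenceable uri, usable after the prefix requirement was dropped)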
in 2015, the program for cooperative cataloging (pcc) charged a task group with examining the challenges and opportunities for the use of uris within a marc environment.8 one of this group's first accomplishments was submitting a proposal to the marc advisory committee to discontinue the requirement of the "(uri)" prefix on uris.9 though this change appears minor, it represents a significant step forward in the gradual process of converting marc metadata to linked data. linked data applications require dereferenceable uris. the requirement of either converting an http uri to a number string (as $0 required from 2007 to 2010), or prefixing it with a parenthetical prefix, produced identifiers that did not meet the definition of dereferenceability. as shieh and reese explain, the marc syntax in place prior to this redefinition was at odds with the practices used by semantic web services:

the use of qualifiers, rather than actionable uris, requires those interested in utilizing library metadata to become domain experts and become familiar with the wide range of standards and vocabularies utilized within the library metadata space. the qualifiers force human interaction, whereas dereferenceable uris are more intuitive for machines to process, to query services, to self-describe—a truly automated processing and a wholesome integration of web services.10

though it has been possible to use prefixed uris in marc for several years, few libraries have done so, in part because of this requirement for human intervention, and in part because of the scarcity of use-cases that justified their use. the removal of the prefix requirement brings marc's use of uris more into line with that of other semantic web services, and will reduce system fragility and enhance forward-compatibility with developing products, projects, and services. though marc library catalogs still struggle with external interoperability, the capability of inserting unaltered, dereferenceable uris into marc records is potentially transformative.11 following the approval of the pcc task group on uri in marc's 2016 proposal, libraries can work with limited linked data applications directly within marc, rather than waiting for the implementation of bibframe. by inserting actionable uris directly into marc records, libraries can begin developing programs, tools, and projects that draw on these uris for any number of data outcomes. in the last two years, the isaw library has taken advantage of marc's new openness to uris to create one such outcome: a bibliographic mapping project that creates browseable maps of items held by the library. the isaw library holds approximately 50,000 volumes in its print collection, chiefly focusing on archaeology, history, and philology of asia, europe, and north africa from the beginning of agriculture through the dawn of the medieval period, with a focus on cultural interconnections and interdisciplinary approaches to antiquity. the institute, founded in 2007, is affiliated with new york university (nyu) and its library holdings are cataloged within bobcat, the nyu opac. by populating our marc records with uris from pleiades, an online, linked open data (lod) gazetteer of the ancient world, the isaw library is able to create maps of the geographic metadata in our library's catalog.
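as a hedged illustration of what "drawing on these uris" can look like in practice, the sketch below uses the open-source pymarc library (an assumption on our part; the article does not prescribe specific tooling) to collect every dereferenceable uri found in the 651 fields of a file of bibliographic records. the file name is hypothetical.

```python
from pymarc import MARCReader

uris = set()
with open("isaw_records.mrc", "rb") as fh:           # hypothetical export of catalog records
    for record in MARCReader(fh):
        for field in record.get_fields("651"):       # subject added entry-geographic name
            for value in field.get_subfields("0"):   # $0 may now hold an unprefixed uri
                if value.startswith(("http://", "https://")):
                    uris.add(value)

print(f"found {len(uris)} distinct geographic uris to map")
```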
at the moment, this process is indirect and requires periodic human intervention, but we are working on ways of introducing greater automation as well as expanding beyond small sets of data to a larger map encompassing as much of our library's holdings as it makes sense to represent geographically.

map-based searching for ancient subjects

in the disciplines of history and archaeology, geography is of vital importance. much of what we know about the past can be tied to particular locations: archaeological sites, ancient structures, and find-spots for caches of papyri and cuneiform tablets provide the spatial context for the cultures about which they inform us. but while geospatial data about antiquity can be extremely precise, the text-based searching that is the user's primary means of accessing library materials is much less clear. standards for geographic metadata focus on place names, which open the door for greater ambiguity, as buckland et al. explain:

there is a basic distinction between place, a cultural concept, and space, a physical concept. cultural discourse tends to be about places rather than spaces and, being cultural and linguistic, place names tend to be multiple, ambiguous, and unstable. indeed, the places themselves are unstable. cities expand, absorbing neighboring places, and countries change both names and boundaries.12

nowhere is this instability of places and their names so clear as in the fields of ancient history and archaeology, which often require awareness of cultural changes in a single location throughout the longue durée. and yet researchers in these fields have had to rely on library search interfaces that rely entirely on toponyms for accessing research materials. scholars in these disciplines, and many others besides, would be well served by a method of discovering research materials that relies not on keywords or controlled vocabularies, but on geographic location. library of congress classification and subject cataloging tend to provide greater granularity for political developments in the modern era, presenting a challenge to students of ancient history. a scholar of the ancient caucasus, for example, is likely to be interested in materials that are currently classified under the history classes for the historical region of greater armenia (ds161-199), the modern countries of armenia (dk680), azerbaijan (dk69x), georgia (dk67x), russia (dk5xx), ukraine (dk508), and turkey (ds51, ds155-156 and dr401-741); for pre- and protohistoric periods, materials may be classified in gn700-890; and texts in ancient languages of the caucasus will fall into the pk8000-9000 range. moreover, an effective catalog search may require familiarity with the romanization schemes for georgian, armenian, russian, and ukrainian. materials on the ancient caucasus fall into a dozen or more call number ranges, and there is no single term within the library of congress subject headings (lcsh) that connects them—but if their subjects were represented on a map, they would fall within a polygon only a few hundred miles long on each side. this geophysical collocation of materials from across many classes of knowledge can enable unexpected discoveries. as bidney and clair point out, "organizing information based on location is a powerful idea—it has the capacity to bring together information from diverse communities of practice that a researcher may never have considered . . . 
‘place’ is interdisciplinary.”13 with this in mind, the isaw library has set out to create an alternative method of accessing items in its collection: a browseable, map-based interface for the discovery of library materials. literature review though geographic searching is undoubtedly useful for many different types of content, much of the work in using coordinate data and map-based representations of resources has centered on searching for printed maps and, more recently, geospatial datasets. in an article published in 2007, buckland et al. issued a challenge to libraries to complement existing text-string toponymic terminology with coordinate data.14 perhaps unsurprisingly, the most progress in meeting this challenge has been made in the area of cartographic collections. in a 2010 article, bidney discussed the library of congress’s then-new requirement of coordinates in records describing maps, and explores the possibility of using this metadata to create a geographic search interface.15 a 2014 follow-up article by bidney and clair expanded this argument to include not just cartographic materials, but all library resources, challenging libraries to develop new interfaces to make use of geospatial data.16 the most advanced geospatial search interfaces have been developed for cartographic and geospatial data. for example, geoblacklight (http://geoblacklight.org) offers an excellent map-based interface, but it is intended primarily for cartographic and gis data specifically, and not library resources more broadly. the mapfast project described by bennett et al. in 2011 pursues goals similar to our pleiadesbased discovery system.17 using fast (faceted application of subject terminology) headings, which are already present in many marc records, this project creates a searchable map via the google maps api. each fast geographic heading creates a point on the map which, when clicked, brings the user to a precomposed search in the library catalog for the corresponding controlled subject heading. one limitation to the mapfast model is the absence of geographic coordinates on many of the lc authority records from which fast headings are derived: at the time that bennett et al. described the project, coordinates were available for only 62.5 percent of fast geographic headings; additional coordinates came from the geonames database (http://www.geonames.org/).18 moreover, the method of retrieving these coordinates is based on text string matching, which introduces the possibility of errors resulting from the lack of coordination between toponyms in fast and geonames. in exploring other mapping projects, we looked most closely at projects with a focus on the ancient world, including pelagios (http://commons.pelagios.org), its geographic search tool peripleo,19 and china historical gis (chgis, http://sites.fas.harvard.edu/~chgis). as described by simon et al. 
in 2016, pelagios offers a shared framework for researchers in classical history to explore geographic connections, and several applications of its data resemble our desired outcome.20 similarly, merrick lex berman's work with the api provided by china historical gis in connection with library metadata provided important guidelines and points of comparison.21 we also explored mapping projects outside of the context of antiquity, including maphappy, the biodiversity heritage library's map of lcsh headings, and the map interface developed for phillyhistory.org.22

first steps: metadata

to develop a system for mapping the isaw library's collection, we began by working with smaller sets of metadata. our initial collection map, which served as a proof of concept, represented the titles available in the ancient world digital library (awdl, http://dlib.nyu.edu/ancientworld), an online e-book reader created by the isaw library in collaboration with nyu's digital library technical services department. when we initially created this interface, called the awdl atlas, awdl contained a small, manageable set of about one hundred titles. working in a spreadsheet, we assigned geographic coordinates to each of these titles and mapped them using google fusion tables (https://fusiontables.google.com). fusion tables, launched by google in june 2009, is a cloud-based platform for data management that includes a number of visualization tools, including a mapping feature that builds on the infrastructure of google maps.23 the fusion tables map created for awdl shows a pinpoint for each title in the e-book library; when clicked, each pinpoint gives basic bibliographic data about the title and a link to the e-book itself. one problem with this initial map was that it did little to show precision—a pinpoint representing a specific archaeological site in iraq looks the same on the map as a pinpoint representing the entirety of central asia. nevertheless, the basic functionality of the awdl atlas performed as desired, providing a geographic search interface for a concrete set of resources. for our next collection map, we turned our attention to our monthly lists of new titles in our library's collection. at the end of each month, nyu's library systems team sends our library a marc-xml report listing all of the items added to our library's collection that month. for several years now, we have been publishing this data on our library's website in human-readable html form and adding the titles to a library in the open-source citation management platform zotero, allowing our users multiple pathways to discovering resources within our collection.24 beginning in august 2016, we began creating monthly maps of these titles, using a variation of the workflow that we devised for the awdl atlas. to better represent the different levels of precision that each point represents, we implemented a color-coded range of four levels of precision, from site-specific archaeological publications to materials covering a broad, multi-country range, with a fifth category for cross-cultural materials and other works that can't be well represented in geographic form. (these items are grouped in the mediterranean sea on the monthly new titles maps, but in a full-collection map would most likely be either excluded or represented by multiple points, as appropriate.) the initial new titles maps took a significant amount of title-by-title work to create.
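the monthly workflow starts from that marc-xml report. a minimal sketch of reading such a report (again assuming pymarc, with a hypothetical file name) might pull out each title and its geographic headings before any coordinates are assigned:

```python
from pymarc import parse_xml_to_array

records = parse_xml_to_array("new_titles_2016_08.xml")   # hypothetical monthly report
for record in records:
    title_field = record["245"]                          # title statement
    title = title_field["a"] if title_field else "[no title]"
    headings = [f["a"] for f in record.get_fields("651") if f["a"]]
    print(title, "->", headings or "no geographic heading")
```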
coordinates and assessments of precision needed to be assigned for each title individually. we quickly began looking for ways to automate the process of geolocation, and soon settled on using data from pleiades to increase the efficiency of creating each map.25 we set our sights on marc field 651 (subject added entry-geographic name) as the best place in a marc record to put pleiades data. as a subject access field, the 651 is structured to contain a searchable text string and can also include a $0 with a uri associated with that text string. however, under current cataloging guidelines, catalogers are not free to use any uri they choose in this field: the library of congress maintains a list of authorized sources for subject terms to be used in 651 and other subject-access fields.26 in august 2016, the isaw library submitted a proposal to the library of congress for pleiades to be approved as a source of authoritative subject data and added to lc's list of subject heading and term source codes. the proposal was approved the following month, and by early 2017 the lc-assigned code was approved for use in oclc records. with this approval in place, we began incorporating pleiades uris in marc records for items held by the isaw library. we used the names of pleiades resources as subject terms in new 651 (subject added entry-geographic name) fields, specifying pleiades as the source of the subject term in subfield $2 and adding the pleiades uri in a $0:

figure 1. fields from a marc record showing an lcnaf geographic heading and the corresponding pleiades heading, with uri in $0.

figure 1 shows a detail from oclc record #986242751, which describes a book containing texts from cuneiform tablets discovered at the hittite capital city hattusa. this detail shows both the lcnaf and pleiades geographic headings assigned to this record. (in addition to providing a uri for the site, the pleiades heading also enhances keyword searches: the 651 field is searchable in the nyu library catalog, thus providing keyword access to one of the city's ancient names). the second 651 field contains a second indicator 7, indicating that the source of the subject term is specified in $2, where the lc-approved code "pleiades" is specified. this is followed by a $0 containing the uri for the pleiades place resource describing hattusa. our monthly reports of new titles now contain a field for pleiades uris. currently, we are not querying pleiades directly for coordinates, but rather are using the uri as a vertical-lookup term within a spreadsheet of each month's new titles, which is checked against a separate data file that matches pleiades uris to coordinate pairs.27 for places where no pleiades heading is available, we have begun using uris from the getty thesaurus of geographic names (tgn), marc-syntax fast identifiers, and unique lcnaf text strings, using the same vertical-lookup process to retrieve previously researched coordinate pairs for those places. next, we retrieve coordinates for newly appearing pleiades locations, research the locations of new non-pleiades places, and add both to the local database of places used. lastly, due to google fusion tables' inability to display more than one item on a single coordinate pair, prior to uploading the map data to fusion tables we examine it for duplicated coordinate pairs, manually altering them to scatter these points to nearby locations.
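a rough python equivalent of that spreadsheet-based vertical lookup and manual scattering might look like the following; file and column names are hypothetical, and the jitter distance is arbitrary:

```python
import csv
import random

# pleiades uri -> (lat, lon), built from a local file of previously researched coordinates
coords = {}
with open("places.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        coords[row["uri"]] = (float(row["lat"]), float(row["lon"]))

def scatter(point, seen, jitter=0.01):
    """nudge a duplicated point slightly so every title stays visible on the map."""
    lat, lon = point
    while point in seen:
        point = (lat + random.uniform(-jitter, jitter),
                 lon + random.uniform(-jitter, jitter))
    return point

seen = set()
with open("new_titles.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        point = coords.get(row["pleiades_uri"])
        if point is None:
            continue  # no pleiades heading: fall back to tgn, fast, or lcnaf lookups
        point = scatter(point, seen)
        seen.add(point)
        print(row["title"], point)
```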
the overall amount of time spent on cleaning data and preparing each month's map has decreased from more than a full day's work in august 2016 to about two hours in january 2018.

figure 2. a screenshot from the isaw library new titles map for january 2018, showing an item-specific information window (http://isaw.nyu.edu/library/find/newtitles-2017-18/2018-jan).

challenges

in developing the isaw library's mapping program, we had to overcome several challenges: early in the project, we needed to address the philosophical differences between how pleiades and lcnaf think about places and toponyms. the concept of "place" in pleiades is broad, and contains cities, structures, archaeological sites, kingdoms, provinces, and other types of administrative divisions, roads, geological features, culturally defined regions, and ethnic groups: "the term ['place'] applies to any locus of human attention, material or intellectual, in a real-world geographic context."28 in functional terms, a "place" in pleiades is a top-level resource containing one or more other types of data:

• one or more locations, consisting of either a precise point, an approximate rectangular polygon, or a precise polygon formed by multiple points;
• one or more names, in one or more ancient or modern languages;
• one or more connections to other place resources, generally denoting a geospatial or political/administrative connection.

locations, names, and connections contain further metadata, including chronological attestations and citations to data sources. no one of these components is a requirement—even locations are optional, as ancient texts contain references to cities and structures whose geospatial location is unknown. by contrast, library of congress rules focus almost exclusively on names—that is, text strings. there are two main categories of geographic names, as described in instruction sheet h690 of the subject headings manual (shm):

headings for geographic names fall into two categories: (1) names of political jurisdictions, and (2) non-jurisdictional geographic names. headings in the first category are established according to descriptive cataloging conventions with authority records that reside in the name authority file . . . headings in the second category are established . . . with authority records that reside in the subject authority file.29

the two categories—essentially definable as political entities and geographic regions—are both of interest to the shm only as represented by text strings. the purpose of identifying places within the framework of lc's guidelines is to enable text-based searching and collocation of items based on uniform, human-readable terminology. at the beginning of this project, it was important to acknowledge, explore, and understand this fundamental difference, and to understand the different purposes of an authority file (identifying unique text strings), a linked data gazetteer (assembling and linking many different kinds of geospatial and toponymic data), and our mapping project (identifying coordinate pairs related to specific library resources).
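each pleiades place resource also has a machine-readable serialization: appending /json to the place uri returns the locations, names, and connections described above. the sketch below reads one such resource with the requests library; the field names (title, reprPoint, names, connections) reflect the pleiades json export at the time of writing and should be checked against the current schema.

```python
import requests

uri = "https://pleiades.stoa.org/places/893951"   # babylon, one of the uris cited in note 30 below
place = requests.get(uri + "/json", timeout=30).json()

print(place["title"])                             # human-readable place name
print(place.get("reprPoint"))                     # representative [longitude, latitude], if a location exists
print([name.get("romanized") for name in place.get("names", [])])
print(len(place.get("connections", [])), "connections to other place resources")
```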
in our project, this philosophical gap manifested as a difference between the primary and secondary importance of authorized text strings and uris: in lcsh and lcnaf, the text string is primary, and the uri secondary (where it is used at all); in pleiades and many other linked-data sources, uris are primary and text strings secondary. lcsh and lcnaf text strings are unique, and can be considered as a sort of identifier, but they do not have the machine-readable functionality of a uri. in pleiades, the machine-readable uri is primary, and can be used to return coordinates, place names, and other human- or machine-readable data. the name of a pleiades place resource can be construed as a "subject heading," but these text strings are not necessarily unique, and additional data from the pleiades resource may be required for disambiguation by a human reader.30 toponymic terminology—that is, human-readable text strings—is just one type of data that pleiades contains, alongside geospatial data, temporal tags, and linkages between resources. one example of a recent change in pleiades data illustrates the fundamental difference in approach between authority control and uri management. until recently, pleiades contained two different place resources with the heading "aegyptus" (https://pleiades.stoa.org/places/766 and https://pleiades.stoa.org/places/981503), both referring to the general region of egypt. both of these resources were recently updated, and the title text of both was changed: /places/766 was retitled "aegyptus (roman imperial province)" and /places/981503 became "ancient egypt (region)." the distinction illustrates the difficulty in assigning names to places over long spans of time: egypt, as understood by pre-ptolemaic inhabitants of the nile region, had a different meaning than the administrative region established after octavian's defeat of marc antony and cleopatra—or, for that matter, from the predynastic kingdoms of upper and lower egypt, the ottoman eyalet of misr, and the modern republic of egypt. prior to this change in pleiades, both uris were applied to marc records for items held by the isaw library, under the heading "aegyptus." from a linked-data standpoint, there is no real problem here: the uris still link to resources describing different historical places called "egypt," including the coordinate data needed for isaw's collection maps. but from the standpoint of authority control, the subject term "aegyptus" on these records is now "wrong," representing a deprecated term, and should be updated. even here, though, a linked-data model has benefits that a text-string-based model lacks. even if they contain the same text string heading, the uri means there is no ambiguity between the two headings, and the text strings can be replaced with a batch operation based on the differences in their uris. getting away from text-string-based thinking will represent a major philosophical challenge for libraries as we move toward a linked data model for library metadata, but the many benefits of linked data will make that shift worthwhile.
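the kind of batch operation described above can be sketched in a few lines. the example below assumes pymarc and hypothetical file names; it replaces the deprecated heading string wherever a 651 field carries one of the two egypt uris, leaving every other field untouched.

```python
from pymarc import MARCReader, MARCWriter

# current pleiades titles keyed by uri; the uri, not the text string, identifies the place
retitled = {
    "https://pleiades.stoa.org/places/766": "aegyptus (roman imperial province)",
    "https://pleiades.stoa.org/places/981503": "ancient egypt (region)",
}

with open("isaw_records.mrc", "rb") as src, open("updated.mrc", "wb") as dst:
    writer = MARCWriter(dst)
    for record in MARCReader(src):
        for field in record.get_fields("651"):
            for uri in field.get_subfields("0"):
                if uri in retitled:
                    field.delete_subfield("a")                # drop the deprecated text string
                    field.add_subfield("a", retitled[uri])    # insert the current pleiades title
        writer.write(record)
    writer.close()
```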
google fusion tables represents a future hurdle that the isaw library's mapping project will need to clear. in december 2018, google announced that the fusion tables project would be discontinued, and that all embedded fusion tables visualizations will cease functioning on december 3, 2019.31 fortunately, the isaw library has already begun developing an alternative solution that does not rely on the deprecated fusion tables application. the core methodology used in developing our maps will remain the same, however. lastly, the geographic breadth of our collection reveals the limitations of pleiades as the sole data source for this project. at its inception, pleiades was focused on greco-roman antiquity, and though it has expanded over time, central and east asia—regions of central interest to the isaw library—are largely not covered. because all contributions to pleiades undergo peer review prior to being published online, pleiades' editors are understandably reluctant to commit to expanding their coverage eastward until the editorial team includes experts in these geographic areas. however, though we began this project with pleiades, there is no barrier to using other sources of geographic data, such as china historical gis, the getty thesaurus of geographic names (tgn, http://www.getty.edu/research/tools/vocabularies/tgn/index.html), geonames (http://www.geonames.org/), the world-historical gazetteer (http://whgazetteer.org/), or the library of congress's linked data service (http://id.loc.gov/). the same procedures we've used with pleiades can be applied to any reliable data source with consistently formatted data.

future directions

we have already begun to move away from the google fusion tables model, and are working to develop our own javascript-based map application using mapbox (https://www.mapbox.com/) and leaflet (https://leafletjs.com/). when completed, this updated mapping application will actively query a database of pleiades headings for coordinates, further automating the process of map creation. we are looking into different methods of encoding and representing precision—for example, using points and polygons to represent sites and regions, respectively. the leaflet map interface will also enable us to show multiple items for single locations, something fusion tables is unable to do, and will thus eliminate the need to manually deduplicate coordinate pairs. to expand the number of records that contain pleiades uris, we are developing a crosswalk between existing lc geographic headings and pleiades place resources. when completed, we will use this crosswalk to batch-update our older records with pleiades data where appropriate. the crosswalk will contain uris from both pleiades and the lc linked data service, and it will be provided to the pleiades team so that pleiades resources can incorporate lc metadata as well. we are also exploring further user applications of map-based search. one function we hope to develop is a geographic notification service, allowing users to define polygonal areas of interest on the map. when a new point is added that falls within these polygons, the user will be notified of a new item of potential interest. some user training will be required to ensure that users define their areas of interest in such a way that they will receive results that interest them—for example, a user interested in the roman empire will likely be interested in titles about the mediterranean region in general, and may need to draw their bounding box so that it encompasses the open sea as well as sites on land.
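the notification check itself is a standard point-in-polygon test. the sketch below uses the shapely library (an assumption; the production service is still being designed), with rough illustrative coordinates given in (longitude, latitude) order.

```python
from shapely.geometry import Point, Polygon

# a user-drawn area of interest covering much of the mediterranean basin
area_of_interest = Polygon([(-6.0, 30.0), (37.0, 30.0), (37.0, 46.0), (-6.0, 46.0)])

def notify_if_relevant(title, lon, lat):
    """print a notification when a newly mapped title falls inside the user's polygon."""
    if area_of_interest.contains(Point(lon, lat)):
        print(f"new title of potential interest: {title}")

notify_if_relevant("excavations at hattusa", 34.6, 40.0)   # inside the polygon: triggers a notification
notify_if_relevant("bronze age bactria", 66.9, 36.7)       # outside the polygon: silent
```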
it will also require thoughtfulness about where users are likely to look for points of interest, especially for empires and other historic entities that do not correspond to modern geopolitical boundaries (for example, the byzantine empire or scythia). additionally, we hope to begin working with chronological as well as geospatial data, with hopes of being able to add a time slider to the library map. this would enable users to focus on particular periods of history as well as geographic regions—for example, users interested in bronze age anatolia could limit results to that time period, so that they can browse the map without material from the byzantine empire “cluttering” their browsing experience.32 the online temporal gazetteer periodo (http://perio.do/) provides a rich data source to draw on, including uris for individual chronological periods and beginning and end dates for each defined temporal term. following a proposal submitted by the isaw library, periodo was approved by the library of congress as a source of subject terminology in september 2018, and its headings and uris are now useable in marc. however, though lcsh headings for geographic places are often quite good, the guidelines for chronological headings and subdivisions are often inadequate for describing ancient historical periods, and thus enacting a chronological slider, though highly desirable, would require a large amount of manual changes and additions to existing metadata. the isaw library’s collection mapping project has accomplished its initial goal of providing a geospatial interface for the discovery of materials in our library collection. as we expand our mapping project to incorporate more of our collection, we also hope that our model can prove useful to other institutions looking for practical applications of uris in marc, alternative discovery methods to text-based searching, or both. references and notes 1 for a summary of this challenging problem, see brighid m. gonzales, “linking libraries to the web: linked data and the future of the bibliographic record,” information technology & libraries 33, no. 4 (dec. 2014): 10–22, https://doi.org/10.6017/ital.v33i4.5631. 2 see, for example, timothy w. cole et al., “library marc records into linked open data: challenges and opportunities,” journal of library metadata 13, no. 2–3 (july 2013): 178, https://doi.org/10.1080/19386389.2013.826074. 3 deutsche nationalbibliothek, “marc proposal no. 2007-06: changes for the german and austrian conversion to marc 21,” marc standards, may 25, 2007, https://www.loc.gov/marc/marbi/2007/2007-06.html. 4 ibid. 5 for a detailed discussion of the importance of actionability in unique identifiers, see jackie shieh and terry reese, “the importance of identifiers in the new web environment and using the uniform resource identifier (uri) in subfield zero ($0): a small step that is actually a big map as a search box | mckee 50 https://doi.org/10.6017/ital.v38i1.10592 step,” journal of library metadata 15, no. 3–4 (oct. 2, 2015): 220–23, https://doi.org/10.1080/19386389.2015.1099981. 6 british library, “marc proposal no. 2010-06: encoding the international standard name identifier (isni) in the marc 21 bibliographic and authority formats,” marc standards, may 17, 2010, https://www.loc.gov/marc/marbi/2010/2010-06.html. 7 rda/marc working group, “marc discussion paper no. 2010-dp02: encoding uris for controlled values in marc records,” marc standards, dec. 14, 2009, https://www.loc.gov/marc/marbi/2010/2010-dp02.html. 
8 for a summary of this task group’s work to date, see jackie shieh, “reports from the program for cooperative cataloging task groups on uris in marc & bibframe,” jlis.it: italian journal of library, archives and information science = rivista italiana di biblioteconomia, archivistica e scienza dell’informazione 9, no. 1 (2018): 110–19, https://doi.org/10.4403/jlis.it-12429. 9 pcc task group on uri in marc and the british library, “marc discussion paper no. 2016dp18: redefining subfield $0 to remove the use of parenthetical prefix ‘(uri)’ in the marc 21 authority, bibliographic, and holdings formats,” marc standards, may 27, 2016, https://www.loc.gov/marc/mac/2016/2016-dp18.html; marc advisory committee, “mac meeting minutes” (ala annual meeting, orlando, fl, 2016), https://www.loc.gov/marc/mac/minutes/an-16.html. for a cumulative description of the scope of this task group’s work, see pcc task group on uris in marc, “pcc task group on uris in marc: year 2 report to poco, october 2017” (program for cooperative cataloging, oct. 23, 2017), https://www.loc.gov/aba/pcc/documents/poco2017/pcc_uri_tg_20171015_report.pdf. 10 shieh and reese, “the importance of identifiers in the new web environment and using the uniform resource identifier (uri) in subfield zero ($0),” 221. 11 shieh and reese, “the importance of identifiers in the new web environment and using the uniform resource identifier (uri) in subfield zero ($0)”; for a discussion of a related problem (finding a place for a uri in marc authority records), see ioannis papadakis, konstantinos kyprianos, and michalis stefanidakis, “linked data uris and libraries: the story so far,” d-lib magazine 21, no. 5/6 (june 2015), https://doi.org/10.1045/may2015-papadakis. 12 michael buckland et al., “geographic search: catalogs, gazetteers, and maps,” college & research libraries 68, no. 5 (sept. 2007): 376, https://doi.org/10.5860/crl.68.5.376. 13 marcy bidney and kevin clair, “harnessing the geospatial semantic web: toward place-based information organization and access,” cataloging & classification quarterly 52, no. 1 (2014): 70, https://doi.org/10.1080/01639374.2013.852038. 14 buckland et al., “geographic search.” 15 marcy bidney, “can geographic coordinates in the catalog record be useful?,” journal of map & geography libraries 6, no. 2 (july 13, 2010): 140–50, https://doi.org/10.1080/15420353.2010.492304. information technology and libraries | march 2019 51 16 bidney and clair, “harnessing the geospatial semantic web.” 17 rick bennett et al., “mapfast: a fast geographic authorities mashup with google maps,” code4lib journal, no. 14 (july 25, 2011): 1–9, http://journal.code4lib.org/articles/5645. 18 bennett et al., 1. 19 rainer simon et al., “peripleo: a tool for exploring heterogeneous data through the dimensions of space and time,” the code4lib journal, no. 31 (jan. 28, 2016), http://journal.code4lib.org/articles/11144. 20 rainer simon et al., “the pleiades gazetteer and the pelagios project,” in placing names: enriching and integrating gazetteers, ed. merrick lex berman, ruth mostern, and humphrey southall, the spatial humanities (bloomington: indiana univ. pr., 2016), 97–109. 21 merrick lex berman, “linked places in the context of library metadata” (nov. 10, 2016), https://sites.fas.harvard.edu/~chgis/work/docs/papers/hvd_librarylinkeddatagroup_lexb erman_20161110.pdf. 22 lisa r. johnston and kristi l. 
jensen, “maphappy: a user-centered interface to library map collections via a google maps ‘mashup,’” journal of map & geography libraries 5, no. 2 (july 2009): 114–30, https://doi.org/10.1080/15420350903001138; chris freel et al., “geocoding lcsh in the biodiversity heritage library,” the code4lib journal, no. 2 (mar. 24, 2008), http://journal.code4lib.org/articles/52; gina l. nichols, “merging special collections with gis technology to enhance the user experience,” slis student research journal 5, no. 2 (2015): 52–71, http://scholarworks.sjsu.edu/slissrj/vol5/iss2/5/. 23 hector gonzalez et al., “google fusion tables: data management, integration and collaboration in the cloud,” in proceedings of the 1st acm symposium on cloud computing (indianapolis: acm, 2010), 175–80, https://doi.org/10.1145/1807128.1807158. 24 the isaw library new titles library is available at http://www.zotero.org/groups/290269. 25 since our interest was in obtaining coordinate data, we determined that lcnaf and lcsh would not be appropriate to our needs. although some marc authority records include coordinate data, it is not present in all geographic headings. moreover, where coordinate data is available in the authority file, it is not published in the rdf form of the records via the lc linked data service (http://id.loc.gov/). entries in the getty thesaurus of geographic names (tgn, http://www.getty.edu/research/tools/vocabularies/tgn/index.html) often include structured coordinate data, and in recent months we have begun using tgn uris when a pleiades uri is not available. 26 library of congress, network development & marc standards office, “subject heading and term source codes: source codes for vocabularies, rules, and schemes,” library of congress, jan. 9, 2018, https://www.loc.gov/standards/sourcelist/subject.html. 27 it is worth noting that, since the uris are not currently being queried in the preparation of the map, much of this work could have been accomplished with pre-uri identifiers from marc map as a search box | mckee 52 https://doi.org/10.6017/ital.v38i1.10592 data, or even unique text strings. one benefit of using uris is ease of access to coordinate data, especially from pleiades. pleiades puts coordinates front and center in its display, and even features a one-click feature to copy coordinates to the clipboard. moreover, the entire pleiades dataset is available for download, making the retrieval of coordinates automatable locally, reducing keystrokes even without active database querying. the primary benefit of using uris instead of other forms of unique identifiers, however, is forward-compatibility. this is of immediate importance, since we are developing an updated version of the map that will actively query pleiades for coordinates. future benefits of the presence of uris also include links from pleiades into the library catalog, based on records in which place uris appear. if and when the entire catalog shifts to a linked-data model, the benefits of having these uris present expands exponentially, as this metadata will then be available to all manner of outside sources. 28 sean gillies et al., “conceptual overview,” pleiades, mar. 24, 2017, https://pleiades.stoa.org/help/conceptual-overview. 29 library of congress, “subject headings manual (shm)” (library of congress, 2014), h 690, https://www.loc.gov/aba/publications/freeshm/freeshm.html. 
30 for example, pleiades contains two place resources with the identical name "babylon": one the mesopotamian city and capital of the region known as babylonia (https://pleiades.stoa.org/places/893951); the other the site of the muslim capital of egypt, al-fusṭāṭ, known in late antiquity as babylon (https://pleiades.stoa.org/places/727082).

31 google, "notice: google fusion tables turndown," fusion tables help, dec. 11, 2018, https://support.google.com/fusiontables/answer/9185417.

32 a method of chronological browsing was described in vivien petras, ray r. larson, and michael buckland, "time period directories: a metadata infrastructure for placing events in temporal and geographic context," in digital libraries, 2006. jcdl '06. proceedings of the 6th acm/ieee-cs joint conference on digital libraries (ieee, 2006), 151–60, https://doi.org/10.1145/1141753.1141782.

self-archiving with ease in an institutional repository: microinteractions and the user experience

sonya betz and robyn hall

sonya betz (sonya.betz@ualberta.ca) is digital initiatives project librarian, university of alberta libraries, university of alberta, edmonton, alberta. robyn hall (hallr27@macewan.ca) is scholarly communications librarian, macewan university library, macewan university, edmonton, alberta.

abstract

details matter, especially when they can influence whether users engage with a new digital initiative that relies heavily on their support. during the recent development of macewan university's institutional repository, the librarians leading the project wanted to ensure the site would offer users an easy and effective way to deposit their works, in turn helping to ensure the repository's long-term viability. the following paper discusses their approach to user-testing, applying dan saffer's framework of microinteractions to how faculty members experienced the repository's self-archiving functionality. it outlines the steps taken to test and refine the self-archiving process, shedding light on how others may apply the concept of microinteractions to better understand a website's utility and the overall user experience that it delivers.

introduction

one of the greatest challenges in implementing an institutional repository (ir) at a university is acquiring faculty buy-in. support from faculty members is essential to ensuring that repositories can make online sharing of scholarly materials possible, along with the long-term digital preservation of these works. many open access mandates have begun to emerge around the world, developed by universities, governments, and research funding organizations, which serve to increase participation through requiring that faculty contribute their works to a repository.1 however, for many staff managing irs at academic libraries there are no enforceable mandates in place, and only a fraction of faculty works can be contributed without copyright implications when author agreements transfer copyrights to publishers. persuading faculty members to take the time to sort through their works and self-archive those that are not bound by rights restrictions is a challenge. standard installations of popular ir software, including dspace, digital commons, and eprints, do little to help facilitate easy and efficient ir deposits by faculty.
as dorothea salo writes in a widely cited critique of irs managed by academic libraries, the "'build it and they will come' proposition has been decisively wrong."2 a major issue she points out is that repositories were predicated on the "assumption that faculty would deposit, describe, and manage their own material."3 seven years after the publication of her article, a vast majority of the more than 2,600 repositories currently operating around the world still function in this way and struggle to attract widespread faculty support.4 to deposit works into these systems, faculty are often required to fill out an online form to describe and upload each work individually. this can be a laborious process that includes deciphering lengthy copyright agreements, filling out an array of metadata fields, and ensuring file formats or file sizes that are compatible with the constraints of the software. in august of 2014, macewan university library in edmonton, alberta, launched an ir, research online at macewan (ro@m; http://roam.macewan.ca). our hope was that ro@m's simple user interface and straightforward submission process would help to bolster faculty contributions. the site was built using islandora, an open-source software framework that offered the project developers substantial flexibility in appearance and functionality. in an effort to balance their desire for independence over their work with ease of use, faculty and staff have the option of submitting to ro@m in one of two ways: they can choose to complete a brief process to create basic metadata and upload their work, or they can simply upload their work and have ro@m staff create metadata and complete the deposit. thoroughly testing both of these processes was critical to the success of the ir. we wanted to ensure that there were no obstacles in the design that would dissuade faculty members from contributing their works once they had made the decision to start the contribution process. as the primary means of adding content to the ir, and as a process that other institutions have found problematic, carefully designing each step of how a faculty contributor submits material was our highest priority. to help us focus our testing on some of these important details, and to provide a framework of understanding for refining our design, we turned to dan saffer's 2013 book microinteractions: designing with details. the following case study describes our use of microinteractions as a user-testing approach for libraries and discusses what we learned as a result.
we  seek  to  shed  light  on  how  other  repository  managers  might  envision  and  structure  their   own  self-­‐archiving  processes  to  ensure  buy-­‐in  while  still  relying  on  faculty  members  to  do  some  of   the  necessary  legwork.  additionally,  we  lay  out  how  other  digital  initiatives  may  embrace  the   concept  of  microinteractions  as  a  means  of  better  understanding  the  relationship  between  the   utility  of  a  website  and  the  true  value  of  positive  user  experience.     literature  review   user  experience  and  self-­‐archiving  in  institutional  repositories   user  experience  (ux)  in  libraries  has  gained  significant  traction  in  recent  years  and  provides  a   useful  framework  for  exploring  how  our  users  are  interacting  with,  and  finding  meaning  in,  the   library  technologies  we  create  and  support.  although  there  is  still  some  disagreement  around  the   definition  and  scope  of  what  exactly  we  mean  when  we  talk  about  ux,  there  seems  to  be  general   consensus  that  paying  attention  to  ux  shifts  focus  from  the  usability  of  a  product  to  more   nonutilitarian  qualities,  such  as  meaning,  affect,  and  value.5  hassenzhal  simply  defines  ux  as  a     information  technologies  and  libraries  |  september  2015   45   “momentary,  primarily  evaluative  feeling  (good-­‐bad)  while  interacting  with  a  product  or  service.”6   hassenzhal,  diefenbach,  and  goritz  argue  that  positive  emotional  experiences  with  technology   occur  when  the  interaction  fulfills  certain  psychological  needs,  such  as  competence  or  popularity.7   the  2010  iso  standard  for  human-­‐centered  design  for  interactive  systems  defines  ux  even  more   broadly,  suggesting  that  it  “includes  all  the  users’  emotions,  beliefs,  preferences,  perceptions,   physical  and  psychological  responses,  behaviors  and  accomplishments  that  occur  before,  during   and  after  use.”8  however,  when  creating  tools  for  library  environments,  it  can  be  difficult  for   practitioners  to  translate  ambiguous  emotional  requirements,  such  as  satisfying  emotional  and   psychological  needs  or  increasing  motivation,  with  pragmatic  outcomes,  such  as  developing  a   piece  of  functionality  or  designing  a  user  interface.   it  has  been  well  documented  that  repository  managers  struggle  to  motivate  academics  to  self-­‐ archive  their  works.9  however,  the  literature  focusing  on  how  ir  websites’  self-­‐archiving   functionality  helps  or  hinders  faculty  support  and  engagement  is  sparse.  one  study  of  note  was   conducted  by  kim  and  kim  in  2006,  who  led  usability  testing  and  focus  groups  on  an  ir  in  south   korea.  10  they  provide  a  number  of  ways  to  improve  usability  on  the  basis  of  their  findings,  which   include  avoiding  jargon  terms  and  providing  comprehensive  instructions  at  points  of  need  rather   than  burying  them  in  submenus.  
similarly,  veiga  e  silva,  goncalves,  and  laender  reported  results   of  usability  testing  conducted  on  the  brazilian  digital  library  of  computing,  which  confirmed  their   initial  goals  of  building  a  self-­‐archiving  service  that  was  easily  learned,  comfortable,  and   efficient.11  the  authors  of  both  of  these  studies  suggest  that  user-­‐friendly  design  could  help  to   ensure  the  active  support  and  sustainability  of  their  services,  but  long-­‐term  use  remained  to  be   seen  at  the  time  of  publication.  meanwhile,  bell  and  sarr  recommend  integrating  value-­‐added   features  into  ir  websites  as  a  way  to  attract  faculty.12  their  successful  strategy  for  reengineering  a   struggling  ir  at  the  university  of  rochester  included  adding  tools  to  allow  users  to  edit  metadata   and  add  and  remove  files,  and  providing  portfolio  pages  where  faculty  could  list  their  works  in  the   ir,  link  to  works  available  elsewhere,  detail  their  research  interests,  and  upload  a  copy  of  their  cv.   although  the  question  remains  as  to  whether  a  positive  user  experience  in  an  ir  can  be  a   significant  motivating  factor  for  increasing  faculty  participation,  there  seems  to  be  enough   evidence  to  support  its  viability  as  an  approach.   applying  microinteractions  to  user  testing   dan  saffer’s  2013  book,  microinteractions:  designing  with  details,  follows  logically  from  the  ux   movement.  although  he  uses  the  phrase  “user  experience”  sparingly,  saffer  consistently  connects   interactive  technologies  with  the  emotional  and  psychological  mindset  of  the  user.  saffer  focuses   on  “microinteractions,”  which  he  defines  as  “a  contained  product  moment  that  revolves  around  a   single  use  case.”13  saffer  argues  that  well-­‐designed  microinteractions  are  “the  difference  between   a  product  you  love  and  product  you  tolerate.”14  saffer’s  framework  is  an  effective  application  of  ux   theory  to  a  pragmatic  task.  not  only  does  he  privilege  the  emotional  state  of  the  user  as  a  priority     self-­‐archiving  with  ease  in  an  institutional  repository  |  betz  and  hall     doi:  10.6017/ital.v34i3.5900   46   for  design,  he  also  provides  concrete  recommendations  for  designing  technology  that  provokes   positive  psychological  states  such  as  pleasure,  engagement,  and  fun.   defining  what  we  mean  by  a  “microinteraction”  is  important  when  translating  saffer’s  theory  to  a   library  environment.  he  describes  a  microinteraction  as  “a  tiny  piece  of  functionality  that  only   does  one  thing  .  .  .  every  time  you  change  a  setting,  sync  your  data  or  devices,  set  an  alarm,  pick  a   password,  turn  on  an  appliance,  log  in,  set  a  status  message,  or  favorite  or  like  something,  you  are   engaging  with  a  microinteraction.”15  in  libraries,  many  microinteractions  are  built  around   common  user  tasks  such  as  booking  a  group-­‐use  room,  placing  a  hold  on  an  item,  registering  for  an   event,  rating  a  book,  or  conducting  a  search  in  a  discovery  tool.  a  single  piece  of  interactive  library   technology  may  have  any  number  of  discrete  microinteractions,  and  often  are  part  of  a  larger   ecosystem  of  connected  processes.  
for  example,  an  integrated  library  system  is  composed  of   hundreds  of  microinteractions  designed  both  for  end  users  and  library  staff,  while  a  self-­‐checkout   machine  is  primarily  designed  to  facilitate  a  single  microinteraction.   saffer’s  framework  provided  a  valuable  new  lens  on  how  we  could  interpret  users’  interactions   with  our  ir.  while  we  generally  conceptualize  an  ir  as  a  searchable  collection  of  institutional   content,  we  can  also  understand  it  as  a  collection  of  microinteractions.  for  example,  ro@m’s  core   is  microinteractions  that  enable  tasks  such  as  searching  content,  browsing  content,  viewing  and   downloading  content,  logging  in,  submitting  content,  and  contacting  staff.  ro@m  also  includes   microinteractions  for  staff  to  upload,  review,  and  edit  content.  as  discussed  above,  one  of  the   primary  goals  when  developing  our  ir  was  to  allow  faculty  to  deposit  scholarly  content,  such  as   articles  and  conference  papers,  directly  to  the  repository.  we  wanted  this  process  to  be  simple  and   intuitive,  and  for  faculty  to  have  some  control  over  the  assignation  of  keywords  and  other   metadata,  but  also  to  have  the  option  to  simply  submit  content  with  minimal  effort.  we  decided  to   employ  user  testing  to  carefully  examine  the  deposit  process  as  a  discrete  microinteraction  and  to   apply  saffer’s  framework  as  a  means  of  assessing  both  functionality  and  ux.  we  hoped  that   focusing  on  the  details  of  that  particular  microinteraction  would  allow  us  to  make  careful  and   thoughtful  design  choices  that  would  lead  to  a  more  consistent  and  pleasurable  ux.   method  and  case  study   we  conducted  two  rounds  of  user  testing  for  the  self-­‐archiving  process.  our  initial  user  testing   was  conducted  in  january  2014.  we  asked  seven  faculty  to  review  and  comment  on  a  mockup  of   the  deposit  form  to  test  the  workflow.  this  simple  exercise  allowed  us  to  confirm  the  steps  in  the   upload  process,  and  identified  a  few  critical  issues  that  we  could  resolve  before  building  out  the  ir   in  islandora.  after  completing  the  development  of  the  ir,  and  with  a  working  copy  of  the  site   installed  on  our  user  acceptance  testing  (uat)  server,  we  conducted  a  second  round  of  in-­‐depth   usability  testing  within  our  new  microinteraction  framework.     in  april  2014  we  recruited  six  faculty  members  through  word  of  mouth  and  through  a  call  for   participants  in  the  university’s  weekly  electronic  staff  newsletter.  the  volunteers  represented   major  disciplines  at  macewan  university,  including  health  sciences,  social  sciences,  humanities,     information  technologies  and  libraries  |  september  2015   47   and  natural  sciences.  saffer  describes  a  process  for  testing  microinteractions  and  suggests  that  the   most  relevant  way  to  test  microinteractions  is  to  include  “hundreds  (if  not  thousands)  of   participants.”16  however,  he  goes  on  to  describe  the  most  effective  methods  of  testing  to  be   qualitative,  including  conversation,  interviews,  and  observation.  
testing  thousands  of  participants   with  one-­‐on-­‐one  interviews  and  observation  sessions  is  well  beyond  the  means  of  most  academic   libraries,  and  runs  counter  to  standard  usability  testing  methodology.  while  testing  only  six   participants  may  seem  like  a  small  number,  and  one  that  is  apt  to  render  inconclusive  results  and   sparse  feedback,  it  is  strongly  supported  by  usability  experts,  such  as  jakob  nielson.  during  the   course  of  our  testing,  we  quickly  reached  what  nielson  refers  to  in  his  piece  “how  many  test  users   in  a  usability  study?”  as  “the  point  of  diminishing  returns.”17  he  suggests  that  for  most  qualitative   studies  aimed  at  gathering  insights  to  inform  site  design  and  overall  ux,  five  users  is  in  fact  a   suitable  number  of  participants.  we  support  his  recommendation  on  the  basis  of  our  own   experiences;  by  the  fourth  participant,  we  were  receiving  very  repetitive  feedback  on  what   worked  well  and  what  needed  to  be  changed.   testing  took  place  in  faculty  members’  offices  on  their  own  personal  computers  so  that  they  would   have  the  opportunity  to  engage  with  the  site  as  they  would  under  normal  workday  circumstances.   each  user  testing  session  lasted  45  to  60  minutes,  and  was  facilitated  by  three  members  of  the   ro@m  team:  the  web  and  ux  librarian  guided  each  faculty  member  through  the  testing  process,   the  scholarly  communications  librarian  observed  the  interaction,  and  a  library  technician  took   detailed  notes  recording  participant  comments  and  actions.  each  faculty  member  was  given  an   article  and  asked  to  contribute  that  article  to  ro@m  using  the  uat  site.  the  ro@m  team  observed   the  entire  process  carefully,  especially  noting  any  problematic  interactions,  while  encouraging  the   faculty  member  to  think  aloud.  once  testing  was  complete,  the  scholarly  communications  librarian   analyzed  the  notes  and  identified  areas  of  common  concern  and  confusion  among  participants,  as   well  as  several  suggestions  that  the  participants  made  to  improve  the  site’s  functionality  as  they   worked  through  the  process.  she  then  went  about  making  changes  to  the  site  based  on  this   feedback.  as  we  discuss  in  the  next  section,  each  task  that  faculty  members  performed,  from  easy   to  frustrating,  represented  an  interaction  with  the  user  interface  that  affected  participants’   experiences  of  engaging  with  the  contribution  process,  and  informed  changes  we  were  able  to   make  before  launching  the  ir  service  three  months  later.     basic  elements  of  microinteractions   saffer’s  theory  describes  four  primary  components  of  a  microinteraction:  the  trigger,  rules,   feedback,  and  loops  and  modes.  viewing  the  ir  upload  tool  as  a  microinteraction  intended  to  be   efficient  and  user-­‐friendly  required  us  to  first  identify  each  of  these  different  components  as  they   applied  to  the  contribution  process  (see  figure  1),  and  then  evaluate  the  tool  as  a  whole  through   our  user  testing.     self-­‐archiving  with  ease  in  an  institutional  repository  |  betz  and  hall     doi:  10.6017/ital.v34i3.5900   48     figure  1.  ir  self-­‐archiving  process  with  microinteraction  components. 
trigger

the first component to examine in a microinteraction is the trigger, which is, quite simply, "whatever initiates the microinteraction."18 on an iphone, the trigger for an application is the icon that launches it; on a dishwasher, the trigger is the button pressed to start the machine; on a website, a trigger could be a login button or a menu item. well-designed triggers follow good usability principles: they appear when and where the user needs them, they initiate the same action every time, and they act predictably (for example, buttons are pushable, toggles slide).

examining our trigger was a first step in assessing how well our upload microinteraction was designed. uploading and adding content is a primary function of the ir, and the trigger needed to be highly noticeable. we can assume that users would be goal-based in their approach to the ir; faculty would be visiting the site with the specific purpose of uploading content and would be actively looking for a trigger to begin an interaction that would allow them to do so.

the initial design of ro@m included a top-level menu item as the only trigger for contributing works. in the persistent navigation at the top of the site, users could click on the menu item labeled "contribute" and would then be presented with a login screen to begin the contribution process. this was immediately obvious to half of the participants during user testing. however, the other half immediately clicked on the word "share," which appeared on the lower half of the page beside a small icon, simply as a way to add some aesthetic appeal to the homepage along with the words "discover" and "preserve." not surprisingly, users were interpreting the word and icon as a trigger. because of the user behavior we observed, we decided to add hyperlinks to all three of these words, with "share" linking to the contribution login screen (see figure 2), "discover" leading to a browse page, and "preserve" linking to an faq for authors page that included information on digital preservation. this significantly increased the visibility of the trigger for the microinteraction.

figure 2. "share" as additional trigger for contributing works.

rules

the second component of microinteractions described by saffer is the rules. rules are the parameters that govern a microinteraction; they provide a framework of understanding to help users succeed at completing the goal of a microinteraction by defining "what can and cannot be done, and in what order."19
while users don't need to understand the engineering behind a library self-checkout machine, for example, they do need to understand what they can and cannot do when they are using the machine. the hardware and software of a self-checkout machine are designed to support the rules by encouraging users to scan their cards to start the machine, to align their books or videos so that they can be scanned and desensitized, and to indicate when they have completed the interaction.

the goal when designing a self-archiving process in ro@m was to ensure that the rules were easy for users to understand, followed a logical structure, and were not overly complex. to this end, we drew on saffer's approach to designing rules for microinteractions, along with the philosophy espoused by steve krug in his influential web design book, don't make me think: a common sense approach to web usability.20 both krug and saffer argue for reducing complexity and removing decision-making from the user whenever possible to reduce the potential for user error. the rules in ro@m follow a familiar form-based approach: users log in to the system, agree to a licensing agreement, create some metadata for their item, and upload a file (see figure 1). however, determining the order of these elements, and ensuring that users could understand how to fill out the form successfully, required careful thinking that was greatly informed by the user testing we conducted.

for example, we designed ro@m to connect to the same authentication system used for other university applications, ensuring that faculty could log in with the credentials they use daily for institutional email and network access. forcing faculty to create, and remember, a unique username and password to submit content would have increased the possibility of login errors and resulted in confusion and frustration. we also used drop-down options where possible throughout the microinteraction instead of requiring faculty to input data such as file types, faculty or department names, or content types into free-text boxes.

during our user testing we found that the fields with free-text input for metadata entry most often led to confusion and errors. for instance, it quickly became apparent that name authority would be an issue. when filling out the "author" field, some people used initials, some used middle names, and some added "dr" before their name, any of which could negatively affect the ir's search results and the ability to track where and when these works are cited by others. when asked to include a citation for published works, most of our participants expressed frustration with this requirement because they could not do so quickly, and they had concerns about creating correct citations. finally, many participants became confused at the last, optional field in the form, which allowed them to assign a creative commons license to their works.

our user testing indicated that we would need to be mindful of how information like author names and citations was entered by users before making an item available on the site; the sketch below illustrates the kind of cleanup an administrative review step might apply.
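as a minimal, hypothetical illustration of that kind of cleanup (not ro@m's actual workflow), the following python sketch normalizes free-text author strings into a consistent "last, first" form and strips honorifics such as "dr"; the specific rules are assumptions chosen for demonstration only.

import re

# strip common honorifics from the start of a name (illustrative list only)
HONORIFICS = re.compile(r"^\s*(dr|prof|professor|mr|ms|mrs)\.?\s+", re.IGNORECASE)

def normalize_author(raw_name):
    """return a 'last, first' form of a free-text author string."""
    name = HONORIFICS.sub("", raw_name.strip())
    if "," in name:                      # already "last, first"; just tidy spacing
        return re.sub(r"\s+", " ", name)
    parts = name.split()
    if len(parts) < 2:                   # single token; nothing to reorder
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"

print(normalize_author("Dr Jane Q. Smith"))   # -> "Smith, Jane Q."
print(normalize_author("smith, j."))          # -> "smith, j."

a review step like this does not replace authority control, but it reduces the most common inconsistencies before an item is published.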
under ideal circumstances, we would have modified the form to ensure that any information the system already knew about the user was brought forward: what saffer calls "don't start from zero."21 this could include automatically filling in details like a user's name. however, like many libraries, we chose to adapt existing software rather than develop our microinteraction from the ground up, and implementing such changes would have been too time-consuming or expensive. instead, we added workflows that allow administrators to edit the metadata before a contribution is published to the web so we can correct any errors. we also changed the "citation" field to "publication information" to signal that users did not need to include a complete citation. lastly, we made sure that "all rights reserved" was the default selection for the optional "add a creative commons license?" field in the form, because this was language with which our users were familiar and comfortable proceeding.

policy constraints are another aspect of the rules that provide structure around a microinteraction, and they can also limit the design choices that can be made. having faculty complete a nonexclusive licensing agreement acknowledging that they had the appropriate copyright permissions to contribute the work was a required component of our rules. without the agreement, we would risk liability for copyright infringement and could not accept the content into the ir. however, our early designs for the repository included this step at the end of the submission process, after faculty had created metadata about the item. our initial round of testing revealed that several of our participants were unsure whether they had the appropriate copyright permissions to add content and did not want to complete the submission, a frustrating experience after spending time filling out author information, keywords, abstract, and the like. we attempted to resolve this issue by moving the agreement much earlier in the process, requiring users to acknowledge the agreement before creating any metadata. we also used simple, straightforward language for the agreement and added information about how to determine copyrights or contact ro@m staff for assistance. integrating an api that could automatically check a journal's archiving policies in sherpa romeo at this stage in the contribution process is something we plan to investigate to reduce complexity further for users.
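to make that intended integration concrete, the following minimal python sketch shows what such a lookup might look like. the endpoint, parameter names, and xml element name are assumptions modeled on the legacy sherpa/romeo lookup api; the snippet is illustrative, not an implemented ro@m feature.

import requests
import xml.etree.ElementTree as ET

# assumed endpoint and parameter names, modeled on the legacy sherpa/romeo api
ROMEO_ENDPOINT = "http://www.sherpa.ac.uk/romeo/api29.php"

def archiving_colour(issn, api_key=None):
    """return the romeo 'colour' (green/blue/yellow/white) for a journal issn."""
    params = {"issn": issn}
    if api_key:
        params["ak"] = api_key
    response = requests.get(ROMEO_ENDPOINT, params=params, timeout=10)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    # element name is an assumption; adjust to the response the service actually returns
    return root.findtext(".//romeocolour", default="unknown")

# example: warn a contributor before they invest time in metadata entry
if archiving_colour("0000-0000") in ("white", "unknown"):
    print("we may not be able to accept this version; please contact ro@m staff.")

surfacing this information at the licensing-agreement step would let uncertain contributors resolve copyright questions before, rather than after, filling out the form.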
feedback

understanding the concept of feedback is critical to the design of microinteractions. while most libraries are familiar with collecting feedback from users, the feedback saffer describes flows in the opposite direction: it is the feedback the application or interface provides back to users. this feedback gives users information when and where they need it to help them navigate the microinteraction. as saffer comments, "the true purpose of feedback is to help users understand how the rules of the microinteraction work."22

feedback can be provided in a variety of ways. an action as simple as a color change when a user hovers over a link is a form of feedback, providing visual information that indicates that a segment of text can be clicked. confirmation messages are an obvious form of feedback, while a folder with numbers indicating how many items have been added to it is more subtle. while visual feedback is most commonly used, saffer also describes cases where auditory and haptic (touch) feedback may be useful. designing feedback, much like designing rules, should aim to reduce complexity and confusion for the user, and feedback should be explicitly connected, both functionally and visually, to what the user needs to know.

in an online environment, much of the feedback we provide the user should be based on good usability principles. for example, formatting web links consistently and providing predictable navigation elements are ways that feedback can be built into a design. providing feedback at the user's point of need is also critical, especially error messages or instructional content. this proved to be especially important to our ro@m test subjects. while the ir featured an "about" section, accessible in the persistent navigation at the top of the website, that contained detailed instructions on how to submit works and the terms of use governing submissions, this content was virtually invisible to the users we observed. instead, they relied heavily on the contextual feedback included throughout the contribution process, when it was visible to them.

these observations led us to rethink our approach to providing feedback in several cases. for example, an unfortunate constraint of our software required users to select a faculty or school and a department and then click an "add" button before they could save and continue. we included instructions above the drop-down menus, stating "select and click add," in an effort to prevent errors. however, our participants failed to notice the instructions and inevitably triggered a brief error message (see figure 3). we later changed the word "add" in the instructions from black to bright red to increase its visibility, and we ensured that the error message displayed when users failed to select "add" clearly explained how to correct the problem and move on. we also observed that the plus signs for adding additional authors and keywords were not visible to users. we added feedback that included both text and icons with more detail (see figure 4). however, this remains a problem for users that we will need to explore further.
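as a minimal, hypothetical sketch of this kind of point-of-need feedback (the field names and wording are illustrative, not ro@m's implementation), a server-side check for the faculty/department step might return a message that tells the user exactly how to recover:

# hypothetical field names; returns (ok, message) for the faculty/department step
def validate_affiliation(form_data):
    selected = form_data.get("department_selected")   # current drop-down choice
    added = form_data.get("departments_added", [])    # choices already "added"
    if added:
        return True, ""
    if selected:
        return False, ("you selected a department but did not click add. "
                       "click the red add button, then choose save and continue.")
    return False, ("please select your faculty or school and your department, "
                   "then click add before saving.")

ok, message = validate_affiliation({"department_selected": "library science"})
print(message)   # explains exactly how to recover from the error

the point of the sketch is simply that the error text should name the missed step and the action that fixes it, rather than reporting a generic failure.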
on completing a contribution, users receive a confirmation page that thanks them for the contribution, provides a timeline for when the item will appear on the site, and notes that they will receive an email when it appears. response to this page was positive, as it succinctly covered all of the information users felt they needed to know having completed the process.

figure 3. feedback for the "add" button.

figure 4. feedback for adding multiple authors and keywords.

modes and loops

the final two components of microinteractions defined by saffer are modes and loops. saffer describes a mode as a "fork in the rules," or a point in a microinteraction where the user is exposed to a new process, interface, or state.23 for example, google scholar provides users with a setting to show "library access links" for participating institutions with openurl-compatible link resolvers.24 users who have set this option are presented with a search results page that differs from the default mode and includes additional links to their chosen institution's link resolver. our microinteraction includes two distinct modes. once logged in, users can choose to contribute works through the "do it yourself" submission that we have described here in some detail, or they can choose "let us do it" and complete a simplified version that requires them only to acknowledge the licensing agreement, upload their files, and provide any additional data they choose in a free-text box (see figure 5). the majority of our testers said they would opt for the "do it yourself" option because they wanted to have control over the metadata describing their work, including the abstract and keywords. however, since launching the repository, several submissions have arrived via the "let us do it" form, which suggests a reasonable amount of interest in this mode.

figure 5. the "let us do it" form.
loops, on the other hand, are simply a repeating cycle in the microinteraction. a loop could be a process that runs in the background, checking for network connections, or it could be a more visible process that adapts itself on the basis of the user's behavior. for example, in the ro@m submission process users can move backward and forward in the contribution forms; both forms have "previous" and "save and continue" buttons on each page to allow users to navigate easily. the final step on the "do it yourself" form allows users to review their metadata and the file that they have uploaded. they can then use the "previous" button to make changes to what they have entered before completing the submission. ideally, users would be able to edit this content directly from the review page, but software constraints prevented us from including this feature, and the "previous" button did not pose any major challenges for our testing participants. another example of a loop in ro@m is a "contribute more works" button embedded in the confirmation screen that takes users back to the beginning of the microinteraction. this feature was suggested by one of our participants, and it extends the life of the microinteraction, potentially leading to additional contributions.

discussion and conclusions

focusing on the details of the self-archiving process in our ir provided extremely rich qualitative data for improving the user interface, while analyzing the structure of the microinteraction, following saffer's model, was also a valuable exercise in thinking about user needs and software design from a different perspective than standard usability studies. the improvements we made, based on both saffer's theory and the results we observed through testing, added significant functionality and ease of use to the self-archiving process for faculty. thinking carefully about elements like the placement of buttons, small changes in wording or flow, and the timing of instructional or error feedback highlighted the big effect small elements can have on usability.

however, there are limitations to both the theory and our approach to testing and improving the ir that affect how well we can understand and use the results. of particular concern is how well this kind of testing can capture the ux of a faculty member beyond the utility or ease of use of the interaction. in an observational study we can rely on comments from participants and key statements that may indicate a participant's emotional or affective state, but we did not include targeted questions to gather this data and focused instead on the details of the microinteraction. we did not ask how participants felt while using the ir, or whether successfully uploading an item gave them a sense of autonomy or competence, or whether the experience would encourage them to submit content in the future. nevertheless, improving usability is a solid foundation for providing a positive ux. hassenzahl describes the difference between "do-goals" (completing a task) and "be-goals" (human psychological needs such as being competent or developing relationships).25 while he argues that "be-goals" are the ultimate drivers of ux, he also suggests that creating tools that make the completion of do-goals easy can facilitate the fulfillment of be-goals by removing barriers and making their fulfillment more likely. ultimately, however, a range of user testing strategies can lead to improvements in a user interface, whether that testing relies on carefully detailed examination of a microinteraction, analysis of large data sets from google analytics, or interviews with key user groups.
microinteraction theory is a useful approach, and valuable in its conceptualization, but it should be one of many tools libraries adopt to improve their online ux.

similarly, focusing on the ux of irs must be only one of many strategies institutions employ to improve rates of faculty self-archiving. recent studies argue that, regardless of platform or process, faculty-initiated submissions have proven to be uncommon.26 instead, they suggest that sustainability relies on marketing, direct outreach to individual faculty members, and significant staff involvement in identifying content for inclusion, investigating rights, and depositing on authors' behalf. it would be shortsighted to suggest that relying solely on designing a user-friendly website, or only on developing savvy promotional and outreach efforts, can determine the ongoing success of an ir initiative. gaining and maintaining support is an ongoing, multifaceted process, and it depends largely on the academic culture of an institution as well as available financial and staffing resources. as such, user testing offers qualitative insights into ways that processes and functions might be improved to enhance the viability of ir initiatives in tandem with a variety of marketing and outreach efforts.

references

1. "welcome to roarmap," university of southampton, 2014, http://roarmap.eprints.org.

2. dorothea salo, "innkeeper at the roach motel," library trends 57, no. 2 (2008): 98, http://muse.jhu.edu/journals/library_trends.

3. ibid., 100.

4. "the directory of open access repositories—opendoar," university of nottingham, uk, 2014, http://www.opendoar.org.

5. effie l-c law et al., "understanding, scoping and defining user experience: a survey approach," computer-human interaction 2009: user experience (new york: acm press, 2009), 719.

6. marc hassenzahl, "user experience (ux): towards an experiential perspective on product quality," proceedings of the 20th international conference of the association francophone d'interaction homme-machine (new york: acm press, 2008), 11, http://dx.doi.org/10.1145/1512714.1512717.

7. marc hassenzahl, sarah diefenbach, and anja göritz, "needs, affect, and interactive products: facets of user experience," interacting with computers 22, no. 5 (2010): 353–62, http://dx.doi.org/10.1016/j.intcom.2010.04.002.

8. international standards organization, human-centred design for interactive systems, iso 9241-210 (geneva: iso, 2010), section 2.15.
9. see philip m. davis and matthew j. l. connolly, "institutional repositories: evaluating the reasons for non-use of cornell university's installation of dspace," d-lib magazine 13, no. 3/4 (2007), http://www.dlib.org; ellen dubinsky, "a current snapshot of institutional repositories: growth rate, disciplinary content and faculty contributions," journal of librarianship & scholarly communication 2, no. 3 (2014): 1–22, http://dx.doi.org/10.7710/2162-3309.1167; anthony w. ferguson, "back talk—institutional repositories: wars and dream fields to which too few are coming," against the grain 18, no. 2 (2006): 86–85, http://docs.lib.purdue.edu/atg/vol18/iss2/14; salo, "innkeeper at the roach motel"; feria wirba singeh, a. abrizah, and noor harun abdul karim, "what inhibits authors to self-archive in open access repositories? a malaysian case," information development 29, no. 1 (2013): 24–35, http://dx.doi.org/10.1177/0266666912450450.

10. hyun hee kim and yong ho kim, "usability study of digital institutional repositories," electronic library 26, no. 6 (2008): 863–81, http://dx.doi.org/10.1108/02640470810921637.

11. lena veiga e silva, marcos andré gonçalves, and alberto h. f. laender, "evaluating a digital library self-archiving service: the bdbcomp user case study," information processing & management 43, no. 4 (2007): 1103–20, http://dx.doi.org/10.1016/j.ipm.2006.07.023.

12. suzanne bell and nathan sarr, "case study: re-engineering an institutional repository to engage users," new review of academic librarianship 16, no. s1 (2010): 77–89, http://dx.doi.org/10.1080/13614533.2010.5095170.

13. dan saffer, microinteractions: designing with details (cambridge, ma: o'reilly, 2013), 2.

14. ibid., 3.

15. ibid., 2.

16. ibid., 142.

17. jakob nielsen, "how many test users in a usability study?," nielsen norman group, 2012, http://www.nngroup.com/articles/how-many-test-users.

18. saffer, microinteractions, 48.

19. ibid., 82.

20. steve krug, don't make me think: a common sense approach to web usability (berkeley, ca: new riders, 2000).

21. saffer, microinteractions, 64.

22. ibid., 86.

23. ibid., 111.

24. "library support," google scholar, http://scholar.google.com/intl/en-us/scholar/libraries.html.

25. hassenzahl, "user experience," 10–15.

26. see dubinsky, "a current snapshot of institutional repositories," 1–22; shannon kipphut-smith, "good enough: developing a simple workflow for open access policy implementation," college & undergraduate libraries 21, no. 3/4 (2014): 279–94, http://dx.doi.org/10.1080/10691316.2014.932263.

editorial: how do you know whence they will come?

dan marmion, information technology and libraries 19, no. 1 (march 2000)
as i write this, i am putting my affairs in order at western michigan university, in preparation for a move to a new position at the university of notre dame libraries beginning in april. at each university my responsibilities include overseeing both the online catalog and the libraries' web presence. i mention this only because i find it interesting, and indicative of an issue with which the library profession in general is grappling, that librarians in both institutions are engaged in discussions regarding the relationship between the two. in talking to librarians at those places and others, from some i hear sentiment for making one or the other the "primary" access point. thus i've heard arguments that "the online catalog represents our collection, so we should use it as our main access mechanism." other librarians state that "the online catalog is fine for searching for books in our collection, but there is so much more to find and so many more options for finding it, that we should use our web pages to link everything together." my hunch is that probably we can all agree that there are things that an online catalog can do better than a web site, and things that a web site can do better than the online catalog. as far as that goes, have we ever had a primary access point (thanks to karen coyle for this thought)? but that's not what i want to talk about today.

the debate over a primary access point contains an invalid implicit assumption and asks the wrong question. the implicit assumption is that we can and should control how our patrons come into our systems. the question we should be asking ourselves is not "what is our primary access method?" but rather "how can we ensure that our users, local and remote, will find an avenue that enables them to meet their informational needs?"

since at this time i'm more familiar with wmu than notre dame, i'll draw some examples from the former. we have "subject guides to resources" on our web site. these consist of pages put together by subject specialists that point to recommended sources, both print and electronic, local and remote, on given subjects. students can use them to begin researching topics in a large number of subject areas. the catch is that the students have to be browsing around the web site. if they happen to start out in the online catalog they will never encounter these gateways, because the only reference to them is on the web site. on the other hand, a student who stays strictly with the web site is quite possibly going to miss a valuable resource in our library if he/she doesn't consult the online catalog, because we obviously can't list everything we own on the web site. (also, obviously, the web site doesn't provide the patron with status information.) this is why we have to ask ourselves the correct question mentioned above.

what is the solution? unfortunately i'm not any smarter than everyone else, so i don't have the answer (although i do know some folks who can help us with it: check out www.lita.org/committe/toptech/mainpage.htm). my guess is that we'll have to work it out as a profession, possibly in collaboration with our online system vendors, and that the solution will be neither quick nor simple nor easy. there are some ad hoc moves we can make, of course, such as put links to the gateways into the catalog, and on our web pages stress that the patron really needs to do a catalog search. the bottom line is that we have a dilemma: we can't control how people come into our electronic systems, so we can't have a "primary access point." if we try, we do harm to those who, for whatever reason, reach us via some other avenue. we need to make sure that we provide equal opportunity for all.
dan marmion (dmarmion@nd.edu) is associate director of information systems and access at notre dame university, notre dame, indiana.

google us! capital area district libraries gets noticed with google ads grant

public libraries leading the way

sheryl cormicle knox and trenton m. smiley, information technology and libraries | march 2020, https://doi.org/10.6017/ital.v39i1.12089

sheryl cormicle knox (knoxs@cadl.org) is technology director for capital area district libraries. trenton m. smiley (smileyt@cadl.org) is marketing & communications director for capital area district libraries.

increased choices in the marketplace are forcing libraries to pay much more attention to how they market themselves. libraries can no longer simply employ an inward marketing approach that speaks to current users through printed materials and promotional signage plastered on the walls. furthermore, they cannot rely on occasional mentions by the local media as the primary driver of new users. that's why in 2016, capital area district libraries (cadl), a 13-branch library system in and around lansing, michigan, began using more digital tactics as a cost-effective way to increase our marketing reach and to have more control over promoting the right service, at the right time, to the right person. one example of these tactics is ad placement on the weather channel app. this placement allows ads about digital services like overdrive and hoopla to appear when certain weather conditions, such as a snowstorm, occur in the area.
in 2017, while attending the library marketing and communications conference in dallas, our marketing and communications director had the good fortune of sitting in on a presentation by trey gordner and bill mott from koios (www.koios.co) on how to receive up to $10,000 of in-kind advertising every month from a google ad grant (www.google.com/grants). during this presentation, koios offered participants a 60-day trial of their services to help secure the google ad grant and create a few starter campaigns.

google ads are text-based and appear in the top section of google's search results, along with the ads of paying advertisers. nonprofits in the google ad grants program can set up various ad campaigns to promote whatever they like—the overall brand of the library, the collection, various events, meeting room offerings, or any other product or service. the appearance of each google ad is triggered by keywords chosen for each campaign.

after cadl's trial period expired, we decided to retain koios to oversee the google ad grants project. while the library has used google ads for the sharing of video, we had not done much with keyword advertising, so we were excited to learn more about the process of using keywords and the funding available through the grant. we viewed this as a great new tool to add to our marketing toolbox. it would help us achieve a few of our marketing goals: expanding our overall marketing reach and digital footprint by 50 percent; increasing the library's digital advertisement budget by 300% (by using alternative funding); and promoting the right service at the right time.

getting started

koios coached us through the slalom course of obtaining accounts and setting them up. to secure the monthly ad grant, we first obtained a validation key from techsoup (www.techsoup.org), the nonprofit that makes technology accessible to other nonprofits and libraries. that, in turn, pre-qualified us for a google for nonprofits account. (at the time, we were able to get a validation token from our existing techsoup account, but koios currently recommends starting by registering a 501(c)(3) friends organization or library foundation with techsoup whenever possible.) after creating our google for nonprofits account, we used the same account username to create a google ads account. finally, to work efficiently with koios, we provided them access to our google analytics property (which we have configured to scrub patron-identifying information) and our google tag manager account (with the ability to create tags that we in turn review and approve). if you are taking the do-it-yourself approach, google has a step-by-step google ad grants activation guide and extensive help online.

designing campaigns

spending money well is hard work, and that holds true for keyword search ads as well. there are performance and ad-quality requirements in the grant program that must be observed to retain your monthly allotment. understanding these guidelines, and implementing campaigns that respect them while working well enough to spend your grant allocation, requires study and patience. again, we relied on koios to guide us. they helped us create campaigns, and ad groups within those campaigns, that were effective within the grant program.
figure 1. example of minecraft title keyword landing page created by koios.

in august 2018, we started with campaigns for general branding awareness that included ads aimed at people actively searching for local libraries and our core services. these ads funnel users to our homepage and our online card signup. they are configured to display only to searchers who are geographically located in our service area. this campaign has been grown and perfected over 18 months into one of our most successful campaigns, garnering over 2,300 impressions and 650 clicks in january 2020, yet it spends just $450 of our grant funds. another consistent performer for us has been our digital media campaign, with ads targeting users searching for ebooks and audiobooks. by june 2019 we had grown our grant spend to $1,500 a month using 27 different campaigns.

the game changer for us has been working with koios to create campaigns based on an export of marc records from our catalog. we worked with koios to massage this data into a very simple pseudo-catalog of landing pages based on item titles. each landing page is very simple and seo-friendly so that it ranks well in the split-second ad auction that determines whether your ad will be displayed. it has cover images, clear calls to action, loads fast, is mobile friendly, and communicates the breadth of formats held by the library (see figure 1). clicking the item title or the borrow button sends users straight into our full catalog to get more information, request the item, or link to the digital version. (a simplified sketch of this kind of page generation appears at the end of this section.)

figure 2. a user search in google for "dad jokes" showing a catalog campaign ad. grant program ads are displayed below paid ads. the format of the ad may vary as well. this version shows several extensions, like phone number, site links, and directions links.

figure 3. the landing page displayed to the searcher after they click on the ad, and the resulting catalog page if the searcher clicks the borrow button.

in google ads, koios created 14 catalog campaigns out of the roughly 250,000 titles we sent them. each campaign has keywords (single words and phrases from titles) derived from roughly 18,000 titles ranked by how frequently they are used in google search. again, these ads are limited geographically to our service area. figures 2 and 3 illustrate what a google searcher in ingham county, michigan, potentially encounters when searching for "dad jokes". since their inception in september 2019, these catalog campaigns have been top performers for us, generating clickthrough rates of 8–15% and a couple thousand additional ad clicks monthly, the aggregation of a small number of clicks on any one ad from our "long tail" of titles. we are now spending over $5,000 of our grant funds and garnering nearly 23,000 impressions and 3,000 ad clicks monthly.
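as a simplified illustration of this kind of page generation (not koios's or cadl's actual pipeline), the following python sketch reads a marc export with the pymarc library and writes one bare-bones html landing page per title; the template, file names, and catalog url are placeholders, and a production version would add cover images, richer metadata, and styling.

from pathlib import Path
from pymarc import MARCReader   # assumes the pymarc library is installed

# illustrative template; real pages also carry cover art, formats, and styling
TEMPLATE = """<!doctype html>
<html><head><meta name="viewport" content="width=device-width, initial-scale=1">
<title>{title} | capital area district libraries</title></head>
<body>
  <h1>{title}</h1>
  <p>borrow this title from your local library.</p>
  <a href="https://catalog.example.org/search?q={query}">borrow</a>
</body></html>"""

out_dir = Path("landing_pages")
out_dir.mkdir(exist_ok=True)

with open("catalog_export.mrc", "rb") as marc_file:
    for i, record in enumerate(MARCReader(marc_file)):
        if record is None:                 # skip records pymarc could not parse
            continue
        field = record["245"]              # title statement
        if field is None or field["a"] is None:
            continue
        title = field["a"].strip(" /:;,")
        page = TEMPLATE.format(title=title, query=title.replace(" ", "+"))
        (out_dir / f"title-{i}.html").write_text(page, encoding="utf-8")

keeping each page this small is part of what lets it load quickly enough to compete in the ad auction while still handing the searcher off to the full catalog.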
results

in general, we find that our google ads have succeeded in drawing additional new visitors to our website. using our long-established google analytics implementation, which measures visits to our website and catalog combined, we compared the third quarter of 2018, when we were ramping up our google ad grants campaigns, to the third quarter of 2019, after our catalog campaign was firmly established. the summary numbers are encouraging. the number of users is up 17%, and the number of sessions is up 4%. within the overall rise in users, returning users are up 9%, but new users are up 25%. therefore, we are getting more of those coveted, elusive "non-library-users" to visit us online. when comparing the behavior of new and returning visitors, we also see that the overall increase in sessions was achieved despite the headwind of a 4% decline in returning-visitor sessions.

however, are the new visitors engaging? perhaps the most tangible measure of engagement for a public library catalog is placing holds. we have a google analytics conversion goal that measures those holds. the rate of conversion on the hold goal among new visitors rose 7%, while dropping 13% among returning visitors. from other analysis, we know that our highly engaged members are migrating to our mobile app and to digital formats, so the drop for returning users is explainable and the rise among new visitors is hopeful. we are working on ways to study these new visitors more closely so that we can discover and remove more of the barriers in the way of their becoming highly engaged members of their public library.

future plans

with the help of koios, new campaigns will be created to promote our blogs and podcasts. we will also link a campaign to our demco events database. finally, in partnership with koios, we will work with patron point to incorporate our automated email marketing system into google ad campaigns. we will add campaigns for pop-up ads that encourage library card signup through our online registration system. once someone signs up for a library card online, the system will trigger a welcome email that promotes some of our core services. this onboarding setup will also include an opportunity for the new cardholder to fill out a form to tailor content in future emails to their interests. through all these means, cadl leads the way in delivering the right service, at the right time, to the right person.

reference is dead, long live reference: electronic collections in the digital age

heather b. terrell, information technology and libraries | december 2015

abstract

in a literature survey on how reference collections have changed to accommodate patrons' web-based information-seeking behaviors, one notes a marked "us vs. them" mentality—a fear that the internet might render reference irrelevant. these anxieties are oft-noted in articles urging libraries to embrace digital and online reference sources. why all the ambivalence? citing existing research and literature, this essay explores myths about the supposed superiority of physical reference collections and how patrons actually use them, potential challenges associated with electronic reference collections, and how providing vital e-reference collections benefits the library as well as its patrons.

introduction

reference collections are intended to meet the immediate information needs of users. reference librarians develop these collections with the intention of using them to answer in-depth questions and to conduct ready-reference searches on a patron's behalf. library users depend on reference collections to include easily navigable finding tools that assist them in locating sources that contain reliable information in a useful, accessible format and can be accessed when the information is needed.
the expectation for print reference collections is that they are comprised of high-use materials—the very reason for their designation as noncirculating items is ostensibly so that materials are available for on-demand access by both patrons and staff, who use them frequently. however, librarians and patrons alike have acquired what margaret landesman calls "online habits"; to wit, the most-utilized access point to information is often the 24/7 web.1 in a wired world, where the information universe of the internet is not only on our desktops but also in our pockets and on our fashion accessories, the role of the print reference collection is less relevant in supporting information and research aims. in no other realm have the common practices of both users and librarians changed more than in how we seek information. nevertheless, a technology-related panic seems to be at the boil, with article titles like "are reference books becoming an endangered species?"2 and "off the shelf: is print reference dead?"3 words like "invasion" are used to describe the influx of electronic reference sources. we read about the "unsustainable luxury" of housing hundreds—sometimes thousands—of unused books on the open shelves. all this handwringing leads us to wonder why librarians in the field need this much coaxing to be cajoled into weeding their print reference collections in favor of electronic reference resources. does this format transition really constitute such a dire situation? what if the decline of print reference usage isn't a problem? and what's so luxurious about dusty, overcrowded shelves full of books no one cares to use? in "the myth of browsing," barclay concludes that "the continued over-my-dead-body insistence that no books be removed [from libraries] is an unsustainable position."4

heather b. terrell (hterrell@milibrary.org), a recent mlis degree graduate from the school of library and information science, san jose state university, is winner of the 2015 lita/ex libris student writing award.

a survey of the relevant literature reveals that staff resistant to the transition from print to electronic reference collections often share three core presumptions about reference:

• users prefer using print sources, and the importance of patrons' ability to browse the physical collection is paramount.

• the reliability of web-based reference sources may be questionable, especially when compared with the authority of print reference materials.

• access to print materials is the only option that certain users (namely, those without library cards) have for being connected to information.

there also seems to be a more subtle assumption at play in the print vs. electronic reference debate—that print books are more "dignified," cultivating a scholarly atmosphere in the library.
certain objections to removing print reference collections to closed stacks and using the newly freed public space to build a cooperative learning commons, for instance, tend to devolve into hysterics about the potentiality of libraries becoming "no better than a starbucks." the "no better" variable in this equation is a cosmetic one—librarians aren't worrying about libraries serving up a flavored "information latte" for vast profit margins—they are worrying that libraries will be perceived as a place to loiter, use the internet, and "hang out," rather than a place for serious study. one thing for librarians who worry about this potential outcome to consider is that loyal coffee shop denizens would be up in arms to learn that their favorite shop was being closed or its services reduced or eliminated. the implications are clear. perhaps libraries should consider the café model: a collaborative "no-shushing" zone—the difference between a library and a coffee house being that at a library, people are able to explore, learn, and be entertained using the resources provided by the institution.

at homer babbidge library at the university of connecticut, staff considered it important to "maintain a vestige of the reference collection, so that students were reminded they had entered a special place where scholarship was tangible."5 however, users considered the underutilized stacks of books a waste of space that could be better used for cooperative work areas or computer access stations. the students' needs and interests were heeded, and homer babbidge library's learning commons has been a successful endeavour.

reference collections: history and purpose

brief points about the history of reference services lend context to the arguments presented in favor of building electronic reference collections. grimmelmann points out that "it's almost a cliché to assert that the internet is like a vast library, that it causes problems of information overload, or that it contains both treasures and junk in vast quantities."6 from the earliest dedicated reference departments to the 24/7 reference model developed in response to progressing technology, tyckoson affirms that "one thing remains constant—users still need help. the question . . . is how to provide the library's users with the best help."7

browsing collections in libraries are newer than one might assume. prior to world war ii, academic library faculty could browse to find reference materials that met the information needs of students, but undergrads weren't even allowed in the stacks.8 in public libraries, reference collections were open to users, but reference rooms were considered to be, first and foremost, the domain and workplace of the reference librarian.9 this raises the question, what is the domain and workplace of the contemporary reference librarian? arguably, the answer to this query is wherever the information is, for example, online.

ready reference collections arose from the need to make the most commonly used resources in the library convenient and readily available for patron use.10 the most commonly used resources in contemporary libraries are those found online—again, where the information is. both users and librarians now turn to the web as the first resort for answering quick reference queries, and they turn to online databases and journals for exploring complex research questions.
meanwhile print works that were once used daily sit moldering, gathering dust on the shelves, either because they are outdated or because no one thinks to find them when the answer is available at the swipe of a finger or the click of a mouse from wherever they sit, whether that's in the library or in, ahem, a coffee shop. "the convenience, speed, and ubiquity of the internet is making most print reference sources irrelevant," tyckoson says.11

print preference, browsing collections

library use is increasing—but, as landesman and others point out, it is increasing because users want access to computers, instruction in technology, study spaces, or just a place to be that's not home, not school, and not work. users do not come to the library for reference sources—researchers and scholars prefer to access full-text works via their computing devices.12 the argument that users prefer print sources is antiquated, and the emphasis on building browsing collections of physical reference materials reflects a misguided notion that users crave tactile information. landesman is blunt: "when it is a core title, users want it online."13

statistics bear her assertions out. studies show that usage of print reference collections is minimal and that users strongly prefer online access to reference materials.

• at stetson university, usage statistics gathered during the 2003–4 academic year showed that only 8.5 percent of the library's 25,626 reference volumes were used during that period.14

• a year-long study by public libraries revealed that only 13 percent of winter park (fl) public library's collection of 8,211 reference items was used.15

• when texas a&m university converted its reference collection's primary format from print to e-books, a dramatic increase in use of the e-versions of reference titles was recorded.16

• in a survey of william & mary law library users, a majority of respondents indicated that they consciously select online reference sources over print, citing convenience and currency as top reasons for doing so.17

scanning the shelves may seem to some to be the most intuitive way to search for information, but in actual practice, browsing is ineffective—books at eye level are used more often, patrons are limited to sources not being used by another patron at any given moment, and overcrowding of the shelves results in patrons overlooking useful materials.18 browsing is the least effective way for patrons to "shop" a collection. searching by electronic means overcomes the obstacles inherent in browsing the physical shelves when it uses well-designed search algorithms that employ keywords on the basis of accurate metadata. landesman indicates that if librarians commit to educating patrons on the use of the reference databases, ebooks, and websites they offer, online reference will be "a huge win for users."19

it should be noted: no one suggests that print reference should be eliminated entirely, at least not yet. smaller print reference collections result in better-utilized spaces; they ensure that remaining resources in the physical collections are used more effectively—only the items that are actually high-use are included, which makes these sources easier to locate; and books formerly classified as reference materials are able to circulate to those interested in their specialized content.
smaller print reference collections serve patrons in myriad ways, including freeing up funds that can be used to enhance electronic reference collections. digital reference services are just another way of organizing information—there is no revolution here, unless it is in providing information with more efficiency, with breadth, depth, and access that surpass what is possible via a print-only reference collection. the inevitable digital shift is a very natural evolution of patron-driven library services rather than cause for consternation on the part of library service providers.

web reliability: google and wikipedia

those who argue that the reliability of online sources is questionable are typically referring to google results and wikipedia entries, which have little bearing on a library's electronic collections of databases and e-books, but have plenty to do with a library's reference services: these two sources are very often used in lieu of printed fact-finding sources such as atlases, subject or biographical dictionaries, and specialized sources like bartlett's familiar quotations—which was last printed in 2012 and has recently gone digital.

for questions of fact, google is often a convenient and "reliable enough" source for most queries; the authority of the results yielded by a google search is not always detectable, and is sometimes intentionally obscured, so the librarian must vet results carefully and select the most reliable sources when providing ready reference to patrons. however, google is far more than just its main search page. for instance, google scholar allows searchers to locate full-text articles as well as citations for scholarly papers and peer-reviewed journal articles. in general, there are many tools on the web, and librarians must expend effort determining how to make the best use of each. in particular, google is better suited to some information tasks than others—it's up to the librarian to know when to use this tool and when to eschew it.

wikipedia has been the subject of much heated debate since its inception in 2001, but in a study conducted by nature magazine, the encyclopedias britannica and wikipedia were evaluated on the basis of a review of fifty pairs of articles by subject experts and found to be comparable in terms of the number of content errors—2.9 and 3.9, respectively.20 deliberate misstatements of fact, usually in biographical entries, are cited as evidence that wikipedia is utterly unreliable as a reference source. in fact, print sources have been plagued with the same issues. for many years, the dictionary of american biography contained an entry based on a hoax claiming a (nonexistent) diary of horatio alger—and while the entry was removed in later editions, the article was still referred to in the index for several years after its removal.21 if anything, it seems that format might provide a false sense of assurance that a source's authority is infallible. all reference sources include bias, and all will include faulty information. the major difference between print and electronic sources is that in the digital era, using the tools of technology, these errors can be corrected quickly. what some see as declining quality of a source based on its format is simply a longstanding feature of human-produced reference works, dissociated from any print vs. web debate.
policy some academic and public libraries intend to decrease or discontinue purchasing print-based reference sources so funds can be diverted to build electronic reference collections; they weed print reference to make room for information commons containing technology used for accessing these electronic collections. the basic assumption in the objection to this practice is that the traditional model of in-person reference is integral to a functioning reference collection, that access to information depends on that information being printed on a physical page. reference services are provided virtually via chat, im, and email. reference services are provided via the library’s website. reference services are provided by roving librarians, reference is dead, long live reference: electronic collections in the digital age | terrell | doi: 10.6017/ital.v34i4.9098 60 librarians engaging in one-on-one literacy sessions, and in large-group training sessions. long gone are the days of the reference librarian who waits patiently at her station for a patron to approach with a question. since the reference services model no longer mandatorily includes a stationary point on the library map, nor does providing quality reference depend solely on the depth and breadth of the print reference collection, how are print reference collections used? as indicated previously, about 10 percent of print reference collections are used by patrons on a regular basis. concern for the information needs of library users who do not have library cards is well-intentioned, but the question remains: if 90 percent of a collection goes unused, even when those users without library cards have access to these materials, is the collection useful? as stewart bodner of nypl says, “it is a disservice to refer a user to a difficult print resource when its electronic counterpart is a far superior product.”22 how users want to receive their information matters—access should not depend on whether a user can obtain a library card. for those libraries with high concentrations of patrons who do not qualify for library cards (e.g., individuals who do not have a fixed home address, or who cannot obtain a state-issued id card), libraries might reconsider their policies rather than their collections. computer-only access cards can be provided on a temporary basis for visitors and others who are unable to obtain permanent cards. san francisco public library recently instituted a welcome card for those members of the community who cannot meet identification requirements for full library privileges. the welcome card allows the user full access to computers and online resources and permits the patron to check out one physical item at a time.23 when compared with purchasing, housing, and maintaining vast print reference collections, this is a significantly less costly and far more patron-centered solution to the problem of access to electronic information sources— librarians should be advocates for users, with the goal being access to knowledge, no matter its format. conclusion: building better hybrid collections most library professionals agree that libraries should collect both print and electronic sources for their reference collections, but the ratio of print to digital is up for debate. as more formats with improved capabilities appear, researchers find that patrons prefer those sources that provide them with the best functionality. it is essential to look to the principles on which reference services are founded. 
one of those principles is to build collections on the basis of user preferences. librarians must consider what the reference collection is for and whether assumptions about patron preferences are backed by evidence. in essence, considering what “reference” means to users rather than defaulting to the status quo. a reference collection development policy must be based on what is actually used often, not on what has the potential to maybe be used sometime in the future. the library is not an archive, preserving great tomes for posterity—the collections in a library are for use. with less emphasis on print materials, librarians might focus on the wealth of sources available electronically via information technology and libraries | december 2015 61 databases and ebooks, as well as open-source, free online resources. librarians must cultivate an understanding of the resources patrons use and the formats in which they prefer to access information. as heintzelman and coauthors state, “a reference collection should evolve into a smaller and more efficient tool that continually adapts to the new era, merging into a symbiotic relationship with electronic resources.”24 rules of reference that were devised when print works were the premier sources of reference information no longer apply. reference librarians must lead the way in responding to the digital shift—creating electronic collections centered on web-based recommendations, licensed databases of journals, and ebooks—with a focus on rich, interactive, and unbiased content. weeding reference collections of outdated and unused tomes, moving some materials to the closed stacks while allowing others to circulate, and building e-book reference collections allows libraries to provide effective reference services by cultivating collections that patrons want to use. much of the transition from print to electronic reference collections can be accomplished by ensuring that resources are promoted to patrons and staff, that training in using these tools is provided to patrons and staff, that librarians become involved in the selection of digital collections, and that the spaces where print collections were formerly housed are used in ways the community finds valuable. one need not worry about the “invasion” of e-reference or the “death” of print reference. the two can coexist peacefully and vitally, as long as librarians maintain focus on selecting the best material for their reference collections, no matter its format. references 1. margaret landesman, “getting it right—the evolution of reference collections,” reference librarian 44, no. 91–92 (2005): 8. 2. nicole heintzelman, courtney moore, and joyce ward, “are reference books becoming an endangered species? results of a yearlong study of reference book usage at the winter park public library,” public libraries 47, no. 5 (2008): 60–64. 3. sue polanka, “off the shelf: is print reference dead?” booklist 104, no. 9/10 (january 1 & 15, 2008): 127. 4. donald a. barclay, “the myth of browsing: academic library space in the age of facebook,” american libraries 41, no. 6–7 (2010): 52–54. 5. scott kennedy, “farewell to the reference librarian,” journal of library administration 51, no. 4 (2011): 319–25. 6. james grimmelmann, “information policy for the library of babel,” journal of business & technology law 3 (2008): 29. reference is dead, long live reference: electronic collections in the digital age | terrell | doi: 10.6017/ital.v34i4.9098 62 7. david a. 
tyckoson, “issues and trends in the management of reference services: a historical perspective,” journal of library administration 51, no. 3 (2011): 259–78. 8. donald a. barclay, “the myth of browsing: academic library space in the age of facebook,” american libraries 41, no. 6–7 (2010): 52–54. 9. tyckoson, “issues and trends in the management of reference services.” 10. carol a. singer, “ready reference collections,” reference & user services quarterly 49, no. 3 (2010): 253–64. 11. tyckoson, “issues and trends in the management of reference services,” 293. 12. landesman, “getting it right,” 8. 13. ibid., 10. 14. jane t. bradford, “what’s coming off the shelves? a reference use study analyzing print reference sources used in a university library,” journal of academic librarianship 31, no. 6 (2005): 546–58. 15. heintzelman, moore, and ward, “are reference books becoming an endangered species?” 16. dennis dillon, “e-books: the university of texas experience, part 1,” library hi tech 19, no. 2 (2001): 113–25. 17. paul hellyer, “reference 2.0: the future of shrinking print reference collections seems destined for the web,” aall spectrum 13 (march 2009): 24–27. 18. barclay, “the myth of browsing.” 19. landesman, “getting it right.” 20. jim giles, “internet encyclopaedias go head to head,” nature 438, no. 7070 (2005): 900–901. 21. denise beaubien bennett, “the ebb and flow of reference products,” online searcher 38, no. 4 (2014): 44–52. 22. mirela roncevic, “the e-ref invasion: now that e-reference is ubiquitous, has the confusion in the reference community subsided?” library journal 130, no. 19 (2005): 8–16. 23. san francisco public library, “welcome card,” sfpl.org/pdf/services/sfpl314.pdf (2014): 1–2. 24. heintzelman, moore, and ward, “are reference books becoming an endangered species?” book reviews die elektronische datenverarbeitung im bibliothekswesen. by paul niewalda. muenchen-pullach, berlin, verlag dokumentation, 1971. (bibliothekspraxis, 1) as the first volume in a new series called bibliothekspraxis (library practice), verlag dokumentation has published a short monograph on library automation by paul niewalda, of the university library of regensburg. niewalda has written an introductory text, in german, condensing the standard, largely american, literature on the subject. his treatment is concise, well-written, and well-organized. computer capabilities, and existing library applications in the united states and elsewhere, are carefully delineated. the text is thoroughly documented, with a large number of notes and a useful bibliography included. the book addresses itself to the german reader and, in fact, much is already familiar to american librarians. yet niewalda’s frequent references to the european, particularly the german, library automation scene enhance the book’s value. the author is clearly well informed both about library automation in general, and about local practice and problems. he brings to his task common sense and sound judgment. the work is recommended to those readers having a general interest in foreign developments in the field of library automation. s. micha namenwirth university of california, berkeley dictionary of library science, information and documentation in six languages. compiled and arranged by w. e. clason. amsterdam: elsevier scientific publ. co., 1973.
the basic table, a numbered list of entries for 5,439 english language words and phrases, alphabetically arranged, forms the body of the dictionary of library science, information and documentation. each entry consists of a serial number, the english term (american and/or british), equivalents in french, spanish, italian, dutch, and german, and a code identifying the vocabulary with which the term is associated. hence, there are separate entries for volume as a book trade or library term and as an information processing term. many entries are augmented by brief definitions. english synonyms are also frequently given; in general these are terms from which references have been made. in such cases entry is under the synonym which files first. this practice produces some apparently eccentric choices; e.g., pseudonym, see allonym; udc, see brussels system. following the basic table are indexes for the five non-english languages. numerical references are given to basic table entries in which the index term is cited. german band is found not only in the first volume entry mentioned above but also in the bookbinding and information processing entries for tape. criteria employed for the selection of entries are unexplained. ibm’s data processing glossary and the american national standard vocabulary for information processing appear to have been important sources of information processing terms. the glossary in anglo-american cataloging rules was evidently not used. it is clear that some of the source lists used were in other languages. the juxtaposition of related vocabularies which often put the same words to different uses presents difficulties which the approach taken here seems capable of handling. nevertheless the work as executed has flaws which reduce its effectiveness. the notions of synonymy and nonsynonymy among the english terms are puzzling. definitions are frequently unclear and occasionally wrong. there are cases in which the non-english equivalents for a single term are certainly not synonymous with each other. the utility of the indexes would be enhanced if the number of non-english synonyms given were greater. however, if approached with care, the volume can provide much useful information. in works of this type it is probably unfair to expect perfection. besides, a dictionary which manages to encompass both negative entropy (information theory) and scrivener’s palsy (authors and authorship) has to be interesting, at least. charles w. husbands harvard university library communications wikidata: from “an” identifier to “the” identifier theo van veen information technology and libraries | june 2019 72 theo van veen (theovanveen@gmail.com) is researcher (retired), koninklijke bibliotheek. abstract library catalogues may be connected to the linked data cloud through various types of thesauri. for name authority thesauri in particular i would like to suggest a fundamental break with the current distributed linked data paradigm: to make a transition from a multitude of different identifiers to using a single, universal identifier for all relevant named entities, in the form of the wikidata identifier. wikidata (https://wikidata.org) seems to be evolving into a major authority hub that is lowering barriers to access the web of data for everyone. using the wikidata identifier of notable entities as a common identifier for connecting resources has significant benefits compared to traversing the ever-growing linked data cloud.
when the use of wikidata reaches a critical mass, for some institutions, wikidata could even serve as an authority control mechanism. introduction library catalogs, at national as well as institutional levels, make use of thesauri for authority control of named entities, such as persons, locations, and events. authority records in thesauri contain information to distinguish between entities with the same name, combine pseudonyms and name variants for a single entity, and offer additional contextual information. links to a thesaurus from within a catalog often take the form of an authority control number, and serve as identifiers for an entity within the scope of the catalog. authority records in a catalog can be part of the linked data cloud when including links to thesauri such as viaf (https://viaf.org/), isni (http://www.isni.org/), or orcid (https://orcid.org/). however, using different identifier systems can lead to having many identifiers for a single entity. a single identifier system, not restricted to the library world and bibliographic metadata, could facilitate globally unique identifiers for each authority and therefore improve discovery of resources within a catalog. the need for reconciliation of identifiers has been pointed out before.1 what is now being suggested is to use the wikidata identifier as “the” identifier. wikidata is not domain specific, has a large user community, and offers appropriate apis for linking to its data. it provides access to a wealth of entity properties, it links to more than 2,000 other knowledge bases, it is used by google, and the number of organisations that link to wikidata is quantifiably growing with tremendous speed.2 the idea of using wikidata as an authority linking hub was recently proposed by joachim neubert.3 but why not go one step further and bring the wikidata identifier to the surface directly as “the” resource identifier, or official authority record? this has been argued before and the implications of this argument will be considered in more detail in the remainder of this article. 4 information technology and libraries | june 2019 73 figure 1. from linking everything to everything to linking directly to wikidata. figure 1 illustrates the differences between a few possible situations that should be distinguished. on the left, the “everything links to everything” situation shows wikidata as one of the many hubs in the linked data cloud. in the middle, the “wikidata as authority hub” situation is shown, where name authorities are linked to wikidata. on the right is the arrangement proposed in this article, where library systems and other systems for which this may apply share wikidata as a common identifier mechanism. of course, there is a need for systems that feed wikidata with trusted information and provide wikidata with a backlink to a rich resource description for entities. in practice, however, many backlinks do not provide rich additional information and in such cases a direct link to wikidata would be sufficient for the identification of entities. figure 2 shows these two situations and other possible variations by means of dashed lines, i.e. systems that feed wikidata, but use the wikidata identifier as resource identifier for the outside world vs. systems that link directly to wikidata, but keep a local thesaurus for administrative purposes. 
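to make the linking and backlink arrangement sketched in figure 2 concrete, the following is a minimal sketch, not code from the kb or any of the systems discussed here, of how an application could resolve a wikidata item and read the external identifiers it carries, using wikidata's public entity data service. q937 is wikidata's item for albert einstein, and p214 (viaf id) and p213 (isni) are examples of external-identifier properties; the function names are illustrative assumptions.

```python
# a minimal sketch: resolve a wikidata item and list a few of the external
# identifiers ("backlinks" to other systems) it carries.
import json
import urllib.request

def entity_data(qid):
    """fetch the full json description of a wikidata item."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["entities"][qid]

def external_ids(entity, properties=("P214", "P213")):
    """collect external-identifier values for the given properties."""
    ids = {}
    for prop in properties:
        for statement in entity.get("claims", {}).get(prop, []):
            value = statement["mainsnak"].get("datavalue", {}).get("value")
            if value:
                ids.setdefault(prop, []).append(value)
    return ids

if __name__ == "__main__":
    einstein = entity_data("Q937")
    print(einstein["labels"]["en"]["value"])   # the english label
    print(external_ids(einstein))              # e.g. {"P214": [...], "P213": [...]}
```

any local identifier, such as a national library thesaurus number, would be reachable in the same way once it is registered as an external-identifier property in wikidata.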
it is certainly not the intention to encourage institutions to give up their own resource descriptions or resource identifiers locally, especially not when they are an original or rich source of information about an entity. a distinction can be made between the url of the description of an entity and the url of the entity itself. when following the url of a real-world entity in a browser, it is good practice to redirect to the corresponding description of the entity. this is known as the “httprange-14” issue.5 this article will not go into any detail about this distinction other than to note that it makes sense to have a single global identifier for an entity while accepting different descriptions of that entity linked from various sources. figure 2. feeding properties connecting collections to wikidata (left) and direct linking to wikidata using resource identifier (right). the dashed lines show additional connecting possibilities. the motivating use case the idea of using the wikidata identifier as a universal identifier was born at the research department of the national library of the netherlands (kb) while working on a project aimed at automatically enriching newspaper articles with links to knowledge bases for named entities occurring in the text.6 these links include the wikidata identifier and, where available, the dutch and english dbpedia (http://dbpedia.org) identifiers, the viaf number, the geonames number (http://geonames.org), the kb thesaurus record number, and the identifier used by the parliamentary documentation centre (https://www.parlementairdocumentatiecentrum.nl/). the identifying parts of these links are indexed along with the article text in order to enable semantic search, including search based on wikidata properties. for demonstration purposes the enriched “newspapers+” collection was made available through the kb research portal, which gives access to most of the regular kb collections (figure 3).7 in the newspaper project, linked named entities in search results are clickable to obtain more information. as most users are not expected to know sparql, the query language for the semantic web, the system offers a user-friendly method for semantic search: a query string entered between square brackets, for example “[roman emperor]”, is expanded by a “best guess” sparql query in wikidata, in this case resulting in entities having the property “position held=roman emperor”. these in turn are used to do a search for articles containing one or more mentions of a roman emperor, even if the text “roman emperor” is not present in the article. in another example, when a user searches for the term “[beatles]” the “best guess” search yields articles mentioning entities with the property “member of=the beatles”. for ambiguous items, as in the case of “guernica,” which can be the place in spain or picasso’s painting, the one with the highest number of occurrences in the newspapers is selected by default, but the user may select another one. for the default or selected item, the user can select a specific property from a list of wikidata properties available for that specific item. the possibilities of this semantic search functionality may inspire others to use the wikidata identifier for globally known entities in other systems as well. figure 3. screenshot of the kb research portal with a newspaper article as result of searching “[architect=willem dudok]”.
the results are articles about buildings of which willem dudok is the architect. the name of the building meeting the query [architect=willem dudok] is highlighted. usage scenarios two usage scenarios can be considered in more detail: (1) manually following links between wikidata descriptions and other resource descriptions, and (2) a federated sparql query can be performed by the system to automatically bring up linked entities. in the first scenario, in which resource identifiers link to wikidata, the user can follow the link to all resource descriptions having a backlink in wikidata. but why would a user follow such a link? reasons may include wanting more or context-specific information about the entity, or a desire to search in another system for objects mentioning a specific entity. in the latter case, the information behind the backlink should provide a url to search for the entity, or the backlink should be the search url itself. wikidata provides the possibility to specify various uri templates. these can be used to specify a link for searching objects mentioning the entity, rather than just showing a thesaurus entry. when the backlink does not provide extra information or a way to search the entity, the backlink is almost useless. thus, when systems provide resource links to wikidata they give users access to a wealth of information about an entity in the web of data and, potentially, to objects mentioning a specific entity. some systems only provide backlinks from wikidata | van veen 76 https://doi.org/10.6017/ital.v38i2.10886 wikidata to their resource descriptions but not the other way around. users from such systems cannot easily benefit from these links. the second scenario of a federated sparql query applies when searching objects in one system based on properties coming from other systems. formulating such a sparql query is not easy because doing so requires a lot of knowledge about the linked data cloud. the alternative is to put the complete linked data cloud in a unified (triple store) database. the technology of linked data fragments might solve the performance and scaling issues but not the complexity. 8 using a central knowledge base like wikidata could reduce complexity for the most common situation of searching objects in other systems using properties from wikidata. this use case requires these systems to take the users query and automatically formulate a sparql search. there are many systems that are linked to wikidata that do not support sparql at all or only support it in a way that is not intended for the average user. those systems can still let users benefit from wikidata by offering a simple add-on to search in wikidata for entities that meet some criteria and use the identifiers for a conventional search in the local system as shown for the case of the historical newspapers. these two use cases illustrate how the use of a wikidata identifier can lower the barrier to access information about an entity and to finding objects related to an entity by minimizing the number of hubs, minimizing the required knowledge and minimizing the required technology. this is achieved by linking resources to wikidata and, even more so, by making objects searchable by means of the wikidata identifier. advantages of using the wikidata identifier as universal identifier summarizing the above, a number of significant advantages of using the wikidata identifier as universal identifier can be seen. 
these include: • using the wikidata identifier as resource identifier makes wikidata the first hub. applications therefore have in the first instance to deal with only one description model. from there, it is easy to navigate further: most information is only “one hub away,” so less prior knowledge is required to link from one source to another. • wikidata identifiers can be used for federated search based on properties in wikidata, so there is less need to know how to access properties in other resource descriptions. • wikidata identifiers facilitate generating “just in case” links to systems having the wikidata identifier indexed. • complicated sparql queries using wikidata as primary source for properties can be shared and reused more easily compared to a situation with many diverse sources for properties. • wikidata offers many tools and apis for accessing and processing data. • some libraries and similar institutions may even decide to use wikidata directly for authority control when it reaches a critical mass, relieving them from maintaining a local thesaurus. implementation institutions can gradually adopt the use of wikidata identifiers without needing to make radical changes in their local infrastructure. a simple first step is automatically generating links to information technology and libraries | june 2019 77 wikidata in the presentation of an object or to the object description to provide contextual information and navigation options. as a next step, the wikidata q-number of an entity could be indexed along with the descriptions containing it, so these objects become findable via a wikidata identifier search, e.g. of the form: https://whatever.local/wdsearch?id=q937 the wikidata identifier could then be used in conventional as well as federated searches for a resource, regardless of the exact spelling of a resource name. a search may be refined using wikidata properties without further requirements with respect to local infrastructures. institutions having a sparql endpoint can allow for a federated sparql query for combining local data with data from wikidata. as sparql is not easy for the end user this requires a user interface that can formulate a sparql query to protect the user from knowing sparql. those institutions willing to start using the wikidata identifier as resource identifier can unify references in their bibliographic records. currently, for example, a reference to albert einstein, in a simplified, rdf-like (https://www.w3.org/rdf/) xml fragment in a bibliographic record, could look quite different for different institutions, e.g.: albert einstein albert einstein albert einstein albert einstein if the wikidata identifier is used as resource identifier, this could for all institutions become the same: albert einstein in this case it becomes easy to navigate the web, to create common bookmarklets, and provide additional functionality using the wikidata identifier. cataloguing process and criteria for new wikidata entries for institutions that decide to link their entities directly to wikidata, their catalog software would have to be configured to support wikidata lookups. catalogers would not have to know about linked data or rdf to create links to wikidata; they would simply have to query wikidata and select the appropriate entry to link. the cataloging software would then add the selected identifier to the record being edited. if a query in wikidata does not yield any results the item would first then have to be created by the cataloger. 
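as a sketch of the lookup step just described, a cataloguing client could call wikidata's public wbsearchentities api and let the cataloguer pick the matching item to link. the code below is illustrative only and is not part of any particular catalog system; the function name and the sample search term are assumptions.

```python
# a hypothetical sketch of a wikidata lookup for catalogers: search by name
# and present candidate items (q-number, label, description) to choose from.
import json
import urllib.parse
import urllib.request

def search_wikidata(term, language="en", limit=5):
    """return candidate wikidata items for a name string."""
    params = urllib.parse.urlencode({
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,
        "format": "json",
    })
    url = f"https://www.wikidata.org/w/api.php?{params}"
    with urllib.request.urlopen(url) as response:
        results = json.load(response)["search"]
    return [(r["id"], r.get("label", ""), r.get("description", "")) for r in results]

if __name__ == "__main__":
    for qid, label, description in search_wikidata("albert einstein"):
        print(qid, label, description)
```

the selected q-number would then be written into the record being edited, exactly as described above.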
creating a new item using the wikidata user interface (figure 4) is straightforward: create an account, add a new item, and add statements (fields) and values. wikidata | van veen 78 https://doi.org/10.6017/ital.v38i2.10886 figure 4. data entry screen for entering a new item in wikidata. catalogers must be aware of some rules when creating items. wikidata editors may delete items that fall under one of wikidata’s exclusion criteria, such as vandalism, empty descriptions, broken links, etc. in addition, the item must refer to an instance of a clearly identifiable conceptual or material “notable” entity. notable means that the item must be mentioned by at least one reliable, third-party published source. here, common sense is required: being mentioned in a telephone book or a newspaper is in itself not considered as notability. entities that are not notable enough to be entered into wikidata would then remain identified by a link to a local or other thesaurus. possible objections to wikidata as authority control mechanism although it is, at least at the present moment, not the intention of this article to propose the use of wikidata as the primary local authority control mechanism, some institutions may nonetheless consider the opportunity to do so. there are numerous objections to this idea to note, including: 1) institutions may consider themselves authoritative sources of information, and may therefore want to keep control over “their” thesaurus. the idea that the greater community can make changes to “their” thesaurus may not be tenable to them. quality control and error detection certainly are important issues, but experts from outside the library can sometimes provide more and better information about a resource than cataloguing professionals. for misuse and erroneous input, the community can be relied on and trusted to correct and add to wikidata entries. information that is critical for local usage, such as access control, may still be managed locally. despite possible objections to using wikidata for universal authority control, national libraries and other institutions can information technology and libraries | june 2019 79 work together with wikidata to share responsibility of maintaining the resource, to optimize and harmonize the shared use of wikidata, and maintain validity and authority. this might imply a more rigorous quality control. 2) existing systems like viaf and isni already, at present, still contain more persons than wikidata, so why use wikidata? viaf and isni are domain specific and are more restrictive with respect to updates of their content and the availability of tools and apis. in wikidata both viaf and isni are just one hub away and for internal use the viaf and isni identifiers remain available. the question here is whether there will be a moment that wikidata reaches a critical mass and supersedes viaf and isni. 3) there may be disagreement about a certain entity, especially when it concerns political events or persons whose role is perceived differently by different political parties. wikidata contains neutral properties. the properties that may contain subjective qualifications or might suffer bias are mostly behind the backlinks, like the abstract in wikipedia. a fundamental difference between wikipedia and wikidata is that wikipedia doesn’t have to be consistent across languages. wikidata is much more structured and therefore more useful for semantic applications. 
it doesn’t allow for the different nuances in descriptions like wikipedia articles do and therefore wikidata doesn’t reflect different opinions in descriptions and is less subject to bias.9 furthermore, the cataloguing practices in libraries are subject to bias and subjectivity too. perception and political view may, for example, be reflected in some subject headings and may also change over time.10 it is debatable whether a cataloger is more neutral and less biased than a larger user community. although the use and acceptance of wikipedia as a true source of information may be arguable, in the light of the current “fake news” discussion it is extremely important to guard the correctness of information in wikipedia. in this context it is interesting to note that “according to a study in nature, the correctness of wikipedia articles is comparable to the encyclopaedia britannica, and a study by ibm researchers found that vandalism is repaired extremely quickly.”11 4) some objections have to do with the discussion of “centralization versus decentralization.” some institutions may not want a central system perceptively having control over their local data. the idea of using wikidata as a common authority control mechanism is not that different from the use of any other thesaurus or identifier framework like isbn, issn, etc., except for its use of a central resource description. 5) what if wikidata disappears? there are solutions in terms of mirrors and a local copy of wikidata. moreover, national libraries and other, similar institutions that are already responsible for long-term preservation of digital content can take responsibility for keeping wikidata alive to maximize its viability wikidata | van veen 80 https://doi.org/10.6017/ital.v38i2.10886 conclusion reconciliation of linked data identifiers in general, and using the wikidata identifier as universal identifier in particular, has been shown to have many advantages. libraries and similar institutions can gradually start using the wikidata identifier without needing to make radical changes in their local database infrastructure. when wikidata reaches a critical mass, libraries and similar institutions may want to switch to using wikidata identifiers as the default resource identifiers or authority records. however, given the enormous growth of the number of collections that link entities to wikidata that is already taking place, we might end up in a situation where the perception is that “if an item is not in wikidata, it doesn’t exist” stimulating putting more items in wikidata and making local descriptions less relevant. from a strategic point of view for adopting wikidata decision makers may pose the question: “why do we have a local thesaurus when we already have wikidata?” the next question, then, will probably not be “should we go this way?” but rather “when should we go this way and start using the wikidata identifier as the identifier?” references 1 robert sanderson, “the linked data snowball and why we need reconciliation,” slideshare, apr. 4, 2016, https://www.slideshare.net/azaroth42/linked-data-snowball-or-why-we-needreconciliation. 2 karen smith-yoshimura, “the rise of wikidata as a linked data source,” hanging together, aug. 6, 2018, http://hangingtogether.org/?p=6775. 3 joachim neubert, “wikidata as a linking hub for knowledge organization systems? 
integrating an authority mapping into wikidata and learning lessons for kos mappings,” in proceedings of the 17th european networked knowledge organization systems workshop, 2017, 14-25, http://ceur-ws.org/vol-1937/paper2.pdf. 4 theo van veen, “wikidata as universal library thesaurus,” presented oct. 2017 at wikidatacon 2017, berlin, https://www.youtube.com/watch?v=1_nxkbncohm. 5 “httprange-14,” wikipedia, accessed mar. 15, 2019, https://en.wikipedia.org/wiki/httprange-14. 6 theo van veen et. al., “linking named entities in dutch historical newspapers,” in metadata and semantics research, mtsr 2016, ed. emmanouel garoufallou (cham: springer, 2016), 205–10, https://doi.org/10.1007/978-3-319-49157-8_18. 7 video demonstration of “kb research portal,” kb | national library of the netherlands, http://www.kbresearch.nl/xportal, accessed apr. 26, 2019, https://www.youtube.com/watch?v=j5mcem-hemg. 8 ruben verborgh, “linked data fragments: query the web of data on web-scale by moving intelligence from servers to clients,” accessed mar. 15, 2019, http://linkeddatafragments.org/. 9 mark graham, “the problem with wikidata,” apr. 6, 2012, https://www.theatlantic.com/technology/archive/2012/04/the-problem-withwikidata/255564/. information technology and libraries | june 2019 81 10 candise branum, “the myth of library neutrality,” may 15, 2014, https://candisebranum.wordpress.com/2014/05/15/the-myth-of-library-neutrality/. 11 “the reliability of wikipedia,” wikipedia, accessed mar. 15, 2019, https://en.wikipedia.org/wiki/reliability_of_wikipedia. hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries annie wu, santi thompson, rachel vacek, sean watkins, and andrew weidner information technology and libraries | june 2016 5 abstract since 2009, tens of thousands of rare and unique items have been made available online for research through the university of houston (uh) digital library. six years later, the uh libraries’ new digital initiatives call for a more dynamic digital repository infrastructure that is extensible, scalable, and interoperable. the uh libraries’ mission and the mandate of its strategic directions drives the pursuit of seamless access and expanded digital collections. to answer the calls for technological change, the uh libraries administration appointed a digital asset management system (dams) implementation task force to explore, evaluate, test, recommend, and implement a more robust digital asset management system. this article focuses on the task force’s dams selection activities: needs assessment, systems evaluation, and systems testing. the authors also describe the task force’s dams recommendation based on the evaluation and testing data analysis, a comparison of the advantages and disadvantages of each system, and system cost. finally, the authors outline their dams implementation strategy comprised of a phased rollout with the following stages: system installation, data migration, and interface development. introduction since the launch of the university of houston digital library (uhdl) in 2009, the uh libraries have made tens of thousands of rare and unique items available online for research using contentdm. as we began to explore and expand into new digital initiatives, we realize that the uh libraries’ digital aspirations require a more dynamic, flexible, scalable, and interoperable digital asset management system that can manage larger amounts of materials in a variety of formats. 
we plan to implement a new digital repository infrastructure that accommodates creative workflows and allows for the configuration of additional functionalities such as digital exhibits, data mining, cross-linking, geospatial visualization, and multimedia presentation. the annie wu (awu@uh.edu) is head of metadata and digitization services, santi thompson (sathompson3@uh.edu) is head of repository services, rachel vacek (evacek@uh.edu) is head of web services, sean watkins (slwatkins@uh.edu) is web projects manager, and andrew weidner (ajweidner@uh.edu) is metadata services coordinator, university of houston libraries. mailto:awu@uh.edu mailto:sathompson3@uh.edu mailto:evacek@uh.edu mailto:slwatkins@uh.edu mailto:ajweidner@uh.edu hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries | wu et al. | doi:10.6017/ital.v35i2.9152 6 new system will be designed with linked data in mind and will allow us to publish our digital collections as linked open data within the larger semantic web environment. the uh libraries strategic directions set forth a mandate for us to “work assiduously to expand our unique and comprehensive collections that support curricula and spotlight research. we will pursue seamless access and expand digital collections to increase national recognition.”1 to fulfill the uh libraries’ mission and the mandate of our strategic directions, the uh libraries administration appointed a digital asset management system (dams) implementation task force to explore, evaluate, test, recommend, and implement a more robust digital asset management system that would provide multiple modes of access to the uh libraries’ unique collections and accommodate digital object production at a larger scale. the collaborative task force comprises librarians from four departments: metadata and digitization services (mds), web services, digital repository services, and special collections. the core charge of the task force is to: • perform a needs assessment and build criteria and policies based on evaluation of the current system and requirements for the new dams • research and explore dams on the market and identify the top three systems for beta testing in a development environment • generate preliminary recommendations from stakeholders' comments and feedback • coordinate installation of the new dams and finish data migration • communicate the task force work to uh libraries colleagues literature review libraries have maintained dams for the publication of digitized surrogates of rare and unique materials for over two decades. during that time, information professionals have developed evaluation strategies for testing, comparing, and evaluating library dams software. reviewing these models and associated case studies provided insight into common practices for selecting systems and informed how the uh libraries dams implementation task force conducted its evaluation process. 
one of the first publications of its kind, “a checklist for evaluating open source digital library software” by dion hoe-lian goh et al., presents a comprehensive list of criteria for library dams evaluation.2 the researchers developed twelve broad categories for testing (e.g., content management, metadata, and preservation) and generated a scoring system based on the assignment of a weight and a numeric value to each criterion.3 while the checklist was created to assist with the evaluation process, the authors note that an institution’s selection decision should be guided primarily by defining the scope of their digital library, the content being curated using the software, and the uses of the material.4 through their efforts, the authors created a rubric that can be utilized by other organizations when selecting a dams. information technology and libraries | june 2016 7 subsequent research projects have expanded upon the checklist evaluation model. in “choosing software for a digital library,” jody deridder outlines major issues that librarians should address when choosing dams software, including many of the hardware, technological, and metadata concerns that goh et al. identified.5 additionally, she emphasizes the need to account for personnel and service requirements with a variety of activities: usability testing and estimating associated costs; conducting a formal needs assessment to guide the evaluation process; and a tiered-testing approach, which calls upon evaluators to winnow the number of systems.6 by considering stakeholder needs, from users to library administrators, deridder’s contributions inform a more comprehensive dams evaluation process. in addition to creating evaluation criteria, the literature on dams selection has also produced case studies that reflect real-world scenarios and identify use cases that help determine user needs and desires. in “evaluation of digital repository software at the national library of medicine,” jennifer l. marill and edward c. luczak discuss the process that the national library of medicine (nlm) used to compare ten dams, both proprietary and open-source.7 echoing goh et al. and deridder, marill and luczak created broad categories for testing and developed a scoring system for comparing dams.8 additionally, marill and luczak enriched the evaluation process by implementing two testing phases: “initial testing of ten systems” and “in-depth testing of three systems.”9 this method allowed nlm to conduct extensive research on the most promising systems for their needs before selecting a dams to implement. the tiered approach appealed to the task force, and influenced how it conducted the evaluation process, because it balances efficiency and comprehensiveness. in another case study, dora wagner and kent gerber describe the collaborative process of selecting a dams across a consortium. in their article “building a shared digital collection: the experience of the cooperating libraries in consortium,”10 the authors emphasize additional criteria that are important for collaborating institutions: the ability to brand consortial products for local audiences; the flexibility to incorporate differing workflows for local administrators; and the shared responsibility of system maintenance and costs.11 while the uh libraries will not be managing a shared repository dams, the task force appreciated the article’s emphasis on maximizing customizations to improve the user experience. 
in “evaluation and usage scenarios of open source digital library and collection management tools,” georgios gkoumas and fotis lazarinis describe how they tested multiple open-source systems against typical library functions—such as acquisitions, cataloging, digital libraries, and digital preservation—to identify typical use cases for libraries.12 some of the use cases formulated by the researchers address digital platforms, including features related to supporting a diverse array of metadata schema and using a simple web interface for the management of digital assets.13 these use cases mirror local feature and functionality requests incorporated into the uh libraries’ evaluation criteria. hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries | wu et al. | doi:10.6017/ital.v35i2.9152 8 in “digital libraries: comparison of 10 software,” mathieu andro, emmanuelle asselin, and marc maisonneuve discuss a rubric they developed to compare six open-source platforms (invenio, greenstone, omeka, eprints, ori-oai, and dspace) and four proprietary platforms (mnesys, digitool, yoolib, and contentdm) around six core areas: document management, metadata, engine, interoperability, user management, and web 2.0. 14 the authors note that each solution is “of good quality” and that institutions should consider a variety of factors when selecting a dams, including the “type of documents you will want to upload” and the “political criteria (open source or proprietary software)” desired by the institution.15 this article provided the uh libraries with additional factors to include in their evaluation criteria. finally, heather gilbert and tyler mobley’s article “breaking up with contentdm: why and how one institution took the leap to open source,” provides a case study for a new trend: selecting a dams for migration from an existing system to a new one.16 the researchers cite several reasons for their need to select a new dams, primarily their current system’s limitations with searching and displaying content in the digital library.17 they evaluated alternatives and selected a suite of open-source tools, including fedora, drupal, and blacklight, which combine to make up their new dams.18 gilbert and mobley also reflect on the migration process and identify several hurdles they had to overcome, such as customizing the open-source tools to meet their localized needs and confronting inconsistent metadata quality.19 gilbert and mobley’s article most closely matches the scenario faced by the uh libraries. our study adds to the limited literature on evaluating and selecting dams for migration in several ways. it demonstrates another model that other institutions can adapt to meet their specific needs. it identifies new factors for other institutions to take into account before or during their own migration process. finally, it adds to the body of evidence for a growing movement of libraries migrating from proprietary to open-source dams. dams evaluation and analysis methodology needs assessment the dams implementation task force fulfilled the first part of its charge by conducting a needs assessment. the goal of the needs assessment was to collect the key requirements of stakeholders, identify future features of the new dams, and gather data in order to craft criteria for evaluation and testing in the next phase of its work. 
the task force employed several techniques for information gathering during the needs assessment phase: • identified stakeholders and held internal focus group interviews to identify system requirement needs and gaps • reviewed scholarly literature on dams evaluation and migration • researched peer/aspirational institutions • reviewed national standards around dams information technology and libraries | june 2016 9 • determined both the current use of uhdl as well as its projected use of uhdl • identified uhdl materials and users task force members took detailed notes during each focus group interview session. the literature research on dams evaluation helped the task force to find articles with comprehensive dams evaluation criteria. the niso criteria for core types of entities in digital library collections were also listed and applied to the evaluation after reviewing the niso framework of guidance for building good digital collections.20 more than forty peer and aspirational institutions’ digital repositories were benchmarked to identify web site names, platform architecture, documentation, and user and system features. the task force analyzed the rich data gathered from needs assessment activities and built the dams evaluation criteria that prepared the task force for the next phase of evaluation. evaluation, testing, and recommendation the task force began its evaluation process by identifying twelve potential dams for consideration that were ultimately narrowed down to three systems for in-depth testing. using data from focus group interviews, literature reviews, and dams best practices, the group generated a list of benchmark criteria. these broad evaluation criteria covered features in categories of system functionality, content management, metadata, user interface, and search support. members of the task force researched dams documentation, product information, and related literature to score each system against the evaluation criteria. table 1 contains the scores of the initial evaluation. from this process, five systems emerged with the highest scores: ● fedora (and, closely associated, fedora/hydra and fedora/islandora) ● collective access ● dspace ● rosettacontentdm the task force eliminated collective access from the final systems for testing because of its limited functionality. it is based around archival content only, and is not widely deployed. the task force decided not to test contentdm because of the system’s known functionalities that we identified through firsthand experience. after the initial elimination process, fedora (including fedora/hydra and fedora/islandora), dspace, and rosetta remained for in-depth testing. hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries | wu et al. | doi:10.6017/ital.v35i2.9152 10 dams evaluation score* fedora 27 fedora/hydra 26 fedora/islandora 26 collective access 24 dspace 24 rosetta 20 contentdm 20 trinity (ibase) 19 preservica 16 luna imaging 15 roda† 6 invenio‡ 5 table 1. evaluation scores of twelve dams using broad evaluation criteria the task force then created detailed evaluation and testing criteria by drawing from the same sources used previously: focus groups, literature review, and best practices. 
while the broad evaluation focused on high-level functions, the detailed evaluation and testing criteria for the final three systems closely analyzed the specific features of each dams in eight categories: ● system environment and function ● administrative access ● content ingest and management ● metadata ● content access ● discoverability ● report and inquiry capabilities ● system support * total possible score: 29. † removed from evaluation because the system does not support dublin core metadata. ‡ removed from evaluation because the system does not support dublin core metadata. information technology and libraries | june 2016 11 prior to the in-depth testing of the final three systems, the task force researched timelines for system setup. rosetta’s timeline for system setup proved to be prohibitive. consequently, the task force eliminated rosetta from the testing pool and moved forward with fedora and dspace. to conduct the detailed evaluation, the task force scored the specific features under each category utilizing systems testing and documentation. a score range from zero to three (0 = none, 1 = low, 2 = moderate, 3 = high) was assigned for each feature evaluated. after evaluating all features, the score was tallied for each category. our testing revealed that fedora outperformed dspace in over half of the testing sections: content ingest and management, metadata, content access, discoverability, and report and inquiry capabilities. see table 2 for the tallied scores in each testing section. testing sections dspace score fedora score possible score system environment and testing 21 21 36 administrative access 15 12 18 content ingest and management 59 96 123 metadata 32 43 51 content access 14 18 18 discoverability 46 84 114 report and inquiry capabilities 6 15 21 system support 12 11 12 total score: 205 300 393 table 2. scores of top two dams from testing using detailed evaluation criteria after review of the testing results, the task force conducted a facilitated activity to summarize the advantages and disadvantages of each system. based on this comparison, the dams task force recommended that the uh libraries implement a fedora/hydra repository architecture with the following course of action: ● adapt the uhdl user interface to fedora and re-evaluate it for possible improvements ● develop an administrative content management interface with the hydra framework ● migrate all uhdl content to a fedora repository hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries | wu et al. | doi:10.6017/ital.v35i2.9152 12 fedora/hydra advantages fedora/hydra disadvantages open source steep learning curve large development community long setup time linked data ready requires additional tools for discovery modular design through api no standard model for multi-file objects scalable, sustainable, and extensible batch import/export of metadata handles any file format table 3. fedora/hydra advantages and disadvantages the primary advantages of a dams based on fedora/hydra are: a large and active development community; a scalable and modular system that can grow quickly to accommodate large scale digitization; and a repository architecture based on linked data technologies. this last advantage, in particular, is unique among all systems evaluated, and will give the uh libraries the ability to publish our collections as linked open data. 
fedora 4 conforms to the world wide web consortium (w3c) recommendation for linked data platforms.21 the main disadvantage of a fedora/hydra system is the steep learning curve associated with designing metadata models and developing a customized software suite, which translates to a longer implementation time compared to off-the-shelf products. the uh libraries must allocate an appropriate amount of time and resources for planning, implementation, and staff training. the long-term return on investment for this path will be a highly skilled technical staff with the ability to maintain and customize an open-source, standards-based repository architecture that can be expanded to support other uh libraries content such as geospatial data, research data, and institutional repository materials. information technology and libraries | june 2016 13 dspace advantages dspace disadvantages open source flat file and metadata structure easy installation / ready out of box limited reporting capabilities existing familiarity through texas digital library limited metadata features user group / profile controls does not support linked data metadata quality module limited api batch import of objects not scalable / extensible poor user interface table 4. dspace advantages and disadvantages the main advantages of dspace are ease of installation, familiarity of workflows, and additional functionality not found in contentdm.22 installation and migration to a dspace system would be relatively fast, and staff could quickly transition to new workflows because they are similar to contentdm. dspace also supports authentication and user roles that could be used to limit content to the uh community only. commercial add-on modules, although expensive, could be purchased to provide more sophisticated content management tools than are currently available with contentdm. the disadvantages of a dspace system are the same long-term, systemic problems with the current contentdm repository. dspace uses a flat metadata structure, has a limited api, does not scale well, and is not customizable to the uh libraries’ needs. consultations with peers indicated that both contentdm and dspace institutions are exploring the more robust capabilities of fedora-based systems. migration of the digital collections in contentdm to a dspace repository would provide few, if any, long term benefits to the uh libraries. of all the systems considered, implementation of a fedora/hydra repository aligns most clearly with the uh libraries strategic directions of attaining national recognition and improving access to our unique collections. the fedora and hydra communities are very active, with project management overseen by duraspace and hydra respectively.23,24 over the long term, a repository based on fedora/hydra will give the uh libraries a low cost, scalable, flexible, and interoperable platform for providing online access to our unique collections. hitting the road towards a greater digital destination: evaluating and testing dams at university of houston libraries | wu et al. 
cost considerations

to balance the current digital collections production schedule with the demands of a timely implementation and migration, the task force identified the following investments as cost effective for fedora/hydra and dspace, respectively:

fedora/hydra:
● metadata librarian: annual salary
  ● manages daily metadata unit operations during implementation
  ● streamlines the migration process

dspace:
● metadata librarian: annual salary
  ● manages daily metadata unit operations during implementation
  ● streamlines the migration process
● @mire modules: $41,500
  ● content delivery (3): $13,500
  ● metadata quality: $10,000
  ● image conversion suite: $9,000
  ● content & usage analysis: $9,000
  ● these modules require one-time fees to @mire that recur when upgrading to a new version of dspace

table 5. start-up costs associated with fedora/hydra and dspace

the task force determined that an investment in one librarian's salary is the most cost-effective course of action. the new metadata librarian will manage daily operations of the metadata unit in metadata & digitization services while the metadata services coordinator, in close collaboration with the web projects manager, leads the dams implementation process. in contrast to fedora, migration to dspace would require a substantial investment in third-party software modules from @mire to deliver the best possible content management environment and user experience.

implementation strategies

the implementation of the new dams will occur in a phased rollout comprised of the following stages: system installation, data migration, and interface development. mds and web services will perform the majority of the work, in consultation with key stakeholders from special collections and other units. throughout this process, the dams implementation task force will consult with the digital preservation task force* to coordinate the preservation and access systems.

phase one: system installation | phase two: data migration | phase three: interface development
set up production and server environment | formulate content migration strategy and schedule | reevaluate front-end user interface
rewrite uhdl front-end application for fedora/solr | migrate test collections and document exceptions | rewrite uhdl front end as a hydra head, or . . .
create metadata models | conduct the data migration | . . . update current front end
coordinate workflows with digital preservation task force | create preservation metadata for migrated data | establish interdepartmental production workflows
begin development of administrative hydra head for content management | continue development of the hydra administrative interface | refine administrative hydra head for content management

table 6. overview of dams phased implementation

phase one: system installation

during the first phase of dams implementation, web services and mds will work closely together to install an open-source repository software stack based on fedora, rewrite the current php front-end interface to provide public access to the data in the new system, and create metadata content models for the uhdl based on the portland common data model,25 in consultation with the coordinator of digital projects from special collections and other key stakeholders. the dams task force will consult with the digital preservation task force† to determine how closely the preservation and access systems will be integrated and at what points.
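the phase one front-end rewrite pairs fedora with a solr index; a hedged sketch of the kind of search request such a front end might issue, assuming a solr core named uhdl on localhost and illustrative field names:

```python
# Minimal sketch: query a Solr index that mirrors repository metadata.
# The core name ("uhdl") and field names (title, collection) are assumptions.
import requests

SOLR = "http://localhost:8983/solr/uhdl/select"

params = {
    "q": "title:houston",          # user keyword search
    "fq": "collection:postcards",  # facet-style filter
    "rows": 10,
    "wt": "json",                  # ask Solr for a JSON response
}
resp = requests.get(SOLR, params=params)
resp.raise_for_status()
for doc in resp.json()["response"]["docs"]:
    print(doc.get("title"), "-", doc.get("collection"))
```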
the two groups will also jointly outline a dams migration strategy that aligns with the preservation system. web services and mds will collaborate on research and development of an administrative interface, based on the hydra framework, for day-to-day management of uhdl content.

* an appointed task force to create a digital preservation policy and identify strategies, actions, and tools needed to sustain long-term access to digital assets maintained by uh libraries.
† a working team at uh libraries that enforces the digital preservation policy and maintains the digital preservation system.

phase two: data migration

in the second phase, mds will migrate legacy content from contentdm to the new system and work with web services, special collections, and the architecture and art library to resolve any technical, metadata, or content problems that arise. the second phase will begin with the development of a strategy for completing the work in a timely fashion, followed by migration of representative sample collections to the new system to test and refine its capabilities. after testing is complete, all legacy content will be migrated from contentdm to fedora, and preservation metadata for migrated collections will be created and archived. development work on the hydra administrative interface will also continue. after the data migration is complete, all new collections will be ingested into fedora/hydra, and the current contentdm installation will be retired.

phase three: interface development

in the final phase, web services will reevaluate the current front-end user interface (ui) for the uhdl by conducting user tests to better understand how and why users are visiting the uhdl. web services will also analyze web and system analytics and gather feedback from special collections and other stakeholders. depending on the outcome of this research, web services may create a new ui based on the hydra framework or choose to update the current front-end application with modifications or new features. web services and mds will also continue to develop or adopt tools for the management of uhdl content and work with special collections and the branch libraries to establish production workflows in the new system. continued development work on the front-end and administrative interfaces, for the life of the new digital asset management system, is both expected and desirable as we maintain and improve the uhdl infrastructure and contribute to the open-source software community in line with the uh libraries strategic directions.

ongoing: assessment, enhancement, training, and documenting

throughout the transition process, mds and web services will undergo extensive training in workshops and conferences to develop the skills necessary for developing and maintaining the new system. they will also establish and document workflows to ensure the long-term viability of the system. regular consultation with special collections, the branch libraries, and other stakeholders will be conducted to ensure that the new system satisfies the requirements of colleagues and patrons. ongoing activities will include:

● assessing service impact of the new system
● user testing on the ui
● regular system enhancements
● establishing new workflows
● creating and maintaining documentation
● training: conferences, webinars, workshops, etc.
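returning to phase two, one low-risk way to begin extracting legacy records from contentdm is over its oai-pmh endpoint rather than screen scraping. a minimal harvesting sketch using the sickle library; the repository url, endpoint path, and set name are assumptions, not uh libraries' actual values:

```python
# Minimal sketch: harvest Dublin Core records from a CONTENTdm OAI-PMH
# endpoint as a first migration step. URL and set spec are assumptions.
from sickle import Sickle

harvester = Sickle("https://cdm.example.edu/oai/oai.php")

records = harvester.ListRecords(metadataPrefix="oai_dc", set="p15195coll1")
for record in records:
    identifier = record.header.identifier
    metadata = record.metadata          # dict of Dublin Core elements
    print(identifier, metadata.get("title"))
```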
conclusion

transitioning from contentdm to a fedora/hydra repository will place the uh libraries in a position to sustainably grow the amount of content in the uh digital library and customize the uhdl interfaces for a better user experience. publishing our data in a linked data platform will give the uh libraries the ability to more easily publish our data for the semantic web. in addition, the fedora/hydra architecture can be adapted to support a wide range of uh libraries projects, including a geospatial data portal, a research data repository, and a self-deposit institutional repository. over the long term, the return on investment for implementing an open-source repository architecture based on industry-standard software will be: improved visibility of our unique collections on the web; expanded opportunities for aggregating our collections with high-profile repositories such as the digital public library of america; and increased national recognition for our digital projects and staff expertise.

references

1. "the university of houston libraries strategic directions, 2013–2016," accessed july 22, 2015, http://info.lib.uh.edu/sites/default/files/docs/strategic-directions/2013-2016-libraries-strategic-directions-final.pdf.
2. dion hoe-lian goh et al., "a checklist for evaluating open source digital library software," online information review 30, no. 4 (july 13, 2006): 360–79, doi:10.1108/14684520610686283.
3. ibid., 366.
4. ibid., 364.
5. jody l. deridder, "choosing software for a digital library," library hi tech news 24, no. 9 (2007): 19–21, doi:10.1108/07419050710874223.
6. ibid., 21.
7. jennifer l. marill and edward c. luczak, "evaluation of digital repository software at the national library of medicine," d-lib magazine 15, no. 5/6 (may 2009), doi:10.1045/may2009-marill.
8. ibid.
9. ibid.
10. dora wagner and kent gerber, "building a shared digital collection: the experience of the cooperating libraries in consortium," college & undergraduate libraries 18, no. 2–3 (2011): 272–90, doi:10.1080/10691316.2011.577680.
11. ibid., 280–84.
12. georgios gkoumas and fotis lazarinis, "evaluation and usage scenarios of open source digital library and collection management tools," program: electronic library and information systems 49, no. 3 (2015): 226–41, doi:10.1108/prog-09-2014-0070.
13. ibid., 238–39.
14. mathieu andro, emmanuelle asselin, and marc maisonneuve, "digital libraries: comparison of 10 software," library collections, acquisitions, & technical services 36, no. 3–4 (2012): 79–83, doi:10.1016/j.lcats.2012.05.002.
15. ibid., 82.
16. heather gilbert and tyler mobley, "breaking up with contentdm: why and how one institution took the leap to open source," code4lib journal, no. 20 (2013), http://journal.code4lib.org/articles/8327.
17. ibid.
18. ibid.
19. ibid.
20. niso framework working group with support from the institute of museum and library services, a framework of guidance for building good digital collections (baltimore, md: national information standards organization (niso), 2007).
21. "linked data platform 1.0," w3c, accessed july 22, 2015, http://www.w3.org/tr/ldp/.
22. "dspace," accessed july 22, 2015, http://www.dspace.org/.
23. "fedora repository home," accessed july 22, 2015, https://wiki.duraspace.org/display/ff/fedora+repository+home.
24. "hydra project," accessed july 22, 2015, http://projecthydra.org/.

foreword

the editorial board of the journal of library automation is pleased to pay tribute to frederick g. kilgour who, with the able assistance of his assistant editor, eleanor m. kilgour, so firmly established this periodical and set its standards so high. especially in view of the fact that in these first years of journal publication, mr. kilgour was also designing and implementing the complex system which is the ohio college library center, his achievement as first editor was remarkable. to him the information science and automation division of the american library association owes a great debt. as library automation moves further into the seventies, the context of its existence changes. ever-increasing fiscal pressures have required economic justification for every alteration of traditional practice. the mere availability of equipment, of programs and tested system design, even of skilled and experienced manpower can no longer be considered enough. novelty, the magic word "innovation," seldom now casts a spell on those who control institutional budgets. increasingly, in the issues of this journal, we hope that emphasis will be placed on reviews of experience, retrospective evaluations of operation rather than optimistic projections made in the first bright mornings of system design. we must have reports if not of failures at least of alterations and accommodations enforced on operational systems by experience and the heavy hand of time. it is our further hope that the journal will receive more reports from public and school libraries which indicate an increasing dedication, in automation explications, to the social and educational goals of those institutions. -ajg

improved delivery of library materials: the cleveland experience

j. p. herling: cleveland state university library; m. g. fancher beeler: cuyahoga county public library; and a. reisman and b. v. dean: case western reserve university, department of operations research.

this paper describes a project designed to improve services to library users by solving, through the application of operations research methods, a complex problem of delivery of library materials in an urban, multisystem library service region. unique features, methodology, results, and limitations are discussed.
introduction when one realizes that 113 of the major libraries in the country carry 75 percent of the estimated total cost of interlibrary loan per year of 16 million dollars, the importance of greater utilization of local resources is obvious.1 during the planning of the implementation of a closed-circuit teletype communications network ( twp) among libraries in greater cleveland in 1968, it became apparent that improved communications answer only a part of the problem of shared access to library materials. in fact, user frustration is often increased by the inability of a library to provide quickly the materials that it has informed a library user are available in another library in the region. effective delivet·y of materials is an essential component of a successful library network, with efficiency a highly desirable characteristic. in late summer of 1968, representatives of several libraries in cleveland met as an ad hoc committee to discuss approaches to the solution of the problem: how to make the total resources of all types of libraries in greater cleveland more accessible to all, hopefully, by providing daily delivery service among all libraries. members of the committee agreed that the complexity of the problem required more than the pragmatic approach. discussions with members of the department of operations research at case western reserve university led to the preparation of a proposal: "an operations research study and design of an optimal distribution network for selected public, academic, and special libraries in greater 276 journal of libmry automation vol. 7 i 4 december 197 4 cleveland." prior to the preparation of the proposal, a literature search and inquiries to major cooperative networks had indicated that nowhere had the operations research approach been utilized to improve a library delivery system of the scope of that with which we were dealing. 2 after a year's delay, the proposal became a project sponsored by the library council of greater cleveland 4 and funded through the state library of ohio under title iii of the library services and construction act. a task force of librarians and operations researchers began work in the summer of 1970: the official project was completed in september 1972.3 the strategy chosen for this project was to delineate objectives, describe the present system iri detail, and design an improved system ·based on the existing system. methodology specifically, the generalized statement of the problem as defined by the project team was to 1. determine optimized delivery frequencies, schedules, and · routes which maintain the present distribution system's effectiveness and reduce present costs, 2. determine the optimal delivery frequencies, schedules, and routes which maximize the distribution system's effectiveness without increasing costs, and 3. evaluate alternative configurations of distribution systems in consideration of the network of library demands and geographical locations of garages, vehicles, and drivers. next the task force undertook, by means of questionnaires, data collection forms, and site visitations, the difficult and time-consuming task of describing a system the magnitude and complexity of which is apparent from figure 1. the result was a report, systems description i. 
we consider systems description i a major accomplishment, bringing together for the first time specific details of many of the operations of the libraries and library systems in cleveland and giving basic information on the who, how, and how much of the delivery subsystem. this subsystem was formally defined as comprising (1) personnel, (2) vehicles, (3) facilities, (4) supplies, and (5) funds, together with the schedules and routes involved in the physical movement of library materials. in cleveland, this consisted of (1) drivers, custodial staff (in smaller libraries), and student couriers (in academic libraries); (2) trucks owned and operated by the cuyahoga county public library and the cleveland public library, plus commercial vehicles and private automobiles utilized by academic and independent suburban libraries; (3) garages owned by the two libraries mentioned; (4) equipment such as gasoline, tires, bindery boxes, telescopes, etc.; and (5) direct and indirect costs of roughly $200,000.

* the library council of greater cleveland is comprised of the directors of the following libraries: case western reserve university, cleveland heights-university heights public library, cleveland public library, cleveland state university, cuyahoga county public library, east cleveland public library, euclid public library, lakewood public library, porter public library, rocky river public library, shaker heights public library, and willoughby-eastlake public library.

fig. 1. structure of existing major distribution systems and frequencies of deliveries.

the systems description also provided information on the use of the delivery subsystem. materials transported were categorized as shown in table 1.

table 1. material types and values*

type | value (weight)
i1 interlibrary loan | .135
i2 intralibrary loan | .135
i3 audiovisuals | .135
i4 reciprocal return | .119
i5 newly processed contract | .112
i6 newly processed intralibrary | .109
i7 photoduplication | .098
i8 mending and bindery | .081
i9 bulk intralibrary loan | .067
i10 correspondence | .065
i11 supplies | .058
i12 gifts | .021

* the values assigned are described later in the paper.

the magnitude of the volume of materials involved is clear from the fact that on a single day each truck averages a delivery and/or pickup of 43 telescopes, 114 packages, 49 bindery boxes, and 36 audiovisual items. annually, 5 million volumes of inter- and intralibrary loans and reciprocal returns are transported among the libraries. for each of the libraries, the amount of materials originating for shipment was defined as the "demand" which that library placed on the delivery system. although most of the delivery stops in the cleveland area are libraries, some are not, e.g., hospitals, post offices, boards of education, and schools. we decided to designate delivery points as "nodes." a library node is characterized by the following attributes:

1. it receives delivery on a continuing basis.
2. it is located within the boundaries of a specified geographic area at a fixed site.
3. it has an expressed need for library materials and/or is a source of materials needed elsewhere.
4. it contains library facilities and/or assigns a person to library service.
5. it has a formal (contractual, political, administrative, etc.) or an informal agreement with other nodes and/or library systems.
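the node definition above maps naturally onto a small record type that a routing program can work with. a sketch; the field names and sample values are ours, not the project's:

```python
# Minimal sketch: a delivery "node" as the study defines it, with the
# attributes used later for routing. Field names and values are illustrative.
from dataclasses import dataclass

@dataclass
class Node:
    code: str                  # coded notation for computer processing
    name: str
    x: float                   # geographic coordinates
    y: float
    weekly_demand: int         # items originating for shipment
    deliveries_per_week: int   # assigned delivery frequency

ccpl_hq = Node(code="HDQ", name="Cuyahoga County Public Library HQ",
               x=41.4, y=-81.7, weekly_demand=430, deliveries_per_week=5)
print(ccpl_hq)
```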
seven hundred sixty-two nodes were identified in the systems description, including over 700 libraries with total collections of over 8 million books, 27,000 periodical subscriptions, and 200,000 technical reports. each of the nodes was coded to provide a convenient notation for computer processing. figure 2 displays a section of a computer-generated map.

fig. 2. a portion of one of the computer-generated maps.

as indicated earlier, the discipline used in this study was that of operations research. operations research is the application of mathematical and engineering techniques to the solution of management and systems problems, generally but not necessarily with the use of the computer. the operations research approach requires a valid unit of measurement. if an existing system is to be evaluated for comparison with alternative systems other than subjectively, some quantitative basis must be derived. we believe that one of the most important products of this project was the development of a measure of effectiveness, or "objective function." this measure was a composite of numerical values (weights) assigned to the types of materials to be delivered, as shown in table 1; the frequency of delivery within a week; timeliness value (utility), as shown in figures 3 and 4; and the number of units to be delivered.

fig. 3. utility curves for the timeliness of library materials delivery (curves shown for supplies, mending and binding, newly processed contract materials, reciprocal returns, photoduplicated materials, and interlibrary loans).

fig. 4. utility curves for the timeliness of library materials delivery (curves shown for gifts, intralibrary bulk shipments, newly processed intralibrary materials, correspondence, and intralibrary loans).

to illustrate: ten interlibrary loan items (a weight of .135) delivered in less than one day (a utility of 1) have an effectiveness value of 1.35. on the other hand, ten items delivered in five days (a utility of .5) have an effectiveness value of 0.5 x .135 x 10 = 0.675. a system designed to accomplish the latter would have 50 percent less effectiveness than a system that accomplished the former. no librarian needs to be told that it is generally more important to deliver interlibrary loans promptly than it is to deliver supplies. but in order to use operations research methods, quantitative values, as we have said, are required.
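the effectiveness measure is simply weight x utility x quantity, summed over shipments. a sketch that reproduces the interlibrary loan illustration, with weights taken from table 1 and a stand-in utility curve in place of figures 3 and 4:

```python
# Minimal sketch: effectiveness = material weight x timeliness utility x units.
# Weights follow table 1; the utility function is a stand-in for figs. 3-4.
WEIGHTS = {"interlibrary loan": 0.135, "supplies": 0.058, "gifts": 0.021}

def utility(days: float) -> float:
    """Illustrative timeliness curve: full value same-day, half value at five days."""
    if days <= 1:
        return 1.0
    return max(0.0, 1.0 - 0.125 * (days - 1))

def effectiveness(material: str, units: int, days: float) -> float:
    return WEIGHTS[material] * utility(days) * units

print(effectiveness("interlibrary loan", 10, days=1))   # 1.35
print(effectiveness("interlibrary loan", 10, days=5))   # 0.675, half as effective
```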
the values shown in table 1 and the sensitivity to timeliness of delivery, i.e., figures 3 and 4, were established by the use of a technique known as the delphi method. our application of the method in this project has been described elsewhere.4 essentially the method seeks out a consensus from a panel of knowledgeable people, in this case experts from academic, school, special, and public libraries, and a trustee. the methodology has three characteristics: anonymity, controlled feedback, and statistical group response. anonymity is used to minimize the impacts of dominant individuals in the panel. this is achieved by eliciting separate and individual responses to previously prepared questions. in this case, the responses were made in writing on preprinted forms. controlled feedback reduces the variance in parameter estimates. after the first and all remaining rounds, the results of the previous round are fed back to the panel in a summarized form showing the vote distribution, along with various justifications for votes after the second round. since the panel is asked to reevaluate their position based on the feedback provided, but with no particular attempt to arrive at unanimity, the spread of votes will usually be much smaller after several rounds than during the earlier rounds. this is known as statistical group response. in each case consensus was reached within five rounds.

in addition to the need for evaluating system effectiveness in relation to service, there is the need to relate effectiveness to costs. systems description i provided the data on all fixed and variable costs of the existing system. because of the prevailing use in libraries of line accounting, all the associated costs were not readily available, hence present costs were probably underestimated. for purposes of computer processing, cost per minute of driving and cost per mile of truck operation were identified.

fig. 5. weekly mileage versus time for ccpl trucks.

software

to repeat: the general approach was to study the characteristics of the existing system, then design an improved system. using the elements described above, a computer program was written to emulate the system, introducing, however, the measure of effectiveness to make it possible to establish values representing the existing level of performance. entered in the program were

1. the nodes,
2. demand and frequency of delivery at each node,
3. geographic coordinates of each node,
4. unit costs, and
5. weights and utilities for each type of material.

the program was run to compute, for each driver, the costs, distances traveled, volume delivered, time utilization, the effectiveness as discussed earlier, and then the cost/effectiveness ratios. figures 5 through 10 show the hard data inputs to the program. table 2 depicts a sample of the statistical analysis performed on the hard data.

table 2. statistical analysis of the cpl driver collection cards, summer schedule 8/17/70-9/4/70 (daily mileage, number of stops, and numbers of telescopes, packages, bindery boxes, and audiovisuals delivered and picked up):
mean: 48.13, 14.33, 18.00, 13.11, 0.11, 5.89, 0.11, 16.22
variance: 187.27, 9.50, 58.50, 98.61, 0.11, 57.94, 51.11, 0.11
standard deviation: 13.68, 3.08, 7.65, 9.93, 0.33, 7.61, 7.15, 0.33

fig. 6. number of stops per week versus time for ccpl trucks.
four sets of computer runs were made, first using data for the same week for all drivers, then data for several weeks for different drivers. total effectiveness of the existing system, as measured by the sum of the multiples of the importance of each material type (weight), their timeliness values (utilities), and total amounts of materials delivered, ranged from 8,110 to 9,950. costs ranged from $3,801 to $3,934 per week.

a second program incorporating the tools of operations research known as simulation and optimization was then used to design an improved system. this program (simopt) included a routing algorithm (a set of instructions to the computer) to determine the best routes for each of the drivers on a daily basis. figure 11 describes the basic logic of this program. the procedures to operate the methodology require the following steps (see figure 11):

1. based on the library hierarchy, contractual arrangements, or any extraneous but agreed-upon reasons, the librarians assign frequencies of delivery to each group of nodes or individual node.
2. using the maps (figure 2) and other information, librarians group nodes and assign them to a driver along with the frequency as derived from step 1 above. this constitutes the input necessary for a computer production run.
3. in a production run, the computer calculates:
   a. its best route for each driver day by day;
   b. the effectiveness of the route;
   c. the cost of the route, cumulative by day for one week;
   d. the distance traveled by each driver;
   e. the time spent working by each driver; and
   f. capacity, time, and/or distance constraint violations.
4. if results of step 3 are not satisfactory or a better variant is synthesized, librarians can iterate through steps 1 or 2.
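the routing subroutine itself is not reproduced in the article; a toy stand-in for step 3 is a nearest-neighbour pass over a driver's assigned stops, costed per mile and per minute. all coordinates, speeds, and unit costs below are invented for illustration:

```python
# Toy stand-in for the routing subroutine: order a driver's assigned stops
# by nearest-neighbour, then cost the route per mile and per minute.
# Coordinates, speed, and unit costs are invented for illustration.
import math

COST_PER_MILE = 0.12      # truck operation
COST_PER_MINUTE = 0.08    # driver time
MILES_PER_MINUTE = 0.4    # assumed average speed

stops = {"HDQ": (0.0, 0.0), "A": (3.0, 4.0), "B": (6.0, 1.0), "C": (2.0, 7.0)}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def nearest_neighbour_route(start="HDQ"):
    route, remaining = [start], set(stops) - {start}
    while remaining:
        here = stops[route[-1]]
        nxt = min(remaining, key=lambda s: dist(here, stops[s]))
        route.append(nxt)
        remaining.remove(nxt)
    return route

route = nearest_neighbour_route()
miles = sum(dist(stops[a], stops[b]) for a, b in zip(route, route[1:]))
minutes = miles / MILES_PER_MINUTE
cost = miles * COST_PER_MILE + minutes * COST_PER_MINUTE
print(route, round(miles, 1), "miles,", round(cost, 2), "dollars")
```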
fig. 7. number of telescopes delivered per week versus time for ccpl trucks.

in order to maintain the information basis of the procedure, the following input must be updated for computer files.

ad hoc basis
● node changes: new nodes, nodes to be dropped, changes of location, changes of hierarchical status and category
● changes in vehicle capacity
● changes in cost parameters

periodic (intermediate range) basis
● evaluate demands, by season and by node (once every two or three years, or ad hoc if major shifts have been established)
● evaluate driver time data (as above)

periodic (long range) basis
● reevaluate the material types
● reestablish sensitivity curves

fig. 8. number of packages delivered per week versus time for ccpl trucks.

maintaining the same frequency of delivery as used earlier, but with routes generated by the computer, results showed a potential cost reduction of 5 percent and an increased effectiveness of 37,930, or a 400 to 500 percent improvement. the simulation-optimization program also has the capability of processing changes in the elements of the system. effects of two types of changes were tested: (1) configurations which included an increase of frequency of delivery to daily delivery for most libraries and twice-daily delivery to some; and (2) configurations which included one or two trucks dedicated to transshipment delivery among key distribution centers.

fig. 9. number of bindery boxes delivered per week versus time for ccpl trucks.

effectiveness again increased 400 to 500 percent; costs, however, also increased, between 3 and 39 percent.

discussion

essentially, these results provided the means by which cleveland libraries could maintain the existing delivery system at a slight reduction in cost, but with a four- to fivefold increase in effectiveness; or could improve the frequency of delivery at a known increase in cost and the four- to fivefold improved effectiveness. at the same time, a realistic basis for evaluating bids from commercial delivery services was made available, should this alternative be explored. last, but by no means least, a method for the analysis and/or design of a delivery system that could be used by other library networks was developed.

no study, as is true of most human endeavors, is perfect: ours is no exception. the original intent of the proposal, to study the entire distribution system, and especially its reference network aspects, was narrowed to the delivery subsystem because of inadequate funding. underestimation of the complexity of the problem, which mandated the expenditure of more time than was anticipated on data collection and systems description, caused a limitation on the time that could be devoted to study of the delivery subsystem. we could not, as we had intended, consider the question of optimum truck size or alternative types of vehicles; hypothetically, a combination of motorcycles and large trucks would produce a more cost-effective system. acceptance of the location of facilities such as garages as fixed was a further limiting factor: their relocation might have a significant effect. finally, the method of approach, in concert with the realities of library budgets, ruled out the design of an ideal system unrelated to the existing system.

fig. 10. number of audiovisuals delivered per week versus time for ccpl trucks.

enough has been written recently to denigrate the usefulness of the computer in library applications. nevertheless, we must acknowledge that a greater amount of human intervention than anticipated was employed as a corrective in the generation of computer-produced routes and must also be used for their implementation. consider: each of 700 geographical locations is a potential successor in a route to any other of the remaining 699. to process these for computer routing would require obtaining nearly 500,000 pairs of geographical coordinates, their keypunching, and verifying. by human selection from a map, reasonable sets of contiguous nodes were fed into the computer; the pairs of geographical coordinates were thus reduced to the not unmanageable number of 2,500 to 6,400 pairs. further, once computer routes have been generated, human
intervention is required to adjust these to road and traffic patterns that the computer cannot know. this does not imply that the multitude of calculations that need be performed in a study such as this could have ever been attempted without the computer.

fig. 11. a general methodology for the simulation-optimization (inputs of unit costs, weights and utilities, node locations and demands, and hierarchy or policy rules flow through frequency selection, scheduling, the routing subroutine, computation of the objective and cost, and constraint checks, iterated over days and drivers).

conclusion

despite its imperfections, the project discussed here has convinced us that the approach and methodology are of value to the library community, not only in application to library delivery systems but also in application to a multitude of library service problems, particularly those involving several libraries or library systems, albeit, because of changes in top administrative positions within the key library systems, the results of this study are still awaiting implementation.

references

1. library of congress information bulletin 31:a72 (june 9, 1972).
2. a related study relatively limited in scope is j. c. hsiao and f. j. heinritz, "optimum distribution of centrally processed material: multiple routing solutions utilizing the lock-set method of sequential programming," library resources & technical services 13:537-44 (fall 1969).
3. full documentation of the project is available in the following: an operations research study and design of an optimal distribution network for selected public, academic, and special libraries in greater cleveland: technical report (cleveland, ohio: the task force, lsca title iii distribution project, 1972); systems description i (cleveland, ohio: the task force, lsca title iii distribution project, 1972). these are available on loan through the state library of ohio.
4. a. reisman, g. kaminski, s. srinivasan, j. herling, and m. g. fancher, "timeliness of library material delivery: a set of priorities," socio-economic planning sciences 6:145-52 (1972).

principles of format design

henriette d. avram and lucia j. rather: marc development office, library of congress

this paper is a summary of several working papers prepared for the international federation of library associations (ifla) working group on content designators. the first working paper, january 1973, discussed the obstacles confronting the working group, stated the scope of responsibility for the working group, and gave definitions of the terms tags, indicators, and data element identifiers, as well as a statement of the function of each.1 the first paper was submitted to the working group for comments and was subsequently modified (revised april 1973) to reflect those comments that were applicable to the scope of the working group and to the definition and function of content designators. the present paper makes the basic assumption that there will be a supermarc and discusses principles of format design. this series of papers is being published in the interest of alerting the library community to international activities.
all individual working papers are submitted to the marbi interdivisional committee of ala by the chairman of the ifla working group for comments by that committee. introduction in order to have this paper stand alone, the scope and the definition and functions of the content designators as agreed to by the working group are summarized below: 1. the scope of responsibility for the ifla working group is to arrive at a standard list of content designators for different forms of material for the international interchange of bibliographic data. 2. the definition and function of each content designator are given as: a. a tag is a string of characters used to identify or name the main content of an associated data field. the designation of main content does not require that a data field contain all possible data elements all the time. b. an indicator is a character associated with a tag to supply additional information about the data field or parameters for the processing of the data field. there may be more than one indicator per data field. 162 ] ournal of lib1'a1'y automation vol. 7 i 3 september 197 4 c. a data element identifier is a code consisting of one or more characters used to identify individual data elements within a data field. the data element identifier precedes the data element which it identifies. d. a fixed field is one in which every occurrence of the field has a length of the same fixed value regardless of changes in the contents of the fixed field from occurrence to occurrence. the content of the fixed field can actually be data content, or a code representing data content, or a code representing information about the record. basic assumption-supermarc there appears to be little doubt that the format used for international exchange will not be the format presently in use in any national system. the first working paper addressed the obstacles that preclude complete agreement on any single national format, and a study of the matrix of the content designators assigned by various national agencies substantiates the above conclusion. consequently, we are concerned with the development of a supermarc whereby national agencies would translate their local format into that of the supermarc format and conversely, each agency would accept the supermarc format and translate it into a format for local processing. 2• 3 supermarc, therefore, is an international exchange format with the principal function that of transferring data across national boundaries. it is not a processing format (although if desired, it could be used as such) and in no way dictates the record organization, character bit configuration, coding schemes, etc., to be used within processing agencies. the supermarc format, however, should conform to certain conventions, namely the format structure should be iso 2709 and the character representation should be an eight-bit extension of iso 646. ~ the latter convention means that data cannot be in any other configuration than a character-by-character representation. supermarc assumes not only agreement on the value of content designators but, equally as important, on the level of application of these content designators. whatever the agreed upon level of content designation is, those agencies with formats more detailed will be able to translate to supermarc but will be in the position of having to upgrade all records entered into their local system from other agencies. 
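before turning to the remaining principles, the content designators defined in the introduction can be made concrete with a small illustration. the values below follow modern marc 21 conventions purely as an example; the representation is our own sketch, not a prescribed supermarc structure:

```python
# Illustrative sketch of one tagged field: a tag names the field, indicators
# qualify it, and data element identifiers (subfield codes) mark each element.
# Values follow modern MARC 21 conventions purely as an example.
field = {
    "tag": "245",                 # title statement
    "indicators": ("1", "0"),     # e.g., title added entry, no nonfiling characters
    "data_elements": [
        ("a", "Journal of Library Automation"),   # $a title proper
        ("c", "American Library Association."),   # $c statement of responsibility
    ],
}

def render(f):
    subfields = " ".join(f"${code} {value}" for code, value in f["data_elements"])
    return f'{f["tag"]} {"".join(f["indicators"])} {subfields}'

print(render(field))   # 245 10 $a Journal of Library Automation $c ...
```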
likewise, local formats consisting of less detailed content designation than supermarc must upgrade to the supermarc level for communication purposes. where the actual content of the record is concerned, i.e., the fields andjor data elements to be included, it is highly probable that the decision of the content designator working group will be that data, if in~ iso/tc 46/sc4 wgl is presently engaged in the definition of extended characters for roman, cyrillic, and greek alphabets and mathematics and control symbols. principles of format design/ avram and rather 163 eluded in the record, are assigned supermarc content designators, but that not all data will always be present. this permits the flexibility required to bypass some of the substantive problems of different cataloging rules and cataloging systems. for example, one agency may supply printer and place of printing while another may not. it may be assumed, however, that all agencies will conform to the specifications prescribed by the isbd and other such standard descriptions as they become available. principles of format design prior to any deliberation regarding the actual value of content designators, the working group realized it must agree on a set of basic principles for the design of the international format. the first working paper set forth, in the form of questions, some of the issues that must be taken into account in arriving at the principles. several members of the working group expressed their opinions and these were considered in the formulation of the principles. the principles were discussed at the grenoble meeting in august 1973. five of the principles were adopted and the sixth was deferred for further analysis based on working papers to be written by some of the members. the sixth principle was adopted at the brussels meeting in february 1974. the six basic principles are stated below with a discussion following each principle: 1. the international format should be designed to handle all media. it would be ideal if at this time all forms of material had been fully analyzed. this is currently not the case. agreement on data fields and the assignment of content designators can realistically only be accomplished if there is a foundation upon which to build. therefore, the forms of material have been limited to those listed below because, to the best of our knowledge, these are the only forms where either experience has been gained in the actual conversion to machine-readable form or in-depth analysis has been performed to define the elements of information for the material. books: all monographic printed language materials. serials: all printed language materials in serial form. maps: printed maps, single maps, serial maps, and map collections. films: all media intended for projection in monographic or serial form. music and sound recordings: music scores and music and nonmusic sound recordings. at the meeting in brussels, the decision was made to use the isbd as the foundation for the definition of functional areas for the formats. since at the present time an isbd exists only for monographs and serials, these materials will receive first priority by the ifla working group. · still under consideration is the question whether manuscripts should be included in the forms of material within the scope of the 164 j oumal of lihra1'y automation vol. 7 i 3 september 197 4 working group. pictorial representations and computer mediums have not as yet been analyzed. 
when these forms have been analyzed, they should be added to the generalized list. 2. the inte1'national fo1'mat should accept single-level and multilevel st1'uctu1'es. there is a requirement to express the relationship of one bibliographic entity to another. this relationship may take many forms. a hierarchical relation is expressed for works which are part of a larger bibliographic entity (such as the chapter of a book, a single volume of a multivolume set, a book within a series). a linear relation is expressed for works which are related to other works such as a book in translation. this discussion is concerned with hierarchical relationships and the need to describe this relationship in machinereadable records. there are a number of ways in which hierarchical relationships may be expressed. one method is to place the information on the related work in a single field within the record. for example, the different volumes of a multivolume set may be carried in a contents field. when a book is in a series, the series may be caltied in a series field. this may be termed using a single-level record to show a hierarchical relationship. another method is to use a multilevel record made up of subrecords.t the concept of a subrecord directory and a subrecord relationship field was discussed in appendix ii to the ansi standard z39.2-197!.4 the appendix illustrated a possible method of handling subrecords and expressing relationships within a bibliographic record but was not part of the american standard. similarly, in 1968 the library of congress published as part of its marc ii format a proposal to provide for the bibliographic descriptions of more than one item in a single record, and represented this capability as "levels" of bibliographic description. 5 the international standard (iso 2709) defines a subrecord technique without an explicit statement of a method to describe relationships. 6 more recently, a level structure was proposed in a document by john e. linford,7 and an informal paper by richard coward8 gave the following example of a level structure: level collection sub-collection document analytical record 1 subrecord 1 subrecord 1 subrecord r------1------, 1 subrecord 1 subrecord 1 subrecord t a subrecord is a "group of fields within a bibliographic record which may be treated as a logical entity." when a bibliographic record describes more than one bibliographic unit, the descriptions of the individual bibliographic units may be treated as subrecords. principles of format design/ avram and rather 165 several national ,agencies have expressed concern regarding the efficiency of the iso 2709 subrecord technique and have suggested that a modification be made to the subrecord statement. there are alternative techniques which could be incorporated in the international exchange format to build in level capability. methods have been suggested that would cause a revision (specifically the number of characters in each directory entry) to the iso standard; other alternatives might not. regardless of the final technique agreed upon, national agencies should maintain the authority to record their cataloging data to reflect their catalog practices, i.e., either describing the items related to an item cataloged as fields within a single-level record or as subrecords of a multilevel record. 3. tags should identify a field by type of entry as well as function by assigning specific values to the charactet positions. 
assigning values to the characters of the tags allows the flexibility to derive more than a single kind of information from the tag. for example, it should be possible by an inspection of the tags to retrieve all personal names from a machine-readable record regardless of the function of the name in the record, i.e., principal author, secondary author, name used as subject, etc. 4. indicatots should be tag dependent and used as consistently as possible across all fields. indicators should be tag dependent because they provide both descriptive and processing information about a data field. if the value assigned to an indicator is used as consistently as possible across all fields, where the situation warrants this equality, the machine coding is simplified to process different functional fields containing the same type of entry. 5. data element identifiets should be tag dependent, but, as fat as possible, common data elements should be identified by the same data element identifiets actoss fields. the principle has been adopted that the format will handle all types of media and consequently the projected number of unique tags may be quite large. in addition, since all types of media are not yet fully analyzed, the number of unique fields is an unknown factor. while it is undeniable that making data element identifiers tag independent would be desirable, the limited number of alphabetic, numeric, and symbolic characters would restrict the number of data elements to the number of unique characters. this constraint on future expansion seems to be more important than any advantages gained from making data element identifiers tag independent. if data element identifiers are tag dependent, then additional refinements could be added in one of two ways: ( 1) the principle of identifying common data elements by the same identifiers across fields could be followed as far as possible, 01' ( 2) the identifiers could be given a value to aid in filing. the two refinements appear to be mutu166 journal of library automation vol. 7/3 september 197 4 ally exclusive since a data element in one field may have a different filing value from the same data element in another field. since the first refinement should be useful for many types of processing, and the second would be useful only in filing, the former seems to be the better option. 6. the fields in a bibliographic record are primarily related to broad categories of information relating to "sttbfect," "description," "intellectual1'esponsibility," etc., and should be grouped according to these fundamental categories. the first working paper discussed as an obstacle the lack of agreement on the organization of data content in machine-readable records in different bibliographic communities. a subsequent paper consisting of comments made by staff of the library of congress on the proposed eudised format discussed in greater detail the analytic versus traditional arrangement. 9 • t the majority of the national formats designed to date are arranged by using the function as the primary grouping and the type of entry as the secondary grouping. several working papers produced by committee members supported the arrangement by function on the grounds that it followed the traditional order of elements in the bibliographic record and therefore simplified input procedures. grouping of the fields first by function and then by type of entry was agreed to at the brussels meeting. references 1. henriette d. avram and kay d. 
guiles, "content designators for machine readable records," journal of library automation 5:207-16 (dec. 1972). 2. r. e. coward, "marc: national and international cooperation," in international seminar on the marc format and the exchange of bibilographic data in machinereadable form, berlin, 1971, the exchange of bibliographic data and the marc format (munich: pullach, 1972), p. 17-23. 3. roderick m. duchesne, "marc: national and international cooperation," in international seminar on the marc format and the exchange of bibliographic data in machine-readable form, berlin, 1971, the exchange of bibliographic data and the marc format (munich: pullach, 1972), p.37-56. 4. american national standards institute, american national standard fot' bibliogmphic information interchange on magnetic tape (washington, d.c.: 1971) (ansi z39.2-1971). appendix, p.l5-34. 5. henriette d. avram, john f. knapp, and lucia j. rather, the marc ii format; a communications format for bibliographic data (washington, d.c.: library of congress, 1968), appendix iv, p.l47-49. 6. international organization for standardization, documentation-format fot• bibliographic information interchange on magnetic tape. 1st ed. international standard iso 2709-1973(e). 4p. t in an analytic tagging scheme, the first character of the tag describes the type of entry and subsequent characters describe function; in a traditional tagging scheme, the first character describes function and subsequent characters describe type of entry. ptinciples of format design/ avram and rather 167 7. council for cultural cooperation. ad hoc committee for educational documentation and information. working party on eudised formats and standards, 3d meeting, luxembourg, 26-27 april 1973, draft eudised format (second revision). prepared by john e. linford. 8. paper sent from richard coward to henriette d. avram, "notes on marc subrecord directory mechanism." 9. henriette d. avram, "comments on draft eudised format (second revision)," unpublished paper. 247 buyer be wary! in the september 1974 issue of jola, the "highlights of isad board meeting" reflects the library automation community's growing concern with misrepresentation of products and misleading or fraudulent claims. a proposal was made that isad create a mechanism to monitor relevant advertising in order to inform and protect' its constituency and, indeed, the entire profession. it is paradoxical that this concern is being voiced at a time when the relationship between the public and private sectors seems closer than at any other time in the recent past. in general, librarians and vendors are good friends. there is an atmosphere of mutual respect, and we no longer raise eyebrows upon learning that a librarian-colleague has gone "commercial." indeed, librarians and libraries are learning from the business world to create products and market them in order to support desired internal services. the growing entrepreneurial efforts of libraries are linkirig the two groups with a yet firmer bond. unfortunately, but inevitably, there are a few flies in the ointment. with regularity, we pick up professional literature to find advertising which sounds too good to be true. an investigation will usually indicate that, in fact, it is not true. we are often visited by salesmen describing incredible advances in their particular areas. the pressure applied by these people can be distasteful and even intolerable. 
or we may receive a onepage brochure from an unknown company, touting its latest, very competitive system, and listing the familiar names of well-respected librarians as advisors. almost always, we are lucky and are able to discover for ourselves the true nature of the products being advertised. our misfortune may begin when an ambitious salesman finds his or her way into the office of an administrator or politician who does not have adequate preparation for the onslaught of facts, figures, and fallacies. what are the best ways of misrepresenting a product? most approaches fall into one of the following categories: ( 1) misleading advertising, with unclear statements and imprecise use of vocabulary; ( 2) claims that one, or several, or many other libraries are using the product with satisfaction (when this indeed is not the case); ( 3) specific statements that a large and prestigious library is about to sign a contract for servic~s or products (although investigation will reveal no such intention); ( 4) lists of experts in the field who are presumed to be associated with the company in an advisory or consultant role (but who are unaware of this use of their names); and ( 5) approaches to federal, state, or local agencies to appeal 248 journal of library automation vol. 7/4 december 1974 the procedures used by libraries in requesting bids or awarding contracts. at this point, a note of caution must be inserted. strategies of advertising and marketing usually involve one or more of the above techniques to a certain extent. we all practice minor exaggerations and simplifications in our professional lives in order to accomplish certain goals. it would be unwise and unfair to accuse an advertiser of misleading his market on the basis of one of these "small exaggerations." in resolving this issue, our concern must be with those individuals or organizations who are constantly found with a large discrepancy between the word and the deed. what methods can be used as protection against these tactics? there are several reliable paths: ( 1) be aware of and alert to the possibilities of misleading claims and misrepresentation; ( 2) follow up a sales pitch with a few phone calls to those institutions that are described to be using the product or about to sign the contract; ( 3) maintain a reasonable amount of resistance to the sales talk; ( 4) use the library profession's invisible college to determine the validity of the claims and the experiences that others have had with the firm; and ( 5) support the attempts of our professional societies, such as ala and asis, to require organizations to maintain certain advertising standards. the library market is expanding and maturing; therefore, these growing pains associated with increased marketing efforts are not unexpected. with adequate education and awareness on the part of the buyer, with some pressures placed on advertisers by the professional community, and with a tolerance for the normal tendencies of advertising and marketing, we will be able to resolve a difficult situation with grace and without hard feelings. susan k. martin december_ital_oud_final accessibility of vendor-created database tutorials for people with disabilities joanne oud information technology and libraries | december 2016 7 abstract many video, screencast, webinar, or interactive tutorials are created and provided by vendors for use by libraries to instruct users in database searching. 
this study investigates whether these vendorcreated database tutorials are accessible for people with disabilities to see whether librarians can use these tutorials instead of creating them in-house. findings on accessibility were mixed. positive accessibility features and common accessibility problems are described, with recommendations on how to maximize accessibility. introduction online videos, screencasts, and other multimedia tutorials are commonly used for instruction in academic libraries. these online learning objects are time consuming to create in-house and require a commitment to maintain and revise when database interfaces change. many database vendors provide screencasts or online videos on how to use their databases. should libraries use these vendor-provided instructional tools rather than spend the time and effort to create their own? many already do: a study shows that 17.7 percent of academic libraries link to tutorials created by third parties, mainly by vendors or other libraries.1 when deciding whether to use vendor-created tutorials, one consideration is whether the tutorials meet accessibility requirements for people with disabilities. the importance of accessibility for online tutorials has been increasingly recognized and outlined in recent library literature.2 people with disabilities make up one of the largest minority groups in the united states and canada, and studies show that about 9 percent of university or college students have a disability.3 problems with web accessibility have been well documented. people with disabilities are often unable to access the same online sites and resources as others, creating a digital divide.4 even if people with disabilities can access a site, it is more difficult for many to use it.5 assistive technologies, like screen-reading software, enable access but add an extra layer of complexity in interacting with the site, and blind or low-vision users can’t always rely on visual cues to navigate and interpret sites. a recent study of library website accessibility concluded that typical library websites are not designed with people with disabilities in mind.6 joanne oud (joud@wlu.ca) is instructional technology librarian and instruction coordinator, wilfrid laurier university, ontario, canada. accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 8 libraries, which are founded on a philosophy of equal access to information, should be concerned about online accessibility. legal requirements for providing accessible online web content vary, but exist in every jurisdiction in the united states and canada. apart from the legal requirements, recent literature points out that equitable access to information for people with disabilities is a matter of human rights and an issue of diversity and social justice, and calls on libraries and librarians to improve their commitment to online accessibility.7 it is important for libraries to participate in creating level playing field and to avoid creating conditions that make people feel unequal or prevent them from equitable access. it is unclear whether librarians can assume vendor-created instructional tutorials are accessible. studies on vendor database accessibility have been mixed, showing some commitment to and improvements in accessibility on one hand, but sometimes substantial gaps in accessibility on the other.8 the focus until now has been exclusively on the accessibility of database interfaces. 
this study investigates the accessibility of online tutorials, including videos, screencasts, interactive multimedia, and archived webinars created by database and journal vendors and offered as instructional materials to librarians and patrons, to determine whether they are a viable alternative to making in-house training materials. literature review although a few articles exist on how to make video tutorials accessible,9 no studies have evaluated the accessibility of already-created video or screencast tutorials. there are, however, some studies evaluating the accessibility of vendor databases. byerley, chambers, and thohira surveyed vendors in 2007 and found that most felt they had integrated accessibility standards into their search interfaces, and nearly all tested for accessibility to some degree, though not always with actual users.10 these findings conflict somewhat with the results of other studies. tatomir and durrance evaluated the accessibility of thirty-two databases with a checklist and found that although many did contain accessibility features, 72 percent were marginally accessible or inaccessible.11 similarly, dermody and majekodunmi found that students with print-related disabilities who use screen-reading software could only complete 55 percent of tasks successfully because of accessibility barriers and usability challenges.12 delancey surveyed vendors and examined vpats, or product accessibility claims, and found that vendors felt they were compliant with 64 percent of us section 508 items.13 especially relevant to this study, only 23 percent of vendors said that the multimedia content within their products was compliant, and 46 percent admitted multimedia content was not compliant at all. since vendor vpat forms are completed for databases and other products only, and not the instructional tutorials created by vendors on how to use those products, vendor accessibility claims for instructional tutorials are unknown. although no studies have been done on the accessibility of video or screencast tutorials, some have been done on the accessibility of multimedia or other related kinds of online learning. information technology and libraries | december 2016 9 roberts, crittenden, and crittenden surveyed 2,366 students taking online courses at several us universities. a total of 9.3 percent of those students reported that they had a disability, and of those, 46 percent said their disability affected their ability to succeed in their online course, although most reasons cited were not related to technical accessibility barriers.14 kumar and owston studied students with disabilities using online learning units that contained videos. all students in the study reported at least one barrier to completing the learning units.15 although this study involves student use of video tutorials, it doesn’t report on accessibility issues specific to those tutorials. previous studies of vendor products focus exclusively on database interfaces, and previous studies of online learning have not focused on screencast accessibility. therefore this study’s goal is to investigate how accessible vendor-created video tutorials are. accessibility is defined as both technical accessibility (can people with disabilities locate, access, and use them) and usability (how easy it is for people with disabilities to use them). 
this study will look at which major accessibility issues there are (if any) and make recommendations on whether librarians can direct students to them rather than making in-house instructional videos. method an evaluation checklist (see appendix 2) was developed for this study using criteria drawn from the web content accessibility guidelines (wcag) 2.0. wcag 2.0 is the most widely recognized web-accessibility standard internationally. much recent accessibility legislation adopts it, including the in-process revisions to section 508 guidelines in the united states.16 wcag 2.0 is also consistent with tutorial accessibility best-practice advice found in recent articles, which emphasize the need for accurate captions, keyboard accessibility, descriptive narration, and alternate versions for embedded objects, among other criteria.17 the checklist has twenty items and is split into two sections, “functionality” and “usability.” functionality items test whether the tutorial can be used by people using screen-reading software or a keyboard only, and include whether the tutorial is findable on the page and playable, whether player controls and interactive content can be operated by keyboard, whether captions are available, and whether audio narration is descriptive enough so someone who can’t see the video can understand what is happening. usability items test how easy the tutorial is to use. examples include clear visuals and audio, use of visual cues to focus the viewer’s attention, and short and logically focused content. to help prioritize the importance of checklist items, the local accessible learning centre (alc), which supports students on campus who use assistive technologies, was consulted about the difficulties most encountered by students. the alc’s highest priority was the provision of an alternate accessible version of a tutorial, since it is difficult to make complex embedded web content accessible for everyone under every circumstance and an alternate version allows people to work with content in a way that suits their needs. accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 10 for the evaluation, major database vendors were chosen through a scan of common vendors and platforms at universities, with input from collections colleagues. some vendors were eliminated because they don’t provide instructional tutorials on their websites. twenty-five vendors were included in the study (see appendix 1). a large majority of the tutorials found were screencast or video tutorials; a few vendors provided recorded webinars, and a few provided interactive multimedia tutorials, mainly text captions or visuals with clickable areas or quizzes. in total, 460 tutorials were evaluated for accessibility: 417 video, screencast, or interactive tutorials from twenty-foure vendors, and 41 recorded webinars from four vendors. if tutorials were available in more than one place, most commonly on both the vendor’s website and youtube, both locations were tested. if more than thirty tutorials were provided by a vendor, every other one was tested. if multiple formats of tutorial were available, such as screencasts and recorded webinars, each format was tested. testing from the perspective of people with visual impairments was a key focus. 
other assistive technologies such as kurzweil (for people who can see but have print-related disabilities) and zoomtext (for enlargement) are widely used, but if webpages work well using screen-reading software intended for people with visual impairments, they also generally work using other kinds of assistive software. tutorials were tested with two screen-reading programs used by people with visual impairments: nvda (with firefox), a free open source program, and jaws (with internet explorer), a widely used commercial product. both were used to determine whether any difficulties were due to the quirks of a particular software product or a result of inherent accessibility problems. in addition, captions were evaluated to determine accessibility for people who are deaf or have hearing difficulties. people with visual or some physical impairments use the keyboard only, so all tutorials were tested without a mouse using solely the keyboard. during testing, each task was tried three different ways within nvda or jaws before deciding that it couldn’t be completed. if one of the three methods worked the task was marked as successfully completed. if a task could be completed successfully in one screen-reading program but not the other, it was marked as unsuccessful. screen-reader support needs to be consistent across platforms, since people may be using a variety of types of assistive software. findings and discussion tutorials created by the same vendor nearly all used the same approach and had the same checklist results. this is positive, since consistency is important for accessibility and helps in navigation and ease of use. none of the forty-one recorded webinars tested in this study were accessible. webinars did not have player controls that were findable on the page by screen-reading software or usable by information technology and libraries | december 2016 11 keyboard. none had captions, transcripts, or alternate accessible versions. often webinars were quite long, with no clear structure and no cues to focus attention on the screen. recorded webinars had almost no accessibility features and can’t be recommended for use as accessible instructional materials in their current form. none of the screencast or video tutorials tested were completely accessible, and all failed in at least one checklist item. tutorials from some vendors, however, came close to meeting all checklist requirements. overall, there were many positive accessibility features in the video and screencast tutorials. most of these tutorials were findable and playable by screen reading software in some way, had video player controls usable by keyboard, had descriptive narration so people who can’t see the screen can tell what is happening, had clear visuals and audio narration, used simple language, and were relatively short and focused in content. the most accessible screencast or video tutorials were produced by the american psychological association (apa), american theological library association (atla), modern language association (mla), and ebsco. their tutorials had many accessibility features and rated highly on the checklist. they included much less commonly found accessibility features, especially the use of visual and/or audio cues to focus the viewer’s attention and the inclusion of accurate and properly synchronized closed captions. visual cues are important for people with learning or attentionrelated disabilities, and help all viewers interpret and follow the video more easily. 
people who are deaf can’t access the content without captions, and captions also help people who have english as a second language or are at public computers without headphones. tutorials from these vendors also had an alternate version or transcript available. as mentioned earlier, the highest-priority checklist item is the presence of an alternate accessible version, since it is difficult to design multimedia that works for people with all disabilities in all circumstances. people with disabilities may also have previous negative experiences with online multimedia and prefer to use an alternate format that they have had more success with. in the case of these above-average vendors, the alternate accessible version was a transcript consisting of the video’s closed captions, auto-generated by youtube. since the tutorials’ narration was descriptive and the captions were accurate, the auto-generated transcripts are useful. however, the youtube transcript is hard to find on the youtube page. also, most of these vendors had tutorials available both from their own websites and from youtube, and none had alternate versions available on their own websites. viewers requiring an alternate format would need to know to go to the youtube site instead of the vendor site to find it. two other vendors also had quite accessible tutorials. ieee’s tutorials had the same positive accessibility features already mentioned. tutorials were done in-house and presented through the vendor’s site. while most tutorials presented on vendor sites were lacking in accessibility, ieee’s were well thought out from an accessibility perspective and usable by screen-reading software. these were the only tutorials tested where all interactivity, including pop-up screens, was easily accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 12 usable and navigable by keyboard. the one accessibility issue was the lack of an alternate accessible version. elsevier’s sciencedirect tutorials took a different approach to accessibility than other vendors, or even than elsevier’s tutorials for other elsevier products. the science direct tutorials were not accessible, but an alternate text version was available and people using screen-reader software were informed of this when they get to the tutorial page and were redirected to the text version. the ideal is to have one version that is accessible to everyone, but this approach is a good way to implement an alternate version if one accessible version isn’t possible. screencasts or video tutorials from other vendors also have some good accessibility features, but these were balanced with serious accessibility problems. the main accessibility issues discovered include the following: alternate accessible versions: vendors who had captions and hosted their videos on youtube did have auto-generated youtube transcripts, but these were hard to find and were only useful if the captions were descriptive and accurate, which many were not. apart from elsevier’s sciencedirect tutorials, no vendors provided another format deliberately as an accessible alternative. captions: captions were missing or problematic in the tutorials of fourteen vendors, or 59 percent of the total. five (21 percent) of vendors provided no captions at all for their tutorials. nine (38 percent) had unedited, auto-generated youtube captions, which are highly inaccurate and therefore don’t provide usable access to the content for people who are deaf. 
tutorial not findable or playable on page: twelve vendors (50 percent) had tutorials that were not findable on the webpage or playable for people using a keyboard or screen-reading software. most of these issues are with tutorials on vendor sites, which were often flash-based or offered through non-youtube third-party sites like vimeo. four vendors (17 percent) offered access to their tutorials both through their own (inaccessible) website and youtube, which is findable and playable by screen-reading software. eight (33 percent), however, only provided access through their (inaccessible) webpages, which means that people using a keyboard or screen-reading software would not be able to use their tutorials. no visual cues to focus attention: eight vendors (33 percent) had no visual cues to focus attention in the video. visual cues help people with certain disabilities focus on the essential part of the screen that is being discussed, help everyone more easily interpret and follow what is happening, and are known to help facilitate successful multimedia learning.18 nondescriptive narration: six vendors (25 percent) had tutorials with audio narration that didn't sufficiently describe what was happening on the screen. narration needs to describe what is happening in enough detail so people who can't see the screen are not missing information available for sighted viewers. fuzzy visuals: five vendors (21 percent) had tutorials with visuals that were fuzzy and hard to see. this makes viewing difficult for people with low vision, and challenging even for people with normal vision. fuzzy audio or background music: three vendors (13 percent) had poor-quality audio narration or background music playing during narration. background music is distracting for those with hearing difficulties and makes it more difficult to focus on what is being said. eliminating extraneous sound also makes it easier for people to learn from multimedia.19 tutorials consisting only of text captions: three vendors (13 percent) had tutorials consisting of text captions with no narration. the text captions were not readable by screen-reading software, and no alternate accessible versions were provided. providing narration in tutorials is recommended for accessibility, since it allows people who can't see the screen to access the content more easily, and has been shown to improve learning and recall over on-screen text and graphics alone.20 recommendations and conclusions this study attempted to determine how accessible vendor-created database tutorials are, and whether academic librarians can use them instead of re-creating them locally. for recorded webinars, the answer is a clear no, since none were technically accessible for people using screen-reading software. for video or screencast tutorials, however, the answer is less clear. results showed that many vendors created tutorials with positive features like clear visuals and audio, being short and focused on one main point, and using descriptive narration. however, technical accessibility was much less successful, with 59 percent of vendors omitting usable captions and 50 percent presenting tutorials that couldn't be found on the page or played by people using screen-reading software. these technical accessibility issues prevent people with hearing, vision, or some mobility impairments from using the tutorials at all.
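librarians who want to run a comparable audit on their own tutorial collections can capture the scoring rules described in the method section in a short script. the following is a minimal sketch only, not the instrument used in this study: the record layout and field names (vendor, tutorial, item, nvda_pass, jaws_pass) are hypothetical, but the decision rule mirrors the one described above, counting a checklist item as passed only when the task succeeds under both screen-reading programs.

```python
# illustrative sketch only; the record layout and field names are hypothetical,
# not the checklist instrument used in this study.
from collections import defaultdict

# each record notes, for one checklist item on one tutorial, whether the task
# could be completed with nvda (firefox) and with jaws (internet explorer).
results = [
    {"vendor": "vendor a", "tutorial": "basic search", "item": "captions", "nvda_pass": True, "jaws_pass": True},
    {"vendor": "vendor a", "tutorial": "basic search", "item": "keyboard player controls", "nvda_pass": True, "jaws_pass": True},
    {"vendor": "vendor a", "tutorial": "advanced search", "item": "keyboard player controls", "nvda_pass": True, "jaws_pass": False},
    {"vendor": "vendor b", "tutorial": "cited reference search", "item": "captions", "nvda_pass": False, "jaws_pass": False},
]

def item_passes(record):
    # the study's rule: success in only one screen reader still counts as a failure,
    # because support needs to be consistent across assistive technologies.
    return record["nvda_pass"] and record["jaws_pass"]

tutorials_by_vendor = defaultdict(set)
failures_by_vendor = defaultdict(set)
for rec in results:
    tutorials_by_vendor[rec["vendor"]].add(rec["tutorial"])
    if not item_passes(rec):
        failures_by_vendor[rec["vendor"]].add(rec["tutorial"])

for vendor in sorted(tutorials_by_vendor):
    total = len(tutorials_by_vendor[vendor])
    failed = len(failures_by_vendor[vendor])
    print(f"{vendor}: {failed}/{total} tutorials fail at least one functionality item ({100 * failed / total:.0f}%)")
```

a tally like this only summarizes pass/fail counts; judgments about caption accuracy, descriptive narration, and the other checklist items in appendix 2 still have to be made by a human tester.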
although none of the tutorials studied met all the checklist criteria, some came close and could be used by librarians depending on local requirements, policies, and priorities for accessibility. in part, this study found that the accessibility of many tutorials depends on how they are presented. disappointingly, 50 percent of vendors had tutorials on their websites that were not findable or playable by people with disabilities. many vendors, however, hosted tutorials on youtube as well as their own site. in these cases, youtube was always a more accessible option accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 14 than the vendor site. youtube itself is relatively accessible, with both pages and players that are navigable by keyboard and by screen-reading software. there are options for accessibility settings in youtube, such as having captions display automatically, and more accessible third-party overlays are available for the youtube player. on vendor sites, there were more likely to be issues with flash and an inability for people using screen-reading software or keyboards to find and play videos. some vendors embed youtube videos on their site. even if the embedded videos are findable and playable, this method omits important accessibility features found on the youtube page, such as the text transcript. the results of this study show that using youtube where available is recommended. further, linking to youtube rather than embedding the video is preferred, unless a separate link to the transcript is made to provide an alternate accessible version. captions are another key accessibility problem identified in this study: nearly two-thirds had unusable captions. often, auto-generated youtube captions were present but were not usable. the presence of captions is not enough for accessibility; those captions need to be accurate and present the same content as the narration. youtube auto-captioning does not generate captions that are accurate enough to be useful without manual editing. youtube auto-generates transcripts from the captions, so if the captions are inaccurate the transcript will not be useful either. editing youtube auto-generated captions is necessary to ensure accessibility. a few accessibility issues found in this study would be easy to improve with some thought during tutorial creation. adding visual cues like arrows or highlighting to the screen to help people focus attention, or remembering that not everyone can see the screen while recording narration, can be easily achieved and would improve accessibility significantly. other issues would require more planning and effort to improve. given the widespread technical accessibility problems identified in this study, it is particularly important for people creating tutorials to provide alternate formats that are accessible if tutorials themselves are not accessible. almost no vendors do this currently, but it would have the most significant impact on accessibility for the broadest range of people. adding usable captions is the second most important area for improvement. to provide access for people who are deaf, captions need to be added or autogenerated youtube captions need to be edited for accuracy. both alternate formats and captions require some thought and effort to implement but ensure that tutorials will meet accessibility requirements and be usable by everyone. notes and bibliography 1. 
eamon tewell, “video tutorials in academic art libraries: a content analysis and review,” art documentation 29, no. 2 (2010): 53–61. information technology and libraries | december 2016 15 2. amanda s. clossen, “beyond the letter of the law: accessibility, universal design, and human-centered design in video tutorials,” pennsylvania libraries: research & practice 2, no. 1 (2014): 27–37, https://doi.org/10.5195/palrap.2014.43; joanne oud, “improving screencast accessibility for people with disabilities: guidelines and techniques,” internet reference services quarterly 16, no. 3 (2011): 129–44, https://doi.org/10.1080/10875301.2011.602304; kathleen pickens and jessica long, “click here! (and other ways to sabotage accessibility),” imagine, innovate, inspire: the proceedings of the acrl 2013 conference (chicago: acrl, 2013), 107–12. 3. deann barnard-brak, lucy lechtenberger, and william y. lan, “accommodation strategies of college students with disabilities,” qualitative report 15, no. 2 (2010): 411–29. 4. cyndi rowland et al., “universal design for the digital environment: transforming the institution,” educause review 45, no. 6 (2010): 14–28. 5. peter brophy and jenny craven, “web accessibility,” library trends 55, no. 4 (2008): 950–72. 6. kyunghye yoon, laura hulscher, and rachel dols, “accessibility and diversity in library and information science: inclusive information architecture for library websites,” library quarterly 86, no. 2 (2016): 213–29. 7. ruth v. small, william n. myhill, and lydia herring-harrington, “developing accessible libraries and inclusive librarians in the 21st century: examples from practice,” advances in librarianship 40 (2015): 73–88, https://doi.org/10.1108/s0065-2830201540; john carlo jaeger, paul t. wentz, and brian bertot, “libraries and the future of equal access for people with disabilities: legal frameworks, human rights, and social justice,” advances in librarianship 40 (2015): 237–53; yoon, hulscher, and dols, “accessibility and diversity in library and information science: inclusive information architecture for library websites.” 8. suzanne l. byerley, mary beth chambers, and mariyam thohira, “accessibility of web-based library databases: the vendors’ perspectives in 2007,” library hi tech 25, no. 4 (2007): 509– 27, https://doi.org/10.1108/07378830710840473; kelly dermody and norda majekodunmi, “online databases and the research experience for university students with print disabilities,” library hi tech 29, no. 1 (2011): 149–60, https://doi.org/10.1108/07378831111116976; jennifer tatomir and joan c. durrance, “overcoming the information gap: measuring the accessibility of library databases to adaptive technology users,” library hi tech 28, no. 4 (2010): 577–94, https://doi.org/10.1108/07378831011096240. 9. pickens and long, “click here!”; clossen, “beyond the letter of the law”; oud, “improving screencast accessibility for people with disabilities”; nichole a. martin and ross martin, “would you watch it? creating effective and engaging video tutorials,” journal of library & accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 16 information services in distance learning 9, no. 1–2 (2015): 40–56, https://doi.org/10.1080/1533290x.2014.946345. 10 . byerley, chambers, and thohira, “accessibility of web-based library databases.” 11. tatomir and durrance, “overcoming the information gap.” 12. 
dermody and majekodunmi, “online databases and the research experience for university students with print disabilities.” 13. laura delancey, “assessing the accuracy of vendor-supplied accessibility documentation,” library hi tech 33, no. 1 (2015): 103–13, https://doi.org/10.1108/lht-08-2014-0077. 14. jodi b. roberts, laura a. crittenden, and jason c. crittenden, “students with disabilities and online learning: a cross-institutional study of perceived satisfaction with accessibility compliance and services,” internet and higher education 14, no. 4 (2011): 242–50, https://doi.org/10.1016/j.iheduc.2011.05.004. 15. kari l. kumar and ron owston, “evaluating e-learning accessibility by automated and student-centered methods,” educational technology research and development 64, no. 2 (2015): 263–83, https://doi.org/10.1007/s11423-015-9413-6. 16. us access board, “draft information and communication technology ( ict ) standards and guidelines,” 36 cfr parts 1193 and 1194, rin 3014-aa37 (2015), https://www.accessboard.gov/attachments/article/1702/ict-proposed-rule.pdf. 17. pickens and long, “click here!”; clossen, “beyond the letter of the law”; martin and martin, “would you watch it?”; oud, “improving screencast accessibility for people with disabilities.” 18. see the signaling principle in richard e. mayer, multimedia learning, 2nd ed. (cambridge: cambridge university press, 2009): 108–17. 19. see the coherence principle, ibid., 89–107. 20. see the modality principle, ibid., 200–220. information technology and libraries | december 2016 17 appendix 1. list of vendors 1. acm 2. adam matthew 3. alexander st press 4. apa 5. atla 6. chemspider 7. cochrane library (webinars only) 8. ebsco 9. elsevier 10. factiva 11. gale 12. ieee 13. lexis nexis academic (tutorials and webinars) 14. marketline 15. mathscinet 16. ovid/wolters kluwer (tutorials and webinars) 17. oxford 18. proquest (tutorials and webinars) 19. pubmed 20. sage 21. scifinder 22. standard & poor/netadvantage 23. taylor and francis 24. web of knowledge/thompson reuters 25. zotero accessibility of vendor-created database tutorials for people with disabilities | oud https://doi.org/10.6017/ital.v35i4.9469 18 appendix 2. 
tutorial accessibility evaluation checklist
functionality
☐ equivalent alternate format(s) are provided
☐ transcript/text version
☐ audio
☐ other ___________________________
☐ alternate formats provided are accessible
☐ alternate formats provided are findable on the page by screen reader
☐ screen-reading software can find the video on the webpage
☐ screen-reading software can access and play the video
☐ video-player functions can be operated by keyboard/screen-reading software
☐ interactive content can be accessed and used by keyboard/screen-reading software
☐ user has some control over timing (pause/rewind capability)
☐ alternate modes of presentation are available for all, meaning presented through text, visuals, narration, color, or shape
☐ synchronized closed captions are available for all audio
☐ audio/narration is descriptive
usability
☐ user controls if/when the video starts (no auto play)
☐ video is easy to use by screen-reading software
☐ clear, high-contrast visuals and text
☐ clear, high-contrast audio (no background noise/music)
☐ uses visual cues to focus attention (e.g., highlighting, arrows)
☐ is short and concise
☐ is clearly and logically organized
☐ has consistent navigation, look, and feel
☐ uses simple language, avoids jargon, and defines unfamiliar terms
☐ explicit structure with sections, headings to give viewers context
☐ learning outcome/goal clearly outlined and content focused on outcome
highlights of minutes information science and automation division board of directors meeting 1973 annual conference las vegas, nevada monday, june 25, 1973 the meeting was called to order by president ralph shoffner at 8:25 a.m. those present were: board: ralph shoffner, paul j. fasana, susan k. martin, donald p. hammer, and berniece coulter, secretary, isad. guests: frederick kilgour, james dolby, stephen salmon, james rizzolo, lavahn overmyer, douglas ferguson, and brett butler. minutes of midwinter meeting. there was a request that the minutes of the board meetings at midwinter (january 1973, washington, d.c.) be further edited, deleting and clarifying, for publication. mrs. susan martin, editor of jola, informed those present that the deadline for submission of copy for the march 1973 issue is the middle of july. president shoffner suggested a separate meeting be held during conference to revise the midwinter minutes. cooperation with asis/siglan. douglas ferguson, representing the american society for information science special interest group on library automation & networks (asis/siglan), presented a proposal for the board's consideration that isad cooperate with asis/siglan in the areas of publications, programs, and research proposals. the aims of such cooperation would be to reach people (for membership) and to save money. mr. ferguson was interested in the board's response on this matter. mrs. martin stated that cooperation in the publications area might be relatively easy for asis, but for ala it might be another matter. mr. ferguson felt that implementation would be the problem. he would like to focus on "specific purpose projects" functions: sharing of budget and membership. mr. fasana suggested that the three chairmen of the isad ad hoc committees (research topics, seminar and institute topics, and objectives) meet with mr. ferguson, and that charles husbands, isad's representative to asis, be included. president shoffner named jim dolby, don bosseau, and douglas ferguson
to set up a time and place to meet with those asis/siglan board members present at the conference to discuss isad and asis/siglan cooperation. report of jola editor. mrs. martin informed the board of the status of jola. the june 1972 issue had been mailed, the september 1972 issue would be mailed in two weeks (about the middle of july), and the december 1972 issue was in the galley stage and should be mailed within six weeks. the post office has been told that the mailing of the jola issues would be caught up by the end of the calendar year. the jola technical communications has completed publication of the 1972 issues and is ready to be incorporated into jola. she suggested that the board reconsider incorporating jola/tc into the journal since tc would become neither timely nor a newsletter. the decision to incorporate originally had been voted by the board as a result of ala publishing board's request that all divisions cut their budget item for publications by 10 percent. the 1973/74 budget was set up for a combined journal. mrs. martin asked for the board's opinion. president shoffner asked if it would be possible to return to a monthly publication as soon as cost allowed. mrs. martin replied that a reduction in the jola budget would have to be made to allow for monthly publication since tc was now incorporated into jola's budget. the question was referred to the isad editorial board. information science abstracts. ben ami lipetz is interested in sponsorship and cooperation in relation to isa. several associations are now sponsors. to become a sponsor the association makes an initial financial commitment and then gains advisory board capacity. isad at one time had a subscription drive for fifty new isa subscriptions from isad members, which was not fulfilled. motion by paul fasana that some attempt be made to evaluate isa and make recommendations to the board whether isad should consider promoting subscriptions to isa. seconded by susan martin. carried. report of the committee on objectives. mr. stephen salmon, chairman, reported that his committee on objectives would meet that morning at 10:00 a.m. and they had not yet discussed the draft report. he had talked with the members of the committee by telephone, however. mr. richard angell had written to president shoffner; the main point of his letter was that isad not accept the recommendations of the media group that isad incorporate them into the division. he argued that ala organization does not provide for types of materials, although it does provide for types of libraries and types of activities. he also felt that there was not a broad enough area of mutual concern, and that coo should deal with this matter. mr. salmon pointed out two matters: (1) when the full objectives committee met with representatives of the information technology discussion group, there did seem to be a community of effort; the "chemistry" seemed to be there, and it did seem to work well, and (2) it was expressed that "media" has been around ala for years, even decades, and may continue without a "home" while the best answer is still being determined. mr. salmon suggested that isad could attempt a solution at this time. mr. shoffner mentioned that one of his particular concerns was that the objectives committee very explicitly review and then express its opinion on the relation of the technology group to ala's audiovisual committee. mr.
salmon stated the committee's feelings were that isad objectives should be changed to include media in its scope. he asked whether the information technology discussion group should continue as a discussion group or become a committee, round table, or section. president shoffner said he had put "proposed audiovisual committee" on the agenda to determine whether or not there were strong objections to the information technology discussion group being housed in isad as a committee. if not, the objectives committee could be charged to come back to the board with recommendations as to the form of the group. mr. salmon stated, however, that the objectives committee was not an organization committee and the form of the group was not its concern. the establishment of a committee was provided for in the division bylaws. mr. fasana thought isad's objectives were broad enough as stated in the bylaws, article ii, section 1, the object of the division: particularly the words "and related technological developments." mr. salmon said the committee felt that media should be specifically provided for in the isad objectives statement rather than leaving it open to be read into the present statement. he mentioned that historically the "founding fathers" of isad had merely called the name of the division "library automation" but that the "information science" had been put in by coo to please the reference services division who were interested in that area. to place the matter before coo could take six years, mr. fasana stated. mr. salmon felt, however, that it may be the time to call coo's attention to this discussion group so that eventually they might eliminate the nineteen or so av committees which have no formal connection with ala's audiovisual committee. mr. kilgour wanted to know how the board felt about a name change of the division and mr. shoffner suggested he see steve salmon and larry auld about his suggestions. report of research topics ad hoc committee. mr. james l. dolby, chairman, expressed the feelings of his committee that the present research need lay in the area of improvement of library operations and suggested concentrating on planning and evaluating existing and proposed methods rather than on system "breakthroughs." a tremendous amount of data has been accumulated by presently operating automated systems. this data should now be put to use in research studies. first, acquisitions: no library is comprehensive, not even the library of congress. data should be collected that should enable the library to know the ratio of use to collectivity, e.g. lists of high-use books; funneling all acquisitions through a single source to reduce duplicate buying; the possible merging of data bases (upon which circulation data can shed some light); and setting up methods by which libraries can interchange data so that comparative analyses can be made. second, cataloging: mr. dolby made several recommendations: (1) develop measures of cost effectiveness of various access systems. he stressed a need to provide more access to library collections, looking into the means of expanding from two or three subject headings per book in a cataloging system to forty or fifty. we need to know what people are trying to do in libraries by systematic collection of information about their activities. (2) enumerate various types of information requests made by users. included in this would be the collection of data on in-library use.
( 3) determine needs in terms of coordinated planning, cooperation, and hardware and software transferability which should be confronted before the fact, rather than after, as more and more regional operations take shape. ( 4) develop the problem-solving and idea-producing capabilities of library staffs to a maximum. ( 5) develop a continuing education program for librarians covering !sad-related topics. ( 6) establish a curriculum committee to deal with problems in schools of librarianship and to make information science a teachable subject. conference planning committee report. chairman brett butler had two statements to make regarding future programs: ( 1) the theme of an institute program which had been postponed would be carried over to the 1974 new york annual conference: "library automation in the national libraries," enlarged to include other national libraries; ( 2) the 1975 san francisco meeting would be on "information science and library automation" and focus on the things dolby mentioned in his research topics ad hoc committee report. he further stated he would like to see more co-sponsored programs. wednesday, june 27,1973 president ralph shoffner opened the board meeting at 10:10 a.m. the following were in attendance: board-ralph shoffner, donald hammer, paul fasana> susan ma1tin, and berniece coulter, secretary, isad. guests-stephen salmon, james rizzolo, douglas ferguson, brett butler, david waite, velma veneziano, pearce grove, ronald miller, frederick kilgour, and lawrence w. s. auld. president shoffner pronounced a quorum present. highlights of minutes 149 seminar and institute topics. chairman ron miller said the committee had reviewed the literature and felt there was value in regard to continuing education programs. he said they had received summaries and statistics on previous seminars and felt the institutes should be continued. conference planning committee report. brett butler, chairman, summarized the seminars held during the year. the microforms seminar in detroit was not held as a separate seminar but incorporated into sessions of the national microform association's annual meeting. two other seminars were not held as planned but postponed. surplus monies in the preconference fund were to be used for publishing of the proceedings of the preconference. tapes had been professionally made of these proceedings. the 1975 meetings were being planned now with the goal of cooperating with asis. the program in new york on national libraries which was cancelled last midwinter (1973-washington, d.c. ) was being considered with the scope to be increased to include other national libraries-france, great britain, etc. the program could require more time than the normal time slots. maryann duggan was preparing a goals document to be distributed to each participating library. mr. butler continued with his report saying that maryann duggan would plan and coordinate the networks seminar for the spring. the focus would be on the proposed theme-"advertising library automation-how to share yom efforts." the general feeling is that isad should also do something with other associations in state library operations and library schools. mr. fasana stated that some feedback on the las vegas preconference had been related in such questions as "why was the registration fee for preconference so expensive this year?" mr. butler reported that a reduction in the price of the proceedings was offered to registrants. 
isad would have to subsidize ala publishing services for the discount offered these registrants because ala cannot sell at different prices to the membership. mr. miller stated that his committee felt that a good deal of the work now done by volunteers should be done by a staff member and those costs be included in the registration fees. this approach should be used for future seminars. mr. fasana said that some divisions list the analysis or a breakdown of costs on their advertising or program for preconferences. apparently, president shoffner said, the price was reasonable, judging from the response. mr. butler said another objection was the conflict with acrl's preconference on networks. had we had knowledge of their preconference previously, the two could have been coordinated. mr. shoffner said that in a large conference conflicts were to be expected. mrs. veneziano felt there were few people in attendance at the networks preconference who, had it not been held, would have attended isad's. president given authority to appoint committee members without individual approval. mr. kilgour asked that the board give him the authority to appoint committee members without approval of each individual as the isad bylaws stated. he asked for blanket approval. this was previously given to mr. shoffner. mr. kilgour asked that the board invoke section 2b of the bylaws and provide that the appointments last until the end of the president's term. mr. fasana suggested that a sense of "yes" be given. report of marbi committee. mrs. veneziano, the chairman, suggested the acronym "marbi" (machine-readable bibliographic information) be used for convenience in remembering the lengthy title of the committee. the committee was still concerned with trying to define its role and the mechanism to implement the role. one important consideration of the committee was to serve as a link between the members of ala and the library of congress in order to avoid the repetition of the problem which arose regarding isbd and marc records. henriette avram had prepared a position paper on this committee's relationship to any impending content changes to marc records. these would not be changes required as a result of changes in cataloging rules, over which lc has no say, but rather changes in the needs of the library community. the paper was not intended to set forth a permanent operation, but to propose guidelines. it outlined what would happen at the point where some possibility for change was discovered, and how lc and the committee would communicate. it did not, of course, detail the committee's communication with division members and the relationship of the committee to the marc users discussion group. the committee concluded that it should communicate at least with the marc subscribers, and that james rizzolo, chairman of the marc users discussion group, would assume the responsibility of circulating information on impending changes to the marc subscribers and marc users and get the information back to the committee, which would determine if there was a consensus and in its best judgment give a reply back to lc. a second activity of the committee would be to reach interested people through some means such as jola tc or lrts. the voluminous amount of papers should not be distributed generally as they become obsolete quickly, but a center is needed for storing these papers and the fact that they exist should be circulated to the library field in general.
copies could be made available for a price to those interested. mrs. veneziano hoped this could be worked out with someone at ala headquarters. another area, that of nonbibliographic data (e.g., uniform library codes, dealer codes, etc.), is of interest to the committee. mrs. veneziano's personal opinion was that though the function statement of the committee indicates responsibility for bibliographic information only, the committee should also be involved with anything which impacts the use of that bibliographic data. she expressed hesitancy to have many other committees working in this area as too much time is devoted to getting feedback from other committees before a decision can be made. the committee would like to propose adoption of a mechanism whereby it can set up a subcommittee, task force, or working group with a limited life-span which would study and react to very specific technical proposals and working papers, etc. which are developing informally at a national and international level. these subgroups must be very responsive so that the committee will not be placed in a position where it cannot take action readily. also there must be a flexible mechanism for establishing subcommittees. rtsd has strong feelings on setting up such a group without approval by the division. she did not think, however, there was any objection to creating task forces and felt that it was the only way to obtain expert comment on some of these materials. the feeling of the committee was that henriette avram's position paper should be accepted with minor provisos: the implementation (section 1-b) and the time frame allowed for reporting back to lc. henriette avram is to go back to lc and check these modifications. if they meet lc's approval, the committee will accept the position paper. the character set subcommittee, consisting of charles payne, david weisbrod, and michael malinconico, will study the latest draft working papers and comment to mrs. avram, who is on the task force. mr. fasana said the committee had power to set up subcommittees because of the board's previous approval on this; in addition, the function statement gives the committee the right to set up a task force. president shoffner pointed out that the results of deliberations require a joint submission to all three boards. mr. hammer volunteered help in any coordination which might be needed. comments regarding the distribution of the opinion survey that mrs. avram's report calls for were that it should not be general but that it be noted (perhaps in jola tc) that the survey is available. mr. shoffner suggested that the board accept john kountz's statement regarding the establishment of a committee on nonbibliographical data and reconsider the matter again at midwinter. john linford suggested that it would be best to expand the charge to the committee to include the nonbibliographical area. paul fasana stated that the committee's function statement now is so worded that it can include noncataloging data. the sense of the board was agreement that the authority already existed. telecommunications committee report. the new chairman, david waite, reported that the committee first discussed the committee's focus, as it was the desire of the board, as he understood it, to make some changes in this committee. in the past the activities of the committee were basically in cable tv.
the present members were not too interested in making that their prime target, but instead the electronic communications of bibliographic data. they are not going to just look at hot issues but are currently proceeding in the area of telecommunications information, at the same time keeping their eyes open for important developments in the technological field under the broad base of telecommunications. the two main focal points of the committee would be education and standards. in education the committee would try to communicate with decision makers as related to aspects of communications to be investigated and then overflow to the general library community. mrs. martin asked if there might not be a problem with the committee's taking on this role since seminars, institutes, etc., were the function of the conference planning committee. the planning for such seminars, etc., mr. butler said, on a six or nine month basis did not work adequately. there was no objection to the telecommunications committee functioning in this area. mr. kilgour remarked that at&t and other phone companies, as well as fcc, had a great deal going on with impact on telecommunications and with networks presently and he felt that ala should present a position to fcc. the committee should therefore inform itself extensively as to what is going on so that if it appears some action by ala was needed, we would be prepared. committee on objectives report. chairman stephen salmon summarized the discussion by the objectives committee of the three issues raised at the first session of the isad board meeting regarding the information technology discussion group: (1) how such a group should fit into the organizational structure of isad. the committee sensed a media committee was not an answer but felt a discussion group was appropriate. the media group should be continued even if transformed into a committee. (2) the restatement of the objectives and activities of the division to include the media group. consideration was given to paul fasana's words that "related technology" in the isad bylaws' objectives statement included educational technology already. but the committee agreed with the board that a change in the language would help clarify, and mr. kilgour had some recommendations in rewording the objectives statement to solve the problem. (3) terminology for the name of the division and the journal. they finally identified three possible name changes for the division: (a) information science and library automation, (b) information science and educational technology, and (c) information science and technology. the final decision was that the present name of the division, "information science and automation," was the best. the committee also thought the draft report should specifically include another objective, i.e., to offer expertise in this area to others in ala and other professional organizations like arl. mr. salmon listed the additions and changes made and included in the final draft of the committee's report. motion: paul fasana moved that the isad board accept and adopt the report of the objectives committee. seconded by susan k. martin. carried. mrs. martin remarked that the information technology discussion group was already within isad. mr.
shoffner explained that the board had accepted the group only for one year and during that year isad intended to determine whether this activity was within isad's scope. mrs. martin asked, if isad considers educational technology and audiovisual concerns to be within its scope, what the relationship would be with the other audiovisual committees in ala and also what coo's role would be? the board was not asserting what is out of scope with any other parts of ala, mr. shoffner answered, only what was within isad's scope. president shoffner thanked chairman salmon for his report and the committee for its work in carrying out the original charge as given and meeting the time schedule. he then declared the committee disbanded. there was some discussion on coordinating with other av committees in ala and what aspect of av isad would be concerned with. mr. kilgour suggested that the information technology discussion group should pursue its own goals and not concern itself with the coordination of all ala av groups. such coordination he felt was impossible. whether a number of committees or subcommittees in the discussion group could be formed was also discussed. mr. shoffner stated that these should be "units" of the discussion group, not "committees." he further said he was reluctant to establish committees and would do so only after a group of people committed to doing a job showed, over some continuing period of time, productive activity on a number of different tasks that relate to each other. report of editorial board. mrs. martin said that at the monday isad board meeting she had talked of retaining jola tc as a separate publication, but the final feeling of the editorial board was negative. the thought was to create a separate section within jola but with a different format, as the green sheets are inserted in the library association record ("liaison"). don bosseau, editor of jola tc, remarked that the editorial board had provided insight into another need which was for truly technical communications, e.g., a short summary which would show up later in a longer, detailed article. the editorial board felt that tc should be made into something that has more impact than news releases. isad/led education committee report. a written report was submitted by the committee. (see exhibit a.) cola report. the membership of the discussion group had increased to 145. chairman don bosseau, who has held that position since the incorporation of the group within isad, said ballots would be sent out shortly for the election of a new chairman. mr. bosseau also asked about control of membership in the group and stated that ala's guidelines indicate one person per institution as a maximum membership. the board corrected this idea by saying that this limitation was not ala's but the old cola limitation. there is no limit on membership by ala. mr. butler asked that the planning of cola, marc, and information technology discussion groups' meetings be coordinated. mrs. martin said that david weisbrod had suggested that there be a cola meeting at asis and that would be part of the cooperation between asis and isad in the program area. marc users discussion group. mr. james rizzolo told of his intent to make a survey by breaking up the mailing lists he had into three groups: (1) marc subscribers, (2) those interested in using marc, and (3) an informational group. mr. kilgour thought the group was called "marc subscribers" not "marc users." mr.
shoffner said the name had always been marc users. originally there had been the intent to set up a "marc subscribers" group but mr. culbertson had said that it would not fit into either isad or ala's structure. they then settled on marc users discussion group. it was stated that both marc and cola discussion groups should be in the program section of the ala conference program book. it was pointed out that program time can be requested by committee or discussion group chairmen. information technology discussion group. mr. shoffner requested mr. donald hammer to inform the isad information technology discussion group that the board would not establish an av committee, but intended to continue with the information technology discussion group, in response to their memo of march 2, 1973 requesting an av committee within isad. report to ala planning committee. mr. shoffner also requested mr. hammer to forward the objectives committee report on the long range plans of isad to the ala planning committee as a means to meet their request. (this report had been deferred from midwinter so that the final report of the objectives committee could first be heard.) rtsd computer filing committee. mr. fasana said he was asked by the rtsd board why isad had refused their request to appoint an isad member to the rtsd computer filing committee. mr. hammer said he would see that a committee member was appointed. mr. shoffner expressed appreciation to the board and turned over the gavel to president fred kilgour. the meeting was adjourned at 12:00 noon. exhibit a. june 25, 1973. minutes of the 1973 annual isad/led meeting. the 1973 annual meeting of isad/led convened june 25 at caesar's palace, atrium i, las vegas. present were members jim liesener, ann painter, and elaine svenonius; and visitors martha west (california state university, san jose), barbara fleming (university of nevada, reno), and philip heer (university of denver); in attendance were pauline atherton and charles davis. discussion centered on two topics: the disc questions as commented upon by library school faculties and the future course of disc. the general and specific comments on the disc questions given by library school faculties are given on the attached sheets. these sheets include, in addition to responses reported at ala, responses which arrived belatedly throughout the summer. general criticisms are primarily of two types: the questions are either too broad or they are outside the domain of information science. it was felt that had the use to which the questions are to be put (viz., to develop modules, not to examine graduating students) been clearer, the charge "too broad" would not have resulted. as to what is to be included in the domain of information science, this was precisely the point of the exercise of generating questions, and comments limiting or extending the domain of information science should be accorded consideration. at the june 25 meeting, the individual questions were discussed generally in light of comments received. following the discussion, participants at the meeting expressed informally, and with varying degrees of determination, interest in developing modules around certain of the questions. the meeting ended with a discussion of the future of disc.
a technical session is being planned by the es sig at los angeles in october: program modules for developing curricula in information science; the plan for module development will be advertised and some demonstration modules shown with a view to drawing up module specifications. also contemplated is a program by isad/led in january at midwinter ala. elaine svenonius, august 15, 1973.

editorial board thoughts: developing relentless collaborations and powerful partnerships. mark dehmlow. information technology and libraries | june 2017. with the end of the performance and fiscal year wrapping up, it seemed like a good time to reflect on what change initiatives we have engaged in over the past few years that have strengthened the organizational effectiveness of the it department in our library. my thoughts almost immediately drifted to our focus on collaboration. early in my career, it was the profession-wide culture of cross-institutional collaboration that convinced me that becoming a librarian would be the right career move. i am certain that the impetus to collaborate stems from our professional service commitment, a values-based system that at its core believes that the success of all helps the collective do their jobs better in the name of service to our patrons. and yet, over the years, i have heard stories of and observed firsthand internal competitions for resources, vilification of library it as siloed and opaque factions, and library it departments that have had strained relationships with their institution's central it organizations. as a part of our senior leadership team for the hesburgh libraries, two of my core professional interests are organizational effectiveness and staff satisfaction, especially in the face of a rapidly changing technology landscape, competition for talent in the it sector where it is hard to contend with commercial salaries, and the slow rate of attrition at the university. retaining talented it staff requires creating a work culture that is better than the commercial sector, a work culture that values work/life balance, innovation and experimentation, a culture of teamwork and camaraderie, and where there is a clear sense of strategic priority. to build these latter two qualities into our work culture, we have strategically emphasized durable internal and external coalitions with a tenacious sense of partnership. true collaboration reinforces a collective sense of goals, allows for maximal efficiency, discourages unnecessary or destructive competition, and opens the door to the coveted but seldom realized ability to "stop doing" through partnering with other units on campus that share a sense of priority around particular services. creating sustainable and significant internal collaboration requires etching it into the culture of the organization. making it a part of the organization's dna has to be prioritized and modeled by senior leadership, and it begins with advancing shared goals over singular agendas. in our senior leadership team, we have committed to each other as our primary team. we may advocate for staff and initiatives in our own verticals, but our drive is to be holistic stewards for the libraries, not just our functional departments. we give as much, if not more, weight to the objectives of the collective senior leadership team, which also helps in clarifying priorities.
mark dehmlow (mdehmlow@nd.edu), a member of the ital editorial board, is director of library information technology, hesburgh libraries, university of notre dame, south bend, in. our executive leadership models cooperation, cross-divisional problem solving, and collective strategic initiative planning. using this model, decisions get made more quickly, enhancing our ability to accomplish things on time, with a high level of quality, and with a considerable level of satisfaction for our staff and faculty. the it department is viewed less as a black box where decisions about what to work on are made behind the curtain and more as a group of talented staff who help our organization accomplish its priorities. when our it department needs to advocate for support and timely completion of work from individuals in other departments, the other senior managers help get their units mobilized. we see ourselves as part of the community and the community embraces us as part of them. historically, it has been tempting to view it as somewhat separate, a part of the production line, but in an age where every operation in the library is affected by technology, our workflows need to be more integrated and team based. the problems we are working on are more cross-disciplinary and require a plurality of expertise to solve. libraries are increasingly becoming an interconnected and interdependent ecosystem that requires thinking holistically about problems and a relentless commitment to building coalitions to drive our services. it may seem obvious that this would be a more effective way to work, yet i have spoken with many people at organizations where there is a clear culture of departmental objective separation and competition for resources. i have long appreciated the work environment at notre dame, in part because we strive to be an organization whose culture has been guided by our core institutional values: accountability, integrity, excellence in leadership, excellence in mission, and teamwork. these values not only drive our internal collaborations, but also the way in which different departments on campus work with each other. we have had a long-standing, positive relationship with our central office of information technologies (oit), one that has been tremendously cooperative but for many years has lacked interconnections at a variety of levels and a clear collaborative and strategic focus. in the last five years, our organizations have shifted their focus: the oit from emphasizing centralized, administrative, enterprise computing to decentralized, academic, enterprise computing, and the libraries from doing everything in house to leveraging services for standardized needs and focusing our staff's time on initiatives where they can create the most value. in part, we developed an in-house it department because we had service expectations that weren't a priority at the time for the oit. but during our strategic transitions, we have extended our working relationships at every level throughout our organizations, from our staff in the trenches to our managers and senior leaders. my focus as the director for library it over the past few years has been to look at ways we can enhance our capacity through partnerships. to that end, there are several interrelated initiatives that we have begun to engage in with the oit: 1. embedding an oit presence in the libraries
 2. shifting support for common it services to the oit, and 
 3. consolidating our customer communication through their service portal servicenow. 
the first step in this new collaboration with the oit was letting go of the past and revisiting where the oit and the libraries have strategic overlaps that may not have been aligned before. as two service organizations on campus with a deep concern for supporting the academic endeavor, it was easy to find strategic alignment with each other. for the libraries, we often get questions at our service points about how to change passwords or install printer drivers, needs that are part of the central it service portfolio. for the oit, the libraries are a major campus hub where hordes of students and faculty conduct research and work on assignments, particularly after classes, when many of the business units leave the university for the day. working closely with the libraries' director for teaching, research, and user services and the oit's senior director for user services, we began developing a collaboration grounded in our common desire to support end users, which resulted in creating an oit outpost in the libraries. while there are many libraries that have this kind of collaboration, this was a revolutionary step for us. this collaboration opened the door for us to begin a discussion about common technology services that we have been supporting internally: printing and general lab computing. it is important to us that these services function well for our end users, but they are not services that require library expertise to accomplish. the oit supports these services for much of campus, and as long as we have aligned, practical service-level expectations that are committed to excellence, the oit can handle that function much more efficiently and we can use our staff expertise to support other, emerging services that are core to the libraries. we are also working closely with the oit to leverage their it service portal, servicenow, as the libraries' service portal. given that our service portfolio is much broader than strictly it services, moving in this direction required a willingness from the oit to think outside of the box and allow us to customize the system to meet our service needs. it has required some reciprocation from the libraries as well: the servicenow platform is more expensive than others we could license, its functionality will require effort from our staff to customize, and it is requiring us to change workflows, especially in the public services areas. integrating our customer communication into this platform, though, will create a better user experience for our patrons by supporting a common interface they are experienced with, and it will allow us to more easily transfer both staff and patron general it questions to the oit. beginning to work in truly collaborative ways requires shifting the narrative around our relationships from a client/provider model to one of a coalition. redefining these relationships as partnerships puts both parties on equal footing around the planning table, where everyone has an equal stake in the objectives and outcomes. these partnerships don't come effortlessly; they require libraries to ardently become more visible on campus, to articulate the complementary value that we can contribute to campus initiatives, and to proactively request to join initiatives that we haven't participated in before. it also takes reaching out and helping campus partners see how we can collectively create value together, using our unique talents to successfully support the campus community.
and lastly, it takes engaging a more holistic view of the university and the way we steward its resources; sometimes that will mean allocating more resources for the common good versus taking the narrower view that we should only consider our own context when adopting solutions. but in the end, if we are willing to think about our role at the university in that broader context and build powerful partnerships, we will collectively be able to serve our end users better.

using augmented and virtual reality in information literacy instruction to reduce library anxiety in nontraditional and international students. angela sample. information technology and libraries | march 2020. https://doi.org/10.6017/ital.v39i1.11723. dr. angela sample (asample@oru.edu) is head of access services, oral roberts university. abstract: throughout its early years, the oral roberts university (oru) library held a place of pre-eminence on campus. oru's founder envisioned the library as central to all academic function and scholarship. under the direction of the founding dean of learning resources, the library was an early pioneer in innovative technologies and methods. however, over time, as is the case with many academic libraries, the library's reputation as an institution crucial to the academic work on campus had diminished. a team of librarians is now engaged in programs aimed at repositioning the library as the university's hub of learning. toward that goal, the library has long taught information literacy (il) to students and faculty through several traditional methods, including one-shot workshops and sessions tied to specific courses of study. now, in conjunction with disseminating augmented, virtual, and mixed reality (avmr) learning technologies, the library is redesigning instruction to align with various realities of higher education today, including uses of avmr in instruction and research, and following best practices from research into serving (1) online learners; (2) international learners not accustomed to western higher-education practices; and (3) learners returning to university study after being away from higher education for some time or having changed disciplines of study. the library is developing online tutorials targeted at nontraditional and international graduate students with various combinations of avmr, with the goal of diminishing library anxiety. numerous library and information science studies have shown a correlation between library anxiety and reduced library use, and library use has been linked to student learning, academic success, and retention.1 this paper focuses on il instruction methods under development by the library. current indicators are encouraging as the library embarks on the redesign of il instruction and early development of the inclusion of avmr in il instruction for nontraditional and international students. literature review: the patron approaches the reference desk, with eyes downcast. in a voice so soft that it is barely above a whisper, the patron mumbles, "is this where i can get help with research?" some variation on the above scenario is an occurrence long familiar to academic reference librarians.
in 1986, mellon put a name to this nervousness of patrons; she called it library anxiety.2 since then, librarians have implemented various measures to help put patrons at ease and minimize their library anxiety. scholars have studied many of these measures aimed at reducing library anxiety, both to determine the efficacy of such interventions and to understand better the causes of library anxiety. this paper describes one library's intervention: a virtual-reality tour that lets students learn about some of the services available at the library prior to their initial visit, in an attempt to reduce some aspects of their library anxiety. library anxiety: library and information science (lis) researchers have long recognized that anxiety related to libraries and research can have a detrimental effect on students. mizrachi described library anxiety as the feeling of being overwhelmed, intimidated, nervous, uncertain, or confused when using or contemplating use of the library and its resources to satisfy an information need. it is a state-based anxiety that can result in misconceptions or misapplications of library resources, procrastination, and avoidance of library tasks.3 since mellon's theoretical framing of library anxiety in 1986, researchers have studied a number of library-related anxieties, including research anxiety, information literacy anxiety, library technophobia, and computer anxiety. various studies have focused on different groups of students (freshmen, nontraditional students, and international students, to name a few) who may experience higher levels of library anxiety. another area that has been of interest to researchers is the study of the efficacy of various measures aimed at reducing the library anxiety of students. causes and factors: researchers have found several causes of library anxiety. in her seminal article, mellon used a grounded theory approach to understand and "describe students' fear of the library as library anxiety."4 mellon noted most of the students in her study described their feelings as being lost in the library, which mellon stated "stemmed from four causes: (1) the size of the library; (2) a lack of knowledge about where things were located; (3) how to begin; and (4) what to do."5 head and eisenberg also found a majority of students (84 percent) had difficulties in knowing where to begin.6 bostick and later jiao and onwuegbuzie named "five general antecedents of library anxiety . . . namely, barriers with staff, affective barriers, comfort with the library, knowledge of the library, and mechanical barriers."7 barriers with staff are the feelings students have regarding the accessibility and approachability of library staff.8 affective barriers are students' self-perceptions of their competence in using the library and library resources.
affective barriers’ arise from feelings of inadequacy and can be heightened by the perception that others possess library skills that they alone do not.9 comfort with the library deals with the student’s perception of the library as a “safe and comforting environment.”10 knowledge of the library is students’ knowledge of “where things are located and how to find their way around in the building.”11 mechanical barriers refer to students’ perception of the reliability of machines in the library (e.g., copiers, printers, computers, etc.).12 researchers focused on investigating the information-seeking behavior of students have identified stages of library anxiety. in her work, kuhlthau identified six stages of information seeking in information technology and libraries march 2020 using augmented and virtual reality in information literacy instruction | sample 3 which students may experience library anxiety: task initiation, topic selection, prefocus exploration, focus formulation, information collection, and search closure.13 in blundell’s presentation of her theoretical model of the academic information search process (aisp) of undergraduate millennial students (figure 1), she described the varying levels of anxiety students may feel throughout this process depending upon their success at finding needed information.14 anxiety at stage 2: development/refinement “ranges from mild to extreme, depending on the success of the student’s aisp in finding information he/she believes is appropriate for addressing the academic need.”15 at stage 3, “based on information located through the aisp in stages 1 & 2, [the] student either fulfills [the] academic need with minimal anxiety, refocuses aisp with mid to high-level anxiety, or abandons the academic need completely with high/extreme levels of anxiety.”16 figure 1. blundell aisp model.17 although blundell studied undergraduate millennial students’ information-seeking behaviors, the same behaviors may also be descriptive of other groups of students. blundell omitted anxiety at or prior to stage 1 when the assignment is received by the student. one reason for the omission of anxiety in blundell’s model at stage 1 may be a seemingly paradoxical finding by many researchers regarding students’ inflated belief in their research skills as compared to their actual level of information literacy (il) skills.18 students with a high self-assessment of their il skills may feel confident at the onset of research, only experiencing anxiety when encountering low success rates when searching for information or when experiencing information overload. however, many other students may experience anxiety at the onset of receiving an assignment, particularly on a information technology and libraries march 2020 using augmented and virtual reality in information literacy instruction | sample 4 topic in which they have little or no knowledge. others may experience anxiety if they realize they do not know where to look for information, how to use library research tools, or feel apprehension at the thought of asking for help from a librarian. for example, library anxiety can result from the requirements of the assignment; most professors require peer-reviewed sources. many new students do not know what a peer-reviewed source is, much less how to find one. 
indeed, many of the causes of library anxiety described in mellon's and later jiao and onwuegbuzie's work can be positioned throughout all six of kuhlthau's and all three of blundell's stages of information seeking and could explain some of the potential steps blundell noted in her model. negative effects: in addition to the obvious discomfort students might feel, library anxiety, as with other forms of anxiety, can have a detrimental effect on students' academic performance. as mellon noted, "students become so anxious about having to gather information in a library for their research paper that they are unable to approach the problem logically or effectively."19 the findings from jiao and onwuegbuzie's numerous studies support the negative effect library anxiety can have on students' academic performance in various ways, including research performance, research proposal writing, and study habits.20 research has also shown the link between higher levels of library anxiety and avoidance of the library.21 avoidance of the library could hinder students' academic performance or retention; studies have linked library use to higher gpas and increased retention rates.22 other negative effects of library anxiety include the reluctance of students to ask for help from a librarian and the tendency to procrastinate until it is too late to do well on assignments. when library anxiety is at a level high enough to cause students to enter a panic mode, logical thinking, the ability to apply existing skills, and building or acquiring new skills can be impaired. at-risk student groups: acknowledging the negative effects library anxiety can have on students' academic performance, several studies have looked to determine whether particular demographic groups of students experience library anxiety at higher rates and what factors or causes may be most prevalent for a particular group. in one study conducted by jiao, onwuegbuzie, and lichtenstein, students who fell into the following groups tended to have the highest levels of library anxiety: "male, undergraduate, not speak english as their native language, have high levels of academic achievement, be employed either part- or full-time, and visit the library infrequently."23 some studies have focused on learning more about the library anxiety of a particular group. some of the groups investigated include graduate, international, and nontraditional students. still others have focused on possible racial differences in the prevalence of library anxiety. although a few studies have found library anxiety to be higher for undergraduate students than graduate students, one of the most often-studied groups at risk for library anxiety has been graduate students.24 these researchers have looked at a number of factors in relation to graduate students' library anxiety. in an early study, they found graduate students with the preferred learning style of visual learners tend to have higher levels of library anxiety.25 in another study of graduate students, they examined the relation between library anxiety and trait anxiety, defined as "the relative stable proneness within each person to react to situations seen as stressful."26 jiao and onwuegbuzie, together with bostick, investigated the potential relationship between race and library anxiety in 2004, a study they replicated in 2006.
in both, the researchers found caucasian american graduate students reported higher levels of library anxiety than their african american counterparts.27 another group frequently examined in library anxiety studies is international students. mizrachi noted "studies involving international students in american universities consistently show their levels of library anxiety to be much higher than their american peers."28 onwuegbuzie and jiao found international esl students "had higher levels of library anxiety associated with 'barriers with staff,' 'affective barriers,' and 'mechanical barriers,' and lower levels of library anxiety associated with 'knowledge of the library' than did native english speakers."29 later, jiao and onwuegbuzie found the most prevalent causes of library anxiety for international students were mechanical barriers (library technology) as the greatest source, followed by affective barriers.30 in the more recent pilot study by lu and adkins, the greatest barriers for international students were affective and staff barriers, while mechanical barriers, such as technologies, were no longer a significant cause of anxiety for most.31 collins and veal found adult learners in their study had the highest degree of library anxiety pertaining to affective barriers.32 in their study, kwon, onwuegbuzie, and alexander revealed graduate students who had higher levels of library anxiety resulting from affective barriers and knowledge of the library had weaker critical-thinking skills, lower self-confidence, less inquisitiveness, and reduced systematicity ("less disposed toward organized, logical, focused, and attentive inquiry").33 kwon found similar results in undergraduate students.34 interventions: recognizing the multiple causes and multidimensional aspects of library anxiety, librarians have devised a number of interventions aimed at addressing one or more of its causes. some of the means to address barriers with staff have focused on outreach, engaging library instruction, online presence, and other similar efforts to reach students and provide needed support for students' research. librarians have used information literacy instruction (ili), reference desk consultations, and print and online guides to address library anxiety stemming from affective barriers, knowledge of the library, and even the mechanical barriers arising from lack of technology skills. a common intervention is ili, which several studies have found to have some success in reducing students' library anxiety. bell explored students' levels of library anxiety before and after a one-credit il course.35 platt and platt examined the efficacy of two 50-minute ili sessions, required of students enrolled in the research methods in psychology course, in reducing library anxiety, and found "the greatest changes . . . were related primarily to knowledge of what resources are available in the library and how to access them."36 in contrast to the typical one-session il class, fleming-may, mays, and radom investigated and found a three-workshop instruction model correlated with students' increased confidence in using the library and lessening library anxiety.37
notwithstanding the benefits of library instruction sessions for students in relieving library anxiety, pellegrino found students were far more likely to ask a librarian for help when their instructor, rather than a librarian, encouraged or required them to do so.38 by familiarizing students with the location and arrangement of library services in the building, library orientations have been found to help relieve library anxiety.39 library orientations primarily aim to address one of the causes of library anxiety: a lack of knowledge of the library. these orientations often introduce students to various library staff, which may also help with the dimension of library anxiety due to barriers with staff. other interventions have been attempted with some success. martin and park found students were more apt to request assistance from the librarian if persuaded the consultation would save time.40 mcdaniel found in a study of graduate students that the use of peer mentors was effective in reducing affective barriers.41 robbins discussed the use of library events to help ease students' anxiety, but found in the follow-up survey many students were unaware of the events.42 diprince et al. discussed ways the use of a print guide can help alleviate library anxiety.43 oru library. oral roberts university. the oru library serves the students, faculty, and staff of oral roberts university (oru). oru is a small, private, not-for-profit, liberal arts college located in tulsa, oklahoma. founded in 1963 by oral roberts, oru has an enrollment of approximately 3,600 students. oru is an interdenominational christian institution focused on a whole-person education of spirit, mind, and body. oru offers more than 150 majors, minors, and pre-professional programs in a range of degree fields, including business, biology, engineering, nursing, ministry, and more.44 history: "the first building will be the library which is the core of the whole academic structure."45 (oral roberts, 1962). from the founding of oru, founder oral roberts had a vision of the library's centrality to academics.46 this set a precedent early in the history of the oru library of the importance of the library to the academic work of the students and faculty of oru. expanding on traditional views of the function of an academic library to serve mainly as the repository of books and articles, through the vision of early library administrators, the oru library emerged as one of the early adopters of electronic technology with the dairs (dial access information retrieval system) computer.47 throughout the years, due to a number of factors, the oru library receded from the forefront of pre-eminence in academics on campus. library practices followed the general trend of academic libraries. the oru library continued to acquire needed materials (e.g., books, journals, access to databases). library instruction likewise kept up with current models of instruction. the typical method of instruction to undergraduates has been teaching one or two sessions to a class at the request of the instructor. largely through the efforts of the instruction librarian, il became a required component of undergraduate education at oru. with rare exceptions, undergraduate students at oru are required, as a part of comp 102: composition ii, to attend two sessions of an il course.
other forms of ili include workshops and sessions for undergraduates working on their senior papers and other sessions for graduate and postgraduate students, all typically at the request of the instructors of classes. with the new addition of augmented, virtual, and mixed reality (avmr) learning technologies, at the behest of their dean, oru librarians have begun to look at ways to incorporate these technologies into their classes and daily work. several oru instructors are using avmr technologies in their classes.48 to help prepare students for the use of these technologies in their classes, one oru instruction librarian has begun to introduce students to avmr technologies. other oru instruction librarians are exploring ways to use avmr technologies to create visualizations of library and research concepts, such as a 3d visualization of how boolean logic works in database searches. oru instruction librarians are also exploring ways to incorporate avmr technologies into a new program of online ili. although still in very early stages of planning, the proposed online ili will include a virtual tour of the library. this paper focuses on the implementation and early feedback from a formative assessment on a virtual tour of the oru library. oru modular students: in addition to traditional 15-week semesters, two colleges at oru offer graduate modular programs, the college of education and the college of theology and ministry. many of the students who enroll in these programs are nontraditional students who are returning to college after some time. several of these students work full-time jobs and have family obligations in addition to their academic work. often, these students are not local to the tulsa campus; several are us students who live out of state and many others are international students. the modular classes offered by both programs can be a hybrid of online and modular format. the college of theology and ministry offers one-week courses on campus; the college of education offers two-and-a-half-day on-campus classes. modular classes are intensive due to the compressed nature of the curriculum. often, modular students are visiting campus for the first time, and in addition to locating their classes, are very busy with coursework. adding to these pressures, modular students may be using computer technologies in new ways. navigating the library's resources is yet another stressor for many of these students. students who are not familiar with the operations of an academic library may not be aware of library services or how to access those services. the project: in january 2017, the global learning center opened on the campus of oru. one hallmark of this renovated structure is the integration of avmr technologies.49 despite several professors on campus from various disciplines and colleges implementing avmr into their curriculum, students' use of the facilities was somewhat lower than had been hoped. in the fall 2018 semester, the idea of creating a virtual tour of the oru library arose from a conversation between the author and a colleague, dan eller. eller described an online ili course he envisioned for oru's graduate theology modular students. as a part of this course, he envisioned a virtual tour that could help students by reducing their library anxiety.
early in 2019, oru's associate vice president of technology and innovation, michael mathews, contacted dr. mark roberts, dean of learning resources (of which the oru library is a part), to propose making avmr learning technologies available through the oru library. dean roberts agreed and created an avmr team of library faculty to oversee this project. in the spring 2019 semester, the oru it department sent one of their employees, stephen guzman, to work with the library's avmr team to set up an avmr station and work in the library to help make these new technologies available and known to oru students. in addition to other avmr projects guzman helped the library's avmr team begin, he volunteered to take the 360 images when he learned of the library's desire to create a virtual tour of the library. guzman also helped in the selection of editing software, 3dvista, for which the library acquired a license. working with the 360 images guzman took and stitched together, the author used 3dvista to create a virtual tour of the library. this software allows for the addition of elements to the 360 images that make up the virtual tour to enhance the viewer's experience and to provide information and hyperlinks to external webpages with more information. some of the elements added to the oru library virtual tour are hotspots that enable a viewer to move from one area to another, icons that present pop-up windows with more information, and other icons that link to the online profiles of various library faculty. throughout the tour, consistent use of icons for the same functions is maintained. for example, icons with arrows allow the viewer to move from one location to another (figure 2), while icons with question marks displayed over library personnel (figure 3) open the personnel's profile webpage when clicked. icons that contain the letter "i" feature pop-up windows with information and related links. the tour begins from outside the building so new visitors will be able to recognize the building when they arrive on campus (figure 4). viewers can navigate through subsequent 360 images by clicking on the arrow icons so the viewer virtually travels the same path they will follow to enter the library when on campus. there are two other options to navigate the tour. the viewer can click on the small icons of scenes displayed on the left side of the screen to move to another area. the floor plans displayed at the upper right of the screen have red dots indicating the location of various scenes and, when clicked, move the viewer to that scene. figure 2. avmr station near the reference desk, oru library virtual tour. other elements of the tour include small icons of the scenes on the left of the screen. beneath these icons are the names of the various areas. the title of the current scene appears in yellow lettering, providing information to help orient the viewer. small floorplans located in the upper right side of the screen offer additional information on the location of the area (figure 3). viewers can toggle these floorplans on and off. another feature supplying location information is the dropdown menu for the floorplans (the dark blue bar at the upper right of the screen), which shows the floor level of the building on which the area is located. in the lower right of the screen, an information icon is available with details on what behavior to expect when clicking on icons and a description of the various ways to navigate the tour. figure 3. dean mark roberts near alexa and the self-checkout station at the circulation desk, oru library virtual tour. figure 4. oru campus, oru library tour.
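the tour's interactive elements follow a simple pattern: each 360-degree scene carries a set of hotspots, and each hotspot either moves the viewer to another scene, opens an informational pop-up, or links out to a faculty profile page, while scene names and floor numbers drive the thumbnail list and floorplan navigation. 3dvista stores all of this in its own project format; purely as an illustration of the kind of structure the tour encodes, the sketch below models scenes and hotspots in python, with scene names, urls, and file names that are hypothetical rather than taken from the actual tour.

```python
# hypothetical sketch of the scene/hotspot structure a virtual tour encodes.
# this is not 3dvista's project format; names, urls, and image files are placeholders.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Hotspot:
    kind: str                           # "arrow" (move between scenes), "info" (pop-up), or "profile" (question mark)
    label: str
    target_scene: Optional[str] = None  # used by "arrow" hotspots
    url: Optional[str] = None           # used by "profile" hotspots (librarian profile pages)
    popup_text: Optional[str] = None    # used by "info" hotspots

@dataclass
class Scene:
    name: str                           # shown as the scene title and in the thumbnail list
    floor: int                          # drives the floorplan dropdown and red-dot markers
    image_360: str                      # stitched 360-degree photo for this location
    hotspots: List[Hotspot] = field(default_factory=list)

# two illustrative scenes
tour = [
    Scene("campus entrance", floor=1, image_360="campus.jpg",
          hotspots=[Hotspot("arrow", "enter the library", target_scene="reference desk")]),
    Scene("reference desk", floor=1, image_360="reference.jpg",
          hotspots=[
              Hotspot("info", "ways to contact a librarian",
                      popup_text="chat, email, phone, or stop by the desk"),
              Hotspot("profile", "meet the librarian",
                      url="https://library.example.edu/staff/librarian"),
          ]),
]

def navigate(scenes: List[Scene], current: str, choice: str) -> str:
    """follow an arrow hotspot from the current scene, mimicking the tour's click-to-move navigation."""
    scene = next(s for s in scenes if s.name == current)
    for h in scene.hotspots:
        if h.kind == "arrow" and h.label == choice:
            return h.target_scene
    return current  # stay in place if the click was not a navigation hotspot

print(navigate(tour, "campus entrance", "enter the library"))  # -> "reference desk"
```

under this framing, clicking an arrow icon is simply a lookup of the target scene, which is what the small navigate function mimics; the thumbnail list and floorplan red dots are just alternate ways of selecting a scene name.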
methodology: the aim of the virtual tour of the library is to reduce several dimensions of students' library anxiety. the primary goal of the tour is to reduce anxiety related to knowledge of the library by familiarizing students with images and information regarding the building prior to their arrival on campus. another aim is to reduce barriers with staff, which we address by providing information along with images of library faculty. affective barriers and mechanical barriers are two of the most prevalent causes of library anxiety, which the intervention of the tour does not directly address. the hope is, however, that with the minimization of any anxiety stemming from knowledge of the building and barriers with staff, students will be encouraged to consult with librarians, particularly as information on the variety of ways to contact librarians is included on the information pop-up window on the reference desk. pre- and post-surveys: the pre- and post-surveys administered to students included 42 statements from bostick's library anxiety scale. bostick's library anxiety scale, developed in 1992, is a 5-point likert scale survey instrument that contains 43 statements. the pre-survey also contains demographic questions. the one statement omitted from bostick's original survey was number 40, "the change machines are always out of order," as the oru library does not have change machines.50 with the exception of the demographic questions, the post-survey is a duplicate of the pre-survey, with the same 42 statements. although several researchers have adapted bostick's library anxiety scale, such as blundell's adaptation to add "elements related specifically to information technology (both hardware such as computers, and software such as online research databases),"51 for the purposes of this preliminary inquiry, the researcher decided to use the original questions from the library anxiety scale. the original statements were used because reduction of library anxiety stemming from information technology use was not a goal of this study. administration of survey: a link to the pre-survey was posted on the homepage of the oru library. the author sent email invitations containing a link to the pre-survey to students enrolled in the june 2019 summer modular theology classes. the author met with groups of education modular students during the week they were on campus (june 24–30, 2019) to recruit participation. in a library session, another librarian encouraged her modular students to participate in the study. at the end of the pre-survey, a unique number and instructions to note the number were provided to participants to be used to log in to the post-survey. the link to the virtual tour appeared on the final screen of the pre-survey. the link to the post-survey was provided on the same page as the virtual tour, allowing participants to navigate to the post-survey when desired.
the surveys asked for no identifying information; however, the unique number provided on the pre-surveys and entered by the participants on the post-surveys allowed the researcher to link the participants' responses to both surveys. once the results were downloaded, each participant's pre- and post-survey responses were coded p1 through p7 to track any potential effects of the virtual tour on participants' responses. because of the low rate of participation, formal statistical analyses were not applied to these findings. the results were examined in two ways: each participant's pre- and post-survey responses were compared to determine if responses changed from pre- to post-survey, and the total number of responses on each point of the likert scale to each of the 42 statements was examined to determine trends in participants' levels of library anxiety.
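both analyses (the per-participant pre/post comparison and the per-statement tallies of likert responses) are simple to express in code. the sketch below is a minimal illustration only, assuming hypothetical csv exports with one row per participant keyed by the unique number and one column per statement coded 1 through 5; it is not the survey tool's actual export format, and it reports raw changes without classifying them as positive or negative, since that depends on how each statement is worded.

```python
# minimal sketch of the two analyses described above, assuming hypothetical
# exports "pre.csv" and "post.csv" with a participant_id column followed by
# one column per likert statement (responses coded 1-5).
import csv
from collections import Counter

def load(path):
    # read a survey export into {participant_id: row}, where row maps statement -> response
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return {row["participant_id"]: row for row in rows}

pre, post = load("pre.csv"), load("post.csv")
statements = [col for col in next(iter(pre.values())) if col != "participant_id"]

# analysis 1: per-participant change from pre- to post-survey on each statement
for pid in pre:
    if pid not in post:
        continue  # only participants who completed both surveys are compared
    for s in statements:
        before, after = int(pre[pid][s]), int(post[pid][s])
        if before != after:
            print(f"{pid}: {s!r} changed from {before} to {after}")

# analysis 2: per-statement tallies of responses on each likert point, pre vs. post
for s in statements:
    pre_counts = Counter(int(row[s]) for row in pre.values())
    post_counts = Counter(int(row[s]) for row in post.values())
    print(s, dict(sorted(pre_counts.items())), dict(sorted(post_counts.items())))
```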
results: although approximately 100 students enrolled in either the graduate theology or graduate education modular classes visited the campus june 24–30, 2019, participation in this preliminary study was extremely low. to date only seven participants have completed both the pre- and post-surveys. the responses from this formative assessment will be used by the oru library to guide future iterations of the virtual library tour and inclusion in ili. the following discusses initial findings from the pre- and post-surveys. most of the participants reported little or no discomfort or anxiety with using the library. all participants indicated they are us citizens, and all indicated some level of familiarity with the library. four reported they had often visited the library; three responded they had visited the library previously, but not often. of the seven participants, five indicated they are graduate students, one marked "other," and one reported doctoral-student status. ages of the participants varied from one at 20–29, one at 30–39, two at 40–49, and three at 50 years or over. the following describes the effect the virtual tour of the library had on participants' responses. interestingly, one participant showed no change in responses from pre- to post-survey. note: bostick's original categorization of the statements has been retained for all 42 of the statements on both instruments. knowledge of the library: the principal aim of the virtual tour was to reduce library anxiety related to knowledge of the library by acquainting students with "where things are located and how to find their way around in the building."52 bostick categorized 5 of the 42 statements as knowledge of the library. based on participants' responses, there is some indication the tour did help acquaint students with the library. the changes in participants' responses showed a greater positive trend after viewing the virtual tour, although on two statements responses showed a negative trend (table 1). table 1 shows the questions on which participants had a change in their responses from pre- to post-survey. the number in the positive column indicates the number of participants whose responses displayed a favorable change in perception of that statement following the virtual tour. the number in the negative column shows the number of participants whose responses on the post-survey showed a negative effect of the virtual tour.

statement positive negative
i don't feel physically safe in the library. 1 1
i enjoy learning new things about the library. 3 1
i want to learn how to do my own research. 1
the library is a safe place. 2
the library is an important part of my school. 2
totals 9 2
table 1. statements in knowledge of the library category that showed change on post-survey.

the number of strongly disagree responses to statements in this category was unchanged from pre- to post-survey. the only statement that received any responses of strongly disagree was "i don't feel physically safe in the library," with five such responses. taken together with the two responses of disagree to this statement, all the participants feel safe in the library. to the statement "the library is a safe place," all seven participants answered either agree (five on pre-survey, four on post-survey) or strongly agree (two on pre-survey, three on post-survey) (figure 6). curiously, responses to "i enjoy learning new things about the library" changed from no responses of disagree on the pre-survey to one response of disagree on the post-survey. the other shift in the number of responses of disagree was on the statement "the library is an important part of my school" (two on pre-survey, one on post-survey), indicating a slight improvement (figure 5). figure 5. comparison of strongly disagree and disagree responses in knowledge of the library category. to the statements in this category, none of the respondents replied undecided, except to the statement "i enjoy learning new things about the library," with one undecided response on the pre-survey and none on the post-survey. the other change in this category was to the statement "the library is an important part of my school," which moved from no responses of undecided on the pre-survey to one undecided on the post-survey. the respondents, for the most part, wanted to learn to do their own research, with five responses of agree or strongly agree on both the pre- and post-surveys. five of the participants felt the library is of importance (one agree and four strongly agree). six of the seven participants reported they enjoy learning new things about the library. the shift in responses from five agree and one strongly agree on the pre-survey to two agree and four strongly agree indicates the tour might have affected participants' views on this statement (figure 6). figure 6. comparison of strongly agree and agree responses in knowledge of the library category. affective barriers: while not a direct goal of the virtual tour, the responses of participants showed the most gains on the post-survey within the category of affective barriers. this seems to indicate that viewing the virtual tour improved students' self-perceptions of their competence in using the library and library resources. out of the 42 statements on each of the instruments, 12 are in bostick's affective barriers category. the statements in table 2 are those on which participants had a change in their responses from pre- to post-survey. the numbers in the positive column indicate the number of participant responses that improved on the post-survey. a number in the negative column indicates participants' post-survey responses that moved in a negative direction.
statement positive negative
a lot of the university is confusing to me. 2
i am unsure how to begin my research. 2
i can never find things i need in the library. 3
i don't know what resources are available in the library. 2
i don't know what to do next when the book i need is not on the shelf. 1
i feel comfortable using the library. 3
i get confused trying to find my way around the library. 2
i'm embarrassed that i don't know how to use the library. 1 1
the directions for using the computers are not clear.
totals 17 1
table 2. statements in affective barriers category that showed change on post-survey.

looking at the responses to statements in this category reveals some possible effects of the tour and some potential areas of library anxiety. the responses to "i don't know what resources are available in the library" were split on the pre-survey, with three responses of agree, one of strongly agree, one undecided, one of strongly disagree, and two of disagree. the post-survey responses showed almost no change; the only change was one additional response of undecided, with no strongly agree responses (figures 7.1, 8.1, 9.1). these findings indicate more information on what sources are available to patrons may be needed on the virtual tour. most of the respondents indicated confidence about where to begin research. on both pre- and post-surveys, there were five responses of strongly disagree or disagree to the statement "i am unsure how to begin my research" (figure 7.1). most indicated they feel confident in using the library based on the responses to the statements "i'm embarrassed that i don't know how to use the library," "i feel comfortable using the library," "i can never find things in the library," and "i get confused trying to find my way around the library" (figures 7.1, 7.2, 9.1, 9.2). responses were equally positive to the statements "the library won't let me check out as many items as i need," "a lot of the university is confusing to me," "i don't know what to do next when the book i need is not on the shelf," and "i can't find enough space in the library to study" (figures 7.1, 7.2, 8.1, 8.2, 9.1, 9.2). responses were divided on the statement "i feel like i'm bothering the reference librarian if i ask a question" (figures 8.2, 10.2). this finding needs further research to determine what is causing students to feel reluctance to ask the librarian for assistance. figure 7.1. comparison of strongly disagree and disagree responses in affective barriers category. figure 7.2. comparison of strongly disagree and disagree responses in affective barriers category. figure 8.1. comparison of undecided responses in affective barriers category. figure 8.2. comparison of undecided responses in affective barriers category. figure 9.1. comparison of strongly agree and agree responses in affective barriers category. figure 9.2. comparison of strongly agree and agree responses in affective barriers category.
mechanical barriers: although not a goal of the study, there was positive change in participants' responses on both statements in the category of mechanical barriers. it is unclear how the virtual tour might have caused the improvement in participants' perception of the reliability of machines in the library.

statement positive negative
the computer printers are often out of paper. 1
the copy machines are usually out of order. 1
totals 2
table 3. statements in mechanical barriers category that showed change on post-survey.

in this category, on both the pre- and post-surveys, there was one strongly disagree response to both statements. no respondents replied agree or strongly agree to the statements in this category. responses of disagree to both statements increased by one, from one disagree response on the pre-survey to two disagree responses on the post-survey. the number of undecided responses fell from five to four on the post-survey. as noted above, it is not clear what caused the change in responses. barriers with staff: a secondary goal of the tour was to reduce barriers with staff, and thus library anxiety, by providing information with images of library faculty. by providing information and images of the library faculty, this study sought to reduce the anxiety students may have regarding the accessibility and approachability of library staff. in this category, participants showed some positive effects of the virtual tour on how they viewed library staff. however, the responses of participants exhibited the most variability in this category, with almost an equal number of responses being positive or negative after viewing the tour. the reasons for this variance are unclear. in future studies, additional space for comments will be included on the surveys, as well as possible follow-up focus-group discussions, to determine the causes of negative trends in responses. table 4 shows the statements within this category on which participants had a change in their responses from pre- to post-survey. the number in the positive column indicates how many participant responses changed in a favorable direction on the post-survey. the number in the negative column indicates the number of participants whose post-survey responses moved in a negative direction. on the survey instruments, 12 of the 15 statements categorized as bostick's barriers with staff showed changes in responses.

statement positive negative
i can always ask a librarian if i don't know how to work a piece of equipment in the library. 1
i can't get help in the library at the times i need it. 1 1
if i can't find a book on the shelf the library staff will help me. 2 2
library staff don't have time to help me. 1 1
the librarians are unapproachable. 2
the library is a comfortable place to study. 2
the library staff doesn't care about students. 1 3
the library staff doesn't listen to students. 1
the reference librarians are not approachable. 2
the reference librarians are unhelpful. 2
the reference librarians don't have time to help me because they're always busy doing something else. 1 1
there is often no one available in the library to help me. 2 1
totals 15 12
table 4. statements in barriers with staff category that showed change on post-survey.

the findings in this category, overall, were favorable. most feel the librarians and library staff care and are responsive and available to students.
pre-survey responses indicated one or two of the participants felt librarians are unapproachable or unhelpful. post-survey responses reflected a positive change in participants' views on librarians' approachability and helpfulness. participants also reported the library to be a comfortable study location and that the rules are reasonable (figures 10.1, 10.2, 11.1, 11.2, 12.1, 12.2).

figure 10.1. comparison of strongly disagree and disagree responses in barriers with staff category.
figure 10.2. comparison of strongly disagree and disagree responses in barriers with staff category.
figure 11.1. comparison of undecided responses in barriers with staff category.
figure 11.2. comparison of undecided responses in barriers with staff category.
figure 12.1. comparison of strongly agree and agree responses in barriers with staff category.
figure 12.2. comparison of strongly agree and agree responses in barriers with staff category.

comfort with the library
according to collins and veal, comfort with the library is students' perception of the library as a "safe and comforting environment."53 out of the 42 statements, bostick placed 8 within this category, all of which showed some change in responses from pre-survey to post-survey. the changes reflected in this category were largely positive, but it is unclear how the virtual tour might have influenced participants' perceptions of statements such as "there is too much crime in the library" or "good instructions for using the library's computers are available." further investigation is needed to determine what may account for changes in perception on statements such as these. table 5 depicts the changes, both positive and negative, in participants' responses to the statements in this category.

statement | positive | negative
good instructions for using the library's computers are available. | 2 |
i don't understand the library's overdue fines. | 1 | 2
i feel comfortable in the library. | 2 |
i feel safe in the library. | 2 | 1
the library never has the materials i need. | | 1
the people who work at the circulation desk are helpful. | 3 | 1
the reference librarians are unfriendly. | 1 |
there is too much crime in the library. | 1 | 2
totals | 12 | 7

table 5. statements in comfort with the library category that showed change on post-survey.

the following bar graphs compare the responses on the pre-surveys to the post-survey responses within this category. as with other categories, responses were mostly favorable in this category (figures 13, 14, 15).

figure 13. comparison of strongly disagree and disagree responses in comfort with the library category.
figure 14. comparison of undecided responses in comfort with the library category.
figure 15. comparison of strongly agree and agree responses in comfort with the library category.

conclusion
the oru library has found the virtual tour to be of use in familiarizing students with the library.
anecdotal statements from students who viewed the tour during its creation expressed the wish that such a tour had been available when they began college and noted the assistance the tour will provide to new students. a limitation of this study is the low participation, with no participation from students in some of the groups that other studies have shown may have higher levels of library anxiety (e.g., new students, international students). however, given the indications of positive effects of the virtual tour from our study results and anecdotal statements, we are encouraged that this tool will assist our students in reducing library anxiety, with the result that they will visit and use the library more often, to their benefit. again, although participation was low, these results have also encouraged oru librarians to seek other ways to include avmr and other innovative technologies in our instruction, outreach, and services. the 360 virtual tour of the library is undergoing updates and additions to provide students with disabilities information on access points and accessible restrooms. other projects underway include incorporating avmr in il sessions, the addition of a digital sandbox with various technologies and equipment including a vr station, and the addition of vr equipment in our designated faculty research room for use by university faculty to learn and teach students how to use avmr technologies. the response from students and faculty to these new services has been enthusiastic and encouraging, suggesting that the oru library is positively influencing and supporting the academic work of oru faculty and students.

recommended reading
varnum, kenneth j. beyond reality: augmented, virtual, and mixed reality in the library. chicago: ala editions, 2019.
elliott, christine, marie rose, and jolanda-pieta van arnhem. augmented and virtual reality in libraries. lanham, md: rowman & littlefield, 2018.

endnotes
1 anthony j. onwuegbuzie and qun g. jiao, "information search performance and research achievement: an empirical test of the anxiety-expectation mediation model of library anxiety," journal of the american society for information science & technology 55, no. 1 (2004): 41–54, https://doi.org/10.1002/asi.10342; qun g. jiao and anthony j. onwuegbuzie, "is library anxiety important?," library review 48, no. 6 (1999), https://doi.org/10.1108/00242539910283732; qun g. jiao and anthony j. onwuegbuzie, library anxiety: the role of study habits (paper presented at the annual meeting of the mid-south educational research association (msera), bowling green, kentucky, november 15–17, 2000), http://files.eric.ed.gov/fulltext/ed448781.pdf.
2 constance a. mellon, "library anxiety: a grounded theory and its development," college & research libraries 47, no. 2 (1986), https://doi.org/10.5860/crl_47_02_160; see also constance a. mellon, "library anxiety: a grounded theory and its development," college & research libraries 76, no. 3 (2015), https://doi.org/10.5860/crl.76.3.276.
3 diane mizrachi, "library anxiety," encyclopedia of library and information sciences (boca raton, fl: crc press, 2017): 2782.
4 mellon, "library anxiety" (1986): 163; see also mellon, "library anxiety" (2015): 280.
5 mellon, "library anxiety" (1986): 162; see also mellon, "library anxiety" (2015): 278.
6 alison j. head and michael b.
eisenberg, truth be told: how college students evaluate and use information in the digital age: project information literacy progress report (university of washington's information school, 2010): 3.
7 sharon lee bostick, "the development and validation of the library anxiety scale" (phd diss., wayne state university, 1992); qun g. jiao and anthony j. onwuegbuzie, "antecedents of library anxiety," library quarterly 67, no. 4 (1997): 72, https://doi.org/10.1086/629972.
8 jiao and onwuegbuzie, "antecedents of library anxiety."
9 mellon, "library anxiety" (1986); see also mellon, "library anxiety" (2015); see also constance a. mellon, "attitudes: the forgotten dimension in library instruction," library journal 113, no. 14 (1988).
10 kathleen m. t. collins and robin e. veal, "off-campus adult learners' levels of library anxiety as a predictor of attitudes toward the internet," library & information science research 26, no. 1 (2004): 4, https://doi.org/10.1016/j.lisr.2003.11.002.
11 mizrachi, "library anxiety," 2784.
12 anthony j. onwuegbuzie, "writing a research proposal: the role of library anxiety, statistics anxiety, and composition anxiety," library & information science research 19, no. 1 (1997), https://doi.org/10.1016/s0740-8188(97)90003-7.
13 carol collier kuhlthau, "developing a model of the library search process: cognitive and affective aspects," research quarterly 28, no. (winter 1988), https://www.jstor.org/stable/25828262; carol c. kuhlthau, "inside the search process: information seeking from the user's perspective," journal of the american society for information science 42, no. 5 (1991), https://doi.org/10.1002/(sici)1097-4571(199106)42:5<361::aid-asi6>3.0.co;2-%23.
14 shelley blundell, "documenting the information-seeking experience of remedial undergraduate students," proceedings from the document academy 1, no. 1 (2014), https://doi.org/10.35492/docam/1/1/4.
15 blundell, "documenting the information-seeking experience," 5.
16 blundell, "documenting the information-seeking experience," 6.
17 used by permission of the author. retrieved from http://remedialundergraduateaisp.pbworks.com/w/file/88755941/modelrevised%20-%208.4.jpg.
18 blundell, "documenting the information-seeking experience"; melissa gross and don latham, "attaining information literacy: an investigation of the relationship between skill level, self estimates of skill, and library anxiety," library & information science research 29, no. 3 (2007), https://doi.org/10.1016/j.lisr.2007.04.012; melissa gross and don latham, "undergraduate perceptions of information literacy: defining, attaining, and self-assessing skills," college & research libraries 70, no. 4 (2009), https://doi.org/10.5860/0700336; melissa gross and don latham, "experiences with and perceptions of information: a phenomenographic study of first-year college students," library quarterly 81, no. 2 (2011), https://doi.org/10.1086/658867; melissa gross, "the impact of low-level skills on information-seeking behavior: implications of competency theory for research and practice," reference & user services quarterly (2005), https://www.jstor.org/stable/20864481.
19 mellon, "attitudes," 138; jiao and onwuegbuzie, "antecedents of library anxiety."
20 qun g. jiao and anthony j. onwuegbuzie, "perfectionism and library anxiety among graduate students," journal of academic librarianship 24, no. 5 (1998), https://doi.org/10.1016/s0099-1333(98)90073-8; jiao and onwuegbuzie, "is library anxiety important?"; qun g. jiao and anthony j. onwuegbuzie, "library anxiety among international students" (paper presented at the annual meeting of the mid-south educational research association, point clear, alabama, november 17–19, 1999), https://eric.ed.gov/?id=ed437973; qun g. jiao and anthony j. onwuegbuzie, "self-perception and library anxiety: an empirical study," library review 48, no. 3 (1999), https://doi.org/10.1108/00242539910270312; qun g. jiao and anthony j. onwuegbuzie, "identifying library anxiety through students' learning-modality preferences," library quarterly 69, no. 2 (1999), https://doi.org/10.1086/603054; qun g. jiao and anthony j. onwuegbuzie, library anxiety: the role of study habits; qun g. jiao and anthony j. onwuegbuzie, "library anxiety and characteristic strengths and weaknesses of graduate students' study habits," library review 50, no. 2 (2001), https://doi.org/10.1108/00242530110381118; qun g. jiao and anthony j. onwuegbuzie, "dimensions of library anxiety and social interdependence: implications for library services," library review 51, no. 2 (2002), https://doi.org/10.1108/00242530210418837; qun g. jiao and anthony j. onwuegbuzie, the relationship between library anxiety and reading ability (paper presented at the annual meeting of the mid-south educational research association, chattanooga, tennessee, november 6–8, 2002), https://eric.ed.gov/?id=ed478612; qun g. jiao and anthony j. onwuegbuzie, "reading ability as a predictor of library anxiety," library review 52, no. 4 (2003), https://doi.org/10.1108/00242530310470720; anthony j. onwuegbuzie and vicki l. waytowich, "the relationship between citation errors and library anxiety: an empirical study of doctoral students in education," information processing & management 44, no. 2 (2008), https://doi.org/10.1016/j.ipm.2007.05.007; onwuegbuzie, "writing a research proposal"; anthony j. onwuegbuzie and qun g. jiao, "i'll go to the library later: the relationship between academic procrastination and library anxiety," college & research libraries 61, no. 1 (2000), https://doi.org/10.5860/crl.61.1.45; onwuegbuzie and jiao, "information search performance and research achievement"; anthony j. onwuegbuzie, qun g. jiao, and sharon l. bostick, library anxiety: theory, research, and applications, vol. 1 (lanham, maryland: scarecrow press, 2004).
21 jiao and onwuegbuzie, "identifying library anxiety"; qun g. jiao, anthony j. onwuegbuzie, and art a. lichtenstein, "library anxiety: characteristics of 'at-risk' college students," library & information science research 18, no. 2 (1996), https://doi.org/10.1016/s0740-8188(96)90017-1; nahyun kwon, "a mixed-methods investigation of the relationship between critical thinking and library anxiety among undergraduate students in their information search process," college & research libraries 69, no. 2 (2008), https://doi.org/10.5860/crl.69.2.117; mellon, "attitudes."
22 gaby haddow, "academic library use and student retention: a quantitative analysis," library & information science research 35, no. 2 (2013), https://doi.org/10.1016/j.lisr.2012.12.002; adam murray, ashley ireland, and jana hackathorn, "the value of academic libraries: library services as a predictor of student retention," college & research libraries 77, no. 5 (2016), https://doi.org/10.5860/crl.77.5.631; krista m. soria, "factors predicting the importance of libraries and research activities for undergraduates," journal of academic librarianship 39, no. 6 (2013), https://doi.org/10.1016/j.acalib.2013.08.017; krista m. soria, jan fransen, and shane nackerud, "library use and undergraduate student outcomes: new evidence for students' retention and academic success," portal: libraries and the academy 13, no. 2 (2013), https://doi.org/10.1353/pla.2013.0010; krista m. soria, jan fransen, and shane nackerud, "stacks, serials, search engines, and students' success: first-year undergraduate students' library use, academic achievement, and retention," journal of academic librarianship 40, no. 1 (2014), https://doi.org/10.1016/j.acalib.2013.12.002; krista m. soria, jan fransen, and shane nackerud, "beyond books: the extended academic benefits of library use for first-year college students," college & research libraries 78, no. 1 (2017), https://doi.org/10.5860/crl.78.1.8.
23 jiao, onwuegbuzie, and lichtenstein, "library anxiety," 1.
24 jiao and onwuegbuzie, "identifying library anxiety"; see also bostick, "the development and validation"; barbara fister, julie gilbert, and amy ray fry, "aggregated interdisciplinary databases and the needs of undergraduate researchers," portal: libraries and the academy 8, no. 3 (2008), https://doi.org/10.1353/pla.0.0003; mellon, "library anxiety"; jiao and onwuegbuzie, "perfectionism and library anxiety among graduate students"; jiao and onwuegbuzie, "is library anxiety important?"; jiao and onwuegbuzie, "library anxiety among international students"; jiao and onwuegbuzie, "self-perception and library anxiety: an empirical study"; jiao and onwuegbuzie, "identifying library anxiety through students' learning-modality preferences"; jiao and onwuegbuzie, library anxiety: the role of study habits; jiao and onwuegbuzie, "library anxiety and characteristic strengths and weaknesses of graduate students' study habits"; jiao and onwuegbuzie, "dimensions of library anxiety and social interdependence"; jiao and onwuegbuzie, the relationship between library anxiety and reading ability; jiao and onwuegbuzie, "reading ability as a predictor of library anxiety"; onwuegbuzie and waytowich, "the relationship between citation errors and library anxiety"; onwuegbuzie, "writing a research proposal"; onwuegbuzie and jiao, "i'll go to the library later"; onwuegbuzie and jiao, "information search performance and research achievement"; onwuegbuzie, jiao, and bostick, library anxiety: theory, research, and applications.
25 onwuegbuzie and jiao, "the relationship"; anthony onwuegbuzie and qun g. jiao, "understanding library-anxious graduate students," library review 47, no. 4 (1998), https://doi.org/10.1108/00242539810212812.
26 jiao and onwuegbuzie, "is library anxiety important?"
27 qun g. jiao, anthony j. onwuegbuzie, and sharon l. bostick, "racial differences in library anxiety among graduate students," library review 53, no. 4 (2004), https://doi.org/10.1108/00242530410531857; qun g. jiao, anthony j. onwuegbuzie, and sharon l. bostick, "the relationship between race and library anxiety among graduate students: a replication study," information processing & management 42, no. 3 (2006), https://doi.org/10.1016/j.ipm.2005.03.018.
28 mizrachi, "library anxiety," 2784.
29 anthony j. onwuegbuzie and qun g. jiao, "academic library useage: a comparison of native and non-native english-speaking students," australian library journal 46, no. 3 (1997): 263, https://doi.org/10.1080/00049670.1997.10755807; jiao and onwuegbuzie, "antecedents of library anxiety."
30 jiao and onwuegbuzie, "library anxiety among international students."
31 yunhui lu and denice adkins, "library anxiety among international graduate students," proceedings of the american society for information science and technology 49, no. 1 (2012), https://doi.org/10.1002/meet.14504901319.
32 collins and veal, "off-campus adult."
33 nahyun kwon, anthony j. onwuegbuzie, and linda alexander, "critical thinking disposition and library anxiety: affective domains on the space of information seeking and use in academic libraries," college & research libraries 68, no. 3 (2007): 276, https://doi.org/10.5860/crl.68.3.268.
34 kwon, "a mixed-methods investigation."
35 judy carol bell, "student affect regarding library-based and web-based research before and after an information literacy course," journal of librarianship & information science 43, no. 2 (2011), https://doi.org/10.1177/0961000610383634.
36 jessica platt and tyson l. platt, "library anxiety among undergraduates enrolled in a research methods in psychology course," behavioral & social sciences librarian 32, no. 4 (2013): 248, https://doi.org/10.1080/01639269.2013.841464.
37 rachel a. fleming-may, regina mays, and rachel radom, "'i never had to use the library in high school': a library instruction program for at-risk students," portal: libraries and the academy 15, no. 3 (2015), https://doi.org/10.1353/pla.2015.0038.
38 catherine pellegrino, "does telling them to ask for help work?," reference & user services quarterly 51, no. 3 (2012), https://doi.org/10.5860/rusq.51n3.272.
39 kathy christie anders, stephanie j. graves, and elizabeth german, "using student volunteers in library orientations," practical academic librarianship: the international journal of the sla 6, no. 2 (2016): 17–30, http://hdl.handle.net/1969.1/166249.
40 pamela n. martin and lezlie park, "reference desk consultation assignment: an exploratory study of students' perceptions of reference service," reference & user services quarterly 49, no. 4 (2010), https://doi.org/10.5860/rusq.49n4.333.
41 sarah mcdaniel, "library roles in advancing graduate peer-tutor agency and integrated academic literacies," reference services review 46, no. 2 (2018), https://doi.org/10.1108/rsr-02-2018-0017.
42 elaine m. robbins, "breaking the ice: using non-traditional methods of student involvement to effect [sic] a welcoming college library environment," southeastern librarian 62, no. 1 (2014), https://digitalcommons.kennesaw.edu/seln/vol62/iss1/5.
43 elizabeth diprince et al., "don't panic!," reference & user services quarterly 55, no. 4 (2016), https://doi.org/10.5860/rusq.55n4.283.
44 oral roberts university, "about oru" (2019), https://www.oru.edu/admissions/undergraduate/.
45 oral roberts, our partnership with god [sound recording], eighth world outreach, oral roberts evangelistic association (tulsa, ok: abundant life recordings, 1962).
46 oral roberts, our partnership.
47 margaret m. grubiak, "an architecture for the electronic church: oral roberts university in tulsa, oklahoma," technology and culture 57, no. 2 (2016), https://doi.org/10.1353/tech.2016.0066.
48 stephanie hill, "oru receives innovation award," press release, may 2, 2017, http://www.oru.edu/news/oru_news/20170502-glc-innovation-award.php?locale=en.
49 hill, "oru receives."
50 bostick, "the development and validation," 160.
51 blundell, "documenting the information-seeking experience," 263.
52 mizrachi, "library anxiety," 2784.
53 collins and veal, "off-campus adult," 7.
facilitating research consultations using cloud services: experiences, preferences, and best practices
rebecca zuege kuglitsch, natalia tingle, and alexander watkins
information technology and libraries | march 2017 | https://doi.org/10.6017/ital.v36i1.8923

rebecca zuege kuglitsch (rebecca.kuglitsch@colorado.edu) is head, gemmill library of engineering, mathematics & physics, university of colorado boulder. natalia tingle (natalia.tingle@colorado.edu) is business collections & reference librarian, university of colorado boulder. alexander watkins (alexander.watkins@colorado.edu) is art & architecture librarian, university of colorado boulder.

abstract
the increasing complexity of the information ecosystem means that research consultations are increasingly important to meeting library users' needs. yet librarians struggle to balance escalating demands on their time. how can we embrace this expanded role and maintain accessibility to users while balancing competing demands on our time? one tool that allows us to better navigate this balance is google appointment calendar, part of google apps for education. it makes it easier than ever for students to book a consultation with a librarian, while at the same time allowing the librarian to better control their schedule. our experience suggests that both students and librarians felt it was a useful, efficient system.

introduction
the growing complexity of the information ecosystem means that research consultations are increasingly important to meeting library users' needs. although reference interactions in academic libraries have declined overall, in-depth research consultations have not followed that trend.1 these research consultations represent an increasingly large proportion of academic librarians' reference interactions, and offer important opportunities to follow up on information literacy instruction, support student academic success, and relieve library anxiety. the library literature has demonstrated a need for and appreciation of these services.2 moreover, students value face-to-face consultations because they provide an opportunity to talk through complex problems and questions while providing affective benefits such as relationship building and reassurance.3 it is evident that students seek out and value these services. but even as these services become increasingly important, librarians struggle to balance escalating demands on their time. how can we embrace this expanded role and maintain accessibility to users while managing competing priorities? we found little guidance in the literature to identify the most efficient technological tools to offer these services to undergraduates, so we began to explore options. one tool that allows us to better navigate this shifting landscape is google appointment calendar, part of google apps for education. it makes it easier for students to book a consultation with a librarian, while at the same time allowing the librarian to better control their schedule;
consequently, it is being adopted by many librarians at the university of colorado boulder. there are several other options available for librarians interested in calendar applications, such as youcanbook.me.4 however, on campuses using google apps for education, it may be easier to use a tool students are already familiar with and commonly use as part of their daily academic routines. moreover, the integration with apps for education solves some of the problems hess noted in the public version of google calendar appointments (which is also no longer available), such as appointments booked without identifying information and the extra step of logging in just for an appointment.5 because students are often already logged in due to using google apps for word processing, group work, and more, there is no extra step to log in for a simple appointment. our exploration of this tool suggests that it is helpful to librarians, but that it can be of benefit to students, too. research has proposed that students may hesitate to ask questions due to library anxiety. would scheduling an appointment using a calendaring system be less intimidating than emailing a librarian directly, for example? we set out to apply this technology in an environment of changing student preferences and expectations, explore how students received it, and establish effective practices for using it in an academic setting. since we are liaisons to science, social science, and humanities subject areas, we were able to work with a wide spread of undergraduate students in our exploration to see what might be most effective for us, and also for students from a variety of backgrounds.

why google calendar
we selected appointment booking via google calendar because of its ease of use and because the university of colorado boulder has google apps for education. this means that every student will have a google id and the option of using google calendar as part of their normal routine. in december 2012, google discontinued appointment calendars for general users and limited claimable appointment slots to google apps for education. for institutions that do not subscribe, it may be worth investigating third-party google calendar apps, some of which are free or freemium, such as calendly (https://calendly.com/), or springshare's similar subscription service, libcal (https://www.springshare.com/libcal/).

setting up google calendar
one of the benefits of google calendar is its ease of use. starting to set up the calendar for appointment slots is as simple as creating a new google calendar event and selecting appointment slots as the type of event. next, you can give your appointment slots a name that corresponds with the language your institution uses for research consultations and schedule them for the desired length of time. it is possible to schedule blocks of appointments that google will automatically break into shorter appointments of predetermined amounts of time. the authors created appointments lasting 30 minutes, 60 minutes, or a mix of both, depending on the expectations of our disciplines. it is also possible to create several simultaneous appointment slots, if you would like to accommodate small groups. as well as indicating time, each appointment also has a space to indicate location, particularly useful for librarians who might work in several branches or combine office hours in academic buildings with in-library office consultations. once the events are named and saved, the calendar can be shared.
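the appointment slots themselves are created through the calendar interface as just described; no code is required. that said, a librarian who wants to pre-populate many weeks of slots at once could approximate the same data entry with the google calendar api by generating individual bookable events. the following is a minimal sketch of that idea rather than part of the workflow described in this article: it assumes the google-api-python-client and google-auth libraries, oauth credentials already saved in token.json from a prior authorization flow, and placeholder values for the calendar id, location, dates, and time zone.

```python
# a minimal sketch, not the authors' workflow: pre-populate a week of 30-minute
# "available consultation" events with the google calendar api. assumes
# google-api-python-client and google-auth are installed and that token.json
# holds oauth credentials; calendar id, location, dates, and time zone are
# placeholder values.
from datetime import datetime, timedelta

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

creds = Credentials.from_authorized_user_file("token.json")
service = build("calendar", "v3", credentials=creds)

slot_length = timedelta(minutes=30)
first_slot = datetime(2017, 3, 6, 13, 0)   # monday, 1:00 p.m. local time

for day in range(5):                        # monday through friday
    slot_start = first_slot + timedelta(days=day)
    for _ in range(4):                      # four consecutive slots each afternoon
        event = {
            "summary": "available consultation",
            "location": "norlin library, room e206",   # hypothetical location
            "start": {"dateTime": slot_start.isoformat(),
                      "timeZone": "America/Denver"},
            "end": {"dateTime": (slot_start + slot_length).isoformat(),
                    "timeZone": "America/Denver"},
        }
        service.events().insert(calendarId="primary", body=event).execute()
        slot_start += slot_length
```

because these are ordinary events rather than true appointment slots, the claiming behavior described below (a slot disappearing once a student books it) still comes from the appointment-slot feature itself; the sketch only saves repetitive data entry.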
figure 1. create a new event, selecting 'appointment slots'.

appointment calendars are given a unique shareable url to direct users to available appointments; however, these urls are necessarily long and complicated, so we recommend using a link shortener. to obtain the very long url for an appointment calendar, click on 'edit details' in an appointment event. from there, it is possible to copy the link and use a link shortener to make a brief, understandable link.

figure 2. obtain the shareable link.

when a student uses the link to make an appointment, both the librarian and the student receive an email with the student's login name, email, appointment time, and other details. the slot immediately appears as taken on the calendar, so it is no longer available for other students, reducing confusion and double booking. receiving the student's email allows the librarian to initiate the reference interview and establish expectations.

figure 3. google calendar showing a variety of available appointments.

student impressions
we received positive feedback about the appointment calendars from students. students commented:
● "i like the ability to see all of the possible openings."
● "i already bookmarked that bit.ly, so you'll probably hear from me" (which we did, shortly thereafter).
● "i like to be able to 'schedule' a consultation, not request one. it seems more useful and immediate."
we kept track of how many students who made calendar appointments over two semesters kept them, and sent a short, informal survey to students who made appointments. no students who made a calendar appointment failed to attend their consultation. though our survey does not permit large-scale generalizations due to a very low response rate (4) and a small sample size (15), all of the students who responded and used the calendar found the experience of booking an appointment that way to be easy, convenient, and unintimidating. everyone who used the calendar indicated that they would prefer to use it again, and about half of the respondents who set up their appointments via email told us that they would prefer to book a consultation through an appointment calendar in the future. our anecdotal evidence in succeeding semesters aligns with this perception. we found that using appointment calendars can have many benefits for students:
● they can reduce student anxiety from having to compose and send an email.
● booking appointments can take less of their time. they book immediately without back-and-forth emailing. this also means there's no time to rethink the appointment and either never send the email or back out later.
● the appointment is placed on their calendar, meaning they automatically have a built-in reminder and don't need to search through their email to find the date and time of their appointment.
● since the appointment calendars eliminate back-and-forth scheduling and reduce email fatigue, students may be more willing to use email to discuss their topic and/or question with the librarian.

librarian impressions
our experience has been equally positive. we found that using the calendars radically streamlines the typical back-and-forth email exchanges for setting appointments. we emailed each student to confirm the appointment, but this single email is still a significant reduction of claim on the librarian's attention, from a minimum of three emails to schedule an appointment (which often realistically becomes five or more when negotiating a time) to two. additionally, librarians can put appointment slots in between meetings and other times when they might only have a spare hour, which are often too tedious to list when emailing. using appointment calendars lets librarians efficiently use their time even when it is fragmented. as well as facilitating efficient use of small amounts of time, appointment calendars also allow librarians to gently create boundaries. rather than having to deny appointments requested for late nights or weekends, students are guided to viable times. while the use of google calendar is entirely voluntary at the university of colorado boulder, we presented the tool at several reference librarian meetings with success, and several other librarians have happily adopted the tool. one librarian who adopted the tool said: "sending a student a calendar that they can use to request a meeting eliminates the twelve messages back and forth on when to schedule a meeting. i also like that it puts the meeting on both our calendars, reducing the number of no-shows."

best practices
our experiences and verbal feedback from students and librarians provided a foundation to develop best practices to minimize both librarian and student confusion. for students, confusion often centered around accessing the calendar, identifying which time slots were available, and identifying acceptable locations for appointments. the following best practices can help solve these difficulties.

use a link shortener and a consistent naming convention so the links are similar for multiple librarians. using a link shortener makes it easy for students to jot down the calendar url, either to manually enter into a browser later or to quickly get to the link and bookmark it. this makes it easy for students to file the link and return to it at point of need. using a consistent naming
using a term like “available consultation” made it clear to students that the appointments were not already booked. google calendar automatically makes booked appointments unavailable, eliminating the opposite frustration. carefully consider the location in the bookable appointment form. google calendar allows librarians to enter or leave empty the location. if the field is left empty, users can specify a location, and students often filled in a location when none was indicated. if a librarian is not mobile, or is available in certain places only at certain times, it is key to identify a location. for example, in our study, one librarian held weekly office hours in two academic buildings; it was particularly important to identify which times the librarian was available in the library versus the academic buildings. on the other hand, it may also make sense not to designate a location. another of the authors, serving a population that used the main library, one branch library, and research area of the campus with no onsite library services, chose not to enter any location in order to accommodate the extremely dispersed population. users frequently indicated in which location they would be willing to meet, an option the librarian wanted to support in order to underscore the availability of services wherever users were located on campus. schedule two weeks of availability. we found that students could almost always find a time that worked for them with two weeks of available appointments. moreover, other than recurring office hours, it was difficult for librarians to predict their schedule further into the future than a few weeks. librarian concerns centered around keeping calendars synchronized, providing enough lead time for users to book appointments, and publicizing the service. we found several best practices that eased these concerns. designate a day each week to update hours and clear conflicts on the calendar. if google calendar is not the primary calendaring software for the library, it can be challenging to synchronize calendars. google calendar sends a calendar invitation to the librarian when an appointment is claimed, which they can accept on their primary calendaring system, but conflicts that arise on the primary calendaring system are not automatically sent to google calendar. by selecting a day and habitually updating the google calendar and quickly checking for conflicts that have arisen with unclaimed slots, librarians can avoid forgetting to add slots or remove those that conflict with other late-arising obligations. advertise the link on the library web site, give out the calendar link during class sessions and give it to professors to embed in course management systems. while appointment calendars still information technology and libraries | march 2017 35 benefit librarian workflows without advertising, students need easy access to the calendar. for maximum user uptake, it is important to put the calendar link anywhere a librarian’s contact information can be found. we found it helpful to promote the link in classes, and that it was particularly effective when professors agreed to place the link in the class web site. this positions library research assistance next to assignments when they are given out and drafts when they are returned--hopefully reminding students that the library is available for assistance at moments in which they are most likely to seek it. 
reflections and conclusions
our experiences support the idea that online appointment calendars are appreciated by students, streamline work for librarians, and are easily adopted by both parties. more use of this technology, whether via google apps for education or another service, can be mutually beneficial to librarians and students. students using the calendar indicated that it was not more intimidating than emailing a librarian, and by removing the waiting period for a response, a calendar can prevent student distraction or students persuading themselves that they actually do not need help in the interim. by providing a calendar where students can quickly and simply book an appointment with a librarian for research assistance, librarians can support students seeking assistance, and thus ultimately bolster student success and increase the library's relevance.

references
1. naomi lederer and louise mort feldmann, "interactions: a study of office reference statistics," evidence based library and information practice 7, no. 2 (2012): 5–19.
2. ramirose attebury, nancy sprague, and nancy j. young, "a decade of personalized research assistance," reference services review 37, no. 2 (2009): 207–20, https://doi.org/10.1108/00907320910957233; trina j. magi and patricia e. mardeusz, "what students need from reference librarians: exploring the complexity of the individual consultation," college & research libraries news 74, no. 6 (2013): 288–91.
3. trina j. magi and patricia e. mardeusz, "why some students continue to value individual, face-to-face research consultations in a technology-rich world," college & research libraries 74, no. 6 (november 1, 2013): 605–18, https://doi.org/10.5860/crl12-363.
4. amanda nichols hess, "scheduling research consultations with youcanbook.me: low effort, high yield," college & research libraries news 75, no. 9 (october 1, 2014): 510–13.
5. hess, "scheduling research consultations with youcanbook.me: low effort, high yield," 511.

beyond information architecture: a systems integration approach to web-site design
krisellen maloney and paul j. bracke
information technology and libraries | december 2004 (vol. 23, no. 4)

krisellen maloney (maloneyk@u.library.arizona.edu) is director of technology at the university of arizona libraries, tucson. paul j. bracke (paul@ahsl.arizona.edu) is head of systems and networking at the arizona health sciences library, tucson.

users' needs and expectations regarding access to information have fundamentally changed, creating a disconnect between how users expect to use a library web site and how the site was designed. at the same time, library technical infrastructures include legacy systems that were not designed for the web environment. the authors propose a framework that combines elements of information architecture with approaches to incremental system design and implementation. the framework allows for the development of a web site that is responsive to changing user needs, while recognizing the need for libraries to adopt a cost-effective approach to implementation and maintenance.

the web has become the primary mode of information seeking and access for users of academic libraries. the rapid acceptance of web technologies is due, in part, to the ubiquity of the web browser, which presents a user interface that is recognized and understood by a broad range of users.
as libraries increase the amount of content and broaden the range of services available through their web sites, it is becoming evident that it will take more than a well-designed user interface to completely support users' information-seeking and access needs. the underlying technical infrastructure of the web site must also be organized to logically support the users' tasks. library technical infrastructures, largely designed to support traditional library processes, are being adapted to provide web access. as part of this adaptation process, they are not necessarily being reorganized to meet the changing expectations of web-savvy users, particularly younger users who are not familiar with traditional library organization methods such as the card catalog, print indexes, or other legacy tools. libraries must harness the power of the highly structured information systems that have long been a part of libraries and integrate these systems in new ways to support users' goals and objectives. part of this challenge will be answered by the development of new systems and technical standards, but these are only a partial solution to the problem. an important part of making library systems and web sites function as powerful discovery tools is to modernize the systems that provide existing services and content to support the changing needs and expectations of the user. emerging concepts of information architecture (ia) describe the system requirements from the user perspective but do not provide a mechanism to conceptually integrate existing functions and content, or to inform the requirements necessary to modernize and integrate the current system architecture. the authors propose a framework for approaching a comprehensive web-site implementation that combines components of ia and system modernization that have been successful in other industries. within this framework, those components are tailored for the unique aspects of information provision that characterize a library. the proposed framework expands the concept of ia to include functional and content requirements for the web site. this expansion identifies points within the conceptual and physical design where user requirements are constrained by the existing infrastructure. identification of these constraints begins an iterative design process in which some user requirements inform changes to the underlying system architecture. conversely, when the required changes to the underlying system architecture cannot be achieved, the constraints inform the conceptual design of the web site. the iterative nature of this approach acknowledges the usefulness of much of the existing infrastructure but provides an incremental approach to modernizing installed systems. this framework describes aspects of the conceptual- and physical-design elements that must be considered together and balanced to produce a web site that supports the goals and objectives of the user but is cost-effective and practical to implement.

information architecture and the problem of libraries
ia is both a characteristic of a web site and an emerging discipline. a number of authors have attempted to develop a formal definition of ia. wodtke presents a simple task-based definition, stating that an information architect "creates a blueprint for how to organize the web site so that it will meet all (business, end user) these needs."1 rosenfeld and morville present a four-part definition in which two parts focus on the practice, and two parts define ia as characteristic.
the first characteristic defines ia as a combination of "organization, labeling, and navigation schemes" while the second describes it as "the structural design of an information space to facilitate task completion and intuitive access to content."2 there is general agreement that ia provides a specification of the web site from the perspective of the user. the specification usually describes the organization, navigational elements, and labeling required to completely structure a user's web-site experience. ia is not synonymous with web-site design, but rather provides the conceptual foundation upon which a presentation design is based. web-site design adds presentation and graphical elements to ia to create the user experience.

library web sites provide a display platform by which library content and services can be accessed through a common user interface. most of the tools and services have been available for decades and, in response to user demand, are increasingly being made web-accessible in digital formats (virtual reference, full-text databases). despite this new access medium and format, the conceptual design of the underlying systems has not changed much. the library technical infrastructure is made up of many loosely coupled systems optimized to perform a single function or to support the work of a library department. library web sites do not present a sufficiently unified interface design or level of technical integration to match current users' mental models of information seeking and access.3 the systems have not been integrated to support users' overarching goals or meet the expectation of seamless access that they have developed when using other web sites (such as google or amazon). in many cases, users are still expected to understand aspects of the library that are now obsolete (card catalogs) in order to navigate the library's web site. for example, the process of finding a journal article using a typical library web site is based on a print paradigm and has changed little despite the advent of online discovery tools. in a print environment, users first looked at an index to identify an article of interest, then wrote down the citation, went to the card catalog, and there looked up the journal containing the article. if the library owned the journal, the user would then write down the call number and go to the shelves to find the article. this process has not necessarily changed much for many libraries, even though indexes, card catalogs, and journals are often available online. even more confusing is that the end result of some search processes within a library web site is not necessarily content, but a metadata representation of content that must be entered into another search box. although the first search is representative of the search of a traditional index and the second search is representative of the search of the card catalog, many of our users have no mental model for this multistep search process.
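the multistep lookup just described is what link resolvers were later built to automate: the citation identified in the index is encoded in a url and handed to a resolver that checks local holdings. as a rough illustration only (the resolver address and the sample citation below are invented, and the rft.* key names follow the openurl 1.0 scholarly journal format), such a link might be generated like this:

```python
# an illustrative sketch only: building an openurl 1.0 (kev) link for a journal
# article citation. the resolver base url and the sample citation are invented;
# the key names follow the standard journal metadata format.
from urllib.parse import urlencode

citation = {
    "url_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.genre": "article",
    "rft.atitle": "an example article title",
    "rft.jtitle": "an example journal",
    "rft.volume": "23",
    "rft.issue": "4",
    "rft.spage": "145",
    "rft.issn": "1234-5678",                # made-up issn for illustration
}

resolver_base = "https://resolver.example.edu/openurl"   # hypothetical resolver
print(resolver_base + "?" + urlencode(citation))
```

the point of the illustration is that the citation metadata travels with the request, so the user no longer has to re-key the journal title into the catalog by hand.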
users accustomed to the simple keyword search available through internet search engines may have great difficulty in understanding the need for the many steps involved in library use. there is an expectation that search systems and online content will be linked, regardless of the economic, legal, and technical factors that make these links difficult. while linking options in vendor databases and openurl resolvers have begun to simplify the electronic version of the process by automating some of the steps, the multistep process is still valid in many instances in most libraries. it is clear that library web sites must undergo a fundamental change in order to be responsive to the needs of the user. because library web sites appear to be similar to conventional web sites, it is tempting to adopt a general approach to ia to address users' needs. there are, however, several areas in which the general approach to ia does not adequately support the design needs for library web sites.

generalized ia approaches, such as those provided by rosenfeld and morville, do not provide adequate guidance regarding the organization and display of content from external sources. there is an unstated assumption that external sources will provide information in the format specified by the web-site architect. ia approaches suggest methods to completely describe the user experience, from the time a user first accesses a site to the point at which a user task is complete, regardless of the origin of the content or service accessed. for example, the content from each of amazon.com's commercial partners is packaged to operate like a part of the amazon.com site. in contrast, libraries often only have control of a user's experience up to the point at which they leave a library's servers. libraries guide users not only to local services and digitized collections, but to databases, journals, and more that are licensed from external sources and whose appearance is controlled by those sources. even when using a technical standard such as z39.50 to provide a local look and feel to remote resources, libraries do not necessarily have full control over the data format or elements of the content that is returned. this lack of local control over content is a limitation to libraries adopting common definitions of ia.

another design area that is not well supported by generalized approaches to ia is the integration of previously installed systems, such as library catalogs. these legacy systems provide important services that represent decades of development and collaboration, and are essential to the future of libraries. for example, libraries provide access to unique resources and systems ranging from online catalogs to abstracting and indexing databases to interlibrary loan (ill) networks. libraries are using web technologies to provide new access methods to library content and services. these technologies provide a thin veneer on systems that function in a manner unfamiliar to many users. the challenge then becomes to change what lies beneath the surface, the underlying functionality of the site, to support the needs of the user. using a generalized approach to ia, as applied in other settings, libraries would assess the needs of the user and develop a new, complete system that supports those needs. such an approach ignores the extensive, existing infrastructure of legacy systems in libraries that is still useful and that serves purposes beyond the user's web interface. what is
what is 146 information technology and libraries i december 2004 reproduced with permission of the copyright owner. further reproduction prohibited without permission. needed is a standard reference model for library services that provides a framework for access to services and content. this is a long-term goal that requires cooperation and agreement among libraries, and that would allow legacy systems to be repackaged in ways that are more flexible, meet changing user needs, and can be integrated into changing technology environments. because there are currently no such reference models, librarians need to develop other approaches to integrate existing legacy systems into a modernized web site. i extending the ia framework in this paper, the general definition of ia that has been proposed by several authors has been extended to incorporate the additional constraints that characterize library web sites.4 extended information architecture (eia) is the first half of the framework, and provides a complete conceptual design of the web site from the users' perspective. figure 1 depicts the elements and relationships within eia. the coordinating structure provides an overarching framework for the integration of the multiple service elements that provide much of the underlying functionality of the web site. the relationship between the coordinating structure and the service elements is iterative, with service elements constraining the coordinating structure and the coordinating structure informing the design of the service elements. the coordinating structure the coordinating structure contains many of the design elements that are found in generalized approaches to ia, including the organization, navigational structure, and labeling. these are the elements of a web site that, in concert, define the structure of the user interface without specifying the functionality and content underlying that interface. the framework emphasizes aspects of the generalized approaches that are most relevant to libraries and places them in relation to the service elements that specify the content and functionality of the site. the first element of the coordinating structure is the organization of the web site. organization refers to the logical groupings of the content and services that are available to the user. these groupings are not necessarily representative of physical-system implementations, but may be taskor subject-based instead. for example, many academic library web sites have primary groupings that include information resources, services, and user guides. although the information resources may include information from a range of systems (for instance, the catalog, abstracting and indexing databases, full-text databases, locally-developed exhibits), the logical grouping of information resources unifies the concept for the user. a site's organization scheme will often serve as the foundation for the primary navigational choices on a site's main menu or primary navigational bar. another component of the coordinating structure is the navigational structure of the site. navigational structures define the relationships between content and service elements of a site, and between groupings in the site's organization. these structures also include search tools and other link-management tools that help users locate needed content and services. there are usually two types of relationships that form a navigational structure. 
first is the definition of a global relationship scheme that outlines the primary navigational structure of the site. these schemes often define relationships between sections of a site's organization, but may also provide access to key pieces of functionality from any point within a site. in addition to the overarching global relationship scheme, there are often several locally or functionally defined relationship schemes that are used throughout the site. these local relationship schemes are usually located within a service or content grouping and provide logical connections within their defined grouping. both sets of relationships are designed to support a task and provide pathways for the user to move among the various elements of the site. other relationship schemes may be topic oriented, allowing the user to move easily among similar content sources. these logical relationships are later implemented within a user interface as tools such as menus, navigation bars, and navigation tabs when combined with labels and a visual design. customization and personalization are navigational structures that have gained a fair amount of attention in the library literature. both strategies allow a web site to be displayed differently, based on user characteristics. customization allows the user to create the relationships most suitable for his or her needs. this strategy has been explored by a number of libraries, although there is little convincing evidence that users implement such strategies in an intense or repeated manner.5 personalization allows a system designer to bring together a set of pages in a relationship that is meaningful for a user or a user group.

labels, the third element of the coordinating structure, provide signposts that communicate an integrated view of a web site's design to those who use it. it is important to define a labeling system that consistently and clearly communicates the meaning of the site to the user. accordingly, the labels should be constructed in the user's language, not the librarian's. for example, a user may not understand that an abstracting and indexing database will provide them with information regarding journal articles that are relevant to a topic of interest. in that case, the label "find an article" is more useful than "indexes."
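the "find an article" versus "indexes" example amounts to maintaining a mapping from system-oriented vocabulary to labels written in the user's language. the fragment below is a minimal sketch of that idea in python; the label pairs and the function name are invented for illustration and would, in practice, come from user testing rather than from this list.

# librarian or system vocabulary mapped to labels in the user's language.
# the pairs below are illustrative assumptions, not a recommended vocabulary.
user_labels = {
    "indexes": "find an article",
    "opac": "search the catalog",
    "ill": "request an item we don't own",
    "e-reserves": "course readings",
}

def label_for(system_term: str) -> str:
    """return the user-facing label, falling back to the raw term if unmapped."""
    return user_labels.get(system_term.lower(), system_term)

# rendering a navigation menu with consistent, user-language labels
for term in ["opac", "indexes", "ill"]:
    print(label_for(term))

keeping such a mapping in one place is one way to enforce the consistency of labels across menus, headings, and navigation bars.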
figure 1. an extended information architecture for developing a conceptual design of library web sites. (the figure pairs the coordinating structure, comprising organization, navigational structure, and labeling, with the service elements, comprising functional requirements, content requirements, content specifications, and functional specifications.)

labels are used to describe individual service or content units, but may also be used as headings to provide structural elements that augment the navigational scheme. the consistent use of labels as headings within the site not only increases user understanding of the site, but may also be explicitly constructed to support user tasks. an example of labeling to support tasks can be seen on the university libraries web site of the university of louisville, where, under the main heading for articles, the first subheading is "step 1: search article databases" and the second subheading is "step 2: search (the catalog) by journal title."

service elements

service elements are the second major component of extended information architecture, and represent the content and functionality of the web site. in this framework, the service elements serve a dual purpose: their definition involves specifying both the ideal requirements for functionality and content and the specifications of what is currently available. the definition process can then be used to identify points in the web site where new functions and content need to be added, or where existing functionality must be modernized. these additions and modifications may be achievable immediately, but in many cases an incremental plan for change may need to be developed. the service-element requirements, labeled as functional requirements and content requirements in figure 1, express the users' needs and expectations for the functional or content elements of the web site. the purpose of the requirements definitions is to describe the service elements that are necessary to allow a user to meet his or her goals or objectives in using the site. these requirements are a representation of the ideal composition of a web site, and inform not only the immediate implementation of the site but also the development of future systems and the modernization of existing systems. it is also important to note that the requirements should be developed to express user needs, not a particular implementation option. for example, it might be tempting to specify the implementation of a particular vendor's openurl resolver. this does not, however, describe how the system would function ideally from a user perspective. instead, an appropriate requirement would be that users should be able to link to full text from all citations in an abstracting and indexing database.
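as a concrete illustration of how such a full-text-linking requirement might eventually be exercised, the sketch below builds an openurl-style query for a citation. this is a hedged example only: the resolver base url, the citation dictionary, and the function name are assumptions made for the sketch, and a production resolver would accept a richer set of openurl fields.

from urllib.parse import urlencode

# hypothetical base url of the library's openurl link resolver
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def full_text_link(citation: dict) -> str:
    """build an openurl-style query string from a simple citation record.

    the keys follow common openurl conventions (genre, issn, date, volume,
    issue, spage, atitle); the citation dict itself is an assumption made
    for this illustration.
    """
    params = {
        "genre": "article",
        "issn": citation.get("issn", ""),
        "date": citation.get("year", ""),
        "volume": citation.get("volume", ""),
        "issue": citation.get("issue", ""),
        "spage": citation.get("start_page", ""),
        "atitle": citation.get("article_title", ""),
    }
    # drop empty fields so the resolver only sees what the database supplied
    params = {key: value for key, value in params.items() if value}
    return f"{RESOLVER_BASE}?{urlencode(params)}"

# example: a citation as it might be exported from an abstracting and indexing database
print(full_text_link({
    "issn": "1234-5678",
    "year": "2004",
    "volume": "23",
    "issue": "4",
    "start_page": "146",
    "article_title": "an example article title",
}))

the requirement itself remains implementation-neutral; whichever resolver is chosen simply has to accept a link of this general shape from every citation the user encounters.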
content requirements describe, more specifically, the content that is necessary to meet the users' goals and objectives. access to content is often the primary emphasis of a library web site, and the content requirements describe the intellectual content that should be accessible through the web site. examples of content that might be required are article citations, full-text articles, and multimedia objects. normally, these requirements will be closely connected with library-wide collection-development policies and priorities, and should be driven by subject specialists rather than systems personnel. these requirements inform the development of systems to meet the needs of the users. the content specifications describe the content that is available within the current systems. there are many reasons why content requirements and content specifications do not match, including the inability or choice of a library to acquire a particular piece, the unavailability of specified content, or technical incompatibilities between content and the library's infrastructure.

although content is sometimes viewed as the core component of a library web site, there is also a great deal of additional functionality that is provided to users. the functional requirements describe the users' needs and expectations of that functionality in the context of completing tasks on the web site. for example, ill forms found on many sites are easy for the user to fill out, although the most effective interface to ill for the user might not involve a form-based user interface at all. it might be a direct system-to-system interface from an openurl resolver to the ill software in which all citation data are transmitted for the user. this requirement is not necessarily obvious when considering ill in isolation, but is evident when considering it in the larger context of the users' goals and objectives for the entire web site. the functional specifications describe the functions as they exist in the installed base of systems and expose the functionality that is available to the user. when the specifications do not match the requirements, the users' expectations regarding the system will not be fully met. the economic and technical limitations of system implementation and modernization often reduce the speed at which the large base of previously installed systems can be modified to meet users' changing needs and expectations. it is thus critical to identify gaps between existing systems and desired systems and to discover areas where a web site will have characteristics that are not completely aligned with what the user needs or expects. when the service-element requirements do not match the service-element specifications of existing systems, an iterative design process begins. this process will be intertwined with the evaluation of the system architecture. gaps that can be addressed immediately should be incorporated into an implementation plan for the new web site. longer-term migration or development plans can be developed to fill gaps that cannot be addressed immediately. it is also important to acknowledge that developing and meeting service-element requirements is an iterative process. requirements will need to be revisited over time as user needs change, and requirements that are met now become the specifications that are evaluated in the future.
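the comparison of service-element requirements with service-element specifications described above is, at its core, a gap analysis. the sketch below is a minimal illustration of that bookkeeping; the requirement and specification labels are invented for the example, and a real inventory would be far more detailed and would distinguish functional from content elements.

# requirements describe what users need; specifications describe what the
# installed systems currently provide. the labels are illustrative only.
requirements = {
    "link citations to full text",
    "renew loans online",
    "request ill directly from a citation",
    "search catalog and databases together",
}
specifications = {
    "renew loans online",
    "link citations to full text",
}

gaps = requirements - specifications      # needs not yet met by any system
surplus = specifications - requirements   # functions no longer asked for

for need in sorted(gaps):
    print("gap to plan for:", need)
for extra in sorted(surplus):
    print("candidate for retirement or repackaging:", extra)

in the terms used above, items in the gap set feed either the immediate implementation plan or a longer-term migration plan, and requirements that are satisfied now become the specifications evaluated in the next iteration.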
interrelationships within eia

when the service-element requirements cannot be used to modify the service-element specifications, the service elements constrain the design of the web site and influence the design of the coordinating structure. the upward arrow in figure 1 labeled "constrains" indicates that the user experience is constrained by the specifications of content or functional elements that are not currently changeable. in such situations, the coordinating structure must be designed to provide additional context for the user to understand the purpose of the existing service elements. this explanatory role can be seen in the implementation of many web sites as formal parts of the organizational structure designed to explain the idiosyncrasies of the web site to the user. for example, many academic library web sites have tutorials, faqs, or sections labeled "how do i . . . ?" that provide tips on using aspects of the site that are not always evident to users. it is necessary to acknowledge the usefulness of the explanatory role of the coordinating structure in the iterative and incremental processes of web-site design. just as bibliographic instruction and adequate signage have allowed the user to navigate aspects of the traditional library that were not intuitive, the coordinating structure provides the conceptual signposts and other guidance required for users to navigate the web site effectively. at the same time, it is important to realize that the explanatory role would not be necessary if the web site's architecture and design were intuitive to the user. as the design of the service elements changes to accommodate the larger goals of the user, the explanatory function of the coordinating structure will be diminished. the main goal of library web-site design should be to reduce the explanatory role of the coordinating structure and to develop service elements that seamlessly support the goals and objectives of the user. until all service elements have been modernized to meet the needs of the user, the conceptual design of web sites will represent a compromise between what users require and what it is possible for users to do within the current legacy information infrastructure.

system architecture

while the conceptual design of the web site describes the needs of the user apart from the technical details of the implementation, the system architecture is the description of the system as it exists. in the case of library web sites, the system architecture is not limited to the functionality and data on the library's web server. instead, it also includes all of the core infrastructure, individual systems, and data access and storage mechanisms that make up the blueprint of the web site's back end as it has been built. the individual systems in the architecture may include locally controlled ones (for instance, an online catalog), but will also include remote systems such as abstracting and indexing databases mounted by a vendor. a definition of the design of the existing system plays a key role in the evolutionary specification of the system because it provides developers with a greater understanding of the possibilities and constraints of the existing infrastructure. in describing a system architecture, several formal representations can be used that capture various aspects of the system's capabilities at different levels of granularity. these include module views that provide static specifications of individual components; component and connector views that provide dynamic views of processes; and deployment views that incorporate hardware elements.7 the selection of representations is beyond the scope of this paper. typical elements of a system architecture can be seen in figure 2. for this paper, three classes of components are considered, although more may be introduced if applicable locally. the core-infrastructure components are fundamental services and information that support one or more systems or subsystems. in a typical library environment this includes authentication services, web platforms, and the network.
in some library environments, external units may maintain some or all of these components. for example, many college campuses maintain an authentication infrastructure in the campus computing office. overall, core infrastructure provides the glue for tying together the many applications that libraries attempt to integrate in their web sites. the system architecture should include details regarding the standards and interfaces that are used within the library technical infrastructure.

many of the applications in the library environment are off-the-shelf components that have been developed by external vendors. these off-the-shelf components may include the catalog, ill modules, electronic course reserves, and virtual-reference systems. although individual libraries may have some control over configuration options in these applications, they are likely to have little influence over the basic functionality or data formats provided by these systems. core functionality tends to change based on the demands of many libraries looking for similar functionality. despite the lack of functional control over these systems, components developed by external vendors may provide standards-based system interfaces to their functionality. these usually take the form of industry-supported standards or vendor-supplied application programming interfaces and give libraries some flexibility in working with these components. explicit descriptions of the available standard and proprietary interfaces should be included within the system architecture.

other applications may have been developed within the library and so can be changed more easily. examples of locally developed applications typically include subject pages, information about the library, and digital web exhibits and collections. although local development does provide more control over the appearance and functionality of a piece of software, it is not without problems. local development is often conducted using a bricolage approach, solving specific problems singly, without giving consideration to the larger networks of systems in which the solutions operate. when such approaches do not take into account larger issues of systems architecture, opportunities to solve a broader range of problems may be missed and subsequent repackaging of these solutions may be limited or impossible. libraries also frequently have a limited number of programmers, a shortage often remedied by pulling librarians or staff from other duties. while this certainly can allow libraries to meet some user needs, the lack of software-engineering skills in libraries may result in local solutions that are inflexible and that do not support standards for data storage or interchange. because the internal design of these applications is accessible and modifiable, the system architecture should include more extensive descriptions of the internal features and relationships that they contain. although this will not completely alleviate the problems of software maintenance, it will provide a better foundation for decisions regarding future migration.

figure 2. elements of a system architecture. (the figure lists three classes of components: applications, both off-the-shelf and locally developed, with the access mechanisms and standards for previously installed systems such as the catalog, interlibrary loan, electronic reserves, abstracting and indexing databases, content management systems, and legacy web content; core infrastructure, comprising authentication, web platforms, and the network; and information storage and access, covering storage structures and standards such as marc, dublin core, z39.50, and odbc.)
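one lightweight way to record the architectural descriptions called for above is to note, for each component, which class it belongs to, who controls it, and which interfaces it exposes. the sketch below is an illustration only; the component names, fields, and interface labels are assumptions made for the example rather than a prescribed schema.

from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    component_class: str       # "core infrastructure", "application", or "storage/access"
    control: str               # "local", "vendor", or "remote"
    interfaces: list = field(default_factory=list)  # standards or apis exposed
    modifiable: bool = False   # can the library change its internal design?

architecture = [
    Component("online catalog", "application", "vendor",
              interfaces=["z39.50", "vendor api"], modifiable=False),
    Component("subject pages", "application", "local",
              interfaces=["html over http"], modifiable=True),
    Component("campus authentication", "core infrastructure", "remote",
              interfaces=["ldap"], modifiable=False),
]

# components whose gaps can only be closed through vendor or campus negotiation
for component in architecture:
    if not component.modifiable:
        print(f"{component.name}: constrained; interfaces: {', '.join(component.interfaces)}")

an inventory of this kind makes the later mapping between the conceptual design and the physical design easier, because the constraining components are already identified.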
finally, typical library architectures consist of links to resources that are licensed or organized on behalf of the user. these include abstracting and indexing databases, full-text content provided by publishers outside of the library, and general vetted internet sites. access to these resources is usually provided simply by linking the user to them, and libraries have no control over their technical implementations. newer federated-search technologies are integrating users' access to these sites, and to results from them, into the library infrastructure, and linking tools make the interrelationships between these systems more easily understood. nevertheless, integrating these resources into a web site in a manner that makes sense to library users is a challenge. the access mechanisms and information formats required to communicate with each site should be clearly documented within the system architecture.

interrelationship of the information and system architectures

reacting to the rapid pace of change can result in an ad hoc or haphazard approach to web-site design. the sections above describe a systematic approach to including and evaluating changes to the web site. in order to implement the changes and create a web site that is scalable and made of reusable components, it is necessary to evaluate, plan, and document all changes to the system. figure 3 graphically depicts the interrelationship between eia and the system architecture. user needs, as described by ia, should inform the development of technical infrastructure. the "informs" arrow, indicating that eia informs the design and development of the system architecture, depicts this interrelationship. the "constrains" arrow designates the reality that some aspects of the existing infrastructure cannot be changed within the current planning cycle and will limit the library's ability to immediately change the underlying content and function of the web site. when mapping the conceptual design to the physical design, there will be gaps that represent functionality that cannot be supported, either fully or in part, by the current system architecture and that thus constrain the full implementation of the conceptual design. if ia is then to be implemented as fully as possible, these gaps identify the modifications and additions that must be carefully evaluated, designed, and implemented within the underlying system architecture. gaps can be addressed in a variety of ways. if there is a total gap in functionality, a system can be developed or implemented to provide the desired functionality as part of the larger system architecture. this may result in a complete development project or in the specification of an off-the-shelf application to meet the newly identified demand.
in the case where an existing system has some of the required functionality but is not completely suitable for the users' goals and objectives, an incremental approach of modernization can be adopted. modernization surrounds "the legacy system with a software layer that hides the unwanted complexity of the old system and exports a modern interface."8 this is done to provide integration with a modern operating environment while retaining the data and, if desired, exposing the functions of the existing system. techniques range from screen scraping to the implementation of web services that export access to functions that are still relevant within the new context. all of these changes become part of the system architecture for future iterations of change. gaps that cannot be immediately added or changed to meet the specified requirements become constraints in the next iteration of conceptual design.

in the absence of a plan, the underlying systems will continue to undergo constant evolutionary changes, ostensibly to meet the changing needs and workflows of both users and staff. change comes from many sources, including local implementations and modifications, external vendors, and industry-wide changes in standards. this rapid but incremental change can produce a system that is very difficult to maintain and that provides few reusable modules.

figure 3. the interrelationship between the conceptual and physical design of the library web site. (the figure shows the extended information architecture and the system architecture linked by the informs and constrains relationships described in the text, with core infrastructure such as authentication, web platforms, and the network on the system side.)

having a well-documented implementation and integration plan will not guarantee that the library will not experience the negative effects of technological change, but it does allow a library to better manage change in meeting the needs of its users. the more explicitly and clearly the modifiable features are documented within the system architecture, the easier it will be to plan to fill the gaps.

conclusion

library users' mental models of library processes have fundamentally changed, creating a serious disconnect between how users expect to use a library web site and how the site was designed. in particular, user expectations regarding the number of steps that must be completed have changed. at the same time, library technical infrastructures are composed, in part, of legacy systems that provide great value and facilitate interlibrary resource sharing, but that were not designed for the web environment. it is essential that libraries develop new approaches to the conceptual design of web sites that support current and future changes both to user behaviors and to library systems architectures. in the long run, these approaches should contribute to the development of a reference model for the description of library services. the authors have proposed a complete framework for conceptual design and physical implementation that is responsive to changing user needs while recognizing the need for libraries to adopt an efficient and cost-effective approach to web-site design, implementation, and maintenance. functional and content needs of the user are identified and molded into a conceptual design based on a broadened perspective of the users' objectives.
mapping conceptual requirements to physical architectures is an important part of this framework, using an architectural representation in combination with descriptions of integration elements developed to support incremental and iterative change. the ability to respond is essential, necessitated by the rapid change in the technical and user environments in which libraries operate. the framework is designed to allow logical and informed decisions to be made throughout the process regarding when to create new systems, when to replace or modernize existing systems, and when to improve the conceptual signage of the web site.

references

1. christina wodtke, information architecture: blueprints for the web (indianapolis: new riders, 2003).
2. louis rosenfeld and peter morville, information architecture for the world wide web, 2nd ed. (cambridge, mass.: o'reilly, 2002), 4.
3. bob gerrity, theresa lyman, and ed tallent, "blurring services and resources: boston college's implementation of metalib and sfx," reference services review 30, no. 3 (2002): 229-41; barbara j. cockrell and elaine anderson jayne, "how do i find an article? insights from a web usability study," journal of academic librarianship 28, no. 3 (may 2002): 122-32.
4. jesse james garrett, the elements of user experience (indianapolis: new riders, 2002); rosenfeld and morville, information architecture.
5. james s. ghaphery and dan ream, "vcu's my library: librarians love it ... users? well maybe," information technology and libraries 19, no. 4 (dec. 2000): 186-90; james s. ghaphery, "my library at virginia commonwealth university: third year evaluation," d-lib magazine 8, no. 7/8 (july/aug. 2002), accessed july 16, 2003, www.dlib.org/dlib/july02/ghaphery/07ghaphery.html.
6. university of louisville libraries web site (2003), accessed july 16, 2003, http://library.louisville.edu.
7. craig larman, applying uml and patterns: an introduction to object-oriented analysis and design (new jersey: prentice hall ptr, 1998); martin fowler, analysis patterns: reusable object models (boston: addison-wesley, 1997); james rumbaugh, ivar jacobson, and grady booch, the unified modeling language reference manual (boston: addison-wesley, 1999); robert c. seacord, daniel plakosh, and grace a. lewis, modernizing legacy systems: software technologies, engineering processes, and business practices (boston: addison-wesley, 2003).
8. seacord, plakosh, and lewis, modernizing legacy systems, 9.

user authentication in the public area of academic libraries in north carolina

gillian (jill) d. ellern, robin hitch, and mark a. stoffan

gillian (jill) d. ellern (ellern@email.wcu.edu) is systems librarian, robin hitch (rhitch@email.wcu.edu) is tech support analyst, and mark a. stoffan (mstoffan@email.wcu.edu) is head, digital, access, and technology services, western carolina university, cullowhee, north carolina.

abstract

the clash of principles between protecting privacy and protecting security can create an impasse between libraries, campus it departments, and academic administration over authentication issues with the public area pcs in the library.
this research takes an in-depth look at the state of authentication practices within a specific region (i.e., all the academic libraries in north carolina) in an attempt to create a profile of those libraries that choose to authenticate or not. the researchers reviewed an extensive amount of data to identify the factors involved with this decision.

introduction

concerns surrounding usability, administration, and privacy with user authentication on public computers are not new issues for librarians. however, in recent years there has been increasing pressure on all types of libraries to require authentication of public computers for a variety of reasons. since the 9/11 tragedy, there has been increasing legislation such as the uniting and strengthening america by providing appropriate tools required to intercept and obstruct terrorism act of 2001 (usa patriot act) and the communications assistance for law enforcement act (calea). in response, administrators and campus it staff have become increasingly concerned about allowing open access anywhere on their campuses. restrictive licensing agreements for specialized software and web resources are also making it necessary or attractive to limit access to particular academic subgroups and populations. permitting access to secured campus storage from these computers can also force libraries to consider requiring authentication. and finally, the general state of the economy has increased user traffic to libraries, sometimes making it necessary to control the use of limited computer resources. authenticating can often make these changes easier to implement and can give the library more control over its it environment.

that being said, authentication comes at a price for librarians. authentication often creates ethical issues with regard to patron privacy, freedom of inquiry, the increased complexity of using public area machines, and restrictions on the open access needs of public or guest users. requiring a patron to log into a computer can make it possible for organizations outside the library's control to collect, review, and use data about a patron's searching habits or online behaviors. issues associated with managing patron logins can also create barriers to access as well as being time consuming and frustrating for both the patron and the library staff.1 while open, anonymous access does not completely protect against these issues, it can help to create an environment of free, private, and open access similar to the longstanding situation with the book collection in most libraries.
the  hunter  library  experience   while  working  on  the  implementation  of  a  new  campus-­‐wide  pay-­‐for-­‐print  solution  in  2009,   librarians  from  the  hunter  library  at  western  carolina  university  began  to  feel  pressured  by  the   campus  it  department  to  change  its  practice  of  allowing  anonymous  logins  to  all  the  computers  in   the  public  areas  of  the  library.    concerns  about  authenticating  users  on  library  public  area   machines  had  been  building  between  these  two  units  for  several  years.    the  resulting  clash  of   principles  between  protecting  privacy  and  protecting  security  came  to  a  head  over  this  project.   the  hunter  library  employees  perceived  that  there  needed  to  be  more  time  for  research  and   debate  before  implementing  the  preceded  mandate.  initially,  there  was  great  resistance  from   campus  it  staff  to  take  the  library’s  concerns  into  account,  but  eventually  a  compromise  was   worked  out  that  allowed  the  library  to  retain  anonymous  logins  on  its  public  computers.    the   confrontation  led  library  staff  to  investigate  the  practices  of  other  libraries,  particularly  within  the   university  of  north  carolina  (unc)  system  of  which  it  is  a  member.    it  seemed  a  logical   development  to  extend  the  initial  research  into  the  authentication  practices  throughout  the  state   of  north  carolina.   the  problem   one  of  the  first  questions  asked  by  western  carolina’s  library  administration  of  the  systems   department  was  what  other  libraries  in  the  area  were  doing.    in  our  case,  the  library  director   specifically  asked  how  many  of  west  carolina’s  sister  universities  were  authenticating  and  why.   anecdotally,  during  this  process,  it  seemed  that  many  other  university  of  north  carolina  system   libraries  reported  being  pressured  to  authenticate  their  public  computers  by  organizations   outside  the  library,  most  often  the  campus  it  department.   when  the  librarians  at  the  hunter  library  began  looking  at  research  to  support  their  position,   hard  data  and  practical  arguments  that  could  be  used  to  effectively  argue  their  case  against  this   change,  helpful  literature  seemed  to  be  lacking.  some  items  were  found  such  as  carlson,  writing  in   the  chronicle  of  higher  education,  who  reported  on  the  divide  between  access  and  security.  he   confirmed  that  other  librarians  also  have  ambivalent  feelings  about  authentication  issues  but  that   there  was  also  growing  understanding  in  libraries  about  the  potential  vulnerability  of  networks  or   misuse  of  their  resources.2     it  seemed  that  the  speed  at  which  authenticating  computers  in  the  public  areas  of  libraries  was   happening  across  the  country  had  not  really  allowed  the  literature  on  the  subject  to  quite  catch  up.     
information  technology  and  libraries  |  june  2015     105           those  studies  that  existed  such  as  spec  kits  seem  to  address  the  issue  from  the  perspective  of   larger  research  libraries  or  else  did  not  systematically  assess  other  specific  groups  of  libraries.3,4     there  were  questions  in  our  minds  about  whether  the  current  research  that  was  found  would   describe  the  trends  and  unique  situations  of  libraries  located  in  rural  areas  or  in  other  types  of   academic  libraries.  there  seemed  to  be  no  current  statewide  or  geographically  defined  analysis  of   authentication  practices  across  various  types  of  academic  libraries  in  a  specific  state  or  region,  nor   were  there  any  available  studies  creating  a  profile  of  libraries  more  likely  to  authenticate   computers  in  their  public  areas.  we  questioned  if  the  rural  nature  of  our  settings,  our  mission,  or   our  geographic  area  in  the  south  might  reinforce  or  hurt  our  position  with  it.    authentication   status  is  not  something  that  is  mentioned  in  the  ala  directory  nor  is  this  kind  of  information  often   given  on  a  library’s  web  site.    we  found  that  individuals  usually  need  to  call  or  visit  the  library   directly  if  they  want  to  know  about  a  library’s  authentication  practices.   during  the  initial  investigation,  the  need  for  this  kind  of  information  to  support  the  library’s   perspective  became  clear.    this  question  led  to  the  creation  of  this  survey  of  authentication   practices  in  a  larger  geographical  area  and  across  various  kinds  of  academic  libraries.    the  goals  of   this  research  were  to  determine  some  answers  to  the  following  questions:   • what  is  the  current  state  of  authentication  practices  in  the  public  area  of  academic  libraries   in  north  carolina?       • what  factors  caused  these  libraries  to  make  the  decisions  that  they  did  in  regards  to   authentication?   • could  you  predict  whether  an  academic  library  would  require  users  to  authenticate?   literature  review   a  number  of  studies  have  discussed  various  other  aspects  of  user  authentication  in  libraries,   including  privacy  and  academic  freedom  concerns,  guest  access  policies,  differing  views  of  privacy   and  access  between  library  and  campus  it  departments,  and  legislation  impacting  library   operations.  all  are  potential  factors  impacting  decisions  on  authentication  of  patron  accessible   computers  located  in  the  public  areas  of  library.   privacy  and  academic  freedom  about  the  use  of  a  library’s  collection  have  long  been  major   concerns  for  librarians  even  before  information  technology  was  introduced.  the  impact  of  9/11   and  the  patriot  act  made  the  discussion  of  computers  and  network  security,  especially  in  the   library  environment  much  more  entwined.    oblinger  discussed  online  access  concerns  in  the   context  of  academic  values,  focusing  on  unique  aspects  of  the  academic  mission.  
she  discussed  the   results  of  an  educause/internet2  computer  and  network  security  task  force  invitational   workshop  that  established  a  common  set  of  principles  as  a  starting  point  for  discussion:  civility   and  community,  academic  and  intellectual  freedom,  privacy  and  confidentiality,  equity  of  access  to   resources,  fairness,  and  ethics.  all  of  these  principles,  she  argues,  are  integral  to  the  environment     user  authentication  in  the  public  library  area  of  academic  libraries  in  north  carolina  |     106   ellern,  hitch,  and  stoffan   doi:  10.6017/ital.v34i2.5770   of  a  university  and  concluded  that  security  is  a  complex  topic  and  that  written,  top-­‐imposed   policies  alone  will  not  adequately  address  all  concerns.5  while  not  directly  addressing  the  issues  of   the  library’s  public  computer  access  in  particular,  she  established  a  framework  of  values  on  how   security  issues  relate  to  the  university  culture  of  freedom  and  openness.   dixon  in  an  article  written  for  library  administrators  discussed  privacy  practices  for  libraries   within  the  context  of  the  library  profession’s  ethical  concerns.  she  highlights  such  documents  as   the  code  of  ethics  of  the  american  library  association6,  the  fair  information  practices  adopted  by   the  organization  for  economic  cooperation  and  development7,  and  the  niso  best  practices  for   designing  web  services  in  the  library  context8.    she  also  reviews  a  variety  of  ways  that  patron  data   may  be  misused  or  compromised.  she  stated  that  all  the  ways  that  patron  data  can  be  be  stored  or   tracked  by  local  networks,  it  departments,  or  internet  service  providers  may  not  be  fully   understood  by  librarians.  while  most  librarians  ardently  maintain  the  privacy  of  patron   circulation  records,  she  points  out  that  similar  usage  data  on  online  activities  may  be  collected   without  the  librarians  or  their  patrons  being  aware.  dixon  studied  the  current  literature  and   maintained  that  libraries  need  to  be  closely  involved  in  decisions  about  the  collection  and   retention  of  patron  usage  data,  especially  when  patron  authentication  and  access  is  controlled  by   external  agencies  such  as  campus  or  city  it  departments,  because  of  a  tendency  for  security  to   prevail  over  privacy  and  free  inquiry.9  this  theme  was  of  major  importance  to  us  in  preparing  the   present  study  as  it  shows  that  we  are  not  alone  in  these  concerns.   carter  focused  on  the  balance  between  security  and  privacy  and  suggested  several  possible   scenarios  for  addressing  both  areas.  he  emphasized  librarian  values  involving  privacy  and   intellectual  freedom,  contrasting  the  librarian’s  focus  on  unrestricted  access  with  the  over-­‐arching   security  concerns  of  computing  professionals.  he  discussed  several  computer  access  policies  in   use  at  various  institutions  and  possible  approaches.  these  options  include  computer   authentication  (with  associated  privacy  concerns),  open  access  stations  visually  monitored  from   staffed  desks,  or  routine  purging  of  user  logs  at  the  end  of  each  session.  
he  also  suggested   librarians  lobby  state  legislatures  to  have  computer  usage  logs  included  in  laws  governing  the   confidentiality  of  library  records.10   still  and  kassabian  provided  a  good  summary  of  internet  access  issues  as  they  affected  academic   libraries  from  legal  and  ethical  perspectives.  they  suggested  that  librarians  focus  on  public   obligations,  free  speech  and  censorship,  and  potential  for  illegal  activities  occurring  on  library   workstations.  the  issues  highlighted  in  the  article  have  increased  in  the  15  years  since  the  article   was  written  but  it  remains  the  best  available  overview.11  the  arguments  put  forth  in  this  article   proved  relevant  for  us  in  understanding  the  multitude  of  viewpoints  regarding  authentication   even  before  9/11.     in  the  post-­‐9/11  era,  essex  discussed  the  usa-­‐patriot  act  and  its  implications  for  libraries  and   patron  privacy.  some  of  the  9/11  terrorists  were  reported  to  have  made  use  of  public  library   computers  in  the  days  before  the  attack.  this  has  led  to  heighted  concern  about  patron  privacy     information  technology  and  libraries  |  june  2015     107           among  librarians.  accurate  assessment  of  its  impact  is  difficult  due  to  restrictions  placed  on   libraries  in  even  disclosing  that  they  have  been  subjected  to  search.12  while  not  directly   addressing  authentication,  the  article  highlights  privacy  issues  surrounding  library  records  of  all   types.     one  of  the  arguments  in  not  requiring  authentication  in  the  public  area  is  the  use  by  unaffiliated   users  of  academic  libraries.    this  is  especially  true  in  rural  areas  where  an  academic  library  might   be  some  of  the  best-­‐funded,  comprehensive  and  accessible  resources  in  a  geographical  area.    even   in  urban  areas,  guest  access  by  unaffiliated  users  is  a  growing  issue  for  many  academic  libraries   because  of  limited  resources,  software  licensing  problems  and  public  access  to  campus   infrastructure.  while  most  institutions  have  traditionally  offered  basic  library  services  to   unaffiliated  patrons,  the  online  environment  has  raised  new  problems.  weber  and  lawrence   provided  one  of  the  best  studies  of  these  issues.    their  work  surveyed  association  of  research   libraries  (arl)  member  libraries  to  determine  the  extent  of  mandatory  logins  to  computer   workstations  and  document  how  online  access  was  provided  to  non-­‐affiliated  guest  users.  they   concentrated  their  study  questions  on  federal  and  canadian  depository  libraries  that  must   provide  some  type  of  access  to  online  government  information,  with  or  without  authentication.   less  than  half  of  respondents  reported  having  any  written  policies  governing  open  access  on   computers  or  guest  access  policies.  of  the  61  responding  libraries  to  the  survey,  32  required  that   affiliated  users  authenticate,  and  of  these  libraries  and  23  had  a  method  for  authenticating  guest   users.13  this  article,  which  was  published  just  as  this  study  was  testing  and  evaluating  the  survey   instrument,  proved  to  be  very  useful  as  we  worked  with  our  questions  in  qualtrics™      and  dealt   with  the  irb  requirements.     
courtney  explored  a  half-­‐century  of  changes  in  access  policies  for  unaffiliated  library  users.   viewing  the  situation  from  somewhat  early  in  the  shift  from  print  to  electronic  resources,  she   foresaw  the  potential  for  significantly  reduced  access  to  library  resources  for  non-­‐affiliated   patrons.  these  barriers  would  be  created  by  access  policy  issues  with  computing  infrastructure   and  licensing  limitations  by  database  vendors.    this  is  especially  true  if  a  library’s  licenses  or   policies  did  not  specifically  address  use  by  unaffiliated  users.  she  concluded  that  decisions  about   guest  access  to  online  library  resources  should  be  made  by  librarians  and  not  be  handed  over  to   vendors  or  campus  computing  staff.14  our  study  began  as  a  result  of  this  very  issue,  i.e.,  an  outside   entity  (campus  it)  determining  how  access  to  library  resources  should  be  controlled,  without   input  by  librarians  or  library  staff.   courtney  also  surveyed  814  academic  libraries  to  assess  their  policies  for  access  by  unaffiliated   users.    she  focused  on  all  library  services  including  building  access,  reference  assistance,  and   borrowing  privileges  in  addition  to  online  access.  many  libraries  were  also  cancelling  print   subscriptions  in  favor  of  online  access  and  she  questioned  the  impact  this  might  have  on  use  by   unaffiliated  users.    while  suggesting  little  correlation  between  decisions  to  cancel  paper   subscriptions  and  requiring  authentication  of  computer  workstations,  she  concluded  that  reduced     user  authentication  in  the  public  library  area  of  academic  libraries  in  north  carolina  |     108   ellern,  hitch,  and  stoffan   doi:  10.6017/ital.v34i2.5770   access  by  unaffiliated  users  would  be  an  unintended  consequence  of  this  change.15  this  article   proved  valuable  to  us  in  framing  our  study,  as  it  gave  us  some  idea  of  what  we  might  expect  to  find   and  provided  some  concepts  to  use  when  we  formulated  our  survey.     best-­‐nichols  surveyed  public  use  policies  in  11  nc  tax-­‐supported  academic  libraries  and  asked   similar  questions  to  our  own.  this  study  was  dated  and  didn’t  address  computer  resources,  but   some  of  the  same  issues  were  addressed.16  public  use  and  authentication  policies  have  the   potential  to  impact  one  another  and  how  the  library  responds.       courtney  called  on  librarians  to  conduct  a  carefully  thought  out  discussion  of  user  authentication   because  of  the  implications  for  public  access  and  freedom  of  inquiry.  while  librarians  are   traditionally  passionate  at  protecting  patron  privacy  involving  print  resources,  many  are  unaware   of  related  concerns  involving  online  authentication.  she  advocated  for  more  education  and  open   debate  of  the  issues  because  of  the  potential  gravity  of  leaving  decision-­‐making  in  the  hands  of   database  vendors  or  campus  it  departments.  
decisions  regarding  authentication  and  privacy   impact  library  services  and  access,  and  therefore  need  to  include  input  from  librarians.17  as  this   study  included  a  summary  of  the  reasons  for  authentication  as  provided  by  surveyed  libraries,  it   also  gave  us  another  reference  point  to  use  when  comparing  our  results  and  highlighted  the   intellectual  freedom  issues  that  were  often  missing  or  glossed  over  in  other  studies.   barsun  surveyed  the  web  sites  of  the  100  association  of  research  libraries  to  assess  services  to   unaffiliated  users  in  four  areas:  building  access,  circulation  policies,  interlibrary  loan  services,  and   access  to  online  databases.  61  member  libraries  responded  to  requests  for  data.  she  explored  the   question  of  whether  the  policies  governing  these  services  would  be  found  on  a  library’s  web  site.   she  perceived  a  possible  disparity  between  increasing  demand  for  services  generated  by  members   of  the  public  who  are  discovering  a  library’s  resources  via  online  searching  and  the  library’s  ability   or  willingness  to  serve  outside  users.  while  she  did  not  address  computer  authentication  issues   directly,  she  did  find  that  a  significant  percentage  of  academic  library  web  sites  were  ambiguous   about  stating  the  availability  of  non-­‐authenticated  access  to  databases  from  onsite  computers.18   this  ambiguity  could  possibly  be  related  to  vague  usage  agreements  with  database  vendors  that   do  not  clearly  state  whether  non-­‐affiliated  users  may  obtain  onsite  access  to  these  resources.  in   “secret  shopper”  visits  done  as  part  of  our  own  research,  we  saw  a  disparity  between  what  was   stated  on  a  library’s  web  site  and  the  reality  of  access  offered.     method   it  seemed  appropriate  to  start  this  project  with  a  regional  focus.      none  of  the  studies  available   looked  at  authentication  geographically.    because  colleges  and  universities  within  a  state  are  all   subjected  to  the  same  economic,  political  and  environmental  factors,  looking  at  the  libraries  might   help  provide  some  continuity  for  creating  a  relevant  profile  of  current  practices.    north  carolina   has  a  substantial  number  of  academic  libraries  (114)  with  a  wide  variety  of  demographics.     historically,  the  state  supports  a  strong  educational  system  with  one  of  the  first  public  university     information  technology  and  libraries  |  june  2015     109           systems.    together  with  the  17  universities  within  university  of  north  carolina  system,  the  state   has  59  public  community  colleges,  36  private  colleges  and  universities,  and  3  religious  institutions.   religious  colleges  are  identified  as  those  whose  primary  degree  is  in  divinity  or  theology.    (see   chart  1.)     chart  1.  survey  participation  by  type  of  academic  library.   work  had  been  started  to  identify  the  authentication  practices  of  other  unc  system  libraries,  so   the  researchers  expanded  the  data  to  include  the  other  academic  libraries  within  the  state.  
to   create  a  list  of  the  library’s  pertinent  information  for  this  investigation,  the  researchers  used  the   american  library  directory19,  the  nc  state  library’s  online  directories  of  libraries20,  and  visited   each  library’s  web  page  to  create  a  database.  the  researchers  augmented  each  library’s  data  to   include  information  including  the  type  of  academic  library  (public,  private,  unc  system  and   religious),  current  contact  information  on  personnel  who  might  be  able  to  answer  questions  on   authentication  policies  and  practices  in  that  library,  current  number  of  books,  institutional   enrollment  figures,  and  the  name  and  population  of  the  city  or  town  in  which  the  library  was   located.  the  library’s  responses  to  the  survey  were  also  tracked  in  the  database  with  spss  and   excel  employed  in  evaluating  the  collected  data.   a  western  carolina  institution  review  board  (irb)  “request  for  review  of  human  subject   research”  was  submitted  and  approved  using  the  following  statement:  “we  want  to  know  the   authentication  situation  for  all  the  college  libraries  in  north  carolina.”    the  researchers  discovered   quickly  that  the  definition  of  “authentication”  would  have  to  be  explained  to  the  review  board  and   many  of  the  responding  librarians  that  filled  out  the  survey.  the  research  goal  was  further   simplified  with  the  explanation  of  authentication  as  “how  do  patrons  identify  themselves  to  get     user  authentication  in  the  public  library  area  of  academic  libraries  in  north  carolina  |     110   ellern,  hitch,  and  stoffan   doi:  10.6017/ital.v34i2.5770   access  to  a  computer  in  the  public  area  of  a  library”  because  many  librarians  might  not  realize  that   what  they  do  is  “authentication”.       during  the  approval  phase,  there  was  some  question  about  whether  the  researchers  needed   formal  approval  because  much  of  the  information  could  be  collected  by  just  visiting  the  libraries  in   person.    the  researchers  saw  no  risk  of  potentially  disclosing  confidential  data.  however,  it  was   decided  that  it  was  better  to  go  through  the  approval  process,  since  the  survey  asked  the  librarians   whether  they  were  being  required  to  authenticate  by  outside  entities.    there  might  also  be  a  need   to  do  some  follow-­‐up  calls  and  there  was  a  plan  to  do  site  visits  to  the  local  libraries  in  order  test   the  data  for  accuracy.     the  qualtrics™  online  survey  system  was  used  to  create  the  survey  and  collect  the  responses.     contact  information  from  the  database  was  uploaded  to  the  survey  system  with  the  irb  approved   introductory  letter  to  each  library  contact  person  along  with  a  link  to  the  survey.    the  introductory   letter  described  the  goals  of  the  project  and  included  an  invitation  to  participate  as  well  as  refusal   language  as  required  by  the  irb  request.  the  same  language  was  used  in  the  follow  up  emails  and   phone  calls.   the  initial  (16)  surveys  were  administered  to  the  unc  system  libraries  in  october  –  december   2010  as  a  test  of  the  delivery  and  collection  system  on  qualtrics™,  with  the  rest  of  the  libraries   being  sent  the  survey  mid-­‐december  2010.         
in  the  spring  of  2011,  the  researchers  followed  initial  survey  with  a  second  letter  and  then  with   phone  calls  and  emails.  during  the  follow  up  calls,  some  librarians  chose  to  answer  the  survey   questions  with  the  researcher  filling  it  out  over  the  phone.    most  filled  out  the  survey  themselves.     the  final  surveys  were  completed  in  april  2011.    because  the  status  of  authentication  is  volatile,   this  survey  data  and  research  represents  a  snapshot  in  time  of  their  authentication  practices   between  october  2010  and  april  2011.  the  researchers  did  see  changes  happening  over  the   course  of  the  surveying  process  and  made  changes  to  any  data  collected  in  follow  up  contact  in   order  to  maintain  the  most  current  information  about  that  library  for  the  charts,  graphs  and   presentations  made  from  the  data.     in  fall  2011,  the  researchers  did  a  “secret  shopper”  type  expedition  to  the  nearest  academic   libraries  by  visiting  in  person  as  a  guest  user.    the  main  purpose  of  these  visits  was  to  check  the   data,  take  pictures  of  the  library  public  areas,  get  a  firsthand  experience  with  the  variety  of   authentication  practices,  and  talk  to  and  thank  the  librarians  that  participated.   the  survey   the  survey  asked  36  different  questions  using  a  variety  of  pull  down  lists,  check  boxes  and  fill  in   the  blank  questions.    qualtrics™  allows  for  the  survey  to  have  seven  branches,  or  skip  logic,  that   asked  further  questions  depending  upon  the  answer  given.    these  branches  allowed  the  survey   software  to  skip  particular  sections  or  ask  for  additional  information  depending  on  the  answers     information  technology  and  libraries  |  june  2015     111           supplied.    some  libraries,  especially  those  that  didn’t  authenticate  or  didn’t  know  specific  details,   might  be  asked  as  little  as  14  questions  while  others  received  all  36.  the  setup  of  computers  in  the   public  area  of  libraries  can  be  quite  variable,  especially  if  the  library  differentiates  between   student-­‐only  and  guest/public  use  only  workstations.  the  survey  questions  were  grouped  into   seven  basic  areas:  descriptive,  authentication,  student-­‐only  pcs,  guest/public  pcs,  wireless   access,  incident  reports,  and  computer  activity  logs.     the  full  survey  is  included  as  appendix  a.   initial  hypothesis   given  the  experience  at  the  hunter  library,  we  expected  the  following  factors  might  influence  a   decision  to  authenticate.    some  of  these  basic  assumptions  did  influence  our  selection  of  questions   in  the  seven  areas  of  the  survey.     
we expected to find:

• when the workstations were under the control of campus it, authentication would usually be required
• when the workstations were under the control of the library, authentication would probably not be required
• that factors such as population, enrollment, and book volume would play a role in decisions to authenticate
• that librarians would not be aware of what user information was being logged, whether or not authentication was required
• that a library that had experienced incidents involving the computers in its public area would be more likely to have authentication
• that authentication increased due to post-9/11 factors and their legal interpretations, which pressured libraries to authenticate

survey questions, responses, and general findings

the data collected from this survey, especially from those libraries that did authenticate, produced over 200 data points for each library. below are those that resulted in answers to questions posed at the outset, particularly those looking at overall authentication practices. further articles are planned to look at areas of inquiry regarding other related practices in the public areas of academic libraries geographically.

there are 114 academic libraries in north carolina. as a result of the follow-up emails and phone calls, this research survey got an exceptional 99.1% response rate (113 out of 114). once the appropriate librarians were contacted and understood the scope and purpose of this study, they were very cooperative and willing to fill out the survey. those who were contacted via phone mentioned that the original email was overlooked or lost. only one library refused to participate in the study.

individual libraries' demographics were collected in a database by using directory and online information. the data was matched with the survey data provided by the respondents to produce more in-depth analysis and create a profile of each library.

how many libraries in north carolina are authenticating? (chart 2)

the survey asked: "is any type of authentication required or mandated for using any of the pcs in the library's public area?" 66% (or 75) of libraries answered yes, they required authentication to use the pcs. (see chart 2.)

chart 2.

are some types of libraries more likely to authenticate? (chart 3)

while each type of library had a different overall total as compared to the other types, chart 3 shows how the percentages of authentication hold for each type. three out of the four types of libraries authenticate more often than not. of the 58 community college libraries, 60% (or 35) require users to authenticate. seventy-eight percent (78%) of the 36 private college libraries authenticate, and 11 of the 16 (or 69%) unc system libraries authenticate.
only the religious college libraries more often do not require users to authenticate: just 1 of the 3 (or 33%) authenticates, although this is a very small population in the survey. however, percentage-wise, community colleges are more likely not to require users to authenticate than private college libraries (40% vs. 22%), and the unc system libraries, which are public institutions, fall in the middle at 31%.

chart 3.

how many academic libraries were required to authenticate pcs in their public areas? (chart 4)

of the 75 libraries that required patrons to authenticate, when asked whether "they were required to use this authentication," 59 (52% of all 113 respondents) replied "yes." putting these data points together shows that 16 (or 14%) of the libraries authenticate even though they were not required to do so. some clues about why this was so were sought in the next question and during the follow-up phone calls.

chart 4.

why was authentication used?

libraries were asked, "do you know the reasons why authentication is being used?" if they answered "prevent misuse of resources" or "control the public's use of these pcs," then an additional question was asked: "what led the library to control the use of pcs?" this question had two check boxes ("inability of students to use the resources due to overuse by the public" and "computer abuse") and a third box to allow free-text entry. a library could check more than one box.

of those 75 libraries that authenticated, 60% (or 45) checked "prevent misuse of resources" and 48% (or 36) cited "controlling the public's use of these pcs" as reasons for authenticating. normalizing the data from the two questions and the free-text field, table 1 combines all answers to illustrate the number and percentage of each.

table 1.

in the course of the follow-up calls with those libraries that answered the survey over the phone, further insight was provided. one librarian said that their it department told them "authentication was the law and they had to do it." another answered that they were "on the bus line and so the public used their resources more than they expected and so they had to."
to get a better understanding of the scope and variety of these answers, here are some examples of the reasons cited in the free-text space: "all it's idea to do this," "best practices," "caution," "concerned they would be used for the wrong reasons," "control," "we found them misusing computer resources (porn, including child porn)," "control over college students searching of inappropriate websites, such as porn/explicit sites," "disruption," "ease of distributing applications," "fear of abuse on the part of legal," "legal issues regarding internet access," "making students accountable," "monitor use," "policy," "security of campus network," "security of machines after issues were raised at a conference," and "time."

who required that the libraries authenticate? (chart 5)

the survey asked, "what organization or group required or mandated the library to use authentication?" respondents were allowed to choose more than one of the five boxes. these choices included "the library itself," "it or some unit within it," "college or university administration," "other" (with a text box to explain), and "not sure." the results of this question are shown in chart 5. the survey revealed that the decision was solely the library's choice 25% of the time (or 28 libraries), that 22% of the time the library was mandated or required to authenticate by it or some unit within it (or 25 libraries), and that 4% of the time a library's college or university administration required or mandated authentication (or 4 libraries). collaborative decisions in 14 libraries involved more than one organization. of the 39 libraries that were involved with the authentication decision (28 that made the decision by themselves and 11 that were part of a collaborative decision), 55% (or 16) authenticated even though they were not required to do so.

chart 5.

what type of authentication is used?

authentication in libraries can take many forms. the most common method among libraries that authenticate was a centralized or networked system. almost sixty percent of the libraries used some form of this identified access (tables 2 and 3), with one library using some other independent system. twenty-five percent (or 19) of libraries that authenticate still use some form of paper sign-in sheets, and 21% (or 16) use pre-set or temporary logins or guest cards. fifteen percent (or 11) use pc-based sign-in or scheduling software, and 8% (or 6) use the library system in some form for authentication. a few libraries indicated that they bypass their authentication systems for guests, either by having staff log guests in or by disabling the system on selected pcs. we saw this during the "secret shopper" visits as well.

table 2.

do the forms of authentication used in libraries allow for user privacy?
when asked how they handle user privacy in authentication, 67% (or 50) of the 75 libraries that authenticate use a form of authentication that can identify the user. in other words, most users do not have privacy when using public computers in an academic library because they are required to use some form of centralized or networked authentication. the options in table 3 were presented to the respondents as possible forms of privacy methods. thirty-five percent (or 26) of libraries indicated that they provide some form of privacy for their patrons. anonymous access accounted for 28% (or 21) of the libraries.

table 3.

are librarians aware of the computer logging activity going on in the public area? (table 4)

all 113 respondents were asked two questions about the computer logging activities of their libraries: "do you know what computer activity logs are kept?" and "do you know how long computer activity logs are kept?" the second question was only asked if "unsure" was not checked. besides "unsure," responses on the survey included "authentication logs (who logged in)," "browsing history (kept on pc after reboot)," "browsing history (kept in centralized log files)," "scheduling logs (manual or software)," "software use logs," and "other." the respondents could select more than one answer. however, over half (52%) of the respondents were unsure whether the library kept any computer logs at all. authentication logs of who logged in were the most common, but those were kept in only 25% of the total libraries surveyed. a high percentage of libraries kept some kind of logs, but most respondents were unsure how long those records were kept. of the various types of logs, respondents using scheduling software were the most familiar with the length of time the software logs were kept. in one case, a respondent mentioned that the manual sign-in sheets were never thrown out and had been retained for years.

table 4. log retention: what kind of computer logs are kept and for how long (all 113 libraries)

computer activity logs                              number   % of libraries   unsure how long kept
unsure                                              59       52%              100%
authentication logs (who logged in)                 28       25%              60%
none                                                21       19%              --
browsing history (kept in centralized log files)    14       12%              86%
scheduling logs (manual or software)                10       9%               70%
browsing history (kept on pc after reboot)          7        6%               57%
software use logs                                   6        5%               33%
library system                                      4        4%               75%
other                                               2        2%               --

are past incidents factors in authenticating?

only three libraries reported breaches of privacy, and all three reported using authentication.

of the 75 libraries that do authenticate (chart 6, three bars on the right), 36 reported that they did have improper use of the pcs, while 29 reported that they did not and 10 did not know. of the 38 libraries that do not authenticate (chart 6, three bars on the left), 23 reported that they had no improper use of the pcs, while 13 stated that they did and 2 did not know. the known reports of improper use in the survey are higher when the library does authenticate and lower when the library doesn't.
chart 6.

when did libraries begin authenticating in their public areas?

of the 75 libraries that authenticate, only one implemented authentication more than ten years prior to the survey. fifty-one (or 67%) of the responding libraries began authenticating between 3 and 10 years ago. ten libraries implemented authentication in the year before the survey. this is consistent with the growth of security concerns in the post-9/11 decade. (chart 7)

chart 7.

discussion

since the introduction of computer technology to libraries, library staff and patrons have used different levels of authentication depending upon the application. while remote access to commercial services such as oclc cataloging subsystems or vendor databases has always used some form of authorization, usually a username and password, it has never been necessary or desirable for public access to the library's catalog system to have any kind of authorization requirement. most of the collections within an academic library have traditionally been housed in open-access stacks where anyone can freely access material on the shelves. printed indexes and other tools that provide in-depth access to these collections have traditionally been open as well. today, most libraries still make their library catalog and even some bibliographic discovery tools openly accessible and available over the web. this practice naturally extended to computer technology and other electronic reference tools until libraries began connecting them to campus and public networks.

the principle of free and open access to the materials and resources of the library, within the library walls, has been a fundamental characteristic of most public and academic libraries. there is an ethical commitment of librarians to a user's privacy and confidentiality that has deep roots in the first and fourth amendments of the us constitution, state laws, and the code of ethics of the ala.
article iii of the ala code states, "we protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted." traditionally, library staff do not identify patrons who walk through the door; they don't ask for identification when answering questions at the reference desk, nor do they identify patrons reading a book or magazine in the public areas of a library. schneider has emphasized that librarians have always valued user privacy and have been instrumental in the passing of many states' library privacy laws.23 usually, it is only when materials are checked out to a patron that a user's affiliation or authorization even gets questioned directly. frequently patrons can make use of materials within the library building with no record of what was accessed. we are seeing these traditional principles of open access to materials challenged as materials transition to electronic formats. it is becoming more common for patrons to have to authenticate before they can use what was once openly available. the data collected from this survey confirms this trend, with 66% of the libraries using some form of authentication in their public areas.

the widespread use of personally identifiable information is making it more difficult for librarians to protect the privacy and confidentiality of library users. although the writing was on the wall before 9/11 that some choices would have to be made with regard to privacy, no easy answer to the problem had yet been identified. librarians themselves are often uncertain about what information is collected and stored, as evidenced by our data (table 4). as more information becomes available only electronically, and because computers in the public areas are now used for much more than just accessing library catalog functions, it is becoming difficult to uphold the code of ethics and protect the privacy of users.

using authentication can also make it more difficult to use technology in the library. in order to authenticate, users may be required to start or restart a computer and/or log into or out of it. this takes time and requires the user to remember to log off the computer when finished. users often have difficulty keeping track of their user information and may require increased assistance (table 5).

table 5.

library staff or scheduling software may be required to help library guests obtain access to computer equipment. north carolina, like other states, does have laws governing the confidentiality of library records. librarians have long dealt with this situation by keeping as little data as possible; for example, many library circulation systems do not store data beyond the current checkout.
access logs that detail what resources a particular user has accessed would seem to fall under this legislation, although the wording in the law is vague.

information technology departments, legal counsel, and administrators, on the other hand, are often less concerned about privacy and intellectual freedom issues. more often their focus is on security, limiting access to those users affiliated with the institution, and monitoring use. being ready and able to provide data in response to subpoenas and court orders is often a priority. at western carolina university, illicit use of an unauthenticated computer in the student center led to an investigation by campus and county law enforcement. this case is still used as justification for needing to authenticate and monitor campus computer use even though the incident occurred many years ago. being able to track an individual's online activity is believed to increase security by ensuring adherence to institutional policies. authentication with individually assigned login credentials permits online activity to be traced to a specific account, whose owner can then be held accountable for the activity performed. librarians' responses to the survey indicate that these issues play a role in a library's decision to authenticate, as seen in the free-text responses in table 6.

tracking use through ip addresses, individual logins, and transaction logs allows scrutiny of users in cases of illegal or illicit use of computer resources. in many cases, this action is justified as being required by auditors or law enforcement agencies, though information regarding this is scarce. the authors of this article are not aware of any laws or auditing requirements in north carolina that require detailed tracking of library computer use.

some libraries indicated that it departments were concerned about the security of networks and/or computers. security can be undermined when generic accounts are used or when no authentication is required. by using individual logins, users can be restricted to specific network resources and can be monitored. when multiple computers use the same account for logging in, or when the login credentials are posted on each computer, security is compromised because use cannot be tracked to a specific user. in some libraries, these security issues have trumped librarians' concerns about intellectual freedom and privacy.

creating a profile as a result of these findings

given the number of characteristics collected about each library, it was assumed that some of the factors gathered might influence a decision to authenticate and allow for the possibility of creating a predictive profile. the data was collected from libraries within a fixed geographic region. the externally collected data and the survey data were coded and put into spss™, and a number of statistical tests were performed to find which factors might be statistically significant.
to further the geographical analysis, the data was also put into arcview™ to produce a map of north carolina, with different colored pins for academic libraries that authenticated versus those that did not, to see whether there was any pattern to the choice. (map 1)

to more completely explore the possible role that geographic information might play in the decision to authenticate, the population of the city or town in which the institution was located, enrollment, book volume, number of pcs, and total number of library it staff (scaled variables), as well as ordinal variables such as "who controlled the setup of the pcs," "do you differentiate between student and public pcs," and "known incidents of privacy and misuse," were also integrated into the analysis. the data collected could not predict whether an academic library would authenticate or not using logistic regression techniques, although those that differentiate between student and public pcs did have a higher probability. based on all our collected data and mapping, it is impossible to predict with any significance whether or not an academic library would authenticate.

so the short answer statistically is no. using all of the data collected, a statistically significant profile could not be created; however, there are general tendencies that the data was able to suggest.

map 1.

for those libraries that do authenticate, the average book volume is almost 400,000, the enrollment is around 5,600, the population of the city where the institution is located is 94,000, the total number of pcs in the public area is 54, and the average number of library it staff is 1.8.

for those libraries that do not authenticate, the average book volume is about 163,000, enrollment is around 3,000, the population is 53,000, the average number of pcs in the public area is about 39, and the average number of library it staff is 0.8.

libraries that authenticate tend to have statistically significant differences in book volume and in the number of pcs in the public area, with a t-test value of p < 1. student enrollment was the most statistically significant factor among those that authenticated, with a t-test value of p < 0.5. libraries that authenticate had many more students, more books, and a larger number of pcs in their public areas than libraries that didn't authenticate.

those libraries that didn't authenticate tended to be in smaller towns, more often had their public-area pcs set up by non-library it staff, and had fewer library it staff. sixty percent (60%) of the libraries that don't authenticate had zero library it staff.
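the comparison described above can be sketched briefly in code. the numbers below are invented stand-ins loosely modeled on the group averages reported here, not the survey data, and scipy and scikit-learn are used in place of the spss™ analysis the researchers actually ran.

    # illustrative sketch only: invented values, not the survey data.
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(42)

    # hypothetical enrollment and public-area pc counts for the two groups
    enroll_auth = rng.normal(5600, 2000, 75)   # 75 libraries that authenticate
    enroll_none = rng.normal(3000, 1500, 38)   # 38 libraries that do not
    pcs_auth = rng.normal(54, 20, 75)
    pcs_none = rng.normal(39, 15, 38)

    # two-sample t-test: does mean enrollment differ between the groups?
    t_stat, p_value = stats.ttest_ind(enroll_auth, enroll_none, equal_var=False)
    print(f"enrollment t = {t_stat:.2f}, p = {p_value:.4f}")

    # logistic regression: try to predict the authentication decision from
    # enrollment and pc count (standardized so the solver behaves well)
    X = np.column_stack([
        np.concatenate([enroll_auth, enroll_none]),
        np.concatenate([pcs_auth, pcs_none]),
    ])
    y = np.array([1] * 75 + [0] * 38)
    model = LogisticRegression().fit(StandardScaler().fit_transform(X), y)
    print("coefficients:", model.coef_, "intercept:", model.intercept_)

the t-test mirrors the group comparisons reported above, and the logistic regression mirrors the attempt to predict the authentication decision from the collected variables.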
while it was assumed at the outset of this research that the campus department responsible for the setup of the workstations in the public area (the library or it) would be a factor in whether authentication was used in the library, the data does not support this assumption statistically.

ethical questions about authentication as a result of these findings

there are a variety of reasons why a library might choose to authenticate despite the ethical issues associated with it; the protection and management of it resources and the mission of the institution are two likely scenarios. a library, especially one with a lot of use by unaffiliated users or guests, might choose to authenticate regardless of concerns in order to make sure its own users have preference for the pcs in the public area of the library. a private institution may choose to authenticate in order to limit access by members of the general public. of the 75 libraries that authenticate, 81% cited concerns about controlling use, overuse, and misuse. this study also found that in 25% of the total academic libraries, the library itself decided to authenticate without influence from external groups. this was a higher percentage than was expected. given librarians' professional concerns about intellectual freedom and privacy, we were very surprised that so many libraries chose to authenticate on their own.

we suspected that many librarians might not have a full understanding of the privacy issues created by requiring individual logins. based on this assumption, we expected that many librarians would not be fully aware of what user tracking data was being kept. examples include network authentication, tracking cookies, web browser history, and user sign-in sheets. the study found that librarians are often unsure of what data is being logged, with 51 (or 45%) of 113 libraries reporting this. only 19% reported knowing with certainty that no tracking data was kept. of those that did know that tracking data was being kept, most had no idea how long this data was retained.

conclusion

this study found that 66% (or 75) of the 113 surveyed north carolina academic libraries required some form of user authentication on their public computers. the researchers reviewed an extensive amount of data to identify the factors involved in this decision. these factors included individual demographics, such as city population, book volume, type of academic library, and enrollment. it was anticipated that by looking at a large pool of academic libraries within a specific region, a profile might emerge that would predict which libraries would choose to authenticate. even with comprehensive data about the 75 libraries that authenticated, a profile of a "typical" authenticating library could not be developed. the data did show two factors of statistical significance (enrollment and book volume) in a library's decision to authenticate.
however, the decision to authenticate could not be predicted. each library's decision to authenticate seems to be based on the unique situation of that library.

we expected to find that most libraries would authenticate due to pressure from external sources, such as campus it departments or administrators, or in response to incidents involving the computers in the public area. this study found that only 39% (or 44) of the libraries surveyed authenticated due to these factors, so our assumption was incorrect. surprisingly, we found that 25% (or 28) of libraries chose to authenticate on their own. the need to control the use of their limited resources seemed to take precedence over any other factors, including user privacy. we did expect to see a rise in the number of libraries that authenticated in the aftermath of 9/11; this we found to be true. looking at the prior research that defines an actual percentage of authentication in academic libraries, no matter how limited in scope (for example, just the arl libraries, responding libraries, etc.), there does seem to be a strong trend for academic libraries to authenticate.

our results, with 66% of academic libraries having authentication, support the conclusion that there is a continued trend of authentication that has steadily expanded over the past decade. this has happened in spite of librarians' traditional philosophy on access and academic freedom. libraries are seemingly relinquishing their ethical stance or have other priorities that make authentication an attractive solution to controlling the use of limited or licensed resources. our survey results show that many librarians may not fully understand the privacy risks inherent in authentication. slightly over half (52%) of the libraries reported that they did not know whether any computer or network log files were being kept, nor for how long they were kept.

the issues surrounding academic freedom, access to information, and privacy in the face of security concerns continue to affect library users. academic libraries in smaller communities are often the only nearby source of scholarly materials. traditionally these resources have been made available to community members, high school students, and others who require materials beyond the scope of the public or school library. as pointed out, restrictive authentication policies may hamper the ability of these groups to access the information they need. however, the data showed very little consistency to support this idea with respect to authentication in small towns and communities throughout the state.

some of the surveyed academic libraries made a strong statement that they are not authenticating the computers in their public areas and have every intention of continuing this practice.
these libraries are now in a distinct minority, and we expect their position will be continually challenged. for example, at western carolina university, we continue to provide open computers in the public areas of the library but are regularly pressed by our campus it department to implement authentication. we have so far been successful in resisting this pressure because of the commitment of our dean and librarians to preserving the privacy of our patrons.

further studies

as a follow-up to this study, we plan to contact the 38 libraries that did not authenticate to determine whether they now require authentication or have plans to do so. based on responses to this survey, we expect that many librarians are unaware of the degree to which authentication can undermine patron privacy. we suggest that an in-depth study be conducted to determine the degree of understanding among librarians about potential privacy issues with authentication, in the context of their longstanding professional position on academic freedom and patron confidentiality.

appendix a. survey questions

1. select the library you represent:

2. which library or library building are you reporting on?
• main library or the only library on campus
• medical library
• special library
• other

3. how many total pcs do you have in your library public area for the building you are reporting on?

4. how many library it or library systems staff does the library have?

5. does the library's it/systems staff control the setup of these pcs in the library public area?
• yes
• shared with it (campus computing center)
• it (campus computing center)
• no (please specify who does control the setup of these pcs)

authentication

6. is any type of authentication required or mandated to use any of the pcs in the library's public area?

7. were you required to use this authentication on any of the pcs in the library's public area?

8. what organization or group required or mandated the library to use authentication on pcs in the library public area?
• the library itself
• it or some unit within it
• other (please explain)
• not sure
• college/university administration

9. do you know the reasons authentication is being used?
• mandated by parent institution or group
• prevent misuse of resources
• other (please specify)
• control the public's use of these pcs

10. what led the library to control the use of pcs?
• inability of students to use the resource due to overuse by the public
• computer abuse
• other (please specify)

11. how are the users informed about the authentication policy?
• screen saver
• web page
• login or sign-on screen
• training session or other presentation
• other (please specify)
12. what form of authentication do you use?
• manual paper sign-in sheets
• individual pc-based sign-in or scheduling software
• centralized or networked authentication such as active directory, novell, or an erp (enterprise resource planning) system with a college/university-wide identifier
• pre-set or temporary authorization logins or guest cards handed out (please specify the length of time this is good for)
• other (please specify)

13. how does the library handle user privacy of authentication?
• anonymous access (each session is anonymous with repeat users not identified)
• identified access
• pseudonymous access with demographic identification (characteristics of users determined but not actual identity)
• pseudonymous access (repeat users identified but not the identity of a particular user)

14. when did you implement authentication of the pcs in the library public area?
• this year
• last year
• 3-5 years ago
• 5-10 years ago
• don't know

student-only pcs

15. do you differentiate between student-only pcs and guest/public use pcs in the library public area?

17. how many pcs are designated as student-only pcs in the library's public area?

18. do you require authentication to access student-only pcs in the library's public area?

19. what does authentication provide on a student-only pc once an affiliated person logs in?
• access to specialized software
• access to storage space
• printing
• internet access
• other (please specify)

20. once done with an authenticated session on a student-only pc, how is authentication on a pc removed?
• user is required to log out
• user is timed out
• other (please specify)

21. what authentication issues have you seen in your library with student-only pcs?
• id management issues from the user (e.g., forgetting passwords)
• id management issues from the network (e.g., updating changes in a timely fashion)
• timing-out issues
• authentication system becomes unavailable
• other (please specify)

guest/public pcs

22. how many pcs are designated for guest or public use in the library's public area?

23. describe the location of these guest/public use pcs.
• line-of-sight to library service desk
• all in one general area
• scattered throughout the library
• other (please specify)
• in several groups around the library

24. do you require authentication to access guest/public use pcs in the library's public area?

25. what does authentication allow for guests or the public that log in?
• limited software
• control, limit, or block web sites that can be accessed
• limited or different charge for printing
• timed or scheduled access
• internet access
• other (please specify)
• control, limit, or block access to library resources (such as databases or other subscription-based services)

26. are there different types of pcs in your library public area? check those that apply.
• all pcs are the same
• some have different types of software (like browser only)
• some have time or scheduling limitations
• some have printing limitations
• some have specialized equipment attached (like scanners, microfiche readers, etc.)
• some control, limit, or block web sites that can be accessed
• some control, limit, or block access to library resources (such as databases or other subscription-based services)
• other (please specify)

wireless access

27. do you have wireless access in your library public area?

28. do you require authentication for your wireless access in the library public area?

29. does the library have its own wireless policies different from the campus's policy?

30. what methods are used to give guests or the public access to your wireless network? check those that apply.
• no access for guests or the general public
• paperwork and/or signature required before access is given
• limited access by time
• open access
• limited access by resource (such as internet access only)
• other

incident reports

31. has your library had any incidents of breach of privacy that you know about?

32. has your library had any incidents of improper use of public pcs (such as cyber stalking, child pornography, terrorism, etc.)?

33. have these incidents required investigation or digital forensics work to be done?

34. who handled the work of investigation?
• library it or library systems staff
• it or campus computing center
• campus police
• other law enforcement
• unsure
• other (please specify)

computer activity logs

35. do you know what computer activity logs are kept? (if unsure, end; if not, ask question 36)
• authentication logs (who logged in)
• browsing history (kept on pc after reboot)
• browsing history (kept in centralized log files)
• scheduling logs (manual or software)
• software use logs
• none
• unsure
• other (please specify)

36. do you know how long computer activity logs are kept?
• 24 hours or less
• week
• month
• year
• unknown

references

1. pam dixon, "ethical risks and best practices," journal of library administration 47, no. 3/4 (may 2008): 157.

2. scott carlson, "to use that library computer, please identify yourself," chronicle of higher education, june 25, 2004, a39.

3. lori driscoll, library public access workstation authentication, spec kit 277 (washington, dc: association of research libraries, 2003).
4. martin cook and mark shelton, managing public computing, spec kit 302 (washington, dc: association of research libraries, 2007).

5. diana oblinger, "it security and academic values," in computer and network security in higher education, ed. mark luker and rodney petersen (jossey-bass, 2003): 1-13.

6. code of ethics of the american library association, http://www.ala.org/advocacy/proethics/codeofethics/codeethics.

7. fair information practices adopted by the organization for economic cooperation and development, http://www.oecd.org/sti/security-privacy.

8. "niso best practices for designing web services in the library context," niso rp-2006-01 (bethesda, md: national information standards organization, 2006).

9. dixon, "ethical issues implicit in library authentication and access management."

10. howard carter, "misuse of library public access computers: balancing privacy, accountability, and security," journal of library administration 36, no. 4 (april 2002): 29-48.

11. julie still and vibiana kassabian, "the mole's dilemma: ethical aspects of public internet access in academic libraries," internet reference services quarterly 4, no. 3 (january 1, 1999): 7-22.

12. don essex, "opposing the usa patriot act: the best alternative for american librarians," public libraries 43, no. 6 (november 2004): 331-340.

13. lynne weber and peg lawrence, "authentication and access: accommodating public users in an academic world," information technology & libraries 29, no. 3 (september 2010): 128-140.

14. nancy courtney, "barbarians at the gates: a half-century of unaffiliated users in academic libraries," journal of academic librarianship 27, no. 6 (november 2001): 473.

15. nancy courtney, "unaffiliated users' access to academic libraries: a survey," journal of academic librarianship 29, no. 1 (2003): 3-7.

16. barbara best-nichols, "community use of tax-supported academic libraries in north carolina: is unlimited access a right?" north carolina libraries 51 (fall 1993): 120-125.

17. nancy courtney, "authentication and library public access computers: a call for discussion," college & research libraries news 65, no. 5 (may 2004): 269-277.

18. rita barsun, "library web pages and policies toward 'outsiders': is the information there?" public services quarterly 1, no. 4 (october 2003): 11-27.

19. american library directory: a classified list of libraries in the united states and canada, with personnel and statistical data, 62nd ed. (new york: information today, 2009).

20. http://statelibrary.ncdcr.gov/ld/aboutlibraries/nclibrarydirectory2011.pdf.

21. karen schneider, "so they won't hate the wait: time control for workstations," american libraries 29, no. 11 (1998): 64.

22. code of ethics of the american library association.

23. karen schneider, "privacy: the next challenge," american libraries 30, no. 7 (1999): 98.
hoist by their own petard

a funny thing happened at ala midwinter. what's more, it was fascinating as well, for it was one of the loveliest examples of "communications dysfunction" i've ever seen. (dysfunction: impaired or abnormal functioning.) librarians-information scientists-have always been concerned with the transfer of information. in recent times, this concern has been explicitly identified as constituting the major component of the profession's domain. whether one interprets information to be the book, and discusses its transfer in terms of acquisitions, circulation, and interlibrary loan, or one interprets information to be the datum, and discusses transfer in terms of access, retrieval, and transfer, the fact remains that information transfer is the area of concern of the information profession.

yet, as is already evident from the paragraph above, the medium being used to relay the message, the unit which is basic to the process of information transfer, i.e., the word, is a fractious thing. one would think that informationalists would be among the most alert to this frailty of language; yet, though the problem has been addressed at great length by a great many, members of our profession have not been predominant among them. we, too, use words ever more loosely, violate structure ever more often, and transpose jargon ever more freely-unaware, and, apparently, uncaring that in the process we are vitiating the very foundation of our field.

and thus, at the palmer house in chicago, during a very balmy january midwinter meeting of the american library association, a select group of professional practitioners who had gathered together to work together found themselves caught in their own trap. they were unable to communicate! information specialists-listening without hearing, reading without comprehending, talking without communicating. it was almost frightening. "network" concerns got defined in terms of the need for reimbursement for interlibrary loan. the phrases "data base interchange," "machine-readable record exchange," and "networking" were being used interchangeably, engendering damaging misconceptions. the distinction between "contract negotiation assistance" (which clr will provide the anable serials group) and "contracting" (which clr is not doing here) was not made. legislative "networks" described procedural, not substantive, activity. the jargon of internal revenue code section 4942(j)(3) (operating foundation) and the jargon of the technical sector (operations) were interpreted as being synonymous. and the word standard lost its identity altogether.

the irony is overwhelming. like the old adage about the shoemaker's children who don't have shoes, it would appear that it is the information specialists who cannot communicate.-ruth l. tighe, new england library information network

evaluation of semi-automatic metadata generation tools: a survey of the current state of the art

jung-ran park and andrew brenza

information technology and libraries | september 2015

abstract

assessment of the current landscape of semi-automatic metadata generation tools is particularly important considering the rapid development of digital repositories and the recent explosion of big data.
utilization of semi-automatic metadata generation is critical in addressing these environmental changes and may be unavoidable in the future considering the costly and complex operation of manual metadata creation. to address such needs, this study examines the range of semi-automatic metadata generation tools (n = 39) while providing an analysis of their techniques, features, and functions. the study focuses on open-source tools that can be readily utilized in libraries and other memory institutions. the challenges and current barriers to implementation of these tools were identified. the greatest area of difficulty lies in the fact that the piecemeal development of most semi-automatic generation tools addresses only part of the problem of semi-automatic metadata generation, providing solutions for one or a few metadata elements but not the full range of elements. this indicates that significant local effort will be required to integrate the various tools into a coherent working whole. suggestions toward such efforts are presented for future developments that may assist information professionals with incorporating semi-automatic tools into their daily workflows.

jung-ran park (jung-ran.park@drexel.edu) is editor, journal of library metadata, and associate professor, college of computing and informatics, drexel university, philadelphia. andrew brenza (apb84@drexel.edu) is project assistant, college of computing and informatics, drexel university, philadelphia.

introduction

with the rapid increase in all types of information resources managed by libraries over the last few decades, the ability of the cataloging and metadata community to describe those resources has been severely strained. furthermore, the reality of stagnant and decreasing library budgets has prevented the library community from addressing this issue with concomitant staffing increases. nevertheless, the ability of libraries to make information resources accessible to their communities of users remains a central concern. thus there is a critical need to devise efficient and cost-effective ways of creating bibliographic records so that users are able to find, identify, and obtain the information resources they need.

one promising approach to managing the ever-increasing amount of information is semi-automatic metadata generation tools. semi-automatic metadata generation tools concern the use of software to create metadata records with varying degrees of supervision from a human specialist.1 in their ideal form, semi-automatic metadata generation tools are capable of extracting information from structured and unstructured information resources of all types and creating quality metadata that facilitates not only bibliographic record creation but also semantic interoperability, a critical factor for resource sharing and discovery in the networked environment.
through the use of semi-automatic metadata generation tools, the library community has the potential to address many issues related to the increase of information resources, the strain on library budgets, and the need to create high-quality, interoperable metadata records, and, ultimately, to improve the provision of information resources to users.

there are many potential benefits to semi-automatic metadata generation. the first is scalability. because of the quantity of information resources and the costly and time-consuming nature of manual metadata generation,2 it is increasingly apparent that there simply are not enough information professionals available to satisfy the metadata-generation needs of the library community. semi-automatic metadata generation, on the other hand, offers the promise of using high levels of computing power to manage large amounts of information resources. in addition to scalability, semi-automatic metadata generation also offers potential cost savings through a decrease in the time required to create effective records. furthermore, the time savings would allow information professionals to focus on tasks that are more conceptually demanding and thus not suitable for automatic generation. finally, because computers can perform repetitive tasks with relative consistency when compared to their human counterparts, automatic metadata generation promises the ability to create more consistent records. a potential increase in the consistency of quality metadata records would, in turn, increase the potential for interoperability and thereby the accessibility of information resources in general. thus semi-automatic metadata generation offers the potential not only to ease resource-description demands on the library community but also to improve resource discovery for its users.

goals of the study

assessment of the current landscape of semi-automatic metadata generation tools is particularly important considering the fast development of digital repositories and the recent explosion of data and information. utilization of semi-automatic metadata generation is critical to address such environmental changes and may be unavoidable in the future considering the costly and complex operation of manual metadata creation. even though there are promising experimental studies that exploit various methods and sources for semi-automatic metadata generation,3 there is a lack of studies assessing and evaluating the range of tools that have been developed, implemented, or improved. to address such needs, this study aims to examine the current landscape of semi-automatic metadata generation tools while providing an evaluative analysis of their techniques, features, and functions. the study primarily focuses on open-source tools that can be readily utilized in libraries and other memory institutions.
the study also highlights some of the challenges still facing the continued development of semi-automatic tools and the current barriers to their incorporation into the daily workflows for information organization and management. future directions for the further development of tools are also discussed.

toward this end, a critical review of the literature in relation to semi-automatic metadata generation tools published from 2004 to 2014 was conducted. databases such as library and information sciences abstracts and library, information science and technology abstracts were searched and germane articles identified through review of titles and abstracts. because the problem of creating viable tools for the reliable automatic generation of metadata is not a problem limited to the library and information science professions,4 database searches were expanded to include databases pertinent to computer science, including proquest computing, academic search premier, and applied science and technology. keywords such as "automatic metadata generation," "metadata extraction," "metadata tools," and "text mining," including their stems, were used to explore the databases. in addition to keyword searching, relevant articles were also identified within the reference sections of articles already deemed pertinent to the focus of the survey, as well as by expanding results lists through the application of relevant subject terms assigned to pertinent articles. to ensure that the latest, most reliable developments in automatic metadata generation were reviewed, various filters, such as date range and peer review, were employed. once tools were identified, their capabilities were tested (when possible), their features were noted, and overarching developments were determined.

the remainder of the article provides an overview of the primary techniques developed for the semi-automatic generation of metadata and a review of the open-source metadata generation tools that employ them. the challenges and current barriers to semi-automatic metadata tool implementation are described, as are suggestions for future developments that may assist information professionals with integration of semi-automatic tools within the daily workflow of technical services departments.

current techniques for the automatic generation of metadata

as opposed to manual metadata generation, semi-automatic metadata generation relies on machine methods to assist with or to complete the metadata-creation process. greenberg distinguished between two methods of automatic metadata generation: metadata extraction and metadata harvesting.5 metadata extraction in general employs automatic indexing and information retrieval techniques to generate structured metadata using the original content of resources. metadata harvesting, on the other hand, concerns a technique to automatically gather metadata from individual repositories in which metadata has been produced by semi-automatic or manual approaches.
the harvested metadata can be stored in a central repository for future resource retrieval.

within this dichotomy of extraction methods, there are several other more specific techniques that researchers have developed for the semi-automatic generation of metadata. polfreman et al. identified an additional six techniques that have been developed over the years: meta-tag harvesting, content extraction, automatic indexing, text and data mining, extrinsic data auto-generation, and social tagging.6 although the last technique is not properly a semi-automatic metadata generation technique, because it is used to generate metadata with a minimum of intervention required by metadata professionals it can be viewed as a possible mode to streamline the metadata creation process.

both greenberg and polfreman provide comprehensive, high-level characterizations of the techniques employed in current semi-automatic metadata generation tools. however, an evaluation of these techniques within the context of a broad survey of the tools themselves and a comprehensive enumeration of currently available tools are not addressed. thus, although these techniques will be examined for the remainder of this section, they serve simply as a framework through which this study provides a current and comprehensive analysis of the tools available for use today. each section provides an overview of the relevant technique, a discussion of the most current research related to it, and the tools that employ that technique.

the tables included in each section provide lists of the semi-automatic metadata generation tools (n = 39) evaluated in the course of this survey. the information presented in the tables is designed to provide a characterization of each tool: its name, its online location, the technique(s) used to generate metadata, and a brief description of the tool's functions and features. only those tools that are currently available for download or for use as web services at the time of this writing are included. furthermore, the listed tools have not been strictly limited to metadata-generation applications but also include some content management system software (cmss), as these generally provide some form of semi-automatic metadata extraction. typically, cmss are capable of extracting technical metadata as well as data that can be found in the meta-tags of information resources, such as the file name, and using that information as the title of a record.

meta-tag extraction

meta-tag extraction is a computing process whereby values for metadata fields are identified and populated through an examination of metadata tags within or attached to a document. in other words, it is a form of metadata harvesting and, possibly, conversion of that metadata into other formats. marcedit, the most widely used semi-automatic tool for the generation of metadata in us libraries,7 is an example of this technique.
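before turning to marcedit and the other tools in detail, a minimal sketch may make the technique concrete. the python fragment below is purely illustrative and is not drawn from any of the tools reviewed in this study; it uses the standard-library html parser, and its small tag-to-element mapping is an assumed, simplified crosswalk rather than a complete dublin core mapping.

    # a minimal, illustrative meta-tag harvester (not one of the reviewed tools):
    # it reads html text and maps common meta-tags to dublin core elements.
    from html.parser import HTMLParser

    # assumed, simplified mapping from meta-tag names to dublin core elements
    META_TO_DC = {
        "dc.title": "title", "dc.creator": "creator", "dc.date": "date",
        "description": "description", "keywords": "subject", "author": "creator",
    }

    class MetaTagHarvester(HTMLParser):
        def __init__(self):
            super().__init__()
            self.record = {}          # harvested dublin core values
            self.title_parts = []
            self.in_title = False

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta":
                name = (attrs.get("name") or "").lower()
                content = (attrs.get("content") or "").strip()
                dc_element = META_TO_DC.get(name)
                if dc_element and content:
                    self.record.setdefault(dc_element, content)
            elif tag == "title":
                self.in_title = True

        def handle_data(self, data):
            if self.in_title:
                self.title_parts.append(data)

        def handle_endtag(self, tag):
            if tag == "title":
                self.in_title = False
                # fall back to the <title> element when no dc.title meta-tag exists
                self.record.setdefault("title", "".join(self.title_parts).strip())

    def harvest(html_text):
        parser = MetaTagHarvester()
        parser.feed(html_text)
        return parser.record

    if __name__ == "__main__":
        sample = ('<html><head><title>sample page</title>'
                  '<meta name="DC.creator" content="park, jung-ran">'
                  '<meta name="keywords" content="metadata; cataloging"></head></html>')
        print(harvest(sample))  # prints the harvested dublin core dictionary

a real harvester would also have to fetch pages, handle character encodings, and cope with missing or inconsistent meta-tags, which is precisely the weakness discussed below.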
marcedit  essentially  harvests  metadata  from  open   archives  initiative  protocol  for  metadata  harvesting  (oai-­‐pmh)  compliant  records  and  offers  the   user  the  opportunity  to  convert  those  records  to  a  variety  of  formats,  including  machine-­‐readable   cataloging  (marc),  machine-­‐readable  cataloguing  in  xml  (marc  xml),  metadata  object   description  schema  (mods),  and  encoded  archival  description  (ead).  it  also  offers  the   capabilities  of  converting  records  from  any  of  the  supported  formats  to  any  of  the  other  supported   formats.   other  examples  of  this  technique  are  the  web  services  editor-­‐converter  dublin  core  metadata  and   firefox  dublin  core  viewer  extension.  both  of  these  programs  search  html  files  on  the  web  and   convert  information  found  in  html  meta-­‐tags  to  dublin  core  elements.  in  the  cases  of  marcedit     information  technology  and  libraries  |  september  2015       26   and  editor-­‐converter  dublin  core,  users  are  presented  with  the  converted  information  in  an   interface  that  allows  the  user  to  edit  or  refine  the  data.     figure  1  provides  an  illustration  of  the  extracted  metadata  of  the  new  york  times  homepage  using   editor-­‐converter  dublin  core,  while  figure  2  offers  an  illustration  of  the  editor  that  this  web   service  provides.         figure  1.  screenshot  of  extracted  dublin  core  metadata  using  editor-­‐converter  dublin  core.     figure  2.  screenshot  of  editor-­‐converter  dublin  core  editing  tool  (only  eight  of  the  sixteen  fields   are  visible  in  this  screenshot).     evaluation  of  semi-­‐automatic  metadata  generation  tools|  park  and  brenza     doi:  10.6017/ital.v34i3.5889   27   perhaps  the  biggest  weakness  to  this  type  of  tool  is  that  it  entirely  depends  on  the  quality  of  the   metadata  from  which  the  programs  harvest.  this  can  be  most  readily  seen  in  the  above  figure  by   the  lack  of  values  for  a  number  of  the  dublin  core  fields  for  the  the  new  york  times  website.   programs  that  solely  employ  the  technique  of  meta-­‐tag  harvesting  are  unable  to  infer  values  for   metadata  elements  that  are  not  already  populated  in  the  source.     table  1  lists  the  tools  that  support  meta-­‐tag  harvesting  either  as  the  sole  technique  or  as  one  of  a   suite  of  techniques  used  to  generate  metadata  from  resources.  of  the  thirty-­‐nine  tools  evaluated   for  this  study,  nineteen  support  meta-­‐tag  harvesting.   tool  name   location   techniques   functions/features   anvl/erc   kernel  metadata   conversion   toolkit   http://search.cpan.org/~jak/file-­‐ anvl/anvl     meta-­‐tag  harvester   a  utility  that  can  automatically   convert  records  in  the  anvl   format  into  other  formats  such  as   xml,  json  (javascript  object   notation),  turtle  or  plain,  among   others.   apache  poi  –   text  extractor   http://poi.apache.org/download.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   apache  poi  provides  basic  text   extraction  for  all  project   supported  file  formats.  in   addition  to  the  (plain)  text,   apache  poi  can  access  the   metadata  associated  with  a  given   file,  such  as  title  and  author.     
apache  tika   http://tika.apache.org/     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   built  on  apache  poi,  the  apache   tika  toolkit  detects  and  extracts   metadata  and  text  content  from   various  documents.   ariadne   harvester   http://sourceforge.net/projects/ariadn ekps/files/?source=navbar     meta-­‐tag  harvester   a  harvester  of  oai-­‐pmh   compliant  records  which  can  be   converted  to  various  other   schema  such  as  learning  object   metadata  (lom).       bibframe  tools   http://www.loc.gov/bibframe/implem entation/     meta-­‐tag  harvester   bibframe  offers  a  number  of   tools  for  the  conversion  of   marcxml  documents  to   bibframe  documents.    web   service  and  downloadable   software  are  both  available.   data  fountains   http://datafountains.ucr.edu/     content  extractor;   automatic  indexer;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents  and  first   extracts  information  contained  in   meta-­‐tags.    if  information  is   unavailable  in  meta-­‐tags,  the   program  will  use  other   techniques  to  assign  values.     includes  a  focused  web  crawler   that  can  target  websites   concerning  a  specific  subject.           information  technology  and  libraries  |  september  2015       28   dublin  core  meta   toolkit   http://sourceforge.net/projects/dcmet atoolkit/files/?source=navbar     meta-­‐tag  harvester   transforms  data  collected  via   different  methods  into  dublin   core  (dc)  compatible  metadata.   dspace   http://www.dspace.org/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator;  social   tagging   automatically  extracts  technical   information  regarding  file  format   and  size.    can  also  extract  some   information  from  meta-­‐tags.   editor-­‐converter   dublin  core   metadata   http://www.library.kr.ua/dc/dcedituni e.html   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  converting  them  to  dc.   embedded   metadata   extraction  tool   (emet)   http://www.artstor.org/global/g-­‐ html/download-­‐emet-­‐public.html     content  extractor;   emet  is  a  tool  designed  to   extract  metadata  embedded  in   jpeg  and  tiff  files.   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   firefox  dublin   core  viewer   extension   http://www.splintered.co.uk/experime nts/73/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  displaying  them  in  dublin   core.   marcedit   http://marcedit.reeset.net/   meta-­‐tag  harvester   harvests  oai-­‐pmh  compliant   data  and  converts  it  to  various   formats  including  dc  and  marc.   metatag   extractor   software   http://meta-­‐tag-­‐ extractor.software.informer.com/     meta-­‐tag  harvester   permits  customizable  extraction   features,  harvesting  meta-­‐tags  as   well  as  contact  information  from   websites.   my  meta  maker   http://old.isn-­‐ oldenburg.de/services/mmm/     meta-­‐tag  harvester   can  convert  manually  entered   data  into  dc.   photo  rdf-­‐gen   http://www.webposible.com/utilidade s/photo_rdf_generator_en.html   meta-­‐tag  harvester   generates  dublin  core  and   resource  description  framework   (rdf)  output  from  manually   entered  input.   
pymarc   https://github.com/edsu/pymarc     meta-­‐tag  harvester   scripting  tool  in  python  language   for  the  batch  processing  of  marc   records,  similar  to  marcedit.       repomman   http://www.hull.ac.uk/esig/repomman /index.html   meta-­‐tag  harvester;   content  extractor;   extrinsic  auto-­‐ generator   automatically  extracts  various   elements  for  documents   uploaded  to  fedora  such  as   author,  title,  description,  and  key   words,  among  others.    results  are   presented  to  user  for  review.   sherpa/romeo   http://www.sherpa.ac.uk/romeo/api.h tml     meta-­‐tag  harvester   a  machine-­‐to-­‐machine   application  program  interface   (api)  that  permits  the  automatic   look-­‐up  and  importation  of   publishers  and  journals.   url  and  metatag   extractor   http://www.metatagextractor.com/     meta-­‐tag  harvester   permits  the  targeted  searching  of   websites  and  extracts  urls  and   meta-­‐tags  from  those  sites.   table  1.  semi-­‐automatic  tools  that  support  meta-­‐tag  harvesting.     evaluation  of  semi-­‐automatic  metadata  generation  tools|  park  and  brenza     doi:  10.6017/ital.v34i3.5889   29   content  extraction     content  extraction  is  a  form  of  metadata  extraction  whereby  various  computing  techniques  are   used  to  extract  information  from  the  information  resource  itself.  in  other  words,  these  techniques   do  not  rely  on  the  identification  of  relevant  meta-­‐tags  for  the  population  of  metadata  values.  an   example  of  this  technique  is  the  kea  application,  a  program  developed  at  the  new  zealand  digital   library  that  uses  machine  learning,  term  frequency-­‐inverse  document  frequency  (tf.idf)  and   first-­‐occurrence  techniques  to  identify  and  assign  key  phrases  from  the  full  text  of  documents.8   the  major  advantage  of  this  type  of  technique  is  that  the  extraction  of  metadata  can  be  done   independently  of  the  quality  of  metadata  associated  with  any  given  information  resource.  another   example  of  a  tool  utilizing  this  technique  is  the  open  text  summarizer,  an  open-­‐source  program   that  offers  the  capability  of  reading  a  text  and  extracting  important  sentences  to  create  a  summary   as  well  as  to  assign  keywords.  figure  3  provides  a  screenshot  of  what  a  summarized  text  might   look  like  using  the  open  text  summarizer.           figure  3.  open  text  summarizer:  sample  summary  of  text.   another  form  of  this  technique  often  relies  on  the  predictable  structure  of  certain  types  of   documents  to  identify  candidate  values  for  metadata  elements.  for  instance,  because  of  the   reliable  format  of  scholarly  research  papers—which  generally  include  a  title,  author,  abstract,   introduction,  conclusion,  and  reference  sections  in  predictable  ways—this  format  can  be  exploited   by  machines  to  extract  metadata  values  from  them.  several  projects  have  been  able  to  exploit  this   technique  in  combination  with  machine  learning  algorithms  to  extract  various  forms  of  metadata.     
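the flavor of this structure-based approach can be suggested with a small, hypothetical python sketch; the rules below (first non-empty line as title, the following line as byline, and the text between an "abstract" heading and the next heading as description) are simplified assumptions, whereas the projects described next combine rules of this kind with trained classifiers.

    # an illustrative, rule-based extractor that exploits the predictable
    # structure of a research paper supplied as plain text. the heuristics
    # here are deliberately naive and are assumptions for the example only.
    import re

    def extract_structural_metadata(text):
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        record = {}
        if lines:
            record["title"] = lines[0]        # heuristic: first non-empty line
        if len(lines) > 1:
            record["creator"] = lines[1]      # heuristic: byline follows the title
        # heuristic: the abstract runs from the "abstract" heading to the next heading
        match = re.search(r"\babstract\b\s*(.+?)\s*(?:\bintroduction\b|\bkeywords\b)",
                          text, flags=re.IGNORECASE | re.DOTALL)
        if match:
            record["description"] = " ".join(match.group(1).split())
        return record

    sample = """a study of metadata quality
    jane smith, example university

    abstract
    this paper examines record quality in three repositories.

    introduction
    ..."""
    print(extract_structural_metadata(sample))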
for  instance,  in  the  randkte  project,  optical  character  recognition  software  was  used  to  scan  a   large  quantity  of  legal  documents  from  which,  because  of  the  regularity  of  the  documents’     information  technology  and  libraries  |  september  2015       30   structure,  structural  metadata  such  as  chapter,  section,  and  page  number  could  be  extracted.9  in   contrast,  the  kovacevic’s  project  used  the  predictable  structure  of  scholarly  articles,  converting   documents  from  pdf  to  html  files  while  preserving  the  formatting  details  and  used  classification   algorithms  to  extract  metadata  regarding  title,  author,  abstract,  and  keywords,  among  other   elements.10   table  2  lists  the  tools  that  support  content  extraction  either  as  the  sole  technique  or  as  one  of  a   suite  of  techniques  used  to  generate  metadata  from  resources.  of  the  thirty-­‐nine  tools  evaluated   for  this  study,  twenty  tools  support  some  form  of  content  extraction.   tool  name   location   techniques   functions/features   apache  poi— text  extractor   http://poi.apache.org/download.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   apache  poi  provides  basic  text   extraction  for  all  project   supported  file  formats.  in   addition  to  the  (plain)  text,   apache  poi  can  access  the   metadata  associated  with  a  given   file,  such  as  title  and  author.     apache   standol   https://stanbol.apache.org/     content  extractor;   automatic  indexer   extracts  semantic  metadata  from   pdf  and  text  files.  can  apply   extracted  terms  to  ontologies.   apache  tika   http://tika.apache.org/     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   built  on  apache  poi,  the  apache   tika  toolkit  detects  and  extracts   metadata  and  text  content  from   various  documents.   biblio  citation   parser   http://search.cpan.org/~mjewell/   biblio-­‐citation-­‐parser-­‐1.10/     content  extractor   a  set  of  modules  for  citation   parsing.   catmdedit   http://catmdedit.sourceforge.net/     content  extractor   catmdedit  allows  the  automatic   creation  of  metadata  for   collections  of  related  resources,   in  particular  spatial  series  that   arise  as  a  result  of  the   fragmentation  of  geometric   resources  into  datasets  of   manageable  size  and  similar   scale.   crossref   http://www.crossref.org/   simpletextquery/     content  extractor   this  web  service  returns  digital   object  identifiers  for  inputted   references.     data   fountains   http://datafountains.ucr.edu/     content  extractor;   automatic  indexer;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents  and  first   extracts  information  contained  in   meta-­‐tags.  if  information  is   unavailable  in  meta-­‐tags,  the   program  will  use  other   techniques  to  assign  values.   includes  a  focused  web  crawler   that  can  target  websites   concerning  a  specific  subject.       
evaluation  of  semi-­‐automatic  metadata  generation  tools|  park  and  brenza     doi:  10.6017/ital.v34i3.5889   31   embedded   metadata   extraction   tool  (emet)   http://www.artstor.org/global/g   -­‐html/download-­‐emet-­‐public.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   emet  is  a  tool  designed  to  extract   metadata  embedded  in  jpeg  and   tiff  files.   freecite   http://freecite.library.brown.edu/     content  extractor   free  parsing  tool  for  the   extraction  of  reference   information.  can  be  downloaded   or  used  as  a  web  service.     general   architecture   for  text   engineering   (gate)   http://gate.ac.uk/overview.html     content  extractor;   automatic  indexer;   natural  language  processor  and   information  extractor.   kea     http://www.nzdl.org/kea/index_old   .html#download   content  extractor;   automatic  indexer   analyzes  the  full  texts  of   resources  and  extracts   keyphrases.  keyphrases  can  also   be  mapped  to  customized   ontologies  or  controlled   vocabularies  for  subject  term   assignment.   metagen   http://www.codeproject.com/articles /41910/metagen-­‐a-­‐project   -­‐metadata-­‐generator-­‐for-­‐visual-­‐st     content  extractor;   automatic  indexer   used  to  build  a  metadata   generator  for  silverlight  and   desktop  clr  projects,  metagen   can  be  used  as  a  replacement  for   static  reflection  (expression   trees),  reflection  (walking  the   stack),  and  various  other  means   for  deriving  the  name  of  a   property,  method,  or  field.     metagenerator   http://extensions.joomla.org/   extensions/site-­‐management/seo-­‐a   -­‐metadata/meta-­‐data/11038   content  extractor   a  plugin  that  automatically   generates  description  and   keyword  meta-­‐tags  by  pulling   text  from  joomla  content.  with   this  plugin  you  can  also  control   some  title  options  and  add  url   meta-­‐tags.     ont-­‐o-­‐mat   http://projects.semwebcentral.org/   projects/ontomat/     content  extractor   assists  user  with  annotation  of   websites  that  are  semantic  web-­‐ compliant.  may  now  include  a   feature  that  automatically   suggests  portions  of  the  website   to  annotate.   open  text   summarizer   http://libots.sourceforge.net/   content  extractor   extracts  pertinent  sentences  from   a  resource  to  build  a  free  text   description.     information  technology  and  libraries  |  september  2015       32   parscit   http://wing.comp.nus.edu.sg/parscit/ #ws     content  extractor   open-­‐source  string-­‐parsing   package  for  the  extraction  of   reference  information  from   scholarly  articles.   repomman   http://www.hull.ac.uk/esig/   repomman/index.html   meta-­‐tag  harvester;   content  extractor;   extrinsic  auto-­‐ generator   automatically  extracts  various   elements  for  documents   uploaded  to  fedora  such  as   author,  title,  description,  and  key   words,  among  others.  results  are   presented  to  user  for  review.   simple   automatic   metadata   generation   interface   (samgi)   http://hmdb.cs.kuleuven.be/amg/   download.php   content  extractor;   extrinsic  auto-­‐ generator   a  suite  of  tools  that  is  able  to   automatically  extract  metadata   elements  such  as  key  phrase  and   language  from  documents  as  well   as  from  the  context  in  which  a   document  exists.     
termine   http://www.nactem.ac.uk/software/termine/   content extractor   extracts keywords from texts through c-value analysis and acromine, an acronym identifier and dictionary. available as a free web service for academic use.
yahoo content analysis api   https://developer.yahoo.com/contentanalysis/   content extractor; automatic indexer   the content analysis web service detects entities/concepts, categories, and relationships within unstructured content. it ranks those detected entities/concepts by their overall relevance, resolves them if possible into wikipedia pages, and annotates tags with relevant metadata.

table 2. semi-automatic tools that support content extraction.

automatic indexing

in the same way as content extraction, automatic indexing involves the use of machine learning and rule-based algorithms to extract metadata values from within information resources themselves, rather than relying on the content of meta-tags applied to resources. however, this technique also involves the mapping of extracted metadata terms to controlled vocabularies such as the library of congress subject headings (lcsh), the getty thesaurus of geographic names (tgn), or the library of congress name authority file (lcnaf), or to domain-specific or locally developed ontologies. thus, in this technique, researchers use classifying and clustering algorithms to extract relevant metadata from texts. term-frequency statistics, or tf.idf, which determines the likelihood of keyword applicability through a term's relative frequency within a given document as opposed to its relative infrequency in related documents, are commonly used in this technique.

projects such as johns hopkins university's automatic name authority control (anac) tool utilize this technique to extract the names of composers within its sheet music collections and to assign the authorized form of those names based on comparisons with the lcnaf.11 erbs et al. also use this technique to extract key phrases from german educational documents, which are then used to assign index terms, thereby increasing the degree to which related documents are collocated within the repository and the consistency of subject term application.12

table 3 lists the tools that support automatic indexing either as the sole technique or as one of a suite of techniques used to generate metadata from resources. of the thirty-nine tools evaluated for this study, seven tools support some form of automatic indexing.

tool name   location   techniques   functions/features
apache poi—text extractor   http://poi.apache.org/download.html   content extractor; meta-tag harvester; extrinsic auto-generator   apache poi provides basic text extraction for all project supported file formats. in addition to the (plain) text, apache poi can access the metadata associated with a given file, such as title and author.
apache  tika   http://tika.apache.org/     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   built  on  apache  poi,  the  apache   tika  toolkit  detects  and  extracts   metadata  and  text  content  from   various  documents.   data   fountains   http://datafountains.ucr.edu/     content  extractor;   automatic  indexer;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents  and  first   extracts  information  contained  in   meta-­‐tags.  if  information  is   unavailable  in  meta-­‐tags,  the   program  will  use  other  techniques   to  assign  values.  includes  a   focused  web  crawler  that  can   target  websites  concerning  a   specific  subject.     digital  record   object   identification   (droid)   http://www.nationalarchives.gov.uk/   information-­‐management/manage   -­‐information/preserving-­‐digital   -­‐records/droid/     extrinsic  auto-­‐ generator   droid  is  a  software  tool   developed  by  the  national   archives  to  perform  automated   batch  identification  of  file  formats.   dspace   http://www.dspace.org/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   automatically  extracts  technical   information  regarding  file  format   and  size.  can  also  extract  some   information  from  meta-­‐tags.   editor-­‐ converter   dublin  core   metadata   http://www.library.kr.ua/dc/   dceditunie.html   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  converting  them  to  dublin   core.     information  technology  and  libraries  |  september  2015       34   embedded   metadata   extraction   tool  (emet)   http://www.artstor.org/global/g   -­‐html/download-­‐emet-­‐public.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   emet  is  a  tool  designed  to  extract   metadata  embedded  in  jpeg  and   tiff  files.   firefox  dublin   core  viewer   extension   http://www.splintered.co.uk/   experiments/73/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  displaying  them  to  dublin   core.   jhove   http://jhove.sourceforge.net/   #implementation     extrinsic  auto-­‐ generator   extracts  metadata  regarding  file   format  and  size  as  well  as   validating  the  structure  of  the   identified  file  format.   national   library  of   new   zealand— metadata   extraction   tool   http://meta-­‐extractor   .sourceforge.net/     extrinsic  auto-­‐ generator   developed  by  the  national  library   of  new  zealand  to   programmatically  extract   preservation  metadata  from  a   range  of  file  formats  like  pdf   documents,  image  files,  sound   files,  microsoft  office  documents,   and  others.   omeka   http://omeka.org/     extrinsic  auto-­‐ generator;  social   tagging   automatically  extracts  technical   information  regarding  file  format   and  size.     repomman   http://www.hull.ac.uk/esig/   repomman/index.html   meta-­‐tag  harvester;   content  extractor;   extrinsic  auto-­‐ generator   automatically  extracts  various   elements  for  documents  uploaded   to  fedora  such  as  author,  title,   description,  and  key  words,   among  others.  results  are   presented  to  user  for  review.   
simple automatic metadata generation interface (samgi)   http://hmdb.cs.kuleuven.be/amg/download.php   content extractor; extrinsic auto-generator   a suite of tools that is able to automatically extract metadata elements such as keyphrase and language from documents as well as from the context in which a document exists.

table 3. semi-automatic tools that support automatic indexing.

text and data mining

the two methods discussed above, content extraction and automatic indexing, rely on text- and data-mining techniques for the automatic extraction of metadata. in other words, these methods draw on machine-learning algorithms; on statistical analysis of term frequencies; on clustering techniques, which examine the frequency of term use across documents rather than relying on controlled vocabularies; and on classifying techniques, which exploit the conventional structure of documents, for the semi-automatic generation of metadata. because of the complexity of these techniques, few tools have been fully developed for application within real-world library settings. rather, most uses of these techniques have been developed to solve the problems of automatic metadata generation within the context of specific research projects.

there are two reasons for this. one is that, as many researchers have noted, the effectiveness of machine-learning techniques depends on the quality and quantity of the training data used to teach the system.13, 14, 15 because of the number and diversity of subject domains, as well as the sheer variety of document formats, many applications are designed to address the metadata needs of very specific subject domains and very specific types of documents. this is a point that kovacevic et al. make in stating that machine-learning techniques generally work best for documents of a similar type, like research papers.16 another issue, especially as it applies to automatic indexing, is the fact that, as gardner notes, controlled vocabularies such as the lcsh are too complicated and diverse in structure to be applied through semi-automatic means.17 although some open-source tools such as data fountains have made efforts to overcome this complexity, projects like it are the exception rather than the rule. these issues signify the difficulty of developing sophisticated semi-automatic metadata generation tools that have general applicability across a wide range of subject domains and format types. nevertheless, for semi-automatic metadata generation tools to become a reality for the library community, such complexity will have to be overcome.

there are, however, some tools that have broader applicability or can be customized to meet local needs. for instance, the kea keyphrase extractor offers the option of building local ontologies, or applying available ones, that can be used to refine the extraction process.
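the statistical core that such tools share can also be suggested in a few lines of python. the sketch below is a toy illustration rather than a reproduction of kea or data fountains: it scores candidate terms with tf.idf against a small assumed corpus and then checks the top-ranked terms against a tiny stand-in for a controlled vocabulary. the corpus, document, and vocabulary are invented for the example.

    # illustrative sketch: tf.idf term scoring plus naive controlled-vocabulary
    # matching. the corpus and vocabulary below are toy assumptions.
    import math
    import re
    from collections import Counter

    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    def tf_idf_terms(document, corpus, top_n=5):
        doc_counts = Counter(tokenize(document))
        n_docs = len(corpus)
        scores = {}
        for term, tf in doc_counts.items():
            df = sum(1 for other in corpus if term in set(tokenize(other)))
            idf = math.log((1 + n_docs) / (1 + df)) + 1   # smoothed idf
            scores[term] = tf * idf
        return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]]

    def map_to_vocabulary(terms, vocabulary):
        # exact matching against a tiny stand-in for lcsh or a local ontology;
        # unmatched terms are simply kept as uncontrolled keywords
        return {"subject": [t for t in terms if t in vocabulary],
                "keyword": [t for t in terms if t not in vocabulary]}

    corpus = ["library metadata records and cataloging practice",
              "machine learning for text classification",
              "digital repositories and metadata harvesting"]
    document = "automatic metadata generation for digital library repositories"
    vocabulary = {"metadata", "cataloging", "repositories"}
    print(map_to_vocabulary(tf_idf_terms(document, corpus), vocabulary))

even this toy version shows why stopword handling, training data, and vocabulary structure matter so much in practice.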
perhaps the most promising of all is the above-mentioned data fountains suite of tools developed by the university of california. the data fountains suite incorporates almost every one of the semi-automatic metadata techniques described in this study, including sophisticated content extraction and automatic indexing features. it also provides several ways to customize the suite in order to meet local needs.

extrinsic data auto-generation

extrinsic data auto-generation is the process of extracting metadata about an information resource that is not contained within the resource itself. extrinsic data auto-generation can involve the extraction of technical metadata such as file format and size but can also include the extraction of more complicated features such as the grade level of an educational resource or the intended audience for a document. the process of extracting technical metadata is perhaps one area of semi-automatic metadata generation that is in a high state of development; it is included in most cmss, such as dspace,18 as well as in other more sophisticated tools such as harvard's jhove, which can recognize at least twelve different kinds of textual, audio, and visual file formats.19 on the other hand, the problem of semi-automatically generating other types of extrinsic metadata, like grade level, is among the most difficult to solve.

as leibbrandt et al. note in their analysis of the use of artificial intelligence mechanisms to generate subject metadata for a repository of educational materials at education services australia, the extraction of extrinsic metadata such as grade level was much more difficult than the extraction of keywords because of the lack of information surrounding a resource's context within the resource itself.20 this difficulty can also be seen in the absence of tools that support the extraction of extrinsic data beyond those that harvest manually created metadata or extract technical metadata.

table 4 lists the tools that support extrinsic data auto-generation either as the sole technique or as one of a suite of techniques used to generate metadata from resources. of the thirty-nine tools evaluated for this study, thirteen tools support some form of extrinsic data auto-generation.

tool name   location   techniques   functions/features
apache poi—text extractor   http://poi.apache.org/download.html   content extractor; meta-tag harvester; extrinsic auto-generator   apache poi provides basic text extraction for all project supported file formats. in addition to the (plain) text, apache poi can access the metadata associated with a given file, such as title and author.
apache tika   http://tika.apache.org/   content extractor; meta-tag harvester; extrinsic auto-generator   built on apache poi, the apache tika toolkit detects and extracts metadata and text content from various documents.
data   fountains   http://datafountains.ucr.edu/     content  extractor;   automatic  indexer;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents  and  first   extracts  information  contained  in   meta-­‐tags.  if  information  is   unavailable  in  meta-­‐tags,  the   program  will  use  other  techniques   to  assign  values.  includes  a   focused  web  crawler  that  can   target  websites  concerning  a   specific  subject.     digital  record   object   identification   (droid)   http://www.nationalarchives.gov.uk/   information-­‐management/manage   -­‐information/preserving-­‐digital   -­‐records/droid/     extrinsic  auto-­‐ generator   droid  is  a  software  tool   developed  by  the  national   archives  to  perform  automated   batch  identification  of  file  formats.   dspace   http://www.dspace.org/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   automatically  extracts  technical   information  regarding  file  format   and  size.  can  also  extract  some   information  from  meta-­‐tags.   editor-­‐ converter   dublin  core   metadata   http://www.library.kr.ua/dc/   dceditunie.html   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  converting  them  to  dublin   core.   embedded   metadata   extraction   tool  (emet)   http://www.artstor.org/global/g   -­‐html/download-­‐emet-­‐public.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   emet  is  a  tool  designed  to  extract   metadata  embedded  in  jpeg  and   tiff  files.   firefox  dublin   core  viewer   extension   http://www.splintered.co.uk/   experiments/73/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   scans  html  documents,   harvesting  metadata  from  tags   and  displaying  them  to  dublin   core.   jhove   http://jhove.sourceforge.net/   extrinsic  auto-­‐ extracts  metadata  regarding  file     evaluation  of  semi-­‐automatic  metadata  generation  tools|  park  and  brenza     doi:  10.6017/ital.v34i3.5889   37   #implementation     generator   format  and  size  as  well  as   validating  the  structure  of  the   identified  file  format.   national   library  of   new   zealand— metadata   extraction   tool   http://meta-­‐extractor   .sourceforge.net/     extrinsic  auto-­‐ generator   developed  by  the  national  library   of  new  zealand  to   programmatically  extract   preservation  metadata  from  a   range  of  file  formats  like  pdf   documents,  image  files,  sound   files,  microsoft  office  documents,   and  others.   omeka   http://omeka.org/     extrinsic  auto-­‐ generator;  social   tagging   automatically  extracts  technical   information  regarding  file  format   and  size.     repomman   http://www.hull.ac.uk/esig/   repomman/index.html   meta-­‐tag  harvester;   content  extractor;   extrinsic  auto-­‐ generator   automatically  extracts  various   elements  for  documents  uploaded   to  fedora  such  as  author,  title,   description,  and  key  words,   among  others.  results  are   presented  to  user  for  review.   simple   automatic   metadata   generation   interface   (samgi)   http://hmdb.cs.kuleuven.be/amg/   download.php   content  extractor;   extrinsic  auto-­‐ generator   a  suite  of  tools  that  is  able  to   automatically  extract  metadata   elements  such  as  keyphrase  and   language  from  documents  as  well   as  from  the  context  in  which  a   document  exists.       
table  4.  semi-­‐automatic  tools  that  support  extrinsic  data  auto-­‐generation.   social  tagging     social  tagging  is  now  a  familiar  form  of  subject  metadata  generation  although,  as  mentioned   previously,  it  is  not  properly  a  form  of  automatic  metadata  generation.  nevertheless,  because  of   the  relatively  low  cost  in  generating  and  maintaining  metadata  through  social  tagging  and  its   current  widespread  popularity,  a  few  projects  have  attempted  to  utilize  such  data  to  enhance   repositories.  for  instance,  linstaedt  et  al.  use  sophisticated  computer  programs  to  analyze  still   images  found  within  flickr  and  then  use  this  analysis  to  process  new  images  and  to  propagate   relevant  user  tags  to  those  images.21     in  a  slightly  more  complicated  example,  liu  and  qin  employ  machine-­‐learning  techniques  to   initially  process  and  assign  metadata,  including  subject  terms,  to  a  repository  of  documents   related  to  the  computer  science  profession.22  however,  this  proof  of  concept  project  also  permits   users  to  edit  the  fields  of  the  metadata  once  established.  the  user-­‐edited  tags  are  then   reprocessed  by  the  system  with  the  hope  of  improving  the  machine-­‐learning  mechanisms  of  the   database,  creating  a  kind  of  feedback  loop  for  the  system.  specifically,  the  improved  tags  are  used   by  the  system  to  suggest  and  assign  subject  terms  for  new  documents  as  well  as  to  improve   subject  description  of  existing  documents  within  the  repository.  although  these  two  examples   provide  instances  of  sophisticated  reprocessing  of  social  tag  metadata,  these  capabilities  do  not   seem  to  be  present  in  open-­‐source  tools  at  this  time.  nevertheless,  social  tagging  capabilities  are   offered  by  many  cmss  such  as  omeka.  these  social  tagging  capabilities  may  offer  a  means  to   enhance  subject  access  to  holdings.       information  technology  and  libraries  |  september  2015       38   table  5  below  lists  the  tools  that  support  social  tagging  either  as  the  sole  technique  or  as  one  of  a   suite  of  techniques  used  to  generate  metadata  from  resources.  of  the  thirty-­‐nine  tools  evaluated   for  this  study,  two  tools  support  some  form  of  social  tagging.   tool  name   location   techniques   functions/features   dspace   http://www.dspace.org/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator;  social   tagging   automatically  extracts   technical  information   regarding  file  format  and  size.   can  also  extract  some   information  from  meta-­‐tags.   omeka   http://omeka.org/     extrinsic  auto-­‐ generator;  social   tagging   automatically  extracts   technical  information   regarding  file  format  and  size.     table  5.  semi-­‐automatic  tools  that  support  social  tagging.   challenges  to  implementation   although  semi-­‐automatic  metadata  generation  tools  offer  many  benefits,  especially  in  regards  to   streamlining  the  metadata-­‐creation  process,  there  are  significant  barriers  to  the  widespread   adoption  and  implementation  of  these  tools.  one  problem  with  semi-­‐automatic  metadata   generation  tools  is  that  many  are  developed  locally  to  address  the  specific  needs  of  a  given  project   or  as  part  of  academic  research.  
this local, highly focused milieu for development means that the general applicability of the tools is potentially diminished. the local context may also hinder the widespread adoption of applications that would otherwise result in strong communities of users and provide further support for the development of applications in an open-source context. because of the highly specific nature of many current tools, their relevance to real-world processes of metadata creation within the broader context of libraries' diverse information management needs is not accounted for.

additionally, many tools are focused on solving one or, at most, a few metadata generation problems. for instance, the kea application is designed to use machine-learning techniques for the sole purpose of extracting keywords, the open text summarizer is limited to automatic extraction of summary descriptions and keywords, and editor-converter dublin core is designed to extract information in html meta-tags and map it to dublin core elements. because of the piecemeal development of semi-automatic generation tools, any comprehensive package of tools will require significant effort from the implementer to coordinate the selected applications and to produce results in a single output. this is, to say the least, a daunting task.

furthermore, a high degree of technical skill is required to implement these complex tools. many of the more sophisticated tools used to semi-automatically generate metadata, such as data fountains, kea, and apache stanbol, require competence in a variety of programming languages; significant knowledge of c++, python, and java is required to implement these systems properly. the high degree of technical knowledge needed to implement these tools means that many libraries and other institutions may not have the resources to begin implementing them, let alone incorporating them into the daily workflows of the metadata creation process. further, this high degree of technical expertise may require libraries to seek assistance outside of the library. in other words, librarians may need to build strong collaborative relationships with those who have the technical skills, expertise, and credentials to implement and maintain these complicated tools. as vellucci et al. note with regard to their development of the metadata education and research information commons (meric), a metadata-driven clearinghouse of education materials related to metadata, elaborate and multidisciplinary partnerships need to be firmly established for the ultimate success of such projects, including the sustained support of the highest levels of administration.23 these types of partnerships may be difficult to establish and maintain for the sustained implementation of complicated tools.
additionally, sustainable development of tools, especially with regard to the funding needed for continued development of open-source applications, appears to be a significant barrier to implementation. for instance, at the time of this writing, many of the tools that were touted in the literature as being most promising, such as dc dot, reggie, and describethis, are no longer available for implementation. beyond the fact that discontinuation hurts the potential adoption and continued development of semi-automatic tools within real-world library and other information settings, there is also the problem that settings that have in fact adopted tools may lose the technical support of a central developer and community of users. thus discontinuation may result in higher rates of tool obsolescence and increase the potential expenses of libraries that have implemented and then must change applications.

finally, the application of semi-automatic metadata tools remains relatively untested in real-world scenarios. as polfreman et al. note, most tests of automatic metadata generation tools have several problems, including small sample sizes, narrow scope of project domains, and experiments that lack true objectivity because systems are generally tested by their creators.24 for these reasons, libraries and other institutions may be reluctant to expend the resources needed to implement and fully integrate a complicated, promising, but ultimately untested, tool within their already strained workflows.

conclusion

semi-automatic metadata generation tools hold the promise of assisting information professionals with the management of ever-increasing quantities and types of information resources. using software that can create metadata records consistently and efficiently, semi-automatic metadata generation tools potentially offer significant cost and time savings. however, the full integration of these tools into the daily workflows of libraries and other information settings remains elusive. for instance, although many tools have been developed that address many of the more complicated aspects of semi-automatic metadata generation, including the extraction of information related to conceptually difficult areas of bibliographic description such as subject terms, open-ended resource descriptions, and keyword assignment, many of these tools are relevant only at the project level and are not applicable to the broader contexts needed by libraries. in other words, the current array of tools exists to solve experimental problems but has not been developed to the point that the library community can implement it in a meaningful way.

perhaps the greatest area of difficulty lies in the fact that most tools only address part of the problem of semi-automatic metadata generation, providing solutions to the semi-automatic generation of one or a few bibliographic elements but not the full range of elements.
this means that for libraries to truly have a comprehensive tool set for the semi-automatic generation of metadata records, significant local efforts will be required to integrate the various tools into a working whole. couple this issue with the instability of tool development and maintenance, and it appears that the library community may lack incentive to invest already strained and limited resources in the adoption of these tools.

thus it appears that a number of steps will need to be taken before the library community can seriously consider the incorporation of semi-automatic metadata generation tools within its daily workflows. first, the integration of these various tools into a coherent set of applications is likely the next step in the development of viable semi-automatic metadata generation. since most small libraries likely do not have the resources required to integrate these disparate tools, let alone incorporate them within existing library systems, a single package of tools will be needed simply from a resource perspective. second, considering the high level of technical expertise needed to implement the current array of tools, the integration must be accomplished in such a way as to foster implementation, utilization, and maintenance with a minimum of technical expertise. for instance, if an integrated set of tools that functioned across a wide range of subject domains and format types could be developed, the suite might be akin to the cmss currently employed by many libraries. furthermore, with a suite of tools that is relatively easy to use, adoption would likely increase. this might result in a stable community of users that would foster the further development of the tools in a sustainable manner. a comprehensive, relatively easy-to-implement set of tools might also foster independent testing of those tools; such independent testing is needed to provide an objective basis for tool evaluation and further development.

finally, designing automated workflows tailored to the subject domain and types of resources seems to be an essential step for integrating semi-automatic metadata generation tools into metadata creation. such workflows may delineate the data elements that can be generated by an automated meta-tag extractor from the data elements that need to be refined and manually created by cataloging and metadata professionals. to develop, maximize, and sustain semi-automatic metadata generation workflows, administrative support for finance, human resources, and training is critical.

thus, although many of the technical aspects of semi-automatic metadata generation are well on their way to being solved, many other barriers exist that might limit adoption. further, these barriers may have a negative influence on the continued, sustainable development of semi-automatic metadata generation tools.
nevertheless, there is a critical need for the library community to find ways to manage the recent explosion of data and information in cost-effective and efficient ways. semi-automatic metadata generation holds the promise to do just that.

acknowledgement

this study was supported by the institute of museum and library services.

references

1. jane greenberg, kristina spurgin, and abe crystal, "final report for the amega (automatic metadata generation applications) project."
2. sue ann gardner, "cresting toward the sea change," library resources & technical services 56, no. 2 (2012): 64–79, http://dx.doi.org/10.5860/lrts.56n2.64.
3. for details, see jung-ran park and caimei lu, "application of semi-automatic metadata generation in libraries: types, tools, and techniques," library & information science research 31, no. 4 (2009): 225–31, http://dx.doi.org/10.1016/j.lisr.2009.05.002.
4. erik mitchell, "trending tech services: programmatic tools and the implications of automation in the next generation of metadata," technical services quarterly 30, no. 3 (2013): 296–10, http://dx.doi.org/10.1080/07317131.2013.785802.
5. jane greenberg, "metadata extraction and harvesting: a comparison of two automatic metadata generation applications," journal of internet cataloging 6, no. 4 (2004): 59–82, http://dx.doi.org/10.1300/j141v06n04_05.
6. malcolm polfreman, vanda broughton, and andrew wilson, "metadata generation for resource discovery," jisc, 2008, http://www.jisc.ac.uk/whatwedo/programmes/resourcediscovery/autometgen.aspx.
7. park and lu, "application of semi-automatic metadata generation in libraries."
8. kea automatic keyphrase extraction homepage, http://www.nzdl.org/kea/index_old.html.
9. wilhelmina randtke, "automated metadata creation: possibilities and pitfalls," serials librarian 64, no. 1–4 (2013): 267–84, http://dx.doi.org/10.1080/0361526x.2013.760286.
10. aleksandar kovačević et al., "automatic extraction of metadata from scientific publications for cris systems," electronic library and information systems 45, no. 4 (2011): 376–96, http://dx.doi.org/10.1108/00330331111182094.
11. mark patton et al., "toward a metadata generation framework: a case study at johns hopkins university," d-lib magazine 10, no. 11 (2004), http://www.dlib.org/dlib/november04/choudhury/11choudhury.html.
12. nicolai erbs, iryna gurevych, and marc rittberger, "bringing order to digital libraries: from keyphrase extraction to index term assignment," d-lib magazine 19, no. 9/10 (2013), http://www.dlib.org/dlib/september13/erbs/09erbs.html.
13. polfreman, broughton, and wilson, "metadata generation for resource discovery."
14. randtke, "automated metadata creation."
15. xiaozhong liu and jian qin, "an interactive metadata model for structural, descriptive, and referential representation of scholarly output," journal of the association for information science & technology 65, no. 5 (2014): 964–83, http://dx.doi.org/10.1002/asi.23007.
kovačević  et  al.,  “automatic  extraction  of  metadata  from  scientific  publications  for  cris   systems.”   17.    gardner,  “cresting  toward  the  sea  change.”   18.    mary  kurtz,  “dublin  core,  dspace,  and  a  brief  analysis  of  three  university  repositories,”   information  technology  &  libraries  29,  no.  1  (2010):  40–46,   http://dx.doi.org/10.6017/ital.v29i1.3157.     19.    “jhove  -­‐  jstor/harvard  object  validation  environment,”  jstor,     http://jhove.sourceforge.net.   20.    richard  leibbrandt  et  al.,  “smart  collections:  can  artificial  intelligence  tools  and  techniques   assist  with  discovering,  evaluating  and  tagging  digital  learning  resources?”  international   association  of  school  librarianship:  selected  papers  from  the  annual  conference  (2010).   21.    stefanie  lindstaedt  et  al.,  “automatic  image  annotation  using  visual  content  and   folksonomies,”  multimedia  tools  &  applications  42,  no.  1  (2009):  97–113,   http://dx.doi.org/10.1007/s11042-­‐008-­‐0247-­‐7.   22.    liu  and  qin,  “an  interactive  metadata  model.”   23.    sherry  vellucci,  ingrid  hsieh-­‐yee,  and  william  moen,  “the  metadata  education  and  research   information  commons  (meric):  a  collaborative  teaching  and  research  initiative,”  education   for  information  25,  no.  3/4  (2007):  169–78.   24.    polfreman,  broughton,  and  wilson,  “metadata  generation  for  resource  discovery.”   27 automatic format recognition of marc bibliographic elements: a review and projection brett butler: butler associates, stanford, california. a review and discussion of the technique of automatic format recognition ( afr) of bibliographic data are presented. a comparison is made of the record-building facilities of the library of congress, the university of california (both afr techniques), and the ohio college library center (non-afr). a projection of a next logical generation is described. introduction the technique commonly identified as "format recognition" has more potential for radically changing the automation programs of libraries than any other technical issue today. while the development of marc has provided an international standard, and various computer developments provide increasingly lower operating costs, the investment in converting a catalog into machine-readable form has kept most libraries from integrating automated systems into their operations. the most expensive part of the conversion to machine-readable form has been the human editing required (generally by a cataloger) to identify the many variable portions of the marc-format cataloging record. a full cataloging record contains several hundred possible sections (or fields) in the marc format. research at the library of congress (lc) into this problem resulted in the concept of "format recognition" to reduce cataloging input costs. with the automatic format recognition ( afr) approach, an unedited cataloging entry is prepared (keypunched or otherwise converted to machine-readable form). then the afr computer program provides identification of the various elements of the catalog record through sophisticatedcomputer editing. a degree of human post-editing is generally assumed, but· the computer basically is assigned the responsibility of editing an un.(lae;n.1:itie~d block of text into a marc-format cataloging record. pioneering afr work at the library of congress is presently in use original cataloging input to the marc distribution service. 
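as a point of reference for the comparison that follows, the sketch below shows, in deliberately simplified python, the kind of transformation an afr program performs: an unedited block of cataloging text on one side, and a record with marc field tags and subfield codes assigned on the other. the sample entry is invented, and only a handful of fields (100 main entry, 245 title, 260 imprint, 300 collation) are shown.

# hypothetical example of the transformation an afr program performs:
# an unedited cataloging entry in, a tagged marc-style record out.
unedited_entry = (
    "smith, john a., 1932-\n"
    "an introduction to library automation / by john a. smith. --\n"
    "chicago : example press, 1973.\n"
    "xii, 245 p. ; 23 cm."
)

# the target structure: marc field tags with subfield codes, which the
# program must derive from the text above through computer editing.
recognized_record = {
    "100": {"a": "smith, john a.,", "d": "1932-"},                    # personal name main entry
    "245": {"a": "an introduction to library automation /",
            "c": "by john a. smith."},                                 # title statement
    "260": {"a": "chicago :", "b": "example press,", "c": "1973."},    # imprint
    "300": {"a": "xii, 245 p. ;", "c": "23 cm."},                      # collation
}

for tag, subfields in recognized_record.items():
    print(tag, " ".join(f"${code} {value}" for code, value in subfields.items()))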
this 28 journal of libm1·y automation vol. 7/1 march 1974 system is quite sophisticated because its output goal is a complete marc record with all fields, subfields, tags, and delimiters identified almost entirely through computer editing. the institute of library research (ilr) at the university of california, faced with the need to convert 800,000 catalog records to marc format, has developed a ·jess ambitious afr program which provides a level of identification sufficient to provide the desired book catalog bibliographic output, or to print catalog cards. . . · the aim of this paper is to examine these two afr strategies and consider their implications for input of two major classes of cataloging records: ( 1) lc or other cataloging records in standard card format; and ( 2) original cataloging not yet in card format. comparing the two afr strategies to an essentially non-afr format used at the ohio college library center for on-line ca:taloging input, we will propose a median strategy for original cataloging. format recognition ( ofr). the thesis is that differing strategies of input should be used for records already formatted into catalog card images and for those original cataloging items being input prior to card production. automatic format recognition an examination of the library of congress ( lc), university of california ( u c), ohio college library center ( oclc), and original format recognition ( ofr) strategies will show the operating differences .. a ·detailed field-by-field comparison of the nearly 500 distinct codes which can be identified in creation of a marc record is attached as appendix i. general comparisons can be made in several areas: input documents, manual coding, level of identification, input and processing costs, error correction, and flexibility in use. input documents-the lc/afr program operates from an uncoded typescript to a machine-readable record prepared through mt /st magnetic tape input. this typescript is, however, prepared from an lc cataloger's manuscript worksheet, in which thereis some inherent bibliographic order.· the lc/ afr program does not rely on this inherent order although its design takes advantage of the probable order in search strategies. lc/ afr could operate with keying of catalog cards, book catalog entries~ or any structure of bibliographic data. the uc program is designed more specifically to handle input of formatted catalog cards, and some of its afr strategy is based on· the sequence and physical indentation pattern on standard catalog cards. it would not work effectively on noncard format input without special recognition of some tagging conventions. the oclc program allows direct input to crt screen from any input docutnent; it requires complete identification of each cataloging field or subelement input. automatic fo1'mat recognition/butler 29 manual coding-lc/ afr requires minimal input coding. within the title paragraph, the title proper, the edition statement, and imprint are explicitly separated at input. series, subject, and other added entries are recognized initially from the roman and arabic numerals preceding them. aside jrom these items, virtually all marc fields are recognized by the computer editing program. uc/ afr inserts a code after the call number input, thus providing explicit identification at input. it also identifies each new indentation on the catalog card explicitly, thus implicitly identifying main entry, title, and certain other major cataloging blocks on the card. 
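two of the cues just mentioned, the numerals preceding tracings and the indentation pattern of a standard catalog card, can be illustrated with a short sketch. the regular expressions below are invented for illustration and fall far short of either production program.

import re

# hypothetical, much-simplified recognition cues of the kind described above.
def classify_tracings(tracing_lines):
    """separate subject tracings (arabic numerals) from other added
    entries (roman numerals), as at the bottom of an lc card."""
    subjects, added_entries = [], []
    for line in tracing_lines:
        if re.match(r"^\d+\.\s", line):             # e.g. "1. libraries--automation."
            subjects.append(re.sub(r"^\d+\.\s", "", line))
        elif re.match(r"^[ivxlc]+\.\s", line):      # e.g. "i. title."
            added_entries.append(re.sub(r"^[ivxlc]+\.\s", "", line))
    return subjects, added_entries

def split_card_blocks(card_text):
    """group a catalog-card image into blocks by indentation, the cue the
    uc program uses to locate main entry, title paragraph, and notes."""
    blocks, current = [], []
    for line in card_text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" ") and current:    # a new left-margin line starts a new block
            blocks.append(" ".join(current))
            current = []
        current.append(line.strip())
    if current:
        blocks.append(" ".join(current))
    return blocks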
the oclc input specifications require explicit coding, some of which is prompted by the crt screen. level of identification-lc/ afr provides the highest possible level of marc record identification, deriving practically every field, subfield, and other code if it is present in an individual cataloging record.~ in evaluation of this element of lc/ afr it should be realized that the needs of the library of congress in creating original marc records for nationwide distribution (and its own use) are much more sophisticated and complex than those of any individual user library or system. the uc/ afr approach reflects a more task-oriented approach, deriving a sufficient level of identification to separate major bibliographic elements. this technique is clearly sufficient to produce computer-generated catalog cards or similar output in a standard manner. however, ucjafr lacks several identifiers, such as specific delimitation of information in the imprint field, which would make feasible the use of its records for further computer-generated processes. the oclc input format is of variable level; many elements are optional and are noted with an asterisk in appendix i. at its most complete, the oclc format specifically excludes only a very few marc fields, most notably geographic area and bibliographic price. input and p1'ocessing costs-direct cost information has not been published for production costs of any of the format recognition systems. the library of congress has reported that ..... the format recognition technique is of considerable value in speeding up input and lowering the cost per record for processing."3 while formal reports have not been published, informed opinion has placed the cost of creation of a marc record at a level of $3.00 ± $.50. format recognition is credited with an increase in productivity of about one-third on input keying and an increase of over 80 percent in human editing/proofreading, and actual computer 0 a number of standard subdivi'sions of various fields were first announced as part of the marc format in the 5th edition of books: a marc format, which was published in 1972.1 consequently they are not specified in format recognition process for marc records, published in 1970, which was used as the reference for this paper.2 they are, however, clearly subfields which could be identified by expansions of afr. these elements are marked with a lower-case "r" in appendix i. 30 journal of libtary automation vol. 7/1 march 1974 processing times approximate those achieved with earlier library of congress marc processing programs.4 lt would seem that afr may have lowered library of congress marc processing costs to the level of $2.00 ± $.50. in the final report of the recon pilot project, cost simulation projections for full editing and format recognition editing were given as $3.46 and $3.06 per record, respectively.6 while full cost information has not been derived for the uc/ afr program itself, figures have been informally reported at library automation meetings indicating that the cost of record creation was approximately $1.00 per entry. included in this figure is computer editing of name and other entries against a computerized authority file, which is done manually in the lc/ afr system. this program is undeniably the least-cost effort to date providing a marc-format bibliographic record. no cost data are provided on the oclc on-line input system. 
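the recon simulation figures quoted above imply a saving of roughly 12 percent per record for format recognition editing over full editing. the short calculation below scales that difference to a file on the order of the 800,000-record conversion mentioned earlier; the scale-up is purely illustrative, since the uc project reported its own, much lower, unit cost of about $1.00 per entry.

# recon pilot project cost simulations quoted above, per record
full_editing_cost = 3.46
format_recognition_cost = 3.06

saving_per_record = full_editing_cost - format_recognition_cost      # about $0.40
relative_saving = saving_per_record / full_editing_cost              # about 11.6 percent

# purely illustrative scale-up to a file of the size uc faced
records = 800_000
print(f"saving per record: ${saving_per_record:.2f}")
print(f"relative saving:   {relative_saving:.1%}")
print(f"saving over {records:,} records: ${saving_per_record * records:,.0f}")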
it can be observed that the coding required is quite similar to the pre-afr system in use at the library of congress itself, and that on-line crt input had been evaluated at lc as a higher-cost input technique than the magnetictape typewriters currently providing marc input. lc is considering, though, on-line crt access for subsequent human editing of the marc record created through off-line input and afr editing. error rate and correction-any afr strategy, with present state of the art, generates some error above the normal keying rate observed with edited records. the strategy aims for lowest overall cost by catching these errors in a postprocessing edit which must be performed even for records edited prior to input. the library of congress reports, "the format recognition production rate of 8.4 records per hour (proofing only) . . . is slightly less than that (about 9.2 per hour) for proofing edited records. with format recognition records, the editors must be aware of the errors made by the program ... as well as keying mistakes."6 the savings in prekeyboard editing and increased keying rates more than make up for this slight decrease in postprocessing editing. at the library of congress, where afr is used for production of marc records, a full editing process aims at 100 percent accuracy of input. while such a goal is statistically unreachable, considerable effort· is expended by the marc distribution service to provide the most accurate output possible. from a systems perspective, errors existing in marc records are perhaps less reprehensible than errors in printed bibliographic output, simply because the distributed marc record can be updated by subsequent distribution of a "correction" record. it should be noted that some marc subscribers have voiced concern about the increased percentage of "correction" records, which the library of congress indicates come primarily from cataloging changes rather than input edit errors. the uc/ afr program clearly takes a statistical approach to bibliographic element input and processing. shoffner has indicated that the scale of the 1,000,000 record input project caused a reevaluation of the feasibility of traditional procedures. 7 the result is, in the uc/ afr implementaautomatic format recognition/butler 31 tion, a marc record essentially devoid of human editing. for a smaller scale of production, the uc approach could be combined with post-editing such as that used at lc to increase overall file accuracy. in passing, however, it should be noted that rather sophisticated verification techniques are used in the uc/ afr approach which could be of value in future approaches. these include, for instance, comparison of all words against a machine-readable english-language dictionary; words not found in the dictionary are output for manual editing as suspected keypunch errors. little information is available on the error rates and corrections in the oclc system. however, most records keyed to the oclc system are for a local member's catalog card production, so feedback is provided and presumably errors are corrected through re-inputting to obtain a proper set of catalog cards. there is no central control on the quality of locally entered oclc records at present, except for the encoding standards developed by oclc. flexibility in use-a number of considerations are appropriate herehow many types of format (catalog cards, worksheets, etc. 
) can be used as input, how many possible outputs can be developed from the derived marc format, how adaptable is the system to remote and multiple input locations, how many special equipment restrictions are there? the lc/ afr program is clearly the most flexible in ability to accept varying inputs and provide a flexible output. it is, however, not capable of any authority-file editing at present (this is done manually against lc's master files before input). while the input form could be used rather easily at remote locations, the marc afr programs themselves are not available for use outside the library of congress. the uc/afr program provides a rather minimal set of cataloging element subfields but does provide more sophisticated textual editing within the program. it is quite adaptable to remote input as long as the original "worksheet" is in catalog card format, a restriction which in effect requires a preinput human editing step for original cataloging input. the marc format provided would not be sufficient for some currently operating programs using the full marc format, but is quite sufficient for most bibliographic outputs. the oclc input program is dependent on visual editing at the time of crt keying. its flexibility in input is considerable, and outputs can approach a full marc record if all optional fields are identified. original format recognition a working conclusion of this review is that an afr program developed according to the strategy of the university of california will deliver a satisfactory marc-format record at a lower cost than other afr or nonafr alternatives. however, much of the efficiency of the ucjafr is based on the presence of an already existing lc-format catalog card from which to keyboard machine-readable data. for original cataloging to be keyboarded from a cataloger's worksheet, 32 journal of library automation vol. 7/1 march 1974 an original format . recognition strategy is proposed which · provides a somewhat more detailed format than the uc/ afr marc while retaining a generally flexible system and low input costs. several system considerations also guide the design of an ofr system designed for relatively general-purpose user input and multiple output functions: • no special equipment requirements for input keying; • no special knowledge of the marc format required; • minimal table-lookup or text searching in processing; • flexible options for depth of coding provided; and • sufficient depth of format derived for most applications. the ofr input strategy outlined in appendix i provides a much greater degree of explicit field coding at input than the afr programs outlined above. the basis for this decision is the judgment that this cataloging, being done originally by a professional, can readily be coded by element name prior to input. no effort is made to identify marc field elements which· occur with very low frequency, or which are of limited utility for most applications. for instance, the "meeting" type of entry occurs in all combinations, in only 1.8 percent of all records studied by the library of congress in its format recognition study. 8 marc elements requiring either extensive human editing or complex computer processing are likewise excluded from input, on a cost-utility basis. an example is the geographic area code, which must either be assigned by a knowledgeable editor or derived through extensive computer searching for the city /county of publication. 
however, where little penalty is attached to allowing input of coded information, the ofr format allows input for inclusion in the derived marc-format record. conclusion it is clear that the afr programs developed for specific needs by the library of congress and the university of california can be great factors for change in library automation strategies over the next decade. striking benefits in cost savings, ease of input, and subsequent processing are to be gained. the abbreviated outline of an original cataloging ( ofr) input strategy is simply a suggestion of a second generation of format recognition programs which will undoubtedly develop to serve more general needs for marc-format bibliographic input. references 1. u.s. library of congress, marc development office, books: a marc format. 5th ed. (washington, d.c.: u.s. government printing office, 1972). 2. format recognition process for marc records: a logical design (chicago: american library association, 1970). automatic format reeognitionjbvtler 33 3. henriette d. avram, lenore s. maruyama, and john c. rather, "automation activities in tl:te processing department of the library of congress," library resources & technical services 16:195-239 (spring 1972). · . 4. ibid., p.204, 206. 5. recon pilot project, recon pilot project final report, prepared by henriette d. avram. (washington, d.c.: library of congress, 1972). 6. avram, et al., "automation activities," p.206. 7. ralph m. shoffner, some implications of automatic recognition of bibliographic elements, technical paper no. 22. (berkeley, calif.: institute of library research, university of california, april 1971) . 8. format recognition process, p.48. appendix i format recognition input specifications code outline field tag the number listed is the field tag number of that bibliographic element in the marc format. each general field is listed first. following it are notes indicating areas within the field. fixed-field indicators within the field are listed first; each one's code number follows a slash after the field code (041/ 1 =field 41, indicator code 1). if there is more than one group of indicators, an additional code describes group 1 (il) or group 2 (i2). subfields within the field are alphabetic codes following a "+" sign after the field code ( 070+b =field 070, subfield b). field name the overall field name is listed first. fixed-field indicator names are listed at the first indenti'on under the field n arne. subfield names are listed at the second indention under the field n arne. treatment by program these codes indicate the processing provided for each field and subelement by the four computer processing systems considered. codes are slightly different for each column considered: lc the library of congress system. "r'' indicates that the element described is recognized by the program, rather than explicitly identified at input. "i" indicates the element is keyed and not recognized by the format recognition process. a small "r" denotes elements introduced to the marc format since afr documentation was published, but presumably treated by the afr program just as "r" elements. "0" indicates that the element marked is omitted from input altogether. uc the university of california system. codes are identical to those above, but the "r" code is not used. oclc the ohio college library center system. in addition to the above codes, "~" following any item denotes that input is optional. 
"i" code is used wherever an element is tagged even though the oclc programs create the marc format from these tags. ofr original format recognition proposals. codes are similar to those described in the previous paragraphs. field tag, indicator 015 015+a format recognition input specifications treatment by program field name lc uc oclc ofr national bibliography no. r 0 0 i~ 34 journal of library automation vol. 7/1 march 1974 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 025 overseas acquisition no. r 0 0 0 025ta 041 languages r r i i" 041/0 multilanguage indicator r 041/1 translation indicator r 041ta text/translation code r i 041tb summary language code r i 043 geographic area code r 0 0 0 043ta 049 holdings information 049ta holding library code 0 i i i 050 lc call number r r" i r" 050/0 book is in lc r 050/1 book not in lc r 050ta lc class number r i r 050tb book number r i r 051 lc copy statement r 0 0 0 051ta 051t b 051tc 060 natl. lib. medicine call no. r 0 r" 060ta nlm class number r i" 060t b nlm book number r i" 070 n.a.l. call number r 0 r" 070ta nal class number r 0 070t b nal book number r 0 082 dewey decimal classif. no. r 0 i r" 082ta ddc number r i 086 su. docs. classif. no. r 0 i 0 086ta su. docs. number r i 090 local call number ( lc) 0 r r 090ta lc class number i" r 090tb book number i" r 092 local call number (dewey) 0 0 .. i" 092ta dewey class number i" i" 092tb book number i" i" 100 personal name r r r 100/0,11 forename r 100/1, 11 single surname r 100/2,11 multiple surname r 100/3,11 name of family r 100/0,i2 main entry not subject r 100/1,i2 main entry is subject r loot a name r i loot b numeration r" i lootc title assoc. w /name r i lootd date r i loote relator r i lootk form subheading r i loott title of book r i lootl language r i" loot£ date of work r i" automatic format recognition/butler 35 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr ioo+p part of work r i" 110 corporate name r 0 i" 110/0,11 inverted surname r i" 11011,11 place or place + name r 110/2,11 direct-order name r i" 110/0,12 main entry not subject r 110/1,12 main entry is subject r 110+a name r i 110+b subordinate unit r i 110+c relator r i 110+k form subheading r i 110+t title of book r i 110+u nonprinting element r 0 110+1 language r i" 110+p part code r p' 110+f date of work r i" llo+g miscellaneous r i" ill conference or meeting, m.e. r 0 i 0 111/0,11 inverted surname r 111/1, 11 place or place+ name r 111/2,11 direct-order name r 111/0,12 main entry not subject r 111/1,12 main entry is subject r 111 +a name r, i 111 + b number r i 111+c place r i 11l+d date r i 111+e subordinate unit r i 111 +f date of publication r i" 111+g miscellaneous r i" 11l+k form subheading r i 111+1 language r i" 111+p part r i" 111+t title of book r i 130 uniform title heading, m.e. r 0 i i" 130,11 blank 130/0,12 main entry is not subject r 130/1,12 main entry is subject r 130+a uniform title heading r i 130+£ date of work r i" 130+ g miscellaneous r i" 130+h media qualifier r i" 130+k form subheading r i 130+1 language r i" 130+p part r i" 130+s alternate version r i" 130+t title of book r i 240 uniform title, supplied r r i i" 240/0,11 not printed on lc cards r 240/1,11 printed on lc cards r r 240+ a uniform title r r i 240+£ date of work i i" 240+k form subheading r, i 240+p part of work r i" 36 journal of librmy automation vol. 
7/1 march 1974 appendix 1 (continued) field tag, treatment by program indicator field name lc ,, ·. uc oclg ofr 240+s version r i" 241 romanized title r 0 i" 241/0, i1 not printed on lc cards r 241/1,!1 printed on lc cards r 241 +a romanized title r i"' 245 title r r i i 245/0, i1 no title added entry r r r 245/1, i1 title added entry r r r 245/0,i2 nonfiling field r 0 245+a short title r r i r 245+b subtitle r r i r 245+c title page transcription r r i r 250 edition statement r 0 i r 250+a edition r i 0 250+ b additional information r i 260 imprint statement r 0 i i 260/0 publisher not m.e. r i r 260/1 publisher is m.e. r i r 260+a place of publication r i r 260+b publisher r i r 260+c date of publication r i r 300 collation r r i r 300+a pagination or volume r r i r 300+b illustration r 0 i 0 300+c height r 0 i 0 350+a bibliographic price r 0 0 i 400 series, personal name r (r) i r 400/0, i1 forename r 400/1, i1 single surname r 400/2, i1 multiple surname r 400/3, i1 name of family r 400/0,i2 author not main entry r 400/1,i2 author is main entry r 400+a name r i r 400+b numeration r i 400+c title associated r i 400+d dates r i 400+e relator r i 400+k form subheading r i 400+f date of work r i"' 400+1 language r i"' 400+p part of work r i" 400+t title of book r i 400+v volume or number r i 410 series, corporate name r (r) i i 410/0, i1 inverted surname r r i" 410/1, i1 place, place + n arne r r 410/2, i1 direct-order name r r i" 410/0,12 author not main entry r r 410/1,12 author is main entry r r 410+a name r i 410+b subordinate unit r i 410+e relator r i automatic format recognition/butler 37 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 410+f date of work r i" 410+g miscellaneous r i" 410+k form subheading r i 410+1 language r i" 410+p part r i" 410+t title of book r i 410+u nonprinting element r 0 410+v volume r i 411 series, conference title r 0 i i" 411/0, ii inverted surname r 411/1, ii place, place+ name r 411/2, ii ·.direct-order name r 411/0, i2 author not main enhy r 411/1, i2 ,author is main enhy r 411+a name r i 4ll+b number r i 411+c place r i 41l+d date r i 411+e name subordinates r i 41l+f publication date r i" 411+g miscellaneous r i" 41l+k form subdivision r i 411+1 language r i" 4ll+p part r i" 411+t title of book r i 4ll+v volume r i 440 series, title r r i i 440+a title r r i r 440+v volume or number r i r 490 series, untraced or r r r t:raced differently i 490/0 . series not traced i 490/1 • series traced diff. r i r 490+a series name r r i r 500 bibliographic notes r r r 500+a general note r r i" 501 +a "bound with" r 0 502+a dissertation r i" 503+a bibliography history 0 0 504+a bibliography note r i 505 contents note r r 505/0 . contents complete r 505/1 · contents incomplete r 505/2 partial contents r 505+a contents note r i" 520+a abstract or annotation r i 600 subject a.e., personal r r i i 600/0, ii forename r 600/1, ii single surname r 60012, ii multiple surname r 600/3, ii name of family r 60010, i2 lc subject heading code r i 600/1, i2 annotated card heading r i 38 ] ournal of library automation vol. 
7/1 march 1974 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 600/2,12 nlm subject heading code r i 600/3, i2 nal subject heading code r 0 600/4, i2 other subject heading r i i 600+a name r i 600+b numeration r i 600+c associated title r i 600+d date r i 600+e relator r i 600+f date of work r i"' 600+k form subheading r i 600+1 language r i"' 600+t title of book r i 600+p part of book r i"' 600+x general subdivision r i 600+y period subdivision r i 600+z place subdivision r i 610 subject a.e., corporate r 0 i i 610/0,11 inverted surname r 610/1,11 place, place+ name r 610/2,11 direct-order name r 610/0,i2 lc subject heading code r i 610/1,i2 annotated card heading r i 610/2,i2 nlm subject heading code r i 610/3,12 nal subject heading code r 0 610/4,12 other subject heading r i i 610+a name r 0 1 i 610+b subordinate unit r i 610+e relator r i 610+f date of work r i"' 610+k form subheading r i 610+1 language r i"' 610+g miscellaneous r i"' 610+p part r i"' 610+t title of book r i 610+u nonprinting element r 0 610+x general subdivision r i 610+y period subdivision r i 610+z place subdivision r i 611 subject a.e., conference r 0 i 0 611/0,11 inverted surname r 611/1,11 place, place + n arne r 611/2,11 direct-order name r 611/0, i2 lc subject heading code r i 611/1, i2 annotated card heading r i 611/2, i2 nlm subject heading code r i 611/3, i2 nal subject heading code r 0 611/4, i2 other subject heading r i 611 +a name r i 611+b number r i 61l+c place r i 61l+d date r i 61l+e subordinate unit r i 611+f publication date r io 61l+g miscellaneous r i"' automatic format recognition/butler 39 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 6ll+k form subheading r i 611+1 language r i~ 61l+p part r i~ 61l+t title of book r i 6ll+x general subdivision r i 6ll+y . period subdivision r i 6ll+z place subdivision r i 630 subject a.e., uniform title r 0 i 0 630/0,i2 lc subject heading code r i 630/1,i2 annotated card heading r i 630/2, i2 nlm subject heading code r i 630/3,i2 nal subject heading code r 0 630/4, i2 other subject heading r i 630+a uniform title heading r i r 630+£ date of work r i~ 630+g miscellaneous r i~ 630+h media qualifier r i~ 630+k form subdivision r i 630+1 language r i~ 630+p part r i~ 630+s alternate version r i~ 630+t title r i 630+x general subdivision r i 630+y period subdivision r i 630+z place subdivision r i 650 subject a.e., topical r r i r 650/0, i2 lc subject heading code r i 650/1, i2 annotated card heading r i 650/2,i2 nlm subject heading code r i 650/3,i2 nal subject heading code r 0 650/4, i2 other subject heading r i i 650+a topical subject, place r i 650+b element after place r i 650+x general subdivision r i 650+y period subdivision r i 650+z place subdivision r i 651 subject a.e., geographic r 0 i 0 651/0, i2 lc subject heading code r i 651/1,i2 annotated card heading r i 651/2, i2 nlm subject heading code r i 65113,12 nal subject heading code r 0 65114,12 other subject heading r i 651+a geographic name, place r i 651+b element after place r i 651+x general subdivision r i 651+y period subdivision r i 651+z place subdivision r i 690 subject a.e., local topical 0 0 i~ 0 690+a topical subject, place 0 i 690+b element after place 0 i 690+x general subdivision 0 i 690+y period subdivision 0 i 690+z place subdivision 0 i 40 journal of libm1·y automation vol. 
7/1 march 1974 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 691 subject a.e., local geogr. 0 0 f 0 691+a geographic name, place 0 i 691+b element after place 0 i 691+x general subdivision 0 i 69l+y period subdivision 0 i 691+z place subdivision 0 i 700 other a.e., personal name r r i r 700/0, i1 forename r 700/1, i1 single surname r 700/2, i1 multiple surname r 700/3, i1 name of family r 700/0, i2 alternate entry r 700/1, i2 secondary entry r 700/2, i2 analytical entry r 700+a name r i 700+b numeration r i 700+c title associated r i 700+d date r i 700+e relator r i 700+f publication date r i~ 700+k form subheading r i 700+1 language r io 700+p part of work r i~ 700+t title of book r i 710 other a.e., corporate name r 0 i j~; 7~0/0, i1 inverted surname r 710/1, i1 place, place + n arne r 710/2, i1 direct-order name r 710/0, i2 alternate entry r 710/1, i2 secondary entry r 710/2, i2 analytical entry r 710+a name r i 710+b subordinate unit r i 710+e relator r i 710+f date of work r i~ 710+g miscellaneous r io 710+k form subheading r i 710+1 language r i~ 710+p part of work r i~ 710+t title of work r i 710+u nonprinting element r 0 711 other a.e., conference r 0 i i~ 711/0, i1 inverted surname r 711/1, i1 place, place+ name r 711/2, i1 direct-order name r 711/0, i2 alternate entry r 711/1, i2 secondary entry r 711/2, i2 analytical entry r 711+a name r i 711+b number r i 711+c place r i 711+d date r i 711+e subordinate units r i 711+f date of work r i~ . ' automatic for.mat recognition/butler 41 appendix 1 (continued) field•tag, treatment by program indicator field name lc uc oclc ofr 711tg miscellaneous r i" 711tk form subheading r i. 711tl language r i" 711tp part of work r ·i" 711tt title of book r i 730 other a.e., uniform title r 0 i r 730/0,i2 alternate entry r 730/1, i2 secondary entry r 730/2,i2 analytical entry r 730ta uniform title r i 730tf date of work r i" 730tg miscellaneous r i" 730th media qualifier r i" 730tk form subdivision r i" 730tl language r i" 730tp part of work r i" 730ts alternate version r i" 730tt title of work r i 740 other a.e., title traced differently r r i r 740/0, i2 alternate entry r 740/l,i2 secondary entry r 740/2,i2 analytical entry r 740ta title different r i 800 series a.e., personal r r i" i 800/0 forename r 800/1 single surname r 800/2 multiple surname r 800/3 name of family r soot a name r i sooth numeration r i boote title associated r i 800td dates r i boote relator r i 800tf date of work r i" 800tk form subheading r i 800tl language r i" 800tp part of work r i" 800+t title of work r i 800tv volume or number r i 810 series a.e., corporate r r i" i" 810/0 inverted surname r 810/1 place, placet name r 810/2 direct-order name r slota name r i 810tb subordinate unit r i 810te relator r i 810tf date of work r i" 810tg miscellaneous r i" 810tk form subheading r i 810tl language r i" 810tp part of work r i" slott title of work r i 810tu nonprinting element r 0 42 ] ournal of library automation vol. 
7/1 march 1974 appendix 1 (continued) field tag, treatment by program indicator field name lc uc oclc ofr 810+v volume or number r i 811 series a.e., conference r 0 i<) 0 811/0 inverted surname r 811/1 place, place+ name r 811/2 direct-order name r 811+a name r i 811+b number r i 811+c place r i 811+d date r i 811+e subordinate unit r i 811+f date of work r io 811+g miscellaneous r jo 811+k form heading r i 811+1 language r i"' 811+p part of work r i"' 811+t title of book r i 811+v volume or number r i 840 series a.e., title r 0 i"' 0 840+a title r i 840+v volume or number r i 590+a local notes field 0 0 i"' 0 910+a user option data field 0 0 i"* 0 lib-s-mocs-kmc364-20141005043735 75 a cost effectiveness model for comparing various circulation systems thomas k. burgess: washington state university library two models for circulation systems costing are presented. both the auto~ mated and the manual models are based on experience gained in the analysis of circulation services at washington state university library. validation tests for the model assumptions are devised and explained. use of the models for cost effectiveness comparison and for cost prediction are discussed and examples are given showing their application. introduction many methods for analyzing cost effectiveness have been presented recently in the literature.1 one main difficulty with studies of effectiveness is in quantifying the benefits, or in the case of libraries, assigning values to the quantity or quality of the services offered. 2• 3 one way to circumvent this difficulty is to compare the costs of different methods of providing the same services. value assessment of the services is eliminated by keeping them constant as shown in most cost benefit studies.16 this, of course, is not always possible when comparing manual library systems with mechanized systems. library circulation systems, however, may fit this type of model with relative ease. for this reason, the models described below were developed to compare a manual with a mechanized system. they have the added advantage of allowing for the prediction of costs for either the manual or automated system based on certain circulation loads. the utilization of the models is probably best understood by working through an application. therefore, a description of these applications as performed at washington state university library will be used. assumptions based on practices peculiar to washington state university are removed by the model through the use of the activities definitions for our library. washington state university library has been operating a mechanized circulation system since 1967. based on past experience, the system has recently undergone major modifications to improve its capabilities. we consider it to be a highly efficient machine circulation system. thus, cost 76 journal of library automation vol. 6/2 june 1973 effectiveness comparison with a similar manual operation can provide information on effectiveness of automated circulation systems in general as well as on the wsu implementation. model consideration to insure that the comparisons were fair and that biases were held to a minimum, mathematical models had to be established with rather rigid constraints. validations of these models had to be devised to insure that extrapolations of the model results were meaningful. information about our manual system in operation prior to 1967 is sparse, as no analysis had been performed. 
it was decided that the manual model should, therefore, be a variant of the machine model, since our machine system includes a small manual system. if the models are to be useful to others, they should make very few assumptions about circulation tasks. therefore, the models should break out each specific task so costs can be accumulated. this also insures that only circulation tasks are counted. if total hours of staff assigned to circulation are used as the basic labor costs, their time at other library functions is included and would provide erroneous data. using a breakdown by tasks will allow use of the model even if major changes occur in organizational or physical rearrangement of the circulation functions. twenty-three basic activities were identified that would cover all circulation functions of our library. a similar list should be prepared for each library to be modeled.7 our list can be used as a guide. these functions and their definitions are listed in appendix a. fifteen functions represent activity for which both the quantity of the activities and the average time to perform it are required information for building the model. of the nine remaining activities, eight require only the measurement of total performance time. the last activity, computer operations, was subdivided into three parts: computer charges, library equipment rental costs, and computer personnel costs. the computer personnel costs represent time donated by the computer center to keypunch, decollate and burst printouts, and prepare and schedule jobs. these personnel costs are a part of the machine system and are not reflected otherwise in the computer charges. these three charges are summed and used as a single dollar figure in the model. in our machine system, as in many other circulation systems, it is impossible to split our computer cost for each circulation subfunction because we use integrated data bases which are charged as a single storage rental cost and not split up among the various programs. the collection of data for this study could have resulted in a sizeable effort and could have unduly biased the data which were to be collected.8, 9 for example, circulation clerks might have taken as much time to measure the circulation transactions as the circulation transactions themselves required. therefore, we requested supervisors to estimate the time necessary for these tasks, the number of transactions performed, and the percentage of staff and student hours used. these data were developed monthly for a three month period during the middle portion of a semester. validation of these estimates to insure their reasonability was accomplished by comparing the total time expended in circulation as reported in the collected data with the total time assigned to circulation activities as reported in the payroll records (the usual manner of estimating costs).10 a surprisingly high degree of correlation was found, primarily due to the fact that few of our circulation staff members have responsibilities outside of circulation. the payroll data also had to be adjusted to reflect actual hours used in circulation functions. a 25 percent figure was used for regular staff, to reflect holidays and leaves (8.4-10 percent), coffee breaks (8-12 percent), sickness (2 percent), tardiness and work slumps (3 percent), and miscellaneous (3 percent). by the same method 15 percent was determined for student help.
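the payroll adjustment described above reduces to a single multiplication per category; the sketch below makes the factors explicit. the example hour figures are invented.

# overhead factors reported above: payroll hours that are not actually
# available for circulation work.
STAFF_OVERHEAD = 0.25    # holidays/leaves, coffee breaks, sickness, tardiness, misc.
STUDENT_OVERHEAD = 0.15

def productive_hours(payroll_hours, overhead):
    """convert payroll hours to hours actually worked on circulation tasks."""
    return payroll_hours * (1.0 - overhead)

# invented example: 1,600 staff payroll hours and 900 student payroll hours a month
print(productive_hours(1600, STAFF_OVERHEAD))    # 1200.0
print(productive_hours(900, STUDENT_OVERHEAD))   # 765.0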
the difference in total hours between the two samples was less than 5 percent (appendix b, table 5). the study data were collected from five separate organizational areas (three circulation desks, technical service division, and the library administrative office) which are reasonably independent of each other; the monthly variation in activities reported by the various units was also closely correlated. for example, the percent increase in checkouts for a month was approximately the same at all three circulation desks.

model calculations-automated system
monthly totals were averaged for each activity's transaction time, number of transactions, and percentage of effort allocated to staff or student labor (appendix b, tables 1, 2, and 3). average hourly wages were developed separately for student (part-time help) and staff based on salaries of personnel allocated to circulation. the total hours and salaries were then calculated for staff and student help for each activity. the following example shows the formulas used in calculating some of the entries in the tables.

a1 manual checkout: transactions (a1t) times transaction time (a1tt) equals total time expended (a1te), adjusted to hours:
a1te = a1t · a1tt   [1]
total part-time help in hours (a1pth) equals (a1te) times the percentage of student effort (a1ppth):
a1pth = a1te · a1ppth   [2]
total staff hours (a1s) equals (a1te) times the percentage of staff help (a1ps):
a1s = a1te · a1ps   [3]
total salaries (a1ts) equals (a1pth) times student rate (rpth) plus (a1s) times staff rate (rs):
a1ts = a1pth · rpth + a1s · rs   [4]

x shelving:
total student hours  xpth = xte · xppth   [5]
total staff hours    xs = xte · xps   [6]
total salaries       xts = xs · rs + xpth · rpth   [7]

all other activities were calculated in the same manner as shown above. personnel hours used were totaled and multiplied by the hourly rates. the salary totals and the computer costs were then added together to get the total system cost per month (appendix b, table 6):
total salary cost = 1.15 Σ(i=a to v) ipth · rpth + 1.25 Σ(i=a to v) is · rs   [8]

figure 1 represents curves of monthly cost vs. monthly circulation. the automated system curve was determined from the initial model plotted point, and from extrapolations to other plotted points which were computed based on the following factors: a 25 percent increase or decrease in circulation will result in a 5 percent increase or decrease in computer costs. this estimate results from analyzing the computer processes. the bulk of the computer cost results from sorting and other total file processes which are reasonably insensitive to changes in volume of updating. a factor of 25 percent change in circulation results in a 30 percent change in personnel costs. the 5 percent differential may be conservative, but results from the need for additional supervisory support with its higher salary for each additional operational position added. using the above factors, several additional points were predicted and plotted and the automated system curve was drawn to fit these points (appendix b, table 7). validation of these factors was determined by using budget information and circulation data available from the year 1968 (appendix b, table 8). these data were used to establish a point on the graph. the 1968 costs were compared to the predicted cost as shown by the curve for the circulation volume in 1968. this provided a cost differential which was within 1 percent of the curve predicted costs (figure 1).
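a compact restatement of formulas [1] through [8], together with the extrapolation factors used to draw the automated-system curve, is sketched below. the activity list, transaction counts, times, rates, and computer charges are invented placeholders; the actual wsu figures appear in the appendices of the original study.

# sketch of the cost model, formulas [1] through [8]; all numbers are invented.
ACTIVITIES = {
    # name: (transactions per month, minutes per transaction,
    #        fraction student effort, fraction staff effort)
    "manual checkout": (1200, 1.5, 0.80, 0.20),
    "shelving":        (9000, 0.5, 0.95, 0.05),
    "snag searching":  (300, 4.0, 0.30, 0.70),
}
STUDENT_RATE = 2.10        # dollars per hour (placeholder)
STAFF_RATE = 4.25          # dollars per hour (placeholder)
COMPUTER_COSTS = 1900.0    # monthly machine, rental, and support charges (placeholder)

def monthly_cost(activities, computer_costs):
    student_hours = staff_hours = 0.0
    for count, minutes, p_student, p_staff in activities.values():
        time_expended = count * minutes / 60.0          # formula [1], in hours
        student_hours += time_expended * p_student      # formulas [2] and [5]
        staff_hours += time_expended * p_staff          # formulas [3] and [6]
    # formula [8]: the 15 and 25 percent non-productive-time factors are applied here
    salaries = 1.15 * student_hours * STUDENT_RATE + 1.25 * staff_hours * STAFF_RATE
    return salaries + computer_costs

def extrapolate(personnel_cost, computer_cost, circulation_change):
    """apply the reported factors: a 25 percent change in circulation gives roughly
    a 30 percent change in personnel cost and a 5 percent change in computer cost."""
    steps = circulation_change / 0.25
    return personnel_cost * (1 + 0.30 * steps) + computer_cost * (1 + 0.05 * steps)

print(round(monthly_cost(ACTIVITIES, COMPUTER_COSTS), 2))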
these data were adjusted to reflect annual circulation hours used in circulation in 1971. model cl\.lculations-manual system · the manual model was a modification of the automated model. obviously no machine costs were incurred, but costs for filing and retrieving cards from large tub files of book cards of items in circulation must be added to each' check-out or check-in procedure as well as to snags, holds, and other categories. since ·some· loaned materials are not included in our automated system, a small manual circulation operation runs parallel to the automated system and was included in the automated study. this small manual system served as the base activity for the manual model in the study. a cost effectiveness model/burgess 79 retrieval time from a card tub file is dependent on the number of the cards in the file. the tub file size is approximately equal to the size of the computer's circulation file. sample filing times were made on a catalog card file of comparable size. the results were an average of 40 seconds per item on timings of single records and of batches of alphabetized records to be filed. this figure was then used to extend the average time of the appropriate activities in the parallel manual systems data (activities al, b, il, k, m, and q). following the calculation method used in our automated system, data were developed from the 1971 circulation data and a curve was drawn for the manual system (appendix c and appendix b, table 7 and figure 1). validation of this curve by budget information available from 1967 shows that the difference in the predicted cost from actual cost was less than 2 percent (appendix c, table 5). this represents a significant correlation and validates the entire manual model. generalized use of the models as has been shown by the example of its use at wsu, the models provide for two functions: cost comparison of automated and manual circulation systems at the same levels of book circulation, and prediction of cost in either a manual or automated system at different levels of circulation. of course, combinations of these models may be made, such as: at what circulation levels are costs of both models equal? or, what will my costs .e • cron-over point 4,000 8/yjo 12,000 16,000 20/)00 24,000 28,000 32,000 36,000 monthly !m rrj is fi'e ill ie iw ck f?. ,;1. b u 0 i ;;.. 0 oj 0[&' 1s' i q ?:l1:> -lt.t e i q ~~ i 9 7 i 0 .:?'f ~ 0 ~ 0 q 7 3 08'7.$' /)1 unique # 00 09 7 update infor~l~tio!t d i l j l i i i i _l ll i ii i ijjjjjj 2-74 016 indexing 0 3 i 0 l j i i i i iii_ljljij 80 017 pay t.leth. 13 ').~··· 3 7-80 018 publisher ]) 11 n i :s r. ~ iii en/ name .l .. 4 7-40 019 publisher & :~ i.b jf.l.:.z:tritihi ia-ivl£1 i i j i i l_l_l_l i address l l j ll t i i 41-43 027 sullj .code 0 i 6 h4-h9 028 lf'.o.date 0 &, a lrl7j3j 51-7? 020 ublisher la2. e.w ll::i~iiljki i l.,t..j!t:--iwj 1yic61tc i f\1 i i i i i city,stat " 76-80 jo.:>1 pub.zip j (i) 0 jl9l fig. 3. libmry pe1'iodical data tmnsmittal form duplicated the subscription length field. (the subscription length field itself was to be kept intact for transfer to the history record. ) a one-position field (figure 3, item 002) could be programmed to suppress printing of a purchase order, as in the case of a canceled subscription, or to keep the order in "hold" if a budget problem arose near the end of the fiscal year. hold status would cause the order to be printed with the tag "pay when authorized" to call attention to this status. 
other fields shown in figautomated periodicals system/harp and heard 89 !ncc library periodical data 'rransmittal unique # p () 0 '9 '7 sheet #2 original inforh!\tion update il!formati on field 0 d 'd cc iteh descrip'l'io l ..• . -5 7-41 022 publisher ~ ck -l 'l i-ko 4 los' ~} l 0000 13-1'7 014 18-26 c cj ? lg 0 f 6 9 008 2'7-31 ,{: 0 stj 6 006 32 i i 010 33-80 'i 0 s;j. (., l l l l j b 013 7-12 ;? 8 5 ~¢ rooo 13-17 014 18-26 0 q 10 fc n"~' o! 008 2'7-31 {/ () 'l 0 0 006 32 i i r q 0 q tf 'f i 1111 010 33-80 i i b _q13 7-12 0 0 b 317 tf 000 13-17 c f, ?b2_ 014 18-26 ·c 9 ? 0 0 g 7 3 008 2'7-31 0 i %' 0 0 006 32 3 010 33-80 l111j b 013 7-12 000 13-17 014 18-26 n-r-l __ j 008 27-31 ~ --32 _r-·-,.-j-r-· -r-r--~ 010 33-80 ! i · i lj ~f+tr-· j--rtieb t-r-----i i i 1 i ii-tj l i ! j_ fig. 5. historical record form plete file on the history of each subscription. data necessary to maintain this file were the subscription length and cost fields (described above) and the addition of fields for the purchase order number (figure 3, item 013) plus the invoice number (figure 3, item 010). the computer program was wlitten so these data could be automatically transferred to the history record card at renewal time. automated periodicals system/harp and heard 91 superintencent of documents governmf.nt printing office washington, dc 20402 re• our public lands attention subscription claims department 02107/73 according to our records, we have not receive,d the following issueisi. if our su8scription is in order, kindly send our missing issues. volumf. 2?, issue n0.3••summer 1q72 -------.. thank you moraine valley community college library fig. 6. claims letter cpl•oct ba•d•l8220 morainf vallev cc~munity college library 10900 so 88th ave palos hills il 60465 claims data are transmitted as needed by providing the unique number of the title and the information concerning the missing issue( s). claims letters (figure 6) are then mailed in window envelopes, so no typing is required. when working with periodicals or serials one becomes accustomed to sudden or unusual changes that occur with or without notice. a few examples could be changes in title, frequency, or general publishing patterns. we wanted to provide our system with the ability to notify us that an investigative procedure had been completed and thus avoid many of the "why's" that recur. accordingly, we included a comment card (figure 4, card c) which can be updated as circumstances require. from the transmittal forms for the initial batch of titles, cards were keypunched and built into a magnetic tape file. the serials technician now submits updates or additions (e.g., for new titles) within the schedule provided by information systems and the tape is updated each month. the main printed report is run monthly (figure 7). this master list includes all bibliographic, holding, and renewal information. titles due for renewal in three months are flagged with asterisks. the technician 92 journal of library automation vol. 7/2 june 1974 m 0 r a i n e val ley c 0 m m un i t y c 0 l l e.g e _______ ___!.2{2117_3 ______________ _ periodical list page_ 67 r r 2 bu 01200 0875 00097 p dun's review 197z•oate 1965•1971 024502 0973•0875 m dun's rev! 
ew 666 fifth ave new york, ne\1 york ~~rkin~o~:~l~~r~6~~~y 86o~t ~~;3~~~~~~~0 --~ library 10900 south 88th ave palos hills, !l 60465 0310 062573 _b ____________ _ 10019 ___ ____:_ _______________ _ 016 history 00097 ebsco 0968•0869 00500 1 805276 ------------------~co o~7o oo70o 1 905441 -------------006374 06-_73 0970•,9873 01800 3 ' ______ ___:_____.:_____~~--"-----:__ _ ___:_ _ ______.:.___---'---~-'-----~-~------------******* history 00664 p early years .0373•date early years , one hale lane library, moraine valley comm. college, 10900 south 88th ave., palos hills, ill. 60465 022934 0373•0374 9/y 000 darien, conn. r r 1 t 00700 ------040273 06820 025 no _history record found 0374 a ------0-~~:~1 ~l ,eb~~~~2""-d~at~e-ll59•1 072 023-yr608'1:3-rzt4 m 10 r r 1 l 00595 1274 051013 bhistory ebony . 820 s. michigan av_e chicago, !lltn~!=._s__:..._....:____!6~0~6~05~~--'--~-. p 160464mrn88t090092 06/b 2 moraine vall comm 06080 '10900 s 88th ave 13 coll_lib palos hills, ill 60464 004 00098 ebsco 0868•0769 00500 1 805276 ebsco 0869•0770 00600 1 905441 _______ __:o:o:0_.:::63=-.7~3~05::__:•73 0870•0773 01200 3 history 00099 px ecology ·today 0371•0872 ecology today r r p ·ooooo 000000 000000000 m 000 box 180 ll-6 21'3 -----~s_t _mystic connecticut -~_b_b ___________ _ moraine vall~y comm coll lib 10900 s 88th ave palos hills, ill 60465 ceased pub•8/72--replaced with environment 00099 010859 02•71 u37l-u'zt2llll600 1 015680 02-73 0372•0273 00600 1 fig. 7. master periodical printout 019 i . 0000 . a determines the current subscription price and number of years to renew each flagged title and updates these fields. at the next monthly run, facsimile purchase orders (containing all revised data except the purchase order number) are printed (figure 8). the technician types up numbered purchase orders from these and forwards them to the business office in time for payment. we intended our system to utilize purchase order forms to be run directly on the computer. therefore, our present method of typing from facsimiles does seem wasted effort, but is looked on as a stopgap measure for the present and the inconvenience is tolerated while waiting for the more desirable method. if the computer forms are adopted, we may have increased conflict in price updates because there will be less opportunity for last minute corrections. however, we do plan to avoid as automated periodicals system/harp and heard 93 assoc. p.o. no ... .0 .hen.tion· ·· ·t.ia~kltv * ****************>lr>lr>lr.i.>lr>lri*>lrlilr>lr•iiii*h****i> .z i • vsiir.rfnevai. s•la§t~~tktltl~··r~·;i~b~~itj.;,.·. :.·,... .... ··= ··· ·>~~cit)'· * aovocate * * * * * * * * * * library * * palos.hllls ll 6046~ * • payment enclosed * * * * to this pri~da~~~;'.~~;~j~ ~j~~e~ ·.··• corr espfjnoene·~.·: •... :. • ·.... · · """'' ·: •·:· < ii< .,. fig. 8. facsimile purchase order * * * • * * * * much conflict as possible by plans to run actual purchase orders closer to the actual expiration date. the renewal procedure followed involves these steps: 1. check purchase order facsimiles for accuracy and match with renewal notices received. 2. check kardex for material arrival regularity. 3. type and forward purchase orders to business office. 4. update forms and send to information systems. 5. scan master list for flagged items and record their unique numbers and titles on update sheets. 94 journal of library automation vol. 7/2 june 1974. 6. update flagged items with renewal notices as follows: a. price. b. number of years for renewal. c. new subscription dates. d. 
method of payment. e. any changed information concerning publisher and mailing label. 7. update flagged items without renewal notices in the same manner, using the latest issue received. 8. as additional renewal notices for flagged items come in, make necessary updates. 9. send all updates to information systems at least three days before the master list and facsimiles are due to be run. price changes do occur between the time the item is flagged and the check is mailed. with most, though, notification is received from the publisher before the purchase order is actually typed, and corrections are made at that time. since the renewal process is linked to the expiration field, updating that field also causes transfer of data for the year just expired into the history record, as explained earlier. free materials, government depository items, and standing orders for which invoicing is known to be automatic are handled by filling the expiration date field with zeros. if a purchase order history record is needed, as with standing orders, these fields are updated at the time the invoice arrives. our master list does not contain headings to explain field descriptions. we place our master list in· a binder; a legend describing placement of field descriptions is attached to the inside of the front of the binder cover and is readily available for reference. we felt headings on each record would be clumsy, confusing, and would waste valuable printing space. codes and their explanations are attached to the inside of the back of the binder cover. to date, two revisions have been installed into the system: ( 1) in 1972 we decided to classify our holdings by subject. space was "found" for three digits, and we then proceeded to code our subjects (figure 3, item 027). our subject codes and their meanings are explained in figure 2a. ( 2) correspondence was assisted by having all necessary information in one location. the cost, purchase order number, and problem explanation were available by merely flipping the printout pages to the title in question on the master list. however, the date the purchase order was typed had to be looked up in order to effect an intelligent solution. six spaces were again "found" to provide this purchase order date (figure 3, item 028). actual computer programming was performed by information systems staff in bal, and programs are run on the college ibm 370-135 computer. automated pet·iodicals system/harp and heard 95 results it has not been possible to figure actual monetary costs for the library portion and maintenance of this system, nor to compare these costs with the manual system. libraries have traditionally been weak in figuring operation costs, and we confess to not having been very innovative in this area. we do not have specific itemized costs for our manual routines, so actual comparisons are not possible. a few figures concerning library time can be given. from october through december of 1971, when the initial phase was set up, the serials technician and public services librarian each contributed about 20 percent of their time, and a student aide worked 10 to 15 hours per week on the clerical part of the data transmittal. since that time the system has been operational for over two years, and some time approximations concerning updating, adding to the file, etc., are now available. 
with development behind us, time contributed by the serials technician, who is now solely responsible for the maintenance of the system, has dropped from 20 percent to between 5 and 15 percent. exact costs are difficult to extract, since this varies during the year according to the number of renewals due in particularly heavy expiration months as compared with those due in light expiration months. the library as part of the college is not charged for use of computer facilities. figures for machine time and keypunching are available and are as follows:
program: machine time (hr.), keypunch time (hr.)
periodical additions per 100 titles: .1, 8.0
periodical updates per 100 titles: .1, 2.0
purchase order printing: .5, .5
claim disbursements: .1, .2
miscellaneous reports: 3.0, .0
information systems has given their monetary cost in developing this system as $5,970 for programming time. they also figure program maintenance at $215 per year and the cost to run programs per year at $256. we can list important benefits we have derived. renewal problems have been eliminated. the few duplicate problems can be handled now as soon as they occur. our system handles all types of live subscriptions and the "dead file" as well. there is no more fussing with cards since we have a one-stop, clear record of holdings and histories, including the entire invoice and payment record for each subscription. at renewal time all the information for purchase orders is listed on a single-sheet facsimile. claim letters are done for us and we can call for various listings as they are needed. reports we receive are: master listing once a month, purchase order facsimiles once a month, claim letters as needed, fiscal year total cost reports, fiscal year area cost reports, subject lists as needed, holdings lists as needed, unique number lists as needed. conclusions many librarians having access to sophisticated computer facilities content themselves with producing a more or less elaborate holdings list. subscription placements and renewals are handled manually, often through a commercial agency. common agency problems such as overlapping and lapsed subscriptions are simply tolerated. we feel from our experience that if enough effort is expended to create a successfully operating holdings list, a small library does not require much further effort to add renewal, history record, and claiming functions. this eliminates agency problems, provides the ability to manipulate files for producing various reports, and, in our opinion, results in more efficient and convenient record-keeping. the size of our operation falls at the lower end of a range of libraries having holdings large enough to require at least one individual's time. translated into figures, we feel that any automated system would be wasted on holdings of under 150 periodicals. the crucial factor in relation to size is not really any magic number of holdings but the ratio of available staff time to the size of the holdings. this factor must be evaluated by libraries considering any type of automated system. we feel much of the success of our system has been dependent upon our initial planning, our staff availability, and our conviction that a change was necessary to eliminate the problems we were encountering with our manual system. also the availability of the computer facilities, the encouragement provided by our superiors, and adequate library staff and information systems staff all contributed to an efficient changeover.
acknowledgments gratitude is due moraine valley community college for its permission and support of this innovation. particular gratitude is due anabel sproat, head librarian, for her permission, support, and constant encouragement. the excellent work and friendly attitude of linda nemeth and the entire information systems staff who made this project a reality have been deeply appreciated. also, the capable assistance of student aide barbara hart (goeske) in the recording process proved to be a very valuable asset. automatic processing of personal names for filing foster m. palmer: associate university librarian, harvard university library, cambridge, massachusetts describes a method for preparing personal names already in machine readable form for processing by any standard computer sort program, determining filing order insofar as possible from normally available information rather than from special formating. prefix recognition is emphasized; multiword forename entries are a problem area. provision is made for an edit list of problems requiring human decision. possible extension of the method to titles is discussed. this paper describes a method of computerized filing of personal names for display in book catalogs or other lists intended for direct human consultation. the problem is to be distinguished from a related but different one: computerized storage for retrieval by means of a search key, in which machine rather than human convenience can determine the order. to the extent that filing is a purely mechanistic sorting process, it is ideally suited to computerization. however, it was early recognized that there are many possible complications in machine filing of library entries, even in the relatively straightforward area of personal names. some of these complications arise from such factors as upper-case codes, diacritic codes, and punctuation; others are the result of library rules or practices that call for departures from strict alphabetical order. while the latter are especially numerous in subject headings and titles, they affect names as well, for example, the custom of filing mc as if mac. while no general review of the literature on machine filing will be attempted here, attention will be called to selected contributions. nugent (1) described an approach to computerizing the library of congress filing rules and pointed out areas where the present rules do not lend themselves to mechanization. cartwright and shoffner (2) discussed four major ways of approaching a solution to the problem and concluded that a mixture of different methods would eventually be required. in a later publication cartwright (3) developed his ideas further and included a brief description of the present writer's then unpublished work. the principal monograph on the subject is that by hines and harris (4). they present a suggested filing code departing significantly from those in widespread use and propose that material be encoded in a certain fashion so that it will be ready for computer sorting. in particular, considerable dependence is placed on distinctions between single, double, and multiple blanks separating words or fields. in a recent paper, harris and hines restate their rules briefly and report on their later research (5). the present paper describes a different, virtually an opposite, approach.
rather than relying on special formating of the material at the time of encoding, the system described herein attempts to derive the necessary filing information from normally formated material. historically, it grew out of a desire to construct improved indexes for use at the harvard university library to the body of records distributed by the marc pilot project, in which there were field indicators and a limited number of delimiters within fields, but a general absence of information added expressly for the purpose of filing. while some early work embraced both personal names and titles, it was soon apparent that names by themselves presented a considerable challenge, and further consideration of the even more difficult areas of titles, corporate entries, and subject entries was deferred. a few comments on the possible applicability of the general method to titles will be made later. the concrete form which the work eventually took was an autocoder macro instruction for a second generation computer, an ibm 1401. (a macro instruction is a means of calling forth by means of a single instruction a more extensive routine already worked out and placed in the system "library.") since the 1401 was a fairly small computer, it was important that the algorithm not require an excessive number of instructions, and since the internal speed of the machine was only moderate, it was also important that processing be direct and economical. the method used, however, is by no means limited to a particular computer or a particular language. a partial version of the algorithm has been written in adpac, as an exercise in the evaluation of that language, and run on an ibm 360-65 using marc ii test data. the system is based on examination of names (previously identified as such by appropriate tags) and development of parallel sort keys consisting processing of personal names/palmer 187 only of letters, numerals, and blanks, readily processable by any standard computer sort package designed for alphanumeric information. the only requirements are that blank sort low and that the letters a z and the numerals 0-9 sort in their natural order; whether numbers are considered higher or lower than letters does not matter. processing starts at the beginning of the name and proceeds until one of three conditions prevails: the number of characters examined is equal to the length of the field as specified in the record; the number of characters developed in the sort key has reached a specified cut-off point or the default value of 40; or a delimiter indicating the end of the name, or the end of the name proper, is encountered (a search being then made beyond the delimiter for a date, which, if found, is added to the sort key). the sort key is derived by transferring letters (or, in the case of a date, numbers) from the source, with occasional modifications as described below, and inserting one of four filing codes at the end of each word or element of the name. in early work, single special characters were used as filing codes, but this was inappropriate as a general solution since the filing order of these characters depended on the collating sequence peculiar to a particular computer. furthermore, it was inconvenient because it involved changing all blanks to something else, since a blank within a name with its implication of something to follow should not file as low as whatever indicates the very end of the name. 
the idea of using a two-character code, the first always being blank so that any filing code will file ahead of any letter or date, was derived from nugent (1) and has been followed in all later work. only three filing codes were actually used in compiling indexes to the marc i tapes, and in the first description privately circulated by the author (6). however, at least four are now seen to be necessary, actual need to distinguish the second and third not yet having been encountered but being possible:
code (blank followed by:) and placement:
3: the end of the name including date if any.
5: between the name proper and a date.
6: the end of the surname.
7: the end of any other "word" of the name. (a word is any element followed by a blank, hyphen, comma, or period, except that prefixes which are identified as such are not considered separate words.)
the following examples illustrate the use of the codes and the general workings of the system. in this and later examples, the left hand column gives data in marc i format (where diacritics are represented by superscript numbers preceding the letters to which they apply, and the equal sign is a delimiter), and the right hand column gives the sort key as derived by the macro.
arthur -> arthur 3
arthur, joseph -> arthur 6joseph 3
arthur, joseph,=1875- -> arthur 6joseph 51875 3
arthur, joseph charles -> arthur 6joseph 7charles 3
arthur-behenna, k. -> arthur 7behenna 6k 3
arthur-petr2os, gabriele maria -> arthur 7petros 6gabriele 7maria 3
wilson, william -> wilson 6william 3
wilson, william,=1923- -> wilson 6william 51923 3
wilson, william lyne -> wilson 6william 7lyne 3
wilson-browne, a. e. -> wilson 7browne 6a 7e 3
the use of the numbers 3, 5, 6, and 7 is arbitrary to a degree. an interval was left between 3 and 5 so that the end of name code could be changed to 4 if the name were a subject rather than a main or added entry. no extra interval to accommodate added entry as distinguished from main entry was left because the author did not wish to encourage what he regards as an unwise practice. however, those who insist may easily substitute a new series of codes allowing for it. the distinction between end of name and end of surname serves to bring simple forename entries, that is those consisting of a single word, e.g. sophocles, ahead of similar surnames, e.g. sophocles, evangelinus apostolides. no serious work has yet been undertaken on the problem of processing complex forenames, but the distinctive tagging of forenames in marc ii has made available a growing body of experimental data and the codes 1 (and 2 for subject) are reserved for possible future use in this connection, without any intent of prejudging the question whether complex forename entries should come before similar surnames. it is the view of the author that the filing of complex forename entries is one of the areas in which all librarians are on most uncertain grounds in assessing the preference and convenience of readers. in handling such entries as alexander, mrs., or maurice, sister, the algorithm depends on the presence of a delimiter before mrs. or sister to avoid filing after alexander, milton or maurice, robert. such delimiters were in fact present in the marc pilot project data.
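to make the shape of these keys concrete, here is a minimal python sketch of key construction for the plain surname-comma-forenames pattern. it is an editorial illustration, not a transcription of palmer's autocoder macro: it assumes entries are already lower case and free of diacritic codes, and it omits the cutoff length, prefix recognition, and the doubt/edit-list machinery discussed later.

    import re

    # filing codes from the article: a blank followed by a digit, so every
    # code sorts ahead of any letter or numeral in a standard sort.
    END_OF_NAME, BEFORE_DATE, END_OF_SURNAME, END_OF_WORD = " 3", " 5", " 6", " 7"

    def sort_key(entry):
        # split off a date introduced by the "=" delimiter, keeping digits only
        name, _, date = entry.partition("=")
        date = "".join(ch for ch in date if ch.isdigit())
        # split the surname from the forenames on the first comma
        surname, _, forenames = name.partition(",")
        sur_words = [w for w in re.split(r"[ ,.\-]+", surname) if w]
        fore_words = [w for w in re.split(r"[ ,.\-]+", forenames) if w]
        key = ""
        for i, w in enumerate(sur_words):
            key += w
            if i < len(sur_words) - 1:
                key += END_OF_WORD        # interior element of a compound surname
            elif fore_words:
                key += END_OF_SURNAME     # surname ends here, forenames follow
        for i, w in enumerate(fore_words):
            key += w
            if i < len(fore_words) - 1:
                key += END_OF_WORD        # interior forename or initial
        if date:
            key += BEFORE_DATE + date     # the date files after the name proper
        return key + END_OF_NAME

    print(sort_key("arthur, joseph,=1875-"))   # arthur 6joseph 51875 3
    print(sort_key("wilson-browne, a. e."))    # wilson 7browne 6a 7e 3

run against the examples above, the sketch reproduces the published keys, which is enough to show why a plain alphanumeric sort of these keys yields the intended filing order.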
despite the limitations mentioned in dealing with multiple-word forename entries and with surnames lacking forenames, the algorithm is well suited to names in the normal modern pattern, namely a simple or compound surname followed by a comma and one or more given names or initials. furthermore, very specifically, it deals with prefix names. prefixes with apostrophes are taken care of by a general dropping out of apostrophes and other non-significant punctuation: [l'isle, guillaume de] lisle 6guillaume 7de 3 o'brian, robert enlow obrian 6robert 7 enlow 3 the same feature also handles such names as the following: prud'homme, louis arthur prudhomme 6louis 7arthur 3 ta'bois, roland tabois 6roland 3 processing of personal names/palmer 189 most prefixes, however, are dealt with by a specific search based on examining the first letter of each new "word" of the name. if the element begins with a, b, d, e, f, i, l, m, 0, s, t, v, or z, a branch is made to a prefix searching routine tailor-made for the particular letter. takin& names beginning with l as an example, if the second character is "e," "a,' or "o," a prefix may be present; otherwise the prefix search is discontinued. if still searching and the third character is a blank or a hyphen, a prefix is adjudged to be present. the letters "le," "la," or "lo" are moved to the sort key output field. three input and two output characters are counted, effectively skipping over the blank or hyphen. similarly, if the third character is an "s" followed by a blank or a hyphen, "les," "los," or "las" is moved with a count of four input and three output. otherwise there is no prefix. la place, pierre antoine de laplace 6pierre 7antoine 7de 3 las cases, philippe de las cases 6philippe 7 de 3 le fanu, joseph sheridan lefanu 6joseph 7sheridan 3 lo presti, salvatore iopresti 6salvatore 3 routines for other letters, similar in approach but varying in detail, produce similar results: degli antoni, carlo degliantoni 6carlo 3 de la roche, mazo delaroche 6mazo 3 fitz gibbon, constantine fitzgibbon 6constantine 3 van der bijl, hendrick johannes vanderbijl 6hendrick 7johannes 3 the search for prefixes and quasi-prefixes is not limited to the first surname. it is and quite plainly should be extended to given names: bundy, mcgeorge bundy 6macgeorge 3 bundy, mary lee bundy 6mary 7lee 3 whether it should be extended to later elements of compound surnames is problematical. bowing to the fact that filing is as much an art as a science, in practice a compromise was reached: the prefix search was extended to compounds, except when the prefix of the succeeding element begins with d. the exception was made to accommodate the large number of hispanic names in this pattern, since it seemed clearly preferable to file all the names beginning "perez de" before any of those beginning "perez del": p2erez, joaqu2in perez 6joaquin 3 p2erez de urbel, justo perez 7de 7urbel 6justo 3 p2erez del castillo, j os2e perez 7 del 7 castillo 6jose 3 p2erez gald2os, benito perez 7 galdos 6benito 3 perhaps skipping prefix treatment in subsequent elements should have been made the rule rather than the exception; but an exception would then have been required for "me," "st.," and perhaps others. a list of the prefixes and quasi-prefixes sought for is given in table 1. note that in some cases the result is considered doubtful, and a special signal is set. in such situations the program can then set another signal within the macro and reprocess the name using alternate rules. 
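as a further editorial sketch (again not the macro itself), the prefix-closing idea can be imitated by repeatedly folding a recognized prefix into the element that follows it; the prefix list below is a small assumed subset of table 1, and the mc/m'-to-mac expansion, the hispanic "d" exception, and the doubt signals are all omitted.

    # a deliberately small, assumed subset of the prefixes in table 1
    PREFIXES = {"de", "degli", "del", "della", "der", "des", "du",
                "fitz", "la", "las", "le", "les", "lo", "los", "van", "von"}

    def close_prefixes(words):
        """fold leading prefixes of a name element into the word that follows,
        so that ['le', 'fanu'] becomes ['lefanu'] and ['van', 'der', 'bijl']
        becomes ['vanderbijl']. purely illustrative."""
        out, run = [], ""
        for w in words:
            if w in PREFIXES:
                run += w              # prefix found: close it up, skipping the blank
            else:
                out.append(run + w)
                run = ""
        if run:
            out.append(run)           # name ended on a bare prefix; keep it as a word
        return out

    print(close_prefixes(["van", "der", "bijl"]))   # ['vanderbijl']
    print(close_prefixes(["de", "la", "roche"]))    # ['delaroche']
    print(close_prefixes(["fitz", "gibbon"]))       # ['fitzgibbon']

in the working macro the search was driven by the first letter of each new word and branched to letter-specific routines, which is both faster and more selective than the flat membership test used in this sketch.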
190 journal of library automation vol. 4/4 december, 1971 table 1. list of prefixes, etc., found by special search a 1, 4, 7 den st. 4, 15 a 2, 4 der 4, 11, 18 ste. 16 ab des te 4, 11 al 5 di ten 4 ai 3, 4, 6 do ter an 4, 7 dos 4, 11 the 1, 4, 8 ap du van 1, 17 at el 5 van 2, 4, 12, 17 aus 17 el 3, 4,6 van' ... 4,9 aus' ... 4, 9 fitz vande bar 10 im vanden bat 10 in 17 vander ben 10 la ver da las von 17 das 4, 12 le vande de 17 les vanden degli 1 lo vander dei los z 4, 5 del m' 4, 14 zu 17 della mac zum delle me 13 zur della 0 1. only when followed by blank. 2. only when followed by hyphen. 3. only when upper case. 4. "doubt" signal is set. 5. bypassed, i.e. dropped out and disregarded. 6. bypassed if "alternate" signal is on. 7. bypassed unless "alternate" signal is on. 8. bypassed if first word. 9. aus'm and van' t are closed up to "ausm" and "vant'' by the general dropping of apostrophes but no attempt is made at further special processing since their rarity would not justify the necessary elaboration of the algorithm. 10. not treated as prefix if special parameter is present. 11. not treated as prefix if "alternate" signal is on. 12. not treated as prefix unless "alternate" signal is on. 13. expanded to "mac". 14. expanded to "mac" unless "alternate" signal is on. 15. expanded to "saint". 16. expanded to "sainte". 17. another prefix may follow, as in de la. 18. previous notes do not apply when preceded by van or von. processing of personal names/palmer 191 diacritical marks on other than the first letter, or capitalization beyond the normal, such as all caps., would prevent proper processing. except as indicated, lower case is included along with upper, and prefixes followed by a hyphen are treated the same as those followed by a blank. the marc i corpus included several names with hyphenated prefixes, and fortuitously a method was available with the 1401 for giving the hyphen search almost a "free ride" along with that for the blank. since the code for hyphen was a single bit, the so-called b bit, and a blank was represented by no bits, a "branch if bit equal" instruction specifying all the other bits, a, 8, 4, 2, and 1, would branch if any character other than blank or hyphen was present. implementations for other machines may have to devote a disproportionate number of instructions to the search for the rare hyphenated prefixes, or else risk missing them. no doubt some other prefixes could be added to the list. "ua," for example, was considered but not included in the actual working macro after examination of a catalog of five million cards showed that only two beginning with these two letters were not for the prefix. the increase in processing time involved in adding another initial letter to the list of those looked for did not seem to be justified. in the program employing the macro for production of an index to names in the marc pilot project data, whenever the "doubt" signal was set, the name was printed on an edit list for human inspection. the name was then reprocessed with the "alternate" signal set and if a different output form was developed, this form also was printed. if the person reviewing the list accepted the first form, no special action was necessary. if the second was preferred, a card with an identifying number and the code 2 was punched; if a hand-made form was needed, this form was entered on a card with the code 3. 
these cards and the original output tape were then used to produce an edited output tape, in which the alternate forms were dropped unless a card directed otherwise. a second printed listing, recording the action taken, was also produced. the doubtful cases identified by the algorithm are not limited to the prefix problems described above. by far the commonest occasion for doubt was the presence of "a," "o," or "ii." was it a germanic umlaut, calling for translation for filing purposes to "ae," "oe," or "ue," or was it something else? this is not the place to debate the practice, followed in most american academic libraries, of filing umlauted letters as if spelled out with an "e." the major bibliographies covering the german book trade do so, but most german dictionaries and encydopedias do not; the example of other reference works and indexes is mixed. since the aim of the work described here was to produce an index of names that could be used comfortably by librarians used to the practice, a means of continuing it was sought. however, it would be manifestly improper to insert an "e" if the mark were a diaeresis rather than an umlaut; and, in the opinion of the writer, almost equally improper for hungarian, finnish, and turkish vowels. even 192 journal of library automation vol. 4/4 december, 1971 those who do file such vowels in these languages as if they were germanic do not usually do so for chinese. it should be noted here that not all transformations of special letters turn on the doubt signal. "a" is routinely translated to "aa" and icelandic thorn to "th." other occasions for signalling doubt include names with a suspiciously high number of words before the first comma. this provision was introduced in an attempt to catch some non-names in the original data which had been wrongly coded, e.g. women's association of the st. louis symphony. when found, a card with the code d was punched for the edit run to delete these entries entirely. statistics of processing for the entire corpus of marc pilot project data as cumulated and to some slight degree edited at the harvard university library will be useful in seeing the edit list in proper perspective. the entire file consisted of 47,884 records, 4,285 of which lacked names. the remaining 43,599 records contained 55,286 names ( or alleged names ). of these, 52,372 or 94.7% were judged to be purely routine. special processing of some sort not involving doubt (e.g., recognition of compound surname, expansion of "me" to "mac," closing up of apostrophe or nondoubtful prefix) was performed on 2,283 names, or 4.1%. the total number of doubtful names printed on the edit list was 631, or 1.1%. somewhat more than half of these ( 334) resulted in different forms on being reprocessed with the "alternate" signal on. in 562 of the 631 doubtful cases, or 89% of this group, the first or only form printed was accepted, so that no action beyond inspection was necessary. only 69 names, or not quite one out of 800 of the whole number, required the punching of a card-47 to indicate choice of the second form, 14 supplying a hand-made form, and 8 calling for deletion of non-names. subsequent changes in the macro would have reduced considerably the number of names requiring hand-made forms. it will be instructive to examine some of the names from the edit list to see what types of problems arise. the first selection of actual consecutive names (from lc card number 66-15363 through 66-17297) is rather typical: barnard, douglas st. 
paul barnard 6douglas 7saint 7paul 3 ekel4of, gunnar,= 1907ekeloef 6gunnar 51907 3 or: ekelof 6gunnar 51907 3 woolley, ai e. woolley 6al 7e 3 sch4onfeld, walther h. p., = 1888schoenfeld 6walther 7h 7p 51888 3 or: schonfeld 6walther 7h 7p 51888 3 ]4anner, michael jaenner 6michael 3 or: janner 6michael 3 m4 uller, alois,= 1924mueller 6alois 51924 3 or: muller 6alois 51924 3 huang, y4uan-shan huang 6yuean 7shan 3 or: huang 6yuan 7shan 3 m4uller, kurt,= 1903 mueller 6kurt 51903 3 or: muller 6kurt 51903 3 processing of personal names/palmer 193 note the dominance of simple umlauts; also, as a curiosity, the fact that all persons named "al" appear on the list because of the possibility that it might be an unhyphenated arabic prefix. note also that saint is treated as a separate word, not closed up as a prefix. "st." was originally put on the doubtful list with the thought that it might stand for sankt or szent instead of saint, although normal library practice would not use an abbreviation in such cases. its inclusion on the doubtful list was unexpectedly justified, however, by the occurrence of the name erlich, vera st. it seems likely that in this case "st." may stand for a patronymic, perhaps stojanova or stefanova, and there may be other occasions on which st. rather than s. is used as an abbreviation for such a name as stefan ( cf. the french use of ch. rather than simple c. as an abbreviation for charles). the only action required for the names in the list above would be to punch a "2" card for the chinese name huang, yuan-shan. indeed, just as the umlaut is the largest category on the edit list, so the non-umlauta diacritic that looks like an umlaut but does not call for insertion of "e"is the commonest occasion for punching an exception card. occasionally a diaeresis is found: lecomte du no 4 uy, pierre lecomte 7du 7nouey 6pierre 3 or: lecomte 7du 7nouy 6pierre 3 more common are certain front vowels in hungarian, finnish, or turkish, or the vowel ii in chinese as already encountered: f 4oldi, mih2aly foeldi 6mihaly 3 t4 olgyessy, juraj or: foldi 6mihaly 3 toelgyessy 6juraj 3 or: tolgyessy 6juraj 3 mettaelae 7portin 6raija 3 or: mettala 7portin 6raija 3 naervaenen 6sakari 3 or: narvanen 6sakari 3 inoenue 6e 3 or: inonu 6e 3 suemer 6mine 3 or: stuner 6mine 3 yue 6ying 7shih 3 or: yu 6ying 7shih 3 some libraries avoid the problem by treating all but the last of these as if umlauted, but determination of the correct category can usually be made at sight. occasionally a name gives pause, for example these two which both prove to be swiss and presumably germanic, although chonz may be romansh: ch4onz, selina r4 uede, thomas or: or: choenz 6selina 3 chonz 6selina 3 rueede 6thomas 3 ruede 6thomas 3 194 journal of library automation vol. 4/4 december, 1971 somewhat more troublesome are names where some but not all elements are germanic: vogt, ulya (g4oknil) ouchterlony, 40rjan vogt 6ulya 7 goeknil 3 or: vogt 6ulya 7 goknil 3 ouchterlony 6oerjan 3 or: ouchterlony 6orjan 3 ivanyi 7 gruenwald 6bela 3 or: ivanyi 7grunwald 6bela 3 although vogt is obviously germanic, ulya goknil is equally obviously not, and therefore the decision is that no umlaut is present. orjan, on the other hand, is a scandanavian forename, to be treated as umlauted even though coupled with a surname of scottish gaelic origin. bela ivanyigrunwald is a more difficult case. grunwald is of course germanic in origin, but can it be regarded as magyarized? 
in english we might assume that such a name is anglicized when the bearer starts writing it grunwald or gruenwald. however, the case is not so clear in hungarian, since that language also has the letter "u." discussion of such a point may seem to split hairs, but it does involve a significant difference between manual and machine systems. in a manual system, the question of whether to file as ivanyi-grunwald or as ivanyi-gruenwald would arise only in the exceedingly unlikely event that another name which would file between the two also occurred in the corpus. in a machine system, however, any difference, even this late in a distinctive name, could result in the various works of the author being misfiled among themselves, or a work about him filed before one by him. use of different codes to represent the same graphic, umlaut on the one hand or diaeresis or other non-umlaut on the other, would drastically reduce both the number of doubtful names aud lhe number of those for which an exception procedure is required. the harvard college library actually follows this practice. the library of congress experimented with it, but found that catalogers were reluctant in some cases to make the decision. contemplation of the case of bela ivanyi-grunwald gives the author more sympathy with this reluctance than he originally felt. in attempting to evaluate the method described above, one must acknowledge both strong points and limitations. on the one hand it is very gratifying to see aesop us and [a esopus] falling together despite differences in the capitalization of the "e" and the bracketing, and to find such sequences as the following, all without even being referred to the edit list under the rules then prevailing: aziz, khursheed kamal aziz ahmad al-azm, sadik j. azrael, jeremy r. ba maw, u baab, clarence theodore aziz 6khursheed 7kamal 3 aziz 7 ahmad 3 azm 6sadik 7j 3 azrael 6jeremy 7r 3 ba 7maw 6u 3 baab 6clarence 7theodore 3 processing of personal names/palmer 195 delgado, david j. del grande, john joseph delhom, louis a. delieb, eric delise, knoxie c. de lisser, r. lionel dell, ralph bishop dellinger, dave dell'isola, frank del mar, alexander delmar, anton delmar-morgan, edward locker delgado 6david 7j 3 delgrande 6john 7joseph 3 delhom 6louis 7 a 3 delieb 6eric 3 delise 6knoxie 7 c 3 delisser 6r 7lionel 3 dell 6ralph 7bishop 3 dellinger 6dave 3 dellisola 6frank 3 delmar 6alexander 3 delmar 6anton 3 delmar 7morgan 6edward 7locker 3 while it is certainly true that the system cannot survive without some provision for referring doubtful questions to a human editor, the number of these depends to a considerable extent on the filing and coding policies followed. provided forename entries are coded as such, the system does a good job of identifying possible problems. (presently, all multiple word forename entries are considered doubtful.) "u a" has already been cited as an example of a prefix deliberately omitted, and there are others which could be added at any time it is thought worth while. a more troublesome situation, pointed out by kelley cartwright, is the possible occurrence of "van" as a non-final element of an unhyphenated vietnamese name. the only way this could be prevented from misfiling by merging it with the next element would be to throw all "vans" including the numerous ones of dutch origin into the doubtful category, expanding the edit list more than twenty percent. this did not seem advisable, particularly since normal library usage is to hyphenate vietnamese compound names. 
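the doubt-and-alternate mechanism discussed above can also be sketched, again only as an editorial illustration with invented details: whenever a character that may or may not be a germanic umlaut is seen, both candidate forms are generated and the name is queued for human review, much as the edit list described earlier.

    UMLAUT = {"ä": "ae", "ö": "oe", "ü": "ue"}   # germanic umlaut reading
    PLAIN  = {"ä": "a",  "ö": "o",  "ü": "u"}    # diaeresis / non-germanic reading

    def doubtful_forms(name):
        """return (primary, alternate). alternate is None when nothing about
        the name is doubtful; otherwise the name belongs on the edit list."""
        if not any(ch in name for ch in UMLAUT):
            return name, None
        primary, alternate = name, name
        for ch in UMLAUT:
            primary = primary.replace(ch, UMLAUT[ch])
            alternate = alternate.replace(ch, PLAIN[ch])
        return primary, alternate

    edit_list = []
    for entry in ["müller, alois", "inönü, e.", "wilson, william"]:
        primary, alternate = doubtful_forms(entry)
        if alternate is not None:
            edit_list.append((entry, primary, alternate))   # reviewer picks one form

    print(edit_list)
    # [('müller, alois', 'mueller, alois', 'muller, alois'),
    #  ('inönü, e.', 'inoenue, e.', 'inonu, e.')]

the article's other routine transformations (å to aa, thorn to th) and the punched-card override step are left out of this sketch.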
up to this point the evaluation is quite favorable. the system can correctly process a very large proportion of names, including some which involve quite sophisticated points, without reference to a human editor, and it can call virtually all the rest to the attention of an editor. however, human review of problems means that there will be occasions when borderline cases are decided in different ways. if a permanent machine file of all established forms of names in the system is kept, both forms of each doubtful name could be checked against it so that decisions already made would not have to be repeated, thus saving the time of the editor as well as the hazard of differing decisions. it would of course be very expensive to keep such a file just for this purpose, but a file of this type would probably form a part of a comprehensive mechanized bibliographic system anyway. another area in which a mixed report would have to be given to the system is its extensibility to types of headings other than names. in work conducted on the same principles with a few thousand early titles from the marc pilot project, there were only two conspicuous problems, one of which may not in fact be a problem: the filing of numbers as such rather 196 journal of library automation vol. 4/4 december, 1971 than as if they were spelled out in the language of the title. true, the particular algorithm then in use did not provide for bringing numbers of differing length into logical order ("50 great ghost stories" before "200 years of watercolor painting in america" ), but this is a readily attainable refinement. the other problem is more refractory and is exemplified by titles beginning with prefix names, for example "de gaulle," "de soto," and "van gogh." names within titles could not receive the usual name treatment since there was no way of identifying them as such, and therefore the prefixes were filed as separate words. furthermore, while marc pilot project authors were quite a cosmopolitan lot, the titles were almost entirely in english. therefore, removal of initial articles was not much of a problem. there did not happen to be any work beginning "a to z of ... ". however, there was a book which, although in english and so coded, had a title beginning with a spanish article: "la vida," by the late oscar lewis. in working toward automatic removal of initial articles from titles, the usual assumption is that machine coding of the language of the work is available and will be checked first. this seems desirable both because it is probably more efficient in machine time than to check every title against a long list of possible articles in many languages, and because words that are articles in one lan~uage are not necessarily so in another. most occurrences of initial "die' are probably german articles, but some are other parts of speech in english, for example "die casting" or "die like a dog." if the umlaut is the common problem in names, the initial indefinite article which is the same as the numeral "one" in several languages may well be the most frequent occasion for doubt in processing of titles. "un" or "ein" will usually mean "a," to be dropped; but will sometimes mean "one," to be kept. there are certainly other problems, in addition to the one with prefix names already mentioned, including some that give trouble even in manual filing: "charles the first," "charles ii," "charles v et son temps." 
it may be that at some point in the cataloging process a reviser will have to be on the lookout for certain of these special situations and add flags to indicate that a title includes a prefix name, or that it begins with an article which would not be found by program, or that it does not begin with an article although it appears to do so, or that for some other reason it calls for a hand made key. the system described is not an absolute system, but absolute systems have their own tyrannies. if, as the author believes, cartwright and shoffner ( 2) are correct in thinking that a mixture of methods will be required in actual book catalog projects, then a system along the lines of the one described may well be a useful part of the mix. references 1. nugent, william r.: "the mechanization of the filing rules for the dictionary catalogs of the library of congress," library resources & technical services, 11 (spring 1967), 145-166. processing of personal names/palmer 197 2. cartwright, kelley l.; shoffner, ralph m.: catalogs in book form ([berkeley]; institute of library research, university of california, 1967 ), pp. 24-27. 3. cartwright, kelley l.: "mechanization and library filing rules," advances in librarianship, 1 ( 1970 ), 59-94. 4. hines, theodore c.; harris, jessica l.: computer filing of index, bibliographic, and catalog entries (newark, n.j.: bro-dart foundation, [ 1966]). 5. harris, jessica l.; hines, theodore c.: "the mechanization of the filing rules for library catalogs: dictionary or divided," library resources & technical services, 14 (fall 1970 ), 502-516. 6. palmer, foster m.: a macro instruction to process personal names for filing ([cambridge, mass.]: harvard university library, 1970 ). a copy of this document, which contains an autocoder listing of the actual working macro, has been deposited with the national auxiliary publications service, from which it can be obtained on microfiche (naps 01680 ) . in this version there are only three codes, 2 corresponding to 3 as used in this paper, 4 to both 5 and 6, and 6 to 7. there are also a few differences in the treatment of particular prefixes. the macro is made up of 579 cards, of which 125 are comments only. online ticketed-passes: a mid-tech leap in what libraries are for public libraries leading the way online ticketed-passes: a mid-tech leap in what libraries are for jeffrey davis information technology and libraries | june 2019 8 jeffrey davis (jtrappdavis@gmail.com) is branch manager at san diego public library, san diego, california. last year a library program received coverage from the new york times, the wall street journal, the magazines mental floss and travel+leisure, many local newspapers and tv outlets, online and trade publications like curbed, thrillist, and artforum, and more. that program is new york’s culture pass, a joint program of the new york, brooklyn, and queens public libraries. culture pass is an online ticketed-pass program providing access to area museums, gardens, performances, and other attractions. as the new york daily news wrote in their lede: “it’s hard to believe nobody thought of it sooner: a new york city library card can now get you into 33 museums free.” libraries had thought of it sooner, of course. museum pass programs in libraries began at least as early as 1995 at boston public library and the online ticketed model in 2011 at contra costa (ca) county library. the library profession has paid this “mid-tech” program too little attention, i think, but that may be starting to change. 
what are online ticketed-passes? the original museum pass programs in libraries circulate a physical pass that provides access to an attraction or group of attractions. sometimes libraries are able to negotiate free or discounted passes but many times the passes are purchased outright. the circulating model is still the most common for library pass programs, but it suffers from many limitations. passes by necessity are checked out for longer than they’re used. they sit waiting for pick up on hold shelves and in transit to their next location. long queues make it hard for patrons to predict when their requests will be filled, and therefore difficult to plan on using. for the participating attractions, physical passes are typically good anytime and so compete with memberships and paid admission. there are few ways to shape who borrows the passes in order to meet institutional goals. and there are few ways to limit repeat use by library patrons to both increase exposure and nudge users toward membership. as a result, most circulating pass programs only connect patrons to a small number of venues. despite these limitations, circulating passes have been incredibly popular: at writing there are 967 requests for san diego public library’s 73 passes to the new children’s museum. we sometimes see that sort of interest in a new bestseller, but this is a pass that sdpl has offered continuously since 2009. in 2011, contra costa county library launched the first “ticketed-pass” program, discover & go. discover & go replaced circulating physical passes with an online system with which patrons, remotely or in the library with staff assistance, retrieve day-passes — tickets — by available date or venue. this relatively simple and common-sense change makes an enormous difference. in addition to convenience and predictability for patrons, availability is markedly increased because venues are much more comfortable providing passes when they can manage their use: patrons can be restricted to a limited number of tickets per venue per year and venues can match the information technology and libraries | june 2019 9 number of tickets available to days that they are less busy. the latter preserves the value of their memberships while making use of their own “surplus capacity” to bring in new visitors and potential new members. funding and internal expectations at many venues carry obligations to reach underserved communities and the programs allow partner attractions to shape public access and receive reporting by patron zip code and other factors. the epass software behind discover & go is regional by design and supports sharing of tickets across multiple library systems in ways that are impractical to do with physical passes. as new library systems join the program, they bring new partner attractions into the shared collection with them. the oakland zoo, for example, needs only to negotiate with their contact at oakland public library to coordinate access for members of oakland, san francisco, and san jose public libraries. because of the increased attractiveness of participation, it’s been easier for libraries to bring venues into the program. in 2011, discover & go hoped for a launch collection of five museums but ultimately opened with forty. the success of ticketed-pass programs in turn attracts more partners. today, discover & go is available through 49 library systems in california and nevada with passes to 137 participating attractions. 
similarly, new york’s culture pass launched with 33 participating venues and has grown in less than a year to offer a collection of 49. while big city programs attract the most attention, pass programs are offered by county systems like alamace county (nc), consortiums like libraries in clackamas county (or), small cities like lawrence (ma), small towns like atkinson (nh), and statewide like the michigan activity pass which is available through over 600 library sites with tickets to 179 destinations plus state parks, camping, and historical sites. for each library, the participating destinations form a unique collection: a shelf of local riches, idiosyncratic and rooted in place. through various libraries one can find tickets for the basketball hall of fame, stone barns center for food and agriculture, dinosaur ridge, eric carle museum of picture book art, bushnell park carousel, california shakespeare theater, children’s museums, zoos, aquariums, botanical gardens, tours, classes, performances, and on to the met, moma, crocker, de young, and many, many, many more. for kids, “enrichments” like these are increasingly understood as essential parts of learning and exploration. for adults, access to our cultural treasures, including partners like san francisco’s museum of the african diaspora or chicago’s national museum of puerto rican arts & culture — besides being its own reward — enhances local connection and understanding. we’re also starting to see the ticketing platform itself become an asset to smaller organizations — craft studios, school performances, farm visits, nature centers, and more — that want to increase public access without having to take on a new ability. importantly, ticketed-pass programs are built on the core skills of librarians: information management, collection development, community outreach, user-centered design, customer service, and technological savvy. the technology discover & go was initially funded by a $45,000 grant from the bay area library and information system (balis) cooperative. contra costa contracted with library software company quipu group to develop the epass software that runs the program and that is also used by ny’s culture pass, public libraries leading the way: online ticketed passes | davis 10 https://doi.org/10.6017/ital.v38i2.11141 multnomah county (or) library’s my discovery pass, and a consortium of oregon libraries as cultural pass. ticketed-pass software is also offered by the libraryinsight and plymouth rocket companies and used by denver public library, seattle public library, the michigan activity pass, and others. the software consists of a web application with a responsive patron interface and connects over sip2 or vendor api to patron status information from the library ils. administrative tools set finegrained ticket availability, blackout dates, and policies including restrictions by patron age, library system, zip code, municipality, number of uses allowed globally and per venue, and more. recent improvements to epass include geolocation to identify nearby attractions and improved search filters. still in development are transfer of tickets between accounts, re-pooling of unclaimed tickets, and better handling of replaced library cards. the strength that comes from multi-system ticketed-pass programs also carries with it challenges on the patron account side. ilses each implement protocols and apis for working with patron account information differently and library systems maintain divergent policies around patron status. 
there’s a role for lita and for library consortia and state libraries to push for more attention to and consistency on patron account policies and standards. the emphasis in library automation is similarly shifting. our ilses originated to manage the circulation of physical items, a catalog-centric view. today, as robert anderson of quipu group suggested to me, a diverse range of online and offline services and non-catalog offerings orbit our users, calling for a new frame of reference: “it’s a patron-centric world now.” the vision library membership is the lynchpin of ticketed-pass and complementary programs in the technical sense, as above, and conceptually: library membership as one’s ticket to the world around. though i’m not aware of academic libraries offering ticketed-passes, they have been providing local access through membership. at many campuses, the library is the source for one’s library card which is also one’s campus id, onand off-campus cash card, transit pass, electronic key, print management, and more. that’s kind of remarkable and deserving of more attention. traditionally, librarians have responded to patron needs by providing information, resources, and services ourselves. new models and technologies are making it easier to complement this with the facilitation approach, of which online ticketed-passes are the quintessential example. we further increase access by reducing barriers of complexity, language, know-how, and social capital, for example, by maintaining community calendars of local goings-on or helping communities take advantage of nearby nature. online ticketed-pass programs will grow and take their place in the public’s expectations of libraries and librarians: that libraries are the place that help us (better, more equitably) access the resources and riches around us. powering this are important new tools for library technologists to interrogate and advance with the same attention we give to both more established and more speculative applications. identifying emerging relationships in healthcare domain journals via citation network analysis kuo-chung chu, hsin-ke lu, and wen-i liu information technology and libraries | march 2018 39 kuo-chung chu (kcchu@ntunhs.edu.tw) is professor, department of information management, and dean, college of health technology, national taipei university of nursing and health sciences; hsin-ke lu (sklu@sce.pccu.edu.tw) is associate professor, department of information management, and dean, school of continuing education, chinese culture university; wen-i liu (wenyi@ntunhs.edu.tw, corresponding author) is professor, department of nursing, and dean, college of nursing, national taipei university of nursing and health sciences. abstract online e-journal databases enable scholars to search the literature in a research domain or to crosssearch an interdisciplinary field. the key literature can thereby be efficiently mapped. this study builds a web-based citation analysis system consisting of four modules: (1) literature search; (2) statistics; (3) articles analysis; and (4) co-citation analysis. the system focuses on the pubmed central dataset and facilitates specific keyword searches in each research domain for authors, journals, and core issues. in addition, we use data mining techniques for co-citation analysis. the results could help researchers develop an in-depth understanding of the research domain. 
an automated system for co-citation analysis promises to facilitate understanding of the changing trends that affect the journal structure of research domains. the proposed system has the potential to become a value-added database of the healthcare domain, which will benefit researchers. introduction healthcare is a multidisciplinary research domain of medical services provided both inside and outside a hospital or clinical setting. article retrieval for systematic reviews in the domain is much more elusive than retrieval for reviews in clinical medicine because of the interdisciplinary nature of the field and the lack of a significant body of evaluative literature. other connecting research fields consist of the respective research fields of the application domain (i.e., the health sciences, including medicine and nursing).1 in addition, valuable knowledge and methods can be taken from the fields of psychology, the social sciences, economics, ethics, and law. further, the integration of those disciplines is attracting increasing interest.2 researchers may use bibliometrics to evaluate the influence of a paper or describe the relationship between citing and cited papers. citation analysis, one of several possible bibliometric approaches, is more popular than others because of the advent of information technologies.3 citation analysis counts the frequency of cited papers from a set of citing papers to determine the most influential scholars, publications, or universities in a discipline. it can be classified into two basic types: the first type counts only the citations in a paper that are authored by an individual, while the second type analyzes co-citations to identify intellectual links among authors in different articles. this paper focuses on the second type of citation analysis. small defined co-citation analysis as "the frequency with which two items of earlier literature are cited together by the later literature."4 it is not only the most important type of bibliometric analysis, but also the most sophisticated and popular method. many other methods originate from citation analysis, including document co-citation analysis, bibliographic coupling,5 author cocitation analysis,6 and co-word analysis.7 there are levels of co-citation analysis: document, author, and journal. co-citation could be used to establish a cluster or "core" of earlier literature.8 the pattern of links between documents can establish a structure to highlight the relationship of research areas. citation patterns change when previously less-cited papers are cited more frequently, or old papers are no longer cited. changing citation patterns imply the possibility of new developments in research areas; furthermore, we can investigate changing patterns to understand the scientific trend within a research domain.9 co-citation analysis can help obtain a global overview of research domains.10 the aim of this paper is to detect emerging issues in the healthcare research domain via citation network analysis. our results can provide a basis for knowledge that researchers can use to construct a search strategy. structural knowledge is intrinsic to problem solving.
because of the interdisciplinary nature of the healthcare domain and the broadness of the term, research is performed in several research fields, such as nursing, nursing informatics, long-term care, medical informatics, geriatrics, information technology, telecommunications, and so forth. although electronic journals enable searching by author, article, and journal title using keywords or full text, the results are limited to article content and references and therefore do not provide an in-depth understanding of the knowledge structure in a specific domain. the knowledge structure includes the core journals, core issues, the analysis of research trends, and the changes in focus of researchers. for a novice researcher, however, the literature survey remains a troublesome process in terms of precisely identifying the key articles that highlight the overview concept in a specific domain. the process is complicated and time-consuming, and it limits the number of articles collected for retrospective research. the objective of this paper is to provide information about the challenges and methodology of relevant literature retrieval by systematically reviewing the effectiveness of healthcare strategies. to this end, we build a platform for automatically gathering the full text of ejournals offered by the pubmed central (pmc) database.11 we then analyze the co-citation results to understand the research theme of the domain. methods this paper tries to build a value-added literature database system for co-citation analysis of healthcare research. the results of the analysis will be visually presented to provide the structure of the domain knowledge to increase the productivity of researchers. information technology and libraries | march 2018 41 dataset for co-citation analysis, a data source of related articles on healthcare is required. for this paper, the articles were retrieved from the pmc database using search terms related to the healthcare domain. to build the article analysis system, we used bibliometrics to locate the relevant references while analysis techniques were implemented by the association rule algorithm of data mining. the pmc database, which is produced by the us national institutes of health and is implemented and maintained by the us national center for biotechnology information of the us national library of medicine, provides electronic articles from more than one thousand full-text journals for free. we could understand the publication status from the open access subset (oas) and access to the oai (open archives initiative) protocol for metadata harvesting, which includes the full text in xml and pdf. regarding access permission, pmc offers a dataset of many open access journal articles. this paper used a dedicated xml-formatted dataset (https://www.ncbi.nlm.nih.gov/pmc/tools/oai/). the xml-formatted dataset followed the specification of dtd (document type definition) files, which are sorted by journal title. each article has a pmcid (pmc identification), which is useful for data analysis. in addition to the dataset, the pmc also provides several web services to help widely disseminate articles to researchers. pubmed central (pmc) citation database searching module citation module web view users data sourcemiddle-end pre-processeingback-end front-end xml files web serverdb server keyword co-citation module statistical module figure 1. the system architecture of citation analysis with four subsystems. 
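as an editorial sketch of what the parsing step can look like, the following python fragment reads one article's .nxml file and pulls out the citing article's pmcid and the journal titles in its reference list. the element names follow the nlm/jats dtd used in the pmc open access subset as i understand it; they are assumptions that may need adjusting for particular journals, and the authors' own system was written in php rather than python.

    import xml.etree.ElementTree as ET

    def cited_journals(nxml_path):
        """return (pmcid, [journal titles cited]) for one pmc .nxml article."""
        root = ET.parse(nxml_path).getroot()
        pmcid = None
        for aid in root.iter("article-id"):
            if aid.get("pub-id-type") in ("pmc", "pmcid"):
                pmcid = (aid.text or "").strip()
        journals = []
        for ref in root.iter("ref"):          # one <ref> per cited item
            src = ref.find(".//source")       # journal title of the cited item
            if src is not None and src.text:
                journals.append(src.text.strip().lower())
        return pmcid, journals

    # example call (file name follows the pattern described later in the article):
    # pmcid, refs = cited_journals("aapsj-10-1-2751445.nxml")

collecting these (pmcid, cited-journal list) pairs across the corpus is the raw material for the statistics and co-citation modules described below.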
https://www.ncbi.nlm.nih.gov/pmc/tools/oai/ identifying emerging issues in the healthcare domain | chu, lu, and liu 42 https://doi.org/10.6017/ital.v37i1.9595 system architecture our development environment consisted of the following four subsystems: front-end, middle-end, back-end, and pre-processing. the front-end creates a “web view,” a visualization of the results for our web-based co-citation analysis system. the system architecture is shown in figure 1. front-end development subsystem we used adobe dreamweaver cs5 as a visual development tool for the design of web templates. the php programming language was chosen to build the co-citation system that would be used to access and analyze the full-text articles. in terms of the data mining technique, we implemented the apriori algorithm with the php language.12 the results were exported as xml to a charting process, where we used amcharts (https://www.amcharts.com/), to create stock charts, column charts, pie charts, scatter charts, line charts, and so forth. middle-end server subsystem the system architecture was a microsoft windows-based environment with a xampp 2.5 web server platform (https://www.apachefriends.org/download.html). xampp is a cross-platform web development kit that consists of apache, mysql, php, and perl. it works across several operating systems, such as linux, windows, apache, macos, and oracle solaris, and provides ssl encryption, a phpmyadmin database management system, webalizer traffic management and control suite, a mail server (mercury mail transport system), and filezilla ftp server. back-end database subsystem to speed up co-citation analysis, the back-end database system used mysql 5.0.51b with interface phpmyadmin 2.11.7 for easy management of the database. mysql includes the following features: • using c and c++ to code programs, users can develop an application programming interface (api) through visual basic, c, c + +, eiffel, java, perl, php, python, ruby, and tcl languages with the multithreading capability that can be used in multi-cpu systems and easily linked to other databases. • performance of querying articles is quick because sql commands are optimally implemented, providing many additional commands and functions for a user-friendly and flexible operating database. an encryption mechanism is also offered to improve data confidentiality. • mysql can handle a large-scale dataset. the storage capacity is up to 2tb for win32 nts systems and up to 4tb for linux ext3 systems. • it provides the software myodbc as an odbc driver for connecting many programming languages, and it several languages and character sets to achieve localization and internationalization. pre-processing subsystem the pmc provides access to the article via oas, oai services, e-utilities, and ftp. we used ftp to download a compressed (zip) file packaged with a filename following the pattern “articles?-?.xml.tar.gz” on october 28, 2012 (ftp://ftp.ncbi.nlm.nih.gov/pub/pmc), where “?-?” is “0-9” or “a-z”. the size of the zip file was approximately 6.17gb. after extraction, the size of the articles was approximately 10gb. the 571,890 articles from 3,046 journals were grouped and https://www.amcharts.com/ https://www.apachefriends.org/download.html ftp://ftp.ncbi.nlm.nih.gov/pub/pmc information technology and libraries | march 2018 43 sorted by journal title in a folder labeled with an abbreviated title. 
an xml file would, for example, be named “aapsj-10-1-2751445.nxml,” where “aapsj” was the abbreviated title of the journal american association of pharmaceutical scientists journal, “10” was the volume of the journal, “1” was number of the issue, and “2751445” was the pmcid. we used related technologies for developing systems that include php language, array usage, and the apriori algorithm to analyze the articles and build the co-citation system.13 finally, several analysis modules were created to build an integrated co-citation system. research procedure the following is our seven-step research procedure to fulfill the integrated co-citation system: 1. parse xml file: select tags for construction of database; choose fields for co-citation analysis (for example, , , and ). 2. present web-based article: design webpage and css style; present web-based xml file by indexing variable . 3. build an abstract database: the database consists of several fields: , , , , , , and . 4. develop searching module: pass the keyword to the method “post” in sql query language and present the search result in the webpage. 5. develop statistical module: the statistical results include number of article and cited articles, the journals and authors cited in all articles, and the number of cited articles. 6. develop citation module: visually present the statistical results in several formats; rank searched journals; rank searched and cited journals in all the articles. 7. develop co-citation module: analyze the association between articles with the apriori algorithm. association rule algorithms the association rule (ar), usually represented by ab, means that the transaction containing item a also contains item b. there are many such rules in most of the dataset, but some were useless. to validate the rules, two indicators, support and confidence, can be applied. support, which means usefulness, is the number of times the rules feature in the transactions, whereas confidence means certainty, which is the probability that b occurs whenever the a occurs. we chose the rules for which the values of both support and confidence were greater than a predefined threshold. for example, a rule stipulating “toastjam” has support of 1.2 percent and confidence of 65 percent, implying that 1.2 percent of the transactions contain “toast” and “jam” and that 65 percent of the transactions containing “toast” also contained “jam.” the principle for generating the ar is based on two features of the documents: (1) find the highfrequency items that set their supports greater than the threshold; (2) for each dataset x and its subnet y, check the rule xy if the support is greater than the threshold, in which the rule xy means that the occurrence in the rule containing x also contains y. most studies focus on searching high-frequency item sets.14 the most popular approach for identifying the item sets is apriori algorithm, as shown in figure 2.15 the algorithm rationale is that if the support of item set i is less identifying emerging issues in the healthcare domain | chu, lu, and liu 44 https://doi.org/10.6017/ital.v37i1.9595 than or equal to the threshold, i is not a high-frequency item set. new item set i that inserts any item a into i would not be a high-frequency item set. according to the rationale, the apriori algorithm is an iteration-based approach. 
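the support and confidence measures defined above can be computed directly; the small sketch below (ours, not the authors' code) checks a rule a → b against a list of transactions, where each transaction might be the set of journals cited by one article. the iteration that finds the high-frequency item sets is described next.

# sketch: support and confidence of an association rule A -> B
# over a list of transactions (each transaction is a set of items).
def support(transactions, itemset):
    # Fraction of transactions containing every item in `itemset`.
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, a, b):
    # P(B | A): support of A union B divided by support of A.
    sup_a = support(transactions, a)
    return support(transactions, set(a) | set(b)) / sup_a if sup_a else 0.0

if __name__ == "__main__":
    baskets = [{"toast", "jam"}, {"toast"}, {"jam", "tea"}, {"toast", "jam", "tea"}]
    print(support(baskets, {"toast", "jam"}))        # 0.5
    print(confidence(baskets, {"toast"}, {"jam"}))   # about 0.67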
first, it generates candidate item set c1 by calculating the number of occurrences of each attribute and finding that the high-frequency item set l1 has support greater than the threshold. second, it generates item set c2 by joining l1 to c1, iteratively finding l2 and generating c3, and so on. 1: l1 = {large 1-item sets}; 2: for (k=2; lk-1; k++) do begin 3: ck = candidate_gen (lk-1); 4: for all transactions td do begin /* generate candidate k-dataset*/ 5: ct = subset (ck, t); 6: for all candidates c  ct do 7: c_count=c_count+1; 8: end 9: lk ={cck | c_count ≥ minsuppport} 10: end 11: return l =  lk; figure 2. the apriori algorithm. the apriori algorithm is one of the most commonly used methods for ar induction. the candidate_gen algorithm, as shown in figure 3, includes join and prune operations for generating candidate sets.16 steps 1 to 4 generate all possible candidate item sets c from lk-1. steps 5 to 8: delete the item set that is not a frequent item set by the apriori algorithm. step 9 returns candidate set ck to the main algorithm. 1: for each item set x1 lk-1 2: for each item set x2 lk-1 3: c = join (x1[1], x1[2], x1[k-2], x1[k-1], x2[k-1]) 4: where x1[1] = x2[1], x1[k-2] = x2[k-2], x1[k-1] < x2[k-1]; 5: for item sets c  ck do 6: for all (k-1)-subsets s of c do 7: if (s  lk-1) then add c to ck; 8: else delete c from ck; 9: return ck; figure 3. the candidate_gen algorithm. information technology and libraries | march 2018 45 results we searched the pmc database with keywords “healthcare,” “telecare,” “ecare,” “ehealthcare,” and “telemedicine” and located 681 articles with a combined 14,368 references. values were missing from the year field for 4 of the references; this was also the case for 635 of a total of 52,902 authors. according to the keyword search for the healthcare domain, a pie chart of the journal citation analysis, as shown in figure 4, the top-ranked journal in terms of citations was the british medical journal (bmj). it was cited approximately 439 times, 18.89 percent of the total, followed by the journal of the american medical association (jama), which was cited approximately 344 times, 14.80 percent of the total. the trend of healthcare citation 1852 to 2009 peaked in 2006 at approximately 1,419 citations, with more than half of the total occurring in this year. figure 4. top-cited journals in the healthcare domain by percentage of total citations (n = 2324) with the keyword search for the healthcare domain, figure 5 shows a pie chart of the author citations. the most-cited author was j. w. varni, professor of pediatric cardiology at the university of michigan mott children’s hospital in ann arbor. this author was cited approximately 149 times, equivalent to 23.24 percent of the total, followed by d. n. herndon, professor at the department of plastic and hand surgery, friedrich-alexander university of erlangen in germany. this author was cited approximately 73 times, 11.39 percent of the total. by identifying the affiliations of the topranked authors, researchers can access related information in their field of interest. the co-citation analysis was conducted using the apriori algorithm. the relationship of co-citation journals with a supporting degree greater than 38 from 1852 to 2009 is shown in figure 6. each identifying emerging issues in the healthcare domain | chu, lu, and liu 46 https://doi.org/10.6017/ital.v37i1.9595 journal was denoted by a node, where the node with double circle meant the journal is co-cited with the other in a citing article. 
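the apriori and candidate-generation pseudocode in figures 2 and 3 above can be condensed into a short python sketch; this is our paraphrase rather than the authors' php implementation, with each citing article's set of cited journals treated as one transaction, which is how co-cited journal pairs like those reported below can be counted.

# sketch of the Apriori iteration from figures 2 and 3:
# start from frequent 1-itemsets, then repeatedly join, prune, and count support.
from itertools import combinations

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    counts = count({frozenset([i]) for i in items})           # candidate 1-itemsets
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    all_frequent = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # join step: unions of (k-1)-itemsets that form k-itemsets
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        counts = count(candidates)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

if __name__ == "__main__":
    # each "transaction" is the set of journals cited together by one article
    citing = [{"bmj", "lancet"}, {"bmj", "jama", "lancet"}, {"bmj", "lancet"}, {"jama"}]
    print(apriori(citing, min_support=2))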
bmj, which covers the fields of evidence-based nursing care, obstetrics, healthcare, nursing knowledge and practices, and others, is the core journal of the healthcare domain. figure 5. top-cited authors in journals of the healthcare domain by percentage of total citations (n = 641) figure 6. the relationship of co-citation journals with bmj. information technology and libraries | march 2018 47 to identify the focus of the journal, we analyze the co-citation in three periods. in 1852–1907, journals are not in co-citation relationships; in 1908–61, five candidates had a supporting degree greater than 1 (see table 1); and in 1962–2009, twenty-eight candidates had a supporting degree greater than 14 (see table 2 (for example, bmj and lancet had sixty-eight co-citations). table 1. candidates in co-citation analysis with a supporting degree greater than 1 (1908–61). no journals no. of journals co-cited support 1 publ math inst hung acad sci, publ math 2 3 2 jaoa, j osteopath 2 1 3 antioch rev, j abnorm soc psychol 2 1 4 n engl j med, am surg 2 1 5 arch neurol psychiatry, j neurol psychopathol, z ges neurol psychiat 3 1 table 2. candidates in co-citation analysis with a supporting degree greater than 14 (1962–2009). no journals no. of journals co-cited support 1 bmj, lancet 2 68 2 bmj, jama 2 65 3 jama, med care 2 64 4 bmj, arch intern med 2 61 5 lancet, jama 2 52 6 soc sci med, bmj 2 52 7 jama, arch intern med 2 51 8 lancet, med care 2 50 9 crit care med, prehospital disaster med 2 49 10 n engl j med, bmj 2 49 11 n engl j med, lancet 2 49 12 n engl j med, jama 2 47 13 n engl j med, med care 2 47 14 qual saf health care, bmj 2 47 15 bmj, crit care med 2 42 16 med care, bmj 2 38 17 n engl j med, j bone miner res 2 33 identifying emerging issues in the healthcare domain | chu, lu, and liu 48 https://doi.org/10.6017/ital.v37i1.9595 18 n engl j med, j pediatr surg 2 26 19 lancet, j pediatr surg 2 25 20 jama, nature 2 25 21 lancet, jama, bmj 3 24 22 n engl j med, lancet, bmj 3 21 23 intensive care med, bmj 2 21 24 bmj, n engl j med, jama 3 20 25 n engl j med, jama, lancet 3 20 26 jama, med care, lancet 3 14 27 jama, med care, n engl j med 3 14 28 bmj, jama, lancet, n engl j med 4 14 the link of co-citation journals in three periods from 1852 to 2009 can be summarized as follows: (1) three journals were highly cited but were not in a co-citation relationship in 1852–1907 (see figure 7); (2) five clusters of the healthcare journals in co-citation relationships were found for the years 1908–61 (see figure 8); and (3) 1962–2009 had a distinct cluster of four journals within the healthcare domain (see figure 9). figure 7. the relationship of co-citation journals for the healthcare domain in 1852–1907. information technology and libraries | march 2018 49 figure 8. the relationship of co-citation journals for the healthcare domain in 1908–61. journals with double circles are co-cited with the other in a citing article. journals with triple circles are cocited with the other two in a citing article. figure 9. the relationship of co-citation journals for the healthcare domain in 1962–2009. the thick line and circle indicates the journals are co-cited in a citing article. conclusions identifying emerging issues in the healthcare domain | chu, lu, and liu 50 https://doi.org/10.6017/ital.v37i1.9595 this paper presented an automated literature system for co-citation analysis to facilitate understanding of the sequence structure of journal articles cited in the healthcare domain. 
the system visually presents the results of its analysis to help researchers quickly identify the key articles that provide an overview of the healthcare domain. this paper used the keywords related to healthcare for its analysis and found that bmj is a core journal in the domain. the co-citation analysis found a single cluster within the healthcare domain comprising four journals: bmj, jama, lancet, and the new england journal of medicine. this paper focused on a co-citation analysis of journals. authors, articles, and issues featured in the co-citation analysis can be further studied in an automated way. a period analysis of publication years is also important. further analyses can facilitate understanding of the changes in a research domain and the trend of research issues. in addition, the automatic generation of a map would be a worthwhile topic for the future study. acknowledgements this article was funded by the ministry of science and technology of taiwan (most), formerly known as national science council (nsc), with grant no: nsc 100-2410-h-227-003. for the remaining authors none were declared. all the authors have made significant contributions to the article and agree with its content. there is no known conflict of interest in this study. references 1 a. kitson et al., “what are the core elements of patient-centered care? a narrative review and synthesis of the literature from health policy, medicine and nursing,” journal of advanced nursing 69 (2013): 4–8, https://doi.org/10.1111/j.1365-2648.2012.06064.x. 2 s. j. brownsell et al., “future systems for remote health care,” journal of telemedicine and telecare 5 (1999): 145–48, https://doi.org/10.1258/1357633991933503; b. g. celler, n. h. lovell, and d. k. chan, “the potential impact of home telecare on clinical practice,” medical journal of australia 171 (1999): 518–20; r. walker et al., “what it will take to create new internet initiatives in health care,” journal of medical systems 27 (2003): 95–98, https://doi.org/10.1023/a:1021065330652. 3 i. marshakova-shaikevich, the standard impact factor as an evaluation tool of science fields and scientific journals,” scientometrics 35 (1996): 283–85, https://doi.org/10.1007/bf02018487; i. marshakova-shaikevich, “bibliometric maps of field of science,” information processing & management 41(2005):1536–45, https://doi.org/10.1016/j.ipm.2005.03.027; a. r. ramosrodrí guez and j. ruí z-navarro, “changes in the intellectual structure of strategic management research: a bibliometric study of the strategic management journal, 1980–2000,” strategic management journal 25, no. 10 (2004): 982–1000, https://doi.org/10.1002/smj.397. 4 h. small, “co-citation in the scientific literature: a new measure of the relationship between two documents,” journal of american society for information science 24 (1973): 266–68. https://doi.org/10.1111/j.1365-2648.2012.06064.x https://doi.org/10.1258/1357633991933503 https://doi.org/10.1023/a:1021065330652 https://doi.org/10.1007/bf02018487 https://doi.org/10.1016/j.ipm.2005.03.027 https://doi.org/10.1002/smj.397 information technology and libraries | march 2018 51 5 m. m. kessler, “bibliographic coupling between scientific papers,” american documentation 14 (1963): 10–25, https://doi.org/10.1002/asi.5090140103; b. h. weinberg, “bibliographic coupling: a review,” information storage and retrieval 10 (1974): 190–95. 6 h. d. white and b. c. 
griffith, “author cocitation: a literature measure of intellectual structure,” journal of the american society for information science 32 (1981): 164–70, https://doi.org/10.1002/asi.4630320302. 7 y. ding, g. g. chowdhury, and s. foo, “bibliometric cartography of information retrieval research by using co-word analysis,” information processing & management 37 no. 6 (november 2001): 818–20, https://doi.org/10.1016/s0306-4573(00)00051-0. 8 small, “co-citation,” 266. 9 d. sullivan et al., “understanding rapid theoretical change in particle physics: a month-bymonth co-citation analysis,” scientometrics 2 (1980): 312–16, https://doi.org/10.1007/bf02016351. 10 n. shibata et al., “detecting emerging research fronts based on topological measures in citation networks of scientific publications,” technovation 28 (2008): 762–70, https://doi.org/10.1016/j.technovation.2008.03.009. 11 weinberg, “bibliographic coupling.” 12 white and griffith, “author cocitation.” 13 r. agrawal and r. srikant. “fast algorithm for mining association rules in large databases” (paper, international conference on very large databases [vldb], september 12–15, 1994, santiago de chile). 14 r. agrawal, t. imielinski, and a. swami, “mining association rules between sets of items in large databases” (paper, acm sigmod international conference on management of data, washington, dc, may 25–28, 1993. 15 agrawal and srikant, “fast algorithm,” 3. 16 ibid., 4. https://doi.org/10.1002/asi.5090140103 https://doi.org/10.1002/asi.4630320302 https://doi.org/10.1016/s0306-4573(00)00051-0 https://doi.org/10.1007/bf02016351 https://doi.org/10.1016/j.technovation.2008.03.009 abstract introduction methods dataset system architecture front-end development subsystem middle-end server subsystem back-end database subsystem pre-processing subsystem research procedure association rule algorithms results conclusions acknowledgements references information technology and libraries at 50: the 1960s in review mark cyzyk information technology and libraries | march 2018 6 mark cyzyk (mcyzyk@jhu.edu), a member of lita and the ital editorial board, is the scholarly communication architect in the sheridan libraries, the johns hopkins university, baltimore, maryland. in the quarter century since graduating from library school, i have now and then run into someone who had what i consider to be a highly inaccurate and unintuitive view of librarians and information technology. seemingly, in their view, librarians are at worst luddites and at best technological neophytes. not so! in my view, librarians have always been at worst technological power users and at best true it innovators. one has only to scan the first issues of ital, or the journal of library automation as it was then called, to put such debate to rest. march 1968 saw the first issue of the first volume of the journal of library automation published. the first article of that inaugural issue sets the scene: “computer based acquisitions system at texas a&i university” by ned c. morris. here we find librarians not only employing computing technology to streamline library operations (using an ibm 1620 with 40k ram), but as the article points out, this new system for computerizing acquisitions was an adjunct to the systems they already had in place at texas a&i for circulation and serials management. this first article in the first issue of the first volume indicates that we’ve dipped a toe into a stream that was already swiftly flowing. 
the other bookend of that first issue, “the development and administration of automated systems in academic libraries” by harvard’s richard de gennaro, goes meta and takes a comprehensive look at how automated library systems were already being created and the various system development and implementation rubrics under which such development occurred. much in this article should resonate with current readers of ital. i knew immediately that this article was going to be a good read when i encountered, in the very first paragraph: development, administration, and operations are all bound up together and are in most cases carried on by the same staff. this situation will change in time, but it seems safe to assume that automated library systems will continue to be characterized by instability and change for the next several years. i’d say that was a safe assumption. the second and final volume of the 1960’s contains gems as well. the entirety of volume 2 issue 2 that year was devoted to “usa standard for a format for bibliographic information interchange on magnetic tape” a.k.a. marc ii. is it possible for something to be dry, yet fascinating? some titles of this second volume point to the wide range of technological projects underway in the library world in 1969: mailto:mcyzyk@jhu.edu the 1960s in review | cyzyk 7 https://doi.org/10.6017/ital.v37i1.10339 • “an automated music programmer (musprog)” by david f. harrison and randolph j. herber • “a fast algorithm for automatic classification” by r. t. dattola • “simon fraser university computer produced map catalogue” by brian phillips and gary rogers • “management planning for library systems development” by fred l. bellomy • “performance of ruecking’s word-compression method when applied to machine retrieval from a library catalog” by ben-ami lipetz, peter stangl, and kathryn f. taylor and this is only in the first two volumes. as this current 2018 volume of ital proceeds, we’ll be surveying the morphing information technology and libraries landscape through ital articles of the seventies, eighties, and nineties. i think you will see what i mean when i say that librarians have always been at worst technological power users, at best true it innovators. a simulation model for purchasing duplicate copies in a library w. y. arms: the open university, and t. p. walter: unilever limited. at the time this study was undertaken the authors were at the university of sussex. 73 p1'ovision of duplicate copies in a lib1'at'y requires knowledge of the demand fo1' each title. since di1'ect measu1'ement of demand is difficult a simulation model has been developed to estimate the demand for a book f1'om the number of times it has been loaned and hence to dete1·mine the number of copies required. special attention has been given to accurate calibration of the model. introduction a common difficulty in library management is deciding when to buy duplicate copies of a given book and how many copies to buy. a typical research library has several hundred thousand different works; many are lightly used but all are potential candidates for duplication. the problem which we faced at sussex university was how to obtain reliable forecasts of the demand for each title and to translate this into a purchasing policy. at present sussex spends between £10,000 and £20,000 ($22,00o-$44,000) per year on duplicate copies, and as the university grows this amount is increasing steadily. 
because of the large number of books in a library relatively little data are available about each title. records are kept of books on loan or removed from the library, but frequently these are the only routine data collected. few large libraries even manage inventory checks. we therefore looked for a system that could be implemented with the minimum of data collection, preferably one based on existing records. forecasts of demand if the demand for a particular book is known, it is possible, though not necessarily easy, to determine how many copies of that book are needed to achieve a specified level of service, such as a copy being available on 80 percent of the occasions that a reader requires the book. unfortunately demand cannot be measured directly, even retrospectively. records of the 74 journal of librm·y automation vol. 7/2 june 1974 number of times that a book is issued from the library contain no information about how many times the book was used within the library, nor how many readers failed to find a copy and went away unsatisfied. since both these factors are extremely difficult to measure, one of the central parts of our work was to develop a method of estimating them from data readily available. to forecast demand two lines of approach seemed reasonable: subjective estimation based on faculty reading lists; and forecasts based on the number of loans in previous years. in the past, sussex library has made extensive use of reading lists provided by faculty to decide how many copies to buy of each title. as the books most in demand are those recommended for undergraduate courses this seemed a sensible approach, though the number of copies required is not obvious even if the demand is known. webster analysed the effectiveness of these lists in predicting demand for specific titles and evaluated the purchasing rule being used, one copy for every ten students taking a course. 1 restricting his attention to books known to be in demand and marked in the catalog, he drew a random sample of 673 titles, about 4 percent of the books falling into this category. he compared the number of loans of each of these titles over a term· with data from the reading lists supplied at the beginning of the term. as the library had made a special effort to obtain reading lists for all courses taught that term, he had data on the number and type of students taking each course, the importance given to each text, and the subject areas involved. yet despite a thorough analysis of these data webster was able to find very little relationship between observed demand and reading list information. his work shows that faculty at the university have remarkably little knowledge of the books that their students read. in the sample some books strongly recommended to large groups of students were hardly used and some of the most heavily used works appeared on no reading list. the results of this study are fascinating from an educational viewpoint but less satisfying as operational research. the failure of this .. approach led us to predicting demand from records of the number of past loans. this divides into two parts: using the number of loans over a period to estimate what the total demand was during that period; and using this estimate of the demand in one period to forecast the demand in another. various evidence suggests that the latter is a sensible thing to do. the main demand for heavily used books comes from undergraduate courses. 
most faculty are loyal in their reading habits, recommending books they know rather than new ones, and each course tends to be repeated year after year with a syllabus that changes only gradually. the use of past circulation to forecast future use is fundamental to a markov model of book usage developed by morse and elston and tested with data from the m.i.t. engineering library. 2 for our work we have used the number of loans in a given term to predict the demand in the corresponding term a year later. simulation m odelj arms and walter 75 estimating the total demand in a period from the number of loans in that period is more difficult. this requires a model of the circulation system. mathematical approach several attempts have been made to apply the methods of inventory control or queueing theory to the problem of buying duplicates. for example, grant has recently described an operational system using the simple rule that the number of copies required to satisfy 95 percent of the demand is n (p,. + 2cr.)/t where n is the number of times that the book is issued during a period of t days and p,8 and cr8 are the mean and standard deviation of the time that each book is off the shelf when on loan. 3 this type of approach has the advantage of being straightforward to use. periodically a simple computer program analyzes the circulation history of each book in the library and prints a list of books requiring duplication. however, the method suffers from difficulties both mathematical and practical. to obtain the simple mathematical expression given above, several simplifying assumptions have to be made. for example, the expression ignores use of a book within the library, and identifies demand in a period with the number of loans within that period. practical difficulties in arriving at a more exact mathematical expression are discussed in the next section. difficulties in constructing a model the following are the main difficulties that we found in constructing a model, either mathematical or using simulation: 1. the most useful measure of the effectiveness of a duplication policy is satisfaction level, the proportion of readers who on approaching the shelves find a copy of the book there, but satisfaction level is almost impossible to measure directly since, although some unsatisfied readers ask that the book be held for them, most go away without comment. more or less equivalent is the percentage time on shelf, the proportion of time that at least one copy of the book is available. this can be measured directly, though a visit to the shelves is needed, and was found useful in validating our model. if the underlying demand is random these two measures of effectiveness have the same value. 2. use of books within the library is also difficult to measure. at sussex, as in most libraries, data are available only on the number of times that a book is lent out of the library. if a reader does not find a copy on the shelves or if he uses a book within the library but does not take it away then no record is generated. since various studies, notably that of fussier and simon, suggest that the amount of use within li76 ]oumal of libmry automation vol. 7/2 june 1974 braries often exceeds the number of loans recorded by a factor of three or more, if the number of loans is used to estimate demand a reasonable knowledge of within-library use is essential.4 3. the number of copies required to achieve a specified satisfaction level does not go up linearly with demand. 
since a reader is satisfied if he finds a single copy on the shelves, proportionately fewer duplicates are needed of the books most in demand. at sussex more than twenty copies are provided of several books and this nonlinearity is very noticeable. 4. the demand for a title is erratic, changing from term to term, from week to week, and from day to day, even if the mean demand is constant. over a period such as a term three different effects might be expected: a background random demand independent of university courses; sudden peaks when a book is required for a course taken by several students; and feedback caused by previously unsatisfied readers returning. 5. the circulation of books is surprisingly complicated. at sussex some books are designated short term loan and can be borrowed for up to four days only; the remainder are long term loan books and can be borrowed for up to six weeks. circulation data show that the time for which a book is off the shelf is not the same as the period for which it is lent, but has a heavily skewed distribution. few books are returned until near the due date; just before the book is due back there is a peak when most books are returned but many become overdue and the tail of the distribution dies away slowly. simulation as these various factors seemed too complex to derive usable mathematical results, we decided to use computer simulation of the book circulation. simulation of book circulation is not new. in particular it has been used at lancaster university by mackenzie et al. to decide loan periods.5 their report includes a good description of the general approach. the object of our simulation was to model the circulation process so that we could study the relationship between three groups of parameters: 1. observed data (number of copies available; number of loans); 2. total underlying demand; 3. measures of effectiveness (satisfaction level; percentage time on shelf). the results obtained from any simulation are only as accurate as the values given to the variables used to calibrate the model. as several of these values were not known at all accurately when the work was begun, special efforts were put into careful validation and calibration of the model.
a separate study was made for a small sample of books, to compare the percentage time on shelf estimated by the simulation with the actual time for which a copy was available, found by looking at the shelves. the results of this study were used to check the amount of use within the library. by this means we were able to verify the simulation model and calibrate it to a highly satisfactory level of accuracy. description of program the basic layout of the simulation is shown in figure 1. this is a time-advance model with a period of one day. the program has been coded in fortran and running on the icl 1904a computer at sussex takes about one second of machine time to simulate two years. this fast speed has enabled us to try a wide range of values for most parameters and to experiment with a variety of distributions of arrival times and book return dates. 1. satisfaction level at the beginning of each day the number of demands for that day is generated. the satisfaction level is taken as the proportion of these requests which can be satisfied from the books left on the shelf from the previous day and those returned during the simulated day. 2. within-library use the proportion of use that takes place within the library was a key parameter in calibrating the model. the first version of the simulation program assumed a figure of 25 percent use within the library. this was based on a small survey of the type of books being studied, standard texts used for undergraduate courses. the weakness of this survey was that it used a count of those books that were left lying in the library at the end of the day and did not make sufficient allowance for books reshelved by readers or by library staff during the day. the validation experiment showed a consistent difference between predicted and observed percentage time on shelf which could be corrected by changing the value of the within-library use parameter to 60 percent. 3. distribution of demand two distributions of demand have been used, poisson arrivals with a specified mean, and a step demand superimposed on a poisson process.
in both cases provision is made for a proportion of unsatisfied readers to return later. as the effect of this feedback is to introduce sharp peaks of demand, the two distributions have proved surprisingly similar in the results produced and most of the runs of the program have been done with random demand. a recent survey showed that 69 percent of readers who fail to find a book intend to return, but we do not know how many actually come back nor what the time interval is before they return. 6 the simulation proved to be insensitive to moderate changes of these parameters 78 journal of library automation vol. 7/2 june 1974 advance clock one day add returned books generate requests fig. 1. outline flowchart of simulation program generate :return date generate return date reader return date simulation model/ arms and walter 79 and for most runs 25 percent of unsatisfied readers were deemed to return after a delay which averaged two days. 4. period for which the book is off the shelf the simulation allows for a book to be borrowed within the library, in which case it is available again the next day, or to be lent from the library. if the book is lent, the return date is generated from one of two histograms which respectively refer to books available on short and long term loan. these histograms were derived from an analysis of all books returned during one week in autumn 1970, modified to reflect changes in the circulation system. validation experiment although the structure of the simulation is fairly straightforward several parameters used in the model have been estimated indirectly. validation of the model took two forms. firstly we ran the program with a wide range of values for the main parameters to see which most influence the results. secondly a small study was set up to measure the percentage time on shelf of a number of books. for each book, the actual availability was estimated by the simulation from the number of loans during the same period. twenty-eight books known to be in heavy demand were selected, half in physics and half in sociology. over a period of eight weeks the shelves were inspected once per day, at random times during the day, to see if a copy was available. the number of loans of each copy of each book during the period was noted and the library staff carried out a thorough check to determine whether any copies shown in the catalog had been lost, stolen, or had their loan category altered. the simulation was used to estimate the percentage time on shelf and this was plotted on a graph against the observed percentage. figure 2 shows the graph for the original values of the parameters. in this graph the x axis shows the percentage time on shelf predicted by the simulation; the y axis shows the percentage observed. if the model were perfect the points would lie near the line y = x, deviations being caused by y being a random variable. the graph in figure 2 is clearly convex downwards showing a consistent error in the model, with these values of the parameters. knowing that the simulation is sensitive to the parameter giving the proportion of use that takes place within the library and that our estimate of its value was not precise, a series of graphs were prepared varying this parameter. figure 3 shows the same observations plotted against predictions assuming 60 percent use within the library, the value which best predicts the observations. this graph is much closer to being linear than figure2. 
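as an illustration of the time-advance model outlined under "description of program" above, the sketch below steps through one simulated day at a time with poisson demand, a within-library-use share, and loan periods drawn from a small empirical distribution; every numeric value is a placeholder rather than a calibrated sussex parameter.

# sketch: daily time-advance loop for availability of one title.
# returns the simulated satisfaction level (share of requests that found a copy).
import math
import random

def poisson(lam):
    # Knuth's method for drawing a poisson-distributed count
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

def simulate(days=120, copies=3, mean_demand=1.5,
             within_library=0.60, loan_lengths=(2, 5, 10, 28)):
    on_shelf = copies
    out = []                                   # days remaining until each copy returns
    satisfied = requests = 0
    for _ in range(days):
        out = [d - 1 for d in out]
        on_shelf += sum(1 for d in out if d <= 0)    # returned copies go back on the shelf
        out = [d for d in out if d > 0]
        for _ in range(poisson(mean_demand)):
            requests += 1
            if on_shelf > 0:
                satisfied += 1
                on_shelf -= 1
                if random.random() < within_library:
                    out.append(1)                    # used within the library, back tomorrow
                else:
                    out.append(random.choice(loan_lengths))
    return satisfied / requests if requests else 1.0

if __name__ == "__main__":
    print(round(simulate(), 2))                      # estimated satisfaction level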
the next question is whether the nonlinearities in figure 3 are the type to be expected from y being a random variable. a very rough calculation helps to answer this question. if we make the dubious assumption that availability of a copy on a given day is independent of the days before and afterwards, then, for x given, y should be approximately normally distributed with mean x and variance x(1 - x)/n, where n is the number of days in the study (forty). if this calculation were exact, 95 percent of the observations of y would lie within two standard deviations of x, but, since the assumption of independence is definitely false, we would expect the number of observations which fall within the range to be less than 95 percent. the curves y = x ± 2{x(1 - x)/n}^½ have been added to figure 3. fig. 2. observed percentage time on shelf against predicted (25 percent use within library). fig. 3. observed percentage time on shelf against predicted (60 percent use within library) with 95 percent probability curves. two points lie well off all graphs and cannot be explained except as the result of books being stolen or lost during the period of the study. of the remaining twenty-six all but three lie within the curves. this shows that the simulation model as finally calibrated gives a very reasonable description of the situation. operational experience the results of this simulation have been used by library staff since the middle of 1971, initially on an experimental basis. a two-stage process is involved. from the computer based circulation system can be found the number of times that each short term loan copy has been circulated. from these figures the library staff can estimate the demand for a title over a given period. once the demand has been estimated the staff can use the simulation again to determine how many copies would have been required to have achieved a specified satisfaction level, perhaps 80 percent. if fewer copies are held by the library, orders are placed for extra copies. at present these procedures are done manually using tables, but the possibility exists of modifying the computer system to identify those titles which need extra duplication. the actual decision to purchase needs to be done by library staff who can take account of factors not included in the simulation, such as price and changes of undergraduate courses. conclusion although this work was carried out during 1971, we shall have little operational experience of the method in action until the computer circulation system is reorganized. in the past, different copies of the same book have been processed entirely independently, meaning that the total number of loans of a given title can only be found by manually adding up the number of loans of each copy. in the revised computer system this will be done automatically. experience will probably show that the best procedure combines use of the simulation model with reading lists and the skill of a librarian. one possible feature of a computer based system is that it could automatically indicate which books appear to require duplication.
the method used here would seem to apply equally well to other libraries. naturally the circulation patterns of other libraries are different, which means that a different simulation would be needed, but this work has shown that it is possible to calibrate a simulation accurately enough to examine the circulation of individual books. acknowledgments we would like to thank the many members of the university of sussex library staff who have helped at various stages, particularly p. t. stone who was closely involved throughout. references 1. p. f. webster, provision of duplicate copies in the university library, final year project report (university of sussex, 1971). 2. p. m. morse and c. r. elston, "a probability model for obsolescence," operations resem·ch 17:36-47 (1969). 3. r. s. grant, "predicting the need for multiple copies of books," journal of library automation 4:64-71 (june 1971). 4. h. h. fussier and j. l. simon, patterns in the use of books in large research libmries (chicago: univ. of chicago pr., 1969). 5. a. g. mackenzie et al., systems analysis of a university library. report to osti on project sl/ 52/02, 1969. 6. j. urquhart, private discussion, 1971. lib-s-mocs-kmc364-20140601051313 17 a truncated search key title index philip l. long: head, automated systems research and development and frederick g. kilgour: director, ohio college library center, columbus, ohio. an experiment showing that 3, 1, 1, 1 search keys derived from titles are sufficiently specific to be an efficient computerized, interactive index to a file of 135,938 marc ii records. this paper reports the findings of an experiment undertaken to design a title index to entries in the ohio college library center's on-line shared cataloging system. several large libraries participating in the center requested a title index because experience in those libraries had shown that the staff could locate entries in files more readily by title than by author and title. users of large author-title catalogs have long been aware of great difficulties in finding entries in such catalogs. since the center's computer program for producing an author-title index could be readily adapted to produce a title index, it was decided to add title access to the system. a previous paper has shown that truncated three-letter search keys derived from the first two words of a title are less specific than authortitle keys ( 1). earlier work had revealed that addition of only the first letter of another word in a title improved specificity ( 2) . therefore, the experiment was designed to test the specificity of keys consisting of the first three characters of the first non-english-article word of the title plus the first letter of a variable number of consecutive words. the experiment was also designed to produce an index that catalogers could use efficiently and that would operate efficiently in the computer system. it was assumed that the terminal user would have in hand the volume for which an entry was to be sought in the on-line catalog. the index was not to be designed for use by library users; subsequent experiments will be done to design an index for nonlibrarian users. other investigations into computerized, derived-key title indexes include 18 journal of library automation vol. 5/1 march, 1972 the previous paper in this series to which reference has already been made ( 1) and development of a title index in stanford's ballots system ( 3). 
although stanford has not published results observed from experiment or experience that describe the retrieval specificity of its technique, it is clear that the stanford procedure is not only more powerful than the one described in this paper but also more adaptable for user employment. the stanford index is probably less efficient. materials and methods a file of 135,938 marc ii records was used in this experiment. this file contains title-only and name-title entries, and keys were derived from titles in both types of entries. a key was extracted consisting of the first three characters of the first non-english-article word of each title plus the first character of each following word up to four. if there were fewer than four additional words, the key was left-justified, with trailing blank fill. only alphabetic and numeric characters were used in key derivation; alphabetic characters were forced to uppercase. all other characters were eliminated and the space occupied by an eliminated character was closed up before the key was derived. a total of 115,623 distinct keys was derived from the 135,938 entries. these 115,623 keys were then sorted. each key in the file was compared with the subsequent key or keys and equal comparisons were counted. a frequency distribution by identical keys was thus prepared, and a table constructed of percentages of numbers of equal comparisons based on the total number of distinct keys. this table contains the percentage of time for expected numbers of replies based on the assumption that each key had a probable use equal to all other keys. next, by eliminating the fourth single character and then the fourth and third, files of 3,1,1,1 and 3,1,1 keys were prepared from the 3,1,1,1,1 file. for example, the 3,1,1,1,1 key for raymond irwin's the heritage of the english library is her, 0, t, e , l; the 3,1,1,1 key for this title is her, 0 , t, e; and the 3,1,1 key, her, 0 , t. the same processing given to the 3,1,1,1,1 file was employed on these two files. results table 1 contains the maximum number of entries in 99 percent of replies. inspection of the table reveals that there is a large increase in specificity when the key is enlarged from 3,1,1 to 3,1,1,1; the maximum number of entries ( 99+ percent of the time) drops from twelve to five. however, when the key goes to 3,1,1,1,1, the number of entries per reply goes down only to four from five. the percentage of replies that contained a single entry was 67.8 for the 3,1,1 key, 84.0 for the 3,1,1,1 key, and 90.0 for the 3,1,1,1,1 key. a truncat ed search key / long and kilgour 19 table. 1. maximum number of entries in 99 percent of replies search key 3, 1,1 3, 1, 1,1 3, 1, 1, 1,1 title index entries maximum entries per reply 12 5 4 percentage of time 99.0 99.1 99.2 the irascope cathode ray tube terminals used in the oclc system can display nine truncated entries on the one screen, and it is felt that catalogers can use with ease up to two screensful of entries. therefore, the keys producing more than eighteen titles were listed. for 3,1,1,1,1 there were only 33; for 3,1,1,1 there were 67; and for 3,1,1 there were 357. the maximum number of identical keys was 321 for 3,1,1,1,1 and 3,1,1,1; the key was pro, b, b, b, b, most of which was d erived from "proceedings." for 3,1,1 the maximum was 417, for his, 0 , t "history of the." discussion it is clear from the findings that a 3,1,1 search key is not sufficiently specific to operate efficiently as a title index in a large file. 
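the key-derivation rules given under materials and methods can be restated as a short function; this is a reconstruction rather than the original code, and the list of english articles to skip is an assumption.

# sketch: derive a 3,1,1,1 (or 3,1,1,1,1) truncated search key from a title.
# keeps only letters and digits, skips leading english articles, upper-cases,
# and blank-fills when the title has too few words.
import re

ARTICLES = {"a", "an", "the"}    # assumed list of english articles to skip

def search_key(title, extra_words=3):
    words = [re.sub(r"[^0-9a-z]", "", w.lower()) for w in title.split()]
    words = [w for w in words if w]
    while words and words[0] in ARTICLES:
        words = words[1:]
    if not words:
        return ""
    parts = [words[0][:3].ljust(3)] + [
        (words[i][0] if i < len(words) else " ") for i in range(1, 1 + extra_words)
    ]
    return ",".join(p.upper() for p in parts)

if __name__ == "__main__":
    print(search_key("The Heritage of the English Library"))      # HER,O,T,E
    print(search_key("The Heritage of the English Library", 4))   # HER,O,T,E,L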
however, the 3,1,1,1 key appears to be sufficiently specific for efficient operation, while the 3,1,1,1,1 key does not appear to possess sufficient increased specificity to justify its additional complexity. the observation that there is a large increase in specificity between keys employing threeand four-title words that constitute markov strings suggests that the second and third words may be highly correlated. indeed this suggestion is substantiated b y the maximum case for 3,1,1-his, 0, t. in the more-than-eighteen group for 3,1,1,1, these characters occurred in seven keys for a total of 206 entries, and for 3,1,1,1,1 they did not occur at all in the more-than-eighteen group. conclusion this experiment has shown that a 3,1,1,1 or 3,1,1,1,1 derived search key is sufficiently specific to operate efficiently as a title index to a file of 135,938 marc ii records. since a previous paper observed that as a fil e of entries increases the number of entries per reply does not increase in a one-to-one ratio ( 1 ), it is likely that these keys will operate efficiently for files of considerably greater size. 20 journal of library automation vol. 5/1 march, 1972 references 1. frederick g. kilgour, philip l. long, eugene b. leiderman, and alan l. landgraff, "title-only entries retrieved by use of truncated search keys," l ournal of library automation 4:207-10 ( dec. 1971 ). 2. frederick g. kilgour, "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science 5: 133-36 ( 1968 ). 3. edwin b. parker, spires (stanford physics information retrieval system) 1969-70 annual report ( palo alto: stanford university, june 1970 ), p. 7778. business intelligence in the service of libraries articles business intelligence in the service of libraries danijela tešendić and danijela boberić krstićev information technology and libraries | december 2019 98 danijela tešendić (tesendic@uns.ac.rs) is associate professor, university of novi sad. danijela boberić krstićev (dboberic@uns.ac.rs) is associate professor, university of novi sad. abstract business intelligence (bi) refers to methodologies, analytical tools, and applications used for data analysis of business information. this article aims to illustrate an application of bi in libraries, as reporting modules in library management systems are usually inadequate for a comprehensive business analysis. the application of bi technology is presented as a case study of libraries using the bisis library management system in order to overcome shortcomings of an existing reporting module. both user requirements regarding reporting in bisis and already existing transactional databases are analysed during the development of a data warehouse model. based on that analysis, three data warehouse models have been proposed. also, examples of reports generated by an olap tool are given. by building the data warehouse and using olap tools, users of bisis can perform business analysis in a more user-friendly and interactive manner. they are not limited with predefined types of reports. librarians can easily generate customized reports tailored to the specific needs of the library. introduction organizations usually have a vast amount of data which increases on a daily basis. the success of an organization is directly related to its ability to provide relevant information in a timely manner. 
an organization must be able to transform raw data into valuable information that will enable better decision-making.1 for this reason, it is impossible to imagine an organization without an efficient reporting module as a part of its management information system. if we put libraries in a business context, they are very similar to any other organization. the common characteristic of each is that they have high demand for a variety of statistical reports in order to support their business. a library management system uses a transactional database to store and process relevant data. this database is designed in accordance with the main functionalities of the system. information used to make strategic decisions is usually obtained from historical and summarized data. however, the database model may have a complex structure and may not be suitable for performing analytical queries that are often very complex and involve aggregations. execution of those queries may be a time-consuming and resource-intensive process that can decrease performance as well as the availability of the system itself. also, creating such queries can require advanced it knowledge. these problems can be overcome by developing business intelligence systems. business intelligence (bi) refers to methodologies, analytical tools, and applications used for data analysis of business information. bi gives business managers and analysts the ability to conduct mailto:tesendic@uns.ac.rs mailto:dboberic@uns.ac.rs business intelligence in the service of libraries |tešendić and krstićev 99 https://doi.org/10.6017/ital.v38i4.10599 appropriate analyses. by analyzing historical and current data, decision-makers get valuable insights that enable them to make better, more-informed decisions. bi systems rely on a data warehouse as an information source. the data warehouse is a repository of data usually structured to be available in a form ready for analytical processing activities. 2 business intelligence systems do not exist as ready-made solutions for each organization, but need to be built in accordance with the characteristics of each organization using the appropriate methodology. this article proposes a data warehouse architecture and usage of olap tools in order to support bi in libraries. the application of bi technology is illustrated through a case study of libraries using the bisis library management system. the first step in implementation of bi was the creation of a data warehouse model considering data that exist in bisis and requirements regarding reporting. after the data warehouse model had been created, data were loaded into the data warehouse using olap tools. olap tools were also used for visualization of data stored in the data warehouse. reporting in bisis the bisis library management system has been in development since 1993 by the university of novi sad, serbia. currently, the bisis community comprises over forty medium-sized libraries in serbia.3 the primary modules of the bisis system include cataloguing, reporting, circulation, opac, bibliographic data interchange, and administration. 
bisis supports cataloguing according to unimarc and marc 21 formats using an xml editor for bibliographic material processing.4 the bisis search engine is implemented with a lucene engine.5 bisis supports z39.50 and sru protocols for the search and retrieval of bibliographic records.6 those protocols are also used for developing a bisis service for searching and downloading electronic materials by the audio library system for visually impaired people.7 in addition, bisis allows sharing of bibliographic records with the union catalogue of the university of novi sad.8 the circulation module features all standard activities for managing users: registration, charging, discharging, searching users and publications, and generating different kinds of reports, as well as user reminders.9 the reporting module of bisis is implemented using the jasperreports tool.10 however, this module has some limitations due to the fact that bisis works only with a transactional database and does not cope well with complex reports. firstly, in order to generate reports regarding library collections, it is necessary to process all bibliographic records stored in that transactional database. this activity significantly burdens the system and reduces its performance. to avoid this, reports are prepared in advance outside working hours, usually at night. consequently, those reports include only data collected before report generation. creating reports in this manner greatly reduces system load and speeds up presentation of the reports because they are already generated. however, some reports, such as those related to the financial aspects of the library (e.g., the number of new members and the balance at the end of the day), need to be created in real time. due to the execution in real time, those reports are ineffective and affect the performance of the entire system. the next limitation of this reporting module is that it has a set of predefined reports and the creation of new reports requires additional development. in the current deployment it is not possible to add new reports without engaging software developers. also, an additional obstacle is the fact that the data for generating reports are obtained from two different data sources (described in more detail in the following sections). for example, the report regarding the number information technology and libraries | december2019 100 of borrowed books by the udc (universal decimal classification) groups requires data about the udc groups from xml documents and data about book borrowing from the relational database. generating this kind of reports cannot be done in a timely and efficient manner. taking into account these shortcomings of the reporting module, it can be concluded that the application of business intelligence, primarily data warehouse and olap tools, could improve analytical data processing in the libraries using bisis. related work one of the basic components of the business intelligence system is a data warehouse. a data warehouse is a centralized database that stores historical data. those data are in principle unchangeable and they are obtained by collecting and processing data from various data sources. data warehouses are used as support for making business decisions.11 the data sources for a data warehouse can be diverse and may include transactional databases and different file formats. the process of integrating data from different data sources into a single database is called data warehousing. 
data warehousing includes extracting, transforming, and loading (etl) data into data warehouse.12 the goal of data warehousing is to extract useful data for further analysis from the huge amount of data that is potentially available. there are different approaches to modeling a data warehouse. these approaches can be classified in three different paradigms according to the origin of the information requirements: (1) supplydriven, (2) demand-driven, and (3) hybrids of these. a supply-driven approach is based on data that exist in the transactional database. these data are analyzed to determine which data are the most relevant for making business decisions, or which data should be part of the data warehouse. alternatively, a demand-driven approach is based on the end-user requirements which means that the data warehouse is modeled in a way that is possible to get answers to the questions asked by the users. the third approach is a hybrid approach and it combines the previous two approaches in the process of data warehouse modelling. the hybrid approach attempts to diminish the shortcomings of the previous two approaches. in the case of a supply-driven approach, the data warehouse will probably not meet the requirements of the end users, while in the demand-driven approach there may be no data to fill the created data warehouse. in an article published in 2009, romero and abelló gave an overall view of the research in the field of dimensional modeling of data warehouses.13 various examples of implementation of data-warehouse solutions in libraries can be found in the literature. in 2014, siguenza-guzman et al. described the design of a knowledge-based decision support system based on data-warehouse techniques that assists library managers making tactical decisions about the optimal use and leverage of their resources and services. when designing the data warehouse, the authors started from the requirements of the end users (demand -driven approach) and extracted data from heterogeneous sources.14 a similar approach has been used by yang and shieh, who started from the reports needed by public libraries in taiwan and through an iterative methodological approach modeled a data warehouse that meets all their reporting requirements.15 unlike the previously described articles where a demand-driven approach was used, we applied a hybrid approach to modeling data warehouse. we analyzed data sources that exist in bisis business intelligence in the service of libraries |tešendić and krstićev 101 https://doi.org/10.6017/ital.v38i4.10599 following a supply-driven approach, but we also analysed user requirements to identify the facts and dimensions for the dimensional data warehouse model. modeling the data warehouse in order to implement a data warehouse solution, the first step is to design a data model suitable for analytical data processing. a data warehouse usually stores data in a relational database and organizes them in so called dimensional models. unlike standard relational database models, those models are denormalized and provide easier data visualization. data can be presented as a cube with three, four or n-dimensions. analyzing such data is more intuitive and user-friendly. the dimensional model contains the following concepts: dimensions, facts, and measures. dimensions represent the parameters for data analysis while facts represent business entities, business transactions, or events that can be used in analyzing business processes. 
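to make these terms concrete before turning to specific models, the short sketch below sets up one fact table, two dimension tables, and a count measure, and then produces a typical analytical result by joining and aggregating them. it is written in python with pandas purely as an illustration; the table names, column names, and sample values (item_fact, language_dim, and so on) are invented for the example and are not the actual bisis warehouse schema.

```python
import pandas as pd

# dimension tables: the parameters by which facts will be analyzed
language_dim = pd.DataFrame(
    {"language_id": [1, 2, 3], "language": ["serbian", "english", "german"]}
)
year_dim = pd.DataFrame({"year_id": [1, 2], "year": [2018, 2019]})

# fact table: one row per business event (here, one acquired item);
# it carries the measure (the acquisition number, counted in reports)
# and foreign keys pointing into the dimension tables
item_fact = pd.DataFrame(
    {
        "acquisition_number": ["A-101", "A-102", "A-103", "A-104"],
        "language_id": [1, 1, 2, 3],
        "year_id": [1, 2, 2, 2],
    }
)

# a typical analytical query: number of acquired items per language and year,
# obtained by joining the fact table to its dimensions and counting the measure
report = (
    item_fact.merge(language_dim, on="language_id")
             .merge(year_dim, on="year_id")
             .groupby(["language", "year"])["acquisition_number"]
             .count()
             .rename("number_of_items")
             .reset_index()
)
print(report)
```

the same pattern, joining the fact table to the dimensions of interest and then aggregating the measure, is the shape of the analytical queries discussed in the remainder of this article.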
the most commonly used model in dimensional modeling is the star model. after identifying the facts and dimensions, a dimensional model almost always resembles a star, with one central fact and several dimensions that surround it. dimensions and facts are usually implemented as tables in the relational database. dimension tables contain primary keys and other attributes. fact tables contain numerical data as well as dimension tables keys. the measure is a numerical attribute of the fact table and can be obtained by aggregating data by certain dimensions. there are several approaches to modeling data warehouse and we followed a hybrid approach to design dimensional models presented in this article. this implies that both the existing data sources and the user requirements were considered while designing the final data-warehouse models. that modeling process involved the following activities: 1. analysis of existing data sources in bisis with identification of possible facts and dimensions, 2. analysis of user requirements regarding reporting, 3. refactoring of the facts and dimensions in accordance with the user requirements, and 4. design of dimensional models. analysis of data sources in bisis the first step in creating a data warehouse is an analysis of existing data sources. the bisis system uses two different data sources. bibliographic records are stored in xml documents, while circulation data, as well as holdings data regarding the items that are circulated, are stored in a relational database. in 2009, tešendić et al. described the bisis circulation database model.16 that model describes data about individual and corporate library members. data about members includes information about personal data, membership fees, as well as information about a member’s borrowed and returned items. bibliographic data in bisis are presented in unimarc format. dimić and surla in 2009 described the model for bibliographic records used in bisis.17 a bibliographic record is modeled as a list of fields and subfields. a field contains a name, values of the indicators and a list of subfields. a subfield contains a name and a value of that subfield. the data described by that model are stored in xml documents because the bibliographic record structure is not suitable for relational modeling. that structure is more in line with the document-oriented data storage approach. information technology and libraries | december2019 102 analysis of user requirements one of the essential functionalities of information systems, including library management systems, is to provide various statistical reports that should help the management of the library to make better business decisions. user requirements related to analytical processing in bisis can be grouped into several categories. the first category consists of requirements regarding reports on the library collections. examples of reports from this category are: • number of publications per language for a certain period of time; • number of publications by departments; • number of new publications for a certain period of time; and • number of publications by udc groups. the second category consists of requirements related to the circulation of library resources. examples of such reports are: • number of borrowed items by member category; • number of borrowed items by language of publication; • number of borrowed items by departments; • the most popular books; and • the most avid readers for a certain period. 
the third category consists of requirements related to the reports on financial elements of the library's business. some of the reports are: • number of new members on a daily basis with a financial balance; • number of members by membership category and gender; and • number of members per departments. analyzing user requirements, it was perceived that a new data warehouse have to be created using data from both data sources. this means that appropriate transformations of data from the relational database as well as from the bibliographic records documents need to be performed. data warehouse models taking into account the reporting requirements as well as the data that exist in bisis, appropriate dimensional models are designed. the proposed dimensional models were designed to meet all the needs for analytical processing, as well as to enable flexibility of the reporting process in bisis. for each of the observed groups of reports, a dimensional model was created as described below. model describing library collection data a dimensional model of the bisis data warehouse used for analytical processing of the library collection data is shown in figure 1. the data from this model are used to generate reports on the library collection. examples of such reports are accessions register, number of items by udc group, number of items by departments, etc. in generating all these reports, an acquisition number of an item has the main role and all reports are created either by counting the acquisition numbers or by displaying the acquisition business intelligence in the service of libraries |tešendić and krstićev 103 https://doi.org/10.6017/ital.v38i4.10599 numbers along with other data related to that item. therefore, the acquisition number represents the measure in this dimensional model. the central table in the model is the item table and it presents a fact table. this table contains the acquisition number and foreign keys from dimension tables. all other tables in the model are dimension tables. the publication table represents a dimension table containing bibliographic data from bibliographic records. only data that are needed for reports are extracted from bibliographic records and stored in this table. those data refer to the name of the author, the title of the publication, the publication’s isbn and udc number, the number of pages, keywords, and an identification number for the bibliographic record in the transactional database. the acquisition table represents a dimension that describes the publication's acquisition data such as a retail price, the name of the supplier, and the invoice number. the location table describes departments within the library where an item is stored. the status, publisher, language, and udc_group tables relate to information about the status of an item, publisher, language of the publication, and udc group to which an item belongs. the date and year tables represent the time dimensions. data in the date table are extracted from the date of an item acquisition and data in the year table are extracted from the publishing year. figure 1. dimensional model describing library collection data. information technology and libraries | december2019 104 model describing library circulation data a dimensional model of the bisis data warehouse used for the analytical processing of library circulation data is shown in figure 2. data from this model are used for generating statistical reports regarding usage of library resources. 
examples of such reports are the number of borrowed publications according to different criteria (such as user categories, language of publication, departmental affiliation of the user who borrowed the publication, etc.). these data can answer questions about the most popular books or the readers with the highest number of borrowed books. similar to the previous reporting group, the acquisition number of the item which was borrowed has the main role in generating those reports. all reports from this group are created by counting acquisition numbers of borrowed items and displaying data related to those checkouts. therefore, in this dimensional model, the acquisition number is a measure. the central table in the model is the lending table and is presented as a fact table. this table contains the acquisition number of the borrowed item and foreign keys from the dimension tables. all other tables in the model are dimension tables. the publication, publisher, year, acquisition, ucd_group, status, and language tables contain data from bibliographic records and the content of these tables have been already explained. the member, membershiptype, category, education, and gender tables represent the dimension tables containing information about library users. these data are only a subset of circulation data from transactional database. the location table describes departments within the library where items are borrowed. the date table represents the time dimension. the data in the date table are derived from the date of borrowing and the date of discharge of an item. business intelligence in the service of libraries |tešendić and krstićev 105 https://doi.org/10.6017/ital.v38i4.10599 figure 2. dimensional model describing library circulation data. model describing members’ data a dimensional model of the bisis data warehouse used for the analytical processing of members’ data is shown in figure 3. data from this model are used for generating statistical reports on library members, as well as for generating financial reports based on membership fees. examples of such reports are the number of members according to different criteria (such as department of registration, member category, type of membership, gender, or education level). also, this report group contains reports that include a financial balance (for example, a list of members with membership fees in a certain time period). the membership fee has the main role in generating these reports. all reports from this group are generated by counting or displaying members who have paid a membership fee or summarizing membership fees. therefore, in this dimensional model, membership fee is a measure. the main table in the model is the membership table and it presents a fact table. it contains the membership fee, which is the measure, and foreign keys from the dimension tables. information technology and libraries | december2019 106 all other tables in the model are dimension tables. tables member, membershiptype, category, education and gender represents the dimension tables that contain information about library members and the content of these tables was previously described. the table location describes departments within the library where user registration is performed. the table date represents the time dimension. data in the table date are based on the registration date and the date of the membership expiration. figure 3. dimensional model describing library members. 
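as a small, hypothetical illustration of how the members' model supports reports of this kind, the sketch below builds a toy membership fact table with its dimensions and cross-tabulates members and fees by category and gender for a chosen registration period. it again uses python with pandas; the table and column names and the sample data are invented for the example rather than taken from bisis.

```python
import pandas as pd

# dimension tables for the members' model (hypothetical contents)
category_dim = pd.DataFrame(
    {"category_id": [1, 2], "category": ["student", "retiree"]}
)
gender_dim = pd.DataFrame(
    {"gender_id": [1, 2], "gender": ["female", "male"]}
)
date_dim = pd.DataFrame(
    {"date_id": [1, 2, 3],
     "registration_date": pd.to_datetime(["2014-03-01", "2016-07-15", "2019-01-10"])}
)

# fact table: one row per paid membership; the membership fee is the measure
membership_fact = pd.DataFrame(
    {"member_id": [10, 11, 12, 13, 14],
     "category_id": [1, 1, 2, 2, 1],
     "gender_id": [1, 2, 1, 2, 2],
     "date_id": [1, 1, 2, 3, 2],
     "membership_fee": [300, 300, 150, 150, 300]}
)

# join facts to dimensions, restrict to registrations between 2014 and 2018,
# then cross-tabulate: number of members and total fees per category and gender
joined = (membership_fact
          .merge(category_dim, on="category_id")
          .merge(gender_dim, on="gender_id")
          .merge(date_dim, on="date_id"))
period = joined[joined["registration_date"].dt.year.between(2014, 2018)]

report = period.pivot_table(index="category", columns="gender",
                            values="membership_fee",
                            aggfunc=["count", "sum"], fill_value=0)
print(report)
```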
true value of a data warehouse in the previous sections, we presented models of data warehouse, but those models are unusable if they are not implemented and populated with data needed for business analysis. extracting, transforming, and loading (etl) processes are responsible for reshaping the relevant data from the source systems into useful data to be stored in the data warehouse. etl processes load data into a data warehouse, but that data warehouse is still only storage for those data. a real-time and interactive visualisation of those data will show the true benefits of data warehouse implementation in various organisations including libraries. to load as well as to analyze and visualize large volumes of data in data warehouses, various online analytical processing (olap) tools can be used.18 the usage of olap tools does not business intelligence in the service of libraries |tešendić and krstićev 107 https://doi.org/10.6017/ital.v38i4.10599 require a lot of programming knowledge in comparison to tools used for querying transactional databases. the interface of olap tools should provide a user with a comfortable working environment to perform analytical operations and to visualize query results without knowing programming techniques or structure of transactional database. there are various olap tools available on the market.19 when choosing an olap tool to be used in an organization, there are several important criteria to consider: the duration of query execution, user-oriented interface, the possibility of interactive reports, price of tool, automation of the etl process, etc.20pentaho bi system is one of the open-source olap tools which satisfies most of those criteria. among various features, pentaho supports creation of etl processes, data analysis, and reporting.21 implementation of etl processes can be a challenging task primarily because of the nature of the source systems. we used pentaho tool to transform data from bisis to the data warehouse, as well as to visualize data and generate statistical reports. etl processes modeling after creating a data-warehouse model, it is necessary to load data into the data warehouse. the first step in that process is to extract data from the data sources. those data may not be in accordance with the newly created data-warehouse model and appropriate transformations of data may be needed before loading. regarding the structure of the data sources, transformations can be implemented from scratch, or by using dedicated olap tools. both techniques are used in the development of our data warehouse. transformations that required data from bibliographic records were implemented from scratch because of complex data structure, while transformations that processed data from relational database are implemented using pentaho data integration (pdi) tool. pdi is a graphical tool that enables designing and testing etl processes without writing programming code. figures 4 and 5 show an example of transformations created and executed by that tool. those transformations have been applied to load members’ data from bisis relational database into the data warehouse. figure 4. transformations for loading members data. information technology and libraries | december2019 108 figure 5. the membershiptransformation process. an issue that may arise after an initial loading of a data warehouse relates to updating the data warehouse. 
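as a rough sketch of what such an etl step involves, the following python fragment extracts newly registered members from a hypothetical transactional table, transforms the raw attribute values into surrogate dimension keys, and loads the result into warehouse tables; it merely stands in for the pentaho data integration transformations shown in figures 4 and 5 and is not the actual bisis implementation, and all table and column names are invented. because the extraction is filtered by date, rerunning the same job each night also gives a simple form of incremental update.

```python
import sqlite3

# hypothetical transactional source and warehouse (in-memory only for this sketch)
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE member (
        member_no TEXT, gender TEXT, category TEXT,
        fee REAL, registration_date TEXT);
    INSERT INTO member VALUES
        ('M-1', 'female', 'student', 300, '2019-06-01'),
        ('M-2', 'male',   'retiree', 150, '2019-06-02');
""")

warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE gender_dim   (gender_id INTEGER PRIMARY KEY, gender TEXT UNIQUE);
    CREATE TABLE category_dim (category_id INTEGER PRIMARY KEY, category TEXT UNIQUE);
    CREATE TABLE membership_fact (
        member_no TEXT, gender_id INTEGER, category_id INTEGER,
        fee REAL, registration_date TEXT);
""")

def dimension_key(table, column, value):
    """Transform step: look up (or create) the surrogate key for a dimension value."""
    warehouse.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = warehouse.execute(
        f"SELECT rowid FROM {table} WHERE {column} = ?", (value,)).fetchone()
    return row[0]

def load_members(since):
    """Extract members registered after `since`, transform, and load into the warehouse.

    Rerunning this nightly with the previous load date acts as an incremental update."""
    rows = source.execute(
        "SELECT member_no, gender, category, fee, registration_date "
        "FROM member WHERE registration_date > ?", (since,))
    for member_no, gender, category, fee, reg_date in rows:
        warehouse.execute(
            "INSERT INTO membership_fact VALUES (?, ?, ?, ?, ?)",
            (member_no,
             dimension_key("gender_dim", "gender", gender),
             dimension_key("category_dim", "category", category),
             fee, reg_date))
    warehouse.commit()

load_members(since="2019-01-01")
print(warehouse.execute("SELECT COUNT(*) FROM membership_fact").fetchone()[0])
```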
in order to achieve a better performance of transactional databases, updates of the data warehouse should not be performed in real time. in the case of library management systems, those updates can be performed outside of working hours so data in the data warehouse will be up to date on a daily basis. an update algorithm can be defined as an etl process using olap tools or it can be implemented from scratch. data visualization the basic task of olap tools is to enable visualization of data stored in a data warehouse. the olap tools use multidimensional data representation, known as a cube, which allows a user to analyze data from different perspectives. olap cubes are built on dimensional models of a data warehouse and consist of dimensions and measures. dimensions form the cube structure and each cell of the cube holds a measure. measures are derived from the records in the fact table and dimensions are derived from the dimension tables. olap tools allow a user to select a part of the olap cube by setting an appropriate query and that part can be further analyzed by different dimensions. this process is performed by applying common operations on the cube which include slice and dice, drill down, roll up, and pivot.22 data that are results of operations on the cube can be visualized in the form of tables, charts, graphs, maps, etc. the main advantage of olap tools reflects is that end users can do their own analyses and reporting very efficiently. users can extract and view data from different points of view on demand. olap tools are valuable because they provide an easy way to analyze data using various graphical wizards. by analyzing data interactively, users are provided with feedback which can define the direction of further analysis. in order to visualize data from our data warehouse, we used the pentaho olap tool. we used it to create predefined reports identified during the analysis of user requirements as well as some interactive reports using operations on the olap cube. examples of generated reports are presented below in order to illustrate some features of the pentaho olap tool. an example of a report shown in figure 6 was obtained with a dice operation on the cube. the dice operation selects two or more dimensions from a given cube and provides a new sub-cube. in this particular example, we selected three dimensions: gender, member category, and registration date. business intelligence in the service of libraries |tešendić and krstićev 109 https://doi.org/10.6017/ital.v38i4.10599 figure 6. example of dice operation performed on the olap cube. information technology and libraries | december2019 110 figure 7. example of roll-up and drill-down operations performed on the olap cube. business intelligence in the service of libraries |tešendić and krstićev 111 https://doi.org/10.6017/ital.v38i4.10599 additionally, we analyzed only those data from 2014 to 2018. the result of this operation is presented in the form of nested pie charts. however, other forms of visualisation can be applied on the same data set very easily. in figure 7, a more complex report is presented. that report is obtained by performing combination of roll-up and drill-down operations. the roll-up operation performs aggregation on a cube reducing the dimensions. in our example, we aggregated the number of newly acquired publications for certain years ignoring all other dimension except the date dimension. 
a user can select a particular year, quarter, and month and analyze details of purchased publications in that period, such as title and author of the publication. this is an example of using drill-down operation on the cube. the result of that operation is presented as a table, as shown in figure 7. this report is interactive, because user can investigate data in more detail by performing other operations on the cube that are placed on the toolbar of the report. conclusion this article aims to illustrate an application of business intelligence in libraries, as reporting modules in library management systems are usually inadequate for a comprehensive business analysis. development of a data warehouse, which is the base of any business intelligence system, as well as usage of olap tools are presented. both user requirements regarding reporting in bisis and already-existing transactional databases are analyzed during the development of a datawarehouse model. based on that analysis, three data-warehouse models have been proposed. also, examples of reports generated by an olap tool are given. by building the data warehouse and using olap tools, users of bisis can perform business analysis in a more user-friendly manner than with other processes. users are not limited to predefined types of reports. librarians can easily generate customized reports tailored to the specific needs of the library. in this way, librarians work in a more comfortable environments, performing analytical operations interactively and visualizing query results without additional programming knowledge. the article presents the usage of pentaho olap tool, but the proposed data-warehouse model is independent of olap tools selection and any other tool can be integrated with the proposed data warehouse. references 1 ralph stair and george reynolds, fundamentals of information systems (cengage learning, 2017). 2 ramesh sharda, dursun delen, and efraim turban, business intelligence, analytics, and data science: a managerial perspective (pearson, 2016). 3 “bisis,” library management system bisis, accessed july 8, 2019, http://www.bisis.rs/korisnici/. 4 bojana dimić and dušan surla, “xml editor for unimarc and marc 21 cataloguing,” the electronic library 27, no. 3 (2009): 509-28, https://doi.org/10.1108/02640470910966934; bojana dimić surla,“eclipse editor for marc records,” information technology and libraries 31, no. 3 (2012): 65-75, https://doi.org/10.6017/ital.v31i3.2384; bojana dimić surla, “developing an eclipse editor for marc records using xtext,” software: practice and experience 43, no. 11 (2013): 1377-92, https://doi.org/10.1002/spe.2140. http://www.bisis.rs/korisnici/ https://doi.org/10.1108/02640470910966934 https://doi.org/10.6017/ital.v31i3.2384 https://doi.org/10.1002/spe.2140 information technology and libraries | december2019 112 5 branko milosavljević, danijela boberić, and dušan surla, “retrieval of bibliographic records using apache lucene,” the electronic library 28, no. 4 (2010): 525-39, https://doi.org/10.1108/02640471011065355. 6 danijela boberić and dušan surla,“ xml editor for search and retrieval of bibliographic records in the z39. 50 standard,” the electronic library 27, no. 3 (2009): 474-95; danijela boberić krstićev, “information retrieval using a middleware approach,” information technology and libraries 32, no. 
1 (2013): 54-69, https://doi.org/10.6017/ital.v32i1.1941; miroslav zarić, danijela boberić krstićev, and dušan surla, “multitarget/multiprotocol client application for search and retrieval of bibliographic records,” the electronic library 30, no. 3 (2012): 351-66, https://doi.org/10.1108/02640471211241636. 7 danijela tesendic and danijela boberic krsticev, “web service for connecting visually impaired people with libraries,” aslib journal of information management 67, no. 2 (2015): 230-43, https://doi.org/10.1108/ajim-11-2014-0149. 8 danijela boberić-krstićev and danijela tešendić,“ mixed approach in creating a university union catalogue,” the electronic library 33, no. 6 (2015): 970-89, https://doi.org/10.1108/el-022014-0026. 9 danijela tešendić, branko milosavljević, and dušan surla, “a library circulation system for city and special libraries,” the electronic library 27, no. 1 (2009): 162-86, https://doi.org/10.1108/02640470910934669; branko milosavljević and danijela tešendić, “software architecture of distributed client/server library circulation system,” the electronic library 28, no. 2 (2010): 286-99, https://doi.org/10.1108/02640471011033648; danijela tešendić, “data model for consortial circulation in libraries,” in proceedings of the fifth balkan conference in informatics, novi sad, serbia, september, 16-20, 2012. 10 danijela boberic and branko milosavljevic, “generating library material reports in software system bisis,” in proceedings of the 4th international conference on engineering technologies icet, 2009: 133-37. 11 william h. inmon, building the data warehouse (indianapolis: john wiley & sons, 2005); ralph kimball, the data warehouse toolkit: practical techniques for building dimensional data warehouses (ny: john willey & sons, 1996), 248, no. 4. 12 ralph kimball and joe caserta, the data warehouse etl toolkit: practical techniques for extracting, cleaning, conforming, and delivering data (indianapolis: john wiley& sons, 2004), 528. 13 oscar romero and alberto abelló, “a survey of multidimensional modeling methodologies,” international journal of data warehousing and mining (ijdwm) 5, no. 2 (2009): 1-23. 14 lorena siguenza guzman, victor saquicela, and dirk cattrysse,“design of an integrated decision support system for library holistic evaluation,”in proceedings of iatul conferences (2014):112. https://doi.org/10.1108/02640471011065355 https://doi.org/10.6017/ital.v32i1.1941 https://doi.org/10.1108/02640471211241636 https://doi.org/10.1108/ajim-11-2014-0149 https://doi.org/10.1108/el-02-2014-0026 https://doi.org/10.1108/el-02-2014-0026 https://doi.org/10.1108/02640470910934669 https://doi.org/10.1108/02640471011033648 business intelligence in the service of libraries |tešendić and krstićev 113 https://doi.org/10.6017/ital.v38i4.10599 15 yi-ting yang and jiann-cherng shieh, “data warehouse applications in libraries—the development of library management reports,” in advanced applied informatics (iiai-aai), 2016 5th iiai international congress on advanced applied informatics, 88-91. ieee, 2016, https://doi.org/10.1109/iiai-aai.2016.129. 16 tešendić, milosavljević, and surla, “a library circulation system,”162-86. 17 dimić and surla, “xml editor for unimarc,” 509-28. 18 paulraj ponniah, data warehousing fundamentals for it professionals (hoboken, nj: john wiley & sons, 2011). 19 “top 10 best analytical processing (olap) tools,” software testing help, https://www.softwaretestinghelp.com/best-olap-tools/. 
20 rick sherman, “how to evaluate and select the right bi tools,” https://searchbusinessanalytics.techtarget.com/buyersguide/a-buyers-guide-to-choosingthe-right-bi-analytics-tool. 21 doug moran, “pentaho community wiki,” https://wiki.pentaho.com/. 22 ponniah, “data warehousing,” 382-93. https://doi.org/10.1109/iiai-aai.2016.129 https://www.softwaretestinghelp.com/best-olap-tools/ https://searchbusinessanalytics.techtarget.com/buyersguide/a-buyers-guide-to-choosing-the-right-bi-analytics-tool https://searchbusinessanalytics.techtarget.com/buyersguide/a-buyers-guide-to-choosing-the-right-bi-analytics-tool https://wiki.pentaho.com/ abstract introduction reporting in bisis related work modeling the data warehouse analysis of data sources in bisis analysis of user requirements data warehouse models model describing library collection data model describing library circulation data model describing members’ data true value of a data warehouse etl processes modeling data visualization conclusion references 126 standards for library automation and isad's committee on technical standards for library automation (tesla) the 1'0le of isad's committee on technical standards for library automation is examined and discussed. a p1'dcedu1'e fo1' the reaction to and initiation of standards is described, with reference to relevant standards organizations. the development, implementation, and maintenance of standards might best be characterized as the complexity of simplification-complex insofar as a standard represents a universally applicable ideal which is usually the result of arduous negotiation and compromise; simplification since a standard, once recommended and followed in practice, forms a firm reference point for the achievement of specified objectives. thus, if a standard exists, it can be referenced or used immediately and variant wheels do not have to be invented. unfortunately, to use, reference, or advocate a standard requires an awareness of available standards or the process whereby standards evolve. it is at this point that standards again become complex-in fact, they become a maze, which perhaps can be characterized by questions such as: is there a standard already? who is responsible for it? where are copies of this standard available? and so on (a maze familiar, certainly, to all of us). n is precisely to address the mazelike aspect of the standards "game" that the committee on technical standards for library automation ( tesla) has been established. in short, tesla intends to act as a twoway clearinghouse, hopefully to bring user and supplier into a meaningful dialogue wherein the requirements of both might be satisfied. technical standards and data dependence within this context the emphasis by tesla shall be on technical standards for library automation (e.g., standards relating to electronic data processing devices and techniques). concurrently, however, there are instances where device and data become inseparably linked. for example, the standatds fot libraty automation 127 relationship between the physical dimensions of a machine-readable patron badge and the amount and, therefore, type of data which can be me~ chanically encoded in it; or the character set used by a terminal and the minimum processing potential and, thus, hardware which must be internal to the terminal to receive, transmit, and display that character set. 
because it would be foolish to ignore this relationship, tesla in its clearinghouse function will stress and foster the involvement of individuals or organizational units within the american library association wherever data-dependent technical standards are involved. ala-originated and maintained technical standards though certainly no mystery, there is little evidence that the direct cost and personal involvement for a published and practiced standard is popularly known. for example, it has been indicated by those within the standards business that an adopted standard might culminate an investment of over a million dollars and represent the expenditure of tens of man-years. the cost, for example, leading to and including the final publication by the american national standards institute (ansi) of the standard for bibliographic information interchange on magnetic tape (ansi z39.21971), more popularly known as marc, has not been published. it is suspected, however, that the cost of the marc standard was monumental. in short, and by way of this example, it can be safely assumed that neither the american library association nor isad nor tesla will become standards organizations in the strict sense of the word. in fact such a capability is not desirable, since organizations such as ansi, electronics industry association (eia), institute of electronic and electrical engineers (ieee), national microfilm association ( nma), etc., exist and are geared specifically to this activity. rather, the american library association and isad should, and must, participate actively in the standards processes available to them to insure a meaningful user-voice in the development of standards by those organizations. to provide for participation in the standards process at the membership level is precisely tesla' s role. thus, when placed in operation, such standards will reflect the library community's requirements, contributing to and fostering library automation rather than hindering it. at least one of the anticipated results would be the development of equipment addressing library needs directly, and so preclude the custom fabrication of specialty devices which while satisfying the needs of a few libraries-expensively, cannot satisfy libraries in generaleconomically. what is provided by tesla? tesla specifically has established a procedure whereby the membership of the american library association might either teact to proposals for standards regardless of origin, or initiate proposals for standards for membership reaction. the results of this procedure, whether reactive or initia128 journal of library automation vol. 7/2 june 1974 tive, would be communicated to the membership in terms of the status and position taken for each proposal, and to the originator and to ala's official representative in full detail for subsequent application. the tesla procedure the procedure is geared to handle both reactive (originating from the outside) and initiative (originating from within ala) standards proposals to provide recommendations to ala's representatives on existing, recognized standards organizations. to enter the procedure for an initiative standards proposal the member must complete an "initiative standards proposal" using the outline which follows: initiative standard proposal outline the following outline is designed to facilitate review by both the committee and the membership of initiative standards requirements and to expedite the handling of initiative standards proposals through the procedure. 
since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable). note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced on 8%" x 11" white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number. i. title of initiative standard proposal (title). ii. initiator information (forward). a. name b. title c. organization d. address e. city, state, zip f. telephone: area code, number, extension iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process. iv. purpose. state the purpose of standard proposal (scope and qualifications). v. description. briefly describe the standard proposal (specification of the standard). vi. relationship of other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks). vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification). standards for librm·y automation 129 viii. specifications. specify the standard proposal using record layouts, mechanical . drawings, and such related documentation aids as required in addition to text exposition where applicable (specification of the standard). please note that the outline is designed to enable standards proposals to be written following a generalized format which will facilitate their review. in addition, the outline permits the presentation of background and descriptive information which, while important during any evaluation, is a prerequisite to the development of a standard. the reactor ballot (figure 1) is to be used by members to voice their recommendations relative to initiative standards proposals. the reactor ballot permits both "for" and "againsf' votes to be explained permitting the capture of additional information which is necessary to document and communicate formal standards proposals to standards organizations outside the american library association. as you, the members, use the outline to present your standards protesla reactor ballot reactor information name __________________ __ title organization address city___ state ___ zip __ telephone ---------identification number for standard requirement -----------for ------------against reason for position: (use additional pages if required) fig. 1. tesla reactor ballot posals, tesla will publish them in ]ola-tc and solicit membership reaction via the reactor ballot. 
throughout the process tesla will insure that standards proposals are drawn to the attention of the applicable american library association division or committee. thus, internal review usually will proceed concurrent with membership review. from the review and the reactor ballot tesla will prepare a majority recommendation and a minority report on each standards proposal. the majority recom130 journal of library automation vol. 7/2 june 1974 receipt screen division rej/acp1 publish tally representative title/i.d. number date date date date date date date target fig. 2. tesla standards scoreboard mendation and minority report so developed will then be transmitted to the originator, and to the official american library association representative on the appropriate standards organization where it should prove a standards for library automation 131 source of guidance as official votes are cast. in addition, the status of each standards proposal will be reported by teslain jola-tc via the standards scoreboard (figure 2). the committee ( tesla) itself will be nonpartisan with regard to the proposals handled by it. however, the committee does reserve the right to reject proposals which after review are not found to relate to library automation. tesla's composition tesla is comprised of representatives both from the library community and library suppliers to insure a mix of both users and producers for its review of standards proposals. in addition, rotating membership on tesla will insure a continuing movement of voices from different segments of the library and library supplier communities to shortstop the pressing of vested interests. at this time, the members of tesla and the term for each are: ms. madeline henderson chairperson, task force on automation of library operations and federal library committee u.s. department of commerce/nos washington, dc 20234 term ends: 1974 mr. arthur brody chairman of the board bro-dart industries 1609 memorial ave. williamsport, pa 17701 term ends: 1975 dr. edmund a.' bowles data processing division international business machines corporation 10401 fernwood rd. bethesda, md 20034 term ends: 1974 mr. anthony w. miele assistant director technical services illinois state library centennial building springfield, il 62706 term ends: 1975 standards library mr. jay l. cunningham director, university-wide library automation program university of california, south hall annex berkeley, ca 94720 term ends: 1976 mr. richard e. uttman p.o. box 200 princeton, nj 08540 term ends: 1976 mr. leonard l. johnson director of media services greensboro public schools drawer v greensboro, nc 27402 term ends: 1975 mr. john c. kountz (chairman) associate for library automation office of the chancellor the california state university and colleges 5670 wilshire blvd., suite 900 los angeles, ca 90036 term ends: 1976 in addition to acting as a clearinghouse for standards for isad, and the maintenance of the standard proposal and reactor ballot procedure, tesla intends to urge the establishment of an ala collection of stan132 journal of library automation vol. 7/2 june 1974 dards applicable to libraries to handle requests for information from the library community. thus, while currently each member is left to "do it himself," there appear to be definite economies in the centralization of such a collection and the periodic publication of indices to relevant standards. 
sources of information relating to standards to provide a source of guidance at this time for the types of available standards, to list the many existing standards, and to index the originating standards organizations would consume several issues of jola. therefore, in the following are very brief recapitulations of the more relevant organizations impacting library automation standards. the list is very incomplete as might be expected. for those interested in a comprehensive review of standards, global engineering, 3950 campus drive, newport beach, ca 92660, maintains and annually publishes their directory of engineering document somces .. this directory, now in its second edition, lists over 2,000 standards organizations and the prefixes used by those organizations in publishing their standards, to permit global engineering's customers to specify standards for purchase. ( global engineering's primary function is the sale of original copies of standards and specifications.) american national standards institute, inc. (ansi)-the american national standards institute, inc. ( 1430 broadway, new york; ny 10019) does not write standards. rather, ansi has established the procedures and guidelines to be followed in the development of standards that will be labeled american national standards. the actual work is done by ansi members and other interested groups and individuals, using the ansi procedures in the development of standards. only after these groups have demonstrated to ansi's satisfaction that the proposed standard has been developed in accordance with the procedure established by ansi will it be approved and published as an american national standard. in addition, ansi publishes materials relating to standards, of which the ansi reporter, a journal dedicated to standards, probably represents the single best source for information relating to current standards issues. however, the scope of ansi is very broad. thus for library and library automation activities specific committees of ansi, rather than ansi itself, are relevant; specifically, ansi committees: ph5 (microfilm). see national microfilm association below. x3 ( computers and information processing); x4 (business machines and supplies). both of these subcommittees are sponsored by the computer and business equipment manufacturers' association ( cbema), which is also their secretariat. cbema ( 1828 l st. nw, washington, dc 20036) periodically provides indices to the published standards of x3. an insight into the breadth of x3's activities can be implied from figure 3, ansi x3 standards committee organization. 
figure 3. ansi x3 standards committee organization. x3 currently has over fifty member organizations. the ala representative on x3 is mr. james rizzolo of the new york public library. an excellent overview of x4's scope and activities was published in the secretary (nov. 1973) under the title "what's being done about office equipment standards." from the library viewpoint x4's activities in credit cards, typewriter keyboards, and forms are of interest. at this writing x4 has nine user, fifteen producer, and nine general interest members. the ala is not currently represented on x4. z39 (library work, documentation and related publishing practices) is sponsored by the council of national library associations. with thirty-six subcommittees (sc), z39 covers library-related activities from machine input records (sc/2) through standard order forms (sc/36). it was through z39 that marc became the american national standard (z39.2-1971). z39 publishes a quarterly entitled news about z39. z39 is located at the school of library science, university of north carolina, chapel hill, nc 27514. fifteen standards have been published by z39. the ala representative to z39 is fred blum of eastern michigan university, ypsilanti, whose excellent summary of z39 appears in the winter 1974 issue of library resources & technical services. national microfilm association (nma)-the national microfilm association has an organization of standards committees and a standards board as shown in figure 4. information relating to their standards is published from time to time in the journal of micrographics. a recent article, entitled "standards: nma standards committee scope of work" (vol. 7, no. 1 [sept. 1973]), briefly describes the subcommittees internal to nma's standards organization and the scope of each. of particular interest to libraries is the sponsorship by nma of ansi-ph5. micrographic standards are listed in an nma publication (rr1-1974 resource report). copies of this resource report may be obtained by contacting the nma at suite 1101, 8728 colesville rd., silver spring, md 20910. figure 4. nma standards organization.
international organization for standardization (iso)-this organization is truly international with representatives from thirty-five nations. the secretariat of iso is ansi (see above). while iso parallels ansi in its coverage, it differs organizationally. thus, the committees/subcommittees of ansi have in large measure their equivalent technical committees, subcommittees, and working groups in the iso. standards developed by the iso and published by them are reported regularly in the ansi reporter (referred to above). most recently, the january 11, 1974 ansi reporter contained an article outlining iso publications and describing five iso titles. marc, by the way, is also iso standard 2709. the iso technical committees (tc) of immediate interest to library automation are tc 37 (terminology), tc 46 (documentation), tc 95 (office machines), and tc 97 (computers and associated information processing systems, peripheral equipment and devices, and media related thereto). as an indication of the technical areas covered by tc 97, its organization is shown in figure 5. electronic industries association (eia)-the electronic industries association maintains a broad variety of standards for hardware and related peripheral equipment. such areas as cathode ray tube (crt) terminals, the luminescence of cathode ray tubes themselves, television transmission, and data communications are dealt with by the eia standards. an excellent source of eia standards is the publication produced by the eia entitled index of eia and jedec standards and engineering publications (1973 revision and no. 2). copies are available through the electronic industries association, engineering department, 2001 i st. nw, washington, dc 20006. the ala is not a member of the eia, by definition. the institute of electronic and electrical engineers (ieee)-the institute of electronic and electrical engineers, inc. is a professional organization which, in addition to its professional activities, maintains standards. many of these standards relate to library automation in such areas as keyboards for terminals and transmission types for data communications. while each monthly issue of the ieee publication spectrum contains annotated lists of new standards, a full index to the ieee standards is available by contacting the ieee headquarters, 345 e. 47 st., new york, ny 10017. national bureau of standards (nbs)-the national bureau of standards, u.s. department of commerce, has the responsibility within the federal government for monitoring and coordinating the development of information processing standards and publishing approved data standards for data elements and codes in data systems. thus, the national bureau of standards works closely with federal departments and agencies, the american national standards institute (ansi), and the international organization for standardization (iso). of specific interest to library automation are the federal information processing standards (fips) and the fips index published by nbs.
the annual fips index (fips pub 12-1) is a veritable gold mine of information relating to ansi, iso, federal government participation and representation in the standards process, and the role of nbs itself. fips 12-1 is available from the superintendent of documents, u.s. g.p.o., washington, dc 20402 (sd catalog no. c 13:52:12-1). while the material above should provide a brief overview of the standards arena in which tesla will function and some insight into the scope of standards activities, it is not to be construed as a definitive compilation of standards organizations. as indicated earlier, over 2,000 such organizations are known to be active currently. figure 5. iso/tc 97 organization chart. finally, an invitation. during the formative period of tesla a list of potential standards areas for library automation was developed. potential technical standards areas: 1. codes for libraries and library networks, including network hierarchy structures. 2. documentation for systems design, development, implementation, operation, and post-implementation review. 3. minimum display requirements for library crts, keyboards for terminals, and machine-readable character or code set to be used as label printed in book. 4. patron or user badge physical dimension(s) and minimum data elements. 5. book catalog layout (physical and minimum data elements): a. off-line print b. photocomposed c. microform 6. communication formats for inventory control (absorptive of interlibrary loan and local circulation). 7. data element dictionary content, format, and minimum vocabulary, and inventory identification minimum content. 8. inventory labels or identifiers (punched cards, labels, badges, or ...) physical dimensions and minimum data elements. 9. model/minimum specifications relating to hardware, software, and services procurement for library applications. 10. communication formats for library material procurement (absorptive of order, bid, invoice, and related follow-up). you are invited to review this list and voice your opinion of any or all areas indicated by means of the reactor ballot in jola-tc in this issue. or, if you have a requirement for a standard not included in this list, use the initiative standard proposal outline to collect and present your thoughts. henceforth, future issues of jola-tc will contain a reactor ballot and the scoreboard. the ball is in your court! send ballots and/or initiative standard proposals to: john kountz, chairman, isad-tesla, 5670 wilshire blvd., suite 900, los angeles, ca 90036.
statistical behavior of search keys abraham bookstein: graduate library school, university of chicago editor's note: the editor and author are aware that varying approaches may be taken to the problem presented here. readers are invited to respond in the form of a paper or a technical communication. in discussion about search keys, concern has been expressed as to how the number of items retrieved by a single value relates to collection size. this paper creates a statistical model that attempts to give some insight into this behavior. it is concluded that, in general, the observed behavior can be explained as being intrinsically statistical in nature rather than being a property of specific search keys. an attempt is made to relate this model to other research, and to indicate how this model may be made to yield more accurate predictions. introduction various experiments suggest that it may be possible to develop, as an access route into a file of bibliographic records, a search key whose values can be easily derived from such bibliographic data as is likely to be available to its users.1 (by the phrase "search key" we mean a key similar to the 3-3 or 3-1-1-1 keys used at ohio college library center and other places, which is made up by concatenating truncations of bibliographic data elements.) some concern, however, has been expressed regarding the nonuniqueness of these keys: if the number of items retrieved were often to exceed an amount easily handled by a user of the system, the value of this access route would be considerably diminished. accordingly, an important measure of search key performance is the frequency with which a large number of records is retrieved as the search key is applied to the file. this measure is related, for example, to how many memory accesses will be required, on the average, to retrieve all records satisfying a request; it is also an important consideration in deciding which display device should be installed in a system.2,3 after evaluating such a measure for a search key on a particular file, it is reasonable to ask how that measure will change over time, as the file increases in size. the nature of this variation has already been of concern to researchers in the field. kilgour, on the basis of a number of experiments carried out at oclc, notes that "there remains a major problem to be solved and a major question to be answered. the problem is constituted of those replies that contain a number of entries exceeding the optimal maximum. . . . the major question to be answered is how truncated search keys will perform on files ten and a hundred times the size of that used in this experiment."4 he elsewhere observes that "as a file of bibliographic entries increases, the maximum number of entries per reply does not increase in a one-to-one ratio. . . ."5 this paper presents a mathematical model that addresses itself to the problem defined by kilgour and attempts to explain his observation; it is suggested that the gross features of the behavior are statistical in nature and not properties of specific search keys. a view of collection growth the cause of the phenomenon observed by kilgour can best be understood by first considering a simple model which, while not itself valid, does cast light on the nature of the behavior. this first model neglects the effect of randomness both in the growth of the collection and in the arrival of requests.
it supposes our search key has the following property: regardless of collection size, the fraction of the collection retrieved by a particular search key value, $v_i$, is exactly given by a constant $f_i$; thus, if the file holds $N$ records, a request for $v_i$ will retrieve $n_i = f_i N$ records. this model similarly assumes that among any sizeable number of requests, the fraction of the time any particular search key value will occur is fixed; thus, for any subset of search key values, it is possible to determine how often members of that subset will occur among a set of requests. in particular, for any integer $n$, we can form the set of all the search key values that will retrieve less than $n$ items. we can then determine how often search key values from that set are requested. if, for example, requests for these values occur 99 percent of the time, then we can assert that 99 percent of the time less than $n$ items will be retrieved. if the file contains $N$ items, then these $n$ items constitute the fraction $f = n/N$ of the file. should the collection size increase to $\lambda N$, then the model predicts that 99 percent of the time less than $f \cdot (\lambda N) = \lambda n$ items would be retrieved. in other words, we have precisely the behavior kilgour observes does not occur. this argument shows that a simple deterministic model does not conform to experience with search keys. the model breaks down in two ways, which accounts for the discrepancy between the results derived from it and kilgour's observations: 1. in any actual library, the fraction of the time that a particular request will appear within a sequence of requests will vary; and 2. in comparing two different samples having the same size, the number of items having a given search key value will vary. the first of these factors is easily dealt with and its analysis will suggest the number of requests to use in a test of search key behavior in a given library. for a particular collection, let $s$ denote the set of search key values for which, say, twenty or more items are retrieved. we would like to find the fraction of the time that a request in $s$ occurs in the long run; suppose this value is in fact $q$. then among $M$ requests, the probability that $m$ members of $s$ occur is given by the binomial distribution $f_b(m \mid q, M)$. this distribution has a mean of $qM$ and a variance of $q(1-q)M$. should we desire to estimate the actual fraction of the time that twenty or more items will be retrieved, we can take a sample of $M$ requests and compute $\hat{q}$, the fraction of the requests with search key values in $s$; if we do so, we will usually get a value for $\hat{q}$ between $q - \frac{2}{\sqrt{M}}\sqrt{q(1-q)}$ and $q + \frac{2}{\sqrt{M}}\sqrt{q(1-q)}$. if, for example, $q = .01$ and $M = 10{,}000$, we would tend to find $\hat{q}$ in the interval $.01 \pm .002$. thus the effect of randomness in the arrival of requests can easily be controlled by increasing the number of requests considered; furthermore, the size of error can be predicted. we next introduce the second factor; its analysis will suggest how the behavior of search keys will change as the collection grows in size. for this purpose we adopt a model of collection growth which assumes that as items arrive, they are randomly distributed among the search key values in accordance with some probability distribution. 
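the two-standard-deviation bound just quoted is easy to evaluate numerically; the following minimal python sketch reproduces the $.01 \pm .002$ example and also inverts the relation to ask how many requests would be needed for a target precision (the inversion is a convenience added here, not a formula from the paper).

```python
import math

def estimate_interval(q, m):
    # interval q +/- (2/sqrt(m)) * sqrt(q*(1-q)) within which the observed
    # fraction of "large" requests will usually fall, per the binomial model
    half_width = 2.0 / math.sqrt(m) * math.sqrt(q * (1.0 - q))
    return q - half_width, q + half_width

def requests_needed(q, target_half_width):
    # smallest sample of requests whose two-sigma half-width is at most the target
    return math.ceil((2.0 / target_half_width) ** 2 * q * (1.0 - q))

print(estimate_interval(0.01, 10_000))   # about (0.008, 0.012), i.e. .01 +/- .002
print(requests_needed(0.01, 0.001))      # 39600 requests for a +/- .001 estimate
```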
if we suppose that the probability of an item being assigned a specified search key value, $v_i$, is $p_i$, then in a collection of $N$ items we may conclude that the probability of $n$ items having that value is given by the binomial distribution: $f_b(n \mid p_i, N) = \binom{N}{n} p_i^{\,n} (1 - p_i)^{N-n}$. if $g'(v_i)$ is the probability that the value $v_i$ is selected from the request population, then the probability that the "next" request retrieves $n$ items is given by $\sum_i g'(v_i)\, f_b(n \mid p_i, N) = \int g(p)\, f_b(n \mid p, N)\, dp$, where $g(p)\,dp \stackrel{\text{def}}{=} \sum_{p \le p_i \le p + dp} g'(v_i)$ is the probability that a request arrives with value $p_i$ in the interval $(p, p + dp)$, and will be treated as a continuous function.** since the expectation of the binomial distribution is given by $pN$, we have $N \int p\, g(p)\, dp \stackrel{\text{def}}{=} N\bar{p}$ as the expected number of items retrieved by a random request; since this is proportional to $N$, doubling the size of the collection will, on the average, double the amount of material retrieved. similarly, the variance, $\sigma^2$, is given by $N^2(\overline{p^2} - \bar{p}^{\,2}) + N \int p(1-p)\, g(p)\, dp$. should $\overline{p^2} - \bar{p}^{\,2}$, the variance of $p$, be small, this reduces to $N \int p(1-p)\, g(p)\, dp \stackrel{\text{def}}{=} \bar{\sigma}^2 N$, so that approximately 95 percent of the time the amount of material retrieved would be less than $N\bar{p} + 2\sqrt{N}\,\bar{\sigma} = N\left(\bar{p} + \frac{2\bar{\sigma}}{\sqrt{N}}\right)$. (** this result would more precisely be expressed as $\int f_b(n \mid p, N)\, dG(p)$, which has the form of a stieltjes integral. the expression used in the text is simpler and reasonably valid because of the vast number of values the search key can take.) it is the factor $2\bar{\sigma}/\sqrt{N}$, and its dependence on $N$, that may account for kilgour's nonlinearity, and not any property intrinsic in the nature of any type of search key. thus, to the extent that this model reflects what is really happening, the 95 percent point increases roughly proportionately with file size; the "constant" of proportionality, however, is the sum of two terms: the first is a true constant, and the second is a term that approaches zero as the file gets larger. in particular, this model suggests that we will never reach a leveling off point: as the file increases in size, the number of items retrieved will also increase, and the pattern of increase will become increasingly linear. up to this point this discussion has been qualitative in nature, being based upon general statistical considerations and making use of the normal approximation to some unknown distribution; its broad conclusions are, however, consistent with the findings of earlier workers and can explain certain unanticipated properties of search keys. to proceed further it will be necessary to restrict the form of the function $g(p)$; this will be attempted in the following section of this paper. relationship of model to earlier research. interest in access methods that are appropriate for files of bibliographic data has generated a considerable amount of empirical research on search key behavior. of necessity, this pioneering work has been of a descriptive nature, resulting in data showing search key behavior in specific environments. while these efforts have lent a good deal of insight into the nature of search keys, the basic weakness of such research lies in the difficulty of extending these findings to other situations. one purpose of a mathematical model such as the one being developed here is to provide this increased generality by representing in a concise and easily manipulated form the results of previous research. 
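the mixed (compound binomial) model above is easy to simulate, and simulation shows the qualitative behavior described: the 95 percent point grows sub-linearly for small files and becomes increasingly proportional to file size as the file grows. the following minimal python sketch assumes a beta form for the request distribution $g(p)$, with parameter values chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def percentile_of_retrievals(n_items, a, b, n_requests=200_000, pct=95):
    # each simulated request draws a retrieval probability p from an assumed
    # beta request distribution g(p), then retrieves binomial(n_items, p) records
    p = rng.beta(a, b, size=n_requests)
    return np.percentile(rng.binomial(n_items, p), pct)

a, b = 2.0, 2000.0                      # illustrative shape of g(p), not fitted data
for n_items in (1_000, 10_000, 100_000, 1_000_000):
    q95 = percentile_of_retrievals(n_items, a, b)
    # the per-record rate q95/n_items falls and then levels off as the file
    # grows, so growth in the 95 percent point becomes increasingly linear
    print(n_items, int(q95), round(q95 / n_items, 5))
```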
it is accordingly of interest to indicate the relationship between previous work on search keys and our model. research on search key performance has been of two kinds. the first kind seeks to answer the question: for any number, $n$, how many search key values retrieve $n$ items? the answer to this question depends only on the search key and the collection; it is independent of the pattern of request arrivals. the second kind of research involves the actual arrival of requests; it tries to answer the question: for any number $n$, how frequently will requests resulting in the retrieval of $n$ items occur? to discuss this research in terms of our model requires a closer examination of the function $g(p)$ previously defined. we recall that $g(p)\,dp \stackrel{\text{def}}{=} \sum_{p \le p_i \le p + dp} g'(v_i)$, with $dp$ being a small number. thus $g(p)$ is determined by two factors: a. the number of search key values in the interval $(p, p + dp)$. let us denote this value by $f(p)\,dp$, so $f(p)$ is the density of search keys at $p$. we make use here of the fact that although the number of possible search key values is finite, the number is very large, so their distribution can be thought of as continuous. b. the average probability of search keys, with values $p_i$ near $p$, being requested. we shall refer to this quantity as $g''(p)$. by combining these factors we have $g(p) = g''(p)\,f(p)$. in terms of this discussion, the first type of research described above is in fact estimating $f(p)$: if there are $s$ search key values that retrieve $n$ items from a collection of $N$ items, then $s$ is an estimate of $\frac{1}{N} f\!\left(\frac{n}{N}\right)$; this relation uses $n = pN$ and $dp = \frac{n+1}{N} - \frac{n}{N} = \frac{1}{N}$. the second kind of research directly estimates $g(p)$. guthrie, in a recent paper, provides a bridge between the two types of research by discussing his findings in terms of two models.6 one of his models, which asserts that each search key value has an equal chance of being requested, is equivalent to the assumption that $g''(p) = 1$, and $g(p) = f(p)$. guthrie finds that this is not an adequate representation of his data. guthrie's second model asserts that each item has an equal chance of being requested. in our terms this becomes $g''(p) \propto p$, and $g(p) \propto p\,f(p)$. this model, while an improvement over the first, still disagrees with the data. furthermore, these models do not estimate $f(p)$; even if guthrie's model were correct, we would not know the probability that $n$ items would be retrieved until we were told how many search key values contained $n$ items. in the next section we will try to remedy this situation by means of a two-parameter representation of $g(p)$. a representation of $f(p)$. to get a more detailed account of search key behavior by experiment is difficult since the two aspects of randomness already discussed are confounded; the experimenter only sees the combined effect. we will, however, try to estimate the distribution $g(p)$ by a distribution of the form $\frac{(\alpha + \beta + 1)!}{\alpha!\,\beta!}\, p^{\alpha} (1-p)^{\beta}$. we believe that such an attempt is reasonable on three grounds: a. it is not possible to find $g(p)$ exactly, and moreover, it is not clear that this would be desirable. we are interested in a reasonable approximation that is satisfactory for decision-making purposes; b. the above distribution assumes a wide variety of shapes as $\alpha$ and $\beta$ vary; it seems likely that values of $\alpha$ and $\beta$ can be found for which this distribution is close enough to $g(p)$; and c. 
this distribution is mathematically tractable. if we proceed using the above approximation for $g(p)$, we find: (i) the probability, $p(n)$, of $n$ items being retrieved is given by 1. $p(n) = \binom{N}{n} \frac{(\alpha+\beta+1)!}{\alpha!\,\beta!} \cdot \frac{(\alpha+n)!\,(N-n+\beta)!}{(\alpha+\beta+N+1)!}$; (ii) the expected number of items retrieved, $E$, is given by 2. $E = N\,\frac{\alpha+1}{\alpha+\beta+2}$; and (iii) the variance, $V$, of the number of items retrieved is given by 3. $V = N\,\frac{\alpha+1}{\alpha+\beta+2}\cdot\frac{\beta+1}{\alpha+\beta+3}\left(1 + \frac{N}{\alpha+\beta+2}\right)$. if the experiment is performed on a small sample, the expectation and variance can be computed and the values of $\alpha$ and $\beta$ estimated from the relations 4. $\beta = (\alpha+1)\,\frac{N-E}{E} - 1$, and 5. $\alpha = \frac{E\left(1-\frac{E}{N}\right) - \frac{V}{N}}{\frac{V}{E} - \left(1-\frac{E}{N}\right)} - 1$. usually $\frac{E}{N}$ will be much smaller than one; in this case we may use the approximations: 4'. $\beta = (\alpha+1)\,\frac{N}{E}$, and 5'. $\alpha = \frac{E}{\frac{V}{E} - 1} - 1$. once $\alpha$ and $\beta$ have been evaluated, we can compute the probabilities $p(n)$ for files of arbitrary size, and with these values we can make assertions regarding the probability of, say, more than 30 items being retrieved. a relation that can be derived from formula 1 and may be of use when comparing this model with experiment is: $\frac{p(n)}{p(n+1)} = \frac{\beta + N - n}{1 + \alpha + n} \cdot \frac{n+1}{N-n}$. the probability of zero retrievals is likely to be an extraordinary point in the distributions $g(p)$ and $p(n)$ since it is influenced by the knowledge that a user may have of the collection; this effect is likely to be encountered in a sampling process in which the requests have to be generated artificially. in such cases it would be advisable to treat $p(0)$ as an empirically derived parameter, $\theta$, and use the modified formula 6. $p'(n) = \theta$ if $n = 0$, and $p'(n) = (1-\theta)\,\frac{p(n)}{1 - p(0)}$ if $n \neq 0$. the value of $\theta$ can be estimated by the fraction of requests retrieving zero items; for sampling techniques using only productive requests, $\theta$ will be zero. $\alpha$ and $\beta$ can be calculated as before from the mean and variance of the sample. conclusion. the above discussion is intended as an attempt to provide some theoretical understanding of the puzzling behavior discovered in the use of search keys and also to provide some guide for those experimenting with samples of such files. we do, however, urge caution for the latter uses. an analysis similar to the above can be useful under several different circumstances, such as: determining the future behavior expected of a search key in a single library as the collection grows; determining the behavior for one library based upon experiments conducted on a different but similar library; and extrapolating from the performance of a search key in a sample of the collection to its performance in the full collection. if one wishes to compare two different libraries, one can note that as far as search key values are concerned, a particular library's collection can be thought of as a random sample of the larger population from which it selects its material, and accordingly the formula for $p(n)$ should be valid. in this case, if two different collections are drawn from the same population, the $g(p)$ refers to this population and the libraries are distinguished by the parameter $N$; when we are considering samples from a single library, then $N$ is the sample size and $g(p)$ refers to the library itself. no theoretical basis exists at present for estimating to what extent the populations being considered depend upon the type of library, if any, so this problem must be dealt with empirically. 
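as a numerical illustration of how formulas 1, 4', and 5' above might be used in practice, here is a minimal python sketch. the sample mean and variance fed to it are made-up numbers, not data from the paper, and the parameter estimates use the small-$E/N$ approximations 4' and 5'.

```python
import math

def estimate_parameters(mean, variance, sample_size):
    # approximations 4' and 5': alpha from the sample mean and variance,
    # then beta from alpha and the sample size
    alpha = mean / (variance / mean - 1.0) - 1.0
    beta = (alpha + 1.0) * sample_size / mean
    return alpha, beta

def retrieval_probabilities(alpha, beta, n_items, max_n):
    # formula 1, evaluated with log-gamma functions so that large files
    # do not overflow; returns p(0), p(1), ..., p(max_n)
    def log_p(n):
        return (math.lgamma(n_items + 1) - math.lgamma(n + 1) - math.lgamma(n_items - n + 1)
                + math.lgamma(alpha + beta + 2) - math.lgamma(alpha + 1) - math.lgamma(beta + 1)
                + math.lgamma(alpha + n + 1) + math.lgamma(n_items - n + beta + 1)
                - math.lgamma(alpha + beta + n_items + 2))
    return [math.exp(log_p(n)) for n in range(max_n + 1)]

# suppose a 10,000-record sample gave a mean of 2.1 and a variance of 4.8
# items retrieved per request (illustrative numbers only)
alpha, beta = estimate_parameters(2.1, 4.8, 10_000)
probs = retrieval_probabilities(alpha, beta, n_items=100_000, max_n=200)
print(round(alpha, 2), round(beta), round(sum(probs[31:]), 3))  # chance of more than 30 items
```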
we have assumed here that these populations are similar with regard to search key values. should these populations in fact vary, it is possible that they can be broken down, e.g., by language, into subpopulations that are stable and for each of which the analysis is valid. acknowledgments. this work was made possible by clr/neh grant no. e0-262-70-4658. i would like to express my gratitude to members of the university of chicago systems development office for their many comments and suggestions on this work. references 1. frederick g. kilgour, philip l. long, eugene b. leiderman, and alan l. landgraf, "title-only entries retrieved by use of truncated search key," journal of library automation 4:207-10 (dec. 1971). 2. a. bookstein, "double hashing," journal of the american society for information science 23:402-25 (nov.-dec. 1972). 3. a. bookstein, "hash coding with a non-unique search key," to be published in the journal of the american society for information science. 4. frederick g. kilgour, philip l. long, eugene b. leiderman, and alan l. landgraf, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys." preprint. 5. kilgour, long, leiderman, and landgraf, "title-only entries," p.209-10. 6. gerry p. guthrie and steven d. slifko, "analysis of search key retrieval on a large bibliographic file," journal of library automation 5:96-100 (june 1972). letter from the editor (september 2019). kenneth j. varnum. information technology and libraries | september 2019. https://doi.org/10.6017/ital.v38i3.11631 editorial board changes. thanks to the dozens of lita members who applied to join the board this spring. the large number of interested volunteers made the selection process challenging. i'm pleased to welcome six new members to the ital editorial board for two-year terms (2019-2021): • lori ayre (independent technology consultant) • jon goddard (north shore public library) • soo-yeon hwang (sam houston state university) • holli kubly (syracuse university) • brady lund (emporia state university) • paul swanson (minitex) in this issue. welcome to lita's new president, emily morton-owens. in her inaugural president's message, "sustaining lita," morton-owens discusses the many ways lita strives to provide a sustainable organization for its members. we also have the next edition of our "public libraries leading the way" column. this quarter's essay is by thomas lamanna, "on educating patrons on privacy and maximizing library resources." joining those essays are six excellent peer-reviewed articles: • "library-authored web content and the need for content strategy," by courtney mcdonald and heidi burkhardt • "use of language-learning apps as a tool for foreign language acquisition by academic libraries employees," by kathia ibacache • "is creative commons a panacea for managing digital humanities intellectual property rights?," by yi ding • "am i on the library website?," by suzanna conrad and christy stevens • "assessing the effectiveness of open access finding tools," by teresa auch schultz, elena azadbakht, jonathan bull, rosalind bucy, and jeremy floyd • "creating and deploying usb port covers at hudson county community college," by lotta sanchez and john delooper call for pllw contributions. if you work at a public library, you're invited to submit a proposal for a column in our "public libraries leading the way" series for 2020. 
our series has gotten off to a strong start with essays by thomas finley, jeffrey davis, and thomas lamanna. if you would like to add your voice, please submit a proposal through this google form. kenneth j. varnum, editor varnum@umich.edu september 2019. book reviews. basic fortran iv programming, by donald h. ford. homewood, illinois: richard d. irwin, inc., 1971. 254 pp. $7.95. fortran texts are now quite plentiful, so the main question in the reviewer's mind is: what does this book have to offer that no other book has? regrettably the answer must be nothing. there are many other good fortran books available. this has very little to distinguish it. that is not to say that it is not a good book. the quality of the book is good, the text is very readable, and there has been very good attention to the examples and proofreading. the book is suitable for an introductory course, or for self-study. it does not go completely into all the features of the language, as these are usually best left to the specific manuals relating to the machines available. the book does bring the student to a level where he will be able to use those manuals and the level where he will need to use those manuals. the book does come to the level necessary for the person who writes his programs with professional assistance. the author has chosen ansi basic fortran iv to be discussed in the book. in particular he relates this to the ibm/360 and 370 computers. this is a common language and is available on most machines with only minor modifications. this was a good choice for the level of book he intended to write, since he didn't want to go into the advanced features of the language. the author goes quickly to the heart of the matter in fortran programming, so that the reader can start using the computer right away. the basic material is well covered and gives a good introduction to the more advanced features which are available on most machines. the examples are well chosen so that they do not require any specialized knowledge; therefore the emphasis can be put on the programming aspects of the examples. he also has very good end-of-chapter problems, ranging in difficulty from straight repetition of text material to programming problems which will require a considerable amount of individual work. he has a good discussion of mixed mode arithmetic, one of the more difficult topics of fortran to explain. he also has a good discussion of input/output operations, and an explanation of formatting which is very good. this again is a difficult area of the language and has been well explained. 
discussing each of the statement types in fortran, he begins by giving the general form of the statement in a standardized way, which is very good for introductory purposes and for review and reference. the index in the book doesn't single these out, so somebody who wanted to use the book as a reference should make a self-index of these particular areas of the book where the general forms and statements are given. this is a good feature of the book. robert f. mathis. films: a marc format; specifications for magnetic tapes containing catalog records for motion pictures, filmstrips, and other pictorial media intended for projection. washington: marc development office, 1970. 65 pp. $0.65. this latest format issued by the marc development office is similar in organization to the previously issued formats, describing in turn the leader, record directory, control fields, and variable fields. three appendices give the variable field tags, indicators, and subfield codes applicable to this format, categories of films, and a sample record in the marc format. in addition to the motion pictures and filmstrips specified in the subtitle, the coverage of this format includes slides, transparencies, video tapes, and electronic video recordings. data elements describing these last two have not been defined completely as the marc development office feels that further investigation is needed in these areas. the bibliographic level for this format is for monograph material, i.e., material complete at time of issue or to be issued in a known number of parts. since most of the material covered by this format is entered under title, main entry fields (100, 110, 111, 130) have not been described. this exclusion also covers the equivalent fields in the 400s and 800s. main entry and other fields not listed in this format but required by a user can be obtained from books: a marc format. this format describes two kinds of data: that generally found on an lc printed card and that needed to describe films in archival collections. only the first category will be distributed in machine readable form on a regular basis. one innovation introduced in this format that can only be applauded by marc users is the adoption of the bnb practice of using the second indicator of title fields (241, 245, 440, 840, but not 740, where the second indicator had previously been assigned a different function) to specify the number of characters at the beginning of the entry which are to be ignored in filing. it is to be hoped that in the future this practice will be applied to books, serials, and other types of works as well as to films. judith hopkins. u.k. marc project, edited by a. e. jeffreys and t. d. wilson. newcastle upon tyne: oriel press, 1970. 116 pp. 25s. this volume, which reports the proceedings of a conference on the u.k. marc project held in march 1969, may be of as much interest in the usa as in britain; although the intake of british libraries is much smaller and the money available for experiments much less, the problems of developing and using marc effectively within these constraints are for this very reason of special interest. a. j. wells opened the conference with a paper introducing u.k. marc and closed it with a paper stating its relationship to the british national bibliography. points of interest are the need for standardisation among libraries (not surprisingly, this theme occurs throughout) and the differences between u.k. 
marc and l.c. marc (the latter being the odd one out, in its departures from aacr 67). disappointingly, no hint is given of additional national bibliographical products that might come from marc, such as cumulated and updated bibliographies on given subjects, or listings of children's books, etc. richard coward, with his usual clarity and conciseness, explains the planning and format of u.k. marc, in which he has been so centrally involved. as he says, "we have the technology to produce a marc service but we really need a higher level of technology to use it at anything like its full potential." r. bayly's paper on "user programs and package deals" is disappointing, dealing only with icl 1900 computers, and not comprehensively or clearly even with them. two papers discuss the problems of actually using marc: e. h. c. driver's "why marc?", which concludes that "the most efficient use of marc will be made by large library systems or groups of libraries," and f. h. ayres' "marc in a special library environment," which concludes that eventually all libraries will use the marc tape. mr. ayres discusses the proposed use of marc at awre aldermaston, and also gives a general (and highly optimistic) blueprint of the sort of way marc could be used in an all-through selection, acquisition, and cataloging system. (the four american experimental uses of marc reviewed by c. d. batty, at toronto, yale, rice, and indiana, are probably well enough known in the usa and canada.) keith davidson's discussion of filing problems is first class, and his paper is just as topical as when it was written, because little progress has been made since then. peter lewis, in "marc and the future in libraries," makes the point that whereas bnb cards provided a ready-made product for libraries, marc tapes will merely offer them a set of parts to put together themselves. of special interest to american audiences may be derek austin's paper, "subject retrieval in the u.k. marc," since the precis system to which it forms an introduction may represent a major breakthrough in machine-manipulable subject indexing. marc and its uses constitute one of the most rapidly developing areas of librarianship. regular conferences of this standard are needed to review progress from time to time. maurice b. line. lessons learned: a primo usability study. kelsey brett, ashley lierman, and cherie turner. information technology and libraries | march 2016. [kelsey brett (krbrett@ua.edu) is discovery systems librarian, ashley lierman (arlierman@uh.edu) is instructional design librarian, and cherie turner (ckturner2@uh.edu) is chemical sciences librarian, university of houston libraries, houston, texas.] abstract. the university of houston libraries implemented primo as the primary search option on the library website in may 2014. in may 2015, the libraries released a redesigned interface to improve user experience with the tool. the libraries took a user-centered approach to redesigning the primo interface, conducting a "think-aloud" usability test to gather user feedback and identify needed improvements. this article describes the method and findings from the usability study, the changes that were made to the primo interface as a result, and implications for discovery-system vendor relations and library instruction. introduction. index-based discovery systems have become commonplace in academic libraries over the past several years, and academic libraries have invested a great deal of time and money into implementing them. frequently, discovery platforms serve as the primary access point to library resources, and in some libraries they have even replaced traditional online public access catalogs. 
because of the prominence of these systems in academic libraries and the important function that they serve, libraries have a vested interest in presenting users with a positive and seamless experience while using a discovery system to find and access library information. libraries commonly conduct user testing on their discovery systems, make local customizations when possible, and sometimes even change products to present the most user-friendly experience possible. university of houston libraries has adopted new discovery technologies as they became available in an effort to provide simplified discovery and access to library resources. as a first step, the libraries implemented innovative interfaces' encore, a federated search tool, in 2007. when index-based discovery systems became available, the libraries saw them as a way to provide an improved and intuitive search experience. in 2010, the libraries implemented serials solutions' summon. after three years and a thorough process of evaluating priorities and investigating alternatives, the libraries made the decision to move to ex libris' primo, which was done in may of 2014. the libraries' intention was to continually assess and customize primo to improve functionality and user experience. the libraries conducted research and performed user testing, and in may 2015 a redesigned primo search results page was released. one of the activities that informed the primo redesign was a "think-aloud" usability test that required users to complete a set of two tasks using primo. this article will present the method and results of the testing as well as the customizations that were made to the discovery system as a result. it will also discuss some broader implications for library discovery and its effect on information literacy instruction. literature review. there is a substantial body of literature discussing usability testing of discovery systems. in the interest of brevity, we will focus solely on studies and overviews involving primo implementations, from which several patterns have emerged. multiple studies have indicated that users' responses to the system are generally positive; even in testing of very early versions by a development partner users responded positively overall.1 interestingly, some studies found that in many cases users rated primo positively in post-testing surveys even when their task completion rate in the testing had been low.2 multiple studies also found evidence that, although users may struggle with primo initially, the system is learnable over time. comeaux found that the time it took users to use facets or locate resources decreased significantly with each task they performed,3 while other studies saw the use of facets per task increase for each user over the course of the testing.4 user reactions to facets and other post-limiting functions in primo were divided. 
in one of the earliest studies, sadeh found that users responded positively to facets,5 and some authors found users came to use them heavily while searching,6 while others found that facets were generally underused.7 multiple studies found that users tended to repeat their searches with slightly different terms rather than use post-limiting options.8 thomsett-scott and reese, in a survey of the literature on discovery tools, reported evidence of a trend that users reacted more positively to post-limiting in earlier discovery studies,9 while the broader literature shows more negative reactions in more recent studies. this could indicate that shifts in the software, user expectations, or both may have decreased users’ interest in these options. a few specific types of usability problems seem common across tests of primo and other discovery systems. across a large number of studies, it has been found that users—especially undergraduate students—struggle to understand library and academic terminology used in discovery. some terminology changes were made after users had difficulty in the earliest usability tests of primo,10 but users continued to struggle with terms like hold and recall in item records.11 users also failed to understand the labels of limiters,12 and they also failed to recognize the internal names of repositories and collections.13 literature reviews on discovery systems have found terminology to be a common stumbling block for searchers across a wide number of individual studies.14 similarly, users often struggle to understand the scope of options available to them when searching and the holdings information in item records. users failed in multiple tests to distinguish between the article level and the journal level,15 could not interpret bibliographic information technology and libraries | march 2016 9 information sufficiently to determine that they had found the desired item,16 and chose incorrect options for scoping their searches.17 many studies found that users were unable to distinguish between multiple editions of a held item when all item types or editions were listed in the record.18 in other cases, users had difficulty interpreting locations and holdings information for physical items.19 among the needs and desires expressed by and for primo users in the literature, two in particular stand out. first, many users expressed a desire for more advanced search options; some wanted more complexity in certain facets and the ability to search within results,20 while other users simply wanted an advanced search option to be available.21 secondly, a large number of studies indicated that instruction on primo or other discovery systems was needed for users to search effectively. in some cases this was the conclusion of the researchers conducting the study,22 while in other cases users themselves either suggested or requested instruction on the system.23 it is also worth noting that it has been questioned whether usability testing as a whole is a sufficient mechanism for evaluating discovery-system functionality. prommann and zhang found that usability testing has focused almost exclusively on the technical functioning of the software and not adequately revealed the ability of discovery systems like primo to successfully complete users’ desired tasks.24 they proposed hierarchical task analysis (hta) as an alternative, to examine users’ most frequent desires and the capacity of discovery systems to meet them. 
prommann and zhang acknowledged, however, that as hta is completed by an expert on the system rather than by an actual user, some of the valuable information derived from usability testing (including terms and functions that users do not understand, however well-designed) is lost in the process; they concluded that a combination of the two systems of testing is ideal to retain the best of both. background at the university of houston libraries, the resource discovery systems department (rds) is responsible for the maintenance and development of primo. however, it is important to rds to gather feedback and foster buy-in from stakeholders in the library before making changes to the system. to that end, rds works with two committees to assess the system and make recommendations for its improvement. the discovery usability group and the discovery advisory group include members from public services, technical services, and systems; each member brings a unique perspective on discovery. the discovery usability group is charged with assessing the discovery system through a variety of methods including usability testing, focus groups, and user interviews. the discovery advisory group reviews results of user testing and makes recommendations for improvement. all changes to the discovery system are reviewed by the groups before they are released for public use. lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 10 in fall 2014, several months after the primo implementation, the discovery usability group conducted a focus group with student workers from the library’s information desk (a dual reference and circulation desk) to solicit feedback about the functionality of primo and suggestions for its improvement. in the meantime, the discovery advisory group was testing primo and evaluating primo sites at peer and aspirational institutions. the groups used the information collected through the focus group and research on primo to make recommendations for improvement. rds has access to a primo development sandbox, and many of the recommended changes were made in the sandbox environment and reviewed by the two groups prior to public release. changes to the search box can be seen in figure 1. rarely used tabs were replaced with a dropdown menu to the right of the search box to allow users to limit to “everything,” “books+,” or “digital library.” to increase visibility, links to “advanced search” and “browse search” were made larger and more spacing was added. live site: development sandbox: figure 1. search box in live site (above) and development sandbox (below) at time of testing changes were also made to create a cleaner and less cluttered search results page (see figure 2). more white space was added, and the links (or tabs) to “view online,” “request,” “details,” etc., were redesigned and renamed for clarity. for example, the “view online” link was renamed to “preview online” because it opens a box within the search results page that displays the item. the groups believed “preview online” more accurately represents what the link does. information technology and libraries | march 2016 11 live site: development sandbox: figure 2. search results in live site (above) and development sandbox (below) at time of testing the facets were also redesigned to look cleaner and larger to attract users’ attention (see figure 3). 
lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 12 live site: development sandbox: figure 3. facets in live site and development sandbox at time of testing both groups were happy with the changes to the primo development sandbox but wanted to test the effect of the changes on user search behavior before updating the live site. the discovery usability group conducted a usability test within the development sandbox. the goal of the test was to find out if users could effectively complete common research tasks using primo. with that goal in mind, the group developed a usability test and conducted it during the spring semester of 2015. information technology and libraries | march 2016 13 methodology the discovery usability group developed a usability test using a “think-aloud” methodology, where users were asked to verbalize their thought process as they completed research tasks through primo. four tasks were designed to mirror tasks that users are likely to complete for class assignments or for general research. to minimize the testing time, each participant completed two tasks, with the facilitators alternating between two sets of tasks from one participant to the next. test 1 task 1: you are trying to find an article that was cited in a paper you read recently. you have the following citation: clapp, e., & edwards, l. (2013). expanding our vision for the arts in education. harvard educational review, 83(1), 5–14. please find this article using onesearch [the public-facing name given to the libraries’ primo implementation]. task 2: you are doing a research project on the effects of video games on early childhood development. find a peer-reviewed article on this topic, using onesearch. test 2 task 1: recently your friend recommended the book the lighthouse by p. d. james. use onesearch to find out if you can check this book out from the library. task 2: you are writing a paper about the drug cartels’ influence on mexico’s relationship with the united states. find a newspaper article on this topic, using onesearch. two facilitators set up a table with a laptop in the front entrance of the library. they alternated between the facilitator and note-taker roles. another group member took on the role of “caller” and recruited library patrons to participate in the study. the caller set up a table visible to those passing by with library-branded t-shirts and umbrellas to incentivize participation. the caller explained what would be expected of the potential participant and went over the informedconsent document. after signing the form, the participant performed two tasks. after the test the participant received a library t-shirt or umbrella, and snacks. the facilitators used morae usability software to record the screen and audio of each test. participants were asked for permission to record their sessions, but could opt out. during the three hour testing period, fifteen library patrons participated in the study, and fourteen sessions were recorded. of the fifteen participants, thirteen were undergraduate students (four freshman, one sophomore, seven juniors, and two seniors), one was a graduate student, and one was a postbaccalaureate student. the majority of the participants were from the sciences, along with two students from the college of business and two from the school of communications. there were no participants from the humanities. 
lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 14 the facilitators took notes on a rubric (see table 1) that simplified the processes of coding and reviewing the recordings. after the usability testing, the facilitators reviewed the notes and recordings, coded them for common themes and breakdowns, and prepared a report of their findings and design recommendations. the facilitators sent the report, along with audio and screen recordings, to the discovery advisory group, who reviewed them along with rds. the discovery advisory group made additional design recommendations, and rds used the information and recommendations to implement additional customizations to the primo development sandbox. preliminary questions ask: what is your affiliation with the university of houston? year? major? ask: how often do you use the library website? for what purpose(s)? task 1 describe the steps the participant took to complete the task s/u ask: how did you feel about this task? what was simple? what was difficult? ask: is there anything that would make completing this task easier? task 2 describe the steps the participant took to complete the task s/u ask: how did you feel about this task? what was simple? what was difficult? ask: is there anything that would make completing this task easier? follow-up question ask: what can we do to improve the overall experience using onesearch? table 1. task completion rubric for test 1 information technology and libraries | march 2016 15 results test 1, task 1 you are trying to find an article that was cited in a paper you read recently. you have the following citation: clapp, e., & edwards, l. (2013). expanding our vision for the arts in education. harvard educational review, 83(1), 5–14. please find this article using onesearch. participant time on task task completion 1 1m 54s y 2 4m 13s y 3 1m 26s y 4 1m 17s y 5 1m 26s y (required assistance) 6 1m 43s y 7 1m 27s y 8 1m 5s y table 2. results for test 1, task 1 all eight participants successfully completed this task, although sophistication and efficiency varied between participants. some searched by the authors’ last names, which was not specific enough to return the item in question. four participants attempted to use advanced search or the drop-down menu to the right of the search box to pre-filter their results. two participants viewed the options in the drop-down menu, which were “everything,” “books+,” and “digital library,” and left it on the default “everything” search. when prompted, the participants explained that they were expecting the drop-down to contain title and/or author limiters. similarly, participants expected an author limiter in the advanced search. the citation format seemed to confuse participants, and they tended to search for the piece of information that was listed first—the authors—rather than the most unique piece of information—the title. if the first search did not return the correct item in the first few results, the participant would modify their search by searching for a different element of the citation or adding another element of the citation to the initial search until the item they were looking for appeared as one of the first few results. 
participant 5 thought they had successfully completed the lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 16 task, but the facilitator had to point out that the item they chose did not meet the citation exactly, and on the second try they found the correct item. participant 2 worked on the task for more than four minutes, significantly longer than the other seven participants. they immediately navigated to advanced search and filled out several fields in the advanced search form with the elements of the citation. if the search did not return their item, they added more elements until they finally found it. simply searching the title in the citation would have returned the item as the first search result. filling out the advanced search form with all of the information from the citation does not necessarily increase a user’s chances of finding the item in a discovery system, though it might do so when searching in an online catalog or subject database. the discovery advisory and usability groups made two recommendations to address some of the identified issues: include an author search option in the advanced search, and add an “articles+” option to the drop-down menu on the basic search. rds implemented both recommendations. the discovery usability group identified confusion around citations as a common breakdown during this task. the groups recommended providing instructional information about searching for known items to address this breakdown; however, rds is still working on an effective method to provide this information in a simple and visible way. test 1, task 2 you are doing a research project on the effects of video games on early childhood development. find a peer-reviewed article on this topic, using onesearch. participant time on task task completion 1 3m 44s y 2 2m 21s y 3 5m 23s y (required assistance) 4 2m 5s y 5 3m 32s y 6 2m 45s y 7 3m 8s y 8 3m 1s y (required assistance) table 3. results for test 1, task 2 all eight participants successfully found an article on this topic, but were less successful in determining whether the article was peer-reviewed. only one participant used the “peer-reviewed information technology and libraries | march 2016 17 journals” facet without being prompted. three users noticed the “[peer-reviewed journal]” note in the record information for search results, and used it to determine if the article was peer-reviewed. one participant went to the full-text of an article, and said it “seemed” like it was peer-reviewed and considered the task complete. the resource type facets were more heavily used during this task than the “peer-reviewed journals” facet, despite its being promoted to the top of the list of facets. two participants used the “articles” facet, and two participants used the “reviews” facet, thinking it limited to peer-reviewed articles. participants 3 and 8 needed help from the facilitator to determine whether a source was peer-reviewed. there was an overall misunderstanding of what peer-reviewed means, which affected participants’ confidence in completing the task. the design recommendations based on this task included changing the “peer-reviewed journals” facet to “peer-reviewed articles” or simply, “peer-reviewed.” rds changed the facet to “peerreviewed articles” to help alleviate confusion. 
additionally, the groups recommended emphasizing the “[peer-reviewed journal]” designations within the search results and providing a method for limiting to peer-reviewed materials before conducting a search. customization limitations of the system have prevented rds from implementing these design recommendations yet. a way to address the breakdowns caused by misunderstanding terminology also has yet to be identified. it was disheartening that participants did not use the “peer-reviewed journals” facet despite its being purposefully emphasized on the search results page. test 2, task 1 recently your friend recommended the book the lighthouse by p. d. james. use onesearch to find out if you can check this book out from the library. participant time on task task completion 1 1m 7s y 2 56s y 3 no recording y 4 2m 21s y 5 1m 8s y 6 2m 14s y 7 1m 15s y table 4. results for test 2, task 1 all seven participants were able to find this book using primo, but had difficulty in determining what to do once they found it. for this task every participant searched by title and found the book as the first search result. four users limited to “books+” before searching using the drop-down lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 18 menu, while the other three remained in the default “everything” search. only one participant used the locations tab within the search results to determine availability; the others clicked the title and went to the item’s catalog record. all participants were able to determine that the book was available in the library, but there was an overall lack of understanding about how to use the information in the catalog to check out a book. participant 1 said that they would write down the call number, take it to the information desk, and ask how to find it, which was the most sophisticated response of all seven participants. participant 4 spent nearly two minutes clicking through links in the opac expecting to find a “check out” button and only stopped when the facilitator stepped in. a recommended design change based on this task was to have call numbers in primo and the online catalog link to a stacks guide or map. this is a feature that may be developed in the future, but technical limitations prevented rds from implementing it in time for the release of the redesigned search interface. like the previous tasks, some of the breakdowns occurred because of a lack of understanding of library services. users easily figured out that there was a copy of the book in the library, but had little sense of what to do next. none of the participants successfully located the stacks guide or the request feature that would put the item on hold for them. steps should be taken to direct users to these features more effectively. test 2, task 2 you are writing a paper about the drug cartels’ influence on mexico’s relationship with the united states. find a newspaper article on this topic, using onesearch. participant time on task task completion 1 4m 45s y (required assistance) 2 59s y 3 no recording n 4 7m 47s y 5 2m 52s y 6 1m 33s y 7 1m 30s y table 5. results for test 2, task 2 this task was difficult for participants. two users limited their search initially to “digital library” using the drop-down menu, thinking it would be a place to find newspaper articles; their searches returned zero results. 
only two users used the “newspaper articles” facet without being prompted, and users did not seem to readily distinguish newspaper articles as a resource type. participants information technology and libraries | march 2016 19 did not notice the resource type icons without being prompted. several participants needed to be reminded that the task was to find a newspaper article, and not any other type of article. with guidance, most participants were able to complete the task. participant 4 remained on the task for almost eight minutes because of their dissatisfaction with the relevancy of the results to the prompt. interestingly, they found the “newspaper articles” facet and reapplied it after each modified search, suggesting that they learned to use system features as they went. one of the recommendations based on this task was to remove “digital library” as an option in the drop-down menu on the basic search. it was evident that “digital library” did not have the same meaning to end users as it does to internal users. this recommendation was easily implemented. another recommendation was to emphasize the resource type icons within the search results, but we have not determined a way to do so effectively. one suggestion from the discovery usability group was to exclude newspaper articles from the search results as a default, but no consensus was reached on this issue. limitations the discovery usability group identified limitations to the usability test that should be noted. testing was done in a high-traffic portion of the library’s lobby, which is used as study space by a broad range of students. participants were recruited from this study space, and we chose not to screen participants. the fifteen participants in the study did not constitute a representative sample. almost all participants were undergraduate students, and no humanities majors participated. the outcomes might have been different if our participants had included more experienced researchers or students from a broader range of disciplines. by adding screening questions or choosing a more neutral location, we would have limited the number of participants who could complete our testing. another limitation was that the participants started the usability test within the primo interface. because primo is integrated into the libraries’ website, users would typically begin searching the system from within the library homepage. the goals of the study required testing of our primo development sandbox, which was not yet available to the public, and therefore could not be accessed in the same way. this gave participants some additional options from the initial search pages that are not usually available through the main search interface. while testing an active version of the interface would be preferable, one of our goals was to understand how our modifications affected user behavior, so testing the unmodified version was not an acceptable substitute. additionally, the usability study presented tasks out of context and did not replicate a true user-searching experience. despite the limitations, we learned valuable lessons from the participants in this study. discussion users successfully completed the tasks in this usability study. unfortunately, they did not take advantage of many of the features that can make such tasks easier—particularly facets. 
this was lessons learned: a primo usability study | brett, lierman, and turner doi: 10.6017/ital.v35i1.8965 20 especially apparent when we asked users to find a peer-reviewed journal article (test 1, task 2). primo has a facet that will limit a search to only peer-reviewed journal articles, and only one out of eight participants used this facet during this task. participants appreciated the pre-search filtering options, and requested more of them (such as an author search), while post-search facets were underutilized. similarly, participants almost uniformly ignored the links, or tabs, within the search results, which would provide users with more information, a preview of the full-text, and additional features such as an email function. users bypassed these options and clicked on the title instead. the discovery usability group theorized that users clicked on the title of the item because that behavior would be successful in a more familiar search interface like google. the team customized the configuration so that a title click would open either the full-text of electronic items or the catalog record for physical items to accommodate users’ instinctive search behaviors. the tabs, though a prominent feature of the discovery system, have proved to have little value for users. throughout the implementation of discovery systems in academic libraries, both research studies and anecdotal evidence have suggested that users do not find end-user features like facets valuable; however, discovery system vendors have made no apparent attempt to reimagine the possibilities for search refinements. indeed, most of the findings in this study will present few surprises to anyone familiar with the discovery usability literature, which is itself concerning. as our literature review has shown, many of the same general usability issues have repeated throughout studies of primo since 2008, and most are very similar to usability issues in other, competitor discovery systems. this raises some concerns about the pace of innovation in the discovery field, and whether discovery vendors are genuinely taking into account the research findings about the needs of our users as they refine their products. in a recent article, david nelson and linda turney identified many issues with discovery facets in their current form that may be barriers to usage, particularly labeling and library jargon; we join them in urging vendors and libraries to collaborate more closely for deep analysis of actual facet usage by users, and to address those factors that have negatively affected facets’ value.25 during our usability study, a common barrier to the successful completion of a task was not the technology itself but a lack of understanding of the task. participants had difficulty deciphering a citation, which may have led to their tendency to search for a journal article by author and not by title. many participants struggled with using call numbers, and how to find and check out books in the library. peer review also proved to be a difficult or unfamiliar concept for many; when looking for peer-reviewed articles, some participants clicked on the “reviews” facet, which limited their searches to an inappropriate resource type. additionally, participants did not differentiate between journal articles and newspaper articles, which may indicate a broader inability to differentiate between scholarly and nonscholarly resources. 
this effect may be exaggerated by the high percentage of science students who participated, as these students may not have frequent need for newspaper articles. all of these challenges, however, are indicative of a deeper problem with terminology. regardless of how simple it is to limit a search to peer-reviewed articles, a user who does not understand what peer review means cannot complete the task with confidence or certainty. librarians struggle with presenting understandable language and avoiding library terminology; as we discovered, academic language, like “peer-reviewed” and “citation,” presents a similar problem. these are not issues that can be resolved with a technological solution. rather, we join previous authors in suggesting that instruction may be a reasonable way to address many usability issues in primo. from our findings and from those in the wider literature, we conclude that general instruction in information literacy is a prerequisite for effective use of this or any research tool, particularly for undergraduates. nichols et al. “recommend studying how to effectively provide instruction on primo searching and results interpretation,”26 but instruction on the use of a single tool is of limited utility to students in their academic lives. instead, libraries could bolster information literacy instruction on key concepts around the production and storage of information, scholarly communications, and differences in information types. teaching these concepts effectively should help to alleviate the most common user issues, including understanding terminology and different types of information, as well as helping students to understand key elements of research in general. this is a particularly important point to note for librarians working as advocates for information literacy instruction, especially in cases where administrators or faculty may feel that more advanced tools, like discovery systems, should make instruction obsolete.

conclusion

several changes were made to the primo interface in response to breakdowns identified during the usability study. resource discovery systems (rds) first implemented the changes in the primo development sandbox. after the discovery usability and advisory groups agreed on the changes, they were made available on the live site (see figures 4–6). the redesigned search results page became available to the general public between the spring and summer academic sessions of 2015. in addition to the changes that were made because of the usability study, rds made changes to the look and feel to make the search results interface more aesthetically pleasing and more in line with the university of houston brand.

figure 4. primo interface before usability testing (live site)

figure 5. primo interface during usability testing (development sandbox)

figure 6. primo interface after usability testing (live site)

many of the larger questions raised by this study, encompassing implications for instruction and our needs from discovery vendors, will require further study to address. the authors intend to continue to investigate these issues as additional usability testing is conducted and to use the data to support future vendor relations and instructional curriculum development discussions.

references
1. tamar sadeh, “user experience in the library: a case study,” new library world 109, no. 1/2 (2008): 7–24, doi:10.1108/03074800810845976.
2. aaron nichols et al., “kicking the tires: a usability study of the primo discovery tool,” journal of web librarianship 8, no. 2 (2014): 172–95, doi:10.1080/19322909.2014.903133; scott hanrath and miloche kottman, “use and usability of a discovery tool in an academic library,” journal of web librarianship 9, no. 1 (2015): 1–21, doi:10.1080/19322909.2014.983259.
3. david j. comeaux, “usability testing of a web-scale discovery system at an academic library,” college & undergraduate libraries 19, no. 2–4 (2012): 189–206, doi:10.1080/10691316.2012.695671.
4. kylie jarrett, “findit@flinders: user experiences of the primo discovery search solution,” australian academic & research libraries 43, no. 4 (2012): 278–300; nichols et al., “kicking the tires.”
5. sadeh, “user experience in the library.”
6. jarrett, “findit@flinders”; nichols et al., “kicking the tires.”
7. xi niu, tao zhang, and hsin-liang chen, “study of user search activities with two discovery tools at an academic library,” libraries faculty and staff scholarship and research 30, no. 5 (2014), doi:10.1080/10447318.2013.873281; hanrath and kottman, “use and usability of a discovery tool in an academic library.”
8. rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends 61, no. 1 (2012): 186–207, doi:10.1353/lib.2012.0029; niu, zhang, and chen, “study of user search activities with two discovery tools at an academic library.”
9. beth thomsett-scott and patricia e. reese, “academic libraries and discovery tools: a survey of the literature,” college & undergraduate libraries 19, no. 2–4 (2012): 123–43, doi:10.1080/10691316.2012.697009.
10. sadeh, “user experience in the library.”
11. comeaux, “usability testing of a web-scale discovery system at an academic library.”
12. jessica mahoney and susan leach-murray, “implementation of a discovery layer: the franklin college experience,” college & undergraduate libraries 19, no. 2–4 (2012): 327–43, doi:10.1080/10691316.2012.693435.
13. joy marie perrin et al., “usability testing for greater impact: a primo case study,” information technology & libraries 33, no. 4 (2014): 57–67.
14. majors, “comparative user experiences of next-generation catalogue interfaces”; thomsett-scott and reese, “academic libraries and discovery tools.”
15. jarrett, “findit@flinders”; mahoney and leach-murray, “implementation of a discovery layer.”
16. jarrett, “findit@flinders”; mahoney and leach-murray, “implementation of a discovery layer”; nichols et al., “kicking the tires.”
17. jarrett, “findit@flinders”; mahoney and leach-murray, “implementation of a discovery layer”; perrin et al., “usability testing for greater impact: a primo case study.”
18. jarrett, “findit@flinders”; nichols et al., “kicking the tires”; hanrath and kottman, “use and usability of a discovery tool in an academic library”; majors, “comparative user experiences of next-generation catalogue interfaces.”
19. comeaux, “usability testing of a web-scale discovery system at an academic library”; thomsett-scott and reese, “academic libraries and discovery tools.”
20. jarrett, “findit@flinders.”
21. mahoney and leach-murray, “implementation of a discovery layer”; perrin et al., “usability testing for greater impact.”
22. mahoney and leach-murray, “implementation of a discovery layer”; nichols et al., “kicking the tires”; niu, zhang, and chen, “study of user search activities with two discovery tools at an academic library.”
23. thomsett-scott and reese, “academic libraries and discovery tools.”
24. tao zhang and merlen prommann, “applying hierarchical task analysis method to discovery layer evaluation,” information technology & libraries 34, no. 1 (2015): 77–105, doi:10.6017/ital.v34i1.5600.
25. david nelson and linda turney, “what’s in a word? rethinking facet headings in a discovery service,” information technology & libraries 34, no. 2 (2015): 76–91, doi:10.6017/ital.v34i2.5629.
26. nichols et al., “kicking the tires,” 184.

policies governing use of computing technology in academic libraries

jason vaughan

jason vaughan (jvaughan@ccmail.nevada.edu) is head of the library systems department at the university of nevada, las vegas.

the networked computing environment is a vital resource for academic libraries. ever-increasing use dictates the prudence of having a comprehensive computer-use policy in force. universities often have an overarching policy or policies governing the general use of computing technology that helps to safeguard the university equipment, software, and network against inappropriate use. libraries often benefit from having an adjunct policy that works to emphasize the existence and important points of higher-level policies, while also providing a local context for systems and policies pertinent to the library in particular. having computer-use policies at the university and library level helps provide a comprehensive, encompassing guide for the effective and appropriate use of this vital resource.

for clients of academic libraries, the computing environment and access to online information is an essential part of everyday service, every bit as vital as having a printed collection on the shelf. the computing environment has grown in positive ways: higher-caliber hardware and software, evolving methods of communication, and large quantities of accurate online information content. it has also grown in many negative ways: the propagation of worms and viruses, other methods of hacking and disruption, and inaccurate informational content. as the computing environment has grown, it has become essential to have adequate and regularly reviewed policies governing its use. often, if not always, overarching policies exist at a broad institutional or even larger systemwide level. such policies can govern the use of all university equipment, software, and network access within the library and elsewhere on campus, such as campus computer labs. a single policy may encompass every easily conceivable computing-related topic, or there may be several individual policies.
apart from any document drafted and enforced at the university level, various public laws exist that also govern appropriate computer-use behavior, whether in academia or on the beach. many institutions have separate policies governing employee use of computer resources; this paper focuses on student use of computing technologies. in some cases, the library and the additional campus student-computer infrastructure (for example, campus labs and dormitory computer access) are governed by the same organizational entity, so the higher-level policy and the library policy are de facto the same. in many instances, libraries have enacted additional computer-use policies. such policies may emphasize or augment certain points found in the institution-level policy(s), address concerns specific to the library environment, or both. this paper surveys the scope of what are most commonly referred to as “computer-use policies,” specifically, those geared toward the student-client population. common elements found in university-level policies (and often later emphasized in the library policy) are identified. a discussion on additional topics generally more specific to the library environment, and often found in library computer-use policies, follows. the final section takes a look at the computer-use environment at the university of nevada, las vegas (unlv), the various policies in force, and identifies where certain elements are spelled out: at the university level, the library level, or both.

policy basics

purpose and scope

policies can serve several purposes. a policy is defined as “a plan or course of action ... intended to influence and determine decisions, actions, and other matters. a course of action, guiding principle, or procedure considered expedient, prudent, or advantageous.”1 any sound university has a comprehensive computer-use policy readily available and visible to all members of the university community: faculty, staff, students, and visitors. some institutions have drafted a universal policy that seeks to cover all the pertinent bases pertaining to the use of computing technology. in some cases, these broad overarching policies have descriptive content as well as references to other related or subsidiary policies. in this way, they provide content and serve as an index to other policies. in other cases, no illusions are made about having a single, general, overarching policy; the university has multiple policies instead. policies can define what is permitted (use of computers for academic research) or not permitted (use of computers for nonacademic purposes, such as commercial or political interests). a policy is meant to guide behavior and the use of resources as they are meant to be used. in addition, policies can delve into procedure. for example, most policies contain a section on how to report suspected abuse and how suspected abuse is investigated, and outline potential penalties. policies buried in legalese may serve some purpose, but they may not do a good job of educating users on what is acceptable and not acceptable. perhaps the best approach is an appropriate balance between legalese and language most users will understand.
in addition, policies can also serve to help educate individuals on important topics, rather than merely stating what is allowed and what will get one in trouble. for example, a general policy statement might read, “you must keep your password confidential.” taken a step further, the policy could include recommendations pertaining to passwords, such as the minimum password length, inclusion of nonalphabetic characters, the recommendation to change the password regularly, and the mandate to never write down the password (see the brief sketch below).

characteristics of a policy: visibility, prominence, easily identifiable

a policy is most useful when it is highly visible and clearly identified as a policy that has been approved by some authoritative individual or body. students often sign a form or agree online to terms and conditions when their university accounts are established. web pages may have a disclaimer stating something to the effect of “use of (institution’s) resources is governed by ...” and provide a hyperlink to the various policies in place. or, a simple policies link may appear in the footer of every web page at the institutional site. some universities have gone a bit further. at the university of virginia, for example, students must complete an online quiz after reviewing the computer-use guidelines.2 in addition, they can choose to view the optional video. such components serve to enhance awareness of the various policies in place. a review of the library literature failed to uncover any articles focusing on computer-use policies in academic libraries. the author then selected several similar-sized (but not necessarily peer) institutions to unlv (doctoral-granting universities with a student population between twenty thousand and thirty thousand) and thoroughly examined their library web sites to see what, if any, policy components were explicitly highlighted. it quickly became evident that many libraries do not have a centrally visible, specifically titled, inclusive computer-use policy document. most, but not all, of the library web sites provided a link to the institutional-level computer-use policy. in some cases, library policies were not consolidated under a central page titled “policies and procedures” or “guidelines,” and, where they did appear, the context did not imply or state authoritatively that this was an official policy. there was no statement of who drafted the policy (which can lend some level of authority or credence), as well as no indicated creation or revision date. granted, many libraries have paper forms one must sign to obtain a library card, or they may state the rules in hardcopy posted within prominent computer-dense locations. still, with so much emphasis given to licensed database and internet resources, and with such heavy use of the computing environment, such policies should appear online in a prominent location. where better to provide a computer-use policy than online? perhaps all the libraries reviewed did have policies posted somewhere online. if the author could not easily find them, chances are a student would have difficulties as well. in sum, the location of the policy information and how it is labeled can make a tremendous difference.

revisions

policies should be reviewed on a regular basis. often, the initial policy likely goes through university counsel, the president’s administrative circles, and, perhaps, a board of regents or the equivalent. revisions may go through such avenues, or may be more streamlined.
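the password recommendations mentioned above under purpose and scope can be made concrete with a small sketch. the following python illustration is hypothetical: the particular thresholds (an eight-character minimum, at least one nonalphabetic character) are assumptions chosen for the example, not any institution's actual requirements, and rules such as changing passwords regularly or never writing them down are procedural and appear only in comments.

# hypothetical illustration of checkable password-policy rules; the
# minimum length and character requirement below are example values,
# not any university's actual policy. rules like "change the password
# regularly" and "never write down the password" are procedural and
# cannot be enforced by a check such as this one.

def password_policy_violations(candidate: str, minimum_length: int = 8) -> list[str]:
    """return a list of violated rules; an empty list means the
    candidate password satisfies these example checks."""
    violations = []
    if len(candidate) < minimum_length:
        violations.append(f"must be at least {minimum_length} characters long")
    if all(ch.isalpha() for ch in candidate):
        violations.append("must include at least one nonalphabetic character")
    return violations

if __name__ == "__main__":
    for attempt in ("library", "Libr@ry2004"):
        print(attempt, "->", password_policy_violations(attempt) or "acceptable")

expressing the rules this way also makes it easy for a policy document and an account-creation form to stay in agreement, since both can point to the same short list of checks.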
a frequent review of policies is mandated by evolving information technology. for example, cell phones with built-in cameras or internet-browsing capabilities, nonexistent a few years ago, are now becoming mainstream. with such an inconspicuous device, activities such as taking pictures of an exam or finding simple answers online are now possible. similarly, regularly installed critical updates are a central concept within windows’ latest version of operating-system software. such functionality failed to attract much attention until the increase in security exploits and associated media coverage. some policies, recently updated, now make mention of the need to keep operating systems patched.

why have a library policy?

while some libraries link to higher-level institutional policies and perhaps have a few rules stated on various scattered library web pages, other libraries have quite comprehensive policies that serve as an adjunct to (and certainly comply with) higher-institutional policies. there are several reasons to have a library policy. first, it adds visibility to whatever higher-level policy may be in place. a central feature of a library policy is that it often provides links (and thus, additional visibility) to other higher-level policies. a computer-use policy can never appear in too many places. (some libraries have the link in the footer of every web page.) a computer-use policy can be thought of as a speed limit sign. presumably, everyone knows that unless otherwise posted, the speed limit inside the city is thirty-five miles per hour, and outside it is fifty-five miles per hour. nevertheless, numerous speed-limit signs are in place to remind drivers of this. higher-level institutional policies often take a broad stroke, in that they pertain to and address computing technology in general, without addressing specific systems in detail. a second reason to have a local-library policy is to reflect rules governing local-library resources that are housed and managed by the library. such systems often include virtual reference, electronic reserves, laptop-checkout privileges, and the mass of electronic databases and full-text resources purchased and managed by libraries. such library-based systems do not necessarily make the radar of higher-level policies, yet have important considerations, such as copyright issues in the electronic age or privacy as it relates to e-mail and chat reference. in addition, libraries often have two large user groups that other campus entities do not have: university affiliates (faculty, staff, students) and nonuniversity affiliates (community users). while broader university policies generally apply to all users of computing technology, local-library policies can work to address all users of the library pcs, and make distinctions as to when, where, and what each group can use.

common computer-use policy elements

the following section outlines broad topics that are usually addressed within high-level, institutional policies. often, some or many of these same elements are later reemphasized or adapted by libraries, focusing on the library environment. in many cases, the policy is presented in a manner somewhat like breaking the seal on a new piece of software packaging.
essentially, if someone is using the university equipment or network, that person agrees to abide by all policies governing such use. an overarching policy frequently may end with a bulleted summary of the important points in the document. an important first part of the policy is a clear indication of who the policy applies to. this may be as broad as “anyone who sits down in front of university equipment or connects to the network,” or as specific as spelling out individual user groups (undergraduates, graduates, alumni, k–12 students). appendix a summarizes elements found in the various end-user computer policies in force at unlv and the unlv university libraries.

network and workstation security

network security is a universal topic addressed in computer-use policies. under this general aegis one often finds prohibitions against various forms of hacking, as well as recommendations for steps individual users should take to help better secure the overall network. there are also such policies as the prohibition of food and drink near computer workstations or on the furniture housing computer workstations. typical components related to network and workstation security include:
1. disruption of other computer systems or networks; deliberately altering or reconfiguring system files; use of ftp servers, peer-to-peer file sharing, or operation of other bandwidth-intensive services
2. creation of a virus; propagation of a virus
3. attempts at unauthorized access; theft of account ids or passwords
4. password information: individual users need to maintain a strong, confidential password
5. intentionally viewing, copying, modifying, or deleting other users’ files
6. a requirement to secure restrictions to files stored on university servers
7. recommendation or requirement to back up files
8. statement of ownership regarding equipment and software: the university, not the student, owns the equipment, network, and software
9. intentional physical damage: tampering with, marking, or reconfiguring equipment or infrastructure, such as unplugging network cables
10. food and drink policies

personal hardware and software

many universities allow students to attach their own laptops to the campus wired or wireless network(s). in addition to network connections, a growing number of consumer devices such as floppy disks, zip disks, and rewritable cd/dvd media have the potential to connect to university computers for the purpose of data transfer. today, the list has grown to include portable flash drives, digital cameras and camcorders, and mp3 players, among others. the attaching of personal equipment to university hardware may or may not be allowed. similarly, users may often try to install software on university-owned equipment. typical examples may include a game brought from home or any of the myriad pieces of software easily downloaded from the internet. some of the policy elements dealing with the use of personal hardware and software include:
1. connecting personal laptops to the university wired or wireless network(s)
2. use of current and up-to-date patched operating systems and antivirus programs running on personal equipment attached to the network
3. connecting, inserting, or interfacing such personal hardware as floppy disks, cds, flash drives, and digital cameras with university-owned hardware; liability regarding physical damage or data loss
4. limiting access to and mandating immediate reporting of stolen personal equipment (to deactivate registered mac addresses, for example)
5. downloading or installing personal or otherwise additional software onto university equipment
6. use of personal technology (cell phones, pdas) in classroom or test-taking environments

e-mail

e-mail privileges figure prominently in computer-use policies. some topics deal with security and network performance (sending a virus), while many deal with inappropriate use (making threats or sending obscene e-mails). other topics deal with both (such as sending spam, which is unsolicited, annoying, and consumes a lot of bandwidth). among the activities covered are prohibitions or statements regarding:
1. hiding identity, forging an e-mail address
2. initiating spam
3. subscribing others to mailing lists
4. disseminating obscene material or weblinks to such material
5. general guidelines on e-mail privileges, such as the size of an e-mail account, how long an account can be used after graduation, and e-mail retention
6. basic education regarding e-mail etiquette

printing

with the explosion of full-text resources, libraries and other student-computing facilities have experienced a tremendous growth in the volume of pages printed on library printers. at unlv libraries, for example, the printing volume for july 2002 to june 2003 was just shy of two million pages; the following year that had jumped to almost 2.4 million pages. various policies helping to govern printing may exist, such as honor-system guidelines (“don’t print more than ten pages per day”). some institutions or libraries have implemented cost-recovery systems, where students pay fixed amounts per black-and-white and color page printed through networked printers. standard policies regarding printer use cover:
1. mass printing of flyers or newsletters
2. tampering with or trying to load anything into paper trays (such as trying to load transparencies in a laser printer)
3. per-sheet print costs (color and black-and-white; by paper size)
4. refund policies
5. additional commonsense guidelines, such as “use print preview in browser”

personal web sites

many universities allow students to create personal web sites, hosted and served from university-owned equipment. customary policy items focusing on this privilege include:
1. general account guidelines: space limitations, backups, secure ftp requirements
2. use of school logo on personal web pages
3. statement of content responsibility or institutional disclaimer information
4. requirement to provide personal contact information
5. posting or hosting of obscene, questionable, or inappropriate content

intellectual property, copyright, or trademark

abuse of copyright, clearly a violation of federal law, is something that libraries and universities were concerned about long before computers hit the mainstream. widespread computing has introduced new avenues to potentially break copyright laws, such as peer-to-peer file sharing and dvd-movie duplication, to mention only two. a computer-use policy covering copyright will generally include:
1. general discussion of copyright and trademark law; links to comprehensive information on these topics
2. concept of educational “fair use”
3. copying or modification of licensed software, use of software as intended, use of unlicensed software
4. specific rules pertaining to electronic theses and dissertations
5. specific mention of the illegality of downloading copyrighted music and video files

appropriate- and priority-use guidelines

appropriate use is often covered in association with topics such as network security or intellectual property. however, appropriate- and priority-use rules can be an entire policy and would include:
1. mention of federal, state, and local laws
2. use of resources for theft or plagiarism
3. abuse, harassment, or making threats to others (via e-mail, instant messaging, or web page)
4. viewing material that may offend or harass others
5. legitimate versus prohibited use; use for nonacademic purposes such as commercial, advertising, political purposes, or games
6. academic freedom, internet filtering

privacy, data security, and monitoring

privacy and data security are tremendous issues within the computing environment. networking protocols and components of many software programs and operating systems by default keep track of many activities (browser history files and cache, dynamic host configuration protocol logs, and network account login logs, to mention a few). additional specialized tools can track specific sessions and provide additional information. just as credit-card companies, banks, and hospitals provide a privacy policy to their clients, so do many academic computer-use policies. such statements often address what logs are kept, how they are maintained, how they may be used, and who has access. in addition to the legitimate use of maintaining information, there is the general concept of questionable or outright malicious collection of information, through cookies, spybots, or browser hijacks. the following are concepts often addressed under the general heading of privacy:
1. cookies, spybots, other malicious software
2. what information is collected for evaluative, system management, and/or statistical purposes; use of cookies for this; how such information is used and reported
3. statement on routine monitoring or inspection of accounts or use; reasons information may be accessed (routine system maintenance, official university business, investigation of abuse, irregular usage patterns)
4. security of information stored on or transmitted by various campus resources
5. statement on general lack of security of public, multiuser workstations (browser cache, search history, recent documents)
6. disposition of information under certain circumstances (for example, if a student dies while enrolled, any personal university e-mail and stored files can be turned over to the executor of the will or parents)

abuse violations, investigations, and penalties

as policies generally are a statement of what is or is not permitted, or what is considered abuse, a clearly defined mechanism for reporting suspected abuse and policy violations can often be found. obviously, some abuse issues violate not only university policy, but also local, state, or federal law. investigations of suspected abuse are by their nature tied into the privacy and monitoring category. policy items detailing suspected abuse usually include:
1. how one can report suspected abuse
2. how requests for content, logging, or other account information are handled; how and by what entities abuse investigations are handled
3. potential penalties
4. how to appeal potential penalties; rights and responsibilities one may have in such a situation

other computer- or network-based services affecting the broad student population

universities operate any number of other computer- or network-based services for the broad academic community. such services may include provisioning of isp accounts, courseware, online registration, and digital institutional repositories. depending on the broad nature of these services, policy information particular to such systems can be specified at the broad policy level, especially if they have unique avenues of potential exploitation or abuse not covered in the general topics included elsewhere in the policy.

additional library-specific computer-use policy elements

many libraries elect to have their own, additional computer-use policies that serve as an adjunct to the larger university-level policy that generally governs the use of all computing resources on campus. libraries that have a formalized library computer-use policy often start with a statement of other policies governing the use of the library equipment and network: references to the university policies in place. the library policy may choose to include or paraphrase parts of the university policy deemed especially important or otherwise applicable to the specific library environment. important concepts governing university policies apply equally to library policies: purpose and comprehensiveness, visibility, and frequent review. libraries that have formalized computer-use policies often link them under common library web-site sections such as “information about the libraries” or “about the libraries.” library policies can help address items unique, special, or otherwise worthy of elaboration, such as specific systems in place or situations that may arise. they can also help provide guidelines and strategies to aid staff in policy enforcement. as an example of a library computer-use policy, appendix b provides the main unlv libraries computer-use policy.

public versus student use: allowances and priority use

many of the other entities on a university campus do not daily deal with the community at large (the non-university affiliates) as do academic libraries. this applies to most if not all public institutions, as well as many private institutions. the degree to which academic libraries embrace community users varies widely; often, a statement on which user groups are the primary clients is stated in a policy. such policy statements may discuss who may use what computers, what software components they have access to, and when access is allowed. in some cases, levels of access for students and the community are basically the same. community users may be allowed to use all software installed on the pc. more often, separate pcs with smaller software sets have been configured for community users or for specific access to government documents. in some cases, libraries allow some or all pcs to be used by anyone, student or nonstudent, but have technically configured the pc or network to prevent the community at large from using the full software set (such as common productivity suites). however, community users may be limited from using the productivity software (such as microsoft word) found on these pcs.
they may be restricted from using pcs on upper floors, or those reserved for special purposes, such as high-end graphics-development workstations. in addition, during crunch time (midterms and final exams), community users are often restricted to the few pcs set up and configured to allow access only to the library web page (not the web at large) and the online catalog. in addition, only students and staff can plug in their personal laptops to the library and campus network. regardless of whether it is crunch time, nonstudent users can be asked to leave if all pcs are in use and students are waiting. an in-house-authored program identifies accounts and whether particular users are students or nonstudents. more and more government information is available online. for libraries serving as government document repositories, all users have the right to freely access information distributed by the government. in 2005, the unlv libraries will begin limiting full web access to community users; they will only be permitted access to a limited set of web-based resources, such as government document web sites and library-licensed databases. on another note, many libraries have special adaptive workstations with additional software and hardware to facilitate access to library resources by disabled citizens. disabled individuals, enrolled at the university or not, are allowed to use these adaptive workstations.

laptop checkout privileges

many libraries today check out laptops for student use. at unlv libraries, faculty, staff, and students may check out lcd projectors and library-owned laptops and plug them into the network at any of the hundreds of available locations within the main library. more details on these privileges can be found in the article “bringing them in and checking them out: laptop use in the modern academic library.”3 as the university does not otherwise check out laptops to users or allow students to plug in their own laptops to the wired university network, the libraries had to come up with these additional specific policies.

licensed electronic resources: terms and conditions

academic libraries are generally the gatekeepers to many citation and full-text databases and electronic journals. each of the myriad subscription vendors has terms of use, violations of which can carry harsh penalties. for example, the unlv libraries had an incident where a vendor temporarily cut off access to its resource due to potential abuse detected from a single student. in this case, the user was downloading multiple pdf full-text files in an automated manner. this illustrates the need to have some statement in a library policy outlining the existence of such additional terms of use. vendors generally place a link at the top page of each of their resources related to this. for greater visibility, libraries should at least point out the existence of such terms of use for better exposure and potential compliance. in addition, some electronic resources have licensing agreements that simply do not permit community-user access. in these cases, library policy can simply state that some licensed resources may be accessed only by university affiliates.
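the affiliation-based distinctions described in this section (priority use for students, software and web restrictions for community users, and licensed resources limited to university affiliates) can be summarized in a small sketch. the python illustration below is hypothetical: the user categories, resource names, and rules are assumptions made for the example and do not represent the unlv libraries' in-house program.

# hypothetical sketch of affiliation-based access gating on public
# workstations; categories, resource names, and rules are example
# values, not the unlv libraries' actual in-house program.

UNIVERSITY_AFFILIATES = {"student", "faculty", "staff"}

# resources a community (non-affiliated) user may reach in this example
COMMUNITY_ALLOWED = {
    "library_website",
    "online_catalog",
    "government_documents",
    "licensed_databases",
}

def may_access(user_type: str, resource: str, crunch_time: bool = False) -> bool:
    """decide whether a workstation session may open a resource.

    affiliates get full access; community users get a reduced set, and
    during exam "crunch time" only the library website and catalog."""
    if user_type in UNIVERSITY_AFFILIATES:
        return True
    if crunch_time:
        return resource in {"library_website", "online_catalog"}
    return resource in COMMUNITY_ALLOWED

if __name__ == "__main__":
    print(may_access("community", "productivity_suite"))                     # False
    print(may_access("community", "government_documents"))                   # True
    print(may_access("community", "licensed_databases", crunch_time=True))   # False
    print(may_access("student", "productivity_suite"))                       # True

in practice the user's status would come from the library's patron records or the campus directory rather than from a hard-coded argument, but the same small set of rules could drive both the workstation configuration and the wording of the policy.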
electronic reserves

many libraries have set up electronic reserves systems to help distribute electronic full-text documents and streaming media content, among other things. additional policies may govern the use of such systems, such as making the system available only to currently enrolled students, and providing some boundaries in terms of what is acceptable for mounting on such a system. in addition, there is the whole area of copyright. e-reserve systems often have built-in methods to help better enforce copyright compliance in the electronic arena. additional policy statements can help educate faculty members on particulars related to copyright and e-reserves.

offsite access to licensed electronic resources

many libraries provide offsite access to their licensed resources to legitimate users via proxy servers or other methods. the policy regarding such access may address things such as who is permitted to access resources from offsite (such as students, staff, and faculty), and the requirement that the user be in good standing (such as no outstanding library-book fines). in some instances, universities have implemented broad authentication systems that, once logged on from an offsite location, allow the user into a range of university resources, including, potentially, library-licensed electronic resources. if such is the case, information pertaining to offsite access may be found in a higher-level policy.

electronic reference transactions

many libraries have installed (or plan to install) virtual-reference systems, or, at a minimum, have a simple e-mail reference service (“ask a librarian”). in addition, many collect library feedback or survey information through simple forms. in all cases, a record exists of the transaction. with virtual-reference systems, the record can include chat logs, e-mail reference inquiries, and urls of web pages accessed during the transaction. a policy governing the use of electronic-reference systems may address such things as which clientele may use the system, a statement on the confidentiality of the transaction, or a statement on whether the library maintains the electronic-transaction details. items such as hours of operation and response time to an e-mail question could be considered more procedural or informational than a policy issue.

statements on information literacy

while perhaps not a policy per se, many libraries have a computer-use policy statement to the effect that while the library may provide links to certain information, this does not serve as an endorsement or guarantee that the information is accurate, up-to-date, or has been verified. (such a statement posted on the library web site may provide additional exposure to the maxim that all that glitters is not gold.) statements that libraries do not regulate, organize, or otherwise verify the general mass of information on the internet may be included. obviously, many libraries have separate instruction sessions, awareness programs, and overall mission goals geared toward information literacy.

principles on intellectual freedom and internet filtering

statements by the american library association (ala) on intellectual freedom and internet filtering may well appear in an institutional policy and often are included in library policies.
filtering is something more likely to affect public and school libraries as opposed to academic libraries. still, underage children can and do use academic libraries. in such an environment, they may be intentionally or unintentionally exposed to questionable or obscene material. thus, a library computer-use policy can express the general concept behind the following:
1. intellectual freedom (freedom of speech; free, equal, unrestricted access);
2. the fact that academic libraries provide a variety of information expressing a variety of viewpoints;
3. the fact that this information is not filtered; and
4. the responsibility of parents to be aware of what their children may be viewing on library pcs.

some libraries have provided policy links to various sets of information from the office of intellectual freedom at ala’s web site, such as:
1. ala code of ethics
2. ala bill of rights
3. intellectual freedom principles for academic libraries: an interpretation of the library bill of rights
4. access to electronic information, services, and networks: an interpretation of the library bill of rights

some libraries also provide references to ala information pertaining to the usa patriot act and how law-enforcement inquiries are handled.

summary

computing is a vitally important tool in the academic environment. university and library computing resources receive constant and growing use for research, communication, and synthesizing information. just as computer use has grown, so have the dangers in the networked computing environment. universities often have an overarching policy or policies governing the general use of computing technology that help to safeguard the university equipment, software, and network against inappropriate use. libraries often benefit from having an adjunct policy that works to emphasize the existence and important points of higher-level policies, while also providing a local context for systems and policies pertinent to the library in particular. having computer-use policies at the university and library level helps provide a comprehensive, encompassing guide for the effective and appropriate use of this vital resource.

references

1. the american heritage college dictionary, 3rd edition (boston: houghton, 1997), 1058.
2. board of visitors of the university of virginia, “responsible computing at u.va.: a handbook for students,” accessed june 2, 2004, www.itc.virginia.edu/pubs/docs/respcomp/rchandbook03.html.
3. jason vaughan and brett burnes, “bringing them in and checking them out: laptop use in the modern academic library,” information technology and libraries 21 (2002): 52–62.
appendix a. systemwide, institutional, and library computing policies at unlv

appendix a summarizes, in a matrix, which of six policies addresses each end-user policy element discussed in this article: the scs nevadanet policy*, the uccsn computing resources policy**, the unlv student computer-use policy***, the unlv policy for posting information on the web†, the unlv libraries guidelines for library computer use††, and additional unlv libraries policies†††. its rows follow the groupings used in the article: general elements (references to higher-level policies, author or authority information, and approval or revision dates); network and workstation security; personal hardware and software; printing; e-mail; personal web sites; intellectual property, copyright, and trademark; appropriate- and priority-use guidelines; privacy; abuse violations, investigations, and penalties; other computer- or network-based services affecting the broad student population; and library-specific elements (public versus student use, the right to access government information, assistance for persons with disabilities, laptop and lcd projector checkout privileges, licensed electronic resources terms and conditions, offsite access to licensed electronic resources, electronic reference transactions, statements on information literacy, ala principles on academic freedom and internet filtering, and electronic reserves with related copyright).
notes

* the systems computing services nevadanet policy. among other responsibilities, scs provides and maintains the general internet connectivity for nevada’s higher education institutions, including unlv. the complete document can be accessed at www.scs.nevada.edu/nevadanet/nvpolicies.html.
** the university and community college system of nevada computing resources policy. uccsn is the system of higher education institutions in the state of nevada, governed by an elected board of regents. the complete document can be accessed at www.scs.nevada.edu/about/policy061899.html.
*** the complete document can be accessed at www.unlv.edu/infotech/itcc/scup.html.
† the complete document can be accessed at www.unlv.edu/infotech/itcc/www_policy.html.
†† the primary unlv libraries policy governing student computer use. provided in appendix b, the complete document can also be accessed at www.library.unlv.edu/services/policies/computeruse.html.
††† various other policies are in effect at the unlv libraries. some of these can be accessed at www.library.unlv.edu/services/policies/computeruse.html.

appendix b. unlv university libraries guidelines for library computer use

in pursuit of its goal to provide effective access to information resources in support of the university’s programs of teaching, research, and scholarly and creative production, the university libraries have adopted guidelines governing electronic access and use of licensed software. all those who use the libraries’ public computers must do so in a legal and ethical manner that demonstrates respect for the rights of other users and recognizes the importance of civility and responsibility when using resources in a shared academic environment.

authorized users

to gain authenticated access to the libraries’ computer network, all users of the university libraries public computers must be officially registered as a library borrower, a library computer user, or a guest user. a photo id is required. (exceptions may be made as needed when access to federal depository electronic resources is required.) priority use is granted to unlv students, faculty, and staff. as need arises, access restrictions may be imposed on nonuniversity users. in accordance with licensing and legal restrictions, nonuniversity users are restricted from using word-processing, spreadsheet, and other productivity and high-end multimedia software. during high-demand times, all users may have time restrictions placed on their computer use. if requested by library staff, all users must be prepared to show photo id to confirm their user status.

authorized and unauthorized use

public computers are to be used for academic research purposes only.
electronic information, services, software, and networks provided directly or indirectly by the university libraries shall be accessible, in accordance with licensing or contractual obligations and in accordance with existing unlv and university and community college system of nevada (uccsn) computing services policies (uccsn computing resources policy, www.scs.nevada.edu/about/policy061899.html; unlv faculty computer use policy, www.unlv.edu/infotech/itcc/fcup.html; student computer use policy, http://ccs.unlv.edu/scr/computeruse.asp). users are not permitted to:
1. copy any copyrighted software provided by unlv. it is a criminal offense to copy any software that is protected by copyright, and unlv will treat it as such
2. use licensed software in a manner inconsistent with the licensing arrangement. information on licenses is available through your instructor
3. copy, rename, alter, examine, or delete the files or programs of another person or unlv without permission
4. use a computer with the intent to intimidate, harass, or display hostility toward others (sending offensive messages or prominently displaying material that others might find offensive such as vulgar language, explicit sexual material, or material from hate groups)
5. create, disseminate, or run a self-replicating program (“virus”), whether destructive in nature or not
6. use a computer for business purposes
7. tamper with switch settings, move, reconfigure, or do anything that could damage terminals, computers, printers, or other equipment
8. collect, read, or destroy output other than your own work without the permission of the owner
9. use the computer account of another person with or without their permission unless it is designated for group work
10. use software not provided by unlv
11. access or attempt to access a host computer, either at unlv or through a network, without the owner’s permission, or through use of log-in information belonging to another person

internet and web use

the university libraries cannot control the information available over the internet and are not responsible for its content. the internet contains a wide variety of material, expressing many points of view. not all sources provide information that is accurate, complete, or current, and some may be offensive or disturbing to some viewers. users should properly evaluate internet resources according to their academic and research needs. links to other internet sites should not be construed as an endorsement by the libraries of the content or views contained therein. the university libraries respect the first amendment and support the concept of intellectual freedom. the libraries also endorse ala’s library bill of rights, which supports access to information and opposes censorship, labeling, and restricting access to information. in accordance with this policy, the university libraries do not use filters to restrict access to information on the internet or web. as with other library resources, restriction of a minor’s access to the internet or web is the responsibility of the parent or legal guardian.

printing

users are charged for printing no matter who supplies the paper. mass production of club flyers, newsletters, or posters is strictly prohibited.
if multiple copies are desired, users need to go to an appropriate copying facility such as campus reprographics. contact a staff member when using the color laser printer to avoid costly mistakes. the university libraries reserve the right to restrict user printing based on quantity and content (such as materials related to running an outside business).

copyright alert

many of the resources found on the internet or web are copyright protected. although the internet is a different medium from printed text, ownership and intellectual property rights still exist. check the documents for appropriate statements indicating ownership. most of the electronic software and journal articles available on library servers and computers are also copyrighted. users shall not violate the legal protection provided by copyrights and licenses held by the university libraries or others. users shall not make copies of any licensed or copyrighted computer program found on a library computer.

use of personal laptops and other equipment

students, faculty, and staff of the university are welcome to bring laptops with network cards and use them with our data drops to gain access to our network. the laptop must be registered in our laptop authentication system, and a valid library barcode is also required. users are responsible for notifying the library promptly if their registered laptop is lost or stolen, since they may be held responsible if their laptop is used to access and damage the network. users taking advantage of this service are required to abide by all uccsn and unlv computer policies. the libraries allow the use of the universal serial bus (usb) connections located in the front of the workstations. this includes use with portable usb-based devices such as flash-based memory readers (memory sticks, secure digital) and digital camera connections. the patron assumes all responsibility in attaching personal hardware to library workstations. the libraries are not responsible for any damage done to patron-owned items (hardware, software, or personal data) as a result of connecting such devices to library workstations. as with any use of library workstations, patrons must adhere to all uccsn, unlv, and university libraries' computing and network-use policies. patrons are responsible for the security of their personal hardware, software, and data.

inappropriate behavior

behavior that adversely affects the work of others and interferes with the ability of library staff to provide good service is considered inappropriate. it is expected that users of the libraries' public computers will be sensitive to the perspective of others and responsive to library staff's reasonable requests for changes in behavior and compliance with library and university policies. the university libraries and their staff reserve the right to remove any user(s) from a computer if they are in violation of any part of this policy and may deny further access to library computers and other library resources for repeat offenders. the libraries will pursue infractions or misconduct through the campus disciplinary channels and law enforcement as appropriate.
revised: march 3, 2004. updated: thursday, may 13, 2004. content provider: wendy starkweather, director of public services.

corporate author entry records retrieved by use of derived truncated search keys

alan l. landgraf, kunj b. rastogi, and philip l. long, the ohio college library center

an experiment was conducted to design a corporate author index to a large bibliographic file. the nature of corporate entries necessitates a different search key construction from that of personal names or titles. derivation of a search key to select distinct corporate entry records is discussed.

introduction

this paper describes the findings of an experiment conducted to design a corporate author index to entries in a large file of catalog records at the ohio college library center; a companion paper describes findings of a similar investigation into retrieval employing a personal author index.1 the center has operated an on-line, shared cataloging system since august 1971. in addition to a library of congress card number index, the system maintains truncated name-title and title index files. the user is thus able to retrieve entries employing truncated search keys. three previous papers report results of experiments which led to the design of the name-title and title indexes.2-4 for monographs having personal names as main entries, a truncated 3,3 search key consisting of the first three letters of the author's name plus the first three letters of the first non-english-article word of the title was judged to be satisfactory, in that this key yielded five or fewer entries per query in more than 99 percent of the cases when keys were selected at random.5 however, a recent study by guthrie and slifko reveals that a model which employs random selection of entries yields results closer to actual experience, and with a higher average number of entries per reply.6 a search key composed of the first five or four characters of the surname and the first or first and second initials makes possible efficient retrieval.7 however, the situation is different in the case of corporate entries because many corporate names begin with the same or similar words. for example, in the records examined, the initial words of more than 1,300 publications are "u.s. congress, house committee on . . . ." obviously a type of search key different from that which proved efficient for retrieving personal authors is required for retrieval of corporate entries.

material and methods

the experiment used a file of approximately 200,000 marc ii records having a total of 68,169 corporate name entries. corporate entries were extracted from the 110, 111, 410, 411, 710, 711, 810, and 811 fields in the records. a program edited the file to extract keys: initial english-language articles were removed from each entry; the words "united states," "u.s.," and "u. s." appearing anywhere in the entry were replaced with "us," and "great brit." and "great britain" were replaced with "gt brit." a blank was substituted for each subfield delimiter and associated code, and unwanted characters such as punctuation, diacritics, and special symbols were removed; the program also closed up the space that the unwanted character had occupied. one blank replaced multiple blanks.
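as a rough illustration of the editing pass just described, the sketch below normalizes a corporate name in python. it is a minimal approximation, not the original program: the exact article stop list and substitution table used in 1973 are not given in the paper, so the values here are assumptions.

```python
import re
import unicodedata

# assumed stop list and substitution table; the paper does not give the originals in full
ARTICLES = ("the ", "a ", "an ")
SUBSTITUTIONS = [
    ("united states", "us"),
    ("u. s.", "us"),
    ("u.s.", "us"),
    ("great britain", "gt brit"),
    ("great brit.", "gt brit"),
]

def normalize_corporate_entry(entry: str) -> str:
    """approximate the key-editing pass: drop a leading english article, apply the
    fixed substitutions, strip punctuation and diacritics (closing up the space they
    occupied), and let one blank replace multiple blanks."""
    text = entry.lower()
    for article in ARTICLES:
        if text.startswith(article):
            text = text[len(article):]
            break
    for old, new in SUBSTITUTIONS:
        text = text.replace(old, new)
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))  # drop diacritics
    text = re.sub(r"[^a-z0-9 ]+", "", text)  # remove punctuation and special symbols
    return re.sub(r" +", " ", text).strip()  # collapse runs of blanks

print(normalize_corporate_entry("U.S. Congress. House. Committee on Appropriations"))
# -> "us congress house committee on appropriations"
```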
the elements extracted consisted of five segments of eight characters each, representing the initial eight characters of the first five words of the corporate entry. segments containing fewer than eight characters were padded out with blanks. if a corporate name had fewer than five words, the remaining segments were blank. to study a given type of key, the file was sorted on a specified number of initial characters of each segment; these initial characters were then employed as search keys by a program which sequentially compared the characters in the key, counting distinct and identical keys.

results and discussion

table 1 presents the number of distinct keys and the maximum number of occurrences of identical keys for the structures studied in the experiment.

table 1. number of distinct keys and maximum number of identical entries per key for different key structures in 68,169 marc ii records.

key structure   number of distinct keys   distinct keys as percent of total records   maximum number of entries per key
8,8,8,8,8       22982   33.7   1304
8,8,8,8,0       20476   30.0   1305
8,8,8,0,0       16283   23.9   1802
4,2,2,2,2       22411   32.9   1307
4,2,2,2,1       22120   32.4   1308
4,2,2,2,0       19513   28.6   1311
4,2,2,1,0       18589   27.3   1311
4,2,2,0,0       14801   21.7   1807
3,3,2,2,2       22417   32.9   1307
3,3,2,2,1       22132   32.5   1308
3,3,2,2,0       19560   28.7   1311
3,3,2,1,0       18654   27.4   1311
3,3,2,0,0       14922   21.9   1806
2,2,2,2,2       22053   32.3   1307
2,2,2,2,1       21743   31.9   1308
2,2,2,2,0       19034   27.9   1311
2,2,2,1,0       18036   26.5   1311
2,2,2,0,0       13842   20.3   1807
1,1,1,1,1       19028   27.9   1308

the larger the number of distinct keys for a fixed number of entries in the file, the better the key will be for retrieval purposes. given two search keys which are more or less equally specific, the one which is simpler to use is preferable. the peculiarity of corporate-entry keys can be observed from table 1. even for the 8,8,8,8,8 key structure the percentage of distinct keys (33.7 percent) is low, and the maximum number of occurrences of an identical key (1304) is high. another observation revealed by table 1 is that as the key structure goes from five to three segments, there is a steady decrease in the percentage of distinct keys and consequently an increase in the maximum number of entries per key. however, a reduction in the number of characters in a segment does not cause a great deal of deterioration. for example, for 8,8,8,8,8 keys, the percentage of unique keys and the maximum number of entries per key are respectively 33.7 percent and 1304, while for 2,2,2,2,2 keys, the corresponding figures are 32.3 percent and 1307. thus, the 2,2,2,2,2 key structure seemed a good candidate for a corporate entries index, and therefore the number of entries per reply for this key structure was more intensely studied. on the average it is desirable that the number of replies per query be such that information by which the user can choose among the possible replies can be displayed on a single crt screen. this maximizes the utility of a computer system, since it minimizes the amount of system activity needed to promptly satisfy a user's request. since some query keys produce but one reply while others produce hundreds of candidate records, it is necessary to use the mathematics of probability to determine the likely long-term effect of a given choice of system parameters.
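the segment extraction and the distinct-key tally summarized in table 1 can be sketched as follows. this is an illustrative reconstruction, assuming entries have already been normalized as above; the sample entries and the collision shown are invented for the example.

```python
from collections import Counter

def derive_key(normalized_entry: str, structure=(2, 2, 2, 2, 2)) -> str:
    """build a truncated search key: take the first five words of the normalized
    corporate entry, keep the first eight characters of each (padding short or
    missing words with blanks), then truncate each segment to the lengths in
    `structure` -- e.g. (2, 2, 2, 2, 2) for the 2,2,2,2,2 key."""
    words = normalized_entry.split()[:5]
    segments = [(words[i] if i < len(words) else "")[:8].ljust(8) for i in range(5)]
    return "".join(seg[:n].ljust(n) for seg, n in zip(segments, structure))

def key_statistics(entries, structure=(2, 2, 2, 2, 2)):
    """number of distinct keys and the largest number of entries sharing one key,
    the two quantities reported per key structure in table 1."""
    counts = Counter(derive_key(e, structure) for e in entries)
    return len(counts), max(counts.values())

sample = [
    "us congress house committee on appropriations",
    "us congress house committee on agriculture",
    "american library association",
]
print(key_statistics(sample))  # -> (2, 2): both congressional entries collapse to "uscohocoon"
```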
using the approach indicated as useful by guthrie and slifko, the analysis of the effect of various choices of search key becomes the following. assume that every entry has an equal probability of being accessed. then, in attempting to retrieve each entry once, keys having i number of entries will cause a total of i^2 entries to be accessed. if f_i denotes the frequency of keys having i number of entries and m denotes the maximum allowable occurrences of any key in the file, the average number of entries per reply, \bar{y}, is given by:

\bar{y} = \frac{\sum_{i=1}^{m} i^2 f_i}{\sum_{i=1}^{m} i f_i}

where \sum_{i=1}^{m} i f_i is the number of entries in the file whose derived keys have a frequency of m or less.

the above formula yields the average number of entries per reply for the 2,2,2,2,2 key to be much larger than 20 for m > 100; but some 2,2,2,2,2 keys corresponded to more than 500 file entries. a typical crt display terminal can accommodate only ten or fewer entries per screen. therefore, if the average number of entries per reply is desired to be ten or fewer, it is necessary either to ignore entries with high multiplicity or to adopt a different scheme of storing and retrieving such items, in which case the mathematical result would be the same as ignoring high-frequency items. the average number of entries per reply was computed for five different values of m (19, 29, 39, 49, and 59); the results of these computations are in table 2, which reveals that if keys in the file are allowed a maximum recurrence of 39 entries per key, it would be possible to have keys in the main index for about 75 percent of total records, while entries for only 142 high-frequency keys would have to be shunted to a secondary index. in this case, the average number of entries per reply would be about eight.

table 2. average number of entries per reply for key structure 2,2,2,2,2 for various multiplicities of entries.

maximum frequency of any key in file   total records in file   percent of total records   number of distinct keys eliminated   average number of entries per reply
19   44174   64.8   389   5.0
29   48127   70.6   223   6.6
39   50854   74.6   142   8.1
49   52422   76.9   107   9.1
59   53513   78.5   87    10.1

table 3 gives the probability of number of entries per reply for the index file consisting of 50,854 (out of a total of 68,169) records with the maximum frequency of any key in the file being 39. for preparing this table the assumption is made that each entry in the file has an equal probability of being accessed. thus the probability of obtaining i entries per reply is given by:

p(i) = \frac{i f_i}{\sum_{j=1}^{m} j f_j}

where f_i is the frequency of keys occurring exactly i number of times in the index file.

table 3. probability of number of entries per reply for an index file using the 2,2,2,2,2 key.

number of entries   frequency   probability (percent)   cumulative probability (percent)
1    14820   29.1   29.1
2    2893    11.4   40.5
3    1276    7.5    48.0
4    726     5.7    53.7
5    427     4.2    57.9
6    312     3.7    61.6
7    248     3.4    65.0
8    195     3.1    68.1
9    150     2.6    70.7
10   120     2.4    73.1
11   78      1.7    74.8
12   88      2.1    76.9
13   56      1.4    78.3
14   71      1.9    80.2
15   62      1.9    82.1
16   48      1.5    83.6
17   41      1.3    84.9
18   28      1.0    85.9
19   24      0.9    86.8
20   22      0.9    87.7
21   18      0.7    88.4
22   16      0.7    89.1
23   23      1.1    90.2
24   25      1.1    91.3
25   13      0.7    92.0
26   9       0.4    92.4
27   12      0.7    93.1
28   18      1.0    94.1
29   10      0.5    94.6
30   11      0.7    95.3
31   11      0.7    96.0
32   13      0.8    96.8
33   6       0.4    97.2
34   9       0.6    97.8
35   7       0.4    98.2
36   6       0.5    98.7
37   11      0.8    99.5
38   5       0.3    99.8
39   2       0.2    100.0

an inspection of this table shows that in 87.7 percent of the time there would be 20 or fewer replies. this represents two screensful of information on a typical crt display.
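a small sketch of the two formulas above, assuming a frequency distribution of the kind behind tables 2 and 3; the sample distribution at the bottom is invented for illustration and is not the 1973 data.

```python
def average_entries_per_reply(freq, m):
    """freq[i] = number of distinct keys occurring exactly i times in the file.
    returns (y-bar, entries kept in the main index) using
    y-bar = sum(i^2 * f_i) / sum(i * f_i) over i = 1..m."""
    kept = sum(i * f for i, f in freq.items() if i <= m)          # entries whose keys occur <= m times
    weighted = sum(i * i * f for i, f in freq.items() if i <= m)  # entries touched when each is retrieved once
    return weighted / kept, kept

def reply_size_distribution(freq, m):
    """p(i) = i * f_i / sum(j * f_j): the chance a random lookup returns i entries."""
    kept = sum(j * f for j, f in freq.items() if j <= m)
    return {i: i * f / kept for i, f in freq.items() if i <= m}

# invented example: 100 keys occur once, 20 keys occur 5 times, 2 keys occur 50 times
freq = {1: 100, 5: 20, 50: 2}
print(average_entries_per_reply(freq, m=39))  # -> (3.0, 200); the two 50-entry keys are shunted aside
print(reply_size_distribution(freq, m=39))    # -> {1: 0.5, 5: 0.5}
```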
conclusion

a file containing only those entries for which the frequencies of 2,2,2,2,2 search keys is 39 or fewer would produce 20 or fewer entries per reply approximately 88 percent of the time, but such a file excludes 142 high-frequency keys for 17,315 of a total of 68,169 entries. therefore, a special technique for handling corporate-entry derived keys of high multiplicity is desirable.

references

1. a. l. landgraf and f. g. kilgour, "catalog records retrieved by personal author using derived search keys," journal of library automation 6:103-8 (june 1973).
2. f. g. kilgour, p. l. long, and e. b. leiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7:79-82 (1970).
3. f. g. kilgour, p. l. long, e. b. leiderman, and a. l. landgraf, "title-only entries retrieved by the use of truncated search keys," journal of library automation 4:207-10 (dec. 1971).
4. p. l. long and f. g. kilgour, "a truncated search key title index," journal of library automation 5:17-20 (march 1972).
5. kilgour, long, leiderman, "retrieval of bibliographic entries."
6. g. d. guthrie and s. d. slifko, "analysis of search key retrieval on a large bibliographic file," journal of library automation 5:96-100 (june 1972).
7. landgraf and kilgour, "catalog records retrieved."

static vs. dynamic tutorials: applying usability principles to evaluate online point-of-need instruction

benjamin turner, caroline fuchs, and anthony todman

abstract

this study had a two-fold purpose. one is to discover through the implementation of usability testing which mode of tutorial was more effective: screencasts containing audio/video directions (dynamic) or text-and-image tutorials (static). the other is to determine if online point-of-need tutorials were effective in helping undergraduate students use library resources. to this end, the authors conducted two rounds of usability tests consisting of three groups each, in which participants were asked to complete a database-searching task after viewing a text-and-image tutorial, an audio/video tutorial, or no tutorial. the authors found that web usability testing was a useful tutorial-testing tool, and that participants learned most effectively from text-and-image tutorials: in both rounds, those participants completed tasks more accurately and more quickly than those who received audio/video instruction or no instruction.

introduction

the provision of library instruction online has become increasingly important, given that more than one third of higher education students now take at least some of their courses online and that the number of students enrolling in online courses continues to increase more rapidly than the number of students in higher education as a whole.1 academic library websites reflect the growth of online education. by 1998, online versions of journals had become ubiquitous.2 in contrast, electronic books have been slower to be adopted in academic libraries, but there has been a steady and significant growth of their use in recent years.
between 2010 and 2011, for example, the average number of electronic books available at academic libraries in the united states increased by 93 percent.3

benjamin turner (turnerb@stjohns.edu) is associate professor and instructional librarian, caroline fuchs (fuchsc@stjohns.edu) is associate professor and outreach librarian, and anthony todman (todmana@stjohns.edu) is associate professor and reference and government documents librarian, st. john's university libraries, new york, new york.

with the increasing availability of library content online, many users bypass the "brick and mortar" library and go directly to its website.4 remote access to library collections has advantages in terms of convenience, which further underscores the importance of making library websites as intuitive as possible while offering quality instruction at point-of-need. a recent survey of 264 academic library websites found that 64 percent offered some form of online tutorials.5 the relative effectiveness of different types of tutorials in providing online, point-of-need library instruction is therefore an important consideration for library professionals. this study had a two-fold purpose. one is to discover through the implementation of usability testing which mode of tutorial was more effective: screencasts containing visual and audio directions (dynamic) or text-and-image tutorials (static). the other is to determine if online point-of-need tutorials were effective in helping undergraduate students use library resources. for the purpose of this study, researchers were less interested in the long-term effects of these tutorials on student research but rather focused on point-of-need instruction for database use.

st. john's university

st. john's university is a private, coeducational roman catholic university, founded in 1870 by the vincentian community. the university has three residential campuses within new york city and an academic center in oakdale, new york, as well as international campuses in rome, italy, and paris, france. the university comprises six schools and colleges: st. john's college of liberal arts and sciences; the school of education; the peter j. tobin college of business; the college of pharmacy and health sciences; the college of professional studies; and the school of law. there is a strong focus on online learning. special academic programs include accelerated three-year bachelor's degrees, five-year bachelor's/master's degrees in the graduate schools, a six-year bachelor's/jd from the school of law, and a six-year pharmd program. in fall 2013, total student enrollment was 20,729, with 15,773 registered undergraduates and 1,364 international students. during the 2012–13 academic year, 97 percent of undergraduate students received financial aid in the form of scholarships, loans, grants, and college work/study initiatives. the student body was 56 percent female and 44 percent male, representing 47 states and 116 countries. the diversity of the student population is reflected in the fact that 47 percent identified themselves as black, hispanic, asian, native hawaiian/pacific islander, american indian, alaska native, or multiracial. st. john's university has a library presence at four campuses: queens, staten island, manhattan, and rome, italy.
in addition to traditional or in-person interaction, both online and distance learning are integral parts of the library-tutorial and instruction environment. undergraduate students receive a laptop computer at no cost, and the entire campus is wireless accessible. full-time faculty members receive laptop computers as well. the university libraries provide 24/7 access to electronic resources, both on and off campus. the libraries' portal is located at http://www.stjohns.edu/libraries. an online catalog can be found at http://stjohns.waldo.kohalibrary.com. wireless computing and printing are available at the four campus library sites as well as in other areas across campus. library reference and research assistance services are delivered in person or electronically. library reserve services are accessible in either print or electronic formats. interlibrary loan has both domestic and international borrowing and lending via the illiad software platform. when the main queens campus library is not open for service, a 24/7 quiet study area is available for current students within the library space. library instructional services take place in formal classes that are requested by faculty, as well as library faculty-initiated workshops held in either the libraries' computerized classrooms or at other on-campus locations. there is no mandated information literacy session. during june 2012–may 2013, 333 instruction classes were offered to 4,435 students.

literature review

the library literature on online library tutorials might be divided into subcategories: early development of online instructional tutorials, library website usability testing, evaluation of online information-literacy instruction tutorials, best practices for the creation of library tutorials, and the best mediums for the creation of library tutorials.

early development of online instruction tutorials

the need to evaluate and assess the usefulness of online instructional tutorials is not new. although not explicitly related to today's environment, tobin and kesselman's work contains an early history detailing the design of internet-based information pages and their use in the library information environment.6 they also included the early guidelines of the association of college and research libraries (acrl), the international federation of library associations (ifla), and the american library association (ala). a study by dewald conducted around the same time evaluated twenty library tutorials according to the current best practices in library instruction, and concluded that "online tutorials cannot completely substitute for the human connection in learning"7 and should be designed specifically to support students' academic work. further, it was noted that tutorials should teach concepts, rather than mechanics, and incorporate active learning where possible.8 in a separate article, dewald argued that the web made possible new, creative ways of teaching library skills, through features such as linked tables of contents and the provision of immediate feedback through cgi scripts. users also were able to open secondary windows to practice the skills they learned as they moved through tutorials.
she further concluded that effective instructional content should not be text heavy, but rather include images and interactive features.9 another early study of online tutorials discussed the development of a self-paced web tutorial at seneca college in toronto, called “library research success,” which was designed to teach subject-specific and general research skills to first-year business majors. the creation of the tutorial was first requested by seneca college’s school of business management, which collaborated with seneca college library, the school’s centre for new technology, and centre for professional development in completing the project. the tutorial was a success, with overwhelmingly positive feedback from students and faculty members.10 despite such successful examples, a common concern expressed in early studies was that online tutorials would not be as effective as face-to-face instruction. one article compared and evaluated library skills instruction methods for first-year students at deakin university.11 another tracked the difference between cai (computer assisted instruction) without a personal librarian interaction and a more traditional library instruction incorporated into an english classroom setting, and which concluded that while useful, cai was not a good substitute for face-to-face instruction.12 library website usability testing as concern grew at the onset of the twenty-first century for the need to evaluate online library tutorials, articles on library website usability testing began to appear more frequently. in one study, the authors noted that they would not have identified problems with their website had they not done usability testing: “testers’ observations and the comments of the students participating in the test were invaluable in revealing where and why the site failed and helped evaluators to identify and prioritize the gross usability problems to be addressed.”13 librarians aiming to examine their patrons’ ability to independently navigate their library’s webpage to fulfill key research needs, conducted similar studies. at western michigan university (wmu), librarians investigated how researchers navigated the wmu library website in order to find three things: the title of a magazine article on affirmative action, the title of a journal article on endangered species, and a recent information technology and libraries | december 2015 34 newspaper about the senate race in new york state. they successfully used the data gathered to identify problems with their website and to establish goals and priorities in clarifying language and navigation on their site.14 more recently, researchers conducted a usability study with the aim of showing how librarians could build websites to better compete with nonlibrary search sites such as google, which would allow greater personalization by the individual user and more seamless integration into learning management systems.15 other researchers have studied the readability of content on academic library websites. in one such study, lim used a combination of readability formulas and focus groups to evaluate twenty-one academic library websites that serve significant numbers of academically underprepared students and/or students who spoke english as a second language. 
they concluded that the majority of information literacy content on library pages had poor readability, and that the lack of welldesigned and well-written information literacy content could undermine its effectiveness in serving users.16 kruger, ray, and knight employed a usability study to evaluate student knowledge of their library’s web resources. the study produced mixed results, with most students able to navigate to the library’s website and the opac, but large numbers unable to perform basic research tasks such as finding a journal article. the authors noted that such information would allow them to modify library instruction accordingly.17 another study focused on the use of language as it relates to awareness of relevant databases. at bowling green university library, staff members attempted to learn more about how users find and select databases through the library website’s electronic resources management system (erm). because of their study, the authors recommended that librarians should focus on promoting brand awareness of relevant databases among students in their subject disciplines by providing better database descriptions on the library webpages and by collaborating with subject faculty members.18 evaluation of online information-literacy instruction tutorials librarians at wayne state university conducted an assessment of their revamped information literacy tutorial, known as “re:search.”19 they distributed a multiplechoice knowledge questionnaire to seventy-two students participating in their 2010 wayne state federal trio student support service summer residential program, which was based on donald kirkpatrick’s evaluating training programs: the four levels.20 they concluded that their study highlighted some flaws in their tutorials, including navigational problems. as a result, they would consider partnering with wsu faculty in the future to develop better modules. one curious comment by the authors in their introduction warrants further discussion about assumptions made static vs. dynamic tutorials | turner, fuchs, and todman | doi: 10.6017/ital.v34i4.5831 35 by librarians regarding student research skills: “the internet has bolstered student confidence levels in their research abilities, increasing the demand for point-of-need instruction. students are accustomed to online learning, not only because of the shift in higher education to online coursework, but also because they have been leaning online through youtube, social networking, and other websites.”21 at purdue university, librarians evaluated the success of their seven-module online tutorial through the distribution of a post-test survey. these researchers found that the feedback received was essential for planning future versions of online instruction at their institution.22 a report from zayed university (united arab emirates) outlined an evaluation of infoasis, the university’s online information literacy tutorial, testing 4,000 female students with limited library proficiency and remedial english aptitudes.23 best practices for the creation of library tutorials other researchers developed guidelines and best practices for future planning and implementation. bowles-terry, hensley, and hinchliffe at the university of illinois conducted interviews to investigate the usability, findability, and instruction effectiveness of online video tutorials. although shorter than three minutes, students found the tutorials to be too lengthy, and would have preferred the option to skip ahead to pertinent sections. 
other participants found the tutorials too slow, while some preferred to read rather than watch and listen. on the basis of their study, the authors recommended a set of best practices for creating library video tutorials, including pace, length, content, look and feel, video versus text, findability, and interest in using video tutorials.24 at regis university library, librarians created online interactive animated tutorials and incorporated google analytics for use statistics and tutorial assessment, from which they developed a list of tips and suggestions for tutorial development. these included suggestions regarding the technical aspects such as screen resolution and accessibility. of some significance is that the data from the analytics suggest that the tutorials are being used both within and without the university. most useful here is the “best practices for creating and managing animated tutorials” found in the article’s appendix.25 best mediums for the creation of library tutorials other authors have explored the need to accommodate different learning styles in library tutorials rather than relying too heavily on text to convey information.26 at the university of leeds in the united kingdom, an information literacy tutorial was planned and created to support online distance learners in the geography postgraduate program. using an articulate presenter, the authors created a tutorial that information technology and libraries | december 2015 36 covered the same material that would be taught in a face-to-face session, and which incorporated visual, auditory, and textual elements. these researchers concluded that the online tutorial is supplemental and did not alleviate the need for face-toface instruction.27 to reach different types of learners, many librarians have begun to use adobe flash (formerly macromedia flash) to create multimodal online information literacy tutorials. authors who use flash note that learning how to use the software correctly represents a significant investment in time and effort.28 another study, conducted via a suny albany web design class, focused on the effect/outcome of teaching with web-based tutorials in addition to or instead of face-to-face interaction. the authors of this study pointed out that self-paced instruction, lab time, office, hours, and email exchange were all factors that are affecting web-based multimedia (wbmm) flash that were incorporated into instruction.29 rather than focusing purely on the content of online library instruction tutorials, some studies considered and evaluated the various tutorial-creating software tools. blevins and elton conducted a case study at the william e. laupus health sciences library at east caroline university, which set out “to determine the best practices for creating and delivering online database instruction tutorials for optimal information accessibility.”30 they produced “identical” tutorials using microsoft’s powerpoint, sonic foundry’s mediasite, and techsmith’s camtasia software. they chose to include powerpoint because “previous research has shown that online students prefer powerpoint presentations to video lectures.”31 their testing results indicated that participants found specific tutorial features to be most effective: video (33.3 percent), mouse movements (57.1 percent), instructor presence (28.6 percent), audio instruction only (28.6 percent), and interaction (28.6 percent). 
they concluded that camtasia tutorials provided optimal results for short sessions such as database instruction, and that mediasite was more appropriate for instruction requiring video and audio of the instructor plus screenshots. however, they also determined that powerpoint tutorials were an acceptable solution if cost were an important factor.32 in a separate study at florida atlantic university, researchers described the process of designing and creating library tutorials using the screencasting software camtasia. in addition to the creation of the tutorials themselves, the authors described how the project entailed the development of policies and guidelines for the creation of library tutorials, as well as training of librarians in using camtasia software.33 this study provides another good example of the time investment involved in the creation of multimedia tutorials. while the professional literature thus shows that flash-based tutorial software is popular among librarians, and the desire to accommodate students with different learning styles is a laudable goal, at least one study suggests that the time and money involved in the creation of multimedia tutorials could be better spent in other ways. a university of illinois urbana-champaign study found that students from different learning styles performed better after using tutorials made with a combination of text and screenshots than from tutorials created with camtasia software.34

method

usability testing for the evaluation of tutorials

to compare dynamic audio/video tutorials with text-and-image tutorials, the researchers employed usability testing, which is "watching people use something that you have created, with the intention of making it easier to use, or proving that it is easy to use."35 usability testing requires relatively small numbers of participants to provide meaningful results, and it does not require selection from a representative sample population.36

participants

table 1. breakdown of participants

group                           number of participants
control group 1                 5
text-and-image group 1          5
dynamic audio/video group 1     5
group 1 total                   15
control group 2                 5
text-and-image group 2          5
dynamic audio/video group 2     5
group 2 total                   15
total participants              30

thirty freshmen at st. john's university participated in this study. while usability-testing experts do not place a great deal of importance on recruiting participants from a specific target audience, the researchers wanted to choose users who were less likely to have had significant experience with university library database searching, since prior knowledge could make it harder to determine the effectiveness of the tutorials. they therefore chose freshmen as the participants in the study. they did not seek any other variables such as age, gender, ethnicity/culture, or any other demographic information. participants were recruited through the st. john's central portal, which is the main channel of internal communication at st. john's university, and through which mass emails can be sent to a targeted population of students. the email to students provided a registration link to a google form, which asked students to provide their name, year of study, time availability preference, and contact information. freshmen were selected from the response list.
as an incentive for participation, the student participants became eligible for a kindle fire tablet for each of the two rounds of the study. prior to beginning the study, the authors consulted st. john’s university’s office of institutional research, which oversees all research at the university, and provides approval for the study of human subjects. since this study focused on tutorials rather than the participants themselves, the authors were granted a waiver for the study. tests usability testing typically involves having participants complete a task or tasks in front of an observer. for this study, the authors designed two tasks that required participants to find articles in academic search premier ebsco database (asp ebsco). the first task, given to all participants in the first round of tests, was relatively simple, and consisted of three components: finding an article about climate change published in the journal lancet and downloading a copy of the citation for that article in mla format from the database. participants who attempted the first task were labeled “group 1” (see appendix i). the second task was given to all participants in the second round of tests and was more complex, comprising five components. participants were asked to find an article about the deepwater horizon spill from a peer-reviewed journal published after 2011 that included color photographs. as with the first task, these participants were also required to download a copy of the citation for the article in mla format from the database. participants who attempted the second task were labeled “group 2” (see appendix ii). group 1 and group 2 were divided into three subgroups each. the first subgroup was the control group and received no instruction. the second subgroup was given access to the dynamic audio/visual tutorial (see appendix iii). the third subgroup was given access to the static text-and-image tutorial instruction (see appendixes iv and v). each subgroup consisted of five unique participants. each participant was scheduled for a specific fifteen-minute time slot. tests were conducted in a small meeting room in the library, with one participant at a time working with the facilitator. as the participants entered the meeting room, the static vs. dynamic tutorials | turner, fuchs, and todman | doi: 10.6017/ital.v34i4.5831 39 facilitator greeted them and confirmed their identities. participants were provided with an information sheet (see appendix vi), which told participants that the session would be recorded, that the researchers were concerned with testing the libraryinstruction tutorials, not the participants themselves, and that the tests were confidential and anonymous. participants were also told that they could end the test at any time for any reason. additionally, the facilitator read aloud the points-ofinformation sheet. participants were invited to ask questions or voice concerns. for both rounds of tests, participants had use of a laptop computer with a browser window open to the asp ebsco home page. for those who received instruction, a second browser window was open to either the dynamic or the static tutorial. for members of the control group, no tutorial was available. those who received instruction were allowed to return to the tutorial at any point they wished. 
using adobe connect software, the testing activities, tutorials, participants’ attempt(s) at task(s), participants’ computer screen, and any conversation between the participants and the facilitators were simultaneously recorded and broadcast to a separate room, where the two other researchers observed, listened and took notes. the participants were asked to verbally describe the steps they were taking, as per the “think aloud” protocol that is essential to usability testing. recorded sessions were then available for later review by the research team. on completing the task, participants who received either the text-and-image or dynamic audio/video tutorial were asked to complete a short questionnaire giving feedback on the instruction received (see appendix vii). participants who received no instruction were not asked to provide feedback. tutorials the researchers created four tutorials for this study. two were flash-based dynamic audio/video tutorials created using techsmith’s jing software. the static text-and-image tutorials were created using microsoft word, which was then converted into a pdf document. the dynamic and static tutorials mirrored each other in terms of content, and were designed with the specific goal of helping participants complete the tasks successfully, though in both cases there was some variation between the tutorials and the tasks. the tutorials received by group 1, for instance, showed participants how to find articles about the occupy wall street movement, limiting the search to “published in the new york times,” and how to download the citation in mla format. the tutorials for group 2 showed participants how to find articles about climate change that included color photographs, limiting the search to peer-reviewed journals that were published after 2011. discussion information technology and libraries | december 2015 40 the results of the usability study revealed two things: participants benefited from library instruction, through which they evidently acquired new skills; and participants benefited more from static text-and-image tutorials than from the dynamic audio/video tutorials. in both rounds of tests, the participants who received the text-and-image tutorials performed the tasks more effectively than did members of the control group or those who viewed the dynamic tutorials. group 1 for the first round of tests, members of the control group spent longer on the task and made more mistakes than those who received either the dynamic or the static tutorial (see table 2). for example, one participant in the control group was unable to download the mla citation, and another in the control group ventured outside the asp ebsco database platform to find the correct citation format. when members of the control group did succeed, they did so without a clear search strategy, evidenced by their use of natural language instead of boolean connectors. (asp ebsco uses boolean connectors by default, and natural language is usually ineffective.) another participant reached several dead-ends in the search before finally succeeding. while most of the control group participants were at least partially successful in completing the task, it is reasonable to suspect that they would have given up in frustration in a non-test situation, and would have benefited from point-of-need instruction. 
table 2. task completion success and time, control group 1

                         control 1   control 2   control 3   control 4   control 5
relevant article         y           y           y           y           y
lancet                   y           y           y           y           y
mla citation             y           n           y           y           y
time on task (minutes)   8:28        2:49        6:30        2:41        1:42
average time on task: 4:26 mins.

the participants who received the static text-and-image tutorial performed the best, completing the task with the highest speed and with the greatest accuracy (see table 3). all five of the participants in this group managed to find appropriate articles and to download the citation in mla format, though several had difficulty with the final task. all were able to navigate to the "cite" feature effectively, but all participants chose to click on the "mla" link rather than simply copy the citation. clearer directions in the tutorial might alleviate this problem.

table 3. task completion success and time, text and image tutorial, group 1

                         t&i 1   t&i 2   t&i 3   t&i 4   t&i 5
relevant article         y       y       y       y       y
lancet                   y       y       y       y       y
mla citation             y       n       y       y       y
time on task (minutes)   2:01    3:00    2:21    2:40    3:15
average time on task: 2:39 mins.

participants who received the dynamic video tutorial were more successful than those in the control group, but spent significantly longer on task than did those who received the static tutorial (see table 4). interestingly, two of the participants searched for "climate change" as the "subject term" in asp ebsco, even though the tutorial did not instruct them to do so. (su subject term is one of the options in the drop-down menu in asp ebsco, which otherwise searches citation and abstract by default.) while "climate change" is a commonly accepted scientific term, and the searches produced relevant search results, it is not generally advisable to begin a search with controlled vocabulary terms.

table 4. task completion success and time, dynamic a/v tutorial, group 1

                         video 1   video 2   video 3   video 4   video 5
relevant article         y         y         y         y         y
lancet                   y         y         y         y         y
mla citation             y         y         y         y         y
time on task (minutes)   4:34      3:17      3:17      3:07      3:28
average time on task: 3:32 mins.

figure 1. average time on task in minutes, group 1

figure 2. successful task completion: group 1

group 2

the advantages of text-and-image instruction were more pronounced in the second round of tests, which involved a more complex task (see figure 3). as in the first round of tests, the participants in the control group had the lowest number of satisfactory task completions, and spent the greatest amount of time on task. although most of the participants in control group 2 had at least partial success in completing the task, most did so through trial and error, and showed a general lack of understanding of database terminology and functions. one participant, for example, attempted to use "peer-review" and "color photographs" as search terms. another attempted to search for "deepwater horizon" as a journal title. only two of the participants completed all components of the task successfully. two others partially completed the task: one found a suitable article with color photographs, but published in the nation, which is not peer-reviewed. one user failed to complete any part of the task and gave up in frustration (see table 5).
table 5. task completion success and time, control group 2

                         control 1   control 2   control 3   control 4   control 5
relevant article         y           n           y           y           y
peer-reviewed            y           n           y           n           y
publication date         y           n           n           y           y
color photos             y           n           y           y           y
mla citation             y           n           y           n           y
time on task (minutes)   1:51        7:39        2:54        9:16        7:55
average time on task: 5:55 mins.

in contrast, participants who received the text-and-image tutorial enjoyed the most success in round 2. three of the five participants who received the static tutorial completed all components of the task successfully. errors committed by the two others were related to publication date. participants in this group also completed the task more rapidly than those from the other two groups.

table 6. task completion success and time, text and image tutorial, group 2

                         t&i 1   t&i 2   t&i 3   t&i 4   t&i 5
relevant article         y       y       y       y       y
peer-reviewed            y       y       y       y       y
publication date         y       y       n       n       y
color photos             y       y       y       y       y
mla citation             y       y       y       n       y
time on task (minutes)   6:33    2:46    3:00    4:50    3:24
average time on task: 4:06 mins.

as in group 1, however, all but one of the participants who received the text-and-image tutorial first attempted to download the mla citation by clicking on the "mla" link, rather than simply copying the text. two of the participants referred back to the tutorials after they had begun the task, which was permissible according to the facilitator's instructions. this suggests that the text-and-image tutorials are suitable for quick reference and allow users to access needed information at a glance.

table 7. task completion success and time, a/v tutorial, group 2

                         video 1   video 2   video 3   video 4   video 5
relevant article         y         y         y         y         y
peer-reviewed            y         y         y         y         n
publication date         n         n         n         y         n
color photos             y         y         n         y         y
mla citation             y         y         n         y         y
time on task (minutes)   4:13      5:39      6:33      3:59      4:40
average time on task: 4:57 mins.

among the five participants who received the dynamic audio/visual tutorial, only one completed all five components of the test successfully. one was unable to locate the citation feature, while another failed to limit to peer-reviewed articles. four of the participants limited the publication date from 2011 to the present instead of 2012 to the present. all participants correctly used the publication limiter. although given the option, none chose to return to the dynamic tutorial after starting the task. this might be because of the length of the tutorial (more than three minutes) and the difficulty in navigating to specific sections.

as noted above, participants in all groups tended to make errors related to publication date, which may have stemmed from the wording of the task itself rather than misunderstanding the functionality of the database. the task required participants to find articles published after 2011, but many found articles published from 2011 onward. clearer wording of the task probably would have alleviated this problem.

figure 3. average time on task in minutes, group 2

figure 4. successful task completion, group 2
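a small illustrative sketch (not part of the original study) of how the per-group summaries reported above, average time on task and the share of participants completing every component, can be recomputed from raw results; the sample values are transcribed from table 5.

```python
def mmss_to_seconds(t: str) -> int:
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)

def summarize(times, successes):
    """average time on task (formatted m:ss) and the percentage of participants
    who completed every task component."""
    avg = sum(mmss_to_seconds(t) for t in times) / len(times)
    full_completions = 100 * sum(all(s) for s in successes) / len(successes)
    return f"{int(avg // 60)}:{int(avg % 60):02d}", full_completions

# control group 2, transcribed from table 5 (y = True, n = False); components in the
# order relevant article, peer-reviewed, publication date, color photos, mla citation
times = ["1:51", "7:39", "2:54", "9:16", "7:55"]
successes = [
    (True, True, True, True, True),
    (False, False, False, False, False),
    (True, True, False, True, True),
    (True, False, True, True, False),
    (True, True, True, True, True),
]
print(summarize(times, successes))  # -> ('5:55', 40.0)
```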
tutorial feedback

after completing the task, participants were asked to provide anonymous, written feedback on the instruction they received. (members of the control groups were not asked to provide feedback because the purpose of the study was to compare different types of library tutorials.) participants were asked ten questions, eight of which were on a likert scale and two of which were open-ended. although the feedback for both the static and dynamic tutorials was generally positive, the text-and-image tutorials received higher combined scores than the audio/visual tutorials on the likert-scale questions (see figures 5 and 6). participants' written feedback on the text-and-image tutorials was generally more positive than for the video tutorials. commenting on the text-and-image tutorial, one participant remarked that it was a "great resource," while another said that it was "very easy to use. will become really helpful when put into full effect." another observed that the tutorial "was pretty precise." not all the comments on the text-and-image tutorials were positive, however. more than one participant noted that the images used in the tutorials were blurry. one even suggested that "more animations to the text would make it much more open to people with different learning styles." the feedback on the video tutorials was generally positive, with comments such as "very straightforward," "helpful," "easy to follow," and "i would use this for school assignments." however, a common complaint about the dynamic tutorials was that the audio was not very clear. (this may be because of the quality of the microphone used for the recordings.) other participants seemed to criticize the layout of the database itself, saying that a bigger word size would have made it easier to follow. another complained that the dynamic tutorial was too simple, and that it should cover more advanced and in-depth topics.

figure 5. tutorial feedback likert score averages, group 1

figure 6. tutorial feedback likert score averages, group 2

conclusion

this study suggests that library users benefit from online library instruction at point-of-need, and that text-and-image tutorials are more effective than dynamic audio/visual tutorials for its provision. librarians should not assume that instructional tutorials must use flash or other video technology, especially given the learning curve, time, and financial commitments involved in creating video tutorial software. although the researchers in this study used the free software jing, learning to use it effectively was still a significant investment in time. more importantly, it is evident that the participants learned more and were more satisfied with text-and-image tutorials, which were more easily navigated and which allowed users to review tutorial content more easily than did the dynamic audio/video tutorials. this study corroborates the findings of mestre, who found that text-and-image tutorials were more effective than audio/video tutorials in teaching library skills.37 it also lends credence to the work of bowles-terry, hensley, and hinchliffe, who found that users preferred tutorials that allowed them to read quickly and navigate to pertinent sections rather than watch and listen.38 as lim suggests, it is important to create instructional material that is clearly written.39 this study further suggests that regardless of the technology used, librarians should focus on creating content that is relevant and helpful to our user population.
again, it is worth noting that the control group, without the aid of point-of-need instructional materials, achieved some success in completing the tasks. it is possible that the members of the control group gained important knowledge simply by being told about asp ebsco and that there was enough implied information in the tasks themselves to provide basic information about the content and functionalities of the database. this suggests that databases like asp ebsco are intuitive enough that people can learn how to use them independently. the higher number of serious errors, and the greater length of time members of the control group spent on tasks, however, shows that efforts to raise student awareness of databases and library resources should be coupled with point-of-need instruction. although the usability tests generally went smoothly, researchers did encounter occasional difficulties with the audio between the testing room and the observation room, when it became difficult to hear what the participant was saying as he or she was completing the task. fortunately, the researchers kept recordings of each test, which allowed them to review those where the audio quality was less than optimal. to save time and run the tests more efficiently, however, the researchers recommend purchasing a high-quality microphone like those used for teleconferences. furthermore, this study shows the broader value of usability testing of library instructional material. although participants who received the text-and-image tutorials performed better than either of the other two groups, the tests helped researchers identify two problems with the tutorials: users found the images blurry and often misinterpreted how to download citations in mla format. such information gleaned from the user's perspective would be valuable in creating future library online point-of-need instructional tutorials.

references

1. i. elaine allen and jeff seaman, "grade change: tracking online education in the united states, 2013," sloanconsortium.org, 2013, sloanconsortium.org/publications/survey/grade-change-2013.
2. m. walter, "as online journals advance, new challenges emerge," seybold report on internet publishing 3, no. 1 (1998).
3. rebecca miller, "dramatic growth," library journal 136, no. 17 (october 15, 2011): 32, www.thedigitalshift.com/2011/10/ebooks/dramatic-growth-ljs-second-annual-ebook-survey.
4. megan von isenburg, "undergraduate student use of the physical and virtual library varies according to academic discipline," evidence based library & information practice 5, no. 1 (april 2010): 130.
5. sharon q. yang and min chou, "promoting and teaching information literacy on the internet: surveying the web sites of 264 academic libraries in north america," journal of web librarianship 8, no. 1 (2014): 88–104, doi: 10.1080/19322909.2014.855586.
6. tess tobin and martin kesselman, "evaluation of web-based library instruction programs," www.eric.ed.gov/ericwebportal/contentdelivery/servlet/ericservlet?accno=ed441454.
7. nancy h. dewald, "transporting good library instruction practices into the web environment: an analysis of online tutorials," journal of academic librarianship 25, no. 1 (january 1999): 26–31.
8. ibid. 9. nancy h. dewald, “web-based library instruction: what is good pedagogy?,” information technology & libraries 18, no. 1 (march 1999): 26–31. 10. kelly a. donaldson, “library research success: designing an online tutorial to teach information literacy skills to first-year students,” internet & higher education 2, no. 4 (january 2, 1999): 237–51, doi: 10.1016/s1096-7516(00)00025-7. 11. marion churkovich and christine oughtred, “can an online tutorial pass the test for library instruction? an evaluation and comparison of library skills instruction methods for first year students at deakin university,” australian academic & research libraries 33, no. 1 (march 2002): 25–38. 12. stephanie michel, “what do they really think? assessing student and faculty perspectives of a web-based tutorial to library research,” college & research libraries 62, no. 4 (july 2001): 317–32. 13. brenda battleson, austin booth, and jane weintrop, “usability testing of an academic library web site: a case study,” journal of academic librarianship 27, no. 3 (may 2001): 194. 14. barbara j. cockrell and elaine anderson jayne, “how do i find an article? insights from a web usability study,” journal of academic librarianship 28, no. 3 (may 2002): 122–32, doi: 10.1016/s0099-1333(02)00279-3. 15. brian detlor and vivian lewis, “academic library web sites: current practice and future directions,” journal of academic librarianship 32, no. 3 (may 2006): 251–58, doi: 10.1016/j.acalib.2006.02.007. 16. adriene lim, “the readability of information literacy content on academic library web sites,” journal of academic librarianship 36, no. 4 (july 2010): 296–303, doi: 10.1016/j.acalib.2010.05.003. 17. janice krueger, ron l. ray, and lorrie knight, “applying web usability techniques to assess student awareness of library web resources,” journal of academic librarianship 30, no. 4 (july 2004): 285–93, doi: 10.1016/j.acalib.2004.04.002. 18. amy fry and linda rich, “usability testing for e-resource discovery: how students find and choose e-resources using library web sites,” journal of academic librarianship 37, no. 5 (september 2011): 386–401, doi: 10.1016/j.acalib.2011.06.003. 19. rebeca befus and katrina byrne, “redesigned with them in mind: evaluating an online library information literacy tutorial,” urban library journal 17, no. 1 (spring 2011): 1–26. 20. donald l. kirkpatrick, evaluating training programs: the four levels (san francisco: berrett-koehler; publishers group west [distributor], 1994). 21. rebeca befus and katrina byrne, “redesigned with them in mind: evaluating an online library information literacy tutorial,” urban library journal 17, no. 1 (spring 2011): 1–26. 22. sharon a. 
weiner et al., “biology and nursing students’ perceptions of a web-based information literacy tutorial,” communications in information literacy 5, no. 2 (september 2011): 187–201. 23. janet martin, jane birks, and fiona hunt, “designing for users: online information literacy in the middle east,” portal: libraries & the academy 10, no. 1 (january 2010): 57–73. 24. melissa bowles-terry, merinda kaye hensley, and lisa janicke hinchliffe, “best practices for online video tutorials in academic libraries: a study of student preferences and understanding,” communications in information literacy 4, no. 1 (march 2010): 17–28. 25. paul betty, “creation, management, and assessment of library screencasts: the regis libraries animated tutorials project,” journal of library administration 48, no. 3/4 (october 2008): 295–315 (special issue on the proceedings of the thirteenth off-campus library services conference, part 1), doi: 10.1080/01930820802289342. 26. lori s. mestre, “matching up learning styles with learning objects: what’s effective?,” journal of library administration 50, no. 7/8 (december 2010): 808–29, doi: 10.1080/01930826.2010.488975. 27. sara l. thornes, “creating an online tutorial to support information literacy and academic skills development,” journal of information literacy 6, no. 1 (june 2012): 81–95. 28. richard d. jones and simon bains, “using macromedia flash to create online information skills materials at edinburgh university library,” electronic library & information systems 37, no. 4 (december 2003): 242–50, www.era.lib.ed.ac.uk/handle/1842/248. 29. thomas p. mackey and jinwon ho, “exploring the relationships between web usability and students’ perceived learning in web-based multimedia (wbmm) tutorials,” computers & education 50, no. 1 (january 2008): 386–409. 30. amy blevins and c. w. elton, “an evaluation of three tutorial-creating software programs: camtasia, powerpoint, and mediasite,” journal of electronic resources in medical libraries 6, no. 1 (march 2009): 1–7, doi: 10.1080/15424060802705095. 31. ibid., 2. 32. ibid. 33. alyse ergood, kristy padron, and lauri rebar, “making library screencast tutorials: factors and processes,” internet reference services quarterly 17, no. 2 (april 2012): 95–107, doi: 10.1080/10875301.2012.725705. 34. lori s. mestre, “student preference for tutorial design: a usability study,” reference services review 40, no. 2 (may 2012): 258–76, http://dx.doi.org/10.1108/00907321211228318. 35. steve krug, rocket surgery made easy: the do-it-yourself guide to finding and fixing usability problems (berkeley, ca: new riders, 2010), 13. 36. jakob nielsen, “why you only need to test with 5 users,” nielsen norman group, march 19, 2000, www.nngroup.com/articles/why-you-only-need-to-test-with-5-users. 37. mestre, “student preference for tutorial design,” 258. 38. bowles-terry, hensley, and hinchliffe, “best practices for online video tutorials in academic libraries,” 22. 39. lim, “the readability of information literacy content on academic library web sites,” 302. appendix i. task 1 in academic search premier (ebsco), find an article about climate change, published in lancet. then copy a citation to the article in mla format. appendix ii. task 2 complete the following task using academic search premier (ebsco). take as long as you need. 
remember also to “think out loud” through the process. a) find an article about the deepwater horizon oil spill published in a peer-reviewed journal after 2011 that includes color photographs. b) after you find an article, copy its citation in mla format. appendix iii. dynamic audio/video tutorials group 1 (basic): http://screencast.com/t/5uln4h8xr group 2 (advanced): http://screencast.com/t/c9kzkgofx6 appendix iv. text-and-image tutorial 1 appendix v. text-and-image tutorial 2 appendix vi. information sheet st. john’s university libraries web site usability study information sheet thank you for participating in the sj libraries’ usability study! before beginning the test, please read the following: • the computer screen, your voice, and the voice of the facilitator will be recorded. • the results of this study may be published in an article, but no identifying information will be included in the article. • your participation in this study is totally confidential. • you may stop participating in the study at any time, and for any reason. appendix vii. tutorial questionnaire thank you for participating in the st. john’s university libraries’ tutorial usability study. please take a few moments to answer this brief survey. please refer to the following scale when answering the questionnaire, and circle the correct response. 1 = no, not at all 2 = not likely 3 = neutral (not sure, maybe) 4 = likely 5 = yes, absolutely 1. the tutorial was easy to follow. 1 2 3 4 5 2. i felt comfortable using the tutorial. 1 2 3 4 5 3. the graphics on the tutorial were easy to use. 1 2 3 4 5 4. the language/text on the tutorial was easy to understand. 1 2 3 4 5 5. i would use stj libraries’ tutorials on my own in the future. 1 2 3 4 5 6. i would recommend the stj libraries’ tutorials to my friends. 1 2 3 4 5 7. i was able to complete the tasks with ease. 1 2 3 4 5 8. i would be able to repeat the task now without the aid of the tutorial. 1 2 3 4 5 9. what changes would you make to the tutorial? additional comments and suggestions? in the name of the name: rdf literals, er attributes, and the potential to rethink the structures and visualizations of catalogs manolis peponakis abstract the aim of this study is to contribute to the field of machine-processable bibliographic data that is suitable for the semantic web. we examine the entity relationship (er) model, which has been selected by ifla as a “conceptual framework” in order to model the fr family (frbr, frad, and rda), and the problems er causes as we move towards the semantic web. subsequently, while maintaining the semantics of the aforementioned standards but rejecting the er as a conceptual framework for bibliographic data, this paper builds on the rdf (resource description framework) potential and documents how both the rdf and linked data’s rationale can affect the way we model bibliographic data. 
in this way, a new approach to bibliographic data emerges where the distinction between description and authorities is obsolete. instead, the integration of the authorities with descriptive information becomes fundamental so that a network of correlations can be established between the entities and the names by which the entities are known. naming is a vital issue for human cultures because names are not random sequences of characters or sounds that stand just as identifiers for the entities—they also have socio-cultural meanings and interpretations. thus, instead of describing indivisible resources, we could describe entities that appear in a variety of names on various resources. in this study, a method is proposed to connect the names with the entities they represent and, in this way, to document the provenance of these names by connecting specific resources with specific names. introduction the basic aim of this study is to contribute to the field of machine-processable bibliographic data. as to what constitutes “machine processable” we concur with the clarification of antoniou and van harmelen, who state, “in the literature the term machine-understandable is used quite often. we believe it is the wrong word because it gives the wrong impression. it is not necessary for intelligent agents to understand information; it is sufficient for them to process information effectively, which sometimes causes people to think the machine really understands.”1 also, in the bibliography used, the term “computationally processable” is used as a synonym for “machine-processable.” manolis peponakis (epepo@ekt.gr) is an information scientist at the national documentation centre, national hellenic research foundation, athens, greece. with regard to machine-processable bibliographic data, we have taken into consideration both the practice and theory of library and information science (lis) and computer science. from lis we have chosen the functional requirements for bibliographic records (frbr) and the functional requirements for authority data (frad) while making comparisons with the resource description and access (rda) standard. from the computer science domain we have chosen the resource description framework (rdf) as a basic mechanism for the semantic web. we examine the entity relationship (er) model (selected by ifla as a “conceptual framework” for the development of frbr),2 as well as the potential problems that may arise as we move towards the semantic web. having rejected the er model as a conceptual framework for bibliographic data, we have built on the potential of rdf and document how its rationale affects the modeling process. in the context of the semantic web and uniform resource identifiers (uris), the identification process has been transformed. for this reason we have performed an analysis of appellations and names as identifiers and also explored how we could move on from an era where controlled names play the role of identifiers to one where uris dominate: “while it is self-evident that labels and comments are important for constructing and using ontologies by humans, the owl standard does not pay much attention to them. the standard focuses on the syntax, structure and reasoning capabilities. . . . 
if the semantic web is to be queried by humans, there will be no other way than dealing with the ambiguousness of human language.”3 it is essential to build on the “library’s signature service, its catalog,”4 and use it to provide added-value services. but to get there, first there has to be “a shift in perspective, from locked-up databases of records to open data shared on the web.”5 this requires a transition from descriptions aimed at human readers to descriptions that put the emphasis on computational processes, escaping the rationale of records as condensed textual descriptions and moving towards more flexible and fruitful representations and visualizations. background frbr and rda the fr family has been growing for more than a decade. the first member of the family was the functional requirements for bibliographic records (frbr),6 the first version of which was published towards the end of the last century. subsequently, ifla decided to extend the model in order to cover authorities. during this process, the task of modeling the names was separated from the task of modeling the subjects. thus two new members were added to the family: the “functional requirements for authority data: a conceptual model” (frad) and the “functional requirements for subject authority data” (frsad).7,8 during the same period, the “resource description and access” (rda) standard was established as a set of cataloging rules to replace the aacr standard. according to its creators, the alignment with the fr family was crucial. as stated, “a key element in the design of rda is its alignment with the conceptual models for bibliographic and authority data developed by the international federation of library associations and institutions (ifla): functional requirements for bibliographic records [and] functional requirements for authority data.”9 this paper uses the fr family and the rda as a starting point but detects some problems and inconsistencies between these models. it sustains the basic semantics from these standards but rejects their structural formalism because it is quite problematic and lacks effectiveness in expressing highly machine-processable data. the effective processability of the data will be discussed in detail in the section “the impact of the representation scheme’s selection: rdf versus er.” among the fr family, the terminology is inconsistent and, as we pass from frbr to frad and frsad, even the perception angle of the general model undergoes change. in frbr (the first in order), there is no notion of the name as an entity. frad introduces this perception (frad also adds family as a new entity) and frsad goes a step further and introduces the concept of nomen instead of the concept of name. hence, despite the fact that each of the members of the fr family of models has been represented in rdf,10 there is no established consolidated edition yet that combines the different angles using a common model and terminology (vocabulary).11 these representations (one for each model) are available at ifla’s website.12 on the other hand, in the context of rda there may be more consistency regarding terminology, but, as is well established in the relevant literature, there are significant differences between the two models, i.e. 
the fr family and rda.13,14,15 due to these differences, there are no uris, not even in the rda registry, in the examples of our study.16 given the above, the terms appearing in the figures are a selection from the three texts of the fr family. thus, nomen (from frsad) is used instead of name (from frad) as a more abstract notion, and the attribute—property in the context of rdf—“has string” (from frad) is used to assign a specific literal to a nomen. in figures 2–5 we have used the “has appellation” (reversed “is appellation of”) relationship of frad.17 notes about terminology and graphs: how to read the figures in this paper two different sorts of figures appear. this covers the need to compare two different models and pinpoint the differences between them and the problems that arise from selecting the er model to express frbr. an explanation of the two major models follows in the next subsection. the first figure type follows the diagrams of the entity–relationship model and is used in figure 1. in this case: • the rectangles represent entities. • the oval shapes represent attributes. • the diamond-shaped boxes represent relationships. the second figure type has been created according to the rdf graphical representations and is used in figures 2–5. in these cases: • the oval shapes represent nodes that are identified by a uri and they could serve as objects or subjects for further expansion of the network. in figures 3–5 all the names were derived from the fr entities. • the line connectors between nodes represent the predicates (i.e., they are properties) and should also serve as uris. • the rectangle shapes represent literals consisting of a lexical form; a language code could apply in these cases. with or without language codes, these are the end points and they cannot be the subject of new connections. we follow the common modeling of language in rdf in which the literal itself contains a language code, for example “example”@en in standard turtle syntax, or in rdf/xml coding. we must note that this kind of modeling is quite a simplistic way of language modeling because there is no mechanism to declare more information about language, such as multiple scripts, which could apply in the context of the same language. the impact of the representation scheme’s selection: rdf versus er nowadays, all the information on library catalogs is created through and stored in computers. this technological infrastructure provides specific methods and dictates limitations for the catalog’s data management. hence, every model must take into consideration the basic rationale of the technological infrastructure that will curate and process the data. depending on the syntax capabilities of the representation model, expressing what we want to express becomes more or less easy and accurate, since “semantics is always going to have a close relationship with the field of syntax.”18 this establishes a vital relationship between what we want to do and how computers can do it. in this section we emphasize the limitations of the entity relationship (er) implementation, which frbr proposes, and show how syntax affects expressiveness and, accordingly, functionality. finally, we demonstrate how the selection of one implementation or another (in our case er vs. rdf) has serious implications, both for cataloging rules and for cataloging practice. 
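to make these rdf conventions concrete before comparing the two models, the following is a minimal sketch using python and the rdflib library; the example.org uris and the ex:hasCreator property are illustrative assumptions, not terms from the fr family or rda vocabularies.

```python
# a minimal sketch of the graph conventions described above: oval nodes
# become uris, connectors become uri-identified properties, and rectangles
# become plain literals that may carry a language code and cannot be the
# subject of further statements. all example.org names are assumptions.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/")

g = Graph()
g.bind("ex", EX)

# a chain of uri-identified nodes: work -> creator -> person
g.add((EX.work1, EX.hasCreator, EX.personX))

# a language-tagged literal terminates the chain (an end point)
g.add((EX.personX, RDFS.label, Literal("john smith", lang="en")))

print(g.serialize(format="turtle"))  # rdflib 6.x returns a str
```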
why do we compare these two specific models? the er model is the base that has been selected by ifla as a “conceptual framework”19 for the development of frbr, while frbr is the conceptual model upon which rda has been founded. subsequently, rda is also affected by the choice of the er model. on the other hand, rdf is the current conceptualization for resource description in the web of data. so, what kind of problems and conflicts arise from the implementations of each of these models? the basic rationale of er comprises three fundamental elements. there are entities; entities have attributes; and there are relationships between entities. it is also possible to declare cardinality constraints, upon which the fr family builds. then again, rdf implies quite a different model. “the core structure of the abstract syntax is a set of triples, each consisting of a subject, a predicate and an object. a set of such triples is called an rdf graph. an rdf graph can be visualized as a node and directed-arc diagram, in which each triple is represented as a node-arc-node link. . . . there can be three kinds of nodes in an rdf graph: iris, literals, and blank nodes.”20 “linking the object of one statement to the subject of another, via uris, results in a chain of linked statements, or linked data. this avoids the ambiguity of using natural language strings as headings to match statements. as a result, a literal object terminates a linked data chain, and literals are generally used for human-readable display data such as labels, notes, names, and so on.”21 as a representative example of the differences between the two models, let us consider “place of publication.” peponakis counts nine attributes of place and notices that, due to the fact that the er model does not allow links between attributes, there is no way to define explicitly whether these attributes address the same place or not.22 taking this problem into consideration, we demonstrate the transition from the er attributes approach to rdf implementations in figures 1–2. let us assume that there is person (x), who was born in london, is named john smith, and works at publisher (y). this publisher is located in london, where book (1), entitled history of london, has been published. for this specific book, person x was the lithographer. if we create a strict mapping to frbr entities, attributes, and relations, then we have the situation illustrated in figure 1. due to the fact that there is no way to link the four occurrences of london (inasmuch as there is no option to define relations between attributes in the er model), there is no way to be certain that london is the same in all cases. judging only by the name, it could stand for london in england, in ontario, in ohio, or elsewhere. figure 1. example of “place” as attribute of several entities the ifla working group has faced the problem with place and noted the following. the model does not, however, parallel entity relationships with attributes in all cases where such parallels could be drawn. for example, “place of publication/distribution” is defined as an attribute of the manifestation to reflect the statement appearing in the manifestation itself that indicates where it was published. 
inasmuch as the model also defines place as an entity it would have been possible to define an additional relationship linking the entity place either directly to the manifestation or indirectly through the entities person and corporate body which in turn are linked through the production relationship to the manifestation. to produce a fully developed data model further definition of that kind would be appropriate. but for the purposes of this study it was deemed unnecessary to have the conceptual model reflect all such possibilities.23 finally, they seem to avoid the problem and repeat their position in frad as well. in certain instances, the model treats an association between one entity and another simply as an attribute of the first entity. for example, the association between a person and the place in which the person was born could be expressed logically by defining a relationship (“born in”) between person and place. however, for the purposes of this study, it was deemed sufficient to treat place of birth simply as an attribute of person.24 for some reason the creators of the fr family have chosen not to “upgrade” the attributes of place into one and only one entity. furthermore, the same problem exists for many attributes, not only for place. thus, the problem has to do with the selection of er as “conceptual framework” and not with the specific entity of place. if we accept that “place of publication” must not be recorded as it appears on the resource, an rdf-based approach makes things clearer, as figure 2 shows. in this case, all attributes of place are promoted to the same rdf node and, instead of four repeats of the attribute with the value “london,” we reduce it to one and only one node with four connections to it. then, as illustrated by figure 2, we can be sure that all instances refer to the same london. figure 2. rdf-based representations of figure 1 in figure 2, it is assumed that there is no need to transcribe the literal of “place of publication” from the resource; i.e., we did not follow rule 2.8.1.4 of rda: “transcribe places of publication and publishers’ names as they appear on the source of information.” for cataloging rules that require the place to be recorded as it appears on the resource, readers can consult the subsection “place names” in this study. last but not least, rdf has another significant advantage compared to the er model: data coded in rdf are packed ready for use in the semantic web. on the contrary, data coded in er must undergo conversion—with all its implications—in order to be published in the semantic web. names, entities, and identities in this section, the significance of names as carriers of meaning is outlined and the importance of documenting the relations of names with the entities and identities they refer to is established. additionally, the basic approaches are presented for metadata generation for managing names. these approaches resulted in the distinction (dissociation) of authorities from the bibliographic records, which in turn led (both frbr/frad and rda) to the lack of potentially linking—in an explicit way—the entity with the names it goes by. this linking, as it is presented later in this text, is fundamental for the description and interpretation of the entity. 
in everyday communication, the usage of a name in a sentence plays the role of the identifier for the entity that this specific name indicates. if the speakers share a common background, there is no need for qualifiers other than the name in order to disambiguate information such as whether nick is person x or person y, or if the word “london” indicates the city in ohio or in england, etc. thus, the common background leads to a very limited context in which the interpretation of the name and the assignment to the appropriate entity is sufficient and accurate. however, the context of the internet is extended into a variety of possibilities, so there is need of a more precise way to identify specific entities. in this regard, a very essential issue is the distinction between the properties of the name and the properties of the entity that is represented by the specific name. the word “john” could be recognized as an english name, but we jump to a logical flaw if we assume that john knows english. a representative example of this kind of inference (syllogism) can be found in rayside and campbell.25 statement: “man is a species of animal. socrates is a man. therefore, socrates is a species of animal. . . . ‘man' is a three-lettered word. socrates is a man. therefore, socrates is a three-lettered word.” therefore the authorities of a catalog should embody a two-level modeling of the information they represent. the first has to do with the entities and the second with the names of these entities. consequently, there is the need to find a way to pass from names to the entities they indicate; and, from entities, to the various appellations that these entities have. in the name of the name: rdf literals, er attributes, and the potential to rethink the structures and visualizations of catalogs | peponakis |doi:10.6017/ital.v35i2.8749 26 in catalogs, it is kind of vague whether the change of a name signifies a new identity. niu states: “for example: the maiden name and the married name of an agent are normally not considered two separate identities, yet one pseudonym used for writing fiction and another pseudonym used for writing scientific works are often considered two different identities of an agent.”26 then there can be one individual with many identities. but there can also be one identity which incorporates many individuals: for example, a shared pseudonym for a group of authors. to deal with these problems, frad introduces the notion of persona, rejecting at the same time the idea that a person is equal to an individual. frad defines a person as an “individual or a persona or identity established or adopted by an individual or group.”27 the question that arises here is when the persona must be conceived as a new identity. yet, frad does not make a sufficient judgment; instead, they refer to cataloguing rules. “under some cataloguing rules, for example, authors are uniformly viewed as real individuals, and consequently specific instances of the bibliographic entity person always correspond to individuals. under other cataloguing rules, however, authors may be viewed in certain circumstances as establishing more than one bibliographic identity, and in that case a specific instance of the bibliographic entity person may correspond to a persona adopted by an individual rather than to the individual per se.”28 so there is no specific guidance if, for example, in the case of “religious relationship,”29 there must be one identity created with two alternative names or two different identities. 
rule 9.2.2.8 in rda does not elaborate further. still, even with the problem of identities solved, the matter of appellations itself could be extremely complicated, and this is widely addressed in the relevant literature.30,31,32 the viaf project confirms this with its very large data set.33 assigning all appellations as attributes is an easy way to model the variants of a name, but it is very simplistic because it “does not allow these appellations to have attributes of their own and neither does it allow the establishing of relationships among the appellations. . . . frad makes a big step forward: all appellations are defined as entities in their own right, thus allowing full modeling.”34 of course, frad’s approach is not a novelty in the domain of lis since library catalogs have been modeling names since the era of marc. in unimarc authorities,35 the control subfield $5 contains a coded value to indicate the relations between the names with values such as “k = name before the marriage,” “i = name in religion,” “d = acronym,” etc., and in marc 21 there is the corresponding subfield $w.36 frad puts these values on a more consistent and abstract level. frad also defines “relationships between persons, families, corporate bodies, and works” in section 5.3 and “relationships between their various names” in section 5.4.37 the distinction between authorities and descriptive information since the days of card catalogs and for as long as marc and aacr have been used, bibliographic records have been grounded in the dichotomy between descriptive information and control access points. the various types of headings stand for control access points. the original purpose of headings was alphabetical sorting. with the advent of computers, they were used as string identifiers to cluster and retrieve relevant bibliographic records. these bibliographic records had a body of descriptive information that was transcribed from the resource and remained unchanged. so the headings were the keys to the records and the records were surrogates for documents. “the elements of a bibliographic record . . . were designed to be read and comprehended by human beings, not by machines”38; established headings are not an exception. one of their basic characteristics was the precondition that they were unique in the context of a specific catalog, thereby avoiding ambiguity. in every case of synonymy, qualifiers (such as date of birth or profession) were added to disambiguate, while the names also played the role of a unique identifier. from this process, an issue emerges: the information that appears on the document has changed and the controlled name may be completely different from the name on the resource. this means that the cataloger performs a transformation of the information, and this transformation carries two dangers. first, by changing the name, there is the possibility of assigning the entity behind the name to a wrong entity. second, by disturbing the correspondence between the information on the resource and the information on the record of the resource, the record becomes a problematic surrogate of the resource. to surpass this obstacle, traditional catalogs split the information into two different areas: one with the established forms, i.e., the headings; and the second with the purely descriptive information, i.e., the information that must be transcribed from the resource. 
this is the reason why traditional library catalogs put much effort into transcribing information from resources and very detailed guidelines have been developed. on the other hand, current approaches to metadata creation (such as dublin core) seem to underestimate the importance of descriptive information while concentrating on the established forms of names. but how can we be sure that different literals communicate the same meaning? does this kind of simplification, perhaps, cause problems regarding the integrity of the information? names are not just sequences of characters (i.e., strings), but they carry latent information. it is known that there are women who wrote using male names (for example mary ann evans wrote as george eliot) and men who wrote using female names. there are also nicknames for groups (e.g., “richard henry” is a pseudonym for the collaborative works of richard butler and henry chance newton), etc. therefore, it is important not to ignore names and the forms in which they appear on the resources, but to model them in such a way that integration between authorities and descriptive information is feasible, and the names are efficiently machine-processable. integrating authorities with descriptive information as we have already stated, traditional library catalogs are built on the dichotomy between description and access points. this analysis aims to bring descriptive information and authorities closer, i.e., to connect the access points of catalogs with the description of the resource. the basic principle of the model presented in this section is to promote each verbal (lexical) representation of a name to a nomen, whether this form of the name derives from a controlled vocabulary or not. in the cases where this form appears in a specific vocabulary, appropriate properties could be used to indicate such a relation. in this section, some representative examples are presented. it is important to note, once again, that every node and relation in the following figures could (and must, in the context of the semantic web) be identified by a uri, except for the values in rectangles, which are rdf simple literals and therefore cannot be the subjects of further expansion. thus, the concatenation is the following: every individual (instance of the relevant class) acquires a uri. every individual is connected through the “has appellation” property (which acquires a uri) to a nomen (which also acquires a uri), and these nomens end up connected to a plain rdf literal, which is in natural language wording and cannot be subjected to further analysis. place names the problem of place as an attribute in frbr and frad has also been analyzed in the background analysis of the current paper, specifically in the subsection “the impact of the representation scheme’s selection: rdf versus er.” here, a solution to this problem that is compatible with the frbr/rda solution is proposed. by promoting every nomen of a place to an rdf node, there is the option of referring to the entity of place as a whole or to a specific appellation of this entity. so, the relation (property in the context of rdf) between the subjects of a work could be indicated by connecting work x with place z. 
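a minimal sketch of this pattern, again in python with rdflib, is given below; the properties ex:hasAppellation and ex:hasString loosely mirror frad’s “has appellation” and “has string” and, like the example.org uris, are illustrative assumptions rather than published vocabulary terms.

```python
# a minimal sketch of promoting every nomen of a place to an rdf node: the
# work is linked to the place entity itself, while the manifestation is
# linked to the specific nomen transcribed from the resource. the properties
# loosely mirror frad's "has appellation" and "has string"; all uris and
# property names here are illustrative assumptions.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

g = Graph()
g.bind("ex", EX)

# the place entity (place z) and two of its nomens
g.add((EX.placeZ, EX.hasAppellation, EX.nomen1))
g.add((EX.nomen1, EX.hasString, Literal("london", lang="en")))
g.add((EX.placeZ, EX.hasAppellation, EX.nomen2))
g.add((EX.nomen2, EX.hasString, Literal("londres", lang="fr")))

# the work is about the place as an entity, independent of any name form
g.add((EX.workX, EX.hasSubject, EX.placeZ))

# the manifestation is linked to the nomen that appears on the resource,
# documenting the provenance of that particular name form
g.add((EX.manifestation1, EX.placeOfPublication, EX.nomen1))
```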
on the other hand, according to rule 2.8.1.4 of rda, the place of publication for the manifestation must be transcribed as it appears on the source of information. but following the connections presented in figure 3, it is easy to assume that this specific nomen corresponds to the same entity, i.e., to the same place. figure 3. place personal names in the section “names, entities, and identities,” we analyzed many of the problems associated with personal names. here, a model is presented where the work (and expression) is connected directly with the author, whereas the manifestation is connected with a specific appellation, i.e., nomen, of this author. figure 4. statements of responsibility rda rule 2.4.1.4 states, “transcribe a statement of responsibility as it appears on the source of information.” but occasionally the statement of responsibility may contain phrases and not just names. in these cases, a solution similar to the metadata object description schema (mods) could be implemented where, if needed, the statement of responsibility is included in the note element using the attribute type="statement of responsibility." titles the management of titles in frbr and rda indicates a different point of view between the two standards. according to rda there is no title for the expression,39 and, as taniguchi states, this is a “significant difference between frbr and rda.”40 bibframe abides by the same principle of downgrading expression, since it entangles expression with work in an indivisible unit. in this regard, bibframe is closer to rda than to frbr. the notion of work has nothing to do with specific languages, even in the case when the work is a written text. therefore the assignment of the title of work to a specific appellation is an unnecessary limitation. on the contrary, the title of a manifestation is derived from a specific resource. we argue that between these two poles there is the title of expression, which could stand as a uniform title per language. figure 5. titles visualization of bibliographic records and cataloging rules resource description in the domain of lis—from cutter’s era to the present day—emphasizes static linear textual representations. according to the rda “0.1 key features,” “in rda, there is a clear line of separation between the guidelines and instructions on recording data and those on the presentation of data. this separation has been established in order to optimize flexibility in the storage and display of the data produced using rda. guidelines and instructions on recording data are covered in chapters 1 through 37; those on the presentation of data are covered in appendices d and e.” but the tables in the relevant appendices (d and e) contain guidelines that are mainly concentrated on punctuation issues, and they do not take into consideration the dynamics of current interactive user interface capabilities. 
as coyle and hillmann comment, “there are instructions for highly structured strings that are clearly not compatible with what we think of today as machine-manipulable data.”41 it is rather like producing high-tech cards: rda is faithful to the classical text-centric approaches that produce bibliographic records as a linear enumeration of attributes; thus, rda can be likened to a new suit that is quite old fashioned. traditional catalogs (from card catalogs to opacs and repository catalogs) were built upon the principle of creating autonomous records. frbr set this principle, i.e., one record for each resource, under dispute, while linked data abolishes it. this way, a gigantic graph of statements is created, while a certain part of these statements (not always the same) responds to or describes the desired information. thus, a more sophisticated method for showing the results emerges, if it does not impose itself. therefore, the issue is not to present a record that describes a specific resource, since this conceptualization tends to be obsolete altogether. consequently, the visualization has to be different, depending on the data structure as well as the available interface of the searcher. in this context, the analysis of this study tries to keep in balance the machine-processable character of rdf that builds on identifiers (uris), while paying attention to the linguistic representation of entities. we argue that the balance between them will result in highly accurate and efficient representations for both humans and software agents. let us consider the model for titles that has been introduced in this study. according to frbr, “if the work has appeared under varying titles (differing in form, language, etc.), a bibliographic agency normally selects one of those titles as the basis of a ‘uniform title’ for purposes of consistency in naming and referencing the work.”42 rda treats the case in a very similar way: rule 5.1.3 states, “the term ‘title of the work’ refers to a word, character, or group of words and/or characters by which a work is known. the term ‘preferred title for the work’ refers to the title or form of title chosen to identify the work. the preferred title is also the basis for the authorized access point representing that work.” in this study, we consider the aforementioned statements as a projection that springs from the days when records were static textual descriptions independent of interfaces. nowadays we are moving towards a much clearer distinction between the entity and its names. this is reflected in figure 5, in which the connection between a work and its author has nothing to do with specific names (appellations) but is based on uris. the selection of the appropriate name as a title for the specific work could be based on certain criteria such as the language of the interface: in this case, the title of the work will be the title in the user interface language, and if this is not possible (i.e., there is no title label in this language), then it could be the title in the catalog’s default language. following the kind of modeling proposed in the current study, the visualizations of data become more flexible and efficient in a variety of dynamic ways. 
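as one concrete illustration, here is a minimal python/rdflib sketch of such interface-driven selection, reusing the illustrative ex:hasAppellation / ex:hasString pattern sketched earlier; the uris and the pick_title helper are assumptions for illustration only, not part of the model itself.

```python
# a minimal sketch of interface-driven title selection over the illustrative
# ex:hasAppellation / ex:hasString pattern: prefer a title literal in the
# interface language, fall back to the catalog's default language, then to
# any available form. all uris are assumptions for illustration only.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

def pick_title(graph, work, ui_lang, default_lang="en"):
    """return the title literal matching ui_lang, else default_lang,
    else any available appellation string."""
    labels = [
        label
        for nomen in graph.objects(work, EX.hasAppellation)
        for label in graph.objects(nomen, EX.hasString)
    ]
    for preferred in (ui_lang, default_lang):
        for label in labels:
            if isinstance(label, Literal) and label.language == preferred:
                return label
    return labels[0] if labels else None

g = Graph()
g.add((EX.work1, EX.hasAppellation, EX.nomenEn))
g.add((EX.nomenEn, EX.hasString, Literal("the odyssey", lang="en")))
g.add((EX.work1, EX.hasAppellation, EX.nomenEl))
g.add((EX.nomenEl, EX.hasString, Literal("οδύσσεια", lang="el")))

print(pick_title(g, EX.work1, ui_lang="el"))  # -> οδύσσεια
```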
hence, we can isolate and display nodes and their connections, correlate them with the interface language or screen size (i.e., mobile phone or pc), create levels relative to the desired depth of analysis, personalize them upon the user’s request or habits, and so on. also, it becomes possible to display the data in forms other than textual. “as a result, humans, with their great visual pattern recognition skills, can comprehend data tremendously faster and more effectively through visualization than by reading the numerical or textual representation of the data.”43 in the name of the name: rdf literals, er attributes, and the potential to rethink the structures and visualizations of catalogs | peponakis |doi:10.6017/ital.v35i2.8749 32 as we have already mentioned, the syntax and the semantics are always going to have a close relationship, but it is crystal clear that, now more than ever, the current semantic web standards allow for greater flexibility. as dunsire et al. put it, the rdf approach is very different from the traditional library catalog record exemplified by marc21, where descriptions of multiple aspects of a resource are bound together by a specific syntax of tags, indicators, and subfields as a single identifiable stream of data that is manipulated as a whole. in rdf, the data must be separated out into single statements that can then be processed independently from one another; processing includes the aggregation of statements into a record-based view, but is not confined to any specific record schema or source for the data. statements or triples can be mixed and matched from many different sources to form many different kinds of user-friendly displays.44 in this framework, cataloging rules must reexamine their instructions in light of the new opportunities offered by technological advancements. discussion naming is a vital issue for human cultures. names are not random sequences of characters or sounds that stand just as identifiers for the entities, but they also have socio-cultural meanings and interpretations. recently, out of “political correctness” and fear of triggering racism, sweden changed the names of bird species that could potentially offend, such as “gypsy bird” and “negro.”45 therefore we cannot treat names just as random identifiers. in this study we examined how, instead of describing indivisible resources, we could describe entities that appear in a variety of names on various resources. we proposed a method for connecting the names to the entities they represent and, at the same time, we documented the provenance of these names by connecting specific resources with specific names. we illustrated how to establish connections between entities, connections between an entity and a specific name of another entity, as well as connections between one name and another name concerning one or two entities. in the proposed framework, we maintain the linguistic character of naming while modeling the names in a machine-processable way. this formalism allows for a high level of expressiveness and flexible descriptions that do not have a static, text-centric orientation, since the central point is not the establishment of the text values (i.e., heading) but the meaning of our statements. this study has shown that it is important to have the possibility to establish relationships both between entities and between specific appellations (nomens in the context of this study) of these entities. to achieve this we promoted every appellation to an rdf node. 
this is not something unheard of in the domain of rdf since this approach has also been adopted by w3c for the development of skos-xl.46 frbroo, which is another interpretation of increasing influence in the wider context of the fr family, adopts the same perspective.47 frbroo also gives the option to connect a specific name with a resource through the property “r64 used name (was name used by)” or to connect a name with someone who uses this specific name through the property “r63 named (was named by).” murray and tillett state that “cataloging is a process of making observations on resources”48; hence, the production of records is the result of the judgments made during this process. but in the context of traditional descriptive cataloging, the cataloger was not required to judge information in any way other than its category, i.e., to characterize whether the x set of characters corresponded to the name of an author, publisher, or place and so on. there was no obligation of assigning a particular name to a specific author, publisher, or place. in our approach, the cataloger interprets the information and supports the catalog’s potential to deliver added-value information. moreover, the initial information remains undifferentiated; hence, there is always the option of going back in order to generate new interpretations or validate existing ones. in recent years, there has been a significant increase in the attention given to multi-entity models of resource description.49 in this new environment, “the creation of one record per resource seems a deficient simplification.”50 rdf allows the transformation of universal bibliographic control to a giant global graph.51 in this manner, current approaches on resource description “cannot be considered as simple metadata describing a specific resource but more like some kind of knowledge related to the resource.”52 indeed, this knowledge can be computationally processable and exploitable. yet, to achieve this, “catalogers can only begin to work in this way if they are not held bound by the traditional definitions and conceptualizations of bibliographic records.”53 one critical issue is the isolation of parts (sets of statements) of this “giant graph” and the linking of these parts with something else; indeed, theory on this topic is starting to emerge.54 this is very essential because it allows for the creation of ad hoc clusters (i.e., the usage of a specific identity for an entity with all the names that have been assigned to this identity, in our context), which could be used as a set to link to some other entity. as a final remark, we could say that authorities manage controlled access points. in the semantic web, every uri is a controlled access point, and hence, the discrimination between description and authorities acquires a new meaning. in the context of machine-processable bibliographic data, the aim is to connect these two, i.e., the authorities with the description, and examine how one can support the other. however, since the emphasis is not on their individual management, we are drawn away from a mentality of “descriptive information versus access points” and towards one of “descriptive information as an access point.” acknowledgement the author wishes to thank henry scott, who assisted in the proofreading of the manuscript. 
references and notes 1. grigoris antoniou and frank van harmelen, a semantic web primer, 2nd ed. (cambridge, ma: mit press, 2008), 3. 2. ifla, functional requirements for bibliographic records: final report, as amended and corrected through february 2009, ifla series on bibliographic control, vol. 19 (munich: k.g. saur, 1998), 6. 3. daniel kless et al., “interoperability of knowledge organization systems with and through ontologies,” in classification & ontology: formal approaches and access to knowledge: proceedings of the international udc seminar 19–20 september 2011, the hague, the netherlands, organized by udc consortium, the hague, edited by aida slavic and edgardo civallero (würzburg: ergon, 2011), 63–64. 4. karen coyle and diane hillmann, “resource description and access (rda): cataloging rules for the 20th century,” d-lib magazine 13, no. 1/2 (january 2007): para. 2, doi:10.1045/january2007-coyle. 5. cory k. lampert and silvia b. southwick, “leading to linking: introducing linked data to academic library digital collections,” journal of library metadata 13, no. 2–3 (2013): 231, doi:10.1080/19386389.2013.826095. 6. ifla, functional requirements for bibliographic records. 7. ifla, functional requirements for authority data: a conceptual model, edited by glenn e. patton, ifla series on bibliographic control (munich: k.g. saur, 2009). 8. ifla, “functional requirements for subject authority data (frsad): a conceptual model” (ifla, 2010), http://www.ifla.org/files/assets/classification-and-indexing/functional-requirements-for-subject-authority-data/frsad-final-report.pdf. 9. ala, “rda toolkit: resource description and access,” sec. 0.3.1, accessed june 18, 2014, http://access.rdatoolkit.org/. 10. gordon dunsire, “representing the fr family in the semantic web,” cataloging & classification quarterly 50, no. 5–7 (2012): 724–41, doi:10.1080/01639374.2012.679881. 11. while this paper was under review, ifla released the draft “frbr-library reference model” (frbr-lrm), which is a consolidated edition for the fr family standards. it is developed according to the respective individual standards following the principles of the entity relationship modeling, which is challenged in this paper. taking into account the er modeling and the statement (available on p. 5 of the standard) that “the model is comprehensive at the conceptual level, but only indicative in terms of the attributes and relationships that are defined,” this consolidated edition could not be perceived as a standard that could be implemented directly as a property vocabulary qualifying for use in the rdf environment. 12. main page (for all fr) at http://iflastandards.info/ns/fr/; “frbr model” available at http://iflastandards.info/ns/fr/frbr/frbrer/; “frad model” available at http://iflastandards.info/ns/fr/frad/; “frsad model” available at http://iflastandards.info/ns/fr/frsad/. 
an addition to the previous is frbroo: the element set is available at http://iflastandards.info/ns/fr/frbr/frbroo/. 13. manolis peponakis, “conceptualizations of the cataloging object: a critique on current perceptions of frbr group 1 entities,” cataloging & classification quarterly 50, no. 5–7 (2012): 587–602, doi:10.1080/01639374.2012.681275. 14. pat riva and chris oliver, “evaluation of rda as an implementation of frbr and frad,” cataloging & classification quarterly 50, no. 5–7 (2012): 564–86, doi:10.1080/01639374.2012.680848. 15. shoichi taniguchi, “viewing rda from frbr and frad: does rda represent a different conceptual model?,” cataloging & classification quarterly 50, no. 8 (2012): 929–43, doi:10.1080/01639374.2012.712631. 16. rda registry is available at http://www.rdaregistry.info/. 17. the nomen entity and the “has appellation” (reversed “is appellation of”) property are also used by the frbr-lrm. 18. paul h. portner, what is meaning?: fundamentals of formal semantics (malden, ma: blackwell, 2005), 34. 19. ifla, functional requirements for bibliographic records, 19:6. 20. w3c, “rdf 1.1 concepts and abstract syntax: w3c recommendation,” february 25, 2014, http://www.w3.org/tr/2014/rec-rdf11-concepts-20140225/. 21. gordon dunsire, diane hillmann, and jon phipps, “reconsidering universal bibliographic control in light of the semantic web,” journal of library metadata 12, no. 2–3 (2012): 166, doi:10.1080/19386389.2012.699831. 22. manolis peponakis, “libraries’ metadata as data in the era of the semantic web: modeling a repository of master theses and phd dissertations for the web of data,” journal of library metadata 13, no. 4 (2013): 333, doi:10.1080/19386389.2013.846618. 23. ifla, functional requirements for bibliographic records, 19:32. 24. ifla, functional requirements for authority data: a conceptual model, 36–37. 25. derek rayside and gerard t. campbell, “an aristotelian understanding of object-oriented programming,” in proceedings of the 15th acm sigplan conference on object-oriented programming, systems, languages, and applications, oopsla ’00 (new york: acm, 2000), 350, doi:10.1145/353171.353194. 26. jinfang niu, “evolving landscape in name authority control,” cataloging & classification quarterly 51, no. 4 (2013): 405, doi:10.1080/01639374.2012.756843. 27. ifla, functional requirements for authority data: a conceptual model, 24. 28. ibid., 20. 29. “religious relationship” is the “relationship between a person and an identity that person assumes in a religious capacity”; for example the “relationship between the person known as thomas merton and that person’s name in religion, father louis” (ifla, 2009, 61–62). 30. junli diao, “‘fu hao,’ ‘fu hao,’ ‘fuhao,’ or ‘fu hao’? 
a cataloger’s navigation of an ancient chinese woman’s name,” cataloging & classification quarterly 53, no. 1 (2015): 71–87, doi:10.1080/01639374.2014.935543. 31. on byung-won, sang choi gyu, and jung soo-mok, “a case study for understanding the nature of redundant entities in bibliographic digital libraries,” program: electronic library and information systems 48, no. 3 (july 1, 2014): 246–71, doi:10.1108/prog-07-2012-0037. 32. neil r. smalheiser and vetle i. torvik, “author name disambiguation,” annual review of information science and technology 43, no. 1 (2009): 1–43, doi:10.1002/aris.2009.1440430113. 33. thomas b. hickey and jenny a. toves, “managing ambiguity in viaf,” d-lib magazine 20, no. 7/8 (2014), doi:10.1045/july2014-hickey. 34. martin doerr, pat riva, and maja žumer, “frbr entities: identity and identification,” cataloging & classification quarterly 50, no. 5–7 (2012): 524, doi:10.1080/01639374.2012.681252. 35. ifla, unimarc manual: authorities format, 2nd revised and enlarged edition, ubcim publications—new series, vol. 22 (munich: k.g. saur, 2001). 36. library of congress, “marc 21 format for authority data” (library of congress, april 18, 1999), http://www.loc.gov/marc/authority/. 37. ifla, functional requirements for authority data: a conceptual model. 38. martha m. yee, “frbrization: a method for turning online public findings lists into online public catalogs,” information technology and libraries 24, no. 2 (2005): 81, doi:10.6017/ital.v24i2.3368. 39. see frbr-rda mapping from joint steering committee for development of rda available at http://www.rda-jsc.org/docs/5rda-frbrrdamappingrev.pdf 40. taniguchi, “viewing rda from frbr and frad,” 934. 41. coyle and hillmann, “resource description and access (rda): cataloging rules for the 20th century,” sec. 8. information technology and libraries | june 2016 37 http://dx.doi.org/10.1080/01639374.2012.756843 http://dx.doi.org/10.1080/01639374.2014.935543 http://dx.doi.org/10.1108/prog-07-2012-0037 http://dx.doi.org/10.1002/aris.2009.1440430113 http://dx.doi.org/10.1045/july2014-hickey http://dx.doi.org/10.1080/01639374.2012.681252 http://www.loc.gov/marc/authority/ http://dx.doi.org/10.6017/ital.v24i2.3368 http://www.rda-jsc.org/docs/5rda-frbrrdamappingrev.pdf 42. ifla, functional requirements for bibliographic records, 19:33. 43. leonidas deligiannidis, amit p. sheth, and boanerges aleman-meza, “semantic analytics visualization,” in intelligence and security informatics, edited by sharad mehrotra et al., lecture notes in computer science 3975 (springer berlin heidelberg, 2006), 49, http://link.springer.com/chapter/10.1007/11760146_5. 44. dunsire, hillmann, and phipps, “reconsidering universal bibliographic control in light of the semantic web,” 166. 45. rick noack, “out of fear of racism, sweden changes the names of bird species,” washington post, february 24, 2015, http://www.washingtonpost.com/blogs/worldviews/wp/2015/02/24/out-of-fear-of­ racism-sweden-changes-the-names-of-bird-species/. 46. w3c, “skos extension for labels (skos-xl) namespace document—html variant,” 2009, http://www.w3.org/tr/2009/rec-skos-reference-20090818/skos-xl.html. 47. chryssoula bekiari et al., frbr object-oriented definition and mapping from frbrer, frad and frsad, version 2.0 (draft), 2013, http://www.cidoc­ crm.org/docs/frbr_oo//frbr_docs/frbroo_v2.0_draft_2013may.pdf. 48. robert j. murray and barbara b. tillett, “cataloging theory in search of graph theory and other ivory towers,” information technology and libraries 30, no. 
computer-based subject authority files at the university of minnesota libraries
audrey n. grosch: university of minnesota libraries

a computer-based system to produce listings of topical subject terms and geographically subdivided terms is described. the system files and their associated listings are called the subject authority file (saf) and the geographic authority file (gaf). conversion, operation, problems, and costs of the system are presented. details of the optical scanning conversion, with illustrations, show the relative ease of the technique for simple upper-case data files. program and data characteristics are illustrated with record layouts and sample listings.

introduction
as a corollary to the creation and maintenance of large library catalogs, it has become necessary for academic or research libraries to maintain authority files of various kinds, such as author name, subject, and series. in a manual cataloging system these files serve to unravel the mysteries of form, meaning, and usage to the cataloger.
they also serve as a control to help avoid conflicts, synonyms, or overlapping subjects. with a system of decentralized catalogs using different subject entries from a system's union catalog, some method must be derived to preserve such usage for the cataloger. a computer-based subject authority file provides that means. in january 1970, the university of minnesota libraries began studying the relationship of subject authority files to both the present manual cataloging system and to a planned mechanized system employing the marc ii format for storage of bibliographic data. minnesota's subject authority files are divided into two distinct logical files: subject authority and geographic authority subdivisions. the subject authority file (saf) contains all topical subject heading terms and their subdivisions down to nine levels of term, and geographic main headings, i.e., u.s. with nongeographic subdivisions. nonterm data such as origin, usage notes, "libraries using," and other kinds of information are contained in the saf. the geographic authority file (gaf) contains topical headings found in the saf, with geographical place names as subdivisions and indications of direct or indirect terms in geographic heading assignment. similar nonterm data as found in the saf are also found in the gaf. immediate and long-range benefits, together with the cost of conversion versus photocopying, showed that greater flexibility would be achieved through conversion to machine-readable form. some of the benefits were: 1) immediate assistance to the libraries performing their own decentralized cataloging, while providing cards to the union catalog at minnesota; 2) future assistance to our coordinate campus libraries should they wish to increase compatibility of their catalogs with the minneapolis campus union catalog; and 3) future provision of a machine-readable authority to enable linking of various subject vocabularies together for an on-line controlled-vocabulary subject searching system. when the decision had been made to convert the files to machine-readable form, we tried to determine what others had done regarding this application. although much previous work has been done on subject analysis, cataloging, vocabulary construction, and mechanization of bibliographic processes, very few designers have developed systems to support thesauri or subject heading files. in 1967 heald (1) reported on the system for the test-thesaurus of engineering and scientific terms. the following year hammond of aries corp. (2) described the nasa thesaurus, and way (3) outlined in detail the rand corporation library subject heading authority list (shal), mechanized using punch cards and computer in 1967. mount and kollin (4) described the use of the computer in the updating and revision of the subject heading list for applied science and technology index. of course several famous information systems use mechanized thesauri, among them the national library of medicine's medlars system with its mesh vocabulary and the department of defense ddc descriptors. in addition, the seventh edition of the library of congress subject headings utilized computer photocomposition. another reported work on subject headings in a mechanized system is that of the library of congress, in which a marc record for subject headings is discussed. avram et al. (5) give examples of this record and describe the system now under development at lc.
unfortunately for us, we completed the work herein reported in 1971, thereby not structuring our file to marc specifications. we mention this work here, as our file will lend itself to such a conversion, should we later require it.

data preparation and file conversion
the saf and gaf files comprised 59 catalog card drawers of information (about 115,000 lines of typed data). each file would be converted and maintained separately, but would use the same system design and processing programs. at a later stage, merging the files would be considered. moreover, the cost of the system would be lower if one design could be used for both files. two conversion methods were evaluated, keypunching and optical scanning. other methods would have lent themselves to this conversion, such as ibm magnetic tape selectric typewriters (mt/st) or an on-line system such as ibm's administrative terminal system (ats). however, because of the relatively small file size (under six million characters) and a desire for as economical a conversion as possible, only keypunching and optical scanning input were seriously considered. mt/st typewriters were ruled out because of cost and lack of locally available tape conversion equipment. keypunching was considered too slow in relation to typing. our assessment of optical scanning as the cheapest method was confirmed after completion of the conversion phase of the project, as an estimated $1,800 in total savings over keypunching. files were converted without intermediate coding, permitting the typists to transcribe directly from the subject and geographic authority card files. the data preparation was done by the catalog division's subject authority coordinator. this librarian edited the file to eliminate ambiguities before the typist received the drawer. otherwise, except for a quick check of the typist's finished sheets, the data were not examined again until after they were in machine-readable form on tape. this procedure worked very smoothly and caused the staff of the catalog division little inconvenience during the conversion phase. figure 1 shows the flow of the complete conversion activity.

fig. 1. conversion process for saf and gaf

equipment used for preparation of the data consisted of two ibm selectric typewriters model 715 with carbon ribbon, dual cam inhibitor, and 065 typing element (rabinow font). one machine had a pin-feed platen. this feature later proved to make no discernible difference in the quality of the typed output, but some typists stated that they preferred the pin-feed platen over the standard platen. the control data 915 page reader with a cdc 8092 teleprogrammer operating under grasp iii software was used for the conversion. block time was rented at a commercial service bureau for $50.00 per hour. library systems division personnel operated the system during these time periods. control data provided a system manual and debugging time in order to prepare for our operation during conversion. however, little assistance in handling the application was actually received from the control data personnel, who were familiar only with business data processing. a stock form, called the cdc 915 page reader form, procured from a
local forms vendor, was used. this form has a typing area of 9 1/2" x 13" marked off by faint blue lines. top and bottom alignment areas are provided to check for line skew. scanner throughput is increased by use of the longest permissible form with as much single-line data as possible. figure 2 shows a portion of a typed page from the saf.

fig. 2. saf input typing sample page

line 1 is the format recognition line, which was repeated on each sheet as a precaution against its loss by the optical scanner program during processing. such a loss of the format recognition line would have forced complete rerunning of the job. the remaining lines show the various data elements identified by tag characters. the complete set of tag characters is shown in table 1. the end-of-page symbol # is used on pages which terminate before the last physical line of the page to increase scanner throughput. the h symbol terminates each line and serves the same speed-increasing function.

table 1. conversion identification tags
  t: term
  d: departmental catalog in which the term is used
  n: scope note or general note on use of the term
  c: continuation line
  r: reference from which the term was verified if other than lc
  z: followed by s = see; by sa = see also; by x = see tracing; by xx = see also tracing
  x: geographic authority file cross-reference tracing (implied)

indentation spaces serve as a flag to the conversion program to show the level of the term or other data element. this technique decreased the number of characters to be typed, yet level errors were easy to detect during proofreading. subfield indicators for certain nonterm data completed the input format used during conversion. table 2 describes these indicators and the meaning of each subfield.

table 2. term subfield indicators
  $sgf: term also entered in gaf
  $dir: direct
  $ind: indirect
  $mnu: local university of minnesota subject term
  $prov: provisional term
  $mesh: medical subject heading term
  $nal: national agriculture library term

the gaf typed input is shown in figure 3. note the similarity between the two files, yet the presence of the variant treatment of an older term (social surveys in) from a newer term (social sciences). as a result the catalog division has now changed these old-form terms to conform with library of congress subject heading forms.

fig. 3. gaf input typing sample page
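the typed input format just described (tag character, indentation level, and a trailing subfield indicator) can be pictured with a small parsing sketch. the fragment below is purely illustrative and is not part of the original system, which was written in cobol and cdc 915 scanner code; the function name, the one-space-per-sublevel assumption, and the returned field names are ours.

    # illustrative sketch only: split a typed saf/gaf line into tag, level, text,
    # and subfield indicator, following tables 1 and 2 above. one leading space per
    # sublevel is an assumption made for this example.
    TAGS = {"t", "d", "n", "c", "r", "z", "x"}                              # table 1 tags
    SUBFIELDS = ("$sgf", "$dir", "$ind", "$mnu", "$prov", "$mesh", "$nal")  # table 2

    def parse_input_line(line):
        level = len(line) - len(line.lstrip(" ")) + 1   # indentation flags the term level
        body = line.strip()
        tag, _, rest = body.partition(" ")
        if tag not in TAGS:
            raise ValueError("unknown tag character: " + repr(tag))
        subfield = next((s for s in SUBFIELDS if rest.endswith(s)), None)
        if subfield:
            rest = rest[: -len(subfield)]
        return {"tag": tag, "level": level, "text": rest.strip(), "subfield": subfield}

    print(parse_input_line("  t social surveys in$dir"))
    # -> {'tag': 't', 'level': 3, 'text': 'social surveys in', 'subfield': '$dir'}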
during typing, error correction by typists was facilitated by the use of three special characters: one to delete an entire line, one to delete the preceding character, and one to type over a character so that it is deleted without inserting blanks. a program is typed on an optical scanning sheet in an assembly-level language for the cdc 915 page reader. it is then assembled into object code which operates the page reader and its controlling computer. an example of the program used in this conversion is shown in table 3. line 1 of this program defines the input-output and control characters, together with a coordinate to terminate reading of a line if data are not found on the line. it also defines the special characters described above for error correction, end of line, etc. line 2 specifies that a stock form (not preprinted) is to be read, giving the left-most and right-most character positions and the maximum number of lines per page, together with the first line number, to establish the scanning-area coordinates. these coordinates are expressed as three-digit octal values determined through use of a forms grid and ruler. line 3 describes the tape record format, including the field size, the blank fill character, left or right justification, and alphanumeric or numeric-only data field content. line 4 instructs the 8092 teleprogrammer unit to convert certain characters to octal values matching the cdc 3300 computer system, which are not identical to the normal 915 page reader octal values. the final e terminates reading of the program sheet. from this sheet grasp iii compiles an object program which is stored in the 8092 teleprogrammer memory, enabling scanner operation.

table 3. cdc 915 program for raw data tape creation
ictliblk,dsican, ? idlt,tieol,nieop,#ifmt,wlww istkid27,350,116,004lww e

system description and operation
the raw data tape created during optical scanning was used to build the saf and gaf data files. the magnetic tape coding is binary (odd parity) using 800 bpi density. a fixed-length record of 20 characters is used, with 100 records per physical block. as many 20-character format c (continuation of data) records are used as needed to achieve variable-length logical records. table 4 shows the three record formats used.

table 4. saf and gaf record formats
format a, control record: char. pos. 1, record type; 2-5, page number (1-9999); 6, column number (1-3); 7-14, file creation date (mm-dd-yy); 15, file identification (s = subject authority, g = geographic authority); 16-18, columns used (123 standard); 19-20, number of lines per page (75 standard).
format b, data record (initial): char. pos. 1, record type (t = term, x = reference term [gaf only], r = reference, d = departmental library, 1 = see, 2 = see also, 3 = see from, 4 = see also from); 2, level number (1-7); 3, sort exception code (n = numeric exception, h = hyphen exception, s = substitution exception, u = u.s. abbreviation, ' = gt. brit. abbreviation); 4, qualification code (6-bit binary: sgf = 1, dir = 2, ind = 4, prov = 8, mnu = 16, mesh = 32, nal = 48; combinations are stored by adding these values, e.g., 17 = mnu/sgf); 5-6, number of display lines for the item; 7-20, first 14 characters of the item.
format c, data record (continuation): char. pos. 1, record type; 2-20, continuation of item.
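as a rough illustration of how the fixed 20-character records of table 4 carry variable-length items, the sketch below reassembles logical items from a stream of physical records. it is not the production program (which was cobol on the cdc 3300); the simplified field handling, the function name, and the sample records are assumptions made for this example.

    # illustrative sketch only: rebuild variable-length logical items from fixed
    # 20-character records, treating "c" as a format c continuation record and
    # skipping the level/code fields in positions 2-6 of the initial record.
    def assemble_items(records):
        items = []
        for rec in records:
            rec = rec.ljust(20)[:20]          # every physical record is 20 characters
            rtype, payload = rec[0], rec[1:]
            if rtype == "c" and items:        # format c: positions 2-20 continue the item
                items[-1]["text"] += payload
            else:                             # initial record: positions 7-20 hold the
                items.append({"type": rtype, "text": payload[5:]})  # first 14 characters
        return [{"type": it["type"], "text": it["text"].rstrip()} for it in items]

    sample = ["t1 102social science",   # format b: term, level 1, 2 display lines
              "c research"]             # format c continuation
    print(assemble_items(sample))
    # -> [{'type': 't', 'text': 'social science research'}]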
to change or modify the file, keypunched cards are used; one transaction card is used for each correction for both saf and gaf files. table 5 shows the layout of this card.

table 5. saf and gaf transaction card
  columns 1-4: page of master list (1-9999)
  column 5: column of master list (1-3)
  columns 6-7: line of master list (1-80)
  columns 8-9: sequence number (00-99 or blank)
  column 10: deck number (0-9)
  column 11: continuation number (blank or 0-9)
  column 12: level number (1-7)
  column 13: transaction type (a = add, c = cancel, m = modify)
  columns 14-15: record type (t = term, xt = reference term [gaf], r = reference, d = departmental library, s = see, sa = see also, x = see from, xx = see also from)
  columns 16-80: data

catalogers in the wilson library (the university's largest and central library) and the bio-medical library use a 3 x 5 card as an input form. this card is filled in and transmitted to the librarian acting as subject coordinator. then the information is keypunched and prepared for submission to an updating run. the normal schedule as originally planned was to run a cumulative supplement monthly, with a quarterly full updating of the file. however, this schedule has been flexible, as the transaction volume has varied considerably from early estimates. currently updates are run quarterly to produce supplements, with a full listing annually. these updates vary from 5,000 to 14,000 transactions. the program for the system is written in cobol for the cdc 3300 computer operating under the master operating system. upon demand the program performs four basic functions on the data files: 1) creation of a cumulative supplement list from a transaction card deck; 2) updating of the tape files from the transaction card deck; 3) preparation of master lists either during the update process or independently; and 4) querying the file on the basis of user-defined search terms. parameter cards control the options available when supplements or master lists are to be run. the accept, deck, list, abort, line, space, and column parameters provide control over the cutoff for a new supplement, the transaction card list form, termination of the job if the number of error cards exceeds a given value, the number of lines per page of output, the number of blank lines before and after each transaction on the supplement, and whether a single- or double-column supplement is to be produced. figure 4 shows a sample from the saf supplement. the updating phase of the program creates the new master file and produces an update error listing accompanied by a report on composition of the file by level number, kind of data, and logical/physical record counts. the master list printout is also controlled through parameter cards. the line, column, and select options indicate the number of lines of data to be printed in each column, the number of columns per page, and which pages are to be listed. this latter feature permits supplying replacements for pages improperly printed or bound and suppression of printing when a program restart is necessary. figure 5 shows the most commonly used master list format. the file query function is performed upon demand to assist in file revision, to change a term throughout the file, or for other special purposes. the search items can be composed of any and/or combinations of record types, record levels, qualification codes, sort exception codes, and key words or phrases. a keyword search is a character-by-character search of file items. thus, by specifying a root word, all derivatives of the word formed by adding prefixes or suffixes will be identified. if these derivatives are not desired, a blank preceding and/or following the root word in the search key will prevent their display.

fig. 5. master list format using 3 column standard

although file conversion took five months to complete, the program to operate the system was delayed because of termination of the programmer originally assigned to the project. although the basic program features were ready in about 3-4 months, it was not until january of 1971 that the system was installed. during that year the staff gained experience in the system and cleansed the data of many ancient errors. by the end of the year, the system was an integral part of our catalog division support activities.
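the root-word behavior of the keyword query described above can be shown with a short sketch. the function name and the sample items are illustrative only and do not reproduce the original cobol query program.

    # illustrative sketch only: character-by-character keyword matching in which a
    # bare root such as "catalog" also finds derivatives, while a search key padded
    # with blanks matches only the exact word.
    def keyword_matches(search_key, item):
        return search_key in " " + item + " "    # pad the item so blank-delimited keys work

    items = ["library catalogs", "cataloging rules", "union catalog"]
    print([i for i in items if keyword_matches("catalog", i)])    # root word: all three match
    print([i for i in items if keyword_matches(" catalog ", i)])  # blank-delimited: exact word only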
costs
as was pointed out previously, there was consideration given to photocopying the authority files to provide a duplicate set for the bio-medical library. it was determined that this would cost $2,400 (60,000 cards @ $.04 each). this equalled the cost of the typing personnel and rental of optical scanning equipment. moreover, there would have to be duplicate cards and filing to maintain both files, with no assurance that they would remain exact duplicates of one another. in our opinion the benefits of this computer-based system offset the additional cost over the photocopying approach. to create these files completely cost $5,296.21 for all direct expenditures for clerical help, scanner time, typewriter purchase and rental, supplies, and cdc 3300 computer time. table 6 shows the breakdown of these costs.

table 6. conversion costs
  senior clerk typists @ $2.40 (2 fte for 3 mos.)     $1,810.56
  cdc 915 rental (20.1 hours @ $50 per hour)           1,007.50
  typewriter purchase                                    532.70
  typewriter rental (2 mos.)                              60.00
  magnetic tape                                           74.00
  cdc 915 forms                                          400.00
  cdc 3300 computer time @ $95.00/hr.                  1,411.45
  total cost                                          $5,296.21

during the conversion and development phase, salaries of the systems personnel were absorbed by the library so that only these direct costs were charged to the project. also, the library absorbed the subject coordinator's time for editing the file of cards prior to typing. two senior clerk-typists at $2.40 per hour each were employed for three months full time to type the data. operating costs are borne by the library, which requires a half-time librarian as subject coordinator and a student keypunch operator for 15-20 hours per week. the systems division provides program maintenance as required. supplies and computer time require about $2,100 per year if quarterly full lists are used with monthly supplements. some idea of the relative processing economy can be shown by examining some typical running times on the computer. the sizes of the saf and gaf files are respectively 4.35 and 1.75 million characters. a typical supplement with 12,000 transactions takes 45 minutes to print on the cdc 3300, equipped with a 1000 line-per-minute printer, for either saf or gaf. printing of a full master list for the saf and gaf takes 1 hour 25 minutes and 45 minutes respectively. updating the files takes about 1 hour 40 minutes for 12,000 transactions. a query of the file takes about 30 minutes. current computer and channel charges are $95 per hour.

general observations
our experience with this project has shown us the high reliability of the cdc 915 page reader as a conversion device. less than 1 percent of the total amount of data the page reader scanned was rejected. those errors rejected were easily spotted and retyped. no scanner-produced errors were found in the data; however, there was an occasional failure to pick up spaces when more than three occurred together. these errors were very infrequent and were discovered in the raw-data proofreading. these errors were corrected and, after the final output file was generated, we again checked for similar conditions and found everything in order with regard to term-level indication. with an upper-case file such as this, use of the cdc 915 is simple and easily accomplished. however, the library should not rely upon a scanner manufacturer or the installation where a unit is being leased to provide all the assistance required. the library will have to design its application and become familiar with the equipment in order to achieve best results. all optical scanning usage requires that certain care be exercised in the typing operation. lines must not be skewed, characters must not be blurred, and length of line can be critical even though the scan optics may be opened and closed over longer lines than are intended to be typed. further, it is imperative that the paper used in the scanning operation meet specifications for use with the chosen scanner. our experience indicates that a pin-feed platen is not necessary to maintain forms alignment if typists use care in initial alignment. we experienced some operational problems when we actually tried our program on the page reader. initially, the system would not compile our program. it was not due to a catastrophic error in our program, but rather a hardware fault in the 8092 teleprogrammer. in trying to read the program onto tape after compilation, the system consistently failed. we finally gave up trying and recompiled from the scanned input sheet at the beginning of each conversion run. no one at the data center could explain our failure to load, but we must assume an intermittent or undetected hardware problem. during the job run it was imperative that the scanner be watched closely, as occasionally it would stop reading or fail to feed a sheet. these were not difficult problems but did require occasional attention by the center's customer engineer. on one occasion the scanner failed during our run, and we could not achieve a timely repair. we rescheduled for the next week and then experienced no problem. after our experiences with the 915 page reader at the data center, we felt that we knew as much about the equipment as any of the operators we met while doing our production runs. we would not hesitate to use the page reader again for a simple file conversion, and would continue to handle the operation ourselves, as the center operators were no better able to run our job.

acknowledgments
the author wishes to thank mr. eugene d. lourey for developing the program for this system. mr. curt herbert deserves recognition for the preliminary design for the system and for initiating the optical scanning activities. also, mr. carl o. sandberg, who was responsible for the many details of the conversion portion and who now maintains these programs, contributed many significant design parameters. the staff of the catalog division, too, deserve our gratitude for their file cleansing and data editing during and after conversion.

references
1. j. heston heald, the making of test-thesaurus of engineering scientific terms (final report of project lex, u.s. office of naval research, nov. 1967), ad 661,001.
2. william hammond, construction of the nasa thesaurus, computer processing support, final report (aries corp., 1968), n 68-28811.
3. william way, "subject heading authority list, computer prepared," american documentation 19: 188-99 (april 1968).
4. ellis mount and richard kollin, "analysis and revision of subject headings for applied science and technology index," special libraries 60: 639-46 (dec. 1969).
5. henriette d. avram, lenore s. maruyama, and john c. rather, "automation activities in the processing department of the library of congress," library resources and technical services 16: 195-239 (spring 1972).
building library community through social media
scott w. h. young and doralyn rossmann

abstract
in this article academic librarians present and analyze a model for community-building through social media. findings demonstrate the importance of strategy and interactivity via social media for generating new connections with library users. details of this research include successful guidelines for building community and developing engagement online with social media. by applying intentional social media practices, the researchers' twitter user community grew 100 percent in one year, with a corresponding 275 percent increase in user interactions. using a community analysis approach, this research demonstrates that the principles of personality and interactivity can lead to community formation for targeted user groups. discussion includes the strategies and research approaches that were employed to build, study, and understand user community, including user-type analysis and action-object mapping. from this research a picture of the library as a member of an active academic community comes into focus.

introduction
this paper describes an academic library's approach to building community through twitter. much of the literature offers guidance to libraries on approaches to using social media as a marketing tool. the research presented here reframes that conversation to explore the role of social media as it relates to building community. the researchers' university library formed a social media group and implemented a social media guide to bring an intentional, personality-rich, and interaction-driven approach to its social media activity. quantitative analyses reveal a significant shift and increase in twitter follower population and interactions, and suggest promising opportunities for social media to strengthen the library's ties with academic communities.

literature review
research in libraries has long brought a critical analysis to the value, purpose, and practical usage of social media. glazer asked of library facebook usage, "clever outreach or costly diversion?"1 three years later, glazer presented a more developed perspective on facebook metrics and the nature of online engagement, but social media was still described as "puzzling and poorly defined."2 vucovich et al. furthermore note that "the usefulness of [social networking tools] has often proven elusive, and evaluating their impact is even harder to grasp in library settings."3

scott w. h. young (swyoung@montana.edu) is digital initiatives librarian and doralyn rossmann (doralyn@montana.edu) is head of collection development, montana state university library, bozeman.
li and li similarly observe that there "seems to be some confusion regarding what exactly social media is."4 social media has been experimented with and identified variously as a tool for enhancing the image of libraries,5 as a digital listening post,6 or as an intelligence gathering tool.7 with such a variety of perspectives and approaches, the discussion around social media in libraries has been somewhat disjointed. if there is a common thread through library social media research, however, it ties together the broadcast-based promotion and marketing of library resources and services, what li calls "the most notable achievement of many libraries that have adopted social media."8 this particularly common approach has been thoroughly examined.9,10,11,12,13,14,15 in evaluating the use of facebook at western michigan university's waldo library, sachs, eckel, and langan found that promotion and marketing was the only "truly successful" use for social media.16 a survey of estonian librarians revealed that facebook "is being used mainly for announcements; it is reduplicating libraries' web site[s]. interestingly librarians don't feel a reason to change anything or to do something differently."17 with this widespread approach to social media, much of the library literature is predominated by exploratory descriptions of current usage and implementation methods under the banner of promoting resources by meeting users where they are on social media.18,19,20,21,22,23,24,25,26,27 this research is effective at describing how social media is used, but it often does not extend the discussion to address the more difficult and valuable question of why social media is used. the literature of library science has not yet developed a significant body of research around the practice of social media beyond the broadcast-driven, how-to focus on marketing, promotion, and public-relations announcements. this deficiency was recognized by saw, who studied social networking preferences of international and domestic australian students, concluding, "to date, the majority of libraries that use social networking have used it as a marketing and promotional medium to push out information and announcements. our survey results strongly suggest that libraries need to further exploit the strengths of different social networking sites."28 from this strong emphasis on marketing and best practices emerges an opportunity to examine social media from another perspective, community building, which may represent an untapped strength of social networking sites for libraries.
while research in library and information science has predominantly developed around social media as marketing resource, a small subset has begun to investigate the community-building capabilities of social media.29,30,31,32 by making users feel connected to a community and increasing their knowledge of other members, "sites such as facebook can foster norms of reciprocity and trust and, therefore, create opportunities for collective action."33 lee, yen, and hsiao studied the value of interaction and information sharing on social media: "a sense of belonging is achieved when a friend replies to or 'likes' a post on facebook."34 lee found that facebook users perceived real-world social value from shared trust and shared vision developed and expressed through information-sharing on social media. research from oh, ozkaya, and larose indicated that users who engaged in a certain quality of social media interactivity perceived an enhanced sense of community and life satisfaction.35 broader discussion of social media as a tool for community-building has been advanced within the context of political activity, where social media is identified as a method for organizing civic action and revolutionary protests.36,37,38 related research focuses on the online social connections and "virtual communities" developed around common interests such as religion,39 health,40 education,41 social interests and norms,42 politics,43 web-video sharing,44 and reading.45 in these analyses, social media is framed as an online instrument utilized to draw together offline persons. hofer notes that communities formed online through social media activity can generate a sense of "online bonding social capital."46 further marking the online/offline boundary, research from grieve et al. investigates the value of social connectedness in online contexts, suggesting that social connectedness on facebook "is a distinct construct from face-to-face social connectedness."47 grieve et al. acknowledges that the research design was predicated on the assumptive existence of an online/offline divide, noting "it is possible that such a separation does not exist."48 around this online/offline separation has developed "digital dualism," a theoretical approach that interrogates the false boundaries and contrasts between an online world as distinct from an offline world.49,50 sociologist zeynep tufekci expressed this concisely: "in fact, the internet is not a world; it's part of the world."51 a central characteristic of community-building through social media is that the "online" experience is so connected and interwoven with the "offline" experience as to create a single seamless experience. this concept is related to a foundational study from ellison, steinfield, and lampe, who identified facebook as a valuable subject of research because of its "heavy usage patterns and technological capacities that bridge online and offline connections."52 they conclude, "online social network sites may play a role different from that described in early literature on virtual communities. online interactions do not necessarily remove people from their offline world but may indeed be used to support relationships."53 this paper builds on existing online community research while drawing on the critical theory of "digital dualism" to argue that communities built through social media do not reside in a separate "online" space, but rather are one element of a much more significant and valuable form of holistic connectedness. our research represents a further step in shifting the focus of library social media research and practice from marketing to community building, recasting library-led social media as a tool that enables users to join together and share in the commonalities of research, learning, and the university community. as library social media practice advances within the framework of community, it moves from a one-dimensional online broadcast platform to a multidimensional socially connected space that creates value for both the library and library users.

method
in may 2012, montana state university library convened a social media group (smg) to guide our
this  concept  is  related  to  a  foundational  study  from   ellison,  steinfield,  and  lampe,  who  identified  facebook  as  a  valuable  subject  of  research  because   of  its  “heavy  usage  patterns  and  technological  capacities  that  bridge  online  and  offline   connections.”52  they  conclude,  “online  social  network  sites  may  play  a  role  different  from  that   described  in  early  literature  on  virtual  communities.  online  interactions  do  not  necessarily   remove  people  from  their  offline  world  but  may  indeed  be  used  to  support  relationships.”53   this  paper  builds  on  existing  online  community  research  while  drawing  on  the  critical  theory  of   “digital  dualism”  to  argue  that  communities  built  through  social  media  do  not  reside  in  a  separate   “online”  space,  but  rather  are  one  element  of  a  much  more  significant  and  valuable  form  of  holistic   connectedness.  our  research  represents  a  further  step  in  shifting  the  focus  of  library  social  media   research  and  practice  from  marketing  to  community  building,  recasting  library-­‐led  social  media  as   a  tool  that  enables  users  to  join  together  and  share  in  the  commonalities  of  research,  learning,  and   the  university  community.  as  library  social  media  practice  advances  within  the  framework  of   community,  it  moves  from  a  one-­‐dimensional  online  broadcast  platform  to  a  multidimensional   socially  connected  space  that  creates  value  for  both  the  library  and  library  users.   method     in  may  2012,  montana  state  university  library  convened  a  social  media  group  (smg)  to  guide  our     building  library  community  through  social  media  |  young  and  rossmann   23   social  media  activity.  the  formation  of  smg  marked  an  important  shift  in  our  social  media  activity   and  was  crucial  in  building  a  strategic  and  programmatic  focus  around  social  media.  this  internal   committee,  comprising  three  librarians  and  one  library  staff  member,  aimed  to  build  a  community   of  student  participants  around  the  twitter  platform.  smg  then  created  a  social  media  guide  to   provide  structure  for  our  social  media  program.  this  guide  outlines  eight  principal  components  of   social  media  activity  (see  table  1).     social  media  guide  component   twitter  focus   audience  focus   undergraduate  and  graduate  students   goals   connect  with  students  and  build  community   values   availability,  care,  scholarship   activity  focus   information  sharing;  social  interaction   tone  &  tenor   welcoming,  warm,  energetic   posting  frequency   daily,  with  regular  monitoring  of  subsequent  interactions   posting  categories   student  life,  local  community   posting  personnel   1  librarian,  approximately  .10  fte     table  1.  social  media  activity  components   prior  to  the  formation  of  smg,  our  twitter  activity  featured  automated  posts  that  lacked  a  sense  of   presence  and  personality.  after  the  formation  of  smg,  our  twitter  activity  featured  hand-­‐crafted   posts  that  possessed  both  presence  and  personality.  to  measure  the  effectiveness  of  our  social   media  program,  we  divided  our  twitter  activity  into  two  categories  based  on  the  may  2012  date  of   smg’s  formation:  phase  1  (pre-­‐smg)  and  phase  2  (post-­‐smg).  
phase 1 user data included followers 1-514, those users who followed the library between november 2008, when the library joined twitter, and april 2012, the last month before the library formed smg. phase 2 included followers 515-937, those users who followed the library between may 2012, when the library formed smg, and august 2013, the end date of our research period. using corresponding dates to our user analysis, phase 1 tweet data included the library's tweets 1-329, which were posted between november 2008 and april 2012, and phase 2 included the library's tweets 330-998, which were posted between may 2012 and august 2013 (table 2). for the purposes of this research, phase 1 and phase 2 users and tweets were evaluated as distinct categories so that all corresponding tweets, followers, and interactions could be compared in relation to the formation date of smg. within twitter, "followers" are members of the user community, "tweets" are messages to the community, and "interactions" are the user behaviors of favoriting, retweeting, or replying. favorites are most commonly employed when users like a tweet. favoriting a tweet can indicate approval, for instance. a user may also share another user's tweet with their own followers by "retweeting."

table 2. comparison of phase 1 and 2 twitter activity
  phase 1: followers 1-514; tweets 1-329; nov. 2008-april 2012
  phase 2: followers 515-937; tweets 330-998; may 2012-august 2013

we employed three approaches for evaluating our twitter activity: user type analysis, action-object mapping, and interaction analysis. user type analysis aims to understand our community from a broad perspective by creating categories of users following the library's twitter account. after reviewing the accounts of each member of our user community, we collected them into one of the following nine groups: alumni, business, community member, faculty, library, librarian, other, spam, and student. categorization was based on a manual review of information found from each user's biographical profile, tweet content, account name, and a comparison against campus directories. action-object mapping is a quantitative method that describes the relationship between the performance of an activity (the action) and an external phenomenon (the object). action-object mapping aims to describe the interaction process between a system and its users.54,55,56,57 within the context of our study, the system is twitter, the object is an individual tweet, and the action is the user behavior in response to the object, i.e., a user marking a tweet as a favorite, retweeting a tweet, or replying to a tweet. we collected our library's tweets into sixteen object categories: blog post, book, database, event, external web resource, librarian, library space, local community, other libraries/universities, photo from archive, topics (libraries), service, students, think tank, hortative, and workshop.
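the action-object mapping can be pictured as a per-category tally of tweets and of tweets that drew at least one action. the sketch below is illustrative; the tweet field names are assumed, not taken from the article.

    # illustrative sketch only: count, per object category, how many tweets were posted
    # and how many received at least one action (favorite, retweet, or reply).
    from collections import defaultdict

    def action_object_tally(tweets):
        tally = defaultdict(lambda: {"tweets": 0, "with_action": 0})
        for t in tweets:
            acted = (t["favorites"] + t["retweets"] + t["replies"]) > 0
            tally[t["category"]]["tweets"] += 1
            tally[t["category"]]["with_action"] += int(acted)
        return dict(tally)

    sample = [{"category": "student life", "favorites": 4, "retweets": 2, "replies": 0},
              {"category": "workshop", "favorites": 0, "retweets": 0, "replies": 0}]
    print(action_object_tally(sample))
    # -> {'student life': {'tweets': 1, 'with_action': 1}, 'workshop': {'tweets': 1, 'with_action': 0}}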
interaction analysis serves as an extension of action-object mapping and aims to provide further details about the level of interaction between a system and its users. for this study we created an associated metric, "interaction rate," that measures the rate at which each object category received an action. within the context of our study, we have treated the "action" of action-object mapping and the "interaction" of twitter as equivalents. to identify the interaction rate, we used the following formula: "number of tweets within an object category that received an action" divided by "total number of tweets within an object category." interaction rate was calculated for each object category and for all tweets in phase 1 and in phase 2.

results
the changes in approach to the library's twitter presence through smg and the social media guide are evident in this study's results (figure 1). an analysis of user types in phase 1 reveals a large portion, 48 percent, were business followers. in comparison, the business percentage decreased to 30 percent in phase 2. the student percentage increased from 6 percent in phase 1 to 28 percent in phase 2, representing a 366 percent increase in student users. as noted earlier, the social media guide component "audience focus" for twitter is "undergraduate and graduate students" and includes the "goal" to "connect with students and build community" (table 1). the increase in the percentage of students in the follower population and the decrease in the business percentage of the population suggest progress towards this goal.

figure 1. comparison of twitter users by type

the object categorization for phase 1 shows a heavily skewed distribution of tweets in certain areas, while phase 2 has a more even and targeted distribution reflecting implementation of the social media guide components (figure 2). in phase 1, workshops is the most tweeted category, with 36 percent of all posts. library space represents 18 percent of tweets, while library events is third with 17 percent. the remaining 13 categories range from 5 percent to a fraction of a percent of tweets. phase 2 shows a more balanced and intentional distribution of tweets across all object categories, with a strong focus on the social media guide "posting category" of "student life," which accounted for 25 percent of tweets. library space consists of 11 percent of tweets, and external web resource composes 9 percent of tweets. the remaining categories range from 8 percent to 1 percent of tweets.

figure 2. comparison of tweets by content category

interaction rates were low in most object categories in phase 1 (see figure 3).
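the interaction rate defined above reduces to a simple ratio. the sketch below is illustrative; the per-phase counts of interacted tweets were not published, so the figures of 41 and 313 are back-calculated from the reported 12.5 and 46.8 percent rates and the known tweet totals (329 and 669).

    # illustrative sketch only: interaction rate as the share of a category's tweets
    # that received at least one action, expressed as a percentage.
    def interaction_rate(tweets_in_category, tweets_with_action):
        if tweets_in_category == 0:
            return 0.0
        return 100.0 * tweets_with_action / tweets_in_category

    print(round(interaction_rate(329, 41), 1))    # -> 12.5, the reported phase 1 overall rate
    print(round(interaction_rate(669, 313), 1))   # -> 46.8, the reported phase 2 overall rate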
interaction rates were low in most object categories in phase 1 (see figure 3). given that the social media guide has an "activity focus" of "social interaction," a tweet category with a high percentage of posting and a low interaction rate suggests a disconnect between tweet posting and meeting stated goals. for example, workshops represented a large percentage (36 percent) of the tweets but yielded a 0 percent interaction rate. library space was 18 percent of tweets but had only a 2 percent interaction rate. eleven of the 16 categories in phase 1 had no associated actions and thus a 0 percent interaction rate. the interaction rate for phase 1 was 12.5 percent. in essence, our action-object data and interaction rate data show us that during phase 1 we created content most frequently about topics of low interest to our community while we tweeted less frequently about topics of high interest to our community.

figure 3. interaction rates, phase 1

in contrast to phase 1, phase 2 demonstrates an increase in interaction rate across nearly every object category (figure 4, figure 5), especially student life and local community.

figure 4. interaction rates, phase 2

figure 5. interaction rate comparison

the local community category of tweets had the highest interaction rate at 68 percent. the student life category had the second highest interaction rate at 62 percent. only 2 of the 16 categories in phase 2 had no associated actions and thus a 0 percent interaction rate. the interaction rate for phase 2 was 46.8 percent, an increase of 275 percent from phase 1. in essence, our action-object data and interaction rate data show us that during phase 2 we created content most frequently about topics of higher interest to our community while we tweeted less frequently about topics of low interest to our community.

discussion

this research suggests a strong community-building capability of social media at our academic library. the shift in user types from phase 1 to phase 2, notably the increase in student twitter followers, indicates that the shape of our twitter community was directly affected by our social media program. likewise, the marked increase in interaction rate between phase 1 and phase 2 suggests the effectiveness of our programmatic community-focused approach.

the montana state university library social media program was fundamentally formed around an approach described by glazer: "be interesting, be interested."58 our twitter user community has thrived since we adopted this axiom. we have interpreted "interesting" as sharing original personality-rich content with our community and "interested" as regularly interacting with and responding to members of our community. the twofold theme of personality-rich content and interactivity-based behavior has allowed us to shape our phase 2 user community.
prior to the formation of smg, social media at the msu library was a rather drab affair. the library twitter account during that time was characterized by automated content, low responsiveness, no dedicated personnel, and no strategic vision. our resulting twitter community was composed mostly of businesses, at 48 percent of followers, with students representing just 6 percent of our followers. the resulting interaction rate of 12.5 percent reflects the broadcast-driven approach, personality-devoid content, and disengaged community that together characterized phase 1. following the formation of smg, the library twitter account benefitted from original and unique content, high responsiveness, dedicated personnel, and a strategic, goal-driven vision. our phase 2 twitter community underwent a transformation, with business representation decreasing to 30 percent and student representation increasing to 28 percent. the resulting interaction rate of 46.8 percent reflects our refocused community-driven program, personality-rich content, and engaged community of phase 2.

figure 6. typical phase 1 tweet

figure 6 illustrates a typical phase 1 tweet. the object category for this tweet is database, and it yielded no actions. the announcement of a new database trial was auto-generated from our library blog, a common method for sharing content during phase 1. this tweet is problematic for community-building for two primary reasons: its style and content lack the sense and personality of a human author, and it offers no compelling opportunity for interaction.

figure 7. typical phase 2 tweet

figure 7 illustrates a typical phase 2 tweet. the object category for this tweet is student life, and it yielded 6 actions (2 retweets and 4 favorites). the content relates to a meaningful and current event for our target student user community and is fashioned in such a way as to invite interaction by providing a strong sense of relevancy and personality. figure 8 further demonstrates the community effect of phase 2. in this example we have reminded our twitter community of the services available through the library, and one student user has replied. during our phase 2 twitter activity, we prioritized responsiveness, availability, and scholarship with the goal of connecting with students and building a sense of community. in many ways the series of tweets shown in figure 8 encapsulates our social media program. we were able to deliver resources to this student, who then associated these interactions with a sense of pride in the university. this example illustrates the overall connectedness afforded by social media. in contacting the library twitter account, this user asked a real-world research question. neither his inquiry nor our response was located strictly within an online world. while we pointed this user to an online resource, his remarks indicated "offline" feelings of satisfaction with the interaction.
lee  and  oh  found  that  social  media  interactivity  and  information  sharing  can  create  a  shared   vision  that  leads  to  a  sense  of  community  belonging.59,60  by  creating  personality-­‐rich  content  that   invites  two-­‐way  interaction,  our  strategic  social  media  program  has  helped  form  a  holistic   community  of  users  around  our  twitter  activity.         building  library  community  through  social  media  |  young  and  rossmann   31     figure  8.  phase  2  example,  community  effect   currently  our  work  addresses  the  formation  of  community  through  social  media.  a  next  step  will   introduce  a  wider  scope  by  addressing  the  value  of  community  formed  through  social  media.   there  is  a  rich  area  of  study  around  the  relationship  between  social  media  activity,  perceived   sense  of  community  and  connectedness,  and  student  success.  61,62,63,64,65  further  research  along   this  line  will  allow  us  to  explore  whether  a  library-­‐led  social  media  community  can  serve  as  an  aid   in  undergraduate  academic  performance  and  graduation  rate.  continued  and  extended  analysis     information  technology  and  libraries  |  march  2015      32   will  allow  us  to  increase  the  granularity  of  results,  for  example  by  mapping  user  types  to  action-­‐ object  pairs  and  identifying  the  interaction  rate  for  particular  users  such  as  students  and  faculty.   conclusion   in  articulating  and  realizing  an  intentional  and  strategic  social  media  program,  we  have  generated   results  that  demonstrate  the  community-­‐building  capability  of  social  media.  over  the  course  of   one  year,  we  transformed  our  social  media  activity  from  personality-­‐devoid  one-­‐way  broadcasting   to  personality-­‐rich  two-­‐way  interacting.  the  research  that  followed  this  fundamental  shift   provided  new  information  about  our  users  that  enabled  us  to  tailor  our  twitter  activity  and  shape   our  community  around  a  target  population  of  undergraduate  students.  in  so  doing,  we  have   formed  a  community  that  has  shown  new  interest  in  social  media  content  published  by  the  library.   following  the  application  of  our  social  media  program,  our  student  user  community  grew  by  366   percent  and  the  rate  of  interaction  with  our  community  grew  by  275  percent.  our  research   demonstrates  the  value  of  social  media  as  a  community-­‐building  tool,  and  our  model  can  guide   social  media  in  libraries  toward  this  purpose.   references   1. harry  glazer,  “clever  outreach  or  costly  diversion?  an  academic  library  evaluates  its   facebook  experience,”  college  &  research  libraries  news  70,  no.  1  (2009):  11,   http://crln.acrl.org/content/70/1/11.full.pdf+html.     2. harry  glazer,  “‘likes’  are  lovely,  but  do  they  lead  to  more  logins?  developing  metrics   for  academic  libraries’  facebook  pages,”  college  &  research  libraries  news  73,  no.  1   (2012):  20,  http://crln.acrl.org/content/73/1/18.full.pdf+html.   3. lee  a.  vucovich  et  al.,  “is  the  time  and  effort  worth  it?  one  library’s  evaluation  of  using   social  networking  tools  for  outreach,”  medical  reference  services  quarterly  32,  no.  1   (2013):  13,  http://dx.doi.org/10.1080/02763869.2013.749107.   4. 
xiang  li  and  tang  li,  “integrating  social  media  into  east  asia  library  services:  case   studies  at  university  of  colorado  and  yale  university,”  journal  of  east  asian  libraries  157,   no.  1  (2013):  24,   https://ojs.lib.byu.edu/spc/index.php/jeal/article/view/32663/30799.     5. colleen  cuddy,  jamie  graham,  and  emily  g.  morton-­‐owens,  “implementing  twitter  in  a   health  sciences  library,”  medical  reference  services  quarterly  29,  no.  4  (2010),   http://dx.doi.org/10.1080/02763869.2010.518915.     6. steven  bell,  “students  tweet  the  darndest  things  about  your  library—and  why  you   need  to  listen,”  reference  services  review  40,  no.  2  (2012),   http://dx.doi.org/10.1108/00907321211228264.         building  library  community  through  social  media  |  young  and  rossmann   33   7. robin  r.  sewell,  “who  is  following  us?  data  mining  a  library’s  twitter  followers,”   library  hi  tech  31,  no.  1  (2013),  http://dx.doi.org/10.1108/07378831311303994.     8. li  and  li,  “integrating  social  media  into  east  asia  library  services,”  25.   9. remi  castonguay,  “say  it  loud  spreading  the  word  with  facebook  and  twitter,”  college  &   research  libraries  news  72,  no.  7  (2011),   http://crln.acrl.org/content/72/7/412.full.pdf+html.     10. dianna  e.  sachs,  edward  j.  eckel,  and  kathleen  a.  langan,  “striking  a  balance:  effective   use  of  facebook  in  an  academic  library,”  internet  reference  services  quarterly  16,  nos.  1– 2  (2011),  http://dx.doi.org/10.1080/10875301.2011.572457.   11. christopher  chan,  “marketing  the  academic  library  with  online  social  network   advertising,”  library  management  33,  no.  8,  (2012),   http://dx.doi.org/10.1108/01435121211279849.   12. melissa  dennis,  “outreach  initiatives  in  academic  libraries,  2009–2011,”  reference   services  review  40,  no.  3,  (2012),  http://dx.doi.org/10.1108/00907321211254643.   13. melanie  griffin  and  tomaro  i.  taylor,  “of  fans,  friends,  and  followers:  methods  for   assessing  social  media  outreach  in  special  collections  repositories,”  journal  of  web   librarianship  7,  no.  3  (2013),  http://dx.doi.org/10.1080/19322909.2013.812471.     14. lili  luo,“marketing  via  social  media:  a  case  study,”  library  hi  tech  31,  no.  3  (2013),   http://dx.doi.org/10.1108/lht-­‐12-­‐2012-­‐0141.     15. li  and  li,  “integrating  social  media  into  east  asia  library  services,”  25.   16. sachs,  eckel,  and  langan,  “striking  a  balance,”  48.   17. jaana  roos,  “why  university  libraries  don’t  trust  facebook  marketing?,”  proceedings  of   the  21st  international  bobcatsss  conference  (2013):  164,   http://bobcatsss2013.bobcatsss.net/proceedings.pdf.   18. noa  aharony,  “twitter  use  in  libraries:  an  exploratory  analysis,”  journal  of  web   librarianship  4,  no.  4  (2010),  http://dx.doi.org/10.1080/19322909.2010.487766.   19. a.  r.  riza  ayu  and  a.  abrizah,  “do  you  facebook?  usage  and  applications  of  facebook   page  among  academic  libraries  in  malaysia,”  international  information  &  library  review   43,  no.  4  (2011),  http://dx.doi.org/10.1016/j.iilr.2011.10.005.   20. alton  y.  k.  chua  and  dion  h  goh.,  “a  study  of  web  2.0  applications  in  library  websites,”   library  &  information  science  research  32,  no.  3  (2010),   http://dx.doi.org/10.1016/j.lisr.2010.01.002.         
information  technology  and  libraries  |  march  2015      34   21. andrea  dickson  and  robert  p.  holley,  “social  networking  in  academic  libraries:  the   possibilities  and  the  concerns,”  new  library  world  111,  nos.  11/12  (2010),   http://dx.doi.org/10.1108/03074801011094840.     22. valerie  forrestal,  “making  twitter  work:  a  guide  for  the  uninitiated,  the  skeptical,  and   the  pragmatic,”  reference  librarian  52,  nos.  1–2  (2010),   http://dx.doi.org/10.1080/02763877.2011.527607.   23. gang  wan,  “how  academic  libraries  reach  users  on  facebook,”  college  &  undergraduate   libraries  18,  no.  4  (2011),  http://dx.doi.org/10.1080/10691316.2011.624944.   24. dora  yu-­‐ting  chen,  samuel  kai-­‐wah  chu,  and  shu-­‐qin  xu,  “how  do  libraries  use  social   networking  sites  to  interact  with  users,”  proceedings  of  the  american  society  for   information  science  and  technology  49,  no.  1  (2012),   http://dx.doi.org/10.1002/meet.14504901085.   25. rolando  garcia-­‐milian,  hannah  f.  norton,  and  michele  r.  tennant,  “the  presence  of   academic  health  sciences  libraries  on  facebook:  the  relationship  between  content  and   library  popularity,”  medical  reference  services  quarterly  31,  no.  2  (2012),   http://dx.doi.org/10.1080/02763869.2012.670588.   26. elaine  thornton,  “is  your  academic  library  pinning?  academic  libraries  and  pinterest,”   journal  of  web  librarianship  6,  no.  3  (2012),   http://dx.doi.org/10.1080/19322909.2012.702006.   27. katie  elson  anderson  and  julie  m.  still,  “librarians’  use  of  images  on  libguides  and  other   social  media  platforms,”  journal  of  web  librarianship  7,  no.  3  (2013),   http://dx.doi.org/10.1080/19322909.2013.812473.   28. grace  saw,  “social  media  for  international  students—it’s  not  all  about  facebook,”  library   management  34,  no.  3  (2013):  172,  http://dx.doi.org/10.1108/01435121311310860.   29. ligaya  ganster  and  bridget  schumacher,  “expanding  beyond  our  library  walls:  building   an  active  online  community  through  facebook,”  journal  of  web  librarianship  3,  no.  2   (2009),  http://dx.doi.org/10.1080/19322900902820929.     30. sebastián  valenzuela,  namsu  park,  and  kerk  f.  kee,  “is  there  social  capital  in  a  social   network  site?  facebook  use  and  college  students’  life  satisfaction,  trust,  and   participation,”  journal  of  computer-­‐mediated  communication  14,  no.  4  (2009),   http://dx.doi.org/10.1111/j.1083-­‐6101.2009.01474.x.   31. nancy  kim  phillips,  “academic  library  use  of  facebook:  building  relationships  with   students,”  journal  of  academic  librarianship  37,  no.  6  (2011),   http://dx.doi.org/10.1016/j.acalib.2011.07.008.       building  library  community  through  social  media  |  young  and  rossmann   35   32. tina  mccorkindale,  marcia  w.  distaso,  and  hilary  fussell  sisco,  “how  millennials  are   engaging  and  building  relationships  with  organizations  on  facebook,”  journal  of  social   media  in  society  2,  no.  1  (2013),   http://thejsms.org/index.php/tsmri/article/view/15/18.     33. valenzuela,  park,  and  kee,  “is  there  social  capital  in  a  social  network  site?,”  882.   34. maria  r.  lee,  david  c.  yen,  and  c.  y.  
hsiao,  “understanding  the  perceived  community   value  of  facebook  users,”  computers  in  human  behavior  35  (february  2014):  355,   http://dx.doi.org/10.1016/j.chb.2014.03.018.   35. hyun  jung  oh,  elif  ozkaya,  and  robert  larose,  “how  does  online  social  networking   enhance  life  satisfaction?  the  relationships  among  online  supportive  interaction,  affect,   perceived  social  support,  sense  of  community,  and  life  satisfaction,”  computers  in  human   behavior  30  (2014),  http://dx.doi.org/10.1016/j.chb.2013.07.053.   36. rowena  cullen  and  laura  sommer,  “participatory  democracy  and  the  value  of  online   community  networks:  an  exploration  of  online  and  offline  communities  engaged  in  civil   society  and  political  activity,”  government  information  quarterly  28,  no.  2  (2011),   http://dx.doi.org/10.1016/j.giq.2010.04.008.   37. mohamed  nanabhay  and  roxane  farmanfarmaian,  “from  spectacle  to  spectacular:  how   physical  space,  social  media  and  mainstream  broadcast  amplified  the  public  sphere  in   egypt’s  ‘revolution,’”  journal  of  north  african  studies  16,  no.  4  (2011),   http://dx.doi.org/10.1080/13629387.2011.639562.   38. nermeen  sayed,  “towards  the  egyptian  revolution:  activists  perceptions  of  social  media   for  mobilization,”  journal  of  arab  &  muslim  media  research  4,  nos.  2–3  (2012):  273–98,   http://dx.doi.org/10.1386/jammr.4.2-­‐3.273_1.   39. morton  a.  lieberman  and  andrew  winzelberg,  “the  relationship  between  religious   expression  and  outcomes  in  online  support  groups:  a  partial  replication,”  computers  in   human  behavior  25,  no.  3  (2009),  http://dx.doi.org/10.1016/j.chb.2008.11.003.   40. christopher  e.  beaudoin  and  chen-­‐chao  tao,  “benefiting  from  social  capital  in  online   support  groups:  an  empirical  study  of  cancer  patients,”  cyberpsychology  &  behavior:  the   impact  of  the  internet,  multimedia  and  virtual  reality  on  behavior  and  society  10,  no.  4   (2007),  http://dx.doi.org/10.1089/cpb.2007.9986.     41. manuela  tomai  et  al.,  “virtual  communities  in  schools  as  tools  to  promote  social  capital   with  high  schools  students,”  computers  &  education  54,  no.  1  (2010),   http://dx.doi.org/10.1016/j.compedu.2009.08.009.         information  technology  and  libraries  |  march  2015      36   42. edward  shih-­‐tse  wang  and  lily  shui-­‐lien  chen,  “forming  relationship  commitments  to   online  communities:  the  role  of  social  motivations,”  computers  in  human  behavior  28,   no.  2  (2012),  http://dx.doi.org/10.1016/j.chb.2011.11.002.   43. pippa  norris  and  david  jones,  “virtual  democracy,”  harvard  international  journal  of   press/politics  3,  no.  2  (1998),  http://dx.doi.org/10.1177/1081180x98003002001.     44. xu  cheng,  jiangchuan  liu,  and  cameron  dale,  “understanding  the  characteristics  of   internet  short  video  sharing:  a  youtube-­‐based  measurement  study,”  ieee  transactions   on  multimedia  15,  no.  5  (2013),  http://dx.doi.org/10.1109/tmm.2013.2265531.   45. nancy  foasberg,  “online  reading  communities:  from  book  clubs  to  book  blogs,”  journal   of  social  media  in  society  1,  no.1  (2012),   http://thejsms.org/index.php/tsmri/article/view/3/4.     46. 
matthias  hofer  and  viviane  aubert,  “perceived  bridging  and  bonding  social  capital  of   twitter:  differentiating  between  followers  and  followees,”  computers  in  human  behavior   29,  no.  6  (2013):  2137,  http://dx.doi.org/10.1016/j.chb.2013.04.038.     47. rachel  grieve  et.  al.,  “face-­‐to-­‐face  or  facebook:  can  social  connectedness  be  derived   online?,”  computers  in  human  behavior  29,  no.  3  (2013):  607,   http://dx.doi.org/10.1016/j.chb.2012.11.017.   48. ibid.,  608.   49. nathan  jurgenson,  “when  atoms  meet  bits:  social  media,  the  mobile  web  and  augmented   revolution,”  future  internet  4,  no.  1  (2012),  http://dx.doi.org/10.3390/fi4010083.   50. r.  stuart  geiger,  “bots,  bespoke,  code  and  the  materiality  of  software  platforms,”   information,  communication  &  society  17,  no.  3  (2014),   http://dx.doi.org/10.1080/1369118x.2013.873069.     51. zeynep  tufekci,  “the  social  internet:  frustrating,  enriching,  but  not  lonely,”  public   culture  26,  no.  1,  iss.  72  (2013):  14,  http://dx.doi.org/10.1215/08992363-­‐2346322.   52. nicole  b.  ellison,  charles  steinfield,  and  cliff  lampe,  “the  benefits  of  facebook  ‘friends’:   social  capital  and  college  students’  use  of  online  social  network  sites,”  journal  of   computer-­‐mediated  communication  12,  no.  4  (2007):  1144,   http://dx.doi.org/10.1111/j.1083-­‐6101.2007.00367.x.     53. ibid.,  1165.   54. roger  brown,  a  first  language:  the  early  stages  (cambridge:  harvard  university  press,   1973).       building  library  community  through  social  media  |  young  and  rossmann   37   55. mimi  zhang  and  bernard  j.  jansen,  “using  action-­‐object  pairs  as  a  conceptual  framework   for  transaction  log  analysis,”  in  handbook  of  research  on  web  log  analysis,  edited  by   bernard  j.  jansen,  amanda  spink,  and  isak  taksa  (hershey,  pa:  igi,  2008).   56. bernard  j.  jansen  and  mimi  zhang,  “twitter  power:  tweets  as  electronic  word  of  mouth,”   journal  of  the  american  society  for  information  science  &  technology  60,  no.  11  (2009),   http://dx.doi.org/10.1002/asi.v60:11.     57. sewell,  “who  is  following  us?”     58. glazer,  “‘likes’  are  lovely,”  20.   59. lee,  yen,  and  hsiao,  “understanding  the  perceived.“   60. oh,  ozkaya,  and  larose,  “how  does  online  social  networking.”     61. reynol  junco,  greg  heiberger,  and  eric  loken,  “the  effect  of  twitter  on  college  student   engagement  and  grades,”  journal  of  computer  assisted  learning  27,  no.  2  (2011),   http://dx.doi.org/10.1111/j.1365-­‐2729.2010.00387.x.   62. susannah  k.  brown  and  charles  a.  burdsal,  “an  exploration  of  sense  of  community  and   student  success  using  the  national  survey  of  student  engagement,”  journal  of  general   education  61,  no.  4  (2012),  http://dx.doi.org/10.1353/jge.2012.0039.   63. jill  l.  creighton  et  al.,  “i  just  look  it  up:  undergraduate  student  perception  of  social  media   use  in  their  academic  success,”  journal  of  social  media  in  society  2,  no.  2  (2013),   http://thejsms.org/index.php/tsmri/article/view/48/25.     64. david  c.  deandrea  et  al.,  “serious  social  media:  on  the  use  of  social  media  for  improving   students’  adjustment  to  college,”  the  internet  and  higher  education  15,  no.  1  (2012),   http://dx.doi.org/10.1016/j.iheduc.2011.05.009.   65. 
rebecca  gray  et  al.,  “examining  social  adjustment  to  college  in  the  age  of  social  media:   factors  influencing  successful  transitions  and  persistence,”  computers  &  education  67   (2013),  http://dx.doi.org/10.1016/j.compedu.2013.02.021.     user experience methods and maturity in academic libraries articles user experience methods and maturity in academic libraries scott w. h. young, zoe chao, and adam chandler information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11787 scott w. h. young (swyoung@montana.edu) is ux and assessment librarian, montana state university. zoe chao (chaoszuyu@gmail.com) is ux designer, truist financial. adam chandler (alc28@cornell.edu) is director of automation, assessment, and post-cataloging services, cornell university. abstract this article presents a mixed-methods study of the methods and maturity of user experience (ux) practice in academic libraries. the authors apply qualitative content analysis and quantitative statistical analysis to a research dataset derived from a survey of ux practitioners. results reveal the type and extent of ux methods currently in use by practitioners in academic libraries. themes extracted from the survey responses also reveal a set of factors that influence the development of ux maturity. analysis and discussion focus on organizational characteristics that influence ux methods and maturity. the authors conclude by offering a library-focused maturity scale with recommended practices for advancing ux maturity in academic libraries. introduction user experience (ux) is a design practice for creating tools and services from a user-centered perspective. academic libraries have been practicing ux for some time, with ux methods having been incorporated across the profession. however, there has been a lack of empirical data showing the extent of ux methods in use or state of ux maturity in libraries. to help illuminate these areas, we distributed a survey to ux practitioners working in academic libraries that inquired into methods and maturity. we followed a mixed-methods approach involving both qualitative content analysis and quantitative statistical analysis to analyze the dataset. our results reveal the mostand least-common ux methods currently in use in academic libraries. results also demonstrate specific organizational characteristics that help and hinder ux maturity. we conclude by offering a set of strategies for reaching higher levels of ux maturity. 
background and motivation: ux in academic libraries ux has been represented in the literature of library and information science for at least two decades, when “the human interaction involved in service use” was recognized as a factor affecting the value and impact of libraries.1 the practice of ux has expanded and evolved and is now a growing specialty in the librarianship profession.2 ux in libraries is motivated by a call to actively pay close attention to users’ unique and distinctive requirements, which allows libraries to more effectively design services for our communities.3 as a practice, ux is now beginning to be represented in graduate curricula, public services and research support, access services, space design, and web design.4 with its attunement to a set of practices and principles, ux can be viewed as a research and design methodology similar and related to other methodologies that focus on users, services, problem solving, participation, collaboration, and qualitative data analysis .5 notably, ux is related to human-centered design, service design, and participatory design.6 mailto:swyoung@montana.edu mailto:chaoszuyu@gmail.com mailto:alc28@cornell.edu information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 2 specific methods of ux practice are today wide-ranging. they include surveys, focus groups, interviews, contextual inquiry, journey mapping, usability testing, personas, card sorting, a/b testing, ecology maps, observations, ethnography, prototyping, and blueprinting.7 some ux methods are incorporated into agile development processes.8 though tools and techniques are available to library ux practitioners in abundance, the rate of adoption of these tools is less understood. in a notable contribution to this question, pshock showed through a nation-wide survey that the most familiar ux methods among library practitioners included usability testing, surveys, and focus groups.9 the question of methods is related to the question of maturity—how advanced is library ux practice? in addition to the rate of adoption of methods and tools, several different ux maturity models have been advanced in recent years. 
priestner derives maturity from four factors: culture of innovation, infrastructure agility, acceptance of failure, and library user focus.10 in discussing ux capacity in libraries, macdonald proposes a six-stage maturity model: unrecognized, recognized, considered, implemented, integrated, and institutionalized.11 sharon defines maturity as a combination of staff resources and organizational buy-in.12 similarly, sheldon-hess proposes a five-level scale of ux maturity, based primarily on the degree of implementation of ux practice and user-centered thinking in an organization.13 and even earlier, nielsen proposed an eight-level scale of ux maturity, starting with "hostility toward usability" and concluding with a "user-driven" organization.14 after reviewing a number of different maturity models, anderson reports that the most common hierarchies include the following steps: (1) absence/unawareness of ux research, (2) ux research awareness and ad hoc research, (3) adoption of ux research into projects, (4) maturing of ux research into an organizational focus, (5) integrated ux research across strategy, and (6) complete ux research culture.15 the field of library ux shows a clear and compelling interest in ux maturity, and we can benefit from further empirical evidence that can help illuminate the current state and future progress toward ux maturity, including the rate of adoption of methods, resource allocation toward ux, and organizational buy-in. the research presented in this paper is motivated by the need to provide current and comprehensive data to answer questions related to ux maturity in academic libraries.

methods

research questions

the research questions for this study are the following:

• rq1: how mature is ux practice within academic libraries?
• rq2: what factors influence ux maturity?

to answer these questions, we distributed a survey to ux practitioners working in academic libraries. survey responses were analyzed qualitatively using content analysis and quantitatively using statistical analysis.

survey participants

the team members sent out the survey on may 23, 2018, to library profession electronic discussion lists.16 of the 87 received responses, 74 included an institution name. we identified size and setting classification for these institutions using the carnegie classification of institutions of higher education (see table 1).17 eight of them could not be mapped to the carnegie classification because they are outside the united states (n = 6) or of different scopes (one research lab and one information school). six schools have more than one response; these responses are treated separately to represent the diversity of opinion and experience within an organization.

classification             response count   percentage
four-year, large           49[18]           56
four-year, medium          10[19]           11
university outside us      6                7
four-year, small           5                6
non-university             2                2
four-year, very small      1                1
two-year, very large       1                1
unspecified                13               15

table 1. institutional profiles of survey respondents, with response counts.

materials and procedure

our online survey was organized into two main parts. after an initial informed consent section, the survey investigated (1) demographics and ux methods and (2) ux maturity.
demographics and ux methods in the first main part of the survey, participants were asked to select among 20 different ux methods that “you personally use at least every year or two at your institution.” the list of methods is derived from the ux research cheat sheet by nielsen norman group.20 participants were asked to complete an optional free-text response question: “would you like to add a comment clarifying the way you completed [this question]?” ux maturity in the second main part of the survey, participants were asked to identify the ux maturity stage that “properly describes the current ux status” in their organization. the stages were adapted from the eight-stage scale of ux maturity proposed by nielsen norman group: • stage 1: hostility toward usability • stage 2: developer-centered ux • stage 3: skunkworks ux • stage 4: dedicated ux budget • stage 5: managed usability • stage 6: systematic user-centered design process • stage 7: integrated user-centered design information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 4 • stage 8: user-driven organization we concluded the survey by asking participants to optionally “explain why you selected that stage” with a free-text response. research data analysis content analysis we followed the methodology of content analysis.21 each qualitative survey response functioned as a meaning unit, with meaning units sorted into themes and subthemes. each article author coded units independently; themes were resolved through discussion among the author group. the process of coding via content analysis allowed us to identify overarching trends in ux practice and maturity. results are further discussed below. statistical analysis data preparation and statistical analysis were conducted using r version 3.4.1 (see table 2 for full r package). base r was used for our statistical analysis. other r packages utilized in the project are listed in the table below. r package name version ggplot2 3.0.0 tibble 2.1.1 dplyr 0.7.5 tidyr 0.8.1 stringr 1.4.0 readr 1.1.1 readxl 1.3.1 table 2. r packages used in the analysis data preparation the following steps were taken in the data analysis: 1. content analysis into themes (see above) 2. normalize institution names. we received more than one response from a few institutions. for these, the responses were treated as separate responses that happened to have the same demographics. 3. for responses that included institution names, we added a total student population variable to the response using values derived from wikipedia and the carnegie classification of institutions of higher education. information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 5 4. for variables we derived during the content analysis we coded them as 0 or 1 dummy variables, that is, 0 = not present, and 1 = present. coding them in this way allows us to bring them into a multiple linear regression model. 5. using an r script, we tested each response for the presence of the content analysis, 0 or 1. 6. plots were created using the r ggplot2 library. 7. linear regression models were conducted using the base r lm function. research dataset dataset, survey instrument, and r code are available through dryad at https://doi.org/10.5061/dryad.jwstqjq5d.22 survey respondents eighty-seven participants responded to one or more components of the survey. see table 3 for a breakdown of survey responses. 
survey question responses ux methods multiple choice: “please check the following ux methods that you personally use at least every year or two at your institution.” 81 ux methods free-text response: “would you like to add a comment clarifying the way you completed [the question related to ux methods]?” 20 ux maturity stage multiple choice: “which of the following [maturity stages] do you think properly describes the current ux status in your organization?” 79 ux maturity stage free-text response: “please explain why you selected that stage.” 54 table 3. survey responses. results our research results demonstrate that certain characteristics of a library organization are related to ux maturity. these characteristics include the type and extent of ux methods that are currently in use, as well as organizational factors such as leadership support, staffing, and collaboration. we further explicate below according to our two research questions. rq1: how mature is user experience practice within academic libraries? our survey also asked participants to identify which stage of the nielsen norman group maturity scale “properly describes the current ux status” in their organization. our findings indicate that most libraries are in a low-to-middle range of maturity, with more than 75% of respondents placing their organization at either stage 3, stage 4, or stage 5 (figure 1). https://doi.org/10.5061/dryad.jwstqjq5d information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 6 figure 1. histogram of responses by stage, showing that the majority of respondents placed their organization at either stage 3, stage 4, or stage 5. rq2: what factors influence ux maturity? overview of statistical analysis results we use linear regression for two different applications in this study (see appendix a for glossary of terms related to statistical analysis). the process of creating a statistical model allows us to see, with varying degrees of confidence, the impact of different variables on ux maturity stage. the results of the linear regression help us to tease out the variables with the most predictive value. using certain methods does not cause the library to be at a higher stage; rather, libraries that use certain methods tend to be at a higher stage, statistically. that is what is meant by “predictive” in this context. linear regression provides a ground truth in what we think we are seeing in survey responses: a useful general principle in science is that when you don’t know the true form of a relationship, start with something simple. a linear equation is perhaps the simplest way to describe a relationship between two or more variables and still get reasonably accurate predictions.23 the other reason we are using the linear regression output is to inform a possible future version of a ux maturity survey instrument, one more finely tuned to libraries than the nielsen instrument alone that we used in this iteration.24 we feel that our use of multiple linear regression is appropriate and helpful given the exploratory nature of our study. the complete output is available at https://doi.org/10.5061/dryad.jwstqjq5d. size of institution we used the institution’s student population, the number of full-time enrolled students, as our proxy for the size of the library. 
our assumption is that larger enrollment generally means a larger number of library staff. there are different ways that ux maturity level could be compared with student enrollment; because the range in our sample is very wide, from 1,098 to 200,000 students across the sample of institutions, we attempted to control for the vast differences in size between the smallest and largest institutions by sorting them from smallest to largest, 1 to 69 (the total number of cases in our dataset with both a stage and a population defined), and then assigning that rank to the institution as an additional demographic variable. we then created a simple linear regression model comparing maturity stage as a function of ranked size. the null hypothesis is that there is no relationship between ranked size of institution and stage. stage is the response variable and ranked size of the institution is the explanatory variable. the adjusted r-squared for this relationship is 0.027. this means that only about 3% of the variance is accounted for by the ranked size of the institution. the probability, or "p-value," of getting our observed result if the null hypothesis is true for this relationship is 0.095 (almost 10%). this exceeds the standard .05 confidence level commonly used in statistical analysis. therefore the size of the institution is not a reliable predictor of ux maturity level in our sample, a counterintuitive finding. the full statistical summary is available in the appendix.

methods currently in use by academic libraries

our next rq2 finding relates to the type and extent of ux methods that are currently in use in academic libraries. our survey asked participants to select which ux methods "you personally use at least every year or two at your institution." user surveys, usability testing, and user interviews stand out as the most commonly used. figure 2 shows response counts for all of the methods in the survey.

figure 2. number of respondents that selected each method in the survey, showing the type and extent of ux methods currently in use in academic libraries.

we then examined the number of methods in use per institution compared to the reported maturity stage (figure 3). the number of methods used per institution illustrates a trend: more methods used at an institution generally means the institution is at a higher stage of maturity.

figure 3. the number of methods used per institution, illustrating that more methods currently in use at an institution generally indicates a higher level of maturity.

another way of representing the same two variables (reported number of methods and maturity stage) is with a scatterplot and statistical test (figure 4). in this simple linear regression model we have two variables: the response variable is stage and the explanatory variable is the number of ux methods used in the past two years. in plotting these two variables on a chart, we can draw a line that minimizes the distance between the line and all of the points on the plot. like the chart above, the linear relationship between total methods and stage is clearly visible.
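the ranking step and the two simple regressions described above can be sketched with base r; the data frame below is invented and its column names are assumptions, so the output will not reproduce the study's coefficients (the deposited code and data are available through dryad).

# minimal sketch (invented example data): ranked size and two simple linear models
responses <- data.frame(
  stage         = c(3, 5, 4, 2, 6, 4),                        # self-reported maturity stage
  enrollment    = c(1098, 42000, 15000, 2300, 200000, 30000),
  total_methods = c(4, 12, 7, 2, 15, 8)                        # number of ux methods in use
)

# assign a rank from smallest to largest enrollment, as described above
responses$size_rank <- rank(responses$enrollment)

# maturity stage as a function of ranked institution size
size_model <- lm(stage ~ size_rank, data = responses)
summary(size_model)       # the study reported an adjusted r-squared of about 0.027

# maturity stage as a function of the number of ux methods in use
methods_model <- lm(stage ~ total_methods, data = responses)
summary(methods_model)    # total methods accounted for roughly 18% of the variance in the study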
the total number of methods practiced accounts for about 18% of the variance when predicting the correct maturity stage. (recall from our discussion about ranked size of institution that rank accounts for less than 3% of the variation, and is not even statistically significant.) in this case, the p-value is far below the 0.05 threshold, meaning the likelihood that we are seeing a relationship by random chance is very low. therefore total number of methods is predictive of stage. generally, the more methods respondents chose, the higher the maturity stage. we can see from this data that the number of methods used is more predictive of maturity stage than institution size. information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 10 figure 4. maturity stage compared against total number of methods used, showing the positive relationship between number of ux methods used and ux maturity stage. for a more granular view, figure 5 shows the relation of specific ux methods used in different ux research phases (as categorized in the survey question, with methods organized by discovery, exploration, listening, and testing) to reported maturity stage. information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 11 figure 5. showing the relation of specific ux methods to reported maturity stage. factors that influence ux methods: recency, formality, regularity we then applied content analysis to the free-text questions of our survey. following the question that asked participants to select among 20 different ux methods that “you personally use at least every year or two at your institution,” the free-text question asked, “would you like to add a comment clarifying the way you completed [this question]?” each of the 20 free-text responses to this question was counted and categorized as a “meaning unit.” themes were extracted from the free-text survey responses. we identified 3 themes across 20 meaning units: formality, regularity, and recency (see table 4). question would you like to add a comment clarifying the way you completed [this question related to ux methods]? thematic analysis theme definition number of meaning units* example meaning unit information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 12 recency how new or developed a library’s ux practice 7 “i am fairly new here and we are still developing a process that is well-rounded.” formality how formal or structured the ux practice is 9 “we are aware of many of the techniques mentioned, but we don’t have a formal process for implementing them.” regularity how often or frequently ux is practiced 4 “right now we are doing a workflow analysis of interlibrary loan, but once completed probably wouldn’t do that for another three to four years.” *each free-text response was counted and categorized as a single meaning unit. table 4. qualitative questions and thematic analysis for ux methods responses (n = 20). 
factors that influence ux maturity: leadership support, collaboration, ux lead, ux group, growth, and resources we also conducted a content analysis on the free-text responses to the survey question related to the ux maturity scale that asked participants to “explain why you selected that stage.” each of the 54 free-text responses to this question was counted and categorized as a “meaning unit.” themes were extracted from the free-text survey responses. we identified 7 themes across 54 meaning units: leadership support, collaboration, ux lead, ux group, growth, and resources, and strategic alignment (see table 5). question please explain why you selected [the current ux status in your organization]? thematic analysis theme definition number of meaning units* example meaning unit leadership support the degree to which ux work is seen, understood by, and supported by library leadership. 32 “just last year, the ux team moved into administration so that we can tie our work to strategic planning for the organization.” information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 13 ux group the presence of a committee or working group that conducts or otherwise supports ux work. 31 “i also chair a web working group which focuses on improving our website from a usability standpoint.” collaboration the degree to which ux work is collaboratively shared by individuals and departments throughout the library 30 “i don't know if ux has become a necessarily planned activity across the whole organization. i am team of one, and though i’ve tried, i haven’t been able to add anyone else to form an official ux team as well.” ux lead personnel assigned to ux work, especially a dedicated ux lead 30 “i have recently been hired to partially work with ux and another person has been appointed ux coordinator.” growth the degree to which expansion occurs around staffing, resources, and organizational understanding of ux work. 13 “we . . . will soon be posting a position for a ux librarian.” resources the amount of time and budgetary resources dedicated to ux. 10 “budget is our biggest constraint when it comes to ux testing.” strategic alignment the inclusion of ux or usercenteredness in strategic planning 2 “we do employ user research to determine where to target priorities and strategy. however, i do not think we have a robust process for iterative testing or participatory design yet.” * each free-text response was counted and categorized as a single meaning unit. table 5. qualitative questions and thematic analysis for ux maturity responses (n = 54). information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 14 this data can be visualized to show the relationships between ux maturity stage and the coded thematic responses (figure 6). figure 6. coded responses versus selected stage (0 indicates no comment related to that theme), showing that a lack of leadership support is often cited as a reason for not advancing past stage 3; the presence of dedicated staff in the form of a ux lead or a ux group is often cited as a reason for reaching stage 5. full ux maturity model: ux maturity as a function of ux methods in building a full model for the purposes of quantitative data analysis, we are attempting to predict the maturity stage based on the many different variables that appear in our dataset. 
this statistical exercise is a heuristic tool that can help us understand the survey responses and draw results from the dataset that reveal key characteristics of ux maturity in libraries. we approached building a full model using a modified backward stepwise approach. with this approach, we begin with the full range of variables and work backward, step by step, to focus only on those variables that combine to form a model that makes the best predictions about the response variable, the ux maturity stage, for each case. through this process, those variables that are less predictive are removed from the model one by one until we can settle on a model that explains the most variance.25 the modified backward stepwise "step" function used to create our model required 18 iterations before settling on the best version. using adjusted r-squared as our metric, our full model accounts for 62% of the variance for this dataset. adjusted r-squared is an appropriate measure because it allows us to include many variables but also includes a penalty for including too many variables (as a penalty, the adjusted r-squared value will decrease). with this model, we can make reasonable estimates of the maturity stage that a survey respondent selected by knowing which methods they use combined with the coded explanation the respondent provided via the free-text survey questions. the coded responses (see table 4 and table 5) provided measurable insights into the organizational context of our respondents' institutions, and this allows us to analyze and predict their respective maturity levels. with this additional information, we have a model that represents the multiple dimensions available in the dataset (see appendix b for additional data analysis).

in table 6, we show the relationship between specific ux methods and the ux maturity stages. we see here that journey mapping, for example, is a highly influential factor for ux maturity.

variable                       estimated influence on maturity stage   p-value (significance)
journey maps                   1.7                                     0.001***
design review                  1.3                                     0.010*
user interviews                0.9                                     0.047*
usability testing              0.8                                     0.158
benchmark testing              0.7                                     0.067
usability bug review           0.3                                     0.498
user stories                   0.2                                     0.440
requirements and constraints   0.2                                     0.514
user surveys                   -0.3                                    0.492
diary/camera studies           -0.6                                    0.325
faq review                     -0.8                                    0.062
prototype testing              -0.8                                    0.076
field studies                  -1.3                                    0.003**

*p < .05 (statistically significant result), **p < .01, ***p < .001 (a highly statistically significant result)

table 6. relationship between ux method variables and predicted maturity stage.
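the modified backward stepwise procedure described above can be approximated with base r's step function, as in the sketch below; the data are simulated, the variable names are assumptions, and step() selects by aic rather than by adjusted r-squared, so this is a simplification of the study's approach rather than its deposited code.

# minimal sketch: backward stepwise selection over dummy-coded (0/1) predictors.
# the data frame is simulated; the coefficients used in the simulation echo tables 6 and 7.
set.seed(1)
n <- 40
responses <- data.frame(
  journey_maps          = rbinom(n, 1, 0.3),
  design_review         = rbinom(n, 1, 0.3),
  user_interviews       = rbinom(n, 1, 0.6),
  leadership_support_no = rbinom(n, 1, 0.4),
  resources_yes         = rbinom(n, 1, 0.2)
)
responses$stage <- with(responses,
  4 + 1.7 * journey_maps + 1.3 * design_review + 0.9 * user_interviews -
  1.0 * leadership_support_no + 2.9 * resources_yes + rnorm(n))

# start from the full model, then drop the least useful variable step by step
full_model    <- lm(stage ~ ., data = responses)
reduced_model <- step(full_model, direction = "backward", trace = FALSE)
summary(reduced_model)    # the study's final model reached an adjusted r-squared of 0.62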
in table 7, we show the relationship between the coded responses from the free-text survey questions (presented in tables 4 and 5) and the ux maturity stages. we see through this analysis that variables such as "resources" are important for advancing maturity. similarly, we see that a lack of "leadership support" has a strong negative effect on maturity.

variable                  estimated influence on maturity stage   p-value (significance)
resources: yes            2.9                                     0.014*
collaboration: yes        0.8                                     0.147
growth: yes               0.2                                     0.561
resources: no             -0.2                                    0.615
ux lead: no               -0.5                                    0.216
leadership support: yes   -0.7                                    0.177
ux lead: yes              -0.9                                    0.038*
ux group: no              -0.9                                    0.022*
leadership support: no    -1.0                                    0.009**
strategic alignment: no   -2.8                                    0.012*

*p < .05 (statistically significant result), **p < .01, ***p < .001 (a highly statistically significant result)

table 7. relationship between organizational variables and predicted maturity stage, in descending order of influence on maturity stage.

a statistical example case: estimating ux maturity

to help the reader understand the statistical summary provided by our model, we take a close look at one case drawn from one actual survey participant. in this example case, the respondent's institution is a four-year, large university. the intercept for this multiple regression model happens to be 4.1119. the intercept in a multiple regression model represents the mean response (stage) when all the predictors are zero.26 it is a baseline. our example institution has practiced the following methods, with their respective influence on ux maturity included in parentheses:

• user interviews (+0.9521)
• usability testing (+0.7984)
• benchmark testing (+0.7124)
• usability bug review (+0.2692)
• field studies (-1.3346)
• prototype testing (-0.8454)
• user surveys (-0.3204)

additionally, this institution has the following organizational characteristics, with their respective influence on ux maturity included in parentheses:

• leadership support: yes (-0.6842)
• resources: no (-0.2192)

by adding these numbers together with the starting point (4.1119), we can calculate that the predicted stage for this institution is 3.44.

actual stage selected by survey respondent: 3
residual: -0.44
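the addition in this example can be checked in a few lines of r; the coefficient values below are the ones listed above.

# minimal check of the example case's arithmetic, using the coefficients listed above
intercept <- 4.1119

method_effects <- c(user_interviews      =  0.9521,
                    usability_testing    =  0.7984,
                    benchmark_testing    =  0.7124,
                    usability_bug_review =  0.2692,
                    field_studies        = -1.3346,
                    prototype_testing    = -0.8454,
                    user_surveys         = -0.3204)

org_effects <- c(leadership_support_yes = -0.6842,
                 resources_no           = -0.2192)

predicted_stage <- intercept + sum(method_effects) + sum(org_effects)
round(predicted_stage, 2)   # 3.44; the respondent selected stage 3, giving a residual of -0.44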
discussion in interpreting our results, we have identified four key areas that we wish to emphasize: the significance of leadership support, the importance of organization-wide collaboration, the role of applied ux methods, and the emerging theory and practice of ux and design in libraries. leadership support and strategic alignment a major theme evident in the results relates to leadership support and strategic alignment. as expressed by the survey respondents, leadership support is viewed as the degree to which ux work is seen, understood by, and supported by library leadership and organizational strategic planning. in particular, a lack of support and visioning from leadership exerts negative pressure on ux maturity. on the other hand, when ux is coordinated with leadership vision and situated into strategic planning, ux maturity was rated more highly. from a leadership perspective, ux maturity relies on an allocated budget and designated staff to move beyond an ad hoc approach and reach higher levels on the maturity scale. one might expect that the larger an institution, the more advanced the ux maturity stage. however, based on our data analysis, size of institution is not a significant factor in ux maturity. therefore the resources provided to library ux activities may not be about how large institutions are, but rather if leadership acknowledge the importance of ux and provide official, particularly financial, support. organizational collaboration another major theme was collaboration—the degree to which ux research is collaboratively shared by individuals and departments throughout the library. higher levels of ux maturity are driven by a widespread understanding of ux within an organization, with user research data information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 18 integrated into decision-making across multiple touchpoints. conversely, a lack of collaboration was a factor that hindered maturity. many respondents shared similar experiences, telling us that other staff or departments within the organization are not ready to embrace the potential of ux data, methods, and insights. we recognize that cultivating ux is an organic process that can result in uneven growth of ux within an organization. some units may be ready to move further and faster while others may hesitate to contribute or collaborate. not every department will immediately see the relevance or value of ux work for their area. accounting or human resources, for example, might consider ux as beyond the scope of their practice. thinking inclusively and holistically from the perspective of user-centered service design, however, opens up new connections between ux and the work of all departments across the organization. ux can help center those users—even internal users—who interact with service points such as accounting or human resources in ways that can improve the service experience for all involved. applied ux methods across the 20 methods that we included in the survey, our results indicate that the application of different ux methods varies widely in type and extent. many methods are in use to varying degrees. as the methods relate to maturity, we find that a greater number of methods in use during the previous two years was indicative of a higher maturity rating. in short, more methods lead to more maturity. 
the five most common methods included usability testing, user surveys, user interviews, accessibility evaluation, and field studies. these methods are similar in their ease of implementation and their wide representation in the library literature, and due to their commonness, they are not strongly indicative of ux maturity, high or low. the five least common methods included journey maps, benchmark testing, design review, faq review, and diary/camera studies. in this grouping we see a set of ux methods that are not as well known or widely discussed, but which can paint a more complete picture of the user experience. journey mapping in particular was strongly and positively influential on ux maturity in our statistical model. this result does not necessarily indicate that a library can boost ux maturity simply by creating a journey map. rather, we interpret this to indicate that the method itself is reflective of a coordinated ux effort in the institution. journey mapping aims to obtain a high-level overview of a user's interactions with every touch point to accomplish a task. as such, the successful implementation of a journey map relies on cross-functional and cross-departmental input and interpretation. this result calls for greater collaboration toward greater ux maturity.

ux as an emerging practice within libraries
many respondents focused on the newness or the maturity of their library's ux practice, and most responses connected low methods usage to the newness of the practice. in these responses, we see that ux in libraries is still a new field, and the practice is emerging with variations across institutions in terms of methods and maturity. we note that institutional size was not a factor that influenced maturity—some smaller institutions reported mature ux practice while some larger institutions reported lower ux maturity. in this result, we see that the amount of possible resources matters less than the intentional application of those resources in support of ux work. as institutions begin to see the value of ux and dedicate increasingly more resources relative to their budgets, ux maturity increases. our survey respondents shared a variety of experiences along this journey toward maturity. many told us "i'm new here" and that their library doesn't fully understand ux and isn't yet ready to include ux research in decision-making or strategic planning, or that the institution doesn't have a plan yet for how to integrate the ux librarian into library operations. still others reported that librarians in other units or library administrators are not required or encouraged to consult with the ux librarian or integrate ux research. in this way, many libraries continue a more traditional model of decision-making that does not regularly apply intentional methods to account for the voices of users. on the upper end of the maturity scale, on the other hand, we see wide adoption of ux as a legitimate area of work across units and within leadership groups. in this way, some libraries have demonstrated more responsiveness to ux and have more successfully integrated ux practices into strategic and operational workflows.
through the survey responses, we see a three-step progression that marks the emergence of ux as a trusted and legitimate methodology for understanding user experiences and designing library services: recency, formality, and regularity (table 4). in the earlier stages of maturity, survey respondents emphasize the newness or recency of a group or person assigned to conduct ux work. from there, a ux practice emerges as increasingly more formal as more ux methods are introduced more often into different contexts. finally, as a library reaches ux maturity, we see a frequent application of a wide variety of ux methods in all corners of the library and with many stakeholders, along with organizational decision-making that regularly includes ux research data.

a ux maturity scale for libraries
to help in understanding the ux maturity scale and the characteristics related to each of its stages, we have adapted the nielsen norman ux maturity scale for a library context. table 8 shows a set of organizational characteristics that correspond to the eight stages of ux maturity. the indicators in table 8 are presented as an approximate guideline for understanding and diagnosing ux maturity.

stage / key indicators
stage 1–2: apathy or hostility to ux practice; lack of resources and staff for ux
stage 3: ad hoc ux practices within the organization; ux is practiced, but unofficially and without dedicated resources or staff; leadership does not fully understand or support ux
stage 4: leadership beginning to understand and support ux; dedicated ux budget; ux is assigned fully or partly to a permanent position
stage 5: the ux lead or ux group collaborates with units across the organization and contributes ux data meaningfully to organizational and strategic decision-making
stage 6: ux research data is regularly included in projects and decision-making; a wide variety of methods are practiced regularly by multiple departments
stage 7–8: ux is practiced throughout the organization; decisions are made and resources are allocated only with ux insights as a guide
table 8. key indicators for ux maturity in academic libraries.

this scale reflects the research presented in this paper while building on related models and prior research (more granularity is available in stages 2–6 because we received more survey responses representing those stages). we note that our research is consonant with prior work in this area. priestner includes a greater focus on library users (in contrast to a focus on library staff) as a key driver of library ux maturity.27 macdonald reports that ux work is defined by applied methods, in particular, qualitative research.28 sharon describes a ux maturity model based on two primary factors: the presence of ux researchers on staff and whether the organization actually listens to and responds to ux research.29 finally, sheldon-hess bases library ux maturity on the extent of applied ux methods and the level of user-centeredness present in an organization, as indicated by the degree to which staff consider user perspectives in internal communications and decision-making.30 taken together, we see common strands that can help illuminate the key factors of ux maturity in libraries: applied methods, leadership support in the form of resources and strategic alignment, organizational collaboration, and decision-making that includes ux research.
strategies for climbing the maturity scale: toward a more user-centered library
our results reveal a few key barriers and boosts to higher maturity, and one key point of stagnation. across the maturity scale, important factors that positively influence maturity involve leadership support and resource allocation toward ux in the form of personnel and infrastructure such as physical space, materials, strategic direction, and a working budget. notably, respondents in our survey reported being stuck at stage 3 due to a lack of leadership support. for instance, when resource-related comments appeared, we primarily heard about a lack of resources, which impaired maturity. participants reported a mixture of personnel in support of ux work. some libraries have a staff member dedicated to ux but lack a committee structure to support and advocate for the work. other libraries do not have dedicated ux staff but had formed committee infrastructure to collaboratively move ux forward. participants who lacked either a ux group or a ux lead reported lower levels of maturity and were particularly stagnated at stage 3 (see figure 5 above). alternatively, libraries are boosted to stage 5 with the presence of a fully empowered ux lead who has the support of a ux group or committee that can network throughout the organization and drive collaboration and cross-functional implementation of ux methods and research data. we found that respondents from libraries that possessed both a dedicated ux staff and a ux group tended to place themselves higher on the maturity scale. for those who reside at stage 5, having a ux group or a ux lead are the two main themes present in the survey. to move forward to stage 5, a library needs to organize a ux group with an appointed lead to coordinate ux practice widely throughout the organization, including in library spaces, web presence, learning services, and digital initiatives. a systematic and cooperative ux approach planned by an official ux group and led by a designated ux lead is the key indicator of stage 5. the support for the group and its lead needs to come from not only leadership, but also colleagues throughout the library, which relates to the two major themes of leadership support and organizational collaboration. stage 7–8 is achievable only with significant investment in ux. given parent entity pressures, existing hierarchies, and prevailing non-user-centered cultures, libraries face a formidable set of challenges on the road to becoming user-centered organizations.31 this road is somewhat illuminated by the small number of survey respondents that marked themselves at a stage 7 or 8. highlights from their responses are instructive. one respondent told us,

we have multiple teams in the library to help with service design, conducting and gathering user research, and helping library staff think more about the user in their everyday work. we also have a couple special interest groups (sigs) dedicated to user research, ux, and assessment. we also have multiple departments within the library with ux expertise.

from this response, we can see the key characteristics of ux maturity: leadership support up the line along with wide-spread collaboration throughout the organization. staff infrastructures including multiple ux-oriented committees help drive and coordinate ux work.
this respondent also reported the recent hiring of an assessment librarian situated in the library's administration department who will help coordinate ux work throughout the organization. these elements work together to meaningfully integrate user perspectives into both digital and physical spaces and in multiple units. moreover, this respondent marked 19 out of 20 ux methods currently in use (all but diary/camera studies), thus reinforcing the symbiotic relationship between ux maturity and ux methods: the variety of methods in use are a signal of maturity, and correspondingly, a greater maturity allows the space and resources for the application of more and different methods. another survey respondent at stage 7 remarked the following,

my workplace has been very supportive in addressing ux issues both in digital and physical spaces. since being hired, i have created workflows that incorporate data that we gather from users. if there isn't data gathered in a certain area, we usually find a way to update workflows so that we can get that data. almost every project that i have worked on digitally and in the physical spaces at the library has been the result of ux/ui data that has been gathered from our users.

the elevated level of maturity at this library is especially reflected through the practice of "almost every project" being driven by user data. a truly user-centered library indeed integrates user data across all projects and advocates for the user at every opportunity. this respondent also marked a high variety of methods currently being practiced: 17 out of a possible 20 (methods not in use include diary/camera studies, user stories, and competitive analysis), further underscoring the two-way connection between methods and maturity. in further considering the upper reaches of maturity, we are inspired by an emerging theory of design-oriented librarianship that signals a professional paradigm shift such that ux could become recognized as a fundamental component of library research, education, and practice.32 by investing more in ux methods, practices, and principles, libraries can achieve greater value and empowerment for our communities by designing more user-centered services and tools.33 ultimately, achieving stage 7–8 will result from deeply integrating user-centeredness across all operational phases, strategic planning, and decision-making of a library organization.

limitations
we note a few limitations of our study. first, the ux stages used in the survey were defined by jakob nielsen in 2006 for corporate application, so the scale is perhaps a bit dated. further, the main goal of our statistical analysis is to develop a model that can accurately predict the ux maturity of a library based on the ux methods employed at the institution combined with organizational characteristics. allison outlines three broad categories of error in regression analysis:
• measurement error: very few variables can be measured with perfect accuracy, especially in the social sciences.
• sampling error: in many cases, our data are only a sample from a larger population, and the sample will never be exactly like the population.
• uncontrolled variation: [age and schooling] are surely not the only variables that affect a person's income, and these uncontrolled variables may "disturb" the relationship.34

in terms of measurement error, survey respondents may have bias when self-reporting maturity stage due to social pressures to produce desirable responses, meaning people tend to respond to self-report items in a manner that makes themselves look good.35 the resulting measurement error takes the form of over-reporting "desirable behavior" or under-reporting "undesirable behavior." this is evident in some responses for ux maturity stages. for example, one respondent chose stage 5—"managed usability"—but the comment described a slightly different picture:

i think we are still floundering between "dedicated ux budget" and "managed usability." . . . we are at the stage where people know they should consult with us, but either they don't or they do but don't really hear the results, they are using us to confirm what they want to hear.

in terms of sampling error, self-selection bias is a factor: our respondents might not be representative of the full population of ux librarians. we also did not make all of our questions mandatory, and as a result were not able to make use of all possible data within the scope of our survey. in terms of uncontrolled variation, our survey and statistical model do not fully account for all variables that influence ux maturity in libraries; for example, we included a limited list of ux methods, and we did not include questions that inquired specifically into the presence of a ux lead or ux group.

future directions
we see at least three paths forward for future research related to ux methods and maturity. first, librarianship would benefit from a ux maturity scale created specifically with and for our field's theoreticians and practitioners. we propose one such scale above, but our scale has not undergone further testing, research, or validation. we note especially the library ux maturity scales of sheldon-hess and macdonald, which could be further synthesized or built upon.36 second, a self-assessment tool for diagnosing ux maturity could be developed based on a validated maturity scale. and third, the theory advanced by clarke that librarianship can be usefully conceived of and practiced as a design discipline warrants further critical attention, especially as it relates to the application of ux methods and the development of ux maturity.37

conclusion
we applied a mixed-methods approach that involved content analysis and statistical analysis to a profession-wide survey. our research data and analysis demonstrate the type and extent of ux methods currently in use by academic libraries. the five most common methods are usability testing, user surveys, user interviews, accessibility evaluation, and field studies. the five least common methods are journey maps, benchmark testing, design review, faq review, and diary/camera studies. furthermore, we identify the organizational characteristics that help or hinder the development of ux maturity.
ux maturity in libraries is related to four key factors: the number of ux methods currently in use; the level of support from leadership in the form of strategic alignment, budget, and personnel; the extent of collaboration throughout the organization; and the degree to which organizational decisions are influenced by ux research. when one or more of these four connected factors advances, so too does ux maturity. we close by emphasizing three key factors for reaching higher levels of ux maturity. first, we encourage library leadership to see the value of ux and support its practice through strategic alignment and resource allocation. second, we encourage libraries to commit to integrating ux principles and practices across all units, especially into leadership groups and through organization-wide collaboration and workflows. third, ux methods should be reinforced and amplified with personnel, such as a standing ux group and a dedicated ux lead that can help direct ux work and enhance ux maturity. libraries have the promise and potential to more deeply practice ux. doing so can allow libraries to more deeply connect with users and reach higher levels of ux maturity, with the ultimate result of delivering tools and services that further empower our user communities.

appendix a: glossary of statistical terms

adjusted r-squared: adjusted r2 = variance of fitted model values / variance of response values. "the adjusted r-squared compares the descriptive power of regression models—two or more variables—that include a diverse number of independent variables—known as a predictor. every predictor or independent variable added to a model increases the r-squared value and never decreases it. so, a model that includes several predictors will return higher r-squared values and may seem to be a better fit. however, this result is due to it including more terms. the adjusted r-squared compensates for the addition of variables and only increases if the new predictor enhances the model above what would be obtained by probability. conversely, it will decrease when a predictor improves the model less than what is predicted by chance." source: https://www.investopedia.com/ask/answers/012615/whats-difference-between-rsquared-and-adjusted-rsquared.asp

confidence level: "the confidence level tells you how sure you can be. it is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval. the 95% confidence level means you can be 95% certain; the 99% confidence level means you can be 99% certain. most researchers use the 95% confidence level." source: https://researchbasics.education.uconn.edu/confidence-intervals-and-levels/

confidence interval: "a confidence interval is an interval which has a known and controlled probability (generally 95% or 99%) to contain the true value." source: https://stats.oecd.org/glossary/detail.asp?id=5055

explained variance: "explained variance (also called explained variation) is used to measure the discrepancy between a model and actual data.
in other words, it’s the part of the model’s total variance that is explained by factors that are actually present and isn’t due to error variance.” source: https://www.statisticshowto.datasciencecentral.com/explained -variance-variation/ explanatory and response variables “the response variable is the focus of a question in a study or experiment. an explanatory variable is one that explains changes in that variable. it can be anything that might affect the https://www.investopedia.com/ask/answers/012615/whats-difference-between-rsquared-and-adjusted-rsquared.asp https://www.investopedia.com/ask/answers/012615/whats-difference-between-rsquared-and-adjusted-rsquared.asp https://researchbasics.education.uconn.edu/confidence-intervals-and-levels/ https://researchbasics.education.uconn.edu/confidence-intervals-and-levels/ https://stats.oecd.org/glossary/detail.asp?id=5055 https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/variance/ https://www.statisticshowto.datasciencecentral.com/explained-variance-variation/ https://www.statisticshowto.datasciencecentral.com/explained-variance-variation/ information technology and libraries march 2020 user experience methods and maturity in academic libraries | young, chao, and chandler 25 response variable.” source: https://www.statisticshowto.datasciencecentral.com/explanato ry-variable/ multiple regression “multiple regression is a statistical method for studying the relationship between a single dependent [or response] variable and one or more independent [or explanatory] variables. it is unquestionably the most widely used statistical technique in the biological and physical sciences.”38 null hypothesis “in general, this term relates to a particular hypothesis under test, as distinct from the alternative hypotheses which are under consideration. it is therefore the hypothesis which determines the probability of the type i error. in some contexts, however, the term is restricted to an hypothesis under test of ‘no difference’.” source: https://stats.oecd.org/glossary/detail.asp?id=3767 probability or p-value “the p value is the probability of getting our observed result, or a more extreme result, if the null hypothesis is true.”39 simple linear regression “simple linear regression models the relationship between the magnitude of one variable and that of a second for example, as x increases, y also increases. or as x increases, y decreases.”40 statistical significance “statistical significance refers to the claim that a result from data generated by testing or experimentation is not likely to occur randomly or by chance but is instead likely to be attributable to a specific cause. having statistical significance is important for academic disciplines or practitioners that rely heavily on analyzing data and research, such as economics, finance, investing, medicine, physics, and biology. statistical significance can be considered strong or weak. when analyzing a data set and doing the necessary tests to discern whether one or more variables have an effect on an outcome, strong statistical significance helps support the fact that the results are real and not caused by luck or chance. simply stated, if a statistic has high significance then it's considered more reliable.” source: https://www.investopedia.com/terms/s/statisticalsignificance.asp variance “the variance is the mean square deviation of the variable around the average value. 
it reflects the dispersion of the empirical values around its mean." source: https://stats.oecd.org/glossary/detail.asp?id=5160

appendix b: additional data analysis
model: stage as a function of population rank
model: stage as a function of total methods
model: variables that combine to produce the most accurate stage predictions

endnotes

1 david liddle, "best value—the impact on libraries: practical steps in demonstrating best value," library management 20, no. 4 (june 1, 1999): 206–14, https://doi.org/10.1108/01435129910268982.

2 daniel pshock, "the user experience of libraries: serving the common good," user experience 17, no. 2 (2017), https://web.archive.org/web/20190822051708/http://uxpamagazine.org/the-user-experience-of-libraries/.

3 bruce massis, "the user experience (ux) in libraries," information and learning sciences 119, no. 3/4 (march 12, 2018): 241–44, https://doi.org/10.1108/ils-12-2017-0132.

4 rachel fleming-may et al., "experience assessment: designing an innovative curriculum for assessment and ux professionals," performance measurement and metrics 19, no. 1 (december 15, 2017): 30–39, https://doi.org/10.1108/pmm-09-2017-0036; rachel ivy clarke, satyen amonkar, and ann rosenblad, "design thinking and methods in library practice and graduate library education," journal of librarianship and information science (september 8, 2019), https://doi.org/10.1177/0961000619871989; aja bettencourt-mccarthy and dawn lowe-wincentsen, "how do undergraduates research? a user experience experience," ola quarterly 22, no. 3 (february 22, 2017): 20–25, https://doi.org/10.7710/1093-7374.1866; juan carlos rodriguez, kristin meyer, and brian merry, "understand, identify, and respond: the new focus of access services," portal: libraries and the academy 17, no. 2 (april 8, 2017): 321–35, https://doi.org/10.1353/pla.2017.0019; asha l. hegde, patricia m. boucher, and allison d. lavelle, "how do you work? understanding user needs for responsive study space design," college & research libraries 79, no. 7 (2018), https://doi.org/10.5860/crl.79.7.895; amy deschenes, "improving the library homepage through user research—without a total redesign," weave: journal of library user experience 1, no. 1 (2014), https://doi.org/10.3998/weave.12535642.0001.102.

5 amanda kraft, "parsing the acronyms of user-centered design," in 2019 ascue proceedings (association supporting computer users in education (ascue), myrtle beach, south carolina, 2019), 61–69, https://eric.ed.gov/?id=ed597115.

6 ideo, the field guide to human-centered design (san francisco: ideo, 2015); joe marquez and annie downey, library service design: a lita guide to holistic assessment, insight, and improvement (lanham, md: rowman & littlefield, 2016); scott w. h.
young and celina brownotter, "toward a more just library: participatory design with native american students," weave: journal of library user experience 1, no. 9 (2018), https://doi.org/10.3998/weave.12535642.0001.901.

7 aaron schmidt and amanda etches, useful, usable, desirable: applying user experience design to your library (chicago: ala editions, 2014); joe j. marquez and annie downey, getting started in service design: a how-to-do-it manual for librarians (chicago: american library association, 2017).

8 zoe chao, "rethinking user experience studies in libraries: the story of ux café," weave: journal of library user experience 2, no. 2 (2019), https://doi.org/10.3998/weave.12535642.0002.203.

9 daniel pshock, "results from the 2017 library user experience survey," designing for digital, march 6, 2018, https://web.archive.org/web/20190829163234/https://d4d2018.sched.com/event/dm8h/d16-02-results-from-the-2017-library-user-experience-survey.

10 andy priestner, "approaching maturity? ux adoption in libraries," in user experience in libraries: yearbook 2017, ed. andy priestner (cambridge, england: ux in libraries, 2017), 1–8.

11 craig m. macdonald, "'it takes a village': on ux librarianship and building ux capacity in libraries," journal of library administration 57, no. 2 (february 17, 2017): 194–214, https://doi.org/10.1080/01930826.2016.1232942.

12 tomer sharon, "ux research maturity model," prototypr (blog), 2016, https://web.archive.org/web/20190829163113/https://blog.prototypr.io/ux-research-maturity-model-9e9c6c0edb83?gi=c462f7ac4600.

13 coral sheldon-hess, "ux, consideration, and a cmmi-based model," coral sheldon-hess blog (blog), 2013, https://web.archive.org/web/20190117144529/http://www.sheldon-hess.org/coral/2013/07/ux-consideration-cmmi/.

14 jakob nielsen, "corporate ux maturity: stages 1–4," nielsen norman group, 2006, https://web.archive.org/web/20190709231540/https://www.nngroup.com/articles/ux-maturity-stages-1-4/; jakob nielsen, "corporate ux maturity: stages 5–8," nielsen norman group, 2006, https://web.archive.org/web/20190709231533/https://www.nngroup.com/articles/ux-maturity-stages-5-8/.

15 nikki anderson, "ux maturity: how to grow user research in your organization," medium, may 1, 2019, https://medium.com/researchops-community/ux-maturity-how-to-grow-user-research-in-your-organization-848715c3543.

16 including: the user experience working group under the digital libraries federation assessment interest group (dlf aig ux), code4lib, assessment listserv of association of research libraries (arl), access conference list, coalition for networked information (cni), library and information technology association (lita), library user experience (libux) slack channel, and ala user experience interest group.
17 current index and classification list available from http://www.carnegieclassifications.iu.edu/classification_descriptions/size_setting.php; data at time of analysis available from indiana university center for postsecondary research (2018), carnegie classifications 2018 public data file, https://web.archive.org/web/20191006220952/http://carnegieclassifications.iu.edu/downloads/ccihe2018-publicdatafile.xlsx.

18 five institutions appear in this count twice, that is, on five occasions, two persons from the same institution responded separately to the survey. we invited this type of response to capture diversity of opinion and experience within an organization.

19 one institution appears in this count twice, for the same reason as explained in the previous endnote.

20 susan farrell, "ux research cheat sheet," nielsen norman group, february 12, 2017, https://web.archive.org/web/20190828224735/https://www.nngroup.com/articles/ux-research-cheat-sheet/.

21 klaus krippendorff, content analysis: an introduction to its methodology, 3rd edition (los angeles: sage, 2012).

22 scott w. h. young, zoe chao, and adam chandler, "data from: user experience methods and maturity in academic libraries," distributed by dryad, https://doi.org/10.5061/dryad.jwstqjq5d.

23 paul d. allison, multiple regression: a primer (thousand oaks, ca: pine forge, 1999), 6.

24 we are aware that our use of linear regression with this small sample surely "over-fits" the dataset, that is, the model is unlikely to predict as accurately if applied to a different dataset. the model will undergo further refinement in the future.
25 we made a conscious choice to leave in some variables in this model that are statistically insignificant. we did so because it might be too early to fully dismiss these elements as unimportant; it could be that our sample was too small to really be certain. furthermore, our primary emphasis is in creating a model that does a good job of accurately predicting stage based on an array of different characteristics. removing all the nonsignificant variables in this model would actually lower the prediction accuracy. adjusted r-squared accounts for additional variables.

26 for more description of the multiple linear regression model, please see https://web.archive.org/web/20191006231250/https://newonlinecourses.science.psu.edu/stat462/node/131/.

27 priestner, "approaching maturity? ux adoption in libraries."

28 craig m. macdonald, "user experience librarians: user advocates, user researchers, usability evaluators, or all of the above?," proceedings of the association for information science and technology 52, no. 1 (2015): 1–10, https://doi.org/10.1002/pra2.2015.145052010055.

29 sharon, "ux research maturity model."

30 sheldon-hess, "ux, consideration, and a cmmi-based model."

31 macdonald, "'it takes a village.'"

32 rachel ivy clarke, "toward a design epistemology for librarianship," the library quarterly 88, no. 1 (2018): 41–59, https://doi.org/10.1086/694872; rachel ivy clarke, "how we done it good: research through design as a legitimate methodology for librarianship," library & information science research 40, no. 3 (july 1, 2018): 255–61, https://doi.org/10.1016/j.lisr.2018.09.007.

33 clarke, amonkar, and rosenblad, "design thinking and methods in library practice and graduate library education."

34 allison, multiple regression: a primer, 14.

35 delroy l. paulhus, "socially desirable responding: the evolution of a construct," in the role of constructs in psychological and educational measurement (mahwah, nj: lawrence erlbaum, 2002), 49–69.

36 sheldon-hess, "ux, consideration, and a cmmi-based model"; macdonald, "'it takes a village.'"

37 rachel ivy clarke, "design thinking for design librarians: rethinking art and design librarianship," in the handbook of art and design librarianship, ed. paul glassman and judy dyki, 2nd edition (chicago: ala neal-schuman, 2017), 41–49; clarke, "toward a design epistemology for librarianship"; clarke, "how we done it good"; clarke, amonkar, and rosenblad, "design thinking and methods in library practice and graduate library education"; shannon marie robinson, "critical design in librarianship: visual and narrative exploration for critical praxis," the library quarterly 89, no. 4 (october 1, 2019): 348–61, https://doi.org/10.1086/704965.

38 allison, multiple regression: a primer, 1.

39 geoff cumming, understanding the new statistics: effect sizes, confidence intervals, and meta-analysis (new york: routledge, 2012), 26.
40 peter bruce and andrew bruce, "regression and prediction," in practical statistics for data scientists: 50 essential concepts (sebastopol, ca: o'reilly media, 2017).

editor's comments: odds and ends
bob gerrity
information technology and libraries | june 2016

this issue marks the midpoint of information technology and libraries' fifth year as an open-access e-only journal. the move to online-only in 2012 was inevitable, as ital's print subscription base was no longer covering the costs of producing and distributing the print journal. moving to an e-only model using an open-source publishing platform (the public knowledge project's open journal systems) provided a low-cost production and distribution system that has allowed ital to continue publishing without requiring a large ongoing investment from lita. the move to open access, however, was not inevitable, and i commend lita for supporting that move and for continuing to provide a base subsidy that supports the journal's ongoing publication. i also thank the boston college libraries for their ongoing support in hosting ital along with a number of other oa journals. since ital is now open, access to it can no longer be offered as an exclusive benefit that comes with lita membership. regardless of the publishing model, though, ital has always relied on voluntary contributions of the time and expertise of reviewers and editors. i'd like to acknowledge the contributions of our past and current editorial board members, who play a key role in ensuring the ongoing quality and vitality of the journal. we will be adding a few additional board members shortly, to help ensure that reviews of submissions to the journal are completed as quickly and effectively as possible.
speaking of peer review, one of the recent innovative startups in the scholarly communication space is a company called publons, which tracks and verifies peer-review activity, providing a mechanism for academics to report (and possibly receive institutional credit for) their peer-review work, an undervalued part of the scholarly communication framework. (full disclosure: at university of queensland we are conducting a pilot project with publons, to integrate the peer-review activities of our academics into our institutional repository.) in addition to new approaches to peer review, such as publons and academic karma, there are quite a few recent examples of innovations in various aspects of scholarly communication that are worth keeping an eye on. these include new collaborative authoring tools such as overleaf, impact-measurement tools such as impactstory, and personal digital library platforms such as readcube. on a broader scale, initiatives such as peerj are building open access publishing platforms intended to dramatically improve the efficiency of and drive down the overall costs of scholarly publishing. february marked the 14th anniversary of a key trigger event in the open access movement—the launch of the budapest open access initiative in 2002.

bob gerrity (r.gerrity@uq.edu.au), a member of lita and the editor of information technology and libraries, is university librarian at the university of queensland, brisbane, australia.

much has happened in the 14 years since the budapest initiative, on various fronts:
o policy—introduction and widespread adoption of funder and institutional oa mandates;
o technology—development and widespread adoption of institutional repositories, recent development of mechanisms to facilitate the discovery of oa publications (e.g., share on the library side and chorus on the publisher side);
o publishing—establishment of new oa megajournals (e.g., plos, biomed central), embrace of hybrid oa models by mainstream commercial publishers.
yet despite all the hype, acrimony, and activity triggered by the oa movement, a recent analysis in the chronicle of higher education suggests the growth of oa has been slow and incremental: the percentage of research articles published annually in fully open-access format has increased at an average rate of around one percent a year, from 4.8% in 2008 to 12% in 2015. at this rate, the tipping point for oa still seems very far away. lots of energy has been and continues to be invested by different stakeholders in different approaches, and the green vs. gold argument still predominates. recent developments suggest momentum is gaining for a more radical shift. in december 2015, the max planck institute, a key player in the launch of oa with the berlin declaration on open access in 2003, hosted the 12th version of its annual oa conference to further the discussion around open access. ironically, unlike previous meetings and seemingly in philosophical conflict with the underpinnings of the oa movement, the meeting was by invitation only. given the topic, though, a "proposal to flip subscription journals to open access," the closed nature of the meeting is understandable.
underpinning the proposal was a 2015 paper from the max planck digital library that suggested that the amount of money currently being spent (largely by libraries) on journal subscriptions should be sufficient to fund research publication costs if applied to a "flipped" journal publishing business model, from subscription-based to gold open access.1 in the netherlands, the university sector has adopted a national approach in negotiating deals with several major publishers (springer, sage, elsevier, and wiley) that allow dutch authors to publish their papers as gold oa, without additional charges (but, depending on the publisher, with limits on total numbers and/or which journals are available within the deals).2 the so-called "dutch deal" by the vsnu (association of universities in the netherlands) and ukb (dutch consortium of university libraries and royal library) takes a national approach to flipping the model, attempting to bundle access rights for dutch readers with apc credits for dutch authors. the dutch government, which currently holds the eu presidency, is pushing hard for a europe-wide adoption of this approach. last month, the eu's competitiveness council agreed that all scientific papers should be freely available by 2020.3 meanwhile, in the us, the "pay it forward" research project at the university of california is examining what the institutional financial impact would be with a flipped model. the study is looking at existing institutional journal expenditures on subscriptions and modeling what a future, apc-based model would look like based on institutional research publication output and estimated average apc charges. who knows when or if a global flip might occur, but it does strike me that the scholarly publishing world is overdue for a major shakeup. from the point of view of a university librarian, focused on keeping journal subscription costs in line (unsuccessfully i might add), i think there is real danger in not considering what a flip to a gold model might look like. the commercial publishers we all complain about are successfully exploiting the gold model as an additional revenue stream which, for the most part, academic libraries have been ignoring, since the individual apcs typically are paid from someone else's budget. this has allowed the overall envelope of spending on research publication (subscriptions and apcs) to grow significantly. perhaps a more interesting question is what the impact of a flip on libraries would be. if gold oa became the predominant model, we would no longer need all of the complex systems we've built to manage subscriptions and user access. to quote homer simpson, "woohoo!" in the "watch this space" arena, ebsco's recently-launched open-source library services platform (lsp) initiative is beginning to take shape.
it now has a name—folio (for future of the libraries is open)—and as marshall breeding put it, the project "injects a new dynamic into the competitive landscape of academic library technology, pitting an open source framework backed by ebsco against a proprietary market dominated by ex libris, now owned by ebsco archrival proquest."4 publicly listed participants in the project include (in addition to ebsco) ole, index data, bywater, bibliolabs, and sirsi dynix.5 the platform release timetable calls for an initial, "technical preview" release of the code for the base platform in august 2016, and an anticipated release of the apps needed to operate a library in early 2018.6

1. ralf schimmer, kai karin geschuhn, and andreas vogler, disrupting the subscription journals' business model for the necessary large-scale transformation to open access (2015), doi:10.17617/1.3.
2. frank huysmans, vsnu-wiley: not such a big deal for open access, warekennis (blog), march 1, 2016, https://warekennis.nl/vsnu-wiley-not-such-a-big-deal-for-open-access/.
3. martin enserink, "in dramatic statement, european leaders call for 'immediate' open access to all scientific papers by 2020," science, may 27, 2016, doi:10.1126/science.aag0577.
4. marshall breeding, ebsco supports new open source project, american libraries, april 22, 2016, https://americanlibrariesmagazine.org/2016/04/22/ebsco-kuali-open-source-project/.
5. https://www.folio.org/collaboration.php.
6. https://www.folio.org/apps-timelines.php.

taking the long way around: improving the display of hathitrust records in the primo discovery system
jason alden bengtson and jason coleman
information technology and libraries | march 2019

jason bengtson (jbengtson@ksu.edu) is head of it services for kansas state university libraries. jason coleman (coleman@ksu.edu) is head of library user services for kansas state university libraries.

abstract
as with any shared format for serializing data, primo's pnx records have limits on the types of data which they pass along from the source records and into the primo tool. as a result of these limitations, pnx records do not currently have a provision for harvesting and transferring rights information about hathitrust holdings that the kansas state university (ksu) library system indexes through primo. this created a problem, since primo was defaulting to indicate that all hathitrust materials were available to ksu libraries (k-state libraries) patrons, when only a limited portion of them actually were. this disconnect was infuriating some library users, and creating difficulties for the public services librarians. there was a library-wide discussion about removing hathitrust holdings from primo altogether, but it was decided that such a solution was an overreaction. as a consequence, the library it department began a crash program to attempt to find a solution to the problem. the result was an application called hathigenius.

introduction
many information professionals will be aware of primo, the web scale discovery tool provided by ex libris.
web scale discovery services are designed to provide indexing and searching user experiences, not only for the library's holdings (as with a traditional online public access catalog), but also for many of a library's licensed and open access holdings. primo offers a variety of useful features for search and discovery, taking in data from manifold sources and serializing them into a common format for indexing within the tool. however, such applications are still relatively young, and the technologies powering them have not fully matured. the combination of this lack of maturity and deliberately closed architecture between vendors leads to several problems for the user. one of the most frustrating is errors in identifying full-text access availability. as with any shared format for serializing data, primo's pnx (primo normalized xml) records have limits on the types of data they pass from the source records into the primo tool. as a result of these limitations, pnx records do not currently have a provision for harvesting and transferring rights information about hathitrust holdings that the k-state libraries system indexes through primo. this created a problem in the k-state libraries' implementation, since primo was defaulting to indicate that all hathitrust materials were available to k-state libraries patrons, when only a limited portion of them actually were. this disconnect was infuriating some library users, and creating difficulties for the public services librarians. there was a library-wide discussion about removing hathitrust holdings from primo altogether, but it was decided that such a solution was an overreaction. as a consequence, the library it services department began a crash program to attempt to find a solution to the problem.

hathitrust's digital library as a collection in primo central
hathitrust was established in 2008 as a collaboration among several research libraries that were interested in preserving digital content. as of the beginning of march 2018, the collaborative's digital library contained more than sixteen million items, approximately 37 percent of which were in the public domain.1 ex libris' primo central index (pci), which serves as primo's built-in index of articles from various database providers, includes metadata for the vast majority of the items in hathitrust's digital library, providing inline frames within the original primo user interface to directly display full-text content of those items that the library has access to. libraries subscribing to primo choose whether or not to make these records available to their users. k-state libraries, like many other primo central clients, elected to activate hathitrust in its instance of primo, which it has branded with the name search it. the unmodified version of primo central identified all records from hathitrust's digital library as available online, regardless of the actual level of access provided to users. users who discovered a record for an item from hathitrust's digital library were presented with a conspicuous message indicating that full text was available and two links named view it and details. an example of the appearance of these search results is shown in figure 1. after clicking the "view it" tab, the center window would display the item's homepage from hathitrust's digital library inside an iframe.
public domain items would display the title page of the item and present users with an interface containing numerous visual indicators that they were viewing an ebook (see figure 2 for an example). items with copyright restrictions would display a message indicating that the item is not available online (see figure 3 for an example).

figure 1. two books from hathitrust as they appeared in search it prior to implementation of hathigenius.
figure 2. hathitrust result for an item in the public domain.
figure 3. hathitrust's homepage for an item that is not in the public domain.

despite the intentions evident in the design of the primo interface, availability of hathitrust records was not being accurately reflected in the list of returns. the size of the indices underlying web scale discovery systems and the number of configurations and settings that must be maintained locally introduce a variety of failure points that can intercede when patrons attempt to access subscribed resources.2 one of the failure points identified by sunshine and carter is inaccurate knowledgebase information. the scope of inaccurate information about hathitrust items in primo central index constituted a particularly egregious example of this type of failure.

patron reaction to misinformation about access to hathitrust
between the time hathitrust's digital library was activated in search it and the time the hathigenius application was installed, at least thirty patrons contacted k-state libraries to ask why they were unable to access a book in hathitrust when search it had indicated that full text was available for the book. many of these expressed frustration at frequently encountering this error (for an example, see figure 4).

1:08 26389957777759601093088133 i find it misleading that the search it function often finds a book i am interested in, but sometimes says it is available online; however, oftentimes it takes me to the hathi trust webpage for the book where i am told it is not available online. is this because our library has had to give up their subscription to this service?
1:08 me hi!
1:09 me that is definitely frustrating and we are trying to find a way to correct it.
1:10 me it does not have to do with our subscription, but rather the metadata we receive from hathitrust and its compatibility (or rather, incompatibility) with search it
1:11 26389957777759601093088133 okay, so i guess i better ask for the book i am seeking (the emperor's mirror) through ill.
1:11 me that'd probably be your best bet, but let me take a look one moment
1:14 me yes, ill does look best. please note that the ill department will be closed after today until january
1:14 26389957777759601093088133 got it. thanks. i hope the hathi trust issue is resolved soon. (i have seen this problem all semester and finally got so frustrated to ask about it.)
1:15 26389957777759601093088133 have a happy holiday!
1:15 me you as well! and yes, i hope we can figure it out asap
1:15 me (it's frustrating for us, too!)
1:20 26389957777759601093088133 has left the conversation

figure 4. chat transcript revealing frustration with inaccurate information about availability of items in hathitrust.
staff reaction to misinformation about access to hathitrust reference staff at k-state libraries use a ticketing system to report electronic resource access problems to a team of librarians who troubleshoot the underlying issues. shortly after the hathitrust library was activated in search it, reference staff submitted several tickets about problems with access to items in that collection. members of the troubleshooting team responded quickly and informed the reporting librarians that the problem was one beyond their control. this message was slow to reach the entirety of the reference staff and was not always understood as being applicable to the full range of access problems our patrons were experiencing. samples and healy note that this type of decentralization and reactive orientation is common in electronic resource troubleshooting.3 like them, k-state libraries recognized a need to develop best practices to obviate confusion. we also found ourselves pining for a tool such as that described by collins and murray that could automatically verify access for a large set of links.4 the extent of displeasure with the situation was so severe that some librarians stated they were loath to promote search it to students since several million records were so conspicuously inaccurate. information technology and libraries | march 2019 31 technical challenges the k-state libraries it department wanted to fix the situation, in order to provide accurate expectations to their users, but doing so presented severe technical challenges, the most significant of which stemmed from the lack of rights information in the pnx record in primo. without more accurate information on availability, user satisfaction seemed destined to remain low. research into patron use of discovery layers predicted this unsurprising dissatisfaction. oclc’s (2009) research into what patrons want from discovery system led the researchers to conclude that “a seamless, easy flow from discovery through delivery is critical to end users. this point may seem obvious, but it is important to remember that for many end users, without the delivery of something he or she wants or needs, discovery alone is a waste of time.”5 a later usability study reported: “some participants spent considerable time looking around for features they hoped or presumed existed that would support their path toward task completion.”6 additionally, the perceived need to customize discovery layers so that they reflect the needs of a particular research library is hardly new, or exclusive to k-state libraries. the same issue was confronted by catalogers at east carolina university, as well as catalogers at unc chapel hill.7 nonetheless, the challenge posed by discovery layers comes with opportunity, as james madison university discovered when their ebsco discovery service widget netted almost twice the usage of their previous library catalog widget, and as the university of colorado discovered when they observed users attempting to use the discovery layer search box in “google-like” ways that could potentially aid discovery layer creators (as well as library it departments) in both design and in setting expectations.8 as previously noted, primo’s results display is driven by pnx records (see figure 5 for an example). the single most fundamental challenge was finding a way to get to holdings rights information despite that data not being present in the pnx records, or, consequently, the search results that showed up in the presentation layer. 
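the way out of this bind, described in the next section, was to go around primo entirely and ask hathitrust itself about rights. as a rough illustration of what that kind of lookup involves, the sketch below queries hathitrust's bibliographic api for a single volume and reads the rights string off the response. it is written as a server-side (node-style) call because, as the authors explain below, javascript running on the primo page cannot request data across domains directly. the url pattern and the response fields other than usrightsstring (which the article itself cites) are assumptions based on hathitrust's public api documentation, not details taken from the article.

```javascript
// illustrative only: ask hathitrust what rights apply to one volume.
// url pattern and field names (apart from usRightsString) are assumptions
// and may not match the current api exactly.
async function lookupUsRights(htid) {
  const url = 'https://catalog.hathitrust.org/api/volumes/brief/htid/' +
    encodeURIComponent(htid) + '.json';
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error('hathitrust bib api returned ' + response.status);
  }
  const data = await response.json();
  // the brief response carries an items array; each item includes a
  // human-readable usRightsString such as "Full view" or "Limited (search-only)".
  const item = Array.isArray(data.items) ? data.items[0] : undefined;
  return item ? item.usRightsString : null;
}

// example: the volume whose pnx record appears in figure 5
// (http://hdl.handle.net/2027/uc1.32106011231518).
lookupUsRights('uc1.32106011231518')
  .then(function (rights) { console.log(rights); })
  .catch(function (err) { console.error(err); });
```

the value of a lookup like this is that it returns the one piece of information the pnx record cannot supply: whether the volume is actually viewable in full.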
there was no immediate option to create a solution that leveraged “server-side” resources, where the data itself resided and was transformed, since k-state libraries subscribes to primo as a hosted service, and ex libris provided no direct server-side access to k-state libraries. some alternative way had to be found to locate the rights data for individual records and populate it into the primo interface. upon assessing the situation, the assistant director, it (ad) decided that one potential approach would be to independently query the hathitrust bibliographic application programming interface (api) for rights information. this approach solved a number of fundamental problems, but also posited its own questions and challenges: 1. some server-side component would still be needed for part of the query . . . where would that live and how could it be made to communicate with the javascript k-state libraries had injected into its primo instance? 2. how to best isolate hathitrust object identifiers from primo and then use them to launch an api query? 3. how to keep those responses appropriately “pinned” to their corresponding entries on the primo page? 4. how would the hathitrust bibliographic api perform under load from search it queries? answering these questions would require significant research into the hathitrust bibliographic api documentation, and extensive experimentation. taking the long way around | bengston and coleman 32 https://doi.org/10.6017/ital.v38i1.10574 figure 5. a portion of the pnx record for http://hdl.handle.net/2027/uc1.32106011231518 (the second item shown in figure 1). building the application of these four questions, the first was easily the most challenging: where would the server-side component live and how would it work? the k-state libraries it services department had, in the past, made a number of significant modifications to the appearance and functionality of the primo application by adding javascript to the static html tiles used in the primo interface. however, generally speaking, javascript cannot successfully request data from outside of the domain of the web document it occupies. requesting data from an api across domains requires the mediation of a server-side appliance. the ad constructed one for this purpose, using the php programming language. this script would serve as an intermediary between the javascript in primo and the hathitrust api. the appliance accepted data from the primo javascript in the form of the contents of http variables (encoded in the url of the get request to the php appliance), then used those values to query the hathitrust api. however, since this server-side appliance did not reside in the same domain as k-state libraries’ primo instance, the problem of getting the returned api data from the php appliance to the javascript still remained. this problem was solved by treating the php appliance as a javascript file for purposes of the application. while javascript cannot load data from another domain, a web document may load actual javascript files from anywhere on the web. the hathigenius appliance takes advantage of this fact by calling the php appliance programmatically as a javascript file, with a javascript object notation (json) version of the identifiers of any hathitrust entries encoded as part of the url used to call the file. 
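to make the mechanism concrete, the sketch below illustrates the general script-injection pattern the authors describe: harvest hathitrust identifiers from the rendered results, call a server-side intermediary as though it were a javascript file, and let the returned variable drive the rewrite of each entry. every name in it (the appliance url, the data-htid attribute, the global variable, and the positive-state class) is a hypothetical stand-in for illustration rather than k-state libraries' actual code, and production code would need error handling and timing safeguards.

```javascript
// illustrative sketch of the cross-domain bridge described above.
// selectors, urls, and most names are hypothetical stand-ins.

// 1. harvest hathitrust identifiers (htids) from the result list, keeping
//    track of which page element each identifier came from.
function collectHathiEntries() {
  var pairs = [];
  document.querySelectorAll('.exl-result[data-htid]').forEach(function (entry) {
    pairs.push({ htid: entry.getAttribute('data-htid'), domId: entry.id });
  });
  return pairs;
}

// 2. call the server-side appliance by loading it as a <script> element.
//    script elements are exempt from the same-origin restriction, so the
//    php appliance can live on a different domain than the hosted primo.
function requestRights(pairs) {
  var ids = pairs.map(function (p) { return p.htid; });
  var script = document.createElement('script');
  script.src = 'https://apps.example.edu/hathigenius/availability.php?ids=' +
    encodeURIComponent(JSON.stringify(ids));
  // the appliance answers with what looks like an ordinary javascript file,
  // e.g.  var hathiRights = {"uc1.32106011231518": "Full view", "xyz": null};
  // which the browser executes, defining a global the page can then read.
  script.onload = function () {
    applyRights(pairs, window.hathiRights || {});
  };
  document.head.appendChild(script);
}

// 3. rewrite each result entry to reflect the rights data that came back.
function applyRights(pairs, rights) {
  pairs.forEach(function (p) {
    var entry = document.getElementById(p.domId);
    if (!entry) { return; }
    if (rights[p.htid]) {
      // assumed positive-state class; the article names only the
      // "maybe available" and "not available" classes.
      entry.classList.add('EXLResultStatusAvailable');
    } else {
      entry.classList.add('EXLResultStatusMaybeAvailable'); // "check the view it tab"
    }
  });
}

requestRights(collectHathiEntries());
```

the essential point of the pattern is that the browser never makes a cross-domain data request at all; it simply executes a script, which is why the data can cross domains without any server-side access to the hosted primo instance.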
the php script runs the queries against the api and returns a javascript file consisting of a single variable containing the json data encoding the availability information for the hathitrust entries as supplied from the bibliographic api . . . essentially appearing to the browser as a standard javascript file. information technology and libraries | march 2019 33 the second and third problems were intrinsically interrelated, and essentially boiled down to finding a unique identifier to use in an api query from the hathitrust entries. the most effective way to handle these queries was to use the “htid” identifier, which was largely unique to hathitrust entries, could be easily extracted from any entries that contained it, and would form the basis of the php script’s request to the hathitrust restful api to obtain rights information. in the process of harvesting the htid, hathigenius also copies the id for the object in the webpage that serves as the entry in the list of primo returns containing that htid. as the data is moved back and forth for processing, the htids, and later the resultant json data, remain paired to the object id for the entry in the list of returns. when hathigenius receives the results of the api query, it can then easily rewrite those entries to reflect the rights data it obtained. the fourth question has been fully answered with time. to this point, well over a year after hathigenius was activated in production, library it has not observed any failure of the api to deliver the requested results in testing, and no issues to that effect have been reported by users. log data indicates that, even under heavy load, the api is performing to expectations. further modifications originally, the hathigenius application supplied definitive states of available or unavailable for each entry. however, some experimentation showed this approach to be less than optimal. since the bibliographic api cannot be queried by kansas state university as a specific user, but rather was being queried for general access rights, the possibility still existed for false negatives in the future, if kansas state university’s level of access to hathitrust changed. the data returned from the api queries, when drilled down, just consisted of the usrightsstring property from the api, which corresponded to open-source availability, and did not account for any additional items available to the library by license in the future. after the application had been active for a short time, to mitigate this potential issue, the “not available” state (consisting of an application of the “exlresultstatusnotavailable” class to the hathitrust entry) was “softened” into an application of the “exlresultstatusmaybeavailable” class and verbiage asking users to check the “view it” tab for availability. a few weeks after deployment, it received a ticket indicating hathigenius was failing to work properly. the source of the problem proved to be detailed bibliographic pages for items in a search results list, which were linking out from the search entries. these pages used a different class and object structure than the search results pages in primo, requiring that an additional module be built into hathigenius to account for them. once the new module was added to the application and put into place, the problem was resolved. a second issue presented itself some weeks later, when a few false negatives were reported. 
at first, the assistant director assumed that licensing had changed, creating a disparity between the access information from the usrightsstring property and the library’s actual holdings. however, upon investigation it was clear that hathigenius was dropping some of the calls to the hathitrust bibliographic api. the api itself was performing as expected under load, however, and the failure proved to be coming from an unexpected source. the php script used by hathigenius to interface with the api was employing the curl module, which, in turn, was using its own, less secure certificate to establish a secure socket layer (ssl) connection to the hathitrust server. once the taking the long way around | bengston and coleman 34 https://doi.org/10.6017/ital.v38i1.10574 script was refactored to employ the simpler file_get_contents function, which relied upon the server’s main ssl certificate, the problem was fully resolved. hathigenius also had a limited vulnerability to bad actors. while the internal script’s destination hardwiring prevented hathigenius from being used as a generic tool to anonymously query apis, the library did encounter a situation in which a (probably inadvertently) malicious bot repeatedly pinged the script, causing it to use up system resources until it interrupted other services on the host machine. modifications were added to the script to provide a simple check against requests originating from primo. additionally, restrictions were placed on the script so that excessive resource use would cause it to be intermittently deactivated. while not perfect solutions, these measures have prevented a repeat of the earlier incident. k-state libraries has recently finished work on its version of the new primo user interface (primo new ui), which was moved into production this year. the new interface has a completely different client-side structure, requiring a very different approach to integrating hathigenius.9 appearance of hathitrust results in primo after hathigenius when the hathigenius api does not find a usrights property, we configured primo to display a yellow dot and the text “please check availability with the view it tab” (see figure 6 for an example). as noted earlier, we originally considered this preferable to displaying a red dot and the text “not available online,” because there might be instances in which the item is actually available in full view through hathitrust despite the absence of usrights in the record. figure 6. two books for which hathigenius found no usrights in hathitrust. when the hathigenuis api finds usrights, we configured primo to display a green dot and text “available online” (see figure 7 for an example). information technology and libraries | march 2019 35 figure 7. a book for which hathigenius found usrights. patron response since the beginning of 2017, the reference staff at k-state libraries have received no reports of patrons encountering situations in the original user interface in which primo indicates that full text is available but hathitrust is only providing a preview. however, a small number of patrons (at least four) expressed confusion at seeing a result in primo and discovering that the full-text is not available. some of those patrons noted that they saw the text “please check availability with the view it tab,” and inferred that this was meant to state that the full-text was available. others indicated that they never considered that we would include results for books that we do not own. 
these responses add to the body of literature documenting user expectations that everything should be available in full-text in an online library and that systems should be easy to use.10 internal response in order to gauge the feelings of k-state libraries’ staff who regularly assist patrons with reference questions, the authors crafted a brief survey (included in appendix a). respondents were asked to indicate whether they had noticed a positive change following implementation of hathigenius, a negative change, or no change at all. they were also invited to share comments. the survey was distributed to thirty individuals. twelve (40 percent) of those thirty responded to the survey. the survey response indicated a great deal of ambivalence by reference staff toward the change, with four individuals (33 percent) indicating they had not noticed a difference, and another four (33 percent) indicating that they had noticed a difference, but that it had not improved the quality of search results. only two (17 percent) of the respondents revealed that they had noticed an improvement in the quality of the search results. one (9 percent) respondent indicated that they felt that the hathitrust results had gotten noticeably worse since the introduction of hathigenius, although they did not elaborate on this in the survey question which invited further comment. the remaining respondent stated that they did not have an opinion. four comments were left by respondents, including one which indicated displeasure with the new, softer verbiage for hathitrust “negatives,” and one who claimed that the problem of false positives persisted, despite such feedback not being seen by the authors through any of the statistical modalities currently used for recording reference transactions. one user praised hathigenius, while another related broad displeasure with the decision to include hathitrust records in search it. that individual claimed that almost none of the results from hathitrust were available and stated that the hope engendered by the presence of the hathitrust results and the corresponding suggestion to check the view it tab was always dashed, to the detriment of patron satisfaction. taking the long way around | bengston and coleman 36 https://doi.org/10.6017/ital.v38i1.10574 the new ui as previously mentioned, in late 2018, k-state libraries adopted the primo new ui created by ex libris. this new user interface was built in angular, and changed many aspects about how hathigenius had to be integrated into primo. the k-state libraries’ it department completed a refactoring (reworking application code to change how an application works, but not what it does) of hathigenius to integrate it with the new ui and released it into production in september 2018. as an interesting aside, the it department did not initially prioritize the reintegration of hathigenius, due to the ambivalence of the response to the application evidenced by the survey conducted for this paper. however, shortly after search it was switched over to the new ui, complaints about the hathitrust results again displaying inaccurate availability information began to come in to the it department via both email and tickets from reference staff. as the stridence of the response increased, the project was reprioritized, and the work completed. future directions as previously mentioned, hathigenius currently uses the very rough “usrightsstring” property value from the hathitrust bibliographic api. 
however, the api also delivers much more granular rights data for digital objects. a future version of the app may inspect these more granular rights codes and compare them to rights data from k-state libraries in order to more definitively provide access determinations for hathitrust results in primo should the licensing of hathitrust holdings be changed. similarly, since htid technically only resolves to the volume level, a future version may additionally harvest the hathitrust record number, which appears to be extractable from the primo entries. based on feedback from the survey, the “soft negative” verbiage used in hathigenius was replaced with a firmer negative. this decision proved especially sagacious given that, once the early issues with certificates and communication with the hathitrust bibliographic api were sorted out, the accuracy of the tool seemed to be fully satisfactory. another problem with the “soft negative” was the fact that it asked users to click on the view-it tab, when many users simply chose to ignore the tabs and links in the search results, instead clicking on the article title, as found in a usability study on primo conducted by the university of houston libraries.11 it is also worth noting the one survey respondent who is apparently not seeing an improvement in hathitrust accuracy. if the continued difficulties they have indicated can be documented and replicated, the it department can examine those complaints to investigate where the tool may be failing. discussion one interesting feature of this experience is the seeming disconnect between library reference support staff and users in terms of the perception of the efficacy of the tool. this disconnect is all the more curious given the negative reaction displayed by reference support staff when hathigenius became unavailable temporarily upon introduction of the primo new ui. part of this perceived disconnect may be a result of the fact that staff were given a survey instrument, while the reactions of users have been determined largely via null results (a lack of complaints to, or information technology and libraries | march 2019 37 requests for assistance from, service point staff). however, given the dramatic drop in user complaints compared to the ambivalent reaction to the tool by most of the survey respondents, it appears that the staff had a much less enthusiastic response to the intervention than patrons. a few possibilities occur to the authors, including a general dislike for the discovery layer by reference librarians, a general disinclination toward a technological solution by some respondents, or the initial perception by at least part of the reference staff that the problem was not significant. as noted by fagan et al., the pivot toward discovery layers has not been a comfortable one for many librarians.12 until further research can be conducted on this, and reactions to similar customization interventions, these possibilities remain speculation. one particular feature of note with hathigenius is the use of what one of the authors refers to as “sidewise development” to solve problems that seem to be intractable within a proprietary, or open source, web-based tool. while not a new methodology in and of itself, the author has mainly encountered this type of design in ad-hoc creations, rather than as a systematic approach to problem-solving. 
instead of relying upon the capabilities of primo, this type of customization made its own query to a relevant api and blended that external data with the data available from primo seamlessly within the application’s presentation layer in order to facilitate a solution to a known problem. the solution created in this fashion was portable, and unaffected by most updates to primo itself. even the transition to the new ui required changes to the “hooks” and timing used by the javascript, rather than any substantial rewrite of the core engines of the application. this methodology has been used repeatedly by k-state libraries it services to solve problems where other interventions would have necessitated the creation of specialized modules, or the rewriting of source code; both of which would be substantially affected by updates to the product itself, and which would have been difficult to improve or version without down time to the affected product. similar solutions have seen tools independently query an application’s database in order to inject the data back into the application’s presentation layer, bypassing the core functionality of the application. conclusion reactions at this point from users, and at least some library staff, have been positive. while not a perfect tool, hathigenius has improved the user experience, removing a point of frustration and an area of disconnect between the library and its users. the application itself is fully replicable by other institutions (as is the general model of sideways development), allowing them to improve the utility of their primo instances. as with many possible customizations to discovery layers, hathigenius provides fertile ground for additional work, research, and refinement, as libraries struggle to find the most effective ways to implement discovery tools within their own environments. beyond hathigenius itself, the sideways development method provides a powerful tool for libraries to improve the tools they use by integrating additional functionality at the presentation layer level. tackling the problem of inaccurate full-text links in discovery layers is only one application of this approach, but it is an important one. as libraries continue to strive to improve the results and usability of their search offerings, the ability to add local customizations and improvements will be an essential feature for vendors to consider. taking the long way around | bengston and coleman 38 https://doi.org/10.6017/ital.v38i1.10574 appendix a. feedback survey q1 in january 2017, the library began applying a tool (called hathigenius) to the hathitrust results in primo in order to eliminate the problem of “false positives.” in other words, primo would report that all of the hathitrust results it returned were available online as full text, when many were not. we would like your feedback about the impact of this change from your perspective. q2 which of the following statements best describes your opinion about the impact of hathigenius? o i haven’t noticed a difference. o i feel that search it’s presentation of hathitrust results in search it has become noticeably better since hathigenius was implemented. o i feel that search it’s presentation of hathitrust results in search it has become noticeably worse since hathigenius was implemented. o i have noticed a difference, but i feel that search it’s presentation of hathitrust results is about the same quality as it was before hathigenius was implemented. o no opinion. 
q3 please share any comments you have about hathigenius or any ideas you have for improving the display of hathitrust’s records in search it. information technology and libraries | march 2019 39 references 1 hathitrust digital library, “welcome to hathitrust!” accessed march 4, 2018, https://www.hathitrust.org/about. 2 sunshine carter and stacie traill, “essential skills and knowledge for troubleshooting eresources access issues in a web-scale discovery environment,” journal of electronic resources librarianship 29, no. 1 (2017): 7, https://doi.org/10.1080/1941126x.2017.1270096. 3 jacquie samples and ciara healy, “making it look easy: maintaining the magic of access,” serials review 40, no. 2 (2014): 114, https://doi.org/10.1080/00987913.2014.929483. 4 maria collins and william t. murray, “seesau: university of georgia’s electronic journal verification system,” serials review 35, no. 2 (2009): 80, https://doi.org/10.1080/00987913.2009.10765216. 5 karen calhoun, diane cellentani, and oclc, eds., online catalogs: what users and librarians want: an oclc report (dublin, ohio: oclc, 2009): 20, https://www.oclc.org/content/dam/oclc/reports/onlinecatalogs/fullreport.pdf. 6 rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends; baltimore 61, no. 1 (summer 2012): 191, https://scholarcommons.scu.edu/cgi/viewcontent.cgi?article=1132&context=library. 7 marlena barber, christopher holden, and janet l. mayo, “customizing an open source discovery layer at east carolina university libraries “the cataloger’s role in developing a replacement for a traditional online catalog,” library resources & technical services 60, no. 3 (july 2016): 184, https://journals.ala.org/index.php/lrts/article/view/6039; benjamin pennell and jill sexton, “implementing a real-time suggestion service in a library discovery layer,” code4lib journal, no. 10 (june 2010): 5, https://journal.code4lib.org/articles/3022. 8 jody condit fagan et al., “usability test results for a discovery tool in an academic library,” information technology and libraries 31, no. 1 (march 2008): 99, https://doi.org/10.6017/ital.v31i1.1855. 9 dan moore and nathan mealey, “consortial-based customizations for new primo ui,” the code4lib journal, no. 34 (october 25, 2016), http://journal.code4lib.org/articles/11948. 10 lesley m. moyo, “electronic libraries and the emergence of new service paradigms,” the electronic library, 22, no. 3 (2004): 221, https://www.emeraldinsight.com/doi/full/10.1108/02640470410541615. 11 kelsey brett, ashley lierman, and cherie turner, “lessons learned: a primo usability study,” information technology and libraries, 35, no. 1 (march 2016): 20, https://ejournals.bc.edu/ojs/index.php/ital/article/view/8965. 12 fagan et al., “usability test results for a discovery tool in an academic library,” 84. lib-mocs-kmc364-20140106084054 a computer system for effective management of a medical library network 213 richard e. nance and w. kenneth wickham: computer science/ operations research center, institute of technology, southern methodist university, dallas, texas, and maryann duggan: systems analyst, south central regional medical library program, dallas, texas trips (talon reporting and information processing system) is an interactive software system for generating reports to nlm on regional medical library network activity and constitutes a vital part of a network management information system (nemis) for the south central regional medical library program. 
implemented on a pdp-10/sru 1108 interfaced system, trips accepts paper tape input describing network transactions and generates output statistics on disposition of requests, elapsed time for completing filled requests, time to clear unfilled requests, arrival time distribution of requests by day of month, and various other measures of activity and/or performance. emphasized in the trips design are flexibility, extensibility, and system integrity. processing costs, neglecting preparation of input (which may be accomplished in several ways), are estimated at $.05 per transaction, a transaction being the transmittal of a message from one library to another. introduction the talon (texas, arkansas, louisiana, oklahoma, and new mexico) regional medical library program is one of twelve regional programs established by the medical library assistance act of 1965. the regional programs form an intermediate link in a national biomedical information network with the national library of medicine (nlm) at the apex. unlike most of the regional programs, which formed around a single library, talon evolved as a consortium of eleven large medical resource libraries with administrative headquarters in dallas. a major focus of the talon program is the maintenance of a document delivery service, created in march 1970, to enable rapid access to published medical information. twx units located in ten of the resource libraries and at talon headquarters in dallas comprise the major communication channel. in july 1970 a joint program was initiated to develop a statistical reporting system for the talon document delivery network. design and development of the system were done by the computer science/operations research center at southern methodist university, while training and operational procedures were developed by talon personnel. both parties in the effort view the statistical reporting system as a vital first step in providing talon administrators with a comprehensive network management information system (nemis). an overview of this statistical reporting system, designated as trips (talon reporting and information processing system), and its relation to nemis is discussed in the following paragraphs. the objectives and design characteristics of nemis are stated in (1). design requirements two considerations shaped the requirements for a network management information service (nemis) for talon: 1) in what environment would talon function? 2) what should be the objectives of a network management information service, and what part does a statistical reporting system play in its development? the talon staff and the design team spent an intensive period in joint discussion of these two questions. talon environment the talon document delivery network operates in an expansive geographical area (figure 1). the decentralized structure of the network enables information transfer between any two resource libraries. in addition, talon headquarters serves as a switching center, accepting loan requests, locating documents, and relaying requests to holding libraries. a requirement placed on talon by nlm is the submission of monthly, quarterly, and annual reports giving statistical data on network activity.
these statistics provide details on: 1) requests received by channel used (mail, telephone, twx, other), 2) disposition of requests (rejected, accepted and filled , accepted and unfilled), 3) response time for filled requests, 4) response time for unfilled requests, 5) most frequent user libraries, 6) requests received from each of the other regions, and 7) non-medlars reference inquiries. a medical library networkjnance 215 • fig. 1. location of the eleven resource libraries and talon headquarters. monthly reports require cumulative statistics on year-to-date performance, and each of the eleven resource libraries and talon headquarters is required to submit a report on its activity. needs and objectives while the immediate need of the talon network was to develop a system to eliminate manual preparation of nlm reports, an initial decision was made to develop software also capable of assisting talon management in policy and decision making. eventual need for a network management information system ( nemis) being recognized, the talon reporting and information processing system (trips) was designed as the first step in the creation of nemis. provision of information in a form suitable for analytical studies of policy and decision makinge.g., the message distribution problem described by nance ( 2) -placed some stringent requirements on trips. for instance, the identification of primitive data elements could not be made from report considerations only; an overall decision had to be made that no sub-item of information would ever be required for a data element. in addition the system demanded flexibility and extensibility, since it was to operate in a highly dynamic environment. these characteristics are quite apparent in the design of trips. 216 journal of library automation vol. 4/4 december, 1971 trips design trips is viewed as a system consisting of hardware and software components. the description of this system considers: 1) the input, 2) the software subsystems (set of programs), 3) hardware components, and 4) the output. emphasis is placed on providing an overview, and no effort is made to give a detailed description. the environment in which trips is to operate is defined in a single file ( for25.dat). this file assigns network parameters, e.g., number of reporting libraries, library codes, and library titles. the file is accessed by subprograms written in fortran iv and dystal ( 3), the latter being a set of fortran iv subprograms, termed dystal functions, that perform primitive list processing and dynamic storage allocation operations. because it requires only fortran iv trips can be implemented easily on most computers. input a transaction log, maintained by each regional library and talon headquarters, constitutes the basic input to trips. copies of log sheets are used to create paper tape description of the transactions. if and when compatibility is achieved between standard twx units and telephone entry to computer systems, the input could be entered directly by each regional library. (this is technically possible at present. ) currently, talon headquarters is converting the transaction descriptions to machine readable form. initial data entry under normal circumstances is pictured in figure 2, which shows the sequence of operations and file accesses in two phases: 1) data entry and 2) report generation. data entry in tum comprises 1) collecting statistics, 2) diagnosis and verification of input data and 3) backup of original verified input data. 
trips is designed to be extremely sensitive to input data. all data is subjected to an error analysis, and a specific file (for22.dat) is used to collect errors detected or diagnosed in the error analysis routine. only verified data records are transmitted to the statistical accumulation file (for20.dat). software subsystems trips comprises seven subsystems or modules. within each module are several fortran iv subprograms, dystal functions, and/or pdp-10 systems programs discussed under hardware components in the following section:
newy: run at the beginning of each year, newy builds an in-core data structure and transfers it to disk for each resource library in the network. it further creates the original data backup disk file (for23.dat). after disk formatting, record (the accessing and storage module) may be activated to begin accumulating statistics for the new year.
fig. 2. trips structure (statistical collection and report generation phases, producing reimbursable and non-reimbursable statistics reports).
newq: run between quarters, newq purges the past quarter statistics for each library and prepares file for23.dat for the next quarter. the report for the quarter must be generated before newq is executed.
newm: run between months, newm purges the monthly statistics for each regional library and prepares file for23.dat for the backing up of next month's data.
dumpl: the utility module causes a dystal dump of the data base.
record: the accessing and storage module record incorporates the error diagnosis on input and the entry of validated data records into file for23.dat. no data record with an indicated error is permitted, and erroneous records are flagged for exception reporting. the error report (ermes.dat) may be printed on the teletype or line printer after execution of record.
report: the reporting module report generates all reimbursable statistics on a month-to-date, quarter-to-date, and year-to-date basis.
manage: utilization of trips as a network management tool is afforded by manage, which combines statistics from reimbursable and non-reimbursable transactions to generate a report providing measures of total network activity and performance.
the primary files used by the software subsystems are described briefly in table 1.
table 1. primary files in trips (for each file: name and type, function, comments)
for25.dat (ascii): contains the system definition parameters and initialization values. created from card input to assure proper format.
for20.dat (binary): statistical accumulation for validated data records. two parts: (1) input translator data structure, and (2) statistical data base.
for21.dat (ascii): generation of reports from information in for20.dat. carriage control characters must be included to generate reports.
for22.dat (ascii): collects data records diagnosed as in error. errors accumulated in for22.dat are transmitted to ermes.dat for output.
for23.dat (ascii): enables creation and updating of the backup magnetic tape. each month's validated records added to tape.
for24.dat (binary): enables recovery read of backup tape. tape information stored prior to transfer of file information to for20.dat.
ermes.dat (ascii): serves to output messages on data records diagnosed as in error. if 6 or less errors occur, ermes is not created and messages are output to the teletype.
if more than 6 errors, an estimate of typing time is given to user who has option of printing them on the teletype or in a report form on the line printer. a medical library networkjnance 219 a major concern in any management information system is the system integrity. in addition to the diagnosis of input data, trips concatenates sequential copies of disk file for23.dat to provide a magnetic tape backup containing all valid data records for the current year. a failsafe tape, containing all trips programs, is also maintained. hardware components conversion of transaction information to machine readable form is done off line currently. using a standard twx with ascii code, paper tapes are created and spliced together. fed through a paper tape reader to a pdp-10 (digital equipment company), the input data is submitted to trips. control of trips is interactive, with the user monitoring program execution from a teletype. all file operations are accomplished using the pdp-10 via the teletype, and the output reports are created on a high-speed line printer. with sm,u's pdp-10 and sru 1108 interface, report generation can be done on line printers at remote terminals to the sru 1108 as well. output trips output consists of a report for each library in the network and a composite for the entire network. the report may be limited to reimbursable statistics or include all statistics. information includes: 1) errors encountered in the input phase, 2) number of requests received by channel, 3 ) disposition of requests (i.e., rejected, accepted/ filled, accepted/ unfilled, etc. ) , 4) elapsed time for completing :filled requests or clearing unfilled requests, 5) geographic origin of requests, 6) titles for which no holdings were located within the region, 7 ) types of requesting institutions, 8) arrival time distribution of requests by day of month, 9) invoice for reimbursement by talon, 10 ) node/ network dependency coefficient as described by ( 4). summary trips is now entering its operational phase. training of personnel at the resource libraries is concluded, and data on transactions are being entered into the system. input errors have decreased significantly ( from fifteen or twenty percent to approximately two percent ). talon personnel are enthusiastic, and needless to say the regional library staffs are happy to see a bothersome, time-consuming manual task eliminated. in summary, the following characteristics of trips deserve repeating: 1) with its modular construction, it is flexible and extensible. 220 journal of library automation vol. 4/4 december, 1971 2) implemented in dystal and fortran iv, it should allow installation on most computers without major modifications. 3) designed to operate in an interactive environment, it can be modified easily to function in a batch processing environment. 4) trips is extremely sensitive to system integrity, providing diagnosis of input data, reporting of errors, magnetic tape backup of data files, and a system failsafe tape. 5) definition of primitive data elements and the structural design of trips enable it to serve as the nucleus of a network management information system ( nemis) as well as to generate reports required by nlm. 6) currently accepting paper tape as the input medium, trips could be modified easily to accept punched card input and with more extensive changes could derive the input information during the message transfer among libraries. 
finally, the processing cost of operating trips, neglecting the conversion to paper tape, is estimated to be $.05 per transaction (a message transfer from one library to another). extensive and thorough documentation of trips has been provided. availability of this documentation is under review by the funding agency. acknowledgment work described in this article was done under contract hew phs 1 g04 lm 00785-01, administered by the south central regional medical library program of the national library of medicine. the authors express their appreciation to dr. u. narayan bhat and dr. donald d. hendricks for their contributions to this work. references 1. "nemis -a network management information system," status report of the south central regional medical library program, october 26, 1970. 2. nance, richard e.: "an analytical model of a library network," journal of the american society for information science, 21: (jan.-feb. 1970), 58-66. 3. sakoda, james m.: dyst aldynamic storage allocation language manual, (providence, r. i.: brown university, 1965). 4. duggan, maryann, "library network analysis and planning (libnat)," journal of library automation, 2: (1969), 157-175. journey with veterans: virtual reality program using google expeditions public libraries leading the way journey with veterans virtual reality program using google expeditions jessica hall information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12857 jessica hall (jessica.hall@fresnolibrary.org) community librarian, fresno county public library. © 2020. “where would you like to go?” is the question of the day. we have stood atop the great wall of china, swam with sea lions in the galapagos islands, and walked along the vast red sands of mars. each journey was unique and available through the library. as a community librarian in charge of outreach to seniors and veterans, i first learned about the virtual tour idea from a colleague who returned from a conference excited to tell me about a workshop she had attended. the workshop she had taken described a program which utilized google expeditions to take seniors on virtual tours. this idea stayed with me for months until fresno county public library obtained the $3000 value of libraries grant, which was funded by the california library services act. as a part of this grant, $2905 went to purchase a google expeditions kit and supplied to create a virtual reality program called journey with veterans. the kit includes 5 viewers and 1 tablet. a viewer is basically a google cardboard except the case is plastic and there is a smartphone inside of the case. during the program, i use the table to select and run each tour. the tour i select on the tablet is projected to the 5 viewers so participants can experience it. in this manner, veterans can explore places without physically having to travel anywhere. the journey with veterans program took the technology to the veterans instead of requiring them to come into the library. the two locations that were chosen were the veterans home of california fresno and the community living center at the va medical center in fresno, ca. from the time the program began in september 2019 to march 2020, when the pandemic shutdown brought a halt to the program, the library hosted 26 sessions at these two locations with 182 veterans. in sessions where more than 5 people were in attendance, the viewers were shared between the participants. 
the tablet and smartphones inside of the viewers have an app installed on them called google expeditions which is the software that runs the tours. one hotspot, which was already owned by the library, was used for this program. it is a requirement that all the viewers and the tablet are connected to the same wifi. having a portable wifi connection was necessary to run this program in locations where there was not access to a strong internet connection. each tour is a selection of still 360-degree views. the landscape does not move. instead, the participant turns their head around, up and down to look at the entire scene. the control tablet included additional menu items not seen by participants. these items included scripts that i can read off about the landscape we are looking at and suggested points of interest that i could highlight for participants. when i selected the point of interest on the tablet, the participant would see arrows pointing to that area of their screen. the participant would follow the arrows by turning their head in the direction that was indicated. the participants knew they were looking at the area of interest when the arrows disappeared and was replaced by a white circle surrounding the relevant portion of the screen. mailto:jessica.hall@fresnolibrary.org information technology and libraries december 2020 journey with veterans | hall 2 the viewers did not have straps attached to them and there was no way to attach straps to them. therefore, the viewer could not be strapped to the participant’s head. instead, the participant had to hold up the viewer the entire time they wished to look through it. this presented a challenge for participants who did not have the ability to hold the viewer on their own. at the locations i went to, there were staff available to help and they would hold the viewer up to a participant’s eyes. in some cases, one staff person held the viewer up for the participant while another would turn the participant’s wheelchair in a circle so they could see the entire image. each program lasted 30-45 minutes but the amount of time looking through the viewer was kept to around 15-20 minutes. the rest of the time was filled with talking about the location that we are viewing. for the veterans in memory care at the veterans home of california fresno, this program was designed with the hope that it would allow the veterans to reminisce about places they had visited and lived in and encouraged them to talk about their experiences. some of the participants had been to the countries that we visited virtually and they reminisced on their time there. at every session, the participants shared their enthusiasm and eagerness to continue the program. the program once was tried with music. on one of my first visits to the community living center at the va medical center, a participant asked if he could play music in the background. since i had thought about incorporating music into the program, i agreed, and the participant played some classical music from his own device. though it was a good idea, the execution did not work well. the music was coming from one location, which made it too loud when one stood near it but too quiet once one walked too far away. i found the music difficult to talk over while giving the tour. i believe that incorporating sounds of the location we visit, such as the sounds of the countryside or a big city would make the experience more immersive. however, i have yet to find a way to do so successfully. 
after the grant ended, i continued the program at both locations. the partnership i had created at the veterans home of california-fresno grew into a second program, storytime with veterans, which was requested specifically by the residents. i alternated my visits so that some weeks we did a virtual reality program and some weeks i read to them. one time, there was a miscommunication: the activity coordinator thought i had come to read a story, but i was under the impression that it was a virtual reality week and so i had brought the google expeditions with me. the solution was to do both. one of the google expeditions tours is a very short and much abridged virtual reality version of twenty thousand leagues under the sea by jules verne. the tour uses artwork to represent scenes from the book, and each scene tells a different part of the story. the veterans home's residents were treated to both a story and a virtual reality tour at the same time. up until the library's shutdown in mid-march due to covid-19, i was in the process of expanding the use of the google expeditions but was unable to continue. since then, the equipment has not been used. restarting the program now presents multiple challenges, not the least of which is sanitizing the devices. sanitation was a consideration even before covid-19, and sanitary virtual reality masks were acquired using grant funds as part of the initial program. these masks look like strips of cloth that line the eyes, with strings that hook around the ears to hold them in place. cleaning products were also purchased and used to clean the devices after each program. before covid-19, a viewer could be handled by multiple people before it was cleaned. i always handled them first to prepare them for use. then i handed each one to the participant. occasionally they were also handled by staff. i always cleaned the viewers right after the program ended but not during the program. with the current covid-19 restrictions, the sanitation practices previously used are inadequate. i do not know the future of the program in a post-covid-19 world, but i intend to begin the program again when it becomes safe to do so, and i will incorporate all required precautions and restrictions. i look forward to once more being able to take veterans on exciting virtual journeys. editorial board thoughts digital faculty development cinthya ippoliti cinthya ippoliti (cinthya.ippoliti@ucdenver.edu) is director, auraria library, colorado. the role of libraries within faculty development is not a new concept. librarians have offered workshops and consultations for faculty on everything from designing effective research assignments to scholarly impact and open educational resources. in recent months, however, both acrl and educause have highlighted new expectations for faculty to develop skills in supporting students within a digital environment. as part of acrl's "keeping up with…" series, katelyn handler and lauren hays1 discuss the rise of faculty learning communities that cover topics such as universal design, instructional design, and assessment. effective teaching has also recently become the focus of many institutions' efforts in increasing student success and retention, and faculty play a central role in students' academic experience.
in addition, the educause horizon report echoes these sentiments, positing that “the role of full-time faculty and adjuncts alike includes being key stakeholders in the adoption and scaling of digital solutions; as such, faculty need to be included in the evaluation, planning, and implementation of any teaching and learning initiative.”2 finally, maha bali and autumn caines mention that “when offering workshops and evidence-based approaches, educational development centers make decisions on behalf of educators based on what has worked in the past for the majority.”3 they call for a new model that blends digital pedagogy, identity, networks, and scholarship where the experience is focused on “participants negotiating multiple online contexts through various online tools that span open and more private spaces to create a networked learning experience and an ongoing institutionally based online community.”4 so how does the library fit into this context? what we are talking about here goes far beyond merely providing access to tools and materials for faculty. it requires a deep tripartite partnership with educators and the centers for faculty development, as each partner brings something unique to the table that cannot be covered by one area alone. the interesting element here is a dichotomy where this type of engagement can span both in-person and virtual environments as faculty utilize both to teach and connect with colleagues as part of their own development. the lines between these two worlds suddenly blur and it is experience and connectivity that are at the center of the interactions rather than the tools themselves. while librarians may not be able to provide direct support in terms of instructional technologies, they can certainly inform efforts to integrate open and critical pedagogy and scholarship into faculty development programming and into the curriculum. libraries can take the lead on providing the theoretical foundation and application for these efforts while the specifics of tools and approaches can be covered by other entities. bali and caines also observe that bringing together disparate teaching philosophies and skill sets under this broader umbrella of digital support and pedagogy can help provide professional development opportunities for faculty, especially adjuncts, who may not have the ability to participate otherwise. this opportunity can act as a powerful catalyst to influence their teaching by implementing, and therefore modeling, a best-practices approach so that they are thinking about digial faculty develoment | ippoliti 6 https://doi.org/10.6017/ital.v38i2.11091 bringing students together in a similar fashion even if they are not teaching exclusively online, but especially if they are.5 open pedagogy can accomplish this in a variety of ways. bronwyn hegarty defines eight areas that constitute open pedagogy: (1) participatory technologies; (2) people, openness, and trust; (3) sharing ideas and resources; (4) connected community; 5) learner generated; (6) reflective practice; and (7) peer review.6 these elements are applicable to both faculty development practices, as well as pedagogical ones. just as faculty might interact with one another in this manner, so can they collaborate with their students utilizing these methods. by being able to change the course materials and think about the ways in which those activities shape their learning, students can view the act of repurposing information as a way to help them define and achieve their learning goals. 
this highlights the fact that an environment where this is possible must exist as a starting point and it also underlines the importance of the instructor’s role in fostering this environment. having a cohort of colleagues, for both instructors and students, can “facilitate student access to existing knowledge, and empower them to critique it, dismantle it, and create new knowledge.”7 this interaction emphasizes a twoway experience where both students and instructors can learn from one another. this is very much in keeping with the theme of digital content, as by the very nature of these types of activities, the tools and methods must lend themselves to being manipulated and repurposed, and this can only occur in a digital environment. finally, in a recent posting on the open oregon blog, silvia lin hanick and amy hofer discuss how open pedagogy can also influence how librarians interact with faculty and students. specifically, they state that “open education is simultaneously content and practice”8 and that by integrating these practices into the classroom, students are learning about issues such as intellectual property and the value of information, by acting “like practitioners” 9 where they take on “a disciplinary perspective and engage with a community of practice.”10 this is a potentially pivotal element to take into consideration when analyzing the landscape of library-related instruction, because it frees the librarian from feeling as if everything rests on that one-time instructional opportunity. the development of a community of practitioners which includes the students, faculty, and the librarian has the potential to provide learning opportunities along the way. including the librarian as part of this model makes sense not only as a way to signal the critical role the librarian plays in the classroom, but also as a way to stress that thinking about, and practicing library-related activities is (or should be) as much part of the course as any other exercise. information technology and libraries | june 2019 7 references 1 katelyn handler and lauren hays, “keeping up with…faculty development,” association of college and research libraries, last modified 2019, http://www.ala.org/acrl/publications/keeping_up_with/faculty_development. 2 “horizon report,” educause, last modified 2019, https://library.educause.edu//media/files/library/2019/2/2019horizonreportpreview.pdf. 3 maha bali and autumm caines. “a call for promoting ownership, equity, and agency in faculty development via connected learning.” international journal of educational technology in higher education 15, no. 1 (2018): 3. 4 bali, “a call for promoting ownership, equity, and agency in faculty development,” 9. 5 ibid, 3. 6 bronwyn hegarty, “attributes of open pedagogy: a model for using open educational resources,” last modified, 2015, https://upload.wikimedia.org/wikipedia/commons/c/ca/ed_tech_hegarty_2015_article_attri butes_of_open_pedagogy.pdf. 7 kris shaffer, “the critical textbook,” last modified 2014, http://hybridpedagogy.org/criticaltextbook/. 8 silvia lin hanick and amy hofer, “opening the framework: connecting open education practices and information literacy,” open oregon, last modified 2017, http://openoregon.org/openingthe-framework/. 
9 "opening the framework."
10 "opening the framework."

creating and managing a repository of past exam papers

mariya maistrovskaya and rachel wang

information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11837

mariya maistrovskaya (mariya.maistrovskaya@utoronto.ca) is digital publishing librarian, university of toronto. rachel wang (rachel.wang@utoronto.ca) is application programmer analyst, university of toronto.

abstract

exam period can be a stressful time for students, and having examples of past papers to help prepare for tests can be extremely helpful. it is possible that past exams are already shared on your campus: by professors in their specific courses, via student unions or groups, or between individual students. in this article, we go over the workflows and infrastructure that support the systematic collection of past exam papers, the provision of access to them, and their management in a repository. we discuss platform-agnostic considerations of opt-in versus opt-out submission, access restriction, discovery, retention schedules, and more. finally, we share the university of toronto setup, including a dedicated instance of dspace, batch metadata creation and ingest scripts, and our submission and retention workflows, which take into account the varying needs of stakeholders across our three campuses.

background

the university of toronto (u of t) is the largest academic institution in canada. it spans three campuses and serves more than 90,000 students through its 700 undergraduate and 200 graduate programs.1 the university of toronto structure is the product of its rich history and is thus largely decentralized. as a result, the management of undergraduate exams is carried out individually by each major faculty at the downtown (st. george) campus, and centrally at the university of toronto mississauga (utm) and university of toronto scarborough (utsc) campuses.

the faculty of arts and science (fas) at the st. george campus has traditionally made exams from its departments available to students. in the pre-internet era, students were able to consult print and bound exams in departmental and college libraries' reference collections. with the rise of online technologies, the fas registrar's office seized the opportunity to make access to past exams more equitable for students and worked with the university of toronto libraries (utl) information technology services (its) to digitize exams and make them available online. they were initially shared electronically via the gopher protocol and later via docutek eres, one of the first available course e-reserves systems. after the utl became an early adopter of the dspace (https://duraspace.org/dspace/) open-source platform for its institutional repository in 2003, the utl its created a separate instance of dspace to serve as a repository of old exams. the repository makes the last three years of exams from the fas, utm, and utsc available online in pdf. about 5,500 exam papers are available to students with a u of t login at any given time.

discussed below are some of the considerations in establishing and maintaining a repository of old exams on campus, along with practical recommendations and shared workflows from the utl.
considerations in establishing a repository of old exams

if you are looking to establish a repository of old exams, these are some of the considerations to take into account when planning a new service or evaluating an existing one.

the source of old exams

depending on the level of centralization on your campus, exams may be administered by individual academic departments or submitted by instructors and administrators to a single location and managed centrally. the stakeholders involved in this process may include the office of the registrar, campus it, departmental admins or libraries, etc. establishing a relationship with these stakeholders is key to getting access to the files. when arranging to receive electronic files, consider whether they could be accompanied by existing metadata. alternatively, if the university archives or records management already receive copies of campus exams, you may be able to obtain them there. print versions will need to be digitized for online access; later in this article we share a metadata creation strategy for this scenario. it is also possible that exams may be collected in less formal ways, for example via exam drives run by student unions and groups.

the utl works closely with the fas registrar's office to receive a batch of exams annually. the utl receives a copy of print fas exams, which are digitized by the its staff. the utl also receives exams from two u of t campuses, utm and utsc, which arrive in electronic format via the campus libraries. the u of t engineering society and the faculty of law each maintain their own exam repositories, and the arts and science student union maintains a bank of term tests donated by students.

content hosting and management

one of the key questions to answer is which campus department or unit will be responsible for hosting the exams, managing content collection, processing and uploads, and providing technical and user support. these responsibilities may fall within the purview of a single unit or may be shared between stakeholders. here are some examples of the tasks to consider:

1. collecting exams from faculty or receiving them from a central location
2. managing restrictions (exams that will not be made available online)
3. digitizing exams received in print
4. creating metadata or converting metadata received with the files
5. uploading exams to the online repository
6. removing exams from the online repository
7. providing technical support and maintenance (e.g., platform upgrades, troubleshooting)
8. providing user support (e.g., assistance with locating exams)

at u of t, tasks 1–2 are taken care of by the registrar's offices at fas and utm and by the library at utsc. tasks 3–8 are performed centrally by the utl its, with the exception of digitization for exams received from the utm and utsc campuses. further details and considerations related to the content management system and processing pipelines are outlined in the "infrastructure and workflows" section below.

collection scope

depending on the sources of your exams, you may need to establish scope rules for what gets included in the collection. for example:

• will you only include final exams?
• will term tests also be included?
• will solutions be posted with the exams?
• will additional materials, such as course syllabi, also be included?

at the utl, only final exams are included in the repository, and no answers are supplied.

exam retention

making old exams available online is always a balancing act between the interests of students, who want access to past test questions, and the interests of instructors, who may have a limited pool of questions to draw from or who may teach different course content over time and want to ensure that the questions continue to be relevant. at the utl, in consultation with campus partners, this balance was achieved by posting only the three most recent years of exams in the repository. as soon as a new batch is received, the utl removes the batch of exams that is more than three years old.

opt-in versus opt-out approach

where exam collection is driven centrally, by a registrar's office for example, that office may require that all past exams be made available to students. as with the retention considerations, the needs of instructors who draw questions from a limited pool can be accommodated via opt-outs, individual exam restrictions, and ad hoc take-down requests. an alternative approach to exam collection is an opt-in model in which faculty choose to submit exam questions on their own schedule.

at the utl, the fas and the utm campus both operate under the opt-out model. the utl receives all exam questions in regular batches unless they have been restricted at an instructor's request. occasional withdrawal requests from instructors require approval from the registrar's office. conversely, the utsc campus operates under the opt-in model, where individual departments submit their exams to the library. while this model provides the most flexibility, the volume of exams received from this campus is consequently relatively small.

repository access

when making old exams available online, one of the things to consider is who will have access to them. will the exams only be available to students of the respective academic department, to all students, or to the general public? will access be possible on campus as well as off campus? if the decision is made to restrict access, is there an existing authorization infrastructure in place that the repository could take advantage of, such as institutional single sign-on or the library's proxy access? at the utl, access to the old exams repository is provided through ezproxy, in the same fashion as subscription resources made available via the library.

discoverability and promotion

how will students find out about the exams available in the repository? will the repository be advertised via the library's website, promoted by course instructors, or linked with other course materials? considering the challenge of promoting a resource like this alongside a variety of other library resources, it is preferable to make it known to students via the same channels through which they receive other course information. for many institutions this would be the learning management system or the course information system. at u of t, the old exams repository is linked from the library website. previously, the link was embedded in the university's learning management system course template.
with a recent transition to a new learning management system, such exposure has yet to be reestablished.

infrastructure and workflows

minimum cms requirements

a repository of old exams does not require a specific content management system (cms) or an off-the-shelf platform. your institution may already have all the components in place to make it happen. here are the minimum requirements you will want to see in such a system:

• file upload by staff (preferably in batch)
• file download by end users
• basic descriptive metadata
• a search/browse interface
• access control/authentication (if you choose to restrict access)

the utl uses a stand-alone instance of dspace for its old exams repository. dspace is open-source software for digital repositories used across the globe, primarily in academic institutions. the utl chose this platform because it was already running an instance of dspace for its institutional repository (ir) and had the infrastructure and expertise on site. however, this is not a solution we would recommend to an institution with no existing dspace experience. while dspace is an open-source platform, maintaining it locally requires significant staff expertise that may not be warranted considering that a collection of exams would use only a fraction of its robust functionality. if you do consider using dspace, a hosted solution may be preferable where local it resources and expertise are limited.

distributing past exams via an existing digital repository

an institution that already maintains a digital repository may consider adding exams as a collection within the existing infrastructure. when choosing to do so, it is important to consider whether the exams use case differs from your ir use case, and whether the new collection will fit the existing mission and policies. differences may include the following:

• access level. ir missions tend to revolve around providing openly accessible materials, whereas exams may need to be restricted. will your repository allow selective access restrictions on the exams collection?
• longevity. ir materials are usually intended to be kept long term, whereas exams may be on a retention schedule. for that reason, it also does not make sense to assign permanent identifiers to exams as many repositories do for their other materials.
• file types and metadata. unlike the variety of research outputs and metadata usually captured in an ir, exams have uniform metadata and a uniform object type. this makes them suitable for batch transformations and uploads.

batch metadata creation options

because of the uniform object type, exams are well suited to batch processing, transformations, and uploads. at the utl, metadata is created from the filenames of scanned pdf files by a python script.2 the script breaks the filename up into dublin core metadata fields based on the pattern shown in figure 1; figure 2 shows a snippet of the script populating the dublin core metadata fields.

figure 1. file-naming pattern for metadata creation at utl.

figure 2. a screenshot of the utl script generating dublin core metadata from filenames.

once metadata is generated, a second python script (figure 3) packages the pdf and metadata file into a dspace simple archive (dsa), the format that dspace accepts for batch ingests.
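the utl's production scripts are openly available on github (see the endnotes below); the following is only a rough sketch of the same filename-to-dublin-core pattern. the filename convention, field choices, and helper names used here are illustrative assumptions, not the utl's actual conventions, and the snippet is meant to show the general shape of such a pipeline rather than a drop-in implementation.

```python
# illustrative sketch only: a hypothetical filename pattern and field mapping,
# not the utl's actual ingest scripts (those are linked in the endnotes).
import shutil
from pathlib import Path
from xml.sax.saxutils import escape

def parse_exam_filename(pdf_path: Path) -> dict:
    """split a hypothetical filename like 'UTSG_CHM136_2019_APR.pdf' into metadata parts."""
    campus, course, year, session = pdf_path.stem.split("_")
    return {
        "title": f"{course} {session} {year} final exam",  # dc.title
        "date": year,                                       # dc.date.issued
        "subject": course,                                  # dc.subject
        "publisher": campus,                                # dc.publisher (campus code)
    }

def write_simple_archive_item(pdf_path: Path, meta: dict, batch_dir: Path) -> Path:
    """create one dspace simple archive item folder: dublin_core.xml, contents, and the pdf."""
    item_dir = batch_dir / pdf_path.stem
    item_dir.mkdir(parents=True, exist_ok=True)
    fields = [("title", None, meta["title"]),
              ("date", "issued", meta["date"]),
              ("subject", None, meta["subject"]),
              ("publisher", None, meta["publisher"])]
    xml = ['<?xml version="1.0" encoding="UTF-8"?>', "<dublin_core>"]
    for element, qualifier, value in fields:
        qual = f' qualifier="{qualifier}"' if qualifier else ""
        xml.append(f'  <dcvalue element="{element}"{qual}>{escape(value)}</dcvalue>')
    xml.append("</dublin_core>")
    (item_dir / "dublin_core.xml").write_text("\n".join(xml), encoding="utf-8")
    (item_dir / "contents").write_text(pdf_path.name + "\n", encoding="utf-8")
    shutil.copy(pdf_path, item_dir / pdf_path.name)
    return item_dir

if __name__ == "__main__":
    batch = Path("dsa_batch")
    for pdf in sorted(Path("scanned_exams").glob("*.pdf")):
        write_simple_archive_item(pdf, parse_exam_filename(pdf), batch)
```

a batch directory built this way can then be handed to dspace's native batch import, which matches the ingest step described next.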
figure 3. a screenshot of the utl script packaging a pdf and metadata into a dspace simple archive.

the dspace simple archive (dsa) then gets batch uploaded into the respective campus and exam-period collections (figure 4) using dspace's native batch import functionality. figure 5 shows what an individual exam record looks like in the repository. after a new batch is uploaded, collections older than three years are removed from the repository. the utl's exam processing scripts are openly available on github under an apache license 2.0 (https://github.com/utlib/dspace-exams-ingest-scripts/).

figure 4. a screenshot of collections in the utl's old exams repository.

figure 5. a screenshot of a record in the utl's old exams repository.

conclusion

having access to examples of past exam questions can be extremely helpful to students preparing for upcoming tests. it is possible that old exams are already being shared on your campus in official or unofficial ways, in print or electronically. facilitating online sharing of electronic copies means that all students, on and off campus, will have equitable access to these valuable resources. we hope that the considerations and workflows outlined in this article will help institutions establish such services locally.

acknowledgements

the authors would like to acknowledge the utl librarians and staff who contributed to the setup and maintenance of the old exams repository over the years: marlene van ballegooie, metadata technologies manager, who operated the filename-to-dublin core metadata crosswalk; sean xiao zhao, former applications programmer analyst, who converted it into python; and sian meikle, associate chief librarian for digital strategies and technology, who was at the inception of the original exam-sharing service and provided valuable historical context and feedback on this article.

endnotes

1 university of toronto, "quick facts," accessed november 4, 2019, https://www.utoronto.ca/about-u-of-t/quick-facts.
2 university of toronto libraries, "exam metadata generation and ingest for dspace," github repository, last modified september 20, 2019, https://github.com/utlib/dspace-exams-ingest-scripts/.

application of the variety-generator approach to searches of personal names in bibliographic data bases-part 2. optimization of key-sets, and evaluation of their retrieval efficiency

dirk w. fokker and michael f. lynch: postgraduate school of librarianship and information science, university of sheffield, england.
keys consisting of variable-length character strings from the front and rear of surnames, derived by analysis of author names in a particular data base, are used to provide approximate representations of author names. when combined in appropriate ratios, and used together with keys for each of the first two initials of personal names, they provide a high degree of discrimination in search. methods for optimization of key-sets are described, and the performance of key-sets varying in size between 150 and 300 is determined at file sizes of up to 50,000 name entries. the effects of varying the proportion of the queries present in the file are also examined. the results obtained with fixed-length keys are compared with those for variable-length keys, showing the latter to be greatly superior. implications of the work for a variety of types of information systems are discussed.

introduction

in part i of this series the development of variety generators, or sets of variable-length keys with high relative entropies of occurrence, from the initial and terminal character strings of authors' surnames was described.1 their purpose, used singly or in combination, is to provide a high and constant degree of discrimination among personal names so as to facilitate searches for them. in this paper the selection of optimal combinations of the keys and the evaluation of their efficiency in search are described. the performance of combined key-sets of various compositions is determined at a range of file sizes and compared with fixed-length keys. in addition, the extent of statistical associations among keys from different positions in the names is determined.

balancing of key-sets

the relative entropies of distribution of the first and last letters of the surnames of authors in the file of 100,000 entries from the inspec data base differ significantly, the former being 0.92 and the latter 0.86. as a result, a larger key-set has to be produced from the back of the surnames to reach the same value of the relative entropy as that of a key-set of a given size from the front of the surname. for instance, the value of 0.954 is reached by a key-set comprising 41 keys from the front of the name, but a set of 101 keys from the back is needed to attain this value. it seemed reasonable to assume that keys from the front and rear should be combined in different proportions in order to maximize the relative entropy of the combined system, and that their proportions should reflect the redundancies of each distribution (redundancy = 1 - hr). in order to test this, a series of combined key-sets of different total sizes was produced, in which the proportions of keys were varied around the ratio of the redundancies of the first and last character positions, i.e., (1 - 0.92):(1 - 0.86), or 8:14. the relative entropies of the name representations provided by combining these key-sets with keys for the first and second initials were determined by applying them to the 50,000-name file, and the entropy value was used to determine the optimal ratio of keys. in one case, the correlation between the value of the relative entropy and retrieval efficiency, as measured by the precision ratio, was also studied, and shown to be high. the sizes of the combined key-sets studied were 148 and 296, with an intermediate set of 254 keys.
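the balancing procedure can be made concrete with a small sketch. the code below is an illustrative reconstruction, not the authors' original programs; it represents each name by its longest matching front key, longest matching rear key, and first two initials, and then computes the relative entropy h/hmax of the resulting representations for a given candidate key-set. the toy names and key-sets are invented for demonstration only.

```python
# illustrative reconstruction of the key-set balancing calculation, not the authors' code.
import math
from collections import Counter

def longest_prefix(s, keys):
    """longest key in the set that matches the front of s (fall back to the first letter)."""
    for n in range(len(s), 0, -1):
        if s[:n] in keys:
            return s[:n]
    return s[:1]

def longest_suffix(s, keys):
    """longest key in the set that matches the rear of s (fall back to the last letter)."""
    for n in range(len(s), 0, -1):
        if s[-n:] in keys:
            return s[-n:]
    return s[-1:]

def represent(surname, initials, front, rear):
    s = surname.lower()
    padded = (initials.lower() + "  ")[:2]   # a space stands in for a missing initial
    return (longest_prefix(s, front), longest_suffix(s, rear), padded[0], padded[1])

def relative_entropy(names, front, rear):
    """h / hmax of the representation distribution, with hmax = log2(file size)."""
    counts = Counter(represent(sn, ini, front, rear) for sn, ini in names)
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(total)

if __name__ == "__main__":
    # toy file and toy key-sets; real key-sets would come from frequency analysis of the data base
    names = [("smith", "jd"), ("schmidt", "a"), ("jones", "mk"), ("johnson", "p")]
    front_keys = {"s", "sc", "j", "jo"}
    rear_keys = {"h", "t", "s", "son"}
    print(round(relative_entropy(names, front_keys, rear_keys), 3))
```

in the paper's terms, the candidate key-set whose representations give the highest value of this ratio (with hmax = log2 50,000 for the test file) is the balanced set.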
the values of 148 and 296 were chosen in view of the projected implementation in the serial-parallel file organization.2 this relates the size of the key-set to the number of blocks on one cylinder of a disc. (the 30mbyte disc cartridges available to us have 296 blocks per cylinder.) otherwise the choice of key-set is arbitrary, and can be varied at will. the minimum key-set size is 106, consisting of 26 letters each for the first and last letter of the surname, and 27 ( 26 letters and the space symbol) each for the first and second initials. the numbers of n-gram keys ( n ::::,. 2) required for the key-sets numbering 148, 254, and 296 in size are . thus 42, 148, and 190. full details are given of the composition of the first and third of these sets. a slight refinement to key-set generation was employed to ensure as close an approximation to equifrequency as possible, especially with the smallest key-sets. precise application of a threshold frequency may occasionally result in arbitrary inclusion of either very high or very low frequency keys. thus, if almost all the occurrences of a longer key are accounted for by a shorter key (as with -mann and -ann), only the shorter n-gram is included. va1'iety-generato1· approach/fokker and lynch 203 optimal set of 148 keys the number of n-gram keys ( n ::::::,. 2) to be added to the minimum set of 106 keys is 42, the presumed optimum proportion being 8:14, which implies about 16 keys from the front of the name and 26 from the back. in order to examine the relationship between the ratio of keys from the front and rear of the surname and the relative entropy of the combined sets, the ratios were varied at intervals between 1:1 and 1:3 so that the numbers of n-grams varied from 21 and 21 to 11 and 31 respectively. for each ratio the keys were applied to the 50,000 name entries, and the distribution of the resultant descriptions determined. the ratios, the number of n-gram keys, and the relative entropies of the distributions are shown in table 1. the maximum value of the entropy is taken to be log250,000. in this case the balancing point, with the key-set including 16 n-gram keys table 1. relation between ratio of n-grams f1'0m f1'dnt and rear of surname, entropy of combined key-sets, and retrieval efficiency for a series of sets of 148 keys ratio numbm· of n-gram number of diffm·ent relative · precision(%) of n-gram keys representations entropy (file size= keys front back in 50,000 entries of system 25,000) 1:1 21 21 33,485 0.9450 71.5 3:4 18 24 33,501 0.9450 71.3 17:25 17 25 33,434 0.9447 70.9 8:13 16 26'* 33,454 0.9453 72.2 5:9 15 27 33,402 0.9450 72.0 1:2 14 28 33,378 0.9449 72.1. 1:3 11 31 33,126 0.9437 71.5 total number of different name entries = 41,469. '* key-set with highest relative entropy. from the front and 26 from the back, corresponds with the ratio of the redundancies of the first and last letters of the surnames. table 2 shows the composition of the optimal key-set of 148 keys, while table 3 gives the distribution of the name representations compiled from the combined key-set, and its corresponding relative entropy. optimal set of 296 keys a similar procedure to that used for the optimal148-key key-set was also applied in this instance. here the ratios of front and rear n-gram keys varied from 57 and 133 to 69 and 121 respectively. for each of the sets chosen, the distributions of the entries resulting from application of the combined key-sets to the file of 50,000 names were determined. 
these showed virtually no difference in terms of the relative entropy alone, although the total number of different entries differed slightly between keysets, and the highest value was used to choose the optimal set, detailed in table 4. the range of combinations studied is shown in table 5, and the distribution of the entries for the optimal set is given in table 6. .. , -::: 204 journal of library automation vol. 7/3 september 197 4 table 2. composition of balanced key-set of 148 keys keys from front of surname ( 42) : key p• key p• key p• key p• a .035 g .055 ma .030 sh .016 b .020 h .035 n .025 st .016 ba .020 ha .021 0 .017 t .040 be .017 i .013 p .038 u .005 bo .014 j .017 pa .014 v .025 br .014 k .041 q .001 w .040 c .036 ka .017 r .032 x ch .016 ko .017 ro .017 y .011 d .044 l .033 s .049 z .013 e .018 le .014 sa ,016 f .034 m .050 sc .015 keys from rear of surname (52) : a .060 ii .015 nn .010 is .012 ra .010 ki .015 on .018 t .042 va .015 j .001 son .027 u .013 b .003 k .033 0 .028 v .001 c .005 l .013 ko .013 ev .018 d .030 el .012 p .004 ov .026 e .068 ll .016 q .001 ,kov .012 f .006 m .013 r .016 nov .on g .012 n .009 er .064 w .005 ng .014 an .020 ler .013 x .003 h .020 man .017 ner .010 y .031 ch .017 en .025 s .055 ey .012 i .044 in .039 es .015 z .013 keys from first initial: 27 characters keys from second initial: 27 characters table 3. frequencies of entries represented by optimall48-key key-set in a file of 50,000 names frequency number of entries with f frequencyf 1 24,363 2 5,622 3 1,850 4 757 5 372 6 193 7 103 8 68 9 32 10 24 11-15 54 16--20 11 21-30 4 33 1 total number of different entries = 33,454 maximum number of possible combinations= 1,592,136 (i.e., 42 x 52 x 27") h = 14.7553 hmax = 15.6096{log,50,000) hr = 0.9453 variety-generator approach/fokker and lynch 205 table 4. composition of balanced key-set of 296 keys keys from front of surname ( 87) : a bu e ha ki ma ni ra si wa al c f he ko mar 0 re so we an qa fr ho kr mc p ri st wi b ch g hu ku me pa ro t x ba co ga i l mi pe s ta y bar d go j la mo po sa u z be da gr jo le mu pr sc v bo de gu k ll n q se· va br do h ka m na r sh w keys from rear of surname ( 155) : a ld ng vskii el lin r or nt sov ca nd ang ki ll tin ar s rt w da rd lng ski all nn er as ert x ka e rg wski ell on ber es st y ma de h li m son der nes tt ay na ee ch ni am lson ger is ett ey ina ge ich ri n nson nger ns u ley ra ke vich ti an rson her ins v ky ta le gh j man ton ier os ev ry va ne sh k rman 0 ker rs ov z ova re th ak yan ko ler ss kov tz wa se ith ck en nko ller ts ikov ya te i ek sen no mer us lov b f ai ik in to ner t nov c ff hi l ein p ser dt anov d g ii al kin q ter et rov keys from first initial: 27 characters keys from second initial: 27 characters table 5. relation between ratio of n-grams from front and rear of surname and entropy of combined key-sets for a series of sets of 296 keys (file size= 50,000) ratio ofn-gram keys 3:7 61:129 13:25 69:121 number of n-gram keys front 57 61 65 69 back 133 129'* 125 121 '* key-set with highest number of different entries. number of different representations 39,182 39,191 39,186 39,179 relative entropy of system 0.9679 0.9679 0.9679 0.9679 in this instance, the ratio of n-gram keys from the front and back of the surnames has been displaced from the ratio of the redundancies of the first and last characters of the surnames, i.e., 8:14 (1:1.7). here the ratio is roughly 1:2. 
this is undoubtedly due to the fact that the relative entropies of key-sets from the back of the surname increase less rapidly than those of key-sets from the front, and hence larger sets must be employed. evaluation of retrieval effectiveness the keys in the optimized key-sets represent name entries in an approxi,, i' i: 206 ] oumal of librm·y automation vol. 7 i 3 september 197 4 table 6. frequencies of entries represented by optimal key-set of 296 keys in a file of 50,000 names frequency f 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 total number of different entries = 39,191 number of entries with frequencyf 31,705 5,394 1,371 442 164 63 27 12 4 3 2 2 1 1 maximum number of possible combinations= 9,830,565 (i.e., 87 x 155 x 27') h = 15.108 hmax = 15.6096(log,50,000) hr = 0.9679 mate manner only, so that when a search for a name is performed, additional entries represented by the same combination of keys are identified. while these may be eliminated in a subsequent character-by-character match of the candidate hits, the proportion of unwanted items should remain low if the method is to offer advantages. in evaluating the effectiveness of the key-sets in the retrieval, the names in the search file were represented by concatenating the codes for the keys from the front and back of the surnames and the initials, and subjecting the query names to the same procedure. the matching procedure produced lists of candidate entries, of which the desired entries were a subset. the final determination was carried out manually. the tests were performed first with names sampled from the search file, so that correct items were retrieved for each query. since searches for name entries may be performed with varying probabilities that the authors' names are present in the file (especially in current-awareness searches), varying proportions of names of the same provenance, but known not to be present in the search file, were also added. in these cases candidate items were selected which included none of the desired entries. recall tests were also performed and recall shown to be complete. the measure used in determining the performance of the variety-generator search method is the precision ratio, defined as the ratio of correctly identified names to all names retrieved. it is presented both as the ratio of averages (i.e., the summation of items retrieved in the search and calculation of the average) and as the average of ratios (i.e., averaging the val'iety-genemtor app1'0ach/fokker and lynch 207 figures for individual searches). the latter gives higher figures, since many of the individual searches give 100 percent precision ratios. the precision ratio was found to be dependent on file size and to fall somewhat as the size of file increases. this is due to the fact that the keysets provided only a limited, if very high, total number of possible combinations, while the total possible variety of personal names is virtually unlimited. the evaluation was performed with a sample of 700 names, selected by interval sampling. this number ensured a 99 percent confidence limit in the results. a comparison of the interval sampled query names with randomly sampled names showed that no bias was introduced by interval sampling. a test to confirm that the retrieval effectiveness reached a peak at the maximum value of the relative entropy of a balanced key-set was performed first. this was carried out on a file of 25,000 names, using as queries names selected from the file and the optimal 148-key key-set. 
as shown in table 1, the values of the precision ratio (ratio of averages) and of the relative entropy both peak at the same ratio of n-gram keys from the front and back of the surnames. the performance of the optimal key-sets of 148, 254, and 296 keys with files of 10,000, 25,000, and 50,000 names is shown in table 7. calculated as the ratio of averages, the smallest key-set ( 148 keys) shows a precision ratio of 64 percent with a file of 50,000 names, which means that of every three names identified in the variety-generator search, two are those desired. with the largest key-set ( 296 keys), this rises to nine correctly identified names in every ten retrieved at this stage. on the other hand, calculated as the average of ratios, the precision ratios rise to 81 percent and 94 percent respectively. for smaller file sizes-typical, for instance, of current-awareness searches-the figures for all of these are cottespondingly higher. table 7. precision ratios obtained in variety-generator searches of personal names-queries sampled from sea1'ch file (confidence level= 99 pm·cent) precision as ratio of averages (%) : file size 50,000 25,000 10,000 precision as average of ratios (%) : file size 50,000 25,000 10,000 148 64 71 84 148 81 87 93 key-set size 254 87 90 93 key-set size 254 91 95 97 296 90 91 94 296 94 96 97 '; ~;: 208 journal of library automation vol. 7/3 september 1974 the effect of sampling from a larger file, so that increasing proportions of the names searched for are not present in the search file, is shown in table 8 for a file of 25,000 names. in this case, the proportion of correctly identified names in the total falls, so that overall performance is somewhat reduced. thus, depending both on file size and on the expected proportion of queries identifying hits, the key-set size can be adjusted to reach a desired level of performance. in addition, tests to determine the table b. effect of varying proportion of query names not present in search file of 25,000 names, using 296 keys (ratio of averages) %of names not precision% number of names number of names in search file (ratio of averages) ret1·ieved correctly retrieved 21 90 766 691 42 85 595 505 61 83 449 371 74 76 319 242 84 68 228 154 applicability of a key-set optimized for one file of 50,000 names to another file of the same provenance and size were carried out. the three key-sets derived from the first file were applied to the second, query names sampled from the latter, and the precision ratios determined. some reduction in performance was observed; expressed as ratio of averages, the precision with the 296-key key-set fell from 90 to 83 percent, with the 254-key keyset from 87 to 82 percent, and with the 148-key key-set from 64 to 56 percent, figures which seem unlikely to prejudice the net performance in any marked way. nonetheless, monitoring of performance and of data base name characteristics over a period of operation might well be advisable. distribution characteristics of other types of keys it is particularly instructive to examine the distribution characteristics of other types of keys, including those of fixed length, generated from various positions in the names, and to compare them with those of the optimal key-sets employed in the variety-generator approach. to this end, the file of 50,000 names was processed to produce the following keys or keysets: 1. initial digram of surname. 2. initial trigram of surname. 3. key-set of ninety-four n-grams from the front of the surname, with first and second initials. 4. 
key-set consisting of first and last character of surname, with first and second initials. the figures (table 9) show clearly that all have distributions which leave no doubt as to their relative inadequacy in resolving power, where this is defined as the ratio of distinct name representations provided by the key-set used to the number of different name entries ( 41,469) in the file. at the digram level, the value of the resolving power is 0.009, i.e., each vm·iety-generator approach/fokker and lynch 209 digram represents, on average, 110 different name entries, while no fewer than thirty-two specific digrams each represent between 500 and 1,000 different names. at the trigram level, the value of the resolving power rises to 0.08, a tenfold increase; however, one trigram still represents between 500 and 1,000 different names. use of the first and last letters of the surname plus the initials again increases the value of the resolving power to 0.627, or 1.6 distinct names per entry; eight of the representations now account for between thirty-one table 9. distributions of a variety of other representations of personal names in a file of 50,000 entries 94 n-grams from first and last frequency initial digram initial trigram front of surname letter of surname f of surname of surname plus 2 initials plus 2 initials 1 40 735 8,964 16,346 2 22 428 3,929 4,919 3 16 249 1,884 2,025 4 11 197 1,006 973 5 7 170 646 581 6 7 110 397 340 7 10 112 234 224 8 4 98 186 146 9 7 81 144 92 10 5 66 108 72 11 6 61 70 49 12 2 56 88 36 13 5 51 74 33 14 1 48 50 24 15 2 35 51 23 16 3 37 36 25 17 2 35 29 15 18 3 33 29 11 19 8 35 28 6 20 8 40 23 5 21-30 21 207 127 49 31-40 23 109 47 8 41-50 13 88 13 51-100 36 142 3 101-200 24 62 201-500 57 15 501-1000 32 1 total 375 3,301 18,166 26,002 resolving power .009 .080 .438 .627 and forty distinct entries. in contrast, however, the key-set of 148 keys comprising ninety-four n-gram keys from the front of the name and the first and second initials, although almost 50 percent larger than the fourcharacter representation, has a resolving power of only 0.438 (or 2.28 entries per representation). this contrast provides particularly strong evidence for the superiority of keys from the front and rear of the surnames over those from the front alone, even when the latter are variable in •' 210 journal of library automation vol. 7/3 september 1974 length. as expected, the precision ratio of the four-character representation is low, at 37 percent (ratio of averages), compared with 64 percent for the optimal148-key key-set. extent of statistical association among keys thus far, the frequency of occurrence of variable-length character strings from the front and back of the surnames is the only factor considered in their selection as keys. it is well known in other areas that statistical associations among keys can influence the effectiveness of their combinations. 3 where a strong positive association between two keys exists, their intersection results in only a small reduction of the number of items retrieved over that obtained by using each independently. when the association is strongly negative, the result of intersection may be much greater than that predicted on the basis of the product of the individual probabilities of the keys. 
to assess the extent of associations among keys from the front and rear of surnames and initials, sets of both fixedand variable-length keys from each of these positions were examined.· the kendall correlation coefficient v was calculated for each of the twenty most frequent combinations of these. this is related to the chi-square value by the expression x2 =m v2 where m is the file size, or 50,000. table 10 shows the values of the association coefficient for certain of the characters in the full name. those above .012 are significant at a 99 percent confidence level. positive associations are table 10. a8sociation coefficients for sets of the most frequent digrams from various positions in personal names first and last first letter of surname first and second letters of surname and first initial initials digram v digram v digram v kv .064 kv .054 hv .078 wr .050 hj .027 mv .069 ka .038 br -.024 kv .069 hn .028 sj -.023 rv -.055 sa .024 dj .022 dv -.053 sn .024 bg .018 tv .053 cn .022 ka .018 jv -.045 kn -.020 cj ,018 sv .034 ma .014 sd .015 fv .033 kr -.011 sv .013 nv -.029 sv ,010 mm .011 gv .022 rn .010 mj ,007 lv -.022 bn -.008 bj ,005 iv -.019 br .008 sg -.004 av -.019 mn -.007 sr .004 cv -.018 sr .007 ba .004 pv .017 mr .004 ma ,004 wv -.014 si -.002 sm -.003 yv .010 gn .001 mr .002 bv .005 ln .001 sa -.000 ev -.002 variety-generator app1'0ach/fokker and lynch 211 more frequent than negative. the figures indicate that intersection of certain of these characters as keys in search would result in some slight diminution in performance against that expected. the figures for the association coefficients among the twenty most frequent combinations of keys from the front and back of surnames in the 148and 296-key key-sets show magnitudes (mostly positive) which are substantially greater than those for single characters (see table 11). the reasons for these values are obvious; in certain instances, e.g., miller, jones, and martin, common complete names are apparent, while in one case, lee, an overlap between keys from the front and rear exists. in others, linguistic variations on common names can be discerned, as with br n-brown or braun. table 11. association coefficients in the twenty most frequent key combinations from front and back of surnames in two key-sets key-set size key-set size 148 296 keys v keys v s h .146 s ith .343 j son .127 jo nson .297 sc er .104 jo nes .278 w s .043 an rson .274 t a .038 si gh .249 t i .038 le ee .221 w er .038 mu ller .214 c e .034 ta or .195 f er .033 gu ta .168 p s .025 br n .160 d e .023 mi ller .151 l e .022 mar tin .145 w e .022 wi s .137 g in .020 f her .133 m e .009 sc der .121 s a .008 sa to .110 g e .006 t as .084 m a .005 sc er .069 m er -.004 ch en .055 g er -.000 t son .050 such associations are inevitable. when the selection of keys is based solely on frequency, some deviation from the ideal of independence must result, becoming larger as the size of the key-sets increases, and as the length of certain of the keys increases. however, since its effect in the most extreme cases is merely to lead to virtually exact definition of the most frequent surnames, no particular disadvantage results. possible implementations of the variety-generator name search approach the variety-generator approach permits a number of possible implementations of searches for personal names to be considered, if only in outline f ( f•j/ 212 journal of library automation vol. 7/3 september 1974 at this stage, using a variety of file organization methods. 
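a small sketch may help make the association measure concrete. assuming that v is the usual coefficient of association for a 2 × 2 contingency table (the phi coefficient), so that chi-square = m v**2 as stated above, it can be computed directly from the co-occurrence counts of two keys over the file. the key-occurrence vectors below are invented toy data, not figures from the study.

```python
# illustrative sketch: association coefficient between two binary keys,
# assuming v is the standard 2x2 (phi-style) coefficient, so chi-square = m * v**2.
import math

def association(front_hits, rear_hits):
    """v for two key-occurrence vectors over the same file of m names."""
    m = len(front_hits)
    a = sum(1 for f, r in zip(front_hits, rear_hits) if f and r)       # both keys occur
    b = sum(1 for f, r in zip(front_hits, rear_hits) if f and not r)   # front key only
    c = sum(1 for f, r in zip(front_hits, rear_hits) if not f and r)   # rear key only
    d = m - a - b - c                                                  # neither key
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

if __name__ == "__main__":
    # toy file of 8 names: does a front key such as "s-" co-occur with a rear key such as "-ith"?
    front = [True, True, True, False, False, False, True, False]
    rear  = [True, True, False, False, False, False, True, False]
    v = association(front, rear)
    print(round(v, 3), round(len(front) * v**2, 3))   # v and the implied chi-square
```

large positive values of v for pairs such as s- and -ith simply reflect very common complete surnames, which is the pattern visible in table 11.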
the most widely known methods (apart from purely sequential files) are direct access (utilizing hash-addressing), chained, and index sequential files. direct application of the concatenated key-numbers as the basis for hash-address computation appears attractive in instances where the personal name is used alone or in combination (as, for instance, with a part of the document title). the almost random distribution of the bits in this code should result in a general diminution of the collision and overflow problems commonly encountered with fixed-length keys. since only four keys are used to represent each name, and the four sets of keys from which these are selected are limited in number and of approximately equal probability, the keys can be used to construct chained indexes, to which, however, the usual constraints still apply. index sequential storage again offers opportunities, in particular since the low variety of key types means that the sorting operations which this entails can be eliminated. in effect, each name entry would be represented by an entry in each of four lists of document numbers or addresses, and documents retrieved by intersection of the lists. while four such numbers are stored for each name, in contrast to a single entry for the more conventional name list, the removal of the name list itself would more than compensate for the additional storage required for the lists. in the index sequential mode, the lists of document addresses or numbers stored with each key are more or less equally long. they may thus be replaced by bit-vectors in which the position of a bit corresponds to a name or document number. if the number of keys bears a simple relation to the number of blocks on a disc cylinder, the vectors can be stored in predetermined positions within a cylinder, resulting in the serial-parallel file. the usefulness of this file organization has yet to be fully evaluated; however, it also promises substantial economies in storage. on average, only four of the bits are set at the positions in the vectors corresponding to the name or document entry. on average, then, the density of 1-bits is very low, and long runs of zeros occur in the vectors. they can, therefore, be compressed using run-length coding, for instance as applied by bradley.3· 4 preliminary work with the 296-key key-set has indicated already that a gross compression ratio of nine to one is attainable, so that the explicit storage requirements to identify the association between a name and a document number would be just over thirty bits. conclusions the work described here relates solely to searches for individual occurrences of personal names. clearly, in operational systems in which one or more author names are associated with a particular bibliographical item, it will be necessary to provide for description of each of these for access. if this is provided solely on the basis of a document number, some false coordination will occur-for instance, when the initials of one entry are variety-generator approach/fokker and lynch 213 combined with the surname of another. a number of strategies can be envisaged to overcome this problem. , the performance figures show clearly that a small number of characteristics-between 100 and 300 in this study-are sufficient to characterize the entries in large files of personal names and to provide a high degree of resolution in searches for them. 
while performance in much larger files, involving the extension of key-set sizes to larger munbers, has yet to be studied, the logical application of the concept of variety generation would appear to open the way to novel approaches to searches for documents associated with particular personal names, which seem likely to offer advantages in terms of the overall economic performance of search systems, not only in bibliographic but also in more general computer-based information systems. acknowledgments we thank m. d. martin of the institution of electrical engineers for provision of a part of the inspec data base and of file-handling software, and the potchefstroom university for c.h.e. (south mrica) for awarding a national grant to d. fokker to pursue this work. we also thank dr. i. j. barton and dr. g. w. adamson for valuable discussions, and the former for n-gram generation programs. references 1. d. w. fokker and m. f. lynch, "application of the variety-generator approach to searches of personal names in bibliographic data bases-part 1. microstructure of personal authors' names," journal of library automation 7:105-18 (june 1974). 2. i. j. barton, s. e. creasey, m. f. lynch, and m. j. snell, "an information-theoretic approach to text searching in direct-access systems," communications of the acm (in press). 3. s. d. bradley, "optimizing a scheme for run-length encoding," proceedings of the ieee 57:108-9 (1969). 4. m. f. lynch, "compression of bibliographic files using an adaptation of runlength coding," information storage and retrieval 9:207-14 (1973). r /' i, marcive: a cooperative automated library system virginia m. bowden: systems analyst, the university of texas health science center at san antonio, and ruby b. miller: head cataloger, trinity university, san antonio, texas. 183 the marcive library system is a batch computer system utilizing both the marc tapes and local cataloging to provide catalog cal'ds, book catalogs, and selective bibliographies for five academic libra1·ies in san antonio, texas. the development of the system is traced and present procedures are described. batch retrieval from the marc 1·ecords plus the modification of these records costs less than twenty cents per title. computer costs fo1' retrieval, modification, and card production average six-ty-six cents per title, between seven and ten cents per card. the attributes and limitations of the marcive system are compm·ed with those of the oclc system. in san antonio, texas, a unique cooperative effort in library automation has developed, involving the libraries of five diverse institutions: trinity university, the university of texas health science center at san antonio (uthscsa), san antonio college (sac), the university of texas at san antonio (utsa), and st. mary's university. these institutions are utilizing the marcive library system which was developed by and for one library, that of trinity university. the marcive system is a batch, disc oriented computer system utilizing both local cataloging and the marc tapes to produce catalog cards, book catalogs, selective bibliographies, and other products. development the trinity university library has been involved in library automation since 1966.1 when the library reclassified its collection from dewey to the library of congress classification in1966, a simplified machine-readable format was developed and used for storage on computer. this format contained the following bibliographic elements: accession number, call number, author, title, and imprint date. 
in 1969 the library decided to reformat the computer data base into a marc ii compatible format in order 184 ] ournal of library automation vol. 7 i 3 september 197 4 to build a data base of bibliographic records that could be the basis for all future automated systems within the library. the resulting system, marcive, was designed jointly by the head cataloger, ruby b. miller, and the library programmer, paul jackson, a graduate student in trinity's department of computer science. since in 1969 literature on completed library automation projects was sparse, no other system was used as a guide. the marcive format was based on the designers' interpretation of the 1969 edition of the marc manual. the name, marcive, evolved when the programmer facetiously claimed that his format was so advanced he would call it the marc iv format. the computer room operating staff, ignoring the space between the marc and iv, combined the two, producing marciv. an e was added later for ease of pronunciation. the marcive system was designed initially as a system for data storage and retrieval. the update, select, and acquisitions list programs were operative in september 1970. the next month uthscsa inquired as to the possibility of producing catalog cards as part of the marcive system. within the brief span of three months, by january 1971, trinity university library produced 4,289 catalog cards and uthscsa produced 1,719 catalog cards via marcive. in february 1974, the five participating libraries produced a total of 29,000 catalog cards, with trinity accounting for 10,740 cards. continued development of the marcive system was delayed in 1971 by changes in computer center personnel and equipment. in 1972 new programs were developed to incorporate the marc tapes into the marcive system. the size of the marc data base, which is now held on three discs, was a major problem. modifications were included to accept input from magnetic tape and typewriter terminals using the apl language as well as keypunched cards. the original restriction of the system to classifications with one to three alphabetic letters followed by numbers, such as used by lc and nlm, was modified to accept dewey decimal classification to accommodate san antonio college. this restriction had been incorporated in an attempt to insure that the call number would be properly formatted, thus simplifying retrieval in the select program and grouping in the acquisitions list and update programs. computer configuration the marcive system is a disc oriented system which was programmed for an ibm 360/44 using the mft operating system. this computer model was designed for scientific programming and was manufactured in limited quantities. the programs were written in basic assembly language since adequate higher level language compilers for the 360 i 44 were not available at the trinity computer center. in 1971 the programs were converted to run under dos, and in 1972 they were converted for processing on the ibm 370/155 using the os processing system. since the initial promarcne/bowden and miller 185 grams were written in basic assembly language, the subsequent programs have also been written this way. marcive format the marcive format is an adaptation of the marc ii format. the definition of the marc ii format is a", .. format which is intended for the interchange of bibliographic records on magnetic tape. it has not been designed as a record format for retention within the files of any specific organization ... 
[it is] a generalized structure which can be used to transmit between systems records describing all forms of material capable of bibliographic descriptions . . . the methods of recording and identifying data should provide for maximum manipulability leading to ease of conversion to other formats for various uses."2 adaptation of the marc ii format is common among users. an analysis by the recon task force found much variation among the use of the fixed fields, tags, indicators, and subfields. 3 the oclc system can regenerate marc ii records from oclc records although they contain only 78 percent of the number of characters in the original marc ii record. 4 the developers of the marcive system studied the marc manual and decided that the leader and directmy were not necessary for program manipulation. such information can be generated by a conversion program. the marc mnemonic codes were chosen instead of the numeric ones because all bibliographic data were being coded locally and it was felt that mnemonics would be easier to work with. the mnemonic codes are the ones designated in the marc manuals except that "si" was substituted for "se." rules for assigning indicators, subfields, and delimiters are those described by marc. the basic structure of the marcive format is illustrated in figure 1. the differences between marcive and marc are as follows: 1. marcive's leader consists of three fields: length of disc space, status code, and length of record. in converting marc the following elements of the marc leader are incorporated in the marcive leader fields: length of disc space, status code, and length of record. 2. marcive does not contain the marc record directory, but rather places the tags and subfield codes in front of the actual data. 3. in the conversion from marc ii to marcive, fixed fields such as date of publication are omitted. 4. all data elements in marcive are treated as variable tags even though they contain fixed field data. 5. marcive uses the mnemonic code names for the input of data rather than the numeric marc codes. for example "mep" is used for coding a person as main entry rather than "llo." the mnemonic tag names are stored in the machine format and not the numeric marc tags. ,, ·' 186 j oumal of libra1'y automation vol. 7 i 3 september 197 4 '""d ~ .... ~ 0 i5 cj " "' .oj "' "' .oj § ~ s s " '""' p:; ql "' ql ~ cj "' fin fin-data data elements ..sl 0"' ~ z z <:<:: "' 0 <:<:: <:<:: 1"'1 fjp< ~ tag .g elements bil .g bil ~ b!l"' fl "' "' .s "' bil c/) e-< c/) e-< "' "' " "' p >-1 "' >-1 length of disc space. this identifies the number of seventy-two byte blocks a record uses. the marcive records average 350 characters or three to six blocks. blank. this field is used by the update program. length of record. identifies the actual number of characters a record .contains. fin tag. this is the marcive control tag and must precede each record. it contains four subfields: accession number, type of material, location of material, and call number. tag name. after the fin tag, any of the marcive tags may be input as long as they conform to the proper sequence (i.e., main entry must pi·ecede title). each tag is followed by its subfield codes and the data elements. fig. 1. marcive fo1•mat st1'uctu1'e. 6. all first indicators are input except for the first indicator in the contents note. 7. most of the second indicators are not input, except for the filing indicators which are included in the marcive format. 8. marcive adds one variable tag to the marc format called "fin." 
it serves the function of the marc 090 local holdings tag. the fin tag must be the first variable tag in each marcive record and must contain four data elements: ( 1) accession number; ( 2) type of material code (monograph, serial, etc.); ( 3) location of material within library (reference, reserve, etc.); ( 4) local call number. even though marcive is not a pure marc format, there has been an attempt to code most of the data elements into marcive. a marcive to marc conversion is being written by one of the marcive libraries in order to merge its marcive data base with a purchased marc data base. marcive master data bases each of the m arcive users maintains a separate data base of its holdings, which is called its marcive master. this master file contains a complete bibliographic record for each title cataloged by the library, including marc cataloging and local cataloging. when a library modifies a marc record, the modified record is recorded in that library's marcive master. the various libraries' marcive masters have not been merged, although this is being considered. each library has prefaced all of its accession numbers with a unique library code just in case a merged data base is desired. marc-con data base the largest data base in the system is the marc-converted data base, marcive/bowden and miller 187 hereafter referred to as marc-con. this data base contains only pure marc data that have been converted into marcive machine format. no original cataloging or local modifications of marc are contained in the marc-con data base. marcive programs convert-this program reformats the weekly marc tapes into the marcive machine format. marc-update-this program merges the weekly converted marc tape with the marc-con disc file. an index sequential ( isam) file containing lc card number, fifty characters of the title, and the disc address of the marc reoord is generated. the isam file is in lc card number order. in 1974 the marc-con data base filled three 3330 disc packs. there are three tape back-up files: one file consisting of original marc records, one of the marc-con records, and a third with the isam file. deleted records and replaced records are annually purged from the marc-con files. a new set of back-up tapes for the disc packs is created every three months in order to facilitate regeneration of the disc packs should damage occur. marc-list-this program lists marc records in title sequence from the tape. once every six to eight weeks the list is cumulated and printed. these lists are used for searching until the annual cumulation of the nuc is received. this provides current listings of records on the marc tapes that are not easily available in the national union catalog. this listing will be eliminated in 1974, when access by title to the marc-con data base is available. marc-search-this program searches for lc numbers on the marc-con file using the isam file. a file of the matched records is produced on tape or disc as specified along with a listing of these records. this listing contains the marc-con complete bibliographic entry (figure 2). although access is currently only by lc card number, access by title algorithm ( 3, 1, 1) is expected in 197 4. replace-the purpose of this program is to modify marc-con records to fit the needs of the individual library. these modifications can be done automatically to all records or on a single record basis by the library. 
the automatic changes are specified on a control card and include twenty-two options, such as assignment of an accession number, usage of the dewey class number instead of lc, and changing "u.s." in subject headings to "united states." an example of a single modification would be the changing of a series entry from traced to untraced. most marcive participants use a combination of automatic and single changes. the output from the replace program may be input to all other marcive programs, such as edit, catalog card, update, etc.

edit: this program verifies the format of the input. valid tags and subfields, as well as the correct sequence of tags, are checked. multiple spaces are compressed to one, implied subfields are added, and a limited number of punctuation marks are generated. actual bibliographic data are not checked, so spelling errors are not detected by the program. those titles which do not conform to specifications are rejected and an explanatory message is generated. a library may choose one of three forms of listings of output: (1) full-edit, (2) mini-edit, or (3) error-edit. the full-edit

[figure 2: search listing of marc-con data, showing two sample records (a translation of aristophanes' plays and franco monti's african masks) with their fin, lc card number, language, classification, main entry, title, imprint, collation, price, series, and note tags.]
fig. 2. search listing of marc-con data.

[sample edit listing: record 950564, kimber's anatomy and physiology, with its fin, main entry, and title tags.]

html tag, so most urls were easily identified for removal. the data cleaning process has been described here in a linear fashion for ease of understanding, but over the course of the project it was actually an iterative process, as more cleaning issues were discovered during analysis. based on the analyses performed in the related literature, the practicum student wrote code to test five topic modeling algorithms: (1) latent dirichlet allocation (lda), (2) phrase-lda (lda applied to phrases instead of words), (3) biterm topic modeling (btm), (4) dirichlet mixture modeling (dmm), and (5) non-negative matrix factorization (nmf). ultimately, the processing power and time required to implement btm meant that this algorithm could not be implemented for this project. however, the other four models (lda, phrase-lda, dmm, and nmf) were all successfully implemented. all code related to this project, including the cleaning and analysis, is available on github (https://github.com/mozeran/uiuc-chat-log-analysis).

results
outputs of the lda, phrase-lda, dmm, and nmf modeling algorithms are shown in tables 1 through 4.
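as context for the tables that follow, a minimal sketch of the cleaning-plus-lda pipeline described above, assuming nltk and gensim (both listed in note 11), might look like this. the sample transcripts, topic count, and pass count are illustrative stand-ins, not the project's data or settings; the authors' actual code is in the github repository linked above.

```python
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from gensim import corpora, models

# assumes the nltk stopword list has been downloaded: nltk.download("stopwords")
stemmer = PorterStemmer()
stop = set(stopwords.words("english"))

def clean(transcript):
    # drop urls, lowercase, keep alphabetic tokens, remove stop words, stem
    text = re.sub(r"http\S+", " ", transcript.lower())
    tokens = re.findall(r"[a-z']+", text)
    return [stemmer.stem(t) for t in tokens if t not in stop]

chat_transcripts = [  # hypothetical stand-ins for the pii-scrubbed transcripts
    "hi, can you help me find the full text of this journal article?",
    "i am looking for a book on course reserve at the undergraduate library.",
]
docs = [clean(t) for t in chat_transcripts]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)

# print the top 10 words per topic, in the same spirit as table 1
for topic_id, words in lda.show_topics(num_topics=-1, num_words=10, formatted=False):
    print(topic_id + 1, " ".join(w for w, _ in words))
```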
after removing common stop words, the remaining words were put into lowercase and stemmed before the topic modeling algorithms were applied. the objective of the stemming process was to convert singular and plural versions of a word to a hybrid form so that they are treated as the same word. thus, many words ending in "y" are shown ending in "i": for instance, "library" and "libraries" would both be converted to "librari" and thus be treated as the same word. the phrase "easi search" refers to "easy search," the all-in-one search box on the library homepage. the word "ugl" refers to the undergraduate library (ugl). the word "remov" showed up in the topic lists surprisingly frequently, probably because patron pii was replaced with the word "removed." since explicitly denoting the removal of pii is unlikely to be of import, it makes sense in the future to simply remove the pii without replacement.

table 1: lda (top 10 words in each topic)
topic 1: music map laptop remov find ok one also may score
topic 2: look search find help databas thank use articl research would
topic 3: book librari thank help check look remov reserv would els
topic 4: help use student find articl librari hi look tri question
topic 5: request librari account item thank ok get help loan number
topic 6: thank chat good know one night go okay think hi
topic 7: thank look librari remov help would contact inform find like
topic 8: search articl databas click thank journal help page ok find
topic 9: articl thank journal access look help remov full link find
topic 10: access tri link thank use work get campu remov let

table 2: phrase-lda (top 10 phrases in each topic)
topic 1: interlibrari loan, lose chat, chat servic, lower level, chat open, writer workshop, spring break, studi room, call ugl, add chat
topic 2: good night, great day, good day, good luck, drop menu, sound good, nice day, ye great, remov thank welcom, make sens
topic 3: anyth els, tri find, abl find, find anyth, feel free, ll tri, social scienc, tri access, ll back, abl access
topic 4: easi search, academ search, find articl, search box, tri search, databas subject, search bar, search term, databas search, search databas
topic 5: graduat student, grad student, peer review, undergrad student, illinoi undergrad, scholarli sourc, univers illinoi, undergradu student, primari sourc, googl scholar
topic 6: main librari, librari catalog, librari account, librari homepag, call number, librari websit, netid password, main stack, creat account, borrow id
topic 7: page remov, click link, open new tab, link remov, send link, remov click, left side, remov link, page click, error messag
topic 8: give one moment, contact inform, moment pleas, faculti staff, give minut, pleas contact, email address, staff member, faculti member, unit state
topic 9: full text, journal articl, access articl, find articl, databas journal, light blue, articl titl, titl articl, journal databas, found articl
topic 10: request book, request item, check book, doubl check, print copi, cours reserv, copi avail, physic copi, book avail, copi past

table 3: dmm (top 10 words in each topic)
topic 1: work open chat way onlin say specif avail day sourc
topic 2: check titl research much onlin avail day text sourc say
topic 3: pleas sourc day onlin titl found right hello may take
topic 4: chat also copi pleas think onlin undergrad sourc work way
topic 5: pleas sorri found item chat way right open work time
topic 6: found also right much think could research undergrad sorri way
topic 7: contact hello account sorri could ask titl moment may think
topic 8: copi onlin sorri ask think say right also much sourc
topic 9: much research way may right think open take hello result
topic 10: abl avail also titl catalog pleas say campu onlin take

table 4: nmf (top 10 words in each topic)
topic 1: request take titl today moment way item may place say
topic 2: specif start type journal topic research tab way subject result
topic 3: ugl today ask wonder call may contact peopl someon talk
topic 4: sourc univers scholarli research servic resourc tell illinoi guid librarian
topic 5: account log set vpn us password id say campu problem
topic 6: main locat undergradu call tab review two circul ugl number
topic 7: reserv class time undergradu cours websit show im titl onlin
topic 8: text full troubl problem still pdf websit onlin send moment
topic 9: chat night hey yeah oh well time tonight take yep
topic 10: unfortun uiuc onlin wonder version graduat print seem way grad

discussion
interpreting the results of a topic model can be a bit of a guessing game. none of these algorithms look at the semantic meaning of words, so the resulting topics are not based on semantics. each algorithm simply employs a different method of mathematically determining the likelihood that words are related to each other. when this likelihood is high enough (as defined by the algorithm), the words are listed within the same topic. identifying topics mathematically is much quicker than a person hand-coding conversations. however, automatic classification also means that the resulting topics could make absolutely no sense to people, who understand the semantic meaning of the words within a topic.

this lack of coherent meaning is most evident in the results of the dmm model (table 3). for instance, the words that comprise topic 1 are the following: "work open chat way online say specify available day source." it is difficult to imagine what overarching concept links all, or even most, of these words. only a few words appear to have any significance at all: "open" could refer to open access, or to the library's open hours; "online" may refer to finding resources online, or to the fact that a student is taking online classes; and "source" is likely some reference to a research resource. these words barely relate to each other semantically, and the remaining seven words don't provide much clarification. thus, it appears that dmm is not a particularly good topic modeling algorithm for library chat reference.

the results from the lda model (table 1) appear slightly more comprehensible. in topic 2, for instance, the words are as follows: "look search find help database thank use article research would." while not all the words relate to each other, a common theme could emerge from the words look, search, find, database, article, and research. it is possible that topic 2 identified chat conversations where a patron needed help finding research articles. even topic 6, at first glance a silly list of words, makes some sense: "thank chat good know one night go okay think hi." greetings and sign-offs probably comprised a good number of the total words in the corpus, so it is understandable that a "greetings" topic could be mathematically identified. overall, lda appears to have potential for topic modeling chat reference, but it probably needs to be further tweaked.
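the phrase-lda results in table 2 require multi-word tokens such as "easi search" rather than single stems. the article does not say how the phrases were extracted; one common approach, sketched here with gensim's phrases model (gensim appears in note 11), is to learn frequent word pairs from the stemmed token lists and feed the merged tokens to the same lda call shown earlier. the min_count and threshold values and the tiny token lists are illustrative only, and gensim joins the pair with an underscore rather than the space shown in table 2.

```python
from gensim.models import Phrases

# stemmed token lists; tiny stand-ins for the `docs` built in the previous sketch
docs = [
    ["thank", "interlibrari", "loan", "request", "pleas"],
    ["interlibrari", "loan", "take", "week", "thank"],
]

# learn frequent word pairs; min_count and threshold are illustrative guesses
bigram = Phrases(docs, min_count=1, threshold=1.0)
phrase_docs = [bigram[d] for d in docs]
print(phrase_docs)  # "interlibrari" and "loan" come back merged as a single token
```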
when applying the lda model to phrases (table 2), coherence increases within the phrases, but the topics are not always as coherent. topic 1 includes the following phrases: "interlibrary loan, lose chat, chat service, lower level, chat open, writer workshop, spring break, study room, call ugl, add chat." each phrase, individually, makes perfect sense in the context of this library; as a collection, however, the phrases don't comprise one coherent topic. four of the phrases explicitly mention chat services (an interesting meta-topic), while the rest appear completely unrelated. on the other hand, topic 10 does show more semantic relation between the phrases: "request book, request item, check book, double check, print copy, course reserve, copy available, physical copy, book available, copy past." it seems pretty clear that this topic refers to books, whether on reserve, being requested, or being checked for availability. with the wide difference in topic coherence, the phrase-lda algorithm is not perfect for topic modeling chat reference, but further exploration is warranted.

the final algorithm, nmf (table 4), is also imperfect. it is possible to distill each topic into an actual semantic concept, but there is almost always at least one word that makes it a little less clear. topic 5 probably provides the best coherence: "account log set vpn use password id say campus problem." it seems clear this topic refers to identity verification, likely for off-campus use of library resources. the other topics given by the algorithm have more confusing elements, such as topic 1, where the relatively meaningless words may, way, and say all appear. it is interesting that kohler found nmf to work very well, while the results above are not nearly as coherent as those identified in her implementation.12 this is a perfect example of how the tuning of many different parameters can affect the ultimate results of each topic modeling algorithm. it is also why the authors think it is worth continuing to explore how to improve the implementation of the lda, phrase-lda, and nmf algorithms for chat conversations, as well as share the original code for others to test and revise. it will take many different projects at many different libraries before an optimum topic model implementation is found for chat reference.

next steps
for the most part, the more coherent results from the lda and nmf topic modeling algorithms support anecdotal understanding of the primary themes in chat conversations. currently, two members of the research & information services unit, the department responsible for scheduling the chat reference service at the main library, are examining the model outputs to determine whether any of the results are strong enough at this stage to suggest changes to services or resources. they will also share the results with the chat coordinators at other libraries on campus in case the results indicate changes for them. additionally, results will be shared with the library's web working group, since repeated questions about the same services or locations may suggest the need to display them in a more prominent place on the library website or provide a more discoverable online path to them. since this was a pilot project that used a fairly small data set, it is anticipated that years of transcripts, along with improved topic model implementation, will reveal even more significant and robust themes.
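as one concrete starting point for the kind of re-implementation and tuning discussed here, a minimal sklearn sketch of fitting nmf on tf-idf vectors of cleaned transcripts is below (sklearn appears in note 11). the vectorizer settings, topic count, and sample texts are illustrative guesses, not the project's settings.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

cleaned_transcripts = [  # stand-ins for the stemmed, pii-scrubbed chat texts
    "look search find help databas articl research",
    "request book check reserv copi avail",
]

vectorizer = TfidfVectorizer(max_df=0.95, min_df=1)
tfidf = vectorizer.fit_transform(cleaned_transcripts)

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
doc_topics = nmf.fit_transform(tfidf)          # document-by-topic weights

terms = vectorizer.get_feature_names_out()
for k, row in enumerate(nmf.components_):      # topic-by-term weights
    top = row.argsort()[::-1][:10]
    print(f"topic {k + 1}:", " ".join(terms[i] for i in top))
```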
with the encouraging results of this pilot project, there is much to continue to explore.13 one future question is whether there are differences between fall and spring semesters. if some topics arise more frequently in one semester than the other, perhaps the library needs to offer more workshops during that semester. alternatively, perhaps support materials should be created (such as handouts or online guides) that emphasize the related services and place them more prominently, while withdrawing or de-emphasizing them in the other semester. another area for further analysis is how the topics that emerge in the late-night chat interactions compare to other times of day. this will help the library design more relevant training materials for the graduate assistants who staff those shifts, or potentially change who is staffing the shifts. also of interest is comparing the text written by the chat operators versus the chat users, as this would further spotlight the terminology that patrons use. if patrons are using significantly different terms from staff, then modifying the language of the library's website may reduce confusion. there are also improvements to make to the data cleaning process, such as better identifying when to remove stop words and when to remove punctuation. these steps weren't perfectly aligned, which is why, for example, the "ll" that appears in topic 3 of the phrase-lda results (table 2) is most likely a mutation of contractions like "i'll," "we'll," and "you'll." generating "ll" as a word from multiple different contractions not only created a meaningless word, but since "ll" occurred more frequently than any unique contraction, it was potentially treated as more important by the topic modeling algorithms.

conclusion
this project has demonstrated that topic modeling is one possible way to employ automated methods to analyze chat reference, with mixed success. the library will continue to improve chat reference analysis based on this project experience. the authors hope that other libraries will use the lessons from this project and the code in github as a starting point to employ similar analysis for their own chat reference. in fact, a related project at the university of northern iowa library is evidence of growing interest in topic modeling of chat reference transcripts.14 considering how frequently patrons use chat reference, it is important for libraries to explore and embrace whatever methods will allow them to assess and improve such services.

acknowledgements
the authors wish to acknowledge the research and publication committee of the university of illinois at urbana-champaign library, which provided support for the completion of this research. many thanks are owed to xinyu tian, our practicum student, for the extensive work he did in identifying relevant literature and developing the project code.

notes
1 jo kibbee, david ward, and wei ma, "virtual service, real data: results of a pilot study," reference services review 30, no. 1 (mar. 1, 2002): 25–36, https://doi.org/10.1108/00907320210416519.
2 the library uses the read scale (reference effort assessment data scale), which allows reference transactions to be translated into a numerical scale that takes into account the effort, skills, knowledge, teaching moment, techniques, and tools used by the staff in the transaction. see readscale.org for more information.
3 david ward and m. kathleen kern, "combining im and vendor-based chat: a report from the frontlines of an integrated service," portal: libraries and the academy 6, no. 4 (oct. 2006): 417–29, https://doi.org/10.1353/pla.2006.0058; joann jacoby et al., "the value of chat reference services: a pilot study," portal: libraries and the academy 16, no. 1 (jan. 2016): 109–29, https://doi.org/10.1353/pla.2016.0013; david ward, "using virtual reference transcripts for staff training," reference services review 31, no. 1 (2003): 46–56, https://doi.org/10.1108/00907320310460915.
4 robin brown, "lifting the veil: analyzing collaborative virtual reference transcripts to demonstrate value and make recommendations for practice," reference & user services quarterly 57, no. 1 (fall 2017): 42–47, https://doi.org/10.5860/rusq.57.1.6441; maryvon côté, svetlana kochkina, and tara mawhinney, "do you want to chat? reevaluating organization of virtual reference service at an academic library," reference & user services quarterly 56, no. 1 (fall 2016): 36–46, https://doi.org/10.5860/rusq.56n1.36; donna goda and corinne bisshop, "frequency and content of chat questions by time of semester at the university of central florida: implications for training, staffing and marketing," public services quarterly 4, no. 4 (dec. 2008): 291–316, https://doi.org/10.1080/15228950802285593; kelsey keyes and ellie dworak, "staffing chat reference with undergraduate student assistants at an academic library: a standards-based assessment," the journal of academic librarianship 43, no. 6 (2017): 469–78, https://doi.org/10.1016/j.acalib.2017.09.001; michael mungin, "stats don't tell the whole story: using qualitative data analysis of chat reference transcripts to assess and improve services," journal of library & information services in distance learning 11, no. 1–2 (jan. 2017): 25–36, https://doi.org/10.1080/1533290x.2016.1223965.
5 shu z. schiller, "chat for chat: mediated learning in online chat virtual reference service," computers in human behavior 65 (dec. 2016): 651–65, https://doi.org/10.1016/j.chb.2016.06.053.
6 ellie kohler, "what do your library chats say?: how to analyze webchat transcripts for sentiment and topic extraction," in brick & click libraries conference proceedings (brick & click, maryville, mo: northwest missouri state university, 2017), 138–48, https://www.nwmissouri.edu/library/brickandclick/presentations/eproceedings.pdf.
7 kohler, 141.
8 for example: guan-bin chen and hung-yu kao, "re-organized topic modeling for microblogging data," in proceedings of the ase bigdata & socialinformatics 2015, ase bd&si '15 (new york, ny: acm, 2015), 35:1–35:8, https://doi.org/10.1145/2818869.2818875.
9 x. cheng et al., "btm: topic modeling over short texts," ieee transactions on knowledge and data engineering 26, no. 12 (dec. 2014): 2,928–41, https://doi.org/10.1109/tkde.2014.2313872.
10 for example: chenliang li et al., "topic modeling for short texts with auxiliary word embeddings," in proceedings of the 39th international acm sigir conference on research and development in information retrieval (acm press, 2016), 165–74, https://doi.org/10.1145/2911451.2911499.
11 we used the python packages gensim, langid, nltk, numpy, pandas, re, sklearn, and stop_words for data cleaning and analysis.
12 kohler, "what do your library chats say?"
13 the library implemented new chat reference software after this project was completed, so analysis of chat conversations that took place after the spring 2018 semester will require a reworking of the data collection and cleaning processes.
14 hyunseung koh and mark fienup, "library chat analysis: a navigation tool" (poster, dec. 5, 2018), https://libraryassessment.org/wp-content/uploads/2018/11/58-kohfienuplibrarychatanalysis.pdf.

applying hierarchical task analysis method to discovery layer evaluation
merlen prommann and tao zhang
information technology and libraries | march 2015

abstract
while usability tests have been helpful in evaluating the success or failure of implementing discovery layers in the library context, the focus of usability tests has remained on the search interface rather than the discovery process for users. the informal site- and context-specific usability tests have offered little to test the rigor of the discovery layers against the user goals, motivations, and workflows they have been designed to support. this study proposes hierarchical task analysis (hta) as an important complementary evaluation method to usability testing of discovery layers. relevant literature is reviewed for the discovery layers and the hta method. as no previous application of hta to the evaluation of discovery layers was found, this paper presents the application of hta as an expert-based and workflow-centered (e.g., retrieving a relevant book or a journal article) method for evaluating discovery layers. purdue university's primo by ex libris was used to map eleven use cases as hta charts. nielsen's goal composition theory was used as an analytical framework to evaluate the goal charts from two perspectives: (a) users' physical interactions (i.e., clicks), and (b) users' cognitive steps (i.e., decision points for what to do next). a brief comparison of hta and usability test findings is offered as a way of conclusion.

introduction
discovery layers are relatively new third-party software components that offer a google-like, web-scale search interface for library users to find information held in the library catalog and beyond. libraries are increasingly utilizing these to offer a better user experience to their patrons. while popular in application, the discussion about discovery layer implementation and evaluation remains limited.[1][2]

a majority of reported case studies discussing discovery layer implementations are based on informal usability tests that involve a small sample of users in a specific context. the resulting data sets are often incomplete and the scenarios are hard to generalize.[3] discovery layers have a number of technical advantages over traditional federated search and cover a much wider range of library resources. however, they are not without limitations. questions have remained scarce about the workflow of discovery layers and how well they help users achieve their goals.
merlen prommann (mpromann@purdue.edu) is user experience researcher and designer, purdue university libraries. tao zhang (zhan1022@purdue.edu) is user experience specialist, purdue university libraries.

beth thomsett-scott and patricia e. reese offered an extensive overview of the literature discussing the disconnect between what library websites offer and what their users would like.[1] on the one hand, library directors deal with a great variety of faculty perceptions in terms of what the role of the library is and how faculty approach research differently. the ithaka s+r library survey of not-for-profit four-year academic institutions in the us suggests a real diversity of american academic libraries as they seek to develop services with sustained value.[4] for the common library website user, irrelevant search results and unfamiliar library taxonomy (e.g., call numbers, multiple locations, item formats, etc.) are the two most common gaps.[3] michael khoo and catherine hall demonstrated how users, primarily college students, have become so accustomed to the search functionalities on the internet that they are reluctant to use library websites for their research.[5] no doubt, the launch of google scholar in 2005 was another driver for librarians to move from traditional federated searching to something faster and more comprehensive.[1] while literature encouraging google-like search experiences is abundant, khoo and hall have warned designers not to take users' preferences towards google at face value. they studied users' mental models, defining a mental model as "a model that people have of themselves, others, the environment, and the things with which they interact, such as technologies," and concluded that users often do not understand the complexities of how search functions actually work or what is useful about them.[5]

a more systematic examination of the tasks that discovery layers are designed to support is needed. this paper introduces hierarchical task analysis (henceforth hta) as an expert method to evaluate discovery layers from a task-oriented perspective. it aims to complement usability testing. for more than 40 years, hta has been the primary methodology for studying systems' sub-goal hierarchies, for it presents the opportunity to provide insights into key workflow issues. with expertise in applying hta and being frequent users of the purdue university libraries website for personal academic needs, we mapped user tasks into several flow charts based on three task scenarios: (1) finding an article, (2) finding a book, and (3) finding an ebook. jakob nielsen's "goal composition" heuristics (generalization, integration, and user control mechanisms)[6] were used as an analytical framework to evaluate the user experience of the ex libris primo® discovery layer implemented at purdue university libraries.
the goal composition heuristics focus on multifunctionality and the idea of servicing many possible user goals at once. for instance, generalization allows users to use one feature on more objects. integration allows each feature to be used in combination with other facilities. control mechanisms allow users to inspect and amend how the computer carries out the instructions. we discussed the key issues with other library colleagues to meet nielsen's five-expert rule and avoid a loss in the quality of insights.[7] nielsen studied the value of participant volume in usability tests and concluded that after the fifth user, researchers are wasting their time by observing the same findings and not learning much new. a comparison to usability study findings, as presented by fagan et al, is offered as a way of conclusion.[3]

related work

discovery layers
the traditional federated search technology offers the overall benefit of searching many databases at once.[8][1] yet it has been known to frustrate users, as they often do not know which databases to include in their search. emily alling and rachel naismith aggregated common findings from a number of studies involving the traditional federated search technology.[9] besides slow response time, other key causes of frustrating inefficiency were limited information about search results, information overload due to the lack of filters, and the fact that results were not ranked in order of relevance (see also [2][1]).

new tools, termed "discovery," "discovery tools,"[2][10] "discovery layers," or "next generation catalogs,"[11] have become increasingly popular and have provided the hope of eliminating some of the issues with traditional federated search. generally, they are third-party interfaces that use pre-indexing to provide speedy discovery of relevant materials across millions of records of local library collections, from books and articles to databases and digital archives. furthermore, some systems (e.g., the ex libris primo central index) aggregate hundreds of millions of scholarly e-resources, including journal articles, e-books, reviews, legal documents, and more, harvested from primary and secondary publishers and aggregators and from open-access repositories. discovery layers are projected to help create the next generation of federated search engines that utilize a single search index of metadata to search the rising volume of resources available to libraries.[2][11][10][1] while not yet systematic, results from a number of usability studies on these discovery layers point to the benefits they offer.

the most noteworthy benefit of a discovery layer is its seemingly easy-to-use unified search interface.
jerry caswell and john d. wynstra studied the implementation of ex libris metalib centralized indexes, based on the federated search technology, at the university of northern iowa library.[8] they confirmed how the easily accessible unified interface helped users to search multiple relevant databases simultaneously and more efficiently. lyle ford concluded that the summon discovery layer by serials solutions fulfilled students' expectations to be able to search books and articles together.[12] susan johns-smith pointed out another key benefit to users: customizability.[10] the summon discovery layer allowed users to determine how much of the machine-readable cataloging (marc) record was displayed. the study also confirmed how the unified interface, aligning the look and feel among databases, increased the ease of use for end users. michael gorrell described how one of the key providers, ebsco, gathered input from users and considered design features of popular websites to implement new technologies in the ebscohost interface.[13] some of the features that ease the usability of ebscohost are a dynamic date slider, an article preview hover, and expandable features for various facets, such as subject and publication.[2]

another key benefit of discovery systems is the speed of results retrieval. the primo discovery layer by ex libris has been complimented for its ability to reduce the time it takes to conclude a search session while maximizing the volume of relevant results per search session.[14] it was suggested that in so doing the tool helps introduce users to new content types. yuji tosaka and cathy weng reported how records with richer metadata tend to be found more frequently and lead to more circulation.[15] similarly, luther and kelly reported an increase in overall downloads, while the use of individual databases decreased.[16] these studies point to a trend of enhanced distribution of discovery and knowledge.

with the additional metadata of item records, however, there is also an increased likelihood of inconsistencies across the databases that are brought together in a centralized index. a study by graham stone offered a comprehensive report on the implementation process of the summon discovery layer at the university of huddersfield, highlighting major inconsistencies in cataloging practices and the difficulties they caused in providing consistent journal holdings and titles.[17] this casts a shadow on the promise of better findability.

jeff wisniewski[18] and williams and foster[2] are among the many who espouse discovery layers as a step towards a truly single search function that is flexible while allowing needed customizability. these new tools, however, are not without their limitations. the majority of usability studies reinforce similar results and focus on the user interface. fagan et al, for example, studied the usability of ebsco discovery service at james madison university (jmu).
while most tasks were accomplished successfully, the study confirmed previous warnings that users do not understand the complexities of search and identified several interface issues: (1) users desire single search, but willingly use multiple options for search; (2) lack of visibility for the option to sort search results; and (3) difficulty in finding journal articles.[3]

yang and wagner offer one case where the aim was to evaluate discovery layers against a checklist of 12 features that would define a true 'next generation catalogue':
(1) single point of entry to all library information,
(2) state-of-the-art web interface (e.g., google and amazon),
(3) enriched content (e.g., book cover images, ratings and comments),
(4) faceted navigation for search results,
(5) simple keyword search on every page,
(6) more precise relevancy (with circulation statistics a contributing factor),
(7) automatic spell check,
(8) recommendations to related materials (common in commercial sites, e.g., amazon),
(9) allowing users to add data to records (e.g., reviews),
(10) rss feeds to allow users to follow top circulating books or topic-related updates in the library catalogue,
(11) links to social networking sites to allow users to share their resources,
(12) stable urls that can be easily copied, pasted and shared.[11]

they used this list to evaluate seven open source and ten proprietary discovery layers, revealing how only a few of them can be considered true 'next generation catalogs' supporting the users' needs that are common on the web. all of the tools included in their study lacked precision in retrieving relevant search results, e.g., based on transaction data. the authors were impressed with the open source discovery layers libraryfind and vufind, which had 10 of the 12 features, leaving vendors of proprietary discovery layers ranking lower (see figure 1).

figure 1. 17 discovery layers (x-axis) were evaluated against a checklist of 12 features expected of the next generation catalogue (y-axis). [bar chart scores: 9, 9, 9, 8, 7.5, 7, 7, 7, 6, 6, 6, 5, 5, 4, 2, 1 (yang and wagner 2010, 707)]

yang and wagner theorized that the relative lack of innovation among commercial discovery layers is due to practical reasons: vendors create their new discovery layers to run alongside older ones, rather than attempting to alter the proprietary code of the integrated library system's (ils) online public access catalog (opac).
they pointed to the need for "libraries, vendors and the open source community [...] to cooperate and work together in a spirit of optimism and collegiality to make the true next generation catalogs a reality."[11] at the same time, the university of michigan article discovery working group reported that vendors are becoming more cooperative and allowing coordination among products, increasing the potential of web-scale discovery services.[19] how to evaluate and optimize user workflow across these coordinating products remains a practical challenge. in this study, we propose hta as a prospectively helpful method to evaluate user workflow through these increasingly complex products.

hierarchical task analysis
with roots in taylorism*, industrial psychology, and system processes, task analyses continue to offer valuable insights into the balance of efficiency and effectiveness in human-computer interaction scenarios.[20][21] historically, frank and lillian gilbreth (1911) set forth the principle of hierarchical task analysis (hta) when they broke down and studied the individual steps involved in laying bricks. they reduced the bricklaying process from about 18 movements down to four (in [21]). but it was john annett and keith d. duncan (1967) who introduced hta as a method to better evaluate the personnel training needs of an organization. they used it to break apart behavioral aspects of complex tasks such as planning, diagnosis, and decision-making (see [22][21]).

hta helps break users' goals into subtasks and actions, usually in the visual form of a graphic chart. it offers a practical model for goal execution, allowing designers to map user goals to the system's varying task levels and evaluate their feasibility [23]. in so doing, hta offers the structure with which to learn about tasks and highlight any unnecessary steps and potential errors that might occur during task performance, whether cognitive or physical [24][25]. its strength lies in its dual approach to evaluation: on the one hand, user interface elements are mapped at an extremely low and detailed level (down to individual buttons), while on the other hand, each of these interface elements gets mapped to the user's high-level cognitive tasks (the cognitive load). this informs a rigorous design approach, where each detail accounts for the high-level user task it needs to support.

the main limitation of classical hta is its system-centric focus, which does not account for the wider context the tasks under examination exist in.
the field of human-computer interaction has shifted our understanding of cognition from an individual information-processing model to a networked and contextually defined set of interactions, where the task under analysis is no longer confined to a desktop but "extends into a complex network of information and computer-mediated interactions" [26]. the task-step-focused hta does not have the ability to account for the rich social and physical contexts that increasingly mediated and multifaceted activities are embedded in. hta has therefore been reiterated with additional theories and heuristics, so as to better account for the increasingly more complete understanding of human activity.

advanced task models and analysis methods have been developed based on the principle of hta. stuart k. card, thomas p. moran, and allen newell [27] proposed an engineering model of human performance, goms (goals, operators, methods, and selection), to map how task environment features determine what and when users know about the task [20]. goms has been expanded to cope with rising complexities (e.g., [28][29][30]), but the models have become largely impractical in the process [20]. instead of simplistically suggesting cognitive errors are due to interface design, cognitive task analysis (cta) attempts to address the underlying mental processes that most often give rise to errors [24]. given the lack of a structural understanding of cognitive processes, the analysis of cognitive tasks has remained problematic to implement [20][31]. activity theory models people as active decision makers [20]. it explains how users convert goals into a set of motives and how they seek to execute those motives as a set of interactions in a given situational condition. these situational conditions either help or prevent the user from achieving the intended goal. activity theory is beginning to offer a coherent foundation for accounting for the task context [20], but it has yet to offer a disciplined set of methods to execute this theory in the form of a task analysis.

even though task analyses have seen much improvement, adaptation, and usage in their near-40-year-long existence, their core benefit (aiding an understanding of the tasks users need to perform to achieve their desired goals) has remained the same. until activity theory, cta, and other contextual approaches are developed into more readily applicable analysis frameworks, classical hta, with additional layers of heuristics guiding the analysis, remains the practical option [21]. nielsen's goal composition [6] offers one such set of heuristics applicable to the web context.

* taylorism is the application of scientific method to the analysis of work, so as to make it more efficient and cost-effective.
it presents usability concepts such as reuse, multitasking, automated use, and recovering and retrieving, to name a few, so as to systematically evaluate hta charts representing the interplay between an interface and the user.

utility of hta for evaluating discovery layers
usability testing has become the norm in validating the effectiveness and ease of use of library websites. yet, thirteen years ago, brenda battleson, austin booth, and jane weintrop [32] emphasized the need to support user tasks as the crucial element of user-centered design. in comparison to usability testing, hta offers a more comprehensive model for the analysis of how well discovery layers support users' tasks in the contemporary library context. considering the strengths of the hta method and the current need for vendors to simplify the workflows in increasingly complex systems, it is surprising that hta has not yet been applied to the evaluation of discovery layers.

this paper introduces hierarchical task analysis (hta) as a solution to systematically evaluate the workflow of discovery layers as a technology that helps users accomplish specific tasks, herein, retrieving relevant items from the library catalog and other scholarly collections. nielsen's [6] goal composition heuristics, designed to evaluate usability in the web context, are used to guide the evaluation of the user workflow via the hta task maps. as a process-specific (vs. context-specific) approach, hta can help achieve a more systematic examination of the tasks discovery layers should support, such as finding an article, a book, or an ebook, and help vendors coordinate to achieve the full potential of web-scale discovery services.

method: applying hta to primo by ex libris
the object of this study was purdue university's library website, which was re-launched with ex libris' primo in january 2013 (figure 2) to serve the growing student and faculty community. its 3.6 million indexed records are visited over 1.1 million times every year. roughly 34% of these visits are to electronic books. according to sharon q. yang and kurt wagner [11], who studied 17 different discovery layers, primo ranked the best among the commercial discovery layer products, coming fourth after the open source tools libraryfind, vufind, and scriblio in the overall rankings. we will evaluate how efficiently and effectively the primo search interface supports the tasks of purdue libraries users.

figure 2. purdue library front page and search box.

based on our three-year experience of user studies and usability testing of the library website, we identified finding an article, a book, and an ebook as the three major representative scenarios of purdue library usage. to test how primo helps its users and how many cognitive steps it requires of them, each of the three scenarios was broken into three or four specific case studies.
the case studies were designed to account for the different availability categories present in the current primo system, e.g., 'full text available', 'partial availability', 'restricted access', or 'no access'. this is because the different availabilities present users with different possible frustrations and obstacles to task accomplishment. this system-design perspective could offer a comparable baseline for discovery layer evaluation across libraries. a full list of the eleven case studies can be seen below:

find an article:
case 1. the library has only a full electronic text.
case 2. the library has the correct issue of the journal in print, which contains the article, as well as a full electronic text.
case 3. the library has the correct issue of the journal, which contains the article, only in print.
case 4. the library does not have the full text, either in print or electronically. a possible option is to use an interlibrary loan (henceforth ill) request.

find a book (print copy):
case 5. the library has the book and the book is on the shelf.
case 6. the library has the book, but the book is in a restricted place, such as the hicks repository. the user has to request the book.
case 7. the library has the book, but it is either on the shelf or in a repository. the user would like to request the book.
case 8. the library does not have the book. possible options are uborrow† or ill.

find an ebook:
case 9. the library has the full text of the ebook.
case 10. the ebook is shown in search results but the library does not have full text.
case 11. the book is not shown in search results. a possible option is to use uborrow or ill.

it is generally accepted that hta is not a complex analysis method, but since it offers general guiding principles rather than a rigorous step-by-step guide, it can be tricky to implement [24][20][21][23]. both authors of this study have expertise in applying hta and are frequent users of the purdue library's website. we are familiar with the library's commonly reported system errors; however, all of our case studies result from a randomized topic search, not from specific reported items. to achieve consistent hta charts, one author carried out the identified use cases on a part-time basis over a two-month period. each case was executed on the purdue library website, using the primo discovery layer. an on-campus hewlett-packard (hp) desktop computer with internet explorer and a personal macbook laptop with safari and google chrome were used to identify any possible inconsistencies between user experiences on different operating systems.

† uborrow is a federated catalog and direct consortial borrowing service provided by the committee on institutional cooperation (cic).
uborrow allows users to search for, and request, available books from all cic libraries, which includes all universities in the big ten as well as the university of chicago and the center for research libraries.

as per stanton's [21] statement that "hta is a living documentation of the sub-goal hierarchy that only exists in the latest state of revision," mapping the hta charts was an iterative process between the two authors. according to david embrey [24], "the analyst needs to develop a measure of skill [in the task] in order to analyze a task effectively" (2). this measure of skill was developed in the process of finding real examples (via a randomized topic search) from the purdue library catalog to match the structural cases listed above. for instance, 'case 1. the library has only the electronic full text' was turned into a case goal: '0 find the conference proceeding on network-assisted underwater acoustic communication'. a full list of referenced case studies is below:

find an article:
case 1. find the article "network-assisted underwater acoustic communication" (yang and kevin, 2012).
case 2. find the article "comparison of simple potential functions for simulating liquid water" (jorgensen et al., 1983).
case 3. find the journal design annual "graphis inc" (2008).
case 4. find the article "a technique for murine irradiation in a controlled gas environment" (walb, m. c. et al., 2012).

find a book (in print):
case 5. find the book show me the numbers: designing tables and graphs to enlighten (few, 2004).
case 6. find the book the love of cats and place a request for it (metcalf, 1973).
case 7. find the book the prince and place a request for it (machiavelli).
case 8. find the book the design history reader by maffei and houze (2010) (uborrow or ill).

find an ebook:
case 9. find the ebook handbook of usability testing: how to plan, design and conduct effective tests (rubin and chisnell, 2008).
case 10. find the ebook the science of awakening consciousness: our ancient wisdom (partly available via hathi trust).
case 11. find the ebook ancient awakening by matthew bryan laube (uborrow).

hta descriptions are generally diagrammatic or tabular. since diagrams are easier to assimilate and promise the identification of a larger number of sub-goals [23], the diagrammatic description method was preferred (figure 2). each analysis started with the establishment of sub-goals, such as 'browse the library website' and 'retrieve the article', and followed with the identification of the individual small steps that make the sub-goal possible, e.g., 'press search' and 'click on 2, to go to page 2' (figures 3-5). then, additional iterations were made to include: (1) cognitive steps, where
information on the screen needs to be evaluated before the next step can be taken (e.g., identifying the correct url to open from the initial results set), and (2) cognitive decision points between multiple options for users to choose from. for instance, items can be requested either via interlibrary loan (ill) or uborrow, presenting the user with an a or b option, requiring cognitive effort to make a choice. such parallel paths were color-coded in yellow (figure 2). both physical and cognitive steps were recorded into xmind‡, a free mind-mapping software. they were color-coded black and gray, respectively, helping visualize the volume of cognitive decision points and steps (i.e., the cognitive load).

figure 3. full hta chart for the 'find a book' scenario (case 5). created in xmind.
figure 4. zoom in to steps 1 and 2 of the hta map for the 'find a book' scenario (case 5). created in xmind.

‡ xmind is a free mind-mapping software that allows structured presentation of steps, multiple coding references, and the addition of images, links, and extensive notes. http://www.xmind.net/

figure 5. zoom in to step 3 of the hta map for the 'find a book' scenario (case 5). created in xmind.
figure 6. zoom in to step 4 of the hta map of the 'find a book' scenario (case 5). created in xmind.

to organize the decision flow chart, the original hierarchical numbering scheme for hta, which requires every sub-goal to be uniquely numbered with an integer in numerical sequence [21], was strictly followed. visual (screen captures) and verbal notes on efficient and inefficient design factors were taken during the hta mapping process and linked directly to the tasks they applied to. steps where the interface design guided the user to the next step were marked 'fluent' with a green tick (figures 3 and 4). steps that were likely to mislead users from the optimal path to item retrieval and were a burden to the user's workflow were marked with a red 'x' (see figures 4 and 5). one major advantage of the diagram format is its visual and structural representation of sub-goals and their steps in a spatial manner (see figures 2-5). this is useful for gaining a quick overview of the workflow [21].

when exactly to stop the analysis has remained undefined for hta [21]. it is at the discretion of the analyst to evaluate if there is a need to re-describe every sub-goal down to the most basic level, or whether the failure to perform that sub-goal is, in fact, consequential to the study results. we decided to stop the evaluation at the point where the user located (a shelf number or reserve pick-up number) or received the sought item via download. furthermore, steps that were perceived as possible when impossible in actuality were transcribed into the diagrams.
article  scenario  case  1   offers  an  example:  once  the  desired  search  result  was  identified,  its  green  dot  for  ‘full  text  available’     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       90   was  likely  to  be  perceived  as  clickable,  when  in  actuality  it  was  not.  the  user  is  required  to  click  on   the  title  or  open  the  tab  ‘find  online’  to  access  the  external  digital  library  and  download  the   desired  article  (see  figure  7).       figure  7.  article  scenario  (case1)  two  search  results,  where  green  'full  text  available'  may  be   perceived  as  clickable.   task  analysis  focuses  on  the  properties  of  the  task  rather  than  the  user.  this  requires  expert   evaluation  in  place  of  involving  users  in  the  study.  as  stated  above,  both  of  the  authors  are   working  experts  in  the  field  of  user  experience  in  the  library  context,  thoroughly  aware  of  the   tasks  under  analysis  and  how  they  are  executed  on  a  daily  basis.  a  group  of  12  (librarians,   reference  service  staff,  system  administrators  and  developers)  were  asked  to  review  the  hta   charts  on  a  monthly  basis.  feedback  and  implications  of  identified  issues  were  discussed  as  a   group.  according  to  nielsen  [7]  it  takes  five  experts  (double  specialist  in  nielsen’s  terms,  is  an   expert  in  usability  as  well  as  in  the  particular  technology  employed  by  the  software.)  to  not  have   significant  loss  of  findings  (see  figure  7).  based  on  this  enumeration,  the  final  versions  of  the  hta   charts  offer  accurate  representations  of  the  primo  workflow  in  the  three  use  scenarios  of  finding   an  article,  finding  a  book  and  finding  an  ebook  at  purdue  university  libraries.       information  technology  and  libraries  |  march  2015   91     figure  8.  average  proportion  of  usability  problems  found  as  a  function  of  number  of  evaluators  in   a  group  performing  heuristic  evaluation  [7].   results     the  reason  for  mapping  primo’s  workflows  in  hta  charts  was  to  identify  key  workflow  and   usability  issues  of  a  widely  used  discovery  layer  in  scenarios  and  contexts  it  was  designed  to   serve.  the  resulting  hta  diagrams  offered  insights  into  fluent  steps  (green  ticks),  as  well  as   workflow  issues  (red  ‘x’)  present  in  primo,  as  applied  at  purdue  university  libraries.  it  is  due  to   space  limitations,  that  only  the  main  findings  of  the  hta  will  be  discussed.  the  full  results  are   published  on  purdue  university  research  repository§.  table  1  presents  how  many  parallel  routes   (a  vs.  b  route),  physical  steps  (clicks),  cognitive  evaluation  steps,  likely  errors  and  well  guided   steps  each  of  the  use  cases  had.     on  average  it  took  between  20  to  30  steps  to  find  a  relevant  item  within  primo.  even  though  no   ideal  step  count  has  been  identified  for  the  library  context,  this  is  quite  high  in  the  general  context   of  the  web,  where  fast  task  accomplishment  is  generally  expected.  paul  chojecki  [33]  tested  how   too  many  options  impact  usability  on  a  website.  he  revealed  that  the  average  step  count  to  lead  to   higher  satisfaction  levels  is  6  (vs.  18,16  average  steps  at  purdue  libraries).  
in our study, the majority of the steps were physical pressing of a button or filter selection; however, cognitive steps took up just under a half of the steps in nearly all cases. the majority of cases flow well, as the strengths (fluent well guided steps) of primo outweigh its less guided steps that easily lend themselves to the chance of error.

§ task analysis cases and results for ex libris primo. https://purr.purdue.edu/publications/1738

content type                                  articles                  books                   ebooks
case number                                   1    2    3    4   avg    5    6    7    8   avg    9   10   11   avg
no. of decision points (between a & b)
  to retrieve an item                         5    8    4    4    5     4    5    5    2    4     6    3    2    4
minimum steps possible to retrieve an item
  (clicks + cognitive decisions)             18   27   16   30   23    18   25   28   24   24    22   19   19   20
of these minimum steps, how many were
  cognitive (information evaluation was
  needed to proceed)                          4    8    9   13    9     6    9    7    7    7     4    6    4    5
maximum steps it can take to retrieve an
  item (clicks + cognitive decisions)        26   35   23   36   30    22   31   33   28   29    32   23   22   26
of these maximum steps, how many were
  cognitive                                  10   17   14   15   14    10   13   16    8   12     9    8    5    7
errors (steps that mislead from optimal
  item retrieval)                             3   15    4    8    8     2    2    4    3    3    13    1    2    5
fluent well guided steps to item retrieval   11   11    9    8   10     7    8    7    5    7     6    4    3    5

table 1. table listing each case's key task measures, and each scenario's averages.

between the three item search scenarios – articles, books and ebooks – the retrieval of articles was least guided and required the highest number of decisions from the user (5, vs. 4 for books and 4 for ebooks on average). retrieving an article (between 23-30 steps on average) or a book (24-29 steps on average) took more steps to accomplish than finding a relevant ebook (20-26 steps on average). the high volume of steps (max 30 steps on average) it required to retrieve an article, as well as its high error rate (8), were due to the higher amount of cognitive steps (12 steps on average) required to identify the correct article and to locate a hard copy (instead of the relatively easily retrievable online copy). in the book scenario, the challenge was also two-fold: on the one hand, it was challenging to verify the right book when there were many similar results (this explains the high number of 12 cognitive steps on average); on the other hand, the flow to place a request for a book was also a challenge. the latter was a key contributor to the higher amount of physical steps required for retrieving a book (max 29 on average).
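the per-case counts in table 1 can be read directly off the hta charts. as a rough illustration (not the authors' xmind tooling – the labels and the simplification of one choice point per sub-goal are assumptions of this sketch), the hierarchy and its annotations can be modelled as a small tree in python and the same measures tallied in one traversal; the scenario averages in the table are then simple per-scenario means of these per-case counts:

# a minimal sketch (not the authors' xmind charts): each sub-goal lists its own
# physical/cognitive steps and, optionally, one choice point between alternative
# sub-goals (the yellow a/b routes). all labels here are illustrative.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Step:
    label: str
    kind: str = "physical"      # "physical" (click, type) or "cognitive" (evaluate the screen)
    mark: Optional[str] = None  # "fluent" (green tick), "error" (red x), or None

@dataclass
class SubGoal:
    label: str
    steps: List[Step] = field(default_factory=list)
    alternatives: List["SubGoal"] = field(default_factory=list)

def measures(goal: SubGoal) -> dict:
    """tally the per-case counts reported in table 1 for one hta chart."""
    m = {"decision_points": 1 if goal.alternatives else 0,
         "min_steps": len(goal.steps),
         "max_steps": len(goal.steps),
         "cognitive_on_min_path": sum(s.kind == "cognitive" for s in goal.steps),
         "errors": sum(s.mark == "error" for s in goal.steps),
         "fluent": sum(s.mark == "fluent" for s in goal.steps)}
    if goal.alternatives:
        child = [measures(a) for a in goal.alternatives]
        shortest = min(child, key=lambda c: c["min_steps"])
        m["min_steps"] += shortest["min_steps"]
        m["cognitive_on_min_path"] += shortest["cognitive_on_min_path"]
        m["max_steps"] += max(c["max_steps"] for c in child)
        for c in child:
            m["decision_points"] += c["decision_points"]
            m["errors"] += c["errors"]
            m["fluent"] += c["fluent"]
    return m

# toy fragment of the 'find a book' chart (case 5): a request/uborrow choice point
request = SubGoal("4.1 sign in and place a request",
                  steps=[Step("click 'sign in to request'", "physical", "fluent")])
uborrow = SubGoal("4.2 order via uborrow",
                  steps=[Step("compare ill and uborrow options", "cognitive", "error"),
                         Step("open uborrow", "physical")])
retrieve = SubGoal("4 retrieve, locate or order the item",
                   steps=[Step("check availability dot and tag", "cognitive")],
                   alternatives=[request, uborrow])
print(measures(retrieve))

modelling the charts this way also makes it straightforward to recompute the counts whenever a chart is revised, which suits stanton's view of hta as a living document in its latest state of revision.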
information  technology  and  libraries  |  march  2015   93   common  to  all  eleven  cases,  whether  articles  or  books,  was  the  four  sub-­‐goal-­‐process:  1)  browse   the  library  website,  2)  find  results,  3)  open  the  page  of  the  desired  item,  and  4)  retrieve,  locate   or  order  the  item.  the  first  two  offered  near  identical  experiences,  no  matter  the  search  scenario   or  case.  third  and  fourth  sub-­‐goals,  however,  presented  different  workflow  issues  depending  on   the  product  searched  and  its  availability,  e.g.  ‘in  print’  or  ‘online’.  as  such,  general  results  will  be   presented  for  the  first  two  themes,  while  scenario  specific  overviews  will  be  provided  for  the   latter  two  themes.   browsing  the  library  website   browsing  the  library  website  was  easy  and  supported  different  user  tasks.  the  simple  url   (lib.purdue.edu)  was  memorable  and  appeared  first  in  the  search  results.  the  immediate   availability  of  sub-­‐menus,  such  as  databases  and  catalogs,  offered  speedy  searching  for  the   frequent  users.  the  choice  between:  a)  general  url,  or  b)  sub-­‐menu,  was  the  first  key  decision   point  users  of  primo  at  purdue  libraries  were  presented  with.     the  purdue  libraries’  home  page  (revisit  figure  1)  had  a  simple  design  with  a  clear,  central  and   visible  search  box.  just  above  it  were  search  filters  for  articles,  books  and  the  web.  this  was  the   second  key  decision  point  users  were  presented  with:  a)  they  could  either  type  into  the  search  bar   without  selecting  any  filters,  or  b)  they  could  select  a  filter  to  aid  the  focus  of  their  results  to  a   specific  item  type.  browsing  the  library  website  offers  an  efficient  and  fluent  workflow,  with   ebooks  being  the  only  exception.  it  was  hard  to  know  whether  they  were  grouped  under  articles   or  books  &  media  filters.  confusingly  (at  the  time  of  the  study)  purdue  libraries  listed  ebooks  that   had  no  physical  copies  under  articles,  while  other  ebooks  that  purdue  had  physical  version  of  (in   addition  to  the  digital  ones)  under  books  &  media.  this  was  not  explained  in  the  interface,  nor  was   there  a  readily  available  tooltip.   finding  relevant  results     figure  9.  search  results  for  article  (case2)  ‘comparison  of  simple  potential  functions  for   simulating  liquid  water’     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       94   primo  presented  the  search  results  in  an  algorithmic  order  of  relevance  offering  additional  pages   for  every  20  items  appearing  in  the  search  results.  the  search  bar  was  then  minimized  at  the  top   of  the  page,  available  for  easy  editing.  the  page  was  divided  into  two  key  sections,  where  the  first   quarter  entailed  filters  (e.g.  year  of  publishing,  resource  type,  author,  journal,  etc.),  and  the  other   three  quarters  was  left  for  search  results  (see  figure  8).  the  majority  of  cognitive  decisions  across   scenarios  were  made  on  this  results  page.  this  was  due  to  the  need  to  pick  up  the  cues  to  identify   and  verify  the  accurate  item  being  searched.  
the  value  of  these  cognitive  steps  lies  in  their  leading   of  the  user  to  the  next  physical  steps.  as  discussed  in  the  next  section,  opening  the  page  of  the   desired  item,  there  were  several  elements  that  succeeded  and  failed  at  guiding  the  user  to  their   accurate  search  result.     search  results  were  considered  relevant  when  the  search  presented  results  in  the  general  topic   area  of  the  searched  item.  most  cases  in  most  scenarios  led  to  relevant  results,  however,  book   case  8  and  ebook  case  11,  provided  only  unrelated  results.  generally,  books  and  ebooks  were   easy  to  identify  as  available.  this  was  due  to  their  typically  short  titles,  which  took  less  effort  to   read.  journal  articles,  on  the  other  hand,  have  longer  titles  and  required  more  cognitive  effort  to   be  verified.     article  case  4,  book  case  6  and  ebook  case  10  had  relevant  but  restricted  results.  the  color-­‐ coding  system  that  indicated  the  level  of  availability  for  the  presented  search  results:  green  (fully   available),  orange  (partly  available)  or  gray  (not  available)  dots  –  was  followed  by  an  explanatory   availability  tag,  e.g.  'available  online'  or  'full  text  available'  etc.  tabs  represented  additional  cues,   offering  additional  information,  e.g.  ‘find  in  print’.  these  appeared  in  a  supplementary  way  where   applicable.  for  example,  if  an  item  was  not  available,  its  dot  was  gray  and  it  neither  had  the  'find   in  print'  nor  'find  online'  tab.  instead,  it  had  a  'request'  tab,  guiding  the  user  towards  an  available   alternative  action.  restricted  availability  items,  such  as  a  book  in  a  closed  repository,  had  an   orange  indicator  for  partial  availability.  for  these,  primo  still  offered  the  'find  in  print'  or  'find   online'  tab,  whichever  was  appropriate.  while  the  overall  presentation  of  item  availability  was   clear  and  color-­‐coding  consistent,  the  mechanisms  were  not  without  their  errors,  as  discussed   below.   opening  the  page  of  the  desired  item   this  sub-­‐goal  comprised  of  two  main  steps:  1)  information  driven  cognitive  steps,  which  help  the   user  identify  the  correct  item,  and  2)  user  interface  guided  physical  steps  that  resulted  in  opening   the  page  of  the  desired  item.     frequent  strengths  that  helped  the  identification  of  relevant  items  across  the  scenarios  were  the   clearly  identifiable  labels  underneath  the  image  icons  (e.g.  'book’,  'article',  ‘conference  proceeding'),   hierarchically  structured  information  about  the  items  (title,  key  details,  availability)  and   perceivably  clickable  links  (blue  with  an  underlined  hover  effect).  the  labels  and  hierarchically   presented  details  (e.g.  year,  journal,  issue,  volume,  etc.)  helped  the  workflow  to  remain  smooth,     information  technology  and  libraries  |  march  2015   95   minimizing  the  need  to  use  side  filters.  the  immediate  details  reduced  the  need  to  open   additional  pages,  cutting  down  the  steps  needed  to  accomplish  the  task.  the  hover  effect  of  item   titles  made  the  link  look  and  feel  clickable,  guiding  the  user  closer  to  retrieving  the  item.  
color-coding all clickable links in the same blue was also an effective design feature, even though bolded availability labels were equally prominent and clickable. this was especially true for articles, where the 'full text available' tags corresponded to users' goal to immediately download the sought item (figure 9).

the most frequent causes of errors were duplicated search results. generally, primo groups multiple versions of the same item into one search result and offers a link: 'see all results'. in line with graham stone's [17] study, which highlighted the problem of cataloging inconsistencies, primo struggled to consistently group all overlapping search result items. both book and article scenarios suffered from at least one duplicate search result case due to inconsistent details. article scenario case 2 offers an example, where jorgensen et al's "comparison of simple potential functions for simulating liquid water" (1983) had two separate results for the same journal article of the same year (first two results in figure 9). problematically, the two results offered different details for the journal issue and page numbers. this is likely to cause referencing problems for primo users.

duplicated search results were also an issue for book scenarios. the most frequent causes for this were instances where authors' first and last names were presented in reverse order (see also figure 9 for article case 2), the books had different print editions, or the editor's name was used in place of the author's. book scenario case 7, machiavelli's "the prince", resulted in extremely varied results, requiring 16 cognitive steps and 33 physical steps before a desired item could be verified. this is where search filters were most handy. problematically, in case 7, machiavelli – the author – did not even appear in the author filter list, while ebrary inc was listed. again, this points to inconsistent metadata and the effects it can have on usability, as discussed by stone [17].

other workflow issues were presented by design details such as the additional information boxes underneath the item information, e.g. 'find in print', 'details' and 'find online'. they opened a small scrollable box that maintained the overall page view but was difficult to scroll: the arrow kept slipping outside of the box, scrolling the entire site's page instead of the content inside the box. in addition, the information boxes did not work well with chrome. this was especially problematic on the macbook, where after a couple of searches the boxes failed to list the details and left the user with an unaccomplished task. by comparison, safari on a mac and internet explorer on a pc never had such issues.

retrieving the items (call number or downloading the pdf)

the last sub-goal was to retrieve the item of interest.
this  often  comprised  of  multiple  decision   points:  whether  to  retrieve  the  pdf  version  from  online  or  identify  a  call  number  for  the  physical     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       96   copy  or  whether  to  place  a  request,  ordering  it  via  inter  library  loan  (ill)  or  uborrow.  each   option  is  briefly  discussed  below.     ebooks  and  articles,  if  available  online,  offered  efficient  online  availability.  if  an  article  was   identified  for  retrieval,  there  were  two  options  to  access  the  link  to  the  database,  e.g.  ‘view  this   record  in  acm’:  a)  via  the  full  view  of  the  item,  or  b)  small  ‘find  online’  preview  box  discussed   above.  where  more  than  one  database  was  available,  information  about  the  publication  range  the   library  holds  helped  identify  the  right  link  to  download  the  pdf  on  the  link-­‐resolver  page.  one  of   the  key  benefits  of  having  links  from  within  primo  to  the  full  texts  was  the  fact  that  they  opened  in   new  browser  windows  or  tabs,  without  interference  to  other  ongoing  search.  while  a  few  of  the   pdf  links  to  downloadable  texts  were  difficult  to  find  through  some  external  database  sites,  once   found,  they  all  opened  in  adobe  reader  with  easy  options  to  either  'save'  or  ‘print’  the  material.     ebooks  were  available  via  ebrary  or  ebl  libraries.  while  the  latter  offers  some  novel  uses,  such  as   audio  (i.e.  read  aloud),  neither  of  the  two  platforms  was  easy  to  use.  while  reading  online  was   possible,  downloading  an  ebook  was  challenging.  the  platform  seemed  to  offer  good  options:  a)   download  by  chapter,  b)  download  by  page  numbers,  or  c)  download  the  full  book  for  14  days.  in   actuality,  however,  these  were  all  unavailable.  ebook  case  9  had  chapters  longer  than  the  60-­‐page   limit  per  day.  page  numbers  proved  difficult  to  use,  as  the  book’s  numbers  did  not  match  the  pdf’s   page  numbers.  this  made  it  hard  to  keep  track  of  what  was  downloaded  and  where  one  left  off  to   continue  later  (due  to  imposed  time-­‐limits).  the  14-­‐day  full  access  option  was  only  available  in   adobe  digital  editions  software  (an  ebook  reader  software  by  adobe  systems  built  with  adobe   flash),  which  was  neither  available  on  most  campus  computers  nor  on  personal  laptops.     the  least  demanding  and  most  fluent  of  all  retrieval  options  was  the  process  of  identifying  the   location  and  call  number  for  physical  copies.  inconsistent  metadata,  however,  posed  some   challenges.    book  case  5  offered  a  merged  search  result  of  two  books,  but  listed  them  with   different  call  numbers  in  the  ‘find  in  print’  tab.  libraries  have  many  physical  copies  of  the  same   book,  but  identifying  consistency  in  call  number  is  a  cognitive  step  that  helps  verify  the   similarities  or  differences  between  the  two  results.  the  different  call  numbers  raised  doubts  about   which  item  to  choose,  slowing  the  workflow  for  the  task  and  increasing  the  number  of  cognitive   steps  required  to  accomplish  the  task.     compared  to  books,  finding  an  article  in  print  format  was  hardly  straightforward.  
the main cause for error when looking up hard copies of journals was the fact that individual journal issues did not have individual call numbers at purdue libraries; instead, there was one call number per periodical, covering the entire journal series. article case 2, for example, offered the journal code 530.5 j821 in the 'find in print' tab. in general, the tab suffered from too much information, poor layout and an unhelpful information hierarchy, all of which slowed down the cognitive tasks of verifying whether an item was relevant or not. it listed 'location' and 'holdings range' as the first pieces of information, wherein 'holdings range' included not just hard-copy-related information but listed digital items as well, even though this tab was for the physical version of the item. to illustrate, article case 2 claimed to have holdings for 1900–2013, whereas hard copies were only available for 1900–2000 and digital copies for 2001–2013.

each scenario had one or two cases where there were neither physical nor digital options available. the sub-goal commonly comprised a decision between three options: a) placing a request, b) ordering an item via interlibrary loan (ill), or c) ordering an item via uborrow. while the 'signing in to request' option and ill were easy to use with few required steps, there was a lack of guidance on how to choose between the three options. frequently, ill and uborrow appeared as equal options adjacent to one another, leaving the next step unguided. of all three, placing a request via uborrow was the hardest to accomplish. it often failed to present any relevant results on the first results page of the uborrow system, requiring the use of advanced search and filters. for instance, book case 6 was 'not requestable' via uborrow. when it did list the sought-for item in the search results, it looped back to purdue's own closed repository (which remained unavailable).

discussion

the goal of this study was to utilize hta to examine the workflow of the primo discovery layer at purdue university libraries. nielsen's [6] goal composition heuristics were used to extend the task-based analysis and understand the tasks in the context of discovery layers in libraries. three key usability domains – generalization, integration and user control mechanisms – were used as an analytical framework to draw usability conclusions about how primo was supporting, if at all, successful completion of the three scenarios. the next three sub-sections evaluate and offer design solutions on the three usability domains mentioned above. overall, this study confirmed primo's ability to reduce the workload for users to find their materials. primo is flexible and intuitive, permitting efficient search and successful retrieval of library materials, while offering the possibility of many search sessions at once [14]. a comparison to usability test results is offered by way of conclusion.
generalization mechanisms

primo can be considered a flexible discovery layer, as it helps users achieve many goals with a minimal number of steps. it makes use of several generalization mechanisms that allow users to apply their tasks towards many goals at once. for instance, the library website result in google offers not only the main url but also seven sub-links to specialist library site locations, such as opening hours and databases. this makes primo accessible and relevant for a broader array of people who are likely to have different goals. for instance, some may seek to enter a specific database, instead of having to open primo's landing page and entering the search terms. another may wish to utilize 'find', which guides the user, one step at a time, via a process of definition elimination, closer to the item they are looking for; yet another may simply want to know the opening times.

similarly, the primo search function saves already typed information, both on its landing page and its results page. this facilitates search by requiring query entry only once, while allowing end users to click on different filters to narrow the results in different ways. as part of the work done towards one search can be reused for another, e.g. by content, journal, or topic type, the system can ease the work effort required of users. this is further supported by the system saving already typed keywords when returning to the main search page from the search results, which allows for a fluid search experience where the user adjusts a keyword thread with minimal typing until they find what they are looking for.

a key problem for primo is its inability to manage inconsistent meta-data. the tendency to group different versions of the same search result together is helpful, as it reduces information noise. in an effort to enhance the speed at which the relevancy of search results can be evaluated, the system seeks to highlight any differences in the meta-data. if inconsistencies in meta-data cause the same search result to appear as separate items, this is likely to affect the cognitive steps and therefore the workload and efficiency with which the user is able to accomplish identification.

it is clear from previous studies that if discovery layers were to become the next generation catalogs [11], and were to enhance the speed of knowledge distribution as has been hoped by tosaka and weng [15] and luther and kelly [16], then mutual agreement is needed on how meta-data from disparate sources is to be reconciled [17]. understanding that users' cognitive workload should be minimized (by offering fewer options and more directive guidance) for more efficient decision-making, library items should have accurate details in their meta-data, e.g. consistent and thorough volume, issue and page numbers for journal articles, correct print and reprint years for books, and item type (conference proceeding vs. journal article).
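to make the meta-data point concrete, the following python fragment is a purely illustrative sketch (it is not primo's grouping logic): records are grouped on a normalised author/title/year key; without such normalisation, the two catalogue entries for the case 2 article – one with the author in 'last, first' order and one in 'first last' order – would surface as separate 'duplicate' results:

# illustrative sketch only: group records by a normalised author/title/year key,
# showing how reversed author names split one work into several results unless
# the key is normalised. this is not primo's actual grouping implementation.
import re

def norm_key(record: dict) -> tuple:
    # keep only the family name, regardless of 'last, first' or 'first last' order
    author = record.get("author", "")
    parts = re.split(r"[,\s]+", author.strip().lower())
    family = parts[0] if "," in author else (parts[-1] if parts else "")
    title = re.sub(r"\W+", " ", record.get("title", "").lower()).strip()
    return (family, title, record.get("year"))

records = [
    {"author": "jorgensen, william l.", "title": "comparison of simple potential functions for simulating liquid water", "year": 1983},
    {"author": "william l. jorgensen",  "title": "comparison of simple potential functions for simulating liquid water", "year": 1983},
]
groups = {}
for r in records:
    groups.setdefault(norm_key(r), []).append(r)
print(len(groups))  # 1: with normalisation the two catalogue entries collapse into one result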
integration  mechanisms   the  discovery  layer’s  ability  to  increase  the  number  of  search  sessions  [14]  at  any  one  time  is   possible  due  to  its  flexibility  to  support  multitasking.  primo  achieves  this  with  its  own  individual   features  used  in  combination  with  other  system  facilities  and  external  sources.  for  instance,   primo’s  design  allows  users  to  review  and  compare  several  search  results  at  once  via  the  ‘find  in   print’  or  ‘details’  tabs.  although  not  perfect,  since  the  small  boxes  are  hard  to  scroll  within,  the   information  can  save  the  user  the  need  and  additional  steps  of  opening  many  new  windows  and   having  to  click  between  them  just  for  reviewing  search  results.  instead,  many  ‘detail’  boxes  of   similar  results  may  be  opened  and  viewed  at  once,  allowing  for  effective  visual  comparison.  this   integration  mechanism  allows  a  fluent  transition  from  skimming  the  search  results  to  another   temporary  action  of  gaining  insight  about  the  relevance  of  an  item.  most  importantly,  this  is   accomplished  without  requiring  the  user  to  open  a  new  browser  page  or  tab,  where  they  would   have  to  break  from  their  overall  search  flow  and  remember  the  details  (instead  of  visually   comparing  them),  making  it  hard  to  resume  from  where  they  left  off.     a  contrary  integration  mechanic  that  primo  makes  use  of  is  its  smooth  automated  connectivity  to   external  sites,  such  as  databases,  ebrary,  ill,  etc.  new  browser  pages  are  used  to  allow  the   continuation  of  a  task  outside  of  primo  itself  without  forcing  the  user  out  of  the  system  to  the     information  technology  and  libraries  |  march  2015   99   library  service  or  full  text.  primo  users  can  skim  search  results,  identify  relevant  resources  and   open  them  in  new  browser  pages  for  later  reviewing.     what  is  missing,  however,  is  the  opportunity  to  easily  save  and  resume  a  search.  retrieving  the   search  result  or  saving  it  under  ones’  login  details  would  benefit  users  who  recall  items  of  interest   from  previous  searches  and  would  like  to  repeat  the  results  without  having  to  remember  the   keywords  or  search  process  they  used.  it  is  not  obvious  how  to  locate  the  save  search  session   option  in  primo’s  interface.   user  control  mechanisms   yang  and  wagner  [11]  ranked  primo  highest  among  the  vendors,  primarily  for  its  good  user   control  mechanisms,  which  allow  users  to  inspect  and  change  the  search  functions  on  an  ongoing   basis.  primo  does  a  good  job  at  presenting  search  results  in  a  quick  and  organized  manner.  it   allows  for  the  needed  ‘undo’  functionality  and  continued  attachment  and  removal  of  filters,  while   saving  the  last  keywords  when  clicking  the  back  button  from  search  results.  the  continuously   available  small  search  box  also  offers  the  flexibility  for  the  user  to  change  search  parameters   easily.  in  summary,  primo  offers  agile  searching,  while  accounting  for  a  few  different  discovery   mental  models.     
however,  if  primo  wants  to  preserve  its  current  effectiveness  and  make  the  jump  towards  a  single   search  function  that  is  truly  flexible  and  allows  for  much  needed  customizability  [18][2],  it  needs   to  allow  for  several  similar  user  goals  to  be  easily  executable  without  confusion  about  the  likely   outcome.  the  most  prominent  current  system  error  for  primo,  as  it  has  been  applied  in  the  purdue   libraries,  is  its  inability  to  differentiate  ebooks  from  journal  articles  or  books.  it  would  support   users  goals  to  be  able  to  start  and  finish  an  ebook  related  tasks  at  the  home  page’s  search  box.   currently,  users  have  the  cognitive  burden  to  consider  whether  ebooks  are  more  likely  to  be   found  under  ‘books  &  media’  or  ‘journals’.  currently,  primo,  as  applied  to  its  implementation  at   purdue  libraries  at  the  time  of  this  study,  does  not  support  goals  to  search  for  content  type,  e.g.  an   ebook.  this  however,  is  increasingly  popular  among  the  student  population  who  want  ebooks  on   their  tablets  and  phones  instead  of  carrying  heavy  books  in  their  backpacks.     another  key  pain-­‐point  for  current  users  is  the  identification  of  specific  journals  in  physical  form,   say  for  archival  research.  currently,  each  journal  issue  is  listed  individually  in  the  ‘find  in  print’   section,  even  though  the  journals  only  have  one  call  number.  listing  all  volumes  and  issues  of  each   periodical  overwhelms  the  user  with  too  much  information  and  prevents  the  effective   accomplishment  of  the  task  of  locating  a  specific  journal  issue.  since  there  is  only  one  call  number   available  for  the  entire  journal  sequence,  it  may  lead  to  better  clarity  and  usability  if  the   information  was  reduced.  instead  of  listing  all  possible  journal  issues,  a  range  or  ranges  (if   incomplete  set  of  issues)  that  the  library  has  physically  present  should  be  listed.  in  article  case  2,   for  instance,  there  are  five  items  for  the  year  1983.  why  lead  the  user  to  look  at  a  range  where   there  is  no  possible  option?     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       100   comparing  hta  to  a  usability  test   usability  tests  benefit  from  the  invaluable  direct  input  from  the  end  user.  at  the  same  time   usability  studies,  as  constructed  conditions,  offer  limited  opportunities  to  learn  about  users’  real   motivations  and  goals  and  how  the  discovery  layers  support  or  fail  to  support  their  tasks.  fagan   et  al  [3]  conducted  a  usability  test  with  eight  students  and  two  faculty  members  to  learn  about   usability  issues  and  user  satisfaction  with  discovery  layers.  they  measured  time,  accuracy  and   completion  rate  for  nine  specific  tasks,  and  obtained  insights  from  task  observations  and  post-­‐test   surveys.  they  reported  on  issues  with  users  not  following  directions  (93),  the  prevalence  of  time   outs,  users  skipping  tasks,  and  variable  task  times.    
these  results  all  point  to  a  mismatch  between   the  user  goals  and  the  study  tasks  and  offer  an  incomplete  picture  about  the  system’s  ability  to   support  user  goals  that  are  accomplished  via  specific  tasks.   expert  evaluation  based  hta  method  does  not  require  users’  direct  input.  hta  offers  a  method  to   achieve  a  relatively  complete  evaluation  of  how  low-­‐level  interface  facets  support  users’  high-­‐ level  cognitive  tasks.  hta  measures  the  system  designs  quality  in  supporting  a  specific  task   needed  to  accomplish  a  user  goal.  instead  of  measuring  time,  physical  and  cognitive  tasks  are   measured  in  number  of  steps.  instead  of  accuracy  and  completion  rate,  fluent  workflow  steps  and   mistaken  steps  are  counted.  the  two  methods  offer  opposite  strengths,  making  them  a  good   complements.  given  hta’s  system-­‐centric  approach,  it  can  better  inform  which  tasks  would  be   useful  in  usability  testing.   to  compare  the  our  research  findings  with  usability  tests,  fagan  et  al  [3]  confirmed  some  of  the   previously  established  findings  that  journal  titles  are  difficult  to  locate  via  the  library  home  page   (vs.  databases),  that  filters  are  handy  when  they  are  needed  and  that  users’  mental  models  have  a   preference  for  a  google-­‐like  single  search-­‐box.  for  instance,  students  and  even  librarians,  struggle   to  understand  what  is  being  searched  in  each  system  and  how  results  are  ranked  (see  also  [5]).   the  hta  method  applied  in  this  study  was  also  able  to  confirm  that  journal  titles  are  more   difficult  to  identify  than  books  and  ebooks,  the  flexibility  benefit  offered  by  filters  and  identify  the   single  search  box  as  a  fluent  system  design.  since,  hta  does  not  rely  on  the  user  to  tell  why  these   results  are  true,  hta,  as  applied  in  this  study,  helped  expert  evaluators  understand  the  reasons   for  these  findings  via  self-­‐directed  execution  and  discussion  with  colleagues  later.  depending  on   the  task  design,  either  usability  testing  or  hta  offer  the  capabilities  to  identify  cases  such  as   confusion  about  how  to  start  an  ebook  search  in  primo.  taking  a  system  design  approach  to  task   design  offers  a  path  to  a  systematic  understanding  of  discovery  layer  usability,  which  lends  itself   to  easier  comparison  and  external  validity.     in  terms  of  specific  interface  features,  usability  tests  are  good  for  evaluating  the  visibility  of   specific  features.  for  example,  fagan  et  al  [3]  asked  their  participants  to  (1)  search  on  speech   pathology,  (2)  find  a  way  to  limit  search  results  to  audiology,  and  then  (3)  limit  their  search   results  to  peer-­‐reviewed  (task  3  in  [3],  p.  95).  by  measuring  completion  rate,  they  were  able  to   identify  the  relative  failure  of  ‘peer-­‐reviewed’  over  ‘audiology’  filters,  but  they  were  left  “unclear     information  technology  and  libraries  |  march  2015   101   [about]  why  the  remaining  participants  did  not  attempt  to  alter  the  search  results  to  ‘peer  reviewed,”   failing  to  accomplish  the  task  [3].  in  comparison,  hta  as  an  analytical  rather  than  observational   methodology,  leads  to  more  synthesized  results.  
in  addition  to  insights  into  possible  gaps   between  system  design  and  mental  models,  hta  as  a  goal-­‐oriented  approach,  concerns  itself  with   issues  of  workflow  (how  well  the  system  guides  the  user  to  accomplishing  their  task)  and   efficiency  (minimizing  the  number  of  steps  required  to  finish  a  task).  these  are  less  obvious  to   identify  with  usability  tests,  where  participants  are  not  impacted  by  their  routine  goals,  time   pressures  and  consequently  their  patience  may  be  more  tolerant  as  a  result.   the  application  of  hta  helped  identify  key  workflow  issues  and  map  them  to  specific  design   elements.  for  instance,  the  lack  of  ebooks  as  a  search  filter  meant  that  the  current  system  did  not   support  content  form  based  searching  well  for  two  mains  forms:  articles  and  books.  compared  to   usability  tests  that  focus  on  specific  fabricated  search  processes,  hta  aims  to  map  all  possible   routes  the  system’s  design  offers  to  accomplish  a  goal,  allowing  for  their  parallel  existence  during   the  analysis.  this  system-­‐centered  approach  to  task  evaluation,  we  argue,  is  the  key  benefit  hta   can  offer  towards  a  more  systematic  evaluation  of  discovery  layers,  where  different  user  groups   would  have  varying  levels  of  assistance  needs.  hta  task-­‐analysis  allows  for  the  nuanced   understanding  that  results  can  differ  as  the  context  of  use  differs.  that  applies  even  to  the   contextual  difference  between  user  test  participants  and  routine  library  users.     conclusion   discovery  layers  are  advancing  the  search  experiences  libraries  can  offer.  with  increasing   efficiency,  increased  ease  of  use  and  more  relevant  results,  scholarly  search  has  become  a  far  less   frustrating  experience.  while  google  is  still  perceived  as  the  holy  grail  of  discovery  experiences,  in   reality  it  may  not  be  quite  what  scholarly  users  are  after  [5].  the  application  of  discovery  layers   has  focused  on  eliminating  the  limitations  that  plagued  the  traditional  federated  search  and   improving  the  search  index  coverage  and  performance.  usability  studies  have  been  effective  in   verifying  these  benefits  and  key  interface  issues.  moving  forward,  studies  on  discovery  layers   should  focus  more  on  the  significance  of  discovery  layers  on  user  experience.   this  study  presents  the  expert  evaluation  based  hta  methods  as  a  complementary  way  to   systematically  evaluate  popular  discovery  layers.  it  is  the  system  design  and  goal-­‐oriented   evaluation  approach  that  offers  the  prospects  of  a  more  thorough  body  of  research  on  discovery   layers  than  usability  alone.  using  hta  as  a  systematic  preliminary  study  guiding  formal  usability   testing  offers  one  way  to  achieve  more  comparable  study  results  on  applications  of  discovery   layers.  it  is  through  comparisons  that  the  discussion  of  discovery  and  user  experience  can  gain  a   more  focused  research  attention.  as  such,  hta  can  help  vendors  to  achieve  the  full  potential  of   web-­‐scale  discovery  services.     to  better  understand  and  ultimately  design  to  their  full  potential,  systematic  studies  are  needed   on  discovery  layers.  
this  study  is  the  first  attempt  to  apply  hta  towards  systematically  analyzing   user  workflow  and  interaction  issues  on  discovery  layers.  the  authors  hope  to  see  more  work  in     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       102   this  area,  with  the  hope  of  achieving  true  next  generation  catalogs  that  can  enhance  knowledge   distribution.         references   [1]   beth  thomsett-­‐scott  and  patricia  e.  reese,  “academic  libraries  and  discovery  tools:  a   survey  of  the  literature,”  college  &  undergraduate  libraries  19,  no.  2–4  (april  2012):  123– 143.  http://dx.doi.org/10.1080/10691316.2012.697009.     [2]     sarah  c.  williams  and  anita  k.  foster,  “promise  fulfilled?  an  ebsco  discovery  service   usability  study,”  journal  of  web  librarianship  5,  no.  3  (jul.  2011):  179–198.   http://dx.doi.org/10.1080/19322909.2011.597590.     [3]     jody  condit  fagan,  meris  a.  mandernach,  carl  s.  nelson,  jonathan  r.  paulo,  and  grover   saunders,  “usability  test  results  for  a  discovery  tool  in  an  academic  library,”  information   technology  and  libraries  31,  no.  1  (mar.  2012):  83–112,  mar.  2012.   http://dx.doi.org/10.6017/ital.v31i1.1855.   [4]     roger  c.  schonfeld  and  matthew  p.  long,  “ithaka  s+r  us  library  survey  2013,”  ithaka  s+r,     survey  2,  mar.  2014.  http://sr.ithaka.org/research-­‐publications/ithaka-­‐sr-­‐us-­‐library-­‐survey-­‐ 2013.   [5]     michael  khoo  and  catherin  hall,  “what  would  ‘google’  do?  users’  mental  models  of  a  digital   library  search  engine,”  in  theory  and  practice  of  digital  libraries,  ed.  panayiotis  zaphiris,   george  buchanan,  edie  rasmussen,  and  fernando  loizides,  1-­‐12  (berlin  heidelberg,   springer:  2012).  http://dx.doi.org/10.1007/978-­‐3-­‐642-­‐33290-­‐6_1.   [6]     jakob  nielsen,  “goal  composition:  extending  task  analysis  to  predict  things  people  may   want  to  do,”  goal  composition:  extending  task  analysis  to  predict  things  people  may  want  to   do,  01-­‐jan-­‐1994.  http://www.nngroup.com/articles/goal-­‐composition/.   [7]     jakob  nielsen,  “finding  usability  problems  through  heuristic  evaluation,”  in  proceedings  of   the  sigchi  conference  on  human  factors  in  computing  systems,  373-­‐380  (new  york,  ny,   acm:  1992).  http://dx.doi.org/10.1145/142750.142834.   [8]     jerry  v.  caswell  and  john  d.  wynstra,  “improving  the  search  experience:  federated  search   and  the  library  gateway,”  library  hi  tech  28,  no.  3  (sep.  2010):  391–401.   http://dx.doi.org/10.1108/07378831011076648.   [9]     emily  r.  alling  and  rachael  naismith,  “protocol  analysis  of  a  federated  search  tool:   designing  for  users,”  internet  reference  services  quarterly  12,  no.  1/2,  (2007):  195–210.   http://dx.doi.org/10.1300/j136v12n01_10.   [10]   susan  johns-­‐smith,  “evaluation  and  implementation  of  a  discovery  tool,”  kansas  library   association  college  and  university  libraries  section  proceedings  2,  no.  1  (jan.  2012):  17–23.     information  technology  and  libraries  |  march  2015   103   [11]   sharon  q.  yang  and  kurt  wagner,  “evaluating  and  comparing  discovery  tools:  how  close  are   we  towards  next  generation  catalog?,”  library  hi  tech  28,  no.  4  (nov.  2010):  690–709.   http://dx.doi.org/10.1108/07378831011096312. 
  [12]   lyle  ford,  “better  than  google  scholar?,”  presentation,  advance  program  for  internet   librarian  2010,  monterey,  california,  25-­‐oct-­‐2010.   [13]   michael  gorrell,  “the  21st  century  searcher:  how  the  growth  of  search  engines  affected  the   redesign  of  ebscohost,”  against  the  grain  20,  no.  3  (2008):  22,  24.   [14]   sian  harris,  “discovery  services  sift  through  expert  resources,”  research  information,  no.  53,  ,   (apr.  2011):  18–20.   http://www.researchinformation.info/features/feature.php?feature_id=315.   [15]   yuji  tosaka  and  cathy  weng,  “reexamining  content-­‐enriched  access:  its  effect  on  usage  and   discovery,”  college  &  research  libraries  72,  no.  5  (sep.  2011):  pp.  412–427.   http://dx.doi.org/10.5860/.   [16]   judy  luther  and  maureen  c.  kelly,  “the  next  generation  of  discovery,”  library  journal  136,   no.  5  (march  15,  2011):  66-­‐71.     [17]   graham  stone,  “searching  life,  the  universe  and  everything?  the  implementation  of   summon  at  the  university  of  huddersfield,”  liber  quarterly  20,  no.  1  (2010):  25–51.   http://liber.library.uu.nl/index.php/lq/article/view/7974.   [18]   jeff  wisniewski,  “web  scale  discovery:  the  future’s  so  bright,  i  gotta  wear  shades,”  online   34,  no.  4  (aug.  2010):  55–57.   [19]   gaurav  bhatnagar,  scott  dennis,  gabriel  duque,  sara  henry,  mark  maceachern,  stephanie   teasley,  and  ken  varnum,  “university  of  michigan  library  article  discovery  working  group   final  report,”  university  of  michigan  library,  jan.  2010,   http://www.lib.umich.edu/files/adwg/final-­‐report.pdf     [20]   abe  crystal  and  beth  ellington,  “task  analysis  and  human-­‐computer  interaction:  approaches,   techniques,  and  levels  of  analysis”  in  amcis  2004  proeedings,  paper  391,   http://aisel.aisnet.org/amcis2004/391.    [21]  neville  a.  stanton,  “hierarchical  task  analysis:  developments,  applications,  and  extensions,”   applied  ergonomics  37,  no.  1  (2006):  55–79.   [22]   john  annett  and  neville  a.  stanton,  eds.  task  analysis,  1  edition.  london ;  new  york:  crc   press,  2000.   [23]   sarah  k.  felipe,  anne  e.  adams,  wendy  a.  rogers,  and  arthur  d.  fisk,  “training  novices  on   hierarchical  task  analysis,”  proceedings  of  the  human  factors  and  ergonomics  society   annual  meeting  54,  no.  23,  (sep.  2010):  2005–2009,   http://dx.doi.org/10.1177/154193121005402321.     applying  hierarchical  task  analysis  method  to  discovery  layer  evaluation  |  promann  and   zhang       104   [24]   d.  embrey,  “task  analysis  techniques,”  human  reliability  associates  ltd,  vol.  1,  2000.   [25]   j.  reason,  “combating  omission  errors  through  task  analysis  and  good  reminders,”  quality  &   safety  health  care  11,  no.  1  (mar.  2002):  40–44,  http://dx.doi.org/10.1136/qhc.11.1.40.   [26]   james  hollan,  edwin  hutchins,  and  david  kirsh,  “distributed  cognition:  toward  a  new   foundation  for  human-­‐computer  interaction  research,”  acm  trans.  comput.-­‐hum.  interact   7,  no.  2  (jun.  2000):  174–196,    http://dx.doi.org/10.1145/353485.353487.   [27]   stuart  k.  card,  allen  newell,  and  thomas  p.  moran,  the  psychology  of  human-­‐computer   interaction.  hillsdale,  nj,  usa:  l.  erlbaum  associates  inc.,  1983.   [28]   stephen  j.  payne  and  t.  r.  g.  
green,  “the  structure  of  command  languages:  an  experiment  on   task-­‐action  grammar,”  international  journal  of  man-­‐machine  studies  30,  no.  2  (feb.  1989):   213–234.   [29]   bonnie  e.  john  and  david  e.  kieras,  “using  goms  for  user  interface  design  and  evaluation:   which  technique?,”  acm  transactions  on  computer-­‐human  interactions  3,  no.  4  (dec.  1996):   287–319,  http://dx.doi.org/10.1145/235833.236050.   [30]   david  e.  kieras  and  david  e.  meyer,  “an  overview  of  the  epic  architecture  for  cognition  and   performance  with  application  to  human-­‐computer  interaction,”  human-­‐computer   interaction  12,  no.  4  (dec.  1997):  391–438,   http://dx.doi.org/10.1207/s15327051hci1204_4.   [31]   laura  g.  militello  and  robert  j.  hutton,  “applied  cognitive  task  analysis  (acta):  a   practitioner’s  toolkit  for  understanding  cognitive  task  demands,”  ergonomics  41,  no.  11   (nov.  1998):    1618–1641,  http://dx.doi.org/10.1080/001401398186108.   [32]   brenda  battleson,  austin  booth,  and  jane  weintrop,  “usability  testing  of  an  academic  library   web  site:  a  case  study,”  the  journal  of  academic  librarianship  27,  no.  3  (may  2001):  188– 198.   [33]   paul  chojecki,  “how  to  increase  website  usability  with  link  annotations,”  in  20th   international  symposium  on  human  factors  in  telecommunication.  6th  european  colloquium   for  user-­‐friendly  product  information.  proceedings,  2006,  p.  8.             case  study  references:     information  technology  and  libraries  |  march  2015   105     find  an  article:   case  1.  yang,  t.  c.,  and  kevin  d.  heaney.  "network-­‐assisted  underwater  acoustic   communications."  in  proceedings  of  the  seventh  acm  international  conference  on  underwater   networks  and  systems,  p.  37.  acm,  2012.   case  2.  jorgensen,  william  l.,  jayaraman  chandrasekhar,  jeffry  d.  madura,  roger  w.  impey,  and   michael  l.  klein.  "comparison  of  simple  potential  functions  for  simulating  liquid  water."  the   journal  of  chemical  physics  79  (1983):  926.   case  3.  “design  annual.”  graphis  inc.,  2008   case  4.  walb,  m.  c.,  j.  e.  moore,  a.  attia,  k.  t.  wheeler,  m.  s.  miller,  and  m.  t.  munley.  "a   technique  for  murine  irradiation  in  a  controlled  gas  environment."  biomedical  sciences   instrumentation  48  (2012):  470.   find  a  book  (physical):   case  5.  few,  stephen.  show  me  the  numbers:  designing  tables  and  graphs  to  enlighten.  vol.  1,   no.  1.  oakland,  ca:  analytics  press,  2004.   case  6.  metcalf,  christine.  the  love  of  cats.  crescent  books,  1973.   case  7.  machiavelli,  niccolò,  and  leo  paul  s.  de  alvarez.  1989.  the  prince.  prospect  heights,  ill:   waveland  press.   case  8.  lees-­‐maffei,  grace,  and  rebecca  houze,  eds.  the  design  history  reader.  berg,  2010.   find  an  ebook:   case  9.  rubin,  jeffrey,  and  dana  chisnell.  handbook  of  usability  testing:  how  to  plan,  design,   and  conduct  effective  tests.  wiley  technical  communication  library,  2008.   case  10.  rubin,  jeffrey,  and  dana  chisnell.  handbook  of  usability  testing:  how  to  plan,  design,   and  conduct  effective  tests.  wiley  technical  communication  library,  2008.   case  11.  laube,  matthew  bryan.  ancient  awakening.  2010.       
letter from the editor: the core question letter from the editor the core question kenneth j. varnum information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.12137 as i write this, the members of the association for library collections and technical services (alcts), the library leadership and management association (llama), and lita are voting to merge into a new consolidated division, core: leadership, infrastructure, futures. this merger is essential to the continuing activities that we library technologists rely on. the lita board has indicated that if the merger does not go through, lita will be forced to dissolve over the coming year. the merger will enrich lita members’ opportunities. lita has long focused on the library technology practitioner. that has been our core competency, born of a time when technology was the new thing in libraries. we technologists know—the entire information profession knows— that technology is no longer an addition to a library, but is the way society operates, for a huge portion of our work and life. core reflects this evolutionary change. similar evolutions have taken place in technical services and collections development areas; those functions have been forever changed by the wave of technologies that we have implemented over the past half century. core brings together the practitioners and technologies that make libraries run, and combines them with the library leadership areas that many of us aspire to, or end up taking on, as our careers develop. when i joined lita over a decade ago, i was myself moving from “doer” to “manager.” now that my role is largely project and personnel management, the skills and conversations i seek for personal growth are often found in other parts of ala, and beyond. yet, the focus—the core, if i may—of what i do is still in the center of the venn diagram of technology, people, and data. i voted to support core and hope that all of you who belong to lita, alcts, and/or llama, will do the same. sincerely, kenneth j. varnum, editor varnum@umich.edu march 2020 https://core.ala.org/ mailto:varnum@umich.edu efficiently processing and storing library linked data using apache spark and parquet kumar sharma, ujjal marjit, and utpal biswas information technology and libraries | september 2018 29 kumar sharma (kumar.asom@gmail.com) is research scholar, department of computer science and engineering; ujjal marjit (marjitujjal@gmail.com) is system-in-charge, center for information resource management (cirm); and utpal biswas (utpal01in@yahoo.com) is professor, department of computer science and engineering, the university of kalyani, india. abstract resource description framework (rdf) is a commonly used data model in the semantic web environment. libraries and various other communities have been using the rdf data model to store valuable data after it is extracted from traditional storage systems. however, because of the large volume of the data, processing and storing it is becoming a nightmare for traditional datamanagement tools. this challenge demands a scalable and distributed system that can manage data in parallel. in this article, a distributed solution is proposed for efficiently processing and storing the large volume of library linked data stored in traditional storage systems. apache spark is used for parallel processing of large data sets and a column-oriented schema is proposed for storing rdf data. 
the storage system is built on top of hadoop distributed file systems (hdfs) and uses the apache parquet format to store data in a compressed form. the experimental evaluation showed that storage requirements were reduced significantly as compared to jena tdb, sesame, rdf/xml, and n-triples file formats. sparql queries are processed using spark sql to query the compressed data. the experimental evaluation showed a good query response time, which significantly reduces as the number of worker nodes increases. introduction more and more organizations, communities, and research-development centers are using semantic web technologies to represent data using rdf. libraries have been trying to replace the cataloging system using a linked-data technique such as bibframe.1 libraries have received much attention on transitioning marc cataloging data into rdf format.2 data stored in various other formats such as relational databases, csv, and html have already begun their journey toward the open-data movement.3 libraries have participated in the evolution of linked open data (lod) to make data an essential part of the web.4 various researchers have explored areas related to library data and linked data. in particular, transitioning legacy library data into linked data has dominated most of the research works. other areas include researching the impact of linked library data, investigating how privacy and security can be maintained, and exploring the potential effects of having open linked library data. obviously, a linked-data approach for publishing data on the web brings many benefits to libraries. first, once isolated library data currently stored using traditional cataloging systems (marc) becomes a part of the web, it can be shared, reused, and consumed by web users.5 this promotes the cross-domain sharing of knowledge hidden in the library data, opening the library as a rich source of information. online library users can share more information using linked library resources since every library mailto:kumar.asom@gmail.com mailto:marjitujjal@gmail.com mailto:utpal01in@yahoo.com efficiently processing and storing library linked data | sharma, marjit, and biswas 30 https://doi.org/10.6017/ital.v37i3.10177 resource is crawlable on the web via uniform resource identifiers (uri). most importantly, library data benefits from linked-data technology’s real advantages, such as interoperability, integration with other systems, data crosswalks, and smart federated search.6 numerous approaches have evolved for making the vision of the semantic web a success. no doubt, they have succeeded in making the library a part of the web, but there remain issues related to library big data. the term big data refers to data or information that cannot be processed using traditional software systems.7 the volume of such data is so large that it requires advanced technologies for processing and storing the information. libraries also have real concerns with large volumes of data during and after the transition to linked data. the main challenges are in processing and storage. during conversion from library data to rdf, the process can become stalled because of the large volumes of data. once the data is successfully converted into rdf formats, there are storage issues. finally, even if the data is somehow stored using common rdf triple stores, it is difficult to retrieve and filter. this is a challenging problem that every librarian must give attention to. 
librarians should know the real nature of library big data, which causes problems in analyzing data and decision making. librarians must also know the technologies that can resolve these issues. the rate of data generation and the complexity of the data itself are constantly increasing. traditional data-management tools are becoming incapable of managing the data. that is why the definition of big data has been characterized by five vs—volume, velocity, variety, value, and veracity.8 • volume is the amount of the data. • velocity is the data-generation rate (which is high in this case). • variety refers to the heterogeneous nature of the data. • value refers to the actual use of the data after the extraction. • veracity is the quality or trustworthiness of the data. to handle the five vs of big data, distributed technologies such as commodity hardware, parallel processing frameworks, and optimized storage systems are needed. commodity hardware reduces the cost of setting up a distributed environment and can be managed with very limited configurations. a parallel processing system can process distributed data in parallel to reduce processing time. an optimized storage system is required to store the large volume of data, supporting scalability to accommodate more data on demand. with these library requirements to tackle the challenges posed by library big data, a distributed solution is proposed. this approach is based on apache hadoop, apache spark, and a column-oriented storage system to process largesize data and to store the processed data in a compressed form. bibliographic rdf data from british national library and the national library of portugal have been used for this experiment. these bibliographic data are processed using apache spark and stored using apache parquet format. the stored data can be queried using sparql queries for which spark sql is used to execute queries. given an existing rdf dataset, we designed a schema for storing rdf data using a columnoriented database. using column-oriented design with apache parquet and spark sql as the query information technology and libraries | september 2018 31 processor, a distributed rdf storage system was implemented that can store any amount of rdf data by increasing the number of distributed nodes as needed. literature review while big data continues to rise, library data are still in traditional storage systems isolated from the web. to continue working with the web, libraries must redesign the way they format data and contribute toward the web of data. to serve library data to other communities, libraries must integrate their data with the web. attempts to do this have been made by several researchers. the task of integration cannot be achieved by only librarians; rather, it requires a team of experts in the field of library and information technology. the advanced way for integrating resources is with linked-data technology by assigning uris to every piece of library data. with this goal, there exist various projects related to the convergence of library data and linked data. one of these, bibframe, is an initiative to transition bibliographic resources into linked-data representation. bibframe aims to replace traditional cataloging standards such as marc and unimarc using the concept of publishing structured data on the web. marc formats cannot be exchanged easily with nonlibrary systems. 
the marc standard also suffers from inconsistencies, errors, and inability to express relationships between records and fields within the record. that is why mostly bibliographic resources stored in marc standards are targeted for conversion.9 other works include the open-data initiative from the british national library, library catalog to linked opendata conversion, exposing library data as linked data, and building a knowledge graph to reshape the library staff directory.10 linked data is fully dependent on rdf. rdf reveals graph-like structures where resources are linked with one another. thus, rdf can improve on marc standards because of its strong ability to link related resources. this system of revealing everything as a graph helps in building a network of library resources and other data on the web. this also makes for fast search functionality. in addition, searching a topic or book could bring similar graphs from other library resources, leading to the creation of linked-data service.11 such a service has been implemented by the german national library to provide bibliographic and authority data in rdf format, by the europeana linked open data with access to open metadata on millions of books and multimedia data, and by the library of congress linked data service.12 there is less discussion of library big data. though big data in general is in active research, the library domain has received much less attention than the broader concept of big data and its challenges. this could be because most of librarians working with linked data are from nontechnical backgrounds. now is the right time for libraries to give priority to adopting big data technologies to overcome challenges posed by big data. wang et al. have discussed library big data issues and challenges.13 they made some statements about whether library data belongs to the big data category. obviously, library data belongs to big data since it fulfills some of the characteristics of big data, such as volume, variety, and velocity. wang et al. also raise some of libraries’ challenges related to library big data, such as lacking teams of experts, inability to adopt big data due to budgetary issues, and technical challenges. finally, they point out that to take advantage of the web’s full potential, library data must be transformed into a format that can be accessible beyond the library using technologies like semantic web and linked data. the web has already started its work related to big-data challenges. libraries need to transition their data into an advanced format with the ability to handle big-data issues. the main problems efficiently processing and storing library linked data | sharma, marjit, and biswas 32 https://doi.org/10.6017/ital.v37i3.10177 related to library big data happen at data transformation and storage. to store and retrieve large amounts of data, we need commodity hardware that can handle trillions of rdf triples, requiring terabytes or petabytes of disk space. as of now, there are semantic web frameworks such as jena and sesame to handle rdf data, but these frameworks are not scalable for large rdf graphs.14 jena is a java-based framework for building semantic web and linked-data applications. it is basically a semantic web programming framework that provides java libraries for dealing with rdf data. jena tdb is the component of jena for storing and querying rdf data. 15 it is designed to work in a single-node environment. 
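as a point of reference for the single-node tools mentioned above and compared against later in the evaluation, a minimal jena tdb workflow can be sketched as follows. this sketch is illustrative only and is not taken from the authors' code; the store directory, input file name, and query are hypothetical, and it assumes a current apache jena release with the tdb1 api.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.tdb.TDBFactory;

public class JenaTdbExample {
  public static void main(String[] args) {
    // tdb keeps its triple indexes in a local directory on a single node
    Dataset dataset = TDBFactory.createDataset("/data/tdb-store"); // hypothetical location

    // load an n-triples file into the default graph
    dataset.begin(ReadWrite.WRITE);
    Model model = dataset.getDefaultModel();
    RDFDataMgr.read(model, "catalogue.nt"); // hypothetical input file
    dataset.commit();
    dataset.end();

    // run a sparql query against the store
    dataset.begin(ReadWrite.READ);
    String q = "SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }";
    try (QueryExecution qe = QueryExecutionFactory.create(q, dataset)) {
      ResultSet rs = qe.execSelect();
      ResultSetFormatter.out(System.out, rs);
    }
    dataset.end();
  }
}
```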
sesame is also a semantic web framework for processing, storing, and querying rdf data. basically, sesame is a web-based architecture for storing and querying rdf data as well as schema information. 16 background this section briefly describes the structure of rdf triples, apache spark along with its features and column-oriented database system, and apache parquet. structure of rdf triples rdf is a schema-less data model. it implies that the data is not fixed to a specific schema, so it does not need to conform to any predefined schema. unlike in relational tables, where we define columns during schema definition and those columns must contain the required type of data, in rdf we can have any number of properties and data using any kind of vocabulary. we only need vocabulary terms to embed properties. the vocabulary is created using domain ontology, which represents the schemas. to describe library resources we need a library-domain ontology. for example, to define a book and its properties one can use the bookont ontology.17 bookont is a book-structure ontology designed for an optimized book search and retrieval process. however, it is not mandatory to use existing ontology and all the properties defined under it. we can use terms from a newly created ontology or mixed ontologies with required properties. rdf represents resources in the form of subject, predicate, and object. the subject is the resource being described, identified by a uri. this subject can have any number of property-value pairs. this way representation of a resource is called knowledge representation, where everything is defined as a knowledge in the form of entity attribute value (eav). in rdf, the basic unit of information is a triple t, such that t = {subject, predicate, object}. such information when stored on disk is called a triplestore. the collection of rdf triples is called an rdf database. an rdf database is specially designed to store linked data to make the web more useful by interlinking data from different sources in a meaningful way. the real advantage of rdf is its support of the common data model. rdf is the standard way for publishing meaningful data on the web, and this is backed by linked data. linked data provides some rules about how data can be published on the web by following the rdf data model.18 with such a common data model, one can integrate data from any sources by inserting new property-value pairs without altering database schema. another important purpose of rdf is to provide resources to be processable by software agents on the web. rdf triples are of two types: literal triples and linked triples. literal triples consist of a uri referenced subject and a literal object (scalar value) joined by a predicate. in linked triples, both the subject and the object consist of uris linking by the predicate. this type of linking is called rdf link, which is the basis for interlinking the resources.19 rdf data are queried using the sparql query language.20 sparql is a graph-matching query language and is used to retrieve information technology and libraries | september 2018 33 triples from the triple store. the sparql queries are also called semantic queries. like sql queries, sparql also finds and retrieves the information stored in the triplestore. 
a sparql query is composed of five main components:21 • the prefix declaration part is used to abbreviate the uris; • the dataset definition is used to specify the rdf dataset from which the data is to be fetched; • the result clause is used to specify what information is needed to be fetched, which can be select, construct, describe, and ask; • the query pattern is used to specify the search conditions; and • the query modifiers are used to rearrange query results using order by, limit etc. hadoop and mapreduce hadoop is open-source software that supports distributed processing of large datasets on machine clusters.22 two core components—hadoop distributed file system (hdfs) and mapreduce— make distributed storage and computation of processing jobs possible.23 hdfs is the storage component, whereas mapreduce is a distributed data-processing framework, the computational model of hadoop based on java. the mapreduce algorithm consists of two main tasks: map and reduce. the map task takes a set of data as input and produces another set of data with individual components in the form of key/value pairs or tuples. the output of the map task goes to the reduce task, which combines common key/value pairs into a smaller set of tuples. hdfs and mapreduce are based on driver/worker architecture consisting of driver and worker nodes having different roles. an hdfs driver node is called the name-node while the worker node is called the data-node. the name-node is responsible for managing names and data blocks. data blocks are present in the data-nodes. data-nodes are distributed across each machine, responsible for actual data storage. similarly, the mapreduce driver node is called the job-tracker and the worker node is called the task-tracker. job-tracker is responsible for scheduling jobs on task-trackers. task-tracker again is distributed across each machine along with the data-nodes, responsible for processing map and reducing tasks as instructed by the job-tracker. the concept of hadoop implies that the set of data to be processed is broken into smaller forms that can be processed individually and independently. this way, tasks can be assigned to multiple processors to process the data, and eventually it becomes easy to scale data processing over multiple computing nodes. once a mapreduce program is written, the program can be scaled to run over thousands of machines in a cluster. spark and resilient distributed datasets (rdd) apache spark is an in-memory cluster computing platform, which is a faster batch-processing framework than mapreduce. more importantly, it supports in-memory processing of tasks along with data, so querying data is much faster than disk-based engines. the core of spark is the resilient distributed dataset (rdd). rdd is a fundamental data structure of spark that holds a distributed collection of data where data cannot be modified. rather, data modification yields another immutable collection of data (or rdd). this process is called rdd transformation. for example, figure 1 depicts an example of rdd transformation. the distributed processing and efficiently processing and storing library linked data | sharma, marjit, and biswas 34 https://doi.org/10.6017/ital.v37i3.10177 transformation of data is managed by rdd. rdds are fault-tolerant, meaning that the lost data is recoverable using lineage graph of rdds.24 spark constructs a direct acyclic graph (dag) of a sequence of computations that needed to be performed on data. 
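to make the map and reduce steps described above concrete, the sketch below counts word occurrences using spark's java api, the framework adopted later in the article: the map-style transformations emit (word, 1) key/value pairs, reducebykey combines the values for each common key into a smaller set of tuples, and the chain of transformations forms the kind of dag just described. the input and output paths are hypothetical and the snippet is illustrative rather than part of the authors' implementation.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCount {
  public static void main(String[] args) {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("word-count"));

    JavaRDD<String> lines = sc.textFile("hdfs:///library/catalogue.txt"); // hypothetical input

    // map side: split each line into words and emit (word, 1) key/value pairs
    JavaRDD<String> words = lines.flatMap(l -> Arrays.asList(l.split("\\s+")).iterator());
    JavaPairRDD<String, Integer> pairs = words.mapToPair(w -> new Tuple2<>(w, 1));

    // reduce side: combine the values of each common key into a single count
    JavaPairRDD<String, Integer> counts = pairs.reduceByKey(Integer::sum);

    counts.saveAsTextFile("hdfs:///library/word-counts"); // hypothetical output
    sc.close();
  }
}
```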
spark has the most powerful computing engine that allows most of the computations in multistage memory. because of this multistage in-memory computation engine, it provides better performance at reading and writing data than the mapreduce paradigm.25 it aims at speed, ease of use, extensibility, and interactive analytics. spark relies on concepts such as rdd, dag, spark context, transformations, and actions. spark context is an execution environment in which rdds and broadcasting variables can be created. spark context is also called the master of a spark application and allows accessing the cluster through a resource manager. data transformation happens in the spark application when the data is loaded from a data-store into rdds and some filter or map functions are performed to produce a new set of rdds. when the set of computations is created, forming a dag, it does not perform any execution; rather, it prepares for execution in the end, like a lazy loading process. some examples of actions are data extraction or collection and getting the count of words. transformations are the sequence of events, and action is the final execution of the underlying logic. figure 1. rdd transformations. the execution model of spark is shown in figure 2. the execution model is based on the driver/worker architecture consisting of the driver and the worker processes. the driver process creates the spark context and schedules tasks based on the available worker nodes. initially, the master process must be started, then creating worker nodes follows. the driver takes the responsibility of converting a user’s application into several tasks. these tasks are distributed among the workers. the executors are the main components of every spark application. executors actually perform data processing, reading and writing data to the external sources and the storage system. the spark manager is responsible for resource allocation and deallocation to the spark job. basically, spark is only a computation model. it is not related to storage of data, which is a different concept. it only helps in computations and data analytics in a distributed manner. for distributed execution, the task is distributed among the connected nodes so that every node can perform tasks at the same time; it performs the desired operation and notifies the master upon completion of the task. information technology and libraries | september 2018 35 figure 2 execution model of spark. in mapreduce, read/write operations happen between disk and memory, making job computation slower than spark. rdds resolve this by allowing fault-tolerant, distributed, in-memory computations. in rdd, the first load of data is read from disk and then a write-to-disk operation may take place depending upon the program. the operations between first read and last write happen in memory. data on rdds are lazily evaluated, i.e., during rdd transformations, data will not take part until any action is called on the final rdd, which triggers the job execution. the chain of rdd transformations creates dependencies between rdds. each dependency has a function for calculating its data and a pointer to its parent rdd. spark divides rdd dependencies into stages and tasks, then it sends them to workers for execution. hence, an rdd does not actually hold the data; rather, it either loads data from disk or from another rdd and performs some actions on the data for producing results. 
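the lazy-evaluation behaviour described above can be observed directly: transformations only record lineage, and nothing is read or computed until an action is invoked. the following minimal sketch, with a hypothetical input path, prints the lineage of a short transformation chain and then triggers execution with count(); it is illustrative and not the authors' code.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LazyLineage {
  public static void main(String[] args) {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("lazy-lineage"));

    // transformations only describe the computation; no data is read yet
    JavaRDD<String> lines   = sc.textFile("hdfs:///library/bnb.nt"); // hypothetical input
    JavaRDD<String> trimmed = lines.map(String::trim);
    JavaRDD<String> triples = trimmed.filter(l -> !l.isEmpty() && !l.startsWith("#"));

    // the lineage (chain of parent rdds) that spark will divide into stages and tasks
    System.out.println(triples.toDebugString());

    // count() is an action: only now is the dag scheduled and executed on the workers
    System.out.println("triples: " + triples.count());

    sc.close();
  }
}
```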
one of the important features of rdd is its fault tolerance, because of which it can retain and recompute any of the unsuccessful partitions due to node failures. rdds have built-in methods for saving data into files. for example, when an rdd calls saveastextfile(), its data are written to the specified text file line by line. there are numerous options for storing data in different formats, such as json, csv, sequence files, and object files. all these file formats can be saved directly into hdfs or normal file systems.

spark sql and dataframe

spark sql is a query interface for processing structured data using sql style on the distributed collection of data. that means it is used for querying structured data stored in hdfs (like hive) and parquet. spark sql runs on top of spark as a library and provides higher optimization. the spark dataframe is an api (application programming interface) that can perform relational operations on rdds and external data sources such as hive and parquet. like rdds, a spark dataframe is also a collection of structured records that can be manipulated by spark sql. it evaluates operations lazily to perform relational optimizations.26 a dataframe is created using rdds along with the schema information. for example, the java code snippet below creates a dataframe using an rdd and a schema called rdftriple (the rdf-triple schema will be discussed in the proposed approach).

JavaRDD<String> n_triples = marc_records.map(new TextToString());
JavaRDD<RDFTriple> rdf_triples = n_triples.map(new LinesToRDFFunction());
Dataset<Row> dataframe = sparkSession.createDataFrame(rdf_triples, RDFTriple.class);
dataframe.write().parquet("/full-path/rdfdata.parquet");

the spark dataframe uses memory management wisely by saving data in off-heap memory and provides an optimized execution plan. conceptually, a dataframe is equivalent to a relational table with richer optimization and supports sql queries over its data. so a dataframe is used for storing data in tables. structured data from a spark dataframe can be saved into the parquet file format as shown in the above code snippet.

column-oriented database

a database is a persistent collection of records. these records are accessed via queries. the system that stores data and processes queries to retrieve data is called a database system. such systems use indexes or iteration over the records to find the required information stored in the database. indexes are auxiliary, dictionary-like data structures that keep references to individual records. indexing is efficient in some cases; however, it requires two lookup operations and slows down access time. data scanning, or iteration over each record, resolves the query by finding the exact location of the records. it is inefficient when the size of the data is too large. as the data-generation rate is constantly increasing, more and more data is going to be stored on disk. for such fast-growing data, we need a system that can accommodate more data than traditional storage systems and, at the same time, keep query-processing time low. when the data gets too large, indexing and record scanning will be costly during querying. hence, a satisfying solution is the columnar-storage system, which stores data by columns rather than by rows.
27 a column-oriented database system stores data in corresponding columns, and each column is stored in a separate file into the disk. this makes data access time much quicker. since each column is stored separately, any required data can directly be accessed instead of reading all the data. that means any column can be used as an index, making it auto-indexing. that is why the column-oriented representation is much faster than the row-oriented representation. apart from this, data is stored in the compressed form. each column is compressed using a different scheme. in the column-oriented database, the compression is always efficient as all the values belong to the same data type. hence, column-oriented databases require less disk space, as they do not need additional storage for indexes since the data is stored within the indexes themselves. consider an example where a database table named “book” consisting of columns “bookid,” “title,” and “price.” following a column-oriented approach, all the values for bookid are stored together under the “bookid” column, all the values for title are stored together under “title” column. and so on as shown in figure 3. information technology and libraries | september 2018 37 figure 3 an example of an entity and its row and column representation. apache parquet parquet is a top-level apache project that stores data in column-oriented fashion, highly compressed and densely packed in the disk.28 it is a self-describing data format that embeds schema within the data itself. it supports efficient compression and encoding schemes that allows lowering data-storage costs and maximizes the effectiveness of querying data. parquet has added advantages, such as limiting the i/o operation and storing data in compressed form using the snappy method developed by google and used in its production environment. hence it is designed especially for space and query efficiency. snappy aims at compressing petabytes of data in minimal amounts of time, and especially aims for resolving big data issues.29 the data compression rate is more than 250 mb/sec, and decompression rate is more than 500 mb/sec. these compression and decompression rates are for a single core of a system having a core i7 processor in 64-bit mode. it is even faster than the fastest mode of zlib compression algorithm.30 parquet is implemented using column-striping and assembly-language algorithms that are optimized for storing large data-blocks.31 it supports nested data structures in which each value of the same column is stored in contiguous memory locations.32 apache parquet is flexible and can work with many programming languages because it is implemented using apache thrift (https://thrift.apache.org/). a parquet file is divided into row groups and metadata at the end of the file. each row group is divided into column values (or column chunks), such as column 1, column 2, and so on as shown in figure 4. each column value is divided into pages, and each page consists of the page header, repetition levels, definition levels, and values. the footer of the file contains various metadata, such as file metadata, column metadata, and page-header metadata. the metadata information is required to locate and find the values, just like indexing. https://thrift.apache.org/ efficiently processing and storing library linked data | sharma, marjit, and biswas 38 https://doi.org/10.6017/ital.v37i3.10177 figure 4 parquet file structure. 
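putting the two background pieces together, a spark dataframe with the three triple columns can be written to parquet with an explicit snappy codec and read back for sql querying. this is a minimal sketch rather than the authors' code: the paths and the example triple are hypothetical, the `triples` view name is an assumption, and snappy is already the default parquet codec in recent spark releases.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ParquetRoundTrip {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("parquet-round-trip").getOrCreate();

    // a three-column (subject, predicate, object) schema, mirroring the storage schema
    StructType schema = new StructType()
        .add("subject", DataTypes.StringType)
        .add("predicate", DataTypes.StringType)
        .add("object", DataTypes.StringType);

    List<Row> rows = Arrays.asList(
        RowFactory.create("<http://example.org/book/1>",
                          "<http://purl.org/dc/terms/title>",
                          "\"a sample title\""));
    Dataset<Row> triples = spark.createDataFrame(rows, schema);

    // each column is written separately and compressed (the codec is made explicit here)
    triples.write().option("compression", "snappy").parquet("hdfs:///library/rdfdata.parquet");

    // load the compressed columns back and query them with spark sql
    spark.read().parquet("hdfs:///library/rdfdata.parquet").createOrReplaceTempView("triples");
    spark.sql("SELECT COUNT(*) AS count FROM triples").show();

    spark.stop();
  }
}
```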
the proposed approach the proposed approach relies on spark’s core apis—rdd, spark sql, and dataframe—which can operate on large datasets. rdd is used to load the initial data from the input file, process the data and transform them into triple structure. spark dataframe is used to load the data from rdd into the triple structure and send the transformed rdf data into a parquet file. spark sql is used to fetch the data stored in the parquet file. processing rdf data processing rdf data from large rdf/xml files requires breaking the file into smaller file components. general data-processing systems cannot handle large files because they face memory issues. at this stage, the proposed approach can process the data using an n-triples file, hence individual rdf/xml files again need to be converted into the n-triples file format. the process of breaking rdf/xml file into smaller file components and then converting them into n-triples format depends upon the size of the input file. if it is not more than 500 mb then it is directly converted into n-triples file format. multiple rdf/xml files are converted into individual ntriples file formats, which are again combined into one n-triples file, as the proposed spark application reads input from a single file. information technology and libraries | september 2018 39 schema to store rdf data a simple rdf schema with three triple entities has been designed. this schema is an rdf triple view, which is the building block of the rdf storage schema proposed in this work. the rdf triple view is a simple java class consisting of three attributes—subject, predicate, and object. given an rdf dataset d, consisting of a set of rdf triples t, in either rdf/xml or n-triples format, the dataset is transformed into a format that can be processed by a spark application. further, the dataset is transformed into a line-based format where the individual triple statement is placed in a line separated by a new-line (\n) character. a line contains three components—subject, predicate, and object separated by a space. here each line is unique, using the combined information of subject, predicate, and object. given an rdf triple structure ti, ti = (si, pi, oi) and ti ∈ t, for each t an instance of rdf triple view is created to hold the triple information. the columnar schema organizes triple information into three components, storing each component separately as subject, predicate, and object columns (figure 5). figure 5. rdf triple view. rdf storage we store the rdf data based on rdf triple view, which is the main schema for storing data in the triple representation. we do not need any indexing or additional information related to subject, predicate, or object to be stored on the disk. since we can have any number of temporary dataframe tables in memory, join operations can be performed using these tables to filter the data. in the absence of expensive indexing and additional triple information, storage area can be reduced significantly. apart from this, the compression technique used in apache parquet reduces lot more space than storing in other triple stores. in figure 6, we illustrate the data-storing process. efficiently processing and storing library linked data | sharma, marjit, and biswas 40 https://doi.org/10.6017/ital.v37i3.10177 figure 6. data-storing process in hdfs. the collection of triple instances is loaded into an rdd. at the end, the collection of triple instances is loaded into spark dataframe. 
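the rdf triple view is described above as a simple java class with subject, predicate, and object attributes, fed from a line-based n-triples representation in which the three components are separated by spaces. the sketch below follows that description; the field names, the bean accessors, and the fromline() helper (which a function such as the linestordffunction in the earlier snippet could delegate to) are assumptions, and the parsing rule is the simplified space-separated one described in the text.

```java
import java.io.Serializable;

// "rdf triple view": a plain java bean with subject, predicate, and object fields,
// which spark can use to infer the three-column schema via reflection
public class RDFTriple implements Serializable {
  private String subject;
  private String predicate;
  private String object;

  public RDFTriple() {}

  public RDFTriple(String subject, String predicate, String object) {
    this.subject = subject;
    this.predicate = predicate;
    this.object = object;
  }

  public String getSubject() { return subject; }
  public void setSubject(String subject) { this.subject = subject; }
  public String getPredicate() { return predicate; }
  public void setPredicate(String predicate) { this.predicate = predicate; }
  public String getObject() { return object; }
  public void setObject(String object) { this.object = object; }

  // parse one n-triples line: subject, predicate, and object separated by spaces,
  // where the object may itself contain spaces (for example, a quoted literal)
  public static RDFTriple fromLine(String line) {
    String trimmed = line.trim();
    if (trimmed.endsWith(".")) { // drop the terminating dot of the statement
      trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
    }
    String[] parts = trimmed.split("\\s+", 3);
    return new RDFTriple(parts[0], parts[1], parts[2]);
  }
}
```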
spark dataframes are equivalent to the rdbms tables and support both structured and unstructured data formats. using a single schema, multiple dataframes can be used and can be registered as temporary tables in the memory, where highlevel sql queries can be executed on top of them. here the concept of using multiple dataframes with a single schema is motivated to avoid joins and indexing. in the final step, the spark dataframe is saved into hdfs files in the parquet format. from the parquet file, the data can be loaded back into dataframes in memory and queried using spark sql. fetching data from storage given an rdf dataset d, a sparql query q, and a columnar-schema s, we use s to translate q to q' to perform queries on top of s. here, the answer of query q' on top of s is equal to the answer of q on top of d. query mappings m are used to transform sparql queries into spark sql queries. for querying, first the data is loaded into a spark dataframe from parquet files. to query data using sparql, queries must follow basic graph patterns (bgp). a bgp is a set of triple patterns similar to an rdf triple (s, p, o) where any of s, p, and o can be query variables or literals. bgp is used for matching a triple pattern to an rdf graph. this process is called binding between query variables and rdf terms. the statements listed under the where clause is known as bgp consisting of query patterns. for example, the query “select ?name ?mbox where {?x foaf:name ?name . ?x foaf:mbox ?mbox .}” has two query patterns. to evaluate the query containing two query patterns, one join is required. based on the total number of query patterns, information technology and libraries | september 2018 41 we need one less number of joins. that is, for n number of query patterns we need n-1 joins to resolve the values. figure 7 illustrates the process of query execution. figure 7. process of query execution. evaluation to evaluate the proposed approach we compare the storage size with file-based storage systems such as n-triples files and rdf/xml files. we also compare with standard triple stores such as jena tdb and sesame. the data-storing time is compared with jena tdb, sesame, and parquet, having one, two, and three worker nodes respectively. finally, for the purposes of the experiment, some sparql queries are selected and tested over rdf data stored in parquet format into hdfs. the query performance is tested on the distributed system having one, two, and three worker nodes respectively. in the following subsections, we show the results for each of the above comparisons. datasets for evaluation, we use two datasets. dataset 1 contains bibliographic data from the national library of portugal (nlp) (http://opendata.bnportugal.pt/eng_linked_data.htm). from nlp, we choose the nlp catalogue datasets in rdf/xml formats. the datasets are freely available to reuse and contain metadata information from nlp catalogue, the national bibliographic database, the portuguese national bibliography, and the national digital library. the datasets are available as linked data, which were produced in the context of the european library. the size of the rdf/xml file is 6.46 gb with more than 45 billion rdf triples. http://opendata.bnportugal.pt/eng_linked_data.htm efficiently processing and storing library linked data | sharma, marjit, and biswas 42 https://doi.org/10.6017/ital.v37i3.10177 dataset 2 contains bibliographic data from the british national library (https://www.bl.uk/bibliographic/download.html). 
from the british national bibliography collection we choose the bnb lod books dataset. the datasets are publicly available and contain bibliographic records of different categories, such as books, locations, bibliographic resources, persons, organizations, and agents. the datasets are divided into sixty-seven files in rdf format. however, we combine them into one file in n-triples format to fit the requirement of the large size of the input data. the combined file is 22.52 gb and contains more than 16 billion rdf resources in n-triples format, making it suitable for the proposed approach. from this conversion, we get more than 150 billion rdf triples. figure 8. data storage time for different file formats. figure 9. disk size for different file formats. disk storage figure 8 shows the data-storing time using sesame, jena tdb, and parquet for the above two datasets. data from raw rdf files are stored in jena tdb and sesame. individual files are processed for storing into jena tdb and sesame to avoid memory overflow as jena or sesame models cannot load data at once from the large files. to store data in parquet format we run the program separately on different worker nodes. figure 9 presents the total disk size required for each of these file formats and triple stores for the two datasets. https://www.bl.uk/bibliographic/download.html information technology and libraries | september 2018 43 query performance for testing, the sparql queries are converted manually at this stage. we run some of the selected queries over bibliographic rdf data stored in parquet file format in hdfs. we run the following type of queries on worker nodes 1, 2 and 3 respectively. the queries are listed below: q1) the first query is to fetch the count of rdf triples present in the storage. query: select (count(*) as ?count) where ?s ?p ?o . q2) the second query is to fetch the entire dataset in spo format. it fetches data in the n triples format. query: select * { ?s ?p ?o } . q3) the third query is to fetch resources that belong to books with the subject “english language composition and exercises.” query: select ?s where ?x rdf:type bibo:book . ?x dc:subject . q4) the fourth query is to fetch resources that belong to books with the subject “english language composition and exercises” and creator “palmer frederick.” query: select ?s where ?x rdf:type bibo:book . ?x dc:subject . ?x dc:creator . q5) the fifth query is to fetch objects having predicate dcterms:ispartof. query: select ?name where ?s dcterms:ispartof ?name . figure 10 shows the query response time for the above queries on different worker nodes for two different datasets. the queries are executed in the distributed environment. it shows that increasing the number of worker nodes decreases the query response time. efficiently processing and storing library linked data | sharma, marjit, and biswas 44 https://doi.org/10.6017/ital.v37i3.10177 figure 10. query response time with different numbers of worker nodes. query comparison for comparing query response time, the proposed approach is tested with the first dataset as mentioned above. though at this stage the proposed approach requires further research to be compared with other distributed triple storage systems. also, it requires more worker nodes and larger datasets compatible for parallel processing in the distributed environment. with a smaller setup, it will be hard to analyze the performance of the individual approaches, as they may produce similar results. 
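since the sparql queries are translated into spark sql by hand at this stage, it may help to see what such a translation looks like for a multi-pattern query. taking the two-pattern foaf example from the earlier section (select ?name ?mbox where { ?x foaf:name ?name . ?x foaf:mbox ?mbox }), the n-1 rule gives one self-join over the triple table. the sketch below assumes a temporary view named triples with subject, predicate, and object columns and that predicate values are stored as full uris in angle brackets; it is illustrative rather than the authors' exact translation.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparqlToSparkSql {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("sparql-to-spark-sql").getOrCreate();

    spark.read().parquet("hdfs:///library/rdfdata.parquet").createOrReplaceTempView("triples");

    // two triple patterns sharing ?x translate into one self-join on the subject column
    Dataset<Row> result = spark.sql(
        "SELECT t1.object AS name, t2.object AS mbox "
      + "FROM triples t1 "
      + "JOIN triples t2 ON t1.subject = t2.subject "
      + "WHERE t1.predicate = '<http://xmlns.com/foaf/0.1/name>' "
      + "  AND t2.predicate = '<http://xmlns.com/foaf/0.1/mbox>'");

    result.show();
    spark.stop();
  }
}
```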
we compare the proposed approach with the standard jena tdb solution in a single-node environment. the following sparql queries are tested against dataset 1. prefix rdf: prefix dc: prefix rdau: prefix foaf: q1. select (count(*) as ?count) { ?s ?p ?o } q2. select * { ?s ?p ?o } q3. select ?x where { ?x rdf:type dc:bibliographicresource. } q4. select ?x where { ?x rdf:type . ?x rdau:p60339 'time out lisboa'. } q5. select ?s where {?s dc:ispartof . ?s foaf:page 'http://www.theeuropeanlibrary.org/tel4/record/3000115318515'. } information technology and libraries | september 2018 45 figure 11. query comparison. we are interested in measuring the query response time with the above queries. first, we test with jena tdb. we then test the proposed approach on a single-node environment. we execute the above set of queries multiple times to record the average performance. as mentioned above, no indexing is used in the storage. rdf triples are stored as they appeared in the n-triples file. queries are executed without indexing and are still getting better performance than jena tdb, as shown in figure 11. discussion in this article, we claim that apache spark and column-oriented databases can resolve library big data issues. especially when dealing with rdf data, spark can perform far better than other approaches because of its in-memory processing ability. concerning rdf data storage, the column-oriented database is suitable for storing the large volume of data because of its scalability, fast data loading, and highly efficient data compression and partitioning. a column-oriented database system requires less disk, reducing the storage area. as a proof, we have shown the data storage comparison and the performance of the columnar-storage for rdf data using parquet formats in hdfs. as shown in the results, apache parquet takes much less disk space as compared to other storage systems. also, the data-storing time is relatively very small as compared to others. we observed that the result of query 2 is the entire dataset stored in parquet format. the size of this resultant dataset is 22.52 gb, which is the same as the original size. the same dataset when stored with parquet format is reduced to 2.89 gb. this shows that parquet is a very optimized efficiently processing and storing library linked data | sharma, marjit, and biswas 46 https://doi.org/10.6017/ital.v37i3.10177 storage system that can reduce the storage cost. we have shown the query response time for five different sparql queries on distributed nodes for two different datasets. we believe with better schema for storing rdf triples the proposed approach can be improved, and with the used technologies a fast and reliable triple store can be designed. conclusion and future work librarians all over the globe should give priority to integrating library data with the web to enable cross-domain sharing of library data. to do this, they must pay attention to current trends in big data technologies. because the data-generation rate is increasing in every domain, traditional data processing and storage systems are becoming ineffective because of the scale and complexity of the data. in this article, we present a distributed solution for processing and storing a large volume of library linked data. from the experiment, we observe that the processing of large volume of the data takes significantly less time using the proposed approach. also, the storage area is reduced significantly as compared to other storage systems. 
in the future we plan to optimize the current approach using advanced technologies such as graphx, machine learning tools, and other big -data technologies for even faster data processing, searching, and analyzing. references 1 eric miller et al., “bibliographic framework as a web of data: linked data model and supporting services,” library of congress, november 11, 2012, https://www.loc.gov/bibframe/pdf/marcld-report-11-21-2012.pdf. 2 brighid m. gonzales, “linking libraries to the web: linked data and the future of the bibliographic record,” information technology and libraries 33 no. 4 (2014): 10, https://doi.org/10.6017/ital.v33i4.5631; myung-ja k. han et al., “exposing library holdings metadata in rdf using schema.org semantics,” in international conference on dublin core and metadata applications dc-2015, são paulo, brazil, september 1–4, 2015, pp. 41–49, http://dcevents.dublincore.org/intconf/dc-2015/paper/view/328/363. 3 franck michel et al., “translation of relational and non-relational databases into rdf with xr2rml,” in proceedings of the 11th international conference on web information systems and technologies, lisbon, portugal, 2015, pp. 443–54, https://doi.org/10.5220/0005448304430454; varish mulwad, tim finin, and anupam joshi, “automatically generating government linked data from tables,” working notes of aaai fall symposium on open government knowledge: ai opportunities and challenges 4, no. 3 (2011), https://ebiquity.umbc.edu/_file_directory_/papers/582.pdf; matthew rowe, “data.dcs: converting legacy data into linked data,” ldow 628 (2010), http://ceur-ws.org/vol628/ldow2010_paper01.pdf. 4 virginia schilling, “transforming library metadata into linked library data,” association for library collections and technical services, september 25, 2012, http://www.ala.org/alcts/resources/org/cat/research/linked-data. 5 getaneh alemu et al., “linked data for libraries: benefits of a conceptual shift from libraryspecific record structures to rdf-based data models,” new library world 113, no. 11/12 (2012): 549–70 (2012), https://doi.org/10.1108/03074801211282920. https://www.loc.gov/bibframe/pdf/marcld-report-11-21-2012.pdf https://doi.org/10.6017/ital.v33i4.5631 http://dcevents.dublincore.org/intconf/dc-2015/paper/view/328/363 https://doi.org/10.5220/0005448304430454 https://ebiquity.umbc.edu/_file_directory_/papers/582.pdf http://ceur-ws.org/vol-628/ldow2010_paper01.pdf http://ceur-ws.org/vol-628/ldow2010_paper01.pdf http://www.ala.org/alcts/resources/org/cat/research/linked-data https://doi.org/10.1108/03074801211282920 information technology and libraries | september 2018 47 6 lisa goddard and gillian byrne, “the strongest link: libraries and linked data,” d-lib magazine, 16, no. 11/12 (2010), https://doi.org/10.1045/november2010-byrne. 7 t. nasser and r. s. tariq, “big data challenges,” journal of computer engineering & information technology 4, no. 3 (2015), https://doi.org/10.4172/2324-9307.1000133. 8 alexandru adrian tole, “big data challenges,” database systems journal 4, no. 3 (2013): 31–40, http://dbjournal.ro/archive/13/13_4.pdf. 9 carol jean godby and karen smith-yoshimura, “from records to things: managing the transition from legacy library metadata to linked data,” bulletin of the association for information science and technology 43, no. 2 (2017): 18–23, https://doi.org/10.1002/bul2.2017.1720430209. 
10 corine deliot, “publishing the british national bibliography as linked open data,” catalogue & index, issue 174 (2014): 13–18, http://www.bl.uk/bibliographic/pdfs/publishing_bnb_as_lod.pdf; gustavo candela et al., “migration of a library catalogue into rda linked open data,” semantic web 9, no. 4 (2017): 481–91, https://doi.org/10.3233/sw-170274; martin malmsten, “exposing library data as linked data,” ifla satellite preconference sponsored by the information technology section: emerging trends in 2009, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.181.860&rep=rep1&type=pdf ; keri thompson and joel richard, “moving our data to the semantic web: leveraging a content management system to create the linked open library,” journal of library metadata 13, no. 2– 3 (2013): 290–309, https://doi.org/10.1080/19386389.2013.828551; jason a. clark and scott w. h. young, “linked data is people: building a knowledge graph to reshape the library staff directory,” code4lib journal 36 (2017), http://journal.code4lib.org/articles/12320; martin malmsten, “making a library catalogue part of the semantic web,” humbolt university of berlin, 2008, https://doi.org/10.18452/1260. 11 r. hastings, “linked data in libraries: status and future direction,” computers in libraries 35, no. 9 (2015): 12–28, http://www.infotoday.com/cilmag/nov15/hastings--linked-data-inlibraries.shtml. 12 mirjam keßler, “linked open data of the german national library,” in eco4r workshop lod of dnb, 2010; antoine isaac, robina clayphan, and bernhard haslhofer, “europeana: moving to linked open data,” information standards quarterly 24, no. 2/3 (2012)<>; carol jean godby and ray denenberg, “common ground: exploring compatibilities between the linked data models of the library of congress and oclc,” oclc online computer library center, 2015, https://files.eric.ed.gov/fulltext/ed564824.pdf. 13 chunning wang et al., “exposing library data with big data technology: a review,” 2016 ieee/acis 15th international conference on computer and information science (icis), pp. 1-6, https://doi.org/10.1109/icis.2016.7550937. 14 b. mcbride, “jena: a semantic web toolkit,” ieee internet computing 6, no. 6 (2002): 55–59, https://doi.org/10.1109/mic.2002.1067737; jeen broekstra, arjohn kampman, and frank van https://doi.org/10.1045/november2010-byrne https://doi.org/10.4172/2324-9307.1000133 http://dbjournal.ro/archive/13/13_4.pdf https://doi.org/10.1002/bul2.2017.1720430209 http://www.bl.uk/bibliographic/pdfs/publishing_bnb_as_lod.pdf https://doi.org/10.3233/sw-170274 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.181.860&rep=rep1&type=pdf https://doi.org/10.1080/19386389.2013.828551 http://journal.code4lib.org/articles/12320 https://doi.org/10.18452/1260 http://www.infotoday.com/cilmag/nov15/hastings--linked-data-in-libraries.shtml http://www.infotoday.com/cilmag/nov15/hastings--linked-data-in-libraries.shtml https://files.eric.ed.gov/fulltext/ed564824.pdf https://doi.org/10.1109/icis.2016.7550937 https://doi.org/10.1109/mic.2002.1067737 efficiently processing and storing library linked data | sharma, marjit, and biswas 48 https://doi.org/10.6017/ital.v37i3.10177 harmelen, “sesame: a generic architecture for storing and querying rdf and rdf schema,” international semantic web conference, ed. j. davies, d. fensel, and f. van harmelen (berlin and heidelberg: springer, 2002), https://doi.org/10.1002/0470858060.ch5. 15 “apache jena—tdb,” apache jena, accessed august 22, 2018, https://jena.apache.org/documentation/tdb/. 
16 “sesame (framework),” everipedia, july 15, 2016, https://everipedia.org/wiki/sesame_(framework)/. 17 asim ullah et al., “bookont: a comprehensive book structural ontology for book search and retrieval,” 2016 international conference on frontiers of information technology (fit), 211– 16, https://doi.org/10.1109/fit.2016.046. 18 tom heath and christian bizer, “linked data: evolving the web into a global data space,” synthesis lectures on the semantic web: theory and technology 1, no. 1 (2011): 1–136, https://doi.org/10.2200/s00334ed1v01y201102wbe001. 19 christian bizer et al., “linked data on the web (ldow2008),” proceeding of the 17th international conference on world wide web—www 08, 2008, pp. 1265–66 (2008), https://doi.org/10.1145/1367497.1367760. 20 eric prud and andy seaborne, “sparql query language for rdf,” w3c recommendation, january 15, 2008, https://www.w3.org/tr/rdf-sparql-query/. 21 devin gaffney, “how to use sparql,” datagov wiki rss, last modified april 7, 2010, https://data-gov.tw.rpi.edu/wiki/how_to_use_sparql. 22 tom white, hadoop: the definitive guide (sebastopol, ca: o’reilly media,, 2012), https://www.isical.ac.in/~acmsc/wbda2015/slides/hg/oreilly.hadoop.the.definitive.guide. 3rd.edition.jan.2012.pdf. 23 dhruba borthakur, “the hadoop distributed file system: architecture and design,” hadoop project website, 2007, http://svn.apache.org/repos/asf/hadoop/common/tags/release0.16.3/docs/hdfs_design.pdf; seema maitrey and c. k. jha, “mapreduce: simplified data analysis of big data,” procedia computer science 57 (2015), 563–71 (2015), https://doi.org/10.1016/j.procs.2015.07.392. 24 michael armbrust et al., “spark sql: relational data processing in spark,” in proceedings of the 2015 acm sigmod international conference on management of data (new york: acm, 2015), 1383–94, https://doi.org/10.1145/2723372.2742797. 25 abdul ghaffar shoro and tariq rahim soomro, “big data analysis: apache spark perspective,” global journal of computer science and technology 15, no. 1 (2015), https://globaljournals.org/gjcst_volume15/2-big-data-analysis.pdf. 26 salman salloum et al., “big data analytics on apache spark,” international journal of data science and analytics 1, no. 3–4 (2016): 145–64, https://doi.org/10.1007/s41060-016-0027-9. https://doi.org/10.1002/0470858060.ch5 https://jena.apache.org/documentation/tdb/ https://everipedia.org/wiki/sesame_(framework)/ https://doi.org/10.1109/fit.2016.046 https://doi.org/10.2200/s00334ed1v01y201102wbe001 https://doi.org/10.1145/1367497.1367760 https://www.w3.org/tr/rdf-sparql-query/ https://data-gov.tw.rpi.edu/wiki/how_to_use_sparql https://www.isical.ac.in/~acmsc/wbda2015/slides/hg/oreilly.hadoop.the.definitive.guide.3rd.edition.jan.2012.pdf https://www.isical.ac.in/~acmsc/wbda2015/slides/hg/oreilly.hadoop.the.definitive.guide.3rd.edition.jan.2012.pdf http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.16.3/docs/hdfs_design.pdf http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.16.3/docs/hdfs_design.pdf https://doi.org/10.1016/j.procs.2015.07.392 https://doi.org/10.1145/2723372.2742797 https://globaljournals.org/gjcst_volume15/2-big-data-analysis.pdf https://doi.org/10.1007/s41060-016-0027-9 information technology and libraries | september 2018 49 27 daniel j. abadi, samuel r. madden, and nabil hachem, “column-stores vs. row-stores: how different are they really?,” in proceedings of the 2008 acm sigmod international conference on management of data (new york: acm, 2008), 967–80, https://doi.org/10.1145/1376616.1376712. 
28 deepak vohra, "apache parquet," in practical hadoop ecosystem (berkeley, ca: apress, 2016), 325–35, https://doi.org/10.1007/978-1-4842-2199-0_8.
29 "google/snappy," github, january 4, 2018, https://github.com/google/snappy.
30 jean-loup gailly and mark adler, "zlib compression library," 2004, https://www.repository.cam.ac.uk/bitstream/handle/1810/3486/rfc1951.txt?sequence=4.
31 sergey melnik et al., "dremel: interactive analysis of web-scale datasets," proceedings of the vldb endowment 3, no. 1–2 (2010): 330–39, https://doi.org/10.14778/1920841.1920886.
32 marcel kornacker et al., "impala: a modern, open-source sql engine for hadoop," in proceedings of the 7th biennial conference on innovative data systems research, asilomar, california, january 4–7, 2015, http://www.inf.ufpr.br/eduardo/ensino/ci763/papers/cidr15_paper28.pdf.

is this a geolibrary? a case of the idaho geospatial data center
maria anna jankowska and piotr jankowski
information technology and libraries | march 2000

the article presents the idaho geospatial data center (igdc), a digital library of public-domain geographic data for the state of idaho. the design and implementation of igdc are introduced as part of the larger context of a geolibrary model. the article presents methodology and tools used to build igdc with the focus on a geolibrary map browser. the use of igdc is evaluated from the perspective of access and demand for geographic data. finally, the article offers recommendations for future development of geospatial data centers.

in the era of integrated transnational economies, demand for fast and easy access to information has become one of the great challenges faced by the traditional repositories of information—libraries. globalization and the growth of market-based economies have brought about, faster than ever before, acquisition and dissemination of data, and the increasing demand for open access to information, unrestricted by time and location. these demands are mobilizing libraries to adopt digital information technologies and create new methods of cataloging, storing, and disseminating information in digital formats. libraries encounter new challenges constantly. participation in the global information infrastructure requires them to support public demand for new information services, to help society in the process of self-education, and to promote the internet as a tool for sharing information. these tasks are becoming easier to accomplish thanks to the growing number of digital libraries.
since 1994, when the digital library initiative originated as part of the national information infrastructure program, the internet has accommodated many digital libraries with spatial data content. for example, the electronic environmental library project at the university of california, berkeley (http://elib.cs.berkeley.edu/) provides botanical and geographic data; the university of michigan digital library teaching and learning project (www.si.umich.edu/umdl/) focuses on earth and space sciences; carnegie mellon's informedia digital video library (www.informedia.cs.cmu.edu) distributes digital video, audio, and images with text; and the alexandria digital library at santa barbara (http://alexandria.sdc.ucsb.edu/) provides geographically referenced information. the alexandria digital library is of special interest in this article because it implements a model of a geolibrary. a geolibrary stores georeferenced information searchable by geographic location in addition to traditional searching methods such as by author, title, and subject. the purpose of this article is to present the idaho geospatial data center (igdc) in the larger context of a geolibrary model. igdc is a digital library of public-domain geographic and statistical data for the state of idaho. the article discusses methodology and tools used to build igdc and contrasts its capabilities with a geolibrary model. the usage of igdc is evaluated from the perspective of access and demand for geographic data. finally, the article offers recommendations for future development of geospatial data centers.

maria anna jankowska (majanko@uidaho.edu) is associate network resources librarian, university of idaho library, and piotr jankowski (piotrj@uidaho.edu) is associate professor, department of geography, university of idaho, moscow, idaho.

geographic information systems for public services

terms such as digital, electronic, virtual, or image libraries have existed long enough to inspire diverse interpretations. the broad definition by covi and king concentrates on the main objective of digital libraries, which is the collection of electronic resources and services for the delivery of materials in different formats.1 the common motivation for initiatives leading to the development of digital libraries is to allow conventional libraries to move beyond their traditional roles of gathering, selecting, organizing, accessing, and preserving information. digital libraries provide new tools allowing their users not only to access the existing data but also to create new information. the creation of new information using the existing data sources is essential to the very idea of the digital library. since the information in a digital library exists in virtual form, it can be manipulated instantaneously by computer-based information-processing tools. this is not possible using traditional information media (e.g., paper, microfilm), where the information must first be transferred from non-digital into digital format. since late 1994, when the u.s. national science foundation founded the alexandria digital library project, the number of internet sites devoted to spatially referenced information has grown dramatically. today, it would require a serious expenditure of time and effort to visit all geographic data sites created by state agencies, universities, and commercial organizations.
in 1997 karl musser wrote, "there are now more than 140 sites featuring interactive maps, most of which have been created in the last two years."2 this incredible boom in publishing spatial data is possible thanks to geographic information system (gis) technology and data development efforts brought about by the rapidly increasing use of gis. this new technology provides its users with capabilities to automate, search, query, manage, and analyze geographic data using the methods of spatial analysis supported by data visualization. traditionally, geographic data were presented on maps considered as public assets. according to a norwegian survey, the aggregate benefit accrued from using maps was three times the total cost of their production, even though maps provided only static information.3 today, the conventional distribution of geographic data on printed maps has become less efficient than distributing them in digital format through wide-area data networks. this happened largely due to gis's ability to separate data storage from data presentation. as a result, data can be presented in a dynamic way, according to users' needs. often gis is termed a "data mixing system" because it can process data from different sources and formats such as vector-format maps with full topological and attribute information, digital images of scanned maps and photos, satellite data, video data, text data, tabular data, and databases.4 all of these data types provide a rich informational infrastructure about locations and properties of entities and phenomena distributed in terrestrial and subterrestrial space. the definition of gis changes according to the discipline using it. gis can be used as a map-making machine, a 3-d visualization tool, and as an analytical, planning, collaboration, and business information management tool. today, it is hard to find a planning agency, city engineering department, or utility company (not to mention individual internet users) that has not used digital maps. this is why the number of users seeking spatial data in digital format has increased so dramatically. data discovery can be, for gis users, the most time-consuming part of using the technology.5 as a result, libraries are faced with the growing demand for services that help discover, retrieve, and manipulate spatial data. the web greatly improved the availability and accessibility of spatial data but, at the same time, stimulated public interest in using geographic information. the continuing migration to popular operating systems (i.e., the microsoft windows family) and the adoption of their common functionality has brought gis software to many desktops. tools such as arcview gis from environmental systems research institute, inc. (esri, www.esri.com) or mapinfo from mapinfo corporation (www.mapinfo.com) have become popular gis desktop systems. new software tools such as arcexplorer, released by esri, are focused on making gis more accessible, simpler, and available for use by the public. by taking advantage of the popularity of the web, attempts are being made to gain a wider acceptance of gis.
in the wake of the simplification of gis tools and improved access to spatial data, an exciting new area of gis use has recently emerged: public participation gis.6 public participation gis by definition is a pluralistic, inclusive, and nondiscriminatory tool that focuses on the possibility of reducing the marginalization of societies by means of introducing geographic information operable on a local level.7 it promotes an understanding of spatial problems by those who are most likely to be affected by the implementation of problem solutions, and it encourages transfer of control and knowledge to these parties. this approach leads to a broader use of gis tools and spatial data and creates new challenges for libraries storing and serving geographic data in digital formats. broadening the use of data and gis tools requires attention to data access. traditional libraries have often fulfilled the crucial role of being an impartial information provider for all parties involved in public decision-making processes. will they be capable of serving society in this capacity in the digital age?

geolibrary as a repository of georeferenced information

according to brandon plewe, the user of spatial data can choose among seven types of distributed geographic information services available on the internet.8 they range from raw data download, through static map display, metadata search, dynamic map browsing, data processing, and web-based gis query and analysis, to net-savvy gis software. yet another important new category of geographic data service that can be added to this list is the geolibrary. goodchild defines a geolibrary as a library filled with georeferenced information where the primary basis of representation and retrieval is the spatial footprint that determines location by geographic coordinates. "the footprints can be precise, when they refer to areas with precise boundaries, or they can be fuzzy when the limits of the area are unclear."9 according to buttenfield, "the value of a geolibrary is that catalogs and other indexing tools can be used to attach explicit locational information to implicit or fuzzy requests, and once accomplished, can provide links to specific books, maps, photographs, and other materials."10 a geolibrary is distinguished from a traditional library in being fully electronic, with digital tools to access digital catalogs and indexes. it is anticipated that most of the information is archived in digital form. the value of a geolibrary is that it can be more than a traditional, physical library in electronic form.11

since its introduction, the concept of a geolibrary has been synonymous with the alexandria digital library (adl) project. adl was once defined as an internet-based archive providing comprehensive browsing and retrieval services for maps, images, and spatial information.12 a more recent definition characterizes adl as a geolibrary where a primary attribute of collection objects is their location on earth, represented by geographic footprints. a footprint is the latitude and longitude values that represent a point, a bounding box, a linear feature, or a complete polygonal boundary.13 according to goodchild (1998), a geolibrary's components include:

• the browser: a specialized software application running on the user's computer and providing access to the geolibrary via a computer network.
• the basemap: a geographic frame of reference for the browser's searches. a basemap provides the image of an area corresponding to the geographical extent of the geolibrary collection. for a worldwide collection this would be an image of the earth; for a statewide collection this could be the image of a state. the basemap may be potentially large, in which case it is more advantageous to include it in the browser than to download it from a geolibrary server each time the geolibrary is accessed.
• the gazetteer: an index that links place names to the map. the gazetteer allows geographic searches by place name instead of by area.
• server catalogs: collection catalogs maintained on distributed computer servers. the servers can be accessed over a network with the browser, using basic client-server architecture.
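to make the footprint and gazetteer ideas concrete, the following minimal sketch (written in python purely for illustration; it is not part of the adl or igdc software, and the place names, coordinates, and function names are invented) models a catalog entry whose footprint is a bounding box and shows how a gazetteer could turn an implicit place-name request into an explicit footprint:

```python
from dataclasses import dataclass

@dataclass
class Footprint:
    """rectangular spatial footprint in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

@dataclass
class CatalogEntry:
    """a georeferenced collection object: descriptive metadata plus a footprint."""
    title: str
    footprint: Footprint

# hypothetical gazetteer linking place names to footprints; coordinates are illustrative only
GAZETTEER = {
    "latah county, idaho": Footprint(west=-117.04, south=46.42, east=-116.33, north=47.00),
    "moscow, idaho": Footprint(west=-117.03, south=46.71, east=-116.96, north=46.76),
}

def footprint_for(place_name: str) -> Footprint:
    """turn a fuzzy request ("material about moscow, idaho") into an explicit footprint."""
    return GAZETTEER[place_name.lower()]
```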
the value of a geolibrary lies in providing open access to a multitude of information with geographic footprints, regardless of the storage medium. because all information in a digital library is stored using the same digital medium, traditional problems of physical storage, accessibility, portability, and concurrent use (e.g., many patrons wanting to view the one and only copy of a map) do not exist.

idaho geospatial data center

in 1996, inspired by the adl project, a team of geographers, geologists, and librarians started to work on a digital library of public-domain geographic data for the state of idaho. the main goal of the project was the development of a geographic digital data repository accessible through a flexible browsing tool. the project was funded by a grant from the idaho board of education's technology incentive program. the project resulted in the creation of the idaho geospatial data center (igdc, http://geolibrary.uidaho.edu). the first in the state of idaho, this digital library comprises a database containing geospatial datasets and geolibrary software that facilitates access, browsing, and retrieval of data in popular gis data formats, including digital line graph (dlg), digital raster graphics (drg), usgs digital elevation model (dem), and u.s. bureau of the census tiger boundary files for the state of idaho. the site also provides an interactive visual analysis of selected demographic and economic data for idaho counties. additionally, the site provides interactive links to other idaho and national spatial data repositories.

the key component of the library is the geolibrary software. the name "geolibrary" here is not synonymous with the model of a geolibrary defined by goodchild (1998); it was adopted rather as a reference to the geolibrary browser, one of the components of a geolibrary. the geolibrary browser (gl) supports online retrieval of spatial information related to the state of idaho. it was implemented using microsoft visual basic 5.0/6.0 and esri mapobjects technology. the software allows users to query an area of interest using a search based on map selection, as well as selection by area name (based on the usgs 7.5-minute quad naming convention). queries return gis data including dems, dlgs, drgs, and tiger files. queries are intended both for professionals seeking gis-format data and for nonprofessionals seeking topographic reference maps in the drg format. the interface of gl consists of three panels resembling the microsoft outlook user interface. our intent in designing the interface was to have panels that would be used in the following order.
first, the map panel is used to explore the geographic coverage of the geolibrary and to select the area of interest. next, the query panel is used to execute a query, and finally the results panel allows the user to analyze results and to download spatial data. users can also take a shortcut, going directly to the query panel and typing their query. both approaches result in the output being displayed as a list of files available for download from participating servers.

the map panel (figure 1) includes a navigable map of idaho, a vertical command toolbar, and a map finder tool. the command toolbar allows the user to zoom in, zoom out, pan the map, identify by name the entities visible on the map canvas, and select a geographic area of interest. geographic entity name identification was implemented as a dynamic feature whereby the name of the entity changes as the user moves the mouse over the map. spatial selection provides a tool to select a rectangular area of interest directly on the map canvas. the map finder provides additional means to simplify exploration of the map: the user can select a county or a quad name and zoom in on the selected geographic unit.

figure 1. map panel. the vertical toolbar provides zooming, panning, labeling, and simple feature querying capabilities. the map finder allows finding and selecting an area by county or usgs quad name. the screen copy here presents the selection of latah county in idaho.

the query panel (figure 2) allows the user to perform a query based either on the selection made on the map or on a new selection using one of the available query tools (figure 3). in the latter case, the user can enter geographic coordinates (in decimal degrees) defining the area of interest. this approach is equivalent to selecting a rectangular area directly on the map, and it will return all data files that spatially intersect with the selected area. optionally, the user can handpick quads of interest from a list. finally, a name can be entered to execute a more flexible query; for instance, a search containing the word "moscow" returns spatial data related to the three quads containing "moscow" within their names. the query is executed when the user presses the query button, and after the results are received the application automatically switches to the results panel.

figure 2. query panel. the interface was set to query the spatial selection from the map panel.

figure 3. query panel. the query is based on the selection of usgs quads. optionally, the user can enter geographic coordinates of the area or a text string to search.
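the spatial part of such a query amounts to a bounding-box intersection test between the user's selected area and the footprint of each cataloged file. the following sketch, a simplification in python under assumed data (the quad names, coordinates, and helper names are invented for illustration and do not reproduce the gl implementation), shows the idea:

```python
from typing import NamedTuple, Optional

class BBox(NamedTuple):
    """bounding box in decimal degrees: west, south, east, north."""
    west: float
    south: float
    east: float
    north: float

class DataFile(NamedTuple):
    quad_name: str   # usgs 7.5-minute quad name
    file_type: str   # e.g., "dem", "dlg", "drg", "tiger"
    bbox: BBox

def overlaps(a: BBox, b: BBox) -> bool:
    # boxes intersect unless one lies entirely to one side of the other
    return not (a.east < b.west or b.east < a.west or
                a.north < b.south or b.north < a.south)

def query(catalog: list[DataFile], area: BBox, name: Optional[str] = None) -> list[DataFile]:
    """return files whose footprint intersects the selected area,
    optionally restricted to quads whose name contains the given text."""
    hits = [f for f in catalog if overlaps(f.bbox, area)]
    if name:
        hits = [f for f in hits if name.lower() in f.quad_name.lower()]
    return hits

# illustrative catalog entries; coordinates are invented for the example
catalog = [
    DataFile("moscow east", "drg", BBox(-117.000, 46.700, -116.875, 46.800)),
    DataFile("moscow west", "dem", BBox(-117.125, 46.700, -117.000, 46.800)),
]
print(query(catalog, BBox(-117.2, 46.6, -116.9, 46.9), name="moscow"))
```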
the results panel shows the outcome of the query and includes important information about the data files: their size, type, projection, scale, the name of the server providing the data, and the access path (figure 4). based on this information, the user has the option of manually connecting to the server using the ftp protocol and retrieving the selected files. a much more convenient approach, however, is to rely on the gl software to retrieve the files automatically through the software interface. as an option, the result of the query can also be exported to a plain html document that contains links to all listed files. this feature can be very useful when many files have been selected and the user has slow or limited-time internet access: the user can open the saved list of files in a web browser and download individual files as needed, without having to download all the files at once and tie up the internet connection for a long period of time.

the results panel also provides a flexible way to review and organize the outcomes of queries before commencing the download. one can sort files by name, size, scale, projection, and server name. this feature may be useful if the user decides to retrieve data of only one type (e.g., dems) or one scale, or when the user prefers to connect only to a specific server. in addition, individual records as well as entire file types can be deselected to prevent files from being downloaded, and the user can remove selected files to scale down the set of data in the list.

figure 4. the results panel. results of a query can be sorted; individual items can be removed from the list or deselected to prevent them from being downloaded.

one of the most important assets of the gl browser is that all of the user activities described up to this point, with the exception of file download, take place entirely on the client side without any network traffic. in fact, area and file selection as well as queries do not require an active internet connection. map exploration is based on vector-format maps contained in the gl software, and queries are run against a local database. such an approach limits bandwidth consumption and unnecessary network traffic; an internet connection is necessary only to retrieve the selected files.

the vulnerability of the client-side approach to data query is that it can leave the user with a potentially outdated local database. to prevent this problem, gl is equipped with a database synchronization mechanism that allows users to keep up with server database updates. the client-side database contained in the gl software, which mirrors the schema of the server database, can be synchronized automatically or at the user's request. in either case, the gl client contacts the database synchronizer on the server side, which handles all necessary processes. since the synchronization is limited to database record updates, network traffic is kept low, making gl suitable for limited internet connections.

igdc is an open solution. new local datasets can be added or removed, making the collection easily adaptable to different geographical areas. in addition, datasets can physically reside on multiple servers, taking full advantage of the internet's distributed nature.
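the article does not document the actual igdc synchronization protocol, so the following is only a hedged sketch of the general pattern it describes: the client asks the server for records changed since its last synchronization and merges only those into its local mirror. the endpoint, payload fields, and record format below are assumptions made for illustration.

```python
import json
import urllib.request

# hypothetical endpoint; not the real igdc server interface
SYNC_URL = "https://example.org/igdc/sync"

def synchronize(local_records: dict[str, dict], last_sync: str) -> str:
    """fetch only catalog records changed since the last synchronization and
    merge them into the local mirror of the server database schema."""
    with urllib.request.urlopen(f"{SYNC_URL}?since={last_sync}") as resp:
        payload = json.load(resp)             # assumed shape: {"timestamp", "changed", "deleted"}
    for record in payload["changed"]:
        local_records[record["id"]] = record  # insert or update a changed record
    for record_id in payload["deleted"]:
        local_records.pop(record_id, None)    # drop records removed on the server
    return payload["timestamp"]               # remember for the next incremental sync
```

because only changed records travel over the network, the traffic stays small, which matches the article's point about suitability for limited internet connections.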
evaluation of igdc use

geospatial information is among the most common public information needs; almost 80 percent of all information is geographic in nature. published research reflecting those needs and the role of libraries in resolving them is not extensive. the efforts of federal, state, and local agencies collecting digital geospatial data and the growth of gis created an interest in the role of libraries as repositories of geospatial data.14 the main obstacle to providing access to digital spatial information is its complexity. this is why a user-friendly interface is critical for presenting spatially referenced information.15 the igdc has been a first attempt at creating a user-friendly interface in the form of a map-based data browser allowing users to access and retrieve geographic datasets about idaho. in order to track and evaluate the use of geospatial data, webtrends software was installed on the igdc server. the webtrends software produces customized web log statistics and allows tracking information on traffic and dataset dissemination.

during a one-year timeframe the number of successful hits was more than twenty-five thousand. almost 40 percent of users came from the .com domain, 35 percent were .net domain users, 15 percent were .org, and 10 percent were .edu users (figure 5). tracking the geographic origin of users by state, the largest number of users came from virginia, followed by washington, california, ohio, and idaho. the high number of users from virginia can be explained by the linking of the igdc site to one of the most popular geospatial data sites in the country, the united states geological survey (usgs) site. eighty-four percent of user sessions were from the united states; the rest originated from sweden, canada, and germany. the average number of hits per day on weekdays was around one hundred. the most popular retrievable information was digital raster graphics (drg) data, which present scanned images of usgs standard-series topographic maps at 1:24,000 scale. digital elevation models (dem) and digital line graphs (dlg) were less popular, and the tiger boundary files for the state of idaho were in small demand. the popularity of drg-format maps and the fact that most users accessed igdc via the usgs web site make it plausible to speculate that most of the users were non-gis specialists interested in general reference geographic information about idaho, including topography and basic land use information.

figure 5. distribution of igdc users in percent by origin domain (.com 40 percent, .net 35 percent, .org 15 percent, .edu 10 percent).

since the opening of igdc for public use (april 1998), the geolibrary map browser was downloaded 1,352 times. the software proved to be relatively easy for the public to use. out of forty-four bug reports and user questions submitted to igdc, most were concerned with filling out the software registration form and not with software failure. the igdc project spurred an interest in geographic information among students, faculty, and librarians at the university of idaho. in direct response to this interest, the university of idaho library installed a new dedicated computer at the reference desk with geolibrary software to access, view, and retrieve igdc data.
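as a rough illustration of the kind of log summary that a commercial package such as webtrends produces, the sketch below tallies hits per top-level domain from a web access log. it is not webtrends and not the igdc setup; it assumes a common-log-format file with resolved host names, and the path and regex are placeholders.

```python
import re
from collections import Counter

# matches the leading host field of a common-log-format line (an assumption about the log layout)
LOG_LINE = re.compile(r"^(?P<host>\S+) ")

def domain_distribution(log_path: str) -> dict[str, float]:
    """return the percentage of hits per top-level domain (.com, .net, .org, .edu, ...)."""
    counts = Counter()
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            match = LOG_LINE.match(line)
            if not match:
                continue
            host = match.group("host")
            tld = host.rsplit(".", 1)[-1].lower()  # e.g., "uidaho.edu" -> "edu"
            counts["." + tld] += 1
    total = sum(counts.values()) or 1
    return {domain: 100.0 * n / total for domain, n in counts.items()}
```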
conclusion

idaho geospatial data center is the first geospatial digital library for the state of idaho. it does not fulfill all requirements of the geolibrary model proposed by goodchild and others. the igdc has only two components of the geolibrary model: the geolibrary map browser and the basemap. the main difference between the geolibrary map browser and the web-based browser solutions adopted by other spatial repositories is its client-side approach to geospatial data query and selection. spatial data query is done locally on the user's machine, using the library database schema contained in the geolibrary map browser. this saves time by eliminating client-server communication delays during data searches, gives the user an experience of almost instantaneous response to queries, and reduces network communication to the data download time.

in comparison with the geolibrary model, igdc is missing the gazetteer. this component could help improve the ease of user navigation through a geospatial data collection. another useful component would be online mapping and spatial data visualization services. the idea of such services is to provide the user with a simple-to-operate mapping tool for visualizing and exploring the results of user-run queries. one such service, currently under implementation at igdc, is thematic mapping of economic and demographic variables for idaho using descartes software.16 descartes is a knowledge-based system supporting users in the design and utilization of thematic maps. the knowledge base incorporates domain-independent visualization rules determining which map presentation technique to employ in response to the user's selection of variables. an intelligent map generator such as descartes can enhance the utility of a geolibrary by providing tools to transform georeferenced data into information.

references and notes
1. l. covi and r. king, "organizational dimensions of effective digital library use: closed rational and open natural systems models," journal of the american society for information science 47, no. 9 (1996): 697.
2. k. musser, "interactive mapping on the world wide web" (1997), accessed march 6, 2000, www.min.net/~boggan/mapping/thesis.htm.
3. t. bernhardsen, geographic information systems (arendal, norway: viak it and norwegian mapping authority, 1992), 2.
4. ibid., 4.
5. j. stone, "stocking your gis data library," issues in science and technology librarianship (winter 1999), accessed march 6, 2000, www.library.ucsb.edu/istl/99-winter/article1.html.
6. p. schroeder, "gis in public participation settings" (1997), accessed june 2, 1999, www.spatial.maine.edu/ucgis/testproc/schroeder/ucgisdft.htm.
7. w. j. craig and others, "empowerment, marginalization, and public participation gis," report of a specialist meeting held under the auspices of the varenius project, santa barbara, california, oct. 15-17, 1998, ncgia, uc santa barbara.
8. b. plewe, gis online: information retrieval, mapping, and the internet (santa fe, n.m.: onword pr., 1997), 71-91.
9. m. f. goodchild, "the geolibrary," in innovations in gis 5: selected papers from the fifth national conference on gis research uk (gisruk), ed. s. carver (london: taylor and francis, 1998), 59, accessed march 6, 2000, www.geog.ucsb.edu/~good/geolibrary.html.
10. b. p. buttenfield, "making the case for distributed geolibraries" (1998), accessed march 6, 2000, www.nap.edu/html/geolibraries/app_b.html.
11. ibid.
12. m. rock, "monitoring user navigation through the alexandria digital library" (master's thesis abstract, 1998), accessed march 6, 2000, http://greenwich.colorado.edu/projects/rockm.htm.
13. l. l. hill and others, "geographic names: the implementation of a gazetteer in a georeferenced digital library," d-lib magazine 5, no. 1 (1999),
accessed march 6, 2000, www.dlib.org/dlib/january99/hill/01hill.html.
14. m. gluck and others, "public librarians' views of the public's geospatial information needs," library quarterly 66, no. 4 (1996): 409.
15. b. p. buttenfield, "user evaluation for the alexandria digital library project" (1995), accessed march 6, 2000, http://edfu.lis.uiuc.edu/allerton/95/s2/buttenfield.html.
16. g. andrienko and others, "thematic mapping in the internet: exploring census data with descartes," in proceedings of telegeo '99, first international workshop on telegeoprocessing, lyon, may 6-7, ed. r. laurini (seiten, france: claude bernard univ. of lyon, 1999), 138-45.

president's message: for the record
aimee fifarek

aimee fifarek (aimee.fifarek@phoenix.gov) is lita president 2016-17 and deputy director for customer support, it and digital initiatives at phoenix public library, phoenix, az.

for a long time, i've had an idea that when a new president of the united states is elected, sometime after he's sworn in, amid all of the briefings, a wizened old man sits down with him to have the talk. in my imagination the messenger is some cross between the templar knight from indiana jones and the last crusade and the international express man from neil gaiman and terry pratchett's good omens: officious yet wise. he tells the new president the why of it all, the real reasons why important things have happened in the ways they have, making all the decisions that seemed so wrong now seem inevitable. and, probably not for the first time, the new president thinks to himself, "what have i gotten myself into?"

this is clearly reflective of my desire for there to be, if not a reason for everything that happens, then at least some record of it all that can be reviewed, synthesized, and mined for meaning by future leaders. it's the librarian in me, i suppose. although being lita president bears absolutely no resemblance to being president of the united states, i have been thinking about this little imagining of mine a lot lately. this is probably because, now that i am midway through my presidential cycle (vice president, president, past president), i realize how much of what i've done has been marked by the absence of such a record. i did not receive a "how to be lita president" manual along with my gavel, and no one gave me the lita version of the talk. the one person who could have done it, lita executive director jenny levine, was as new to her position as i was to mine, so we have learned together and asked many questions of those around us with more experience.

we are in the midst of election season, and will soon have a new president-elect. bohyun kim and david lee king are both excellent candidates (http://litablog.org/2017/01/meet-your-candidates-for-the-2017-lita-election/); those of you who have not yet voted have a difficult choice. in order to make a little progress toward developing that how-to guide, i thought i'd document a few of the things i've learned since becoming lita president.

being lita president also means being president of a division of the american library association.

when i was elected, i expected to manage the business of the library and information technology association: board meetings, committee appointments, presidential programs, and lita forums. seeing the board complete the lita strategic plan (http://www.ala.org/lita/about/strategic) was a great accomplishment at this level.
while it’s possible for a division leader to have minimal interactions with “big ala” during their aimee fifarek (aimee.fifarek@phoenix.gov) is lita president 2016-17 and deputy director for customer support, it and digital initiatives at phoenix public library, phoenix, az. president’s message | fifarek https://doi.org/10.6017/ital.v36i1.9808 2 term and still be successful, my priority for my presidential year—increasing value litans receive from membership, especially those who are not able to attend in-person conferences—meant that i needed to learn more about how ala works. after a year and a half, i have a much better understanding of the association’s budgeting, publishing, and technology practices, and how all of these are impacted by declines in membership and decreasing revenues. future lita leaders are going to need to continue to be engaged at the larger organizational level if we are to be able to use lita’s technological knowledge and expertise to support ala’s efforts to maximize efficiency while minimizing costs. being lita president means speaking not just to, but for, an incredibly diverse community. my plan when i became lita president was to blog on a more regular basis. however, i didn’t expect some of my first communications to be about a mass shooting in dallas (in advance of the forum in ft. worth) or working with the board to craft a statement on inclusivity after the us presidential election. the proverbial curse “may you live in interesting times” has certainly been true this year. having to speak to the lita community about those issues made me acutely aware of my responsibility to adequately represent you when we’ve also been asked to weigh in on technology policy issues at the federal level such as the call for increased gun violence research and rescinding isp regulations on privacy protection. the decision by the board to include advocacy and information policy as a primary focus for the strategic plan was certainly prescient. we are fortunate that our president elect, andromeda yelton, is both well-versed in the issues and able to speak eloquently to them1. being lita president means being part of more than one team i’m continually amazed at the hard work and dedication that board members (http://www.ala.org/lita/about/board), committee and interest group chairs (http://www.ala.org/lita/about/committees/chairs), and anyone who fits into involvement our member persona (http://litablog.org/2017/03/who-are-lita-members-lita-personas/). the success of lita as an organization is entirely due to the time and passion of this team. but when you become lita president-elect you get a new team—the other division vice presidents. this cohort travels to ala hq in chicago in october after they are elected to meet each other and the incoming ala president and learn about the structure of ala. i have learned much from the other presidents this year, and we have had a number of truly productive discussions about how the divisions can collaborate and learn from each other to more effectively serve our members. lita is directly benefitting from the expertise of the other groups and they are in turn looking to us for both our technical skillset and the successes we’ve had over 50-years as an association. information technologies and libraries | march 1017 3 consider this a new preface to the how to be lita president manual. 
i hope that my successors find it useful, and that it will serve as an inspiration for any litans out there who are thinking about putting their name on the ballot in future years. it has been a marvelous and educational experience. and the gavel is pretty cool, too.

references
1. "making ala great again," publishers weekly, feb. 17, 2017, http://www.publishersweekly.com/pw/by-topic/industry-news/libraries/article/72814making-ala-great-again.html.

the internet as a source of academic research information: findings of two pilot studies
harry m. kibirige and lisa depalo

as a source of serious subject-oriented information, the internet has been a powerful feature in the information arena since its inception in the last quarter of the twentieth century. it was, however, initially restricted to government contractors or major research universities operating under the aegis of the advanced research projects agency network (arpanet).1 in the 1990s, the content and use of the internet expanded to include mundane subjects covered in business, industry, education, government, entertainment, and a host of other areas. it has become a magnanimous network of networks whose size, impact, and content often elude serious scholarly measurement.2 opening the internet to common usage literally opened the flood gates of what has come to be known as the information superhighway. currently, there is virtually no subject that cannot be found on the internet in one form or another.

there is both hype and reality as to what the internet can generate in terms of substantive information. in their daily pursuits of information, information professionals as well as end users are challenged with regard to what their expectations are and what actually is delivered in terms of tangible information products and services on the net. academic users are a special breed in that both faculty and students have specific topics covered in their courses of study or faculty research agendas for which they need information. the use of electronic resources found on and off the internet is becoming increasingly vital for education and training in academic environments.3 five basic elements often are required in the electronic resources that academic information seekers desire: accessibility, timeliness, readability, relevance, and authority. the internet excels in the first three, but depending on how and from where the information is gathered, it may not be so reliable with regard to the last two.

the two pilot studies discussed in this article involved four academic institutions and were conducted by the researchers approximately twelve months apart. one, covering two institutions, was done in the fall of 1997; it was replicated with another two institutions in the spring of 1999. the main goal of the studies was to investigate how academic users perceive search engines and subject-oriented databases as sources of topical information. the basic underlying question was, "when faced with a topical subject, what is the users' predominant recourse: online databases (which may include cd-rom or dvd databases) or search engines?"
our results indicated a predominant preference for search engines for the group taken as a whole. further analysis using nonparametric correlation coefficients (kendall's tau_b and spearman's rho), however, indicated that those who use the internet monthly or weekly had high correlations with online databases as their preferred predominant information sources. on the other hand, daily users tended to have high correlations with search engines as their preferred predominant information sources.

information-seeking behavior of academic users

over the years, several studies have been conducted on how users seek and find information relevant to their needs. for the purposes of our analysis three categories will be used: the undergraduate, the graduate, and the post-doctoral research faculty user. while the levels at which the needed information may be articulated and packaged may differ, the five basic required elements in the electronic information resources needed by academics, already identified, remain the same. the internet has, however, added another dimension to the information-seeking behavior of all academics in that much of the needed information, if and when found, has a higher chance of appearing as full text (sometimes defined as viewdata) on the internet.4 with viewdata the end user has the ultimate in information seeking and acquisition in that he or she will get text, images, and sound in one, two, or more resources on the net. the process also may be accomplished in one sitting or search session at the computer terminal. the internet thus may be more likely to generate viewdata than conventional databases, which have for a long time been associated with the less desirable citations. in many instances, and with a little persistence, it can provide the analogy of "one-stop shopping" whereby a user can get the viewdata needed for a topic. this may explain the tendency to try the internet first as a potential information source, even for experienced searchers. to be effective, such searching needs experience and a lot of patience while sifting through pages of useless verbiage, as the information sources often are garnered from several sites.

harry m. kibirige is associate professor at queens college, city university of new york, and lisa depalo is assistant professor at college of staten island, city university of new york.

categories of academic users have varying levels of expertise in information seeking and have different characteristics in their information-seeking behavior.

undergraduate users

undergraduates are at the lowest point on the totem pole with regard to expertise in information seeking at any academic institution. there is more to the information needs of undergraduate students than can be revealed during the reference interview process. there are the pervading needs that the information age has created, which can be met only by those who possess critical thinking skills. critical thinking skills are imperative to much more than completing college-level assignments; they are also imperative to surviving in the job market once students graduate.
this premise was set forth in the 1992 united states government report from the department of labor and the secretary's commission on achieving necessary skills (scans) entitled skills and tasks for jobs: a scans report for america 2000. this report defined two types of skills needed to excel in the workplace and labeled them competencies and foundations. effective reference and instruction services can help students develop the critical thinking skills needed to meet the information competency in particular, since it pertains to one who "acquires and evaluates information, organizes and maintains information, interprets and communicates information, and uses computers to process information."5 acquiring and evaluating information can be particularly difficult for undergraduates in the information age, since one is bombarded with data in print and electronic formats. one can easily determine the reliability of print sources by looking at the name of the author, editor, or publisher. the internet, however, has become a popular choice for students who need to do research. it has gained the reputation of providing all that one needs right at one's fingertips. the problem is that one cannot readily discern what is reliable and what is not without some instruction.

it may be argued that undergraduates' information seeking is somewhat eased by the general guidance they get from faculty in the classroom. there is the general professorial lecture, which outlines the topics to be covered during the course as well as associated relevant readings used to broaden the subjects covered. in addition, there is the textbook, which elaborates on material covered in class. finally, there are journal articles and other information sources, which ordinarily are placed on reserve. as far as subject content covered in class lectures and discussion is concerned, information is usually well organized and accessible. at that level, information seeking is minimal and often guided by the dictates of the professor. but then enters the term paper, and the student's whole peace of mind with regard to information-gathering habits is disturbed. the term paper brings many unknowns to the undergraduate. the magnitude of the subject to be covered is initially fuzzy. the resources needed to get background as well as specific information are also fuzzy. furthermore, even when the resources are a little clearer, sifting through them and making a rational selection of relevant material may be problematic. the whole academic exercise entails learning and using new information tools, many of which were not covered in high school. computers and other electronic equipment have accentuated the undergraduates' mesmerization in their information-seeking effort.

a trait that most undergraduates exhibit in their information-seeking behavior is approaching the reference librarian for suggestions of leads to information sources needed for the term paper topic. they also may request the librarian to evaluate the sources as to their relevance, and sometimes even ask him or her to fetch the actual material needed.6 with the advent of the internet and other electronic resources, online or otherwise (e.g., dialog, lexis-nexis, cd-roms, dvds, and tapes), the undergraduate may go directly to the internet terminal and thus skip the librarian's counsel and hand-holding, which used to be vital for accessing printed material.
unless the undergraduate student is well groomed in searching the internet, this relatively new tendency to act independently of the information professional may result in hours of useless roaming on the net with little relevant information retrieved.

the graduate user

in their study of business students, atkinson and figueroa found that graduates reported fewer hours spent in the library than undergraduates.7 the researchers did not attempt to explain why that was so. perhaps because of their search skills, graduates do more focused information seeking and do not waste much of their time browsing and floundering in the unknown information abyss within the library. the researchers reported an equal interest in searching internet resources and online databases (e.g., lexis-nexis, dow jones, and abi/inform) among graduates and undergraduates. however, their research was done at the end of 1995 and the beginning of 1996, before the proliferation of search engines on the internet. as an information searcher, the graduate is more sophisticated than the undergraduate. subject coverage is usually more clearly defined in many of the assignments encountered. he or she has gone through most of the pitfalls of the undergraduate experience and can select a subject and research it relatively well. most likely due to the nature of their assignments, undergraduates' information needs may be satisfied by simple information systems that allow users to browse, and their searches also tend to be less exhaustive than graduates'. graduates, on the other hand, are faced with relatively narrower subjects and prefer to conduct more comprehensive searches.8

the post-doctoral researcher-faculty

faculty have mastered the art of getting relevant information. many belong to the informal invisible college and attend professional conferences, both of which are used to get information for teaching and research. hart's study found that formal sources, which may be found in the personal and college or university library, are more important in faculty's information-seeking effort than informal ones.9 according to hart, this information-seeking characteristic would be applicable to printed and electronic resources found on the internet. although our research did not specifically test it, online databases tend to direct the end user to formalized, definitive, and tested resources more than internet search engines do. this would minimize user search time and maximize the relevance of the information needed by research faculty. in other words, while the listserv might be one of the internet substitutes for the invisible college, information found on it would be more acceptable to research faculty if it directs him or her to reliable and verifiable databases, i.e., information from cendata (the u.s. census bureau information database), edgar (the u.s. securities and exchange commission database), or dow jones.

developments in the electronic resources arena have made many hard copies less popular. subject-oriented databases can be searched either in the library or in faculty offices. curtis et al. researched the information-seeking behavior of health sciences faculty and found a relatively new and growing information-seeking characteristic. according to curtis et al., faculty tend to prefer to search electronic resources from their offices rather than go to the library.10
that is not surprising, for if a faculty member can access library catalogs and electronic databases, some of which can provide viewdata (full text), it is not necessary for him or her to go to the library for some of the information needed. in addition, if cd-rom databases are on a local area network accessible via the college online catalog, faculty may seldom go to a library whose resources are on the network via a library web site, telnet, or the traditional dial-up.

the pilot studies

with the general information-seeking behavior of academic users in mind, the researchers decided to investigate the use of search engines for information sources in academe in the new york metropolitan area. search engines were contrasted with databases, which may be url (uniform resource locator) accessible online via an internet browser, stand-alone on cd-rom, or on cd-rom towers linked by a library local area network. in her article on web search engines, schwartz discussed recent studies done on their performance. she pointed out that the end user is not often a participant in such studies.11 although our research was not on evaluation, we deliberately focused on the end user to gather statistics on the perception of web search engine utility in internet surfing and information seeking. kassel evaluated search engines, indicating their variety and complexity when used to search the internet.12 other relevant literature indicated the difficulty of navigating the internet for both the information professional and the end user. it also indicated how direct access to databases was a shortcut to retrieving some topical information. our periodic observations of internet users revealed heavy use of search engines. we suspected that end users use them to get topical information which might otherwise be easily obtained from online databases. consequently, we thought it necessary to conduct a study on end-user perception.

objectives

our objectives in embarking on the pilot studies were to:
1. find the frequency of internet use by end users. this would allow us to check whether there is a correlation between frequency of internet use and perception of search engine utility.
2. find the most popular search engine. examining the most popular search engine with respect to indexing policy might indicate whether it would generate more topical subject-type information.
3. gauge the use of online and cd-rom databases in the library. to help end users recognize what databases were involved in the research, common databases were listed on the questionnaire as examples.
4. gauge the use of search engines in libraries and information centers. common search engines likewise were listed to help the end user identify what they were.
5. relate the results to pragmatic library and information-center functioning in providing information.

methodology

four metropolitan new york academic institutions were selected: borough of manhattan community college; iona college; queens college of the city university of new york; and wagner college. the main criterion for selection was ease of access for the researchers. a composite sample of users was selected from these institutions to participate in the studies. the sample used was dynamic and self-selected in that whoever used the
"internet terminal" was a potential research subject. only end users as opposed to information professionals/librarians were used in the study . while subjects sat at the terminal, they were requested to complete the questionnaire and return it to the reference/information desk. simplicity dictated the design of the research and data collection instrument (questionnaire). it was one page, multicolored, and was entitled "internet use questionnaire." we estimated that it would take the subjects four to seven minutes to complete. our assumption daily 46% ivlonthly 9% weekly 45% in designing it to be simple and least time-consuming was that since the subjects were sitting at the terminals, they were time conscious. figure 1. frequency of internet use while subjects were asked to complete the questionnaire, they had the option not to. forty copies of the questionnaire were given to each academic institutional librar y, making a total of 160. useable returns were 155, or 97 percent. in addition to the questionnaire, we conducted exit interviews with some of the subjects who were using the internet terminals after they handed in the completed questionnaires. the purpose of the interviews was to have some idea as to how the users perceived the utility of the internet in getting electronic-based information . four questions were used: 1. how do you find the internet as an information source? 2. did you get what you needed from the internet ? 3. do you have a favorite search engine? 4. is there any point when you would seek the assistance of the reference librarian/information specialist? analysis of the data was done using the spss (statistical package for social science) package. we used descriptive statistics for general group tendencies-frequency of internet use and preferred sources for topical subject search. for inferential statistics we preferred the non-parametric pairwise two-tailed correlation coefficients, kendall's tau_b and spearman's rho statistics . microsoft's excel program package was used to draw some of the illustrations. results the study revealed that an overwhelming majority of subjects (91 percent) use the internet at least once a week (this includes those who use it daily) . an almost equal number (45 percent) use it weekly-(at least once a week); 46 percent use it at least once a day (see figure 1). as figure 2 shows, search engines are the predominant preferred tools for searching topical subjects on the search engine 84% figure 2. preferred sources for subject search online db 16% internet as contrasted to online and cd-rom databases. we used the two-tailed pairwise correlation coefficients to see whether there are correlations between frequency of internet use and tool preferences. as table 1 and table 2 indicate, subjects who used the internet monthly or weekly had high correlations with online databases . daily users, however, tended to have high correlations with search engines as tools to get to topical subject information sources. i interpretations and conclusions search engines certainly provide the most common access points utilized by library /information center users to get to electronic resources on the internet . unfortunately, the average user seems to have the impression that the internet is a be-all and almost a panacea to all information problems. kassel suggests 14 information technology and libraries i march 2000 reproduced with permission of the copyright owner. further reproduction prohibited without permission. 
interpretations and conclusions

search engines certainly provide the most common access points utilized by library and information center users to get to electronic resources on the internet. unfortunately, the average user seems to have the impression that the internet is a be-all and almost a panacea for all information problems. kassel suggests that, at best, search engines seem to reach just about half of the web pages available on the internet.13 sullivan has given several reasons why search engine coverage is incomplete and search results sometimes may be misleading.14 among the most cogent reasons are: documents may be changed after they have been picked up for inclusion; deleted materials may be displayed as available; and web sites or files which are password accessible are not covered. much of the information needed in academe is proprietary and available via database vendors. using search engines as the main recourse to topical information shortchanges the user and may lead to frustration unless the high user expectations are tempered by constant education by the information specialist.

the pilot studies do not give conclusive answers as to why the weekly and monthly internet users correlated with those who use online and cd-rom databases. it might be that they search the internet via search engines as supplements to conventional online sources. alternatively, they may search using search engines on an exploratory basis when they begin a relatively new subject. daily users who correlated with search engines might have mistaken the highway function of search engines for the actual sources, for example edgar, medline, or eric. it might have been the problem of confusing "the end" with the "means to the end."

table 1. nonparametric correlations (spearman's rho) among frequency-of-use categories (daily, monthly, weekly) and preferred sources (search engines, online databases).

implications for information professionals

our studies indicated that a majority of the users in the sample preferred search engines as access points to the internet for topical information. the interest in search engines correlated with the state university of new york at albany study, which also indicated their predominant use in searching the internet.15 while the albany study was general, ours related the search engines to getting topical information and the use of online databases as an alternative. our findings point to the need to re-educate the internet user in several aspects of the superhighway. first, content: only a fraction of the possible sites (approximately one half) are indexed by the search engines. second, authority: because it is so easy to self-publish on the internet, a lot of information of low integrity or factual inaccuracy, for instance, may be mistaken for reliable sources. third, transiency of information found on the internet must be pointed out; the maxim "here today, gone tomorrow" is appropriate for several web sites on the internet.
table 2. nonparametric correlations (kendall's tau_b) among frequency-of-use categories (daily, monthly, weekly) and preferred sources (search engines, online databases).

finally, information professionals must emphasize in their training the proven online databases to which users should go directly, if and when those databases are provided by the library or information center. information professionals have a direct link to providing users with guidance to proven online databases, specifically during course-integrated instruction. education for the end user is paramount to the optimum utilization of electronic information sources. a well-developed information resources instruction program is needed in conjunction with the one-on-one instruction that takes place every day at the reference/information desk. such instruction programs must be cumulative if they are to be effective in an age of burgeoning choices for end users, who can more and more often choose to be remote users of information resources. in an academic environment, early intervention at the freshman level is paramount, but instruction also must be pursued in a structured manner at the upper levels. many college and university information resources instruction programs are based on a one-shot, approximately fifty-minute session, which often is executed as an orientation to the library or information center. such a method of instruction offers no guarantee that further guidance will be sought, either at the behest of a teaching faculty member in the form of course-integrated instruction or on an individual level at the reference desk. developing effective ways to integrate information resources instruction into the lives of end users is one of the challenges information professionals face in the new millennium, with an increase in the use of electronic resources found on the internet.

references and notes
1. jon guice, "looking backward and forward at the internet," the information society 14, no. 3 (july/sept. 1998): 201-11.
2. g. mcmurdo, "the net by numbers," journal of information science 22, no. 5 (1996): 1397-411.
3. n. l. pelzer and others, "library use and information seeking behavior of veterinary medical students revisited in the electronic environment," bulletin of the medical library association 86, no. 3 (july 1998): 346-55.
4. harry m. kibirige, "viewdata," in encyclopedia of electrical and electronics engineering, vol. 23, ed. g. webster (new york: john wiley, 1999), 223-31.
5. department of labor, the secretary's commission on achieving necessary skills, skills and tasks for jobs (washington, d.c.: department of labor, 1992).
6. gloria l. leckie, "desperately seeking citations: uncovering faculty assumptions about the undergraduate search process," journal of academic librarianship 22, no. 3 (1996): 202-208.
7. joseph d. atkinson and miguel figueroa, "information seeking behavior of business students: a research study," the reference librarian 58 (1997): 59-73.
8. deborah shaw, "bibliographic database searching by graduate students in language and literature: search strategies, systems interfaces, and relevance judgements," library & information science research 17, no. 4 (fall 1995): 327-45.
9. richard l.
hart, "information gathering among the faculty of a comprehensive college : formality and globality," journal of academic librarianship 23, no . 1 (jan. 1997): 21-27. 10. k. l. curtis and others, "information-seeking behavior of health science faculty: the impact of new information technologies," bulletin of the medical library association 85, no . 4 (oct. 1997): 402-10. 11. candy schwartz, "web search engines," journal of the american society for information science 49, no. 11 (sept. 1998) 973-82. 12. amelia kassel, "internet power searching : finding pearls in a zillion grains of sand," information outlook (apr . 1999): 28-32. 13. ibid. 14. danny sullivan , "search engine coverage study published," search engine watch. accessed march 11, 2000, www .searchenginewatch.com. / sereport/99 /os-size.html. 15. wei peter he, "what are they doing on the internet?: study of information seeking behaviors," internet reference services quarterly 1, no. 1 (1996): 31-51 . 16 information technology and libraries i march 2000 lib-s-mocs-kmc364-20141005045405 bibcon-a general purpose software system for marc-based book catalog production liz gibson: assistant systems analyst, california state library, sacramento. 237 the bibcon file management system, designed for use on ibm 360 system equipment, performs two basic functions: (1) it creates marc structured, bibliographic records from untagged input data; (2) from these records it produces page image output for book catalogs. the system accepts data from several different input devices and can produce a variety of output formats by line printer, photocomposition, or computer output microform (com). introduction bibcon is a general purpose data management system for bibliographic records control (i.e., for creating, manipulating, formatting and outputting of marc structured bibliographic records from catalog card input data). the system, shown in figure 1, consists of seven basic programs which functionally divide into two parts: (a) four programs for creation and correction of marc-like records; and (b) three programs and an ibm utility sort for formation of book catalog entries from these records. obviously, a detailed description of such a large and complicated system is impossible in one journal article. a detailed description of the system specifications and user instructions has been prepared and published by the california state library.1 the bibcon system was cooperatively developed by the institute of library research, berkeley; the library systems development project, santa barbara; and the library systems offices at the santa cruz and berkeley campuses of the university of california. the system was developed in response to the needs of the university of california ( uc) and of the california state library ( csl) for efficient production of author, title, and added entry listings of their monographic holdings for distribution to their respective clientele groups. the general system requirements for both libraries were the same: (a) with a minimum of expensive manual keying, bibliographic data 238 journal of library automation vol. 6/4 december 1973 i nitial input data processor vjo~oco}fp. conversion program f1l( update processor allto>ianc field ~~cog. mal\cilrec. forma'tter sked output sort~recoih) cre.o\tt)r p.~>cero~u outi'\i'rpace formatter fig. l. bibcon: basic system schematic inmallnflri' data l'i\oc:essor pblm'sus proof listing forma. 
the bibcon system was cooperatively developed by the institute of library research, berkeley; the library systems development project, santa barbara; and the library systems offices at the santa cruz and berkeley campuses of the university of california. the system was developed in response to the needs of the university of california (uc) and of the california state library (csl) for efficient production of author, title, and added entry listings of their monographic holdings for distribution to their respective clientele groups. the general system requirements for both libraries were the same: (a) with a minimum of expensive manual keying, bibliographic data must be prepared for book catalog production, with any of the standard catalog entries as keys; (b) provision must be made for the widest feasible variety of columnar output formats; (c) the format for any machine-readable records must be compatible with the marc standard. the system has been installed with revisions and modifications on an ibm 360 model 50 computer used by the california state library.
fig. 2. variable field tags-afr-marc ii: 090 call number; 100 main entry; 240 uniform title; 245 title; 300 collation; 500 notes; 650 subject added entry; 700 author added entry; 740 title added entry (traced differently); 810 series added entry (traced differently); 099 remaining unspecified data; 400 series, traced (personal); 410 series, traced (corporate); 440 series, traced (title); 490 series, untraced or traced differently.
all programs in this version are written in the ibm basic assembler language (bal) instead of the original combination of bal and cobol. in its first version, bibcon processed monographic records exclusively. various programs have now been modified so that the system will also process serial records in a simplified marc serials format. this article, however, will describe only the system for processing monographic records. the system has been used to produce catalogs of monographs for uc santa cruz, uc san diego, and the one million record supplement to the uc catalog of books.2-4 portions of the system were used to produce the initial copies of the university of california union list of serials. the california state library automation project is using this basic file management system to process both monographic and serial records for the production of several book catalogs. these will include, principally, the california union list of periodicals, reflecting the periodical holdings of libraries throughout california, the california state library list of periodicals, and the catalog of books in the california state library.
automatic field recognition (afr)
at the heart of the system is the program which creates marc-like records from unedited input data. this program, called automatic field recognition or afr, identifies control and variable fields and creates a leader and record directory for each record submitted to it. in order to accomplish this, when a record is submitted to the program, it first sets aside areas into which data for each of the four parts can be placed.
fig. 3. variable field tags-lc-marc ii (the full lc-marc ii tag list, from the 010 lc card number through the 8xx series added entries)
the field identification progresses on the basis of two signal symbols which are inserted between fields during input and on the basis of the order and content of the fields. when a control or variable field is identified, a standard marc record directory entry is created, containing the afr-marc ii field tag, the length of the field, and the starting character position of the field (figure 2). necessary indicators and subfield delimiters are also created and placed in their proper positions in the field's data stream, and the field, along with its field terminator, is placed into the area set aside for data fields.
afr-marc ii records
it is important to emphasize that the system produces marc-like records rather than full marc records. while the basic record structure is exactly like that of standard library of congress marc, distinctions such as personal versus corporate main entry are not shown by the field tagging, and the degree of subfield delimiting is extremely restricted.5 compare the list of variable field tags for afr-marc ii (automatic field recognition marc ii) records to that for lc-marc ii (library of congress marc ii) records (figures 2 and 3). at present, afr-marc ii provides detailed subfield tagging for only two fields, call number (090) and title (245). this lack of detailed discrimination causes no problem, however, for output of book catalog entries. it can affect filing sequence, since ala filing rules depend on such distinctions as personal versus corporate author to determine proper sorting. the decision to omit detailed subfield discrimination is a concession to cost. the two principal developers (uc and csl) decided that, for book catalog production, detailed subfield delimiting would be of little value and that the benefits of such detail (i.e., the ability to sort according to lc filing rules) would not justify the added costs in editing, input, programming, and processing which would be required to provide this detail.
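to make the directory-building step concrete, the following is a minimal illustrative sketch, written in python rather than the system's assembler; the separator symbol, the tag-guessing heuristics, and the record layout are invented stand-ins for this sketch, not bibcon's actual logic.

# illustrative sketch of afr-style directory building: each keyed field gets
# a directory entry recording a guessed tag, its length, and its starting
# character position. symbols and heuristics are simplified stand-ins.

FIELD_SEP = "/"          # stand-in for the field separator signal symbol
FIELD_TERMINATOR = "\x1e"

def guess_tag(position, text):
    """crude stand-in for afr's order-and-content heuristics."""
    if position == 0:
        return "090"                     # call number keyed first
    if position == 1:
        return "100"                     # main entry
    if text and text[0].isdigit():
        return "300"                     # collation often starts with paging
    return "245" if position == 2 else "500"

def build_record(input_string):
    fields = input_string.split(FIELD_SEP)
    directory, data = [], ""
    for pos, text in enumerate(fields):
        field_data = text.strip() + FIELD_TERMINATOR
        directory.append({"tag": guess_tag(pos, text),
                          "length": len(field_data),
                          "start": len(data)})
        data += field_data
    return {"directory": directory, "data": data}

print(build_record("Z7164.S66U5/United Nations Educational, Scientific and Cultural Organization./Education for community development/49 p. 28 cm."))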
a sample of an afr-marc ii record created by the automatic field recognition program is shown in figures 4 and 6. it can be contrasted with the lc-marc ii record for the same title (figures 5 and 7). both a machine-based representation (figures 4 and 5) and a formatted output example (figures 6 and 7) of the record are shown.
fig. 4. sample library of congress card in afr-marc ii format
fig. 5. sample library of congress card in lc-marc ii format
input data
afr creates marc structured records from unedited input data. to what does "unedited" input refer? without a program such as afr, each marc field tag, subfield code, indicator, etc., for every marc or marc-like record must be manually supplied by a human editor. with afr the input keyer simply indicates that some field is beginning; it is then up to the afr program to identify the field. afr will accept input created by a variety of methods. the decision on input method is based principally on cost. since input costs can vary widely as a result of various local conditions, provision has been made in the bibcon system to accept data in card or tape format. keypunch and optical character recognition (ocr) input are the two methods used thus far. a sample ocr input record appears in figure 8. while input instructions will vary according to the input method used, the four basic keying requirements remain the same: 1. begin an input record with an identification number. 2. place a field separator symbol before each field (i.e., each indention on the catalog card). 3. place a different symbol (called a "location" symbol) after the call number and after the library location data. 4. end each input record with an end-of-record symbol.
fig. 6. afr-marc ii printsus output format
variations on these four basic rules may be required because of restrictions of the input device used, because of variations in content or form of the input data, or because output specifications require nonstandard treatment by the programs. the task of manipulating the varying input into a form which is acceptable to afr is performed by a program called preafr.
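as an illustration of these keying conventions, here is a small sketch in python; the "/" separator, "=" location symbol, and "$" end-of-record symbol are stand-ins chosen for readability, not the project's actual codes.

# an illustrative record keyed according to the four requirements above.
record = (
    "0000001"            # 1. identification number begins the record
    "/Z7164.S66U5="      # 2./3. field separator before the call number, location symbol after it
    "/REF="              #      library location data, again closed by the location symbol
    "/United Nations Educational, Scientific and Cultural Organization."
    "/Education for community development; a selected bibliography"
    "/49 p. 28 cm."
    "$"                  # 4. end-of-record symbol
)

# a keying check: every record must start with an id and end with "$".
assert record[:7].isdigit() and record.endswith("$")
fields = record[7:-1].split("/")[1:]   # drop the id and the end symbol, then split on separators
print(len(fields), "fields keyed")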
pre automatic field recognition (preafr)
this program provides the interface between any one of the different input methods and the afr program. basically, preafr accepts data from keypunched cards, and ocr preafr accepts it from tape records. both forms of the preprocessing program combine input data segments until an end-of-record symbol is reached, indicating that all the data for one bibliographic record have been assembled. a character by character search is made, and special characters and diacriticals which were input as special codes are translated into the values necessary for output processing.
fig. 7. lc-marc ii printsus output format
in addition, the program can perform several editing and checking functions. these functions are optional and are dependent upon the input equipment and upon the wishes of the user. options such as deletion of data on the basis of special input symbols, checking to determine that the record control number is valid, and production of a file of control numbers for records in which data could not be interpreted by the input device are standard. because this program provides the interface between different, nonstandard input methods and one standard record formatting program, it is very user-dependent. the basic logic will remain the same, but individual options will have to be added or subtracted by each separate user.
fig. 8. sample ocr input (data are from the catalog card shown in figure 5)
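a small illustrative sketch, in python, of the preprocessing just described: segments are accumulated until an end-of-record symbol appears, and special input codes are translated. the segment contents, codes, and symbols here are invented examples rather than preafr's actual conventions.

# preafr-style preprocessing sketch: assemble whole logical records from
# input segments, then translate special input codes.
END_OF_RECORD = "$"
SPECIAL_CODES = {"{{": "[", "}}": "]", "=": ""}   # invented stand-ins for ocr special codes

def translate(text):
    """replace special input codes (shown with str.replace for brevity;
    the original makes a character-by-character pass)."""
    for code, value in SPECIAL_CODES.items():
        text = text.replace(code, value)
    return text

def assemble_records(segments):
    """combine input segments until an end-of-record symbol is reached."""
    buffer, records = "", []
    for seg in segments:
        buffer += seg
        while END_OF_RECORD in buffer:
            rec, _, buffer = buffer.partition(END_OF_RECORD)
            records.append(translate(rec))
    return records

print(assemble_records(["0000001/=Education for comm", "unity development$0000002/=Day nurseries$"]))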
fig. 9. preafr output data, printed from tape record
preafr produces a file of variable length, machine-readable records (figure 9) which are passed to afr for formatting into a marc structure with limited marc ii tagging as described in the section on afr.
record proofing and correcting
printsus
printsus is an output program which provides formatted afr-marc ii records, showing field tag, subfield delimiters, indicators, etc. this printout is designed for proofing of the marc records created by afr. samples of this type of output appear in figures 6 and 7.
fix
by processing data according to "fix commands," this program corrects records in marc format, operating as a context editor. corrections can be made to content or structure. entire records can be deleted and new records can be created using fix "correction" statements. when any change is made, fix automatically updates the record's leader and directory to reflect the record as changed. there are two input files: bibliographic records, in marc format, and the fix correction data. the input records file must be in marc format and must be in the same order (by record i.d. numbers) as the fix correction data file in order to successfully update the records. the fix program method of making corrections is based on the fix expression, which can be considered as a "language," with rules of grammar governing the structure of expressions (sentences), the order of elements within the expressions, and the possible contents of each element (see figure 10).
fig. 10. sample fix data, illustrating fix operations
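the following is a minimal sketch, in python, of a fix-style correction that substitutes text in one field and then updates the directory so lengths and starting positions stay consistent; the record layout follows the simplified structure of the earlier sketches, not real marc, and the sample data echoes the "report.san francisco" correction shown in figure 10.

# fix-style context editing sketch: change field content, then keep the
# directory (lengths and starting positions) consistent with the new data.
def apply_fix(record, tag, old, new):
    for entry in record["directory"]:
        if entry["tag"] == tag:
            start, length = entry["start"], entry["length"]
            field = record["data"][start:start + length].replace(old, new)
            record["data"] = record["data"][:start] + field + record["data"][start + length:]
            delta = len(field) - length
            entry["length"] += delta
            # shift the starting positions of every later field (the "directory update")
            for later in record["directory"]:
                if later["start"] > start:
                    later["start"] += delta
    return record

record = {"directory": [{"tag": "245", "start": 0, "length": 14},
                        {"tag": "300", "start": 14, "length": 8}],
          "data": "san francisco.v. 24cm."}
print(apply_fix(record, "245", "san francisco.", "report.san francisco."))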
output processor
the output processor consists of three programs and an ibm utility sort program. these general-purpose programs, which are designed to create book catalog page output, allow a variety of options for sorting as well as formatting.
sort key edit (sked)
this program performs two major functions (figure 11), as follows: (a) from a single marc record it creates a record for each point of access to that record as specified by the program user; and (b) it establishes a 256 character sort key at the head of each record extracted. the file is then passed to an ibm sort package for sequencing.
record extraction
sked does not actually extract data from the original marc record. instead, it replicates the full record for each access point specified. it is left to the biblist program to extract the required data from these records. thus, if a particular bibliographic record should have five access points (one for main entry, one for title, two for subjects, and one for some other added entry), sked would output five full marc records. essentially the only differences in the output sked records would be in the data found in the sort keys prefixed to each record. the record for main entry access would contain main entry data as its first element; the title entry access record would contain title data first, etc.
sort key creation
data for the sort key are selected on the basis of user-specified tables.
fig. 11. general description of sked subsystem (a field table, field control table, and subfield control table govern which fields are extracted, how the 256-byte sort keys are built, and how the replicated sked records are created)
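a compact illustrative sketch, in python, of the replication-plus-sort-key behavior described above; the 256-byte key length comes from the article, while the access-point tags, the normalization, and the record layout are simplified stand-ins (the sample name and title echo the smith/first world war example in figure 11).

# sked-style sketch: replicate the record once per access point and prefix a
# fixed-length sort key whose leading element is that access point's data.
KEY_LENGTH = 256
ACCESS_TAGS = ["100", "245", "650", "700"]   # main entry, title, subject, added entry

def normalize(text):
    return "".join(ch for ch in text.upper() if ch.isalnum() or ch == " ")

def sked(record):
    copies = []
    fields = {e["tag"]: record["data"][e["start"]:e["start"] + e["length"]]
              for e in record["directory"]}
    for tag in ACCESS_TAGS:
        if tag in fields:
            # access-point data first, then the main entry, as in the figure
            key = normalize(fields[tag]) + " " + normalize(fields.get("100", ""))
            copies.append({"sort_key": key[:KEY_LENGTH].ljust(KEY_LENGTH), "record": record})
    return sorted(copies, key=lambda c: c["sort_key"])

record = {"directory": [{"tag": "100", "start": 0, "length": 12},
                        {"tag": "245", "start": 12, "length": 15}],
          "data": "Smith, John.First World War"}
for copy in sked(record):
    print(copy["sort_key"].rstrip())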
sample sked tables (author/title list)
fig. 15. sample bibcon output: csl education catalog
additionally, portions of the software have been transferred successfully to the hennepin county library, minnesota.
disadvantages
1. personnel dependency: bal: the system is written in basic assembler language, thus necessitating the services of an experienced programmer. marc: because the system operates upon the marc structured record format, the average programmer may well have a difficult time in dealing with the added complexities introduced by this aspect. options: the wide range of options provided by the system necessitates highly complex programs which may be difficult for the average programmer to grasp readily.
2. equipment dependency: ibm: because the programs are written in ibm basic assembler language, the system is presently usable on ibm equipment only.
conclusion
the bibcon-360 system is a versatile and inexpensive method for producing book catalogs when a wide range of format options are required and when the catalogs must contain bibliographic information with more than one entry or access point per bibliographic record. if a simple, main entry catalog is needed, microfilm reproduction of the catalog cards may still be much cheaper. bibcon-360 is most useful for producing large scale catalogs (e.g., union catalogs) to be distributed widely, to assist in the effort to provide the widest possible dissemination of library information at the least possible cost.
references
1. california. state library, sacramento. automation project, a users' manual for bibcon 360; a file management system for bibliographic records control (sacramento: california state library, 1972), 274p. (this manual, produced in limited quantities, is now available only on interlibrary loan.)
2. university of california, santa cruz, author-title catalog of the university library (santa cruz: university of california, 1970), 32 v.
3. university of california, san diego, author-title catalog; subject catalog (san diego: san diego medical society-university library, 1969), 350p.
4. california. university. institute of library research, university of california union catalog of monographs; cataloged by the nine campuses from 1963 through 1967; a supplement to the catalogs of the university libraries at berkeley and los angeles published in 1963 (berkeley: university of california, 1972), 47 v.
5. u.s. library of congress. information systems office, marc manuals used by the library of congress (chicago: ala, 1970), p.42.
6. california. state library, sacramento, recent works in the california state library in science and technology (sacramento: california state library, 1972), p.426.
7. california. state library, sacramento, special education problems; a catalog of materials in the california state library (unpublished). (this topical catalog was output only to test refinements to the bibcon-360 programs. it was not published, but the sample pages produced illustrate further refinements in formatting and sorting routines.)
editorial board thoughts
a&i databases: the next frontier to discover
mark dehmlow
information technology and libraries | march 2015
i think it is fair to say that the discovery technology space is a relatively mature market segment, not complete, but mature. much of the easy-to-negotiate content has been negotiated, and many of the systems on the market are above or approaching a billion records. this would seem a lot, but there is a whole slice of tremendously valuable content still not fully available across all platforms, namely the specialized subject abstracting and indexing database content. this content has significant value for the discovery community—many of those databases go further back than content pulled from journal publishers or full-text databases. equally important is that they represent an important portion of humanities and social sciences content that is less well represented in discovery systems than stem content.
for  vendors  of  a&i  content,  the   concerns  are  clear  and  realistic,  differently  from  journal  publishers  whose  metadata  is  meant  to   direct  users  to  their  main  content  (full  text),  the  metadata  for  a&i  publishers  is  the  main  content.     according  to  a  recent  nfais  report,  a  major  concern  for  them  is  that  if  they  include  their  content   in  discovery  systems,  they  “risk  loss  of  brand  awareness”  and  the  implications  are  that  institutions   will  be  more  likely  to  cancel  those  subscriptions.1    the  focus  therefore  seems  to  have  been  how  to   optimize  the  visibility  of  their  content  in  discovery  systems  before  being  willing  to  share  it.       in  addition  to  the  nfais  report,  some  of  the  conversations  i  have  seen  on  the  topic  seem  to  focus   on  wanting  discovery  system  providers  to  meet  a  more  complex  set  of  requirements  that  will   maximize  leveraging  the  rich  metadata  contained  in  those  resources,  the  idea  being  that  utilizing   that  metadata  in  specific  ways  will  increase  the  visibility  of  the  content.    in  principle  i  think  it  is  a   commendable  goal  to  maximize  the  value  of  the  comprehensive  metadata  a&i  records  contain,   and  the  complexities  of  including  a&i  data  into  discovery  systems  need  to  be  carefully  considered   -­‐  namely  blending  multiple  subject  and  authority  vocabularies,  and  ensuring  that  metadata   records  are  appropriately  balanced  with  full  text  in  the  relevancy  algorithm.  but  i  also  worry  that   setting  too  many  requirements  that  are  too  complicated  will  lead  to  delayed  access  and  biased   search  results.    it  is  important  that  this  content  is  blended  in  a  meaningful  way,  but  determining   relevancy  is  a  complex  endeavor,  and  it  is  critically  important  for  relevancy  to  be  unbiased  from   the  content  provider  perspective  and  instead  focus  on  the  user,  their  query,  and  the  context  of   their  search.       another  concern  that  i  have  heard  articulated  is  that  results  in  discovery  services  are  unlikely  to     be  as  good  as  native  a&i  systems  because  of  the  already  mentioned  blending  issues.    this  is  likely     mark  dehmlow  (mark.dehmlow@nd.edu),  a  member  of  the  ital  editorial  board,  is  program   director,  library  information  technology,  university  of  notre  dame,  south  bend,  in.       editorial  board  thoughts:  a&i  databases  |  dehmlow     2   to  be  true,  but  i  think  it  is  critical  to  focus  on  the  purpose  of  discovery  systems.    as  donald   hawkins  recently  wrote  in  a  summary  of  a  workshop  called  “information  discovery  and  the   future  of  abstracting  and  indexing  services,”  “a&i  services  provide  precision  discipline-­‐specific   searching  for  expert  researchers,  and  discovery  services  provide  quick  access  to  full  text.”2     hawkins  indicates  that  discovery  systems  are  not  meant  to  be  sophisticated  search  tools,  but   rather  a  quick  means  to  search  a  broad  range  of  scholarly  resources  and  i  think  sometimes  a  quick   starting  point  for  researchers.    
because  of  the  nature  of  merging  billions  of  scholarly  records  into  a   single  system,  discovery  systems  will  never  be  able  to  provide  the  same  experience  as  a  native  a&i   system,  nor  should  they.    over  time,  they  may  become  better  tuned  to  provide  a  better  overall   experience  for  the  three  different  types  of  searchers  we  have  in  higher  education:  novice  users  like   undergraduates  looking  for  a  quick  resource,  advanced  users  like  graduate  students  and  faculty   looking  for  more  comprehensive  topical  coverage,  and  expert  users  like  librarians  who  want   sophisticated  search  features  to  hone  in  on  the  perfect  few  resources.    many  of  the  discovery   systems  are  working  on  building  these  features,  but  the  industry  will  take  time  to  solve  this   problem,  and  i  tend  to  look  at  things  from  the  lense  of  our  end  users—non-­‐inclusion  of  this   content  directly  impacts  their  overall  discovery  experience.   one  might  ask,  if  the  discovery  system  experience  isn’t  as  precise  and  complete  as  the  native  a&i   experience,  why  bother?    in  addition  to  broadening  the  subject  scope  by  including  many  of  the   more  narrow  and  deep  subject  metadata,  there  is  also  the  importance  of  serendipitous  finding.     that  content,  in  the  context  of  a  quick  user  search,  may  drive  the  user  to  just  the  right  thing  that   they  need.    in  addition,  my  belief  is  that  with  that  content,  we  can  build  search  systems  that  are   deeper  than  google  scholar,  and  by  extension  provide  our  end  users  with  a  superior  search   experience.    and  so  i  advocate  for  innovating  now  instead  of  waiting  to  work  out  all  of  the  details.     i  am  not  suggesting  moving  forward  callously,  but  swiftly.    the  work  that  niso  has  done  on  the   open  data  initiative  has  resulted  in  some  good  recommendations  about  how  to  proceed.    for   example,  they  have  suggested  two  usage  metrics  that  could  be  valuable  for  measuring  a&i  content   use  in  discovery  systems:  search  counts  (by  collection  and  customer  for  a&i  databases)  and   results  clicks  (number  of  times  an  end  user  clicks  on  a  content  provider’s  content  in  a  set  of   results).3     while  i  think  these  types  of  metrics  are  aligned  with  the  types  of  measures  that  libraries  evaluate   a&i  database  usage  by,  i  think  at  the  same  time  they  don’t  really  say  much  about  the  overall  value   of  the  resources  themselves.    sometimes  in  the  library  profession,  our  obsession  for  counting  stuff   loses  connection  with  collecting  metrics  that  actually  say  something  about  impact.    of  the  two   counts,  i  could  see  perhaps  counting  the  result  clicks  as  having  more  value.    in  this  instance,   knowing  that  a  user  found  something  of  interest  from  a  specific  resource  at  the  very  least  indicates   that  it  led  the  user  some  place.    i  think  the  measure  of  search  counts  by  collection  is  less  useful.    at   best  it  indicates  that  the  resource  was  searched,  but  it  tells  us  nothing  about  who  was  searching   for  an  item,  what  they  found,  or  what  they  subsequently  did  with  the  item  once  they  found  it.    i  do   think  we  in  libraries  need  to  consider  the  bigger  picture.   
 regardless  of  the  number  of  searches   information  technology  and  libraries  |  march  2015     3   (which  doesn’t  really  tell  us  anything  anyway),  we  need  to  recognize  the  value  alone  of  including   the  a&i  content,  and  instead  of  trying  to  determine  the  value  of  the  resource  by  the  number  of   times  it  was  searched,  focus  more  on  the  breadth  of  exposure  that  content  is  getting  by  inclusion   in  the  discovery  system.   i  think  a  more  useful  technical  requirement  for  discovery  providers  would  be  to  provide  pathways   to  specific  a&i  resources  within  the  context  of  a  user’s  search—not  dissimilar  to  how  google   places  sponsored  content  at  the  top  of  their  search  results,  a  kind  of  promotional  widget.    in  this   case,  using  metadata  returned  from  the  query,  the  systems  could  calculate  which  one  or  two   specific  resources  would  guide  the  user  to  more  in  depth  research.    by  virtue  of  inclusion  of  the   resource  in  the  discovery  system,  those  resources  could  become  part  of  the  promotional  widget.     this  would  guide  users  back  to  the  native  a&i  resource  which  both  libraries  and  a&i  providers   want,  and  it  would  do  that  in  a  more  intuitive  and  meaningful  way  for  the  end  user.   all  of  the  parties  involved  in  the  discovery  discussion  can  bring  something  to  the  table  if  we  want   to  solve  these  issues  in  a  timely  way.    i  hope  that  a&i  publishers  and  discovery  system  providers   make  haste  and  get  agreements  underway  for  content  sharing  and  i  would  recommend  that   instead  of  focusing  on  requiring  finished  implementations  based  in  complex  requirement  before   loading  content,  both  of  them  should  instead  focus  on  some  achievable  short  and  long  term  goals.     integrating  a&i  content  perfectly  will  take  some  time  to  complete  and  the  longer  we  wait,  the   longer  our  users  have  a  sub-­‐optimal  discovery  experience.    discovery  providers  need  to  make  long   term  commitments  to  developing  mechanisms  that  satisfy  usage  metrics  for  a&i  content,  although   i  would  recommend  defining  measures  that  have  true  value.    a&i  providers  should  be  measured  in   their  demands:  while  their  stakes  in  system  integration  is  real,  there  runs  a  risk  of  content   providers  vying  for  their  content  to  be  preferred  when  relevancy  neutrality  is  paramount  for  a   discovery  system  to  be  effective.    i  think  it  is  worth  lauding  the  efforts  of  a  few  trailblazing  a&i   publishers  such  as  thomson  reuters  and  proquest  who  have  made  agreements  with  some  of  the   discovery  providers  and  are  sharing  their  a&i  content  already,  providing  some  precedent  for   sharing  a&i  content.    lastly,  libraries  and  knowledge  workers  need  to  develop  better  means  for   calculating  overall  resource  value,  moving  beyond  strict  counts  to  thinking  of  ways  to  determine   the  overall  scholarly/pedagogical  impact  of  those  resources  and  they  need  to  make  the  fact  alone   that  an  a&i  publisher  shares  its  data  with  a  discovery  provider  indicate  significant  value  for  the   resource.                       editorial  board  thoughts:  a&i  databases  |  dehmlow     4     references     1.    
nfais, recommended practices: discovery systems. nfais, 2013. https://nfais.memberclicks.net/assets/docs/bestpractices/recommended_practices_final_aug_2013.pdf.
2. hawkins, donald t., "information discovery and the future of abstracting and indexing services: an nfais workshop." against the grain, 2013. http://www.against-the-grain.com/2013/08/information-discovery-and-the-future-of-abstracting-and-indexing-services-an-nfais-workshop/.
3. open discovery initiative working group, open discovery initiative: promoting transparency in discovery. baltimore: niso, 2014. http://www.niso.org/apps/group_public/download.php/13388/rp-19-2014_odi.pdf.
editorial board thoughts: the importance of staff change management in the face of the growing "cloud"
mark dehmlow
information technology and libraries | march 2016
mark dehmlow (mdehmlow@nd.edu), a member of lita and the ital editorial board, is the director, information technology program, hesburgh libraries, university of notre dame, south bend, indiana.
the library vendor market likes to throw around the word "cloud" to make their offerings seem innovative and significant. in many ways, much of what the library it market refers to as "cloud," especially saas (software as a service) offerings, is really just a fancier term for hosted services. the real gravitas behind the label came from grid computing: large, interconnected, and quickly deployable infrastructure like amazon's aws or microsoft's azure platforms. infrastructure at that scale and that level of geographic distribution was revolutionary when it emerged. still, these offerings at their core are basically iaas (infrastructure as a service) bundled as a menu of services, so i think the most broadly applicable synonym for the "cloud" could be "it as a service" in its various forms. outsourcing in this way isn't entirely new to libraries. the function and structure of oclc has arguably been one of the earlier instantiations of "it as a service" for libraries vis-à-vis the marc record aggregation and distribution that oclc has been doing for decades. the more recent trend toward hosted it services has been relatively easy for non-it units in our library; to most library staff, a service looks no different based on where it is hosted, and with many services implementing apis for libraries, that distinction is becoming less significant for our application developers too. for many of our technology staff, who have built careers around systems administration, application development, systems integration, and application management, hosted services represent a threat not only to their livelihoods but in some ways also to their philosophical perspectives, which are grounded in open source and do-it-yourself beliefs. in many ways the "cloud" for the it segment of our profession is perhaps more synonymous with change, and with change comes the need for effective management of that change, especially for the human element of our organizations. recently, our office of information technologies started an initiative to move 80% of their technology infrastructure into the cloud. they have proposed an inverted pyramid structure for determining where it solutions should reside — focusing first on hosted software as a service solutions for the largest segment of applications, followed by hosting those applications we would have typically installed locally onto a platform or infrastructure as a service provider, and then limiting only those applications that have specialized technical or legal needs to reside on premise.
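as a toy illustration of that ordering, and only an illustration (the criteria names are invented placeholders, not our actual policy), the decision flow might be sketched as:

# toy decision helper reflecting the inverted-pyramid order described above.
def placement(needs_special_compliance: bool, vendor_saas_available: bool) -> str:
    if needs_special_compliance:
        return "on premise"             # specialized technical or legal needs
    if vendor_saas_available:
        return "software as a service"  # first preference for most applications
    return "platform/infrastructure as a service (hosted)"

print(placement(False, True))   # -> software as a service
print(placement(False, False))  # -> platform/infrastructure as a service (hosted)
print(placement(True, False))   # -> on premise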
this is a big shift for our it staff, especially, but not limited to, our systems administrators. the iaas platform our university is migrating to is amazon web services, and their infrastructure is largely accessible via a web dashboard, so that the myriad tasks our systems administrators took days and weeks to do can now, in some adjusted way, be accomplished with a few clicks. this example is on one extreme end of the spectrum as far as it change goes, but simultaneously, we have looked at the vendor market to lease pre-packaged tools that support standard functions in academic libraries and can be locally branded and configured with our data — things like course guides, a-z journal lists, scheduling events, etc. the overarching goals of these efforts are cost savings and increased velocity and resiliency of infrastructure, but also, and perhaps more important, flexibility in how we invest our staff time. if we are able to move high level tasks from staff to a platform, then we will be able to reallocate our staff's time and considerable talent to take on the constant stream of new, high level technology needs. partnering with the university, we are aiming towards their defined goal of moving 80% of our technical infrastructure into the "cloud." we have adopted their overall strategy of approach to systems infrastructure, at least in principle, and are integrating into our own strategy significant consideration for the impact of these changes on our staff. our organization has recognized that people form not only habits around process, but also personal and emotional attachments to why we do things the way we do them, both from a philosophical as well as a pragmatic perspective. our approach to staff change is layered as well as long term. we know that getting from shock to acceptance is not an overnight process and that staff who adopt our overarching goals and strategy as their own will be more successful in the long term. to make this transition, we have developed several strategic approaches:
1. explaining the case: my experience is that staff can live through most changes as long as they understand why. helping them gain that understanding can take some time, but ultimately having that comprehension will help them fully understand our strategic goals as well as help them make decisions that are in alignment with the overall approach. i often find it is important to remember that, as managers, we have been a part of all of the change conversations and we have had time to assimilate ideas, discuss points of view, and process the implications of change. each of our staff needs to go through the same process, and it is up to leadership to guide them through that process and ensure they get to participate in similar conversations. it is tempting to want to hit an initiative running, but there is significant value in seeding those discussions gradually over time to more holistically integrate staff into the broader vision.
it is important to explain the case for change multiple times and actively listen to staff thoughts and concerns and to remember to lay out the context for change, why it is important, and how we intend to accomplish things. then reassure, reassure, and reassure. the threats to staff may seem innocuous or unfounded to managers, but staff need to feel secure during a process to ultimately buy in. 2. consistency and persistence: staff acceptance doesn’t always come easy — nor should it necessarily. listening and integrating their perspectives into the planning and information technology and libraries | march 2016 5 implementation process can help demonstrate that they matter, but equally important is that they feel our approach is built on something solid. stability is reinforced through consistency in messaging. not only in individual consistency, but also team consistency, and upper management consistency — everyone should be able to support and explain messaging around a particular change. any time staff approach me and say, “it was much easier to do it this other way,” i talk about the efficiency we will garner through this change and how we will be able to train and repurpose staff in the future. the more they hear the message, the more ingrained it becomes, and the more normative it begins to feel. 3. training and investment: it futures require investment, not just in infrastructure, but also in skill development. we continue to invest significantly in providing some level of training on new technologies that we implement. that training will not only prove to staff that you are invested in their development as well as their job security, but it will also give them the tools they need to be successful in implementing new technologies. change is anxiety inducing because it exposes so many unknowns. providing training helps build confidence and competence for staff, reducing anxieties and providing some added engagement in the process. it also gives them exposure to the real world implementation of technologies where they can begin to see the benefits that you have been communicating for themselves. 4. envisioning the future: improvements and roles — one of the initial benefits we will be getting from recouping staff time is around shoring up our processes. we have generally had a more ad hoc approach to managing the day to day. it has been difficult to institute a strong technical change management process, in part, because of time. we will be able to remove that consideration from our excuses as we take advantage of the “cloud.” the net effect will be that we will do our work more thoughtfully and less ad hoc and use better defined processes that will meet group-developed expectations. in addition to doing things better, we do expect to do things differently. with fewer tasks at the operational level, we believe we will be able to transition staff into newly defined roles. 
some of these roles include devops engineers, a hybrid of application engineering (the dev) and systems administration (the ops), these staff will help design automation and continuous integration processes that allow developers to focus on their programming and less on the environment they are deploying their applications in; financial engineers who will take system requirements and calculate costs in somewhat complex technical cloud environments; systems architects who will be focused on understanding the smorgasbord of options that can be tied together to provide a service to meet expected response performance, disaster recovery, uptime, and other requirements; and business analysts who will focus on taking technical requirements and looking at all of the potential approaches to solve that need whether it be a hosted service, a locally developed solution, an implementation of an open source system, or some integration of all or some of the editorial board thoughts: the importance of staff change management in the face of the growing “cloud” | dehmlow | doi: 10.6017/ital.v35i1.8965 6 above. this list is by no means exhaustive, but i think it forms a good foundation on which to help staff develop their skill set along with our changing environment. i believe it is important to remind those of us who are managing it departments in libraries that in many ways the easiest parts of change are the logistics. the technology we work with is bounded by sets of guidelines that define how they are used and ensure that if they are implemented properly, they will work effectively. people on the other hand are not bounded as neatly by stringent rules. they are guided by diverse backgrounds, personalities, experiences, and feelings. they can be unpredictable, difficult to fully figure out, and behaviorally inconsistent. and yet, they are the great constant in our organizations and therefore require significant attention. our field needs “servant leaders” dedicated to supporting and developing staff, and not just being competent at implementing technologies. those managers who invest in staff, their well-being, development, and sense of engagement in their jobs, will find their organizations are able to tackle most anything. but those who ignore their staffs’ needs over pragmatic goals will likely find their organizations struggling to move quickly and instead spend too much energy overcoming resistance instead of energizing change. article explainable artificial intelligence (xai) adoption and advocacy michael ridley information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.14683 michael ridley (mridley@uoguelph.ca) is librarian, university of guelph. © 2022. abstract the field of explainable artificial intelligence (xai) advances techniques, processes, and strategies that provide explanations for the predictions, recommendations, and decisions of opaque and complex machine learning systems. increasingly academic libraries are providing library users with systems, services, and collections created and delivered by machine learning. academic libraries should adopt xai as a tool set to verify and validate these resources, and advocate for public policy regarding xai that serves libraries, the academy, and the public interest. 
introduction explainable artificial intelligence (xai) is a subfield of artificial intelligence (ai) that provides explanations for the predictions, recommendations, and decisions of intelligent systems.1 machine learning is rapidly becoming an integral part of academic libraries. xai is a set of techniques, processes, and strategies that libraries should adopt and advocate for to ensure that machine learning appropriately serves librarianship, the academy, and the public interest. knowingly or not, libraries acquire and provide access to systems, services, and collections infused and directed by machine learning methods, and library users are engaged in information behavior (e.g., seeking, using, managing) facilitated or augmented by machine learning. machine learning in library and information science (lis), as with many other fields, has become ubiquitous. however, this technology is often opaque and complex, yet consequential. there are significant concerns about bias, unfairness, and veracity.2 there are troubling questions about user agency and power imbalances.3 while lis has a long-standing interest in ai and intelligent information systems generally, 4 it has only recently turned its attention to xai and how it affects the field and how the field might influence it.5 xai is a critical lens through which to view machine learning in libraries. it is also a set of techniques, processes, and strategies essential to influencing and shaping this stil l emerging technology: research libraries have a unique and important opportunity to shape the development, deployment, and use of intelligent systems in a manner consistent with the values of scholarship and librarianship. the area of explainable artificial intelligence is only one component of this, but in many ways, it may be the most important.6 dismissing engagement with xai because it is “highly technical and impenetrable to those outside that community” is neither acceptable nor increasingly possible.7 artificial intelligence is the essential substrate of contemporary information systems and xai is a tool set for critical assessment and accountability. the details matter and must be understood if libraries are to have a place at the table as xai, and machine learning, evolves and further deepens its effect on lis. mailto:mridley@uoguelph.ca information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 2 this paper provides an overview of xai with key definitions, a historical context, and examples of xai techniques, strategies, and processes that form the basis of the field. it considers areas where xai and academic libraries intersect. the dual emphasis is on xai as a toolset for libraries to adopt and xai as an area for public policy advocacy. what is xai? xai is plagued by definitional problems.8 some definitions are focused solely and narrowly on the technical concepts while others focus only on the broad social and political dimensions. lacking “a theory of explainable ai, with a formal and universally agreed definition of what explanations are,”9 the fundamentals of this field are still being explored, often from different disciplinary perspectives.10 critical algorithm studies position machine learning as socio-techno-informational systems.11 as such, a definition of xai must encompass not just the techniques, as important and necessary as they are, but also the context within which xai operates. 
the us defense advanced research projects agency (darpa) description of xai captures the breadth and scope of the field. the purpose of xai is for ai systems to have "the ability to explain their rationale, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future"12 and to "enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners."13 xai is needed to:
1. generate trust, transparency, and understanding;
2. ensure compliance with regulations and legislation;
3. mitigate risk;
4. generate accountable, reliable, and sound models for justification;
5. minimize or mitigate bias, unfairness, and misinterpretation in model performance and interpretation; and
6. validate models and validate explanations generated by xai.14
xai consists of testable and unambiguous proofs, various verification and validation methods that assess influence and veracity, and authorizations that define requirements or mandate auditing within a public policy framework. xai is not a new consideration. explainability has been a preoccupation of computer science since the early days of expert systems in the late twentieth century.15 however, the 2018 introduction of the general data protection regulation (gdpr) by the european union (eu) shifted explainability from a purely technical issue to one with an additional and urgent focus on public policy.16 while the presence of a "right to explanation" in the gdpr is highly contested,17 industry groups and jurisdictions beyond the eu recognized its inevitability, spurring an explosion in xai research and development.18
types of xai
taxonomies of xai types are classified based on their scope and mechanism.19 local explanations interpret the decisions of a machine learning model used in a specific instance (i.e., involving data and context relevant to the circumstance). global explanations interpret the model more generally (i.e., involving all the training data and relevant contexts). in black-box or model-agnostic explanations, only the input and the output of the machine learning model are required, while white-box or model-specific explanations require more detailed information regarding the processing or design of the model. another way to categorize xai is as proofs, validations, and authorizations. proofs are testable, traceable, and unambiguous explanations demonstrable through causal links, logic statements, or transparent processes. typically, proofs are only available for ai systems that use "inherently interpretable" techniques such as rules, decision trees, or linear regressions.20 validations are explanations that confirm the veracity of the ai system. these verifications occur through testing procedures, reproducibility, approximations and abstractions, and justifications. authorizations are explanations that result from processes in which third parties provide some form of standard, ratification, prohibition, or audit. authorizations might pertain to the ai model, its operation in specific instances, or even the process by which the ai was created. they can be provided by professional groups, nongovernmental organizations, governments and government agencies, and third parties in the public and private sector. academic libraries can adopt proofs and validations as means to interrogate information systems and resources.
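the distinction between proofs and validations can be made concrete. the following is a minimal sketch, not drawn from the article, of an "inherently interpretable" classifier whose printed rules function as a proof; the toy task (flagging digitized items for manual review), its two features, and the use of scikit-learn are assumptions made purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# hypothetical toy task: flag a digitized item for manual review from two features
X = [[0.95, 12], [0.40, 300], [0.88, 45], [0.35, 20], [0.92, 500], [0.50, 15]]
y = [0, 1, 0, 1, 0, 1]  # 1 = flag for manual review
feature_names = ["ocr_confidence", "page_count"]

# a shallow decision tree is "inherently interpretable": its rules are the explanation
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# the printed rule set is a testable, traceable account of every prediction the model makes
print(export_text(clf, feature_names=feature_names))
print("prediction for [0.42, 80]:", clf.predict([[0.42, 80]])[0])
```

because every prediction can be traced through the printed rules, an explanation of this kind is testable and unambiguous in the sense described above; validations are needed precisely when a model cannot be reduced to such a trace.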
these resources include collections, which are increasingly machine learning systems themselves or developed with machine learning methods. the recognition of "collections as data" is an important shift in this direction.21 where appropriate, proofs and validations should accompany content and systems derived from machine learning. libraries must also engage with xai as authorizations to assess the public policy implications that exist, are emergent, or are necessary. library advocacy is currently lacking in this area. the requirement for policy and governance frameworks is a reminder that machine learning is "far from being purely mechanistic, it is deeply, inescapably human"22 and that, while complex and opaque, "the 'black box' is full of people."23
prerequisites to an xai strategy
three questions are important for any xai strategy:
• what constitutes a good explanation?
• who is the explanation for?
• how will the explanation be provided?
explanations are context specific. the "goodness" of an explanation is dependent on the needs and objectives of the explainee (a user) and the explainer (an xai). following research from the fields of psychology and cognitive science, keil suggests five reasons why someone wants an explanation: (1) to predict similar events in the future, (2) to diagnose, (3) to assess blame or guilt, (4) to justify or rationalize an action, and (5) for aesthetic pleasure.24 for most people, explanations need not be complete or even fully accurate.25 as a result, who the explanation is for is critical to a good explanation. different audiences have different priorities. system developers are primarily interested in performance explanations, while clients focus on effectiveness or efficacy, professionals are concerned about veracity, and regulators are interested in policy implications. nonexpert, lay users of a system want explanations that build trust and provide accountability. a good explanation is also affected by its presentation. there are temporal and format considerations. explanations can be provided or available in real time and continuously as the process occurs (hence partial explanations) or post hoc and in summary form. interactive explanations are widely preferred but are not always appropriate or actionable.26 studies have compared textual, visual, and multimodal formats with differing results. familiar textual responses or simple visual explanations such as venn diagrams are often most effective for nonexpert users.27 drawing from philosophy, psychology, and cognitive science, miller recommends four approaches for xai.28 explanations are contrastive: when people want to know the "why" of something, "people do not ask why event p happened, but rather why event p happened instead of some event q." explanations are selected: "humans are adept at selecting one or two causes from a sometimes infinite number of causes to be the explanation." explanations are social: "they are a transfer of knowledge, presented as part of a conversation or interaction, and are thus presented relative to the explainer's beliefs about the explainee's beliefs." finally, miller cautions against using probabilities and statistical relationships and encourages references to causes.
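miller's observation that explanations are contrastive suggests a simple computational analogue: a counterfactual search for the smallest change to an input that would have produced a different decision (why p rather than q). the sketch below is illustrative only; the logistic-regression model, the feature names, and the search range are hypothetical stand-ins for any black-box system that exposes a predict function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical training data: two features, binary outcome
X = np.array([[0.2, 1.0], [0.9, 5.0], [0.4, 2.0], [0.8, 6.0], [0.3, 1.0], [0.7, 4.0]])
y = np.array([0, 1, 0, 1, 0, 1])
model = LogisticRegression().fit(X, y)

def contrastive_explanation(model, x, feature_names, max_shift=1.0, steps=50):
    """find the smallest tried single-feature shift that flips the model's prediction."""
    original = model.predict([x])[0]
    best = None
    deltas = sorted(np.linspace(-max_shift, max_shift, steps), key=abs)  # smallest shifts first
    for j, name in enumerate(feature_names):
        for delta in deltas:
            x_cf = x.copy()
            x_cf[j] += delta
            if model.predict([x_cf])[0] != original:
                if best is None or abs(delta) < abs(best[1]):
                    best = (name, delta, model.predict([x_cf])[0])
                break  # first (smallest) flipping shift found for this feature
    return original, best

pred, flip = contrastive_explanation(model, np.array([0.45, 2.5]), ["relevance", "downloads"])
if flip:
    print(f"predicted {pred}; it would have been {flip[2]} had {flip[0]} differed by {flip[1]:+.2f}")
else:
    print(f"predicted {pred}; no single-feature change within the search range flips it")
```

a search of this kind also respects miller's caution about probabilities, since the answer it returns is a concrete alternative rather than a statistical score.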
burrell identifies three key barriers to explainability: concealment, the limited technical understanding of the user, and an incompatibility between the user (human) and algorithmic reasoning.29 while concealment is deliberate, it may or may not be justified. protecting ip and trade secrets is acceptable while obscuring processes to purposively deceive users is not. regulations are a tool to moderate the former and minimize the latter. the technical limitations of users and the incompatibility between users and algorithms suggest two remedies. first is enhancing algorithmic literacy. algorithmic literacy is a “a set of competencies that enables individuals to critically evaluate ai technologies; communicate and collaborate effectively with ai; and use ai as a tool online, at home, and in the workplace.”30 libraries have a key role in advancing algorithmic literacy in their communities.31 just as libraries championed information literacy through the promulgation of standards and principles, the provision of diverse educational programming, and the engagement of the broad academic community, so too can libraries be central to efforts to enhance algorithmic literacy. second is a requirement that xai must be sensitive to the abilities and needs of different users. a survey of the key challenges and research direction of xai identified 39 issues, including the need to understand and enhance the user experience, match xai to user expertise, and explain the competencies of ai systems to users.32 this is the essence of human-centered explainable ai (hcxai). among hcxai principles are the importance of context (regarding user objectives, decision consequences, timing, modality, and intended audience), the value of using hybrid explanation methods that complement and extend each other, and the power of contrastive examples and approaches.33 proofs and validations xai that provide proofs or validations can be adopted by libraries to assess and evaluate machine learning utilized in systems, services, and collections. since proofs pertain to already interpretable systems, the four examples provided focus on validations: feature audit, approximation and abstraction, reproducibility, and xai by ai. these techniques may require access to, or information about, the machine learning model. this would include such characteristics as the algorithms used, settings of the parameters and hyperparameters, optimization choices, and the training data. while all these may not be normally information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 5 available, designers of machine learning systems in consequential settings should expect to provide, indeed be required to provide, such access. similarly, vendors of library content or systems utilizing machine learning should make explanatory proofs and validations available for library inspection. feature audit feature audit is an explanatory strategy that attempts to reveal the key features (e.g., characteristics of the data or settings of the hyperparameters used to the differentiate data) that have a primary role in the prediction of the algorithm. by isolating these features, it is possible to explain the key components of the decision. feature audit is a standard technique of linear regression, but it is made more difficult in machine learning because of the complexity of the information space (e.g., billions of parameters and high dimensionality). 
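one widely used form of feature audit is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. the sketch below uses synthetic data in which only the first feature actually drives the outcome; the data and model are hypothetical, and the technique is offered as an illustration rather than as a method prescribed by the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# synthetic data: three candidate features, but only feature 0 actually drives the label
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.2 * rng.normal(size=500) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# shuffle each feature in turn and record the mean drop in accuracy it causes
audit = permutation_importance(model, X, y, n_repeats=20, random_state=0)
for name, drop in zip(["feature_0", "feature_1", "feature_2"], audit.importances_mean):
    print(f"{name}: mean accuracy drop when shuffled = {drop:.3f}")
```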
there are various feature audit techniques34 but all of them are “decompositional” in that they attempt to reduce the work of the algorithm to its component parts and then use those results as an explanation.35 feature audit can highlight bias or inaccuracy by revealing incongruence between the data and the prediction. more advanced feature audit techniques (e.g., gradient feature auditing) recognize that features can indirectly influence other features and that these features are not easily detectable as separate, influential elements.36 this interaction among features challenges the strict decompositional approach to feature audit and will likely lead to an increased focus on the relational analysis among and between elements. approximation and abstraction approximation and abstraction are techniques that create a more simplified model to explain the more complex model.37 people seek and accept explanations that “satisfice”38 and are coherent with existing beliefs.39 this recognizes that “an explanation has greater power than an alternative if it makes what is being explained less surprising.”40 approaches such as “model distillation”41 or the “model agnostic” feature reduction of the local interpretable model-agnostic explanations (lime) tool create a simplified presentation of the algorithmic model.42 this approximation or abstraction may compromise accuracy, but it provides an accessible representation that enhances understandability. a different type of approximation or abstraction is a narrative of the machine learning processes utilized that provides sufficient documentation for a reader to act as an explanation of the outcomes. an exemplary case of this is lithium-ion batteries: a machine-generated summary of current research published by springer nature and written by beta writer, an ai or more accurately a suite of algorithms.43 a collaboration of machine learning and human editors, the full production cycle of the book is documented in the introduction.44 in lieu of being able to interrogate the system directly, this detailed account provides an explanation of the system allowing readers to assess the strengths, limitations, and confidence levels of the algorithmic processes and offers a model of what might be necessary for future ai generated texts.45 libraries can utilize this documentation in acquisition or licensing decisions and subsequently make it available as user guides when resources are added to the collection. reproducibility replication is a verification strategy fundamental to science. being able to independently reproduce results in different settings provides evidence of veracity and supports user trust. however, documented problems in reproducing machine learning studies have questioned the information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 6 generalizability of these approaches and undermined their explanatory capacity. 
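the elements that replication depends on, such as the exact dataset, the software environment, and control of randomization, can be captured in a small machine-readable manifest that accompanies a study or a vendor deliverable. the sketch below is a hypothetical illustration (the dataset path and manifest file name are placeholders), not a standard drawn from the article.

```python
import hashlib
import json
import platform
import random
import sys
from pathlib import Path

SEED = 42
random.seed(SEED)  # randomization control for the experiment itself

def file_sha256(path):
    """hash the dataset so others can confirm they hold exactly the same file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

dataset = Path("training_data.csv")  # hypothetical placeholder for the training data file

manifest = {
    "random_seed": SEED,
    "python_version": sys.version,
    "platform": platform.platform(),
    "dataset_sha256": file_sha256(dataset) if dataset.exists() else None,
}

with open("reproducibility_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
print(json.dumps(manifest, indent=2))
```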
an analysis of text mining studies using machine learning for citation screening in the preparation of systematic reviews, for example, revealed a lack of the key elements needed for replicability (e.g., access to research datasets, the software environments used, randomization control, and detail on new methods proposed or employed).46 in response, a "reproducibility challenge" was created by the international conference on learning representations (iclr) to validate 2018 conference submissions and has continued in subsequent meetings.47 more rigorous replication through the availability of all necessary components and the development of standards will be important to this type of verification.48
xai by ai
the inherent complexity and opacity of unsupervised learning or reinforcement learning suggests, as xai researcher trevor darrell puts it, "the solution to explainable ai is more ai."49 in this approach to explanation, oversight ai are positioned as intermediaries between an ai and its users: workers have supervisors; businesses have accountants; schoolteachers have principals. we suggest that the time has come to develop ai oversight systems ("ai guardians") that will seek to ensure that the various smart machines will not stray from the guidelines their programmers have provided.50 while the prospect of ai guardians may be dystopic, oversight systems performing roles that validate, interrogate, and report are common in code-checking tools. generative adversarial networks (gans) have been used to create counterfactual explanations of another machine learning model to enhance explainability.51 with strategic organizational and staffing changes to enhance capabilities, libraries can design and deploy such oversight or adversarial tools with objectives appropriate to the requirements and norms of libraries and the academy.
authorization
xai that results from authorizations is an area where public policy engagement is needed to ensure xai, and machine learning, are appropriately serving libraries, the academy, and the public at large. three examples are provided: codes and standards, regulation, and audit.
codes and standards
one approach to explanation, supported by the ai industry and professional organizations, is voluntary codes or standards that encourage explanatory capabilities. these nonbinding principles are a type of self-regulation and are widely promoted as a means of assurance.52 the association for computing machinery's statement on algorithms highlights seven principles as guides to system design and use: awareness; access and redress; accountability; explanation; data provenance; auditability; and validation and testing. however, the language used is tentative and conditional. designers are "encouraged" to provide explanations and to "encourage" a means for interrogation and auditing "where harm is suspected" (i.e., a post hoc process).
despite this, the statement concludes with a strong position on accountability if not explainability: “institutions should be held responsible for decisions made by the algorithms that they use, even if it is not feasible to explain in detail how the algorithms produce their results.”53 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 7 unfortunately, the optimism for self-regulation in explainability is undercut by the poor experience with voluntary mechanisms regarding privacy protection.54 in addition, library associations, library system vendors, and scholarly publishers have been slow to endorse any codes or standards regarding explainability. regulation the most common recommendation for ai oversight and authorization to ensure explainability is the creation of a regulatory agency. specific suggestions include a “neutral data arbiter” with investigative powers like the us federal trade commission,55 a food and drug administration “for algorithms,”56 a standing “commission on artificial intelligence,”57 quasi-governmental agencies such as the council of europe,58 and a hybrid agency model combining certification and liability.59 such agencies would have legislated or delegated powers to investigate, certify, license, and arbitrate on matters relating to ai and algorithms, including their design, use, and effects. there are few calls for an international regulatory agency despite digitally porous national boundaries and the global reach of machine learning.60 that almost no such agencies have been created reveals the strength and influence of the large corporations responsible for developing and deploying most machine learning tools and systems.61 reports comparing regulatory approaches to ai among the european union, the united kingdom, the united states, and canada indicate significantly different approaches but with most proceeding with a “light touch” to avoid competitive disadvantages in a multitrillion dollar global marketplace.62 the introduction of the draft eu artificial intelligence act marks the first major jurisdiction to propose specific ai legislation.63 while the act is fulsome about high-risk ai, it is silent on any notion of “explainable” ai, preferring to focus on the less specific idea of “trustworthy artificial intelligence.” with this the eu appears to retreat from the idea of explainability in the gdpr. an exception to this inertia or backtracking is the development and use of algorithmic impact assessments in both governments and industry. these instruments help prospective users of an algorithmic decision-making system determine levels of explanatory requirements and standards to meet those requirements.64 canada has been a leader in this area with a protocol covering use of these systems in the federal government.65 some identify due process as a possible, if limited, remedy for explainability.66 however, a landmark us case suggests otherwise. in state v. 
loomis, regarding the use of compas, an algorithmic sentencing system, the court ruled on the role of explanation in due process:67 the wisconsin supreme court held that a trial court’s use of an algorithmic risk assessment in sentencing did not violate the defendant’s due process rights even though the methodology used to produce the assessment was disclosed neither to the court nor to the defendant.68 the petition of the loomis case to the us supreme court was denied, so a higher court ruling on this issue is unavailable.69 advocacy for regulations regarding explainability should be a central concern for libraries. without strong regulatory oversight requiring disclosure and accountability, machine learning information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 8 systems will remain black boxes and presence of these consequential systems in the lives of users will be obscured. audit a commonly recommended approach to ai oversight and explanation is third-party auditing.70 the use of audit and principles of auditing are widely accepted in a variety of areas. 71 in a library context, auditing of ai can be thought of as a reviewing process to achieve transparency or to determine product compliance. auditing is typically done after system implementation, but it can be accomplished at any stage. it is possible to audit design specifications, completed code, cognitive models, or periodic audits of specific decisions.72 the keys to successful audit oversight are clear audit goals and objectives (e.g., what is being audited and for what purpose), acknowledged expertise of the auditors, authority of the auditors to recommend, and authorization of the auditors to investigate. any such auditing responsibility for xai would require the trust of stakeholders such as ai designers, government regulators, industry representatives as well as users themselves. critics of the audit approach have focused on lack of auditor expertise, algorithmic complexity, and the need for approaches that assess the algorithmic system prior to its release. 73 while most audit recommendations assume a public agency in this role, an innovative suggestion is a crowdsourced audit (a form of audit study that involves the recruitment of testers to anonymously assess an algorithmic system; an xai form of the “secret shopper”).74 this approach resembles techniques used by consumer advocates and might indicate the rise of public activists into the xai arena. the complexity of algorithms suggests that a precondition for an audit is “auditability.”75 this would require that ai be designed in such a way that an audit is possible (i.e., inspectable in some manner) while, presumably, not impairing its predictive performance. sandvig et al. propose regulatory changes because “rather than regulating for transparency or misbehavior, we find this situation argues for ‘regulation toward auditability’.”76 auditing is not without its difficulties. 
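the crowdsourced, "secret shopper" style of audit described above can be approximated in code by submitting matched inputs that differ only in one attribute and comparing the system's outputs. in the sketch below the scoring function is a hypothetical stand-in for a vendor system reachable only through its outputs, and the audited attribute and test records are invented for illustration.

```python
import random

def score_item(record):
    """hypothetical stand-in for an opaque vendor scoring system known only by its outputs."""
    random.seed(hash(frozenset(record.items())) % (2**32))
    return round(random.uniform(0, 1), 3)

def paired_audit(records, attribute, value_a, value_b):
    """send matched pairs differing only in one attribute and return the mean score gap."""
    gaps = []
    for record in records:
        a = dict(record, **{attribute: value_a})
        b = dict(record, **{attribute: value_b})
        gaps.append(score_item(a) - score_item(b))
    return sum(gaps) / len(gaps)

# invented test records; only the audited attribute is varied between the paired requests
test_records = [{"subject": s, "year": y} for s in ("history", "biology") for y in (1995, 2015)]
gap = paired_audit(test_records, attribute="language", value_a="english", value_b="spanish")
print(f"mean score gap (english minus spanish): {gap:+.3f}")
```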
there are no industry standards for algorithmic auditing.77 a high-profile development was the recent launch of orcaa (orcaarisk.com), an algorithmic auditing company started by cathy o'neil, a data scientist who has written extensively about the perils of uncontrolled algorithms.78 however, the legitimacy of third-party auditing has been criticized as lacking public transparency and the capacity to demand change.79 while libraries may not be able to create their own auditing capacity, whether collectively or individually, they are encouraged to engage with the emerging algorithmic auditing community to shape auditing practices appropriate for scholarly communication.
xai as discovery
while xai is primarily a means to validate and authorize machine learning systems, another use of xai is gaining attention. since xai can find new information latent in large and complex datasets, discovery is promoted as "one of the most important achievements of the entire algorithmic explainability project."80 alkhateeb asks "can scientific discovery really be automated?" while invoking the earlier work of swanson, which mined the medical literature for new knowledge by connecting seemingly unrelated articles through search.81 an emerging reason for libraries to adopt xai may be as a powerful discovery tool.
conclusion
our lives have become "algorithmically mediated"82 where we are "dependent on computational spectacles to see the world."83 academic libraries are now sites where systems, services, and collections are increasingly shaped and provided by machine learning. the predictions, recommendations, and decisions of machine learning systems are powerful as well as consequential. however, "the danger is not so much in delegating cognitive tasks, but in distancing ourselves from—or in not knowing about—the nature and precise mechanisms of that delegation."84 taddeo notes that "delegation without supervision characterises the presence of trust."85 xai is an essential tool to build that trust. geoffrey hinton, a central figure in the development of machine learning,86 argues that requiring an explanation from an ai system would be "a complete disaster" and that trust and acceptance should be based on the system's performance, not its explainability.87 this is consistent with the view of many that "if algorithms that cannot be easily explained consistently make better decisions in certain areas, then policymakers should not require an explanation."88 both these views are at odds with the tenets of critical thought and assessment, and both challenge norms of algorithmic accountability. xai is a dual opportunity for libraries. on one hand, it is a set of techniques, processes, and strategies that enable the interrogation of the algorithmically driven resources that libraries provide to their users. on the other hand, it is a public policy arena where advocacy is necessary to promote and uphold the values of librarianship, the academy, and the public interest in the face of powerful new technologies. many disciplines have engaged with xai as machine learning has impacted their fields.89 xai has been called a "disruptive force" in lis,90 warranting the growing interest in how xai affects the field and how the field might influence it.
information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 10 endnotes 1 vijay arya et al., “one explanation does not fit all: a toolkit and taxonomy of ai explainability techniques,” arxiv:1909.03012 [cs, stat], 2019, http://arxiv.org/abs/1909.03012; shane t. mueller et al., “explanation in human-ai systems: a literature meta-review, synopsis of key ideas and publications, and bibliography for explainable ai,” arxiv:1902.01876 [cs], 2019, http://arxiv.org/abs/1902.01876; ingrid nunes and dietmar jannach, “a systematic review and taxonomy of explanations in decision support and recommender systems,” user modeling and user-adapted interaction 27, no. 3 (2017): 393–444, https://doi.org/10.1007/s11257-017-9195-0; gesina schwalbe and bettina finzel, “xai method properties: a (meta-) study,” arxiv:2105.07190 [cs], 2021, http://arxiv.org/abs/2105.07190. 2 safiya noble, algorithms of oppression: how search engines reinforce racism (new york: new york university press, 2018); frank pasquale, the black box society: the secret algorithms that control money and information (cambridge, mass.: harvard university press, 2015); sara wachter-boettcher, technically wrong: sexist apps, biased algorithms, and other threats of toxic tech (new york: w. w. norton, 2017). 3 abeba birhane et al., “the values encoded in machine learning research,” arxiv:2106.15590 [cs], 2021, http://arxiv.org/abs/2106.15590; taina bucher, if ... then: algorithmic power and politics (new york: oxford university press, 2018); sarah myers west, meredith whittaker, and kate crawford, discriminating systems: gender, race, and power in ai (ai now institute, 2019), https://ainowinstitute.org/discriminatingsystems.html. 4 rao aluri and donald e. riggs, “application of expert systems to libraries,” ed. joe a. hewitt, advances in library automation and networking 2 (1988): 1–43; ryan cordell, machine learning + libraries: a report on the state of the field (washington dc: library of congress, 2020), https://labs.loc.gov/static/labs/work/reports/cordell-loc-ml-report.pdf; jason griffey, ed., “artificial intelligence and machine learning in libraries,” library technology reports 55, no. 1 (2019), https://doi.org/10.5860/ltr.55n1; guoying liu, “the application of intelligent agents in libraries: a survey,” program: electronic library and information systems 45, no. 1 (2011): 78–97, https://doi.org/10.1108/00330331111107411; linda c. smith, “artificial intelligence in information retrieval systems,” information processing and management 12, no. 3 (1976): 189–222, https://doi.org/10.1016/0306-4573(76)90005-4. 5 jenny bunn, “working in contexts for which transparency is important: a recordkeeping view of explainable artificial intelligence (xai),” records management journal (london, england) 30, no. 2 (2020): 143–53, https://doi.org/10.1108/rmj-08-2019-0038; cordell, “machine learning + libraries”; andrew m. 
cox, the impact of ai, machine learning, automation and robotics on the information professions (cilip, 2021), http://www.cilip.org.uk/resource/resmgr/cilip/research/tech_review/cilip_–_ai_report__final_lo.pdf; daniel johnson, machine learning, libraries, and cross-disciplinary research: possibilities and provocations (notre dame, indiana: hesburgh libraries, university of notre dame, 2020), https://dx.doi.org/10.7274/r0-wxg0-pe06; sarah lippincott, mapping the current landscape of research library engagement with emerging technologies in research and learning (washington dc: association of research libraries, 2020), https://www.arl.org/wp-content/uploads/2020/03/2020.03.25-emerging-technologies http://arxiv.org/abs/1909.03012 http://arxiv.org/abs/1902.01876 https://doi.org/10.1007/s11257-017-9195-0 http://arxiv.org/abs/2105.07190 http://arxiv.org/abs/2106.15590 https://ainowinstitute.org/discriminatingsystems.html https://labs.loc.gov/static/labs/work/reports/cordell-loc-ml-report.pdf https://doi.org/10.5860/ltr.55n1 https://doi.org/10.1108/00330331111107411 https://doi.org/10.1016/0306-4573(76)90005-4 https://doi.org/10.1108/rmj-08-2019-0038 http://www.cilip.org.uk/resource/resmgr/cilip/research/tech_review/cilip_–_ai_report_-_final_lo.pdf http://www.cilip.org.uk/resource/resmgr/cilip/research/tech_review/cilip_–_ai_report_-_final_lo.pdf https://dx.doi.org/10.7274/r0-wxg0-pe06 https://www.arl.org/wp-content/uploads/2020/03/2020.03.25-emerging-technologies-landscape-summary.pdf information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 11 landscape-summary.pdf; thomas padilla, responsible operations. data science, machine learning, and ai in libraries (dublin, oh: oclc research, 2019), https://doi.org/10.25333/xk7z-9g97; michael ridley, “explainable artificial intelligence,” research library issues, no. 299 (2019): 28–46, https://doi.org/10.29242/rli.299.3. 6 ridley, “explainable artificial intelligence,” 42. 7 bunn, “working in contexts for which transparency is important,” 151. 8 sebastian palacio et al., “xai handbook: towards a unified framework for explainable ai,” arxiv:2105.06677 [cs], 2021, http://arxiv.org/abs/2105.06677; sahil verma et al., “pitfalls of explainable ml: an industry perspective,” in mlsys journe workshop, 2021, http://arxiv.org/abs/2106.07758; giulia vilone and luca longo, “explainable artificial intelligence: a systematic review,” arxiv:2006.00093 [cs], 2020, http://arxiv.org/abs/2006.00093. 9 wojciech samek and klaus-robert muller, “towards explainable artificial intelligence,” in explainable ai: interpreting, explaining and visualizing deep learning, ed. wojciech samek et al., lecture notes in artificial intelligence 11700 (cham: springer international publishing, 2019), 17. 10 mueller et al., “explanation in human-ai systems.” 11 isto huvila et al., “information behavior and practices research informing information systems design,” journal of the association for information science and technology, 2021, 1–15, https://doi.org/10.1002/asi.24611. 12 darpa, explainable artificial intelligence (xai) (arlington, va: darpa, 2016), http://www.darpa.mil/attachments/darpa-baa-16-53.pdf. 13 matt turek, “explainable artificial intelligence (xai),” darpa, https://www.darpa.mil/program/explainable-artificial-intelligence. 
14 julie gerlings, arisa shollo, and ioanna constantiou, “reviewing the need for explainable artificial intelligence (xai),” in proceedings of the hawaii international conference on system sciences, 2020, http://arxiv.org/abs/2012.01007. 15 william j. clancey, “the epistemology of a rule-based expert system—a framework for explanation,” artificial intelligence 20, no. 3 (1983): 215–51, https://doi.org/10.1016/00043702(83)90008-5; william swartout, “xplain: a system for creating and explaining expert consulting programs,” artificial intelligence 21 (1983): 285–325; william swartout, cecile paris, and johanna moore, “design for explainable expert systems,” ieee expert-intelligent systems & their applications 6, no. 3 (1991): 58–64, https://doi.org/10.1109/64.87686. 16 european union, “regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016,” 2016, http://eur-lex.europa.eu/legalcontent/en/txt/?uri=celex:32016r0679. https://www.arl.org/wp-content/uploads/2020/03/2020.03.25-emerging-technologies-landscape-summary.pdf https://doi.org/10.25333/xk7z-9g97 https://doi.org/10.29242/rli.299.3 http://arxiv.org/abs/2105.06677 http://arxiv.org/abs/2006.00093 https://doi.org/10.1002/asi.24611 http://www.darpa.mil/attachments/darpa-baa-16-53.pdf https://www.darpa.mil/program/explainable-artificial-intelligence http://arxiv.org/abs/2012.01007 https://doi.org/10.1016/0004-3702(83)90008-5 https://doi.org/10.1016/0004-3702(83)90008-5 https://doi.org/10.1109/64.87686 http://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:32016r0679 http://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:32016r0679 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 12 17 lilian edwards and michael veale, “slave to the algorithm? why a ‘right to explanation’ is probably not the remedy you are looking for,” duke law & technology review 16 (2017): 18–84; bryce goodman and seth flaxman, “european union regulations on algorithmic decision making and a ‘right to explanation’,” ai magazine 38, no. 3 (2017): 50–57, https://doi.org/10.1609/aimag.v38i3.2741; margot e. kaminski, “the right to explanation, explained,” berkeley technology law journal 34, no. 1 (2019): 189–218, https://doi.org/10.15779/z38td9n83h; sandra wachter, brent mittelstadt, and luciano floridi, “why a right to explanation of automated decision-making does not exist in the general data protection regulation,” international data privacy law 7, no. 2 (2017): 76–99, https://doi.org/10.1093/idpl/ipx005. 18 amina adadi and mohammed berrada, “peeking inside the black-box: a survey on explainable artificial intelligence (xai),” ieee access 6 (2018): 52138–60, https://doi.org/10.1109/access.2018.2870052; mueller et al., “explanation in human-ai systems”; vilone and longo, “explainable artificial intelligence.” 19 schwalbe and finzel, “xai method properties.” 20 or biran and courtenay cotton, “explanation and justification in machine learning: a survey” (international joint conference on artificial intelligence, workshop on explainable artificial intelligence (xai), melbourne, 2017), http://www.cs.columbia.edu/~orb/papers/xai_survey_paper_2017.pdf. 21 padilla, responsible operations. 22 jenna burrell and marion fourcade, “the society of algorithms,” annual review of sociology 47, no. 1 (2021): 231, https://doi.org/10.1146/annurev-soc-090820-020800. 23 nick seaver, “seeing like an infrastructure: avidity and difference in algorithmic recommendation,” cultural studies 35, no. 
4–5 (2021): 775, https://doi.org/10.1080/09502386.2021.1895248. 24 frank c. keil, “explanation and understanding,” annual review of psychology 57 (2006): 227– 54, https://doi.org/10.1146/annurev.psych.57.102904.190100. 25 donald a. norman, “some observations on mental models,” in mental models, ed. dedre gentner and albert l. stevens (new york: psychology press, 1983), 7–14. 26 ashraf abdul et al., “trends and trajectories for explainable, accountable, and intelligible systems: an hci research agenda,” in proceedings of the 2018 chi conference on human factors in computing systems, chi ’18 (new york: acm, 2018), 582:1–582:18, https://doi.org/10.1145/3173574.3174156; joachim diederich, “methods for the explanation of machine learning processes and results for non-experts,” psyarxiv, 2018, https://doi.org/10.31234/osf.io/54eub. 27 pigi kouki et al., “user preferences for hybrid explanations,” in proceedings of the eleventh acm conference on recommender systems, recsys ’17 (new york, ny: acm, 2017), 84–88, https://doi.org/10.1145/3109859.3109915. https://doi.org/10.1609/aimag.v38i3.2741 https://doi.org/10.15779/z38td9n83h https://doi.org/10.1093/idpl/ipx005 https://doi.org/10.1109/access.2018.2870052 http://www.cs.columbia.edu/~orb/papers/xai_survey_paper_2017.pdf https://doi.org/10.1146/annurev-soc-090820-020800 https://doi.org/10.1080/09502386.2021.1895248 https://doi.org/10.1146/annurev.psych.57.102904.190100 https://doi.org/10.1145/3173574.3174156 https://doi.org/10.31234/osf.io/54eub https://doi.org/10.1145/3109859.3109915 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 13 28 tim miller, “explanation in artificial intelligence: insights from the social sciences,” artificial intelligence 267 (2019): 3, https://doi.org/10.1016/j.artint.2018.07.007. 29 jenna burrell, “how the machine ‘thinks’: understanding opacity in machine learning algorithms,” big data & society 3, no. 1 (2016), https://doi.org/10.1177/2053951715622512. 30 duri long and brian magerko, “what is ai literacy? competencies and design considerations,” in proceedings of the 2020 chi conference on human factors in computing systems, chi ’20 (honolulu, hi: association for computing machinery, 2020), 2, https://doi.org/10.1145/3313831.3376727. 31 michael ridley and danica pawlick-potts, “algorithmic literacy and the role for libraries,” information technology and libraries 40, no. 2 (2021), https://doi.org/doi.org/10.6017/ital.v40i2.12963. 32 waddah saeed and christian omlin, “explainable ai (xai): a systematic meta-survey of current challenges and future opportunities,” arxiv:2111.06420 [cs], 2021, http://arxiv.org/abs/2111.06420. 33 shane t. mueller et al., “principles of explanation in human-ai systems” (explainable agency in artificial intelligence workshop, aaai 2021), http://arxiv.org/abs/2102.04972. 34 sebastian bach et al., “on pixel-wise explanations for non-linear classifier decisions by layerwise relevance propagation,” plos one 10, no. 7 (2015): e0130140, https://doi.org/10.1371/journal.pone.0130140; biran and cotton, “explanation and justification in machine learning: a survey”; chris brinton, “a framework for explanation of machine learning decisions” (ijcai-17 workshop on explainable ai (xai), melbourne: ijcai, 2017), http://www.intelligentrobots.org/files/ijcai2017/ijcai-17_xai_ws_proceedings.pdf; chris olah, alexander mordvintsev, and ludwig schubert, “feature visualization,” distill, november 7, 2017, https://doi.org/10.23915/distill.00007. 
35 edwards and veale, “slave to the algorithm?” 36 philip adler et al., “auditing black-box models for indirect influence,” knowledge and information systems 54 (2018): 95–122, https://doi.org/10.1007/s10115-017-1116-3. 37 alisa bokulich, “how scientific models can explain,” synthese 180, no. 1 (2011): 33–45, https://doi.org/10.1007/s11229-009-9565-1; keil, “explanation and understanding.” 38 herbert a. simon, “what is an ‘explanation’ of behavior?,” psychological science 3, no. 3 (1992): 150–61, https://doi.org/10.1111/j.1467-9280.1992.tb00017.x. 39 norbert schwarz et al., “ease of retrieval as information: another look at the availability heuristic,” journal of personality and social psychology 61, no. 2 (1991): 195–202, https://doi.org/10.1037/0022-3514.61.2.195; paul thagard, “evaluating explanations in law, science, and everyday life,” current directions in psychological science 15, no. 3 (2006): 141– 45, https://doi.org/10.1111/j.0963-7214.2006.00424.x. https://doi.org/10.1016/j.artint.2018.07.007 https://doi.org/10.1177/2053951715622512 https://doi.org/10.1145/3313831.3376727 https://doi.org/doi.org/10.6017/ital.v40i2.12963 http://arxiv.org/abs/2111.06420 http://arxiv.org/abs/2102.04972 https://doi.org/10.1371/journal.pone.0130140 http://www.intelligentrobots.org/files/ijcai2017/ijcai-17_xai_ws_proceedings.pdf https://doi.org/10.23915/distill.00007 https://doi.org/10.1007/s10115-017-1116-3 https://doi.org/10.1007/s11229-009-9565-1 https://doi.org/10.1111/j.1467-9280.1992.tb00017.x https://doi.org/10.1037/0022-3514.61.2.195 https://doi.org/10.1111/j.0963-7214.2006.00424.x information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 14 40 tania lombrozo, “explanatory preferences shape learning and inference,” trends in cognitive sciences 20, no. 10 (2016): 756, https://doi.org/10.1016/j.tics.2016.08.001. 41 sarah tan et al., “detecting bias in black-box models using transparent model distillation,” arxiv:1710.06169 [cs, stat], november 18, 2017, http://arxiv.org/abs/1710.06169. 42 marco tulio ribeiro, sameer singh, and carlos guestrin, “model-agnostic interpretability of machine learning,” arxiv:1606.05386 [cs, stat], 2016, http://arxiv.org/abs/1606.05386. 43 beta writer, lithium-ion batteries: a machine-generated summary of current research (heidelberg: springer nature, 2019), https://link.springer.com/book/10.1007/978-3-03016800-1. 44 henning schoenenberger, christian chiarcos, and niko schenk, preface to lithium-ion batteries; a machine-generated summary of current research, by beta writer, (heidelberg: springer international publishing, 2019). 45 michael ridley, “machine information behaviour,” in the rise of ai: implications and applications of artificial intelligence in academic libraries, ed. sandy hervieux and amanda wheatley (association of college and university libraries, 2022). 46 babatunde kazeem olorisade, pearl brereton, and peter andras, “reproducibility of studies on text mining for citation screening in systematic reviews: evaluation and checklist,” journal of biomedical informatics 73 (2017): 1–13, https://doi.org/10.1016/j.jbi.2017.07.010; babatunde k. olorisade, pearl brereton, and peter andras, “reproducibility in machine learning-based studies: an example of text mining,” in reproducibility in ml workshop (international conference on machine learning, sydney, australia, 2017), https://openreview.net/pdf?id=by4l2pbq-. 
47 joelle pineau, “reproducibility challenge,” october 6, 2017, http://www.cs.mcgill.ca/~jpineau/iclr2018-reproducibilitychallenge.html. 48 benjamin haibe-kains et al., “transparency and reproducibility in artificial intelligence,” nature 586, no. 7829 (2020): e14–e16, https://doi.org/10.1038/s41586-020-2766-y; benjamin j. heil et al., “reproducibility standards for machine learning in the life sciences,” nature methods, august 30, 2021, https://doi.org/10.1038/s41592-021-01256-7. 49 cliff kuang, “can a.i. be taught to explain itself?,” the new york times magazine, november 21, 2017, 50, https://nyti.ms/2hr1s15. 50 amitai etzioni and oren etzioni, “incorporating ethics into artificial intelligence,” the journal of ethics 21, no. 4 (2017): 403–18, https://doi.org/10.1007/s10892-017-9252-2. 51 kamran alipour et al., “improving users’ mental model with attention-directed counterfactual edits,” applied ai letters, 2021, e47, https://doi.org/10.1002/ail2.47. 52 association for computing machinery, statement on algorithmic transparency and accountability (new york: acm, 2017), http://www.acm.org/binaries/content/assets/publicpolicy/2017_joint_statement_algorithms.pdf; alex campolo et al., ai now 2017 report (new https://doi.org/10.1016/j.tics.2016.08.001 http://arxiv.org/abs/1710.06169 http://arxiv.org/abs/1606.05386 https://link.springer.com/book/10.1007/978-3-030-16800-1 https://link.springer.com/book/10.1007/978-3-030-16800-1 https://doi.org/10.1016/j.jbi.2017.07.010 https://openreview.net/pdf?id=by4l2pbqhttp://www.cs.mcgill.ca/~jpineau/iclr2018-reproducibilitychallenge.html https://doi.org/10.1038/s41586-020-2766-y https://doi.org/10.1038/s41592-021-01256-7 https://nyti.ms/2hr1s15 https://doi.org/10.1007/s10892-017-9252-2 https://doi.org/10.1002/ail2.47 http://www.acm.org/binaries/content/assets/public-policy/2017_joint_statement_algorithms.pdf http://www.acm.org/binaries/content/assets/public-policy/2017_joint_statement_algorithms.pdf information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 15 york: ai now institute, 2017); ieee, ethically aligned design: a vision for prioritizing human wellbeing with artificial intelligence and autonomous systems (new york: ieee, 2019), https://standards.ieee.org/content/dam/ieeestandards/standards/web/documents/other/ead1e.pdf. 53 association for computing machinery, statement on algorithmic transparency and accountability, 2. 54 lilian edwards and michael veale, “enslaving the algorithm: from a ‘right to an explanation’ to a ‘right to better decisions’?,” ieee security & privacy 16, no. 3 (2018): 46–54. 55 kate crawford and jason schultz, “big data and due process: toward a framework to redress predictive privacy harms,” boston college law review 55, no. 1 (2014): 93–128. 56 andrew tutt, “an fda for algorithms,” administrative law review 69, no. 1 (2017): 83–123. 57 corinne cath et al., “artificial intelligence and the ‘good society’: the us, eu, and uk approach,” science and engineering ethics, march 28, 2017, https://doi.org/10.1007/s11948-017-9901-7. 58 edwards and veale, “slave to the algorithm?” 59 matthew u. scherer, “regulating artificial intelligence systems: risks, challenges, competencies, and strategies,” harvard journal of law & technology 29, no. 2 (2016): 353– 400. 60 roger brownsword, “from erewhon to alphago: for the sake of human dignity, should we destroy the machines?,” law, innovation and technology 9, no. 1 (january 2, 2017): 117–53, https://doi.org/10.1080/17579961.2017.1303927. 
61 birhane et al., “the values encoded in machine learning research”; ana brandusescu, artificial intelligence policy and funding in canada: public investments, private interests (montreal: centre for interdisciplinary research on montreal, mcgill university, 2021). 62 cath et al., “artificial intelligence and the ‘good society’”; law commission of ontario and céline castets-renard, comparing european and canadian ai regulation, 2021, https://www.lcocdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulationfinal-november-2021.pdf. 63 european commission, “artificial intelligence act,” 2021, https://eur-lex.europa.eu/legalcontent/en/txt/?uri=celex:52021pc0206. 64 dillon reisman et al., algorithmic impact assessment: a practical framework for public agency accountability (new york: ai now institute, 2018), https://ainowinstitute.org/aiareport2018.pdf. 65 treasury board of canada secretariat, “directive on automated decision-making,” 2019, http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=32592. https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead1e.pdf https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead1e.pdf https://doi.org/10.1007/s11948-017-9901-7 https://doi.org/10.1080/17579961.2017.1303927 https://www.lco-cdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulation-final-november-2021.pdf https://www.lco-cdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulation-final-november-2021.pdf https://www.lco-cdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulation-final-november-2021.pdf https://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:52021pc0206 https://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:52021pc0206 https://ainowinstitute.org/aiareport2018.pdf http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=32592 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 16 66 danielle keats citron and frank pasquale, “the scored society: due process for automated predictions,” washington law review 89 (2014): 1–33; scherer, “regulating artificial intelligence systems.” 67 julia angwin et al., “machine bias,” propublica, may 23, 2016, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. 68 “state v. loomis,” harvard law review 130, no. 5 (2017), https://harvardlawreview.org/2017/03/state-v-loomis/. 69 “loomis v. wisconsin,” scotusblog, june 26, 2017, http://www.scotusblog.com/casefiles/cases/loomis-v-wisconsin/. 70 brownsword, “from erewhon to alphago”; campolo et al., ai now 2017 report; ieee, ethically aligned design; pasquale, the black box society: the secret algorithms that control money and information; wachter, mittelstadt, and floridi, “why a right to explanation.” 71 michael power, the audit society: rituals of verification (oxford: oxford university press, 1997). 72 alfred ng, “can auditing eliminate bias from algorithms?,” the markup, february 23, 2021, https://themarkup.org/ask-the-markup/2021/02/23/can-auditing-eliminate-bias-fromalgorithms. 73 joshua alexander knoll, “accountable algorithms” (phd diss, princeton university, 2015). 
74 christian sandvig et al., “auditing algorithms: research methods for detecting discrimination on internet platforms,” data and discrimination: converting critical concerns into productive inquiry, 2014, http://wwwpersonal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20-%20ica%202014%20data%20and%20discrimination%20preconference.pdf. 75 association for computing machinery, statement on algorithmic transparency and accountability. 76 sandvig et al., “auditing algorithms,” 17. 77 ng, “can auditing eliminate bias from algorithms?” 78 cathy o’neil, weapons of math destruction: how big data increases inequality and threatens democracy (new york: crown, 2016). 79 emanuel moss et al., assembling accountability: algorithmic impact assessment for the public interest (data & society, 2021), https://datasociety.net/wpcontent/uploads/2021/06/assembling-accountability.pdf. 80 david s. watson and luciano floridi, “the explanation game: a formal framework for interpretable machine learning,” synthese (dordrecht) 198, no. 10 (2020): 9214, https://doi.org/10.1007/s11229-020-02629-9. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing https://harvardlawreview.org/2017/03/state-v-loomis/ http://www.scotusblog.com/case-files/cases/loomis-v-wisconsin/ http://www.scotusblog.com/case-files/cases/loomis-v-wisconsin/ https://themarkup.org/ask-the-markup/2021/02/23/can-auditing-eliminate-bias-from-algorithms https://themarkup.org/ask-the-markup/2021/02/23/can-auditing-eliminate-bias-from-algorithms http://www-personal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20--%20ica%202014%20data%20and%20discrimination%20preconference.pdf http://www-personal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20--%20ica%202014%20data%20and%20discrimination%20preconference.pdf http://www-personal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20--%20ica%202014%20data%20and%20discrimination%20preconference.pdf https://datasociety.net/wp-content/uploads/2021/06/assembling-accountability.pdf https://datasociety.net/wp-content/uploads/2021/06/assembling-accountability.pdf https://doi.org/10.1007/s11229-020-02629-9 information technology and libraries june 2022 explainable artificial intelligence (xai) | ridley 17 81 ahmed alkhateeb, “science has outgrown the human mind and its limited capacities,” aeon, april 24, 2017, https://aeon.co/ideas/science-has-outgrown-the-human-mind-and-its-limitedcapacities; don r. swanson, “undiscovered public knowledge,” the library quarterly 56, no. 2 (1986): 103–18; don r. swanson, “medical literature as a potential source of new knowledge.,” bulletin of the medical library association 78, no. 1 (1990): 29–37. 82 jack anderson, “understanding and interpreting algorithms: toward a hermeneutics of algorithms,” media, culture & society 42, no. 7–8 (2020): 1479–94, https://doi.org/10.1177/0163443720919373. 83 ed finn, “algorithm of the enlightenment,” issues in science and technology 33, no. 3 (2017): 24. 84 jos de mul and bibi van den berg, “remote control: human autonomy in the age of computermediated agency,” in law, human agency, and autonomic computing, ed. mireille hildebrandt and antoinette rouvroy (abingdon: routledge, 2011), 59. 85 mariarosaria taddeo, “trusting digital technologies correctly,” minds and machines 27, no. 4 (2017): 565, https://doi.org/10.1007/s11023-017-9450-5. 
86 cade metz, genius makers: the mavericks who brought ai to google, facebook, and the world (dutton, 2021). 87 tom simonite, “google’s ai guru wants computers to think more like brains,” wired, december 12, 2018, https://www.wired.com/story/googles-ai-guru-computers-think-more-like-brains/. 88 nick wallace, “eu’s right to explanation: a harmful restriction on artificial intelligence,” techzone, january 25, 2017, http://www.techzone360.com/topics/techzone/articles/2017/01/25/429101-eus-rightexplanation-harmful-restriction-artificial-intelligence.htm#. 89 mueller et al., “explanation in human-ai systems.” 90 bunn, “working in contexts for which transparency is important,” 143. https://aeon.co/ideas/science-has-outgrown-the-human-mind-and-its-limited-capacities https://aeon.co/ideas/science-has-outgrown-the-human-mind-and-its-limited-capacities https://doi.org/10.1177/0163443720919373 https://doi.org/10.1007/s11023-017-9450-5 https://www.wired.com/story/googles-ai-guru-computers-think-more-like-brains/ http://www.techzone360.com/topics/techzone/articles/2017/01/25/429101-eus-right-explanation-harmful-restriction-artificial-intelligence.htm http://www.techzone360.com/topics/techzone/articles/2017/01/25/429101-eus-right-explanation-harmful-restriction-artificial-intelligence.htm abstract introduction what is xai? types of xai prerequisites to an xai strategy proofs and validations feature audit approximation and abstraction reproducibility xai by ai authorization codes and standards regulation audit xai as discovery conclusion endnotes 66 journal of lihm1'y automation vol. 7/1 march 1974 book reviews computer systems in the library: a handbook for managers and designers. by stanley j. swihart and beryl f. hefley. a wiley-becker & hayes series book. los angeles: melville publishing company, 1973. 388p. once every year or two, either in england or the united states, a book appears attempting to explain computer systems to librarians. this book, compute?' systems in the library, is the most recent of the introductory texts. it starts off with a chapter entitled "why automate?" which skims ve1y lightly and uncritically over the often-repeated reasons for using computers. in this instance, money is included as a reason to automate, for we are told that "when properly planned, unit operating costs are normally reduced when a function is automated." automation's impact on the library's research and development budget is not discussed. the book then proceeds to the six chapters which occupy the bull< of the book they cover the automation of six major librmy functions: catalog publication, circulation, acquisitions, cataloging, catalog reference services, and serials. each chapter consists of a description of one or two apparently existing automated systems, with a complete discussion of how the system functions, what files are involved, the data in each file, coding and formats used in the files, and reproductions of various output products from each file. unfortunately, we are not told where each of these systems exists, and the systems often appear to use techniques that are suitable only for very small libraries. for example, in the circulation system that is described, a packet of prepunched book cards is to be carried in the book; each time the book is charged or discharged one of the cards is removed, with the last card serving as a signal to create a new deck of cards. 
little mention is made of the data collection terminals that are so commonly used in automated circulation systems, with the result that the description is very closely linked to a single system, with little opportunity for the reader to compare various methods or techniques of information handling. the latter part of the book addresses itself to some general problems, including the interlibrary sharing of data and programs; the planning, implementation, and control of automation projects; and brief discussions of input and output problems, the protection of records, and some considerations in choosing hardware. three appendixes offer a 2,500-word exclusion list for kwic indexes, a set of model keypunching rules for a corporate library, and a thirty-three-item bibliography in which the majority of works listed were published between 1964 and 1968. a major weakness of the book seems to be its lack of critical focus. library automation problems are treated as being not particularly difficult; in fact, "the authors can see no serious or major disadvantages to automation in libraries. the situation," we are told, "can be compared with the disadvantages of using typewriters or telephones." this reviewer finds it difficult to know what sort of audience these words, and the entire book, are addressed to. though subtitled "a handbook for managers and designers," it would be an inexperienced manager indeed who needed to be told that "in its mode of operation, a keypunch is quite similar to a typewriter. a key must be struck for each character . . . ," or that "the catalog master file may be stored on magnetic tape reels or on magnetic disks." the experienced librarian, on the other hand, will not be pleased to learn that "many libraries with computer systems have given up the library of congress [filing] system for mel mac and have placed mac in order between mab and mad, and me between mb and md." nor will anyone associated with libraries be pleased to discover that "computer centers not only can, but frequently do, lose information. from time to time complete files are erased. there is almost no way to ensure that information will not be inadvertently erased." the librarian who is already involved in automated systems will not need this book; the librarian who wishes to learn about automation and the systems analyst who needs to understand library systems will do well to read other sources in addition to this one. peter simmons university of british columbia the metropolitan library. edited by ralph w. conant and kathleen molz. cambridge, mass.: m.i.t. press, 1972. 333p. $10.00. the editors describe this book as a sequel to the important public library and the city (1965), also published by m.i.t. press. the focus again is on the concerns of metropolitan public librarians, combining the viewpoints of specialists from library and social science disciplines. of the eighteen papers included, only three, by john trebbel, john bystrom, and kathleen molz, concentrate on the implications of present and future technology on public library service. their papers offer a general, if hard-nosed, approach to the need for specific research into the economic, behavioral, professional, and technological barriers impeding the advent of the automated millenium. micrographics, reprography, computers, facsimile transmission, telecommunications hardware, and technology are considered essential components of information transbook reviews 61 fer with which libraries must become compatible-and comfortable. 
the imperative need for and conduct of long-range research in telecommunications is outlined by bystrom, including aspects of research necessary for both a national telecommunications network linking all types of libraries and the local use of community cablevision by individual library outlets. the three authors devote considerable head-shaking to the chilling reality of financing technological adaptations and innovations in libraries-the "snake in eden" according to trebbel. governments, specifically national governments, are cited as the logical sources of the enormous sums required for automated library and information services of whatever kind. molz warns repeatedly and forcefully that libraries, while not discarding the book, must change their priorities. continued dependence on print as the prime information transfer medium is insupportable. the public library must adapt to a multimedia world. none of the foregoing is new to information scientists or specialists in automation, but as concerned participants in the knowledge business they should find these papers of general interest. lois m. bewley university of british columbia
library space information model based on gis — a case study of shanghai jiao tong university yaqi shen information technology and libraries | september 2018 yaqi shen (yqshen@sjtu.edu.cn) is a librarian at shanghai jiao tong university. abstract in this paper, a library-space information model (lsim) based on a geographical information system (gis) was built to visually show the bookshelf location of each book through the display interface of various terminals. taking shanghai jiao tong university library as an example, both spatial information and attribute information were integrated into the model. in the spatial information, the reading room layout, bookshelves, reference desks, and so on were constructed with different attributes. the bookshelf layer was the key attribute of the bookshelves, and each book was linked to one bookshelf layer. through the field of bookshelf layer, the book in the query system can be connected with the bookshelf-layer information of the lsim. with the help of this model, readers can search books visually in the query system and find the books’ positions accurately. it can also be used in the inquiry of special-collection resources. additionally, librarians can use this model to analyze books’ circulation status, and books with similar subjects that are frequently circulated can be recommended to readers. the library’s permanent assets (chairs, tables, etc.) could be managed visually in the model. this paper used gis as a tool to solve the problem of accurate positioning, simultaneously providing better services for readers and realizing visual management of books for librarians. introduction geographical information systems (gis) are powerful tools that can edit, store, analyze, display, and manage geographical data.
early in 1992, several association of research libraries (arl) institutions, including the university of georgia, harvard university, north carolina state university, and southern illinois university, launched the gis literacy project and carried out an extensive survey about the possible applications of gis in libraries.1 since then, studies about the application of gis in library research have attracted more and more attention.2 gis is effective for library-planning efforts, such as investigating library-service areas, modeling the implications of the opening and closing of library services, informing initial location decisions, and so on.3 the university of idaho library adopted gis to link variables such as age, race, income, and education from the 2000 us census with the service-area maps of two proposed branch libraries. based on the thematic maps created, the demographic information about potential library users can be displayed. most importantly, the maps were also helpful for improving the library-service planning. koontz et al. from florida state university investigated the reasons for public-library closure by using gis. the authors presented a methodology using gis to describe libraries’ geographic market to illustrate the effects of facility location, relocation, and permanent closure on potential users. sin used gis with inequality measures and multiple regressions to analyze statistics from the public-libraries survey and the census-tract data. then the nationwide library space information model based on gis | shen 100 https://doi.org/10.6017/ital.v37i3.10308 multivariate study of the neighborhood-level variations was investigated, and the public libraries’ funding and service landscapes were mapped. gis can also provide strong support for the library accessibility.4 in south wales, united kingdom, a case study about a preliminary analysis of spatial variations in accessibility of library services was carried out based on a gis model. park further measured the public-library accessibility accurately and provided realistic analysis by using gis, including descriptive and statistical analyses and a road network–based distance measure. in another paper, park went a step further to measure readers’ travel time and distance while they are using the library. in addition to using gis for library planning and accessibility, it can be also applied to managing the collections, including the physical documents and digital databases of an academic library.5 solar and radovan from the national and university library of slovenia explored the possibility of creating a virtual collection of diverse materials like maps and pictorial documents using gis. they connected spatial data with other pictorial elements, including views and portrait images with hyperlinks.6 coyle from rochester public library studied the implementation of gis in the library collection. he believed that libraries that implemented gis early on would have an intellectual advantage over those coming on board later.7 sedighi conducted research about gis as a decisionsupport system in analyzing geospatial data in the databases of an academic library. 
by using the analysis functions of the system, a range of features could be indicated; for example, the spatial relationships of data based on the educational course can be analyzed.8 boda used a 3d virtuallibrary model to represent the most prominent and celebrated collection of classical antiquity in the alexandria library.9 beyond the applications mentioned above, some libraries have used gis techniques to analyze reader behaviors.10 xia developed gis into an analytical tool for examining the relationships between the height of the shelf and the frequency of book use, revealing that readers tended to pull books off shelves that are easily reachable by human eyes and hands. mandel used gis to map the most popular routes that readers took when entering the library. based on the seating sweeps method, mandel adopted maps to depict use of tables and computers. the research results of both xia and mandel can provide the information of readers’ behavior whereby the books’ positions, and accordingly the entry routes and facilities’ evaluation can be adjusted strategically. though lots of work has been done about the application of gis to the library, there are few reports about visually showing the exact position of each book through the library-catalog display interface, which is of great importance both for the readers and the librarians. xia located library items with gis and pointed out that updating the starting and ending call numbers for each shelf could be the most tedious work.11 specifically, gis cannot tell if the book is not in its correct location or is being used by somebody else. xia advised combining gis with radio frequency identification (rfid), both of which have the capability of tracing the location of each book. stackmap, a library-mapping tool providing a collection-mapping product for librarians, was being used at the hampton library.12 the shanghai jiao tong university library built an interface that would use gis to identify the specific location of each book in the catalog. a gis model that includes spatial and attribute information was constructed. the connection of gis, rfid, and opac was discussed in detail. additionally, the relationship between the bookshelves and patrons’ behavior was studied deeply. information technology and libraries | septmber 2018 101 it is hoped that this gis model will bring convenient services for readers and efficient management for librarians. methodology background in 1984, shanghai jiao tong university circulation system was built based on barcode-reader technology. the first automated library-management system (lms), minisis and image library integrated system, was implemented in 1988. in 1993, the second lms, the unify online multiuser system, was implemented. in 1994, an open public access catalogue (opac) system was built based on the unils, allowing readers to query the library bibliographic record through the computer. in 1998, the third automated lms, a client/server–based tool, was built based on the horizon lms. in 2008, we launched the aleph integrated library system (ils). in the same year, primo, a resource discovery and access system, was introduced. in 2009, the our explore interface was built based on the primo system, providing the services of resource retrieval and access.13 rfid technology was introduced in 2014, and now readers can borrow or return books through self-service machines. 
users can find a book via the opac or our explore system in the shanghai jiao tong university library homepage (http://www.lib.sjtu.edu.cn/index.php?m=content&c=index&lang=en), a screen shot of which is shown in figure 1. book information can be found through the systems, but the exact position of the books cannot be exhibited in the system. at the library reference desk, the question readers ask most frequently is where they can find a certain book. the chinese library classification (clc) system is used to organize the collections in the shanghai jiao tong university. the librarians are very familiar with the classification. however, it is hard for the inexperienced users to understand, even if they have been trained. although static maps can guide patrons to find the books, patrons sometimes still have difficulties finding the books. if the readers can get the exact bookshelf location for a book through the opac or our explore system, the users’ experience could be improved significantly, and much of readers’ time for finding the books could be saved. therefore, it is necessary to introduce gis to the library with the aim of visually showing the position of each book. furthermore, library managers need to plan the budget at the end of every year. the arrangement of different subjects should be considered in the planning. although the usage of the collections by the ils provides reference for the planning, a library-space information model (lsim) would bring a new insight. software there are many kinds of gis software in this research field, including commercial products such as arcgis, mapinfo, and mapgis as well as free and open-source software (foss) solutions. taking foss and arcgis for example, foss can provide a broader context of the open-source software movement and developments in gis.14 no single foss package can match all the functionality that arcgis has for creating thematic maps; therefore, the function of spatial analysis and data processing of arcgis is more powerful. the software used in this study is arcgis 10.3 trial version. http://www.lib.sjtu.edu.cn/index.php?m=content&c=index&lang=en library space information model based on gis | shen 102 https://doi.org/10.6017/ital.v37i3.10308 figure 1. opac and our explore in the shanghai jiao tong university library homepage. methods there are two modules in the lsim, including spatial information and attributes information, as shown in figure 2. spatial information, including the building position, the reading-room layout, bookshelf information, and so on, is transferred to shapefile style. remote-sensing information is used to set the geographic location of the library. these elements are constructed with different attributes, and 2d-attribute and 3d-multipatch data are stored in the geodatabase. arcmap and arcscene are used to generate the 2d and 3d maps and analyze the readers’ behavior. we connect the spatial information with data from the opac, our explore, and rfid. the query fields (which we call “general information”) in the opac are title, author, keyword, call number, issn, isbn, system number, barcode, collection location, and publisher. in the our explore system, readers can not only search the general information, but also refine the search results by specific fields, such as topic, author, collection location, published date, and clc. the functions of book reserving and renewing are also supported by these two systems. 
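to make the two-module structure concrete, the following sketch (written here in python purely for illustration; the field names, types, and values are assumptions rather than the library's actual geodatabase schema) models a bookshelf as a spatial feature and a bookshelf layer as the attribute record that later links to the rfid and query systems.

# a minimal, illustrative sketch of the two lsim modules described above; the
# field names and the example values are assumptions, not the library's schema.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ShelfFeature:
    """spatial module: one bookshelf drawn in the reading-room layout."""
    shelf_id: str                          # label on the floor plan, e.g. row/column
    footprint: List[Tuple[float, float]]   # 2d polygon vertices (x, y)
    height_m: float                        # used for the 3d (multipatch-style) view


@dataclass
class ShelfLayerAttribute:
    """attribute module: one layer (tier) of a shelf; the key linking field."""
    bookshelf_layer: str                   # e.g. "a4r042c04", matched to rfid records
    shelf_id: str                          # which spatial feature it belongs to
    layer_no: int                          # counted from the bottom of the shelf
    centroid: Tuple[float, float, float]   # (x, y, height) used to place the marker


@dataclass
class Lsim:
    """toy geodatabase holding both modules, keyed for quick lookups."""
    shelves: Dict[str, ShelfFeature] = field(default_factory=dict)
    layers: Dict[str, ShelfLayerAttribute] = field(default_factory=dict)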
rfid is introduced to the shanghai jiao tong library to allow self-service, and the fields include collection location, subject, issn, isbn, barcode, and so on. barcode is the common field in all three systems and is used to connect them. in the rfid system, the bookshelf is the unique identification of each shelf in the bookshelves. in the shanghai jiao tong university library, the first-book location method is used to manage books in the rfid system. the first book on each bookshelf is recorded as a different bookshelf location, and the books on one bookshelf are assigned to the same bookshelf location. the books are ordered and arranged according to the call number. a book’s current status can be obtained in the information technology and libraries | septmber 2018 103 rfid system by shelf inventory. the books that are borrowed by patrons or not on the right shelf would be recorded in the rfid system. the key attribute information in the lsim is the bookshelf layer, which is used to describe the book’s position. the field of the bookshelf layer is connected with the rfid data. taking the bookshelf layer of rfid as the attribute field, the position of a book can be located by the bookshelf layer in the lsim. compared to xia’s research, it is easier to get the bookshelf-layer information based on the rfid in the lsim.17 figure 2. research flowchart. the connection of the opac, rfid, and lsim is shown in figure 3. when the reader locates a book in the opac or our explore, the barcode will be shown in the system. the bookshelf layer in the rfid system can be retrieved through the barcode immediately. the map of the reading room has been embedded in the opac. furthermore, the coordinates of the book (x, y, height) can be shown through the bookshelf layer. the index of each bookshelf coordination is created in the opac, rfid system, and lsim. the field of the map presentation is built in the opac, and the search interface is supported by the arcmap and arcscene. the url link is the content of the field, and its content is varied with the different bookshelves. in short, when the reader searches one book, the related bookshelf coordination is highlighted in the map. through the bookshelf layer field, the book information in the query system can be connected with that of lsim. faculty and students can search books in the query system visually. as shown in figure 2, spatial information and attribute information are connected in the lsim. furthermore, a lsim based on gis is built to provide better services for readers and enhance librarians’ visual management. library space information model based on gis | shen 104 https://doi.org/10.6017/ital.v37i3.10308 figure 3. the connection of the opac, rfid system, and lsim. figure 4. finding a book in the our explore system. information technology and libraries | septmber 2018 105 figure 5a. the visual position of the book with the call number r318-53/3 (2d). figure 5b. the visual position of the book with the call number r318-53/3 (3d). library space information model based on gis | shen 106 https://doi.org/10.6017/ital.v37i3.10308 discussion providing services for readers by lsim visual query in the reading room when a book about biological medicine is required, it can be searched by using the keyword “biological medicine” in our explore. then, as shown in figure 4, a book titled amalgamation within evolution can be found with the clc call number r318-53/3. readers can find the book with the call number in the corresponding reading room. 
however, if the lsim is applied, the search results include not only the text information about the book’s location, but also a visual map. firstly, the barcode of the book (32832872) is identified and passed to the bookshelf layer. the bookshelf layer (a4r042c04) will be found in the lsim. then the book’s spatial position can be shown on a visual map. figures 5a and 5b show the 2d and 3d visual position of the book with the call number r318-53/3, and these two results can be switched in the system. the red arrow is the book’s position. based on the visual position, readers can find the book more conveniently. the reading rooms in shanghai jiao tong university library are organized by subject. in each reading room, the books with related categories are distributed together. figures 5a and 5b show the layout of one reading room. the books with the large clc classes, i.e., o, p, q, r, and s, were studied as an example in the reading room in this paper. the red triangles represent chairs and the light green rectangles represents desks. shelves are alphabetically labeled. the reference desk, office area, group study room, storehouse, inquiry machines, printers, and stairs are also shown. special collections in different reading rooms in the shanghai jiao tong university library, there are many special collections, such as contract documents, tsung-dao lee’s manuscripts, alumni theses, important findings of research teams, and so on. because of their rarity, these special collections do not circulate and can only be read in the reading rooms. furthermore, these collections are located in different branch libraries. the geographical information of these resources can be input into the model. scholars can use lsim to achieve the exact positions of these resources, go directly to the related area, and quickly find these special items. library analysis and management book-borrowing situation analysis using gis, it is also possible to show how often books circulate based on their physical location. as shown in figure 6, each rectangle represents a shelf in the reading room. the books with the same topic are placed on the same shelf. the number labeled on the shelf represents the average borrowing frequency of the books on this shelf. different colors mean different frequency, with scale of five to one hundred. the clc classes o, p, and q appearing on the right of the shelves represent mathematical sciences and chemistry, astronomy and geosciences, and bioscience, respectively. information technology and libraries | septmber 2018 107 figure 6. average borrowing frequency of the books on each shelf in one reading room. based on analysis of the relationship between borrowing frequency and subject category, the hot spots of the professional fields can be found and shown. in turn, books related to the hot spots can be recommended to readers. taking class o as an example, the shelf position of the highest borrowing frequency (100) is in row 9, column 2. according to the query system, the theme of the books on this shelf is high polymer chemistry. the books with high borrowing frequency can be highlighted both on the bookshelf and in the query system. if the higher-borrowing-frequency books on the remote shelves meet school discipline development policy, the purchases of these books will be increased. books related to the subjects with the higher borrowing frequency on the taller or lower shelves will also be considered, and vice versa. 
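the lookup chain and the shelf-level circulation summary described above can be expressed as two small routines. the sketch below is illustrative only: the dictionaries stand in for the opac, rfid, and lsim stores, the coordinates are invented, and only the example barcode (32832872) and bookshelf layer (a4r042c04) come from the text.

# illustrative only: dictionaries stand in for the opac, rfid, and lsim stores.
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# rfid store: barcode -> bookshelf layer (the barcode is the common join field)
rfid_layer_by_barcode: Dict[str, str] = {"32832872": "a4r042c04"}

# lsim attribute module: bookshelf layer -> (x, y, height) map coordinates
lsim_coords_by_layer: Dict[str, Tuple[float, float, float]] = {
    "a4r042c04": (121.4, 31.0, 1.6),   # invented coordinates for the sketch
}


def locate_book(barcode: str) -> Tuple[str, Tuple[float, float, float]]:
    """follow the opac -> rfid -> lsim chain: barcode to layer to coordinates."""
    layer = rfid_layer_by_barcode[barcode]
    return layer, lsim_coords_by_layer[layer]


def average_borrowing_by_shelf(
    shelf_of_book: Dict[str, str],   # barcode -> shelf id for every book on the shelves
    loans: List[str],                # one barcode per historical loan
) -> Dict[str, float]:
    """average loans per book for each shelf, as labeled on the figure 6 map."""
    loan_count: Dict[str, int] = defaultdict(int)
    for barcode in loans:
        loan_count[barcode] += 1
    per_shelf: Dict[str, List[int]] = defaultdict(list)
    for barcode, shelf_id in shelf_of_book.items():
        per_shelf[shelf_id].append(loan_count[barcode])
    return {shelf: round(mean(counts), 1) for shelf, counts in per_shelf.items()}

with these stand-ins, locate_book("32832872") returns ("a4r042c04", (121.4, 31.0, 1.6)), mirroring the flow traced above for the book with call number r318-53/3.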
permanent-assets management permanent assets such as chairs, desks, shelves, inquiry machines, printers, etc., can be managed in this model. information about permanent assets (such as their status, spatial position, etc.) was input in the model, as is shown in figures 5a and 5b. librarians can find the visual positions of permanent assets at any time, and readers can conveniently find the inquiry machines or printers to search books and print documents. library space information model based on gis | shen 108 https://doi.org/10.6017/ital.v37i3.10308 future directions the lsim is only tested in one reading room and is still experimental. this model will be expanded to the whole library, providing visual information of library books and materials. in the process of using this model, gis potentiality in the library will be exploited to provide better services for readers and managers. conclusion based on readers’ need of the book position in the library, the lsim is built to visually show the exact bookshelf layer of the book. spatial and attribute information is combined into the model. based on the model, readers can search for books and find books’ positions. meanwhile, many special collections located in the different branches can be easily found in the model. the gis model not only brings convenience to readers, but also supports the library’s analysis and management. librarians can analyze books’ circulation history based on the relationship between the books’ borrowing frequency and subject categories. books with higher borrowing frequency and ones related them can be recommended to the readers. then the number of the purchased books with the higher borrowing frequency in the remote, taller, or lower places will be increased based on the above analysis. permanent assets can also be managed, and librarians can conveniently find the status and spatial position of the inquiry machines, printers, and so on. in short, the application of gis in the library will bring a visual insight into the library, providing a better reader experience and better library management. acknowledgements i thank guo jing, chen jiayi and huang qinling, shanghai jiao tong university library, for their advice on the structure of this article and the grammar of the written english. i also thank liu min and peng xia, east china normal university, for their help in the model building. research was funded by the “fundamental research funds for the central universities" (grant 17jcya13), shanghai jiao tong university. information technology and libraries | septmber 2018 109 endnotes 1 d. kevin davie, james fox, and barbara preece, the arl geographic information systems literacy project. spec kit 238 and spec flyer 238 (washington, dc: association of research libraries, 1999). 2 b. w. bishop and l. h. mandel, “utilizing geographic information systems (gis) in library research,” library hi tech 4, no. 4 (2010): 536–47. 3 karen hertel and nancy sprague, “gis and census data: tools for library planning,” library hi tech 25, no. 2 (2007): 246–59, https://doi.org/10.1108/07378830710755009; christie m. koontz, dean k. jue, and bradley wade bishop, “public library facility closure: an investigation of reasons for closure and effects on geographic market areas,” library information science research 31, no. 2 (2009): 84–91, https://doi.org/10.1016/j.lisr.2008.12.002; sei-ching joanna sin, “neighborhood disparities in access to information resources: measuring and mappin g u.s. 
public libraries’ funding and service landscapes,” library and information science research 33, no. 1 (2011): 41–53, https://doi.org/10.1016/j.lisr.2010.06.002. 4 gary higgs, mitch langford, and richard fry, “investigating variations in the provision of digital services in public libraries using network-based gis models,” library and information science research 35, no. 1 (2013): 24–32, https://doi.org/10.1016/j.lisr.2012.09.002; sung jae park, “measuring public library accessibility: a case study using gis,” library and information science research 34, no. 1 (2012): 13–21, https://doi.org/10.1016/j.lisr.2011.07.007; sung jae park, “measuring travel time and distance in library use,” library hi tech 30, no. 1 (2012): 151–69, https://doi.org/10.1108/07378831211213274. 5 wang xuemei et al., “applications and researches of geographic information system technologies in bibliometrics,” earth science informatics 7, no. 3 (2014): 147–52, https://doi.org/10.1007/s12145-013-0132-4. 6 renata solar and dalibor radovan, “use of gis for presentation of the map and pictorial collection of the national and university library of slovenia,” information technology and libraries 24, no. 4 (2005): 196–200, https://doi.org/10.6017/ital.v24i4.3385. 7 andrew coyle, “interior library gis,” library hi tech 29, no. 3 (2011): 529–49, https://doi.org/10.1108/07378831111174468. 8 mehri sedighi, “application of geographic information system (gis) in analyzing geospatial information of academic library databases,” electronic library 30, no. 3 (2012): 367–76, https://doi.org/10.1108/02640471211241645. 9 istván boda et al., “a 3d virtual library model: representing verbal and multimedia content in three dimensional space,” qualitative and quantitative methods in libraries 4, no. 4 (2017): 891–901. 10 xia jingfeng, “using gis to measure in-library book-use behavior,” information technology and libraries 23, no. 4 (2004): 184–91, https://doi.org/10.6017/ital.v23i4.9663; lauren h. mandel, “toward an understanding of library patron wayfinding: observing patrons’ entry routes in a public library,” library and information science research 32, no. 2 (2010): 116–30, https://doi.org/10.1016/j.lisr.2009.12.004; lauren h. mandel, “geographic information systems: tools for displaying in-library use data,” information technology and libraries 29, no. 1 (2010): 47–52, https://doi.org/10.6017/ital.v29i1.3158. 11 xia jingfeng, “locating library items by gis technology,” collection management 30, no. 1 (2005): 63–72, https://doi.org/10.1300/j105v30n01_07. 12 matt enis, “technology: capira adds stackmap,” library journal 139, no. 13 (2014): 17. 13 chen jin, the history of shanghai jiao tong university library (shanghai: shanghai jiao tong university press, 2013). 14 francis p. donnelly, “evaluating open source gis for libraries,” library hi tech 28, no. 1 (2010): 131–51, https://doi.org/10.1108/07378831011026742.
president’s message thomas dowling information technology and libraries | december 2015 doi: 10.6017/ital.v34i4.9150 the lita governing board has had a productive autumn, and i wanted to share a few highlights. keeping an eye on how better to understand and improve the member experience, we have a couple of new groups getting down to work. lita local task force i'm writing this shortly after returning from lita forum 2015, which was a fantastic meeting. i'm glad that so many people were able to attend, and i hope even more will come to forum 2016. but we know many members cannot regularly travel to national meetings, and even the best online experience can lack the serendipitous benefits that so often come from face-to-face meetings. the new lita local task force will be responsible for creating a toolkit to facilitate local groups’ ability to host events, including information on event planning, accessibility, and ensuring an inclusive culture at meetings. so you’ll be able to host a lita event in your own backyard! (if your backyard has a couple of meeting rooms and good wireless.) forum assessment and alternatives task force as we begin work on lita local events, we are also turning our eyes to our national meeting. planning the next lita forum is essentially a year-round process. we assess the work we’ve done on previous forums, of course, but the annual schedule often doesn’t afford an opportunity to strategically rethink what forum is and how it can best serve the members. to address that issue, we’re convening another new task force, on forum assessment and alternatives. this group will look critically at how forum advances our strategic priorities, and will also look at other library technology conferences to help identify how forum can continue to distinguish itself in a rapidly changing environment. lita personas task force finally, as i write this, the board is in the final stages of creating a personas task force as a tool for better understanding our current and potential new members. a well-constructed set of personas, representing both people who are lita members and people who aren’t—but who could be or should be—will become a valuable tool for membership development, programming, communications, assessment, and other purposes. each of these task forces will work throughout 2016 and deliver their results by midwinter 2017. it is worth noting that we could only convene these groups because we have a strong list of volunteers on tap. if you haven’t filled out a lita volunteer form recently, please consider doing so at http://www.ala.org/lita/about/committees. thomas dowling (dowlintp@wfu.edu) is lita president 2015-16 and director of technologies, z. smith reynolds library, wake forest university, winston-salem, north carolina.
communication a library website migration: project planning in the midst of a pandemic isabel vargas ochoa information technology and libraries | december 2022 https://doi.org/10.6017/ital.v41i4.14801 isabel vargas ochoa (ivargas2@csustan.edu) is stockton campus & web services librarian, california state university, stanislaus. © 2022. abstract this article provides a background on the migration of the california state university (csu), stanislaus library website from an open-source platform to a content management system specifically designed for library websites. before the migration, there was a trial of different content management systems (cms), a student usability study, and consultations with outside web and systems librarians to acquire better insight on their experiences migrating a library website and their familiarity with the different cms trialed.1 the evaluation process, website design, and usability study began before the pandemic and the global shift to remote services. however, despite this shift, the timeline for the migration was not altered and the migration was completed as planned. within a year, the library website migration planning, designing, trialing, and structural organization was completed using a modified waterfall model approach. background completed under a sudden time limit, the website migration project for the california state university (csu), stanislaus library website is both distinctive and relevant to other libraries who plan to complete a redesign of their website, on both desktop and mobile screens, to meet accessibility requirements under a limited schedule caused by unforeseen circumstances—in this case, a global pandemic and sudden shift to remote work. the website migration project included a reconsideration of the content management system (cms) the library was hosted on. csu stanislaus, surrounded by agricultural landscapes and settled in the central valley, is a hispanic-serving and minority-majority university. ethnic minorities make up 70% of total enrollment and three-fourths of the undergraduates are first-generation students.2 in fall 2021, a little over 8,800 fte (full-time equivalent) students were enrolled and total enrollment reached 10,500.3 the university has two campuses, turlock and stockton, and four colleges: the college of science; the college of business administration; the college of arts, humanities and social sciences; and the college of education, kinesiology and social work. the csu stanislaus library website has been designed and redesigned over twenty years for services and content updates, university and library rebranding, and to comply with web accessibility requirements. before the library website migration in 2020, at the start of the covid-19 pandemic, the website was developed and produced using the drupal platform (version 7), an open-source cms. the contents of the library website have been updated from time to time since its first years and the website’s front-end design has been modified during the past few years. before the website migration from drupal to springshare llc, the library explored various cms, including wordpress and joomla. initially, staff encountered issues on the former library website hosted on drupal.
over the years, the library’s website became difficult to maintain, due to the continuously modified written framework, and to implement new branding across the website’s content and overall theme. mailto:ivargas2@csustan.edu information technology and libraries december 2022 a library website migration | ochoa 2 the objective of the website migration project was to effectively enhance the website’s interface for usability and to meet current standards and guidelines for accessibility. additionally, the university was set to launch a new design of the institutional website, which required that the library emulate the university website’s design for uniformity. in preparation for the university website redesign, it was necessary for the library to model the new university design, explore cms, and migrate the library website. a possible migration from drupal 7 to drupal 8 was investigated from 2017 to 2018, when the former web services librarian was in the process of redesigning the website; however, the redesign and migration was not completed. it was discovered around the same time that editing or upgrading the initial design and development of the library website, which utilized a community developed design heavily customized over the years, would make the migration extremely difficult. any editing that triggered modification of the locally customized theme caused other elements of the website to break or collapse, particularly the website’s layout design, including the header, footer, and menu. it quickly became apparent that it would be more sensible to begin the redesign of the website infrastructure on a platform starting from scratch. a new design would also facilitate the application of the latest accessibility and usability standards. a complete rebuild would also afford the library the option to consider other web management platforms. this would evidently be a challenging feat. so, a modified version of the waterfall model was adopted. for this migration, a simple cascading approach was chosen as it worked best with the natural flow of the library’s planned migration. the waterfall model consists of the following objective processes: requirement analysis, design, construction, acceptance testing, and transition .4 for the planned migration, the requirement analysis was confirming that i would have a local and cloud server available for trialing the cms and developing a website design. the design and construction processes would be complete when the new website design was created, the cms trials were finalized, and outside web services and systems librarians were consulted. the testing phase would be complete when the student usability studies were concluded. as explored in my previous article, “navigation design and library terminology,” a user-centered usability study was conducted to assist in the library website redesign and create the website prototype. the prototype was designed to assess the library website’s front-end elements as well as the layout theme and overall design. lastly, the transition, or migration process, would be the final planned objective in the approach. as the web services librarian, i worked as the website migration project manager. 
the project manager migrated the final and redesigned library website and website content, conducted the student usability studies, tested the cms and created a cms recommendation for the library, and consulted with outside librarians on their experience migrating a website and using drupal or springshare as their library website cms. timeline and an unexpected pandemic the cms trials began in fall 2019 and continued until summer 2020. the former library website used drupal 7, so drupal 8 and springshare’s libguides cms were set to be trialed. the trials consisted of developing and designing a new library website on the platforms. experiences were documented and the design process was recorded for analysis and comparison of the platforms. this information was used to determine which platform would best support the new website. consultations were also sought from various web services librarians, system librarians, and information technology professionals.
table 1. timeline of the website migration
fall 2019: trial springshare libguides cms;5 develop library website design prototype; consult outside web and systems librarians on their migration to libguides cms.
spring 2020: trial drupal (version 8); test website design prototype through a student usability study;6 complete, compare, and analyze libguides cms and drupal platform trials.
summer 2020: finalize library website design; migrate former library website content to final chosen platform; complete library website migration.
cms trials libguides cms and drupal were the systems trialed for the library website. drupal was used for more than two decades and consideration was given to upgrading to the latest version of the platform, drupal 8. springshare libguides cms was trialed as library staff and faculty were already familiar with libguides and subscribed to several other springshare applications, including libanswers and libchat for virtual research consultations, libcal for reserving library spaces, and libinsight for user analytics. trialing of libguides cms began in fall 2019. a website prototype (design, theme, homepage) was developed and designed using the platform. the platform was analyzed and explored heavily since it was a platform that had not been previously utilized by our library. libguides cms offers unlimited advanced customized groups. features like publishing workflow management, discussion boards, internal sites, various account types, password protections and ip restrictions, and courseware integration via lti were also researched during the test. in terms of content creation and maintenance, there were limitations under libguides cms. libguides cms has a built-in framework, ideal for libraries, with default settings that may disrupt or limit complex customization. at the time of the trial, additional support from springshare was needed to override default settings. also, libguides cms, compared to drupal, did not provide an option for tracking revisions on guides and content. drupal, a highly programmable free open-source website platform, was the previous platform the library website was hosted on. like all upgrades, drupal 8 offered a series of new features and improvements, from framework to themes.
as a highly programmable platform, it requires information technology and libraries december 2022 a library website migration | ochoa 4 mindful designing and programming to establish the infrastructure and design. drupal 8 was tested and trialed on a development site in february 2020. the development site on drupal was utilized for the final evaluation of the cms. when the campus was ordered to partially close in march 2020, the home page and foundational design were completed; however, the development site was inaccessible remotely due to a block by the campus firewall. the development site resided on a protected local server on campus, and special permissions were required for remote access. unfortunately, i was not granted the special permissions required in due time, so the development site on drupal was put on hold. based on projections after creating a foundational design in february 2020, it would have taken about six months to complete the overall website design. remotely, i continued the drupal 8 evaluation, considered the results of librarian consultations and the literature on drupal as a cms. consultations and cms comparisons generally, the difference observed between both platforms is that libguides cms is a content management system primarily designed and maintained for library websites, whereas drupal is a framework for all sites, including highly customized websites. to support the cms comparisons, six web services and system librarians were consulted prior to the migration. the librarians were from distinct institutions: two community colleges and four 4year universities ranging from 2,000 enrolled students to 30,000 enrolled students, and library departments from 10 to over 200 library personnel. a systems librarian from a university of about 2,000 enrolled undergraduate students, regarding their experience migrating their website from drupal to libguides cms, shared that, “[their migration] took a couple months . . . we worked with campus it and springshare to ‘flip the switch’.” a digital services librarian from a university of over 10,000 enrolled students explained, “the entire transition probably took about 6–8 months.” the time to migrate a library website would depend on the size of the website, which was also influenced by the size of the campus. with more than 10,000 students, the csu stanislaus library website migration project was scheduled to be completed by the end of the summer semester, from june 2020 to august 2020. creating a new website from scratch on drupal proved to be a longer process than creating a new website on libguides cms. a systems librarian explained that “[libguides cms] is also quite streamlined compared to trying to maintain a more complex platform like drupal, which makes it a bit easier for librarians who are not full-time professional coders.” still, libguides cms is not as robust and did not offer the level of customized creation that drupal offered for general websites. the systems librarian added that although it is helpful to have a web services or systems librarian who can code full time, “turnover happens and some libraries can’t be sure there will always be someone on hand who is comfortable doing that coding.” ideally, having a full-time programmer is valuable for any library managing their own website; however, it is currently not the case for our university library. 
a user interface developer from a university of over 30,000 enrolled undergraduate students described their experience using a large amount of css to override default settings in libguides cms. they explained, “we have a large amount of overriding css, not to mention that it makes [it] messy. when building a site [in libguides] you can do whatever you want as long as you know where to put your code, utilize the js libraries springshare uses, implement css to override their information technology and libraries december 2022 a library website migration | ochoa 5 system default items/styles and use their api.” before finalizing our migration, we were required to contact the springshare technical support team to override default settings in libguides cms. however, the default settings are implemented to guide nonprogramming librarians when creating web content. a web services librarian from a university with 2,000 undergraduate students enrolled stated, “i don't think [libguides cms] will be anything like a drupal or wordpress cms. but, i do believe that their software is the perfect niche for libraries and librarians.” libguides cms required getting used to as csu stanislaus library staff were accustomed to hosting the website on a cms that allowed file transfer protocol (ftp). as a systems librarian explained, it can be difficult to organize a large amount of coding in libguides cms since “you don’t get your own server that you can configure and use for things like ftp storage.” the experiences shared by librarians were similar throughout our process for creating the new site and design, and these consultations in particular were not only insightful but helped prepare for and organize the structure of the website before actually migrating the content. an additional component considered before choosing a cms was the technical support and server options available for each. libguides cms is cloud-based and hosted by springshare, which currently uses amazon web services (aws). upgrades are implemented by springshare overnight, as well as minute-by-minute base backups. for the most part, systems and web librarians were satisfied with springshare technical support to implement these changes. with drupal, the institution can choose whether to host their site on a cloud or local server. during our migration in summer 2020, we sought assistance from springshare technical support to modify security certificates and custom domain names. if a site is hosted on drupal, the librarian can implement security certificates and update custom name domains without having to contact the drupal technical support team. it is fundamental for a library to consider these features as well, especially if under a set timeline. these consultations with developers and with systems and web librarians aided in the understanding of what libguides cms and drupal offered based on general comparisons, programming, customization, and technical support. cms accessibility compliance the accessibility levels of each cms also supported the final decision of the chosen cms. according to the web content accessibility guidelines (wcag), there are three levels of accessibility conformance: a (lowest), aa (midrange), and aaa (highest).7 currently, the target level of accessibility for csu campus websites is aa, which also includes all the guidelines found under conformance a. 
regardless of the foundational framework for both libguides cms and drupal, it was determined after exploring accessibility on these cms that developers should regularly test the design and content customization for accessibility. ultimately, the accessibility levels of a library website and its mobile responsiveness are dependent on the local management and develo pment of the sites. website design: usability study concurrent to the consultations, trials, and design development, a usability study was conducted in february 2020 with a total of 38 university student participants, including undergraduate information technology and libraries december 2022 a library website migration | ochoa 6 students, from freshman to seniors, and graduate students. the usability study was organized to test the website navigation design prototype that was built and used during the cms trials. students’ feedback would guide the decision of whether to design an audience-based navigation menu or a topic-based navigation menu. the study was conducted in a closed and monitored library room with laptops prepared. students were asked to answer questions and complete tasks to test the website design prototype menu navigation design. each student’s actions were recorded through screen recordings and visual observation, while assigning numbers to students to ensure anonymity, e.g., student 1, student 2, etc. the following seven tasks were used for the student usability study: 1. find research help on citing legal documents—a california statute—in apa style citation. 2. find the library hours during spring break. 3. find information on the library study spaces hours and location. 4. you’re a student at the stanislaus state stockton campus and you need to request a book from turlock to be sent to stockton. fill out the request-a-book form. 5. you are a graduate student and you need to submit your thesis online. fill out the thesis submission form. 6. for your history class, you need to find information on the university’s history in the university archives and special collections. find information on the university archives and special collections. 7. find any article on salmon migration in portland, or. you need to print it, email it to yourself, and you also need the article cited. during each study, students’ actions were screen recorded using snagit, screen capture and screen recording software installed on the laptops. data collected included the ease of access in terms of navigation behavior, the number of clicks, web pages visited, and the time it took for students to complete each of the seven tasks. that data was recorded and analyzed from the anonymously saved screen recorded videos. students were also asked to answer questions at the end of the study in the form of a written survey, which was then collected and utilized to support the final decision of the outcome of the prototype design. the results of the usability study provided the library with a variety of outcomes and several elements were integrated to lead the redesign of the library website’s header and main menu. the results of the study showed that the prototype’s design navigation, an audience-based navigation, was not as user friendly as predicted; therefore, the library website prototype design would need to be edited and modified to revert to the current navigation design of the existing website, which is a topic-based navigation. 
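because each session was scored on clicks, pages visited, and time per task, a small summary routine along the following lines (a hypothetical python sketch; the record format is an assumption, since the study coded these measures from screen recordings rather than from structured logs) shows how the per-task comparison could be tabulated.

# hedged sketch: aggregates per-task usability metrics of the kind collected in
# the study (clicks, pages visited, seconds to completion, task completion).
from statistics import mean
from typing import Dict, List, NamedTuple


class TaskObservation(NamedTuple):
    student: str       # anonymized label, e.g. "student 1"
    task_no: int       # 1-7, matching the seven tasks listed above
    clicks: int
    pages_visited: int
    seconds: float
    completed: bool


def summarize_by_task(observations: List[TaskObservation]) -> Dict[int, Dict[str, float]]:
    """per-task averages plus completion rate, for comparing the seven tasks."""
    by_task: Dict[int, List[TaskObservation]] = {}
    for obs in observations:
        by_task.setdefault(obs.task_no, []).append(obs)
    summary: Dict[int, Dict[str, float]] = {}
    for task_no, rows in sorted(by_task.items()):
        summary[task_no] = {
            "avg_clicks": round(mean(r.clicks for r in rows), 1),
            "avg_pages": round(mean(r.pages_visited for r in rows), 1),
            "avg_seconds": round(mean(r.seconds for r in rows), 1),
            "completion_rate": round(sum(r.completed for r in rows) / len(rows), 2),
        }
    return summary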
students had difficulties with the audience-based navigation design since it required them to select an “audience type” under the menu (fig. 1). their selection was determined by assessing where they believed the information was found. since most students did not understand the structure of the website, they did not know how to utilize the audience-based navigation to complete the seven tasks. although they found that the navigation design of the website was clear and simple, it required a “getting used to.” information technology and libraries december 2022 a library website migration | ochoa 7 the results also highlighted the effects of the use of library terms. to make menu links exceptionally user friendly, clear and common terminology was added. an additional component was a search-all search box for the website, which was advocated by the student participants. based on navigation results, the main menu and submenus were also structured to not only be clear and organized, but for popular pages to be mapped and linked in more than one menu. figure 1. screenshot of the audience-based navigation design developed for the library website prototype. figure 2. screenshot of the topic-based navigation in the former library website. the design structure of the website relied on the organization and management of the website pages. to maintain a congruent structure, it was necessary to choose a navigation design that met the needs of our students. in this case, the results determined that the topic-based navigation was preferred; thus, the management of website pages and submenus was modified to fit this navigation. the usability study was focused on testing the navigation design of the website and the navigation main menu. given the helpful feedback from users and having more participants than expected, it would have been beneficial to also test other aspects of the website in this study. the home page is the landing point and statistically the most visited web page of the library’s website. it is the hotspot for students and our university’s community to find the catalog, resources to fulfill their research needs, upcoming news and events, the reservation platform, and more. however, as the website redesign progressed, there were challenges in designing the primary components of the website home page, such as assessing what elements were fundamental to have on the library website, following web accessibility requirements, and following the university website’s new theme and design. ultimately, the components of the former library website’s home page were migrated in its similar structure to the redesigned information technology and libraries december 2022 a library website migration | ochoa 8 website. yet, adding questions and tasks on the usability of the library home page and its content components would have certainly aided in not only the redesign and migration, bu t the direction of the library website’s future development. this information will be a focal point for future usability studies of the library’s website. the final migration the greatest challenge throughout the project was the time constraint due to the covid-19 pandemic. because the pandemic brought several unforeseen obstacles into staff work schedules, it was challenging to manage the time needed to complete the project and simultaneously work around tasks surfaced by the pandemic. 
however, staff committed to stay on schedule and to complete the migration project before the start of the fall semester despite the circumstances. after transitioning to remote library services, emphasis was placed on developing the website and web content. even more so, the migration project served to ensure that the library was providing an enhanced and accessible desktop and mobile website for users who were now working from home. additionally, this included the management of web services, on top of the migration project. a concise and organized schedule was necessary and although time management of the different projects and tasks offset by the pandemic was challenging, the web services librarian was fortunate to have support from the library information technology staff. after the cms trials and after the website prototype usability study, libguides cms was chosen as the content management system for the university library website. because the library was looking for an easy-to-use platform, utilizing libguides cms reduced the time needed to build an infrastructure and allowed simple website content management, maintenance, and an improvement over the former website’s accessibility and mobile responsiveness. the platform worked well for the campus and the library; however, each library should evaluate its respective department priorities, along with what is expected, desired, and needed for their individual library website to successfully showcase services and programs to users. following a modified waterfall model approach proved to be a success for the website migration project due to existing resources and scheduled timeline for implementation. in a future virtual renovation or redesign of the library website, the library will explore various project planning models pertinent to the future proposal’s desired outcomes. endnotes 1 isabel vargas ochoa, “navigation design and library terminology,” information technology and libraries 39, no. 4 (2020). 2 “diversity and equity data portal,” california state university, stanislaus, 2021, https://www.csustan.edu/iea/diversity-and-equity-data-portal. 3 “quick facts,” california state university, stanislaus, 2021, https://www.csustan.edu/iea/institutional-data/quick-facts. 4 bob hughes and roger ireland, project management for it-related projects, 3rd ed. (swindon, uk: bcs learning and development, 2019). 5 vargas ochoa, “navigation design and library terminology.” 6 vargas ochoa, “navigation design and library terminology.” 7 “web content accessibility guidelines (wcag) 2 level aaa conformance,” w3c web accessibility initiative (wai), 13 july 2020, https://www.w3.org/wai/wcag2aaa-conformance.
the minnesota union list of serials audrey n. grosch: university of minnesota libraries this paper describes development of a marc serials format union catalog of serials called the minnesota union list of serials (muls).
the preliminary edition, published august 1972, contains over 37,000 main entries in 1,566 text pages produced through photocomposition in news gothic typefont using the full marc character set. the total number of entries is over 59,000, including cross-references. conceptualization and scope of the system as well as its design, data conversion, computer and programming support, photocomposition, costs, and problems are discussed.
introduction
this paper has been prepared to inform the profession of the development of the minnesota union list of serials (muls), the data base of which represents significant differences from those of previously reported union lists. as one can see from figure 1, a muls preliminary edition sample page, muls is a full bibliographic union serials catalog. it uses the marc serials format as its formal structure. the preliminary edition contained 37,289 university of minnesota serial titles. the file now includes holdings of the minneapolis public library, eight private colleges, and ten minnesota state agency libraries including the minnesota historical society. augmentation of the data is continuing so that all minnesota academic, large public, and selected special libraries will be included in the coming two years. several years ago, the university of minnesota began investigating the development of its own unified serials catalog as a first stage in the development of an automated serials management system. at that time many libraries in the state had developed their own serials lists, and regional consortia had created lists of their members' holdings. these resources, coupled with networking through the minitex (minnesota inter-library teletype exchange) program, made possible the muls initial development. the minitex program links together seventy-two libraries via teletype to the university of minnesota for rapid interchange of library materials. state supported academic institutions, public libraries, and private colleges in the local metropolitan area also participate in this program. in spring 1971, it became apparent that a union list had become a necessity if the expanding minitex program and the university were to provide maximum benefit for minnesota's library users.
fig. 1. minnesota union list of serials. preliminary edition sample page.
our state's library environment features:
• one large academic research library-the university of minnesota;
• many smaller academic libraries in the 75,000-250,000 volume class;
• two large public library systems-the minneapolis public library and the st. paul public library;
• one private research library-the james jerome hill reference library-which serves as a nucleus for the metropolitan area private college library network called clic (cooperating libraries in consortium); and
• some library automation activities among these libraries, with the largest automation staff and activity at the university of minnesota.
the parallel developments of networking and systems design at the university made possible the proposal to the minitex program advisory board for funds to develop the system and publish the first union list. in summer 1971 this program received approval and work was begun in mid-august. on september 1, 1972, the preliminary edition of muls was published and distributed to participating university and minitex network members. following is a report of this work, its results and problems.
program scope
obviously, to create a system capable of eventually including library holdings state-wide and to convert such data requires definition of an initial and future scope. the initial scope was defined as: conversion of the university of minnesota libraries' actively received titles, departmental libraries' complete titles, and inactive titles in the libraries' periodical division; and development of a batch input tape software system capable of supporting initial conversion, correction, and updating to produce the preliminary edition of muls. the future scope would potentially include the augmentation of the muls data base with the following non-university of minnesota holdings:
a. eight metropolitan area private colleges in the clic network, with production of a clic union list for their members' use;
b. minneapolis public library serials and unique titles from other public libraries of over 50,000 volumes, with production of a public libraries union list;
c. holdings of all state agencies, which would include the minnesota historical society, state law library, state department of health, and legislative reference library, with production of a union list for their internal resource sharing;
d. state supported colleges' holdings;
e. university of minnesota inactive general collection serials, thereby completing access to the state's largest research library;
f. private college holdings outside of the metropolitan area clic institutions; and
g. selected special libraries' holdings.
at the moment of this writing we have the initial scope completed, are just completing a, b, and c, and have planned work on d and e for 1973. in view of this scope the initial muls magnetic tape system was based on the marc format to permit:
• publication of a photocomposed or line-printer-method full union list;
• publication of regional combination or individual library lists using an ibm 1403 line printer equipped with the ala graphic print train;
• storage of complete and verified information on each serial as known, together with the source of the cataloging data;
• extraction of the data via individual libraries to assist those wishing to develop automated serials management systems including check-in, claiming, binding, etc.;
• conversion of the file to other storage media such as disk;
• fulfillment of the smallest to the largest libraries' needs for bibliographic detail; and
• extension to a fully automated resource sharing system which would further improve the benefits of library cooperation.
with this picture of the program scope, the design factors, data conversion, computer system, programs, photocomposition, costs, and problems will be described below.
system design
the easiest way to look at the muls design is to gain an understanding of the muls marc record content as shown in table 1. this record is the basic unit which is entered, including all associated cross-references or added entries to be made. it in turn generates each of these secondary entries in the file. in this brief description we will assume the reader is familiar with the marc serials record as described in serials: a marc format: preliminary edition and its addendum no. 1.1, 2 there are some differences between the muls format and the lc marc format, most importantly the addition of a sort field (tag 249) and the subfield arrangement for holding fields (tag 850). other variations have been indicated in table 1, which uses the same organization as that contained in the lc format description referred to above. figure 2 shows a page from a master-file listing. note entry no. 2074000. this listing is formatted with the sequence number of the record appearing on the first line, followed by the bibliographic level and the remaining leader information. next the record directory entries are found for fields 008-950 as applicable.
table 1. muls marc record content
a. leader
1. logical record length-five characters
2. record status = 1 for marc record
3. legend = 4 for added entry (aet) or cross-reference (xrf) entry
a. type of record-not used (blank)
b. bibliographic level = s
c. two blank characters
4. indicator count = 2
5. subfield code count = 2
6. base address of data = 5 characters
7. sequence number = 7 characters
b. record directory
1. variable field tag = 3 characters
2. field length = 4 characters
3. starting character position = 5 characters
c. control fields-008 fixed length data elements
1. date typed
2. publication status
5. country of publication code
9. type of serial designator
10. physical medium designator
12. form of content
a. type of material code
b. nature of contents codes
13. government publication indicator
14. conference publication designator
20. language code
21. modified record designator
22. cataloging source code
d. variable fields
1. indicators: in general we have not followed lc in the use of indicators. one exception is the use of filing indicator for the 100 and 200 series tags, which we implemented before seeing that this feature was provided in the addendum no. 1 to the lc format. therefore, the indicators except as above are both blank.
2. subfield codes: except for the holdings statements (tag 850) we have generally followed lc philosophy.
for tag 850 we now precede the $a subfield with a $z subfield, suppressed on printing, which contains the 4 digit number identifying each specific holding library, which is also found at the end of the 008 field.
3. variable fields currently used:
010 lc card number
022 standard serial number
041 languages
100 main entry-personal name
110 main entry-corporate name
111 main entry-conference or meeting
200 title as it appears on piece
245 full title
249 sort key from 100 or 200 series tags, stored in collating codes and limited to 120 characters
250 edition statement
260 imprint
500 general note
501 bound with note
515 note for explanation of dates, volumes, etc.
525 supplement note
555 cumulative index note
730 added entry
850 holdings
950 cross-reference tracing
note: we have followed lc numbering for the above data elements, and have substituted blanks on the tape record for those elements omitted. we have also expanded the 008 field to include a variable number of 4 character elements which contain the index number of each holdings location listed in the $z subfield of tag 850.
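to make the record layout in table 1 concrete, here is a minimal, hypothetical sketch of how a directory built from 12-character entries (3-character tag, 4-character field length, 5-character starting position) could be unpacked. the field widths follow table 1, but the sample record, the function names, and the use of python are illustrative conveniences only; they are not the muls programs themselves.

```python
# illustrative sketch only: unpack a directory built of 12-character entries
# (tag: 3 chars, field length: 4 chars, starting position: 5 chars), as
# described for the muls marc record. the sample data below is invented.
def parse_directory(directory: str):
    """yield (tag, length, start) tuples from a 12-char-per-entry directory."""
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        if len(entry) < 12:
            break  # ignore a trailing partial entry
        tag = entry[0:3]
        length = int(entry[3:7])
        start = int(entry[7:12])
        yield tag, length, start

def extract_fields(directory: str, data: str):
    """map each tag to the slice of the data area its directory entry points at."""
    return {tag: data[start:start + length]
            for tag, length, start in parse_directory(directory)}

if __name__ == "__main__":
    # two made-up entries: tag 245 (20 chars at offset 0), tag 850 (12 chars at offset 20)
    directory = "245002000000" + "850001200020"
    data = "$aexample title.    " + "$zmnu$ahold."
    print(extract_fields(directory, data))
```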
fig. 2. master-file listing.
on the next line of the listing are the 008 fixed length data elements, with the last four digits the holdings location index number, which is the same as the suppressed $z subfield in the 850 field. then the variable fields are listed in numeric sequence. note the subfields as indicated by $z, $b, etc. the number to the left of each $a is the marc tag number. another departure from marc is to store the call number as a subfield of the holdings statement since it may vary among participating libraries. to contrast how the information is stored and how it appears when published, the same record is shown in the left column of figure 1. also, the next record shown is generated from an added entry tag 730 in this parent record. we have prepared a detailed coding manual which is followed by our coders; this document presents various examples of conditions and details the full system structural requirements. these changes in the format were made to simplify wherever possible, to provide for conditions which the original lc format did not cover, and to preserve the marc structure with full text. with the exception of subject headings, all bibliographic text is stored. other marc tags may be added to the system at any time.
the initial system was tape-based, as our computer system at that time did not have uncommitted disk drives. also, we needed to gain some detailed knowledge of the file and record characteristics to most effectively design the disk-based system. this knowledge could be gained easily after some basic data were stored in the system. since programmer time was our most precious commodity, this phased approach was used to: (1) achieve enough support on the tape system to permit publication of the preliminary edition of muls while gathering file and data characteristics; and (2) bring into operation a disk-based system with completely automatic added-entry correction and generation, coupled with very flexible correction procedures.
data conversion
various methods of data conversion were investigated. two requirements seemed obvious in our system-compilation of data on a code sheet and efficient, accurate keyboarding. further, since the marc character set was being used, any potential device had to provide a minimal keying situation to accommodate this character set. compilation of data on a code sheet was necessary because multiple files in multiple locations would be checked to gather all of the information. keyboarding had to be efficient as it was initially estimated that some 25 million characters would be entered before we were ready to publish the union list. the ibm model v record only magnetic tape selectric typewriter (mt/st) was chosen as offering the best approach for high volume, short duration use.
three machines, each equipped with the special marc element and key buttons, were leased. typists easily corrected their discovered errors on these units. each typist followed detailed typing instructions and, after mastering the coding manual practices and procedures, was a trained coder. during july i august 1971 all training aids were prepared, forms designed, and staff recruited. the initial staff complement received their training during the last two weeks of august. during september the data gathering staff was brought to full strength and consisted of: project director editors (librarians-library assistants) senior clerk-typists clerks (students) 1 fte 4 fte 6 fte 12 fte full-time equivalents are used as staff were in many cases part time or temporarily lent to the project. during the period august 1971-june 15, 1972, which comprised the total data preparation time for the preliminary edition, five librarians and thirty-five students actually were trained and participated in the project. it took about six weeks to bring most of the staff to an acceptable performance level. some students found the work too complex or detailed and voluntarily left the project. one clerk-typist did not gain sufficient proficiency to pass out of a trainee status and was terminated at the end of her probation period. thereafter, with a staff of this size, performance problems were minimal. ,i 174 journal of library automation vol. 6/3 september 1973 the data to be included in the preliminary edition comprised the university's • currently received, centrally recorded serials ( 20,000 titles); • inactive periodical division titles ( 8,000 titles); • coordinate campus locations of the university ( 4,000 titles); • complete departmental library titles excluding the bio-medical library ( 6,000 titles). the bio-medical library was excluded due to its present mechanized serials system which would be used to produce a separate serials list, issued as volume 3 of muls to the university and the minitex participating libraries. this separate publication was necessary due to the short time in which the initial data were to be collected. however, the bio-medical library is now also being included in the body of the muls data base. these four categories of serials necessitated quite different approaches dependent upon the available check-in files, shelflists, or catalogs. for example: to capture data on the currently received, centrally recorded titles we photocopied the kardex drawers from the serial check-in file maintained in our headquarters library. these running titles were checked against the official card catalog in the library. if the title was found, the bibliographic infom1ation was transcribed, together with all kardex and catalog locations. if not, the kardex data were copied onto a code sheet for subsequent verification together with its listed location. about 5 percent of the time the photocopied sheet was illegible. these entries had to be transcribed from the check-in file, verified, and then passed on to the next step. when bibliographic data had been assembled on the code sheets they were edited in groups, each group accompanied by its photocopied sheet. corrections were entered by editors, the catalog or check-in file was rechecked as necessary, and then the sheets were sorted by holding location. next all holdings information was procured from the remote location to make sure it was the most reliable information. finally, the sheets were returned to be rechecked and typed. 
"mopping up" occurred at each holding location to encode inactive titles and uncataloged serials. when a title could not be verified, the piece itself was used to develop the main entry, added entries, and other pertinent cataloging information. similar procedures were used on the inactive periodical division shelflist. departmental library locations involved the use of shelf-locator visible indexes and shelflists, coupled with check-in files and branch catalogs. coordinate campus locations outside the twin cities metropolitan area required the checking of title/holdings listings provided by these campus libraries. many entry problems resulted, because variant cataloging approaches were used in many of these libraries. typing and subsequent input were done as coding sheets became ready for keyboarding and were therefore in random order. over 40,000 individual records were typed, each averaging about 480 characters (an approximate minnesota union list of serials/ grosch 175 18 million keystrokes). during the period february-june 15, 1972, when the complete file was proofread from the thirteen volume master-file listing, another 5 million keystrokes were required to delete, to reenter, and to correct entries and associated cross-references. our final keyboarding stroke count was exceedingly close to our original estimate of 25 million characters. the proofreading portion of the data conversion took twice as long as originally anticipated, causing a delay of two months in photocomposition scheduling. proofreading was completed on june 15, 1972, and on the following monday the photocomposition vendor received the final output tape. due to some format changes and continued systems problems the photocomposition output was not received until july 21. printing and binding followed and on august 28 the preliminary edition, consisting of 1,566 text pages in two class a bound volumes, was ready for distribution. computer system two computer systems were used in muls production. one system was used to convert mt /st cassette tapes and involved initially an ibm 2495 cassette converter coupled to an ibm 360/ 20 system. this configuration was replaced by off-line tape conversion using a data action tape pooler and the same computer for code conversion and record blocking. twohour to one-day service was provided by this service center, located in a local insurance company. the raw data tape resulting from the above process then required processing on the second computer system, an ibm 360/50 at the university of minnesota. all programs are written for the cobol f compiler and operate under os/ mft using 1600 cpi magnetic tape. two 80k core partitions are required for the updating and printing programs. the ala graphic print train is used to print the file and control listings. figure 2 was printed with this character set. programs muls programs for the present tape system were conceived as two sets: ( 1) conversion, file creation, and updating; and ( 2) printing functions. the first set performs the following functions: • identification and checking of fields for validity, tagging, and structure from the raw input tape; • creation of marc-type main entries; • creation of secondary entries generated from the added entry (tag 730) and cross-reference (tag 950) fields; • creation of correction and deletion entries; • sorting of main entries and the generated secondary entries in alphabetical sequence; • sorting of correction and deletion entries in sequence number order; • addition of new records; "' ~~ .. 
,,, ,, 'i'' i ... j• !'i · · ~-· 176 journal of library automation vol. 6/ 3 september 1973 • deletion of an old record; • addition of a new variable field, including holdings statements; • substitution of data in a variable field; • deletion of a variable field; • production of a transaction file reflecting changes to the data base; and • generation of a new master tape, which can include resequencing the entire file andjor producing a work list of the file. however, any change in a !00, 200, 730, or 950 tag requires deletion of the complete record with its secondary entries, and reentry of the record in its changed form. this is because a two-pass update would be required in the tape system to automatically coltect secondary entries as well as to generate them. the second set of programs perfom1s the following functions: • printing of a formatted work list selectively by location or combination of locations, diacritical printing preceding the character to which it applies; and • printing of a conventional union list format which closely duplicates the design of the photocomposed page in figure 1. selectivity by location or groups of locations is present and all diacritical characters are overprinted as in the photocomposed list. photocomposition the preliminary edition of muls, as shown in figure 1, was photocomposed by a twin cities firm using a harris fototronic crt composition system and an ibm 370/ 145 computer system. we chose the lowest bidder which was fortunately a local firm. the bid required the vendor to program from our marc format master file tape an input tape for the photocomposer which would produce the specified format, using the marc character set in a font to be chosen from sample text pages. the vendor's bid included programming, composing, and procurement of several of the characters used by marc which were not in his current font repertoire. a test tape was provided to the vendor for his developmental use, together with documentation on the marc muls system. after seeing the initial result of our specified format we were not pleased with the result. the reason for this was compounded by the fact that: • the vendor had not followed some of the suggestions; • the vendor had made some unspecified changes; • the program had injected some data errors and other unacceptable conditions; and • ·the library, in its total lack of experience with this variable density form of display, had no idea of the real effect of its proposed format in getting efficient character density coupled with attractiveness. each of the design problems was looked at in order to adjust character minnesota union list of setials j grosch 177 size, column length or width, continuation line placement, display form (bold, regular, oblique), and relative data element placement. four iterations were required to finally produce the format shown in figure 1. as a result, our photocomposition and printing costs were half the costs had the original format been developed. style and readability also improved dramatically. the choice of type font was made by comparing sample pages in both serif and sans serif styles, including times roman and other well-known fonts. various library staff members were asked to vote on their preferred font. news gothic was an overwhehning favorite by both public and technical services oriented librarians. the photocomposition vendor had produced many catalogs and books using other special alphabets and characters, but had not previously done any catalog from a marc format tape. 
this made possible a high degree of expertise on their part in handling our special character requirements, but added some developmental problems because of lack of marc format experience. except for superscripts, subscripts, and the underline, all marc characters have been needed to display the text. our advice to those considering catalog photocomposition is to request bids, as the price on this service has continued to drop. the page price will be dependent upon the services perfom1ed. in our case the vendor handled all composition programming. one can estimate that at a minimum 40 percent-50 percent of the page charge would be involved in this service. also, the size of the job will cause a variance in the price a vendor will quote-the larger the number of pages, the cheaper the cost per page. on a very large application it may be to the library's advantage, if resources permit, to train their own programmer to program the composing device. however, we feel that our best needs were served b y contracting for this support as our programming staff was limited and did not have any prior composing-machine experience. costs the expenditure to produce a computer-based serials catalog will vary dependent upon salary and equipment rates and the conditions found in the library system. in the case of muls, condition of the files used ranged from disastrous to excellent, yet with only fragmentary information in each file. moreover, entry forms varied greatly among the many check-in, shelflist, and catalog files. therefore, data collection was much more expensive than it would have been had we keyboarded directly from one existing file of data. to present some idea of costs for others planning similar activities, we have developed some average costing information from our expenditures. each main entry in muls costs $2.81 on an average, figuring all known actual charges or subsidized costs. this main entry cost includes all associated secondary entries, which is about one secondary entry generated per h ll, h i" ii"• rr! i; ~ ;;jl i i" i'" 178 journal of library automation vol. 6/3 september 1973 1.5 main entries. this $2.81 breaks down to approximately $1.00 for design, programming, and administrative costs; $1.40 for data conversion; and $.41 for photocomposition, final printing and binding. let us look at some specific items which figure into this average cost per record to give the reader some idea of what is reasonable to expect in a project of this sort. a good example is conversion of mt/st cassette tapes to computer compatible magnetic tape, including code conversion and blocking of the records. our per-cassette conversion cost varied from $.50 to ·$2.00 per cassette. this variance was caused by a change from online to off-line conversion and the problem of handling cassette tapes which did not have the proper stop code at their end. our actual billed average throughout the whole project was $. 73 per cassette. if no tapes had been prepared omitting stop codes and if total off-line conversion had been used, our average would have been $.50 per cassette. a typical cassette tape averaged seventy-five new marc entries, so this was a very economical charge for this method. another specific cost to examine is computer time. on our ibm 360/50 system, time is billed as time on/off the system and not according to some calculation of cpu /channel/storage/peripheral device usage. normally an internal university rate is a great deal cheaper than a commercial rate for the same equipment. 
however, the billing method used in our system has probably increased our costs for computer time over the cpu time method of billing, since the user is at the mercy of contending with other jobs on the system at the same time; i.e., waiting for his processing turn. this has had a noticeable effect in our case; run times to update the file have varied from four to six hours machine onjoff time almost independent of the number of transactions being processed. photocomposition page rates over the last few years have been dropping as competition in this area has flourished. two years ago it was common to receive quotes of $6.00 per page or even higher. most prices we received were under this figure; but at the time our contract was · signed, our successful bidder, who also was our lowest bidder, quoted $2.60 per page. this included full programming support to convert our marc format tape for creation of the photocomposer input tape. today rates much lower than this can be found. moreover, rates under $1.00 per page can be obtained if the customer is able to create his own input or driver tape for the photocomposition device, making this method considerably more attractive for even low volume per-page printing. in the case of muls, one photocomposed page equals ten double column computer printed pages without photoreduction. photoreduction can cut computer output pages about one-third, yet obviously not to the limit achieved through the photocomposition method. therefore, considerable printing costs can be saved dependent upon the number of copies of each page printed. minnesota union list of serialsj grosch 179 problems the problems encountered during this project and its daily operation presently have been, for the most part, those commonly found in any large scale project. the large volume of data, less than ideal computer environment, condition of the original data, and large staff required to produce this effort all magnify many problems which seem unimportant in a small or short term project. in general these problems fall into the following categories: ( 1) data handling and bibliographic; ( 2) communications; ( 3) estimating; and ( 4) hardware or computer related problems. data handling and bibliographic. those who create and use research library catalogs can appreciate the formidable physical problem in any data conversion activity. a half century or more of cataloging variations must be brought together; mistakes in the original data, differences in format of cards, and spelling or usage inconsistencies must be weeded. couple this situation with a new staff, large in number but containing few professionals. the result could be disastrous if proper decision-making and problem identification did not occur. not knowing the magnitude of these problems we decided on almost verbatim transcription of records but spelling out all abbreviated words in any filing field. when our first file listing appeared-some 40,000 main entries plus 30,000 secondary entries-we saw that the filing arrangement was very poor due mainly to spelling variants, failure to consistently follow instructions to spell out abbreviated terms (which somehow escaped editing), and different entry forms for the same body. transcription of data from the original source was very accurate but because of these problems in the original data our proofreading resulted in some change occurring in about 10,000 of these 70,000 records. 
the use of punctuation marks in main entries varied so much that some corporate entries were filing in five or six separate groups in the list, each separated perhaps by several pages. the great shocker was the arrangement under the united states, as some coders had copied exactly from the card without spelling out u.s. and inserting a period and space. about a dozen entries had failed to be caught by the editors and appeared as one block. then, to compound the problem, others spelled out united states but forgot to insert a period after it. moreover, very early in the project the typists incorrectly inserted 2 spaces after the period. in all, there were six forms to the u.s. entries alone, with only one being correct. this lesson taught us that no matter how well instructions and examples are prepared misunderstanding can result; and, of course, editors and others will not catch all possible errors. however, these major errors were eliminated before publication. with the large volume of data and limited funds our conversion process was quite streamlined with most of the errorchecking resulting after the data were on tape and displayed in their proper relation to other records. few keyboarding errors occurred which 180 journal of library automation vol. 6/3 september 1973 were not caught at typing. the predominant errors resided in the nature of the original data, or in the lack of some piece of information from three or four different files which may have been checked in building the full record. communications. in any large project effective communication is necessary to improve quality of work and progress toward completion of the scheduled task. frequently scheduled meetings of the staff were used to inform all project members of decisions, receive their suggestions and criticisms, and develop coordinated work assignments among the teams of each editor /librarian. all typing personnel were trained as coders and were periodically relieved of typing to code. this gave them an insight into detecting problems for referral to the professional staff, renewed their knowledge of proper format, and provided more variety in their work. all project members were capable of performing tasks of coding,. control list checking, and proofreading. the most capable clerical staff also assisted the editors in editorial work. it was felt that our use of the team approach, unified training, frequent staff meetings, and very detailed written documentation served to channel communication with a resultant minimization of these problems-once the first few months of the project had passed. estimating. in most data conversion work accurate estimating is required on many matters. some estimates we made were very accurate, such as basic time and staff to complete initial coding, typing time and staff, and supplies needed. however, other estimates were not very accurate. for example, the time to edit and correct the file once basic data collection was completed was double our original estimate and required more typing than anticipated. this caused the publication schedule to be delayed two months. difficulties at the computer center and at the photocomposition vendor caused another two months delay, even though it is doubtful that our photocomposition firm would have been ready had we met our original estimate. our original target was publication not later than two months after the basic data collection period of six months, i.e., in eight months. 
however, on a project of this size, and with the addition of about 7,000 more titles than we had originally estimated, we did not feel that the fiftyfour weeks really required was excessive. computer time was also difficult to estimate because of the time on / off the system. dependent upon the nature of the other jobs on the computer, this time varied greatly, for updating runs were almost independent of the number of transactions. there is always room for improvement in estimating, and, obviously, we have learned many things from this experience to use in further work. hardware/computer center. our largest problem was creating firm computer scheduling commitments on our campus ibm 360/50 computer, which serves the business functions of the university. all other campus computing facilities use control data equipment which is six-bit character, word oriented. with the extended character set requirement and the availability minnesota union list of serialsj grosch 181 of the ibm 360/ 50, which we were already using for other work with the ala graphic print train, it was natural for us to choose this system. current facilities are now satisfactory to permit our tape batch system operation and the development of our new disk-based batch system. tape pooling operations for the mt / st have caused some problems due to equipment changes at our vendor. we have now switched to a new conversion source as our former vendor upgraded his data entry system to keyto-disk. the three mt/ st typewriters we leased pedormed quite reliably, but one machine seemed to have more down time than the others. now that our typing load is down1 we have cancelled two model v s and will maintain two machines. we are now choosing a new system for key input to cassette tape. on the new equipment we will do our proofreading and initial correction off-line resulting in a further cost saving. this was not possible previously as our typing load required two-shift operation on all machines during the preliminary edition preparation time. conclusion a great amount of effort has been expended to achieve a unified serials data base to serve minnesota's libraries. it is our hope that this system can continue to be developed in as flexible a way as possible so that future needs can be supported through the system. only the imagination of those involved in networking is the limit to identifying the future needs to be met through access to this data base. of course, we would hope that one day our data could benefit the development of other similar programs in other states and, perhaps more importantly, in achieving a true national serials data base. acknowledgment many staff members at the university and other institutions contributed their invaluable counsel as we h~ve proceeded on the development of the system and the data base. the muls project staff particularly receives our deep gratitude for its yeoman effort. special commendation is due mr. don norris for systems design and principal programming support. mr. carl sandberg, who wrote all printer output programs, also contributed invaluable assistance to the project. the minitex program and university library administration receive our appreciation for placing their confidence .in the systems division. muls and its support system is truly a product resulting from the coordinated concern and interest of the aforementioned individuals and groups. references i. u.s. library of congress. information systems office. serials: a marc format. preliminary edition. 
washington, d.c.: library of congress, 1970 (l.c. 73-606842).
2. u.s. library of congress. marc development office. serials: a marc format. addendum no. 1. washington, d.c.: library of congress, june 1971.
letter from the editor
kenneth j. varnum
information technology and libraries | march 2019
https://doi.org/10.6017/ital.v38i1.10992
the current (march 2019) issue of information technology and libraries sees the first of what i know will be many exciting contributions to our new "public libraries leading the way" column. this feature (announced in december 2018) shines a spotlight on technology-based innovations from the public library perspective. the first column, "the democratization of artificial intelligence: one library's approach," by thomas finley of the frisco (texas) public library, discusses how his library has developed a teaching and technology lending program around artificial intelligence, creating kits that community members can take home and use to explore artificial intelligence through a practical, hands-on approach. if you have a public library perspective on technology that you would like to share in a conversational, 1000-1500-word column, submit a proposal. full details and a link to the proposal submission form can be found on the lita blog. i look forward to hearing your ideas. in addition to the premiere column in this series, the current issue includes the lita president's column from bohyun kim, updating us on the 2019 ala midwinter meeting, particularly on the status of the proposed alcts/llama/lita merger, and our regular editorial board thoughts column, contributed this quarter by kevin ford, on the importance of user stories in successful technology projects. articles in this issue cover these topics: improving sitewide navigation; improving the display of hathitrust records in primo; using linked data to create a geographic discovery system; measuring information system project success; a systematic approach towards web preservation; and determining textbook cost, formats, and licensing. i hope you enjoy reading the issue, whether you explore just one article or read it "cover to cover." as always, if you want to share the research or practical experience you have gained as an article in ital, get in touch with me at varnum@umich.edu. sincerely, kenneth j. varnum, editor varnum@umich.edu march 2019
president's message
twitter nodes to networks: thoughts on the #litaforum
rachel vacek
information technologies and libraries | december 2014
one thing that never ceases to amaze me is the technological talent and creativity of my library colleagues. the lita forum is a gathering of intelligent, fun, and passionate people who want to talk about technology and learn from one another. i suppose many conferences have lots of opportunities to network, but the size and friendliness of the forum makes it feel more like a comfortable place among friends. the use of technology always inspires me, and the networking and reconnecting with friends are rejuvenating. so many more people are sharing their research and their presentations through twitter, and it's fantastic in so many ways.
so  no  matter  what  concurrent  session  you  were  in,  or  if  you  couldn’t   even  make  it  to  albuquerque  this  year,  you  can  still  view  most  of  the  presentations,  listen  to  the   keynotes,  see  pictures  of  attendees,  follow  the  backchannel,  and  engage  with  everyone  on  twitter.   with  libraries  having  more  tight  budgets,  it’s  extremely  important  that  we  continue  to  learn   virtually.  there  are  plenty  of  online  workshops  and  webinars,  but  often  they  still  cost  money,  don’t   usually  encourage  much  communication  between  attendees,  and  “attending”  the  lita  forum  only   through  twitter  is  not  only  free,  but  the  learning  and  sharing  is  more  organic.  you  have  the   opportunity  to  engage  with  attendees,  observers,  and  even  the  presenters  themselves.  structured   workshops  have  their  place  for  focused,  more  in-­‐depth  learning  on  a  particular  topic,  and  they  are   definitely  still  needed  and  very  popular.  i  enjoy  our  lita  educational  programs  and  highly   recommend  them.  however,  interacting  with  twitter  throughout  the  forum  was  like  a  giant  social   playground  for  me,  and  i  could  engage  as  much  as  or  as  little  as  i  liked.  it’s  a  different  user   experience  than  so  many  other  more  traditional  learning  environments.   twitter  was  born  in  mid  2006  and  the  paradigm  shift  started  happening  a  few  years  later,  but  the   ways  people  are  socially  engaging  with  one  another  through  twitter  has  changed  drastically  since   then.1    people  aren’t  just  regurgitating  what  the  presenters  are  saying,  but  are  responding  to   speakers  and  others  in  the  physical  and  virtual  audience.  people  are  talking  more  in  depth  about   what  they  are  learning  and  supplementing  talks  with  links  to  sites,  videos,  images,  and  reports   that  might  have  been  mentioned.  they  are  coding  and  sharing  their  code  while  at  the  conference.   they  are  blogging  about  their  experiences  and  sharing  those  links.  they  are  extending  their   networks.       the  conference  theme  this  year  was  “from  node  to  network”  and  reflecting  on  my  own   conference  experience  and  reviewing  all  the  twitter  data,  i  don’t  think  the  2014  lita  forum     rachel  vacek  (revacek@uh.edu)  is  lita  president  2014-­‐15  and  head  of  web  services,  university   libraries,  university  of  houston,  houston,  texas.     president’s  message  |  vacek       2   planning  committee,  led  by  ken  varnum  from  the  university  of  michigan,  could  have  chosen  a   better  theme.     as  previously  mentioned,  the  ways  in  which  we  are  using  twitter  have  been  significantly  changing   the  way  we  learn  and  interact.  when  combing  through  the  #litaform  tweets  for  the  gems,  i  found   many  links  to  tools  that  analyze  and  visually  display  unique  information  about  tweets  from  the   forum.  the  love  of  data  is  not  uncommon  in  libraries,  and  neither  is  the  analysis  of  that  data.     the  tagsarchive2  contains  lots  of  twitter  data  from  the  forum.  as  you  can  see  in  image  1,   between  november  1,  2013,  and  november  17,  2014,  (the  same  tag  for  the  forum  was  used  for   the  2013  forum)  there  were  5,454  tweets,  4,390  of  which  were  unique,  not  just  retweets.  
there   were  1,394  links  within  those  tweets,  demonstrating  that  we  aren’t  just  repeating  what  the   speakers  are  saying;  we  are  enriching  our  networks  with  more  easily  accessible  information.     image  1.  archive  of  #litaforum  tweets  through  tags   the  data  also  tells  stories.  for  example,  @cm_harlow  by  far  tweeted  more  than  everyone  else  with   881  tweets,  @thestackscat  had  the  highest  retweet  rate  at  90%,  and  @varnum  with  the  lowest         information  technologies  and  libraries  |  december  1014         3   retweet  rate  at  1%.  i  was  able  to  look  at  every  single  tweet  in  a  google  spreadsheet,  complete  with   timestamps  and  links  to  user  profiles.  all  this  is  rich  data  and  quite  informative,  but  tagsexplorer,   developed  by  @mhawksey,  is  also  quite  an  impressive  data  visualization  tool  that  shows   connections  between  the  twitter  handles.  (see  image  2.)     image  2.  tagsexplorer  data  visualization  and  top  conversationalists   additionally,  you  can  see  whom  you  retweeted  and  who  retweeted  you,3  again  demonstrating  the   power  of  rich,  structured  data.  (see  image  3.)  all  of  these  tools  improve  our  ability  to  share,  reflect,   archive,  and  network  within  lita  and  beyond  our  typical,  often  comfortable  library  boundaries.   tweets  also  don’t  last  forever  on  the  web,  but  they  do  when  they  are  archived.4    one  conference   attendee,  @kayiwa,  used  a  tool  called  twarc  (https://github.com/edsu/twarc),  a  command-­‐line   tool  for  archiving  json  twitter  search  results  before  they  disappear.  looking  through  the  tweets,   you  will  learn  that  a  great  number  of  attendees  experienced  altitude  sickness  due  to   albuquerque’s  elevation,  which  is  around  5,000  feet  above  sea  level.  the  most  popular  and   desired  food  to  were  enchiladas  with  green  chili.  many  were  impressed  with  the  scenery,   mountains,  and  endless  blue  skies  of  the  city,  as  evidenced  by  the  number  of  images  of  outdoor   landscapes  and  sky  shots.       president’s  message  |  vacek       4     image  3.  connections  between  @vacekrae’s  retweets  and  who  she  was  retweeted  by   there  were  two  packed  pre-­‐conferences  at  the  lita  forum.  dean  krafft  and  jon  corson-­‐rikert   from  cornell  university  library  taught  attendees  about  a  very  hot  topic:  linked  data  and  “how   libraries  can  make  use  of  linked  open  data  to  share  information  about  library  resources  and  to   improve  discovery,  access,  and  understanding  for  library  users.”    the  hashtag  #linkeddata  was   used  382  times  across  all  the  forum’s  tweets  –  clearly  conversation  went  beyond  the  workshop.   also,  francis  kayiwa,  of  kayiwa  consulting,  and  eric  phetteplace  from  the  california  college  of   arts,  helped  attendees  “learn  python  by  playing  with  library  data”  in  the  second,  equally  as   popular  pre-­‐conference.  (see  image  4.)     image  4         information  technologies  and  libraries  |  december  1014         5   the  forum  this  year  also  had  three  exceptional  keynote  speakers.    annmarie  thomas,  @amptmn,   an  engineering  professor  from  the  university  of  st.  
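the tweet and retweet counts quoted above come out of exactly this kind of archive. as a small illustration, the sketch below tallies tweets and retweets per account from a line-delimited json file such as the one twarc writes; the file name and the json keys used here are assumptions about the archive layout, not a documented twarc interface.

```python
# illustrative sketch: count tweets and retweets per account from a
# line-delimited json archive (one tweet object per line, as twarc produces).
# the filename and the "user"/"screen_name"/"retweeted_status" keys are
# assumptions about the archive layout, used here only for demonstration.
import json
from collections import Counter

tweets_per_user = Counter()
retweets_per_user = Counter()

with open("litaforum.jsonl", encoding="utf-8") as archive:
    for line in archive:
        tweet = json.loads(line)
        user = tweet.get("user", {}).get("screen_name", "unknown")
        tweets_per_user[user] += 1
        if "retweeted_status" in tweet:
            retweets_per_user[user] += 1

for user, total in tweets_per_user.most_common(10):
    share = retweets_per_user[user] / total
    print(f"@{user}: {total} tweets, {share:.0%} retweets")
```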
thomas  in  minnesota,  kicked  off  the  forum   and  shared  her  enthusiasm  and  passion  for  makerspaces,  squishy  circuits,  and  how  to  engage  kids   in  engineering  and  science  in  incredibly  creative  ways.  i  was  truly  inspired  by  her  passion  for   making  and  sharing  with  others.    she  reminded  us  that  all  children  are  makers,  and  as  adults  we   need  to  remember  to  be  curious,  explore,  and  play.    there  are  129  tweets  that  capture  not  only  her   fun  presentation  but  also  her  vision  for  making  in  the  future.  (see  image  5.)     image  5   the  second  keynote  speaker  was  lorcan  dempsey,  @lorcand,  the  vice  president,  oclc  research   and  chief  strategist.    he’s  known  primarily  for  the  research  he  presents  through  his  weblog,   http://orweblog.oclc.org,  where  he  makes  observations  on  the  way  users  interact  with  technology   and  the  discoverability  of  all  that  libraries  have  to  offer,  from  collections  to  services  to  expertise.   he  wants  to  make  library  data  more  usable.  in  his  talk,  he  explained  how  some  technologies  such   as  mobile  devices  and  irs  are  having  huge  effects  on  user  behaviors.    “the  network  reshapes   society  and  society  reshapes  the  network.”  what  was  nice  also  is  that  lorcan’s  talk  complimented   annmarie’s  talk  about  making  and  sharing.  users  are  going  from  consumption  to  creation,  and  we,   as  libraries,  need  to  be  offering  our  services  and  content  in  the  users’  workflows.    we  need  to   share  our  resources,  make  them  more  discoverable.    why?    “discovery  often  happens  elsewhere.”     check  out  the  123  posts  on  the  twitter  archive,  which  includes  links  to  his  presentation.    (see   image  6.)     image  6     president’s  message  |  vacek       6   kortney  ryan  ziegler,  @fakerapper,  is  the  founder  trans*h4ck  and  the  closing  keynote  speaker.     his  work  focuses  on  supporting  trans-­‐created  technology,  trans  entrepreneurs,  and  trans-­‐led   startups.  he’s  led  hackathons  and  helped  create  safe  spaces  for  the  trans  community.    his  work  is   so  important  and  many  of  the  apps  help  to  address  the  social  inequalities  that  the  trans   community  still  faces.    for  example,  he  mentioned  that  it’s  still  legal  in  36  states  to  be  fired  for   being  trans.    but  there  are  174  tweets  captured  at  the  forum  that  give  examples  of  the  web  tools   created,  and  ideas  about  how  libraries  can  be  inclusive  and  more  supportive  of  the  trans   community.    (see  image  7.)     image  7   the  sessions  themselves  were  excellent,  and  many  sparked  conversations  long  after  the   presentation.  lightning  talks  were  engaging,  fast,  and  fun.  posters  were  both  beautiful  and   informative.  overarching  terms  that  i  heard  repeatedly  and  saw  among  the  tweets  were:  open   graph,  openrefine,  social  media,  makerspaces,  bibframe,  library  labs,  leadership,  support,   community,  analytics,  assessment,  engagement,  inclusivity,  diversity,  agile  development,  open   access,  linked  data,  vivo,  dataone,  discovery  systems,  discoverability,  librarybox,  islandora,  and   institutional  repositories.  
below  are  some  highlights:           information  technologies  and  libraries  |  december  1014         7       there  were  so  many  opportunities  to  network  at  sessions,  on  breaks,  at  the  networking  dinners,   and  even  at  game  night.  i  see  networking  as  a  huge  benefit  of  a  small  conference,  and  networking   can  lead  to  some  pretty  amazing  things.    for  example,  whitni  watkins,  @nimblelibrarian  and  one   of  lita’s  invaluable  volunteers  for  the  forum,  was  so  inspired  by  a  conversation  on  openrefine   that  she  created  a  list  where  people  could  sign  up  to  learn  more  and  get  some  hands-­‐on  playing   time  with  the  tool.    on  her  blog,5  whitni  says,  “…most  if  not  all  of  those  who  came  left  with  a  bit     president’s  message  |  vacek       8   more  knowledge  of  the  program  than  before  and  we  opened  a  door  of  possibility  for  those  who   hadn’t  any  clue  as  to  what  openrefine  could  do.”   another  example  of  great  networking  is  where  tabby  farney,  @sharebrarian,  and  cody  behles,   @cbehles,  decided  to  create  a  lita  metrics  interest  group.    at  one  of  the  networking  dinners,  they   discussed  their  passion  for  altmetrics  and  web  analytics  but  noticed  that  there  wasn’t  an  existing   group,  and  felt  spurred  to  create  one.         the  technology  and  information  sharing,  the  networking,  the  collaborating,  and  the  strategizing  –   these  are  all  components  that  make  up  the  lita  forum.    twitter  is  just  another  technology   platform  to  help  us  connect  with  one  another.    we  are  all  just  nodes,  and  technology  enables  us  to   both  become  the  network  and  to  network  more  effectively.   but  finally,  i  want  to  acknowledge  and  thank  our  sponsors,  many  of  which  are  also  lita  members.   we  could  not  have  run  the  forum  without  the  generous  funds  from  ebsco,  springshare,  @mire,   innovative,  and  oclc.  on  behalf  of  lita,  i  truly  appreciate  their  support.   i  want  to  leave  you  with  one  more  image  that  was  created  by  @kayiwa  using  the  most  tweeted   words  from  all  the  posts.6  next  year’s  forum  is  in  minneapolis,  and  i  hope  to  see  you  there.           information  technologies  and  libraries  |  december  1014         9   references     1.  http://consumercentric.biz/wordpress/?p=106   2.https://docs.google.com/spreadsheet/pub?key=0asyivmoyhk87dfnfx196v1e2m2zqtvlhq2j vs2fsdee&output=html     3.  http://msk0.org/lita2014/litaforum-­‐directed-­‐retweets.html   4.  http://msk0.org/lita2014/lita2014.html   5.  http://nimblelibrarian.wordpress.com/2014/11/14/lita-­‐forum-­‐2014-­‐a-­‐recap/   6.  http://msk0.org/lita2014/litaforum-­‐wordcloud.html mobile website use and advanced researchers: understanding library users at a university marine sciences branch campus mary j. markland, hannah gascho rempel, and laurie bridges information technology and libraries | december 2017 7 abstract this exploratory study examined the use of the oregon state university libraries website via mobile devices by advanced researchers at an off-campus branch location. branch campus–affiliated faculty, staff, and graduate students were invited to participate in a survey to determine what their research behaviors are via mobile devices, including frequency of their mobile library website use and the tasks they were attempting to complete. 
findings showed that while these advanced researchers do periodically use the library website via mobile devices, mobile devices are not the primary mode of searching for articles and books or for reading scholarly sources. mobile devices are most frequently used for viewing the library website when these advanced researchers are at home or in transit. results of this survey will be used to address knowledge gaps around library resources and research tools and to generate more ways to study advanced researchers’ use of library services via mobile devices. introduction as use of mobile devices has expanded in the academic environment, so has the practice of gathering data from multiple sources about what mobile resources are and are not being used. this data informs the design decisions and resource investments libraries make in mobile tools. web analytics is one tool that allows researchers to discover which devices patrons use to access library webpages. but web analytics data do not show what patrons want to do and what hurdles they face when using the library website via a mobile device. web analytics also lacks nuance in that it cannot distinguish user characteristics, such as whether users are novice or advanced researchers, which may affect how these users interact with a mobile device. user surveys are another tool for gathering data on mobile behaviors. user surveys help overcome some of the limitations of web analytics data by directly asking users about their perceived research skills and the resources they use on a mobile device. as is the case at most libraries, oregon state university libraries serves a diverse range of users. we were interested in learning whether advanced researchers—particularly advanced researchers who work at a branch campus—use the library’s resources differently than main mary j. markland (mary.markland@oregonstate.edu), is head, guin library; hannah gascho rempel (hannah.rempel@oregonstate.edu) is science librarian and coordinator of graduate student success services; and laurie bridges (laurie.bridges@oregonstate.edu) is instruction and outreach librarian, oregon state university libraries and press. mailto:mary.markland@oregonstate.edu mailto:hannah.rempel@oregonstate.edu mailto:laurie.bridges@oregonstate.edu mobile website use and advanced researchers | markland, rempel, and bridges doi:10.6017/ital.v36i4.9953 8 campus users. we were chiefly interested in these advanced researchers because of the mobile nature of their work. they are graduate students and faculty in the field of marine science who work in a variety of locations, including their offices, labs, and in the field (which can include rivers, lakes, and the ocean). we focused on the use of the library website via mobile devices as one way to determine whether specific library services should be adapted to best meet the needs of this targeted user community. oregon state university (osu) is oregon’s land-grant university; its home campus is in corvallis, oregon. hatfield marine science center (hmsc) in newport is a branch campus that includes a branch library. guin library at hmsc serves osu students and faculty from across the osu colleges along with the co-located federal and state agencies of the national oceanic and atmospheric administration (noaa), us fish and wildlife service, environmental protection agency (epa), united states geological survey (usgs), united states department of agriculture (usda), and the oregon department of fish and wildlife. 
the guin library is in newport, which is forty-five miles from the main campus. like many other branch libraries, guin library was established at a time when providing a print collection close to where researchers and students work was paramount, but today it must adapt its services to meet the changing information needs of its user base. branch libraries are typically designed to serve a clientele or subject area, which can create a different institutional culture from the main library. guin library serves advanced undergraduates, graduate students, and scientific researchers. hmsc’s distance from corvallis, the small size of the researcher community, and the shared focus on a research area—marine sciences—create a distinct culture. while guin library is often referred to as the “heart of hmsc,” the number of in-person library users is decreasing. this decline is not unexpected as numerous studies have shown that faculty and graduate students have fewer needs that require an in-person trip to the library.1 studies have also shown that faculty and graduate students can be unaware of the services and resources that libraries provide, thereby continuing the cycle of underuse. 2 to learn more about the needs of hmsc’s advanced researchers, this exploratory study examined their research behaviors via mobile devices. the goals of this study were to • determine if and with what frequency advanced researchers at hmsc use the osu libraries website via mobile devices; • gather a list of tasks advanced users attempt to accomplish when they visit the osu libraries website on a mobile device; and • determine whether the mobile behaviors of these advanced researchers are different from those of researchers from the main osu campus (including undergraduate students), and if so, whether these differences warrant alternative modes of design or service delivery. information technology and libraries | december 2017 9 literature review the conversation about how best to design mobile library websites has shifted over the past decade. early in the mobile-adoption process some libraries focused on creating special websites or apps that worked with mobile devices.3 while libraries globally might still be creating mobilespecific websites and apps,4 us libraries are trending toward responsively designed websites as a more user-friendly option and a simpler solution for most libraries with limited staff and budgets.5 most of the literature on mobile-device use in higher education is focused on undergraduates across a wide range of majors who are using a standard academic library. 6 to help provide context for how libraries have designed their websites for mobile users, some of those specific findings will be shared later. but because our study focused on graduate students and faculty in a sciencefocused branch library, we will begin with a discussion of what is known about more advanced researchers’ use of library services and their mobile-device habits. several themes emerged from the literature on graduate students’ relationships with libraries. in an ironic twist, faculty think graduate students are being assisted by the library while librarians think faculty are providing graduate students with the help they need to be successful.7 this results in many graduate students end up using their library’s resources in an entirely disintermediated way. 
graduate students, especially those in the sciences, visit the physical library less often and use online resources more than undergraduate students.8 most graduate students start their research process with assistance from academic staff, such as advisors and committee members,9 and are unaware of many library services and resources.10 as frequent virtual-library users who receive little guidance on how to use the library's tools, graduate students need a library website that is clear in scope and purpose, offers help, and has targeted services.11 compared to reports on undergraduate use of mobile devices to access their library's website, relatively few studies have focused on graduate-student or faculty mobile behaviors. a recent survey of japanese library and information science (lis) students compared undergraduate and graduate students' usage of mobile devices to access library services and found slight differences. however, both groups reported accessing libraries as last on their list of preferred smartphone uses.12 aharony examined the mobile use behaviors of israeli lis graduate students and found approximately half of these graduate students used smartphones, perceived them to be useful and easy tools for use in their everyday life, and could transfer those habits to library searching behaviors.13 when looking specifically at how patrons use library services via a mobile device, rempel and bridges found the top reason graduate students at their main campus used the osu libraries website via mobile devices was to find information on library hours, followed by finding a book and researching a topic.14 barnett-ellis and vann surveyed their small university and found that both undergraduate and graduate students were more than twice as likely to use mobile devices as their faculty and staff; a majority of students also indicated they were likely to use mobile devices to conduct research.15 finally, survey results showed graduate students in hofstra university's college of education reported accessing library materials via a mobile device twice as often as other student groups. in addition, these graduate students reported being comfortable reading articles up to five pages long on their mobile devices. graduate students were also more likely to be at home when using their mobile device to access the library, a finding the authors attributed to education graduate students frequently being employed as full-time teachers.16 research on how faculty members use library resources characterizes a population that is confident in their literature-searching skills, prefers to search on their own, and has little direct contact with the library.17 faculty researchers highly value convenience;18 they rely primarily on electronic access to journal articles but prefer print access to monographs.19 faculty tend to be self-trained at using search tools, such as pubmed or other online databases, and therefore are not always aware of the more in-depth functionality of these tools.20 in contrast to graduate students, rempel and bridges found that faculty using the library website via mobile devices were less interested in information about the physical library, such as library hours, and were more likely to be researching a topic.21 medical faculty are one of the few faculty groups whose mobile-research behaviors have been specifically examined. a survey administered by bushhousen et al.
at a medical university revealed that a third of respondents used mobile apps for research-related activities.22 findings by boruff and storie indicate that one of the biggest barriers to mobile use in health-related academic settings was wireless access.23 thus apps that did not require the user to be connected to the internet were highly desired. faculty and graduate students in health-related academic settings saw a role for the library in advocating for better wireless infrastructure, providing access to a targeted set of heavily used resources, and providing online guides or in-person tutorials on mobile apps or procedures specific to their institution.24 according to the literature, most design decisions for library mobile sites have been made on the basis of information collected about undergraduate students' behavior at main campuses. to help inform our understanding of how recent decisions have been made, the remainder of the literature review focuses on what is known about undergraduate students' mobile behavior. undergraduate students are very comfortable using mobile technologies and perceive themselves to be skilled with these devices. according to the 2015 educause center for analysis and research (ecar) study of undergraduate students and information technology, most undergraduate students consider themselves sophisticated technology users who are engaged with information technologies.25 undergraduate students mainly use their smartphones for nonclass activities. but students indicate they could be more effective technology users if they were more skilled at tools such as the learning management system, online collaboration tools, e-books, or laptops and smartphones in class. of interest to libraries is the ecar participants' top area of reported interest, "search tools to find reference or other information online for class work."26 however, when a mobile library site is in place, usage rates have been found to be lower than anticipated. in a study of undergraduate science students, salisbury et al. found only 2 percent of respondents reported using their cell phones to access library databases or the library's catalog every hour or daily, despite 66 percent of the students browsing the internet using their mobile phone hourly or daily. salisbury et al. speculated that users need to be told about mobile-optimized library resources if libraries want to increase usage.27 rempel and bridges used a pop-up interrupt survey while users were accessing the osu libraries mobile site.28 this approach allowed a larger cross-section of library users to be surveyed. it also reduced memory errors by capturing their activities in real time. activities that had been included in the mobile site because of their perceived usefulness in a mobile environment, such as directions, asking a librarian a question, and the coffee shop webcam, were rarely cited as a reason for visiting the mobile site. the osu libraries branch at hmsc is entering a new era. a marine studies initiative will result in the building of a new multidisciplinary research campus at hmsc that aims to serve five hundred undergraduate students. the change in demographics and the increase in students who will need to be served have prompted guin library staff to explore how the current population of advanced researchers interacts with library resources.
in addition, examining the ways undergraduate students at the main campus use these tools will help with planning for the upcoming changes in the user community. methods this study used an online qualtrics survey to gather information about how frequently advanced researchers (graduate students, faculty, and affiliated scientists at a branch library for marine science) use the osu libraries website via mobile devices, what they search for, and other ways they use mobile devices to support their research behaviors. a recruitment email with a link to the survey was sent to three discussion lists used by hmsc community in spring 2016. the survey was available for four weeks, and a reminder email was sent one week before the survey closed. the invitation email included a link to an informedconsent document. once the consent document had been reviewed, users were taken to the survey via a second link. respondents could provide an email address to receive a three-dollar coffee card for participating in the study, but their email address was recorded in a separate survey location to preserve their anonymity. the invitation email indicated that this survey was about using the website via a mobile device, and the first survey question asked users if they had ever accessed the library website on a mobile device. if they answered “no,” they were immediately taken to the end of the survey and were not recorded as a participant in the study. a similar survey was conducted with users from osu’s main campus in 2012–13 and again in 2015. the results from 2012–13 have been published previously,29 but the results from 2015 have not. while the focus of the present study is on the mobile behaviors of advanced researchers in the hmsc community, data from the 2015 main-campus study is used to provide a comparison to the broader osu community. osu main-campus respondents in 2015 and hmsc participants in 2016 both answered closedand open-ended questions that explored participants’ general mobiledevice behaviors and behaviors specific to using the osu libraries website via mobile devices. mobile website use and advanced researchers | markland, rempel, and bridges doi:10.6017/ital.v36i4.9953 12 however, the hmsc survey also asked questions about behaviors related to using the osu (nonlibrary) website via a mobile device and participants’ mobile scholarly reading and writing behaviors. the survey concluded with several demographic questions. the survey data was analyzed using qualtrics’ cross-tab functionality and microsoft excel to observe trends and potential differences between user groups. open-ended responses were examined for common themes. twenty-three members of the hmsc community completed the survey, whereas one hundred participants responded to the 2015 main campus survey. participation in the 2015 survey was capped at one hundred respondents because limited incentives were available. the participation difference between the two surveys reflects several differences between the two sampled communities. the most obvious difference is size. the osu community comprises more than thirty-six thousand students, faculty, and staff; the hmsc community is approximately five hundred students, researchers, and faculty—some of whom are also included as part of the larger osu community. 
the second factor influencing response rates relates to the difference in size between the two communities, but is more striking in the hmsc community: the survey relied on a self-selected group of users who indicated they had a history using the library website via a mobile device. therefore, it is not possible to estimate the population size of mobile-device library-website users specific to the branch library or the main campus library. this limitation means that the results from this study cannot be used to generalize findings to all users who visit a library website via mobile devices; instead the results are intended to present a case that other libraries may compare with behaviors observed on their own campuses. sharing the behaviors of advanced researchers at a branch campus is particularly valuable as this population has historically been understudied. results and discussion participant demographics and devices used of the twenty-three respondents to the hmsc mobile behaviors survey, 13 (62 percent) were graduate students, 7 (34 percent) were faculty (this category includes faculty researchers and courtesy faculty), and one respondent was an noaa employee. two participants declined to declare their affiliation. of the 97 respondents to the 2015 osu main-campus survey who shared their affiliation, 16 (16 percent) were graduate students, 5 (5 percent) were faculty members, and 69 (71 percent) were undergraduates. respondents varied in the types of mobile devices they used when doing library research. smartphones were used by 78 percent (18 respondents) and 22 percent (5 respondents) used a tablet. apple (15 respondents) was the most common device brand used, although six of the respondents used an android phone or tablet. compared to the general population’s device ownership, these respondents are more likely to own apple devices, but the two major device types owned (apple and android) match market trends.30 information technology and libraries | december 2017 13 frequency of library site use on mobile devices most of the hmsc respondents are infrequent users of the library website via mobile devices: 50 percent (11 respondents) did so less than once a month; 41 percent (9 respondents) did so at least once a month; and 9 percent (2 respondents) did so at least once a week. the low level of library website usage via mobile devices was especially notable as this population reports being heavy users of the library website via laptops or desktop computers, with 82 percent (18 respondents) visiting the library website via those tools at least once a week. researchers at hmsc used the library website via mobile devices much less often than the 2015 main-campus respondents (undergraduates, graduate students, and faculty). no hmsc respondents visited the mobile site daily compared to 10 percent of main-campus users, and only 9 percent of hmsc respondents visited weekly compared to 28 percent of main-campus users (see figure 1). figure 1. 2016 hmsc participants vs. 2015 osu main-campus participants reported frequency of library website visits via a mobile device by percent of responses. while hmsc advanced researchers share some mobile behaviors with main-campus students, this exploratory study demonstrates they do not use the library website via mobile devices as frequently. some possible reasons for this are researchers rarely spend time coming and going to and from classes and therefore do not have small gaps of time to fill throughout their day. 
instead, their daily schedule involves being in the field or in the lab collecting and analyzing data. alternatively, they are frequently involved in writing-intensive projects such as drafting journal articles or grant proposals. they carve out specific periods to do research and do not appear to be filling time with short bursts of literature searching. they can work on laptops and do not need to multitask on a phone or tablet between classes or in other situations. mobile-device ownership among hmsc graduate students might also be limited because of personal budgets that do not allow for owning multiple mobile devices or for having the most recent model. in addition, this group of scientists may not be on the front edge of personal technologies, especially compared to medical researchers, because few mobile apps are designed specifically for the research needs of marine scientists. where researchers are when using mobile devices for library tasks. because mobile devices facilitate connecting to resources from many locations, and because advanced researchers conduct research in a range of settings—including the field, the office, and home—we asked respondents where they were most likely to use the library website via a mobile device. thirty-two percent were most likely to be at home, 27 percent in transit, 18 percent at work, and 9 percent in the field. the popularity of using the library website via mobile devices while in transit was somewhat unexpected, but perhaps should not have been because many people try to maximize their travel time by multitasking on mobile devices. the distance from the main campus might explain this finding because a local bus service provides an easy way to travel to and from the main campus, and the hour-long trip would provide opportunities for multitasking via a mobile device. relatively few respondents used mobile devices to access the library website while at work. previous studies show that a lack of reliable campus wireless internet access can affect students' ability to use mobile technology.31 hmsc also struggles to provide consistent wireless access, and signals are spotty in many areas of our campus. despite signal boosters in guin library, wireless access is still limited at times. in addition, cell phone service is equally spotty both at hmsc and up and down the coast of oregon. it is much less frustrating to work on a device that has a wired connection to the internet while at hmsc. these respondents did use mobile devices while at home, which might indicate they had a better wireless signal there. alternatively, working from home on a mobile device might indicate that they compartmentalize their library-research time as an activity to do at home instead of in the office. researchers used their mobile devices to access the library while in the field less than originally expected, but upon further reflection, it made sense that researchers would be less likely to use library resources during periods of data collection for oceanic or other water-based research projects because of their focused involvement during that stage. the water-based research also increases the risk of losing mobile devices.
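the group-by-group comparisons reported in this and the surrounding sections were produced with qualtrics' cross-tab reports and excel, as described in the methods. purely as a hedged illustration, the short python sketch below performs the same kind of tabulation on a hypothetical csv export of the responses; the file name and the column names (affiliation, library_site_frequency) are placeholders, not the survey's actual field names.

```python
import pandas as pd

# hypothetical export of the survey: one row per respondent, with the
# affiliation question and the "how often do you visit the library website
# on a mobile device" question stored as plain-text columns
responses = pd.read_csv("hmsc_mobile_survey.csv")

# cross-tabulate affiliation against reported visit frequency,
# normalizing within each affiliation so the cells read as percentages
table = (
    pd.crosstab(
        responses["affiliation"],
        responses["library_site_frequency"],
        normalize="index",
    )
    * 100
).round(1)

print(table)
```

the same two-line crosstab call, swapping in other question columns, reproduces the content-type and location breakdowns discussed in the results.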
library resources accessed via mobile devices. to learn more about how these respondents used the library website, we asked them to choose what they were searching for from a list of options. respondents could choose as many options as applied to their searching behaviors. hmsc respondents' primary reason for visiting the library's site via a mobile device was to find a specific source: 68 percent looked for an article, 45 percent for a journal, 36 percent for a book, and 14 percent for a thesis. many of the hmsc respondents also looked for procedural or library-specific information: 36 percent looked for hours, 32 percent for my account information, 18 percent for interlibrary loan, 14 percent for contact information, 9 percent for how to borrow and request books, 9 percent for workshop information, and 9 percent for oregon estuaries bibliographies—a unique resource provided by the hmsc library. fifty-five percent of searches were for a specific source and 43 percent were for procedural or library-specific information. notably missing from this list were respondents who reported searching via their mobile device for directions to the library. compared to the 2015 osu libraries main-campus survey respondents, hmsc respondents were much more likely to visit the library website via a mobile device to look for an article (68 percent vs. 37 percent), find a journal (45 percent vs. 23 percent), access my account information (32 percent vs. 7 percent), use interlibrary loan (18 percent vs. 5 percent), or find contact information (14 percent vs. 1 percent). however, unlike hmsc participants, who do not have access to course reserves at the branch library, 7 percent of osu main-campus respondents used their mobile devices to find course reserves on the library website. see figure 2. figure 2. 2016 hmsc vs. 2015 osu main-campus participants reported searches while visiting the library website via a mobile device by percent of responses. it is possible that hmsc users with different affiliations might use the library site via a mobile device differently. these exploratory findings show that graduate students used the greatest variety of content via mobile devices. graduate students as a group reported using 11 of the 14 provided content choices via a mobile device while faculty reported using 8 of the 14. graduate students were the largest group (62 percent of respondents), which might explain why as a group they searched for more types of content via mobile devices. interestingly, faculty members and faculty researchers reported looking for a thesis via a mobile device, but no graduate students did. perhaps these graduate students had not yet learned about the usefulness of referencing past theses as a starting point for their own thesis writing. or perhaps they were only familiar with searching for journal articles on a topic. in contrast, faculty members might have been searching for specific theses for which they had provided advising or mentoring support. to help us make decisions about how to best direct users to library content via mobile devices, we asked respondents to indicate their searching behaviors and preferences.
of the 16 hmsc respondents who answered this question, 12 (75 percent) used our web-scale discovery search box via mobile devices; 4 (25 percent) reported that they did not. presumably these latter searchers were navigating to another database to find their sources. of 16 respondents, only 6 (38 percent) indicated that they looked for a specific library database (as opposed to the discovery tool) when using a mobile device. those respondents who were looking for a database tended to be looking for the web of science database, which makes sense for their field of study. when conducting searches for sources on their mobile devices, hmsc respondents employed a variety of search strategies: the 12 respondents who replied used a combination of author (75 percent), journal title (67 percent), keyword (67 percent), and book title (50 percent) searches when starting at the mobile version of the discovery tool. when asked about their preferred way to find sources, a majority of hmsc respondents reported that they tended to prefer a combination of searching and menu navigation while using the library website from mobile devices, while the remainder were evenly divided between preferring menu-driven and search-driven discovery. while osu libraries does not currently provide links to any specific apps for source discovery, such as pubmed mobile or jstor browser, 13 (62 percent) of the hmsc respondents indicated they would be somewhat or very likely to use an app to access and use library services. this finding connects to the issue of reliable wireless access. medical graduate students had a wider array of apps available to them, but the primary reason they wanted to use these apps was because they provided a better searching experience in hospitals that had intermittent wireless access—an experience to which researchers at hmsc could relate.32 university website use behaviors on mobile devices. to help situate respondents' library use behaviors on mobile devices in comparison to the way they use other academic resources on mobile devices, we asked hmsc respondents to describe their visits to resources on the osu (nonlibrary) website via mobile devices. compared to their use of the library site on a mobile device, respondents' use of university services was higher: 43 percent (9 respondents) visited the university's website via a mobile device at least once a week compared to only 9 percent (2 respondents) who visited the library site with that frequency. this makes sense because of the integral function many of these university services play in most university employees' regular workflow. respondents indicated visiting key university sites including myosu (a portal webpage, visited by 60 percent of respondents), the hmsc webpage (55 percent), canvas (the university's learning management system, visited by 50 percent of respondents), and webmail (45 percent). see figure 3. figure 3. university webpages hmsc respondents access on a mobile device by percent of responses. university resources such as campus maps, parking locations, and the graduate school website were frequently used by this population. the use of the first two makes sense as hmsc users are located off-site and need to use maps and parking guidance when they visit the main campus. the use of the graduate school website makes sense because the respondents were primarily graduate students and graduate school guidelines are a necessary source of information. interestingly, our advanced users are similar to undergraduates in that they primarily read email, information from social networking sites, and news on their mobile devices.33 other research behaviors on mobile devices
interestingly, our advanced users are similar to undergraduates in that they primarily read email, information from social networking sites, and news on their mobile devices. 33 other research behaviors on mobile devices mobile website use and advanced researchers | markland, rempel, and bridges doi:10.6017/ital.v36i4.9953 18 we wanted to know what other research-related behaviors the hmsc respondents are engaged in via mobile devices to determine if there might be additional ways to support researchers’ workflows. we specifically asked about respondents’ reading, writing, and note-taking behaviors to learn how well these respondents have integrated them with their mobile usage behaviors. all respondents reported reading on their mobile device (see figure 4). email represented the most common reading activity (95 percent), followed by “quick reading” activities, such as reading social networking posts (81 percent), current news (81 percent), and blog posts (62 percent). smaller numbers used their mobile devices for academic or long-form reading, such as reading scholarly articles (33 percent) or books (19 percent). of those respondents who read articles and books on their mobile devices, only respondents highlighted or took notes using their mobile device. seven respondents used a citation manager on their mobile device: three used endnote, one used mendeley, one used pages, and one used zotero. one respondent used evernote on their mobile device, and one advanced user reported using specific data and database management software, websites, and apps related to their projects. more advanced and interactive mobilereading features, such as online spatial landmarks, might be needed before reading scholarly articles on mobile devices becomes more common.34 figure 4. what hmsc respondents reported reading on a mobile device by percent of responses. limitations this exploratory study had several limitations, most of which reflect the nature of doing research with a small population at a branch campus. this study had a small sample size, which limited observations of this population; however, future studies could use research techniques such as interviews or ethnographic studies to gather deep qualitative information about mobile-use 19% 33% 62% 81% 81% 95% 0% 20% 40% 60% 80% 100% 120% books academic or scholarly articles blog posts current news social networking posts (facebook, twitter, etc.) email percent of responses information technology and libraries | december 2017 19 behaviors in this population. a second limitation was that previous studies of the osu libraries mobile website used google analytics to compare survey results with what users were actually doing on the library website. unfortunately, this was not possible for this study. because of how hmsc’s network was set up, anyone at hmsc using the osu internet connections is assigned an ip address that shows a corvallis, oregon, location rather than a newport, oregon, location, which rendered parsing hmsc-specific users in google analytics impossible. the research behaviors of advanced researchers at a branch campus has not been well-examined; despite its limitations, this study provides beneficial insights into the behaviors of this user population. conclusion focusing on how advanced researchers at a branch campus use mobile devices while accessing library and other campus information provides a snapshot of key trends among this user group. 
these exploratory findings show that these advanced researchers are infrequent users of library resources via mobile devices and, contrary to our initial expectations, are not using mobile devices as a research resource while conducting field-based research. findings showed that while these advanced researchers do periodically use the library website via mobile devices, mobile devices are not the primary mode of searching for articles and books or for reading scholarly sources. mobile devices are most frequently used for viewing the library website when these advanced researchers are at home or in transit. the results of this survey will be used to address the hmsc knowledge gaps around use of library resources and research tools via mobile devices. both graduate students and faculty lack awareness of library resources and services and have unsophisticated library research skills.35 while the osu main campus has library workshops for graduate students and faculty, these workshops have been inconsistently duplicated at the guin library. because the people working at hmsc come from such a wide variety of departments across osu that focus on marine sciences, hmsc has never had a library orientation. the results indicate possible value in devising ways to promote guin library's resources and services locally, which could include highlighting the availability of mobile library access. while several participants mentioned using research tools like evernote, pages, or zotero on their mobile devices, most participants did not report enhancing their mobile research experience with these mobile-friendly tools. workshops specifically modeling how to use mobile-friendly tools and apps such as dropbox, evernote, goodreader, or browzine could help introduce the benefits of these tools to these advanced researchers. because wireless access is even more of a concern for researchers at this branch location than for researchers at the main campus, database-specific apps will be explored to determine if the use of searching apps could help alleviate inconsistent wireless access. if database apps that are appropriate for marine science researchers are available, these will be promoted to this user population. future research might involve follow-up interviews, focus groups, or ethnographic studies, which could expand the knowledge of these researchers' mobile-device behaviors and their perceptions of mobile devices. exploring the technology usage by these advanced researchers in their labs, including electronic lab notebooks or other tools, might be an interesting contrast to their use of mobile devices. in addition, as the hmsc campus grows with the expansion of the marine studies initiative, increasing numbers of undergraduates will use guin library. the ecar 2015 statistics show that current undergraduates own multiple internet-capable devices.36 presumably, these hmsc undergraduates will be likely to follow the trends seen in the ecar data. certainly, the plans to expand hmsc's internet and wireless infrastructure will affect all its users. our mobile survey gave us insights into how a sample of the hmsc population uses the library's resources and services. these observations will allow guin library to expand its services for the hmsc campus. we encourage other librarians to explore their unique user populations when evaluating services and resources.
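one practical follow-up to the google analytics limitation noted in the limitations section (hmsc traffic resolving to a corvallis ip address) is to work from the web server's own access logs, where requests can at least be split into mobile and non-mobile visits by user-agent string. the sketch below is a simplified illustration under assumed conditions: an apache-style combined log format, a hypothetical log path, and a deliberately crude set of mobile markers. it is not a description of osu libraries' actual analytics setup, and a production analysis would normally use a maintained user-agent parsing library.

```python
import re
from collections import Counter

# very rough mobile indicators; real user-agent detection is messier than this
MOBILE_HINTS = re.compile(r"Mobile|Android|iPhone|iPad", re.IGNORECASE)
# the user agent is the last quoted field on a combined-format log line
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

counts = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:  # hypothetical path
    for line in log:
        match = UA_FIELD.search(line)
        if not match:
            continue
        agent = match.group(1)
        counts["mobile" if MOBILE_HINTS.search(agent) else "desktop/other"] += 1

total = sum(counts.values()) or 1
for device, n in counts.most_common():
    print(f"{device}: {n} requests ({100 * n / total:.1f}%)")
```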
references 1 maria anna jankowska, “identifying university professors’ information needs in the challenging environment of information and communication technologies,” journal of academic librarianship 30, no. 1 (2004): 51–66, https://doi.org/10.1016/j.jal.2003.11.007; pali u. kuruppu and anne marie gruber, “understanding the information needs of academic scholars in agricultural and biological sciences,” journal of academic librarianship 32, no. 6 (2006): 609–23; lotta haglund and per olsson, “the impact on university libraries of changes in information behavior among academic researchers: a multiple case study,” journal of academic librarianship 34, no. 1 (2008): 52–59, https://doi.org/10.1016/j.acalib.2007.11.010; nirmala gunapala, “meeting the needs of the ‘invisible university’: identifying information needs of postdoctoral scholars in the sciences,” issues in science and technology librarianship, no. 77 (summer 2014), https://doi.org/10.5062/f4b8563p. 2 tina chrzastowski and lura joseph, “surveying graduate and professional students’ perspectives on library services, facilities and collections at the university of illinois at urbanachampaign: does subject discipline continue to influence library use?,” issues in science and technology librarianship no. 45 (winter 2006), https://doi.org/10.5062/f4dz068j; kuruppu and gruber, “understanding the information needs of academic scholars in agricultural and biological sciences”; haglund and olsson, “the impact on university libraries of changes in information behavior among academic researchers.” 3 ellyssa kroski, “on the move with the mobile web: libraries and mobile technologies,” library technology reports 44, no. 5 (2008): 1–48, https://doi.org/10.5860/ltr.44n5. 4 paula torres-pérez, eva méndez-rodríguez, and enrique orduna-malea, “mobile web adoption in top ranked university libraries: a preliminary study,” journal of academic librarianship 42, no. 4 (2016): 329–39, https://doi.org/10.1016/j.acalib.2016.05.011. 5 david j. comeaux, “web design trends in academic libraries—a longitudinal study,” journal of web librarianship 11, no. 1 (2017), 1–15, https://doi.org/10.1080/19322909.2016.1230031; https://doi.org/10.1016/j.jal.2003.11.007 https://doi.org/10.1016/j.acalib.2007.11.010 https://doi.org/10.5062/f4b8563p https://doi.org/10.5062/f4dz068j https://doi.org/10.5860/ltr.44n5 https://doi.org/10.1016/j.acalib.2016.05.011 https://doi.org/10.1080/19322909.2016.1230031 information technology and libraries | december 2017 21 zebulin evelhoch, “mobile web site ease of use: an analysis of orbis cascade alliance member web sites,” journal of web librarianship 10, no. 2 (2016): 101–23, https://doi.org/10.1080/19322909.2016.1167649. 6 barbara blummer and jeffrey m. kenton, “academic libraries’ mobile initiatives and research from 2010 to the present: identifying themes in the literature,” in handbook of research on mobile devices and applications in higher education settings, ed. laura briz-ponce, juan juanesméndez, and josé francisco garcía-peñalvo (hershey, pa: igi global, 2016), 118–39. 7 jankowska, “identifying university professors’ information needs in the challenging environment of information and communication technologies.” 8 chrzastowski and joseph, “surveying graduate and professional students’ perspectives on library services, facilities and collections at the university of illinois at urbana-champaign.” 9 carole a. 
george et al., “scholarly use of information: graduate students’ information seeking behaviour,” information research 11, no. 4 (2006), http://www.informationr.net/ir/114/paper272.html. 10 kristin hoffman et al., “library research skills: a needs assessment for graduate student workshops,” issues in science and technology librarianship 53 (winter-spring 2008), https://doi.org/10.5062/f48p5xfc; hannah gascho rempel and jeanne davidson, “providing information literacy instruction to graduate students through literature review workshops,” issues in science and technology librarianship 53 (winter-spring 2008), https://doi.org/10.5062/f44x55rg. 11 jankowska, “identifying university professors’ information needs in the challenging environment of information and communication technologies.” 12 ka po lau et al., “educational usage of mobile devices: differences between postgraduate and undergraduate students,” journal of academic librarianship 43, no. 3 (may 2017), 201–8, https://doi.org/10.1016/j.acalib.2017.03.004. 13 noa aharony, “mobile libraries: librarians’ and students’ perspectives,” college & research libraries 75, no. 2 (2014): 202–17, https://doi.org/10.5860/crl12-415. 14 hannah gashco rempel and laurie m. bridges, “that was then, this is now: replacing the mobile-optimized site with responsive design,” information technology and libraries 32, no. 4 (2013): 8–24, https://doi.org/10.6017/ital.v32i4.4636. 15 paula barnett-ellis and charlcie pettway vann, “the library right there in my hand: determining user needs for mobile services at a medium-sized regional university,” southeastern librarian 62, no. 2 (2014): 10–15. https://doi.org/10.1080/19322909.2016.1167649 http://www.informationr.net/ir/11-4/paper272.html http://www.informationr.net/ir/11-4/paper272.html https://doi.org/10.5062/f48p5xfc https://doi.org/10.5062/f44x55rg https://doi.org/10.1016/j.acalib.2017.03.004 https://doi.org/10.5860/crl12-415 https://doi.org/10.6017/ital.v32i4.4636 mobile website use and advanced researchers | markland, rempel, and bridges doi:10.6017/ital.v36i4.9953 22 16 william t. caniano and amy catalano, “academic libraries and mobile devices: user and reader preferences,” reference librarian 55, no. 4 (2014), 298–317, https://doi.org/10.1080/02763877.2014.929910. 17 haglund and olsson, “the impact on university libraries of changes in information behavior among academic researchers.” 18 kuruppu and gruber, “understanding the information needs of academic scholars in agricultural and biological sciences.” 19 christine wolff, alisa b. rod, and roger c. schonfeld, “ithaka s+r us faculty survey 2015,” ithaka s+r, april 4, 2016, http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey2015/. 20 m. macedo-rouet et al., “how do scientists select articles in the pubmed database? an empirical study of criteria and strategies,” revue européenne de psychologie appliquée/european review of applied psychology 62, no. 2 (2012): 63–72. 21 rempel and bridges, “that was then, this is now.” 22 ellie bushhousen et al., “smartphone use at a university health science center,” medical reference services quarterly 32, no. 1 (2013): 52–72, https://doi.org/10.1080/02763869.2013.749134. 23 jill t. boruff and dale storie, “mobile devices in medicine: a survey of how medical students, residents, and faculty use smartphones and other mobile devices to find information,” journal of the medical library association 102, no. 1 (2014): 22–30, https://doi.org/10.3163/15365050.102.1.006. 
24 bushhousen et al., “smartphone use at a university health science center”; boruff and storie, “mobile devices in medicine.” 25 eden dahlstrom et al., “ecar study of students and information technology, 2015 ," research report, educause center for analysis and research, 2015, https://library.educause.edu/~/media/files/library/2015/8/ers1510ss.pdf?la=en. 26 ibid., 24. 27 lutishoor salisbury, jozef laincz, and jeremy j. smith, “science and technology undergraduate students’ use of the internet, cell phones and social networking sites to access library information,” issues in science and technology librarianship 69 (spring 2012), https://doi.org/10.5062/f4sb43pd. 28 rempel and bridges, “that was then, this is now.” 29 ibid. https://doi.org/10.1080/02763877.2014.929910 http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/ http://www.sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/ https://doi.org/10.1080/02763869.2013.749134 https://doi.org/10.3163/1536-5050.102.1.006 https://doi.org/10.3163/1536-5050.102.1.006 https://library.educause.edu/~/media/files/library/2015/8/ers1510ss.pdf?la=en https://doi.org/10.5062/f4sb43pd information technology and libraries | december 2017 23 30 “mobile/tablet operating system market share,” netmarketshare, march 2017, https://www.netmarketshare.com/operating-system-market-share.aspx?qprid=8&qpcustomd=1. 31 boruff and storie, “mobile devices in medicine”; patrick lo et al., “use of smartphones by art and design students for accessing library services and learning,” library hi tech 34, no. 2 (2016): 224–38, https://doi.org/10.1108/lht-02-2016-0015. 32 boruff and storie, “mobile devices in medicine.” 33 dahlstrom et al., “ecar study of students and information technology, 2015.” 34 caroline myrberg and ninna wiberg, “screen vs. paper: what is the difference for reading and learning?” insights 28, no. 2 (2015): 49–54, https://doi.org/10.1629/uksg.236. 35 barnett-ellis and vann, “the library right there in my hand”; haglund and olsson, “the impact on university libraries of changes in information behavior among academic researchers”; hoffman et al., “library research skills”; kuruppu and gruber, “understanding the information needs of academic scholars in agricultural and biological sciences”; lau et al., “educational usage of mobile devices”; macedo-rouet et al., “how do scientists select articles in the pubmed database?” 36 dahlstrom et al., “ecar study of students and information technology, 2015.” https://www.netmarketshare.com/operating-system-market-share.aspx?qprid=8&qpcustomd=1 https://doi.org/10.1108/lht-02-2016-0015 https://doi.org/10.1629/uksg.236 abstract introduction literature review methods results and discussion participant demographics and devices used frequency of library site use on mobile devices where researchers are when using mobile devices for library tasks library resources accessed via mobile devices university website use behaviors on mobile devices other research behaviors on mobile devices limitations conclusion references letter from the editor kenneth j. varnum information technology and libraries | december 2018 1 https://doi.org/10.6017/ital.v37i4.10852 as 2018 draws to a close, so does our celebration of information technology and libraries’ 50th anniversary. in the final “ital at 50” column, editorial board member steven bowers takes a look at the 1990s. 
much as for steven, for me this decade was where my career direction and interests crystallized around the then-newfangled "world wide web." taking a look at the topics covered in ital over those ten years, it's clear that plus ça change, plus c'est la même chose: the more things change, the more they stay the same. we were exploring then questions of how the burgeoning internet would allow libraries to provide new services and be more efficient and helpful in improving existing ones. user experience, distributed data and the challenges that causes, who has access to technology and who does not: all topics as vibrant and concerning then as they are now. with the end of our look back at the last 50 years, we are taking the opportunity to start something new in 2019. there will be a new quarterly column, "public libraries leading the way," to highlight a technology-based innovation from a public library perspective. topics we are interested in include the following, but proposals on any other technology topic are welcome. • virtual and augmented reality • artificial intelligence • big data • internet of things • 3-d printing and makerspaces • robotics • drones • geographic information systems and mapping • diversity, equity, and inclusion and technology • privacy and cyber-security • library analytics and data-driven services • anything else related to public libraries and innovations in technology. columns will be in the 1,000-1,500 word range and may include illustrations. these will not be research articles, but are meant to share practical experience with technology development or uses within the library. if you are interested in contributing a column, please submit a brief summary of your idea (https://goo.gl/forms/mcz2kdltiwypsnq43). i'm grateful to the ital editorial board, and especially to ida joiner and laurie willis, for their guidance in shaping this concept. regardless of whether you work in a public, or any other, library, i'm always happy to talk with you about how your experience and knowledge could be published as an article in ital. get in touch with me at varnum@umich.edu. kenneth j. varnum, editor, december 2018. technical communications. reports: library projects and activities. light-pen technology at the university of south carolina - the south carolina circulation system. for some years at the university of south carolina studies have been underway to perfect a computer-based circulation system, and every avenue of input has been explored. about three years ago the new light-pen inventory control system which was being developed for use in the retail trade became known to the library. this device, using a light-readable label about one inch square in size, seemed to be a far better input device for an inventory control system utilizing identification cards and books than the traditional punched card. after as much research as could be done in such a new field, the library staff examined light-pen systems marketed by ncr, checkpoint-plessey, and the monarch marking system, a subsidiary of pitney-bowes. all of these systems were similar in technology but the hardware and interest of the companies varied. monarch and plessey were very interested and willing to cooperate with libraries. plessey (through checkpoint) has developed and is marketing a library circulation system.
at carolina the decision was made to develop an in-house batch system using the monarch light-pen and its technology coupled to a digital equipment corporation pdp-11/10, 16k, dual dectape system. the basis for the functioning of the system is the light-pen stations at the circulation desk where the books are charged and discharged by running the light-pen rapidly over the light-readable label on the patron's id card and the label in the front inside cover of the book. one quick pass over the id card sets the system, followed by a pass over each book label in succession. at present the library has three light-pen stations and anticipates adding a fourth, which will be sufficient for meeting all of the main library's needs for the foreseeable future. the light-pen control boxes were constructed on campus. they turn the system on, and have message lights, trouble lights, and charge points for up to five different due dates. any number of light-pen stations can be attached. a hazeltine 2000 crt is used to show all transactions on the screen as they occur and to serve as the system console. a decwriter, used as a line printer, insures a backup system and gives a printout of transactions. the decwriter was selected because its thirty cps speed is adequate, it is highly reliable, and the price is right. the pdp-11 is used as a batch controller. it does not convert the label data to human-readable data; this is done at the central computer center. each night after the circulation desk closes, a telephone link is made to the university's central computer and the day's scanned data are pumped into the big computer. while the system is a batch design, it incorporates the features of an on-line system without the high cost. if a patron inquires about an item, a glance at the updated patron report and/or an inquiry into the current activity file through the system console can answer questions on the location of all books in the system. unlike some systems, this one has not required that the library change its hours of operation or data input. when the library opens in the morning, all reports are distributed. the library gets its in-process, or charge, file, in clear text without borrower information for use by patrons to see which books are charged out. complete circulation files with borrower information are furnished to the circulation staff on com fiche. a total of 10,335 charge records is on each four-by-six-inch fiche. the record includes patron name, status, and social security number, call number of the book, item number, date checked out, date due, author, and title. in addition, there is a field for charging to graduate carrels within the library. notices are periodically written for overdue books and there is an indication in the charge file showing how many notices have been sent. personal reserves or holds can be placed on a book by simply keying the book number into the crt. when a book is returned, a message light on the controller lights so that the staff member will know that the book has a hold on it. similar procedures will put a "hold" on any borrower whom the library needs to reach. the printouts are generally by call number; however, it is possible to get lists of all books by borrower, or in other formats. statistics are obtainable in almost any configuration, including number of books checked out to different categories of borrowers, by individual title, etc.
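to make the nightly batch step concrete, here is a loose modern sketch of the update logic in python rather than the pdp-11 code the library actually ran; the transaction file layout (one row per light-pen scan pair, flagged as a charge or a discharge) and every field name in it are invented for illustration, not taken from the south carolina system.

```python
import csv
from datetime import date

# hypothetical day file produced at the light-pen stations, with columns:
# action (charge/discharge), patron_id, item_label, due_date
def apply_transactions(charge_file: dict, day_file: str, holds: set) -> list:
    """update the in-process (charge) file from one day's scans and
    report any returned items that carry a hold."""
    flagged = []
    with open(day_file, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            item = row["item_label"]
            if row["action"] == "charge":
                charge_file[item] = {
                    "patron_id": row["patron_id"],
                    "date_out": date.today().isoformat(),
                    "date_due": row["due_date"],
                    "notices_sent": 0,
                }
            else:  # discharge: drop the record, flag items another reader is waiting for
                charge_file.pop(item, None)
                if item in holds:
                    flagged.append(item)
    return flagged

# example usage with invented values:
# charges = {}; holds = {"QA76.9 .D3 T66"}
# print(apply_transactions(charges, "scans-1974-01-15.csv", holds))
```

the point of the sketch is only that the core of such a batch run is a small, deterministic update over the day's scans; the heavy lifting in 1974 was the hardware, the label conversion, and the report formatting done at the central computer center.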
maintenance contracts, telephone charges, and miscellaneous operational costs add up to about $357.04 monthly. labels cost $1.70 per thousand. the total cost of the system amortized over a five-year period is no more than $975.00 per month. after that the only continuing costs will be maintenance contracts and labels. additional light-pen stations can be added in the same building for about $1,200 each. light-pen stations can be added in branch libraries for about $2,400 each. not included in the cost figures is central computer time, which is held to a minimum by the batch features and software development. at the university of south carolina computer services are not charged to the individual department but are treated as a campus-wide service. all of the com fiche is produced on the campus. the entire project was developed and put into operation between april 1973 and january 1974. the first books were officially charged out on january 15, 1974, and the system has been in continuous operation since then. needless to say, problems of a special nature had to be solved, for example, issuing id cards with light-readable labels to 20,000 students in two days, acquiring labels which are permanent for the books, constructing the control boxes, which are unique, and solving the usual telephone and computer difficulties. suffice it to say, the entire project was planned and became operational in less than eight months without employing additional staff. although the system is still being refined, the performance has been spectacular. - kenneth f. toombs, director of libraries, university of south carolina.

public access cable tv information center at new york public library
the new york public library (nypl), recognizing public access cable television as an important social tool, has assembled in the mid-manhattan library one of the most extensive collections, and possibly the first, on the subject. noncommercial public access television has been characterized as a people-oriented television system that can respond to and reflect society in terms of culture, language, history, experience, and race. the collection is designed for readers seeking information on all aspects of public access cable tv, both practical and theoretical, with a significant portion devoted to television as a community tool. the mid-manhattan collection includes books, pamphlets, periodicals, and microforms. related materials are also collected to document television activities of programming intentions similar to public access tv. it is hoped that the mid-manhattan project will provide a prototype for other libraries beginning collections of public access cable tv. the collection's book materials emphasize three main areas of interest: programming and the audience, the educational potential of public access television, and legislative controls. pamphlet materials include information on ethnic involvement, women's groups, conferences and conventions, library activities in video, bibliographies, and other current topics, and are accessible through a vertical file index. the subject headings in the file reflect the "rule of probable association," whereby the first meaningful word (after the assumed word television) is used. if the reader knows what he wants, aided by the cross-reference system, he can easily identify the proper subject file.
the collection also includes periodical indexes leading to a wide variety of journal articles on different aspects of video. the eric (educational resources information center) microfiche series is available from 1971 to date and includes much published and unpublished research. a special feature of the collection is the card file which lists hard-to-find information concerning organizations, associations, and periodicals in the field. such commercial groups as video cassette manufacturers as well as alternative groups making tapes can all be located in the file. contact: richard hecht, history and social science department, mid-manhattan library, 8 e. 40th st., new york, ny 10018. reports on regional projects and activities sal/net-satellite libmry information network this project is designed to experiment in the extension of library services to sparsely populated regions of the rocky mountain and northern plains states. the project has been awarded "designated user" status on the communications technology satellite to be launched by nasa in late 1975. salinet is one of the first attempts to experiment in delivering library services via satellite. a group meeting in denver described plans to use the world's most powerful communications satellite as an extension of local library resources for residents of twelve mountain and plains states. the national space agency, the multistate federation of rocky mountain states, and several library oriented groups and agencies serving the area will pool their expertise and resources in the program, vvhich will begin planning late this year. the library information and development program is a new passenger on the educational satellite which will demonstrate new means of helping to teach residents of far-flung portions of the rocky mountain states and assist them in their information needs during a hvo-year period beginning next fall. four interests are represented in the library oriented project, which bears the acronym of salinet -satellite library information network. the university of denver graduate school of librarianship, the university of kansas libraries, the wyoming state library, and natrona county (wyoming) library are the principals in the consortium. each institution is responsible for certain portions of the library program, which will benefit both libraries and their patrons in the mountain and plains states. dr. margaret knox goggin, dean of the du graduate school of librarianship, is principal investigator on the library program. her co-workers representing other members of the consortium include kenneth e. dowlin, director of natrona county library, casper, wyoming; william williams, wyoming state librarian; and robert malinowsky, assistant director for public service, education, and statistics at the university of kansas libraries. also taking part in the salinet program are the bibliographical center for research, rocky mountain region, inc.; the federation of rocky mountain states; and the mountain-plains library association. these groups will assist with programming, broadcast, and engineering requirements, utilization, and research. the proposed program will utilize fiftysix satellite ground stations which will be in place as part of the federation of rocky mountain states' satellite technology demonstrations. twenty participating libraries in the states of north dakota, south dakota, nebraska, and kansas will 230 1 ournal of library automation vol. 
7 i 3 september 197 4 the student as "physician" has selected in managing the case. the learning center offers this and other kinds of audiovisual programs designed to enhance textbook and classroom learning. computers, video cassettes, slide projectors, and models enable the student to experience close up and at his or her own speed areas of medicine that often cannot he presented as well in lectures or textbooks. in addition to the audiovisual materials, the learning center provides students with periodicals, lecture notes, and reference texts. students can watch dissections, see examples of blood cell abnormalities, hear the sounds of healthy and defective heartbeats, and examine oversize plastic models of the brain, the heart, and other parts of the human anatomy. and, with the computer learning programs, the students can participate in a case and make choices to guide its outcome. medical students learning about iron metabolism last year were split into two groups by their instructor so that half attended traditional classroom lectures and half learned the unit from the computer. final examinations showed no difference between the two groups, according to their teacher, dr. james mcarthur, formerly associate professor of medicine. the students preferred human teachers in small group tutorial sections for this unit, mcarthur said, but generally they were in favor of computer instruction. mcarthur, now assistant director of the health sciences learning resources center at the university of washington in seattle, said that the crowd of students usually found at the learning center is an indication of its success. announcements resolution whereas, the american library association is the chief advocate for librarians and laymen seeking to provide citizens of the united states with the highest quality library and information service, and whereas, a major effort will be required of this association and of all supporters of libraries in the next few years as the country's leaders determine longrange national positions in such matters as copyright, intellectual freedom, federal support of libraries, and a national plan for libraries and information services, and whereas, the effectiveness of this effort will depend on the concerted effort of all those concerned with library service, including library users, citizens groups, government officials and librarians themselves from all aspects and ranks of the profession; therefore let it be resolved that all the committees, chapters and divisions of the american library association take definite steps to increase mutual efforts within the association and with other associations seeking ways to strengthen the common effort toward the provision of quality library service to all people. and let it be further resolved that chapter councilors, division officers, the legislation assembly and chairpersons of committees and round tables, affiliated organizations and related groups transmit this resolution to members of their respective units. adopted by the ala legislation committee on july 9, 1974. isad p1·esident receives awar.d frederick g. kilgour, director of the ohio college library center, received the margaret mann citation for 1974, on july 9 at the program meeting of the resources and technical services division of the american library association, during the annual conference of ala in new york city, the week of july 7-13, 197 4. the award recognizes outstanding professional achievement in the areas of cataloging or classification. mr. 
kilgour received his a.b. from harvard and studied library service at columbia while working at the harvard college library. he worked at the office of strategic services in washington, d.c. and later became deputy director of the 232 journal of library automation vol. 7 i 3 september 197 4 loo, ontario) and edwin buchinski universite d'ottawa): canadian marc, canadian cataloging task group, union lists of serials, prospects for cooperation, unique cataloging problems (e.g., dual language requirements) , large serial data bases. registration will be $70.00 to members of either ala or asis, $85.00 to nonmembers, and $20.00 to library school students. registration includes one lunch, a reception, and a copy of the marc serials manual. for hotel reservation information and a registration blank, write to donald p. hammer, isad, american library association, 50 e. huron st., chicago, il 60611. washington university school of medicine library-book catalog the washington university school of medicine library, st. louis, announces the publication of its catalog of books 1970-1973, containing all entries for monographs cataloged at this library from january 1, 1970 to december 31, 1973. the first part, the register, consists of the complete citations arranged in the order cataloged. the second part consists of three alphabetical indexes to the register -name, title and series, and subject indexes. the catalog is on thirty microfiche, 24x reduction, and the price is $15.00. orders can be filled or additional information obtained from doris bole£, assistant librarian for technical and informational services. proceedings of info1·matics/ ucla symposium available on tapes high fidelity recordings of proceedings from the annual data processing symposium held march 27-29, 1974, at the university of california, los angeles, are now available on cassette tapes. the subject of this year's conference, cosponsored by informatics inc., los angeles, and ucla, was "information systems and networks: the new world of information retrieval available to your organization through computer networks." the complete program, recorded by convention seminar cassettes, north hoilywood, can be ordered by session or in total for review whenever convenient. cassettes one and two cover session one, the evolution of interactive information systems; cassettes three and four include session two, data bases; cassettes five and six, session three, on-line information retrieval systems; cassettes seven and eight, session four, cost effectiveness of information retrieval systems and networks; and cassettes nine and ten, session five, information networks in the 1980s. each set of two cassettes covering one session is priced at $10.95. the entire series can be purchased for $49.95 in an easy-to-store cassette album file. prices include postage and handling. to order, contact convention seminar cassettes, 13356 sherman way, north hollywood, ca 91605; tel: (213) 765-2777. nonprint media institute a nonprint media institute will be held in galveston, texas on october 15, 1974, southwestern library association's annual conference registration day. the one-day institute, sponsored by swla, will feature morning speakers, including pearce grove discussing progress in resolving differences among three cataloging standards for nonprint media, and vivian schrader, head of the a v section of library of congress, reporting on the progress of lc's nonprint cataloging standards. 
afternoon informal discussion forums will focus on technical service handling of art prints, microforms, films, kits, phonorecords, and audiotape. the nonprint media institute is open to members and nonmembers of swla, but is limited to 150 registrants. registration fee is $20.00. for registration, hotel reservations, and transportation information, write: ann adams, head cataloger, houston public library, 500 mckinney, houston, tx 77002. international standm·ds for cataloging: an institute on isbd, issn, nsdp and chapter 6, aacr the seventh annual institute of the li234 journal of library automation vol. 7/3 september 1974 zurkowski, president, information industry association, 4720 montgomery lane, bethesda, md 20014; tel: (301) 6544150. commercial services and developments 10,000 computer program abstracts in ncpas data base the national computer program abstract service ( n cp as) , a clearinghouse for computer program abstracts, has categorized over 10,000 abstracts into 142 subject areas in its latest newsletter. these abstracts of simulation models, application and computational programs, and information retrieval systems are derived from business, government, industry, military, and universities. all fields of knowledge are included and are grouped into the following general categories: biosciences, medical sciences, business, manufacturing, management, education, libraries, environment, ecology, nature, government (federal, state, local), urban affairs, legal, humanities, specific industries, publfc utilities, military, science, and engineering. this service should be of value to a present or potential user of computer programs, a vendor with a program to sell, or a professor developing programs in the academic community. programs can be listed in the data base free of charge. the service is problem oriented. the program abstract information is disseminated in two forms: ( 1) a program index newsletter which includes a detailed index of the available subjects and the number of abstracts available for each subject (updated quarterly) -the newsletter cost is $10.00 per year ( $5.00 additional for foreign airmail) ; and ( 2) a sub;ect abstract report which includes all the abstracts available in the ncp as data base on a particular subject identi'fied in the progmm index newsletter-the abstract report cost is $10.00 for the rrst 200 abstracts and $5.00 for up to each additional 200 abstracts ( $5.00 additional for foreign airmail). for additional information contact: ncpas, p.o. box 3783, washington, dc 20007. communications by telephone a lfghtweight communications system about the size of a suitcase is now being introduced that can take the travel-and the cost-out of meetings. the new solid state system is the darome edu-com, a portable self-contained communications unit with four microphones, that uses regular telephone lines. edu-com enables groups of people in different places across the country to confer together as easily as if they were all in the same room. the cost of a onehour meeting with participants located coast-to-coast is a few hundred dollars. manufactured by darome, inc., of harvard, illinois, makers of modular sound systems equipment, the edu-com unit plugs into an inexpensive, standard telephone coupler, a device supplied by the telephone company. the number of locations that can be included in a darome edu-com conference is practically unlimited. to participate, each location need only be equipped with a darome unit and a telephone coupler. 
then, rather than having just one meeting at a time, it is possible to hold any number of meetings in any number of places at the same time. before an edu-com session begins, the organizer of the meeting telephones a special conference call telephone operator and gives the names, locations, and telephone numbers of the groups to be reached and the time of the meeting. charges begin only when all locations have been tied together by the operator and the conference is ready to start. the rate for the darome edu-com meetings is much lower than for direct dialing the places individually. the rate is equal to the cost of calling only the farthest city participating in the conference. for example, a one-hour meeting that originates in chicago and includes groups of people who participate in new york, newark, huntington, greensboro, atlanta, orlando, detroit, denver, and san diego would cost $280. 236 journal of library automation vol. 7/3 september 1974 the wall street journal and banon's magazine. anthony a. barnett, senior vice-president of bunker ramo, said test installations in five stockbrokerage firms over the past month "have allowed us to shake down the system and prepare for nationwide marketing." mr. barnett said fifty of the news retrieval systems were sold before formal introduction. "we're encouraged by the marketing prospects among stockbrokerage firms and financial institutions but believe the market among corporations may have even greater potential," he added. the news retrieval system permits instantaneous recall of stories on 6,000 companies listed on the new york and american stock exchanges and traded over-thecounter. users also are able to retrieve news of twenty-five industry groups, fifteen government agencies, and several general categories. mr. barnett said that at the outset customers will be able to recall from the ffie any story that has appeared in the last three months. dj news-recall was developed as a joint venture by bunker ramo and dow jones, which publishes the wall stmet joumal, barron's, and the dow jones news service. the joint venture, dow jones-bunker ramo news retrieval service inc., in turn will market the data base to distributors for resale. bunker ramo's information systems division is the charter distributor for dj news-recall. mr. barnett said the basic charge for dj news-recall to users of bunker ramo's system 7 will be $175 a month per office ·plus $25 for each video terminal having access to the news retrieval service. on-line access to the compendex data base engineering index, inc., announces the availability of its computer-readable data base, compendex, through on-line access. two organizations are currently providing this di'rect mode of bibliographic search: lockheed information systems and system development corporation. using the latest in data communications services, users requiring access to the compendex files may interact with the system via their own in-house terminal, thus providing the convenience and speed of "on-demand" searches. compendex is the machine-readable version of ei's monthly and provides abstracts/bibliographic citations covering worldwide developments in all fields of engineering. both s.d.c. and lockheed, utilizing the most modern system technology, afford the user the opportunity to maintain an actual "dialog" with major bases. this is done without imposing an overly complicated or difficult command language on those addressing the system. 
on-line access now adds a new dimension to those requiring searches of the ei data base. for further information interested individuals and organizations may contact: lockheed information service, 3251 hanover st., palo alto, ca 94304; tel: ( 415) 493-44ll (east coast office: lockheed-405 lexington ave., new york, ny 10017; tel: (212) 697-7171), or s.d.c. search service, system development corporation, 2500 colorado ave., santa monica, ca 90406; tel: (213) 393-94ll (east coast office: s.d.c.5827 columbia pike, falls church, va 22041; tel: ( 703) 820-2220) . standards the isad committee on technical standards for library automation invites your participation in the standards game editor's note: use of the following guidelines and forms is described in the mticle by john kountz in the june 1974 issue of jola. the tesla reactor ballot will also appear in subsequent issues of technical communications for mader use, and the tesla standards sc01'eboard will be presented as cumulated results warrant its publication. to use, photocopy or otherwise duplicate the forms presented in jolatc, fill out these copies, and mail them be added to complete a twelve-state test bed representing all categories of libraries. with the involvement of all these points, half of which will be in two-way communication with other points via the satellite, the library information project hopes to accomplish three primary goals: 1. improving individual and organization capacities for getting information. 2. demonstrating and testing cost effectiveness in using technological advances to disseminate information. 3. developing user "markets" for information utilizing satellite distribution. the program will try to help individual users of information and community-level groups such as governmental agencies, businesses, and other organizations. on a regional level, bibliographic information will be transmitted to libraries in a "compressed data format." with such a fonnat, a library in a remote area of north dakota may have access to most needed information about resources available from large and specialized centers, such as the denver public library's special conservation library or western history collection. the proposed satellite information program will also be used to train librarians, both at a professional and paraprofessional level. the in-service program will be aimed at helping librarians to better assist their patrons in getting information. all these major aspects-public information programming at the individual level, technology dissemination at the community level, compressed bibliographical data transmission, and in-service training-will be accomplished in a total of fifty hours per year of programming, reports dr. william e. rapp, vice-president of the federation of rocky mountain states. the limited time available for this programming in coordination with other programs planned for the satellite project place a premium on solid advance preparation of material to be transmitted, and speed of transmission, he notes. for example, the transmission of the technical communications 229 compressed bibliographical data would be in twoto three-minute segments at the end of other programming. technology dissemination, a community-level program, would be handled in a total of fifteen hours of satellfte use a year-an average of fifteen to twenty minutes per week. the largest segment of time, for inservice training of librarians, is twenty hours per year-which breaks down to less than half an hour a week on the average. 
but if the available time on the satellite is used to its full potential, dean goggin belfeves the population of the entire rocky mountain and plains region will benefit tremendously. the combined resources of major libraries and two major universities could be shared instantly with communities and residents of the region. new horizons cu1'e by compute1'-a learning expe1'ience it has been a long day for the physician, and at 10:30 p.m. he is getting ready to go horne and have his first meal since breakfast. but the phone rings and the caller from university of minnesota hospital tells him that an infant has just been brought in from outstate minnesota for diagnosis. the baby's hometown physician has noted that the baby hasn't eaten well for several days and he can't decide what's wrong. the case apparently isn't urgent, so the physician can either go for a bite to eat and return later or go straight to the hospital to look at the baby. this is the first of a series of choices presented to medical students in this imaginary case history. it's offered as a computerized learning program for medical and health sciences students at the university of minnesota's learning center. for a student playing the part of the physician in this case, each choice he or she makes presents new difficulties in the case--which call for more choices. ultimately, the imaginary infant either dies or survives, depending on what options i i ' i i i. office of intelligence collection at the department of state fn washington. he was librarian at the yale medical library and then associate librarian for research and development of the yale university library. he has been active in library and library-related organizations since the beginning of. his career and has served on many committees. he was managing editor of the yale journal of biology and medicine; and has written numerous articles for professional journals. his professional interests are computerization of libraries and information retrieval. the text of the citation reads in part: " ... awarded in 1974 to frederick g. kilgour for his success in organizing and putting into operation the first practical centralized computer bibli:ographical center. he has been the principal inr.uence behind an emerging trend toward cooperation in technical services. . . . as director of the ohio college library center he has made the library of congress marc data base a practical and useful product, stimulating i'nterest throughout the country and the profession .... his tireless efforts represent an outstanding contribution to the technical improvements of cataloging and classification and the introduction of new techniques of recognized importance." institute on automated serials contml the information science and automation division (isad) of the american library association and the american society for information science will cosponsor a preconference institute on "automated serials control: national and international considerations." the institute will be held on october 11 and 12, 1974 in atlanta, georgia immediately before the asis annual conference, which begins on october 13. the institute and the conference will both be held in the atlanta regency hyatt house. 
the purposes of the institute will be: ( 1) to present in-depth discussions on the new and dramatic developments in fhe serials field and their implications for the library and the library systems development communities; .and ( 2) to provide a technical communications 231 survey of the progress made to date in automated serials systems. formal presentations by acknowledged experts actively involved in the field will be provided, and ample opportunity will be available for informal discussion between the participants and a panel of concerned professionals. the panel will represent various views related to current and future developments as well as the national and international consequences. among other things, the program will include the following: elaine wood (lc marc development office) : marc serials format and serials processing at lc-a tutorial. in addition, this session will include discussion on national and international standards, and will emphasize the difference between marc serial and marc monograph formats. joseph howard (lc serials record division) : cataloging considerations. the proposed changes to the cataloging rules -isbd, aacr, various points concerning entry and other unique cataloging problems. henriette a vram and lucia rather (lc marc development office): international considerations. prospects for international exchange of data, mechanisms for exchange, problems posed by differing practices and conventions, international developments in machine-readable cataloging. paul fasana (new york public library) : impact of national developments and of automation on library services. general consideration of automation's impact on library services with emphasis on serials control. recommendations concerning national developments. linda crismond (university of southern california): review of serials systems and system considerations. acquisitions, cataloging, check-in, claiming, etc. problems posed by holdings notations, volatility of data, linking entries, etc. lois upham (university of minnesota): conser (consolidated serials project). joseph price (nsdp): international serials data system and nsdp. cynthia pugsley (university of waterbrary institutes planning committee will be held october 18-19, 1974, at rickey's hyatt house hotel, palo alto, california. paul w. winkler, principal descriptive cataloger, library of congress, will speak on the application of the international standard bibliographic description to monographs and on related topics. the establi:shment of bibliographic control of serials through international standard serial numbers, chapter 6 of the angloame1'ican cataloging rules, and the national serials data program will be presented by richard anable, coordinator, conser project. the program is designed to be of particular interest to technical services librarians, serials librarians, bibliographers, and administrators. registration for the two-day meeting is limited; the fee is $20.00 and includes two luncheons. further information, including a list of hotel accommodations, will be mailed to applicants. registrants of the 1972 and 1973 institutes will automatically receive registration forms. others may obtain forms by writing joseph e. ryus, 2858 oxford ave., richmond, ca 94806, or by telephoning him during weekday hours at the university of california, berkeley, ( 415) 642-4144. all registration forms will be mailed early in september. 
the library institutes planning committee is a nonprofit organization composed of eight librarians from county, special, and university libraries in northern california. previous institutes have featured ralph ellsworth, j. mcree elrod, seymour lubetzky, ellsworth mason, daniel melcher, john c. rather, joseph a. rosenthal, and paul w. winkler. info1'mation industry associations expanded micropublishing and data base programs major policy steps have been taken by the information industry association ( iia), making the work of the associatirm more understandable and more relevant to information industry companies. it changed the title of the government micropublishing committee to micropublishing and it directed the establishment technical communications 233 of a data base committee. "regardless of the media information companies and other publishers currently use in delivering information," iia presi'dent, paul g. zurkowski, said, "competition and rising costs are forcing them into consideration of alternative methods. iia member companies will be able to focus their energies most effectively on the industry-wide problems through these new committees." micropublishing committee chairman, henry powell, bell & howell, bethesda, at a recent meeting of the committee spelled out several areas of concern to micropublishers which will be the subject of committee action: 1. how can micropublishers protect their investment from unfair competition of unscrupulous competitors who misappropriate the micropublishers' work product and market essentially "reprinted" versions of the original microfilm. 2. library relations. joint library-industry steps toward mutual understanding and cooperation. 3. z-39 standards committee recommended standards covering what micropublishers can say about their products. what is a volume equivalent in microform? what information should be included on each mfcrofiche and where on the header or title section of the microfilm product? 4. a program to educate users as to the operational benefits of micropublished materials. the data base committee is being formed with the participation of both data base creating companies and those offering public access to various data bases. the area of interest to this committee will embrace the status of data bases under existing proprietary rights laws, communications capabilities and rates as controlled by the fcc, unfair competition legislation pending in congress, and such other problems as those created for the industry by university computer centers marketing access to similar data bases, but without full cost recovery. for further information contact paul g. simply by pressing the lever on one of the edu-com microphones and speaking into it, anyone in any of those cities will be heard in the meeting i'n every other city as if he were right there in the room. in addition, an automatic slide projector can also be plugged into the unit. during a presentation, the speaker can change the slides simultaneously in all the locations equipped with slides. a cassette player-recorder can be plugged into the darome edu-com unit, either to provide the program or record the session. vvhen used alone, the darome educom can also serve as a public address system for a single meeting. for further information, contact darome, inc., 711 e. diggins st., harvard, il 60033. automated news clipping, indexing and retrieval system (ancirs) image systems, inc. 
of culver city, california, has developed an automated system for the indexing and retrieval of news clippings. while ancirs (pronounced answers) is geared for use in the newspaper library, terminals located at remote sites provide access to the system for business, industry, education, and law enforcement and other government agencies. the microfiche terminals, whfch are controlled by a minicomputer, are each capable of storing 325,000 clippings and 1 million lines of index and search terms. ancirs has a capacity in excess of 1.25 million listings. access to a page of index hstings or to the full text of any clipping requires less than four seconds. paper copy of any or all of the selected clippings can be produced at the terminal at the touch of a button. multiple terminals can share the same minicomputer. a unique off-line/ on-line indexing system generates subject term lists from story headlines and other key words, names, and places in the story as selected by the indexer. when indexing a story, the indexer keys in the first letters of the subject terms to be assigned. this causes the terms currently in use to be displayed at the terminal, allowing the indexer to automatically assign the appropriate terms to the story being indexed. if the term is technical communications 235 not already in use it may be entered by completing the typing of the term. after new terms have been entered and old terms assigned, a magnetic tape is produced for the off-line program. the off-line program prepares three lists for computer output to microfiche: 1. headlines permuted by the key words in each headline interleaved with subject terms selected from the stories. 2. a category list by classification and subclass. 3. each story headline in date order. to perform a search, the user keys i'n the first few letters of the search term. this causes the appropriate portion of list one to be displayed. each item on the page has a line number. keying the line number ( s) selects the desired term ( s) and causes the most recent clipping to be displayed. if the selected term is too general, i.e., a category heading, the appropriate portion of hst two is automatically displayed so that a more precise selection may be made. the selection in this instance is also made by entering the line number(s) of the desired term(s). these selections may be combined logically with other selections to further narrow the search. once the terms have been selected and the most recent story i's displayed it is then possible to page back through previous related stories. the hard copy of any story can be requested at any time and is produced by the unit in ten seconds. ancirs' low cost makes it the ideal tool for researchers and decision makers who must have at their fingertips complete facts on world, national, and local events. machine-readable data bases news retrieval service bunker ramo corporation and dow jones & co. inc. announced the start of dj news-recall, a computerized news retrieval service based on stories appearing on the dow jones news service and in to the tesla chairman, mr. john c. kountz, associate for librmy automation, of/ice of the chancellor, the california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036. the procedure this procedure is geared to handle both reactive (originating from the outside) and initiative (originating from within ala) standards proposals to provide recommendations to ala's representatives on existing, recognized standards organizations. 
to enter the procedure for an initiative standards proposal you must complete an "initiative standards proposal" using the outline which follows: initiative standard proposal outlinethe following outline and forms are designed to facilitate review by both the isad committee on technical standards for library automation (tesla) and the membership of initiative standards requirements and to expedite the handling of the initiative standard proposal through the procedure. since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable). note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced on 8%" x 11" white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number. i. title of initiative standard proposal (title). technical communications 237 ii. initiator information (forward). a. name b. title c. organization d. address e. city, state, zip f. telephone: area code, number, extension iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process. iv. purpose. state the purpose of standard proposal (scope and qualifications) . v. description. briefly describe the standard proposal (specification of the standard). vi. relationship of other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks). vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification). viii. specifications. specify the standard proposal using record layouts, mechanical drawings, and such related documentation aids as required in addition to text exposition where applicable (specification of the standard). kindly note that the outline is designed to enable standards proposals to be written following a generalized format which will facilitate their review. in addition, the outline permits the presentation of background and descriptive information which, while important during any evalu238 journal of libmry automation vol. 7 i 3 september 197 4 ation, is a prerequisite to the development of a standard. the reactor ballot is to be used by members to voice their recommendations relative to initiative standards proposals. the reactor ballot permits both "for" and "against" votes to be explained, permitting the capture of additional information · which is necessary to document and communicate formal standards proposals to standards organizations outside of the american library association. 
tesla reactor ballot
reactor information: name, title, organization, address, city, state, zip, telephone.
identification number for standard requirement: ___
for / against
reason for position: (use additional pages if required)
as you, the members, use the outline to present your standards proposals, tesla will publish them in jola-tc and solicit membership reaction via the reactor ballot. throughout the process tesla will ensure that standards proposals are drawn to the attention of the applicable american library association division or committee. thus, internal review usually will proceed concurrently with membership review. from the review and the reactor ballot tesla will prepare a "majority recommendation" and a "minority report" on each standards proposal. the majority recommendation and minority report so developed will then be transmitted to the originator, and to the official american library association representative on the appropriate standards organization, where it should prove a source of guidance as official votes are cast. in addition, the status of each standards proposal will be reported by tesla in jola-tc via the standards scoreboard. the committee (tesla) itself will be nonpartisan with regard to the proposals handled by it. however, the committee does reserve the right to reject proposals which after review are not found to relate to library automation.

an invitation from tesla
during the formative period of tesla the list of potential standards areas for library automation, below, was developed. you are invited to review the list below and voice your opinion of any or all areas indicated by means of the reactor ballot. or, if you have a requirement for a standard not included in this list, use the initiative standard proposal outline to collect and present your thoughts.
potential technical standards areas:
1. codes for library and library network, including network hierarchy structures.
2. documentation for systems design, development, implementation, operation, and postimplementation review.
3. minimum display requirements for library crts, keyboards for terminals, and machine-readable character or code set to be used as label printed in book.
4. patron or user badge physical dimension(s) and minimum data elements.
5. book catalog layout (physical and minimum data elements): a. off-line print, b. photocomposed, c. microform.
6. communication formats for inventory control (absorptive of interlibrary loan and local circulation).
7. data element dictionary content, format, and minimum vocabulary, and inventory identification minimum content.
8. inventory labels or identifiers (punched cards, labels, badges, or ...) physical dimensions and minimum data elements.
9. model/minimum specifications relating to hardware, software, and services procurement for library applications.
10. communications formats for library material procurement (absorptive of order, bid, invoice, and related follow-up).

input to the editor: i have reviewed mr. joe rosenthal's incisive survey of the marc types which appear to be eminent. unfortunately, in his studies he seems to have overlooked the one marc type which will pose the greatest problem and relates to "unmarced" books: nonmarc, the grand universe of records which yet remain to be placed in this most noble of formats, and its international counterpart, originating in holland, nedermarc, which stems from "i nedermarc, you nedermarc, all them systems nedermarc ...." hopefully, someone will solve the problem explicit in these forms of marc. as a result, until someone solves this problem, we are all without marc. john kountz, associate for library automation, california state university and colleges

managing in-library use data: putting a web geographic information systems platform through its paces
bruce godfrey and rick stoddart
information technology and libraries | june 2018
bruce godfrey (bgodfrey@uidaho.edu) is gis librarian and rick stoddart (rstoddart@uidaho.edu) is education librarian at the university of idaho library.

abstract
web geographic information system (gis) platforms have matured to a point where they offer attractive capabilities for collecting, analyzing, sharing, and visualizing in-library use data for space-assessment initiatives. as these platforms continue to evolve, it is reasonable to conclude that enhancements to these platforms will not only offer librarians more opportunities to collect in-library use data to inform the use of physical space in their buildings, but also that they will potentially provide opportunities to more easily share database schemas for defining learning spaces and observations associated with those spaces. this article proposes using web gis, as opposed to traditional desktop gis, as an approach for collecting, managing, documenting, analyzing, visualizing, and sharing in-library use data and goes on to highlight the process for utilizing the esri arcgis online platform for a pilot project by an academic library for this purpose.

introduction
a geographic information system (gis) is a computer program for working with geographic data. a gis is an ideal tool for capturing data about library learning spaces because they can be described by a geographic area. the learning spaces might be small or large, irregularly shaped or symmetrical; either way, the shape can be described by a set of geographic coordinates. tools for storing, managing, documenting, analyzing, and visualizing geographic data can all be found in a gis. the locations and shapes of geographic features (such as library learning spaces) as well as attributes of those features (such as the type of learning space) can be captured in a gis. the roots of giss stretch back to the 1960s. goodchild characterizes giss' advances in spatial analysis during the 1970s and the growth of gis in the 1980s, coinciding with the proliferation and affordability of desktop computers.1 the enhancement of gis software from desktop computer applications to online platforms has been underway for some time. the origins of web gis can be traced back to the 1990s, but it is only since the mid-2000s that products have really matured to a point where they can be viable alternatives to their desktop counterparts. web gis first appeared in 1993, when xerox corporation's palo alto research center created an online map viewer.2 their map viewer, running in a web browser, was the first demonstration of performing gis tasks without gis software installed on a local computer. even though this early web-based gis application had limited capabilities, the potential of performing gis operations from computers anywhere and anytime was recognized. the possible capabilities of web gis began to be more fully discussed in the mid-1990s.3 web gis software became available in earnest in 1996 as gis companies began releasing commercial offerings.4 the first two decades of this century have seen web gis explode in functionality and scope to become an integral part of most giss.
in late 2012, a collaborative mapping platform hosted by esri (environmental systems research institute) named arcgis online (https://www.arcgis.com/) was released. esri is a gis software company that was founded in 1969, and its products are used by more than 7,000 colleges and universities across the globe.5 the collaborative platform enables users to create, manage, analyze, store, and share maps, applications, and data on the internet. gis software continues to evolve from desktop computer programs to specialized software applications (i.e., apps) that are part of a web-focused platform. this transformation is profoundly growing the accessibility of the technology to a broader array of users. what was once a technology reserved for geographic information professionals because of its complexity and cost has now been streamlined and put in the hands of nonprofessionals who want to take advantage of its many possibilities. it is no longer reserved for academic disciplines such as geographic information science and remote sensing science; instead, gis has seen its use grow in the humanities and social sciences to the point where libraries are developing targeted services for these disciplines.6 professionals are afforded the ability to share their data more easily, and nonprofessionals are able to utilize those data to create information and knowledge more easily. this transformation bodes well for libraries because it lowers technological hurdles that might have precluded the technology's use for space-assessment and other place-based initiatives in the past. now that software-as-a-service (saas) mapping platforms such as mango, gis cloud, and arcgis online enable users to access capabilities over the internet, there is no server software for users to install or licensing to configure. additionally, the training required by personnel to gather, utilize, and manage data has been greatly reduced compared to that required by desktop predecessors. academic libraries, and libraries in general, stand to gain from this evolution.

the use of desktop gis for space assessment
the value of space-planning efforts in libraries and the observational methods employed to conduct such activities have been well articulated in library research. the use of desktop gis as a tool for collecting in-library use data in academic libraries has been present for more than a decade. bishop and mandel show that libraries' use of gis falls into two broad categories, analyzing service area populations and facilities management, the latter of which encompasses "in-library use and occupancy of library study space."7 work related to the use of gis to study library-patron spaces is discussed below. in the past twenty years, academic libraries have seen many transformations in their roles on college and university campuses. gis technologies have helped document and respond to those transformations. xia outlined the value of using gis as a tool for space management in academic libraries more than a decade ago because of its "capacity for analyzing spatial data and interactive information."8 in one study, xia describes using esri arcview 3.x desktop software for library space management. arcview was esri's first gis software to have a graphical user interface; predecessors had command-line interfaces.
xia mentions the use of, at that time, the emerging arcgis product, which went on to replace arcview 3.x. gis proved to be a valuable tool for xia to track the spatial distribution of books in the library environment.9 xia went on to measure and visualize the occupancy of study space using arcview.10 lastly, xia used arcview as an item-locating system within the physical space of the library.11 more recently, mandel utilized mapwindow, an open-source desktop gis originally developed at idaho state university, for creating maps of fictional in-library use data.12 mandel's process demonstrated how a gis could be utilized to visualize the use of library spaces for marketing materials and services as well as graphically depicting a library's value. coyle argued for the use of gis as a tool to analyze the interior space of the library, and specifically the library collection itself, while not implementing a system with any specific gis package.13 given and archibald detailed their use of visual traffic sweeps as an approach to collect and visualize in-library use data.14 their workflow involved utilizing a microsoft excel spreadsheet to capture data and then importing the data into arcgis to query and visualize them. therefore, gis wasn't used for data capture; it was used toward the end of the process to visualize these data. while this body of work details the use of desktop gis for working with in-library use data, collaborative web gis platforms now offer opportunities to advance existing research in this arena by streamlining data-collection workflows, sharing database schemas, and enabling broader collaboration with peers, thereby potentially creating opportunities for new research. fusing the capabilities of these new platforms with traditional observational methods of gathering data on how people are using library spaces extends the body of knowledge and offers interesting new opportunities for research, such as cross-institutional comparisons. it is critical for twenty-first-century academic libraries to collect such data to continue to evolve with the changing needs of digital-age campus research and culture.

utilizing a cloud-based platform for learning space assessment
discussed below is the approach employed for this pilot project to use web gis to collect, manage, share, and visualize information about library learning spaces. this pilot project utilized the esri arcgis online platform and client applications accessing that platform (see figure 1). collector for arcgis (http://doc.arcgis.com/en/collector), a ready-made app, was used for data collection. arcgis desktop (http://desktop.arcgis.com) was used at the outset to create the initial database schema. a custom html/javascript web application was developed to better enable library administrators to visualize the data as a map, table, or chart. prior to the implementation of this pilot project, the circulation department conducted floor sweeps for safety purposes (e.g., making sure certain doors were locked), but space-assessment data had never been gathered for the library.

research study location
all observations were taken during fall 2016 and spring 2017 at the university of idaho library and the gary strong curriculum center. this article focuses on the implementation of the platform for use at the library. the first floor of the university of idaho library underwent a remodel during winter 2016.
the remodel included new furniture and different configurations of areas better customized for learning and studying. spaces such as group study areas, booths, and brainstorming spaces figured prominently in the remodel. additionally, expanded food and beverage options and proximity to open seating areas located near natural light provide a welcoming environment. library hours were also expanded to 24 hours per day, 5 days a week. with these changes arose the desire to digitally collect data to learn about the use of these new locations by patrons. utilizing these data to inform decision-making about future changes to the physical spaces in the library, as well as connecting library learning spaces to campus learning outcomes, were goals of this research.

figure 1. infrastructure for the pilot project.

selecting the arcgis online platform
using locally existing resources to implement this pilot project was a requirement. funding was not available to purchase server software or hardware. personnel time could be carved out of existing positions for this effort, but money was not available to hire additional personnel. the university of idaho library does not have a dedicated it unit, so choices were limited. purchasing business-intelligence software such as tableau was cost-prohibitive. an open-source tool such as suma, developed by north carolina state university libraries, was not a practical option in this case because the system requirements did not align with the expertise of existing personnel.15 fortunately, the arcgis online platform was available for this research at no cost to the library, and existing personnel had experience using the platform. the university of idaho participates in, and contributes financially to, a state of idaho higher education site license for esri software. the software is then available to personnel across the institution for research, teaching, and, to a lesser extent at this time, administrative purposes. since arcgis online is a cloud platform, there is no server software to install and update and no server hardware to configure. additionally, the university of idaho gis librarian was familiar with the capabilities of the platform and available to actively participate in this research. in short, researchers' access to and existing expertise with the arcgis online platform, coupled with the extensive capabilities of the platform itself, made it the best choice for this research.

pilot project design
a public services librarian and the gis librarian assumed leadership roles for the pilot project. the public services librarian led tasks associated with defining the learning (i.e., the data-collection) spaces, defining the data fields and domains for those spaces, and overseeing personnel responsible for collecting these data. the gis librarian led tasks associated with creating the database schema, creating the geographic features representing the learning spaces, creating a web application to visualize the data, and managing content on the arcgis online platform. library personnel were responsible for collecting the data.

gathering ancillary data
having building floor plans in a digital format was helpful for data collectors to orient themselves in the space when looking at a map on a mobile device.
our research team was able to acquire georeferenced building floor plans for our institution from the information technology services unit on campus. each of the library’s four floors was published to arcgis online as a hosted tile layer to serve as a frame of reference for data collectors.

managing content and users

arcgis online provides the ability to create and define groups. groups are collections of items that can be shared among named users. individual user accounts were created for each project participant, and a group was created containing the items for this pilot project to be shared among those users. this approach allowed all data associated with the project to remain private and be shared only among personnel participating in the project.

database design

the primary knowledge product resulting from this research was a web application containing a two-dimensional map, tables, and charts. a geodatabase, which is an assemblage of geographic datasets, needed to be designed and created to provide data to the web application.16 designing a geodatabase begins with defining the operational layers required to gather information.17 for this pilot project, one operational layer depicting individual learning spaces was required (see table 1).

table 1. description of the learning spaces layer
  layer: learning spaces
  map use: define areas intended for a specific type of learning
  data source: digitized using building floor plans as a frame of reference
  representation: polygons

the learning spaces layer was used to store the geometry of the individual learning spaces. a table to store observations for each learning space was needed, and a relationship between each individual space and the observations for that space was required (see figure 2). the relationship binds observations to their appropriate learning spaces and was defined to allow one learning space to relate to many observations.

figure 2. data elements of the geodatabase.

fields, analogous to columns in a spreadsheet, were defined for the learning spaces layer and the observations table to store descriptive information. for example, a friendly name was assigned to each learning space. additionally, domains were defined to manage valid values for specific fields. domains were necessary for quality control and quality assurance to enforce data integrity, enabling data collectors to pick items from lists rather than having to type the item names. this feature eliminates potential data-collection errors. field names, data types, field descriptions, and domains for this pilot project can be found in the appendix.

defining data-collection spaces

a template was created to define the information required to create each learning space feature. these features were created by digitizing them on a computer screen for each of the four floors of the library, using the building floor plans as a frame of reference. ten learning spaces were defined for the first floor of the library and one each for floors 2, 3, and 4. a map for each floor was created and published to arcgis online as a hosted feature layer.18 each map contained two layers: one for the floor plan and one for the learning spaces (figure 3). library personnel used these maps to collect data.
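to make the fields, domains, and relationship just described concrete, the following sketch (python, illustrative only) expresses the observations table from the appendix as a plain dictionary loosely resembling a layer-definition fragment; the field names, coded values, and one-to-many relationship are taken from the appendix, while the surrounding structure is an approximation rather than the exact json arcgis online expects.

```python
import json

# illustrative sketch only: field names, types, and coded values mirror the
# appendix; the enclosing structure only approximates an esri layer definition.
observations_table = {
    "name": "space_assessment_data",
    "fields": [
        {"name": "globalid", "type": "guid", "description": "global identifier"},
        {"name": "spaceid", "type": "string", "description": "space identifier",
         "domain": "spaceid"},
        {"name": "type_of_usage", "type": "smallinteger",
         "description": "type of usage", "domain": "typeofusage"},
        {"name": "number_of_users", "type": "smallinteger",
         "description": "number of users"},
        {"name": "comments", "type": "string", "description": "general comments"},
    ],
    "domains": {
        # coded values keep collectors picking from a list instead of typing
        "typeofusage": {
            0: "browsing stacks",
            1: "individual studying",
            2: "lounging",
            3: "meeting / group study",
            4: "service point (circulation / reference / its help)",
            5: "using library computers",
        }
    },
    # one learning space relates to many observations
    "relationship": {"origin": "space_assessment_areas",
                     "cardinality": "onetomany", "key": "spaceid"},
}

print(json.dumps(observations_table, indent=2))
```

a real deployment would author these elements in arcgis desktop or through esri's documented layer-definition json rather than this simplified form.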
data collection

data collection was accomplished using collector for arcgis installed on mobile devices. this eliminated the need for any software-development costs for data collection. collector for arcgis is a ready-made arcgis online application designed to provide an easy-to-use interface for collecting location-based data. the software was installed on a variety of devices, including a samsung galaxy tablet, a surface tablet, and an apple ipad. the online collection mode was enabled during collection, so data were transferred to arcgis online in real time. the software can collect data in an offline mode, but because strong internet connections were available in both campus buildings, the online mode was used. the collection workflow consisted of library personnel traversing the floors of the library and recording the number of users in each space, what the users were doing in the space, and additional contextual comments if necessary. library staff were encouraged to use their own expertise and observational cues (e.g., textbooks present) when recording data associated with patron activities in library spaces. the date, time, and name of the data collector were recorded automatically, an option available through the arcgis online platform. the user interface for the software was friendly and intuitive and required minimal training (figure 4). a list was provided to select the type of use for the selected space. data were accessible via arcgis online immediately following collection.

figure 3. first floor learning spaces of the university of idaho library overlaid on the building floor plan.

figure 4. the collector for arcgis user interface utilized for data collection.
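as an illustration of that immediate access, a minimal sketch follows (python, using the standard requests and csv libraries) that pulls collected observation records from a hosted feature layer's rest query endpoint and writes them to a csv file; the service url, layer index, and token are placeholders and would need to match a given organization's service.

```python
import csv
import requests

# placeholders: substitute your organization's hosted feature service url,
# the index of the observations layer/table, and a valid arcgis online token.
LAYER_URL = ("https://services.arcgis.com/EXAMPLE/arcgis/rest/services/"
             "space_assessment/FeatureServer/1")
TOKEN = "YOUR_ARCGIS_ONLINE_TOKEN"

params = {
    "where": "1=1",             # all observations; narrow with a date clause if desired
    "outFields": "*",           # return every attribute field
    "returnGeometry": "false",  # observations are tabular records, not geometry
    "f": "json",
    "token": TOKEN,
}
resp = requests.get(f"{LAYER_URL}/query", params=params, timeout=30)
resp.raise_for_status()
features = resp.json().get("features", [])

# flatten the attribute dictionaries into csv rows for analysis in excel or pandas
rows = [f["attributes"] for f in features]
if rows:
    with open("observations.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=sorted(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
print(f"exported {len(rows)} observation records")
```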
results of using web gis

web gis, specifically arcgis online, offered the functionality required for collecting and managing in-library use data. additionally, the platform offers librarians supplementary opportunities for collaborative space-assessment projects. while the arcgis online platform proved to be useful for this pilot project, some of the advantages and limitations encountered are discussed below.

advantage: ease of use through targeted applications

esri software has been used in academia for decades. while the early command-line versions and later desktop versions were the playground of those with gis training, web gis applications have a decidedly friendlier interface because of the ability to customize applications on the platform for specific purposes. for example, applications with management functionality can be separated from applications intended for data gathering. the need for excessive functionality to be included in one interface is replaced with a more modular framework, resulting in less complex user interfaces than those seen in many desktop gis programs. while some personnel involved with this project had used esri software for many years and were familiar with the capabilities of arcgis online, they had not used the platform for data collection prior to this project. managing users and content for the project proved to be straightforward. it was made even easier when enterprise logins were configured, which allowed personnel to sign in using their institutional username and password. authoring the database schema, creating the necessary maps, and publishing those maps as hosted services was not complicated for those with basic desktop and web gis knowledge. those responsible for collecting data needed little training with collector for arcgis to begin data collection. finally, librarians with no gis background were able to export the data to a familiar format (comma-separated values) to begin analysis using software such as excel. in short, authoring the database and map services remains best handled by those with gis experience; however, targeted application interfaces enable users without gis experience to collect and work with data.

advantage: participation in enterprise architecture

conducting library research on a platform that many faculty, students, and staff are beginning to use for research, learning, and administration places librarians within the same collaborative space as the communities they are serving. in the case of this research, our need for building floor plans presented opportunities to discuss enterprise gis at our institution more broadly by sharing this information. interaction took place between the library, facilities services, and information technology services, resulting in the cultivation of relationships around data sharing. furthermore, integration of our enterprise security with the arcgis online platform adds a level of legitimacy to geospatial data management efforts.

advantage: potential for cross-institutional collaborative projects

the potential for cross-institutional collaboration on library-space assessment and other projects should not be overlooked when using the arcgis online platform. such collaborations are even more manageable because esri software is used by more than 7,000 colleges and universities across the globe. even though cross-institutional collaboration was not a goal of this research, the opportunities for projects or programs of this nature became abundantly clear. items created in arcgis online can be shared between organizations. simply sharing a library-space-assessment database schema with librarians at other institutions would allow them to quickly implement a similar project on the arcgis online platform. this opens the door to new research opportunities. the functionality exists for one institution to host a database that personnel from multiple institutions could populate. a single dataset containing the learning spaces of multiple institutions, with multiple contributors, could be created, managed, and analyzed collaboratively. this could enable lower-resource libraries to participate in projects with larger institutions as economies of scale are realized. and it offers the ability to undertake projects across multiple institutions to explore broader space-assessment or other research questions.

limitation: updating hosted feature service schemas

the ability to author and edit schemas entirely in arcgis online has not yet matured to the point where it matches the abilities of its desktop counterpart. specifically, updating a published schema is currently difficult to accomplish in arcgis online because a user-friendly interface does not exist. however, the task can be accomplished by editing the javascript object notation (json) of the hosted feature service. while this is a current limitation for managers of the hosted feature service and not for data collectors, it is anticipated that it will be addressed in future updates.
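a minimal sketch of that json-editing approach appears below (python with requests); it assumes the administrative rest endpoint and updateDefinition operation that esri documents for hosted feature services, and the service url, the property being changed, and the token handling are all placeholders, so the operation and payload should be verified against current esri documentation before use.

```python
import json
import requests

# placeholders: the admin endpoint of a hosted feature service layer and a token.
ADMIN_LAYER_URL = ("https://services.arcgis.com/EXAMPLE/arcgis/rest/admin/services/"
                   "space_assessment/FeatureServer/1")
TOKEN = "YOUR_ARCGIS_ONLINE_TOKEN"

# 1. fetch the current layer definition as json to see what can be edited.
current = requests.get(ADMIN_LAYER_URL, params={"f": "json", "token": TOKEN},
                       timeout=30).json()
print("existing fields:", [f["name"] for f in current.get("fields", [])])

# 2. build the fragment of the definition to change; here, an illustrative tweak
#    to the layer's display field (other schema properties could be edited similarly).
patch = {"displayField": "spaceid"}

# 3. push the edited fragment back through the updateDefinition operation.
resp = requests.post(f"{ADMIN_LAYER_URL}/updateDefinition",
                     data={"updateDefinition": json.dumps(patch),
                           "f": "json", "token": TOKEN},
                     timeout=60)
print(resp.json())
```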
limitation: user interface for standards-based metadata

items created as part of the pilot project were documented using the metadata editor provided in arcgis online. arcgis online users can create and maintain geospatial standards-based metadata for content. however, the user interface for creating metadata based on either the iso 19115 series or the federal geographic data committee (fgdc) content standard for digital geospatial metadata (csdgm) could be improved by reducing its complexity and allowing batch updates of specific elements. item documentation for the platform focuses on creating and editing elements of arcgis-format metadata. it should be noted, and potentially added as a point of concern for librarians, that the ability to author and edit metadata based on the iso and csdgm standards was introduced three years after the initial release of arcgis online.

limitation: visualizing data in related tables

the ability to visualize data collected as part of this project using ready-made applications in arcgis online yielded unsatisfactory results. the primary limitation was related to working with repeated measurements for the learning spaces. ready-made applications like web appbuilder and operations dashboard have limited support for a user-friendly presentation of repeated learning-space observations. therefore, a custom web application was developed by a university of idaho student using the esri javascript application programming interface (api). the application provides the ability to select a date range, a time scope (e.g., daytime, nighttime, all hours), a building, and a floor to visualize the data. the learning spaces are colored by the total number of users in a space on the basis of the parameters selected (see figure 5). for each individual space, a chart and table can be displayed to gain further insight (see figures 6 and 7).

figure 5. map view of the space assessment dashboard application.

figure 6. chart view of the space assessment dashboard application.

figure 7. table view of the space assessment dashboard application.
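the aggregation behind the dashboard's map view can also be prototyped outside the esri javascript api; the sketch below (python with pandas, illustrative only) filters exported observation records to a date range and daytime hours and totals users per space. it assumes the records were exported as csv (for example, by the query sketch earlier) and that editor tracking recorded a creation timestamp; the timestamp column name used here is a placeholder.

```python
import pandas as pd

# illustrative only: column names (especially the timestamp field) are placeholders
# and must match whatever the actual csv export from arcgis online contains.
obs = pd.read_csv("observations.csv")

# editor tracking typically stores creation time as milliseconds since the epoch;
# adjust the unit if the export already contains a human-readable date.
obs["created"] = pd.to_datetime(obs["CreationDate"], unit="ms")

# parameters mirroring the dashboard controls: a date range and a daytime scope.
start, end = "2016-09-01", "2016-12-16"
daytime = obs["created"].dt.hour.between(8, 17)
in_range = obs["created"].between(start, end)

subset = obs[in_range & daytime]

# total users per learning space over the selected scope, the figure each
# space is colored by in the dashboard's map view.
totals = (subset.groupby("spaceid")["number_of_users"]
                .sum()
                .sort_values(ascending=False))
print(totals)
```

swapping the hour window or grouping by type_of_usage instead of spaceid reproduces the dashboard's other scopes and chart views.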
limitations: data-collection software issues

using collector for arcgis on devices running windows 10 proved frustrating because of a documented bug in collector. a “you are not connected to the internet” error would appear randomly, even when there was a valid internet connection. a workaround was implemented to circumvent the issue, but it was a source of frustration for data-collection staff. offline data-collection mode was tried to see whether it was a more favorable option; however, the date and time of data collection are not captured in offline mode, so that potential workflow was abandoned. no issues were encountered by data collectors who used the samsung galaxy (running the android operating system) or an apple ipad.

conclusions

web-based gis platforms such as arcgis online have evolved to the point where they offer the functionality required for collecting and managing in-library use data. the arcgis online platform performed commendably for this pilot project. while arcgis desktop was used to author the original database schema in this project, it is reasonable to conclude that it is only a matter of time until the functionality required to complete the entire workflow in the web-based platform is available. using mobile and desktop devices outfitted with the collector for arcgis application proved to be a practical way to collect real-time in-library use data. managing project users and the items those users were able to access was straightforward. while the visualization tools for repeated-measurements data are currently limited in arcgis online, the data are accessible as a web service, and the sky is the limit on custom web-application development. looking ahead, adjusting schemas to capture height above and below ground level to take advantage of 3d data models and visualization is intriguing. use of this model may be beneficial for space-assessment projects that seek to gather data more broadly across institutions. finally, a noteworthy realization from this research is the potential for inter-institutional and cross-institutional collaboration on library space-assessment projects, or other projects for that matter. librarians can begin embracing the web gis movement alongside those in the communities they participate in and serve. opportunities to create efficiencies are possible through the simple sharing of database schemas. additionally, the ability for one institution to host a database enabling personnel at multiple institutions, or at multiple libraries within larger institutions, to contribute data is available and ready for further research.

appendix: schemas for each object in the geodatabase used for data collection

building name table and associated domain values
  domain name: buildingname
  description: name of the building
  field type: smallinteger; domain type: codedvalue
  coded values: 0 = library; 1 = education

space identifier table and associated domain values
  domain name: spaceid
  description: identifier for the area
  field type: string; domain type: codedvalue
  coded values: 1a = group study; 1b = café; 1c = landing; 1d = computer lab; 1e = individual/small group study; 1f = mill (134); 1g = group study (133); 1h = group study (132); 1i = group study (131); 1j = classroom (120); 2a = 2nd floor; 3a = 3rd floor; 4a = 4th floor; 3a_1 = imtc area 1; 3b_1 = imtc area 2; 3c_1 = imtc area 3; 3d_1 = imtc area 4

type of use table and associated domain values
  domain name: typeofusage
  description: type of usage of the area
  field type: smallinteger; domain type: codedvalue
  coded values: 0 = browsing stacks; 1 = individual studying; 2 = lounging; 3 = meeting / group study; 4 = service point (circulation / reference / its help); 5 = using library computers

space assessment areas feature class (field, data type, description, domain)
  globalid, guid, global identifier
  spaceid, string, space identifier, spaceid
  floor, string, building floor
  bldgname, smallinteger, building name, buildingname

space assessment areas observations table (field, data type, description, domain)
  type_of_usage, smallinteger, type of usage, typeofusage
  number_of_users, smallinteger, number of users
  globalid, guid, global identifier
  spaceid, string, space identifier, spaceid
  comments, string, general comments

space assessment areas feature class to observations relationship class
  cardinality: onetomany
  isattributed: false
  iscomposite: false
  forward path label: space_assessment_data
  backward path label: space_assessment_areas
  description: relationship between the space assessment areas and data collected
  origin class name: space_assessment_areas; origin primary key: spaceid; origin foreign key: spaceid

references
1 michael f. goodchild, “part 1. spatial analysts and gis practitioners,” journal of geographical systems 2, no. 1 (2000): 5–10, https://doi.org/10.1007/s101090050022.
2 pinde fu and jiulin sun, web gis: principles and applications (redlands, ca: esri, 2011), 7.
3 suzana dragićević, “the potential of web-based gis,” journal of geographical systems 6, no. 2 (2004): 79–81, https://doi.org/10.1007/s10109-004-0133-4.
4 fu and sun, web gis, 9.
5 “who we are,” esri, accessed october 17, 2017, http://www.esri.com/about-esri#who-we-are.
6 ningning kong, michael fosmire, and benjamin dewayne branch, “developing library gis services for humanities and social science: an action research approach,” college & research libraries 78, no. 4 (2017): 413–27, https://doi.org/10.5860/crl.78.4.413.
7 bradley wade bishop and lauren h. mandel, “utilizing geographic information systems (gis) in library research,” library hi tech 28, no. 4 (2010): 543, https://doi.org/10.1108/07378831011096213.
8 jingfeng xia, “library space management: a gis proposal,” library hi tech 22, no. 4 (2004): 375, https://doi.org/10.1108/07378830410570476.
9 jingfeng xia, “gis in the management of library pick-up books,” library hi tech 22, no. 2 (2004): 209–16, https://doi.org/10.1108/07378830410543520.
10 jingfeng xia, “visualizing occupancy of library study space with gis maps,” new library world 106, no. 5/6 (2005): 219–33, https://doi.org/10.1108/03074800510595832.
11 jingfeng xia, “locating library items by gis technology,” collection management 30, no. 1 (2005): 63–72, https://doi.org/10.1300/j105v30n01_07.
12 lauren h. mandel, “geographic information systems: tools for displaying in-library use data,” information technology & libraries 29, no. 1 (2010): 47–52, https://doi.org/10.6017/ital.v29i1.3158.
13 andrew coyle, “interior library gis,” library hi tech 29, no. 3 (2011): 529–49, https://doi.org/10.1108/07378831111174468.
14 lisa m. given and heather archibald, “visual traffic sweeps (vts): a research method for mapping user activities in the library space,” library & information science research 37, no. 2 (2015): 100–108, https://doi.org/10.1016/j.lisr.2015.02.005.
15 “suma,” north carolina state university libraries, accessed october 17, 2017, https://www.lib.ncsu.edu/projects/suma.
16 “what is a geodatabase?,” esri, accessed october 17, 2017, http://desktop.arcgis.com/en/arcmap/10.4/manage-data/geodatabases/what-is-a-geodatabase.htm.
17 “geodatabase design steps,” esri, accessed october 17, 2017, http://desktop.arcgis.com/en/arcmap/10.4/manage-data/geodatabases/geodatabase-design-steps.htm.
18 “hosted layers,” esri, accessed october 17, 2017, http://doc.arcgis.com/en/arcgis-online/share-maps/hosted-web-layers.htm.
book reviews

in the beginning ... was the command line
by neal stephenson. new york: avon books, inc., 1999. 151p. $10 (isbn 0-380-81593-1)

neal stephenson is best known for his cyberfiction, including snow crash and most recently cryptonomicon. in the beginning ... was the command line is a quite different kettle of fish. command line is a short book with a succinct message: the command line is a good thing, because the full power of the computer is only available to those who can access the command line and type in the magic commands that make things happen. stephenson learned this lesson the hard way, after first spending much time as a macintosh-devoted gui-head. the revelation came when he lost a document he was editing on his powerbook, completely and without a trace, forever irretrievable.

actually, i say the book has a succinct message, but it has many messages and many metaphors, all artfully constructed by a master of prose. stephenson constructs his arguments along multiple lines, providing a discursive tour through windows, macintosh, and unix history, offering personal history as well as his own take on the economics of the software industry. for example, he believes that microsoft would be better off as an applications company rather than carrying the millstone of a family of operating systems. as for apple, he suggests that they have been doing their best to destroy themselves for years, so far unsuccessfully (but give them time).

the real meat of the book is whether, in fact, it is better to offer people the flash of metaphor with the recognition that power and certain levels of choice are lost, as with graphical user interfaces exemplified by windows and the macintosh, or whether it is better to have at least some access to the command line interface, which ms-dos offered and members of the unix family (e.g., linux) afford. this is, in fact, both a silly and important question at the same time. silly because many people would wonder why anyone would want command line access to any software. silly because others might wonder why you couldn't have both.
important, or at least apparently important, because we seem to have become, without much warning, a world wrapped in guis of one sort or another. important in the library automation world, because end-user tools are moving increasingly toward gui-based or web-based interfaces without textbased alternatives (except, perhaps, lynx or similar web browsers, which have their own problems). for much of the book, stephenson dances around the question, among others, of why not both gui and text-based interfaces, and finally finds the answer in the be operating system. my question is, why not as many interfaces as it takes, of whatever sort? to repeat the trite saw, there are two kinds of people in the world, those who divide the world into two kinds of people and those who don't. stephenson has a lot of fun trying to make the division in this case, then ultimately comes out from behind the posturing and admits that he believes in the availability of both worlds. there are many people who do, indeed, want hard things hidden from them, at least some of the time. when i am dealing with an automated teller machine, i don't want to have to use mechanical levers or pedals as i might have needed were atms invented in an earlier age, nor do i want to type in commands, although i am comfortable using a command line environment in my workplace. i just want to be prompted through a minimal number of steps to walk away with some cash from my checking account. the world is a complicated and challenging place to navigate. some people tom zillner, editor would like to be helped by other people in this navigation, although many have found that they would far rather deal with the dumbeddown interface of an atm machine than to interact with not-so-friendly, underpaid bank tellers. similarly, many people want to accomplish a particular task requiring the use of a computer and don't mind having the details hidden from them, no matter how much power knowing the details would provide. or, they want to do that at least some of the time. as an example in the library world, let's consider a nai:ve patron who enters the library desiring to perform a known-item search. such a user might be quite comfortable with an interface with a single type-in box and a set of clickable buttons labeled title, author and subject. or maybe just a single button "click to start search." although nai:ve users may consult library staff, who are most often more friendly than bank tellers, many people want to find their own materials. at the same time, more sophisticated users want more sophisticated capabilities and interfaces from the same catalogs. although vendors have gotten better at providing a couple of levels of complexity and corresponding user interfaces, why not go further? there aren't just two kinds of people. there are lots of kinds of people, with lots of kinds of information needs, representing lots of experience levels. why the restrictions at the user interface? in the history of microcomputing, stephenson points to the evolution of two major players, microsoft and apple, with linux coming on strong and be representing an interesting offshoot. i think the important insight implicit in what stephenson discusses is that much of the appearance and behavior of windows and the macintosh desktop are historically based artifacts. in order to maintain backward compatibility with existing applications, the windows and macintosh book reviews 103 reproduced with permission of the copyright owner. 
further reproduction prohibited without permission. operating systems have picked up a great deal of "cruft," computer code that allows multitasking and other improvements cobbled on to the fragile inner shell of ancient code required for compatibility with older applications. at the same time, stephenson invokes the familiar refrain that the user interfaces of both platforms are tied to a tired set of metaphors that attempt to mimic the real-world office (e.g., desktop, folder) but do not do so with any kind of useful fidelity. in the library world, i think a similar kind of lineage might be traced from command line interfaces to the current windowsand web-based front-ends. although many libraries and librarians have faced painful conversion processes over the years in moving through generations of automated systems, it might be interesting to see if there are still traces of underlying code that owe their existence to backward compatibility. where does stephenson turn in the face of the inelegance of the windows and macintosh worlds? he finds solace in the power and integrity of linux. it may take a long time to successfully install the operating system and get it to function with all of the hardware components of a particular computer configuration, but it has all that power, and all of those cool applications carefully constructed by people who care. bugs are fixed quickly. it's a community effort. that's all very appealing, particularly when compared to the appalling response (or lack of it) to windows or macintosh bugs. the problem is that so far most of us aren't equipped to deal with the steep curve required to install linux on personal computers, and the corporate or library environment usually isn't politically prepared for linux to be adopted as an institutionwide standard. so, while linux boxes are frequent choices for servers, they are not widespread personal pc choices. nor r.hould they be until easy installation tools are available. again, stephenson is ambivalent. on the one hand, he recognizes that there are many people who don't want the kind of power offered by being so close to the machine if it means becoming experts in arcane commands and codes. even though he wants the power and simplicity, and decries the limitations imposed by the gui, he recognizes that linux is not for everyone. he's right. most people use computers to get some work done (or to play). to the extent that the software gets in the way, it isn't operating properly. by that criterion, none of the three environments described are particularly useful in a desktop world. in spite of the fact that the old metaphors have been rightly criticized for years for their tiredness, there doesn't seem to be much movement beyond them, except in limited research operating environments and applications. similarly, it seems, in the library and information world, at least in most people's routine interactions with opacs and databases. yes, i am waffling, because i'm sure that someone could point out the "snarfle n 1 virtual reality interface to the lc catalog that affords a walkthrough browsing experience," but of course only six computer science researchers have actually experienced the snarfletm interface, and it requires a $25,000 workstation and $10,000 in virtual reality gear to work, plus it is s-1-o-w. pardon the sarcastic riff, but there is a lot of wonderful user interface work that is certainly not finding its way onto mainstream computer users' desktops, or to the library or information center. 
so what's the answer? criticism is fun, because critics don't necessarily have to provide a positive account to match their nay-saying function. if things are bleak in the world of the user interface, both on the average user desktop and on the library desk104 information technology and libraries i june 2000 top as well, what is to be done? for a taste of what is to come in the library world, take a look at mylibrary (http:/ /my.lib.ncsu.edu/), which allows profiling of user preferences and customization based on academic discipline. similarly, there are a number of web portals and other sites that allow customization for users (e.g., my yahoo, my excite, etc.). suppose that these first steps in customization are carried further, so that each user's unique profile generates a unique user interface experience across all databases he or she deals with in a session. the interface unification could be accomplished across heterogeneous databases in a couple of different ways. a simple initial step that many libraries already employ is to obtain databases from a single aggregator, so that a uniform interface is presented to the user. for example, oclc' s first search offers a single interface to a number of commercial databases. this type of solution is not possible for libraries that need access to a diverse array of databases not available through a single aggregator or vendor. of course, this situation can present patrons and staff with a bewildering array of interfaces and search methods. a more elaborate solution is to employ z39.50 to access the databases and build a single interface at the front end. there may be aggregators that already use this strategy with the databases they provide, but in the future perhaps there would be an incentive to offer unified interfaces with fine-grain customization possible by users. getting back to stephenson's more generalized view of the user interface, i think there are also opportunities here for more finegrained customization. stephenson points to the beos, which apparently allows both command-line and guibased interactions, as an example of what can be done when an operating system is constructed anew, from the bottom up, with no pre-existing reproduced with permission of the copyright owner. further reproduction prohibited without permission. audience to satisfy. at the same time, and in contrast, stephenson extols the power of open software development, which he believes is most apparent in operating systems, the production of which he describes as money-losing propositions. yet, linux is tremendously successful without, for the most part, commercial gain for developers. can this same model be applied to interface and other development in the library world? in this example, might not some group of librarian coders (or coder librarians) work together to put mylibrary together with z39.50 capabilities and customization of interfaces to produce a little slice of paradise for library patrons? promising moves are being made within the library community to get open source efforts off the ground. this could be one of many especially useful and fruitful projects to come out of open software development for libraries. although his book is ostensibly about a few issues that elicit yawns from most of the world, stephenson is really using in the beginning . . . was the command line to look at a much bigger picture than simply the command line versus the gui at its microscopic level. 
stephenson looks at the cloaking, obfuscation or replacement of underlying text by images and multimedia as contributing to the decline of civilization. that seems like a radical claim, but at heart it is the one that stephenson makes in his discussion of the disney-ification of the world-that visual metaphors and explanations oversimplify and obscure the truth. in fact, stephenson goes further, discussing this trend toward anti-word as our attempt at an antidote for the kind of intellectualism that resulted in a lot of death, pain, and suffering for people in the twentieth century. he, as a person who lives by words and loves the intellectual life, thinks we've gone too far, reaching a state of cultural relativism where there is neither good nor bad remammg. this discussion includes my favorite quote of the book: the problem is that once you have done away with the ability to make judgments as to right and wrong, true and false, etc., there's no real culture left. all that remains is clog dancing and macrame. the ability to make judgments, to believe things, is the entire point of having a culture. i think this is why guys with machine guns sometimes pop up in places like luxor and begin pumping bullets into westerners .... when their sons come home wearing chicago bulls caps with the bills turned sidewavs, the dads go out of their minds. (p. 56) it's a pretty startling move to try to connect up the decline in use of the command line to an anti-intellectualism following world war ii that resulted in cultural relativism. i think it actually has some merit, although in the case of visual interfaces versus the command line the ethical import is minimal, i.e., i don't believe my decision to accomplish certain tasks using visual metaphors contributes to the decline of civilization, and i think the fact that i like to work on other tasks utilizing a command line won't serve to save our written culture. it's too much of a stretch. i think that something stephenson misses in his discussion of the replacement of the written word by visual images is that there is still a creative force and judgment involved in the creation of the images. there is still script writing. isn't this, after all, what a writer does in any case, creating images, metaphorically, through his or her work? certainly, we are moving through a perilous time, when the world really is changing from a reliance on the written word to more dependence on the visual. there will be many things lost in this transition. plato had some major, wellfounded doubts about the transition from greece's oral cultural tradition to a written one. the change happened anyway. civilization has been declining for a long time. my fearless prediction is that it will continue to decline for a long time. i think stephenson has done a masterful job of writing a brief glimpse of the overall picture that represents the state of culture and intellectual life in the world today, and has also made some important points about the economics and character of the world of software and operating environments. his writing skills make this fairly short book a pleasurable read and a worthwhile one. as i did, i think you might find this long essay a useful starting point for thoughts about issues large and small.-tom zillner, wils the cathedral & the • bazaar: musings on linux and open source by an accidental revolutionary by eric s. raymond, sebastopol, calif.: o'reilly, 1999. 288p. 
$19.95 (isbn 156592-724-9) this short essay examines, in the guise of a book review, the concept of a "gift culture" and how it may or may not be related to librarianship. as a result of this examination, and with a few qualifications, i believe my judgements about open source software and librarianship are true: open source software development and librarianship have a number of similarities-both are examples of gift cultures. i have recently read a book about open source software development by eric raymond. the cathedral & the bazaar describes the environment of free software and tries to explain why some programmers are willing to give away the products of their labors. it describes the "hacker milieu" as a "gift culture": book reviews 105 reproduced with permission of the copyright owner. further reproduction prohibited without permission. gift cultures are adaptations not to scarcity but to abundance. they arise in populations that do not have significant material scarcity problems with survival goods. we can observe gift cultures in action among aboriginal cultures living in ecozones with mild climates and abundant food. we can also observe them in certain strata of our own society, especially in show business and among the very wealthy. 1 raymond alludes to the definition of "gift cultures," but not enough to satisfy my curiosity. being the good librarian, i was off to the reference department for more specific answers. more often than not, i found information about "gift exchange" and "gift economies" as opposed to "gift cultures." (yes, i did look on the internet but found little.) probably one of the earliest and more comprehensive studies of gift exchange was written by marcell mauss. 2 in his analysis he says gifts, with their three obligations of giving, receiving, and repaying, are in aspects of almost all societies. the process of gift giving strengthens cooperation, competitiveness, and antagonism. it reveals itself in religious, legal, moral, economic, aesthetic, morphological, and mythological aspects of life.3 as gregory states, for the industrial capitalist economies, gifts are nothing but presents or things given, and "that is all that needs to be said on the matter." ironically for economists, gifts have value and consequently have implications for commodity exchange. 4 he goes on to review studies about gift giving from an anthropological view, studies focusing on tribal communities of various american indians, cultures from new guinea and melanesia, and even ancient roman, hindu, and germanic societies: the key to understanding gift giving is apprehension of the fact that things in tribal economics are produced by nonalienated labor. this creates a special bond between a producer and his/her product, a bond that is broken in a capitalistic societv based on alienated wage-labor. 5 ingold, in "introduction to social life," echoes many of the things summarized by gregory when he states that industrialization is concerned exclusively with the dynamics of commodity production. clearly in non-industrial societies, where these conditions do not obtain, the significance of work will be very different. for one thing, people retain control over their own capacity to work and over other productive means, and their activities are carried on in the context of their relationships with kin and community. indeed their work may have the strengthening or regeneration of these relationships as its principle objective. 
6 in short, the exchange of gifts forges relationships between partners and emphasizes qualitative as opposed to quantitative terms. the producer of the product (or service) takes a personal interest in production, and when the product is given away as a gift it is difficult to quantify the value of the item. therefore, along with the product or service, less tangible elements-such as obligations, promises, respect, and interpersonal relationships-are exchanged. as i read raymond and others i continually saw similarities between librarianship and gift cultures, and therefore similarities between librarianship and open source software development. while the summaries outlined above do not necessarily mention the "abundance" alluded to by raymond, the existence of abundance is more than mere speculation. potlatch, "a ceremonial feast of the american indians of the northwest coast marked by the host's lavish distribution of gifts or sometimes destruction of property to demonstrate wealth and generosity with the 106 information technology and libraries i june 2000 expectation of eventual reciprocation," is an excellent example.? libraries have an abundance of data and information. (i won't go into whether or not they have an abundance of knowledge or wisdom of the ages. that is another essay.) libraries do not exchange this data and information for money; you don't have to have your credit card ready as you leave the door. libraries don't accept checks. instead the exchange is much less tangible. first of all, based on my experience, most librarians simply take pride in their ability to collect, organize, and disseminate data and information in an effective manner. they are curious. they enjoy learning things for learning's sake. it is a sort of platonic end in itself. librarians, generally speaking, just like what they do and they certainly aren't in it for the money. you won't get rich by becoming a librarian. information is not free. it requires time and energy to create, collect, and share, but when an information exchange does take place, it is usually intangible, not monetary, in nature. information is intangible. it is difficult to assign it a monetary value, especially in a digital environment where it can be duplicated effortlessly: an exchange process is a process whereby two or more individuals (or groups) exchange goods or services for items of value. in library land, one of these individuals is almost always a librarian. the other individuals include tax payers, students, faculty, or in the case of special libraries, fellow employees. the items of value are information and information services exchanged for a perception of worth-a rating valuing the services rendered. this perception of worth, a highly intangible and difficult thing to measure, is something the user of library services "pays," not to libraries and librarians, but to administrators and decision-makers. ultimately, these payments reproduced with permission of the copyright owner. further reproduction prohibited without permission. manifest themselves as tax dollars or other administrative support. as the perception of worth decreases so do tax dollars and support. 8 therefore, when information exchanges take place in libraries, librarians hope their clientele will support the goals of the library to administrators when issues of funding arise. librarians believe that "free" information ("think free speech, not free beer") will improve society. it will allow people to grow spiritually and intellectually. 
it will improve humankind's situation in the world. libraries are only perceived as beneficial when they give away this data and information. that is their purpose, and they, generally speaking, do this without regard to fees or tangible exchanges. in many ways i believe open source software development, as articulated by raymond, is very similar to the principles of librarianship. first and foremost they are similar in the idea of sharing information. both camps put a premium on open access. both camps are gift cultures and gain reputation by the amount of "stuff" they give away. what people do with the information, whether it be source code or journal articles, is up to them. both camps hope the shared information will be used to improve our place in the world. just as jefferson's informed public is necessary for democracy, open source software is necessary for the improvement of computer applications. second, human interactions are a necessary part of the mixture in both librarianship and open source development. open source development requires people skills by source code maintainers. it requires an understanding of the problem the computer application is intended to solve, since the maintainer must be able to "patch" the software, both to add functionality and to repair bugs. this, in turn, requires interactions both with other developers and with users who request repairs or enhancements. similarly, librarians understand that information-seeking behavior is a human process. while databases and many "digital libraries" house information, these collections are really "data stores" and are only manifested as information after the assignment of value is given to the data and interrelations between data are created. third, it has been stated that open source development will remove the necessity for programmers. yet raymond posits that no such thing will happen. if anything, there will be an increased need for programmers. similarly, many librarians feared the advent of the web because they believed their jobs would be in jeopardy. ironically, librarianship is flowering under new rubrics such as information architects and knowledge managers. it has also been brought to my attention by kevin clarke (kevin_clarke@unc.edu) that both institutions use peer-review: your cultural take (gift culture) on "open source" is interesting. i've been mostly thinking in material terms but you are right, i think, in your assessment. one thing you didn't mention is that, like academic librarians, open source folks participate in a peer-review type process. index to advertisers all of this is happening because of an information economy. it sure is an exciting time to be a librarian, especially a librarian who can build relational databases and program on a unix computer. acknowledgements thank you to art rhyno (arhyno@ server.uwindsor.ca) who encouraged me to post the original version of this text.-eric lease morgan, north carolina state university, raleigh, north carolina references 1. the cathedral & the bazaar: musings on linux and open source by an accidental revolutionary, 99. 2. m. mauss, the gift: forms and functions of exchange in archaic societies (new york: norton, 1967). 3. s. lukes, "mauss, marcel," in international encyclopedia of the social sciences, d. l. sills, ed. (new york: macmillian), vol 10, 80. 4. c. a. gregory, "gifts," in the new pa/grave: a dictionary of eeconomics, j. eatwell and others, eds. (new york: stockton pr., 1987), vol. 4, 524. 5. ibid. 6. t. 
ingold, "introduction to social life," in companion encyclopedia of anthropology, t. ingold, ed (new york: routledge, 1984), 747. 7. the merriam-webster online dictionary, http://search.eb.com/ cgi-bin/ dictionary?va=potlatch 8. e. l. morgan, "marketing future libraries." accessed apr. 27, 2000, www.lib.ncsu.edu/ staff/ morgan/ cil/ marketing. info usa library technologies, inc. lita mit press cover 2 cover 3 58, 69, cover 4 95 book reviews 107 libraryvpn: a new tool to protect patron privacy public libraries leading the way libraryvpn a new tool to protect patron privacy chuck mcandrew information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.12391 chuck mcandrew (chuck.mcandrew@leblibrary.com) is information technology librarian, lebanon (nh) public libraries. due to increased public awareness of online surveillance, a rise in massive data breaches, and spikes in identity theft, there is a high demand for privacy enhancing services. vpn (virtual private network) services are a proven way to protect online security and privacy. vpn’s effectiveness and ease of use have led to a boom in vpn service providers globally. vpns protect privacy and security by offering an encrypted tunnel from the user’s device to the vpn provider. vpns ensure that no one who is on the same network as the user can learn anything about their traffic except that they are connecting to a vpn. this prevents surveillance of data from any source, including commercial snooping such as your isp trying to monetize your browsing habits by selling your data, malicious snooping such as a fake wifi hotspot in an airport hoping to steal your data, or government-level surveillance that can target political activists and reporters in repressive countries. some people might ask why we need a vpn as https becomes more ubiquitous and provides end to end encryption for your web traffic. https will encrypt the content that goes over the network, but metadata such as the site you are connecting to, how long you are there, and where you go next are all unprotected. additionally, some very important network protocols, such as dns, are unencrypted and anyone can see them. a vpn eliminates all of those issues. however, there are two major problems with current vpn offerings. first, all reliable vpn solutions require a paid subscription. this puts them out of reach of economically vulnerable populations who often have no access to the internet in their homes. in order to access online services, they may rely on public internet connections such as those provided by restaurants, coffee shops, and libraries. using publicly accessible networks without the security benefits of a vpn puts people’s security and privacy at great risk. this risk could be eliminated by providing free access to a high-quality vpn service. the second problem is that using a vpn requires people to place their trust in whatever vpn company they use. some (especially free solutions) have proven not to be worthy of that trust by containing malware or leaking and even outright selling customer data. companies that abuse customer data are taking advantage of vulnerable populations who are unable to afford more expensive solutions or who do not have the knowledge to protect themselves. together, these two problems create a situation where having security and privacy is only available to those who can afford it and have the knowledge to protect themselves. libraries are ideally positioned to help with this situation. 
libraries work to provide privacy and security to people every day. this can mean teaching classes, making privacy resources available, and even advocating for privacyfriendly laws. mailto:chuck.mcandrew@leblibrary.com https://www.forbes.com/sites/forbestechcouncil/2018/07/10/the-future-of-the-vpn-market/#5b08fd8e2e4d https://research.csiro.au/ng/wp-content/uploads/sites/106/2016/08/paper-1.pdf information technology and libraries june 2020 libraryvpn | mcandrew 2 libraries are also located in almost every community in the united states and enjoy a high level of trust from the public. librarians can be thought of as being a physical vpn. people who come into libraries know that what they read and information that they seek out will be protected by the library. in fact, libraries have helped to get laws protecting the library records of patrons in all 50 states of the usa. people know that when a library offers a service to their community it isn’t because they want to sell their information or show them advertisements. with libraries, our patrons are not the product. libraries also already provide many online services to all members of their community, regardless of financial circumstances. examples include access to online databases, language learning software, and online access to periodicals such as the new york times or consumer reports. many of these services would cost too much for individual patrons to access individually. by pooling their resources, communities are able to make more services available to all of their citizens. to help address the above issues, the lebanon public libraries, in partnership with the westchester (new york) library system, the leap encryption access project (https://leap.se/), and tj lamanna (emerging technology librarian from cherry hill public library and library freedom institute graduate) started the libraryvpn project. this project will allow libraries to offer a vpn to their patrons. patrons will be able to download the libraryvpn application on a device of their choosing and connect to their library’s vpn server from wherever they are. libraryvpn was first conceived a number of years ago, but the real start of the project was when it received an imls national leadership grant (lg-36-19-0071-19) in 2019. this grant was to develop integrations between leap’s existing vpn solution and integrated library systems using sip2 which will allow library patrons to sign in to libraryvpn using their library card. this grant also included development of a windows client (there was already a mac and linux client) and alpha testing at the lebanon public libraries and westchester library system. we are currently working on moving into the testing phase of the software, and planning phase two of this project. phase two of libraryvpn will involve expanding our testing to up to 12 libraries and conducting end-user testing with patrons and library staff. we have submitted an application for imls funding for phase two and are actively looking for libraries that are excited about protecting patron privacy and would like to help us beta test this software. if you work for a library that would be interested in participating, you can reach us via email at libraryvpn@riseup.net or @libraryvpn on twitter. if you would like to help out with this project in another way, we would love to have more help. please reach out. we currently are thinking about three deployment models for libraries in phase two. first would be an on-premises deployment. 
this would be for larger library systems with their own servers and it staff. libraryvpn is free and open source software and can be deployed by anyone. since it uses sip2 to connect to your ils, it should work with any ils that supports the sip2 protocol. this deployment model has the advantage of not requiring any hosting fees but does require the library system to have staff that can deploy and manage public facing services. drawbacks to this approach would include higher bandwidth use and dealing with abuse complaints. phase 2 testing should give us better data about how much of an issue this will be, but https://leap.se/ mailto:libraryvpn@riseup.net information technology and libraries june 2020 libraryvpn | mcandrew 3 our experience hosting a tor exit node at the lebanon public libraries suggest that it won’t be too bad to deal with. our second deployment model would be cloud hosting. if a library has it staff who can deploy services to the cloud, they could host their own libraryvpn service without needing their own hardware. however, when deploying to the cloud, there will be ongoing costs for running the servers and bandwidth used. figuring out how much bandwidth an average user will consume is part of the data we are hoping to get from our phase 2 testing so we can offer guidelines to libraries who choose to deploy their own libraryvpn service. finally, we are looking at a hosted version of libraryvpn. we anticipate that smaller systems that do not have dedicated servers or it staff will be interested in this option. in this case, there would be ongoing hosting and support costs, but managing the service would not be any more complicated than subscribing to any other service the library hosts for their patrons. libraryvpn is a new project that is pushing library services outside of the library to where the library is. we want to make sure that all of our patrons are protected, not just those with the financial ability and technical know-how to get their own vpn service. as librarians, we understand that privacy and intellectual freedom are joined, and we want to maximize both. as the american library association’s code of ethics says, “we protect each library user's right to privacy and confidentiality.” http://www.ala.org/tools/ethics 10980 2019038 editor lita president’s message updates from the 2019 ala midwinter meeting bohyun kim information technology and libraries | march 2019 2 bohyun kim (bohyun.kim.ois@gmail.com) is lita president 2018-19 and chief technology officer & associate professor, university of rhode island libraries, kingston, rhode island. in this president’s message, i would like to provide some updates from the 2019 ala midwinter meeting held in seattle, washington. first, as many of you know, the potential merger of lita with alcts and llama has been temporarily put on hold, due to an initial timeline that was rather ambitious and the lack of time required to deliberate on and resolve some issues in the transition plan to meet that timeline.1 these updates were also shared at the lita town hall during the midwinter meeting, where many lita members spent time discussing topics such as the draft mission and vision statements for the new division, what makes people feel at home in a division, in which areas lita should redouble its focus, and which activities lita may be able to set aside without losing its identity. valuable feedback and thoughts were provided by town hall participants. 
in that feedback, many emphasized the importance of building and retaining a community of library technologists around lita values, programming, resources, advocacy, service activities, and networking opportunities. the merger-related discussion is to resume this spring, and the leadership of lita, alcts, and llama will make every effort to ensure the best future for the three divisions at this time of great flux and change. second, lita is looking into introducing some changes to the lita forum. in the feedback and thoughts gathered at the lita town hall, the lita forum was also mentioned as one of the valuable lita offerings to its members. the origin of the lita forum goes back to lita's first national conference held in baltimore in 1983.2 since then, the lita forum has become a cherished venue for many library technologists, a place where they meet other like-minded people in the field, learn from one another, share ideas and experience, and look for more ways in which technology can be utilized to better serve libraries and library patrons. initially, the steering committee hoped that all three divisions would participate in putting together the lita forum, with a wider range of content encompassing the interests not only of lita members but also of those in alcts and llama, and in a virtual format in order to engage more members who cannot easily travel; the event was to be held some time in spring 2020. at the time this idea was conceived more than a year ago, it was assumed that all preparations for the member vote regarding the merger would have been nearly completed by the time of the midwinter meeting. however, the steering committee unfortunately ran out of time for that preparation. merger planning also took up almost the entirety of the time that the leadership and the staff of the three divisions had available. this resulted in an unfortunate delay in proper forum planning. with the merger conversation on hold at this point and the new timeline for the merger likely to be set back by at least a year, the changed circumstances for the forum planning had to be reviewed. after a lively and thoughtful discussion at the midwinter meeting, the lita board decided that, considering how much work remains to be done regarding merger planning, it may not be practical or feasible to have the next lita forum be the first virtual and joint one. however, there was a lot of interest in and excitement about trying a virtual format since it will allow lita to reach and serve the needs of more lita members than the traditional in-person meeting. it was also pointed out that the virtual format may provide an opportunity for lita to experiment with different and more unconventional conference program formats, which could be a welcome change for lita members. the lita board, however, also acknowledged the value of a physical conference where people get to meet one another in person, which cannot be easily transferred to a virtual conference. if the virtual conference experiment takes place and is successful, lita may hold its forum alternating every year between two different formats, virtual and physical. planning for and running a fully virtual conference at the scale of a multi-day national forum will require additional time and careful consideration since it will be the first time the lita forum planning committee and the lita office attempt this. logistics management is likely to be quite different in a virtual conference.
attendee expectations and the user experience will also differ significantly between a virtual conference and a physical one. as the first step of this investigation, the lita forum planning committee will explore what the ideal lita virtual forum may look like in terms of programming formats and participant experience. the lita office and the finance advisory committee will also look into the financial side of running the lita forum in a virtual format. at this time, it is not yet determined when the next lita forum will be held and whether it will be a virtual or a physical one. once these investigations are completed, however, the lita board should be able to decide on the most appropriate path towards the next lita forum. stay tuned for what exciting changes may be coming to the lita forum. third, i would like to mention that lita issued a statement regarding the incidents of aggressive behavior, racism, and harassment reported at the 2019 ala midwinter meeting.3 along with the statement, the lita board has decided to commit funds to provide an online bystander / allyship training, which we hope will equip lita members with tools that empower active and effective allyship, recognize and undo oppressive behaviors and systems, and promote the practice of cultural humility, thereby collectively increasing our collaborative capacity. the lita statement and the board decision were received positively by many lita members. other ala divisions such as alcts, alsc, asgcla, llama, united, and yalsa have already expressed interest in working together with lita on this, and the lita board is looking into a few options to choose from. more information about the training will be provided soon. lastly, i am thrilled to announce that the lita president's program at the upcoming ala annual conference in washington, d.c., in june will feature meredith broussard, a data journalist and the author of artificial unintelligence: how computers misunderstand the world, as the speaker. in her book, broussard delves into many problems surrounding techno-chauvinism, a mindset of blind optimism about technology and an abundant lack of caution about how new technologies will be used. she further details how this simplistic worldview, which prioritizes building new things and efficient code above social conventions and human interactions, often misinterprets a complex social issue as a technical problem and results in a reckless disregard for public safety and the public good. reviewing the early history of computing and digital technology, broussard observes: "we have a small, elite group of men who tend to overestimate their mathematical abilities, who have systematically excluded women and people of color in favor of machines for centuries, who tend to want to make science fiction real, who have little regard for social convention, who don't believe that social norms or rules apply to them, who have unused piles of government money sitting around, and who have adopted the ideological rhetoric of far-right libertarian anarcho-capitalists. what could possibly go wrong?"4 i invite all of you to come to this program for more insight and a deeper understanding of what recent technology innovations involving artificial intelligence (ai) and big data mean for our everyday life and where they may be headed.
the program information is available in the ala 2019 annual conference scheduler at https://www.eventscribe.com/2019/alaannual/fspopup.asp?mode=presinfo&presentationid=519109.

endnotes

1 the official announcement can be found at the lita blog. see bohyun kim, "update on new division discussions," lita blog, january 26, 2019, https://litablog.org/2019/01/update-on-new-division-discussions/.
2 stephen r. salmon, "lita's first twenty-five years: a brief history," library information technology association (lita), september 28, 2006, http://www.ala.org/lita/about/history/1st25years.
3 "lita's statement in response to incidents at ala midwinter 2019," lita blog, february 4, 2019, https://litablog.org/2019/02/litas-statement-in-response-to-incidents-at-ala-midwinter-2019/.
4 meredith broussard, artificial unintelligence: how computers misunderstand the world (cambridge, massachusetts: the mit press, 2018), p. 85.

lita president's message: a framework for member success
emily morton-owens

emily morton-owens (egmowens.lita@gmail.com) is lita president 2019-20 and the assistant university librarian for digital library development & systems at the university of pennsylvania libraries.

this column represents my final venue to reflect on our potential merger with alcts and llama before the vote. after a busy midwinter meeting with lots of intense discussions about the steering committee on organizational effectiveness (scoe)'s recommendations, the divisions, the merger, ala finances, and more, my thoughts keep turning in a particularly wonkish direction: towards our organization. so many of the challenges before us hinge on one particular dilemma. for those of us who are most involved in ala and lita, the organization (our committees, offices, processes, bylaws, etc.) may be familiar and supportive. but for new members looking for a foothold, or library workers who don't see themselves in our association, our organization may look like a barrier. moreover, many of our financial challenges are connected to our organization. the organization must evolve, but we must achieve this without losing what makes us loyal members. while ala and lita have specific audiences of library workers and technologists, we have a lot in common with other membership organizations. one of the responsibilities of the lita vice-president is attendance at a workshop put on by the american society of association executives, where we learn how to steward an organization. representatives from many different groups attended this workshop, where i had a chance to discuss challenges with leaders from medical and manufacturing associations, and i learned that these challenges are often orthogonal to the subject matter at hand. everyone was dealing with the need to balance membership cost and value, how to give members a voice while allowing for agile decision-making, and how to put on events that are great for attendees without becoming the only way to get value from membership. hearkening back even further, i worked as a library-school intern at a library with a long run of german- and french-language serials that i retrospectively cataloged. one batch that has always stuck in my mind is the planning materials for international congresses that were held in the early 20th century by the international societies for horticulture and botany.
these events were massive undertakings held at multi-year intervals, gradually planned by international mail. interested parties would receive a lavish printed prospectus, with registration and travel arrangements starting several years in advance. the most interesting documents pertained to the events planned for the mid to late 1930s in europe. these events were cancelled or fell short of intentions because of pre-world war ii political pressures. the congress schedules did not resume until 1950 or later, with some radical changes—for example, german was no longer used as the language of science, and the geographic distribution of events increased significantly in the later 20th century. when i first encountered this material, i was intrigued by how the war affected science. looking back now, i see a dual case study in organizations weathering a crisis whose magnitude we can only imagine, and then reinventing themselves on the other side. both of these organizations still exist and continue to meet, by the way—and i can't help but feel that reinvention is the key to survival. our organizational framework is a key part of the challenge for both ala and lita. i have no doubt that members remain excited about our key issues for advocacy, our subjects for continuing education, and our opportunities for networking. but we have concerns about how we make those things happen. in lita, for example, continuing education requires a massive effort on the part of both member volunteers and staff to organize. we need to brainstorm relevant topics, recruit qualified instructors, schedule and promote the events, and finally run the sessions, collect feedback, and arrange payment for the instructors. this takes the time of the same people we'd like to have creating newsletters and booking conference speakers. meanwhile, right across the hall at ala headquarters, we have staff from alcts and llama doing the same things. these inefficiencies hit at the heart of our financial problems. at the ala level, scoe has proposed ideas like a single set of dues structures for divisions, and a single set of policies and procedures for all round tables. these changes would reduce the overhead required to operate these groups as unique entities, a financial benefit, while also making it easier for members to afford, join, and move between them, a membership benefit. that framework also offers us an opportunity to improve our associations. members have been asking how the association can act more responsively on issues of diversity, equity, and inclusion—for example, how can we have incident response that is proactive and sensitive to member needs while recognizing the complexities of navigating that space as a member-based organization. this is a chance to live up to our aspirations as a community. the actions lita has taken to extend all forms of participation to members who can only participate remotely/online are a way to make us more accessible to library workers regardless of finances or home circumstances. bylaws and policies may not be the most glamorous part of associations but they are the levers we can employ to change the character of our community. coming back to core, we can observe elements of the plan that are responding to both threats and opportunities. members of alcts, llama, and lita know that financial pressures are a major impetus for the merger effort.
but, in the hope of achieving a positive reinvention, the merger planning steering committee put most of its emphasis on the opportunity side. the diagram of intersecting interests for core's six proposed sections (https://core.ala.org/core-overlap/) is a demonstration of the new frontiers of collaboration that core will offer members. the proposed structure of core retains committees while also offering a more nimble way to instantiate interest groups. moreover, the process of creating core reflects the kind of transparent process we want to see in the future. the steering committee and the communications sub-committee crossed not just the three divisions but also different levels of experience and types of prior participation in the divisions. the communications group answered freeform questions, held twitter amas, and held numerous forums to collect ideas and feelings about the project. zoom meetings and twitter are not new media, but the sustained effort that went into soliciting and responding to feedback through these channels is a new mode for our divisions. the lita board recently issued a statement (https://litablog.org/2020/02/news-regarding-the-future-of-lita-after-the-core-vote/) explaining that if the core vote does not succeed, we don't see a viable financial path forward and will be spending the latter half of 2020 and the beginning of 2021 working toward an orderly dissolution of lita. it is tempting to approach this crossroads from a place of disappointment or fear. we cannot yet say precisely what it will be like to be a member of core. but when i look at the organizational structure core offers us, i feel hopeful about it being a framework in which members will find their home and flourish. the new division includes what we need for a rich member experience coupled with a streamlined structure that makes it easier to be involved in the ways and to the extent that make sense for you. in fifty years, perhaps a future member of core will be writing a letter to their members, looking back at this moment of technological and organizational disruption and reflecting on how we reinvented our organization at the moment it needed it most.

the shared cataloging system of the ohio college library center
frederick g. kilgour, philip l. long, alan l. landgraf, and john a. wyckoff: ohio college library center, columbus, ohio

development and implementation of an off-line catalog card production system and an on-line shared cataloging system are described. in off-line production, average cost per card for 529,893 catalog cards in finished form and alphabetized for filing was 6.57 cents. an account is given of system design and equipment selection for the on-line system. file organization and programs are described, and the on-line cataloging system is discussed. the system is easy to use, efficient, reliable, and cost beneficial.

the ohio college library center (oclc) is a not-for-profit corporation chartered by the state of ohio on 6 july 1967. ohio colleges and universities may become members of the center; forty-nine institutions are participating in 1971/72.
the center may also work with other regional centers that may "become a part of any national electronic network for bibliographic communication." the objectives of oclc are to increase the availability to individual students and faculty of resources in ohio's academic libraries, and at the same time to decrease the rate of rise of library costs per student. the oclc system complies with national and international standards and has been designed to operate as a node in a future national network as well as to attain the more immediate target of providing computer support to ohio academic libraries. the system is based on a central computer with a large, random access, secondary memory, and cathode ray tube terminals which are connected to the central computer by a network of telephone circuits. the large secondary memory contains a file of bibliographic records and indexes to the bibliographic record file. access to this central file from the remote terminals located in member libraries requires fewer than five seconds. oclc will eventually have five on-line subsystems: 1) shared cataloging; 2) serials control; 3) technical processing; 4) remote catalog access and circulation control; and 5) access by subject and title. this paper concentrates on cataloging; the other subsystems are not operational at the present time. figure 1 presents the general file design of the system. the shared cataloging system has been the first on-line subsystem to be activated, and the files and indexes it employs are depicted in figure 1 by the heavy black lines and arrows. as can be seen in the figure, much of the system required for shared cataloging is common with the other four subsystems.

[figure 1. general file design; shared cataloging subsystem in heavy lines.]

the three main goals of shared cataloging are: 1) catalog cards printed to meet varying requirements of members; 2) an on-line union catalog; and 3) a communications system for requesting interlibrary loans. in addition, the bibliographic and location information in the system can be used for other purposes such as book selection and purchasing. the only description of an on-line cataloging system that had appeared in the literature during the development of the oclc system is that of the shawnee mission (kansas) public schools (1). the shawnee mission cataloging system produces uniform cards from a fixed-length, non-marc record. the oclc system uses a variable-length marc record and has great flexibility for production of cards in various formats. there are a number of reports describing off-line catalog card production systems, including systems at the georgia institute of technology (2), the new england library information network (nelinet) (3), and the university of chicago (4). the flexibility of the oclc system distinguishes it from these three systems as well.

catalog card production-off-line

an off-line catalog card production system based on a file of marc ii records was activated a year before the on-line system (5).
oclc supplied member libraries with request cards (punch cards prepunched with symbols for each holding library within an institution). for each title for which catalog cards were needed, members transcribed library of congress (lc) card numbers onto a request card. members sent batches of cards to oclc at least once a week. at oclc, the lc card numbers were keypunched into the cards and new requests were combined with unfilled requests to be searched against the marc ii file. by the spring of 1971, over 70 percent of titles requested were found the first time they were searched. the selected marc ii records were then submitted to a formatting program that produced print images on magnetic tape for all cards required by a member library. the number of cards to be printed was determined by the number of tracings on the catalog record and the number of catalogs into which cards were to go, including a regional union catalog (the cleveland regional union catalog) and the national union catalog. individual cards were formatted according to options originally selected by the member library. these options included: 1) presence or absence of tracings and holdings information on each of nine different types of cards; 2) three different indentions for added entries and subject headings; 3) a choice of upper-case or upper- and lower-case characters for each type of added entry and subject heading; and 4) many formats for call numbers. oclc returned cards to members in finished form, alphabetized within packs for filing in specific local catalogs. the primary objective of off-line operation was the production of catalog cards at a lower cost than manual methods in oclc member libraries. early activation of off-line catalog card production did reduce costs and gave some members an opportunity to take advantage of normal staff turnover by not filling vacated positions in anticipation of further savings after activation of the on-line system. other objectives of off-line operation were the automated simulation of on-line activity in member libraries and development and implementation of catalog card production in preparation for card production in an on-line operation. the number of catalog card variations required by members, even after members had reviewed and accepted detailed designs of card products, proved to be higher than anticipated. more than one man-year was expended after activation of the off-line system in further development and implementation to take care of the formats and card dissemination variations requested by specific libraries. the one year advance start on catalog production made possible by using marc ii records in the off-line mode proved to be a far greater blessing than anticipated, for it would have been literally impossible to have activated on-line operation and catalog card production simultaneously. a major goal of oclc card production is elimination of uniformity required by standardized procedures. the oclc goal is to facilitate cooperative cataloging without imposing on the cooperators. the cost to attain this goal is slight, for although there is a single expense to establish a decision point in a computer program, the cost of selection among three or thirty alternatives during program execution is infinitesimal. design of catalog cards and format options began four months before off-line activities.
two general meetings of the oclc membership were held at which card formats were reviewed and agreed upon in a general sense. next, the oclc staff published a description of catalog card production and procedures for participation (6). this publication was reviewed by the membership and format variations were reported for inclusion in the procedure. members reported few variations at this time, but when implementation for individual members was undertaken, it was necessary to build many additional options into the computer programs. to assist the oclc staff in defining options for off-line catalog products and on-line procedures, an advisory committee on cataloging was established. this committee met several times and provided much needed guidance and counsel. the catalog card format options that members could select were extensive. for example, although the position of the call number was fixed in the upper left-hand corner of the card, there were 24 basic formats for lc call numbers, and libraries using the dewey decimal classification could format their call numbers as they wished. in general, the greatest number of format options are associated with call numbers, probably because there has never been a standard procedure for call number construction.

programs

because designing, writing, coding, and debugging of catalog card production programs can cost tens of thousands of dollars, oclc sought existing card production programs that could run on computers at ohio state university, which is the generous host of the ohio college library center. only two programs were located that could both produce cards in the manner required by oclc and run on osu computers. card production costs were not available for one of the programs, but because analysis suggested that the design of the program would create very high card costs, this program was not selected. the other program had been written and used at the yale university library, and although the card production costs were high, it was known that changes could be made to increase efficiency. thus, arrangements were made to obtain and run the yale programs at osu. members were free to choose a variety of format options and submitted on a catalog profile questionnaire (figure 2) their specifications for each catalog. holdings information and tracings could be printed on any or all of nine types of cards: 1) shelf list; 2) main entry; 3) topical subject; 4) name as subject; 5) geographic subject; 6) personal and corporate added entries; 7) title added entry; 8) author-type series added entry; and 9) title-type series added entry. subject headings and added entries could have top-of-card or bottom-of-card placement and could be printed in all upper-case or in upper- and lower-case characters. any type of subject heading and added entry could begin at the left edge of the card or at the first, second, or third indention. other options are described in the manual for oclc catalog card production (5). the data received on catalog profile questionnaires were transferred to punch cards, and a computer program written in snobol iv embedded the information in the form of a pack definition table (pdt) in one of the principal catalog production programs, named convert (cnvt). each pdt defined the cards to go into the catalogs of one holding library, a holding library being a collection with its own catalog.
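to make the idea of a pack definition table more concrete, here is a small illustrative sketch in python. the field names and data are invented for clarity; the real pdt was a table embedded in the cnvt assembler program, so this is only a reading of how per-catalog options of the kind listed above, combined with the tracings on a record, determine how many card images a title generates.

    # hypothetical illustration of a pack definition: which card types one
    # receiving catalog gets and how they are formatted. field names are invented.
    pack_definition = {
        "holding_library": "main library",
        "receiving_catalog": "public catalog",
        "card_types": ["main entry", "topical subject", "added entry", "title added entry"],
        "print_tracings_on": {"shelf list", "main entry"},
        "indention_for_added_entries": 2,        # 0 = left edge, 1-3 = indentions
        "added_entry_case": "upper and lower",   # or "upper"
    }

    def card_images_for_title(subject_headings: int, added_entries: int, packs: list) -> int:
        """count card images for one title: one card per applicable card type per pack,
        with one card per subject heading or added entry where those types are selected."""
        total = 0
        for pack in packs:
            for card_type in pack["card_types"]:
                if card_type.endswith("subject"):
                    total += subject_headings
                elif card_type == "added entry":
                    total += added_entries
                else:
                    total += 1          # shelf list, main entry, title added entry, etc.
        return total

    shelf_pack = {**pack_definition, "receiving_catalog": "shelf list", "card_types": ["shelf list"]}
    print(card_images_for_title(subject_headings=2, added_entries=1,
                                packs=[pack_definition, shelf_pack]))   # -> 6

a title with two subject headings and one added entry sent to these two hypothetical packs yields six cards, in line with the average of about six cards per title reported later in this article for one university library.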
the first major program in the processing sequence was prepros, which was written in ibm 360 basic assembler language (bal) and run on an ibm 360/75. prepros converted records from the weekly marc ii tapes to an oclc internal processing format, including conversion of marc ii characters from ascii to ebcdic code. this program also parsed lc call numbers and partially formatted them. it also checked for end-of-field and end-of-record characters and verified the length of record. finally, it wrote the output records in lc card number sequence into huge variable format blocks of 20,644 characters. the large blocks reduced computer costs since the pricing algorithm employed on the ibm 360/75 imposed a charge for each physical read and write operation. the magnetic tape output weekly by prepros was then submitted to cnvt together with the old master file of bibliographic records in lc card number order and a file of request cards that had been sorted in lc card number order. cnvt merged the records on the weekly tape with the master file and then matched the requests by lc card number. when a match was obtained, cnvt deleted some fields from the bibliographic record and formatted the call number according to the specifications of the library that had originated the request. it then wrote the modified record and associated pdt's onto an output tape in external ibm 7094 binary-coded-decimal (bcd) character code with the record format converted to that of the yale bibliographic system. the second principal product of cnvt was the new master tape of bibliographic records that would become the old master for the next week's run. cnvt also punched out a card bearing the lc card number for each request card for which there was a match. these punch cards were used to withdraw cards from the request card file so that they would not be submitted again. cnvt was first run on an ibm 360/50. the tape file of modified records and pdt's was then submitted to expand, a modified yale program written in mad and run on an ibm 7094.

[figure 2. catalog profile questionnaire. the form asks the member to define the pack for a receiving catalog, to name the holding library or collection whose cards the pack contains, to name the receiving catalog into which the pack will go, and, if that catalog is not in the holding library, to give the stamp to appear above the call number.]

by combining the number of tracings and pdt requirements, expand developed a card image for each catalog card required by the requesting library. it also prepared a sort tag for each image so that the image could be subsequently sorted by library into packs and alphabetized within each pack. expand essentially did the formatting of catalog cards except for the complex lc call number formatting carried out by cnvt. the file of card images was passed to a program named build print tape (bldpt) written in bal and run on the ibm 360/75. bldpt first converted the external ibm 7094 bcd characters to ebcdic.
next bldpt sorted the images, and finally, it arranged the images on a single tape to allow printing on continuous, two-up catalog card forms; the first half of the sorted file was printed on the left-hand cards and the second half on the right. the print program was also written in bal but run on an ibm 360/50. it was designed so that either the entire file or a segment as small as four cards could be printed; the latter feature was of greatest use in reprinting cards that for one of several reasons were not satisfactorily printed during the first run. cards were printed six lines to an inch, and the print train used was a modified version of the train designed by the university of chicago, which in turn was a modified version of the ibm tn train. the printer attached to the ibm 360/50 was an ibm 1403 n1 printer. this printer appears to be superior to any other high-speed printer currently available, but to obtain a product of high quality, it was necessary to fine-tune the printer, to use a mylar ribbon from which the ink does not flake off, and to experiment with various mechanical settings to determine the best setting for tension on the card forms and for forms thickness. above all, patience in large amounts was required during initial weeks when it seemed as though a messy appearance would never be eliminated. oclc off-line catalog card production programs were written in assembler language and higher level languages. use of higher level languages for character manipulation incurs unnecessarily high costs. therefore, for a large production system like oclc, it is absolutely required that processing programs and subroutines that manipulate all characters, character by character, be written in an assembler language to obtain efficient programs that run at low cost. programs that do not manipulate characters, such as the oclc program for embedding pdt's in cnvt, may well be written in a higher level language.

materials and equipment-a summary

off-line catalog production was based on availability of marc ii records on magnetic tapes disseminated weekly by the library of congress. without the marc ii tapes, the off-line procedure could not have operated. each week, the new marc ii records were added to the previous cumulated master file, also on magnetic tape, and previously unfilled and new requests were run against the updated file. osu computers employed were an ibm 360/75, an ibm 360/50, an ibm 7094, and an ibm 1620. the run procedure was complex and therefore somewhat inefficient, but this inefficiency was traded off against a predictably high expense to write a new card formatting program. members submitted a request for card production on a punch card on which the member had written an lc card number. members could specify a recycling period of from one to thirty-six weeks for running their request cards against the marc ii file before unfulfilled requests would be returned. in general, request cards bore lc card numbers for that section of the marc ii file that was complete; at first, the file was inclusive for only "7" series numbers, but in early 1971 the recon file for "69" numbers was added. request cards often numbered several thousand a week. catalog card forms are the now-familiar two-up, continuous forms with tractor holes along each side for mechanical driving. the card stock is permalife, one of the longest-lived paper stocks available.
a thin slit of about one thirty-second of an inch in height converts each three-inch vertical section of card stock to 75 mm. the lowest price paid in a lot of a half million cards has been $8.065 per thousand. after having been printed, the card forms are trimmed on a modified uarco forms trimmer, model number 1721-1. this trimmer makes four continuous cuts in the forms and produces cards with horizontal dimensions of 125 mm. cards are stacked in their original order as printed and are therefore in filing order. the trimmer operates at quoted speeds of 115 and 170 feet per minute, or 920 and 1,360 cards per minute. measurements of speeds of operations confirmed these ratings.

results

the off-line catalog production system produced 529,893 catalog cards from july 1970 through august 1971 at an average cost of 6.57 cents per card. this cost includes over twenty separate cost elements plus a three-quarter cent charge for overhead. the firm of haskins & sells, certified public accountants, reviewed the costing procedures that oclc employs, found that all direct costs were being included, and recommended the three-quarter cent overhead charge. the number of extension cards varies from library to library depending almost entirely on the types of cards on which libraries have elected to print tracings. however, one university library with a half-dozen department libraries and requiring tracings on only shelf list and main entry cards averages approximately six cards per title. cataloging using the oclc off-line system results in a decrease of staff requirements, and some libraries that used the system during most of the year found that they needed fewer staff in cataloging. reduction of staff by taking advantage of normal staff turnover facilitated financial preparation for the oclc on-line system in these libraries.

evaluation

despite the obvious inefficiencies generated by running production computer programs on four different computers in two different locations, and despite inefficiencies in the programs themselves, computer costs to process marc ii tapes and to format catalog cards, but not to print them, were 2.27 cents per card. as will be shown later, newer and more efficient programs have halved this cost, but even at 2.27 cents per card for formatting and .33 cents per card for printing, the cost of oclc off-line card production is less than half the cost of more traditional card production methods (7). two features originally designed into the system were never implemented, somewhat diminishing the usefulness of the system for some libraries. one of the unimplemented features was a technique for deleting, changing, or adding a field to a marc record (this capability exists in the on-line system). absence of this procedure meant that libraries had to accept lc cataloging without modification except to call numbers. the second missing feature was the ability to print multiple holding locations on cards (this capability also exists in the on-line system), although it was possible to print multiple holdings in one location. this deficiency limited the usefulness of the system for large libraries processing duplicates into two or more collections. both of these features could have been activated, but shortage of available time prior to activation of the on-line system prevented their implementation. figure 3 shows the high quality of the catalog cards produced.
subsequent to attainment of this level of quality, there have been no complaints from members except in cases where a piece of chaff from the card forms went through the printer and caused omission of characters. oclc continues to vary the design of its continuous forms to achieve completely chaff-free stock. the shortest possible time in which cards could be received by the member library after submitting a request card was ten days, but it is doubtful that this response time was often achieved. the minimum average response time for the three-quarters of requests for which a marc record was located on the first run was two weeks. delays at a computer center or incorrect submission of a run could extend this delay to three and four weeks, and unfortunately such delays were cumulative for subsequent requests until the "weekly" runs were made sufficiently more often than weekly to catch up. if another delay occurred during a catch-up period, the response time further degraded. during the fourteen months of operation, there were two serious delays. the amount of normal turnover that occurred in oclc libraries during the fourteen months and that was taken advantage of by not filling positions was too small to reduce the financial burden incurred in starting up the on-line system. a few libraries demonstrated that it was possible to take advantage of such attrition. however, 20 percent of the libraries did not participate in the on-line system, and perhaps half of those who did participate were uncertain as to whether the on-line cataloging system would operate or would operate at a saving. when feasibility of on-line shared cataloging has been substantiated and other centers begin to implement similar systems, it should be possible to activate off-line catalog production sufficiently in advance of on-line implementation to enable participants to take adequate advantage of normal attrition to minimize, or nearly eliminate, additional expenditures. experience such as that of oclc will enable new centers to calculate the number of months of off-line production required to reduce salary expenditures by an amount needed to finance the on-line system.

shared cataloging-on-line

the cataloging objectives of the on-line shared cataloging system are to supply a cataloger with cataloging information when and where the cataloger needs the information and to reduce the per-unit cost of cataloging. catalog products of the system are the same as those of the off-line system: catalog cards in final form, alphabetized for filing in specific catalogs; the on-line system is not limited to marc ii records but also allows cataloging input by member libraries.
(the harbrace series in business and econo•i cso) dc 430.5 • z9 c34 oako intersectoral capital ~tows in the econoaic dewelopaent o~ taiwan, 1 89 51960. lee, tena-hui • intersectoral capital ~lows in th e econoaic developaent o~ taiwan, 1 8&~1960. ithaca (n.y.] cornell univ ers it y press [ 1971 1 xxt 197 p• 23 cao an out&rowth o~ the author's the~is , cornell oniwersit:y, 1968. bibliography: p• (183)-1 8 1. 0 a"rnt 76-1 59031 ( fl(;uh e fig. 3. computer-produced catalog cards. hed uced 25%) dates all cataloging done in modern european alphabets, builds a union catalog of holdings in oclc member libraries as cataloging is done. one library, wright state university, is converting its entire catalog to machinereadable form in the oclc on-line catalog. the third major goal is a communications system for transacting interlibrary loans. system design and equipment selection figure 4 depicts the basic design of computer and communication comshared cataloging systemf kilgour, et al. 167 ponents for th e comprehensive system comprised of the five subsystems described in th e introduction. the machine system for shared cataloging was designed to be a subsystem of the total system so that subsequent modules could be added with minimal dismption . similarly, the logical d esign of the shared cataloging subsystem was constructed so that the modules of shared cataloging would be common to the remaining file requirements as shown in figure 1. design of the on-line shared cataloging system began with a redefinition of the catalog products of off-line catalog production ( 5) . in this exercise, the advisory committee on cataloging, comprised of members from seven libraries, contributed valuable assistance. the committee was also most helpful in designing the formats of displays to appear on terminal screens. important decisions in the design of the computer, communications, and terminal systems were those involving mass storage devices and terminals. random access storage was the only type feasible for achieving the objective of supplying a user with bibliographic information when and where he needed it. hence, random access memory devices were selected for the comprehensive system and ipso facto for shared cataloging. data channel system file catalog f1 l e data channe i memory drive contr ol data channe l ----connect1on made 1f cpu #i malfunct ions ·connect1on made if cpu #2 ma l funct1ons fig. 4. computer and communication system. 168 ]oum.al of library automation vol. 5/ 3 september, 1972 the cathode ray tube (crt) type of terminal was selected primarily because of its speed and ease of use by a cataloger. crt terminals are far more flexible in operation than are typewriter terminals from the viewpoint of both the user and machine system designer. for these reasons, crt terminals can enhance the amount of work done by the system as a whole. it was originally planned to select a computer without the assistance of computerized simulation, but in the course of time, it became clear that it was impossible to cope with the interaction among the large number of variable computer characteristics without computerized simulation. therefore, a contract was let to comress, a firm well known for its work in computer simulation. ten computer manufacturers made proposals to oclc for equipment to operate the five subsystems at peak loading (an average five requests per second over the period of an hour ) . 
all ten proposed computer systems failed because simulation revealed inefficiencies in their operating systems for oclc requirements. oclc and comress staff then proposed a modification in operating systems, which the manufacturers accepted. the next series of trials revealed that more than half of the computers or secondary memory files would have to be utilized over 100 percent of the time to process the projected traffic. as a result of these findings , one computer manufacturer withdrew its proposal, and five others changed proposals by upgrading their systems. on the final simulation runs, the percent of simulated computer utilization ranged from 19.70 percent to 114.31 percent. a subsequent investigation of predictable delays due to queuing in such a system showed that unacceptable delays could arise if computer utilization rose above 30 percent at peak traffic. three manufacturers proposed computer systems that were under 30 percent utilization and, for these, a trade-off study was made that included such characteristics as cost, reliability, time to install the applications system, and simplicity of program design. the findings of the simulation and trade-off studies provided the basis of the decision to select a xerox data systems sigma 5 computer. major components of the oclc sigma 5 are the central processing unit (cpu), three banks of core memory with a total capacity of 48 thousand 32-bit words or 192 thousand 8-bit bytes, a high speed disk secondary memory, 10 disk-pack spindles with total capacity of 250,000,000 bytes plus two spare spindles, two magnetic tape drives, two multiplexor channels, five communications controllers, a card reader, card punch, and printer. the character code is ebcdic. figure 5 illustrates the sigma 5 configuration at oclc. in this configuration, the burden of operating communications does not fall on the cpu so that there is no requirement for "cycle stealing" that slows processing by a cpu. the lease cost to oclc of the equipment represented in figure 5 is $16,317 monthly. the listed monthly lease of the equipment is $21,421 from which an educational discount of 10 percent is deducted. (the remaining difference is due to a rebate because the original order included secondary shared cataloging system j kilgour, et al. 169 memory units that xds was to obtain from another manufacturer who proved incapable of supplying units that fulfilled specifications. hence, xds was forced to supply other memory units having a higher list price but has done so at a cost per bit of the units originally ordered.) the printer furnished with the sigma 5 does not provide the high-quality printing required for library use. at the present time, oclc prints catalog cards on an osu ibm 1403-n1 printer that without doubt provides the highest quality printing currently available from a line printer. however, oclc is designing an interface between a sigma 5 and an ibm 1403 memory bonk no. i --dolo ---control memory bonk no. 2 memory bonk no. 3 i i i --------------~ l i 1 r----------r sigma 5 cpu multiplexor opera! or's console cord punch magnetic tope units cord reader dolo bose disk bonk no. i doto bose disk bonk no. 2 _______ !j bus-shor in g 1---+----, mull iplexor fig. 5. xds sigma 5 configuration. 170 journal of library automation vol. 5/3 september, 1972 printer; xds is also developing a new type of printer that will provide high quality output. when the sigma 5 can produce quality printing, it will be fully qualified to be used for nodes in national networks. 
as has already been stated, the crt-type terminal was selected because of its ease of use. moreover, the simulation study confirmed that crt terminals would place far less burden on the central computer and therefore, for the oclc system, would make possible selection of a less expensive computer than would be required to drive typewriter terminals. although typewriter terminals cost less, the total cost could be higher for a system employing typewriter terminals than for one using crt's because of greater central computer expense. library requirements for a crt terminal are: 1) that the terminals have the capability of displaying upper- and lower-case characters and diacritical marks; 2) that the image on the screen be highly legible and visible; 3) that the terminal possess a large repertoire of editing capabilities; and 4) that interaction with the central computer and files be simple and swift. system requirements were: 1) that the terminal accept and generate ascii code; 2) that it make minimal demands for message transmissions from and to the central site; 3) that it have the capability of operating with at least a score of other terminals on the same dedicated line; and 4) that its cost, including service at remote sites, be about $150 per month. data were collected on crt's produced by fifteen manufacturers, and three machines were identified as being prime candidates for selection. oclc carried out a trade-off study in which thirty-three characteristics were assessed for these three machines. one of the thirty-three (reliability) could not be judged for any of the three because none had yet reached the market. for the remaining characteristics, the irascope lte excelled or equaled the other two terminals for twenty-eight characteristics, including all nineteen characteristics of importance to the oclc user. moreover, the irascope was outstandingly superior in its ability to perform continuous insertion of characters, line wrap-around during insertion of characters, repositioning of characters so that each line ends in a complete word, and full use of its memory. however, the irascope was the most expensive at $175 a month, as compared with $153 and $166. nevertheless, the irascope was selected because of its obvious superiority. pilot operation by library staffs has not produced complaints concerning visibility or operability; complaints during pilot operation have sprung from failures caused by a variety of bugs in telephone systems and a couple of bugs in the terminals that were subsequently exterminated. the number of terminals needed by a member library for shared cataloging was calculated on the assumption that six titles could be processed per terminal-hour. it was also assumed that a library might have only one staff member to use the terminal throughout the year. it was further assumed that as much as three months of the terminal operator's time would be lost to vacations, sick leave, and breaks. at the rate of six titles per terminal-hour and with 2,000 working hours in a year, 12,000 titles would be processed annually assuming full-time use. since only nine months was assumed to be available, it was estimated that 9,000 titles would be processed at each terminal. in large libraries where there would be more than one staff member to operate a terminal, there would be three months of time available to do input cataloging, and since only a few libraries will be obtaining less than 75 percent of cataloging from the central system, it appears that a formula of one terminal for every 9,000 titles or fraction thereof cataloged annually would give each library sufficient terminal-hours. in actual operation, operators have been able to work at twice the assumed rate of six titles per terminal-hour, so there is reason to believe that these guidelines will provide adequate terminal capability.
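expressed as a small calculation (illustrative only; the figures simply restate the assumptions above), the terminal guideline works out as follows.

    # one terminal per 9,000 titles (or fraction thereof) cataloged annually,
    # derived from 6 titles per terminal-hour and roughly 1,500 productive
    # terminal-hours (nine months of a 2,000-hour year) per terminal.
    import math

    TITLES_PER_TERMINAL_HOUR = 6
    PRODUCTIVE_HOURS_PER_YEAR = 1500
    TITLES_PER_TERMINAL_YEAR = TITLES_PER_TERMINAL_HOUR * PRODUCTIVE_HOURS_PER_YEAR  # 9,000

    def terminals_needed(annual_titles_cataloged: int) -> int:
        return math.ceil(annual_titles_cataloged / TITLES_PER_TERMINAL_YEAR)

    print(terminals_needed(25000))   # -> 3 terminals for a library cataloging 25,000 titles a year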
file organization

the primary data that will enter the total system are bibliographic records, and since the system is being designed to conform to standards, the national standard for bibliographic interchange on magnetic tape (8) has been complied with in file design. in other words, the system can produce marc records from records in the oclc file format; more specifically, the system can regenerate marc ii records from oclc records derived originally from marc ii records, although an oclc record contains only 78 percent of the number of characters in the original marc ii record. similarly, the system can generate marc ii records from original cataloging input by member libraries. the simulation study clearly showed that bibliographic data would have to be accessed in the shortest possible time if the system were to avoid generating frustrating delays at the terminal. imitation of library manual files or of standard computer techniques for file searching would not provide sufficient efficiency. oclc, therefore, set about developing a file organization and an access method that would take advantage of the computation speeds of computers. oclc research on access methods has produced several reports (9,10,11) and has developed a technique for deriving truncated search keys that is efficient for retrieval of single entries from large files. these findings have been employed in the present system, which contained over 600,000 catalog records in april 1973, arranged in a sequential file on disks and indexed by a library of congress card-number index, an author-title index, and a title index. the research program on access methods did not, however, investigate methods for storing and retrieving records. research on file organization included experiments directed toward development of a file organization that would minimize processing time for retrieval of entries or for the discovery that an entry is not in the file. since the oclc system is designed for on-line entry of data into the data base, it was not possible to consider a physically sequential file for the index files.
this research also produced two powerful mathematical tools for predicting retrieval behavior of such files, and a design technique for optimizing record blocking in such files so that, on the average, only one to two physical accesses to the file storage device are needed to retrieve the desired information. the files displayed in figure 1 are constructed by a single file-building program designed so that additional modules can be embedded in the program. the program accepts a bibliographic record, assigns an address for it in the main sequential file, and places the record at that address. having determined the bibliographic record address, the program next derives the author-title search key and constructs an author-title index file entry which contains the pointer to the bibliographic record. then the program produces an lc card number index entry and a title index entry, each of which contains the same pointer to the bibliographic record. when a bibliographic record is used for catalog card production, an entry is made in the holdings file. when the first holdings entry is made for a bibliographic record, a pointer to the holdings entry is placed in that record; the pointer to each subsequent holdings entry is placed in the previous holdings entry. an entry is made at the same time in the call number index containing a pointer to the holdings entry. this file organization operates with efficiency and economy. the files containing the large bibliographic records and their associated holdings information are sequential, and hence, are highly economical in disk space. the technique used ensures that only a low percentage of available disk area need be reserved for growth of these large sequential files. disk units can be added as needed. each fixed-length record in the scatter-store files is less than 3 percent of the size of an average bibliographic record, and since 25 percent to 50 percent of these files are unoccupied, the empty disk area is small because of the small record lengths. sequential files the bibliographic record file and holdings file are sequential files, the holdings file being a logical extension of the bibliographic record file. a record is loaded into a free position made available by deletion of a record or into the position following the last record. whenever a new version of a shared cataloging system/kilgour, et al. 173 record updates the version already in the file, the new record is placed in the same location as the old if it will fit; otherwise, it is placed at the end of the file and pointers in the indexes are changed. there is a third, small sequential file containing unique notes for specific copies, dash entries, and extra added entries. each bibliographic record contains the information in a marc ii record. each record also contains a 128-bit subrecord capable of listing up to 128 institutions that could hold the item described by the record. at the present time, only 49 of the 128 bits are used since there are 49 institutions participating in oclc. the record also includes pointers to entries in index files, so that the data base may be readily updated, and a pointer to the beginning of the list of holdings for the record. in addition, each record has a small directory for the construction of truncated author-title-date entries, which are displayed to allow a user to make a choice whenever a search key indexes two or more records. 
although each bibliographic record includes all information in a standard marc ii record, records in the bibliographic record file have been reduced to 78 percent of the size of the communication record, largely by reducing redundancy in structural information. oclc intends to compress bibliographic records further by reducing redundancy in text, employing compression techniques similar to those described in the literature (13,14). the holdings file contains a string of holdings records for each bibliographic record; individual records are chained with pointers. information in each record includes the identity of the holding institution and the holding library within the institution, a list of each physical item of multiple or partial holdings, the call number, and pointers to the next record in the chain and to the call number index. the last record in the chain also has a back-pointer to the associated bibliographic record. whenever there is a unique note, dash entry, or extra added entry coupled to a holding, that holding has a pointer to a location in the third sequential file in which the note or entry resides. index files indexes include an author-title index, a title index, and an lc card number index. research and development are under way leading to implementation of an author and added author index and a call number index. a class number index will be developed and implemented in the future. with the exception of the class number index, which by its nature is required to be a sequentially accessible file, the oclc indexes are scatter storage files. the construction of and access to a scatter storage file involves the calculation of a home address for the record and the resolution of the collisions that occur when two or more records have the same home address. the calculation of a home address comprises derivation of a search key from the record to be stored or retrieved and the hashing or randomizing of the key to obtain an integer, relative record address that is converted to a storage home address. the findings of oclc research on search keys have been reported (9,10,11). the hashing procedure employs a pseudo-random number generator of the multiplicative type: home address = rem(k · x_s / m), where k is the multiplier 65539, x_s is the binary numerical value of the search key, and m is the modulus, which is set equal to the size of the index file; 'rem' denotes that only the remainder of the division on the right-hand side is used. philip l. long and his associates have shown that the efficiency of a scatter storage file is rapidly degraded when the loading of the file exceeds 75 percent (12); therefore, oclc initially loads files at 50 percent of physical capacity. hence, the modulus is chosen to be twice the size of the initial number of records to be loaded. when 75 percent occupancy is reached, a new modulus is chosen and the file is regenerated. collisions are resolved using the quadratic residue search method proposed by a. c. day (15) and shown to be efficient (12). in this method, a new location is calculated when the home address is full; the first new location has the value (home address + 2), the second (home address + 6), the third (home address + 12), and so on until an empty location is found if a record is being placed in the file, or the end of the entry chain is found if records are being retrieved.
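the hashing and collision-handling scheme just described can be sketched as follows. this is an illustration only, assuming a straightforward key-to-integer conversion; the constant 65539 and the +2, +6, +12, ... probe offsets come from the description above, while the function names and everything else are our own, not the oclc assembly-language routines.

```python
# illustrative sketch of the multiplicative hash and of day's quadratic
# residue probing as described above; not the oclc implementation.

K = 65539  # multiplier quoted in the text

def home_address(search_key, m):
    """home address = rem(k * x_s / m), with x_s the numeric value of the key."""
    x = int.from_bytes(search_key.encode("ascii"), "big")  # assumed conversion
    return (K * x) % m

def probe_sequence(home, m):
    """yield home, home + 2, home + 6, home + 12, ... modulo the file size m."""
    offset, step = 0, 2
    for _ in range(m):             # bounded for the sketch; the real file is
        yield (home + offset) % m  # regenerated before it ever fills
        offset += step
        step += 2

def store(table, key, pointer):
    """place an index entry at the first empty location on the probe path."""
    m = len(table)
    for slot in probe_sequence(home_address(key, m), m):
        if table[slot] is None:
            table[slot] = (key, pointer)
            return slot
    raise RuntimeError("file full; in practice it is rebuilt at 75 percent occupancy")

def lookup(table, key):
    """collect every pointer whose entry matches the key; stop at an empty
    slot, which marks the end of the entry chain."""
    m, matches = len(table), []
    for slot in probe_sequence(home_address(key, m), m):
        entry = table[slot]
        if entry is None:
            break
        if entry[0] == key:
            matches.append(entry[1])
    return matches
```

because the loading is kept between 50 and 75 percent, the entry chains stay short, which is what keeps both hits and misses to a small number of probes.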
when the file size m is a prime having the form 4n + 3, where n is an integer, the entire file may be examined by 1n searches. retrieval techniques the retrieval of a record or records from the oclc data base is achieved in fractions of a second when a single request is put to the file, and rarely exceeds a second when queuing delays are introduced by simultaneous operation of upwards of 50 terminals. response time at the terminal is greater than these figures because of the low communication line data rate, but terminal response time rarely exceeds five seconds. figure 6 shows the map of a record in the author-title index file and the title file. in the author-title file, the search key is a 3,3 key with the first trigram being the first three characters of the author entry and the second being the first three characters of the first word of the title that is not an english article (9). for example, "str,cha" is the search key for b. h. streeter's the chained library. however, any or all of the characters in the trigrams may be all in lower case. the author-title index also indexes title-only entries, but the title index provides a more efficient access to this type of entry. the pointer in the record map in figure 6 is the address of the bibliographic record from which the search key was d erived. the entry chain indicator bit is set to 0 (zero) if there is another record in the entry chain and to 1 if the record is last in the chain. when this bit is 0, the search skips to the next record as calculated by day's skip algorithm. the shared cataloging systemjkilgour, et al. 175 bibliographic record presence indicator bit is set to 0 (zero) to indicate that the bibliographic record associated with this index entry has been deleted; it is set to 1 to indicate that the bibliographic record is present. an author-title search of the data base is initiated by transmission of a 3,3 key from a terminal. a message parser analyzes the message and identifies it as a 3,3 author-title search key by the presence of the comma and by there not being more than three characters on either side of that comma. next, the hashing algorithm calculates the home address and the location is checked for the presence of a record. if no record is present, a message is sent to the terminal stating that there is no entry for the key submitted and suggesting other action to be taken. if a record is present and its key matches the key submitted and if the entry-chain indicator bit signifies that the record at the home address is the only record in the chain, the bibliographic record which matches the key submitted is displayed on the terminal screen. if the entry-chain bit signifies that there are additional records in the chain, those records are located by use of the skip algorithm. if more than one record possesses the same key as that submitted, truncated author-titledate entries derived from the matching bibliographic records are displayed with consecutive numbering on the terminal screen. the user then indicates by number the entry containing information pertaining to the desired work, and the program displays the full bibliographic record. the title-index record has the same map as the author-title record and is depicted in figure 6. the title index is also constructed and searched in entry chain indicator bit 4 bytes bibliographic record pointer nometitle search key bibliographic record presence indicator bit bibliographic record pointer title search key fig. 6. author-1'itte and title index records. 
8 bytes 176 ]ou,-nal of library automation vol. 5/ 3 september, 1972 the same manner as the author-title index. the title search key ( 3,1,1,1) consists of the first three characters of the first word of the title that is not an english article plus the initial character of each of the next three words. commas separate the characters derived from each word. the title search key is "cha,l," for b. h. streeter's the chained libmry, the three commas signifying that the message is a title search key. the bibliographic record pointer and the two indicator bits have the same function as in the authortitle record. figure 7 exhibits the map for a record in the lc card number index. the three left-most bytes in the lc card number section contain an alphabetic prefix to a number where this is present, or, more usually, three blanks when there is no alphabetic prefix. similarly the right-most byte contains a supplement number or is blank. the middle four bytes contain eight digits packed two digits to a byte after the digits to the right of the dash have been, when necessary, left-filled with zeroes to a total of six digits. the dash is then discarded. for example, lc card number 68-54216 would be 68054216 before being packed. the pointer and the two indicator bits have the same function as in the author-title index record. an lc card number search is started with the transmission of an lc card number as the request. the parser identifies the message as an lc card number search by determining that there is a dash in the string of characters and that there are numeric characters in the two positions immediately to the left of the dash. the remainder of the search procedure duplicates that for the author-title index. on-line programs as is the case with all routinely used oclc programs, the on-line programs are written in assembly language to achieve the utmost efficiency in processing. in addition, every effort has been made to design programs to run in the fastest possible time. in other words, one of the main goals of the oclc on-line operation is lowest possible cost. the simulation study had shown that it was necessary to modify the operating system of the xds sigma 5 so that the work area of the operating system would be identical with that of the applications programs. the xds real-time batch monitor, which is one of the operating systems furnished by xds for the sigma 5, has been extensively altered, and one of the alterations is the change to a single work area. another major change to the operating system was building into it the capability for multiprogramming. at the present time, the on-line foreground of the system operates two tasks in that two polling sequences are running simultaneously, and the background runs batch jobs at the same time. this new monitor is called the on-line bibliographic monitor ( obm). an extension of obm is named motherhood (mh); mh supervises the operation of the on-line programs. mh also keeps track of the activities of these programs and compiles statistics of these activities. in addition, mh shared cataloging systemjkilgovr, et al. 177 contains some utility programs such as the disk and terminal 1/0 routines. the principal on-line application program is catalog (cat); its functions are described in detail in the subsequent sections entitled cataloging with existing bibliographic information and input cataloging. in general, cat accepts requests from terminals, parses them to identify the type of request, and then takes appropriate action. 
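the key-derivation and request-classification rules spelled out above (the 3,3 author-title key, the 3,1,1,1 title key, and the lc card number normalization) can be pulled together in one sketch. the python below is an illustration of those rules only; the article list, the function names, and the handling of short titles are assumptions, not the oclc parser.

```python
# illustrative sketch of the search-key and request-parsing rules described
# in the text; not the oclc implementation.

ARTICLES = {"the", "a", "an"}  # assumed list of english articles to skip

def author_title_key(author, title):
    """3,3 key: first 3 characters of the author entry, a comma, then the
    first 3 characters of the first title word that is not an english article."""
    words = [w for w in title.lower().split() if w not in ARTICLES]
    return author.lower()[:3] + "," + words[0][:3]

def title_key(title):
    """3,1,1,1 key: 3 characters of the first non-article word, then the
    initial character of each of the next three words, comma separated."""
    words = [w for w in title.lower().split() if w not in ARTICLES]
    parts = [words[0][:3]] + [w[0] for w in words[1:4]]
    parts += [""] * (4 - len(parts))     # trailing commas when the title is short
    return ",".join(parts)

def normalize_lccn(number):
    """left-fill the digits right of the dash to six and drop the dash,
    e.g. '68-54216' -> '68054216'."""
    left, right = number.split("-")
    return left + right.zfill(6)

def classify(request):
    """decide which index a terminal message addresses, per the rules above."""
    if "-" in request:
        left = request.split("-")[0]
        if len(left) >= 2 and left[-2:].isdigit():
            return "lc card number search"
    if request.count(",") == 3:
        return "title search"            # three commas signal a 3,1,1,1 key
    if request.count(",") == 1:
        a, b = request.split(",")
        if len(a) <= 3 and len(b) <= 3:
            return "author-title search"
    return "unrecognized"

# author_title_key("streeter", "the chained library") -> "str,cha"
# title_key("the chained library") -> "cha,l,,"  (the three commas mark a title search)
# normalize_lccn("68-54216") -> "68054216"
```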
if a request is for a bibliographic record, cat identifies it as such, and if there is only one bibliographic record in the reply, cat formats the record in one of its work area buffers and sends the formatted record to the terminal for display. if more than one record is in the reply, cat formats truncated records and puts them out for display. after a single bibliographic record has been displayed, cat modifies the computer memory image of the record in accordance with update requests from the terminal. for example, fields such as edition statement or subject headings may be deleted or altered, and new fields may be added. when the request is received from the terminal to produce catalog cards from the record as revised or unrevised, cat writes the current computer memory image of the record onto a tape to be used as input to the catalog card production programs. the catalog card production programs operate off-line, and the first processing program is convert ( cnvt), which formats some of the fields and call numbers. the major activity of cnvt is the latter, for libraries require a vast number of options to set up their call numbers for printing. cnvt also automatically places symbols used to indicate oversized books above, below, or within call numbers as required. format is the second program; it receives partially formatted records from cnvt. format expands each record into the total number of card images corresponding to the total cards required by the requesting library 4 bytes bibliographic record pointer lbibliogrophic 8 bytes lc cord number record presence indicator bit entry choin indicator bit fig. 7. library of congress card number index record. 178 ] ournal of libm1·y automation vol. 5/3 september, 1972 for each particular title. format determines this total from the number of tracings and pack definition tables previously submitted by the library that define the printing of formats of cards to go into each catalog. format, which is an extensive revision of expand, contains many options not found in the old off-line catalog card production system. format can set up a contents note on any particular card, and puts tracings at the bottom of a card when tracings are requested. the author entry normally occurs on the third line, but if a subject heading or added entry is two or more lines long, format moves the author entry down on the card so that a blank line separates the added entry from the author entry. in other words, each card is formatted individually. the major benefit of this feature, which allows the body of the catalog data to float up and down the card, is that the text on most cards can start high up on the card, thereby reducing the number of extension cards. the omission of tracings from added entry cards has a similar effect. table 1 presents the percentage of extension cards in a sample of 126,738 oclc cards for 18,182 titles produced for twenty-five or more libraries during a seventeen-day period, compared with extension cards in library of congress printed cards and in a sample of nelinet cards "for over 1,300 titles" ( 16). the table shows that the oclc mixture of cards with and without tracings and with the floating body of text yields about 10.8 percent more extension cards compared to library of congress printed cards. 
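the "about 10.8 percent" comparison at the end of the preceding sentence can be roughly checked against table 1, shown just below, under one plausible reading (an assumption of ours, not a statement from the paper): if what is compared is the share of card sets that run past a single card, then 100 - 77.2 = 22.8 percent of oclc card sets need at least one extension card, against 100 - 87.8 = 12.2 percent for library of congress printed cards, a difference of 10.6 percentage points, in line with the quoted figure once rounding or a slightly different counting convention is allowed for.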
were libraries to restore the original meaning to the phrase "main entry" by printing tracings only on main entry cards, the percentage of extension cards in computer-produced catalog cards printed six lines to the inch would probably be less than for lc cards.

table 1. extension catalog card percentages

number of cards    oclc marc ii cards    library of congress printed cards    nelinet marc ii cards
1                  77.2                  87.8                                 79.9
2                  18.9                  10.0                                 16.7
3                   2.5                   1.6                                  2.5
4                   1.1                    .3                                   .6
5                    .2                    .2                                   .1
6                    .1                                                         .2

format also sets up a sort key for each record, and a sort program sorts the card images by institution, library, catalog, and by entry or call number within each catalog pack. another program, build-print-tape (bpt), arranges the sorted images on tape so that cards are printed in consecutive order in two columns on two-up card stock. finally, a print program prints the cards on an ibm 1403 n1 printer attached to an ibm 360/50 computer. cataloging with existing bibliographic information this section describes cataloging using a bibliographic record already in the central file; the next section, entitled input cataloging, describes cataloging when there is no record in the system for the item being cataloged. the cataloger at the terminal first searches for an existing record, using the lc card number found on the verso of the title page or elsewhere. if the response is negative or if there is no card number available, the cataloger searches by title or by author and title, using the 3,1,1,1 or 3,3 search keys respectively. if these searches are unproductive, the cataloger does input cataloging. when a search does produce a record, the cataloger reviews the record to see if it correctly describes the book at hand. if it is the correct record and if the library uses library of congress call numbers, the cataloger transmits a request for card production by depressing two keys on the keyboard. cataloging is then complete. if the lc call number is not used, the cataloger constructs and keys in a new number and then transmits the produce-cards request. if the record does not describe the book as the cataloger wishes, the record may be edited. the cataloger may remove a field or element, such as a subject heading. information within a field may be changed by replacing existing characters, such as changing an imprint date by overtyping, by inserting characters, or by deleting characters. finally, a new field such as an additional subject heading may be added. when the editing process is complete, the cataloger can request that the record on the screen be reformatted according to the alterations. having reviewed the reformatted version, the cataloger may proceed to card production. when a cataloger has edited a record for card production, the alterations in the record are not made in the record in the bibliographic record file. rather, the changes are made only in the version of the record that is to be used for card production. the edited version of the record is retained in an archive file after catalog card production so that cards may be produced again from the same record for the same library, should the need arise in the future. the author index currently under development will enable a cataloger to determine the titles of works in the file by a given author. the call number index, also currently being developed, will make it possible for a cataloger to determine whether or not a call number has been used before in his library.
the class number index that will be developed in the future will provide the capability of determining titles that have recently been placed under a given class number or, if none is under the number, the class number and titles on each side of the given number. input cataloging input cataloging is undertaken when there is no bibliographic record in the file for the book at hand. to do input cataloging, the cataloger requests that a work form be displayed on the screen (figure 8). the cataloger then proceeds to fill in the work form by keyboarding the catalog data, and transmitting the data to the computer field by field as each is completed. as shown in figure 8, a paragraph mark terminates each field; each dash is to be filled in by the cataloger for each field used. input cataloging may be original cataloging or may use cataloging data obtained from some source other than the oclc system. fig. 8. workform for a dewey library. when the catalog data has been input, revised, and correctly displayed on the terminal screen, the cataloger requests catalog card production. in the case of new cataloging, not only are cards produced, but also the new record is added to the file and indexed so that it is available within seconds to other users. if a marc ii record for the same book is subsequently added to the file, it replaces the input-cataloging record but does not disturb the holdings information. union catalog each display of a bibliographic record contains a list of symbols for those member institutions that possess the title. in other words, the central file is also a union catalog of the holdings of oclc member libraries, although in the early months of operation these holdings data are very incomplete. nevertheless, they will approach completeness with the passage of time and with retrospective conversion of catalog data. titles cataloged during the operation of the off-line system have been included in the union catalog. the union catalog function is an important function of the shared cataloging system, for it makes available to students and faculties, through the increased information available to staff members, the resources of academic institutions throughout ohio. libraries also use the union catalog as a selection tool, since they can dispense with expensive purchases of little-used materials residing in a neighboring library. members also use the file to obtain bibliographic data to be used in ordering. assessment with over nine hundred thousand holdings recorded in the union catalog as of april 1973, it is clear that having this type of information immediately at hand will greatly improve services to students and faculties. enlargement of holdings recorded will enhance the union-catalog value of the system. wright state university is in the process of converting its holdings using the oclc system, and the ohio state university libraries, the largest collection in the state, has already converted its shelf list in truncated form. the osu holdings information will soon be available to oclc members. members using the oclc system report a large reduction in cataloging effort.
two libraries using lc classification report that they are cataloging at a rate in excess of ten titles per terminal hour when cataloging already exists in the system. libraries using dewey classification are experiencing a somewhat lower rate. the original cost benefit studies were done on the basis of a calculated rate of six titles per hour for those books for which there were already cataloging data in the system. the net savings will be realized when the file has reached sufficient size to enable the largest libraries to locate records for 65 percent of their cataloging and for the smallest to find 95 percent. to reach this level, members collectively would have to use 182 journal of library automation vol. 5/3 september, 1972 existing bibliographic information to catalog 350,000 titles in the course of a year, or an average of approximately 1,460 titles for the total system per working day. it was thought that this rate would be attained by the end of the second year of operation. however, at the end of the first month of on-line operation, over a thousand titles per day were being cataloged. the new catalog card production programs operating on the sigma 5 are much more efficient than the programs used in the older off-line system. earlier in this paper it was reported that cost of the older programs to format catalog cards, but not to print them, was 2.27 cents per card. if costs of the sigma 5 are calculated at commercial rates, the new programs format cards at 2.21 cents per card. however, if actual costs to oclc are used and with the total cost being assigned to one shift, the cost of formatting each card becomes 0.86 cents. the total cost of producing catalog cards is, of course, much more than the cost to format them on a computer. nevertheless, either the 2.21 cents or 0.86 cents rate might serve as a criterion for measuring the efficiency of computerized catalog card production. the low terminal response-time delay for the operation of seventy terminals is a good gauge of the efficiency of the on-line system. in particular, the file organization is efficient, for it enables retrieval of a single entry swiftly from a file of over 600,000 records. moreover, no serious degradation in retrieval efficiency is expected to arise as the result of the growth of file size. the system operates from 7:00 a.m. to 7:00 p.m. on mondays through fridays, and at times the interval between system downtimes has exceeded a week. it is rare that the system will be down on successive days, and when a problem does occur, the system can be restored within a minute or two. moreover, when the system goes down, only two terminals will occasionally lose data, and most of the time, there is no loss of data. hence, it can be concluded that the hardware and software are highly reliable. in summary, it can be said that the oclc on-line shared cataloging system is easy to use, efficient, reliable, and cost beneficial. acknowledgments the research and development reported in this paper were partially supported by office of education contract no. oec-0-70-2209 ( 506), council on library resources grant no. clr-489, national agricultural library contract no. 12-03-01-5-70, and an l.s.c.a. title iii grant from the ohio state library board. references 1. ellen washy miller and b. j. hodges, "shawnee mission's on-line cataloging system," ]ola 4:13-26 (march 1971). shared cataloging systemjkilgour, et al. 183 2. john p . 
kennedy, "a local marc project : the georgia t ech library," in proceedings of th e 1968 clinic on library a pplications of data processing. (u rbana , ill.: university of illinois gradu ate school of library science, 1969 ) p . 199-215. 3. new england board of high er education, new england librm·y information netw01·k; final r ep01t on council on library resources grant #443. (feb. 1970 ). 4. charles t. payne and robert s. mcgee, the university of chicago bibliographic data processing system: documentation and report supplement, (chicago, ill. : university of chicago library, april1971). 5. judith hopkins, manual for oclc catalog card production (feb. 1971). 6. ohio college library center, pt·eliminary description of catalog cards produced from marc ii data (sept. 1969). 7. f. g. kilgour, "libraries-evolving, computerizing, personalizing," american libraries 3:141-47 ( feb. 1972). 8. american national standards institute, american national standard for bibliographic information interchange on magnetic tape (new york: american national standards institute, 1971 ). 9. f. g. kilgour, p. l. long, and e. b. l eiderman, "retrieval of bibliographic entries from a name-title catalog by use of truncated search keys," proceedings of the american society for information science 7:79-82 ( 1970 ). 10. f. g. kilgour, p. l. long, e. b. leiderman, and a. l. landgraf, "titleonly entries retrieved by use of truncated search keys," lola 4: 207-210 (dec. 1971 ) . 11. philip l. long, and f. g. kilgour, "a truncated search key title index," lola 5:17-20 ( mar. 1972). 12. p. l. long, k. b. l. rastogi, j. e. rush, and j. a. w yckoff, "large on-line files of bibliographic data : an efficient design and a mathematical predictor of re trieval behavior." ifip congress '71: ljubljana -aug ust 1971. ( amsterdam, north holland publishing co., 1971 ). bookle t ta-3, 145-149. 13. martin snyderman and bernard hunt, "the myriad virtues of text compaction," datamation 16:36-40 (dec. 1970). 14. w. d. schieb er and g. w. thomas, "an algorithm for compaction of alphanumeric data," lola 4 :198-206 (dec. 1970). 15. a. c. day, "full table quadratic searching for scatter storage," communications of the acm 13:481 (aug. 1970). 16. new england board of higher education, new england libmry in formation . .. , p. 100-101. letter to the editor ann kucera information technology and libraries | june 2018 9 https://doi.org/10.6017/ital.v37i2.10407 dear editorial board, regarding “halfway home: user centered design and library websites” in the march 2018 issue of information technology and libraries (ital), i thought there were some interesting points. i think that your assertion, however, that user centered design automatically eliminates anything from a website that your main user group did not expressly ask for is faulty. when someone brings up the fact that user centered design is not statistically significant, i interpret that as a misunderstanding of what user centered design is. our academic library websites are not research projects so why would we gather statistically significant information about them? our academic library websites are (or should be) helpful to students and faculty and constantly changing to meet their needs. if librarians perpetuate a misunderstanding of user centered design, my fear is that misunderstanding could perpetuate stagnation and a refusal to change our technology/user interfaces in a rapidly changing environment and do our patrons and ourselves a disservice. 
user centered design is a set of tools to help us gather information about users and their needs. the information gathered informs the design but does not dictate the design and needs to be part of an iterative process. the web design team at your institution demonstrated user centered design when they added floor maps back into the web site when a group of users pointed out that it was causing problems for the main users at your institution. while valuable experience from librarians and other staff is critical to take into account, it is sometimes difficult to determine which pieces of the puzzle provide comfort to those who work at the library vs. which pieces assist students in their studies. i applaud your willingness to “clear the slate” and reduce the amount of information you were maintaining on your website. i’m guessing you may have removed dozens of links from your website. you only mentioned adding one category of information back into the design. i would say your user centered design process is working quite well. ann kucera systems librarian central michigan university https://doi.org/10.6017/ital.v37i1.10338 the provision of mobile services in us urban libraries ya jun guo, yan quan liu, and arlene bielefield information technology and libraries | june 2018 78 ya jun guo (yadon0619@hotmail.com) is associate professor of information and library science at zhengzhou university of aeronautics, china. yan quan liu (liuy1@southernct.edu) is professor of information and library science at southern connecticut state university. arlene bielefield (bielefielda1@southernct.edu) is professor in information and library science at southern connecticut state university. . abstract to determine the present situation regarding services provided to mobile users in us urban libraries, the authors surveyed 138 urban libraries council members utilizing a combination of mobile visits, content analysis, and librarian interviews. the results show that nearly 95% of these libraries have at least one mobile website, mobile catalog, or mobile app. the libraries actively applied new approaches to meet each local community’s remote-access needs via new technologies, including app download links, mobile reference services, scan isbn, location navigation, and mobile printing. mobile services that libraries provide today are timely, convenient, and universally applicable. introduction the mobile internet has had a major impact on people’s lives and on how information is found located and accessed. today, library patrons are untethered from and free of the limitations of the desktop computer.1 the popularity of mobile devices has changed the relationship between libraries and patrons. mobile technology allows libraries to have the kind of connectivity with their patrons that did not exist previously. patrons no longer think that it is necessary for them to be physically in the library building to use library services, and they are eager to obtain 24/7 access to library resources anywhere using their mobile devices. mobile patrons need mobile libraries to provide them with services. in other words, “patrons want to have a library in their pocket.”2 as a result, libraries around the world are exploring and developing mobile services. according to the state of america’s libraries 2017 report by the american library association, the 50 us states, the district of columbia, and outlying territories have 8,895 public library administrative units (as well as 7,641 branches and bookmobiles). 
the vital role public libraries play in their communities has also expanded.3 as part of the main role of public libraries, us urban libraries need to embrace the developmental trend of the mobile internet to better serve their communities. the provision of mobile services in us urban libraries is worthy of study and is of great significance as a model for how other public libraries plan and implement their mobile services. mailto:yadon0619@hotmail.com mailto:liuy1@southernct.edu mailto:bielefielda1@southernct.edu the provision of mobile services in us urban libraries | guo, liu, and bielefield 79 https://doi.org/10.6017/ital.v37i2.10170 literature review definition and types of mobile devices and mobile services as early as 1991, mark weiser proposed “ubiquitous computing,” pointing out how people could obtain and handle information at anytime, anywhere, and in any way.4 with this expectation, the possibilities of using personal digital assistants (pdas) as mobile web browsers were researched in 1995.5 in combination with a wireless modem, library users are able to use pdas to access information services whenever they are needed. today, mobile devices are generally defined as units small enough to carry around in a pocket, falling into the categories of pdas, mobile phones, and personal media players.6 for many researchers, laptops are not included in the definition of mobile devices. although wireless laptops purportedly offer the opportunity to go “anywhere in the home,” laptops are generally used in a small set of locations, rather than moving fluidly through the home; wireless laptops are portable, but not mobile.7 in contrast, lippincott suggested that mobile devices should include laptops, netbooks, notebook computers, cell phones, audio players such as mp3 players, cameras, and other items.8 according to the “mobile strategy report” by the california digital library, mobile phones, e-readers, mp3 players, tablets, gaming devices, and pdas are common mobile devices.9 each mobile device has its own characteristics and the potential to connect to the internet from anywhere with a wi-fi network, driving widespread use and thus the provision of library mobile services. mobile services are services libraries offer to patrons via their mobile devices. these services as described herein comprise two categories: traditional library services modified to be available via mobile devices and services created for mobile devices.10 pope et al. listed several mobile services, including sms or text-messaging services, the my info quest project, digital collections, audiobooks, applications, and mobile-friendly websites.11 the california digital library pointed out that a growing number of university and public libraries are offering mobile services. 
libraries are creating mobile versions of library websites, using text messaging to communicate with patrons, developing mobile catalog searching, providing access to resources, and creating new tools and services, particularly for mobile devices.12 the most recognized mobile services in university libraries are mobile sites, mobile apps, mobile opacs, mobile access to databases, text messaging services, qr codes, augmented reality, and e books.13 both academic and public libraries’ use of web 2.0 applications and services include blogs, wikis, phone apps, qr codes, mash-ups, video or audio sharing, customized webpages, social media and social networking, and types of social tagging.14 this study focuses on the two most common mobile devices, mobile phones and tablets, and on the services provided to library patrons and local communities through mobile websites, mobile apps, and mobile catalogs. status of mobile services in us libraries mobile devices present a new and exciting opportunity for libraries of all types to provide information to people of all ages on the go, wherever they are.15 it is generally observed that there is an increased use of mobile technology in the library environment. information technology and libraries | june 2018 80 librarians see their users increasingly using mobile phones instead of laptops and desktop computers to search the catalog, check the library’s opening hours, and maintain contact with library staff.16 in an earlier investigation of 766 librarians, spires found that there was very little demand for services for mobile devices as of august 2007. at that time, relatively few libraries (18%) purchased content specifically for wireless handheld device use, and very few libraries (15%) reformatted content for these devices.17 however, a survey of public libraries completed by the american library association between september and november 2011 indicated interesting changes: 15% of library websites are optimized for mobile devices, and 12% of libraries use scanned codes (e.g. qr codes), and 7% of libraries have developed smartphone applications for access to library services; 36% of urban libraries have websites optimized for mobile devices, compared to 9% of rural libraries; 76% of libraries offer access to e-books; 70% of libraries use social networking tools such as facebook. 18 later studies revealed more significant changes. 99 association of research libraries member libraries were surveyed in 2012 to identify how many had optimized at least some services for the mobile web. apps were not investigated. the result showed that 83 libraries (84%) had a mobile website.19 a study in 2015 by liu and briggs showed that the top 100 university libraries in the united states offered one or more mobile services, with mobile websites, mobile access to the library catalog, mobile access to the library’s databases, e-books, and text messaging services being the most common. qr codes and augmented reality were less common.20 kim noted that “libraries are acknowledging that people expect to do just about everything on mobile devices and that more and more people are now using a mobile device as their primary access point for the web.”21 although librarians may have previously underestimated what people wanted to do using mobile devices, there is a growing understanding of the potential of these access points. 
research design survey samples while a growing number of users tend to access information remotely, urban libraries, as the most popular public-sector institutions and community centers, are facing great challenges in addressing the growing need for mobile services. the urban libraries council (ulc) (https://www.urbanlibraries.org), as an authoritative source founded in 1971, is the premier membership association of north america’s leading public library systems. ulc’s member libraries are in communities throughout the united states and canada, comprising a mix of institutions with varying revenue sources and governance structures, and serving communities with populations of differing sizes. ulc’s website lists 145 us and canadian urban libraries. since this study focused only on us urban libraries, 138 libraries were chosen as the study targets, and all were examined. https://www.urbanlibraries.org/ the provision of mobile services in us urban libraries | guo, liu, and bielefield 81 https://doi.org/10.6017/ital.v37i2.10170 table 1. the survey and examples of survey results. contents options example no.1: pima county public library … example no.138: milwaukee public library components of mobile websites 1 account login; 2 catalog search; 3 contact us; 4 downloadables; 5 events; 6 interlibrary loan; 7 kids & teens; 8 locations and hours; 9 meeting room; 10 recent arrivals; 11 recommendations; 12 social media; 13 suggest a purchase; 14 support 1, 2, 3, 4, 5, 7, 8, 9, 10, 12, 13, 14. 1, 2, 3, 4, 5, 7, 8, 9, 12, 13, 14. components of mobile apps 1 account login; 2 barcode wallet; 3 bestsellers; 4 catalog search; 5 contact us; 6 downloadables; 7 events; 8 full website; 9 interlibrary loan; 10 just ordered; 11 kids & teens; 12 locations and hours; 13 meeting room; 14 my bookshelf; 15 my library; 16 pay fines; 17 popular this week; 18 recent arrivals; 19 recommendations; 20 scan isbn; 21 social media; 22 suggest a purchase; 21 support 1, 4, 5, 6, 7, 8, 12, 15, 18, 20, 21. 1, 4, 5, 6, 7, 8, 12, 17, 20, 21. mobile reference services 1 chat/im; 2 social medias; 3 text/sms; 4 web form - 1, 3, 4. social media 1 blog; 2 facebook; 3 flickr; 4 goodreads; 5 google+; 6 instagram; 7 linkedin; 8 pinterest; 9 tumblr; 10 twitter; 11 youtube 1, 2, 3, 6, 8, 10, 11. 1, 2, 6, 8, 10. mobile reservation services 1 reserve a computer; 2 reserve a librarian; 3 reserve a meeting room; 4 reserve a museum pass; 5 reserve a study room; 6 reserve exhibit space - 3. mobile printing 1 mobile printing; 2 no mobile/ wi-fi printing; 3 wifi printing 3. 2. apps or databases 1 axis 360; 2 biblioboard; 3 bookflix;4 brainfuse; 5 career transitions; 6 cloud library; 7 driving -tests.org; 8 ebscohost; 9 flipster; 10 freading; 11 freegal; 12 gale virtual; 13 hoopla; 14 instant flix; 15 learning express; 16 lynda.com; 17 mango languages; 18 master file; 19 morningstar; 20 new york times; 21 novelist; 22 one click digital; 23 overdrive; 24 reference usa; 25 safari; 26 tumble book; 27 tutor.com; 28 world book; 29 worldcat; 30 zinio. 4, 11, 14, 22, 23, 26, 28, 30. 4, 8, 11,12, 13, 15, 17, 18, 19, 21, 23, 24, 30. information technology and libraries | june 2018 82 survey methods as mobile services are offered basically via wireless systems and mobile devices, a combination of research methods, including mobile website visits, content analysis, and librarian interviews, were applied for data collection. 
specifically, librarian interviews were employed as a verification and supplemental process to ensure that survey data were accurate and exhaustive. first, the authors utilized an iphone, an android mobile phone, and an ipad to access the websites of the 138 us urban libraries in the study sample to ascertain if these libraries have mobile websites or mobile catalogs and whether the platforms are operated properly. then the authors checked whether these libraries have mobile apps that can be downloaded from the apple app store or the google play store. the survey was conducted from june 18 to june 24, 2017. next, the authors went through all the mobile websites and the mobile apps the libraries provide to check the mobile services offered. the authors used a specially designed survey to collect data about each library’s mobile website and app (see table 1). the procedure of survey content analysis was conducted between june 25 and july 24, 2017, with the examination of each library’s services taking approximately 30 minutes. finally, for those libraries that had no mobile websites or mobile apps found through the website visits, the authors made interview requests to staff librarians via their online reference services such as live chat, web form and email. an additional purpose of this step was to confirm the accuracy of the survey data collected from website visits. the survey was conducted from july 22 to august 3, 2017. results and analysis results from the examination of mobile website visits, content analysis, and librarian interviews revealed what services us urban libraries provided as mobile services, how they were provided, and which were commonly provided. how many libraries provide mobile services? over 83% of us urban libraries have developed their own mobile websites (see figure 1) for communities they serve. the mobile website is currently the most popular service platform for mobile users. the provision of mobile services in us urban libraries | guo, liu, and bielefield 83 https://doi.org/10.6017/ital.v37i2.10170 figure 1. types of mobile services provided by libraries. promisingly, each test of these websites through the authors’ mobile devices, either smartphones or tablets, confirmed that all the study subjects can be accessed 100% of the time. these library websites, however, are not entirely built specially for mobile devices. while the majority of urban libraries have transformed their desktop websites into mobile sites with proper responsive design, about 17% are just smaller versions of their desktop websites (see figure 2). a responsive mobile website can react or change according to the needs of the users and the mobile device they’re viewing it on to achieve a good layout and content display. here, text and images change from a three-column to a single-column layout, and unnecessary images are hidden. the web address of a responsively designed mobile website is the same as the desktop website. responsive design is described as a long-term solution for addressing both designers’ and users’ needs.22 the survey found that 59% of libraries now have apps. our analysis of the earliest version of apps records indicate that los angeles public library was the first to use an app, in august 2010. mobile apps have advantages and disadvantages compared to mobile websites, and many libraries compared them and chose between the two. 
skokie (illinois) public library, as of october 2015, is no longer supporting the library’s mobile app because they claim the library’s website offers a better mobile experience. they also offer an easy access solution like that for a mobile app, with a message displayed to users: “miss having an icon on your home screen? bookmark the site to your home screen and you’ll have an icon to take you directly to this site.” 83% 59% 22% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% mobile website mobile app mobile catalog information technology and libraries | june 2018 84 figure 2. the smaller versions of the desktop website and the specially designed mobile website the proportion of libraries providing mobile catalog services is only 22%. libraries can use multiple options to create one or more mobile service platforms. nearly half (46%) of us urban libraries have both mobile websites and mobile apps. according to the survey, 95% of libraries have at least one mobile website, mobile catalog, or mobile app. a survey the authors conducted in april 2014 found that only 81% of the urban libraries had at least one mobile website, mobile catalog, or mobile app (see figure 3). clearly, libraries are paying increasing attention to mobile services, and providing mobile services has become the unavoidable choice of libraries nowadays. figure 3. changes in the proportion of libraries that provide mobile services from 2014 to 2017. 19% 81% 2014 no mobile services at least one mobile service 5% 95% 2017 no mobile services at least one mobile service the provision of mobile services in us urban libraries | guo, liu, and bielefield 85 https://doi.org/10.6017/ital.v37i2.10170 what content do the mobile websites offer? through mobile website visits and content analysis, it was found that some types of information are available at all libraries, including “account login,” “events,” “locations and hours,” “contact us,” and “social media” (see figure 4). figure 4. components of mobile websites the proportion of library mobile sites that offer “support” and “downloadables” is 96% and 95%, respectively. among them, “support” generally includes donations to the library foundation, donation of books and other materials, and providing volunteer services; “downloadables” generally include e-books, e-magazines, and music. a total of 86% of the urban libraries set up “kids” and “teens” sections, providing specialized information services, such as storytime, games, events, book lists, homework help, volunteer information, and college information. a majority (62%) of libraries provide interlibrary loan information on mobile websites, but one library, palo alto (california) city library, no longer offers the costly interlibrary loan service as of july 2011. more than half (56%) of the libraries set up a “suggest a purchase” function and generally ask readers to provide title, author, publisher, year published, format, and other information in web form. some libraries display “recommendations” (26%) on their mobile websites. denver public library has a special column recommending books for children and teenagers and offers personalized reading suggestions: “tell us what you like to read and we’ll send you our recommendations in about a week.” many mobile websites will pop hints to the libraries’ mobile apps and link to the apple app store or the google play store after automatically identifying the user’s mobile phone operating system. 
this is helpful for promoting the use of the libraries’ apps, and it also provides great convenience for users. 100% 100% 100% 100% 100% 99% 96% 95% 86% 74% 62% 56% 32% 26% 0% 20% 40% 60% 80% 100% account login events locations and hours contact us social media catalog search support downloadables kids & teens meeting room interlibrary loan suggest a purchase recent arrivals recommendations http://www.marinlibrary.org/events/?trumbaembed=filter3%3dstorytimes information technology and libraries | june 2018 86 what content do the mobile apps offer? the content of mobile websites in libraries is basically the same, but the content of their mobile apps varies widely. the reason is that the understanding of the various libraries about the functions an app should offer differs from one library to another. some of these apps were designed by software vendors, such as boopsie, sirsidynix, and bibliocommoms, but some were designed by the libraries themselves, leading to the absence of a uniform standard or template for the app design. survey results show that only “account login” and “catalog search” are available in all apps (see figure 5). “locations and hours” accounts for a high proportion of apps at 96%. the “locations” feature in many libraries apps, with the help of gps, helps users find their nearest library location. figure 5. components of mobile apps about 85% of apps provide “contact us.” click “contact us” in poudre river public library district and some other libraries’ apps, and you can directly call the library or send text messages via email. “scan isbn” is a unique feature of mobile apps, and 75% of apps provide this functionality. if a library user finds a book they need in a bookstore or elsewhere, they can scan the isbn to can see if that book is in the library’s collection. apps designed by bibliocommoms all have “bestsellers”, “recently reviewed”, “just ordered” and “my library” (see chart figure 6). in “my library,” the “checked out” section contains red alerts for “overdue,” yellow alerts for “due soon,” and “total items.” the “holds” section contains “ready for pickup,” “active holds,” and “paused holds.”. the “my shelves” section contains “completed,” “in progress,” and “for later.” in this way, users can clearly see the details of the books they have 100% 100% 96% 89% 85% 77% 75% 68% 46% 27% 24% 19% 18% 16% 16% 10% 6% 5% 3% 0% 20% 40% 60% 80% 100% account login catalog search locations and hours downloadables contact us events scan isbn social media full website recent arrivals bestsellers recently reviewed popular this week just ordered my library my bookshelf pay fines barcode wallet kids & teens the provision of mobile services in us urban libraries | guo, liu, and bielefield 87 https://doi.org/10.6017/ital.v37i2.10170 borrowed and intend to borrow. apps designed by boopsie generally have “popular this week” to tell users which books have been borrowed more recently. figure 6. an app designed by bibliocommoms. only 3% of apps have “kids” and “teens” sections, which differs greatly from the percentage of mobile websites that offer those sections (86%). what mobile reference services do libraries provide? according to the survey, the most common way for us urban libraries to provide mobile reference service is a web form, which is available in 86% of surveyed libraries (see figure 7). related to “call us,” a web form has the advantage of being independent from the library’s working hours. 
although users fill out and submit a web form, it is similar to email and, generally, librarians respond to the user’s e-mail address, but it does not require users to enter their own email system, as they only need to fill in the content required by the web form. therefore, it is more convenient to use. the authors believe that providing only an email address is not mobile reference service. the survey found that 6% of libraries do not have mobile reference services. information technology and libraries | june 2018 88 figure 7. mobile reference services provided by libraries. currently, 43% of libraries offer chat and instant messaging (im) services, which allow users to communicate with librarians instantly. for example, when gwinnett county (georgia) public library’s mobile website is visited, an “ask us” dialog box appears in the upper right corner of the site, which allows visitors to chat with librarians. outside of the library’s work hours, the box displays “sorry, chat is offline but you can still get help” (see figure 8). the county of los angeles public library provides four options for im. they are aim, google talk, yahoo! messenger, and msn messenger. figure 8. “ask us” on gwinnett county public library’s mobile website 86% 43% 33% 8% 0% 20% 40% 60% 80% 100% web form chat/im text/sms social media the provision of mobile services in us urban libraries | guo, liu, and bielefield 89 https://doi.org/10.6017/ital.v37i2.10170 all the florida urban libraries surveyed offer reference services via the web form, chat, and text because an “ask a librarian” service administered by the tampa bay library consortium provides florida residents with those mobile reference services. the survey shows that only 8% of the libraries provide social media reference service in “ask a librarian.” the social media that provides reference service is either facebook or twitter. in fact, 100% of libraries have social media, and 100% of libraries have facebook and twitter, but most libraries do not use them to provide reference services. what social media do the libraries use? survey results showed that 100% of mobile websites display links to their social media, usually in the prominent position of the front page of the websites; 68% of apps have social media links. facebook and twitter are social media leaders, and now all libraries’ mobile websites have both (see figure 9). the survey conducted in 2014 showed that facebook and twitter had the highest occupancy rate, but only 61% of libraries offered facebook and 53% offered twitter. it is obvious that libraries have made great progress in the last three years in the application of social media. figure 9. social media being used by libraries. instagram and pinterest are both photo social media, and they are used 76% and 49%, respectively. as the leading social media in the video field, youtube is used by 67% of libraries. what mobile reservation services do libraries provide? mobile reservation services were found in 78% of all libraries’ mobile services. a majority (62%) of the libraries allow online reservation of a meeting room via web form or other forms, and 14% allow reserving a study room (see figure 10). some libraries only reserve a study or meeting room via phone. 100% 100% 76% 67% 57% 49% 41% 19% 12% 12% 9% 0% 20% 40% 60% 80% 100% facebook twitter instagram youtube blog pinterest flickr tumblr linkedin google+ goodreads information technology and libraries | june 2018 90 figure 10. mobile reservation services provided by libraries. 
a few libraries provide instant online access to free and low-cost tickets to museums, science centers, zoos, theatres, and other fun local cultural venues with discover & go. a total of 14% of the libraries provide “reserve a librarian” service, allowing patrons to reserve a free session with a reference librarian or subject specialist at the library. in addition, several libraries, such as pasadena public library, allow reserving of exhibit space. how many libraries provide mobile printing? mobile printing services allow patrons to print to a library printer from outside the library or from their mobile device. patrons’ print jobs are available for pick up at the library. already, 43% of the libraries provide mobile printing service (see figure 11). it is expected that more libraries will provide this service. to print from a mobile device, patrons need to download an app that supports mobile printing. printeron is the more commonly used app, which has been used by oakland public library, and san mateo county (california) libraries, and others. however, san diego public library uses the your print cloud print system, and santa clara county (california) library uses smart alec. san mateo county libraries offers wireless printing from smartphones, tablets, and laptops at all of its locations, and its wireless printing includes mobile printing, web printing, and email printing. in addition, 14% of libraries offer wireless printing services but do not provide mobile printing services. for example, live oak public libraries in savannah, georgia, states that printing from laptops (pc and mac) is available in all branches, but they don’t have apps that support printing from tablets or mobile phones. 62% 20% 15% 14% 14% 4% 0% 10% 20% 30% 40% 50% 60% 70% reserve a meeting room reserve a computer reserve a museum pass reserve a study room reserve a librarian reserve exhibit space the provision of mobile services in us urban libraries | guo, liu, and bielefield 91 https://doi.org/10.6017/ital.v37i2.10170 figure 11. the proportion of libraries that offer mobile printing. what apps or databases do libraries provide for patrons? four main software programs found to be used to display e-books of the surveyed libraries are overdrive (93%), hoopla (64%), tumblebook (61%), and cloud library (48%). for audiobooks, overdrive (93%) and hoopla (64%) are the most popular; oneclickdigital is used by 48%. most libraries (74%) use zinio for e-magazines, and 48% use the music software freegal. overdrive is the most common application in libraries (see table 2). table 2. the proportion of apps or databases being used in libraries. apps or databases % of libraries providing apps or databases % of libraries providing overdrive 93 world book 46 novelist 79 new york times 44 referenceusa 74 masterfile 43 zinio 74 ebscohost 43 learningexpress 69 flipster 29 gale virtual 68 bookflix 28 hoopla 64 brainfuse 22 morningstar 64 tutor.com 17 mango languages 61 safari 17 tumblebook 61 driving-tests.org 16 lynda.com 57 biblioboard 12 worldcat 51 career transitions 12 freegal 48 axis 360 11 oneclick digital 48 instantflix 10 cloud library 48 freading 9 mobile printing 43% no wireless/mobile printing 42% wireless printing 14% information technology and libraries | june 2018 92 the libraries provide users with various types of databases. 
survey statistics show that the widely used databases include referenceusa (business), mango languages (language learning), learningexpress and career transitions (job and career), lynda.com and tutor.com (education), morningstar (investment), world book (encyclopedias), worldcat (library resources worldwide), new york times (newspaper articles), driving-tests.org (test preparation), and safari (technology).

conclusion

this study shows that mobile services have become popular in us urban libraries as of summer 2017, with 95% offering one or more types of mobile service. responsive mobile websites and mobile apps are the main platforms of current mobile services. us urban libraries are striving hard to meet local communities' remote-access needs via new technologies. compared with desktop websites, mobile websites and apps offer services that are more accessible, smarter, and more interactive for local users. some mobile websites automatically prompt the user to install the library's app; many libraries' apps offer a "scan isbn" function, making it convenient for the user to scan a book title at any time to see if it is in the library's collection; "location" provides gps positioning and navigation services for users; and "contact us" links directly to telephone, text, and email. libraries are actively developing and adding more mobile services, such as mobile reservation services and mobile printing services. the development of mobile technology has provided the support libraries need to offer mobile services. a future in which users can access library services anytime, anywhere, and in any way is getting closer and closer.

acknowledgements

this work was supported by grant no. 14ctq028 from the national social science foundation of china.

references

1. jason griffey, mobile technology and libraries (new york: neal-schuman, 2010).
2. meredith farkas, "a library in your pocket," american libraries 41 (2010): 38.
3. american library association, "the state of america's libraries 2017: a report from the american library association," special report, american libraries, april 2017, http://www.ala.org/news/sites/ala.org.news/files/content/state-of-americas-libraries-report-2017.pdf.
4. mark weiser, "the computer for the 21st century," scientific american 265, no. 3 (1991): 94–104.
5. stefan gessler and andreas kotulla, "pdas as mobile www browsers," computer networks and isdn systems 28, no. 1–2 (1995): 53–59.
6. georgina parsons, "information provision for he distance learners using mobile devices," electronic library 28, no. 2 (2010): 231–44, https://doi.org/10.1108/02640471011033594.
7. allison woodruff et al., "portable, but not mobile: a study of wireless laptops in the home," international conference on pervasive computing 4480 (2007): 216–33, https://doi.org/10.1007/978-3-540-72037-9_13.
8. joan k. lippincott, "a mobile future for academic libraries," reference services review 38, no. 2 (2010): 205–13.
9. rachel hu and alison meir, "mobile strategy report," california digital library, august 18, 2010, https://confluence.ucop.edu/download/attachments/26476757/cdl+mobile+device+user+research_final.pdf?version=1.
10. yan quan liu and sarah briggs, "a library in the palm of your hand: mobile services in top 100 university libraries," information technology & libraries 34, no. 2 (2015): 133–48, https://doi.org/10.6017/ital.v34i2.5650.
11. kitty pope et al., "twenty-first century library must-haves: mobile library services," searcher 18, no. 3 (2010): 44–47.
12. hu and meir, "mobile strategy report."
13. liu and briggs, "a library in the palm of your hand."
14. kalah rogers, "academic and public libraries' use of web 2.0 applications and services in mississippi," slis connecting 4, no. 1 (2015), https://doi.org/10.18785/slis.0401.08.
15. pope et al., "twenty-first century library must-haves."
16. lorraine paterson and low boon, "usability inspection of digital libraries: a case study," ariadne 63, no. 1 (2010): 11, https://doi.org/10.1007/s00799-003-0074-4. [website lists h. rex hartson, priya shivakumar, and manuel a. pérez-quiñones as the authors]
17. todd spires, "handheld librarians: a survey of librarian and library patron use of wireless handheld devices," internet reference services quarterly 13, no. 4 (2008): 287–309, https://doi.org/10.1080/10875300802326327.
18. american library association, "libraries connect communities 2011-2012," last modified june 2012, http://connect.ala.org/files/68293/2012.67b%20plfts%20results.pdf.
19. barry trott and rebecca jackson, "mobile academic libraries," reference & user services quarterly 52, no. 3 (2013): 174–78.
20. liu and briggs, "a library in the palm of your hand."
21. bohyun kim, "the present and future of the library mobile experience," library technology reports 49, no. 6 (2013): 15–28.
22. hannah gascho rempel and laurie bridges, "that was then, this is now: replacing the mobile-optimized site with responsive design," information technology & libraries 32, no. 4 (2013): 8–24, https://doi.org/10.6017/ital.v32i4.4636.
evaluating web-scale discovery: a step-by-step guide

joseph deodato

information technology and libraries | june 2015

abstract

selecting a web-scale discovery service is a large and important undertaking that involves a significant investment of time, staff, and resources. finding the right match begins with a thorough and carefully planned evaluation process. to be successful, this process should be inclusive, goal-oriented, data-driven, user-centered, and transparent. the following article offers a step-by-step guide for developing a web-scale discovery evaluation plan rooted in these five key principles based on best practices synthesized from the literature as well as the author's own experiences coordinating the evaluation process at rutgers university. the goal is to offer academic libraries that are considering acquiring a web-scale discovery service a blueprint for planning a structured and comprehensive evaluation process.

introduction

as the volume and variety of information resources continue to multiply, the library search environment has become increasingly fragmented. instead of providing a unified, central point of access to its collections, the library offers an assortment of pathways to disparate silos of information. to the seasoned researcher familiar with these resources and experienced with a variety of search tools and strategies, this maze of options may be easy to navigate. but for the novice user who is less accustomed to these tools and even less attuned to the idiosyncrasies of each one's own unique interface, the sheer amount of choice can be overwhelming. even if the user manages to find their way to the appropriate resource, figuring out how to use it effectively becomes yet another challenge. this is at least partly due to the fact that the expectations and behaviors of today's library users have been profoundly shaped by their experiences on the web. popular sites like google and amazon offer simple, intuitive interfaces that search across a wide range of content to deliver immediate, relevant, and useful results. in comparison, library search interfaces often appear antiquated, confusing, and cumbersome. as a result, users are increasingly relying on information sources that they know to be of inferior quality, but are simply easier to find. as luther and kelly note, the biggest challenge academic libraries face in today's abundant but fragmented information landscape is "to offer an experience that has the simplicity of google—which users expect—while searching the library's rich digital and print collections—which users need."1 in an effort to better serve the needs of these users and improve access to library content, libraries have begun turning to new technologies capable of providing deep discovery of their vast scholarly collections from a single, easy-to-use interface. these technologies are known as web-scale discovery services.

joseph deodato (jdeodato@rutgers.edu) is digital user services librarian at rutgers university, new brunswick, new jersey.
to paraphrase hoeppner, a web-scale discovery service is a large central index paired with a richly featured user interface providing a single point of access to the library's local, open access, and subscription collections.2 unlike federated search, which broadcasts queries in real time to multiple indexes and merges the retrieved results into a single set, web-scale discovery relies on a central index of preharvested data. discovery vendors contract with content providers to index their metadata and full-text content, which is combined with the library's own local collections and made accessible via a unified index. this approach allows for rapid search, retrieval, and ranking of a broad range of content within a single interface, including materials from the library's catalog, licensed databases, institutional repository, and digital collections. web-scale discovery services also offer a variety of features and functionality that users have come to expect from modern search tools. features such as autocorrect, relevance ranking, and faceted browsing make it easier for users to locate library materials more efficiently, while enhanced content such as cover images, ratings, and reviews offers an enriched user experience and provides useful contextual information for evaluating results.

commercial discovery products entered the market in 2007 at a time when academic libraries were feeling pressure to compete with newer and more efficient search tools like google scholar. to improve the library search experience and stem the seemingly rising tide of defecting users, academic libraries were quick to adopt discovery solutions that promised improved access and increased usage of their collections. yet despite the significant impact these technologies have on staff and users, libraries have not always undertaken a formal evaluation process when selecting a discovery product. some were early adopters that selected a product at a time when few other options existed on the market. others served as beta sites for particular vendors or simply chose the product offered by their existing ils or federated search provider. still others had a selection decision made for them by their library director or consortium. however, despite rapid adoption, the web-scale discovery market has only just begun to mature. as products emerge from their initial release and more information about them becomes available, the library community has gained a better understanding of how web-scale discovery services work and their particular strengths and weaknesses. in fact, some libraries that have already implemented a discovery service are currently considering switching products. whether your library is new to the discovery marketplace or poised for reentry, this article is intended to help you navigate to the best product to meet the needs of your institution.
it  covers  the  entire  process  from  soup  to  nuts   from  conducting  product  research  and  drafting  organizational  requirements  to  setting  up  local   trials  and  coordinating  user  testing.  by  combining  guiding  principles  with  practical  examples,  this   article  aims  to  offer  an  evaluation  model  rooted  in  best  practices  that  can  be  adapted  by  other   academic  libraries.   literature  review   as  the  adoption  of  web-­‐scale  discovery  services  continues  to  rise,  a  growing  body  of  literature  has   emerged  to  help  librarians  evaluate  and  select  the  right  product.  moore  and  greene  provide  a     information  technology  and  libraries  |  june  2015     21   useful  review  of  this  literature  summarizing  key  trends  such  as  the  timeframe  for  evaluation,  the   type  of  staff  involved,  the  products  being  evaluated,  and  the  methods  and  criteria  used  by   evaluators.3  much  of  the  early  literature  on  this  subject  focuses  on  comparisons  of  product   features  and  functionality.  rowe,  for  example,  offers  comparative  reviews  of  leading  commercial   services  on  the  basis  of  criteria  such  as  content,  user  interface,  pricing,  and  contract  options.4  yang   and  wagner  compare  commercial  and  open  source  discovery  tools  using  a  checklist  of  user   interface  features  that  includes  search  options,  faceted  navigation,  result  ranking,  and  web  2.0   features.5  vaughan  provides  an  in-­‐depth  look  at  discovery  services  that  includes  an  introduction   to  key  concepts,  detailed  profiles  on  each  major  service  provider,  and  a  list  of  questions  to   consider  when  selecting  a  product.6  a  number  of  authors  have  provided  useful  lists  of  criteria  to   help  guide  product  evaluations.  hoeppner,  for  example,  offers  a  list  of  key  factors  such  as  breadth   and  depth  of  indexing,  search  and  refinement  options,  branding  and  customization,  and  tools  for   saving,  organizing,  and  exporting  results.7  luther  and  kelly  and  hoseth  provide  a  similar  list  of   end-­‐user  features  but  also  include  institutional  considerations  such  as  library  goals,  cost,  vendor   support,  and  compatibility  with  existing  technologies.8     while  these  works  are  helpful  for  getting  a  better  sense  of  what  to  look  for  when  shopping  for  a   web-­‐scale  discovery  service,  they  do  not  offer  guidance  on  how  to  design  a  structured  evaluation   plan.  indeed,  many  library  evaluations  have  tended  to  rely  on  what  can  be  described  as  the   checklist  method  of  evaluation.  this  typically  involves  creating  a  checklist  of  desirable  features   and  then  evaluating  products  on  the  basis  of  whether  they  provide  these  features.  for  example,  in   developing  an  evaluation  process  for  rider  university,  chickering  and  yang  compiled  a  list  of   sixteen  user  interface  features,  examined  live  product  installations,  and  ranked  each  product   according  to  the  number  of  features  offered.9  brubaker,  leach-­‐murray,  and  parker  employed  a   similar  process  to  select  a  discovery  service  for  the  twenty-­‐three  members  of  the  private   academic  library  network  of  indiana  (palni).10  these  types  of  evaluations  suffer  from  a  number   of  limitations.  
first,  they  tend  to  rely  on  vendor  marketing  materials  or  reviews  of   implementations  at  other  institutions  rather  than  local  trials  and  testing.  second,  product   requirements  are  typically  given  equal  weight  rather  than  prioritized  according  to  importance.   third,  these  requirements  tend  to  focus  predominantly  on  user  interface  features  while  neglecting   equally  important  back  end  functionality  and  institutional  considerations.  finally,  these   evaluations  do  not  always  include  input  or  participation  from  library  staff,  users,  and  stakeholders.   the  first  published  work  to  offer  a  structured  model  for  evaluating  web-­‐scale  discovery  services   was  vaughan’s  “investigations  into  library  web-­‐scale  discovery  services.”11  vaughan  outlines  the   evaluation  process  employed  at  university  of  nevada,  las  vegas  (unlv),  which,  in  addition  to   developing  a  checklist  of  product  requirements,  also  included  staff  surveys,  interviews  with  early   adopters,  vendor  demonstrations,  and  coverage  analysis.  the  author  also  provides  several  useful   appendixes  with  templates  and  documents  that  librarians  can  use  to  guide  their  own  evaluation.   vaughan’s  work  also  appears  in  popp  and  dallis’  must-­‐read  compendium  planning  and   implementing  resource  discovery  tools  in  academic  libraries.12  this  substantial  volume  presents     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   22   forty  chapters  on  planning,  implementing,  and  maintaining  web-­‐scale  discovery  services,   including  an  entire  section  devoted  to  evaluation  and  selection.  in  it,  vaughan  elaborates  on  the   unlv  model  and  offers  useful  recommendations  for  creating  an  evaluation  team,  educating  library   staff,  and  communicating  with  vendors.13  metz-­‐wiseman  et  al.  offer  an  overview  of  best  practices   for  selecting  a  web-­‐scale  discovery  service  on  the  basis  of  interviews  with  librarians  from  fifteen   academic  institutions.14  freivalds  and  lush  of  penn  state  university  explain  how  to  select  a  web-­‐ scale  discovery  service  through  a  request  for  proposal  (rfp)  process.15  bietila  and  olson  describe   a  series  of  tests  that  were  done  at  the  university  of  chicago  to  evaluate  the  coverage  and   functionality  of  different  discovery  tools.16  chapman  et  al.  explain  how  personas,  surveys,  and   usability  testing  were  used  to  develop  a  user-­‐centered  evaluation  process  at  university  of   michigan.17     the  following  article  attempts  to  build  on  this  existing  literature,  combining  the  best  elements   from  evaluation  methods  employed  at  other  institutions  as  well  as  the  author’s  own,  with  the  aim   of  providing  a  comprehensive,  step-­‐by-­‐step  guide  to  evaluating  web-­‐scale  discovery  services   rooted  in  best  practices.   background   rutgers,  the  state  university  of  new  jersey,  is  a  public  research  university  consisting  of  thirty-­‐two   schools  and  colleges  offering  degrees  in  the  liberal  arts  and  sciences  as  well  as  programs  in   professional  and  continuing  education.  the  university  is  distributed  across  three  regional   campuses  serving  more  than  65,000  students  and  24,000  faculty  and  staff.  
the  rutgers  university   libraries  comprise  twenty-­‐six  libraries  and  centers  with  a  combined  collection  of  more  than  10.5   million  print  and  electronic  holdings.  the  libraries’  collections  and  services  support  the   curriculum  of  the  university’s  many  degree  programs  as  well  as  advanced  research  in  all  major   academic  disciplines.   in  january  2013,  the  libraries  appointed  a  cross-­‐departmental  team  to  research,  evaluate,  and   recommend  the  selection  of  a  web-­‐scale  discovery  service.  the  impetus  for  this  initiative  derived   from  a  demonstrated  need  to  improve  the  user  search  experience  on  the  basis  of  data  collected   over  the  last  several  years  through  ethnographic  studies,  user  surveys,  and  informal  interactions   at  the  reference  desk  and  in  the  classroom.  users  reported  high  levels  of  dissatisfaction  with   existing  library  search  tools  such  as  the  catalog  and  electronic  databases,  which  they  found   confusing  and  difficult  to  navigate.  above  all,  users  demanded  a  simple,  intuitive  starting  point   from  which  to  search  and  access  the  library’s  collections.  accordingly,  the  libraries  began   investigating  ways  to  improve  access  with  web-­‐scale  discovery.  the  evaluation  team  examined   offerings  from  four  leading  web-­‐scale  discovery  providers,  including  ebsco  discovery  service,   proquest’s  summon,  ex  libris’  primo,  and  oclc’s  worldcat  local.  the  process  lasted   approximately  nine  months  and  included  extensive  product  and  user  research,  vendor   demonstrations,  an  rfp,  reference  interviews,  trials,  surveys,  and  product  testing.  see  appendix  a   for  an  overview  of  the  evaluation  plan.     information  technology  and  libraries  |  june  2015     23   by  the  time  it  began  its  evaluation,  rutgers  was  already  a  latecomer  to  the  discovery  game.  most  of   our  peers  had  already  been  using  web-­‐scale  discovery  services  for  many  years.  however,  rutgers’   less-­‐than-­‐stellar  experience  with  federated  search  had  led  it  to  adopt  a  more  cautious  attitude   toward  the  latest  and  greatest  of  library  “holy  grails.”  this  wait-­‐and-­‐see  approach  proved  highly   beneficial  in  the  end  as  it  allowed  time  for  the  discovery  market  to  mature  and  gave  the  evaluation   team  an  opportunity  to  learn  from  the  successes  and  failures  of  early  adopters.  in  planning  its   evaluation,  the  rutgers  team  was  able  to  draw  on  the  experiences  of  earlier  pioneers  such  as   unlv,  penn  state,  the  university  of  chicago,  and  the  university  of  michigan.  it  was  on  the   metaphorical  shoulders  of  these  library  giants  that  rutgers  built  its  own  successful  evaluation   process.  what  follows  is  a  step-­‐by-­‐step  guide  for  evaluating  and  selecting  a  web-­‐scale  discovery   service  on  the  basis  of  best  practices  synthesized  from  the  literature  as  well  as  the  author’s  own   experiences  coordinating  the  evaluation  process  at  rutgers.  given  the  rapidly  changing  nature  of   the  discovery  market,  the  focus  of  this  article  is  on  the  process  rather  than  the  results  of  rutgers’   evaluation.  
while the results will undoubtedly be outdated by the time this article goes to press, the process is likely to remain relevant and useful for years to come.

form an evaluation team

the first step in selecting a web-scale discovery service is appointing a team that will be responsible for conducting the evaluation. composition of the team will vary depending on local practice and staffing, but should include representatives from a broad cross section of library units, including collections, public services, technical services, and systems. institutions with multiple campuses, schools, or library branches will want to make sure the interests of these constituencies are also represented. if feasible, the library should consider including actual users on the evaluation team. these may be members of an existing user advisory board or recruits from among the library's student employees and faculty liaisons. including users on your evaluation team will keep the process focused on user needs and ensure that the library selects the best product to meet them.

there are many reasons for establishing an inclusive evaluation team. first, discovery tools have broad implications for a wide range of library services and functions. therefore a diversity of library expertise is required for an informed and comprehensive evaluation. reference and instruction librarians will need to evaluate the functionality of the tool, the quality of results, and its role in the research process. collections staff will need to assess scope of coverage and congruency with the library's existing subscriptions. access services will need to assess how the tool handles local holdings information and integrates with borrowing and delivery services like interlibrary loan. catalogers will need to evaluate metadata requirements and procedures for harvesting local records. it staff will need to assess technical requirements and compatibility with existing infrastructure and systems.

second, depending on the size and goals of the institution, the product may be expected to serve a wide community of users with different needs, skill levels, and academic backgrounds. large
as  noted,   discovery  tools  impact  a  wide  range  of  library  services  and  therefore  require  careful  evaluation   from  the  perspectives  of  multiple  stakeholders.  furthermore,  these  tools  dramatically  change  the   nature  of  library  research,  and  not  everyone  in  your  organization  may  view  this  change  as  being   for  the  better.  despite  growing  rates  of  adoption,  debates  over  the  value  and  utility  of  web-­‐scale   discovery  continue  to  divide  librarians.18  according  to  one  survey,  securing  staff  buy-­‐in  is  the   biggest  challenge  academic  libraries  face  when  implementing  a  web-­‐scale  discovery  service.19   ensuring  broad  involvement  early  in  the  process  will  help  to  secure  organizational  buy-­‐in  and   support  for  the  selected  product.   while  broad  representation  is  important,  having  a  large  and  diverse  team  can  sometimes  slow   down  the  process;  schedules  can  be  difficult  to  coordinate,  members  may  have  competing  views  or   demands  on  their  time,  meetings  can  lose  focus  or  wander  off  topic,  etc.  the  more  members  on   your  evaluation  team,  the  more  difficult  the  team  may  be  to  manage.  one  strategy  for  managing  a   large  group  might  be  to  create  a  smaller,  core  team  with  all  other  members  serving  on  an  ad  hoc   basis.  the  core  team  functions  as  a  steering  committee  to  manage  the  project  and  calls  on  the  ad   hoc  members  at  different  stages  in  the  evaluation  process  where  their  input  and  expertise  is   needed.  another  strategy  would  be  to  break  the  larger  group  into  several  functional  teams,  each   responsible  for  evaluating  specific  aspects  of  the  discovery  tool.  for  example,  one  team  might   focus  on  functionality,  another  on  technology,  a  third  on  administration,  etc.  this  method  also  has   the  advantage  of  distributing  the  workload  among  team  members  and  breaking  down  a  complex   evaluation  process  into  discrete,  more  manageable  parts.   like  any  other  committee  or  taskforce,  your  evaluation  team  should  have  a  charge  outlining  its   responsibilities,  timetable  of  deliverables,  reporting  structure,  and  membership.  the  charge   should  also  include  a  vision  or  goals  statement  that  explicitly  states  the  underlying  assumptions   and  premises  of  the  discovery  tool,  its  purpose,  and  how  it  supports  the  library’s  larger  mission  of   connecting  users  with  information.20  although  frequently  highlighted  in  the  literature,  the   importance  of  defining  institutional  goals  for  discovery  is  often  overlooked  or  taken  for  granted.21   having  a  vision  statement  is  crucial  to  the  success  of  the  project  for  multiple  reasons.  first,  it   frames  the  evaluation  process  by  establishing  mutually  agreed-­‐upon  goals  and  priorities  for  the   product.  before  the  evaluation  can  begin,  the  team  must  have  a  clear  understanding  of  what   problems  the  discovery  service  is  expected  to  solve,  who  it  is  intended  to  serve,  and  how  it     information  technology  and  libraries  |  june  2015     25   supports  the  library’s  strategic  goals.  is  the  service  primarily  intended  for  undergraduates,  or  is  it   also  expected  to  serve  graduate  students  and  faculty?  
is  it  a  one-­‐stop  shop  for  all  information   needs,  a  starting  point  in  a  multi-­‐step  research  process,  or  merely  a  useful  tool  for  general  and   interdisciplinary  research?  second,  having  a  clear  vision  for  the  product  will  help  guide   implementation  and  assessment.  it  will  not  only  help  the  library  decide  how  to  configure  the   product  and  what  features  to  prioritize,  but  also  offer  explicit  benchmarks  by  which  to  evaluate   performance.  finally,  aligning  web-­‐scale  discovery  with  the  library’s  strategic  plan  will  help  put   the  project  in  wider  context  and  secure  buy-­‐in  across  all  units  in  the  organization.  having  a  clear   understanding  of  how  the  product  will  be  integrated  with  and  support  other  library  services  will   help  minimize  common  misunderstandings  and  ensure  wider  adoption.   educate  library  stakeholders   despite  the  quick  maturation  and  adoption  of  web-­‐scale  discovery  services,  these  technologies  are   still  relatively  new.  many  librarians  in  your  organization,  including  those  on  the  evaluation  team,   may  only  possess  a  cursory  understanding  of  what  these  tools  are  and  how  they  function.  creating   an  inclusive  evaluation  process  requires  having  an  informed  staff  that  can  participate  in  the   discussions  and  decision-­‐making  processes  leading  to  product  selection.  therefore  the  first  task  of   your  evaluation  team  should  be  to  educate  themselves  and  their  colleagues  on  the  ins  and  outs  of   web-­‐scale  discovery  services.  this  should  include  performing  a  literature  review,  collecting   information  about  products  currently  on  the  market,  and  reviewing  live  implementations  at  other   institutions.   at  rutgers,  the  evaluation  team  conducted  an  extensive  literature  review  that  resulted  in   annotated  bibliography  covering  all  aspects  of  web-­‐scale  discovery,  including  general   introductions,  product  reviews,  and  methodologies  for  evaluation,  implementation,  and   assessment.  all  team  members  were  encouraged  to  read  this  literature  to  familiarize  themselves   with  relevant  terminology,  products,  and  best  practices.  the  team  also  collected  product   information  from  vendor  websites  and  reviewed  live  implementations  at  other  institutions.  in  this   way,  members  were  able  to  familiarize  themselves  with  the  different  features  and  functionality   offered  by  each  vendor.   once  the  team  has  done  its  research,  it  can  begin  sharing  its  findings  with  the  rest  of  the  library   community.  vaughan  recommends  establishing  a  quick  and  easy  means  of  disseminating   information  such  as  an  internal  staff  website,  blog,  or  wiki  that  staff  can  visit  on  their  own  time.22   the  rutgers  team  created  a  private  libguide  that  served  as  a  central  repository  for  all  information   related  to  the  evaluation  process,  including  a  brief  introduction  to  web-­‐scale  discovery,   information  about  each  product,  recorded  vendor  demonstrations,  links  to  live  implementations,   and  an  annotated  bibliography.  also  included  was  information  about  the  team’s  ongoing  work,   including  the  group’s  charge,  timeline,  meeting  minutes,  and  reports.  
in  addition  to  maintaining  an   online  presence,  the  team  also  held  a  series  of  public  forums  and  workshops  to  educate  staff  about   the  nature  of  web-­‐scale  discovery  as  well  as  provide  updates  on  the  evaluation  process  and     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   26   respond  to  questions  and  concerns.  by  providing  staff  with  a  foundation  for  understanding  web-­‐ scale  discovery  and  the  process  by  which  these  products  were  to  be  evaluated,  the  team  sought  to   maximize  the  engagement  and  participation  of  the  larger  library  community.   schedule  vendor  demonstrations   once  everyone  has  a  conceptual  understanding  of  what  web-­‐scale  discovery  services  do  and  how   they  work,  it  is  time  to  begin  inviting  onsite  vendor  demonstrations.  these  presentations  give   library  staff  an  opportunity  to  see  these  products  in  action  and  ask  vendors  in-­‐depth  questions.   sessions  are  usually  led  by  a  sales  representative  and  product  manager  and  typically  include  a   brief  history  of  the  product’s  development,  a  demonstration  of  key  features  and  functionality,  and   an  audience  question-­‐and-­‐answer  period.  to  provide  a  level  playing  field  for  comparison,  the   evaluation  team  may  wish  to  submit  a  list  of  topics  or  questions  for  each  vendor  to  address  in   their  presentation.  this  could  be  a  general  outline  of  key  areas  of  interest  identified  by  the   evaluation  team  or  a  list  of  specific  questions  solicited  from  the  wider  library  community.  vaughan   offers  a  useful  list  of  questions  that  librarians  may  wish  to  consider  to  structure  vendor   demonstrations.23  one  tactic  used  by  the  evaluation  team  at  auburn  university  involved  requiring   vendors  to  use  their  products  to  answer  a  series  of  actual  reference  questions.24  this  not  only   precluded  them  from  using  canned  searches  that  might  only  showcase  the  strengths  of  their   products,  but  also  gave  librarians  a  better  sense  of  how  these  products  would  perform  out  in  the   wild  against  real  user  queries.  another  approach  might  be  to  invite  actual  users  to  the   demonstrations.  whether  you  are  fortunate  enough  to  have  users  on  your  evaluation  team  or  able   to  encourage  a  few  library  student  workers  to  attend,  your  users  may  raise  important  questions   that  your  staff  has  overlooked.   vendor  demonstrations  should  only  be  scheduled  after  the  evaluation  team  has  had  an   opportunity  to  educate  the  wider  library  community.  an  informed  staff  will  get  more  out  of  the   demos  and  be  better  equipped  to  ask  focused  questions.  as  vaughan  suggests,  demonstrations   should  be  scheduled  in  close  proximity  (preferably  within  the  same  month)  to  sustain  staff   engagement,  facilitate  retention  of  details,  and  make  it  easier  to  compare  services.25  with  the   vendor’s  permission,  libraries  should  also  consider  recording  these  sessions  and  making  them   available  to  staff  members  who  are  unable  to  attend.  at  the  conclusion  of  each  demonstration,   staff  should  be  invited  to  offer  their  feedback  on  the  presentation  or  ask  any  follow-­‐up  questions.   
this  can  be  accomplished  by  distributing  a  brief  paper  or  online  survey  to  the  attendees.   create  an  evaluation  rubric   perhaps  the  most  important  part  of  the  evaluation  process  is  developing  a  list  of  key  criteria  that   will  be  used  to  evaluate  and  compare  vendor  offerings.  once  the  evaluation  team  has  a  better   understanding  of  what  these  products  can  do  and  the  different  features  and  functionality  offered   by  each  vendor,  it  can  begin  defining  the  ideal  discovery  environment  for  its  institution.  this  often   takes  the  form  of  a  list  of  desirable  features  or  product  requirements.  the  process  for  generating     information  technology  and  libraries  |  june  2015     27   these  criteria  tends  to  vary  by  institution.  in  some  cases,  they  are  defined  by  the  team  leader  or   based  on  criteria  used  for  past  technology  purchases.26  in  other  cases,  criteria  are  compiled   through  a  review  of  the  literature.27  in  yet  other  cases,  they  are  developed  and  refined  with  input   from  library  staff  through  staff  surveys  and  meetings.28   one  important  element  missing  from  all  of  these  approaches  is  the  user.  to  ensure  the  evaluation   team  selects  the  best  tool  for  library  users,  product  requirements  should  be  firmly  rooted  in  an   assessment  of  user  needs.  the  university  of  michigan,  for  example,  used  persona  analysis  to   identify  common  user  needs  and  distilled  these  into  a  list  of  tangible  features  that  could  be  used   for  product  evaluation.29  other  tactics  for  assessing  user  needs  and  expectations  might  include   user  surveys,  interviews,  or  focus  groups.  these  tools  can  be  useful  for  gathering  information   about  what  users  want  from  your  web-­‐scale  discovery  system.  however,  these  methods  should  be   used  with  caution,  as  users  themselves  don’t  always  know  what  they  want,  particularly  from  a   product  they  have  never  used.  furthermore,  as  usability  experts  have  pointed  out,  what  users  say   they  want  may  not  be  what  they  actually  need.30  therefore  it  is  important  to  validate  data   collected  from  surveys  and  focus  groups  with  usability  testing.  to  reliably  determine  whether  a   product  meets  the  needs  of  your  users,  it  is  best  to  observe  what  users  actually  do  rather  than   what  they  say  they  do.   if  the  evaluation  team  has  a  short  timeframe  or  is  unable  to  undertake  extensive  user  research,  it   may  be  able  to  develop  product  requirements  on  the  basis  of  existing  research.  at  rutgers,  for   example,  the  libraries’  department  of  planning  and  assessment  conducts  a  standing  survey  to   collect  information  about  users’  opinions  of  and  satisfaction  with  library  services.  the  evaluation   team  was  able  to  use  this  data  to  learn  more  about  what  users  like  and  don’t  like  about  the   library’s  current  search  environment.  the  team  analyzed  more  than  700  user  comments  collected   from  2009  to  2012  related  to  the  library’s  catalog  and  electronic  resources.  comments  were   mapped  to  specific  types  of  features  and  functionality  that  users  want  or  expect  from  a  library   search  tool.  
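as a minimal sketch of this kind of tallying, assuming the mapping of each comment to one or more feature labels has already been done by the team (the sample comments and mappings below are invented for illustration, not drawn from the rutgers data):

```python
from collections import Counter

# each analyzed comment, already mapped by hand to one or more feature labels
# (these mappings are invented for illustration)
mapped_comments = [
    ["single point of access"],
    ["smart search (autocorrect/autocomplete)", "improved relevance ranking"],
    ["single point of access", "improved relevance ranking"],
    ["faceted browsing"],
]

# count how many comments mention each feature and list them by frequency
tally = Counter(label for comment in mapped_comments for label in comment)
for feature, count in tally.most_common():
    print(f"{feature}: {count} comment(s)")
```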
since most users don't typically articulate their needs in terms of concrete technical requirements, some interpretation was required on the part of the evaluation team. for example, the average user may not necessarily know what faceted browsing is, but a suggestion that there be "a way to browse through books by category instead of always having to use the search box" could reasonably be interpreted as a request for this feature. features were ranked in order of importance by the number of comments made about them. some of the most "requested" features included single point of access, "smart" search functionality such as autocorrect and autocomplete, and improved relevance ranking.

of course, user needs are not the only criteria to be considered when choosing a discovery service. organizational and staff needs must also be taken into account. user input is important for defining the functionality of the public interface, but staff input is necessary for determining back-end functionality and organizational fit. to the list of user requirements, the evaluation team added institutional requirements related to factors such as cost, coverage, customizability, and support. the team then conducted a library-wide survey inviting all staff to rank these requirements in order of importance and offer any additional requirements that should be factored into the evaluation.

combining the input from library staff and users, the evaluation team drafted a list of fifty-five product requirements (see appendix b), which became the basis for a comprehensive evaluation rubric that would be used to evaluate and ultimately select a web-scale discovery service. the design of the rubric was largely modeled after the one developed at penn state.31 requirements were arranged into five categories: content, functionality, usability, administration, and technology. each category was allocated to a subteam according to area of expertise that would be responsible for that portion of the evaluation. each requirement was assigned a weight according to its degree of importance: 3 = mandatory, 2 = desired, 1 = optional. each product was given a score based on how well it met each requirement: 3 = fully meets, 2 = partially meets, 1 = barely meets, 0 = does not meet. the total number of points awarded for each requirement was calculated by multiplying weight by score. the final score for each product was calculated by summing up the total number of points awarded (see appendix c).

this scoring method was particularly helpful in minimizing the influence of bias on the evaluation process. keep in mind that some stakeholders may possess personal preferences for or against a particular product because of current or past relations with the vendor, their experiences with the product while at another institution, or their perception of how the product might impact their own work.
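the weight-times-score arithmetic just described can be sketched compactly; a minimal illustration, in which only the weighting scheme comes from the rubric itself while the requirement names and sample scores are hypothetical:

```python
# weights: 3 = mandatory, 2 = desired, 1 = optional
weights = {                       # hypothetical requirement names
    "single point of access": 3,
    "faceted browsing": 2,
    "cover images and reviews": 1,
}

def final_score(product_scores: dict) -> int:
    """sum of weight * score, where each score is 0 (does not meet) through 3 (fully meets)."""
    return sum(weights[req] * product_scores.get(req, 0) for req in weights)

# hypothetical scores for one vendor's product
vendor_a = {"single point of access": 3, "faceted browsing": 2, "cover images and reviews": 3}
print(final_score(vendor_a))  # 3*3 + 2*2 + 1*3 = 16
```

laying the arithmetic out this way also makes it easy to recompute the totals if the library later decides to re-weight a requirement.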
by  establishing  a  set  of  predefined  criteria,  rooted  in  local  needs  and  measured   according  to  clear  and  consistent  standards,  the  team  adopted  an  evaluation  model  that  was  not   only  user-­‐centered,  but  also  allowed  for  a  fair,  unbiased,  and  systematic  evaluation  of  vendor   offerings.  this  is  particularly  important  for  libraries  that  must  go  through  a  formal  procurement   process  to  purchase  a  web-­‐scale  discovery  service.   draft  the  rfp   once  the  evaluation  team  has  defined  its  product  requirements  and  established  a  method  for   evaluating  the  products  in  the  marketplace,  it  can  set  to  work  drafting  a  formal  rfp.  some   institutions  may  be  able  to  forego  the  rfp  process.  others,  like  rutgers,  are  required  to  go  through   a  competitive  bidding  process  for  any  goods  and  services  purchased  over  a  certain  dollar  amount.   the  only  published  model  on  selecting  a  discovery  service  through  the  rfp  process  is  offered  by   freivalds  and  lush.32  the  authors  provide  a  brief  overview  of  the  pros  and  cons  of  using  an  rfp,   describe  the  process  developed  at  penn  state,  and  offer  several  useful  templates  to  help  guide  the   evaluation.   the  rfp  lets  vendors  know  that  the  organization  is  interested  in  their  product,  outlines  the   organization’s  requirements  for  said  product,  and  gives  the  vendors  an  opportunity  to  explain  in   detail  how  their  product  meets  these  requirements.  rfps  are  usually  written  in  collaboration  with     information  technology  and  libraries  |  june  2015     29   your  university’s  purchasing  department  who  typically  provides  a  template  for  this  purpose.  at  a   minimum,  your  rfp  should  include  the  following:   • background  information  about  the  library,  including  size,  user  population,  holdings,  and   existing  technical  infrastructure   • a  description  of  the  product  being  sought,  including  product  requirements,  services  and   support  expected  from  the  vendor,  and  the  anticipated  timeline  for  implementation   • a  summary  of  the  criteria  that  will  be  used  to  evaluate  proposals,  the  deadline  for   submission,  and  the  preferred  format  of  responses   • any  additional  terms  or  conditions  such  as  requiring  vendors  to  provide  references,  onsite   demonstrations,  trial  subscriptions,  or  access  to  support  and  technical  documentation   • information  about  who  to  contact  regarding  questions  related  to  the  rfp   rfps  are  useful  not  only  because  they  force  the  library  to  clearly  articulate  its  needs  for  web-­‐scale   discovery,  but  also  because  they  produce  a  detailed,  written  record  of  product  information  that   can  be  referenced  throughout  the  evaluation  process.  the  key  component  of  rutgers’  rfp  was  a   comprehensive,  135-­‐item  questionnaire  that  asked  vendors  to  spell  out  in  painstaking  detail  the   design,  technical,  and  functional  specifications  of  their  products  (see  appendix  d).  many  of  the   questions  were  either  borrowed  from  the  existing  literature  or  submitted  by  members  of  the   evaluation  team.  all  questions  were  directly  mapped  to  criteria  from  the  team’s  evaluation  rubric.   
the  responses  were  used  to  determine  how  well  each  product  met  these  criteria  and  factored  into   product  scoring.  vendors  were  given  one  month  to  respond  to  the  rfp.   interview  current  customers   while  vendor  marketing  materials,  demonstrations,  and  questionnaires  are  important  sources  of   product  information,  vendor  claims  should  not  simply  be  taken  at  face  value.  to  obtain  an   impartial  assessment  of  the  products  under  consideration,  the  evaluation  team  should  reach  out  to   current  customers.  there  are  several  ways  to  identify  current  discovery  service  subscribers.  many   published  overviews  of  web-­‐scale  discovery  services  offer  lists  of  example  implementations  for   each  major  discovery  provider.33  most  vendors  also  provide  a  list  of  subscribers  on  their  website   or  community  wiki  (or  will  provide  one  on  request).  and,  of  course,  there  is  also  marshall   breeding’s  invaluable  website,  library  technology  guides,  which  provides  up-­‐to-­‐date  information   about  technology  products  used  by  libraries  around  the  world.34  the  advanced  search  allows  you   to  filter  libraries  by  criteria  such  as  type,  collection  size,  geographic  area,  and  ils,  thereby  making   it  easier  to  identify  institutions  similar  to  your  own.   as  part  of  the  rfp  process,  all  four  vendors  were  required  to  provide  references  for  three  current   academic  library  customers  of  equivalent  size  and  classification  to  rutgers.  these  twelve   references  were  then  invited  to  take  an  online  survey  asking  them  to  share  their  opinions  of  and   experiences  with  the  product  (see  appendix  e).  the  survey  consisted  of  a  series  of  likert-­‐scale   questions  asking  each  reference  to  rate  their  satisfaction  with  various  functions  and  features  of     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   30   their  discovery  service.  this  was  followed  by  many  in-­‐depth  written  response  questions  regarding   topics  such  as  coverage,  quality  of  results,  interface  usability,  customization,  and  support.  follow-­‐ up  phone  interviews  were  conducted  in  cases  where  additional  information  or  clarification  was   needed.   the  surveys  permitted  the  evaluation  team  to  collect  feedback  from  current  customers  in  a  way   that  was  minimally  obtrusive  while  allowing  for  easy  analysis  and  comparison  of  responses.  it  also   provided  a  necessary  counterbalance  to  vendor  claims  by  giving  the  team  a  much  more  candid   view  of  each  product’s  strengths  and  weaknesses.  the  reference  interviews  helped  highlight   issues  and  areas  of  concern  that  were  frequently  minimized  or  glossed  over  in  communications   with  vendors  such  as  gaps  in  coverage,  inconsistent  metadata,  duplicate  results,  discoverability  of   local  collections,  and  problems  with  known-­‐item  searching.   configure  and  test  local  trials   although  the  evaluation  team  should  strive  to  collect  as  much  product  information  from  as  many   sources  as  possible,  no  amount  of  research  can  effectively  substitute  for  a  good  old-­‐fashioned  trial   evaluation.  
conducting trials using the library's own collections and local settings is the best way to gain first-hand insight into how a discovery service works. for some libraries, the expenditure of time and effort involved in configuring a web-scale discovery service can make the prospect of conducting trials prohibitive. as a result, many discovery evaluations tend to rely on testing existing implementations at other institutions. however, this method of evaluation only scratches the surface. for one thing, the evaluation team is only able to observe the front-end functionality of the public interface. but setting up a local trial gives the library an opportunity to peek under the hood and learn about back-end administration, explore configuration and customization options, attain a deeper understanding of the composition of the central index, and get a better feel for what it is like working with the vendor. second, discovery services are highly customizable, and the availability of certain features, functionality, and types of content varies by institution. as hoeppner points out, no individual site is capable of demonstrating the "full range of possibilities" available from any vendor.35 the presence or absence of certain features has as much to do with local library decisions as with any inherent limitations of the product. finally, establishing trials gives the evaluation team an opportunity to see how a particular discovery service performs within its own local environment. the ability to see how the product works with the library's own records, ils, link resolver, and authentication system allows the team to evaluate the compatibility of the discovery service with the library's existing technical infrastructure.

at rutgers, one of the goals of the rfp was to help narrow the pool of potential candidates from four to two. the evaluation team was asked to review vendor responses and apply the evaluation rubric to assign each a preliminary score on the basis of how well they met the library's requirements. the two top-scoring candidates would then be selected for a trial evaluation that would allow the team to conduct further testing and make a final recommendation. however, after the proposals were reviewed, the scores for three of the products were so close that the team decided to trial all three. the one remaining product scored notably lower than its competitors and was dropped from further consideration.

configuring trials for three different web-scale discovery services was no easy task, to be sure. an implementation team was formed to work with the vendors to get the trials up and running. the team received basic training for each product and was given full access to support and technical documentation. working with the vendors, the implementation team set to work loading the library's records and configuring local settings.
for the most part, the trials were basic out-of-the-box implementations with minimal customization. the vendors were willing to do much of the configuration work for us, but it was important that the team learn and understand the administrative functionality of each product, as this was an integral part of the evaluation process. all vendors agreed to a three-month trial period during which the evaluation team ran their products through a series of tests assessing three key areas: coverage, usability, and relevance ranking.

the importance of product testing cannot be overstated. as previously mentioned, web-scale discovery affects a wide variety of library services and, in most cases, will likely serve as the central point of access to the library's collections. before committing to a product, the library should have an opportunity to conduct independent testing to validate vendor claims and ensure that their products function according to the library's expectations. to ensure that critical issues are uncovered, testing should strive to simulate as much as possible the environment and behavior of your users by employing sample searches and strategies that they themselves would use. in fact, wherever possible, users should be invited to participate in testing and offer their feedback about the products under consideration. testing checklists and scripts must also be created to guide testers and ensure consistency throughout the process. as mandernach and condit fagan point out, although product testing is time-consuming and labor-intensive, it will ultimately save the time of your users and staff, who would otherwise be the first to encounter any bugs, and help avoid early unfavorable impressions of the product.36

the first test the evaluation team conducted aimed at evaluating the coverage and quality of indexing of each discovery product (see appendix f). loosely borrowing from methods employed at the university of chicago, twelve library subject specialists were recruited to help assess coverage within their discipline.37 each subject specialist was asked to perform three search queries representing popular research topics in their discipline and compare the results from each discovery service with respect to breadth of coverage and quality of indexing. in scoring each product, subject specialists were asked to consider the following questions:

• do the search results demonstrate broad coverage of the variety of subjects, formats, and content types represented in the library's collection?
• do any particular types of content seem to dominate the results (books, journal articles, newspapers, book reviews, reference materials, etc.)?
• are the library's local collections adequately represented in the results?
• do any relevant resources appear to be missing from the search results (e.g., results from an especially relevant database or journal)?
• do item records contain complete and accurate source information?
• do item records contain sufficient metadata (citations, subject headings, abstracts, etc.) to help users identify and evaluate results?

participants were asked to rate the performance of each discovery service in terms of coverage and indexing on a scale of 1 to 3 (1 = poor, 2 = average, 3 = good). although results varied by discipline, one product received the highest average scores in both areas. in their observations, participants frequently noted that it appeared to have better coverage and produce a greater variety of sources, while results from the other two products tended to be dominated by specific source types like newspapers or reference books. the same product was also noted to have more complete metadata, while the other two frequently produced results that lacked additional information like abstracts and subject terms.

the second test aimed to evaluate the usability of each discovery service. five undergraduate students of varying grade levels and areas of study were invited to participate in a task-based usability test (see appendix g). the purpose of the test was to assess users’ ability to use these products to complete common research tasks and to determine which product best meets their needs. students were asked to use all three products to complete five tasks while sharing their thoughts aloud. for the purposes of testing, products were referred to by letters (a, b, c) rather than by name. because participants were asked to complete the same tasks using each product, it was assumed that their ability to complete tasks might improve as the test progressed. accordingly, product order was randomized to minimize this potential bias. each session lasted approximately forty-five minutes and included a pre-test questionnaire to collect background information about the participant as well as a post-test questionnaire to ascertain their opinions of the products being tested. because users were being asked to test three different products, the number of tasks was kept to a minimum and focused only on basic product functionality. more comprehensive usability testing would be conducted after selection to help guide implementation and improve the selected product.
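the article does not describe the exact randomization scheme used to assign product order, so the short python sketch below illustrates only one way such counterbalancing might be arranged; the participant labels and the rotation through the six possible orderings are assumptions made purely for illustration.

```python
import itertools

# products were referred to by letters rather than by name during testing
products = ["a", "b", "c"]

# hypothetical participant labels; the study used five undergraduate students
participants = ["p1", "p2", "p3", "p4", "p5"]

# all six possible orderings of the three products
orders = list(itertools.permutations(products))

# rotate through the orderings so that no single order dominates the sessions
for i, participant in enumerate(participants):
    order = orders[i % len(orders)]
    print(participant, "tests the products in the order:", " -> ".join(order))
```

with only five participants and six possible orderings, no assignment can use every order equally often; a simple rotation like this (or a random shuffle per participant) merely spreads any learning effect as evenly as practical across the products.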
using each product, participants were asked to find three relevant sources on a topic, email the results to themselves, and attempt to obtain full text for at least one item. although the team noted potential problems in users’ interactions with all of the products, participants had slightly higher success rates with one product than with the others. furthermore, in the post-test questionnaire, four out of five users stated that they preferred this product to the other two, noting that they found it easier to navigate, obtained more relevant results, and had notably less difficulty accessing full text. a follow-up question asked participants how these products compared with the search tools currently offered by the library. almost all participants cited disappointing previous experiences with library databases and the catalog and suggested that a discovery tool might make finding materials easier. however, several users also suggested that none of these tools were “perfect” and that, while these discovery services may have the “potential” to improve their library experience, all could use a good deal of improvement, particularly in returning relevant results.

therefore, the evaluation team embarked on a third and final test of its top three discovery candidates, the goal of which was to evaluate relevance ranking. while usability testing is helpful for highlighting problems with the design of an interface, it is not always the best method for assessing the quality of results. in user testing, students frequently retrieved or selected results that were not relevant to the topic, and it was not always clear whether this outcome was attributable to a flaw in product design or to the users’ own ability to construct effective search queries and evaluate results. determining relevance is a subjective process and one that requires a certain level of expertise in the relevant subject area. therefore, to assess relevance ranking among the competing discovery services, the evaluation team turned once again to its library subject specialists.

echoing countless other user studies, our testing indicated that most users do not often scroll beyond the first page of results. a discovery service that harvests content from a wide variety of sources must therefore have an effective ranking algorithm capable of surfacing the most useful and relevant results. to evaluate relevance ranking, subject specialists were asked to construct a search query related to their area of expertise, perform this search in each discovery tool, and rate the relevance of the first ten results. results were recorded in the exact order retrieved and rated on a scale of 0–3 (0 = not relevant, 1 = somewhat relevant, 2 = relevant, 3 = very relevant).

two values were used to evaluate the relevance-ranking algorithm of each discovery service. relevance was assessed by calculating cumulative gain, the sum of all relevance scores. for example, if the first ten results returned by a discovery product each received a score of 3 because they were all deemed to be “very relevant,” the product would receive a cumulative gain score of 30. ranking was assessed by calculating discounted cumulative gain, which discounts the relevance score of results on the basis of where they appear in the rankings. on the assumption that the relevance of results should decrease with rank, each result after the first was assigned a discount factor of 1/log2(i), where i is the result’s rank. the relevance score of each result is multiplied by this discount factor to produce its discounted gain; for example, a result with a relevance score of 3 but a rank of 4 is discounted to 3/log2(4) = 1.5. discounted cumulative gain is the sum of all of these discounted gain scores.38
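to make the arithmetic concrete, the following python sketch computes cumulative gain and discounted cumulative gain for a single search; the ten relevance ratings are invented for illustration and are not drawn from the rutgers data. in the actual evaluation, the same calculations were embedded as formulas in an excel worksheet, as described below.

```python
from math import log2

# hypothetical relevance ratings (0-3) for the first ten results of one search,
# recorded in the exact order in which the discovery service returned them
ratings = [3, 2, 3, 3, 1, 0, 2, 1, 0, 2]

# cumulative gain: the simple sum of all relevance scores
cumulative_gain = sum(ratings)  # 17 for this example

# discounted cumulative gain: the first result keeps its full score, and each
# result after the first is multiplied by a discount factor of 1/log2(rank),
# so a score of 3 at rank 4 contributes 3/log2(4) = 1.5
discounted_cumulative_gain = ratings[0] + sum(
    score / log2(rank) for rank, score in enumerate(ratings[1:], start=2)
)

print(cumulative_gain, round(discounted_cumulative_gain, 2))
```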
eighteen librarians conducted a total of twenty-six searches. using a microsoft excel worksheet, participants were asked to record their search query, the titles of the first ten results, and the relevance score of each result (see appendix h). formulas for cumulative gain and discounted cumulative gain were embedded in the worksheet so that these values were calculated automatically. after all the values were calculated, one product had once again outperformed all others. in the majority of searches conducted, librarians rated its results as more relevant than those of its competitors. however, librarians were quick to point out that they were not entirely satisfied with the results from any of the three products. in their observations, they noted many of the same issues that were raised in previous rounds of testing, such as incomplete metadata, duplicate results, and overrepresentation of certain types of content.

at the end of the trial period, the evaluation team once again invited feedback from the library staff. an online library-wide survey was distributed in which staff members were asked to rank each discovery product according to several key requirements drawn from the team’s evaluation rubric. each requirement was accompanied by one or more questions for participants to consider in their evaluation, and the final question asked participants to rank the three candidates in order of preference. links to the trial implementations of all three products were included in the survey, and the email announcement also included a link to the team’s website, where participants could find more information about web-scale discovery. because participating in the survey required staff to review and interact with all three products, the team estimated that it would take forty-five minutes to an hour to complete (depending on the staff member’s familiarity with the products). given the amount of time and effort required for participation, relevant committees were also encouraged to review the trials and submit their evaluations as a group. the response rate for the survey was much lower than expected, possibly because of the amount of effort involved or because a large number of staff did not feel qualified to comment on certain aspects of the evaluation. however, among the staff members who did respond, one product was rated more highly than all others, and it was the same product that had received the highest scores in all three rounds of testing.

make final recommendation

at this stage in the process, your evaluation team should have collected enough data to make an informed selection decision.
your decision should take into consideration all of the information gathered throughout the evaluation process, including user and product research, vendor demonstrations, rfp responses, customer references, staff and user feedback, trials, and product testing. in preparation for the evaluation team’s final meeting, each subteam was asked to revisit the evaluation rubric. using all of the information that had been collected and made available on the team’s website, each subteam was asked to score the remaining three candidates on the basis of how well they met the requirements in its assigned category and to submit a report explaining the rationale for its scores. at the final meeting, a representative from each subteam presented its report to the larger group, and the entire team reviewed the scores awarded to each product. once a consensus was reached on the scoring, the final results were tabulated and the product that received the highest total score was selected.
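the rubric in appendix c assigns each requirement a weight (1 = optional, 2 = desired, 3 = mandatory) and each product a score (0 = does not meet through 3 = fully meets), with points calculated as weight × score. as a rough sketch of how the final tabulation might be done, the python fragment below totals weighted points for a few requirements taken from the rubric; the weights and scores shown are hypothetical and are not the values awarded at rutgers.

```python
# weights: 1 = optional, 2 = desired, 3 = mandatory (values here are hypothetical)
weights = {"relevancy ranking": 3, "faceted browsing": 3, "visual searching": 1}

# scores: 0 = does not meet ... 3 = fully meets (values here are hypothetical)
scores = {
    "product a": {"relevancy ranking": 2, "faceted browsing": 3, "visual searching": 0},
    "product b": {"relevancy ranking": 3, "faceted browsing": 2, "visual searching": 1},
}

# points = weight x score; a product's total is the sum of its points
for product, product_scores in scores.items():
    total = sum(weights[req] * score for req, score in product_scores.items())
    print(product, "total points:", total)
```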
once the evaluation team has reached a conclusion, its decision needs to be communicated to library stakeholders. the team’s findings should be compiled in a final report that includes a brief introduction to the subject of web-scale discovery, the factors motivating the library’s decision to acquire a discovery service, an overview of the methods that were used to evaluate these services, and a summary of the team’s final recommendation. of course, considering that few people in your organization may ever actually read the report, the team should seek out additional opportunities to present its findings to the community. the rutgers evaluation team presented its recommendation report on three different occasions. the first was a joint meeting of the library’s two major governing councils. after securing the support of the councils, the group’s recommendation was presented at a meeting of library administrators for final approval. once approved, a third and final presentation was given at an all-staff meeting and included a demonstration of the selected product. by taking special care to communicate the team’s decision openly and to make transparent the process used to reach it, the evaluation team not only demonstrated the depth of its research but also secured organizational buy-in and support for its recommendation.

conclusion

selecting a web-scale discovery service is a large and important undertaking that involves a significant investment of time, staff, and resources. finding the right match begins with a thorough and carefully planned evaluation process. the evaluation process outlined here is intended as a blueprint that similar institutions may wish to follow. however, every library has different needs, means, and goals, and while this process served rutgers well, certain elements may not be applicable to your institution. regardless of what method your library chooses, it should strive to create an evaluation process that is inclusive, goal-oriented, data-driven, user-centered, and transparent.

inclusive

web-scale discovery impacts a wide variety of library services and functions; therefore a complete and informed evaluation requires the participation and expertise of a broad cross section of library units. furthermore, as with the adoption of any new technology, the implementation of a web-scale discovery service can be potentially disruptive. these products introduce significant and sometimes controversial changes to staff workflows, user behavior, and library usage. ensuring broad involvement in the evaluation process can help allay potential concerns, reduce tensions, and ensure wider adoption.

goal-oriented

it can be easy to be seduced by new technologies simply because they are new, but merely adopting these technologies without taking the time to reflect on and communicate their purpose and goals can be a recipe for disaster. to select the best discovery tool for your library, evaluators must have a clear understanding of the problems it is intended to solve, the audience it is meant to serve, and the role it will play within the library’s larger mission. articulating the library’s vision and goals for web-scale discovery is crucial for establishing an evaluation plan, developing a prioritized list of product requirements, understanding what questions to ask vendors, and setting benchmarks by which to evaluate performance.

data-driven

to ensure an informed, fair, and impartial evaluation, evaluators should strive to incorporate data-driven practices into all of their decision-making. many library stakeholders, including members of the evaluation team, may enter the evaluation process with preexisting views on web-scale discovery, untested assumptions about user behavior, or strong opinions about specific products and vendors. to minimize the influence of these potential biases on the selection process, it is important that the team be able to demonstrate the rationale for its decisions through verifiable data. evaluating web-scale discovery services requires extensive research and should include data collected through user research, staff surveys, collections analysis, and product testing. all of this data should be carefully collected, analyzed, and used to inform the team’s final recommendation.

user-centered

if the purpose of adopting a web-scale discovery service is to better serve your users, then you should try as much as possible to involve users in the evaluation and selection process. this means including users on the evaluation team, grounding product requirements in user research, and gathering user feedback through surveys, focus groups, and product testing. this last step is especially important.
no other piece of information gathered throughout the evaluation process will be as helpful or revealing as actually watching users use these products to complete real-life research tasks. user testing is the best and, frankly, the only way to validate claims from both vendors and librarians about what your users want and need from your library’s search environment.

transparent

because web-scale discovery impacts library staff and users in significant ways, its reception within academic libraries has been somewhat mixed. as previously mentioned, securing staff buy-in is often one of the most difficult obstacles libraries face when introducing a new web-scale discovery service. while encouraging broad participation in the evaluation process helps facilitate buy-in, not every library stakeholder will be able to participate. therefore it is important that the evaluation team make a special effort to communicate its work and keep the library community updated on its progress. this can be done by creating a staff website or blog devoted to the evaluation process, sending periodic updates via the library’s electronic discussion list, holding public forums and demonstrations, regularly soliciting staff feedback through surveys and polls, and widely distributing the team’s findings and final report. these communications should help secure organizational support by making clear that the team’s recommendations are based on a thorough evaluation that is inclusive, goal-oriented, data-driven, user-centered, and transparent.

appendix a. overview of web-scale discovery evaluation plan

form an evaluation team: create an evaluation team representing a broad cross section of library units. draft a charge outlining the library’s goals for web-scale discovery and the team’s responsibilities, timetable, reporting structure, and membership. 1

educate library stakeholders: create a staff website or blog to disseminate information about web-scale discovery and the evaluation process. host workshops and public forums to educate staff, share information, and maximize community participation. 2

schedule vendor demonstrations: invite vendors for onsite product demonstrations. schedule visits in close proximity and provide vendors with an outline or list of questions in advance. invite all members of the library community to attend and offer feedback. 3

create an evaluation rubric: create a comprehensive, prioritized list of product requirements rooted in staff and user needs. develop a fair and consistent scoring method for determining how each product meets these requirements. 4

draft the rfp: if required, draft an rfp to solicit bids from vendors. include information about your library, a summary of your product requirements and evaluation criteria, and any terms or conditions of the bidding process. 5

interview current customers: obtain candid assessments of each product by interviewing current customers. ask customers to share their experiences and offer assessments on factors such as coverage, design, functionality, customizability, and vendor support.
6 configure and test local trials after narrowing down the options, select the top candidates for a trial evaluation. test the products with users and staff to evaluate and compare coverage, functionality, and result quality. 7 make final recommendation make an informed recommendation based on all of the information collected. compile the results of your research in a final report and communicate the team’s findings to the library community. 8   evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   38   appendix  b.  product  requirements  for  a  web-­‐scale  discovery  service   #   requirement   description   questions  to  consider   1   content           1.1   scope   provides  access  to  the  broadest   possible  spectrum  of  library   content  including  books,   periodicals,  audiovisual   materials,  institutional   repository  items,  digital   collections,  and  open  access   content   with  how  many  publishers  and   aggregators  does  the  vendor  have   license  agreements?  are  there  any   notable  exclusions?  how  many   total  unique  items  are  included  in   the  central  index?  how  many  open   access  resources  are  included?   what  percentage  of  content  is   mutually  licensed?  what  is  the   approximate  disciplinary,  format,   and  date  breakdown  of  the  central   index?  what  types  of  local  content   can  be  ingested  into  the  index  (ils   records,  institutional  repository   items,  digital  collections,  research   guides,  webpages,  etc.)?  can  the   library  customize  what  content  is   exposed  to  its  users?   1.2   depth   provides  the  richest  possible   metadata  for  all  indexed  items,   including  citations,  descriptors,   abstracts,  and  full  text   what  level  of  indexing  is  provided?   what  percentage  of  items  contains   only  citations?  what  percentage   includes  abstracts?  what   percentage  includes  full  text?   1.3   currency   provides  regular  and  timely   updates  of  licensed  content  as   well  as  on-­‐demand  updates  of   local  content     how  frequently  is  the  central  index   updated?  how  frequently  are  local   records  ingested?  can  the  library   initiate  a  manual  harvest  of  local   records?  can  the  library  initiate  a   manual  harvest  of  a  specific  subset   of  local  records?     information  technology  and  libraries  |  june  2015     39   1.4   data  quality   provides  clear  and  consistent   indexing  of  records  from  a   variety  of  different  sources  and   in  a  variety  of  different  formats     what  record  formats  are   supported?  what  metadata  fields   are  required  for  indexing?  how  is   metadata  from  different  sources   normalized  into  a  universal   metadata  schema?  how  are   controlled  vocabularies  created?  to   what  degree  can  collections  from   different  sources  have  their  own   unique  field  information  displayed   and/or  calculated  into  the   relevancy-­‐ranking  algorithm  for   retrieval  purposes?   1.5   language   supports  indexing  and   searching  of  foreign-­‐language   materials  using  non-­‐roman   characters   does  the  product  support  indexing   and  searching  of  foreign-­‐language   materials  using  non-­‐roman   characters?  what  languages  and   character  sets  are  supported?   
1.6   federated   searching   supports  incorporation  of   content  not  included  in  the   central  index  via  federated   searching   does  the  vendor  offer  federated   searching  of  sources  not  included   in  the  central  index?  how  are  these   sources  integrated  into  search   results?  is  there  an  additional  cost   for  adding  connectors  to  these   sources?   1.7   unlicensed  content   includes  and  makes   discoverable  additional  content   not  owned  or  licensed  by  the   library   are  local  collections  from  other   libraries  using  the  discovery   service  exposed  to  all  customers?   are  users  able  to  search  content   that  is  included  in  the  central  index   but  not  licensed  or  owned  by  the   host  library?     2   functionality           2.1   smart  searching   provides  “smart”  search   features  such  as  autocomplete,   autocorrect,  autostemming,   thesaurus  matching,  stop-­‐word   filtering,  keyword  highlighting,   etc.   what  “smart”  features  are  included   in  the  search  engine?  are  these   features  customizable?  can  they  be   enabled  or  disabled  by  the  library?     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   40   2.2   advanced  searching   provides  advanced  search   options  such  as  field  searching,   boolean  operators,  proximity   searching,  nesting,   wildcard/truncation,  etc.   what  types  of  advanced  search   options  are  available?  are  these   options  customizable?  can  they  be   enabled  or  disabled  by  the  library?   2.3   search  limits   provides  limits  for  refining   search  results  according  to   specified  criteria  such  as  peer-­‐ review  status,  full-­‐text   availability,  or  location   does  the  product  include   appropriate  limits  for  filtering   search  results?     2.4   faceted  browsing   allows  users  to  browse  the   index  by  facets  such  as  format,   author,  subject,  region,  era,  etc.   what  types  of  facets  are  available   for  browsing?  can  users  select   multiple  facets  in  different   categories?  are  facets  easy  to  add   or  remove  from  a  search?  are  facet   categories,  labels,  and  ordering   customizable?  can  facets  be   customized  by  format  or  material   type  (e.g.,  music,  film,  etc.)?   2.5   scoped  searching   provides  discipline-­‐,  format-­‐,  or   location-­‐specific  search  options   that  allow  searches  to  be   limited  to  a  set  of  predefined   resources  or  criteria   can  the  library  construct  scoped   search  portals  for  specific  campus   libraries,  disciplines,  or  formats?   can  these  portals  be  customized   with  different  search  options,   facets,  relevancy  ranking,  or  record   displays?   2.6   visual  searching   provides  visual  search  and   browse  options  such  as  tag   clouds,  cluster  maps,  virtual   shelf  browsing,  geo-­‐browsing,   etc.   does  the  product  provide  any   options  for  visualizing  search   results  beyond  text-­‐based  lists?  can   data  visualization  tools  be   integrated  into  search  result   display  with  additional   programming?   
2.7   relevancy  ranking   provides  useful  results  using  an   effective  and  locally   customizable  relevancy  ranking   algorithm   what  criteria  are  used  to  determine   relevancy  (term  frequency  and   placement,  format,  document   length,  publication  date,  user   behavior,  scholarly  value,  etc.)?   how  does  it  rank  items  with   varying  levels  of  metadata  (e.g.,   citation  only  vs.  citation  +  full  text)?   is  relevancy  ranking  customizable     information  technology  and  libraries  |  june  2015     41   by  the  library?  by  the  user?     2.8   deduplication   has  an  effective  method  for   identifying  and  managing   duplicate  records  within  results   does  the  product  employ  an   effective  method  of  deduplication?   2.9   record  grouping   groups  different  manifestations   of  the  same  work  together  in  a   single  record  or  cluster   does  the  product  employ  frbr  or   some  similar  method  to  group   multiple  manifestations  of  the  same   work?   2.10   result  sorting   provides  alternative  options   for  sorting  results  by  criteria   such  as  date,  title,  author,  call   number,  etc.   what  options  does  the  product   offer  for  sorting  results?   2.11   item  holdings   provides  real-­‐time  local   holdings  and  availability   information  within  search   results   how  does  the  product  provide  local   holdings  and  availability   information?  is  this  information   displayed  in  real-­‐time?  is  this   information  displayed  on  the   results  screen  or  only  within  the   item  record?   2.12   openurl   supports  openurl  linking  to   facilitate  seamless  access  from   search  results  to  electronic  full   text  and  related  services   how  does  the  product  provide   access  to  the  library’s  licensed  full-­‐ text  content?  are  openurl  links   displayed  on  the  results  screen  or   only  in  the  item  record?     2.13   native  record   linking   provides  direct  links  to  original   records  in  their  native  source   does  the  product  offer  direct  links   to  original  records  allowing  users   to  easily  navigate  from  the   discovery  service  to  the  record   source,  whether  it  is  a  subscription   database,  the  library  catalog,  or  the   institutional  repository?   2.14   output  options   provides  useful  output  options   such  as  print,  email,  text,  cite,   export,  etc.   what  output  options  does  the   product  offer?  what  citation   formats  are  supported?  which   citation  managers  are  supported?   are  export  options  customizable?     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   42   2.15   personalization   provides  personalization   features  that  allow  users  to   customize  preferences,  save   results,  bookmark  items,  create   lists,  etc.   what  personalization  features  does   the  product  offer?  are  these   features  linked  to  a  personal   account  or  only  session-­‐based?   must  users  create  their  own   accounts  or  can  accounts  be   automatically  linked  to  their   institutional  id?   2.16   recommendations   provides  recommendations  to   help  users  locate  similar  items   or  related  resources   does  the  product  provide  item   recommendations  to  help  users   locate  similar  items?  
does  the   product  provide  database   recommendations  to  help  users   identify  specialized  databases   related  to  their  topic?   2.17   account   management   allows  users  to  access  their   library  account  for  activities   such  as  renewing  loans,  placing   holds  and  requests,  paying   fines,  viewing  borrowing   history,  etc.   can  the  product  be  integrated  with   the  library’s  ils  to  provide   seamless  access  to  user  account   management  functions?  does  the   vendor  provide  any  drivers  or   technical  support  for  this  purpose?   2.18   guest  access   allows  users  to  search  and   retrieve  records  without   requiring  authentication   does  the  vendor  allow  for  “guest   access”  to  the  service?  are  users   required  to  authenticate  to  search   or  only  when  requesting  access  to   licensed  content?   2.19   context-­‐sensitive   services   interacts  with  university   identity  and  course-­‐ management  systems  to  deliver   customized  services  on  the   basis  of  user  status  and   affiliation   can  the  product  be  configured  to   interact  with  university  identity   and  course-­‐management  systems   to  deliver  customized  services  on   the  basis  of  user  status  and   affiliation?  does  the  vendor   provide  any  drivers  or  technical   support  for  this  purpose?   2.20   context-­‐sensitive   delivery  options   displays  context  sensitive   delivery  options  based  on  the   item’s  format,  status,  and   availability   can  the  product  be  configured  to   interact  with  the  library’s  ill  and   consortium  borrowing  services  to   display  context-­‐sensitive  delivery   options  for  unavailable  local   holdings?  does  the  vendor  provide   any  drivers  or  technical  support  for   this  purpose?     information  technology  and  libraries  |  june  2015     43       2.21   location  mapping   supports  dynamic  library   mapping  to  help  users   physically  locate  items  on  the   shelf   can  the  product  be  configured  to   support  location  mapping  by   linking  the  call  numbers  of  physical   items  to  online  library  maps?  what   additional  programming  is   required?   2.22   custom  widgets   supports  the  integration  of   custom  library  widgets  such  as   live  chat   can  the  library’s  chat  service  be   embedded  into  the  interface  to   provide  live  user  support?  where   can  it  be  embedded?  search  page?   result  screen?     2.23   featured  items   highlights  new,  featured,  or   popular  items  such  as  recent   acquisitions,  recreational   reading,  or  heavily  borrowed  or   downloaded  items   can  the  product  be  configured  to   dynamically  highlight  specific  items   or  collections  in  the  library?     2.24   alerts   provides  customizable  alerts  or   rss  feeds  to  inform  users  about   new  items  related  to  their   research  or  area  of  study   does  the  product  offer   customizable  alerts  or  rss  feeds?   2.25   user-­‐submitted   content   supports  user-­‐submitted   content  such  as  tags,  ratings,   comments,  and  reviews   what  types  of  user-­‐submitted   content  does  the  product  support?   is  this  content  only  available  to  the   host  library  or  is  it  shared  among   all  subscribers  of  the  service?  can   these  features  be  optionally   enabled  or  disabled?     
2.26   social  media   integration   allows  users  to  seamlessly   share  items  via  social  media   such  as  facebook,  twitter,   delicious,  etc.   what  types  of  social  media  sharing   does  the  product  support?  can   these  features  be  enabled  or   disabled?       evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   44   3   usability           3.1   design   provides  a  modern,   aesthetically  appealing  design   that  is  locally  customizable   does  the  product  have  a  modern,   aesthetically  pleasing  design?  is  it   easy  to  locate  all  important   elements  of  the  interface?  are   colors,  graphics,  and  spacing  used   effectively  to  organize  content?   what  aspects  of  the  interface  are   locally  customizable  (color  scheme,   branding,  navigation  menus,  result   display,  item  records,  etc.)?  can  the   library  apply  its  own  custom   stylesheets  or  is  customization   limited  to  a  set  or  predefined   options?   3.2   navigation   provides  an  interface  that  is   easy  to  use  and  navigate  with   little  or  no  specialized   knowledge     is  the  interface  intuitive  and  easy  to   navigate?  does  it  use  familiar   navigational  elements  and  intuitive   icons  and  labels?  are  links  clearly   and  consistently  labeled?  do  they   allow  the  user  to  easily  move  from   page  to  page  (forward  and  back)?   do  they  take  the  user  where  he  or   she  expects  to  go?   3.3   accessibility     meets  ada  and  section  508   accessibility  requirements   does  the  product  meet  ada  and   section  508  accessibility   requirements?   3.4   internationalization   provides  translations  of  the   user  interface  in  multiple   languages   does  the  vendor  offer  translations   of  the  interface  in  multiple   languages?  which  languages  are   supported?  does  this  include   translations  of  customized  text?   3.5   help   provides  user  help  screens  that   are  thorough,  easy  to   understand,  context-­‐sensitive,   and  customizable   are  product  help  screens  thorough,   easy  to  navigate,  and  easy  to   understand?  are  help  screens   general  or  context-­‐sensitive  (i.e.,   relevant  to  the  user’s  current   location  within  the  system)?  are   help  screens  customizable?       information  technology  and  libraries  |  june  2015     45   3.6   record  display   provides  multiple  record   displays  with  varying  levels  of   information  (e.g.,  preview,  brief   view,  full  view,  staff  view,  etc.)   are  record  displays  well  organized   and  easily  scannable?  does  the   product  offer  multiple  record   displays  with  varying  levels  of   information?  what  types  of  record   displays  are  available?  can  record   displays  be  customized  by  item   type  or  search  portal?   3.7   enriched  content   supports  integration  of   enriched  content  from  third-­‐ party  providers  such  as  cover   images,  table  of  contents,   author  biographies,  reviews,   excerpts,  journal  rankings,   citation  counts,  etc.   what  types  of  enriched  content   does  the  vendor  provide  or   support?  is  there  an  additional  cost   for  this  content?   
3.8   format  icons   provides  intuitive  icons  to   indicate  the  format  of  items   within  search  results   does  the  product  provide  any  icons   or  visual  cues  to  help  users  easily   recognize  the  formats  of  the  variety   of  items  displayed  in  search   results?  is  this  information   displayed  on  the  results  screen  or   only  within  the  item  record?  how   does  the  product  define  formats?   are  these  definitions  customizable?   3.9   persistent  urls   provides  short,  persistent  links   to  item  records,  search  queries,   and  browse  categories   does  the  product  offer  persistent   links  to  item  records?  what  about   persistent  links  to  canned  searches   and  browse  categories?  are  these   links  sufficiently  short  and  user-­‐ friendly?   4   administration           4.1   cost   is  offered  at  a  price  that  is   within  the  library’s  budget  and   proportional  to  the  value  of  the   service   how  is  product  pricing  calculated?   what  is  the  total  cost  of  the  service   including  initial  upfront  costs  and   ongoing  costs  for  subscription  and   technical  support?  what  additional   costs  would  be  incurred  for  add-­‐on   services  (e.g.,  federated  search,   recommender  services,  enriched   content,  customer  support,  etc.)?   4.2   implementation   is  capable  of  being   implemented  within  the   what  is  the  estimated  timeframe   for  implementation,  including     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   46       library’s  designated  timeframe   loading  of  local  records  and   configuration  and  customization  of   the  platform?   4.3   user  community   is  widely  used  and  respected   among  the  library’s  peer   institutions   how  many  subscribers  does  the   product  have?  what  percentage  of   subscribers  are  college  or   university  libraries?  how  do   current  subscribers  view  the   service?   4.4   support     is  supported  by  high-­‐quality   customer  service,  training,  and   product  documentation   does  the  vendor  provide  adequate   support,  training,  and  help   documentation?  what  forms  of   customer  support  are  offered?  how   adequate  is  the  vendor’s   documentation  regarding  content   agreements,  metadata  schema,   ranking  algorithms,  apis,  etc.?  does   the  vendor  provide  on-­‐site  and   online  training?  is  there  any   additional  cost  associated  with   training?   4.5   administrative   tools   is  supported  by  a  robust,  easy-­‐ to-­‐use  administrative  interface   and  customization  tools   does  the  product  have  an  easy  to   use  administrative  interface?  does   it  support  multiple  administrator   logins  and  roles?  what  tools  are   provided  for  product  customization   and  administering  access  control?   4.6   statistics  reporting   includes  a  robust  statistical   reporting  modules  for   monitoring  and  analyzing   product  usage     does  the  vendor  offer  a  means  of   capturing  and  reporting  system   and  usage  statistics?  what  kinds  of   data  are  included  in  such  reports?   in  what  formats  are  these  reports   available?  is  the  data  exportable?     
information  technology  and  libraries  |  june  2015     47       5   technology           5.1   development     is  a  sufficiently  mature  product   supported  by  a  stable  codebase   and  progressive  development   cycle   is  the  product  sufficiently  mature   and  supported  by  a  stable   codebase?  is  development   informed  by  a  dedicated  user’s   advisory  group?  how  frequently   are  improvements  and   enhancements  made  to  the  service?   is  there  a  formal  mechanism  by   which  customers  can  suggest,  rank,   and  monitor  the  status  of   enhancement  requests?  what   major  enhancements  are  planned   for  the  next  3–5  years?   5.2   authentication   is  compatible  with  the  library’s   authentication  protocols     does  the  product  allow  for  ip-­‐ authentication  for  on-­‐site  users  and   proxy  access  for  remote  users?   what  authentication  methods  are   supported  (e.g.,  ldap,  cas,   shibboleth,  etc.)?   5.3   browser   compatibility   is  compatible  with  all  major   web  browsers   what  browsers  does  the  vendor   currently  support?   5.4   mobile  access   is  accessible  on  mobile  devices   is  the  product  accessible  on  mobile   devices  via  a  mobile  optimized  web   interface  or  app?  does  the  mobile   version  include  the  same  features   and  functionality  of  the  desktop   version?     5.5   portability   can  be  embedded  in  external   platforms  such  as  library   research  guides,  course   management  systems,  or   university  portals   can  custom  search  boxes  be   created  and  embedded  in  external   platforms  such  as  library  research   guides,  course  management   systems,  or  university  portals?     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   48           5.6   interoperability   includes  a  robust  api  and  is   interoperable  with  other   major  library  systems  such  as   the  ils,  ill,  proxy  server,  link   resolver,  institutional   repository,  etc.     is  the  product  interoperable   with  other  major  library  systems   such  as  the  ils,  ill,  proxy  server,   link  resolver,  institutional   repository,  etc.?  does  the  vendor   offer  a  robust  api  that  can  be   used  to  extract  data  from  the   central  index  or  pair  it  with  a   different  interface?  what  types   of  data  can  be  extracted  with  the   api?   5.7   consortia  support   supports  multiple  product   instances  or  configurations  for   a  multilibrary  environment   can  the  technology  support   multiple  institutions  on  the  same   installation,  each  with  its  own   unique  instance  and  configuration   of  the  product?  is  there  an   additional  cost  for  this  service?     information  technology  and  libraries  |  june  2015     49   appendix  c.  
sample  web-­‐scale  discovery  evaluation  rubric   category   functionality   product   product  a     requirement   weight   score   points     notes   2.1  smart  searching           2.2  advanced   searching           2.3  search  limits           2.4  faceted  browsing           2.5  scoped  searching           2.6  visual  searching           2.7  relevancy  ranking           2.8  deduplication           2.9  record  grouping           2.10  result  sorting           2.11  item  holdings           2.12  openurl           2.13  native  record   linking           2.14  output  options           2.11  item  holdings                 weight  scale   1  =  optional   2  =  desired   3  =  mandatory   scoring  scale   0  =  does  not  meet   1  =  barely  meets   2  =  partially  meets   3  =  fully  meets   points  =  weight  ×  score   explanation  and   rationale  for  score     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   50   appendix  d.  web-­‐scale  discovery  vendor  questionnaire     1.  content       1.1  scope   with  how  many  content  publishers  and  aggregators  have  you  forged  content  agreements?   are  there  any  publishers  or  aggregators  with  whom  you  have  exclusive  agreements  that  prohibit   or  limit  them  from  making  their  content  available  to  competing  discovery  vendors?  if  so,  which   ones?   does  your  central  index  exclude  any  of  the  publishers  and  aggregators  listed  in  appendix  y  [not   reproduced  here]?  if  so,  which  ones?   how  many  total  unique  items  are  included  in  your  central  index?     what  is  the  approximate  disciplinary  breakdown  of  the  central  index?  what  percentage  of  content   pertains  to  subjects  in  the  humanities?  what  percentage  in  the  sciences?  what  percentage  in  the   social  sciences?   what  is  the  approximate  format  breakdown  of  the  central  index?  what  percentage  of  content   derives  from  scholarly  journals?  what  percentage  derives  from  magazines,  newspapers,  and  trade   publications?  what  percentage  derives  from  conference  proceedings?  what  percentage  derives   from  monographs?  what  percentage  derives  from  other  publications?   what  is  the  publication  date  range  of  the  central  index?  what  is  the  bulk  publication  date  range   (i.e.,  the  date  range  in  which  the  majority  of  content  was  published)?   does  your  index  include  content  from  open  access  repositories  such  as  doaj,  hathitrust,  and   arxiv?  if  so,  which  ones?   does  your  index  include  oclc  worldcat  catalog  records?  if  so,  do  these  records  include  holdings   information?   what  types  of  local  content  can  be  ingested  into  the  index  (e.g.,  library  catalog  records,   institutional  repository  items,  digital  collections,  research  guides,  library  web  pages,  etc.)?   can  your  service  host  or  provide  access  to  items  within  a  consortia  or  shared  catalog  like  the   pennsylvania  academic  library  consortium  (palci)  or  committee  on  institutional  cooperation   (cic)?   are  local  collections  (ils  records,  digital  collections,  institutional  repositories,  etc.)  from  libraries   that  use  your  discovery  service  exposed  to  all  customers?     information  technology  and  libraries  |  june  2015     51   can  the  library  customize  its  holdings  within  the  central  index?  
can  the  library  choose  what   content  to  expose  to  its  users?   1.2  depth   what  level  of  indexing  do  you  typically  provide  in  your  central  index?    what  percentage  of  items   contains  only  citations?  what  percentage  includes  abstracts?  what  percentage  includes  full  text?     1.3  currency   how  frequently  is  the  central  index  updated?   how  often  do  you  harvest  and  ingest  metadata  for  the  library’s  local  content?  how  long  does  it   typically  take  for  such  updates  to  appear  in  the  central  index?   can  the  library  initiate  a  manual  harvest  of  local  records?  can  the  library  initiate  a  manual  harvest   of  a  specific  subset  of  local  records?   1.4  data  quality   with  what  metadata  schemas  (marc,  mets,  mods,  ead,  etc.)  does  your  discovery  platform  work?     do  you  currently  support  rda  records?  if  not,  do  you  have  any  plans  to  do  so  in  the  near  future?   what  metadata  is  required  for  a  local  resource  to  be  indexed  and  discoverable  within  your   platform?   how  is  metadata  from  different  sources  normalized  into  a  universal  metadata  schema?     to  what  degree  can  collections  from  different  sources  have  their  own  unique  field  information   displayed  and/or  calculated  into  the  relevancy-­‐ranking  algorithm  for  retrieval  purposes?   do  you  provide  authority  control?  how  are  controlled  vocabularies  for  subjects,  names,  and  titles   established?     1.5  language   does  your  product  support  indexing  and  searching  of  foreign  language  materials  using  non-­‐ roman  characters?  what  languages  and  character  sets  are  supported?   1.6  federated  searching     how  does  your  product  make  provisions  for  sources  not  included  in  your  central  index?  is  it   possible  to  incorporate  these  sources  via  federated  search?  how  are  federated  search  results     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   52   displayed  with  the  results  from  the  central  index?  is  there  an  additional  cost  for  implementing   federated  search  connectors  to  these  resources?     1.7  unlicensed  content   are  end  users  able  to  search  content  that  is  included  in  your  central  index  but  not  licensed  or   owned  by  the  library?  if  so,  does  your  system  provide  a  locally  customizable  message  to  the  user   or  does  the  user  just  receive  the  publisher/aggregator  message  encouraging  them  to  purchase  the   article?  can  the  library  opt  not  to  expose  content  it  does  not  license  to  its  users?     2.  functionality     2.1  “smart”  searching   does  your  product  include  autocomplete  or  predictive  search  functionality?  how  are   autocomplete  predictions  populated?   does  your  product  include  autocorrect  or  “did  you  mean  .  .  .  ”  suggestions  to  correct  misspelled   queries?  how  are  autocorrect  suggestions  populated?     does  your  product  support  search  query  stemming  to  automatically  retrieve  search  terms  with   variant  endings  (e.g.,  car/cars)?   does  your  product  support  thesaurus  matching  to  retrieve  synonyms  and  related  words  (e.g.,   car/automobile)?         does  your  product  support  stop  word  filtering  to  automatically  remove  common  stop  words  (e.g.,   a,  an,  on,  from,  the,  etc.)  
from  search  queries?   does  your  product  support  search  term  highlighting  to  automatically  highlight  search  terms  found   within  results?     how  does  your  product  handle  zero  result  or  “dead  end”  searches?  please  describe  what  happens   when  a  user  searches  for  an  item  that  is  not  included  in  the  central  index  or  the  library’s  local   holdings  but  may  be  available  through  interlibrary  loan.   does  your  product  include  any  other  “smart”  search  features  that  you  think  enhance  the  usability   of  your  product?   are  all  of  the  above  mentioned  search  features  customizable  by  the  library?  can  they  be  optionally   enabled  or  disabled?     2.2  advanced  searching     information  technology  and  libraries  |  june  2015     53   does  your  product  support  boolean  searching  that  allows  users  to  combine  search  terms  using   operators  such  as  and,  or,  and  not?     does  your  product  support  fielded  searching  that  allows  users  to  search  for  terms  within  specific   metadata  fields  (e.g.,  title,  author,  subject,  etc.)?   does  your  product  support  phrase  searching  that  allows  users  to  search  for  exact  phrases?   does  your  product  support  proximity  searching  that  allows  users  to  search  for  terms  within  a   specified  distance  from  one  another?   does  your  product  support  nested  searching  to  allow  users  to  specify  relationships  between   search  terms  and  determine  the  order  in  which  they  will  be  searched?   does  your  product  support  wildcard  and  truncation  searching  that  allow  users  to  retrieve   variations  of  their  search  terms?   does  your  product  include  any  other  advanced  search  features  that  you  think  enhance  the   usability  of  your  product?   are  all  of  the  above  mentioned  search  features  customizable  by  the  library?  can  they  be  optionally   enabled  or  disabled?   2.3  search  limits   does  your  product  offer  search  limits  for  limiting  results  according  to  predetermined  criteria  such   as  peer-­‐review  status  or  full  text  availability?   2.4  faceted  browsing   does  your  product  support  faceted  browsing  of  results  by  attributes  such  as  format,  author,   subject,  region,  era,  etc.?  if  so,  what  types  of  facets  are  available  for  browsing?     is  faceted  browsing  possible  before  as  well  after  the  execution  of  a  search?     can  users  select  multiple  facets  in  different  categories?     are  facet  categories,  labels,  and  ordering  customizable  by  the  library?     can  specialized  materials  be  assigned  different  facets  in  accordance  with  their  unique  attributes   (e.g.,  allowing  users  to  browse  music  materials  by  unique  attributes  such  as  medium  of   performance,  musical  key/range,  recording  format,  etc.)?     2.5  scoped  searching     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   54   does  your  product  support  the  construction  of  multiple  scoped  search  portals  for  specific  campus   libraries,  disciplines  (medicine),  or  formats  (music/video)?     if  so,  what  aspects  of  these  search  portals  are  customizable  (branding,  search  options,  facets,   relevancy  ranking,  record  displays,  etc.)?   
2.6  visual  searching   does  your  product  provide  any  options  for  visualizing  search  results  beyond  text-­‐based  lists,  such   as  cluster  maps,  tag  clouds,  image  carousels,  etc.?     2.7  relevancy  ranking   please  describe  your  relevancy  ranking  algorithm.  in  particular,  please  describe  what  criteria  are   used  to  determine  relevancy  (term  frequency/placement,  item  format/length,  publication  date,   user  behavior,  scholarly  value,  etc.)  and  how  is  each  weighted?   how  does  your  product  rank  items  with  varying  levels  of  metadata  (e.g.,  citation  only  vs.  citation,   abstract,  and  full  text)?     is  relevancy  ranking  customizable  by  the  library?     can  relevancy  ranking  be  customized  by  end  users?   2.8  deduplication   how  does  your  product  identify  and  manage  duplicate  records?   2.9  record  grouping   does  your  product  employ  a  frbr-­‐ized  method  to  group  different  manifestations  of  the  same   work?   2.10  result  sorting   what  options  does  your  product  offer  for  sorting  results?   2.11  item  holdings   how  does  your  product  retrieve  and  display  availability  data  for  local  physical  holdings?  is  there  a   delay  in  harvesting  this  data  or  is  it  presented  in  real  time?  is  item  location  and  availability   displayed  in  the  results  list  or  only  in  the  item  record?       2.12  openurl     information  technology  and  libraries  |  june  2015     55   how  does  your  product  provide  access  to  the  library’s  licensed  full  text  content?   are  openurl  links  displayed  on  the  results  screen  or  only  in  the  item  record?   2.13  native  record  linking   does  your  product  offer  direct  links  to  original  records  in  their  native  source  (e.g.,  library  catalog,   institutional  repository,  third-­‐party  databases,  etc.)?   2.14  output  options   what  output  options  does  your  product  offer  (e.g.,  print,  save,  email,  sms,  cite,  export)?     if  you  offer  a  citation  function,  what  citation  formats  does  your  product  support  (mla,  apa,   chicago,  etc.)?   if  you  offer  an  export  function,  which  citation  managers  does  your  product  support  (e.g.,  refworks,   endnote,  zotero,  mendeley,  easybib,  etc.)?     are  citation  and  export  options  locally  customizable?  can  they  be  customized  by  search  portal?   2.15  personalization   does  your  product  offer  any  personalization  features  that  allow  users  to  customize  preferences,   save  results,  create  lists,  bookmark  items,  etc.?  are  these  features  linked  to  a  personal  account  or   are  they  session-­‐based?   if  personal  accounts  are  supported,  must  users  create  their  own  accounts  or  can  account  creation   be  based  on  the  university’s  cas/ldap  identity  management  system?   2.16  recommendations   does  your  product  provide  item  recommendations  to  help  users  locate  similar  items?  on  what   criteria  are  these  recommendations  based?   is  your  product  capable  of  referring  users  to  specialized  databases  based  on  their  search  query?   (for  example,  can  a  search  for  “autism”  trigger  database  recommendations  suggesting  that  the   user  try  their  search  in  psycinfo  or  pubmed?)  
if  so,  does  your  product  just  provide  links  to  these   resources  or  does  it  allow  the  user  to  launch  a  new  search  by  passing  their  query  to  the   recommended  database?     2.17  account  management   can  your  product  be  integrated  with  the  library’s  ils  (sirsidynix  symphony)  to  provide  users   access  to  its  account  management  functions  (e.g.,  renewing  loans,  placing  holds/requests,  viewing   borrowing  history,  etc.)?  if  so,  do  you  provide  any  drivers  or  technical  support  for  this  purpose?     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   56   2.18  guest  access   are  users  permitted  “guest  access”  to  the  service?  are  users  required  to  authenticate  in  order  to   search  or  only  when  requesting  access  to  licensed  content?   2.19  context-­‐sensitive  services   could  your  product  be  configured  to  interact  with  our  university  course  management  systems   (sakai,  blackboard,  and  ecollege)  to  deliver  customized  services  based  on  user  status  and   affiliation?  if  so,  do  you  provide  any  drivers  or  technical  support  for  this  purpose?   2.20  context-­‐sensitive  delivery  options   could  your  product  be  configured  to  interact  with  the  library’s  interlibrary  loan  (illiad)  and   consortium  borrowing  services  (ezborrow  and  uborrow)  to  display  context-­‐sensitive  delivery   options  for  unavailable  local  holdings?  if  so,  do  you  provide  any  drivers  or  technical  support  for   this  purpose?   2.21  location  mapping   could  your  product  be  configured  to  support  location  mapping  by  linking  the  call  numbers  of   physical  items  to  library  maps?   2.22  custom  widgets   does  your  product  support  the  integration  of  custom  library  widgets  such  as  live  chat?  where  can   these  widgets  be  embedded?   2.23  featured  items   could  your  product  be  configured  to  highlight  specific  library  items  such  as  recent  acquisitions,   popular  items,  or  featured  collections?     2.24  alerts   does  your  product  offer  customizable  alerts  or  rss  feeds  to  inform  users  about  new  items  related   to  their  research  or  area  of  study?   2.25  user-­‐submitted  content   does  your  product  support  user-­‐generated  content  such  as  tags,  ratings,  comments,  and  reviews?     is  user-­‐generated  content  only  available  to  the  host  library  or  is  it  shared  among  all  subscribers  of   your  service?   can  these  features  be  optionally  enabled  or  disabled?       information  technology  and  libraries  |  june  2015     57   2.26  social  media  integration   does  your  product  allow  users  to  seamlessly  share  items  via  social  media  such  as  facebook,   google+,  and  twitter?     can  these  features  be  optionally  enabled  or  disabled?   3.  usability       3.1  design   describe  how  your  product  incorporates  established  best  practices  in  usability.  what  usability   testing  have  you  performed  and/or  do  you  conduct  on  an  ongoing  basis?   what  aspects  of  the  interface’s  design  are  locally  customizable  (e.g.,  color  scheme,  branding,   display,  etc.)?     can  the  library  apply  its  own  custom  stylesheets  or  is  customization  limited  to  a  set  or  predefined   options?   
3.2  navigation   what  aspects  of  the  interface’s  navigation  are  locally  customizable  (e.g.,  menus,  pagination,  facets,   etc.)?     3.3  accessibility   does  your  product  meet  ada  and  section  508  accessibility  requirements?  what  steps  have  you   taken  beyond  section  508  requirements  to  make  your  product  more  accessible  to  people  with   disabilities?     3.4  internationalization   do  you  offer  translations  of  the  interface  in  multiple  languages?  which  languages  are  supported?   does  this  include  translation  of  any  locally  customized  text?   3.5  help   does  your  product  include  help  screens  to  assist  users  in  using  and  navigating  the  system?     are  help  screens  general  or  context-­‐sensitive  (i.e.,  relevant  to  the  user’s  current  location  within   the  system)?     are  help  screens  locally  customizable?   3.6  record  display     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   58   does  your  product  offer  multiple  record  displays  with  varying  levels  of  information?  what  types   of  record  displays  are  available  (e.g.,  preview,  brief  view,  full  view,  staff  view,  etc.)?   can  record  displays  be  customizable  by  item  type  or  metadata  (e.g.,  marc-­‐based  book  record  vs.   mods-­‐based  repository  record)?   can  record  displays  be  customizable  by  search  portal  (e.g.,  a  biosciences  search  portal  that   displays  medical  rather  than  lc  subject  headings  and  call  numbers)?   3.7  enriched  content   does  your  product  provide  or  support  the  integration  of  enriched  content  such  as  cover  images,   tables  of  contents,  author  biographies,  reviews,  excerpts,  journal  rankings,  citation  counts,  etc.?  if   so,  what  types  of  content  does  this  include?  is  there  an  additional  cost  for  this  content?   3.8  format  icons   does  your  product  provide  any  icons  or  visual  cues  to  help  users  easily  recognize  the  formats  of   the  variety  of  items  displayed  in  search  results?   how  does  your  product  define  formats?  are  these  definitions  readily  available  to  end  users?  are   these  definitions  customizable?   3.10  persistent  urls   does  your  product  offer  persistent  links  to  item  records?   does  your  product  offer  persistent  links  to  search  queries  and  browse  categories?   4.  administration     4.1  cost   briefly  describe  your  product  pricing  model  for  academic  library  customers.   4.2  implementation   can  you  meet  the  timetable  defined  in  appendix  z  [not  reproduced  here]?    if  not,  which  milestones   cannot  be  met  or  which  conditions  must  the  libraries  address  in  order  to  meet  the  milestones?   are  you  currently  working  on  web-­‐scale  discovery  implementations  at  any  other  large  institutions?     4.3  user  community   how  many  live,  active  installations  (i.e.,  where  the  product  is  currently  available  to  end-­‐users)  do   you  currently  have?     information  technology  and  libraries  |  june  2015     59   how  many  additional  customers  have  committed  to  the  product?   how  many  of  your  total  customers  are  college  or  university  libraries?   4.4  support   what  customer  support  services  and  hours  of  availability  do  you  provide  for  reporting  and/or   troubleshooting  technical  problems?   
do you have a help ticket tracking system for monitoring and notifying clients of the status of outstanding support issues?
do you offer a support website with up-to-date product documentation, manuals, tutorials, and faqs?
do you provide on-site and online training for library staff?
do you provide on-site and online training for end users?
briefly describe any consulting services you may provide above and beyond support services included with subscription (e.g., consulting services related to harvesting of a unique library resource for which an ingest/transform/normalize routine does not already exist).
do you have regular public meetings for users to share experiences and provide feedback on the product? if so, where and how often are these meetings held?
what other communication avenues do you provide for users to communicate with your company and also with each other (e.g., listserv, blog, social media)?

4.5 administration
what kinds of tools are provided for local administration and customization of the product?
does your product support multiple administrator logins and roles?

4.6 statistics reporting
what statistics reporting capabilities are included with your product? what kinds of data are available to track and assess collection management and product usage? in what formats are these reports available? is the data exportable?
is it possible to integrate third-party analytic tools such as google analytics in order to collect usage data?

5. technology

5.1 development
in what month and year did product development begin?
what key features differentiate your product from those of your competitors?
how frequently are enhancements and upgrades made to the service?
please describe the major enhancements you expect to implement in the next year.
please describe the future direction or major enhancements you envision for the product in the next 3–5 years.
is there a formal mechanism by which customers may make, rank, and monitor the status of enhancement requests?
do you have a dedicated user's advisory group to test and provide feedback on product development?

5.2 authentication
what authentication methods does your product support (e.g., ldap, cas, shibboleth, etc.)?

5.3 browser compatibility
please provide a list of currently supported web browsers.

5.4 mobile access
is the product accessible on mobile devices via a mobile-optimized web interface or app?
does the mobile version include the same features and functionality as the desktop version?

5.5 portability
can custom search boxes be created and embedded in external platforms such as the library's research guides, course management systems, or university portals?

5.6 interoperability
does your product include an api that can be used to extract data from the central index or pair it with a different interface? what types of data can be extracted with the api? do you provide documentation and instruction on the functionality and use of your api?
are there any known compatibility issues with your product and any of the following systems or platforms?
• drupal
• vufind
• sirsidynix symphony
• fedora commons
• ezproxy
• illiad
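to make the interoperability questions above concrete, the following minimal sketch shows the general shape of pulling records from a discovery service's central index over http. it is an illustration only: the endpoint url, api-key header, query parameters, and json field names are invented placeholders rather than any particular vendor's documented api, and real products differ in protocol and in what record data they expose.

import requests  # third-party http client (pip install requests)

# hypothetical discovery-layer search endpoint and key; real vendors differ
BASE_URL = "https://discovery.example.edu/api/v1/search"
API_KEY = "replace-with-your-key"

def search_central_index(query, limit=10):
    # ask the central index for brief records matching the query string
    response = requests.get(
        BASE_URL,
        params={"q": query, "limit": limit, "format": "json"},
        headers={"Authorization": f"apikey {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()
    # "records", "title", and "source" are placeholder field names
    return response.json().get("records", [])

for record in search_central_index('"media violence" AND children'):
    print(record.get("title"), "|", record.get("source"))

a vendor's answer to 5.6 can then be checked against this kind of call: whether such an endpoint exists, whether it is documented, and which fields and result counts it actually returns.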
5.7 consortia support
can your product support multiple institutions on the same installation, each with its own unique instance and configuration of the product? is there any additional cost for this service?

appendix e. web-scale discovery customer questionnaire

institutional background
please tell us a little bit about your library.
what is the name of your college or university?
which web-scale discovery service is currently in use at your library?
☐ ebsco discovery service (eds)
☐ primo central (ex libris)
☐ summon (proquest)
☐ worldcat local (oclc)
☐ other ________________
when was your current web-scale discovery service selected (month, year)?
how long did it take to implement (even in beta form) your current web-scale discovery service?
which of the following types of content are included in your web-scale discovery service? (check all that apply)
☐ library catalog records
☐ periodical indexes and databases
☐ open access content
☐ institutional repository records
☐ local digital collections (other than your institutional repository)
☐ library research guides
☐ library web pages
☐ other ________________

rate your satisfaction
on a scale of 1 (low) to 5 (high), please rate your satisfaction with the following aspects of your web-scale discovery service.

content
how satisfied are you with the scope, depth, and currency of coverage provided by your web-scale discovery service?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

functionality
how satisfied are you with the search functionality, performance, and result quality of your web-scale discovery service?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

usability
how satisfied are you with the design, layout, navigability, and overall ease of use of your web-scale discovery interface?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

administration
how satisfied are you with the administrative, customization, and reporting tools offered by your web-scale discovery service?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

technology
how satisfied are you with the level of interoperability between your web-scale discovery service and other library systems such as your ils, knowledge base, link resolver, and institutional repository?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

overall
overall, how satisfied are you with your institution's web-scale discovery service?
◌ 1 ◌ 2 ◌ 3 ◌ 4 ◌ 5

questions
please share your experiences with your web-scale discovery service by responding to the following questions.
briefly describe your reasons for implementing a web-scale discovery service. what role does this service play at your library? how is it intended to benefit your users? what types of users is it intended to serve?
does your web-scale discovery service have any notable gaps in coverage? if so, how do you compensate for those gaps or make users aware of resources that are not included in the service?
are you satisfied with the relevance of the results returned by your web-scale discovery service? have you noticed any particular anomalies within search results?
does your web-scale discovery service lack any specific features or functions that you wish were available?
are there any particular aspects of your web-scale discovery service that you wish were customizable but are not?
did you face any particular challenges integrating your web-scale discovery service with other library systems such as your ils, knowledge base, and link resolver?
how responsive has the vendor been in providing technical support, resolving problems, and responding to enhancement requests? have they provided adequate training and documentation to support your implementation?
in general, how have users responded to the introduction of this service? has their response been positive, negative, or mixed?
in general, how have librarians responded to the introduction of this service? has their response been positive, negative, or mixed?
what has been the impact of implementing a web-scale discovery service on the overall usage of your collection? have you noticed any fluctuations in circulation, full text downloads, or usage of subject-specific databases?
has your institution conducted any assessment or usability studies of your web-scale discovery service? if so, please briefly describe the key findings of these studies.
please share any additional thoughts or advice that you think might be helpful to other libraries currently exploring web-scale discovery services.

appendix f. sample worksheet for web-scale discovery coverage test

instructions
construct 3 search queries representing commonly researched topics in your discipline. test your queries in each discovery product and compare the results. for each product, record the number of results retrieved and rate the quality of coverage and indexing. use the space below your ratings to explain your rationale and record any notes or observations. rate coverage and indexing on a scale of 1 to 3 (1 = poor, 2 = average, 3 = good).
in  your   evaluation,  please  consider  the  following:       coverage   indexing   •  do  the  search  results  demonstrate  broad   coverage  of  the  variety  of  subjects,  formats,   and  content  types  represented  in  the   library’s  collection?  (hint:  use  facets  to   examine  the  breakdown  of  results  by   source  type  or  collection).     •  do  any  particular  types  of  content  seem   to  dominate  the  results  (books,  journal   articles,  newspapers,  book  reviews,   reference  materials,  etc.)?   •  are  the  library’s  local  collections   adequately  represented  in  the  results?   •  do  any  relevant  resources  appear  to  be   missing  from  the  search  results  (e.g.,   results  from  an  especially  relevant   database  or  journal)?   •  do  item  records  contain  complete  and   accurate  source  information?   •  do  item  records  contain  sufficient   metadata  (citation,  subject  headings,   abstracts,  etc.)  to  help  users  identify  and   evaluate  results?               evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   66     example   product   product  b   reviewer   reviewer  #2   discipline   history   query   results   coverage   indexing   kw:  slavery  and   “united  states”   181,457   1  (poor)   3  (good)   the  majority  of  results   appear  to  be  from   newspapers  and  periodicals.   some  items  designated  as   “journals”  are  actually   magazines.  there  are  a  large   number  of  duplicate  records.   some  major  works  on  this   subject  are  not  represented  in   the  results.     depth  of  indexing  varies  by   publication  but  most  include   abstracts  and  subject   headings.  some  records  only   include  citations,  but   citations  appear  to  be   complete  and  accurate.             information  technology  and  libraries  |  june  2015     67   appendix  g.  sample  worksheet  for  web-­‐scale  discovery  usability  test   pre-­‐test  questionnaire     before  beginning  the  test,  ask  the  user  for  the  following  information.   status       �  undergraduate     �  graduate     �  faculty   �  staff     �  other   major/department   ___________________________   what  resource  do  you  use  most  often  for  scholarly  research?          ___________________________   on  a  scale  of  1  to  5,  how  would  you  rate  your  ability  to  find  information  using  library   resources?   low   �  1       �  2       �  3     �  4     �  5   high   on  a  scale  of  1  to  5,  how  would  you  rate  your  ability  to  find  information  using  google  or   other  search  engines?   low   �  1       �  2       �  3     �  4     �  5   high     scenarios   ask  the  user  to  complete  the  following  tasks  using  each  product  while  sharing  their   thoughts  aloud.   1.  you  are  writing  a  research  paper  for  your  communications  course.  you’ve  recently  been   discussing  how  social  media  sites  like  facebook  collect  and  store  large  amounts  of  personal   data.  you  decide  to  write  a  paper  that  answers  the  question:  “are  social  networking  sites  a   threat  to  privacy?”  use  the  search  tool  to  find  sources  that  will  help  you  support  your   argument.   2.  from  the  first  10  results,  select  those  that  you  would  use  to  learn  more  about  this  topic   and  email  them  to  yourself.  if  none  of  the  results  seem  useful,  do  not  select  any.   3.  
if you were writing a paper on this topic, how satisfied would you be with these results?
☐ very dissatisfied ☐ dissatisfied ☐ no opinion ☐ satisfied ☐ very satisfied
4. from the first 10 results, attempt to access an item for which full text is available online.
5. now that you've seen the first 10 results, what would you do next?
☐ decide you have enough information and stop
☐ continue and review the next set of results
☐ revise your search and try again
☐ exit and try your search in another library database (which one?)
☐ exit and try your search in google or another search engine
☐ other (please explain)

post-test questionnaire
after the user has used all three products, ask them about their experiences.
based on your experience, please rank the three search tools you've seen in order of preference.
how would you compare these search tools with the search options currently offered by the library?

appendix h. sample worksheet for web-scale discovery relevance test

instructions
conduct the same search query in each discovery product and rate the relevance of the first 10 results using the scale provided. for each query, record your search condition, terms, and limiters. for each product, record the first 10 results in the exact order they appear, rank the relevance of each result using the relevance scale, and explain the rationale for your score. all calculations will be tabulated automatically.

relevance scale
0 = not relevant: not at all relevant to the topic, exact duplicate of a previous result, or not enough information in the record or full text to determine relevance
1 = somewhat relevant: somewhat relevant but does not address all of the concepts or criteria specified in the search query, e.g., addresses only part of the topic, is too broad or narrow in scope, is not in the specified format, etc.
2 = relevant: relevant to the topic, but the topic may not be the primary or central subject of the work, or the work is too brief or dated to be useful; a resource that the user might select
3 = very relevant: completely relevant; exactly on topic; addresses all concepts and criteria included in the search query; a resource that the user would likely select

calculations
cumulative gain: measure of overall relevance based on the sum of all relevance scores.
discount factor (1/log2(i)): penalization of relevance based on ranking. assuming that relevance decreases with rank, each result after the first is associated with a discount factor based on log base 2. the discount factor is calculated as 1/log2(i), where i = rank. the discount factor of result #6 is calculated as 1 divided by the logarithm of 6 with base 2, or 1/log2(6) = 0.39.
discounted gain: discounted gain is calculated by multiplying a result's relevance score by its discount factor. the discounted gain of a result with a relevance score of 3 and a discount factor of 0.39 is 3 × 0.39, or 1.17.
discounted cumulative gain: measure of overall discounted gain based on the sum of all discounted gain scores.
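the arithmetic described above can be expressed compactly in code. the short python sketch below (not part of the original worksheet) computes cumulative gain, per-result discount factors, discounted gains, and discounted cumulative gain from an ordered list of relevance scores, following the worksheet's convention that the first result is not discounted; the sample scores are taken from the example that follows.

from math import log2

def discount_factor(rank):
    # worksheet convention: result 1 is not discounted,
    # every later result is discounted by 1 / log2(rank)
    return 1.0 if rank == 1 else 1.0 / log2(rank)

def relevance_summary(scores):
    # scores is an ordered list of 0-3 relevance ratings, one per result
    cumulative_gain = sum(scores)
    discounted_gains = [score * discount_factor(rank)
                        for rank, score in enumerate(scores, start=1)]
    return cumulative_gain, discounted_gains, sum(discounted_gains)

# relevance scores for ranks 1-10 in the product c example below
scores = [0, 3, 2, 3, 1, 3, 2, 2, 1, 2]
cg, gains, dcg = relevance_summary(scores)
print(cg)             # 19 (cumulative gain)
print(round(dcg, 2))  # 9.65 (discounted cumulative gain)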
example
product: product c
reviewer: reviewer #3
search condition: seeking peer reviewed articles about the impact of media violence on children
search terms: "mass media" and violence and children
limits: peer reviewed
cumulative gain: 19
discounted cumulative gain: 9.65

rank 1 (relevance 0; discount factor 1.00; discounted gain 0): "effects of media ratings on children and adolescents: a litmus test of the forbidden fruit effect." research article suggesting that ratings do not influence children's perceptions of films or video games. not relevant; does not discuss impact of media violence on children.

rank 2 (relevance 3; discount factor 1.00; discounted gain 3): "media violence associations with the form and function of aggression among elementary school children." research article demonstrating a positive association between media violence exposure and levels of physical and relational aggression in grade school students. very relevant.

rank 3 (relevance 2; discount factor 0.63; discounted gain 1.26): "harmful effects of media on children and adolescents." review article discussing the influence of media on negative child behaviors such as violence, substance abuse, and sexual promiscuity. relevant but does not focus exclusively on media violence.

rank 4 (relevance 3; discount factor 0.50; discounted gain 1.5): "the influence of media violence on children." review article examining opposing views on media violence and its impact on children. very relevant.

rank 5 (relevance 1; discount factor 0.43; discounted gain 0.43): "remote control childhood: combating the hazards of media culture in schools." review article discussing the harmful effects of mass media on child behavior and learning as well as strategies educators can use to counteract them. somewhat relevant but does not focus exclusively on media violence and discussion is limited to the educational context.

rank 6 (relevance 3; discount factor 0.39; discounted gain 1.17): "media violence, physical aggression, and relational aggression in school age children." research article on the impact of media violence on childhood aggression in relation to different types of aggression, media, and time periods. very relevant.

rank 7 (relevance 2; discount factor 0.36; discounted gain 0.72): "do you see what i see? parent and child reports of parental monitoring of media." research article examining the effectiveness of parental monitoring of children's violent media consumption. relevant but focused less on the effects of media violence than strategies for mitigating them.
rank 8 (relevance 2; discount factor 0.33; discounted gain 0.66): "exposure to media violence and young children with and without disabilities: powerful opportunities for family-professional partnerships." review article discussing the impact of media violence on children with and without disabilities and recommendations for addressing this through family-professional partnerships. relevant but slightly more specific than required.

rank 9 (relevance 1; discount factor 0.32; discounted gain 0.32): "kitle iletisim araçlarindan televizyonun 3-6 yas grubundaki çocuklarin davranislari üzerine etkisi." research article demonstrating a positive correlation between media violence exposure and aggressive behavior in grade school students. seems very relevant but article is in turkish.

rank 10 (relevance 2; discount factor 0.30; discounted gain 0.60): "sex and violence: is exposure to media content harmful to children?" review article discussing how exposure to violent or sexually explicit media influences child behavior and what librarians can do about it. relevant but less than two pages long.

references

1. judy luther and maureen c. kelly, "the next generation of discovery," library journal 136, no. 5 (2011): 66.
2. athena hoeppner, "the ins and outs of evaluating web-scale discovery services," computers in libraries 32, no. 3 (2012): 8.
3. kate b. moore and courtney greene, "choosing discovery: a literature review on the selection and evaluation of discovery layers," journal of web librarianship 6, no. 3 (2012): 145–63, http://dx.doi.org/10.1080/19322909.2012.689602.
4. ronda rowe, "web-scale discovery: a review of summon, ebsco discovery service, and worldcat local," charleston advisor 12, no. 1 (2010): 5–10, http://dx.doi.org/10.5260/chara.12.1.5; ronda rowe, "encore synergy, primo central," charleston advisor 12, no. 4 (2011): 11–15, http://dx.doi.org/10.5260/chara.12.4.11.
5. sharon q. yang and kurt wagner, "evaluating and comparing discovery tools: how close are we towards the next generation catalog?" library hi tech 28, no. 4 (2010): 690–709, http://dx.doi.org/10.1108/07378831011096312.
6. jason vaughan, "web scale discovery services," library technology reports 47, no. 1 (2011): 5–61, http://dx.doi.org/10.5860/ltr.47n1.
7. hoeppner, "the ins and outs of evaluating web-scale discovery services."
8. luther and kelly, "the next generation of discovery"; amy hoseth, "criteria to consider when evaluating web-based discovery tools," in planning and implementing resource discovery tools in academic libraries, ed. mary p. popp and diane dallis (hershey, pa: information science reference, 2012), 90–103, http://dx.doi.org/10.4018/978-1-4666-1821-3.ch006.
9. f. william chickering and sharon q. yang, "evaluation and comparison of discovery tools: an update," information technology & libraries 33, no. 2 (2014): 5–30, http://dx.doi.org/10.6017/ital.v33i2.3471.
10. noah brubaker, susan leach-murray, and sherri parker, "shapes in the cloud: finding the right discovery layer," online 35, no. 2 (2011): 20–26.
11.
jason  vaughan,  “investigations  into  library  web-­‐scale  discovery  services,”  information   technology  &  libraries  31,  no.  1  (2012):  32–82,  http://dx.doi.org/10.6017/ital.v31i1.1916.   12.    mary  p.  popp  and  diane  dallis,  eds.,  planning  and  implementing  resource  discovery  tools  in   academic  libraries  (hershey,  pa:  information  science  reference,  2012),   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.   13.    jason  vaughan,  “evaluating  and  selecting  a  library  web-­‐scale  discovery  service,”  in  planning   and  implementing  resource  discovery  tools  in  academic  libraries,  ed.  mary  p.  popp  and  diane     evaluating  web-­‐scale  discovery  services:  a  step-­‐by-­‐step  guide  |  deodato   doi:  10.6017/ital.v34i2.5745   74     dallis  (hershey,  pa:  information  science  reference,  2012),  59–76,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch004.   14.    monica  metz-­‐wiseman  et  al.,  “best  practices  for  selecting  the  best  fit,”  in  planning  and   implementing  resource  discovery  tools  in  academic  libraries,  ed.  mary  p.  popp  and  diane   dallis  (hershey,  pa:  information  science  reference,  2012),  77–89,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch005.   15.    david  freivalds  and  binky  lush,  “thinking  inside  the  grid:  selecting  a  discovery  system   through  the  rfp  process,”  in  planning  and  implementing  resource  discovery  tools  in  academic   libraries,  ed.  mary  p.  popp  and  diane  dallis  (hershey,  pa:  information  science  reference,   2012),  104–21,  http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch007.     16.    david  bietila  and  tod  olson,  “designing  an  evaluation  process  for  resource  discovery  tools,”   in  planning  and  implementing  resource  discovery  tools  in  academic  libraries,  ed.  mary  p.  popp   and  diane  dallis  (hershey,  pa:  information  science  reference,  2012),  122–36,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch008.   17.    suzanne  chapman  et  al.,  “developing  a  user-­‐centered  article  discovery  environment,”  in   planning  and  implementing  resource  discovery  tools  in  academic  libraries,  ed.  mary  p.  popp   and  diane  dallis  (hershey,  pa:  information  science  reference,  2012),  194–224,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch012.   18.    lynn  d.  lampert  and  katherine  s.  dabbour,  “librarian  perspectives  on  teaching  metasearch   and  federated  search  technologies,”  internet  reference  services  quarterly  12,  no.3/4  (2007):   253–78,  http://dx.doi.org/10.1300/j136v12n03_02;  william  breitbach,  “web-­‐scale   discovery:  a  library  of  babel?”  in  planning  and  implementing  resource  discovery  tools  in   academic  libraries,  ed.  mary  p.  popp  and  diane  dallis  (hershey,  pa:  information  science   reference,  2012),  637–45,  http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch038.   19.    metz-­‐wiseman  et  al.,  “best  practices  for  selecting  the  best  fit,”  81.   20.    meris  a.  mandernach  and  jody  condit  fagan,  “creating  organizational  buy-­‐in:  overcoming   challenges  to  a  library-­‐wide  discovery  tool  implementation,”  in  planning  and  implementing   resource  discovery  tools  in  academic  libraries,  ed.  mary  p.  popp  and  diane  dallis  (hershey,   pa:  information  science  reference,  2012),  422,  http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐ 3.ch024.   21.    
david p. brennan, "details, details, details: issues in planning for, implementing, and using resource discovery tools," in planning and implementing resource discovery tools in academic libraries, ed. mary p. popp and diane dallis (hershey, pa: information science reference, 2012), 44–56, http://dx.doi.org/10.4018/978-1-4666-1821-3.ch003; hoseth, "criteria to consider when evaluating web-based discovery tools"; mandernach and condit fagan, "creating organizational buy-in."
22. vaughan, "evaluating and selecting a library web-scale discovery service," 64.
23. ibid., 81.
24. nadine p. ellero, "an unexpected discovery: one library's experience with web-scale discovery service (wsds) evaluation and assessment," journal of library administration 53, no. 5–6 (2014): 323–43, http://dx.doi.org/10.1080/01930826.2013.876824.
25. vaughan, "evaluating and selecting a library web-scale discovery service," 66.
26. hoseth, "criteria to consider when evaluating web-based discovery tools."
27. yang and wagner, "evaluating and comparing discovery tools"; chickering and yang, "evaluation and comparison of discovery tools"; bietila and olson, "designing an evaluation process for resource discovery tools."
28. vaughan, "investigations into library web-scale discovery services"; vaughan, "evaluating and selecting a library web-scale discovery service"; freivalds and lush, "thinking inside the grid"; brubaker, leach-murray, and parker, "shapes in the cloud."
29. chapman et al., "developing a user-centered article discovery environment."
30. jakob nielsen, "first rule of usability? don't listen to users," nielsen norman group, last modified august 5, 2001, accessed august 5, 2014, http://www.nngroup.com/articles/first-rule-of-usability-dont-listen-to-users.
31. freivalds and lush, "thinking inside the grid."
32. ibid.
33. matthew b. hoy, "an introduction to web scale discovery systems," medical reference services quarterly 31, no. 3 (2012): 323–29, http://dx.doi.org/10.1080/02763869.2012.698186; vaughan, "web scale discovery services"; vaughan, "investigations into library web-scale discovery services"; hoeppner, "the ins and outs of evaluating web-scale discovery services"; chickering and yang, "evaluation and comparison of discovery tools."
34. marshall breeding, "major discovery products," library technology guides, accessed august 5, 2014, http://librarytechnology.org/discovery.
35. hoeppner, "the ins and outs of evaluating web-scale discovery services," 40.
36. mandernach and condit fagan, "creating organizational buy-in," 429.
37. bietila and olson, "designing an evaluation process for resource discovery tools."
38. special thanks to rutgers' associate university librarian for digital library systems, grace agnew, for designing this testing method.

book reviews

proceedings of the conference on interlibrary communications and information networks, edited by joseph becker, sponsored by the american library association and the u.s.
office of education, bureau of libraries and educational technology held at airlie house, warrenton, virginia, september 28, 1970-0ctober 2, 1970. chicago: american library association, 1971. 347p to see how rapidly the field of library networking and communications has moved in recent times, one need only try to review a conference on the subject some years after it was held. what was fresh, imaginative, innovative, or blue-sky has become accepted or gone beyond; errors in thinking or bad guesses as to the future have been shown up; and the blue sky has been divided into lower stratospheres and outer space for ease of working. under these circumstances one can only review such proceedings as history. the assumptions on which the conference was based were the traditional ones of librarians and information scientists-that access to information should be the right of anyone without regard to geographical or economic position, and that pooling of resources (here by networking operations) is one of the best ways to reach that goal. since 1970 both of these assumptions have been questioned, but at the time of the conference there were no opposing voices. the final conclusions, of course, were based on these assumptions. national systems were recommended, both governmental and private, with the establishment of a public corporation (such as the corporation for public broadcasting) as the central stimulator, coordinator, and regulator, to be served by input from a large number of groups. funding, the attendees decided, should be pluralistic, from public, private, and foundation sources (are there any others?), but with the federal government bearing the largest burden of support. since it is deemed desirable to give the widest chance for all individuals to use these networks, it was recommended that fee-forservice prices should be kept low through subventions of the telecommunications costs by libraries and information centers. and since new techniques and methods need to be learned, both education and research in the field must be strengthened and enlarged. since the basic components of networks of libraries and information centers was conceived as being: 1. bibliographic access to media 2. mediation of user request to information book reviews 245 3. delivery of media to users 4. education traditional questions of bibliographic description, the most useful form of public services (including such things as interviewing requestors, seeking information on the existence of answers, locating the answers physically, providing them, evaluating them and obtaining feedback), as well as the best ways to set up networks were discussed at length. moreover, since new technologies have sometimes been touted as the answer to many of these problems, a whole section on network technology was included. such subjects as telecommunications, cable television, and computers were examined; here most of the recommendations still remain to be carried out. the organization proposed for these networks again plowed old ground. the conferees felt that one should use the tremendous national and disciplinary resources already established (the library of congress, the national library of medicine, the national agricultural library, chemical abstracts, etc.); there should be a coordinating body to minimize duplication of effort and assure across-the-board coverage; the systems must be sold to legislators if public money is to be provided; and more research on the best networking operations is necessary. 
above all in almost every section of the report and in the preface the then-new national commission on libraries and information science was referred to as the great savior. together with requests for public money, it might be said, this was the thread binding all sections of the conference together. was this conference necessary? could it have brought forth something more useful than the gentle spoof in irwin pizer's poem "hiawatha's network?" it was undoubtedly very inspiring for those at the conferenceall 100 of them-who probably learned more over the cocktail glass and dinner plate than at the formal sessions, and who learned as they grappled with the difficulties of consensus-making. but need the proceedings have been published? is everything ever said at a meeting always worth preserving? how about the concept of ephemera rather than total recall? would not a short summary of the recommendations have sufficed? estelle brodman december_ital_fifarek_final president’s message: focus on information ethics aimee fifarek information technologies and libraries | december 2016 1 just a few weeks ago we held yet another successful lita forum1, this time in fort worth, tx. tight travel budgets and time constraints mean that only a few hundred people get to attend forum each year, but that is one of the things that make it a great conference. because of its size you have a realistic chance of meeting everyone there, whether it’s at game night, one of the many networking dinners, or just for during hallway chitchat after a session. and the sessions really do give you something to talk about. this year i couldn’t help but notice a theme. among all the talk about makerspace technologies, analytics, and specific software platforms, the one bubble that kept rising to the surface was information ethics. why are you doing what you are doing with the information you have, and should you really be doing it? have you stopped to think what impact collecting, posting, sharing that information is going to have on the world around you? in a post-election environment replete with talk of fake news and other forms of deliberate misinformation, lita forum presenters seem to have tapped in to the zeitgeist. tara robertson, in her closing keynote2, talked about the harm digitizing analog materials can do when what is depicted is sensitive to individuals and communities. waldo jaquith of us open data talked about how a government decision to limit options on a birth certificate to either “white” or “colored” effectively wiped the native population out of political existence in virginia. and sam kome from claremont colleges talked about how well-meaning librarians can facilitate privacy invasion merely by collecting operational statistics3. there were many other examples brought out by forum speakers but these in particular emphasized the real consequences the serious consequences the use of data – intentional or not – can have on people. i think it is time for librarians4 to get more vocal about information ethics and the role we play in educating the population about humane information use. our profession has always been forward thinking about information literacy and is traditionally known for helping our communities make judgements about the information they consume. 
but we have not done enough to declare our expertise in the information economy, to stand up and say “we’re librarians – this is what we do.” now, more than ever, people need the skills to think critically about the information they are consuming via all kinds of media, understand the consequences of allowing algorithms to shape their information universe, and make quality judgments about trading their personal information for goods and services. to quote from unesco: aimee fifarek (aimee.fifarek@phoenix.gov) is lita president 2016-17 and deputy director for customer support, it and digital initiatives at phoenix public library, phoenix, az. president’s message | fifarek https://doi.org/10.6017/ital.v35i4.9602 2 changes brought about by the rapid development of information and communication technologies (ict) not only open tremendous opportunities to humankind but also pose unprecedented ethical challenges. ensuring that information society is based upon principles of mutual respect and the observance of human rights is one of the major ethical challenges of the 21st century.5 i challenge all librarians to make a commitment to propagating information ethics, both personally and professionally. make an effort to get out of your social media echo chamber6 and engage with uncomfortable ideas. when you see biased information being shared consider it a “teachable moment” and highlight the spin or present more neutral information. and if your library is not actively making information literacy and information ethics part of its programming and instruction, then do what you can to change it. offer to be on a panel, create a curriculum, or host a program that includes key concepts relating to information “ownership, access, privacy, security, and community”7. the focus of the libraries transform campaign this year is all about our expertise: “because the best search engine in the library is the librarian”8 it’s our time to shine. references 1. http://forum.lita.org/home/ 2. http://forum.lita.org/speakers/tara-robertson/ 3. http://forum.lita.org/sessions/patron-activity-monitoring-and-privacy-protection/ 4. as always, when i use the term “librarian” my intention is to include any person who works in a library and is skilled in information and library science, not to limit the reference to those who hold a library degree. 5. http://en.unesco.org/themes/ethics-information 6. https://www.wnyc.org/story/buzzfeed-echo-chamber-online-news-politics/ 7. https://en.wikipedia.org/wiki/information_ethics 8. http://www.ilovelibraries.org/librariestransform/ gaps in it and library services at small academic libraries in canada jasmine hoover information technology and libraries | december 2018 15 jasmine hoover (jasmine_hoover@cbu.ca) is scholarly resources librarian, cape breton university, sydney, nova scotia, canada. abstract modern academic libraries are hubs of technology, yet the gap between the library and it is an issue at several small university libraries across canada that can inhibit innovation and lead to diminished student experience. this paper outlines results of a survey of small (<5,000 fte) universities in canada, focusing on it and the library when it comes to organizational structure, staffing, and location. it then discusses higher level as well as smaller scale solutions to this issue. 
introduction

modern academic libraries are hubs of technology, yet existing staffing, organizational structures, physical proximity, and traditional ways of doing things in higher education have maintained a gap between the library and it, which is an issue at several small university libraries across canada. libraries today are largely online, which means managing access to resources, using online tools for reference and research, designing websites, and more. the physical space in libraries is now a place to interact with new technologies, visualize data, a place for research support including open access repositories and data management, and other digital research initiatives.1 these library functions often require a staffing complement to support them with a level of specialization in information technology (it). however, though the offerings of the library have changed drastically over the years, smaller university libraries have struggled to support the growing need for it services. larger universities (over 5,000 fte) have managed this influx of demand and usage of new technologies in libraries by having their own library it services to manage software and technologies to support research, teaching, and learning. many also offer student and user-facing technical support with it help desks within the library. smaller universities (below 5,000 fte) often do not have the resources to have their own it department or staff and find themselves not able to help researchers with modern digital scholarship, not able to support new systems and software, and not working as closely with it as they would like or need. also, the it department is generally not responsible for this kind of work, as it is outside of institution-wide software support. this paper outlines the current status of it and the library when it comes to organizational structure, physical location, and collaboration in small academic libraries across canada. it then outlines strategies that can be used in smaller libraries to help bridge the gap, as well as recommendations for administrators when considering organizational changes to better serve a modern research atmosphere.

current status at small canadian universities

the technologies behind modern library services are often complex, as libraries need to securely manage access to online resources (both on and off campus); support faculty as they research and teach using new software and technologies; and support new models for publishing that include open-access repositories, data management, open education resources, and more. library staff deal with technology issues that come up daily, with several non-it library staff members troubleshooting and solving various issues that arise. library users run into all kinds of technical issues and reach out for help. in nova scotia, our library consortium offers live help, an online library chat service distributed throughout eleven academic institutions in nova scotia. statistics kept on the type of question asked on this service from january 2010 to march 2018 show that 26 percent of the over 68,000 questions asked are technical in nature, with topics including difficulty accessing online resources, login troubles, and other technical issues.2 for this study, 18 out of the 21 universities with fte >1,000 and <5,000 in canada were surveyed.
excluded were universities that were "sister" institutions of larger universities which utilized the same library system and french-only-speaking universities. twelve university libraries responded to an online survey which asked questions concerning organization and collaboration focused on it, the library, and educational technology. results (see figure 1) show that organizational reporting structures in higher education vary when it comes to it and the library. fifty percent of the survey respondents reported that their it department reports to the ceo/cfo or vp administrative, 25 percent of it departments report to a cio, 17 percent report to a provost/vp academic, and 8 percent report to a vp finance.

figure 1. which of the following best describes how your it organization reports?

all of the libraries in this survey, on the other hand, report to a provost or vp academic. this makes sense, as libraries are generally considered academic while it is usually associated with operations. however, there have been recent changes to some university library structures in canada that might indicate new thinking when it comes to organizational structure and the relationship between these units. in 2018, it was announced that there would be restructuring at brandon university which removed the university librarian position altogether (as well as the director of it services), and placed the library under a chief information officer. this would bring the library and it under one reporting structure.3 in an opposite move, mount allison university recently proposed to eliminate the top librarian position and have an academic dean split the responsibility of the library and their academic unit.4 after local outcry, this move was reversed and the job ad is out for a head librarian. it is hard to say if these are signals of upcoming change in the future of library reporting, or a temporary solution in a time of budget restrictions. however, half of the survey respondents mentioned that there has been some recent reorganization or planned reorganization related to it and the library at their institutions. only 33 percent of small university libraries surveyed have their own it department or staff. one of those libraries has an it specialist who splits time between the library and their it department. the other 67 percent have no it department or staff in the library (see figure 2).

figure 2. does your library have its own it department?

when asked, "is there anything you would like changed about the current organization when it comes to it and the library?," all of the libraries without in-library it support mentioned a desire for either a position in the library responsible for it; greater collaboration between it and the library; or a specific person within the it department who they could contact regarding it. student experience, including their experience with technology, is important according to a recent educause study. this 2017 educause study outlines the importance of it and of support for students when it comes to wi-fi and other technical support.5 one recommendation from this report is to have it help desks more visible and available. not only is the library a convenient location, but as we have already seen, students are increasingly using technologies in the library and often run into issues.
it makes sense then to have an it help desk within the library, as the majority of larger university libraries in canada already offer. when asked about it help desks in the library, three of the responding university libraries (25 percent) have help desks staffed by it services, one (8 percent) had a help desk staffed by library staff, and another (8 percent) had an after-hours help desk staffed by it services. the remaining 59 percent have no it help available in the library (see figure 3).

figure 3. does your library have an it help desk?

the physical location of the two units is also important. in this survey, 75 percent of respondents replied that the library and the it department are in separate spaces while 25 percent share a common space. studies have shown that physical proximity in the workplace can lead to greater collaboration. an mit study showed that physical proximity drives collaboration between researchers on university campuses.6 as one of the common themes in the survey was the desire for more collaboration, a physical change of location could have a great impact. when asked about changes people would like to see with the current organization of it and the library, many mention a need for more collaboration due to interrelated responsibilities. common suggestions included library it staff, having an it help desk in the library, or a specific person in it they could contact directly for help or who had shared responsibilities between it and the library. another suggestion was a committee that would bring together members from both units to strengthen communication.

what can be done?

in the larger view, university administrations need to look for outdated governance and organizational structures that are in place. as universities shift their goals and focus over time, they need to adapt structures and staffing accordingly. chong and tan describe it governance as being of utmost importance, claiming there needs to be strategic alignment between it and organizational strategies and objectives.7 carraway describes universities with a high level of it governance maturity and effectiveness as those where "it initiatives are aligned with the institution's strategic priorities and prioritized among the university's portfolio projects."8 effective it governance, focused on collaboration and communication, is associated with greater integration of innovation into institutional processes. also, it governance was found to be more effective under a delegated model that empowers it governance bodies than under a cio-centric model. the majority of universities surveyed showed common governance structures of it, with most as separate units reporting to a cfo/vp admin or similar. the inclusion of faculty, students, and business units in it governance committees was associated with a stronger innovation culture.9 stakeholder inclusion is an important characteristic of it governance maturity. students, as consumers of it, and faculty should both have a seat at the table when it comes to it governance.
carraway found that an increased level of student engagement in it governance correlates with a high level of innovation culture.10 university administration should take a good look at how it is governed, who has input, and how it governance affects the university's objectives. the reporting structure of libraries has generally gone unchanged, with most respondents confirming that their library reports to an academic vice president. budget constraints at two canadian universities have affected library structures of late; however, there has been little research done on the ideal governance structure of libraries in higher education. both it and the library in smaller canadian universities could consider governance committees that include students, faculty, and other stakeholders in order to be more innovative and effective.

the it unit is an interesting one: its model in higher education has moved back and forth among three main structures: centralized, decentralized, and federated. centralized, where a central hub runs it services for the university, is the most common structure found at the surveyed universities. decentralized, where it services are spread throughout the organization, would automatically mean the library (and other units) had it staff. a federated model would also lead to local library it work being done by specific people who work for and out of a central it office but are assigned to specific areas. federated structures offer centralized control with decentralized functions in faculties and units. chong and tan believe that federated structures are more appropriate for a collaborative network such as a university.11 their study found that a federated structure, combined with coordinated communication, led to higher effectiveness. nugroho maintains that decentralized organizations such as universities need to regularly review their it governance structure, as both technology and the organization itself change.12 he maintains that effective governance does not happen by coincidence, and it governance is not a static concept.

library staffing also needs to change based on the needs of users and the goals of the organization. some even suggest that libraries reorganize every few years to keep staff flexible, take advantage of new opportunities, and foster growth.13 in 2011, we saw bell and shank's work on the blended librarian, which advocated for librarianship with educational technology and instructional design skills.14 according to the 2015 arl statistics, we continue to see nontraditional professional jobs increasing in the library. in 2015, the top three new-hire categories included two nontraditional categories: digital specialists and functional specialists.15 acrl statistics from 2016 showed that over the previous five years, 61 percent of libraries repurposed or cross-trained staff to better support new technologies or services.16 we saw in the survey that of the more than 68,000 research questions fielded by librarians across nova scotia since 2010, just over one quarter were technical in nature. library administration at smaller universities, looking at these numbers, should respond by ensuring that technical knowledge and skills are written into job ads, as they are increasingly in demand, or that staff are trained appropriately. physical location is also important.
we’ve seen from the survey results that there is a lack of physical connectedness between the library and it in smaller canadian universities. wineman et al. studied various organizations and their physical proximity. they state: “social networks play important roles in structuring communication, collaboration, access to knowledge and knowledge transformation.”17 they suggest that innovation is a process that occurs at the crossroad between social and physical space. cramton points out that “maintaining mutual knowledge is a central problem of geographically dispersed collaboration.”18 if it is not possible to change the organizational structure or governance to ensure more communication and knowledge sharing, physical spaces such as an it desk in the library is another way for the library and it staff to be in regular contact. a 2017 mit study recommended that institutions keen to support the crossdisciplinary collaborative activity that is vital to research and practice, may need to adopt “a new approach to socio-spatial organisation that may ultimately enrich the design and operation of places for knowledge creation.”19 we could apply the same thinking to institutions interested in supporting collaborative activity between the library, it, and newer-yet-related initiatives such as educational technology and digital research centers. proximity to collaborators should be considered as one option to enhance outcomes and innovation between the library and it. organizational structures and models, physical locations, and governance are all large-scale factors that should be considered when looking at the relationship between it and the library. there are also smaller-scale practical ideas that can help. these ideas will be discussed below. an important first step is to start the conversation. the author’s institution has begun thinking about the gaps in our services and support for research, especially when it comes to support for technologies needed for modern research and publication that are often housed in the library. factors which have helped start this conversation include: funding mandates related to open access and data management; new services or initiatives that researchers or units would like to start; which require it and library specialization; and planning for a future in higher education that increasingly relies on up to date technologies to support research, publishing, and teaching. a conversation is beginning between researchers, administration, the library, and other stakeholders which will lead to a collaborative solution to some of these issues. it’s important that there is interest and initiation from administration, but also that other stakeholders are involved from the onset. many universities have developed new positions, new units, or worked these positions into it or the library to fill this gap, but the solution needs to fit each institution and their goals. often times when there is no it staff in the library, technical issues are managed by one or two technical-minded staff members. equipping frontline service providers may help alleviate some of this work by enabling many staff to solve common technical issues. here at the author’s institution, the librarian in charge of access has begun presenting common technical/access issues during a monthly reference meeting. the goal is to have all staff who field questions from users have a basic understanding of how the systems work in the library, what to do if they see issues, and whom they can contact. 
in libraries where there is not a strong it presence, it is important to enable staff to be comfortable with the basic issues that will come up. this also ensures that there is not just one person who can answer common technical/access questions. if someone staffing the reference or circulation desk encounters users with these issues, they can explain why they are happening and what the library is going to do to help. the plan is to create a library technical manual out of these quick presentations that can act as a resource for all staff or as a training manual for new staff. at each of these presentations, a survey is administered. the survey has four questions and asks participants about their comfort level in dealing with technical/access questions both before and after the presentation. one hundred percent of staff answered that, after the presentation, they felt more comfortable when encountering the issues described. this is not a suitable replacement for the specialized it skills needed in libraries; however, it can alleviate some of the pressure put on select people in smaller academic libraries. library staff can, and do, actively work to learn new skills through formal training and professional development. we saw from the acrl survey that many libraries are working to cross-train staff in order to keep up with technological demands. learning new skills and pursuing educational opportunities can go a long way and should be encouraged by library administration.

the benefit of having it staff dedicated to the library is obvious, and libraries should continually push for this. results of the survey showed that library staff would prefer to have a person to contact with issues specific to the library: issues can be dealt with promptly; it personnel working in or assigned to the library have an understanding of the systems involved; communication is easier, as there is a point person to contact; and the library has control over the products and services it offers. however, if that is not possible within the organization, a good system of communication is important. a timely system of contacting it and resolving issues can go a long way. chong and tan maintain that a coordinated communication system is key for it in an organization.20 a commonly used system for technical issues is the ticket system, where issues can be submitted by users and then answered and tracked by it. this is a very useful system for it staff; however, users often cannot track their own ticket, see a timeline for completion, or know who is on the other end to contact with more information. it is a good idea to meet regularly with it, formally or informally, to discuss issues, build relationships with colleagues, and get a better sense of how each unit works. on the library end, it is important to keep statistics on technical issues sent to it and the time elapsed before the issues are resolved. these statistics can be used to demonstrate the need for library-specific it staff, encourage better communication between departments, or demonstrate a problem with the current way issues are communicated. having statistics will help libraries if and when the time comes that new positions can be created. at the author's institution we use springshare's libanswers software to track all technical issues, including those sent on to it.
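as an illustration of the kind of statistics-keeping described above, the short python sketch below summarizes time-to-resolution figures from a spreadsheet of ticket data. it is a minimal sketch, not the author's actual workflow: the file name issues.csv and the column names opened, resolved, and sent_to_it are hypothetical placeholders for whatever fields a library's ticketing or libanswers export actually contains.

```python
"""Summarize technical-issue tickets exported to a CSV file.

A minimal sketch: assumes columns named 'opened', 'resolved', and
'sent_to_it' (yes/no). Adjust the names to match your own export.
"""
import csv
from datetime import datetime
from statistics import mean, median

def parse(ts):
    # Timestamps are assumed to look like "2018-09-04 14:32"; adjust as needed.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

hours_to_resolve = []   # elapsed time for each resolved ticket
sent_to_it = 0          # tickets escalated to the central IT department

with open("issues.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["sent_to_it"].strip().lower() == "yes":
            sent_to_it += 1
        if row["resolved"]:  # skip tickets that are still open
            delta = parse(row["resolved"]) - parse(row["opened"])
            hours_to_resolve.append(delta.total_seconds() / 3600)

print(f"resolved tickets: {len(hours_to_resolve)}")
print(f"escalated to it:  {sent_to_it}")
if hours_to_resolve:
    print(f"mean hours to resolve:   {mean(hours_to_resolve):.1f}")
    print(f"median hours to resolve: {median(hours_to_resolve):.1f}")
```

figures such as the median hours to resolution map directly onto the uses described above: demonstrating the need for library-specific it staff or documenting problems with the current way issues are communicated.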
the libanswers software records dates and times, captures important details and resolutions for technical issues, and exports useful statistics.

in smaller canadian university libraries there is a growing need for it support. however, little has been done by way of organizational structure, staffing, or physical proximity between these two units to allow universities to better serve their students and faculty. this paper outlined the current situation in several smaller university libraries in canada and provided some high-level as well as local solutions to this problem.

appendix a: it, library, and educational technology organization
*required
1. institution name *
2. total student population
3. which of the following best describes how your it organization reports? mark only one oval. reports to ceo/cfo/vp admin; reports to cio; reports to provost/vp academic; reports to dean of library/head of library; other:
4. which of the following best describes how the dean/head of library/university librarian reports? mark only one oval. reports to the ceo/cfo/vp admin; reports to provost/vp academic; reports to university president; other:
5. which of the following best describes it's relationship to the library? mark only one oval. it and the library are not at all part of the same reporting structure; it is a part of the library reporting structure; it and the library report to the same person, but are separate departments; other:
6. which of the following describes the physical location of it and the library? mark only one oval. located in separate spaces; share a physical location; other:
7. does your library have its own it department? mark only one oval. yes, they are library employees; yes, they are employed by it services and work in the library; no; other:
8. does your library have an it help desk? mark only one oval. yes, they are library employees; yes, they are employed by it services; no; other:
9. have there been any major reorganizations (that you are aware of) related to it and library services in the last ten years?
10. is there anything you would like changed about your current organization when it comes to it and the library?
11. who is in charge of educational technology/academic technology at your university? mark only one oval. library; it; educational technology is a separate unit/office; educational technology duties are split up among the library/it/other; other:
12. which of the following describes the physical location of educational technology? mark only one oval. ed tech is located in or shares space with the library; ed tech is located in or shares space with it; ed tech has its own space; no ed tech unit; other:
13. what would you include as roles of an educational technology unit? mark all that apply. media design/production; research and development (testing technologies, emerging tech); instructional design and development; faculty development; learning spaces; assessment (learning outcomes, course evaluations); distance/online learning support; training on course software/technologies related to teaching and learning; managing classroom technologies; other:
14. have there been any changes (that you know of) related to educational technology services in the last ten years?
15.
is there anything you would like changed about your current organization when it comes to educational technology services and the library?
16. may i use direct quotes in my research/publication? (no names or institutions will be attributed to a quote.) mark only one oval. yes; no

references
1 tibor koltay, "are you ready? tasks and roles for academic libraries in supporting research 2.0," new library world 117, no. 1/2 (january 11, 2016): 94–104, https://doi.org/10.1108/nlw-09-2015-0062.
2 "instant messaging service—statistics data entry page," novanet, accessed june 5, 2018, https://util.library.dal.ca/livehelp/liveh3lp/admin/livehelp/chatentry.php.
3 "brandon university will eliminate 15% of senior administration to help tackle budget cut," brandon university, march 15, 2018, https://www.brandonu.ca/news/2018/03/15/brandon-university-will-eliminate-15-of-senior-administration-to-help-tackle-budget-cut/.
4 joseph tunney, "mount a proposal to phase out top librarian makes students, staff want to make noise," cbc news, january 18, 2018, https://www.cbc.ca/news/canada/new-brunswick/mount-allison-university-librarian-1.4492297.
5 d. christopher brooks and jeffrey pomerantz, "ecar study of undergraduate students and information technology," educause, october 18, 2017, accessed june 7, 2017, https://library.educause.edu/resources/2017/10/ecar-study-of-undergraduate-students-and-information-technology-2017.
6 matthew claudel et al., "an exploration of collaborative scientific production at mit through spatial organization and institutional affiliation," plos one 12, no. 6 (2017), https://doi.org/10.1371/journal.pone.0179334.
7 josephine chong and felix b. tan, "it governance in collaborative networks: a socio-technical perspective," pacific asia journal of the association for information systems 4, no. 2 (2012).
8 deborah louise carraway, "information technology governance maturity and technology innovation in higher education: factors in effectiveness" (master's diss., the university of north carolina at greensboro, 2015), 113.
9 ibid., 89.
10 ibid.
11 chong and tan, "it governance in collaborative networks: a socio-technical perspective," 44.
12 heru nugroho, "conceptual model of it governance for higher education based on cobit 5 framework," journal of theoretical and applied information technology 60, no. 2 (february 2014): 6.
13 gillian s. gremmels, "staffing trends in college and university libraries," reference services review 41, no. 2 (2013): 233–52, https://doi.org/10.1108/00907321311326165.
14 john d. shank and steven bell, "blended librarianship," reference & user services quarterly 51, no. 2 (winter 2011): 105–10.
15 stanley wilder, "hiring and staffing trends in arl libraries," association of research libraries, october 2017, https://www.arl.org/storage/documents/publications/rli-2017-stanley-wilder-article2.pdf.
16 "new acrl publication: 2016 academic library trends and statistics," news and press center, july 20, 2017, http://www.ala.org/news/member-news/2017/07/new-acrl-publication-2016-academic-library-trends-and-statistics.
17 jean wineman et al., "spatial layout, social structure, and innovation in organizations," environment and planning b: planning and design 41, no. 6 (december 1, 2014): 1,100–112, https://doi.org/10.1068/b130074p.
18 catherine durnell cramton, "the mutual knowledge problem and its consequences for dispersed collaboration," organization science 12, no. 3 (may–june 2001): 346–71, https://doi.org/10.1287/orsc.12.3.346.10098.
19 claudel et al., "an exploration of collaborative scientific production at mit through spatial organization and institutional affiliation," 2.
20 chong and tan, "it governance in collaborative networks: a socio-technical perspective," 44.

academic libraries on social media: finding the students and the information they want
heather howard, sarah huber, lisa carter, and elizabeth moore
information technology and libraries | march 2018
heather howard (howar198@purdue.edu) is assistant professor of library science; sarah huber (huber47@purdue.edu) is assistant professor of library science; lisa carter (carte241@purdue.edu) is library assistant; and elizabeth moore (moore658@purdue.edu) is library assistant and student supervisor at purdue university.
librarians from purdue university wanted to determine which social media platforms students use, which platforms they would like the library to use, and what content they would like to see from the library on each of these platforms.
we conducted a survey at four of the nine campus libraries to determine student social media habits and preferences. results show that students currently use facebook, youtube, and snapchat more than other social media types; however, students responded that they would like to see the library on facebook, instagram, and twitter. students wanted nearly all types of content from the libraries on facebook, twitter, and instagram, but they did not want to receive business news or content related to library resources on snapchat. youtube was seen as a resource for library service information. we intend to use this information to develop improved communication channels, a clear social media presence, and a cohesive message from all campus libraries. introduction in his book tell everyone: why we share and why it matters, alfred hermida states, “people are not hooked on youtube, twitter or facebook but on each other. tools and services come and go; what is constant is our human urge to share.”1 libraries are places of connection, where people connect with information, technologies, ideas, and each other. as such, libraries look for ways to increase this connection through communication. social media is a key component of how students communicate with classmates, families, friends, and other external entities. it is essential for libraries to communicate with students regarding services, collections, events, library logistics, and more. purdue university is a large, land-grant university located in west lafayette, indiana, with an enrollment of more than forty thousand. the purdue libraries consist of nine libraries, presented collectively on the social media platforms facebook and twitter since 2009 and youtube since 2012. going forward, the purdue libraries want to ensure it establishes a cohesive message and brand that is communicated to students on platforms they use and on which they will engage with it. the purpose of this study was to determine which social media platforms the students are currently using, which platforms they would like the library to use, and what content they would like to see from the libraries on each of these platforms. mailto:howar198@purdue.edu mailto:huber47@purdue.edu mailto:carte241@purdue.edu mailto:moore658@purdue.edu academic libraries on social media | howard, huber, carter, and moore 9 https://doi.org/10.6017/ital.v37i1.10160 literature review academic libraries and social media academic libraries have been slow to accept social media as a venue for either promoting their services or academic purposes. 
a 2007 study of 126 academic librarians found that only 12 percent of those surveyed “identified academic potential or possible benefits” of facebook while 54 percent saw absolutely no value in social media.2 however, the mission of academic libraries has shifted in the last decade from being a repository of knowledge to being a conduit for information literacy; new roles include being a catalyst for on-campus collaboration and a facilitator for scholarly publication within contemporary academic librarianship.3 academic librarians have responded to this change, with many now believing that “social media, which empowers libraries to connect with and engage its diverse stakeholder groups, has a vital role to play in moving academic libraries beyond their traditional borders and helping them engage new stakeholder groups.”4 student perceptions about academic libraries on social media as the use of social media has grown with college-aged students, so has an increasing acceptance of academic libraries using social media to communicate. a pew research center report from 2005 showed just 7 percent of eighteen to twenty-nine year olds using social media. by 2016, 86 percent were using social media.5 in 2007 the oclc asked 511 college students from six different countries to share their thoughts on libraries using social networking sites. this survey revealed that “most college students would be unlikely to participate in social networking services offered by a library,” with just 13 percent of students believing libraries have a place on social media.6 however, just two years later (in 2009), a shift was seen: students were open to connecting with academic libraries, as observed in a survey of 366 freshmen at valparaiso university. when asked their thoughts on the library sending announcements and communications to them via facebook or myspace (a social media powerhouse at the time), 42.6 percent answered they would be “more receptive to information received in this way than any other response.” a smaller group, 12.3 percent, responded more negatively to this approach. students showed concern for their privacy and the level of professionalism, as a quote from a student illustrates: “facebook is to stay in touch with friends or teachers from the past. email is for announcements. stick with that!!!” 7 as students report becoming more open to academic libraries on social media, the question of whether they will engage through social media emerges. a recent study from western oregon university’s hammersley library asked this question with promising results. forty percent of students said they were either “very likely “or “somewhat likely” to follow the library on instagram and twitter, as opposed to wanting communications being sent to them directly through social media (for example, a facebook message). pinterest followed, with 33 percent of students saying they were either “very likely” or “somewhat likely” to follow the library using this platform.8 throughout the literature, students have shown an interest in information about the libraries that is useful to them. in another survey given to undergraduate students from three information technology classes at florida state university, one question examined the perceived importance of different library social media postings to students. 
the report showed students considered postings related to operations updates, study support, and events as the most important.9 in the hammersly study noted above, 78 percent and 87 percent of respondents said information technology and libraries | march 2018 10 they were either “very interested” or “somewhat interested,” respectively, in every category relating to library resources presented in the survey, but “interesting/fun websites and memes” received the least interest from participants.10 the literature shows an increase in students being receptive to academic libraries on social media. results vary campus to campus and students are leery of libraries reaching out to them via social media, but they have an increasingly positive view about content posted that will help them with the library. research questions the aim of this project was to investigate the social media behaviors of purdue university students as they relate to the libraries, and to develop evidence-based practices for managing the library’s social media accounts. the project focused on three research questions: 1. what social media platforms are students using? 2. what social media platforms do students want the library to use? 3. what kind of content do students want from the library on each of these platforms? methods we created the survey using the web-based qualtrics survey software. it was distributed in electronic form only, and it was promoted to potential respondents via table tents in the libraries, bookmarks at the library desk, facebook posts, and in-classroom promotion. potential respondents were advised that the survey was anonymous and voluntary. the survey consisted of closed questions, though many questions contained an open-ended field for answers that did not fall into the provided choices. inspiration for some of the options in our survey questions came from the hammersly library study, as we felt they did a good job capturing information about the social media usage of their patrons.11 our survey asked what social media platforms students use, what they use them for, how often they visit the library, how likely they are to follow the library on social media, which platforms they want the library to have, and what content they would like from the library on each of those platforms. the social media platforms included were facebook, flickr, g+, instagram, linkedin, pinterest, qzone, renren, snapchat, tumblr, twitter, youtube, and yik yak.12 there were also open-ended spaces where participants could write in additional platforms. the survey originally ran for three weeks in only the business library early in the spring 2017 semester, as its intended purpose was to inform how the business library would manage social media. after that survey was completed, we decided to replicate the survey in three additional libraries (humanities, social science, and education; engineering; and the main undergraduate libraries). this was done to expand the dataset and reach additional students in a variety of disciplines. these libraries were chosen because they were the libraries in which the authors work, with the hope to expand to additional libraries in the future. the second survey also lasted for three weeks starting in mid-april of the spring 2017 semester. as a participation incentive, students who completed the initial survey and the second survey had an opportunity to enter a drawing for a $25 visa gift card. 
the survey was advertised across four different campus libraries and promoted in several ways to reach different populations. though the results are not from a random sample of the student population, they are broad enough that we intend to apply them to our entire student population.

results
survey
the survey was completed by 128 students. an additional 13 students began the survey but did not complete it; we removed their results from the analysis. the breakdown of respondents was 10 percent freshmen (n = 13), 22 percent sophomore (n = 28), 27 percent junior (n = 35), 20 percent senior (n = 25), and 21 percent graduate or professional (n = 27).

library usage
the students were asked how frequently they visit the library to determine if the survey was reaching a population of regular or infrequent library visitors. the results showed that the students who completed the survey were primarily frequent library users, with 93 percent (n = 119) visiting once a week or more.

social media platforms
the students were asked to identify which social media platforms they used and how frequently they used them. the most popular social media platforms were determined by combining the number of students who said they used them daily or weekly. the top five were facebook (n = 114, 88 percent), youtube (n = 102, 79 percent), snapchat (n = 90, 70 percent), instagram (n = 85, 66 percent), and twitter (n = 41, 32 percent). full results are in table 1.

table 1. usage frequency by platform
social media platform | daily | weekly | monthly | < once per month | never
facebook | 94 (72.87%) | 20 (15.50%) | 5 (3.88%) | 5 (3.88%) | 4 (3.10%)
flickr | 0 (0.00%) | 1 (0.78%) | 2 (1.55%) | 8 (6.20%) | 117 (90.70%)
g+ | 3 (2.33%) | 6 (4.65%) | 4 (3.10%) | 16 (12.40%) | 99 (76.74%)
instagram | 68 (52.71%) | 17 (13.18%) | 5 (3.88%) | 11 (8.53%) | 27 (20.93%)
linkedin | 9 (6.98%) | 29 (22.48%) | 22 (17.05%) | 22 (17.05%) | 46 (35.66%)
pinterest | 12 (9.30%) | 12 (9.30%) | 16 (12.40%) | 19 (14.73%) | 69 (53.49%)
qzone | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 4 (3.10%) | 124 (96.12%)
renren | 0 (0.00%) | 0 (0.00%) | 1 (0.78%) | 3 (2.33%) | 124 (96.12%)
snapchat | 84 (65.12%) | 6 (4.65%) | 6 (4.65%) | 7 (5.43%) | 25 (19.38%)
tumblr | 7 (5.43%) | 2 (1.55%) | 7 (5.43%) | 11 (8.53%) | 101 (78.29%)
twitter | 28 (21.71%) | 13 (10.08%) | 12 (9.30%) | 9 (6.98%) | 66 (51.16%)
youtube | 58 (44.96%) | 44 (34.11%) | 15 (11.63%) | 4 (3.10%) | 7 (5.43%)
yik yak | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 11 (8.53%) | 117 (90.70%)
other: email | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: groupme | 3 (2.33%) | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: reddit | 2 (1.55%) | 2 (1.55%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: skype | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 1 (0.78%) | 0 (0.00%)
other: vine | 0 (0.00%) | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: wechat | 3 (2.33%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: weibo | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)
other: whatsapp | 1 (0.78%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%) | 0 (0.00%)

social media activity
next, students were asked how much time they spend on social media doing the following activities: watching videos, keeping in touch with friends/family, sharing photos, keeping in touch with classmates/professors, learning about campus events, doing research, getting news, or following public figures.
table 2 shows that students overwhelmingly use social media daily or weekly to watch videos (94 percent, n = 120), keep in touch with family/friends (93 percent, n = 119), and get news (81 percent, n = 104). the least popular activities, those that students do less than once per month or never, were doing research (47 percent, n = 60) and following public figures (34 percent, n = 45).

social media and the library
the students were asked how likely they are to follow the libraries on social media. the response to this was primarily positive, with 57 percent of respondents saying they are either extremely likely or somewhat likely to follow the library. one response for this question was inexplicably null, so for this question n = 127. figure 1 contains the full results.

table 2. social media activity
social media activity | daily | weekly | monthly | < once per month | never
watch videos | 85 (66.41%) | 35 (27.34%) | 1 (0.78%) | 4 (3.13%) | 3 (2.34%)
keep in touch with friends/family | 89 (69.53%) | 30 (23.44%) | 6 (4.69%) | 2 (1.56%) | 1 (0.78%)
share photos | 32 (25%) | 33 (25.78%) | 38 (29.69%) | 20 (15.63%) | 5 (3.91%)
keep in touch with classmates/professors | 34 (26.56%) | 47 (36.72%) | 21 (16.41%) | 19 (14.84%) | 7 (5.47%)
learn about campus events | 24 (18.75%) | 53 (41.41%) | 29 (22.66%) | 18 (14.06%) | 4 (3.13%)
do research | 24 (18.75%) | 26 (20.31%) | 18 (14.06%) | 23 (17.97%) | 37 (28.91%)
get news | 66 (51.56%) | 38 (29.69%) | 7 (5.47%) | 9 (7.03%) | 8 (6.25%)
follow public figures | 34 (26.56%) | 30 (23.44%) | 20 (15.63%) | 19 (14.84%) | 24 (18.75%)
other | 2 (1.56%) | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%)

figure 1. library social media follows: how likely are you to follow the library on social media? (response options: extremely likely, somewhat likely, neither likely nor unlikely, somewhat unlikely, extremely unlikely.)

the students were asked which social media platforms they thought the library should be on. five rose to the top of the results: facebook (82 percent, n = 105), instagram (55 percent, n = 70), twitter (40 percent, n = 51), snapchat (34 percent, n = 44), and youtube (29 percent, n = 37). full results can be seen in figure 2. after a student selected a platform they wanted the library to be on, logic built into the survey directed them to an additional question that asked what content they would like to see from the library on that platform. content included library logistics (hours, events, etc.), research techniques and tips, how to use library resources and services, library resource info (database instruction/tips, journal availability, etc.), business news, library news (e.g., if the library wins an award), campus-wide info/events, and interesting/fun websites and memes. for facebook, students widely selected all types of content, with the most selections made for library logistics (n = 73) and the fewest made for business news (n = 33). for instagram, students wanted all content except business news (n = 18). snapchat was similar, except along with business news (n = 8), students were also not interested in receiving content related to library resource information (n = 9). twitter was similar to facebook in that all content was widely selected. youtube had a focus on library services, with the three most-selected content options being research techniques and tips (n = 20), how to use library resources and services (n = 19), and library resource info (n = 16).
table 3 contains the full results.

figure 2. library social media presence: what social media platform should the library be on? facebook, 105; g+, 7; instagram, 70; linkedin, 23; pinterest, 10; qzone, 1; renren, 1; snapchat, 44; tumblr, 5; twitter, 51; youtube, 37.

table 3. library social media content by platform (what type of content would you like to see from the library?)
content type | facebook (n = 105) | g+ (n = 7) | instagram (n = 70) | linkedin (n = 23) | pinterest (n = 10) | snapchat (n = 44) | tumblr (n = 5) | twitter (n = 51) | youtube (n = 37)
library logistics (hours, events, etc.) | 73 (69.52%) | 2 (28.57%) | 34 (48.57%) | 7 (30.43%) | 4 (40%) | 23 (52.27%) | 2 (40%) | 32 (62.75%) | 8 (21.62%)
research techniques & tips | 52 (49.52%) | 3 (42.85%) | 28 (40%) | 13 (56.53%) | 7 (70%) | 19 (43.18%) | 3 (60%) | 27 (52.94%) | 20 (54.05%)
how to use library resources & services | 53 (50.48%) | 3 (42.85%) | 26 (37.14%) | 8 (34.78%) | 7 (70%) | 16 (36.36%) | 3 (60%) | 25 (49.02%) | 19 (51.35%)
library resource info (database instruction/tips, journal availability, etc.) | 53 (50.48%) | 3 (42.85%) | 22 (31.42%) | 8 (34.78%) | 6 (60%) | 9 (20.45%) | 2 (40%) | 23 (45.10%) | 16 (43.24%)
business news | 33 (31.43%) | 2 (28.57%) | 18 (25.71%) | 13 (56.52%) | 3 (30%) | 8 (18.18%) | 2 (40%) | 17 (33.33%) | 7 (18.92%)
library news (e.g., if the library wins an award) | 49 (46.67%) | 3 (42.85%) | 37 (52.86%) | 12 (52.17%) | 5 (50%) | 19 (43.18%) | 3 (60%) | 24 (47.06%) | 7 (18.92%)
campus-wide info/events | 73 (69.52%) | 3 (42.85%) | 42 (60%) | 5 (21.74%) | 5 (50%) | 26 (59.09%) | 2 (40%) | 35 (68.63%) | 13 (35.14%)
interesting/fun websites & memes | 48 (45.71%) | 0 | 41 (58.57%) | 2 (8.70%) | 10 (100%) | 30 (68.18%) | 3 (60%) | 26 (50.98%) | 12 (32.43%)
other | 1 (0.95%) | 0 | 2 (2.86%) | 0 | 1 (10%) | 2 (4.55%) | 0 | 2 (3.92%) | 1 (2.70%)

discussion
historically, libraries have used social media as a marketing tool.13 with social media's ever-increasing popularity with young adults, academic libraries have actively established a presence on several platforms.14 our survey shows that our students follow this trend, using social media regularly and for a variety of activities. we were surprised that facebook turned out to be the most widely used by our students, as much has been written in the last few years about teens and young adults leaving the platform.15 a november 2016 survey, however, found that 65 percent of teens said they used facebook daily, a large increase from 59 percent in november 2014. though snapchat and instagram are preferred, teens continue to use facebook for its utility in scheduling events or keeping in touch regarding homework.16 students do seem receptive to following the library on different platforms and report wanting primarily library-related content from us, including more in-depth content such as research techniques and database instruction.

limitations and future work
findings from this study give insight into opportunities for libraries to reach university students through social media. we acknowledge that only limited generalizations can be made because of the way the survey was conducted. our internal recruitment methods led to a selection bias in our surveyed population, as advertisement of the survey took place either in the chosen libraries or on the purdue libraries' existing facebook page.
because of this, our sample consists primarily of students who visit the library or already follow the library on facebook. we hope to alter this in future surveys by expanding our recruitment to other physical spaces across campus. in addition, we plan to add questions that first establish a better understanding of students’ opinions of libraries being on social media before asking what social media they would like to see libraries use. this would potentially avoid leading students to an answer. further, we are concerned we took for granted students’ understanding of library resources; that is, we may have made distinctions librarians understand, but students may not. in future studies, we plan to rephrase, and possibly combine, questions in a way that will be clear to people less familiar with library resources and services. we believe confusion with these questions created contradictory responses. for example, “research help through social media” received a low response rate, but “information on research techniques and tips” received a much higher response rate. additionally, a limitation of using a survey to collect behavior information is that respondents do not always report how they actually behave. using methods such as focus groups, interviews, text mining, or usability studies could provide a more holistic view of student behavior. duplication of this study on a yearly or semi-yearly basis across all libraries could help us see how social media preferences change over time and across a larger sample of our population. this study aimed to provide a broad view of a large university’s student body by surveying across different subject libraries. with the changes discussed, we think a revised survey could give us the detailed information we need to build a more effective social media strategy that reaches both library users and non-users. conclusion this study improved our understanding of the social media usage and preferences of purdue students. from these results, we intend to develop better communication channels, a clear social media presence, and a more cohesive message across the purdue libraries. under the direction of our new director of strategic communication, a social media committee was formed with representatives from each of the libraries to contribute content for social media. the committee will consider expanding the purdue libraries’ social media presence to communication channels where students have said they are and would like us to be. as social media usage is ever-changing, we recommend repeated surveys such as this to better understand where on social media students want to see their libraries and what information they want to receive from them. academic libraries on social media | howard, huber, carter, and moore 17 https://doi.org/10.6017/ital.v37i1.10160 references 1 alfred hermida, tell everyone: why we share and why it matters (toronto: doubleday canada, 2014), 1. 2 laurie charnigo and paula barnett-ellis, “checking out facebook.com: the impact of a digital trend on academic libraries,” information technology and libraries 26, no. 1 (march 2007): 23–34, https://doi.org/10.6017/ital.v26i1.3286. 3 stephen bell, lorcan dempsey, and barbara fister, new roles for the road ahead: essays commissioned for the acrl’s 75th anniversary (chicago: association of college and research libraries, 2015). 4 amanda harrison et al., “social media use in academic libraries: a phenomenological study,” journal of academic librarianship 43, no. 
3 (may 1, 2017): 248–56, https://doi.org/10.1016/j.acalib.2017.02.014. 5 “social media fact sheet,” pew research center, january 12, 2017, http://www.pewinternet.org/fact-sheet/social-media/. 6 online computer library center, sharing, privacy and trust in our networked world: a report to the oclc membership, (dublin, ohio: oclc, 2007)), https://eric.ed.gov/?id=ed532599. 7 ruth sara connell, “academic libraries, facebook and myspace, and student outreach: a survey of student opinion,” portal: libraries and the academy 9, no. 1 (january 8, 2009): 25–36, https://doi.org/10.1353/pla.0.0036. 8 elizabeth brookbank, “so much social media, so little time: using student feedback to guide academic library social media strategy,” journal of electronic resources librarianship 27, no. 4 (2015): 232–47, https://doi.org/10.1080/1941126x.2015.1092344. 9 besiki stvilia and leila gibradze, “examining undergraduate students’ priorities for academic library services and social media communication,” journal of academic librarianship 43, no. 3 (may 1, 2017): 257–62, https://doi.org/10.1016/j.acalib.2017.02.013. 10 brookbank, “so much social media, so little time.” 11 stvilia and gibradze, “examining undergraduate students’ priorities.” 12 qzone and renren are chinese social media platforms. 13 curtis r. rogers, “social media, libraries, and web 2.0: how american libraries are using new tools for public relations and to attract new users,” south carolina state library, may 22, 2009, http://dc.statelibrary.sc.gov/bitstream/handle/10827/6738/scsl_social_media_libraries_20 09-5.pdf?sequence=1; jakob harnesk and marie-madeleine salmon, “social media usage in libraries in europe—survey findings,” linkedin slideshare slideshow presentation, august https://doi.org/10.6017/ital.v26i1.3286 https://doi.org/10.1016/j.acalib.2017.02.014 http://www.pewinternet.org/fact-sheet/social-media/ https://eric.ed.gov/?id=ed532599 https://doi.org/10.1353/pla.0.0036 https://doi.org/10.1080/1941126x.2015.1092344 https://doi.org/10.1016/j.acalib.2017.02.013 http://dc.statelibrary.sc.gov/bitstream/handle/10827/6738/scsl_social_media_libraries_2009-5.pdf?sequence=1 http://dc.statelibrary.sc.gov/bitstream/handle/10827/6738/scsl_social_media_libraries_2009-5.pdf?sequence=1 information technology and libraries | march 2018 18 10, 2010, https://www.slideshare.net/jhoussiere/social-media-usage-in-libraries-in-europesurvey-teaser. 14 “social media fact sheet.” 15 daniel miller, “facebook’s so uncool, but it’s morphing into a different beast,” the conversation, 2013, http://theconversation.com/facebooks-so-uncool-but-its-morphing-into-a-differentbeast-21548; ryan bradley, “understanding facebook’s lost generation of teens,” fast company, june 16, 2014, https://www.fastcompany.com/3031259/these-kids-today; nico lang, “why teens are leaving facebook: it’s ‘meaningless,’” washington post, february 21, 2015, https://www.washingtonpost.com/news/the-intersect/wp/2015/02/21/why-teensare-leaving-facebook-its-meaningless/?utm_term=.1f9dd4903662. 16 alison mccarthy, “survey finds us teens upped daily facebook usage in 2016,” emarketer, january 28, 2017, https://www.emarketer.com/article/survey-finds-us-teens-upped-dailyfacebook-usage-2016/1015053. 
letter from the editor
kenneth j. varnum
information technology and libraries | december 2017
https://doi.org/10.6017/ital.v36i4.10237
i am excited to have been appointed editor of information technology and libraries as the journal enters its 50th year. originally published as the journal of library automation, ital has a long history of tracking the rapid-fire changes in technology as it relates to libraries. much as it has over the past 50 years, technology will continue to change not just the way libraries offer services to their communities, but the way we conceptualize what it is we do. if past is prologue, i have no doubt the next decades will continue to amaze, probably in ways even the most adventurous trend-forecaster won't get quite right. in the context of the rapid change in how we do our work, what we do will remain the same: collecting, preserving, and providing access to the information and artefacts of our culture, whatever that may be. i would like ital to grow and expand, while keeping its core essence the same. that core is high-quality, relevant, and informative articles, reviewed by our peers, and made available to the world. but i think there is more we can do for lita and the library technology profession by expanding the scope and impact of the journal through seeking and soliciting articles from a wider range of librarians, adding more case studies to the research articles that are at the journal's core, and being more rapidly responsive to the evolving technology landscape in front of us. to that end, i invite you to think broadly about researching, documenting, and describing the technology-related work you do so that others can learn about it. i welcome questions about how your project might fit into ital, and look forward to working with you. i'd like to close by extending my thanks to bob gerrity, who served as ital's editor for the past six years and stewarded the journal's transition to an open-access publication. i am grateful for his service to ital, lita, and the profession. sincerely, kenneth j. varnum, editor (varnum@umich.edu)

content designators for machine-readable records: a working paper
henriette d. avram and kay d. guiles: marc development office, library of congress, washington, d.c.
under the auspices of the international federation of library associations' committees on cataloging and mechanization, an international working group on content designators was formed to attempt to resolve the differences in the content designators assigned by national agencies to their machine-readable bibliographic records. the members of the ifla working group are: henriette d. avram, chairman, marc development office, library of congress; kay d. guiles, secretary, marc development office, library of congress; edwin buchinski, research and planning branch, national library of canada; marc chauveinc, bibliotheque interuniversitaire de grenoble, section science, domaine universitaire, france; richard coward, british library planning secretariat, department of education & science, united kingdom; r. erezepky, deutsche bibliothek, german federal republic; f. poncet, bibliotheque nationale, paris, france; mogens weitemeyer, det kongelige bibliotek, denmark. all working papers emanating from the ifla working group will be submitted to the international standards organization technical committee 46, subcommittee 4, working group on content designators.

prior to any attempt to standardize the content designators for the international exchange of bibliographic data in machine-readable form, it is necessary to agree on certain basic points from which all future work will be derived. this first working paper is a statement of: 1) the obstacles that presently exist which prevent the effective international interchange of bibliographic data in machine-readable form; 2) the scope of concern for the ifla working group; and 3) the definition of terms included in the broader term "content designators." if an international standard format can be derived, it would greatly facilitate the use in this country of machine-readable bibliographic records issued by other national agencies. it should also contribute significantly to the expansion of marc to other languages by the library of congress. at present, the assignment of content designators in most national systems is so varied that tailor-made programs must be written to translate each agency's records into the united states marc format. the international communications format might become the common denominator between all countries, each national system maintaining its own national version.

introduction
the international organization for standardization standard for bibliographic information interchange on magnetic tape (1) has recently been adopted, following on the adoption of the american national standard (2). these events, along with the implementation of the united states and the united kingdom marc projects and similar projects in other countries, have emphasized the importance of the international exchange of bibliographic data in machine-readable form. there are many problems to be resolved before we can approach a truly universal bibliographic system. many of these have been described in an article by dr. franz kaltwasser (3). basic to the exchange of bibliographic data is the requirement for an interchange format which can be used to transmit records representing the bibliographic descriptions of different forms of material (such as records for books, serials, and films) and related records (such as authority records for authors and for subject terms). a format for machine-readable bibliographic records is composed of the following three elements:
1.
the structure of the record, which is the physical representation of the information on the machine-readable medium.
2. the content designators (tags, indicators, and data element identifiers (4)) for the record, which are means of identifying data elements or providing additional information about a data element.
3. the content of the record, which is the data itself, i.e., the author's name, title, etc.

obstacles
the structure of the record, as described in ansi z39.2-1971 and in the iso standard on bibliographic information interchange on magnetic tape, has been fairly well accepted by the international bibliographic community. however, events have shown that as the different agencies examine their requirements and establish the content of their machine-readable records, the content and the content designators so established are not the same across all systems. this lack of uniformity is the result of at least four principal factors:
1. the different functions performed by various bibliographic agencies. bibliographic services are provided by many types of organizations issuing a variety of products. these products are dissimilar because the uses made of them vary, reflecting dissimilarities in the principal functions of the agencies involved. the main products of some of the different bibliographic services are briefly described as follows: catalogs serve to index the collections of individual libraries by author, title, subject, and series. to enable a user to find a physical volume rather than merely a bibliographic reference, catalogs also provide a location code. a unique form of entry for each name or topical heading used as an access point is maintained by means of authority files. the various access points serve to bring together works by the same author, works with the same title, works on the same subject, and works within the same series. a unique bibliographic description of each item makes it possible to distinguish between different works with the same title, and different editions of the same work. national bibliographies provide an awareness service for those items published within a country during a given time period. a national bibliography is not a catalog, since it is not based on or limited to any single collection, nor is it concerned with providing access to the physical item itself. abstracting and indexing services are principally concerned with indexing technical report literature and individual articles from journals and composite works. because these services generally index more specialized materials and are aimed at the specialist in a particular discipline, more complete indexing by means of a relatively large number of very specific subject terms is the rule. like the national bibliography, the abstracting and indexing service is not concerned with a single collection or, in most cases, with providing access to the item on the shelf.
2. the lack of internationally accepted cataloging practices. the paris conference of 1961, which resulted in the paris principles, set the framework for an international cataloging code. following the conference, progress in standardization was evident in the work begun on the formulation of cataloging codes embodying, in varying degrees, the paris principles. one such code is the anglo-american cataloging rules (aacr) (5).
however, we are concerned with the present, and the differences that exist in the cataloging codes of various countries do create differences in the content that may affect content designation of machine-readable bibliographic records. the differences between cataloging rules practiced in the library community and in the information community ( 6) are even more prominent. in the united states, these differences are clearly seen in a comparison between aacr and the cosati rules (7). even more significant is the fact that in preparing entries for abstracting and indexing services, it is common practice to use a name as it appears on the document, without attempting to distinguish it from names of other persons so as to bring together the works of a single author. in addition, cataloging practice in the information community often requires inclusion of data elements that 210 journal of librm·y automation vol. 5/4 december, 1972 are not used in the library community (e.g., organizational affiliation). it is obvious that these differences in practice are serious obstacles to achieving agreement on details of content designation for machine-readable records used in each environment. 3. lack of agreement on organization of data content in machinereadable records in different bibliographic communities. bibliographic data can be organized in machine-readable form in many different ways. for example, one approach could be the grouping of data elements by bibliographic function, such as main entry, title, etc.; another approach could be the grouping together of information by type, such as all personal names, all corporate names, etc. there are pros and cons associated with each of these groupings. this difference in organization exists in some instances between the library community and the information community. for the present discussion, it is not appropriate to analyze the relative merits between the two points of view. it must be emphasized, however, that there is no optimum organization, and that a variety of users will use the data in a variety of ways. it is certainly true that any given system can define, upon agreement of its members, a particular use to be made of the data exchanged and, in this case, perhaps an optimum data organization can be defined ("perhaps" is used because hardware is another variable that comes into play). 4. lack of agreement as to the functions of content designators. there is a lack of agreement as to the functions of content designators, as well as a misunderstanding, in some instances, of the rationale for the assignment of certain of them to specific data elements. the lack of agreement as to the functions of content designators is clearly seen when one examines the use of the data element identifiers in the different national formats. for example, in some cases the data element identifier is assigned to the data element according to its value in a collation sequence (e.g., a is smaller than b, b is smaller than c). the result is a prescribed order, from the smallest value to the largest, for selecting the data elements to build a sort key for file arrangement. in other systems, the data element identifier assigned to a data element is for the unique identification of that data element. there is no prescribed ordering built into the data element identifiers; the identification of the data elements allows them to be selected according to the requirements of the user to build a sort key for file arrangement. 
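the two uses of data element identifiers just described can be contrasted in a short sketch (the identifier codes and the ordering table below are invented): when the identifiers carry a collation value, the sort key for file arrangement follows directly from identifier order; when they merely identify, the selecting and ordering of elements is left to the requirements of the user.

```python
# hypothetical subfields of one data field: (data element identifier, content)
subfields = [("c", "london"), ("a", "coward, richard"), ("b", "british library")]

# approach 1: the identifiers carry a collation value (a before b before c), so a
# prescribed order for building the sort key is implied by the identifiers themselves.
key_by_collation = " ".join(text for _, text in sorted(subfields))

# approach 2: the identifiers only identify; the elements are selected and ordered
# according to the requirements of the user (an invented preference here).
preferred_order = ["b", "a", "c"]
key_by_selection = " ".join(
    text for code in preferred_order for c, text in subfields if c == code
)

print(key_by_collation)  # coward, richard british library london
print(key_by_selection)  # british library coward, richard london
```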
data element identifiers in some cases are tag dependent, i.e., they identify the same data elements consistently when used with a particular tag and data field, regardless of the combination of data elements present in the data field for any particular record. in other cases, the data element identifiers are tag, indicator, and data dependent, i.e. , the meaning of the data element identifiers changes and the data element identifiers are assigned to different data elements, depending upon the combination of data elements occurring in a data field for a particular record. content designat01·s j avram and guiles 211 scope the scope of responsibility for the ifla working group is to investigate the present assignment of content designators for the purpose of determining those areas in which there is uniformity of assignment and those areas in which there is not uniformity. once this has been done, the working group's next task is to explore how best these differences can be accommodated so as to arrive at a standard for the international interchange of bibliographic data. within that scope, the working group will first be concerned with the requirements for the international library community, i.e., libraries and national bibliographies. the magnitude of this assignment is such that it appears unwise to impose the additional problems of the needs of the information community concurrently. if the attempt is made to do so, and the result of the effort is failure, it will not be clear whether we failed because the task was too difficult or whether it is not possible to merge two communities with significant variation throughout their systems. on the other hand, if only the library community is approached at this time, the result of the effort can be success; but if the result is failure, at least one factor will be clear if only in a negative sense: there will be no lingering question as to whether the attempt might have succeeded had the problems of only one community been addressed at one time. in summary, it may be stated that our attempt to standardize content designators within the library community will be complicated by: 1) the lack of an international cataloging code; 2) the dissimilarities in the products of various agencies created by the different functions performed by those agencies; and 3) the lack of an agreement on the functions of the content designators themselves. the lack of agreement on an international cataloging code will have an impact on our work, but is an area which is out of scope for the working group, and therefore can be considered a variable over which there is no control. the dissimilarities in the functions of the different bibliographic services are also a given. however, since it was possible to work around these differences in the formulation of the international standard bibliographic description, it may be possible to do so for the standardization of content designators. therefore, within the two variables given above, our emphasis should be placed on attempting to resolve the lack of agreement on the functions of content designators and then we can proceed to attempt to standardize the assignment of tags, indicators, and data element identifiers. the present paper concentrates on the substance of the problem, namely, a statement of the definition of tags, indicators and data element identifiers and their functions, i.e., the information they are intended to provide to a system processing bibliographic data. 
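the distinction drawn above between tag dependent and tag, indicator, and data dependent identifiers can likewise be sketched (the tag and codes are invented): in the first case a code can be interpreted from the tag alone; in the second its meaning shifts with the combination of data elements present in the field, which is what complicates editing and processing.

```python
# hypothetical imprint field with tag "260"; codes and meanings are invented.

# tag-dependent identifiers: code "a" always means place, "b" always publisher,
# "c" always date, no matter which elements happen to occur in a given record.
TAG_DEPENDENT = {"260": {"a": "place", "b": "publisher", "c": "date"}}

def meaning_tag_dependent(tag, code, present_codes):
    # present_codes is ignored: the tag and code alone determine the meaning
    return TAG_DEPENDENT[tag][code]

# data-dependent identifiers: the meaning of "a" shifts with the combination of
# elements actually present (e.g., when no place is given, "a" labels the publisher).
def meaning_data_dependent(tag, code, present_codes):
    if tag == "260" and code == "a":
        return "place" if "b" in present_codes else "publisher"
    return TAG_DEPENDENT[tag][code]

print(meaning_tag_dependent("260", "a", {"a", "c"}))   # place
print(meaning_data_dependent("260", "a", {"a", "c"}))  # publisher: meaning has shifted
```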
the concept of a supermarc has been discussed in the literature (8, 9) as an international system for exchange, leaving the various national systems as they now exist. each country would have an agency that would translate its own machine-readable record into that of the supermarc system; likewise, each agency would translate the supermarc record from national bibliographic systems into its own format for processing within the country concerned. at the international level, there would be only one record format. this concept has the theoretical advantage of eliminating the difficulties inherent in seeking agreement internationally. however, what has not been addressed is the problem inherent in this concept, namely, the problem associated with any switching language. this may be illustrated in the following manner. consider the case of a national agency (called system 1) whose format is not detailed in regard to content and/or content designation. when system 1 translates to supermarc, the result will be a supermarc record, but it will be a supermarc record still only defined at the level of detail of the limited record of system 1. this will be true regardless of the level of detail at which supermarc is originally defined. likewise, when a national agency (called system 2) accepts records from system 1 via supermarc and translates the supermarc records into its own format, the resulting records will be the limited records of system 1, regardless of the detail of system 2's local records. this may be schematically represented as follows:

system 1 (little detail) -> supermarc (great detail): the supermarc record contains no more detail than the system 1 record
supermarc record from system 1 -> system 2 (great detail): the system 2 record contains no more detail than the supermarc record from system 1

the result of this analysis suggests that systems with formats of less detail than that of supermarc must permanently upgrade their national formats to the level of detail of supermarc while systems with formats more detailed than supermarc must be prepared to accept the fact that records from other countries will probably require significant modification. therefore, although national variation is allowed in a supermarc system, the international community still faces all the problems of international agreement, i.e., arriving at an acceptable level of content designation for supermarc.

content designators

bibliographic records in machine-readable form permit the manipulation of data and allow greater flexibility for the creation of a variety of products. the full potential of machine-readable files has not been exploited to date, but based on experience and the projection of this experience into the future, it may be said that the variety of uses of machine-readable cataloging data will be limited only by the imagination of the user. among the possible products are printed catalog cards, book catalogs, special bibliographies, special indexes, book preparation materials, crt display of cataloging information, management statistics (analysis of data by type of material, subject, language, date, or other parameters), etc. all of the above are possible in a wide variety of output formats. in order to produce these various tools, there are four basic operations (10) which are performed on the data. 1.
store-the storage operation is the internal (to the computer) management of the data, i.e., how files are organized, the type of accessing technique ( s) used, and the data elements (e.g., author, title) selected as keys to the complete bibliographic record. 2. retrieve-the retrieval operation is used here in its broadest sense, to cover the following kinds of retrieval: the retrieval of a single element from a record; the retrieval of a known item, such as the selection of a record by unique number or author and title; the retrieval of a category of records, such as those for all french language monographs on a particular subject with an imprint date of 1968 or later; the retrieval of all bibliographic records for a particular form of material, e.g., serials. (the latter retrieval capability allows segmentation of files not only for display purposes but also for the implementation of certain file organization techniques. ) 3. arrange-the arrange operation puts information in a sequence that is most useful for the user of the product, i.e., an alphabetic sequence or a systematic arrangement. 4. display-the display operation as used in this context implies formatting, the purpose of the operation being to make the information human-readable, e.g., display on a crt, computer printout, and photocomposed output. for example, to display a particular catalog record on a crt device, the record must be retrieved from the data base by a known number or other means of access and formatted for display; or, to prepare a special bibliography, all records satisfying a particular search argument are retrieved from the data base, arranged in some predefined order, formatted and printed. the storage operation is implicit in the examples. in order to perform these four basic operations through machine manipulation, content designators are assigned to the data content of the record. therefore, it may be stated that the function of content designators is to provide the means for the user to store, retrieve, arrange, and display information in a variety of ways to suit his needs. there are three types of content designators currently in use: tags, indicators, and data element identifiers. for the purposes of standardization, agreement must not only be reached on the definition of those three elements but also on other basic issues. the definitions for the elements are given below, as well as a general discussion of some of the decisions that must be made concerning each of the elements, prior to attempting to achieve standardization. 1. a tag is a series of characters used to identify or name the main content of an associated data field ( 11). the designation of main 214 journal of library automation vol. 5/4 december, 1972 content does not require that a data field contain all possible data elements (units of information) all the time. for example, the imprint may be defined as a data field containing the data elements, place, publisher, date of publication, printer, address of printer. the tag for the data field called imprint would be the same if only a partial set of the data elements existed for any single occurrence of the data field in a bibliographic record. should the method of assigning tags be simply to assign a unique series of characters to a data field whereby the characters have no meaning other than to name the main content of the data field? or is it desirable to give values to the characters making up the tag? 
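the two options just posed for tag assignment can be contrasted in a brief sketch (all values are invented): an opaque tag only names the data field, while a value-bearing tag encodes, say, function in its first character and type of entry in the remaining characters, so that a program can act on either part.

```python
# hypothetical value-bearing tag scheme: first character = function of the field,
# remaining characters = type of entry. an opaque tag carries no such meaning.
FUNCTION = {"1": "main entry", "7": "added entry", "6": "subject entry"}
ENTRY_TYPE = {"00": "personal name", "10": "corporate name", "30": "uniform title"}

def describe_value_bearing_tag(tag):
    """interpret a tag whose characters have been given values."""
    return f"{FUNCTION[tag[0]]} / {ENTRY_TYPE[tag[1:]]}"

def describe_opaque_tag(tag, name_table):
    """an opaque tag means nothing beyond the name looked up for it."""
    return name_table[tag]

print(describe_value_bearing_tag("100"))               # main entry / personal name
print(describe_value_bearing_tag("610"))               # subject entry / corporate name
print(describe_opaque_tag("x42", {"x42": "imprint"}))  # imprint
```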
in the latter case, a tag may identify a data field both by function and type of entry, thus allowing greater flexibility in internal organization of the data as well as its formatting for output. 2. an indicator is a character associated with a tag to supply additional information about the data field or parameters for the processing of the data field. indicators are tag dependent because they provide both descriptive and processing information about a data field. should alphabetic characters as well as numeric characters be assigned to indicators? should the character b (blank) always mean a null condition and the character 0 (zero) have a value or a meaning? should indicators with the same values and meanings be used for different data fields and their associated tags where the situation warrants this equality? for example, a personal name may be a main entry, an added entry, or a subject entry. if it is deemed desirable to further describe the type of personal name such as forename, single surname, multiple surname, or name of family, the indicators set for each of the data fields mentioned above would have the same value and the same meaning. this technique has the advantage of simplifying machine coding for the processing of different functional fields containing the same types of entries. 3. a data element identifier is a code consisting of one or more characters used to identify individual data elements within a data field. the data element identifier precedes the data element which it identifies ( 12). should data element identifiers be given a value, i.e., file arrangement value, other than the identification of the data element? should data element identifiers be tag dependent only or tag, indicator, and data dependent? should the same data element identifiers be assigned, so far as is possible, to the same data element regardless of the field in which the data element occurs? should data element identifiers be restricted to alphabetic characters or should they be expanded to allow the use of numerics and symbols? the assignment of a filing value to a data element identifier is intended to minimize the effort required to create software for filing. however, assigning filing values to data element identifiers results in identifiers that are tag, indicator, and data dependent. on the other hand, without assigning content designators/ avram and guiles 215 filing values to the data element identifiers and using computer filing algorithms, the system can avoid data dependent codes, thus ensuring maximum consistency across all fields. for example, the use of the same data element identifier assigned to a title wherever a title appears in the record allows the flexibility of selecting all titles by data element identifier. furthermore, tag, indicator, and data dependent data element identifiers create additional complexity in the editing procedure ( 13). although fixed fields are not content designators, they do take on similar characteristics as to function, i.e., to provide the means for the user to store, retrieve, arrange, and display information in a variety of ways to suit his needs. therefore, they should be considered by the working group along with the content designators. a fixed field is one in which every occurrence of the field has a length of the same fixed value regardless of changes in the contents of the field from occurrence to occurrence. 
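the personal-name example above, in which the same indicator values describe the type of name whether the field is a main entry, an added entry, or a subject entry, can be sketched as follows (tags and indicator values are invented); the contents of fixed fields are taken up after the sketch.

```python
# one hypothetical indicator table shared by every personal-name field, whatever
# the function of the field, so a single routine can process all of them.
NAME_TYPE = {"0": "forename", "1": "single surname", "2": "multiple surname", "3": "name of family"}

fields = [
    {"tag": "100", "function": "main entry",    "indicator": "1", "content": "guiles, kay d."},
    {"tag": "700", "function": "added entry",   "indicator": "0", "content": "mogens"},
    {"tag": "600", "function": "subject entry", "indicator": "3", "content": "weitemeyer (family)"},
]

for f in fields:
    # the same lookup works for all three functional fields, which is the
    # simplification of machine coding argued for in the text.
    print(f'{f["function"]}: {f["content"]} [{NAME_TYPE[f["indicator"]]}]')
```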
the contents of the fixed field can actually be data content, e.g., date of imprint; or a code representing data content, e.g., type of illustration; or a code representing information about the record, e.g., language of the record; or data concerned with the processing of the record, e.g., date entered on file. here again, certain basic issues must be resolved. should the character b (blank) be used to signify a null condition, e.g., in a record without any type of illustration b (blank) would be used? should the codes that represent more than two possible conditions be alphabetic or numeric? should the characters 1 (one) and 0 (zero) be used to indicate an on-off condition, e.g., a book contains an index to its own contents ( 1) or it does not ( 0 )? it is important to keep in mind the eventual necessity of correlating the content designators and fixed fields for all the formats defined for different forms of material (books, serials, maps, films, music, etc.) . by adhering as much as possible to the same content designators and fixed fields, the processing of different forms of material will be facilitated in terms of the software required to perform a particular process and to combine all forms of material in a single product, such as a book catalog. references 1. international organization for standardization. bibliographic information interchange-format for magnetic tape recording. draft international standard iso/dis 2709. technical committee iso/tc 46 secretariat (germany), 1972. 2. american national standards institute. american national standard for bibliographic information interchange on magnetic tape. ansi z39.2-1971. new york: american national standards institute, 1971. 3. franz georg kaltwasser, "the quest for universal bibliographical control," unesco bulletin for libraries, 25:252-259 (sept./oct. 1971). 4. data element identifiers have more commonly been referred to as subfield codes. 216 journal of library automation vol. 5/ 4 december, 1972 5. anglo-american cataloging rules. prepared by the american library association ... north american text. chicago: american library association, 1967. 6. the term bibliographic services has been used to include all agencies concerned with bibliographic products. for this paper such agencies have been further subdivided into two communities: the library community, defined as including libraries and national bibliographies; and the information community, defined as including the abstracting and indexing services. this broad definition has been used for the sake of simplicity. 7. committee on scientific and technical information. standard for descriptive cataloging of government scientific and technical reports. washington: committee on scientific and technical information, 1966. 8. r. e. coward, "marc: national and international cooperation," in international seminar on the marc format and the exchange of bibliographic data in machine-readable form, berlin, 1971: the exchange of bibliographic data and the marc format, ( mi.inchenpullach, 1972), 17-23. 9. roderick m. duchesne, "marc and supermarc," in international seminar on the marc format ... , p. 37-56. 10. these basic operations are not used in this context to mean basic machine operations such as add, subtract, multiply, and divide. 11. a data field is a variable length field containing bibliographic or other data not intended to supply parameters to the processing of the bibliographic record, i.e., content data only. 12. 
there are in existence formats in which the data element identifier is a single character, i.e., a delimiter. in this case, there is no explicit identification function built into the data element identifier. if, in the particular data field, the data elements are all of the same type, such as a multiname data field, then the meaning of the delimiter is implicit. 13. editing is used in this context to mean the human or machine assignment of content designators. reproduced with permission of the copyright owner. further reproduction prohibited without permission. site license initiatives in the united kingdom: the psli and nesli experience borin, jacqueline information technology and libraries; mar 2000; 19, 1; proquest pg. 42 l site license initiatives in the united kingdom: the psli and nesli experience jacqueline borin this article examines the development of site licensing within the united kingdom higher education community. in particular, it looks at haw the pressure to make better use of dwindling fiscal resources led ta the conclusion that information technology and its exploitation was necessary in order to create an effective library service. these conclusions, reached in the follett report of 1993, led to the establishment of a pilot site license initiative and then a national electronic site license initiative. the focus of this article is these initiatives and the issues they faced, which included off-site access, definition of a site and perhaps most importantly, the unbundling of print and electronic journals. increased competition for institution funding around the world has resulted in an erosion of library funding. in the united states state universities are receiving a decreasing portion of their funds from the state while private universities are forced to limit tuition increases due to outside market forces. in the united kingdom the entitlement to free higher education is currently under attack and losing ground. today's economic pressures are requiring individual libraries to make better use of their fiscal resources while the emphasis moves from being a repository for information to providing access to information. jacqueline sorin (jborin@csusm.edu) is coordinator of reference and electronic resources, library and information services, california state university, san marcos. as in the united states, the use of consortia for cost sharing in the united kingdom is becoming imperative as producers produce more electronic materials and make them available in full-text formats. consortia, while originally formed to cooperate on interlibrary loans and union catalogs, have recently taken on a new role, driven by financial expediency, in negotiating electronic licenses for their members, and the percentage of vendor contracts with consortia are rising. academic libraries cannot afford the prevalent pricing model that asks for the current print price plus an electronic surcharge plus projected inflation surcharges, therefore group purchasing power allows higher education institutions to leverage the money they have and to provide resources that would otherwise be unavailable. advantages for the vendor include one negotiator and one technical person for the consortia as a whole. 
in addition, the use of consortia provide greater leverage in pushing for the need for stable archiving and for retaining the principles of fair use within the electronic environment as well as reminding publishers of the need for flexible and multiple economic models to deal with the diverse needs and funding structures of consortia. i during the spring of 1998, while visiting academic libraries in the united kingdom, i looked at an existing initiative within the uk higher education community-the pilot site license initiative (psli), which had begun as a response to the follett report and to rising journal prices. at the time the three-year initiative was nearing its end and its successor, the national electronic site license initiative (nesli), was already the topic of much discussion. i history the concept of site licensing in the united kingdom higher education 42 information technology and libraries i march 2000 community had already been established, since 1988, by the combined higher education software team (chest), based at the university of bath. chest has negotiated site licenses with software suppliers and some large database producers through two different methods. either the supplier sells a national license to chest, which passes it on to the individual institution or chest sells licenses to the institution on the suppliers behalf and passes the fees on to them (see figure 1). chest works closely with national information services and systems (niss). niss provides a focal point for the uk education and research communities to access information resources. niss's web service, the niss information gateway, provides a host for chest information such as ebsco masterfile and oclc netfirst. most chest agreements are institution-wide site licenses that allow for all noncommercial use of the product, normally for five years to allow for incorporation into the curriculum. once an institution signs up it is committed for the full term of the agreement. chest is not in the business of either evaluating products or differentiating among competing suppliers. evaluations and purchase decisions are left up to the individual institutions.2 chest does set up and support e-mail discussion lists for each agreement so that users can discuss features and problems of the product among themselves. they also send out electronic news bulletins to provide advance warning of forthcoming agreements and to assess level of interest in future agreements. chest operates in a similar manner to many library consortia in the united states. the major differences are that it sells to higher education institutions as a whole so the products they sell include not only databases but also for example, software programs. this is also beginning to change in reproduced with permission of the copyright owner. further reproduction prohibited without permission. the united states. a recent article in the chronicle of higher education mentions that institutions will not stop with library databases, "in the future we'll be negotiating site licenses for software and all sorts of things . . . not just databases."3 although chest is substantially self-funding it is strongly supported (as is niss) by the joint information systems committee (jisc) of the higher education funding councils of england (hefce). the majority of public funding for higher education funding in the united kingdom is funneled through the hefcs (one each for england, scotland, wales, and northern ireland). 
one of the jisc committees, the information services subcommittee (issc), which in 1997 became part of the committee for electronic information (cei) defined principles for the delivery of content. 4 they were: • free at the point of use; • subscriptions not transaction based; • lowest common denominator; • universality; • commonality of interfaces and • mass instruction. i follett report in 1993 an investigation into how to deal with the pressures on library resources caused by the rapid expansion of student numbers and the worldwide explosion in academic knowledge and information was undertaken by the joint funding council 's libraries review group, chaired by sir brian follett. this investigation resulted in the follett report. one of the key conclusions of the report was "the exploitation of it is essential to create the effective higher education and public research establishments software, data , training needs ! chest © chest (university of bath) 1996 figure 1. chest diagram chest deals , chest offers negotiations software , data, training materials t it product suppliers library service of the future ." the review group recommended that as a starting point "a pilot initiative between a small number of institutions and a similar number of publishing houses should be sponsored by the funding councils to demonstrate in practical terms how material can be handled and distributed electronically." 5 as a consequence £15 million was allocated to an electronic libraries program, managed by jisc on behalf of hefce. the electronic libraries program was to "engage the higher education community in developing and shaping th e implementation of the electronic library." 6 this project provided a body of electronic resources and services for uk higher education and influenced a cultural shift towards the acceptance and use of electronic resources instead of more traditional information storage and access methods. psli in may 1995 a pilot site license initiative subsidized by the funding councils was set up to : • test if the site license concept could provide wider access to journals for those in the academic community; • see if it would allow more flexibility in the use of scholarly material ; • test the methods for dissemination of scholarly material to the higher education sector in a variety of formats ; • test legal models for a national site license program; and • explore the possibility for increased value for money from scholarly journals.7 sixty-five publishers were invited by hefce to participate for three years commencing january 1, 1996. hefce was also responsible through jisc for the funding of the elib program, but no formal links were established between the elib project and communications i borin 43 reproduced with permission of the copyright owner. further reproduction prohibited without permission. the psli. 8 the final selection of four companies included academic press ltd., blackwell publishers ltd., blackwell science ltd., and iop publishing ltd. the publishers agreed to offer print journals to higher education institutions for discounts of between 30 and 40 percent over the three year period as well as electronic access as available. originally the electronic journals were supposed to be the subsidiary component of the agreement but by the end of the agreement they had become the major focus. 
the psli achieved almost 100 percent take up among the higher education institutions due to the anticipated savings through the program.9 hefce did not specify how the publishers were to deliver their content. iopp hosted the journals on their own server, for example, while academic press linked their ideal server to the journals online service at the university of bath. one of the key provisions of the site license was the unlimited rights of authorized users to make photocopies (including their use within course packs) of the journals. academic press and iopp provided full-text access to all their journals while blackwell and blackwell science only allowed reading of full text where a print subscription existed. an integral part of the psli was that the funding from hefce to the higher education institutions was top sliced to support the discounted price offered to the institutions. several assessments of the initiative were made and a final evaluation of the pilot was concluded at the end of 1997. initial surveys indicated subscription savings through the program (average annual savings were approximately £11,800 per annum) and the first report of the evaluation team showed a wide level of support for the project despite major problems with lack of communication in a timely manner.10 the team recommended an extension of the psli to include more publishers and more emphasis on electronic delivery. one concern that was raised was ease of access, students had to know which system a journal they required was on. this was not easily discernible or user friendly. evaluations by focus groups showed users wanted one single access point to all electronic journals.11 also unresolved was the need for one consistent interface to the electronic journals and a solution to the archiving issue. at the end of the psli, hefce handed the next phase over to jisc. in the fall of 1997 jisc announced that a nesli would be set up and a new steering group was established. nesli was to be an electronic-only scheme and the invitation to tender went out at the end of 1997 with a decision to be made mid-1998. national electronic site license initiative nesli, a three-year jisc funded program, began on january 1, 1999 although the "official" launch was held at the british library on june 15, 1999. it is an initiative to deliver a national electronic journal service to the united kingdom higher education and research community (approximately 180 institutions) and is a successor program to the pilot site license initiative {psli). in may 1998 jisc appointed a consortium of swets and zeitlinger and manchester computing {university of manchester) to act as a managing agent (swets and blackwell ltd. announced in june 1999 their intention to combine swets subscription service and blackwell's information services, the two subscription agency services). the managing agent represents the higher education institutions in negotiations with publishers, manages delivery of the electronic material through a single web interface and oversees day-to-day operation of the program including the handling of subscriptions.12 44 information technology and libraries i march 2000 the managing agent also encourages the widespread acceptance by publishers of a standard model site license, one of the objectives of this being to reduce the number and diversity of site definitions used by publishers. 
other important provisions of the model site license addressed the issues of walk-in use by clients and the need for publishers to provide access to material previously subscribed to when a subscription is cancelled. the subscription model is currently the prevalent option although they are also working towards a pay-per-view option.13 priority has been given to publishers who had been involved in the psli and to those publishers participating in swetsnet, the delivery mechanism for the nesli. swetsnet is an electronic journal aggregation service that offers access to and management of internet journals. its search engine allows searching and browsing through titles from all publishers with links to the full-text articles. nesli is not a mandatory initiative, the higher education institutions can choose whether to participate in proposals and can pursue their own arrangements individually or through their own consortiums if they wish. while psli was basically a printbased initiative limited to a small number of publishers and funded via top slicing, nesli is an electronic initiative aimed at involving many more publishers. it is designed to be self-funding, although it did receive some start-up funding. although it is an electronic initiative, proposals that include print will be considered, as it is still not easy to separate print and electronic materials.14 the initiative addresses the most effective use, access, and purchase of electronic journals in the academic library community. its aims include: • access control-for on-site and remote users; • cost; reproduced with permission of the copyright owner. further reproduction prohibited without permission. • definition of a site; • archiving; and • unbundling print from electronic. access to swetsnet, the delivery mechanism for journals included in nesli, has now been supplemented by the option of athens authentication. athens, an authentication system developed by niss, provides individuals affiliated with higher education institutions a single username and password for all electronic services they have permission to access. athens is linked to swetsnet to ensure access for off-site, remote, and distance learners who do not have a fixed ip address. this supplements swetsnet's ip address authentication, which does not allow for individual access to toc and sdi alerting. a help desk is available for all nesli users through the university of manchester. the definition of a site is being addressed by the nesli model site license, which tries to standardize site definitions (including access from places that authorized users work or study, including homes and residence halls); interlibrary loan (supplying an authorized user of another library a single paper copy of an electronic original of a individual document); walk-in-users; access to subscribed material in perpetuity (it provides for an archive to be made of the licensed material with access to the archive permissible after termination of the license); and inclusion of material in course packs. jisc' s nesli steering group approved the model nesli site license on may 11, 1999 for use by the nesli managing agent.15 the managing agent asks publishers to accept the model license with as few alterations as possible. during the term of the initiative the managing agent will be working on additional value added services. 
these include links from key indexing and abstracting services, provision of access via z39.50, linking from library opacs, creation of catalog records and assessing a model for ejournal delivery via subject clusters. in particular, they have begun to look at the technical issues concerned with providing marc records for all electronic journals included in nesli offers. additionally they will be looking at solutions for longer term archiving of electronic journals to provide a comfort level for librarians purchasing electronic only copies.16 two offers that have been made under the nesli umbrella so far are blackwell sciences for 130 electronic journals and johns hopkins university press for 46 electronic titles. most recently two additional vendors have been added to the list. elsevier has made a proposal to deliver full text content via the publishers sciencedirect platform that includes the full text of more than 1,000 elsevier science journals along with those of other publishers. a total of more than 3,800 journals would be included in the service.17 mcb university press, an independent niche publisher, is offering access to 114 full text journals and secondary information in the area of management through it's emerald intelligence + fulltext service. similarly, here in the united states, california state university (csu) put out for competitive tender a contract for the building of a customized database of 1200+ electronic journals based on the print titles subscribed to by 15 or more of the 22 campuses-journal access core collection oacc). the journals will be made available via pharos, a new unified information access system for the csu. like ohiolink, a consortium of 74 ohio libraries, it will provide a common interface to electronic journals for students and faculty and will facilitate the development of distance learning programs.18 by unbundling the journals, libraries will no longer be required to pay for journals they do not want or need leading to moderate price savings. additional savings can be realized through the lowering of overhead costs achieved by system wide purchasing of core resources. other issues being addressed within the jacc rfp included archiving and perpetual access to journal articles the university system has paid for, availability of e-journals in multiple formats, interlibrary loan of electronic documents, currency of content and cost value at the journal-title level. 19 currently 500 core journals are being provided under the jacc by ebsco information services and the csu plans on expanding those offerings. i conclusion as we move into the next millennium library consortia will continue to work together with vendors to further customize journal offerings. however it is still far too early to say whether nesli will be successful or whether it will succeed in getting the publishing industry to accept the model site license. if it is to work within the higher education community, it will depend greatly on the flexibility and willingness of the publishers of scholarly journals. it has made a start by developing a license that sets a wider definition of a site and that deals realistically with the question of off-site access. by encouraging the unbundling of electronic and print subscriptions nesli allows services to be tailored to specific needs of the information community, but it remains to be seen how many publishers are prepared to accept unbundled deals at this stage. 
also as technology stabilizes and libraries acquire increasingly larger electronic collections, we will not be able to rely on license negotiations as the only way to influence pricing, access, and distribution. an additional problem that remains unaddressed by either psli or nesli is the pressure on academics to publish in traditional journals and the corcommunications i borin 45 reproduced with permission of the copyright owner. further reproduction prohibited without permission. responding rise in scholarly journal prices. nesli neither encourages nor hinders changes in scholarly communication and therefore the question of restructuring the scholarly communication process remains.20 references and notes 1. barbara mcfadden and arnold hirshon, "hanging together to avoid hanging separately: opportunities for academic libraries and consortia," information technology and libraries 17, no. 1 (march 1998): 36. see also international coalition of library consortia, "statement of current perspective and preferred practices for the selection and purchase of electronic information," information technology and libraries 17, no. 1 (march 1998): 45. 2. martin s. white, "from psli to nesli: site licensing for electronic journals," new review of academic librarianship 3, (1997): 139-50. see also chest. chest: software, data, and information for education (1996). 3. thomas j. deloughry, "library consortia save members money on electronic materials," the chronicle of higher education (feb. 9, 1996): a21. 4. information services subcommittee, "principles for the delivery of content." accessed nov. 17, 1999, www.jisc.ac.uk/ pub97 /nl_97.html#issc. 5. joint funding council's libraries review group. the follett report. (dec. 1993): accessed nov. 20, 1999, www.niss.ac. uk/ education/ hefc / follett/report/. 6. john kirriemuir, "background of the elib programme." accessed nov. 21, 1999, www.ukoln.ac.uk/services.elib/ background/history.html. 7. psli evaluation team, "uk pilot site license initiative: a progress report," serials 10, no. 1 (1997): 17-20. 8. white, "from psli to nesli," 149. 9. tony kidd, "electronic journals: their introduction and exploitation in academic libraries in the uk," serials review 24, no. 1 (1998): 7-14. 10. jill taylor roe, "united we save, divided we spend: current purchasing trends in serials acquisitions in the uk academic sector," serials review 24, no. 1 (1998): ~11. psli evaluation team, "uk pilot site license initiative," 17-20. 12. beverly friedgood, "the uk national site licensing initiative," serials 11, no. 1 (1998): 37-39. 13. university of manchester and swets & zeitlinger, nesli: national electronic site license initiative (1999). accessed nov. 21, 1999, www.nesli.ac.uk/. 14. nesli brochure, "further information for librarians." accessed nov. 21, 1999, www.nesli.ac.uk/ nesli-librarians-leaflet.html. 15. a copy of the model site license is available on the nesli web site. accessed nov. 22, 1999, www.nesli.ac.uk/ mode1license8.html. 16. albert prior, "nesli progress through collaboration," learned publishing 12, no. 1 (1999). 17. science direct. accessed nov. 24, 1999, www.sciencedirect.com. 18. declan butler, "the writing is on the web for science journals in print," nature 397, oan. 211998). 19. the journal access core collection request for proposal. accessed nov. 22, 1999, www.calstate.edu/tier3/ cs+p/rfp_ifb/980160/980160.pdf. 20. frederick j. friend, "uk pilot site license initiative: is it guiding libraries away from disaster on the rocks of price rises?" 
serials 9, no. 2 (1996): 129-33. a low-cost library database solution mark england, lura joseph, and nern w. schlecht two locally created databases are made available to the world via the web using an inexpensive but highly functional search engine created in-house. the technology consists of a microcomputer running unix to serve relational databases. cgi forms created using the programming language perl offer flexible interface designs for database users and database maintainers. many libraries maintain indexes to local collections or resources and create databases or bibliographies con46 information technology and libraries i march 2000 cerning subjects of local or regional interest. these local resource indexes are of great value to researchers. the web provides an inexpensive means for broadly disseminating these indexes. for example, kilcullen has described a nonsearchable, webbased newspaper index that uses microsoft access 97.1 jacso has written about the use of java applets to publish small directories and bibliographies.2 sturr has discussed the use of wais software to provide searchable online indexes.3 many of the web-based local databases and search interfaces currently used by libraries may: • have problems with functionality; • lack provisions for efficient searching; • be based on unreliable software; • be based on software and hardware that is expensive to purchase or implement; • be difficult for patrons to use; and • be difficult for staff to maintain. after trying several alternatives, staff members at the north dakota state university libraries have implemented an inexpensive but highly functional and reliable solution. we are now providing searchable indexes on the web using a microcomputer running unix to serve relational databases. cgi forms created at the north dakota state university libraries using the programming language perl offer flexible interface designs for database users and database maintainers. this article describes how we have implemark england (england@badlands. nodak.edu) is assistant director, lura joseph (ljoseph@badlands.nodak.edu) is physical sciences librarian, and nem w. schlecht (schlecht@plains.nodak.edu) is a systems administrator at the north dakota state university libraries, fargo, north dakota. automated storage & retrieval system: from storage to service articles automated storage & retrieval system: from storage to service justin kovalcik and mike villalobos information technology and libraries | december 2019 114 justin kovalcik (jdkovalcik@gmail.com) is director of library information technology, csun oviatt library. mike villalobos (mike.villalobos@csun.edu) is guest services supervisor, csun oviatt library. abstract the california state university, northridge (csun) oviatt library was the first library in the world to integrate an automated storage and retrieval system (as/rs) into its operations. the as/rs continues to provide efficient space management for the library. however, added value has been identified in materials security and inventory as well as customer service. the concept of library as space, paired with improved services and efficiencies, has resulted in the as/rs becoming a critical component of library operations and future strategy. staffing, service, and security opportunities paired with support and maintenance challenges, enable the library to provide a unique critique and assessment of an as/rs. 
introduction “space is a premium” is a phrase not unique to libraries; however, due to the inclusive and open environment promoted by libraries, their floor space is especially attractive to those within and outside of the building’s traditional walls. in many libraries, the majority of floor space is used to house a library’s collection. in the past, as collections grew, floor space became increasingly limited. faced with expanding expectations and demands, libraries struggled to identify a balance between transforming space for new services while adding materials to a growing collection. in addition to management activities like weeding, other solutions such as offsite storage and compact shelving rose in popularity as a method to create library space in the absence o f new building construction. years later as collections move away from print and physical materials, libraries are beginning to reexamine their building’s space and envision new features and services. “now that so many library holdings are accessible digitally, academic libraries have the opportunity to make use of their physical space in new and innovative ways.”1 the csun oviatt library took a novel approach and launched the world’s first automated storage and retrieval system (as/rs) in 1991 as a storage solution to resolve its building space limitations. the project was a california state university (csu) system chancellor’s office initiative that cost more than $2 million to implement and began in 1989. the original concept “came from the warehousing industry, where it had been used by business enterprises for years.”2 by leveraging and storing physical materials in the as/rs, the csun oviatt library is able to create space within the library for new activities and services. “instead of simply storing information materials, the library space can and should evolve to meet current academic needs by transforming into an environment that encourages collaborative work.”3 mailto:jdkovalcik@gmail.com mailto:mike.villalobos@csun.edu automated storage & retrieval system | kovalcik and villalobos 115 https://doi.org/10.6017/ital.v38i4.11273 unfortunately, as the first stewards of an as/rs, csun made decisions that led to mismanagement and neglect resulting in the as/rs facing many challenges in becoming a stable and reliable component of the library. however, recent efforts have sought to resolve these issues and resulted in system updates, management, and functionality. whereas in the past low-use materials were placed in as/rs to create space for new materials, now materials are moved into the as/rs to create space for patrons, secure collections, and improve customer service. as part of this critical review, the functionality and maintenance along with the historical and current management of the as/rs will be examined. background csun is the second-largest member of the twenty-three-campus csu system. the diverse university community includes over 38,000 students and more than 4,000 employees.4 consisting of nine colleges offering 60 baccalaureate degrees, 41 master’s degrees, 28 credentials in education, and various extended learning and special programs, csun provides a diverse community with numerous opportunities for scholarly success.5 the csun oviatt library’s as/rs is an imposing and impressive area of the library that routinely attracts onlookers and has become part of the campus tour. 
the as/rs is housed in the library’s east wing and occupies an area that is 8,000 square feet and 40 feet high arranged into six aisles. the 13,260 steel bins, each 2 feet x 4 feet, in heights of 6, 10, 12, 15, and 18 inches, are stored on both sides of the aisles enabling the as/rs to store an estimated 1.2 million items.6 each aisle has a storage retrieval machine (srm) that performs automatic, semiautomatic, and manual “picks” and “deposits” of the bins.7 the as/rs was assessed in 2014 as responsibilities, support, and expectations of the system shifted and previous configurations were no longer viable. discontinued and failing equipment, unsupported server software, inconsistent training and use, and decreased local support and management were identified as impediments for greater involvement in library projects and operations. campus provided funding in 2015 to update the server software as well as major hardware components on three of the six aisles. divided into two phases, the server software upgrade was completed in may 2017 followed by the hardware upgrade in january 2019.8 literature review the continued growth of student, faculty, and academic programs along with evolving expectations and needs since the late 1980s has required the library to analyze library services and examine the building’s physical space and storage capacity. in the late 1980s, identifying space for increasing printed materials was the main contributing factor in implementing the as/rs. in the mid-2010s, creating space within the library for new services was dependent on a stable and reliable as/rs. “the conventional way of solving the space problem by adding new buildings and off-site storage facilities was untenable.”9 a benefit of an as/rs, as creaghe and davis predicted in 1986 was, “the probable slow transition from books to electronic media, an aaf [automated access facility] may postpone the need for future library construction indefinitely.”10 the as/rs has enabled the library to create space by removing physical materials while enhancing customer service, material security, and inventory control. “the role of the library as service has been evolving in lockstep with user needs. the current transformative process that takes place in academia has a powerful impact on at least two functional areas of the library: information technology and libraries | december 2019 116 library as space and library as collection.”11 in addition, the “increased security the aaf … offers will save patrons time that would be spent looking for books on the open shelves that may be in use in the library, on the waiting shelves, misplaced, or missing.”12 in subsequent years, library services have evolved to include computer labs with multiple high-use printers/scanners/copiers, instructional spaces, individual and group study spaces, makerspaces, etc., in addition to campus entities that have required large amounts of physical space within the library. “it is well-known that academic libraries have storage problems. traditional remedies for this situation—used in libraries across the nation—include off-site storage for less used volumes, as well as, more recently, innovative compact shelving. these solutions help, but each has its disadvantages, and both are far from ideal. . . . 
when the eastern michigan university library had the opportunity to move into a new building, we saw that an as/rs system would enable us to gain open space for activities such as computer labs, training rooms, a cafe, meeting rooms, and seating for students studying.”13 the as/rs provides all the space advantages provided by off-site storage and compact shelving while adding much more value while mitigating negatives of off-site time delays and the confusion of accessing and using compact shelving. staffing & usage 1991–1994 following the 80/20 principle, low-use items were initially selected for storage in the as/rs. “when the storage policy was being developed in [the] 1990s, the 80/20 principle was firmly espoused by librarians. . . . thus, by moving lower-use materials to as/rs, the library could still ensure that more than 80% of the use of the materials occurs on volumes available in the open stacks.”14 low-use items were identified if one of the following three conditions was met: (1) the item’s last circulation date was more than five years ago; (2) the item was a non-circulating periodical; or (3) items that were not designed to leave an area and received little patron usage such as the reference collection. in 1991, the as/rs was loaded with 800,000 low-use items and went live for the first time later that year. staffing for the initial as/rs department consisted of one full-time as/rs supervisor (40 hours/week), one part-time as/rs repair technician (20 hours/week), and 40 hours a week of dedicated student employees, for a total of 100 hours a week of dedicated as/rs management. the as/rs was largely utilized as a specialized service for internal library operations with limited patron-initiated requests. as/rs operations were uniquely created and customized for each as/rs operator as well as the desired task needing to be performed. skills were developed internally with knowledge and training shared by word of mouth or accompanied with limited documentation. 2000 mid-2000s the as/rs department functioned in this manner until the 1994 northridge earthquake struck the campus directly and required partial building reconstruction to the library. although there was no damage to the as/rs itself or its surrounding structure, extensive damage occurred in the wings of the library. the damage resulted in the library building being closed and inaccessible. when the library reopened in 2000, it was determined that due to previous as/rs low usage that a dedicated department was no longer warranted. the as/rs supervisor position was dissolved, the student employee budget was eliminated, and the as/rs technician position was not replaced after the employee retired in 2008. as/rs operational responsibilities were consolidated into the circulation department and as/rs administration into the systems department. both circulation automated storage & retrieval system | kovalcik and villalobos 117 https://doi.org/10.6017/ital.v38i4.11273 and systems departments redefined their roles and responsibilities to include the as/rs without additional budgetary funding, staffing, or training. in order for as/rs operations to be absorbed by these departments, changes had to occur in the administration, operating procedures, staffing assignments, and access to the as/rs. all five circulation staff members and twenty student employees received informal training by members of the former as/rs department in the daily operations of the as/rs. 
the circulation members also received additional training for first-tier troubleshooting of as/rs operations such as bin alignments, emergency stops, and inventory audits. the as/rs repair technician remained in the systems department; however, as/rs troubleshooting responsibility was shared among the systems support specialists and dedicated as/rs support was lost. the administrative tasks of scheduling preventive maintenance services (pms), resolving as/rs hardware/equipment issues with the vendor, and maintaining the server software remained with the head of the systems department. without a dedicated department providing oversight for the as/rs, issues and problems began to occur frequently. circulation had neither the training nor the resources available to master procedures or enforce quality control measures. similarly, the systems department became increasingly removed from daily operations. many issues were not reported at all and came to be viewed as system quirks requiring workarounds or as limitations of the system. for issues that were reported, troubleshooting had to start all over again, and systems relied on circulation staff being able to replicate the issue in order to demonstrate the problem. systems personnel retained little knowledge of how to perform daily operations, and troubleshooting became more complex and problematic as different operators had different levels of knowledge and skill that accompanied their unique procedures.

mid-2000s–2015

these issues became further exacerbated when areas outside of circulation were given full access to the as/rs in the mid-2000s. employees from different departments of the library began entering and accessing the as/rs area and operated the as/rs based on knowledge and skills they learned informally. student assistants from these other departments also began accessing the area and performing tasks on behalf of their informally trained supervisors. further, without access control, employees as well as students ventured into the "pit" area of the as/rs where the srms move and end-of-aisle operations occur. this area contains many hazards and is unsafe without proper training. during this period, the special collections and archives (sc/a) department loaded thousands of uncataloged, high-use items into the as/rs that required specialized service from circulation. these items were categorized as "non-library of congress," and inventory records were entered into the as/rs software manually by various library employees. in addition, paper copies were created and maintained as an independent inventory by sc/a. over the years, the sc/a paper inventory copies were found to be insufficiently labeled, misidentified, or missing. therefore, the as/rs software inventory database and the sc/a paper copy inventory contained conflicts that could not be reconciled. to resolve this situation, an audit of sc/a materials was completed in spring 2019 to locate inventory that was thought to be missing. all bound journals and current periodicals were eventually loaded into the as/rs as well, causing other departments and areas to rely on the as/rs more heavily. departments such as interlibrary loan and reserves, as well as patrons, began requesting materials stored in the as/rs more routinely and frequently. the as/rs transformed from a storage space with limited usage to an active area with simultaneous usage requests of different types throughout the day.
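the kind of reconciliation performed by an inventory audit such as the sc/a audit described above can be illustrated with a minimal sketch; the identifiers below are hypothetical examples, not actual library records, and the sketch simply shows how conflicts between the as/rs software inventory and a separately maintained list might be flagged.

```python
# minimal sketch: flag conflicts between two independently maintained inventories.
# the identifiers are hypothetical examples, not actual library records.
asrs_inventory = {"sca-0001", "sca-0002", "sca-0004"}   # items recorded in the as/rs software
paper_inventory = {"sca-0001", "sca-0003", "sca-0004"}  # items on the department's paper list

only_in_asrs = asrs_inventory - paper_inventory    # recorded in the system but missing from the paper list
only_on_paper = paper_inventory - asrs_inventory   # on the paper list but unknown to the system

print("in as/rs software only:", sorted(only_in_asrs))
print("on paper inventory only:", sorted(only_on_paper))
```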
without a dedicated staff to organize, troubleshoot, and provide quality control, there was an abundance of errors that led to long waits for materials, interdepartmental conflicts, and unresolved errors. high-use materials from sc/a, as well as currently received periodicals from the main collection, were the catalysts that drove and eventually warranted change in the as/rs usage model from storage to service. the inclusion of these materials created new primary customers identified as internal library departments: sc/a and interlibrary loan (ill). with over 4,000 materials contained in the as/rs, sc/a requires prompt service for processing archival material into the as/rs and filling specialized patron requests for these materials. in addition, ill processes over 500 periodical requests per month that utilize and depend on as/rs services. the additional storage and requests created an uptick in overall as/rs utilization that carried over into circulation desk operations as well.

2015–present

the move from storage to service was not only inevitable due to an evolving as/rs inventory, but was necessary in order to regain quality control and manage the library-wide projects that involved the as/rs. the increased usage of and reliance on the as/rs required that the system be well maintained and managed. administration of the as/rs remains within systems, and circulation student employees continue to provide supervised assistance to the as/rs. the crucial change that emerged within circulation was the need for a dedicated operations and project manager. an as/rs lead position was created with responsibilities for the daily operations and management of the system and service. however, this was not a complete return to the original staffing concept of the early 1990s. the concept for this new position focuses on project management and system operations rather than the original sole attention to system operations. the as/rs lead is the point of contact for all library projects that utilize the as/rs, for relaying any as/rs issues or concerns to systems, and for daily as/rs usage. this shift is necessary due to the increased demand and reliance on the system, which has changed its charge from storage to service.

customer service

the library noted over time that the as/rs could be used as a tool in weeding and other collection shift projects to create space and aid in reorganizing materials. as more high-use materials were loaded into the as/rs, the indirect advantages of the as/rs became more apparent. patrons request materials stored within the as/rs through the library's website and pick up the materials at the circulation desk. there is no need for patrons to navigate the library, successfully use the classification system, and search shelves to locate an item that may or may not be there. as kirsch notes, "the ability to request items electronically and pick them up within minutes eliminates the user's frustration at searching the aisles and floors of an unfamiliar library."15 the vast majority of library patrons are csun students who commute and must make the best use of their time while on campus. housing items in the as/rs creates the opportunity to have hundreds of thousands of items all picked up and returned to one central location. this makes it far easier for library patrons, especially users with mobility challenges, to engage with a plethora of library materials.
the time allotted for library research and/or enjoyment becomes more productive as their desired materials are delivered within minutes of arriving in the building. as heinrich and willis state, "the provision of the nimble, just-in-time collection becomes paramount, and the demand for as/rs increases exponentially."16 as/rs items are more readily available than shelved items on the floor, as it takes minutes to have as/rs items returned and made available once again. "they may be lost, stolen, misshelved, or simply still on their way back to the shelves from circulation—we actually have no way of knowing where they are without a lengthy manual search process, which may take days. . . . unlike books on the open shelves, returned storage books are immediately and easily 'reshelved' and quickly available again."17 another advantage is that there is no need to keep materials in call-number order, with its unpleasant reality of missing and misshelved items. items in the as/rs are assigned bin locations that can only be accessed by an operator- or user-initiated request. the workflow required to remove a material from the as/rs involves multiple scans and procedures that provide a level of accountability that does not exist for items stored on floor shelves. further, users are assured of an item's availability within the system. storing materials in the as/rs ensures that items are always checked out when they leave the library and not sitting unaccounted for in library offices and processing areas. it also avoids patron frustration over misshelved, recently checked-out, or missing items.

security

the decision to follow the 80/20 principle and place low-use items in the as/rs meant high-use items remained freely available to library patrons on the open shelves of each floor. this resulted in high-use items being available for patron browsing and checkout, as well as for patron misuse and theft. the sole means of securing these high-use items involved tattle-tape and installing security gates at the main entrance. therefore, the development of policies and procedures for the enforcement of these gates was also required. beyond the inherent cost, maintenance, and issue of ensuring items are sensitized and desensitized correctly, gate enforcement became another issue that rested upon the circulation department. assuming theft would occur by exiting through the gates at the main entrance of the library, enforcement is limited in the actions that may be performed by library employees. touching, impeding the path of, following, detaining, or searching library patrons are restricted actions reserved for campus authorities such as the police, not library employees. rather than attempting to enforce a security mechanism over which we have no authority, the as/rs provides an alternative for the security of high-use and valuable materials. storing items in the as/rs eliminates the possibility of theft or damage by visitors and places control and accountability over the internal use of materials. "there would be far fewer instances of mutilation and fewer missing items."18 further, access to the as/rs area was restricted from all library personnel to only circulation and systems employees, with limited exceptions. individual logins also provided a method of control and accountability, as each operator is required to use a personal account rather than a departmental account to perform actions on the as/rs. materials stored in the as/rs are, "more significantly . . .
safer from theft and vandalism."19

inventory

conducting a full inventory of a library collection is time consuming, expensive, and often inaccurate by the time of completion. missing or lost items, shelf reading projects, in-process items, etc. create overhead for library employees and generate frustration for patrons searching for an item. massive, library-wide projects such as collection shifts and weeding are common endeavors undertaken to create space, remove outdated materials, and improve collection efficiency. however, actions taken on an open-shelves collection are time consuming, costly, and inefficient, and they affect patron activities. these projects typically involve months of work and multiple departments to complete. items stored within the as/rs do not experience these challenges because the system is managed by a full-time employee throughout the year and not on a project basis. the system is capable of performing inventory audits and does not affect public services. therefore, while the annual cost of managing an item on an open shelf is $0.079, the cost of storing the same item in the as/rs is $0.02 per year (see note 20). routine and spot audits ensure an accurate inventory, confirm the capacity level of the system, and establish best management of the bins. as/rs inventory audits are highly accurate and much more efficient than shelf reading, with little impact to patron services. "while this takes some staff time, it is far less time-consuming than shelf reading or searching for misshelved books."21 storing materials in the as/rs is more efficient than on open shelves; however, bin management is essential in ensuring bins are configured in the best arrangement to achieve optimal efficiency. the size and configuration of bins directly affect storage capacity. the type of storage, random or dedicated, also influences capacity, efficiency, and accessibility of items. the 13,260 steel bins in the as/rs range in height from 6 to 18 inches. the most commonly used bins are the 10- and 12-inch bins; however, there is a finite number of these bin heights. unfortunately, the smallest and largest bins are rarely used due to material sizes and weight capacity; therefore, as/rs optimal capacity is unattainable and the number of materials eligible for loading is limited by the number of bins available. the library also determined that dedicated, rather than random, bin storage aided in locating specialized materials, reduced loading and retrieval errors, and enhanced accessibility by placing highly used bins in reachable locations. in the event an srm breaks down and an aisle becomes nonfunctional for retrieving bins, strategically placing the highest-use and specialized materials in bins that can be manually pulled is a proactive strategy. however, this requires dedicated bins with an accurate and known inventory that has been arranged in accessible locations.

lessons learned

disasters & security

in 1994, the as/rs proved to be a much more stable and secure environment than the open stacks when it successfully endured a magnitude 6.9 earthquake. the reshelving of more than 300,000 items required a crew of more than thirty personnel over a year to complete. many items were destroyed from the impact of falling to the floor and being buried underneath hundreds of other items.
the as/rs, in contrast, consisted of over 800,000 items and successfully sustained the brunt of the earthquake's impact with no damage to any of the stored items. unfortunately, the materials that had been loaded into the as/rs in 1991 were low-use items that were viewed as one step from weeding. therefore, high-use items stored on open shelves were damaged and required the long process of recovery and reconstruction: identifying and cataloging damaged and undamaged materials, disposal of those damaged, renovation of the area, and purchase of new items. the low-use items stored in the as/rs, by contrast, required only that a few bins that had slightly shifted be pushed back fully into their slots. as/rs items have proven to be more secure from misplacement, theft, and physical damage from earthquakes as compared to items on open shelves.

maintenance, support, and modernization

the csun oviatt library has received two major updates to the as/rs since it was installed in 1991. in 2011, the as/rs received updates for communication and positioning components. the second major update occurred in two phases between 2016 and 2018 and focused on software and equipment. in phase one, server and client-side software was updated from the original software created in 1989. in phase two, half the srms received new motors, drives, and controllers. due to the many years of reliance on preventive maintenance (pm) visits and avoidance of modernization, our vendors were unable to provide support for the as/rs software and had difficulty locating equipment that had become obsolete. preventive maintenance visits were used to maintain the status quo and are not a long-term strategy for maintaining a large investment and critical component of business operations. creaghe and davis note that, "current industrial facility managers report that with a proper aaf [automated access facility] maintenance program, it is realistic to expect the system to be up 95-98 percent of the time."22 pm service is essential for long-term as/rs success; however, preventive maintenance alone is incapable of modernization and of ensuring equipment and software do not become obsolete. maintenance is not the same as support; rather, maintenance is an aspect of support. support includes points of contact who are available for troubleshooting, spare supplies on hand for quick repairs, a life-cycle strategy for major components, and long-term planning and budgeting. kirsch attested the following, describing eastern michigan university's strategy: "although the dean is proud and excited about this technology, he acknowledges that just like any computerized technology, when it's down, it's down. to avoid system problems, emu bought a twenty-year supply of major spare parts and employs the equivalent of one-and-a-half full-time workers to care for its automated storage and retrieval system."23 a system that relies solely on preventive maintenance will quickly become obsolete and require large and expensive projects in the future if the system is to continue functioning. further, modernization provides an avenue for new features and functions to be realized that increase functionality and efficiency.

networking

the csun oviatt library on average receives three to four visits a year, along with multiple emails and phone conversations, from different libraries requesting information regarding the as/rs. these conversations aid the library by presenting the as/rs from different perspectives and force the library to review current practices.
the library has learned through speaking with many different libraries that the needs, design, and configuration of an as/rs can be as unique as the libraries inquiring. the csun oviatt library, for example, is much different from the three other csu system libraries that have an as/rs. because our system was outdated, it has been difficult to form or establish meaningful groups or share information, since the systems are all different from each other. as more conversations occur and systems become more modern and standard, there is potential for knowledge sharing as well as group lobbying efforts for features and pricing.

buy-in

user confidence in any system is required in order for that system to be successful. convincing a user base to accept moving materials from readily available open shelves into steel bins housed within inaccessible 40-foot-high aisles will be difficult if the system is consistently down. therefore, the better the as/rs is managed and supported, the more reliable and dependable the system will be, and the more likely user confidence will grow. informing stakeholders of long-term planning and welcoming feedback demonstrates that the system is being supported and managed with an ongoing strategy that is part of future library operations. similarly, administrators need confirmation that large investments and mission-critical services are stable, reliable, and efficient. creating a new line item in the budget for as/rs support and equipment life-cycle management requires justification along with a firm understanding of the system. in addition, staffing and organizational responsibilities must also be reviewed in order to establish an environment that is successful and efficient. continuous assessments of the as/rs regarding downtime, projects involved, services and efficiencies provided, etc., aid in providing an illustration of the importance and impact of the system on library operations as a whole.

recording usage and statistics

unfortunately, usage statistics were not recorded for the as/rs prior to june 2017. therefore, data is unavailable to analyze previous system usage, maintenance, downtime, or project involvement. data-driven decisions require the collection of statistics for system analysis and assessment. following the server software and hardware updates, efforts have been taken to record project statistics, inventory audits, and srm faults, as well as public and internal paging requests.

conclusion

the as/rs remains, as heinrich & willis described it, "a time-tested innovation."24 through lessons learned and objective assessment, the library is positioning the as/rs to be a critical component of future development and strategy. by expanding the role of the as/rs to include functions beyond low-use storage, the library discovered efficiencies in material security, customer service, inventory accountability, and strategic planning. the csun oviatt library has learned, experienced, and adjusted its perception, treatment, and usage of the as/rs over the past thirty years. factors such as access to the area, staffing, and inventory auditing are easily overlooked, while other potential functions such as material security and customer service may not be identified without ongoing analysis and assessment. critical review, without a limited or biased perception, has enabled the library to realize the greater functionality the as/rs is able to provide.
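the per-item cost comparison cited in the inventory section above can be made concrete with a minimal worked sketch; it simply applies the unit-cost method described in note 20 (annual student budget divided by the number of items covered), using the budget and item figures given there.

```python
# minimal sketch of the unit-cost method described in note 20:
# annual student budget for managing a collection divided by the number of items it covers.
def per_item_annual_cost(annual_budget_usd: float, item_count: int) -> float:
    return annual_budget_usd / item_count

open_shelves = per_item_annual_cost(31_500, 400_000)  # about $0.079 per item per year
asrs_storage = per_item_annual_cost(18_000, 900_000)  # about $0.02 per item per year
print(f"open shelves: ${open_shelves:.3f} per item, as/rs: ${asrs_storage:.3f} per item")
```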
notes

1 shira atkinson and kirsten lee, "design and implementation of a study room reservation system: lessons from a pilot program using google calendar," college & research libraries 79, no. 7 (2018): 916-30, https://doi.org/10.5860/crl.79.7.916.

2 helen heinrich and eric willis, "automated storage and retrieval system: a time-tested innovation," library management 35, no. 6/7 (august 5, 2014): 444-53, https://doi.org/10.1108/lm-09-2013-0086.

3 atkinson and lee, "design and implementation of a study room reservation system," 916-30.

4 "about csun," california state university, northridge, february 2, 2019, https://www.csun.edu/about-csun.

5 "colleges," california state university, northridge, may 8, 2019, https://www.csun.edu/academic-affairs/colleges.

6 estimated as/rs capacity was calculated by determining the average size and weight of an item for each size of bin along with the most common bin layout. the average item was then used to determine how many could be stored along the width and length (and, if appropriate, height) of the bin and then multiplied. many factors affect the overall capacity, including: bin layout (with or without dividers), stored item type (book, box, records, etc.), weight of the items, and operator determination of full, partial, or empty bin designation. the as/rs mini-loaders have a weight limit of 450 pounds including the weight of the bin.

7 "automated storage and retrieval system (as/rs)," csun oviatt library, https://library.csun.edu/about/asrs.

8 "automated storage and retrieval system (as/rs)," csun oviatt library, https://library.csun.edu/about/asrs.

9 heinrich and willis, "automated storage and retrieval system," 444-53.

10 norma s. creaghe and douglas a. davis, "hard copy in transition: an automated storage and retrieval facility for low-use library materials," college & research libraries 47, no. 5 (september 1986): 495-99, https://doi.org/10.5860/crl_47_05_495.

11 heinrich and willis, "automated storage and retrieval system," 444-53.

12 creaghe and davis, "hard copy in transition," 495-99.

13 linda shirato, sarah cogan, and sandra yee, "the impact of an automated storage and retrieval system on public services," reference services review 29, no. 3 (september 2001): 253-61, https://doi.org/10.1108/eum0000000006545.

14 heinrich and willis, "automated storage and retrieval system," 444-53.

15 sarah e. kirsch, "automated storage and retrieval—the next generation: how northridge's success is spurring a revolution in library storage and circulation," paper presented at the acrl 9th national conference, detroit, michigan, april 8-11, 1999, http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/pdf/kirsch99.pdf.

16 heinrich and willis, "automated storage and retrieval system," 444-53.

17 shirato, cogan, and yee, "the impact of an automated storage and retrieval system," 253-61.

18 kirsch, "automated storage and retrieval."

19 shirato, cogan, and yee, "the impact of an automated storage and retrieval system," 253-61.
20 cost of material management was calculated by removing building operational costs (lighting, hvac, carpet, accessibility/open hours, etc.) and focusing on the management of the material instead. the management of materials (or unit cost) is determined by dividing the total amount of fixed and variable costs by the total number of units; the $31,500 annual shelving student budget divided by 400,000 items equals $0.079 per material per year in open shelves; the $18,000 annual as/rs student budget divided by 900,000 items equals $0.02 per material per year in the as/rs.

21 shirato, cogan, and yee, "the impact of an automated storage and retrieval system," 253-61.

22 creaghe and davis, "hard copy in transition," 495-99.

23 kirsch, "automated storage and retrieval."

24 heinrich and willis, "automated storage and retrieval system," 444-53.

one-stop place for presenting scholarly research. staff support includes consultation in any aspect of the bailiwick project, including design issues, interface development, and training in software. staff members do not provide programming, nor do they do any work in researching or assembling sites. each faculty member is assigned an information arcade consultant at the point of submitting a bailiwick application. the consultant serves as a primary contact person for technical support, troubleshooting, basic interface design guidance, and referrals to other staff both in the libraries and on campus. at present, the level of staffing has been sufficient to accommodate this sort of assistance, which is not unlike the assistance provided to any patron who walks in the door of the information arcade. as a computing facility, the information arcade provides public access to a host of multimedia development workstations for scanning images, slides, and text, and for digitizing video and audio. at these multimedia stations, a large suite of multimedia integration software and web publishing software is made available for public use. staff at the public services desk have a strong background in multimedia development and web design and can provide some one-on-one training on a walk-in basis beyond technical support and troubleshooting. all of these hardware and software resources are available to bailiwick content providers, who can choose to do their development work in the information arcade or at their home or office. finally, since there is a close relationship between the information arcade and the university libraries web site, system administration and web server support is all handled in-house as well. there are few artificial barriers imposed by the technology, thereby permitting content providers to focus on their creative expression and scholarly work.
with only minimal reallocation of existing resources, the university of iowa libraries has been able to launch the bailiwick project and continue to develop it at a modest pace. one of the components most essential for its continued success, however, is the ability to scale up to meet the expected demand over the next several years. technical infrastructure challenges are not overwhelming as yet. an analysis still needs to be made to determine how quickly creators are developing their sites, what the implications are for network delivery of these resources, what reasonable projections there are for disk space, and who is using the resources. perhaps more importantly, though, adequate staffing will always remain a concern. some faculty wish to work more closely with library staff consultants than time allows, and the consultants would certainly find it enriching to be more intimately involved with the development of each bailiwick site. marketing of the bailiwick project has been discreet (to say the least) because of the limited staffing available. however, embedded in the collaboration inherent in bailiwicks is the potential for stronger involvement with faculty in obtaining grant funding to support the development of specific bailiwick sites.

a model for research libraries

bailiwick is a project that allows the university of iowa libraries, and specifically the information arcade, to focus on the integration of technology, multimedia, and hypertext in the context of scholarship and research. to date, most of the bailiwick sites represent disciplines in the arts, humanities, and social sciences. this matches the overall clientele of the information arcade (given its location in the university of iowa's main library), but it also reflects the fact that these disciplines have been traditionally undersupported with respect to technology. nevertheless, individual faculty in these disciplines have integrated some of the most creative applications of the technology in their everyday teaching and research, in part because of the existence of the information arcade and the groundwork laid by the libraries for the past several years. with the information arcade's visibility on campus, and with similar resources and support in the information commons, a sister facility in the hardin library for the health sciences, the university of iowa libraries are well regarded on campus as a leader in information technology, electronic publishing, and new media. thus, faculty and students alike are accustomed to turning to the libraries for innovation in technology, and the bailiwick project is a natural fit. bailiwick is now fully integrated as part of a palette of new technology services and scholarly resources included within the libraries' support of teaching, learning, and research at the university of iowa.

engelond: a model for faculty-librarian collaboration in the information age

scott walter

scott walter (walter.123@osu.edu), formerly humanities and education reference librarian, university of missouri-kansas city, now is information services librarian, ohio state university.

the question of how best to incorporate information literacy instruction into the academic curriculum has long been a leading concern of academic librarians. in
recent years, this issue has grown beyond the boundaries of professional librarianship and has become a general concern regularly addressed by classroom faculty, educational administrators, and even regional accrediting organizations and state legislatures. this essay reports on the success of a pilot program in course-integrated information literacy instruction in the field of medieval studies. the author's experience with the "engelond" project provides a model for the ways in which information literacy instruction can be effectively integrated into the academic curriculum, and for the ways in which a successful pilot program can both lead the way for further development of the general instructional program in an academic library, and serve as a springboard for future collaborative projects between classroom faculty and academic librarians.

in 1989 the chronicle of higher education reported on the proceedings of a conference on teaching and technology held near the richmond, indiana campus of earlham college.1 conference speakers identified a number of concerns for those involved in teaching and learning at the end of the twentieth century. chief among these were recent advances in information technology that threatened "to leave students adrift in a sea of information." earlham college librarian evan i. farber and his fellow speakers called upon conference attendees to develop new teaching strategies that would help students learn how to evaluate and make use of the "masses of information" now accessible to them through emergent information technologies, and to embrace a collaborative teaching model that would allow academic librarians and classroom faculty members to work together in developing instructional objectives appropriate to the information age. the concerns expressed by these faculty and administrators for the information literacy skills of their students may have still seemed unusual to the general educational community in the late 1980s, but, as behrens and breivik have demonstrated, such concerns have been a leading issue for academic librarians for more than twenty years. according to its most popular definition, information literacy may be understood as "[the ability] to recognize when information is needed and ... the ability to locate, evaluate, and use effectively the needed information."2 it has become increasingly clear over the past decade that educators at every level consider information literacy a critical educational issue in contemporary society. perhaps the most frequently cited example of concern among educational policy-makers for the information literacy skills of the student body can be found in ernest boyer's report to the carnegie foundation, college: the undergraduate experience in america (1987), in which the author concludes that "all undergraduates should be introduced to the full range of resources for learning on campus," and that students should spend "at least as much time in the library ... as they spend in classes."3 but while boyer's report may be the most familiar example of such concern, it is hardly unique. as breivik and gee have described, a small group of educational leaders have regularly expressed similar concerns over the past several decades. moreover, as bodi et al.
among others, have demonstrated, the rise in professional interest in information literacy issues among librarians in the past decade is closely related to more general concerns among the educational community, especially the desire to foster critical thinking skills among the student body. by the mid-1990s, professional organizations such as the national education association, accrediting bodies such as the middle states association of colleges and schools, and even state legislators began to incorporate information literacy competencies into proposals for educational reform at both the secondary and the post-secondary levels. the confluence over the past decade of new priorities in educational reform with rapid developments in information technology provided a perfect opportunity for academic librarians to develop and implement formal information literacy programs on their campuses, and to assume a higher profile in terms of classroom instruction. for the past two years, a pilot project has been underway at the miller nichols library of the university of missouri-kansas city that not only fosters collaborative relations between classroom faculty members and librarians, but promotes the development of higher-order information literacy skills among participating members of the student body. engelond: resources for 14th-century english studies (www.umkc.edu/lib/engelond/) incorporates traditional library instruction in information access as well as instruction in how to apply critical thinking skills to the contemporary information environment into the academic curriculum of participating courses in the field of medieval studies. our experience with the engelond project provides a model for the ways in which information literacy instruction can be effectively integrated into the academic curriculum, and for the ways in which a successful pilot program can both lead the way for further development of the general instructional program in an academic library, and serve as a springboard for future collaborative projects between classroom faculty members and librarians.

the impetus for collaboration

"most medieval web sites are dreck," or so wrote linda e. voigts, curators' professor of english at the university of missouri-kansas city, in a recent review of her participation in the engelond project for the medieval academy news. describing the impetus for the development of the project in terms of a complaint increasingly common among members of the classroom faculty, voigts provides a number of examples from recent years in which students made extensive, but inappropriate, use of web-based information resources in their academic research. in one example, voigts describes a student who made the mistake of relying heavily on what appeared to be an authoritative essay for her report on medieval medical practices. the report was actually authored by a radiologist "with little apparent knowledge of either the middle ages or of premodern medicine." "how can those of us who teach the middle ages," voigts asked, "help our students find in the morass of rubbish on the internet the relatively few pearls?
how can we foster skills for distinguishing between true pearls and those glittery paste jewels that dissolve upon close examination?"4 by the time voigts approached the miller nichols library during the fall 1997 semester for suggestions about the best ways to teach her students how to "sift the web" in their search for resources suitable for academic research in medieval studies, the issue of faculty-librarian collaboration in internet instruction was a familiar one. in a representative review of the literature, jayne and vander meer identified three "common approaches" that libraries have taken to the problem of teaching students how to apply critical thinking skills to the use of web-based information resources: (1) the development of generic evaluative criteria that may be applied to web-based information resources; (2) the inclusion of web-based information resources as simply one more material type to be evaluated during the course of one's research (i.e., adding the web to the litany of resources, popular and scholarly, print and electronic, typically addressed in a general instructional session); and (3) working with faculty to integrate critical thinking skills into an academic assignment that asks students to use or evaluate web-based information resources relevant to their coursework.5 while the engelond project focused primarily on the last of these options, our work on the project also fostered the use of the first two approaches in our broader instructional program.

engelond's landscape

the engelond web site provides access to a number of resources for participating students. these resources may be categorized as course-specific (e.g., course syllabi), information literacy-related (e.g., a set of evaluative criteria for use with web-based information resources), or multimedia (e.g., sound recordings of voigts reading excerpts from chaucer's works in middle english). all of these resources are accessible from the engelond home page (www.umkc.edu/lib/engelond/) (see figure 1).

figure 1. engelond home page
several links are also provided throughout the site to resources housed on the library's web site, including access to electronic databases and subject-specific guides to relevant resources in the print collection. although students make use of all of these resources during the course of the semester, the emphasis in this essay will be on describing the nature and use of the information literacy-related resources. as behrens and euster have noted, recent interest in information literacy instruction has been guided to a degree by concern over student ability to make effective use of new forms of information technology. this concern is addressed in the engelond project by its "internet resources" page, through which students are acquainted with the architecture of the internet and are provided with annotated references (and links) to a number of electronic resources (including web portals) that will allow them to begin their research in medieval studies. students making use of the page are introduced, for example, to a variety of the different types of information resources available through the internet, including web sites, telnet sites, news groups, and discussion lists. users are also directed to related resources on the library web site, including a guide to print resources for the study of chaucer and an annotated guide to web-based information resources generally useful for the study of literature.6 also provided on the engelond site is a discussion of evaluative criteria that students might apply to their selection of web-based information resources for academic research. designed to address voigts' initial concern about the issue of teaching students how to apply critical thinking skills to their use of the web, the "criteria" page provides a general discussion of the nature of web-based information resources, the ways in which such resources differ from traditional resources, and the kinds of questions that students must ask of any web-based resource before making use of it in their academic work. reflecting the idea that information literacy skills are best taught in connection with a specific subject matter, the "criteria" page includes references to a number of illustrative examples of web-based resources in medieval studies. this page also reflects the evolutionary nature of the engelond project, since new illustrations are added as each successive group of student users discovers different examples (both positive and negative). also included on this page is a link to the library's "quick reference guide to evaluating resources on the world wide web," a generic version of the criteria developed for use with the broader instruction program at the miller nichols library.7 while the resources described above introduce students to the information landscape in the field of medieval studies and provide them with evaluative tools tailored to subject-specific concerns in making use of web-based information resources in their academic work, the final information literacy-related resource made available through the engelond site is perhaps of the greatest interest. the "class picks" page presents the results of participating students' web site evaluation assignments.
on this page, users will find student evaluations of web-based resources in medieval studies that draw not only on the information literacy skills provided through traditional library instruction, but also on the subject-specific knowledge that students gain as part of their academic coursework. jayne and vander meer wrote that faculty-librarian collaboration in internet instruction is most effective when students are asked to draw both on generic information literacy skills and on information and evaluative criteria specific to the subject matter being addressed.8 as they concluded, "[to] benefit fully from the web's potential, students need training and guidance from librarians and faculty." incorporating discussions of site design, organization of information, and veracity of content, the web site evaluations found on the "class picks" page demonstrate that participating students have learned both from the librarian and the scholar, and have begun to consider the best ways to incorporate web-based information resources into their day-to-day academic work. in a review of "the harvard chaucer page" (http://icg.fas.harvard.edu/~chaucer/), for example, students note the general appeal of the site, but criticize it both for technical problems in its design and for editorial choices that limit its utility for academic research: the harvard chaucer is an insightful, colorful look at the author and his times, but is dappled conspicuously with misspellings, repeated phrases, sentence fragments, broken links, and unfinished pages. translations of medieval texts provided on the site are often anonymous, making it hard to tell if the translation is credible and an acceptable resource for serious research in chaucer studies. if one is interested in pursuing a topic found on the harvard chaucer, s/he is well advised to explore the site for ideas and background information, but to go elsewhere for authoritative sources ...9 in another review, this one of "the medieval feminist index" (www.haverford.edu/library/reference/mschaus/mfi/mfi.html), students provide a discussion of the scholarly authority of the site as well as a description of the results retrieved in sample searches of the index for materials relevant to the study of chaucer.10 the review concludes with further examples of issues relevant to chaucer studies that might be effectively investigated with information identified through this resource. in both reviews, students demonstrate the ability to critically evaluate a web site both for its design and for its content, and the ability to express the strengths and weaknesses of a site from the point of view of a student concerned with how to make use of a web-based information resource in his or her academic work. as a result, the reviews found on the "class picks" page not only demonstrate the successful approach to course-integrated information literacy instruction promoted through the engelond project, but also provide a useful student resource in their own right.

the collaborative approach

in her review of faculty-librarian partnerships in information literacy instruction, smalley wrote that, in the best-case scenario, "the student gains mastery in using some portion of internet resources, as well as exposure to resources intrinsically important to disciplinary pursuits.
in doing the web-based exercises, students see information seeking and evaluation as essential parts of problem solving within the field of study."11 the three information literacy-related resources found on the engelond site, "internet resources," "criteria," and "class picks," demonstrate one approach to providing course-integrated information literacy instruction in such a way that the classroom faculty member and the academic librarian can work collaboratively and productively to meet their mutual instructional goals. both the classroom faculty member and the cooperating librarian are able to meet their instructional goals using the engelond model because of the collaborative nature of the information literacy instruction provided to the participating students. students enrolled in voigts' chaucer class during the winter 1999 semester received information literacy instruction focused both on information access and critical thinking while completing successive iterations of the web site evaluation assignment required for the course. a brief overview of the collaborative teaching process should suggest ways in which the participating faculty member and librarian were able to draw successfully both on generic information literacy skills and on subject-specific knowledge while conducting course-integrated library instruction using the engelond site. participating students during the winter 1999 semester began with a general introduction to the electronic resources available through the miller nichols library at the university of missouri-kansas city (e.g., using the online catalog and databases such as the mla bibliography). students were then presented with an introduction to the problem of applying critical thinking skills to the use of web-based information resources, as described on engelond's "criteria" page. following this introductory session conducted by the cooperating librarian, the cooperating faculty member provided students with a number of illustrative examples of the inappropriate use of electronic resources for academic research in medieval studies. from the beginning, the librarian and the faculty member modeled an integrated approach to the evaluation of information resources for their students, one that drew both on generic critical thinking skills and on specific examples of how such skills might be applied to resources in their field. following this initial session (which took place during the first week of the semester), students were asked to complete an evaluation of a web site containing information they might consider using as part of their academic work. individual sites were chosen from among those accessible through the subject-specific web portals provided on the "internet resources" page. students were provided both with the library's "quick reference guide to evaluating resources on the world wide web" and with the more extensive description of web site evaluation available on the "criteria" page. students completed these initial reviews over the following week and submitted copies to both the faculty member and the librarian. in preparation for the second instructional session (which took place during the third week of the semester), the faculty member and the librarian evaluated each review twice (individually, and then together).
reviews were evaluated for the clarity of their criticism of a site, both from the point of view of information organization and design and from the point of view of the significance of the information for student research in the field. sites that seemed to merit further review by the entire class were selected from this pool of evaluations and were discussed in greater detail by the instructors. the second instructional session took the form of an extended review of the sites selected in the meeting described above. in each case, students were asked to describe their reaction to the site in question. in cases where more than one student had evaluated the same site, each student was asked to present one or two distinct points from his or her review. the instructors then presented their reactions to the site. again, the librarian and the faculty member modeled for the students an approach to the critical evaluation of information resources that drew not only on the professional expertise of the librarian, but also on the scholarly expertise of the faculty member. by the end of this session, students had been exposed to three separate critiques of the selected web sites: the student's opinion of how the information presented on the site might be used in academic research; the librarian's opinion of how effectively the information was organized and presented, and how its authority, currency, etc., might differ from that of comparable print resources; and, finally, the faculty member's opinion of the place and value of the information provided on the site in the broader scheme of the discipline. following this session, the students were assigned to groups in order to develop more detailed evaluations of the web sites discussed in class. as before, these assignments were submitted both to the faculty member and to the librarian. after further review by both instructors, the assignments were returned to the students for a third (and final) iteration, and then mounted to the "class picks" page. by the conclusion of this assignment, participating students had learned not only how to apply critical thinking skills to web-based information resources, but had begun to think about the nature of electronic information and the many forms that such information can take. the web site evaluations included on the "class picks" page demonstrate the students' ability to successfully evaluate a web-based information resource both for its design and for its content, and to suggest the academic situations in which its use might be warranted for a student of medieval literature.

evaluating engelond

during the winter 1999 semester, we attempted to evaluate the success of the information literacy instruction provided through the engelond project. while the web site evaluations produced by the students provided one obvious measure of our instructional success, we attempted to learn more about the ways in which students used the materials provided through the engelond site by polling users and by examining use patterns on the site.
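the use-pattern examination mentioned above can be illustrated with a minimal sketch; it assumes a standard combined-format web server access log, and the sample line and page paths are hypothetical examples rather than actual engelond log data.

```python
from collections import Counter
from urllib.parse import urlparse

def page_hits(log_lines):
    """tally requests per path from combined-format access log lines."""
    hits = Counter()
    for line in log_lines:
        try:
            request = line.split('"')[1]               # e.g. 'GET /lib/engelond/criteria.html HTTP/1.0'
            path = urlparse(request.split()[1]).path   # keep only the requested path
        except IndexError:
            continue                                   # skip malformed lines
        hits[path] += 1
    return hits

# hypothetical sample line; a real analysis would read the server's access log file
sample = ['1.2.3.4 - - [15/jan/1999:10:00:00 -0600] "GET /lib/engelond/criteria.html HTTP/1.0" 200 5120']
print(page_hits(sample).most_common())
```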
both of these latter measures confirmed what the instructors already suspected: students enrolled in participating courses were making heavy use of the information literacy-related resources housed on the engelond site and saw the skills fostered by those resources as a valuable complement to the disciplinary knowledge being gained in the traditional classroom. as part of a general evaluation of the instructional services provided by the library during the course of the semester, students participating in the engelond project were asked open-ended questions such as: "what features of the engelond web site did you find most useful as a student in this course?"; "how did the existence of the engelond site and the collaboration between your classroom instructor and the library enhance your learning experience in this course?"; and "what aspects of the library instruction that you received as part of this course do you believe will be useful to you in other courses or in regards to lifelong learning?" among the specific items cited most often by students as being useful to them in their academic work were two of the information literacy-related resources: "internet resources" and "class picks." likewise, information literacy skills such as familiarity with the structure of the internet and the ability to critically evaluate web-based information resources were listed by almost every student as skills that would be useful both in other academic courses and in their daily lives. moreover, two graduate students who were participants reported that their experience with engelond had led them to incorporate information literacy instruction into the undergraduate courses that they taught themselves. any conclusions about the appeal of the information literacy-related resources housed on the engelond site based on these narrative responses were reinforced by a study of the use statistics for the same period. through the first three months of the winter 1999 semester (january-march), the engelond site recorded approximately one thousand "hits" on its main page.12 in each month, the most frequently accessed pages were the three information literacy-related resources described above, with the "criteria" page regularly recording the greatest number of hits. among the other most-frequently visited pages on the site were the multimedia resource page ("audio-visual"), the "syllabi" page, and the "quick reference guide to chaucer" (housed on the library web site, but accessible through the "internet resources" page). taken in conjunction with the narrative responses provided on the evaluation form, these use statistics suggest that the information literacy resources provided through the engelond site have become a fully integrated, and greatly appreciated, feature of the academic curriculum in medieval studies in the department of english at the university of missouri-kansas city.

a model for future collaboration

the engelond project has not only been a success with students who have enrolled in participating courses, but has had a significant influence on the broader instructional program at the miller nichols library. it has served as a template for future collaborative efforts between the classroom faculty and the library in terms of integrating information technology and information literacy into the academic curriculum.
in terms of the instructional program at the miller nichols library, our experience with engelond helped lay the groundwork for the development of new instructional materials and for new instructional programs. it was through engelond, for example, that we first provided electronic access to our point-of-service guides to library materials in various subjects (e.g., the "library guide to chaucer"). as of the end of the winter 1999 semester, we have made almost all of our pathfinders available on the library web site and are now considering ways in which these might be effectively incorporated into the work being done by our faculty in developing web-based coursework. also, it was through engelond that our subject specialists started collecting and annotating web-based information resources of potential use to our students and faculty. now, subject specialists are developing "subject guides" to web-based resources in a number of fields and promoting their use among faculty members who, like voigts, are concerned about the quality of the web-based information being used by their students in their academic work. both our pathfinders and our subject guides to web-based resources are available online (www.umkc.edu/lib/instruction/guides/index.html). finally, the instructional session on the critical evaluation of web-based resources that has been the centerpiece of library instruction for the engelond project has now been adapted for inclusion in our normal round of instructional workshops. while support for such innovations in our instructional program clearly existed within the library prior to the initiation of the engelond project, the project's success has provided an important spur to the development of instructional services in the library. the commitment to collaborative instructional programming demonstrated by the engelond project has also helped pave the way for the development of the university of missouri-kansas city's new technology for learning and teaching (tlt) center. housed in the miller nichols library, the tlt center offers faculty workshops in the use of information technology and a place in which classroom faculty, subject specialists, and educational technologists may collaborate on the development of projects such as engelond. further information on the tlt center is available online (www.umkc.edu/tltc/) (see figure 2).

figure 2. tltc home page

initiating a culture of collaboration between members of the classroom faculty and academic librarians can be a difficult task (as so much of the literature has shown).
in reviewing our experience with engelond, we have benefited from the suggestions that hardesty made some years ago about the means of supporting the adoption across campus of an innovative instructional model: (1) the librarian must present information literacy instruction in such a way that it does not threaten the role of the classroom faculty member as an authority in the subject matter of the course; (2) the new approach to instructional collaboration must be adopted on a limited basis at first, rather than requiring that all instructional programs immediately adopt the new approach; and (3) the results of a successful pilot project must be "readily visible to others" on campus.13 designed as a pilot project, engelond has successfully demonstrated that classroom faculty and academic librarians can collaborate to meet their mutual instructional objectives, both in terms of information literacy instruction and in terms of academic course content. as information technology continues to gain a central place in the educational mission of the college and university, it is likely that the sphere of mutual instructional objectives between classroom faculty and academic librarians will only increase. our careful approach to raising the instructional profile of librarians on campus has been rewarded, too, both by an increasing number of faculty members seeking course-related instruction in our electronic classroom as part of the regular instructional program of the library, and by the institutional commitment of resources to the tlt center, which will become the nexus of instructional collaboration between faculty and librarians on our campus. during the 1999-2000 academic year, no fewer than three academic courses in medieval studies will make use of the engelond site. as more faculty become aware of the services provided by the tlt center, such collaborative approaches to information literacy instruction will likely become more evident across a variety of disciplines. the lessons learned over the past two years of project development will be invaluable as we move to provide course-integrated information literacy instruction to an increasing number of students in an increasingly broad variety of courses.

acknowledgments

the engelond project has benefited from the work of a number of individuals over the past two years, especially ted p. sheldon, director of libraries at the university of missouri-kansas city, and marilyn carbonell, assistant director for collection development, both of whom were instrumental in developing the plan for a pilot project in course-integrated information literacy instruction with professor voigts. the design for the engelond site was developed by john laroe, former multimedia design technologist at the miller nichols library. the original text for the site was written by voigts, laroe, and t. michael kelly, former humanities reference librarian at the miller nichols library. additional text and resources for the site have been developed over the past year by voigts and myself. in addition, a number of librarians and staff members in the public services division of the miller nichols library devoted time to critiquing the site and to assisting with the creation of the embedded audio files.
these contributions may not always be evident to the students who benefit from the project, but they were instrumental in our ability to successfully meet our instructional objectives during the 1998-99 academic year.

references and notes

1. thomas j. deloughry, "professors are urged to devise strategies to help students deal with 'information explosion' spurred by technology," chronicle of higher education 35 (march 8, 1989), a13, a15.

2. shirley j. behrens, "a conceptual analysis and historical overview of information literacy," college & research libraries 55 (july 1994): 309-22; patricia senn breivik, student learning in the information age (phoenix, ariz.: oryx pr., 1998); "final report of the american library association presidential committee on information literacy" (1989), as reproduced in breivik, student learning in the information age, 121-37 (quotation is from pp. 121-22). for another recent overview of the development of the theory and practice of information literacy at every level of american education over the past two decades, see kathleen l. spitzer and others, information literacy: essential skills for the information age (syracuse, n.y.: eric clearinghouse on information and technology, 1998).

3. ernest l. boyer, college: the undergraduate experience in america (new york: harper & row, 1987), 165; patricia senn breivik and e. gordon gee, information literacy: revolution in the library (new york: macmillan, 1989); sonia bodi, "critical thinking and bibliographic instruction: the relationship," journal of academic librarianship 14 (july 1988): 150-53; barbara b. moran, "library/classroom partnerships for the 1990s," c&rl news 51 (june 1990): 511-14; sonia bodi, "collaborating with faculty in teaching critical thinking: the role of librarians," research strategies 10 (spring 1992): 69-76; hannelore b. rader, "information literacy and the undergraduate curriculum," library trends 44 (fall 1995): 270-78; spitzer and others, information literacy; and breivik, student learning in the information age, 7-8. on the relationship between trends in educational reform favoring the development of critical thinking skills and their relationship to the place of information literacy instruction in higher education, see also joanne r. euster, "the academic library: its place and role in the institution," in academic libraries: their rationale and role in american higher education, gerard b. mccabe and ruth j. person eds. (westport: greenwood pr., 1995), 7; craig gibson, "critical thinking: implications for instruction," rq 35 (fall 1995): 27-35.

4. linda ehrsam voigts, "teaching students to sift the web," medieval academy news (nov. 1998): 5.

5. elaine jayne and patricia vander meer, "the library's role in academic instructional use of the world wide web," research strategies 15 (1997): 125. see also topsy n. smalley, "partnering with faculty to interweave internet instruction into college coursework," reference services review 26 (summer 1998): 19-27.

6. behrens, "a conceptual analysis and historical overview of information literacy," 312; euster, "the academic library," 6; scott walter, "umkc university libraries: quick reference guide to chaucer," accessed sept. 24, 1999, www.umkc.edu/lib/instruction/guides/chaucer.html; scott walter, "umkc university libraries: subject guide to literature," accessed sept. 24, 1999, www.umkc.edu/lib/instruction/guides/literature.html. all references to specific pages on the engelond site will be made to the page title, e.g., "internet resources."
because engelond has been designed in a frameset, it will be easier for interested readers to access the main page at the url provided in the text and then make use of the navigational buttons provided there.

7. scott walter, "umkc university libraries: quick reference guide to evaluating resources on the world wide web," accessed sept. 24, 1999, www.umkc.edu/lib/instruction/guides/webeval.html.

8. jayne and vander meer, "the library's role in academic instructional use of the world wide web," 125.

9. laura arruda and others, review of "the harvard chaucer page," accessed sept. 24, 1999, www.umkc.edu/lib/engelond.

10. sherrida d. harris and jennifer kearney, review of "the medieval feminist index: scholarship on women, sexuality, and gender," accessed sept. 24, 1999, www.umkc.edu/lib/engelond.

11. smalley, "partnering with faculty to interweave internet instruction into college coursework," 20.

12. in january 1999 engelond received 368 hits, with the three most frequently accessed items being "criteria" (157), "internet resources" (130), and "class picks" (128). in february the total number of hits dropped to 216, with the most frequently accessed items being "criteria" (130), "audio-visual" (59), and "internet resources" and "class picks" (both with 46). in march the total number of hits was 323, with the favorite resources again being "criteria" (113), "internet resources" (74), and "class picks" (65). statistics are based on a study of the daily use logs, accessed sept. 24, 1999, www.umkc.edu/_reports/.

13. larry hardesty, "the role of the classroom faculty in bibliographic instruction," in teaching librarians to teach: on-the-job training for bibliographic instruction librarians, alice f. clark and kay f. jones eds. (metuchen: scarecrow pr., 1986), 171-72.

digitization of libraries, archives, and museums in russia

heesop kim and nadezhda maltceva

information technology and libraries | december 2022
https://doi.org/10.6017/ital.v41i4.13783

heesop kim (heesop@knu.ac.kr) is professor, kyungpook national university. nadezhda maltceva (nadyamaltceva7@gmail.com) is graduate student, kyungpook national university. © 2022.

abstract

this paper discusses the digitization of cultural heritage in russian libraries, archives, and museums. in order to achieve the research goals, both quantitative and qualitative research methodologies were adopted: the current status of legislative principles related to digitization was analyzed through a literature review, and the circumstances of the latest digitization projects were analyzed through literature and website review. the results showed that these institutions seem quite successful in that they provide a wide range of services for users to access the digital collections. however, the main constraints on digitization within libraries, archives, and museums in russia are connected with the scale of the work, dispersal of rare books throughout the country, and low level of document usage.

introduction

culture is one of the most important aspects of human activity. libraries, archives, and museums (lams) in the russian federation store some of the richest cultural and historical heritage collections, some of which can be classified as world cultural treasures. as is true with other countries, lams in russia are grappling with the challenges of digitizing their unique cultural treasures.
in this regard, these repositories are implementing digital technologies to carry out the digitization, preservation, indexing, search, and accessibility of cultural heritage more effectively and efficiently. information technologies can be used to preserve national knowledge and experience.1 the digitization of cultural heritage is one of the changes that has occurred at the present stage of the global information society. researchers have made many attempts to define the concept of digital culture, which is considered to be a phenomenon that manifests itself through art, creativity, and self-realization, by implementing information technologies.2 the need for digitization of unique cultural heritage has caused the rapid development of digital libraries, archives, and museums, described collectively as digital lams, the multidisciplinary institutions that change the way people retrieve and access information. researchers and specialists involved in the digitization of information resources in lams work together to preserve the cultural heritage of the russian federation using modern information technologies. as pronina noted, the digitization of cultural heritage began to develop actively in many countries, including russia, around the same time.3 many researchers have analyzed digitization issues in russia. for example, lopatina and neretin discussed the modernization of the system of cultural information resources and the history of preserving digital cultural heritage in russia.4 astakhova pointed out the problem of the digitization of cultural heritage and the transformation of art objects into 3d models.5 miroshnichenko et al. discussed the problem of organizing digital documents in the state archives and pointed out the issues of providing digitized archival documents for wide access through open electronic resources.6 despite a long history of improvements in digitization policies and programs, issues still exist in the major cultural repositories, and russia's level and scope of digitization research are still lagging behind many european countries.7 therefore, three primary research questions guide this study:

1. what is the policy to regulate the digitization of cultural heritage in russia?
2. what is the status of the digitization of cultural heritage in russia?
3. what are the constraints related to digitization in russia?

in addition, there is not enough research that fully reflects the current activities of digitization practices in lams in russia. by analyzing this matter, the authors hope to present the state of cultural heritage digitization in russia and uncover problems and limitations in this field.

benefits of digitization in a cultural heritage repository

before answering the key research questions, it is worth exploring the ultimate benefits of digitization in cultural heritage repositories. digitization refers to converting an analogue source into a digital version.8 a large proportion of the collections in cultural heritage repositories comprise not only materials that are born digital, but also many resources that were not originally created in digital form and have since been digitized. digitization involves three major stages.9 the first stage is related to preparing objects for digitization and the actual process of digitizing them.
the second stage is concerned with the processing required to make the materials easily accessible to users. this involves a number of editorial and processing activities including cataloguing, indexing, compression, and storage, as well as applying appropriate standards for text and multimedia file formats to meet the needs of online digital lams. the third stage includes the preservation and maintenance of the digitized collections and the services built upon them.10 the benefits of digitization are improved access and preservation. items, once digitized, can be used by many people from different places simultaneously at any point in time. unlike printed or analogue collections, digitized collections are not damaged by heavy and frequent usage, which helps in the preservation of information. according to ifla's guidelines, several benefits come from having digitized materials. organizations digitize

1. to increase access, where there is high demand from users and the library or archive has the desire to improve access to a specific collection;
2. to improve services to an expanding user group by providing enhanced access to the institution's resources with respect to education and lifelong learning;
3. to reduce the handling and use of fragile or heavily used original material and create a backup copy for endangered material such as brittle books or documents;
4. to give the institution opportunities for the development of its technical infrastructure and staff skill capacity;
5. to develop collaborative resources, sharing partnerships with other institutions to create virtual collections and increase worldwide access;
6. to seek partnerships with other institutions to capitalize on the economic advantages of a shared approach; and
7. to take advantage of financial opportunities, for example the likelihood of securing funding to implement a program, or of a particular project being able to generate significant income.11

while digitization has benefits, there are also some problems. the most obvious one is related to the quality of the digitized objects. in the course of digitizing, we may lose some important aspects of the original document. another problem is related to access management. proper mechanisms need to be put in place to determine the authenticity of materials, as well as to control unauthorized access and use. the success of digitization projects depends not only on technology but also on project planning. since digitization is a relatively new process, institutions may concentrate on technology before deciding on a project's purpose. however, technology should never drive digitization projects; instead, user needs should be determined first, and only then should a technology appropriate to those needs be selected to meet a project's objectives.
the best practices for planning a digitization project can be summarized as follows: determine the copyright status of the materials; identify the intended audience of the materials; determine whether it is technically feasible to capture the information; insist on the highest quality of technical work that the institution can afford; factor in costs and capabilities for long-term maintenance of the digitized images; cultivate a high level of staff involvement; write a project plan, budget, timeline, and other planning documents; budget time for staff training; and plan a workflow based upon the results of scanning and cataloging a representative sample of material.12

policies regulating digitization of cultural heritage in russia

policy development should take place early, at the time of selection, to guide both the selection of materials and the management of digital objects. this policy should formulate the goals of the digitization project, identify materials, set selection criteria, define the means of access to digitized collections, set standards for image and metadata capture and for preservation of the original materials, and state the institutional commitment to the long-term preservation of digital content.13 as stated by russian law, the cultural heritage of the peoples of the russian federation includes material and spiritual values created in the past, as well as monuments and historical and cultural territories and objects significant for the preservation and development of the identity of the russian federation and all its peoples, and their contribution to world civilization.14 the decree of the president of the russian federation "on approval of the fundamentals of state cultural policy" extended the definition of cultural heritage by including documents, books, photos, art objects, and other cultural treasures that represent the knowledge and ideas of people throughout the centuries. the government emphasized the role of the information environment and modern technologies by addressing them at the legislative level. in the presidential decree "on approval of the fundamentals of state cultural policy," the concept of the information environment is separately distinguished, defined as a set of mass media, radio, and television broadcasting, and the internet; the textual and visual information materials disseminated through them; as well as the creation of digital archives, libraries, and digitized museum collections.15 another important part of the government policy is to provide open access to cultural heritage objects. the problem of access was confirmed in the state program culture of russia (2012–2018), which stipulated the need to provide access to cultural heritage in digital forms as well as to create and support resources that provide access to cultural heritage objects on the internet and in the national electronic library, one of the main digital repositories in the country.16 access to digital cultural heritage was also considered in the state program information society (2011–2020). the subprogram information environment ensured equal access to the media environment, including objects of digital cultural heritage.
the program aimed to reduce the gap in access to cultural heritage objects in different regions across the russian federation.17 the digitization of cultural heritage and the creation of digital archives is one of the characteristics of innovative changes in the cultural sphere of the information society. the law "on archival affairs" notes that a significant part of the information resources of the archives has historical and cultural value and should be considered as part of the digital cultural heritage collection, the digitization of which is required.18 with regard to libraries, on january 22, 2020, the state duma of the russian federation adopted the draft law "on amendments to the federal law on librarianship" in terms of improving the procedure for state registration of rare books (rare books are defined as handwritten books or printed publications that have an outstanding spiritual or material value; have a special historical, scientific, cultural significance; and for which a special regime for accounting, storage, and use has been established), which aimed at ensuring legal protection of rare books by improving the system of protection of the items of the national library. the law reflects the criteria for classifying valuable documents as rare books and fixes the main stages of their registration. in the case of museums, a federal law from 1996 aimed to establish the national catalog of the russian federation museum collections. at first this national catalog was created for inventory purposes, and then it was transformed into an online database to ensure open access to russia's cultural heritage (http://kremlin.ru/events/administration/21027). annual reports "on the state of culture in the russian federation" reflect the overall situation and changes in libraries, archives, and museums. some researchers emphasized the need to develop a unified regulatory framework for cultural heritage preservation practices. particularly, shapovalova stressed that the leader in this discussion should be the government, which plays a crucial role in the legal regulation of cultural heritage policy and is responsible for the development of initiatives.19 however, lialkova and naumov criticized the fact that russian policy discusses the digitization of only a few cultural objects, but does not define the legal status of such objects and does not cover objects originally created in digital form.20 kozlova considered the issues of russian digital culture within the framework of the obligatory library copies (legal deposit) system.21 since 1994, the national library of russia has accepted electronic media according to the federal law "on the obligatory copy of documents," which established the legal deposit system; the bibliographic records of deposited electronic media are available online in the electronic catalog "russian electronic editions." acquisitions librarians use this catalog as a national bibliographic resource for adding electronic editions to their collections.
dzhigo addressed issues of digital preservation of cultural heritage and also paid attention to the federal legal deposit law.22 yumasheva dealt with the content of russian normative and methodological regulation of the process of digital copying of historical and cultural heritage in russian libraries and museums.23 kruglikova considered theoretical and practical issues of legislation for the preservation and popularization of cultural heritage in the modern world.24 shapovalova suggested introducing the term "digital cultural heritage objects" at the legislative level, recognizing the concept of preserving cultural heritage, and providing virtual access to such objects on a larger scale.25 a review of the literature reveals various studies that discuss cultural heritage preservation using modern technologies. the majority of researchers identified issues in this field. digitization practices are carried out mainly by the state libraries, archives, and museums, which seek to preserve cultural heritage objects in a better methodological and legislative way, and less development is seen in smaller local lams. researchers express the value of preservation of cultural materials and the need to analyze and improve legislative procedures. to this day, the government recognizes the importance of digital preservation; however, the term "digital cultural heritage" is not mentioned and the legal status of such digitized objects is not defined. in addition, legislative documents do not cover the regulation of objects originally created in digital format. moreover, we can see a large gap between the accumulation of materials and the degree of their use, despite the fact that the government seems to support open access to digital cultural heritage objects.

digitization projects of cultural heritage in russia

to analyze the circumstances of the latest projects related to digitization, we investigated the relevant websites from may 2021 to june 2022. in this study, we chose a few representative institutions, including some national projects, based on their reputation, authority, and the scope of their collections. the data on digitization practices and current projects were collected. the list of institutions is shown in table 1. as shown in table 1, the authors selected the russian national library, national electronic library, russian state library, and presidential library as the largest and most well-known libraries in russia. among the archives chosen for the analysis, the archival fonds was selected because it unites the archives in russia in one system, and the national digital archive was selected because its main goal is to preserve and archive key russian digital resources. as for the museums, the state hermitage museum, the state russian museum, and the state museum of fine arts named after a. s. pushkin were chosen for this study because they hold the richest collections of russian cultural heritage and play a vital role in the replenishment of the national catalogue of the russian federation museum collections, the main goal of which is to unite museums across the country. by analyzing the websites of these selected libraries, archives, and museums, we can gain insight into what projects have been undertaken to preserve cultural heritage and what are the main drawbacks of this field.
however, it is true that some institutions do not share the latest information on digitized items. in the case of libraries and archives, the numbers are fairly public on the website, but it is difficult to prove exactly when the objects were digitized. moreover, not all museums share information about recently digitized objects; in those cases, quantitatively counting what is available online is the only way to gauge digitization practices. therefore, the authors used a manual method for data collection and counted the number of digitized materials available on the website. indeed, this could be one of the limitations of this work, as some institutions do not disclose the exact size of their digitized collections, the digitized copies of some institutions could not be counted manually due to the huge amount of data, and some websites may not be up to date.

table 1. institutions responsible for digitization of cultural heritage in russian lams

libraries:
■■ russian national library (http://nlr.ru/eng/ra2403/digital-library); size: 650,000 scanned copies. as of the beginning of 2019, the digital library included scanned copies of books, magazines, newspapers, music publications, graphic materials, maps, audio recordings, and more. the scanned materials include items from the national library of russia and from partner libraries, publishing organizations, authors, and readers.
■■ national electronic library (https://rusneb.ru); size: 1,700,000 digitized books.26 the nel project was designed to provide internet users with access to digitized documents from russian libraries, museums, and archives. nel combines rare books and manuscripts, periodicals, and sheet music collected from all major russian libraries.
■■ russian state library (https://www.rsl.ru); size: 1,500,000 documents. this is the largest public library in russia; the digital collection contains copies of valuable and most requested publications, as well as documents originally created in electronic form. the electronic catalog contains information on more than 21 million publications, 1.5 million of which have been digitized.
■■ presidential library (https://www.prlib.ru/en); size: 1,000,000 units. the presidential library is a nationwide electronic repository of digital copies of the most important documents of the history of russia. the volume of the presidential library collections is more than a million storage units including digital copies of books and journals, archival documents, audio and video recordings, photographs, films, dissertation abstracts, and other materials.

archives:
■■ archival fonds of russia (central fonds catalog) (https://cfc.rusarchives.ru/cfc-search/); size: 959,576 archival fonds.27 annually, the volume of documents of the archival fonds of the russian federation increases by an average of 1.7 million units. as of december 13, 2020, the central fonds catalog included 959,576 items from 13 federal archives and 2,225 state and municipal archives of the russian federation.
■■ national digital archive (https://ruarxive.org); size: 282 websites.28 the purpose of this initiative is to find and preserve websites and other digital materials of high public value and at risk of destruction. the nda project collects official accounts on social networks, official websites of government bodies and political parties, and historical data. however, not many websites were collected in comparison with other countries' initiatives. unlike the internet archive, the nda project makes a complete copy of everything that is on a site, including archive channels on twitter, instagram, and telegram.

museums:
■■ national catalogue of the russian federation museum collections (https://goskatalog.ru/portal/#/); size: 23,193,078 units. the catalog is an electronic database containing basic information about each museum item and each museum collection included in the museum fonds of the russian federation. according to the latest statistics (2020), over 23 million units were recorded in the national museum catalog. however, the total amount of museum objects across russia is more than 84 million.
■■ state hermitage museum (https://www.hermitagemuseum.org); size: 400,000 units. the state hermitage museum is the second largest museum in the world. the hermitage exposition is gradually moving online; this process is slow and very laborious. the entire collection of the hermitage has not been digitized, but the website already contains 400,000 exhibits (approximately only one tenth of the entire collection). the online collection includes paintings, sculptures, numismatics, archaeological finds, and other exhibits.
■■ state russian museum (https://www.rusmuseum.ru/collections/); size: 3,682 (the number of digitized collections was manually counted on the website). this is the world's largest museum of russian art. the collection of the museum has about 400,000 exhibits and covers all historical periods of russian art. at the moment only a small part of the collection is available on the museum website in digitized form. however, the museum maintains the virtual state russian museum branch project, the main goal of which is to give free access to digital and printed materials from other institutions online.
■■ state museum of fine arts named after a. s. pushkin (https://pushkinmuseum.art); size: 334,000. as of march 1, 2019, the museum's database contained information on 670,000 museum items, 334,000 (49%) of which have images. in total there are about 683,000 images in the database (not counting special photography) with a volume of about 35 tb.

figure 1. screenshots of the websites of some of the institutions listed in table 1 (russian national library, national electronic library, russian national digital archive, state hermitage museum).

a further analysis of russian museums shows that 2,773 state and municipal museums have more than 84 million items, but only a few are displayed in digital form.
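the count reported in table 1 for the state russian museum was made by hand on the museum's website. purely as an illustration of how such a tally could be scripted rather than performed manually, the sketch below walks a hypothetical paginated collection listing and counts the item entries; the url pattern, the "collection-item" css selector, and the use of the requests and beautifulsoup libraries are assumptions of this sketch, not part of the authors' methodology.

```python
# illustrative only: tally the items listed on a hypothetical paginated
# online collection. the url pattern and the "collection-item" css class
# are assumptions for the sketch, not the markup of any actual museum site.
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://museum.example/collection?page={page}"  # hypothetical

def count_listed_items(max_pages: int = 500) -> int:
    """walk listing pages and count item entries until an empty page appears."""
    total = 0
    for page in range(1, max_pages + 1):
        resp = requests.get(BASE_URL.format(page=page), timeout=30)
        if resp.status_code != 200:
            break  # stop on missing pages or server errors
        soup = BeautifulSoup(resp.text, "html.parser")
        items = soup.select(".collection-item")  # assumed item markup
        if not items:
            break  # an empty page signals the end of the listing
        total += len(items)
    return total

if __name__ == "__main__":
    print("digitized items listed online:", count_listed_items())
```

even an automated tally of this kind would remain approximate, since a site may list only part of its digitized holdings or lag behind the actual digitization figures, which is why the counts in table 1 should be read as indicative.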
biryukova et al. reviewed the interdisciplinary approach to preserving cultural heritage and creating virtual museums.29 povroznik also analyzed virtual museums that preserve the cultural heritage of the russian federation.30 the author concluded that virtual museums and their resources need to be studied, developed, and improved further. kondratyev et al. considered the issues of digital heritage preservation from the security, integrity, and accessibility perspective, and analyzed the concept of a smart museum.31 lapteva and pikov described the experience of students of the institute for the humanities of siberian federal university working with the state russian museum and the state hermitage museum, the leading russian museums that play an important role in the country's digitization practices.32 the authors noted that implementing modern information technologies in museums creates a comfortable infrastructure for the audience by preserving and representing cultural heritage in interactive contexts.

findings

digitization in russian libraries

creating a digital collection has become a normal library activity in russia.33 within the framework of the main directions of development of activities to preserve russian library collections from 2011 to 2020, one of the main programs of the national library is the digitization of rare books. rare books, according to the federal law "on amendments to the federal law on librarianship," include handwritten books or printed publications that have outstanding material value or special historical, scientific, and/or cultural significance.34 thus, the law elevated the book to the same level of protection as other objects of cultural heritage at the national level. the website of the register of rare books (https://knpam.rusneb.ru), hosted by the russian state library, became a part of the national library collection preservation program developed in 2001. from 2001 to 2009, the subprogram rare books of the russian federation was created to provide a regulatory framework and methodological support for all areas of library activities related to the preservation of library collections. this program includes not only libraries but also other institutions such as museums, archives, and scientific and educational institutions. however, in order to implement the state registration of rare books, it is necessary to further develop regulatory documents that can control the reference procedure and registration procedure of rare books. another initiative for book preservation is the federal project digital culture, designed to provide citizens with wide access to the country's unique cultural heritage. it was expected that ten to twenty libraries from different russian regions would take part in the digitization project, each offering at least 50 documents from their collections to the project. however, the problems of this program are related to the scale of the work, as well as the dispersal of rare books throughout the country. as the 2011–2020 library preservation report emphasizes, many of these rare books remain unknown to the wider scholarly community. approximately half of the valuable collections available in the country's repositories are not described as integral objects of cultural and historical heritage.
the russian state library noted that the main problems associated with rare books include the comprehensive work needed to identify and record rare and valuable books; ensuring the safety and security of the books; the special equipment required to copy valuable materials; and the need for proper storage as the most important condition for their preservation. another main center of digitization is the digital library of the national library of russia (nlr, https://nlr.ru). the digital library is an open and accessible information resource that includes over 650,000 digitized copies of books, magazines, newspapers, music publications, graphic materials, maps, plans, atlases, and audio recordings. the digitized materials include items from the holdings of the national library of russia, partner libraries, publishing organizations, authors, and even readers. for now, the digital collection of the library includes various collections such as landmarks of the nlr, rare books, rossica, maps, and manuscripts. hosted by russia's national library in 2004, the national electronic library (nel, https://rusneb.ru/) was launched to create an electronic library sponsored by the russian federal ministry of culture. the nel is a service that searches the full text of scanned books and magazines that have been processed using optical character recognition and converted into text. it is stored in a digital database available through the internet and mobile applications. one of the main tasks of the nel is the integration of the libraries of the russian federation into a single information network. as of june 15, 2022, the nel collection had a total of 5 million artifacts including electronic copies of books, educational and periodical literature, dissertations and abstracts, monographs, patents, notes, and visual and cartographic publications. the russian state library became the main operator of the national electronic library project in 2014. since 2015, the national library of russia has expanded its digitization program and the site now publishes a list of publications that require digitization. readers vote for publications directly on the site by clicking the vote for digitalization button. for example, as of november 2020, a list of 1,998 publications on a variety of topics ranging from physics and mathematical literature to psychology and music was available for voting.

digitization in russian archives

archives have historical, scientific, social, and cultural significance, and they are an essential part of the process of preserving russian cultural heritage. digitization projects in russia began as an element of the digital cataloging of the largest archives from the 1980s to the 1990s. initially, the main purpose of the digitization project was to create digital copies to ensure the preservation of original archive documents and to avoid circulating rare originals, or originals in poor condition, in the reading room. since then, digitization has become an integral part of creating digital archives in russia.35 currently, one of the main goals of digitizing archival documents is to provide legal entities and individuals with open access to archival documents of the russian federation. the main archival center is the archives fond of the russian federation (http://archives.ru/af.shtml).
the archives fond has more than 609 million items from the early eleventh century to the present and performs important functions to preserve historical memory, replenish information resources, and provide access to the public. the main task of digitization is to preserve russia's cultural and historical heritage. each year, the total volume of archives across russia increases by an average of 1.7 million items. despite the relatively small amount of equipment for digitization, we can still see progress. in 2015, 8,750 documents were digitized, while in 2019, the annual total had reached 27,518 documents. this increase in the number of digital documents shows that digital copy production is directly related to equipment acquisition. however, the researchers found that the level of use of these documents was not high and tended to decrease. for example, in 2015, there were 18,155 document views, while in 2019, there were only 19,417 document views. therefore, it is necessary not only to promote the services of the archive agency but also to increase the demand for archive documents. a portal was created under the auspices of the archives fond of the russian federation (http://www.rusarchives.ru) to promote archival services for users and to organize all archives throughout russia. the portal collects information resources of russian archives on the internet and publishes archival directories and regulations. its establishment was an important breakthrough in organizing access to the documents of the archives fond of the russian federation. since 2012, the website has operated the central catalog software complex, which provides information on the composition of federal and regional digitized fonds. as reported by the federal archival agency, 32 virtual exhibition projects are posted on the official website and portals of the federal archival agency. this website provides information about online archive projects, including virtual exhibitions, digital collections, and inter-archive projects. users can search for materials on the website by three publication types: virtual exhibition, document collection, and inter-archive project. the projects cover four subjects: the great patriotic war, statehood of russia, the soviet era, and space exploration. the federal archival agency's main website also provides five catalogs and databases that guide users through digitized collections. this list includes the central stock catalog (http://cfc.rusarchives.ru/cfc-search), state register of unique documents (https://unikdoc.rusarchives.ru/), guides to russian archives (https://guides.rusarchives.ru/), electronic inventories of federal archives (https://rusarchives.ru/elektronnye-opisi-federalnyh-arhivov), database of declassified cases and documents of the federal archives (http://unsecret.rusarchives.ru/), and database on the places of storage of documents on personnel (http://ls.rusarchives.ru/). as of january 1, 2022, 859 documents were included in the state register of unique documents of the archival fund of the russian federation. a total of more than 98,000 documents are stored in the database.
a project to digitize documents from the soviet era is still in progress, and new collections of digitized copies of archival documents stored in federal archives across russia will be displayed on the website in the future (http://sovdoc.rusarchives.ru/#main). one of the major drawbacks of the digitization process in russia is that archival agencies and cultural heritage materials are scattered throughout the country. to develop digital archiving initiatives in different regions of russia, the culture of russia (2012–2018) program was developed. archives of the constituent entities of the russian federation can take part in this program and get funding from the regional budget to digitize collections as a part of the regional program for the development of archival affairs.36 despite some improvements and ongoing projects, there are still no initiatives for the long-term preservation of born-digital materials and no requirements for mandatory long-term preservation of information. however, the national digital archive (https://ruarxive.org) was created to find and preserve websites and other digital materials that have a high public value and are at risk of destruction. this initiative proposes the general idea of archiving modern digital heritage and consists of many projects. the main one is preserved government, which aims to preserve official materials in the following areas: official accounts on social networks; official sites of government managers, officials, and political parties; historical documents; and especially databases. future plans include developing tools that will help collect digital materials faster and more efficiently and also better systematize what has already been collected.

digitization in russian museums

the active introduction of information technology into museums began at the end of the twentieth century. a new area of study, museum informatics, has emerged in russian higher-education institutions. this area of study focuses on museum work and modern information technology to develop and improve museum activities.37 museums have developed many digitization projects to preserve their collections and give free and easy access to cultural heritage items. the modern russian museum system consists of about 2,773 museums, although the exact number of museums is not known. since the 1970s, the rationale for russian museum digitization practices has been quite similar to that of many other countries, finding that information and collection management are needed to ensure that museum objects are listed and properly preserved. the museums plan to create electronic collections, open valuable collections to the public, create a state catalog of the museum collection of the russian federation (https://goskatalog.ru/portal/#/), and integrate all works from all museums in russia. as of 2020, more than 23 million museum items are registered in the national catalog of the museum collection. the catalog is planned to be complete by 2026, when metadata and images of the museums' collections are included in the register and posted online. digitization of museum collections is an important process that has recently received stable support from the government.
the national information society (2011–2020) program includes a project to create a new virtual museum based on the collections of the country's largest national museum. the term "virtual museum" is used to characterize various projects linked to digital technology in virtual and museum space.38 it can be represented by a collection of works of art on the internet and the publication of the museum's electronic expositions. currently, there are about 300 virtual museums across the country (https://www.culture.ru/museums). the most-visited museums are the state hermitage museum in st. petersburg (https://www.hermitagemuseum.org/), the state tretyakov gallery (https://www.tretyakovgallery.ru), and the state russian museum (http://en.rusmuseum.ru). these museums offer users a wide range of activities, including the use of modern technology. for example, since 2003 the state russian museum, which holds the world's largest collection of russian art, has implemented the russian museum: virtual branch project, opening virtual branches in museums, universities, cultural centers, and institutions of additional education around the country. thanks to computer technology and digitization, thousands of russian residents in near and far places have access to the value of russian culture, russia's historical and artistic past, and the richest collection of russian art. international business machines (ibm) collaborated with the hermitage museum to make it one of the most technologically advanced museums in the world. ibm built the state hermitage museum website in 1997, later called the "world's best online museum" by national geographic traveler.39 the hermitage has unique experience in developing digitization programs and uploading collections to websites. currently, the museum holds more than 3 million items, and the online archives presented on its website provide easy search and the possibility of creating one's own collection on the website. in 2020, the hermitage released a documentary feature film in virtual reality (vr) format, "vr—hermitage: immersion in history with konstantin khabenskiy" (https://www.khabenskiy.com/filmography-vr-hermitage-immersion-in-history-with-konstantin-khabenskiy/). visitors can tour the history of the hermitage in vr, based on the most important events from the eighteenth century to the present. the pushkin museum, the largest museum of european art in moscow, offers another example of using vr technology. the joy of museums offers virtual tours of more than 60,000 museums and historic sites around the world, including the pushkin museum (https://joyofmuseums.com/museums/russian-federation/moscow-museums/pushkin-museum/). virtual museums can display electronic versions of exhibits longer than actual museum exhibitions limited by region and time zone, and have the means to record information about past exhibits, including electronic collections of exhibits, as well as data on opening times and concepts.
for example, the website of the state tretyakov gallery contains a virtual archive of past exhibitions. therefore, the virtual museum has considerable research potential and is actively contributing to the preservation of cultural heritage. digital copies of original works of culture and art form an electronic archive of great value from two perspectives. on the one hand, they preserve rarities for future generations, give users broad access to the rarest and most historically significant artworks, and open up possibilities for research. on the other hand, they offer museums opportunities for commercial use of artifacts, additional sponsorship, and investment proposals.

conclusions and further study

the two most obvious benefits of digitization are improved access and preservation, so that libraries, archives, and museums can represent russian culture and introduce rare and unique cultural heritage artifacts to future generations. in this work, we have addressed some legislative principles and outlined major digitization projects. the general problems of digitization in russia are related to the scale of the work, the low and declining use of digitized documents, and the dispersal of rare books nationwide. in the case of libraries, one of the problems of digitization is also related to the uneven distribution of rare books throughout the country. the most important materials are concentrated in the largest federal library, and many rare books are housed in many central libraries in various parts of the russian federation. work with rare books should be planned as a long-term activity performed at different levels. in the case of archives and museums, one of the major drawbacks of digitization is the dispersal of national archives and cultural heritage materials across the country. based on this preliminary study, there are several further research topics that can enhance understanding of the digitization of cultural heritage in russia. in particular, since digitization is a complex process that requires both management and technology, future research needs to be divided into three aspects: management, technology, and content.

endnotes

1. g. a. kruglikova, "use of information technologies in preservation and popularization of cultural heritage," advances in social science, education and humanities research 437 (2020): 446–50.

2. g. m. shapovalova, "digital culture and digital heritage—doctrinal definitions in the field of culture at the stage of development of modern russian legislation. the territory of new opportunities" [in russian], the herald of vladivostok state university of economics and service 10, no. 4 (2018): 81–89.

3. l. a. pronina, "information technologies preserving cultural heritage. analytics of cultural studies," 2008, https://cyberleninka.ru/article/n/informatsionnye-tehnologii-v-sohranenii-kulturnogo-naslediya/viewer.
neretin, “preservation of digital cultural heritage in a single electronic knowledge space,” bulletin mguki 5, no. 85 (2018): 74–80. 5 y. s. astakhova, “cultural heritage in the digital age. human in digital reality: technological risks,” materials of the v international scientific and practical conference (2020): 204–6. 6 m. a. miroshnichenko, y. v. shevchenko, and r. s. ohrimenko, “preservation of the historical heritage of state archives by digitalizing archive documents” [in russian], вестник академии знаний 37, no. 2 (2020): 188–94. 7 inna kizhner et al., “accessing russian culture online: the scope of digitization in museum s across russia,” digital scholarship in the humanities 19 (2019): 350–67, https://doi.org/10.1093/llc/fqy035. 8 s. d. lee, digital imaging: a practical handbook (new york: neal-schuman publishers, inc., 2001). 9 s. tanner and b. robinson, “the higher education digitisation service (heds): access in the future, preserving the past,” serials 11 (1998): 127–31; g. a. young, “technical advisory service for images (tasi),” 2003, http://www.jiscmail.ac.uk/files/newsletter/issue3_03/; https://cyberleninka.ru/article/n/informatsionnye-tehnologii-v-sohranenii-kulturnogo-naslediya/viewer https://cyberleninka.ru/article/n/informatsionnye-tehnologii-v-sohranenii-kulturnogo-naslediya/viewer https://doi.org/10.1093/llc/fqy035 http://www.jiscmail.ac.uk/files/newsletter/issue3_03/ information technology and libraries december 2022 digitization of libraries, archives, and museums in russia | kim and malceva 15 “preservation services,” harvard library, https://preservation.library.harvard.edu/digitization. 10 g. g. chowdhury and s. chowdhury, introduction to digital libraries (london: facet publishing, 2003), https://doi.org/10.1016/b978-1-84334-599-2.50006-4. 11 j. mcilwaine et al., “guidelines for digitization projects for collections and holdings in the public domain, particularly those held by libraries and archives” (draft) (unesco, march 2002), 6– 7, https://www.ifla.org/wp-content/uploads/2019/05/assets/preservation-andconservation/publications/digitization-projects-guidelines.pdf. 12 m. note, managing image collections: a practical guide (oxford: chandos publishing, 2011). 13 mcilwaine et al., “guidelines,” 51–52. 14 fundamentals of the legislation of the russian federation on culture, http://www.consultant.ru/document/cons_doc_law_1870/068694c3b5a06683b5e5a2d480 bb399b9a7e3dcc/. 15 decree of the president of the russian federation of december 24, 2014 no. 808, on approval of the fundamentals of state cultural policy, http://kremlin.ru/acts/bank/39208. 16 v. zvereva, ‘‘state propaganda and popular culture in the russian-speaking internet,” in freedom of expression in russia’s new mediasphere, ed. mariëlle wijermars and katja lehtisaari (routledge: abingdon, oxon, 2020), 225–47, https://doi.org/10.4324/9780429437205-12. 17 s. l. yablochnikov, m. n. mahiboroda, and o. v. pochekaeva, “information aspects in the field of modern public administration and law,” in 2020 international conference on engineering management of communication and technology (emctech), 1–5; u. chimittsyrenova, “a research proposal information society: copyright (presumption of access to the digital cultural heritage),” colloquium journal, no. 11-3 (2017): 22–24. голопристанський міськрайонний центр зайнятості = голопристанский районный центр занятости. 18 g. m. shapovalova, “information society: from digital archives to digital cultural heritage,” international research journal 5, no. 47 (2016): 177–81. 
19 g. m. shapovalova, “the global information society changing the world: the copyright or the presumption of access to digital cultural heritage,” society: politics, economics, law, 2016. 20 s. b. lialkova and v. b. naumov, “the development of regulation of the protection of cultural heritage in the digital age: the experience of the european union,” информационное общество 1 (2020): 29–41. 21 e. kozlova, “russia’s digital cultural heritage in the legal deposit system,” slavic & east european information resources 12, no. 2-3 (2011): 188–91. 22 a. a. dzhigo, “preserving russia’s digital cultural heritage: acquisition of electronic documents in russian libraries and information centers,” slavic & east european information resources 14, no. 2-3 (2013): 219–23. https://preservation.library.harvard.edu/digitization https://doi.org/10.1016/b978-1-84334-599-2.50006-4 https://www.ifla.org/wp-content/uploads/2019/05/assets/preservation-and-conservation/publications/digitization-projects-guidelines.pdf https://www.ifla.org/wp-content/uploads/2019/05/assets/preservation-and-conservation/publications/digitization-projects-guidelines.pdf http://www.consultant.ru/document/cons_doc_law_1870/068694c3b5a06683b5e5a2d480bb399b9a7e3dcc/ http://www.consultant.ru/document/cons_doc_law_1870/068694c3b5a06683b5e5a2d480bb399b9a7e3dcc/ http://kremlin.ru/acts/bank/39208 https://www.worldcat.org/search?q=au=%22wijermars,%20marie%cc%88lle%22 https://doi.org/10.4324/9780429437205-12 information technology and libraries december 2022 digitization of libraries, archives, and museums in russia | kim and malceva 16 23 y. y. yumasheva, “digitizing russian cultural heritage: normative and methodical regulation,” bulletin of the ural federal university humanitarian sciences 3, no. 117 (2013): 2–7. 24 g. a. kruglikova, “use of information technologies in preservation and popularization of cultural heritage,” advances in social science, education and humanities research 437 (2020): 446–50. 25 g. m. shapovalova, “the concept of digital cultural heritage and its genesis: theoretical and legal analysis, the territory of new opportunities” [in russian], the herald of vladivostok state university of economics and service 9, no. 4 (2017): 159–68. 26 a. annenkov, “national electronic library of russia: it’s not yet on fire, but the time to save it is now [in russian], http://d-russia.ru/nacionalnaya-elektronnaya-biblioteka-rossii-eshhyone-gorela-no-spasat-uzhe-pora.html. 27 saa dictionary of archives terminology. a “fonds” is the entire body of records of an organization, family, or individual that have been created and accumulated as the result of an organic process reflecting the functions of the creator. 28 airtable, https://airtable.com/shro1hise7wgurxg5/tblhdxawiv5avtn7y. 29 m. v. biryukova et al., “interdisciplinary aspects of digital preservation of cultural heritage in russia” [in russian], european journal of science and theology 13, no. 4 (2017): 149–60. 30 n. povroznik, “virtual museums and cultural heritage: challenges and solution,” https://www.researchgate.net/profile/nadezhdapovroznik/publication/329308409_virtual_museums_and_cultural_heritage_challenges_and_ solutions/links/5c00e5dba6fdcc1b8d4aa3b7/virtual-museums-and-cultural-heritagechallenges-and-solutions.pdf. 31 d. v. kondratyev et al., “problems of preservation of digital cultural heritage in the context of information security,” history and archives (2013): 36–51. 32 m. a. lapteva and n. o. 
pikov, “visualization technology in museum: from the experience of sibfu collaboration with the museums of russia,” journal of siberian federal university humanities & social sciences 7, no. 9 (2016): 1674–81. 33 g. a. evstigneeva, the ideology of digitization of library collections on the example of the russian national public library for science and technology, library collections: problems and solutions, 2014, http://www.gpntb.ru/ntb/ntb/2014/3/ntb_3_8_2014.pdf. 34 main directions of development of activities for the preservation of library collections in the russian federation for 2011–2020, https://kp.rsl.ru/assets/files/documents/maindirections.pdf. 35 g. m. shapovalova, “the concept of digital cultural heritage,” 159–68. 36 o. a. kolchenko and e. a. bryukhanova, “the main directions of archiving informatization in the context of electronic society development,” vestnik tomskogo gosudarstvennogo universiteta—tomsk state university journal 443 (2019): 114–18. http://d-russia.ru/nacionalnaya-elektronnaya-biblioteka-rossii-eshhyo-ne-gorela-no-spasat-uzhe-pora.html http://d-russia.ru/nacionalnaya-elektronnaya-biblioteka-rossii-eshhyo-ne-gorela-no-spasat-uzhe-pora.html https://airtable.com/shro1hise7wgurxg5/tblhdxawiv5avtn7y https://www.researchgate.net/profile/nadezhda-povroznik/publication/329308409_virtual_museums_and_cultural_heritage_challenges_and_solutions/links/5c00e5dba6fdcc1b8d4aa3b7/virtual-museums-and-cultural-heritage-challenges-and-solutions.pdf https://www.researchgate.net/profile/nadezhda-povroznik/publication/329308409_virtual_museums_and_cultural_heritage_challenges_and_solutions/links/5c00e5dba6fdcc1b8d4aa3b7/virtual-museums-and-cultural-heritage-challenges-and-solutions.pdf https://www.researchgate.net/profile/nadezhda-povroznik/publication/329308409_virtual_museums_and_cultural_heritage_challenges_and_solutions/links/5c00e5dba6fdcc1b8d4aa3b7/virtual-museums-and-cultural-heritage-challenges-and-solutions.pdf https://www.researchgate.net/profile/nadezhda-povroznik/publication/329308409_virtual_museums_and_cultural_heritage_challenges_and_solutions/links/5c00e5dba6fdcc1b8d4aa3b7/virtual-museums-and-cultural-heritage-challenges-and-solutions.pdf http://www.gpntb.ru/ntb/ntb/2014/3/ntb_3_8_2014.pdf https://kp.rsl.ru/assets/files/documents/main-directions.pdf https://kp.rsl.ru/assets/files/documents/main-directions.pdf information technology and libraries december 2022 digitization of libraries, archives, and museums in russia | kim and malceva 17 37 g. p. nesgovorova, “modern information, communication and digital technologies in the preservation of cultural and scientific heritage and the development of museums: problems of intellectualization and quality of informatics systems” (2006): 153–61, https://www.iis.nsk.su/files/articles/sbor_kas_13_nesgovorova.pdf. 38 n. g. povroznik, “virtual museum: preservation and representation of historical and cultural heritage,” perm university bulletin 4, no. 31 (2015): 2013–21. 39 the preservation of culture through technology, https://www.ibm.com/ibm/history/ibm100/us/en/icons/preservation/ . 
library discovery products: discovering user expectations through failure analysis
irina trapido
information technology and libraries | september 2016 | doi:10.6017/ital.v35i2.9190
irina trapido (itrapido@stanford.edu) is electronic resources librarian at stanford university libraries, stanford, california.
abstract
as the new generation of discovery systems evolves and gains maturity, it is important to continually focus on how users interact with these tools and what areas they find problematic. this study looks at user interactions within searchworks, a discovery system developed by stanford university libraries, with an emphasis on identifying and analyzing problematic and failed searches. our findings indicate that users still experience difficulties conducting author and subject searches, could benefit from enhanced support for browsing, and expect their overall search experience to be more closely aligned with that on popular web destinations. the article also offers practical recommendations pertaining to metadata, functionality, and scope of the search system that could help address some of the most common problems encountered by the users.
introduction
in recent years, rapid modernization of online catalogs has brought library discovery to the forefront of research efforts in the library community, giving libraries an opportunity to take a fresh look at such important issues as the scope of the library catalog, metadata creation practices, and the future of library discovery in general. while there is an abundance of studies looking at various aspects of planning, implementation, use, and acceptance of these new discovery environments, surprisingly little research focuses specifically on user failure. the present study aims to address this gap by identifying and analyzing potentially problematic or failed searches. it is hoped that focusing on common error patterns will help us gain a better understanding of users' mental models, needs, and expectations that should be considered when designing discovery systems, creating metadata, and interacting with library patrons.
terminology
in this paper, we adopt a broad definition of discovery products as "tools and interfaces that a library implements to provide patrons the ability to search its collections and gain access to materials."1 these products can be further subdivided into the following categories:
• online catalogs (opacs)—patron-facing modules of an integrated library system.
• discovery layers (also referred to as "discovery interfaces" or "next-generation library catalogs")—new catalog interfaces, decoupled from the integrated library system and offering enhanced functionality, such as faceted navigation, relevance-ranked results, as well as the ability to incorporate content from institutional repositories and digital libraries.
• web-scale discovery tools, which in addition to providing all interface features and functionality of next generation catalogs, broaden the scope of discovery by systematically aggregating content from library catalogs, subscription databases, and institutional digital repositories into a central index. literature review to identify and investigate problems that end users experience in the course of their regular searching activities, we analyzed digital traces of user interactions with the system recorded in the system’s log files. this method, commonly referred to as transaction log analysis, has been a popular way of studying information-seeking in a digital environment since the first online search systems came into existence, allowing researchers to monitor system use and gain insight into the users’ search process. server logs have been used extensively to examine user interactions with web search engines, consistently showing that web searchers tend to engage in short search sessions, enter brief search statements, do not browse the results beyond the first page, and rarely resort to advanced searching.2 a similar picture has emerged from transaction log studies of library catalogs. researchers have found that library users employ the same surface strategies: queries within library discovery tools are equally short and simply constructed; 3 the majority of search sessions consist of only one or two actions.4 patrons commonly accept the system’s default search settings and rarely take advantage of a rich set of search features traditionally offered by online catalogs, such as boolean searching, index browsing, term truncation, and fielded searching.5 although advanced searching in library discovery layers is uncommon, faceted navigation, a new feature introduced into library catalogs in the mid-2000s, quickly became an integral part of the users’ search process. research has shown that facets in library discovery interfaces are used both in conjunction with text searching, as a search refinement tool, and as a way to browse the collection with no search term entered.6 a recent study that analyzed interaction patterns in a faceted library interface at the north carolina state university using log data and user experiments demonstrated that users of faceted interfaces tend to issue shorter queries, go through fewer iterations of query reformulation, and scan deeper along the result list than those who use nonfaceted search systems. the authors also concluded that facets increase search accuracy, especially for complex and open-ended tasks, and improve user satisfaction.7 information technology and libraries | september 2016 11 another traditional use of transaction logs has been to gauge the performance of library catalogs, mostly through measuring success and failure rates. while the exact percentage of failed searches varied dramatically depending on the system’s search capabilities, interface design, the size of the underlying database, and, most importantly, on the researchers’ definition of an unsuccessful search, the conclusion was the same: the incidence of failure in library opacs was extremely high.8 in addition to reporting error rates, these studies also looked at the distribution of errors by search type (title, author, or subject search) and categorized sources of searching failure. 
most researchers agreed that typing errors and misspellings accounted for a significant portion of failed searches and were common across all search types.9 subject searching, which remained the most problematic area, often failed because of a mismatch between the search terms chosen by the user and the controlled vocabulary contained in the library records, suggesting that users experienced considerable difficulties in formulating subject queries with library of congress subject headings.10 other errors reported by researchers, such as the selection of the wrong search index or the inclusion of the initial article for title searches, were also caused by users' lack of conceptual understanding of the search process and the system's functions.11 these research findings were reinforced by multiple observational studies and user interviews, which showed that patrons found library catalogs "illogical," "counter-intuitive," and "intimidating,"12 and that patrons were unwilling to learn the intricacies of catalog searching.13 instead, users expected simple, fast, and easy searching across the entire range of library collections, relevance-ranked results that exactly matched what users expected to find, and convenient and seamless transition from discovery to access.14 today's library discovery systems have come a long way: they offer one-stop search for a wide array of library resources, intuitive interfaces that require minimal training to be searched effectively, facets to help users narrow down the result set, and much more.15 but are today's patrons always successful in their searches? usability studies of next-generation catalogs and, more recently, of web-scale discovery systems have pointed to patron difficulties associated with the use of certain facets, mostly because of terminological issues and inconsistencies in the underlying metadata.16 researchers also reported that users had trouble interpreting and evaluating the results of their search;17 users were also confused as to what resources were covered by the search tool.18 our study builds on this line of research by systematically analyzing real-life problematic searches as reported by library users and recorded in transaction logs.
background
stanford university is a private, four-year or above research university offering undergraduate and graduate degrees in a wide range of disciplines to about sixteen thousand students. the study analyzed the use of searchworks, a discovery platform developed by stanford university libraries. searchworks features a single search box with a link to advanced search on every page, relevance-ranked results, faceted navigation, enhanced textual and visual content (summaries, tables of contents, book cover images, etc.), as well as "browse shelf" functionality. searchworks offers searching and browsing of catalog records and digital repository objects in a single interface; however, it does not allow article-level searching. searchworks was developed on the basis of blacklight (projectblacklight.org), an open-source application for searching and interacting with collections of digital objects.19 thanks to blacklight's flexibility and extensibility, searchworks enables discovery across an increasingly diverse range of collections (marc catalog records, archival materials, sound recordings, images, geospatial data, etc.)
and allows new features and improvements to be added continuously (e.g., https://library.stanford.edu/blogs/stanford-libraries-blog/2014/09/searchworks-30-released).
study objectives
the goal of the present study was twofold. first, we sought to determine how patrons interact with the discovery system: which features they use and with what frequency. second, this study aimed to identify and analyze problems that users encounter in their search process.
method
this study used data comprising four years of searchworks use, which was recorded in apache solr logs. the analysis was performed at the aggregate level; no attempts were made to identify individual searchers from the logs. at the preprocessing stage, we created and used a series of perl scripts to clean and parse the data and extract only those transactions where the user entered a search query and/or selected at least one facet value. page views of individual records were excluded from the analysis. the resulting output file contained the following parameters for each transaction: a time stamp, search mode used (basic or advanced), query terms, search index ("all fields," "author," "title," "subject," etc.), facets selected, and the number of results returned. the query stream was subsequently partitioned into task-based search sessions using a combination of syntactic features (word co-occurrence across multiple transactions) and temporal features (session time-outs: we used fifteen minutes of inactivity as a boundary between search sessions). the analysis was conducted over the following datasets:
dataset 1. aggregate data of approximately 6 million search transactions conducted between february 13, 2011, and december 31, 2014. we performed quantitative analysis of this set to identify general patterns of system use.
dataset 2. a sample of 5,101 search sessions containing 11,478 failed or potentially problematic interactions performed in the basic search mode and 2,719 sessions containing 3,600 advanced searches, annotated with query intent and potential cause of the problem. the searches were performed during eleven twenty-four-hour periods, representing different years, academic quarters, times of the school year (beginning of the quarter, midterms, finals, breaks), and days of the week. this dataset was analyzed to identify common sources of user failure.
dataset 3. user feedback messages submitted to searchworks between january 2011 and december 2014 through the "feedback" link, which appears on every searchworks page. while the majority of feedback messages were error and bug reports, this dataset also contained valuable information about how users employed various features of the discovery layer, what problems they encountered, and what features they felt would improve their search experience.
for the manual analysis of dataset 2, all searches within a search session were reconstructed in searchworks and, in some cases, also in external sources such as worldcat, google scholar, and google.
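the temporal part of this partitioning is straightforward to illustrate. the following is a minimal sketch (in python, although the study itself used perl scripts) of grouping a chronologically sorted transaction stream into sessions with the fifteen-minute inactivity boundary described above; the record layout and field names are hypothetical, not the actual solr log format.

```python
# minimal sketch of time-out-based session partitioning; field names are
# hypothetical and the syntactic (word co-occurrence) signal is omitted.
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=15)  # inactivity boundary used in the study

def partition_sessions(transactions):
    """group chronologically sorted transaction dicts into sessions.

    each transaction is assumed to look like:
    {"ts": "2014-03-01T10:15:02", "query": "tolstoy war and peace",
     "mode": "basic", "index": "all_fields", "facets": [], "num_results": 12}
    """
    sessions, current, last_ts = [], [], None
    for t in transactions:
        ts = datetime.fromisoformat(t["ts"])
        # start a new session whenever the gap since the last action is too long
        if last_ts is not None and ts - last_ts > SESSION_GAP:
            sessions.append(current)
            current = []
        current.append(t)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions
```

in practice, word overlap between consecutive queries would then be used, as the study describes, to split or join sessions that the simple time-out rule gets wrong.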
they were subsequently assigned to one of the following categories: known-item searches (searches for a specific resource by title, combination of title and author, a standard number such as issn or isbn, or a call number), author searches (queries for a specific person or organization responsible for or contributing to a resource), topical searches, browse searches (searches for a subset of the library collection, e.g., “rock operas,” “graphic novels,” “dvds,” etc.), invalid queries, and queries where the search intent could not be established. to identify potentially problematic transactions, the following heuristic was employed: we selected all search sessions where at least one transaction failed to retrieve any records, as well as sessions consisting predominantly of known-item or author searches, where the user repeated or reformulated the query three or more times within a five-minute time frame. we hypothesized that this search pattern could be part of the normal query formulation process for topical searches, but it could serve as an indicator of the user’s dissatisfaction with the results of the initial query for known-item and author searches. we identified seventeen distinct types of problems, which we further aggregated into the following five groups: input errors, absence of the resource from the collection, queries at the wrong level of granularity, erroneous or too restrictive use of limiters, and mismatch between the search terms entered and the library metadata. each search transaction in dataset 2 was manually reviewed and assigned to one or more of these error categories. findings usage patterns our analysis of the aggregate data suggests that keyword searching remains the primary interaction paradigm with the library discovery system, accounting for 76 percent of all searches. however, users also increasingly take advantage of facets both for browsing and refining their searches: the use of facets grew from 25 percent in 2011 to 41 percent in 2014. library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 14 although both the basic and the advanced search modes allow for “fielded” searches, where the user can specify which element of the record to search (author, title, subject, etc.), searchers rarely made use of this feature, relying mostly on the system’s defaults (the “all fields” search option in the basic search mode): users selected a specific search index in less than 25 percent of all basic searches. advanced searching was infrequent and declining (from 11 percent in 2011 to 4 percent in 2014). typically, users engaged in short sessions with a mean session length of 1.5 queries. search queries were brief: 2.9 terms per query on average. single terms made up 23 percent of queries; 26 percent had two terms, and 19 percent had three terms. error patterns the breakdown of errors by category and search mode is shown in figure 1. in the following sections, we describe and analyze different types of errors. figure 1. breakdown of errors by category and search mode input errors input errors accounted for the largest proportion of problematic searches in the basic search mode (29 percent) and for 5 percent of problems in the advanced search. 
while the majority of such errors occurred at the level of individual words (misspellings or typographical errors), entire search statements were also imprecise and erroneous (e.g., “diary of an economic hit man” instead of “confessions of an economic hit man” and “dostoevsky war and peace” instead of “tolstoy war and peace”). it is noteworthy that in 46 percent of all search sessions containing information technology and libraries | september 2016 15 problems of this type, users subsequently entered a corrected query. however, if such errors occurred in a personal name, they were almost half as likely to be corrected. absence of the item sought from the collection queries for materials that were not in the library’s collection accounted for about a quarter of all potentially problematic searches. in the advanced search modality, where the query is matched against a specific search field, such queries typically resulted in zero hits and can hardly be considered failures per se. however, in the default cross-field search, users were often faced with the problem of false hits and had to issue multiple progressively more specific queries to ascertain that the desired resource was absent from the collection. queries at the wrong level of granularity a substantial number of user queries failed because they were posed at the level of specificity not supported by the catalog. such queries accounted for the largest percentage of problematic advanced searches (63 percent), where they consisted almost exclusively of article-level searching: users either tried to locate a specific article (often by copying the entire citation or its part from external sources) or conducted highly specific topical searches more suitable for a fulltext database. in the basic search mode, the proportion of searches at the wrong granularity level was much lower, but still substantial (20 percent). in addition to searches for articles and narrowly defined subject searches, users also attempted to search for other types of more granular content, such as book chapters, individual papers in conference proceedings, poems, songs, etc. erroneous or too restrictive use of limiters another common source of failure was the selection of the wrong search index or a facet that was too restrictive to yield any results. the majority of these errors were purely mechanical: users failed to clear out search refinements from their previous search or entered query terms into the wrong search field. however, our analysis also revealed several conceptual errors, typically stemming from a misunderstanding of the meaning and purpose of certain limiters. for example, “online,” “database,” and “journal/periodical” facets were often perceived by the user as a possible route to article-level content. even seemingly straightforward limiters such as “date” caused confusion, especially when applied to serial publications: users attempted to employ this facet to drill down to the desired journal issue or article, most likely acting on the assumption that the system included article-level metadata. lack of correspondence between the users’ search terms and the library metadata a significant number of problems in this group involved searches for non-english materials. 
when performed in their english transliteration, such queries often failed because of users’ lack of library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 16 familiarity with the transliteration rules established by the library community, whereas searches in the vernacular scripts tended to produce incomplete or no results because not all bibliographic records in the database contained parallel non-roman script fields. author and title searches often failed because of the users’ tendency to enter abbreviated queries. for example, personal name searches where the user truncated the author’s first or middle name to an initial while the bibliographic records only contained this name in its full form were extremely likely to fail. abbreviations were also used in searches for journals, conference proceedings, and occasionally even for book titles (e.g., “ai: a modern approach” instead of “artificial intelligence: a modern approach”). such queries were successful only if the abbreviation used by the searcher was included in the bibliographic records as a variant title. a somewhat related problem occurred when the title of a resource contained a numeral in its spelled out form but was entered as a digit by the user. because these title variations are not always recorded as additional access points in the bibliographic records, the desired item either did not appear in the result set or was buried too deep to be discovered. topical searches within the subject index were also prone to failure, mostly because patrons were unaware that such searches require the use of precise terms from controlled vocabularies and resorted to natural language searching instead. user feedback our analysis of user feedback revealed substantial differences in how various user groups approach the search system and which areas of it they find problematic. students were often frustrated by the absence of spelling suggestions, which, as one user put it, “left the users wander [to?] in the dark” as to the cause of searching failure. this user group also found certain social features desirable: for example, one user suggested that having ratings for books would be helpful in his choice of a good programming book. by contrast, faculty and researchers were more concerned about the lack of the more advanced features, such as cross-reference searching and left-anchored browsing of the title, subject, and author indexes. however, there were several areas that both groups found problematic: students and faculty alike saw the system’s inability to assist in the selection of the correct form of the author’s name as a major barrier to effective author searching and also converged on the need for more granular access to formats of audiovisual materials. discussion scope of the discovery system the results of our analysis point to users’ lack of understanding of what is covered by the discovery layer. users are often unaware of the existence of separate specialized search interfaces for different categories of materials and assume that the library discovery layer offers google-like information technology and libraries | september 2016 17 searching across the entire range of library resource types. 
moreover, they are confused by the multiple search modalities offered by the discovery layer: one of the common misconceptions in searchworks is that the advanced search will allow the user to access additional content rather than offer a different way of searching the same catalog data. in addition to the expanded scope of the discovery tools, there is also a growing expectation of greater depth of coverage. according to our data, searching in a discovery layer occurs at several levels: the entire resource (book, journal title, music recording), its smaller integral units (book chapters, journal articles, individual musical compositions, etc.), and full text. user search strategies the search strategies employed by searchworks users are heavily influenced by their experiences with web search engines. users tend to engage in brief search sessions and use short queries, which is consistent with the general patterns of web searching. they rely on relevance ranking and are often reluctant to examine search results in any depth: if the desired item does not appear within the first few hits, users tend to rework their initial search statement (often with only a minimal change to the search terms) rather than scrolling down to the bottom of the results screen or looking beyond the first page of results. given these search patterns, it is crucial to fine-tune relevance-ranking algorithms to the extent that the most relevant results are displayed not just on the first page but are included in the first few hits. while this is typically the case for unique and specific queries, more general searches could benefit from a relevance-ranking algorithm that would leverage the popularity of a resource as measured by its circulation statistics. adding this dimension to relevance determination would help users make sense of large result sets generated by broad topical queries (e.g., “quantum mechanics,” “linear algebra,” “microeconomics”) by ranking more popular or introductory materials higher than more specialized ones. it could also provide some guidance to the user trying to choose between different editions of the same resource and improve the quality of results of author searches by ranking works created by the author before critical and biographical materials. users’ query formulation strategies are also modeled by google, where making search terms as specific as possible is often the only way to increase the precision of a search. faceted search systems, however, require a different approach: the user is expected to conduct a broad search and subsequently focus it by superimposing facets on the results. qualifying the search upfront through keywords rather than facets is not only ineffective, but may actually lead to failure. for example, a common search pattern is to add the format of a resource as a search term (e.g., “fortune magazine,” “science journal,” “gre e-book,” “nicole lopez dissertation,” “woody allen movies”), and because the format information is coded rather than spelled out in the bibliographic records, such queries either result in zero hits or produce irrelevant results. 
in a similar vein, making the query overly restrictive by including the year of publication, publisher, or edition information often causes empty retrievals because the library might not have the edition specified by the user or because the query does not match the data in the bibliographic record. thus our study lends further weight to claims that even in today's reality of sophisticated discovery environments and unmediated searching, library users can still benefit from learning the best search techniques that are specifically tailored to faceted interfaces.20
error tolerance
input errors remain one of the major sources of failure in library discovery layers. users have become increasingly reliant on error recovery features that they find elsewhere on the web, such as "did you mean . . ." suggestions, automatic spelling corrections, and helpful suggestions on how to proceed in situations where the initial search resulted in no hits. but perhaps even more crucial are error-prevention mechanisms, such as query autocomplete, which helps users avoid spelling and typographical errors and provides interactive search assistance and instant feedback during the query formulation process. our visual analysis of the logs from the most recent years revealed an interesting search pattern, where the user enters only the beginning of the search query and then increments it by one or two letters:
pr
pro
proq
proque
proques
proquest
such search patterns indicate that users expect the system to offer query expansion options and show the extent to which the query autocomplete feature (currently missing from searchworks) has become an organic part of the users' search process.
topical searching
while next-generation discovery systems represent a significant step toward enabling more sophisticated topical discovery, a number of challenges still remain. apart from mechanical errors, such as misspellings and wrong search index selections, the majority of zero-hit topical searches were caused by a mismatch between the user's query and the vocabulary in the system's index. in many cases such queries were formulated too narrowly, reflecting the users' underlying belief that the discovery layer offers full-text searching across all of the library's resources. in addition to keyword searching, libraries have traditionally offered a more sophisticated and precise way of accessing subject information in the form of library of congress subject headings (lcsh). however, our results indicate that these tools remain largely underused: users took advantage of this feature in only 21 percent of all subject searches in our sample. we also found that 95 percent of lcsh usage came from clicks on subject heading links within individual bibliographic records rather than from "subject" facets, corroborating the results of earlier studies.21 there is a whole range of measures that could help patrons leverage the power of controlled vocabulary searching. they include raising the level of patron familiarity with the lcshs, integrating cross-references for authorized subject terms, enabling more sophisticated facet-based access to subject information by allowing users to manipulate facets independently, and exposing hierarchical and associative relationships among lcshs.
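to make the cross-reference and hierarchy ideas above concrete, the following is a minimal sketch of how "see" references and broader/narrower relationships from an authority vocabulary might be used to redirect a user's natural-language term to an authorized heading and to offer expansion links. the tiny vocabulary shown is invented for illustration and is not actual lcsh data or the searchworks implementation.

```python
# sketch of cross-reference resolution and hierarchy-based expansion;
# the vocabulary below is invented for illustration, not real lcsh data.
AUTHORIZED = {
    "cookery": {"broader": ["home economics"], "narrower": ["cookery, french"]},
    "home economics": {"broader": [], "narrower": ["cookery"]},
    "cookery, french": {"broader": ["cookery"], "narrower": []},
}
# "see" references map a user's entry term to the authorized heading
SEE_REFERENCES = {"cooking": "cookery", "french cooking": "cookery, french"}

def resolve_subject(term):
    """map a user-supplied term to an authorized heading, if one is known."""
    term = term.lower().strip()
    if term in AUTHORIZED:
        return term
    return SEE_REFERENCES.get(term)

def expansion_options(term):
    """return the broader/narrower headings an interface could expose as links."""
    heading = resolve_subject(term)
    if heading is None:
        return {}
    rel = AUTHORIZED[heading]
    return {"use": heading, "broader": rel["broader"], "narrower": rel["narrower"]}

# e.g., expansion_options("french cooking")
# -> {"use": "cookery, french", "broader": ["cookery"], "narrower": []}
```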
ideally, once the user has identified a helpful controlled vocabulary term, it should be possible to expand, refine, or change the focus of a search through broader, narrower, and related terms in the lcsh's hierarchy as well as to discover various aspects of a topic through browse lists of topical subdivisions or via facets.
known-item searching
important as it is for the discovery layer to facilitate topical exploration, our data suggests that searchworks remains, first and foremost, a known-item lookup tool. while a typical searchworks user rarely has problems with known-work searches, our analysis of clusters of closely related searches has revealed several situations where users' known-item search experience could be improved. for example, when the desired resource is not in the library's collection, the user is rarely left with empty result sets because of automatic word-stemming and cross-field searching. while this is a boon for exploratory searching, it becomes a problem when the user needs to ensure that the item sought is not included in the library's collection. another common scenario arises when the query is too generic, imprecise, or simply erroneous, or when the search string entered by the user does not match the metadata in the bibliographic record, causing the most relevant resources to be pushed too far down the results list to be discoverable. providing helpful "did you mean . . ." suggestions could potentially help the user distinguish between these two scenarios. another feature that would substantially benefit the user struggling with the problem of noisy retrievals is highlighting the user's search terms in retrieved records. displaying search matches could alleviate some of the concerns over lack of transparency as to why seemingly irrelevant results are retrieved, repeatedly expressed in user feedback, as well as expedite the process of relevance assessment.
author searching
author searching remains problematic because of a convergence of factors:
a. misspellings. according to our data, typographical errors and misspellings are by far the most common problem in author searching. when such errors occur in personal names, they are much more difficult to identify than errors in the title, and in the absence of index-based spell-checking mechanisms, they often require the use of external sources to be corrected.
b. mismatch between the form and fullness of the name entered by the user and the form of the name in the bibliographic record. for example, a user's search for "d. reynolds" will retrieve records where "d" and "reynolds" appear anywhere in the record (or anywhere in the author fields, if the user opts for a more focused "author" search), but will not bring up records where the author's name is recorded as "reynolds, david."
c. lack of cross-reference searching of the lc name authority file. if the user searches for a variant name represented by a cross-reference on an authority record, she might not be directed to the authorized form of the name.
d. lack of name disambiguation, which is especially problematic when the search is for a common name. while the process of name authority control ensures the uniqueness of name headings, it does not necessarily provide information that would help users distinguish between authors.
for instance, the user often has to know the author's middle name or date of birth to choose the correct entry, as exemplified by the following choices in the "author" facet resulting from the query "david kelly":
kelly, david
kelly, david (david d.)
kelly, david (david francis)
kelly, david f.
kelly, david h.
kelly, david patrick
kelly, david st. leger
kelly, david t.
kelly, david, 1929 july 11–
kelly, david, 1929–
kelly, david, 1929–2012
kelly, david, 1938–
kelly, david, 1948–
kelly, david, 1950–
kelly, david, 1959–
e. errors and inaccuracies in the bibliographic records. given the past practice of creating undifferentiated personal-name authority records, it is not uncommon to have one name heading for different authors or contributors. conversely, situations where a single person is identified by multiple headings (largely because some records still contain obsolete or variant forms of a personal name) are also prevalent and may become a significant barrier to effective retrieval as they create multiple facet values for the same author or contributor.
f. inability to perform an exhaustive search on the author's name. a fielded "author" search will miss the records where the name does not appear in the "author" fields but appears elsewhere in the bibliographic record.
g. relevance ranking. because search terms occurring in the title have more weight than search terms in the "author" fields, works about an author are ranked higher than works of the author.
browsing
like many other next-generation discovery systems, searchworks features faceted navigation, which facilitates both general-purpose browsing and more targeted search. in searchworks, facets are displayed from the outset, providing a high-level overview of the collection and jumping-off points for further exploration. rather than having to guess the entry vocabulary, the searcher may just choose from the available facets and explore the entire collection along a specific dimension. however, findings from our manual analysis of the query stream suggest that facets as a browsing tool might not be used to their fullest potential: users often resort to keyword searching when faceted browsing would have been a more optimal strategy. there are at least two factors that contribute to this trend. the first is users' lack of awareness of this interface feature: it is common for searchworks users to issue queries such as "dissertations," "theses," and "newspapers" instead of selecting the appropriate value of the "format" facet. second, many of the facets that could be useful in the discovery process are not available as top-level browsing categories. for example, users expect more granular faceting of audiovisual resources, which would include the ability to browse by content type ("computer games," "video games") and genre ("feature films," "documentaries," "tv series," "romantic comedies"). another category of resources commonly accessed by browsing is theses and dissertations. users frequently try to browse dissertations by field or discipline (issuing searches such as "linguistics thesis," "dissertations aeronautics," "phd thesis economics," "biophysics thesis"), by program or department and by the level of study (undergraduate, master's, doctoral), and could benefit from a set of facets dedicated to these categories.
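browse entry points of this kind are typically little more than faceted queries with no search terms. since searchworks runs on solr (its logs, per the method section, are apache solr logs), a dissertations browse page could be backed by a zero-query faceted request along the following lines; the facet field names and the core url are hypothetical, not searchworks's actual schema.

```python
# sketch of a zero-query faceted solr request that could back a browse view;
# field names and the solr url are hypothetical.
import json
import urllib.parse
import urllib.request

def browse_facets(solr_select_url, facet_fields, filters=None):
    """fetch facet value counts for a browse view; filters narrow the set."""
    params = [
        ("q", "*:*"),     # match everything: browsing, not keyword searching
        ("rows", "0"),    # only the facet counts are needed, not records
        ("facet", "true"),
        ("wt", "json"),
    ]
    params += [("facet.field", f) for f in facet_fields]
    params += [("fq", f) for f in (filters or [])]
    url = solr_select_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["facet_counts"]["facet_fields"]

# e.g., a dissertations browse page might request department and level facets,
# restricted to the thesis format:
# browse_facets("http://localhost:8983/solr/catalog/select",
#               ["department_facet", "thesis_level_facet"],
#               filters=['format:"thesis/dissertation"'])
```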
browsing for books could be enhanced by additional faceting related to intellectual content, such as genre and literary form (e.g., “fantasy,” “graphic novels,” “autobiography,” “poetry”) and audience (e.g., “children’s books”). users also want to be able to browse for specific subsets of materials on the basis of their location (e.g., permanent reserves at the engineering library). browsing for new acquisitions with the option of limiting to a specific topic is also a highly desirable feature. library discovery products: discovering user expectations through failure analysis |irina trapido |doi:10.6017/ital.v35i2.9190 22 while some browsing categories are common across all types of resources, others only apply to specific types of materials (e.g., music, cartographic/geospatial materials, audiovisual resources, etc.). for example, there is a strong demand among music searchers for systematic browsing by specific musical instruments and their combinations. ideally, the system should offer both an optimal set of initial browse options and intuitive context-specific ways to progressively limit or expand the search. offering such browsing tools may require improvements in system design as well as significant data remediation and enhancement because much of the metadata that could be used to create these browsing categories is often scattered across multiple fixed and variable fields in the bibliographic records, inconsistently recorded, or not present at all. one of the hallmarks of modern discovery tools has been their increased focus on developing tools that would facilitate serendipitous browsing. searchworks was one of the pioneers to offer virtual “browse shelf” feature, which is aimed at emulating browsing the shelves in a physical library. however, because this functionality relies on the classification number, it does not allow browsing of many other important groups of materials, such as multimedia resources, rare books, or archival resources. call-number proximity is only one of the many dimensions that could be leveraged to create more opportunities for serendipitous discoveries. other methods of associating related content might include recommendations based on subject similarity, authorship, keyword associations, forward and backward citations, and use. implications for practice addressing the issues that we identified would involve improvements in several areas: • scope. our findings indicate that library users increasingly perceive the discovery interface as a portal to all of the library’s resources. meeting this need goes far beyond offering the ability to search multiple content sources from a single search box: it is just as important to help users make sense of the results of their search and to provide easy and convenient ways to access the resources that they have discovered. and whatever the scope of the library discovery layer is, it needs to be communicated to the user with maximum clarity. • functionality. users expect a robust and fault-tolerant search system with a rich suite of search-assistance features, such as index-based alternative spelling suggestions, result screens displaying keywords in context, and query auto-completion mechanisms. 
these features, many of which have become deeply embedded into user search processes elsewhere on the web, could prevent or alleviate a substantial number of issues related to problematic user queries (misspellings, typographical errors, imprecise queries, etc.), enable more efficient recovery from errors by guiding users to improved results, and facilitate discovery of foreign-language materials. equally important is the continued focus on relevance ranking algorithms, which ideally should move beyond simple keyword matching techniques toward incorporating social data as well as leveraging the semantics of the query itself and offering more intelligent and possibly more personalized results depending on the context of the search.
• metadata. the quality of the user experience in the discovery environments depends as much on the metadata as it does on the functionality of the discovery layer. thus it remains extremely important to ensure consistency, granularity, and uniformity of metadata, especially as libraries are increasingly faced with the problem of integrating heterogeneous pools of metadata into a single discovery tool.
conclusions and future directions
the analysis of the transaction log data and user feedback has helped us identify several common patterns of search failure, which in turn can reveal important assumptions and expectations that users bring to library discovery. these expectations pertain primarily to the system's functionality: in addition to simple, intuitive, and visually appealing interfaces and relevance-ranked results, users expect a sophisticated search system that would consistently produce relevant results even for incomplete, inaccurate, or erroneous queries. users also expect a more centralized, comprehensive, and inclusive search environment that would enable more in-depth discovery by offering article-level, chapter-level, and full-text searching. finally, the results of this study have underscored the continued need for a more flexible and adaptive system that would be easy to use for novices while offering advanced functionality and more control over the search process for the "power" users, a system that would provide targeted support for the different types of information behavior (known-item look-up, author searching, topical exploration, browsing) and would facilitate both general inquiry and very specialized searches (e.g., searches for music, cartographic and geospatial materials, digital collections of images, etc.). just like discovery itself, building discovery tools is a dynamic, complex, iterative process that requires intimate knowledge of ever-changing and evolving user needs and expectations. it is hoped that ongoing focus on user problems and frustrations in the new discovery environments can complement other assessment methods by identifying unmet user needs, thus helping create a more holistic and nuanced picture of users' search and discovery behaviors.
references
1. marshall breeding, "library resource discovery products: context, library perspectives, and vendor positions," library technology reports 50, no. 1 (2014): 5–58.
2. craig silverstein et al., "analysis of a very large web search engine query log," sigir forum 33, no. 1 (1999): 6–12; bernard j.
jansen, amanda spink, and tefko saracevic, "real life, real users, and real needs: a study and analysis of user queries on the web," information processing & management 36, no. 2 (2000): 207–27, http://dx.doi.org/10.1016/s0306-4573(99)00056-4; amanda spink, bernard j. jansen, and h. cenk ozmultu, "use of query reformulation and relevance feedback by excite users," internet research 10, no. 4 (2000): 317–28; amanda spink et al., "searching the web: the public and their queries," journal of the american society for information science & technology 52, no. 3 (2001): 226–34; bernard j. jansen and amanda spink, "an analysis of web searching by european alltheweb.com users," information processing & management 41, no. 2 (2005): 361–81, http://dx.doi.org/10.1016/s0306-4573(03)00067-0.
3. cory lown and bradley hemminger, "extracting user interaction information from the transaction logs of a faceted navigation opac," code4lib 7, june 26, 2009, http://journal.code4lib.org/articles/1633; eng pwey lau and dion ho-lian goh, "in search of query patterns: a case study of a university opac," information processing & management 42, no. 5 (2006): 1316–29, http://dx.doi.org/10.1016/j.ipm.2006.02.003; heather moulaison, "opac queries at a medium-sized academic library: a transaction log analysis," library resources & technical services 52, no. 4 (2008): 230–37.
4. william h. mischo et al., "user search activities within an academic library gateway: implications for web-scale discovery systems," in planning and implementing resource discovery tools in academic libraries, edited by mary pagliero popp and diane dallis, 153–73 (hershey, pa: information science reference, 2012); xi niu, tao zhang, and hsin-liang chen, "study of user search activities with two discovery tools at an academic library," international journal of human-computer interaction 30, no. 5 (2014): 422–33, http://dx.doi.org/10.1080/10447318.2013.873281.
5. eng pwey lau and dion ho-lian goh, "in search of query patterns"; niu, zhang, and chen, "study of user search activities with two discovery tools at an academic library."
6. lown and hemminger, "extracting user interaction information"; kristin antelman, emily lynema, and andrew k. pace, "toward a twenty-first century library catalog," information technology & libraries 25, no. 3 (2006): 128–39; niu, zhang, and chen, "study of user search activities with two discovery tools at an academic library."
7. xi niu and bradley hemminger, "analyzing the interaction patterns in a faceted search interface," journal of the association for information science & technology 66, no. 5 (2015): 1030–47, http://dx.doi.org/10.1002/asi.23227.
8. steven d. zink, "monitoring user search success through transaction log analysis: the wolfpac example," reference services review 19, no. 1 (1991): 49–56; deborah d. blecic et al., "using transaction log analysis to improve opac retrieval results," college & research libraries 59, no.
1 (1998): 39–50; holly yu and margo young, "the impact of web search engines on subject searching in opac," information technology & libraries 23, no. 4 (2004): 168–80; moulaison, "opac queries at a medium-sized academic library."
9. thomas peters, "when smart people fail," journal of academic librarianship 15, no. 5 (1989): 267–73; zink, "monitoring user search success through transaction log analysis"; rhonda h. hunter, "successes and failures of patrons searching the online catalog at a large academic library: a transaction log analysis," reference quarterly (spring 1991): 395–402.
10. karen antell and jie huang, "subject searching success: transaction logs, patron perceptions, and implications for library instruction," reference & user services quarterly 48, no. 1 (2008): 68–76; hunter, "successes and failures of patrons searching the online catalog at a large academic library"; peters, "when smart people fail."
11. peters, "when smart people fail"; moulaison, "opac queries at a medium-sized academic library"; blecic et al., "using transaction log analysis to improve opac retrieval results."
12. lynn silipigni connaway, debra wilcox johnson, and susan e. searing, "online catalogs from the users' perspective: the use of focus group interviews," college & research libraries 58, no. 5 (1997): 403–20, http://dx.doi.org/10.5860/crl.58.5.403.
13. karl v. fast and d. grant campbell, "'i still like google': university student perceptions of searching opacs and the web," asist proceedings 41 (2004): 138–46; eric novotny, "i don't think i click: a protocol analysis study of use of a library online catalog in the internet age," college & research libraries 65, no. 6 (2004): 525–37, http://dx.doi.org/10.5860/crl.65.6.525.
14. xi niu et al., "national study of information seeking behavior of academic researchers in the united states," journal of the american society for information science & technology 61, no. 5 (2010): 869–90, http://dx.doi.org/10.1002/asi.21307; lynn silipigni connaway, timothy j. dickey, and marie l. radford, "if it is too inconvenient i'm not going after it: convenience as a critical factor in information-seeking behaviors," library & information science research 33, no. 3 (2011): 179–90; karen calhoun, joanne cantrell, peggy gallagher, and janet hawk, online catalogs: what users and librarians want: an oclc report (dublin, oh: oclc online computer library center, 2009).
15. f. william chickering and sharon q. young, "evaluation and comparison of discovery tools: an update," information technology & libraries 33, no. 2 (2014): 5–30, http://dx.doi.org/10.6017/ital.v33i2.3471.
16. william denton and sarah j. coysh, "usability testing of vufind at an academic library," library hi tech 29, no. 2 (2011): 301–19, http://dx.doi.org/10.1108/07378831111138189; jennifer emanuel, "usability of the vufind next-generation online catalog," information technology & libraries 30, no.
1 (2011): 44–52; erin dorris cassidy et al., “student searching with ebsco discovery: a usability study,” journal of electronic resources librarianship 26, no. 1 (2014): 17–35, http://dx.doi.org/10.1080/1941126x.2014.877331. 17. sarah c. williams and anita k. foster, “promise fulfilled? an ebsco discovery service usability study,” journal of web librarianship 5, no. 3 (2011): 179–98, http://dx.doi.org/10.1080/19322909.2011.597590; rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends 61, no. 1 (2012): 186–207; andrew d. asher, lynda m. duke, and suzanne wilson, “paths of discovery: comparing the search effectiveness of ebsco discovery service, summon, google scholar, and conventional library resources,” college & research libraries 74, no. 5 (2013): 464–88. 18. jody condit fagan et al., “usability test results for a discovery tool in an academic library,” information technology & libraries 31, no. 1 (2012): 83–112; megan johnson, “usability test results for encore in an academic library,” information technology & libraries 32, no. 3 (2013): 59–85. 19. elizabeth (bess) sadler, “project blacklight: a next generation library catalog at a first generation university,” library hi tech 27, no. 1 (2009): 57–67, http://dx.doi.org/10.1108/07378830910942919; bess sadler, “stanford’s searchworks: unified discovery for collections?” in more library mashups: exploring new ways to deliver library data, edited by nicole c. engard, 247–60 (london: facet, 2015). 20. andrew d. asher, lynda m. duke, and suzanne wilson, “paths of discovery: comparing the search effectiveness of ebsco discovery service, summon, google scholar, and conventional library resources,” college & research libraries 74, no. 5 (2013): 464–88; kelly meadow and james meadow, “search query quality and web-scale discovery: a qualitative and quantitative analysis,” college & undergraduate libraries 19, no. 2–4 (2012): 163–75, http://dx.doi.org/10.1080/10691316.2012.693434. 21. sarah c. williams and anita k. foster, “promise fulfilled? an ebsco discovery service usability study,” journal of web librarianship 5, no. 3 (2011): 179–98, http://dx.doi.org/10.1080/19322909.2011.597590; kathleen bauer and alice peterson-hart, “does faceted display in a library catalog increase use of subject headings?” library hi tech 30, no. 2 (2012): 347–58, http://dx.doi.org/10.1108/07378831211240003. a computer output microfilm serials list for patron use william saffady: wayne state university, detroit, michigan. library literature generally assumes that com is better suited to staff rather than patron use applications. this paper describes a com serials holdings list intended for patron use. the application and conversion from paper to com are described.
emphasis is placed on the selection of an appropriate microformat and easily operable viewing equipment as conditions of success for patron use. as a marriage of dynamic information-handling technologies, computer output microfilm (com) is a systems tool of potentially great significance to librarians. several libraries have reported successful com applications initiated within the last few years. the two most recent - fischer's description of four com-generated reports used by the los angeles public libraries and bolef's account of a com book catalog at the washington university school of medicine library - stress the time, space, and cost savings so frequently reported in analyses of the advantages of com.1, 2 this article describes the substitution of microfilm for paper as the computer output medium in one of the most common library automation applications, a serials holdings list intended for use by library patrons. it is interesting that, at a time when librarians are insisting on the importance of patron acceptance of technological innovation, the recent literature reports com applications intended solely for staff use. bolef, in fact, lists staff rather than patron use among the characteristics of potentially successful library com applications. the report that follows suggests, however, that careful attention to the selection of an appropriate microformat and viewing equipment can successfully extend the effectiveness of com to include patron-use library automation applications. the application the union list of serials in the wayne state university libraries is a computer-generated alphabetical listing, by title, of serials held by the wayne state university library system and some biomedical libraries in the detroit metropolitan area. sullivan describes it as "informative in purpose and conventional in method."3 as with many similar applications, serials holdings were automated in order to unify and disseminate hitherto separate, local records. the list is primarily a location device, giving for each title the location within the library system and information on the holdings at each location. it is updated monthly, the july 1974 issue totalling 1,431 pages. in paper form, twenty copies produced on an ibm 1403 line printer using four-ply carbon-interleaved forms were distributed for use throughout the library system. the list shares some of the characteristics that have marked other successful com applications.4 it consists of many pages and has a sizeable distribution. quick retrieval of information is essential. use is for reference rather than reading. there is no need to annotate the list and no need for paper copies, although the latter requirement would not rule out the use of com for this particular application. patrons simply consult the list to determine whether the library's holdings include a particular serial and then proceed to the indicated location. it is interesting that serials holdings lists, long recognized as an excellent introductory library automation application, should also prove an excellent first application for com. complexities of format and viewing equipment selection aside, the conversion of output from paper to microfilm presented no problems.
since the wayne state university computing and data processing center does not have com capability, the university libraries, after careful consideration of several vendors, contracted with the mark larwood company, a microfilm service bureau equipped with a gould beta com 700l recorder. the beta com is a crt-type com recorder with an uppercase and lowercase character set, forms-overlay capability, proportional spacing, underlining, superscripts, subscripts, italics, and a universal camera capable of producing 16, 35, 70, and 105mm microformats at several reduction ratios. a decisive factor in the selection of this particular vendor was the beta com's dedicated pdp-8/l minicomputer that enables the com recorder to accept an ibm 1403 print tape, thereby greatly simplifying conversion and eliminating the expense of reprogramming. microformat selection as ballou notes, discussions of com have tended to concentrate more on the computer than on micrographics, but for a patron-use com application the selection of an appropriate microformat is of the greatest importance.5 however, there has been an unfortunate emphasis placed, both in the literature of micrographics and by vendors, on microfiche, the format now dominating the industry, especially in com applications. such emphasis ignores the fundamental rule of systems design, that form follows function. each of the microformats has strengths and weaknesses that must be analyzed with reference to the application at hand. for a patron-use, com-generated serials holdings list, ease of use with a minimum of patron film handling is a paramount consideration. microfiche is clearly unsuitable for a list of over 1,400 pages. even at 42x reduction, the patron would be forced to choose from among seven fiches, each containing 208 pages. the difficulties of handling and loading, combined with library staff involvement in a program of user instruction, make fiche an unattractive choice. instead, the relatively large size of the holdings list suggests that one of the 16mm roll formats offers the best prospects of containing present size and future growth within a single microform. the disadvantages of the conventional 16mm open spool - the necessity of threading film onto a take-up reel before viewing - can be minimized by using a magazine-type film housing. the popular cartridge format eliminates much film handling, but cartridge readers are very expensive, necessitating a considerable investment where many readers are required. even with the cartridge, it is still possible for a patron to unwind the film from the take-up reel, necessitating rethreading before viewing. fortunately, microfilm cassettes overcome this difficulty. unlike the cartridge format, 16mm cassettes feature self-contained supply and take-up reels. the film cannot be completely unwound from the take-up reel and the cassette can be removed from the viewer at any time without rewinding. patron film handling is virtually eliminated. the cassette format has proven very popular with british libraries, where it has been used with satisfactory results in com applications.6 viewing equipment success in format choice is contingent on the selection of appropriate viewing equipment. as larkworthy and brown point out, the best viewer for patron-use com applications is one that can easily be operated by the least mechanically inclined person.7 fortunately, cassette viewers, while limited in number, tend to be very easy to operate.
the viewer chosen for use with the union list of serials, the memorex 1644 autoviewer, features a simple control panel, fixed 24x reduction, easily operated focus and scan knobs, motorized film drive for high-speed searching, and a manual hand control for more precise image positioning. the screen measures eleven by fourteen inches in size, with sufficient brightness for comfortable ambient light viewing. other cassette viewers examined, however satisfactory they might be in other respects, failed to meet the peculiar requirements of this particular application. discussion since its introduction in april 1974, the com-generated union list of serials in the wayne state university libraries has enjoyed a satisfactory reception. patrons have learned to consult the com list with little difficulty. the selection of an appropriate microformat and easily operated viewing equipment have kept staff involvement in patron instruction to a minimum. there appears to be no reason for limiting potential library com applications to those used primarily or solely by staff members. given the severity of the current paper shortage, the consequent rise in paper prices, and serious questions about the availability of paper at any price, com merits serious consideration as an alternative output medium for the widest range of library automation applications. references 1. mary l. fischer, "the use of com at the los angeles public library," the journal of micrographics 6:205-10 (may 1973). 2. doris bolef, "computer-output microfilm," special libraries 65:169-75 (april 1974). 3. howard a. sullivan, "metropolitan detroit's network: wayne state university library's serials automation project," medical library association bulletin 56:269-71 (july 1968). 4. see, for example, auerbach on computer output microfilm (princeton: auerbach publishers, 1972), p.1-10. 5. hubbard w. ballou, "microform technology," in carlos cuadra, ed., annual review of information science and technology, v.8 (washington, d.c.: american society for information science, 1973), p.139. 6. d. r. g. buckle and thomas french, "the application of microform to manual and machine readable catalogues," program 6:187-203 (july 1972). 7. graham larkworthy and cyril brown, "library catalogs on microfilm," library association record 73:231-32 (dec. 1971). lita president's message: moving forward with lita bohyun kim information technology and libraries | june 2019 bohyun kim (bohyun.kim.ois@gmail.com) is lita president 2018-19 and chief technology officer & associate professor, university of rhode island libraries, kingston, ri. i am happy to share some updates on what i covered in my previous column. first of all, i am excited to report that the merger planning of lita, alcts, and llama is back on track. the merger planning had been temporarily put on hold due to the target date for the merger being delayed from fall 2019 to fall 2020, as announced earlier this year. after taking some time after the 2019 ala midwinter meeting, the current leadership of lita, alcts, and llama met, reviewed the work that we have accomplished so far, and decided that the remaining work will now go to the capable hands of the president-elects of lita, alcts, and llama, who were elected this april.
during their term, this new cohort of president-elects will build on the work done by the cross-divisional working groups, in order to present the three-division merger for the membership vote in spring 2020 with more details. another piece of good news is that lita, alcts, and llama will begin experimenting with joint programming in order to kickstart our collaboration while the merger planning continues. the lita board decided to hold the next lita forum in fall 2020. alcts is also planning for its second virtual alcts exchange to take place in spring 2020. lita, alcts, and llama will work together on both program committees of the lita forum and the alcts exchange to provide a wider and more interesting range of programs at both conferences. if the membership vote result is in favor of the three-division merger, then the new division will be officially formed in fall 2020, and the planned 2020 lita forum may become the first conference of the new division. shortly after the 2019 ala midwinter meeting, the lita board decided to commit funds to create and disseminate an online allyship training to address the issues of aggressive behavior, racism, and harassment reported at the midwinter meeting.1 since then, the lita staff and the lita board of directors have been closely working with the ala office and several other divisions, alcts, alsc, asgcla, pla, rusa, and united, reviewing options. it is likely that this training will follow the "train-the-trainer" model, in order to generate and expand the pool of allyship trainers who will develop and run lita's online allyship training for lita members. our goal is to expand our collective capacity to strengthen active and effective allyship, recognize and undo oppressive behaviors and systems, and promote the practice of cultural humility, which requires ongoing efforts, not just a one-time event. we hope to be able to announce more details soon once the final plan is determined. i would also like to highlight the lita award winners who will be celebrated at the 2019 ala annual conference in washington, d.c. and to thank the members of the award committees for their hard work.2 the 2019 lita/ex libris student writing award will go to sharon han, a master of science in library and information science candidate at the university of illinois school of information sciences, for her paper, "weathering the twitter storm: early uses of social media as a disaster response tool for public libraries during hurricane sandy," which is included in this issue. charles mcclure and john price wilkin were selected as the 2019 winners of the lita/oclc frederick g. kilgour award for research in library and information technology and the hugh c. atkinson memorial award sponsored by acrl, alcts, llama, and lita, respectively. charles mcclure is the francis eppes professor of information studies in the school of information and the director of the information use management and policy institute at florida state university. john price wilkin is the juanita j. and robert e. simpson dean of libraries at the university of illinois at urbana-champaign.
the north carolina state university libraries will receive the 2019 lita/library hi tech award for outstanding communication in library and information technology, which recognizes outstanding individuals or institutions for their long-term contributions to the education of the library and information science technology field and is sponsored by lita and emerald publishing. other not-to-be-missed lita highlights at the 2019 ala annual conference in washington d.c. include the lita top tech trends program widely known for its insightful overview of emerging technologies, the lita president’s program with meredith broussard, a data journalist and the author of artificial unintelligence: how computers misunderstand the world3 as the speaker, and the lita happy hour, a lively social gathering of all library technologists and technologyenthusiasts. the lita avram camp is also preparing for another terrific all-day discussion and activities this year for women and non-binary library technologists to examine the shared challenges, to network, and to support one another. the lita imagineering interest group has put together another fantastic program, “agency, consent, and power in science fiction and fantasy,” featuring four sci-fi authors: sarah gailey, malka older, john scalzi, and martha wells. the lita membership committee is also preparing a virtual lita kickoff orientation for those who are newly attending the ala annual conference. in this last column that i write as the lita president, i would like to express my sincere gratitude to the dedicated lita board of directors, the always fantastic lita staff, and many lita leaders and members whose creativity, passion, and energy continue to drive lita forward. serving as the chief elected officer of one of the leading membership association in library technology has been a true honor to me, and having such a great team of people to work with has been of tremendous help to me in tackling many dauting tasks. it is often said that all lita presidents face unique challenges during their terms. i can say that this has been certainly true during my term. working together with the alcts and the llama leadership on the three-division merger was a valuable experience and a privilege. while we could not move things as quickly as we hoped, we have built a great foundation for the next phase of the planning and learned many things together along the way. last but not least, i would like to thank everyone who stood for the election and congratulate all newly-elected lita officers: evviva weinraub for the president-elect, hong ma and galen charlton for board of directors at large, and jodie gambill for the lita councilor. i am confident that led by the incoming lita president, emily morton-owens, the capable and dedicated lita leadership will continue to accomplish many great things with energetic and forward-thinking lita members in coming years. the future of lita is brighter with these new lita leaders. good luck and thank you for your service! lita president’s message: moving forward with lita | kim 4 https://doi.org/10.6017/ital.v38i2.11093 endnotes 1 “lita’s statement in response to incidents at ala midwinter 2019,” lita blog, february 4, 2019, https://litablog.org/2019/02/litas-statement-in-response-to-incidents-at-ala-midwinter2019/. 2 “lita awards & scholarships,” library information technology association (lita), http://www.ala.org/lita/awards. 
3 meredith broussard, artificial unintelligence: how computers misunderstand the world (cambridge, massachusetts: the mit press, 2018). from dreamweaver to drupal: a university library website case study jesi buell and mark sandford information technology and libraries | june 2018 118 jesi buell (jbuell@colgate.edu) is instruction and design and web librarian and mark sandford (msandford@colgate.edu) is systems librarian at colgate university, hamilton, new york. abstract in 2016, colgate university libraries began converting their static html website to the drupal platform. this article outlines the process librarians used to complete this project using only in-house resources and minimal funding. for libraries and similar institutions considering the move to a content management system, this case study can provide a starting point and highlight important issues. introduction the literature available on website design and usability is predominantly focused on business or marketing websites. what separates library websites from other informational or commercial websites is the complexity of the information architecture—they contain both intricate informational and transactional functions. website managers need to maintain congruity between many interrelated but disparate tools in a singular interface and navigational system. libraries are also often challenged with finding individuals who possess the appropriate skills to build and maintain a secure, accessible, attractive, and easy-to-use website. in contrast to libraries, commercial companies employ a team of designers, developers, content managers, and specialists to triage internal and external issues. they can also spend months or years perfecting a website and, of course, all these factors have great costs associated with them. given that many commercial websites need a team of highly skilled workers with copious time and funding, how can librarians be expected to give their patrons similar experiences to sites like google? this case study will outline how a small team of librarians completely overhauled their fragmented, dreamweaver-based website to a more secure, organized, and appealing open-source platform with drupal within a tight timeline and very few financial consequences. it includes a timeline of major milestones in the appendix. goals and objectives the first necessity for restructuring the colgate university libraries’ website was building a team that had the skills and knowledge necessary to perform this task. the website overhaul was spearheaded by jesi buell, instructional design and web librarian, and mark sandford, systems librarian. buell has a user experience (ux) design and editing background while sandford has systems, cataloging, and server experience. they were advised by web development committee (wdc) members cindy li, associate director of library technology and digital initiatives, and debbie krahmer, digital learning and media librarian. together, the group understood trends in digital librarianship, the needs of the libraries’ patrons, as well as website and catalog design and mailto:jbuell@colgate.edu mailto:msandford@colgate.edu from dreamweaver to drupal | buell and sandford 119 https://doi.org/10.6017/ital.v37i2.10113 maintenance. the first thing the wdc did was outline its goals and objectives, and this documented weaknesses the group wanted to address with a new website. 
the wdc identified four main improvements colgate libraries needed to make to the website: improve design colgate libraries’ old website suffered from varied design and language use across pages and various tools (libguides, catalog, etc.). this led to an inconsistent and often frustrating user experience and detracted from the user’s sense of a single, cohesive website. the wdc also wanted to improve and update the aesthetic quality of the website. while many of these changes could have been made with an overhaul of the existing site, the wdc would have still needed to address the underlying cause. responsibility for content was decentralized, and content creation relied too heavily on technical expertise with dreamweaver. further, the ad hoc nature of the content—the product of years of “fitting in” content without a holistic approach—meant that changes to visual style could not be accomplished by changing a single css file. there were far too many exceptions to make changes simply. improve usability the wdc needed to make sure all the webpages were responsive and accessible. a restructuring of layout and information architecture (ia) was also necessary to improve findability of resources. on the old site, some content was hidden behind several layers of links. with no platform to ensure or enforce accessibility standards, website managers had to trust that all content creators were conscious of best practices or, failing that, pages had to be re-edited to improve accessibility. improve content creation and governance a common source of library staff frustration was the authoring experience using dreamweaver. there was no way to track when a webpage was changed or see who had made those changes. situations occurred where content was deleted or changed in error, and no one else knew until a patron discovered a mistake. staff could also mistakenly push out outdated versions of pages. it was not an ideal situation, and it was impossible for an individual (the web librarian) to monitor hundreds of pieces of content for daily changes to check for accuracy. the only other option would be narrow access to only those on the wdc, but that would mean everyone had to wait for the web librarian to push content live, which would also be frustrating. beyond the security and workflow issues, many of the library staff felt uncomfortable adding or editing content because dreamweaver requires some coding knowledge (html, css, javascript). therefore, the group wanted to install a content management system (cms) that provided a wysiwyg (what you see is what you get) content editor so that no coding knowledge would be needed. unite disparate sites (website, blog, and database list) under one updated url on a single secure server colgate libraries’ website functionality suffered from what marshall breeding describes as “a fragmented user experience.”1 the libraries website’s main address was http://exlibris.colgate.edu. however, different tools lived under other urls—one for a blog, another for the database list, yet another still for the mobile site librarians had to maintain information technology and libraries | june 2018 120 because the main website was not responsive. additionally, some portions of the website had been set up on other servers because of various limitations in the windows.net environment and inhouse skills. this was further complicated by the fact that most specialized interactivity or visual components had to be created from scratch by existing staff. 
the libraries’ blog was on an externally hosted wordpress site, and the database a–z list was on a custom-coded php page. a unified domain would make usage statistics easier to track and analyze. additionally, it would eliminate the need for multiple credentials for the various external sites. custom code, be it in php, .net, or any other language, also needs to be regularly updated as new security vulnerabilities arise.2 moving to a well-maintained cms would help alleviate that burden. by establishing goals and objectives, the wdc had identified that it wanted a cms to help with better governance, easier maintenance, and ways to disperse web maintenance responsibilities across library faculty. it was important to choose a cms platform that offered a wysiwyg editor so that content authoring did not require coding knowledge. additionally, the group wanted to update the site’s aesthetic and navigational designs. the wdc also decided that this was the optimal time to introduce a discovery layer (since all these changes would be one entirely new experience for colgate users) rather than smaller, continual changes that would require users to keep readjusting how they used the website. the backend complexity of updating both the website platform and implementing a discovery layer required abundant and detailed planning. however, while there was a lot of overlap in the preparatory work for implementing the discovery layer as well the cms, this article will focus primarily on the cms. planning after the wdc had detailed goals and objectives, and the proposal to update the libraries’ website platform was accepted by library faculty, the group had to take several steps to plan the implementation. the first steps in planning dealt with analysis. content analysis the web librarian conducted a content analysis of the existing website. using microsoft excel to document the pages and the omni group’s omnigraffle to organize the spreadsheet into a diagram, she cataloged each page and the navigation that connected that page to other pages. this can be extremely laborious but was necessary because some content was inherited from past employees over the course of a decade, and no one knew exactly what content was live on the website. this visual representation allowed for content creators to see redundancy in both content and navigation. it also made it easy for them to identify old content and combine or reorder pages. needs analysis the wdc wanted to make sure it considered more than the content creators’ needs. this group surveyed colgate faculty, staff, and students to learn what they would like to see improved or changed. the web librarian conducted several ux studies with both students and faculty, and this elucidated several key areas in need of improvement. from dreamweaver to drupal | buell and sandford 121 https://doi.org/10.6017/ital.v37i2.10113 peer analysis peer analysis involves thoroughly investigating peer institution’s websites to analyze how they organize both their content and their site navigation. it also gives insight into what other services and tools they provide. it is important to choose institutions similar in size and academic focus. colgate university is a small, liberal arts institution that only serves an undergraduate population, so the libraries would not seek to emulate a large university that serves graduate populations or distance learners. 
peer analysis is an excellent opportunity to see where a website is not measuring up to other websites as well as to borrow ideas from peers to customize for your specific patrons. evaluating platforms now that the group knew what the libraries had and what the libraries wanted from our web presence, it was time to evaluate the available options. this involved evaluating cms products and discovery layer platforms. the wdc researched different cmss and listed positives and negatives. ultimately, the group determined that drupal best satisfied the majority of colgate’s identified needs. a separate committee was formed to evaluate the major discovery-layer services with the understanding that any option could be integrated into the main website as a search box. budgeting as free, open-source software, drupal does not require a subscription or licensing fee. campus it provided a virtual server for the website at no cost to the libraries. budgeting was organized by the associate director of library technology and digital initiatives and the university librarian. money was set aside in case a consultant or developer was needed, but the web and systems librarians were able to execute the conversion from dreamweaver to drupal without external support. if future development support is needed for specific projects, it can be budgeted for and purchased as needed. the last step was creating a timeline defining achievable goals, ownership (who oversees completing the goal and who needs to be involved with the work), and date of completion. timeline the timeline was outlined as follows: october 2015–january 2016 halfway through the fall 2015 semester, the wdc began to create a proposal for changes to be made to the website. this proposal would be submitted to the university librarian for consideration by december 1. in the meantime, the web librarian completed a content inventory, peer analysis, and ux studies. she also gathered faculty and staff feedback on the current website through suggestion-box commentary, one-on-one interviews, online questionnaires, and anecdotal stories. by the deadline for the proposal, this additional information was condensed and presented to the university librarian. after incorporating suggested changes made by the university librarian, the wdc was able to present both the proposal and results from various studies to the library faculty on january 4, information technology and libraries | june 2018 122 2016. at the end of the meeting, the faculty voted to move forward and adopt the proposed changes. february 2016 february was spent meeting with stakeholders, both internal and external to the libraries, to gather concerns, necessary content, and ideas for improvements. the wdc members shared the responsibility of running these meetings. all members from the following departments were interviewed: research and instruction, borrowing services, acquisitions, library administration, cataloging, government documents, information literacy, special collections and university archives, and the science library. together, the wdc also met with members from it and communications. it was vital that these sessions identify several components. first, what content was important to retain on the new site, and why? the act of justification made stakeholders evaluate whether the information was necessary and useful to the libraries’ users. the wdc also asked the stakeholders to identify changes they wanted to see made to the website. 
the answers ranged from minor aesthetic tweaks to major navigational overhauls. last, it was important to understand how specific changes might impact workflows and functionality for tools outside colgate libraries’ own website. for example, the wdc had to update information with the communications department so that the libraries’ website would be findable on the university’s app. all the answers the wdc received were compiled into a report, and the web librarian used this information to inform design decisions moving forward. march 2016 while the associate director of library technology and digital initiatives coordinated demos from discovery layer vendors, the wdc also met to choose the final template from three options designed by the web librarian. the web and systems librarians also met to create a list of developers in case assistance was needed in the development of the drupal site. the wdc team researched potential developers and inquired about their pricing. the web librarian began to create wireframe templates of the different types of pages and page components (homepage, hours blocks, blogs, forms, etc.). she also began transferring existing content from the old website to the new website. this process, in addition to the development of new content identified by stakeholders, was to be completed by mid-summer. meanwhile, the systems librarian began to consolidate the external sites under drupal to the extent possible. while libguides lives externally to drupal and maintains its own url that the libraries’ website links out to, he was able to bring the database a–z list, blog, and analytics into the drupal platform. this entailed setting up new content types in drupal to accommodate various functional requirements for the a–z list and assist in creating pages to search for and display database information. from dreamweaver to drupal | buell and sandford 123 https://doi.org/10.6017/ital.v37i2.10113 april–may 2016 drupal allows for various models of permissions and authentication. by default, accounts can be created within the drupal system and roles and permissions assigned to individuals as needed. the ldap (lightweight directory access protocol) module allowed us to tie authentication to university accounts and includes the ability to tie drupal permissions to active directory roles and groups. connecting drupal to the university ldap server required the assistance of it infrastructure staff but was straightforward. it staff provided the connection information for the drupal module’s configuration and created a resource account for the drupal module to use to connect to the ldap service. as currently implemented, the ldap module simply verifies credentials and, if a local drupal account does not exist, creates one for the user. permissions for staff are added to accounts after account creation as needed as a part of the onboarding process. permissions in drupal can be highly granular. since one of the goals of the migration to drupal was to simplify maintenance of the website, the wdc decided to begin with a relatively simple, permissive approach. currently, all library staff can edit any page. because of drupal’s ability to track and revert changes easily, undoing a problematic edit is a simple procedure, and because all changes are tied to an individual login, problems can be addressed through training as needed. the wdc discussed a more fragmented approach that tied editing privileges to specific parts of the site but decided against it. 
the wdc team felt it was better to begin with the presumption of trustworthiness, expecting staff to only make changes to pages they were personally responsible for. additionally, trying to divide the site into logical pieces, then accounting for the inevitable exceptions, would be complicated and time-consuming. the wdc reserved the right to begin restricting permissions in the future, but thus far this has proven unnecessary. july–august 2016 as the libraries ramped up to the official launch, it was crucial to educate the library faculty and staff so they could become independent back-end content creators. both the web and systems librarians held multiple training sessions for the libraries employees so that everyone felt comfortable both editing and generating content. the associate director of library technology and digital initiatives drafted a campus-wide email announcing the new website and discovery layer at this point. it was sent out a month in advance of the official launch. the new website launched in two parts. the soft launch occurred on august 1, 2016. the web and systems librarians set up a link to the new website on the old site so that users could choose between getting acclimated to the new website or using the tool they were used to in the frantic weeks leading up to the beginning of the semester. august 15, 2016, was the official launch. at this point, the http://exlibris.colgate.edu dreamweaver-based website was retired, and it redirected all traffic heading to the old url to the new drupal-based website at http://cul.colgate.edu. because drupal’s url structure and information architecture differed from the old website, the wdc decided that mapping every page on the old site to the new one would be too time consuming. while it was acknowledged that this may cause some disruption (as it would break existing links), it seemed necessary for keeping the project moving forward. library staff updated all external links possible. the google search operator “inurl” allowed us to identify other sites information technology and libraries | june 2018 124 outside the libraries’ control that pointed to the old website. the wdc reached out to the maintainers of those few sites as appropriate. the biggest risk the libraries took by not redirecting all urls to the correct content was the potential to disrupt faculty who had bookmarked content or had direct urls in course materials. however, the wdc team received very few complaints about the new site, and most users agreed that the improvements to the site far outweighed any temporary inconveniences caused by it. if nothing else, the simplified architecture made finding content easier, so direct links and bookmarks became far less important than they once were. implementation and future steps by strictly following the timeline and working closely together, the web librarian and systems librarian were able to launch colgate libraries’ new website in time for the 2016 fall semester. the wdc team was able to pull off this feat within eight months without spending any extra money. the timeline above only gives a high-level view of the steps the wdc took to accomplish this task. the librarians who worked on this project cannot overemphasize the complexity of this endeavor, especially with a small team. however, a website conversion is feasible with organization, time, and with the online support the drupal community provides (especially the community of libraries on the drupal platform). 
it is also critical to have in-house personnel that have technical (coding and server-side) knowledge, project management knowledge, and information architecture and design knowledge. the response from incoming and returning students and faculty to the updated look and improved usability of the libraries’ digital content was overwhelmingly positive. following best design practices, in january 2017 more ux testing was conducted with student and teaching faculty participants to gauge their reactions to the new website. 3 users overwhelmingly found the new website to be both more aesthetically pleasing and usable than the old website. on the back end, the libraries’ content is now more secure, responsive, and accessible because the libraries are using a cms. library faculty and staff have been able to add or remove content that they are responsible for, but the website can still maintain a consistent look and feel across all pages. governance has been improved exponentially as library staff have been able to easily and quickly contribute to the website’s content without administrative delays. as the team moves forward, the wdc plans to investigate different advanced drupal tools, implementing an intranet, and better leveraging google analytics. as with all library endeavors, improvement requires continued effort and attention. from dreamweaver to drupal | buell and sandford 125 https://doi.org/10.6017/ital.v37i2.10113 appendix: detailed timeline 1. october 2015 a. began discussion with wdc to create proposal for website changes (web librarian) 2. november–december 2015 a. complete content inventory (web librarian) b. complete peer analysis (web librarian) c. complete ux studies (web librarian) d. gather faculty and staff feedback on current website (web librarian) 3. december 1, 2015 a. submit proposal to change from dreamweaver to drupal to university librarian for consideration and approval (web librarian) 4. january 4, 2016 a. submit revised proposal to library faculty for consideration and approval (web librarian) 5. january 2016 a. set up test drupal site (systems librarian) 6. february 2016 a. complete meetings with departments to gather feedback on concerns, content, and ideas for improvements (library department meetings were split among wdc members) 7. march 2016 a. demo primo, ex libris, and summon for library faculty and staff consideration (associate director of library technology and digital initiatives) b. from three options, choose template for our website (web librarian—approval by the wdc and then the library faculty) c. create list of developers in case we need assistance (web librarian and systems librarian) d. create wireframe templates for homepage (web librarian) e. begin transferring content from old website to new website and create new content with other stakeholders—to be completed by mid-summer (web librarian) f. begin consolidating multifarious external sites under drupal as much as possible (systems librarian) 8. april 2016 a. get drupal working with the ldap (systems librarian) b. agree on permissions and roles for back-end users (systems librarian—with approval by wdc) c. agree on discovery layer choice (associate director of library technology and digital initiatives) d. meet with outside stakeholders—communications, it, administration 9. may 2016 a. integrate discovery layer search (systems librarian) 10. july 2016 a. 
provide training for library faculty and staff as back-end content creators (web librarian) b. prepare campus-wide email to announce new website and discovery layer with our new url (associate director of library technology and digital initiatives and web librarian) 11. august 1, 2016 a. set up a link on our old site (http://exlibris.colgate.edu) so for two weeks users could choose between using the old interface or start getting acclimated to the new website before the fall semester started (systems librarian) 12. august 15, 2016 a. official launch: we retire our http://exlibris.colgate.edu dreamweaver-based website and redirect all traffic headed to our old url to our new drupal-based website at http://cul.colgate.edu (systems librarian) 13. september–october 2016 a. update and get approval from library faculty for a new web style guide and governance guide (web librarian) 14. january 2017 a. conduct ux studies of students and faculty to see how people are using both the new website and the new discovery layer; gather feedback and ideas for improvement (web librarian) bibliography breeding, marshall. "smarter libraries through technology: strategies for creating a unified web presence." smart libraries newsletter 36, no. 11 (november 2016): 1–2. general onefile (accessed august 3, 2017). http://go.galegroup.com/ps/i.do?p=itof&sw=w&v=2.1&it=r&id=gale%7ca471553487. naudi, tamara. "nearly all websites have serious security vulnerabilities--new research shows." database and network journal 45, no. 4 (2015): 25. general onefile (accessed august 3, 2017). http://bi.galegroup.com/essentials/article/gale%7ca427422281. raward, roslyn. "academic library website design principles: development of a checklist." australian academic & research libraries 32, no. 2 (2001): 123–36. http://dx.doi.org/10.1080/00048623.2001.10755151. 1 marshall breeding, "smarter libraries through technology: strategies for creating a unified web presence," smart libraries newsletter 36, no. 11 (november 2016): 1–2. general onefile. 2 tamara naudi, "nearly all websites have serious security vulnerabilities--new research shows," database and network journal 45, no. 4 (2015): 25. general onefile. 3 roslyn raward, "academic library website design principles: development of a checklist," australian academic & research libraries 32, no. 2 (2001): 123–36. http://dx.doi.org/10.1080/00048623.2001.10755151. emergency remote library instruction and tech tools: a matter of equity during a pandemic kathia ibacache, amanda rybin koob, and eric vance information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12751 abstract during spring 2020, emergency remote teaching became the norm for hundreds of higher education institutions in the united states due to the covid-19 pandemic. librarians were suddenly tasked with moving in-person services and resources online.
for librarians with instruction responsibilities, this online mandate meant deciding between synchronous and asynchronous sessions, learning new technologies and tools for active learning, and vetting these same tools for security issues and ada compliance. in an effort to understand our shared and unique experiences with emergency remote teaching, the authors surveyed 202 academic instruction librarians in order to answer the following questions: (1) what technology tools are academic librarians using to deliver content and engage student participation in emergency remote library sessions during covid-19? (2) what do instruction librarians perceive as the strengths and weaknesses of these tools? (3) what digital literacy gaps are instruction librarians identifying right now that may prevent access to equitable information literacy instruction online? this study will deliver and discuss findings from the survey as well as make recommendations toward best practices for utilizing technology tools and assessing them for equity and student engagement. kathia salomé ibacache oliva (kathia.ibacache@colorado.edu) is romance languages librarian, assistant professor, university of colorado boulder. amanda rybin koob (amanda.rybinkoob@colorado.edu) is literature and humanities librarian, assistant professor, university of colorado boulder. eric vance (eric.vance@colorado.edu) is associate professor of applied mathematics and director of lisa (laboratory for interdisciplinary statistical analysis), university of colorado boulder. © 2021. introduction the worldwide covid-19 pandemic has had important repercussions for university libraries. all library services, including information literacy instruction, moved online in a matter of days, creating a wave of needs that required immediate response. with the closure of university campuses all around the world, academic libraries encountered an unprecedented test of their adaptation abilities. although online education has been around for many years, widespread use of the remote classroom may have been unprecedented for many librarians until the spring of 2020. this type of online learning, as charles hodges et al. explain, is significantly different from the otherwise established domains of online and distance learning because it is unplanned, rushed, and happening in the midst of a crisis.1 as they note, "emergency remote teaching has emerged as a common alternative term" to differentiate from standard online education prior to the pandemic.2 the authors recognize the different and sometimes overlapping personal and professional impacts covid-19 has had on our communities, both inside and outside of the classroom. rather than broadly assessing emergency remote teaching, the authors are looking at what jody greene, referring to teaching during the covid-19 pandemic, calls "specific technological tools and flexible teaching practices."3 this paper is concerned with issues of equity, student engagement, and technology tools that could be used to facilitate library instruction during emergency remote teaching. the authors seek to answer the following questions: (1) what technology tools are academic librarians using to deliver content and engage student participation in emergency remote library sessions during covid-19?
(2) what do instruction librarians perceive as the strengths and weaknesses of these tools? (3) what digital literacy gaps are instruction librarians identifying since covid-19 that may prevent equitable access to information literacy instruction online? literature review technology tools facilitated a quick transition online in march 2020, enabling librarians to interact with students despite the move to emergency remote teaching. however, this fast transition and its associated learning curve accentuated issues of student engagement including equity and accessibility. there is a dearth of existing literature on teaching and learning online during times of great societal stress, with some notable exceptions, including a recent piece about university closures and moving to online classes during student-led protests in south africa from 2015 to 2017.4 as such, this literature review considers some of the barriers that contribute to inequitable information access in online learning, as well as digital literacy definitions. here we consider both ongoing challenges to equitable online access and specific challenges for the current covid-19 pandemic. barriers to equitable student access in online learning equity in academic libraries is widely represented in the scholarship through topics including disability, race, class, and salary gaps among librarians.5 however, as our ongoing pandemic illustrates, there is a strong need for more literature regarding students’ equitable online access to information during times that call for emergency remote teaching. the issue of equity may be considered in terms of external and internal challenges, which affect students differently. external barriers include low bandwidth and lack of devices. some researchers advise letting students communicate through chat instead of a webcam, since webcam use increases bandwidth consumption.6 understandably, colleges may need to provide computers and wireless hotspots to students who lack access to computers or to the internet.7 moreover, a 2018 pew fact tank publication noted that 15 percent of homes with school-age students (6–17 years old) do not have access to high-speed connection, and this digital divide particularly affects teens and their ability to be involved with homework.8 although this data focused on school-age students, these issues probably affected some college students during the pandemic. students may also be experiencing internal barriers such as language differences, lack of self regulation, lack of previous educational experience, and stress, all of which may affect academic performance. for example, one study found that language barriers challenged international students during remote web conferences with librarians.9 another study of international students showed that their academic success relied significantly on a variety of internal characteristics, such as self-regulation.10 additionally, a survey of students taking online courses showed that previous educational experience, including with online learning or within a given discipline, supported completion of those courses.11 moreover, stress is an internal barrier for students that may have external causes and is likely affecting librarians, faculty, and students during covid -19. 
scholars note that stress changes peoples’ use of technology, and this stress manifests differently depending on individual identity markers, such as gender and experience.12 information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 3 technology tools, digital literacy, student engagement in addition to barriers to equitable access, the digital age that has characterized the late 20th and 21st centuries, has prompted the advent of multiple technology tools that may be used in online library sessions, including emergency remote library instruction. these tools are meant to facilitate instruction and engagement, but they require students and instructors to be comfortable with technology. in the case of higher education, this level of comfort involves digital literacy competencies that surpass what is known as traditional textual literacy. the american library association’s (ala) digital literacy task force defines digital literacy as “the ability to use information and communication technologies to find, evaluate, create, and communicate information, requiring both cognitive and technical skills.”13 during the pandemic, the technical and cognitive skills of library instructors and students may be compromised due to stress as well as individual situations and specific environments. one of the technical challenges for remote library sessions stems from the need for instructor s to use tools to achieve flexibility and hybridity. librarians steven j. bell and john shank, addressing the challenges of new technologies for librarianship, coined the term “blended librarian” in 2004 to denote a librarian who combines traditional skills with those involving knowledge of hardware and software as applied in the teaching and learning process.14 the concept of the “blended librarian” may be outdated, but it encompasses the notion that librarians are expected to be comfortable with technology. again, librarians are now facing the mandate of presenting information literacy and library resources online, navigating between and facilitating the use of multiple technology tools and formats. it is worth considering how well our tools meet this mandate. although remote learning may be more amenable to some learners than others, there is consensus on the benefits of using technology for teaching and learning even if a learning curve exists for instructors. for example, researchers examining school support for classroom technology found that teachers supported enhanced technology integration even if it surpassed their own technology skills.15 notwithstanding the benefits perceived by teachers, there are also some drawbacks in the use of technology in the classroom, especially for distance learning. digital technologies researcher jesper aagaard, reporting part of a study on “technological mediation in the classroom” refers to two processes: “outside in,” where students use educational technologies to acquire knowledge in the classroom, and “inside out,” where students use technology tools to withdraw from the classroom visiting non-related websites.16 for instruction librarians, student engagement is paramount; therefore, redirecting students who leave the digital classroom is important, though it can be difficult to know when this occurs. a number of reasons could explain why students may disengage in a distance learning setting, one of them being the lack of digital literacy. 
moreover, the belief that higher education students in the 21st century are technologically savvy may be misleading. citing mark prensky, who originated the terms “digital native” and “digital immigrant,” wan ng explains that the phrase “digital natives” describes those people born in 1980 and after whose lives have been shaped by technology.17 ng found that while the students in his study were very comfortable with technologies such as word processor software, youtube, and facebook, they were not as comfortable using technologies to create content.18 there may be a digital literacy divide between knowing and using a technology for social media and using a technology to create online content such as web pages and blogs. similarly, ng found that when presented with unfamiliar technology, students spent less time learning the new technology and instead focused on preparation of content.19 this finding may be of concern to instruction librarians who use a myriad of tools during emergency remote teaching. it is important to consider that these digital literacy divides could stem from factors not related to a student’s age group. researchers ellen johanna helsper and rebecca eynon question the notion that a person may be called a digital native if they were born after 1980. these authors state that there are variables other than generational differences that could define a person as a digital native, such as gender, education, experience, and interaction with technology.20 therefore, even when people grow up in technological environments, they may not be considered digital natives. to minimize a gap in equity, lecture design, even for one-time library sessions, offers an opportunity to think of technology tools that could increase students’ participation and prompt learning. david ellis, studying classroom resources to enhance student engagement, notes that padlet, a web 2.0 technology, supports interaction and learning.21 seyed abdollah shahrokni, reviewing another web 2.0 technology, playposit, as a video tool for language instruction, states that it “can support learning in language classrooms” if used in a lecture design that includes relevant questions.22 lecture design applies to all types of settings: in person, flipped, and distance learning. approaches should be applied consistently to help students become more digitally literate and bridge equity issues where possible. jurgen schulte et al., providing examples of “new” librarian roles in a science curriculum, note that digital literacy enables better learning.23 in the case of emergency remote teaching, instruction librarians may promote digital literacies through the use of technologies that increase students’ engagement and their “outside in” participation in the teaching and learning process. considering these challenges, the authors seek to identify the strengths and weaknesses of technology tools used by librarians and the digital literacy gaps that may prevent access to equitable library instruction. methods instrument the authors used a six-question qualtrics survey approved by the institutional review board at the university of colorado boulder. the survey was open for two weeks, between may 10 and may 24, 2020. it is worth noting that the questions were specific to this timeframe, and some responses indicated that instruction librarians were still finishing up spring semester 2020.
the survey received 202 responses. however, the number of responses to each question varied as answers were not required. the data collected were both quantitative and qualitative, reflecting respondents’ practices, perceptions, and personal knowledge. respondents answered two multiple-choice and four free-text questions. for the multiple-choice questions, participants could choose all the options that applied and enter their own choice as well. the multiple-choice questions gathered data on the technology tools that librarians used to deliver content or to engage with students during covid-19. these questions distinguished between content delivery platforms (like zoom) and technology tools used for student engagement (like padlet). the technology tools included in the multiple-choice questions were chosen based on the authors’ knowledge of their potential relevance to instruction librarians. the final four qualitative questions collected information about respondents’ perceptions of strengths and weaknesses of technology tools, as well as digital literacy gaps identified during covid-19 and other challenges to equitable instruction. qualtrics provided a report, which the authors organized in a spreadsheet used to analyze the data and create the figures. the following survey questions were asked:
1. what content delivery technology have you used to create your distance learning library sessions during covid-19?
2. what technology tools have you used to enhance student engagement in your distance learning library sessions during covid-19?
3. what are the strengths of the technology tools you’re using right now?
4. what are the weaknesses of the technology tools you’re using right now?
5. what digital literacy gaps have you identified in your students since covid-19 closures? ala’s digital literacy task force defines digital literacy as “the ability to use information and communication technologies to find, evaluate, create, and communicate information, requiring both cognitive and technical skills.”
6. what other challenges exist in your ability to effectively provide equitable information literacy instruction during this time?
please see appendix a for the complete survey instrument. participants the survey was distributed through email to five listservs associated with academic libraries and library organizations: the seminar on the acquisition of latin american library materials (salalm) listserv, the information literacy instruction discussion listserv, the library instruction round table (lirt) listserv, the lita instructional technologies interest group listserv, and the literature in english discussion list. these organizations were chosen due to their connection with library instruction in academic libraries and the authors’ subject specialty affiliations (romance languages and english and american literature). grounded theory approach the data for questions 3, 4, 5, and 6 were analyzed using a basic grounded theory approach, where the authors collected themes and patterns from the responses rather than approaching the data with pre-existing hypotheses.24 based on their observations, the authors categorized responses according to an agreed-upon set of keywords. in addition, after coding the data separately, the researchers examined every answer together to ensure consistency and reliability.
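to make the keyword-coding step concrete, the following is a minimal sketch of how free-text responses could be assigned to agreed-upon categories and tallied. the categories and trigger phrases shown are hypothetical illustrations, not the authors’ actual codebook, and the function names are invented for the example.

```python
# minimal sketch: keyword-based coding of free-text survey responses.
# the codebook below is hypothetical; the authors' real categories and keywords are not reproduced here.
from collections import Counter

CODEBOOK = {
    "easy to use": ["easy", "intuitive", "familiar"],
    "interactive/collaborative": ["interactive", "collaborat", "engag"],
    "bandwidth": ["bandwidth", "internet", "connection"],
}

def code_response(text, codebook=CODEBOOK):
    """return the set of categories whose trigger phrases appear in one response."""
    text = text.lower()
    return {cat for cat, keys in codebook.items() if any(k in text for k in keys)}

def tally(responses, codebook=CODEBOOK):
    """count how many responses mention each category at least once."""
    counts = Counter()
    for r in responses:
        counts.update(code_response(r, codebook))
    return counts

if __name__ == "__main__":
    sample = [
        "It's easy to get started and everyone already knows Zoom.",
        "Low bandwidth makes the video freeze.",
        "Padlet is interactive and collaborative.",
    ]
    print(tally(sample))
```

in a workflow like the one described above, each coder could run (or hand-apply) a codebook like this independently, and the resulting tallies could then be compared line by line when reconciling codes.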
a mixed-methods survey with a grounded theory approach to analysis allowed for a larger number of responses than qualitative interviews. the survey format also allowed for quicker solicitation and analysis of data, given the urgency of the topic and the authors’ desire to provide recommendations to colleagues in a timely manner. findings popularity of technology tools figure 1 shows respondent selections from the list of content delivery tools provided by the authors. a large number of respondents used libguides as a content delivery tool during covid-19, followed closely by the video conferencing tool zoom. however, although libguides and zoom displayed a substantial amount of concurrence among the respondents, fewer than half of the respondents used the rest of the technology tools shown in figure 1. these data suggest that a large number of the respondents were able to deliver library instruction via synchronous learning through zoom or by providing resources asynchronously via libguides, and thus had the opportunity to have at least some engagement with students. figure 1 also shows that more respondents used snagit and screencast-o-matic to create videos than playposit. similarly, a little over one-eighth of respondents used the graphic design tool canva to create content, although this tool had better usage than adobe illustrator, which was only used by one respondent. in addition, the communication software google hangouts was largely not used by respondents. the authors listed formative and pear deck in the survey options as well, but these were not selected by any respondents (not shown in figure 1). figure 1. respondent selections to question 1: what content delivery technology have you used to create your distance learning library sessions during covid-19? figure 2 represents the tools used by the 95 respondents who selected other and entered additional tools in a free-text box in question 1. tools mentioned, such as webex, camtasia, panopto, and kaltura capture, were used for video conferencing but to a lesser extent than zoom. similarly, only six respondents reported using narrated powerpoint. interestingly, these tools were still used by more people than playposit. respondents mentioned a wide array of other technology tools in the free-text box (see appendix b); however, none of these tools were used individually by more than three respondents. figure 2. other content delivery technology used to create distance learning library sessions during covid-19. the survey also asked about technology tools used for student engagement in distance learning library sessions during covid-19. the authors distinguish these tools from content delivery tools, as they are often utilized in conjunction with some of the tools mentioned in figure 1 to facilitate student interaction.
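the counts behind figures 1 and 2 are simple tallies of “select all that apply” responses. a minimal sketch of how such a tally might be computed from the exported survey spreadsheet is shown below; the file name, column label, and comma-separated export format are assumptions, since the actual qualtrics export layout is not described here.

```python
# minimal sketch: tallying multi-select ("select all that apply") survey responses.
# "survey_export.csv" and the column name "q1" are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("survey_export.csv")

tools = (
    df["q1"]
    .dropna()
    .str.split(",")     # one respondent's selections -> list of tool names
    .explode()          # one row per selected tool
    .str.strip()
    .str.lower()
)

counts = tools.value_counts()   # number of respondents per tool, as in figure 1
print(counts.head(10))
# counts.plot(kind="barh")      # optional: quick horizontal bar chart of the tallies
```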
figure 3 shows that, among the tools listed by the authors, respondents preferred the application google forms, found in google drive and google classroom, as more than one-third of respondents indicated they used this application to enhance student engagement. although representing fewer than half of the respondents, 18 more people selected google forms over poll everywhere, the tool with the second-best representation. moreover, poll everywhere and padlet, two online tools that enable student participation through custom-made polls and post-it boards, were each utilized by about one-fourth of participants. the game-based learning platform kahoot was used by nearly one-fifth of respondents, and mentimeter, another interactive platform allowing students to answer multiple-choice and open-ended questions, was used by 11 respondents. less than five percent of the respondents used the interactive technology tools flipgrid, answergarden, jamboard, mural, slido, and socrative. no respondents indicated they used pear deck, google drawings, quizalize, gosoapbox, and yo teach! (not shown in figure 3). in addition, 42 respondents entered the names of technology tools they used to enhance student engagement in the other free-text option. similar to the responses in the free-text answer for question 1, respondents provided a broad list of technology tools. two of the tools listed displayed a higher number of concurrences: eight respondents mentioned zoom polls and four mentioned springshare libwizard. an additional 20 tools were used by just one participant each (see appendix c). figure 3. respondent selections to question 2: what technology tools have you used to enhance student engagement in your distance learning library sessions during covid-19? strengths and weaknesses of technology tools instruction librarians also described the perceived strengths of the technology tools they used. figure 4 shows that a little less than half of the respondents agreed “easy to use” was an important consideration for technology tools, making it the most frequently mentioned strength. responses showed interest in ease of use for librarians, students, and faculty alike. for example, respondents included the phrases “our learners were comfortable with them,” “it’s easy to get started,” and “everyone already knows zoom.” in addition, nearly one-fourth of participants selected the strength “interactive/collaborative” followed at a distance by the strength “flexible,” which dropped dramatically to 15 percent. in fact, the number of respondents who noted “interactive/collaborative” was almost quadruple the number of respondents who mentioned the less popular choices “supported by it” or “captioning functionality.” fewer than 19 participants acknowledged that it was important for the technology tools to enable remote instruction, include recording functionality and screen-sharing functionality, and to be able to enhance communication. only 11 participants wrote that it was important for the tool to be readily available. respondents referred to other strengths not included in figure 4 due to their infrequency. nonetheless, some of these strengths offer unique insights. for example, four respondents noted that they favor free tools.
in addition, three respondents stated that it was beneficial to repurpose content created with technology tools. two respondents mentioned that they preferred tools that do not require download and/or account creation. another respondent mentioned that mobile-friendly tools were most helpful for engaging students. figure 4. respondent answers to question 3: what are the strengths of the technology tools you’re using right now? respondents also shared their observations around technology tool weaknesses. figure 5 shows that several perceived weaknesses were the inverse of strengths from figure 4, including that tools were “difficult to use” or “not interactive or engaging.” figure 5 also indicates that respondents were divided as to the most significant weaknesses. in fact, not even one-fourth of respondents selected the most frequent response, “not interactive or engaging,” displaying a lack of concurrence. the second most-repeated weakness referred to bandwidth requirements, with 27 respondents worrying about the lack of requisite internet access. the authors joined together seven weaknesses mentioned by respondents as “other functional limitations.” these weaknesses included “lack of screen capture,” “connection failures,” “lack of captioning,” “lack of recording capabilities,” “limited sharing screen,” “freezing video,” and “video quality.” each of these specific limitations was only mentioned a couple of times, but together these functional limitations were mentioned by 17 respondents. again, there were some specific weaknesses mentioned by only a few respondents. some of the highlights included tech overload or too many tools to choose from (two respondents), computer storage requirements (three respondents), and that the tools are not flexible enough or easy to integrate into other systems such as canvas or libguides (four respondents). two people observed that the tech tools they used had no weaknesses. interestingly, 18 respondents included keywords and phrases in their answers (not shown in figure 5) that were not directly related to tool weaknesses, but rather described other issues affecting teaching and learning in a remote setting. these included students lacking computers or having only cell phones (seven respondents), students’ limited technology skills or attitudes about remote learning (six respondents), students’ home setups (three respondents), and limited familiarity with the tools among teaching faculty (two respondents). these kinds of responses illustrate the wide range of interconnected factors impacting librarians’ experiences engaging with students and technology during covid-19. finally, 26 percent of librarians answering this question mentioned some weaknesses related to zoom (not shown in figure 5). to illustrate, some comments included “active learning in zoom [sic] is difficult . . .”; “zoom recordings take up a lot of space and our college is running out of room . . .”; “zoom doesn’t work as well when using wifi [sic], as opposed to connecting through a network”; “it is easy to zone out and not pay attention to zoom [sic]”; “with zoom it is difficult to interact with students on a one-to-one basis as they breakout [sic] to conduct research”; and “students tend to not have cameras on . . .
and it’s hard to tell if they are actually paying attention.” these observations may show that while respondents favor using tools like zoom, they are also aware of important limitations. figure 5. respondent answers to question 4: what are the weaknesses of the technology tools you’re using right now? digital literacy gaps beyond describing the technology tools used, respondents were asked to identify digital literacy gaps that they noticed in students during covid-19 closures. as stated above, the authors defined digital literacy in the survey question according to the ala digital literacy task force definition. still, answers to this question provoked a wide range of responses as seen in figure 6. the most frequently recurring response was that digital literacy gaps were the same as those perceived before the pandemic, although only 25 respondents agreed on this. digital literacy gaps observed by respondents included “lack of tech skills in general,” “problems evaluating information online,” “ineffective search strategies,” “difficulty communicating online,” “problems using library resources in general,” “problems using online resources,” “problems using library databases,” and “understanding citation and plagiarism.” the second-biggest category, “lack of tech skills in general,” included varied responses such as “some of my students lack a basic understanding of . . . browsers, upload/download, url versus link, activate/enable a feature etc.”; “students have trouble navigating multiple windows”; and “students are having a hard time trying something new which involves more than a single click or two.” eleven other respondents noted that it was too early to evaluate digital literacy gaps during emergency remote teaching. one respondent offered insight about the possibility that librarians missed gaps because they were not able to meet with all students. as they stated, “students who have access and are in contact with librarians seem to have adequate skills. i don’t know how many students simply lack internet access, and i don’t know how many need the library and don’t figure out how to access it. . . .” ideas for reaching more students who may not have access to in-class library sessions are mentioned in the recommendations section below. when asked about digital literacy gaps, some respondents mentioned student experiences during covid-19 that were not directly related to digital literacy and therefore are not included in figure 6. however, the authors considered this information relevant because it provided insight into perceived challenges students faced. the authors separated such responses into two groups: external challenges and internal challenges. external challenges mostly involved technology access rather than digital literacy per se, with 22 respondents mentioning lack of tech access as a barrier or gap. it is worth noting that respondents mentioned this lack more than any individual digital literacy gap shown in figure 6. fifteen respondents also noted that students may lack internet access at home, while five percent mentioned a home environment that was not ideal or conducive to learning. although these external challenges are not explicitly related to digital literacy, the fact that they are mentioned here may indicate that respondents perceived these challenges as interrelated during the covid-19 pandemic.
internal challenges included concepts that may be seen as related to digital literacy but are not explicitly included in the ala task force definition. in fact, many of these challenges had to do with pandemic-specific difficulties such as “emotional issues” arising during covid-19 (10 respondents). five respondents worried about information overload, while two respondents each mentioned that students were less likely to ask for help and more likely to have problems following directions during emergency remote learning. figure 6. respondent answers to question 5: what digital literacy gaps have you identified in your students since covid-19 closures? the last survey question asked respondents to reflect on any other challenges that may have impacted their ability to effectively provide equitable library instruction during emergency remote learning. figure 7 displays an array of responses, including some related to technology tools, home environments, and institutional support. nonetheless, technology access from home was perceived as the most important challenge (39 respondents) followed closely by internet issues (35 respondents). for both challenges, the authors included responses that specified lack of tech access for students, teaching faculty, or instruction librarians. many respondents did not specify who lacked access. however, one could argue that lack of access by any of those three groups may impede connection and student engagement. other challenges such as home environment, fewer library instruction sessions, communication barriers with students, lack of student engagement, no time to plan, emotional distress, and issues with synchronous or asynchronous instruction affected 11 percent or less of respondents each. additionally, the data indicated that librarians perceived more communication barriers with students (14 respondents) than with faculty (nine respondents). in figure 7, “asynchronous/synchronous” refers to problems encountered by respondents that had to do, in general, with the unique challenges of presenting content online either asynchronously or synchronously. for example, respondents mentioned being unsure whether students were engaging with asynchronous content. they also mentioned being asked by faculty to use one format over another, despite librarian preferences. one respondent focused specifically on the need for flexibility when addressing equity: “asynchronous instruction does not allow the real time adaptation to student needs (cognitive and technical).” even though figure 7 relates to challenges experienced in providing equitable library instruction, respondents showed that there was also an emotional factor surrounding these challenges.
two revealing responses to the question about challenges included “my kids running around in the background, not having an actual office, being expected to work 40 hours a week while homeschooling and running a household” and “some students [are] more or less in shock from the pandemic; some students have illness in the family; some students have economic issues, some students just don’t learn well with online learning only.” other comments stated personal challenges, such as the “stress of living in [the] epicenter of [a] global pandemic” and “my own mental and emotional capacity.” figure 7. respondent answers to question 6: what other challenges exist in your ability to effectively provide equitable information literacy instruction during this time? because there was often little consensus among responses, the authors created word clouds for all four qualitative questions (figure 8). each of these questions showed students at the center of instruction librarians’ responses, which is not surprising given their roles and the subject of this survey. the purpose of emergency remote teaching and learning is, at its core, to continue to connect students with resources and to engage them in their learning, even and especially when it is challenging to do so. still, it is meaningful to see students at the heart of these data. information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 14 figure 8. word cloud visualization for each qualitative question answer set. challenges and limitations many of the challenges encountered while analyzing data had to do with creating meaningful keyword codes for the qualitative survey questions. this coding was challenging because respondents expressed varied experiences and opinions and there was no significant consensus regarding tools used, tool weaknesses, digital literacy gaps, or other challenges. in contrast, respondents frequently referred to students’ lack of technology and internet access, even when the question at hand did not explicitly address this. these challenges speak both to the varied experiences of and institutional responses to covid-19, as well as perceived lack of tech or internet access among students as a primary barrier to effective emergency remote teaching. further, while some questions signaled a clear answer, others required interpretation. to illustrate, respondents used the term “accessibility” inconsistently. some respondents used this term to refer to accessibility for students with disabilities, and others used it to refer to “availability.” therefore, the authors employed contextual clues to determine meaning. regardless, if the meaning remained unclear, then these answers were not considered for coding. similarly, respondents didn’t always use the same language to describe the same concepts. for example, a participant noted that “the technology we have is limited to lecturing and answering questions and providing documents and videos online. we don’t have polls enabled . . . .” the information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 15 authors interpreted this to mean that the technology tools didn’t allow for robust engagement with students, though the respondent didn’t specifically mention the word “engagement.” again, if context or meaning was unclear, those responses were not coded. 
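for readers who want to produce something like the word clouds in figure 8 from their own response sets, a minimal sketch using the python wordcloud package is shown below. the package choice and the input file name are assumptions; the authors do not state which tool generated figure 8.

```python
# minimal sketch: building a word cloud from one question's free-text answers.
# "q3_strengths.txt" is a hypothetical file containing one response per line.
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

with open("q3_strengths.txt", encoding="utf-8") as f:
    text = f.read()

cloud = WordCloud(
    width=800,
    height=400,
    background_color="white",
    stopwords=STOPWORDS,      # drop common english words so response terms dominate
).generate(text)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.savefig("q3_wordcloud.png", dpi=150, bbox_inches="tight")
```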
another challenge occurred in analyzing responses to question 5: what digital literacy gaps have you identified in your students since covid-19 closures? some respondents appeared to be unfamiliar with the term “digital literacy,” even though a definition was provided within the question. some respondents referred to hardware access, home environment, tech access, or psychological stress rather than explicitly reflecting on digital literacy gaps as included in ala’s task force definition. these responses could indicate either confusion around the definition of digital literacy or, as suggested above, the perception of all these factors being codependent or interrelated. limitations of the study included the design of the survey itself. for example, respondents received a list of tools for questions 1 and 2, which may have meant that they were more likely to select these than to remember other tools that they used and add them to the other category accordingly. questions 3 through 6, in contrast, did not include any multiple-choice options, which may have limited the thoroughness of responses. for example, the average number of responses to question 3 was 2.08 strengths mentioned per respondent. we think it is likely that respondents would have indicated more strengths had they been presented with a list rather than only a free-text box. the authors also did not define the difference between content delivery tools and tools for student engagement in the survey. for this reason, there was some overlap noted in the responses for questions 1 and 2. also, respondents mentioned tools for engagement that were sometimes features of content delivery tools, such as webex whiteboards and lms discussion forums. the vast landscape of tools used meant that our survey could not account for all possible manifestations of technology for content delivery and student engagement. discussion questions 1 and 2: technology tools instruction librarians are using to deliver content and engage students the instruction librarians who answered the survey have widely used technology tools such as libguides and zoom in their library seminars during covid-19. however, as the data show, librarians have also used many other technology tools to create and deliver emergency remote library sessions during covid-19, due perhaps to the wide array of tools available. while libguides and zoom exhibit a high percentage of usage, this result was expected because libguides is a well-known tool used by academic librarians and, according to the company, zoom became more prominent as a tool during covid-19.25 the relatively low usage of adobe illustrator is also somewhat predictable because this tool not only requires a subscription, but also may have a higher learning curve than other free graphic editor and design programs. data raised some further questions about the role of information technology (it) departments. are instruction librarians reaching out to their respective universities’ it departments to learn about technology tools available to them and vice versa? are it offices willing and able to provide training via video conferencing if in-person training is not available due to the pandemic? do it departments offer enough promotion to advertise these tools? these questions are not addressed in this manuscript but are important avenues for further research.
only six percent of respondents recorded “supported by it” as a strength of the technology tools they were using. this low percentage may appear striking but could be understood under the premise that as the pandemic set in across the united states and instruction librarians rushed to prepare and present online sessions, librarians relied on the tools that were most familiar to them instead of learning a new technology tool. the data from this survey would seem to corroborate this, as so many respondents chose “ease of use” as an important strength. one interesting detail worth addressing about the tools is that respondents mentioned other more than they selected the options the authors provided, which may imply either that the authors did not include the most-used tools or that the number and/or variety of tools is so wide that it is difficult to reach a consensus. one wonders whether, had the tools mentioned in other been included among the listed options, the number of respondents using those tools would have been higher. questions 3 and 4: the strengths and weaknesses of tools as they affect student engagement the authors wanted to know what tools respondents had used to enhance student engagement. data show that google forms is the tool that most of the respondents have used for this purpose. however, fewer respondents used tools that are purposely designed to increase interaction in online sessions, such as kahoot and mentimeter, which do not even require a fee for using their basic features. respondents’ perceptions of the strengths and weaknesses of the tools they have used provided useful information. in terms of tools strengthening student engagement, the responses were not as conclusive, as 40 of 150 respondents found these tools interactive or collaborative, and an even lower number of these respondents thought of these tools as flexible or helpful in enabling remote instruction. one could argue that ada capabilities are features that may facilitate student engagement. however, when respondents were asked about the strengths of technology tools they had used, accessibility was not often mentioned as a strength. moreover, data showed that only eight respondents referred to ada problems and three of them voiced concern over captioning capabilities, which is considered a relevant ada feature. there was no mention of alt text for images, screen-readable and software-neutral file formats, or the importance of users’ ability to change the color and font settings on their devices to see the content. in fact, only three respondents specifically mentioned issues with videos in terms of their audio quality, lack of auto closed-captioning, and freezing images. respondents noted wide-ranging effects of tool weaknesses on both instruction librarians and students. to illustrate, the weaknesses “time intensive,” “not designed for teaching,” and “no feedback or assessment” likely affected instruction librarians at a personal level as they prepared for and assessed their teaching. in contrast, the weaknesses “ada problems,” “not interactive or engaging,” “difficult to use,” and “makes communication difficult” might primarily impact students. other concerns respondents stated that may influence student engagement included poor bandwidth, which affects internet access and causes connection issues.
for example, even if librarians try to improve the video quality in zoom by disabling the higher definition option, or start a session with audio conferencing only, which will decrease the amount of bandwidth needed, students with poor bandwidth may still not be able to engage. therefore, in situations of emergency remote learning, if students lack bandwidth or an appropriate home environment, learning and engaging may become a challenge. information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 17 questions 5 and 6: digital literacy gaps and equitable instruction a number of factors affect the ability to provide equitable library instruction on the librarian side and to engage with equitable library instruction on the students’ side. one of these factors is home environment, including access to computers, good bandwidth, or an appropriate working station. our data specify that 15 respondents perceived “home environment” as a challenge when providing equitable library instruction. in addition, some respondents noticed home environment issues when asked about digital literacy gaps in relation to shared spaces and lack of computers and access. it’s possible that equity issues increased during the covid-19 shutdown, which raises the question of whether there is a correlation between the issues affecting students, librarians, and faculty. data showed that of 139 participants, a little over one-fourth of them considered “tech access from home” and “internet issues” as challenges in the ability to provide equitable library instruction. these challenges, along with “emotional issues,” were perceived to affect not only students but also librarians and faculty. although responses recorded librarians’ perceptions on equity issues, often including their own experiences, data revealed that respondents presumed faculty and students were having similar issues. data also exhibited that respondents perceived other challenges, such as “fewer library sessions” and “student lack of engagement,” that may affect students directly. fewer library sessions are a challenge that may be further addressed forthwith. however, students’ lack of engagement is a difficulty that may require thoughtful outreach, collaboration with mental health offices at the campus level, and reflective and inclusive lecture design. these challenges may have a negative impact on receiving an equitable learning experience. in fact, less commonly acknowledged gaps may, in some ways, be more important than those frequently mentioned. a well-known gap can be addressed because there is consensus that the gap exists and poses a barrier to equity in education. unnoticed or overlooked gaps, in contrast, are more difficult to address but may be no less important as barriers to equitable education. equity issues may also arise as a result of lack of digital literacy skills in students. students with higher digital literacy are deemed to perform better in an emergency remote library instruction setting, may be more prone to stay in tune and engaged with the lesson, and have less emotional stress by feeling confident. however, as wan ng explains above, those recognized as digital natives may not necessarily have digital literacy, even if they are comfortable with social media tools.26 data do not tell us the age of students; regardless, digital literacy gaps were detected by respondents. 
these perceived gaps in digital literacies (evaluating information, communicating online, applying search strategies, using library resources and databases, understanding plagiarism and citation, and using online resources) are important for librarians to address during emergency remote learning. last, the lack of consensus may be explained by the complexity of the concept of digital literacy. it is possible that many of these gaps existed before, but librarians recognized them as new during emergency remote learning. one response illustrates this idea: “the closure has prompted many more students to request help in every step of the digital literacy process. i’m not sure if students typically ask each other, or their professors/instructors. regardless, it’s exposed that not all students know things i’d assumed they did.” whether these gaps are new or not remains unclear, as evidenced by another respondent who stated, “nothing new to the covid era.” information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 18 recommendations these recommendations seek to address some of the issues that arose in the data, especially those regarding equity and emergency remote library instruction. to illustrate, one respondent summed up the current situation while also posing a question that appears valuable: “not all of our students have the same access to stable technology and internet, nor do they all respond to online teaching strategies in the same ways. how do we create equitable and accessible learning opportunities?” while the authors do not have all the answers, based on the analysis of the data and emerging themes, some recommendations may help instruction librarians move forward through the covid-19 crisis. technology and equity the authors realize that a budget is essential for the implementation of recommendations that may reduce both inequitable access to information and lack of digital literacy. nonetheless, the recommendations below intend to offer guidance on ways to improve equitable access, digital literacy, and student engagement during emergency remote library sessions. one external digital barrier for students engaging in emergency remote library sessions was the lack of equipment at home, possibly due to economic hardship. university libraries could provide kits containing a chromebook, webcam, microphone, wi-fi hotspot, and headphones to increase equitable access. access to this equipment may help students feel supported and understood, with a sense of dignity. these offerings should be in coordination with other campus units who may provide similar services, such as student affairs and it departments. likewise, a coordinated marketing and outreach effort at the campus level may enhance the visibility of equipment available for student use. as stated above, “ease of use” rose to the top as the most-frequently mentioned technology tool strength, which is understandable given the many stressors educators and students may be experiencing during covid-19. however, it is important to keep in mind that tools should be “easy to use” not just for librarians and teaching faculty, but especially for students. 
nonetheless, given the difficulty of assessing instructional technology and library information literacy sessions right now, it is challenging to know whether students find the technology tools that librarians choose truly “easy to use.” compounding the perception that tools are easy to use is the possibility that tools may not be ada accessible. though the survey did not ask about accessibility explicitly, and while the authors did not vet the tools listed in the survey for their accessibility features, the authors wonder how many tools are fully accessible to all learners. instead of choosing tools for their perceived ease of use, a further recommendation is to move beyond valuing what’s easy to critically reflect on whether tools are fully accessible to students with visual or hearing impairments or learning differences. if the answer is no or unclear, perhaps using basic content delivery tools that are vetted for accessibility features is the better option. it is recommended to follow best practices for using those tools (for example, by referring to guidance from campus it departments). if instruction librarians consider themselves “blended,” or perhaps even so well-versed in technology that a term like “blended librarian” is no longer needed, they should also prioritize flexible, responsive, and intentional use of technology in their lecture design. if a tool that they assumed would be easy to use for all students is proving challenging for some, librarians should have alternative options and extra support at the ready. they may also ask themselves whether information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 19 use of a technology tool furthers the learning process and outcomes of the course, or if technology is added for its own sake. in addition, avoiding use of extra tools and technology that does not genuinely enhance lecture goals and priorities may help students avoid stress related to technology, which could further students’ emotional well-being during this fraught time. being clear with students about which tools will be used and for what purpose may help students who would otherwise struggle with layered content delivery and engagement tools. a glossary of these tools, along with when and how they’ll be used and links to technical support, could be a helpful support document for students. communication and equity it is worth exploring librarian, student, and faculty communication not explicitly focused on technology. some respondents mentioned outreach and connection challenges that have less to do with technology and more to do with other stressors and limitations. for example, some librarians reported receiving fewer requests for information literacy sessions or library support than usual, and some speculated that this was because of the quick move to emergency remote learning, lack of time to plan, and the possibility that a library session was “extra” and faculty were trying to simplify. there are several ways to address this challenge. librarians can attempt to meet students and faculty where they are by offering multimodal learning opportunities, including both synchronous and asynchronous offerings (zoom meetings, prerecorded videos, tutorials/quizzes, canvas discussion posts, and libguides are a few options). it is also paramount to make sure librarians are reachable at the point-of-need, which may mean extended weekend and evening hours on the virtual ask-a-librarian desk. 
also imperative is ensuring that virtual services, as well as consultation request links and/or email addresses, are clear and visible to students and faculty on the library’s website. some survey respondents mentioned that communication with faculty was difficult, and this may have contributed to fewer instruction requests. while it is understandable that faculty may have been less responsive to librarian outreach for a variety of reasons, there are some ways to encourage faculty communications. for example, librarians could provide simple, bulleted lists with updated information on services and offerings, individual attention (focused on specific classes and topics), and options, acknowledging that some faculty will simply not want to share classroom time during emergency remote teaching. librarians can also work to bridge the disconnect between it and their departments by proactively reaching out to learn about best practices not only for technology use, but also for ada accommodations. even when information literacy sessions are requested, faculty may not always share student accommodation needs. librarians can ask for help from it or other units on campus (such as centers for teaching and learning) to make sure that their communication techniques are aligned with inclusive, user-centered approaches to teaching and learning with technology. as professionals in a unique role serving both students and faculty, librarians may also check in on a person-to-person basis with both groups. acknowledging that we are people with mental and physical health needs working together in difficult circumstances is one way of connecting with students and faculty in an authentic way. emergency remote teaching and learning is different information technology and libraries june 2021 emergency remote library instruction and tech tools | ibacache, rybin koob, and vance 20 from typical remote or online learning and being clear about that might also help everyone adjust expectations and extend compassion. professional development and personal support while emergency remote teaching and learning may not seem like the best time for professional development, it is important to acknowledge that librarians deserve support in navigating this unprecedented time. even as we clearly want to help students who may be especially vulnerable during covid-19, there is a sense of being overwhelmed, and librarians may not always know where to start. while there are online webinars and discussions that provide advice about how to best help students during covid-19, the authors recommend a more specific approach targeting digital literacy gaps and support systems for librarians. in reviewing survey responses to perceived digital literacy gaps and other challenges, it became clear that not all librarians are well-versed in digital literacy concepts. if librarians have time to take one approach to professional development as it relates to instruction and information literacy, the authors recommend learning more about digital literacy competencies and thinking critically about how emergency remote library instruction design can address those competencies and potential gaps. of course, stresses of the pandemic are impacting librarians as well as faculty and students. it is important that we connect with colleagues and support systems during this time. 
one option might be to form a community with colleagues to determine best practices for use of technology in instruction, among other relevant topics (examples at the authors’ library include anti-racist actions and a caregiver’s support group). librarians should also prioritize their own health (mental and physical) and stress management. the recommendations are everywhere but bear repeating: connect with family and friends, exercise, take time away from the computer, and make sure to rest. librarians should be kind to themselves and their colleagues and offer or ask for support when needed. conclusion as of spring 2021, the covid-19 pandemic is not yet over. it remains unclear whether and when academic library instruction will return to the old normal. the data collected and analyzed for this paper, as well as the discussion and recommendations, can inform how instruction librarians respond to student needs and challenges as everyone continues to cope with life during emergency remote learning. especially compelling are the data shared about the strengths and weaknesses of technology tools used to enhance student engagement in library instruction. these data provide parameters that may help other instruction librarians make decisions when choosing a technology tool and be prepared to troubleshoot when issues arise. a concerning finding was that digital literacy, as defined by ala’s digital literacy task force, is a subject that may not be widely understood by instructors. although our pool of respondents was small, instruction librarians may need a broader understanding of what digital literacies look like in practice when dealing with emergency remote teaching and a diverse student population. while instruction librarians’ experiences and perceptions are one important piece of the puzzle, especially in acknowledging shared challenges, it is important to recognize that students may have needs, digital literacy or otherwise, that educators are missing. though assessment is difficult right now, reflection and attention to the whole student experience is necessary. working with colleagues on campus to provide technology, including laptop computers and wi-fi hotspots, as well as evaluating our content delivery and engagement tech tools for ada accessibility, are examples of ways that instruction librarians can connect students with unmet needs to resources during this difficult time. examining instruction librarians’ ongoing response to the pandemic, while challenging, will help libraries become more emergency-responsive and better able to meet the needs of diverse students in the 21st century. acknowledgement we would like to thank moria woodruff from the university of colorado boulder writing center for her help revising this manuscript. appendix a: survey instrument distance learning during a pandemic: a matter of equity we are curious to hear about your experiences of library instruction during the abrupt shift to online learning. in particular, we are researching librarians’ use of technology tools for online content delivery and student engagement during covid-19. this survey should take less than ten minutes to complete. your answers will be anonymous. please do not include personally identifiable information.
participation in the survey indicates your consent for us to use the data collected in a forthcoming research paper about using online technology tools to teach information literacy or library seminars during covid-19. the survey will be open through sunday, may 24th. thank you for your participation!

q1 what content delivery technology have you used to create your distance learning library sessions during covid-19? select as many as apply.
zoom
microsoft teams
libguides
course management system (e.g., canvas)
formative
pear deck
adobe illustrator
snagit
screencast-o-matic
playposit
google hangouts
google classrooms
canva (graphic design tool)
other

q2 what technology tools have you used to enhance student engagement in your distance learning library sessions during covid-19? select as many as apply.
padlet
answergarden
kahoot!
mentimeter
flipgrid
slido
socrative
jamboard
pear deck
mural
google drawings
google forms
quizalize
gosoapbox
poll everywhere
yo teach!
other

q3 what are the strengths of the technology tools you’re using right now?

q4 what are the weaknesses of the technology tools you’re using right now?

q5 what digital literacy gaps have you identified in your students since covid-19 closures? ala’s digital literacy task force defines digital literacy as “the ability to use information and communication technologies to find, evaluate, create, and communicate information, requiring both cognitive and technical skills.”

q6 what other challenges exist in your ability to effectively provide equitable library instruction during this time?

appendix b: tools mentioned by three or fewer respondents, question 1, option other

ninety-five respondents answered other to question 1: what content delivery technology have you used to create your synchronous and asynchronous distance learning library sessions during covid-19?
tool name | type of tool | number of respondents
bluejeans | online meetings | 3
google meet | online meetings | 3
jing / techsmith capture | screen capture | 3
blackboard ensemble | video creation | 2
imovie | video editing | 2
guide on the side | interactive tutorials | 2
kapwing | video and image editing | 2
libchat | communications service | 2
piktochart | graphics editing | 2
techsmith relay | video creation | 2
thinglink | multimedia editing | 2
adobe indesign | desktop publishing | 1
adobe photoshop | graphics editing | 1
adobe premiere pro | video editing | 1
amazon chime | communications service | 1
audacity | audio editing | 1
chat (in general) | communications service | 1
clideo | video editing | 1
faststone capture | screen capture | 1
genially | interactive content creation | 1
google sheets | web-based spreadsheets | 1
gotomeeting | online meetings | 1
microsoft bookings | scheduling | 1
microsoft stream | video sharing | 1
powtoons | video creation | 1
pressbooks | content management | 1
prezi video | video creation | 1
qualtrics | surveys | 1
quicktime | multimedia editing | 1
screenflow | video editing and screen capture | 1
springshare libwizard | interactive tutorials and forms | 1
telephone | communications service | 1
videoscribe | animated video creation | 1
vimeo | video sharing | 1
whatsapp | communications service | 1

appendix c: tools mentioned by one respondent, question 2, option other

forty-two respondents answered other to question 2: what technology tools have you used to enhance student engagement in your distance learning library sessions or courses during covid-19? each tool was used by only one respondent.

tool name | type of tool
articulate storyline | interactive e-learning modules
calendly | scheduling
camtasia | video editing and screen recording
canva quizzes | quizzes
google voice | communications service
h5p | programming language for websites
handout | (not a technology tool)
html/css | programming language for websites
knight lab tools | storytelling
lms discussion forums | discussions
microsoft powerpoint | presentation platform
microsoft word | word processor
nearpod | interactive lessons
parlay | discussions
qualtrics | surveys
remind | communications service
speakpipe | communications service
twine | storytelling
voicethread | video, voice, and text commenting
webex whiteboard | drawing tool

endnotes
1 charles hodges et al., “the difference between emergency remote teaching and online learning,” educause review (2020), https://er.educause.edu/articles/2020/3/the-difference-between-emergency-remote-teaching-and-online-learning.
2 hodges et al., “the difference.”
3 jody greene, “how (not) to evaluate teaching during a pandemic,” chronicle of higher education (2020), https://www-chronicle-com.colorado.idm.oclc.org/article/how-not-to-evaluate-teaching/248434.
4 laura czerniewicz, “what we learnt from ‘going online’ during university shutdowns in south africa,” philoned (2020), https://philonedtech.com/what-we-learnt-from-going-online-during-university-shutdowns-in-south-africa/.
5 for scholarship on equity and librarianship see joanne oud, “systematic workplace barriers for academic librarians with disabilities,” college & research libraries 80, no.
2 (2019), https://doi.org/10.5860/crl.80.2.169; amanda l. folk, “reframing information literacy as academic cultural capital: a critical and equity-based foundation for practice, assessment, and scholarship,” college & research libraries 80, no. 5 (2019), https://doi.org/10.5860/crl.80.5.658; scott seaman, carol krismann, and nancy carter, “salary market equity at the university of colorado at boulder libraries: a case study followup,” college & research libraries 64, no. 5 (2003), https://doi.org/10.5860/crl.64.5.390; freeda brook, dave ellenwood, and althea eannace lazzaro, “in pursuit of antiracist social justice: denaturalizing whiteness in the academic library,” library trends 64, no. 2 (2015), https://doi.org/10.1353/lib.2015.0048; isabel gonzalez-smith, juleah swanson, and azusa tanaka, “unpacking identity: racial, ethnic, and professional identity and academic librarians of color,” in the librarian stereotype: deconstructing perceptions and presentations of information work, ed. nicole pagowsky and miriam rigby (chicago: association of college and research libraries, 2014), 149–73. 6 tom riedel and paul betty, “real time with the librarian: using web conferencing software to connect to distance students,” journal of library & information services in distance learning 7, no. 1–2 (2013): 101, https://doi.org/10.1080/1533290x.2012.705616. 7 keith shaw, “colleges expand vpn capacity, conferencing to answer covid-19,” network world (online) (2020): 1. 8 monica anderson and andrew perrin, “nearly one-in-five teens can’t always finish their homework because of the digital divide,” pew research center fact tank news in the numbers, october 26, 2018, https://www.pewresearch.org/fact-tank/2018/10/26/nearlyone-in-five-teens-cant-always-finish-their-homework-because-of-the-digital-divide/. 9 julie arnold lietzau and barbara j. mann, “breaking out of the asynchronous box: suing web conferencing in distance learning,” journal of library & information services in distance learning 3, no. 3–4 (2009): 113, https://doi.org/10.1080/15332900903375291. 
10 aek phakiti, david hirsh, and lindy woodrow, "it's not only english: effects of other individual factors on english language learning and academic learning of esl international students in australia," journal of research in international education 12, no. 3 (2013): 248.
11 t. v. semenova and l. m. rudakova, "barriers to taking massive open online courses," russian education & society 58, no. 3 (2016): 242, https://doi.org/10.1080/10609393.2016.1242992.
12 xinghua wang, seng chee tan, and lu li, "technostress in university students' technology-enhanced learning: an investigation from multidimensional person-environment misfit," computers in human behavior 105 (2020): 2, https://doi.org/10.1016/j.chb.2019.106208.
13 "digital literacy," ala literacy clearinghouse, accessed may 16, 2021, https://literacy.ala.org/digital-literacy/.
14 steven j. bell and john shank, "the blended librarian: a blueprint for redefining the teaching and learning role of academic librarians," college & research libraries news 65, no. 7 (2004): 374, https://doi.org/10.5860/crln.65.7.7297.
15 vanessa w. vongkulluksn, kui xie, and margaret a. bowman, "the role of value on teachers' internalization of external barriers and externalization of personal beliefs for classroom technology integration," computers & education 118 (2018): 79, https://doi.org/10.1016/j.compedu.2017.11.009.
16 jesper aagaard, "breaking down barriers: the ambivalent nature of technologies in the classroom," new media & society 19, no. 7 (2017): 1138, https://doi.org/10.1177/1461444816631505.
17 wan ng, "can we teach digital natives digital literacy?" computers & education 59, no. 3 (2012): 1065, https://doi.org/10.1016/j.compedu.2012.04.016.
18 ng, "can we teach," 1071–72.
19 ng, "can we teach," 1072.
20 ellen helsper and rebecca eynon, "digital natives: where is the evidence?," british educational research journal 36, no. 3 (2010): 515, https://doi.org/10.1080/01411920902989227.
21 david ellis, "using padlet to increase student engagement in lectures," in proceedings of the 14th european conference on e-learning (ecel 2015), ed. amanda jefferies and marija cubric (reading, uk: academic conferences and publishing international limited): 195.
22 seyed abdollah shahrokni, "playposit: using interactive videos in language education," teaching english with technology 18, no. 1 (2018): 106.
23 jurgen schulte et al., "shaping the future of academic libraries: authentic learning for the next generation," college & research libraries 79, no. 5 (2018): 688, https://doi.org/10.5860/crl.79.5.685.
24 chiara faggiolani, "perceived identity: applying grounded theory in libraries," jlis.it: italian journal of library and information science 2, no. 1 (2011): 4592, https://doi.org/10.4403/jlis.it-4592.
25 "over 700 universities and colleges now use zoom!" zoom blog, july 15, 2013, https://blog.zoom.us/over-700-universities-and-colleges-now-use-zoom-video-conferencing/.
26 ng, "can we teach," 1071–72.

editorial board thoughts: who will use this and why? user stories and use cases
kevin m. ford
information technology and libraries | march 2019
kevin m. ford (kefo@loc.gov) is librarian, linked data specialist, library of congress.

perhaps i'm that guy. the one always asking for either a "user story" or a "use case," and sometimes both. they are tools employed in software or system engineering to capture how, and importantly why, actors (often human users, but not necessarily) interact with a system. both have protagonists, but one is more of a creative narrative, while the other reads like a strict, unvarnished retelling. user stories relate what an actor wants to do and why. use cases detail, to varying degrees, how that actor might go about realizing that desire. the concepts, though distinct, are often confused and conflated.
and, because they classify as jargon, the concepts have sometimes been employed outside of technology to capture what an actor needs, the path the actor takes to his or her objective, including any decisions that might be made along the way, and all of this effort is undertaken in order to identify the best solution. by giving the actors a starring role, user stories and use cases ensure focus is on the actors, their inputs, and the expected outcome. they protect against incorporating unnecessary elements, which could clutter and, even worse, weaken the end product, and they create a baseline understanding by which the result can be measured. and so i find myself frequently asking in meetings, and mumbling in hallways: “what’s the use case for that?” or “is there a user story? if not, then why are we doing it?” you get the idea. it’s a little ironic that i would become this person. not because i didn’t believe in user stories and use cases – quite the contrary, i’ve always believed in the importance and utility of them – but because of a book i was assigned during graduate coursework for my lis degree and my initial reaction. it’s not just an unassuming book, it has a downright boring appearance, as one might expect of a book entitled “use case modeling.”1 it’s a shocking 347 pages. it was a joint endeavor by two authors: kurt bittner and ian spence. i think i read it, but i can’t honestly recall. i assume i did because i was that type of student and i had a long chicago el commute at the time. in any case, i know beyond doubt that i was assigned this book, dutifully obtained it, and then picked it up, thumbed through it, rolled my eyes, and probably said, “ugh, really?” and that’s just it. the joke’s on me. the concepts, and as such the book, which i’ve moved across the country a couple of times, remain near-daily constants in my life. as a developer, i basically don’t do anything without a user story and a use case, especially one whose steps (including preconditions, alternatives, variables, triggers, and final outcome) haven’t been reasonably sketched out. “sketched out” is an interesting phrase because one would think that if entire books were being authored on the topic of use cases, for example, then use cases would be complicated and involved affairs. they can be, but they need not be. the same holds for user stories. imagine you were designing a cataloging system, here’s an example of the latter: as a librarian i want my student catalogers to be guided through selection of vocabulary terms to improve both their accuracy and speed.2 editorial board thoughts: who will use this and why? | ford 6 https://doi.org/10.6017/ital.v38i1.10979 that single-sentence user story identifies the actors (student catalogers), what they need (a “guided … selection of vocabulary terms”), and why (“to improve their accuracy and speed”). the use case would explore how the student catalogers (the actors) would interact with the system to realize that user story. the use case might be narrowly defined (“adding controlled terms to records”) or might be part of a broader use case (“cataloging records”), but in either instance the use case might go to some length to describe the interaction between the student catalogers and the system in order to generate a clear understanding of the various interactions. 
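to make the shape of such a use case concrete, here is a minimal sketch of how the student-cataloger scenario above might be written down as a structured record. the actor, steps, and field names are illustrative assumptions for this column, not details taken from the ld4l use case cited in the endnotes.

```python
# A minimal, illustrative sketch of the use case behind the user story above.
# The actor, steps, and field names are assumptions made for illustration;
# they are not taken from the cited LD4L use case.
use_case = {
    "name": "Adding controlled terms to records",
    "actor": "Student cataloger",
    "preconditions": [
        "Cataloger is authenticated in the cataloging system",
        "A bibliographic record is open for editing",
    ],
    "trigger": "Cataloger begins typing in a subject or genre field",
    "main_flow": [
        "System suggests matching terms from the configured controlled vocabulary",
        "Cataloger selects a suggested term",
        "System stores the term's label and its identifier (URI) in the record",
    ],
    "alternatives": [
        "No match found: cataloger flags the heading for review by a librarian",
    ],
    "outcome": "Record contains vocabulary terms entered faster and with fewer errors",
}

def summarize(uc: dict) -> str:
    """Render a one-line summary, e.g., for a planning-meeting agenda."""
    return f"{uc['actor']} -> {uc['name']}: {uc['outcome']}"

if __name__ == "__main__":
    print(summarize(use_case))
```

even an informal structure like this forces the questions a use case is meant to answer: who is acting, what must be true beforehand, what the main flow is, and what counts as success.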
by doing this, the use case helps to identify functional requirements and it clearly articulates user/system expectations, which can be reviewed by stakeholders before work begins and used to verify delivery of the final product. as i have presented this, using these tools might strike you as overly formal and time-consuming. in many circumstances they might be, if the developer has sufficient user and domain knowledge (rare, very, very rare) and especially if the “solution” is not an entirely new system but just an enhancement or augmentation to an existing system. yet, whether it is a completely new system being developed by someone who has long and profound experience with the domain or a simple enhancement, it may be worth entertaining the questions/process if even informally. i find it is often sufficient to ask “who will use this and why?” essentially i’m asking for the “user story” but dispensing with the jargon. doing so may lead to additional questions, the answers to which would likely check the boxes of a “use case” even if the effort is not identified as such, and it certainly ensures the user-driven nature and need of the request. this might all sound obvious, but i like to think of it as defensive programming, which is like defensive driving. yes, the driver coming up to the stop sign on my right is going to stop, but i take my foot off the gas and position it over the brake just in case. likewise, i’m confident the functional requirements i’m being handed have been fully considered and address a user need, but i’m going to ask for the user story anyway. i’m also leery of scope creep which, if i were to continue the driving analogy, would be equivalent to driving to one store because you need to, but then also driving to two additional stores for items you think might be good to have but for which you have no present need. it’s time-consuming, you’ve complicated your project, you’ve added expense to your budget, and the extra items might be of little or no use in the end. the number of times i’ve been in meetings in which new, additional features are discussed because the designers think it is a good idea (that is, there has been no actual user request or input sought) is alarmingly high. that’s when i pipe up, “is there a user story? if not, then why are we doing it?” user stories and use cases help focus any development project on those who stand to benefit, i.e. the project’s stakeholders, and can guard simultaneously against insufficient planning and software bloat. and the concepts, though most often thought of with respect to large-scale projects, apply in all circumstances, from the smallest feature request to an existing system to the redesign of a complex system. if you are not in the habit of asking, try it next time: who will use this and why? endnotes 1 kurt bittner and ian spence, use case modeling (boston: addison-wesley, 2003). also useful: alistair cockburn, writing effective use cases (boston: addison-wesley, 2001). information technology and libraries | march 2019 7 2 “use case 3.4: authority tool for more accurate data entry,” linked data for libraries (ld4l), accessed march 1, 2019, https://wiki.duraspace.org/display/ld4l/use+case+3.4%3a+authority+tool+for+more+accur ate+data+entry. digital collections are a sprint, not a marathon: adapting scrum project management techniques to library digital initiatives michael j. 
dulock and holley long information technology and libraries | december 2015 5 abstract this article describes a case study in which a small team from the digital initiatives group and metadata services department at the university of colorado boulder (cu-boulder) libraries conducted a pilot of the scrum project management framework. the pilot team organized digital initiatives work into short, fixed intervals called sprints—a key component of scrum. working for more than a year in the modified framework yielded significant improvements to digital collection work, including increased production of digital objects and surrogate records, accelerated publication of digital collections, and an increase in the number of concurrent projects. adoption of sprints has improved communication and cooperation between participants, reinforced teamwork, and enhanced their ability to adapt to shifting priorities. introduction libraries in recent years have freely adapted methodologies from other disciplines in an effort to improve library services. for example, librarians have • employed usability testing techniques to enhance users’ experience with digital libraries interfaces,1 improve the utility of library websites,2 and determine the efficacy of a visual search interface for a commercial library database;3 • adopted participatory design methods to identify information visualizations that could augment digital library services4 and determine user needs in new library buildings;5 and • utilized principles of continuous process improvement to enhance workflows for book acquisition and implementation of serial title changes in a technical services unit.6 librarians often come to the profession with disciplinary knowledge from an undergraduate degree unrelated to librarianship, so it should come as no surprise that they bring some of that disciplinary knowledge to their work. the interdisciplinary nature of librarianship also creates an environment that is amenable to adoption or adaptation of techniques from a variety of sources, not only those originating in library science. in this paper, the authors describe their experiences michael j. dulock (michael.dulock@colorado.edu) is assistant professor and metadata librarian, university of colorado boulder. holley long (longh@uncw.edu), previously assistant professor and systems librarian for digital initiatives at university of colorado, boulder, is digital initiatives librarian, randall library, university of north carolina wilmington. mailto:michael.dulock@colorado.edu mailto:longh@uncw.edu digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 6 in applying a modified scrum management framework to facilitate digital collection production. they begin by elucidating the fundamentals of scrum and then describes a pilot project using aspects of the methodology. they discuss the outcomes of the pilot and posit additional features of scrum that may be adopted in the future. 
fundamentals of scrum project management the scrum project management framework—one of several techniques under the rubric of agile project management—originated in software development, and has been applied in a variety of library contexts including the development of digital library platforms7 and library web applications.8 scrum’s salient characteristics include self-managing teams that organize their work into “short iterations of clearly defined deliverables” and focus on “communication over documentation.”9 the scrum primer: a lightweight guide to the theory and practice of scrum describes the roles, tools, and processes involved in this project management technique.10 scrum teams are cross-functional and consist of five to nine members who are cross-trained to perform multiple tasks. in addition to the team, two individuals serve specialized roles, scrum master and product owner. the scrum master is responsible for ensuring that scrum principles are followed and for removing any obstacles that hinder the team’s productivity. hence the scrum master is not a project manager, but a facilitator. the product owner’s role is to manage the product by identifying and prioritizing its features. this individual represents the stakeholders’ interests and is ultimately responsible for the product’s value. the team divides their work into short, fixed intervals called sprints that typically last two to four weeks and are never extended. at the beginning of each sprint, the team meets to select and commit to completing a set of deliverables. once these goals are set, they remain stable for the duration; course corrections can occur in later sprints. in software development, the scrum team aims to complete a unit of work that stands on its own and is fully functional, known as a potentially shippable increment. it is selected from an itemized list of product features called the product backlog. the backlog is established at the outset of development and consists of a comprehensive list of tasks that must occur to complete the product. a well-constructed backlog has four characteristics. first, it is prioritized with the features that will yield the highest return on investment at the top of the list. second, the backlog is appropriately detailed, so that the tasks at the top of the list are well-defined whereas those at the bottom may be more vaguely demarcated. third, each task receives an estimation for the amount of effort required to complete it, which helps the team to project a timeline for the product. finally, the backlog evolves in response to new developments. individual tasks may be added, deleted, divided, or reprioritized over the life of the project. during the course of a sprint, team members meet to plan the sprint, check-in on a daily basis, and then debrief at the conclusion of the sprint. they begin with a two-part planning meeting in which the product owner reviews the highest priority tasks with the team. in the second half of the meeting, the team and the scrum master determine how many of the tasks can be accomplished in information technologies and libraries |december 2015 7 the given timeframe, thus defining the goals for the sprint. this meeting generally lasts no longer than four hours for a two-week sprint. every day, the team holds a brief meeting to get organized and stay on track. 
during these “daily scrums,” each team member shares three pieces of information: what has been accomplished since the previous meeting, what will be accomplished before the next meeting, and what, if any, obstacles are impeding the work. these fifteen-minute meetings provide the team with a valuable opportunity to communicate and coordinate their efforts. sprints conclude with two meetings, a review and retrospective. during the review, the team inspects the deliverables that were produced during that sprint. the retrospective provides an opportunity to discuss the process, what is working well, and what needs to be adjusted. figure 1. typical meeting schedule for a two-week sprint evidence in the literature suggests that scrum improves both outcomes and process. one metaanalysis of 274 programming case studies found that implementing scrum led to improved productivity as well as greater customer satisfaction, product quality, team motivation, and cost reduction.11 proponents of this project management technique find that it leads to a more flexible and efficient process. scrum’s brief iterative work cycles and evolving product backlog promote adaptability so the team can address the inevitable changes that occur over the life of a project. by contrast, traditional project management techniques have been criticized for requiring too much time upfront on planning and being too rigid to respond to changes in later stages of the project.12 scrum also promotes communication over documentation,13 resulting in less administrative overhead as well as increased accountability and trust between team members. scrum pilot at university of colorado boulder libraries the university of colorado boulder (cu-boulder) libraries digital initiatives team was interested in adopting scrum because of its incremental approach to completing large projects, its focus on communication, and its flexibility. these attributes meshed well with the group’s goals to publish larger collections more quickly and to more effectively multitask the production of multiple high digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 8 priority collections. the group’s staffing model and approach to collection building prior to the scrum pilot is described here to provide some context for this choice of project management tool. digital collection proposals are vetted by a working group composed of ten members, the digital library management group (dlmg), to ensure that major considerations such as copyright status are fully investigated before undertaking the collection. approved proposals are prioritized by the appropriate collection manager as high, medium, or low and then placed in a queue for scanning and metadata provisioning. a core group of individuals generally works on all digital collections, including the metadata librarian, the digital initiatives librarian, and one or both of the digitization lab managers. additionally, the team frequently includes the subject specialist who nominated the collection for digitization, staff catalogers, and other library staff members whose expertise is required. at any given time, the queue may contain as many as fifteen collections, and the core team works on several of them concurrently to address the separate needs of participating departments. while this approach allows the teams to distribute resources more equitably across departments, progress on individual collections can be slower than if they are addressed one at a time. 
prior to implementing aspects of scrum, the team also completed the scanning and metadata records for every object in the collection before it was published. as a result, publication of larger collections trailed behind smaller collections. the details of digital collection production vary depending of the nature of the project, but the process usually follows the same broad outline. unless the entire collection will be digitized, the collection manager chooses a selection of materials on the basis of criteria such as research value, rarity, curatorial considerations, copyright status, physical condition, feasibility for scanning, and availability of metadata. photographs and paper-based materials are then evaluated by the preservation department to ensure that they are in suitable condition for scanning. likewise, the media lab manager evaluates audio and video media for condition issues such as sticky shed syndrome, which will affect digitization.14 depending on format, the material is then digitized by the digitization lab manager or the media lab manager and their student assistants according to locally established workflows that conform to nationally recognized best practices. once digitized, student assistants apply post-processing procedures as appropriate and required, such as running ocr (optical character recognition) software to convert images to text or equalizing levels on an audio file. the lab managers then check the files for quality assurance and move the files to the appropriate location on the server. the metadata librarian creates a metadata template appropriate to the material being digitized by using industry standards such as visual resources association core (vra core), metadata object description schema (mods), pbcore, and dublin core (dc). metadata creation methods depend on the existence of legacy metadata for the analog materials and in what format legacy metadata is contained. the metadata librarian, along with his staff and/or student assistants, adapts legacy metadata into a format that can be ingested by the digital library software or creates records directly in the software when there is no legacy metadata. metadata is formatted or created in accordance with existing input standards such as cataloging cultural objects (cco) and resource description and access (rda), and it is enhanced information technologies and libraries |december 2015 9 as much as possible using controlled vocabularies such as the art and architectural thesaurus (aat) and library of congress subject headings. the metadata librarian performs quality assurance on the metadata records during creation and before the collection is published. in the final stages, the collection is created in the digital library software, at which time search and display options are established: thumbnail labels, default collection sorting, faceted browsing fields, etc. then the files and metadata are uploaded and published online. the highlight of the cu-boulder digital library is the twenty-seven collections drawn from local holdings in archives, special collections department, music library, and earth sciences and map library, among others. the library also contains purchased content and “luna commons” collections created by institutions that use the same digital library platform, for a total of more than 185,000 images, texts, maps, audio recordings, and videos. 
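as a rough illustration of the metadata templating step described above, the following sketch maps a row of legacy inventory data into a simple dublin core record and runs a basic quality-assurance check. the field names, the required-element list, and the sample data are assumptions made for illustration; they are not the cu-boulder team's actual templates or standards profile.

```python
# A minimal sketch of a metadata templating step: mapping legacy descriptive data
# into a simple Dublin Core record and checking required fields before ingest.
# Field choices, the required-field list, and the sample row are illustrative assumptions.
from typing import Dict, List

REQUIRED_DC_FIELDS = ["title", "date", "type", "rights"]  # assumed local policy

def legacy_to_dublin_core(legacy: Dict[str, str]) -> Dict[str, List[str]]:
    """Map a legacy inventory row (e.g., from a spreadsheet) to Dublin Core elements."""
    record = {
        "title": [legacy.get("item_title", "").strip()],
        "creator": [legacy.get("photographer", "").strip()],
        "date": [legacy.get("date_taken", "").strip()],
        "subject": [s.strip() for s in legacy.get("topics", "").split(";") if s.strip()],
        "type": ["StillImage"],
        "rights": [legacy.get("rights_statement", "").strip()],
    }
    # Drop empty elements rather than publishing blank fields.
    return {k: v for k, v in record.items() if any(v)}

def missing_required(record: Dict[str, List[str]]) -> List[str]:
    """Quality-assurance check: which required elements are absent?"""
    return [f for f in REQUIRED_DC_FIELDS if f not in record]

if __name__ == "__main__":
    row = {"item_title": "Columbine Mine, 1927", "date_taken": "1927",
           "topics": "Coal mines; Labor strikes", "rights_statement": "Public domain"}
    dc = legacy_to_dublin_core(row)
    print(dc)
    print("missing:", missing_required(dc))  # empty here; flags gaps on sparser legacy rows
```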
the following four collections were created during the scrum pilot and illustrate the types of materials available in the cuboulder digital library: the colorado coal project consists of video and audio interviews, transcripts, and slides collected between 1974 and 1979 by the university of colorado coal project. the project was funded by the colorado humanities program and the national endowment for the humanities to create an ethnographic record of the history of coal mining in the western united states from immigration and daily life in the coal camps to labor conditions and strikes, including ludlow (1913–14) and columbine (1927). the mining maps collection provides access to scanned maps of various mines, lodes, and claims in colorado from the late 1800s to the early 1900s. these maps come from a variety of creators, including private publishers and us government agencies. digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 10 the vasulka media archive showcases the work of pioneering video artists steina and woody vasulka and contains some of their cutting-edge studies in video that experiment with form, content, and presentation. steina, an icelander, educated in music at the prague conservatory of music, and woody, a graduate of prague's film academy, arrived in new york city just in time for the new media explosion. they brought with them their experience of the european media awakening, which helped them blend seamlessly into the youth media revolution of the late sixties and early seventies in the united states. the 3d natural history collection comprises one hundred archaeology and paleontology specimens from the rocky mountain and southwest regions, including baskets, moccasins, animal figurines, game pieces, jewelry, tools, and other everyday objects from the freemont, clovis, and ancestral puebloan cultures as well as a selection of vertebrate, invertebrate, and track paleontology specimens from the mesozoic through the cenozoic eras (250 ma to the present). the diffusion of effort across multiple collections and a slower publication rate for larger collections offered opportunities for improvement. after attending a conference session on scrum project management for web development projects, one of the team members recognized scrum’s potential to improve production processes since the technique divides large projects into manageable subtasks that can be accomplished in regular, short intervals.15 this approach would allow the team to switch between different high priority collections at regularly defined intervals to facilitate steady progress on competing priorities. working in sprints would also make it easier to publish smaller portions of a large collection at regular intervals. thus scrum held the potential to increase the production rate for larger collections and make the team’s progress more transparent to users and colleagues. in april 2013, a small team of cu-boulder librarians and staff initiated a pilot to assess the effect on processes and outcomes for digital collection production. 
rather than involving individuals from all affected units, regardless of their level of engagement in a particular project, the scrum pilot was limited to the three individuals who were involved in most, if not all, of the projects information technologies and libraries |december 2015 11 undertaken: the digital initiatives librarian, metadata librarian, and digitization lab manager.16 by including these three individuals, the major functions of metadata provision, digitization, and publication were covered in the trial with no disruption to the existing workflows or organizational structures. selecting this group also ensured that scrum would be tested in a broad range of scenarios and on collections from several different departments. to begin, the team met to review the scrum project management framework and considered how best to pilot the technique. taking a pragmatic approach, they only adopted those aspects of scrum that were deemed most likely to result in improved outcomes. if the pilot were successful, other aspects of scrum could be incrementally incorporated later. the group discussed how scrum roles, processes, and tools could be adapted to digital collection workflows and determined that sprints would likely have the highest return on investment. they also chose to adapt and hybridize certain aspects of the planning meeting and daily scrum to achieve goals that were not being met by other existing meetings. sprint planning and end meetings were combined so that all three participants knew what each had completed and what was targeted for the next sprint. select activities of sprint planning and end meetings were already a part of the monthly dlmg meetings, making additional sprint meetings redundant. daily scrum meetings were excluded as the team felt that daily meetings would not produce enough benefit to justify the costs. in addition, two of the three participants have numerous responsibilities that lie outside of projects subject to the scrum pilot, so each person does not necessarily perform scrum-related work every day. however, the short meeting time was adopted into the planning/end meeting, as were elements of the three core questions of the daily scrum meeting, with some modifications. the questions addressed in the biweekly meetings are: what have you done since the last meeting? what are you planning for the next meeting? what impediments, if any, did you encounter during the sprint? the latter question was sometimes addressed mid-sprint through emails, phone calls, or one-off meetings that include a larger or different group of stakeholders. the team adopted the two-week duration typical of scrum sprints for the pilot. this has proven to be a good medium-term timeframe. it was short enough that the team could adjust priorities quickly, but long enough to complete significant work. the team chose to combine the sprint planning and sprint review meetings into a single meeting. part of the motivation for a trial of the scrum technique was to minimize additional time away from projects while maximizing information transfer during the meetings. a single biweekly planning/review meeting was determined to be sufficient to report accomplishments and set goals yet substantial and free of irrelevant content without being overly burdensome as “yet another meeting.” at each sprint meeting, each participant reported on results from the previous sprint. work that was completed allowed the next phase of a project to proceed. 
based on the results of the last sprint, each team member set measurable goals that could be realistically met in the next twoweek sprint. there has been a concerted effort to keep the meetings short, limited to about twenty digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 12 to twenty-five minutes. to enforce this habit, the sprint meetings were scheduled to begin twenty minutes before other regularly scheduled meetings for most or all of the participants. this helped keep participants on-topic and reinforced the transfer-of-information aspect of the meetings, with minimal leeway for extraneous topics. reflection the modified scrum methodology described above has been in place for more than a year. there have been several positive outcomes resulting from this practice. beginning with the most practical, production has become more regular than it was before scrum was implemented. the nature of digital initiatives in this environment dictates that many projects are in progress at once, in various stages of completion. the production work, such as digitizing media or creating metadata records, has become more consistent and regular. instead of production peaks and valleys, there is more of a straight line as portions of projects are finished and others come online. this in turn has resulted in faster publication of collections. in 2013, the team published six new collections, twice as many as the previous year. the ability to put all hands on deck for a project for a two-week period can increase productivity. since sprints allow for short, concentrated bursts of work on a single project, smaller projects can be completed in a few sprints and larger projects can be divided into “potentially shippable units” and thus published incrementally. another benefit of scrum is that the variability of the two-week sprint cycle allows the team to work on more collections concurrently. for example, during a given sprint, scanning is underway for one collection, a metadata template is being constructed for another, the analog material in a third is being examined for pre-scanning preservation assessment, and a fourth collection is being published. while this type of multitasking occurred before the team piloted sprints, the scrum project management framework lends more structure and coordination to the various team members’ efforts. collection building activities can be broken down into subtasks that are accomplished in nonconsecutive sprints without undercutting the team’s concerted efforts. as a result, the team can juggle competing priorities much more effectively. the team is working with multiple stakeholders at any given time, each of whom may have several projects planned or in progress. as focus shifts among stakeholders and their respective projects, the scrum team is able to adjust quickly to align with those priorities, even if only for a single sprint. this also makes it easier to respond to emerging requests or address small, focused projects on the basis of events such as exhibits or course assignments. additional benefits of the scrum methodology pertain to communication and work style among the three scrum participants. the frequent, short meetings are densely packed and highly focused. each person has only a few minutes to describe what has been accomplished, explain problems encountered, troubleshoot solutions, and share plans for the next sprint. 
the return on the time investment of twenty minutes every two weeks is significant—there is no time to waste on issues that do not pertain directly to the projects underway, just completed, or about to start. a further result is that the group’s sense of itself as a team is enhanced. as stated above, the three scrum information technologies and libraries |december 2015 13 participants do not all work in the same administrative unit within the library. though they shared frequent communication by email as projects progressed, regular sprint meetings have fostered a closer sense of team. the participants know from sprint to sprint what the others are doing; they can assist one another with problems face-to-face and coordinate with one another so that work segments progress toward production in a logical sequence. with more than a year of experience with scrum, the pilot team has determined that several aspects of the methodology have worked well in our environment. in general, the sprint pattern fits well with existing operating modes. the monthly dlmg meeting, which includes a large and diverse group, provides an opportunity to discuss priorities, review project proposals, establish standards, and make strategic decisions. the bi-weekly sprint meetings dovetail nicely, with one meeting taking place at a midpoint between dlmg meetings, and one just prior to dlmg meetings. this allows the three scrum participants to focus on strategic items during the dlmg meeting but keep a close eye on operational items in between. the scrum methodology has also accommodated the competing priorities that the three participants must balance on an ongoing basis. there is considerable variation between participants in terms of roles and responsibilities, but the division of work into sprints has given the team greater opportunity to fit production work in with other responsibilities, such as supervision and training; scholarly research and writing; service performed for disciplinary organizations; infrastructure building; and planning, research, and design work for future projects. the two-week sprint duration is a productive time interval during which the team can set and reach incremental goals, whether that is starting and finishing a small project on short notice, making a big push on a large-scale project, or continuing gradual progress on a large, deliberatelypaced initiative. the brief meetings ensure that participants focus on the previous sprint and the upcoming sprint. there is usually just enough time to discuss accomplishments, goals, and obstacles, with some time left to troubleshoot as necessary. the meeting schedule and structure allows each individual to set his or her own goals so that he or she can make maximum progress during the sprint. this in turn feeds into accountability. there is always an external check on one’s progress—the next meeting comes up in two weeks, creating an effective deadline (which also sometimes corresponds to a project deadline). it becomes easier to stay on task and keep goals in sight with the sprint report looming in a matter of days. at the same time, scrum helps to define each person’s role and clarifies how roles align with each other. some tasks are completely independent, while others must be done in sequence and depend on another’s work. the sprint schedule allows large, complex projects to be divided into manageable pieces so that each sprint can result in a sense of accomplishment, even if it may require many sprint cycles to actually complete a project. 
this is especially true for large digital initiatives. for instance, completing the entire project may take a year, but subsets of a collection may be published in phases at more frequent intervals in the meantime. digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 14 summary of benefits ● enhanced ability to manage multiple concurrent projects ● published large collections incrementally, increasing responsiveness to users and other stakeholders ● improved team building ● increased communication and accountability among team members future considerations based on these outcomes, the team can safely say that it met its objectives for the test pilot. one of the reasons that it was feasible to try this when the participants were already highly committed is that the pilot used a small portion of the scrum methodology and was not too rigid in its approach. the team felt that a hybrid of the scrum planning and scrum review meeting held twice a month would provide the benefits without overburdening schedules with additional meetings. there were also plans to have a virtual email check-in every other week to loosely achieve the goals of the daily scrum meeting, that is, to improve communication and accountability. the email check-in fell by the wayside; the team found it wasn’t necessary because there were already adequate opportunities to check-in with each other over the course of a two-week sprint. the team has found the sprints and modified scrum meetings to be highly useful and relatively easy to incorporate into their workflows. the next phase of the pilot will implement product backlogs and burn down charts, diagrams showing how much work remains for the team in a single sprint, with the goal of tracking collections’ progress at the item level through each step of the planning, selection, preservation assessment, digitization, metadata provisioning, and publication workflows. figure 2. hypothetical backlog for the first sprint of a digital collection17 information technologies and libraries |december 2015 15 scrum backlogs are arranged on the basis of a task’s perceived benefit for customers. to adapt backlogs for digital collection production work, the backlog task list’s order will instead be based in part on the workflow sequence. for example, pieces from the physical collection must be selected before preservation staff can assess them. additionally, the backlog items will be sequenced according to the materials’ research value or complexity. for instance, the digitization of a folder of significant correspondence from an archival collection would be assigned a higher priority in the backlog than the digitization of newspaper clippings of minor importance from the same collection. or, materials that are easy to scan would be listed in the backlog ahead of fragile or complex items that require more time to complete. this will allow the team to publish the most valuable items from the collection more quickly. according to scrum best practices, backlogs are also appropriately detailed. in the context of digital collection production work, collections’ backlogs would begin with a standard template of high-level activities: materials’ selection, copyright analysis, preservation assessment, digitization, metadata creation, and publication. as the team progresses through backlog items, they will become increasingly detailed. backlogs also evolve. 
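a minimal sketch of what such a collection backlog might look like in practice appears below: items carry a workflow phase, a research-value rating, and an effort estimate, so the list can be prioritized and the remaining effort plotted on a burn down chart. the tasks, ratings, and hour estimates are invented for illustration and are not the pilot team's actual backlog.

```python
# A minimal sketch of a digital-collection backlog ordered first by workflow
# sequence and then by research value, with effort estimates that support a
# simple burn-down calculation. All values here are invented for illustration.
from dataclasses import dataclass

WORKFLOW_ORDER = ["selection", "copyright", "preservation", "digitization", "metadata", "publication"]

@dataclass
class BacklogItem:
    phase: str           # one of WORKFLOW_ORDER
    description: str
    research_value: int  # 1 (low) to 5 (high), set by the collection manager
    effort_hours: float
    done: bool = False

def prioritized(backlog):
    """Workflow sequence first, then higher research value first."""
    return sorted(backlog, key=lambda i: (WORKFLOW_ORDER.index(i.phase), -i.research_value))

def remaining_effort(backlog) -> float:
    """The number a burn-down chart would plot at the end of each sprint."""
    return sum(i.effort_hours for i in backlog if not i.done)

if __name__ == "__main__":
    backlog = [
        BacklogItem("digitization", "scan correspondence folder 1", 5, 12),
        BacklogItem("digitization", "scan newspaper clippings", 2, 8),
        BacklogItem("metadata", "build mods template", 4, 6),
        BacklogItem("selection", "pull fragile oversize maps", 3, 4),
    ]
    for item in prioritized(backlog):
        print(f"{item.phase:12} {item.description:32} value={item.research_value} est={item.effort_hours}h")
    print("remaining effort:", remaining_effort(backlog), "hours")
```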
scrum’s ability to respond to change has been one of its strongest assets in this environment and therefore the backlog’s ability to evolve will make it a valuable addition to the team’s process. for example, materials that a collection manager uncovers and adds to the project late in the process can be easily incorporated into the backlog or materials in the collection that are needed to support an upcoming instruction session can be moved up in the backlog for the next sprint. in this way, the backlog will support the team’s goal to nimbly respond to shifting priorities and emerging opportunities. figure 3. hypothetical burn down chart18 digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 16 the final relevant feature of a backlog, the “effort estimates,” taken in conjunction with the burn down chart will help the team develop better metrics for estimating the time and resources required to complete a collection. when items are added to the backlog, team members estimate the amount of effort needed to complete it. the burn down chart illustrates how much work remains and, in general practice, is updated on a daily basis. given that the team has truncated the scrum meeting schedule, this may occur on a weekly basis, but will nonetheless benefit the team in several ways. initially, it will keep the team on track and provide valuable and detailed information for stakeholders on the collections’ progress. as the team accrues old burn down charts from completed collections, they can use the data to hone their ability to estimate the amount of time and resources needed to complete a given project. conclusion through the pilot conducted for digital initiatives at cu-boulder libraries, application of aspects of the scrum project management framework has demonstrated significant benefits with no discernable downside. adoption of sprint planning and end meetings resulted in several positive outcomes for the participants. digital collection production has become more regular; work can be underway on more collections simultaneously; and collections are, on average, published more quickly. in addition, communication and cooperation among the sprint pilot participants have increased and strengthened the sense of teamwork among them. the sprint schedule has blended well with existing digital initiatives meetings and workflows, and has enhanced the team’s ability to handle ever-shifting priorities. additional aspects of scrum, such as product backlogs and burn down charts, will be incorporated into the participants’ workflows to allow them to better track the work done at the item level, provide more detailed information for stakeholders during the course of a project, and predict how much time and effort will be required for future projects. the positive results of this pilot demonstrate the benefits to be gained by looking outside standard library practice and adopting techniques developed in another discipline. given the range of activities performed in libraries, the possibilities to improve workflows and increase efficiency are limitless as long as those doing the work keep an open mind and a sharp eye out for methodologies that could ultimately benefit their work, and in turn, their users. references 1. sueli mara ferreira and denise nunes pithan, “usability of digital libraries,” oclc systems & services: international digital library perspectives 21, no. 4 (2005): 316, doi: 10.1108/10650750510631695. 2. danielle a. 
becker and lauren yannotta, “modeling a library web site redesign process: developing a user-centered web site through usability testing,” information technology & libraries 32, no. 1 (2013): 11, doi: 10.6017/ital.v32i1.2311. 3. jodi condit fagan, “usability testing of a large, multidisciplinary library database: basic search and visual search,” information technology & libraries 25 no. 3 (2006): 140–41, 10.6017/ital.v25i3.3345. http://dx.doi.org/10.1108/10650750510631695 http://dx.doi.org/10.6017/ital.v32i1.2311 http://dx.doi.org/10.6017/ital.v25i3.3345 information technologies and libraries |december 2015 17 4. panayiotis zaphiris, kulvinder gill, terry h.-y. ma, stephanie wilson and helen petrie, “exploring the use of information visualization for digital libraries,” new review of information networking 10, no. 1 (2004): 58, doi: 10.1080/1361457042000304136. 5. benjamin meunier and olaf eigenbrodt, “more than bricks and mortar: building a community of users through library design,” journal of library administration 54 no. 3 (2014): 218–19, 10.1080/01930826.2014.915166. 6. lisa a. palmer and barbara c. ingrassia, “utilizing the power of continuous process improvement in technical services,” journal of hospital librarianship 5 no. 3 (2005): 94–95, 10.1300/j186v05n03_09. 7. javier d. fernández et al., “agile dl: building a delos-conformed digital library using agile software development,” in research and advanced technology for digital libraries, edited by birte christensen-dalsgaard et al. (berlin: springer-verlag, 2008), 398–9, doi: 10.1007/978-3540-87599-4_44. 8. michelle frisque, “using scrum to streamline web applications development and improve transparency” (paper presented at the 13th annual lita national forum, atlanta, georgia, september 30–october 3, 2010). 9. frank h. cervone, “understanding agile project management methods using scrum,” oclc systems & services 27, no. 1 (2011): 19, 10.1108/10650751111106528. 10. pete deemer, gabrielle benefield, craig larman, and bas vodde, “the scrum primer: a lightweight guide to the theory and practice of scrum," (2012), 3-15, www.infoq.com/minibooks/scrum_primer. 11. eliza s. f. cardozo et al., “scrum and productivity in software projects: a systematic literature review” (paper presented at the 14th international conference on evaluation and assessment in software engineering (ease), 2010), 3. 12. cervone, “understanding agile project management,” 18. 13. ibid., 19. 14. sticky shed syndrome refers to the degradation of magnetic tape where the binder separates from the carrier. the binder can then stick to the playback equipment rendering the tape unplayable. 15. frisque, “using scrum.” 16. the media lab manager responsible for audio and video digitization did not participate because his lab offers fee-based services to the public and thus has long-established business processes in place that would not have blended easily with sprints. 17. figure 2 is based on illustration created by mountain goat software, “sprint backlog,” https://www.mountaingoatsoftware.com/agile/scrum/sprint-backlog. 18. figure 3 is adapted from template created by expert project management, “burn down chart template,” www.expertprogrammanagement.com/wpcontent/uploads/templates/burndown.xls. 
http://dx.doi.org/10.1080/1361457042000304136 http://dx.doi.org/10.1080/01930826.2014.915166 http://dx.doi.org/10.1300/j186v05n03_09 http://dx.doi.org/10.1007/978-3-540-87599-4_44 http://dx.doi.org/10.1007/978-3-540-87599-4_44 http://dx.doi.org/10.1108/10650751111106528 https://www.mountaingoatsoftware.com/agile/scrum/sprint-backlog http://www.expertprogrammanagement.com/wp-content/uploads/templates/burndown.xls http://www.expertprogrammanagement.com/wp-content/uploads/templates/burndown.xls application level security in a public library: a case study richard thomchick and tonia san nicolas-rocca information technology and libraries | december 2018 107 richard thomchick (richardt@vmware.com) is mlis, san josé state university. tonia san nicolas-rocca (tonia.sannicolas-rocca@sjsu.edu) is assistant professor in the school of information at san josé state university. abstract libraries have historically made great efforts to ensure the confidentiality of patron personally identifiable information (pii), but the rapid, widespread adoption of information technology and the internet have given rise to new privacy and security challenges. hypertext transport protocol secure (https) is a form of hypertext transport protocol (http) that enables secure communication over the public internet and provides a deterministic way to guarantee data confidentiality so that attackers cannot eavesdrop on communications. https has been used to protect sensitive information exchanges, but security exploits such as passive and active attacks have exposed the need to implement https in a more rigorous and pervasive manner. this report is intended to shed light on the state of https implementation in libraries, and to suggest ways in which libraries can evaluate and improve application security so that they can better protect the confidentiality of pii about library patrons. introduction patron privacy is fundamental to the practice of librarianship in the united states (u.s.). libraries have historically made great efforts to ensure the confidentiality of personally identifiable information (pii), but the rapid, widespread adoption of information technology and the internet have given rise to new privacy and security challenges. the usa patriot act, the rollback of the federal communications commission rules prohibiting internet service providers from selling customer browsing histories without the customer’s permission, along with electronic surveillance efforts by the national security agency (nsa) and other government agencies, have further intensified privacy concerns about sensitive information that is transmitted over the public internet when patrons interact with electronic library resources through online systems such as an online public access catalog (opac). 1 hypertext transport protocol secure (https) is a form of hypertext transport protocol (http) that enables secure communication over the public internet and provides a deterministic way to guarantee data confidentiality so that attackers cannot eavesdrop on communications. https has been used to protect sensitive information exchanges (i.e., e-commerce transactions, user authentication, etc.). in practice, however, security exploits such as man-in-the-middle attacks have demonstrated the relative ease with which an attacker can transparently eavesdrop on or hijack http traffic by targeting gaps in https implementation. 
there is little or no evidence in the literature that libraries are aware of the associated vulnerabilities, threats, or risks, or that researchers have evaluated the use of https in library web applications. this report is intended to shed light on the state of https implementation in libraries, and to suggest ways in which libraries can evaluate and improve application security so that they can better protect the mailto:richardt@vmware.com mailto:tonia.sannicolas-rocca@sjsu.edu application level security in a public library |thomchick and san nicolas-rocca 108 https://doi.org/10.6017/ital.v37i4.10405 confidentiality of pii about library patrons. the structure of this paper is as follows. first, we review the literature on privacy as it pertains to librarianship and cybersecurity. we then describe the testing and research methods used to evaluate https implementation. a discussion on the results of the findings is presented. finally, we explain the limitations and suggest future research directions. literature review the research begins with a survey of the literature on the topic of confidentiality as it pertains to patron privacy; the impact of information technology on libraries; and the use of https as a security control to protect the confidentiality of patron data when it is transmitted over the public internet. while there is ample literature on the topic of patron privacy, there appears to be a lack of empirical studies that measure the use of https to protect the privacy of data transmitted to and from patrons when they use library web applications.2 the primal importance of patron privacy patron privacy has long been one of the most important principles of the library profession in the u.s. as early as 1939, the code of ethics for librarians explicitly stated, “it is the librarian’s obligation to treat as confidential any private information obtained through contact with li brary patrons.”3 the concept of privacy as applied to personal and circulation data in library records began to appear in the library literature not long after the passage of the u.s. privacy act of 1974.4 today, the american library association (ala) regards privacy as “fundamental to the ethics and practice of librarianship,” and has formally adopted a policy regarding the confidentiality of personally identifiable information (pii) about library users, which asserts, “confidentiality exists when a library is in possession of personally identifiable information about users and keeps that information private on their behalf.”5 this policy affirms language from the ala code of ethics, and states that “confidentiality extends to information sought or received and resources consulted, borrowed, acquired or transmitted including database search records, reference questions and interviews, circulation records, interlibrary loan records, information about materials downloaded or placed on ‘hold’ or ‘reserve,’ and other personally identifiable information about uses of library materials, programs, facilities, or services.” 6 with the advent of new technologies used in libraries to support information discovery, more challenges arise to protect patron privacy.7 the impact of information technology on patron privacy researchers have studied the impact of information technology on patron privacy for several decades. 
early research by harter and machovec discussed the data privacy challenges arising from the use of automated systems in the library, and the associated ethical considerations for librarians who create, view, modify, and use patron records.8 fouty addressed issues regarding the privacy of patron data contained in library databases, arguing that online patron records provide more information about individual library users, more quickly, than traditional paperbased files.9 agnew and miller presented a hypothetical case involving the transmission of an obscene email from a library computer, and an ensuing fbi inquiry, as a method of examining privacy issues that arise from patron internet use at the library.10 in addition, merry pointed to the potential for violations of patron privacy brought about by tracking of personal information attached to electronic text supplied by publishers.11 information technology and libraries | december 2018 109 the consensus from the literature, as articulated by fifarek, is that technology has given rise to new privacy challenges, and that the adoption of technology in the library has outpaced efforts to maintain patron privacy.12 this sentiment was echoed and amplified by john berry, former ala president, who commented that there are “deeper issues that arise from the impact of converting information to digitized, online formats” and critiqued the library profession for having “not built protections for such fundamental rights as those to free expression, privacy, and freedom.”13 ala affirmed these findings and validated much of the prevailing research in a report from the library information technology association, which concluded, “user records have also expanded beyond the standard lists of library cardholders and circulation records as libraries begin to use electronic communication methods such as electronic mail for reference services, and as they provide access to computer, web and printing use.”14 in more recent years, library systems have made increasing use of network communication protocols such as http and focus of the literature has shifted towards internet technologies in response to the growth of trends such as cloud computing and web 2.0. mavodza characterizes the relevance of cloud computing as “unavoidable” and expounds on the ways in which software-as-aservice (saas), platform as a service (paas), and infrastructure as a service (iaas) and other cloud computing models “bring to the forefront considerations about . . . information security [and] privacy . . . that the librarian has to be knowledgeable about.”15 levy and bérard caution that nextgeneration library systems and web-based solutions are “a breakthrough but need careful scrutiny” of security, privacy, and related issues such as data provenance (i.e., where the information is physically stored, which can potentially affect security and privacy compliance requirements). 16 protecting patron privacy in the “library 2.0” era “library 2.0” is an approach to librarianship that emphasizes engagement and multidirectional interaction with library patrons. 
although this model is “broader than just online communication and collaboration” and “encompasses both physical and virtual spaces,” there can be no doubt that “library 2.0 is rooted in the global web 2.0 discussion,” and that libraries have made increasing use of web 2.0 technologies to engage patrons.17 the library 2.0 model disrupts many traditional practices for protecting privacy, such as limited tracking of user activity, short-term data retention policies, and anonymous browsing of physical materials. instead, as zimmer states, “the norms of web 2.0 promote the open sharing of information—often personal information—and the design of many library 2.0 services capitalize on access to patron information and might require additional tracking, collection, and aggregation of patron activities.”18 as ala cautioned in their study on privacy and confidentiality, “libraries that provide materials over websites controlled by the library must determine the appropriate use of any data describing user activity logged or gathered by the web server software.”19 the dilemma facing libraries in the library 2.0 era, then, is how to appropriately leverage user information while maintaining patron privacy. many library systems require users to validate their identity through the use of a username, password, pin code, or another unique identifier for access to their library circulation records and other personal information.20 however, several studies suggest the authentication process itself spawns a trail of personally identifiable information about library patrons that must be kept confidential.21 there is discussion in the literature about the value of using https and ssl certificates to protect patron privacy and build a high level of trust with users, and general awareness about importance of encrypting communications that involve sensitive information, such as “payment for fines and fees via the opac” or when “patrons are required to enter personal application level security in a public library |thomchick and san nicolas-rocca 110 https://doi.org/10.6017/ital.v37i4.10405 details such as addresses, phone numbers, usernames, and/or passwords.”22 however, as breeding observed, many opacs and other library automation software products “don't use ssl by default, even when processing these personalization features.” 23 these observations call library privacy practices into question, and are concerning since “hackers have identified library ilss as vulnerable, especially when libraries do not enforce strict system security protocols.” 24 one of the challenges facing libraries is the perception that “a library's basic website and online catalog functions don't need enhanced security.”25 as a matter-of-fact, one of the most common complaints against https implementation in libraries has been: “we don’t serve any sensitive information.”26 these beliefs may be based on the historical practice of using https selectively to secure “sensitive” information and operations such as user authentication. but in recent years, it has become clear that selective https implementation is not an adequate defense. the electronic frontier foundation (eff) cautions, “some site operators provide only the login page over https, on the theory that only the user’s password is sensitive. these sites’ users are vulnerable to passive and active attacks.”27 passive attacks do not alter systems or data. during a passive attack, a hacker will attempt to listen in on communications over a network. 
eavesdropping is an example of a passive attack.28 active attacks alter systems or data. during this type of attack, a hacker will attempt to break into a system to make changes to transmitted or stored data, or introduce data into the system. examples of active attacks include man-in-the-middle, impersonation, and session hijacking.29 http exploits web servers typically generate unique session token ids for authenticated users and transmit them to the browser, where they are cached in the form of cookies. session hijacking is a type of attack that “compromises the session token by stealing or predicting a valid session token to gain unauthorized access to the web server,” often by using a network sniffer to capture a valid session id that can be used to gain access to the server.30 session hijacking is not a new problem, but the release of the firesheep attack kit in 2010 increased awareness about the inherent insecurity of http and the need for persistent https.31 in the wake of firesheep’s release and several major security breaches, senator charles schumer, in a letter to yahoo!, twitter, and amazon, characterized http as a “welcome mat for would-be hackers” and urged the technology industry to implement better security as quickly as possible.32 these and other events prompted several major site operators, including google, facebook, paypal, and twitter, to switch from partial to pervasive https. today these sites transmit virtually all web application traffic over https. security researchers from these companies, as well as from several standards organizations such as electronic frontier foundation (eff), internet engineering task force (ietf), and open web application security project have shared their experiences and recommendations to help other website operators implement https effectively.33 these include encrypting the entire session, avoiding mixed content, configuring cookies correctly, using valid ssl certificates, and enabling hsts to enforce https. testing techniques used to evaluate https implementation there is little or no evidence in the literature that libraries are aware of the associated vulnerabilities, threats, or risks, or that researchers have evaluated the use of https in library web applications. however, there are many methods that libraries can use to evaluate https and information technology and libraries | december 2018 111 ssl/tls implementation, including automated software tools and heuristic evaluations. these methods can be combined for deeper analysis. automated software tools among the most widely used automated analysis software tools is ssl server test from qualys ssl labs. this online service “performs a deep analysis of the configuration of any ssl web server on the public internet” and provides a visual summary as well as detailed information about authentication (certification and certificate chains) and configuration (protocols, key strength, cipher suites, and protocol details).34 users can optionally post the results to a central “board” that acts as a clearinghouse for identifying “insecure” and “trusted” sites. another popular tool is sslscan, a command-line application that, as the name implies, quickly “queries ssl services, such as https, in order to determine the ciphers that are supported.”35 however, these tools are limited in that they only report specific types of data and do not provide a holistic view of https implementation. 
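as a concrete illustration of the automated tools just described, the commands below show how a library might run such checks against its own catalog host from a unix shell. the hostname catalog.example.org is a placeholder rather than a site examined in this study, and output will vary with the installed tool version.

    # enumerate the protocols and ciphers offered by an opac host
    sslscan catalog.example.org

    # the qualys ssl labs analysis can also be requested non-interactively
    # through its public api, which returns the scan results as json
    curl "https://api.ssllabs.com/api/v3/analyze?host=catalog.example.org"

either report can then be compared against the configuration flaws discussed later in this article, such as support for retired ssl versions or missing tls 1.2 support.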
heuristic evaluations in addition to automated software tools, librarians can also use heuristic evaluations to manually inspect the gray areas of https implementation, either to validate the results of automated software or to examine aspects not included in the functionality of these tools. one example is httpsnow, a service that lets users report and view information about how websites use https. httpsnow enables this activity by providing heuristics that non-technical audiences can use to derive a relatively accurate assessment of https deployment on any particular website or application. the project documentation includes descriptions of, and guidance for identifying, http-related vulnerabilities such as use of http during authenticated user sessions, presence of mixed content (instances in which content on a webpage is transmitted via https while other content elements are transmitted via http), insecure cookie configurations, and use of invalid ssl certificates. research methodology a combination of heuristic and automated methods was used to evaluate https implementation in a public library web application to determine how many security vulnerabilities exist in the application and assess to the potential privacy risks to the library’s patrons. research location this research project was conducted at a public library in the western us that we will call west coast public library (wcpl). this library was established in 1908 and employs ninety staff and approximately forty volunteers. in addition, it has approximately 91,000 cardholders. as part of its operations, wcpl runs a public-facing website and an integrated library system (ils) that includes an opac with personalization for authenticated users. test to conduct the test, a valid wcpl library patron account was created and used to authenticate one of the authors for access to account information and personalized features of wcpl’s opac. next, the google chrome web browser was used to visit wcpl’s public-facing website. a valid patron name, library card number, and eight-digit pin number were then used to gain access to online account information. several tasks were performed to evaluate https usage. a sample search application level security in a public library |thomchick and san nicolas-rocca 112 https://doi.org/10.6017/ital.v37i4.10405 query for the keyword “recipes” was performed in the opac while logged in. the description pages for two of the resources listed in the search engine result page (one printed resource and one electronic resource) were clicked on and viewed. the electronic resource was added to the online account’s “book cart” and the book cart page was viewed. during these activities, httpsnow heuristics were applied to individual webpages and to the user session as a whole. the web browser’s url address window was inspected to determine whether some or all pages were transmitted via http or https. the url icon in the browser’s address bar was clicked on to view a list of the cookies that the application set in the browser. each cookie was inspected for the text, "send for: encrypted connections only," which indicates that the cookie is secure. individual webpages were checked for the presence of mixed (encrypted and unencrypted) content. information about individual ssl certificates was inspected to determine their validity and encryption key length. all domain and subdomain names encountered during these activ ities were documented. 
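the browser-based heuristics described above can also be partially scripted. the sketch below is not part of the original study, which relied on manual inspection in chrome; it is a minimal perl illustration, using the widely available lwp::useragent module, of three of the checks: whether a request is silently downgraded from https to http, whether session cookies carry the secure attribute, and whether an https page pulls in subresources over plain http. the catalog url and paths are hypothetical.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    # hypothetical opac page; substitute the page being evaluated
    my $url = 'https://catalog.example.org/patroninfo';

    my $ua  = LWP::UserAgent->new( max_redirect => 5 );
    my $res = $ua->get($url);

    # 1. detect a downgrade: the final uri after redirects should still be https
    my $final = $res->request->uri->as_string;
    print "requested $url\ndelivered $final\n";
    print "warning: https request was redirected to plain http\n"
        if $final =~ m{^http://};

    # 2. inspect set-cookie headers for the secure attribute
    for my $cookie ( $res->header('Set-Cookie') ) {
        my $flag = ( $cookie =~ /;\s*secure/i ) ? 'secure' : 'not secure';
        print "cookie: $cookie [$flag]\n";
    }

    # 3. flag mixed content: subresources (images, scripts, iframes) fetched over http
    my $html = $res->decoded_content || '';
    while ( $html =~ /src\s*=\s*["'](http:\/\/[^"']+)["']/gi ) {
        print "insecure resource reference: $1\n";
    }

a script of this kind cannot replace a full vulnerability assessment, but it makes it easy to repeat the same spot checks after each vendor or configuration change.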
the google chrome web browser was then used to access the qualys ssl server test tool. each domain name encountered was submitted. test results were then examined to determine whether any authentication or configuration flaws exist in wcpl’s web applications. results and discussion given the recommendations suggested by several organizations (e.g., eff, ietf, owasp), we evaluated wcpl’s web application to determine how many security vulnerabilities exist in the application, and assess the potential privacy risks to the library’s patrons. the results of tests, as discussed below, suggest that wcpl’s web application processes a number of vulnerabilities that could potentially be exploited by attackers and compromise the confidentiality of pii about library patrons. this is not surprising given the lack of research on https implementation, as well as the general consensus in the literature that technology adoption has outpaced efforts to maintain patron privacy. based on the results of these tests, wcpl’s website and ils span across several domains. some of these domains appear to be operated by wcpl, while others appear to be part of a hosted environment operated by the ils vendor. based on this information, it is reasonable to conclude that wcpl’s ils utilizes a “hybrid cloud” model. in addition, random use of https is observed in the opac interface during the testing process. this is discussed in the following sections. use of http during authenticated user sessions library patrons use wcpl’s website and opac to access and search for books and other material available through the library. given the results of the tests, wcpl does not use https pervasively across its entire web application. during the test, we found that wcpl’s website is transmitted via http by default. this was after manually entering in the url with an “https” prefix, which resulted in a redirect to the unencrypted “http” page. we continued to test wcpl’s website and opac by performing a query using the search bar located on the patron account page. we found that wcpl’s opac transmits some pages over http and others over https. for example, when a search query is performed in the search bar located on the patron account page, the search engine results page is sometimes served over https, and sometimes over http (see figure 1). this behavior is not limited to specific pages; rather it appears to be random. this security flaw leaves library patrons vulnerable to passive and active attacks that exploit gaps in https implementation, which allows an attacker to eavesdrop on and hijack a user-session providing the attacker with access to private information. information technology and libraries | december 2018 113 figure 1. results of the library’s use of https. presence of mixed content when a library patron visits a webpage served over https, the connection with the web server is encrypted, and therefore, safeguarded from attack. if an https webpage includes content retrieved via http, the webpage is only partially encrypted, leaving the unencrypted content vulnerable to attackers. analysis of wcpl’s website did not reveal any explicit use of mixed content on the public-facing portion of the site. test results, however, detected unencrypted content sources on some pages of the library’s online catalog. this, unfortunately, puts patron privacy at risk as attackers can intercept the http resources when an https webpage loads content such as an image, iframe or font over http. 
this compromises the security of what is perceived to be a secure site by enabling an attacker to exploit an insecure css file or javascript function, leading to disclosure of sensitive data, malicious website redirect, man-in-the-middle attacks, phishing, and other active attacks.36 insecure cookie management cookies are small text files, sent from a web server and stored on user computers via web browsers. cookies can be divided into two categories: session and persistent. persistent cookies are stored on the user’s hard drive until they are erased or expire. unlike persistent cookies, session cookies are stored in memory and erased once the user closes their browser. provided that computer settings allow for it, cookies are created when a user visits a website. cookies can be set up such that communication is limited to encrypted communication, and can be used to remember login credentials, previous information entered into forms, such as name, mailing address, email address, and the like. cookies can also be used to monitor the number of times a user visits a website, the pages a user visits, and the amount of time spent on a webpage. application level security in a public library |thomchick and san nicolas-rocca 114 https://doi.org/10.6017/ital.v37i4.10405 the results of the tests suggest that wcpl’s cookie policies are inconsistent. we found two types of cookies present. within one domain, the web application uses a jsession cookie that is configured to send for “secure connections only.” this indicates that the session id cookie is encrypted during transmission. another domain uses an asp.net session id that is configured to send for any connection, which means the session id could be transmitted in an unencrypted format. cookies transmitted in an unencrypted format could be intercepted by an attacker in order to eavesdrop on or hijack user sessions. this leaves user privacy vulnerable given the type of information contained within cookies. flawed encryption protocol support transport layer security (tls) is a protocol designed to provide secure communication over the web. websites using tls, therefore, provide a secure communication path between their web servers and web browsers preventing eavesdropping, hijacking, and other active attacks. this study employed the ssl server test from qualys ssl labs to perform an analysis of wcpl’s web applications. results of the qualys test (see figure 2) indicate that the site does not support tls 1.2, which means the server may be vulnerable to passive and active attacks, thereby providing hackers with access to data passed between a web server and web browser accessing the server. in addition, the application’s server platform supports ssl 2.0, which is insecure because it is subject to a number of passive and active attacks leading to loss of confidentiality, privacy, and integrity. figure 2. qualys scanning service results. the vulnerabilities discovered during the testing process may be a result of uncoordinated security. this is concerning because it is a by-product of the cloud computing approach used to operate wcpl’s ils. while libraries may have acclimated to the challenge of coordinating security measures across a distributed application, they now face the added complexity of coordinating information technology and libraries | december 2018 115 security measures with their vendors, who themselves may also utilize additional cloud-based offerings from third parties. 
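the protocol-support findings reported above can be reproduced with the openssl command-line client, which attempts a handshake using one protocol version at a time. the host below is a placeholder, and flags for retired protocols such as sslv2 and sslv3 are only available in older openssl builds, so these probes are best treated as a rough local cross-check of a qualys report rather than a definitive test.

    # succeeds only if the server completes a tls 1.2 handshake
    openssl s_client -connect catalog.example.org:443 -tls1_2 < /dev/null

    # should be refused by a correctly configured server; current openssl
    # builds omit the -ssl2/-ssl3 options, so a legacy build is needed here
    openssl s_client -connect catalog.example.org:443 -ssl3 < /dev/null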
as cloud technology adoption increases and cloud-based infrastructures become more complex and distributed, attackers will likely attempt to find and exploit systems with inconsistent or uneven security measures, and libraries will need to work closely with information technology vendors to ensure tight coordination of security measures. unencrypted communication using http affects the privacy, security, and integrity of patron data. passive attacks such as eavesdropping, and active attacks such as hijacking, man -in-the-middle, and phishing can reveal patron login credentials, search history, identity, and other sensitive information that, according to ala, should be kept private and confidential. given the results of the testing done in this study, it is clear that wcpl needs to revisit and strengthen their web application security measures by, according to organizations within the security community, using https pervasively across the entire web application, avoiding mixed content, configuring cookies limited to encrypted communication, using valid ssl certificates, and enabling hsts to enforce https. implementing improvements to https will mitigate attacks by strengthening the integrity of wcpl’s web applications, which in turn, will help protect the privacy and confidentiality of library patrons. limitations and future research this research was performed at a public library in the western u.s. therefore, future research is needed to study the implementation of https to increase patron privacy at other public libraries, libraries in other parts of the u.s. and in other countries. it would also be valuable to conduct similar research at libraries of different types, including academic, law, medical, and other types of special libraries. ssl server test from qualys ssl labs and httpsnow were used to evaluate the use of https at wcpl. the use of other evaluation techniques may generate different results. while a major limitation of this study is the evaluation of a single public library and the implementation of https to ensure patron privacy, a next phase of research should further investigate the policies in place that are used to safeguard patron privacy. these include security education, training, and awareness programs, as well as access controls. furthermore, library 2.0 and cloud computing are fundamental to libraries, but create risks that could impact the ability to keep patron pii safeguarded. as such, future research should evaluate the impact library 2.0 and cloud computing applications have on maintaining the confidentiality of patron information. conclusion the library profession has long been a staunch defender of privacy rights, and the literature reviewed indicates strong awareness and concern about the rapid pace of information technology and its impact on the confidentiality of personally identifiable information about library patrons. much work has been done to educate librarians and patrons about the risks facing them and the measures they can take to protect themselves. however, the research and experimentation presented in this report strongly suggest that there is a need for wcpl and other libraries to reassess and strengthen their https implementations. https is not a panacea for mitigating web application risks, but it can help libraries give patrons the assurance of knowing they take security and privacy seriously, and that reasonable steps are being taken to protect them. 
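the remediation steps recommended above (pervasive https, restricted protocol support, secure cookies, and hsts) correspond to a small number of web-server directives. the fragment below is a hedged sketch for the apache http server with mod_ssl and mod_headers enabled; the study does not identify wcpl's actual server platform, and on a hosted ils the equivalent changes would have to be requested from the vendor.

    <VirtualHost *:80>
        ServerName catalog.example.org
        # send every plain-http request to the encrypted site
        Redirect permanent / https://catalog.example.org/
    </VirtualHost>

    <VirtualHost *:443>
        ServerName catalog.example.org
        SSLEngine on
        SSLCertificateFile    /etc/ssl/certs/catalog.example.org.crt
        SSLCertificateKeyFile /etc/ssl/private/catalog.example.org.key
        # drop the legacy protocols flagged by the qualys scan
        SSLProtocol all -SSLv2 -SSLv3
        # instruct browsers to refuse plain http for this host in the future
        Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
    </VirtualHost>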
finally, this report concludes that further research on library application security should be conducted to assess the overall state of application security in public, academic, and special libraries, with the application level security in a public library |thomchick and san nicolas-rocca 116 https://doi.org/10.6017/ital.v37i4.10405 long-term objective of enabling ala and other professional institutions to develop policies and best practices to guide the secure adoption of library 2.0 and cloud computing technologies within a socially connected world. references 1 jon brodkin, “president trump delivers final blow to web browsing privacy rules,” ars technica (april 3, 2017), https://arstechnica.com/tech-policy/2017/04/trumps-signaturemakes-it-official-isp-privacy-rules-are-dead/. 2 shayna pekala, “privacy and user experience in 21st century library discovery,” information technology and libraries 36, no. 2 (2017): 48–58, https://doi.org/10.6017/ital.v36i2.9817. 3 american library association, “history of the code of ethics: 1939 code of ethics for librarians,” accessed may 11, 2018, http://www.ala.org/template.cfm?section=history1&template=/contentmanagement/conte ntdisplay.cfm&contentid=8875. 4 joyce crooks, “civil liberties, libraries, and computers,” library journal 101, no. 3 (1976): 482– 87; stephen harter and charles c. busha, “libraries and privacy legislation,” library journal 101, no. 3 (1976): 475–81; kathleen g. fouty, “online patron records and privacy: service vs. security,” journal of academic librarianship 19, no. 5 (1993): 289–93, https://doi.org/10.1016/0099-1333(93)90024-y. 5 “code of ethics of the american library association,” american library association, amended january 22, 2008, http://www.ala.org/advocacy/proethics/codeofethics/codeethics; “privacy: an interpretation of the library bill of rights,” american library association, amended july 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy. 6 american library association, “privacy: an interpretation of the library bill of rights,” amended july 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy. 7 pekala, “privacy and user,” pp. 48–58. 8 harter and busha, “libraries and privacy legislation,” pp. 475–81; george s. machovec, “data security and privacy in the age of automated library systems,” information intelligence, online libraries, and microcomputers 6, no. 1 (1988). 9 fouty, “online patron records and privacy, pp. 289–93. 10 grace j. agnew and rex miller, “how do you manage?,” library journal 121, no. 2 (1996): 54. 11 lois k. merry, “hey, look who took this out!—privacy in the electronic library,” journal of interlibrary loan, document delivery & information supply 6, no. 4 (1996): 35–44, https://doi.org/10.1300/j110v06n04_04. 
https://arstechnica.com/tech-policy/2017/04/trumps-signature-makes-it-official-isp-privacy-rules-are-dead/ https://arstechnica.com/tech-policy/2017/04/trumps-signature-makes-it-official-isp-privacy-rules-are-dead/ https://doi.org/10.6017/ital.v36i2.9817 http://www.ala.org/template.cfm?section=history1&template=/contentmanagement/contentdisplay.cfm&contentid=8875 http://www.ala.org/template.cfm?section=history1&template=/contentmanagement/contentdisplay.cfm&contentid=8875 https://doi.org/10.1016/0099-1333(93)90024-y http://www.ala.org/advocacy/proethics/codeofethics/codeethics http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy https://doi.org/10.1300/j110v06n04_04 information technology and libraries | december 2018 117 12 aimee fifarek, “technology and privacy in the academic library,” online information review 26, no. 6 (2002): 366–74, https://doi.org/10.1108/14684520210452691. 13 john n. berry iii, “digital democracy: not yet!,” library journal 125, no. 1 (2000): 6. 14 american library association, “appendix—privacy and confidentiality in the electronic environment,” september 28, 2006, http://www.ala.org/lita/involve/taskforces/dissolved/privacy/appendix. 15 judith mavodza, “the impact of cloud computing on the future of academic library practices and services,” new library world 114, no. 3/4 (2012): 132–41, https://doi.org/10.1108/03074801311304041. 16 richard levy, “library in the cloud with diamonds: a critical evaluation of the future of library management systems,” library hi tech news 30, no. 3 (2013): 9–13, https://doi.org/10.1108/lhtn-11-2012-0071; raymond bérard, “next generation library systems: new opportunities and threats,” bibliothek, forschung und praxis 37, no. 1 (2013): 52–58, https://doi.org/10.1515/bfp-2013-0008. 17 michael stephens, “the hyperlinked library: a ttw white paper,” accessed may 13, 2018, http://tametheweb.com/2011/02/21/hyperlinkedlibrary2011/; michael zimmer, “patron privacy in the ‘2.0’ era.” journal of information ethics 22, no. 1 (2013): 44–59, https://doi.org/10.3172/jie.22.1.44. 18 zimmer, “patron privacy in the ‘2.0’ era,” p. 44. 19 “the american library association’s task force on privacy and confidentiality in the electronic environment,” american library association, final report july 7, 2000, http://www.ala.org/lita/about/taskforces/dissolved/privacy. 20 library information technology association (lita), accessed may 11, 2018, http://www.ala.org/lita/. 21 library information technology association (lita), accessed may 11, 2018, http://www.ala.org/lita/; pam dixon, “ethical issues implicit in library authentication and access management: risks and best practices,” journal of library administration 47, no. 3 (2008): 141–62, https://doi.org/10.1080/01930820802186480; eric p. delozier, “anonymity and authenticity in the cloud: issues and applications,” oclc systems and services: international digital library perspectives 29, no. 2 (2012): 65–77, https://doi.org/10.1108/10650751311319278. 22 marshall breeding, “building trust through secure web sites,” computers in libraries 25, no. 6 (2006), p. 24. 23 breeding, “building trust,” p. 25. 
https://doi.org/10.1108/14684520210452691 http://www.ala.org/lita/involve/taskforces/dissolved/privacy/appendix https://doi.org/10.1108/03074801311304041 https://doi.org/10.1108/lhtn-11-2012-0071 https://doi.org/10.1515/bfp-2013-0008 http://tametheweb.com/2011/02/21/hyperlinkedlibrary2011/ https://doi.org/10.3172/jie.22.1.44 http://www.ala.org/lita/about/taskforces/dissolved/privacy http://www.ala.org/lita/ http://www.ala.org/lita/ https://doi.org/10.1080/01930820802186480 https://doi.org/10.1108/10650751311319278 application level security in a public library |thomchick and san nicolas-rocca 118 https://doi.org/10.6017/ital.v37i4.10405 24 barbara swatt engstrom et al., “evaluating patron privacy on your ils: how to protect the confidentiality of your patron information,” aall spectrum 10, no 6 (2006): 4–19. 25 breeding, “building trust,” p. 26. 26 tj lamana, “the state of https in libraries,” intellectual freedom blog, the office for intellectual freedom of the american library association (2017), https://www.oif.ala.org/oif/?p=11883. 27 chris palmer and yan zhu, “how to deploy https correctly,” electronic frontier foundation, updated february 9, 2017, https://www.eff.org/https-everywhere/deploying-https. 28 computer security resource center, “glossary,” national institute of standards and technology, accessed may 12, 2018, https://csrc.nist.gov/glossary/?term=491#alphaindexdiv. 29 computer security resource center, “glossary,” national institute of standards and technology, accessed may 12, 2018, https://csrc.nist.gov/glossary/?term=2817. 30 open web application security project, “session hijacking attack,” last modified august 14, 2014, https://www.owasp.org/index.php/session_hijacking_attack; open web application security project, “session management cheat sheet,” last modified september 11, 2017, https://www.owasp.org/index.php/session_management_cheat_sheet. 31 eric butler, “firesheep,” (2010), http://codebutler.com/firesheep/; audrey watters, “zuckerberg's page hacked, now facebook to offer ‘always on’ https," accessed may 16, 2018, https://readwrite.com/2011/01/26/zuckerbergs_facebook_page_hacked_and_now_facebook/ . 32 info security magazine, “senator schumer: current internet security “welcome mat for wouldbe hackers,” (march 2, 2011), http://www.infosecurity-magazine.com/view/16328/senator schumer-current-internetsecurity-welcome-mat-for-wouldbe-hackers/. 33 palmer and zhu, “how to deploy https correctly”; internet engineering task force, “recommendations for secure use of transport layer security (tls) and datagram transport layer security (dtls),” (may, 2015), https://tools.ietf.org/html/bcp195; open web application security project, “session management cheat sheet,” last modified september 11, 2017, https://www.owasp.org/index.php/session_management_cheat_sheet. 34 qualys ssl labs, “ssl/tls deployment best practices,” accessed may 18, 2018, https://www.ssllabs.com/projects/best-practices/. 35 sourceforge, “sslscan—fast ssl scanner,” last updated april 24, 2013, http://sourceforge.net/projects/sslscan/. 
36 palmer and zhu, "how to deploy https correctly."

a low-cost library database solution. england, mark; lura, joseph; schlecht, nem w. information technology and libraries; mar 2000; 19, 1; proquest pg. 46

responding rise in scholarly journal prices. nesli neither encourages nor hinders changes in scholarly communication and therefore the question of restructuring the scholarly communication process remains.20

references and notes
1. barbara mcfadden and arnold hirshon, "hanging together to avoid hanging separately: opportunities for academic libraries and consortia," information technology and libraries 17, no. 1 (march 1998): 36. see also international coalition of library consortia, "statement of current perspective and preferred practices for the selection and purchase of electronic information," information technology and libraries 17, no. 1 (march 1998): 45.
2. martin s. white, "from psli to nesli: site licensing for electronic journals," new review of academic librarianship 3 (1997): 139-50. see also chest, chest: software, data, and information for education (1996).
3. thomas j. deloughry, "library consortia save members money on electronic materials," the chronicle of higher education (feb. 9, 1996): a21.
4. information services subcommittee, "principles for the delivery of content," accessed nov. 17, 1999, www.jisc.ac.uk/pub97/nl_97.html#issc.
5. joint funding council's libraries review group, the follett report (dec. 1993), accessed nov. 20, 1999, www.niss.ac.uk/education/hefc/follett/report/.
6. john kirriemuir, "background of the elib programme," accessed nov. 21, 1999, www.ukoln.ac.uk/services/elib/background/history.html.
7. psli evaluation team, "uk pilot site license initiative: a progress report," serials 10, no. 1 (1997): 17-20.
8. white, "from psli to nesli," 149.
9. tony kidd, "electronic journals: their introduction and exploitation in academic libraries in the uk," serials review 24, no. 1 (1998): 7-14.
10.
jill taylor roe, "united we save, divided we spend: current purchasing trends in serials acquisitions in the uk academic sector," serials review 24, no. 1 (1998): ~11. psli evaluation team, "uk pilot site license initiative," 17-20. 12. beverly friedgood, "the uk national site licensing initiative," serials 11, no. 1 (1998): 37-39 . 13. university of manchester and swets & zeitlinger, nesli: national electronic site license initiative (1999). accessed nov. 21, 1999, www.nesli.ac.uk/. 14. nesli brochure, "further information for librarians." accessed nov . 21, 1999, www .nesli .ac.uk/ nesli-librarians-leaflet.html. 15. a copy of the model site license is available on the nesli web site . accessed nov . 22, 1999, www .nesli .ac .uk/ mode1license8.html . 16. albert prior, "nesli progress through collaboration," learned publishing 12, no . 1 (1999). 17. science direct. accessed nov. 24, 1999, www .sciencedirect.com. 18. declan butler, "the writing is on the web for science journals in print," nature 397, oan. 211998) . 19. the journal access core collection request for proposal. accessed nov . 22, 1999, www .calstate.edu/tier3/ cs+p/rfp_ifb/980160/980160.pdf . 20. frederick j. friend, "uk pilot site license initiative: is it guiding libraries away from disaster on the rocks of price rises?" serials 9, no. 2 (1996): 129-33. a low-cost library database solution mark england, lura joseph, and nem w. schlecht two locally created databases are made available to the world via the web using an inexpensive but highly functional search engine created in-house. the technology consists of a microcomputer running unix to serve relational databases. cgi forms created using the programming language perl offer flexible interface designs for database users and database maintainers. many libraries maintain indexes to local collections or resources and create databases or bibliographies con46 information technology and libraries i march 2000 cerning subjects of local or regional interest. these local resource indexes are of great value to researchers. the web provides an inexpensive means for broadly disseminating these indexes. for example, kilcullen has described a nonsearchable, webbased newspaper index that uses microsoft access 97.1 jacso has written about the use of java applets to publish small directories and bibliographies.2 sturr has discussed the use of wais software to provide searchable online indexes.3 many of the web-based local databases and search interfaces currently used by libraries may: • have problems with functionality; • lack provisions for efficient searching; • be based on unreliable software; • be based on software and hardware that is expensive to purchase or implement; • be difficult for patrons to use; and • be difficult for staff to maintain. after trying several alternatives, staff members at the north dakota state university libraries have implemented an inexpensive but highly functional and reliable solution. we are now providing searchable indexes on the web using a microcomputer running unix to serve relational databases. cgi forms created at the north dakota state university libraries using the programming language perl offer flexible interface designs for database users and database maintainers. this article describes how we have implemark england (england@badlands . nodak.edu) is assistant director, lura joseph (ljoseph@badlands.nodak.edu) is physical sciences librarian, and nem w. 
schlecht (schlecht@plains.nodak.edu) is a systems administrator at the north dakota state university libraries, fargo, north dakota. reproduced with permission of the copyright owner. further reproduction prohibited without permission. mented this technology to distribute two local databases to the world via the web. it is hoped that recounting our experiences will facilitate other such projects . i creating the databases the two databases that we selected to use as demonstrations of this technology are a community newspaper index and a bibliography of publications related to north dakota geology. the forum index the farg o forum is a daily newspaper published in fargo, north dakota. it began publication in 1879 and is the paper of record for north dakota . for many years, the north dakota state university libraries have maintained an index to the forum. beginning with the selective indexing of notable events and editions, we started offering full-text indexing of the entire paper in 1996. until early in the 1980s, all indexing was done manually and preserved on cards or paper. then for several years , indexing was done on one of the university's mainframe computers . starting in 1987, microcomputers were used to compile the index, first using dbase and then using procite as the database management software . printed copies of the database were sold annually to subscribing libraries and businesses . starting in the summer of 1996, th e library made arrangements with the publisher of the paper to acquire digital copy of the text of each newspaper. in early 1997, the ndsu libraries began a project to place all of our forum indexes on the web. dbase, pro-cite, wordperfect, or microsoft access computer files existed for the newspaper index from 1879 to 1975, 1988, and from 1990 to 1996. all other data was unavailable or unreadable. printed indexes from 1976 to 1987 and 1989 were scanned using a hewlett packard 4c scanner fitted with a page feeder . optical character recognition was accomplished using the software omnipage pro. once experience was gained with scanner and software settings, the scanning went very quickly with very few errors appearing in the data. various members of the library staff volunteered to check and edit the data, and the digitizing of approximately 1,500 pages was completed in about three weeks. all data were checked and normalized using microsoft's excel spreadsheet software and then saved as tab-delimited text. programmer's file editor was used to do the final text editing. because of variations in the completeness of the indexing, three separate relational database tables were created: one each for the years 1879-1975, 1976-1996, and 1996-the present. the collective bibliography of north dakota geology in 1996 a project was initiated to combine three bibliographies of north dakota geology and to make the final product searchable and browsable on the web. all three of the original print bibliographies were published by the north dakota geological survey. scott published the first bibliography as a thesis . it is a bibliography of all then-known north dakota geological literature published between 1805 and 1960, and most entries are annotated. 4 the second print bibliography, also by scott, focuses on north dakota geological literature published in the years 1960 through 1979, and also includes some material omitted in the first bibliography .5 most entries in the second bibliography include annotations in the form of keywords or keyword phrases. 
the third bibliography covers the years 1980 through 1993, and is not annotated.6 all three bibliographies are indexed. the third bibliography was available in digital format, whereas the first two were in print format only. library staff members began rekeying the two print bibliographies using microsoft word. the remaining pages were digitally scanned using a new hewlett packard 4c scanner and the optical character recognition software omnipage pro. there were many errors in the resulting text. different font sizes in the original documents may have contributed to optical recognition errors. editing of the scanned pages was nearly as time consuming and tedious as rekeying the documents. the microsoft word documents were saved as text files and combined as a single text file. programmer's file editor was used as a final editor to remove any line breaks or other undesirable formatting. each record was edited to occupy one line, and each field was delimited by two asterisks. asterisks were used because there were many occurrences of commas, semicolons, and other symbols that would have made it difficult to parse any other way. because italics were removed by converting to a text file, some errors were made in parsing. in retrospect, parsing should have been done before the document was saved as a text file. punctuation between fields was removed because the database would be converted to a large table. it would have been better to leave the punctuation intact, since it cannot easily be put back in for the output to be presented in bibliographic form. the alphabetical additions to publication dates (e.g., baker, 1966a) were left intact to aid in hand-cutting and pasting index terms into the records at a later date. initially, the resulting document was converted to a microsoft access file so that it would be in a table format. however, many of the fields were well over the 256-character limit on individual fields. to solve this problem, the data were imported into a relational database called mysql, which allows large data fields called "blobs." running under unix, mysql is very flexible and powerful.

[figure 1: secure database editing interface]

i database and search engine design
we examined the features and capabilities of various online bibliographies and indexes when deciding on our search interfaces and search engine designs. we wanted our databases to be both searchable and browsable and, in the case of the collective bibliography of north dakota geology, we wanted to provide the option of receiving search results accurately in a specific bibliographic format. we wanted both simple and advanced search capabilities, including the ability to do highly sophisticated boolean searching.
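to make the data-model discussion above concrete, two short sketches follow. they are reconstructions rather than the authors' actual code (which, as noted later, is available from the ndsu libraries site). the first is a hypothetical mysql table for the combined bibliography, with column names suggested by the editing form in figure 1 and text columns large enough for annotations that exceed the 256-character field limit mentioned above. the second anticipates the perl dbi/cgi pattern described in the next paragraph: a term from an html search form is run against the table, here through a placeholder so the generated sql is safe from stray quotes in the input.

    CREATE TABLE geology_bib (
      record_number INT NOT NULL PRIMARY KEY,
      author        VARCHAR(255),
      pub_date      VARCHAR(20),
      title         VARCHAR(255),
      source        VARCHAR(255),
      annotations   TEXT,          -- "blob"-style column for long annotations
      index_terms   TEXT
    );

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI;
    use DBI;

    my $q      = CGI->new;
    my $author = $q->param('author') || '';

    # connection details are placeholders
    my $dbh = DBI->connect( 'DBI:mysql:database=ndgeology;host=localhost',
                            'webuser', 'secret', { RaiseError => 1 } );

    my $sth = $dbh->prepare(
        'SELECT author, pub_date, title, source FROM geology_bib
         WHERE author LIKE ? ORDER BY author, pub_date' );
    $sth->execute( "%$author%" );

    print $q->header('text/html'), "<ul>\n";
    while ( my ($au, $dt, $ti, $so) = $sth->fetchrow_array ) {
        print "<li>$au ($dt). $ti. $so</li>\n";
    }
    print "</ul>\n";
    $dbh->disconnect;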
finally, we wanted to provide those maintaining the databases with the ability to easily add, delete, and change records from within simple forms on the web and immediately see the results of this editing . mysql uses a perl interface, dbi (database independent interface), which makes accessing the database simple from a perl script. essentially, a sql statement is generated, based on data from an html form. this sql statement is then run against the mysql database, returning matching rows that the same script can handle and display as needed. all of the dynamically generated pages in this database are created this way. using both mysql and perl provided a nice, elegant way to integrate database functionality with the web. the databases were installed on a server and made available via the web. it soon became apparent that there were problems with large numbers of returns . depending upon the client machine's hardware configuration, browsers could lock up the 48 information technology and libraries i march 2000 machine. while an efficient search should not result in such a large number of hits, we decided to limit returns to reduce this problem. following suggestions from users, various search tips were added, and some search interface terminology was changed. from a secure gateway , it is possible to call up different forms that allow individual records to be displayed, edited, and saved (see figure 1). new records are added by using a simple html form . it is also possible to bulk-load large numbers of records by using a special perl program to load the data directly from a text file. i advantages of the unix/mysql solution after first using glimpse, a popular web search engine, under linux, a free unix platform, and then microsoft's internet information server (iis) software on a windows nt platform to search the forum newspaper index, we settled on using mysql on a microcomputer running linux and the apache web server. we found we could write perl scripts that allowed users to make very sophisticated searches of the data from within very simple web forms. mysql is stable, reliable, free, and offers a high degree of functionality, flexibility, and efficiency. apache is reliable, extendible, very fast, free, and offers tight control of data access. initially, each story received from the newspaper was maintained as a separate file on a microcomputer. by having the stories as separate files, it was easy to set up glimpse as a searching tool for the articles. although it did provide a nice preview of a workable system, glimpse did not provide enough flexibility in how records were displayed, organized, or searched. it was not meant for managing data of this sort. windows nt, although a popular and successful it solution, was reproduced with permission of the copyright owner. further reproduction prohibited without permission. found to be somewhat cumbersome to implement and did not provide enough flexibility. the installation of these tools was easy, but it was difficult to obtain a high level of database and web integration . reliability and cost were also concerns . we found that unix was more stable and practically eliminated any unavailability of the data . perl, mysql, and apache were ultimately used to manage, store, and deliver the data. although these products are available for windows nt, their native platform is unix. by running these products on unix, we were able to take advantage of all the features offered by each of the products. 
we found that mysql offered the flexibility and power to manage both sets of data efficiently. also, to load the data into a relational database such as mysql required the data to be normalized. normalized data are data that are separated into logically separate components. to normalize data often takes some extra effort, as fields must be defined to contain certain types of data, but in the end the data is easier to manage and well organized. by having articles and bibliographies in a relational database, we are able to easily make updates, additions, and generate output or reports on the data in many different ways. there are several web servers available on the market today . however, apache is often singled out as being the most popular server . apache, like perl and mysql, is available free for all uses (educational and commercial). using apache and .htaccess control files, we are able to restrict access to administrative pages where data are added or modified. many extensions for apache are available to increase web performance in different situations. for example, a module for apache allows the web server to execute perl code within the server without the need to run the regular perl interpreter. i conclusion and future plans work is under way to refine and update the collective bibliography of north dakota geology. because bibliography number three was not annotated, index terms are being added to facilitate searching and retrieval of citations. we have recently updated the collective bibliography of north dakota geology to include citations to publications through 1998, and we plan to update the database annually. additionally, we receive monthly updates of forum articles, which are added using a simple perl script as soon as they are received. we have successfully implemented a number of other databases using these methods. we realize that this unix/ mysql solution is likely to be most helpful to other academic libraries: there are generally students and staff available on many campuses who are capable of programming in perl and maintaining sql databases on unix servers. our perl scripts are available at the url ww.lib.ndsu .nodak.edu/ kids. references and notes 1. m . kilcullen, "publishing a newspaper index on the world wide web using microsoft access 97," the indexer 20, no . 4 (1997): 195-96 . 2. p . jacso, "publishing textual databases on the web," information today 15, no . 11 (1998): 33, 36 3. n .o . sturr, "wais: an internet tool for full-text indexing," computers in libraries 15 (june 1995): 52-54. 4. m .w . scott, annotated bibliography of the geology of north dakota 1806-1959 north dakota geological survey miscellaneous series, no. 49 . (grand forks , n .d .: north dakota geological survey , 1972). 5. m . w . scott , annotated bibliography of the geology of north dakota 1960-1979 north dakota geological survey miscellaneous series, no. 60. (grand forks, n.d.: north dakota geological survey, 1981). 6. l. greenwood and others, bibliography of the geology of north dakota 1980-1993 north dakota geological survey miscellaneous series, no. 83. (bismarck, n .d .: north dakota geological survey, 1996). related urls linux homepage: www.linux.org/ mysql homepage: www.mysql.com/ perl homepage: www.perl.com/ apache homepage: www.apache.org/ ndsu forum index: www.lib.ndsu. 
nodak.edu/forum/ collective bibliography of north dakota geology: www.lib.ndsu.nodak.edu/ ndgs/ communications i england, joseph, and schlecht 49 “am i on the library website?”: a libguides usability study articles “am i on the library website?”: a libguides usability study suzanna conrad and christy stevens information technology and libraries | september 2019 49 suzanna conrad (suzanna.conrad@csus.edu) is associate dean for digital technologies and resource management at california state university, sacramento. christy stevens (crstevens@sfsu.edu) is the associate university librarian at san francisco state university. abstract in spring 2015, the cal poly pomona university library conducted usability testing with ten student testers to establish recommendations and guide the migration process from libguides version 1 to version 2. this case study describes the results of the testing as well as raises additional questions regarding the general effectiveness of libguides, especially when students rely heavily on search to find library resources. introduction guides designed to help users with research have long been included among a suite of reference services offered by academic libraries, though terminology, formats, and mediums of delivery have evolved over the years. print “pathfinders,” developed and popularized by the model library program of project intrex at mit in the 1970s, are the precursor to today’s online research guides, now a ubiquitous resource featured on academic library websites.1 pathfinders were designed to function as a “kind of map to the resources of the library,” helping “beginners who seek instruction in gathering the fundamental literature of a field new to them in every respect” find their way in a complex library environment.2 with the advent of the internet, pathfinders evolved into online “research guides,” which tend to be organized around subjects or courses. in the late 1990s and early 2000s, creating guides online required a level of technological expertise that many librarians did not possess, such as html-coding knowledge or the ability to use web development applications like adobe dreamweaver. as a result, many librarians could not create their own online guides and relied upon webmasters to upload and update content. the online guide landscape changed again in 2007 with the introduction of springshare’s libguides, a content management solution that quickly became a wildly popular library product.3 as of december 2018, 614,811 guides had been published by 181,896 librarians, at 4,743 institutions in 56 countries.4 the popularity of libguides is due in part to its removal of technological barriers to online guide creation, making it possible for those without web-design experience to create content. libguides is also a particularly attractive product for libraries constrained by campus or library web templates, affording librarians and library staff the freedom to design pages without requiring higher level permissions to websites. despite these advantages, in the absence of oversight, libguides sites can develop into microsites within the library’s larger web presence. inexperienced content creators can inadvertently develop guides that are difficult to use, lacking consistent templates and containing overwhelming amounts of information. as a result, libraries mailto:suzanna.conrad@csus.edu mailto:crstevens@sfsu.edu am i on the library website? 
often find it useful to develop local standards and best practices in order to enhance the user experience.5 like many academic libraries, the cal poly pomona university library uses the libguides platform to provide the campus community with course and subject guides. in 2015, librarians began discussing plans to migrate from libguides version one to the version two platform. these discussions led to broader conversations about libguides-related issues and concerns, some of which had arisen during website focus group sessions conducted in early 2015. the focus groups were designed to provide the library with a better understanding of students' library website preferences. students reported frustration with search options on the library website as well as confusion regarding inconsistent headers. even though the focus group questions were related to the library website, two participants commented on the library's libguides as well. the library was using a modified version of the library website header for vendor-provided services, including libguides, so it was sometimes unclear to students when they had navigated to an external site.6 to complicate matters, the library also occasionally used libguides for other, non-research-related library pages, such as a page delineating the library's hours, because of the ease of updating that the platform affords. one student, who had landed on the libguides page detailing the library's hours, described feeling confused about where she was on the library website. she explained that she had tried to use the search box on the libguides page to navigate away from the hours page, apparently unaware that it was only an internal libguides search. as a result, she did not receive any results for her query. the language the student used to describe the experience clearly revealed her disorientation and perplexity: "something popped up called libguides and then i put what i was looking for and that was nothing. it said no search found. i don't even know what that was, so i just went to the main page." another participant, who also tried to search for a research-related topic after landing on a libguides page, stated, "i tried putting my topics. i even tried refining my topic, but then it took me to the guide thing." accustomed to using a search function to find information on a topic, this student did not interpret the research guide she had landed upon as a potentially useful tool that could help with her research. she expected that her search would produce search results in the form of a list of potentially relevant books or articles. the appearance of a research guide instead was misaligned with her intentions and expectations and therefore confusing to her.7 given both the libguides-related issues that emerged during the library website focus groups and the library's plan to migrate from libguides version one to version two in the near future, the library's digital initiatives librarian and head of reference and instruction decided to conduct usability testing focused specifically on libguides. in addition to testing the usability of specific libguides features, such as navigational tabs and subtabs, we were also interested in determining whether some of the insights gleaned from the library website focus groups and from prior user surveys and usability testing regarding users' web expectations, preferences, and behaviors were also relevant in the libguides environment.
specifically, prior data had indicated users were unlikely to differentiate between the library’s website and vendor-provided content, such as libguides, libanswers, the library catalog, etc. findings also suggested that rather than intentionally selecting databases that were appropriate to their topics, students often resorted to searching in the first box they saw. this included searching for articles and books on their topics using search boxes that were not designed for that purpose, such as the database search box on the library’s a-z database page and the libguides site search tool for searching all guides. although many students did not always resort to searching first (many did attempt to browse to information technology and libraries | september 2019 51 specific library services), if they were not immediately successful, they would then type terms from the usability testing task into the first available search box.8 finally, we were also aware that many of our current libguides contained elements that were inconsistent with website search and design best practices as well as misaligned with typical website users’ behaviors and expectations, as described by usability experts like jakob nielsen. as such, we wanted to test the usability of some of these potentially problematic elements to determine whether they negatively impacted the user experience in the libguides environment. if they did, we would have institution-specific data that we could leverage to develop local recommendations for libguides standards and best practices that would better meet students’ needs. literature review the growth of libguides since springshare’s founding in 2007, libguides have been widely embraced by academic libraries.9 in 2011, ghaphery and white visited the websites of 99 arl university libraries in the united states and found that 68 percent used libguides as their research guides platform. they also surveyed librarians from 188 institutions, 82 percent of which were college or university libraries, and found that 69 percent of respondents reported they used libguides.10 as of december 2018, springshare’s libguides community website indicated that 1,620 academic libraries in the united states and a total of 1,823 academic libraries around the world, not counting law and medical libraries, were using the libguides platform.11 libguides’ popularity is due in part to its user-friendly format, which eliminates most technical barriers to entry for would be guide authors. for example, anderson and springs surveyed librarians at rutgers university and found they were more likely to update and use libguides than previous static subject guides that were located on the library website and maintained by the webmaster, to whom subject specialists submitted content and any needed changes.12 the majority of librarians reported that having direct access to the libguides system would increase how often they updated their guides. moreover, after implementing the new libguides system, 52 percent said they would update guides as needed, and 14 percent said they would update guides weekly; prior to implementation, only 36 percent stated they would update guides as needed, and none said they would do so weekly. libguides usability testing and user studies although much literature has been published on the usability of library websites,13 fewer studies have focused on research guides or libguides specifically. of these, several focused on navigation and layout issues. 
for example, in their 2012 libguides navigation study, pittsley and memmott confirmed their initial hypothesis that the standard libguides navigation tabs, located in a horizontal line near the top of each page, can sometimes go unnoticed, a phenomenon referred to as "banner blindness." as a result of their findings, librarians at their institution decided to increase the tab height in all libguides, and some librarians also chose to add content menus on the homepages of each of their guides. they moved additional elements from the header to the bottom of the guide under the theory that decreased complexity would contribute to increased tab navigation recognition.14 sonsteby and dejonghe examined the efficacy of libguides' tabbed navigational interface as well as design issues that caused usability problems. they identified user preferences, such as users' desire for a visible search box that behaved like a discovery tool, and design issues that frequently led to confusion, such as search boxes that searched for limited types of content, like journal titles. they also found that jargon confused users, and that guides containing too many tabs that were inconsistently labeled led to both confusion and the perception that guides were "cluttered" and "busy."15 thorngate and hoden explored the effectiveness of libguides version two designs, specifically focusing on use of columns, navigation, and the integration of libguides into the library website. they found that two-column navigation is the most usable, users are more likely to notice left navigation over horizontal tabs, and students do not view libguides as a separate platform, expecting instead for it to live coherently within the library's website.16 almeida and tidal employed a mixed-methods approach to gather user feedback about libguides, including usage of "paper prototyping, advanced scribbling, task analysis, tap, and semi-structured interviews."17 the researchers intended to "translate user design and learning modality preferences into executable design principles," but found that no one layout filled all students' needs or learning modalities.18 ouellette's 2011 study differed from many libguides-focused articles in that rather than assigning participants usability tasks, it employed in-depth interviews with 11 students to explore how they used subject guides created on the libguides platform and the features they liked and disliked about them. like some of the aforementioned studies, ouellette found that students did not like horizontal tabbed navigation, preferring instead the more common left-side navigation that has become standard on the web. however, the study was also able to explore issues that many of the usability task-focused studies did not, including whether and how students use subject guides to accomplish their own research-related academic work. ouellette found that students "do not use subject guides, or at least not unless it is a last resort."19 reasons provided for non-use included not knowing that they existed, preferring to search the open web, and not perceiving a need to use them, preferring instead to search for information rather than browsing a guide.20 such findings call into question the wisdom of expending time and resources on creating guides.
however, ouellette asserted that students were more likely to use research guides when they were stuck, when they were required to find information in a new discipline, or when their instructors explicitly suggested that they use them.21 nevertheless, most students who had used libguides reported that they had done so solely "to find the best database for locating journal articles."22 indeed, ouellette found that the majority of "participants had only ever clicked on the tab leading to the database section of a guide," a finding that was consistent with staley's 2007 study, which found that databases are the most commonly used subject guide section.23 ouellette concluded that libguides creators should therefore emphasize databases on their guides. however, two developments implicitly call into question the need for duplicating such information on library subject guides: the more recent widespread adoption of discovery systems that search across databases, which in many cases makes it unnecessary for students to select a specific database, and the common practice of aggregating relevant databases under disciplinary subject headings on library databases pages. if users can easily find such information elsewhere, these conclusions also cast doubt on the effectiveness of the entire libguides enterprise.

information retrieval behaviors: search and browse preferences

in 1997, usability expert jakob nielsen reported that more than half of web users are "search dominant," meaning that they go directly to a search function when they arrive at a website rather than clicking links. in contrast, only a fifth of users are "link dominant," preferring to navigate sites by clicking on links rather than searching. the rest of the users employ mixed strategies, switching between searching and clicking on links in accordance with what appears to be the most promising strategy within the context of a specific page.24 while some researchers have questioned the prevalence of search dominance, nielsen's mobile usability studies have indicated an even stronger tendency toward search dominance when users access websites on their mobile devices.25 moreover, by 2011, nielsen's research had indicated that search dominance is a user behavior that gets stronger every year, and that "many users are so reliant on search that it's undermining their problem-solving abilities." specifically, nielsen found that users exhibited an increasing reluctance to experiment with different strategies to find the information they needed when their initial search strategy failed.26 nielsen attributes the search dominance phenomenon to two main user preferences. the first is that search allows users to "assert independence from websites' attempt to direct how they use the web."27 the second is that search functions as an "escape hatch when they are stuck in navigation. when they can't find a reasonable place to go next, they often turn to the site's search function." nielsen developed a number of best practices based on these usability testing results, including that search should be made available from every page in a website, since it is not possible to predict when users will feel lost.
additionally, given that users quickly scan sites for a box where they can type in words, search should be configured as a box and not a link, it should be located at the top of the page where users can easily spot it, and it should be wide enough to accommodate a typical number of search terms.28 nielsen’s usability studies have shed light not only on where search should be located but also on how search should function. in 2005, nielsen reported that searchers “now have precise expectations for the behavior of search” and that “designs that invoke this mental model but work differently are confusing.”29 specifically, searchers’ “firm mental model” for how search should work includes “a box where they can type words, a button labeled ‘search’ that they click to run the search, [and] a list of top results that’s linear, prioritized, and appears on a new page.” moreover, nielsen found that searchers want all search boxes on all websites to function in the same way as typical search engines and that any deviation from this design causes usability issues. he specifically highlighted scoped searches as problematic, pointing out that searches that only cover a subsite are generally misleading to users, most of whom are unlikely to consider what th e search box is actually searching.30 while there is much evidence to support nielsen’s claims about the prevalence of search dominance, other studies have suggested that users themselves are not necessarily always search or link dominant. rather, some websites lend themselves better to searching or exploring links, and users often adjust their behaviors accordingly.31 although we did not find studies that specifically discussed the search and browse preferences and behaviors of libguides users, we did find studies of library website use that suggested that though users often exhibit search -dominant tendencies, they also often rely on a mixed approach to library website navigation. for example, hess and hristova’s 2016 study of users’ searching and browsing tendencies explored how students access library tutorials and online learning objects. specifically, they compared searching from a search box on the tutorials landing page, using a tag cloud under a search box, and browsing links.32 google analytics data revealed that students employed a mixed approach, equally relying upon both searching and clicking links to access the library’s tutorials.33 similarly, han and wolfram analyzed clickstream data from 1.3 million sessions in an image repository and determined that the two most common actions (86 percent of actions) were simple search and am i on the library website? | conrad and stevens 54 https://doi.org/10.6017/ital.v38i3.10977 click actions.34 however, users in this study exhibited a tendency toward search dominance, conducting simple searches in 70 percent of the actions.35 niu, zhang, and chen presented a mixed methods study analyzing search transaction logs and conducting usability testing f ocused on comparing the discovery layers vufind and primo. browsing in the context of their study included browsing search results. 
they found that most search sessions were very brief, and students searched using two or three keywords.36 xie and joo tested how thirty-one participants went about finding items on a website, classifying their approaches into what they described as eight "search tactics," including explorative methods, such as browsing.37 over 88 percent of users conducted at least one search query, and 75 percent employed "iterative exploration," browsing and evaluating both internal and external links on the site "until they were satisfied or they quit."38 only four of thirty-one, or 6.7 percent, did "whole site exploration," a tactic which included browsing and evaluating most of the available information on a website, looking through every page on the site to find the desired information.39

method

this study addresses the following research questions:

1. when prompted to find a research guide, are students more likely to click links or type terms into a search box to find the guide?
2. are students more likely to successfully accomplish usability tasks directing them to find specific information on a libguide when using a guide with horizontal or vertical tabs?
3. how likely are students to click on subtabs?
4. how and to what extent does a one-, two-, or three-column content design layout affect students' ability to find information on a libguide?
5. how and to what extent do students use embedded search boxes in libguides?
6. do students confuse screenshots of search boxes with functioning search tools?

in 2015, the university library had access to two versions of libguides: the live version one instance and a beta version two instance. in order to answer our research questions and make data-informed design decisions that would improve the usability of our libguides, we compared the usability of existing research guides in libguides version one to test sites on libguides version two. version two guides differed from version one guides in several ways. version two guides were better aligned with nielsen's recommendations regarding search box placement and function. every libguide page included a header identical to the library website's header, which contained a global search box that searched both library resources and the library's website. the inclusion of a visible discovery tool in the header was consistent with usability recommendations in the literature40 as well as our own prior library website usability tests, which indicated many users preferred searching for resources over finding a path to them by clicking through a series of links.

in mid-april 2015, ten students were scheduled to test libguides. each student attempted the same seven tasks, but five students tested the current version of libguides and five students tested version two. the sessions were recorded using camtasia, and students completed usability tasks on a laptop that was hooked up to a large television monitor, allowing the two librarians who were in the room to observe how students navigated the library's website and libguides platform. one librarian served as the moderator and the other managed the recording technology.41 although additional members of the web team were interested in viewing the test sessions, only two librarians sat in on the sessions in order to avoid overwhelming the students. the moderator read tasks aloud and students were instructed to think aloud while completing each task, narrating their thought processes and navigational decisions.
students were recruited via a website usage and perceptions survey sent out the prior quarter, which included a question as to whether they would be interested in participating in usability testing. the students who received this survey were selected from a randomized sample provided by the university’s institutional research office. the sample included both lower division students in the first or second year of their studies and transfer students. students were also recruited in information literacy instruction sessions for lower-level english courses as well as in a creditbearing information literacy course taught by librarians. survey respondents and students from the targeted classes who indicated that they would be interested in participating in usability testing were subsequently contacted via email. students with appropriate testing day availability were selected. students from the various colleges were represented, including engineering; business administration; letters, arts and social sciences; education and integrative studies; and hospitality management. all of the participants were undergraduates and most were lower division students. we chose to focus on recruiting lower division students because we wanted to ensure that our guides were usable by students with the least amount of library experience; many lower division students are unaware of library services and may not have taken a library instruction session or a library information literacy course. however, while the goal was to recruit lower division students, scheduling difficulties, including three no-shows, led us to recruit students on-the-fly who were in the library, regardless of their lower division or upper division status. task 1 in both rounds of usability testing, students were prompted to find a “research guide” to help them write a paper on climate change for a com 100 class. students started from the homepage of the library. two possible success routes included browsing to a featured links section on the homepage where a “research guides” link was listed (see figure 1) or searching via the top level “onesearch” discovery layer search box, displayed in figure 2, which delivered results, including articles from databases, books from the catalog, library website pages, and libguides pages, in a bento-box format. the purpose of this task was to determine if students browsed or searched to find research guides. we defined browsing as clicking on links, menus, or images to arrive at a result, whereas searching involved typing words and phrases into a search box. am i on the library website? | conrad and stevens 56 https://doi.org/10.6017/ital.v38i3.10977 figure 1. featured links section on library homepage. figure 2. onesearch search box on library homepage. task 2 task 2 was designed to compare the usability of libguides version one’s horizontal tab orientation with version two’s left navigation tab option. students were provided with a scenario in which they were asked to compare two public opinion polls on the topic of climate change for the same com 100 class. we displayed the appropriate research guide for the students and instructed them to find a list of public opinion polls. the phrase “public opinion polls” appeared in the navigation of both versions of the guide. figure 3 displays the research guide with horizontal tab navigation and figure 4 with vertical, left tab navigation. information technology and libraries | september 2019 57 figure 3. horizontal tab navigation. am i on the library website? 
figure 4. left tab navigation.

task 3

in the third scenario, students were informed that their professor recommended that they use a library "research guide" to find articles for a research paper assignment in an apparel merchandising and management class. students were instructed to find the product development articles on the research guide. the phrase "product development" appeared as a subtab in both versions of the guide. this task was intended to test whether students navigated to subtabs in libguides. as shown in figure 5, the subtab located on the horizontal navigation menu appeared when scrolled over but was otherwise not immediately visible. in contrast, figure 6 shows how the navigation was automatically popped open on the left tab navigation menu so that subtabs were always visible, a newly available option in libguides version two.

figure 5. horizontal subtab options.

figure 6. left tab navigation with lower subtabs automatically open.

task 4

on the same apparel merchandising and management libguide, students were asked where they would go to find additional books on the topic of product development. the librarian who designed this libguide had included search widgets in separate boxes on the page that searched the catalog and the discovery layer "onesearch." we were interested in seeing whether students would use the embedded search boxes to search for books. this functionality was identical in both the version one and version two instances of the guide, as shown in figure 7.

figure 7. embedded catalog search and embedded discovery layer search.

task 5

in the fifth scenario, students were told that they were designing an earthquake-resistant structure for a civil engineering class. as part of that process, they were required to review seismic load provisions. we asked them to locate the asce standards on seismic loads using a research guide we opened for them. the asce standard was located on the "codes & standards" page, which could be accessed by clicking on the "codes & standards" tab. the version one instance of the guide was two-columned, and a link to the asce seismic load standard was available in the second column on the right, per figure 8. the version two instance of the guide used a single, centered column, and the user had to scroll down the page to find the standard, per figure 9. we wanted to see if students noticed content in columns on the right, as many of our libguides featured books, articles, and other resources in columns on the right side of the page, or whether guides with content in a single central column were easier for students to use.

figure 8. two-column design with horizontal tabs.

figure 9. two-column design with left tab navigation.

task 6

because librarians sometimes included screenshots of search interfaces in their guides, we were interested in testing whether students mistook these images of search tools for actual search boxes. in task six, we opened a civil engineering libguide for students and told them to find an online handbook or reference source on the topic of finite element analysis. as shown in figure 10, a screenshot of a search box was accompanied by instructional text explaining how to find specific types of handbooks.
within this libguide, there were also screenshots of the onesearch discovery layer as well as a screenshot of a "findit" link resolver button.

figure 10. screenshots used for instruction.

task 7

the final task was designed to test whether it was more difficult for students to find content in a two- or three-column guide. students were instructed to do background research on motivation and classroom learning for a psychology course. they were told to find an "encyclopedic source" on this topic. within each version of the psychology libguide, there was a section called "useful books for background research." as shown in figure 11, in the version one libguide, books useful for background research were displayed in the third column on the right side of the page. the version two libguide displayed those same books in the first column under the left navigation options.

figure 11. books displayed in third column.

figure 12. two-column display with books in the left column.

results

searching vs. browsing to find libguides

understanding how students navigate and use libguides is important, but if they have difficulty finding the libguides from the library homepage, usability of the actual guides is moot. of the ten students tested, six used the onesearch discovery layer located on the library's homepage to search for a guide designed to help them write a paper on climate change for a com 100 class. frequently used search terms included "research guide," "communication guides," "climate change," "climate change research guide," "faculty guides," and "com 100." of these students, two used search as their only strategy, typing search queries into whichever search box they discovered. neither of these students was successful at locating the correct guide. the remaining four students used mixed strategies; they started by searching and resorted to browsing after the search did not deliver exact results. two of these students were eventually successful in finding the specific research guide; two were not. of the six students who searched using the discovery layer, only one did not find the libguides landing page at all. in general, it seems that the task and student expectations during testing were not aligned with the way the guide was constructed. only one student went to the controversial topics guide because "climate change is a controversial topic." one student thought the guide would be titled "climate change" and another thought there might be a subject librarian dedicated to climate change. students would search for keywords corresponding with their course and topic, but generally they did not make the leap to focus more broadly on controversial topics. only one student browsed directly to the "research guides" link on the homepage and found the guide under subject guides for "communication" on the first try. another student navigated to a "services and help" page from the main website navigation and found a group of libguides labeled "user guides," designed specifically for new students, faculty, staff, and visitors; however, the student did not find any other libguides relevant to the task at hand.
the remaining two students navigated to pages with siloed content; one student clicked the library catalog link on the library homepage and began searching using the keywords “climate change.” the other student clicked on the “databases” link. upon arriving at the databases a-z page, the student chose a subject area (science) and searched for the phrase “faculty guides” in the databases search box. the student was unable to find the research guide because our libguides were not indexed in this search box; only database names were listed. only three out of ten students found the guide; the rest gave up. two of the successful participants employed mixed strategies that began with searching and included some browsing; the third student browsed directly to the guide without searching. testers in the libguides version one environment attempted the task an average of 3.8 times before achieving success or giving up compared to an average of 3.2 attempts per tester in version two testing. we defined an attempt as browsing or searching for a result until the student tried a different strategy or started over. for instance, if a student tried to browse to a guide and then chose to search after not finding what they were looking for, that constituted two attempts. testers in both rounds began on the same library website. one major difference between the two research guides landing pages was the search boxes; one was an internal libguides search box (version one) and one was a global onesearch box (version two). it is possible that testers in round two made fewer attempts because of the inclusion of the onesearch box. for those testing with the libguides search box in version one, three searched on the libguides landing page. from both rounds, eight of the students located the libguides landing page, regardless of whether or not they found the correct guide. the two students who did not find the correct guide did land in libguides, but they arrived at specific libguides pages that served other purposes (one found a onesearch help guide and the other landed on a new users’ guide). navigation, tabs, and layout navigation, tab options (including subtab usage), and layouts were evaluated in tasks two, three, five, and seven. as mentioned in the method section, the first group of five students who tested the interface used the version one libguides with horizontal navigation and hidden subtabs. the second round of five students used the version two libguides with left navigation and popped open subtabs. students in both rounds were able to find items in the main navigation (not including subtabs) at consistent rates, with those in the second round with left navigation completing all tasks significantly faster than the first-round testers (38 seconds faster on average across all tasks). in task two, students were asked to find public opinion polls, which they could access by clicking a “public opinion polls” link on the main navigation. in both rounds, regardless of horizontal or vertical navigation, nine of the students clicked on the tab for the polls. only one student testing on version two was unable to find the tab. students in version one testing with horizontal navigation attempted this task two times on average before successfully finding the tab; students testing on version two with vertical navigation attempted 1.4 times before finding the tab with the polls or giving up. am i on the library website? 
when asked in task three to find articles on product development, which were included on a "product development" subtab under the primary "library databases amm" tab, nine out of ten students were unable to locate the subtab. in libguides version one, this subtab was only viewable after clicking the main "library databases amm" tab. in libguides version two, this subtab was popped open and immediately visible underneath the "library databases amm" tab. a version two tester was the only student who clicked on the "product development" subtab. students attempted this task 1.8 times on average in version one testing compared to 1.2 times for those testing version two. it is worth noting that six of the students found product development articles by searching via other means (onesearch, databases, and other library website links); they just did not find the articles on the libguide shown. while they still successfully found resources, they did not find them on the guide we were testing.

in task five, we asked students to find the asce standards on seismic loads on a specific guide. the version one guide used a two-column design while the version two guide with the same content utilized a single column for all content. while six students found the standards (three in round one and three in round two), only four of ten testers overall did so by browsing to the resource. three of the students who chose to browse were in round one and the fourth student was from round two. in version one testing with the two-column design, two students found the standards after making two attempts to browse the guide. both of these students used the libguides "search this guide" function to find the correct page for the standards using the keywords "asce standards" and "asce." the third successful student in this round used a mixed strategy of searching and browsing. she used the search terms "asce standards on seismic loads" and then searched for "seismic loads" twice in the same search box. she landed on the correct tab of the libguide, scrolling over the correct standard multiple times, but only found the standards after the sixth attempt. during version two testing, which included the one-column design and global search box, only one student browsed to the standards on the libguide. this student scrolled up and down the main libguide page, clicked on the left navigation option for "find books," then the left navigation option for "codes & standards," and scrolled down to find the correct item. four out of five version two testers bypassed browsing altogether, instead using the onesearch box in the page header to try to find the asce standards. two of those students found the specific asce standards that were featured on the libguide; the other two found asce standards, just not the specific item we intended for them to find. the four students who did not find the specific standards were equally distributed across both testing groups. on average, students attempted to complete the task 3.6 times in version one testing and 1.6 times in version two testing before either finding the resource or giving up.

task seven asked students to find an encyclopedic source using a three-column design in version one and a two-column design in version two. the version one guide listed encyclopedias in the right-most column of a three-column layout and the version two guide included them under the left navigation in a two-column design.
only three students found the encyclopedia mentioned in task seven, two of whom completed the task using version two’s two-column display. only one student was able to locate the encyclopedia in the third column in version one testing. the seven students who were unable to find the encyclopedia all attempted to search when they were unable to find the encyclopedia by browsing. six of these seven students searched for the keywords “motivation and classroom learning” and the seventh for “motivation and learning.” those who landed in onesearch (six out of seven) received many results and were unable to find encyclopedias. one student searched within libanswers for “encyclopedia” and found britannica. information technology and libraries | september 2019 67 one student attempted to refine by facets, thinking that “encyclopedia” would be a facet similar to “book” or “article.” using search, especially onesearch, to attempt to find an encyclopedia was ultimately unsuccessful for the students. search terms students chose were far too general for them to complete the task successfully. students in version one testing attempted this task 2.4 times compared to 3.2 times for version two testers. embedded search boxes & screenshots of search boxes embedded search boxes and screenshots of search boxes were tested in tasks four and six. the header used in version one libguides was limited, defaulting to searching within the guide, and the additional options on the dropdown menu next to the search box did not include a global “onesearch.” in version two guides, a onesearch box that searched most library resources (articles, books, library webpages, and libguides) was included. during task four, which asked students who were already on a specific guide how they would go about finding additional books on product development, version one testers were much more likely to use embedded search box widgets in the guide content. three of the five students in version one testing used the search widgets on the page to either search the catalog or search onesearch. the remaining two students in that round used a header search or browsed. one of these students used the libguides “search this guide” function in libguides and searched for “producte [sic] development books.” this student did not notice the typo in the search term and subsequently navigated out of libguides to the library website via the library link on the libguides header. the user then searched the catalog for “product development” and was able to locate books. a fifth student in the version one testing round did not use embedded search box widgets or the libguides search. she browsed through two guide pages and then gave up. in version two testing, three of five students used the global onesearch box to find the product development books. the remaining two students chose to search the millennium catalog linked from a “books and articles” tab on the main website header, finding books via that route. during testing of both versions, students tried an average of 1.5 times to complete the task before achieving success or acknowledging failure. nine out of ten testers found books on the topic of product development. the one tester who did not find the books attempted to complete the task one time; she found product development articles from the prior task and said she would click on the same links (for individual article titles) to find books. 
in task six, half of the ten students from both rounds attempted to click on screenshots of search boxes or unlinked "findit" buttons. a screenshot of the onesearch box and a knovel search box were embedded in the test engineering guide. two users in version one testing and one tester in version two testing attempted to click on the onesearch screenshot. one student in version two testing attempted to click on the knovel search box screenshot. one student from version one testing tried to click on a "findit" button for the link resolver.

comparisons between rounds

we recorded how many attempts were needed to complete tasks in each round. in round one, which tested libguides version one, students took an average of 2.74 tries to complete the tasks. in round two, which focused on libguides version two, students took an average of two tries to complete tasks. average attempts per task are displayed in figure 13. we also timed the rounds to see how many minutes it took students to complete all of the tasks. in the first round, it took 16:07 minutes on average and in the second round 15:29 minutes. this does not appear to constitute an important difference, but there was one tester in round two who narrated his experiences very explicitly and in great detail. his session lasted 23 minutes. if his testing is excluded, then round two had a shorter average of 13:30 minutes. despite the lower total time spent testing, task success was nearly equal between the two rounds. details on individual testing times per participant are in figure 14. in round one, testers successfully completed 24 tasks, whether or not they completed them in the manner we predicted. round two was slightly lower, with 23 successfully completed tasks. success was, however, subjective. in task three, we wanted to test whether students found a list of articles on a libguide on a certain topic. nearly all of the students (nine out of ten) found articles on the topic, but only one of them found them via the method we had anticipated. other tasks produced similar results, where students found resources that technically fulfilled the task we had asked them to complete even though they did not exercise the feature of the interface we were hoping to test. in these cases, we counted this as a success, as they had fulfilled the task as written.

figure 13. attempts per task for libguides v1 compared to libguides v2.

figure 14. total time per participant for libguides v1 compared to libguides v2.

discussion

there were several overarching themes that we discovered during the testing of libguides versions one and two. the first relates to nielsen's conception of search dominance and its implications for finding guides as well as resources within guides. task one, which asked students to navigate to a relevant libguide from the library homepage, revealed that students were much more likely to search for a guide than to navigate to one by using links. although the library homepage in our study included a clearly demarcated "research guides" link, only one tester clicked on it. in contrast, six of the ten students used search as their first strategy, and an additional two of ten first clicked on a link and then switched to search as their next strategy.
although our initial search-focused research question and related task looked specifically at how students navigate to guides, most of the other tasks provided additional insight into how students navigate within them as well. our findings are consistent with nielsen's observation that search functions as an "escape hatch" when users get "stuck in navigation."42 many students we tested used mixed strategies to find content, often resorting to searching for content when they were confused, lost, or impatient. while one student explicitly stated that search is a backup for when he cannot find something via browsing, the search behaviors of many other students suggested that they were "search-dominant," preferring searching over browsing both on library website pages and from within libguides. similar to nielsen's studies on reliance on search engine results, students were unlikely to change their search strategies even if they were not receiving helpful results. students did not engage in what xie and joo referred to as "whole site exploration," browsing and evaluating most of the available information on a website to accomplish the assigned tasks.43 while research guides are sometimes designed to function as linear pathways that lead students through the research process, or as comprehensive resources that introduce students to a series of tools and resources that could be useful in the research process, the students we tested did not approach guides in this way. rather than starting on the first tab and comprehensively exploring a guide tab by tab and content box by content box, students ignored most of the content on the page, searching instead to find the specific information they needed. our testers' search behaviors were also consonant with nielsen's observation that scoped searches are inconsistent with users' mental models of how search should function. nielsen found that search boxes that only cover a subsection of a site are generally confusing to users and negatively impact users' ability to find what they are looking for on a site. in our study, several students used scoped search boxes, both on library website pages and within libguides, to find content that the search did not index. version two testers had access to a search box on every page that aligned with their global search expectations, and they used it frequently, so much so that their preference for search disrupted some of the usability questions we were trying to answer in our tasks. for example, users' tendency to search instead of browse interfered with our ability to clearly discern whether it was easier for students to find content on pages with one-, two-, or three-column content designs (many students did not even attempt to find content in the columns). students' global search expectations also have implications for their ability to find libguides that they have been told exist or to discover the existence of libguides that might help them with their research. for example, students with search-dominant tendencies who attempt to use a library search tool that does not index libguides or the content within libguides will be unlikely to find them.
while students did use search boxes embedded within libguides content areas, version two testers had access to a global search box located at the top right-hand side of every libguides page, and as a result, they were more likely to use the global search than the embedded search boxes. this behavior is consistent with nielsen’s assertion that for ease of use, search should consist of a box “at the top of the page, usually in the right-hand corner,” that is “wide enough to contain the typical query.”44 version two testers were quick to find and use the search box in the header that fit this description. although students often used search boxes, and global ones in particular, to accomplish usability testing tasks, they were sometimes impeded by screenshots of search boxes and links. several students clicked on them thinking they were live, unable to immediately distinguish that they differed from the functional embedded search boxes that some of the guides also included. as nielsen observed, “users often move fast and furiously when they’re looking for search. as we’ve seen in recent studies, they typically scan the homepage looking for ‘the little box where i can type.’”45 librarians sometimes use screenshots of search boxes in an effort to provide helpful visuals to accompany instructional content (text) focusing on how to access and use a specific resource. because many students scan the page for a search box so that they can quickly find needed information rather than carefully reading material in the content boxes, it could be argued that these screenshots inadvertently confuse students and impede usability. another way to look at this issue, however, may be that guide content can be misaligned with user expectations and contexts. a user looking to search for articles on a topic who stumbles on a guide may have no reason to do anything other than look for a search box. in contrast, a user introduced to a guide in the context of a course who is asked to read through the content and explore three listed resources in preparation for a discussion to occur in the next class meeting will likely have a very different orientation to the guide and perception of its purpose and usefulness. information technology and libraries | september 2019 71 students’ search behaviors also made us question the efficacy of linking to specific books or articles within a libguide. in tasks three through seven, many of the students used onesearch or the library catalog to search for specific books or articles rather than referencing the guide where potentially useful resources were listed. for example, while trying to find the com 100 guide during task one, one student commented, “i never really look for stuff. i just go to the databases.” version two testers, who had access to a global search in the header of every libguides page, were even more likely to navigate away from the guides to find books or articles. while several studies in the literature had suggested that vertical tab navigation may be more usable than horizontal tab navigation, our study did not bear this out, as students in both rounds were able to find items on vertical and horizontal navigation menus at relatively consistent rates. 
similarly, one-, two-, and three-column content design did not appear to affect users’ abilities to find information and links on a page; however, users’ tendency to search rather than bro wse interfered with the relevant task’s intention of comparing the browsability of different content column designs, and therefore more targeted research on this question is needed. one student commented on the pointlessness of content in second columns, stating “nobody ever looks on the right side, i always look on the left cause everything’s usually on the left side. because you don’t read from right to left, it’s left to right.” he was, nevertheless, able to complete the task regardless of the multi-column design. subtab placement in libguides versions one and two was very different from each other; version one subtabs were invisible to users unless they hovered over the main menu item on the horizontal menu, while version two allowed us to make subtabs immediately visible on the vertical menu, without any action needed by the user to uncover their existence. given the subtabs’ visibility, we had anticipated that version two testers would be more likely to find and use subtabs, but this turned out not to be the case. only one out of ten students found the relevant subtab. although the successful tester was using libguides version two in which the subtab was visible, the fact that nine out of ten testers failed to see the subtab, regardless of whether it was immediately visible or not, suggests that subtab usage may not be an effective navigation strategy. results from all tasks also suggested that students might not understand what research guides are or how guides might help them with their research. like many libraries, the cal poly pomona university library did not refer to libguides by their product name on the library website, labeling them “research guides” instead in an effort to make their intended purpose clearer. testing revealed, however, that students are not predisposed to think of a “research guide” as a useful tool to help them get started on their research. one student said, “i’m not sure what the definition of a research guide is.” when prompted to think more about what it might be, the student guessed that it was a pamphlet with “something to help me guide the research.” the student did not offer any additional guesses about what specifically that help might look like. moreover, students’ tendency to resort to search itself can also be interpreted as evidence that they are confused about how guides are supposed to help them with research. instead of reading or skimming information on the guides, students used search as a strategy to attempt to complete the tasks an average of 70 percent of the time across both rounds. many of their searches navigated students away from the very guides that were designed to help them. the tendency to navigate away from guides was likely increased by the content included in the guides we tested, since many incorporated search boxes and links that pointed to external systems, such as the catalog, the discovery layer, libanswers, etc. however, many students’ first attempts to am i on the library website? | conrad and stevens 72 https://doi.org/10.6017/ital.v38i3.10977 accomplish the tasks given them involved immediately navigating away from libguides. others navigated away shortly after an initial attempt or two to complete the task within the guide. all but one student navigated away from libguides to complete tasks; four did so more than five times. 
eight of ten students used onesearch in the header or from the library homepage; the other two used embedded onesearch boxes on the libguides. results also suggested that it might be easier for students to find guides that are explicitly associated with their courses, through either the guides’ titles or other searchable metadata, than to find and understand the relevance of general research guides. even though general research guides might be relevant to the subject matter of students’ courses, guides that explicitly reference a course or courses are easily discoverable and their relevance is more immediately obvious. for instance, the first task asked students to find a “research guide” to help them write a paper on climate change for a com 100 class. we wanted to see whether students would find the “controversial topics” research guide that was designed for com 100 and that included the course number in the guide’s metadata. mentioning the course number in the task seemed to make it more actionable as an assignment they might expect from a professor. when students searched for “com 100,” they were more likely to find the controversial topics guide; two of three students who found the guide searched using the course number. if course numbers had not been included, they might not have found the guide as searching for the course number brought up the correct guide as the one result. two additional students unsuccessfully attempted to find the guide by searching for “com100,” without a space. had the libguides search been more effective, or had librarians included both versions of the course code with and without a space, more students would likely have found the guide. limitations limitations of this study include weaknesses in both our usability tasks and the content of some of the libguides, which made it difficult to answer our research questions. we may have tested too many different features at once, which can be a pitfall of usability testing in general. some tasks, such as tasks five and seven, tested both navigation placement and column layouts. in task five, for instance, there were multiple factors that could have led to success or failure; did a student overlook the asce standards because of column layout or tab placement or was the layout moot because the search box was comprehensive enough to allow them to complete the task without browsing the guide’s content? similarly, task two tested a guide with seven tabs. it is not clear if the students who did not click on a tab missed it because of the placement of the navigation on the page or because the navigation contained too many options. weaknesses in the content of many of the libguides used in the study led to additional limitations. many of the libguides were text heavy and included jargon. one student even commented, “ it’s a lot of words here, so i really don’t want to read them.” although we set out to test the usability of different navigation layouts and template designs, factors such as content overload or use of jargon could have influenced success or failure. the wording of task seven, for example, was particularly problematic and led to unclear results. students were instructed to find an “encyclopedic source” in an attempt to see if they would click on books listed in a third column in version one testing compared to a left column in version two testing. the column header was titled “useful books for background research” and the box included encyclopedias. 
students appeared to struggle with the idea of what constituted an “encyclopedic source.” when one student was specifically asked what she thought the term meant, she responded, “not sure.” based on the results of this task, it was difficult to discern if the interface or the wording of the task resulted in task completion failures. the contrived nature of usability testing itself might also have affected our results. for example, one student exhibited a tendency to rush through tasks, a behavior that may have been due to experiencing content overload, anxiety over being observed during the testing process, time limitations of which we were unaware, etc. on the other hand, behavior that we perceived to be rushing might be consistent with the students’ normal approach to navigating websites. whatever the case, it is important to keep in mind that usability testing puts users on the spot because they are testing an interface in front of an audience. the usability testing context can therefore influence user behavior, including the number of times students might attempt to find a resource or complete a given task. some students might be impatient or uncomfortable with the process, resulting in attempts to complete the testing as quickly as possible, including giving up on tasks more quickly than they would in a more natural setting. conversely, other students might be more likely to expend more time and effort when performing in front of an audience than they would privately. conclusion usability testing was effective for revealing some of the difficulties students encounter when using our libguides and our website and for prompting reflection on the types of content they include, how that content is presented, and the contexts in which that content may or may not be useful to our students. analysis of the data from our study and a review of the literature within the context of existing political realities and constraints within our library led to our development of several data-informed recommendations for libguides creators, most of which were adopted. one of the most important recommendations was that libguides should use the same header that is on the library’s main website, which includes a global search box. use of the shared header not only would provide a consistent look and feel but would also provide users with a global search box at the top of the page that is aligned with their mental model of how search should function. our testing confirmed that many students prefer to use global search boxes to find information rather than browsing, or in addition to browsing when they get stuck. while some librarians were not thrilled with what they viewed as the privileging or predominance of the discovery layer on their guides, preferring to direct students to specific databases instead of onesearch, this recommendation was ultimately accepted due to the compelling nature of the usability data we were able to share. our recommendation that subtabs should be avoided was also accepted because the data were compelling: 90 percent of users failed to find links located on subtabs. we also recommended that librarians should evaluate the importance of all content on their guides to minimize student confusion when browsing. while we acknowledged that there might be contexts when screenshots of search boxes would be useful, we encouraged librarians to think carefully about their use and to avoid them when possible. 
additionally, librarians were encouraged to evaluate whether the content they were adding was of core importance to the libguide, reflecting on the degree to which it added value or possibly detracted from the libguide, perhaps by virtue of lack of relevance or content overload. content boxes consisting of suggested books on a general subject guide were used as an example, given the difficulty of providing useful book suggestions to students working on wildly different topics. while results from our rounds of usability testing did not indicate that left-side vertical navigation was decidedly more usable than horizontal navigation at the top of the page, we nevertheless recommended that all guides should use left tab navigation, for consistency’s sake across guides, because left-side navigation has become standard on the web, and because other libguide studies have suggested that left-side navigation is easier to use than horizontal navigation, due to issues such as “banner blindness.”46 the librarians agreed, and a template was set up in the administrative console requiring that all public-facing libguides use left tab navigation. based on other usability studies in the literature as well, we also recommended that guides should include a maximum of seven main content tabs.47 although our study did not provide any actionable data about the relative usability of one-, two-, and three-column content designs, other articles in the literature had emphasized the importance of consistency and of avoiding a busy look with too much content. in order to avoid both a busy look and guides that looked decidedly different from each other due to an inconsistent number of columns, we therefore recommended that all guides should utilize a two-column layout, with the left column reserved for navigation and all content appearing in a single main column. however, future iterations of libguides usability testing should attempt to find ways to test whether limiting content to a single column is indeed more usable than dispersing it across two or more columns. the group voted on many of our recommendations, and several were simple to implement and oversee because they could translate into design decisions that could be set up as default, unchangeable options within the libguides administration module. other recommendations were more difficult to operationalize and enforce. for example, because our findings indicated that students attempted to search for course numbers to find a guide that they were told was relevant to their research for a specific class, another one of our recommendations to the librarians’ group was to include, as appropriate, various course numbers in their guides’ metadata in order to make the guides more discoverable and more immediately relevant to students’ coursework. this recommendation is not one that a libguides administrator could enforce due to issues revolving around subject matter and curriculum knowledge. the issue of context, and specifically the connection between courses and guides that has the potential to underscore their relevance and purpose to students, also caused us to question the effectiveness of general subject guides in assisting students with their research. 
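the course-number recommendation has to be carried out by individual guide owners, but it is easy to spot-check. the sketch below is illustrative only and is not part of the study: it assumes a hypothetical csv export of guide titles and metadata (guides.csv, with columns guide_title and metadata) and flags guides whose metadata carries a course code in only one spelling, since our testers searched for both “com 100” and “com100.”

```python
import csv
import re

# hypothetical export of guide metadata, e.g. columns: guide_title, metadata
GUIDES_CSV = "guides.csv"

# matches course codes written either as "COM 100" or "COM100"
COURSE_CODE = re.compile(r"\b([a-z]{2,4})\s?(\d{3})\b", re.IGNORECASE)

def spellings(match):
    """Return the spaced and unspaced spellings of a matched course code."""
    dept, num = match.group(1).upper(), match.group(2)
    return f"{dept} {num}", f"{dept}{num}"

with open(GUIDES_CSV, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        text = row["metadata"].upper()
        for match in COURSE_CODE.finditer(text):
            spaced, unspaced = spellings(match)
            # flag guides that carry only one spelling, so a search for the
            # other form (e.g. "com100" instead of "com 100") would miss them
            if spaced not in text or unspaced not in text:
                print(f'{row["guide_title"]}: add both "{spaced}" and "{unspaced}"')
```

in practice the same check could be run against whatever export or reporting an administrator already has; the csv file here is just a stand-in for that data.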
if students are more likely to understand the relevance and purpose of a libguide when it is explicitly connected to their specific class or assignment and less likely to make the connection between a general research guide and their coursework, then the creation and maintenance of general subject guides might not be worth the time and effort librarians invest in them. this question is made more pressing by studies in the literature that indicate both low usage and shallow use of guides, such as using them primarily to find a database.48 while this question did not lead to a specific recommendation to the librarians’ group, we have since reflected that the return on investment issue might be effectively addressed via closer collaboration with faculty in the disciplines. if research guides are more clearly aligned with specific research assignments in specific courses, and if faculty members instruct their students to consult library research guides and integrate libguides and other library resources into learning management systems, perhaps use and return on investment would improve. researchers like hess and hristova, for example, found that online tutorials that are required in specific courses show high usage.49 the connection between course integration and usage may hold true with libguides as well. regardless, students’ frequent lack of understanding of what guides are designed to do and their tendency to navigate quickly away from them rather than exploring them suggests that reconceptualizing what guides are designed to do, and what needs they are meant to meet in which specific contexts, might prove to be a useful exercise. a guide designed as an instructional tool to teach specific concepts, topic generation processes, search strategies, citation practices, etc. within the context of a specific assignment for a specific course may well be immediately perceived as relevant to students in that course. such a guide discussed in the context of a class might also be perceived as more useful than guides consisting of lists of resources and tools, which are unlikely to be interpreted as helpful by students who stumble upon them while seeking research assistance on the library’s website. as such, thinking about how and in what context students are likely to find guides, and how material might be presented so that guides are quickly perceived as a potentially relevant resource worth exploring, might also prove useful. the importance of talking to users cannot be overemphasized; without collecting user feedback, whether through usability testing or another method, it is difficult to know how students perceive and use libguides or any other library online service. getting user input on navigation flow, template design, and search functionality can provide valuable details that can help libraries improve the usability of their online resources. it is also important to note that in our rapidly changing environment, users’ needs and preferences also change. as such, collecting and analyzing user feedback to inform user-centered design should be a fluid process, not a one-time effort. admittedly, it can sometimes be challenging to make collective design decisions, particularly when librarians have strong opinions grounded in their own personal experiences working with students that conflict with usability testing data. 
although it is necessary to incorporate user feedback into the design process, it is also important to be open to compromise in order to achieve stakeholder buy-in for some usability-informed changes. as with many library services, usage of libguides is contingent at least in part on awareness, as students are unlikely to use services of which they are unaware or are unlikely to discover due to the limitations of a library’s search tools. given the prevalence of search dominance among our users, we should not assume that simply placing a “research guides” link on a webpage will lead to usage. increased outreach, better integration with the content of specific courses and assignments, and a thorough review of libguides content by those creating the guides with an eye toward the specific contexts in which they are likely to be used, taught, serendipitously discovered, etc. is necessary to ensure that the research guides librarians create are worth the time they invest in them. additional studies focusing on why students do or do not use specific types of research guides, the contexts in which they are most useful, how students use them, and the specific content in guides that students find most helpful are needed to determine whether and to what extent they are aligned with students’ information-seeking preferences, behaviors, and needs, as well as how they might be improved to increase their use and usefulness. am i on the library website? | conrad and stevens 76 https://doi.org/10.6017/ital.v38i3.10977 appendix 1: libguides usability testing tasks purpose: seeing how students browse or search to get to research guides task 1: you are writing a research paper on the topic of climate change for your com 100 class. your teacher told you that the library has a “research guide” that will help you write your paper. find the guide. start: library homepage purpose: testing tab orientation on top task 2: you need to compare two public opinion polls on the topic of climate change for your com 100 class. find a list of public opinion polls on the research guide shown. start: http://libguides.library.cpp.edu/controversialtopics or http://csupomona.beta.libguides.com/controversial-topics purpose: testing subtabs task 3: you are writing a research paper for your apparel merchandising & management class on the topic of product development. your teacher told you that the library has a “research guide” that includes a list of articles on product development. find the product development articles on this research guide. start: http://libguides.library.cpp.edu/amm or http://csupomona.beta.libguides.com/amm purpose: testing searching within the libguides pages task 4: if you were going to look for additional books on the topic of product development, what would you do next? start: http://libguides.library.cpp.edu/amm or http://csupomona.beta.libguides.com/amm purpose: testing two-tab column design task 5: you are designing an earthquake-resistant structure for your civil engineering course and need to review seismic load provisions. locate the asce standards on seismic loads. use the research guide we open for you. start: http://libguides.library.cpp.edu/civil or http://csupomona.beta.libguides.com/civilengineering purpose: seeing if including screenshots of search boxes is problematic task 6: your professor also asks you to find an online handbook or reference source on the topic of finite element analysis. locate an online handbook or reference source on this topic. 
start: http://libguides.library.cpp.edu/civil or http://csupomona.beta.libguides.com/civilengineering purpose: seeing if three-columns are noticeable task 7: find resources that might be good for background research on motivation and classroom learning for a psychology course. find an encyclopedic source on this topic. start: http://libguides.library.cpp.edu/psychology or http://csupomona.beta.libguides.com/psychology references 1 william hemmig, “online pathfinders: toward an experience-centered model,” reference services review 33, no. 1 (february 2005): 67, https://dx.doi.org/10.1108/00907320510581397. 2 charles h. stevens, marie p. canfield, and jeffrey t. gardner, “library pathfinders: a new possibility for cooperative reference service,” college & research libraries 34, no. 1 (january 1973): 41, https://doi.org/10.5860/crl_34_01_40. 3 “about springshare,” springshare, accessed may 7, 2017, https://springshare.com/about.html. 4 “libguides community,” accessed december 4, 2018, https://community.libguides.com/?action=0. 5 see, for example, alisa c. gonzalez and theresa westbrock, “reaching out with libguides: establishing a working set of best practices,” journal of library administration 50, no. 5/6 (september 7, 2010): 638–56, https://doi.org/10.1080/01930826.2010.488941. 6 suzanna conrad and nathasha alvarez, “conversations with web site users: using focus groups to open discussion and improve user experience,” the journal of web librarianship 10, no. 2 (2016): 74, https://doi.org/10.1080/19322909.2016.1161572. 7 ibid., 74. 8 suzanna conrad and julie shen, “designing a user-centric web site for handheld devices: incorporating data-driven decision-making techniques with surveys and usability testing,” the journal of web librarianship 8, no. 4 (2014): 349-83, https://doi.org/10.1080/19322909.2014.969796. 9 “about springshare.” 10 jimmy ghaphery and erin white, “library use of web-based research guides,” information technology and libraries 31, no. 1 (2012): 21-31, https://doi.org/10.6017/ital.v31i1.1830. 11 “libguides community,” accessed december 4, 2018, https://community.libguides.com/?action=0&inst_type=1. 12 katie e. anderson and gene r. springs, “assessing librarian expectations before and after libguides implementation,” practical academic librarianship: the international journal of the sla academic division 6, no. 1 (2016): 19-38, https://journals.tdl.org/pal/index.php/pal/article/view/19. 13 examples include: troy a. swanson and jeremy green, “why we are not google: lessons from a library web site usability study,” the journal of academic librarianship 37, no. 
3 (2011): 22229, https://doi.org/10.1016/j.acalib.2011.02.014; judith z. emde, sara e. morris, and monica claassen-wilson, “testing an academic library website for usability with faculty and graduate students,” evidence based library and information practice 4, no. 4 (2009): 24-36, https://doi.org/10.18438/b8tk7q; heather jeffcoat king and catherine m. jannik, “redesigning for usability: information architecture and usability testing for georgia tech https://dx.doi.org/10.1108/00907320510581397 https://doi.org/10.5860/crl_34_01_40 https://springshare.com/about.html https://community.libguides.com/?action=0 https://doi.org/10.1080/01930826.2010.488941 https://doi.org/10.1080/01930826.2010.488941 https://doi.org/10.1080/19322909.2014.969796 https://doi.org/10.6017/ital.v31i1.1830 https://community.libguides.com/?action=0&inst_type=1 https://journals.tdl.org/pal/index.php/pal/article/view/19 https://doi.org/10.1016/j.acalib.2011.02.014 https://doi.org/10.18438/b8tk7q information technology and libraries | september 2019 79 library’s website,” oclc systems & services 21, no. 3 (2005): 235-43, https://doi.org/10.1108/10650750510612425; danielle a. becker and lauren yannotta, “modeling a library website redesign process: developing a user-centered website through usability testing,” information technology and libraries 32, no. 1 (2013): 6-22, https://doi.org/10.6017/ital.v32i1.2311; darren chase, “the perfect storm: examining user experience and conducting a usability test to investigate a disruptive academic library web site redevelopment,” the journal of web librarianship 10, no. 1 (2016): 28-44, https://doi.org/10.1080/19322909.2015.1124740; andrew r. clark et al., “taking action on usability testing findings: simmons college library case study,” the serials librarian 71, no. 3-4 (2016): 186-96, https://doi.org/10.1080/0361526x.2016.1245170; anthony s. chow, michelle bridges, and patrician commander, “the website design and usability of us academic and public libraries: findings from a nationwide study,” reference & user services quarterly 53, no. 3 (2014): 253-65, https://journals.ala.org/index.php/rusq/article/view/3244/3427; gricel dominguez, sarah j. hammill, and ava iuliano brillat, “toward a usable academic library web site: a case study of tried and tested usability practices,” the journal of web librarianship 9, no. 2-3 (2015), https://doi.org/10.1080/19322909.2015.1076710; junior tidal, “one site to rule them all, redux: the second round of usability testing of a responsively designed web site,” the journal of web librarianship 11, no. 1 (2017): 16-34, https://doi.org/10.1080/19322909.2016.1243458. 14 kate a. pittsley and sara memmott, “improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides,” information technology and libraries 31, no. 3 (2012): 52-64, https://doi.org/10.6017/ital.v31i3.1880. 15 alec sonsteby and jennifer dejonghe, “usability testing, user-centered design, and libguides subject guides: a case study,” the journal of web librarianship 7, no. 1 (2013): 83-94, http://dx.doi.org/10.1080/19322909.2013.747366. 16 sarah thorngate and allison hoden, “exploratory usability testing of user interface options in libguides 2,” college & research libraries 78, no. 6 (2017), https://doi.org/10.5860/crl.78.6.844. 17 nora almeida and junior tidal, “mixed methods not mixed messages: improving libguides with student usability data,” evidence based library and information practice 12, no. 
4 (2017): 66, https://academicworks.cuny.edu/ny_pubs/166/. 18 ibid., 63; 71. 19 dana ouellette, “subject guides in academic libraries: a user-centered study of uses and perceptions,” canadian journal of information and library science 35, no. 4 (december 2011): 436–51, https://doi.org/10.1353/ils.2011.0024. 20 ibid., 442. 21 ibid., 442-43. https://doi.org/10.1108/10650750510612425 https://doi.org/10.6017/ital.v32i1.2311 https://doi.org/10.1080/19322909.2015.1124740 https://doi.org/10.1080/0361526x.2016.1245170 https://journals.ala.org/index.php/rusq/article/view/3244/3427 https://journals.ala.org/index.php/rusq/article/view/3244/3427 https://doi.org/10.1080/19322909.2015.1076710 https://doi.org/10.1080/19322909.2016.1243458 https://doi.org/10.6017/ital.v31i3.1880 http://dx.doi.org/10.1080/19322909.2013.747366 https://doi.org/10.5860/crl.78.6.844 https://academicworks.cuny.edu/ny_pubs/166/ https://doi.org/10.1353/ils.2011.0024 am i on the library website? | conrad and stevens 80 https://doi.org/10.6017/ital.v38i3.10977 22 ibid., 443. 23 ibid., 443; shannon m. staley, “academic subject guides: a case study of use at san jose state university,” college & research libraries 68, no. 2 (march 2007): 119–39, https://doi.org/10.5860/crl.68.2.119. 24 jakob nielsen, “search and you may find,” nielsen norman group, last modified july 15, 1997, https://www.nngroup.com/articles/search-and-you-may-find/. 25 jakob nielsen, “macintosh: 25 years,” nielsen norman group, last modified february 2, 2 009, https://www.nngroup.com/articles/macintosh-25-years/; jakob nielsen and raluca budiu, mobile usability (berkeley: new riders, 2013), chap. 2, o’reilly. 26 jakob nielsen, “incompetent research skills curb users’ problem solving,” nielsen norman group, last modified april 11, 2011, https://www.nngroup.com/articles/incompetent-searchskills/. 27 jakob nielsen, “search: visible and simple,” nielsen norman group, last modified may 13, 2001, https://www.nngroup.com/articles/search-visible-and-simple/. 28 ibid. 29 jakob nielsen, “mental models for search are getting firmer,” nielsen norman group, last modified may 9, 2005, https://www.nngroup.com/articles/mental-models-for-search/. 30 ibid. 31 erik ojakaar and jared m. spool, getting them to what they want: eight best practices to get users to the content they want (and to content they didn’t know they wanted) (bradford, ma: uie reports: best practices series, 2001). 32 amanda nichols hess and mariela hristova, “to search or to browse: how users navigate a new interface for online library tutorials,” college & undergraduate libraries 23, no. 2 (2016): 173, https://doi.org/10.1080/10691316.2014.963274. 33 ibid., 176. 34 hyejung han and dietmar wolfram, “an exploration of search session patterns in an imagebased digital library,” journal of information science 42, no. 4 (2016): 483, https://doi.org/10.1177/0165551515598952. 35 ibid., 487. 36 xi niu, tao zhang, and hsin-liang chen, “study of user search activities with two discovery tools at an academic library,” international journal of human-computer interaction 30 (2014): 431, https://doi.org/10.1080/10447318.2013.873281. 37 iris xie and soohyung joo, “tales from the field: search strategies applied in web searching,” future internet 2 (2010): 268-69, https://doi.org/10.3390/fi2030259. 
38 ibid., 275; 267-68. 39 ibid., 268-69. 40 sonsteby and dejonghe, “usability testing, user-centered design,” 83-94. 41 we experienced technical difficulties when capturing screens and audio simultaneously in camtasia. the audio did not sync in real time with the testing and we had to correct sync issues after the fact. a full technical test of screen capture and recording technology might have resolved this issue. 42 nielsen, “search: visible and simple.” 43 nielsen, “search and you may find”; nielsen, “incompetent research skills”; iris xie and soohyung joo, “tales from the field,” 268-69. 44 jakob nielsen, “search: visible and simple.” 45 ibid. 46 pittsley and memmott, “improving independent student navigation,” 52-64. 47 e.g., sonsteby and dejonghe, “usability testing, user-centered design,” 83-94. 48 ouellette, “subject guides in academic libraries,” 448; brenda reeb and susan gibbons, “students, librarians, and subject guides: improving a poor rate of return,” portal: libraries and the academy 4, no. 1 (january 22, 2004): 124, https://dx.doi.org/10.1353/pla.2004.0020; staley, “academic subject guides,” 119–39. 49 hess and hristova, “to search or to browse,” 174. 
information security in libraries: examining the effects of knowledge transfer tonia san nicolas-rocca and richard j. burkhard tonia san nicolas-rocca (tonia.sannicolas-rocca@sjsu.edu) is assistant professor in the school of information at san jose state university. richard j. burkhard (richard.burkhard@sjsu.edu) is professor in the school of information systems and technology in the college of business at san jose state university. abstract libraries in the united states handle sensitive patron information, including personally identifiable information and circulation records. with libraries providing services to millions of patrons across the u.s., it is important that they understand the importance of patron privacy and how to protect it. this study investigates how knowledge transferred within an online cybersecurity education course affects library employee information security practices. the results of this study suggest that knowledge transfer does have a positive effect on library employee information security and risk management practices. introduction libraries across the u.s. 
provide a wide range of services and resources to society. libraries of all types are viewed as important parts of their communities, offering a place for research, to learn about technology, to access accurate and unbiased information, and a place that inspires and sparks creativity. as a result, there were over 171 million registered public library users in the u.s. in 2016.1 a library is a collection of information resources and services made available to the community in which it serves. the american library association (ala) affirms the ethical imperative to provide unrestricted access to information and to guard against impediments to open inquiry.2 further, in all areas of librarianship, best practice leaves the library user in control of as many choices as possible.3 in a library, the right to privacy is the right to open inquiry without having the subject of one’s interest examined or scrutinized by others.4 many library resources require the use of a library card. to obtain a library card in the u.s. one must provide official photo identification showing personally identifiable information (pii), such as name, address, telephone number, and email address. pii connects library users or patrons with, for example, items checked out, and websites visited. as such, pii has the potential to build up an image of a library patron that could potentially be used to assess the patron’s character. in response, the ala developed a policy concerning the confidentiality of pii about library users.5 confidentiality extends to “information sought or received and resources consulted, borrowed, acquired or transmitted,” and includes, but is not limited to, database search records, reference interviews, circulation records, interlibrary loan records, and other personally identifiable uses of library materials, facilities, or services.6 in more recent years, the ala has further specified that the right of patrons to privacy applies to any information that can link “choices of taste, interest, or research with an individual.”7 when library users recognize or fear that their privacy or information technology and libraries | june 2019 59 confidentiality is compromised, true freedom of inquiry no longer exists. therefore, it is imperative that libraries use extra care when handling patron personally identifiable information. while librarians and other library employees may understand the importance of data protection, they generally don’t have the resources available to assess information security risk, employ risk mitigation strategies, or offer security education, training, or awareness (seta) programs. this is of particular concern as libraries increasingly have access to databases of both proprietary and personal information.8 seta programs are risk mitigation strategies employed by organizations worldwide to increase and maintain end-user compliance of information security and privacy policies. in libraries, information systems are widely used to provide services to patrons, however, there is little known about information security practices in libraries.9 given the sensitivity of the data libraries handle, and the lack of information security resources available to them, it is important for those currently or planning to work in the library environment to develop the knowledge necessary to identify risks and develop and employ risk mitigation strategies to protect information and information resources they are entrusted with. 
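one concrete, commonly recommended risk mitigation strategy of the kind discussed above is to avoid retaining linkable patron identifiers in the first place. the sketch below is purely illustrative and is not drawn from the study: it assumes a hypothetical circulation log (circulation_raw.csv) and shows keyed hashing, or pseudonymization, of patron barcodes before records are kept for statistics, so that reading histories are no longer tied directly to a patron’s pii.

```python
import csv
import hashlib
import hmac
import os

# secret key held by the library and never stored with the statistics
# (hypothetical; in practice it would come from a managed key store,
# not be regenerated with os.urandom on every run)
SECRET_KEY = os.urandom(32)

def pseudonymize(patron_barcode: str) -> str:
    """Replace a patron barcode with a keyed hash so circulation statistics
    can be retained without a directly identifying value."""
    return hmac.new(SECRET_KEY, patron_barcode.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# hypothetical raw log columns: patron_barcode, item_id, checkout_date
with open("circulation_raw.csv", newline="", encoding="utf-8") as src, \
     open("circulation_stats.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=["patron_token", "item_id", "checkout_date"])
    writer.writeheader()
    for row in reader:
        writer.writerow({
            "patron_token": pseudonymize(row["patron_barcode"]),
            "item_id": row["item_id"],
            "checkout_date": row["checkout_date"],
        })
```

discarding the key later effectively anonymizes the retained records, though whether that satisfies a given confidentiality policy is a local decision, and most libraries lack the staffing and training to put even simple safeguards like this in place consistently.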
therefore, the research question in this present study is: how can cybersecurity education strengthen information security practices in libraries? currently, there is a dearth of research on information security practices in libraries.10 this is an important research gap to acknowledge given that patron privacy is fundamental to the practice of librarianship in the u.s., and the advancement in technology coupled with federal regulations adds to the challenges of keeping patron privacy safe.11 thus, this study contributes to current literature by evaluating the effects of knowledge transfer as a means to strengthen information security within libraries. furthermore, this study will offer a preliminary investigation as to whether knowledge utilization leads to motivation and participation in information security risk management activities within libraries. the remainder of this paper proceeds as follows: first, a review of knowledge transfer is covered. a description of the cybersecurity course, including students and course material, is provided. data collection and analysis are then presented. this is followed by a discussion of the findings, limitations, and future research. literature review knowledge transfer in seta knowledge transfer through seta programs plays a key role in the development and implementation of cybersecurity practices.12 knowledge is transferred when learning takes place and when the recipient of that knowledge understands the intricacies and implications associated with that knowledge so that he or she can apply it.13 for example, in a security education program, an educator may transfer knowledge about information security risks to users who learn and apply the knowledge to increase patron privacy. application of the knowledge is evidenced by users who are able to identify risks to patron data and implement risk mitigation strategies that serve to protect patron information and information system assets. knowledge transfer can be influenced by four factors: absorptive capacity, communication, motivation, and user participation.14 this study evaluates the extent to which knowledge transferred from a cybersecurity course strengthens information security practices within libraries. this study adapts the theoretical model as proposed by spears & san nicolas-rocca (2015) (see figure 1) to examine the effects of cybersecurity education on information security practices in libraries.15 figure 1. factors of knowledge transfer leading to knowledge utilization. absorptive capacity absorptive capacity is the ability of a recipient to recognize the importance and value of externally sourced knowledge, to assimilate it, and to apply it; it has been found to be positively related to knowledge transfer.16 activating a student’s prior knowledge could enhance their ability to process new information.17 that is, knowledge transfer is more likely to take place between the instructor and students enrolled in a cybersecurity course if the student has existing knowledge or has had experience in some related area. for the present study, students stated that prior to enrolling in the cybersecurity course, they had little to no knowledge of cybersecurity. one student mentioned, “while i am the director of a small academic library, i have no understanding of cybersecurity. 
i am taking this course to learn about cybersecurity so that i can better secure the library i work in and to share the information with those who work in the library.” another student mentioned, “my goal is to work in a public library after graduation. i am taking this course because i keep hearing about cybersecurity breaches in the news, and i want to learn more about cybersecurity because i think it will help me in my future job.” while all of the students enrolled in the course had no cybersecurity experience, all of them had some understanding of principle 3 in the ala code of ethics, which states, “we protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.”18 understanding of principle 3 in the code of ethics demonstrates existing knowledge in some related area with regards to cybersecurity, albeit limited knowledge. given this understanding, students should have the ability to process new information from the cybersecurity course. information technology and libraries | june 2019 61 communication the success of any seta program depends on the ability of the instructor to effectively communicate the applicability and practical purpose of the material to be mastered, as distinguished from abstract or conceptual learning.19 according to current research, knowledge transfer can only occur if communication is effective in terms of type, amount, competence, and usefulness.20 for the present study, students were enrolled in an online graduate level cybersecurity course at a university we call mountain view university (mvu). we changed the name to protect the privacy of the research participants. while research suggests that the best form of communication for knowledge transfer is face-to-face communication, the cybersecurity course at mvu is only offered online.21 therefore, communication relating to the course was conducted via course management software, email, video conferencing, discussion board, and prerecorded videos. motivation motivation can be a significant influence on knowledge transfer.22 that is, an individual’s motivation to participate in seta programs has been found to influence the extent to which knowledge is transferred.23 specifically, without motivation, a trainee may fail to use information shared with them about methods used to protect and safeguard patron privacy. in this present study, research participants voluntarily enrolled in the cybersecurity course. the cybersecurity course is not a core course or a class required for graduation. therefore, enrolling in the course implies motivation to learn about cybersecurity by participating in course activities and completing assigned work. user participation user participation in information security activities may influence effective knowledge transfer initiatives.24 according to previous research, when users participate in cybersecurity activities, security safeguards were more aligned with organizational objectives and were more effectively designed and performed within the organization.25 for the present study, given that students enrolled in the cybersecurity course, it is expected that they will participate in information security risk management activities, such as the completion of personal and organizational risk management projects. cybersecurity course information this study will examine whether cybersecurity education strengthens information security practices within libraries. 
based on the model in figure 1, students enrolled in the cybersecurity course (motivation), and therefore, were expected to participate in all course activities and complete assigned work (user participation), such as isrm assignments. isrm assignments are described in the course material section below. as per figure 2, the cybersecurity course was offered online, and used multiple forms of communication, including email, video conferencing, discussion board, and pre-recorded videos (communication). students were able to access these resources through canvas, a learning management system. students came into the class with some understanding of principle 3 in the ala code of ethics. therefore, given that this knowledge is in a “related area,” students may be able to process new information relating to cybersecurity (absorptive capacity). as per the above information and as depicted in figure 1, motivation, user participation, communication, and absorptive capacity will lead to knowledge transfer. therefore, this study will focus on how knowledge transfer, as a means to strengthen information security, leads to knowledge utilization by cybersecurity students within information organizations. information security in libraries | san-nicolas-rocca and burkhard 62 https://doi.org/10.6017/ital.v38i2.10973 specifically, this study will explore the possibility of knowledge utilization leading to motivation, and participation in isrm initiatives in libraries. figure 2. knowledge transfer elements: cybersecurity knowledge transfer for information organizations. course material the course was offered to graduate students at mountain view university. course material was created based on the national institute of technology special publication (nist sp) 800-53 and 60, as well as federal information processing standards (fips) publications 199 and 200. the focus of the course was information security risk management (isrm). course requirements included lab exercises, discussion posts relating to current cybersecurity findings and news reports, and isrm assignments. isrm assignments included a personal risk management assignment, which then led to the completion of an organizational risk management project (ormp). students completed the ormp for various libraries, healthcare institutions, pharmaceutical companies, government organizations, and small businesses. with instructor approval, students were allowed to select the organization they wanted to work with. the objective of the course was for students to obtain an understanding of isrm and be able to apply what they have learned to the workplace. course communication seta programs depend strongly on the ability of the knowledge source to effectively communicate the importance and applicability of the knowledge shared. current research suggests that the type of communication medium, relevance and usefulness of the information, and competency of the instructor can affect knowledge transfer. given that face-to-face communication is considered the best method for successful knowledge transfer, it is important to understand if online communication methods were effective in the cybersecurity course described herein as the main focus of this study is to determine if knowledge transfer leads to knowledge utilization. 
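the isrm assignments described above ask students to identify and categorize organizational risks in the spirit of the nist and fips guidance the course draws on. as a tiny worked illustration of that general technique, the sketch below rates confidentiality, integrity, and availability impact for an information system and takes the high-water mark as the overall category, in the style of fips 199/200. it is not material from the course; the example systems and ratings are invented.

```python
# fips 199-style security categorization: rate potential impact on
# confidentiality (c), integrity (i), and availability (a),
# then take the high-water mark as the overall category.
LEVELS = {"low": 1, "moderate": 2, "high": 3}

def high_water_mark(confidentiality: str, integrity: str, availability: str) -> str:
    """Overall category is the highest of the three impact ratings."""
    return max(confidentiality, integrity, availability, key=LEVELS.__getitem__)

# invented examples for a small library, for illustration only
systems = {
    "patron database (pii, circulation records)": ("high", "moderate", "moderate"),
    "public events calendar": ("low", "low", "moderate"),
}

for name, (c, i, a) in systems.items():
    print(f"{name}: overall impact = {high_water_mark(c, i, a)} (c={c}, i={i}, a={a})")
```

a worked categorization like this is only the starting point of a risk management assignment; selecting and justifying mitigations for the higher-impact systems is where most of the effort lies.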
according to table 1, respondents “strongly agree” or “agree” that the materials used, relevance of communication, comprehension of instructor communication, and the amount of time communicating about cybersecurity in the course was effective (data collection described in section, data collection and analysis. information technology and libraries | june 2019 63 questions response strongly agree agree neither agree nor disagree disagree strongly disagree medium: the material used in the cybersecurity course i took at mvu communicated security lessons effectively. 12 (50%) 12 (50%) 0 (0.00%) 0 (0.00%) 0 (0.00%) relevance: communication during the cybersecurity course i took at mvu was effective in focusing on things i needed to know about cybersecurity for my job. 10 (45.45%) 12 (54.55%) 0 (0.00%) 0 (0.00%) 0 (0.00%) comprehension: in the cybersecurity course i took at mvu, the instructor’s oral and/or written communication with me was understandable. 12 (54.55%) 10 (45.45%) 0 (0.00%) 0 (0.00%) 0 (0.00%) amount: in the cybersecurity course i took at mvu, the amount of time communicating about cybersecurity was sufficient. 12 (54.55%) 10 (45.45%) 0 (0.00%) 0 (0.00%) 0 (0.00%) table 1. effectiveness of communication in cybersecurity course. data collection and analysis the purpose of this study is to determine if knowledge transfer through cybersecurity education, as a means to strengthen information security, leads to knowledge utilization within libraries. specifically, this study will examine if research participants will engage in isrm activities after completion of the cybersecurity education course. the model in figure 1 is examined via survey instrument by the authors. the survey instrument was available to former students who completed an online, semester long, cybersecurity course from fall 2013 through fall 2017. one hundred and twenty-six former students completed one of eight cybersecurity courses, and all were asked to participate in this study. thirty-nine students accessed the survey, but only thirty-eight agreed to participate. of those who agreed to participate in the survey, only twenty-two work in a library in the u.s. or a u.s. territory. of the other sixteen participants, twelve do not currently work within a library environment, and four do not have a job. therefore, responses from twenty-two research participants who work in a library in the u.s. or u.s. territory will be reported in this study. table 2 provides a list of the types of libraries the twenty-two research participants work in. type of library environment response (22) academic library 3 (13.64%) public library 11 (50%) school library (k-12) 2 (9.09%) special library 6 (27.27%) table 2. types of libraries research participants work in. information security in libraries | san-nicolas-rocca and burkhard 64 https://doi.org/10.6017/ital.v38i2.10973 having knowledge and an understanding of information security policies, work processes, and information and information system use within a library environment, a knowledge recipient may understand the value of the knowledge shared with them through effective seta programs and utilize the new knowledge to protect information and information resources. 
according to table 3, most survey participants stated that they have average to excellent knowledge of their library’s computing-related policies, work processes that handle sensitive patron information, how access to patron information is granted, and how internal staff tend to use computing devices to access organizational information. a few respondents stated that their knowledge is below average. questions: response excellent above average average below average poor how would you rate your knowledge of your organization’s computing-related policies for internal staff computer usage? 4 (18.18%) 10 (45.45%) 8 (36.36%) 0 (0.00%) 0 (0.00%) how would you rate your knowledge of your library’s work processes that handle sensitive patron information? 4 (18.18%) 11 (50%) 6 (27.27%) 1 (4.55%) 0 (0.00%) within the organization you work for, how would you rate your knowledge of how access to patron information is granted? 3 (13.64%) 12 (54.55%) 5 (22.73%) 2 (9.10%) 0 (0.00%) how would you rate your knowledge on how internal staff tend to use computing devices to access organizational information? 2 (9.10%) 11 (50%) 8 (36.36%) 1 (4.55%) 0 (0.00%) table 3. knowledge of organization’s computing-related policies. knowledge transfer for this study, knowledge transfer is measured as the extent to which the cybersecurity student acquired knowledge or understands the key educational objective. according to table 4 below, all survey participants stated that during the cybersecurity course, they acquired knowledge on information security risks, and solutions to manage information security risks within organizations. furthermore, 91 percent of the twenty-two survey participants stated that they gained an understanding of the feasibility to implement solutions and potential impact of not implementing solutions to manage information security risk within the organizations in which they work. this is consistent with previous research that has measured knowledge transfer.26 question: during the cybersecurity course i took at mvu, i _________. response acquired knowledge on information security risks within the organization. 22 (100%) acquired knowledge on solutions to manage information security risks identified within my organization. 22 (100%) gained an understanding of the feasibility to implement solutions to manage information security risks identified within my organization. 20 (90.90%) gained an understanding of the potential impact of not implementing solutions to manage information security risks identified within my organization. 20 (90.90%) information technology and libraries | june 2019 65 table 4. indicators of knowledge transfer. knowledge utilization the desired outcome of knowledge transfer is knowledge utilization.27 this study is interested in the extent to which cybersecurity students have been engaged in information security risk management initiatives in their workplace since the completion of the cybersecurity course. according to table 5, twelve of the twenty-two survey participants have utilized the knowledge transferred to them from the cybersecurity course within the libraries in which they work. of the twelve survey participants, ten performed security procedures within the organization on an ad hoc, informal basis. seven worked on defining new or revised security policies. four implemented new or revised security procedures for organizational staff to follow, and two evaluated at least one security safeguard to determine whether it is being followed by organizational staff. 
question: since the completion of the cybersecurity course i took at mvu, i have ______ (please check all that apply). response performed security procedures within the organization on an ad hoc, informal basis. 10 (83.33%) worked on defining new or revised security policies. 7 (58.33%) implemented new or revised security procedures for organizational staff to follow. 4 (33.33%) evaluated at least one security safeguard to determine whether it is being followed by organizational staff. 2 (16.66%) not performed any security procedures within the organization. 10 (45.45%) table 5. indicators of knowledge utilization in the library. participation knowledge transfer through cybersecurity education may influence a cybersecurity student to utilize the knowledge they have gained by participating in isrm activities. according to table 6, sixteen of the twenty-two survey participants have participated in isrm activities in the library in which they work since the completion of the cybersecurity course. fifteen communicated with internal senior management on training materials. seven performed a policy review and communicated with internal senior management on training materials. five worked on a security questionnaire, one had an interview with an external collaborator, and another research participant analyzed their library’s business or it process workflow. question: since the completion of the cybersecurity course you took at mvu, have you performed any of the following activities within the workplace: (please check all that apply) response security questionnaire 5 (31.25%) interview with external collaborator (i.e. trainers) 1 (6.25%) policy review 7 (43.75%) business or it process workflow analysis 1 (6.25%) communication with internal peers or staff on training materials 15 (93.75%) communicate with internal senior management on training materials 7 (43.75%) i have not performed any security activities in my workplace 6 (14.29%) table 6. participation in isrm activities. participation may also include discussions on isrm activities. according to table 7, sixteen of the twenty-two survey participants have participated in discussion on isrm activities within the information security in libraries | san-nicolas-rocca and burkhard 66 https://doi.org/10.6017/ital.v38i2.10973 libraries they are currently working at. fifteen survey participants participated in discussions on physical security, and ten had discussions on password policy. seven survey participants had discussions on user provisioning, and six had discussions on encryption. four survey participants had discussions on mobile devices, and another four had discussions on vendor security question: since the completion of the cybersecurity course you took at mvu, have you participated in discussions on the following areas of security? (check all that apply) response password policy 10 (62.5%) user provisioning (i.e., establishing or revoking user logons and system authorization) 7 (43.75%) mobile device 4 (25%) encryption 6 (37.5%) vendor security 4 (25%) physical security 15 (93.75%) disaster recovery, business continuity, or security incident response 6 (37.50%) i have not participated in any discussions relating to security in my workplace 6 (27.27%) table 7. participation in discussions on isrm activities. participation in cybersecurity education may lead to formal responsibility or accountability of isrm activities. 
according to table 8, nine of the twenty-two survey respondents stated that since the completion of the cybersecurity course, they are formally responsible or accountable for isrm in the libraries in which they work. three research participants are responsible for identifying organizational members to participate in cybersecurity training. five survey participants stated that they are responsible for communicating results on cybersecurity training to upper management, peers, and staff. three research participants are responsible for organizational compliance with government regulations. two are responsible for communicating organizational risk to the board of directors, and one research participant is responsible for organizational compliance of funder requirements. question: since the completion of the cybersecurity course you took at mvu, are you formally responsible or accountable in the following ways? (check all that apply) response identifying organizational members to participate in cybersecurity training 3 (33.33%) communicating results to upper management 5 (55.56%) communicating results to peers or staff 5 (55.56%) responsible for organizational compliance of funder requirements 1 (1.11%) responsible for organizational compliance with government regulations 3 (33.33%) responsible for internal audit 0 (0%) responsible for communicating organizational risk to the board of directors 2 (22.22%) i am not formally responsible for security in my workplace 13 (59.10%) table 8. participation via accountability of isrm activities. motivation an objective of seta programs is to motivate knowledge recipients to comply with information security policies that serve to protect information and information resources. as such, cybersecurity education may motivate students to comply with organizational information security policies that serve to protect information and information resources. according to table 9, information technology and libraries | june 2019 67 since the completion of the cybersecurity course, eighteen of the twenty-two survey participants stated that they believe it is important to protect patron sensitive data. two respondents stated that they wholeheartedly feel responsible to protect their patrons from harm, and another two stated that they would be embarrassed if their organization experienced a data breach. since the completion of the cybersecurity course i took at mvu, _________. response i wholeheartedly feel responsible to protect our patrons from harm. 2 (9.10%) i believe it is important to protect our patrons’ sensitive data. 18 (81.82%) i would be embarrassed if my organization experienced a data breach. 2 (9.10%) my job could be in jeopardy if my organization were to experience a data breach. 0 (0.00%) i do not care about cybersecurity in my organization. 0 (0.00%) table 9. motivation to protect patron privacy. discussion the purpose of this study was to evaluate the effects of knowledge transfer as a means to strengthen information security within libraries. given the results from the survey instrument, the findings suggest that knowledge transfer through cybersecurity education can lead to knowledge utilization. specifically, knowledge transfer through cybersecurity education may influence a library employee to utilize the knowledge they have gained by participating in discussions about, and the accountability and responsibility of isrm activities. in addition, participating in seta programs. 
seta programs are implemented within organizations as a means to increase compliance with information security policies. the findings suggest that library employees who completed a cybersecurity education course believe that it is important to, or feel that they have a responsibility to, protect patron private information. a couple of research participants stated that they would feel embarrassed if their organization experienced a data breach. a student enrolled in a cybersecurity education course may develop an understanding of and value the information that is passed on from the knowledge source about isrm activities. with ongoing development and implementation of seta programs, activating a student’s prior knowledge of isrm activities could enhance their ability to process new information and apply it to their jobs. limitations and future research this research was conducted based on an online cybersecurity course offered at a university located in the western u.s. therefore, future research is needed to study how cybersecurity courses in other parts of the u.s. and internationally affect knowledge transfer as a means to strengthen isrm initiatives in libraries and other information organizations. it would also be valuable to conduct a modified version of this research within a classroom-based, face-to-face cybersecurity course. furthermore, studies of seta programs implemented in libraries in the united states and internationally would add to this research area. there were 126 potential research participants identified, and although all were asked to participate, only thirty-eight completed the online survey. of the thirty-eight completed surveys, responses from twenty-two participants were reported in this article. participation from additional research participants may have generated different results. while a major limitation of this study is its small sample and exploratory, pilot-study focus, a next phase of research should further investigate what type of seta programs would be most effective in different library environments. while cybersecurity education may not be feasible for all library employees to obtain, examining and implementing the most effective seta program for each library environment could strengthen cybersecurity practices in libraries across the u.s. a future study instrument should take into account the factors that influence knowledge transfer (absorptive capacity, communication, motivation, and user participation) as a means to strengthen isrm practices. a common and important outcome for seta programs is user compliance with information security policies. as such, a future study should test library employee knowledge of, and compliance with, information security policies. conclusion u.s. libraries handle sensitive patron information, including personally identifiable information and circulation records. with libraries providing services to millions of patrons across the united states, it is important that they understand the importance of patron privacy and how to protect it. this study investigated how knowledge transferred within an online cybersecurity education course as a means to strengthen information security risk management affects library employee information security practices. the results of this study suggest that knowledge transfer does have a positive effect on library employee information security and risk management practices. 
references 1 “public library survey (pls) data and reports,” institute of museum and library services, retrieved on june 10, 2018 from https://www.imls.gov/research-evaluation/datacollection/public-libraries-survey/explore-pls-data/pls-data. 2 “policy concerning confidentiality of personally identifiable information about library users,” american library association, july 7, 2006, http://www.ala.org/advocacy/intfreedom/statementspols/otherpolicies/policyconcerning; “professional ethics,” american library association, may 19, 2017, http://www.ala.org/tools/ethics. 3 “privacy: an interpretation of the library bill of rights,” american library association, amended july 1, 2014, http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy. 4 ibid. 5 “policy concerning confidentiality of personally identifiable information about library users,” american library association; “code of ethics of the american library association,” american library association, amended jan. 22, 2008, http://www.ala.org/advocacy/proethics/codeofethics/codeethics. 6 “policy concerning confidentiality of personally identifiable information about library users,” american library association; “code of ethics of the american library association,” american library association. 7 “privacy: an interpretation of the library bill of rights,” american library association. 8 samuel t.c. thompson, “helping the hacker? library information, security, and social engineering,” information technology and libraries 25, no. 4 (2006): 222-25, https://doi.org/10.6017/ital.v25i4.3355. 9 roesnita ismail and awang ngah zainab, “assessing the status of library information systems security,” journal of librarianship and information science 45, no. 3 (2013): 232-47, https://doi.org/10.1177/0961000613477676. 10 ibid. 11 shayna pekala, “privacy and user experience in 21st century library discovery,” information technology and libraries 36, no. 2 (2017): 48-58, https://doi.org/10.6017/ital.v36i2.9817. 12 tonia san nicolas-rocca, benjamin schooley, and janine l. spears, “exploring the effect of knowledge transfer practices on user compliance to is security practices,” international journal of knowledge management 10, no. 2 (2014): 62-78, https://doi.org/10.4018/ijkm.2014040105; janine spears and tonia san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” international journal of knowledge management 11, no. 4 (2015): 52-69, https://doi.org/10.4018/ijkm.2015100104. 13 dong-gil ko, laurie j. kirsch, and william r. king, “antecedents of knowledge transfer from consultants to clients in enterprise system implementations,” mis quarterly 29, no. 1 (2005): 59-85, https://doi.org/10.2307/25148668. 14 spears and san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” 52-69; dana minbaeva et al., “mnc knowledge transfer, subsidiary absorptive capacity and hrm,” journal of international business studies 45, no. 1 (2014): 38-51, https://doi.org/10.1057/jibs.2013.43; geordie stewart and david lacey, “death by a thousand facts: criticising the technocratic approach to information security awareness,” information management & computer security 20, no. 1 (2012): 29-38, https://doi.org/10.1108/09685221211219182; mark wilson et al., “information technology training requirements: a role- and performance-based model” (nist special publication 800-16), national institute of standards and technology, (2018), https://www.nist.gov/publications/information-technology-security-training-requirementsrole-and-performance-based-model; san nicolas-rocca, schooley, and spears, “exploring the effect of knowledge transfer practices on user compliance to is security practices,” 62-78. 15 spears and san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” 52-69. 16 janine l. spears and henri barki, “user participation in information systems security risk management,” mis quarterly 34, no. 3 (2010): 503-22, https://doi.org/10.2307/25750689; piya shedden, tobias ruighaver, and atif ahmad, “risk management standards-the perception of ease of use,” journal of information systems security 6, no. 3 (2010): 23-41. 17 shedden, ruighaver, and ahmad, “risk management standards-the perception of ease of use,” 23-41; janne hagen, eirik albrechtsen, and stig ole johnsen, “the long-term effects of information security e-learning on organizational learning,” information management & computer security 19, no. 3 (2011): 140-54, https://doi.org/10.1108/09685221111153537. 18 “code of ethics of the american library association,” american library association. 19 spears and san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” 52-69; wilson et al., “information technology training requirements: a role- and performance-based model” (nist special publication 800-16). 20 thompson s.h. teo and anol bhattacherjee, “knowledge transfer and utilization in it outsourcing partnerships: a preliminary model of antecedents and outcomes,” information & management 51, no. 2 (2014): 177-86, https://doi.org/10.1016/j.im.2013.12.001; ko, kirsch, and king, “antecedents of knowledge transfer from consultants to clients in enterprise system implementations,” 59-85; minbaeva et al., “mnc knowledge transfer, subsidiary absorptive capacity and hrm,” 38-51; geordie stewart and david lacey, “death by a thousand facts: criticising the technocratic approach to information security awareness,” information management & computer security 20, no. 1 (2012): 29-38, https://doi.org/10.1108/09685221211219182. 21 martin spraggon and virginia bodolica, “a multidimensional taxonomy of intra-firm knowledge transfer processes,” journal of business research 65, no. 9 (2012): 1273-82, https://doi.org/10.1016/j.jbusres.2011.10.043; shizhong chen et al., “toward understanding inter-organizational knowledge transfer needs in smes: insight from a uk investigation,” journal of knowledge management 10, no. 3 (2006): 6-23, https://doi.org/10.1108/13673270610670821. 22 maryam alavi and dorothy e. leidner, “review: knowledge management and knowledge management systems: conceptual foundations and research issues,” mis quarterly 25, no. 1 (2001): 107-36, https://doi.org/10.2307/3250961. 23 ko, kirsch, and king, “antecedents of knowledge transfer from consultants to clients in enterprise system implementations,” 59-85.
24 san nicolas-rocca, schooley, and spears, “exploring the effect of knowledge transfer practices on user compliance to is security practices,” 62-78; spears and san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” 52-69. 25 spears and san nicolas-rocca, “knowledge transfer in information security capacity building for community-based organizations,” 52-69; spears and barki, “user participation in information systems security risk management,” 503-22. 26 san nicolas-rocca, schooley, and spears, “exploring the effect of knowledge transfer practices on user compliance to is security practices,” 62-78; janine l. spears and tonia san nicolas-rocca, “information security capacity building in community-based organizations: examining the effects of knowledge transfer,” 49th hawaii international conference on system sciences (hicss), koloa, hi, 2016, 4011-20, https://doi.org/10.1109/hicss.2016.498; ko, kirsch, and king, “antecedents of knowledge transfer from consultants to clients in enterprise system implementations,” 59-85. 27 ko, kirsch, and king, “antecedents of knowledge transfer from consultants to clients in enterprise system implementations,” 59-85; teo and bhattacherjee, “knowledge transfer and utilization in it outsourcing partnerships: a preliminary model of antecedents and outcomes,” 177-86. editorial board thoughts virtual reality: the next big thing for libraries to consider breanne kirsch information technology and libraries | december 2019 4 breanne kirsch (breanne.kirsch@briarcliff.edu) is university librarian, briar cliff university. i had the pleasure of attending the educause annual conference from october 14-17, 2019. this was my first time at educause, but i was impressed with the variety of programs, vendors, and options for learning about technology and higher education. after recently completing my coursework for a second master's in educational technology, i was curious to see what new technologies would be highlighted at educause. i found out about some new trends, such as the growth of esports in high schools and higher education. in esports, players or teams compete through computers in video game competitions.1 there were over 20 programs and sessions about virtual reality at educause. since there were so many programs about virtual reality at educause, i wanted to share a little of what i learned, including how some higher education institutions are creating vr content, how they are using pre-created content, and how vr is being used in libraries. since virtual reality is still new to many higher education institutions, i wasn't sure how many would be creating content, but i did attend a couple of sessions about how 360-degree content is being created. virtual reality content creation seems to happen most frequently in the medical field so students can practice different procedures that may not happen very frequently in their jobs, allowing them to experience a wider variety of procedures that they will eventually encounter in the workplace. health sciences libraries are generally ahead of the curve in providing vr services to patrons.2 additionally, stem areas are finding more uses for vr, such as vr laboratories, so that expensive lab equipment does not need to be purchased but students can still participate in vr lab experiences. creating vr content using tools such as unity can be difficult and time-consuming.
some educators are using 360-degree cameras to create virtual settings that can be used by students but are easier to create. tim fuller and rich kappel spoke about how they used a 360-degree camera and matterport scans to create 360-degree virtual environments for students to explore and engage with robotics technology. tags can then be added to include pictures, videos, or links to websites with more information. this creates a shareable link that can be sent to students. i was able to use my iphone and the google street view app to create a 360-degree tour of my library. it is not of high enough quality to view in virtual reality with an oculus go or other vr headset, but it is a great starting point for creating a 360-degree virtual tour of a library on a budget. this was free (since i already had an iphone). there is a wide variety of freely available 360-degree content that can be used by educators in the classroom, and more is being created. what does this mean for libraries? while quick virtual tours can be created with smartphones, higher quality vr experiences can also be created by librarians using a 360-degree video camera. these experiences could be used to teach students information literacy skills or search strategies in a vr environment. while this would be harder to do right now with the technologies available, it could become easier down the road. meanwhile, librarians can create 360-degree virtual tours. libraries can offer vr services, such as a vr lab or checking out standalone vr headsets, such as the oculus go or oculus quest. just like with the makerspaces trend, libraries are well situated to support virtual reality in education. our library circulates an oculus go, and when we were considering adding a virtual reality headset, there were some risks we considered prior to purchasing it. there are health risks for some people when using virtual reality headsets, such as motion sickness, dizziness, and, in some cases, epileptic seizures. it is important to explain this to students before they check out the device, so they know to immediately quit using the oculus go if they have an adverse reaction. additionally, we keep cleaning wipes with the oculus go to help keep it sanitary when multiple people are using it. a tablet or smartphone needs to be associated with the oculus go in order to update apps or download new apps. therefore, a passcode needs to be added so students can't purchase paid apps on the oculus go with the associated credit card. privacy can also be a concern, especially when using the social apps, which is why i decided not to download the social apps on the oculus go at this time. some of the scary apps, such as the face your fear app, can cause students to scream, so it is important that students realize how realistic the experiences are before using them. one final consideration when offering vr services is staffing. there needs to be someone trained in the library who can help teach students how to use the vr headset and experiences. i've trained each of our student workers in how to use the headset so they can show other students. while these are some important considerations when deciding whether to offer vr services or not, i believe the benefits outweigh the risks.
virtual reality is expected to continue to grow, especially with wireless headsets, such as the oculus go and oculus quest, available. it is important for libraries to be ready to offer support with virtual reality, just as we've offered support for prior technologies including tablets, laptops, computers, 3d printers, etc. libraries can start small, by circulating an oculus go or creating a 360-degree library tour. libraries with more resources could create a vr lab or provide support for creating vr content, such as 360-degree video cameras or tools like unity. it will be exciting to see how libraries can support vr in the future. further readings van arnhem, jolanda-pieta, christine elliott, and marie rose. augmented and virtual reality in libraries. lanham: rowman & littlefield, 2018. varnum, kenneth j. beyond reality: augmented, virtual, and mixed reality in the library. chicago: ala editions, 2019. endnotes 1 matthew a. pluss, kyle j. m. bennett, andrew r. novak, derek panchuk, aaron j. coutts, and job fransen, “esports: the chess of the 21st century,” frontiers in psychology 10, no. 156, 2019, https://doi.org/10.3389/fpsyg.2019.00156. 2 susan lessick and michelle kraft, “facing reality: the growth of virtual reality and health sciences libraries,” journal of the medical library association 105, no. 4, 2017, https://doi.org/10.5195/jmla.2017.329. measuring information system project success through a software-assisted qualitative content analysis jin xiu guo information technology and libraries | march 2019 53 jin xiu guo (jin.x.guo@stonybrook.edu) is director of collections and resource management, frank melville, jr. memorial library, stony brook university abstract information system (is)/it project success is a growing interest in management due to its high impact on organizational change and effectiveness. libraries have been adopting integrated library systems (ils) to manage services and resources for years. it is essential for librarians to understand the mechanism of is project management in order to successfully bring technology innovation to the organization. this study develops a theoretical model of measuring is project success and tests it in an ils merger project through a software-assisted qualitative content analysis. the model addresses project success through three constructs: (1) project management process, (2) project outcomes, and (3) contextual factors. the results indicate project management success alone cannot guarantee project success; project outputs and contextual factors also influence success through the leadership of the project manager throughout the lifecycle. the study not only confirms the proposed model in a post-project evaluation, but also signifies that project assessment can reinforce organizational learning, increase the chance of achieving success, and maximize overall returns for an organization. the qualitative content analysis with nvivo 11 has provided a new research method for project managers to self-assess is/it project success systematically and learn from their experiences throughout the project lifecycle. introduction information technology (it) project success has drawn more attention in the last two decades due to its high impact on organizational change. more companies have pursued innovation to gain business advantages through is projects.
in the united kingdom alone, 21 percent of the increase in gross value in manufacturing and construction happens through complex products and is development projects. however, the implementation of is projects has not been as successful as practitioners hoped. nicholas and hidding reported that only 35 percent of it projects were completed on time and budget and met the project requirements.1 the u.s. office of electronic government and information technology (oegit) noted that only 25 percent of 1,400 projects reached the office's goals and that more than $21 billion spent on it projects was in jeopardy.2 in the european union, about 20 to 30 percent of contracted it/is projects could not meet the stakeholders' expectations, causing a loss of ₤70 billion, or $99 billion.3 although some it projects are considered successful from the perspective of project management, project sponsors hardly recognize the results as leading to organizational effectiveness. it is critical for it practitioners to explore new methods to articulate what it project success is and then improve project performance. traditionally, the measurement of it project success focuses on internal measures such as project time, cost, risk, and quality, which address project efficiency. in recent years, external measures, such as product satisfaction and organizational effectiveness, have gained more attention. moreover, contextual factors such as top management support, project managers' qualifications, system vendors, implementation consultants, and adaptation to change have shown critical effects on project success. there is still a lack of literature on post-project evaluation and on the merger of multiple information systems (is). notably, the consolidation of information systems of different organizations creates additional challenges for the new organizations. diverse cultures and leadership styles may create barriers for managers to gain the trust of employees who used to work at a different institution. nevertheless, adaptation to change for all staff is necessary in the course of the merger. the need for addressing the impact of these factors on is project success is increasing. libraries have adopted the ils to manage services and resources for the last two decades. the next generation of systems, cloud-based library management systems, is now replacing existing ils. to improve the efficiency of higher education, consolidation of public universities or colleges is still a viable alternative. it is essential for librarians to understand the mechanism of is project management in order to successfully bring technology innovation to the organization. this study fills the gap by examining is project success factors and developing a model to measure is project success. the model can help practitioners better understand is project success and improve the chance of success. the author first provides a historical account of the definitions of project success and the measures adopted, and then applies the model in a post-project evaluation at an academic library. theoretical background researchers and practitioners have been seeking it project success through both quantitative and qualitative studies to find out what makes a successful it project and how a project manager can make better decisions to increase the chance of project success. this review examines how project success is defined and what criteria practitioners employ for measurement.
it projects can be at different levels of complexity. for instance, a project of enterprise resource planning (erp) implementation is more complicated and requires more resources to deploy across organizational functions. this type of project might quickly overrun its budget and deadline. as a result, studies on erp implementation success draw more attention. cảrstea believes that project success is achieving the targets that an organization has set and can be measured against time, cost, quality, final results obtained, resources, the degree of automation, and international standards with a flexible evaluation system. he suggests that project managers may analyze the discrepancies between current and new goals to self-evaluate progress.4 although this method emphasizes project efficiency, the self-developed evaluation system has shown the potential for it project managers to control the planning and organization of multiple it projects within the organization. instead of studying the project management process alone, tsai et al. incorporate system providers, implementation consultants, and the achievement level of project management into delone and mclean's modified is success model. they describe erp project success as efficient deployment and enhancement of organizational effectiveness. the success indicators include the accomplishment level of project management and the degree of improvement of is performance. the metrics of project management are fulfilling the business implementation goal, top management support, budget, time, communication, and troubleshooting, while the system performance dimensions include achieving integration of systems for system quality, information quality, system use, user satisfaction, and individual and organizational impacts. the authors applied the research model to a quantitative study to test five hypotheses with servqual (service quality) instruments. the results show that the services provided by system vendors and implementation consultants are correlated with project management, and project management in turn with system performance.5 it is worth mentioning that this measurement integrates project management into the is success model and confirms the contribution of project management to erp performance that leads to the improvement of organizational effectiveness. both studies indicate is project measures should comprise the dimensions of project management success and business goals. with a similar interest in erp, young and jordan investigate the impact of top management support (tms) on erp implementation success through descriptive case studies. the authors regard project success as the delivery of “expected benefits” and the achievement of “above average performance.” the findings of the research reveal that tms is the most important critical success factor (csf) that affects it project success through the involvement of top management in project planning, result follow-ups, and the facilitation of management problem-solving, but project management success alone does not guarantee project success that results in organizational productiveness.6 researchers are also interested in different perspectives of it project success. irani believes is project appraisal should incorporate investment evaluation into the project lifecycle.
a project manager evaluates is impacts before, during, and after the investment is secured to dynamically justify the investment and ensure the project is in alignment with the organizational strategy. the author also points out that post-project evaluation is lacking in current project management, so organizations lose a great opportunity to learn and to optimize their project management.7 furthermore, peslak inspects the relationship between it project success and overall it returns from the viewpoint of financial executives. the author defines it project success as organizational success in which staying abreast of technology, the ability to measure projects, and balanced managerial control over projects positively affect it project success, and project success in turn affects overall it returns.8 likewise, lacerda and ensslin develop a conceptual model from the standpoint of external consultants to assess software projects. the theoretical framework contains the hierarchical structure of value, analysis, and recommendation, where they identify performance descriptors and analyze project values to improve the decision process in the course of the consultation.9 nicholas and hidding discover, through a series of interviews with it project managers, that business goals, time for learning and reflection, and flexibility of the product are associated with project success.10 additionally, researchers make efforts to explain project outcomes to better understand project success. thomas and fernández believe project success varies from company to company, but the success criteria should consist of project management, technical system, and business goals that underscore business continuity, met business objectives, and delivery of benefits.11 another study performed by kutsch also shows that the achievement of business purpose; benefit to the owner; the satisfaction of owners, users, and stakeholders; achieving pre-stated objectives; quality; cost and time; and satisfaction of the team are sequentially significant variables affecting project outcomes.12 the study further attests that organizational effectiveness is an essential criterion of it project success. interestingly, researchers also examine individual success indicators such as quality and risk to deepen their understanding of project success. geraldi, kutsch, and turner think project quality has eight attributes, including (1) a commitment to quality, (2) enabling capabilities, (3) completeness, (4) clarity, (5) integration, (6) adaptability, and (7) compliance, along with (8) value-adding and met requirements.13 among them, enabling capabilities and adaptability are comparatively new. this discovery discloses that project quality is evaluated vigorously across the project lifecycle, which is consistent with cảrstea's finding that project managers need to assess their projects regularly to recognize project controls and safety and achieve project goals. such practices create agility for software development projects and secure the resources needed for development. summary of literature the literature review indicates that it is necessary to define project performance criteria and outcomes to measure is project success. is project success is the achievement of both the project management process and the project goals. when measuring an is project, practitioners should also consider the impacts of contextual factors throughout the project lifecycle.
system vendors, consultants' services, management support, communication, adaptation to change, time for learning and reflection, product flexibility, and project complexity are environmental influences. it is essential for practitioners to create an opportunity for organizational learning and improve future project success through a post-project evaluation. figure 1. the relationship between project success and organizational effectiveness (projects support business cases, which support business goals and objectives, which in turn drive organizational effectiveness). project success model the purpose of this study is to develop the measurement of is project success based on the findings of the literature review. therefore, the first step is to define project success. project success comprises project management success and the achievement of business goals. in previous studies, practitioners emphasized project management success but paid less attention to project outcomes, which leads to many unexplained project failures. for example, some it projects did not meet the business goals but conformed to the criteria of project management success; such a project might be considered successful from the perspective of the project management process although it failed to attain the project goals. the relationship between is projects and organizational effectiveness is described in figure 1. each is project makes at least one business case, and each business case contributes to at least one business objective. an is project is successful if the project outcomes reach the business goals, resulting in organizational effectiveness. the purpose of project performance criteria is to measure project progress throughout the lifecycle. without standards, a project manager could lose control over the project and most likely fail. as a result, the next step is to identify the measures of project success. the indicators of project management success have been widely studied and tested. project scope, time, cost, quality, and risk are at the top of the metrics list. the literature review shows researchers employ business continuity, achieving business objectives, delivery of benefits, and the perceived value of a project to measure project outcomes. it is noteworthy that contextual factors also impact project success; influences such as top management support (tms), user involvement, system vendors, the project manager's qualifications, communication, project complexity, and adaptation to change need to be measured as well. hence, the author proposes a measurement model as shown in figure 2. figure 2. model for measuring is project success. three constructs affect is project success in this model. the project management process is a tool to help a project manager attain success, where project performance criteria are identified to control quality and assess progress throughout the lifecycle. on the other hand, project outcomes entail project goals to ensure ultimate project success. the contextual factors may contribute to success directly or indirectly by influencing the project management process or the organizational environment, such as change management. therefore, a project manager has to examine all three constructs when assessing project success. to demonstrate the application of the model, the author conducted a case study on an ils merger project.
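to keep the three constructs and their indicators in view, here is a toy sketch of the model as a data structure. it is only an illustration: the python dataclass fields restate the indicators listed above, and the pass/fail logic is a simplification for reading purposes, not the author's instrument.

```python
from dataclasses import dataclass

@dataclass
class ProjectManagementProcess:      # performance criteria tracked across the lifecycle
    scope_met: bool
    on_time: bool
    on_budget: bool
    quality_acceptable: bool
    risk_controlled: bool
    communication_effective: bool

@dataclass
class ProjectOutcomes:               # ultimate project goals
    business_continuity: bool
    business_objectives_met: bool
    benefits_delivered: bool

@dataclass
class ContextualFactors:             # environmental influences, direct or indirect
    top_management_support: bool
    user_and_vendor_involvement: bool
    manager_qualified: bool
    adaptation_to_change: bool

def is_project_success(process: ProjectManagementProcess,
                       outcomes: ProjectOutcomes,
                       context: ContextualFactors) -> bool:
    # simplification of the model: success requires all three constructs to hold
    return (all(vars(process).values())
            and all(vars(outcomes).values())
            and all(vars(context).values()))
```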
case study: a post-project evaluation background in november 2013, the board of regents of the university system of georgia announced the consolidation of kennesaw state university (ksu) and southern polytechnic state university (spsu). the merger of the two state university libraries was one of the main tasks and involved merging two integrated library systems (ils). the project involved removing duplicate bibliographic and customer records between the two libraries and merging the relational databases that contain financial, bibliographic, transactional, vendor, and customer data. the ils provider, ex libris, and the two university libraries executed the merger with the support of galileo interconnected libraries (gil) it staff. the ils merger implementation team comprised two it experts from ex libris and fourteen ils users from the two libraries across five functional units: acquisition, cataloging, circulation and interlibrary loan, serials, and system administration. ksu/spsu and ex libris had a project manager on each side, and the author was the ksu/spsu project manager. the gil support team facilitated the implementation of the merger. the project goal was to operate the two libraries with a consolidated ils by july 2015 without interrupting services to students, staff, and faculty. the project was completed within eighty-one days, and the consolidated university libraries were operating uniformly by the target date. the team also won the 2015 georgia library association team award due to its success. methodology the methodologies adopted in previous research include interviews and surveys. both methods need to collect feedback from stakeholders during the post-project period, and it can be challenging to reach project stakeholders once the project is completed. however, many written communications, including project documentation, emails, and reports, are invaluable data for project managers to assess project success. researchers have utilized software to assist content analysis in qualitative studies. hoover and koerber used nvivo to analyze data such as text, interview transcripts, photographs, and audio and video recordings by coding and retrieving to understand sophisticated relations among those data.14 researchers think that computer-assisted qualitative data analysis (caqdas) has created new research practices and helped data analysis, research management, and theory development, such that caqdas has become a synonym for qualitative research.15 balan's team manually coded and categorized the dimensions identified in concept analysis, then employed concept mapping to present data relationships, an integration of qualitative and quantitative methods.16 the word tag cloud in nvivo is a technique to assess the relevance of the gathered data to the research topic, while the treemap is a tool to extract new themes, along with their relationships, from the study data.17 hutchison et al. believed that caqdas could facilitate grounded theory investigation. the group utilized the memo in nvivo to monitor emerging trends and justify the research purpose and theoretical sampling procedures. they also used the model-building tool to visualize their analytical observations.18 a study on content analysis of news articles indicated nvivo could assist qualitative research through data organization, idea management, querying data, and modeling.
the research group also raised a concern about analytical reliability because qualitative data analysis is a highly interpretive method. therefore, they suggested utilizing double coding and comparison of codes by different researchers to resolve this problem.19 paulus's team suggested researchers should write a description of the software to allow audiences unfamiliar with the tool not only to appreciate its role in the study, but also to understand precisely how the software enhances their analyses.20 in this case study, the author adopted nvivo 11 to conduct a content analysis to test the proposed model by measuring is project success, a qualitative method for practitioners to assess a project with textual data in the post-project period. data collection the data gathered in this study include the email communications between the project manager and stakeholders, the reports of the university consolidation operational work group (owg), and project committee reports. after reviewing all document data to ensure relevancy to the research topic, the author imported 878 emails, twenty-five owg reports, and sixty-three project committee reports into nvivo 11. content analysis process the software: nvivo 11. nvivo 11 is a software package that allows researchers to collaborate and conduct qualitative studies. researchers can import various types of raw data, including social media, into nvivo 11 to store, manage, and share the data throughout the research process. however, initially learning and mastering the software can pose a difficult hurdle for researchers performing a software-assisted qualitative study. data preparation and import. nvivo 11 can process documents (ms word, pdf, or rtf), surveys, audio, video, and images. researchers may import outlook emails saved as .msg files into nvivo 11 directly. it is also noted that emails imported into nvivo become pdfs and that any supported attachments are imported as well. in this study, the owg and committee reports in either ms word or pdf were imported to nvivo directly. to ensure the email content was relevant to the project, the author opened nvivo 11 and outlook 2010 simultaneously, and then dragged each email into the sources list view of nvivo 11 (see figure 3) after reviewing it. figure 3. sources list view in nvivo 11. coding. coding is a way of categorizing all references to a specific topic, theme, person, or other entity. the process of coding can help researchers to identify patterns and theories in research data.21 in this study, the author adopted coding using queries to answer the following research questions: • what is is project success? • what are the factors that affect is project success? • how do these factors influence is project success? below are the steps of coding source data: • run the query of word frequency in all data sets using the criteria of the one hundred most frequent words with a minimal five-character length, including exact matches, stemmed words, and synonyms. • review the word list, remove irrelevant words from the list, and re-run the query until the words are accurate and relevant to the research topic. • create the parent nodes (e.g., contextual factors, project management process, project outcomes) and child nodes (e.g.,
top management support, manager's qualifications, project involvement) based on the proposed model, and then save the results of word frequency in the respective nodes (see the coding in figure 4). • run the query of word frequency with the same criteria in the context nodes (within each parent node). • review the results of word frequency and save each new word as a new node. • review all node references and sources, merge relevant nodes, and remove irrelevant ones as needed. figure 4. coding using queries. findings and discussion an overview of content analysis previous studies have shown that visualization tools such as models, charts, and treemaps provided by nvivo can be helpful to present the findings of qualitative studies.22 therefore, the author used the model tool to gain a better understanding and overview of key themes in the ils merger project. since the number of emails is much larger than the number of reports, the author decided to display the themes of emails and reports separately. figures 5 and 6 are the word treemaps for the emails and reports, respectively. figure 5. email tree map. figure 6. report tree map. the treemap is the visualization of the results of the word frequency queries. in figure 5, the concepts of patron, barcode, missing, fines, charge, circulation, and policy are library user transactional data, while order, vendor, complete, wilson, lines, taylor, and holding show procurement information. the themes of production, mapping, duplicate, matching, location, cataloging, and process stand for library resource data. hence, the acquisition, bibliographic, patron, and transactional data are the primary content migrated to the new ils. the names mentioned, such as russell, adriana meryll, trotter, and david, reveal the involvement of system and service providers and top management. figure 6 displays more details on library resource data such as serials, codes, bibliographic, ebook, format, journal, and print. the user transactional data also appear. the subjects of production, implement, identify, training, mapping, match, finish, matrix, plan, procedure, campus, and urgent indicate the project management process. the term “accepted,” in contrast, shows one of the project outcomes. the treemaps shown above demonstrate that project management process, the involvement of users and system providers, top management, and project outcomes are the representatives of project success, which implies that project success means succeeding in the project management process, achieving the project outcomes, and engaging top management, system users, and providers. how do these factors come together to impact project success? the next step is to examine the relationships among these variables and their interactions. relationships among constructs to analyze the concepts of contextual factors, project management process, and project outcomes further, the author utilized the model tool to create project maps. project maps are graphic representations of the data in a project, which help illustrate the relationships among constructs and answer the research questions of this study. the author further inspected each construct node by creating project maps. figure 7. project management process map. figure 7 shows the relationships among the variables that affect the project management process.
the child nodes of communication, project cost, quality, risk, time, and scope are the influencers of the project management process. their respective child nodes, such as barcodes, missing, and deadline, are the results of coding source data and support how the concepts of communication, cost, quality, risk, scope, and time correspondingly affect the project management process. figure 8. contextual factor map. contextual factors have not been thoroughly discussed in previous project management practices. figure 8 illustrates the results of coding source data within this construct. the engagement of users and vendors, and their feedback, signifies the variable of project involvement. the node of top management also confirms its parent node of top management support. furthermore, jin, as the project manager, is associated with the node of project manager's qualifications. she could affect project success either directly or indirectly through contextual factors. figure 9. project outcomes map. figure 9 represents the themes: downtime, service satisfaction, and acceptance are the child nodes of business continuity, delivery of benefits, and project deliverables, respectively. the pdf reference source supports the subjects of “satisfaction of service” and “conditional acceptance” as the child nodes of “delivery of benefits” and “project deliverables,” respectively. thus, business continuity, delivery of benefits, and project deliverables are the core factors to be considered when assessing project outcomes. figures 7, 8, and 9 have demonstrated that the project would not be successful if the project management process was not executed appropriately, contextual factors were not fully met, or preferred project outcomes were not delivered. in other words, if any one of the three project variables above is not executed or delivered appropriately, the project could fail. the role of the project manager figures 7, 8, and 9 have signified that the three constructs can affect project success, but they do not tell how project management process, project outcomes, and contextual factors play together in this role. consequently, the author hoped to identify the connections between project items and to see if there are gaps or isolated items unexplained by the proposed model. to create such a project map in nvivo 11, the author chose emails as project items and added the issues associated with the project manager, jin, to the map. figure 10. manager's project map. this case study tests the proposed model in a post-project assessment, and the manager's project map in figure 10 illustrates this purpose well. the project manager, jin, led the project to success by influencing the project management process, project outcomes, and contextual factors. the project success in this case includes the contribution to the consolidation of two state universities and the maximization of library resources for the organization. the outcomes of the merger project are to deliver a consolidated ils and to provide library services for the new university continuously. figure 10 clearly indicates jin managed business continuity and project deliverables through downtime and load acceptance.
among contextual factors, the project manager executed project involvement through engaging system users and vendors and gathering user feedback. she also involved top management, david, in the project directly. senior management empowered jin to make decisions on the project. as a manager, her qualifications enabled her to cope with the complexity of the project. the project documentation has verified the manager's ability to govern the project. for instance, figure 11 is the project framework that the manager created according to the pmbok (project management body of knowledge). hence, a qualified project manager can directly make impacts on project success through contextual factors. figure 11. ksu library ils merger project management framework. meanwhile, the nodes of barcode, mappings, missing, patrons, and vendors confirm the manager's role in project quality control. the coding of deadline, cost-consolidation, communication, and risk control indicates the manager put her effort into project time, cost, and communication management and risk mitigation, correspondingly. figure 10 reveals the project manager is the core of the project team and makes significant impacts on project success by influencing the project management process, contextual factors, and project outcomes. a project manager must fully understand project outputs, have the ability to execute project plans in the business environment, and communicate with different stakeholders at the corresponding levels through various channels, since communication becomes challenging when a project involves more people from different sections of the business. people decode messages differently. multiple communication chains can help stakeholders gain consistent and accurate information directly. for example, this project manager utilized formal reports, group discussions, training, and weekly coordination meetings to share information and seek feedback. the functional groups are the governance structure of the project. in the phase of test and production loads, the leaders of functional groups communicated problems to the project manager more frequently to ensure the manager could resolve issues in collaboration with related stakeholders (e.g., ex libris) in a timely way. in the meantime, the project manager regularly communicated expectations to the responsible it staff to prevent additional waiting time for feeding the merged ils with patron data, by verifying the feeder during the test load, which helped meet the deadlines of the campus it projects. the manager mitigated risk by implementing the project plan thoughtfully throughout the project. it was the project manager who connected the three variables (project management process, project outcomes, and contextual factors) with project success. conclusions libraries have used the ils to manage resources and services for decades. with the exponential growth of digital information, is innovation continues to be one of the most effective drivers of library transformation. therefore, it is crucial for libraries to effectively manage is/it projects to achieve organizational goals. this study develops a model of is project success. the model employs three constructs, namely project management process, project outcomes, and contextual factors, to measure is project success.
project management success cannot bring is project success unless the project results achieve business goals and lead to the improvement of organizational effectiveness. the project manager makes important impacts on project success by delivering project outcomes through implementing the project management process and making use of contextual factors throughout the project. the research methodology, software-assisted qualitative content analysis, can be an approach for library practitioners to develop or test a theoretical model. a post-project evaluation can create an excellent opportunity for organizational learning and help managers to manage talent better and improve the chances of project success in the future. future research libraries have moved into a new era that is full of new and disruptive technologies, which affect library services, operations, and decisions on a daily basis. is projects will continue bringing innovations to library services and programs. a theoretical framework could provide librarians with a methodology to manage is projects successfully. notably, the u.s. senate has unanimously approved the program management improvement and accountability act (pmiaa) to enhance project and program management practices to maximize efficiency in the federal government.23 project management has become a must-have skill for today's library leaders. there are many opportunities for managers to test the is project success model through their practices. future studies may combine quantitative and qualitative methods to assess and enhance the model further. each institution has different goals and contextual indicators that the author has not mentioned in this study. these factors might shift from minor to major or vice versa due to different organizational cultures. practitioners can also use nvivo to collaborate on double coding to increase analytical reliability. a software-assisted qualitative content analysis will help library leaders to understand project management better and experiment with solutions in a complex information world. acknowledgements this work would not have been possible without the support of the ksu library system administration and the team efforts from the ksu voyager consolidation committee, gil support, and the ex libris team. i am grateful to all of those with whom i have had the privilege to work during this project. references 1 john nicholas and gezinus hidding, “management principles associated with it project success,” international journal of management and information systems 14, no. 5 (nov. 2, 2010): 147-56, https://doi.org/10.19030/ijmis.v14i5.22. 2 alan r. peslak, “information technology project management and project success,” international journal of information technology project management 3, no. 3 (july 2012): 31-44, https://doi.org/10.4018/jitpm.2012070103. 3 udechukwu ojiako, eric johansen, and david greenwood, “a qualitative re-construction of project measurement criteria,” industrial management & data systems 108, no. 3 (mar. 2008): 405-17, https://doi.org/10.1108/02635570810858796. 4 claudia-georgeta cảrstea, “it project management—cost, time and quality,” economy transdisciplinarity cognition 17, no. 1 (mar. 2014): 28-34, http://www.ugb.ro/etc/etc2014no1/07_carstea_c..pdf. 5 wen-hsien tsai et al., “an empirical investigation of the impacts of internal/external facilitators on the project success of erp: a structural equation model,” decision support systems 50, no. 2 (jan.
2011): 480-90, https://doi.org/10.1016/j.dss.2010.11.005. 6 raymond young and ernest jordan, “top management support: mantra or necessity?” international journal of project management 26, no. 7 (oct. 2008): 713-25, https://doi.org/10.1016/j.ijproman.2008.06.001. 7 z. irani, “investment evaluation within project management: an information systems perspective,” the journal of the operational research society 61, no. 6 (june 2010): 917-28, https://doi.org/10.1057/jors.2010.10. 8 peslak, “information technology project management and project success,” 31-44. 9 tadeau oliveira de lacerda, leonardo ensslin, and sandra rolim ensslin, “a performance measurement view of it project management,” international journal of productivity and performance management 60, no. 2 (2011): 132-51, https://doi.org/10.1108/17410401111101476. 10 nicholas and hidding, “management principles,” 153. 11 graeme thomas and walter fernández, “success in it projects: a matter of definition?” international journal of project management 26, no. 7 (oct. 2008): 733-42, https://doi.org/10.1016/j.ijproman.2008.06.003. 12 elmar kutsch, “the measurement of performance in it projects,” international journal of electronic business 5, no. 4 (2007): 415, https://doi.org/10.1504/ijeb.2007.014786. 13 joana g. geraldi, elmar kutsch, and neil turner, “towards a conceptualisation of quality in information technology projects,” international journal of project management 29, no. 5 (july 2011): 557-67, https://doi.org/10.1016/j.ijproman.2010.06.004. 14 ryan s. hoover and amy l. koerber, “using nvivo to answer the challenges of qualitative research in professional communication: benefits and best practices tutorial,” ieee transactions on professional communication 54, no. 1 (mar. 2011): 68-82, https://doi.org/10.1109/tpc.2009.2036896. 15 erika goble et al., “habits of mind and the split-mind effect: when computer-assisted qualitative data analysis software is used in phenomenological research,” forum: qualitative social research 13, no. 2 (may 2012): 1-22, https://doi.org/10.17169/fqs-13.2.1709. 16 peter balan et al., “concept mapping as a methodical and transparent data analysis process,” in handbook of qualitative organizational research (london: routledge, 2015): 318-30, https://doi.org/10.4324/9781315849072. 17 syed zubair haider and muhammad dilshad, “higher education and global development: a cross cultural qualitative study in pakistan,” higher education for the future 2, no. 2 (july 2015): 175-93, https://doi.org/10.1177/2347631114558185. 18 andrew john hutchison, lynne halley johnston, and jeff david breckon, “using qsr-nvivo to facilitate the development of a grounded theory project: an account of a worked example,” international journal of social research methodology 13, no. 4 (oct. 2010): 283-302, https://doi.org/10.1080/13645570902996301. 19 florian kaefer, juliet roper, and paresha sinha, “a software-assisted qualitative content analysis of news articles: example and reflections,” forum: qualitative social research 16, no. 2 (may 2015): 1-20, https://doi.org/10.17169/fqs-16.2.2123. 20 trena paulus et al., “the discourse of qdas: reporting practices of atlas.ti and nvivo users with implications for best practices,” international journal of social research methodology 20, no. 1 (jan. 2017): 35-47, https://doi.org/10.1080/13645579.2015.1102454. 21 “about coding,” nvivo help (melbourne, australia: qsr international, 2018), accessed apr.
3, 2018, http://helpnv11.qsrinternational.com/desktop/concepts/about_coding.htm?rhsearch=coding&rhsyns=. 22 paulus et al., “discourse of qdas,” 41. 23 “u.s. senate unanimously approves the program management improvement and accountability act,” business wire (dec. 2016), accessed nov. 10, 2017, http://www.businesswire.com/news/home/20161201006499/en/u.s.-senate-unanimouslyapproves-program-management-improvement. articles integrated technologies of blockchain and biometrics based on wireless sensor network for library management meng-hsuan fu information technology and libraries | september 2020 https://doi.org/10.6017/ital.v39i3.11883 meng-hsuan fu (msfu@mail.shu.edu.tw) is assistant professor, shih hsin university (taiwan). © 2020. abstract the internet of things (iot) is built on a strong internet infrastructure and many wireless sensor devices. presently, radio frequency identification embedded (rfid-embedded) smart cards are ubiquitous, used for many things including student id cards, transportation cards, bank cards, prepaid cards, and citizenship cards. one example of places that require smart cards is libraries. each library, such as a university library, city library, local library, or community library, has its own card, and the user must bring the appropriate card to enter a library and borrow material. however, it is inconvenient to bring various cards to access different libraries. wireless infrastructure has been well developed, and iot devices are connected through this infrastructure. moreover, the development of biometric identification technologies has continued to advance. blockchain methodologies have been successfully adopted in various fields. this paper proposes the blockmetrics library based on integrated technologies using blockchain and finger-vein biometrics, which are adopted into a library collection management and access control system. the library collection is managed by image recognition, rfid, and wireless sensor technologies. in addition, a biometric system is connected to a library collection control system, enabling the borrowing procedure to consist of only two steps. first, the user uses a biometric recognition device for authentication and then performs a collection scan with the rfid devices. all the records are recorded in a personal borrowing blockchain, which is a peer-to-peer transfer system and permanent data storage. in addition, the user can check the status of his or her borrowed items across various libraries in the personal borrowing blockchain. the blockmetrics library is based on an integration of technologies that include blockchain, biometrics, and wireless sensor technologies to improve the smart library. introduction the internet of things (iot) connects individual objects together through their unique address or tag, based on sensor devices and a wireless network infrastructure. presently, “smart living” (a term that includes concepts such as the smart home, smart city, smart university, smart government, and smart transportation) is based on the iot, which plays a key role in achieving a convenient and secure living environment.
gartner, a data analytics company that presents the top ten strategic technology trends for the coming year at the end of each year, listed blockchain as one of the top ten in 2017, 2018, 2019, and 2020.1 the fact that blockchain has been named a top strategic technology trend for four consecutive years reflects sustained interest among technology experts and developers. in a blockchain, a block is the basic storage unit where data is saved and protected with cryptography and complex algorithms. peer-to-peer transfer is adopted when data or information is exchanged without the need for a third party. in other words, data is transferred directly from node to node or user to user thanks to the decentralized nature of the blockchain. in addition, a blockchain is authorized and maintained by all nodes in the same blockchain network. each node has an equal right (also known as equal weight) to access the blockchain and authorize new transactions. thus, all transactions are published and broadcast to all nodes, and content cannot be altered by a single node or a minority of nodes. additionally, transaction content is secured by cryptography and complex secure algorithms. therefore, transactions occur and are preserved within a fully secure and private network. in practice, blockchain has been applied in various fields including finance, medicine, academia, and logistics. blockchain has also been adopted for personal transaction records because of its privacy and security properties and because it offers immutable and permanent data storage. in this research, blockchain technologies are adopted to store the records of collections borrowed from various libraries in a personal borrowing blockchain.

table 1. definitions of key terms of blockchain

blockchain: a blockchain comprises many blocks. it is characterized by security, decentralization, immutability, distributed ledgers, a transparent log, and irreversible data storage.
block: a block is the basic unit in a blockchain. each block consists of a block header with a nonce, the previous block hash, a timestamp, and the merkle root, plus the transactions in the block body.
nonce: the counter of the algorithm; the hash value changes whenever the nonce is modified.
merkle root: a secure hash algorithm (sha) is used in the merkle root to transform data into a meaningless hash value.
transaction: each transaction is composed of an address, hash, index, and timestamp. all transactions are stored in blocks permanently.
hash: a secure hash algorithm (sha) transforms input data into meaningless output data, called a hash, consisting of letters and digits, in order to protect data content during transmission.
biometrics: the use of human physical characteristics, including finger veins, the iris, voice, and facial features, for recognition.
sensor network: a sensor is a small, portable node with a data-recording function and a power source. a sensor network is composed of many sensors linked by a communication infrastructure.
iot: the internet of things (iot) is a system that connects sensors and devices together over the internet. many iot applications have been adopted, such as the smart home, health care, and smart transportation.
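to make the terms in table 1 concrete, the following minimal python sketch shows how transactions can be hashed with sha-256, combined into a merkle root, and linked to the previous block through its hash. it is an illustrative toy only, not the blockmetrics implementation, and all names and values in it are invented for the example.

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    # a secure hash algorithm (sha-256) maps arbitrary input to a fixed-length digest
    return hashlib.sha256(data).hexdigest()

def merkle_root(tx_hashes) -> str:
    # pair transaction hashes and re-hash until a single root value remains
    if not tx_hashes:
        return sha256(b"")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256((a + b).encode()) for a, b in zip(level[0::2], level[1::2])]
    return level[0]

class Block:
    """toy block: transactions, a merkle root, and a link to the previous block."""
    def __init__(self, transactions, previous_hash):
        self.timestamp = int(time.time())
        self.transactions = transactions
        self.previous_hash = previous_hash  # chains this block to its predecessor
        self.merkle = merkle_root(
            [sha256(json.dumps(tx, sort_keys=True).encode()) for tx in transactions])
        self.nonce = 0

    def hash(self) -> str:
        header = f"{self.previous_hash}{self.merkle}{self.timestamp}{self.nonce}"
        return sha256(header.encode())

# example: a first block holding one hypothetical borrowing transaction
genesis = Block([{"user": "alice", "item": "isbn-0001", "action": "borrow"}], "0" * 64)
print(genesis.hash())
```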
although the iot and wireless networks are well developed, people still carry many different rfid-embedded cards, such as public transportation cards, credit cards, student cards, medical cards, identification cards, membership cards, or library cards. an rfid-embedded card is issued for each place or purpose, requiring the user to bring the appropriate cards to access the corresponding functions. in this study, the library is the setting to which blockchain technologies are applied, because currently each library has its own library card for entering the library and borrowing material. this implies that users may have to carry several library cards to visit a university library, a community library, and a district library on the same day. here, biometrics can be adopted to solve the problem of having to carry many access-control cards and manage various borrowing policies. in this study, the blockmetrics library is designed based on the technologies of blockchain and biometrics within the environment of a wireless sensor network with iot devices. borrowing records are transferred and stored through blockchain technologies, automatic library access control is managed by biometric identification, and the borrowing and returning of library materials are handled under a wireless sensor network with iot devices to create a convenient, efficient, and secure library environment. the key terms of blockchain and the related terms for biometrics, sensor networks, and the iot applied in this research are defined in table 1. related works blockchain technology nakamoto presented bitcoin as a peer-to-peer electronic cash system that uses blockchain technologies, which include smart contracts, cryptography, decentralization, and consensus through proof of work. because this electronic system is based on cryptography, a trusted third party is not required in the payment mechanism. additionally, peer-to-peer technology and a timestamp server are adopted, and each block is given a hash in serial order. this procedure solves the problem of double-spending during payment.2 in addition, proof of work is used in a decentralized system for authentication by most nodes in the blockchain network. each node has equal rights to compete to receive a block, and each node can vote to authenticate a new block.3 košt'ál et al. define proof of work (pow) as an asymmetric method with complex calculations whose difficulty is adjusted according to the problem-solving duration.4 however, pow has drawbacks, such as high power consumption and the fact that some users can control the blockchain if their share of the nodes in the same blockchain network reaches 51 percent.5 despite the possible presence of malicious parties, the information in a blockchain is difficult to modify because of the distributed ledger methodology, in which each node has the same copy of the ledger, making it difficult for a single node or a minority of nodes to change or destroy the stored data.6 a block is the basic unit in a blockchain; in other words, a blockchain is composed of connected blocks. one of the blockchain technologies is the distributed ledger, in which a ledger can be distributed to every node all over the world.7 each block is composed of a block header and a block body. the block header is a combination of the version, previous block hash, merkle root, timestamp, difficulty target, and nonce.
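the header fields just listed, together with the proof-of-work search described above, can be sketched as follows. the field widths follow the common bitcoin layout summarized in table 2, and the difficulty check is simplified to a leading-zero prefix, so this is a toy illustration under those assumptions rather than the article's system or a real mining implementation.

```python
import hashlib
import struct
import time

def pow_search(version, prev_hash, merkle_root, bits, difficulty_prefix="0000"):
    """brute-force the nonce until the double-sha-256 header hash meets a toy target."""
    timestamp = int(time.time())
    nonce = 0
    while True:
        # 4 + 32 + 32 + 4 + 4 + 4 bytes = the 80-byte header layout shown in table 2
        header = struct.pack("<I32s32sIII", version, prev_hash, merkle_root,
                             timestamp, bits, nonce)
        digest = hashlib.sha256(hashlib.sha256(header).digest()).hexdigest()
        if digest.startswith(difficulty_prefix):  # stand-in for the real difficulty check
            return nonce, digest
        nonce += 1

# example: "mine" a header whose previous hash and merkle root are all zero bytes
nonce, digest = pow_search(version=1, prev_hash=bytes(32),
                           merkle_root=bytes(32), bits=0x1d00ffff)
print(nonce, digest)
```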
the block size field occupies 4 bytes, the block header is 80 bytes in total, and the transaction counter takes between 1 and 9 bytes (see table 2).

table 2. block elements

block size: 4 bytes (size field)
block header (80 bytes): 4 bytes version; 32 bytes previous block hash; 32 bytes merkle root; 4 bytes timestamp; 4 bytes difficulty target; 4 bytes nonce
block body: 1-9 bytes transaction count, followed by the transactions

blockchain technologies have been applied to various fields including finance, art, hygiene, healthcare, and academic certificates. for example, in healthcare, user medical records are stored in a blockchain so users can check their health conditions and share them with their family members in advance. in business, blockchain is adopted in supply chain management for monitoring activities during goods production. in the academic field, certificates are permanently saved in a blockchain where users can retrieve them from their mobile devices and show them upon request.8 because blockchain is protected by cryptography and offers privacy, reliability, and decentralization, an increasing number of applications are beginning to adopt it. as an application for a library system, a blockchain, in combination with biometrics within a wireless infrastructure, can be adopted for personal borrowing records. library borrowing management each library has its own regulations. for example, the national central library (ncl) in taiwan has created reader service directions in which the general principles and rules for library card application, reader access to library materials, reference services, requests for library materials, violations, and supplementary provisions are clearly stated. according to the ncl reader service directions, citizens are required to present their national id card and foreigners are asked to present their passport to apply for a library card. users are allowed to access the library when they have a valid library card. those who have library cards but have forgotten to bring them can apply for a temporary library card to enter the library, but this is limited to three times.9 this rule is specific to the ncl; other libraries in the same country have their own regulations. another example is the taipei public library. citizens can apply for a library card using their citizen's id card, passport, educational certificate, or residence permit. a taipei citizen can apply for a family library card using their family certificate. users can borrow material and return it to any of the libraries in taipei city. however, these policies apply only to users who hold library cards issued by libraries in taipei city.10 as for university libraries, each library also has its own regulations. for instance, shih hsin university (shu) issues its own library card for access to its library. alumni are requested to present their id cards and a photo to apply for a library card in person. the number of items and their loan periods are clearly stated in the rules set by the shu library.11 again, the regulations are individually set by each university library. biometrics rfid-embedded smart cards such as student cards, transportation cards, and bank cards are widely used; however, they can be stolen, lost, or forgotten at home.
biometrics is becoming more widely used in access-control systems for homes, offices, buildings, government facilities, and libraries. for these systems, the fingerprint is one of the most commonly used biometrics. users place their finger on a read device, usually a touch panel. this method ensures a unique identity, is easy to use and widely accepted, boasts a high scan speed, and is difficult to falsify. however, its effectiveness is influenced by the age of the user and the presence of moisture, wounds, dust, or particles on the finger, in addition to the concern for hygiene because of the use of touch devices. face recognition has been used in various applications such as unlocking smart devices, performing security checks and community surveillance, and maintaining home security. this method of biometric identification is convenient, widely accepted, difficult to falsify, and can be applied without the awareness of the person. however, limitations to face recognition include errors that can occur due to lighting, facial expression, and cosmetics. also, privacy is an issue in face recognition because it may take place at a distance without the user’s consent. another form of biometric identification uses the iris as an inner biometric indicator because th e iris is unique for each person. nevertheless, this method is also prone to errors that can be caused by bad lighting and the possible presence of diseases such as diabetes or glaucoma. devices used for iris recognition are expensive, and thus rarely adopted in biometrics.12 speech recognition is used for equipment control such as a smart device switch, however, it can be affected by noise, physical conditions of the user, or weather. vein recognition using finger or palm veins is becoming more prevalent as a form of biometric identification for banks or access control but can be limited by the possible presence of bruises or a lower temperature. however, vein recognition ensures a unique identity, is easy to use, convenient, accurate, and widely accepted, thus, many businesses are adopting vein recognition for various usages. to summarize, biometric identification is convenient, reduces the error rate in recognition, and is difficult to falsify. therefore, biometric identification is suitable for access control. blockmetrics library the blockmetrics library is based on the integration of blockchain and biometric technologies in a wireless sensor network with iot devices. figure 1 shows the blockmetrics library architecture with its bottom-up structure consisting of five layers: hardware, software, internet, transfer and security, and application. all components are sequentially described in detail from the bottom to the top layers. in the hardware layer, sensor nodes are physically located on library collection shelves, entrance gates, and relevant equipment to be further connected with the upper layers. rfid tags are attached to each item in the library, including books, audio resources, and videos. tag information is read and transferred by rfid readers. the biometric devices used in this study include fingerprint readers, palm and finger-vein identifiers, and face or iris recognition devices for biometric authentication when users enter libraries or borrow collections. all images including action images, collection images, and surveillance images are recorded with cameras. the ground surveillance, library collection recognition, image processing, and user identification are manipulated by graphics processing units. 
touch panels are used for typing or searching for information and there is a particular process for user registration. for general input and output of information, i/o devices include speakers, microphones, keyboards, and monitors. the entrance gate is connected to biometric devices and recognition systems for automatic access control. microprocessors and servers, which make up the core of the hardware, handle all the functions that run in the operating system. data and programs are run and securely saved on a large information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 6 memory drive. data transmission occurs through the wireless data collector, and data collection and transfer in the library are based on a wireless environment. in the blockmetrics library, a library collection database is used to store and maintain all the library material information in a local library for backup usage, and a blockchain is used to record personal borrowing and returning history. figure 1. blockmetrics library architecture in the software layer, an open-source word processor (such as writer, impress, or calc provided by libreoffice) is used to record library collections and handle library affairs. biometric recognition identifies a user’s biological features collected from biometric devices such as a fingervein recognition device, which is adopted in this research. all images and videos include ground information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 7 and entrance surveillance and library material borrowing and returning are recorded by cameras. all images and videos are operated by and processed with video-processing software. the data of the images, videos, personal information, and library collections is managed and saved through an image and file management system as well as a database management system. the software programs associated with creating, modifying, and maintaining library processes are written with open-source programming codes, in particular, python and r, which were scored as two of the top ten programming languages by ieee spectrum in 2019.13 there are various functions saved as packages that are free to download and can be modified and reproduced into a customized program for specific purposes. the hardware houses the cpu, which runs the general library operations, and software programs are maintained by the operating system and management information system. library stock management, collection search, personal registration, and other library-related functions can be designed and developed through app inventor. each borrow and return record is connected with the personal identification recognized from the finger-vein reader and recognition system and is saved as a transaction. the transactions that take place in a specific period are saved in a block that can be connected to another block to form a personal borrowing blockchain. the internet technology layer is built upon the hardware and software structure. the main purpose of internet technologies is to connect the equipment and devices with the internet, in which the internet plays the role of an intermediary for all devices communicating and cooperating together in the library. in the internet technology layer, bluetooth connects devices such as earphones, speakers, and audio guidance within short distances. 
files are exchanged between smart devices, including smartphones, tablets, or ipads, through near-field communication, which is a contact-less sensor device. rfid is adopted for collection borrowing and returning services in the library. because fiber-optic cables have been ubiquitously planted within infrastructure with the development of smart cities, most libraries have also been built with them. users or vehicles are more easily and more accurately located by the global positioning system (gps), which also assists with image recognition when a material is taken from the shelves. sensors transfer the sensing data to the relative devices for recording or processing under the infrastructure of the wireless sensor network. the library is currently built with a wi-fi environment, but li-fi is one of the future trends that involve creating a wireless environment just with light. mobile devices operate under wireless communications and most countries provide 4g and 4g+ with some supporting even 5g. the internet technology layer is the tools provider for intercommunication among devices. data security and transmission reliability are extremely important issues when various equipment is linked together and connected to the internet. the user interface is the bridge between the user and the devices. in other words, the user gives commands to the devices or software through a user interface or app. biometric recognition devices, rfid readers, and entrance control equipment are connected via the internet in this study. the devices send the information to the corresponding devices for specific purposes in a specified order. collected data such as private user identification are secured by cryptography utilized in blockchain technology. the finger-vein identification used as personal identification is combined with the borrow and return records, stored as transactions, and secured under a secure hash algorithm before being saved into a blockchain. all data and personal identification are transferred under the corresponding secure methodologies. information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 8 in the blockmetrics library, self-check-in and self-checkout rely mostly on rfid technology and finger-vein biometrics. the borrow and return records are stored in a personal borrowing blockchain, where records are saved in the blockchains of user and library, biometrics system, and library servers. entrance control is automatically managed by finger-vein recognition. library stock management is particularly based on image assistance and rfid technology. new user registration is performed only through a few identification questions and finger-vein characteristics extraction. the blockmetrics library is without a circulation desk environment and has an automated borrow and return mechanism through a single sign at the entrance and exit. the five layers in the blockmetrics library architecture communicate with each other such that operations are inseparably related. the blockmetrics library scenario is described in the next section. scenario in this section, the scenarios for registration, entry, and material borrowing and returning in the blockmetrics library will be described in detail. in figures 2 and 3, the user side indicates the actual user actions and is represented as solid lines and the background shows mostly background operations and is indicated as dotted lines. 
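before walking through the figures, a minimal sketch of how registration and later verification against stored finger-vein templates might be modeled is shown below. feature extraction is treated as a black box that yields a numeric vector; the in-memory store, function names, and threshold are assumptions made for illustration and are not part of the blockmetrics design.

```python
import math

# hypothetical in-memory template store; a real deployment would use the
# biometrics database described in the architecture
templates = {}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def register(user_id, finger_vein_features):
    # one-time registration: keep the extracted feature vector for this user
    templates[user_id] = finger_vein_features

def verify(finger_vein_features, threshold=0.95):
    # entrance control: return the best-matching user id, or None to keep the gate closed
    best_id, best_score = None, 0.0
    for user_id, stored in templates.items():
        score = cosine_similarity(finger_vein_features, stored)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id if best_score >= threshold else None

# example: register a user and verify a slightly noisy probe vector
register("alice", [0.12, 0.80, 0.33, 0.56])
print(verify([0.13, 0.79, 0.35, 0.55]))  # expected: "alice"
```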
in figure 2, when a new user comes into the blockmetrics library, the registration procedure starts with a biometric pattern extraction and recognition of the user. finger-vein authentication is selected as the personal biometrics for entrance and material borrowing. on the user side, registration is completed with only two steps. the first is finger-vein extraction and the second is to simply provide personal information. the biometric recognition data is processed and stored in the appropriate database, which is linked to the personal identification management system. personal information is secured through the cryptography used in blockchain technology, thus, all information is securely stored. the registration procedure is performed only once at the first entry, and afterward all registered users can enter the blockmetrics library using finger-vein authentication. biometric recognition proceeds with a biometrics database that verifies user identity followed by verification results sent to the entrance control management for entrance guarding. the entrance is automatically controlled because the results from the biometric recognition step are sent as the rules for entrance control. users will be permitted to enter the library when they pass the biometrics recognition step. users do not have to bring their library cards to enter or borrow material, increasing convenience and decreasing identity infringement when library cards get lost. figure 3 shows the scenario of library material borrowing and returning. on the user side, library material borrowing consists of four simple steps performed by the user: 1) retrieving items, 2) authenticating with the finger-vein recognition device, 3) placing items on the rfid reader, and 4) exiting the library. when the user removes a book from the shelf, an infrared detector is triggered, recognizing that a book was removed from the shelf. then, image recognition identifies the specific book and the book’s status is marked as charged to the user in the stock database. if the user wants to leave the library without borrowing anything, the user just scans their finger with the finger-vein device to open the entrance gate. if the user wants to borrow library materials such as books or videos, the borrowing procedure is quickly completed after finger-vein scanning and placing all material including books and videos under the rfid read area. in the background, the user’s recognition results from the finger-vein scan are saved in the biometrics database, which is connected to the blockchain. when the library materials are placed together in the rfid read area, all the tags are read at once while the materials’ statuses in the information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 9 database are updated. user information and material borrowing information are linked and saved as transactions that are stored in the personal borrowing blockchain. figure 2. blockmetrics library scenario—registration and entry information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 10 figure 3. blockmetrics library scenario – borrowing and returning to return library material, the user only needs to put the library materials in the specific area with the rfid reader and the return procedure is completed. the rfid tags of returned materials are read and recorded and their status in the stock database is updated. 
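as a rough illustration of how a borrow event could become a transaction in the personal borrowing blockchain, the sketch below links the verified user, the rfid tags read in one pass, and the hash of the previous record. the field names and the in-memory chain are assumptions for the example; the article does not specify a transaction format.

```python
import hashlib
import json
import time

def record_borrow(user_id, rfid_tags, library, chain, stock):
    """append one borrow event to a toy personal borrowing blockchain.

    user_id comes from finger-vein verification and rfid_tags from a single
    bulk read at the rfid area; the field names here are illustrative only.
    """
    for tag in rfid_tags:
        stock[tag] = f"charged:{user_id}"  # update the local stock database
    transaction = {
        "index": len(chain),
        "timestamp": int(time.time()),
        "library": library,
        "user": user_id,
        "items": rfid_tags,
        "action": "borrow",
        "previous_hash": chain[-1]["hash"] if chain else "0" * 64,
    }
    payload = json.dumps(transaction, sort_keys=True).encode()
    transaction["hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(transaction)
    return transaction

# example: one user borrows two items from a hypothetical branch library
chain, stock = [], {"tag-001": "on shelf", "tag-002": "on shelf"}
record_borrow("alice", ["tag-001", "tag-002"], "main library", chain, stock)
print(chain[-1]["hash"], stock["tag-001"])
```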
personal borrow and return records are saved as transactions and stored in the personal borrowing blockchain as well. limitations some biometric technologies, such as facial recognition or fingerprint recognition systems, have already been adopted by libraries. these tools have increased the efficiency of access and borrowing procedures. however, all the records include some personal information (e.g., fingerprints, historical borrowing records, logs of library access) that is still stored in the individual library's database. the blockchain model may not be suitable for every current library system because the database design of each library is unknown. at present, library classification systems are set by each library individually. therefore, integrating library information among national or international libraries will be a huge task. thus, how to establish general regulations for all libraries to develop and manage library information will need additional research. after that, the information management system should be designed and built by collecting diverse comments from all library managers. the work should be completed by interdisciplinary experts in library management, information engineering, biometrics system design, and data management. the costs may include management committees, collection coding design, system development, hardware layout, and related training plans. also, there may be unpredictable privacy issues that may not become known until the system is in practical operation. lastly, some users will need an adaptation period when new technologies are implemented; its duration can depend on how smooth the interface design is, whether the system is easy and clear to use, and what benefits the technologies bring to users' lives. in summary, the limitations are: 1) integrating library information such as stock data and serial numbers, 2) establishing general regulations, 3) creating a consistent library management system, 4) the cost of the system, 5) the potential for privacy breaches, and 6) library patron resistance or reluctance to use the technology. conclusion in this research, the blockmetrics library is designed under a wireless sensor network infrastructure combined with blockchain, biometric, iot, and rfid technologies. the library access control system is based on finger-vein biometric recognition, in which users register their finger-vein information through biometric devices and input personal information via various i/o devices. thus, automatic and secure library access control is achieved through biometric recognition. additionally, image recognition, gps, and rfid are adopted in library collection management, providing a simplified way to borrow and return library material. blockchain technologies are utilized to record the personal borrowing history of collections from various libraries in a personal borrowing blockchain, where records are permanently stored. users can clearly see their borrowing status through their own blockchain and manage their borrowing information through an application. to summarize, users can enter the library with finger-vein recognition instead of a specific library card. then, if they would like to check out library material, the user can retrieve the items, pass them through the rfid reader, scan their finger vein, and go.
the blockmetrics library is designed for convenience and security, which are achieved by combining a wireless sensor network with the integration of blockchain and biometric technologies. this method eliminates the inconvenience of having to bring many library cards, increases the efficiency of collection borrowing procedures, and simplifies the management of collection borrowing from different libraries. adoption of these biometric technologies is still in its early stages. some libraries have begun using different tools, but few libraries have adopted all of them. it simplifies both accessing and borrowing procedures, and all the records are still stored in a particular library’s database for private access only. the development of the blockmetrics library will help to integrate biometric technologies and blockchain under the infrastructure of wireless sensor network to maintain library-accessing recognition, library collections, library users, borrowing records crossing libraries to raise the user convenience and satisfactions, library management efficiency, and library security. in the near future, the library transaction formula in a blockchain will be developed for collection borrowing storage. the library collection serial numbers will be considered in information management system as well. information technology and libraries september 2020 integrated technologies of blockchain and biometrics | meng-hsuan fu 12 endnotes 1 gartner, “smart with gartner, gartner top 10 strategic technology trends for 2020,” https://www.gartner.com/smarterwithgartner/gartner-top-10-strategic-technology-trendsfor-2020/; gartner, “smart with gartner, gartner top 10 strategic technology trends for 2019,” https://www.gartner.com/smarterwithgartner/gartner-top-10-strategic-technologytrends-for-2019/; gartner, “smart with gartner, gartner top 10 strategic technology trends for 2018,” https://www.gartner.com/smarterwithgartner/gartner-top-10-strategictechnology-trends-for-2018/; gartner, “smart with gartner, gartner top 10 strategic technology trends for 2017,” https://www.gartner.com/smarterwithgartner/gartners-top10-technology-trends-2017/. 2 satoshi nakamoto, “bitcoin: a peer-to-peer electronic cash system” (2009), https://bitcoin.org/bitcoin.pdf. 3 david shrier, weige wu, and alex pentland, “blockchain & infrastructure (identity, data security),” connection science & engineering, massachusetts institute of technology, 2016. 4 kristián košt’ál et al., “on transition between pow and pos,” international symposium elmar (2018). 5 thomas p. keenan, “alice in blockchains: surprising security pitfalls in pow and pos blockchain systems,” 15th annual conference on privacy, security and trust (2017); takeshi ogawa, hayato kima, and noriharu miyaho, “proposal of proof-of-lucky-id (pol) to solve the problems of pow and pos,” ieee international conference on internet of things and ieee green computing and communications and ieee cyber, physical and social computing and ieee smart data (2018). 6 quoc khanh nguyen, quang vang dang, “blockchain technology for the advancement of the future,” 4th international conference on green technology and sustainable development, (2018); nir kshetri and jeffrey voas, “blockchain in developing countries,” it professional, 20, no.2 (2018): 11-14. 7 shangping wang, yinglong zhang, and yaling zhang, “a blockchain-based framework for data sharing with fine-grained access control in decentralized storage systems,” 2018 ieee access, 6 (2018):38437-38450. 
8 pinyaphat tasatanattakool and chian techapanupreeda, “blockchain: challenges and applications,” 2018 international conference on information networking (icoin), (2018), https://doi.org/10.1109/icoin.2018.8343163; abderahman rejeb, john g. keogh and horst treiblmaier, “leveraging the internet of things and blockchain technology in supply chain management,” future internet, 11, no. 7 (2019): 161; stanislaw p. stawicki, michael s. firstenberg, and thomas j. papadimos, “what’s new in academic medicine? blockchain technology in health-care: bigger, better, fairer, faster, and leaner,” international journal of academic medicine, 4, no. 1 (2018): 1-11; guang chen et al., “exploring blockchain technology and its potential applications for education,” smart learning environments, 5, no. 1 (2018), https://doi.org/10.1186/s40561-017-0050-x; asma khatoon, “a blockchain-based smart contract system for healthcare management,” electronics, 9, no. 1 (2020): 94. 9 national central library, “national central library reader service directions,” november 11, 2016, https://enwww.ncl.edu.tw/content_26.html. 10 taipei public library, “regulation of circulation services,” june 13, 2018, https://english.tpml.gov.taipei/cp.aspx?n=af5cca6fc258864e. 11 shih hsin university library, “library regulations, access to shu libraries,” http://lib.shu.edu.tw/e_orders_enter.htm; shih hsin university library, “library regulations, borrowing policies,” accessed september 25, 2019, http://lib.shu.edu.tw/e_orders_borrows.htm. 12 sudhinder singh chowhan and ganeshchandra shinde, “iris biometrics recognition application in security management,” 2008 congress on image and signal processing. 13 stephen cass, “the top programming languages 2019,” ieee spectrum (2019), https://spectrum.ieee.org/computing/software/the-top-programming-languages-2019. everyone’s invited: a website usability study involving multiple library stakeholders elena azadbakht, john blair, and lisa jones information technology and libraries | december 2017 34 elena azadbakht (elena.azadbakht@usm.edu) is health and nursing librarian and assistant professor, john blair (john.blair@usm.edu) is web services coordinator, and lisa jones (lisa.r.jones@usm.edu) is head of finance and information technology, university of southern mississippi, hattiesburg, mississippi. abstract this article describes a usability study of the university of southern mississippi libraries website conducted in early 2016. the study involved six participants from each of four key user groups— undergraduate students, graduate students, faculty, and library employees—and consisted of six typical library search tasks, such as finding a book and an article on a topic, locating a journal by title, and looking up hours of operation.
library employees and graduate students completed the study’s tasks most successfully, whereas undergraduate students performed relatively simple searches and relied on the libraries’ discovery tool, primo. the study’s results displayed several problematic features that affected each user group, including library employees. these results increased internal buy-in for usability-related changes to the library website in a later redesign. introduction within the last decade, usability testing has become a common way for libraries to assess their websites. eager to gain a better understanding of how users experience our website, we assembled a two-person team and conducted the first usability study of the university of southern mississippi libraries website in february 2016. the web advisory committee—which is tasked with developing, maintaining, and enhancing the libraries’ online presence—wanted to determine if the content on the website was organized in a way that made sense to users and facilitated the efficient use of the libraries’ online resources. our usability study involved six participants from each of the following library user groups: undergraduate students, graduate students, faculty, and library employees. student and faculty participants represented several academic disciplines and departments. all of the library employees involved in the study work in public-facing roles. the web advisory committee and libraries’ administration wanted to know how each of these groups differ in their website use and whether they have difficulty with the same architecture or features. usability testing helped illuminate which aspects of the website’s design might be hindering users from accomplishing key tasks, thereby identifying where and how improvement needed to be made. we included library employees in this study to compare their approach to the website to that of other users in the mailto:elena.azadbakht@usm.edu mailto:john.blair@usm.edu mailto:lisa.r.jones@usm.edu everyone’s invited | azadbakht, blair, and jones 35 https://doi.org/10.6017/ital.v36i4.9959 hope of increasing internal stakeholders’ buy-in for recommendations resulting from this study. this article will discuss the usability study’s design, results, and recommendations as well as the implications of the study’s findings for similarly situated academic libraries. we will give special consideration to how the behavior of library employees compared to that of other groups. literature review the literature on library-website user experience and usability is extensive. in 2007, blummer conducted a literature review of research related to academic-library websites, including usability studies. her article provides an overview of the goals and outcomes of early library-website usability studies. 1 more recent articles focus on a portion or aspect of a library’s website such as the homepage, federated search or discovery tool, or subject guides. fagan published an article in 2010 that reviews user studies of faceted browsing and outlines several best practices for designing studies that focus on next-generation catalogs or discovery tools. 2 other library-website studies have reported on the habits of user groups, with undergraduates being the most commonly studied constituent group. emde, morris, and claassen-wilson observed university of kansas faculty and graduate students’ use of the library website, which had been recently redesigned, including a new federated search tool. 
3 many of the study’s participants gravitated toward the subject-specific resources they were familiar with and either missed or avoided using the website’s new features. when asked for their opinions on the federated search tool, several participants said that while it was not a tool they saw themselves using, they did see how it might be a helpful for undergraduate students who were still new to research. the researchers also provided the participants with an article citation and asked them to locate it using the using the library’s website or online resources. while half the participants did use the website’s “e-journals” link, others were less successful. some who had the most difficulty “search[ed] for the journal title in a search box that was set up to search database titles.” 4 this led emde, morris, and claassen-wilson to observe that “locating journal articles from known citations is a difficult concept even for some advanced researchers.” turner’s 2011 article describes the result of a usability study at syracuse university library that included both students and library staff. participants were asked to start at the library’s homepage and complete five tasks designed to emulate the types of searches a typical library user might perform, such as finding a specific book, a multimedia item, an article in the journal nature, and primary sources pertaining to a historic event. 5 when asked to find toni morrison’s beloved, most staff members used the library’s traditional online catalog whereas students almost always began their searches with the federated search tool located on the homepage. participants of both types were less successful at locating a primary source, although this task highlighted key differences in each groups’ approach to searching the library website. since library staff were more familiar than students with the library’s collections and online search tools, they relied more on facets and limiters to narrow their searches, and some even began their searches by navigating to the library’s webpage for special collections. information technology and libraries |december 2017 36 library staff tended to be more persistent; draw upon their greater knowledge of the library’s collections, website, and search tools; and use special syntax in their searchers, like inverting an author’s first and last names. “library staff took more time, on average, to locate materials,” writes turner, because of their “interest in trying alternative strategies.” 6 students, on the other hand, usually included more detail than necessary in their search queries (such as adding a word related to the format they were searching for after their keywords) and could not always differentiate various types of catalog records, for example, the record for a book review and the record for the book itself. turner concludes that the students’ mental models for searching online and their experiences with other web-search environments influence their expectations of how library search tools work and that library-website design should take these mental models into consideration. research on the search behaviors of students versus more experienced researchers or subject experts also has implications for library website design. two recent articles explore the different mental models or mindsets students bring to a search. the students in asher and duke’s 2012 study “generally treated all search boxes as the equivalent of a google search box” and used very simple keyword searches. 
7 this tracked with holman’s 2010 study, which likewise found that the students she observed relied on simple search strategies and did not understand how search interfaces and systems are structured. 8 methods our research team consisted of the libraries’ health and nursing librarian and the web services coordinator. we worked closely with the head of finance and information technology in designing and running the usability study. a two-week period in mid-february 2016 was chosen for usability testing to avoid losing potential participants to midterms or spring break. we posted a call for participants to two university discussion lists, on the libraries website, and on social media (facebook and twitter). we also reached out directly to faculty in academic departments we regularly work with and emailed library employees directly. we directed nonlibrary participants to a web form on the libraries website to provide their name, contact information, university affiliation/class standing, and availability. the health and nursing librarian followed up with and scheduled participants on the basis of their availability. each student participant received a ten-dollar print card and each faculty participant received a ten-dollar starbucks gift card. to record the testing sessions, we needed a free or low-cost software option. since the libraries already had a subscription to screencast-o-matic to develop video tutorials, and the tool allows for simultaneous screen, audio, and video capture, so we decided to use it to record all testing sessions. we also used a spare laptop with an embedded camera and microphone. the health and nursing librarian served as both facilitator and note-taker for most usability testing sessions. participants were given six tasks to complete. we encouraged participants to everyone’s invited | azadbakht, blair, and jones 37 https://doi.org/10.6017/ital.v36i4.9959 narrate as they completed each task. the sessions began with simple, secondary navigational questions like the following: • how late is our main library open on a typical monday night? • how could you contact a librarian for help? • where would you find more information about services offered by the library? next, we asked the participants to complete tasks designed to assess their ability to search for specific library resources and to illuminate any difficulty users might have navigating the website in the process. each of the three tasks focused on a particular library-resource type, including books, articles, and journals: • find a book about rabbits. • find an article about rabbits. • check to see if we have a subscription/access to a journal called nature. after the usability testing was complete, we reviewed the recordings and notes and coded them. for each task, we calculated time to completion and documented the various paths participants took to answer each question, noting any issues they encountered. we also compared the four user groups in our analysis. limitations although we controlled for user type (undergraduate, graduate, faculty, or library employee) in the recruitment of study participants, we did not screen by academic discipline. doing so would have hindered our team’s ability to include enough graduate students and faculty members in the study, as nearly all the volunteers from these two groups were from humanities or social science fields. 
the results might have differed slightly had the study successfully managed to include more faculty from the so-called hard sciences and allied health fields. additionally, the order in which we asked participants to attempt the tasks might have affected how they approached some of the later tasks. if a participant chose to search for a book using the primo discovery tool, for example, they might be more inclined to use it to complete the next task (find an article) rather than navigate to a different online resource or tool. despite these limitations, usability testing has helped improve the website in key ways. we plan to correct for these limitations in future studies. results every group included a participant who failed to complete at least one of the six tasks. an adequate answer to each of the study’s six tasks can be found within one or two pages/clicks from the libraries homepage (figure 1). the average distance to a solution remained at about two page loads across all of the study’s participants, despite a few individual “website safaris.” information technology and libraries |december 2017 38 figure 1. university of southern mississippi libraries’ homepage. graduate students tended to complete tasks the quickest and were generally as successful as library employees. they preferred to use primo for finding books but tended to favor the list of scholarly databases on the “articles & databases” page to find articles and journals. undergraduates were the second fastest group, but many struggled to complete one or more of the six tasks. they had the most trouble finding books and locating the journal by title. undergraduates generally performed simple searches and had trouble recovering from missteps. they were heavy users of primo, relying on the discovery tool more than any other group. the other two user groups, faculty and library employees, were slower at completing tasks. of the two, faculty took the longest to complete any task and failed to complete tasks at a similar rate as undergraduates. likewise, this group favored primo nearly as often. in contrast, library employees took almost as long as faculty to complete tasks but were much more successful. as a group, library employees demonstrated the different paths users could take to complete each task but favored those paths they identified as the “preferred” method for finding an item or resource over the fastest route. everyone’s invited | azadbakht, blair, and jones 39 https://doi.org/10.6017/ital.v36i4.9959 the majority of study participants across all user groups had little trouble with the first three tasks. although most participants favored the less direct path to the libraries’ hours—missing the direct link at the top of the homepage (figure 2)—they spent relatively little time on this task. likewise, virtually all participants took note of the links to our “ask-a-librarian” and “services” pages located in our homepage’s main navigation menu. this portion of the usability study alerted us to the need for a more prominent display of our opening hours on the homepage. figure 2. link to “hours” from the homepage. of the second set of tasks—find a book, find an article, and determine if we have access to nature—the first and last proved the most challenging for participants. one undergraduate was unable to complete the book task, and one faculty member took nearly eight minutes to do so—the longest time to completion of any task by any user in the study. primo was the most preferred method for finding a book. 
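as described in the methods, each recorded session was coded for time to completion, the path taken, and success. a minimal sketch of how such coded observations might be summarized by user group is shown below; the field names and toy values are assumptions for illustration, not the authors' actual data or coding scheme.

```python
from statistics import mean
from collections import defaultdict

# toy coded observations; the fields and values are invented, not the authors' data
sessions = [
    {"group": "undergraduate", "task": "find a book", "seconds": 95, "completed": True, "pages": 2},
    {"group": "faculty", "task": "find a book", "seconds": 470, "completed": True, "pages": 4},
    {"group": "library employee", "task": "find nature", "seconds": 210, "completed": False, "pages": 3},
]

by_group = defaultdict(list)
for s in sessions:
    by_group[s["group"]].append(s)

for group, rows in by_group.items():
    print(group,
          f"avg time {mean(r['seconds'] for r in rows):.0f}s",
          f"avg pages {mean(r['pages'] for r in rows):.1f}",
          f"completion {mean(1 if r['completed'] else 0 for r in rows):.0%}")
```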
although an option for searching our classic catalog (which uses innovative interfaces’ millennium integrated library system) is contained within a search widget on the homepage, primo is the default search option and therefore users’ default choice. interestingly, even after statements from some faculty such as “i don’t love primo,” “primo isn’t the best,” and “the [classic catalog] is better,” these participants proceeded to use primo to find a book. library employees were evenly split between primo and classic catalog. one undergraduate student, graduate student, and library employee were unable to determine whether we have access to nature. this task was the most time consuming for library employees because there are multiple ways to approach this question and library employees tended to favor the most consistently successful yet most time-consuming options (e.g., searching within the classic catalog). lacking a clear option in the main navigation bar, the most popular path started information technology and libraries |december 2017 40 with our “articles & databases” page, but the answer was most often successfully found using primo. several participants tried using the “search for databases” search box on the “articles & databases” page, which yielded no results because it searches only our database list. the search widget on the homepage that includes primo has an option for searching e-journals by title, as shown in figure 3. however, nearly all nonlibrary employees missed this feature. participants from both the undergraduate and graduate student user groups had trouble with this task, including those who were ultimately successful. unfortunately, many of the undergraduates could not differentiate a journal from an article, and while graduate students were aware of the distinction, a few indicated that they were not used to the idea of finding articles from a specific journal. figure 3. e-journals search tab. when it came to finding articles, undergraduates, as well as several faculty and a few library employees, gravitated toward primo. others, particularly graduate students and library employees, opted to search a specific database—most often academic search premier or jstor. however, those who used primo to answer this question arrived at an answer two to three times faster because of the discovery tool’s accessibility from a search widget on the homepage. regardless of the tool or resource they used, most participants found a sufficient result or two. common breakdowns despite the clear label “search for databases,” at least one participant from each user group, including library employees, attempted to enter a book title, journal name, or keyword into the libguides’ database search tool on our “articles & databases” page (figure 4). some participants attempted this repeatedly despite getting no results. others did not try a search but stated, with everyone’s invited | azadbakht, blair, and jones 41 https://doi.org/10.6017/ital.v36i4.9959 confidence, that entering a journal, book, or article title into the “search for databases” field would yield a relevant result. a few participants also attempted this with the search box on our research guides (libguides) page, which searches only within the content of the libguides themselves. across all groups, when not starting at the homepage, many participants had difficulty finding books because no clear menu option exists for finding books like it does for articles (our “articles & databases” page). 
this was difficulty was compounded by many participants struggling to return to the libraries homepage from within the website’s subpages. those participants who were able to navigate back to the homepage were reminded of the primo search box located there and used it to search for books. figure 4. “search for databases” box on the “articles & databases” page. another breakdown was the “help & faq” page (figure 5). participants who turned there for help at any point in the study spent a relatively long time trying to find a usable answer and often ended up more confused than before. in fact, only one in three participants managed to use “help & faq” successfully because the faq consists of many questions with answers on many different pages and subpages. this portion of the website had not been updated in several years and therefore the questions were not listed in order of frequency. information technology and libraries |december 2017 42 figure 5. the answer to the “how do i find books?” faq item leads to several subpages. discussion using the results of the study, we made several recommendations to the libraries’ web advisory committee and administration: (1) display our hours of operation on the homepage; (2) remove the search boxes from the “articles & databases” and “research guides” pages; (3) condense the “help & faq” pages; and (4) create a “find books” option on the homepage. all of these recommendations were taken into account during a recent redesign of the website. we also considered each user group’s performance and its implications for website design as well as instruction and outreach efforts. first, our team suggested that the current day’s hours of operation be featured prominently on the website’s front page. despite “how late is our main library open on a typical monday night?” being one of two tasks that had a 100 percent completion rate, this change is easy to make, adds convenience, and addresses a long-voiced complaint. several participants expressed a desire to see this change implemented. moreover, this is something many of our peer libraries provide on their websites. the team’s next recommendation was to remove the “find databases by title” search box from the “article & databases” page. during the study, participants who had a particular database in mind opted to navigate directly to that database rather than search for it. another such search box exists on the “research guides” page. although most of the participants did not encounter this search box during the study, those that did also mistook it for a general search tool. participants everyone’s invited | azadbakht, blair, and jones 43 https://doi.org/10.6017/ital.v36i4.9959 from all groups, especially undergraduate students, assumed that any search box on the libraries’ website was designed to search for and within resources like article databases and the online catalog, regardless of how the search box was labeled. given our findings, libraries with similar search boxes might also consider removing these from their websites. another recommended change was to condense the “help & faq” section of the website considerably. the “help & faq” section was too large and unwieldy for participants to use successfully without becoming visibly frustrated, defeating its purpose. moreover, google analytics showed that only nine of the more than one hundred “help & faq” pages were used with any regularity. going forward, we will work to identify the roughly ten most important questions to feature in this section. 
the final major recommendation was to consider adding a top-level menu item called “find books” that would provide users with a means to escape the depths of the site and direct them to primo or the classic catalog. when participants would get stuck on the book-finding task, they looked for a parallel to the “articles & databases” menu option. a “getting started” page or libguide could take this idea a step further by also including brief, straightforward instructions on finding articles and journals by title. in effect, this option would be another way to condense and reinvent some of the topics originally addressed in the “help & faq” pages. comparing each user group’s average performance helped illuminate the strengths and weaknesses of the website’s design. we suspect that graduate students were the fastest and nearly most successful group because they are early in their academic careers and doing a great deal of their own research (as compared to faculty). many of them are also responsible for teaching introductory courses and are working closely with first-year students who are just learning how to do research. faculty, because their research tends to be on narrower topics, were familiar with the specific resources and tools they use in their work but were less able to efficiently navigate the parts of the website with which they have less experience. moreover, individual faculty varied widely in their comfort level with technology, and this affected their ability to complete certain tasks. conclusion the results of our website usability study echo those found elsewhere in the literature. students approach library search interfaces as if they were google and generally conduct very simple searches. without knowledge of the libraries’ digital environment and without the research skills library employees possess, undergraduates in our study tended to favor the most direct route to the answer—if they could identify it. this group had the most trouble with library and academic terminology or concepts like the difference between an article and a journal. though not as quick as the graduate students, undergraduates completed tasks swiftly, mainly becau se of their reliance on the primo discovery tool. however, undergraduate students were less able to recover from missteps; more of them confused the “find databases by title” search tool for an article search tool than participants from any other group. since undergraduates compose the bulk of our user information technology and libraries |december 2017 44 base and are the least experienced researchers, we decided to focus our redesign on solutions that will help them use the website more easily. although all of the library employees in our study work in public-facing roles, not all of them provide regular research help or teach information literacy. since most of them are very familiar with our website and online resources, they approached the tasks more methodically and thoroughly than other participants. library employees tended to choose the search strategy or path to discovery that would yield the highest-quality result or they would demonstrate multiple ways of completing a given task, including any necessary workarounds. the inclusion of library employees yielded the most powerful tool in our research team’s arsenal. 
holding this group's "correct" methods side by side with equally valid methods of discovery helped shake loose rigid thinking, and the fact that some library employees were unable to complete certain tasks shocked all parties in attendance when we presented our findings to stakeholders. any potential argument that student, faculty, and staff missteps were the result of improper instruction and not of a usability issue was countered by evidence that the same missteps were sometimes made by library staff. not only was this an eye-opening revelation to our entire staff, it served as the evidence our team needed to break through entrenched resistance to making any changes. we were met with almost instant, even enthusiastic, buy-in to our redesign recommendations from the libraries' administration. therefore, we highly recommend that other academic libraries consider including library staff as participants in their website usability studies.
editorial: i inhaled john f. helmer information technology and libraries | june 2000
this editorial introduces the third special issue of information technology and libraries dedicated to library consortia, and the second primarily aimed at surveying consortial activities outside the united states.1 the concept of a special consortial issue began in 1997 as an outgrowth of a sporadic and wide-ranging discussion with jim kopp, editor of ital 1996-98. at the time, jim and i were involved in the creation and maturation of the orbis consortium in oregon and washington.
jim was a member and later chair of the governing council, and i was the chief volunteer staff person, finding myself increasingly absorbed by consortial work. our discussions lasted more than a year and were sustained by many e-mail messages and several enjoyable conversations over bottles of nut brown ale. in the mid-1990s it seemed obvious that we were witnessing the beginning of a renaissance in library consortia. consortia had been around for many years but now established groups were showing renewed vigor and new groups seemed to be forming every day. why was this happening? what were all these consortia doing? jim and i discussed these questions and speculated on future roles for library consortia and their impact on member libraries. library consortia seemed an ideal topic for a special issue of ital. my initial goal as guest editor of ital was to take a snapshot of a variety of consortia and begin to better understand the implications of the explosive growth we were witnessing. while assembling the march 1998 issue i soon realized that consortia were all over the map, both figuratively and literally. a small amount of study revealed a tremendous variety of consortia and a truly worldwide distribution. although american consortia were starting to receive attention in the professional literature, a great deal of important work was occurring abroad. this realization gave rise to the september 1999 issue and the present issue dedicated to consortia from around the world. in addition to six articles from the united states, these three special issues of ital include contributions from south africa, canada, israel, spain, australia, brazil, china, italy, micronesia, and the united kingdom. taken together these groups represent a dizzying array of organizing principles, membership models, governance structures, and funding models. although most are geographically defined, the type of library they serve also defines many. virtually all license electronic resources for their membership but many offer a wide variety of other services including shared catalogs, union catalogs, patron-initiated borrowing systems, authentication systems, cooperative collection development, digitizing, instruction, preservation, courier systems, and shared human resources. each consortium is formed by unique political and cultural circumstances, but a few themes are common to all. it is clear that the technology of the web, the increasing importance of electronic resources, and advances in resource-sharing systems have created new opportunities for consortia. beyond these technological and economic motivations, i believe that in consortia we see the librarian's instinct for collaboration being brought to bear at a time of great uncertainty and rapid change. librarians often forget that as a profession we collaborate and cooperate with an ease seldom seen in other endeavors. there is safety in numbers and in uncertain times it helps to confer with others, spread risk over a larger group, and speak with a collective voice. library consortia fulfill these functions very well and their future continues to look bright. as i conclude my duties as guest editor i would like to thank jim kopp for sparking my interest in this project and for several years of stimulating conversation. special thanks are due to managing editors ann jones and judith carter as well as the helpful and professional staff at ala production services.
obstacles of language and time differences make composing and editing a publication such as this unusually challenging. the quality and cohesiveness of these issues of ital are due in large measure to the efforts of these individuals.
john f. helmer (jhelmer@darkwing.uoregon.edu) is executive director, orbis library consortium.
in inhaling the spore, the editorial introduction to the first special consortial issue, i compared a librarian's involvement in consortia to the cameroonian stink ant's inhalation of a contagious spore. the effect of this spore is featured in mr. wilson's cabinet of wonder, lawrence weschler's remarkable history of the museum of jurassic technology.2 weschler explains that, once inhaled, the spore lodges in the brain and "immediately begins to grow, quickly fomenting bizarre behavioral changes in its ant host." although the concept of a consortial spore is somewhat extreme (or "icky" according to my nine-year-old daughter) the editorial was an accurate reflection of my own sense of being inexorably drawn into a consortium, drawn not so much against my will but as a willing, crazed participant. at the time i was nominally working for the university of oregon library system and vainly trying to keep consortial work in perspective. by the time of my second editorial, epidemiology of the consortial spore, i was exploring consortia around the world but still laboring under the illusion that i could keep my own consortium at arm's length. i must have failed since, as of this writing, i have left my position at the uo and now serve as the executive director of the orbis library consortium. like the cameroonian stink ant, i have inhaled the spore and am now happily laboring under its influence. references and notes 1. see ital 17, no. 1 (mar.
1998) and ital 18, no. 3 (sept. 1999). 2. lawrence weschler, mr. wilson's cabinet of wonder (new york: vintage books, 1995). the museum of jurassic technology (www.mjt.org) is located in culver city, calif. see www.mjt.org/exhibits/stinkant.html for more on the cameroonian stink ant.
public libraries leading the way we can do it for free! using freeware for online patron engagement karin suni and christopher a. brown information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.13257 karin suni (sunik@freelibrary.org) is curator, theatre collection, the free library of philadelphia. christopher a. brown (brownc@freelibrary.org) is curator, children's literature research collection, the free library of philadelphia. © 2021. "public libraries leading the way" is a regular column spotlighting technology in public libraries. in the early weeks of the pandemic, the special collections division of the free library of philadelphia (https://freelibrary.org/) responded to the library's call for fun and interactive online engagement. initially staff members released games and buzzfeed-inspired lists via various social media accounts to amuse patrons, distract from the lockdown, and provide educational programming. as the list of activities grew, we realized this content needed a more substantial home; the return on investment of time for the development and production of an online game to be released once on social media was not sufficient. activities and passive programming that took hours to create could easily fall victim to social media's algorithms and be quickly buried in a patron's feed. the free library's official blog was an insufficient option because it promoted all library programming, and our goal was to highlight the value of our division and the materials housed within it. we resolved these issues by creating an online repository solely with freeware systems (https://bit.ly/funwithflpspeccoll). the repository provides a stable landing page wherein the special collections division content builds meaningful connections with patrons of all ages. this model can be readily adapted and is a valuable tool for library workers promoting their own online engagement. repository framework it was clear that our division could not add to the burden of an overworked it staff by requesting support for digital engagement. we needed to seek external alternatives that would interest patrons and could be managed with limited training. before we began our search, we brainstormed a list of requirements:
• an inexpensive and user-friendly hosting platform
• a pleasing look and easy navigation
• the ability to be updated frequently and easily
• the flexibility to adapt and expand as our requirements change
our search led us to the google suite of products, specifically google sites and google drawings. google sites and google drawings integrated perfectly with each other, and we appreciated their usability and relative simplicity. once we selected the software, we knew we needed a list of best practices to guide the repository's creation:
● to establish a visual connection with our official website, the repository would primarily use the free library's branded color scheme.
● all thumbnails created would be square, allowing us to reuse the image as promotional material on different social media accounts.
● all members of the division can create content, but the ability to update and edit the repository would remain limited to ensure consistency.
these guidelines have proven effective. the color scheme and thumbnail rules formed a framework wherein we could work productively without "reinventing the wheel." limiting administrative abilities has allowed us to maintain a controlled vocabulary within the repository, better unifying the content. repository software the google suite, specifically google sites, is advantageous for library workers looking to create professional-looking content quickly. it is free with a google account and built-in templates allow users to build a fully functional website within a few hours with little-to-no design experience. as with all freeware, google sites has quirks. the foremost is that while there are options for customization, these options are finite. there are a limited number of layout, header, and font designs meaning that anyone using the software must temper their vision to fit within the confines of the program. google drawings is far more flexible, in part because it is a much simpler program. users familiar with software like powerpoint or ms paint have the ability to design images for headers, thumbnails, etc. two drawbacks we encountered with this freeware are the restrictions on image upload size (a consideration for our division given the archival files used in our digital collections) and the limited ability to create word art. for our division, the advantages of these software products outweigh their limitations. content framework the repository houses programming devised primarily with freeware. an early discovery was a suite of activities from flippity (https://www.flippity.net). designed for educational use, flippity provides templates for a variety of online activities including memory games, matching games, and board games. our primary focus has been on the first two, although we continue to explore new aspects of this suite as templates are added. flippity works with google sheets and can integrate images from google drawings. jigsaw planet (https://jigsawplanet.net/) has been used extensively by libraries and museums during the pandemic. it allows creators to easily turn images into puzzles that are played online, either on the site itself or through embedding the puzzle. the site allows registered users to access leaderboards, and it allows creators to track how many times puzzles have been played. in addition to the ease of use, the major benefit of jigsaw planet is that the patron can customize their experience by changing the number of pieces to fit their preferred level of difficulty. the desire for audio and video content has surged over the last several months, and we have sought to meet that need through the use of a variety of software. in regard to video, youtube is not a new tool, but the majority of our pre-pandemic programs were not filmed. with the shift to crowdcast and zoom, we now have a library of online lectures and other events that have been uploaded to youtube and can be viewed repeatedly and at any time. with a dedicated home for this content, we have been inspired to seek out older videos of special collections programming across multiple channels and link them to the repository.
one of the newest additions to our offerings has been the podcast story search from special collections (http://bit.ly/flpstorysearch), which explores stories based on, inspired by, or connected to material artifacts. the podcast is recorded and edited using zencastr and audacity and is posted on anchor, which also distributes it to major listening apps. in recent weeks, our division has added images, blog posts, and additional content for current and past exhibitions. this is the first formal exhibition compilation since the special collections division began in 2015, and we are delighted that it is available for the public to explore. the material is arranged using templates and tools available in google sites, allowing patrons to view image carousels, exhibition tags, and past programs. the inclusion of this material marks a shift away from the repository functioning as a response to the need for pandemic-related content to a living history of our division and our work promoting the special collections of the free library. accessibility accessibility and equity of access lie at the core of library service. sadly, we were not initially focused on this point, and our content was not fully accessible, e.g., text was presented in thumbnails only which limited the use of screen readers to relay information. as the content expanded, we sought to make the space as inclusive as the freeware limits allowed. alternative text was added to images and information was not limited within thumbnails. this is an ongoing process, but one that is necessary to reach as many patrons as possible. analytics site visits and other statistics for a library's online presence are always important, but especially so during the pandemic when restricted physical access has driven more patrons to online resources. our plan for capturing this information was two-pronged. first, we used bit.ly to create customized, trackable links for our content. these are used within the repository and on social media and in other online promotions. this has proven to increase repository traffic while providing information on how patrons discover our content. the statistics generated from bit.ly are only available for 30 days for free accounts, albeit in a rolling 30-day window. knowing this, we transcribe the statistics monthly into a spreadsheet to maintain a consistent account of patron access. our second prong is google analytics, a freeware option that only tracks data within the repository. google analytics connects a single google account to google sites, but the integration is seamless and the data remains available indefinitely. this provides a visual breakdown of statistics, including maps and graphs that are easily shared with other stakeholders. by using both tools we are able to surmise who is visiting the repository, where they are finding the links, and which sections are popular with our patrons. conclusion the special collections repository was created in response to a growing need for online patron engagement during the early weeks of the pandemic. our division strove to engage the public with fun, educational programming and activities primarily using freeware. this has proven to be successful with the general public and members of our division.
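a minimal sketch of the monthly bit.ly transcription step described in the analytics section above, assuming the bitly v4 api's clicks/summary endpoint; the access token, tracked link, and output file shown are placeholders rather than the division's actual configuration.

```python
import csv
import datetime

import requests

API_BASE = "https://api-ssl.bitly.com/v4"
ACCESS_TOKEN = "YOUR_BITLY_TOKEN"          # placeholder token
BITLINKS = ["bit.ly/funwithflpspeccoll"]   # links to track, in domain/hash form


def clicks_last_30_days(bitlink):
    """return total clicks on one bitlink over the rolling 30-day window."""
    resp = requests.get(
        f"{API_BASE}/bitlinks/{bitlink}/clicks/summary",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        params={"unit": "day", "units": 30},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["total_clicks"]


def append_monthly_counts(path="bitly_stats.csv"):
    """append one dated row per link so counts outlive the 30-day window."""
    today = datetime.date.today().isoformat()
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        for link in BITLINKS:
            writer.writerow([today, link, clicks_last_30_days(link)])


if __name__ == "__main__":
    append_monthly_counts()
```

run once a month, by hand or on a schedule, this reproduces the spreadsheet transcription without depending on the free tier's rolling 30-day limit.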
the statistics from the site have both informed content creation and engendered a better appreciation for the repository from our administration. as we move forward, the repository is evolving into a comprehensive collection of what the special collections division does and how we meet the need for patron engagement online and in person. it is a framework that can be used by library workers across a multitude of areas and specialties, housing activities from story times and passive programming to book clubs and lectures.
alexa, are you listening? an exploration of smart voice assistant use and privacy in libraries miriam e. sweeney and emma davis information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12363 miriam e. sweeney (mesweeney1@ua.edu) is associate professor, university of alabama. emma davis (edavispatsfan@gmail.com) is library specialist, hoover public library. abstract smart voice assistants have expanded from personal use in the home to applications in public services and educational spaces. the library and information science (lis) trade literature suggests that libraries are part of this trend; however, there are few empirical studies that explore how libraries are implementing smart voice assistants in their services, and how these libraries are mitigating the potential patron data privacy issues posed by these technologies. this study fills this gap by reporting on the results of a national survey that documents how libraries are integrating voice assistant technologies (e.g., amazon echo, google home) into their services, programming, and checkout programs. the survey also surfaces some of the key privacy concerns of library workers in regard to implementing voice assistants in library services. we find that although voice assistant use might not be mainstreamed in library services in high numbers (yet), libraries are clearly experimenting with (and having internal conversations with their staff about) using these technologies. the responses to our survey indicate that library workers have many savvy privacy concerns about the use of voice assistants in library services that are critical to address in advance of library institutions riding the wave of emerging technology adoption. this research has important implications for developing library practices, policies, and education opportunities that place patron privacy as a central part of digital literacy in an information landscape characterized by ubiquitous smart surveillant technologies.
introduction smart voice assistant use has expanded from personal uses in the home to new applications in customer services, healthcare, e-government, and educational spaces, raising questions from groups like the american civil liberties union (aclu), among others, about the data privacy implications of these technologies in public and shared spaces.1 libraries are part of the voice assistant adoption trend, as documented in the american libraries magazine article "your library needs to speak to you" by carrie smith.2 smith gives examples of school, public, and academic libraries adopting smart voice assistants like amazon's alexa and echo devices for a range of services and programming including "event calendars, catalog searches, holds, and advocacy." nicole hennig points out that there are tremendous opportunities for voice assistants to assist "people with disabilities, the elderly, and people who can't easily type."3 in these ways, voice assistants are often presented in the trade literature as part of an exciting new wave of emerging smart technology services that libraries can "get ahead of" and potentially harness for public service and community engagement. at the same time, the key privacy issues inherent in voice assistants are often downplayed as secondary concerns while librarians are encouraged to press forward and experiment with smart technology adoption. we argue that the privacy concerns surrounding voice assistant use in libraries should be treated as fundamental questions for library workers to consider as a part of upholding the core values of patron privacy and confidentiality in library services. voice assistant use in libraries is still nascent, reflecting the emerging nature of these technologies. given this, it is not surprising that very few empirical studies have explored voice assistant use and potential data privacy implications for libraries. our research is intended as an exploratory study that contributes to advancing knowledge in this area. the goals of this study are to begin mapping smart voice assistant use in libraries, to assess how aware library workers are of privacy concerns involving these technologies, and document how library workers are educating patrons about privacy and voice assistant use. these are necessary first steps for developing library practices, policies, and education opportunities for voice assistant use that prioritize privacy as a central part of digital literacy in an information landscape characterized by ubiquitous smart surveillant technologies and diminishing data privacy protections. review of literature what is a voice assistant? voice assistants are a type of digital assistant technology, also known as virtual assistants, and can be broadly defined as computer programs designed with human characteristics that act on behalf of users in digital environments using voice interfaces.4 apple's siri, microsoft's cortana, and amazon's alexa are prevalent examples of smart digital assistants that use voice recognition and natural language user interfacing to help learn users' preferences, answer questions, and manage a variety of applications and personal information.
voice assistants can run on multiple devices and be seamlessly integrated across platforms including networked internet of things (iot) gadgets like smart speakers (e.g., amazon echo and google home) and other smart-home technologies (e.g., nest or ring), along with mobile devices, smart watches, personal computers, and numerous third-party applications. ubiquitous "always on" features are offered as a convenience to users who can use "wake words" (e.g., "hey, siri"; "alexa"; "ok google") to initiate queries and commands. amazon's smart speakers and intelligent digital assistants are rapidly becoming pervasive home and personal technologies, with the amazon echo leading the market in 2019 with 61 percent market share, followed distantly by the google home device with 24 percent market share.5 a recent united states survey by clutch reported that nearly half of people surveyed owned a voice assistant, with one-third planning to purchase one in the next three years.6 additionally, the clutch survey found that 69 percent of voice assistant owners used their devices every day.7 the popularity of voice assistants for personal use has driven the expansion of these technologies for customer service applications outside of the home in shared and public spaces, including in educational settings and health care. in this landscape it is perhaps not surprising that librarians are following suit and exploring the service potentials of voice assistants for libraries. libraries and voice assistant use the american library association's (ala) center for the future of libraries initiative identified "voice control" as a trend in their 2017 report, anticipating the relevance of voice assistant technologies for libraries.8 the capability of voice assistants to integrate across platforms through customized applications—which amazon calls "skills" and google refers to as "actions"—allows libraries to create specialized uses for these technologies as a part of their regular information services. additionally, existing third-party vendors like overdrive (for e-book lending) and hoopla (multimedia lending) that most public libraries use are preconfigured to connect to voice assistants like amazon's alexa. there are many creative and potentially helpful ways that voice assistants could be integrated into the library setting, including enhancing read-along with music and effects, providing accessible services for elderly patrons or individuals with disabilities, and providing an alternative access point for common library queries and institutional information (e.g., searching titles, placing holds, requesting library event information).9 some libraries have started experimenting with voice assistant services in the library.
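to make concrete what a library-built "skill" can look like, the sketch below uses the alexa skills kit sdk for python; it is illustrative only, and the intent name, spoken responses, and hours value are hypothetical rather than drawn from any library described in this article.

```python
# illustrative sketch of a small library "skill" using the alexa skills kit
# sdk for python (ask-sdk-core); the intent name and responses are hypothetical.
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_request_type, is_intent_name


class LaunchHandler(AbstractRequestHandler):
    """handles the bare invocation, e.g., 'alexa, open my library'."""

    def can_handle(self, handler_input):
        return is_request_type("LaunchRequest")(handler_input)

    def handle(self, handler_input):
        speech = ("Welcome to the library. You can ask about today's hours "
                  "or upcoming events.")
        return handler_input.response_builder.speak(speech).ask(speech).response


class LibraryHoursHandler(AbstractRequestHandler):
    """handles a hypothetical custom intent such as 'what are your hours today?'."""

    def can_handle(self, handler_input):
        return is_intent_name("LibraryHoursIntent")(handler_input)

    def handle(self, handler_input):
        # a real skill would look this up from the library's own hours data
        speech = "The main library is open today from 9 a.m. to 9 p.m."
        return handler_input.response_builder.speak(speech).response


sb = SkillBuilder()
sb.add_request_handler(LaunchHandler())
sb.add_request_handler(LibraryHoursHandler())

# entry point when the skill is deployed as an aws lambda function
lambda_handler = sb.lambda_handler()
```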
for example, iowa state university staff developed alexa skills for their library so that users could find out information about library history and library collections.10 other libraries are using voice assistants to strategically engage their communities, as when the spokane public library placed amazon echo dots in the library so patrons could ask questions about upcoming bond elections, an issue that directly impacts library funding.11 the worthington (oh) libraries are integrating voice assistant technologies into technology training and “petting zoo kits” which allow their patrons to try out emerging technologies.12 the king county (wa) library system is taking a novel approach and experimenting with developing their own voice assistant, libro.13 these examples point to the many applications and creative approaches libraries are experimenting with to bring voice assistant technology to their services. data privacy issues as convenient as voice assistants may be for library services, the underlying data infrastructures of these technologies are tightly controlled by the technology companies that design and sell them. the lack of library control (and transparency) over these infrastructures raises questions about how the core values of privacy and confidentiality can be guaranteed in the library setting. 14 voice assistant technologies capture a wide range of intimate user information in the form of biometric data (e.g., voice recognition), consumer habits, internet-based transactions, personally identifiable information (pii), and geographical information.15 the ubiquitous “always on” feature that makes these technologies so convenient also flags important privacy questions about the extent of user interactions that are recorded; how these files are processed, transcribed, and stored; and how local, state or other law enforcement agencies might compel or otherwise use these records.16 recently amazon has confirmed that they have employees dedicated to listening to recordings from echo devices in order to help “eliminate the gaps in alexa’s understanding of human speech and help it better respond to commands,” which is concerning for patron privacy in the library context.17 researchers at northeastern university and imperial college london recently did a study about how often smart speakers record “accidentally” and whether or not they are constantly recording. the study found no evidence to support the theory that these devices are constantly recording, however the researchers did report that smart speakers are accidentally activated around 19 times a day, on average. 18 these reports aside, there is still much unknown about what these companies, and the companies they contract out work to, do with the personal data collected from voice assistants. lastly, amazon is a known collaborator with us government agencies like homeland security and immigration and customs enforcement (ice), hosting their biometric data on amazon web services (aws).19 amazon has a reputation for being one of the least transparent technology companies in terms of data sharing practices, and has routinely evaded questions about if/how much of customers’ echo data has been turned over to federal authorities.20 given this data environment, the fact that libraries are beginning to experiment with voice assistant integration in their services poses important questions for patron data privacy and confidentiality. 
ala provides library privacy guidelines for third-party vendors that clearly detail expectations for use, aggregation, retention, and disclosure of user data.21 while this document has been helpful for guiding license agreements with digital content providers, program facilitators, and other libraries, it does not quite capture the range of complexities that emerging smart technologies pose in the app-driven iot landscape. this area is ripe for study, and more information about how libraries of different types are approaching the use of voice assistants is necessary for developing responsive professional practices that center issues of privacy and critical digital literacy. our survey explores some of these issues with the purpose of beginning to document voice assistant use, and associated privacy concerns, in library services. research methods four main research questions guide this study: (1) how are libraries using smart voice assistant technologies as a part of their library services? (2) how aware are library workers of how voice assistants integrate with third-party digital content platforms? (3) are libraries educating library patrons about the privacy implications of smart voice assistant technologies? and (4) what kinds of privacy concerns do library workers have about the use of smart voice assistant technologies in their library services and programming? to address these questions, we developed an online survey using qualtrics web software, and distributed it in fall 2019 to 1,929 public and academic libraries across the us via email solicitation.22 the survey consisted of a mix of 31 multiple choice and open-ended questions designed to address different aspects of the stated research questions (see appendix a). since most of the examples of library voice assistant use detailed in the lis trade literature came from public and academic libraries, these were the library types we identified as most likely to already be experimenting with voice assistants in services and programming. using purposive sampling techniques, we selected 30 public libraries for each state that represented a range of rural and metropolitan service areas. we selected approximately 10-20 academic libraries per state, with the actual number varying based on the total number of universities and colleges in a given state. we identified a cross-section of large state schools, private colleges, and community colleges in each state to account for the variety of higher education institutional settings for academic libraries. we sent email solicitations to each public library, targeting email addresses for library directors where possible. for libraries that had centralized email services, we solicited participation using the contact forms available on the libraries' websites. email solicitations to the academic libraries targeted library employees with job titles that included: emerging technology, user services, user experiences, head of public services, and head of technology. our survey analysis documents the numbers of reported uses, and kinds of integration, of voice assistant technologies across library applications and services. we conducted a qualitative content analysis of the short answer responses, with both researchers independently coding participant comments for emergent themes and categories.
as a part of this process both researchers compared and negotiated categories in two iterations of coding to arrive at a common codebook, which was then applied in the final pass of the responses. these categories have some distinct features, but also have many overlapping components. comments that embodied multiple themes were included in all categories that were relevant for describing them, meaning a particular comment might be included in multiple categories. the following sections report on the key findings of this study, organizing the discussion around our original research questions. findings participant demographics we received 86 total responses for the survey, with the majority of respondents (61 percent) reporting affiliation with public libraries, followed by respondents from academic libraries (38 percent), with one respondent from a school library (1 percent).23 the participants represented libraries from 42 states across the us.24 the vast majority of public library respondents (65 percent) reported serving populations of 25,000 or more, though there was also a large share of responses from libraries serving smaller populations of 2,500-9,999. the majority of academic library respondents work for small and medium-sized institutions serving populations between 2,500 and 9,999 (table 1), with nearly a third of respondents representing medium to large institutions. admittedly, these are rough demographic sketches to help quickly identify which types of libraries might be using voice assistants. more granular demographic detail would be useful in future studies to further understand how factors like institution type, geographical region, access to resources, and service community demographics shape decisions about emerging technology adoption in libraries.
table 1. size of service population by library type
                   total    public   academic   school
total count        84       51       32         1
2,500 or less      11.9%    2.0%     28.1%      0.0%
2,500-9,999        25.0%    19.6%    34.4%      0.0%
10,000-25,000      16.7%    11.8%    25.0%      0.0%
25,000+            44.0%    64.7%    12.5%      0.0%
i'm not sure       2.4%     2.0%     0.0%       100.0%
how are libraries using smart voice assistant technologies as a part of their library services? only five respondents (6 percent) in our study reported that their library is currently using amazon echo, google home, or apple siri devices for patron services and programming. of the voice assistant adopters, three were public libraries using amazon echo and google home devices, and two were academic libraries using amazon echo and apple siri (table 2).
table 2. voice assistant device by library type
              total   public   academic   school
amazon echo   3       1        2          0
google home   2       2        0          0
apple siri    1       0        1          0
librarians described using voice assistants to "provide basic info about the library and resources," and on an "ad hoc basis" to promote the library-specific alexa skills and google home actions. other reported uses included "translation services" and as a part of "technology petting zoos."25 we asked librarians to describe where voice assistants were located in the library to get a better idea of the spatial arrangements of these technologies, which could be important for considering potential surveillant concerns. several libraries reported that they had voice assistants sitting at front service desks or reference desks for patrons to use in both adult and children's service areas,
as well as at circulation desks. when it comes to using these devices in library programming, the most common response was for use in technology petting zoos and in technology classes where patrons can see technology demonstrations and ask library staff questions, or get one-on-one tutoring sessions: "our technology department holds regular 'tech drop-in's' and carries out one on one assistance by appointment. in the context of these patrons will sometimes bring in their own devices or ask questions about the use of digital assistants." other programming applications that librarians mentioned for voice assistants included trivia, 3-d printing, and makerspaces. two libraries (one public and one academic) reported that they were circulating apple siri devices (e.g., ipads) and amazon alexa products (e.g., echo) for checkout. how aware are library workers of how voice assistants integrate with third-party digital content platforms? the majority of library workers surveyed (70 percent) reported that their libraries use third-party digital media platforms like overdrive and hoopla to provide multimedia content like e-books and streaming video to patrons. both of these platforms support integration with voice assistants like amazon alexa through "skills" (the alexa equivalent of an application). patrons are able to download a skill for their alexa-enabled device to access digital content through these platforms, which are often linked to their library accounts (e.g., "alexa, ask hoopla how many borrows i have remaining.").26 around 14 percent of the respondents reported that they were aware that overdrive and hoopla integrated with voice assistants, and 3 percent of all respondents reported that their libraries actively inform patrons about amazon alexa skills for these services. when patrons begin connecting their personal voice assistant devices with third-party digital content providers that are also linked to their library accounts, different terms of service agreements and privacy policies overlap, creating a complex data rights landscape. almost a third of our respondents (29 percent) replied that they were aware that amazon has different privacy policies from overdrive and hoopla, with 22 percent responding that they were unaware of these differences (the rest were unsure or did not respond). only 15 percent of respondents reported that their libraries provided patrons with information about overdrive and hoopla's privacy policies. one library worker offered that, "when helping a patron or informing them that we use overdrive they are encouraged to read all the privacy info." however, no libraries in this study reported sharing information about amazon's privacy policies with patrons, which might also apply to linked accounts. lastly, 34 percent of the library workers indicated that they were familiar with the ala guidelines on privacy that pertain to third-party vendors, and 16 percent reported that their library actively refers to these guidelines in information materials for patrons. for instance, "we have a privacy policy on our website, which was based on the ala library privacy checklists.
it states that our vendors have different privacy policies than we do." these responses indicate that while some library workers are aware of the privacy implications of the integration of voice assistants into third-party digital content platforms, there are opportunities to increase staff and patron awareness about the intersecting privacy policies and terms of service in this landscape. are libraries educating library patrons about voice assistant technologies as a part of services and programming? we were curious if the libraries who used voice assistants in their services were taking any particular measures to inform patrons about the privacy implications of these technologies, or offering any other kinds of specific privacy "best practices" guides for use (e.g., how to erase your data records, adjust settings, etc.). the two libraries who reported circulating voice assistants indicated that they did not include any privacy information with voice assistant devices at checkout. similarly, we asked library workers about the kinds of technology classes or programming that their libraries were offering, since these might be sites where there is potential to educate or provide information about privacy issues raised by smart technologies like voice assistants. we found that 49 (56 percent) of the libraries represented in the survey (37 public, 12 academic) offer technology courses for the public. of these, 39 libraries (24 public, 15 academic) responded "yes" to our question asking if aspects of "data privacy or data literacy" are included as part of these classes or other related programming.27 only 3 libraries (2 public, 1 academic) were able to report that their library offers data literacy education that specifically addresses voice assistant technologies. library workers provided many examples of the kinds of data literacy information that their libraries typically provided in technology classes and programming. twelve respondents said that their libraries offered some sort of broad data literacy class and several cited classes specifically targeted at personal data practices and security. topics taught in these classes included: understanding your personal risk profile; password managers and security; how to understand and protect your digital footprint; and sessions on facebook and google where staff "walk users through how to find their information and make decisions about it." several respondents identified information literacy topics in conjunction with data literacy, noting that their library teaches classes about identifying "fake news," phishing scams, and evaluating the authority of websites and website content. none of the responses specifically named issues around privacy or data capture by voice assistants or other smart technologies as topics covered in library technology classes. several library workers noted that technology classes were offered at their libraries through one-on-one sessions, geared to individually address what patrons had questions about. based on these responses it is unclear how in-depth, or if at all, these one-on-one sessions might go into informing patrons about privacy best practices and risks when using smart technologies like voice assistants. what kinds of privacy concerns do library workers have about the use of smart voice assistant technologies in their library services and programming?
just over half of the library workers surveyed (52 percent) answered "yes" to the question: "do you have any privacy concerns about the use of amazon echo, google home, or apple siri devices in the library?" of the other responses, 16 percent reported "no" concerns and 15 percent answered "i'm not sure." those who answered yes were asked to further describe their privacy concerns, resulting in robust descriptions that demonstrated a savvy understanding of the voice assistant data landscape. we characterized library workers' concerns about voice assistants in the library by five major categories: data access and use; surveillance and "always on" features; procedure and operations; legal issues; and professional responsibility. data access and use by far the most prevalent privacy concerns focused on questions about who has access to data collected by smart voice assistants and how this data might be used (or misused) by different parties. library workers were the most concerned about the reach of access that the three major voice assistant parent companies (amazon, google, apple) have to patron data, closely accompanied by concerns with the selling of this data to third-party vendors: "there are known risks in the logging practices of the assistant vendor (amazon, google, apple). there are potentially greater, and unknown, risks of privacy and data security problems with third-party integrators that libraries are working with to create the alexa skills, google home actions, etc." "these devices are tied to user accounts for vendors that sell goods and services. there are opportunities to make purchases that we do not want to present to our patrons." "as currently constituted, most of these devices' privacy policies require owners to allow voice recordings to be sent to cloud services for transcription and, in some instances, for storage and for re-listening by staff or 3rd-party contractors." another library worker added that they were concerned about the willingness of these parent companies to "share personal, private data with law enforcement agencies." this observation underscores what is potentially at stake in terms of patron vulnerability in this data environment. several concerns focused on patrons "unwittingly leaving their sensitive information on devices that we might use." "being that anything we use in the library, or check out to our patrons is shared, i have privacy concerns for what data and recordings will be collected by the services while they are either in use in the library or while they are in the patron's possession." while some of these concerns were tied back to how parent companies might use this data, others were equally wary of the potentials for "storing information that can be accessed between patron uses" or by library staff: "as with computers in the info commons, i would be concerned whether user information is scrubbed after each user. or would one user's information persist and become available to a subsequent user." "i would not want to be able to identify the patron who used the device. in this case, we cannot. we circulate ipads as assistive devices.
as soon as the item is returned, the checkout record is purged." lastly, library workers expressed cybersecurity concerns about voice assistants, wondering about how voice assistants might be hacked or otherwise manipulated by malicious actors: "the library is public space, these devices are not known for being secure. a device would have to be registered to some university account, but would be prone to algorithmic manipulation from public voice inputs if that makes sense?" "just the idea that they (everything!) is [sic] hackable, and hostage-able, and so on, creeps me out personally, but also in terms of privacy and confidentiality of users of that technology." "alexa and google home can be hacked to phish passwords and other sensitive information." taken together, these concerns gesture to the opacity of the data environment in terms of who might have access to data (companies, law enforcement, patrons, library staff, hackers) and how this data might be used (advertising and marketing, exposure of personal patron information, state surveillance, and exploitation). surveillance and "always on" features the second major area of concern that library workers expressed was about the surveillance potentials of voice assistants via their passive listening features. in order for voice assistants to respond to their various wake words, they need to be "always on" and listening. while there is a difference between always listening and always recording (which recent studies suggest is not happening), library workers remained wary about devices "constantly monitoring staff or patrons." these concerns have some obvious overlap with the data access and use theme, but differ in that they are specifically concerned with the act of surveilling—monitoring—patron activities, use patterns, and personal information. three respondents in this category couched their data privacy concerns in terms of ability to exert some control over their data (e.g., deleting data), or the ability to grant permission/consent to be recorded: "these devices are intended for use in the home. they offer some protections for users with management access. for example, the google assistant allows review and deletion of recording history. for users without such access there are no such protections." "...they [voice assistants] are intended to for use inside a single household, learning the voices, habits and preferences of those household members. i feel that this kind of personal information should be the individual's choice to make and not the library's [sic]." "my concern is that my personal data is being collected without my permission. the same concern applies to patrons of the library. having them present and turned on captures people's conversations and they may not be aware that is happening." as these comments suggest, passive listening in public spaces opens up the potential for surveilling patrons and library staff who are not intending to interact with the devices, or who have no knowledge that the device is present. in other words, while some patrons might opt to use a voice assistant to ask a question or look a book up in a library catalog, patrons (and library staff) who are merely talking in the vicinity of these devices may still be listened to and recorded by these devices without their knowledge or consent.
this group of privacy concerns conveys a lack of transparency around data collection and surveillance in voice assistants, pointing to larger power differentials between parent companies and users in terms of control over data collection and management. procedure and operations library workers discussed the operational challenges that voice assistants present to staff in terms of establishing routine procedures that ensure patron privacy and confidentiality in between patron use: "how do we make sure no residual information remains in the device before someone else uses it or that if used during a program 'private' information isn't being broadcast to other devices in the area?" another library worker alluded to some of the operational considerations that already accompany library use and lending of personal computing devices, "clearing data, purchasing, maintaining, we already have ipads and other devices and their management with our staff has been a challenge." this comment points to the extra staff labor that underpins technology services, which is often not considered as a part of infrastructure for offering these services. similarly, there is a sense from these comments that establishing procedures to maintain privacy and confidentiality is critical for voice assistants. failure to erase or secure patron data could lead to inadvertently exposing sensitive or personally identifiable information (pii). "patron's [sic] may inadvertently be saving their information or staff may forget to delete information causing the previous patrons sensitive information to remain for the next patron to discover." while google home and amazon alexa devices do provide the ability for individual recordings to be deleted by the account holder, in the case of shared library use of voice assistants, it would likely be incumbent on a library staff member to access and delete recordings. this raises ethical, legal, and operational questions for library staff required to manage any patron data collected by voice assistants. in any case, procedural concerns are a reminder that library staff have an active role to take in ensuring patron privacy. legal issues library workers in this study identified three legal issues posed by voice assistants in the library. the first legal issue raised was the potential for violation of the family educational rights and privacy act (ferpa)—the federal law that protects the privacy of student education records—due to the collection of pii by voice assistants. library workers in many academic settings are required to maintain compliance with ferpa. one of the respondents was concerned that by using voice assistants in their services, libraries would be putting themselves in a position to potentially violate this law.
a second set of concerns focused on questions about the liability of the library (or individual library workers) if a patron's pii is misused by technology companies or the third-party vendors who have access to user data: "i have great concerns regarding the use of this technology in a library setting since it might expose the library to potential liability if, more likely when patron data is misused by the technology providers." related to this concern, another library worker asked, "who owns the info?" questions about rights and ownership of personal data by technology companies, itself a fraught and opaque legal area, require more ethical and legal probing as libraries become intermediaries to patron use of voice assistants. lastly, one library worker cited concerns about librarians' ability to uphold first amendment rights with voice assistants. "we take our mandated role to uphold first amendment rights and patron privacy very seriously. there are too many issues with the way these for-profit companies collect, store and potentially use information. we see no benefits of service gained that offset these concerns. we are also concerned about the way owners of these products use their wealth to leverage political influence." this comment identifies privacy as a necessary condition for facilitating free speech, contrasting this with a sketch of the political and economic motives underlying voice assistant development. the concerns raised by these library workers point to the complexity of managing patron data in the context of a variety of existing legal frameworks. professional responsibility three respondents explicitly placed privacy concerns in the context of their professional responsibility as library workers to "protect" patrons and patron privacy. a fourth respondent voiced a twin concern about "the library's inability to protect privacy and patron information" (emphasis added). beyond descriptions of protecting patrons, these library workers framed privacy as a professional value. comments such as, "we take our mandated role to uphold first amendment rights and patron privacy very seriously," emphasize privacy as a professional charge. these kinds of comments tacitly draw on lis professional core values and ethics statements to position responsible professional practice as the action of upholding privacy. as a result, professional identity is discursively constructed by these library workers as a function of valuing privacy. the following comment, particularly, draws an identity-based line between "us" (library professionals) and "them" (technology companies) that is based on divergent values surrounding privacy: "since one of the main concerns we (should) have as library professionals is patron privacy; 'teaming up' with technology providers who do not have that level of concern is problematic at best." the assertion that library core values may be in conflict with the technology providers that are designing voice assistants is very astute, and important for libraries to consider when weighing the decision to experiment with these (and other) emerging smart technologies. discussion: key considerations for library professionals our research suggests that library use of voice assistants poses many as-of-yet unresolved privacy issues for library staff and patrons alike.
though voice assistant use is still fairly nascent across public and academic libraries, our study confirms that these tools are already being adopted by some libraries. the adoption of these, and other, smart technologies, is likely to keep trending in library services across institution types, paralleling market trends for personal adoption of voice assistants. many library workers in our study expressed astute concerns about voice assistants, raising important questions about how patron data was collected, managed, and used across the data lifecycle of these technologies. this is a critical moment, then, for the library profession to take stock of questions of privacy surrounding voice assistants, and an opportunity to set a broader professional agenda for data-privacy that encompasses the complexities of smart technology use in library services. in this spirit, we have identified several main areas of concern that emerged from our study, posited as key considerations about voice assistants for library professionals to grapple with. information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 12 circulation procedures for libraries who are, or are considering, lending voice assistant-enabled technologies, clear lending rules are needed for patrons that set guidelines for disconnecting their personal amazon, apple, or google accounts before returning the device. likewise, it is important to develop procedures for library staff to anticipate instances when patrons forget to disconnect their personal accounts. library workers cannot, and should not, be responsible for disconnecting personal accounts as a protective measure for both staff and patrons, since doing so asks library workers to access and take responsibility for personal patron data, including pii. one suggestion might be to require devices to be restored to factory settings, which could be verified by a library staff member at time of device return. libraries might also consider including privacy best practices with these devices that outline known privacy risks and provide information about how to adjust settings to limit data sharing or delete records in personal accounts where applicable (e.g., amazon). third-party digital content platforms the integration of voice assistants in third-party digital content platforms licensed by libraries is becoming more common, pointing to the complexity of upholding patron data privacy throughout these layered and linked services. this issue speaks to the difficulties navigating overlapping privacy statements and terms of service agreements, which is not unique to voice assistants but does indicate the need for more data protections and consumer-oriented information policies. ala already does advocacy work on these issues and provides many helpful guidelines , such as the library privacy guidelines for vendors (http://www.ala.org/advocacy/privacy/guidelines/vendors). still, the data environment is very much characterized by the unequal power differential between technology companies and users. we are in dire need of more robust information policy frameworks that are predicated on transparency, strict parameters for data collection and use, corporate accountability, and user control and agency. a promising example of this is the general data protection regulation (gdpr) implemented in the european union in 2018. something similar is needed in the us to regulate corporate data-sharing practices and give users more control over their data. 
this would be beneficial across the board for the public, as well as to library patrons using their personal voice assistant devices to access library resources. education opportunities for expanding digital literacy library workers in our study reported a range of technology education and digital literacy programming initiatives in their libraries, though none that specifically addressed voice assistants. this suggests that library technology programming might not be targeting the kinds of specific privacy concerns posed by smart technologies like voice assistants. as smart technologies like voice assistants become more common for household/personal use, it would make sense to expand library programming initiatives to include informational sessions that incorporate data privacy considerations for smart technologies in addition to skills-driven sessions. additionally, some survey responses indicated that library workers may have some knowledge gaps or a lack of concern about voice assistant use. this might point to a need for expanded education, training, and professional development around data privacy issues and emerging technologies for library workers. there has already been a large push in the field to expand digital literacy, defined by ala as "the ability to use information and communication technologies to find, evaluate, create, and communicate information, requiring both cognitive and technical skills."29 however, this definition of digital literacy falls short of considering the role of assessing data collection, storage, and use as a core part of digital knowledge. expanding digital literacy training, for both staff and patrons, to include awareness of the data ecosystems and privacy concerns that undergird smart technologies is a must for responsive library services. surveilling patrons and staff voice assistants placed in public service areas, in the library stacks, and in public gathering areas within the library raise the ethical issue of recording patrons (and staff) who either do not wish to be recorded or do not even know they may be recorded. in the case of library staff, this poses a labor issue where staff may be asked to work in areas where devices may be listening to their interactions for the duration of their shift. for patrons, this could compromise privacy in reference transactions and in other information-seeking activities, as well as capture other personal interactions that take place in the library setting. it is critical that libraries be transparent about using voice assistant technologies, be upfront about the potential privacy harms of these technologies, and abide by "opt-in" rather than "opt-out" frameworks. library workers should consider treating voice assistant records in the same way they have historically treated circulation records, opting to either delete these records or not collect them (meaning, not use voice assistants) at all. unlike circulation records, however, library workers have far less control over the data captured by voice assistants. this data is stored in the cloud on privately owned servers that remain outside of library control and oversight. given the incredibly low bar for federal access to information under the usa patriot act, actively facilitating the collection of patron and staff interactions, particularly without informed consent, should give librarians pause.
opt to not adopt in light of the issues raised in this study, library workers need to seriously weigh whether the benefits of using voice assistants in libraries at this point in time outweigh the vast privacy concerns that we have outlined here. as it stands, these technologies are not currently filling a gap in library services that cannot be otherwise met by more traditional service models that carry fewer potential harms for our patron communities. importantly, not all patrons are equally vulnerable to harm or exploitation in these data environments. for instance, there is a wealth of research that demonstrates the multitude of ways that black, indigenous, people of color, lgbtq+, women, and low-income communities are subjected to higher levels of surveillance and data profiling that result in harassment, discrimination, economic penalties, and legal persecution.30 as the current national political landscape is aflame in protests against police violence and anti-black racism, it is important to identify surveillance technologies as policing technologies. libraries need to consider that these tools, as extensions of policing data networks, may directly endanger, particularly, black, latinx, and indigenous people who are already subjected to over-policing. in this sense, concerns about patron data privacy are high-stakes and are deeply linked to the professional core value of social responsibility.31 libraries should consider not using voice assistants until key data privacy concerns are addressed, more robust data protections are in place at a federal level, and the blanket authority for federal agencies and law enforcement to compel user data is revoked. this is not a technophobic stance. on the contrary, we are suggesting that library workers could serve an important role as privacy advocates, which includes critically evaluating the role of emerging technologies in their communities on behalf of the public interest. a key part of this must include the library profession taking responsibility for the use of surveillance technologies in their institutions, since these technologies are deeply implicated in the policing of disenfranchised communities by state and federal authorities. conclusion we view this study as a modest starting point for mapping some of the many privacy issues associated with voice assistant use in library services and programming and hope it points a way forward for future research. future research might address specific case studies of voice assistant use in libraries, data mapping of patron data through third-party library services, use and privacy issues across different institution types, patron digital literacies with voice assistants, and library policies for smart technologies more generally. plural and diverse vantage points are needed to understand the potential impacts of these technologies across different community types. such research is critical for developing best practices, guidelines, policies, and education opportunities for voice assistant use (and other smart technologies) that prioritize patron privacy and confidentiality. the use of voice assistants in libraries raises questions about the responsibility of libraries and librarians to actively engage patron data privacy concerns when considering integrating these technologies into services and programming.
indeed, we encourage library workers to consider informed non-adoption of these technologies as a socially responsible professional stance until the key issues we have outlined are addressed. while it is, of course, important for library workers to remain current and innovative in their services, it is also paramount that patron privacy (as a function of safety) stays at the forefront of library services. in other words, it is the responsibility of library workers to anticipate potential privacy issues associated with emerging technologies, rather than treating privacy as a secondary concern to technology adoption. there are tremendous opportunities for library workers to lead the data privacy charge—in collaboration with community stakeholders—in pursuit of privacy-centered library services that are accountable to community members, particularly those who are mostly likely to be harmed by these technologies. information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 15 appendix a: survey instrument 1. by selecting the “i agree” button below, i hereby certify: that i am 19 years old or older; that i have read and understand the above consent form; and that this action indicates my willingness to voluntarily take part in the study. a. i agree to participate in the research study described above. b. i do not agree to participate in the research study described above. 2. do you work in a library setting? a. yes b. no 3. what kind of library do you work at? a. public b. academic c. school d. other, please specify [fill in the blank] 4. what is the size of your library’s service population? a. 2,500 or less b. 2,500-9,999 c. 10,000-25,000 d. 25,000+ e. i’m not sure. 5. what state is your library located in? [fill in the blank] 6. does your library have amazon echo devices, google home devices, or apple siri devices available for use by patrons? a. yes b. no c. i’m not sure. 7. which of the following digital assistant devices does your library have available for patrons to use? a. amazon echo devices b. apple siri devices c. google home devices d. other products, please specify: [fill in the blank] 8. please provide some examples of how your library patrons use the library's digital assistant technologies. [short answer] 9. could you describe where these digital assistant technologies are located in the library? [short answer] information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 16 10. does your library use amazon echo devices, google home devices or apple’s siri devices in any of the following kinds of programming? (select all that apply) a. tech “petting zoos” b. trivia c. homework help d. technology classes e. makerspaces f. not listed, please specify: [fill in description] g. none of the above 11. for the programs you selected, briefly explain how the devices are integrated into programming. [short answer] 12. does your library circulate amazon echo, google home, and apple siri devices to the public for checkout? a. yes b. no c. i’m not sure. 13. which devices do you circulate? a. amazon echo devices b. apple siri devices c. google home devices d. other products, please specify [fill in the blank] 14. do you provide any privacy information and/or best practice information with the device at checkout? a. yes b. no c. i’m not sure 15. if so, briefly explain what kind of privacy or best practices information you include. examples of content covered in this information would be helpful. [short answer] 16. 
do you have any privacy concerns about the use of amazon echo, google home, or apple siri devices in the library? a. yes b. no c. i’m not sure 17. could you describe these privacy concerns? [short answer] 18. does your library offer any sort of technology courses to the public? a. yes b. no c. i’m not sure information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 17 19. does your library teach data privacy or data literacy as part of the library's programming? a. yes b. no c. i’m not sure 20. does your library offer any data literacy education in programming that specifically addresses digital assistants? a. yes b. no c. i’m not sure 21. what kinds of data literacy information is provided in these courses taught at your library? please provide some examples: [short answer] 22. does your library use any of the following services? select all that apply: a. overdrive/libby b. hoopla c. none of the above 23. are you aware that both overdrive and hoopla have amazon echo application integration (called "skills")? a. yes b. no c. i’m not sure 24. does your library inform patrons about amazon echo skills on overdrive and/or hoopla? a. yes b. no c. i’m not sure 25. are you aware that amazon's privacy policies differ from those of overdrive and hoopla? a. yes b. no c. i’m not sure 26. does your library provide any information to patrons about overdrive and hoopla's privacy policies? a. yes b. no c. i’m not sure 27. does your library provide any information to patrons about amazon's privacy policies? a. yes b. no c. i’m not sure 28. please provide a brief description of the information that you are providing to patrons on this subject, including where this information is located for patron access. [short answer] information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 18 29. are you aware of the guidelines that the american library association (ala) provides on privacy as it pertains to third party electronic vendors? a. yes b. no 30. does your library use or refer to these privacy guidelines in any informational materials for patrons? a. yes b. no c. i’m not sure 31. please describe these informational materials, including how and where they are distributed to patrons: [short answer] information technology and libraries december 2020 alexa, are you listening? | sweeney and davis 19 endnotes 1 benjamin herald, “teacher’s aide or surveillance nightmare? alexa hits the classroom,” digital education, education week, june 26, 2018, http://blogs.edweek.org/edweek/digitaleducation/2018/06/alexa_in_the_classroom_teacher s_surveillance.html?cmp=soc-shr-fb. 2 carrie smith, “your library needs to speak to you,” american libraries (june 3, 2019), https://americanlibrariesmagazine.org/2019/06/03/voice-assistants-your-library-needs-tospeak-to-you/. 3 nicole hennig, siri, alexa, and other digital assistants: the librarian’s quick guide (santa barbara, ca: libraries unlimited, 2018) 33–8. 4 adapted from: brenda laurel, “interface agents: metaphors with character,” human values and the design of computer technology (1997): 207–19, cambridge university press. 5 emily clark, “alexa, are you listening? how people use voice assistants,” https://clutch.co/appdevelopers/resources/alexa-listening-how-people-use-voice-assistants. 6 clark, “alexa, are you listening? how people use voice assistants.” 7 clark, “alexa, are you listening? 
how people use voice assistants." 8 "voice control," american library association, http://www.ala.org/tools/future/trends/voicecontrol. 9 shannon liao, "google home will play music and sound effects when you read disney storybooks," https://www.theverge.com/2018/10/29/18037466/google-home-disney-music-moana-incredibles-coco-storytime; hennig, siri, alexa, and other digital assistants, 35; susan allen and avneet sarang, "serving patrons using voice assistants at worthington," online searcher 42, no. 6 (november-december 2018): 49–52. 10 smith, "your library needs to speak to you." 11 smith, "your library needs to speak to you." 12 allen and sarang, "serving patrons using voice assistants at worthington." 13 king county library system, "voice assistants, connecting you to your library," https://kcls.org/voice/. 14 "core values of librarianship," american library association, http://www.ala.org/advocacy/intfreedom/corevalues. 15 miriam e. sweeney, "digital assistants," in uncertain archives: critical keywords for big data, ed. nanna bonde thylstrup, daniela agostinho, annie ring, catherine d'ignazio, and kristin veel (baltimore, md: mit press, 2021), 151–60. 16 anthony cuthbertson, "amazon admits employees listen to audio from echo devices," https://www.independent.co.uk/life-style/gadgets-and-tech/news/amazon-alexa-echo-listening-spy-security-a8865056.html. 17 matt day et al., "amazon workers are listening to what you tell alexa," https://www.bloomberg.com/news/articles/2019-04-10/is-anyone-listening-to-you-on-alexa-a-global-team-reviews-audio. 18 daniel j. dubois et al., "when speakers are all ears: understanding when smart speakers mistakenly record conversations," mon(iot)r, february 14, 2020, https://moniotrlab.ccis.neu.edu/smart-speakers-study/. 19 karen hao, "amazon is the invisible backbone of ice's immigration crackdown," mit technology review, october 16, 2019, https://www.technologyreview.com/s/612335/amazon-is-the-invisible-backbone-behind-ices-immigration-crackdown/. 20 zack whittaker, "echo is listening, but amazon's not talking," zdnet, january 16, 2018, https://www.zdnet.com/article/amazon-the-least-transparent-tech-company/.
21 american library association, "library privacy guidelines for vendors," http://www.ala.org/advocacy/privacy/guidelines/vendors. 22 this research protocol (19-08-2671) was approved in october 2019 by the university of alabama's institutional review board (irb). 23 note, participants were not required to answer every question, so some questions have fewer than 86 total responses due to participants electing to not respond. also, even though we were targeting public and academic libraries, we did receive a response from someone identifying their institution as a school library and decided to include it in the results. 24 we did not receive responses from libraries in arizona, arkansas, connecticut, delaware, pennsylvania, vermont, virginia, or wyoming. 25 "technology petting zoos" are areas where patrons can experiment or try out technologies and gadgets. 26 hoopla, "alexa, meet hoopla," july 16, 2018, http://hub.hoopladigital.com/whats-new/2018/7/alexa-meet-hoopla. 27 we purposely couched questions about "data literacy" and "data privacy" broadly in the survey to allow for a range of interpretations by respondents in an attempt to capture the range of information that might be taught under this umbrella. 28 daniel j. dubois et al., "when speakers are all ears: understanding when smart speakers mistakenly record conversations." 29 american library association, "digital literacy," https://literacy.ala.org/digital-literacy/. 30 examples of critical research in this area include: toby beauchamp, going stealth: transgender politics and u.s. surveillance practices (durham, london: duke university press, 2019); virginia eubanks, automating inequality: how high-tech tools profile, police, and punish the poor (st. martin's press, 2018); safiya u. noble, algorithms of oppression: how search engines reinforce racism (new york: nyu press, 2018). 31 "core values of librarianship," american library association.
measuring library broadband networks to address knowledge gaps and data caps chris ritzo, colin rhinesmith, and jie jiang information technology and libraries | september 2022 https://doi.org/10.6017/ital.v41i3.13775 chris ritzo, mslis (critzo@afutures.xyz) is consultant/owner, anemophlious futures llc. colin rhinesmith, phd (crhinesmith@metro.org) is director, digital equity research center, metropolitan new york library council. jie jiang, mslis (jie.jiang@simmons.edu) is a doctoral student at the simmons university school of library and information science. © 2022. abstract in this paper, we present findings from a three-year research project funded by the us institute of museum and library services that examined how advanced broadband measurement capabilities can support the infrastructure and services needed to respond to the digital demands of public library users across the us. previous studies have identified the ongoing broadband challenges of public libraries while also highlighting the increasing digital expectations of their patrons. however, few large-scale research efforts have collected automated, longitudinal measurement data on library broadband speeds and quality of service at a local, granular level inside public libraries over time, including when buildings are closed. this research seeks to address this gap in the literature through the following research question: how can public libraries utilize broadband measurement tools to develop a better understanding of the broadband speeds and quality of service that public libraries receive? in response, quantitative measurement data were gathered from an open-source broadband measurement system that was both developed for the research and deployed at 30 public libraries across the us. findings from our analysis of the data revealed that ookla measurements over time can confirm when the library's internet connection matches expected service levels and when it does not. when measurements are not consistent with expected service levels, libraries can observe the differences and correlate this with additional local information about the causes. ongoing measurements conducted by the library enable local control and monitoring of this vital service and support critique and interrogation of the differences between internet measurement platforms. in addition, we learned that speed tests are useful for examining these trends but are only a small part of assessing an internet connection and how well it can be used for specific purposes.
these findings have implications for state library agencies and federal policymakers interested in having access to data on observed versus advertised speeds and quality of service of public library broadband connections nationwide. introduction the covid-19 pandemic exposed the severity of the digital divide in the united states. during this time, lack of access to computers and the internet has been highlighted among individuals and families with limited monthly incomes in tribal, rural, and urban communities where broadband is neither available nor affordable. decades of research have shown that this digital divide is further deepened along racial and ethnic lines. wealthier, white, and more educated individuals consistently have higher rates of home computer and broadband ownership. many without this societal privilege rely on their local public libraries and other community spaces to fill these gaps. the pandemic has also underscored just how significant public libraries have been in addressing people's need for computers and high-speed internet. last year, for example, mainstream news organizations shared several stories about children, parents, and teachers all relying on wireless internet access while seated outside in school and public library parking lots, which happens both during and outside library hours.1 much less attention has been paid, however, to the broadband infrastructure and technical support that public schools and libraries need to meet the digital demands of their communities. in 2018, our research team, composed of researchers and practitioners at the simmons university school of library and information science, measurement lab (m-lab), and internet2, received a grant (award #71-18-0110-18) from the us institute of museum and library services (imls). the purpose was to investigate how advanced broadband measurement capabilities can inform the capacity of the nation's public libraries to support the online software applications and social and technical infrastructure needed to promote the national digital platform.2 in this paper, we present findings from this study, which seeks to address a gap in understanding, particularly among researchers, practitioners, and policymakers, about the speeds and quality of service of public library internet connections across the united states. through our research we learned that there are significant gaps in knowledge about broadband speeds and quality of service measures that are impacting the ability of public libraries to support their communities' digital needs. in this context, we hope the quantitative data and analysis presented in this paper contribute to the scholarship on broadband measurement in libraries, as well as to expanding awareness and understanding of broadband data. more concretely, we hope this paper helps to raise awareness of the urgent need for shared knowledge about broadband data and infrastructure that supports digital services in public libraries. we begin the paper with a brief review of key studies that have highlighted the important role of public libraries in promoting digital equity, as well as studies that have discussed the importance of measuring broadband connectivity in public libraries.
we concentrate particularly on those studies that have sought to elucidate the opportunities and challenges of both connecting public libraries with high-speed internet connections and educating public librarians, other researchers, and policymakers about what is meant by broadband infrastructure and services. we then present our findings from the quantitative analysis of our broadband measurement data, which highlights the ways in which ongoing, locally collected measurements can enhance libraries' understanding of their internet service and help inform interactions with patrons and it service providers. the paper concludes with a discussion of the contribution of our research to the scholarship, and we briefly discuss the implications for state and federal policymakers interested in better understanding the role that library broadband measurement data can play in promoting healthy digital equity ecosystems. literature review digital inclusion and broadband measurement in public libraries public libraries in the united states have been committed to bridging the digital gap by providing free public access to computers, internet, and digital literacy skills for decades.3 for example, in their study of how public libraries respond to inquiries about the digital divide through participatory forms, schenck-hamlin, han, and schenck-hamlin found that public libraries have been recognized as the "first and last" resort for internet access particularly "for those unable to afford high-speed connections at home."4 further, bertot, real, and jaeger affirmed this idea with their digital inclusion survey data, collected over several years, by stating, "america's public libraries are an important force for bridging this (digital) divide, with 62.1% of these outlets reporting that they are the only free providers of internet access inclusive of computers in their communities."5 in addition to providing public access to computers and the internet, us public libraries have placed an emphasis on promoting the general public's awareness and skills around broadband through delivering free digital literacy training sessions, as well as hosting civic discussions around the topic of broadband connections with their patrons.6 to further illustrate how public libraries narrow the digital divide, deguzman, jain, and loureiro explained that telemedicine has become a new norm in today's medical visits, which quickly became a reality during the covid-19 pandemic. in their article, the authors show how public libraries can play a critical role in bridging this "digital health divide" that exists in many communities.7 as a bottom-up means to promote digital inclusion in the us, the role that public libraries have played to promote digital inclusion and equity cannot be overlooked. however, as jaeger et al. explained in their study on how public libraries address the digital divide and digital inclusion, "one curious constant across policy approaches to digital divides in many, though not all, nations has been the failure to involve librarians in the formulation of definitions, policies, or other aspects of the policy-making process."8 it is within this space that public librarians and the technological staff who support them can play an important role in co-designing the tools, skills, and knowledge needed to better understand broadband in public libraries.
broadband measurement in public libraries many public libraries, particularly small, rural, and tribal libraries, face ongoing challenges in gaining accurate information about their broadband speeds and quality of service. this lack of information can limit their capacity to provide a wide range of applications and services to the community. as bertot, real, and jaeger concluded, one of the big challenges that public libraries have been dealing with is that the speed of public library internet connections “can vary significantly according to local population density.”9 in reaction to this situation, public librarians have shown great interest and need to acquire knowledge about their libraries’ current broadband performance.10 digital inclusion scholars have proposed topics that future research on public libraries and broadband measurement should explore.11 these topics include how to better inform public librarians in order to assist them in planning, as well as how to deliver sufficient and quality broadband connections to the community. other topics include looking at how to help public libraries justify the need for more workstations and bandwidth using data coming from “empirical measures, especially longitudinal measures.”12 these and other questions remain largely unanswered in the academic literature. the measuring library broadband networks (mlbn) project and research design research questions and significance of study our research sought to address this gap in the scholarship on broadband measurement in public libraries through the following research question: how can public libraries utilize broadband measurement tools and training materials to develop a better understanding of the broadband speeds and quality of service that public libraries receive? in response, our research team gathered quantitative data from an open-source broadband measurement system that was both developed for this study and deployed at 30 public libraries across the us. our research is significant because answers to these questions can help strengthen public libraries as essential anchor institutions and partners in providing data to address the digital needs of their communities. the findings from our study can also assist public libraries in responding to the challenges of developing a more integrated, equitable, and dynamic set of infrastructures for delivering public computing access and digital library services. information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 4 project overview and research design the measuring library broadband networks (mlbn) project (https://slis.simmons.edu/blogs/mlbn/) was originally conceptualized to be completed in four phases during the two-year grant period. 
during the first phase, we organized a “participatory design” workshop with our 10 first-year public libraries who agreed to serve as part of our research panel on the project.13 findings from our analysis of the qualitative data gathered during the workshop revealed that public libraries wanted access to broadband measurement data in order to: (1) better communicate with their patrons about their library’s broadband connectivity, (2) respond to their communities’ digital needs, and (3) justify the importance of robust internet connectivity to their funders.14 our analysis revealed early on in the project that knowledge gaps existed around the performance of public library broadband networks, patron and staff experiences using the library’s internet connection, and the meaning and value of measurements such as speed tests. during the second phase of the project, we applied what we learned from insights gained during the workshop to our site visits with the 10 participating first-year public libraries. during our fieldwork at the libraries, we sought to interview four different groups of people: (1) library staff, (2) library administrators, (3) it staff, and (4) it administrators. the purpose was to gain multiple perspectives on the same sets of questions, which would provide additional qualitative data to help answer our research questions. in a few of the libraries, the library administrator was also the primary it professional on site. in other words, depending on the size of the organization, librarians often wore several hats, which is certainly not uncommon for small, rural, and tribal libraries. in addition to conducting interviews with these four groups, we also held focus groups with patrons on site at each of the libraries. during this process, we were able to learn more about the context, character, and communities of our partner libraries and gain a better sense of what it is like to work at and/or be a patron of each library, as well as why public libraries might need an open-source broadband measurement system. the other main goal during this phase was to learn more about and document the process of installing our broadband measurement devices. through this process, we gained additional insights into the nuances of the network configurations at each location and refined our device configurations and setup instructions in response. ultimately, we sought to identify potential barriers to the measurement devices working properly in the networks of our second-year libraries, when we would not have the luxury of being there in person. at the conclusion of the research program in march 2021, we asked participating library and/or it staff to complete a final evaluation survey. twenty libraries responded to a range of questions, two of which related to their understanding of the library’s internet connectivity and network management practices: “is there an overall download and/or upload cap on the connection to the entire library building?” and “is there a cap on individual devices using the internet at the library?” eight libraries responded to one or both of the above questions; their responses are in table a.2 in appendix b. training manual during phases 2 and 3, we worked with carson block, a well-known library technology expert and consultant who helped us to develop our mlbn training manual (https://dataverse.harvard.edu/dataset.xhtml?persistentid=doi:10.7910/dvn/8xxxzq). 
the development of the manual was led by chris ritzo, carson block, and colin rhinesmith to assist our second-year participating public libraries in being able to install the measurement devices on their own. the manual provides a comprehensive overview of our mlbn project, including what we learned in the first year of the project about why public libraries would want to measure broadband at their libraries. section 2 focuses on the setup instructions that public libraries would need to install the devices to measure both wired and wireless internet connections. we also provided details on the hardware used, device management, and data collection, as well as the data visualization platform developed for the project, which allows public libraries to access and use the broadband data gathered from the devices installed on their network. the manual includes complete information about how the measurement platform used in this study can be set up independently for future use by any library, institution, or individual. we knew this manual would be essential for scaling to the 60 total public libraries for our project and for any additional library after the end of our grant. final cohort of participating libraries libraries participating in this research were recruited primarily through the suggestion of the project's advisory board, many of whom represented state library agencies, regional research and education networks, or other intermediary organizations working with or supporting public libraries. though the covid-19 pandemic limited our ability to scale up to our goal of 60 libraries, ultimately 30 libraries were recruited to participate in the research. appendix a lists the final cohort of participating libraries, the specific branch where measurements were conducted, the city, state, and the library's imls code. broadband measurement data collection quantitative measurements of the network connections at participating libraries were collected using the murakami software developed by m-lab, running from a dedicated, on-premise measurement computer/device.15 this software provides tests from two large platforms for crowdsourced speed tests: m-lab's network diagnostic tool versions 5 and 7 (ndt) and speedtest-cli, an open source client using the ookla platform.16 ndt is a network performance test of a connection's bulk transport, conforming to the internet engineering task force's (ietf) rfc 3148.17 m-lab provides two ndt testing protocols (ndt5 and ndt7), each measuring different aspects of the transmission control protocol (tcp).18 all versions of ndt measure upload and download speeds and latency using a single tcp stream between the computer running the test and the nearest m-lab server. ookla is a commercial company that created the network performance test speedtest.net.19 ookla's test also measures upload and download speeds, as well as latency, but provides the option to measure using a single tcp stream or using multiple streams.
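to make the two kinds of tests more concrete, the following is a minimal sketch of how a single ookla-style measurement could be scripted with the open-source speedtest python module (the library behind speedtest-cli). this is an illustration only, not the mlbn implementation: the field names come from that module, and the single- versus multi-stream distinction is only approximated here with its threads parameter.

```python
# minimal sketch: one ookla-style measurement using the open-source
# "speedtest" python module (pip install speedtest-cli). illustrative only;
# the mlbn devices ran m-lab's murakami wrapper around speedtest-cli and
# the ndt clients rather than this exact script.
import json
import speedtest

def run_ookla_style_test(single_stream=False):
    st = speedtest.Speedtest()
    st.get_best_server()                    # choose the lowest-latency server
    threads = 1 if single_stream else None  # None = module default (multiple streams)
    st.download(threads=threads)            # download throughput, bits/sec
    st.upload(threads=threads)              # upload throughput, bits/sec
    r = st.results.dict()                   # timestamp, ping, server info, etc.
    return {
        "timestamp": r["timestamp"],
        "download_mbps": r["download"] / 1e6,
        "upload_mbps": r["upload"] / 1e6,
        "latency_ms": r["ping"],
        "server": r["server"]["host"],
    }

if __name__ == "__main__":
    print(json.dumps(run_ookla_style_test(), indent=2))
```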
the primary differences between these two platforms’ tests are the use of single or multiple tcp streams and the location of testing servers.20 at each location (with a few exceptions), two devices were configured using network details supplied by library or it staff and shipped to the library with setup instructions (in some cases, depending on network complexity, only one device was installed). one device was connected to the switch or router where internet service connected the location (egress). the other device was connected to an available switch port on the same virtual local area network (vlan) as wifi access points. the intention was to measure the capacity of the entire location using the egress device, and the capacity of a single wifi access point (ap) to serve multiple patrons using the wifi ap device. once connected, each device ran tests approximately six randomized times within each 24-hour period. information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 6 each murakami device ran four tests: ndt 5, ndt 7, ookla single-stream, and ookla multi-stream. each test result was exported to an archive in google cloud. this data was imported into bigquery and analyzed in datastudio.21 data from the 2019 public libraries survey (pls) from imls was also included to describe each library’s locale, service population, and number of public computers, annual computer use sessions, and annual wifi sessions reported.22 public data provided by both ookla and m-lab for the counties in which each mlbn library was located were loaded to compare each platform’s reported aggregate measurements for the surrounding area to the measurements conducted at the libraries.23 aggregate public data for the surrounding counties in our analyses excluded all measurements from the libraries themselves. along with the data itself, specific details on our data import, cleaning, and analysis are provided in our publicly available mlbn dataverse (https://dataverse.harvard.edu/dataverse/mlbn), hosted by harvard university. limitations the covid-19 pandemic created challenges for the research team in scaling up to 60 libraries, as was planned at the beginning of the project. therefore, we had to limit our outreach and engagement during 2020. when we asked the final 30 public libraries that were able to participate in the research whether they had closed their doors during the pandemic, all of them said yes. however, all the libraries reported that they continued to provide wireless internet access, even though their buildings were closed to the public during this time. although we were unable to scale up to 60 public libraries, we were still absolutely thrilled with the response we received from the libraries that were able to participate. the ndt 7 tests in our program uncovered a now-resolved bug where measurements were limited by the performance of our selected premise devices, which lack proper support for encryption.24 this is observable in some of our data as a large jump in measured speeds from ndt 7 after november 1, 2020. 
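as a rough illustration of the randomized scheduling described above (approximately six tests in each 24-hour period, with results exported for later analysis), the sketch below picks random times within each day, runs a measurement at each, and appends the result as newline-delimited json, a format that bigquery can load directly. the output file name is hypothetical and the measurement callable is assumed to be something like run_ookla_style_test() from the previous sketch; the production devices used the murakami runner and exported archives to google cloud storage instead.

```python
# hedged sketch of randomized daily test scheduling; not the murakami code.
import json
import random
import time
from datetime import datetime, timedelta, timezone

TESTS_PER_DAY = 6                      # roughly six randomized runs per day
RESULTS_FILE = "measurements.jsonl"    # hypothetical local output path

def schedule_for_day(day_start, tests_per_day=TESTS_PER_DAY):
    """return sorted, random run times within one 24-hour window."""
    offsets = sorted(random.uniform(0, 24 * 3600) for _ in range(tests_per_day))
    return [day_start + timedelta(seconds=s) for s in offsets]

def run_forever(measure):
    """measure is a callable returning one result dict,
    e.g., run_ookla_style_test from the previous sketch."""
    while True:
        day_start = datetime.now(timezone.utc)
        for run_at in schedule_for_day(day_start):
            wait = (run_at - datetime.now(timezone.utc)).total_seconds()
            if wait > 0:
                time.sleep(wait)
            record = measure()
            with open(RESULTS_FILE, "a") as f:   # newline-delimited json
                f.write(json.dumps(record) + "\n")
```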
the jump in measured throughput from ndt 7 tests after november 1, 2020 reflected when encrypted ndt 7 tests were disabled and began running unencrypted.25 findings individual libraries’ data data collected at each library is provided in an interactive mlbn datastudio report, along with summary information about the library from the 2019 public libraries survey.26 aggregated download, upload, and latency metrics from measurements conducted at each library can be viewed on page 2, individual library data (see figure 1 for an example).27 https://dataverse.harvard.edu/dataverse/mlbn information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 7 figure 1. individual library data page for andover, massachusetts.28 a unique feature of the report is a map of server locations to which tests were conducted. this feature demonstrates the different topologies of the ookla and m-lab platforms and enables analysis of measurements to specific servers. if a library’s internet service provider (isp) hosts an ookla server, it can be selected to display only measurements of the isp’s network, as shown in figure 2. the federal communications commission (fcc) distinguishes this topology as on-net, when the server and client are both within the same network, in contrast to off-net, where the server and client are in different networks.29 figure 2. individual library data page for clarkston, michigan, merit networks’ nearest ookla server selected.30 information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 8 by selecting all servers, we can observe the wide geographic range of ookla servers used. conversely, when we select one of the ndt tests, we can see that m-lab servers are only hosted in large metropolitan areas, as seen in figure 3. this demonstrates key differences in the server locations of these two measurement platforms and how the data from each relates to the fcc’s national broadband standard.31 clarkston, michigan—all ookla servers selected32 clarkston, michigan—all m-lab servers selected33 figure 3. test server locations for clarkston, michigan—all ookla and m-lab servers selected. additional aggregate speeds are provided on the individual library data page to communicate general measurement trends over time: maximum upload and download speed by month, day, hour, and weekday (see figures 4–7). this allows a library to confirm advertised speeds, as seen in figure 4 where the connection at clarkston, michigan, was measured consistently at just under 100 mbps symmetric download and upload. we also observe where measurements are not always consistent, as seen below in figures 5 and 7. in figure 5 we observe a dip in the upload median for westchester county, new york, in late june 2020, and a drop in upload median in late october 2020. with additional information, a library could correlate these observations with network outages, service changes, or network management changes. for example, the change in october 2020 could have been a network management change or service change to 200 mbps symmetric. in figure 7 we observe a trend that many librarians will recognize: a slight dip in median speeds over the peak hours of use. information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 9 figure 4. max speeds by month—clarkston, michigan.34 figure 5. 
daily aggregate speeds—westchester county, new york.35 information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 10 figure 6. weekday aggregate speeds—pasco county, florida.36 figure 7. hourly aggregate speeds—estherville, iowa.37 with additional local knowledge about network use, conditions, and events, library and it staff can use ongoing measurements to confirm and explain service changes or uncover issues that are not previously known. many mlbn libraries sought ongoing measurements of internet service to confirm service delivery levels, and some shared their expected service speeds in our final program evaluation survey. the list of libraries that shared their expected service levels are listed at the end of this paper. using these reported speeds as an example, we can observe where the overall measured speeds were consistent with the service levels and where they were not, using the ookla multi-stream measurements. bennington (vt) free library reported a 100 mbps symmetric connection as their expected service level, and the monthly maximum speeds range between 93 and 98 mbps.38 similar results were seen in live oak, georgia; monroe county, michigan; and sheridan, arkansas.39 in other cases, the reported service levels did not match the measurements. in pasco county, florida, measurements indicate a 50 mbps symmetric connection where the reported service level was 100 mbps download and 25 mbps upload.40 and in ventura county, california, our information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 11 measurements confirm an ~300 mbps symmetric connection but the reported service level was 1 gbps symmetric. finally, in several cases measurements may confirm anomalies or changes in the library’s internet service. in these examples we do not have local knowledge of changes in service or events that might explain anomalies observed in the data, but we can nonetheless observe that a change happened and make an inference about the causes. some examples include: • graham county, arizona—possible service delivery change in may 2020 from ~100/10 (download/upload mbps) to ~300/3041 • traverse city, michigan—possible service delivery change in january 2021 from ~80/5 to ~300/2042 • waltham, massachusetts—change to symmetric download and upload in november 2020 from ~50/25 to ~50/5043 • truro, massachusetts—observed changes in symmetry of upload and download measurements in june 2020 and march 2021 are perhaps indicators of testing changes in network management to adapt to changing needs44 • westchester county, new york—observed dip in some upload measurements in late june 2020 at specific times of day for unknown reason45 comparing average monthly maximum speeds the final two pages (5 and 6) of our data studio report display the maximum overall speeds and the average monthly maximum speeds measured for each library, filterable by imls code, access media, and type of isp.46 figure 8 shows a report for the average maximum speeds per test at libraries connected with fiber. figure 8. average maximum speeds per test measured at mlbn libraries connected with fiber. 
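for readers who want to reproduce these aggregations outside of bigquery and data studio, the following is a small pandas sketch of the same logic: maximum measured download and upload per library, test type, and month; the average of those monthly maxima; and median speeds by hour of day, which is useful for spotting the peak-hour dips noted above. the csv file and column names are assumptions for illustration, not the mlbn schema.

```python
# hedged sketch of the aggregations described above, assuming per-test results
# exported to a csv with (hypothetical) columns: library, test_name, timestamp,
# download_mbps, upload_mbps, latency_ms.
import pandas as pd

df = pd.read_csv("mlbn_results.csv", parse_dates=["timestamp"])
df["month"] = df["timestamp"].dt.to_period("M")
df["hour"] = df["timestamp"].dt.hour

# maximum measured speeds per library, test type, and month
monthly_max = (
    df.groupby(["library", "test_name", "month"])[["download_mbps", "upload_mbps"]]
      .max()
      .reset_index()
)

# average of those monthly maxima per library and test type
avg_monthly_max = (
    monthly_max.groupby(["library", "test_name"])[["download_mbps", "upload_mbps"]]
               .mean()
               .reset_index()
)

# median speeds by hour of day, e.g., to spot dips during peak use
hourly_median = (
    df.groupby(["library", "test_name", "hour"])[["download_mbps", "upload_mbps"]]
      .median()
      .reset_index()
)
```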
information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 12 comparison of measurements and related data to support increased understanding of network measurement within the public library community, we also compared measurements from the public libraries that participated in our study to the public data of crowdsourced measurements from the two large scale internet measurement platforms used in our research measurements, ookla and m-lab. we can observe the differences or similarities in measurements between the tests conducted from the libraries’ premise devices and the publicly released data in aggregate for the surrounding county. the weighted average speeds and latency are provided by quarter, since ookla’s public data limits more granularity. figure 9. comparing individual library data to public datasets—twin falls, idaho, 2020 q4.47 on page 4 of the data studio report we can observe whether the measured speeds in mlbn libraries were lower or higher than measurements from the surrounding county (see figure 9), along with the percentage difference between the two sources (see figure 10).48 while these differences are interesting to observe, and in some cases seem quite pronounced, this is not a finding that explains whether libraries are getting better or worse speeds than their communities. this is a coarse comparison that we might think of as a kind of litmus test for further inquiry, rather than findings that tell a definitive story. ookla public data aggregates all measurements from all isps together, while measurements from mlbn libraries are from one. a subscription to ookla speedtest intelligence might enable more direct comparisons on a per isp basis. information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 13 figure 10. mlbn data studio report, page 4—was public data or library data higher or lower? discussion our research builds on the digital inclusion survey that used a version of the ookla test in a supplemental speed test study in which, “libraries were asked to run the speed test when the library was closed, during regular hours of operation, and when usage was light, normal, or heavy, by the librarians’ estimation.”49 the software created by mlbn to collect ongoing, randomized measurements extends the idea of the digital inclusion survey’s supplemental speed test, making it a source of monitoring data that can be correlated with each location’s connection plans, service tiers, costs, and other metadata. this approach aligns with the methods used by the fcc by using a dedicated, on-premise device.50 the mlbn system also goes further, providing a framework for any open-source measurement to be added as an available test. since the conclusion of our research, several new tests have either been added or are being considered. 51 as new measurement tools and analyses are developed by the research community, the mlbn system can incorporate them and bridge network science researchers’ understandings to anchor institutions and the general public. while speed tests have been used in this research and its predecessors, we find there is a gap in public understanding about what these tests can tell us. one important outcome of our study is that understanding the experience of using the internet, measuring it, and regulating it all need additional measurements and approaches that go beyond speed tests alone. 
speed tests and the platforms that support them are very different. internet service plans may focus on upload and download speeds, as does telecommunications regulation at the fcc, but these tests offer only simple and incomplete assessments. the ookla and m-lab platforms provide two different controlled experiments designed to measure internet protocols and performance for different segments of the internet's topology. they both use data generated specifically for the measurement itself. but these tests do not measure our experiences using the internet in general. for that we need network science researchers to develop new measurement methods and analyses. we advocate for even more nuance and additional metrics in the measurement and understanding of internet service beyond speed. this perspective aligns with the researcher and network science community who are designing new measurements to account for user experience, content delivery, and latency issues, all of which are often incorrectly assumed to be measured by speed tests.52

we took the approach of using multiple tests and platforms to provide complementary measurements of different aspects of delivered internet service. if we need to confirm advertised service levels, we can look at the ookla test data. in this analysis, we used the average monthly maximum speeds as measured by the ookla multi-stream test since it was used in the digital inclusion survey and most closely aligns with isps' terms of service.53 m-lab's ndt, on the other hand, provides a diagnostic measurement of how well the underlying tcp protocol is performing over the measured path. that measurement path for m-lab tests always traverses the boundaries between networks. if we want to assess our isp's connectivity to the internet beyond the last-mile network it maintains, we can examine the ndt data. however different the methodologies and measurements of ookla's speed test and m-lab's ndt test are, we agree with the internet measurement community that measurements of throughput or speed should not be the only focus of assessing a connection's quality, and a broader public understanding that includes other measurements and nuances needs to be cultivated.54 researchers' understanding of the usefulness of both ookla and ndt measurements is not static, as recent analyses of their public datasets have shown.55 the research community is also exploring new metrics derived from various data sources that may eventually provide analyses that speak to the user experience of using the internet, as well as other technical factors that can influence performance such as latency, bufferbloat, and responsiveness.56

what does this all mean for public libraries?

building on the speed test used in the digital inclusion survey, the mlbn measurement system enables communities to collect ongoing measurements using a dedicated premise device, leveraging available open-source measurement tools, instead of running periodic or occasional tests. using this longitudinal data, libraries can confirm expected service levels using ookla test results, uncover where there is a mismatch in understanding of service levels or network management practice, or compare a library's measured service to the surrounding community using public data sources. additional measurements like m-lab's ndt can assess a library's connectivity beyond the isp's network.
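the comparison with the surrounding community mentioned above can also be sketched. this is an illustration rather than the published report's code: the file names, the column names (quarter, download_mbps, county_download_mbps), and the use of a quarterly median as the library-side summary are assumptions made for the example.

import pandas as pd

library = pd.read_csv("library_ookla_results.csv", parse_dates=["timestamp"])
county = pd.read_csv("county_public_aggregates.csv")  # assumed: one row per quarter, e.g., "2020Q4"

# quarterly median of the library's own download measurements
lib_quarterly = (
    library.assign(quarter=library["timestamp"].dt.to_period("Q").astype(str))
           .groupby("quarter")["download_mbps"]
           .median()
           .rename("library_download_mbps")
           .to_frame()
)

merged = county.merge(lib_quarterly, left_on="quarter", right_index=True)
merged["pct_difference"] = (
    100 * (merged["library_download_mbps"] - merged["county_download_mbps"])
    / merged["county_download_mbps"]
)
print(merged[["quarter", "library_download_mbps",
              "county_download_mbps", "pct_difference"]])

as noted earlier, the resulting percentage difference is best read as a prompt for further inquiry rather than as evidence that a library's service is better or worse than its community's.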
the resulting measurements are useful for interrogating the differences in platforms, their tests, and regulatory or funding benchmarks. but while speed tests can provide useful metrics for understanding general trends and anomalies, an appropriate understanding of what they do and do not measure is also needed. speed tests that demonstrate advertised levels do not necessarily mean that users of that network will not experience slowness as content is delivered to their computers over the same connection. as the federal government prepares to distribute infrastructure dollars to states to improve internet access and service quality, libraries and other public institutions in those states will need specific data and an understanding of its nuances and differences.

conclusion

in this paper, we sought to promote greater understanding about the speeds and quality of service of public library internet connections, an understudied area within library and information science, as well as among broadband policymakers. library staff and administrators need information to understand and communicate about a library's network capacities, management practices, and diagnostic or monitoring information. the availability of measurements from different sources can help build shared understanding about a library's internet connectivity between library staff and it or network administration personnel. and while speed tests are admittedly limited in what they can tell us about internet capacity, library staff who have access to these types of measurements, as well as other information provided by a library's it staff, would be better equipped to engage with patrons around questions of internet stability and capacity in support of the library's mission.

from our observations of the data collected, the mlbn measurement system can be used to enhance understanding of the library's internet service and network management. for the subscribed service levels identified at mlbn libraries, ookla measurements show whether the library's measured connection speeds matched the expected service levels and when they did not. when measurements are not consistent with expected service levels, libraries can observe the differences and correlate them with additional local information. ongoing measurements conducted by the library enable local control and monitoring of this vital service and support critique and interrogation of the differences between internet measurement platforms, their topologies, tests, and data, from the perspective of the library doing the measurement. speed tests are useful for examining these trends but may not always be indicative of a user's experience accessing and using internet content and services. new research and leadership from the internet measurement community are needed to provide more nuanced and authentic assessments of both network performance and user experience. emerging research and analyses published openly can be added to the mlbn system to support increased public understanding of internet connection quality and user experience.
we hope this paper and our research will help support public libraries interested in ongoing measurement and assessment of their internet service, as well as contribute to discussions among state and federal policymakers interested in better understanding the key role public libraries play in their local digital equity ecosystems.

acknowledgments

funding statement
this work was supported by an award (#lg-71-18-0110-18) from the us institute of museum and library services national leadership grants for libraries program.

data accessibility
the datasets supporting this article have been uploaded to the harvard dataverse, located here: https://dataverse.harvard.edu/dataverse/mlbn

appendix a: all participating libraries

table a.1. the final cohort of participating libraries, the specific branch where measurements were conducted, the city, state, and the library's imls code

library | branch (if applicable) | city | state | imls region code
andover memorial hall library | — | andover | ma | 21 suburb, large
arkansas river valley regional library | arkansas river valley | dardanelle | ar | 33 town, remote
avery mitchell yancy (amy) regional library | avery morrison library | newland | nc | 42 rural, distant
bennington free library | — | bennington | vt | 32 town, distant
caruthersville public library | — | caruthersville | mo | 33 town, remote
cherokee public library | — | cherokee | ia | 33 town, remote
clarkston independence district library | main branch | clarkston | mi | 21 suburb, large
cochise county library district | elfrida | elfrida | az | 32 town, distant
denver public library | central library | denver | co | 11 city, large
estherville public library | — | estherville | ia | 33 town, remote
mid arkansas regional library system | grant county library | sheridan | ar | 33 town, remote
caswell county library | gunn memorial public library | yanceyville | nc | 42 rural, distant
hall county library system | gainesville | gainesville | ga | 13 city, small
hollis public library | — | hollis | ak | 43 rural, remote
live oak public libraries | bull street library | savannah | ga | 13 city, small
monroe county library system | bedford branch library | temperance | mi | 21 suburb, large
multnomah county library | st. johns branch | portland | or | 12 city, midsize
pasco county library | regency park library | new port richey | fl | 21 suburb, large
pryor public library | — | pryor | ok | 32 town, distant
union county library system | the public library for union county | lewisburg | pa | 32 town, distant
safford city-graham county library | — | safford | az | 32 town, distant
saline county library | — | benton | ar | 33 town, remote
saint paul public library | rondo branch, central branch | saint paul | mn | 11 city, large
the ferguson library | — | stamford | ct | 12 city, midsize
traverse area district library | kingsley branch library | kingsley | mi | 21 suburb, large
truro public library | — | truro | ma | 21 suburb, large
twin falls public library | — | twin falls | id | 33 town, remote
ventura county public library | ep foster branch, admin branch | ventura | ca | 21 suburb, large
waltham public library | main library | waltham | ma | 21 suburb, large
westchester county public library | hendrick hudson free library, library system datacenter | montrose | ny | 21 suburb, large

appendix b: final program evaluation survey

table a.2. final evaluation responses on internet connectivity and network management

library | survey respondent role(s) | service tier from survey (download/upload) | per device limit imposed
bennington free library | library staff, it staff | 100/100 | —
live oak public libraries | it staff | 300/300 | —
monroe county library system | network administrator | 50/20 | —
pasco county library | network administrator | 100/25 | —
grant county library | library administrator | 15/15 | —
public library for union county | it staff | 10 mb/s | —
ventura county public library | it staff | 1000/1000 | —
waltham public library | library staff, it staff | 100/100 | 50 mb/s

endnotes

1 the editorial board, "doing schoolwork in the parking lot is not a solution," the new york times, july 18, 2020, https://www.nytimes.com/2020/07/18/opinion/sunday/broadband-internet-access-civil-rights.html; kathleen gray, "these buses bring school to students," the new york times, december 17, 2020, https://www.nytimes.com/interactive/2020/12/17/us/school-bus-remote-learning-wifi.html; cecilia kang, "parking lots have become a digital lifeline," the new york times, may 5, 2020, https://www.nytimes.com/2020/05/05/technology/parking-lots-wifi-coronavirus.html; dan levin, "in rural 'dead zones,' school comes on a flash drive," the new york times, november 13, 2020, https://www.nytimes.com/2020/11/13/us/wifi-dead-zones-schools.html.

2 institute of museum and library services, "lg-71-18-0110-18," accessed august 25, 2021, https://www.imls.gov/grants/awarded/lg-71-18-0110-18-0.

3 john carlo bertot, brian real, and paul t. jaeger, "public libraries building digital inclusive communities: data and findings from the 2013 digital inclusion survey," library quarterly 86, no. 3 (2016): 270–89, https://doi.org/10.1086/686674; donna schenck-hamlin, soo-hye han, and bill schenck-hamlin, "library-led forums on broadband: an inquiry into public deliberation," library quarterly 84, no. 3 (july 2014): 278–93; sharon strover, brian whitacre, colin rhinesmith, and alexis schrubbe, "the digital inclusion role of rural libraries: social inequities through space and place," media, culture & society 42, no. 2 (2020), https://doi.org/10.1177/0163443719853504.

4 schenck-hamlin, han, and schenck-hamlin, "library-led forums on broadband," 280.
5 bertot, real, and jaeger, "public libraries building digital inclusive communities," 271.

6 schenck-hamlin, han, and schenck-hamlin, "library-led forums on broadband."

7 pamela b. deguzman, neha jain, and christine g. loureiro, "public libraries as partners in telemedicine delivery: a review and research agenda," public library quarterly 41, no. 3 (may 2022): 294–304.

8 paul t. jaeger, john carlo bertot, kim m. thompson, sarah m. katz, and elizabeth j. decoster, "the intersection of public policy and public access: digital divides, digital literacy, digital inclusion, and public libraries," public library quarterly 31, no. 1 (january 2012): 4.

9 bertot, real, and jaeger, "public libraries building digital inclusive communities," 276.

10 colin rhinesmith et al., "co-designing an open source broadband measurement system with public libraries," in eds. larry stillman, misita anwar, colin rhinesmith, and vanessa rhinesmith, proceedings—17th cirn conference 6-8 november 2019, monash university prato centre, italy: "whose agenda: action, research, & politics" (department of human centred computing, monash university, 2020): 153–76, https://www.researchgate.net/publication/341882544_co-designing_an_open_source_broadband_measurement_system_with_public_libraries.

11 john carlo bertot and charles r. mcclure, "assessing sufficiency and quality of bandwidth for public libraries," information technology and libraries 26, no. 1 (march 2007): 14–22; lauren h. mandel, bradley w. bishop, charles r. mcclure, john carlo bertot, and paul t. jaeger, "broadband for public libraries: importance, issues, and research needs," government information quarterly 27, no. 3 (january 1, 2010): 280–91.

12 mandel, bishop, mcclure, bertot, and jaeger, "broadband for public libraries," 388.

13 douglas schuler and aki namioka, eds., participatory design: principles and practices (hillsdale, nj: lawrence erlbaum associates, inc., 1993).

14 rhinesmith et al., "co-designing."

15 measurement lab, "m-lab/murakami: run automated internet measurement tests in a docker container" (2021), https://github.com/m-lab/murakami/.
16 measurement lab, "m-lab/ndt5-client-go: ndt5 reference client implementation in go" (2021), https://github.com/m-lab/ndt5-client-go; sivel, "sivel/speedtest-cli: command line interface for testing internet bandwidth using speedtest.net" (2021), https://github.com/sivel/speedtest-cli.

17 measurement lab, "ndt (network diagnostic tool)," https://www.measurementlab.net/tests/ndt/.

18 lai yi ohlsen, matt mathis, and stephen soltesz, "evolution of ndt," measurement lab (blog), august 5, 2020, https://www.measurementlab.net/blog/evolution-of-ndt/.

19 ookla, "speedtest," https://www.speedtest.net/.

20 measurement lab, "where are m-lab servers hosted?", https://support.measurementlab.net/help/en-us/9-platform/2-where-are-m-lab-servers-hosted.

21 mlbn data studio report, page 1—overview of mlbn libraries, https://datastudio.google.com/u/0/reporting/0dff817b-0e0e-446e-a3b3-406121291124/page/gxxib.

22 institute of museum and library services, "public libraries survey," https://www.imls.gov/research-evaluation/data-collection/public-libraries-survey.

23 ookla, "ookla's open data initiative," https://www.ookla.com/ookla-for-good/open-data; measurement lab, "data overview," https://www.measurementlab.net/data/.

24 measurement lab, "detect cpu capabilities and set scheme accordingly by robertodauria · pull request #62 · m-lab/ndt7-client-go" (2021), https://github.com/m-lab/ndt7-client-go/pull/62; measurement lab, "m-lab/ndt7-client-go: ndt7 reference client implementation in go" (2021), https://github.com/m-lab/ndt7-client-go.

25 measurement lab, "add to ndt7 runner to force all tests to be non-tls" (2020), https://github.com/m-lab/murakami/commit/3770d6b63ebd9ad62b3754e0642bd0e9216e171e.

26 mlbn data studio report, page 1—overview of mlbn libraries, https://datastudio.google.com/s/ghtw2gm-vfi.

27 mlbn data studio report, page 2—individual library data, https://datastudio.google.com/s/kppdgf3i8c4.

28 mlbn data studio report page 2—individual library data page, andover, massachusetts, https://datastudio.google.com/s/unpy5z1m21g.

29 fcc office of engineering and technology, "technical appendix to the tenth mba report" (n.d.), 24, accessed june 22, 2021, http://data.fcc.gov/download/measuring-broadband-america/2020/technical-appendix-fixed-2020.pdf.

30 mlbn data studio report page 2—individual library data for clarkston, michigan, merit networks' nearest ookla server selected, https://datastudio.google.com/s/pqpcybmhaj8.
31 ookla, “the speedtest server network,” accessed august 16, 2021, https://www.ookla.com/speedtest-servers; m-lab, “ndt data in ntia indicators of broadband need,” accessed february 15, 2022, https://www.measurementlab.net/blog/ntia/. 32 mlbn data studio report page 2—clarkston, michigan—all ookla servers selected, https://datastudio.google.com/s/oilgppl47rc. 33 mlbn data studio report page 2—clarkston, michigan—all m-lab servers selected, https://datastudio.google.com/s/j79kg_wmyrc. 34 mlbn data studio report page 2—max speeds by month—clarkston, michigan https://datastudio.google.com/s/hyljk_mpvl4. 35 mlbn data studio report page 2—daily speeds—westchester county, ny https://datastudio.google.com/s/ixgn8v4r3ia. 36 mlbn data studio report page 2—weekday aggregate speeds—pasco county, fl https://datastudio.google.com/s/gadtydi2g9k. 37 mlbn data studio report—hourly speeds—estherville, ia https://datastudio.google.com/s/n5pe2y9rjyg. 38 mlbn data studio report page 2—individual library data (bennington, vt and speedtestmulti-stream selected) https://datastudio.google.com/s/njreltxejge. 39 mlbn data studio report page 2—individual library data—ookla multi-stream measurements for live oak, ga, https://datastudio.google.com/s/kild8kkltza; mlbn data studio report page 2—individual library data—ookla multi-stream measurements for monroe county, mi, https://datastudio.google.com/s/p140swjihjm; mlbn data studio report page 2—individual https://github.com/m-lab/murakami/commit/3770d6b63ebd9ad62b3754e0642bd0e9216e171e https://github.com/m-lab/murakami/commit/3770d6b63ebd9ad62b3754e0642bd0e9216e171e https://datastudio.google.com/s/ghtw2gm-vfi https://datastudio.google.com/s/kppdgf3i8c4 https://datastudio.google.com/s/unpy5z1m21g http://data.fcc.gov/download/measuring-broadband-america/2020/technical-appendix-fixed-2020.pdf http://data.fcc.gov/download/measuring-broadband-america/2020/technical-appendix-fixed-2020.pdf https://datastudio.google.com/s/pqpcybmhaj8 https://www.ookla.com/speedtest-servers https://www.measurementlab.net/blog/ntia/ https://datastudio.google.com/s/oilgppl47rc https://datastudio.google.com/s/j79kg_wmyrc https://datastudio.google.com/s/hyljk_mpvl4 https://datastudio.google.com/s/ixgn8v4r3ia https://datastudio.google.com/s/gadtydi2g9k https://datastudio.google.com/s/n5pe2y9rjyg https://datastudio.google.com/s/njreltxejge https://datastudio.google.com/s/kild8kkltza https://datastudio.google.com/s/p140swjihjm information technology and libraries september 2022 measuring library broadband networks | ritzo, rhinesmith, and jiang 22 library data—ookla multi-stream measurements for sheridan, ar, https://datastudio.google.com/s/pehrfxosx6s. 40 mlbn data studio report page 2—individual library data—ookla multi-stream measurements for pasco county, fl, https://datastudio.google.com/s/uxgjlfv44ze. 41 mlbn data studio report page 2—individual library data—graham county, az—observing maximum ookla measured speeds by month, https://datastudio.google.com/s/ieal3b2vbu8. 42 mlbn data studio report page 2—individual library data—traverse city, mi – observing maximum ookla measured speeds by month, https://datastudio.google.com/s/tljy900tx-q. 43 mlbn data studio report page 2—individual library data—waltham, ma—observing maximum ookla measured speeds by month, https://datastudio.google.com/s/pxzdrb0chb0. 44 mlbn data studio report page 2—individual library data—truro, ma—observing maximum ookla measured speeds by month, https://datastudio.google.com/s/okcwp-_f6qs. 
45 mlbn data studio report page 2—individual library data—westchester county, ny—observing daily aggregate speeds between june 17-29, 2020 and hourly aggregate speeds for tests in the 8 a.m., 10 a.m., 3 p.m., and 10 p.m. columns, https://datastudio.google.com/s/vleg-ynkoqc.

46 mlbn data studio report page 5—comparison of overall maximum speeds by test (fiber access media selected), https://datastudio.google.com/s/ii2f7onuchi; mlbn data studio report page 6—average monthly maximum speeds per test (all libraries selected), https://datastudio.google.com/s/oo7xal_kv4k.

47 mlbn data studio report page 3—comparing individual library data to public datasets—twin falls, id, 2020 q4, https://datastudio.google.com/s/vqq38wyraho.

48 mlbn data studio report page 4—was public data or library data higher or lower?, https://datastudio.google.com/s/meb4s-cflcw.

49 bertot, real, and jaeger, "public libraries building digital inclusive communities," 271; american library association, "library broadband speed test shows increased capacity; room still for improvement" (press release), https://www.ala.org/news/press-releases/2015/04/library-broadband-speed-test-shows-increased-capacity-room-still-improvement.

50 federal communications commission, "measuring broadband america," https://www.fcc.gov/general/measuring-broadband-america.

51 measurement lab, "add new runner for ooniprobe-cli," https://github.com/m-lab/murakami/pull/103; measurement lab, "add fast.com test runner," https://github.com/m-lab/murakami/issues/48; "data science institute," university of chicago, https://cdac.uchicago.edu/; university of chicago data science institute, "netrics—active measurements of internet performance," https://github.com/chicago-cdac/nm-exp-active-netrics/.

52 internet architecture board, "measuring network quality for end-users, 2021," https://www.iab.org/activities/workshops/network-quality/; david d. clark and sara wedeman, "measurement, meaning and purpose: exploring the m-lab ndt dataset" (august 2, 2021), https://ssrn.com/abstract=3898339 or http://dx.doi.org/10.2139/ssrn.3898339.
53 bertot, real, and jaeger, "public libraries building digital inclusive communities."

54 internet architecture board, "measuring network quality for end-users, 2021" (call for papers), accessed august 26, 2021, https://www.iab.org/activities/workshops/network-quality/; lai yi ohlsen and chris ritzo, "ndt data in ntia indicators of broadband need," measurement lab (blog), july 15, 2021, https://www.measurementlab.net/blog/ntia/.

55 clark and wedeman, "measurement, meaning and purpose."

56 lai yi ohlsen, "m-lab research fellows – sprint 2022," measurement lab (blog), january 13, 2022, https://www.measurementlab.net/blog/research-fellow-announcement/; lai yi ohlsen, "upcoming m-lab community call discussing latency, bufferbloat, responsiveness," measurement lab (blog), august 18, 2021, https://www.measurementlab.net/blog/community-call-announcement/; broadband internet technical advisory group (bitag), "latency explained," https://bitag.org/latency-explained.php; internet architecture board, "measuring network quality for end-users 2021," https://www.iab.org/activities/workshops/network-quality/; caida, "nsf workshop on overcoming measurement barriers to internet research (wombir 2021)," https://www.caida.org/workshops/wombir/2101/; caida, "2nd nsf workshop on overcoming measurement barriers to internet research (wombir-2)," https://www.caida.org/workshops/wombir/2104/; ietf, "responsiveness under working conditions" (draft), https://datatracker.ietf.org/doc/draft-cpaasch-ippm-responsiveness/.

article

black, white, and grey: the wicked problem of virtual reality in libraries

gillian d. ellern and laura cruz

information technology and libraries | december 2021 https://doi.org/10.6017/ital.v40i4.12915

gillian d. ellern (ellern@email.wcu.edu) is associate professor and systems librarian, hunter library, western carolina university. laura cruz (lxc601@psu.edu) is associate research professor, schreyer institute for teaching excellence, the pennsylvania state university. © 2021.
abstract

this study seeks to extend wicked problems analysis within the context of a library's support for virtual reality (vr) and the related extended reality (xr) emerging technologies. the researchers conducted 11 interviews with 13 librarians, embedded it staff, and/or faculty members who were involved in administering, managing, or planning a virtual reality lab or classroom in a library (or similar unit) in a higher education setting. the qualitative analysis of the interviews identified clusters of challenges, which are categorized as either emergent (but solvable) such as portability and training; complicated (but possible) such as licensing and ethics; and/or wicked (but tameable). the respondents framed their role in supporting the wickedness of vr/xr in three basic ways: library as gateway, library as learning partner, and library as maker. five taming strategies were suggested from this research to help librarians wrestle with these challenges of advocating for a vision of vr/xr on their respective campuses. this research also hints at a larger role for librarians in the research of technology diffusion and what that might mean to their role in higher education in the future.

introduction

political scientists horst rittel and melvin webber coined the term "wicked problems" in the early 1970s to refer to problems that were sufficiently complex that they defied conventional problem-solving methods.1 initially framed as broad social problems, such as food security or climate change, wicked problems are characterized by having ambiguous parameters, shifting requirements and/or stakeholders, and, perhaps more importantly, "no determinable stopping point."2 such problems are called wicked because they are "diabolical, in that they resist the usual attempts to resolve them."3 without the possibility of a clear solution, the end product of wicked problems analysis is not to solve the problem but rather to find ways to "tame" it, an approach which runs counter to conventional models of not only planning but also reasoning.4 if taming is the last step in wicked problems analysis, a critical first step is to determine if a given challenge is, in fact, wicked, as that will then determine what tools, perspectives, and strategies will need to be brought to the table. simple problems can be resolved by matching them to known solutions, more complicated problems may be addressed by analyzing engineering solutions, but super complex/messy/wicked problems require an entirely different mindset.5 persistent frustration with the limitations of conventional problem-solving models has led to a proliferation of studies identifying a host of wicked problems, ranging from the global (covid-19 response) to the local (dysfunctional families).6 the present study seeks to apply the framework of wicked problem analysis to the question of the role of academic libraries in supporting emerging technologies, using the integration of vr/xr as a case study.

literature review

the wicked problem of libraries and technology has been recognized by a number of scholars, each using a different frame of reference, as perhaps fits the inherent ambiguity of a wicked problem. scholars have identified electronic data management, research data management, and ebooks as library problems that are wicked in nature, and howley notes that the question of public access touches on larger social issues that could be described as wicked.7
a recent article by williams and willet identifies makerspace technology as boundary work, suggesting that it challenges conventional roles and relationships held by libraries and librarians, an approach which implies the existence of a wicked problem.8 despite these exceptions, at least one set of library scholars has noted that "there are very few applications [of wicked problems] in librarianship."9 the present study seeks to make the case that the application of the wicked problems framework to the question of the role of libraries in emerging technology can illuminate new strategies, roles, and pathways forward.

while research on wicked problems in libraries may be limited, the role of libraries in the curation, development, and dissemination of virtual reality (vr)—or, using the more encompassing term, extended reality (xr)—has been extensively written about by library scholars. although it could be argued that the current output reflects the nascent stages of vr/xr as a research field, as scholars explore a library's role with virtual reality (vr), mixed reality (mr), augmented reality (ar), and everything associated with them such as virtual worlds or 3d 360-degree videos, it is clear that, to date, the published works about vr/xr largely fall into two camps: the visionary and the applied. the former contains studies advocating for the integration of vr/xr (and related technologies) as part of a vision of the future for libraries; the latter are applied studies that booth labels as "technorealistic."10 in other words, these are descriptions of established practice or suggestions of practical strategies for how a library (or librarian) can actually implement a vr/xr lab or related program.11 what remains in shorter supply are critical and/or empirical studies that situate the development of vr/xr as an institutional capacity within larger, arguably wicked, questions of the evolving purpose and position of libraries.

the case of vr/xr presents a distinctive perspective on the wicked problem of the technological orientation of academic libraries. unlike issues such as electronic records management, vr/xr is not part of the core technological infrastructure of a library, nor does it touch directly upon prior core administrative functions, such as collection development or access services. rather, it is perceived as an extension of library services, particularly those related to the evolving educational mission of the academic library and its role as a broader facilitator of information literacy across disciplines. as one library scholar remarks:

as libraries are increasingly called upon to support knowledge exchange beyond traditional books and journals, the creation of novel types of research infrastructure will shape the preservation and access expectations of constituents.12

the present study looks at how librarians navigate, or tame, the myriad of challenges that arise not just from rethinking how an academic library engages with technology, but from pushing the boundaries of what library work is (or could be).

as the emergence of vr/xr technology begins to cast a larger shadow over higher education, many librarians have argued that academic libraries associated with institutions with high research activity are especially well situated to take on a leadership role, an opportunity that they had largely missed with recent related technologies such as 3d printing.
not wanting to be left behind, these libraries have embraced vr/xr technology at a relatively rapid rate. a recent (unpublished) study noted that in 2015, only 3% (n=4) out of the 125 sampled research universities had a vr/xr presence; by 2020, that percentage increased to 66% (n=77), a rate which appears to be outstripping that of technology competitors such as gis, institutional repositories, and data visualization services.13 given the relatively high up-front resource investment required to support vr/xr, it would appear that many university libraries are doubling down on the prospect that vr/xr will be an integral part of their future. the degree to which the rapid adoption of vr/xr will live up to its promise remains to be seen, but the present study seeks to illuminate how current librarians are seeking to tame this potentially savage beast.

methodology

this irb-approved study is based on the qualitative analysis of eleven interviews with thirteen librarians (8), embedded it staff (3), or faculty members (2), all of whom were involved with the adoption of vr/xr technology at their respective libraries. the inclusion criteria for the study were described in the consent document as those people "currently involved in administering, managing, or planning a virtual reality lab or classroom in a library (or similar unit) in a higher education setting." to identify potential participants, the researchers conducted a web search using the terms "library" and "virtual reality" or "vr" and then utilized a snowball sampling method to generate a list of potential interviewees that included multiple library types (e.g., academic research libraries [arls], public libraries) as well as institutional types (e.g., community colleges). one large library had multiple participants, including one librarian and two support staff responsible for the vr room. taken collectively, these participants' institutions included community colleges (3), public libraries (2), medical libraries (4), and academic research libraries (4), located in either the united states (10) or canada (1). the pool of us educational institutions (10) represented five different carnegie classifications: associate's colleges, doctoral universities, doctoral/professional universities, master's colleges & universities, and special focus four-year institutions. these comprised a mix of small-, medium-, and large-sized institutions (by full-time enrollment, or fte). all the organizations in this study (11) were public institutions.

each interviewee received a copy of the possible interview topics in advance, including a list of potential challenges faced by libraries seeking to integrate vr/xr (see appendix a). the list of challenges was crafted from a literature search, as well as the personal experience of one of the researchers, a librarian who oversees a vr lab. each hour-long, semi-structured interview was conducted via zoom, machine transcribed with kaltura, and further edited manually by the researchers. the transcripts then underwent three rounds of coding. first, the researchers independently reviewed the body of transcripts in their entirety and identified emergent themes. in the second round of coding, potential themes were merged into semi-structured coding guidelines, which were used to code each interview separately. in the third and final round, the themes were re-evaluated and adjusted based on feedback from the previous rounds, leading to the identification of a problem-based typology (emergent, complex, wicked).
from our process, we gained insight into a myriad of challenges facing libraries as they work to integrate vr/xr into the work that they do. that insight has, in turn, led to the development of a conceptual framework that we believe will be useful to others seeking to wrestle with these challenges in the future.

table 1. equipment, staffing, and funding for vr/xr spaces in participating libraries

location of vr service in library | number of pcs connected | number of mobile headsets | types of pc headsets | types of mobile headsets | staffing | one time funding | continuing funding
room | 2 | 10 | htc (vive and pro) | oculus go, spectra vr | 2 staff | yes | no
entrance | 1 | 0 | oculus rift sv | — | 2 staff | yes | as needed
room | 4 | 8 | htc vive pro | oculus quest and microsoft hololens | 1 staff, 3 students | yes | no, but planned
room | 5 | 1 | oculus rift s and htc vive pro | oculus quest | 3 staff, 3 students | yes | as needed
room, mobile vr | 1 | 3 in circ, several in office | htc vive cosmos | oculus quest, oculus go, samsung odyssey, lenovo mirage solo, hololens, playstation vr and google cardboard | 2 staff, 8 students | no | as needed
mobile vr, entrance | 2 | 6 | htc vive, oculus rift | oculus go, oculus quest | all circ staff (2/3 per shift) | yes | no
room | 3 | 7 | oculus rift, htc vive pro, htc vive standard | google cardboard, insignia vr viewers | 2 staff | yes | no, but planned
mobile vr, entrance | 4 | 30 | htc vive, oculus rift | oculus quest, google cardboard or plastic viewers | circ staff at each of 4 locations | no | yes

results

library vr/xr spaces

even within our relatively small sample of institutions, we found a fairly wide range of practice regarding vr/xr library labs, with considerable variance in location, number, and manufacturer of headsets, staffing, and funding, as seen in table 1.

challenges

through our coding process, we identified clusters of those challenges, which we categorized as either emergent (but solvable), complicated (but possible), and/or wicked (but tameable).

emergent challenges

our respondents identified a number of challenges that are frequently associated with the adoption of emergent technology, regardless of who is choosing to adopt it or what they are choosing to adopt. in other words, any person or place adopting xr (or other types of emergent technology) at this stage of its development is likely to run into similar issues.

portability and mobility

portability (or lack thereof) was frequently referenced as a limitation of the current technology. the most common headsets purchased for the first generation of library vr lab spaces have physical cords and sensors that have to be plugged in (to high-performing computers) during use. one intrepid librarian even described carting around her bulky alienware desktop computers and video displays between campuses, but needing to find a better way because, "it made the computer folks very angry because it's so delicate and our sidewalks are so bumpy." she now uses an alienware laptop and some sturdy tripods (for the base stations) on these trips. the lack of portability not only limited the ability of libraries to take vr/xr out of the library for events and in-class presentations, but it also exacerbated existing space constraints, with users having to be literally tethered to cpus, screens, and base stations.
as one of our respondents put it, "the biggest issue is that it's in one place and it's stuck there." in this case, manufacturers are aware of the limitations on mobility, and it appears as though wireless headsets will be the next wave of adaptation by the industry. several wireless headsets have already come and gone, as vendors continue to work to overcome both technological and human-centered challenges. the oculus go, google cardboard, and google daydream have all been brought to the market and subsequently been discontinued.14 only one of the libraries we spoke to indicated that they had purchased a wireless headset, and that headset (the oculus go) turned out to be of limited utility. while this next generation of headsets will likely solve a number of operability issues, it also has the potential to compound another challenge noted by most of our respondents, i.e., a lack of sustainable funding for equipment refresh. the majority of our respondents (6 of 8, or 75%) indicated that they purchased their equipment through one-time funding sources, whether internal or external grants (n=6), end-of-year funds, or some combination of these.

vr/xr training

vr/xr experience remains new to most people outside of the gaming world, so it has fallen largely on librarians to develop introductory training protocols at the level of access to the technology. there are distinctive challenges in introducing vr/xr to a broader audience. some of those challenges may be physical. in its earlier stages, a number of users experienced symptoms such as nausea or seasickness, and while these have been lessened with higher refresh rates and movable lenses, other virtual reality induced symptoms and effects (vrise) continue to emerge with studies of longer-term use.15 two of our librarians expressed concerns that other long-term effects may still be unknown, and both of the public libraries included in this study banned vr/xr use for patrons under 14 years of age until more is known about how it affects developing brains, a recommendation that is now supported by most vendors as well. even for those who do not suffer from physical symptoms, the technology can be disorienting and uncomfortable. this contributes to higher levels of anxiety, which, somewhat ironically, vr/xr has been shown to alleviate in some clinical trials.16

for these reasons, vr/xr labs require staffing not just to safeguard the equipment and ensure its appropriate use, but also to coach users through their new experience. as one of our respondents described her experience, "a lot of people will put on the vr headset and not move because they're used to computer displays being two-dimensional … it is not common knowledge yet that you can move around and this environment [moves] with you. and they [new users] will just stand there." coaching someone to move around in a virtual reality environment is not a straightforward endeavor either, as one of our librarians relates: "how do you interact with somebody who can't see you in a way that's respectful? because that can be kind of disconcerting if you've got a headset on and all of sudden somebody touches your hand?" one of our respondents drew upon her experience as a swimming coach to develop a set of "non-touching" verbal protocols for her student lab assistants to utilize in working with clients who are new to the interior mobility of virtual worlds.
other challenges identified by our respondents that might fit into the emerging technologies category include the following: liability, aspects of licensing, physical space modifications, room and equipment management, training curriculum, logistics of engaging with multiple users, availability of apps/games, equipment installation, and evaluation procedures. this list could perhaps also include the need to not only educate patrons on what the emerging technology can do but to advocate for its future significance. as one of our respondents stated, "i think you can write about it and speak about as much as you want. it's a matter of getting them in there."

complicated challenges

unlike emergent issues, complicated challenges are unlikely to be resolved without concerted intervention and leadership and, even then, it is possible that a single or clear solution may not be readily identified. challenges that fall into this category may be described as grey areas, in which future directions remain scattered, unclear, or uncertain. embracing these complexities means that libraries looking to adopt vr/xr currently must be willing to venture out on their own, embracing both the opportunities and the risks inherent in forecasting future technology use.

licensing

an example of one of these complicated challenges that emerged from our interviews is the issue of licensing. many vr/xr titles are available for free, through services such as steam and the oculus store. all of our respondents indicated that they acquired content via these services. other popular vr/xr academic titles, such as 3d organon anatomy and google tilt brush, are licensed and potential users must pay a fee to access the full functionality of the tool. the challenge is that these licenses are most often offered on an individual basis ("for home use only"), a reflection of the primary customer base for vr/xr content creators, e.g., gamers. a number of distributors do offer institutional licenses, but these are primarily for use in companies, with a relatively stable and readily identifiable list of employee users or limited number of stations. some vr/xr distributors
our respondents indicated that this oversight may be changing, however, as four of the librarians we interviewed reported that game developers reached out to them and negotiated deals in which libraries would receive equipment in exchange for beta-testing new titles with student populations. that said, awareness does not equate to priority, as one of our respondents noted, “i am concerned that we will be one of the last audiences that get some consideration in terms of the functionality that meets the library’s needs.” even if these issues are resolved in the context of vr/xr specifically, it seems unlikely that the complicated problem of “library as customer” will persist with the advent of new technologies and new technology providers. ethics the challenge of vendor relationships is compounded by other emergent ethical issues surrounding the integration of vr/xr into the library. several of the ethical concerns raised by our interviewees are connected to broader social concerns with technology use, such as issues of privacy and security, and others are related to long-standing ethical debates within libraries, such as the degree to which content should be limited by the library. our interviewees had divided opinions, for example, on whether or not the vr/xr lab should offer games. on one hand, the availability of games brought students to the library and engaged them with the new technology. on the other hand, the provision of games constitutes, for some stakeholders, a potentially significant shift away from an academic or scholarly mission for the library. as one respondent put it, “i can’t say that libraries have traditionally not been a place for people to have fun, but i think that’s something that … rubs some people a little bit the wrong way.” another stated, “my big concern at the beginning was that we would put this in and people would [say] … that’s for video games. why did the library buy video games?” the question of including popular content should be a familiar debate to librarians, but the issue is ratcheted up a notch when engagement may also include actions, such as shooting, that may be especially sensitive for college campuses. as one interviewee reflected, “we are a university in the south. and if you had a bunch of white male students that love to go play this game, is that going to make somebody from another group feel uncomfortable or unwelcome or feel like this is not a space for them?” as this example implies, unlike the often private act of reading, vr/xr experiences often take place in virtual places that are at least quasi-public, a venue for which few ethical precedents exist (yet). conversations on the legal and ethical implications of fully virtual information technology and libraries december 2021 black, white, and grey | ellern and cruz 8 crimes, such as rape or robbery, for example, constitute a lively, but so far unresolved, scholarly conversation.17 wicked problems where the challenges faced by libraries get most complicated, however, is when the integration of vr/xr touch upon the more fundamental question of the appropriate roles for libraries in the digital age. our respondents framed their vr labs and services largely within existing roles, e.g., gateway or learning partner, with some attention to emerging roles, such as maker, but they also acknowledged that this adaptation was awkward, solutions were (often) makeshift, and anomalies persist. 
this suggests the potential for paradigm shifts in the role(s) libraries can play in shaping the intersections of knowledge between the “real” and virtual worlds. library as gateway a number of our respondents connected the library’s adoption of vr/xr technology to its role in providing access to technology for those who may not otherwise have it. this role was especially pronounced in the case of academic libraries located in public universities and public libraries serving a defined community. as one of the respondents described their role, “we’re pleased to have them come and learn how to use these technologies because they’re new and we’re trying to make it more democratized that students can come and use it. they don’t have to pay for it. they don’t have to worry about like a lab being locked away from them. they can come in anytime there’s a staff member and use the stuff where here it will provide them tutorials and instruction if they want to use it.” similarly, another respondent stated, “libraries … offer an entry-level kind of way to engage with this technology in a free way where anyone who is even remotely curious, even if it doesn’t have anything to do with … anything academic, can engage with this stuff.” a third respondent stated that the case they made internally (to their library colleagues) was “to explain the importance of the library philosophy of having equitable access to resources … books are a resource, but technology is also a resource.” we have characterized this role as a gateway, rather than strictly as an access issue, because it also encompasses a vision of a pathway, one which starts in the library but may continue to other places, whether specialized labs in the discipline, in the workforce, or as part of their everyday lives. as one of our respondents put it, “we’re very much about these technologies. they’re here; they’re coming; they’re going to be a big thing soon. and we want our students to know what they are and be comfortable with them. so, we try to position ourselves as a place where they can start learning.” this gateway function is, however, characterized by competing stakeholders, both inside and outside of the library—a defining characteristic of wicked problems. this latter is perhaps best illustrated by looking at issues of accessibility. as the statements above attest, librarians see one of their primary service roles as providing access to technologies such as vr/xr to people who might not otherwise have it. that same sentiment, though, can be flipped on its head when taking other aspects of accessibility into consideration. most vr/xr programs are not ada compliant, whether they are being offered in the physical or virtual public spaces of the library. in its current form, vr/xr is an inherently visual technology, so those who are visually impaired cannot utilize it to the same extent as others. most vr/xr programs require physical movements that may not be possible for those with limited mobility. our librarians have created a few hacks, or workarounds, to provide short-term accommodations for individual students (e.g., a verbal narration of visual interactions), but generally speaking, the technology is not fully accessible. information technology and libraries december 2021 black, white, and grey | ellern and cruz 9 library as learning partner several of our respondents indicated that they saw the library’s adoption of vr/xr technology as an extension of their role as partners in the learning enterprise. 
this role could be conceived directly, in that the librarian mediates between classroom needs and available vr/xr titles and capabilities. this form of direct mediation could be responsive, i.e., identifying options in response to requests received, or proactive, i.e., identifying options and then reaching out to faculty who might wish to avail themselves of them. integrating vr/xr material into the library's ils the role is especially critical at this stage of vr/xr development, as none of the libraries we spoke to had integrated the available titles into their online, public-facing catalogs or integrated library system (ils). in other words, if a patron wants to know what titles are available, the best way to find out would be to ask the librarian directly and/or visit the vr/xr lab in person. as one librarian put it, "there's not the infrastructure or the architecture we have around a book. if you were, say, a student in a history class and you wanted to study this thing, there's no way to discover that as part of the more general resources of the library." several of our respondents were developing workarounds, such as libguides and web-based directories, but none of these would be accessible through a general search of the library catalog or citation databases. determining how to catalog and/or curate vr/xr artifacts may be challenging and time-consuming, but it is a problem that has an eventual solution. what is less clear, however, is what the long-term role of the library may be beyond this cataloging function. our respondents consistently indicated that this remains one of the lesser-developed roles for vr/xr in the library, and many identified raising faculty awareness especially as a high priority. while several identified this as essentially a "marketing problem," it would appear that the challenge extends more deeply. many librarians do not have additional degrees in either educational development or instructional design, which encompasses the practice of matching learning outcomes to technology tools. the two most successful examples of matching learning outcomes to library-based vr/xr that we heard of were faculty-driven, one a project to scan actual human body parts for use in a vr setting; the other a criminal justice project related to empathy education using virtual encounters. these kinds of alignment activities can only occur if there is a tool available to match the proposed learning outcome. most of our respondents lamented the limited availability of titles that are appropriate for use in academic settings, so even if awareness were raised, there may not be sufficient content to meet academic needs. as one librarian suggested, "students will say, i've seen the anatomy tool, but right now i'm taking chemistry or i'm taking genetics. do you have anything that will help me with that? i'm a visual learner. i really liked this format. and that's been challenging for [us] because it's so new. there's not a coherence in terms of the titles and subject areas that you get." and another characterized the issue this way, "it's like the bargain video bin at walmart. 
sometimes you have to dig through to find something because it’s just, it’s so new right now.” the issue of availability may seem like an emergent technology issue (as above), but the challenge is further compounded by limitations on capacity, as most library vr labs can only hold one class at a time, and even then, the numbers may be limited, necessitating workarounds such as rotations, remote screen-casting, or extended office hours. even with multiple headsets, most of the time students cannot be in the same virtual reality space together. despite these challenges, information technology and libraries december 2021 black, white, and grey | ellern and cruz 10 many of our respondents were focused on optimizing current capacities, at least in part because of pressure to justify the continued expenditure of both personnel time and equipment costs. this precarious state of affairs is reflective both of tightening university budgets as well as the frequent present of internal sources of resistance from more traditionally-minded colleagues within the library itself (noted tactfully by three of our respondents). bearing all of these factors in mind, it would seem that the question of the long-term sustainability and scalability of vr/xr as a learning service for libraries remains unresolved. library as maker there may be another way to frame vr/xr in the context of libraries. in several cases (n=3), our respondents framed vr/xr not as an extension of classroom-focused service, but rather of support for the research enterprise. as one of our respondents described it, “if they’re still working on a project and they need a thing for this academic project. and then we’re just providing a new way to provide that service, closing some of the research cycle loop, that we’re now part of a different part of that same loop of creating things.” this is a reflection of the changing nature of outputs from scholarly research. previously confined largely to print artifacts, e.g., peer-reviewed journals, researchers are facing an increasing number of choices when it comes to ways to represent the scholarship being created, e.g., knowledge artifacts. this can include artifacts created in, through, or with vr/xr. several of the librarians (n=4) we spoke to mentioned that their vr/xr lab came packaged, in a sense, along with their 3d printing stations. in each case, the librarians noted that the utility of the 3d printers had resonated more readily with library users, and two indicated that they had aspirations to link the two processes in an effort to boost interest in the vr/xr space. for example, one respondent indicated that they wanted users to be able to create an object in a vr/xr program, such as google tilt brush, and then print their creation on an associated 3d printer. libraries have long provided non-3d printing services, largely as ancillary services to support researchers, so this example may, at first glance, appear to be simply a slightly more hightech version of a pre-existing service. these made objects, too, could potentially be stored, cataloged, and disseminated through the library system and/or via a dedicated database such as sketchfab.com. in our interviews, however, the respondents hinted that this linkage (between vr/xr and 3d printing) may actually be a first step towards a more fundamental shift in re-imagining the role of the library vis-à-vis technology. 
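the tilt brush-to-printer aspiration described above is, at a technical level, a mesh-conversion workflow: an object authored in vr is exported as a mesh file, checked for printability, and re-exported in a format a slicer accepts. the sketch below is a minimal illustration of that pipeline, assuming python and the open-source trimesh library; the file names are hypothetical and this is not any respondent's actual setup.

```python
# minimal sketch: prepare a vr-authored mesh (e.g., an obj exported from a
# painting/sculpting app) for 3d printing. assumes the open-source `trimesh`
# library; file names are illustrative only, not a respondent's real workflow.
import trimesh

def vr_export_to_printable_stl(obj_path: str, stl_path: str, max_mm: float = 150.0) -> None:
    mesh = trimesh.load(obj_path, force="mesh")   # collapse any scene into a single mesh
    if not mesh.is_watertight:
        mesh.fill_holes()                         # slicers generally need a closed surface
    # scale the model so its largest dimension fits a (hypothetical) build volume
    largest = float(mesh.bounding_box.extents.max())
    if largest > 0:
        mesh.apply_scale(max_mm / largest)
    mesh.export(stl_path)                         # stl is what most slicers expect

if __name__ == "__main__":
    vr_export_to_printable_stl("student_sketch.obj", "student_sketch.stl")
```

the point of the sketch is simply that the vr-to-printer link the respondents aspired to is a short, automatable step rather than a new service category in itself.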
rather than functioning primarily as service providers, emerging technology librarians have the opportunity to become more active (co-)creators of content and facilitators of change. in one case, the vr/xr lab director, also a faculty member, developed partnerships with strategic programs on campus, such as the office of admissions, to generate original content that was specific to their institution. fortunately, the faculty member was able to draw on coding skills she had gained in prior professional roles. in another case, the library partnered with an external developer to generate original content with direct relevance to the community—a project that served to generate interest in the library, vr/xr, and local issues, all at the same time. there is a fundamental difference between a library hosting a maker space and becoming a maker itself. while librarians have traditionally characterized themselves as facilitators of knowledge rather than knowledge creators, there is some evidence that this shift may not be quite as profound as it might appear. this shift began with libraries and librarians scanning digitized items information technology and libraries december 2021 black, white, and grey | ellern and cruz 11 of their siloed special collections and archives. the resulting databases are often treated as published works in and of themselves with the library acting as curator and publisher. in addition, librarians currently hold faculty rank at many research universities and actively present and publish both in library-focused journals, thematic journals (e.g., information literacy), as well as in other venues, often alongside faculty partners.18 the embedded curricular model places librarians in the role of learning designers and as creators of extended, discipline-specific content. it should be noted, too, that content development is not the only “creator” role available. when you build a knowledge management system (like a library catalog), the choices you make serve not just to organize knowledge, but also, to shape that knowledge and, yes, create physical and cognitive pathways to and through it.19 it is perhaps not a coincidence that identifying pathways has been identified as a signature taming strategy for wicked problems. discussion: taming wicked problems our study frames the adoption of vr/xr technology by academic libraries as embedded in the larger wicked problem of library reinvention in a digital age. that said, one of the fundamental characteristics of a wicked problem is not that it is very difficult to solve, but that it is intrinsically unsolvable (or nearly so). this may explain why the question of libraries and technology seems to be a conversation that never goes away, as the question involves a perpetually moving target, embedded in the ever-shifting social, economic, and political dynamics that are taking place well beyond the walls of any library.20 this characterization does not mean, however, that we should not keep trying a variety of strategies to untangle these wicked knots. taming strategy 1: embracing wickedness in a recent essay about learning in higher education, randy bass characterized the wicked problem designation as potentially liberating, rather than discouraging. 
embracing wickedness serves to move the conversation from thinking of libraries as broken or backward (and therefore, in need of solutions), to a view of the question as a grand challenge, a continual thought experiment that requires ongoing inquiry, thoughtful consideration, and an expansive, rather than reductive, perspective.21 as a grand challenge, the question of libraries and emerging technologies such as vr/xr becomes less of a mad scramble to maintain relevance and more of a scholarly conversation that enhances the role of the library as an inclusive and pluralistic space. in this framework, the questions of whether or not a library should embrace new technology or technology-related service are not bounded by the intrinsic qualities of that technology itself, nor does it mean that libraries everywhere will need to land upon the same, or even similar, technologies, but rather they might seek convergence in the role of libraries as tamers of these wicked problems. taming strategy 2: integrating adaptability the librarians we spoke to generally described their units as falling under the category of “early majority” in roger’s well-known diffusion of technology model, in that they wanted to see evidence that vr/ xr will be useful to others before committing their resources, but they also want to serve a gateway role in introducing promising new technologies to their patrons.22 much of the research on technology diffusion, however, has focused on either end of the curve, i.e., the innovators or non-innovators, and comparatively less research has been done on the role played by those in the middle, such as these libraries.23 by positioning themselves as early majority adopters, academic libraries would potentially be able to articulate a clear and distinct role for themselves vis-à-vis other units within the university that support technology-enabled learning; information technology and libraries december 2021 black, white, and grey | ellern and cruz 12 while also giving themselves the ability to leverage more resources outside of the library itself. the model also has the advantage of providing a sustainable model of re-invention. as a given technology matures along the continuum, the library’s role recedes, enabling it to embrace the next emerging technology. as one of our respondents pointed out, their library used to give training on how to use a mouse and, one day, gateway training for vr is likely to go the same route. taming strategy 3: building networks because wicked problems are complex and ill-defined, taming them is often done by connecting to others with different perspectives.24 our respondents were largely emerging technology librarians who used a number of on-the-ground strategies to tame the wickedness of the task of advocating for a vision of vr/xr on their respective campuses. most of these strategies required creating relationships beyond the walls of the library, e.g., building organizational networks, connecting to community organizations, developing joint, shared, or embedded positions; cultivating faculty champions in academic units, and initiating shared programming. these collaborative strategies resonate with another characteristic of wicked problems, e.g., that they require the ability to think across conventional organizational and disciplinary siloes. taming strategy 4: exercising interdisciplinary imagination and what other role at a university has more experience with this kind of intellectual dexterity than a librarian? 
our respondents mentioned working with faculty from 14 different disciplines in the context of their responses to our interview questions, and that’s without being asked. as higher education increasingly shifts its attention towards addressing wicked problems, then librarians may be well poised to serve a gateway role in modeling, supporting, and conducting what is now being called “convergent” research.25 this has been described as transdisciplinary inquiry that integrates knowledge from multiple data sources, disciplinary perspectives, and lived experiences in order to confront the world’s most complex problems.26 taming strategy 5: modeling as learning partners in the future of higher education, librarians will have a role to play in developing our students’ abilities to tame these same wicked problems.27 this partnership is not limited to the kind of information and digital literacy needed for cross-disciplinary research. taming wicked problems requires more than a specific set of knowledge or skills, but rather a certain disposition, e.g., a willingness to engage in answering seemingly impossible questions; the flexibility to find pathways through those challenges; the ability to persevere through short-term setbacks; and, above all else, the motivation to support the ability of others to flourish.28 this same set of wicked qualities could easily be applied to all of the respondents in our study, each of whom have succeeded because of their deeply held, intrinsic passion for (and commitment to) the possibilities for what technology and libraries can do together. conclusion the library remains a model of not just individual, but also organizational resiliency. as new technologies such as vr/xr arise, the library as an institution will find ways to weather emerging challenges, resolve complicated problems, and disentangle super complex, i.e., wicked, dilemmas, each of which requires the cultivation of distinctive knowledge, skills, and dispositions. in this study, we argue that the strategies associated with wicked problem solving can serve to strengthen the ability of libraries (and librarians) to serve an active role in our collective future, whether that future is “real” or virtual (or both). information technology and libraries december 2021 black, white, and grey | ellern and cruz 13 appendix a – email to interviewees subject: we are interested in your experience with virtual reality at your library/university/college a research study interview request hi invitee, you are being invited to participate in a research study of how universities navigate the integration of virtual reality labs. you were selected as a possible participant because of your experience in managing or implementing such labs. your participation entails a 45–60 minute interview, conducted through zoom. we will be especially interested in how you, your library, or your university navigated one of the following “grey areas” where a situation is ill-defined or not readily conforming to a category or an existing set of rules or policies. these include but not limited to your professional perspective in one or more of the following: 1. physical and software liability 2. licensing and infringement 3. user accounts with the university and/or with the vendor 4. physical space modifications needed for vr 5. room and equipment management 6. separating collection development policies from equipment and use policies 7. use policies for the equipment, software, and users 8. controlling the vr equipment and software 9. 
time, research, and staff needed to run this service 10. training and learning curve for users (both faculty and students) 11. logistics of using the vr room for a class and within a class 12. integrating vr into a college course 13. selecting appropriate vr items to purchase 14. evaluating vr items 15. paying for vr items including approval, licensing, purchasing processes 16. installing and maintaining vr items including regular updates, the user installing software/games, management of hardware/software, repair, etc. 17. budget for vr (amount, repair, one-time/continuing) 18. a vr topic of your choice of course, we will not be able to cover all of these areas listed above during our short interview with you. we are sending them so you can begin thinking about these vr challenges and prioritize them. based on our own experience, we think you have important insight to share about some of them that will be beneficial to the broader university and library communities. information technology and libraries december 2021 black, white, and grey | ellern and cruz 14 appendix b – interview protocol 1. tell us about the history of you/your library with vr. 2. how have you/your library navigated one of the following grey areas (drawn from working with vr in libraries) where a situation is ill-defined or not readily conforming to a category or an existing set of rules or policies? these include but are not limited to your professional perspective in one or more of the following (from the list we sent you in our invitation email): • physical and software liability • licensing and infringement • user accounts with the university and/or with the vendor • physical space modifications needed for vr • room and equipment management • separating collection development policies from equipment and use policies • use policies for the equipment, software, and users • controlling the vr equipment and software • time, research, and staff needed to run this service • training and learning curve for users (both faculty and students) • logistics of using the vr room for a class and within a class • integrating vr into a college course • selecting appropriate vr items to purchase • evaluating vr items • paying for vr items including approval, licensing, purchasing processes • installing and maintaining vr items including regular updates, the user installing software/games, management of hardware/software, repair, etc. • budget for vr (amount, repair, one-time/continuing) • a vr topic of your choice please describe an occasion where you were faced with one of these complex, challenging, and/or potentially insurmountable obstacles in integrating vr into your library (or university more broadly). how did you navigate this challenge? 3. please describe one way in which the values, practices, and ethos of librarianship may have been challenged by the integration of a vr lab and the purchase and curation of vr artifacts. . information technology and libraries december 2021 black, white, and grey | ellern and cruz 15 endnotes 1 horst w. j. rittel and melvin m. webber, “dilemmas in a general theory of planning,” policy sciences 4, no. 2 (1973): 155–69. 2 cameron tonkinwise, “design for transitions—from and to what?” design philosophy papers 13, no. 1 (may 2015): 15, http://dx.doi.org/10.1080/14487136.2015.1085686. 3 valerie a. brown, john harris, and jacqueline russell, tackling wicked problems: through the transdisciplinary imagination (london: taylor & francis group, 2010): 302, ebook central. 4 bayard l. 
catron, “on taming wicked problems,” dialogue 3, no. 3 (1981): 13–16; luke houghton, “engaging alternative cognitive pathways for taming wicked problems,” emergence : complexity and organization 17, no. 1 (2015), https://www.researchgate.net/publication/282282336_engaging_alternative_cognitive_path ways_for_taming_wicked_problems_a_case_study. 5 catron, “on taming wicked problems”; falk daviter, “coping, taming or solving: alternative approaches to the governance of wicked problems,” policy studies 38, no. 6 (november 2017): 571–88, https://doi.org/10.1080/01442872.2017.1384543; david j. snowden and mary e. boone, “a leader’s framework for decision making,” harvard business review (november 1, 2007), https://hbr.org/2007/11/a-leaders-framework-for-decision-making. 6 natallia pashkevich, “wicked problems: background and current state,” philosophia reformata 85, no. 2 (november 4, 2020): 119–24, https://doi.org/10.1163/23528230-8502a008. 7 andrew m. cox, mary anne kennan, liz lyon, and stephen pinfield, “developments in research data management in academic libraries: towards an understanding of research data service maturity,” journal of the association for information science and technology 68, no. 9 (2017): 2182–2200, https://doi.org/10.1002/asi.23781; julie mcleod and sue childs, “a strategic approach to making sense of the ‘wicked’ problem of erm,” records management journal 23, no. 2 (2013): 104–35, http://dx.doi.org/10.1108/rmj-04-2013-0009; shelley wilkin and peter g. underwood, “research on e-book usage in academic libraries: ‘tame’ solution or a ‘wicked problem’?” south african journal of libraries & information science 81, no. 2 (july 2015): 11– 18, https://doi.org/10.7553/81-2-1560; brendan howley, “libraries, prosperity’s wicked problems, and the gifting economy," information today 33, no. 6 (july 2016): 14–15, proquest. 8 rachel d. williams and rebekah willett, “makerspaces and boundary work: the role of librarians as educators in public library makerspaces,” journal of librarianship and information science 51, no. 3 (september 2019): 801–13, https://doi.org/10.1177/0961000617742467. 9 cox, pinfield, and smith, “moving a brick building.” 10 matt cook et al., “challenges and strategies for educational virtual reality,” information technology and libraries 38, no. 4 (december 16, 2019): 25–48, https://doi.org/10.6017/ital.v38i4.11075; kung jin lee, w. e. king, negin dahya, and jin ha lee, “librarian perspectives on the role of virtual reality in public libraries,” proceedings of the association for information science and technology 57, no. 1 (2020): e254, https://doi.org/10.1002/pra2.254; hannah pope, “virtual and augmented reality in information technology and libraries december 2021 black, white, and grey | ellern and cruz 16 libraries,” library technology reports 54, no. 6 (september 8, 2018): 1–25; felicia ann smith, “‘virtual reality in libraries is common sense,’” library hi tech news 36, no. 6 (august 28, 2019): 10–13, https://doi.org/10.1108/lhtn-06-2019-0040; char booth, “from technolust to technorealism,” public services quarterly 5, no. 2 (june 2009): 139–42, https://doi.org/10.1080/15228950902868504. 11 megan frost, michael goates, sarah cheng, and jed johnston, “virtual reality: a survey of use at an academic library,” information technology and libraries 39, no. 1 (march 2020): 1–12. 
https://doi.org/10.6017/ital.v39i1.11369; jennifer grayburn, zack lischer-katz, kristina golubiewski-davis, and veronica ikeshoji-orlati, 3d/vr in the academic library: emerging practices and trends (washington, dc: council on library and information resources, 2019), https://eric.ed.gov/?id=ed597662; susan lessick and michelle kraft, “facing reality: the growth of virtual reality and health sciences libraries,” journal of the medical library association: jmla 105, no. 4 (october 2017): 407–17, https://doi.org/10.5195/jmla.2017.329; kenneth j. varnum, ed. beyond reality: augmented, virtual, and mixed reality in the library (chicago: american library association, 2019); richard smith and oliver bridle, “using virtual reality to create real world collaborations,” proceedings of the iatul conferences. paper 5 (2018): 10, https://docs.lib.purdue.edu/iatul/2018/collaboration/5/; carl r. grant and stephen rhind-tutt, “is your library ready for the reality of virtual reality? what you need to know and why it belongs in your library,” in o, wind, if winter comes, can spring be far behind? (charleston conference, 2019), https://doi.org/10.5703/1288284317070; dorothy carol ogdon, “hololens and vive pro: virtual reality headsets,” journal of the medical library association: jmla 107, no. 1 (january 2019): 118–21, https://doi.org/10.5195/jmla.2019.602. 12 grayburn et al., 3d/vr in the academic library, 8. 13 douglas bates, “library service study,” unpublished data, june 2, 2020; andrew m. cox, mary anne kennan, liz lyon, and stephen pinfield, “developments in research data management in academic libraries: towards an understanding of research data service maturity,” journal of the association for information science and technology 68, no. 9 (2017): 2182–2200, https://doi.org/10.1002/asi.23781; priti jain, “new trends and future applications/directions of institutional repositories in academic institutions,” library review 60, no. 2 (2011): 125–41, http://dx.doi.org/10.1108/00242531111113078; janice g. norris and elka tenner, “gis in academic business libraries: the future,” journal of business & finance librarianship 6, no. 1 (september 2000): 23, https://doi.org/10.1300/j109v06n01_03. 14 ross rubin, “vendors face the tough reality of affordable vr,” zdnet (july 13, 2020), https://www.zdnet.com/article/vendors-face-the-tough-reality-of-affordable-vr/. 15 sarah sharples, sue cobb, amanda moody, and john r. wilson, “virtual reality induced symptoms and effects (vrise): comparison of head mounted display (hmd), desktop and projection display systems.” displays 29, no. 2 (march 1, 2008): 58–69, https://doi.org/10.1016/j.displa.2007.09.005. 16 emily carl et al., “virtual reality exposure therapy for anxiety and related disorders: a metaanalysis of randomized controlled trials,” journal of anxiety disorders 61 (january 1, 2019): 27–36, https://doi.org/10.1016/j.janxdis.2018.08.003. information technology and libraries december 2021 black, white, and grey | ellern and cruz 17 17 edward castronova, on virtual economies, (rochester, ny: social science research network, july 1, 2002), https://papers.ssrn.com/abstract=338500. 18 barbara i. dewey, “the embedded librarian: strategic campus collaborations,” resource sharing & information networks 17, no. 1/2 (march 2004): 5–17; alessia zanin-yost, “academic collaborations: linking the role of the liaison/embedded librarian to teaching and learning,” college & undergraduate libraries 25, no. 2 (april 2018): 150–63, https://doi.org/10.1080/10691316.2018.1455548. 
19 xiaoping sheng and lin sun, “developing knowledge innovation culture of libraries,” library management 28, no. 1/2 (january 9, 2007): 36–52, https://doi.org/10.1108/01435120710723536. 20 lorcan dempsey, “libraries and the informational future: some notes,” information services & use 32, no. 3/4 (july 2012): 201–12, https://doi.org/10.3233/isu-2012-0670. 21 randall bass, “what’s the problem now?” to improve the academy: a journal of educational development 39, no. 1 (spring 2020), https://doi.org/10.3998/tia.17063888.0039.102; kate crowley and brian w. head, “the enduring challenge of ‘wicked problems’: revisiting rittel and webber,” policy sciences 50, no. 4 (december 1, 2017): 539–47, https://doi.org/10.1007/s11077-017-9302-4. 22 brady d. lund, isaiah omame, solomon tijani, and daniel agbaji, “perceptions toward artificial intelligence among academic library employees and alignment with the diffusion of innovations’ adopter categories,” college & research libraries 81, no. 5 (july 2020): 865–82, https://doi.org/10.5860/crl.81.5.865. 23 david a. abrahams, “technology adoption in higher education: a framework for identifying and prioritising issues and barriers to adoption of instructional technology,” journal of applied research in higher education 2, no. 2 (2010): 34–49, https://doi.org/10.1108/17581184201000012. 24 tilmann lindberg, christine noweski, and christoph meinel, “evolving discourses on design thinking: how design cognition inspires meta-disciplinary creative collaboration,” technoetic arts: a journal of speculative research 8, no. 1 (may 2010): 31–37, https://doi.org/10.1386/tear.8.1.31/1; nancy roberts, “wicked problems and network approaches to resolution,” international public management review 1, no. 1 (2000): 1–19. 25 heather leary and samuel severance, “using design-based research to solve wicked problems,” icls 2020 proceedings (june 2020): 1805-6, https://repository.isls.org/bitstream/1/6452/1/1805-1806.pdf; deborah l mulligan and patrick alan danaher, “the wicked problems of researching within the educational margins: some possibilities and problems,” in researching within the educational margins: strategies for communicating and articulating voices, ed. deborah l. mulligan and patrick alan danaher, (cham, switzerland: palgrave macmillan, 2020): 23–39, https://doi.org/10.1007/978-3-03048845-1_2. information technology and libraries december 2021 black, white, and grey | ellern and cruz 18 26 brown, harris, and russell, tackling wicked problems, ebook central; chris burman, marota aphane, and naftali mollel, “the taming wicked problems framework: reflections in the making,” journal for new generation sciences 15 (april 20, 2018): 51–73, https://www.researchgate.net/publication/324646298_the_taming_wicked_problems_fram ework_reflections_in_the_making; “convergence research at nsf,” national science foundation,” accessed october 21, 2021, https://www.nsf.gov/od/oia/convergence/. 27 alex jorgensen and kara lindaman, “practicing democracy on wicked problems through deliberation: essentials for civic learning and student development,” journal of management policy and practice 21, no. 2 (2020): 28–39, https://www.proquest.com/scholarlyjournals/practicing-democracy-on-wicked-problems-through/docview/2435720594/se-2; paul hanstedt, creating wicked students: designing courses for a complex world (sterling, virginia: stylus publishing, 2018), ebook central. 28 ronald barnett, “learning for an unknown future,” higher education research & development 31, no. 
1 (february 1, 2012): 65–77, https://doi.org/10.1080/07294360.2012.642841; stephanie wilson and lisa zamberlan, “design for an unknown future: amplified roles for collaboration, new design knowledge, and creativity,” design issues 31, no. 2 (spring 2015): 3–15, https://doi.org/10.1162/desi_a_00318; robin kundis craig, “resilience theory and wicked problems,” vanderbilt law review 73, no. 6 (december 2020): 1733–75, proquest; larry j leifer and martin steinert, “dancing with ambiguity: causality behavior, design thinking, and triple-loop-learning,” information knowledge systems management 10, no. 1–4 (march 2011): 151–73. 20180926 10703 editor president’s message: rebuilding our identity, together bohyun kim information technology and libraries | september 2018 2 bohyun kim (bohyun.kim.ois@gmail.com) is lita president 2018-19 and chief technology officer & associate professor, university of rhode island libraries, kingston, ri. ital is the official journal of lita (library and information technology association), and if you are a reader of the ital journal, it is highly likely that you are a member of lita and/or one who is deeply interested in library technology. it is my pleasure to write this column to update all of you about the exciting discussion that is currently underway in lita and two other ala divisions, alcts (association for library collections and technical services), and llama (library leadership and management association). as many of you know, lita began discussing the potential merger with two other ala divisions, alcts and llama, last year.1 what initially prompted the discussion was the prospect of continuing budget deficits in all three divisions. but the resulting conversation has proved that financial viability is not the entire story of the change that we want to bring about. at the 2018 ala annual conference in new orleans, the three boards of lita, alcts, and llama held a joint meeting open to members and non-members alike to solicit and share our collective thoughts, suggestions, concerns, and hopes about the potential three-division realignment. at this meeting attended by approximately 75 people, participants expressed their support for creating a new division with the following key elements. • retain and build upon the best elements of each division. • embrace the breakdown of silos and positive risk-taking to better collaborate and move our profession forward. • build a strong culture of innovation, energy, and inspiration. • be more transparent, responsive, agile, and less bureaucratic • excel in diversity, equity, and inclusion. • support members in all stages of their careers, those with the least means to travel for in-person participation, in particular. • provide member-driven interactions and good value for the membership fee. these ideas have made it clear that members of all three divisions see the goal of realignment as something much more fundamental than financial sustainability. they have validated the shared belief among the lita, alcts, and llama boards that the ultimate goal of realignment is to create a division that better serves and benefits members, not to simply recover the division’s financial health. while the criteria for the success of a new combined division received almost unanimous endorsement at the meeting, opinions about how to realize such success varied. there were understandable concerns associated with combining three small-sized associations into one large one. 
for example, how will we reconcile three distinctly different cultures in lita, alcts, and llama? how will the new association ensure itself to be more transparent, responsive, and rebuilding our identity, together | kim 3 https://doi.org/10.6017/ital.v37i3.10703 nimble than the individual divisions prior to the merger? could the larger size of the new division make it more difficult for small groups with special interests to get needed support for their programs? many requested that the leadership of the three divisions provide more specific vision and details. as a group, the leaders of lita, alcts, and llama are committed to hashing out those details. with the aim of providing fuller information about what the new division would look like at the 2019 midwinter conference, we have already formed working groups, one for finances and the other for communication and are currently working to create two more on operations and activities. these four teams will work closely together with the current leadership of lita, alcts, and llama, to prepare the most important information about the proposed new division, so that the boards and the members of three divisions can review and provide feedback for needed adjustments. our goal is to present essential information that will allow the members to vote with confidence on the proposal to form one new division on the ala ballot in the spring of 2019. if the membership vote passes, then we will be taking the proposal to the ala committee on organization for finalization. on this occasion, i would also like to bring to everyone’s attention to an inherent tension between the two ideas that many of us hold as association members regarding alignment. one is that more member involvement in determining alignment-related details at an early stage is essential to the success of the new division. the other is that we can decide whether we will support the new division or not, only after the leadership first presents us with a clear, specific, and detailed picture of what the new division will look like. the problem is that we cannot have both at the same time. as members, if we want to be involved at an early stage of reorganization, we will have to accept that there will be no ready-made set of clear and specific details about the division waiting for us to simply say yes or no. we will be required to work through our collective ideas to decide on those details ourselves. it will be a messy, iterative, and somewhat confusing process for all of us. there is no doubt that this will be hard work for both the lita leadership and lita members. but it is also an amazing opportunity. imagine a new division, where (a) innovative ideas and projects are shared and tested through open conversation and collaboration among library professionals in a variety of functional areas such as systems and technology, metadata and cataloging, and management and administration, (b) frank and inspiring dialogues take place between front-line librarians and administrators about vexing issues and exciting challenges, and (c) new librarians learn the ropes, are supported throughout their careers going through changes in their responsibilities as well as areas of specialization, are mentored to be future leaders, and get to develop the next generation of leaders as they themselves achieve their goals. furthermore, i believe that the process of building this kind of new association from the ground up will be a truly rewarding experience. 
we had an opportunity to discuss and share our collective hope and vision for the new division at the joint meeting, and that vision is an inspiring one: a division that is member-driven, nimble and responsive, transparent and inclusive, and not afraid to take risks. can we create a new association that breaks down our own silos and builds bridges for better communication and collaboration to move our profession forward? information technology and libraries | september 2018 4 my hope is that we can model and embody the change we want to see, starting in the reorganization process itself. if we want to build a new association that is inclusive, transparent, and nimble, we should be able to build such an association in precisely that manner: inclusively, transparently, and nimbly. if we are successful, our identity as members of this new division will be rebuilt as the very spirit and energy of continuing innovation, experimentation, and collaboration across different functional silos of librarianship, rather than as what we have in our job titles. many lita members and ital readers are leaders in their field and care deeply about the continued success and innovation of lita and ital. i would like to invite all of you to participate in this effort of three-division alignment and to inform and lead our way together. while the boards of three divisions are working on the proposal, there will be multiple calls for member participation. keep your eye out for new updates that will be posted in the ala connect community, “alcts/llama/lita alignment discussion” at https://connect.ala.org/communities/community-home?communitykey=047c1c0e-17b9-45b6a8f6-3c18dc0023f5. all information in this group site is viewable to the public. lita, alcts, and llama members can also join the group, post suggestions and feedback, and subscribe to updates. where would you like lita to be next year, and the year after? let us take lita there, together. endnote 1 andromeda yelton, “president’s message,” information technology and libraries 37, no. 1 (march 19, 2018): 2–3, https://doi.org/10.6017/ital.v37i1.10386. reproduced with permission of the copyright owner. further reproduction prohibited without permission. new strategies in library services organization: consortia university libraries in spain miguel duarte barrionuevo information technology and libraries; jun 2000; 19, 2; proquest pg. 96 new strategies in library services organization: consortia university libraries in spain miguel duarte barrionuevo new political, economic, and technological developments, as well as the growth of information markets, in spain have created a foundation for the creation of library consortia. the author describes the process by which different regions in spain have organized university library consortia. s panish libraries are public entities that depend either on central or local governments and are funded through either the national general budget or the regional government (comunidades aut6nomas) budget. on one hand, the player at the national level is the education and culture ministry, which contributes to the fifty-two state public libraries and shares jurisdiction with the regional government. on the other hand, universities are self-governed institutions of a public nature regulated by the ley de reforma universitaria, or university reform law, which was approved by the spanish parliament in 1983 to promote scientific study and greater selfgovernment of spanish universities. 
universities have their own budget, and they are mainly funded by the regional government. the university library system is currently made of about fifty public libraries and twelve private libraries. since the second half of the 1980s, a new philosophy concerning public services has spread in spain, as in other european countries: a philosophy calling for higher quality and more efficiency in the management and administration of the public capital. there has also arisen a claim to the government's satisfactory use of public funds as a social right, as well as a claim to a return on that capital in social terms. this is where libraries' public services come into play. there is a clear interest in all the aspects related to the introduction of new techniques in management. quality management, effectiveness and efficiency measuring, costs control, services assessment, and users content or analysis from the stakeholders' point of view are concepts that emerge in university libraries. in order to adjust to the circumstances, universities are changing their management procedures, and university libraries have been forced into managing their "business" according to managerial criteria. the commonality of their activities, and the relaxation of geographical boundaries fostered by information technologies, have encouraged libraries to join consortia in order to remain relevant in the current library services context. such concepts as the "electronic," "digital," and miguel duarte barrionuevo is head director of the central library of the university of cadiz (andalucia), and an active contributor of the university libraries consortium of andaluc1a. 96 information technology and libraries i june 2000 "virtual" libraries lead, from my point of view, to a different configuration in the library services context; they have pushed the library managers to consider strategically where they are and what is their most adequate position within this new configuration. departments dealing with information are to be wider, more heterogeneous, and multidisciplinary. new organization strategies need to be defined in order to offer services in a different way when library managers are forced to obtain the best results out of their limited resources, the organization of consortia represents a qualitative leap forward in cooperation, efficiency, and cost-savings. library consortia aim to share resources and to promote participation on the basis of the mutual benefit of the libraries involved and, although the concepts of cooperation, coordination, and sharing resources are not new in the library world, the organization of library consortia introduces a major level of commitment and involvement among the participants. i new settings, new facts libraries are going through a crisis. a library is still an institution with a strong traditional character, but its traditional duties as depository of knowledge no longer justify its costs, and the crisis is exacerbated by an accelerated technological and informative revolution. 1 within the changing atmosphere of the spanish university in the last few years, goals and objectives are affected by a number of socioeconomic, institutional, and technological factors, as well as others with an internal character that push these institutions to move toward change as an opportunity to maintain continuous improvement. materials and services are more expensive, and technology is more sophisticated every day, which leads to a need for strong investments. 
the public financing funds are more and more limited while the costs are growing. the university, in general, is suffering from a lack of efficiency and organizational flexibility; staff rejects monotonous tasks and holds high expectations; the fast dynamics of the implementation of information technology in the last few years has caused a very serious imbalance in the skill levels of people and in job-position demands. all these factors generate a new setting of weaknesses and hopes to which the university libraries have to respond in order to maintain their competitive advantages. i technology technology has recently become a strategic element in the development of libraries. technology is more and more sophisticated and its life is shorter. its use implies reproduced with permission of the copyright owner. further reproduction prohibited without permission. the need of strong investments in computer and communication infrastructure. i economical pressure on information market agents materials costs have diversified and arc more and more costly, with annual growths far exceeding even inflation rate levels. an absolute change has been produced in the supply and demand of the information market, which causes the agent's utter disorientation: the publishing sector is adapting very slowly to the electronic context; the distribution sector needs a deep technological and organizational transformation (few spanish suppliers offer added value services such as cataloguing, outsourcing, or material preparation-puvill libras, or filial multinationals such as blackwell or dawson are exceptions). electronic data interchange, a european standard like sisac, is not a standard format among the sector and there is not a national supplier that offers services of the approval plans type. additionally, the agents of the information market are very conditioned by the change of the demand orientation. specialized users (teachers, researchers, thesis students, etc.) demand from libraries electronic resources, quick information, and access at all times from remote locations. this conflicts with the restrictive tendencies in the maintenance of the public services and drastic budget cuts. libraries are forced to obtain the highest possible ratio of efficiency in the use of the fewest resources. i total quality management implementation and other management techniques the result is implementation of total quality management (tqm), which guarantees quality of services. it is important to consider tqm as an instrument that develops organizational strategies. it is a continuous process developed in order to replace obsolete types of organization, to orient the corporate activity as a permanent basis to the processing optimization, and to obtain a coherent relation between the efficacy in the reaching of objectives and the efficiency in the use of resources. changes in the editorial industry, the budget cuts, the quick expansion of electronic resources, the new price politics, and the problems related to copyright and intellectual property form the new setting. in this context, the consortia organization is considered by the university and library managers as a means to face the challenges which the new settings imply, to unify their pressure capacity with regard to the different agents, and to take advantage of the system's strength in order to adjust to the new situation and improve their competitive advantage. 
i adequate information technologies the spanish university libraries are connected to the academic information network upheld by rediris, a scientific-technical installation that depends on the science and technology office of the prime minister. the main line that maintains the redlris services is formed by seventeen nodes in each region (comunidad aut6noma), connected by atm circuits on atm accesses of 34/155 mbps. each node is formed by a set of communication equipment that allows the coordination of the main transmission means and of the access lines from the centers of each regions. redlris participates in the ten-34 project, which aims at building up an ip paneuropean net of 34 mbps, that interconnects us with the different academic and research nets and that is planned to become a ten-135 in 1999.2 on the other hand, the region (comunidad aut6noma) incorporates added value elements to the net segments they manage, such as faster access speeds that allow centralized architecture (for instance, union catalogue consortia libraries of galicia is managed through a broad band net of 155 mbps). the region also allows access to databases in cd-rom and electronic formats orientated to the final users in a regional context. for instance, the scientific computer center of andalucia manages twenty-two databases in cd-rom and other electronic formats that can be searched by all the andalusian universities and research centers through the andalusian scientific research net. homogeneous automation level the automation process of the library services, initiated at the end of the decade of the '80s, is practically completed. dobys-libis, libertas, vrls, absys, and sabini are the most widely used library management systems. 3 since 1997 some libraries have updated their library automation system to unicom (sirsi) and innopac (innovative interfaces). the spanish university libraries have a homogeneous automation level and can establish projects from the consortia perspective, such as regional union catalogs, sharing electronic information resources, and shared purchase policies. favourable political situation traditionally, the cooperative efforts have obtained little offical support. however, in the last years, a positive attitude can be perceived from the academic authorities in new stragetiges in library services organization i barrionuevo 97 reproduced with permission of the copyright owner. further reproduction prohibited without permission. relation to cooperation activities and the cooperative projects development, both as an answer to the need to reduce costs by sharing resources and as a means to face the growing and unstoppable demand from the users. the initiatives for the consortia organization are supported by highest academic level institutional agreements among the universities: principals and vice-principals of research (such is the case of the consortia of andalucia and madrid) or they are the result of initiatives taken by the autonomous government (galicia consortium) or a confluence of interests between the autonomous government and the universities (catalufla consortium). 
remote access to end users' information resources following the automation projects and the network technologies and data transmission development, most university libraries have made projects for all information resources integration and maintain a wide group of services: campuswide networks, catalogs, databases in cdrom (e .g., indice espanol de ciencias sociales y humanidades, indice espanol de ciencia y tecnologia, aranzadi legislaci6n y jurisprudencia, medline, abi inform, academic search) , e-mail , and remote access via internet. access to dll resources is available through the libraries management system opac web. there is access to any of these resources from any point connected to the network, whether from terminal servers, workstations, pcs, unix stations, or macs. i cooperation in spain up to the middle of the '80s, university libraries were separate realities with scattered funds and disorganized services; they were not structured as a system and they were lacking any tradition or mentality of cooperation. in a 1994 poll, only 40 percent of university library directors declared that cooperation among libraries was important. 4 we could say that the cooperation initiatives depend on the will of the people who obtain little support from the government. therefore, two different stages could be set: one in which cooperation is the result of personal actions, taken with no institutional support, in which local projects are undertaken ; or one in which individual initiatives are taken by the people in charge of libraries and a certain concern from the central government converge. will to share resources spain did not join the movement toward library automation until the '80s . at this time, the cooperative tenden98 information technology and libraries i june 2000 cies now associated with information and communication technologies were only slightly realized in the libraries. eventually, however, a consolidation of efforts took place, helping to bring about, at the end of the '80s and beginning of the '90s, some important cooperative initiatives out of which some specialized union catalogs could be brought. some of the first cooperative initiatives arose from the association of specialized libraries. 5 among these we can point out the coordinating committee of biomedical documentation, whose mission was to promote the cooperation and rationalization of document resources in the field of biomedicine. this committee holds conferences and maintains a union catalog of the daily publications on health services accessible through internet. 6 documat, created in 1988, groups together the libraries specializing in mathematics and maintains a union catalog of journals on which basis are organized plans of shared acquisition . mecano groups together the libraries of the schools of engineering and maintains a union catalog accessible through internet? early cooperative initiatives were also promoted by the library automation systems users groups. red universitaria espanola de dobis / libis began in 1990 when twelve universities using the system decide to create an online union catalog maintained by the university of oviedo. the libertas spanish users group maintains its union catalog associated with sls database, accessible online from bristol. 
rueca is the union catalog of absys users.8 need to cooperate in the early '80s a forum started in universities that attempted to influence the writing of the university statutes (as a result of the ley de reforma universitaria) and establish a general criterion for regulations. as a result of this debate, two documents have been published and have proved to be essential for subsequent cooperative development.9 some reports from conferences on university libraries held in 1989 in the university complutense of madrid had a wide influence at the national level, and the same year, fundesco produced a report about the state-of-the-art in automation in the spanish university libraries.10 the situation that is repeated in these reports about the libraries is extremely pessimistic. their evolution from 1985 to 1995 has been perfectly described by m. taladriz and l. anglada as "the lack of recognition of the role of university libraries ... the dispersion of bibliographical funds ... the general disorganization of the library services ...."11 in 1988, red de bibliotecas universitarias (rebiun, university libraries network) was created. although initially only nine university libraries were involved, the number grew to seventeen during the following years. the cooperative activities were centralized, and they obtained remarkable results in training, the improvement of library interlending, and in the publishing on cd-rom of bibliographical records from participant libraries. at the same time, and thanks to the celebration of the ifla congress in barcelona in 1993, the general need to create a wider discussion forum including all the university libraries and to obtain better cooperation and coordination was established. this idea crystallized with the creation of the conferencia de directores de bibliotecas universitarias y cientificas (cobiduce, the conference of university and scientific libraries directors). the first working meeting was held in november 1993.12 this led to the merging of rebiun with cobiduce in order to concentrate all the cooperation efforts into a single institution. a single institution, which kept the name of rebiun, was created in 1996. in 1998, rebiun became the local committee of the conferencia de rectores de las universidades espanolas (crue, conference of spanish university principals). rebiun has become the organization that oversees all the cooperation and coordination efforts in spanish academic libraries. rebiun activities include a union catalog published on cd-rom, "regulations for university and scientific libraries," agreements on interlibrary loans, and activities in different working groups.13 i university libraries' consortia in the past few years the transfer of powers to the autonomous regions on education and culture, a consequence of a constitutional order, has brought about another political and administrative context for the achievement of the libraries' objectives. the autonomous regions are now working on the design of regional development plans or regional information systems that are related, unfailingly, to the cooperative activity of the libraries of the territory. this initiative can be applied to university libraries as well as any other type of library, which, through their institutions, request their autonomous governments' assistance or funding in order to achieve cooperative projects. 
or it could be done the other way round: a government can outline an action plan for its libraries and suggest it to the potential participants. thus, the basis for consortia development was set in the second half of the '90s, encouraged by events like the conference held in cadiz, organized by the university carlos iii de madrid and university of cadiz libraries and ebsco information services (spanish branch) in 1998.

catalonia consortium of university libraries (consorcio de bibliotecas universitarias de cataluña)

we could sum up the earlier situation in catalonia as follows: the existence of newly automated libraries, few automated records, the use of their own automation systems, and the existence of only three universities. some cooperation background had already developed at this time: cruc, caps, and the joint selection of an automation system carried out by the universidad autónoma de barcelona and the universidad politécnica de cataluña. it was not until the '90s that positive factors combined to move the cooperative movement a step forward in catalonia. these positive factors were a homogeneous state of automation among university libraries, a good communications network, and the use of standards for library data recording. the previous cooperative movements and an analysis of the worldwide evolution of libraries helped in the building of a united view in which cooperation appeared as an additional instrument for the improvement of the library world. the university library directors of catalonia considered cooperation a way to accelerate the evolution of libraries, to create new services, to facilitate changes, and to save expenses. with this conviction, they wrote a proposal for the creation of a library network in catalonia, which in 1993 resulted in the interconnection of the university libraries in catalonia, followed in 1995 by the first steps toward the creation of the union catalog of the universities of catalonia. this catalog was fully operative in early 1996. at the end of 1996 the university library consortium of catalonia (cbuc) was created with the task of improving library services through cooperation.14 its objectives are:

• to create new working tools
• to improve services
• to build a digital library
• to take better advantage of resources
• to face together the changing role of libraries

the cbuc comprises the university of barcelona, the universidad autónoma de barcelona, the polytechnic university of catalonia, pompeu fabra university, the university of girona, the university of lleida, rovira i virgili university, the open university of catalonia, and the library of catalonia. the direction of cbuc is determined by a board of representatives from each of the institutions, an executive committee of six members, and a technical committee of library directors. a staff of seven runs the cbuc office, and different working groups audit active plans and study possible issues of concern.
university libraries consortium of the madrid region (comunidad autonoma de madrid)

the public university libraries based in the madrid region (universidad de alcala, universidad carlos iii, universidad complutense, universidad politecnica, universidad rey juan carlos, and universidad nacional de educación a distancia) are developing many cooperation programs with the following objectives:

• to facilitate access to information resources
• to improve the existing library services
• to test and promote the use of information and communication technologies
• to reduce costs by sharing resources15

two programs have already been initiated:

interlibrary loan. an agreement to obtain a faster delivery system for books and journal articles has been established. using the services of a private courier company, maximum delivery time from one university to another will be set to forty-eight hours. this service started working on the first of september.

training. different courses for the joint training of library staff are being organized on a cooperative basis.

in the future, other programs will be developed, including a union catalog (with the creation of a collective database that will also save cataloging costs by sharing bibliographical resources) and an electronic library, which will allow common access to electronic resources.

galician libraries consortium

the galician libraries consortium is the result of a regional government initiative.16 in november 1996 the xunta de galicia signed an agreement of scientific and technological collaboration with fujitsu icl spain in which the company agreed to develop the telecommunications infrastructure of the community: the galician information highway (agi: autopista gallega de la información). inaugurated in 1997, agi serves as the basis for projects with great political and social appeal. three projects were embarked upon:

• tele-teaching,
• tele-medicine, and
• access to libraries.

users have access to a loan service by which a loan may be requested from any library in the consortium. the loan works as it would in a local setting, with the same limitations, controls, and blocking of any other local loan system. the request to the system is sent online and is fulfilled within twenty-four to forty-eight hours. the consortium originally was to encompass all types of libraries, but as the project advanced, it was decided to restrict the collaboration to university libraries. this allowed the project to move forward with greater speed, because the member libraries had more narrowly defined interests and concerns. the xunta de galicia prepared the "protocol of intentions," which has been signed by the highest representatives of the three galician universities (universidad de santiago, universidad de la coruña, and universidad de vigo). this protocol is characterized by two essential ideas:

1. allow adequate time for planning individual incorporation into the consortium, so that each institution may participate at the rate it deems appropriate.
2. create a permanent working commission formed by representatives of the institutions involved, which will:
• answer existing and future questions;
• define the model of consortium that each organization desires to establish through specific objectives; and
• promote adequate measures in order to attain the objectives that have been designed.
andalucian university libraries consortium

in the era of the internet, electronic documents, and the virtual library, maintaining independent libraries is no longer tenable. in addition, the efforts needed to face the challenges of the information society and the changes that society is demanding of universities are destined to become weaknesses more than strengths in those institutions that face them individually. there are many reasons why it is advisable for libraries to approach these challenges collaboratively:

• the productivity and competitiveness that society demands of the universities
• the huge technological opportunities to share information
• the importance of the changes that are taking place in the products and services that the information market offers
• the high cost of the new products (e.g., e-journals)
• the need for very specialized knowledge in order to activate some of these services
• the growing demands of library users

the andalucian university libraries concluded that if they wished to stay current with information technologies, if they wished to continue implementing improved services, and if they wished to do so within their budgets, solid cooperation mechanisms would have to be established. in march 1998 the andalucian vice-principals of research requested the directors of the andalucian university libraries to analyze possible cooperative activities among the university libraries of the community. two goals were set in this meeting:

• the analysis of library automation products currently on the market.
• the analysis of the current individual management systems within the andalucian libraries (which, though automation varied among them, were each considered to be outdated) and the potential for sharing resources with the present systems, which is difficult because currently available systems may not be compatible with z39.50.

the object of this analysis is to define essential requirements so that the new systems to be implemented facilitate possible cooperative actions. this possible integration will not be simple: the university pablo de olavide, recently created, is planning to purchase its own system; the universities of seville, granada, and cordoba are using dobis-libis; and the universities of cadiz and malaga are using libertas and are preparing to update to innopac. the andalucian university libraries have studied some of the systems that the spanish market offers: absys (baratz, document systems), amicus (elias), innopac (sls), sabini (sabini library automation), and unicorn (sirsi). they are preparing a catalog of electronic information resources available in the andalucian university libraries in order to know which resources are available and preferred by the different universities. the andalucian university libraries consortium is in an early stage; while its organizational structure and functions are defined, its tasks are still being elaborated. the delegate commission of the vice-principals of research of the andalucian universities is responsible for this work. the commission is presided over by the vice-principal of the university of seville and formed by the directors of the andalucian libraries and the juridical consultant of the university of cordoba.
the commission will produce a working paper that outlines the main facets of the organization, based on the following general principles:

• to add value to the research computer network
• to favor the use of technologies that contribute to the improvement of production times and the design of efficient processes
• to apply economies of scale:
  • in the purchase of products and services
  • in repetitive tasks and activities
• to favor the use of information resources among the members of the andalusian universities and society in general

in order for the project to succeed, the following conditions must exist:

• a homogeneous situation among the libraries in terms of regulations and technical instruments used in the description of materials, data format, and information interchange format;
• the andalucian universities are connected with high-speed fiber-optic lines (32 mb);
• the administrative framework is clearly defined; and
• the responsible members of the andalusian university libraries are convinced that cooperation will substantially improve the quality of the library services in each university.

additionally, the following advantages must result:

• decline or leveling of production expenses
• economies of scale in the purchase of products such as computer systems, databases, and journal and electronic information subscriptions
• shared technical support
• shared training costs
• shared information resources through interlibrary loan

conclusions

the ultimate goal of cooperation is to join users and the documents and information they need; establishing relations among participant institutions is a means to that end. consortia represent the possibility of testing alternatives to the traditional automated library. they represent the potential to offer the best library services to a wider number of users with all the resources they possess. beyond simple cooperation that unites efforts and resources, consortia represent the possibility of testing innovative formulas of process management and service organization from a regional perspective.

references

1. miguel duarte, "evaluación del rendimiento aplicando sistemas de gestión de calidad, la experiencia de la biblioteca de la universidad de cadiz" [performance assessment implementing total quality management systems: the university library of cadiz experience], in xv jornadas de gerencia universitaria: modelos de financiación, evaluación y mejora de la calidad de la gestión de los servicios [15th university managers meeting: financing models, assessment, and quality assurance of services] (cadiz: university pr., 1997), 309-10; marta torres, "el impacto de las autopistas de la información para la comunidad academica y los bibliotecarios" [the impact of the information highway on the academic community and librarians], in autopistas de la información: el reto del siglo xxi (madrid: editorial complutense, 1996), 37-55.

2. victor castelo, "¿sueñan los informáticos con bibliotecas electrónicas?" [do computer scientists dream of electronic libraries?], round table in seminario sobre consorcios de bibliotecas [library consortia conference] (cadiz: university pr., 1999), 130; see also www.rediris.es, accessed apr. 24, 2000.

3. m. jimenez and alice keefer, "library automation in spain," program 26, no.
3 (1992): 225-37; assumpció estivill, "automation of university libraries in spain," telephasa seminar on innovative information services and information handling (tilburg, june 10-12, 1991); rebiun's statistical annual offers data about catalog automation.

4. luis anglada and margarita taladriz, "pasado, presente y futuro de las bibliotecas universitarias espanolas" [past, present, and future of spanish university libraries], in ix jornadas de bibliotecas de andalucía (granada: asociación andaluza de bibliotecarios, 1996), 108-31.

5. l. anglada, "cooperació bibliotecaria a espanya" [library cooperation in spain], item 95, no. 16: 51-67.

6. see www.doc6.es/cdb, accessed apr. 24, 2000.

7. see http://biblioteca.upv.es/bib/mecano, accessed apr. 24, 2000.

8. see www.uned.es/bibliote/biblio/ruedo.htm and www.baratz.es/rueca, accessed apr. 24, 2000.

9. "the library in the university: report on the university libraries in spain, produced by a working team formed by university librarians and teachers" (madrid: ministry of general culture of the book and libraries, 1985); "university libraries: recommendations about their regulations, conference on university libraries, 'castillo magalia,' las navas del marques, avila, may 27-28, 1986" (madrid: library coordination centre, 1987).

10. situación de las bibliotecas universitarias dependientes del mec [state of the art of the academic libraries under the education ministry] (madrid: universidad complutense, biblioteca, 1988); estudio sobre normalización e informatización de las bibliotecas científicas espanolas [study on standardization and automation of the spanish scientific libraries] (fundesco, 1989, unpublished).

11. luis anglada and margarita taladriz, 108.

12. see consorcios de bibliotecas [library consortia conference], maribel gomez campillejo, ed. (cadiz: cadiz univ. pr., 1999).

13. see www2.uji.es/rebiun, accessed apr. 24, 2000.

14. for more information about cbuc, see www.cbuc.es, accessed apr. 24, 2000.

15. marta torres, "los consorcios, forma de organización bibliotecaria en el s. xxi: una aproximación desde la perspectiva espanola" [consortia as a form of library organization in the 21st century: an approach from the spanish perspective], in consorcios de bibliotecas [library consortia conference], 17-35.

16. santiago raya, "el consorcio de bibliotecas de galicia" [galician library consortium], in consorcios de bibliotecas [library consortia conference], 117-25.

scientific serial lists

dana l. roth: central library, indian institute of technology, kanpur, u.p., india

this article describes the need for user-oriented serial lists and the development of such a list in the california institute of technology library. the results of conversion from eam to edp equipment and subsequent utilization of com (computer-output-microfilm) are reported.

introduction

prior to the dedication of the millikan memorial library, which houses the divisional collections in chemistry, biology, mathematics, physics, engineering, and humanities, the libraries at the california institute of technology were largely autonomous, reflecting the immediate needs of each division, and exhibited little attempt at interdivisional coordination of library purchases. with centralization of the major science collections, it became apparent that any efforts to reduce duplication, promote more effective library usage, and provide assistance in interdisciplinary research efforts would require a published list of serials and journals (1).

scientists vs. librarians

it is certainly a truism that serial publications constitute the backbone of a library's research collection.
particularly in the sciences, where serial publications serve as the primary record of past accomplishments, studies have shown that over 80 percent of the references cited in basic source journals are to serials (see table 1). citation of serials rather than monographs was greater in chemistry than in other sciences, and the overall order may reflect the efficiency of the respective abstracting/indexing services. in spite of the scientist's heavy dependence on serials, it appears that in most libraries little attempt has been made to reconcile the library record with practices found in the scientific literature.

table 1. percentage of citations to serials found in basic source journals for various scientific disciplines*

discipline | percentage of citations to serials
chemistry | 93.6
physiology | 90.8
physics | 88.8
mathematics | 76.8

*c. h. brown, scientific serials (chicago: association of college and research libraries, 1956).

this is in part due to the general acceptance of the library of congress dictum that serials should be cataloged according to the general principles laid down for monographs. fortunately, monographs are generally cited in the scientific literature by entries (author/title) which invariably appear in the library catalog. serials, however, present the special problems of so-called indistinctive titles, frequent title changes, and common reference to the abbreviated form of their title. most american libraries have followed the library of congress/union list practices and as a result have long suffered user complaints about the use of corporate entries for so-called indistinctive titles, entries under the latest form of title, and the treatment of prepositions and conjunctions as filing elements. these practices have been defended as attempts to extend the reference value of the catalog, but in doing so they create a number of problems and ambiguities which are only partially resolved by the annoying use of see references. the recent surge of interest in making the library "relevant" and more intimately involved with its users' needs must take into account that, in the minds of scientists, it is presumptuous to require them to remember cataloging rules when the library could just as well accommodate the scientific form. in recognition of the long-standing scientific tradition of describing serials by their titles (which considerably predates the corporate entry syndrome), the logical solution would be to provide title added entries for those serials whose main entry is in corporate form (2).

specific problems

1. even if scientists were to remember the basic rules for society publications and similar corporate entries, how are the exceptions shown in table 2 to be reconciled?

table 2. an example of the difficulties encountered in translating abbreviations of scientific journal titles into lc entries

abbreviation | scientific form of title | union list entry
bull acad pol sci | bulletin de l'academie ... | polska akademia nauk ...
pnas | proceedings of the national academy ... | national ...
jacs | journal of the american chemical ... | american ...
berichte | berichte der deutschen chemischen ... | deutsche ...
comp. rend. | comptes rendus ... | academie des sciences ...
ber. bunsen... | berichte der bunsen... | deutsche bunsen...
bull. soc. chim. belges | bulletin des societes ... | bulletin des societes ...
bull. soc. chim. france | bulletin de la societe chimique de france | societe chimique de france
2. the practice of cataloging serials under their latest title serves mainly as an obstruction to determining the library's holdings, since references given in the scientific literature and citations obtained from abstracting/indexing services are obviously to the title currently in use. another important factor that is sometimes overlooked is the requirement of a classified shelf arrangement. otherwise, since the title of the bound volume corresponds to the title in use at the time of binding, you have the ambiguity of the catalog referring to the latest title and the shelf locator referring back to the earlier title. these problems are further complicated by the long delays and backlogs in recataloging. in many large libraries this is a major function of serials catalogers, and it is estimated that it takes 50 percent longer to recatalog than to catalog originally (3).

3. the jargon of scientists when discussing or requesting information about various periodicals is replete with acronyms and abbreviated forms. jacs, pnas, berichte, comptes rendus, annalen all have well-defined meanings in scientific literature and conversation because of the well-developed title entries and abbreviations given in physics abstracts, chemical abstracts, and the world list of scientific periodicals. the use of prepositions and conjunctions as filing elements constrains these scientists to being able to translate these abbreviations only into title entries where the omitted words are obvious, e.g., journal of the american chemical society, but often causes problems with titles like journal of the less-common metals.

the cal tech serials list: objectives and procedures

the publication of a serials list oriented to the needs of scientists must then provide for: scientific title entries for corporate and society publications, treatment of each title change as the cessation of the old title, and omission of prepositions and conjunctions as filing elements. these practices will increase the number of entries by about 40 percent over the number of current titles, but in terms of user appreciation the extra expense is amply justified. the list can then be a logical extension of the library's reference service and offers the opportunity of facilitating the research efforts of its users by obviating the need to remember cataloging rules or visit the library to determine its holdings. input to the serials list was derived from the library's serials card catalog. the information was typed on oversize card stock and included the full main entry, holdings, and divisional library location, with additional data cards, as required, to reflect title changes. with this data base, an extensive search of the world list of scientific periodicals and the list of periodicals abstracted by chemical abstracts was made to determine the additional scientific title entries to be incorporated in the list.
(each departmental library provides a shelf locator which relates the various forms of entry in the serials list to that chosen for the bindery title and subsequent shelf location.) prepositions and conjunctions were replaced with ellipses in the final typing of the multilith stencils required for the manual publication of the first edition of the cal tech serials list (4). during the spring of 1969, the decision was made to employ edp techniques in the publication of the second edition of the list. as an interim housekeeping device between editions, the author maintained an in-house supplement on punch cards using a single-card format. this experience indicated an unacceptable severity of title abbreviation, which was obviated by adopting a two-card format. this is consistent with the ibm 360 system, wherein input records are read two cards at a time, and thus the unit record may be thought of as a "super" card of 160 columns (of which only a maximum of 131 columns can be printed on a given line, the remaining 29 columns being used for internal records). the unit serials record consists of the title, holdings, divisional library, serial number, and spacing command (see table 3). the unit records were created directly from the existing serial list and the cumulated supplement by in-house clerical staff. this obviated the usual requirement of coding the data for keypunch operators. subsequent to the preparation of the unit records, having an alphabetical sequence of punched cards, it was a simple matter to program the computer to serially number each second card, using one letter and six digits. an example of the distribution of titles one might expect is given in table 4.

table 3. the unit serials record

card no. | columns | field designation
1 | 1-75 | title
2 | 1-27 | holdings
2 | 29-32 | divisional library
2 | 72-78 | serial no.
2 | 80 | spacing command

table 4. distribution of titles by initial letter

letter | number of title entries
a | 1,024
b-d | 1,126
e-i | 1,199
j-m | 1,272
n-r | 1,413
s-z | 1,471

while the data conversion was being performed, a series of programs was written. these programs were designed to create a master tape, update the tape, and produce a variety of listings. these listings, in addition to the required 131-column printout for the serial list, include the 160-column printout (in sequential 80-column units) and printouts for individual divisional libraries which can be annotated with shelf locations. the data base was then transferred from punch cards to magnetic tape, and subsequent additions and changes involve punch cards and tape-one-onto-tape-two operations. as a protective device, tape one and tape two are the current and previously current tapes, respectively. thus in the case of accident the preceding tape can again be updated. as a further precaution the original punch card data base and update decks are on file. the economic justification for the use of edp equipment in libraries is based upon the necessity of maintaining current records that can be published at regular intervals. in the special case of serial lists this involves the periodic merging of small numbers of new and corrected unit records with the much larger number of unit records in the existing data base. the use of serially numbered unit records allows the relatively easy machine function of merging numbered items, in contrast with the difficulties involved in merging large alphabetical fields.
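to make the record layout and the batch update concrete, the sketch below restates the two-card unit record of table 3 and the serial-number merge in python. this is purely a modern illustration under the field positions given in table 3, not the original 1970s tape programs; the names (UnitRecord, parse_unit_record, merge_by_serial_number) are hypothetical.

```python
# a minimal sketch of the two-card unit serials record (table 3) and of
# merging a batch of updates into the master file by serial number.
from dataclasses import dataclass

@dataclass
class UnitRecord:
    title: str      # card 1, columns 1-75
    holdings: str   # card 2, columns 1-27
    division: str   # card 2, columns 29-32
    serial_no: str  # card 2, columns 72-78 (one letter and six digits)
    spacing: str    # card 2, column 80 (spacing command)

def parse_unit_record(card1: str, card2: str) -> UnitRecord:
    """Build one unit record from a pair of 80-column card images."""
    return UnitRecord(
        title=card1[0:75].rstrip(),
        holdings=card2[0:27].rstrip(),
        division=card2[28:32].rstrip(),
        serial_no=card2[71:78],
        spacing=card2[79:80],
    )

def merge_by_serial_number(master, updates):
    """Merge new and corrected records into the master list.

    An update whose serial number already exists replaces the old record;
    a record with a new serial number is inserted in sequence.
    """
    merged = {rec.serial_no: rec for rec in master}
    merged.update({rec.serial_no: rec for rec in updates})
    return [merged[key] for key in sorted(merged)]
```

the design point illustrated here is the one the article makes: keying the merge on a single short, strictly ordered serial number keeps the periodic update cheap, in contrast with merging on long, irregular alphabetical title fields.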
recent advances in reprographic technology suggested that com (computer-output-microfilm) could be utilized to produce a quality catalog, free of the normal objections to "computer printout." the flexibility of currently available com units allows the acceptance, as input, of a normal print tape from most computer systems (ibm, burroughs, univac) without reformatting (6). the print processors resident in the front-end computer of the fr-80, for example, allow for upper- and lowercase, bold characters, column format, pagination, and sixty-four character sizes. variation in character size allows a maximum density of 170 characters per line and 120 lines per (8 1/2 x 11) page. the application of com equipment requires the production of a "print tape." this is simply a coded version of the current tape which contains the additional instructions necessary for spacing the unit records, defining the page size, and inserting "continued on next page" statements. the use of the spacing command instruction, as an integral part of the unit record, allows all the information on a given title to remain in one unit and easily provides for a blank line before the next title (see table 5).

table 5. data presentation and relative spacing

title | holdings | divisional library
faraday society, london | |
  discussions | 1,1947+ | chem
  | 10,1951+ | c eng
  symposia | 1,1968+ | chem
  | 1,1968+ | c eng
  transactions | 1,1905+ | chem
  | 46,1950+ | c eng
farber-zeitung | 1889-1918 | chem

the additional problem of keeping the information on one title together on a given page or providing a "continued on next page" statement was solved by analyzing the information in the eighty-ninth line of each page to determine whether to print another line, insert the "continued on next page" instruction, or begin the title on the next page. once the film is generated, it is a simple matter to produce plates for the multilith production of hard copy (7). the choice of a ninety-lines-per-page format was influenced, in part, by our desire to use the serials list to break down the reluctance shown by faculty and students toward microformats. this format results in a one-third reduction of the 112-column computer printout and enables our 5,000 current titles to be accommodated on two microfiches (152 pages).

footnotes

1. for the purposes of this article, periodical and serial are synonymous and refer to publications which may be suspended or cease but never conclude. the term "serials list" should be restricted to publications which record only serial titles (and supplementary information to distinguish between similar titles), holdings, and internal records. library catalogs and union lists are quite sufficient sources for relating a title to its successor or precedent, and providing full bibliographic detail.

2. p. a. richmond and m. k. gill, "accommodation of nonstandard entries in a serials list made by computer," journal of the american society for information science 11:240 (1970); dana l. roth, "letters to the editor; comments on the 'accommodation of nonstandard entries ...,'" journal of the american society for information science (in press).

3. andrew d. osborn, serial publications (chicago: american library association, 1955).

4. e. r. moser, serials and journals in the c.i.t. libraries (pasadena: california institute of technology, 1967).

5. dana l. roth, serials and journals in the c.i.t. libraries (2nd ed.; pasadena: california institute of technology, 1970).
6. robert f. gildenberg, "technology profile; computer output microfilm," modern data 3:78 (1970).

7. computer micrographics, inc., los angeles, california.

virtual reality: a survey of use at an academic library

megan frost, michael goates, sarah cheng, and jed johnston

information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11369

megan frost (megan@byu.edu) is physiological sciences librarian, brigham young university. michael goates (michael_goates@byu.edu) is life sciences librarian, brigham young university. sarah cheng is an undergraduate student, brigham young university. jed johnston (jed_johnston@byu.edu) is innovation lab manager, brigham young university.

abstract

we conducted a survey to inform the expansion of a virtual reality (vr) service in our library. the survey assessed user experience, demographics, academic interests in vr, and methods of discovery. currently our institution offers one htc vive vr system that can be reserved and used by patrons within the library, but we would like to expand the service to meet the interests and needs of our patrons. we found use among all measured demographics and sufficient patron interest for us to justify expansion of our current services. the data resulting from this survey and the subsequent focus groups can be used to inform other academic libraries exploring or developing similar vr services.

introduction

virtual reality (vr) is commonly defined as an experience in which a user remains physically within their real world while entering a virtual world (comprising three-dimensional objects) using a headset with a computer or a mobile device.1 vr is part of a spectrum of related technologies ranging from mostly real experiences to completely virtual experiences, such as augmented reality, augmented virtuality, and mixed reality.2 extended reality (xr) is a term often used when describing these technologies as a whole. many different xr devices and services are available in academic libraries. the most popular xr devices used in libraries are the htc vive, the oculus rift by facebook, and google cardboard.3 other common xr devices include gearvr by samsung and playstation virtual reality by sony.4 the htc vive and oculus rift are technologies that provide an immersive virtual-reality experience. google cardboard provides both non-immersive virtual reality and augmented reality experiences, while mixed reality is provided through various technologies such as microsoft's hololens and mixed-reality headsets from hp, acer, and magic leap. in addition, many academic libraries are using augmented reality apps that can be downloaded on patrons' personal mobile devices.5 academic libraries are starting to offer various xr services to increase engagement with patrons and teach information literacy.6 despite the increase in xr service offerings, there is little consistency in the devices used or in how these services are developed at academic libraries, and there is substantial variation in the types of services offered.
for example, some libraries make vr headsets available for in-house activities, such as storytelling, virtual travel, virtual gaming, and the development of new skills.7 other libraries, notably ryerson university library and archives in toronto, let students and faculty borrow their oculus rift headsets for two or three days at a time.8 some university libraries lend out headsets or 360-degree cameras or provide a virtual-reality space for students to develop content.9 the university of utah library offers an open-door, drop-in vr workshop once a week.10 claude moore health sciences library at the university of virginia implemented a project that educated its students and staff on the uses of vr in the health field through a combination of large-group demonstrations, one-on-one consultations, and workshops.11 the xr field is developing quickly, and xr services have the potential to benefit students academically. some universities are already offering classes on vr platforms.12 this is particularly true in fields that are high risk or potentially discomforting. for example, students in medical fields benefit by practicing virtually before attempting surgery on a human body.13 in addition to potential surgical benefits, the university of new england has been utilizing xr technology to teach empathy to its medical and other health profession students by putting the learner in the place of their patients.14 other examples of xr usage in the health fields include a recent attempt to introduce vr in anatomic pathology education and the use of virtual operating rooms to train nurses and educate the public.15 one recent study measured the effectiveness of using vr platforms in engineering education and found a drastic improvement in student performance.16 many educational institutions outside of the university setting have also started exploring how xr could be used to enhance students' educational experience. this technology has already progressed from being considered a novelty to being an established tool to engage learners.17 one of the perceived benefits of xr use in public libraries by both library patrons and staff is the ability of xr technology to inspire curiosity and a desire to learn.18 in some school programs, students are able to advance their learning through xr apps that allow them not only to absorb information but also to experience what they are learning through hands-on activities and full immersion without danger (e.g., hazardous science experiments) or high cost (e.g., traveling to another country).19 xr has the potential to increase the overall engagement of students, which, according to carini, kuh, and klein's 2006 study, is correlated to how well students learn.20 xr has the ability to capture the attention of students and eliminate distractions. this is particularly true for students with attention deficit disorder, anxiety disorders, or impulse-control disorder.21 the application of xr goes beyond traditional classroom settings. a case study assessing the benefits of vr in american football training found that players showed an overall improvement of 30 percent after experiencing game plays created by their coaches in a virtual environment.22 although these studies were not conducted in an academic library or university setting, their results are transferable.
it is beneficial to academic libraries to provide technologies to their patrons that enhance and advance their learning. currently, xr apps available for purchase on the google app store are still limited. most app development comes from private companies; however, some universities are giving their students the opportunity to develop xr content.23

objectives

at brigham young university, we want our vr services to foster the link between academic achievement and virtual reality. in order to do this effectively, our first objective is to determine which vr services will be of most benefit to our patrons. to inform the expansion of future vr services, we conducted a survey of patrons using current vr services in the library. this survey is also intended to help other libraries that are developing vr services and potentially developers interested in creating academic content for students. we were primarily interested in user experience, demographics, academic interests in vr, and methods of discovery.

methods

during one semester, january through april 2018, we asked individuals to complete a questionnaire following their use of the library's htc vive system. this questionnaire was administered through an online qualtrics survey that was distributed via email to patrons after using the library's vr equipment. it consisted of thirteen questions that gathered basic demographic information as well as information on patron interests and experiences with the library's vr services. the complete survey used in this study can be found in appendix a. currently the harold b. lee library at brigham young university offers one htc vive vr system that can be used on site in the science and engineering area of the library. it is primarily operated by student employees who work at the science and engineering reference desk. time slots are reserved through an online registration system on the library's website. in order to gather more in-depth, qualitative data on patron experience with the library's vr services, we also conducted a focus group with vr users. we recruited participants by adding a question at the end of the qualtrics survey asking whether the responder would be interested in participating in a focus group. all focus group participants received a complimentary lunch. during the focus group, we asked a series of five questions to gain a deeper understanding of users' vr experience at the library. in particular, we asked participants to explain what went well during their vr experience in the library, what difficulties they experienced, how they envisioned using vr for both academic and extracurricular purposes, and what type of vr content (e.g., software or equipment) they would like the library to acquire. the focus group facilitator asked follow-up questions for clarification as needed. the session was audio recorded, and participant responses were transcribed and coded for themes.

results and discussion

demographics

the most frequent users of the vr equipment in the library were male students in the science, technology, engineering, or mathematics (stem) disciplines. the percentage of male students at brigham young university is roughly 50 percent, but over 70 percent of our survey respondents were male. that stated, there was considerable use among all measured demographics, as shown in figure 1. over one third of responders were not students.
university faculty made up 11 percent of responders during the survey period. the proportion of faculty who responded was higher than the university's faculty-to-student ratio and likely the result of directly advertising the service to non-student university employees. because some users informed librarians that they had brought spouses and children to use the equipment, we estimate that the 7 percent of responders who were neither students nor university employees mostly consisted of family or friends accompanying students or employees. over one third of student responders were majoring in disciplines outside of science, technology, engineering, and mathematics. this number is small when compared to the number of students in these majors across campus (approximately 63 percent of students on campus are not majoring in stem disciplines); however, it demonstrates that there is an interest in vr technology throughout the university. as the vr services are located in the science and engineering area of the library, it is not surprising that more students majoring in these disciplines used these services when compared to students majoring in other disciplines. in fact, 15 percent of responders learned about the services at the reference desk, where they could see other patrons using the vr equipment. the most common discovery method, however, was the various forms of advertisements targeted to both students and employees of brigham young university, as shown in figure 2.

figure 1. demographics.

figure 2. most effective discovery methods: advertisement and word-of-mouth.

only 7 percent of responders identified research or class assignments as their primary reason for using the services. the large majority of use, as shown in figure 3, was simply for entertainment or fun. this was not unexpected, especially as most of the users were trying the technology for the first time (see figure 4). however, because we purchased the equipment with the intent to support academic pursuits on campus, we hoped to see a higher percentage of academic use.

figure 3. most responders came because it sounded fun.

figure 4. most responders were first-time users.

faculty use was higher than expected (see figure 5). eleven percent of users during our survey period were faculty. the majority of these responders indicated an interest in potentially using vr technology with their students (see figure 6). while this interest was positive, faculty member suggestions for classroom use remained hypothetical, without any concrete intentions for implementation. this suggests that although faculty interest exists, faculty may need to be informed of specific application ideas in order to be more likely to incorporate this technology into their courses.

figure 5. faculty were interested in trying the vr equipment.

figure 6. faculty were interested in using vr academically.

a clear majority (72 percent) indicated an intention of returning to the library to use the service again (see figure 7).
figure 7. most responders intend to return.

because our vr services were a small pilot program at the time of the survey, we did not offer a large number of paid apps to users. table 1 displays the most common apps used by survey responders. most users tried google earth during their session, and employees at the reference desk often recommended this app to new users. another common app for new users was the lab, which includes a few small games showcasing the current capabilities of vr. google tiltbrush is an app for creating 3d art. virtual jerusalem is an app that was created by faculty at brigham young university and allows users to walk around and explore the jerusalem temple mount during the time of christ. the fifth-most-used app we offered was 3d organon vr anatomy, which teaches human anatomy.

table 1. top five apps used.
1. google earth
2. the lab
3. tiltbrush by google
4. virtual jerusalem
5. 3d organon vr anatomy

focus group data

we conducted a total of three focus group sessions. each session included between five and eight participants, for a total of twenty-one focus group participants. because we were primarily interested in student responses, we limited focus group participants to students enrolled at brigham young university. the participants were asked to describe what did or did not go well during their vr session. when describing what went well during their vr session, many participants responded with positive comments about the quality of service the library employees provided during their session. most participants expressed satisfaction with the number and quality of the apps provided by the library. during all three focus groups, participants mentioned that they liked how easy it was to sign up for the vr services. the most common problems reported by participants related to health or safety concerns, such as feeling dizzy, bumping into objects in the room because of the lack of space, and tripping over the headset wire. others reported problems related to the level of personal or social comfort with the vr services, such as feeling self-conscious using vr in a semi-open space not exclusively devoted to vr services or being told to be quieter. when asked about ways the library could improve its vr services, the students suggested solutions to many of these problems. a frequent recommendation was that the library dedicate a space to vr. the reasons for this suggestion included minimizing the risk of accidentally bumping into objects, reducing the embarrassment of using the vr equipment in front of spectators, and allowing participants to become more fully immersed in the vr experience without worrying about being too loud. other common suggestions included providing more than one headset for multiple patrons to use for gaming purposes or team projects, acquiring wireless headsets to eliminate wire tripping hazards, and providing more online training videos to reduce reliance on library workers for common troubleshooting problems. participants did not provide actionable suggestions on ways to decrease dizziness while operating vr equipment. when asked about how the students could see themselves using vr academically, many responded with some of the more well-known uses of vr technology, such as potential uses in science, medicine, engineering, and the military.
however, some students had a very hard time determining how vr could be applied to humanities fields such as english. after some discussion, most students were able to see the relevance of vr in their field, but some said that they most likely would not pursue those functions of vr, using vr exclusively for extracurricular activities. in contrast to the lack of academic uses envisioned by focus group participants, participants had substantially more ideas about how they would use vr for extracurricular purposes, including playing games for stress relief, exercising, exploring the world, and watching movies. many expressed interest in using vr for extracurricular learning outside their majors, such as virtually being part of significant historic events, exploring ecosystems, and visiting museums or other significant landmarks. students expressed interest in exploring the many possibilities provided by vr technology but were not especially aware of or interested in how vr might apply to their specific field of study unless they were in an engineering, medical, or other science-related discipline.

conclusions

vr is a rapidly growing field, and academic libraries are already providing students access to this technology. in our study, we found considerable interest across campus in using vr in the library; however, the academic interest and use were not as high as we hoped. future marketing to faculty might benefit from specifically suggesting ideas for academic uses or collaboration. even though our current vr services are located at the science and engineering help desk, nearly 40 percent of users were not in stem disciplines. this is encouraging and suggests value in marketing future vr services to all library patrons. we also found sufficient patron interest to justify exploring related vr services, such as offering classes on creating content and acquiring less expensive headsets that can be borrowed outside of the library. although this survey was limited to one university, we believe the results can be used to inform other academic libraries as they develop similar vr services.

endnotes

1 susan lessick and michelle kraft, "facing reality: the growth of virtual reality and health sciences libraries," journal of the medical library association: jmla 105, no. 4 (2017): 407.

2 paul milgram et al., "augmented reality: a class of displays on the reality-virtuality continuum," in telemanipulator and telepresence technologies 2351 (international society for optics and photonics, 1995), 282-92.

3 hannah pope, "incorporating virtual and augmented reality in libraries," library technology reports 54, no. 6 (2018): 8.

4 sarah howard, kevin serpanchy, and kim lewin, "virtual reality content for higher education curriculum," proceedings of vala (melbourne, australia: libraries, technology and the future inc., 2018), 2.

5 zois koukopoulos and dimitrios koukopoulos, "usage scenarios and evaluation of augmented reality and social services for libraries," in digital heritage. progress in cultural heritage: documentation, preservation, and protection (springer international, 2018), 134-41; leanna fry balci, "using augmented reality to engage students in the library," information today europe/ili365 (november 17, 2017), https://www.infotoday.eu/articles/editorial/featured-articles/using-augmented-reality-to-engage-students-in-the-library-121763.aspx.
6 bruce massis, "using virtual and augmented reality in the library," new library world 116, nos. 11-12 (2015): 789, https://doi.org/10.1108/nlw-08-2015-0054.

7 adetoun a. oyelude, "virtual and augmented reality in libraries and the education sector," library hi tech news 34, no. 4 (2017): 3, https://doi.org/10.1108/lhtn-04-2017-0019.

8 weina wang, kelly kimberley, and fangmin wang, "meeting the needs of post-millennial: lending hot devices enables innovative library services," computers in libraries (april 2017): 7.

9 "oxford libguides: virtual reality: borrowing vr equipment," bodleian libraries, https://ox.libguides.com/vr/borrowing; "virtual reality services," penn state university libraries, https://libraries.psu.edu/services/virtual-reality-services; "vr studio," north carolina state, https://www.lib.ncsu.edu/spaces/vr-studio.

10 oyelude, "virtual and augmented reality," 3.

11 lessick and kraft, "facing reality: the growth of virtual reality," 409.

12 oyelude, "virtual and augmented reality," 3.

13 medhat alaker, greg r. wynn, and tan arulampalam, "virtual reality training in laparoscopic surgery: a systematic review & meta-analysis," international journal of surgery 29 (2016): 86, https://doi.org/10.1016/j.ijsu.2016.03.034.

14 elizabeth dyer, barbara j. swartzlander, and marilyn r. gugliucci, "using virtual reality in medical education to teach empathy," journal of the medical library association: jmla 106, no. 4 (2018): 498, https://doi.org/10.5195/jmla.2018.518.

15 emilio madrigal, shyam prajapati, and juan hernandez-prera, "introducing a virtual reality experience in anatomic pathology education," american journal of clinical pathology 146, no. 4 (2016): 462, https://doi.org/10.1093/ajcp/aqw133; nils fredrik kleven et al., "training nurses and educating the public using a virtual operating room with oculus rift," ieee (2014): 1, https://doi.org/10.1109/vsmm.2014.7136687.

16 wadee alhalabi, "virtual reality systems enhance students' achievements in engineering education," behaviour & information technology 35, no. 11 (2016): 925, https://doi.org/10.1080/0144929x.2016.1212931.

17 patricia brown, "how to transform your classroom with augmented reality—edsurge news," edsurge, november 2, 2015, https://www.edsurge.com/news/2015-11-02-how-to-transform-your-classroom-with-augmented-reality.

18 negin dahya et al., "virtual reality in public libraries," university of washington information school, https://ischool.uw.edu/vrinlibraries.

19 del siegle, "seeing is believing: using virtual and augmented reality to enhance student learning," gifted child today 42, no. 1 (2019): 46, https://doi.org/10.1177/1076217518804854.
20 guillaume loup et al., "immersion and persistence: improving learners' engagement in authentic learning situations," 11th european conference on technology enhanced learning (2016): 414, https://doi.org/10.1007/978-3-319-45153-4_35; robert carini, george kuh, and stephen klein, "student engagement and student learning: testing the linkages," research in higher education 47, no. 1 (2006): 23-4, https://doi.org/10.1007/s11162-005-8150-9.

21 mariano alcaniz, elena olmos-raya, and luis abad, "use of virtual reality for neurodevelopmental disorders: a review of the state of the art and future agenda," medicina (buenos aires) 79, nos. 77–81 (2019): 419-20, https://doi.org/10.21565/ozelegitimdergisi.448322.

22 yazhou huang, lloyd churches, and brendan reilly, "a case study on virtual reality american football training," proceedings of the 2015 virtual reality international conference 6 (2015): 3, https://doi.org/10.1145/2806173.2806178.

23 "media lab," massachusetts institute of technology, https://libraries.psu.edu/services/virtual-reality-services; "the ischool technology resources at fsu: virtual reality," florida state university libguides, https://guides.lib.fsu.edu/ischooltech/vr.

it is our flagship: surveying the landscape of digital interactive displays in learning environments

lydia zvyagintseva

information technology and libraries | june 2018 https://doi.org/10.6017/ital.v37i2.9987

lydia zvyagintseva (lzvyagintseva@epl.ca) is the digital exhibits librarian at the edmonton public library in edmonton, alberta.

abstract

this paper presents the findings of an environmental scan conducted as part of a digital exhibits intern librarian project at the edmonton public library in 2016. as part of the library's 2016–2018 business plan objective to define the vision for a digital exhibits service, this research project aimed to understand the current landscape of digital displays in learning institutions globally. the resulting study consisted of 39 structured interviews with libraries, museums, galleries, schools, and creative design studios. the environmental scan explored the technical infrastructure of digital displays, their user groups, various uses for the technologies within organizational contexts, the content sources, scheduling models, and resourcing needs for this emergent service. additionally, broader themes surrounding challenges and successes were also included in the study.
despite the variety of approaches taken among learning institutions in supporting digital displays, the majority of organizations have expressed a high degree of satisfaction with these technologies.

introduction

in 2020, the stanley a. milner library, the central branch of the edmonton (alberta) public library (epl), will reopen after extensive renovations to both the interior and exterior of the building. as part of the interior renovations, epl will have installed a large digital interactive display wall modeled after the cube at queensland university of technology (qut) in brisbane, australia. to prepare for the launch of this new technology service, epl hired a digital exhibits intern librarian in 2016, whose role consisted of conducting research to inform the library in defining the vision for a digital display wall serving as a shared community platform for all manner of digitally accessible and interactive exhibits. as a result, the author carried out an environmental scan and a literature review related to digital displays, as well as their consequent service contexts. for the purposes of this paper, "digital displays" refers to the technology and hardware used to showcase information, whereas "digital exhibits" refers to content and software used on those displays. wherever the service of running, managing, or using this technology is discussed, it is framed as "digital display service" and concerns both technical and organizational aspects of using this technology in a learning institution.

method

the data were collected between may 30 and august 20, 2016. a series of structured interviews was conducted by skype, phone, and email. the study population was assembled by searching google and google news for keywords such as "digital interactive and library," "interactive display," "public display," or "visualization wall" to identify organizations that have installed digital displays. a list of the study population was expanded by reviewing websites of creative studios specializing in interactive experiences and through a snowball effect once the interviews had begun. a small number of vendors, consisting primarily of creative agencies specializing in digital interactive services, were also included in the study population. participants were then recruited by email. the goal of this project was to gain a broad understanding of the emergent technology, content, and service model landscape related to digital displays. as a result, structured interviews were deemed to be the most appropriate method of data collection because of their capacity to generate a large amount of qualitative and quantitative data. in total, 39 interviews were conducted. a list of interview questions prepared for the interviews is included in appendix a. additionally, a complete list of the study population can also be found in appendix b. predominantly, organizations from canada, the united states, australia, and new zealand are represented in this study.

literature review

definitions

• public displays, a term used in the literature to refer to a particular type of digital display, can refer to "small or large sized screens that are placed indoor . . .
or outdoor for public viewing and usage" and which may be interactive to support information browsing and searching activities.1 in public displays, a large proportion of users are passers-by and thus first-time users.2 in academic environments, these technologies may be referred to as "video walls" and have been characterized as display technologies with little interactivity and input from users, often located in high-traffic, public areas with content prepared ahead of time and scheduled for display according to particular priorities.3

• semi-public displays, on the other hand, can be understood as systems intended to be used by "members of a small, co-located group within a confined physical space, and not general passers-by."4 in academic environments, they have been referred to as "visualization spaces" or "visualization studios," and can be defined as workspaces with real-time content displayed for analysis or interpretation, often placed in libraries or research department units.5 for the purposes of this paper, "digital displays" refers to both public and semi-public displays, as organizations interviewed as part of this study had both types of displays, occasionally simultaneously.

• honeypot effect describes how people interacting with an information system, such as a public display, stimulate other users to observe, approach, and engage in interaction with that system.6 this phenomenon extends beyond digital displays to tourism, art, or retail environments, where a site of interest attracts the attention of passers-by and draws them to participate in that site.

interactivity

the area of interactivity with public displays has been studied by many researchers, with three commonly used modes of interaction clearly identified: touch, gesture, and remote modes.

• touch (or multi-touch): this is the most common way users interact with personal mobile devices such as smartphones and tablets. multi-touch interaction on public displays should support many individuals interacting with the digital screen simultaneously, since many users expect immediate access and will not take turns. for example, some technologies studied in this report support up to 30 touch points at any given time, while others, like qut's the cube, allow for a near infinite number of touch points. though studies show that this technique is fast and natural, it also requires additional physical effort from the user.7 while touch interaction using infrared sensors has a high touch recognition rate, its shortcomings have been identified as being expensive and being influenced by light interference, such as light around the touch screen.8

• gesture: this is interaction through movement of the user's hands, arms, or entire body, recognized by sensors such as the microsoft kinect or leap motion systems. although studies show that this type of interaction is quick and intuitive, it also brings "a cognitive load to the users together with the increased concern of performing gestures in public spaces."9 specifically, body gestures were found not to be well suited to passing-by interaction, unlike hand gestures, which can be performed while walking.
hand gestures also have an acceptable mental, physical, and temporal workload.10 research into gesture-based interaction shows that "more movement can negatively influence recall" and is therefore not suited for informational exhibits.11 similarly, people consider gestures to be too much work "when they require two hands and large movements" to execute.12 not surprisingly, research suggests that gestures deemed to be socially acceptable for public spaces are small, unobtrusive ones that mimic everyday actions. they are also more likely to be adopted by users.

• remote: these are interactions using another device, such as mobile phones, tablets, virtual-reality headsets, game controllers, and other special devices. connection protocols may include bluetooth, sms messaging, near-field communication, radio-frequency identification, wireless-network connectivity, and other methods. mobile-based interaction with public displays has received a lot of attention in research, media, and commercial environments because this mode allows users to interact from a variable distance with minimal physical effort. however, users often find mobile interaction with a public display "too technical and inconvenient" because it requires sophisticated levels of digital literacy in addition to having access to a suitable device.13 some suggest that using personal devices for input also helps "avoid occlusion and offers interaction at a distance" without requiring multi-touch or gesture-based interactions.14 as well, subjects in studies on mobile interaction often indicate their preference for this mode because of its low mental effort and low physical demand. however, it is possible that these studies focused on users with high degrees of digital literacy rather than the general public with varying degrees of access to and comfort with mobile technologies.

user engagement

attracting user attention is not necessarily guaranteed by virtue of having a public display. according to research, the most significant factors that influence user engagement with public digital displays are age, display content, and social context.

age

hinrichs found that children were the first to engage in interaction with public displays and would often recruit adults accompanying them toward the installation.15 on the other hand, hinrichs found adults to be more hesitant in approaching the installation: "they would often look at it from a distance before deciding to explore it further."16 these findings suggest that designing for children first is an effective strategy for enticing interaction from users of all ages.

display content

studies on engagement in public digital display environments indicate that both passive and active types of engagement exist with digital displays. the role of emotion in the content displayed also cannot be overlooked. specifically, clinch et al. state that people typically pay attention to displays "only when they expected the content to be of interest to them" and that they are "more likely to expect interesting content in a university context rather than within commercial premises."17 in other words, the context in which the display is situated affects user expectations and primes them for interaction. the dominant communication pattern in existing display and signage systems has been narrowcast, a model in which displays are essentially seen as distribution points for centrally created content without much consideration for users.
this model of messaging exists in commercial spaces, such as malls, but also in public areas like transit centers, university campuses, and other spaces where crowds of people may gather or pass by. observational studies indicate that people tend to perceive this type of content as not relevant to them and ignore it.18 for public displays to be engaging to end users, in other words, "there needs to be some kind of reciprocal interaction."19 in public spaces, interactive displays may be more successful than noninteractive displays in engaging viewers and making city centers livelier and more attractive.20 in terms of precise measures of attention to such displays, studies of average attention time correlate age with responsiveness to digital signage. children (1–14 years) are more receptive than adults, and men spend more time observing digital signage than women.21 studies also indicate significantly higher average attention times for observing dynamic content as compared to static content.22 scholars like buerger suggest that designers of applications for public digital displays should assume that viewers are not willing "to spend more than a few seconds to determine whether a display is of interest."23 instead, they recommend presenting informational content with minimal text and in such a way that the most important information can be determined in two to three seconds. in a museum context, the average interaction time with the digital display was between two and five minutes, which was also the average time people spent exploring analog exhibits.24 dynamic, game-like exhibits at the cube incorporate all the above findings to make interaction interesting and short, and to draw the attention of children first.

social context

social context is another aspect that has been studied extensively in the field of human-computer interaction, and it provides many valuable lessons for applying evidence-based practices to technology service planning in libraries. many scholars have observed the honeypot effect as related to interaction with digital displays in public settings. this effect describes how users who are actively engaged with the display perform two important functions: they entice passers-by to become actively engaged users themselves, and they demonstrate how to interact with the technology without formal instruction. many argue that a conducive social context can "overcome a poor physical space, but an inappropriate social context can inhibit interaction" even in physical spaces where engagement with the technology is encouraged.25 this finding relates to the use of gestures on public displays. researchers also found that contextual social factors such as age and being around others in a public setting do, in fact, influence the choice of multi-touch gestures.
hinrichs suggests enabling a variety of gestures for each action—accommodating different hand postures and a large number of touch points, for example—to support fluid gesture sequences and social interactions.26 a major deterrent to users' interaction with large public displays has been identified as the potential for social embarrassment.27 as an implication, the authors suggest positioning the display along thoroughfares of traffic and improving how the interaction principles of the display are communicated implicitly to bystanders, thus continually instructing new users on techniques of interaction.28

findings

technical and hardware landscape

the average age of public displays was around three years, indicating an early stage of development of this type of service among learning institutions. such technologies first appeared in europe more than 10 years ago (for example, the most widely cited early example of a public display is the citywall in helsinki in 2007).29 however, adoption in north america did not start until around 2013. the median year for the installation of these technologies among organizations studied in this report is 2014. among public institutions represented in the study population, such as public libraries and museums, digital displays were most frequently installed in 2015. while most organizations have only one display space, it was not unusual to find several within a single organization. for example, for the purposes of this study, the researcher has counted the cube as three display spaces, as documentation and promotional literature on the technology cites "3 separate display zones." as a result, the average number of display spaces in the population of this study is 1.75. the following modes of interaction with digital displays, beyond displaying video content, have been observed in the study population, in descending order of frequency:

• sound (79%). while research on human-computer interaction is inconclusive about best practices related to incorporating sound into digital interactive displays, it is clear, among the organizations interviewed in the environmental scan, that sound is a major component of digital exhibits and should not be overlooked.

• touch or multi-touch (46%). this finding highlights that support for multi-user interaction is not consistent across the study population (a brief input-handling sketch follows at the end of this subsection).

• gesture (25%): these include tools such as microsoft kinect, leap motion, or other systems for detecting movement for interaction.

• mobile (14%). while some researchers in the human-computer interaction field suggest mobile is the most effective way to bridge the divide between large public displays, personalization of content, and user engagement, mobile interactivity is not used frequently to engage with digital displays in the study population. one outlier is north carolina state university library, which takes a holistic, "massively responsive design" approach in which responsive web design principles are applied to content that can be displayed effectively at once online, on digital display walls, and on mobile devices while optimizing institutional resources dedicated to supporting visualization services.

further, as in the broader personal computing environment, the microsoft windows operating system dominates display systems, with 61% of the organizations choosing a windows machine to power their digital display.
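several of the interaction modes above (touch, gesture, and mobile) ultimately reach the exhibit software as a stream of input events that must cope with many users at once. the sketch below is only a rough illustration and is not drawn from any system in the study: it assumes a browser-based exhibit and uses the standard pointer events api to track concurrent touch points. the element id and the 30-point cap are hypothetical, the latter echoing the touch-point limits mentioned in the literature review.

```typescript
// minimal sketch: tracking concurrent touch points on a wall-sized display.
// assumes a browser-based exhibit rendered into an element with id "exhibit"
// (hypothetical); the 30-point cap mirrors limits reported for some hardware.

type TouchPoint = { x: number; y: number; startedAt: number };

const MAX_POINTS = 30;                         // assumed hardware limit
const active = new Map<number, TouchPoint>();  // keyed by pointerId

const surface = document.getElementById("exhibit")!;

surface.addEventListener("pointerdown", (e: PointerEvent) => {
  if (e.pointerType !== "touch" || active.size >= MAX_POINTS) return;
  active.set(e.pointerId, { x: e.clientX, y: e.clientY, startedAt: performance.now() });
});

surface.addEventListener("pointermove", (e: PointerEvent) => {
  const p = active.get(e.pointerId);
  if (p) { p.x = e.clientX; p.y = e.clientY; }   // update only this finger's position
});

for (const end of ["pointerup", "pointercancel"] as const) {
  surface.addEventListener(end, (e: PointerEvent) => active.delete(e.pointerId));
}

// an exhibit's render loop can treat `active` as the current set of touches,
// so several visitors can interact at once without taking turns.
```

nothing in this sketch is specific to any organization interviewed; it simply illustrates why supporting many individuals interacting simultaneously is as much an application concern as a hardware one.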
a fifth (21%) of all organizations have some form of networked computing infrastructure, such as the cube with its capacity to process exhibit content using 30 servers. in contrast, the majority (79%) of organizations interviewed have a single computer powering the display. this finding is perhaps not surprising, given that few institutions have dedicated it teams to support a single technology service like the cube.

users and use cases

understanding primary audiences was also important for this study, as the organizational user base defines the context for digital exhibits. the breakdown of these audiences is summarized in figure 1. for example, the university of oregon ford alumni center's digital interactive display focuses primarily on showcasing the success of its alumni, with a goal of recruiting new students to the university. however, the interactive exhibits also serve the general public through tours and events on the university of oregon campus. other organizations with digital displays, such as all saints anglican school and the philadelphia museum of art, also target specific audiences, so planning for exhibits may be easier in those contexts than in organizations like the university of waterloo stratford campus, with its display wall at the downtown campus that receives visitor traffic from students, faculty, and the public.

[figure 1. audience types for digital displays in the study population: academic 44%, public 33%, both public and academic 22%.]

digital displays serve various purposes, which depend on the context of the organization in which they exist, their technical functionality, their primary audience, their service design, and other factors. interview participants were asked about the various uses for these technologies at their institutions. a single display could have multiple functions within a single institution. the following list summarizes these multiple uses:

1. educational (67%), such as displaying digital collections, archives, historical maps, and other informational content. these activities can be summarized in the words of one participant as "education via browse"—in other words, self-guided discovery rather than formal instruction.
2. fun or entertainment (56%), including art exhibitions, film screenings, games, playful exhibits, and other engaging content to entice users.
3. communication (47%), which can be considered a form of digital signage to promote library or institutional services and marketing content. displays can also deliver presentations and communicate scholarly work.
4. teaching (42%), including formal and semi-formal instruction, workshops, student presentations, and student course-work showcases.
5. events (31%), such as public tours, conferences, guest speakers, special events, galas, and other social activities near or using the display.
6. community engagement (28%), including participation from community members through content contribution, showing local content, using the display technology as an outreach tool, and other strategies to build relationships with user communities.
7. research (22%), where the display functions as a tool that facilitates scholarly activities like data collection, analysis, and peer review. many study participants acknowledged challenges in using digital displays for this purpose and have identified other services that might support this use more effectively.
content types and management

in the words of deakin university librarians, "content is critical, but the message is king," so it was particularly important for the author to understand the current digital display landscape as it relates to content.30 specifically, the research project encompassed the variety of content used on digital displays as well as how it is created, managed, shared, and received by the audiences of various organizations interviewed in this study. as can be observed in figure 2, all organizations supported 2d content, such as images, video, audio, presentation slides, and other visual and textual material. however, dynamic forms of content, such as social media feeds, interactive maps, and websites, were less prevalent.

[figure 2. types of content supported by digital displays in the study population: static 2d 100%, dynamic web 61%, dynamic 3d 57%.]

discussions around interest in emergent, immersive, and dynamic 3d content such as games and virtual and augmented reality also came up frequently in the study interviews, and the researcher found that these types of content were supported in only 16 (57%) of the 28 total cases. this number is lower than the total number of interviewees because not all organizations interviewed had content to manage or display. in addition, many organizations recognized that they would likely be exploring ways to present 3d games or immersive environments through their digital display in the near future. not surprisingly, the creative agencies included in this study revealed an awareness and active development of content of this nature, noting "rising demand and interest in 3d and game-like environments." furthermore, projects involving motion detection, the internet of things, and other sensor-based interactions are also seeing a rise in demand, according to study participants.

in terms of managing various types of content, 20 (71%) of the organizations interviewed had used some form of content management system (cms), while the rest did not use any tool to manage or organize content. of those organizations that used a cms, 15 (75%) relied on a vendor-supplied system, such as tools by fourwinds interactive, visix, or nec live. the remaining 5 (18% of all organizations) created a custom solution without going to a vendor. this finding suggests that since the majority of content supported by organizations with digital displays is 2d, current vendor solutions for managing that content are sufficient for the study population at this point. it is unclear how the rise in demand for dynamic, game-like content will be supported by vendors in the coming years.

[figure 3. content management systems for digital displays: vendor-supplied system 53%, in-house created system 18%, no system 18%, unknown 11%.]

table 1 reflects the distribution of approaches to managing content observed in the study population.
table 1. content management in study population

content management        responses   %
vendor-supplied system        15      54
in-house created system        5      18
no system                      5      18
unknown                        3      10

middleware, automation, and exhibit management

middleware can be described as the layer of software between the operating system and applications running on the display, especially in a networked computing environment. for example, most organizations studied in the environmental scan supported a windows environment with a range of exhibit applications, like slideshows, web browsers, and executable files, such as games. middleware can simplify and automate the process of starting up, switching between, and shutting off display applications on a set schedule (a brief illustrative sketch of such a scheduler appears below, after the discussion of content sources). as figure 4 demonstrates, the majority of the organizations in the study population (17, or 61%) did not have a middleware solution. however, this group was heterogeneous: 14 organizations (50%) did not require a middleware solution because they ran content semi-permanently or relied on user-supplied content, in which case the display functioned as a teaching tool. the remaining three organizations (11%) manually managed scheduling and switching between exhibit content. in such cases, a middleware solution would be valuable for the management of content, especially as the number of applications grows, but it was not present in these organizations. comparatively, 10 organizations (36%) used a custom solution, such as a combination of windows or linux scripts, to manage automation and scheduling of content on the display. one organization (3%) did not specify their approach to managing content. these findings suggest that no formalized solution to automating and managing software currently exists among the study population. in addition to organizing content, digital-exhibits services involve scheduling or automating content to meet user needs according to the time of day, special events, or seasonal relevance. as a result, the middleware technology solution supports sustainable management of displays and predictable sharing of content for end users. this environmental scan revealed that digital exhibits and interactive experiences are still in the early days of development. it is possible that new solutions for managing content both at the application and the middleware level may emerge in the coming years, but they are currently limited.

[figure 4. middleware solutions in the study population: none 61%, custom 36%, unknown 3%.]

sources of content

when finding sources of content to be displayed on digital displays, organizations interviewed used multiple strategies simultaneously. table 2 below brings together the findings related to this theme.

table 2. content sources for digital exhibits

content source               %
external/commissioned        64
user-supplied                64
internal/in-house            50
collaborative with partner   43

for example, many organizations rely on their users to generate and submit material (18, or 64%); others commission vendors to create exhibits for them (18, or 64%). in 50% of all cases, organizations also produce content for exhibits in-house. in other words, most organizations used a combination of all sources to generate content for their digital displays. only a few use a single source of content, such as the semi-permanent historical exhibit at henrico county public library.
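returning to the middleware discussion above: none of the organizations' in-house scripts were available to the author, so the following is only a minimal sketch of the job such a layer performs, assuming a host where each exhibit is a standalone executable and the schedule is a hard-coded list. the file paths, hours, and exhibit names are invented for illustration and do not describe any organization in the study.

```typescript
// minimal middleware sketch (node.js): launch whichever exhibit is scheduled
// for the current hour and stop the previous one. paths and hours are
// hypothetical placeholders, not taken from any organization in the study.
import { spawn, ChildProcess } from "node:child_process";

interface Slot { startHour: number; endHour: number; command: string; args: string[] }

const schedule: Slot[] = [
  { startHour: 9,  endHour: 17, command: "C:\\exhibits\\local-history.exe", args: [] },
  { startHour: 17, endHour: 21, command: "C:\\exhibits\\student-showcase.exe", args: ["--kiosk"] },
];

let running: { slot: Slot; proc: ChildProcess } | null = null;

function currentSlot(now = new Date()): Slot | undefined {
  const h = now.getHours();
  return schedule.find((s) => h >= s.startHour && h < s.endHour);
}

function tick(): void {
  const next = currentSlot();
  if (running && running.slot !== next) {   // the scheduled exhibit has changed
    running.proc.kill();
    running = null;
  }
  if (!running && next) {
    const proc = spawn(next.command, next.args, { stdio: "ignore" });
    proc.on("exit", () => { if (running?.proc === proc) running = null; }); // allow restart on crash
    running = { slot: next, proc };
  }
}

setInterval(tick, 60_000);  // check the schedule once a minute
tick();
```

a real deployment would add logging, a configuration file, and graceful shutdown, but the core of the "automation and scheduling" layer described by study participants can be this thin.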
other organizations, like the duke media wall, rely entirely on their users to supply content, employing a "for students by students" model of content creation. additionally, only 12 (43%) of the organizations interviewed had explored or established some form of partnership for creating exhibits. primarily, these partnerships existed with departments, centers, institutes, campus units, and/or students in academic settings, such as the computer science department, faculty of graduate studies, and international studies. other examples of partnerships were with similar civic, educational, cultural, and heritage organizations, such as municipal libraries, historical societies, art galleries, museums, and nonprofits. examples included study participants working with ars electronica, local symphony orchestras, harvard space science, and nasa on digital exhibits. clearly, a variety of approaches were taken in the study population to come up with digital exhibits content.

content creation guidelines

seven organizations (19%) in the study population publicly shared content guidelines aimed at simplifying the process of engaging users in creating exhibits. these guidelines were analyzed, and key elements were identified that are necessary for users to know in order to contribute in a meaningful way, thereby lowering the barrier to participation. these elements include the resolution of the display screen(s), touch capability, ambient light around the display space, required file formats, and maximum file size. a complete list of organizations with such guidelines, along with websites where these guidelines can be found, is included in appendix c. based on the analysis of this limited sample, the bare minimum for community participation guidelines would include clearly outlining

• the scope, purpose, audience, and curatorial policy of the digital exhibits service;
• the technical specifications, such as the resolution, aspect ratio, and file formats supported by the display;
• the design guidelines, such as colors, templates, and other visual elements;
• the contact information of the digital exhibits coordinator; and
• the online or email submission form.

it should be noted, however, that such specifications are primarily useful when a cms exists and the content solicited from users is at least somewhat standardized. for example, images, slides, or webpages may be easier for community partners to contribute than video games or 3d interactive content. no examples of guidelines for the latter were observed in the study.

content scheduling

whereas the middleware section of this study examined the technical approaches to content management and automation, this section explores the frequency of exhibit rotation from a service design perspective. as can be observed in figure 5, no consistent or dominant model for exhibit scheduling has been identified in the study population. generally, approaches to scheduling digital exhibits reflect organizational contexts. for example, museums typically design an exhibit and display it on a permanent basis, while academic institutions change displays of student work or scholarly communication once per semester. the following scheduling models have emerged, in descending order of frequency in the study population.

[figure 5. content scheduling distribution in the study population: unstructured 29%, seasonal 29%, permanent 21%, monthly 10%, weekly 7%, daily 4%.]

1. unstructured (29%): no formal approach, policy, or expectation is identified by the organization regarding displaying exhibits.
this model is largely related to the early stage of service development in this domain, lack of staff capacity to support the service, and/or responsiveness to user needs. one study participant, for example, referred to this loose approach by noting that "no formalized approach and no official policy exists." for example, institutions may have frameworks for what types of content are acceptable but no specific requirements on the content subjects. institutions adopting a lab space model (see figure 6) for digital displays largely belong to this category. in other words, content is created on the fly through workshops, data analysis, and other situations as needed by users. in this case, no formal scheduling is required apart from space reservations.
2. seasonal (29%), which can be defined as a period from three to six months and includes semester-based scheduling in academic institutions. many organizations operate on a quarterly basis, so it would seem logical that content refresh cycles reflect the broader workflow of the organization.
3. permanent (21%): in the cases of museums, permanent exhibits may mean displaying content indefinitely or until the next hardware refresh, which might reconfigure the entire interactive display service. no specific date ranges were cited for this model.
4. monthly (10%): this pattern was observed among academic libraries, with production of "monthly playlists" featuring curated book lists or other monthly specials.
5. weekly (7%): north carolina state university and deakin university libraries aim to have fresh content up once per week; they achieve this in part by formalizing the roles needed to support their digital display and visualization services.
6. daily (4%): only griffith university ensures that new content is available every day on its #seemore display; it does this largely by relying on standardized external and internal inputs, such as weather updates and content from the university marketing department.

staffing and skills

one key element of the digital exhibits research project included investigating staffing models required to support a service of this nature. not surprisingly, the theme around resource needs for digital exhibits emerged in most interviews conducted. several participants noted that one "can't just throw up content and leave it," while others advised to "have expertise on staff before tech is installed." data gathered shows that the average full-time equivalent (fte) needed to support digital display services in organizations interviewed was 2.97—around three full-time staff members. in addition, 74% of the organizations studied had maintenance or support contracts with various vendors, including av integrators, cms specialists, creative studios that produced original content, or hardware suppliers. hardware and av integrators typically provided a 12-month contract for technical troubleshooting, while creative studios ensured a 3-month support contract for digital exhibits they designed. the average time to create an original, interactive exhibit was between 9 and 12 months, according to the data provided by creative agencies, the cube teams, and learning organizations who have in-house teams creating exhibits regularly.
this length of time varies with the complexity of the interaction designed, the depth of the exhibit "narrative," and the modes of input supported by the exhibit application. additionally, it was important to understand the curatorial labor behind digital exhibits; the author did not necessarily speak with the curator of exhibits, and this work may be carried out by multiple individuals within organizations with digital displays or creative studios. in 20 (57%) of the cases, the person interviewed also curated some or all of the content for the digital display in their respective institutions. in five (14%) of the cases, the individual interviewed was not a curator for any of the content, because there was no need for curation in the first place. for example, displays in these cases were used for analysis or teaching and therefore did not require prepared content. in the rest of the cases (10, or 29%), a creative agency vendor, another member of the team, or a community partner was responsible for the curation of exhibit content. this finding suggests that, while a significant number of organizations outsource the design and curation of exhibits, the majority retain control over this process. therefore, dedicating resources to curation, organization, and management of exhibit content is deemed significant by the organizations represented in the study.

in terms of the capacity to carry out digital display services, skills that have been identified by study participants as being important to supporting work of this nature include the following:

1. technical skills (such as the ability to troubleshoot), general interest in technology, and flexibility and willingness to learn new things (74%)
2. design, visual, and creative sensibility (40%), as this type of work is primarily a visual experience
3. software-development or programming-language knowledge (31%)
4. communication, collaboration, and relationship-building (25%)
5. project management (20%)
6. audiovisual and media skills (14%), as digital exhibits are "as much an av experience as an it experience," according to one study participant
7. curatorial, organizational, and content-management skills (11%)

the most frequent dedicated roles mentioned in the interviews are shown in table 3.

table 3. types of roles significant to digital exhibits work

position                                      responses   %
developer/programmer                              11      31
project manager                                    8      23
graphic designer                                   6      17
user experience or user interface designer         4      11
it systems administrator                           4      11
av or media specialist                             4      11

the relatively low percentages represented in this table suggest that the skills mentioned above are distributed among various team members or combined in a single role, as may be the case in small institutions or those without formalized services with dedicated roles. nevertheless, the presence of specific job titles indicates understanding of the various skill sets needed to run a service that uses digital displays.

challenges and successes

many challenges were identified by study participants related to initiating and supporting a service that uses digital displays for learning. clearly, multiple challenges could be associated with the services related to digital displays within a single organization. however, many successes and lessons learned were also shared by interviewees, often overlapping with identified challenges.
this pattern suggests that some organizations can pursue strategies that address challenges faced by their library or museum colleagues while perhaps lacking resources or capacity in other areas related to this type of service. for example, some organizations have observed a lack of user engagement because of the limited interactivity of the technology solution they used. others have had successful user engagement largely by investing in technology solutions that provide a range of modes of interaction. it is important to learn from both these areas to anticipate possible pain points and to be able to capitalize on successes that lead to industry recognition and engagement from library customers. table 4 summarizes the range of challenges identified.

table 4. challenges related to digital display services

challenge identified      responses   %
technical                     14      41
content                       11      33
costs                         11      33
user expectations             11      33
workflow                      10      29
service design                 9      26
time                           8      24
organizational culture         8      24
user engagement                7      20

as reflected in table 4, several key challenges have been discussed:

1. technical, such as troubleshooting the technology, keeping up with new technologies or upgrades, and finding software solutions appropriate for the hardware selected.
2. content, such as coming up with original content or curating existing sources. in the words of one participant, "quality and refresh of content is key—it has to be meaningful, interesting, and new." this clearly presents a resource requirement.
3. costs, such as the financial commitment to the service, the unseen costs in putting exhibits together, software licensing, and hardware upgrades.
4. user expectations, such as keeping the service at its full potential and using the maximum functionality of the hardware and software solutions. according to study participants, users "may not want what they think or they say they want," and to some extent, "such technologies are almost an expectation now, and not as exciting for users."
5. workflow or project-management strategies specifically related to emergent multimedia experiences that require new cycles of development and testing.
6. time to plan, source, create, troubleshoot, launch, and improve exhibits.
7. service design, such as thinking holistically about the functions of the technology within the larger organizational structure. as one study participant stated, organizations "cannot disregard the reality of the service being tied to a physical space" in that these types of technologies are both a virtual and physical customer experience.
8. organizational culture and policy, in terms of adapting project-based approaches to planning and resourcing services, getting institutional support, and educating all staff about the purpose, function, and benefits of the service.
9. user engagement, particularly keeping users interested in the exhibits and continually finding new and exciting content. various participants have found that "linger time is between 30 seconds to few minutes" and content being displayed needs to be "something interesting, unique, and succinct, but not a destination in itself."

despite the clear challenges with delivering digital exhibits services, organizations that participated in this study have identified keys to success (see table 5).
table 5. successes and lessons learned in using digital displays

successful approach or lesson identified   responses   %
user engagement and interactivity              16      47
service design                                 14      41
"wow" factor                                   12      35
organizational leadership                      12      35
technology solution                            10      29
flexibility                                    10      29
communication and collaboration                10      29
project management                              9      26
team and skill sets                             9      26

as reflected in table 5, several approaches have been discussed:

• user engagement and interactivity, particularly for those institutions that invested in highly interactive and immersive experiences; the rewards are seen in the interest and enthusiasm of their user groups.
• service design: organizations that have carefully planned the service have found that this technology was successfully serving the needs of their user communities.
• promotion and "wow factor" that has brought attention to the organization and the service. it is not surprising that digital displays are central points on tours for dignitaries, political figures, and external guests. further, many have commented that they "did not imagine a library could be involved in such an innovative experiment," and others have added that their digital displays have "created new conversations that did not exist before."
• leadership and vision at the organizational level, which secures support and resources as well as defines the scope of the service to ensure its sustainability and success: "money is not necessarily the only barrier to doing this service, but risk taking, culture."
• technology solution, where "everything works" and both the organization and users of the service are happy with the functionality, features, and performance of the chosen solution.
• flexibility and willingness to learn new things, including being open to agile project-management methods, taking risks, and continually learning new tools, technologies, and processes as the service matures.
• communication and collaboration, both internally among stakeholders and externally by building community partnerships, new audiences, and user participation in content creation. for example, one study participant noted that the technology "has contributed to giving the museum a new audience of primarily young people and families—a key objective held in 2010 at the commencement of the gallery refurbishments."
• workflow and project management for those embracing the new approaches required to bring multiple skill sets together to create engaging new exhibits. as one participant has put it, "these types of approaches require testing, improvement, a new workflow and lifecycle for the projects."
• having the right team with appropriate skills to support the service, though this theme was rated as being less significant than designing services effectively and securing institutional support for the technology service. in other words, study participants noted that having in-house programming or design skills is not enough without a proper definition of success for digital exhibits services.

perceptions

institutional and user reception of digital displays as a service to pursue in learning organizations has been identified as overwhelmingly positive, with 87% of the organizations noting positive feedback. for example, one study participant noted the positive attention received by the wider community for the digital display, stating "it is our flagship and people are in general impressed by both the potential and some of the existing content."
some participants have gone as far as to say that the reception among users has been "through the roof" and they have "never had a negative feedback comment" about their display. this finding indicates a high degree of satisfaction with such technologies by organizations that pursued a digital display. table 6 further explores the range of perceptions observed in the study.

table 6. perception of digital display services

perception                        responses   %
positive                              20      87
hesitation or uncertainty              7      30
concerns about purpose                 4      17
concerns about user engagement         4      17
concerns about costs                   3      13
negative                               3      13

a minority (13%) have noted some negative perceptions, largely related to concerns about costs or functionality of the technology; 30% have observed uncertainty and hesitation on the part of staff and users in terms of engagement, as well as interrogation of the technology's purpose in the organization. for example, one study participant summarizes this mixed sentiment by saying, "the perception is that it's really neat and worthwhile for exploring new ways of teaching, but that the same features and functions could be achieved with less (which we think is a good thing!)." it is helpful to note this trend in perception, as any new service will likely bring a mixture of excitement, hesitation, and occasional opposition. interestingly, these reactions have originated both from the staff of organizations interviewed and their communities of users.

discussion

the findings from this study indicate that the functions of digital displays are highly dependent on the organizational context in which the displays exist. this context, in turn, defines the nature of the services delivered through the digital display. for example, figure 6 can be useful in classifying the various ways digital displays appear in the study population, from research and teaching-oriented lab spaces to public spaces with passive messaging or active, immersive, game-like digital experiences.

[figure 6. types of digital displays in the study population.]

as such, visualization walls might belong in the "lab spaces" category that typically appears in academic libraries or research units and do not require content planning and scheduling. what we might call "digital interactive exhibits" tend to appear in museums and galleries with a primarily public audience and may have a permanent, seasonal, or monthly rotation schedule. however, despite a range of approaches taken to provide content and in terms of use of these technologies, many organizations share resourcing needs and challenges, such as troubleshooting the technology solution, creating engaging content, and managing the costs of interactive projects. despite these common concerns, the digital-exhibits services were perceived as being overwhelmingly satisfactory in all types of organizations included in this study because they brought new audiences to the organization and were often seen as "showpieces" in the broader community. the data gathered in the environmental scan demonstrates that there is currently little consistency among digital displays in learning environments. this lack of consistency is seen in content-development methods among study participants, their programming, content management, technology solutions, and even the naming of the display (and, by extension, the display service).
for example, this study revealed that evidently no "open platform" for managing content at the application or the middleware level currently exists. a small number of software tools are used by organizations to support digital displays, but their use is in no way standardized, as compared to nearly every other area of library services. there is some indication that digital-display services may become more standardized in the coming years, and more tools, solutions, vendors, and communities of practice will be available. for example, many signage cmss are currently on the market, and the number of game-like immersive experience companies is growing, suggesting extension of these services to libraries in the coming years. only a few software tools exist for creating exhibits, such as intuiface and touchdesigner, though no free, open-source versions of exhibit software are currently available. as well, the growing number of digital exhibits and interactive media companies currently focuses on turnkey—rather than software-as-a-service or platform—solutions.

in contrast, some consistency exists in the staffing needs and skills required to support the digital-exhibits service. a majority of organizations interviewed agreed that design, software development, systems administration, and project-management skills are needed to ensure digital-exhibits services run sustainably in a learning organization. in addition, the lack of public library representation in this study makes it challenging to draw parallels to the library context. adapting museum practices is also not necessarily reliable, as there is rarely a mandate to engage communities and partner on content creation, as there is in libraries. for example, only the el paso (texas) museum of history engages the local community to source and organize content. these findings suggest that digital displays are a growing domain, and more solutions are likely to emerge in the coming years.

the cube, compared to the rest of the study population, is a unique service model because it successfully brings together most elements examined in the environmental scan. for example, to ensure continual engagement with the digital display, the cube schedules exhibits on a regular basis and employs user interface designers, systems administrators, software engineers, and project managers. it also extends the content through community engagement, public tours, and stem programming. it has created an in-house middleware solution to simplify exhibit delivery and has chosen unity3d as its platform of choice for exhibit development.

limitations

only organizations from english-speaking countries were interviewed as part of the environmental scan. it is therefore unclear if access to organizations from non–english-speaking countries would have produced new themes and significantly different results. in addition, as with all environmental scans, the data is limited by the degree of understanding, knowledge, and willingness to share information of the individual being interviewed. particularly, individuals with whom the author spoke may or may not have been technology or service leads for the digital display at their respective institutions. thus, the study participants had a range of understanding of hardware specifications, functionality, and service-design components associated with digital displays.
for example, having access to technology leads would likely have provided more nuanced responses around the middleware solutions and the underlying technical infrastructure required to support this service. a small number of vendors were also interviewed as part of the environmental scan even though vendors did not necessarily have digital displays or service models parallel to libraries or museums. they are included in appendix b. nevertheless, gathering data from this group was deemed relevant to the study, as creative agencies have formalized staffing models and clearly identified skill sets necessary to support services of this nature. in addition, this group possesses knowledge of best practices, workflows, and project-management processes related to exhibit development. finally, this environmental scan also did not capture any interaction with direct users of digital displays, whose experiences and perceptions of these technologies may or may not support the findings gathered from the organizations interviewed. these limitations were addressed by increasing the sample size of the study within the time and resource constraints of the research project.

conclusion

the findings of this study show that the functions of digital-display technologies and their related services are highly dependent on the organizational context in which they exist. however, despite a range of approaches taken to provide content and in terms of use of these technologies, many organizations share resourcing needs and challenges, such as troubleshooting the technology solution, creating engaging content, and managing costs of interactive projects. despite these common concerns, digital displays were perceived as being overwhelmingly positive in all types of organizations interviewed in this study, as they brought new audiences to the organization and were often seen as "showpieces" in the broader community. the successes and lessons learned from the study population are meant to provide a broader perspective on this maturing domain as well as help inform planning processes for future digital exhibits in learning organizations.

appendix a. environmental scan questions

digital exhibits environmental scan interview questions—museums, libraries, public organizations

1. what are the technical specifications of the digital interactive technology at your institution?
2. who are the primary users of this technology (those interacting with the platform)? is there anyone you thought would use it and isn't?
3. what are the primary uses for the technology (events, presentations, analysis, workshops)?
4. what types of content are supported by the technology (video, images, audio, maps, text, games, 3d, all of the above)?
5. where is content created and how is this content managed?
6. what is the schedule for the content and how is it prioritized?
7. can you estimate the fte (full-time equivalent) of staff members involved in supporting this technology/service, both directly and indirectly? what does indirect support for this technology entail?
8. in your experience, what kinds of skills are necessary in order to support this service?
9. have partnerships with other organizations producing content to be exhibited been established or explored?
10. what challenges have you encountered in providing this service?
11. what have been some keys to the successes in supporting this service?
12. what has been the biggest success of this service and what has been the biggest disappointment?
13. what is the perception of this technology in the institution more broadly?
14. are there any other institutions you suggest we contact to learn more about similar technologies?

digital exhibits environmental scan interview questions: vendors
1. what is the relationship between the creative studio and hardware/fabrication? do you do everything or work with av integrators instead to put together touch interactives?
2. who have been the primary users of the interactive exhibits and projects you have completed?
3. who writes the use cases when creating a digital interactive exhibit?
4. what types of content are supported by the technology (video, images, audio, maps, text, games, 3d, all of the above)? do you see a rise in interest for 3d and game-like environments, and do you have internal expertise to support it?
5. where is content created for the exhibits and how is this content managed? who curates?
6. what timespan or lifecycle do you design for?
7. how big is your team? how long do projects typically take to create?
8. what types of expertise do you have in house? what might a project team look like?
9. to what extent is there a goal of sharing knowledge back with the company from clients or users?
10. what challenges have you encountered in providing this service?
11. what have been some keys to the successes in supporting this service?

appendix b: study population in environmental scan
organization | location | date interviewed
all saints anglican school | merrimac, australia | july 25, 2016
anode | nashville, tn | july 22, 2016
belle & wissell | seattle, wa | july 26, 2016
bradman museum | bowral, australia | july 10, 2016
brown university library | providence, ri | june 3, 2016
university of calgary library and cultural resources | calgary, ab | june 2, 2016
deakin university library | geelong, australia | june 14, 2016
university of colorado denver library | denver, co | june 24, 2016
duke university library | durham, nc | august 17, 2016
el paso museum of history | el paso, tx | june 24, 2016
georgia state university library | atlanta, ga | june 10, 2016
gibson group | wellington, new zealand | july 16, 2016
henrico county public library | henrico, va | august 9, 2016
ideum | corrales, nm | july 26, 2016
indiana university bloomington library | bloomington, in | may 31, 2016
interactive mechanics | philadelphia, pa | august 2, 2016
johns hopkins university library | baltimore, md | june 20, 2016
nashville public library | nashville, tn | july 22, 2016
north carolina state university library | raleigh, nc | june 8, 2016
university of north carolina at chapel hill library | chapel hill, nc | june 2, 2016
university of nebraska omaha | omaha, ne | june 16, 2016
omaha do space | omaha, ne | july 11, 2016
university of oregon alumni center | eugene, or | june 7, 2016
philadelphia museum of art | philadelphia, pa | august 10, 2016
queensland university of technology | brisbane, australia | june 30, july 29, and august 16, 2016
société des arts technologiques | montreal, qc | august 8, 2016
second story | portland, or | july 28, 2016
st. louis university | st. louis, mo | july 4, 2016
stanford university library | stanford, ca | july 22, 2016
university of illinois at chicago | chicago, il | june 22, 2016
university of mary washington | fredericksburg, va | july 7, 2016
visibull | waterloo, on | august 12, 2016
university of waterloo stratford campus | stratford, on | june 22, 2016
yale university center for science and social science information | new haven, ct | july 13, 2016

appendix c: digital content publishing guidelines
organization name | guidelines website
deakin university library | http://www.deakin.edu.au/library/projects/sparking-true-imagination
duke university | https://wiki.duke.edu/display/lmw/lmw+home
griffith university | https://intranet.secure.griffith.edu.au/work/digital-signage/seemore
north carolina state university library | http://www.lib.ncsu.edu/videowalls
university of colorado denver | http://library.auraria.edu/discoverywall
university of calgary library and cultural resources | http://lcr.ucalgary.ca/media-walls
university of waterloo stratford campus | https://uwaterloo.ca/stratford-campus/research/christie-microtiles-wall

references
1 flora salim and usman haque, "urban computing in the wild: a survey on large scale participation and citizen engagement with ubiquitous computing, cyber physical systems, and internet of things," international journal of human-computer studies 81 (september 2015): 31–48, https://doi.org/10.1016/j.ijhcs.2015.03.003.
2 peter peltonen et al., "it's mine, don't touch! interactions at a large multi-touch display in a city center," proceedings of the sigchi conference on human factors in computing systems, florence, italy, april 5–10, 2008, 1285–94, https://doi.org/10.1145/1357054.1357255.
3 shawna sadler, mike nutt, and renee reaume, "managing public video walls in academic library" (presentation, cni spring 2015 meeting, seattle, washington, april 13–14, 2015), http://dro.deakin.edu.au/eserv/du:30073322/sadler-managing-2015.pdf.
4 peltonen et al., "it's mine, don't touch!"
5 john brosz, e. patrick rashleigh, and josh boyer, "experiences with high resolution display walls in academic libraries" (presentation, cni fall 2015 meeting, washington, dc, december 13–14, 2015), https://www.cni.org/wp-content/uploads/2015/12/cni_experiences_brosz.pdf; bryan sinclair, jill sexton, and joseph hurley, "visualization on the big screen: hands-on immersive environments designed for student and faculty collaboration" (presentation, cni spring 2015 meeting, seattle, washington, april 13–14, 2015), https://scholarworks.gsu.edu/univ_lib_facpres/29/.
6 niels wouters et al., "uncovering the honeypot effect: how audiences engage with public interactive systems," dis '16: proceedings of the 2016 acm conference on designing interactive systems, brisbane, australia, june 4–8, 2016, 516, https://doi.org/10.1145/2901790.2901796.
7 gonzalo parra, joris klerkx, and erik duval, "understanding engagement with interactive public displays: an awareness campaign in the wild," proceedings of the international symposium on pervasive displays, copenhagen, denmark, june 3–4, 2014, 180–85, https://doi.org/10.1145/2611009.2611020; ekaterina kurdyukova, mohammad obaid, and elisabeth andre, "direct, bodily or mobile interaction?," proceedings of the 11th international conference on mobile and ubiquitous multimedia, ulm, germany, december 4–6, 2012, https://doi.org/10.1145/2406367.2406421; tongyan ning et al., "no need to stop: menu techniques for passing by public displays," proceedings of the 2011 annual conference on human factors in computing systems, vancouver, british columbia, https://www.gillesbailly.fr/publis/bailly_chi11.pdf.
8 jung soo lee et al., "a study on digital signage interaction using mobile device," international journal of information and electronics engineering 5, no. 5 (2015): 394–97, https://doi.org/10.7763/ijiee.2015.v5.566.
9 parra et al., "understanding engagement," 181.
10 parra et al., "understanding engagement," 181; robert walter, gilles bailly, and jorg müller, "strikeapose: revealing mid-air gestures on public displays," proceedings of the sigchi conference on human factors in computing systems, paris, france, april 27–may 2, 2013, 841–850, https://doi.org/10.1145/2470654.2470774.
11 philipp panhey et al., "what people really remember: understanding cognitive effects when interacting with large displays," proceedings of the 2015 international conference on interactive tabletops & surfaces, madeira, portugal, november 15–18, 2015, 103–6, https://doi.org/10.1145/2817721.2817732.
12 christopher ackad et al., "an in-the-wild study of learning mid-air gestures to browse hierarchical information at a large interactive public display," proceedings of the 2015 acm international joint conference on pervasive and ubiquitous computing, osaka, japan, september 7–11, 2015, 1227–38, https://doi.org/10.1145/2750858.2807532.
13 parra et al., "understanding engagement," 181; kurdyukova, obaid, and andre, 2012, n.p.
14 jouni vepsäläinen et al., "web-based public-screen gaming: insights from deployments," ieee pervasive computing 15, no. 3 (2016): 40–46, https://ieeexplore.ieee.org/document/7508836/.
15 uta hinrichs, holly schmidt, and sheelagh carpendale, "emdialog: bringing information visualization into the museum," ieee transactions on visualization and computer graphics 14, no. 6 (november 2008): 1181–1188, https://doi.org/10.1109/tvcg.2008.127.
16 hinrichs, schmidt, and carpendale, "emdialog."
17 sarah clinch et al., "reflections on the long-term use of an experimental digital signage system," proceedings of the 13th international conference on ubiquitous computing, beijing, china, september 17–21, 2011, 133–142, https://doi.org/10.1145/2030112.2030132.
18 elaine m. huang, anna koster, and jan borchers, "overcoming assumptions and uncovering practices: when does the public really look at public displays?," proceedings of the 6th international conference on pervasive computing, sydney, australia, may 19–22, 2008, 228–243, https://doi.org/10.1007/978-3-540-79576-6_14; jorg muller et al., "looking glass: a field study on noticing interactivity of a shop window," proceedings of the sigchi conference on human factors in computing systems, austin, texas, may 5–10, 2012, 297–306, https://doi.org/10.1145/2207676.2207718.
19 salim and haque, "urban computing in the wild," 35.
20 mettina veenstra et al., "should public displays be interactive? evaluating the impact of interactivity on audience engagement," proceedings of the 4th international symposium on pervasive displays, saarbruecken, germany, june 10–12, 2015, 15–21, https://doi.org/10.1145/2757710.2757727.
21 clinch et al., "reflections."
22 robert ravnik and franc solina, "audience measurement of digital signage: qualitative study in real-world environment using computer vision," interacting with computers 25, no. 3 (2013), https://doi.org/10.1093/iwc/iws023.
23 neal buerger, "types of public interactive display technologies and how to motivate users to interact," media informatics advanced seminar on ubiquitous computing, 2011, ed. doris hausen, bettina conradi, alina hang, fabian hennecke, sven kratz, sebastian lohmann, hendrik richter, andreas butz, and heinrich hussmann (university of munich, department of computer science, media informatics group, 2011), https://pdfs.semanticscholar.org/533a/4ef7780403e8072346d574cf288e89fc442d.pdf.
24 c. g. screven, "information design in informal settings: museums and other public spaces," in information design, ed. robert e. jacobson (cambridge, ma: mit press, 2000), 131–192.
25 parra et al., "understanding engagement," 181.
26 uta hinrichs and sheelagh carpendale, "gestures in the wild: studying multi-touch gesture sequences on interactive tabletop exhibits," proceedings of the sigchi conference on human factors in computing systems, vancouver, british columbia, may 7–12, 2011, 3023–32, https://doi.org/10.1145/1978942.1979391.
27 harry brignull and yvonne rogers, "enticing people to interact with large public displays in public spaces," interact '03, proceedings of the international conference on human-computer interaction, zurich, switzerland, september 1–5, 2003, 17–24, ed. matthias rauterberg, marino menozzi, and janet wesson (tokyo: ios press, 2003), http://www.idemployee.id.tue.nl/g.w.m.rauterberg/conferences/interact2003/interact2003-p17.pdf.
28 peltonen et al., "it's mine, don't touch!"
29 peltonen et al., "it's mine, don't touch!"
30 anne horn, bernadette lingham, and sue owen, "library learning spaces in the digital age," proceedings of the 35th annual international association of scientific and technological university libraries conference, espoo, finland, june 2–5, 2014, http://docs.lib.purdue.edu/iatul/2014/libraryspace/2.

rarely analyzed: the relationship between digital and physical rare books collections
allison mccormack and rachel wittmann
information technology and libraries | june 2022
https://doi.org/10.6017/ital.v41i2.13415
allison mccormack (allie.mccormack@utah.edu) is the original cataloger for special collections, university of utah. rachel wittmann (rachel.wittmann@utah.edu) is the digital curation librarian, university of utah. © 2022.

abstract
the relationship between physical and digitized rare books can be complex and, at times, nebulous. when building a digital library, should showcasing a representative slice of the physical collection be the goal? should stakeholders focus on preservation, high-use items, or other concerns? to explore these conundrums, a special collections librarian and a digital services librarian performed a comparative analysis of their library's physical and digital rare books collections. after exporting marc metadata for the rare books from their ils, the librarians examined the place of publication, publication date, and broad subject range of the collection. they used this data to create a variety of visualizations with the freely available data visualization tool tableau public. next, the authors downloaded the rare books metadata from the digital library and created illuminating data visualizations. were the geographic, temporal, and subject scopes of the digital library similar to those of the physical rare books collection? if not, what accounted for the differences? the implications of these and other findings will be explored.

introduction
as of august 2019, the special collections division of the university of utah j. willard marriott library held over 256,000 printed works and archival collections.
approximately 22% of the collection, or just over 55,000 works, belongs to the rare books department (https://lib.utah.edu/collections/rarebooks/), which contains not only books but serials, maps, manuscripts, ephemera, and other formats. the collection covers over 4,000 years of human history, with its earliest piece, a cuneiform tablet, dating to the mid-twenty-third century bce; contains works from nearly 100 different countries; and represents a wide variety of topics, including the exploration and settlement of the american west and the history of the book. the rare books department, a subset of special collections, specifically seeks to document the history of written human communication and actively collects historical items to enhance teaching and research at the university of utah. the marriott library has been adding digitized works from the rare books department to its digital library (https://collections.lib.utah.edu/) for over 25 years. approximately 780 works, or 1.42% of the rare books collection, have been digitized to date. however, no formal collection development plan was ever written, and rare books were selected for digitization by both curators and patrons. unfortunately, the reason a particular item was digitized is not recorded in the system: it is unclear if age, research value, physical condition, a desire to bring forward underrepresented stories, or a combination of these and other factors influenced the decision to digitize a rare book. this piecemeal approach to digital library collection development, while not uncommon, made it difficult for library staff and patrons to determine the relationship between the digital and physical collections of rare books. it also presented challenges when library staff attempted to communicate the scope and intent of the digital library to patrons, who assumed that the digitized items were representative of the overall collection. given their expertise in library metadata, the authors decided to analyze both traditional library catalog records and digital library records for the rare books collection and explore whether the digital collection was proportionally representative of the physical collection or if it differed in geographic, temporal, or subject scope in a meaningful way. they then created a series of data visualizations to better communicate information about the library's rare books holdings.

literature review
while much has been written about methods and criteria for selecting special collections items to be digitized and the effects of digitization on collection accessibility, few authors have discussed the relationships between digital collections and the physical collections from which they were sourced.
in their highly detailed treatise on selection strategies for digitization, ooghe and moreels identify representativity, a method that “aims for a final selection that provides a representative view of the original collections,” as one of 25 selection criteria for digitization projects.1 however, alexandra mills notes that “without a thorough understanding of the institution and collections, it is impossible to create truly representative collections.”2 because many digitization initiatives are undertaken in response to user requests, preservation concerns, or the availability of projectbased funding, it is likely that most libraries do not plan for their digital collections to be representative of their overall special collections holdings. as peter michel states, the digital collections at the university of nevada, las vegas, were explicitly built with popular history and popular culture in mind and were never intended to be “surrogates of the collection.”3 bradley daigle of the university of virginia explained that digitization could be undertaken to alleviate preservation concerns, respond to defined research needs, or to brand certain online content, but this approach could give the mistaken impression “that only the important materials are digitized.”4 despite the gaps in the literature, having an explicit collection development policy is still considered paramount; indeed, it is the very first principle listed in the national information standards organization (niso)’s framework for building “good” digital collections.5 to investigate this type of documentation further, a google search was employed using the search term “digital collection development policy site:edu”. this yielded 10 publicly accessible digital collection development policies from academic libraries in the united states: 6 • amherst college library (https://www.amherst.edu/library/services/digital/digitalcolldev) • emerson college archives and special collections (https://www.emerson.edu/policies/digital-collections-development-policy) • colorado state university libraries (https://lib.colostate.edu/digital-collectiondevelopment-policy/) • florida atlantic university digital library (https://library.fau.edu/policy/digital-librarycollection-development-policy) • georgetown university library (https://www.library.georgetown.edu/digital-projectpolicy) • northern illinois university digital library (https://digital.lib.niu.edu/policy/collectiondevelopment-policy) https://www.amherst.edu/library/services/digital/digitalcolldev https://www.emerson.edu/policies/digital-collections-development-policy https://lib.colostate.edu/digital-collection-development-policy/ https://lib.colostate.edu/digital-collection-development-policy/ https://library.fau.edu/policy/digital-library-collection-development-policy https://library.fau.edu/policy/digital-library-collection-development-policy https://www.library.georgetown.edu/digital-project-policy https://www.library.georgetown.edu/digital-project-policy https://digital.lib.niu.edu/policy/collection-development-policy https://digital.lib.niu.edu/policy/collection-development-policy information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 3 • oregon health and sciences university digital collections (https://www.ohsu.edu/library/ohsu-digital-collections-development-policy) • university of north texas university libraries (https://library.unt.edu/policies/collection-development-digital-collections/) • wesleyan university digital library 
(https://digitalcollections.wesleyan.edu/about/whatwe-collect) • williams college special collections (https://specialcollections.williams.edu/collectiondevelopment-policies/digital-collections/) in reviewing the sample of 10 universities’ digital collection development policies, homogenous content becomes apparent. almost all of the policies include a mission statement, scope, and selection criteria for potential digital collection items. all policies include criteria that physical materials should meet in order to qualify for digitization. the most common criteria for digitization are materials that are rare or unique, high-use, fragile, important to institutional or regional history, and/or support campus curriculum or faculty research. in addition, the clearance to publish materials online is ubiquitous among the policies. materials eligible for online display must either be in the public domain or intellectual property rights are held by the institution, and materials currently under copyright must receive permission from the copyright holder. a measured approach to digitization qualification has been employed by the university of north texas (unt) libraries’ digital collections and the northern illinois university digital library (niudl). unt libraries’ digital collections policy lists levels of criteria that materials must meet in order to be digitized and included in the digital library; to qualify for digitization, all criteria on level one must be met while only one criterion from level two is needed. niudl includes a priority factor rubric which includes criteria categories and corresponding numerical scale with a maximum point of 35, the higher value signifying an elevated priority. six of the 10 policies include prioritizing materials that support diversity and inclusion missions on campus. amherst college has leveraged their digital collection development policy to include content that would increase perspectives of underrepresented groups within the digital collections and traditionally underrepresented groups more broadly. niudl includes marginalized groups as a collection priority area in order to “deepen public understanding of the histories of people of color and other communities and populations whose work, experiences, and perspectives have been insufficiently recognized or unattended” and lists over 20 such groups. the collection candidate’s relationship to other collections is outlined in four of the 10 policies. georgetown university requires that “the materials form a coherent collection, fill gaps in existing collections, or complement existing collection strengths.” amherst college evaluates whether digitization would “enhance public awareness of archives’ collection strengths.” another function of a digital collection development policy is to inform the public on the scope and provenance of contents in their digital library. the unt digital collection policy includes a section outlining the content contributors, including partners, which can be beneficial for large-scale digital libraries that host collections from multiple partners. unt is also exemplary in defining collection curators and their responsibilities while underscoring the nature of this role, likely changing over time and not set to an individual. 
with no written digital collection development policy regarding special collections at the marriott library, the authors would first have to analyze both the physical and digital special collections before determining what factors may have influenced the digitization of these materials. libraries are gathering massive amounts of data, ranging from the metadata of their varied collections to patron usage statistics of both physical and digital collections. interpretation of the ever-growing accumulation of data can quickly become complex. by visualizing data, we are able to interpret large and often messy sets of data while processing multiple aspects of the data concurrently. for example, the ohio state university (osu) libraries used tableau desktop to combine data from various departments in order to better manage and explore information.7 tableau was osu's data visualization software of choice due to its ease of use and accessibility, and the program was also used to create dashboards that blend data from various sources for real-time visualizations.

bibliographic metadata cleanup
to understand the marriott library's collections, one must first understand the relevant metadata, which for the rare books department is in the machine-readable cataloging (marc) format. a popular criticism of marc, commonly used in traditional library cataloging, is that the schema is highly regulated and, at times, redundant. however, for the purposes of this project, those qualities proved to be a boon. an older, uncorrected record in the digital library might list london as the place of publication for a particular book, but it was not immediately apparent if that referred to london, england; london, ontario; or london, ohio. however, a marc record would not only list a book's city of publication in the 260 or 264 field but would also contain a two- or three-letter code in the 008 field that specified the country, us state, canadian province or territory, or australian state or territory in which it was published. for this reason, the authors decided to base their analysis on marc record data from the physical collection instead of the dublin core metadata used in the digital library. in order to tease out the relationships between our digital and physical collections, each of the approximately 55,000 rare books bibliographic records stored in alma, the marriott library's cloud-based library services platform, would have to have a common set of data points that could be compared. for the purposes of this analysis, the authors chose to investigate the place of publication and the subject of each work. despite the relative rigidity of marc metadata, some of the alma records lacked country of publication data in the 008 field. these records were not incorrect, but merely outdated: some had been copied directly from paper catalog cards when the library first transitioned to a computer-based cataloging system, while others were created using different metadata standards.
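to make the distinction between the fixed-field code and the transcribed imprint concrete, the sketch below shows how the two values can be read side by side with the pymarc library. this is a hypothetical illustration rather than the study's actual workflow (the authors used alma normalization rules and an excel export); the file name and the exact review logic are assumptions, while the field positions follow standard marc 21 conventions, where the place-of-publication code occupies positions 15–17 of the 008.

```python
# a minimal, hypothetical sketch: compare the marc 008 place-of-publication code
# (positions 15-17) with the transcribed imprint (260/264 $a) for each exported
# bibliographic record. "rare_books.mrc" is an assumed file name.
from collections import Counter

from pymarc import MARCReader

country_counts = Counter()

with open("rare_books.mrc", "rb") as fh:
    for record in MARCReader(fh):
        if record is None:  # skip records pymarc could not parse
            continue

        # control field 008: positions 15-17 hold the geographic code
        fields_008 = record.get_fields("008")
        code = fields_008[0].data[15:18].strip() if fields_008 else ""

        # transcribed imprint: 260 $a (older records) or 264 $a (rda records)
        imprint = ""
        for tag in ("260", "264"):
            for field in record.get_fields(tag):
                subfields_a = field.get_subfields("a")
                if subfields_a:
                    imprint = subfields_a[0]
                    break
            if imprint:
                break

        # flag records whose fixed field is blank or coded "unknown" ("xx")
        if not code or code == "xx":
            print(f"needs review: code={code!r} imprint={imprint!r}")
        country_counts[code or "(blank)"] += 1

print(country_counts.most_common(10))
```

in the study itself, the equivalent check and the bulk corrections were carried out with alma normalization rules rather than an external script, as described below.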
approximately 6,000 rare books either completely lacked a country code in the 008 field or had data that could possibly be enhanced by, for example, replacing a code for the united states with a code for a particular state. instead of editing all 6,000 records by hand, the cataloger wrote several metadata normalization rules in alma to automatically correct the most obvious errors. records that listed chicago as the place of publication were assigned the marc geographic code for illinois, while those that were published in lugduni batavorum, the latin designation for leiden, were given the geographic code for the netherlands. however, 3,000 records were unable to be enhanced in this manner, either because their place of publication was an ambiguous city name like cambridge or because the place of publication was listed as unknown. the cataloger examined each record individually and was ultimately unable to assign a marc geographic code to 1,682 records, most of which were arabic manuscripts or advertising pamphlets that simply did not list a place of publication or creation. while these records would be excluded from the place of publication analysis, they could be mined for data on other topics. with the marc records as complete as possible, the metadata was exported from alma into an excel spreadsheet and given to the metadata librarian for further manipulation.

metadata transformation & visualization creation
the next phase involved standardizing the raw metadata to create the human-readable data, rather than marc codes, necessary to produce data visualizations. once the physical rare books' bibliographic metadata was updated in alma, it was then exported as a comma-separated values file. the raw data export produced a massive spreadsheet containing over 50,000 marc records. these records included two- and three-letter location codes for the place of publication from the library of congress marc code list for geographic areas. two-letter codes are used for most countries, while three-letter codes are used for states within the united states, provinces within canada, and territories within the united kingdom. while this additional level of location data was available for books from the united kingdom and canada, it was decided to review the collection at a country level for consistency and map display. books from the united states, however, were analyzed on a state level, considering the research is germane to an american institution. using a list correlating these codes to the location name provided by the library of congress (https://www.loc.gov/marc/countries/countries_code.html), a vlookup formula was used in microsoft excel to add the location names to the marc records. the vlookup formula pulls in data from one table to another as long as the two tables have one data field in common. in this exercise, both tables of data contained the library of congress location codes; therefore the lc location codes were used to add the location names to the table containing the marc metadata. once the location names were added, there were some additional quality control steps required, as lc location names that included outdated country names posed issues when mapping the data to current country names and boundaries. for example, we combined the codes for east germany and west berlin into the one representing contemporary germany.
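the vlookup step described above can also be reproduced outside of excel. the sketch below is not the authors' actual spreadsheet; it is a hedged illustration of the same code-to-name join using pandas, where the file names, column names, and the example superseded-code mappings are all assumptions made for the sake of the example.

```python
# hypothetical sketch of the vlookup step in pandas: join the exported marc
# records to the library of congress geographic code list on the shared code column.
import pandas as pd

# assumed input files: the alma export and a two-column code list
# (e.g., "ilu" -> "illinois", "ne" -> "netherlands") derived from
# https://www.loc.gov/marc/countries/
records = pd.read_csv("rare_books_export.csv")        # includes a "country_code" column
code_list = pd.read_csv("marc_geographic_codes.csv")  # columns: "country_code", "location_name"

# a left join keeps every bibliographic record, even those with no usable code,
# just as vlookup leaves unmatched rows visible for review
merged = records.merge(code_list, on="country_code", how="left")

# records whose code did not match the lc list need manual review, much like the
# ambiguous "cambridge" and "unknown" cases described above
needs_review = merged[merged["location_name"].isna()]
print(f"{len(needs_review)} records need manual review")

# collapse superseded codes to current country names before mapping in tableau;
# the specific code values here are illustrative only
superseded = {"ge": "germany", "gw": "germany"}
merged["location_name"] = (
    merged["country_code"].map(superseded).fillna(merged["location_name"])
)

merged.to_csv("rare_books_with_locations.csv", index=False)
```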
for countries that have since been dissolved and rezoned to multiple countries, e.g., the ussr and czechoslovakia, these records were manually checked for city names and then added to the current country. once this process was completed, the results showed the rare books were published in 97 countries and all 50 united states, as well as the district of columbia. examining the subject content of the rare books physical collection was another aspect of analysis for this project. in contemplating this analysis, using the lc subject heading field was considered, however, faceting of lc subject headings and the structure of the exported data posed too many issues for a rather simple analysis. instead, the library of congress call number was used to extract high-level lc classification information for each work by separating the first two letters of the call numbers included in the exported marc metadata, which indicated lc class and subclass. to add the lc class and subclass names to these letters, a vlookup formula was used again to match the letter codes to the list of lc classification categories. once classification categor ies were added to the 55,000 records, works from all 21 lc master classes and 190 subclasses were represented in the rare books collection. in addition to the physical rare books collection held at the marriott library, there is a selection of this collection that has been digitized and is accessible in the marriott digital library. the rare books digital collection (https://collections.lib.utah.edu/search?facet_setname_s=uum_rbc) comprises 780 works, although this number includes unique records for individual volumes within a series and therefore is not a true comparison to marc metadata records, which contain one record for a series. for example, the silver reef miner, a newspaper “devoted to the mining interests of southern utah” published during the late nineteenth century, has 30 individual volumes in the digital library, but these are represented in just one marc record. in order to compare the digital collection to the physical collection, the datasets would need to have https://www.loc.gov/marc/countries/countries_code.html https://collections.lib.utah.edu/search?facet_setname_s=uum_rbc information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 6 consistent data for comparison, namely place of publication and lc classification derived from call numbers. the digital collection metadata is in the dublin core schema, which does not include all of the metadata found in the marc metadata, nor does it use the same format. while there is a dublin core spatial element used to capture geographic data on what the item is about, this does not always align neatly with the location of an item’s publication. for example, reise in das innere nord-america in den jahren 1832 bis 1834 (2 volumes) is a book printed in germany that documents an expedition to north america in 1832–1834 and includes illustrations of native american people from the swiss artist karl bodmer. for these volumes, the appropriate dublin core spatial data would include the specific regions the expedition traveled to in north america; in the marc 26x field, however, it contains koblenz, germany, the city where the volumes were published. call number data was included for many digitized works, but not in a consistent format. 
in order to use the same data to compare the physical rare books collection to the digital one, the digital collection metadata was updated with the improved/accurate call numbers found in the marc metadata. another improvement to the digital collection metadata was the addition of the metadata management system (mms) id unique numerical identifiers that aid in locating a record in the alma system. when the rare books’ descriptive metadata was originally converted to dublin core during the digitization process, some titles and call numbers were changed and became different from their physical counterparts. the inclusion of the mms id allows for a consistent identifier between the physical and digital collections. when selecting data visualization software, being able to create a map of the places where books in the rare collection were published was a priority. considering the goal of creating an easily replicable workflow for other libraries, the authors sought a freely accessible program that did not require advanced geospatial skills, unlike esri’s arcgis software. tableau software is a data visualization software package with both a public and desktop version. the tableau desktop version requires a subscription fee while tableau public is open access. for the purposes of this study, tableau public offered open access and mapping features that are enabled without any geospatial knowledge necessary. analysis creating a variety of data visualizations allowed information about the rare books physical and digital collections to be more apparent than merely browsing entries in a spreadsheet. for example, there are numerous geographic disparities between the two collections of rare materials as shown in the american states in which works from the collections were published. while books from all 50 states are found in the physical collection (fig. 1), only 18 states are represented in the digital library (fig. 2), with new york being the state in which the highest number of books were published. as new york city has long been a major publishing center in the united states, the authors were not surprised by this. however, the subsequent states were quite different: california and utah ranked second and third for the physical collection, while massachusetts and pennsylvania claimed those spots for the digital library. the authors believe several factors might influence this discrepancy. first, works can only be added to the digital library if they are no longer in copyright, and states with longer histories of european-american settlement are more likely to have published books that are now out of copyright. furthermore, these older books are more likely to be in a fragile condition and therefore may have been digitized to decrease the amount of physical handling to which they are subjected. information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 7 figure 1. marriott library physical rare books by us state. information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 8 figure 2. marriott library digital rare books by us state. there are other discrepancies when comparing the country of publication between the physical (fig. 3) and digital collections (fig. 4). while 61% of the physical rare books were published in the united states, only 20% of the digitized works were published in this country. 
the authors expected to see egypt rank highly in the physical collection, as many of the rare books were purchased by former university of utah professor dr. aziz atiya to support the middle east center for research he founded; similarly high in rank, britain, germany, france, and italy were all major centers for the early printing and publishing trade in early modern europe. however, there is strong geographic bias in the digital collection, as only north america, western europe, and one african country are represented online. copyright may again play a factor, as the earliest books from non-western countries in the collection often date to the twentieth century, but a eurocentric or other bias cannot immediately be discounted. while the physical collection contains many more european imprints than from the global south, it is much more diverse than the digital collection. information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 9 figure 3. marriott library physical rare books by country. information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 10 figure 4. marriott library digital rare books by country. the analysis of the subjects represented in the collection proved to be somewhat challenging to study. due to the nature and structure of library of congress subject headings, which attempt to mirror natural language and may be composed of “strings” of phrases to represent complex topics, no tableau public visualization could be created that effectively grouped similar content areas together without looking quite fragmented. instead, the authors based their analysis of subjects on library of congress classification numbers (i.e., call numbers) assigned to works, which, though not exact, can be understood as distillations of the subject of a work.8 once again there were considerable differences between the physical and digital rare books collections (fig. 5). as in many generalized special collections, literature and history make up significant portions of the physical collection. however, works on bibliography, or the study of books and book history, comprise a notable percentage of the collection. many of these are modern works on book history and special collections librarianship and therefore are unable to be digitized due to copyright law. nearly 9% of the digital collection is on the sciences, though these works comprise only 3% of the physical collection. while this portion of the holdings may be relatively small, it contains many scientific high points such as vesalius’ de humani corporis fabrica, early printings of ancient mathematical texts, and the journals of major scientific societies, which may have been digitized both for physical preservation as well as high interest on the part of students and faculty on campus. information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 11 figure 5. percentage of rare books physical and digital collections by library of congress class. next steps now that the first phase of the project is complete, the authors would like to conduct additional analyses. first, they plan to compare the usage statistics of the digital rare books collection to the circulation statistics of the physical collection. this method of inquiry was not possible at the start of the project, as circulation information for the rare books was previously not tracked in the integrated library system. 
now that rare books are checked out to patrons for use in the special collections reading room, this data can be quickly pulled from alma. once there is a year’s worth of circulation data for the rare books unhindered by the changes necessitated by the coronavirus pandemic, the authors will compare the usage statistics of the digital collection for the same time period. do patrons in the reading room look at similar materials to online patrons, or are their interests vastly different? are some rare books used so frequently that they would benefit from the added physical security that digitization brings? information technology and libraries june 2022 rarely analyzed | mccormack and wittmann 12 the authors also plan to pull annual usage statistics from the digitized rare books and share this with special collections division leadership. online patrons are still library patrons, and the division can use the viewing data to show the national and international reach of the collection. relatedly, the authors will investigate the digital library usage data in more depth. do patrons from utah, the united states, and the world look at similar materials, or are there geographic divides among the online patrons? do countries that are home to a majority of the university’s international student body have higher viewership numbers? finally, the authors wish to convene a group of stakeholders to create a formal collection development plan for the rare books component of the digital library. given the library’s limited resources, it is imperative that digitization be done thoughtfully and systematically. there is a good rationale for creating a digital collection that is representative of the physical rare books collection as well as one that highlights certain collection areas. both material fragility and the modern scholarly emphasis on highlighting the stories of people of color, women, and other underrepresented groups in library collections provide strong counterarguments to making digital libraries strictly representative of their physical counterparts. since informal conversations with patrons of the marriott library revealed that they assumed the digital library was representative of the collection overall, it is imperative that this assumption be either confirmed or disclaimed in a publicly viewable statement. in the case of the rare books department, the authors are in favor of a focused, rather than representative, collection development policy. firstly, many of the books in the collection are under copyright and therefore cannot be digitized, while other materials like reference sources for rare books librarians will be of limited interest to the general public. furthermore, complex items such as artists’ books are often poor candidates for digitization, as they may have movable components that cannot be captured accurately in a still photograph. as for what should be included online, the authors fully support equity, diversity, and inclusion efforts at the university of utah and would like to see the digital library highlight materials from marginalized groups whenever possible. usage statistics from the physical and digital collections, when they become available, should also inform the collection development policy to encourage traffic to the digital library. 
whatever is ultimately decided, however, the clarity a written policy provides will help streamline decision-making and ultimately help both library staff and patrons understand and search within the digital library much more effectively.

endnotes
1 bart ooghe and dries moreels, "analysing selection for digitisation: current practices and common incentives," d-lib magazine 15, no. 9/10 (2009), https://doi.org/10.1045/september2009-ooghe.
2 alexandra mills, "user impact on selection, digitization, and the development of digital special collections," new review of academic librarianship 21, no. 2 (2015): 166, https://doi.org/10.1080/13614533.2015.1042117.
3 peter michel, "digitizing special collections: to boldly go where we've been before," library hi tech 23, no. 3 (2005): 382, https://doi.org/10.1108/07378830510621793.
4 bradley j. daigle, "the digital transformation of special collections," journal of library administration 52, no. 3–4 (2012): 253, https://doi.org/10.1080/01930826.2012.684504.
5 niso framework working group, a framework of guidance for building good digital collections (2007), https://www.imls.gov/sites/default/files/publications/documents/framework3.pdf.
6 the urls in the following list were accurate as of march 2, 2022.
7 sarah anne murphy, "data visualization and rapid analytics: applying tableau desktop to support library decision-making," journal of web librarianship 7, no. 4 (2013): 465–76, https://doi.org/10.1080/19322909.2013.825148.
8 readers who do not work with marc metadata may not be familiar with how library of congress call numbers are assigned. created in 1891, the classification system is based on 21 classes designated by a single letter; subclasses add one or two letters to the initial class. catalogers must choose which one of the classes to assign to a particular work. the subject headings may guide a cataloger towards a certain class, but there is not a 1:1 relationship between subject headings and call number classes.

hackathons and libraries: the evolving landscape 2014–2020
meris mandernach longmeier
information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.13389
meris mandernach longmeier (longmeier.10@osu.edu) is head of research services, the ohio state university libraries. © 2021.

abstract
libraries foster a thriving campus culture and function as "third space," not directly tied to a discipline.1 libraries support both formal and informal learning, have multipurpose spaces, and serve as a connection point for their communities. for these reasons, they are an ideal location for events, such as hackathons, that align with library priorities of outreach, data and information literacy, and engagement focused on social good.
libraries can act solely as a host for events or they can embed in the planning process by building community partnerships, developing themes for the event, or harnessing the expertise already present in the library staff. this article, focusing on years from 2014 to 2020, will highlight the history and evolution of hackathons in libraries as outreach events and as a focus for using library materials, data, workflows, and content. introduction as a means of introduction to hackathons for those unfamiliar with these events, the following definition was developed after reviewing the literature. hackathons are time-bound events where participants gather to build technology projects, learn from each other and experts, and create innovative solutions that are often judged for prizes. while hacking can have negative connotations when it comes to security vulnerabilities, typically for hackathon events hacking refers to modifying original lines of code or devices with the intent of creating a workable prototype or product. events may have a specific theme (use of a particular dataset or project based on a designated platform) or may be open-ended with challenges focused on innovation or social good. while hackathons have been a staple in software and hardware design for decades, the first hackathons with a library focus were sponsored by vendors, focused on topics such as accessibility and adaptive technology for their content and platforms.2 other industry hackathons focused on re-envisioning the role of the book in 2013 and 2014.3 as hackathons became more popular at colleges and universities, library participation evolved from content provider to event host. these partnerships were beneficial to libraries interested in shifting the perception of libraries from books to newer areas of expertise around data and information literacy. however, many libraries realized that by partnering in planning the events greater possibilities existed to educate participants about library content and staff expertise. some examples include working with public library communities to highlight text as data, having academic subject librarians work with departmental faculty to embed events within curriculum and assignments, and for both academic and public libraries to promote library-produced and publicly available datasets.4 information technology and libraries december 2021 hackathons and libraries |longmeier 2 there are many roles that libraries can take in these events. libraries can act as event hosts where they provide the space at a cost or for free.5 in other cases, library staff become collaborators and in addition to space may assist with planning logistics, judging, building partnerships, and have some staff present at the events.6 in public libraries this often includes building relationships with the city or specific segments of the community based on the theme of the event. on college campuses, it may be a partnership with a specific disciplines or campus it or an outside sponsor. in this way, the libraries are building and sustaining the event due to aligned priorities with the other partners. another option would be for the library to be the primary sponsor, where the library may provide prizes, the theme for the hackathon, as well as many of the items listed above.7 however, instead of specific categories, it should be viewed as a continuum of partnership and the amount of involvement with the event should align with the library’s priorities of what it hopes to accomplish through the event. 
how involved in event planning specific libraries want to be may depend on the depth of the existing partnerships as well as how many resources the library wants to commit to the event. libraries have always existed as curators and distributors of knowledge. some libraries are using hackathons to advance both their image and their practices. libraries are evolving into new roles and have grown to support more creative endeavors, such as the maker movement. this shift of libraries from book-provider to social facilitator and information co-creator aligns with hackathon events. the physical spaces themselves are ideal to support public outreach events and libraries are already providing makerspaces or similar services that would overlap with a hackathon audience.8 additionally, the spaces afforded by libraries allow flexibility and creativity to flourish, ideas to be exchanged, and different disciplines to mingle and co-produce. library staff focused on software development may have projects that would benefit from outside perspectives as well. in recent years libraries have become stewards of digital collections that can be used and reused in innovative ways. many libraries have chosen wikipedia edit-a-thons as a means of engaging with the public and enhancing access to materials.9 similarly, the collections-as-data movement is blossoming and allowing galleries, libraries, archives, and museum (glam) institutions to rethink the possible ways of interacting with collections. many public libraries are partnering with local or regional governments to build awareness of data sources and build bridges with the community around how they would like to interact with the data.10 additionally, as data science continues to grow in importance in both public and academic libraries, data fluency, data cleaning, and data visualization could be themes for a hackathon or data-thon.11 for those unfamiliar with these events, table 1 provides some generalized definitions created by the author of the different types of events and their intended purpose. for some organizations, there are ways to support these events that consume fewer resources or require less technical knowledge, such as an edit-a-thons or code jams. information technology and libraries december 2021 hackathons and libraries |longmeier 3 table 1. 
defining common hackathon and hackathon-like events, purpose, and typical size of events
type of event | definition | purpose | size of event
hackathon | a team-based, sprint-like event focused on hardware or software that brings together programmers, graphic designers, interface designers, project managers, or domain experts; can be open-ended idea generation or for a specific provided theme | build a working prototype, typically software | up to 1,000 participants, usually determined by space available
idea fest | business pitch competition where individuals or teams pitch a solution or new company (startup) idea to a panel of judges | deliver an elevator pitch for an idea, could be to secure funding | <100
coding contest or code jam | an individual or team competition to work through algorithmic puzzles or on specific code provided | learning to code or solve challenges through coding; may produce a pitch at the end rather than a product | 20–50
edit-a-thon | an event where users improve content in a specific online community; can focus on a theme (art, country, city) or type of material (map) | improving information in online communities such as wikipedia, openstreetmap, or localwiki | 20–100
datathon | a data-science-focused event where participants are given a dataset and a limited amount of time to test, build, and explore solutions | usually a visualization or software development around a particular dataset | 50–100
makeathon | hardware-focused hackathon | build working prototype of hardware | up to 300 participants

methods
to find articles in the library and information science literature related to hackathons and libraries, the author searched the association for computing machinery (acm) digital library, scopus, library literature and information science, and library and information science and technology abstracts (lista) databases. in scopus and the acm digital library, the most successful searches included the following: (hackathon* or makeathon*) and library; in library literature and information science and lista databases, the most successful searches included:
the second category, those that use library as source, focused on highlighting library spaces or services, workflows, or collections as the theme of the events. additionally, a few articles in the second category discussed how to prepare or clean library data or library sources before the event to ensure that participants were able to use the materials during the time-bound event. in some cases where the source materials were from the libraries, the event also occurred in the library; thus, some articles fit into both categories and are highlighted in both sections.

results: library as place

the following summaries of hackathons and libraries as places for events will be grouped into two subgenres: library spaces and outreach events. libraries, both public and academic, are ideal locations for hosting large, technology-driven events given the usual amenities of ample parking, ubiquitous wi-fi, adequate outlets, and at times already having 24-hour spaces built into their infrastructure. more and more libraries are offering generous food and drink policies, a benefit, as sustenance is a mainstay at these multiday events. additionally, libraries already host a number of outreach events and serve as a community information hub.

using libraries as event hosts for hackathons

a number of articles detail the use of library spaces to host hackathon events.12 the university of michigan library, a local hackerspace (all hands active), and the ann arbor district library teamed up to host a hackathon focused on oculus rift.13 this event grew out of a larger partnership with the community and sought to mix teams to include participants from all three areas. the 2018 article by demeter et al. highlights lessons learned from florida state university library and many of the planning steps involved when hosting large outreach events in library spaces.14 while the library initially hosted a 36-hour event, hackfsu, as a favor to the provost in the first year, they continue to host the event, providing library staff as mentors and logistical support. after the first year they started charging the student organization for use of the space and direct staffing costs for the hours beyond normal operating hours. while focused primarily on providing a central campus space, the library also sees it as a way to highlight the teaching and learning role of the library. similarly, nandi and mandernach detail the steps involved in planning hackathon events and some benefits of choosing the library as a location for the event.15 at ohio state, hackathon events in 2014 and 2015 were held in the library due to twenty-four-hour spaces, interest by the libraries in supporting innovative endeavors on campus, and a participant size (100–200 attendees) that could be accommodated in the space. other events chose academic libraries as locations for hackathons due to their central location on campus.16 an initial summary of library hackathons was captured by r. c. davis, who detailed that libraries may be motivated to host such events as they align with library principles of "community, innovation, and outreach."17 she points out that libraries are ideal locations because of small modular workspaces paired with a large space for final presentations. additionally, adequate and sufficiently strong wi-fi or hardwired connections, a multitude of power outlets, and 24-hour spaces are appealing for these kinds of events.
event planners should know that the necessities include free food and multidisciplinary involvement. davis details ways to plan smaller events, such as code days or edit-a-thons, if staffing does not allow for a large hackathon event. in all cases, the libraries serve a purpose to either campus or community as the location and sometimes also provide staff for the events.

hackathons as library outreach

hackathon events are a great way to reach out to the community and provide a fresh look into libraries as purveyors of information focused on more than books. at the 2014 computers in libraries conference, chief library officer mary lee kennedy delivered a keynote sharing stories of the new york public library's experiences hosting wikipedia edit-a-thons and other hackathons at various branches since 2014.18 the goals for these outreach events were to highlight strategic priorities around making knowledge accessible, re-examine the library purpose, and spark connections. early library hackathon events focused on outreach included topics such as accessibility or designing library mobile apps.19 more recent events have focused on outreach but with an eye toward sharing content as part of the coding contest.20 even library associations have hosted preconference hacking events to highlight what libraries are doing to foster innovation.21 the future libraries product forge, a four-day event, was hosted in collaboration with the scottish library and information council and delivered by product forge, a company focused on running hackathons that tackle challenging social issues. the 2016 event focused specifically on public libraries in scotland, and seven teams, composed mainly of students from a local university, worked with public library staff and users as well as regional experts in technology, design, and business.22 the goals of the event were to raise awareness of digital innovation with library services, generate enthusiasm for approaches to digital service design, and codesign new services around digital ventures. participants created novel products including digital signage, a game for young readers, a tool for collecting user stories about library services, and an app to reserve specific library spaces. another common focus for library hackathon outreach events is the theme of data and data literacy. in july 2016, the los angeles public library hosted the civic information lab's immigration hackathon.23 this outreach event gathered 100 participants to address local issues around immigration. the library, motivated by establishing itself as a "welcoming, trusting environment," wanted to be a "prominent destination of immigrant and life-enrichment information and programs and services."24 newcastle libraries ran two day-long events focused on promoting data they released under an open license as part of the commons are forever project.25 they used both events to educate users about tools such as github and data visualization tools, and to run a gif-making session with historical photographs. similarly, toronto public library hosted a series of open data hackathons to highlight the role of the libraries in civic issue discourse, data literacy, and data education.26 their events combined the hackathon with other panel presentations and resources focused on mentorship and connection-building in the technology sector.
the library also used the event to promote their open data policy, build awareness around the data provided by the library for the community, and highlight their role in facilitating conversations around civic issues through data literacy and data education. edmonton public library hosted its first hackathon in 2014 for international open data day. one of the main drivers was to build the relationship with their local government.27 they built their event around the tenets laid out in the open data hackathon how-to guide and by a blog post about the city of vancouver’s 2013 international open data day hackathon.28 they took a structured approach to documenting expectations of both partners around areas such as resources, staffing, and costs, which served as a roadmap for the hackathon and the partnership. the library provided the event space, coffee and pizza, an emcee, tech help and wi-fi, door prizes and “best idea” prize, and promotional material. the city recruited participants and provided an orientation, promotional banners, and a keynote. the event led to a deeper partnership with the city and additional hacking events. in these ways, the hackathon served a greater purpose of community building and awareness around data, the role the library plays in interpreting data, and how the libraries serve as a resource hub to the community. events supporting library teaching mission at academic institutions, the events often focus on outreach to their own campus community. in 2015, adelphi university hosted their first hackathon and the libraries funded the event themselves rather than seeking outside funding.29 the article details the considerable lessons learned through the process as well as a step-by-step guide to planning a smaller event. similarly, york university science and engineering library hosted hackfests in the library and embedded an event as part of an introductory computer science course.30 shujah highlighted some of the benefits to the library hosting a hackathon included: establishing libraries as part of the research landscape, providing a constructive space for innovation and innate collaborative environment, highlighting the commitment to openness and democratizing knowledge, and acknowledging the library’s role in boosting critical thinking and information literacy concepts. shin, vela, and evans highlight a community hackathon at washington state university college of medicine where a group of librarians from multiple institutions staffed a research station throughout the event.31 while the station was underutilized by participants, as only seven questions were asked during the event, the libraries deemed their participation a success as it worked as an outreach and promotion mechanism for both library services and expertise. at some public libraries, the focus of the hackathon is on education and teaching basic coding skills. whether called a coding contest, hackathon, or tech sandbox, there are opportunities for programming with a focus on learning and skill-building and fun.32 santa clara county library district used a peer-to-peer approach for mentoring and hosted a hackathon in 2015 for middle and high-school students.33 the library staff facilitated the event planning and recruited judges from the community, but the bulk of the event was coordinated by the students. 
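the structured approach edmonton took to documenting partner expectations, like the planning guides discussed in the next section, suggests capturing roles, resources, and costs explicitly before the event. the following is a minimal, hypothetical sketch of how such a shared expectations record might be kept in a structured form; the field names, staffing numbers, and cost assignments are invented for illustration, while the listed contributions paraphrase the edmonton example above.

```python
# Hypothetical sketch only: a structured record of partner expectations for a
# library-city hackathon, loosely paraphrasing the Edmonton example above.
# Field names, staffing numbers, and cost assignments are invented for illustration.

partnership_roadmap = {
    "event": "international open data day hackathon",
    "library_provides": [
        "event space, tech help, and wi-fi",
        "coffee and pizza",
        "emcee",
        "door prizes and a 'best idea' prize",
        "promotional material",
    ],
    "city_provides": [
        "participant recruitment",
        "orientation and keynote",
        "promotional banners",
    ],
    "staffing": {"library_staff_on_site": 3, "city_staff_on_site": 2},  # invented values
    "costs": {"catering": "library", "printing": "shared"},             # invented values
}

def unassigned_areas(roadmap: dict) -> list:
    """Return any expectation areas that the partners have not yet documented."""
    required = {"library_provides", "city_provides", "staffing", "costs"}
    return sorted(required - roadmap.keys())

if __name__ == "__main__":
    print(unassigned_areas(partnership_roadmap))  # prints [] once every area is covered
```

keeping the record in a shared, structured form makes it easy to see at a glance which responsibilities remain unassigned before the event and to reuse the same roadmap for subsequent hacking events with the partner.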
considerations when hosting events in library spaces

a couple of substantive reports provide overarching recommendations and considerations for hosting hackathons in library spaces, including planning checklists, tips on getting funding, building partnerships with local community officials, and thinking through the event systematically. recently, the digital public library of america (dpla) created a hackathon planning guide that details a number of logistical issues to address during the planning phases, both pre- and post-event.34 this report highlights specific considerations for galleries, libraries, archives, and museums that are looking to host a hackathon. after hosting a successful hackathon, librarians at new york university created a libguide called hack your library, which is a planning guide for other libraries considering hosting a similar event.35 the engage respond innovate final report: the value of hackathons in public libraries was put together following an event the carnegie uk trust sponsored.36 this guide highlights some of the challenges present with hackathons, including intellectual property of the creations, prizes, participant diversity, and complications that arise from choosing either specific themes or open-ended challenges. it also highlights some of the main reasons a library should consider hackathons and other coding events, including ways to promote new roles of libraries within communities, promote specific collections, capitalize on community expertise, gain insight about users, help users build new skills and improve digital literacy, and develop tools that increase access to materials. finally, the report points out that hosting an event will not be the sole solution to a library's innovation problem. yet if the library is clear on why it wants to hold a hackathon, being deliberate about the expectations and outcomes it is trying to achieve will increase the chances for success.

results: library as source

the other category of articles about hackathons and libraries focuses on the library as the source for the challenge or theme of the hackathon. the following summaries highlight articles where the libraries provided the challenges around library spaces or services, library datasets, workflows, or collections as the theme for the hackathon. this section also details steps involved in cleaning data for use/re-use in time-bound events.

using hackathons to improve library services and spaces

a few articles discuss libraries that proposed hackathon themes around improving library services. a 2016 article describes how adelphi university libraries hosted a hackathon and provided the theme of developing library mobile apps and web software applications.37 the winning student team created an app for library group study meetups. similarly, the librarians from the university of illinois tried three approaches for library app development: a student competition, a project in a computer science course, and a coding camp. with the adventure code camp, students co-designed with librarians over the course of two days.38 they advertised to specific departments and courses; ten students were selected, with six ultimately participating in the two-day coding camp. students were sent a package of library data, available apis, and brief tutorials on coding languages that may be useful. mentors and coaches were available throughout the coding camp.
the authors provided tips for others trying to replicate their approach, as well as insights from the students about interest in developing apps that include library data but that don't solely focus on library services. the following year the librarians hosted a coding contest focused specifically on app development related to library services and spaces.39 the library sponsored the event and served as both a traditional client and partner in the design process. ultimately, six teams with a total of 26 individuals participated, and each app was "required to address student needs for discovery of and access to information about library services, collections, and/or facilities" but not duplicate existing library mobile apps. they based their approach on the massachusetts institute of technology's entrepreneurship competition. through this process, co-ownership was preferred, and many teams set up a licensing agreement as part of the competition to handle intellectual property for the software. students had two weeks to complete the apps and were judged by both library and campus it administration. this article details what they learned through the process given the amount of attrition from selection of teams to final product presentations.

the new york university school of engineering worked with the libraries and used a hackathon theme of noise issues to coincide with the renovation of the library.40 the libraries created a libguide to provide structured information about the event itself (https://guides.nyu.edu/hackdibner). they used the event to market the new maker space and held workshops there leading up to the event. in the inaugural year they held the event over the course of two semesters and saw a lot of attrition due to the event length. in the second year, following focus groups with participants, they designed a library hackathon with four goals: 1) appeal to a large base of the student population, 2) create a triangle of engagement between the student and the library, the library and the faculty, and the faculty and the students, 3) provide an adaptable model to other libraries, and 4) highlight the development of student information literacy skills.41 the second year's approach required more work from the participants, who had to pitch an initial concept, provide a written proposal, and give a final presentation. library staff and guest speakers offered workshops to help students hone their skills. the planners evaluated the event through surveys and student focus groups. overall, the students applied what they learned about information literacy and were highly engaged with the codesign approach to library service improvements.

similarly, mcgowan highlights two hackathons at purdue that focused on inclusive healthcare and how the libraries applied design thinking processes as part of the events.42 the librarian wanted to encourage health sciences students to examine health data challenges. to support this goal, she applied the blended librarians adapted addie model (blaam) as a guide to developing a service to prepare students to participate in a hackathon. a number of pre-event training sessions were held in the libraries and covered topics such as research data management, openrefine and data cleaning, gephi for data visualization, and javascript.
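as a concrete illustration of what the data-cleaning portion of such pre-event preparation can involve, the following is a minimal sketch assuming a hypothetical collections csv and the pandas library; the file name and column names are invented for the example and are not drawn from any of the cited events.

```python
# Hypothetical sketch: preparing a small collections dataset ahead of a time-bound event.
# Assumes pandas is installed; the file name and column names are invented examples.
import pandas as pd

df = pd.read_csv("collection_items.csv")

# Normalize column names and trim stray whitespace in text fields.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].str.strip()

# Drop rows without a stable identifier and standardize dates so that
# participants do not spend event time on basic cleanup.
df = df.dropna(subset=["identifier"])
df["date_created"] = pd.to_datetime(df["date_created"], errors="coerce")

# Export the cleaned file plus a simple data dictionary describing each column,
# since documentation is as important to participants as the cleaning itself.
df.to_csv("collection_items_clean.csv", index=False)
data_dictionary = pd.DataFrame(
    {"column": df.columns, "dtype": df.dtypes.astype(str).values,
     "missing_values": df.isna().sum().values}
)
data_dictionary.to_csv("collection_items_data_dictionary.csv", index=False)
```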
while this initial approach was in tandem with the hackathon events, students reported that they needed assistance in finding and cleaning datasets for use. in this case, developing library services to prepare for hackathon events ended up out of alignment with both the library's mission and the participants' expectations.

using library materials for hackathon themes

several events have focused on library as source, where the library's materials or processes serve as the theme of the hackathon, particularly around digital humanities (dh) topics.43 in september 2016, over 100 participants worked with materials from the special collections of hamburg state and university library, a space that serves both the university and the public.44 it followed the process established by coding da vinci (https://codingdavinci.de/en), an event that occurred in 2014 and 2015. the event at hamburg state and university library had a kick-off day for sharing available datasets, brainstorming projects using library materials, and team-building opportunities. the event had a second day of programming, and then teams had six weeks to complete their projects. some exemplary products included a sticker printer that would print old photographs, a quiz app based on engraving plates, and a project using a social media platform to bring the engravings to the public. the event was successful and resulted in opening additional data from the institution.

several examples focus on highlighting digital humanities approaches as part of the events. in 2016, four teams from across european institutions participated over five days in kibbutz lotan in the arava region of israel to develop linguistic tools for tibetan buddhist studies with the goal of revealing their collections to the public.45 the planning team recruited international scholars to participate in prestructured teams (teams consisted of computer scientists as well as a tibetan scholar) in israel. although it was less a traditional hackathon than a coding contest around a specific task, the event highlighted tools and methods for understanding literary texts. the format of the event for encouraging interdisciplinary efforts in the computational humanities was deemed successful, and it was repeated the next year with a focus on manuscripts and computer-vision approaches. recently, the university of waterloo detailed a series of datathons using archives unleashed to engage the community in an open-source digital humanities project.46 the goal of the events was to engage dh practitioners with the web archive analysis tools and attempt to build a web archiving analysis community.

in 2016, the american museum of natural history in new york hosted their third annual hackathon event, hack the stacks, with more than 100 participants.47 the event focused on creating innovative solutions for libraries or archives that would "animate, organize, and enable greater access to the increasing body of digitized content." ten tasks were available for participants to work on, ranging from a unified search interface to reassembling fragments of scientific notebooks and creating timelines of archival photos of the museum itself. in addition to planning the tasks, the library staff ensured that the databases and applications could handle the additional traffic. a multitude of platforms were provided (omeka, dspace, the catalog, apis, archivesspace, etc.) for hackers to use.
all prototypes that were developed were open source and deposited on github at "hack the stacks."48 some cultural institutions have used hackathons as a means of outreach and publicity and then have showcased the outputs at the museums. vhacks, a hackathon at the vatican, was held in 2018 and gathered 24 teams from 30 countries for a 36-hour event.49 the three themes for the event focused on social inclusion, interfaith dialogue, and migrants and refugees. a winner was announced for each thematic area, and sponsors enticed participants to continue working on their projects by holding a venture capitalist pitch a few weeks after the event. another program, museomix, concentrates on a three-day rapid prototyping event where outputs are highlighted in the museum or cultural institution.50 this event has happened annually in november since 2011, and the goal is to create interdisciplinary networks and encourage innovation and community partnership.

improving library workflows and processes

other hackathons have focused on library staff working on library processes themselves. bergland, davis, and traill detail a two-day event, catdoc hack doc, hosted by the university of minnesota data management and access department and focused on increasing documentation by library staff.51 this article details the logistics of preparing for the event as well as a summary of the work completed. they based their approach on the islandora collaboration group's template on how to run a hack/doc.52 they were pleased with the workflow overall, refined some of the steps, and held it again for library staff the following year. similarly, dunsire highlights using a hackathon format to encourage adoption of the cataloging approach of resource description and access (rda) through a "jane-athon."53 events occurred at library conferences or in conjunction with other hackathon events, such as the thing-athon at harvard, with the intention of promoting the use of rda, helping users understand its utility, and sparking discussions. this approach proved useful in uncovering some limitations with rda as well as valuable feedback that could be incorporated into its ongoing development.

considerations when using libraries as source

if libraries are interested in hosting a hackathon where the library plays a more central role, there are several sources of ready-to-use library and museum data that could allow the host to also serve as the content provider. the digital public library of america released a hackathon guide, glam hack-in-a-box: a short guide for helping you organize a glam hackathon, with several sources at the end for finding data related to libraries.54 the university of glasgow began a project called global history hackathons, which seeks to improve access and excitement around global history research.55 additionally, candela et al.
detail the new array of sources for sharing glam data for reuse in multiple ways, including using data in hackathon projects.56 planners could look to the collections-as-data conversations for other data sources that could be adapted for hackathon projects.57 when thinking about hackathons and cultural institutions, sustainability of projects and choice of platforms are important considerations for planners.58 ultimately, the top priority when providing a dataset is to ensure that it is clean and that enough details about the dataset are available for participants to make use of it in their designs given the time constraints of most events.

discussion

hackathons often have a dual purpose of educating the participants and serving as an advertisement for the sponsor's platform or content. participants develop a working prototype or improve their coding abilities; sponsors, including libraries, can benefit from rapid prototyping and idea generation using either their platforms or content. while usable apps or new ideas are a welcome outcome, even if the applications are not used, the events still feed into the larger goal of marketing libraries and their data, building relationships with local communities, or drawing attention to social good. there are benefits to libraries in either hosting or collaborating on the events. in both areas, those of library as space and library as source, hackathons help realign user expectations of libraries. if libraries choose to become involved with hackathons or other coding or data contests, the library should be deliberate in its goals and intended outcomes, as those will help shape both the event and its planning.

libraries are naturally aligned with teaching and learning, are already offering co-curricular programming, and typically serve as physical and communication hubs for their campuses. libraries already prioritize outreach and engagement with constituents both on campuses and in the community. therefore, when programs align with library priorities of data literacy, data fluency, and information evaluation, it is a natural fit to propose involvement in hosting hackathons. many libraries are able to customize their spaces, services, and vendor interfaces, which is a benefit when thinking about having libraries as a theme for an event. other benefits exist for the hackathon event planners when partnering with a library. hackathon planners should consider reaching out to libraries as they already serve as cross-disciplinary event spaces, host many other outreach events, and are often connected to other campus and community stakeholders and communication outlets. since students from all disciplines and colleges already use the library spaces on college campuses, they are an ideal location for fostering collaborations across different colleges and majors. public libraries function as community gathering spots as well. as libraries consider hosting events, several articles provide overarching tips for planning and hosting hackathons and other time-bound events.59 table 2 provides an overview of articles and the areas of coverage for planning topics.
table 2. selected articles for tips on planning hackathon events, based on common article theme areas. the planning topics covered across these articles are location details; sample agenda and timelines; power and computing; mentors/judging; and further readings.

carruthers (2014) — covers four of the five planning topics.
nelson & kashyap (2014) — covers all five planning topics.
jansen-dings, dijk, and van westen (2017) — covers three of the five planning topics.
bogdanov & isaac-menard (2016) — covers three of the five planning topics.
nandi & mandernach (2016) — covers three of the five planning topics.
grant (2017) — covers all five planning topics.

as library data becomes more open and reusable, hackathons will be a way to highlight data availability, promote its use and reuse, and reach out to the community. one issue when considering library collections as potential hackathon themes is that libraries will need to ensure the data are cleaned and contain sufficient metadata so that the data are ready to use. additionally, if there are programming language restrictions for ongoing maintenance by the library after the event, those should be specified when advertising the event. ultimately, the libraries will likely not control the intellectual property (ip) of the tool or visualization developed, but several libraries have specified the ultimate ip as part of the event details, either as open source or co-owned.60 often the goal of the event is the promotion of specific materials or building awareness of a collection rather than any byproduct created during the event. however, it is important for the library to be clear about its intent when advertising to participants. the collections-as-data movement will continue to evolve, and there will be a multitude of library resources that could be mined for use at hackathons or other similar events.

while libraries provide an ideal location and have access to data that can be used for an event, they can also leverage their wealth of experts. library staff can serve as judges, mentors, and connectors to the wider campus or community. events could highlight specific expertise when hackathons focus on particular approaches (data visualization), processes (metadata management or documentation), or codesign of services (physical spaces). table 3 provides examples of hackathon events from a variety of library contexts. hackathons are a great way for libraries to serve as a connector to others on campus or in their communities. if libraries are not interested in or able to host an event themselves, library staff can act as mentors or event judges. at smaller schools, library staff can partner with other campus units to plan a hackathon; similarly, smaller public libraries could work with community organizations to host events. at a smaller scale, if staffing is a concern or a full hackathon is unrealistic, a coding contest or datathon, both of which typically have a shorter duration, might be an option. edit-a-thons are even easier to host as they require only an introduction to the editing process, ample computer space (or laptop hook-ups), and a small food budget. some edit-a-thon events happen in a single afternoon.
table 3. selected hackathon event summaries from various library contexts, based on themes and products of the event

carruthers (2014) — type of library: public + city; size of event: 29 participants; time for event: 1 day; purpose of event: highlight open data from the libraries; role of the library: event space, coffee + pizza, emcee, some prizes, assessment; output: building partnerships with the city, getting dataset requests.

ward, hahn, mestre (2015) — type of library: academic; size of event: 6 teams (25 participants); time for event: 2 weeks; purpose of event: develop apps using library data; role of the library: event sponsor, mentor; output: app development for the library using library data.

mititelu & grosu (2016) — type of library: academic; size of event: 100 participants; time for event: 48 hours; purpose of event: bring together tech students; role of the library: event space; output: app development for sponsors.

nandi & mandernach (2016) — type of library: academic; size of event: 200 participants; time for event: 36 hours; purpose of event: bring together tech students; role of the library: event space, planning logistics, judges; output: various apps, not library related.

baione (2017) — type of library: private museum; size of event: 100 participants; time for event: 2 days; purpose of event: animate, organize, and enable greater access to digitized content from the library; role of the library: create challenges, event space, judges; output: open source apps for glam institutions.

theise (2017) — type of library: academic + public; size of event: 100 participants; time for event: 2 days + 6-week sprint; purpose of event: cultural hackathon to highlight library data and resources; role of the library: event space, challenges, datasets for hacking; output: highlighted data available for use, created apps focused on library materials.

almogi et al. (2019) — type of library: academic; size of event: 23 participants; time for event: 5 days; purpose of event: develop linguistic tools for buddhist studies; role of the library: provided cleaned datasets for manipulation; output: linguistic tools for buddhist studies.

one area for iteration around these events relates to timing. while most hackathons last 24–36 hours, some are run over the course of a one- or two-month period where coding happens remotely with a few scheduled check-ins with mentors before judging and presentations. this notion of a remote event may have more appeal for collections-as-data–themed events, as experts are more likely to be available for keynotes or mentoring. if the process instead of the product is the focus of the event, then providing a flexible structure may be more appealing to participants. if a library has more limited resources or capacity, stretching the event out over a longer period would allow for sustained interactions. however, libraries should be aware that the longer the event period, the greater the attrition of the participants. an area for future research includes assessment of library participation in events. a couple of articles highlighted the value the libraries found in the events, but it is unclear whether the participants also gained value from the libraries.61 typically, post-event surveys have focused on the participant experience or the overall event space, rather than whether the event affected participants' view of the libraries, which would be another area of interest for future research.62

conclusion

in the realm of hackathons and libraries, hackathon themes were originally a way that vendors could highlight new content or improve interfaces. libraries followed this trend and used events to reach out to constituents, make connections with their communities, and highlight evolving library services. with the growth of flexible spaces, ample technology support, and more relaxed food policies, libraries have become ideal event locations.
as the collections-as-data movement evolves, there will be more opportunities to develop services related to these data and other library data which would lend themselves easily as themes for hackathons, edit-a-thons, or datathons. libraries thinking about hosting events will need to weigh the amount of time and resources they want to invest with the intended goals of hosting an event. planning is essential whether the library is the event host, a collaborator, or a sponsor of a hackathon. for those libraries that are unable to host a full hackathon, smaller events, such as a datathon or edit-a-thon, are possibilities to provide support without the same time and resource commitment. given the growing popularity of hackathons and other coding contests, they may be a catch-all for solving several library issues simultaneously: updating the library’s image as being more than book-centric, supporting the collections-as-data movement, and a new way of engaging community partners. acknowledgements thank you to jody condit fagan for providing valuable suggestions on a draft of this paper and to the two anonymous reviewers whose feedback improved the quality of this manuscript. endnotes 1 james k. elmborg, “libraries as the spaces between us: recognizing and valuing the third space,” reference and user services quarterly 50, no. 4 (2011): 338–50. 2 “a brief open source timeline: roots of the movement,” online searcher 39, no. 5 (2015): 44–45; patrick timony, “accessibility and the maker movement: a case study of the adaptive technology program at district of columbia public library,” in accessibility for persons with disabilities and the inclusive future of libraries, advances in librarianship, vol. 40, (emerald group publishing limited, 2015), 51–58; kurt schiller, “elsevier challenges library community,” information today 28, no. 7 (july 2011): 10; eric lease morgan, “worldcat information technology and libraries december 2021 hackathons and libraries |longmeier 14 hackathon,” infomotions mini-musings (blog), last modified november 9, 2008, http://infomotions.com/blog/2008/11/worldcat-hackathon/; margaret heller, “creating quick solutions and having fun: the joy of hackathons,” acrl techconnect (blog), last modified july 23, 2012, http://acrl.ala.org/techconnect/post/creating-quick-solutions-andhaving-fun-the-joy-of-hackathons. 3 clemens neudecker, “working together to improve text digitization techniques: 2nd succeed hackathon at the university of alicante,” impact centre of confidence in digitisation blog, last updated april 22, 2014, https://www.digitisation.eu/succeed-2nd-hackathon/; porter anderson, “futurebook hack,” bookseller no. 5628 (june 20, 2014): 20–21; sarah shaffi, “inaugural hack crowns its diamond project,” bookseller no. 5628 (june 20, 2014): 18–19. 4 rose sliger krause, james rosenzweig, and paul victor jr. “out of the vault: developing a wikipedia edit-a-thon to enhance public programming for university archives and special collections,” journal of western archives 8, no. 1 (2017): 3; stanislav bogdanov and rachel isaac-menard, “hack the library: organizing aldelphi [sic] university libraries’ first hackathon,” college and research libraries news 77, no. 4 (2016): 180–83; matt enis, “civic data partnerships,” library journal 145, no. 1 (2020): 26–28; alex carruthers, “open data day hackathon 2014 at edmonton public library,” partnership: the canadian journal of library & information practice & research 9 no. 
2 (2014): 1–13, https://doi.org/10.21083/partnership.v9i2.3121; sarah shujah, “organizing and embedding a library hackfest into a 1st year course,” information outlook 18, no. 5 (2014): 32–48; lindsay anderberg, matthew frenkel, and mikolaj wilk, “project shhh! a library design contest for engineering students,” in american society for engineering education 2018 annual conference proceedings (2018): paper id 21058, https://cms.jee.org/30900. 5 michelle demeter et al., “send in the crowds: planning and benefiting from large-scale academic library events,” marketing libraries journal 2 no. 1 (2018): 86–95, https://bearworks.missouristate.edu/cgi/viewcontent.cgi?article=1089&context=articles-lib. 6 jamie lausch vander broek and emily puckett rodgers, “better together: responsive community programming at the um library,” journal of library administration 55, no. 2 (2015): 131–41; arnab nandi and meris mandernach, “hackathons as an informal learning platform,” in sigcse 2016 – proceedings of the 47th acm technical symposium on computing science education (february 2016): 346–51, https://doi.org/10.1145/2839509.2844590; lindsay anderberg, matthew frenkel, and mikolaj wilk, “hack your library: engage students in information literacy through a technology-themed competition,” in american society for engineering education 2019 annual conference proceedings, (2019): paper id 26221, https://peer.asee.org/32883; anna grant, hackathons: a practical guide, insights from the future libraries project forge hackathon (carnegieuk trust, 2017), https://www.carnegieuktrust.org.uk/publications/hackathons-practical-guide/; carruthers, “open data day hackathon 2014 at edmonton public library”; chad nelson and nabil kashyap, glam hack-in-a-box: a short guide for helping you organize a glam hackathon (digital public library of america, summer 2014), http://dpla.wpengine.com/wpcontent/uploads/2018/01/dpla_hackathonguide_forcommunityreps_9-4-14-1.pdf. information technology and libraries december 2021 hackathons and libraries |longmeier 15 7 david ward, james hahn, and lori mestre, “adventure code camp: library mobile design in the backcountry,” information technology and libraries 33, no. 3 (2014): 45–52; david ward, james hahn, and lori mestre, “designing mobile technology to enhance library space use: findings from an undergraduate student competition,” journal of learning spaces 4, no. 1 (2015): 30–40. 8 ann marie l. davis, “current trends and goals in the development of makerspaces at new england college and research libraries,” information technology and libraries 37, no. 2 (2018): 94–117, https://doi.org/10.6017/ital.v37i2.9825; mark bieraugel and stern neill, “ascending bloom’s pyramid: fostering student creativity and innovation in academic library spaces,” college & research libraries 78, no. 1 (2017): 35–52; elyssa kroski, the makerspace librarian’s sourcebook (chicago: ala editions, 2017); angela pashia, “empty bowls in the library: makerspaces meet service,” college & research libraries news 76 no. 2 (2015): 79–82; h. michele moorefield-lang, “makers in the library: case studies of 3d printers and maker spaces in library settings,” library hi tech 32, no. 4 (2014): 583–93; adetoun a. oyelude, “virtual reality (vr) and augmented reality (ar) in libraries and museums,” library hi tech news 35, no. 5 (2018) 1–4. 
9 krause, rosenzweig, and victor jr., “out of the vault”; ed yong, “edit-a-thon gets women scientists into wikipedia,” nature news (october 22, 2012), https://doi.org/10.1038/nature.2012.11636; angela l. pratesi et al., “rod library art+feminism wikipedia edit-a-thon,” community engagement celebration day (2018): 10, https://scholarworks.uni.edu/communityday/2018/all/10; maitrayee ghosh, “hack the library! a first timer’s look at the 29th computers in libraries conference in washington, dc,” library hi tech news 31, no. 5 (2014): 1–4, https://doi.org/10.1108/lhtn-05-20140031. 10 carruthers, “open data day hackathon 2014 at edmonton public library”; bob warburton, “civic center,” library journal 141, no. 15 (2016): 38. 11 matt burton et al., shifting to data savvy: the future of data science in libraries (project report, university of pittsburgh, pittsburgh, pa, 2018): 1–24, https://d-scholarship.pitt.edu/33891/. 12 vander broek and rodgers, “better together”; nandi and mandernach, “hackathons as an informal learning platform”; robin camille davis, “hackathons for libraries and librarians,” behavioral & social sciences librarian 35, no. 2 (2016): 87–91; bogdanov and isaac-menard, “hack the library”; ward, hahn, and mestre, “adventure code camp”; ward, hahn, and mestre, “designing mobile technology to enhance library space use”; demeter et al., “send in the crowds”; carruthers, “open data day hackathon 2014 at edmonton public library.” 13 vander broek and rodgers, “better together.” 14 demeter et al., “send in the crowds.” 15 nandi and mandernach, “hackathons as an informal learning platform.” 16 eduard mititelu and vlad-alexandru grosu, “hackathon event at the university politehnica of bucharest,” international journal of information security & cybercrime 6, no. 1 (2017): 97–98; information technology and libraries december 2021 hackathons and libraries |longmeier 16 orna almogi et al., “a hackathon for classical tibetan,” journal of data mining and digital humanities, episciences.org, special issue on computer-aided processing of intertextuality in ancient languages, hal-01371751v3 (2019): 1–10, https://jdmdh.episciences.org/5058/pdf. 17 davis, “hackathons for libraries and librarians.” 18 ghosh, “hack the library!” 19 timony, “accessibility and the maker movement”; ward, hahn, and mestre, “adventure code camp.” 20 gérald estadieu and carlos sena caires, “hacking: toward a creative methodology for cultural institutions,” (presented at the viii lisbon summer school for the study of culture “cuber+cipher+culture”, september 2017); andrea valdez, “the vatican hosts a hackathon,” wired magazine, last updated march 7, 2018, https://www.wired.com/story/vaticanhackathon-2018/; leonardo moura de araujo, “hacking cultural heritage: the hackathon as a method for heritage interpretation,” (phd diss., university of bremen, 2018): 181–231, 235– 38. 21 thomas finley, “innovation lab: a conference highlight,” texas library journal 94, no. 2 (summer 2018): 61–62. 22 grant, hackathons: a practical guide. 23 warburton, “civic center.” 24 warburton, “civic center.” 25 aude charillon and luke burton, “engaging citizens with data the belongs to them,” cilip update magazine (november 2016). 26 enis, “civic data partnerships.” 27 carruthers, “open data day hackathon 2014 at edmonton public library.” 28 kevin mcarthur, herb lainchbury, and donna horn, “open data hackathon how to guide v. 
1.0,” october 2012, https://docs.google.com/document/d/1fbuisdtiibaz9u2tr7sgv6gddlov_ahbafjqhxsknb0/e dit?pli=1; david eaves, “open data day 2013 in vancouver,” eaves.ca (blog), march 11, 2013, https://eaves.ca/2013/03/11/open-data-day-2013-in-vancouver/. 29 bogdanov and isaac-menard, “hack the library.” 30 shujah, “organizing and embedding a library hackfest into a 1st year course.” 31 nancy shin, kathryn vela, and kelly evans, “the research role of the librarian at a community health hackathon—a technical report,” journal of medical systems 44 (2020): 36. 32 geri diorio, “programming by the book,” voices of youth advocates 35, no. 4, (2012): 326–327. information technology and libraries december 2021 hackathons and libraries |longmeier 17 33 lauren barack and matt enis, “where teens teach,” school library journal (april 2016): 30. 34 nelson and kashyap, glam hack-in-a-box. 35 lindsay anderberg, matthew frenkel, and mikolaj wilk, “hack your library: a library competition toolkit,” june 6, 2019, https://wp.nyu.edu/hackyourlibrary/; anderberg, frenkel, and wilk, “hack your library: engage students in information literacy through a technologythemed competition.” 36 anna grant, engage. respond. innovate. the value of hackathons in public libraries (carnegieuk trust, 2020), https://www.carnegieuktrust.org.uk/publications/engage-respond-innovatethe-value-of-hackathons-in-public-libraries/. 37 bogdanov and isaac-menard. “hack the library.” 38 ward, hahn, and mestre, “adventure code camp.” 39 ward, hahn, and mestre, “designing mobile technology to enhance library space use.” 40 anderberg, frenkel, and wilk, “project shhh!” 41 anderberg, frenkel, and wilk, “hack your library: engage students in information literacy through a technology-themed competition.” 42 bethany mcgowan, “the role of the university library in creating inclusive healthcare hackathons: a case study with design-thinking processes,” international federation of library associations and institutions 45, no. 3 (2019): 246–53, https://doi.org/10.1177/0340035219854214. 43 marco büchler et al., “digital humanities hackathon on text re-use ‘don’t leave your data problems at home!’” electronic text reuse acquisition project, event held july 27–31, 2015, http://www.etrap.eu/tutorials/2015-goettingen/; helsinki centre for digital humanities, “helsinki digital humanities hackathon 2017 #dhh17,” event held may 15–19, 2017, https://www.helsinki.fi/en/helsinki-centre-for-digital-humanities/dhh-hackathon/helsinkidigital-humanities-hackathon-2017-dhh17. 44 antje theise, “open cultural data hackathon coding da vinci–bring the digital commons to life,” in ifla wlic 2017 wroclaw poland, session 231—rare books and special collections (2017), http://library.ifla.org/id/eprint/1785. 45 almogi et al., “a hackathon for classical tibetan.” 46 samantha fritz et al., “fostering community engagement through datathon events: the archives unleased experience,” digital humanities quarterly 15, no. 1 (2021): 1–13, http://digitalhumanities.org/dhq/vol/15/1/000536/000536.html. 47 tom baione, “hackathon & 21st-century challenges.” library journal 142, no. 2 (2017): 14–17. information technology and libraries december 2021 hackathons and libraries |longmeier 18 48 american museum of natural history, “hack the stacks,” https://www.amnh.org/learnteach/adults/hackathon/hack-the-stacks, https://github.com/amnh/hackthestacks/wiki, https://github.com/hackthestacks. 
49 andrea valdez, “inside the vatican’s first-ever hackathon: this is the holy see of the 21st century,” wired magazine, march 12, 2018, https://www.wired.com/story/inside-vhacksfirst-ever-vatican-hackathon/. 50 museomix, “concept,” accessed march, 29, 2021, https://www.museomix.org/en/concept/. 51 kristi bergland, kalan knudson davis, and stacie traill, “catdoc hackdoc: tools and processes for managing documentation lifecycle, workflows, and accessibility,” cataloging and classification quarterly 57, no. 7–8 (2019): 463–95. 52 islandora collaboration group, “templates: how to run a hack/doc,” last modified december 5, 2017, https://github.com/islandora-collaborationgroup/icg_information/tree/master/templates_how_to_run_a_hack_doc. 53 gordon dunsire, “toward an internationalization of rda management and development,” italian journal of library and information science 7, no. 2 (may 2016): 308–31. http://dx.doi.org/10.4403/jlis.it-11708 54 nelson and kashyap, glam hack-in-a-box. 55 hannah-louise clark, “global history hackathons information,” accessed april 19, 2021, https://www.gla.ac.uk/schools/socialpolitical/research/economicsocialhistory/projects/glob al%20historyhackathons/history%20hackathons/. 56 gustavo candela et al., “reusing digital collections from glam institutions,” journal of information science (august 2020): 1–10, https://doi.org/10.1177/0165551520950246. 57 thomas padilla, “on a collections as data imperative,” uc santa barbara, 2017, https://escholarship.org/uc/item/9881c8sv; rachel wittmann et al., “from digital library to open datasets,” information technology and libraries 38, no. 4 (2019): 49–61, https://doi.org/10.6017/ital.v38i4.11101; sandra tuppen, stephen rose, and loukia drosopoulou, “library catalogue records as a research resource: introducing ‘a big data history of music,’” fontes artis musicae 63, no. 2 (2016): 67–88. 58 moura de araujo, “hacking cultural heritage.” 59 grant, hackathons: a practical guide; grant, engage. respond. innovate.; joshua tauberer, “hackathon guide,” accessed march 26, 2021, https://hackathon.guide/; alexander nolte et al., “how to organize a hackathon—a planning kit,” arxiv preprint arxiv:2008.08025 (2020), https://arxiv.org/abs/2008.08025v2; ivonne jansen-dings, dick van dijk, and robin van westen, hacking culture: a how-to guide for hackathons in the cultural sector, waag society, (2017): 1–41. https://waag.org/sites/waag/files/media/publicaties/es-hacking-culturesingle-pages-print.pdf. 60 ward, hahn, and mestre, “designing mobile technology to enhance library space use.” information technology and libraries december 2021 hackathons and libraries |longmeier 19 61 mcgowan, “the role of the university library in creating inclusive healthcare hackathons.” 62 nandi and mandernach, “hackathons as an informal learning platform”; carruthers, “open data day hackathon 2014 at edmonton public library.” meeting users where they are: delivering dynamic content and services through a campus portal communications meeting users where they are delivering dynamic content and services through a campus portal graham sherriff, dan desanto, daisy benson, and gary s. atwood information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11519 graham sherriff (graham.sherriff@uvm.edu) is instructional design librarian, university of vermont. dan desanto (ddesanto@uvm.edu) is instruction librarian, university of vermont. daisy benson (daisy.benson@uvm.edu) is library instruction coordinator, university of vermont. gary s. 
atwood (gatwood@uvm.edu) is education librarian, university of vermont. abstract campus portals are one of the most visible and frequently used online spaces for students, offering one-stop access to key services for learning and academic self-management. this case study reports how instruction librarians at the university of vermont collaborated with portal developers in the registrar’s office to develop high-impact, point-of-need content for a dedicated “library” page. this content was then created in libguides and published using the application programming interfaces (apis) for libguides boxes. initial usage data and analytics show that traffic to the libraries’ portal page has been substantially and consistently higher than expected. the next phase for the project will be the creation of customized library content that is responsive to the student’s user profile. introduction for many academic institutions, campus portals (also referred to as enterprise portals) are one of students’ most frequently used means of interacting with their institutions. campus portals are websites that provide students and other campus constituents with a “one-stop shop” experience, with easy access to a selection of key services for learning and academic self -management. typically, portals provide features that make it possible for students to obtain course information, manage course enrollment, view grades, manage financial accounts, and access information about campus activities. for faculty and staff, campus portals provide access to administrative resources related to teaching, human relations, and more. these campus portals are different from library portals, which some libraries implemented in the 2000s as a way to centralize access to key library services.1 currently, the public-facing websites of many colleges and universities serve a crucial role in marketing the institution to prospective students. this creates an incentive to be as comprehensive as possible and to showcase the full breadth of programs, services, offices, and facilities. a common disadvantage to this approach to institutional web design is information overload: an overwhelming array of labels and links that diminish the ability of current affiliates to find and access the services they need. these sites are designed for external users for whom the research and educational functions of the library are a low priority. campus portals, however, are designed for internal users and can take a more selective approach. they give student and faculty users a view of campus services that aligns with their priorities and places them in a convenient interface. in this sense, they are tools for information management. campus portals play a critical role in students’ daily lives because they do much more than simply present information. 
carden observes that campus portals have these key characteristics:

• allow a single user authentication and authorization step at the initial point of contact to be applied to all (or most) other entities within the portal;
• allow multiple types and sources of information to be displayed on a single composite screen (multiple "channels");
• provide automated personalization of the selection of channels offered, based on each user's characteristics, on the groups to which each user belongs, and possibly on the way in which the system has historically been used;
• allow user personalization of the selection of channels displayed and the look-and-feel of the interface, based on personal preferences;
• provide a consistent style of access to diverse information sources, including "revealing" legacy applications through a new consistent interface; and
• facilitate transaction processing as well as simple data access.2

in sum, enterprise portals use a combination of advanced technologies that have the ability to present both static and user-responsive information in a space reserved for affiliates of the university. these abilities present an attractive venue for libraries to leverage the capabilities of a campus portal to present users with dynamic, personalized instructional experiences—in a space where users are. this aligns with the principles of user-centered design, which emphasizes the need to empathize with users' needs and perspectives. simplicity, efficiency, convenience, and responsiveness to each user's individual circumstances are critical.3

the idea of presenting libraries' content through a campus portal is not a new one. stoffel and cunningham surveyed libraries in 2004 and, while finding that "library participation in campus portals is . . . relatively rare," of the sixteen self-selected responding campuses, ten had a library tab or a dedicated library channel within their campus portal, while two more had a channel or tab under development.4 the types of library integration described in most examples consisted of using the portal's campus authentication to link to a user's library account and view borrowed books, fines, holds, and announcements. while resources like federated searches, research guides, and lists of journals and databases appeared in some respondents' portals, they largely appeared as static content rather than responding to the user's profile. since 2004, portals have remained a core part of the university of vermont's information delivery system, but portal integration remains relatively rare among libraries, and most have done little to integrate new tools such as research guides or develop instructional content that leverages a portal's user-responsive design. as a result, there is little in the literature on libraries' integration of content into campus portals, but a small number of case studies provide proof of concept, such as lehigh university, california state university-sacramento, and arizona state university.5 these case studies also illustrate the importance of cross-campus collaboration. our project required some critical elements, specifically access to the campus portal and a method for publishing content.
the projects described in the case studies were successful partly because they were able to apply advanced programming expertise that was not available to our group, such as api coding. instead, our group was able to obtain these critical inputs through a partnership with the university of vermont registrar’s office. at the university of vermont, the campus portal uses the banner product licensed from ellucian and is branded “myuvm.” it is administered by the registrar’s office. librarians have observed that it is central to students’ academic lives. students go to myuvm as their pathway to many of the online services and tools that they use. they go there to check email, log in to the learning management system (lms), check grades, add, drop, or withdraw from courses, check their schedule, and more. they go there to carry out tasks. figure 1. screenshot of myuvm (https://myuvm.uvm.edu as it was on march 1, 2019). the importance of myuvm is communicated to university of vermont students at orientation. in this way, first-year students learn at the earliest point, even before their academic programs begin, that the portal is their primary gateway for access to campus academic services. this shapes their view of the services available to them and how those services are organized. it also shapes how they reach those services and how they interact with them. at the same time, the selective principle underlying the campus portal means that if something is not present, it is less visible and less accessible, and there is a risk of signaling to students that it is not important to their daily lives or their academic performance. methods the characteristics of campus portals and their contents motivated instruction librarians to explore the possibility of integrating library services into myuvm. in 2014, the university of vermont libraries’ educational services working group—a small cross-libraries group of librarians who work on a variety of projects supporting classroom instruction and research assistance—began by defining the desirable scope of possible portal content. the educational services working group quickly determined that library content included in the portal should be designed to conform with the principle of priority-based selectivity employed across the portal as a whole. this content should not attempt to represent the full suite of library information and services available. this would replicate the websites of the three libraries on campus and would risk creating overload and disorientation, in a similar way to institutional websites. it is common for actionable and instructional material to become buried beneath links on a library homepage, and the homepages of our three libraries’ websites are no different. our hope was to reposition selected instructional content such as research guides, databases-by-subject, chat reference, and liaison librarian contacts in a venue with which students are used to interacting. the goal of the project was the strategic positioning of dynamic, responsive information about research services in a venue with which students frequently interact. research librarians would select and organize the most important and pertinent instructional content.
such selectivity fit well within the portal’s principle for curating content: high-use tools and services that directly support students’ priorities. thus the objective for this project would not be the re-creation of the library websites within myuvm. it was also determined that the scope would exclude content that might be considered marketing or engagement for its own sake, for the same purpose of minimizing users’ cognitive load and helping them to quickly find the features they need. the myuvm developers in the registrar’s office were enthusiastic about working with us on this project, which partly reflects an increased attention across campus to equitable access to student services for all users—something that is important for its own sake, but also for the purposes of accreditation. following preliminary discussions in early 2018, myuvm developers created a test “libraries” page, equivalent to a full screen of content, and assigned to our group the privileges necessary to view it in the myuvm test environment. each page in myuvm is composed of a series of content boxes or channels. in developing our new page, our task was to develop content for the desired channels. we began our process for composing the page with a card-sorting exercise that identified priorities for the content that should be highlighted. the participants were the group’s members, in order to expedite initial decisions about content that could be tested with users at a later point in the project. items that figured prominently in this process were the libraries’ “ask a librarian” service, research guides, and search tools (discovery layer, databases, and journal directory). this confirmed that our group’s priorities centered on users’ transactional interactions with library services and not merely the one-way promotion of library information. the results of the card sorting were then translated into a wireframe (see figure 2). each square in the wireframe represented a channel for which we would need to create the appropriate content:
• ask a librarian (contact details for the libraries’ research assistance services)
• research guides (subject and class guides)
• search our collections (search tools for the discovery layer, databases, and journal directory)
• research roadmap (the libraries’ suite of tutorials on foundational research skills)
• featured content (a channel for rotating or temporary time-specific content)
• libraries (a box with a link to each of the three libraries on campus; we later added a channel for each library)
the wireframe also envisaged the inclusion of a pop-out chat widget. figure 2. wireframe for library content. as noted, the project needed a process that would enable our group to create and publish this content autonomously, but without requiring advanced programming skills on our part. we learned that myuvm is capable of publishing content pushed from a webpage by using its url. this meant that we could create content in libguides, a platform with which our group was very familiar, and then push the content of an individual libguide box to a myuvm channel simply by providing the libguide box urls to the portal developers. this method offers several advantages. importantly, it meant that our group had direct control of the box content and was able to publish it without needing the myuvm developers to review and authorize every edit.
those involved in this project faced important decisions early in the process regarding which resources we deemed essential for inclusion and best suited to this new online context. once items were selected, it was important to keep user behaviors in mind as we prioritized “above the fold” content. students are used to quickly popping into the portal, finding what they need, and popping out. we tried to place interactive content that fit this use pattern in high-visibility places and moved content that required more sustained reading and attention further down the page. a challenge faced during the design process was our campus’s lack of a unified, cross-libraries web presence. the three libraries on our campus have separate websites, but the university of vermont portal required that we present a unified “libraries” presence. in some cases, such as links back to library webpages, we were easily able to treat the three libraries separately. in other cases, such as our research guides, we were able to merge resources from multiple libraries. in still other cases, such as our chat widgets, we had to make decisions about which library’s resource would be featured and which other versions would be secondary. the prototyping and testing phases revealed that some content needed to be adjusted in order to display in myuvm as desired. libguides’ tabbed boxes and gallery boxes did not display correctly. also, some style coding inherited from the libguides boxes needed to be adjusted in order to display cleanly. one item, “course reserves,” was present in the wireframe but not the page at the time of implementation. we continue to work on the development of a widget for searching “course reserves” holdings. the version of the “library” page at the time of going live is shown in figure 3. figure 3. screenshot of the “library” page in myuvm. the “research guides” channel has a dropdown menu for subject guides and another for class guides. these menus were created using libguides widgets, meaning that they update automatically as guides are published and unpublished, and do not require any manual maintenance. the “search our collections” channel includes three access points to the libraries’ collections. this contrasts with the libraries’ websites, which display only the discovery layer search box. the latter approach has the advantage of promoting one-stop searching, but also the disadvantage of overwhelming users with non-relevant results. channels on the left side of the page are less dynamic and interactive. at the top, links to the three libraries on campus provide highly visible quick access for students looking for the libraries’ websites. similarly, the “ask a librarian” channel quickly gets students to reference and consultation services at their home library. the “you should know” channel provides a space for rotating content to be changed based on time-of-year, events on campus, or other perceived student needs. results the “library” page in myuvm went live in january 2019, at the same time that spring semester classes began. our preliminary review of results from the semester, based on data collected from myuvm, libguides statistics, and google analytics, has identified several positive outcomes.
myuvm data showed that there were 18,891 visits to the “library” page during the period from mid-january to the end of march, a period of eleven weeks when classes were in session. this volume of traffic substantially exceeded our group’s expectations for the first months following implementation, during a period when we were only beginning to promote awareness of the page. data also showed that usage during this period was generally consistent. the most significant variation in traffic was a small peak in late february that corresponded with a high point in the level of library instruction. libguides statistics showed an overall increase in usage of subject guides, though it is not possible to attribute this to the myuvm project with complete certainty. in addition, however, we also observed that for many of our guides during this period, myuvm was among the top referring sites. libguides statistics also recorded unexpectedly large increases in usage for the “research roadmap” that we attribute primarily to the myuvm project. four sections of the “research roadmap” experienced increases of more than 100 percent during the january-march period. the research roadmap’s “more help” page showed a 65 percent drop in visits, but a possible explanation for this is that the highlighting of sections in myuvm is providing more-immediate help to our users in finding what they need and promoting independent use of instructional materials by students. libchat statistics indicated a significant increase in chat reference transactions at howe library, the university of vermont’s central library: a 23 percent increase over the count for the fall 2018 semester, with the implementation of the myuvm project being the only reasonable explanation. all initial data appear to show that users are finding and continuing to use the “library” tab in the portal. they are discovering guides and using the embedded chat widget. we plan to gather more usage data for other channels on the page to better inform our picture of what users are doing once they find and view the “library” tab. as campus portals have become a ubiquitous part of university life, revisiting the library’s role in these portals seems worthwhile, especially given that commonplace design tools like libguides dramatically lower the technological acumen needed for creating content. future directions the next step for this project is to leverage the ability of a campus portal to create a myuvm homepage library channel that customizes the display of content, based on unique user characteristics. when the user logs in, they are routed to the portal’s landing page, which is dynamically created based upon their student or faculty status, enrollment in a college or school, level of study (graduate/undergraduate), or number of years attending the university of vermont. this page has the ability to conform to the user in even more granular ways and dynamically display content based upon their major or other demographic categories such as study abroad status, veteran status, or first-year status. by leveraging the portal’s ability to display user-specific content, the university of vermont libraries have the ability to customize instructional content tailored to a user’s information needs and place that content in a channel that will display alongside other channels on the myuvm homepage.
a first-year history major’s library channel could contain tutorials on working with primary sources, a link to their liaison librarian, links to digitized newspaper collections, and help guides for chicago citation style. a graduate student in nursing might see information about evidence-based practices for developing a clinical question, help guides for using pubmed and cinahl, and resources for point-of-care. a faculty member in psychology might find tutorials for creating alerts in their favorite journals, information about copyright and reserves material, or information about citation-management software. in each case, the portal pushes resources and assistance to each user that best fits their specific need, as informed by the librarians best equipped to address that need. this last step of placing dynamic content on the myuvm homepage will require a great deal of coordination with liaison librarians both to identify the most pertinent disciplinary information to place in the portal and to identify the times of year when certain information is most relevant. to keep portal content dynamic and pertinent to users, a system will need to be created for releasing and removing content on a regular basis, and this scheduling of content will require the input of liaison librarians. the educational services working group will need to manage this scheduling, as well as the enforcement of portal design conventions in coordination with the myuvm developers. although this management may end up being complex, it is not insurmountable, and our next steps will be both to create a system for content creation and management and to begin to create test content for a sample of user groups. we also plan to gather more data and expand our analytics capabilities to assess how users are using content on the myuvm “library” page and examine which features are most popular, how much traffic is being driven back to our websites, and how users are interacting with the features on the page. conclusion our project has confirmed our initial inclination that students go to myuvm as a finding tool for inter-campus resources. also, faculty have reported accessing library resources through the portal and directing their students to that pathway as well. the immediate high use and consistency of use indicate that we have placed our selected libraries’ resources in a high-traffic venue. instead of attempting to coax students to our web outpost in the wilds of the internet, we have placed an exit ramp from a highway they already travel. this has proven overwhelmingly effective and confirms, on our campus at least, the literature from the mid-2000s pointing out the opportunity created for libraries by campuses’ institutional adoption of portal systems. in all, the project has been a worthwhile venture for the university of vermont libraries. we have observed immediate use and better-than-expected levels of traffic, as well as continued use throughout the semester. it appears that once students wear a path to resources in myuvm, they are continuing to use that path as a way to access library content. we look forward to further customizing that content in the near future.
acknowledgements we gratefully acknowledge david alles, portal developer, and naima dennis, senior assistant registrar for technology, in the university of vermont office of the registrar, for their contributions to the design and development of this project. endnotes 1 scott garrison, anne prestamo, and juan carlos rodriguez, “putting library discovery where users are,” in planning and implementing resource discovery tools in academic libraries, ed. mary pagliero popp and diane dallis (hershey, pa: information science reference, 2012), 391, https://doi.org/10.4018/978-1-4666-1821-3.ch022; bruce stoffel and jim cunningham, “library participation in campus web portals: an initial survey,” reference services review 33, no. 2 (june 1, 2005): 145-46, https://doi.org/10.1108/00907320510597354. 2 mark carden, “library portals and enterprise portals: why libraries need to be at the centre of enterprise portal projects,” information services & use 24, no. 4 (2004): 172–73, https://doi.org/10.3233/isu-2004-24402. 3 ilka datig, “walking in your users’ shoes: an introduction to user experience research as a tool for developing user-centered libraries,” college & undergraduate libraries 22, nos. 3–4 (2015): 235–37, https://doi.org/10.1080/10691316.2015.1060143; steven j. bell, “staying true to the core: designing the future academic library experience,” portal: libraries and the academy 14, no. 3 (2014): 369–82, https://doi.org/10.1353/pla.2014.0021. 4 stoffel and cunningham, “library participation in campus web portals,” 145-46. 5 tim mcgeary, “mylibrary: the library’s response to the campus portal,” online information review 29, no. 4 (2005): 365–73, https://doi.org/10.1108/14684520510617811; garrison, prestamo, and rodriguez, “putting library discovery where users are,” 393-94. pal: toward a recommendation system for manuscripts scott ziegler and richard shrake information technology and libraries | september 2018 scott ziegler (sziegler1@lsu.edu) is head of digital programs and services, louisiana state university libraries. prior to this position, ziegler was the head of digital scholarship and technology, american philosophical society. richard shrake (shraker13@gmail.com) is a library technology consultant based in burlington, vermont. abstract book-recommendation systems are increasingly common, from amazon to public library interfaces. however, for archives and special collections, such automated assistance has been rare. this is partly due to the complexity of descriptions (finding aids describing whole collections) and partly due to the complexity of the collections themselves (what is this collection about and how is it related to another collection?). the american philosophical society library is using circulation data collected through the collection-management software package, aeon, to automate recommendations.
in our system, which we’re calling pal (people also liked), recommendations are offered in two ways: based on interests (“you’re interested in x, other people interested in x looked at these collections”) and on specific requests (“you’ve looked at y, other people who looked at y also looked at these collections”). this article will discuss the development of pal and plans for the system. we will also discuss ongoing concerns and issues, how patron privacy is protected, and the possibility of generalizing beyond any specific software solution. introduction the american philosophical society library (aps) is an independent research library in philadelphia. founded in 1743, the library houses a wide variety of material in early american history, history of science, and native american linguistics. the majority of the library’s holdings are manuscripts, with a large amount of audio material, maps, and graphics, nearly all of which are described in finding aids created using encoded archival description (ead) standards. like similar institutions, the aps has long struggled to find new ways to help library users discover material relevant to their research. in addition to traditional in-person, email, and phone reference, the aps has spent years creating search and browse interfaces, subject guides, and web exhibitions to promote the collections.1 as part of these ongoing efforts to connect users with collections, the aps is working on an automated recommendation system to reuse circulation data gathered through aeon. developed by atlas systems, aeon is a “request and workflow management software specifically designed for special collections libraries and archives,” and it enables the aps to gather statistics on both the use of our manuscript collections and on aspects of the library’s users.2 the automated recommendation system, which we’re calling pal, for “people also liked,” is an ongoing effort. this article presents a snapshot of current work. literature review the benefits of recommendations in library opacs have long been recognized. writing in 2008 about the library recommendation system bibtip, itself started in the early 2000s, mönnich and spiering observe that “library services are well suited for the adoption of recommendation systems, especially services that support the user in search of literature in the catalog.” by 2011 oclc research and the information school at the university of sheffield began exploring a recommendation system for oclc’s worldcat.3 recommendations for library opacs commonly fall into one of two categories, content-based or collaborative filtering. content-based recommendations pair specific users with library items based on the metadata of the item and what is known about the user. for example, if a user indicates in some way that they enjoy mystery novels, items identified as mystery novels might be recommended to them. collaborative filtering combines users in some way and creates recommendations for one user based on the preferences of another user. there can be a dark side to recommendations. the algorithms that determine which users are similar and thus which recommendations to make are not often understood.
writing about algorithms in library discovery systems broadly, reidsma points out that “in librarianship over the past few decades, the profession has had to grapple with the perception that computers are better at finding relevant information than people.”4 the algorithms that are doing the finding, however, often carry the same hidden biases that their programmers have. reidsma encourages a broader understanding of algorithms in general and deeper understanding of recommendation algorithms in particular. the history of recommendation systems in libraries has informed the ongoing development of pal. we use both the content-based and the collaborative filtering approach to offering recommendations to users. for the purposes of communicating them to nontechnical patrons, we refer to them as “interest-based” and “request-based,” respectively. furthermore, we are cautious about the role algorithms play in determining which recommendations users see. our help text reinforces the continued importance of working directly with in-house experts, and we promote pal as one tool among the many offered by the library. we are not aware of any literature on the development of recommendation tools for archives or special-collections libraries. the nature of the material held in these institutions presents special challenges. for example, unlike book collections, many manuscript and archival collections are described in aggregate: one description might refer to many letters. these issues are discussed in detail below. putting data to use: recommendations based on interests and requests the use of aeon allows the aps to gather and store data, including both data that users supply through the registration form and data concerning which collections are requested. pal uses both types of data to create recommendations. interest-based recommendations the first type of recommendation uses self-identified research interest data that researchers supply when creating an aeon account. when registering, a user has the option to select from a list of sixty-four topics grouped into seven broad categories (figure 1). the aps selected these interests based on suggestions from researchers as well as categories common in the field of academic history. upon signing in, a registered user sees a list of links (figure 2); each link leads to a full-page view of collection recommendations (figure 3). these recommendations follow the model, “you’re interested in x, other people interested in x looked at these collections.” request-based recommendations using the circulation data that aeon collects, we are able to automate recommendations in pal based on request information. upon clicking a request link in a finding aid, the user is presented with a list of recommendations on the sidebar in aeon (figure 4). each link opens the finding aid for the collection listed. figure 1. list of interests a user sees when registering for the first time. a user can also revisit this list to modify their choices at any point by following links through the aeon interface. the selected interests generate recommendations. figure 2. list of links appearing on the right-hand sidebar, based on interests that users select. figure 3.
recommended collections, based on interest, showing collection name (with a link to finding aid), call number, number of requests, and number of users who have requested from the collections. the user sees this list after clicking an option from the sidebar, as shown in figure 2. figure 4. request-based recommendation links appearing on the right-hand sidebar after a patron requests an item from a finding aid. the process currently, the data that drives these two functions is obtained from a semidynamic process via daily, automated sql query exports. usernames are employed to tie together requests and interests but are subsequently purged from the data before the results are presented to users and staff. this section explains the process in detail and presents code snippets where available. all code is available on github.5 interest-based recommendations for interest-based recommendations, we employ two queries. the first query pulls every collection requested by a user for each topic for which that user has expressed an interest. the second aggregates the data for every user in the system. the following queries get data from the microsoft sql database, via a microsoft access intermediary, that aeon uses to store data. because of the number of interest options in the registration form, and the character length of some of them (“early america colonial history,” for example), we encode the interests in shortened form. “early america colonial history” becomes “ea-colhist” so as not to run into character limits in the database. this section explores each of these queries in more detail and provides example code. the first query gathers research topics for all users who are not staff (user status is ‘researcher’), and where at least one research topic is chosen (‘researchtopics’ is not null). the data is exported into an xml file that we call “aeonmssreg.”

select aeondata.dbo.users.researchtopics,
    aeondata.dbo.transactions.callnumber,
    aeondata.dbo.transactions.location
from aeondata.dbo.transactions
inner join aeondata.dbo.users
    on (aeondata.dbo.users.username = aeondata.dbo.transactions.username)
    and (aeondata.dbo.transactions.username = aeondata.dbo.users.username)
where (((aeondata.dbo.users.researchtopics) is not null)
    and ((aeondata.dbo.transactions.callnumber) like 'mss%'
        or (aeondata.dbo.transactions.callnumber) like 'aps.%')
    and ((aeondata.dbo.users.status)='researcher'))
for xml raw ('aeonmssreq'), root ('dataroot'), elements;

the second query combines all data for all users and exports an xml file ‘aeonmssusers.’

select distinct aeondata.dbo.users.researchtopics,
    aeondata.dbo.transactions.callnumber,
    aeondata.dbo.transactions.location,
    aeondata.dbo.transactions.username
from aeondata.dbo.transactions
inner join aeondata.dbo.users
    on (aeondata.dbo.users.username = aeondata.dbo.transactions.username)
    and (aeondata.dbo.transactions.username = aeondata.dbo.users.username)
where (((aeondata.dbo.users.researchtopics) is not null)
    and ((aeondata.dbo.transactions.callnumber) like 'mss%'
        or (aeondata.dbo.transactions.callnumber) like 'aps.%')
    and ((aeondata.dbo.users.status)='researcher'))
for xml raw ('aeonmssusers'), root ('dataroot'), elements;

each query produces an xml file. these files are parsed using xsl stylesheets into subsets for each research interest.
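for readers who do not work in xslt, the per-interest grouping can be sketched in python. this is an illustration only, not the stylesheets the library actually runs; it assumes the row and element names implied by the for xml clause above (dataroot, aeonmssreq, researchtopics, callnumber) and a comma-separated researchtopics value.

# illustrative python sketch: group requested call numbers by research topic
# using the interest-based xml export described above.
import xml.etree.ElementTree as ET
from collections import defaultdict

tree = ET.parse("aeonmssreq.xml")            # assumed file name for the export
by_topic = defaultdict(list)

for row in tree.getroot().findall("aeonmssreq"):
    topics = (row.findtext("researchtopics") or "").split(",")
    callnumber = row.findtext("callnumber")
    for topic in (t.strip() for t in topics):
        if topic and callnumber:
            by_topic[topic].append(callnumber)

# by_topic now maps each interest to the call numbers requested by users
# who expressed that interest, ready to be counted per collection.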
the stylesheets also generate counts of users requesting a collection and number of total requests for a collection by users sharing an interest. an example is the stylesheet for the topic “early america colonial history,” which pulls from the xml file “aeonmssreg.” this process is repeated for each interest. the data from the query that we modify with xslt is presented as html that we insert into aeon templates. this html includes the collection name (linked to finding aid), call number, number of requests, and number of users in a table. see figure 3 for how this appears to the user. the xsl is wrapped in html that presents the results as a table.

the introductory text in that html reads, “the collections most frequently requested from researchers who expressed an interest in [the selected topic] are listed below with links to each collection’s finding aid and the number of times each collection has been requested,” and the table’s column headings are collection, call number, # of requests, and # of users.
to ensure a user only sees the links that match the interests they have selected, we use javascript to determine the expressed interests of the current user and display the corresponding links to the html pages in a sidebar. this approach works well, but we must account for two quirks. the first is that many interests in the database do not conform to the current list of options because many users predate our current registration form and wrote in free-form interests. secondly, aeon stores the research information as an array rather than in a separate table, so we must account for the fact that the aeon database contains an array of values that includes both controlled and uncontrolled vocabulary. first, we set the array as a variable so we can look for a value that matches our controlled vocabulary and separate the array into individual values for manipulation:

// use var message to check for presence of controlled list of topics
var message = "<#user field='researchtopics'>";
// use var values to separate topics that are collected in one string
var values = "<#user field='researchtopics'>".split(",");

we also create variables to generate the html entries and outbound links once we have extracted our research topics:

// open, middle, and close hold the html fragments used to assemble each link
var open = ""

next we set a conditional to determine if one of our controlled vocabulary terms appears in the array:

// determine if user has an interest topic from the controlled list
if ((message.indexOf("ea-colhis") > -1) || (message.indexOf("ea-amrev") > -1) || (message.indexOf("ea-earlynat") > -1) || (message.indexOf("ea-antebellum") > -1) || …

if the array contains a value from our controlled vocabulary, we generate a link and translate our internal code back into a human-friendly research topic (“ea-colhist,” for example, becomes once again “early american colonial history”):

for (var i = 0; i < values.length; ++i) {
    if (values[i]=="ea-colhis"){
        document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america-colonial history" + close);}
    else if (values[i]=="ea-amrev"){
        document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america american revolution" + close);}
    else if (values[i]=="ea-earlynat"){
        document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america early national" + close);}
    else if (values[i]=="ea-antebellum"){
        document.getElementById("topic").innerHTML += (open + values[i] + middle + "early america antebellum" + close);}
    …

see figure 2 for how this appears to the user. users only see the links that correspond to their stated interest. if the array does not contain a value from our controlled vocabulary, we display the research-topic interests associated with the user account, note that we don’t currently have a recommendation, and provide a link to update the research topics for the account.

else {document.getElementById("notopic").innerHTML = "you expressed interest in: <#user field='researchtopics'> we are unable to provide a specific collection recommendation for you. please visit our user profile page to select from our list of research topics."}

request-based recommendations in addition to interest-based recommendations, pal supplies recommendations based on past requests a user has made. this section details how these recommendations are generated. aeon allows users to request materials directly from a finding aid (see figure 6). to generate our request-based recommendations we employ a query depicting the call number and user of every request in the system and export the results to an xml file called “aeonlikecollections.”

select subquery.callnumber,
    subquery.username,
    iif(right(subquery.trimlocation,1)='.',left(subquery.trimlocation,len(subquery.trimlocation)-1),subquery.trimlocation) as finallocation
from (
    select distinct aeondata.dbo.transactions.callnumber,
        aeondata.dbo.transactions.username,
        iif(charindex(':',[location])>0,left([location],charindex(':',[location])-1),[location]) as trimlocation
    from aeondata.dbo.transactions
    inner join aeondata.dbo.users
        on (aeondata.dbo.users.username = aeondata.dbo.transactions.username)
        and (aeondata.dbo.transactions.username = aeondata.dbo.users.username)
    where (((aeondata.dbo.transactions.callnumber) like 'mss%'
            or (aeondata.dbo.transactions.callnumber) like 'aps.%')
        and ((aeondata.dbo.transactions.location) is not null)
        and ((aeondata.dbo.users.status)='researcher'))) subquery
order by subquery.callnumber
for xml raw ('aeonlikecollections'), root ('dataroot'), elements;

we then process the “aeonlikecollections” file through a series of xslt stylesheets, creating lists of every other collection that every user of the current collection has requested. first the stylesheets remove collections that have only been requested once. then we count the number of times each collection has been requested: we sort on the collection name and username and then re-sort to combine groups of requested collections with users who have requested each collection. we then create a new xml file that is organized by our collection groupings. the following values are from a populated xml file generated by the xslt stylesheets.

mss.497.3.b63c mss.497.3.b63c american council of learned societies … 94
mss.ms.coll.200 mss.ms.coll.200 miscellaneous manuscripts collection … 92

we use javascript to determine the call number of the user’s current request and display the list of other collections that users who have requested the current collection have also requested. see figure 4 for how these links appear to the user. all of the exports and processing are handled automatically through a daily scheduled task. usernames are the only personally identifiable data contained in these processes; they are used for counting purposes, but they are removed from the final products through the xslt processing on an internal administrative server, are never stored in the aeon web directory, and are never available for other library users or staff to see. potential pitfalls and what to do about them pal allows us to see new things about our users, and we hope that our users are able to see new collections in the library. however, there are potential pitfalls to the way we’ve been working on this project.
we’re calling the two biggest pitfalls the “bias toward well-described collections” and the “problem of aboutness.” the bias toward well-described collections the bias toward well-described collections is best understood by examining how the aps integrates aeon into our finding aids. we offer request links at every available level of description: collection, series, folder, and item. if a patron spends all day in our reading room and looks at the entirety of an item-level collection, they could have made between twenty and one hundred individual requests from that collection. for our statistics, each request will be counted as that collection being used. figure 6 shows a collection described at the item level; each item can be individually requested, giving the impression that this collection is very heavily used even if it is only one patron doing all the requesting. figure 6. finding aid of collection described at the item level. a patron making their way through this collection could make as many as one hundred individual requests. for collections described at the collection level, however, the patron has only one link to click to see the entire collection. to pal, it looks like that collection was only used once, as shown in figure 7. a patron sitting all day in our reading room looking at a collection with little description might use the collection more heavily than a patron clicking select items in a well-described collection. however, when we review the numbers, all we see is that the well-described collections get more clicks. figure 7. screenshot of finding aid with only collection-level description. this collection has only one request link, the “special request” link at the top right. a patron looking through the entirety of this collection will only log a single request from the point of view of our statistics. the problem of aboutness when we speak of the problem of aboutness, we draw attention to the fact that manuscript collections can be about many different things. one researcher might come to a collection for one reason, another researcher for another reason. a good example at the aps library is the william parker foulke papers.6 this collection contains approximately three thousand items and represents a wide variety of the interests of the eponymous mr. foulke. he discovered the first full dinosaur skeleton, promoted prison reform, worked toward abolition, and championed arctic exploration. a patron looking at this collection could be interested in any of these topics, or others. pal, however, isn’t able to account for these nuances. if a researcher interested in prison reform requests items from the foulke papers, they’ll see the same suggestion as a researcher who came to the collection for arctic exploration. what to do about this identifying these pitfalls is a good first step to avoiding them, but it’s only a first step. there are technical solutions, and we’ll continue to explore them. for example, the bias toward well-described collections is mitigated by showing both the number of requests and the number of users who have requested from a collection (see figure 3). we hope that by presenting both numbers, we move a little toward overcoming this bias. however, we’re also interested in the nontechnical approaches to these issues.
as mentioned in the introduction, the aps relies heavily on traditional reference service, both remote and in-house. nontechnical solutions acknowledge the shortcomings of any constructed solution and inject a healthy amount of humility into our work. additionally, the subject guides, search tools, and web exhibitions all form an ecosystem of discovery and access to supplement pal. future steps using data outside of aeon we have begun exploring options for using the recommendation data outside of aeon. one early prototype surfaces a link in our primary search interface. for example, searching for the william parker foulke papers shows a link to what people who requested from this collection also looked at. see figures 8 and 9. generalizing for other repositories there are ways to integrate the use of aeon with ead finding aids. the systems that the aps has developed to collect data for automated recommendations take advantage of our infrastructure. we’d like for other repositories to be able to use pal. it is our hope that an institution using aeon in a different way will help us generalize this system. generalizing beyond aeon pal is currently configured to pull data out of the microsoft sql database used by aeon. however, all the manipulation is done outside of aeon and is therefore generalizable to data collected in other ways. because archives and special collections have long held statistics in different types of systems, we hope to be able to generalize beyond the aeon use case if there is any interest in this from other repositories. integrating pal into aeon conversations with atlas staff about pal have been positive, and there is interest in building many of the features into future releases of aeon. as of this writing, an open uservoice forum topic is taking votes and comments about this integration.7 figure 8. a link in the search results that leads to recommendations based on finding aid search. clicking on the link “pal recommendations: patrons who used henry howard houston, ii papers also used these collections” will open an html page with a list of links to finding aids. figure 9. html link of recommended finding aids based on search. conclusion the aps is trying to add to the already robust options for users to find relevant manuscript collections. in addition to traditional reference, web exhibitions, and online search and browse tools, we have started reusing circulation data and self-identified user interests to automate recommendations. this new system fits within the ecosystem of tools we already supply. this is a snapshot of where the pal recommendation project is as of this writing, and we hope to work with other special collections libraries and archives to continue to grow the tool. if you are interested, we hope you reach out. endnotes 1 “subject guides and bibliographies,” american philosophical society, accessed february 27, 2018, https://amphilsoc.org/library/guides; “exhibitions,” american philosophical society, accessed february 27, 2018, https://amphilsoc.org/library/exhibit; “galleries,” american philosophical society, accessed february 27, 2018, https://diglib.amphilsoc.org/galleries. 2 “aeon,” atlas systems, accessed february 27, 2018, https://www.atlas-sys.com/aeon/.
3 michael mönnich and marcus spiering, “adding value to the library catalog by implementing a recommendation system,” d-lib magazine 14, no. 5/6 (2008), https://doi.org/10.1045/may2008-monnich. 4 matthew reidsma, “algorithmic bias in library discovery systems,” matthew reidsma (blog), march 11, 2016, https://matthew.reidsrow.com/articles/173. 5 “americanphilosophicalsociety/pal,” american philosophical society, last modified september 11, 2017, https://github.com/americanphilosophicalsociety/pal. 6 “william parker foulke papers, 1840–1865,” american philosophical society, accessed february 27, 2018, https://search.amphilsoc.org/collections/view?docid=ead/mss.b.f826-ead.xml. 7 “recommendation system to suggest items to researchers based on users with the same research topic,” atlas systems, accessed february 27, 2018, https://uservoice.atlas-sys.com/forums/568075-aeon-ideas/suggestions/18893335-recommendation-system-to-suggest-items-to-research. using machine learning and natural language processing to analyze library chat reference transcripts yongming wang information technology and libraries | september 2022 https://doi.org/10.6017/ital.v41i3.14967 yongming wang (wangyo@tcnj.edu) is systems librarian, the college of new jersey. © 2022. abstract the use of artificial intelligence and machine learning has rapidly become a standard technology across all industries and businesses for gaining insight and predicting the future. in recent years, the library community has begun looking at ways to improve library services by applying ai and machine learning techniques to library data. chat reference in libraries generates a large amount of data in the form of transcripts. this study uses machine learning and natural language processing methods to analyze one academic library’s chat transcripts over a period of eight years. the machine learning model built for this study tries to classify chat questions into two categories: reference and nonreference questions.
the purpose is for the model to predict the category of future questions so that incoming questions can be channeled to the appropriate library departments or staff. introduction since the beginning of this century, artificial intelligence (ai) and machine learning (ml) have been used in almost all industries and businesses to gain knowledge and insights and predict the future. the large amount of data available has helped to accelerate the application of ai and ml at stunning speed. to follow this technology trend, the library community has begun looking at ways to improve library services by applying ai and ml techniques to library data. stanford university library is one of the pioneers in the research and application of ml and ai in the library. the mission of its library ai initiative states: “the library ai initiative is a program to identify, design, and enact applications of artificial intelligence that will help us make our rich collections more easily discoverable, accessible, and analyzable.”1 in 2019, stanford university library hosted the second international conference on ai for libraries, archives, and museums, titled fantastic futures.2 many academic libraries have implemented chat reference services as a way to support student learning and academic research on campus. chat reference serves as an important channel to connect the library’s resources and services to the campus community.3 the college of new jersey library is a midsize academic library that serves a campus with 7,000 college students, most of them undergraduates. the library began to use springshare’s libchat in 2014. the chat service is freely accessible online from the library’s website, and anyone can initiate a chat by asking an initial question through the chat box. approximately 8,000 chat transactions have been accumulated over the past eight years. this study aims to use machine learning and natural language processing (nlp) techniques to build a classification model to categorize all available questions into two categories: reference and nonreference. by doing so, we hope that the model can automatically classify future chat questions received into either the reference question category or the nonreference question category, and channel the question to the appropriate library department or staff. literature review traditionally, the analysis of chat transcripts has used qualitative or simple quantitative methods (e.g., chat frequency, duration). to better understand chat service quality and patrons’ information needs, librarians must manually review and read through chat transcripts, which requires a lot of time and effort.4 in recent years, however, the library field has started to witness the application of ai and ml techniques to analyze library data, including chat transcripts, in order to quickly and efficiently gain more insight into user information needs and information seeking patterns. megan ozeran and piper martin used topic modeling, an ml method, to analyze library chat reference conversations.
the purpose of their project was to identify the most popular topics asked by library patrons in order to improve the chat reference service and to train the library staff.5 the brigham young university library implemented a machine learning–based tool to perform various text analyses on transcripts of chat reference to gauge patron satisfaction levels and to classify patrons’ questions into several categories.6 jeremy walker and jason coleman used ml and nlp techniques to build models that predict the relative difficulty of incoming chat reference questions. they tested their large sample of chat transcripts on hundreds of models. their aim was to help library professionals and management improve chat reference services in the library.7 another ml topic modeling project was carried out by hyunseung koh and mark fienup. their study applied plsa (probabilistic latent semantic analysis) to library chat data over a period of four years, resulting in more accurate and interpretable topics and subjects compared with results by human qualitative evaluation.8 another interesting ml project on chat reference data was conducted by ellie kohler. this project used a machine learning model to analyze chat transcripts for sentiment and topic extraction.9 in addition to library chat data, ml has also been used to analyze other library data, including library digital collections and library tweet data. jeremiah flannery applied nlp summarization techniques to a special library digital collection of catholic pamphlets. this project tried to automatically generate a summary for each digitized pamphlet by using nlp’s bert extractive technique and the gensim python package.10 sultan m. al-daihani and alan abrahams conducted a text mining analysis of academic libraries’ tweets. they used a tool called pamtat, developed by the pamplin college of business at virginia polytechnic institute and state university. pamtat is a microsoft excel–based interface to the nlp nltk package written in python. the purpose of their analysis was to try to identify the most common topics or subject keywords of the tweets by 10 large academic libraries. in addition, they also ran the harvard general inquirer for semantic and sentiment analysis of the tweets.11 other applications of ml techniques in the academic library include analyzing library operations such as acquisition. in 2019, kevin w. walker and zhehan jiang from the university of alabama used a machine learning method called adaptive boosting (adaboost) to predict demand-driven acquisition (dda).12 carlos g. figuerola, francisco javier garcia marco, and maria pinto used the topic modeling technique, specifically latent dirichlet allocation, to identify the main topics and categories of the 92,705 publications in the domain of library and information science from 1978 to 2014.13 pair (projects in artificial intelligence registry) is a repository and online global directory of ai projects in higher education. it is maintained by the university of oklahoma libraries. the aim of pair is to foster cross-institutional collaboration and to support grant activity in the field of artificial intelligence and machine learning in higher education.14 public libraries have started to seriously look at the application and impact of ai in the library.
frisco public library in texas has developed a series of applications and programs to help train library staff in ai. they also developed artificial intelligence maker kits, including the google aiy voice kit, for circulation. they even provide introductory python lessons to the public.15 background of nlp and ml natural language processing is a multidisciplinary field that involves linguistics, computer science, and machine learning. by using computer algorithms, nlp tries to build a machine learning model that is applied to large amounts of data in order to make predictions or decisions. the data in nlp is natural language data, that is, data in plain and unstructured textual form in any language. there are many types of applications of nlp and ml in business and people’s daily lives. especially with the popularity of the internet, there is a tremendous increase in and accumulation of textual data, such as social media networks and customer online chat services. major applications of nlp include sentiment analysis on social media data, topic modeling in digital humanities, text classification, speech recognition, and search box autocorrect and autocompletion. the use cases are countless. in general, there are two types of ml: supervised learning and unsupervised learning. in supervised learning, the dataset fed to the model is labelled in advance to classify data or predict outcomes accurately, whereas unsupervised learning is a type of algorithm that learns patterns from unlabeled or untagged data. no matter which type, all ml and nlp techniques involve a series of general steps in any project, also called the ml/nlp pipeline.
1. data collection, which involves obtaining the raw textual data and usually means downloading data from some remote server or service.
2. data preprocessing, which is necessary for any project, large or small, because the raw textual data is unstructured data and is not ready to be fed to the model for computing processes. data preprocessing usually includes removing punctuation, changing all letters to lowercase, tokenization, removing stop words, and stemming or lemmatization.
3. feature engineering, which is optional but often very useful.
4. text vectorization, which is the final step before feeding the data to the model. the purpose is to transform the text into some kind of value in numbers.
5. model building, evaluation, and optimization, which involves multiple cycles until the optimal or desired results are achieved.
6. implementation, which is the final step in implementing the model to the real world.
methodology for this ml/nlp project, the raw data came from the chat transcripts repository downloaded from springshare’s server. from 2014 to 2021, a total of 8,000 chat reference transactions were logged. these transactions formed the raw dataset for model building and testing in this project. because of the nature of the data, i.e., textual data, python was chosen for this project. the two major python packages used in the project are nltk and scikit-learn. nltk (natural language toolkit) is a suite of libraries and programs for natural language processing of the english language. nltk supports classification, tokenization, stemming, tagging, parsing, and semantic reasoning functionalities. scikit-learn is a python module built on numpy, scipy, and matplotlib. featuring various classification, regression, and clustering algorithms, including support-vector machines, random forest, gradient boosting, k-means, and dbscan, scikit-learn is a simple and efficient tool for predictive data analysis and one of the most popular python modules for any ml project.
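as a rough illustration of how these pieces fit together, the sketch below trains a simple reference/nonreference classifier along the lines of the pipeline above. it is a minimal example, not the code used in this study: the file name and column names (chat_questions.xlsx, question, label) are assumptions, and tf-idf features with logistic regression are only one reasonable configuration among many.

# minimal sketch of the pipeline: load labeled questions, preprocess the text,
# vectorize it, then train and evaluate a binary classifier.
# file name, column names, and model choice are illustrative assumptions.
import re
import pandas as pd
from nltk.corpus import stopwords          # requires nltk.download("stopwords")
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

data = pd.read_excel("chat_questions.xlsx")        # columns: question, label (yes/no)
stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    # lowercase, keep only letters, drop stop words, and stem each token
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in stop_words)

x = data["question"].astype(str).map(preprocess)
y = data["label"]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer()
x_train_vec = vectorizer.fit_transform(x_train)
x_test_vec = vectorizer.transform(x_test)

model = LogisticRegression(max_iter=1000)
model.fit(x_train_vec, y_train)
print(classification_report(y_test, model.predict(x_test_vec)))

other vectorizers or classifiers can be swapped in without changing this overall structure.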
scikit-learn features various classification, regression, and clustering algorithms, including support-vector machines, random forests, gradient boosting, k-means, and dbscan, and is a simple and efficient tool for predictive data analysis. it is one of the most popular python modules for any ml project. data collection data collection includes both data gathering and data preparation. data gathering is the process of downloading the 8,000 initial questions into an excel file. data preparation deals with the initial data cleanup, such as removing blank rows. the most important task of data preparation is data labeling. because this is a supervised-learning ml project, all questions must be labeled by hand as either a reference question (label=yes) or a nonreference question (label=no). then all labeled questions (the dataset) are fed to the ml model for either training or testing purposes. see table 1 for an example of data after the preparation step.

table 1. sample questions with yes or no labels

question sequential number | label | question
3979 | yes | working on an alumni reunion presentation. i need to know …
3980 | yes | would a book with this call number: ds559.8.d7 g68 1991 …
3981 | no | would a rutgers student be able to take out a textbook from …
3982 | yes | would i be able to find mathematics textbooks by pearson on …
3983 | no | would i be able to log in to find an article if i am an alumni of …
3984 | yes | would it be possible to help me find a online essay?
3985 | no | would like to renew: huguenots [videorecording] / music by …
3986 | yes | would like to request for a course description catalog from fall …
3987 | no | would someone be able to ask room 414 to quiet down please?
3988 | no | would someone be able to come up to floor 3 and tell people to …

data preprocessing data preprocessing is the first programming step in the pipeline of this ml/nlp project. data preprocessing transforms the raw data into a more digestible form so that the ml model can perform better and achieve the desired results. one of the purposes of data preprocessing is to remove insignificant and nonmeaningful words such as “a,” “the,” “and,” etc., as well as punctuation, from the textual data. removing nonmeaningful and stop words from the corpus allows the ml model to produce better results because it then deals only with significant and meaningful words. it is also necessary to apply lowercase formatting to all letters. while we as humans know that lowercase and uppercase words have the same meaning, the computer will treat them as having different meanings. for example, “Cat” and “cat” are two different words to the computer. tokenization involves splitting each sentence into a list of individual words, typically by breaking the text at the spaces between words. the last step of data preprocessing is stemming or lemmatizing, which finds the common semantic root of a group of related words. in other words, this process explicitly correlates words with similar meanings. for instance, run, running, and runner will become “run”; library and libraries will become “librari”; goose and geese will become “goose.” feature engineering involves creating a new feature or transforming an existing feature. the purpose of feature engineering is to help the model make better predictions. this step is optional but often very helpful if done right. in this project, a new feature called “question length” was created.
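as an illustration of the preprocessing and feature engineering steps just described, a hedged python sketch using nltk might look like the following; the column names and the choice of the porter stemmer are assumptions, not necessarily the choices made in this project.

import string
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(question):
    # remove punctuation and change all letters to lowercase
    text = "".join(ch for ch in question if ch not in string.punctuation).lower()
    # tokenization: split the sentence into individual words
    tokens = text.split()
    # remove stop words such as "a", "the", and "and"
    tokens = [t for t in tokens if t not in stop_words]
    # stemming: "running" becomes "run", "libraries" becomes "librari"
    return [stemmer.stem(t) for t in tokens]

df = pd.read_excel("chat_questions_labeled.xlsx")   # hypothetical file name
# feature engineering: question length, computed from the original question text
df["question_len"] = df["question"].str.len()
df["question_lemma"] = df["question"].apply(preprocess)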
“question length” was based on the assumption that the average length of reference questions is longer than the average length of nonreference questions. if such is the case, the ml model will benefit by using this new feature to make better decisions. figure 1 is a histogram of question length distribution. the distribution of reference questions is represented in blue; nonreference questions are represented in yellow. figure 2 shows a sample result list following completion of data preprocessing and feature engineering. from left to right, it lists the result after each step. the question length feature (the question_len column) appears immediately after the original question because it is computed from the original question before any other steps are applied. the question_lemma column is the result after all preprocessing steps. figure 1. histogram of question length distribution. figure 2. results from data preprocessing and feature engineering. text vectorization the purpose of text vectorization is to transform the text data into numeric data so that the ml algorithms and python can understand and use that data to build a model. the basic idea is to build an n-dimensional vector of numerical features that represents some object. the three most popular text vectorizations are count vectorization, n-grams vectorization, and tf-idf vectorization. tf-idf stands for term frequency–inverse document frequency. because tf-idf weights each term by how often it appears in a question relative to how common it is across all questions, it is generally the most accurate of the three. figure 3 shows the result of tf-idf vectorization. figure 3. result of tf-idf vectorization. model building, testing, and evaluation the first step of model building is to divide the dataset into two sets, one for model training and one for model testing. normally we use 80% of the data for training and 20% for testing. after feeding the training data to the model, we feed the testing data as new data to the model to predict the yes or no label based on the pattern that the model builds through the training data. the testing data were initially labeled by humans and are 100 percent accurate. by comparing the labels predicted by the model with the human-assigned labels in the testing data, we can see how the model performs and make changes, if necessary, to the model parameters. scikit-learn contains several ml models. this project used two popular models: random forest and gradient boosting. the random forest model builds many decision trees and computes them in parallel; the final decision is made by majority vote. because the trees are built at the same time, it is efficient and fast. the gradient boosting model builds one tree at a time. each new tree helps correct errors made by previously trained trees, and the model is boosted (optimized) step by step through reward or penalty. in theory, gradient boosting should yield better results than random forest. nevertheless, it is slower and consumes more resources. the confusion matrix was used to evaluate the performance of the two models. three measures are derived from the confusion matrix: accuracy, precision, and recall.
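a condensed sketch of the vectorization, train/test split, and model comparison described above, using scikit-learn, might look like the following; parameter values are left at their defaults, and the label column is assumed to hold the yes/no values from table 1.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

# tf-idf vectorization: transform the preprocessed question text into weighted numbers
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df["question_lemma"].str.join(" "))
y = df["label"]                      # "yes" (reference) or "no" (nonreference)

# 80% of the data for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("random forest", RandomForestClassifier()),
                    ("gradient boosting", GradientBoostingClassifier())]:
    model.fit(X_train, y_train)           # train on the labeled training data
    predicted = model.predict(X_test)     # predict labels for the held-out test data
    print(name,
          "accuracy:", accuracy_score(y_test, predicted),
          "precision:", precision_score(y_test, predicted, pos_label="yes"),
          "recall:", recall_score(y_test, predicted, pos_label="yes"))

# the question_len feature could be appended to the tf-idf matrix with
# scipy.sparse.hstack; it is omitted here to keep the sketch short.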
accuracy equals true positive plus true negative, divided by the total. precision equals true positive divided by true positive plus false positive. recall equals true positive divided by true positive plus false negative. usually there is a tradeoff between precision and recall. recall reflects the rate of false negatives (a higher recall means fewer false negatives), and precision reflects the rate of false positives (a higher precision means fewer false positives). a false negative means that the model predicts a reference question as a nonreference question. a false positive means that the model predicts a nonreference question as a reference question. which is more important for the model to catch, false positives or false negatives? the answer depends on the actual situation. in our case, a false negative is more serious than a false positive because we did not want real reference questions to be predicted as nonreference questions. however, it was acceptable if nonreference questions were predicted as reference questions. therefore, we wanted as few false negatives as possible, which meant the largest recall value possible. results and analysis table 2 lists the results from both models.

table 2. results of random forest model and gradient boosting model

model | precision | recall | accuracy | fit time | predict time
random forest model | 0.914 | 0.964 | 0.912 | 2.489 s | 0.15 s
gradient boosting model | 0.904 | 0.948 | 0.894 | 97.786 s | 0.064 s

in general, any values above 0.9 (90%) are very good. looking at and comparing those results, we can see that both models performed well. nevertheless, the random forest model had better results than the gradient boosting model on all three measures. in addition, the fit time of the random forest model was much shorter than that of the gradient boosting model. even though the predict time of the random forest model is slightly longer than that of the gradient boosting model, the difference is relatively insignificant. therefore, the random forest model was chosen as the final model for this project. conclusion and future work in this pilot study, we used nlp and ml classification modeling to divide patrons’ chat questions into two categories: reference questions and nonreference questions. the purpose of the model is to predict the category of future questions received through chat so that library staff and professionals can provide faster, more efficient reference services. two machine learning models were tested: random forest and gradient boosting. after comparing results from each model, it was concluded that the random forest model showed better results. what is the next step after the model is built? a potential use of this model is to implement it as a plugin or feature enhancement for the online chat application. the model can function as a filter, directing incoming questions to reference librarians if the question is predicted to be a reference question, or to library staff or graduate student assistants if it is predicted to be a nonreference question. this will be especially useful for libraries with busy online chat services. further work can be done to extend the model to multiple categories.
for example, a multicategory model can go beyond two categories and include categories for information seeking, citation help, printing help, noise complaints, interlibrary loan questions, spams, etc. thus, the model can send the question to the relevant department or library personnel accordingly. endnotes 1 “stanford university library ai initiative,” stanford university library, https://library.stanford.edu/projects/artificial-intelligence. 2 “fantastic futures: 2nd international conference on ai for libraries, archives, and museums,” (2019), stanford university library, https://library.stanford.edu/projects/fantastic-futures. 3 christina m. desai and stephanie j. graves, “cyberspace or face-to-face: the teachable moment and changing reference mediums,” reference & user services quarterly 47, no. 3 (spring 2008): 242–55, https://www.jstor.org/stable/20864890. 4 sharon q. yang and heather a. dalal, “delivering virtual reference services on the web: an investigation into the current practice by academic libraries,” journal of academic librarianship 41, no. 1 (november 2015): 68–86, https://doi.org/10.1016/j.acalib.2014.10.003. 5 megan ozeran and piper martin, “good night, good day, good luck: applying topic modeling to chat reference transcripts,” information technology and libraries 38, no. 2 (june 2019): 49– 57, https://doi.org/10.6017/ital.v38i2.10921. 6 christopher brousseau, justin johnson, and curtis thacker, “machine learning based chat analysis,” code4lib journal, no. 50 (2021), https://journal.code4lib.org/articles/15660. 7 jeremy walker and jason coleman, “using machine learning to predict chat difficulty,” college & research libraries 82, no. 5 (2021), https://doi.org/10.5860/crl.82.5.683. 8 hyunseung koh and mark fienup, “topic modeling as a tool for analyzing library chat transcripts,” information technology and libraries 40, no. 3 (2021), https://doi.org/10.6017/ital.v40i3.13333. 9 ellie kohler, “what do your library chats say? how to analyze webchat transcripts for sentiment and topic extraction” (17th annual brick & click libraries conference, maryville, missouri: northwest missouri state university, 2017). 10 jeremiah flannery, “using nlp to generate marc summary fields for notre dame’s catholic pamphlets,” international journal of librarianship 5, no.1 (2020): 20–35, https://doi.org/10.23974/ijol.2020.vol5.1.158. 11 sultan m. al-daihani and alan abrahams, “a text mining analysis of academic libraries’ tweets,” the journal of academic librarianship 42, no. 2 (2016): 135–43, https://doi.org/10.1016/j.acalib.2015.12.014. https://library.stanford.edu/projects/artificial-intelligence https://library.stanford.edu/projects/fantastic-futures https://www.jstor.org/stable/20864890 https://doi.org/10.1016/j.acalib.2014.10.003 https://doi.org/10.6017/ital.v38i2.10921 https://journal.code4lib.org/articles/15660 https://doi.org/10.5860/crl.82.5.683 https://doi.org/10.6017/ital.v40i3.13333 https://doi.org/10.23974/ijol.2020.vol5.1.158 https://doi.org/10.1016/j.acalib.2015.12.014 information technology and libraries september 2022 using machine learning and natural language processing to analyze library chat reference transcripts wang 10 12 kevin w. walker and zhehan jiang, “application of adaptive boosting (adaboost) in demand driven acquisition (dda) prediction: a machine-learning approach,” the journal of academic librarianship 45, no. 3 (2019): 203–12, https://doi.org/10.1016/j.acalib.2019.02.013. 13 carlos g. 
figuerola, francisco javier garcia marco, and maria pinto, “mapping the evolution of library and information science (1978–2014) using topic modeling on lisa,” scientometrics 112 (2017): 1507–35, https://doi.org/10.1007/s11192-017-2432-9. 14 “projects in artificial intelligence registry (pair): a registry for ai projects in higher ed,” university of oklahoma libraries, https://pair.libraries.ou.edu/. 15 thomas finley, “the democratization of artificial intelligence: one library’s approach,” information technology and libraries 38, no. 1 (2019): 8–13, https://doi.org/10.6017/ital.v38i1.10974. https://doi.org/10.1016/j.acalib.2019.02.013 https://doi.org/10.1007/s11192-017-2432-9 https://pair.libraries.ou.edu/ https://doi.org/10.6017/ital.v38i1.10974 abstract introduction literature review background of nlp and ml methodology data collection data preprocessing text vectorization model building, testing, and evaluation results and analysis conclusion and future work endnotes seeing through ontologies editorial board thoughts seeing through vocabularies kevin ford information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.12367 kevin ford (kevinford@loc.gov) is librarian, linked data specialist in the library of congress’s network development and marc standards office. he works on the library’s bibframe initiative, and similar projects, such as mads/rdf, and is a member of the ital editorial board. the ideas and opinions expressed here are those of the author and do not necessarily reflect those of his employer. “ontologies” are popular in library land. “vocabularies” are popular too, but it seems that the library profession prefers “ontologies” over “vocabularies” when it comes to defining classes and properties that attempt to encapsulate some realm of knowledge. bibframe, mads/rdf, bibo, premis, and frbr are well-known “ontologies” in use in the library community.1 they were defined either by librarians or to be used mainly in the library space, or both. skos, foaf, dublin core, and schema are well known “vocabularies.”2 they are used widely by libraries though none were created by librarians or specifically for library use. in all cases, those ontologies and vocabularies were created for the very purpose of publication for broader use, which is one of the primary objectives behind creating one: to define a common set of metadata elements to facilitate the description and sharing of data within a group or groups of users. ontologies and vocabularies are common when working with rdf (resource description framework), a very simple data model in which information is expressed as a series of triple statements, each consisting of three parts: a subject, a predicate, and an object. the types of ontologies and vocabularies referred to here are in fact defined using rdf—thing a is a class and thing z is a property. those using any given ontology or vocabulary employ the defined classes and properties to further describe their things, for a lack of a better word. it is useful to provide an example. the first block of triples below represents class and property definitions in rdf schema (rdfs), which provides some very basic means to define classes and properties and some relationships between them, such as the domains and ranges for properties. the second block is instance data. 
ontovoc:book rdf:type rdfs:class
ontovoc:authoredby rdf:type rdf:property
ontovoc:authorof rdf:type rdf:property

ex:12345 rdf:type ontovoc:book
ex:12345 ontovoc:authoredby ex:abcde

ontovoc:book is defined as a class and ontovoc:authoredby is defined as a property. using those declarations, it is possible to then assert that ex:12345, which is an identifier, is of type ontovoc:book and was authored by ex:abcde, an identifier for the author. is the first block—the definitions—an “ontology” or a “vocabulary?” putting aside the question for now, air quotes—in this case literal quotes—have been employed around “ontologies” and “vocabularies” to suggest that these are more terms of art than technical distinctions, though it must also be acknowledged that there is a technical distinction to be made. ontologies in the rdf space frequently, if not always, use classes and properties from the web ontology language (known as owl) to define a specific realm’s classes and properties and how they relate to each other within that realm of knowledge. this is because owl is a more expressive definition language than basic rdfs. using owl, and considering the example above, ontovoc:authoredby could be defined as an inverse of ontovoc:authorof.

ontovoc:authoredby owl:inverseof ontovoc:authorof

in this way, and given the little instance data above (the two triples that begin ex:12345), it is then possible to infer the following bit of knowledge:

ex:abcde ontovoc:authorof ex:12345

now that the owl:inverseof triple/declaration has been added to the definitions, it’s worth reasking: do the definitions represent an “ontology” or a “vocabulary?” a purist might answer “not an ontology,” but only because those statements have not been combined in a document, which itself has been given a uri and declared to be an owl:ontology. that’s the actual owl class that says, “this is an owl ontology.” but let’s say those statements had been added to a document published at a uri and declared to be an owl:ontology. is it an ontology now? perhaps in a strict sense the answer is “yes.” but in a practical sense few would view those four declarations, wrapped neatly in a document that has been given a uri and called an ontology, as an “ontology.” it doesn’t quite rise to the occasion—“ontologies” almost always have a broader scope and employ more formal semantics—making its use a term of art, often, rather than a real technical distinction. yet, based on the same narrow definition (a published document declaring itself to be an owl:ontology) combined with a far more extensive set of class and property definitions with defined relationships between them, it is possible to describe foaf as an ontology.3 but it is widely known as, and understood as, a “vocabulary.” (there is also an experimental version of schema as owl.4) and that gets to the crux of the issue in many ways.
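the same example can be run in code. the following is a small, hedged python sketch using the rdflib package (not anything from this piece): the namespace uris are made up, and the owl:inverseof rule is applied by hand rather than by a full owl reasoner, which would derive the same triple.

from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

ONTOVOC = Namespace("http://example.org/ontovoc/")   # hypothetical namespace
EX = Namespace("http://example.org/ex/")             # hypothetical namespace

g = Graph()
# the definitions from the first block
g.add((ONTOVOC.book, RDF.type, RDFS.Class))
g.add((ONTOVOC.authoredby, RDF.type, RDF.Property))
g.add((ONTOVOC.authorof, RDF.type, RDF.Property))
g.add((ONTOVOC.authoredby, OWL.inverseOf, ONTOVOC.authorof))
# the instance data from the second block
g.add((EX["12345"], RDF.type, ONTOVOC.book))
g.add((EX["12345"], ONTOVOC.authoredby, EX.abcde))

# apply the owl:inverseof rule by hand: for every (s, p, o) where p has an
# inverse q, also assert (o, q, s)
inferred = []
for p, _, q in g.triples((None, OWL.inverseOf, None)):
    for s, _, o in g.triples((None, p, None)):
        inferred.append((o, q, s))
for triple in inferred:
    g.add(triple)

# the inferred knowledge: ex:abcde ontovoc:authorof ex:12345
print((EX.abcde, ONTOVOC.authorof, EX["12345"]) in g)   # True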
putting aside the technical distinction that can be argued to identify something as an “ontology” versus a “vocabulary,” there are non-technical semantics at work here—what was earlier described as a “term of art”—about when, how, and why something is deemed an “ontology” versus a “vocabulary.” the library community appears to think of their creations as “ontologies” and not “vocabularies,” even when the documentation tends to avoid the word “ontology.” for example, the opening sentence of the bibframe and mads/rdf documentation very clearly introduces each as a “vocabulary,” as does frbr in rdf.5 on the surface they may be presented as “vocabularies,” which they are of course, but despite this prominent self-declaration they are not seen in the same light as foaf or schema but instead as something more exacting, which they also are. it is worth contemplating why they are viewed principally as “ontologies” and to examine whether this has been beneficial. perhaps the ideas behind designating something a “vocabulary” are, in fact, more in line with the way libraries operate, whereas “ontologies” represent an ideal (and who doesn’t set their sights on the ideal?), striving toward which only exposes shortcomings and sows confusion. the answer to “why” is historical and probably derives from a combination of lofty thinking, traditional standards practices, and good ol’ misunderstanding. traditional standards practices favor more formal approaches. libraries’ decades-long experience with xml and xml schema information technology and libraries june 2020 seeing through vocabularies | ford 3 contributed significantly to this mindset. xml schema provides a way to describe the precise construction of an xml document and it can then be used to validate the xml document. xml schema defines what elements and attributes are permitted in the xml document and frequently dictates their order. it can further constrain the values of an element or attribute to a select list of options. in many ways, xml schema was the very expression of metadata quality control. librarians swooned. with the right controls and technology in place, it was impossible to produce poor, variable metadata. in the case of semantic modelling, owl is certainly a more formal approach. it’s founded in description logics whose expressions take the form of occult-like mathematics, at least as viewed by a librarian with a humanities background. owl can be used to declare domains and ranges for properties. one can also designate a property as a datatype property, meaning it takes a value such as a string or a date, as its value, or an object property, which means it will reference another rdf resource as its object. but these declarations are actually more about inferencing—deriving information by applying the ontology against some instance data—and not about restrictions, constraints, or validation. to be clear, there are ways to apply restrictions in owl—“wine can be either red or white”—but this is a form of advanced owl modelling that is not well understood and not often implemented, and virtually never in ontologies designed by librarians. conversely, indicating a domain for a property, for example, is easy, relatively straightforward, and seductive because it gives the appearance that the property can only be used with resources of a specific class. consider: the domain of ontovoc:authoredby is ontovoc:book. that does not mean that the ontovoc:authoredby can only be used with a ontovoc:book resource. 
it means that whatever resource uses ontovoc:authoredby must therefore be a ontovoc:book. defining that domain for that property is not restricting its use only to books; it allows one to derive the additional knowledge that the thing it is used with must be a book even if it doesn’t identify itself as one. this may seem like a subtle distinction and/or it may seem like tortured logic, but if it does it may suggest that one’s point of view, one’s mindset, favors constraints, restrictions, and validations. and that’s ok. that’s library training and conditioning, completely reinforced in our daily work. it’s what has been taught in library schools for decades and practiced by library professionals even longer. names should be entered “last name, first name” and any middle initial, if known, included. the data in this field should only be a three-character language code from this approved list of language codes. these rules and the consistency resulting from these rules are what make library data so often very high quality. google loves marc records from our community for this very reason. wishing to exert strong control at the definition level when creating a model or metadata scheme with an eye to data quality, it is a natural inclination for librarians to gravitate to a more formal means of defining a model, especially one that seems to promise constraints. so, despite these models self-describing at a high-level as vocabularies, the models themselves employ a considerable amount of owl at the technical level, which becomes the focus of any users wishing to implement the model. users comprehend these models as something more than a vocabulary and therefore view the model through this more complex lens. unfortunately, because owl is poorly understood (sometimes by creators and sometimes by users, and sometimes by both), this leads to various problems. on the one hand, creators and users believe there are technical restrictions or constraints where there are, in fact, none. when this happens, the “constraint” is information technology and libraries june 2020 seeing through vocabularies | ford 4 either identified as a problem (“consider removing the range for this property”) or—and this is more damaging—the property (read: model/vocabulary/ontology) is avoided. even when it is recognized that the “constraint” is not a real restriction (just a means to infer knowledge), forging ahead can generate new issues. when faced with a domain and range declaration, for example, forging ahead can result in inaccurate, imprecise, or simply undesirable inferences. most of the currently open “issues” (about 50 at the time of writing) about bibframe follow a basic pattern: 1) there is a declaration about this property or this class that makes it difficult to use because of how it has been defined with owl; 2) we cannot really use it presently because it would cause potential inferencing issues; 3) consider altering the owl definitions.6 pursuing an (owl) ontology, while formal and seemingly comforting because it feels a little like constraining the metadata schema, can result in confusion and a lack of adoption. given that vocabularies and ontologies are developed and published to encourage users to describe their data in a way that fosters wide consumption by others, this is unfortunate to say the least. it is notable that skos, foaf, dublin core, and schema have very different scopes and potentially much wider user bases than the more library-specific ontologies (bibframe, mads/rdf, bibo, etc.). 
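before drawing that lesson out, the domain behavior just described can be made concrete with another hedged rdflib sketch (again with made-up uris and names): declaring a domain does not block the property's use on non-books; applying the rdfs rule simply types the resource as a book, whether or not that was wanted.

from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

ONTOVOC = Namespace("http://example.org/ontovoc/")   # hypothetical namespace
EX = Namespace("http://example.org/ex/")

g = Graph()
g.add((ONTOVOC.authoredby, RDFS.domain, ONTOVOC.book))
# a resource that is *not* described as a book still uses the property
g.add((EX.article9, ONTOVOC.authoredby, EX.abcde))

# apply the rdfs domain rule by hand: anything that uses this predicate is
# typed with the predicate's declared domain
inferred = []
for p, _, cls in g.triples((None, RDFS.domain, None)):
    for s, _, _ in g.triples((None, p, None)):
        inferred.append((s, RDF.type, cls))
for triple in inferred:
    g.add(triple)

# nothing was rejected; instead the article is now inferred to be a book
print((EX.article9, RDF.type, ONTOVOC.book) in g)   # True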
there is something to be learned here: the smaller the domain, the more effective an ontology might be; the larger the universe, a more general approach may be better. it is further true that foaf, dublin core, and schema define specific domains and ranges for many of their properties, but they have strived for clarity and simplicity. the creators of schema, for example, eschewed the formal semantics behind rdfs and owl and redefine domain and range to better match their needs and (perhaps unexpectedly) most users’ automatic understanding.7 what is generally true is that each of the “vocabularies” approached the creation and defining of their models so as to minimize the use of formal semantics, and promoted this as a feature. in this way, they limited or removed altogether the actual or psychological barriers to adoption. their offering was more accessible, less fussy. bearing in mind the differences in scale and scope, they have been rewarded with a wider adopter base and passionate advocates. the decision to create a “vocabulary” or an “ontology” is a technical one and a political one, both of which must be in alignment. it’s a mindset and it is a statement. it is entirely possible to define the model at a technical level using owl, making it by definition an ontology, but to have it be perceived, and used, as a vocabulary because it is flexible and not strictly defined. likewise, it is not enough to call something a vocabulary, but in reality be a model burdened with formal semantics that is then expected to be adopted and used widely. if the objective is to fashion a (pseudo?) restrictive metadata set with rules that inform its use, and which is strongly bonded with a specific community, develop an “ontology,” but recognize that this may result in confusion and lack of uptake. if, however, the desire is to cultivate a metadata element set that is flexible, readily useable, and positioned to grow in the future because it employs fewer rules and formal semantics, create a “vocabulary.” that’s really what is being communicated when we encounter ontologies and vocabularies. interestingly, the political difference between “vocabulary” and “ontology” appears, in fact, to be understood by librarians: library models self-identify as “vocabularies.” but once past those introductory remarks, the truth is exposed quickly in the widespread use of owl, revealing beyond doubt that it is not a flexible, accommodating vocabulary but a strictly defined model. to dispense with the air quotes: as librarians we’re creating ontologies and calling them vocabularies. we really want to be creating vocabularies that are ontologies in name only. information technology and libraries june 2020 seeing through vocabularies | ford 5 endnotes 1 “bibframe ontology,” library of congress, accessed may 21, 2020, http://id.loc.gov/ontologies/bibframe.html; “mads/rdf (metadata authority description schema in rdf),” library of congress, accessed may 21, 2020, http://id.loc.gov/ontologies/madsrdf/v1.html; “bibliographic ontology specification,” the bibliographic ontology, accessed may 21, 2020, http://bibliontology.com/; “premis 3 ontology,” premis editorial committee, accessed may 21, 2020, http://id.loc.gov/ontologies/premis3.html; ian davis and richard newman, “expression of core frbr concepts in rdf,” accessed may 21, 2020, https://vocab.org/frbr/. 
2 alistair miles and sean bechhofer, editors, “skos simple knowledge organization system reference,” w3c, accessed may 21, 2020, https://www.w3.org/tr/skos-reference/; dan brickley and libby miller, “foaf vocabulary specification 0.99,” accessed may 21, 2020, http://xmlns.com/foaf/spec/; “dcmi metadata expressed in rdf schema language,” dublin core™ metadata initiative, accessed may 21, 2020, https://www.dublincore.org/schemas/rdfs/; “welcome to schema.org,” schema.org, accessed may 21, 2020, http://schema.org/. 3 “foaf ontology,” xmlns.com, accessed may 21, 2020, http://xmlns.com/foaf/spec/index.rdf. 4 see “owl” at “developers,” schema.org, accessed may 21, 2020, https://schema.org/docs/developers.html. 5 see “bibframe ontology” and “mads/rdf (metadata authority description schema in rdf)” above. 6 “issues,” bibframe ontology at github, accessed 21 may 2020, https://github.com/lcnetdev/bibframe-ontology/issues. 7 r.v. guha, dan brickley, and steve macbeth, “schema.org: evolution of structured data on the web,” acmqueue 15, no. 9 (15 december 2015): 14, https://dl.acm.org/ft_gateway.cfm?id=2857276&ftid=1652365&dwn=1. http://id.loc.gov/ontologies/bibframe.html http://id.loc.gov/ontologies/madsrdf/v1.html http://bibliontology.com/ http://id.loc.gov/ontologies/premis3.html https://vocab.org/frbr/ https://www.w3.org/tr/skos-reference/ http://xmlns.com/foaf/spec/ https://www.dublincore.org/schemas/rdfs/ http://schema.org/ http://xmlns.com/foaf/spec/index.rdf https://schema.org/docs/developers.html https://github.com/lcnetdev/bibframe-ontology/issues https://dl.acm.org/ft_gateway.cfm?id=2857276&ftid=1652365&dwn=1 endnotes 10844 20190318 galley library services navigation: improving the online user experience brian rennick information technology and libraries | march 2019 14 brian rennick (brian_rennick@byu.edu) is aul for library it, brigham young university. abstract while the discoverability of traditional information resources is often the focus of library website design, there is also a need to help users find other services such as equipment, study rooms, and programs. a recent assessment of the brigham young university library website identified nearly two hundred services. many of these service descriptions were buried deep in the site, making them difficult to locate. this article will describe a web application that was developed to improve service discovery and to help ensure the accuracy and maintainability of service information on an academic library website. introduction the brigham young university library released a new version of its website in 2014. multiple usability studies were conducted to inform the design of the new site. during these studies, the web designers observed that when a user did not see what they were looking for on the homepage, they were likely to click on the “services” link as the next best option. the word services appeared to be an effective catch-all term. web designers asked themselves, “what is a library service?” they concluded that a library service could be anything public-facing that meets the needs of a user. using this broad definition, services could include: • library materials—both digital and physical (e.g. books, dvds) • material services (e.g. course reserve, interlibrary loan) • equipment and technology (e.g. computers, cameras, tripods) • help and guidance (e.g. research assistance, computer assistance) • locations (e.g. group study rooms, classrooms, help desks) • programs (e.g. 
friends of the library, lectures) because libraries offer so many diverse services, structuring a website to effectively promote them all brings many challenges. for instance, a common approach to presenting library services on a website is to have a menu that lists a few of the most popular or important services. the last menu item will normally be a link to a web page for “other services” that provides a more comprehensive service list. such an all-inclusive listing of library services on a single web page can easily lead to information overload for users. where do services belong in a library website’s information architecture? determining the one correct path is not easy because there are multiple valid ways to organize services into web pages. services could be arranged by department, service category, user group (undergraduates, graduates, faculty, visitors, alumni), or any number of other ways. an ideal system would allow users to follow the path that makes the most sense to them. information technology and libraries | march 2019 15 user expectations for a single (google-like) search box add to the challenges for service listings.1 a single search box, also known as a metasearch system, web-scale discovery service, or federated search, combines search results from multiple library sources. a study at the university of colorado found that users expected to locate services by entering keywords into the single search box on the library’s homepage.2 for example, the users attempted to search for “interlibrary loan” and “chat with a librarian” using the single search box. it is unrealistic to expect all users to follow a specific series of links in order to find the one correct path to information about a service when they are accustomed to google-style searching. even when a user manages to locate the correct web page where a service is described, the pertinent information can still be difficult to pinpoint when service descriptions are buried in paragraphs. users need to be able to quickly perform a visual scan of a web page to locate service information. kozak and hartley suggest that “bulleted lists are easier to read, easier to search and easier to remember than continuous prose.”3 the ongoing maintenance of service listings poses another significant challenge. for large academic libraries, up-to-date service information is difficult to maintain because it is typically scattered throughout a website. each department may have its own set of web pages and service listings. department pages created and maintained by different individuals end up with inconsistent design, organization, and voice. services that are common to multiple departments will have duplicate listings with different descriptions. maintenance of accurate information becomes an issue as services change; tracking down all of the references to a discontinued or modified service requires extensive searching of the website. literature review studies and commentaries regarding the information architecture of academic library websites have been covered extensively in the literature.4 a few articles specifically address the way that library services are organized on websites. library services are a significant component of academic library website content. clyde studied one hundred library websites from thirteen countries in order to compare common features and to determine some of the purposes for a library website.5 purposes for the sites varied. 
some focused on providing information about the library and its services while others functioned more like a portal, providing links to internet resources. cohen and still developed a list of core content for academic library websites by examining pages from university and two-year college sites.6 they organized the content into categories: library information, reference, research, instruction, and functionalities. liu surveyed arl libraries to get an overview of the state of web page development.7 the subsequent spec kit identifies services commonly found on academic library websites. yang and dalal studied a random sample of academic library websites to see which web–based reference services were offered and how they were presented.8 they also examined the differing terminology used to describe the services. the choice of terminology used on library websites impacts the findability of services. dewey compared academic websites from thirteen member libraries of a consortium to determine how findable service links were on the sites.9 the service links used in the evaluation covered “access, reference, information, and user education” categories. the study measured the number of clicks from the homepage that were required to find information about a service. dewey found library services navigation | rennick 16 https://doi.org/10.6017/ital.v38i1.10844 inconsistent use of terminology used to describe library services from one site to another. dewey posited that extensive use of library jargon could, in a sense, hide links from users. the overall conclusion was that the websites contained “too much information poorly placed.” a study of an academic library website by mcgillis and toms also found that participants struggled with terminology when attempting to locate services.10 the website reflected “traditional library structures” instead of using categories that were meaningful to users. the decision on where to place library services on a website is an important step in the design process. as part of their proposal to establish a benchmarking program for academic library websites, hightower, shih, and tilghman created classifications for the web pages they studied.11 library services were assigned to the “directional” category instead of representing a separate category. vaughan described a history of changes to an academic website that took place from 1996–2000.12 an interesting change was that, after multiple redesigns, the web designers combined two categories into a single “library services” category in order to simplify top level navigation on the home page. comeaux studied thirty-seven academic library websites to see how design elements evolved between 2012 and 2015.13 a portion of the study compiled terms used as navigation labels. the term “about” was the most common navigation label followed by “services” as the second most common. use of the term “services” as a main navigation label increased in popularity from 2012 to 2015. several researchers suggest organizing library services into web pages or portals that target different audiences. gullikson et al. studied usability issues related to the information architecture of an academic website and discovered that study participants followed different paths in their attempts to locate service information on the site.14 some users found items easily while others were unsuccessful. menu labels were not universally understood. 
the researchers identified a need for multiple access points to information in order to accommodate different mental models. they suggested employing multiple information organizational schemes, such as categorizing links by function, frequency of use, and target user group. adams and cassner analyzed the websites of arl libraries to see how services for distance education students and faculty were presented.15 they recommend strategies for helping distance students navigate the website, including maintaining a web page designed specifically for distance students that avoided jargon and clearly described services. detlor and lewis envisioned academic library websites as “sophisticated guidance systems which support users across a wide spectrum of information seeking behaviors—from goal-directed search to wayward browsing.”16 they reviewed arl library websites to see which important features were present or absent. their coding methodology was adopted by gardner, juricek, and xu in their study of how library web pages can meet the needs of campus faculty.17 liu proposed a conceptual model for an improved academic library website that would be organized into portals designed for specific user groups, such as undergraduates, faculty, or visitors.18 some of the arl websites studied by the researcher already implemented portals by user group. a more recent approach for locating library services has been to include website search results when using the single search from the homepage. for example, the north carolina state libraries website includes library-wide site search results when using the single search.19 the wayne state university libraries single search displays results from a university-wide site search.20 information technology and libraries | march 2019 17 an influential report produced by andrew pace provides practical advice for designing library websites.21 in the report, pace described the library services that should be included on a site and stressed that website design affects the discoverability and delivery of these services: “whether requiring minimal maintenance or constant upkeep, the extensibility of the design and flexibility of a site’s architecture ultimately saves the library time, money, hassle, and user frustration.”22 the web application described in this article aims to achieve these goals in terms of service discoverability and website maintainability. a services web application in an effort to tackle the challenges of services navigation and maintenance, the brigham young university library developed a web application for organizing services that allows multiple routes to service information. the application, known internally as “services,” was built using django, an open-source python web framework. the application incorporates a comprehensive list of library services and a map of service relationships. each service is assigned one or more categories, locations, and service areas within the application: • categories and subcategories—broad groupings of services (e.g., research help, for faculty, printing and copying) • locations—physical or virtual places within the library where services can be found (e.g., help desks, rooms) • service areas— library departments or other organizational units that offer services (e.g., humanities, special collections) services can have multiple categories, locations, and service areas and some service areas have multiple locations within the library (see figure 1). 
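the article does not publish the application's schema, but the relationships described above can be sketched as django models; every model and field name below is an assumption made for illustration, not the byu library's actual code.

from django.db import models

class Category(models.Model):
    # broad grouping of services, e.g., research help, for faculty, printing and copying
    name = models.CharField(max_length=200)
    parent = models.ForeignKey("self", null=True, blank=True,
                               on_delete=models.CASCADE)   # allows subcategories

class Location(models.Model):
    # a physical or virtual place in the library, e.g., a help desk or room
    name = models.CharField(max_length=200)

class ServiceArea(models.Model):
    # a department or other organizational unit, e.g., humanities, special collections
    name = models.CharField(max_length=200)
    locations = models.ManyToManyField(Location)   # some areas span multiple locations

class Service(models.Model):
    name = models.CharField(max_length=200)
    description = models.TextField()
    # a service can have multiple categories, locations, and service areas
    categories = models.ManyToManyField(Category)
    locations = models.ManyToManyField(Location)
    service_areas = models.ManyToManyField(ServiceArea)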
service information can also include links to related services. these links facilitate the serendipitous discovery of additional services (see figure 2). service information is stored in a relational database that joins connected entities together. an html template is used to format service information from the database in order to generate web pages for each of the services. maintaining the data in this manner ensures that changes made to service information in the database flow through to all of the associated web pages. adding or modifying entries automatically triggers the generation of new html for only the impacted services. generating static content by using triggers keeps the web pages up-to-date without the performance hit of real-time dynamic page generation. library services navigation | rennick 18 https://doi.org/10.6017/ital.v38i1.10844 figure 1. sample map illustrating relationships between services (on the left side) and service area locations (on the right side). information technology and libraries | march 2019 19 figure 2. sample map of how related service web pages are linked. library services navigation | rennick 20 https://doi.org/10.6017/ital.v38i1.10844 user scenarios the following examples of navigation paths typify how the web application can help users locate services. in each case there are multiple alternative paths that could be followed to find the same information. scenario 1. a student is looking for a computer that has music notation software installed. clicking the “services” link on the library homepage leads to a summary of library services. the student clicks the “public computers” link found under the “featured services” heading and is presented with detailed information about the computers. in the bullet points listed in the “overview” section there is a link to “see the list of software available on these computers.” following this link the student is able to learn that the desired software is available in the library’s music and dance media lab. scenario 2. while visiting a web page for the faculty delivery service, a professor notices a link to the category “for faculty.” following the link leads to a page that highlights some of the library services provided exclusively to campus faculty. the professor clicks the link “faculty expedited book orders” and is taken to a web page that describes the service and provides an online form for requesting a book. scenario 3. a student would like to borrow a camera for a class project. entering “digital cameras” into the main search box on the library homepage produces a link to “digital cameras (dslr)” listed under the “library services” heading at the top of the search results. following the link leads to a web page with information about the library’s digital camera offerings. the web page provides links to related services, including the library’s video production studio. the student decides to reserve the studio instead of checking out a camera. anatomy of a services web page each service web page is divided into sections to help users quickly find the type of information they seek. each section represents an information module with a specific purpose and an identifying design; the sections are color coded and displayed in a consistent order on each page. this helps users to find the same kind of information in the same place on every service page. 
major sections include: • title • description • keywords • hours • location • contact • overview • call to action • frequently asked questions • additional resources • related services • categories information technology and libraries | march 2019 21 a few of the sections require an explanation. the hours, location, and contact sections are links located directly below the title and description. clicking these links displays the section content. the overview section is intended to provide brief bullet points near the top of the web page so that users can quickly scan the most important information about the service. the call to action section follows these bullet points and contains one or more links to web applications that facilitate use of the service. examples of calls to action include: • place a hold • reserve a group study room • register for an advanced writing class • submit an interlibrary loan request most of the sections are optional since not all sections apply to every service. the services web pages can also include raw html that is embedded in a section in order to provide unique formatting for those services that do not neatly fit the standard layout. for example, the public computers page includes a section that displays the current availability of computers for each floor of the library. the look and feel of services web pages can be extended to other pages on the library website. library departments have web pages that provide information about personnel, mission, location, and services offered. some of these pages have been converted to a format that resembles the services layout in an effort to add cohesiveness to the library website. the department pages have sections similar to services pages such as hours, location, contact information, and an overview with bullet points. the pages can automatically display links to all of the services available in the department. because department pages are part of the services application and are connected to services with a relational database, changes to service information remains in sync across the entire website. this helps alleviate the problem of out-of-date department web pages. searching for services services can be located by submitting a query in a search box or by following links found on the main services web page. the services search engine matches words from the query with words found in a service name or associated tags. each service is tagged with keywords, phrases, or synonyms to increase the likelihood of successful searching. users may not be familiar with library jargon and will search for services using a variety of terms. it is impossible to name library services in a way that is understood by everyone, especially since academic library services target both students and faculty. a study on library services and user-centered language found that: “the choices of the graduate students did not always mirror those of the faculty. this highlights the inherent challenge of marketing services—the target audiences for the same service can have very different opinions and preferences.”23 services can have multi-word phrases assigned in addition to individual keywords. for example, the data management service has the following synonyms assigned: data curation, data management plan, and dmp. new keywords and phrases can be identified by reviewing search queries in the system log files and by conducting usability studies. library services navigation | rennick 22 https://doi.org/10.6017/ital.v38i1.10844 figure 3. 
the interlibrary loan service web page. in addition to using a search box on the services web pages, users can search for services using the single search box on the library’s homepage. the single search box returns a link to matching services as part of search results when the search engine recognizes services keywords in a query. the services application has an api that makes keywords and other service information available to the single search box application. figure 4. search for a service from the single search box on the library’s homepage. figure 5. json results from the services api.

{"status": 200, "results": [{"url": "https://lib.byu.edu/services/datamanagement/", "type": "service", "name": "data management", "slug": "datamanagement", "description": "through our institutional repository scholarsarchive, faculty can store research data. this is particularly useful for faculty who must develop data management plans for research projects funded by grants.", "keywords": ["data curation", "dmp", "data management plan", "data storage", "open access"]}], "total": 1, "query": "dmp"}

to facilitate browsing, services are organized into three groups on the services web page: featured services, categories, and service areas. the featured services group highlights the most commonly sought-after services. categories are organized by the type of service or the target audience. the service areas group directs users to services available in library departments or units. the services web page does not list every service but instead directs users to web pages based on categories or service areas that list individual services. the services search feature can also include links to non-services. for example, library policies are not services yet users occasionally search for them on the services page (the library website posts policy documents on the about page). in order to minimize user frustration with searching, links to non-services are included in search results so that users can be redirected to the desired pages. to help with optimization for external search engines such as google, each services page has a user-friendly url that clearly identifies the service. for example, the 3d printer service has the url https://lib.byu.edu/services/3d-printers/. each web page also includes the service name in an embedded html title tag. conclusion adopting a broad view of what represents a service has altered the library’s approach to the information architecture of the website. the services web application offers several innovations for improving library service discoverability and maintenance, including:

• standardized organization of service information
• attaching keywords/aliases to service descriptions
• an api for integration with the single search box on the homepage
• links to related services
• generation of web pages from a relational database

usability tests were conducted throughout the development of the services application. follow-up assessments are planned for the future in order to verify that the application works as expected and to identify potential adjustments to the design. the services application shows promise as an effective tool for facilitating the discovery of services and increasing the reliability and uniformity of service information.
acknowledgements the author gratefully acknowledges the contributions of grant zabriskie for the original concept and design of the services application and ben crowder for the implementation. references 1 cory lown, tito sierra, and josh boyer, “how users search the library from a single search box,” college & research libraries 74, no. 3 (may 2013): 227-41, https://doi.org/10.5860/crl-321. 2 rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends 61, no. 1 (summer 2012): 186–207, https://doi.org/10.1353/lib.2012.0029. 3 marcin kozak and james hartley, “writing the conclusions: how do bullet-points help?” journal of information science 37 no. 2 (feb. 2011): 221–24, https://doi.org/10.1177/0165551511399333. 4 barbara a. blummer, “a literature review of academic library web page studies,” journal of web librarianship 1 no. 1 (2007): 45–64, https://doi.org/10.1300/j502v01n01_04; galina letnikova, “usability testing of academic library web sites: a selective annotated bibliography,” internet reference services quarterly 8 no. 4 (2004): 53–68, https://doi.org/10.1300/j136v08n04_04. information technology and libraries | march 2019 25 5 laurel a. clyde, “the library as information provider: the home page,” the electronic library 14 no. 6 (dec. 1996): 549–58, https://doi.org/10.1108/eb045522. 6 laura b. cohen and julie m. still, “a comparison of research university and two-year college library web sites: content, functionality, and form,” college & research libraries 60 no. 3 (1999): 275–89, https://doi.org/10.5860/crl.60.3.275. 7 yaping peter liu, “web page development and management: a spec kit,” association of research libraries (1999): https://hdl.handle.net/2027/mdp.39015042087232. 8 sharon q. yang and heather a. dalal, “delivering virtual reference services on the web: an investigation into the current practice by academic libraries,” journal of academic librarianship 41 no. 1 (2015): 68–86, https://doi.org/10.1016/j.acalib.2014.10.003. 9 barbara i. dewey, “in search of services: analyzing the findability of links on cic university libraries’ web pages,” information technology and libraries, 18 no. 4 (1999): 210–13, http://www.ala.org/sites/ala.org.acrl/files/content/conferences/pdf/dewey99.pdf. 10 louise mcgillis and elaine g. toms, “usability of the academic library web site: implications for design,” college & research libraries 62 no. 4 (july 2001): 355–67, https://doi.org/10.5860/crl.62.4.355. 11 christy hightower, julie shih, and adam tilghman, “recommendations for benchmarking web site usage among academic libraries,” college & research libraries 59 no. 1 (jan. 1998): 61–79, https://crl.acrl.org/index.php/crl/article/viewfile/15182/16628. 12 jason vaughan, “three iterations of an academic library web site,” information technology and libraries 20 no. 2 (june 2001): 81–92, https://search.proquest.com/docview/215832160. 13 david j. comeaux, “web design trends in academic libraries—a longitudinal study,” journal of web librarianship 11 no. 1 (2017): 1–15, https://doi.org/10.1080/19322909.2016.1230031. 14 shelly gullikson et al., “the impact of information architecture on academic web site usability,” the electronic library 17 no. 5 (oct. 1999): 293–304, https://doi.org/10.1108/02640479910330714. 15 kate e. adams and mary cassner, “content and design of academic library web sites for distance learners: an analysis of arl libraries,” journal of library administration 37 no. 1/2 (2002): 3–13, https://doi.org/10.1300/j111v37n01_02. 
16 brian detlor and vivian lewis, “academic library web sites: current practice and future directions,” journal of academic librarianship 32 no. 3 (may 2006): 251–58, https://doi.org/10.1016/j.acalib.2006.02.007. 17 susan j. gardner, john eric juricek, and f. grace xu, “an analysis of academic library web pages for faculty,” journal of academic librarianship 34 no. 1 (jan. 2008): 6–24, https://doi.org/10.1016/j.acalib.2007.11.006. library services navigation | rennick 26 https://doi.org/10.6017/ital.v38i1.10844 18 shu liu, “engaging users: the future of academic library web sites,” college & research libraries 69 no. 1 (jan. 2008): 6–27, https://doi.org/10.5860/crl.69.1.6. 19 kevin beswick, “quicksearch,” north carolina state university libraries, accessed nov. 28, 2018, https://www.lib.ncsu.edu/projects/quicksearch. 20 cole hudson and graham hukill, “one-to-many: building a single-search interface for disparate resources,” in exploring discovery: the front door to your library’s licensed and digitized content, ed. kenneth j. varnum (chicago: ala editions, 2016), 141–53, http://digitalcommons.wayne.edu/libsp/114. 21 andrew k. pace, “optimizing library web services: a usability approach,” library technology reports 38 no. 2 (mar./apr. 2002): 1–87, https://doi.org/10.5860/ltr.38n2. 22 ibid. 23 allison r. benedetti, “promoting library services with user-centered language,” libraries & the academy 17 no. 2 (apr. 2017): 217-34, https://doi.org/10.1353/pla.2017.0013. mitigating bias in metadata: a use case using homosaurus linked data article mitigating bias in metadata a use case using homosaurus linked data juliet l. hardesty and allison nolan information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13053 juliet l. hardesty (jlhardes@iu.edu) is metadata analyst, indiana university. allison nolan (anolan147@gmail.com) is library and information science graduate student, indiana university. © 2021. abstract controlled vocabularies used in cultural heritage organizations (galleries, libraries, archives, and museums) are a helpful way to standardize terminology but can also result in misrepresentation or exclusion of systemically marginalized groups. library of congress subject headings (lcsh) is one example of a widely used yet problematic controlled vocabulary for subject headings. in some cases, systemically marginalized groups are creating controlled vocabularies that better reflect their terminology. when a widely used vocabulary like lcsh and a controlled vocabulary from a marginalized community are both available as linked data, it is possible to incorporate the terminology from the marginalized community as an overlay or replacement for outdated or absent terms from more widely used vocabularies. this paper provides a use case for examining how the homosaurus, an lgbtq+ linked data controlled vocabulary, can provide an augmented and updated search experience to mitigate bias within a system that only uses lcsh for subject headings. introduction controlled vocabularies are a vital part of how individuals and communities are understood and discussed in scholarly discourse and research. controlled vocabularies are also a way to standardize terminology and allow items to be grouped by common subjects for easier discovery and access points. 
while larger, more universally recognized vocabularies like the library of congress subject headings (lcsh) exist, they are often slow to be updated and they reflect a largely white, heterosexual, cisgender, male, christian-centric point of view.1 when the terminology used to define a systemically marginalized group is determined by those outside of the group, often the terms are outdated or reflect a biased perspective.2 the prevalence and continued use of outdated metadata and vocabularies in discovery systems creates a cycle of biased search practices that can be difficult to break without the help of information professionals and outside resources. controlled vocabularies that have been created by or have the input of marginalized communities tend to be more inclusive and up to date. unfortunately, these vocabularies often are not known to the public or to researchers not well versed in metadata practices. providing access to controlled vocabularies created by marginalized communities and linking them to existing vocabularies such as lcsh can help make the search process more representative of the people who are using discovery systems and can connect them to resources that better represent themselves and their needs in a complex information world. lcsh terms are available as linked data, a format that enables online machine-readable connections between concepts and terms, and there needs to be an effort to make systems using lcsh terms more inclusive and representative of marginalized communities.

the project described in this article built and gathered feedback on a proof-of-concept javascript application to show how defined connections between vocabularies can be used to provide alternative and often enhanced access to library catalog resources. in this instance, simple knowledge organization system (skos) relationships link lcsh subject terms to the homosaurus linked data vocabulary, an "international linked data vocabulary of lgbtq terms that supports improved access to lgbtq resources within cultural institutions."3 skos is "a common data model [from the w3c] for sharing and linking knowledge organization systems via the web."4 this project uses skos:exactmatch relationships defined by the homosaurus to enable researchers to use homosaurus terms to search a library catalog and retrieve relevant results based on the connected lcsh terms that are already in the catalog record.5 subject searches are conducted when the homosaurus term and the lcsh term match exactly, since the lcsh term's presence in the library catalog record indicates a specific grouping of records could have this subject term applied. if the homosaurus term does not match exactly to the lcsh term, a keyword search is conducted using the homosaurus term to retrieve library catalog results where the homosaurus term appears in any indexed field in the catalog record, including creator-supplied title and abstract information. using a vocabulary like the homosaurus this way helps to connect researchers to resources that more accurately reflect systemically marginalized communities and potentially more accurately reflects the researchers themselves.
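as a concrete illustration of the search behavior just described, the following sketch builds an iucat search link from a single term. it is a simplified reading of the approach and not the project's actual code (which is linked later in this article): the subject-search url pattern follows the iucat example cited in endnote 27, while the keyword field name "all_fields" and the term-object property names are assumptions made for this example.

// minimal sketch of the subject-vs-keyword logic, assuming a term object that carries
// a homosaurus preferred label and, when the community has defined one, the lcsh label
// connected by skos:exactmatch. not the application's actual code.
function buildIucatSearchUrl(term) {
  const base = "https://iucat.iu.edu/?";
  if (term.lcshExactMatch) {
    // an exact match exists: run a subject search on the connected lcsh heading,
    // since that heading is what actually appears in the catalog records
    return base + new URLSearchParams({
      search_field: "subject",
      q: term.lcshExactMatch,
    }).toString();
  }
  // no exact match: fall back to a keyword search on the homosaurus term itself,
  // so it can still match titles, abstracts, and other indexed fields
  return base + new URLSearchParams({
    search_field: "all_fields", // assumed name for the catalog's keyword search field
    q: term.prefLabel,
  }).toString();
}

// example from the article: the homosaurus term "transgenderism" is connected to the
// lcsh heading "gender nonconformity" by skos:exactmatch, so it becomes a subject search.
buildIucatSearchUrl({ prefLabel: "transgenderism", lcshExactMatch: "gender nonconformity" });

a hypothetical term with no exact match would instead produce a keyword search on its own label.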
by providing connections for users that they would otherwise have difficulty finding without the help of a librarian or other information professional, projects such as this one hope to combat the cycle of biased metadata and biased research practices that has dominated academic research. literature review students in higher education who identify as members of systemically marginalized communities can continue to experience marginalization within higher educational institutions, and the academic library setting is no exception. brook, ellenwood, and lazzaro provide analysis of multiple studies showing the effect of mostly white staffing in academic libraries, the impact this can have on reference services provided to patrons from marginalized communities, and the overwhelming and intimidating spaces in sizable academic libraries that can be “compounded for students who already feel that they do not belong on campus on the basis of their race.” 6 when considering how this experience impacts using an online library catalog or digital repository system for conducting research, these same students can find themselves not well represented.7 additionally, crossing disciplines to capture intersectionalities of an identity can be complicated by narrow controlled vocabulary terms which compound problems that already make interdisciplinary research difficult.8 drabinski proposes that the library catalog should be treated as a biased text that requires critical thinking to understand.9 subject headings from authorities such as the library of congress will never be unbiased as attitudes, perspectives, and identities change over time. it is therefore important to leverage information literacy competency standards put forward by the association of college & research libraries and teach students how to critically engage the library catalog as another information source. library instruction is one way to ease the challenges faced by marginalized researchers in higher education, helping researchers effectively use a system like a library catalog that incorporates biased subject headings. however, with interdisciplinary research, materials are often dispersed across information systems and physical locations, and there is still the challenge to identify and locate everything relevant to the research topic.10 using available fields within the library catalog record itself (the 590 in marc, for example) can identify cross-disciplinary resources. examples are provided by hogan for black lgbtq resources and latina lesbian literature.11 what all of these efforts seem to point to is what hannah buckland proposes: changing the framing of catalog records from “aboutness” to information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 3 “fromness,” providing “culturally-responsive metadata” that j. l. colbert recognizes can create an “equitable subject access” experience that “center[s] the information needs and information seeking behaviors of those whom our systems disenfranchise.”12 these changes can often only be implemented locally due to language variation and localized community relevance; but colbert then considers how linked open data might prove useful to combine or relate different subject or community vocabularies. “when we decenter the idea that for every concept there is one controlled term to describe it, we allow the play of seemingly opposite ways of thinking. . . . 
a linked open data catalog allows libraries to complement, replace, or even reject the standards that have been decided for us and our patrons.”13 librarians and archivists have suggested and tried other methods to mitigate the impact of systemic marginalization. these efforts go beyond the use of controlled vocabularies in the creation of catalog records. one of the earliest and most significant examples of this is dorothy porter’s work in organizing the collections she managed at howard university. up to that point in the 1930s and 1940s, dewey decimal classification (ddc) was used to organize works on the shelf. many libraries of the time were predominantly white institutions and dorothy porter remembered them using ddc to shelve anything by a black author or about the black experience under the ddc heading for colonization (325) or slavery (326).14 porter instead organized her collections based on subject matter, genre, and author, categorizing the work based on what it was about rather than the race of the author or the race of any people mentioned in the work. this subtle yet fundamental shift shows the real impact that libraries have on access to collections for their audiences. hope a. olson and dennis ward created a proof-of-concept microsoft access database interface connecting mary ellen capek’s a women’s thesaurus to the dewey decimal classification scheme to offer an end user interface for searching a ddc system using the thesaurus terminology. the idea, initially from joan mitchell (then editor of ddc), was to develop “a means of making ddc accessible from the point of view of a marginalized knowledge domain—in particular, creating a means of browsing ddc from a feminist/women’s studies perspective.”15 variables were defined from characteristics of different classifications to enable a systematic match to thesaurus terms. dorothy berry’s work at university of minnesota libraries to gather and digitize african american-related materials from across archival collections for aggregating in umbra search african american history shows an option for pulling a collection from other collections and highlighting what would otherwise remain marginalized items from marginalized communities.16 discovering these materials required searching with a variety of terms used over time to refer to african americans. adding collection level context at the folder level for these materials allows aggregation without losing original place and context, while at the same time centering the marginalized communities represented in these materials by gathering them from these various and marginalized original locations. archives for black lives in philadelphia is “a loose association of archivists, librarians, and allied professionals in the philadelphia and delaware valley area responding to the issues raised by the black lives matter movement.” within this group, the anti-racist description working group has compiled an annotated bibliography and metadata recommendations to address racist and antiblack archival description.17 the recommendations focus on the black community but can be applied more broadly when describing records by and about any marginalized community. the information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 4 recommendations include decentering “neutrality” and “objectivity” for “respect” and “care,” particularly when deciding on controlled vocabulary terms to use in archival description. 
specific recommendations to use “terminology that black people use to describe themselves,” to recognize that this “terminology changes over time, so description will be an iterative process,” and to consult “alternative cataloging schemes created by the subjects of the records being described when and if they are available” provide an approach that looks for descriptive terms from within the community and moves away from terms applied to a community by others.18 paying attention to the controlled vocabularies applied to archival description helps to change the narrative and the power structure of the historical record, centering those who have been marginalized and oppressed and increasing discoverability and access to their stories and perspectives. allowing for changes in controlled vocabulary terms keeps systems flexible enough to accommodate changes in a community’s terminology over time. linked data relationships can connect term changes for more comprehensive searching while also identifying the current controlled vocabulary term to use. the lavender library, archives, and cultural exchange (llace) community archives in sacramento, california is an archive for a marginalized community.19 in developing archival and circulating library collections that serve the queer community, the library collections use a thesaurus of queer terms from dee michel for classification and the archival collections use subject headings from michel’s thesaurus along with lcsh.20 the focus, again, begins with the community being served and recognizes that widely used controlled vocabularies like lcsh do not serve these collections or communities well. starting with a community-specific vocabulary and then connecting lcsh terms centers the collections and community first and then makes connections to the larger library and archives community possible. other efforts have used alternatives or supplements to common vocabularies and schemes. the xwi7xwa library’s use of the brian deer classification system at the university of british columbia incorporates names and terminology from the first nations community to better represent that community beyond what something like library of congress classification provides. using accurate names of nations and peoples, according to the head librarian, ann doyle, helps create identity among users of the collection and “shapes the research and types of questions that people ask.”21 the national indian law library began cataloging using local terminology only. as it moved records online and sought to be more discoverable and cooperative with other libraries, this local terminology was synchronized with lcsh and specialized terms for federal indian law and tribal law were kept as a supplement.22 doing this work is not only about changing terms on catalog records but also learning and making connections with communities who have been marginalized by these systems. farnel et al. 
explain the process of decolonizing both the library catalog and digital collections description at university of alberta libraries through investigation, analysis, partnering with other institutions doing this work and, most importantly, reaching out to indigenous communities represented in these records to engage and learn about the most appropriate terminology to use.23 different methods and attempts to center the marginalized in cataloging and collection description show it is possible and essential to voice the concerns of those least represented in order to have the most impact on all researchers using these resources.

widely used controlled vocabularies like lcsh continue to be a major way to aggregate collections and provide common access points. groups like the association for library collections and technical services' cataloging and metadata management section subject analysis committee continue to work to change terms in these vocabularies to provide better and more accurate representation for systemically marginalized communities, but the process is slow and will likely never be enough.24 incorporating vocabularies from systemically marginalized communities for use either on the cataloging/description side or for researchers to use for search and discovery offers possibilities for more inclusive experiences that center marginalized voices and expand the options for research questions to ask and answer.

methodology

to test this idea that connections provided between a systemically marginalized community's controlled vocabulary and a more generalized vocabulary like lcsh could be helpful, a proof-of-concept information retrieval aid was conceived. the idea was to create a lightweight javascript application that could use a select set of terms from the homosaurus (http://homosaurus.org), an lgbtq+ vocabulary originally created by ihlia lgbt heritage (https://www.ihlia.nl/?lang=en) and now also used in its linked data form by the digital transgender archive (https://www.digitaltransgenderarchive.net), to connect to lcsh terms and provide search links against a library catalog (iucat, https://iucat.iu.edu, indiana university's online library catalog) that uses lcsh for subject headings.

homosaurus version 1 was used initially and did not identify connections to lcsh terms. analysis of homosaurus terms against lcsh terms suggested some connections could be made and for initial construction of the proof-of-concept application these were used, but with the recognition that these connections were not coming from the community vocabulary. this was a problem since the point in mitigating bias is to use the community's definitions and any outside interpretations are necessarily not going to reflect the community's intentions. as the application concept continued to form and the initial term comparison work continued, homosaurus version 2 was released containing explicit connections to lcsh terms, using skos:exactmatch for mapping those connections. those connections in version 2 are not expressed as linked data but are provided in the vocabulary's site for each term.
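to make the data being worked with here concrete, the sketch below shows the kind of simplified term record the proof-of-concept can assemble once a homosaurus term and its community-supplied lcsh connection are combined. the property names are assumptions chosen for readability, not the actual json-ld keys in the homosaurus download; the values come from the "transgenderism" example discussed later in this article.

// illustrative term record only; property names are simplified assumptions, not the
// homosaurus json-ld keys. values follow the example discussed later in the article.
const exampleTerm = {
  id: "http://homosaurus.org/v2/transgenderism",
  prefLabel: "transgenderism",
  // the description supplied by the vocabulary, needed so the term is not mistaken
  // for a currently preferred community term
  description:
    "pathologizing term often used in the medicalization of transgender people; use only in historical context",
  // the skos:exactmatch connection defined by the homosaurus community, recorded as
  // the lcsh label so the application can run a subject search against the catalog
  lcshExactMatch: "gender nonconformity",
  // broader, narrower, related, and use-for terms drive the bubble navigation
  broader: [],
  narrower: [],
  related: [],
  useFor: [],
};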
the proof-of-concept work switched to using select terms from homosaurus version 2 in order to make use of the lcsh connections now being provided by the community.25 the proof-of-concept application used the select set of homosaurus version 2 terms downloaded as json-ld and added in the lcsh terms using the supplied skos:exactmatch relationship. the user interface provided visual connections from the selected homosaurus term to its narrower, broader, and related terms within homosaurus. any exact matches to lcsh terms and any use for terms homosaurus indicated should be replaced by this term were provided together. the visual layout for the application is directly influenced by the ihlia lgbt heritage collections browse interface.26 in ihlia’s system, after searching for a term (“love,” for example), the interface provides broader, narrower, related, and used for terms as suggestions for other ways to discover items in these collections in a visually connected bubble layout surrounding the search term. those connections are linked and can be used to navigate ihlia’s controlled vocabulary, which also happens to be powered by a local non-linked data form of the homosaurus vocabulary. in the proof-of-concept application, for terms where there is an lcsh exact match, the lcsh term was used for the connection to search iucat and was only revealed on screen if the exact match (lcsh) bubble was clicked by the user (see fig. 1). http://homosaurus.org/ https://www.ihlia.nl/?lang=en https://www.digitaltransgenderarchive.net/ https://iucat.iu.edu/ information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 6 figure 1. information retrieval aid showing the homosaurus term “transgenderism” linked to search iucat. exact match (lcsh) shows the lcsh term “gender nonconformity” (also linked to search in iucat) along with narrower, broader, and related homosaurus terms. the initial proof-of-concept information retrieval aid javascript application was shared with and tested by olivia adams, a graduate student at indiana university working as the library coordinator for the lgbtq+ culture center library at indiana university (https://lgbtq.indiana.edu/programs-services/library/index.html). this library has adapted the llace classification system, the shelving organizational scheme developed by the lavender library in sacramento, california (http://lavenderlibrary.com), for organizing its own physical collection of resources. the lgbtq+ culture center library also has its own online library catalog that makes use of an established local list of tags for items included in that system (https://www.librarycat.org/lib/iuglbtlibrary/). the information retrieval aid application was first presented to the lgbtq+ culture center library coordinator for general impressions and feedback. additionally, specific tasks were proposed. please note that the proposed tasks use a vocabulary term as an example that is offensive and outdated. the results of this testing, along with feedback from the homosaurus editorial board, clarified the need to change the information retrieval aid to supply this additional contextual information (available in homosaurus as a description for the term). 
the tasks presented for trying the information retrieval aid were the following:

• you want to find resources at iu about transgenderism. what do you think of the resources that iucat is offering through this information retrieval aid?
• how do the homosaurus terms you are seeing here compare to the llace classification terms or the tags/subjects you use in the lgbtq+ library catalog?
• what is the importance of transparency for the lcsh terms in relation to community values (for terms that are different and only shown in the hidden section right now)?

"transgenderism" is a term homosaurus connects to lcsh's term "gender nonconformity" with an exact match relationship (http://homosaurus.org/v2/transgenderism). to provide results for answering the first question, the proof-of-concept information retrieval aid interface showed the homosaurus term with a linked search in iucat that provided results using the lcsh term as a subject search.27 the second question was asked to get a sense of the relevance of the homosaurus terms to the collections organized and housed in the lgbtq+ culture center library. the third question about the importance of transparency for the lcsh terms in relation to community values was meant to investigate how a system like this proof-of-concept information retrieval aid might be used by the community of researchers and patrons using the culture center's library, and if the mechanism to mask the lcsh term in favor of the homosaurus term is useful or not.

the code for this javascript web application in its current state is available on github at https://github.com/jlhardes/metadatabias. the initial proof-of-concept application was developed by justina kaiser, at the time an information and library science graduate student at indiana university. the current code is a fork of her project, also available on github (https://github.com/juskaise/metadatabias).

discussion

sharing this proof-of-concept information retrieval aid using homosaurus terms with the lgbtq+ culture center librarian revealed the importance of usability testing and being receptive to a community's needs. an introduction and explanation of the controlled vocabulary and the community it represents was a recommended addition since the term list presented was not initially easily identified. additionally, the interface terminology of narrower/related/broader/exact match/use for is familiar in the library world but not necessarily for the casual user. this terminology is still in use by the information retrieval aid but is under review for updated labels that are easier to understand. this initial version kept any use for terms hidden unless the user clicked on that bubble in the interface to see them. the reasoning was to give more emphasis to the homosaurus term and to keep any potentially derogatory or harmful terms still in use by lcsh out of the way of researchers (even though the searches conducted against the catalog might need to use those terms if no other linked data connection is available). feedback here was helpful: hiding terms that homosaurus does not recommend might hinder discovering results if the researcher wants to search on a term that is no longer used by the community or is considered derogatory or harmful.
this is a useful lesson in that covering up the past is not helpful to those in a marginalized community who have experience with that marginalization or those trying to learn about the past experiences of a marginalized community. also, being able to find all relevant resources can mean a variety of terms (both current in the community and no longer current) might be necessary. the homosaurus editorial board also explained that use for terms are sometimes slang terms and are not always considered derogatory. this information is helpful in figuring out how to present lcsh terms in the interface in the context of the homosaurus terms. additionally, moving use for terms next to related terms connected these sets of terms better than placing use for terms with exact match terms.

further feedback from the homosaurus editorial board regarding the example term used for testing showed the terms and their connections to other terms do not supply enough information to express the full meaning of the term within the community. without supplying the homosaurus description for the term "transgenderism" ("pathologizing term often used in the medicalization of transgender people; use only in historical context," see http://homosaurus.org/v2/transgenderism), the term can come across in the information retrieval aid as a preferred term from the community when, in fact, it is not. this was a critical update needed for the information retrieval aid to be effective as a research tool.

in using the proof-of-concept interface to search against iucat, it was noted by the lgbtq+ culture center librarian that using the lcsh term to conduct a subject search against the catalog might not produce useful results if the homosaurus term is not an actual exact match to the lcsh term. in this case the homosaurus term should be searched in the catalog as a keyword instead of a subject, so the search is conducted on all indexed fields in the catalog record. in the example tried for the term "transgenderism" the skos:exactmatch relationship is defined as the lcsh term "gender nonconformity" (see fig. 1). even though the relationship is identified in homosaurus as an exact match, searching for "gender nonconformity" as a subject term in the catalog (267 results) and "transgenderism" as a keyword in the catalog (289 results) arrives at different result sets with different types of entries (see figs. 2 and 3). use for terms, while not always representative of the community providing the vocabulary, do have possible historical relevance if present in supplied information (such as a title) and can be connected to the catalog via keyword searching as well.

there is an importance to revealing these differences within the library catalog and providing results that reflect the terms used by the community. the library's applied terminology via subjects organizes a different set of resources compared to searching for terminology available via titles or other information supplied by authors and creators. when considering who is part of a community and who is not in this scenario, there are benefits to trying to work around or in addition to the library's applied organizational scheme. subject searching in the catalog provides another view (and set of results) for those familiar with the community's terminology. those approaching a research topic from outside of a community are able to learn more about how to find resources most effectively, moving from the catalog's terminology to the community's terminology.

after trying the proof-of-concept information retrieval aid, the lgbtq+ culture center librarian provided feedback that this could be useful for people new to studying the lgbtq+ community and unfamiliar with the community's terminology. with an introduction and explanation of the controlled vocabulary in place and an easy-to-follow interface to guide users through the vocabulary terms, effective searches against the catalog that also reveal terminology used by the community and differences between that terminology and the catalog's terminology can be both educational and useful for research.

figure 2. searching indiana university's online library catalog (iucat) for the lcsh term "gender nonconformity" as subject shows 267 results.

figure 3. searching indiana university's online library catalog for the homosaurus term "transgenderism" as keyword shows 289 results.

one of the largest obstacles to connecting marginalized communities to reliable, representative controlled vocabularies is the lack of controlled vocabularies that are readily available as linked data. unless an individual or organization has made the effort to establish connections between a community's vocabulary and lcsh, the representative vocabularies stand alone and remain difficult to discover or use. the proof-of-concept testing of this project illustrates not only the need for connections to community-created controlled vocabularies, but also that having access to those vocabularies can result in more accurate and effective searches and usage of catalog resources. although vocabularies like lcsh contain outdated terms, having access to a variety of terms that are acceptable at different points in a community's history can be useful for researchers who may not be as informed about certain systemically marginalized communities and whether certain terms have been completely eliminated, reclaimed, or replaced by more accurate terminology.

efforts to mitigate bias in metadata via linked data are representative of a larger effort to correct a long-standing issue in libraries and other fields where the voices and perspectives of marginalized individuals have been overshadowed by the voices and needs of the majority. in addition to working to update large, generalized vocabularies and trying to incorporate these voices and perspectives, this change in method is meant to add those voices and center their importance. by linking community-created vocabularies and placing them front and center in the search process, metadata can become a tool with which to center the voices of marginalized communities and move toward a more equitable method of searching, finding, and using resources.

conclusion

the information retrieval aid is still progressing beyond a proof-of-concept but it has seen significant updates since its initial implementation. figure 1 shows the initial proof-of-concept that was tested.
introductory information has been added to explain the homosaurus vocabulary and the information retrieval aid tool itself. more terms are available (although still not the full set of homosaurus version 2 terms) and the term list in json-ld is being used to automatically populate the term list in the interface. if available, the term description is provided for more complete context. additionally, no terms are hidden in the bubble navigation and use for is located with related terms now. future work for this project includes incorporating the full list of homosaurus terms; reconsidering the category names (narrower/related/broader/exact match/use for) to determine if there are better labels to use for these categories that will be easier to understand for a general research audience; and testing the tool with researchers new to lgbtq+ terminology as well as those more knowledgeable about the lgbtq+ community and its terminology and history. additional areas of work that welcome investigation include automating the term list generated for use with the information retrieval aid (via api calls, for example) to help reflect any changes or updates made to the community vocabulary over time; the technical implications of connecting this information retrieval aid to a search engine beyond indiana university’s online library catalog; and using this tool with controlled vocabularies from other systemically marginalized communities, such as the bc first nations subject headings, the glossary of disability terms from the north carolina council on developmental disabilities, or atria: women’s thesaurus from the institute on gender equality and women’s history.28 what difference does it make to use a different search engine that incorporates lcsh terms? likewise, is it possible to connect other linked data (or non-linked data) controlled vocabularies from systemically marginalized communities and is that effective for retrieving information and improving research outcomes? the work so far shows the possibility of centering systemically marginalized voices by using the system more effectively, making linked data work to connect and update the terminology and search terms available for research. acknowledgements the authors would like to thank the lgbtq+ culture center librarian at indiana university for spring 2020, olivia adams, for her helpful review and feedback of the initial proof -of-concept information retrieval aid. we would also like to thank brian m. watson, editorial board member of homosaurus.org, for their help with using homosaurus version 2 terms and the homosaurus editorial board, particularly k. j. rawson, for reviewing and supplying article feedback. the authors also acknowledge the work of justina kaiser who created the initial code behind the information retrieval aid. information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 12 endnotes 1 hope a. olson, “mapping beyond dewey’s boundaries: constructing classificatory space for marginalized knowledge domains,” library trends 47, no. 2 (fall 1998): 238. 2 the term “systemically marginalized group,” used recently by dr. 
nicki washington from duke university at the september 3, 2020, indiana university center for women in technology talk, “‘bring a folding chair’: understanding and addressing issues of race in the context of stem,” was revealing to the authors as a better term to use than “historically marginalized communities.” this is significant in that it emphasizes the continued oppression and marginalization of these communities, rather than viewing these communities’ struggles as something of the past that has been overcome/surmounted. 3 “mission, history, editorial board,” homosaurus vocabulary site, accessed march 2, 2021, http://homosaurus.org/about. 4 “skos simple knowledge organization system reference,” w3, published august 18, 2009, https://www.w3.org/tr/skos-reference/. 5 “skos:exactmatch,” skos simple knowledge organization system namespace document—html variant, 18 august 2009 recommendation edition, w3, last modified august 6, 2011, https://www.w3.org/2009/08/skos-reference/skos.html#exactmatch. 6 freeda brook, david ellenwood, and althea eannace lazzaro, “in pursuit of antiracist social justice: denaturalizing whiteness in the academic library,” library trends 64, no. 2 (fall 2015): 259, https://muse.jhu.edu/article/610078. 7 holly tomren, “classification, bias, and american indian materials” (san jose state university, 2003), http://ailasacc.pbworks.com/f/biasclassification2004.pdf. 8 amelia koford, “how disability studies scholars interact with subject headings,” cataloging & classification quarterly 52, no. 4 (2014), https://doi.org/10/gf542p. 9 emily drabinski, “queering the catalog: queer theory and the politics of correction,” library quarterly: information, community, policy 83, no. 2 (april 2013), https://www.jstor.org/stable/10.1086/669547. 10 sara a. howard and steven a. knowlton, “browsing through bias: the library of congress classification and subject headings for african american studies and lgbtqia studies,” library trends 67, no. 1 (summer 2018), https://doi.org/10.1353/lib.2018.0026. 11 kristen hogan, “‘breaking secrets’ in the catalog: proposing the black queer studies collection at the university of texas at austin,” progressive librarian 34 (2010), http://www.progressivelibrariansguild.org/pl/pl34_35/050.pdf. 12 j. l. colbert [ https://orcid.org/0000-0001-5733-5168], “patron-driven subject access: how librarians can mitigate that ‘power to name’,” in the library with the lead pipe, november 15, 2017, http://www.inthelibrarywiththeleadpipe.org/2017/patron-driven-subject-access-howlibrarians-can-mitigate-that-power-to-name/. http://homosaurus.org/about https://www.w3.org/tr/skos-reference/ https://www.w3.org/2009/08/skos-reference/skos.html#exactmatch https://muse.jhu.edu/article/610078 http://ailasacc.pbworks.com/f/biasclassification2004.pdf https://doi.org/10/gf542p https://www.jstor.org/stable/10.1086/669547 https://doi.org/10.1353/lib.2018.0026 http://www.progressivelibrariansguild.org/pl/pl34_35/050.pdf http://www.inthelibrarywiththeleadpipe.org/2017/patron-driven-subject-access-how-librarians-can-mitigate-that-power-to-name/ http://www.inthelibrarywiththeleadpipe.org/2017/patron-driven-subject-access-how-librarians-can-mitigate-that-power-to-name/ information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 13 13 j. l. colbert, “patron-driven subject access.” 14 avril johnson madison and dorothy porter wesley, “dorothy burnett porter wesley: enterprising steward of black culture,” public historian 17, no. 
1 (winter 1995): 25, https://www.jstor.org/stable/3378349; janet sims-wood, dorothy porter wesley at howard university: building a legacy of black history (charleston, sc: the history press, 2014), 39; zita cristina nunes, “cataloging black knowledge: how dorothy porter assembled and organized a premier africana research collection,” perspectives on history: the news magazine of the american historical association (november 20, 2018), https://www.historians.org/publications-and-directories/perspectives-on-history/december2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premierafricana-research-collection. 15 hope a. olson and dennis b. ward, “feminist locales in dewey’s landscape: mapping a marginalized knowledge domain,” in knowledge organization for information retrieval: proceedings of the sixth international study conference on classification research (the hague, netherlands: international federation for information documentation, 1997), 129. 16 dorothy berry, “digitizing and enhancing description across collections to make african american materials more discoverable on umbra search african american history,” the design for diversity learning toolkit, northeastern university libraries, august 2, 2018, https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-acrosscollections-to-make-african-american-materials-more-discoverable-on-umbra-search-africanamerican-history/. 17 alexis a. antracoli et al., anti-racist description resources (philadelphia, pa: archives for black lives in philadelphia, 2019), i, https://archivesforblacklives.files.wordpress.com/2019/10/ardr_final.pdf. 18 antracoli et al., “anti-racist description resources,” 5. 19 diana k. wakimoto, debra l. hansen, and christine bruce, “the case of llace: challenges, triumphs, and lessons of a community archives,” american archivist 76, no. 2 (fall/winter 2013), http://www.jstor.org/stable/43490362. 20 according to the article, “the word queer is used throughout this article as the most general, over-arching term to describe communities and individuals who support llace and make it possible.” diana k. wakimoto et al., “case of llace,” 439; dee michel, ed., gay studies thesaurus, rev. ed. (urbana, il, 1990). 21 catelynne sahadath, “classifying the margins: using alternative classification schemes to empower diverse and marginalized users,” feliciter 59, no. 3 (june 2013): 16. 22 monica martens, “creating a supplemental thesaurus to lcsh for a specialized collection: the experience of the national indian law library,” law library journal 98, no. 2 (spring 2006). 
https://www.jstor.org/stable/3378349 https://www.historians.org/publications-and-directories/perspectives-on-history/december-2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premier-africana-research-collection https://www.historians.org/publications-and-directories/perspectives-on-history/december-2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premier-africana-research-collection https://www.historians.org/publications-and-directories/perspectives-on-history/december-2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premier-africana-research-collection https://www.historians.org/publications-and-directories/perspectives-on-history/december-2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premier-africana-research-collection https://www.historians.org/publications-and-directories/perspectives-on-history/december-2018/cataloging-black-knowledge-how-dorothy-porter-assembled-and-organized-a-premier-africana-research-collection https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-across-collections-to-make-african-american-materials-more-discoverable-on-umbra-search-african-american-history/ https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-across-collections-to-make-african-american-materials-more-discoverable-on-umbra-search-african-american-history/ https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-across-collections-to-make-african-american-materials-more-discoverable-on-umbra-search-african-american-history/ https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-across-collections-to-make-african-american-materials-more-discoverable-on-umbra-search-african-american-history/ https://des4div.library.northeastern.edu/digitizing-and-enhancing-description-across-collections-to-make-african-american-materials-more-discoverable-on-umbra-search-african-american-history/ https://archivesforblacklives.files.wordpress.com/2019/10/ardr_final.pdf http://www.jstor.org/stable/43490362 information technology and libraries september 2021 mitigating bias in metadata | hardesty and nolan 14 23 sharon farnel et al., “rethinking representation: indigenous peoples and contexts at the university of alberta libraries,” international journal of information, diversity, & inclusion 2, no. 3 (2018), https://doi.org/10.33137/ijidi.v2i3.32190. 24 alcts is a division of the american library association— http://www.ala.org/alcts/mgrps/camms/cmtes/ats-ccssac; sac working group, “report of the sac working group on alternatives to lcsh ‘illegal aliens,’” american library association institutional repository, submitted june 19, 2020, https://alair.ala.org/bitstream/handle/11213/14582/sac20-ac_report_sac-working-groupon-alternatives-to-lcsh-illegal-aliens.pdf. 25 this is a moment to acknowledge the work of several homosaurus editorial board members, including brian m. watson, who is studying and working with linked data at university of british columbia; chloe noland from american jewish university; and walter “cat” walker from the william h. hannon library and one national gay and lesbian archives. there was never a request to add these lcsh term connections, but the timing was incredibly helpful, and the effort greatly appreciated. 26 example search for term “love” that results in browsable terms in a visual interface: https://www.ihlia.nl/search/index.jsp?q%3asearch=love&q%3azoekterm.row1.field3=&lang =en. 
27 "gender nonconformity," (search results, iucat, indiana university, accessed march 2, 2021), https://iucat.iu.edu/?utf8=%26%2310004%3b&search_field=subject&q=gender+nonconformity.

28 bc first nations subject headings (vancouver, bc: xwi7xwa library first nations house of learning, march 2, 2009), http://branchxwi7xwa.sites.olt.ubc.ca/files/2011/09/bcfn.pdf; "glossary of disability terms," north carolina council on developmental disabilities, accessed march 8, 2021, https://nccdd.org/welcome/glossary-and-terms/category/glossary-of-disability-terms; "search in the women's thesaurus," atria—institute on gender equality and women's history, accessed march 8, 2021, https://institute-genderequality.org/library-archive/collection/thesaurus.

current trends and goals in the development of makerspaces at new england college and research libraries
ann marie l. davis
information technology and libraries | june 2018
ann marie l. davis (davis.5257@osu.edu) is faculty librarian of japanese studies at the ohio state university.

abstract

this study investigates why and which types of college and research libraries (crls) are currently developing makerspaces (or an equivalent space) for their communities. based on an online survey and phone interviews with a sample population of crls in new england, i found that 26 crls had or were in the process of developing a makerspace in this region. in addition, several other crls were actively promoting and diffusing the maker ethos. of these libraries, most were motivated to promote open access to new technologies, literacies, and stem-related knowledge.

introduction and overview

makerspaces, alternatively known as hackerspaces, tech shops, and fab labs, are trendy new sites where people of all ages and backgrounds gather to experiment and learn.
born of a global community movement, makerspaces bring the do-it-yourself (diy) approach to communities of tinkerers using technologies including 3d printers, robotics, metaland woodworking, and arts and crafts.1 building on this philosophy of shared discovery, public libraries have been creating free programs and open makerspaces since 2011.2 given their potential for community engagement, college and research libraries (crls) have also been joining the movement in growing numbers.3 in recent years, makerspaces in crls have generated positive press in popular and academic journals. despite the optimism, scholarly research that measures their impact is sparse. for example, current library and information science literature overlooks why and how various crls choose to create and maintain their respective makerspace. likewise, there is scant data on the institutional objectives, frameworks, and experiences that characterize current crl makerspace initiatives.4 this study begins to fill this gap by investigating why and which types of crls are creating makerspaces (or an equivalent room or space) for their library communities. specifically, it focuses on libraries at four-year colleges and research universities in new england. throughout this study, makerspace is used interchangeably with other terms, including maker labs and innovation spaces, to reflect the variation in names and objectives that underlie the current trends. in exploring their motives and experiences, this article provides a snapshot of the current makerspace movement in crls. mailto:davis.5257@osu.edu current trends and goals in the development of makerspaces | davis 95 https://doi.org/10.6017/ital.v37i2.9825 the study finds that the number of crls actively involved in the makerspace movement is growing. in addition to more than two dozen that have or are in the process of developing a makerspace, another dozen crls have staff who support the diffusion of maker technologies, such as 3d printing and crafting tools that support active learning and discovery, in the campus library and beyond.5 comprising research and liberal arts schools, public and private, and small and large, the crls involved with makerspaces are strikingly diverse. despite these differences, this population is united by common objectives to promote new literacies, provide open access to new technologies, and foster a cooperative ethos of making. literature review the body of literature on library makerspaces is brief, descriptive, and often didactic. given the newness of the maker movement in public and academic libraries, many articles focus on early success stories and defining the movement vis-à-vis the mission of the library. for instance, laura britton, known for having created the first makerspace in a public library (the fayetteville free library’s fabulous laboratory), defines a makerspace as “a place where people come together to create and collaborate, to share resources, knowledge, and stuff.”6 this definition, she determines, is strikingly similar to that of the library. most literature on makerspaces appears in academic blogs, professional websites, and popular magazines. 
among the most frequently cited is tj mccue’s article, which celebrates britton’s (née smedley) fablab while distilling the intellectual underpinnings of the makerspace ethos.7 phillip torrone, editor of make: magazine, supports smedley’s project as an example of “rebuilding” or “retooling” our public spaces.8 within this camp, david lankes, professor of information studies at syracuse university, applauds such work as activist and community-oriented librarianship.9 many authors emphasize the philosophical “fit,” or intersection, of public makerspaces with the principles of librarianship. building on torrone’s work, j. l. balas claims that creating access to resources for learning and making is in keeping with the “library’s historical role of providing access to the ‘tools of knowledge.’”10 others emphasize the hands-on, participatory, and intergenerational features of the maker movement, which has the potential to bridge the digital divide.11 still others identify areas of literacy, innovation, and ste(a)m skills where library makerspaces can have a broad impact. while public libraries often focus on early childhood or adult education, crls adopt separate frameworks for information literacy. like public libraries, they aim to build (meta)literacies and ste(a)m skills. nevertheless, their programs often tailor to curricular goals in the arts and sciences or specialized degrees in engineering, education, and business. this is especially true of crls situated within large, research-intensive universities. considering their specific missions and aims, this study seeks to identify the goals and challenges that reinforce the development of makerspaces in undergraduate and research environments. research design and method data presented in this study was gathered from library directors (or their designees) through an online survey and oral telephone interviews. after choosing a sampling frame of crls in new england, i developed a three-path survey, sent invitations, and collected and analyzed data using the online platform surveymonkey. the survey was distributed following review by the information technology and libraries | june 2018 96 institutional review board (irb) at southern connecticut state university, where i completed a master of library science (mls) degree. survey population to assess generalized findings for the larger population in north america, i chose a clustersampling approach that limited the survey population to the crls in new england. in generating the sampling frame, i included four-year and advanced-degree institutions based on the assumption that libraries at these schools supported specialized, research, or field-specific degrees. i omitted for-profit and two-year institutions, based on the assumption that they are driven by separate business models. this process generated a contact list of 182 library directors at the designated crls in connecticut, maine, massachusetts, new hampshire, rhode island, and vermont. survey design the purpose of the survey was to gather basic data about the size and structure of the respondents’ institutions and to gain insights on their views and practices regarding makerspaces (the survey is reproduced in the appendix). the first page of the survey contained a statement of consent, including my contact information and that of my irb. after a short set of preliminary questions, the survey branched into one of three paths based on respondents’ answers about makerspaces. 
the respondents were thus categorized into one of three groups: path one (p1) for those with no makerspace and no plans to create one, path two (p2) for those with plans to develop a makerspace in the near future, and path three (p3) for those already running a makerspace in their libraries. p3 was the longest section of the survey, containing several questions about p3 experiences with makerspaces such as staffing, programing, and objectives. data collection in summer 2015, brief email invitations and two reminders were sent to the targeted population.12 to increase the participation rate, i sometimes wrote personal emails and made direct phone calls to crls known to have makerspace. for cold-call interviews, i developed a script explaining the nature of the online survey. after obtaining informed consent, i proceeded to ask the questions in the online survey and manually enter the participants’ responses at the time of the interview. on a few occasions, online respondents followed up with personal emails volunteering to discuss their library’s experiences in more detail. i took advantage of these invitations, which often provided unique and welcome insights. in analyzing the responses, i used tabulated frequencies for quantitative results and sorted qualitative data into two different categories. the first category was identified as “short and objective” and coded and analyzed numerically. the longer, more “subjective and value-driven” data was analyzed for common trends, relationships, and patterns. within this second category, i also identified outlier responses that suggested possible exceptions to common experiences. results the survey closed after one month of data collection. at this time, 55 of 182 potential respondents had participated, yielding a response rate of 30.2%. among these participants, the survey achieved a 100.0% response rate (9 completed surveys of 9 targeted crls) among libraries that were current trends and goals in the development of makerspaces | davis 97 https://doi.org/10.6017/ital.v37i2.9825 currently operating makerspaces. i created a list of all known crl makerspaces in new england based on an exhaustive website search of all crls in this region. subsequent interviews with the managers of the makerspaces on this list revealed no other hidden or unknown makerspaces in this region. of the 55 respondents, 29 (52.7%) were in p1, 17 (30.9%) were in p2, and 9 (16.4%) were in p3. (see figure 1.) figure 1. survey participants’ (n = 55) current crl efforts and plans to develop and operate a makerspace. among respondents in p2 and p3, the majority (13 of 23) indicated that they were from libraries that served a student population of 4,999 people or fewer, while only one library served a population of 30,000 or more (see figure 2). in terms of sheer numbers, makerspaces might seem to be gaining traction at smaller crls, but proportionally, one cannot say that smaller crls are adopting makerspaces at a higher rate because the majority of survey participants have student populations of 19,999 or less (51, or 91.1%). the number of institutions with populations over 20,000 were in a clear minority (5, or 8.9%). (see figure 3.) information technology and libraries | june 2018 98 figure 2. p2 and p3 crls with makerspaces or concrete plans to develop a makerspace. figure 3. the majority of crls (67.2%) that participated in the survey had a population of 4,999 students or less. only 1.8% of schools that participated had a population of 30,000 students or more. 
current trends and goals in the development of makerspaces | davis 99 https://doi.org/10.6017/ital.v37i2.9825 crls with no makerspace (p1 = 29) in the first part of the survey, the majority of p1 respondents demonstrated positive views toward makerspaces despite having no plans to create one in the near future. budgetary and space limitations aside, many were relatively open to the possibility of developing a makerspace in a more distant future. in the words of one respondent, “we have several areas within the library that present a heavy demand on our budget. in [the] future, we would love to consider a makerspace, and whether it would be a sensible and appropriate investment that would benefit our students.” when asked what their reasons were for not having a makerspace, some respondents (8, or 27.6%) said they had not given it much thought, but most (21, or 72.4%) offered specific answers. among these, the most frequently cited reason (11, or 37.8%) was that a library makerspace would be redundant: such spaces and labs were already offered in other departments within the institution or in the broader community. at one crl, for example, the respondent said the library did not want to compete with faculty initiatives elsewhere on campus. other reasons included that makerspaces were expensive and not a priority. some (5, or 17.2%) libraries preferred to allocate their funds to different types of spaces such as “a very good book arts studio/workshop” or “simulation labs.” some (6, or 20.6%) shared concerns about a lack of space, staff, or simply “a good culture of collaboration [on campus].” merging these sentiments, one respondent concluded, “people still need the library to be fairly quiet. . . . having makerspace equipment in our library would be too distracting.” while some were skeptical (sharing concerns about potential hazards or that makerspaces were simply “the flavor of the month”), the majority (roughly 60%) were open and enthusiastic. one respondent, in fact, held a leadership position in a community makerspace beyond campus. according to this librarian, 3d printers, scanners, and laser cutters were sure to become more common, and crls would no doubt eventually develop “a formal space for making stuff.” crls with plans for a makerspace in the near future (p2 = 17) the second section of the survey (p2) focused primarily on the motivations and means by which this cohort planned to develop a makerspace. when asked why they were creating a makerspace, the most common response was to promote learning and literacy (15 respondents, or 88.2%). in addition, a large majority (12 respondents, or 70.6%) felt that makerspaces helped to promote the library as relevant, particularly in the digital age. three more reasons that earned top scores (10 respondents each, or 58.2%) were being inspired by the ethos of making, creating a complement to digital repositories and scholarship initiatives, and providing access to expensive machines or tools. additional reasons included building outreach and responding to community requests.13 (see figure 4.) information technology and libraries | june 2018 100 figure 4. rationale behind p2 respondents’ decision to plan a makerspace (n = 17). while p2 respondents indicated a clear decision to create a makerspace, their timeframes were noticeably different. 
i categorized their open responses into one of six timeframes: “within six months,” “within one year,” “within two years,” “within four years,” “within six years,” and “unknown.” the result presented a clear trimodal distribution with three subgroups: six crls with plans to open within 18 months, five with plans to open within the next two years, and six with plans to open after three or more years (see figure 5). in addition to their timeframe, p2 respondents were also asked about their plans for financing their future makerspaces. based on their open responses, the following six funding sources emerged: • the library budget, including surplus moneys or capital project funds • internal funding, including from campus constituents • donations and gifts • external grants • cost recovery plans, including small charges to users • not sure/in progress current trends and goals in the development of makerspaces | davis 101 https://doi.org/10.6017/ital.v37i2.9825 figure 5. p2 respondents’ timeframe for developing the makerspace (n = 17). with seven mentions, the most common of the above funding was the “library budget.” with two mentions each, the least common sources were “cost recovery” and “not sure/in progress.” among those who mentioned external grant applications, one respondent mentioned a focus on women and stem opportunities, and another specifically discussed attempts at grants from the institute of museum and library services. (see figure 6.) figure 6. p2respondents’ plans for gathering and financing makerspace (n = 17). regarding target user groups, some respondents focused on opportunities to enhance specific disciplinary knowledge, while others emphasized a general need for creating a free and open environment. one respondent mentioned that at her state-funded library, the space would be “geared to younger [primary and secondary school] ages,” “student teachers,” and “librarians on practicum assignments.” by contrast, another respondent at a large, private, carnegie r1 information technology and libraries | june 2018 102 university emphasized that the space was earmarked for the undergraduate and graduate students. in contrast to the cohort in p1, a notable number in p2 chose to create a makerspace despite the existence of maker-oriented research labs elsewhere on campus. as one respondent noted, the university was still “lacking a physical space where people could transition between technologies” and an open environment “where students doing projects for faculty” could come, especially later in the evenings. another respondent at a similarly large, private institution explained that his colleagues recognized that most labs at their university were earmarked for specific professional schools. as a result, his colleagues came up with a strategy to provide self-service 3d printing stations at the media center, located in the library at the heart of campus. crls with operating makerspaces (p3 = 9) the final section of the survey (p3) focused on the motivations and means by which crls with makerspaces already in operation chose to develop and maintain their sites. in addition, this section gathered information on p3 crl funding decisions, service models, and types of users in their makerspaces. of the nine respondents in this path, all had makerspaces that had opened within the last three years. among these, roughly a third (4) had been in operation from one to two years; another third (3) had operated for two to three years; and two had opened within the last year. (see table 1.) 
table 1. length of time the crl makerspace has been in operation for p3 respondents (n = 9). age of crl makerspace or lab—p3 answer options responses % less than 6 months 1 11.1 6–12 months 1 11.1 1–2 years 4 44.4 2–3 years 3 33.3 more than 3 years 0 0.0 total responses 9 100.0 priorities and rationale the reasons behind p3 decisions to make a makerspace were slightly different from those of p2. while “promoting literacy and learning” was still a top priority, two other reasons, “promoting the maker culture of making” and “providing access to expensive machinery,” were deemed equally important (6 respondents, or 66.7%, for each). other significant priorities included “promoting community outreach” (4 respondents, or 44.4%), “promoting the library as relevant” and in “direct response to community requests” (3 respondents, or 33.3%, for each). (see figure 7.) current trends and goals in the development of makerspaces | davis 103 https://doi.org/10.6017/ital.v37i2.9825 figure 7. rationale behind p3 respondents’ decision to develop and maintain a makerspace (n = 9). the answer of “other” was also given top priority (5 respondents, or 55.6%). i conclude that this indicated a strong desire among respondents to express in their own words their library’s unique decisions and circumstances. (their free responses to this question are discussed below.) a familiar theme in the responses of the five respondents who elaborated on their choice of “other” was the desire to situate a makerspace in the central and open environment of the campus library. as one participant noted, there were “other access points and labs on campus,” but those labs were “more siloed” or cut off from the general population. by contrast, the campus library aimed to serve a broader population and anticipated a general “student need.” later, the same respondent added that the makerspace was an opportunity to promote social justice, cultivate student clubs, and encourage engagement at the hub of the campus community. this type of ecumenical thinking was manifested in a similar remark that the library’s role was to reinforce other learning environments on campus. one respondent saw the makerspace as an additional resource “that complemented the maker opportunities that we have had in our curriculum resource center for decades.” likewise, the library makerspace was intended to offer opportunities to a range of users on campus and beyond. funding, staffing, and service models when prompted to discuss how they gathered the resources for their makerspaces, the largest group (4 respondents) stated that a significant means for funding was through gifts and donations. thus, the majority of crl makerspaces in new england depended primarily on contributions from friends of the library, university/college alumni, and donors. the second most common source (3 respondents) was through the library budget, including surplus money at the end of the year. making use of grant money and cost recovery were mentioned by two library participants, and internal and constituent support was useful for two libraries. (see figure 8.) information technology and libraries | june 2018 104 figure 8. p3 methods for gathering and financing a makerspace (n = 9). among these, a particularly noteworthy case was a makerspace that had originated from a new student club focused on 3d printing. originally based in a student dorm, the club was funded by a campus student union, which allocated grant money to students through a budget derived from the college tuition. 
as the club quickly grew, it found significant support in the library, which subsequently provided space (on the top floor of the library), staff, and financial support from surplus funds in the library budget. as this example would suggest, the sum of the responses showed that financing the makerspaces depended on a combination of strategies. one participant summarized it best: “we’ve slowly accumulated resources over time, using different funding for different pieces. some grant funding. mostly annual budget.” regarding service models, more than half of these libraries (five) currently offer a combination of programming and open lab time where users could make appointments or just drop in. by contrast, two of the libraries offered programs only, and did not offer an open lab; another two did the opposite, offering no programming but an open makerspace at designated times. of the latter, one is open monday to friday from 8 a.m. to 4 p.m., and the other is open during regular hours, with spaces that “can be booked ahead for classes or projects.” most labs supported drop-in visitors and were open evenings and weekends. at one makerspace, where there was increasingly heavy demand, the staff required students to submit proposals with project goals. (see table 2.) while some libraries brought in community experts, others held faculty programs, and some scheduled lab time for individual classes. one makerspace prioritized not only the campus, but also the broader community, and thus featured programs for local high schools and seniors. responses from this library emphasized the social justice thread that inspired their work and the community culture that they aimed to foster. current trends and goals in the development of makerspaces | davis 105 https://doi.org/10.6017/ital.v37i2.9825 table 2. model for services offered in the crl makerspace or 3d printing lab do you offer programs in the makerspace/lab or is it simply opened at defined times for users to use? answer options responses % yes, we offer the following types of programs. 2 22.2 no, we simply leave the makerspace/lab open at the specific times. 2 22.2 we do both. we offer the programs and leave the makerspace/lab open at specific times. 5 55.6 as this data would suggest, most makerspaces were used by students (undergraduates and graduates) and faculty, in addition to local experts and generational groups. survey responses showed that undergraduate students were the most common users (9 of 9 respondents checked this group as the most frequent type of user), and faculty and graduate students were the second and third most common (8 of 9 respondents checked these groups as most frequent) user groups in the labs. local entrepreneurs, artists, designers, craftspeople, and campus and library staff also use the makerspaces. (see figure 9.) when prompted to identify “other” categories, one respondent specifically listed “learners, makers, sharers, studiers, [and] clubs.” figure 9. of the different types of users listed above, p3 respondents ranked them in order of who used the makerspace or equivalent lab most often (n = 9). the number and type of staff that managed and operated the makerspaces also varied widely at the nine crls in p3. seven of the crls employed full-time, dedicated staff, among whom four participants checked off the “dedicated staff”–only options. 
of the remaining two crls, one information technology and libraries | june 2018 106 reported staffing the makerspace with only one student, and one reported not having any staff working in the makerspace. i assume that the makerspace with no employees is managed by staff and students who are assigned to other, unspecified library departments or work groups. (see figure 10.) figure 10. the staffing situations at the p3 respondents (n = 9), where each respondent is assigned a letter from “a” to “i.” library programing was also diverse in terms of targeted audiences, speakers, and learning objectives. instructional workshops varied from 3d scanning and printing to soldering, felt making, sewing, knitting, robotics, and programming (e.g., raspberry pi.) the type of equipment contained in each lab is likely correlated to the range in programming; however, investigating these links was beyond the scope of this study. regarding this equipment, the size and activity of the participant crls varied considerably. some responses were more specific than others, and thus the resulting dataset was incomplete (see table 3.) challenges and philosophies of crl makerspaces the final portion of the survey invited participants to freely offer their thoughts about operating a crl makerspace. what follows below is a summary of the two most prominent themes that emerged: the challenges of building the lab and the social philosophies that framed these initiatives. in terms of challenges, the most common hurdle noted was the tremendous learning curve involved in establishing, maintaining, and promoting a makerspace. setting up some of the 3d printers, for example, required knowledge about electrical networks, computer systems, and safety policies at a federal and local level. once the hardware was running, lab managers needed to know how the machines interfaced with different and challenging software applications. communication skills were also critical, as one respondent reported, “printing anything and everything takes knowledge, experience.” communicating with stakeholders and users in accessible and proactive ways required strong teaching and customer service skills. current trends and goals in the development of makerspaces | davis 107 https://doi.org/10.6017/ital.v37i2.9825 table 3. the types of tools and equipment used at p3 crl respondents (n = 8), which are assigned letters from a to h. major equipment offered by individual library makerspaces or equivalent labs—path 3 crl label response text a die cut machine, 3d printer, 3d pens, raspberry pi, arduino, makey makey, art supplies, sewing supplies, pretty much anything anyone asks for we will try to get. b 2 makerbot replicators, 1 digital scanner, 1 othermill c 3d printing, 3d scanning, and laser cutting. d 3d printing, 3d scanning, laser cutting, vinyl cutting, large format printing, cnc machine, media production/postproduction. e no response f 3 creatorx, 1 powerspec, 3 m3d, 2 replicator 2, 1 replicator2x, 1 makergear, 1 leapfrogxl, 1 ultimaker, 1type a,1 deltaprinter, 1 delta maker, 2 printrbot, 2 filabots, 2x-box kinect for scanning, 2 oculus rifts, embedded systems cabinet with soldering stations, solar panels and micro controllers etc, 1 formlabs sla, 1 muve sla, rova 5, a bunch of quadcopters g 3d printers (4 printers, 3 models), 3d scanning/digitizing equipment (3 models), raspberry pi, arduino, a laser cutter and engraving system, poster printer, digital drawing tablets, gopro, a variety of editing and design software, a number of tools (e.g. 
dremel, soldering iron, wrenches, pliers, hammers, etc.), and a number of consumable or misc. items (e.g. paint, electrical tape, acetone, safety equipment, led lights, screws and nails, etc.) h 48 printers (all makerbot brand), 35 replicator 5th gen (a moderate size printer, 5 replicator z18 printers (larger built size), and 5 replicator minis, 3 replicator 2x) 5 makerbot digitzers (turntable scanners 8" by 8") 1 cubify sense hand scanner 7 still cameras for photogrammetry 21 i-mac computers 2 mac pros 2 wacom graphics tablets (thinking about complementing other resources at other labs on campus) another challenge that often came up was that of managing resources. as one respondent warned, crls should beware the “early adoption of certain technologies,” which can become “quickly information technology and libraries | june 2018 108 outdated by a rapidly growing field.” for others, it was a challenge to recruit the right staff that could run and fix machines in constant need of repair. in addition to hiring people with manufacturing and teaching skills, a successful lab required individuals who were savvy about outreach and community needs. despite such challenges, many respondents were eager to discuss the aspirations and rewards of crl makerspaces. above all, respondents focused on the pedagogical opportunities on the one hand, and the potential for outreach and social justice on the other. one participant conceded that measuring advances in literacy and education was “intangible,” but he saw great value in “giving students the experience of seeing their ideas come to fruition.” the excitement that this created for one student manifested in a buzz, and subsequently a “fever” or groundswell, in which more users came in to tinker and learn. meanwhile, the learning that took place among future professionals on campus was “critical,” even when results did not “go viral.” the aspiration to create human connections within and beyond campus was another striking theme. according to one respondent, the makerspace had “enabled some incredibly fruitful collaborations with different departments on campus.” this “fantastic outcome” was becoming more and more visible as the maker community grew. other crl makerspaces took pride in fostering a type of learning that was explicitly collaborative, exciting, and even “fun” for users. this in turn meant that some libraries were becoming “very popular,” generating a lot of “good pr,” and becoming central in the lives of new types of library users. along these lines, some respondents aimed to leverage the power of the makerspace to achieve social justice goals that resonated with core values of librarianship. according to one enthusiastic participant, the ethos of sharing was alive and strong among the staff and the many students who saw their participation in the lab as a lifestyle and culture of collaborating. in another initiative, the respondent looked forward to eventually offering grants to those users who proposed meaningful ways to use the makerspace to create practical value for the community. from this perspective, there was added value in having the 3d printing lab situated specifically on a college or university campus. 
according to this respondent, the unique quality of the crl makerspace was that by virtue of its location amid numerous and energetic young people, it was ripe for exploitation by those “who had great ideas and time and energy to do good.” discussion the aim of this study was to explore why and which types of crls had developed makerspaces (or an equivalent space) for their communities. of the 56 respondents, roughly half (46%) were p2 and p3 libraries who were currently developing or operating a makerspace, respectively. data from this survey indicated that none of the p2 or p3 crls fit a mold or pattern in terms of their size, educational models, or classifications. upon analyzing the data, i found that the differentiators between the three groups were less clearly defined than originally anticipated. in one example of blurred lines, at least two respondents in p1 indicated that they were more actively engaged with makerspaces than two respondents in p2. despite not having physical labs within their libraries, these p1 respondents were in the process of actively supporting or making plans for a makerspace within their crl community. one p1 respondent, for example, served on the planning board for a local community makerspace and had therefore “thoroughly investigated and used” the makerspace at a current trends and goals in the development of makerspaces | davis 109 https://doi.org/10.6017/ital.v37i2.9825 neighboring university. based on his knowledge, he decided to develop a complementary initiative (e.g., a book arts workshop) at his university library. although his library did not yet have a formal makerspace, he felt confident that the diffusion of 3d printers would come to his library in the near future. another p1 respondent was responsible for administering faculty teaching and innovation grants. among the recent grant recipients were two faculty collaborators who used the library’s funds to build a makerspace at a campus location that was separate from the library. although the makerspace was not directly developed by the respondent’s library, it was nevertheless a direct product of his library’s programmatic support. the respondent reported that for this reason, his library did not want to compete with its own faculty initiatives. in another example of blurred distinctions, one librarian in p2 was as deeply immersed in providing access and education on makerspaces as his colleagues in p3. although he was not clear on when or how his library would finance a future makerspace, his library already offered many of the same services and workshops as p3 libraries. as a “maker in the library,” he offered noncredit-bearing 3d printing seminars to students and offered trial 3d printing services in the library for graduates of the 3d printing seminar. in addition, he made appearances at relevant campus events. when the university museum ran a 3d printing day, for instance, he participated as an expert panelist and gave public demonstrations on library-owned 3d printers and a scanner kinect bar. in sum, despite the respondents’ categorization in p1 and p2, they sometimes shared more in common with the cohorts in p2 and p3, respectively. given their library’s programmatic involvement in creating and endorsing the maker movement, these respondents were more than just “interested” or “open to” the prospect of creating a makerspace. while only 16% of crls (p3 = 9) responded as actively operating a makerspace, another 30% (p2 = 17) were involved in developing a makerspace in the near future. 
moreover, the number of crls formally involved with the diffusion of maker technologies was not limited to just these two groups. although some makerspaces were not directly run by the library, they had come to fruition because of librarybased funding, grants, and professional support. and although some libraries did not have immediate plans for a makerspace, they were already promoting maker technologies and the maker ethos in other significant ways. conclusion this study is one of the first comprehensive and comparative studies on crl makerspace programs and their respective goals, policies, and outcomes. while the number of current crl makerspaces is relatively low, the data suggests that the population is increasing; a growing number of crls are involved in the makerspace movement. more than two dozen crls were planning to develop makerspaces in the near future, helping to diffuse maker technologies through crl programming, and/or supporting nonlibrary maker initiatives on campus and beyond. in addition, some crls were buying equipment, hiring dedicated staff, offering relevant workshops and demonstrations, and supporting community efforts to build labs beyond the library. although the author aimed to find structural commonalities between crls in groups p2 and p3, none were found. respondents in these groups came from institutions of all sizes , a wide variety information technology and libraries | june 2018 110 of endowment levels, and both public and private funding models, and they ranged in emphasis from the liberal arts to professional certifications and graduate-level research. although a majority of crl respondents were not currently making plans to create a makerspace, many respondents were enthusiastic about current trends, and some even promoted the maker movement in unexpected ways. acknowledging the steady diffusion of 3d printers, many anticipated using such technologies in the future to promote traditional library values and goals. respondents in p2 and p3 indicated that their primary rationale for developing a makerspace was to promote learning and literacy. other prominent reasons included promoting library outreach and the maker culture of learning. data from crls with makerspaces indicated that these benefits were often symbiotic and correlated to strong ideas about universal access to emergent tools and practices in learning. unexpected challenges for developing and operating makerspaces include staffing them with highly skilled, knowledgeable, and service-oriented employees. learning the necessary skills— including operating the printers, troubleshooting models, and maintaining a safe environment, to name a few—was time-consuming and labor intensive. the majority of funding for crls with or planning maker labs came from internal budgets, gifts and donors, and some grants. while some p1 crls indicated that their reason for not developing makerspaces was a lack of community interest, p2 and p3 crls were not necessarily motivated by user requests or needs, nor was lack of explicit need or interest a deterrent. on the contrary, a few reported a desire to promote the campus library as ahead of the curve by keeping in front of student and community needs. in a similar contradiction, some p1 respondents reported that their libraries did not want to compete with other labs on campus. respondents from p2 and p3, however, wanted to offer an alternative to the more siloed or structured model of departmentor lab-funded makerspaces. 
although makerspaces were sometimes forming in other parts of campus, some p2 and p3 crls felt there was a gap in accessibility and therefore aimed to offer more open and flexible spaces. a final salient theme among p2 and p3 respondents was their commitment to equity of access and issues of social justice. above all, they saw a unique fit for makerspaces in their crl philosophies to serve the greater good. among other advantages, crls were in a unique position to leverage the power of the makerspaces to take advantage of campus communities of “cognitive surplus” and millennial aspirations to share and create spontaneous communities of knowledge. given the amount of resources that are required to create and maintain a makerspace, this research will be useful for crls considering such a space in the future. the present data suggests that no one type of library currently has a monopoly on maker spaces; regardless of size or funding levels, the common thread among p2 and p3 crls was simply a commitment to providing access to emergent technologies and supporting new literacies. while annual budgets and grant applications were critical for some libraries, the majority of crls funded the bulk of their makerspaces through gifts and donations. future studies on the characteristics and challenges of p2 and p3 populations beyond those in new england will certainly amplify our understanding of these trends. current trends and goals in the development of makerspaces | davis 111 https://doi.org/10.6017/ital.v37i2.9825 appendix: survey questions informed consent current trends in the development of makerspaces and 3d printing labs at new england college and research libraries consent for the participation in a research study southern connecticut state university purpose you are invited to participate in a research project conducted by ann marie l. davis, a masters student in library and information studies at southern connecticut state university. the purpose of this project is to investigate the experiences and goals of college and research libraries (crls) that currently have or are making plans to have an open makerspace (or an equivalent room or space). the results from this study will be included in a special project report for the mls degree and the basis for an article to submit for peer-review. procedures if you decide to participate, you will volunteer to take a fifteen-minute online survey. risks and inconveniences there are no known risks associated with this research; other than taking a short amount of time, the survey should not burden you or infringe on your privacy in any way. potential benefits and incentive by participating in this research, you will be contributing to our understanding of current trends and practices with regards to community learning labs in crls. in addition, you will be providing useful knowledge that can support other libraries in making more informed decisions as they potentially develop their own makerspaces in the future. voluntary participation your participation in this research study is voluntary. you may choose not to participate and you may withdraw your consent to participate at any time. you will not be penalized in any way should you decide not to participate or withdraw from this study. protection of confidentiality the survey is anonymous and does not ask for sensitive or confidential information. contact information before you consent, please ask any questions on any aspect of this study that is unclear to you. 
you may contact me at my student email address at any time: xxx@owls.southernct.edu. if you have questions regarding your rights as a research participant, you may contact the southern connecticut state institutional review board at (203) xxx-xxxx. information technology and libraries | june 2018 112 consent by proceeding to the next page, you confirm that you understand the purpose of this research, the nature of this survey and the possible burdens and risks as well as benefits that you may experience. by proceeding, this indicates that you have read this consent form, understand it , and give your consent to participate and allow your responses to be used in this research. acrl survey on makerspaces and 3d printers q1. what is the size of your college or university? • 4,999 students or less • 5,000–9,999 students • 10,000–19,999 students • 20,000–29,999 students • 30,000 students or more q2. how would you categorize your institution? (please check all that apply) • private • public • doctorate-granting university (awards 20 or more doctorates) • master’s college or university (awards 50 or more master’s degrees, but fewer than 20 doctorates) liberal arts and sciences college • other q3. do any of the libraries at your institution have a makerspace or equivalent hands-on learning lab (including a 3-d printing station or lab)? • yes [if “yes,” respondents are directed to question 14] • no [if “no,” respondents are directed to question 4] q4. do any of the libraries at your institution have plans to develop a makerspace or equivalent learning lab in the near future? • yes [if “yes,” respondents are directed to question 8] • no [if “no,” respondents are directed to question 5] path one (crls with no makerspace, no plans for makerspace) q5. are there specific reasons why your institution has decided not to pursue developing a makerspace or equivalent lab in the near future? • no reasons. we have not given much thought to makerspaces for our library. • yes q6. thank you for your participation. would you like a copy of the results when the report is completed? if yes, please enter your email address in the space provided. current trends and goals in the development of makerspaces | davis 113 https://doi.org/10.6017/ital.v37i2.9825 • no • yes (please enter your email address below) q7. you have almost concluded this survey. before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. if no comments, please click “next” to end the survey. path two [crls with plans to build a makerspace] q8. what are the main goals that motivated your library’s decision to develop a makerspace or equivalent lab? (please check all that apply) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs • other q9. of these goals, please rank them in order of their level of priority for your library. (choose “n/a” for goals that you did not select in the previous question) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs • other q10. 
what is your library’s time frame for developing a makerspace or equivalent lab? q11. what are your library’s current plans for gathering and/or financing the resources needed for developing and maintaining the makerspace or equivalent lab? q12. thank you for your participation. would you like a copy of the results when the report is completed? • no • yes (please enter your email address below) q13. you have almost concluded this survey. before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. if no comments, please click “next” to end the survey. information technology and libraries | june 2018 114 path three [crls with a makerspace] q14. how long have you had your makerspace or equivalent learning lab? • less than 6 months • 6–12 months • 1–2 years • 2–3 years • more than 3 years q15. what were the main goals that motivated your library's decision to develop a makerspace or equivalent lab? (please check all that apply) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs other q16. of these goals, please rank them in order of their level of priority for your library. (choose “n/a” for goals that you did not select in the previous question) • promote community outreach • promote learning and literacy • promote the library as relevant • promote the maker culture of making • provide access to expensive machines or tools • complement digital repository or digital scholarship projects • as a direct response to community requests or needs • other q17. how did your library gather and/or finance the resources needed for developing and maintaining the makerspace or equivalent learning lab? q18. do you offer programs in the makerspace/lab or is it simply opened at defined times for users to use? • yes, we offer the following types of programs: • no, we simply leave the makerspace/lab open at the following times (please note times and/or if a reservation is required): • we do both. we offer the following types of programs and leave the makerspace/lab open at the following times (please note types of programs, times open, and if a reservation is required): current trends and goals in the development of makerspaces | davis 115 https://doi.org/10.6017/ital.v37i2.9825 q19. what type of community members tend to use your library's makerspace or equivalent lab most? (please check all that apply) • undergraduate researchers • graduate researchers • faculty • staff • general public • local artists, designers, or craftspeople • local entrepreneurs • other q20. of the cohorts chosen above, please rank them in order of who uses the makerspace or equivalent lab most often. (use “n/a” for cohorts that are not relevant to your space or lab) • undergraduate researchers • graduate researchers • faculty • staff • general public • local artists, designers, or craftspeople • local entrepreneurs • other q21. how many dedicated staff does your library currently employ for the makerspace or equivalent? • 0 • 1 • 2 • 3 • other q22. where is your makerspace or equivalent lab located? q23. what is the title or name of your makerspace or equivalent lab, and if known, what were the reasons behind this particular name? q24. 
what major equipment and services does your library makerspace or equivalent lab provide? q25. what unexpected considerations, challenges, or failures has your library faced in developing and maintaining the makerspace or equivalent lab? q26. how would you assess the benefits or “return on investment” of having a makerspace or equivalent lab? q27. thank you for your participation. would you like a copy of the final results when the report is completed? if yes, please enter your email address in the space provided. information technology and libraries | june 2018 116 • no • yes (please enter your email address below) q28. you have almost concluded this survey. before signing off, please feel free to share your thoughts and comments regarding the makerspace movement in college and research libraries. if no comments, please click “next” to end the survey. references and notes 1 laura britton, “a fabulous laboratory: the makerspace at fayetteville free library,” public libraries 51, no. 4 (july/august 2012): 30–33, http://publiclibrariesonline.org/2012/10/afabulous-labaratory-the-makerspace-at-fayetteville-free-library/; madelynn martiniere, “hack the world: how the maker movement is impacting innovation: from diy geige,” medium, october 27, 2014, https://medium.com/@mmartiniere/hack-the-world-how-the-makermovement-is-impacting-innovation-bbc0b46bd820#.3mnhow4jz. 2 david v. loertscher, “maker spaces and the learning commons,” teacher librarian 39, no. 6 (october 2012): 45–46, accessed december 9, 2016, library, information science & technology abstracts with full text, ebscohost; jon kalish, “libraries make room for high-tech ‘hackerspaces,’” national public radio, december 25, 2011, http://www.npr.org/2011/12/10/143401182/libraries-make-room-for-high-techhackerspaces; diane slatter and zaana howard, “a place to make, hack, and learn: makerspaces in australian public libraries,” australian library journal 62, no. 4: 272–84, https://doi.org/10.1080/00049670.2013.853335. 3 sharon crawford barniskis, “makerspaces and teaching artists,” teaching artist journal 12, no. 1: 6–14. 4 anne wong and helen partridge, “making as learning: makerspaces in universities,” australian academic & research libraries 47, no. 3 (september 2016): 143–59, https://doi.org/10.1080/00048623.2016.1228163. 5 erich purpur et al., “refocusing mobile makerspace outreach efforts internally as professional development,” library hi tech 34, no. 1 (2016): 130–42. 6 britton, “a fabulous laboratory,” 30. 7 tj mccue, “first public library to create a maker space,” forbes, november 15, 2011, http://www.forbes.com/sites/tjmccue/2011/11/15/first-public-library-to-create-a-makerspace/. 8 phillip torrone, “is it time to rebuild and retool public libraries and make ‘techshops’?,” make:, march 20, 2011, http://makezine.com/2011/03/10/is-it-time-to-rebuild-retool-publiclibraries-and-make-techshops/. 9 r. david lankes, “killing librarianship,” (keynote speech, new england library association annual conference, october 3, 2011, burlington, vermont), https://davidlankes.org/killinglibrarianship/. 
10 janet l. balas, “do makerspaces add value to libraries?,” computers in libraries 32, no. 9 (november 2012): 33. 11 balas, “do makerspaces add value to libraries?,” 33; adrian g. smith et al., “grassroots digital fabrication and makerspaces: reconfiguring, relocating and recalibrating innovation?” (working paper, university of sussex, spru working paper swps, falmer, brighton, september 2013), https://doi.org/10.2139/ssrn.2731835. 12 the number of and interval between emails corresponded roughly with dillman’s “five-contact framework” as outlined in carolyn hank, mary wilkins jordan, and barbara m. wildemuth, “survey research,” in applications of social research methods to questions in information and library science, edited by barbara wildemuth, 256–69 (westport, ct: libraries unlimited, 2009), 261. 13 in choosing these priorities, respondents were asked to select as many of the reasons that applied to their own crl.
measuring journal linking success from a discovery service kenyon stuart, ken varnum, and judith ahronheim information technology and libraries | march 2015
abstract online linking to full text via third-party link-resolution services, such as serials solutions 360 link or ex libris’ sfx, has become a popular method of access to users in academic libraries.
this article describes several attempts made over the course of the past three years at the university of michigan to gather data on linkage failure: the method used, the limiting factors, the changes made in methods, an analysis of the data collected, and a report of steps taken locally because of the studies. it is hoped that the experiences at one institution may be applicable more broadly and, perhaps, produce a stronger data-driven effort at improving linking services.
introduction
online linking via vended services has become a popular method of access to full text for users in academic libraries. but not all user transactions result in access to the desired full text. maintaining information that allows the user to reach full text is a shared responsibility among assorted vendors, publishers, aggregators, local catalogers, and electronic access specialists. the collection of information used in getting to full text can be thought of as a supply chain. to maintain this chain, libraries need to enhance the basic information about the contents of each vendor package—a collection of journals bundled for sale to libraries—with added details about local licenses and holdings. these added details need to be maintained over time. since links, platforms, contracts, and subscriptions change frequently, this can be a time-consuming process. when links are unsuccessfully constructed within each system, considerable troubleshooting of a very complex process is required to determine where the problem lies. because so much of the transaction is invisible to the user, linking services have come to be taken for granted by the community, and performance expectations are very high. failure to reach full text reflects poorly on the institutions that offer the links, so there is considerable interest for and value to the institution in improving performance.
kenyon stuart (kstuart@umich.edu) is senior information resources specialist, ken varnum (kvarnum@umich.edu) is web systems manager, and judith ahronheim (jaheim@umich.edu) is head, electronic resource access unit, university of michigan library, ann arbor, michigan.
improving the success rate for users can best be achieved by acquiring a solid understanding of the nature and frequency of problems that inhibit full-text retrieval. while anecdotal data and handling of individual complaints can provide incremental improvement, larger improvement resulting from systematic changes requires more substantial data, data that characterizes the extent of linking failure and the categories of situations that inhibit it.
literature review
openurl link resolvers are “tool[s] that helps library users connect to their institutions’ electronic resources.
the data that drives such a tool is stored in a knowledge base.”1 since the codification of the openurl as an ansi/niso standard in 2004,2 openurl has become, in a sense, the glue that holds the infrastructure of traditional library research together, connecting citations and full text. it is well recognized that link resolution is an imperfect science. understanding what and how openurls fail is a time-consuming and labor-intensive process, typically conducted through analysis of log files recording attempts by users to access a full-text item via openurl. research has been conducted from the perspective of openurl providers, showing which metadata elements encoded in an openurl were most common and most significant in leading to an appropriate full-text version of the article being cited. in 2010, chandler, wiley, and leblanc reported on a systematic approach they devised, as part of a mellon grant, to review the outbound openurls from l’année philologique.3 they began with an analysis of the metadata elements included in each openurl and compared this to the standard. they found that elements critical to the delivery of a full-text item, such as the article’s starting page, were never included in the openurls generated by l’année philologique.4 their work led to the creation of the improving openurls through analytics (iota) working group within the national information standards organization (niso). iota, in turn, was focused on improving openurl link quality at the provider end. “the quality of the data in the link resolver knowledge base itself is outside the scope of iota; this is being addressed through the niso kbart initiative.”5,6 where iota provided tools to content providers for improving their outbound openurls, kbart provided tools to knowledge base and linking tool providers for improving their data. pesch, in a study to validate the iota process, discovered that well-formed openurls were generally successful, however:
the quality of the openurl links is just part of the equation. setting the proper expectations for end users also need to be taken into consideration. librarians can help by educating their users about what is expected behavior for a link resolver and end user frustrations can also be reduced if librarians take advantage of the features most content providers offer to control when openurl links display and what the links say. where possible the link text should indicate to the user what they will get when they click it.7
missing from the standards-based work described above is the role of the openurl middleman, the library. price and trainor describe a method for reviewing openurl data and identifying the root causes of failures.8 through testing of actual openurls in each of their systems, they arrived at a series of steps that could be taken by other libraries to proactively raise openurl resolution success rates.
several specific recommendations include “optimize top 100 most requested journals” and “optimize top ten full text target providers.”9 that is, make sure that openurls leading to content from the most frequently used journals and content sources are tested and are functioning correctly. chen describes a similar analysis of broken link reports derived from bradley university library’s sfx implementation over four years, with a summary of the common reasons links failed.10 similarly, o’neill conducted a small usability study whose recommendations included providing “a system of support accessible from the page where users experience difficulty,”11 although her recommendations focused on inline, context-appropriate help rather than error-reporting mechanisms. not found in the literature are systematic approaches that a library can take to proactively collect problem reports and manage the knowledge base accordingly.
method
we have taken a two-pronged approach to improving link resolution quality, each relying on a different kind of input. the first uses problem reports submitted by users of our summon™-powered article discovery tool, articlesplus.12 the second focuses on the most commonly accessed full-text titles in our environment, based on reports from 360 link. we have developed this dual approach in the expectation that we will catch more problems on lesser-used full-text sources through the first approach, and problems whose resolution will benefit the most individuals through the second.
user reports
the university of michigan library uses summon as the primary article discovery tool. when a user completes a search and clicks the “mget it” button (see figure 1)—mget it is our local brand for the entire full-text delivery process—the user is directed to the actual document through one of two mechanisms:
1. access to the full-text article through a summon index-enhanced direct link. (some of summon’s full-text content providers contribute a url to summon for direct access to the full text. this is known as index-enhanced direct linking [direct linking].)
2. access to the full-text article through the university library’s link resolver, 360 link. at this point, one of two things can happen:
a. the university library has configured a number of full-text sources as “direct to full text” links. when a citation leads to one of these sources, the user is directed to the article, or as close to it as the content provider’s site allows (sometimes to an issue table of contents, sometimes to a list of items in the appropriate volume, and—rarely in this instance—to the journal’s front page; the last outcome is rare in our environment because the university library prefers full-text links that get closer to the article and has configured 360 link for that outcome).
b. for those full-text sources that do not have direct-to-article links, 360 link is configured to provide a range of possible delivery mechanisms, including journal-, volume-, or issue-level entry points, document-delivery options (for cases where the library does not license any full-text online sources), the library catalog (for identifying print holdings for a journal), and so on.
from the user perspective, mechanisms 1 and 2a are essentially identical. in both cases, a click on the mget it icon takes the user to the full text in a new browser window. if the link does not lead to the correct article for any reason, there is no way in the new window for the library to collect that information. users may consider item 2b results as failures because the article is not immediately perceptible, even if the article is actually available in full text after two or more subsequent clicks. because of this user perception, we interpreted 2b results as “failures.”
figure 1. sample citation from articlesplus
in an attempt to understand this type of problem, following the advice given by o’neill and chen, we provide a problem-reporting link in the articlesplus search-results interface each time the full-text icon appears (see the right side of figure 1). when the user clicks this problem-reporting link, they are taken to a qualtrics survey form that asks for several basic pieces of information from the user but also captures the citation information for the article the user was trying to reach (see figure 2).
figure 2. survey questionnaire for reporting linking problems
this survey instrument asks the user to characterize the type of delivery failure with one of four common problems, along with an “other” text field:
• there was no article
• i got the wrong article
• i ended up at a page on the journal’s web site, but not the article
• i was asked to log in to the publisher’s site
• something else happened (please explain):
the form also asks for any additional comments and requires that the user provide an email address so that library staff can contact the user with a resolution (often including a functioning full-text link) or to ask for more information. in addition to the information requested from the user, hidden fields on this form capture the summon record id for the article, the ip address of the user’s computer (to help us identify if the problem could be related to our ezproxy configuration), a time and date stamp of the report’s submission, and the brand and version of web browser being used. the results of the form are sent by email to the university library’s ask a librarian service, the library’s online reference desk.
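as an illustration of what a single report contains once the visible answers and the hidden fields are combined, the sketch below models the record as a small python structure. the field names are our own shorthand rather than the qualtrics survey's actual variable names, and the sample values are invented.

```python
# Hypothetical model of one problem report as described above: the user-supplied
# answers plus the hidden fields captured by the form. Field names and sample
# values are illustrative only, not the real Qualtrics variable names.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ProblemReport:
    problem_type: str      # one of the five choices offered to the user
    comments: str          # free-text description of what went wrong
    email: str             # required so staff can follow up with a resolution
    summon_record_id: str  # hidden field: identifies the citation in Summon
    ip_address: str        # hidden field: helps rule out proxy/EZproxy issues
    reported_at: datetime  # hidden field: time and date stamp of the submission
    user_agent: str        # hidden field: browser brand and version

report = ProblemReport(
    problem_type="i ended up at a page on the journal's web site, but not the article",
    comments="landed on the issue table of contents instead of the article",
    email="patron@example.edu",
    summon_record_id="example-record-id",
    ip_address="198.51.100.7",          # documentation-range address, not real
    reported_at=datetime(2014, 10, 6, 14, 32),
    user_agent="Firefox 32",
)
print(report.problem_type)
```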
ask  a  librarian  staff  make  sure  that  the  problem  is  not  associated   with  the  individual  user’s  account  (that  they  are  entitled  to  get  full  text,  that  they  were  accessing   the  item  from  on  campus  or  via  the  proxy  server  or  vpn,  etc.).  when  user-­‐centric  problems  are   ruled  out,  the  problem  is  passed  on  to  the  library’s  electronic  access  unit  in  technical  services  for   further  analysis  and  resolution.   random  sampling   user-­‐reported  problems  are  only  one  picture  of  issues  in  the  linking  process.  we  were  concerned   that  user  reports  might  not  be  the  complete  story.  we  wanted  to  ensure  that  our  samples   represented  the  full  range  of  patron  experiences,  not  just  that  of  the  ones  who  reported.  so,  to  get   a  different  perspective,  we  instituted  a  series  of  random  sample  testing  using  logs  of  document   requests  from  the  link  resolver,  360  link.   2011  linking  review   our  first  large-­‐scale  review  of  linking  from  articlesplus  was  conducted  in  2011.  this  first  approach   was  based  on  a  log  of  the  summon  records  that  had  appeared  in  patron  searches  and  for  which   our  link  resolver  link  had  been  clicked.  for  this  test  we  chose  a  slice  of  the  log  covering  the  period   from  january  30–february  12,  2011.  this  period  was  chosen  because  it  was  well  into  the  academic   term  and  before  spring  break,  so  it  would  provide  a  representative  sample  of  the  searches  people   had  performed.  the  resulting  slice  contained  13,161  records.  for  each  record  the  log  contained   the  summon  id  of  the  record.  we  used  this  to  remove  duplicate  records  from  the  log  to  ensure  we   were  not  testing  linking  for  the  same  record  more  than  once,  leaving  us  with  a  spreadsheet  of   10,497  records,  one  record  per  row.  from  the  remaining  records  we  chose  a  sample  of  685   records  using  a  random  number  generator  tool,  research  randomizer   (http://www.randomizer.org/form.htm),  to  produce  a  random,  nonduplicating  list  of  685   numbers  with  values  from  1  to  10,497.  each  of  the  685  numbers  produced  was  matched  to  the   corresponding  row  in  the  spreadsheet  starting  with  the  first  record  listed  in  the  spreadsheet.  for   each  record  we  collected  the  data  in  figure  3.                 information  technology  and  libraries  |  march  2015   58   1.  the  summon  id  of  the  record   2.  the  raw  openurl  provided  with  the  record.   3.  a  version  of  the  openurl  that  may  have  been  locally  edited  to  put  dates  in  a  standard   format.   4.  the  final  url  provided  to  the  user  for  linking  to  the  resource.  this  would  usually  be   the  openurl  from  #3  containing  the  metadata  used  by  the  link  resolver  to  build  its   full-­‐text  links.  currently  it  is  an  intermediary  url  provided  by  the  summon  api.  this   url  may  lead  to  an  openurl  or  to  a  direct  link  to  the  resource  in  the  summon  record.   5.  the  classification  of  the  link  in  the  summon  record.  this  was  either  “full  text  online”   or  “citation-­‐only.”   6.  the  date  the  link  resolver  link  was  clicked.   7.  the  page  in  the  summon  search  results  the  link  resolver  link  was  found.   8.  
the  position  within  the  page  of  search  results  where  the  link  resolver  link  was  located.   9.  the  search  query  that  produced  the  search  results.   figure  3.  data  points  collected   the  results  from  this  review  were  somewhat  disappointing,  with  only  69.5%  of  the  citations   tested  leading  directly  to  full  text.  at  the  time  direct  linking  did  not  yet  exist,  so  “direct  to  full  text”   linking  was  only  available  through  the  1-­‐click  feature  of  360  link.  the  1-­‐click  feature  attempts  to   lead  patrons  directly  to  the  full  text  of  a  resource  without  first  going  through  the  360  link  menu.   1-­‐click  was  used  for  579  or  84.5%  of  the  citations  tested  with  15.3%  leading  to  the  360  link  menu.   of  the  citations  that  used  1-­‐click,  476  or  82.2%  led  directly  to  full  text,  so  when  1-­‐click  was  used  it   was  rather  successful.  links  for  about  30.5%  of  the  citations  led  either  to  a  failed  attempt  to  reach   full  text  through  1-­‐click  or  directly  to  the  360  link  menu.  the  2011  review  included  looking  at  the   full-­‐text  links  that  360  link  indicated  should  lead  directly  to  the  full  text  as  opposed  to  the  journal,   volume  or  issue  level.  when  we  reviewed  all  of  the  “direct  to  full  text”  links  generated  by  360  link,   not  only  the  ones  used  by  1-­‐click,  we  found  a  variety  of  reasons  why  those  links  did  not  succeed  in   leading  to  the  full  text.  the  top  five  reasons  found  for  linking  failures  are  the  following:   1. incomplete  target  collection   2. incorrect  syntax  in  the  article/chapter  link  generated  by  360  link   3. incorrect  metadata  in  the  summon  openurl   4. article  not  individually  indexed   5. target  error  in  targeturl  translation       measuring  journal  linking  success  from  a  discovery  service  |  stuart,  varnum,  and   ahronheim   59   collectively,  these  reasons  were  associated  with  the  failure  of  71.5%  of  the  “direct  to  full  text”   links.  as  we  will  show  later,  these  problems  were  also  noted  in  our  most  recent  review  of  linking   quality.   move  to  quarterly  testing   after  this  review  in  2011,  we  decided  to  perform  quarterly  testing  of  the  linking  so  we  would  have   current  data  on  the  quality  of  linking.  this  would  give  us  information  on  the  effectiveness  of  any   changes  we  and  proquest  had  made  independently  to  improve  the  linking.  we  could  see  where   linking  problems  found  in  previous  testing  had  been  resolved  and  where  new  ones  might  exist.     however,  we  needed  to  change  how  we  created  our  sample.  while  the  data  gathered  in  2011   provided  much  insight  into  the  workings  of  360  link,  testing  the  685  records  produced  2,210  full-­‐ text  links.  gathering  the  data  for  such  a  large  number  of  links  required  two  months  of  part-­‐time   effort  by  two  staff  members  as  well  as  an  additional  month  of  part-­‐time  effort  by  one  staff  member   for  analysis.  this  would  not  be  workable  for  quarterly  testing.  as  an  alternative  we  decided  to  test   two  records  from  each  of  the  100  serials  most  accessed  through  the  link  resolver.  
this  gave  us  a   sample  we  could  test  and  analyze  within  a  quarter  based  on  serials  that  our  patrons  were  using.   we  felt  that  we  could  gather  data  for  such  a  sample  within  three  to  four  weeks  instead  of  two   months.  the  list  was  generated  using  the  “click-­‐through  statistics  by  title  and  issn  (journal  title)”   usage  report  generated  through  the  proquest  client  center  administration  gui.  we  searched  for   each  serial  title  within  summon  using  the  serial’s  issn  or  the  serial’s  title  when  the  issn  was  not   available.     we  ordered  the  results  by  date,  with  the  newest  records  first.  we  wanted  an  article  within  the  first   two  to  three  pages  of  results  so  we  would  have  a  recent  article,  but  not  one  so  new  it  was  not  yet   available  through  the  resources  that  provide  access  to  the  serial.  then  we  reordered  the  results  to   show  the  oldest  records  first  and  chose  an  article  from  the  first  or  second  page  of  results.  our  goal   was  to  choose  an  article  at  random  from  the  second  or  third  page  while  ignoring  the  actual  content   of  the  article  so  as  not  to  introduce  a  selection  bias  by  publisher  or  journal.  another  area  where   our  sample  was  not  truly  random  involved  supplement  issues  of  journals.  one  problem  we  found   with  the  samples  collected  was  that  they  contained  few  items  from  supplemental  issues  of   journals.  linking  to  articles  in  supplements  is  particularly  difficult  because  of  the  different  ways   supplement  information  is  represented  among  different  databases.  to  attempt  to  capture  linking   information  in  this  case  we  added  records  for  articles  in  supplemental  issues.  those  records  were   chosen  from  journals  found  in  earlier  testing  to  contain  supplemental  issues.  we  searched   summon  for  articles  within  those  supplemental  issues  and  selected  one  or  two  to  add  to  our   sample.   one  notable  thing  is  the  introduction  of  direct  linking  in  our  summon  implementation  between   the  reviews  for  the  first  and  second  quarters  of  2012.  proquest  developed  direct  linking  to       information  technology  and  libraries  |  march  2015   60   improve  linking  to  resources  (including  but  not  limited  to  full  text  of  articles)  through  summon.   instead  of  using  an  openurl  which  must  be  sent  to  a  link  resolver,  direct  linking  uses   information  received  from  the  providers  of  the  records  in  summon  to  create  links  directly  to   resources  through  those  providers.  ideally,  since  these  links  use  information  from  those  providers,   direct  linking  would  not  have  the  problems  found  with  openurl  linking  through  a  link  resolver   such  as  360  link.  not  all  links  from  summon  use  direct  linking,  and  as  a  result  we  had  to  take  into   account  the  possibility  that  any  link  we  clicked  could  use  either  openurl  linking  or  direct  linking.   current  review:  back  to  random  sampling   while  the  above  sampling  method  resulted  in  useful  data,  we  also  found  it  had  some  limitations.   when  we  performed  the  review  for  the  second  quarter  of  2012,  we  found  a  significant  increase  in   the  effectiveness  of  360  link  since  the  first  quarter  2012  review.  
this  is  further  described  in  the   findings  section  of  this  article.  we  were  able  to  trace  some  of  this  improvement  to  changes   proquest  had  made  to  360  link  and  to  the  openurls  produced  from  summon.  however,  we  were   unable  to  fully  trace  the  cause  of  the  improvement  and  were  unable  to  determine  if  this  was  real   improvement  that  would  be  persistent.  to  resolve  these  problems,  we  have  returned  to  using  a   random  sample  in  our  latest  review,  but  with  a  change  in  methods.     current  review:  determining  the  sample  size   we  wanted  to  perform  a  review  that  would  be  statistically  relevant  and  could  help  us  determine  if   any  changes  in  linking  quality  were  persistent  and  not  just  a  one-­‐time  event.  instead  of  testing  a   single  sample  each  quarter  we  decided  to  test  a  sample  each  month  over  a  period  of  months.  one   concern  with  this  was  the  sample  size,  as  we  wanted  a  sample  that  would  be  statistically  valid  but   not  so  large  we  could  not  test  it  within  a  single  month.  we  determined  that  a  sample  size  of  300   would  be  sufficient  to  determine  if  any  month-­‐to-­‐month  changes  represent  a  real  change.   however,  in  previous  testing  we  had  learned  that  because  of  re-­‐indexing  of  the  summon  records,   summon  ids  that  were  valid  when  a  patron  performed  a  search  might  no  longer  be  valid  by  the   time  of  our  testing.  we  wanted  a  sample  of  300  still-­‐valid  records,  so  we  selected  a  random  sample   larger  than  that  amount.  so,  we  decided  to  test  600  records  each  month  to  determine  if  the   summon  ids  were  still  valid.   current  review:  methods   when  generating  each  month's  sample  we  used  the  same  method  as  in  2011.  we  asked  our  web   systems  group  for  the  logs  of  full-­‐text  requests  from  the  library’s  summon  interface  for  the  period   of  november  2012–february  2013.13  we  processed  each  month’s  log  file  within  two  months  of  the   user  interactions.  to  generate  the  600-­‐record  sample,  after  removing  records  with  duplicate   summon  ids,  we  used  a  random  number  generator  tool,  research  randomizer,  to  produce  a   random,  nonduplicating  list  of  600  numbers  with  values  from  1  to  the  number  of  unique  records.   each  of  the  600  numbers  produced  was  matched  to  the  corresponding  row  in  the  spreadsheet  of       measuring  journal  linking  success  from  a  discovery  service  |  stuart,  varnum,  and   ahronheim   61   records  with  unique  summon  ids.  once  the  600  records  were  tested  and  we  had  a  subset  with   valid  summon  ids,  we  generated  a  list  of  300  random,  nonduplicating  numbers  with  values  from  1   to  the  number  of  records  with  valid  summon  ids.  each  of  the  300  numbers  produced  was  matched   to  the  corresponding  row  in  a  spreadsheet  of  the  subset  of  records  with  valid  summon  ids.  this   gave  us  the  300-­‐record  sample  for  analysis.     testing  was  performed  by  two  people,  a  permanent  staff  member  and  a  student  hired  to  assist   with  the  testing.  the  staff  member  was  already  familiar  with  the  data  gathering  and  recording   procedure  and  trained  the  student  on  this  procedure.  
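for readers who want to reproduce this kind of draw without a separate random-number web tool, a minimal sketch follows. it assumes the monthly log has been exported to a csv file with a summon_id column; the file path, column name, and validity check are placeholders rather than the library's actual tooling, and python's random.sample simply plays the role of research randomizer by producing a random, nonduplicating selection. as a side note, a sample of 300 gives a 95% margin of error of roughly plus or minus 6 percentage points for a proportion near 50% (1.96 × √(0.25/300) ≈ 0.057), which is consistent with treating month-to-month shifts larger than that as real changes.

```python
# minimal sketch of the two-stage sample described above. assumes the log has been
# exported to csv with a "summon_id" column; file path, column name, and the
# validity check are placeholders rather than the library's actual tooling.
import csv
import random

def draw_monthly_sample(log_path, first_draw=600, final_size=300, seed=None):
    rng = random.Random(seed)

    # keep only the first occurrence of each summon id (remove duplicate records)
    seen, records = set(), []
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            sid = row["summon_id"]
            if sid not in seen:
                seen.add(sid)
                records.append(row)

    # stage 1: random, nonduplicating draw of up to 600 unique records
    stage1 = rng.sample(records, min(first_draw, len(records)))

    # keep the records whose summon ids are still valid after re-indexing
    valid = [r for r in stage1 if summon_id_still_valid(r["summon_id"])]

    # stage 2: random, nonduplicating draw of the final 300-record test sample
    return rng.sample(valid, min(final_size, len(valid)))

def summon_id_still_valid(summon_id):
    # placeholder: in the study this was checked by retrieving the record itself
    return True
```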
the  student  was  introduced  to  the  library’s   summon  implementation  and  shown  how  to  recognize  and  understand  the  different  possible   linking  types:  summon  direct  linking,  360  link  using  1-­‐click,  and  360  link  leading  to  the  360   link  menu.  once  this  background  was  provided,  the  student  was  introduced  to  the  procedure  for   gathering  and  recording  data.  the  student  was  given  suggestions  on  how  to  find  the  article  on  the   target  site  if  the  link  did  not  lead  directly  to  the  article  and  how  to  perform  some  basic  analysis  to   determine  why  the  link  did  not  function  as  expected.  the  permanent  staff  member  reviewed  the   analysis  of  the  links  that  did  not  lead  to  full  text  and  applied  a  code  to  describe  the  reason  for  the   failure.     based  on  our  2011  testing,  we  expected  to  see  one  of  two  general  results  in  the  current  round.     1. 360  link  would  attempt  to  connect  directly  to  the  article  because  of  our  activating  the  1-­‐ click  feature  of  360  link  when  we  implemented  the  link  resolver.  with  1-­‐click,  360  link   attempts  to  lead  patrons  directly  to  the  full  text  of  a  resource  without  first  having  to  go   through  the  link  resolver  menu.  even  with  1-­‐click  active  we  provide  patrons  a  link  leading   to  the  full  360  link  menu,  which  may  have  other  options  for  leading  to  the  full  text  as  well   as  links  to  search  for  the  journal  or  book  in  our  catalog.     2. the  other  possible  result  was  the  link  from  summon  leading  directly  to  the  360  link  menu.     once  direct  linking  was  implemented  after  we  began  this  round,  a  third  result  became  possible  (a   direct  link  from  summon  to  the  full  text).     for  each  record  we  collected  the  data  shown  in  figure  4.                 information  technology  and  libraries  |  march  2015   62   1.      date  the  link  from  summon  record  was  tested.   2.      the  url  of  the  summon  record.   3.    *the  openurl  generated  by  clicking  the  link  from  summon.  this  was  the  url  in  the   address  bar  of  the  page  to  which  the  link  led.  this  is  not  available  when  direct  linking  is   used.   4.      the  issn  of  the  serial  or  isbn  of  the  book.   5.      the  doi  of  the  article/book  chapter  if  it  was  available.   6.      the  citation  for  the  article  as  shown  in  the  360  link  menu  or  in  the  summon  record  if   direct  linking  was  used.   7.    *each  package  (collection  of  journals  bundled  together  in  the  knowledgebase)  for  which   360  link  produced  an  electronic  link  for  that  citation.   8.    *the  order  in  the  list  of  electronic  resources  in  which  the  package  in  #7  appeared  in  the   360  link  menu.   9.    *the  linking  level  assigned  to  the  link  by  360  link.  this  level  indicates  how  close  to  the   article  the  link  should  lead  the  patron,  with  article-­‐level  or  chapter-­‐level  links  ideally   taking  the  patron  directly  to  the  article/book  chapter.  the  linking  levels  recorded  in  our   testing  starting  with  the  closest  to  full  text  were  article/book  chapter,  issue,  volume,   journal/book  and  database.   10.  *for  article-­‐level  links,  the  url  that  360  link  used  to  attempt  to  connect  to  the  article.   11.  
for  all  full-­‐text  links  in  the  360  link  menu,  the  url  to  which  the  links  led.  this  was  the   link  displayed  in  the  browser  address  bar.   12.  a  code  assigned  to  that  link  describing  the  results.   13.  a  note  indicating  if  full  text  was  available  on  the  site  to  which  the  link  led.  this  was  only   an  indicator  of  whether  or  not  full  text  could  be  accessed  on  that  site  not  an  indicator  of   the  success  of  1-­‐click/direct  linking  or  the  article-­‐level  link.   14.  a  note  if  this  was  the  link  used  by  1-­‐click.   15.  a  note  if  direct  linking  was  used.   16.  a  note  if  the  link  was  for  a  citation  where  1-­‐click  was  not  used  and  clicking  the  link  in   summon  led  directly  to  the  360  link  menu.   17.  notes  providing  more  detail  for  the  results  described  by  #12.  this  included  error   messages,  search  strings  shown  on  the  target  site,  and  any  unusual  behavior.  the  notes   also  included  conclusions  reached  regarding  the  cause(s)  of  any  problems.   *  collected  only  if  the  link  resolver  was  used.   figure  4.  data  collected  from  sample       measuring  journal  linking  success  from  a  discovery  service  |  stuart,  varnum,  and   ahronheim   63   each  link  was  categorized  based  on  whether  it  led  to  the  full  text.  then  the  links  that  failed  were   further  categorized  on  the  basis  of  the  reason  for  failure  (see  figure  5  for  failure  categories).     1.        incorrect  metadata  in  the  summon  openurl.   2.        incomplete  metadata  in  the  summon  openurl.   3.        difference  in  the  metadata  between  summon  and  the  target.  in  this  case  we  were  unable  to   determine  which  site  had  the  correct  metadata.   4.        inaccurate  data  in  the  knowledgebase.  this  includes  incorrect  url  and  incorrect   issn/isbn.   5.        incorrect  coverage  in  the  knowledgebase.   6.        link  resolver  insufficiency.  the  link  resolver  has  not  been  configured  to  provide  deep   linking.  this  may  be  something  that  we  could  configure  or  something  that  would  require   changes  in  360  link.   7.        incorrect  syntax  in  the  article/chapter  link  generated  by  360  link.   8.        target  site  does  not  appear  to  support  linking  to  article/chapter  level.   9.        article  not  individually  indexed.  this  often  happens  with  conference  abstracts  and  book   reviews  which  are  combined  in  a  single  section.   10.    translation  error  of  the  “targeturl”  by  target  site.   11.    incomplete  target  collection.  site  is  missing  full  text  for  items  that  should  be  available  on   the  site.   12.    incorrect  metadata  on  the  target  site.   13.    citation-­‐only  record  in  summon.  summon  indicates  only  the  citation  is  available  so  access   to  full  text  is  not  expected.   14.    error  indicating  cookie  could  not  be  downloaded  from  target  site.  this  sometimes   happened  with  1-­‐click  but  the  same  link  would  work  from  the  360  link  menu.   15.    item  does  not  appear  to  have  a  doi.  the  360  link  menu  may  provide  an  option  to  search  for   the  doi.  sometimes  these  searches  fail  and  we  are  unable  to  find  a  doi  for  the  item.   16.    miscellaneous.  results  that  do  not  fall  into  the  other  categories.  
generally  used  for  links  in   packages  for  which  360  link  only  provides  journal/book-­‐level  linking  such  as  directory  of   open  access  journals  (doaj).   17.    unknown.  the  link  failed  with  no  identifiable  cause.     figure  5.  list  of  failure  categories  assigned       information  technology  and  libraries  |  march  2015   64   user-­‐reported  problems   in  march  2012,  we  began  recording  the  number  of  full-­‐text  clicks  in  articlesplus  search  results   (using  google  analytics  events).  for  each  month,  we  calculated  the  number  of  problems  reported   per  1,000  searches  and  per  1,000  full-­‐text  clicks.  graphed  over  time,  the  number  of  problem   reports  in  both  categories  shows  an  overall  decline.  see  figures  6  and  7.     figure  6.  problems  reported  per  1,000  articlesplus  searches  (june  2011–april  2014)     figure  7.  problems  reported  per  1,000  articlesplus  full-­‐text  clicks  (march  2012-­‐april  2014)       measuring  journal  linking  success  from  a  discovery  service  |  stuart,  varnum,  and   ahronheim   65   our  active  work  to  update  the  summon  and  360  link  knowledge  bases  began  in  june  2011.  the   change  to  summon  direct  linking  happened  on  february  27,  2012,  at  a  time  when  we  were   particularly  dissatisfied  with  the  volume  of  problems  reported.  we  felt  the  poor  quality  of   openurl  resolution  was  a  strong  argument  in  favor  of  activating  summon  direct  linking.  we   believe  this  change  led  to  a  noticeable  improvement  in  the  number  of  problems  reported  per   1,000  searches  (see  figure  6).  we  do  not  have  data  for  clicks  on  the  full-­‐text  links  in  our   articlesplus  interface  prior  to  march  2012,  but  do  know  that  reports  per  1,000  full-­‐text  clicks  have   been  on  the  decline  as  well  (see  figure  7).   findings   summary  of  random-­‐sample  testing  of  link  success     in  early  2013  we  tested  linking  from  articlesplus  to  gather  data  on  the  effectiveness  of  the  linking   and  to  attempt  to  determine  if  there  were  any  month-­‐to-­‐month  changes  in  the  effectiveness  that   could  indicate  persistent  changes  in  linking  quality.  in  this  section  we  will  review  the  data   collected  from  the  four  samples  used  in  this  testing.  we  will  discuss  the  different  paths  to  full  text,   direct  linking  vs.  openurl  linking  through  360  link,  and  their  relative  effectiveness.  we  will  also   discuss  the  reasons  we  found  for  links  to  not  lead  to  full  text.   paths  to  full-­‐text  access   as  shown  below  (see  table  1)  most  of  the  records  tested  in  summon  used  direct  linking  to   attempt  to  reach  the  full  text.  the  percentage  varied  with  each  sample  tested  but  they  ranged  from   61%  to  70%.  the  remaining  records  used  360  link  to  attempt  to  reach  the  full  text.  most  of  the   time  when  360  link  was  used,  1-­‐click  was  also  used  to  reach  the  full  text.  between  direct  linking   and  1-­‐click  about  93%  to  94%  of  the  time  an  attempt  was  made  to  lead  users  directly  to  the  full   text  of  the  article  without  first  going  through  the  360  link  menu.     
type of linking | sample 1 november 2012 | sample 2 december 2012 | sample 3 january 2013 | sample 4 january 2013
direct linking | 205 (68.3%) | 210 (70.0%) | 184 (61.3%) | 190 (63.3%)
360 link/1-click | 77 (25.7%) | 70 (23.3%) | 98 (32.7%) | 87 (29.0%)
360 link/360 link menu | 18 (6.0%) | 20 (6.7%) | 18 (6.0%) | 23 (7.7%)
total | 300 | 300 | 300 | 300

table 1. type of linking

attempts to reach the full text through direct linking and 1-click were rather successful. in the testing, we were able to reach full text through those methods from 79% to about 84% of the time (see table 2). the remaining cases were situations where direct linking/1-click did not lead directly to the full text or where we reached the 360 link menu.

path to full text | sample 1 november 2012 | sample 2 december 2012 | sample 3 january 2013 | sample 4 january 2013
direct linking | 197 (65.7%) | 204 (68.0%) | 173 (57.7%) | 185 (61.7%)
360 link/1-click | 45 (15.0%) | 47 (15.7%) | 64 (21.3%) | 55 (18.3%)
total out of 300 | 242 (80.7%) | 251 (83.7%) | 237 (79.0%) | 240 (80.0%)

table 2. percentage of citations leading directly to full text

table 3 contains the same data but adjusted to remove results that summon correctly indicated were citation-only. instead of calculating the percentages based on the full 300-citation samples, they are calculated based on the sample minus the citation-only records. the last row shows the number of records excluded from the full samples.

path to full text | sample 1 november 2012 | sample 2 december 2012 | sample 3 january 2013 | sample 4 january 2013
direct linking | 197 (65.9%) | 204 (69.2%) | 173 (59.0%) | 185 (62.5%)
360 link/1-click | 45 (15.1%) | 47 (15.9%) | 64 (21.8%) | 55 (18.6%)
total | 242 (80.9%) | 251 (85.1%) | 237 (80.9%) | 240 (81.1%)
records excluded | 1 | 5 | 7 | 4

table 3. percentage of citations leading directly to full text, excluding citation-only results

link failures with summon direct linking and 360 link 1-click

the next two tables show the results of linking for the records that used direct linking and the citations that used 1-click through 360 link. records that used direct linking were more likely to lead testers to full text than 360 link with 1-click. for the four samples, direct linking led to full text more than 90% of the time, while 1-click led to full text from about 58% to about 67% of the time.

for those records using direct linking where direct linking did not lead directly to the text, the result was usually a page that did not have a link to full text (see table 4).

result | sample 1 nov. 2012 (n = 205) | sample 2 dec. 2012 (n = 210) | sample 3 jan. 2013 (n = 184) | sample 4 jan. 2013 (n = 190)
full text/page with full-text link | 197 (96.1%) | 204 (97.1%) | 173 (94.0%) | 185 (97.4%)
abstract/citation only | 6 (2.9%) | 5 (2.4%) | 6 (3.3%) | 5 (2.6%)
unable to access full text through available full-text link | 1 (0.5%) | 1 (0.5%) | 3 (1.6%) | 0 (0.0%)
error and no full-text link on target | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
listing of volumes/issues | 0 (0.0%) | 0 (0.0%) | 1 (0.5%) | 0 (0.0%)
wrong article | 0 (0.0%) | 0 (0.0%) | 1 (0.5%) | 0 (0.0%)
minor results14 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)

table 4. results with direct linking

for 360 link with 1-click, the results that did not lead to full text were more varied (see table 5). the top reasons for failure included the link leading to an error indicating the article was not available even though full text for the article was available on the site, the link leading to a list of search results, and the link leading to the table of contents for the journal issue or book. in the last case, most of those results were book chapters for which 360 link only generated a link to the main page for the book instead of a link to the chapter.

result | sample 1 nov. 2012 (n = 77) | sample 2 dec. 2012 (n = 70) | sample 3 jan. 2013 (n = 98) | sample 4 jan. 2013 (n = 87)
full text/page with full-text link | 45 (58.4%) | 47 (67.1%) | 64 (65.3%) | 55 (63.2%)
table of contents | 12 (15.6%) | 6 (8.6%) | 10 (10.2%) | 6 (6.9%)
error but full text available | 6 (7.8%) | 11 (15.7%) | 10 (10.2%) | 18 (20.7%)
results list | 6 (7.8%) | 2 (2.9%) | 10 (10.2%) | 4 (4.6%)
error and no full-text link on target | 6 (7.8%) | 1 (1.4%) | 2 (2.0%) | 2 (2.3%)
wrong article | 1 (1.3%) | 1 (1.4%) | 1 (1.0%) | 2 (2.3%)
other | 1 (1.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
abstract/citation only | 0 (0.0%) | 0 (0.0%) | 1 (1.0%) | 0 (0.0%)
unable to access full text through available full-text link | 0 (0.0%) | 1 (1.4%) | 0 (0.0%) | 0 (0.0%)
search box | 0 (0.0%) | 1 (1.4%) | 0 (0.0%) | 0 (0.0%)
minor results15 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)

table 5. results with 360 link: citations using 1-click

link analysis for all 360 link clicks

unlike the above tables, which show the results on a citation basis, the table below shows the results for all links produced by 360 link (see table 6). this includes the following:

1. links used for 1-click.
2. links in the 360 link menu that were not used for 1-click when 360 link attempted to link to full text using 1-click.
3. links in the 360 link menu where clicking the link in summon led directly to the 360 link menu instead of using 1-click.
result | sample 1 nov. 2012 (n = 167) | sample 2 dec. 2012 (n = 158) | sample 3 jan. 2013 (n = 184) | sample 4 jan. 2013 (n = 172)
full text/page with full-text link | 81 (48.5%) | 84 (53.2%) | 103 (56.0%) | 87 (50.6%)
abstract/citation only | 0 (0.0%) | 0 (0.0%) | 1 (0.5%) | 0 (0.0%)
unable to access full text through available full-text link | 0 (0.0%) | 1 (0.6%) | 0 (0.0%) | 1 (0.6%)
error but full text available | 9 (5.4%) | 14 (8.9%) | 17 (9.2%) | 23 (13.4%)
error and full text not accessible through full-text link on target | 1 (0.6%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
error and no full-text link on target | 10 (6.0%) | 1 (0.6%) | 6 (3.3%) | 5 (2.9%)
failed to find doi through link in 360 link menu | 3 (1.8%) | 5 (3.2%) | 5 (2.7%) | 8 (4.7%)
main journal page | 22 (13.2%) | 24 (15.2%) | 17 (9.2%) | 15 (8.7%)
other | 2 (1.2%) | 0 (0.0%) | 1 (0.5%) | 2 (1.2%)
360 link menu with no full-text links | 0 (0.0%) | 2 (1.3%) | 3 (1.6%) | 3 (1.7%)
results list | 9 (5.4%) | 4 (2.5%) | 10 (5.4%) | 3 (1.7%)
search box | 6 (3.6%) | 7 (4.4%) | 5 (2.7%) | 8 (4.7%)
table of contents | 12 (7.2%) | 6 (3.8%) | 10 (5.4%) | 9 (5.2%)
listing of volumes/issues | 9 (5.4%) | 9 (5.7%) | 5 (2.7%) | 6 (3.5%)
wrong article | 3 (1.8%) | 1 (0.6%) | 1 (0.5%) | 2 (1.2%)

table 6. results with 360 link: all links produced by 360 link

in addition to recording what happened, we attempted to determine why links failed to reach full text. even though direct linking is very effective, it is not 100% effective in linking to full text. when excluding records that indicated that only the citation, not full text, would be available through summon, most of the problems were due to incorrect information in summon (see table 7): either the link produced by summon incorrectly led to an error or an abstract when full text was available on the target site, or summon incorrectly indicated that access to full text was available.

reason | sample 1 nov. 2012 (n = 8) | sample 2 dec. 2012 (n = 6) | sample 3 jan. 2013 (n = 11) | sample 4 jan. 2013 (n = 5)
citation-only record in summon | 1 (12.5%) | 3 (50.0%) | 4 (36.4%) | 1 (20.0%)
incomplete target collection | 1 (12.5%) | 0 (0.0%) | 1 (9.1%) | 1 (20.0%)
incorrect coverage in knowledgebase | 0 (0.0%) | 0 (0.0%) | 2 (18.2%) | 0 (0.0%)
summon has incorrect link | 3 (37.5%) | 1 (16.7%) | 2 (18.2%) | 2 (40.0%)
summon incorrectly indicating available access to full text | 3 (37.5%) | 2 (33.3%) | 2 (18.2%) | 1 (20.0%)

table 7. reasons for linking failure to link to full text through direct linking

table 8 shows the reasons links generated by 360 link and used for 1-click did not lead to full text. most of the failures were caused by three general problems:

1. incorrect metadata in summon
2. incorrect syntax in the article/chapter link generated by 360 link
3. target site does not support linking to the article/chapter level

reason | sample 1 nov. 2012 (n = 32) | sample 2 dec. 2012 (n = 23) | sample 3 jan. 2013 (n = 34) | sample 4 jan. 2013 (n = 32)
incorrect metadata in the summon openurl | 2 (6.3%) | 4 (17.4%) | 3 (8.8%) | 4 (12.5%)
incomplete metadata in the summon openurl | 0 (0.0%) | 2 (8.7%) | 0 (0.0%) | 0 (0.0%)
difference in metadata between summon and the target | 1 (3.1%) | 5 (21.7%) | 0 (0.0%) | 2 (6.3%)
inaccurate data in knowledgebase | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (3.1%)
incorrect coverage in knowledgebase | 0 (0.0%) | 1 (4.3%) | 0 (0.0%) | 0 (0.0%)
link resolver insufficiency | 2 (6.3%) | 0 (0.0%) | 1 (2.9%) | 0 (0.0%)
incorrect syntax in the article/chapter link generated by 360 link | 6 (18.8%) | 3 (13.0%) | 10 (29.4%) | 7 (21.9%)
target site does not support linking to article/chapter level | 11 (34.3%) | 4 (17.4%) | 5 (14.7%) | 6 (18.8%)
article not individually indexed | 0 (0.0%) | 1 (4.3%) | 3 (8.8%) | 5 (15.6%)
target error in targeturl translation | 0 (0.0%) | 0 (0.0%) | 5 (14.7%) | 3 (9.4%)
incomplete target collection | 8 (25.0%) | 1 (4.3%) | 1 (2.9%) | 3 (9.4%)
incorrect metadata on the target site | 0 (0.0%) | 1 (4.3%) | 0 (0.0%) | 1 (3.1%)
citation-only record in summon | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
cookie | 2 (6.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
item does not appear to have a doi | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
miscellaneous | 0 (0.0%) | 0 (0.0%) | 4 (11.8%) | 0 (0.0%)
unknown | 0 (0.0%) | 1 (4.3%) | 2 (5.9%) | 0 (0.0%)

table 8. reasons for linking failure to link to full text through 1-click

broadening our view of 360 link to include all links generated by 360 link during the testing, not only the ones used by 1-click (see table 9), we see more causes of failure than with 1-click. most of the failures were caused by five general problems:

1. incorrect metadata in summon.
2. link resolver insufficiency. we mostly used this classification when 360 link only provided links to the main journal page or database page instead of links to the article and we thought it might have been possible to generate a link to the article. sometimes this was due to configuration changes that we could have made, and sometimes it was because 360 link would only create article links if particular metadata was available, even if other sufficient identifying metadata was present.
3. incorrect syntax in the article/chapter link generated by 360 link.
4. target site does not support linking to the article/chapter level.
5. miscellaneous. most of the links that fell in this category were ones that were intended to go to the main journal page by design. these were for journals that are not in vendor-specific packages in the knowledgebase but in large general packages with many journals on different platforms. because there is no common linking syntax, article-level linking is not possible. this includes packages containing open-access titles such as the directory of open access journals (doaj) and packages of subscription titles that are not listed in vendor-specific packages in the knowledgebase.
reason | sample 1 nov. 2012 (n = 86) | sample 2 dec. 2012 (n = 74) | sample 3 jan. 2013 (n = 81) | sample 4 jan. 2013 (n = 89)
incorrect metadata in the summon openurl | 9 (10.5%) | 5 (6.8%) | 4 (4.9%) | 8 (9.0%)
incomplete metadata in the summon openurl | 0 (0.0%) | 2 (2.7%) | 1 (1.2%) | 3 (3.4%)
difference in metadata between summon and the target | 1 (1.2%) | 6 (8.1%) | 2 (2.5%) | 2 (2.2%)
inaccurate data in knowledgebase | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) | 5 (5.6%)
incorrect coverage in knowledgebase | 3 (3.5%) | 1 (1.4%) | 2 (2.5%) | 1 (1.1%)
link resolver insufficiency | 20 (23.3%) | 15 (20.3%) | 9 (11.1%) | 8 (9.0%)
incorrect syntax in the article/chapter link generated by 360 link | 7 (8.1%) | 3 (4.1%) | 10 (12.3%) | 11 (12.4%)
target site does not support linking to article/chapter level | 17 (19.8%) | 6 (8.1%) | 9 (11.1%) | 10 (11.2%)
article not individually indexed | 0 (0.0%) | 1 (1.4%) | 3 (3.7%) | 5 (5.6%)
target error in targeturl translation | 1 (1.2%) | 3 (4.1%) | 7 (8.6%) | 3 (3.4%)
incomplete target collection | 11 (12.8%) | 2 (2.7%) | 5 (6.2%) | 5 (5.6%)
incorrect metadata on the target site | 0 (0.0%) | 1 (1.4%) | 0 (0.0%) | 1 (1.1%)
citation-only record in summon | 0 (0.0%) | 2 (2.7%) | 3 (3.7%) | 3 (3.4%)
cookie | 2 (2.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
item does not appear to have a doi | 2 (2.3%) | 4 (5.4%) | 5 (6.2%) | 7 (7.9%)
miscellaneous | 13 (15.1%) | 22 (29.7%) | 18 (22.2%) | 17 (19.1%)
unknown | 0 (0.0%) | 1 (1.4%) | 2 (2.5%) | 0 (0.0%)

table 9. reasons for linking failure to link to full text for all 360 link links

comparison of user reports and random samples

when we look at user-reported problems during the same period over which we conducted our manual process (november 1, 2012–january 29, 2013), we see that users reported a problem roughly 0.2% of the time (0.187% of searches resulted in a problem report, while 0.228% of full-text clicks resulted in a problem report). see table 10.

sample period | problems reported | articlesplus searches | mget it clicks | problems reported per search | problems reported per mget it click
11/1/2012–11/30/2012 | 225 | 111,062 | 95,218 | 0.203% | 0.236%
12/1/2012–12/31/2012 | 105 | 74,848 | 58,346 | 0.140% | 0.180%
1/1/2013–1/29/2013 | 100 | 44,204 | 34,692 | 0.226% | 0.288%
overall | 430 | 230,114 | 188,256 | 0.187% | 0.228%

table 10. user problem reports during the sample period

the number of user-reported errors is significantly less than what we found through our systematic sampling (see table 2). where the error rate based on user reports would be roughly 0.2%, the more systematic approach showed a 20% error rate. relying solely on user reports of errors to judge the reliability of full-text links dramatically underreports true problems by a factor of 100.

conclusions and next steps

comparison of user reports to random sample testing indicates a significant underreporting of problems on the part of users. while we have not conducted similar studies across other vendor databases, we suspect that user-generated reports likewise significantly lag behind true errors.
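the arithmetic behind the "factor of 100" comparison can be checked quickly from the published totals; the short calculation below uses the overall figures from table 10 and an approximate 20% failure rate taken from tables 2 and 3 (the exact factor depends on whether searches or full-text clicks serve as the denominator, but it lands near 100 either way).

```python
# quick check of the underreporting comparison, using totals from table 10
# and the roughly 80% success rate from tables 2 and 3.
reports = 430
searches = 230_114
fulltext_clicks = 188_256

rate_per_search = reports / searches          # about 0.187%
rate_per_click = reports / fulltext_clicks    # about 0.228%
sampled_failure_rate = 1 - 0.80               # about 20%

print(f"reported per search: {rate_per_search:.3%}")
print(f"reported per click : {rate_per_click:.3%}")
print(f"factor (searches)  : {sampled_failure_rate / rate_per_search:.0f}")  # ~107
print(f"factor (clicks)    : {sampled_failure_rate / rate_per_click:.0f}")   # ~88
```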
future  research  in  this  area  is  recommended.     the  number  of  problems  discovered  in  full-­‐text  items  that  are  linked  via  an  openurl  is   discouraging;  however,  the  ability  of  the  summon  discovery  service  to  provide  accurate  access  to   full  text  is  an  overall  positive  because  of  its  direct  link  functionality.  more  than  95%  of  direct-­‐ linked  articles  in  our  research  led  to  the  correct  resource  (table  3).  one-­‐click  (openurl)   resolution  was  noticeably  poorer,  with  about  60%  of  requests  leading  directly  to  the  correct  full-­‐ text  item.  more  alarming,  we  found  that,  of  full-­‐text  requests  linked  through  an  openurl,  a  large       measuring  journal  linking  success  from  a  discovery  service  |  stuart,  varnum,  and   ahronheim   75   portion—20%—fail.  the  direct  links  (the  result  of  publisher-­‐discovery  service  negotiations)  are   much  more  effective.  this  discourages  us  from  feeling  any  complacency  about  the  effectiveness  of   our  openurl  link  resolution  tools.  the  effort  spent  maintaining  our  link  resolution  knowledge   base  does  not  make  a  long-­‐term  difference  in  the  link  resolution  quality.     based  on  the  data  we  have  collected,  it  would  appear  that  more  work  needs  to  be  done  if  openurl   is  to  continue  as  a  working  standard.  while  our  data  shows  that  direct  linking  offers  improved   service  for  the  user  as  an  immediate  reward,  we  do  feel  some  concern  about  the  longer-­‐term  effect   of  closed  and  proprietary  access  paths  on  the  broader  scholarly  environment.  from  the  library’s   perspective,  the  trend  to  direct  linking  creates  the  risk  of  vendor  lock-­‐in  because  the  vendor-­‐ created  direct  links  will  not  work  after  the  library’s  business  relationship  with  the  vendor  ends.   an  openurl  is  less  tightly  bound  to  the  vendor  that  provided  it.  this  lock-­‐in  increases  the  cost  of   changing  vendors.  the  emergence  of  direct  links  is  a  two-­‐edged  sword:  users  gain  reliability  but   libraries  lose  flexibility  and  the  ability  to  adapt.   the  impetus  for  improving  openurl  linking  must  come  from  libraries  because  vendors  do  not   have  a  strong  incentive  to  take  the  lead  in  this  effort,  especially  when  it  interferes  with  their   competitive  advantage.  we  recommend  that  libraries  collaborate  more  actively  on  identifying   patterns  of  failure  in  openurl  link  resolution  and  remedies  for  those  issues  so  that  openurl   continues  as  a  viable  and  open  method  for  full-­‐text  access.  with  more  data  on  the  failure  modes   for  openurl  transactions,  libraries  and  content  providers  may  be  able  to  implement  systematic   improvements  in  standardized  linking  performance.  we  hope  that  the  methods  and  data  we  have   presented  form  a  helpful  beginning  step  in  this  activity.   acknowledgement   the  authors  thank  kat  hagedorn  and  heather  shoecraft  for  their  comments  on  a  draft  of  this   manuscript.   references     1.     niso/uksg  kbart  working  group,  kbart:  knowledge  bases  and  related  tools,  january  2010,   http://www.uksg.org/sites/uksg.org/files/kbart_phase_i_recommended_practice.pdf.     2.     
national  information  standards  organization  (niso),  “ansi/niso  z39.88  -­‐  the  openurl   framework  for  context-­‐sensitive  services,”  may  13,  2010,   http://www.niso.org/kst/reports/standards?step=2&project_key=d5320409c5160be4697dc 046613f71b9a773cd9e.     3.     adam  chandler,  glen  wiley,  and  jim  leblanc,  “towards  transparent  and  scalable  openurl   quality  metrics,”  d-­‐lib  magazine  17,  no.  3/4  (march  2011),   http://dx.doi.org/10.1045/march2011-­‐chandler.       information  technology  and  libraries  |  march  2015   76     4.     ibid.   5.     national  information  standards  organization  (niso),  improving  openurls  through  analytics   (iota):  recommendations  for  link  resolver  providers,  april  26,  2013,   http://www.niso.org/apps/group_public/download.php/10811/rp-­‐21-­‐2013_iota.pdf.   6.     niso/uksg  kbart  working  group,  kbart:  knowledge  bases  and  related  tools.   7.     oliver  pesch,  “improving  openurl  linking,”  serials  librarian  63,  no.  2  (2012):  135–45,   http://dx.doi.org/10.1080/0361526x.2012.689465.   8     jason  price  and  cindi  trainor,  “chapter  3:  digging  into  the  data:  exposing  the  causes  of   resolver  failure,”  library  technology  reports  46,  no.  7  (october  2010):  15–26.   9.     ibid.,  26.   10.    xiaotian  chen,  “broken-­‐link  reports  from  sfx  users,”  serials  review  38,  no.  4  (december   2012):  222–27,  http://dx.doi.org/10.1016/j.serrev.2012.09.002.     11.    lois  o’neill,  “scaffolding  openurl  results,”  reference  services  quarterly  14,  no.  1–2  (2009):   13–35,  http://dx.doi.org/10.1080/10875300902961940.   12.    http://www.lib.umich.edu/.  see  the  articlesplus  tab  of  the  search  box.   13.    one  problem  we  had  in  testing  was  that  log  data  for  february  2013  was  not  preserved.  this   would  have  been  used  to  build  the  sample  tested  in  april  2013.  to  get  around  this  we  decided   to  take  two  samples  from  the  january  2013  log.   14.    the  “minor  results”  row  is  a  combination  of  all  results  that  did  not  represent  at  least  0.5%  of   the  records  using  direct  linking  for  at  least  one  sample.  this  includes  the  following  results:   error  but  full  text  available,  error  and  full  text  not  accessible  through  full  text  link  on  target,   main  journal  page,  360  link  menu  with  no  full  text  links,  results  list,  search  box,  table  of   contents,  and  other.   15.   the  “minor  results”  row  is  a  combination  of  all  results  that  did  not  represent  at  least  0.5%  of   the  records  using  360  link  for  at  least  one  sample.  this  includes  the  following  results:  error   and  full  text  not  accessible  through  full  text  link  on  target,  main  journal  page,  360  link  menu   with  no  full  text  links,  listing  of  volumes/issues. critical success factors for integrated library system implementation in academic libraries: a qualitative study shea-tinn yeh and zhiping walter information technology and libraries | september 2016 27 abstract integrated library systems (ilss) support the entire business operations of an academic library from acquiring and processing library resources to making them available to user communities and preserving them for future use. as libraries’ needs evolve, there is a pressing demand for libraries to migrate from one generation of ils to the next. 
this complex migration process often requires significant financial and personnel investment, but its success is by no means guaranteed. we draw on enterprise resource planning and critical success factors (csfs) literature to identify the most salient csfs for ils migration success through a qualitative study with four cases. we found that careful selection process, top management involvement, vendor support, project team competence, staff user involvement, interdepartmental communication, data analysis and conversion, project management and project tracking, staff user education and training, and managing staff user emotions are the most salient csfs that determine the success of a migration project. introduction the first generation of integrated library systems (ilss) were developed specifically for library operations focused on the selection, acquisition, cataloging, and circulation of print collections. as libraries’ nonprint materials steadily grow, the print-centric ilss became less and less efficient in supporting libraries’ daily operations. recent years have seen an emergence of a new generation of ilss, commonly called library services platforms (lsps), that takes into account the management of both print and electronic collections. lsps take advantage of cloud computing and network advancements to provide economies of scale and to allow a library to better share data with other libraries. furthermore, lsps unify the entire suite of library operations to provide efficient workflow at the back end and advanced online discovery tools at the front end for the library.1 given the claimed benefits of the emerging lsp and the fact that vendors are phasing out support for their legacy ilss, we project that more libraries will be migrating to lsps as the systems mature and libraries’ needs evolve. shea-tinn yeh (sheila.yeh@du.edu) is assistant professor and library digital infrastructure and technology coordinator, university of denver libraries. zhiping walter (zhiping.walter@ucdenver.edu) is associate professor, business school, university of colorado denver. mailto:sheila.yeh@du.edu mailto:zhiping.walter@ucdenver.edu) critical success factors for integrated library system implementation in academic libraries: a qualitative study | yeh and walter |doi:10.6017/ital.v35i2.9255 28 migrating from one generation of ils to another is a significant initiative that affects the entire library operation.2 because of its scale and complexity, the migration project is not always smooth and often fraught with problems, with some projects falling behind migration completion schedule.3, 4, 5 in addition, committing to a new system often results in significant financial and personnel costs for an academic library.6 understandably, there is considerable trepidation before, during, and after the migration process.6, 7 what contributes to a smooth migration process and a successful migration project? this is an urgent question at present and an enduring question for the future. this is because, as libraries continue to evolve, their operations and management needs are destined to outgrow functionalities of the current generation of ils. therefore migration to a new generation of ils is destined to occur periodically for a library. in this research, we study critical success factors (csfs) that contribute to a smooth migration process and a successful migration project defined as on-time and on-budget project completion and a smooth implementation process. 
to achieve our research goal, we anchor our theoretical foundation in the enterprise resource planning (erp) system-implementation literature. erp is “business process management software that allows an organization to use a system of integrated applications to manage the business and automate many back office functions related to technology, services and human resources.”9 since a complete ils is formed from a suite of integrated functions to manage a broad range of library processes, it is in fact an erp for libraries.10 a literature review of csfs for erp system implementation success revealed more than ninety csf factors.11, 12 the contribution of our research is in identifying, through qualitative research method, the most salient csfs that contribute to the success of a library system migration project from one generation of ils to another. results of this study can help library administrators to improve the chance of success and decrease the level of anxiety during a migration project now and in the future. the remainder of the article is organized as follows: section 2 reviews erp, ils, lsp, csfs, and information system success measurement described in the literature. section 3 describes the guided interviews that have been conducted to identify the csfs, the results, and the analysis of the results. finally, we offer conclusions and limitations as well as recommend future work. literature review erp is business-management software comprising a suite of integrated applications that an organization can use to collect, store, manage, and interpret data from many business activities, including product planning, manufacturing, service delivery, marketing and sales, and human resources. the core idea of an erp system is to integrate both the data and the process dimensions in a business so that transactions can be monitored and analyzed for planning and strategic purposes.13 modules of the system cover different functions within a company and are linked so users can see what is happening in all areas of the company. an erp system can improve a business’s back offices as well as its front-end functions, with both operational and strategic benefits.14 some of the benefits include reliability in information access, data and operations information technology and libraries | september 2016 29 redundancy, data retrieval and reporting efficiency, easy module extension, and internet commerce capability. just like an erp system for a business, a complete library management solution comprises a suite of integrated applications that manage a broad range of library processes including circulation, acquisition, cataloging, electronic resources management, and system administration. lsps, the current generation of library management systems, are designed to manage both physical and digital collections. lsps follow the service-oriented architecture (soa) and can be deployed through multitenant software as a service (saas) distribution model.15 in addition to supporting all library functions, lsps integrate with other university systems, such as student registry and finance, and provide front-end for library patrons in a cloud environment that leverages a global network of systems for discovery of a wide array of resources.16 since an lsp is essentially an enterprise system for library functions, csfs of erp implementation success could guide lsp implementation. 
csfs are conditions that must be met for an implementation to be successful.17 more than ninety csfs have been identified for erp implementation success.18, 19 those csfs have been classified according to various schemes, but we found the strategic versus tactical classification most relevant to the library context.20 strategic factors address the big picture involving the breakdown of goals into do-able items. tactical factors, on the other hand, are the methods to accomplish the doable items that lead to achieving the goals.21 by examining the entire list of csfs from both the strategic and the tactical perspectives, we identify top csfs for library-management-solution implementation and migration success, defined as on-time and on-budget delivery as well as smooth implementation process,22, 23 through a qualitative study. method we conducted semi-structured interviews with open-ended questions to identify the most salient csfs for implementation success. since we needed to reduce more than ninety csfs in the literature to a list of most salient csfs in the library context and to potentially identify new csfs, a qualitative-interview approach was more suitable than a quantitative-survey approach. a twostep process was used to arrive at the final list. first, we evaluated all csfs in the literature and identified a subset of csfs that might be most relevant for library-systems implementation.24 second, this csfs subset was used to develop an interview guide for semistructured interviews conducted later to further reduce this subset. open-ended questions were also used during the interviews to elicit additional csfs. an institutional review board (irb) application was submitted and approved. the result of this two-step process is a list of ten csfs discussed in the results section, with nine csfs coming from our initial list and one csf emerging from the interviews. the criterion for recruiting study libraries is that the library has implemented a new lsp within the last three years. this is because the lsp is the current generation of ils, and it is only within the last few years that various lsp vendors began to promote and implement the lsps. a critical success factors for integrated library system implementation in academic libraries: a qualitative study | yeh and walter |doi:10.6017/ital.v35i2.9255 30 recruitment email was sent to libraries listed as adopters on various vendors’ press release sites. participating recipients referred the interview request to appropriate migration team members whom we later contacted to schedule interviews. this resulted in up to five people from each participating library being interviewed in person or via skype. their positions are listed in table 1. interviews were recorded, transcribed, and cleaned. emails to the same interviewees were used for follow-up questions as needed. after interviews with each library, qualitative data analysis was performed to identify csfs that emerged from the interviews. interviews continued until no new csfs emerged in the last interview. in total, staff from four libraries were interviewed between october 2014 and march 2015 about their implementation process and experience from staff user perspective. the design and implementation of discovery public interface experience was not part of this inquiry. table 1 summarizes characteristics of the four libraries. case numbers instead of university names are used to protect identities of participating libraries and interviewees. 
case 1: private university; student population 11,000+; operating budget 11 million; 150 library employees; project length 6 months; ils used before: millennium; lsp implemented: sierra; reasons for migration: discontinued vendor system support, servers out of warranty, vendor gave incentives; positions of interviewees: head of systems and module experts.
case 2: public university; student population 32,000+; operating budget 13 million; 400 library employees; project length 9 months; ils used before: aleph; lsp implemented: alma; reasons for migration: outdated servers, servers out of warranty; positions of interviewees: heads of systems.
case 3: public university; student population 2,400+; operating budget 1.5 million; 17 library employees; project length 6 months; ils used before: evergreen; lsp implemented: sierra; reasons for migration: in need of a robust system that provides a discovery layer; positions of interviewees: director of library and head of systems.
case 4: private university; student population 2,700+; operating budget 1.3 million; 13.5 library employees; project length 9 months; ils used before: voyager; lsp implemented: sierra; reasons for migration: in need of a modern system demonstrating the library's moving with the times; positions of interviewees: director of library.
table 1. summary of case study site characteristics.
results the following csfs emerged from the interviews: careful selection process, top management involvement, vendor support, project team competence, staff user involvement, interdepartmental communication, data analysis and conversion, project management and project tracking, staff user education and training, and managing staff user emotions. we discuss each csf next. careful selection process most ilss are commercial, off-the-shelf software systems that can vary dramatically in functionality from system to system.25 for example, some packages are more suitable for large institutions while others are more suitable for smaller ones. to mitigate risks in productivity or transaction loss and to minimize system and implementation costs, a library needs to determine the best "fitness-of-use" system. such a determination is the outcome of a careful selection process. although there is no commonly accepted technique, method, or tool for this process, all selection processes share common key steps suggested in the literature.26 they are the following, as applied to library-systems selection: define stakeholder requirements, search for products, create a short list of the most promising candidates based on a set of "must-have" requirements, evaluate the candidates on the short list, and analyze the evaluation data to make a selection. in addition, if the server option was chosen instead of the cloud option, the selected hardware needs to satisfy system requirements for the final configuration. a careful selection process emerged as a csf that affected the implementation outcome for all four libraries. all cases were migrating to an lsp system. some systems can be offered as locally installed systems, which require appropriate in-house it and hardware capabilities. case 1 did not consider its it capability when deciding on a turnkey system. as a result, the library experienced difficulties in setting up the infrastructure in-house during the implementation. each of the other three cases considered the candidate system's compatibility with the legacy system, the match between library needs and system functionalities, system maturity, migration costs, data storage needs, and vendor support before and during the implementation as well as continued vendor support throughout the life of the new system. even though each of the three libraries arrived at its system choice differently, on reflection, interviewees expressed relief and satisfaction with their decisions to choose their respective systems. "we were in the position where our servers were out of date and warranty, needed to be replaced. the servers were too small.
we had sizing issues and we couldn't update to the most recent version of aleph . . . alma being a cloud based solution will eliminate our need to be 'in the server business.'" (case 2). "we went through a very extensive formal process to select this system." (case 3) top management involvement successful implementation requires strong leadership by executives who understand, support, and champion the project.27 when this involvement trickles down through the organizational hierarchy, it leads to an organizational commitment, which is required for implementation success in complex projects.28, 29 since library-system implementation is a complex project that (if done correctly) will transform the entire library and reposition it for better efficiency, strong leadership is critical as well. in all four cases, top management were involved in the final decisions of their respective system choices. in cases 1 and 2, top management also took charge in securing funding for the migration projects. interviewees stressed that top management support was very important in their respective project implementations. "the top level management took the recommendations from the systems librarians at the time, with the blessing of the council determined whether they want to proceed with the product alma, and had funding conversations with the financial people." (case 2) "we have faculty library committee, faculty governance oversight. we showed them webinars of the products we considered before we signed them, so we have faculty representation on board. we held open forum and were inclusive in our invitations." (case 4) vendor support with a new technology, it is critical to acquire external technical expertise, often from the vendor, to facilitate successful implementation.30 effective vendor support includes adequate and high-quality technical support during and after implementation, sufficient training provided for both the project team and staff users, and positive relationships between all parties in the project.31 additionally, there should be adequate knowledge transfer between the vendor consultants and the clients, which can be achieved by defining roles, achieving shared understanding, and enhancing relationships through competent communication.32, 33 in the case of library-system implementations, vendor support is particularly important because of the complexity of each new generation of the system and the library personnel's knowledge gap in understanding the nuts and bolts of the new system. effective vendor support was identified in each case as a critical success factor determining the implementation outcome, even though the form of vendor support varied from case to case. in case 1, the vendor sent different consultants with various areas of expertise as project managers on the basis of the project phase. in case 2, the vendor sent one consultant who served as the main project manager. in case 3, the vendor provided a project manager and a team of technicians. in case 4, consultants were shared across multiple consortium libraries that were implementing the system at the same time. no matter how vendor support was provided, it was essential for implementation success, as indicated by interviewees.
“the vendor has been very supportive and provides a group of experts throughout the process, some are knowledgeable in server business while others are skilled project managers.” (case 1) project team competence since library-system migration affects all functional areas of a library, members of the implementation team need to be cross-functional. furthermore, members with both business information technology and libraries | september 2016 33 knowledge and technology knowhow are especially crucial for implementation success.34 competence of vendor consultants assigned to the project also influences implementation success, as discussed earlier. additionally, it is important to have an in-house project leader who champions the project and who has the essential skills and authority to set goals that legitimize change.35 having a competent project team was essential for implementation success for each of our cases. in each case, the vendor provided the project manager and the library provided a co-manager who was a champion figure. other team members came from various functional areas such as acquisition, circulation, cataloging, electronic resources management, and system administration. for example, in case 1, the technology librarian participated as a co-project manager. the projectmanagement team comprised module experts within the library and from functional areas. in addition, the university’s technology services department lent technical support during early stages of implementation when servers need to be set up. the interviewees all stressed the importance of project-team competence. “without the infrastructure knowledge from the university’s technology team and their time and full support to negotiate with the vendor, the migration project would not have been possible.” (case 1) “the university’s it made sure that we are in compliance with campus policies and expectations for securities.” (case 2) staff user involvement it is important that the project team involve staff users early on, otherwise the implementation process may be bumpy. when end users are involved in decisions relating to system selection and implementation, they are more invested in and concerned with the success of the system, which in turn leads to greater system use and user satisfaction.36, 37 as such, it is one of the most cited critical success factors in erp implementation.38 because personal relevance to the system is just as important for library-system implementation, effective staff user involvement with implementation is positively related to implementation success. staff user involvement has emerged as a main success factor in all our cases and contributed to the implementation project outcome. in case 1, staff users were not consulted as to whether an lsp was necessary for the library, although they were informed of the reasons for implementation. additionally, staff users were not involved when the project timetable was negotiated. this lack of early staff user involvement led to considerable stress down the road, which made the implementation process bumpy. the other three cases involved staff users early on; as a result, staff users experienced much less stress and frustration down the road. specifically, in case 2, the staff users were educated about the need for migration through staff meetings, town hall meetings, supervisory meetings, council meetings, and forums. 
many product-demo sessions were conducted for the staff so they would have the knowledge to participate before the final decision was made. there were daily internal newsletters conveying implementation news throughout the implementation months. in case 3, the entire library was involved with the selection of a new system. while the key staff (such as the circulation manager, acquisition manager, and reference manager) had more input than others, everyone offered input about the project. as such, buy-in for the new system was strong among all stakeholders. in case 4, staff users were involved early on through open forums and webinars. the following quotes are examples of interviewee sentiment concerning staff user involvement: "everybody is involved in choosing the system; partially because evergreen had been so problematic. we wanted to make sure that everyone is on board." (case 3) "migration is the most time consuming aspect of the library staff work during the time of the project, without their buy-ins, it is difficult to have a successful project." (case 4) interdepartmental communication the importance of effective communications across functional and departmental boundaries is well known in the information-systems-implementation literature.39 with consultants coming from the vendor, project team members coming from different functional areas, and staff users having different perceptions and understandings of the implementation project, the importance of effective communications between all involved cannot be overstated. communications should start early, be consistent and continuous throughout the various stages of the implementation process, and include a system overview, the rationale for implementation, briefings on process changes, and the establishment of contact points.40 expectations and goals should be communicated to all stakeholders and to all levels of the organization.41 the effectiveness of interdepartmental communication affected the implementation outcome in all our cases. in case 1, the library's project manager was designated to communicate with the vendor when issues arose, such as hardware and software configurations, system backup and use, and task assignments. the formal project plan was established using the web-based basecamp so that team members in different roles with different responsibilities could communicate and work together online. regular meetings were held and emails were exchanged between project team members. however, there was a lack of effective interdepartmental communication with staff who were not on the project team. this resulted in the absence of necessary system testing that would have detected some data-integrity issues. such issues later caused the system to be offline for days, which brought much frustration and stress to everyone. in the other three cases, all actors were well informed through news releases, meetings, presentations, and webinars. concerns were communicated to the project team and addressed in a timely manner. as a result, the level of frustration was very low for those three cases. data analysis and conversion a fundamental requirement for the effectiveness of an erp system is the accuracy of its data,42 and the same is true for a library system.
data types in a legacy ils are often of an outdated format and can differ from the formats supported by a new library system. conversion from one format to another can be an overwhelming process, especially when there is no existing expertise in the library. since migrating legacy data to the new system is essential, effective data analysis for conversion is a critical success factor for implementation success. the smoothness of each of the four implementation cases was related to the project team's data analysis and conversion efforts. in case 1, the library did not spend any effort to analyze, convert, or clean the data. as a result, the system experienced data-integrity issues after it went live. the other three libraries either devoted time to clean and convert the data or had a third party do the data cleaning. as a result, no system issues arose from data-integrity problems. interviewees from case 2 told us, "we elected to freeze the data 30 days sooner in terms of bibliographic data, so that we can do an authority control project with a third party vendor." project management and project tracking according to the erp implementation literature, effective project-management practices are critical for implementation success. such practices include defining clear objectives, establishing a formal implementation plan, designing a realistic work plan, and establishing resource requirements.43 the formal implementation plan needs to identify the modules to be implemented, the tasks to be undertaken, and all technical and nontechnical issues to be considered.44 project progress must be carefully monitored through meetings and reports.45, 46 effective project management and tracking affected the implementation outcome in all our cases. a popular project-management and tracking tool is basecamp, a web-based project management and collaboration tool initially released in 2004.47 it offers discussion boards, to-do lists, file sharing, milestone management, event tracking, and a messaging system that help project teams stay organized and connected despite their different locations. all cases used basecamp for project management and tracking, which contributed to on-time and on-budget project completion for all cases. staff user education and training a new system often frustrates users who do not receive adequate training in its functionalities and use.48 when feeling frustrated and stressed, users may avoid using the system. proper and adequate training will soothe users and eliminate their reluctance to use the new system, which in turn helps realize productivity gains.49, 50 training processes should consider factors such as the training curriculum, user commitment, trainers' personal skills and competence, as well as the training schedule, budget, evaluation, and methods.51 effective staff user training emerged as a critical success factor in all our cases. in case 1, staff users had access to a vendor-supplied preview portal, which simulated system functionalities. staff users were so familiar with the new system by the time the system went live that they were eager to engage with it. in cases 2, 3, and 4, staff users were trained through demo products, online video training, q&a, and on-site training sessions conducted by the vendor.
these training materials and sessions served to ease staff users' feelings of uncertainty and anxiety, as the following quotes show: "the online training videos were provided to all staff in the library and followed up with q&a sessions which members of the committee will host in their respective areas. . . . then ex libris did a week long onsite training workshop serve for the final deep configuration issues. . . . we know that there are staff users who want to be ahead of the game, yet there are always people who don't want to learn until the day before they go live." (case 2) "we have a training package with several onsite visits, each one is for a few days. the trainer focused on one aspect of the system. it was more than watching the videos online. because of the small staff here, almost everyone attended at least one training." (case 3) "the trainers varied with their expertise, we developed fondness for some more than others. the training is functional in nature. the vendor's priority was about trainer availability and to keep the project on time. we became familiar with trainers' expertise; we were able to request the right trainer with the job." (case 4) managing staff user emotions although education and training eases user anxiety, it does not completely eliminate it. emotions felt by users early in the implementation of a new system have important effects on the use of the system later on.52 how to manage staff user anxiety and negative emotions when they appear has emerged as a critical success factor in all our cases, as shown in the following quotes: "there were so many things going on in the library during the migration go-live week. the unknown of the migration success made staff users uncomfortable. should the migration date be decided in consideration of other initiatives, the frustration experienced would have been a lot less and might not have been ignored during the going-live week." (case 1) "the frustration was just change; it was the fact that we have to learn something new. . . . primarily the frustration was handled by the lead." (case 2) "there was a challenge, especially early on, in getting people to engage with the manuals and the literature in documentation. it is as if everyone is being asked to learn a new language. . . . the key relationship between the onsite coordinator and the project manager on the vendor side is important. when those two exchange information and handle frustration diplomatically, this bridge between the two organizations can smooth over a lot of rough feathers on either or both sides." (case 4) this final csf did not come directly from the ninety-plus csfs that we started with, although it aligned closely with the "change management" category.53 this csf emerged mostly from the interview process. summary of results the results of the case studies for each critical factor are summarized in table 2. implementation project outcome is summarized in table 3. an implementation is considered successful if it was completed on time and on budget and if the implementation process was smooth, as reflected in the number and degree of unexpected problems along the way.
critical success factor (case 1 / case 2 / case 3 / case 4):
careful selection process: no / yes / yes / yes
top management involvement: yes / yes / yes / yes
vendor support: yes / yes / yes / yes
project team competence: yes / yes / yes / yes
staff user involvement: no / yes / yes / yes
interdepartmental communication: no / yes / yes / yes
data analysis & conversion: no / yes / yes / yes
project management and tracking: yes / yes / yes / yes
staff user education and training: yes / yes / yes / yes
managing staff user emotions: no / yes / yes / yes
table 2. summary of case study critical success factors findings.
implementation success measure (case 1 / case 2 / case 3 / case 4):
on-time implementation: yes / yes / yes / yes
on-budget implementation: yes / yes / yes / yes
smoothness of implementation: no (staff users experienced data-integrity issues, system downtime, as well as anxiety and stress with the system implementation process) / yes / yes / yes
table 3. summary of case study implementation success measures.
discussion and conclusions the implementation of a new ils is a large-scale undertaking that affects every aspect of a library's operations as well as every staff user's workflow process. as such, it is imperative for library administrators to understand what factors contribute to a successful implementation. our qualitative study shows that there are two categories of csfs: strategic and tactical. from the strategic perspective, top management involvement, vendor support, staff user involvement, interdepartmental communication, and staff user emotion management are critical. from the tactical perspective, project team competence, project management and project tracking, data analysis and conversion, and staff user education and training to break down the technical barrier greatly affect the implementation outcome. in addition, selection of the final system from a variety of choices and options requires careful consideration of both strategic and tactical issues. each factor identified is important in its own right during the implementation process. combined, they complement each other to guide an implementation to success. among the list of csfs identified, the role of staff user emotion management was not identified during the theoretical phase of the study; it only emerged as an important csf during the interviews. top management involvement, vendor support, project team competence, project management and tracking, and staff user education and training are csfs that were somewhat intuitive, and they were implemented by all cases. however, a library may select an end system without careful consideration. it may also be unaware of the importance of involving users early on, the importance of opening clear lines of interdepartmental communication, or the importance of performing data analysis and conversion before the implementation. staff user emotion management, especially, is at risk of being an afterthought in an implementation. by identifying the most salient csfs, this study offers practical contributions to academic library leaders and administrators in understanding how critical success factors play a role in ensuring a smooth and successful ils implementation. although csfs have been extensively studied in the discipline of information-systems management, this is the first study to apply csfs in the library context.
since library management has unique challenges compared to businesses, identifying csfs for library-system-implementation success is important not only for the current migration to lsps but also for future migrations to future generations of ilss as the needs of libraries continue to evolve. as with any empirical research, there are limitations to this study. the number of academic libraries interviewed is small, although no new information was discovered after the fourth interview. the vendors represented in this study are only two of the many in the market providing lsps to libraries. with these aforementioned limitations, the results of this study may not be generalizable to libraries implementing an lsp with vendors other than innovative interfaces and ex libris. additionally, the results may not be generalizable to nonacademic libraries. this research can be extended to validate the proposed csfs quantitatively by performing survey research in academic libraries. studying interactions between the identified factors would offer an even greater contribution. this research can also be replicated in other types of libraries to test whether the findings generalize. in addition, case libraries 3 and 4 both expressed that the lsp changes the public interface that is used by external users, and they wished to have had more opportunities for outreach prior to the implementation. although the design and implementation of the public interface was not considered within the scope of this research, this comment is insightful because it may imply that future studies should consider a project champion to be a critical success factor. the project champion must have the people skills and positional authority to introduce change and achieve buy-in from staff users.54, 55 references 1. richard m. jost, selecting and implementing an integrated library system: the most important decision you will ever make (boston: chandos, 2015). 2. ibid., 3. 3. suzanne julich, donna hirst, and brian thompson, "a case study of ils migration: aleph500 at the university of iowa," library hi tech 21, no. 1 (2003): 44–55, http://dx.doi.org/10.1108/07378830310467391. 4. zahiruddin khurshid, "migration from dobis libis to horizon at kfupm," library hi tech 24, no. 3 (2006): 440–51, http://dx.doi.org/10.1108/07378830610692190. 5. vandana singh, "experiences of migrating to an open-source integrated library system," information technology & libraries 32, no. 1 (2013): 36–53. 6. jost, "selecting and implementing an integrated library system." 7. yongming wang and trevor a. dawes, "the next generation integrated library system: a promise fulfilled," information technology & libraries 31, no. 3 (2012): 76–84. 8. keith kelley, carrie c. leatherman, and geraldine rinna, "is it really time to replace your ils with a next-generation option?" computers in libraries 33, no. 8 (2013): 11–15. 9. vangie beal, "erp—enterprise resource planning," webopedia, http://www.webopedia.com/term/e/erp.html. 10. "library management system," tangient llc, https://libtechrfp.wikispaces.com/library+management+system. 11. christopher p. holland and ben light, "a critical success factors model for erp implementation," ieee software 16, no. 3 (1999): 30–36, http://dx.doi.org/10.1109/52.765784. 12. levi shaul and doron tauber, "critical success factors in enterprise resource planning systems: review of the last decade," acm computing surveys 45, no. 4 (2013): 1–39, http://dx.doi.org/10.1145/2501654.2501669.
13. yahia zare mehrjerdi, "enterprise resource planning: risk and benefit analysis," business strategy series 11, no. 5 (2010): 308–24, http://dx.doi.org/10.1108/17515631011080722. 14. mohammad a. rashid, liaquat hossain, and jon david patrick, "the evolution of erp systems: a historical perspective," in enterprise resource planning: global opportunities and challenges (hershey, pa: idea group, 2002). 15. marshall breeding, "library systems report 2014: competition and strategic cooperation," american libraries 45, no. 5 (2014): 21–33. 16. sharon yang, "from integrated library systems to library management services: time for change?" library hi tech news 30, no. 2 (2013): 1–8, http://dx.doi.org/10.1108/lhtn-02-2013-0006. 17. shahin dezdar, "strategic and tactical factors for successful erp projects: insights from an asian country," management research review 35, no. 11 (2012): 1070–87, http://dx.doi.org/10.1108/14637151111182693. 18. ibid. 19. shahin dezdar and ainin sulaiman, "successful enterprise resource planning implementation: taxonomy of critical factors," industrial management & data systems 109, no. 8 (2009): 1037–52, http://dx.doi.org/10.1108/02635570910991283. 20. sherry finney and martin corbett, "erp implementation: a compilation and analysis of critical success factors," business process management journal 13, no. 3 (2007): 329–47, http://dx.doi.org/10.1108/14637150710752272. 21. f. pearce, business building and promotion: strategic and tactical planning (houston: pearman cooperation alliance, 2004). 22. jennifer bresnahan, "mixed messages," cio (may 16, 1996), 72, http://dx.doi.org/10.1016/j.jchf.2013.07.005. 23. majed al-mashari, abdullah al-mudimigh, and mohamed zairi, "enterprise resource planning: a taxonomy of critical factors," european journal of operational research 146, no. 2 (2003): 352–64, http://dx.doi.org/10.1016/s0377-2217(02)00554-4. 24. shaul and tauber, "critical success factors in enterprise resource planning systems." 25. h. akkermans and k. van helden, "vicious and virtuous cycles in erp implementation: a case study of interrelations between critical success factors," european journal of information systems 11, no. 1 (2002): 35–46, http://dx.doi.org/10.1057/palgrave.ejis.3000418. 26. abdallah mohamed, guenther ruhe, and armin eberlein, "cots selection: past, present, and future" (paper presented at the 14th annual ieee international conference and workshops on the engineering of computer-based systems, 2007), http://dx.doi.org/10.1109/ecbs.2007.28. 27. m. michael umble, elisabeth j. umble, and ronald r. haft, "enterprise resource planning: implementation procedures and critical success factors," european journal of operational research 146, no. 2 (2003): 241–57, http://dx.doi.org/10.1016/s0377-2217(02)00547-7. 28. jim johnson, "chaos: the dollar drain of it project failures," application development trends 2, no. 1 (1995): 41–47.
29. prasad bingi, maneesh k. sharma, and jayanth k. godla, "critical issues affecting an erp implementation," information systems management 16, no. 3 (1999): 7–14, http://dx.doi.org/10.1201/1078/43197.16.3.19990601/313. 30. mary sumner, "critical success factors in enterprise wide information management systems projects," in proceedings of the 1999 acm sigcpr conference on computer personnel research (new york: acm, 1999), http://dx.doi.org/10.1145/299513.299722. 31. eric t. g. wang et al., "the consistency among facilitating factors and erp implementation success: a holistic view of fit," journal of systems & software 81, no. 9 (2008): 1609–21, http://dx.doi.org/10.1016/j.jss.2007.11.722. 32. dong-gil ko, laurie j. kirsch, and william r. king, "antecedents of knowledge transfer from consultants to clients in enterprise system implementations," mis quarterly 29, no. 1 (2005): 59–85. 33. al-mashari, "enterprise resource planning." 34. fiona fui-hoon nah and santiago delgado, "critical success factors for enterprise resource planning implementation and upgrade," journal of computer information systems 46, no. 5 (2006): 99–113. 35. liang zhang et al., "a framework of erp systems implementation success in china: an empirical study," international journal of production economics 98, no. 1 (2005): 56–80, http://dx.doi.org/10.1016/j.ijpe.2004.09.004. 36. ann-marie k. baronas and meryl reis louis, "restoring a sense of control during implementation: how user involvement leads to system acceptance," mis quarterly 12, no. 1 (1988): 111–24. 37. joseph esteves, joan pastor, and joseph casanovas, "a goals/questions/metrics plan for monitoring user involvement and participation in erp implementation projects," ie working paper, march 11, 2004, http://dx.doi.org/10.2139/ssrn.1019991. 38. khaled al-fawaz, zahran al-salti, and tillal eldabi, "critical success factors in erp implementation: a review" (paper presented at the european and mediterranean conference on information systems, dubai, may 25–26, 2008). 39. h. akkermans and k. van helden, "vicious and virtuous cycles in erp implementation: a case study of interrelations between critical success factors," european journal of information systems 11, no. 1 (2002): 35–46, http://dx.doi.org/10.1057/palgrave.ejis.3000418. 40. nancy bancroft, henning seip, and andrea sprengel, implementing sap r/3: how to introduce a large system into a large organisation (greenwich, uk: manning, 1998). 41. nah, "critical success factors."
42. toni m. somers and klara nelson, "the impact of critical success factors across the stages of enterprise resource planning implementations," in proceedings of the 34th hawaii international conference on system sciences, 2001, http://dx.doi.org/10.1109/hicss.2001.927129. 43. shi-ming huang et al., "assessing risk in erp projects: identify and prioritize the factors," industrial management & data systems 104, no. 8 (2004): 681–88, http://dx.doi.org/10.1108/02635570410561672. 44. nah, "erp implementation." 45. umble, "enterprise resources planning." 46. nah, "erp implementation." 47. "basecamp, in a nutshell," basecamp, https://basecamp.com/about/press. 48. nah, "erp implementation." 49. umble, "enterprise resources planning." 50. mo adam mahmood et al., "variables affecting information technology end-user satisfaction: a meta-analysis of the empirical literature," international journal of human-computer studies 52, no. 4 (2000): 751–71, http://dx.doi.org/10.1006/ijhc.1999.0353. 51. iuliana dorobat and floarea nastase, "training issues in erp implementations," accounting & management information systems 11, no. 4 (2012): 621–36. 52. anne beaudry and alain pinsonneault, "the other side of acceptance: studying the direct and indirect effects of emotions on information technology use," mis quarterly 34, no. 4 (2010): 689–710. 53. shaul and tauber, "critical success factors in enterprise resource planning systems." 54. andrew lawrence norton et al., "ensuring benefits realisation from erp ii: the csf phasing model," journal of enterprise information management 26, no. 3 (2013): 218–34, http://dx.doi.org/10.1108/17410391311325207. 55. chong hwa chee, "human factor for successful erp2 implementation," new straits times, july 28, 2003, https://www.highbeam.com/doc/1p1-76161040.html.
enhancing visibility of vendor accessibility documentation samuel kent willis and faye o'reilly information technology and libraries | september 2018 samuel kent willis (samuel.willis@wichita.edu) is assistant professor and technology development librarian and faye o'reilly (faye.oreilly@wichita.edu) is assistant professor and digital resources librarian at wichita state university. abstract with higher education increasingly being online or having online components, it is important to ensure that online materials are accessible for persons with print and other disabilities. library-related research has focused on the need for academic libraries to have accessible websites, in part to reach patrons who are participating in distance-education programs. a key component of a library's website, however, is the materials it avails to patrons through vendor platforms outside the direct control of the library, making it more involved to address accessibility concerns. librarians must communicate the need for accessible digital files to vendors so they will prioritize it.
in much the same way as contracted workers constructing a physical space for a federal or federally funded agency must follow ada standards for accessibility, so software vendors should be required to design virtual spaces to be accessible. a main objective of this study was to determine a method of increasing the visibility of vendor accessibility documentation for the benefit of our users. it is important that we, as service providers for the public good, act as a bridge between vendors and the patrons we serve. introduction the world wide web was developed late in 1989 but reached the public sector the following year and quickly gained prominence.1 around this same time (1990), the americans with disabilities act (ada) was also passed, so when it was written, the role of the web had yet to take shape. websites and online content, while not included specifically in the ada, have been increasingly emphasized when institutions examine the accessibility of their resources for persons with disabilities. more recent legislation, as well as legal-settlement agreements (including with colleges and universities), have included—and even emphasized—the importance of accessible online content. researchers have argued that in requiring facilities to be accessible, the ada must include digital accessibility.2 with higher education increasingly being online or having online components, it is important to ensure that online materials are accessible for persons with print and other disabilities, many of whom may have received more extensive support in primary and secondary schools. unless accessibility is pursued with purpose, the level of education and educational materials available for students with disabilities will be severely limited.3 literature review legislation and existing guidelines equal access to information for all patrons is a foundational goal of libraries. in higher education, accessible information and communications technology allows users of all abilities to focus on learning without undue burden.4 colleges and universities are required by law to provide reasonable accommodations to allow an individual with a disability to participate fully in the programs and activities of the university. according to title ii of the ada, discrimination on the basis of disability by any state or local government and its agencies is strictly prohibited.5 section 504 of the rehabilitation act of 1973 also prohibits discrimination on the basis of disability by any program or activity receiving federal assistance.6 the department of education stated, "public educational institutions that are subject to education's section 504 regulations because they receive federal financial assistance from us are also subject to the title ii regulations because they are public entities (e.g., school districts, state educational agencies, public institutions of vocational education and public colleges and universities)."7 this piece of legislation usually manifests itself in the physical learning space—wheelchair ramps, braille textbook options, interpreters, and more—but finds little application in the digital spaces of a university, especially in the library's online research presence.
this is an alarming revelation; much higher learning today takes place in an online environment, and inaccessible library resources are a contributing factor to the challenges in higher education faced by users with disabilities. to be considered accessible, a digital space, such as a website, online-learning management system, or a research discovery layer, and any word documents, pdfs, and multimedia presented therein, should be formatted in such a way that it is compatible with assistive technologies, such as screen-reading software. a website should also be navigable without a mouse, using visual or auditory cues. content on a website ought to be clearly and logically organized, with skip-navigation links to jump directly to the page's main content. images should have alternative text descriptions, known as "alt text," that are brief and informative, describing the content and role of the image. links should likewise have clear descriptions of the target page. these and similar considerations aim to help persons with impairments that may make reading a monitor or screen difficult.8 digital spaces like a research database are considered electronic information technology (eit). eit is defined as "information technology and any equipment or interconnected system or subsystem of equipment that is used in the creation, conversion or duplication of data or information."9 recently this terminology has been converted to information and communications technology (ict) as per the final rule updating section 508 in early 2017, but the essence of what it means remains unchanged.10 legislation regarding digital accessibility exists, specifically section 508 of the rehabilitation act of 1973, but only federal agencies and institutions receiving federal aid are required to abide by these statutes. lawmakers considered technology a growing part of daily life in 1998 and amended the rehabilitation act with section 508, requiring federal agencies to make their ict accessible to people with disabilities.11 in 2017, these standards were updated with a final rule that modernized guidelines for accessibility of future ict.12 any research databases or other applications used by college and university libraries to facilitate online learning would be considered ict and thereby subject to section 508 requirements. it is evident that libraries not only have legal reasons to comply with section 508 but ethical reasons as well, because making library collections and services universally available is a core value of the library community.13 in addition to legislation, the world wide web accessibility initiative (wai) created the web content accessibility guidelines (wcag) in 1999 in response to the growing need for web accessibility and to promote universal design. these standards, created for web-content creators and web-tool developers, are continually updated as new technologies and capabilities emerge—with version 2.0 being released in 2008—and apply specifically to web content and design. many of these guidelines were absorbed by the 2017 refresh of section 508 of the rehabilitation act of 1973.14 with fourteen guidelines assigned priority levels 1–3, wcag 2.0 and subsequent revisions to date offer three levels of conformance with digital-accessibility guidelines: level a, the most basic level, meaning all mandatory level 1 guidelines are met; level aa, meaning priority levels 1 and 2 are met; and level aaa, meaning priority levels 1–3 are met.
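the markup-level expectations described above (alt text on images, descriptive link text, and skip-navigation links) can be checked, at least roughly, in an automated way. the following is a minimal illustrative sketch, not drawn from this study, that uses python and the beautifulsoup library to flag a few such issues in a small hypothetical html fragment; a real audit would pair checks like these with wcag-aware tools and manual screen-reader testing.

# minimal sketch: flag a few of the accessibility markers discussed above.
# the html fragment is hypothetical and the checks cover only a small subset of wcag.
from bs4 import BeautifulSoup

sample_html = """
<html><body>
  <a href="#main">skip to main content</a>
  <img src="logo.png">
  <a href="/report.pdf">click here</a>
  <div id="main"><p>database search results ...</p></div>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# images should carry brief, informative alternative text
for img in soup.find_all("img"):
    if not img.get("alt"):
        print("image missing alt text:", img.get("src"))

# link text should describe the target page rather than saying "click here"
for link in soup.find_all("a"):
    if link.get_text(strip=True).lower() in {"click here", "here", "more"}:
        print("non-descriptive link text:", link.get("href"))

# a skip-navigation link lets keyboard users jump past repeated navigation
if not any("skip" in a.get_text(strip=True).lower() for a in soup.find_all("a")):
    print("no skip-navigation link found")

automated checks of this kind approximate only a handful of criteria; as the next paragraphs explain, conformance claims and usability testing are still needed to judge whether content is truly accessible.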
these conformance levels are important because many ict vendors make their claims of conformance with wcag standards by displaying the wai-provided icons or by using statements that refer to the level of conformance.15 wcag 2.0 guidelines alone are not enough to determine fully if a website or other digital content is truly accessible. accessibility also depends in part on having an intuitive layout for a variety of users, which can only be verified through usability testing.16 it is crucial that librarians understand what is required for a product or service to be considered accessible, and a firm grasp of wcag 2.0 and its conformance levels will enrich a librarian's understanding of web accessibility and section 508 regulations.17 a voluntary product accessibility template (vpat) is a self-assessment document that vendors are required to complete only if they wish to sell their products to the federal government or any institution that chooses to require them. the quality of vpats varies, but essentially they list the section 508 standards and specify, for each one, whether the product fully supports it, partially supports it, or does not support it, or whether the standard is not applicable. there is then a space for the vendor to provide an explanation of limitations. since these are voluntary self-assessments, these documents can sometimes be brief and incomplete, but even brief statements can be specific enough to relatively easily verify the claims of support. because libraries are portals to online content, including e-books, e-journals, databases, streaming media, and more, which are provided largely by third-party vendors, libraries face unique struggles when attempting to comply with federal regulations. notions of equality and equal access are inherent to libraries and important for the maintenance of a democratic society, which makes accessibility within libraries' digital content a concerning ethics issue.18 having little control over how ict is designed, libraries still must figure out how to address accessibility needs within third-party ict. in 2012, the association of research libraries (arl) joint task force on services to patrons with print disabilities encouraged libraries to require publishers to implement industry best practices, comply with legal requirements for accessibility, include language in publisher and vendor contracts to address accessibility, and request documentation like vpats.19 the task force's report was vital in the creation and direction of this study. existing literature and studies as library professionals, we may often make assumptions about the accessibility of a third-party resource when the reality is that greater importance is placed on the design of a product; accessibility components are either added as special features or included once the design work is completed.20 tatomir and durrance conducted a study on the compatibility of thirty-two library databases with a set of guidelines for accessibility they called the tatomir accessibility checklist.21 this list included checking the usability of these databases with a screen reader and a refreshable braille display. they found that 44 percent of the databases were inaccessible, with an additional 28 percent being only "marginally accessible," based on their criteria.
this suggests major problems exist within vendor database platforms.22 building on this research, western kentucky university libraries conducted a study on vpats from vendors to determine how accessible seventeen of their databases were.23 the university libraries ran an accessibility scan on those databases and compared the results with the vendors' vpats, finding that the templates from the vendors were accurate about 80 percent of the time. most of the vendors did not address the accessibility of portable document format (pdf) files in their vpat statements, though it was an important component of their services. pertinent to this study, western kentucky's work looked for accessibility documentation on vendors' websites and, when none was found, contacted the vendors requesting this information. this study was unique for targeting vendor-supplied vpats rather than only examining the databases themselves or tutorials from vendors. as mentioned previously, this was only done for the libraries' main database vendors. mune and agee published an article on the ebooks accessibility project (eap) funded by affordable learning solutions at the california state university system. in this project, the researchers compared academic e-book platforms to e-reader platforms used for popular trade publications. they gathered data on the top sixteen library e-book vendors at san jose state university based on patron usage and title and holdings counts. the results indicated that academic e-book platforms were less accessible than nonacademic platforms, largely because of hesitance in adopting the epub 3 format, which by default has superior navigation and document structure to pdf or html, the common academic options.24 while this study focused solely on the accessibility of e-book materials, a method for contacting vendors used in the eap study was adapted for the current study and applied at a larger scale. the eap researchers attempted to locate the vendors' vpats online, and they contacted the vendors at least twice to request a vpat or other accessibility statement when none was located. it is noteworthy that of the sixteen vendors, all but one (94 percent) provided eap with some form of accessibility documentation, though less than half (44 percent) had a vpat available.25 another study, by joanne oud, examined vendor-supplied database video tutorials. half of the twenty-four vendors examined in oud's study had tutorials in formats that were not accessible by keyboard or screen reader. this was largely because many of these tutorials were flash-based.26 shockwave flash is neither accessible for persons with disabilities nor good for usability on modern browsers.27 oud's findings suggest that tutorial content would be more widely accessible if it were placed on youtube or another platform that had transcripts and captions available. while the focus of the study was different from our own, it was similar in that oud examined the accessibility of vendor materials apart from the journals and collections. also, oud noted that to make use of vendor tutorials, the website on which they are housed must likewise be accessible and the videos easy to find, but this is often not the case.28 other studies suggest that vendor websites and platforms often impede access to information.
vendor platforms often have inaccessible pdfs, or the links to the full-text options are not easily located. delancey's study also found that more than three-fourths of the vendors examined had images without alternative text and frames without titles, resulting in many users with visual impairments being left out of the content of these images and frames entirely. of particular note, however, was the finding that not one of the vendors in this study had all forms—buttons, search boxes, and other browser navigation tools—labeled correctly, leaving the sites difficult to navigate.29 beyond whether the information itself is accessible, the question inevitably arises, can the desired information even be reached? one way or another, the content on these platforms must be accessible and easy to find. part of the motivation behind the current study stems from what delancey put so well: "only one vendor (out of seventeen), project muse, had a publicly available vpat on their website, though 9 others supplied this documentation upon request in under a week."30 the first step in improving accessibility of resources for our patrons is to discuss accessibility with them—to determine how accessible information resources are today and identify areas of need. if a vpat or, minimally, any form of an accessibility statement is not easily discoverable on a vendor's website—even if it is available upon request—users with disabilities as well as other users are not able to benefit from this information. are the vendors making it a priority in this case? additionally, since 41 percent of the vendors delancey examined had no vpat at all, what can be done before and aside from reaching out to vendors and stressing the importance of accessibility and of making statements on accessibility easy to find? from legal responsibilities to the dismal reality of digital accessibility, the task of improving library service for patrons with disabilities is daunting, even with the empowering ethical drivers of the library value system. ostergaard created "strategies for acquiring accessible electronic information sources," an excellent guide that helps librarians develop an accessibility plan, informed by her own library's commitment to accessibility. steps 3 and 4 of ostergaard's strategies are particularly relevant to the current study. step 3, "communicating with vendors," involves inquiring about the accessibility of electronic products, asking about any future plans for the accessibility of the product, and requesting vpats or other vendor-supplied accessibility documentation. step 3 also recommends that librarians request that vendors meet wcag 2.0 best practices and incorporate a clause in license agreements that clearly defines the accessibility of their products as further demonstration of dedication to accessibility. such communication, it is hoped, would also lead to improved product development.31 once vendors are contacted, ostergaard outlines in step 4 the importance of documenting vendor communication regarding digital accessibility and further suggests assigning a person or team to review the information received. ostergaard's library changed the name of their acquisitions budget to "access budget," reallocating a portion of their budget to review existing subscriptions, purchase accessible replacements, or, in some cases, convert materials to an accessible format.
the documentation review allowed the library to make informed decisions about collections and service availability on behalf of library users, but no mention was made of involving users in this process. the article provided a letter template that encompassed the aforementioned concepts and a request for assessment documentation, such as vpats and official statements of compliance. the ostergaard template served as a foundation for the language used in vendor communication for the current study, particularly the request for a vpat or other accessibility documentation.32 there have been no studies that suggest a way to implement easily discoverable vendor accessibility documentation—even when said documentation is not readily available to the public on the vendors' sites. delancey suggested creating "an open repository for both vendor supplied documentation, and the results of any usability testing," but this was suggested for internal library use, not public dissemination.33 if this documentation is made more easily available, we can increase patron involvement in the discussion of the accessibility of vendor-supplied library resources. research methods library-related research has focused on the need for academic libraries to have accessible websites, in part to reach patrons who are participating in distance-education programs.34 a key component of a library's website is the materials it avails to patrons from vendors, like databases and database aggregators. since, however, these materials are accessed via vendor platforms, they are outside the direct control of the library, making it more difficult to address accessibility concerns. some vendors have put forward significant effort in addressing accessibility needs. some offer a built-in text-to-speech feature for html files or provide documents in a variety of formats, including txt and mp3 files, thereby offering a format that works well with common screen-reading programs or providing a sound file directly. this is of particular benefit to patrons with print disabilities.35 other vendors, such as ebook central (formerly ebrary), have worked to eliminate their flash dependencies. this is recognized as a positive step toward making vendor content usable for all. streaming video and other nonprint-based library materials must also be accessible. a person with visual impairments may be able to hear the soundtrack of a video, but unless an accurate description is provided of what is being presented visually, he or she will miss out on information such as the names of those speaking. to complicate matters further, hearing-impaired users of these databases will not be privy to what is verbalized unless accurate captions and transcripts, or an interpreter, are made available for the videos. captions and transcripts are sometimes made available but can easily be incomplete or incorrect. for example, alexander street press provided closed captioning and transcripts for some collections but not others. even when the captions or transcripts existed, as with a video we tested from ethnographic videos online, they were of low quality, transcribing the word "object" as "old pics," "house" as "mess," and so forth. one vendor, docuseek, had subtitles to translate from spanish, but no closed captioning or transcript available.
hearing-impaired users could not make full use of the video because the subtitles did not include all the information presented in the soundtrack. (transcripts can also be useful to visually impaired users using screen readers.) films on demand had better captions and transcripts, but the transcripts did not include all the words on the screen, such as the title. regardless of the medium, there are multiple ways to provide accessible versions, but they are seldom automatic. librarians must communicate the need for accessible digital files to vendors so that vendors will prioritize it. as long as libraries—one of vendors' main customer groups—accept these offerings whether or not they are accessible to persons with disabilities, vendors have little reason to put great effort into making these improvements.

as colker pointed out, commercial vendors are not required to comply with ada regulations under title ii or title iii.36 vendors may also face resource restrictions that hinder their ability to improve their platforms' accessibility.37 they are businesses, so it is natural that they would commit a concerted effort to reformat and enhance their platforms and records only if the benefits are expected to outweigh the costs; they must first be made aware of the issue and know that it is important to libraries and their patrons. in much the same way that contracted workers constructing a physical space for a federal or federally funded agency must follow ada standards for accessibility, software vendors should be required to design virtual spaces to be accessible. this comparison was made by the department of education more than twenty years ago, and designing for accessibility from the start has the added benefit of greatly reducing the need for accommodation after the fact.38 according to cardenes, "at a minimum, a public entity has a duty to solve barriers to information access that the public entity's purchasing choices create."39 oswal stressed the importance of integrating the blind user experience into the development of databases from the beginning, as well as identifying steps useful for guiding library users after the fact. merely following the rules set out in federal regulations is not enough to provide exemplary service to library patrons; patrons themselves must be involved in the process to fully address accessibility needs.40

process and findings

the first objective of this study was to gain a better understanding of the accessibility of our library's vendor-provided digital resources through a review of vendor-provided accessibility documentation. the second objective was to determine a method of increasing the visibility of accessibility documentation for the benefit of our users and to communicate to them our commitment to improving service to users with disabilities. with a digital collection consisting of 270 databases, more than 750,000 e-books and e-journals, and more than 12 million streaming media titles, it was difficult to identify an appropriate sample. we needed a collection that would serve as an illustrative cross-section of our library's digital holdings and, more importantly, one that would have the largest impact on our users. we also needed to establish a strategy for obtaining accessibility documentation regarding third-party content and to create a delivery method for the vpats and other documentation we discovered in the course of the study.
similar to other institutions, our library maintains a directory of the most used and most useful databases on the library's homepage in the form of the a–z list (http://libresources.wichita.edu/az.php). determinations of usefulness are based on input from our reference librarians, who connect with user needs directly, while use is determined from annual usage statistics compiled per standard library procedures. users can browse this directory by subject, search by title, and sort by database type (full-text, streaming media, etc.), and the a–z list is a convenient place for users to begin their research. the directory also served as a convenient place to begin this study, as it presented us with a sample that not only reflected the needs and habits of our patrons but also offered an excellent and diverse list of vendors to work with. beginning with a list of all subscribed databases (270 in 2016) exported directly from the a–z list's backend, we sorted the list by vendor and determined that 74 vendors would be investigated. university materials indexed by the directory (i.e., the institutional repository and libguides) were excluded from this study.

as visibility of accessibility documentation is the concern of this study, our investigation began by visiting each database or vendor site and conducting a web search to obtain any information about accessibility. we were looking for mentions of the following keywords: "section 508" or "section 504," "w3" or "wcag," "vpat," "ada," and simply "accessibility." some sites were intuitive: thirty-four vendors (45 percent) had statements that were found online. examples of commonly used documentation, which for the purposes of this study will be referred to as accessibility statements, included "accessibility policy," "section 508 compliance," or "accessibility statement." of those thirty-four vendors who posted accessibility documentation online, eleven provided a vpat, or a link to one, in their accessibility statements.

if we could not find an accessibility statement on the site, the vendor was first contacted via email requesting information and documentation regarding the accessibility of its product, using a form letter inspired by the ostergaard template.41 the email address was either found online—typically through "contact us" or technical-support links—or taken from the list of vendor contacts maintained in the library management system if no other contact could be found. if a response was not received within thirty days, the vendor was contacted a second time, a suggestion gleaned from mune and agee's work.42 after all vendors included in the study had been contacted, any who did not provide a vpat were contacted a final time with a specific request for a vpat. for vendors who responded that they could not provide a vpat or other accessibility statement, we used a screenshot of their response as documentation. the form letter (see appendix a) used in the current study made it known to vendors that their responses would be posted publicly for the benefit of our users. twelve of the remaining vendors responded to our email inquiries with vpats, and seven vendors responded with other accessibility documentation.

figure 1. results of vendor query for accessibility documentation.
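to make the keyword scan described above concrete, the sketch below shows one way such a check could be scripted. it is a minimal illustration under our own assumptions, not the procedure used in the study (which was carried out by hand); the vendor names, urls, and helper function are hypothetical.

# minimal sketch; hypothetical vendor urls, written only to illustrate the keyword scan
# note: plain substring matching is crude (e.g., "ada" also matches "metadata"); shown for brevity
import requests

ACCESSIBILITY_KEYWORDS = [
    "section 508", "section 504", "w3", "wcag", "vpat", "ada", "accessibility",
]

def find_accessibility_mentions(url):
    """return the accessibility-related keywords found on a vendor page."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    page_text = response.text.lower()
    return [keyword for keyword in ACCESSIBILITY_KEYWORDS if keyword in page_text]

if __name__ == "__main__":
    # stand-ins for the 74 vendor sites reviewed in the study
    vendor_pages = {
        "example vendor a": "https://vendor-a.example.com/accessibility",
        "example vendor b": "https://vendor-b.example.com/about",
    }
    for vendor, url in vendor_pages.items():
        hits = find_accessibility_mentions(url)
        print(vendor + ": " + (", ".join(hits) if hits else "no accessibility keywords found"))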
in total, eleven vpats (15 percent) were found online, and vpats from twelve vendors (16 percent) were received in response to our emailed requests. twenty-three vendors (31 percent) had other accessibility documentation available online, while seven vendors (9 percent) provided other accessibility documentation in response to email inquiries. eight vendors (11 percent) responded that they had no official statements or documentation to offer, and thirteen vendors (18 percent) did not respond (see figure 1).

with the documentation compiled, we needed to establish an appropriate delivery system that would make this accessibility information visible to library users and thereby further the accessibility effort. our collection cross-section, the a–z list, was chosen as a suitable location not only to store but also to convey this documentation to users because of its prominence in our library's online research presence. we created a clickable icon to be embedded in the databases' entries in our a–z list, which is built in libguides (a springshare product). clicking the icon takes the user to the vendor's statement page, directly to the vpat, or to a page we created in libguides to store screen captures of vendor emails and the vpats we received as attachments. if a vpat was available, we linked to it above any other documentation because vpats present a more rigorous analysis of the accessibility of third-party-created ict. libguides was determined to be a suitable place to house this documentation not only because it makes the information easy for patrons to find, but also because springshare has built libguides in an increasingly accessible manner and has documented its efforts with vpats for each product (see appendix b).
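the mapping behind those icons is simple: each a–z entry points to the preferred piece of documentation for its vendor, with a vpat taking precedence over any other statement. the links themselves were added through the libguides interface; the sketch below only illustrates that preference logic with a hypothetical csv of findings, a hypothetical icon url, and markup of our own invention.

# minimal sketch; the csv file, icon url, and markup are hypothetical illustrations
import csv

ICON_URL = "https://library.example.edu/images/accessibility-icon.png"  # hypothetical

def preferred_link(row):
    """prefer a vpat link over any other accessibility statement."""
    return row.get("vpat_url") or row.get("statement_url") or ""

def icon_markup(database, link):
    """build clickable-icon markup for an a-z entry, with descriptive alt text."""
    return ('<a href="' + link + '">'
            '<img src="' + ICON_URL + '" alt="accessibility documentation for ' + database + '"></a>')

with open("vendor_accessibility.csv", newline="") as handle:  # hypothetical file
    for row in csv.DictReader(handle):
        link = preferred_link(row)
        if link:
            print(icon_markup(row["database"], link))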
further study

it is expected that some of the information provided by the vendors is incomplete or inaccurate, even despite their best efforts, so the information we provide to patrons from and about the vendors might at times lead our patrons astray. we briefly examined the vpats acquired through this project to inform our work moving forward and found errors in at least half of them. some vendors claimed that skip navigation was available when none was found, while another would have benefited from it but marked it "not applicable." others were too brief to be useful, as no explanations were given for their claims. building on the current research, we intend, in collaboration with patrons with disabilities, to further verify the accuracy of key statements made by vendors in their vpats and other accessibility documentation. this analysis will give vendors concrete feedback on how their sites could be further improved. as stated earlier, giving patrons access requires more than following a set of guidelines; it requires dialog to ensure their needs are fully met.43 it also requires testing the platforms used to retrieve documents, not just making the documents themselves accessible.

as one author put it so well, "a lack of technological access is a solvable problem, but only if it is made a priority."44 because vendors are not directly subject to enforcement of section 508 and other statutes regarding the accessibility of the products they provide to libraries, vpats are truly voluntary. as such, the level of effort and detail in the product assessments is inconsistent, and the accuracy of the documentation is questionable. we intend to remain involved in the digital-accessibility initiative, in part through analysis of our digital-library presence, utilizing user input and expanding users' role in improving the user experience. this will enable us to further improve our library's service to users with disabilities. if we, as library professionals and institutions, stand together and each say our part, vendors will realize this is an important issue to address. it is also important that we, as service providers for the public good, act as a bridge between these vendors—who at times do not make useful service information available to their customers—and the patrons we serve. it may be a small step, but providing links to vendors' vpats and other accessibility statements right where patrons need them is an important way of meeting patrons where they are and showing them that help is available. we can show patrons that we care and will work with them to improve the currently limited accessibility not only of scholarly information itself, but also of the platforms in which it is housed.

appendix a: accessibility documentation request email template

subject line: vpat request

thank you for the information you provided answering our inquiry regarding the accessibility of your electronic product. wichita state university libraries has set a goal of improving the accessibility of the electronic and information technology we provide to our patrons. in accordance with section 504 of the rehabilitation act and title ii of the americans with disabilities act, do you happen to have a voluntary product accessibility template (vpat) available, or have you made plans to do further accessibility testing on your product? the vpat documentation can be found on the u.s. department of state website: http://www.state.gov/m/irm/impact/126343.htm.

appendix b: vpat and other accessibility documentation urls used in the databases a–z list

(list current as of october 20, 2017. library subscriptions may have changed. vendors may have updated urls or added additional documentation since october 20. research on this project is ongoing. please see http://libresources.wichita.edu/az.php for a current list of vendor accessibility documentation.)
vendor: accessibility documentation url or status

aapg (american association of petroleum geologists): no accessibility documentation available
abc-clio: no response
acls (american council of learned societies): http://www.humanitiesebook.org/about/for-librarians/#ada-compliance-and-accessibility
acm (association of computing machinery): https://www.acm.org/accessibility
acs (american chemical society): https://www.acs.org/content/acs/en/accessibility-statement.html
adam matthew digital: http://libresources.wichita.edu/c.php?g=583127&p=4026332
aiaa (american institute of aeronautics & astronautics): http://libresources.wichita.edu/ld.php?content_id=32264954
alexander street press: https://alexanderstreet.com/page/accessibility-statement
american institute of physics: http://www.scitation.org/faqs
american mathematical society: http://www.ams.org/about-us/vpat-mathscinet-2014-ams.pdf
apa (american psychological association): http://www.apa.org/about/accessibility.aspx
asm international: no response
asme (american society of mechanical engineers): no accessibility documentation available
astm: no accessibility documentation available
bioone: http://www.bioone.org/page/resources/accessibility
books 24x7: https://documentation.skillsoft.com/bkb/qrc/assistiveqrc.pdf
britannica: http://help.eb.com/bolae/accessibility_policy.htm
business expert press: http://media2.proquest.com/documents/ebookcentral_vpat.pdf
cabell's: no response
cambridge crystallographic data centre: https://www.ccdc.cam.ac.uk/termsandconditions/
cambridge university press: http://www.cambridge.org/about-us/accessibility/
cas: no accessibility documentation available
clcd (children's literature comprehensive database): no response
conference board: http://www.conferenceboard.ca/accessibility/resources.aspx?aspxautodetectcookiesupport=1
cq press: http://library.cqpress.com/cqresearcher/html/public/vpat.html
credo reference: https://credoreference.zendesk.com/hc/en-us/articles/201429069-accessibility
datazoa: http://libresources.wichita.edu/accessibilitystatements/datazoavpat
docuseek2: https://docuseek2.wikispaces.com/section+508+compliance+statement
ebsco: https://www.ebscohost.com/government/full-508-accessibility
ei engineering village: https://www.elsevier.com/solutions/engineering-village/features/accessibility
elsevier: https://www.elsevier.com/solutions/sciencedirect/support/web-accessibility
gale: https://support.gale.com/technical/618
google: https://www.google.com/accessibility/initiatives-research.html
hathitrust: https://www.hathitrust.org/accessibility
heinonline: https://www.wshein.com/accessibility/
ibisworld: no response
ieee: https://www.ieee.org/accessibility_statement.html
infobase learning: http://support.infobaselearning.com/index.php?/tech_support/knowledgebase/article/view/1318/0/ada-usability-statement
infogroup: http://libresources.wichita.edu/c.php?g=583127&p=4286285
institute of physics: http://iopscience.iop.org/page/accessibility
interdok: no response
jstor: https://about.jstor.org/accessibility/
kanopy: https://help.kanopystreaming.com/hc/en-us/articles/210691557-what-is-kanopy-s-position-on-accessibility
lexisnexis: http://www.lexisnexis.com/gsa/76/accessible.asp
library of congress: https://www.congress.gov/accessibility
mergent: no accessibility documentation available
national academies press: no response
national library of medicine: https://www.nlm.nih.gov/accessibility.html
naxos: http://libresources.wichita.edu/c.php?g=583127&p=4287131
ncjrs: https://www.justice.gov/accessibility/accessibility-information
newsbank: http://libresources.wichita.edu/c.php?g=583127&p=4457078
oclc: https://www.oclc.org/en/policies/accessibility.html
ovid: http://ovidsupport.custhelp.com/app/answers/detail/a_id/5909/~/is-the-ovid-interface-section-508-compliant%3f
oxford university press: https://global.oup.com/academic/accessibility/?cc=us&lang=en&
projectmuse: https://muse.jhu.edu/accessibility
proquest: http://media2.proquest.com/documents/proquest_academic_vpat.pdf, http://media2.proquest.com/documents/ebookcentral_vpat.pdf
readex: http://uniaccessig.org/lua/wp-content/uploads/2014/11/readex.pdf
sage: https://us.sagepub.com/en-us/nam/accessibility-0
salem press: no response
sbrnet: no response
springer: https://github.com/springernature/vpat/blob/master/springerlink.md
standard & poor's: no response
swank: no accessibility documentation available (http://libresources.wichita.edu/accessibilitystatements/swankaccessibility)
taylor & francis: http://libresources.wichita.edu/c.php?g=583127&p=4539268
thomson reuters: https://clarivate.com/wp-content/uploads/2018/02/pacr_wos_5.27_jan-2018_v1.0.pdf
us department of commerce: http://osec.doc.gov/accessibility/accessibliity_statement.html
us department of education: https://www2.ed.gov/notices/accessibility/index.html
us government printing office: https://www.gpo.gov/accessibility
university of chicago: no accessibility documentation available
university of michigan: https://www.press.umich.edu/about#accessibility
uptodate: http://libresources.wichita.edu/c.php?g=583127&p=4691631
valueline: http://libresources.wichita.edu/accessibilitystatements/valuelineaccessibility
wrds (wharton research data services): https://wrds-www.wharton.upenn.edu/pages/wrds-508-compliance/
wiley: http://olabout.wiley.com/wileycda/section/id-406157.html

references

1 neil savage, "weaving the web," communications of the acm 60, no. 6 (june 2017): 22.
2 ruth colker, "the americans with disabilities act is outdated," drake law review 63, no. 3 (2015): 799.
3 colker, "the americans with disabilities act," 817; joanne oud, "accessibility of vendor-created database tutorials for people with disabilities," information technology and libraries 35, no. 4 (2016): 13–14.
4 laura delancey and kirsten ostergaard, "accessibility for electronic resources librarians," serials librarian 71, no. 3–4 (2016): 181, https://doi.org/10.1080/0361526x.2016.1254134.
5 americans with disabilities act of 1990, pub. l. no. 101-336, 104 stat. 327 (1990).
6 rehabilitation act of 1973, pub. l. no. 93-112, 87 stat. 355 (1973).
7 discrimination on the basis of disability in federally assisted programs and activities, 77 fed. reg. 14,972 (march 14, 2012) (to be codified at 34 cfr pt. 104).
8 delancey and ostergaard, "accessibility for electronic resources," 180.
9 architectural and transportation barriers compliance board, 65 fed. reg. 80,500, 80,524 (december 21, 2000) (to be codified at 36 cfr pt. 1194).
10 architectural and transportation barriers compliance board, 82 fed. reg. 5,790 (january 19, 2017) (to be codified at 36 cfr pt. 1193-1194).
11 29 usc §794d, at 289 (2016).
12 architectural and transportation barriers compliance board, 82 fed. reg. 5,790, 5,791 (january 19, 2017) (to be codified at 36 cfr pt. 1193-1194).
13 paul t. jaeger, "section 508 goes to the library: complying with federal legal standards to produce accessible electronic and information technology in libraries," information technology and disabilities 8, no. 2 (2002), http://link.galegroup.com/apps/doc/a207644357/aone?u=9211haea&sid=aone&xid=4c7f77da.
14 architectural and transportation barriers compliance board, 82 fed. reg. 5,790, 5,791 (january 19, 2017) (to be codified at 36 cfr pt. 1193-1194).
15 ben caldwell et al., eds., "web content accessibility guidelines (wcag) 2.0," last modified december 11, 2008, http://www.w3.org/tr/2008/rec-wcag20-20081211/.
16 laura delancey, "assessing the accuracy of vendor-supplied accessibility documentation," library hi tech 33, no. 1 (2015): 108.
17 kirsten ostergaard, "accessibility from scratch: one library's journey to prioritize the accessibility of electronic information resources," serials librarian 69, no. 2 (2015): 159, https://doi.org/10.1080/0361526x.2015.1069777.
18 jaeger, "section 508."
19 mary case et al., eds., "report of the arl joint task force on services to patrons with print disabilities," association of research libraries, november 2, 2012, p. 29, http://www.arl.org/storage/documents/publications/print-disabilities-tfreport02nov12.pdf.
20 delancey and ostergaard, "accessibility for electronic resources," 180.
21 jennifer tatomir and joan c. durrance, "overcoming the information gap: measuring the accessibility of library databases to adaptive technology users," library hi tech 28, no. 4 (2010): 581, https://doi.org/10.1108/07378831011096240.
22 tatomir and durrance, "overcoming the information gap," 584.
23 delancey, "assessing the accuracy," 104–5.
24 christina mune and ann agee, "are e-books for everyone? an evaluation of academic e-book platforms' accessibility features," journal of electronic resources librarianship 28, no. 3 (2016): 172–75, https://doi.org/10.1080/1941126x.2016.1200927.
25 mune and agee, "are e-books for everyone?," 175.
26 joanne oud, "accessibility of vendor-created database tutorials for people with disabilities," information technology and libraries 35, no. 4 (2016): 12, https://doi.org/10.6017/ital.v35i4.9469.
27 mark hachman, "tested: how flash destroys your browser's performance," pc world, august 7, 2015, https://www.pcworld.com/article/2960741/browsers/tested-how-flash-destroys-your-browsers-performance.html.
28 oud, "accessibility of vendor-created database tutorials," 12.
29 delancey, "assessing the accuracy," 106–7.
30 delancey, "assessing the accuracy," 105.
31 kirsten ostergaard, "accessibility from scratch: one library's journey to prioritize the accessibility of electronic information resources," serials librarian 69, no. 2 (2015): 162–65, https://doi.org/10.1080/0361526x.2015.1069777.
32 ostergaard, "accessibility from scratch," 164.
33 delancey, "assessing the accuracy," 111.
34 cynthia guyer and michelle uzeta, "assistive technology obligations for postsecondary education institutions," journal of access services 6, no. 1/2 (2009): 29; oud, "accessibility of vendor-created database tutorials," 7.
35 mune and agee, "are e-books for everyone?," 173.
36 colker, "the americans with disabilities act," 792–93.
37 delancey, "assessing the accuracy," 107.
38 colker, "the americans with disabilities act," 814; mune and agee, "are e-books for everyone?," 182.
39 adriana cardenes to dr. james rosser, april 7, 1997, private collection, quoted in colker, "the americans with disabilities act is outdated," 815.
40 sushil k. oswal, "access to digital library databases in higher education: design problems and infrastructural gaps," work 48, no. 3 (2014): 316.
oswal, “access to digital library databases in higher education: design problems and infrastructural gaps,” work 48, no. 3 (2014): 316. 41 ostergaard, “accessibility from scratch,” 164. 42 mune and agee, “are e-books for everyone?,” 175. 43 delancey, “assessing the accuracy,” 108; mune and agee, “are e-books for everyone?,” 181. 44 colker, “the americans with disabilities act,” 817. abstract introduction literature review legislation and existing guidelines existing literature and studies research methods process and findings further study appendix a: accessibility documentation request email template appendix b: vpat and other accessibility documentation urls used in the databases a–z list. reproduced with permission of the copyright owner. further reproduction prohibited without permission. the impact of information technology on library anxiety: the role of computer attitudes jiao, qun g;onwuegbuzie, anthony j information technology and libraries; dec 2004; 23, 4; proquest pg. 138 the impact of information technology on library anxiety: oun g. jiao and anthony j. onwuegbuzie the role of computer attitudes over the past two decades, computer-based technologies have become dominant forces to shape and reshape the products and services the academic library has to offer. the application of library technologies has had a profound impact on the way library resources are being used. although many students continue to experience high levels of library anxiety, it is likely that the new technologies in the library have led to them experiencing other forms of negative affective states that may be, in part, a function of their attitude towards computers. this study investigates whether students' computer attitudes predict levels of library anxiety. c omputers and information technologies have experienced considerable growth over the past two decades. as such, familiarity with computers is rapidly becoming a basic skill and a prerequisite for many tasks. although not every college student is equally prepared for the rising demand of computer skills in the !nformation age, computer literacy is increasingly becommg a gatekeeper for students' academic success. 1 gaps in computer literacy and skills can leave many students behind not only in their academic achievement but also in their future job-market success. the unprecedented pace of technological change in the development of digital information networks and electronic services in recent years has helped to expand the role of the academic library. once only a storehouse of printed materials, it is now a technology-laden information network where students can conduct research in a mixed print and digital-resource environment, experience the use of advanced information technologies, and hone their computer skills. yet, many students are struggling to cope with the changes brought on by the rapid advances of information teclmologies. academic libraries of various sizes have spent a large percentage of their material budget on electronic commercial content, and the trend will continue.' these days, college students are faced with the choices of ever-changing modes of electronic accessing tools, interfaces, and protocols along with the traditional print resources in the library. the fact that the same journal article may be available in multiple vendors' aggregator oun g. jiao (gerryjiao@baruch.cuny.edu) is reference libraria~ and_ associate professor at newman library, baruch college, city university of new york, and anthony j. 
sites (such as ebscohost and gale group) makes navigation through these bibliographic databases more complex and challenging. relevant sources must be identified and navigation protocols must be learned before appropriate information and content can be found. furthermore, having located a citation, students still have to search the library online catalog to find out if the journal or book is available in the library and, if not, know how to make an interlibrary loan request either on paper or electronically.3 anxiety levels can be high and patience levels can be low at varying times of conducting library research.4

that students experience various levels of apprehension when using academic libraries is not a new phenomenon. indeed, the phenomenon is prevalent among college students in the united states and many other countries, and is widely known as library anxiety. mellon first coined the term in her study in which she noted that 75 percent to 85 percent of undergraduate students described their initial library experiences in terms of anxiety.5 according to mellon, feelings of anxiety stem from the relative size of the library; a lack of knowledge about the location of materials, equipment, and resources of the library; how to initiate library research; or how to proceed with a library search.6

library anxiety is an unpleasant feeling or emotional state, with physiological and behavioral concomitants, that comes to the fore in library settings. typically, library-anxious students experience negative emotions, including ruminations, tension, fear, and mental disorganization, which prevent them from using the library effectively.7 a student who experiences library anxiety usually undergoes either emotional or physical discomfort when faced with any library or library-related task.8 library anxiety may arise from a lack of self-confidence in conducting research, lack of prior exposure to academic libraries, the inability to see the relevance of libraries to one's field of interest, and lack of familiarity with library equipment and technologies. library anxiety is often accorded special attention because of its debilitating effects on students' academic achievement.9

although many students continue to experience high levels of library anxiety, it is likely that the new technologies and electronic databases in libraries have led to students experiencing other forms of negative affective states. in particular, it is likely that the library anxiety experienced by students is, in part, a function of their attitudes toward computers. consistent with this assertion, mizrachi and shoham and mizrachi reported a statistically significant relationship between library anxiety and computer attitudes.10 they noted in their research that home and work usage of computers, computer games, word processors, computer spreadsheets, and the internet are all related, to varying degrees, to the dimensions of library anxiety found among israeli students. similarly, jerabek, meyer, and kordinak found levels of computer anxiety to be related to levels of library anxiety for both men and women.11 these studies focused exclusively on undergraduate students.
however, no study has examined this relationship among graduate students, a population that uses the academic library more than any other student population. over the past fifteen years, a large body of research literature on computer attitudes has been generated. in particular, many researchers have studied the relationship between computer attitudes and computer use.12 the importance of beliefs and attitudes toward computers and technologies is widely acknowledged.13 students' computer attitudes arguably affect their willingness to engage in computer-related activities in colleges and universities, where effectively using library electronic resources represents an increasingly important part of a college education. negative computer attitudes may inhibit students' interest in learning to use library resources and thereby weaken their academic performance, while at the same time elevating levels of library anxiety. mcinerney, mcinerney, and sinclair observed that negative perceptions about computers among student teachers may accompany feelings of anxiety, including worries about being embarrassed, looking foolish, and even damaging the computer equipment.14 further, there is often a negative relationship between prior experience with computers and the computer anxiety experienced by individuals.15

until recently, library anxiety has been interpreted only in the context of the library setting—that is, as a phenomenon that occurs while students are undertaking library tasks. jiao, onwuegbuzie, and lichtenstein defined library anxiety as "an uncomfortable feeling or emotional disposition, experienced in a library setting, which has cognitive, affective, physiological, and behavioral ramifications."16 at the same time, unprecedented technological advancement has had a profound impact on the products and services offered by academic libraries. students now are able to conduct sophisticated library searches from the comfort of their homes. it is clear that the construct of library anxiety needs to be expanded in the new library and information environment, incorporating into its definition other variables that are relevant to the changing library and information context. because many library users spend a significant portion of their time using computer-based technologies to conduct information searches, it is natural to ask: to what extent does library anxiety stem from students' prior attitudes toward and experiences with computers and library technologies? however, with the exception of the studies conducted by mizrachi and shoham and mizrachi on israeli undergraduate students, this link has not been examined.17 thus, the present study investigated the relationship between computer attitudes and library anxiety in the rapidly changing library and information environment. as such, the current inquiry replicated the work of mizrachi, shoham and mizrachi, and jerabek, meyer, and kordinak by examining the degree to which computer attitudes predict levels of library anxiety among graduate students in the united states.18 it was expected that findings from this study would help to increase understanding of the construct of library anxiety. indeed, research in this area has become critical in higher education, where educators are responsible for graduating students with the skills necessary to thrive and to lead in the rapidly changing technological environment of the twenty-first century.
method

participants

participants were ninety-four african american graduate students enrolled in the college of education at a historically black college and university in the eastern u.s. all participants were solicited in either a statistics or a measurement course at the time that the investigation took place. in order to participate in the study, students were required to sign an informed-consent document that was given during the first class session of the semester. the majority of the participants were female. ages of the participants ranged from twenty-two to sixty-two years (mean = 30.40, sd = 8.75).

instruments and procedure

all participants were administered two scales, namely, the computer attitude scale (cas) and the library anxiety scale (las). the cas, developed by loyd and gressard, contains forty likert-type items that assess individuals' attitudes toward computers and the use of computers.19 this instrument consists of the following four scales, which can be used separately: (1) anxiety or fear of computers; (2) confidence in the ability to use computers; (3) liking or enjoying working with computers; and (4) computer usefulness. loyd and gressard reported coefficient alpha reliability coefficients of .86, .91, .91, and .95 for scores pertaining to computer anxiety, computer confidence, computer liking, and the total scale, respectively. for the present study, the score reliabilities were as follows:
• computer anxiety, .84 (95 percent confidence interval [ci] = .79, .88);
• computer confidence, .81 (95 percent ci = .75, .86);
• computer liking, .89 (95 percent ci = .85, .92); and
• computer usefulness, .76 (95 percent ci = .68, .83).

the las, developed by bostick, contains forty-three 5-point likert-format items that assess levels of library anxiety experienced by college students.20 it contains the following five subscales:
1. barriers with staff;
2. affective barriers;
3. comfort with the library;
4. knowledge of the library; and
5. mechanical barriers.

a high score on any subscale represents a high level of anxiety in that area. jiao and onwuegbuzie, in their examination of the score reliability reported for the las in the extant literature, found that it has typically been in the adequate to high range for the subscale and total-scale scores.21 based on their analysis, onwuegbuzie, jiao, and bostick concluded that "not only does the [las] produce scores that yield extremely reliable estimates, but also these estimates are remarkably consistent across samples with different cultures, nationalities, ages, years of study, gender composition, educational majors, and so forth."22 for the current investigation, the subscales generated scores for the combined sample that had classical theory alpha reliability coefficients of .89 (95 percent ci = .85, .92) for barriers with staff, .84 (95 percent ci = .79, .88) for affective barriers, .53 (95 percent ci = .37, .66) for comfort with the library, .62 (95 percent ci = .48, .73) for knowledge of the library, and .70 (95 percent ci = .58, .79) for mechanical barriers.
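for readers unfamiliar with the reliability estimates reported above, the classical theory alpha reliability coefficient is cronbach's coefficient alpha; the article reports only the resulting values, but the conventional definition (stated here for reference, not taken from the article) is

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right),

where k is the number of items in a subscale, \sigma^{2}_{Y_i} is the variance of scores on item i, and \sigma^{2}_{X} is the variance of the total subscale score.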
analysis

a canonical correlation analysis was conducted to identify a combination of library-anxiety dimensions (barriers with staff, affective barriers, comfort with the library, knowledge of the library, and mechanical barriers) that might be simultaneously related to a combination of computer-attitude dimensions (computer anxiety, computer liking, computer confidence, and computer usefulness). canonical correlation analysis is used to examine the relationship between two sets of variables whereby each set contains more than one variable.23 in the present investigation, the five dimensions of library anxiety were treated as the dependent multivariate set of variables, and the four dimensions of computer attitudes formed the independent multivariate profile. the number of canonical functions (factors) that can be produced for a given dataset is equal to the number of variables in the smaller of the two variable sets. because the library-anxiety set contained five dimensions and the computer-attitude set contained four variables, four canonical functions were generated. for any significant canonical correlation, the standardized canonical-function coefficients and structure coefficients were then interpreted. standardized canonical-function coefficients are computed weights that are applied to each variable in a given set in order to obtain the composite variate used in the canonical correlation analysis. as such, standardized canonical-function coefficients are equivalent to factor-pattern coefficients in factor analysis or to beta coefficients in a regression analysis.24 conversely, structure coefficients represent the correlations between a given variable and the scores on the canonical composite (latent variable) in the set to which the variable belongs.25 thus, structure coefficients indicate the degree to which each variable is related to the canonical composite for the variable set. indeed, structure coefficients are essentially bivariate correlation coefficients that range in value between -1.0 and +1.0 inclusive.26 the square of the structure coefficient yields the proportion of variance that the original variable shares linearly with the canonical variate.
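to make this machinery concrete, the sketch below computes canonical correlations and structure coefficients of the kind reported in tables 1 and 2, using synthetic data and scikit-learn's cca implementation. it is an illustration of the general method under our own assumptions, not a reproduction of the authors' analysis (the statistical software they used is not stated), and the variable sets are random stand-ins rather than the study's data.

# illustrative sketch with synthetic data; not the authors' analysis
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 94                        # sample size in the study
X = rng.normal(size=(n, 5))   # stand-in for the five library-anxiety subscales
Y = rng.normal(size=(n, 4))   # stand-in for the four computer-attitude subscales

n_functions = min(X.shape[1], Y.shape[1])   # number of functions = size of the smaller set
cca = CCA(n_components=n_functions).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)              # canonical variate scores for each set

for i in range(n_functions):
    rc = np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1]   # canonical correlation for function i
    print(f"function {i + 1}: rc = {rc:.2f}, shared variance rc^2 = {rc ** 2:.1%}")

# structure coefficients for the first function: correlation of each original variable
# with its own set's canonical variate; squaring one gives the variance it shares
structure_x = [np.corrcoef(X[:, j], X_c[:, 0])[0, 1] for j in range(X.shape[1])]
structure_y = [np.corrcoef(Y[:, j], Y_c[:, 0])[0, 1] for j in range(Y.shape[1])]
print("library-anxiety structure coefficients:", np.round(structure_x, 2))
print("computer-attitude structure coefficients:", np.round(structure_y, 2))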
results

table 1 presents the intercorrelations among the five dimensions of library anxiety and the four dimensions of computer attitudes. of particular interest were the twenty correlations between the library-anxiety subscale scores and the computer-attitude subscale scores. it can be seen that, after applying the bonferroni adjustment, four of these relationships were statistically significant. specifically, computer liking was statistically significantly related to affective barriers, knowledge of the library, and comfort with the library. using cohen's criteria of .1, .3, and .5 for small, medium, and large relationships, respectively, the first two relationships (involving affective barriers and knowledge of the library) were medium, and the third relationship (between computer liking and comfort with the library) was large.27 in addition to these three relationships, the association between computer usefulness and knowledge of the library also was statistically significant, with a medium effect size.

table 1. intercorrelations among the library-anxiety subscales and computer-attitude subscales

subscale                       2     3     4     5     6     7     8     9
1. barriers with staff        .64*  .63*  .49*  .46*  .02   .05  -.27  -.09
2. affective barriers               .52*  .56*  .40*  -.05  .02  -.37* -.23
3. comfort with the library               .56*  .44*  -.19  -.20  -.55* -.16
4. knowledge of the library                     .39*  -.21  -.11  -.37* -.32*
5. mechanical barriers                                -.13  -.01  -.18   .04
6. computer anxiety                                          .77*  .48*  .46*
7. computer confidence                                             .67*  .36*
8. computer liking                                                       .43*
9. computer usefulness
* indicates a statistically significant relationship after the bonferroni adjustment.

the correlation matrix in table 1 was used to examine the multivariate relationship between library anxiety and computer attitudes. this relationship was assessed via a canonical correlation analysis. the canonical analysis revealed that the four canonical correlations combined were statistically significant (p < .0001). also, when the first canonical root was removed, the remaining three canonical roots were not statistically significant. in fact, removal of subsequent canonical roots did not lead to statistical significance. together, these results suggested that only the first canonical function was statistically significant. this first canonical root also was practically significant (rc1 = .63), contributing 40.8 percent (rc1²) to the shared variance, which represents a large effect size.28

data pertaining to the first canonical root are presented in table 2, which provides both standardized function coefficients and structure coefficients. using a cutoff correlation of .3, the standardized canonical-function coefficients revealed that affective barriers, comfort with the library, and knowledge of the library made important contributions to the library-anxiety set, with affective barriers and comfort with the library making similarly large contributions.29 with regard to the computer-attitude set, computer anxiety, computer liking, and computer confidence made noteworthy contributions, with the latter two dimensions making the most noteworthy contributions. the structure coefficients revealed that all five dimensions of library anxiety made important contributions to the first canonical variate. the square of the structure coefficient indicated that barriers with staff, affective barriers, comfort with the library, and knowledge of the library made similarly large contributions, explaining 67.2 percent, 72.3 percent, 72.3 percent, and 60.8 percent of the variance, respectively. with regard to the computer-attitude set, computer liking and computer usefulness made important contributions. these variables explained 64.0 percent and 16.8 percent of the variance, respectively. comparing the standardized and structure coefficients indicated that computer anxiety and computer confidence served as suppressor variables because the standardized coefficients associated with these variables were large, whereas the corresponding structure coefficients were relatively small.30 suppressor variables are variables that assist in the prediction of dependent variables due to their correlation with other independent variables.31 thus, the inclusion of computer anxiety and computer confidence in the canonical correlation model strengthened the multivariate relationship between library anxiety and computer attitudes.

table 2. canonical solution for the first function: relationship between library-anxiety subscales and computer-attitude subscales

theme                        standardized coefficient   structure coefficient   structure² (%)
library-anxiety subscale
barriers with staff                  .17                       .82*                 67.2
affective barriers                   .40*                      .85*                 72.3
comfort with the library             .39*                      .85*                 72.3
knowledge of the library             .31*                      .78*                 60.8
mechanical barriers                 -.12                       .39*                 15.2
computer-attitude subscale
computer anxiety                    -.31*                     -.22                   4.8
computer confidence                  .98*                      .13                   1.7
computer liking                    -1.25*                     -.80*                 64.0
computer usefulness                 -.13                      -.41*                 16.8
* loadings with effect sizes larger than .3.

discussion

the purpose of this study was to investigate the relationship between computer attitudes and library anxiety among african american graduate students. specifically, the multivariate link between these two constructs was examined. a canonical correlation analysis revealed a strong multivariate relationship between library anxiety and computer attitudes. the library-anxiety subscale scores and computer-attitude subscale scores shared 40.82 percent of the common variance. specifically, computer liking and computer usefulness were related simultaneously to the following five dimensions of library anxiety: barriers with staff, affective barriers, comfort with the library, knowledge of the library, and mechanical barriers. computer anxiety and computer confidence served as suppressor variables. thus, computer attitudes predict levels of library anxiety.
as such, the present findings are consistent with those of mizrachi and shoham and mizrachi, who found a statistically significant relationship between computer attitudes and the following seven dimensions of the hebrew library anxiety scale, a modified version of the las developed by the authors for their israeli sample:
1. staff,
2. knowledge,
3. language,
4. physical comfort,
5. library computer comfort,
6. library policies and hours, and
7. resources.32

according to its authors, the staff factor refers to students' attitudes toward librarians and library staff and their perceived accessibility. the knowledge factor pertains to how students rate their own library expertise. the language factor relates to the extent to which using english-language searches and materials yields discomfort. physical comfort evaluates how much the physical facility negatively affects students' satisfaction and comfort with the library. library computer comfort assesses the perceived trustworthiness of library computer facilities and the quality of directions for using them. library policies and hours concerns students' attitudes toward library rules, regulations, and hours of operation. finally, resources refers to the perceived availability of the desired material in the library collection. the correlations between the dimensions of library anxiety and computer attitudes ranged from .11 (physical comfort) to .47 (knowledge). the current results also replicate those of jerabek, meyer, and kordinak, who found levels of computer anxiety to be related to levels of library anxiety for both men and women.33

nevertheless, caution should be exercised in generalizing the current findings to all graduate students. though the present study examined the association between library anxiety and computer attitudes among african american graduate students, it should not be assumed that this relationship would hold for other racial groups. jiao, onwuegbuzie, and bostick found that african american students attending a research-intensive institution reported statistically significantly lower levels of library anxiety associated with barriers with staff, affective barriers, and comfort with the library than did caucasian american graduate students enrolled at a doctoral-granting institution, with effect sizes ranging from moderate to large.34 in a follow-up study, jiao and onwuegbuzie compared african american and caucasian american students with respect to library anxiety, controlling for educational background by selecting both racial groups from the same institution.35 no statistically significant racial differences were found in library anxiety for any of the five dimensions of the las. however, across all five library-anxiety measures, the african american sample reported lower scores than did the caucasian american sample. in fact, using the test of trend by onwuegbuzie and levin, they found that the consistency with which the african american graduate students had lower levels of library anxiety than did the caucasian american students was both statistically and practically significant.36 thus, jiao and onwuegbuzie's results, alongside those of jiao, onwuegbuzie, and bostick, suggest that racial differences in library anxiety prevail.37

future research should therefore investigate whether the relationship between library anxiety and computer attitudes found in the present study among african american graduate students also exists among caucasian american graduate students, as well as among other racial groups. further, the causal direction of the relationship found in the current study should be investigated. that is, future studies should investigate whether library anxiety places a person more at risk for experiencing poor computer attitudes, or whether the converse is true. more research also is needed to determine how computer attitudes might play a role in the library context. notwithstanding, it appears that the construct of library anxiety can be expanded to include the construct of computer attitudes. indeed, one implication of the findings is that bostick's las should be modified to include dimensions of computer attitudes.38 such a modification likely would facilitate the identification of library-anxious students. by identifying students with high levels of library anxiety and poor computer attitudes, library educators and others could help them improve their dispositions and provide them with the skills necessary to negotiate the rapidly changing technological environment, thereby putting them in a better position to be lifelong learners.

references

1. susan m. piotrowski, computer training: pathway from extinction (eric document reproduction service, ed 348955, 1992).
2. thomas h. hogan, "drexel university moves aggressively from print to electronic access for journals (interview with carol hansen montgomery, dean of libraries)," computers in libraries 21, no. 5 (may 2001): 22-27.
3. m. claire stewart and h. frank cervone, "building a new infrastructure for digital media: northwestern university library," information technology and libraries 22, no. 2 (june 2003): 69-74.
4. carol c. kuhlthau, "longitudinal case studies of the information search process of users in libraries," library and information science research 10 (july 1988): 257-304;
kuhlthau, "inside the search process: information seeking from the user's perspective," journal of the american society for information science 42, no. 5 (june 1991): 361-71; carol c. kuhlthau, seeking meaning: a process approach to library and information services (norwood, n.j.: ablex, 1993); carol c. kuhlthau, "students and the information search process: zones of intervention for librarians," advances in librarianship 18 (1994): 57-72; carol c. kuhlthau et al., "validating a model of the search process: a comparison of academic, public, and school library users," library and information science research 12, no. 1 (jan.-mar. 1990): 5-31. 5. constance a. mellon, "library anxiety: a grounded theory and its development," college & research libraries 47, no. 2 (mar. 1986): 160-65. 6. ibid. 7. qun g. jiao, anthony j. onwuegbuzie, and art lichtenstein, "library anxiety: characteristics of' at-risk' college students," library and information science research 18 (spring 1996): 151-63. 8. constance a. mellon, "attitudes: the forgotten dimension in library instruction," library journal 113 (sept. 1, 1988): 137-39; constance a. mellon, "library anxiety and the nontraditional student," in reaching and teaching diverse library user groups, ed. teresa b. mensching (ann arbor, mich.: pierian, 1989), 77-81; anthony j. onwuegbuzie, "writing a research proposal: the role of library anxiety, statistics anxiety, and composition anxiety," library and information science research 19, no. 1 (1997): 5-33. 9. anthony j. onwuegbuzie and qun g. jiao, "information search performance and research achievement: an empirical test of the anxiety-expectation model of library anxiety," fournal of the american society for information science and technology (jasist) 55, no. 1 (2004): 41-54; anthony j. onwuegbuzie, qun g. jiao, and sharon l. bostick, library anxiety: theory, research, and applications (lanham, md.: scarecrow, 2004). 10. diane mizrachi, "library anxiety and computer attitudes among israeli b.ed. students" (master's thesis, bar-ilan university, israel, 2000); snunith shoham and diane mizrachi, "library anxiety among undergraduates: a study of israeli b.ed. students," journal of academic librarianship 27, no. 4 (july 2001): 305-11. 11. ann j. jerabek, linda s. meyer, and thomas s. kordinak, "'library anxiety' and 'computer anxiety': measures, validity, and research implications," library and information science research 23, no. 3 (2001): 277-89. 12. muhamad a. al-khaldi and ibrahim m. al-jabri, "the relationship of attitudes to computer utilization: new evidence from a developing nation," computers in human behavior 9, no. 1 (jan. 1998): 23-42; margaret cox, valeria rhodes, and jennifer hall, "the use of computer-assisted learning in primary schools: some factors affecting uptake," computers in education 12, no. 1 (1988), 173-78; gayle v. davidson and scott d. ritchie, "attitudes toward integrating computers into the classroom: what parents, teachers, and students report," journal of computing in childhood education 5, no. 1 (1994): 3-27; donald g. gardner, richard l. dukes, and richard discenza, "computer use, self-confidence, and attitudes: a causal analysis," computers in human behavior 9, no. 4 (winter 1993): 427-40; robin h. kay, "predicting student teacher commitment to the use of computers," journal of educational computing research 6, no. 3 (1990): 299-309. 13. 
deborah bandalos and jeri benson, "testing the factor structure invariance of a computer attitude scale over two grouping conditions," educational and psychological measurement 50, no. 1 (spring 1990): 49-60; frank m. bernt and alan c. bugbee jr., "factors influencing student resistance to computer administered testing," journal of research on computing in education 22, no. 3 (spring 1990): 265-75; michel dupagne and kathy a. krendl, "teacher's attitudes toward computers: a review of the literature," journal of research on computing in education 24, no. 3 (spring 1992): 420-29; elizabeth mowrer-popiel, constance pollard, and richard pollard, "an analysis of the perceptions of preservice teachers toward technology and its use in the classroom," journal of instructional psychology 21, no. 2 (june 1994): 131-38; jennifer d. shapka and michel ferrari, "computerrelated attitudes and actions of teacher candidates," computers in human behavior 19, no. 3 (may 2003): 319-34. 14. valentina mclnerney, dennis m. mclnerney, and kenneth e. sinclair, "student teachers, computer anxiety, and computer experience," journal of educational computing research 11, no. 1 (1994): 27-50. 15. susan e. jennings and anthony j. onwuegbuzie, "computer attitudes as a function of age, gender, math attitude, and developmental status," journal of educational computing research 25, no. 4 (2001): 367-84. 16. jiao, onwuegbuzie, and lichtenstein, "library anxiety," 152. 17. mizrachi, "library anxiety and computer attitudes"; shoham and mizrachi, "library anxiety among undergraduates." 18. mizrachi, "library anxiety and computer attitudes"; shoham and mizrachi, "library anxiety among undergraduates"; the impact of information technology on library anxiety i jiao and onwuegbuzie 143 reproduced with permission of the copyright owner. further reproduction prohibited without permission. jerabek, meyer, and kordinak, '"library anxiety' and 'computer anxiety."' 19. brenda h. loyd and clarice gressard, "the effects of sex, age, and computer experience on computer attitudes" aeds journal 18, no. 2 (1984): 67-77. 20. sharon l. bostick, "the development and validation of the library anxiety scale" (ph.d. diss, wayne state university, 1992). 21. qun g. jiao and anthony j. onwuegbuzie, "reliability generalization of the library anxiety scale scores: initial findings/' (unpublished manuscript, 2002). 22. onwuegbuzie, jiao, and bostick, library anxiety, 22. 23. norman cliff and david j. krus, "interpretation of canonical analyses: rotated versus unrotated solutions," psychometrica 41, no. 1 (mar. 1976): 35-42; richard b. darlington, sharon l. weinberg, and herbert j. walberg, "canonical variate analysis and related techniques," review of educational research 42, no. 4 (fall 1973): 131-43; bruce thompson, "canonical correlation: recent extensions for modeling educational processes" (paper presented at the annual meeting of the american educational research association, boston, mass., apr. 7-11, 1980) (eric, ed 199269); bruce thompson, canonical correlation analysis: uses and interpretations (newbury park, calif.: sage, 1984); bruce thompson, "canonical correlation analysis: an explanation with comments on correct practice" (paper presented at the annual meeting of the american educational research association, new orleans, la., apr. 
5-9, 1988) (eric, ed 295957); bruce thompson, "variable importance in multiple regression and canonical correlation" (paper presented at the annual meeting of the american educational research association, boston, mass., april 16-20, 1990) (eric, ed 317615). 24. margery e. arnold, "the relationship of canonical correlation analysis to other parametric methods" (paper presented at the annual meeting of the southwest educational research association, new orleans, la., jan. 1996) (eric, ed 395994). 25. thompson, "canonical correlation: recent extensions." 26. ibid. 27. jacob cohen, statistical power analysis for the behavioral sciences (new york: wiley, 1988). 28. ibid. 29. zarrel v. lambert and richard m. durand, "some precautions in using canonical analysis," journal of marketing research 12, no. 4 (nov. 1975): 468-75. 30. anthony j. onwuegbuzie and larry g. daniel, "typology of analytical and interpretational errors in quantitative and qualitative educational research," current issues in education 6, no. 2 (feb. 2003). accessed nov. 13, 2003,http://cie.ed.asu.edu/ volume6/number2/. 31. barbara g. tabachnick and linda s. fidell, using multivariate statistics, 3rd ed. (new york: harper), 1996. 32. mizrachi, "library anxiety and computer attitudes"; shoham and mizrachi, "library anxiety among undergraduates." 33. jerabek, meyer, and kordinak, '"library anxiety' and 'computer anxiety."' 34. qun g. jiao, anthony j. onwuegbuzie, and sharon l. bostick, "racial differences in library anxiety among graduate students," library review 53, no. 4 (2004): 228-35. 35. qun g. jiao and anthony j. onwuegbuzie, "library anxiety: a function of race?" (unpublished manuscript, 2003). 36. anthony j. onwuegbuzie and joel r. levin, "a proposed three-step method for assessing the statistical and practical significance of multiple hypothesis tests" (paper presented at the annual meeting of the american educational research association, san diego, calif., apr. 12-16, 2004). 37. jiao, onwuegbuzie, and bostick, "racial differences in library anxiety." 38. bostick, "the development and validation of the library anxiety scale." 144 information technology and libraries i december 2004 president’s message andromeda yelton information technology and libraries | december 2017 2 andromeda yelton (andromeda.yelton@gmail.com) is lita president 2017-18 and senior software engineer, mit libraries, cambridge, united states. before i dive into my column, i’d like to recognize and thank bob gerrity for his six years of service as ital’s editor in chief. he oversaw our shift from a traditional print journal to a fully online one, recognized by micah vandegrift and chealsye bowley as having the strongest open-access policies of all lis journals (http://www.inthelibrarywiththeleadpipe.org/2014/healthyself/). i’d like to further extend a welcome to ken varnum as our new editor in chief. ken’s distinguished record of lita service includes stints on the ital editorial board and the lita board of directors, so he knows the journal very well and i am enthusiastic about its future under his lead. i’m particularly curious to see what will be discussed in ital under ken’s leadership because i’ve just come back from two outstanding conferences which drove home the significance of the issues we wrestle with in library technology, and i’m looking forward to a third. in early november, i attended lita forum in scenic denver. 
the schedule was packed with sessions on intriguing topics – too many, of course, for me to attend them all – but two in particular stand out to me. in one, sam kome detailed how he’s going about a privacy audit at the claremont colleges library. he walked us through an extensive – and sometimes surprising – list of places personally identifiable information can lurk on library and campus systems, and talked through what his library absolutely needs (which is less than he’d thought, and far less than the library has been logging without thinking about it). in the other, mary catherine lockmiller took a design thinking approach to serving transgender populations. she shared a fantastic, practical libguide (http://libguides.southmountaincc.edu/transgenderresources), but the part that stuck with me most is her statement that many trans people may never physically enter a library because public spaces are not safe spaces; for this population, our electronic services are our public services. as technologists, we create the point of first, and maybe only, contact. a week later, i attended the inaugural data for black lives conference (http://d4bl.org/) at the mit media lab, steps from my office. this was – and i think everyone in the room felt it – something genuinely new. from the galvanizing topic, to the sophisticated visual and auditory design, to the frisson of genius and creativity buzzing all around a room of artists, activists, professors, poets, data scientists and software engineers, it was a remarkable experience for us all. those of you who heard dr. safiya noble speak at thomas dowling’s lita president’s program in 2016 are familiar with algorithmic bias. numerous speakers discussed this at d4bl: the ways that racial disparities in underlying data sets can be replicated, magnified, and given a veneer of objective power when run through the black boxes that power predictive policing or risk assessment for bail hearings. absent and messy data was a theme as well: in a moment that would make many librarians chuckle (and then wince) knowingly, a panel of music industry executives estimated that 40% of their metadata is wrong, thus making it impossible to credit and compensate artists appropriately. mailto:andromeda.yelton@gmail.com) https://www.google.com/url?q=http://www.inthelibrarywiththeleadpipe.org/2014/healthyself/&sa=d&ust=1512118443864000&usg=afqjcnedfyl-ywfgnadmdzfcrvvnmhlhhq http://libguides.southmountaincc.edu/transgenderresources http://d4bl.org/) president’s message | yelton 3 https://doi.org/10.6017/ital.v36i4.10238 and yet – in a memorable keynote – dr. ruha benjamin called on us not only to collect data about black death, as she showed us an image of the ambulance bill sent to tamir rice’s family, but to listen to our artists and poets as we use our data to imagine black life – this in front of an image of wakanda. with our data and our creativity, what new worlds can we map? several of my mit colleagues also attended d4bl, and as we discussed it afterward we started thinking about how these ideas can drive our own work. how does the imaginary world of wakanda connect to the archival imaginary, and what worlds can we empower our own creators to imagine with what we collect and preserve? how can we use our data literacy and access to sometimes un-googleable resources to help community groups collate data on important issues that are not tracked by our public institutions, such as police violence (https://mappingpoliceviolence.org/) or racial disparities in setting bail? 
with these ideas swirling in my mind, i am looking forward with tremendous excitement to lita forum 2018. building on the work of our forum assessment task force, we’ll be doing a lot of things differently; in particular, aiming for lots of hands-on, interactive sessions. this will be a conference where, whether you’re a presenter or an attendee, you’ll be able to do things. and these last two conferences have driven home for me how very much there is to do in of library technology. our work to select, collect, preserve, clean, and provide access to data can indeed have enormous impact. technology services are front-line services. https://mappingpoliceviolence.org/) reproduced with permission of the copyright owner. further reproduction prohibited without permission. the internet, the world wide web, library web browsers, and library web servers jian-zhong, zhou information technology and libraries; mar 2000; 19, 1; proquest pg. 50 tutorial the internet, the world wide web, library web browsers, and library web servers jian-zhong (joe) zhou this article first examines the difference between two very familiar and sometimes synonymous terms, the internet and the web. the article then explains the relationship between the web's protocol http and other high-level internet protocols, such as telnet and ftp, as well as provides a brief history of web development. next, the article analyzes the mechanism in which a web browser (client) "talks" to a web server on the internet. finally, the article studies the market growth for web browsers and web servers between 1993 and 1999. two statistical sources were used in the web market analysis: a survey conducted by the university of delaware libraries for the 122 members of the association of research libraries, and the data for the entire web industry from different web survey agencies. many librarians are now dealing with the internet and the web on a daily basis. while the web is sometimes synonymous with the internet in many people's minds, the two terms are quite distinct, and they refer to different but related concepts in the modem computerized telecommunication system. the internet is nothing more than many small computer networks that have been wired together and allow electronic information to be sent from one network to the next around the world . a piece of data from joe zhou (joezhou@udel.edu) is associate librarian at the university of delaware library, newark. beijing, china may traverse more than a dozen networks while making its way to washington, d.c. we can compare the internet to the great wall of china, which was built in the qin dynasty around the third century b.c. by connecting many existing short defense walls built by previous feudal states . the great wall not only served as a national defense system for ancient china, but also as a fast military communication system. a border alarm was raised by means of smoke signals by day, and beacon fires at night, ignited by burning a mixture of wolf dung , sulfur, and saltpeter. the alarm signal could be relayed over many beacon-fire towers from the western end of the great wall to the eastern end (4,500 miles away) within a day . this was considered light speed two thousand years ago. however, while the great wall transferred the message in a linear mode, the internet is a multidimensional network. the web is a late-comer to the internet, one of the many types of high-level data exchange protocols on the internet. 
before the web, there was telnet, the traditional command-driven style of interaction. there was ftp, a file transfer protocol useful for retrieving information from large file archives. there was usenet, a communal bulletin board and news system. there was also e-mail for individual information exchange, and e-mail lists, for one-to-many broadcasts. in addition, there was gopher, a campus-wide information system shared among universities and research institutions, and wais, a powerful search and retrieval system developed by thinking machines, inc. in 1990 tim berners-lee and robert cailliau at cern (www.cern.ch), the european laboratory for particle physics, created a new information system called "world wide web" (www). designed to help the cern scientists with the increasingly confusing task of exchanging information on the internet, the web system was to act as a unifying force, a system that would seamlessly bind all file protocols into a single point of access. instead of having to invoke different programs to retrieve information via various protocols, users would be able to use a single program, called a "browser," and allow it to handle all the details of retrieving and displaying information. in december 1993 www received the ima award, and in 1995 berners-lee and cailliau received the association for computing machinery (acm) software system award for its development. the web is best known for its ability to combine text with graphics and other multimedia on the internet. in addition, the web has some other key features that make it stand out from earlier internet information exchange protocols. since the web is a late-comer to the internet, it has to be backwards compatible with other communications protocols in addition to its native language, hypertext transfer protocol (http). among the foreign languages spoken by web browsers are telnet, ftp, and other high-level communication protocols mentioned earlier. this support for foreign protocols lets people use a single piece of software, the web browser, to access information without worrying about shifting from protocol to protocol and software incompatibility. despite different high-level protocols, including http for the web, there is one thing in common for all parts of the internet: tcp/ip, the lower level of the internet protocol. tcp/ip is responsible for establishing the connection between two computers on the internet and guarantees that the data can be sent and received intact. the format and content of the data are left for high-level communication protocols to manage, among which the web is the best known. at the tcp/ip level all computers "are created equal." two computers establish a connection and start to communicate. in reality, however, most conversations are asymmetric. the end user's machine (the client) usually sends a short request for information, and the remote machine (the server) answers with a long-winded response. the medium is the internet. the common language on the internet can be the web or any other high-level protocol. on the web, the client is the web browser; it handles the user's request for a document. the first web browser, ncsa mosaic, developed by the national center for supercomputing applications (ncsa) at the university of illinois at urbana-champaign, was released in mid-november 1993 for unix, windows, and macintosh platforms.
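the asymmetric conversation described above, a short client request answered by a much longer server response, is easy to see from code. the sketch below is only an illustration and not part of the original tutorial: it opens a tcp connection with python's standard http client, sends a single get request, and prints the status line, two response headers, and the size of the returned document. the host name is just an example.

```python
# minimal illustration of the browser-server exchange: a short request over
# tcp, a much longer response. the host below is only an example.
import http.client

host = "example.org"
conn = http.client.HTTPConnection(host, 80, timeout=10)

# the client's half of the conversation is tiny: one request line plus headers
conn.request("GET", "/", headers={"User-Agent": "demo-browser/0.1"})

resp = conn.getresponse()      # the server replies over the same connection
body = resp.read()

print(resp.status, resp.reason)                 # e.g. 200 OK
print("server:", resp.getheader("Server"))      # which server software answered
print("content-type:", resp.getheader("Content-Type"))
print("response size:", len(body), "bytes")

conn.close()
```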
version 3.0 of ncsa mosaic is available at www. ncsa. uiuc.ed u/ sdg /software/ mosaic. both source code and binaries are free for academic use. mosaic lost market share to netscape after its key developer left ncsa and joined netscape. even after mosaic introduced an innovative 32-bit version in early 1997, which can perform feats that other major browsers had not even thought of back then, mosaic remained out of the major browsers' market. the two most widely-used browsers today are microsoft's internet explorer (ie) and netscape's navigator (part of the netscape communicator suite). recent web browser surveys conducted by different internet survey companies such as www.zonaresearch.com/ browserstudy, www.psrinc.com/ trends.htm, and www .statmarket. com all indicate that ie is the market leader with more than 60 percent market share, leaving navigator with between 35 percent and 40 percent. in 1995 ie had only 1 percent share versus navigator's more than 90 percent, an unimaginable rise critics have attributed to microsoft's strategy of bundling the browser with its near-monopoly windows operating system. however, a survey conducted in december 1998 by the university of delaware library of 122 members of the association of research libraries (arl) showed that netscape still remained the market leader among big academic libraries. more than 90 percent of arl libraries supported netscape, and about 50 percent also supported ie. most arl libraries supported both browsers, and unlike the browser industry survey mentioned earlier, in which only one product can be picked as the primary browser , the sum of the percentages for the arl survey was greater than 100 percent. the main function of the web browser is to request a document available from a specific server through the internet using the information in the document's url. the server on a remote machine returns the document usually physically stored on one of the server's disks. with the use of common gateway interface (cgi), the documents do not have to be static. rather, they can be synthesized at the point of being requested by cgi scripts running on the server's side of the connection . in some database-driven web servers that make the core of today's e-commerce, the documents provided may never exist as physical files but are generated as needed from database records . the web server can be run on almost any computer, and server software is available for almost all operating systems, such as unix, windows 95/98/nt, macintosh, and os / 2. according to the university of delaware library's 1998 survey of internet web servers among arl member libraries, more than 32 percent of arl libraries chose apache as their web server software, followed by the netscape series at 29.32 percent, ncsa httpd at 11.28 percent, and microsoft internet information server (iis) at 7.52 percent. in july 1999 the author checked the netcraft survey at www .netcraft. com/survey . the top three web server software programs for more than 6.5 million web sites are apache (56.35 percent) , microsoft-hs (22.33 percent), and netscape (5.65 percent). the netcraft survey also provides the historical market share information of major web servers since august 1995. ncsa httpd was the first web server software released, about the same time as the release of mosaic in 1993. however, it slipped from the number-one position with more than 90 percent market share in 1993, and almost 60 percent in 1995, to less than 1 percent in july 1999. 
it is no longer supported by ncsa; however, httpd remains a popular choice for web servers due to its small size, fast performance, and solid collection of features. the "inertia effect" of the existing sites (if it runs well, why bother to change?) will likely keep ncsa on the major web server software list for some time. ncsa httpd is free, but available only for the unix platform. it is available from http://hoohoo.ncsa.uiuc.edu. however, when the author visited the site in july 1999, the following message appeared on the main page: "the ncsa httpd is no longer under development. it is an unsupported product. we recommend that you check out the apache server, instead of installing our server." most people who use only web browsers may have heard of apache only as an indian nation or a military helicopter, not the most popular web server software with more than 50 percent market share. it was first introduced as a set of fixes or "patches" to the ncsa httpd. apache 1.0 was released in december 1995 as open-source server software by a group of webmasters who named themselves the apache group. open-source means the source code is available and freely distributed, and it is the key to apache's attractiveness and popularity. the apache group members were ncsa users who decided to coordinate development work on the server software after ncsa stopped. in july 1999 the apache group announced that it was establishing a more formal organization called the apache software foundation (asf). in the future, the asf (www.apache.org) will monitor development of the free software, but it will remain a "not-for-profit" foundation. apache is high-end, enterprise-level server software and can be run on os/2, unix (including linux), and windows platforms, but a mac version is still not available. the netscape series includes netscape-enterprise, netscape-fasttrack, netscape-commerce, and netscape-communication. enterprise is a high-end, enterprise-level server while fasttrack serves as an entry-level server for small workgroups. netscape supports both the unix and the windows nt platforms. the other major commercial web server, microsoft internet information server (iis), as of 1999, is only available for the windows platform. however, one advantage of iis over netscape is that it can be downloaded for free as part of the windows option pack. in addition, iis can handle ms office documents very well. while both the microsoft and netscape brand names are well recognized by millions of end users, a name alone does not necessarily equate to large market share, nor does a deep pocket. apache remains the top web server despite intense competition. one of the keys to apache's success, in addition to its outstanding performance, lies in its open-source code movement and active user support on a wide basis. the web server of choice for the macintosh platform is webstar. however, due to the limitations of the operating system networking software, the performance of macintosh-based servers has not been great. webstar can be downloaded as a free evaluation release from www.starnine.com/webstar. the web server market is dynamic and competition intense. there are more than sixty web server products on the top list (of web servers with more than one thousand web sites) as of july 1999, and newcomers are being added frequently.
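server surveys such as the netcraft figures cited above rest on a simple observation: most servers announce their software in the server response header. the sketch below is a hypothetical, simplified version of that idea; the host list is made up for illustration, and python's standard urllib is used in place of whatever tooling the survey agencies actually run.

```python
# tally the advertised server software for a few hosts, in the spirit of the
# web server surveys discussed above; the hosts are only examples.
from collections import Counter
from urllib.request import Request, urlopen

hosts = ["www.example.org", "www.example.com", "www.example.net"]
tally = Counter()

for host in hosts:
    req = Request(f"http://{host}/", method="HEAD")
    try:
        with urlopen(req, timeout=10) as resp:
            # the server header names the software, e.g. "Apache" or "Microsoft-IIS"
            server = resp.headers.get("Server", "unknown")
    except OSError:
        server = "unreachable"
    tally[server.split("/")[0]] += 1    # keep the product name, drop the version

for software, count in tally.most_common():
    print(software, count, "of", len(hosts), "hosts")
```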
acknowledgments the author thanks peter liu, head of the systems department at the university of delaware library, for providing the web survey data of arl libraries . after this article was submitted, the survey data was published by arl in 1999 as spec kit 246: web page development and management. the author also wants to thank his dear wife min yang for her technical assistance. min is webmaster and system administrator for the web site at a. i. dupont nemours foundation and hospital for children, http:/ /kidshealth.org. a framework for measuring relevancy in discovery environments article a framework for measuring relevancy in discovery environments blake l. galbreath, alex merrill, and corey m. johnson information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12835 abstract discovery environments are ubiquitous in academic libraries but studying their effectiveness and use in an academic environment has mostly centered around user satisfaction, experience, and task analysis. this study aims to create a quantitative, reproducible framework to test the relevancy of results and the overall success of washington state university’s discovery environment (primo by ex libris). within this framework, the authors use bibliographic citations from student research papers submitted as part of a required university class as the proxy for relevancy. in the context of this study, the researchers created a testing model that includes: (1) a process to produce machine-generated keywords from a corpus of research papers to compare against a set of human-created keywords, (2) a machine process to query a discovery environment to produce search result lists to compare against citation lists, and (3) four metrics to measure the comparative success of different search strategies and the relevancy of the results. this framework is used to move beyond a sentiment or task-based analysis to measure if materials cited in student papers appear in the results list of a production discovery environment. while this initial test of the framework produced fewer matches between researcher-generated search results and student bibliography sources than expected, the authors note that faceted searches represent a greater success rate when compared to open-ended searches. future work will include comparative (a/b) testing of commonly deployed discovery layer configurations and limiters to measure the impact of local decisions on discovery layer efficacy as well as noting where in the results list a citation match occurs. introduction discovery environments are ubiquitous in academic libraries as all but two libraries in the association of research libraries (arl) report using a discovery environment, and they continue to gain traction in other library settings.1 the one-stop shopping model of discovery environments is one of their most alluring features as it closely resembles searching the open web. this familiarity allows users who are accustomed to searching the web to feel comfortable searching the library catalog without fear of encountering a “failed” search (zero result set). discovery environments seldom fail to return results as even the most rudimentary or naïve search strategy will return something for a user. this idea of “returning something” has been anecdotally noted as a positive as it ensures the user does not give up and allows novices to be successful with limited search sophistication or prior instruction from information professionals. 
one of the potential negatives to this approach however is the sheer volume of material that is returned per search query. library discovery environments often present thousands, if not millions, of search results from an initial search query. this emulation of google is essentially blake l. galbreath (blake.galbreath@wsu.edu) is core services librarian, washington state university. alex merrill (merrilla@wsu.edu) is head of library systems and technical operations, washington state university. corey m. johnson (coreyj@wsu.edu) is instruction & assessment librarian, washington state university. © 2021. mailto:blake.galbreath@wsu.edu mailto:merrilla@wsu.edu mailto:coreyj@wsu.edu information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 2 making the time-honored study of relevancy (precision/recall) moot. how can one determine the number of relevant documents in a search query if the number of documents returned is becoming limitless? this study aims to create a quantitative, reproducible framework to test the relevancy of results returned from, and the overall efficacy of, a library discovery environment, in this case, ex libris primo. within this framework, the authors compare the results returned in model primo search queries against the bibliographic citations used in students’ research papers. background the university common requirements (ucore) curriculum, implemented in fall 2012, was a major redesign of the washington state university (wsu) undergraduate general education program. ucore is comprised of required categories of classes designed to build student proficiency in the seven undergraduate learning goals.2 roots of contemporary issues (rci) is the sole mandated undergraduate course under the ucore system.3 during the 2018–2019 academic year, over 4,500 students were enrolled in rci at wsu, the vast majority being first-year students. this paper utilizes data from the rci library research project, a term-length research experience with four central assignments designed to familiarize students with the fundamentals of quality research and a cumulative research paper where they utilize the skills learned. the research project components are spaced evenly throughout the term; students are guided along the research process from general topic formation, to research question generation, to thesis statement defense in the final paper. students are tasked with finding sources of particular resource types (e.g., journal articles), describing the value of these sources for their research, and citing them properly in chicago style. wsu libraries uses the discovery environment primo, an ex libris product, to provide resources to its patrons.4 specifically, wsu libraries uses the new user interface version of primo, which incorporates search results from the primo central index (pci) in its default search. primo, like all discovery environments, provides results with a wide variety of resource types so rci students can use it at all stages of the term research project. students use it in the pursuit of contemporary newspaper articles, history monographs, history journal articles, and primary sources. in this article, the authors focus on the versatility of primo, using rci student paper bibliographies as the central data source for the project. literature review the need for assessment of library resources and services in higher education has been welldocumented. 
libraries are increasingly asked to provide tangible evidence they aid student information literacy skill development and thus advance achievement of institutional learning outcomes. accrediting bodies acknowledge, “the importance of information literacy skills, and most accreditation standards have strengthened their emphasis on the teaching roles of libraries.”5 oakleaf and kaske also stress the importance of librarians choosing assessments that can contribute to university-wide assessment efforts, noting they are preferable to assessments that only benefit libraries.6 the washington state university libraries is committed to assessment of its resources and services, with primo as a central target resource, and with large, lowerundergraduate courses as a primary area of focus. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 3 there are numerous papers which document usability testing of primo. prommann and zhang (2015) analyzed the efficiency of primo through hierarchical task analysis (hta). they counted the number of physical and cognitive steps necessary to get to records or full text of known items and concluded that primo is “a flexible discovery layer as it helps achieve many goals with minimum amount [sic] of steps.”7 although many of these studies articulate avenues of success in terms of user interaction with the discovery environment, there are also reports of difficulties in a variety of categories. students have problems with source retrieval, for example, understanding availability status terminology and labels, and using link resolvers and interlibrary loan.8 dalal et al. (2015) demonstrated that retrieving the full text of an article in a discovery environment is sometimes unintuitive for students and involves navigating multiple interfaces. 9 users also have issues using facets to find particular resource types or distinguishing between them. 10 while the study addressed in this paper does not directly address user difficulties with primo functionality, issues with source retrieval point to a plausible explanation for the few matches between the model search results and student paper bibliographies. it is possible students saw many of the same sources from the model searches in their results, but ultimately did not secure those sources because of the difficulties outlined above. in other words, some source selection choices are based mostly on availability, not as much on relevance. source relevancy is an active area of research for web-based discovery services, in terms of comparative studies to disciplinary subject databases. evelhoch and zebulin analyzed two years of usage data from both primo and a selection of subject databases, concluding that users have difficulty finding relevant sources in primo or they are not available. 11 based on users’ judgments, lee and chung, determined that ebsco discovery service was less effective than a set of education and library subject databases in terms of source relevance. 
12 another study illustrated that while students preferred discovery environments, the articles they selected from the subject (indexing and abstracting) databases were more authoritative.13 finally, librarians are posited to believe that subject databases are superior to discovery environments in terms of the relevancy of search results and disciplinary coverage.14 conclusions about source relevancy are complicated by the fact that students infrequently look beyond a first page of results lists.15 researchers have also explored the idea of primo user satisfaction through the presence of relevant results. in one instance, using online questionnaires and in-person focus groups, researchers found users had a high level of satisfaction with their institution’s discovery environment, largely attributed to the quality of search results over ease of use.16 hamlett and georgas (2019) conducted a mixed-methods user experience study to understand student perceptions of relevancy in primo. this study found that participants believed primo to return relevant results (with an average score of 8.3 out of 10). however, some of the qualitative responses indicated that the keywords used did not actually yield relevant results. 17 many other methods and measures have been executed in determining the value and usefulness of primo. huurdeman, aamodt, and heggo analyzed a dataset of 50 popular queries in primo. they deemed a query successful if the first 10 results included the (likely) targeted resource and found that 58% of the queries from the popular searches dataset had been successful, while 20% were unsuccessful, and 22% could not be determined. their approach assumed there is one intended document per query and that the authors can surmise what it is.18 the research presented in the remainder of this article below is unique in that the authors explore user judgment of source relevance (satisfaction) as a function of whether sources in the model primo searches for their topics existed in the students’ papers’ bibliographies. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 4 methods research questions the impetus for this study was to understand the factors that play a role in establishing a framework to test the relevancy of results returned from primo. the authors attempted to answer the following questions: • how effective is primo at returning relevant results? • to what extent does faceting improve search results? • which search strategies are the most effective within the given framework? • how can the researchers refine the framework for future investigations into relevancy? • what are the implications of this study for end users? data collection the authors began with a sample of 100 randomly selected and anonymized research papers that were submitted to the roots of contemporary issues (rci) courses in fall 2018 and spring 2019 semesters. the study used a two-pronged approach to generate keywords for model primo search queries. for one approach, keywords were machine-generated via a word-vector generation process. for the other, keywords were human-generated by a student research assistant to approximate natural language queries. keywords and queries, machine a rapidminer (https://rapidminer.com/) word-vector generation process with term-frequency schema converted the research papers into keywords, which the authors then used to generate search queries. 
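the full rapidminer operator chain is described in the paragraphs that follow. as a rough, simplified stand-in for that kind of term-frequency keyword extraction (not the authors' actual process), the sketch below lowercases a paper, tokenizes on non-letters, drops a small set of stop words, applies a toy suffix-stripping stemmer in place of the snowball stemmer, and counts the most frequent unigrams and bigrams; the stop-word list and stemmer are assumptions made for illustration.

```python
# simplified term-frequency keyword extraction, loosely modeled on the
# rapidminer steps described in the text; stop list and stemmer are toy stand-ins.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to", "was",
              "were", "is", "are", "for", "by", "with", "that", "this", "it"}

def stem(token):
    # crude stand-in for a snowball stemmer: strip a few common suffixes
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token

def ngram_counts(text, max_terms=20):
    tokens = [t for t in re.split(r"[^a-z]+", text.lower()) if t]
    tokens = [stem(t) for t in tokens if t not in STOP_WORDS and len(t) > 1]
    counts = Counter(tokens)                                           # unigrams
    counts.update("_".join(pair) for pair in zip(tokens, tokens[1:]))  # bigrams
    return counts.most_common(max_terms)

paper_text = "The Atlantic slave trade reshaped societies across Africa ..."  # full paper text here
for ngram, freq in ngram_counts(paper_text):
    print(ngram, freq)
```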
within the main routine, the process documents from files operator, rapidminer transformed the texts into lower case and tokenized the final papers according to non -letters. rapidminer then filtered the data by those tokens representing nouns and adjectives, removed english stop words, and filtered tokens by length, with a minimum of one character and maximum of 50 characters. the researchers then applied a snowball stemmer for english words and generated 20 n-grams per paper, each with a maximum length of four. table 1 illustrates the product of the word-vector generation process. throughout this example research paper, "trade” occurred 40 times, "slave” occurred 34 times, “slave” and “trade” occurred together 26 times, "africa” occurred 18 times, "impact” occurred 16 times, “african” occurred 11 times, and "peopl” occurred 10 times. table 1. example n-grams and frequency as retrieved from rapidminer n-gram number of occurrences trade 40 slave 34 slave_trade 26 africa 18 impact 16 african 11 peopl 10 ... ... https://rapidminer.com/ information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 5 number of n-grams after compiling the data in rapidminer, the authors created a process to select those n-grams to use in the model primo search queries. huurdeman, aamodt, and heggo (2018) found that users included an average of 2.6 terms per query in their popular searches dataset.19 in a report by ex libris, stohn indicates that most topic-search queries contain five or fewer words.20 in order to investigate both ends of this spectrum, this study constructed short-length queries, consisting of two n-grams, and full-length queries, consisting of four n-grams, using the following rubric to help systematize the construction. rubric to select n-grams for shortand full-length queries pick terms that satisfy the following criteria: 1. n-grams that occur more frequently in a paper are preferred to those that occur less frequently. 2. if two n-grams appear to be structural derivatives of the same word (e.g., korea and korean), select the shortest n-gram and truncate it. 3. if one or more of the top terms appear in a later 2-gram, use the 2-gram as a phrase search. 4. ignore n-grams with repeating terms (e.g., south_africa_africa). 5. truncate all terms (using asterisk or question mark), except the first term of a phrase search, unless the first term is not a complete word (e.g., “busi* meeting*”). 6. for terms or phrases that end in truncated “i”, use the truncated version of the term and its truncated “y” counterpart, and combine both with an or operator (e.g., countri* or country*). 7. ignore all 3and 4-grams as they have a propensity to create nonsensical phrase searches (e.g., racism_polic_brutal). 8. if abbreviations are encountered, expand them for searching purposes (e.g., us is “united states”), except in cases where they are more commonly known by their abbreviation (e.g., ddt). 9. ignore results of contractions (e.g., ‘t) in case of a tie in the selection of an n-gram, sequence the following rules for selection: 1. preference proper nouns over other nouns and adjectives. if there are multiple proper nouns, preference place-name proper nouns over other proper nouns. 2. preference the n-gram that occurs in the greatest number of two or more n-grams later in the list. 3. preference longer words over shorter words. 4. group all the tied n-grams with a series of or statements. 
note: this may result in the selection of more than four total n-grams. referring to the example n-grams from table 1, an illustration of this method is shown in the following steps: 1. arrange terms from highest to lowest frequency. 2. select slave_trade as first n-gram, since “trade” and “slave” both occur in later n-gram. truncate to “slave trade*”. 3. select africa since it has the next greatest number of occurrences. combine africa with african since they are structural derivatives of one another. truncate to africa*. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 6 at this point, the first two selected n-grams—slave_trade and africa—become the keywords of the short-length query “slave trade*” and africa*. 4. select impact since it has the next greatest number of occurrences. truncate to impact*. 5. select peopl since it has the next greatest number of occurrences. truncate to peopl*. finally, the first four selected n-grams—slave_trade, africa, impact, and peopl—become the keywords of the full-length query “slave trade*” and africa* and impact* and peopl*. on average, after stop words and booleans were removed, the full-length queries in this study were 5.69 keywords long, while the short-length queries were 3.11 keywords long. keywords and queries, natural language in addition to the machine-oriented keyword process, the authors employed a student research assistant to create human-generated phrases, consisting of 3–10 words, which served as synopses for each of the 100 papers. this study then used these phrases as proxies for creating natural language search queries. for the same example research paper cited in table 1 above, this student created the summary phrase history and effects of the slave trade. this phrase in its entirety became the natural language query. on average, after stop words and booleans were removed, the natural language queries used in this study were 3.95 keywords long. search results using the three keyword-generation strategies outlined above, the authors constructed search queries and ran them against the ex libris’ primo search api endpoint. table 2 summarizes example result sets from the above short-length query, full-length query, and natural language query. for each of the keyword-generation strategies, the authors constructed search queries along four parameters: queries that used no faceting (open-ended), queries that faceted to articles only (articles), queries that faceted to books and ebooks only (books), and queries that faceted to newspaper articles only (newspapers). in all, there were 12 search-query constructions (three query types by four faceting modes) for fall 2018 and 12 for spring 2019. to construct a baseline for the search comparisons, the researchers designed the initial search to be open-ended. that is, the study assumed that patrons most often use the default, basic search functionality, with no facets selected. a segment of the rci instruction specifically encourages students to incorporate materials with resource types articles, books, and newspaper articles into their research papers. the authors therefore assumed that these students would most likely utilize facets corresponding to these resource types in their more specific queries and mirrored this behavior in the comparative searches. 
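as a companion to the query-construction steps above, the sketch below shows roughly what a request to the primo search api could look like from python. the endpoint url, the q=any,contains query syntax, the parameter names, and the shape of the json response are stated from general familiarity with ex libris rest apis rather than taken from the article, so they should be read as assumptions, and the api key and view identifiers are placeholders.

```python
# rough sketch of sending one constructed query to a primo search api endpoint
# and collecting result titles; endpoint, parameters, and response shape are
# assumptions, and the key/view values are placeholders.
import requests

API_URL = "https://api-na.hosted.exlibrisgroup.com/primo/v1/search"  # assumed endpoint
API_KEY = "YOUR_API_KEY"                                             # placeholder

def top_titles(query, limit=50):
    params = {
        "q": f"any,contains,{query}",   # assumed primo query syntax
        "vid": "INSTITUTION_VIEW",      # placeholder view id
        "tab": "default_tab",           # placeholder tab
        "scope": "default_scope",       # placeholder scope
        "limit": limit,
        "apikey": API_KEY,
    }
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    titles = []
    for doc in resp.json().get("docs", []):          # assumed response structure
        display = doc.get("pnx", {}).get("display", {})
        title = display.get("title", [""])
        titles.append(title[0] if title else "")
    return titles

for t in top_titles('"slave trade*" AND africa*'):
    print(t)
```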
each primo search api returned titles for the top 50 results, moving beyond users’ usual search behavior in an effort to provide more flexibility to the initial steps of the relevancy framework. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 7 table 2. first-occurring result titles for query types: short-length, full-length, and natural language queries query type query first-occurring result titles short-length “slave trade*” and africa* the atlantic slave trade the atlantic slave trade : a census the atlantic slave trade legacy of the trans-atlantic slave trade : hearing before the subcommittee on the constitution, civil rights, and civil liberties of the committee on the judiciary, house of representatives, one hundred tenth congress, first session, december 18, 2007. ... full-length “slave trade*” and africa* and impact* and peopl* the atlantic slave trade the atlantic slave trade : effects on economies, societies, and peoples in africa, the americas, and europe slave trades, 1500–1800 : globalization of forced labour african voices of the atlantic slave trade : beyond the silence and the shame ... naturallanguage history and effects of the slave trade urban history, the slave trade, and the atlantic world 1500–1900 the atlantic slave trade and british abolition, 1760– 1810 the decolonization of african education and history the united states and the transatlantic slave trade to the americas, 1776–1867 ... a student research assistant harvested all the citations used across the 100 example papers to create an inventory of 730 bibliographic citations. using the excel fuzzy lookup add-in, the authors then compared this bibliographic inventory against the 60,000 titles that were returned information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 8 via the primo search api. this add-in fuzzy matches rows between two different tables and assigns a similarity score for each match. the study focused attention on rows with matching scores of .80 and above to further investigate potential matches. using the fuzzy matches as a starting point, the authors confirmed or denied matches by hand, using title and resource type as the main criteria. table 3. sample comparison of citations used in research papers against results returned from primo search api fuzzy score citation title citation resource type results title result resource type confirmed match 1.0000 a short history of biological warfare article a short history of biological warfare article yes 0.9933 the female madlady women, madness, and english culture, 1830– 1980 print book the female malady : women, madness, and english culture, 1830–1980 print book yes 0.9778 industrial revolution web resource the industrial revolution e-book no 0.9037 drug use & abuse print book drug use and abuse : a comprehensive introduction print book no results source citation data description this study compared citations gathered from a random sample of 100 research papers from the two semesters of all sections of history 105/305 taught at washington state university (wsu) from fall 2018 to spring 2019. table 4 below gives a descriptive breakdown of the citations by resource type. the student research assistant identified and categorized the source citation list. 
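before turning to that breakdown, the fuzzy-matching step just described can be approximated in plain python. the sketch below uses difflib's similarity ratio as a stand-in for the excel fuzzy lookup add-in's own scoring, flags candidate pairs at or above the .80 cutoff used in the study, and leaves confirmation to a manual title and resource-type check; the example titles are taken from table 3, and the scores produced here will differ from the add-in's.

```python
# approximate fuzzy title matching between cited sources and api result titles;
# difflib's ratio is a stand-in for the excel fuzzy lookup add-in's similarity score.
from difflib import SequenceMatcher

THRESHOLD = 0.80  # cutoff used in the study to flag candidates for hand checking

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidate_matches(cited_titles, result_titles):
    # yield (cited, result, score) pairs at or above the threshold
    for cited in cited_titles:
        for result in result_titles:
            score = similarity(cited, result)
            if score >= THRESHOLD:
                yield cited, result, score

cited = ["a short history of biological warfare", "drug use & abuse"]
results = ["a short history of biological warfare",
           "drug use and abuse : a comprehensive introduction"]

for cited_title, result_title, score in candidate_matches(cited, results):
    print(f"{score:.2f}  {cited_title}  ->  {result_title}")
```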
information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 9 table 4. total source citations resource type fall 2018 (% of total) spring 2019 (% of total) book chapter 7 (1.94%) 4 (1.08%) books (e-books/print) 107 (29.72%) 96 (25.95%) newspaper article 63 (17.50%) 60 (16.22%) journal article 84 (23.33%) 99 (26.76%) reference entry 6 (1.67%) 6 (1.62%) other/ cannot determine 10 (2.78%) 15 (4.05%) web document 81 (22.50%) 90 (24.32%) magazine article 1 (.28%) n/a newspaper/magazine article 1 (.28%) n/a semester citation count 360 (100%) 370 (100%) total citation count 730 target citations list data the citations collected from the papers were then compared against 60,000 citations retrieved from the wsu primo search api endpoint on july 24, 2020, as described previously in the methods section. to better account for the differing numbers of citations among resource types in the source data and to normalize reporting across query types and semesters, most results are presented as a percentage and referred to as the matching success rate. for example, the natural language query had six matches out of a possible 360 citations in the open-ended search for citations from the fall of 2018. the matching success rate of the open-ended search in the fall of 2018 therefore is calculated at 1.67% (see table 5). table 6 below shows the percentage results for short queries, and table 7 for full queries. for information about the raw source numbers and target data, please see the open science framework project site.21 when all query types and faceting modes are considered, the matching success rate almost uniformly increased from fall 2018 to spring 2019. the largest difference in matching success rate was observed in the full-query articles only search at 8.91% as shown in table 7. the open-ended search observed the smallest difference in positive movement and the anomaly of a diminishing success rate. across the natural language and full-query types the open-ended search exhibited the least amount of positive difference in success rate, at 1.04% and 0.26% respectively, and the short-query open-ended search had a small negative change in success rate at −0.36%. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 10 table 5. natural language query results success rate fall 2018 spring 2019 % difference open-ended search 1.67% 2.70% 1.04% articles only 4.76% 9.09% 4.33% books only 3.74% 11.46% 7.72% newspapers only 0.00% 1.67% 1.67% table 6. short-query results success rate fall 2018 spring 2019 % difference open-ended search 3.33% 2.97% −0.36% articles only 3.57% 5.05% 1.48% books only 9.35% 10.42% 1.07% newspapers only 0.00% 3.33% 3.33% table 7. full-query results success rate fall 2018 spring 2019 % difference open-ended search 0.56% 0.81% 0.26% articles only 1.19% 10.10% 8.91% books only 0.93% 5.21% 4.27% newspapers only 0.00% 5.00% 5.00% total unique matches across all three search strategies and their four iterations, the researchers also note a raw count of matches which helps to determine how an overall search strategy is performing at finding matching citations. as the reader might expect, this metric includes a matching citation once across all four iterations of a search strategy. 
meaning, even if a source citation appears in both the open-ended search and the books only search, that source citation is only counted once for the purpose of this metric. for example, in the natural language query in fall 2018, six citations were matched in the openended search. four of the citations were articles and two were books. some of the matches in the articles and books searches were redundant to the open-ended search. considering only unique matches in the articles, books, and newspaper searches, the authors calculated the total number of unique matches. when the target searches were compared, the researchers matched two additional citations in the books only citations list. when the authors add the two additional matches, there were a total of eight unique citation matches across all iterations of the natural language search (open-ended search, books only, articles only, newspapers only). the total unique matches number and the corresponding success rate of the total unique matches for each search strategy is shown in table 8. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 11 table 8. total unique matches fall 2018 spring 2019 % difference natural language query 8 (2.22%) 22 (5.95%) 3.72% short query 14 (3.89%) 16 (4.32%) 0.44% full query 3 (0.83%) 18 (4.86%) 4.03% matches added by faceting another metric used to measure overall effectiveness of faceted searching is the percentage of matching citations that are new to the results list when limited to a certain resource type— matches added by faceting. meaning, what matching citations were not present in the open-ended search results but are then matched when the results list is reduced to only a single resource type. in table 9, the percentage of matches that are new and only to be found in a targeted search result varies greatly. between both semesters and among all search iterations, the smallest percentage of matches added by faceting is 14.29% and the largest is 83.33%. table 9. matches added by faceting fall 2018 spring 2019 % difference natural language query 2 (25.00%) 12 (54.55%) 29.55% short query 2 (14.29%) 5 (31.25%) 16.96% full query 1 (33.33%) 15 (83.33%) 50.00% comparing search strategies the matching success rate across search strategies (natural, short, full) and iterations is a mixed result and does not allow for very useful comparison beyond descriptions of difference which are outlined in the comparison tables (tables 5–7). to better compare the search strategies as a whole, as opposed to how a particular iterative search performed relative to another open or targeted search, the researchers used a weighted success rate of the total unique matches from both semesters as the proxy for overall performance and the point of comparison among the three search strategies. the comparison of this weighted success rate shows no difference in overall success rate between the natural language query (4.11%) and the short query (4.11%). the search strategy that was demonstrably different in weighted success rate is the full query at a lagging 2.88%. see table 10 for comparison and calculation details. table 10. weighted success rate of total unique matches natural language query (2.22%*360)+(5.95%*370)/730 4.11% (0.04109589) short query (3.89%*360)+(4.32%*370)/730 4.11% (0.04109589) full query (0.83%*360)+(4.86%*370)/730 2.88% (0.02876712) discussion how effective is primo at returning relevant results? 
discussion

how effective is primo at returning relevant results?

according to the preliminary findings, primo is relatively ineffective at providing search results that match the citations used by the student researchers. the matching success rates of the open-ended searches range from 0.56% to 3.33%. the possible reasons for these low numbers are numerous and varied: students may have intended to use sources from the auto-generated results lists but been unable to locate the full text; many sources may have been found on the open internet outside the discovery layer; and open-ended searches may have been flooded with rarely cited reference materials and very contemporary newspaper articles (see more about these ideas below). future research aims to understand more clearly which potential factors are present and to what degree they impact the matching success rates.

to what extent does faceting improve search results?

faceting within primo leads to better results, although the matching success rates remain low. the faceted searches contain the only matching success rates above ten percent: 10.10% (full query, articles only), 10.42% (short query, books only), and 11.46% (natural language query, books only). the data show that the majority of unique matches found by the 2019 full-length and natural language search strategies occurs within the faceted searches (83.33% and 54.55%, respectively). it is interesting to note that these represent the two longer query strings, on average. future testing will reveal whether there is a relationship between query length and the percentage of matches added by faceting.

which search strategies are the most effective within the given framework?

looking at the search strategies holistically, the researchers note that the total unique matches increased from fall 2018 to spring 2019 across all three query types. this increase was expected behavior, partially due to the fact that primo relevancy ranking algorithms assume that patrons prefer newer materials.22 the weighted success rate is an attempt to understand each search strategy's performance over the 2018–2019 academic year, as opposed to comparing one semester to the other. by this metric, the consistent performance of the short-length query is as effective overall as the more dynamic performance of the natural language query. the researchers look forward to adding more data to this metric to understand in which direction the average might move.

how to refine the framework for future investigations into relevancy

the most popular resource types used in the source citations were books, journal articles, web documents, and newspaper articles. together, these categories comprised approximately 93% of all resource types in both fall 2018 and spring 2019. however, not all areas were equally accessible within washington state university's discovery layer configuration. the heavy reliance on web documents in the source citations was somewhat problematic, given that web documents did not constitute a faceted resource type in wsu libraries' primo prior to this study. therefore, the authors will need to better account for web documents in future testing. the assessment of newspaper articles also proved to be problematic, given their proclivity to inundate primo search results with numerous and recent documents.
the sheer number of newspaper articles published and indexed every year in primo for general and introductory topics can dilute the pool of possible target citations greatly. for example, a scan of the matching newspaper articles reveals that 67% (4/6) were published in 2018. in future studies, the researchers will limit publication dates for target citations to the appropriate time period (e.g., an upper limit of may 2019 would be placed on publication dates for papers written in spring 2019) or collect data closer to the submission of research papers. in 11 out of 12 cases, matching success rates were better in spring 2019 than fall 2018, most likely due to recency. it is common for discovery environments, and true for the environment used in this study, to present content information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 13 sorted by relevance and then publication date. therefore, the researchers expected to and did find an increased matching success rate closer to the date of testing, with the one exception of the short-length, open-ended search query. this anomaly led researchers to dig more deeply into the target citations to see if a cause could be determined. researchers found a larger than expected number of citations for resource types that are underrepresented in source citations. for example, the reference entry resource type surfaced prominently in the open-ended search for several of the queries, diluting the pool of target citations with entries that had little chance of appearing in the source citation lists. in one standout example, there were four separate reference entries titled simply “taiping rebellion.” the discovery environment gave preference to these four separate reference entries over other, more substantive works, that are more likely to be cited in an academic paper. the researchers surmise this is partly a function of the relevancy ranking algorithm that gives greater weight to matches in the title, author, and subject fields.23 depending on the search and the configuration of the discovery environment, it is possible that reference entries would push other results from books, articles, and newspaper resource types farther down the results list, making them less and less visible in an open-ended search for a given topic. this dilution of the target citations with resource types that are not emphasized or widely used in source citations is another area the researchers aim to isolate and examine in further rounds of testing. in addition to the source recency and particular source type issues explained above, the authors did not take into account source availability, nor where sources were found by students, which remains a confounding factor on matching success rate. subsequent studies will capture whether sources are present in the local deployment of primo during the time frame the students were conducting research. this issue will be further addressed and mitigated by analyzing urls provided within student source citations. implications of this study for end users the matching success rate in the open-ended search when compared to the type-limited searches leads to a discussion of how to define and present the default search of the discovery environment to best serve an academic population. more pointedly, it opens the discussion of what resource types to include within that default search to return the most relevant and useful results and not just the most results. 
in this case, the argument could be made that excluding several resource types (e.g., reference entries) would surface resources that are more likely to be cited in a researcher’s scholarship. based on the number of matches that were introduced by performing a faceted search, it is evident that researchers still need to utilize a search strategy which includes using search filters and limiters (prior to or following the initial search) and other search tactics in a discovery environment to return relevant results. the notion that an open-ended “one and done search,” for even the most introductory of topics, will be successful in retrieving many usable and citable resources in the first page or two of results is not supported by the results of this study. conclusions and next steps as the common adage goes, “it’s not what you say, it’s what you do.” in this study, the saying applies as the researchers move beyond what sources students think are relevant to the sources students ultimately use in their papers. the current slate of discovery environment research projects focuses largely on users’ affective connections to discovery environments, often information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 14 compared to other kinds of academic databases, and places users in temporary, hypothetical research scenarios in order to judge source relevance.24 in juxtaposition, the rci research project is a term-length (10–14 weeks) venture; students have a significant amount of time and the aid of a scaffolded set of assignments, to bolster their source relevance assessment skills and authority. methodologies which closely mirror the authentic experiences and curriculum of the students are those which arguably will provide a more accurate picture of the value of the discovery environment in an academic setting. the authors of this study took the first steps in building a relevancy rating system for discovery environments. to standardize their preliminary results, they generated four metrics: matching success rate, total unique matches, matches added by faceted search, and weighted success rate. while the results of this study do not allow the researchers to draw statistical conclusions regarding the dominance of one search strategy over another in returning relevant results, the frequencies showed a better match (success) rate with faceted than non-faceted searching. discovery environments are commonly advertised as providing an easy to use, one-stop location for academic research needs, but the reality is more complex. students need to engage these systems with multiple search refinements to find valuable materials. this investigation was also the initial attempt to create a machine-generated framework to test the relevancy of web-based discovery environment’s results. as the authors look to build upon this preliminary study, there are several avenues to pursue that will enhance the methodology of the framework. one avenue is a refinement of the boundaries of the testing framework. this boundary refinement includes a re-examination of the criteria for inclusion in both the source citations and the search results list. in the current study, all student citations were deemed viable regardless of whether the source citation was able to be verified and accessed. this led to the inclusion of citations of lecture notes and other such materials that are not generally expected to appear in a discovery environment. 
the authors will also re-examine the inclusion of newspapers and reference works in open-ended searching. these two resource types are large in number, are not indexed very well, and often do not have descriptive titles. a portion of the next round of research will be dedicated to comparative testing (a/b) of generally deployed discovery environment configurations. another avenue of exploration is determining where in the results list a citation appears, not just the binary positive or negative, and measuring any impact based on behavior of the search (i.e., search construction) or behavior and configuration of the discovery environment. refining the methodology of the current framework will result in fewer potentially confounding factors and allow librarians to regain an understanding of relevancy when it comes to teaching discovery layers to student researchers. these next steps will contribute to the overall picture concerning the value and efficacy of web-based discovery environments that is steadily taking shape. information technology and libraries june 2021 framework for measuring relevancy in discovery environments | galbreath, merrill, and johnson 15 endnotes 1 marshall breeding, “library technology guides: academic members of the association of research libraries: index-based discovery services,” library technology guides, https://librarytechnology.org/libraries/arl/discovery.pl. 2 “student learning goals,” washington state university common requirements, 2018, https://ucore.wsu.edu/about/learning-goals. 3 “welcome to the roots of contemporary issues,” washington state university department of history, 2017, https://ucore.wsu.edu/faculty/curriculum/root/. 4 “search it,” washington state university libraries, 2020, https://searchit.libraries.wsu.edu/. 5 megan oakleaf and neal kaske, “guiding questions for assessing information literacy in higher education,” portal: libraries and the academy 9, no. 2 (2009): 277, https://doi.org/10.1353/pla.0.0046. 6 oakleaf and kaske, “guiding questions.” 7 marlen prommann and tao zhang, “applying hierarchical task analysis method to discovery layer evaluation,” information technology and libraries 34, no. 1 (2015): 97, https://doi.org/10.6017/ital.v34i1.5600. 8 rice majors, “comparative user experiences of next-generation catalogue interfaces,” library trends 61, no. 1 (2012): 186–207, https://doi.org/10.1353/lib.2012.0029; david comeaux, “usability testing of a web-scale discovery system at an academic library,” college & undergraduate libraries 19, no. 2–4 (2012): 199, https://doi.org/10.1080/10691316.2012.695671; greta kliewer et al., “using primo for undergraduate research: a usability study,” library hi tech 34, no. 4 (2016): 566–84, http://doi.org/10.1108/lht-05-2016-0052; blake galbreath, corey m. johnson, and erin hvizdak, “primo new user interface,” information technology and libraries 37, no. 2 (2018): 10–33, https://doi.org/10.6017/ital.v37i2.10191. 9 heather dalal, amy kimura, and melissa hofmann, “searching in the wild: observing information-seeking behavior in a discovery tool” (association of college & research libraries 2015 conference proceedings, march 25–28, 2015): 668–75, http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/201 5/dalal_kimura_hofmann.pdf. 10 comeaux, “usability testing”; xi niu, tao zhang, and hsin-liang chen, “study of user search activities with two discovery tools at an academic library,” international journal of humancomputer interaction 30, no. 
5 (2014): 422–33, https://doi.org/10.1080/10447318.2013.873281; kevin patrick seeber, “teaching ‘format as a process’ in an era of web-scale discovery,” reference services review 43, no. 1 (2015): 19–30, https://doi.org/10.1108/rsr-07-2014-0023; kylie jarret, “findit@flinders: user experiences of the primo discovery search solution,” australian academic & research libraries 43, no. 4 (2012): 278–99, https://doi.org/10.1080/00048623.2012.10722288; aaron nichols et al., “kicking the tires: a usability study of the primo discovery tool,” journal of web librarianship 8, no. 2 (2014): 172–95, https://doi.org/10.1080/19322909.2014.903133; kelsey renee brett, ashley lierman, and cherie turner, “lessons learned: a primo usability study,” information technology and libraries 35, no. 1 (2016): 7–25, https://doi.org/10.6017/ital.v35i1.8965; galbreath, johnson, and hvizdak, “primo new user interface.” 11 zebulin evelhoch, “where users find the answer: discovery layers versus database,” journal of electronic resources librarianship 30, no. 4 (2018): 205–15, https://doi.org/10.1080/1941126x.2018.1521092. 12 boram lee and eunkyung chung, “an analysis of web-scale discovery services from the perspective of user’s relevance judgement,” journal of academic librarianship 42 (2016): 529–34, https://doi.org/10.1016/j.acalib.2016.06.016. 13 sarah p. c. dahlen and kathlene hanson, “preference vs. authority: a comparison of student searching in a subject-specific indexing and abstracting database and a customized discovery layer,” college & research libraries 78, no. 7 (2017): 878–97, https://doi.org/10.5860/crl.78.7.878. 14 stefanie buck and christina steffy, “promising practices in instruction of discovery tools,” communications in information literacy 7, no. 1 (2013): 66–80, https://doi.org/10.15760/comminfolit.2013.7.1.135; anita k. foster, “determining librarian research preferences: a comparison survey of web-scale discovery systems and subject databases,” journal of academic librarianship 44 (2018): 330–36, https://doi.org/10.1016/j.acalib.2018.04.001. 15 diane cmor and xin li, “beyond boolean, towards thinking: discovery systems and information literacy,” 2012 iatul proceedings, paper 7, https://docs.lib.purdue.edu/iatul/2012/papers/7/; kliewer et al., “using primo”; alexandra hamlett and helen georgas, “in the wake of discovery: student perceptions, integration, and instructional design,” journal of web librarianship 13, no. 3 (2019): 230–45, https://doi.org/10.1080/19322909.2019.1598919.
16 courtney lundrigan, kevin manuel, and may yan, “‘pretty rad’: explorations in user satisfaction with a discovery layer at ryerson university,” college & research libraries 76, no. 1 (2015): 43–62, https://doi.org/10.5860/crl.76.1.43. 17 hamlett and georgas, “in the wake of discovery.” 18 hugo c. huurdeman, mikaela aamodt, and dan michael heggo, “‘more than meets the eye’—analyzing the success of user queries in oria,” nordic journal of information literacy in higher education 10, no. 1 (2018): 18–36, https://doi.org/10.15845/noril.v10i1.270. 19 huurdeman, aamodt, and heggo, “more than meets the eye.” 20 christina stohn, “how do users search and discover?: findings from ex libris user research,” ex libris, 2015, https://www.exlibrisgroup.com/blog/ex-libris-user-studies-how-do-users-search-and-discover/. 21 alex merrill and blake l. galbreath, “a framework for measuring relevancy in discovery environments,” 2020, https://osf.io/ve3kp/. 22 “primo search discovery: search, ranking, and beyond,” ex libris, 2015, https://www.exlibrisgroup.com/products/primo-discovery-service/relevance-ranking/. 23 “primo search discovery,” 3. 24 lee and chung, “an analysis of web-scale discovery services”; dahlen and hanson, “preference vs. authority”; lundrigan, manuel, and yan, “pretty rad”; hamlett and georgas, “in the wake of discovery.”

president’s message: 50 years
andromeda yelton
information technologies and libraries | september 2017

fifty years. lita was voted into existence (as isad, the information science and automation division) in detroit at midwinter 1966. therefore we have just completed our first fifty years, a fact celebrated (thanks to our 50th anniversary task force) with a slide show and cake at annual in chicago. it’s truly humbling to take office upon this milestone. looking back, some of the true giants of library technology have held this office.
in 1971-72, jesse shera, who in his wide-ranging career challenged librarians to think deeply about the epistemological and sociological dimensions of librarianship; ala makes several awards in his name today. in 1973-74 and again in 1974-75, frederick kilgour, the founding director of oclc, who also has an eponymous award. in 1975-76, henriette avram, the mother of marc, herself. moreover, thanks to the work of countless lita volunteers, much of this history is available openaccess. i strongly recommend reading http://www.ala.org/lita/about/history/ for an overview of the remarkable people and key issues across our history. you can also read papers by avram and kilgour, among many others, in the archives of this very publication. in fact, reading the ital archives is deeply engaging. it turns out library technology has changed a bit in 50 years! (i trust that isn’t a shock to you.) the first articles (in what was then the journal of library automation) are all about instituting first-time computer systems to automate traditional library functions such as acquisitions, cataloging, and finance. the following passage caught my eye: “a functioning technical processing system in a two-year community college library utilizes a model 2201 friden flexowriter with punch card control and tab card reading units, an ibm 026 key punch, and an ibm 1440 computer, with two tape and two disc drives, to produce all acquisitions and catalog files based primarily on a single typing at the time of initiating an order” (“an integrated computer based technical processing system in a small college library”, jack w. scott; https://doi.org/10.6017/ital.v1i3.2931.) how many of us are still using punch cards today? and, indeed, how many of us are automating libraries for the first time? the topics discussed among lita members today are far more wideranging: user experience, privacy, accessibility. they’re more likely to be about assessing and improving existing systems than creating new ones, and more likely to center on patron-facing technologies. andromeda yelton (andromeda.yelton@gmail.com) is lita president 2017-18 and owner/consultant of small beautiful useful llc. president’s message | yelton https://doi.org/10.6017/ital.v36i3.10086 2 and yet, with a few substitutions — say, “raspberry pi” for “friden flexowriter” — the blockquote above would not be out of place today. then as now, lita members were doing something exciting, yet deeply practical, that cleverly repurposes new technology to make library experiences better for both patrons and staff. our job descriptions have changed enormously in fifty years; in fact, the lita board charged a task force to develop lita member personas, so that we can better understand whom we serve, and work to align our publications, online education, conference programming, and committee work toward your needs. (you can see an overview of the task force’s stellar work on litablog: http://litablog.org/2017/03/who-are-lita-members-lita-personas/.) at the same time, the spirit of pragmatic creativity that runs throughout the first issues of the journal of library automation continues to animate lita members today. i’m looking forward to seeing where we go in our next fifty years. 
communications

creating and deploying usb port covers at hudson county community college

lotta sanchez and john delooper
information technology and libraries | september 2019

lotta sanchez (lsanchez@hccc.edu) is library associate – technology, hudson county community college. john delooper (john.delooper@lehman.cuny.edu) is web services – online learning librarian, lehman college, city university of new york.

abstract

in 2016, hudson county (nj) community college (hccc) deployed several wireless keyboards and mice with its imac computers. shortly after deployment, library staff found that each device’s required usb receiver (a.k.a. dongle) would disappear frequently. as a result, hccc library staff developed and deployed 3d printed port covers to enclose these dongles. this, for a time, proved very successful in preventing the issue. this article will discuss the development of these port covers, their deployment, and what worked and did not work about the project.

introduction

3d printing was invented in the 1980s but remained a niche product until emerging as a mainstream technology beginning in 2009. it has been speculated that the growth in popularity was due to several factors, most notably the expiration of patents on technologies such as fused deposition modeling.1 the expiration of this patent led to the emergence of several new companies such as makerbot, which developed and released lower priced 3d printers in an effort to popularize 3d printing.2 nevertheless, early 3d printers were still more expensive than most individual consumers could afford. as with laser printers in the 1980s, many libraries combined their role in the growing makerspace movement with their community purchasing power to bring this new technology to libraries across the united states.3 libraries thus became focal points in the nascent consumer 3d printing movement, frequently providing both training and access to supplies and equipment. as the popularity of 3d printing grew, new communities of 3d printing users emerged and began to design and share artwork and practical objects created with 3d printing technology, often via communities like thingiverse and shapeways.

3d printing at hudson county community college

in august 2014, the hudson county (nj) community college (hccc) library moved into a larger facility, nearly doubling its square footage. at this time, many libraries were beginning to open makerspaces, which are facilities for collaboration where the “emphasis is on creating with technology,” and hccc saw an opportunity to join this movement.4 given the results of student feedback surveys, and the observed popularity of 3d printing in public libraries, hccc librarians sought to purchase a 3d printer as a signature technology for the new makerspace. to support the new makerspace, the library’s staff implemented a series of workshops to teach students how to use the 3d printer and create their own projects. in addition, when the makerspace was not in use, the library’s administration allowed staff to experiment with the 3d printer, as well as all technologies housed in the makerspace, to allow them to better understand and promote these tools.
about hudson county community college

as per its 2017-18 hccc factbook, hccc is an urban institution “offering courses and classes in a wide variety of disciplines and studies in one of the most densely populated and ethnically diverse areas of the united states.”5 as of fall 2017, hccc’s full-time-equivalent student population is 7,712, and includes students representing “more than 90 nationalities.” many of these students hail from outside of the united states, “nearly 58 percent of whom speak a language other than english in their homes.” hccc’s demographics also skew young, with students ages 20 through 29 comprising approximately 52 percent of enrolled students. more recently, hccc has also increased its enrollment of high school students, as “the number of students under the age of 18, who are mostly enrolled through hccc’s various high school initiative programs, has more than quadrupled over the past five years.” as with many other community colleges, hccc’s student body includes approximately a 6:4 ratio of female to male students.

the mac usb dilemma

as part of the move to a new facility, the library purchased several new technologies such as computers including dell pcs and apple imacs (macs). the dell pcs came with wireless keyboards and mice, and in march 2016, the macs were switched to wireless keyboards and mice as well because their original keyboards and mice began to break down and needed replacement. students reported to library staff that the wireless keyboards and mice were a good investment, as they made it easier to move keyboards for better collaboration and for ease of storing backpacks and textbooks on desks. on both the dell pcs and the macs, the wireless keyboards and mice required the use of a small usb receiver, known as a dongle, to connect to the computer. as the wireless keyboards were installed, several library staff members raised concerns that wireless keyboards and mice would be tempting targets for theft by patrons. surprisingly, theft of keyboards and mice did not come to pass. since deployment, library staff reported no incidents of theft of any keyboards or mice. however, an unexpected type of theft soon emerged. library employees noticed that on the imacs, the type-a usb dongles, which were needed for the computers to receive input from the keyboards and mice, started disappearing. staff observed that this seemed to be a problem only among the library’s 18 macs, not its 57 dell computers, which also had wireless keyboards and wireless receivers. anecdotal observation suggested that this phenomenon emerged due to the dell’s dark color scheme, which obscured each computer’s usb ports and rendered the dongles inconspicuous. in contrast, the imacs had sleek aluminum finishes, on which the dongles were more visible, and seemed to be perceived by students as flash drives (see figures 1-5).

figure 1. hccc imac (back with dongle shown).
figure 2. hccc dell pc (back with dongle shown).
figure 3. hccc imac with usb dongle closeup.
figure 4. hccc pc with usb dongle closeup.
figure 5. comparison of mac and pc usb ports.
these perceptions were confirmed as students started visiting service points with dongles from the macs and turning them in to library staff as “lost flash drives.” as there was frequently a lag between when a dongle was turned over to staff and the device’s initial disappearance, students began to report frustration that they would try to use a mac and find that the mouse and keyboard could not communicate with the computer. this would cause them to assume that the mac was broken, and library staff would respond by taking the computer out of service until a tech could examine it, often several hours or even days later depending on staffing. during the first semester that these keyboards and mice were deployed, the library found that almost every usb receiver was lost or stolen. this resulted in over $300 of unplanned expenses. in addition, library staff spent dozens of hours inspecting the imacs after students reported non-functioning keyboards, determining what issue was occurring, ordering replacement parts, and connecting new dongles, a process also referred to as “pairing.” to address this problem, hccc’s director of library technology sought solutions from the library’s technology staff. at a staff meeting in the spring 2016 semester, most of the members of the technology unit suggested that the library address the disappearing dongle issue by purchasing new wired keyboards and mice. the director of the technology unit felt that this was a premature solution to the issue, as he and the library administration preferred a solution that allowed the library to continue to use the wireless keyboards and mice, which were both costly and requested by the institution’s student community. during this meeting, the idea of finding port covers for the dongles arose, and one of the library’s technology associates suggested using the library’s 3d printer to create a cover that inserts into one type a usb port and would cover the dongle in the adjacent slot. the library’s technology director asked her to create a prototype, and the technology associate began work on creating this port cover.

methodology

to create the 3d-printed port cover, the technology associate began with an online search of the 3d-printing community thingiverse, looking to see if any other 3d port covers already existed. she hoped to find an existing port cover that was both functional and easy to manufacture—in other words, quick to print, since the library’s makerbot replicator often took hours to print intricate designs and frequently jammed, due to an extruder design flaw that was common to fifth generation replicator printers.6 a thingiverse search found several varieties of port covers, but each was designed solely to occupy a port in order to prevent dust or corrosion, not to cover or hide dongles or other peripherals. since none of the existing designs adequately met the library’s needs, the technology associate created her own design using tinkercad, a web-based computer aided design (cad) program (see figure 6).

figure 6. picture of port cover design in tinkercad.

since each of hccc’s imac computers contained four type a ports, students would often attach other peripherals such as phone charge/sync cables or flash drives.
the technology associate thus designed a cover that would not hinder students who wanted to insert their usb flash drives or other devices, as is depicted in figures 7-10.

figure 7. closeup of port cover.
figure 8. alternate angle of port cover.
figure 9. picture of dongle with port cover installed.
figure 10. port cover allowed space to utilize other usb ports for flash drives, etc.

she then exported the tinkercad design as an stl (stereolithography) file, and printed prototypes on the makerbot replicator using pla filament. finding that her initial measurements did not quite fit, she adjusted the models one millimeter at a time, and reprinted them until the fit was secure and the dongles were covered. at this point, she printed enough covers for each mac, along with a few spares in case covers broke or wore out during normal operation.

results

at the beginning of the fall 2016 semester, the port covers were deployed to each of the library’s macs. during that semester, the technology associate monitored the effectiveness of the port covers. by the end of the semester, four port covers had disappeared, along with one dongle. at the beginning of the spring 2017 semester, the missing dongle was replaced, and replacement port covers were printed and deployed to the machines from which the port covers disappeared. again, the success of the port cover installation project was monitored. during this period, four port covers disappeared, along with two dongles. after the spring semester, the technology associate conferred with the director of library technology, and they decided that given the relatively low cost of the 3d-printer filament used to print the covers, and the greatly reduced receiver theft rate, this was an acceptable loss. they therefore decided to continue utilizing the port covers. but, in the fall 2017 semester, five covers and each of their corresponding dongles disappeared. then, during the spring 2018 semester, all of the port covers disappeared at least once, as did the associated dongles. in total, 20 dongles were lost during that semester. the director of library technology and the technology associate conferred once again, and decided that due to this increase in theft, and a concurrent change in the college’s purchasing process, the library would abandon the 3d-printed port cover experiment.

analysis

after two seemingly successful semesters, library staff were proud of the changes that resulted from deploying the port cover. yet given the reoccurrence of the theft pattern in subsequent semesters, they started to worry that printing new port covers was not a sustainable practice. to that end, the technology associate considered several theories as to what would cause the port covers to disappear. for instance, research by keizer, lindenberg, and steg found that acts of social disorder (such as graffiti or litter) will spread if not stopped promptly.7 under this framework, it could be suggested that the library was too slow to respond to missing covers, and thus permitted the loss of the dongles due to insufficient action or maintenance.
this theory seems logical since following an enrollment decline that began in fall 2016, a hiring freeze was instituted so as staff members left the institution, few positions were replaced. indeed, as of fall 2018, hccc’s staff is 75 percent part-time and part-timers are subject to renewal or dismissal every six months. in addition, many library employees are student workers, who often leave at graduation, and other part-time staff tend to find full-time employment or leave the library for full-time work at rates that may exceed other institutions who have more permanent staff. with limited staff resources, many of the library’s employees noted anecdotally that they were not able to give as much attention toward preventative maintenance on library computers as they had in prior semesters. therefore, they did not have time to proactively monitor equipment such as port covers and dongles. it is also possible that a novelty factor was at play. perhaps when the covers were first deployed, the brightly colored filaments stood out on the aluminum computers, making students more likely information technology and libraries | september 2019 101 to notice them and alter their behaviors accordingly. if this was the case, new students who began their coursework in subsequent semesters would not have known that port covers were an additional piece that had been added to the library’s computers in response to prior issues. following this speculation, the library’s patrons who removed port covers in fall 2017 and spring 2018 might have thought they were removing damaged or nonfunctional flash drives similarly to the students who brought what they believed were lost flash drives to library staff during the spring 2016 semester. finally, the difference in semesters could also have been due to random chance, in which case, no staff action could have affected the rate at which port covers disappeared. conclusion and future research being unsure of which of these analyses was most correct, the technology associate had planned to learn from the sudden resurgence in thefts in several ways. she planned to experiment with adding signage about the importance of dongles and the usage of port covers, and to interview student mac users to find out their perceptions about the port covers, as well as possible ideas and student-generated suggestions to prevent future thefts. she also considered designing and experimenting with printing more elaborate port covers to see if increased visibility or an elaborate shape would change theft rates. however, a complication arose during the 2018 and 2019 fiscal years. during this time, the college’s finance office changed its purchasing procedures. first, they eliminated the library’s technology budget, centralizing all technology purchases in a “pool,” whose total budget was uncertain. to make purchases from this pool, departments had to create detailed needs justification and obtain approvals from four high-level executives, in addition to the preexisting procedure of obtaining quotes and getting department head and vice president approval. while the library was eventually able to obtain funds from this process, navigating the pool process typically took about six months per purchase, which meant that, in effect, replacement dongles had to come from existing supplies. 
in addition, the supplies budget line, which was greatly reduced due to the enrollment decline, also came under increased scrutiny, and the purchasing department began to refuse to approve the purchase of batteries. while many of the mac keyboards were solar powered, and thus did not require batteries, all of their wireless mice, along with the wireless keyboards and mice on the windows pcs, required the use of either aa or aaa batteries. as battery supplies dwindled, the purchasing department did eventually agree to allow purchase of more batteries, under the condition that the library begin going through the pools process to purchase wired keyboards and mice. in the meantime, the technology associate continues to monitor wireless dongles, reprint port covers, and swap wired keyboards from the library’s spare parts inventory for wireless ones as dongles have disappeared. the creation of 3d-printed port covers was successful at preventing equipment loss at hccc for only two semesters before failing to fulfill that purpose. library staff speculated about the cause of this change but were unable to make that determination with certainty before budgetary changes caused the end of the 3d-printed port cover experiment. nevertheless, this project proved valuable to the library to better learn about 3d-printing technology, and to experiment with its practical uses in the library environment.

endnotes

1 filemon schoffer, “how expiring patents are ushering in the next generation of 3d printing,” techcrunch (blog), june 5, 2016, http://social.techcrunch.com/2016/05/15/how-expiring-patents-are-ushering-in-the-next-generation-of-3d-printing/.
2 christopher mims, “3d printing will explode in 2014, thanks to the expiration of key patents,” quartz (blog), july 21, 2013, https://qz.com/106483/3d-printing-will-explode-in-2014-thanks-to-the-expiration-of-key-patents/.
3 jason griffey, “absolutely fab-ulous,” library technology reports 48, no. 3 (april 2012): 21–24, https://journals.ala.org/index.php/ltr/article/view/4794.
4 caitlin bagley, “what is a makerspace? creativity in the library,” ala techsource, december 20, 2012, http://www.ala.org/tools/article/ala-techsource/what-makerspace-creativity-library; united for libraries, american library association office for information technology policy, and public library association, “progress in the making: an introduction to 3d printing and public policy,” september 2014, http://www.ala.org/advocacy/sites/ala.org.advocacy/files/content/advleg/pp/hometip-3d_printing_tipsheet_version_9_final.pdf.
5 hudson county community college, “fact book 2017-2018,” 2018, https://www.hccc.edu/uploadedfiles/pages/explore_hccc/visiting_hccc(1)/factbook-%20final%20web%20version.pdf.
6 adi robertson, “makerbot is replacing its most ill-fated 3d printing product,” the verge (blog), january 4, 2016, https://www.theverge.com/2016/1/4/10677740/new-makerbot-smart-extruder-plus-3d-printer-ces-2016.
7 kees keizer, siegwart lindenberg, and linda steg, “the spreading of disorder,” science 322, no. 5908 (2008): 1681–85.
electronic library for scientific journals: consortium project in brazil

rosaly favero krzyzanowski and rosane taruhn
information technology and libraries; jun 2000; 19, 2; pg. 61

making information available for the acquisition and transmission of human knowledge is the focal point of this paper, which describes the creation of a consortium for the university and research institute libraries in the state of sao paulo, brazil. through sharing and cooperation, the project will facilitate information access and minimize acquisition costs of international scientific periodicals, consequently increasing user satisfaction. to underscore the advantages of this procedure, the objectives, management, and implementation stages of the project are detailed, as submitted to the research support foundation of the state of sao paulo (fapesp).

production, organization, and acquisition of knowledge

in 1851, predicting the imminent growth in information, which in fact exploded in volume one hundred years later, joseph henri of the smithsonian institute voiced his opinion that the progress of mankind is based on research, study, and investigation, which generate wisdom, knowledge or, simply, information. he stated that for practically every item of interest there is some record of knowledge pertinent to it, “and unless this mass of information be properly arranged, and the means furnished by which its content may be ascertained, literature as well as science will be overwhelmed by their own unwieldy bulk. the pile will begin to totter under its own weight, and all the additions we may heap upon it will tend to add to the extension of the base, without increasing the elevation and dignity of the edifice.”1 at the threshold of the twenty-first century, these words become more self-evident by the day.
there are enormous archives of knowledge from which people extract parts, allowing them to advance and progress in science, technology, and the humanities. until some decades back, recovery from these archives was essentially a manual task consisting of written work and organization. today's technologies provide auxiliary tools to transmit this knowledge. although information is a cultural and social asset, it now is purchased at high prices. making these enormous archives available in a clear and organized manner by using the proper technology is currently the greatest challenge for all those involved in knowledge management-the production, organization, and transmission of information.

the advent and implications of electronic publications

among the major contributions of the industrial era, outstanding are the evolution and growth of information publishing and printing facilities that use tools to record, store, and distribute information. in the last ten years, the first steps were taken toward the storage and reproduction of sounds and images in new multimedia formats. technological advances also have brought new possibilities in accessing and disseminating information. electronic publishing has been particularly effective in accelerating access and contributing to the generation of additional knowledge; consequently, an exponential increase in data has taken place, most notably in the second half of the twentieth century. current journals numbered about 10,000 at the beginning of the century; by the year 2000 the number had reached an estimated 1 million.2 as a result, specialized literature has been warning about a possible crisis in the traditional system of scientific publications on paper. in addition to the difficulty of financing the publication of these works, the prices of subscriptions to scientific periodicals on paper have been rising every year. at times, this makes it impracticable to update collections in all libraries, which interferes substantially in development. on the other hand, access to electronic scientific publications via the internet is proving to be an alternative for maintaining these collections at lower cost. it also provides greater agility in publishing and distributing the periodical, and in the final user's accessing of the information. due to this, it is important that institutions that wish to support and promote research developed by their scientific communities facilitate access to these publications on electronic media. to paraphrase line, we can say that although publishers are still uncertain as to all the aspects of transmitting information electronically, because authors and institutions will be increasingly able to distribute their works on the web without the direct involvement of publishers, there is an escalation in electronic publications being published by scientific publishers.3

rosaly favero krzyzanowski is technical director of the integrated library system of the university of sao paulo-sibi/usp, brazil. rosane taruhn is director of the development and maintenance of holdings service of the technical department of the university of sao paulo-sibi/usp, brazil.
line also says that one of the reasons for the growth in the number of electronic publications is "that it is technically possible to make them [journals] accessible in this way, and in fact easy and cheap, since nearly all text goes through a digital version on the way to publication. secondly, journal publishers believe that electronic versions provide a second market in addition to that for their printed versions, or at least in an expanded market, since many users will be the same."4 it is important to point out that the scientific periodical, be it paper or electronic, must ensure market value and academic community receptivity, have a staff qualified for scientific publishing, be consistent in publishing release dates, comply with international standards, and use established distribution and sales mechanisms.5 line goes further: "electronic publication as an 'extra' to printed publication has few added costs of journal publication other than those of printing, and publishers are not going to want to make less money from electronic journals than they do from printed ones. while printed journals once acquired can be used and reused without extra cost, each access to an electronic article has to be paid for. and although the costs of storage and binding may be saved, these are offset by the costs of printing out."6 he then notes that this technology demands an active equipment and telecommunication infrastructure. another point he addresses is the need for users to master the search strategies required to efficiently recover information, thus reducing the time spent and costs. in turn, saunders points out that, depending on the contracts made with the publishers or their agents:

libraries, through their development, formation, and maintenance policies, should be receptive to this transition by accommodating the different means of communication to the different user needs and striving for a new balance. these policies should certainly stress the cooperation and sharing of remote access to the information demanded. budget estimates should, therefore, foresee, in addition to the subscriptions to electronic titles with complete texts, other possible items like licensing rates for multi-user remote access and the right to copy articles on electronic media to paper, depending on the contracts made with the publishers or their agents.7

electronic publication consortiums

catering to mutual interests by setting up a library consortium to select, acquire, maintain, and preserve electronic information is one means of reducing or sharing costs as well as expanding the universe of information available to users and ensuring a successful outcome. resources-physical, human, financial, and electronic-are combined for the common good; in this case, the consortium, as shown in figure 1, which was extracted and adapted from an oclc institute.8

figure 1. infrastructure resources for consortium formation.

the consortium presupposes invigoration of cooperative activities among member libraries by promoting the central administration of electronic publication databases as part of a shared library system visible to all and replete with access facilities.
in addition to putting in place simplified, reciprocal lending programs and spurring the cooperative development of collections and their storing, the consortium has the objective of implementing information distribution by electronic means, provided that copyright and fair use rights are complied with.9 on the other hand, "the research library community is committed to working with publishers and database producers to develop model agreements that deploy licenses that do not contract around fair use or other copyright provisions. in this way, one seeks to insure the library practices being disseminated, especially interlibrary lending."10 experience shows that acquiring publications through consortia has brought great benefits and has equally favored different size institutions that would not be able to afford single subscriptions, whether on paper or in electronic format. north american and european universities have been opting for this type of alliance to augment investment cost-benefit. important examples of these consortia currently operative are:

• washington research library consortium, washington, d.c., www.wric.org;
• university system of georgia, galileo project, www.galileo.peachnet.edu;
• committee on institutional cooperation, michigan, www.cedar.cic.net/cic; and
• ohio library and information network, ohio link, www.ohiolink.edu.

the electronic consortium in the state of sao paulo

considering that brazilian institutions also are being affected by the high cost of maintaining periodical collections and that alternative means of distributing this information are available, the model used abroad has shown itself as appropriate for developing the international scientific publications electronic library in the state of sao paulo. the location has a favorable information infrastructure available, particularly that of the electronic network of the academic network of sao paulo (ansp), thanks to the support of the research support foundation of the state of sao paulo (fapesp).11 growing user demand for direct, convenient access to information in the state of sao paulo also was a factor in location choice. the final decision was to compose the consortium of five sao paulo state universities-universidade de sao paulo (usp), universidade estadual paulista (unesp), universidade de campinas (unicamp), universidade federal de sao carlos (ufscar), and universidade federal de sao paulo (unifesp)-as well as the latin american and caribbean center for health science information (bireme). the consortium's goal was to make available to the member institutions' entire scientific community-10,492 faculty and researchers-rapid access to the complete, updated texts of the elsevier science scientific journals. this publishing house, an umbrella for north holland, pergamon press, butterworth-heinemann, and excerpta medica, presently publishes electronic versions of its journals. selection of the member institutions that would serve as a pilot group for this project was based on prior experience with the cooperative work in preparing the unibibli collective catalog cd-rom, which, using bireme/opas/oms technology, consolidates the collections of these three universities. the project was initially funded by fapesp; since its fourth edition the cd-rom has been published through funds provided by the universities themselves, by means of a signed agreement.
moreover, the choice of elsevier science, which could be justified solely by its premier ranking in the global publishing market, is also due to the fact that consortium member institutions maintain subscriptions to a great number (606) of this publishing house's titles on paper. already fully available on electronic media, these titles are components of a representative collection initiating the building of the international scientific publications electronic library in the state of sao paulo. furthermore, the majority of the titles are indexed on the institute for scientific information's web of science site, which has been at the disposal of researchers and libraries in the state of sao paulo since 1998. consortium objectives the consortium was formed to contribute to the development of research through the acquisition of electronic publications for the state of sao paulo's scientific community. using the ansp network, in addition to augmenting and speeding up access to current scientific information in all the member institutions, the consortium will:
• increase the cost-benefit per subscription;
• promote the rational use of funds;
• ensure continuous subscription to these periodicals;
• increase the universe of publications available to users through collection sharing;
• guarantee local storage of the information acquired and thus ensure the collection's maintenance and its continual use by present and future researchers; and
• develop the technical capabilities of the personnel of the state of sao paulo institutions in operating and using electronic publication databases.
initially, the project will not interfere in the current process of acquiring periodicals on paper and in distributing collections in member institutions. however, as electronic collection utilization becomes predominant, duplicate subscriptions on paper may be eliminated so as to allow new subscriptions to be available to the consortium at no additional cost. implementation of the electronic library for international scientific publications implementation of this project includes the following stages already achieved:
• constitution of the consortium by the six member institutions; and
• setup of an administrative board.
the following stages are in progress:
• purchase of hardware (central server) and management software; and
• estimate for the installation of the operating system.
the following stages are planned:
• training for qualified personnel and maintenance of the infrastructure built up;
• acquisition and implementation of the electronic library on the central server; and
• permanent utilization assessment.
[figure 2. reference database and full-text interconnectivity to optimize information access. the diagram links the bireme and fapesp servers, the web of science (8,000 titles) and current contents connect (ccc, 9,000 titles) reference databases, and the scielo (scientific electronic library online, 100 titles) and international scientific periodical electronic library (606 titles) full-text databases to the users in the consortium institutions.]
the pilot project proposes that the central server, for storage and availability of electronic scientific periodical collections on the ansp network, be located at fapesp in order to facilitate development of an electronic bank. in the future, the bank should, in addition to the collection in mind for the project, include international collections of other publishing houses: the scielo collection of brazilian scientific journals (project fapesp/bireme) as well as the web of science and current contents connect reference databases (see figure 2). consortium management the electronic library will be administered by the consortium's administrative board, made up of a general coordinator, an operations coordinator, and directors and coordinators of the library systems and central libraries of member institutions, as well as consultants recommended by fapesp. the administrative board shall be in charge of the implementation, operation, dissemination, and assessment of electronic library utilization. it also is charged with supervising qualified personnel training in order to guarantee the success of the project. an agreement was signed specifying the consortium's objective, its constitution, the manner in which it shall be executed, and the obligations established for consortium members. shortly, a contract to use elsevier science electronic publications shall be signed by fapesp and by the provider. the agreement's documents and use license were drawn up in compliance with the principles for licensing electronic resources recommended by the american library association, published in final version at the 1997 american library association annual conference.12 recovery system and information use evaluation research on electronic media suggests that use of a single software program that offers different strategies and forms of interacting for searching the collections requires an evaluation of the efficiency of individual research strategies. this evaluation is critical for preparation of guidelines that orient the choice of systems and proper training programs.13 for the electronic library, the challenge of measuring not only the amount of file use but also the efficacy and efficiency of its information access systems and training for its users is an imperative task. in the project described, evaluation shall be made by indicators that demonstrate use of the electronic library and of the collections on paper, per journal title, subject researched, user institution, number of accesses per day, and user satisfaction regarding service provided (interface, response time, text copies), among other factors to be studied. final remarks the way in which electronic media are read by users is a code far beyond the written, because sound and image are being added increasingly. in this first generation of electronic publications, fapesp supported the availability of web of science and of scielo and the creation of the international scientific publications electronic library in the state of sao paulo. the possible introduction of current contents connect will trigger an extraordinary leap in research development, facilitating access to scientific information and the acquisition and transmission of human knowledge as well as enhancing the cooperative and sharing enterprise of member libraries. references and notes
1. annual report of the board of regents of the smithsonian institution ... during the year 1851 (washington, d.c., 1852), 22.
2. leo wieers, "a vision of the library of the future," in developing the library of the future: the tilburg experience, h. geleijnse and c. grootaers, eds. (tilburg, the netherlands: tilburg univ., 1994), 1-11.
3. m. b. line, "the case for retaining printed lis journals," ifla journal 24, no. 1 (oct./nov. 1998): 15-19.
4. ibid.
5. r. f. krzyzanowski, "administracao de revistas cientificas," in reuniao anual da sociedade de pesquisa odontologica, aguas de sao pedro, 14, 1997. (lecture)
6. line, "the case for retaining printed lis journals."
7. l. m. saunders, "transforming acquisitions to support virtual libraries," information technology and libraries 14, no. 1 (mar. 1995): 41-46.
8. oclc institute, oclc institute seminar: information technology trends for the global library community, 1997, ohio (dublin, ohio: oclc institute/the andrew w. mellon foundation/fundacao getulio vargas/bibliodata library network, 1997).
9. a definition of fair use is the "legal use of information: permission to reproduce texts for the purposes of teaching, study, commentary or other specific social purposes." found in j. s. d. o'connor, "intellectual property: an association of research libraries statement of principles." accessed july 28, 1999, http://arl.cni.org/scomm/copyright/principles.html.
10. statement of current perspective and preferred practices for the selection and purchase of electronic information. icolc statement on electronic information. accessed july 2, 1998, www.library.yale.edu/consortia/statement.html.
11. r. f. krzyzanowski and others, biblioteca eletronica de publicacoes cientificas internacionais para as universidades e institutos de pesquisa do estado de sao paulo. sao paulo, 1998 (project presented to fapesp, fundacao de amparo a pesquisa do estado de sao paulo).
12. b. e. c. schottlaender, "the development of national principles to guide librarians in licensing electronic resources," library acquisitions: practice and theory 22, no. 1 (spring 1998): 49-54.
13. w. s. lang and m. grigsby, "statistics for measuring the efficiency of electronic information retrieval," journal of the american society for information science 47, no. 2 (feb. 1996): 159-66.
local hosting of faculty-created open education resources: launching pressbooks (communication) joseph letriz information technology and libraries | march 2022 https://doi.org/10.6017/ital.v41i1.13803 joseph letriz (jletriz@dbq.edu) is the electronic systems librarian, university of dubuque. © 2022. abstract rising costs of secondary education institutions, coupled with the inflated cost of textbooks, have forced students to make decisions on whether they can afford the primary materials for their classes. publishers working to supply digital access codes, which limit the ability of students to copy, print, or share the materials, or resell the textbook after the course is over, have further pushed students into forgoing purchasing materials. in recent years, institutions have moved to support oer (open education resources) initiatives to provide students a cost-free primary text or supplement to their materials. this allows students unfettered access to quality resources that help drive engagement in courses, from homework to discussions.
while larger institutions or in-state partnerships with resource sharing consortiums, such as the mnpals cooperation with the state of minnesota, provide access to platforms like pressbooks, smaller institutions and private colleges don’t always have the ability to negotiate these types of relationships. in this case study, i will cover the foundations necessary to start a low-cost, self-hosted solution to support faculty creation of oer material and the available resources that the university of dubuque utilized in their development process. this overview will briefly cover the skills and knowledge needed to support the growth of this initiative with minimal complexity and as little jargon as possible. introduction at the university of dubuque, the library installed, configured, and deployed an instance of pressbooks to support faculty development of open education resources (oer). the university of dubuque is a small, private university with a total full-time enrollment (fte) of about 2,000. two library personnel lead the deployment of the resource. as many universities find themselves grappling with an increase in textbook costs and other barriers to students’ access to quality information, libraries have emerged as a natural partner within institutions to identify, curate, and provide access to quality oer. okamoto points towards a variety of ways that libraries have managed this, including the community college consortium for open education resources (cccoer), which includes “150 member colleges … promot[ing] oer adoption to enhance teaching and learning.”1 braddlee and vanscoy state that librarians hold an important role in “supporting faculty and students in expanding the range of oer” through a number of methods, referencing prior research that okamoto performed.2 from a number of interviews from similarly sized liberal arts colleges, schleicher et al. state that librarians leading the initiatives for oer “may need technical skills … to assist faculty in developing oer projects.” 3 the benefits for students in terms of cost alone show that oer supported projects, such as the launch of pressbooks at the university of dubuque, has longstanding benefits for faculty, students, and the library.4 mailto:jletriz@dbq.edu information technology and libraries march 2022 local hosting of faculty-created open education resources | letriz 2 pressbooks is an open-source book content management system, making the software free for anyone to utilize, customize, and remix. with an open-source software as the basis for this project, the university of dubuque could view and change any of the underlying codebase to fit their exact needs to provide a platform for faculty to publish and develop oer content for their classes and students. the overlaying interface and configuration of pressbooks is built upon a fresh installation of the wordpress blog hosting system utilizing the multisite feature. these two systems are free to install, configure, and deploy on a locally hosted or cloud-based network. larger consortiums, which can consist of state level organizations, universities, and partnerships with businesses, may have the flexibility in spending to fund a hosted solution from the company itself. the cost of paying for a hosted solution can vary depending on the needs of the community served. 
a pressbooks edu single network plan, hosted by pressbooks, can cost $7,000 a year for the silver plan or $14,000 a year for the gold plan.5 at the university of dubuque, we opted for the low-cost solution of locally hosting our installation, which involved configuring the software locally and providing our own support for the faculty and students utilizing it. in this case study, i will detail how we successfully deployed the instance of pressbooks for the university of dubuque. this case study will cover the documentation used, the systems and services utilized to support the network, and the timeline from beginning the project to its successful launch. documentation to start the installation process, there needs to be a web server to host the pressbooks instance. at the university of dubuque, we used an already configured amazon web services (aws) account to set up the server that pressbooks would run on. aws offers a variety of tiers for its web server hosting, from the smallest available configuration that can be used for free to larger, more powerful instances for public access. at the university of dubuque, we opted for the aws t3a.large instance type, which gave us access to a faster server load for processing the installation and running the instance operations, as well as better network bandwidth.6 once we had the instance type selected, aws allowed for configuration of a variety of operating system (os) installations that come preconfigured or an à la carte option. we chose the same os platform that we utilize for our digital repository, a c-panel, centos 7 instance, as we already owned an educational software license for it. c-panel offers a reduced cost, education license available for any institutions with a .edu domain. the application to receive an educational license for the c-panel account takes little time to fill out and the only cost associated with the initial application is a $30 processing fee.7 once c-panel activated the education license on our primary platform, the license was utilized on the other instances without having to worry about multisite or platform license fees. aws launches the instance in the ec2 services page listed under the account, which details the instance’s setup, volumes attached to the instance that the software gets installed on (with additional volumes available to add onto the instance if necessary), and the ability to create snapshots of the instance for backups and restoration of the installation configuration. aws categorizes volumes as the primary storage devices for the installation, akin to a virtual hard drive, while the snapshots function as a copy of that storage device. during the configuration process, aws provides additional information about all of the options available in their ec2 service. as these are not directly relevant here, i will not go into detail about them. at the university of dubuque, we had preconfigured security groups and snapshot schedules set up that we applied to the pressbooks instance before we installed the underlying software. information technology and libraries march 2022 local hosting of faculty-created open education resources | letriz 3 the primary documentation used for the platform setup before installing the platform came from the pressbooks documentation site.8 the documentation begins by directing users to wordpress and their famous 5-minute install. 
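the article describes the ec2 provisioning entirely in terms of the aws console, but the same setup can be scripted. the sketch below is a minimal, hypothetical example in python with boto3 of launching a t3a.large instance with an attached ebs volume; the region, ami id, key pair, and security group id are placeholders and are not values from the university of dubuque deployment.

```python
# hypothetical sketch of provisioning the kind of ec2 instance described above;
# the region, ami id, key pair, and security group id are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",             # placeholder: a centos 7 ami id
    InstanceType="t3a.large",                    # instance type named in the article
    KeyName="pressbooks-admin",                  # placeholder key pair for ssh access
    SecurityGroupIds=["sg-0123456789abcdef0"],   # placeholder: preconfigured security group
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sda1",
        "Ebs": {"VolumeSize": 60, "VolumeType": "gp3"},  # primary volume for the install
    }],
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "pressbooks"}],
    }],
)

print(response["Instances"][0]["InstanceId"])
```

scripting the launch is optional; the same configuration can be made entirely through the aws console, as described above, and either way the wordpress and pressbooks installation proceeds on top of the running instance.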
wordpress documents are available on their website (https://wordpress.com/support/); installation directions include prompts that guide users through the entire setup and configuration process. once the wordpress installation process is complete, pressbooks can be set up on top of the wordpress site by following the installation directions from the pressbooks documentation. the beginning portion of configuring the wordpress site for pressbooks involves editing the configuration file for wordpress to allow for multisite setups on the single instance of wordpress. once the pressbooks site is installed, pressbooks will require additional plugins through wordpress in order for pressbooks to function correctly. again, the installation documentation for pressbooks walks through each of the necessary plugins, providing directions on how to configure the files for the installation to work correctly, how to start the configuration of users and appearance, and how to begin the creation of digital materials on the site. access to pressbooks can be set up through the installation itself, using plugins to link the installation to microsoft office accounts, google accounts, or any others that might be used. once this final step is completed (which will vary institution by institution depending on what service the institution utilizes for their primary authentication method ), the pressbooks site is ready to be utilized. there are two kinds of regular maintenance needed to keep the installation up to date. the first relates to the pressbooks and wordpress installations and updates, changes to configurations, additions, or deletions for the instance. most of these software updates, configuration changes, and plugin updates are handled through the pressbooks interface under the network manager administrator menu. since pressbooks is a layered software that’s built on the wordpress platform, all of the network configuration options use the same wordpress tools and user interface. the second kind of maintenance is done through a terminal command-line interface (cli) connection to the aws instance. this includes server maintenance tasks, which can be preconfigured through a script run on the server or handled by an administrator with sufficient knowledge of the system. the cli can also locate the error logs to pinpoint any errors that may have happened during setup and configuration. this maintenance can run on a monthly schedule, usually to ensure that web hosting software or internet access services are running correctly on the aws instance in addition to any server updates for the os and installed platform. at a smaller institution, the work on pressbooks can be handled by a librarian or professional staff member, as wordpress makes the procedure as simple as possible for anyone. the command-line interface work, if an os is installed without a user interface, can be handled by either a librarian familiar with terminal commands or a member of the institution’s help desk or information technology support personnel. any additional dependencies outside of what comes with wordpress are easily handled through the same network manager administrator menu. most installs include a number of default configuration options, such as uploading documents, printing from a pdf, or view functionalities. at the university of dubuque, any additional dependencies were all installed using wordpress and configured on pressbooks without any need to access the server directly. 
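as a concrete illustration of the monthly command-line maintenance mentioned above, the following is a minimal, hypothetical python script that could be scheduled with cron on the instance. the package manager (yum) and the service names (httpd, mysqld) are assumptions based on a cpanel/centos 7 server, not details taken from the article, and would need to be adjusted to the actual installation.

```python
# hypothetical monthly maintenance sketch for a centos 7 / cpanel instance;
# package manager and service names are assumptions, not from the article.
import subprocess
from datetime import datetime

LOG = "/var/log/pressbooks-maintenance.log"      # assumed log location (run as root)
SERVICES = ["httpd", "mysqld"]                   # assumed service names

def run(cmd):
    """run a shell command and return its combined output for the log."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}"

with open(LOG, "a") as log:
    log.write(f"--- maintenance run {datetime.now().isoformat()} ---\n")
    # apply os and package updates
    log.write(run(["yum", "-y", "update"]))
    # confirm that the web and database services are still active
    for service in SERVICES:
        log.write(run(["systemctl", "is-active", service]))
```

a crontab entry such as `0 3 1 * * /usr/bin/python3 /root/maintenance.py` (itself a hypothetical path) would run this on the first of each month; anything surfaced in the error logs would still be reviewed by hand, and the routine pressbooks work continues to happen in the wordpress interface without touching the server directly.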
for a smaller institution, this makes the process of approaching a self -hosted solution sustainable over time, as it does not require specialized knowledge of servers to handle pressbooks once it is installed. https://wordpress.com/support/ information technology and libraries march 2022 local hosting of faculty-created open education resources | letriz 4 working with faculty to add materials and address concerns when we launched pressbooks at the university of dubuque (http://pressbooks.dbq.edu) and wanted to showcase how using the platform would be advantageous, we worked with a geology professor who had already created his own textbook for his entry level geology course. the pdf he created was over 170 pages long and included all of the terminology, concepts, and example questions the students would see on the quizzes.9 we worked with the professor to get the original word document of his textbook, complete with his own layout structure, font, and headings, correctly formatted to import into pressbooks. the system manages the import process by utilizing very basic formatting of the document, identifying chapters based on the heading types.10 essentially, the library staff worked with the professor to sanitize the document of all unnecessary formatting, laid out the primary chapter headings in the document using the word heading formats that are supported, and then processed the document through the pressbooks tools for importing. with the primary example uploaded and ready to showcase to the faculty members, our library director began fielding the requests of other faculty members at the university of dubuque.11 the current process for working with faculty involves sending any interested faculty the list of required reading that pressbooks has hosted on their website. this includes materials related to creating the content directly in pressbooks, importing the content from a word document if authors already have something they want to use, and setting up an account as an author on pressbooks. in addition to the geology professor mentioned above, two additional faculty members in vastly different departments, computer information science and philosophy, used our pressbooks instance to curate their materials for their students. as the instance is built on a wordpress multisite installation, library staff are able to install and configure a variety of additional material for faculty—enabling practice quizzes, the list of glossary terms to study, and other material—either through the native pressbooks interface or with the assistance of opensource plugins such as h5p, a plugin that allows community-created videos, presentations, quizzes, and interactive content to be created, shared, and reused. all of these additional configuration options, including adding the additional tools for faculty, are handled through the network manager administrator menu. faculty with questions or needs for assistance can reach out to library personnel directly through email or by setting up a teams or zoom call to walk through the problem they might have or express a need that they can assist with. looking back/reflections throughout the process, the university of dubuque’s work came to fruition through the efforts of one librarian focused on the application and server side management and a library worker who was familiar with mysql query language and data management. this partnership proved invaluable in working with the nuances of configuring the sql database to the necessary specifications. 
for any institution looking to have an uncustomized database, the wordpress installation configuration options work without any additional knowledge or customization necessary. the library’s access to the aws instance from the campus needed involvement by the campus information technology department’s help desk to approve the ip address on file for the dns configuration. in simpler terms, once the library set up the aws instance with an elastic ip address (the term amazon uses on aws to refer to their ranges of ip addresses) and configured the domain information on pressbooks through the installation, the library provided that information to the help desk and they updated the necessary documents and certificates. the last http://pressbooks.dbq.edu/ information technology and libraries march 2022 local hosting of faculty-created open education resources | letriz 5 piece of the process, inviting users to utilize installation, required the most patience and is an ongoing process. in setting up the accounts locally, for more restricted access, pressbooks provides only temporary account status to any created user accounts. this means if a faculty member has an account created for them by the institution in july but doesn’t attempt to sign in to the account until september, pressbooks will not hold onto that information in the sql database. after a default period of three days, which is customizable through the wordpress configuration options, the username is not retained by the system and the new account creation process has to begin again. there are options to link the installation to a single sign-on system, such as microsoft’s adfs or a program such as shibboleth or google apps. directions for setting up this configuration option are also available as part of pressbooks documentation on their website. at the university of dubuque, having a small fte allows for more time to work closely with a faculty member throughout the oer creation process, as the faculty are more flexible with their time. the current process of creating accounts as needed, on an individual basis, wor ks well when handling limited requests. larger institutions that would utilize this method of configuration might find it easier to streamline the request through a single sign-on system, an authentication method that is automated through an administrator or pressbooks or another program. additional needs after the rollout of the pressbooks site to the campus community, we encountered additional needs for our instance that weren’t configured as part of the base installation of the site. for faculty members registered for an account on the site, pressbooks would allow their account to have basic user access to the features necessary to start creating their oer. however, this did not allow for the usage of a majority of the features that pressbooks offers. part of this disconnect stemmed from the way the accounts were created on our multisite instance. accounts created need to be manually added and confirmed as an existing account on the pressbooks site as an author in order to allow access to the full suite of options available for the oer creation. the other hang-up in access for faculty came from the way pressbooks handles email for new registration, password change information, or any type of communication. prior to june 2020, developers were able to simply connect a wordpress site, or other sites, to gmail using a simple authentication of their account using their username and password. 
in june 2020, google required users who want to utilize gmail to send emails from a new site, or in this case a locally created instance of a wordpress multisite for pressbooks, to authenticate their account information by authorizing the site through a google developer api, paying for access to the plugin that would allow for configuration of gmail, outlook, or other email providers, or rely on the site maintainers to configure their own email services through the server itself.12 if it is built into the budget of the university to purchase and subscribe to a service provided by a plugin owner, that option works without additional server configuration. our institution, however, was limited in its payment options and was unable to utilize standard forms of payment required by the plugin providers. as such, we are manually reviewing the registration requests for the site and creating accounts on pressbooks on an as-needed basis. information technology and libraries march 2022 local hosting of faculty-created open education resources | letriz 6 concluding thoughts the university of dubuque’s initial introduction to pressbooks came from attending the library technical conference 2019, held at macalester college in minnesota. while there, representatives from the mnpals consortium walked through the work done between the university of minnesota and the state library system to integrate their instance of pressbooks throughout public library systems and university systems.13 the work done at our institution is at a significantly smaller scale, only being utilized by faculty members at the university and members of the university community, including adjunct professors and professional staff. while work on a consortium level can proceed quickly, as there are multiple parties involved in the creation of the resource, we at the university of dubuque had a small number of people immediately working on the project. the discussion between the two personnel in the library handling the system work and the director of the library took the longest amount of time, followed by a couple of months between contact with pressbooks about pricing, hosting through them, and conversations at the state level attempting to gauge interest from additional parties to partner with. initial conversations at local state conferences, with the larger public institution librarians participating in discussions, didn’t evolve into an actionable plan. from there, the planning for the setup of the aws instance to install wordpress and pressbooks took a month to set up. another two weeks were spent working with the mysql database to customize it to the university’s needs and upload the instructor book used as the pilot upload. from start to finish, seven months passed to the launch and rollout of the product. since the launch of the platform, work has started on identifying faculty who would benefit from using pressbooks, with surveys across the institution to glean insight into what faculty are currently doing, how they and their students can benefit from this, and all the steps involved. with the work done at the university of dubuque, operating as a private, smaller university allowed for more flexibility in our adoption of technology, a more focused approach to introducing new systems to the university at large, and a less bureaucratic approach to seeking approval. in the library, we recognized that we were in a unique position to begin this development and implementation rapidly for the university and took advantage of that. 
endnotes
1 karen okamoto, “making higher education more affordable, one course reading at a time: academic libraries as key advocates for open access textbooks and educational resources,” public services quarterly 9, no. 4 (2013): 4.
2 dr. braddlee and amy vanscoy, “bridging the chasm: faculty support roles for academic librarians in the adoption of open educational resources,” college & research libraries (may 2019): 429.
3 caitlin a. schleicher, christopher a. barnes, and ronald a. joslin, “oer initiatives at liberal arts colleges: building support at three small, private institutions,” journal of librarianship and scholarly communication 8 (2020): 16.
4 jennifer snoek-brown, dale coleman, and candice watkins, “from spark to flame, lighting the way for sustainable student oer advocacy framework at a community college,” scholarly communication 82, no. 8 (2021): 2.
5 pressbooksedu, pressbooksedu plans q3 2019, received july 26, 2019, adobe pdf.
6 “amazon ec2 t3 instances,” amazon, last modified september 14, 2021, https://aws.amazon.com/ec2/instance-types/t3/.
7 “educational license application,” cpanel, accessed march 14, 2022, https://input.cpanel.net/s3/edu.
8 “installation,” pressbooks documentation, pressbooks, last modified february 23, 2022, https://docs.pressbooks.org/installation/.
9 dale easley, “the story of the earth,” dale easley, september 1, 2021, http://www.daleeasley.com/resources/physical/geomain.pdf.
10 “import from a word document,” pressbooks user guide, accessed march 14, 2022, https://guide.pressbooks.com/chapter/bring-your-content-into-pressbooks/#chapter-156-section-3.
11 dale easley, the story of earth (dubuque: university of dubuque pressbooks), http://pressbooks.dbq.edu/storyoftheearth/.
12 “how to upgrade to oauth2 security for existing google/gmail accounts,” postbox, accessed september 1, 2021, https://support.postbox-inc.com/hc/en-us/articles/218446767-how-to-upgrade-to-oauth2-security-for-existing-google-gmail-accounts.
13 “about the minnesota libraries publishing project,” minnesota libraries publishing project, accessed september 1, 2021, https://mlpp.pressbooks.pub/about-the-minnesota-library-publishing-project/.
a library website redesign in the time of covid: a chronological case study (communication) erin rushton and bern mulligan information technology and libraries | december 2022 https://doi.org/10.6017/ital.v41i4.15101 erin rushton (erushton@binghamton.edu) is head of digital initiatives and resource discovery, binghamton university libraries. bern mulligan (mulligan@binghamton.edu) is associate librarian emeritus, binghamton university libraries. © 2022. he found a glimmer of hope in the ruins of disaster…. gabriel garcía márquez, love in the time of cholera abstract in november 2019, binghamton university libraries initiated a website redesign project. our goal was to create a user-centered, data-informed website with refreshed content and upgraded functionality. originally, our redesign plan included in-person card-sorting activities, focus groups, and usability studies, but when the libraries went remote in march 2020 due to the covid-19 pandemic, we had to quickly reassess and adapt our processes and workflows. in this article, we will discuss how we completed this significant project remotely by relying on effective project management, communication, teamwork, and flexibility. introduction website redesign projects can be daunting, even during normal circumstances. this article will outline how we accomplished a website redesign project in a reasonable timeframe during the unprecedented circumstances of the covid-19 pandemic. binghamton university is part of the state university of new york (suny) system. founded in 1946, it has an enrollment of about 18,000 graduate and undergraduate students. binghamton university libraries are an important part of the university, serving as the center of the university’s intellectual community. our website is the libraries’ most important tool for “scaling up” services to our users. it is as important as the physical library and became even more so with the digital demands imposed by covid, particularly the importance of access to streaming video during the pandemic. a truism in website redesign is that your current website is never more popular than when you take it down. however, our redesigned website was successfully launched to general approbation and is considered a functional and cosmetic improvement over the old website. we were pleasantly surprised how little negative feedback we received. people just started using it, which may be the highest compliment paid to a design team.
as we will highlight throughout the article, we believe that the success of this project was the result of the following: • a dedicated/functional web team, • the ability to meet frequently and on a moment’s notice, • the ability to focus almost exclusively on the project; and • effective project management. mailto:erushton@binghamton.edu_ mailto:mulligan@binghamton.edu information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 2 premise for the redesign the last time the libraries had fully redesigned the website was in 2013. that project was significant because we had migrated from a locally hosted site to the university’s web content management system, omniupdate. highlights of the 2013 redesign included a fresh look and feel with binghamton university colors, updated site architecture and navigation, and expanded search options from the home page. up until 2013, we had typically redesigned our website on a five-year cycle. as such, we began discussing the possibility of another redesign in mid-2018. we also thought it was an opportune time to redesign the website because the libraries had a new dean, we were offering new services and initiatives, and we had become more mindful of accessibility issues and responsive design. the web team the libraries’ web team is the group that leads website redesign projects and maintains the website. the team generally meets as needed between redesign projects. in anticipation of the planned 2019–2020 redesign, however, we began scheduling more regular meetings. our preferred meeting times were friday mornings. one of the prerequisites for these meetings was to take turns bringing bagels and juice. we quickly got to know what kind of bagel each of us preferred. in november 2019, the team consisted of ben coury, bern mulligan, erin rushton (chair), and dave vose. except for ben, the other members of the web team had participated in several past library website redesigns. ben had recently been hired as the libraries’ digital web designer. he brought high-end programming skills to the team and expertise and knowledge about user experience and accessibility which were integral to the success of the project. it was an advantage to have a small, dedicated, and agile team that collaborated and communicated well. this positive chemistry or esprit de corps among members allowed us to debate any controversial issues professionally, not personally. we internalized the team’s mission and worked single-mindedly toward its successful completion. binghamton university is a google campus, which meant we had access to the full suite of google apps (e.g., gmail, google calendar, google drive, google groups, etc.). at the beginning of the redesign, we had already begun to use google to create, share, and archive our committee documents. project timeline this is the timeline for the project, which took ten months to complete. it occurred basically in two phases: before covid and after covid. much of the work that occurred before covid was completed in person. all of what occurred after covid was done virtually: via email, phone, and, as was the case with most organizations during the pandemic, via zoom. 
• november 2019: planning phase • december 2019–march 11, 2020: in-person meetings with library constituent groups • march 12, 2020: meeting with student advisory group • march 17, 2020: covid shutdown • april 15, 2020: meeting with communications and marketing information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 3 • april–june 2020: website architecture and content review • july 2020: creating templates, migrating content, and designing the home page • august 2020: final meetings and details • august 19, 2020: successful launch of the new website when planning a website redesign project, it is understood that there will be a lot of meetings. as the pace of the project accelerates and as the deadline approaches, so does the frequency of meetings. during the ten months of this project, we met 64 times: 16 in-person meetings from november 2012 to march 12, 2020; 25 virtual meetings from march 17 to the end of june; and 23 virtual meetings in july and august. during july and august until launch, the other aspects of our jobs took a back seat to the proj ect; it’s pretty much all we worked on. we sometimes met twice a day, and that’s not counting the incidental phone calls between individual members for questions about sticking points in the process. november 2019: planning phase the libraries’ user interface steering committee (uisc) has oversight for the various public library interfaces, including the website. the committee consists of the web team members and representatives from different departments in the libraries, including public services, technic al services, and special collections. the uisc helped us establish the goals and objectives for the redesign project, gave us feedback about ideas during the redesign process, and monitored our ongoing progress. there were four goals that became apparent almost immediately. over the years, many website development requests had been postponed, so our first goal was to accommodate these improvements and enhancements. and since university communications and marketing, the unit on campus responsible for the entire university web domain, had updated templates since our last redesign, our second goal was to utilize the new templates. our third goal was to create a usercentered, data-informed website. finally, our fourth goal was to address accessibility issues on the website and make it easier for users to navigate. december 2019–march 11, 2020: in-person meetings with library constituent groups once the goals had been established, we began scheduling meetings with a variety of library constituent groups. these included preservation, reader services, special collections, and dean’s council. it was important for us to get input from these groups because we wanted everyone to feel represented by the project since the website is the gateway for many of our services and resources. we also wanted the redesign process to be transparent so that everyone knew what was happening. at each meeting, we discussed how the website was working for their area and what improvements they wanted to see. we also provided a snapshot of the website for each area which showed current usage statistics (see fig. 1). information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 4 figure 1. screenshot of a website inventory spreadsheet for special collections. 
march 12, 2020: meeting with student advisory group we met with the student library advisory committee (slac) on march 12. this group offered open, two-way communication between the libraries and the student population and provided a mechanism for the libraries to solicit feedback on specific issues as needed. as an incentive for students attending this meeting, we provided free pizza and snacks. at this meeting, we asked the students a variety of questions, including why they used the website and what they liked and disliked about it. some of the answers we received were surprising to us. for one thing, few of the students began their research from our website. while they had used the website, they were not particularly familiar with most of its features. a second consensus of the group was that if they wanted to know something about the libraries (e.g., our hours), they just googled it. finally, although our ask a librarian service was linked from several places on the website, none of the students in the group had ever noticed it and were unaware of the service. these revelations further informed what we wanted to address in the redesign. march 17, 2020: covid shutdown on march 17, the university abruptly closed in-person services due to the growing pandemic. for a few weeks after the shutdown, the priority for all university employees was transitioning to remote work and providing virtual services. once we felt ready to resume the project, we had to reassess how we would continue it remotely. there were two significant challenges: how to continue our committee work as a distributed team and how to continue gathering user feedback given that no one was physically on campus. we discovered that transitioning into a distributed team worked well for the project. we no longer had to reserve meeting spaces and set up laptops and projectors. instead, we could quickly organize zoom meetings, sometimes on the fly, when we had something we wanted to discuss. and all our committee files were already on google drive. as a result, we were better able to focus almost exclusively on the project and were less impacted by the distractions that often occur in the office environment. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 5 unfortunately, we were unable to conduct any of the focus groups with teaching faculty or the “guerrilla” usability testing with students that we had planned. however, we were glad to have had the in-person meetings with some of our constituents before the work-from-home phase because we had established a good foundation of what we needed to accomplish in the redesign. april 15, 2020: meeting with communications and marketing one of the first groups we met with virtually was university communications and marketing (c&m). as mentioned above, this is the unit on campus that is responsible for omniupdate, the university’s website platform. the purpose of this meeting was to discuss our plans and timeline and to clarify the role of c&m in the project. although we now had our own web designer in ben, we knew that c&m had oversight over the entire university web presence and would have to decide whether our redesign fit in with the rest of the domain in terms of its appearance and accessibility. april-june 2020: website architecture and content review the next significant part of the project was reviewing our current pages. 
during preliminary planning the year before, we had a student list all pages of the libraries’ website in a google spreadsheet which also included google analytics from the three previous years. we literally spent hours poring over this document (see fig. 2). it helped us identify which pages would migrate mostly as is, which pages would need additional review, which pages would be converted to libguides and vice versa, and which pages would be deleted. figure 2. screenshot of a spreadsheet listing all library web pages. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 6 originally, we had planned to do a physical card sorting like we had done in past redesigns. we had even created the cards and had scheduled an all-day card-sorting event, complete with pizza. but covid changed all that. since we were meeting virtually, we had to think of another way to accomplish this. a breakthrough occurred when ben introduced trello, an online collaboration and project management tool that allowed us to work together on the new website’s architecture in real time via zoom (see fig. 3). figure 3. screenshot of a trello board. we recreated the six existing main navigation categories and created “cards” for each web page. trello made it easy to move these virtual cards around into the different categories. we had received feedback that six categories were perhaps too many for users to choose from. we spent a number of meetings in may discussing/debating what the new navigation categories should be and where pages fit under them. we decided to fold the locations & collections and special collections sections into about, and search & find into research, because we felt these changes made more logical and functional sense. we added a top-level link to my account because this was also something that users had suggested should be more prominent. another aspect of the libraries’ website redesign project was the content of libguides. initially, they were meant to be subject guides, but over the years some of our web pages were converted into libguides. as part of the redesign, we worked closely with the collections and instruction department to make decisions about where content should be located. most libguides are now subject or research guides while descriptions of libraries’ services are web pages. july 2020: creating templates, migrating content, and designing the home page by the end of june, we had accomplished the following: • met with available constituencies, • identified what content we needed to migrate, • identified what content needed to be created; and • had a new website architecture. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 7 at this point, we felt we needed to focus on the actual migration of content and the design of the home page. on july 2, we met virtually with collections and instruction to discuss how the search box on the home page would look and function. it was no mean feat coming to consensus on this. we were on a tight schedule, but we wanted to be sure that we heard everyone’s ideas. taking all feedback into consideration, ben then had the ultimate challenge of coding and designing a functioning search box. for the last three weeks of july, we met daily, and sometimes twice a day, as we began the daunting process of migrating all the lower-level pages. we were definitely feeling the pressure to complete the project on time. 
since we were using the new university templates, designing the lower-level pages was relatively straightforward. ben had customized the template and provided us with a migration guide. the guide included instructions on how to create and format new pages. this allowed the other members of the web team to migrate content while he focused on the more complicated aspects of the project. all migrated content was reviewed to ensure that it was current. for some pages, this required input from several departments. to facilitate the updating, we copied the content of every page we were migrating into a google doc to allow for collaborative editing. once the content was updated, it would be copied and pasted into the new template. the screenshot in figure 4 represents the redesign for most of the lower-level pages. figure 4. screenshot of a new lower-level page. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 8 the most significant changes were the relocation of the navigation column from left to right and some new features that were included in omniupdate, including the ability to add contact information. we had to do some formatting, such as adding headings and hyperlinks, and some metadata work, such as creating page descriptions and keywords. we also had to pick a photo for the banner of each page. we had planned to have the university photographer take new photos, but because there were no students and hardly any staff on campus, we had to rely on pictures of our spaces he had already taken. we were tracking the migration on a spreadsheet (see fig. 5). this document contained the new architecture of the website and had links to the google docs mentioned previously. the spreadsheet also noted who was responsible for reviewing the content and the status of each page. figure 5. screenshot of the spreadsheet used to track the migration of webpages. while this mass migration was taking place, ben focused on creating the pages that required additional coding and customization such as the ask a librarian page, the staff directory, and library tutorials. he also worked on the design of the home page. one of the tools ben used to help with the mock-up was adobe xd (see fig. 6). information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 9 figure 6. screenshot of initial design of home page in adobe xd. adobe xd is a design tool in the adobe suite used for prototyping user interfaces. he created an interactive wireframe for the home page and landing pages. this allowed us to discuss the interface and make changes without a lot of time spent on mock-ups. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 10 august 2020: final meetings and details we met with the user interface steering committee on august 5 to get their input on the redesign. on august 7, we met with communications and marketing again to go over our design and get final approval for our home page. we also previewed the site to library people at an all-staff meeting later that day. the final week before site launch was dedicated to the special collections pages that had embedded links for finding aids and to working on the ask a librarian page. and the last day was spent going through and making sure everything was in order before launch. 
august 19, 2020: successful launch of the new website we launched the redesigned website on august 19 (see fig. 7). some of its new features include • a large “hero” photo which can easily be changed depending on what the libraries want to promote; • a redesigned search box; • popular links featured on the home page; • a news section which pulls from our blog software and allows for automatic updates; and • a prominent, visually attractive place on the home page for special collections since it was bumped from the top navigation bar. figure 7. partial screenshot of the new library home page. information technology and libraries december 2022 a library website redesign in the time of covid | rushton and mulligan 11 conclusion the process that we had initially envisioned did not work out because of the covid-19 pandemic. although we did meet with our student advisory group, we never got to hold any focus groups with teaching faculty or usability studies with students. but thanks to zoom and other online tools, we were still able to gather some user feedback. we also had a new website ready before fall 2020. while there was certainly unanticipated stress in tackling a project like this in the middle of a pandemic, we felt that working remotely in some ways helped us to be more productive. we were better able to focus almost exclusively on this project and were less impacted by the distractions that often occur in the office environment. we also felt that the quick adoption of zoom made us more agile about scheduling and holding meetings. despite some of the challenges that we faced throughout the project, the redesigned website is a success. since the launch, we have made a few minor changes to the overall architecture of the site. the most significant change was adding a giving link to the navigation menu at the req uest of our dean and the binghamton university foundation. our library website is never static, as we continue to update our home page with news and events and change our hero banner to reflect the priorities of the libraries. while we have no plans for another major redesign in the near future, we are open to making changes and improvements as needed. bibliography becker, danielle a., and lauren yannotta. “modeling a library web site redesign process: developing a user-centered web site through usability testing.” information technology and libraries 32, no. 1 (2013): 6–22. buell, jesi, and mark sandford. “from dreamweaver to drupal: a university library website case study.” information technology and libraries 37, no. 2 (2018): 118–26. wu, jin, and janis f. brown. “website redesign: a case study.” medical reference services quarterly 35, no. 2 (2016): 158–74. zhu, candice. “website makeover: transforming the user experience starting from scratch.” computers in libraries 41, no. 6 (2021): 21–6. 
text analysis and visualization research on the hetu dangse during the qing dynasty of china
zhiyu wang, jingyu wu, guang yu, and zhiping song
information technology and libraries | september 2021
https://doi.org/10.6017/ital.v40i3.13279
zhiyu wang (mikemike248@gmail.com) is phd candidate, school of management, harbin institute of technology and associate professor, school of history, liaoning university. jingyu wu (734665532@qq.com) is graduate student, school of history, liaoning university. guang yu (yug@hit.edu.cn) is professor, school of management, harbin institute of technology. zhiping song (1367123893@qq.com) is graduate student, school of history, liaoning university. © 2021.

abstract
in traditional historical research, interpreting historical documents subjectively and manually causes problems such as one-sided understanding, selective analysis, and one-way knowledge connection. in this study, we aim to use machine learning to automatically analyze and explore historical documents from a text analysis and visualization perspective. this technology addresses the problem of analyzing large-scale historical data that is difficult for humans to read and intuitively understand. we use the historical documents of the qing dynasty hetu dangse, preserved in the archives of liaoning province, as data analysis samples. china's hetu dangse is the largest qing dynasty thematic archive with manchu and chinese characters in the world. through word frequency analysis, correlation analysis, co-word clustering, the word2vec model, and svm (support vector machine) algorithms, we visualize historical documents, reveal the relationships between the functions of government departments in the shengjing area of the qing dynasty, achieve the automatic classification of historical archives, improve the efficient use of historical materials, and build connections between historical knowledge. through this, archivists can be guided practically in the management and compilation of historical materials.

introduction
china has a long history documented in numerous archives. at present, various local archive departments preserve large numbers of historical documents from different periods. owing to the development of china's archive digitization, archive management departments at all levels have established digital archive abstracts, catalogs, and subject indexes of the historical documents in their collections, realizing online retrieval of historical archives. with in-depth research on chinese history, simple catalog retrieval cannot satisfy researchers' demand for related knowledge in historical archives. owing to the limitations of the catalog retrieval system, complex catalog data still need to be read manually.
however, it is difficult to view the overall picture of the recorded content and impossible to easily distinguish important information in historical materials; this leads to various difficulties, such as the compilation of historical materials for chinese historical researchers. thus, in this study, we aim to use text analysis and visualization methods in machine learning to conduct data mining analysis of historical document data. these methods will help us discover the logical relationships of historical records and their purposes, accomplish visual presentations of historical entities and knowledge discovered in historiography, improve knowledge representation and automatic classification of historical data, and provide valuable information for historical archive researchers. mailto:mikemike248@gmail.com mailto:734665532@qq.com mailto:yug@hit.edu.cn mailto:1367123893@qq.com information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 2 during the process of analyzing traditional manual methods for interpreting historical documents, we find the following phenomena: macro description, single angle, selective analysis, and one-way knowledge connection, among others. for example, the hetu dangse preserved in the liaoning archives contains a total of 1,149 volumes and 127,000 pages, making it difficult to fully grasp and understand the overall content of such documents. relying on manual reading and analysis of entire archives is an unrealistic task. therefore, this paper proposes using machine learning, natural language processing (nlp), and other technologies to address various problems from traditional manual reading. first, information from historical documents can be revealed from different angles, and this allows the content of the documents to be displayed more comprehensively and scientifically through visual charts. second, use of objective quantitative analysis methods, such as text analysis and nlp, prevents subjective interpretations of the same content. third, nlp and other technologies can solve the problem of calculating massive text training data sets while forming systematic knowledge that avoids the omission and one-sided understanding of knowledge in the historical archive. the application of machine learning in historical data analysis has attracted the attention of researchers in management, history, and computer science. tao used the latent dirichlet allocation (lda) topic modeling algorithm to analyze the themes of documents from 1700 to 1800 included in the german archives, providing a more three-dimensional interpretation and explanation of the spiritual world of germany during the eighteenth century.1 chinese scholars kaixu et al. proposed a method of automatic sentence punctuation based on conditional random fields in ancient chinese.2 this method was proved to better solve the problem of automatic punctuation processing compared with the single-layer conditional random field strategy in ancient chinese as tested on the two corpora of the analects and records of the grand historian. swiss and south african scholars stauffer, fischer, and riesen, and chinese scholars wu, wang, and ma used the kws technology and deep reinforcement learning to automatically recognize handwritten pictures in historical documents.3 solar and radovan used the national and university library of slovenia’s historical pictures and maps as research data. 
using gis technology, they created a novel display method, and interdisciplinary data resource web application to access and research the data.4 chinese scholars dong et al. and polish scholars kuna and kowalski used the webgis technology to conduct efficient management and visualization research on historical data of natural disasters in ancient china and russia. 5 meanwhile, latvian scholars ivanovs and varfolomeyev and dutch scholars schreiber et al. used web technology to develop a web service platform and explored the intelligent environment of cultural heritage service utilization.6 korean scholars kim et al. used machine learning technology to determine the complex relationships between tasks of various classes in a specific historical period through the network of historical figures.7 judging from results in related fields, the semantic analysis and visualization of historical archives in an intelligent way are gradually moving from statistical description to knowledge mining. these results provide theoretical feasibility and practical technical experience for this study. at present, research on historical documents mainly focuses on the retrieval and utilization of historical material databases. since the words, semantics, grammar, and sentence patterns recorded in historical materials differ from modern texts, using data mining technologies such as machine learning and nlp to intelligently identify historical documents and organize historical data will help us more than traditional methods. this requires the cooperation of artificial intelligence and historical researchers to establish an effective method of historical big data information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 3 analysis to achieve the transformation from traditional manual historical document analysis to automatic artificial intelligence analysis methods. in this paper, we use machine learning and data visualization as a tool to identify differently the content of the historical documents from traditional literature reading, reveal valuable information in the content of historical documents, and promote more systematic, efficient, and detailed understanding of the literature. related technology definition to perform text analysis and visualization of the hetu dangse, we use machine learning technology such as word vector processing, the svm (support vector machines) model and network analysis. word vector is a numerical vector representation of a word’s literal and implicit meaning.8 we segmented the hetu dangse’s catalog data and used the word2vec model to transform the segmented data’s word vector form into a set of 50-dimensional numerical vectors representing a catalog’s vector data set. to accurately visualize historical document records’ relationship features, we reduced the vector data set’s dimensionality. dimensionality reduction, or dimension reduction, is data’s transformation from a highinto a low-dimensional space so that the representation retains some of the original data’s meaningful properties, ideally close to its intrinsic dimension.9 after dimensionality reduction, each catalog data in the vector data set is reduced from 50 to 2 dimensions to facilitate flat display. we used the svm model and network analysis technology to analyze the vector data set. 
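to make the dimensionality-reduction step concrete, the following python sketch reduces a set of 50-dimensional record vectors to two dimensions with scikit-learn's t-sne implementation. it is only an illustration of the idea described above: the array of catalog vectors is a random placeholder, and the parameters shown are defaults rather than the authors' settings (their reported t-sne parameters appear later in the results section).

```python
# illustrative sketch (not the authors' code): reduce 50-dimensional catalog
# vectors to 2 dimensions so the records can be plotted on a flat map.
import numpy as np
from sklearn.manifold import TSNE

# placeholder: suppose each of 300 catalog records has already been turned
# into a 50-dimensional vector built from its word2vec word vectors
catalog_vectors = np.random.rand(300, 50)

tsne = TSNE(n_components=2, init="pca", random_state=42)
points_2d = tsne.fit_transform(catalog_vectors)

print(points_2d.shape)  # (300, 2): one (x, y) coordinate per record
```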
the svm model is a set of supervised learning methods used for classification, regression, and outlier detection.10 it is given a vector data set as training to represent historical document records as points in space, and learns independently through the kernel algorithm. using the algorithm, it maps the separated new records to the same space, and predicts their category based on which side of the interval they fall. network analysis techniques derive from network theory, a computer science system demonstrating social networks’ powerful influences. network analysis technology’s characteristics determine that it is suitable for books and historical archives’ visualization in the library and information science field, because the visualization technique involves mapping entities’ relationships based on the symmetry or asymmetry of their relative proximity.11 thus, it helps to discover historical documents’ knowledge relevance. for example, citation network analysis can identify emerging relationships in healthcare domain journals.12 sample data preprocessing and classification this study uses the catalog of the qing dynasty historical archives from the hetu dangse collected by the liaoning archives as the research sample to conduct text analysis and visualization research. china’s hetu dangse is the largest qing dynasty thematic archive with manchu and chinese characters both in domestic and international. the hetu dangse is the official document of communication between shengjing general yamen, the wubu of shengjing and fengtian office, and the document communicated between the beijing internal affairs office in charge and the liubu of beijing during the qing dynasty. the hetu dangse was published from 2015 to 2018, including the hetu dangse·kangxi period (56 volumes), hetu dangse·yongzheng period (30 volumes), hetu dangse·qianlong period (24 volumes), hetu dangse·qianlong period (17 volumes), hetu dangse·daoguang period (52 volumes), hetu dangse·jiaqing period (58 volumes), hetu dangse·qianlong period official documents (46 volumes), hetu dangse·qianlong period official documents (46 volumes), and hetu dangse·general list (16 volumes).13 the hetu dangse is an information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 4 important document for studying the history of the qing dynasty. owing to the special status of shengjing in the qing dynasty, it has a unique historical significance as the companion capital of beijing and the hometown of the qing royal family. this provides original evidence from this time for studying politics, economy, culture, history, and natural ecology in northeast china. in this study, we preprocess the catalog data of the hetu dangse by performing text segmentation, creating a corpus, and labeling data before using text analysis and visualization technology to analyze the catalog data of hetu dangse. first, we use word frequency analysis and statistics to study the functions of institutions. second, we use the co-word clustering algorithm to quantify and visualize the institutional relationships. finally, we use the svm model to automatically classify and explore the catalog data of the hetu dangse. figure 1 illustrates this process. figure 1. text analysis flowchart. 
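as a concrete preview of the preprocessing flow in figure 1, the sketch below shows one way to segment a catalog title and drop stop words in python with jieba, one of the segmentation tools named in the next section. the user-dictionary file name is hypothetical, the stop words are the examples given in the data cleaning section, and the sample title is reassembled from the corpus excerpt in table 1; this is not the authors' actual code.

```python
# illustrative preprocessing sketch (not the authors' code): segment a catalog
# title with jieba and remove stop words. the user dictionary is a hypothetical
# file of domain terms so institution names are kept as single tokens.
import jieba

jieba.load_userdict("hetudangse_userdict.txt")  # hypothetical custom dictionary

stop_words = {"为", "请", "之"}  # examples of low-content words removed in data cleaning

def segment(title):
    """segment a catalog title and drop stop words."""
    return [w for w in jieba.lcut(title) if w.strip() and w not in stop_words]

# sample title reassembled from the corpus excerpt in table 1
print(segment("盛京掌关防佐领为缉拿逃人舒廷官事咨盛京刑部"))
```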
data preparation and preprocessing we collected 95,680 catalog data items in the hetu dangse of the liaoning archives, including 25,148 items from the kangxi period; 1,096 items from the yongzheng period; 23,819 items from the qianlong period; 20,730 items from the jiaqing period; and 15,887 items from the daoguang period. the content of each catalog data includes three parts: title information, time of publication (chinese lunar calendar), and responsible agency. the proportion for each period was not evenly distributed in the catalog data of the hetu dangse with the kangxi period catalog data having the highest proportion (26.2%). through the catalog data information, we can perform an in -depth analysis of the content of the hetu dangse from the three perspectives: institutional functions, institutional relationships, and topic classification. data cleaning as the text recorded in the archives of the hetu dangse are manchu and ancient chinese, using chinese word segmentation tools (jieba, snownlp, thulac, etc.) based on modern chinese will cause errors. therefore, it is necessary to construct a special text corpus for word segmentation. first, we construct a stop vocabulary list to remove words with little impact on semantics in the hetu dangse, such as for (为), please (请) and of (之). second, we use the word segmentation tools mentioned above for preliminary word segmentation and then perform part-of-speech tagging and word segmentation corrections based on the word segmentation results. the title part of the catalog data of the hetu dangse mainly contains three dimensions of information: the record title of the catalog, issuing institution, and receiving institution. accordingly, we set a total of four types of tags in the text corpus: issuing institution, receiving institution, record type, and keywords. the receiving institution and the issuing institution correspond to the institutions at the beginning and the end of the catalog, respectively, such as the words shengjing zhangguan fang zuoling, and shengjing ministry of justice. the record type is the front word of the receiving institution, such as counseling (咨) and please (请). the keywords are words that can represent the overall semantics information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 5 in the record title of the catalog, such as arrest (缉拿) and advance (进送). table 1 presents the corpus we developed. table 1. hetu dangse corpus num word property1 property 2 1 盛京掌关防佐领 organization noun 2 为 stop_words preposition 3 缉拿 keywords verb 4 逃人 keywords noun 5 舒廷 name noun 6 官事 stop_words noun 7 咨 keywords verb 8 盛京刑部 organization noun 9 正白旗佐领 organization noun 10 兆麟 name noun 11 呈 stop_words preposition 12 为 stop_words preposition 13 交纳 keywords verb 14 壮丁 keywords noun 15 银两事 keywords noun ┋ ┋ ┋ ┋ 61047 收讫事 keywords noun 61048 盛京佐领 organization noun label data to improve the utilization efficiency of the hetu dangse and show the document content information from multiple angles, we use a supervised machine learning method to automatically classify the catalog data of the hetu dangse. therefore, the original catalog data set must be labeled. we determine the classification and label of the hetu dangse catalog according to the chinese archives classification law, chapter 12. table 2 presents the 11 categories of the catalog. 
with this, we complete the hetu dangse catalog sampling classification and labeling laying the foundation for automatic catalog classification. the hetu dangse has a total of 95,680 catalog records involving five periods: kangxi, yongzheng, qianlong, jiaqing, and daoguang. we randomly select 500 records from each period and manually label these 2,500 records as the sample data set. the data classification after manual labeling is shown in figure 2. the overall distribution is relatively even, making it suitable for machine learning processing. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 6 table 2. data labels num category 1 type of official documents (政务种类) 2 palace, royal family and eight banners affairs(宫 廷、皇族及八旗事务) 3 bureaucracy, officials(职官、吏役) 4 military(军事) 5 politics and law(政法) 6 sino-foreign relations(中外关系) 7 culture, education, health and scientific cultural study(文化、教育、卫生及科学文化研究) 8 finance(财政) 9 agriculture, water conservancy, animal husbandry (农业、水利、畜牧业) 10 building(建筑) 11 transportation, post and telecommunication(交 通、邮电) figure 2. percentage of the hetu dangse catalog data label chart. results in this study, we used the catalog data of the hetu dangse as a sample to analyze and reveal the hetu dangse catalog data from three perspectives: institutional function, institutional relationship, and automatic classification. this will improve usage efficiency of the hetu dangse, thus improving information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 7 researchers’ mastery of relevant information about the document. to achieve the functional requirements of text analysis, we adopted four methods: word vector conversion, word frequency analysis, co-word clustering, and the svm model. word vector conversion of text catalog data the automatic classification of machine-learning technology is based on vector data sets. thus, the hetu dangse text catalog data set must be vectorized before automatic classification. currently, word vector conversion technology mainly includes methods such as one-hot, word2vec, and glove. hetu dangse records the history of the qing dynasty for more than 200 years. there are inevitable relationships among the contents recorded in the documents, indicating that they are not isolated from each other. the word2vec model provides an efficient implementation of cbow and skip-gram architectures for computing vector representations of words, both of which are simple neural network models with one hidden layer. the word2vec model produces word vectors as outputs from inputting the text corpus. this method generates a vocabulary from the input words and then learns the word vectors via backpropagation and stochastic gradient descent.14 this makes the word2vec model more suitable for catalog data from hetu dangse. word2vec includes the cbow model and the skip-gram model, which can enrich the semantic relevance depending on the context, and it is more suitable for the semantic relevance of historical documents such as the hetu dangse. therefore, we adopt the skip-gram model to analyze the catalog data of hetu dangse. we extracted the features of word vectors in catalog data from the corpus, input them into the word2vec model, imported the gensim library in python, trained the vector embeddings, and obtained the htd.model.bin vector file and htd.text.model model file. 
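a minimal sketch of that training step with gensim is shown below. only the skip-gram setting, the 50-dimensional vector size, and the two output file names come from the article; the token lists and the remaining parameters are illustrative placeholders.

```python
# condensed sketch of the training step described above (not the authors' code),
# assuming `segmented_titles` is a list of token lists from the segmentation step.
from gensim.models import Word2Vec

segmented_titles = [
    ["盛京掌关防佐领", "缉拿", "逃人", "咨", "盛京刑部"],
    ["正白旗佐领", "交纳", "壮丁", "银两事"],
]  # in practice: one token list per catalog record

model = Word2Vec(
    sentences=segmented_titles,
    vector_size=50,   # the 50-dimensional vectors mentioned earlier
    sg=1,             # 1 = skip-gram architecture
    window=5,
    min_count=1,
    workers=4,
)

model.save("htd.text.model")
model.wv.save_word2vec_format("htd.model.bin", binary=True)

# query the most similar words to a given term, as in the example that follows
print(model.wv.most_similar("缉拿", topn=3))
```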
the correlation between each word in the hetu dangse catalog can be found by implementing the model. for example, if the word bannerman (旗人) is input into the model, the most relevant words are minren (民人, with 0.84726 relevance), accused (被控, with 0.812017), and robbery (抢劫, with 0.795359). to visualize the ethnic relationships recorded in the hetu dangse catalog, we input the first 300 words of the word vector into the trained word2vec model and performed dimensionality reduction to realize a planar graph. to understand the structure of the data intuitively, we used the t-sne algorithm to reduce the dimensions of the word vectors. t-sne is a type of nonlinear dimensionality reduction used to ensure that similar data points in high-dimensional space are as close as possible in low-dimensional space. we set the embedded space dimension parameter of t-sne to 2 and the initialization parameter to pca. this makes it more globally stable than random initialization. the maximum number of optimization iterations is 5,000. figure 3 presents the results.

in figure 3, the terms sanling, yongling, zhaoling, prime minister, and fuling form clusters. in shengjing, the qing set up the sanling prime minister's office, and the prime minister's mausoleum affairs minister was appointed concurrently by general shengjing. near fujinmen, the sanling prime minister's office was established. in the 30th year of guangxu, the government office was changed to the prime minister's office of shengjing mausoleum affairs, and the governor of the three provinces concurrently served. under the sanling prime minister's office, the sanling office was set up to undertake the sacrifice and repair affairs of the three tombs (xinbin yongling, shenyang fuling, and zhaoling).15 therefore, the clustering in figure 3 verifies the close relationship between the sanling prime minister's office and the tombs.

figure 3. 2d t-sne visualization of word2vec vectors.

analysis of the relationship between the documents received and sent of the institution
with the statistics of the text data obtained after word segmentation, we can find the quantitative relationship between the documents received and sent by the institution, using the pearson correlation coefficient to judge whether there is a correlation between the number of documents received and the number of documents sent by the same institution.

ρ(r,s) = cov(R,S) / (σ_r σ_s)   (3.1)

we suppose that the pearson correlation coefficient between the number of documents received and the number of documents sent is ρ(r,s), with r = {r1, r2, r3, ..., r11}. here, r is the variable set of documents received from the institutional sample. set s = {s1, s2, s3, ..., s11} is the variable set of documents sent by the institutional sample. by dividing the covariance of r and s by the product of their respective standard deviations, we can obtain the value of the correlation coefficient of the documents sent and received by the same institution.

mining the relationship between institutions' sending and receiving documents based on co-word clustering
to mine the relationship between the institutions' sending and receiving documents, we adopt a co-word clustering algorithm to generate a visualized network map of institutional relationships. the global co-occurrence rate represents the probability of two words appearing together in all the data sets.
in large-scale data sets, if two words often appear together in the text, these two words are considered to be strongly related to the semantics.16 clustering is a method that places objects into a group by similarity or dissimilarity. thus, keywords with high correlation to each other tend to be placed in the same cluster. social network analysis, which evaluates the unique structure of interrelationships among individuals, has been extensively used in social science, psychological science, management science, and scientometrics. 17 we can obtain a sociogram from the institutional function analysis. the main purpose of the sociogram is to provide information information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 9 about the relationship between institutions’ sending and receiving documents. in the sociogram, each member of a network is described by a “vertex” or “node.” vertices represent high-frequency words, and the sizes of the nodes indicate the occurrence frequency. the smaller the size of a node, the lower the occurrence frequency. lines depict the relationships between two institutions. they exist between two keywords, indicating that they received or sent documents to each other. the thickness is proportional to the correlation between the keywords. the thicker the line between the two keywords, the stronger the connection. using this rationale, the map visualization and network characteristics (centrality, density, core-periphery structure, strategic diagram, and network chart) were obtained by analyzing pearson’s correlation matrix or other similarity matrices.18 in this study, we conducted network analysis on a binary matrix to display the relationships between the documents sent and received by the institutions in the shengjing area during the qing dynasty recorded in the hetu dangse. further, we extracted the receiving institution and issuing institution from each record of catalog data in the hetu dangse, and then we composed a new data set with the following data from the receiving institution: issuing institution and title content. we used python to convert the new data set to endnote format and import it into vosviewer1.6.15 to calculate and draw a visual map of the new data set. van eck and waltman of the netherlands’ leiden university developed vosviewer, a metrological analysis software used for constructing and visualizing network graphs.19 although the software’s development principle is based on documents’ co-citation principles, it can be applied to the construction of data network knowledge graphs in various fields. combined with the co -word clustering algorithm, we can create an entity connection network map for historical documents through vosviewer software to reflect the recorded content. automatic classification method of historical archives catalog based on the svm model we used the svm model in machine learning for automatic classification. the svm model has the advantages of strong generalization, low error rate, strong learning ability, and support for small sample data sets, making it suitable for historical archive catalog data samples with small sample characteristics. therefore, we attempted to classify the catalog data set of hetu dangse using the svm model. first, we divided the vectorized labeled data set into a training set and a testing set. the training set accounts for 70% of the data, and the testing set accounts for 30%. 
to ensure the accuracy of the model prediction, we adopted a random division method to avoid overfitting. second, we used a linear kernel in the svm model and grid search to find the best parameter. various combinations of the penalty coefficient (c) and gamma parameter in the svm model were tested based on their accuracy ranked from high to low. we then determined the best parameter combination. after the model was established, we validated the predictive performance of the model from multiple perspectives such as precision, recall, and f1 score to ensure the generalization ability and availability of the model. we set the penalty coefficients to 10, 100, 200, and 300, while the gamma parameters are set to 0.1, 0.25, 0.5, and 0.75. we used the precision evaluation criteria to find the optimal parameter combination of the model and then imported them. the penalty coefficient is set to the x-axis, the gamma parameter set to the y-axis, and the precision set to the z-axis. we implemented the model to obtain the visualization that is shown in figure 4. clearly, the optimal parameter combination is a penalty coefficient of 10 and a gamma parameter of 0.075. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 10 figure 4. svm grid search parameter tuning diagram. discussion the history of a nation is the foundation on which it is built. historical documents are the witnesses and recorders of history. through the study of historical documents, we can go back to the past, cherish the present, and look forward to the future. an increasing number of scholars have studied these documents in recent years due to their importance. the hetu dangse records the document communications between institutions in shengjing (now shenyang) and beijing during the qing dynasty. it is an important historical document that cannot be ignored when information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 11 studying the history of northeast china during the qing dynasty. here, we use the catalog data of the hetu dangse as the sample data to test the machine learning methods previously mentioned. we explore the results from the perspectives of institutional function, institutional relationship, and automatic classification to determine the feasibility of our methods. functions of institutions the number of institutions involved in the hetu dangse is over 150. these functional departments formed the governance system of the shengjing area during the qing dynasty. to gain a deeper understanding of the qing dynasty’s ruling system in the shengjing area, the functions of these institutions should be examined. this study analyzes and studies the functions of the institutions in the shengjing area through the number of documents and the frequency of content of the sending and receiving institutions. analysis of the number of documents received and sent by institutions by sorting and statistically analyzing the catalog data of hetu dangse, we obtained data on the number of documents received and sent by institutions in the shengjing area recorded in the hetu dangse. we set the vertical axis as the total number of communicated documents, number of issued documents, and number of received documents. we set the horizontal axis as the names of the institutions and then drew a histogram. 
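a rough sketch of that counting and plotting step is given below, assuming the catalog has been loaded into a pandas dataframe; the file and column names are hypothetical, and the pearson calculation mirrors equation 3.1 for whatever institutions are chosen as the sample.

```python
# illustrative sketch (not the authors' code): count documents sent and received
# per institution, plot the top institutions, and compute the pearson
# correlation between the two counts (cf. equation 3.1).
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

catalog = pd.read_csv("hetudangse_catalog.csv")  # hypothetical file and columns

sent = catalog["issuing_institution"].value_counts()
received = catalog["receiving_institution"].value_counts()

counts = pd.DataFrame({"sent": sent, "received": received}).fillna(0)
counts["total"] = counts["sent"] + counts["received"]
top10 = counts.sort_values("total", ascending=False).head(10)

# institutions on the horizontal axis, document counts on the vertical axis
top10[["total", "sent", "received"]].plot(kind="bar")
plt.tight_layout()
plt.savefig("documents_by_institution.png")

# correlation between numbers of documents sent and received by the same institutions
rho, _ = pearsonr(counts["sent"], counts["received"])
print(round(rho, 2))
```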
this study analyzes the number of institutional archives of the hetu dangse catalog from three perspectives: total number of sent and received documents, number of received documents, and number of issued documents to find the institutions with the highest research value in the shengjing area. in the histogram shown in figure 5(a), the top three institutions in total number of communicated documents are shengjing internal affairs office, shengjing zuoling, and shengjing ministry of revenue. we can also observe that the top 10 institutions have different volumes of their respective documents received and sent by institutions. therefore, the ranking of the total number of communicated documents is not directly related to the respective rankings of the number of documents received and the number of documents sent. in figure 5(b), we can observe that the top three institutions in number of documents received in the hetu dangse are shengjing internal affairs office, shengjing ministry of revenue, and shengjing general yamen. figure 5(c) shows the top three institutions in number of documents sent in the hetu dangse are shengjing internal affairs office, shengjing zuoling, and shengjing general yamen. the total number of communicated documents, number of documents sent, and number of documents received by the shengjing internal affairs office all rank first; this indicates that the shengjing internal affairs office is the most important department of the ruling system in the qing dynasty during the shengjing area. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 12 figure 5. number of documents received and sent by institutions. a b c information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 13 by using the number of documents received and sent by the institutions, we calculated the pearson correlation coefficient to determine if the number of documents received and sent by the same institution is relevant. as institutional samples, we selected the shengjing internal affairs office, shengjing ministry of revenue, (beijing) internal affairs office in charge, shengjing zuoling, shengjing ministry of works, shengjing ministry of justice, shengjing general yamen, shengjing close defense zuoling, shengjing ministry of war, fengtian general yamen, and shengjing ministry of rites. through calculation, the result of pearson correlation coefficient is 0.69 (save two decimal places), so there is a correlation between the number of sent and received documents, as shown in figure 6. figure 6. scatter plot of pearson correlation coefficient. the hetu dangse is a copy of official documents dealing with the royal affairs of the shengjing internal affairs office during the qing dynasty. it contains the official documents between the shengjing internal affairs office and the beijing internal affairs office in charge, the liubu, etc. and the local shengjing general yamen, fengtian office, the wubu of shengjing, and other yamens.16 thus, there exist a large stock of documents with the shengjing internal affairs office as the sending and receiving agency. the wubu of shengjing, shengjing general yamen, shengjing zuoling, and other institutions are important hubs for the operation of institutions in shengjing. they played an important role in maintaining and stabilizing the society of shengjing. 
the number of documents is second in importance only to the shengjing internal affairs office. analysis of the frequency of documents received and sent by institutions to further explore the functions of institutions with research value, we extracted the contents of the catalogs from the top three institutions in total number of documents sent and received: shengjing internal affairs office, shengjing ministry of revenue, and shengjing zuoling. we then classified the catalogs of the aforementioned institutions according to receipts and postings. subsequently, we used word segmentation and word frequency statistics to process the two types information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 14 of catalog information and draw comparison diagrams to explore their specific functions in the hetu dangse. as shown in figure 7, we can roughly divide the obtained segmentation words into two categories. one is the name of the communicated official document institutions, such as the ministry of revenue, the ministry of justice, and the ministry of rites on the side of the word frequency (see fig. 7[a]). the other is the name of the official document content and the words zhuangtou (庄头), dimu (地亩), and zhuangding (壮丁) on the side of the frequency of the words in the documents sent. through a comparative analysis of the top 10 words received and sent by the same institution, we conclude that the institutions with a close relationship between receiving and sending documents are not the same. for example, the ministry of revenue of shengjing internal affairs office ranks first in the frequency of documents sent by institutions, while the shengjing zuoling ranks first for receiving institutions (see fig. 7[b]). the contents of documents sent and received by the same institution are different. figure 7(c) shows how the affairs sent by shengjing zuoling to ula (乌拉), forage (粮草), and license (执照) differ from those represented by the zhuangtou (庄头), accounting (会计), and close defense (关防) in the frequency of documents sent and frequency of receipts, respectively. based on previous research on the functions of shengjing’s institutions, the shengjing internal affairs office was set up in the companion capital of shengjing during the qing dynasty to be in charge of shengjing cemetery, sacrifice, organization of staff transfer, and other matters. 20 this relates to the meaning of words such as sacrifice (祭祀) in figure 7(a). the functions of the shengjing ministry of revenue were represented in guangxu’s great qing huidian. the cashiers in charge of taxation in shengjing, number of annual losses in official villages, and banner land were carefully recorded. the expenditures were distinguished and the accounting obeyed the regulations according to the beijing ministry of revenue at the end of the year.21 this is related to the meaning of words, such as dimu (地亩), land sale (卖地), and money and grain (钱粮) in figure 7(b). 
in fu yonggong and guan jialu’s research of shengjing zuoling’s functions, shengjing zuoling handled the transfer communicated documents; supervised and urged the various departments of guangchu, duyu, zhangyi, accounting, construction, and qingfeng to undertake matters; managed officials and various people; maintained the shengjing palace and the warehouse; selected women to send to beijing inspect; heard all types of cases; undertook the emperor’s general letter; managed the ula people and tributes; and accepted the emperor or the internal affairs office in charge, among other tasks.22 this is connected to the meaning of words such as ula (乌拉), close defense (关防) and license (执照) in figure 7(c). information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 15 figure 7. word frequency comparison of documents received (in blue) and sent (in orange) by institutions. a b c information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 16 institutional relationship analysis to further study the governance structure of the shengjing area, we not only need to understand the functions of each institution but also explore the overlap between functions of institutions. the catalog data of the hetu dangse consist of three parts: receiving institutions, issuing institutions, and record title of the catalog. a document often includes two institutions, th e receiving institution and the issuing institution, and it is certain that the content of a document relates closely to the functions between the two institutions. by observing the closeness between the number of institutions through visualizations, we conducted a quantitative analysis of consistent catalog data of the receiving and issuing institutions in the hetu dangse to provide reliable data for further research in the intersection of institutional functions in shengjing area. results of institutional connection analysis using the co-word clustering algorithm, we counted the number of archive catalog data consistent with the receiving and issuing institutions. we set the vertical axis as the issuing institution and the horizontal axis as the receiving institution to obtain figure 8. the numbers inside the boxes represent the quantity of catalog data that are consistent with the issuing institution. to facilitate measurements in the statistical process, records less than or equal to 50 communicated documents between the receiving institution and the issuing institution have been zeroed out. as shown in figure 8, the institutions having close relations with the documents recorded in the hetu dangse are concentrated in the issuing institutions shengjing zuoling and shengjing internal affairs office, and the receiving institutions shengjing internal affairs office and shengjing zuoling. among the receiving institutions, the number of documents received by the shengjing internal affairs office from shengjing general yamen reached as high as 11,936. the top three documents received by shengjing zuoling were fengtian general yamen (2,265 pieces), shengjing ministry of revenue (1,527 pieces), and shengjing ministry of justice (1,520 pieces). it is worth noting that there are less than 50 documents from shengjing zuoling in the shengjing internal affairs office. 
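the pair counting behind figure 8, and the network map built from it in the next section, can be sketched in python as follows. the authors used vosviewer for the actual map, so the networkx graph here is only a stand-in, and the file and column names are hypothetical.

```python
# illustrative sketch (not the authors' code): count catalog records per
# issuing/receiving institution pair and build a small weighted network.
import pandas as pd
import networkx as nx

catalog = pd.read_csv("hetudangse_catalog.csv")  # hypothetical file and columns

# issuing institutions on one axis, receiving institutions on the other
pairs = pd.crosstab(catalog["issuing_institution"], catalog["receiving_institution"])

# zero out weak links (50 or fewer communicated documents), as in figure 8
pairs = pairs.where(pairs > 50, 0)

# turn the matrix into a weighted network of sending/receiving relationships
graph = nx.DiGraph()
for issuer, row in pairs.iterrows():
    for receiver, count in row.items():
        if count > 0:
            graph.add_edge(issuer, receiver, weight=int(count))

print(graph.number_of_nodes(), graph.number_of_edges())
```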
the overlapping functions of the institutions in the shengjing area enabled individual offices to play bureaucratic games, passing responsibility to other offices, leading to low efficiency in handling affairs. for example, the military and political power in the shengjing area was jointly controlled by the shengjing general office and the shengjing ministry of war. the shengjing area’s tax power was controlled by the shengjing ministry of revenue and fengtian office and their subordinate offices. this phenomenon ran through the entire qing dynasty. research on the cr ossfunctionality of institutions has always been a hot topic in qing historiography. by analyzing the official documents between the institutional functions, we can further explore the overlap as well as the advantages and disadvantages of the qing dynasty shengjing ruling system to study the history of shengjing institutions in the qing dynasty more thoroughly providing a reference for the design of current institutions. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 17 figure 8. relationship of communicated documents by the hetu dangse institutions diagram. visualization of institutional network map we used the hetu dangse catalog as sample data and the co-word clustering algorithm to obtain the close relationship between institutions and the appearance frequency of institutions. we drew a visual network diagram by virtue of vosviewer1.6.15 to obtain figure 9. in figure 9, institutions are represented by default as a circle with their names. the size of the label and the circle of an institution are determined by the weight of the item. the higher the weight of an item, the larger the label and the circle of the item. for some items, labels may not be displayed to avoid overlapping labels. the color of an institution is determined by the cluster the institutions belong to, and lines between items represent links. as shown in figure 9, the relationships between the institutions and departments in the hetu dangse form three core groups: the shengjing internal affairs office (in charge), shengjing zuoling, and beijing internal affairs office in charge. however, the relationships between the three groups are not similar; the distance between the group (beijing) internal affairs office in charge and the two other groups is relatively large. the group at the core of shengjing internal affairs office and the group at the core of shengjing zuoling are closely connected to each other through the wubu of information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 18 shengjing (shengjing ministry of revenue, shengjing ministry of rites, shengjing ministry of war, shengjing ministry of justice, and shengjing ministry of works). further, there are two larger individuals: fengtian general yamen and shengjing general yamen. fengtian general yamen and shengjing zuoling are closely related to each other, and the relationship between shengjing general yamen and shengjing internal affairs office is relatively close. figure 9. co-occurrence of institutions network map. the city of shengjing was the companion capital of the qing dynasty. the qing government implemented special governance measures in these areas that differed greatly from those of direct inland provinces.23 to ensure the stable rule of the shengjing area, the qing dynasty performed the following tasks. 
first, the qing dynasty set up a general garrison as the highest military and political chief in the shengjing area to be responsible for all military and political affairs within its jurisdiction. second, they established the fengtian office, a capital of the same level as the shuntian office, to rule the common people of the shengjing area. the states and counties, as well as the garrison banner officer, which was under the rule of general garrison, were local administrative institutions under the fengtian office. these institutions implemented the dual management rule of the bannerman and common people. third, as the companion capital, the shengjing area followed the ming dynasty companion capital system to set up the wubu of shengjing to maintain power. in addition, the shengjing internal affairs office, which was in charge of palace affairs, communicated with the beijing internal affairs office in charge. information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 19 results of automatic classification analysis catalogs are important information resources in the field of historical archives. the classification of archival catalogs can not only link relevant information in archives or archive fonds, improve researchers’ utilization efficiency, and save time to search for required archives, but it can also be shown to readers in clusters. as the hetu dangse catalog is a series of historical documents stored for a long period of time, its original classification system does not suit well existing archival management methods. the hetu dangse has a total of 1,149 volumes and 127,000 pages. each volume contains a different number of documents and the ink characters on chinese art paper are in manchu and chinese. reading and categorizing the full text of the hetu dangse not only requires a lot of manpower, material, and financial resources but also extremely high requirements for the classified staff. they need to possess a good knowledge of manchu, archival science, document taxonomy, and other related disciplines. therefore, sorting and organizing the content of the hetu dangse is an impractical task that relies on manual reading and comprehension. to address this problem, we used the svm model of machine learning to automatically classify and explore the catalog data of the hetu dangse. this model further demonstrates the relevance of the knowledge between documents in the hetu dangse and facilitates an in-depth analysis. we imported the vectorized labeled data set into the svm model and selected the optimal parameter combination to run the model. to visualize the data results, the 50-dimensional word vector is reduced to a 2-dimensional word vector using the t-distributed random neighborhood embedding algorithm. we used the svm model to establish a hyperplane visualized in 2dimensional form. the legend only in figure 10 shows the data distribution of the six categories with the highest proportion owing to the large number of categorized data. to test the classification effect of the svm model, we used precision and recall as metrics and calculated the f1 score to validate the model. the results are presented in table 3. based on the created svm model, 95,680 catalog data of the hetu dangse were predicted and classified. the results are shown in figure 11. 
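the classification workflow described above can be condensed into the following scikit-learn sketch. the 70/30 split, the candidate values for the penalty coefficient c and for gamma, and the precision/recall/f1 validation come from the article; the input arrays are placeholders, and because gamma has no effect on a linear kernel in scikit-learn, the sketch uses the default rbf kernel as an illustrative stand-in.

```python
# condensed sketch of the classification workflow (not the authors' code):
# 70/30 split, grid search over C and gamma, and precision/recall/F1 validation.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score

vectors = np.load("catalog_vectors.npy")  # hypothetical: one 50-dim vector per record
labels = np.load("catalog_labels.npy")    # hypothetical: one of the 11 category labels

x_train, x_test, y_train, y_test = train_test_split(
    vectors, labels, test_size=0.30, random_state=42
)

param_grid = {"C": [10, 100, 200, 300], "gamma": [0.1, 0.25, 0.5, 0.75]}
search = GridSearchCV(SVC(), param_grid, scoring="precision_macro", cv=5)
search.fit(x_train, y_train)

predicted = search.predict(x_test)
print(search.best_params_)
print(precision_score(y_test, predicted, average="macro"))
print(recall_score(y_test, predicted, average="macro"))
print(f1_score(y_test, predicted, average="macro"))
```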
although there are certain deficiencies in accuracy and other aspects, the model has a positive impact on the content research, management, utilization, and retrieval and discovery of the hetu dangse.

table 3. svm model validation
parameter | result
precision | 0.736
recall | 0.717
f1 score | 0.716

figure 10. svm decision region boundary.
figure 11. hetu dangse catalog data prediction classification.

conclusion
in this study, we used machine learning to analyze and visualize the catalog data of the hetu dangse, revealing the functional relationships of the qing dynasty shengjing regional institutions recorded in this historical document and showing the communication relationships between institutions. using the svm model, we achieved automatic classification of the hetu dangse catalog from the category perspective. owing to the massive archives of historical materials in ancient china, the fonts of many historical materials cannot be recognized by computers or humans. the digitization of catalogs has become a digital bridge between researchers and historical documents. this not only achieves a concise summary and refinement of the documents but also greatly improves their utilization efficiency for researchers. the svm model can "learn" from the labeled sample data and realize automatic classification of large amounts of unlabeled catalog data. through automatic classification of catalog data, historical researchers and archive managers can use and manage large numbers of historical documents and catalog data more effectively, greatly increasing their utilization. the co-occurrence algorithm can reveal the patterns written into the catalog data itself, discover the distances between catalog records, and form clusters, providing a clearer direction for researchers using historical documents. the algorithm also saves researchers the time spent identifying documents without purpose and makes the presentation of historical documents' content clearer to readers. this paper improves archivists' awareness of archive data compilation and management. first, data is observed, topics are identified, and potential relationships between them are found and established to improve the compilation of historical archives. second, a visual presentation method and carrier is chosen, and the established relationships are visualized via the web browser for users to access and utilize. scientometric research methods can thus promote the transformation of historical research and of archives management and compilation from traditional explanatory scholarship to truth-seeking scholarship. currently, the application of machine learning technology has gradually extended from applied disciplines to traditional fields such as literature, art, and sociology. however, there are still many opportunities in the field of historical research. this study used methods from the field of artificial intelligence to conduct text mining and visualization of historical archive catalog data, and it proposes a new digital and intelligent solution for researching chinese historical documents.
with the development of science and technology, research methods for historical documents are undergoing constant changes from the traditional manual subjective analysis of historical data to relying on quantitative analysis represented by deep learning and data mining technology. it is an irreversible trend to research historical documents more comprehensively, accurately, and scientifically by means of artificial intelligence and other technologies on the scientific frontier. for future work, we plan to conduct research on the qing dynasty historical documents from a deeper semantic analysis level, construct a knowledge graph through the method of named entity recognition, and construct an ontological model transforming historical documents into a structured knowledge base to discover new knowledge from historical documents in an automated manner. acknowledgments funding statement this work was supported by the general program of the national natural science foundation of china [grant number 72074060], the research foundation of the ministry of education of china [grant number 20jhq012], and the national social science fund of china [grant number 16btq089]. data accessibility the data sets supporting this article have been uploaded as part of the supplementary material. https://drive.google.com/drive/folders/1bzs17otruyva_qkbshmf836ygdti40y0?usp=sharing https://drive.google.com/drive/folders/1bzs17otruyva_qkbshmf836ygdti40y0?usp=sharing information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 22 competing interests we have no competing interests. endnotes 1 wang tao, “data mining of german historical documents in the 18th century, taking topic models as examples,” xuehai 1, no. 20 (2017): 206–16, https://doi.org/10.16091/j.cnki.cn321308/c.2017.01.021. 2 kaixu zhang and yunqing xia, “crf-based approach to sentence segmentation and punctuation for ancient chinese prose,” journal of tsinghua university (science and technology) 10, no. 27 (2009): 39–49, https://doi.org/10.16511/j.cnki.qhdxxb.2009.10.027. 3 michael stauffer, andreas fischer, and kaspar riesen, “keyword spotting in historical handwritten documents based on graph matching,” pattern recognition 81 (2018): 240–53, https://doi.org/10.1016/j.patcog.2018.04.001; wu sihang et al., “precise detection of chinese characters in historical documents with deep reinforcement learning,” pattern recognition 107 (2020): 107503, https://doi.org/10.1016/j.patcog.2020.107503. 4 renata solar and dalibor radovan, “use of gis for presentation of the map and pictorial collection of the national and university library of slovenia,” information technology and libraries 24, no. 4 (2005): 196–200, https://doi.org/10.6017/ital.v24i4.3385. 5 shaochun dong et al., “semantic enhanced webgis approach to visualize chinese historical natural hazards,” journal of cultural heritage 14, no. 3 (2013): 181–89, https://doi.org/10.1016/j.culher.2012.06.009; jakub kuna and łukasz kowalski, “exploring a non-existent city via historical gis system by the example of the jewish district ‘podzamcze’ in lublin (poland),” journal of cultural heritage 46 (2020): 328–34, https://doi.org/10.1016/j.culher.2020.07.010. 
6 aleksandrs ivanovs and aleksey varfolomeyev, “service-oriented architecture of intelligent environment for historical records studies,” procedia computer science 104 (2017): 57–64, http://doi.org/10.1016/j.procs.2017.01.062; guus schreiber et al., “semantic annotation and search of cultural-heritage collections: the multimedian e-culture demonstrator,” journal of web semantics 6, no. 4 (2008): 243–49, https://doi.org/10.1016/j.websem.2008.08.001. 7 m kim et al., “inference on historical factions based on multi-layered network of historical figures,” expert systems with applications 161 (2020): 113703, http://doi.org/10.1016/j.eswa.2020.113703. 8 hobson lane, cole howard, hannes hapke, natural language processing in action: understanding, analyzing, and generating text with python (new york: manning publications, 2019), 165. 9 laurens van der maaten, eric postma, and jaap van den herik, “dimensionality reduction: a comparative review,” tilburg university technical report, ticc-tr 2009-005 (2009), https://lvdmaaten.github.io/publications/papers/tr_dimensionality_reduction_review_200 9.pdf. https://doi.org/10.16091/j.cnki.cn32-1308/c.2017.01.021 https://doi.org/10.16091/j.cnki.cn32-1308/c.2017.01.021 https://doi.org/10.16511/j.cnki.qhdxxb.2009.10.027 https://doi.org/ https://doi.org/10.1016/j.patcog.2018.04.001 https://doi.org/10.1016/j.patcog.2020.107503 https://doi.org/10.6017/ital.v24i4.3385 https://doi.org/10.1016/j.culher.2012.06.009 https://doi.org/10.1016/j.culher.2020.07.010 http://doi.org/10.1016/j.procs.2017.01.062 https://doi.org/10.1016/j.websem.2008.08.001 http://doi.org/ https://doi.org/10.1016/j.eswa.2020.113703 https://lvdmaaten.github.io/publications/papers/tr_dimensionality_reduction_review_2009.pdf https://lvdmaaten.github.io/publications/papers/tr_dimensionality_reduction_review_2009.pdf information technology and libraries september 2021 text analysis and visualization research on the hetu dangse | wang, wu, yu, and song 23 10 gavin hackeling, mastering machine learning with scikit-learn (birmingham: packt publishing, 2017). 11 richard smiraglia, domain analysis for knowledge organization: tools for ontology extraction (oxford: chandos publishing, 2015). 12 kuo-chung chu, hsin-ke lu, and wen-i liu, “identifying emerging relationship in healthcare domain journals via citation network analysis,” information technology and libraries 37, no. 1 (2018): 39–51, https://doi.org/10.6017/ital.v37i1.9595. 13 archives of liaoning province in china, “the hetu dangse series archives publication,” qing history research 6, no. 2 (2009): 1. 14 amit kumar sharma, sandeep chaurasia, and devesh kumar srivastava, “sentimental short sentences classification by using cnn deep learning model with fine tuned word2vec,” procedia computer science 167 (2020): 1139–47, https://doi.org/10.1016/j.procs.2020.03.416. 15 b hongxi, “research on the sanling management institutions of the qing dynasty outside the pass,” manchu minority research 4, no. 12 (1997): 38–56. 16 guangli zhu et al., “building multi-subtopic bi-level network for micro-blog hot topic based on feature co-occurrence and semantic community division,” journal of network and computer applications 170 (2020): 102815, https://doi.org/10.1016/j.jnca.2020.102815. 17 s. ravikumar, ashutosh agrahari, and s. n. singh, “mapping the intellectual structure of scientometrics: a co-word analysis of the journal scientometrics (2005–2010),” scientometrics 102 (2015): 929–55, https://doi.org/10.1007/s11192-014-1402-8. 
18 jiming hu and yin zhang, “research patterns and trends of recommendation system in china using co-word analysis,” information processing and management 51, no. 4 (2015): 329–39, https://doi.org/10.1016/j.ipm.2015.02.002.
19 nees jan van eck and ludo waltman, “software survey: vosviewer, a computer program for bibliometric mapping,” scientometrics 84, no. 2 (2010): 523–38, https://doi.org/10.1007/s11192-009-0146-3.
20 z yanchang and l xinzhu, “the study of the function of shengjing office from the use of the official communication — an academic investigation based on hetu dangse,” shanxi archives 8, no. 12 (2020): 179–88.
21 shengjing ministry of revenue, guangxu’s great qing huidian volume 25 (zhonghua book company, 1991), 211–12.
22 f yonggong and g jialu, “brief introduction of shengjing upper three banners baoyi zuoling,” historical archives 9, no. 30 (1992): 93–97.
23 wangyue, “research on the yamens and their affair relationships in shengjing area,” shenyang palace museum journal 1, no. 31 (2011): 67–77.

accessible, dynamic web content using instagram
jaci wilkinson
jaci wilkinson (jaci.wilkinson@umontana.edu) is web services librarian at the university of montana.
abstract this is a case study in dynamic content creation using instagram’s application program interface (api). an embedded feed of the mansfield library archives and special collections’ (asc) most recent instagram posts was created for their website’s homepage. the process to harness instagram’s api highlighted competing interests: web services’ desire to most efficiently manage content, asc staff’s investment in the latest social media trends, and everyone’s institutional commitment to accessibility.
introduction the mansfield library archives and special collections (asc) at the university of montana had a simple enough request. their homepage had been static for years and it was not possible to add more content creation to anyone’s workload. however, they had a robust instagram account with more than one thousand followers. was there any way to synchronize workflows with an instagram embed on the homepage? the solution was more complicated than we thought. we developed an instagram embed, but in the process grappled with some fundamental questions of technology in the library.
how do we streamline the creation and sharing of ephemeral, dynamic content? how do we reconcile web accessibility standards with the innovative new platforms we want to incorporate on our websites? libraries have invested heavily in social media to improve their approachability, reduce library anxiety, and interact with their users. at the mansfield library, this investment has paid off for asc. this unit was an early adaptor of instagram, a photo and short video–sharing application with the public or approved followers. the asc instagram account launched in january 2015, and staff quickly settled on the persona of “banjo cat” to share collection items and relevant history. banjo cat was inspired by a whimsical nineteenth-century photograph in asc of a cat playing a banjo (see figure 1). asc now has about 1,200 followers including many other libraries, archives, and special collections. in fact, connecting to a wider community of similar institutions was a driving factor in creating an instagram account. the asc staff member who updates the account said, while we have lots of interactions with patrons on facebook we have basically zero interactions with other institutions. instagram is all about interacting with other institutions, sharing ideas for posts, commenting on posts. so by learning about this community and participating and interacting with it we are able to . . . learn about programs and ideas that we would probably not have access to otherwise. 1 mailto:jaci.wilkinson@umontana.edu accessible, dynamic web content using instagram | wilkinson 20 https://doi.org/10.6017/ital.v37i1.10230 figure 1. banjo cat by l. a. de ribas. mansfield library archives and special collections. 1880s. but while asc’s social media thrived, its website was bereft of dynamic content. given that the asc homepage is the ninth most visited page on the library site, it felt like a wasted opportunity to let such a highly trafficked area lack engaging, current, and appealing content. it seemed only natural to harness the energy put into the asc instagram account and embed that same light-hearted, community-oriented, and collection-focused content on the asc homepage. literature review libraries are enthusiastic adopters of social media; one study even shows that as of 2013, 94 percent of academic libraries had a social media presence.2 a 2006 library journal article observed the following about myspace, then a popular social media platform: “given the popularity and reach of this powerful social network, libraries have a chance to be leaders on their college campuses and in the larger community by realizing the possibilities of using social networking sites like myspace to bring their services to the public.” 3 this open-minded spirit and willingness to try new technology trends was shrewd. pew research reports that as of 2016, 69 percent of americans use some type of social media. 4 social media use has grown more representative of the population: the percentage of older adults on at least one social media site continues to increase.5 for academic libraries, the pull of facebook was immediately strong because of the initial requirement for users to have a .edu address. academic libraries very early on attempted to connect with students about services, resources, and spaces using facebook.6 information technology and libraries | march 2018 21 dynamic content is a gateway to building interest toward and buy-in to an institution. 
in user experience literature, “user delight” is “a positive emotional affect that a user may have when interacting with a device or interface.”7 in walter’s hierarchy of user needs, pleasure tops all other needs.8 figure 2. aaron walter’s hierarchy of user needs, from therese fessenden, “a theory of user delight: why usability is the foundation for delightful experiences,” nielsen norman group, march 25, 2017, https://www.nngroup.com/articles/theory-user-delight/. using social media to engage users with special collections has its own niche. special collections are typically housed in closed stacks and have no digital equivalent. often the materials housed in special collections are rare, fragile, exotic, beautiful, and unusual; a study of library blogs and social media found that those with higher aesthetic value received more visitors and more revisits.9 social media “gives users an idea of what the collection offers while it promotes and potentially gains foot traffic.”10 it has even been suggested that social media gives special collections the opportunity to stand in when digitization isn’t possible: “instead of digitizing a whole collection, librarians can highlight important parts of the collection with a snippet of its history.”11 in creating ucla’s powell library instagram account, librarian danielle salomon https://www.nngroup.com/articles/theory-user-delight/ accessible, dynamic web content using instagram | wilkinson 22 https://doi.org/10.6017/ital.v37i1.10230 writes, “special collections items and digital library images can be a treasure trove of social media content. one of our library’s goals is to increase students’ exposure to special collections items, so we draw heavily from these collections.”12 instagram is a relative newcomer to social media, but it has been consistently successful since its inception in 2010.13 as of 2016, 28 percent of americans use instagram, up from 11 percent in 2013.14 facebook bought instagram in 2012 and has since bolstered the application’s success by making the two platforms easy to navigate and share between. after vine, a short video application, was shuttered in 2017, instagram’s ability to take and post short videos has increased its value. instagram is distinct in that it is mobile-dependent: it is difficult to run the application through a web browser, and only one device can operate an instagram account. within the library community, instagram’s adoption has been strongest in academic libraries. this is tied to the high number of instagram users who are college-age.15 another reason libraries select instagram is because it has more diverse users than other social media applications, specifically african americans and latinos.16 in a 2016 study, instagram was the second-most pick among college students at western oregon university when asked what social media application the library should use (twitter came in first). the most popular use of instagram in academic libraries is familiarizing students with services, resources, and spaces. uses include first-year instruction activities to combat library anxiety and mini-contests that ask users to identify what posted photos are of.17 ucla’s powell library discovered students posting instagram photos of their spaces, so they initially joined to repost those photos and interact with those users. instagram makes a library seem approachable. 
librarian joanna hare reflected on this discovery: “instagram is really powerful in that respect because you can just snap a few photos [and] show what’s going on . . . so that students don’t view the library as being intimidating.” 18 approachability is augmented by delegating photography and posting tasks to library student employees. social media is less often seen as a way to help create dynamic content for a library’s website. the exceptions to this trend have come from institutions with substantial technology resources. north carolina state university created an open source software that adds photos posted by anyone on instagram to a library photo collection when a certain hashtag is used.19 the university of nebraska’s calvin t. ryan library created an rss feed that disseminates blog posts to twitter, facebook, and the library homepage. posts from followed accounts in twitter and facebook are also a part of the resulting feed. the rss feed requires use of a third-party tool called dlvr.it (https://dlvrit.com/), which supports many other social media applications, but not instagram. a notable absence in literature on social media use in libraries is any mention of accessibility concerns. the “improving the accessibility of social media for public service” toolkit developed by a group of us government offices is a useful resource that includes specific guidelines on making instagram posts more accessible.20 the toolkit explains that “more and more organizations are using social media to conduct outreach, recruit job candidates and encourage workplace productivity. . . . but not all social media content is accessible to people with certain disabilities, which limits the reach and effectiveness of these platforms. and with 20% of the population estimated to have a disability, government agencies have an obligation to ensure that their messages, services and products are as inclusive as possible.”21 given the stated importance of social media in library literature, the lack of conversation about accessibility and social media is a barrier to inclusivity. https://dlvrit.com/ information technology and libraries | march 2018 23 mansfield library archives and special collections’ instagram feed dynamic content was lacking from any part of the asc website, but staff had a dearth of time and knowledge of the content management system to create web content. there was a drive to solve this problem because a new web services librarian had recently been hired. when the web services librarian learned of asc’s thriving instagram presence, she pursued the possibility of including that content on the asc website. she felt that, in addition to being more efficient, content creation should stay in-house given the highly specialized nature of asc’s collections, spaces, and resources. the ideal solution would allow asc staff to create and manage an instagram feed unassisted; the web services librarian sought the simplest possible solution for them. our content management system and instagram’s developer website were first consulted with the hope that one provided an automated embed or plugin. our content management system, cascade, could pull in content from facebook and twitter but not instagram, and instagram did not have an automated feed creator. after more research, we learned that third-party instagram feed embeds are the only possible way to create an instagram feed without using instagram’s api. 
the api was considered a last-resort option because we knew that asc staff could not manage the code themselves. the idea of using any third-party service was undesirable because of a lack of control, stability, and accessibility. if the service has technical issues or goes out of business, it would be very noticeable given the visibility of asc’s homepage. in 2012, a student advocacy organization at the university of montana filed a civil rights complaint with the us department of education focusing on disabled students’ unequal access to electronic and information technologies. since then, the mansfield library has been proactive in eliminating barriers to access.22 given this history, we are wary of the accessibility of third-party applications for someone using assistive technology, most likely a screen reader. juicer (https://www.juicer.io/), for example, is a freely available service for an instagram feed, but in exchange it retains its branding prominently at the top of the feed. an example of juicer in use can be found on the home page of the baltimore aquarium (http://aqua.org/). tests of juicer showed that it was not accessible for a screen reader. finally, it didn’t fit our need: juicer curated posts from other users depending on the hashtags and reposts, but we only wanted to feature our own content. the unpredictability of other accounts’ posts ending up on the asc homepage was not desirable. instagram’s developer site did not make finding a solution easy. the page titled “embedding” is about embedding individual posts on a webpage, not a whole feed.23 this content does not even link out to an explanation of how to embed a feed. the “authentication” page is where the process begins because calling the api requires a token for an authenticated instagram account user.24 a user is authenticated by creating a client id and then receiving an access token. another interesting roadblock provided by the instagram developer site is that the “authentication” page provides no further information about using the access token to call the api. it took outside research to finally figure out the steps needed to make the api requests for asc’s feed.25 php code is used to call the api and copy the three most recent asc instagram posts to a local server file. (using javascript to call the api is a poor choice because that code will make the account’s access token public. if anyone sees this token they can use it themselves to pull your feed using the instagram api.) css replicates the look and feel of instagram with white, minimalistic icons and a simple photo display that darkens and shows the beginning of the description when a user’s mouse hovers over it. all code from this project is freely available in github.26 there is a catch to this embedded feed process. the directions given through instagram and by the online sources we used only took us to sandbox mode (in web development, sandbox refers to a restricted or test version of a final product). in sandbox, instagram limits the number of requests to the api. unfortunately, a request was made every time someone went to the asc page. the initial feed stopped working in minutes because we did not realize what this limitation of sandbox mode meant.
another look at the instagram developer site taught us that the only way to leave sandbox was to have our “app,” as instagram called it, reviewed.27 in other words, instagram has only set up their api to be used for full application development (like juicer). we decided not to leave sandbox mode because of uncertainty about what instagram’s review process would entail. if our app was rejected, would they force us to discontinue our work? the timeline for the approval process was also uncertain. distrust and uncertainty, unfortunately, guided our decision-making at this stage. instead of undergoing the review process, the php code was reconfigured to call the api only once a day. this made the feed less dynamic because it was not updating in real time. for our purposes this was not a problem; the asc instagram account is updated at most once or twice a week anyway. as a result, we are “scraping” asc’s instagram account. although “crawling, scraping, and caching” are prohibited by instagram’s terms of use, other instagram feeds in github have similar workarounds and point out that a plugin/scraper “uses (the) same endpoint that instagram is using in their own site, so it’s arguable if the toc [terms of use] can prohibit the use of openly available information.”28 while figuring out how to work with the instagram api, a major accessibility roadblock cropped up: there was no place for the alt text, the descriptive information about the image that is used by assistive technologies for users with low vision. besides taking or uploading a photo, the only other actions offered to create a new post were to write a caption, tag people, or add a location. only the caption allowed for a text string. without alt text, not only is the instagram feed unintelligible to a screen reader, but it disturbs a screen reader user’s interaction with all other content on that page. an asc staff member discovered a solution when she noticed a joshua tree national park instagram post with alt text at the bottom of the caption. although initially put off by the “wordiness,” we concluded this was the only logical way to move forward. the benefits of this format of alt text came into focus as we moved through the project: the asc staff member was able to choose the desired alt text without any additional steps or skills, and we grew to relish the opportunity to explain to curious users what the #alttext hashtag meant and why it was important to us. php code isolates all text after #alttext and displays that as the alt text to a screen reader. since the instagram feed was implemented, it has been interesting to follow how the instagram developer site has changed and grown. although facebook has owned instagram for five years, the instagram developer site is only now starting to link out to facebook developer content. most recently, the instagram developer site has been advertising the instagram graph api for use by business accounts. this type of development is useless for us because we have a personal instagram account, not a business account. and the function of the instagram graph api is focused on the internal user and analytics, not the end user and user experience. even if the instagram graph api were available for personal accounts, it is worth asking if this type of data collection would be of use to an organization that doesn’t have the labor of a devoted marketing team.
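to make the moving parts described above concrete, the following is a minimal sketch of the kind of server-side script the project relied on, written in php like the production code. it is not the project’s actual code (that is available in the github repository cited in the references); the endpoint and json field names follow the legacy instagram api as it was documented at the time and should be treated as assumptions, that api has since been retired, and the access token value is a hypothetical placeholder.

<?php
// illustrative sketch only, not the asc feed's production code.
// endpoint and json field names follow the legacy (now retired) instagram api
// and are assumptions; ACCESS_TOKEN is a hypothetical placeholder.
$accessToken = 'ACCESS_TOKEN';                    // kept server-side, never exposed to the browser
$cacheFile   = __DIR__ . '/instagram-cache.json'; // local copy of the most recent posts
$cacheMaxAge = 24 * 60 * 60;                      // refresh at most once a day to respect sandbox limits

// call the api only when the cached copy is missing or more than a day old
if (!file_exists($cacheFile) || time() - filemtime($cacheFile) > $cacheMaxAge) {
    $url  = 'https://api.instagram.com/v1/users/self/media/recent/'
          . '?count=3&access_token=' . urlencode($accessToken);
    $body = @file_get_contents($url);             // requires allow_url_fopen; curl would also work
    if ($body !== false) {
        file_put_contents($cacheFile, $body);
    }
}

$feed  = file_exists($cacheFile) ? json_decode(file_get_contents($cacheFile), true) : null;
$posts = isset($feed['data']) ? $feed['data'] : array();

foreach ($posts as $post) {
    $caption = isset($post['caption']['text']) ? $post['caption']['text'] : '';

    // everything after the #alttext hashtag becomes the image's alt attribute;
    // whatever precedes it is the visible description shown on hover
    $altText     = '';
    $description = $caption;
    $marker      = stripos($caption, '#alttext');
    if ($marker !== false) {
        $altText     = trim(substr($caption, $marker + strlen('#alttext')));
        $description = trim(substr($caption, 0, $marker));
    }

    printf(
        '<a class="ig-post" href="%s"><img src="%s" alt="%s"><span class="ig-caption">%s</span></a>',
        htmlspecialchars($post['link']),
        htmlspecialchars($post['images']['standard_resolution']['url']),
        htmlspecialchars($altText),
        htmlspecialchars($description)
    );
}

keeping the token and the api call in server-side php, and serving visitors from a cached json file refreshed at most once a day, is what keeps the token private and the request volume within the sandbox limit; the #alttext parsing gives the staff member full control of the alt text from within the ordinary posting workflow.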
dynamic content through social media and web content provides opportunities to create user delight because it focuses on visually appealing, fun, timely, and interesting information. for archives, special collections, and other cultural heritage institutions, this content is particularly useful because it provides a look into collections that are interesting and rare but also fragile and housed in closed stacks. these positives are tempered by the reality many of these institutions face: budgets are tight, staffs are small, and technical expertise might be lacking. this paper demonstrates how important and useful social media can be in creating dynamic website content. unfortunately, there is a gap in library literature on accessibility and social media; even if social media content is ephemeral or lacks specific utility, libraries need to pay more attention to the various ways users access resources and information through social media, especially if that same content appears on the institution’s website. the asc’s embedded homepage instagram feed fits their needs, is accessible, and builds community around their unique collections. by providing all the code created in this project in github,29 including the css we used, our hope is that institutions interested in this instagram feed model could replicate it for their own purposes without extensive technical support.
acknowledgments i am thankful for the expertise of carlie magill, donna mccrea, and wes samson. without them this project would not have been possible.
references
1 carlie magill, e-mail message to author, august 8, 2017.
2 michael sutherland, “rss feed 2.0,” code4lib 31, january 28, 2016, http://journal.code4lib.org/articles/11299.
3 beth evans, “your space or myspace?” library journal 131 (2006): 8–12, library, information science & technology abstracts, ebscohost.
4 “social media fact sheet,” pew research center, january 12, 2017, http://www.pewinternet.org/fact-sheet/social-media/.
5 ibid.
6 brian s. mathews, “do you facebook?” c&rl news, may 2006, http://crln.acrl.org/index.php/crlnews/article/viewfile/7622/7622.
7 therese fessenden, “a theory of user delight: why usability is the foundation for delightful experiences,” nielsen norman group, march 25, 2017, https://www.nngroup.com/articles/theory-user-delight/.
8 ibid.
9 daryl green, “utilizing social media to promote special collections: what works and what doesn’t” (paper, 78th ifla general conference and assembly, helsinki, finland, june 2012), 11, https://www.ifla.org/past-wlic/2012/87-green-en.pdf.
10 katrina rink, “displaying special collections online,” serials librarian 73, no. 2 (2017): 1–9, https://doi.org/10.1080/0361526x.2017.1291462.
11 ibid.
12 danielle salomon, “moving on from facebook,” college & research libraries news 74, no. 8 (2013): 408–12, https://crln.acrl.org/index.php/crlnews/article/view/8991.
13 sarah perez, “the rise of instagram,” techcrunch, april 24, 2012, https://techcrunch.com/2012/04/24/the-rise-of-instagram-tracking-the-apps-spread-worldwide/.
14 “social media fact sheet,” pew research center, january 12, 2017, http://www.pewinternet.org/fact-sheet/social-media/.
15 lauren wallis, “#selfiesinthestacks: sharing the library with instagram,” internet reference services quarterly 19, no. 3–4 (2014): 181–206, https://doi.org/10.1080/10875301.2014.983287.
16 elizabeth brookbank, “so much social media, so little time: using student feedback to guide academic library social media strategy,” journal of electronic resources librarianship 27, no. 4 (2015): 232–47, https://doi.org/10.1080/1941126x.2015.1092344; salomon, “moving on from facebook.”
17 wallis, “#selfiesinthestacks”; salomon, “moving on from facebook.”
18 wendy abbott et al., “an instagram is worth a thousand words: an industry panel and audience q&a,” library hi tech news 30, no. 7 (2013): 1–6, https://doi.org/10.1108/lhtn-08-2013-0047.
19 salomon, “moving on from facebook.”
20 “federal social media accessibility toolkit hackpad,” digital gov, accessed november 25, 2017, https://www.digitalgov.gov/resources/federal-social-media-accessibility-toolkit-hackpad/.
21 ibid.
22 donna e. mccrea, “creating a more accessible environment for our users with disabilities: responding to an office for civil rights complaint,” archival issues 38, no. 1 (2017): 7, https://scholarworks.umt.edu/ml_pubs/25/.
23 “embedding,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/embedding/.
24 “authentication,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/authentication/.
25 pranay deegoju, “embedding instagram feed in your website,” logical feed, december 25, 2015, https://www.logicalfeed.com/embedding-instagram-feed-in-your-website.
26 wes samson, “ws784512 instagram,” github, 2016, https://github.com/ws784512/instagram.
27 “sandbox mode,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/sandbox/.
28 “terms of use,” instagram, accessed november 25, 2017, https://help.instagram.com/478745558852511; and “image-hashtag-feed,” digitoimisto dude oy, accessed november 25, 2017, https://github.com/digitoimistodude/image-hashtag-feed.
29 samson, “ws784512 instagram.”

testing for transition: evaluating the usability of research guides around a platform migration
ashley lierman, bethany scott, mea warren, and cherie turner
ashley lierman (lierman@rowan.edu) is instruction librarian, rowan university. bethany scott (bscott3@uh.edu) is coordinator of digital projects, university of houston. mea warren (mewarren@uh.edu) is natural science and mathematics librarian, university of houston. cherie turner (ckturner2@uh.edu) is assessment & statistics coordinator, university of houston.
abstract this article describes multiple stages of usability testing that were conducted before and after a large research library’s transition to a new platform for its research guides. a large interdepartmental team sought user feedback on the design, content, and organization of the guide homepage, as well as on individual subject guides. this information was collected using an open-card-sort study, two face-to-face, think-aloud testing protocols, and an online survey. significant findings include that users need clear directions and titles that incorporate familiar terminology, do not readily understand the purpose of guides, and are easily overwhelmed by excess information, and that many of librarians’ assumptions about the use of library resources may be mistaken. this study will be of value to other library workers seeking insight into user needs and behaviors around online resources.
introduction like many libraries that employ springshare’s popular libguides platform for creating online research resources, the university of houston libraries (uhl) has accumulated an extensive collection of guides over the years. by 2015, our collection included well over 250 guides, with varying levels of complexity, popularity, usability, and accessibility. this presented a major challenge when we planned to migrate our libguides instance (locally branded as “research guides”) to libguides v2 in fall 2015, but also an opportunity: the transition would be an ideal time to appraise, reorganize, and streamline existing guide content.
although uhl had conducted user research in the past to improve usability, in preparing for the migration it became clear that another round of tests would be beneficial in revising our guides for the new platform. our research guides would be presented much differently in libguides v2, and the design and organization of information would need to be tailored to the needs of our user community like any other service. user feedback would be vital to reorganizing our guides’ content and to making customizations to the new system. this article will describe the usability testing process that was employed before and after uhl’s migration to libguides v2. usability testing is one technique in the field of user experience (ux). the primary goal of ux is to gain a deep understanding of users’ preferences and abilities, in order to inform the design and implementation of more useful, easy-to-use products or systems. best practices for ux emphasize “improving the quality of the user’s interaction with and perceptions of your product and any related services.”1 usability tests conducted as part of this case study were informed by the work of jakob nielsen, who pioneered several ux ideas and techniques, and the explanations on conducting your own usability testing provided in steve krug’s seminal works on the topic, don’t make me think and rocket surgery made easy. uhl’s transition to libguides v2 consisted of five stages: (1) card sort testing to determine the best organization of guides in the new system; (2) the migration itself; (3) face-to-face usability testing after migration to study user expectations and behavior after the change; (4) a survey to identify any significant variations in distance users’ experiences; and (5) final analysis and implementation of the results. incorporating usability testing was a relatively easy and inexpensive process with a high yield of useful insights, which could be adapted as needed to other library settings in order to evaluate similar online resources.
literature review as libraries have moved from traditional paper pathfinders to online research guides of increasing sophistication, there has been substantial study into the effectiveness of online research guides for various audiences and information needs. several studies highlight the apparent disconnect between students’ and librarians’ perceptions of research guides, especially regarding the purpose, organization, and intended use of the guides.
reeb and gibbons used an analysis of surveys and web usage statistics from several university libraries to show that students rarely or never used online guides despite the extensive time spent by librarians to curate and present information resources.2 similarly, in courtois, higgins, and kapur’s one-question survey (“was this guide useful?”) the authors were surprised to find that 40 percent of the responses received rated guides unfavorably, noting that “it was disheartening for many guide owners to receive poor ratings or negative comments on guides that require significant time and effort to produce and maintain.”3 hemmig concluded that in order to increase the value of a guide from a user perspective, librarians must adopt a user-centric approach by guiding the search process, understanding students’ mental models for research, and providing “starter references.”4 staley’s survey of student users also indicates a need to be mindful of what resources guides are actually expected to provide, as it found that pages linking to articles and databases were far more used than pages with other content.5 data has also shown that undergraduate students are unable to match their information needs with the resources provided on broad subject-area guides, leading several authors to conclude that students would be able to use course-specific guides more easily. for instance, strutin found that course guides are among the most frequently used guides, especially when paired with library instruction sessions.6 several other studies cite survey data, statistics, and educational concepts like cognitive load theory to conclude that ideally, guides would be customized to the specific information needs of each course and its assignments in order to better match the mental models and information-seeking behavior of undergraduate students.7 while the value of online research guides has been under study for quite some time, usability testing of guides is a relatively recent phenomenon. 
in 2010, librarians at concordia university conducted usability testing of two online research guides and found that undergraduate students generally found the guides difficult to use.8 librarians at metropolitan state university conducted two rounds of usability tests on their libguides with a broader range of participant types, highlighting the ability to incorporate usability testing as part of an iterative design process.9 at ithaca college, subject librarians partnered with students in a human-computer interaction information technology and libraries | december 2019 78 course to test both course guides and subject guides through a series of usability tests, preand post-test questionnaires, and a group discussion in which students evaluated the findings of the usability tests and discussed their experiences.10 at the university of nevada, las vegas, librarians conducted usability testing with both undergraduate students and librarians, and surprisingly found that attitudes towards the guides were similar in both groups: interface design challenges were the greatest barrier to task completion, rather than the level of expertise of the user.11 finally, at northwestern university, librarians conducted several types of usability tests as a part of a transition from the original libguides platform to libguides v2, to determine what features worked from the original guides and what could be improved or updated during the migration.12 throughout these and other usability studies, the authors have identified a number of desirable and undesirable elements in research guide design: • clean and simple design is highly prioritized by users. students preferred streamlined text, plentiful white space, and links to “best bets” rather than comprehensive but overwhelming lists of databases.13 these findings also align with accepted web design best practices. • guide parts and included resources should be labeled clearly and without jargon.14 sections and subpages within each guide should be named according to key terms that students recognize and understand. also, librarians should consider creating subpages using a “need-driven approach,” based on the purpose of each research task or step, rather than by the format of materials or resources.15 • the tabbed navigation of libguides v1 is both unappealing to and easily missed by users, and if it must be implemented, great care should be taken to maximize its visibility and usability.16 • consistency of guide elements, both within a guide and from one guide to the next, helps users more easily orient themselves when using guides; certain elements should always be present in the same place on the page, including navigational elements and table of contents, contact information, supplemental resources such as citation and plagiarism information, and common search boxes.17 with the findings and recommendations of these predecessors in mind, we designed a multi-stage study to expand upon their results and identify new challenges and opportunities that the libguides v2 platform might present. methodology stage 1: card sort the majority of research guides at uhl are organized by subject area, by course, or both. there are a number of guides, however, that are not affiliated with any particular subject area or course, containing task-oriented information that may be valuable across a wide variety of disciplines. 
the organizational system for these guides had developed organically over time as new guides were developed, rather than being structured intentionally, and it had become evident that these guides were not particularly discoverable or well-used by students. the migration to libguides v2 presented an opportunity to reorganize these guides based on user input. a team of three librarians from the liaison services department conducted an open-card-sort study in november 2015, in order to determine how best to organize those research guides not already affiliated with a course or subject area. card sorting is a method of identifying the testing for transition | lierman, scott, warren, and turner 79 https://doi.org/10.6017/ital.v38i4.11169 categories and organization of information that make the most sense to users, by asking users to sort potential tasks into named categories representing the menus or options that would be available on the site. an open-card sort allows users to create and name as many categories as they need, as opposed to a closed-card sort, which requires users to sort the available options into a predetermined set of categories. to prepare for the study, we reviewed all of our guides to develop a complete list of those not affiliated with a subject or course. for each guide, we developed a brief, clear description of the guide’s topic that would be easy for an average library user to understand, each on a small laminated card. over an approximately ninety-minute period, we staffed a table in the 24-hour lounge of m.d. anderson library, where we recruited passersby to participate in the study. after answering a few demographic questions, participants were asked to place the cards into groups that seemed logical to them. they could create as many or as few groups as necessary, but were asked to try to place every card in a group. while the participants organized the cards, they were asked to explain their thought processes and rationale, and one librarian observed the sorting process and took notes on their actions and explanations. when a participant finished grouping the cards, they were asked to write on an index card a name for each of the groups they had created. the final groupings were photographed and the labels retained for recording purposes. after the testing was complete, participants’ responses were organized into a spreadsheet and reviewed for recurring patterns and commonalities. a new set of categories was developed based on those most commonly created by students during the study, and these categories were titled using the most common terminology used by students in their group labels. stage 2: migration at the direction of the instructional design librarian (idl), research guide editors at uhl revised and prepared their own guide content throughout fall 2015, eliminating unneeded information and reorganizing what remained. the idl led multiple trainings and work sessions throughout the process to ensure compliance. during this same time, the idl completed back-end work in the libguides system to prepare for migration, and the web services department created a custom layout for the new guide site. the data migration itself took place on december 18, 2015, followed by cleanup and full implementation in january 2016. the idl provided a deadline by which all content must be ready for public consumption, prior to the start of the spring semester. 
after that deadline, the web services department switched the url for uhl’s research guides site to the libguides v2 instance and made the new system publicly available.
stage 3: face-to-face testing after the migration process was complete, the idl assembled a team of ten other librarian and staff stakeholders from the liaison services, special collections, and web services departments to develop a usability testing protocol. this team assisted the idl in developing two different face-to-face testing scripts and the text of a survey for distance users, as well as helping to administer face-to-face testing. the method we chose for the face-to-face testing process was think-aloud testing. in a think-aloud test, the user is given a set of tasks to complete using the web resource, tasks that have been identified as common potential uses. the user is asked to attempt each task, and to narrate any thoughts or reactions to the resource, as well as the thought process and rationale behind each decision made. several members of the team were already familiar with usability practices and had participated in think-aloud user testing before. training for the others was provided in the form of short practical readings, verbal guidance from the idl in group meetings, and practice sessions before conducting the face-to-face testing. in the practice sessions, group members volunteered for their roles in the testing, discussed protocol and logistics and asked any questions, and practiced the tasks they would each need to complete: making the recruitment pitch to users, walking through the consent process, using recording software, using the notetaking sheet, and so on. as the team leader and one of the members experienced with usability, the idl conducted the actual testing interviews. each of the face-to-face tests focused on either subject guides or the guide homepage. for both tests, tables were set up in the 24-hour lounge for recruitment and testing. two team members recruited students in the library at the time of testing by offering snacks and library-branded giveaways. two additional team members facilitated the test and took notes during testing. both tests also used the same consent forms and demographic questions, and largely the same follow-up questions. participants in both homepage and subject guide testing were guided to the appropriate starting points and interviewed about their impressions of the homepage and guides, their perceptions of the purpose of these resources, and their understanding of the research guides name. subject guide testers were allowed to select which of our two testing guides they would be more comfortable using: the general business resources guide, or the biology and biochemistry resources guide. subject guide testers were also asked how they would seek help if the guide did not meet their needs. both groups were then asked to complete one of two sets of tasks. the homepage tasks were designed to test users’ ability to find individual guides, either for a specific course or for general information on a subject; the subject guide tasks were designed to test users’ ability to find appropriate resources for research on a given topic. after completing the tasks for their respective resources, participants answered several general follow-up questions, with additional questions from the facilitator as necessary.
stage 4: survey unlike the face-to-face testing, the survey focused only on use of subject guides, not the homepage. otherwise, however, because the purpose of the survey was to compare the behavior of distance users to the behavior of on-campus users, the survey was designed to mimic the face-to-face test as closely as possible. several team members with liaison responsibilities identified distance user groups in their subject areas who would be demographically appropriate and available at the needed time, and contacted appropriate faculty members to ask for assistance in distributing the survey via email. ultimately, the survey was distributed to small cohorts of users in the areas of social work, education, nursing, and pharmacy, and customized for each targeted cohort. each version of the survey linked users to their appropriate subject guide and then asked the same questions regarding impressions of the guide that were asked in the face-to-face testing. users were also asked to complete tasks using the guide that were similar in purpose to those in the face-to-face testing, and they were prompted to enter the resource they found at the end of each task. demographic information was requested at the end of the survey to ensure that in the event of drop-offs, basic demographic information would be more likely to be lost than testing data. the testing for transition | lierman, scott, warren, and turner 81 https://doi.org/10.6017/ital.v38i4.11169 survey was distributed to the target groups over a three-week period in june 2016. six users at least partially completed the survey, and four completed it in full. stage 5: analyzing and implementing results after completing the face-to-face testing, the idl reviewed and transcribed the recordings of each test session, along with additional insights from the notetakers. responses to each interview question were coded and ordered from most to least common, as were patterns of behavior and difficulties in completing each task. task results and completion times were also recorded for each user and organized into a spreadsheet with users’ demographic information. the idl then reported out to research guide editors on common responses and task behaviors observed in the testing, and interpretations of the implications of these results for guide design. after survey responses were collected, the idl compiled and analyzed the results using a similar process, although the survey received few enough responses that coding was not necessary. users’ responses to questions were noted and grouped, and success and failure rates on tasks were tallied. a second report out to research guide editors summarized these results and described which responses closely resembled those received in the face-to-face testing and which varied. finally, when all data had been collected, the idl compiled recommendations based on the testing results with other recommendations derived from past uhl studies and from reviewing the literature, and from these developed a set of research guides usability guidelines. the guidelines were organized from highest to lowest priority, based on how commonly each was indicated in testing or in the literature. research guide editors were asked to revise their guides according to these guidelines within one year of their implementation, and advised that their compliance would be evaluated in an audit of all guide content in summer 2017. 
in the interest of transparency, the idl also included in the guidelines document an annotated bibliography of the relevant literature review, and a formal report on the procedures and results of the usability testing process. findings card sort one significant observation from the card sort was that, while librarians tended to organize guides into groups based on type of user (e.g., “undergraduates,” “student athletes,” “first-years,” etc.), none of the students who participated categorized resources in this way, and they did not seem to be particularly conscious of the categories into which they or other users might fit. instead, their groupings focused on the type of task to which each guide would be most appropriate, rather than the type of user that would be most likely to use that guide. for example, users readily recognized guides related to citation tasks and preferred them to be grouped together, regardless of the level at which they addressed the topic, and also grouped advanced visualization techniques like gis with simpler multimedia-related tasks like finding images. similarly, category labels tended to include “how to . . . ” language in describing their contents, focusing on the task to which the guides in that category would be beneficial. this aligns with the recommendation from sinkinson et al. to name guide pages based on purpose rather than format.18 it is worth noting, however, that all of the students who participated in the card-sort study were undergraduates and may not have fully understood some of the more complex research tasks being described. it should also be noted that all users created some sort of category for “general” or “basic” research tasks, and most either explicitly created an “advanced” research category, or information technology and libraries | december 2019 82 created several more granular categories and then verbally described these as being for “advanced” research tasks. in general, organization by task type was most preferred, followed by level of sophistication of task. face-to-face testing: homepage no significant correlations were found between user demographics and users’ success rates in completing each task, nor between demographics and time on task. users’ ability to navigate the system was generally consistent regardless of major, year in program, and—somewhat surprisingly—frequency of library use. this is, however, in keeping with costello et al.‘s finding that technology barriers were more significant in user testing than level of experience.19 when testing the homepage, we found that all users were able to find known guides (such as a course guide for a specific course) and appropriate guides for a given task (such as a citation guide for a particular style) quickly and easily. when seeking a guide, users generally used the by subject view of all guides to locate both subject and course guides. if this view was not helpful, as in the case of citation style guides, users’ next step was most commonly to switch to the all guides view and use the search function to look for key terms. users understood and used the by subject and all guides views intuitively, expressed more confusion and hesitation about the by owner and by type views, and disregarded the by group view entirely. 
we had been concerned about whether the search function would confuse users by highlighting results from guide subpages, but on the contrary, the study participants used the search function easily, and the fact that it surfaced results from within guides seemed to help them find and identify relevant terms, rather than confusing them. overall, users responded favorably to the look and feel of guides, albeit with a few specific critiques: the initially limited color palette made it difficult for some users to distinguish parts of a guide from one another, and the text size was found to be uncomfortably small in some areas.
face-to-face testing: subject guides in subject guide testing, we found overwhelmingly that users both valued and made use of link and box descriptions within guides, using them throughout the navigation process as sources of additional information. users generally preferred briefer descriptions, rather than reading lengthy paragraphs of text, but several noted specific instances in which they would not have understood the nature or purpose of a database without the description that was provided. we also found, conversely, that librarian profile boxes were of less value to users than we had assumed. when asked how they would find help when researching, most subject guide testers said they would turn to google, ask at the library service desk, or use the contact us link in the libguides footer; only two mentioned the profile box as a potential source of help at all. users also seemed unsure of the purpose of the profile box, and not to recognize whose photo and contact information they were seeing, in spite of box labels and text. contrary to our expectations, users also readily clicked through to subpages of guides to find information, sometimes even when more useful information was actually available on the guide landing page. this was particularly evident in one of the subject guides that included internal navigation links in a box on the landing page: if users saw a term they recognized in one of these links, they would click it immediately, without exploring the rest of the page. in general, users latched on quickly to terms in subpage and box titles that seemed relevant to their tasks, and some expressed feelings of increased confidence and reassurance when seeing a familiar term featured prominently on an otherwise unfamiliar resource. scanning for keywords in this manner also sometimes led users astray, however: some navigated to inappropriate pages or links because they featured words like “research” or “library” in their titles. users also expressed confusion about page titles that did not match their expectations of tasks they could complete online, such as “biology reading room.” these findings support those of many prior authors regarding the importance of including clear descriptions with key words that users readily understand.20 many of our results from subject guide testing not only ran counter to our expectations, but challenged the assumptions on which we had based our questioning. for example, we had been curious to learn whether links to external websites were used significantly compared to links to library databases, or if they simply cluttered guides unnecessarily. in testing, however, we found that users did not distinguish between the two types of resources at all, and used both interchangeably.
a better question seemed to be not whether users found those links useful, but how to distinguish them from library content—or whether the distinction was necessary from the user’s perspective. some team members had also been concerned about the scroll depth of guide pages, but the majority of users not only said they did not mind scrolling, but seemed surprised and amused by being asked. their own assumptions about this type of resource clearly included the need to scroll down a page. a few other miscellaneous issues presented themselves in our face-to-face testing. one was that the purpose and nature of research guides was not readily evident to users. many used language that conflated guides with search tools like databases, or even with individual information resources like books or articles. for example, a user asked whether the by owner view listed the authors of articles available in this resource. the curated and directional nature of research guides was not at all clear to users. furthermore, in spite of the improvements to guide look and feel in libguides v2, several users still spoke of guides as being cluttered, lengthy, and overwhelming, leaving them intimidated and unsure of where to begin. consistently, testers tended to gravitate toward course guides even when subject guides would have been more appropriate for a given task, and some users expressed that this choice was because of the greater specificity in course guide titles. users demonstrated a great preference for familiarity, gravitating toward terms and resources that were known to them, and even repeating behaviors that had been unproductive earlier in the testing process. finally, one of the greatest points of confusion for users seemed to be the relationship of research guides to physical materials within the library. users readily and confidently followed links to online resources from research guides but expressed confusion and hesitancy when guides pointed to books or other resources available in the library. survey the survey of off-campus users had few responses, but the demographics of the respondents varied more than those of the on-campus testing participants, including graduate students and faculty. the users who did respond showed evidence of less use of guide subpages than we had observed in the face-to-face testing, indicating that the presence of a librarian during testing may have influenced users to explore guides more thoroughly than they would have when working on their own. at the same time, more experienced researchers in the survey group—in this case, a late-program graduate student and a faculty member—were apparently more likely than less experienced users to explore guides thoroughly, and to succeed at research tasks. survey respondents also were far more likely to state that they would use the profile box on guides for information technology and libraries | december 2019 84 help, with some indicating that they recognized their liaison librarian’s picture and were familiar with the librarian as a source of assistance. liaison librarians at uhl often work more closely with higher-level students and faculty than with undergraduates, and this greater familiarity was not surprising. discussion implementation of findings based on the results of the literature review and testing, a number of changes and recommendations were implemented. 
a brief description of the nature and purpose of research guides was added to the guide homepage’s sidebar, and more color variation was added to guides, while font sizes were increased. existing documentation was also reworked and expanded to create the research guides usability guidelines document for all guide editors, which included adding or revising the following recommendations: • pages, boxes, and resources should all be labeled with clear, jargon-free language that includes keywords familiar to their most frequent users. • page design should be “clean and simple,” minimizing text and incorporating ample white space. • brief, oneto two-sentence descriptions should be provided for all links. • each guide should have an introduction on its landing page with a brief description of its contents and purpose. it may be helpful to include links to subpages in this box as well, but this should be done judiciously, as these links may take users off the landing page prematurely. • pages and resources aimed at undergraduates should be organized and titled according to their relevance to research tasks (e.g., “find articles”), and not by user group. • electronic resources should be prioritized on guides over print resources. • clear distinctions should be made between library and non-library links when the distinction is important. • a profile box with a picture should be included, but the importance of this item is not as great as we had previously imagined. limitations one of the most significant challenges in our testing was actually negotiating the irb application process. delays in our application raised concerns within the team that we would not receive approval in time to test with students before the start of the summer break. although we did receive approval in time, the window for testing afterward was extremely narrow. submitting the application also bound us to the scripts and text that we had originally drafted, which severely limited the flexibility of the testing process. this became a challenge at several points when a particular phrasing or design of a question was found to be ineffective in practice, but could not be altered from its original form. tensions between the requirements for institutional review and the unique needs of usability testing are a persistent problem for user experience development in an academic setting, and must be planned for accordingly as much as possible. in some cases, as well, we might have improved our results by better designing our questions. one example of this was the question about the name “research guides,” which anecdotal evidence has suggested might be challenging for users. simply asking whether that name made sense to the participant was clearly not effective in practice, and did not yield actionable insights. in the future, testing for transition | lierman, scott, warren, and turner 85 https://doi.org/10.6017/ital.v38i4.11169 we might consider informal testing of our planned questions with users in the target demographic before proceeding with full-scale usability testing. a final challenge was in gathering data on use of guides by distance users. though we were able to get enough responses to draw some tentative conclusions, we had hoped for a larger pool of data. though it would make the results more difficult to compare to in-person testing, reducing the length of the survey might have helped to produce more responses. 
additionally, increased marketing and more flexible timing for survey distribution might have also helped us reach a larger audience. conclusions the results of our testing were very instructive, and led to the creation of valuable documentation for guide editors to use in their work. we also learned a number of lessons relating to process that would be of value to other librarians seeking to perform similar testing at their own institutions. the first of these is that working with a large, interdepartmental team on this type of project— while occasionally unwieldy—is greatly beneficial overall. even if all the team members are not able to fully participate, involving as many colleagues as possible in the usability testing process lessens the workload for each individual, increases flexibility, and ultimately increases buy-in and compliance with the resulting changes and recommendations. for a platform used directly by a relatively large percentage of librarians, as libguides generally is, the number of stakeholders in user research is correspondingly large, and as many of these stakeholders as possible should be involved to some degree. not only will this distribute the benefits of the process more broadly, it will make it possible to staff more extensive and more frequent testing sessions. in the course of our testing process, we also came to recognize the value of testers familiar with the user group under examination. a majority of librarians involved in testing were from publicfacing departments, with significant student contact in their day-to-day work. as a result, we were able to quickly attract a diverse set of participants for our testing simply through our collective knowledge of students’ likely behaviors and preferences: where students were most likely to congregate, what kinds of rewards would motivate them to participate, how to reach them at a distance, and how far their patience would be likely to extend for an in-person interview or an online survey. the incentives and location that the testing teams selected were so effective that the numbers of volunteers we received overwhelmed our capacity to accommodate within the allotted testing time, resulting in a substantial pool of responses for analysis. therefore, we conclude that the effectiveness of user research can be increased by including (or at least consulting) those most familiar with the user group to be studied. simply assuming that participants will be available may ultimately compromise the effectiveness of testing. additionally, time management is an extremely important element of testing development. failing to fully account for the demands of the irb process, for example, led to significant limitations for our project concerning the timing of testing, the availability of participants, our capacity for marketing and distribution of the survey, and the quality of our testing instrument. while acknowledging that, as in our case, sometimes the need for usability testing arises on short notice, we recommend allocating as much time and preparation to the process as possible, to ensure that every aspect of the testing can be given adequate attention. information technology and libraries | december 2019 86 figure 1. average monthly guide views by transition period. 
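figure 1 summarizes the average monthly guide views compared across the three transition periods described in the following passage. as a rough illustration only, that kind of per-period monthly averaging could be computed along these lines, assuming a hypothetical export of monthly view counts per guide (the real libguides statistics export may be shaped differently):

```python
# Sketch: average monthly views per guide for each transition period, as
# summarized in figure 1. Assumes a hypothetical export "guide_views.csv"
# with one row per guide per month ("guide_id", "month", "views"); the real
# LibGuides statistics export may be organized differently.
import pandas as pd

views = pd.read_csv("guide_views.csv", parse_dates=["month"])

def period(month):
    if month < pd.Timestamp("2016-01-01"):
        return "v1 (Sep 2014 - Dec 2015)"
    if month < pd.Timestamp("2017-09-01"):
        return "v2, pre-best-practices (Jan 2016 - Aug 2017)"
    return "v2, best practices (Sep 2017 - Apr 2019)"

views["period"] = views["month"].apply(period)

# Mean monthly views per guide within each period, then averaged across guides.
per_guide = views.groupby(["period", "guide_id"])["views"].mean()
print(per_guide.groupby("period").mean().round(1))
```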
testing for transition | lierman, scott, warren, and turner 87 https://doi.org/10.6017/ital.v38i4.11169 as a final note, nearly two years after the best practices were implemented, we collected and compared guide traffic statistics from three key periods: • september 2014 through december 2015, the sixteen months preceding our transition to libguides v2; • january 2016 through august 2017, our first twenty months on libguides v2, during which time best practices had not yet been fully developed and implemented; and • september 2017 through april 2019, from the beginning of best practices implementation through the time of writing (best practices were implemented gradually between september 2017 and february 2018). mindful of the fact that guide usage fluctuates with the academic year, we compared average views for each guide on a monthly basis. figure 1 shows the average number of times each guide was viewed in a month for each period of the transition. as the figure shows, for most of the academic year, guide views dropped sharply after our transition from libguides v1 to libguides v2, and continued to decline slightly with time through the period when our best practices were implemented. there are a number of possible causes for this phenomenon: • guide usage may be declining over time generally for a variety of reasons, and the transition to the new look of v2 may have confused and disoriented users in the immediate aftermath, causing use of some guides to be discontinued. • a substantial number of older guides were eliminated in the transition to v2, some of which may have been more heavily used than suspected, and new guides that have been created since may not yet have gained traction and recognition from users. • librarians may also have reduced their efforts to incorporate guides into their teaching and outreach strategies. • improved organization in the new system may be helping users to find the guide they need on the first try, without having to move through and examine multiple guides. in any case, this trend is concerning and merits further investigation, but a direct correlation with the transition to libguides v2 and the implementation of best practices has not been established. a more accurate measure of the effect of the best practices would be a user satisfaction survey, although a comparison would be difficult to make due to a lack of a baseline from bef ore the transition. we will continue to investigate trends in the use of our guide and how our best practices have affected our users, and how they can be improved upon in the future. information technology and libraries | december 2019 88 appendix a: homepage testing script welcome and demographics hello! thank you for agreeing to participate. i’ll be helping you through the process, and my colleague here will be taking notes. before we get started, i’d like to ask you a few quick questions about yourself. • are you a student? o (no:) ▪ what is your status at uh? (faculty, staff, fellow, etc.) ▪ with what college or area are you affiliated? o (yes:) ▪ are you an undergraduate or a grad student? ▪ what program are you in? ▪ what year are you in now? • how often do you use this library? • how often do you use the libraries’ website or online resources? • about how many hours a week would you say you spend online? • have you ever used the libraries’ research guides before? (if not) have you ever heard of them? are you ready to start? do you have any questions? 
homepage tour first, i’d like to ask you a few questions about the homepage, which you can see here. don’t worry about right or wrong answers, i just want to know your reactions. • when you look at this page, what are your first impressions of it? • just from looking at these pages, what do you think this resource is for? • look at the categories across the top of the screen. what do you think each of those mean? what would you use them for? • what would you call the resources listed here? • we call these resources “research guides.” does that name make sense to you? tasks: odd-numbered participants now we’re going to ask you to complete two tasks using this page and the links on it. this isn’t a test, and nothing you do will be the wrong or right answer. we just want to see h ow you interact with the site and what we can do to make that experience better. do you have any questions so far? let’s begin. please try to talk about what you’re doing as much as possible, and tell us what you’re thinking and why you’re taking each step. 1. you need to find sources for an assignment for your history class, and you aren’t sure where to start. you clicked a link on the help section of the library webpage that led you here. find a guide that you think can help you. 2. you are taking chemistry 1301, and your professor told you that the library has a research guide especially for this class. find the guide you think they meant. testing for transition | lierman, scott, warren, and turner 89 https://doi.org/10.6017/ital.v38i4.11169 tasks: even-numbered participants now we’re going to ask you to complete two tasks using this page and the links on it. this isn’t a test, and nothing you do will be the wrong or right answer. we just want to see how you interact with the site and what we can do to make that experience better. do you have any questions so far? let’s begin. please try to talk about what you’re doing as much as possible, and tell us what you’re thinking and why you’re taking each step. 1. you need to format a bibliography in mla style, and your professor told you that the library has a research guide that can help. find the guide you think she meant. 2. you are taking a psychology course for the first time, and you want find out what types of tools you should use to do research in psychology. you clicked a link on the help section of the library webpage that led you here. find a guide that you think can help you. follow-up questions now i’d like to ask you a few follow-up questions. • was this easy or hard to do? • what was the easiest part? • what was the hardest part? • what did you like about using this site? • what’s one thing that would have made these tasks easier to complete? information technology and libraries | december 2019 90 appendix b: subject guides testing script welcome and demographics hello! thank you for agreeing to participate. i’ll be helping you through the process, and my colleague here will be taking notes. before we get started, i’d like to ask you a few quick questions about yourself. • are you a student? o (no:) ▪ what is your status at uh? (faculty, staff, fellow, etc.) ▪ with what college or area are you affiliated? o (yes:) ▪ are you an undergraduate or a grad student? ▪ what program are you in? ▪ what year are you in now? • how often do you use this library? • how often do you use the libraries’ website or online resources? • about how many hours a week would you say you spend online? • have you ever used the libraries’ research guides before? 
(if not) have you ever heard of them? are you ready to start? do you have any questions? guide impressions first, i’d like to ask you a few questions about this page. don’t worry about right or wrong answers, i just want to know your reactions. • when you look at this page, what are your first impressions of it? • just from looking at this page, what do you think this resource is for? what would you use it for? • what would you call this type of resource? • we call resources like this “research guides.” does that name make sense to you? • if you couldn’t find what you were looking for on this page, what would you do to find help? now we’re going to ask you to complete two tasks using this page and the links on it. this isn’t a test, and nothing you do will be the wrong or right answer. we just want to see how you interact with the site and what we can do to make that experience better. do you have any questions so far? let’s begin. please try to talk about what you’re doing as much as possible, and tell us what you’re thinking and why you’re taking each step. tasks: general business resources guide 1. find a database that you could use for research in a general business class. 2. imagine you want to find information on census data. find an appropriate resource on this guide. 3. find a tool you could use to find a dissertation to use in a general business class. testing for transition | lierman, scott, warren, and turner 91 https://doi.org/10.6017/ital.v38i4.11169 tasks: biology and biochemistry resources guide 1. find a database that you could use for research in a biology class. 2. imagine you want to find information on taxonomy. find an appropriate resource on this guide. 3. find a tool you could use to find a thesis to use in a biology class. follow-up questions now i’d like to ask you a few follow-up questions. • was this easy or hard to do? • what was the easiest part? • what was the hardest part? • what did you like about using this site? • what did you dislike? • what’s one thing that would have made these tasks easier to complete? • did it bother you to have to scroll down the page to find additional information? • if you had been doing this on your own, do you think you would have kept scrolling, or gone to other pages on the guide? • did you notice or read the text below the links? • did the names of the different pages on the guide make sense to you? did you know what to expect? • do you think you would use these resources yourself if you were a student in the appropriate class? information technology and libraries | december 2019 92 appendix c: example survey— social work students screening questions are you a university of houston student, faculty member, or employee? • yes • no are you at least 18 years of age? • yes • no consent university of houston consent to participate in research project title: usability testing of library research guides you are being invited to participate in a research project conducted by ashley lierman, the instructional design librarian, and a team of other librarians from the university of houston libraries. non-participation statement your participation is voluntary and you may refuse to participate or withdraw at any time without penalty or loss of benefits to which you are otherwise entitled. you may also refuse to answer any question. if you are a student, a decision to participate or not or to withdraw your participation will have no effect on your standing. 
purpose of the study the purpose of this study is to investigate user interactions with the research guides area of the uh libraries’ website, in order to understand user needs and expectations and improve the performance of the site. procedures you will be one of approximately fifty subjects to be asked to participate in this survey. you will be asked to provide your initial thoughts and reactions to the libraries’ research guides, and to complete three ordinary research tasks using the page and associated links, then answer followup questions about your experience. the survey includes 23 questions and should take approximately 20-30 minutes. confidentiality your participation in this project is anonymous. please do not enter your name or other identifying information at any point in this survey. testing for transition | lierman, scott, warren, and turner 93 https://doi.org/10.6017/ital.v38i4.11169 risks/discomforts no foreseeable risks or discomforts should result from this research. benefits while you will not directly benefit from participation, your participation may help investigators better understand our users’ needs and expectations from the libraries’ website. alternatives participation in this project is voluntary and the only alternative to this project is non participation. publication statement the results of this study may be published in professional and/or scientific journals. it may also be used for educational purposes or for professional presentations. however, no individual subject will be identified. if you have any questions, you may contact ashley lierman at 713-743-9773. any questions regarding your rights as a research subject may be addressed to the university of houston committee for the protection of human subjects (713743-9204). by clicking the “i agree to participate” button below, you affirm your consent to participate in this survey. if you do not consent to participate, you may simply close this window. • i agree to participate guide impressions click the link below (will open in a new window) and explore the page it leads to, then return to this survey and answer the questions. http://guides.lib.uh.edu/socialwork when you look at the page linked above, what are your first impressions of it? just from looking at the page, what do you think this resource is for? what would you use it for? what would you call this type of resource, if you had to give it a name? if you couldn’t find what you were looking for on the page linked above, what would you do to find help? on the following pages, you will be asked to complete three brief tasks. this is not a test, and nothing you do will be the wrong or right answer. the purpose of these tasks is simply to allow you to experiment with using the guide in an authentic way. when you have completed all of the tasks, you will be asked a few questions about your experiences. http://guides.lib.uh.edu/socialwork information technology and libraries | december 2019 94 first task click the link below to open the social work resources guide (will open in a new window): http://guides.lib.uh.edu/socialwork on the social work resources guide, find a link to a database that you could use to investigate possible psychiatric medications. enter the name of the database you found: second task click the link below to open the social work resources guide (will open in a new window): http://guides.lib.uh.edu/socialwork imagine you want to find a psychological assessment. find an appropriate resource on social work resources guide. 
(you do not need to actually find an assessment, only the name of a resource that would help you locate one.) enter the name of the resource you found: third task click the link below to open the social work resources guide (will open in a new window): http://guides.lib.uh.edu/socialwork on the social work resources guide, find a tool you could use to find historical census data. enter the name of the tool you found: follow-up questions were the tasks on the preceding pages easy or difficult to do? • extremely easy • somewhat easy • neither easy nor difficult • somewhat difficult • extremely difficult what was the easiest part of completing the tasks? what was the most difficult part of completing the tasks? what did you like about using the guide that you were linked to? what did you dislike about using the guide? what is one thing that would have made the tasks easier to complete? demographics thank you for completing the survey! before you leave, please answer a few demographic questions about yourself. http://guides.lib.uh.edu/socialwork http://guides.lib.uh.edu/socialwork http://guides.lib.uh.edu/socialwork testing for transition | lierman, scott, warren, and turner 95 https://doi.org/10.6017/ital.v38i4.11169 are you a student? • yes • no type of student: • undergraduate • graduate • not a student program or major: year in program: • 1st • 2nd • 3rd • 4th • 5th or higher • not a student how often do you use the university of houston libraries? • daily • a few times a week • a few times a month • a few times a year • never how often do you use the libraries’ website or online resources (e.g. databases, catalog, etc.)? • daily • a few times a week • a few times a month • a few times a year • never have you ever used the libraries’ research guides before? • yes • no ending screen we thank you for your time spent taking this survey. your response has been recorded. information technology and libraries | december 2019 96 references 1 “user experience basics,” usability.gov, https://www.usability.gov/what-and-why/userexperience.html. 2 brenda reeb and susan gibbons, “students, librarians, and subject guides: improving a poor rate of return,” portal: libraries and the academy 4, no. 1 (2004): 123-30, https://doi.org/10.1353/pla.2004.0020. 3 martin p. courtois, martha e. higgins, and aditya kapur, “was this guide helpful? users’ perceptions of subject guides,” reference services review 33, no. 2 (2005): 188-96, https://doi.org/10.1108/00907320510597381. 4 william hemmig, “online pathfinders: toward an experience-centered model,” reference services review 33, no. 1 (2005): 66-87, https://doi.org/10.1108/00907320510581397. 5 shannon m. staley, “academic subject guides: a case study of use at san jose state university,” college & research libraries 68, no. 2 (2007): 119-40, http://crl.acrl.org/content/68/2/119.short. 6 michal strutin, “making research guides more useful and more well used,” issues in science and technology librarianship 55 (2008), https://doi.org/10.5062/f4m61h5k. 7 kristin costello et al., “libguides best practices: how usability showed us what students really want from subject guides” (presentation, brick & click ’15: an academic library conference, maryville, mo, november 6, 2015): 52-60; alisa c. gonzalez and theresa westbrock, “reaching out with libguides: establishing a working set of best practices,” journal of library administration 50, no. 5-6 (2010): 638-56, https://doi.org/10.1080/01930826.2010.488941; jennifer j. 
little, “cognitive load theory and library research guides,” internet reference services quarterly 15, no. 1 (2010): 53-63, https://doi.org/10.1080/10875300903530199; dana ouellette, “subject guides in academic libraries: a user-centered study of uses and perceptions,” canadian journal of information and library science 35, no. 4 (2011): 436-51, https://doi.org/10.1353/ils.2011.0024. 8 luigina vileno, “testing the usability of two online research guides,” partnership: the canadian journal of library and information practice and research 5, no. 2 (2010): 1-21. https://doi.org/10.21083/partnership.v5i2.1235. 9 alec sonsteby and jennifer dejonghe, “usability testing, user-centered design, and libguides subject guides: a case study,” journal of web librarianship 7, no. 1 (2013): 83-94. https://doi.org/10.1080/19322909.2013.747366. 10 laura cobus-kuo, ron gilmour, and paul dickson, “bringing in the experts: library research guide usability testing in a computer science class,” evidence based library and information practice 8, no. 4 (2013): 43-59, http://ejournals.library.ualberta.ca/index.php/eblip/article/view/20170. 11 costello et al., 56. https://www.usability.gov/what-and-why/user-experience.html https://www.usability.gov/what-and-why/user-experience.html https://doi.org/10.1353/pla.2004.0020 https://doi.org/10.1108/00907320510597381 https://doi.org/10.1108/00907320510581397 http://crl.acrl.org/content/68/2/119.short https://doi.org/10.5062/f4m61h5k https://doi.org/10.1080/01930826.2010.488941 https://doi.org/10.1080/10875300903530199 https://doi.org/10.1353/ils.2011.0024 https://doi.org/10.21083/partnership.v5i2.1235 https://doi.org/10.1080/19322909.2013.747366 http://ejournals.library.ualberta.ca/index.php/eblip/article/view/20170 testing for transition | lierman, scott, warren, and turner 97 https://doi.org/10.6017/ital.v38i4.11169 12 john j. hernandez and lauren mckeen, “moving mountains: surviving the migration to libguides 2.0,” online searcher 39, no. 2 (2015): 16-21. 13 ouellette, 447; denise fitzgerald quintel, “libguides and usability: what our users want,” computers in libraries 36, no. 1 (2016): 8; sonsteby and dejonghe, 89. 14 costello et al., 56; hernandez and mckeen, 20; sonsteby and dejonghe, 89. 15 caroline sinkinson et al., “guiding design: exposing librarian and student mental models of research guides,” portal: libraries and the academy 12, no. 1 (2012): 74, https://doi.org/10.1353/pla.2012.0008. 16 costello et al., 56; ouellette, 444-45; quintel, 8; kate a. pittsley, and sara memmot, “improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides,” information technology and libraries 31, no. 3 (2012): 56, https://doi.org/10.6017/ital.v31i3.1880; sonsteby and dejonghe, 87. 17 cobus-kuo, gilmour, and dickson, 50; costello et al., 56. 18 sinkinson et al., 74. 19 costello et al., 56. 20 costello et al., 56; hernandez and mckeen, 20; sonsteby and dejonghe, 89; sinkinson et al., 74. 
https://doi.org/10.1353/pla.2012.0008 https://doi.org/10.6017/ital.v31i3.1880 abstract introduction literature review methodology stage 1: card sort stage 2: migration stage 3: face-to-face testing stage 4: survey stage 5: analyzing and implementing results findings card sort face-to-face testing: homepage face-to-face testing: subject guides survey discussion implementation of findings limitations conclusions appendix a: homepage testing script welcome and demographics homepage tour tasks: odd-numbered participants tasks: even-numbered participants follow-up questions appendix b: subject guides testing script welcome and demographics guide impressions tasks: general business resources guide tasks: biology and biochemistry resources guide follow-up questions appendix c: example survey— social work students screening questions consent guide impressions first task second task third task follow-up questions demographics ending screen references reproduced with permission of the copyright owner. further reproduction prohibited without permission. pearls marmion, dan information technology and libraries; mar 2000; 19, 1; proquest pg. 53 pearls ed. note: "pearls" is a new section that will appear in these pages from time to time. it will be ital 's own version of the "top technology trends" topic begun by pat ensor. these pearls might be gleaned from a variety of places, but most often will come from discussion lists on the net. our first pearl, from thomas dowling appeared on web4lib on august 19, 1999 under the subject "pixel sizes for web from : thomas dowling to : multiple recipients of list sent : thu, 19 aug 1999 06:07 :08 -0700 (pdt) subject: [web4lib] pixel s izes for web pages dan marmion pages." he is responding to a query that asked if web site developers should assume the standard monitor resolution is 640x480 pixels, or something else. you may want to consult the web4lib archive for comments from the last few merry go-rounds on this topic. monitor size in inches is different from monitor size in pixels , which is different from window size in pixels, which is d ifferent from the rendered size of a browser's default font. not only are these four measurements different, they operate almost wholly independently of each other . so a statement like "i have trouble reading text at 600x800" puts the blame in the wrong place . html inherently has no sense of screen or window dimensions. many web designers will argue that the only aspects to a page with fixed pixel dimensions should be inline images; such designers typically restrain their use of images so that no single image or horizontal chain of images is wider than, say, 550px (with obvious exceptions for sites like image archives where the main purpose of a page is to display a larger image) . outside of images, find ways to express measurements relative to window size (percentages) or relative to text size (ems). users detest horizontal scrolling. in my experience, users with higher screen resolutions and/or larger monitors are less likely to run any application full screen; average window size on a 1280x1024 19" or 21 " monitor is very likely to be less than b00px wide. (the browser window i currently have open is 587px wide and 737px high .) i applaud your decision to support web access for the visually impaired . since that entails much , much more than monitor resolution, i trust the people actually writing your pages are familiar with the web content accessibility guidelines. 
it is actually possible to design web sites that are equally usable , even equally beautiful, under a wide range of viewing conditions. failing to accomplish that completely is understandable; failing to identify it as a goal is not. my recommendations to your committee would be a) find a starting point that isn't tied up in presentational nitpicking; b) find a design that looks attractive anywhere from 550 to 1550 pixels wide; c) crank up both your workstations ' resolution and font size; and d) continue to run your browsers in windows that are approximately 600 to 640 pixels wide . thomas dowling ohiolink ohio library and information network tdowllng @ohiolink.edu pearls i 53 spatiotemporal distribution change of online reference during the time of covid-19 article spatiotemporal distribution change of online reference during the time of covid-19 thomas gerrish and ningning nicole kong information technology and libraries | december 2022 https://doi.org/10.6017/ital.v41i4.15097 thomas gerrish (tgerrish@purdue.edu) is assistant professor, purdue university libraries and school of information studies. ningning nicole kong (kongn@purdue.edu) is associate professor, purdue university libraries and school of information studies. © 2022. abstract the goal of this project was to identify the impact of the covid-19 pandemic on the spatiotemporal distribution of the library’s online patrons, so that we could assess if the scheduled library reference hours are meeting the needs of the academic community. we collected each online reference patron’s location information via their ip address, as well as the timestamp of each online reference instance. the spatiotemporal distribution patterns were analyzed and compared before and after in-person instruction was suspended due to covid-19 distance protocols and a closing of the campus in the 2020 spring semester. the results show that the geographic origins of reference questions redistributed after covid-19 protocols were initially implemented and the university community underwent a temporary geographical redistribution. reference question origins tended to move away from campus to other areas of the state, other states, and internationally. this population redistribution suggested that the library could adjust the online reference schedule to provide better access and service to patrons. introduction the library’s online reference service, also referred to as library chat or digital reference, is a synchronous text-based interaction between the library and the patron via an internet connection, though audio/video communications are now also available. this online reference service provides a way to meet the information needs of patrons who cannot access the physical library location or prefer virtual communication. in this way, it expands the library’s reference services from the physical location to a virtual environment. when the university community was encouraged to socially distance due to covid-19, online reference became a key library function that maintains the library’s connection to the community and their information needs. this connection became vital when most of the library’s physical services were suspended for a short period during the 2020 spring semester. for many libraries, chat became the only possible way to connect with patrons. 
this increased reliance on online reference and the greater dispersion of the student population led to a looming assessment question with regard to the service: are the hours of online reference convenient for the populations that may live in time zones other than the library’s local time? that is to say, are we available when our patrons are likely to need us? to address the above questions, this study recorded the time stamp and ip address associated with every incoming chat reference from the beginning of the 2019 fall semester to the end of the 2020 fall semester. we evaluated and compared the information from the spatiotemporal distribution pattern change of online reference patrons before and during the covid-19 pandemic. an ip address is a unique string of numeric characters that identifies a particular computer or user over a network, and this number is generally saved in the background with all other information about the chat reference interaction. the ip addresses can be translated to mailto:tgerrish@purdue.edu mailto:kongn@purdue.edu information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 2 gerrish and kong latitude and longitude information using geocoding services. with the latitude and longitude coordinates derived from the ip address, we determined a patron’s location at a city level and time zone. thus, the information helped to evaluate the user population distribution in the world and compare users’ local times to the online reference service’s operation hours. the patrons’ location information as well as the timestamps were evaluated and analyzed in geographic information systems (gis) to provide insights about how the online reference service met the user needs and how the library could improve, if possible. background purdue university libraries serves a large r1 university with a student population of 45,869 as of fall 2020. the online reference service is staffed by approximately 20 professional staff, though this number jumped to 29 when the library’s physical locations closed due to covid-19. online reference at our libraries uses the springshare platform for synchronous chat service with the university community. covid-19 has had a measured effect on online reference hours of operation. prior to covid-19, online reference operated from 11 a.m. to 9 p.m., monday through thursday. on fridays, online reference operated from 11 a.m. to 5 p.m. sundays had truncated hours with the service open from 6 p.m. to 10 p.m. immediately following the move to virtual instruction in march 2020, the administration requested that the online reference service reflect the original hours of the physical library as closely as possible. thus, online reference hours of operation shifted to 7:30 a.m. to 10 p.m. with staff who normally cover the physical reference desks now covering the online service. additional hours were added on saturday afternoon, from 1 p.m. to 5 p.m., and sunday hours were extended from 1 p.m. to 10 p.m. during the university’s virtual instruction phase, only one physical library maintained limited hours of operation, from 8 a.m. to 5 p.m. for local students who needed a computer, wi-fi access, or printing. this was the only in-person service available until august 2020, when libraries began opening with covid-19 restrictions in place. during summer 2020, online reference opening time was moved back to 9 a.m. 
to allow staff to cover the operations of the library, but the evening and weekend hours were maintained. finally, in fall 2020, all physical libraries reopened with operating hours from 7 a.m. to 12 a.m. however, online reference hours did not return to the 2019 model. instead, online reference operated from 9 a.m. to 11 p.m., monday through thursday; 9 a.m. to 5 p.m. on friday; 1 p.m. to 5 p.m. on saturday; and 2 p.m. to 11 p.m. on sunday. figure 1 shows the timeline of the physical libraries operating status and online reference operating hours changes in 2020. prior to covid-19, this reference service mirrored the traditional in-person reference desk in the hours of operation and staff support. indeed, it was originally conceived, structured, and promoted as a supplement to the in-person reference desk, which was stationed at each library. after campus-wide covid-19 restrictions went into place in march 2020, on-campus students were asked to return home. all of the physical libraries except one were closed, and patrons were actively directed to online reference services. as a result, this service underwent changes in its availability and how users accessed it. while the service was already increasing in use, the postcovid-19 period observed a 30% increase as it became more widely used by not only the student population but also the faculty and staff. at this point, online reference became the primary connection between the geographically distributed community and the library. on march 23, 2020, all classes became virtual, and students largely departed campus. this dispersion was not information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 3 gerrish and kong figure 1. physical libraries operation status and online reference hours change in 2020 to accommodate the campus instruction mode change. only applied to students. members of the faculty and staff communities also relocated once it was clear that work would need to be done remotely. in the fall 2020 semester, university classes resumed with a hybrid online/in-person structure. of the total 45,869 students enrolled as of fall 2020, approximately 4,900 students (or 10.7% of the student population) elected online only classes for at least one semester. given that online reference would still have a schedule of 9 a.m. to 11 p.m., sunday through friday, the question became does this serve the students who were potentially located in time zones geographically distant from the university. literature review online reference service has been offered in academic libraries for more than three decades.1 projects that evaluate online reference service, along with technology changes and user community needs, have been conducted ever since.2 for example, mcclure et al. provided a guideline of statistics, measures, and quality standards for evaluating online reference services.3 however, not many studies have been done that assess the needs from patrons’ geographic perspective in order to improve the service. applying gis analysis to improve the library’s services has a comparatively long history. most of those studies are about mapping the interior spaces of libraries and understanding both space and facility use by patrons. 
among the representative articles, xia used gis to map the physical location of library materials against a user’s self-height to gain a better understanding of patron browsing habits.4 weessies used gis to evaluate the likelihood of a computer station’s use in relation to the distance to the library entrance, windows, printers, number of neighbo ring computers, and library service locations.5 given6 and mandel7 combined the traditional library metric of user counts from “visual sweeps” with gis to visualize library space usage, patron preferences, and traffic flow. complementing this, stoddart mapped library spaces against the expected use to visualize how library space was used.8 shen takes the space model a step further information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 4 gerrish and kong by creating a library-space information model that can direct patrons to the shelf location of a given book while also giving the library data on circulation stats vs. book location and shelf height.9 characteristic for this broad grouping of research is the use of gis to analyze patrons, patron behavior, and library resources within the extent of the brick and mortar library building. spaces outside of library buildings have been less examined. historically, the external space of the library has been important with regard to neighborhood service areas, wayfinding, distribution of reference within a consortium, and coverage over larger geographic areas. while many studies focused on the community and a library’s immediate locality and community, donnelly mapped library geographic dispersion on the national level against existing united states populations to examine the variation in local library coverage.10 with the implementation of covid-19 procedures and the ubiquity of online reference services, the external location of patrons and their change in distribution over time has increasingly become an important question that gis can address. there are limited studies using patron information, including ip addresses, to track patron location within an online reference model. in one example, clark geocoded patron addresses to visualize the library’s external service area.11 in an academic library setting, ruttenberg used the ip address to locate on-campus patrons when they asked a question to online reference.12 kinikin studied patron locations in the world and used gis to determine which populations were using the library and its branches while also informing decision makers on areas of low library coverage.13 these ideas were expanded in mon’s study, which geolocated ip addresses for the physical location of a patron who asked a question within the statewide florida electronic library collaborative chat service.14 building on this, bishop compared the originating location of a question against which librarian in a geographically disperse network asked the question as a measure of the utility of local knowledge in the reference process.15 in our study, we expanded upon the previous methods to compare patrons’ spatiotemporal pattern changes before and after the covid-19 pandemic. methods purdue university libraries online reference service uses springshare’s libanswers platform as an interface for chat. the system records each patron’s message text, timestamp, as well as the associated ip address. 
an initial data set of all online reference questions dated from fall semester 2019 until the end of fall semester 2020 (inclusive august 19, 2019, to december 15, 2020) was downloaded from the libchat administration function as a .csv file. this initial data set included fields for ip address, date, time, interaction transcript, patron email (if provided), and name (if provided). this initial data set included 8,754 chat interactions excluding emails, sms text transactions, and questions asked via twitter. all reference interactions were initiated from the library’s website, the ask-a-librarian page, or through the proactive chat widget, which is located on all purdue library webpages. we do not require patrons to identify their relation to the university, their email address, or their status in the university (i.e., undergraduate student, faculty, staff, etc.), so this information is generally unavailable unless the patron self-identifies through the course of the reference interaction. similarly, unless a patron identifies their physical location, reference staff generally do not know where in the world an online reference patr on is located. in practice, most incoming online reference questions are anonymous with no indicators of identity or location. information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 5 gerrish and kong the database was geocoded using the ipinfodb tool to get latitude and longitude information.16 per ipinfodb, the platform’s accuracy is “99.5% on a country level and around 60% on a city level for the us within a 25-mile radius.” the date and timestamps were also associated with the output file. the latitude and longitude coordinates were translated to city, region (i.e., state), and country. time zone information was added to each record according to its geographic location. the final data set contained separate fields for latitude, longitude, state or region, country, time zone, and timestamp, which includes both date and time. at this point, all personal information embedded in the original data set was de-identified, though certain conversations could potentially identify users. one potential limitation to the data set is that if a patron used the university’s vpn network, the reference interaction would be georeferenced with an ip address on campus. indeed, any vpn network would not report the reference interaction’s originating location correctly. the data from each semester was broken down and sorted separately. in addition, r ecords in spring semester 2020 for both preand post-covid-19 restrictions were compared to measure the redistribution of a question point of origin due to covid-19. the data were spatially plotted and analyzed using arcgis pro. results figure 2. total digital reference transactions per semester. 0 500 1000 1500 2000 2500 3000 fall 2019 spring 2020 summer 2020 fall 2020 total transactions per semester information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 6 gerrish and kong the digital reference transaction data were compiled and analyzed on a semester basis except for spring 2020, when all classes moved to 100% online on march 23, 2020. for the spring 2020 semester, the data were split into pre-march 23 (i.e., pre-covid-19 restrictions) and post-march 23 (i.e., post-covid-19 restrictions). 
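as a rough sketch of the geocoding step described in the methods above, the following python outline reads the exported chat records, looks each ip address up against a geolocation service, and writes the enriched fields used later in the analysis. the endpoint, parameter names, and response fields shown here are placeholders rather than ipinfodb's actual interface, and the column names in the export are assumptions for illustration:

```python
# Sketch: enrich exported LibChat records with location and time zone fields.
# Assumptions (not from the article): the export is "chats.csv" with columns
# "ip" and "timestamp", and GEO_URL/API_KEY stand in for whichever geocoding
# service is used (the authors used ipinfodb; its actual request format and
# response field names may differ from those shown here).
import csv
import requests

GEO_URL = "https://api.example-ip-geocoder.com/lookup"  # placeholder endpoint
API_KEY = "YOUR_KEY"

def geolocate(ip):
    """Return (lat, lon, city, region, country, tz) for an IP, or Nones on failure."""
    resp = requests.get(GEO_URL, params={"ip": ip, "key": API_KEY}, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    return (data.get("latitude"), data.get("longitude"), data.get("city"),
            data.get("region"), data.get("country"), data.get("time_zone"))

with open("chats.csv", newline="", encoding="utf-8") as src, \
     open("chats_geocoded.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    fields = reader.fieldnames + ["lat", "lon", "city", "region", "country", "tz"]
    writer = csv.DictWriter(dst, fieldnames=fields)
    writer.writeheader()
    for row in reader:
        # Identifying columns (patron name/email), if present, could be dropped here.
        try:
            lat, lon, city, region, country, tz = geolocate(row["ip"])
        except requests.RequestException:
            lat = lon = city = region = country = tz = None  # keep row, mark unlocatable
        row.update({"lat": lat, "lon": lon, "city": city,
                    "region": region, "country": country, "tz": tz})
        writer.writerow(row)
```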
the total number of chat interactions generally grew for each semester starting with fall 2019 (fig. 2). the number of summer chat interactions was relatively fewer because it was a shorter semester with fewer students taking classes as compared to the fall and spring semesters. we analyzed the spatial and temporal distribution of patrons before and after the covid-19 pandemic in the following sections. spatial distribution of the patrons before and after implementation of covid-19 protocol the spatial distributions of digital reference patrons before and after the pandemic are mapped in figure 3. there is a trend showing that after the start of the pandemic, patrons were more geographically distributed within the unites states and around the world. we mapped the international distribution of patrons in fall semester of 2019 and 2020 (see fig. 3(a) and 3(b)), as most of the international students make their travel plans by semester. there is a significant increase in questions coming from india, several european countries, and south america. we compared the spatial distribution of patrons before and after the implementation of covid-19 protocols in spring 2020 within the united states as the time frame is more suitable for domestic travel plans. figure 3(c) and 3(d) also show an increase of patrons around the country other than the campus area. figure 3. spatial distribution of patrons before and after initial covid-19 protocols closed the campus. (a) 2019 fall semester (b) 2020 fall semester (c) 2020 spring pre-covid-19 (d) 2020 spring post-covid-19 information technology and libraries december 2022 spatiotemporal distribution change of online reference during the time of covid-19 7 gerrish and kong to help us further understand this spatial distribution change, we divided the geographic regions into four categories: local areas (west lafayette and lafayette areas of indiana, where the purdue university main campus is located), other indiana areas, other states apart from indiana, and areas outside of the united states. the transactions in these four regions are shown in table 1, and the percentages of transactions from each region were summarized in figure 4. in fall 2019, which is considered the last “normal” semester prior to the covid-19 response, 60% of reference questions originated in the immediate local area to campus (defined as the local area). the beginning of spring 2020 followed closely to this number with 56% of the total questions originating in the local area of the university. this proportion dropped at the march 23 boundary with only 29% of reference questions originating from the immediate local area to the university. during this period, there was an increase in questions originating in the state of indiana as well as other regions of the united states and the world. during summer 2020, 35% of the total number of chats originated in the campus area. in fall 2020, classes were offered as a combination of hybrid, in-person, and virtual formats; however, the proportion of questions originating in the immediate area did not return to fall 2019 levels. instead, only 43% of questions originated from the campus area. figure 4. the percentage of transactions before and after the population redistribution due to the implementation of covid-19 protocols in four geographic areas: local area (campus), indiana, united states outside of indiana, and international. 
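the four-region grouping behind table 1 and figure 4 amounts to a simple classification of each geocoded record followed by a per-period tally. a minimal sketch, assuming the enriched records from the geocoding sketch plus a "period" label (semester, with spring 2020 split at march 23) derived from the timestamp; the city spellings used to define the local area are also assumptions:

```python
# Sketch: bucket geocoded chats into the four regions used in table 1 / figure 4.
# Assumes the enriched export from the geocoding sketch plus a "period" column
# (semester, with spring 2020 split at March 23). The exact city, region, and
# country strings returned by the geocoder are assumptions for illustration.
import pandas as pd

LOCAL_CITIES = {"west lafayette", "lafayette"}  # local area around the main campus

def classify(row):
    if row["country"] != "United States":
        return "outside US"
    if row["region"] != "Indiana":
        return "outside state"
    if str(row["city"]).lower() in LOCAL_CITIES:
        return "local area"
    return "Indiana, not local"

chats = pd.read_csv("chats_geocoded.csv")
chats["origin"] = chats.apply(classify, axis=1)

# Counts and within-period percentages, mirroring table 1 and figure 4.
counts = chats.groupby(["period", "origin"]).size().unstack(fill_value=0)
percent = (counts.div(counts.sum(axis=1), axis=0) * 100).round(1)
print(counts)
print(percent)
```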
table 1. digital reference transactions coming from different geographic regions

semester                     total trans.   local area   indiana, not local   outside indiana   outside us
fall 2019                    1928           1156         203                  493               76
2020 spring pre-covid-19     1168           656          205                  273               34
2020 spring post-covid-19    1119           323          349                  389               58
summer 2020                  1909           676          552                  600               81
fall 2020                    2630           1118         670                  714               128

in general, the geography of origin was redistributed primarily to the state of indiana, followed by the united states as a whole, and then to international locations. fall 2019 saw 11% of virtual reference questions originating in indiana but outside of the local area. this increased to 18% in spring 2020 prior to the implementation of campus covid-19 restrictions. after march 23, 2020, when all classes went online, the percentage of questions in indiana outside of our campus area rose to 31%. this proportion remained steady during summer 2020, when 29% of questions originated in indiana outside of the local area. in fall 2020, when in-person classes resumed, the proportion dropped to 26%, but this remains more than two times the proportion measured during fall 2019. this pattern of redistribution of geographic origin was repeated in the data points outside the united states, though the fluctuation due to covid-19, while similar, occurred to a lesser degree. in fall 2019, 3.9% of questions arrived from geographic origins outside of the united states. in spring 2020, prior to covid-19 restrictions, this number dropped to 2.9%. after classes went virtual and students moved off campus, the percentage of questions increased to 5.2%. in fall 2020, the proportion dropped to 4.9%, but this is still not a return to fall 2019 levels.

the distance of digital reference patrons to main campus

to analyze the spatial distribution change of digital reference patrons, we calculated the distance of each patron to our main campus. a small portion of ip addresses (less than 4%) which couldn't be correctly located was excluded from the analysis. figure 5 represents the distance distributions in a box and whisker diagram. the horizontal lines within the boxes show the median of the data sets. in both fall 2019 and early spring 2020, the median distances are about 400 miles or less, which is within the local area. this indicates that most of the digital reference questions come from patrons who live around the main campus. the median distance increased to 1,000 miles after classes were moved online in spring 2020, which is about the typical distance of traveling within the state of indiana. the maximum value was extremely high, coming from international countries. this indicates that a significant portion of the patrons moved outside of the local area. in fall 2020, although the maximum distance dropped to a similar range as in normal semesters, the median and average values of the distance dataset are still much higher than the time before the pandemic.

figure 5. the box and whisker diagram shows the distance from ip addresses to campus (in miles).
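the distance analysis summarized in figure 5 was performed in arcgis pro; an equivalent great-circle (haversine) calculation can be sketched in a few lines. the campus coordinates below are approximate, and the column names continue the earlier sketches:

```python
# Sketch: great-circle distance from each geocoded chat to the main campus,
# in miles, approximating the distance analysis behind figure 5 (the authors
# used ArcGIS Pro). Campus coordinates are approximate; column names follow
# the earlier sketches, including the assumed "period" label.
from math import radians, sin, cos, asin, sqrt
import pandas as pd

CAMPUS_LAT, CAMPUS_LON = 40.4237, -86.9212  # approximate West Lafayette, IN
EARTH_RADIUS_MI = 3958.8

def haversine_miles(lat, lon):
    """Great-circle distance from (lat, lon) to the main campus, in miles."""
    phi1, phi2 = radians(CAMPUS_LAT), radians(lat)
    dphi = radians(lat - CAMPUS_LAT)
    dlmb = radians(lon - CAMPUS_LON)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * asin(sqrt(a))

chats = pd.read_csv("chats_geocoded.csv").dropna(subset=["lat", "lon"])
chats["distance_mi"] = [haversine_miles(la, lo) for la, lo in zip(chats["lat"], chats["lon"])]

# Per-period summary comparable to the box-and-whisker plot in figure 5.
print(chats.groupby("period")["distance_mi"].describe())

chats.to_csv("chats_with_distance.csv", index=False)  # reused in the later sketches
```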
to test the statistical significance of the distance values in different time periods, we conducted anova tests for the distances in the spring 2020 semester comparing before and after the pandemic, as well as a comparison between fall 2019 and fall 2020. the test results are shown in table 2. both tests show there are significant differences before and after the classes were moved online. this means the pandemic situation significantly changed the patron distances to the main campus, with p-values < 0.05. we likewise compared the distances between spring 2020 post-pandemic and fall 2020; no significant difference was found. although the university started to offer in-person classes in fall 2020, and most of the students were back on campus, there were still quite a few questions coming from students and faculty who were not living in the geographic area around campus.

table 2. the anova test results of the patrons' distance to the main campus before and after the pandemic protocols were implemented (in miles)

summary
groups             count   sum       average   variance
spring 2020 pre    1134    323,361   285.15    8e+05
spring 2020 post   1061    592,238   558.19    2e+06

anova
source of variation   ss        df     ms        f          p-value
between groups        4.1e+07   1      4.1e+07   27.54942   0.0000
within groups         3.3e+09   2193   1.4e+06
total                 3.3e+09   2194

summary
groups      count   sum     average   variance
fall 2020   2502    1e+06   500.95    2e+06
fall 2019   1852    7e+05   385.43    2e+06

anova
source of variation   ss        df     ms        f          p-value
between groups        1.4e+07   1      1.4e+07   7.104865   0.0077
within groups         8.7e+09   4352   2.0e+06
total                 8.7e+09   4353

temporal distribution of the questions

while the spatial distribution of reference questions allowed us to understand where patrons were located, the temporal distribution of these questions helped us to plan the digital reference service hours to better meet patrons' needs. we analyzed the temporal distribution of questions by the day of the week for fall 2019 and fall 2020. figure 6 shows the median distances of the questions for each day of the two semesters. the distance was broken down into six ranges differentiated by color. from the nearest distance range to the ranges above, the analysis covers questions coming from around campus, in the local area, within indiana, around the eastern united states (with mostly eastern and central time zones), the entire united states, and international locations. in fall 2019, we provided digital reference service sunday through friday, and in fall 2020, the service was provided every day except holidays. in fall 2019, the monday-to-friday range showed that most digital reference questions came from the campus area, especially in the first half of the semester. questions from farther distances within indiana started to occur more often after november, probably due to holiday-related travel. relatively remote questions came more often on sundays. one possible explanation for this difference is that students and faculty might travel away from campus during weekends. interestingly, the fall 2020 weekly distribution of questions shows a different pattern. first, most median distances are greater than in the fall 2019 semester, which means there were many questions coming from people living off campus, whether at the beginning of the semester or later. second, there was no obvious difference between the median distances during the weekdays and weekends.
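the comparisons reported in table 2 above are one-way anova tests on the distance values for two periods at a time. a minimal sketch using scipy, assuming the distance column and period labels from the earlier sketches (with only two groups this is equivalent to a two-sample t-test, but it mirrors the layout reported in table 2):

```python
# Sketch: one-way ANOVA comparing patron distances to campus between two
# periods, as in table 2. The file name, "distance_mi" column, and period
# label strings are assumptions carried over from the earlier sketches.
import pandas as pd
from scipy.stats import f_oneway

chats = pd.read_csv("chats_with_distance.csv")

pre = chats.loc[chats["period"] == "spring2020_pre", "distance_mi"]
post = chats.loc[chats["period"] == "spring2020_post", "distance_mi"]
f_stat, p_value = f_oneway(pre, post)
print(f"spring 2020 pre vs. post: F = {f_stat:.2f}, p = {p_value:.4f}")

fall19 = chats.loc[chats["period"] == "fall2019", "distance_mi"]
fall20 = chats.loc[chats["period"] == "fall2020", "distance_mi"]
f_stat, p_value = f_oneway(fall19, fall20)
print(f"fall 2019 vs. fall 2020: F = {f_stat:.2f}, p = {p_value:.4f}")
```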
figure 6. the median distance of reference questions in miles by day of the week (left: fall 2019, right: fall 2020).

in addition, we analyzed the hourly distributions of digital reference questions for fall 2019 and fall 2020 (fig. 7). in fall 2019, the median distances show that questions came mostly from campus, especially during the peak hours. where the median distances were not from the campus area, they were at most in the local area, where most off-campus students, faculty, staff, and research community members live, or within greater lafayette. remote questions came from other time zones, such as international time zones or the pacific time zone, and usually arrived in either the first or the last hour of digital reference operating hours. in fall 2020, this distribution pattern changed. most of the median distances were around 200 miles, which means that a large portion of the questions came from off-campus populations. there were additional time slots with median distances above 2,000 miles, coming from time zones at least two hours removed from our campus. again, these questions were most often observed in early or late reference service hours, i.e., 8 a.m. to 9 a.m. or 10 p.m. to 11 p.m.

figure 7. the median distance of reference questions by hour of day (eastern time zone); (a) fall 2019, (b) fall 2020. distance ranges in miles: <= 5.6, <= 22, <= 188, <= 500, <= 2,000, > 2,000.

discussion and conclusion
covid-19 and the protocols developed in response to it had a redistributing effect on the geographic origin of reference questions in academic libraries. as the university closed and moved to virtual classes early in the pandemic, the geographic origin of reference questions shifted away from campus. in our case, origins tended to move from the campus to nearby areas within the state and neighboring states, though there was redistribution away from campus at the state, regional, national, and international levels. as of fall 2020, when the campus partially reopened, these numbers had begun reversing themselves, but a significant population remained beyond the campus and the local time zone. fall 2020 distribution numbers still show some of the redistribution effects observed early in the pandemic. there was a surprising lack of questions coming from the russian federation, china, and central asian countries, given that our university does have students from these countries. this may be due to the use of vpns by users in these countries when accessing library resources: if a user connects through a vpn, the question is recorded as having a geographic origin at the vpn provider's location rather than in the country of origin. this is one possible explanation for the lower numbers observed for china and the absence of users from the russian federation, eastern europe, and central asia.
this study demonstrated the broadening of our library's geographic footprint in response to covid-19 protocols. students, faculty, and staff were not bound to campus and were free to study and work anywhere with internet access. with populations distributed around the country and the world, the expansion of reference hours was necessary. prior to covid-19, online reference operated from 11 a.m. to 9 p.m. this meant that students studying virtually in the pacific time zone experienced effective reference desk hours of 8 a.m. to 6 p.m., which eliminated access during the evening hours. when the library extended online reference hours during the covid-19 lockdown to 7:30 a.m. to 10 p.m., this somewhat improved accessibility for patrons in the pacific time zone, creating effective reference hours of 4:30 a.m. to 7:00 p.m. pst. the contrast becomes even starker when examining international students studying online in much more distant countries. many students from india returned to their home country in spring 2020 following the move to virtual learning. for students studying in india, the online reference desk in pre-covid-19 times would have had effective hours of 8:30 p.m. to 6:30 a.m. ist (india standard time), which forced this population to interact with the library during their evenings and nights. the expanded reference hours improved this access to 5 p.m. to 7:30 a.m. ist. while this is better, it still forces students in this part of the world to interact with the library during the evenings and nights and excludes daytime hours.

interestingly, the data from fall 2019 seem to indicate that there was an international population prior to covid-19. these were likely students studying abroad, taking some of the early online classes, or simply traveling. thus, the distributed online reference user population is nothing new, but it has been exacerbated by covid-19 and the expansion of online classes. the number of international reference interactions can therefore be expected to decrease as covid-19 restrictions are gradually relaxed, but it will not go to zero.

endnotes
1 joseph janes, david carter, and patricia memmott, “digital reference services in academic libraries,” reference & user services quarterly 39, no. 2 (1999): 145–50, http://www.jstor.org/stable/20863724.
2 carol tenopir and lisa ennis, “a decade of digital reference 1991–2001,” reference & user services quarterly 41, no. 3 (2002): 264–73, http://www.jstor.org/stable/41241123.
3 charles r. mcclure, r. david lankes, melissa gross, and beverly choltco-devlin, “statistics, measures, and quality standards for assessing digital reference library services: guidelines and procedures,” eric clearinghouse on information & technology (2002).
4 jingfeng xia, “library space management: a gis proposal,” library hi tech (2004), https://doi.org/10.1108/07378830410570476.
5 kathleen w. weessies, “a locational analysis of academic library computer use,” reference services review (2011), https://doi.org/10.1108/00907321111175868.
6 lisa m. given and heather archibald, “visual traffic sweeps (vts): a research method for mapping user activities in the library space,” library & information science research 37, no. 2 (2015): 100–8, https://doi.org/10.1016/j.lisr.2015.02.005.
7 lauren mandel, “visualizing the library as place,” performance measurement and metrics (2016), https://doi.org/10.1108/pmm-04-2016-0016.
8 rick stoddart and bruce godfrey, “gathering evidence of learning in library curriculum center spaces with web gis,” evidence based library and information practice 15, no. 3 (2020): 21–35, https://doi.org/10.18438/eblip29721.
9 yaqui shen, “library space information model based on gis—a case study of shanghai jiao tong university,” information technology and libraries 37, no. 3 (2018): 99–110, https://doi.org/10.6017/ital.v37i3.10308.
10 f. p. donnelly, “regional variations in average distance to public libraries in the united states,” library & information science research 37, no. 4 (2015): 280–89, https://doi.org/10.1016/j.lisr.2015.11.008.
11 philip m. clark, “thematic mapping, data mapping, and geocoding techniques for analyzing library and information center data,” journal of education for library and information science (1995): 330–41.
12 judy ruttenberg and heather tunender, “mapping virtual reference using geographic information systems (gis),” poster presented at the ala conference, orlando, fl, june 2004, https://web.archive.org/web/20040808050212/http://helios.lib.uci.edu/question/gis-ala2004/campusmap2004-2.jpg.
13 janae kinikin, “applying geographic information systems to the weber county library system,” information technology and libraries 23, no. 3 (2004): 102.
14 lorri mon et al., “the geography of virtual questioning,” the library quarterly 79, no. 4 (2009): 393–420, https://doi.org/10.1086/605381.
15 bradley wade bishop, “location-based questions and local knowledge,” journal of the american society for information science and technology 62, no. 8 (2011): 1594–603, https://doi.org/10.1002/asi.21561.
16 “ip address information,” ipinfodb, https://ipinfodb.com/.

editorial board thoughts
tackling the big projects: do it yourself or contract with a vendor?
laurie willis
information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.12067
laurie willis (laurie.willis@sjlibrary.org) is web services manager, san jose public library, and a member of the information technology and libraries editorial board. copyright © 2020.

everyone who works with library technology sooner or later finds they are faced with a major project to tackle. sometimes we contract with a vendor to do the bulk of the work; sometimes we do the project ourselves. there are advantages and disadvantages to both methods. here at san jose public library we were faced with two large projects at once: a website migration/redesign and a new catalog discovery layer.
we considered bibliocommons as the vendor for both projects. they offer both a website product (biblioweb) and a discovery layer (bibliocore). we opted to complete the website migration/redesign ourselves using open source software, migrating from our previous drupal 7 platform to drupal 8, and to contract with bibliocommons to provide our new discovery layer. this put us in an unusual position: we were implementing a website migration/redesign ourselves while simultaneously working, on the catalog discovery layer, with the vendor we would likely have chosen for the website project. this gave us the opportunity to compare the experience of implementing the website project ourselves with what the same project might have been like if we had been working with a vendor.

what we learned

timing
not surprisingly, completing the website project on our own took longer than expected.
• learning curve: we expected there to be a learning curve, but it turned out to be significantly steeper than anticipated.
• unknowns: in addition to basic learning, we also came across functionality that didn't work as expected.
• failures: there were times when what we tried to do didn't work at all and we had to backtrack.
timing for the vendor-led project, on the other hand, kept to the planned timeline.
• prescribed timeline: as part of their contract, the vendor provided a timeline at the outset. we made small adjustments, but for the most part the project stayed on time.
• predictability: the vendor has completed many similar projects, so they had a solid idea of what to expect and how long it would take.
• problem solving: some challenges unique to our situation did arise and caused some delays.

control
the ability to have more control over the project results was a significant factor in our decision to complete the website project ourselves. we had the opportunity to make choices and also faced the challenge of a sometimes-overwhelming number of options.
• options: many options were available to us. we had choices regarding structure (website platform and theme), design, and content.
• overwhelm: the plethora of options encouraged a tendency to spend a lot of time (too much?) “shopping,” researching and evaluating options.
• we completed a thorough audit of our content and created a new site based on our needs.
• user experience (ux) testing: we were able to perform testing with our users and adapt our website to better fit their needs.
working with a vendor, on the other hand, limited what we were able to do, but the decision-making process was easier and faster.
• we had the option to select colors, but otherwise the structure and design were fixed.
• we had some control over textual content within the parameters given; for example, we could add links to the footer, but the number of links allowed was limited.
• little time was spent making these decisions.
• it's a challenge fitting unique content into a predetermined format.
• user experience (ux) testing: the vendor is able to include a wider sampling of people while testing, but they're not able to specifically consider our local users.

implementation
for the website project, implementation turned out to be more complex than expected.
• learning: as mentioned above, there were many new things to learn that came up as the project progressed.
• consultant: we came up with technical questions that were beyond the scope of our knowledge.
we found it extremely helpful to contract with a consultant for guidance.
• conflicting responsibilities: we worked on this project while continuing with our normal workload and maintaining the current website. we were also simultaneously working on the discovery layer implementation.
the vendor-led implementation went more smoothly.
• learning: the vendor assigned a project manager, who was available to guide us through the process. the vendor also provided documentation that walked us through it.
• expertise: when challenges did arise, the vendor had experienced staff to help us work through them.
• staff time: although the vendor did most of the work, the project did consume significantly more staff time than expected as we worked through every detail.

training and marketing
• staff: for the website, we had to create our own training for staff. for the catalog, the vendor offered webinars for staff and sent a trainer to do in-person training.
• public: the vendor offered samples of materials from other libraries to both inform and educate the public. since both projects were launching at the same time, we were able to adapt some of these materials to include both.

cost
the cost of hiring a vendor initially seems steep, but staff time is also expensive. considering the unexpected additional staff time spent, it likely would have been less expensive to choose the vendor option.

conclusion
there are pros and cons to both methods: completing a project on your own or working with a vendor. whether your project is a new website, a catalog, or something else entirely, learn as much as you can about what will be involved before you decide on an approach. weigh your options by looking at your needs and the resources and time available to you. the primary aspects to consider are:
• do staff have the necessary expertise to complete the project? will there be a learning curve? are staff prepared and willing to learn new things and figure things out? if you are considering a vendor, do they have a training plan for your staff?
• how much time is available? is there a deadline? if there is a deadline, what will be the costs if it needs to be extended? if you are considering a vendor, how committed are they to achieving the prescribed deadline?
• which is more important to you: control and flexibility, or ease of implementation?
• what resources are available if you have questions? if you work on your own, are there people and online resources you will be able to turn to? if you are considering a vendor, will you be assigned a representative to walk you through the process?
for our particular situation, i believe we made the right choice to complete the website project on our own. staff had enough expertise that they were willing and able to learn the necessary skills, calling upon a consultant when needed without outsourcing the entire project. while we had an expected timeline, we were able to extend it with only minor consequences (paying for additional web hosting while the project was under construction). we maintained the control and flexibility we needed in order to present some of the unique services and spaces that our library offers, which might have been lost using a vendor package. we had some knowledge of consultants working in the field and were able to hire one to show us how to proceed when we were over our heads. we also relied heavily on tutorials and other training resources posted online.
whatever you decide, taking time to think things through before beginning will help make your project a success. what we learned timing control implementation training and marketing cost conclusion highlights of isad board meeting 197 4 midwinter meeting chicago, illinois monday, january 21, 1974 43 the meeting was called to order at 10:15 a.m. by president frederick kilgour. those present were: board-frederick g. kilgour, lawrence w. s. auld, paul j. fasana, donald p. hammer (isad executive secretary), susan k. martin, ralph m. shoffner, and berniece coulter, secretary, isad. guest-brett butler. midwinter 1973 minutes approved. motion. mr. shoffner moved to approve the minutes of the midwinter 1973 board meetings. seconded by mr. fasana. carried. las vegas annual meeting minutes accepted. a correction on page one of the las vegas annual meeting minutes was noted: mr. auld's name should be added to the list of guests present. motion. mr. fasana moved that the minutes of the isad board meetings at the las vegas annual conference be accepted as corrected. seconded by mrs. martin. carried. isad history committee. the matter of appointing members to the isad history committee, whose function is to prepare a history of isad for ala's centennial celebration in 1976, was considered. mr. shoffner said that during the time he was president, he had rendered the isad history committee inactive. it was suggested by mr. kilgour that a historian would serve the purpose better than a committee. mr. shoffner remarked that he anticipated the chairman would be a historian. mrs. martin asked whether a check could be made first whether ala is planning to publish any document for the centennial celebration that would make any preparation by an isad committee or historian worth while. mr. kilgour remarked that isad definitely should be included if ala did plan to publish any document and asked the board to give an "ok" to appoint a historian. motion. mr. fasana moved that the ad hoc isad history committee 44 journal of library automation vol. 7/1 march 197 4 be abolished and recommended that the president be given the right to appoint a historian if ala planned to publish a centennial document. seconded by mr. auld. carried. ala dues structure. mr. hammer explained the information submitted to the board concerning the proposed ala. dues structure. the basic fee for ala membership under this proposed dues structure would be $35. membership in each division would be an additional $15. in es~ sence, each division would be on its own financially:· if there are not enough memberships to support a division, as could be the case, the division would cease to exist. !sad could support itself with its present membership, but there is no. way of knowing how many !sad members would still select !sad if the choice of two divisions included in the dues was removed. the divisions that publish a journal would attract membership much more easily than those that do not provide a journal. mr. hammer further remarked that the proposed dues schedule indicates that the divisions must prove themselves with membership dues as their only support, but this does not apply to ala committees, scmai, units such as the office for intellectual freedom, office for library service to the disadvantaged, and the administrative and support units of ala. these units may be of great value to ala, but if one tinit is forced to prove its value financially, then it seems that all should have to prove themselves. 
the divisions would be expected to depend on their own resources, e.g., if the division runs out of postage ·money, there would be no further mailings. the divisions would be expected to pay for their support services.· the idea is very closeto the federation plan which has been circulated for some time. in answer to the question of how a new division would get started, mr. hammer replied that he assumed there would have to be enough memberships to provide for it financially. mr. shoffner suggested that the discussion be divided into two parts: ( 1) the principle involved; and ( 2) the financial aspect. · the following points were brought up in the ensuing discussion by the board regarding the proposed dues structure: . starting a new division could be a problem; perhaps it could be subsidized for a stated time, after which the division: would be self-sufficient. the proposed separation of dues, however, would force a clarity in ex~ penditures of. ala in respect to how the divisions would benefit. some divisions could not be self-supporting and yet are producing important contributions for ala. ' ' a division would be at the mercy of the ala supporting units. if a sup~· port unit was not efficient, the divisions would be handicapped in the services to their members. would a division be able to know enough in advance how much money could be counted on for program planning? the answer was "yes" based highlights of meetings 45 on past membership, except in the first year. the income would be predicted on the basis of the previous year's income. an excess of income would remain in the division's funds. if the division income fell short of the anticipated amount, it would have no back-up from ala as it has presently. a person could not join one or more of the divisions without joining ala. some divisions could become part of a stronger division, e.g., a division could be broken up and absorbed into several other divisions with related interests. was there any plan to absorb or redirect these divisions which obviously could not be self-supporting? nothing has been announced so far. if a division got into financial difficulties, it could not cut down on its professional staff as a professional staff is needed to maintain ala's status with the internal revenue service. it was noted that there were more important reasons than this for maintaining a professional staff . . · this proposal was drafted by the then deputy director ruth warncke in 1970. the board was informed that a cost study of ala was recently discussed by staff members, but the reply has been that it would take five years to make such a study. the isad board disagreed with the period of five years, but stated that it could take a year. . · a division should be allowed to set up its own budget under this pro~ posal as well as have a voice in ala policy. · · the proposal appeared to be unfair in some points: ( 1) some divisions would have about twice their present income through memberships, while isad would break about even; ( 2) life members would be entitled to membership in all divisions; ( 3) apparently institutions without a group insurance plan of their own could join ala for $35 and be entitled to the gioup insurance for their staffs; at some point an examination of the privileges in each category of membership should be made; and ( 4) if the $35 ala membership fee were increased in the future, this would directly affect membership in the divisions. 
the isad budget for the 1973/74 year is approximately $47,000 and the journal of library automation $23,000, or a total of approximately $.70,000. if isad membership should fall back to 3,000 members and the membership fee were $25, isad could still be viable. "mr. kilgour's poll of the board revealed all were in favor of the principle of more or less independent divisions, but with reservations; the following was therefore moved: · 'motion. mr. shoffner moved that the isad board favors the prin. ciple of divided annual fees for ala and for its divisions subject to: ' · ( 1) division determination of the fee structure for division memberships and publications; ( 2) division participation in the governance of ala headquarters activities. seconded by mr. fasana. motion carried. ·selective dissemination of information system. mr. 46 journal of library automation vol. 7/1 march 1974 hammer presented a proposal for establishing on a subscription basis a selective dissemination of information system for ala members (see exhibit 1). mter discussion it was decided that mr. hammer would contact ohio state university library and obtain information on exact procedure as to how this would be run, how it would be publicized, who would develop the profiles, who would handle the subscriptions, the cost to the division, etc., and then repmt to the board. co-sponsorship of basic data processing seminars. mr. hammer presented a proposal to the board regarding co-sponsorship of basic data processing seminars with organizations outside isad, such as ibm and dataflow systems, inc. in bethesda, maryland. in the past isad seminars have generally been on library applications, but what he had in mind, mr. hammer said, was primarily on the basics of data processing, systems analysis, and other basic aspects that would be of interest to administrators. the intent would be to give administrators enough knowledge so that they could evaluate the results that they should be gaining from their data processing systems. these institutes would be a package deal in that the personnel and materials would be commercially supplied, dataflow has conducted seminars for the united states civil service commission. ibm has some seminars which are free, but there is a charge if they have to develop a special program. comment was made regarding seminars conducted several years ago where problems developed as to the commercial aspects. motion. it was moved by mrs. martin that the matter of !sad's cosponsoring basic data processing seminars with outside organizations be referred to the isad program planning committee for discussion and their evaluation. seconded by mr. fasana. carried. tuesday, january 22, 1974 the meeting was called to order by the president, mr. kilgour, at 2:25 p.m. those present were: board-frederick g. kilgour, lawrence w. s. auld, paul j. fasana, donald p. hammer (isad executive secretary), susan k. mqrtin, ralph m. shoffner, and berniece coulter, secretary, isad. guests-alex allain, brigitte kenney, ron miller, and velma veneziano. draft on ala goals and objectives. mrs. brigitte kenney sought feedback from the board on the paper previously distributed on the ala committee on planning's draft statement on ala's goals and objectives. several changes were suggested. mrs. kenney expressed her appreciation for their input. freedom to read foundation. mr. alex allain from the foundation presented the cause of the freedom to read foundation in rehighlights of meetings 47 gard to the current problem of censorship. 
he stressed the desire to keep channels open with the divisions of ala and with systems and networks across the nation. marbi and isad standards committee (tesla). velma veneziano, chairman of the marbi interdivisional committee, appeared before the isad board requesting clarification of the functions of marbi and the isad standards committee ( tesla). she said that her committee would like discrepancies cleared up and duplications eliminated. mrs. martin suggested that the charges to both marbi and tesla be reworded to clarify their functions. isad bylaws committee. in response to discussions concerning the establishment of several committees, mr. shoffner moved to establish an organization committee. seconded by mrs. martin. mr. fasana pointed out that the mechanism for establishing a bylaws committee was already spelled out in the isad constitution. the president can appoint the committee. motion withdrawn. mr. shoffner withdrew his motion. mr. fasana suggested that the bylaws committee also be charged with the organizational and review function. the matter of the standards committee's function was also made the charge of the bylaws committee. wednesday, january 23,1974 president kilgour called the meeting to order at 10:15 a.m. those present were: board-frederick g. kilgour, lawrence w. s. auld, paul j. fasana, donald p. hammer ( isad executive secretary), susan k. martin, ralph m. shoffner, and berniece coulter, secretary, isad. guestsbrett butler, john kountz, ann painter, charles payne, james rizzolo, richard utman, velma veneziano, and david waite. report of the nominating committee. the chairman, charles payne, announced the nominees for the 197 4/75 slate of isad candidates: vice-president/president-elect: board member-at-large: henriette a vram allen veaner ruth tighe maurice freedman the board members extended a vote of thanks to the nominating committee for their work. report of marc user's discussion group. mr. james rizzolo, chairman, said most of the discussion in the discussion group revolved around ala, clr, and the change in clr' s status which was moved in august from one irs classification to another. it is now an "op48 journal of library automation vol. 7/1 march 1974 erating foundation," i.e., it is active in programs rather than waiting for a reaction to a request using funds they have as a "carrot.'~ also discussed was whether clr should fund and pick the participants or clr should do the funding and ala pick the participants. , . also the group considered the question of standards and how one ardves at them. there are a number of groups in ala dealing with standards, but there is a need to work out a systematic method of developing standards. there needs to be a routine mechanism set up for going from an imtial formulation of an idea for a standard to a standard that the profession can live with. report of program planning committee.. the committee met at the asis meeting in los angeles prior to meeting at the ala midwinter meeting. . :rvir. brett butler, chairman, announced that three european librarians );lad been invited to participate in the 1974 annual program .at new york city. mr. kilgour was handling all arrangements. mr. kilgour informed the board that the travel expenses of all three librarians were ·being provided for by sources outside ala. linda crismond is the local planning person for the 1975. san francisco annual conference program which will be sponsored jointly with asis. 
joshua smith had suggested mark radwin of lockheed as liaison and he had agreed to serve in this capacity. . the new orleans institute on "alternatives in bibliographic networking" had enough registrants by midwinter to confirm it. there had been some difficulty concerning contact with speakers but the .details had been straightened out. copies of the program ·for the new orleans institute were distributed. mr. butler also inforrried the board that his committee was looking into the details of cooperating with other institutions and state schools which might be interested in working with isad in a seminar or institute. the committee was also considering what type of programs should ·be presented, subcontracting to outside companies, and how to control these. the members of the committee were working on a procedure manual for use in conducting institutes .. telecommunications committee report. the activities of the telecommunications committee are highly organizational at present. the committee has swung away from cable tv as its primary interest and towards telecommunications as applied to bibliographic networks. the chairman, david. waite, said there was a need to set up a simple guide to carry out their charge for the educational activities and legislation advisory responsibilities to the ala committee on legislation. more people would probably be appointed to the telecommunications committee as there was a need for more expertise to assign to the areas identified by the committee. . highlights of meetings 49 he further said that the need now is to determine what existing appara. tus may be utilized to fulfill the committee's responsibility to disseminate information regarding telecommunications as applied to the library community so that the committee could put most of .its effort into technical work. one project discussed was to gather background information on bibliographic data centers and network activities and their needs for telecommunication facilities in order to draft a requirements statement. the purpose of such a: statement is. that the committee could communicate .with new telecommunications systems. the committee was not aware of an ade· quate statement of library requirements that. is readily available ·for the commercial services that .are steadily increasing. assignments have been given to gordon randall, maryann duggan, and ron miller to gather this information. mtr. waite remarked that the committee would be interested in ~ny report on the proposed isad networks committee when available. brett. butler, chairman of the program planning co:rrnnittee; suggested that a telecommtmications institute should be in the future plans and mr. waite's or any of his committee members contribution of any ideas about· such would be appreciated. report 6f the interdivisional committee on machine-readable bibliographic information (marbi). (see exhibit 2~) mr. kilgoirr appointed velma veneziano to serveas liai~ son to the isad standards committee from marbi. her term as chair~ man of marbi will conclude in jnne 197 4. report of cola discussion group. (see exhibit 3.) report of committee on technical standards fqr library 'automation (tesla). (see exhibit 4.)~report of chairman jolln kountz. · · technological unemployment. president kilgour felt ala should do something about the spreading of unemployment due.' to increased use of technological development. m!r. auld suggested that someone be appointed to study the potential and existing problems in this area. 
this could be funded either: (1.) ,under a fellowship by clr; or (2) application for the j. morris jones .goals award. .· · · · mr. fasana. thought an interdivisional committee might be set up be~ tween the fotir rnost directly affected divisions: isad, lad, led.; 'and rtsd. ·. . . , mr. shoffner expressed· his view that as efficiency is ii:rcreased productivity is increased aj}d could possibly therefore increase employment. mr.: kil~ gour said tha.t.history had proved to.the contrary. mr. shoffner stated he felt the problem was on~. of education and ·.training. a specification· of 50 journal of library automation vol. 7/1 march 1974 what is expected of one and what training he would receive during a technical changeover was needed. mr. fasana's suggestion was that the four divisions be asked for papers of their views or a program at the san francisco annual conference be prepared on the subject of technological unemployment. mr. auld asked if it could not rather be introduced at the new york annual conference, to which ann painter volunteered the use of the isad /led education committee's two-hour time slot for the program at new york. motion. mr. fasana moved that mr. kilgour phrase a statement of the problem on technological unemployment as he sees it and present it to the !sad /led education committee for consideration as the program theme at the new york conference. seconded by mrs. martin. carried. proposed standards in ]ola tc. mr. john kountz brought up the subject of using lola tc for the interactive mechanism of presenting the proposal of a standard to the isad members for comment, and of having a form included to be filled out and returned. the board agreed that this was a good idea. isad/led education committee report. ann painter, chairman, asked for clarification of appointment of new members to the committee. roger greer is the only member whose term continues past this year. mr. hammer was asked to find out who appoints members to the above committee. the committee is working on a series of papers defining educational "modules" and has sent out a revised questionnaire to identify appropriate subject areas. it is planning to send the questionnaires to associated institutions as well as to the ala accredited schools. the need for funding the modules rather than depending upon volunteer or "slave labor" was considered by the committee. volunteers have little preparation time and so often there is a lack of in-depth or consistency in developing these modules. also the committee would like to set up a file of modules available to people across the country. there could be a problem of copyright involved. mr. kilgour asked miss painter for suggestions of people who might be interested in serving on the committee. lola manuscripts. mrs. martin, editor of ]ola, asked the board for its feeling on whether it would be appropriate or desirable to put the date of acceptance on published manuscripts in lola. the board decided that should be the editor's decision. vote of thanks to mrs. martin. the board gave mrs. susan martin a unanimous vote of thanks for her work in getting the issues of ]ola caught up to date in time to meet the post office deadline of december 31, 1973 in order to retain the second class permit. highlights of meetings 51 report of the membership survey committee. (see exhibit 5.) board minutes in lola. the board suggested that minutes published in ]ola be entitled "highlights of isad board meeting" rather than minutes. the meeting was adjourned at 12:30 p.m. 
exhibit 1 proposal for.establishing on a subscription basis a selective dissemination of information system for ala members the original proposal for an sdi system was intended for isad members only, but interest has grown at ala headqua1ters to the extent that it is being considered as a service to be provided for all ala members. the proposal therefore does not require any action on the part of the isad board. it is presented here for information and to give the board members an opportunity to comment on the idea and make suggestions toward developing the best possible procedure. it is hoped that a presently operating system can be found that would enable ala members to subscribe to a system using multisubject data banks that would automatically adjust profiles according to past output results and that would supply as requested copies of articles and documents whenever possible. such documents would of course be supplied at a fee additional to the basic subscription fee. it is also hoped that the operators of the system would be responsive to subscriber feedback and would improve the system as warranted. at present the only existing data banks in the library and information science fields are eric and marc, but hopefully as time goes on others will be developed. it, for example, would seem prudent for the h. w. wilson company to consider the sale of lihm1·y litemtme in machine-readable form. in any event, there is no reason to limit subscriptions to the service to information science data banks. if interested, members of ala could subscribe to other subject fields depending upon the data banks made available by the operating service. chemistry librarians could, if useful to them, subscribe to chemical abstmcts condensates, engineering librarians to enginee1'ing index, etc., etc. only time and the availability of sdi can determine the interest of librarians in such services. at the time of writing, only one of the two agencies contacted for information has provided descriptive data on their system. a copy of one of the papers sent by the ucla center for information services is attached. ohio state university libraries had not as yet responded. enquiries will be made with other operating systems so that a basis for comparison wiii be available for decision at ala headquarters. comments and suggestions from isad board members would be appreciated. information regarding presently operating systems would also be of great value. december 13, 1973 exhibit 2 reports of the meetings of the marbi committee (interdivisional committee on representation in machine readable form of bibliographic information) january 19 and 20, 1974 number one priority was the resolution of the relationship between the library of congress and marbi in its capacity as the marc advisory group. 52 journal of library automation vol. 7/1 march 1974 there was discussion of the position paper which was presented at the las vegas meeting (copy attached) entitled "the library of congress view on its relation to the ala marc advisory committee." lc had revised certain portions of this paper to conform with marbi's wishes. these revisions were acceptable to the committee. there was concern, however, over an addition which pertained to marbi's role with regard to formats other than books and serials (namely films, maps, music, etc.) alternate wording to lc's proposal was worked out by paul fasana and john knapp. 
several documents were submitted by henriette avram: (1) a proposed document numbering scheme for communications between lc and the committee and vice versa, and (2) proposed format for presenting changes to marc formats (copies attached). these documents and proposals were acceptable to the committee. (note: incidental to this discussion, the committee officially adopted "marbi" as its official acronym.) 1. the lc liaison presented two proposed marc format changes for the committee's consideration entitled: lc/marbi 2-addition of $x subfield for 4xx fields to allow for issn. lc/marbi 3-specincation of the 830 field. the committee decided that the following plan of action would be followed with regard to these two changes: they would be announced and distributed to isad marc users' discussion group at its january 21, 1974 meeting. the proposed changes would be sent to all on mudg's mailing list, asking for replies to the marbi chairman by february 16, 1974. the chairman would summarize responses and poll marbi committee members who would respond by march 16, 1974. the marbi committee chairman would respond to lc by march 16, 1974. marbi will request publication of changes in ]ola technical communications. 2. henriette avram presented to the committee a clr statement which had been presented to arl entitled "a composite effort to build an on-line national serials data base." the committee took note of the presentation with interest and voted to take no action on the matter at the january 19 meeting. 3. the character set subcommittee of marbi reported that it had issued a written report which will be used in support of the united states position concerning development of standards within the international standard organization. marbi issued thanks to the subcommittee and requested that they remain convened pending review of further developments coming from activities within iso. 4. there was a report on activities of the ad hoc committee convened by clr to discuss use of the marc format in a network environment. a paper entitled "sharing machine readable bibliographic data: a progress report on a series of meetings sponsored by the council on library resources" was discussed. the committee took note of these activities with interest and will wait for formal submission of format changes from the library of congress. 5. marbi discussed the apparent overlap of the change between marbi and the new isad committee on technical standards. marbi passed a resolution that the isad representatives should bring to the attention of the isad board its concern over the similarity of the function statements of the two committees, and asked that these apparent discrepancies be considered and any duplication be eliminated. 6. the proposed marbi serials task force was discussed. it was felt that marbi committee members needed to keep up on developments, and that the chairman should continue to collect and distribute as much documentation as possible to the committee highlights of meetings 53 members. it was decided that there was no need ~tt this time to set up a separate subcommittee to perform this function. 7. the proposed amendments to iso 2709-1973(e) were discussed. it appears that there are several proposals circulating to change this standard. marbi formed a subcommittee to study these proposals and respond, and possibly, to make counterproposals. the position of marbi will be reported to the chairman of ansi z-39, sc/2 and will be used in support of the u.s. position within iso. 
any committee member or interested professional may reply individually. the subcommittee appointed consists of charles payne, john knapp, mike malinconico, and charles husbands. response will be made by april 1, 197 4. at its regular scheduled meeting, on january 20, all members were present. (john byrum was unable to attend the unofficial meeting on january 19.) the distribution of the rtsd and isad manual material was discussed. the discussion of the previous day was summarized for purposes of review and for the benefit of the nonmembers attending the meeting. 1. marbi and lc the alternative wording to the lc position paper was presented by paul fasana. it was passed. henriette avram will have it published in lcib and will submit it to lola tc. lrts will also receive a copy. the paper will be submitted to each divisional board. 2. the national on-line union file of serials was discussed. larry livingston answered questions. 3. the character set subcommittee report will see that isad has a copy. interested professionals should ask for a copy from them. 4. the activities of the ad hoc clr committee were again reviewed. 5. the isad standards committee was discussed. 6. the serials task force for marbi was reported on. 7. the proposed changes to iso 2709-1973 (e) were reviewed. new business: 8. the activity of the ifla working group on content designators was discussed. it was reported that there is an attempt to standardize content designators across national boundaries, for purposes of international exchange. there are problems in the area of cataloging rules, not all libraries participating, and language. no action was needed, as this is only for informational purposes at this time. 9. location codes were discussed, but the issue was tabled pending report of ad hoc clr committee. 10. language and geographic area codes were brought up but not considered necessary to become involved. 11. the z39 standard account number (san) was reported by emery koltay. 12. progress in regard to the publication of the isbd-m and s was discussed. exhibit 3 cola report-midwinter '74 about fifty people were in attendance at portions of the four-hour meeting. the first half was taken up by a series of informal presentations about activity at: by: stanford allen veaner csuc john kountz berkeley & ulap sue martin ulap cis project at ucla peter watson 54 ]oumal of libmry automation vol. 7/1 march 1974 at: nypl-rlg & suny plans university of chicago lc by: mike malinconico charles payne rob mcgee mary kay daniels questions were entertained at the end of each presentation. the second half was opened by a few announcements by maryann duggan about the new orleans institute and henriette avram about the serials proposals. the major portion of the second half consisted of a panel discussion by john kountz, eme1y koltay, tom brady, and john knapp on the communication of orders, claim reports, ill requests and responses in machine-readable form. john kountz addressed general system design aspects, emery koltay discussed the isbn, issn, and standard account numbers, tom brady discussed b&t's experiences with batab, and john knapp addressed the nature of the data elements and the record structure itself. considerable discussion followed the presentations, centering heavily on the isbn and its good points and failings. both parts of the meeting seemed to be well received. 
the major value of cola seems to be as an occasion for a wide variety of automation-oriented people to discuss a similarly wide variety of topics in an informal environment. there was some feeling that the presentations in the first half could have been more tightly controlled. the presentation in the second half was quite useful, i feel. i would like to suggest cola as a good sounding-board for proposals and place for announcements, distributions of handouts or written position papers. john kountz and i have discussed setting aside a portion of it for tesla reports. exhibit 4 respectfully submitted, brian aveney to: board of directors, information science and automation division from: john kountz, chairman, committee on technical standards for library automation subject: report of committee's activities, ala midwinter meeting, 1974 the committee on technical standards for library automation (tesla) held its inaugural meetings on tuesday, january 1974 (4:3d-6:00 p.m. and 8:30-11:00 p.m.). these were icebreaker meetings for a new group. in view of the interest that had been expressed in various quarters, several interested observers attended, as well as six of the seven committee members (for membership attendance see attached list). in addition, the following individuals were invited to meet with the committee and present their review of standards activities in other areas; establish a working perspective for the committee within the american library association; and delineate the constraints of the committee's charge: mr. fred kilgour, mr. don hammer, ms. velma veneziano, mr. emery koltay. while the specific discussion that ensued covered a variety of topics, the central objectives for these two meetings (establishing/ defining action areas, constraints, roles, and reviewing in some detail the committee's charge) were met. in addition, stress was placed throughout the discussion on differentiating between professional, service, bibliographic, and similar library standards, and the communications/ clearinghouse function to be served by the committee in its dealings with technical standards impacting library automation. highlights of meetings 55 at its next meeting, the committee can be expected to complete its deliberations on the charge, complete a proposed pilot procedure for the handling of initiative/reactive requirements for standards, and recommend a shakedown of the proposed procedure. committee on technical standards for library automation ala midwinter meeting 1974 attendees of meetings held 21 january 1974 dr. edmund a. bowles, ibm mr. arthur brody, bro-dart industries mr. jay cunningham, university of california mr. john kountz, chairman, california state university and colleges mr. tony miele, illinois state library mr. richard utman, princeton university absent: ms. madeline henderson, national bureau of standards exhibit 5 report of the membership survey committee we mailed out 4,337 questionnaires as of november 3. as of last week, we had received 1,666 replies. they have now dwindled down to about five or six a day, so i feel we have probably received the majority of responses from our mailing. i hope for about a 40 percent response. the returns are presently being coded now by my graduate assistant, and the university of south carolina computer centre will keypunch them for us. i am hopeful that we can start analyzing the results by the end of february, and have the report ready for you by april. 
the expenses to date have been:
preliminary mailing $346.95
printing of envelopes 164.32
return postage 166.60
total $677.88
the bill for printing the questionnaire hasn't been received yet but should be a very minor one. jim williams will write the program for the data, and the library school has computer time which we can use. i expect when all the expenses are in that the total will be more than the budgeted $700, but not very much more.
submitted by: elspeth pope, chairman; jim williams; bill summers; martha manheimer

editorial board thoughts
halfway home: user centered design and library websites
mark cyzyk
information technology and libraries | march 2018
mark cyzyk (mcyzyk@jhu.edu), a member of lita and the ital editorial board, is the scholarly communication architect in the sheridan libraries, the johns hopkins university, baltimore, maryland.

our library website has now gone through two major redesigns in the past five or so years. in both cases, a user centered design approach was used to plan the site. in contrast to the single-person-vision and design-by-committee approaches, user centered design focuses on the empirical study and eliciting of the needs of users. great attention is paid to studying them, listening to them, and exposing their needs as expressed. in both of our cases, the overall design, functionality, and content of the new site was then focused exclusively on the results of such study. if a proposed design element, a bit of functionality, or a chunk of content did not appear as an expressly desired feature for our users, it was considered clutter and did not make it onto the site. both iterations of our website redesign were strictly governed by this principle.

but user centered design has blind spots. first, it may well be that what you take to be your comprehensive user base is not as comprehensive as you think. in my library, our primary users are our faculty and student researchers, so great attention was paid to them. this makes sense insofar as we are an academic library within a major research university. faculty and student researchers will always be our primary user group, but they are not our comprehensive user group. we have staff, administrators, visitors, members of our board of trustees, members of our friends group, outside members of the profession, and others, and they are all important constituencies in their own ways.

second, unless your sample size of users is large enough to be statistically valid, you are merely playing a game of three blind men and the elephant. each user individually will be expressing his or her own experience and perceived needs based on that experience, and yet none of them, even taken as a group, will be reporting on the whole beast. while personal testimony definitely counts as evidence, it also frequently and insidiously results in blind spots that would otherwise be exposed through having a statistically valid sample of study participants.

third, and perhaps most importantly, user centered design discounts the expertise of librarians. nobody knows a library's users and patrons as well as librarians. knowing their users, eliciting their needs, is part of what librarians as one of the “helping professions” do; it is a central tenet of librarianship. there is no substitute for experience and the expertise that follows from it. in the art world, this is connoisseurship. somehow, the art historian just knows that what is before him is not a genuine rembrandt.
the empirical evidence may ineluctably lead to a different conclusion, yet there remains something missing, something the connoisseur cannot fully elucidate. similarly, in the medical world the radiologist somehow just knows that the subtle gradations on his screen indicate one type of malady and not another. interestingly, in the poultry industry there is something called a “chicken sexer.” this is a person who quickly and accurately sorts baby chicks by sex. training for this vocation largely employs what the philosophers call “ostensive definition”: “this one is male; that one is female.” the differences are so small as to be imperceptible. and yet, experienced chicken sexers can accurately sort chicks at an astonishing rate. they just know through experience. such is the nature of tacit knowledge.

in the case of our most recent website redesign, none of our users expressed any interest whatsoever, for example, in including floor maps as part of the new site. we were assured a demand for floor maps on the site was “not a thing.” so floor maps were initially excluded from the site. this was met with a slow crescendo of grumbling from the librarians, and rightly so. librarians, and the graduate students at our information desk, know through long experience that researchers of varying types find floor maps of the building to be useful. that's why we've handed out paper copies for years. the fact that this need was missed through our focus on user centered design points to a blind spot in that process. valuable experience and the expertise that follows from it should not be dismissed or otherwise diminished through dogmatic adherence to the core principle of user centered design.

... and yet, don't get me wrong: insofar as it's the empirical study of select user groups and their expressed concerns and needs, user centered design as a design technique and foundational principle is crucially important and useful. it gets us halfway home.

lita president's message
joining together
emily morton-owens
information technology and libraries | december 2019
emily morton-owens (egmowens.lita@gmail.com) is lita president 2019-20 and the assistant university librarian for digital library development & systems at the university of pennsylvania libraries.

in writing this column i am looking ahead, as i have been throughout my term as vice-president and president of lita, to the possibility of our merger with alcts and llama. recently our discussions have included an exploration on all sides of how a division can support members throughout their careers. this has inspired me to reflect on how lita has always taken a broad and inclusive view of what library technology work is and can be in the future. i believe the proposed core division can support and extend that tradition.

one question that i've heard posed from time to time is “am i technical enough for lita?” longtime lita members like to answer that with a full-throated “yes!” if you're interested enough to ask the question, we want you to join us in using technology as a part of your work. we want you to be supported in doing so at your current skill level, whether or not you want to make technology more a part of your work than it is today. if you want to go deeper into technology, we'll be there with you. while the culture of the for-profit technology industry can promote imposter syndrome, we want lita to be a haven.
in lita's events and meetings, we consistently see different facets of library technology work reflected. some of us are training users in new technologies or creating programs that get young people excited about coding. others are working to make online resources accessible and easy for our users to benefit from. we have members who are manipulating metadata, creating services to help researchers comply with data management requirements, creating websites that guide users to the information they need, and preserving cultural heritage in digital forms. some of us manage tech projects or workers. some of our members work on large tech teams with generous resources and others are spinning magic just from their own skills. when i started working in libraries, my bosses and mentors were often librarians who had started in technical services or other roles, before "automation." eager to improve their own workflows, and getting pulled into ils migrations and catalog development, they had become the technology experts. these accidental systems librarians have always been some of my favorite colleagues because of their sure-footed approach to our data. recently i've come to work with colleagues who are accidental systems librarians in the opposite sense: tech workers who took jobs in libraries and embraced what we do. one developer on my team, who had no previous library experience, took to our projects and ethical stance like a duck to water. he told me that he now goes to parties and tells people about how librarians are defenders of privacy and protectors of information. lita embraces growth in any direction because we want to support learning and problem-solving with a foundation of shared principles and resources. i don't see these developments as time-based or inevitable in any given person's career. there are plenty of library tech workers who prefer being an individual contributor and think they have their biggest impact doing direct work on applications. and many of my technical services colleagues prefer to define their work goals in those terms, no matter how adept they become with tech tools. whether or not they seek out a management position, our members will probably find themselves exhibiting leadership in some context, like developing or advocating for standards. instead of a rigid path of career development, many librarians today have fluid and multi-faceted careers. for myself, i have held similar positions at quite different types of libraries—public, medical, academic. lita has always been a part of my experience, though, providing a sort of collegial bedrock through a lot of change. the people are what make lita, lita: friendly, principled, and quirky. lita members are the kind of people who will learn all they can about a technology like the amazon alexa—and then unplug the one on the exhibit floor at annual. both as i was thinking about all this, and in this resulting column, leadership, collections, and technical services kept coming up. there is such strong and fruitful cross-pollination among these specialties, and i see that as something that would enhance the member experience—both for current lita members who want more contact with expert colleagues and for current llama and alcts members who want learning opportunities and support for their work with technology. lita members love to share their knowledge and hash through challenges together.
sometimes i wish more ala members would feel comfortable giving us a try, and perhaps core will be a new, friendly face for that ongoing outreach. if, in the future, someone asked the new question “am i technical enough for core?” i’m sure the answer will be the same: “yes, please join us!” information technology and libraries at 50: the 1990s in review steven k. bowers information technology and libraries | december 2018 9 steven k. bowers (sbowers@wayne.edu) is executive director, detroit area library network (dalnet). i played some computers games — stored on data cassette tapes — in the 1980s. that was entertaining, but i never imagined the greater hold that computers would have on the world by the mid-1990s. i can remember getting my first email account in 1993, and looking at information on rudimentary web pages in 1996. i remember my work shifting from an electric typewriter to a bulky personal computer with dial-up internet access. eventually, this new computing technology became a prevalent part of my everyday life. this shift to a computer-driven reality had a major effect on libraries too. i was amazed by the end of the 1990s to be doing research on a university library catalog system connected with other institutions of higher education throughout the region, wondering at the expanded access to, and reach of, information. in my mind, due to computers and the internet, libraries were really connected at that time more than they had ever been. as i prepared this review of what we were writing about in ital in the 1990s, i had some fond memories of the advent of personal computers in my daily life and in the libraries i had access to. as we take a look back, i think it is interesting to see what we were doing then and how it is connected to what we are still working on today. along with the eventual disruption that the internet was to libraries, computers and online access also had the effect of greatly changing how libraries constructed our core research tools, especially the catalog. prior to the 1990s libraries had begun automation projects to move their catalogs to computer-based terminals, creating connections and access that were not previously possible with a card catalog. if we are still complaining about the design and function of the online public access catalog (opac) today, in the early 1990s we were discussing what their design and function should be, in a positive and optimistic way. in some ways it seems hard to recall the discussions of how to format data and display it to users. in other ways it seems like we are still having the same discussions, but our work has become more complex as we continue to restructure library data to become more open and accessible. while we were contemplating the design of online library catalogs, libraries were also discussing the implementation of networking and other information technology infrastructures. nevins and learn examined the changes in hardware, software, and telecommunications at the time and predicted a more affordable cost model with distributed personal computers connected through networks, and enhancing library automation cooperation. 1 they expanded the discussion to include consideration of copyright and intellectual property, security, authorization, and a need for information literacy in the form of user navigation, all key to what we are doing today. beyond catalogs, there was the real adoption of the internet itself. by the early 1990s there was growing enthusiasm for accessing and exploring the internet. 
2 this created a need for libraries to learn about the internet and instruct others on how to use it. as late as 1997, however, even search engines were still being introduced and defined, and using the internet or searching the world wide web was still a new concept that was not fully understood by many people. at their basis, search engines were simply defined as indexing and abstracting databases for the web.3 it is interesting that library catalogs were developed separately from the development of search engines and we are still trying to get our metadata out of our closed systems and open to the rest of the web. in 1991, kibirige examined the potential impact of this new connectivity on library automation. he posited that "one of the most significant change agents that will pervade all other trends is the establishment and regular use of high-speed, fiber optic communication highways."4 his article in ital provides a prescient overview of much of what has played out in technology, not just in libraries. he noted the need for disconnected devices to become tools to access full-text information remotely.5 perhaps most important, he noted the need for librarians to become experts in non-library technology, to keep pace with developments outside of the profession. this admonition is still important to keep in mind today. at the time, however, libraries were working on the basics of converting records from online bibliographic utility systems running on mainframes to a more useful format for access on a personal computer, let alone thinking about transforming library metadata into linked data that can be accessed by the rest of the internet. so we keep moving forward. later in the decade, libraries began to think about the library catalog as a "one stop shop" for information. in 1997, caswell wrote about new work to integrate local content, digital materials, and electronic resources, all into one search interface. initially the discussion was more technical in nature, but caswell provided an early concept for providing a single access point to all of the content that the library has, print and electronic, which was a step forward from just listing the books in the catalog.6 at the time we were still far away from our current concept of a full discovery system with access to millions of electronic resources that may well surpass the print collections of a library. eventually more discussion developed around the importance of user experience and usability for the design of catalogs and websites. catalogs were examined in parallel with the structure of library metadata, and both were seen as important to the retrievability of library materials. human-machine interaction was starting to be examined on the staff side of systems, and this would eventually become part of examining the public interface usability as well. outlining an agenda for redesigning online catalogs, buckland summarized this new technological development work for libraries by noting that "sooner or later we need to rethink and redesign what is done so that it is not a mechanization of paper but fully exploits the capabilities of the new technology."7 more exciting, by the end of the 1990s we were seeing usability studies for specific populations and those with accessibility difficulties. systems were in wide enough use that libraries began to examine their usefulness to more audiences.
beyond our systems, the technology of our actual collections was changing. new network connectivity combined with new hardware led to new formats for library resources, specifically digital and electronic resources. in 1992, geraci and langschied summarized these changes, stating that "what is new for the 1990s is the complication of a greater variety of electronic format, software, hardware, and network decisions to consider."8 they also expanded the conversation to include data in all forms, and data sets of various kinds, well beyond traditional library materials. this is an important evolution as libraries worked to shift their operations, identities, and curatorial practices. geraci and langschied defined data by type, including social data, scientific data, and humanities data. they called most importantly for libraries to include access to this varied data to continue the role of libraries providing access to information, as they cautioned that information seekers were already beginning to bypass libraries and look for such information from other sources. libraries were beginning to lose ground as the gatekeepers of information and needed to shift to providing online access and open data themselves. the early 1990s were an exciting time for preservation, as discussion was moving from converting materials to microforms to digitization. in 1990, lesk compared the two formats and had hope for a promising digital future.9 thank goodness he was on target for sharing resources and creating economical digital copies, even if he did not completely predict the eventual shift to reliance on electronic resources that many research libraries have now made. lesk also noted the importance of text recognition, optical character recognition (ocr), and text formatting in ascii. others focused on digital file formats and the planning and execution of creating digital collections. digitization practices were developing and the need to formalize practice was becoming evident. the same year, lynn outlined the relationship between digital resources and their original media, highlighting preservation, capture, storage, access, and distribution.10 by the late 1990s there were more targeted discussions about the benefits of digitizing resources to provide not only remote access, but access to archival materials specifically. in 1996, alden provided a good primer on everything to consider when doing digitization projects, within budget constraints.11 by the mid-1990s, karen hunter was excited to extol the promises of the dissemination of information electronically, calling the high performance computing and high speed networking applications act of 1993 "[a] formidable vision and goal. real-time access to everything and a laser printer in every house. the 1990s equivalent to a chicken in every pot."12 hunter's article is a good overview of where libraries stood in working with electronic publications and online access in the early 1990s. halcyon enssle's piece on moving reserves to online access opened with a great summary of where much of library access was headed: "the virtual library, libraries without walls, the invisible user . . . these are some of the terms getting used to describe the library of the future . . . ."13 eventually, by the end of the decade we even learned to start tracking how our new online libraries were being used, applying our knowledge of print resource usage to our new online collections.
in 1995, laverna saunders had already developed a new definition of what a library was, and how the transformation of libraries from physical warehouses to providing access to online content would affect workflows in libraries. as defined by saunders, “the virtual library is a metaphor for the networked library, consisting of electronic and digital resources, both local and remote.”14 not a bad definition more than 20 years later. saunders asked pertinent questions such as which resources would be best in print vs. online, what print materials should be retained, and which resources and collections libraries should digitize themselves. the broader view provided was that these changes would affect not just collections but the entire operation of libraries. there would still be work to do in libraries, but changes in the work were necessary to address shifting technology and the composition of collections. by the end of the decade there was new work to assess use of electronic resources, extended virtual reference services, and information literacy extending to technology instruction. in 1998, kopp wrote about the promising future of library collaborations. consortia were well established in prior decades and they were seeing a resurgence. kopp noted that just as consortia the 1990s in review| bowers 12 https://doi.org/10.6017/ital.v37i4.10821 had been built around support for new shared utilities in the 1970s and 1980s, in the 1990s they were finding a new purpose in the new networking of the internet and possibilities of greater connectivity and collaborations in the online environment.15 beyond cataloging and automation technology, it is interesting to note that even in the new online environment that was forming in the 1990s, many consortia formed at the time to share print resources. this may have been conversely related to libraries shifting from complete print collections to online holdings that many may have felt were more ephemeral, or maybe money was spent on new technological infrastructures and less on library materials. resource sharing of print materials is still an important part of libraries working together to provide access to information, and since the time that kopp wrote about consortia and growing networked collaborations, there has also been a growing development of sharing electronic resources. a large part of the work of many consortia today revolves around purchasing of electronic resources, but in the late 1990s libraries were just beginning to get into purchasing commercial electronic resources.16 there were lots of ital articles in the 1990s looking at the future of libraries and technology, and some specific articles dedicated to prognostication. in 1991, looking into the future, kenneth e. dowlin shared a vision for public libraries in 2001. he predicted that libraries would still exist but it is noteworthy that at the time the future existence of libraries was questioned by many. dowlin did predict change for libraries, including the confluence of new media formats, computing, and yes, still books. he stated what time has now confirmed: “the public wants them all.”17 he had lots of other interesting ideas as well; his article is worth a second look. another fun take on the future was a special section on science fiction from 1994 considering future possibilities in information technology and access. 
in one piece, david brin noted, “nobody predicted that the home computer would displace the mega-machine and go on to replace the rifle over the fireplace as freedom’s great emancipator, liberating common citizens as no other technology has since the invention of the plow.”18 an interesting observation, even if the computer has now been replaced by phones in our pockets or other fantastic wearable technologies. by the end of the 1990s, libraries had been greatly transformed by technology. many libraries had automated, workflows continued to adjust in all areas of library work, and most libraries had at least partially incorporated elements of using the internet along with providing computer access to library users. some libraries were already moving through the change from print to electronic library resources. specific web applications and websites were also being developed and used for and by libraries. these eventually have matured into smarter systems that can provide better access to our collections and smarter assessment of our resource usage, for both print and electronic materials. as a whole, the 1990s are an exciting time to review when looking at the intersection of information technology and libraries. as information dissemination moved to an online environment, within and outside of the profession, the future existence of libraries began to be questioned. as we now know, libraries still play an important role in providing access to information. notes 1 kate nevins and larry l. learn, “linked systems: issues and opportunities (or confronting a brave new world),” information technology and libraries 10, no. 2 (1991): 115. information technology and libraries | month year 13 2 constance l. foster, cynthia etkin, and elaine e. moore, “the net results: enthusiasm for exploring the internet,” information technology and libraries 12, no. 4 (1993): 433-6. 3 scott nicholson, “indexing and abstracting on the world wide web: an examination of six web databases,” information technology and libraries 16, no. 2 (1997): 73-81. 4 harry m. kibirige, “information communication highways in the 1990s: an analysis of their potential impact on library automation,” information technology and libraries 10, no. 3 (1991): 172. 5 kibirige, “information communication highways in the 1990s,” 175. 6 jerry v. caswell, “building an integrated user interface to electronic resources,” information technology and libraries 16, no. 2 (1997): 63-72. 7 michael k. buckland, “agenda for online catalog designers,” information technology and libraries 11, no. 2 (1992): 162. 8 diane geraci and linda langschied, “mainstreaming data: challenges to libraries,” information technology and libraries 11, no. 1 (1992): 10. 9 michael lesk, “image formats for preservation and access,” information technology and libraries 9, no. 4 (1990): 300-308. 10 m. stuart lynn, “digital imagery, preservation, and access--preservation and access technology: the relationship between digital and other media conversion processes: a structured glossary of technical terms,” information technology and libraries 9, no. 4 (1990): 309-336. 11 susan alden, “digital imaging on a shoestring: a primer for librarians,” information technology and libraries 15, no. 4 (1996): 247-50. 12 karen a. hunter, “issues and experiments in electronic publishing and dissemination,” information technology and libraries 13, no. 2 (1994): 127. 13 halcyon r. enssle, “reserve on-line: bringing reserve into the electronic age,” information technology and libraries 13, no. 
3 (1994): 197. 14 laverna m. saunders, "transforming acquisitions to support virtual libraries," information technology and libraries 14, no. 1 (1995): 41. 15 james j. kopp, "library consortia and information technology: the past, the present, the promise," information technology and libraries 17, no. 1 (1998): 7-12. 16 international coalition of library consortia, "guidelines for statistical measures of usage of web-based indexed, abstracted, and full text resources," information technology and libraries 17, no. 4 (1998): 219-21; charles t. townley and leigh murray, "use-based criteria for selecting and retaining electronic information: a case study," information technology and libraries 18, no. 1 (1999): 32-9. 17 kenneth e. dowlin, "public libraries in 2001," information technology and libraries 10, no. 4 (1991): 317. 18 david brin, "the good and the bad: outlines of tomorrow," information technology and libraries 13, no. 1 (1994): 54. balancing community and local needs: releasing, maintaining, and rearchitecting the institutional repository daniel coughlin information technology and libraries | march 2022 https://doi.org/10.6017/ital.v41i1.14073 daniel coughlin (dmc186@psu.edu) is head, libraries strategic technologies, penn state university. © 2022. abstract this paper examines the decision points over the course of ten years of development of an institutional repository. specifically, the focus is on the impact and influence from the open-source community, the needs of the local institution, the role that team dynamics plays, and the chosen platform. frequently, the discussion revolves around the technology stack and its limitations and capabilities. inherently, any technology will have several features and limitations, and these are important in determining a solution that will work for your institution. however, the people running the system and developing the software, and their enthusiasm to continue work within the existing software environment in order to provide features for your campus and the larger open-source community, will play a bigger role than the technical platform. these lenses are analyzed through three points in time: the initial rollout of our institutional repository, our long-term running and maintenance, and eventual new development, and why we made the decisions we made at each of those points in time. the institutional repository (ir) a university institutional repository (ir) provides long-term access to the scholarship and research outputs of an institution.1 the outputs can be in the form of scholarly publications, data sets to support publications or other research, electronic theses and dissertations, and other digital assets that have value to the university to preserve and to the research community and beyond to disseminate. there is additional value in keeping these otherwise scattered resources collected in a single repository to showcase the scholarly accomplishments of an institution.2 there is value to the university to collect and disseminate the scholarly outputs of the university to understand the strengths of the university and promote that research to outside audiences, attract new faculty, and provide opportunities for new faculty where fields may be emergent or void of an institutional presence.
furthermore, there is value to the research community to be able to find peer research without having to pay publisher access fees. reducing the burden on faculty to meet various policy demands from a federal, publisher, and institutional perspective provides another motivation for irs. federal policies can require making anonymized research data and scholarship publicly available because it is publicly funded through tax dollars; publishers can make authors provide access to the data that supports the research that is being published.3 in the united states, a growing number of academic institutions, from 2005 to 2021, have adopted an open-access policy that requires researchers to provide a copy of any published scholarly article in a publicly accessible repository. the institutional repository is a way for a university to meet this increasing demand from research organizations and funding institutions for their researchers.4 as the size of a campus grows in disciplines, it inherently grows in complexity and a diversity of digital needs and use cases from its researchers. for example, high-resolution images or atmospheric data are likely to create a higher demand in storage needs than a discipline that relies largely on text. performance-based research may require multimedia resources and streaming capabilities while other large files can be shared in a more asynchronous manner. the diversity of needs contributes to the complexity of finding a solution for an institutional repository that meets all, or many, of the needs on a campus from a file storage, discovery, and access perspective. this paper broadly addresses penn state university's development of its ir at three distinct points in time: (1) choosing a platform for our ir and its initial release; (2) maintaining an ir; and finally, (3) our current solution nearly 10 years later. at each point in time, we analyze our decision process through four lenses. these lenses provided a thorough examination for us to decide on how to proceed; they are the needs of the open-source community and the potential tension with local needs, our team dynamics, and finally the platform we built our software on and the infrastructure required to maintain it. we discuss why we made the decisions we made through these four lenses, the benefits and drawbacks, and what we have learned along the way. penn state is the state of pennsylvania's land grant university in the united states. the university has 24 campuses physically located in the commonwealth of pennsylvania, the world campus which is online, two law schools, and a medical school. in the fall of 2021, penn state had 73,476 enrolled undergraduate students and 13,873 graduate students, with research expenditures totaling over $850 million for the last four years.5 penn state is a large, public research institution with a diverse set of needs. this is significant because when the university is considering developing a large system such as an institutional repository, we need to meet the needs of a broad set of disciplines and domains. we are fortunate enough to have software developer and system administration resources that smaller institutions may not have. this provides a bit of context into our considerations for an institutional repository.
selecting a repository in january 2012, penn state university libraries and penn state's central information technology department collaborated on developing an institutional repository for the university's growing data management needs. the university libraries was interested in becoming more involved in open-source software community development efforts. at that point, many universities that we had spoken with already had an ir solution in place; because we did not, we had a lot of freedom to choose a platform without the burden of data migration. we considered investigating (1) off-the-shelf, turnkey solutions such as dspace, (2) a prototype we had just built called curation architecture prototype services (caps) using a microservices approach, or (3) building on top of an existing platform. ultimately, we decided to build on top of an existing platform, samvera (named hydra at the time).6 we did not want a turnkey solution, because we felt that we had distinct needs that would require a level of customization that these solutions would not be able to offer. based on discussions with others, we decided to develop something of our own. we wanted to leverage the experience of others in the repository development domain. the microservices approach at the time was more of a conceptual approach towards development than an existing software solution. the ability to build on an existing platform was a happy middle ground for us, and we evaluated this decision through several lenses that led us to our selection at that time. community involvement we did not want to develop a solution in a vacuum and thought a group with a (relatively) common set of problems would be helpful to problem solve. the samvera community was a small but growing community working towards repository solutions like what we were trying to achieve. members of the community were both managerial and technical. this was valuable to us for understanding the strategic direction for the community and the ability to collaborate and problem solve on technical implementations. some of the key partners for our early work were university of hull (uk), stanford university, university of virginia, and notre dame. there was communication throughout the year over community email, chat platforms, and phone calls; however, the quarterly partner meetings were the most valuable time for collaboration. these quarterly meetings were a couple days in length, typically attended in person by managers and software/systems developers at a partner institution's campus. this provided the ability to work together on specific problems, showcase our work, and get to know each other more closely at lunch and after-hours meetups. working within the community would also get our team increased exposure and help with recruiting future colleagues. working in the open-source software community has been seen to benefit both candidates and employers in future job recruitment.7 we were excited by the promise of working with and contributing toward a larger community. our team had apprehension about building this alone, and we were happy to be working with the support of a community and within their set of processes.
local needs early on in our requirements for the repository we created a moscow chart that provided our "must have," "should have," "could have," and "won't have" features.8 the platform we were choosing was going to provide us with a significant set of these features for our repository with very little work on our end. these features were built in and included search, discovery, and basic file edit functionality. essentially, we were going to quickly meet the needs of our stakeholders by using this software. this was important for a couple of reasons. first, providing features to our stakeholders quickly gave them ample time to provide feedback so that we could make necessary customizations for their specific needs. a less quantitative benefit was gaining the trust of colleagues at the start of a new project and new initiative. rather than continually suggesting "that feature will be done next week," we were able to deliver results quickly and get feedback. for example, our repository integrated with our campus authentication system, restricting access. we were able to deliver these features and get feedback on both the functionality as well as terminology to improve the usability. in particular, the way our developers described permissions was initially too confusing for our users and we were able to make necessary adjustments prior to a production release of the ir. team dynamics we believed it was a significant professional development opportunity for many people on the team to work with a larger community and learn from and with those in the open-source community. the team working on the ir consisted of three full-time, or near full-time, developers (one joining after we started the project), and a systems administrator. this project was our first large project that included a project manager invested in agile project management methodologies and with a systems administrator in place at the beginning of the project. platform and infrastructure stability there was a desire to get to a common solution to easily set up other repositories for various needs within the libraries, and we hoped there would be an ability to plug and play various components or features. the three common components of this system were fedora commons to store both metadata and our digital assets; solr as an index for fast search; and blacklight as a web interface that sits on top of solr. one of the primary components, active-fedora, would sync content between the fedora and solr persistence layers. our hope was that with this model, we would be able to write code that could be used in other repositories, and we could use the code that other institutions had written for our repository needs to build other applications more quickly. the samvera community was initially called hydra because of the relationship with the mythical creature that has several heads (see figure 1). we were considering the potential of running a core storage infrastructure and discovery infrastructure, while developing several heads for our various applications. we knew this was a lofty expectation, but also thought that it was a good design principle for us to advance. additionally, the pilot that we developed on microservices (caps) seemed to have a relatively large storage service and we could not determine how to get away from that. although this was a bit of a shift in our philosophy, it was less of a shift based on our practical experience.
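to make the fedora/solr/blacklight arrangement more concrete, the following is a minimal sketch of how a model defined with active-fedora can declare a metadata field that is persisted to fedora and pushed into the solr index that blacklight searches. it is illustrative only, assuming a samvera-style ruby application; the class name, property, and predicate are hypothetical and not scholarsphere's actual code.

```ruby
# hypothetical samvera-style model, not scholarsphere's actual code
require "active_fedora"
require "rdf/vocab"

class ExampleWork < ActiveFedora::Base
  # the property is stored in fedora; the block tells active-fedora how to
  # write it into solr, where blacklight can search and facet on it
  property :sponsoring_unit, predicate: ::RDF::Vocab::DC.contributor, multiple: false do |index|
    index.as :stored_searchable, :facetable
  end
end
```

because the single declaration drives both persistence in fedora and indexing in solr, a field added for one application can, in principle, be reused by another repository built on the same stack.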
figure 1. aspirational intentions of running many applications on one access and discovery system. initial release the initial release of our ir, scholarsphere, was for research data, scholarly articles, and presentations. we considered the repository file agnostic and left the definitions of scholarly materials up to the depositor. the self-deposit process made very few assumptions to limit the barriers to deposit—there were few mandated fields for deposit in scholarsphere. the initial rollout of scholarsphere had met the "must have" and many of the "should have" needs that we had defined initially in our development requirements. the list of "must haves" included upload files via the web, create and assign metadata to the uploaded files, set three access levels to the files, search for files, display files, etc. the list of "should haves" included faceted browse, faceted search results, share files with a group, etc. working on a community-developed platform provided some of these features for us (search, faceted browse) and gave us the flexibility to customize where necessary. for example, we had our own data model of metadata to assign to files based on our users' needs. we were able to update the existing metadata that was provided out of the box, to accommodate that. this was a tremendous win for us to leverage community-provided solutions while meeting local needs. additionally, the platform provided a search index with solr. this enabled our infrastructure to have a common solution with community support on configuration questions. using the blacklight ui on top of solr created another opportunity for us to customize where desired and eased our development efforts. community: following the initial release, we worked with other members of the community to pull out some of the core functionality and place it in a separate ruby library. this library (sufia) could then be leveraged as a default set of repository features for other developers. the release of a new ir, and this library, provided us with a lot of positive exposure at various community events. local needs: locally, we used this library to develop a repository for our digital archives. it previously took two to three developers nearly nine months to develop scholarsphere; however, we used the sufia module to roll out a separate repository in six months with a single developer. this was another successful production rollout and a successful use of a product created by and for the community. team dynamics: we had a successful release and were getting support to hire new developers. we continued to move more of our projects toward an agile approach and permanently embed systems administrators into our development projects. infrastructure: we had not released the new archives system on exactly the same system that scholarsphere was developed on, but we were happy that our projects ran on relatively homogeneous technology stacks that were familiar to operate. maintaining the ir over the next several years we released three major updates to scholarsphere: (1) migrating the data object store to a new major version; (2) overhauling the user interface; and (3) migrating our data model to the portland common data model (pcdm). simultaneously, the sufia library that we developed had also grown in usage by other institutions and contributions from other developers.
we were excited to have additional contributors, and with that came an understandable sense of competing priorities within our community’s development roadmap. we were building scholarsphere features and functionality to meet the needs of our local institution and managing the tension between community direction and local needs. again, we look at these lenses as evaluating the period during maintenance, upgrades, and feature adds. community involvement two of the major releases mentioned above were largely community driven. in one case— migrating the data object store—we were one of the initial repositories within our open-source community to migrate our data storage system. we anticipated that doing this work early would prevent us from having to rewrite any code that relied on the data storage layer. ultimately, this may have been a bit early for us, because we never were able to create the momentum for others in the community to make this same migration. this created a bit of a divergence, but at this layer in our technology it did not prevent us from continuing to work closely with the community. information technology and libraries march 2022 balancing community and local needs | coughlin 6 we were able to add locally developed features for managing files and uploads, community components that allowed for controlled vocabularies, and cloud provider uploads.9 in all, from 2012 to 2019 we were an active member of the community: we provided technical contributions, we were being asked to present at community events, and our developers were frequently asked to help at several workshops. the community provided many opportunities for professional development and code from the community provided new features to our users. we felt this work was successful. we had three major releases. one was something that our local users were able to experience directly. two of our upgrades were largely on the back end and, while there is no argument on their importance, it can be a challenge to illustrate the significance of largely opaque technology upgrades to users. concurrently, we were coming up against other challenges that were proving difficult to solve in a sustainable and scalable way. large file size (larger than 1 gb) for uploads and downloads remained an issue that researchers seemed to be encountering more frequently. our mechanisms for getting around some of these obstacles led us to looking at an api for administrators and other applications to integrate. for example, if the web browser upload was not working, perhaps we could physically get the file from a user and upload it to the system ourselves. if we could do that, maybe we could use an api to do this upload, but we did not have an api. when developing new features, we would question if it should be code to contribute back to the community or only for our (penn state) needs. frequently, the devil is in the details and , while several institutions were interested in a feature based on a conversation, implementation could be much more detailed and it was difficult to find common ground. this complexity could lead to longer timelines and more difficult planning for local development features. team dynamics over this time period we advanced our team by adding several highly skilled developers (some of whom have now moved on to other positions and remain highly respected within the community), and enriched the collective skill set of the group. the team was enriched by this experience overall. 
the balance between community involvement and local needs became a frequent conversation point for our team. we spent a lot of effort on initiatives that had not solved some of the bigger problems our users were experiencing locally; our community disengagement was likely a combination of common reasons, for example, our lack of time to make meaningful contributions.10 in the spring of 2019, the development team that worked on scholarsphere shrank from three developers down to one. we had a strong number of developers within the samvera community to collaborate with; however, we had difficulty bringing on new members at the time because the complexity within the scholarsphere system created a high learning curve that was not necessarily transferable to other technology stacks. at the end of the summer in 2019 we were given 25 gb of video files to upload into scholarsphere and make accessible. the parameters of the request were outside of what we could support from our web interface, and we had no api to allow a product owner to develop against and work with the researcher to meet this request. after approximately one month of working with the data and our system, we successfully ingested the files into scholarsphere. at the end of this month, we decided that we needed to more urgently evaluate our path forward because we could not have our lone developer spending this amount of time on single-user requests. platform and infrastructure stability each of the major versions released between 2012 and 2019 had several patches and feature releases to enhance the system, the interface, and/or our processes for change management within the software system. for example, we went from a typed script containing a series of commands to chef (a tool used to automate software deployment) for deployment management; we upgraded infrastructure core components (fedora, solr, travis, redhat, etc.); and we added infrastructure to keep up with the system demands. in terms of adding infrastructure, we both enhanced the virtual capabilities (cpu and ram) of our systems and had tasks offloaded to other systems. we did not want the systems our users interfaced with to be responsible for all the heavy lifting. these tasks included characterization, indexing metadata for search, creating thumbnails, etc. (see figure 2). figure 2. systems and services with basic workflow process for uploading a file to scholarsphere, including the background jobs that ran on file upload (web server: apache, rails, passenger, clamav; repository server: tomcat, fedora, jetty, solr, redis; jobs server: rails, resque, fits; database: mariadb/mysql; storage: isilon nfs; jobs on upload: characterization, thumbnail creation, text extraction for solr, derivatives). adding additional components improved the user experience but made our infrastructure difficult to manage.
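to illustrate the offloading pattern in figure 2, the following is a rough sketch of a resque-style background job of the kind that keeps characterization, thumbnailing, and text extraction off the user-facing web servers. the class, queue, and id are hypothetical examples, not scholarsphere's actual jobs.

```ruby
# hypothetical post-upload job illustrating the pattern in figure 2;
# not scholarsphere's actual code
require "resque"

class CharacterizationJob
  @queue = :characterization          # worker processes subscribe to this queue

  def self.perform(file_id)
    # placeholder for the real work: fetch the stored file, run fits
    # against it, and persist the resulting technical metadata
    puts "characterizing file #{file_id}"
  end
end

# in the upload path the web process only enqueues the work; separate job
# servers pick it up, so the request returns quickly. similar jobs would
# cover thumbnail creation, text extraction for solr, and derivatives.
Resque.enqueue(CharacterizationJob, 42)
```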
we were continually trying to push our systems to reflect the best practices of the twelve-factor app.11 however, over time, we had certain "infrastructure smells." the infrastructure smells were essentially anti-patterns of these best practices or symptoms of a bigger problem.12 these anti-patterns included: • storage coupled closely to the application • lack of flexibility to scale storage to integrate • inability to spin up a scholarsphere instance • taking days to set up a dev environment • lack of flexibility to decouple small tasks that may require increased resources (create derivatives) evaluating next steps although we were coming up against some struggles and continued maintenance with scholarsphere, it was a successful software project that had several things we liked (and likely took for granted). it was important for us to recognize what features and characteristics of scholarsphere were a part of this list. scholarsphere's data model was flexible enough to support several current use cases and future needs and was developed with a significant amount of community input. there were other development teams within our organization that were also developing new applications in ruby, so the language continued to be relevant within our larger group, as were ruby on rails, blacklight, and solr. some of the libraries developed with these frameworks were providing us with struggles, and we knew that tools and infrastructure could be barriers to newcomers' onboarding and orientation.13 however, the languages themselves were still flexible enough for us to continue our work. we had three permission levels to access the full text of an uploaded file: (1) public, (2) penn state only, and (3) private, and we didn't want to develop anything more complex than that around access permissions. fedora provided us with versioning capability for our objects and we thought that this was something not only to continue but potentially enhance. we also had strong support from the samvera community for scholarsphere. many people had worked on the code that helped provide functionality and we could collaborate within that community when problems arose. at that point we largely decided to continue to develop needed features for scholarsphere while the community pushed forward. in part we were hoping that our divergent paths would converge within a year (give or take). the month following the relatively manual process of ingesting the 25 gb of video files into scholarsphere was spent making important updates to the system and fixing any low-hanging fruit. in october 2019 we decided to start from scratch, spend about two months developing a new solution, and evaluate our path forward after that. current solution we turned to the same four established lenses when evaluating our needs in 2019. however, it is worth noting that organizationally we were in a much different position than when we started in 2012. the software development and infrastructure team that managed the service was organizationally moved from central it to the libraries, where the service and the product owner reside.
being in the same building and having the same priorities improved communications. also, people within the teams had changed, and our leadership had changed, which changed how we approached some of our decisions. we had more experience in technical skills, specifically in repository development; we were more refined in our implementation of agile methodologies; and having run a service for years, we had a better sense of our users' needs. community involvement the community saw a tremendously successful period of growth during this time in adoption of software, exposure for funded grants, and number of partners. there was renewed excitement about multiple solutions, including turnkey repository solutions, hosted solutions, the merging of two highly regarded software libraries for performance, and improvement in developer friendliness. the latter improvement stripped away some of the design patterns that developers struggled with in favor of something more familiar and made it easier to onboard new developers. local needs the pressure to meet our local needs and competing priorities for the community-based software became a sticking point for us. we needed to have a more scalable backend and we were not sure when our needs and the priorities of the community would merge. we had also been behind on several dependencies, and the lift to get back up to date, before being able to add anything new, was considerable. this situation led us to create a prototype for evaluation. our initial goal was to see how difficult it would be to build a system to meet the needs of uploading the video files that scholarsphere currently could not handle. we had confidence we could develop features, but this area was a consistent challenge and we considered it a primary hurdle for us to jump. team dynamics our development team consisted of a single developer. however, we had an infrastructure developer who was able to help with systems configuration, automation, and containerization. our developer thoroughly understood scholarsphere and the underlying codebase and architecture, and we had the resources to hire a consultant to help with our efforts. we had considerable work performed by a local software development company on other repositories (our electronic theses and dissertations system, a digital cultural heritage repository, and our researcher metadata database). we valued this partnership and wanted to continue to utilize them as our staff numbers were down. we needed to be able to onboard others more quickly than we had in the past. if three relatively new members of the team were able to contribute to this progress, then we would also potentially have chosen a technology stack comfortable enough for others outside our development team to make a more immediate impact. platform and infrastructure stability as with many systems that are actively developed for years, our current system had several dependencies that had organically grown over time to become burdensome to put together in order to set up a development environment. additionally, a local development environment was not an exact replica of the production environment, because networked storage was implemented in production and our development systems had local storage. we also took this opportunity to test out amazon s3 storage options as our production storage system.
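as a point of reference for that evaluation, writing an object to s3 from ruby is a small amount of code. the sketch below assumes the aws-sdk-s3 gem and credentials supplied through the environment; the bucket, key, and file path are illustrative, not our production configuration.

```ruby
# minimal aws-sdk-s3 sketch; bucket, key, and path are illustrative only
require "aws-sdk-s3"

s3 = Aws::S3::Resource.new(region: "us-east-1")   # credentials come from the environment
object = s3.bucket("example-repository-files").object("deposits/123/dataset.zip")

# upload_file switches to multipart uploads for large files, which is what
# makes multi-gigabyte deposits practical compared with our previous stack
object.upload_file("/tmp/dataset.zip")
puts object.key
```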
we chose this alternative to see whether we would get increased reliability from our storage, to see how well we could manage data in s3, and to get a production service using it that would provide an example of the annual operating cost of using the cloud vendor. we were able to simplify our rollout a bit and modernize the technologies used to run our systems (i.e., docker containers and a kubernetes cluster) (see figure 3). development we had three general goals: (1) to improve stability/scalability for local needs; (2) to improve our ability to get an environment up for developers more simply; and (3) to be able to onboard new developers more quickly. shortly after our prototype test proved we could meet local needs in scalability, we were able to test out our second goal, getting a scholarsphere environment set up easily. the process of setting up a development environment went from days to hours. we had reached two of our three goals with these tests, and we believed that having our development team (which included two to three new developers) contribute to our first two goals was proof that we could onboard new developers quickly (our third goal). after several months of development in early 2020 we had removed several of the obstacles that had been in our way in recent years but were nowhere near feature parity with scholarsphere. figure 3. current infrastructure for scholarsphere, released in november of 2021. we had a rich feature set to transfer from the existing scholarsphere and did not want to simultaneously run two systems until we achieved some level of feature parity. we wanted to get to a minimum viable product (mvp) for our new prototype, migrate data, release our new version, and retire our existing system. our product owner had been working directly with scholarsphere users and was able to help us determine priorities for the features we needed in order to have an mvp. the following were some of those features: • an api, at the very least an internal one, for our migration script, other home-developed applications, and internal library employees • versioning and the ability to view versions • updated status (pending, published) • an updated user interface • urls that were harvestable • maintaining our data model for continued support of concepts such as collections • enhanced support for dois we also identified some features that had been developed over the years to either simplify or eliminate. the profiles within scholarsphere were not heavily used, and over the years the university had more mature systems for this type of purpose. similarly, finding a featured researcher for the home page seemed to create more work than it was worth, and our social media integrations were not going to be a priority. we also thought a user's dashboard—the default page after logging in—could be greatly simplified based on the most prominent actions our researchers wanted to perform. conclusion after a little over a year of development, in november 2020, we released our new version of scholarsphere. we used our own internal api, as planned, for data migration from our existing fedora commons storage system into the new one in amazon s3. over the past seven months we have done nine feature releases, including collections and an enhanced api to support penn state's open access initiative.
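the internal api in that feature list can be pictured as a small json endpoint. the sketch below is a hypothetical rails controller, assuming a rails application with active storage configured; the route, parameters, and model names are examples, not scholarsphere's actual api.

```ruby
# hypothetical internal ingest endpoint; names and parameters are examples only
module Api
  module V1
    class IngestController < ActionController::API
      # POST /api/v1/ingest with a title, a depositor, and an uploaded file
      def create
        work = Work.create!(
          title: params.require(:title),
          depositor: params.require(:depositor),
          visibility: params.fetch(:visibility, "private")
        )
        work.files.attach(params.require(:file))   # active storage, e.g., backed by s3
        render json: { id: work.id }, status: :created
      end
    end
  end
end
```

a migration script or another locally developed application could then deposit files with an ordinary http client instead of driving the web interface.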
we learned some lessons along the way within all of these lenses. we have also more than doubled the physical storage size of our repository since releasing in november 2020. over the summer, we were able to meet a faculty member's request to upload 30 to 40 videos of 300 to 400 gb, a request we never would have been able to meet in our prior solution. community & local needs working with the samvera community has provided countless opportunities for our entire team. we were able to sharpen our technical skills, were given opportunities to lead workshops, organized community development sprints, and collaborated on a plan for a community roadmap (to name a few). our entire team benefitted in several ways from the involvement in the community: our software knowledge is higher, our problem-solving skills are more creative, and our outside professional opportunities expanded. ultimately, our paths diverged in a way that made it difficult to justify the time and resources required for merging back. there are several benefits to community-based software: more eyes looking at potential security issues in code, more voices to let you know when a dependency of your code has become vulnerable, shared software ideas for development issues, and shared solutions for common problems. the cost of all these benefits is increased complexity in organizing a solution (you need to take multiple institutions into account), workflows for development (your local workflow may not be the same as the community-approved workflow), competing priorities within the community, and competing priorities between the community and local roadmaps. open source communities are largely online; these groups typically have a more shared, informal leadership structure, and that lack of formal leadership can make it difficult to find solutions to these complexities.14 team dynamics, and platform and infrastructure stability rewriting a system can be a daunting task, and several prominent developers would argue against it.15 reasons we believe we were successful are that (1) we did not change our data model, (2) although we changed our architecture, we did not change our coding conventions or our agile development process, and (3) the benefits of our changes were multidimensional. we were meeting users' needs with our development work and our infrastructure was enhancing our capabilities and making the work of our developers easier and less frustrating. our deployment process has improved to the point that we can perform a release easily and without downtime. our technology is no longer based on samvera and is now, largely, a more generic ruby on rails application. we migrated from using fedora as both a metadata and object store (retrieving objects on our central isilon system through fedora) to using postgres as a metadata store and amazon's s3 storage service for our files. we migrated our background job processing from resque to sidekiq. we continue to use the blacklight discovery and search interface, with solr as our search platform. many of these technical decisions were made because of the change in dynamics of our team, and perhaps the single biggest change was around experience and the confidence that comes with that. selecting a platform and infrastructure to support that platform is daunting.
it is particularly difficult when you have so many questions in front of you about how the system will be used, the demand it may be under, the need to scale, how to deploy new features and update dependencies, etc. our decisions in 2019 were made with much more experience and understanding of what was required of our system as well as what was desired by our users. this gave us the confidence to branch off slightly from the joint technical path and to recognize all the value (beyond technical solutions) of remaining members of the community, albeit in a modified capacity.

acknowledgements

many people have put tremendous time, effort, skill, thought, and enthusiasm into scholarsphere over the years. we want to acknowledge all those who have contributed to the development and advancement of the system and express our appreciation for their work: carolyn cole, hector correa, michael tribone, michael j. giarlo, adam wead, ryan schenk, jeff minnelli, dann bohn, justin patterson, joni barnoff, seth erickson, kieran etienne, calvin morooney, jim campbell, paul crum, chet swalina, matt zumwalt, justin coyne, elizabeth sadler, valerie maher, jamie little, brian maddy, kevin clair, patricia hswe, and beth hayes.

endnotes

1 helen hockx-yu, "digital preservation in the context of institutional repositories," program 40, no. 3 (2006): 232–43, https://doi.org/10.1108/00330330610681312.
2 raymond okon, ebele leticia eleberi, and kanayo kizito uka, "a web based digital repository for scholarly publication," journal of software engineering and applications 13, no. 4 (2020), https://doi.org/10.4236/jsea.2020.134005.
3 research data access and preservation, "browse data sharing requirements by federal agency," sparc, september 29, 2020, http://researchsharing.sparcopen.org/compare?ids=18&compare=data; "publisher data availability policies index," chorus, october 8, 2021, https://www.chorusaccess.org/resources/chorus-for-publishers/publisher-data-availability-policies-index/.
4 "registry of open access repository mandates and policies," roarmap, http://roarmap.eprints.org/view/country/840.html.
5 "student enrollment – fall 2021," the pennsylvania state university data digest 2021, https://datadigest.psu.edu/student-enrollment/.
6 stephen abrams, john kunze, and david loy, "an emergent micro-services approach to digital curation infrastructure," the international journal of digital curation 5, no. 1 (2010): 172–86, https://doi.org/10.2218/ijdc.v5i1.151.
7 jennifer marlow and laura dabbish, "activity traces and signals in software developer recruitment and hiring," in cscw '13: proceedings (acm, 2013): 145–56, https://doi.org/10.1145/2441776.2441794.
8 dai clegg and richard barker, case method fast-track: a rad approach (reading: addison-wesley, 1994).
9 "questioning authority," github, accessed september 2021, https://github.com/samvera/questioning_authority; "browse-everything," github, accessed september 5, 2021, https://github.com/samvera/browse-everything.
10 sophie huilian qiu et al., "going farther together: the impact of social capital on sustained participation in open source," 2019 ieee/acm 41st international conference on software engineering (icse) (2019): 688–99, https://doi.org/10.1109/icse.2019.00078.
11 adam wiggins, "the twelve-factor app," accessed september 2021, http://12factor.net.
12 akond rahman, chris parnin, and laurie williams, "the seven sins: security smells in infrastructure as code scripts," 2019 ieee/acm 41st international conference on software engineering (icse) (2019): 164–75, https://doi.org/10.1109/icse.2019.00033.
13 christopher mendez et al., "open source barriers to entry, revisited: a sociotechnical perspective," in proceedings of the 40th international conference on software engineering (may 2018): 1004–15, https://doi.org/10.1145/3180155.3180241.
14 lindsay larson and leslie a. dechurch, "leading teams in the digital age: four perspectives on technology and what they mean for leading teams," leadership quarterly 31, no. 1 (2020), https://doi.org/10.1016/j.leaqua.2019.101377.
15 frederick p. brooks jr., the mythical man-month: essays on software engineering (reading, mass.: addison-wesley pub. co., 1982), https://search.library.wisc.edu/catalog/999550146602121; joel spolsky, "things you should never do, part i," joel on software, april 6, 2000, https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/.

editorial board thoughts
the fourth industrial revolution: does it pose an existential threat to libraries?
brady lund
information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.13193
brady lund (blund2@g.emporia.edu) is a doctoral student at emporia state university's school of library and information management and a member of the ital editorial board. © 2021. "editorial board thoughts" is a regular column contributed by a member of the ital editorial board. opinions do not necessarily reflect those of the editorial board as a whole.

no. no, it does not. not any more than any other technological innovation (information systems, personal computers, the internet, e-readers, google, google scholar) did. however, what is very likely is that the technologies that emerge from this era will slowly (but surely) lead to profound changes in how libraries operate. those libraries that fail to understand or embrace these technologies may, in fact, be left behind.
so, we must, as always, stay abreast of trends in emerging technologies and what the literature (i.e., articles in this journal) propose as ideas for adopting (and adapting) them to better serve our patrons. with this column, my aim is to briefly discuss what the fourth industrial revolution is and its relevance within our profession. the “fourth” industrial revolution? the term “fourth industrial revolution” describes the evolution of information technology towards greater automation and interconnectedness. it includes or incorporates technological advancements such as artificial intelligence, blockchain, advanced robotics, the internet of things, autonomous vehicles, virtual reality, 3d printing, nanotechnology, and quantum computing.1 imaginations can run amok with idyllic visions of walt disney’s epcot—a utopian world of interconnectedness and efficiency—or dystopian nightmares captured in the mind of stephen king, george orwell, or pixar’s wall-e. if we have learned anything from the past though—and after the last 12 months i cannot be entirely sure of that—it is that reality is likely to settle somewhere in the middle. some things will improve dramatically in our lives but there will also be negative impacts—funding changes, learning curves, and maybe a bit of soul-searching within the profession (not that that last one is necessarily a bad thing). the “fourth industrial revolution” is referred to as such because historians of technological and industrial innovation have placed the period as fourth in a line of major shifts in technological innovation (shocking, no?). the first industrial revolution was the industrial revolution, the one we were taught to call the “industrial revolution” in high school: the period from the late-18th to mid-19th century where rapid innovation in the areas of agriculture and manufacturing transformed the economy, created a market for invention and profiteering, and formed a true “working class” of laborers. as idyllic as it sounds in our history classes, it was not a particularly pleasant time for the average laborer. there is a reason why the communist manifesto, and the whole concept of the social sciences, emerged from this era. but this era also brought us the modern (semi-modern?) library. mailto:blund2@g.emporia.edu information technology and libraries march 2021 the fourth industrial revolution | brady 2 the second industrial revolution, the “technological revolution,” occurred from the late-19th to early-20th century. it was all about the power of electricity and the engine. here we have the emergence of vast telegraph networks, lightbulbs, automobiles, and airplanes. very nice stuff. you also had a lot of economic uncertainty that led to a few depressions and wars (but do not focus on that too much, readers). the third industrial revolution, the “digital revolution,” is the first revolution that we really have a good record of within library and information science. it is why we have that “information science” part in the name. it is why we have this journal that you are reading now. it was the era of digital computers and computer networks. yes, it changed a lot—some for the better, some for the worse. if you are a reader of this journal, you would probably say—at least in terms of library services—that this era was full of fairly positive developments for libraries. 
we gained digital catalogs, electronic databases, integrated library systems, the internet, and microsoft office, all developments the increase the ease and efficiency of our everyday work tasks. so, that leads us to the precipice that we are on now. the fourth industrial revolution. while uncertainty provokes trepidation, there is much we can do to inform ourselves—much more so than with previous revolutions (we could not very well turn to the internet to learn about the internet). there are scenarios on both ends of the spectrum that ranges from utopia and doom — and it’s not altogether bad to read about the risks that automation presents, such as described by andrew yang and other political and economic figures—but if history and precedent mean anything, it is likely that we will see many things in our economy, and in our libraries, change substantially but certainly not vanish. the role of libraries in the fourth industrial revolution we have already been adopting many of these fourth industrial revolution technologies for quite some time—whether we were cognizant of it or not. a lot of what could be called “artificial intelligence” already exists in our library systems and more is on the way if current projects being conducted around the globe come to fruition.2 there are a lot of “pie in the sky” ideas of what the future of libraries will look like with the fourth industrial revolution—some of which were even published in this very journal. it is good to have these perspectives, even if they do seem a bit unrealistic and/or dystopian. we need to consider the possibilities of this era while also understanding the practicalities. our goal as library technologists—to serve the information needs of patrons to the best of our abilities—should never falter. if this is true of our professional ethics and values, then we must be prepared to sacrifice and adapt to change for the common good. one thing i am quite certain of is that these technological innovations will not spell doom for libraries. libraries are resilient and librarians themselves mean more to patrons than a computer interface. we do not have to go back too far to see how libraries responded to a similar disruptive period: the introduction of the internet. if you want to read an interesting parallel to this editorial, check out david raitt’s 1994 editorial in the electronic library (and hopefully not feel too old when you discover that i was born in the year that it was published). the purpose of raitt’s editorial was to answer the question that considering developments with the internet, “will librarians still be around in 2024, and if so what are they likely to be doing?”3 boy, i sure hope they are around in three years, or i really made a poor career choice. the ease with which raitt dismisses any concern about the internet spelling the end of libraries is delightful to read: “are librarians so insecure about their profession and future?... (in 2024) librarians will still be doing information technology and libraries march 2021 the fourth industrial revolution | brady 3 what they do now and what they have always done, only they will have more new-fangled technology to help them do it.”4 you could argue that raitt’s prediction is not entirely correct. some things about libraries have changed considerably. but have any of these changes been decidedly for the worse? even the most curmudgeonly postmodernist must concede that libraries serve their patrons better now than before the internet age. 
libraries, on the whole, are still very much the same. we have not undergone massive upheavals in our professional values and ethics. some job duties have changed (such as a greater emphasis on instruction in academic libraries) but it has not spelled total doom for our librarians. so, people, turn and face the strange, because there is nothing too serious to fear with coming changes. and, if my prediction turns out to be less accurate than raitt’s, well, just remember that my name is “john barron” (and metadata that indicates to the contrary is fake news). the changes that have been clumped under the buzzword of the “fourth industrial revolution” will do a lot to advance the mission of libraries. right now, a lot of the “how?” can seem a bit hazy, but check out some of these library school programs that are working to answer that very question and i think you will get a good idea: • blockchains for the information profession (san jose state university ischool): https://ischoolblogs.sjsu.edu/blockchains. o an imls-funded program that examines applications of blockchain in libraries. • artificial intelligence (a program of stanford university libraries): https://library.stanford.edu/projects/artificial-intelligence/. o a university-supported program that examines applications of ai in libraries. • the good systems program (university of texas ischool, in partnership with a bunch of other departments on campus): https://bridgingbarriers.utexas.edu/good-systems. o this program focuses on ethical uses of ai to improve lives. while it does not necessarily focus specifically on libraries, its founding members are connected to the university’s library school and many of the program’s “products” have direct relevance to libraries. library and information technology is one area where academia and scholarly research offer a lot of useful knowledge and ideas for the professional librarian. there are a lot of great ideas from researchers and programs at these schools, in addition to the investments made by industry leaders like oclc, that suggest libraries are not going to be left behind during the “coming revolution.”5 expect the ideas filtering from these programs, and others, to take greater hold in practical library settings in coming years—as the internet did in the mid-to-late 1990s and social media did in the period of (approximately) 2008–2013. like with these past innovations, the extent of adoption will likely vary from library-to-library. i will avoid a lecture on diffusion of innovations here, though it is one of my favorite “simple” theories (a few suggested readings for those who are interested are chatman’s 1986 article—which is a bit more technical—and minishi-majanja and kiplang'at’s 2005 article).6 what is important to expect is that this will not all just be a “flash in a pan” like we have seen before with some technologies. these technologies in the “fourth industrial revolution” will bring about real change in our world. if we are “ahead of the curve” (to reference a diffusion concept), we will be well-positioned to adapt to the changes to come. https://ischoolblogs.sjsu.edu/blockchains https://library.stanford.edu/projects/artificial-intelligence/ https://bridgingbarriers.utexas.edu/good-systems information technology and libraries march 2021 the fourth industrial revolution | brady 4 references 1 klaus schwab, the fourth industrial revolution (new york, ny: penguin books, 2017). 
2 jason griffey, artificial intelligence and machine learning in libraries (chicago, il: american library association, 2019).
3 david raitt, "the future of libraries in the face of the internet," the electronic library 12, no. 5 (1994): 275.
4 ibid.
5 thomas padilla, responsible operations: data science, machine learning, and ai in libraries (dublin, oh: oclc research, 2019), https://doi.org/10.25333/xk7z-9g97.
6 elfreda a. chatman, "diffusion theory: a review and test of a conceptual model in information diffusion," journal of the american society for information science 37, no. 6 (1986): 377–86; mabel k. minishi-majanja and joseph kiplang'at, "the diffusion of innovations theory as a theoretical framework in library and information science research," south african journal of libraries and information science 71, no. 3 (2005): 211–24.

editorial board thoughts column
getting to yes: stakeholder buy-in for implementing emerging technologies in your library
ida joiner
information technology and libraries | september 2018
ida a. joiner (ida.joiner@gmail.com), a member of lita and the ital editorial board, is the librarian at the universal academy school in irving, texas. she is the author of "emerging library technologies: it's not just for geeks" (elsevier, august 2018).

have you ever wanted to implement new technologies in your library or resource center (such as drones, robotics, artificial intelligence, augmented/virtual/mixed reality, 3d printing, wearable technology, and others) and presented your suggestions to your stakeholders (board members, directors, managers, and other decision makers) only to be rejected with "there isn't enough money in the budget," "no one is going to use the technology," or "we like things the way that they are"? then this column is for you. i am very passionate about emerging technologies, how they are and will be used in libraries and resource centers, and how librarians will be able to assist those who will be affected by these technologies. i recently published a book introducing emerging technologies in libraries. i came up with suggestions on how doing your research—including the questions below and those on the accompanying checklist—will prepare you to meet with your stakeholders and improve the likelihood of your emerging technology proposal being approved.

1. who are your stakeholders? include them early on in the process. determine who your stakeholders are, what their areas of expertise are, and how they can support your emerging technology projects. the most critical piece to getting your stakeholders on board to support your technology initiatives is addressing the question "what's in it for them?" this will get their attention and increase your odds of getting a "yes" to your technology initiatives.

2. what are the costs? research what your costs will be and create a budget. find innovative ways to fund your initiatives by researching grants, strategic partnerships with others who might be interested in partnering with you, and locating other funding opportunities.

3. what are the risks? identify any potential risks so that you are prepared to discuss how you will mitigate them when you meet with your stakeholders.
some potential risks that you might want to address are budget cost overruns; staffing issues, such as a key person resigning or going on maternity or sick leave; or whether policies are in place to deter patrons from trying to use the technology for criminal means.

4. what is the timeline and key milestones? address the timeline for when you want or need to implement these technologies. have you planned for key milestones and possible delays, such as funds not being available? you need to have a detailed timeline, from your first kickoff meeting with your initiative's team, to your stakeholder meeting where you present your proposal, to getting signoff on the project.

5. what training will you offer? perform a needs assessment to determine who will need to be trained, what training you will offer, what your training costs will be, and who will pay for them. once you have all of this in place, you will select the trainer(s) and the training model (such as "train the trainer") that you will use.

6. how will you market your technology initiatives? will you rely on social media to market your technology initiatives? will you collaborate with your marketing department to develop your message through press releases, websites, blogs, e-newsletters, flyers, and other media outlets? you will need to meet with your marketing and publications experts to plan how you will market your emerging technology initiatives, along with your costs and who will pay them.

7. who is your audience and how can you engage them? this is one of the most important areas to address in your proposal to present to your stakeholders. without our patrons, there is no library. you will need to determine who your audience is and how you can utilize the emerging technologies to assist them. are they k to 12 students, adults who will be displaced by these technologies, technology novices who want to learn more about these technologies, or university faculty and/or students who want to use the technology for their projects? you can address all of these potential audiences in your proposal to your stakeholders.

these are just a few tips on how to get stakeholder buy-in for implementing emerging technologies in your library. feel free to share some of your own successes in getting stakeholders on board to implement emerging technologies in your library or resource center.

emerging technology stakeholder buy-in questionnaire

i have included questions below that you should work through when you are considering getting your stakeholders on board to implement new emerging technologies in your library. if you address all of these, you have a very good chance of getting your stakeholders on board to support your initiatives.

1. what technologies do you want to implement in your library/resource center and why do you want them?
2. who are your stakeholders and what are their backgrounds?
3. why should your stakeholders support your technology initiatives?
4. what is your budget for your new technology initiatives?
5. what training is needed to support these initiatives, who will provide the training, what are the costs, and who will pay for the training?
6. how will you market these technology initiatives, what are the costs, and who will pay for them?
7. did you perform a cost-benefit analysis for these technology initiatives?
8. are there legal fees? if so, what are they, and who will pay for them?
9. what are the risks?
10. what are the returns on the investment (roi)?
11. what strategic partnerships can you establish?
12. what is your timeline for implementing these technology initiatives?

a systematic approach towards web preservation
muzammil khan and arif ur rahman
information technology and libraries | march 2019
muzammil khan (muzammilkhan86@gmail.com) is an assistant professor, department of computer and software technology, university of swat. arif ur rahman (badwanpk@gmail.com) is an assistant professor, department of computer science, bahria university islamabad.

abstract

the main purpose of this article is to divide the web preservation process into small explicable stages and to design a step-by-step web preservation process that leads to a well-organized web archive. a number of research articles about web preservation projects and web archives were studied, and a step-by-step systematic approach for web preservation was designed. the proposed comprehensive web preservation process describes and combines the strengths of different techniques observed during the study for preserving digital web contents in a digital web archive. for each web preservation step, different approaches and possible implementation techniques have been identified that can be adopted in digital archiving. the potential value of the proposed model is to guide archivists, related personnel, and organizations to effectively preserve their intellectual digital contents for future use. moreover, the model can help to initiate a web preservation process and create a well-organized web archive to efficiently manage the archived web contents. a section briefly describes the implementation of the proposed approach in a digital news stories preservation framework for archiving news published online from different sources.

introduction

the amount of information generated by institutions is increasing with the passage of time. one of the mediums used to share this information is the world wide web (www). the www has become a tool to share information quickly with everyone regardless of their physical location. the number of web pages is vast: google and bing each index approximately 4.8 billion.1 though the www is a rapidly growing source of information, it is fragile in nature. according to the available statistics, 80 percent of pages become unavailable after one year and 13 percent of links (mostly web references) in scholarly articles are broken after 27 months.2 moreover, 11 percent of posts and comments on websites for various purposes are lost within a year. according to another study conducted on 10 million web pages collected from the internet archive in 2001, the average survival rate of web pages is 1,132.1 days with a standard deviation of 903.5 days; 90.6 percent of those web pages are inaccessible today.3 this information fragility causes valuable scholarly, cultural, and scientific information to vanish and become inaccessible to future generations. in recent years, it was realized that the lifespan of digital objects is very short, and rapid technological changes make it more difficult to access these objects.
therefore, there is a need to preserve the information available on the www. digital preservation is performed using the primary methods of emulation and migration, in which emulation provides the preserved digital objects in their original format while migration provide objects in a different format.4 in the last systematic approach towards web preservation | khan and ur rahman 72 https://doi.org/10.6017/ital.v38i1.10181 two decades, a number of institutions worldwide, such as national and international libraries, universities, and companies started to preserve their web resources (resources found at a web server, i.e., web contents and web structure). the first web archive was initiated in 1996 by brewster kahle, named the internet archive, and it holds more than 30 petabytes data, which includes 279 billion web pages, 11 million books and texts, and 8 million other digital objects such as audio, video, image files, etc. more than seventy web archive initiatives were started in 33 countries since 1996, which shows the importance of web preservation projects and preservation of web contents. this information era encourages librarians, archivists, and researchers to preserve the information available online for upcoming generations. while digital resources may not replace the information available in physical form, the digital version of these information resources improves access to the available information.5 there are different aspects of the preservation process and web archiving, e.g., digital objects’ ingestion to the archive during preservation process, digital object’s format and storage, archival management, administrative issues, access and security to the archive, and preservation planning. these aspects need to be understood for effective web preservation and will help in addressing the challenges that occur during the preservation process. the reference model for open archival information system (oais) is an attempt to provide a high-level framework for the development and comparison of digital archives. in web preservation, a challenging task is to identify the starting point of the preservation process and to effectively complete the process which help to proceed further to the other activities. therefore, the complicated nature of the web and the complex structure of the web contents make the preservation of the web content even more difficult. the oais reference model helps in achieving the goals of a preservation task in a step-by-step manner. the stakeholders are identified, i.e., producer, management, and consumer, and the packages, i.e., submission information package (sip), archival information package (aip) and dissemination information package (dip), which need to be processed, are clearly defined.6 this study aims to design a step-by-step systematic approach for web preservation that helps to understand preservation or archival activities’ challenges, especially those that relate to digital information objects at various steps of the preservation process. the systematic approach may lead to an easy way to analyze, design, implement, and evaluate the archive with clarity and different options for an effective preservation process and archival development. an effective preservation process is one that leads to a well-organized, easily managed web archive and accomplishes designated community requirements. this approach may help to address the challenges and risks that confront archivists and analysts during preservation activities. 
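as a simplified illustration of the sip and aip packages defined in the oais reference model discussed above, the short python sketch below models an ingest step that turns a submission package into an archival package. the field names and the sha-256 fixity value are illustrative assumptions, not part of the standard's formal data model.

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SubmissionInformationPackage:
    # what the producer hands to the archive: content plus descriptive metadata
    identifier: str
    content: bytes
    descriptive_metadata: dict

@dataclass
class ArchivalInformationPackage:
    # what the archive keeps: the content plus preservation information added at ingest
    identifier: str
    content: bytes
    descriptive_metadata: dict
    fixity_sha256: str
    ingested_at: str

def ingest(sip: SubmissionInformationPackage) -> ArchivalInformationPackage:
    # compute a checksum so later audits can detect corruption of the stored object
    digest = hashlib.sha256(sip.content).hexdigest()
    return ArchivalInformationPackage(
        identifier=sip.identifier,
        content=sip.content,
        descriptive_metadata=sip.descriptive_metadata,
        fixity_sha256=digest,
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )

sip = SubmissionInformationPackage(
    identifier="news-2019-0001",
    content=b"<html>archived page snapshot</html>",
    descriptive_metadata={"title": "example news story", "source": "example.com"},
)
aip = ingest(sip)
print(aip.fixity_sha256, aip.ingested_at)

a dissemination information package (dip) would be derived from the aip in a similar way when the archive responds to a consumer request.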
step-by-step systematic approach digital preservation is “the set of processes and activities that ensure long-term, sustained storage of, access to and interpretation of digital information.”7 the growth and decline rates of www content and the importance of the information presented on the web make it a key candidate for preservation. web preservation confronts a number of challenges due to its complex structure, a variety of available formats, and the type of information (purpose) it provides. the overall layout of the web varies domain to domain based on the type of information and its presentation. the websites can be categorized based on two things. first, the type of information (i.e., the web information technology and libraries | march 2019 73 contents) and second, the way this information presented (i.e., the layout or structure of the web page. examples include educational, personal, news, e-commerce, and social networking websites, which vary a lot in their contents and structure. the variations in the overall layout make it difficult to preserve different web contents in a single web archive. the web preservation activities are summarized in figure 1. the following sections explain the web preservation activities and possible implementation in proposed systematic approach. defining the scope of the web archive the www provides an opportunity to share information using various services, such as blogs, social networking websites, e-commerce, wikis, and e-libraries. these websites provide information on a variety of topics and address different communities based on their interest and needs. there are many differences in the way the information is handled and presented on the www. in addition, the overall layout of the web changes from one domain to another domain.8 therefore, it is not practically feasible to develop a single system to preserve all types of websites for the long term. so, before starting to preserve the web, one (the archivist) should define the scope of the web to be archived. the archive will be either a site-centric, topic-centric, or domaincentric archive.9 site-centric archive a site-centric archive focuses on a particular website for preservation. these types of archives are mostly initiated by the website creator or owner. the site-centric web archives allow access to the old versions of the website. topic-centric archive topic-centric archives are created to preserve information on a particular topic published on the web for future use. for scientific verification, researchers need to refer to the available information while it is difficult to ensure access to these contents due to the ephemeral nature of the web. a number of topic-centric archive projects have been performed including the archipol archive of dutch political websites,10 the digital archive for chinese studies (dachs) archive2,11 minerva by the library of congress,12 and the french elections web archive for archiving the websites related to the french elections.13 domain-centric archive the word “domain” refers to a location, network, or web extension. a domain-centric archive covers websites published with a specific domain name dns, using either a top-level domain (tld), e.g., .com, .edu, or .org, or a second-level domain (sld), e.g., .edu.pk or .edu.fr. an advantage of domain-centric archiving is that it can be created by automatically detecting specific websites. 
several projects have a domain-centric scope, e.g., the portuguese web archive (pwa) national websites,14 the kulturarw, a swedish royal library web archive collection of.se and .com domain websites,15 and the uk government web archive collection of uk government websites, e.g., .gov.uk domain websites. understanding the web structure after defining the scope of the intended web archive, the archivist will have a better understanding of the interest and expected queries of the intended community based on the resources available or the information provided by the selected domain. the focus in this step is to understand the type of information (contents) provided by the selected domain and how the information has been presented. the web can be understood by two dimensions. the first systematic approach towards web preservation | khan and ur rahman 74 https://doi.org/10.6017/ital.v38i1.10181 figure 1. systematic approach for web preservation process. information technology and libraries | march 2019 75 considers the web as a medium that communicates contents using various protocols, i.e., http, and the second considers the web as a content container, which further presents the contents to the viewers and not simply contents, e.g. the underlying technology used to display the contents.16 the preservation team should understand such parameters as the technical issues, the future technologies, and the expected inclusion of other related content. identify the web resources the archivist should understand the contents and the representation of the contents of the selected domain, e.g., blogs, social networking websites, institutional websites, educational institutional websites, newspaper websites, or entertainment websites. all of these websites provide different information and address individual communities that have distinct information needs. a web page is the combination of two things, i.e., web contents and web structure.17 the resources which can be preserved are as follows. web contents web contents or web information can be categorized into the following categories: • textual contents (plain text): this category describes textual information that appears on a web page. it does not include links, behaviors, and presentation stylesheets. • visual contents (images): these contents are the visual forms of information or are a complementary material to the information provided in the textual form. • multimedia contents: as another form of information, multimedia contents mainly include audio and video. it may also include animation or even text as a part of a video or a combination of text, audio, and video. web structure web structure can be categorized in the following categories: • appearance (web layout or presentation): this category indicates the overall layout or presentation of a web page. the look and feel of a web page (representation of the contents) are important, which is maintained with different technologies, e.g., html or stylesheets, etc. • behavior (code navigations): categorized by link navigations, these can be within a website or to other websites, external document links or dynamic and animated features, such as live feed, comments, tagging, or bookmarking. identify designated community the archivist should identify the designated community of the intended web archive, their functional requirements and expected queries by analyzing them carefully. 
the designated community means the potential users, such as those who can access the archived web contents for different purposes, i.e., accessing old information that is not available in normal circumstances or referring to an old news article which is not bookmarked properly or retrieving relevant news articles published long ago, etc. prioritize the web resources after a comprehensive assessment of the resources of the selected domain and the identification of potential users’ requirements and expected queries, the archivist should prioritize the web systematic approach towards web preservation | khan and ur rahman 76 https://doi.org/10.6017/ital.v38i1.10181 resources. the complexity of web resources and their representation cause complications in the digital preservation process. generally, it may be undesirable or unviable to preserve all web resources; therefore, it is worthwhile to designate the web resources for preservation. the priority should be assigned on the basis of two things: first, the potential reuse of the resource and second, the frequency with which the resource will be accessed. the resources with no value, little value, or those managed elsewhere can be excluded. for prioritization of resources, the moscow method can be applied.18 the acronym moscow can be elaborated as: m must have, the resource must be preserved or resources that must be a part of the archive and preserved. for example, in the digital news story archive (dnsa), the textual news story must be preserved in the archive because the preservation emphasis is on a textual news story.19 online news contains textual news stories, and many news stories contain associated images, and a fraction of news stories contain associated audio-video contents. s should have, the resource should be preserved if at all possible. almost all the news stories have associated images; a few news stories have associated audio and video that complement it and should be preserved as a part of the news story in the web archive. c could have, the resource could be preserved if it does not affect anything else or is nice to have. the web structure in dnsa depends on the resources to be used for the preservation of news stories; the layout of the newspaper website could (c) be a part of the preservation process if it does not affect anything, e.g., storage capacity and system efficiency. w won’t have, the resource would not be included. archiving multiple versions of the layout or structure of the online newspaper are not worthwhile and hence would not (w) be preserved. the prioritization of these resources is very important in the context of web preservation planning because it does not waste time and energy, and it is the best way to handle users’ requirements and fulfill their expected queries. how to capture the resource(s) the selection of a feasible capturing technique depends on: first, the resources to be captured and second, the capturing task frequency. there are three web resources capturing techniques, i.e., by browser, web crawler, and authoring system. each capturing technique has associated advantages and disadvantages.7 web capturing using browsers the intended web content can be captured using browsers after a web page is rendered when the http transaction occurs. this technique is also referred to as a snapshot or post-rendering technique. the method captures those things which are visible to the users; the behavior and other attributes remain invisible. 
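a post-rendering snapshot of this kind can be scripted with a headless browser. the sketch below is a minimal example using selenium with headless chrome to render a page and save a png image of what the user would see; the target url and output file are placeholders, and it assumes a chrome/chromedriver installation is available to selenium.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def snapshot(url: str, out_path: str) -> None:
    # render the page in a headless browser so scripts and styles are applied first
    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # save only what is visible after rendering; behavior and markup are not kept,
        # which is exactly the limitation of snapshot-style capture noted in the text
        driver.save_screenshot(out_path)
    finally:
        driver.quit()

if __name__ == "__main__":
    snapshot("https://example.com/", "example-snapshot.png")

as discussed next, the resulting image preserves only the rendered appearance; links, behavior, and the underlying structure are lost.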
capturing static contents is one of the disadvantages of web capturing by the browser approach, this approach generally preserved contents in the form of images. it is best for well-organized websites, and commercial tools are available for capturing the web. the following are well-known tools to capture web using browsers. webcapture (https://web-capture.net/) is a free online web-capturing service. it is a fast web page snapshot tool, which can grab web pages in seven different formats, i.e. jpeg, tiff, png, bmp information technology and libraries | march 2019 77 image formats, pdf, svg, and postscript files of high quality. it also allows downloading the intended format in a zip file and is suitable for long vertical web pages with no distortion in layout. a.nnotate (http://a.nnotate.com/), is an online annotating web snapshot tool to keep track of information gathered from the web efficiently and easily. it allows adding tags and notes to the snapshot and building a personal index of web pages as document index. the annotation feature can be used for multiple purposes, for example, compiling an annotated library of objects for organization, sharing commented web pages, product comparison, etc. snagit (https://www.techsmith.com/screen-capture.html) is a well-known snapshot tool for capturing screens with built-in advanced image editing features and screen recording. snagit is a commercial and advanced screen capture tool that can capture web pages with images, linked files, source code, and the url of the web page. acrobat webcapture (file > create > pdf from web page...) creates a tagged pdf file from the web page that a user visits while the adobe pdf toolbar is used for the entire website.20 the capture by a browser technique has the following advantages: • by this technique, the archivist can capture only the displayed contents, and it is an advantage if you need to preserve the displayed contents only. • it is a relatively simple technique for well-organized websites. • commercial tools exist for web capturing using browsers. in addition, the disadvantages are the following: • capturing displayed contents only is a disadvantage if the focus is not on only displayed contents. • it results in frozen contents and treats contents as if they are publications. • it loses the web structure, such as appearance, behavior, and other attributes of the web page. web capturing using an authoring system/server the authoring system capturing technique is used for web harvesting directly from the website hosting server. all the contents, e.g., textual information, images, and source code, are collected from the source web server. the authoring system allows the archivist to preserve the different versions of the website. the authoring system depends on the infrastructure of the content management system and is not a good choice for external resources. the system is best for an owned web server and works well for limited internal purposes. the web curator tool (http://webcurator.sourceforge.net/), pandas (an old british library harvesting tool), and netarchivesuite (https://sbforge.org/display/nas/netarchivesuite) are known tools use for planning and scheduling web harvesting. they can be used by non-technical personnel for both selection and harvesting web content selection policies. these web archiving tools were developed in a collaboration of the national library of new zealand and the british library and are used for the uk web archive (http://www.ariadne.ac.uk/issue50/beresford/). 
the tools can interface with web crawlers, such as heritrix (https://sourceforge.net/projects/archivecrawler/). authoring systems are also referred to as workflow systems or curatorial tools.

the authoring system has the following advantages:
• it is best for web harvesting, which captures everything available.
• it is easy to perform if you have the proper access permission or you own the server or system that holds the resources to be captured.
• it works well for short- to medium-term resources and is feasible for internal access within organizations.

the disadvantages of web capturing using the authoring system are:
• it captures all available raw information, not only the presentation.
• it may be too reliant on the authoring infrastructure or the content management system.
• it is not feasible for long-term resources or for external access from outside the organization.

web capturing using web crawlers

web crawlers are perhaps the most widely used technique for capturing web contents in a systematic and automated manner.21 crawler development requires expertise and experience with different tools, i.e., the strengths and weaknesses of the technologies and the viability of a tool in a specific scenario. the main advantage of crawlers is that they extract embedded content. heritrix, httrack, wget, and deeparc are common examples of web crawlers.

heritrix (https://github.com/internetarchive/heritrix3/wiki) is an open source, freely available web crawler written in java and developed by the internet archive. heritrix is one of the most widely used extensible, web-scale crawlers in web preservation projects. initially, heritrix was developed for purpose-specific crawling of particular websites; it is now a flexible, customizable web crawler for archiving the web.

httrack (https://www.httrack.com/) is a freely available, configurable browser utility. httrack crawls html, images, and other files from a server to a local directory and allows offline viewing of the website. the httrack crawler downloads a complete website from the web server to a local computer system and makes it available for offline viewing with all related link structure, so that browsing the copy feels like using the site online. it also updates the archived websites at the local system from the server and resumes any interrupted previous extractions. httrack is available for both windows and linux/unix operating systems.

wget (http://www.gnu.org/software/wget/) is a freely available, non-interactive command line tool that can easily be combined with other technologies and different scripts. it can capture files from the web using the widely used ftp, ftps, http, and https protocols, and it supports cookies as well. it also updates the archived websites and resumes interrupted extractions. wget is available for both microsoft windows and unix operating systems.

the advantages of web crawling:
• it is widely used among capturing techniques.
• it can capture specific content or everything.
• it avoids some access issues, such as link rewriting and embedded external content, whether served from an archive or live.

disadvantages associated with web crawling:
• much work is required, as well as tools or development expertise and experience, etc.
• the web crawler may not have the right scope: sometimes it does not capture everything that it should, and sometimes it captures too much content.
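to make the crawling idea concrete, the following minimal python sketch uses the requests and beautifulsoup libraries to start from a seed url, stay within a single host (a crude form of scoping), and save each fetched html page to disk. it is an illustration of the technique only, not a substitute for production crawlers such as heritrix or wget; the seed url and output directory are placeholders, and it ignores robots.txt, politeness delays, and non-html resources.

from collections import deque
from pathlib import Path
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

SEED = "https://example.edu/"          # placeholder seed / entry point
OUT = Path("archive")                  # placeholder output directory
ALLOWED_HOST = urlparse(SEED).netloc   # crude domain-centric scope rule

def crawl(seed: str, limit: int = 50) -> None:
    OUT.mkdir(exist_ok=True)
    queue, seen = deque([seed]), set()
    while queue and len(seen) < limit:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        resp = requests.get(url, timeout=30)
        if resp.status_code != 200 or "text/html" not in resp.headers.get("content-type", ""):
            continue
        # store the raw html under a filename derived from the url path
        name = urlparse(url).path.strip("/").replace("/", "_") or "index"
        (OUT / f"{name}.html").write_text(resp.text, encoding="utf-8")
        # extract embedded links and keep only those inside the allowed host
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            if urlparse(link).netloc == ALLOWED_HOST:
                queue.append(link)

if __name__ == "__main__":
    crawl(SEED)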
web content selection policy in the previous steps, the web resources are identified, prioritized based on requirements and expected queries of the designated community, and feasible capturing technique is identified based on capturing frequency. now, the contents need to be prepared and filtered for selection, and a feasible selection approach needs to be selected based on the contents. a web content selection policy helps to determine and clarify, which web contents are required to be captured based on the priorities, the purpose and the scope of web contents already defined.22 the decision of the selection policy comprises the description of the context, the intended users, the access mechanisms and the expected uses of the archive. the selection policy may comprise the selection process and selection approach. the selection process can be divided into subtasks which, in combination, provide a qualitative selection of web contents to a certain extent, i.e., preparation, discovery, and filtering, as shown in figure 2. the main objective of the preparation phase is to determine the targeted information space, the capture technique, capturing tools, extension categorization, granularity level, and the frequency of archiving activity. the best personnel who can provide help in preparation are the domain experts, regardless of the scope of the web archive. the domain experts may be the archivists, researchers, librarians, or any other authentic reference, i.e. a document or a research article. the tools defined in the preparation phase will help to discover intended information in the discovery phase, which can be divided into the following four categories: 1. hubs may be the global directories or topical directories, collection of sites or even a single web page with essential links related to a particular subject or topic. 2. search engines can facilitate discovery by defining a precise query or set of alternative queries related to a topic. the use of specialized search engines can significantly improve the results of discovering related information that can be greatly improved. 3. crawlers can be used to extract web contents such as textual information, images, audio, video and links. moreover, the overall layout of a web page or a whole website can also be extracted in a well-defined systematic manner. 4. external sources may be non-web sources that may be anything, such as printed material for mailing lists, which can be monitored by the selection team. the main objective of the discovery phase is to determine the source of information to be stored the archive. this determination can be achieved by two ways. first, a manually created entry point list is used to determine the list of entry points (usually links) for crawling the collection manually and updating the list during the crawl. there are two discovery methods, i.e., exogenous and endogenous. exogenous discovery is used in manual selection and mostly relies on exploitation of an entry point list for hubs, search engines, and on non-web documents. second, there is an automatically created entry point list to determine the list of entry points by extracting links automatically and obtaining an updated list every time during the crawl. endogenous discovery is systematic approach towards web preservation | khan and ur rahman 80 https://doi.org/10.6017/ital.v38i1.10181 used in automatic selection and relies on the link extraction using crawlers by exploring the entry point list. figure 2. selection process. 
the main objective of the filtering phase is to optimize and make concise the discovered web contents (discovery space). filtering is important in order to collect more specific web content and remove unwanted or duplicated content. usually, for preservation, an automatic filtering method is used; manual filtering is useful if the robots or automatic tools cannot interpret the web. the discovery and filter phase can be combined practically or logically. several evaluation axes can be used for the selection policy (e.g., quality, subject, genre, and publisher). in the literature, we have three known techniques for selecting web content. the selection approach can be either automatic or manual. manual content selection is very rare because it is labor intensive: it requires automatic tools for finding the content, and then manual review of that collection to identify the subset that should be captured. automatic selection policies are used frequently in web preservation projects for web collection, especially for web archives.23 the selection of the collection approach depends on the frequency with which the web content has been preserved in the archive. there are four different selection approaches for web content collection. unselective approach the unselective approach implies collecting everything possible; by specifically using this approach, the whole website and its related domains and subdomains are downloaded to the archive. it is also referred to as automatic harvesting or selection, bulk selection, and domain selection.24 the automatic approach is used in a situation where a web crawler usually performs the collection. for example, the collection of websites from a domain, i.e., .edu means all educational institution websites (at domain level) or the collection of all possible contents/pages from a website (harvesting at website level) by extracting the embedded links. a section of the data preservation community believes that technically it is a relatively cheaper, quicker collection approach and yields a comprehensive picture of the web as a whole. in contrast, its significant drawbacks are that it generates huge unsorted, duplicated, and potentially useless data, consuming too many resources. information technology and libraries | march 2019 81 the swedish royal library’s project kulturarw3 harvests websites at domain level, i.e., collecting websites from a .se domain which is a physically located website in sweden and one of the first projects to adopt this approach.25 usually, national-based web archive initiatives adopt the unselective approach, most notably nedlib, a helsinki university library harvester, and aola, an austrian online archive.26 selective approach the selective approach was adopted by the national library of australia (nla) in the pandas project in 1997. in this approach, a website is included for archiving based on certain predefined strategies and on the access and information provided by the archive. the library of congress’ project minerva and the british library project “britain on the web” are the other known projects that have adopted the selective approach. according to nla, the selected websites are archived based on nla guidelines after negotiation with the owners.27 the inclusion decision could be taken at one of the following levels: • website level: which websites should be included from a selected domain, e.g., to archive all educational websites from high level domain “.pk”. 
• web page level: which web pages should be included from a selected website, e.g., to archive the homepages of all educational websites. • web content level: which type of web contents should be preserved, e.g., to archive all the images from the homepages of educational websites. a selective approach is best if the numbers of websites to be archived are very large or the archiving process is targeting the entire www and wants to narrow down the scope by identifying the resources in which the archivists are more interested. this approach performs implicit or explicit assumptions about the web contents that are not to be selected for preservation. it may be very helpful to initiate a pilot preservation project, which identifies: what is possible? what can be managed? in addition, some tangible results may be obtained easily and quickly in order to enhance the scope of the project in a broader perspective. the selective approach may be based on a predefined criterion or based on an event. selective approach based on criteria involves selecting web resources based on various predefined sets of criteria. nla’s guidance characterizes the criteria-based selective approach as the “most narrowly defined method,” and described it as “thematic selection.” a simple or a complex content-selection criteria can be defined, which depends on the overall goal of preservation. for example, all resources owned by an organization, all resources of one genre, i.e., all programming blogs, resources contributed to a common subject, resources addressing a specific community within an institution, i.e., students or staff, all publications belonging to an individual organization or group of organizations, all resources that may benefit external users or an external user’s community, e.g., historians, or alumni. selective approach based on event involves selecting web resources or websites based on various time-based events. the archivists may focus on websites that address national or international important events, e.g., disasters, elections, and the football world cup, etc. eventbased websites have two characteristics: (1) very frequent updates and (2) website content is lost after a short time, e.g., a few weeks or a few months. for example, the start and end of a term or systematic approach towards web preservation | khan and ur rahman 82 https://doi.org/10.6017/ital.v38i1.10181 academic year, the duration of an activity, e.g., research project, appointment, or departure of a new senior official. deposit approach in the deposit collection approach, the information package is submitted by the administrator or owner of the website which includes a copy of the website with related files that can be accessed through different hyperlinks. the archival information package is applicable to the small collection (of a few websites), or the owner of the website can initiate the preservation project, e.g. a company can initiate a project for preserving their website. the deposit collection approach was adopted by the national archives and records administration (nara) for the collection of us federal agency websites in 2001 and by die deutsche bibliothek (ddb, http://deposit.ddb.de/) for the collection of dissertations and some online publications. 
new digital initiatives are heavily dependent on administrator or owner support and provide an easy way to deposit new content to the repository; for example, in macewan university’s institutional repository, the librarians leading the project tried to offer an easy and effective way for owners to deposit their archival contents.28 combined approach there are advantages and disadvantages associated with each collection approach, and the ongoing debate is which approach is best in a given situation. for example, the deposit approach depends on an agreement with the depositors, which should be inexpensive. the emphasis is on combining the automatic harvesting and selective approaches, as these two approaches are cheaper than the other selection approaches because only a few staff are required and they can cope with the technological challenges. this initiative was taken by the bibliothèque nationale de france (bnf) in 2006. the bnf automatically crawls information regarding updated web pages, stores it in an xml-based “site delta,” and uses page relevancy and importance, similar to how google ranks pages, to evaluate individual pages.29 the bnf used a selective approach for the deep web (that is, web pages or websites that are behind a password or are otherwise not generally accessible to search engines), referred to as the “deposit track.” metadata identification cataloging is required to discover a specific item in a digital collection. an identifier or set of identifiers is required to retrieve a digital record in a digital repository or an archive. for digital documents, this catalog, registration, or identifier is referred to as metadata.30 metadata are structured information concerning resources that helps to describe, locate (discover or place), manage, retrieve (access), and use digital information resources. metadata are often referred to as “data about data” or “information about information,” but it may be more helpful and informative to describe these data as “descriptive and technical documentation.”31 metadata can be divided into the following three categories: 1. descriptive metadata describes a resource for discovery and identification purposes. it may consist of elements for a document such as title, author(s), abstract, and keywords. 2. structural metadata describes how compound objects are put together, for example, how sections are ordered to form chapters. 3. administrative metadata imparts information to facilitate resource management, such as when and how a file was created, who can access the file, its type, and other technical information. administrative metadata is classified into two types: (1) rights management metadata, which addresses intellectual property rights, and (2) preservation metadata, which contains information needed to archive and preserve a resource.32 owing to new information technologies, digital repositories, especially web-based repositories, have grown rapidly over the last two decades. this growth has prompted the digital library community to devise metadata strategies to manage the immense amount of data stored in digital libraries.33 metadata play a vital role in the long-term preservation of digital objects, and it is important to identify the metadata that may help to retrieve a specific object from the archive after preservation.
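as a small illustration of how the three categories just listed might sit together for a single archived web page, the following python structure separates descriptive, structural, and administrative elements. the field names and values are invented for illustration only and do not follow any particular standard.

```python
# an invented record for one archived page, grouping the three metadata categories
archived_page_metadata = {
    "descriptive": {                      # supports discovery and identification
        "title": "homepage of the example university library",
        "creator": "example university library",
        "keywords": ["web archiving", "digital preservation"],
    },
    "structural": {                       # how the compound object fits together
        "parent_site": "https://www.example.edu/",
        "page_order": 1,
        "linked_files": ["style.css", "logo.png"],
    },
    "administrative": {                   # management, rights, and preservation
        "captured": "2019-01-31T10:15:00Z",
        "mime_type": "text/html",
        "rights": "copyright example university",      # rights management metadata
        "checksum_sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",  # preservation metadata
    },
}

if __name__ == "__main__":
    for category, fields in archived_page_metadata.items():
        print(category, "->", ", ".join(fields))
```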
according to duff et al., “the right metadata is the key to preserving digital objects.”34 there are hundreds of metadata standards developed over the years for different user environments, disciplines, and purposes; many of them are in their second, third, or nth edition.35 digital preservation and archiving require metadata standards to trace digital objects and to ensure access to them. several of the common standards are briefly discussed below. the dublin core metadata initiative (dcmi, http://dublincore.org/) was initiated at the 2nd world wide web conference in 1994 and was standardized as ansi/niso z39.85 in 2001 and iso 15836 in 2003.36 the main purpose of the dcmi was to define an element set for representing web resources; initially, thirteen core elements were defined, later increased to a fifteen-element set. the elements are optional, repeatable, can follow any order, and are expressed in xml.37 (a minimal example record appears below.) metadata encoding and transmission standard (mets, http://www.loc.gov/standards/mets/) is an xml metadata standard intended to represent information about complex digital objects. mets elements evolved from the early making of america ii (“moa2”) project in 2001; the standard is supported by the library of congress, sponsored by the digital library federation (dlf), and was registered with the national information standards organization (niso) in 2004. a mets document contains seven major sections, each of which contains a different aspect of the metadata.38 metadata object description schema (mods, http://www.loc.gov/standards/mods/) was initiated by the marc21 maintenance agency at the library of congress in 2002. mods elements are richer than dcmi, simpler than the marc21 bibliographic format, and expressed in xml.39 mods identifies the widest facets or features of an object and presents nineteen high-level optional elements.40 visual resources association core strategies (vra core, http://www.loc.gov/standards/vracore/) was developed in 1996, and the current version 4.0 was released in 2007. the vra core is a widely used standard in art, libraries, and archives for such objects as paintings, drawings, sculpture, architecture, and photographs, as well as books and decorative and performance art.41 the vra core contains nineteen elements and nine sub-elements.42 preservation metadata implementation strategies (premis, http://www.loc.gov/standards/premis/), developed in 2005 and sponsored by the online computer library center (oclc) and the research libraries group (rlg), includes a data dictionary and supporting information about preservation metadata. premis defines a set of five interacting core semantic units or entities and an xml schema to support digital preservation activities. it is not concerned with discovery and access but with common preservation metadata; for descriptive metadata, other standards (dublin core, mets, or mods) need to be used. the premis data model contains intellectual entities (contents that can be described as a unit, e.g., books, articles, databases), objects (discrete units of information in digital form, which can be files, bitstreams, or any representation), agents (people, organizations, or software), events (actions that involve an object and an agent known to the system), and rights (assertions of rights and permissions).43 it is indisputable that good metadata improves access to the digital object in the digital repository.
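to make the dublin core description above concrete, here is a minimal, invented record for a single archived web page expressed with the simple dc element set in xml; the values are illustrative only and do not come from any of the projects discussed in this article.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>homepage of the example university library (archived copy)</dc:title>
  <dc:creator>example university library</dc:creator>
  <dc:date>2019-01-31</dc:date>
  <dc:type>Text</dc:type>
  <dc:format>text/html</dc:format>
  <dc:identifier>https://www.example.edu/library/</dc:identifier>
  <dc:subject>web archiving</dc:subject>
  <dc:subject>digital preservation</dc:subject>
  <dc:language>en</dc:language>
</metadata>
```

note that elements such as dc:subject may repeat, any element may be omitted, and the order is not significant, as the standard allows.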
therefore, the creation and selection of appropriate metadata make the web archive accessible to the archive user. structural metadata helps to manage the archival collection internally, as well as the related services, but may not always help to discover the primary source of the digital object.44 currently, there are many semi-automatic metadata generation tools. the use of these semi-automatic tools for generating metadata is crucial for the future, considering the complexity and cost of manual metadata origination.45 archival format the web archive initiatives select websites for archiving based on the relevance of the contents and the intended audience of the archived information. the size of web archives varies significantly depending on their scope and the type of content they are preserving, e.g., web pages, pdf documents, images, audio, or video files.46 to preserve these contents, a web archive uses different storage formats containing metadata and utilizes data compression techniques. the internet archive defined the arc format (http://archive.org/web/researcher/arcfileformat.php), later used as a de facto standard. in 2009, the international organization for standardization (iso) established the warc format (https://goo.gl/0rbwsn) as an official standard for web archiving. approximately 54 percent of web archive initiatives have applied the arc and warc formats for archiving. the use of standard formats helps archivists facilitate the creation of collaborative tools, such as search engines and ui utilities, to efficiently manipulate the archived data.47 information dissemination mechanisms a well-defined preservation process can lead to a well-organized web archive that is easy to maintain and from which a specific digital object can easily be retrieved using information dissemination techniques. poor search results are one of the main problems in the information dissemination of web archives: the users of a web archive spend excessive time retrieving the documents or information that satisfy their queries. archivists are more concerned with “ofness,” “what collections are made up of,” although archive users are concerned with “aboutness,” “what collections are about.”48 to use the full potential of web archives, a usable interface is needed to help the user search the archive for a specific digital object. full-text and keyword search are the dominant ways to search an unstructured information repository, as is evident from online search engines. the sophistication of search results against user queries depends on the ranking tools.49 access tools and techniques are getting the attention of researchers, and approximately 82 percent of european web archives concentrate on such tools, which makes these web archives easily accessible.50 the lucene full-text search engine and its extension nutchwax are widely used in web archiving.
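access tools like those mentioned above typically read the standard warc containers directly. as a generic illustration (not nutchwax or any specific initiative's code), the following sketch uses the open-source warcio library for python to iterate over a warc file and list the archived urls; the file name is a placeholder.

```python
from warcio.archiveiterator import ArchiveIterator  # pip install warcio

def list_archived_urls(warc_path: str) -> list[str]:
    """return the target urls of all response records in a warc file."""
    urls = []
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type == "response":
                urls.append(record.rec_headers.get_header("WARC-Target-URI"))
    return urls

if __name__ == "__main__":
    for url in list_archived_urls("example.warc.gz"):
        print(url)
```

because arc and warc are standardized, the same small reader works across archives from different initiatives, which is exactly the collaborative benefit the formats are meant to provide.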
moreover, for collections whose semantic descriptions already rely on or are implicit within their descriptive metadata, reasoning-based or semantic searching of the archival collection can enable the system to offer novel possibilities for archival content retrieval and browsing.51 even in the current era of digital archives, mobile services are being adopted in digital libraries; e.g., access to e-books, library databases, catalogs, and text messaging are common mobile services offered in university libraries.52 in a massive repository, a user query retrieves millions of documents, which makes it difficult for users to identify the most relevant information. to overcome this problem, a ranking model estimates the relevancy of the results to the user’s query using specified criteria and sorts the results, placing the most relevant result at the top.53 a number of ranking models exist in the literature, e.g., conventional ranking models (such as tf-idf and bm25f), temporal ranking models (such as pagerank), and learning-to-rank models (such as l2r). the findings of the systematic approach for web preservation are used to automate the process of digital news-story preservation. the steps of the proposed model are carefully adopted to develop a tool that is able to add contextual information to the stories to be preserved. digital news stories preservation framework the advancement of web technologies and the maturation of the internet attract news readers to access news online, provided by multiple sources, and to obtain the desired information comprehensively. the amount of news published online has grown rapidly, and for an individual it is cumbersome to browse through all online sources for relevant news articles. news generation in the digital environment is no longer a periodic process with a fixed single output, such as printed newspapers; the news is instantly generated and updated online in a continuous fashion. however, for different reasons, such as the short lifespan of digital information and the speed at which information is generated, it has become vital to preserve digital news for the long term. digital preservation includes various actions to ensure that digital information remains accessible and usable for as long as it is considered important.54 libraries and archives carefully digitize newspapers, which are considered a good source for knowing history. many approaches have been developed to preserve digital information for the long term. the lifespan of news stories published online varies from one newspaper to another, i.e., from one day to a month. although a newspaper may be backed up and archived by the news publisher or a national archive, in the future it will be difficult to access particular information published in various newspapers regarding the same news story. the issues become even more complicated if a story is to be tracked through an archive of many newspapers, which requires different access technologies. the digital news story preservation (dnsp) framework was introduced to preserve digital news articles published online from multiple sources.55 the dnsp framework is planned based on adopting the proposed step-by-step systematic approach for web preservation to develop a well-organized web archive. initially, the main objectives defined for the dnsp framework are: • to initiate a well-organized national-level digital news archive of multiple news sources.
• to normalize news articles during preservation to a common format for future use. • to extract explicit and implicit metadata, which would be helpful in ingesting stories to the archive and browsing through the archive in the future. • to introduce content-based similarity measures to link digital news articles during preservation. the digital news story extractor (dnse) is a tool developed to facilitate the extraction of news stories from online newspapers and to migrate them to a normalized format for preservation. the normalization also includes a step to add metadata to the digital news stories archive (dnsa) for future use.56 to facilitate the accessibility of news articles preserved from multiple sources, some mechanisms need to be adopted for linking the archived digital news articles. an effective term-based approach, the “common ratio measure for stories” (crms), is introduced for linking digital news articles in the dnsa; it links similar news articles during the preservation process.57 the approach is empirically analyzed, and the results of the proposed approach are compared to obtain conclusive arguments. the initial results computed automatically using the common ratio measure for stories are encouraging and are compared with the similarity of news articles based on human judgment. the results are generalized by defining a threshold value based on multiple experimental results using the proposed approach. currently, there is ongoing work to extend the scope of the dnsa to dual languages, i.e., urdu and english, as well as content-based similarity measures to link news articles published in urdu and english. moreover, research is underway to develop tools that exploit the linkage created among stories during the preservation process for search and retrieval tasks. summary effective strategic planning is critical in creating web archives; hence, it requires a well-understood and well-planned preservation process. the process should result in a well-organized web archive that includes not only the content to be preserved but also the contextual information required to interpret the content. the study attempts to answer many questions by guiding the archivists and related personnel, such as: how to lead the web preservation process effectively? how to initiate the preservation process? how to proceed through the different steps? what are the possible techniques that may help to create a well-organized web archive? how can the archived information be used to its greatest potential? to answer these questions, the study resulted in an appropriate step-by-step process for web preservation and a well-organized web archive. the targeted goal of each step is identified by researching the existing approaches that can be adopted. the possible techniques for those approaches are discussed in detail for each step. references 1 “world wide web size,” the size of the world wide web, visited on jan 31, 2019, http://www.worldwidewebsize.com/. 2 brian f. lavoie, “the open archival information system reference model: introductory guide,” microform & imaging review 33, no. 2 (2004): 68-81; alexandros ntoulas, junghoo cho, and christopher olston, “what's new on the web? the evolution of the web from a search engine perspective,” in proceedings of the 13th international conference on world wide web-04 (new york, ny: acm, 2004), 1-12.
information technology and libraries | march 2019 87 3 teru agata et al., “life span of web pages: a survey of 10 million pages collected in 2001,” ieee/acm joint conference on digital libraries, (ieee, 2014), 463-64, https://doi.org/10.1109/jcdl.2014.6970226. 4 timothy robert hart and denise de vries, “metadata provenance and vulnerability,” information technology and libraries 36, no. 4 (dec. 2017): 24-33, https://doi.org/10.6017/ital.v36i4.10146. 5 claire warwick et al., “library and information resources and users of digital resources in the humanities,” program 42, no. 1 (2008): 5-27, https://doi.org/10.1108/00330330810851555. 6 lavoie, “open archival information system reference model.” 7 susan farrell, k. ashley, and r. davis, “a guide to web preservation,” practical advice for web and records managers based on best practices from the jisc-funded powr project (2010), https://jiscpowr.jiscinvolve.org/wp/files/2010/06/guide-2010-final.pdf. 8 lavoie, “open archival information system reference model;” farrell, ashley, and davis, “guide to web preservation.” 9 peter lyman, “archiving the world wide web,” washington, library of congress (2002), https://www.clir.org/pubs/reports/pub106/web/. 10 diomidis spinellis, “the decay and failures of web references,” communications of the acm 46, no. 1 (2003): 71-77, https://dl.acm.org/citation.cfm?doid=602421.602422. 11 digital archive for chinese studies (dachs) archive2 https://www.zo.uniheidelberg.de/boa/digital_resources/dachs/index_en.html, visited on jan 31, 2019. 12 julien masanès, “web archiving methods and approaches: a comparative study,” library trends 54, no. 1 (2005): 72-90, https://doi.org/10.1353/lib.2006.0005. 13 hanno lecher, “small scale academic web archiving: dachs,” in web archiving (berlin/heidelberg: springer, 2006), 213-25, https://doi.org/10.1007/978-3-540-463320_10. 14 daniel gomes et al., “introducing the portuguese web archive initiative,” in 8th international web archiving workshop (berlin/heidelberg: springer, 2009). 15 gerrit voerman et al., “archiving the web: political party web sites in the netherlands,” european political science 2, no. 1 (2002): 68-75, https://doi.org/10.1057/eps.2002.51. 16 sonja gabriel, “public sector records management: a practical guide,” records management journal 18, no. 2 (2008), https://doi.org/10.1108/00242530810911914. 17 farrell, ashley, and davis, “guide to web preservation.” systematic approach towards web preservation | khan and ur rahman 88 https://doi.org/10.6017/ital.v38i1.10181 18 jung-ran park and andrew brenza, “evaluation of semi-automatic metadata generation tools: a survey of the current state of the art,” information technology and libraries 34, no. 3 (sept, 2015): 22-42, https://doi.org/10.6017/ital.v34i3.5889. 19 muzammil khan and arif ur rahman, “digital news story preservation framework,” in digital libraries: providing quality information: 17th international conference on asia-pacific digital libraries, icadl 2015 seoul, korea, december 9-12, 2015 (proceedings, vol. 9469, springer, 2015), 350-52, https://doi.org/10.1007/978-3-319-27974-9; muzammil khan, “using text processing techniques for linking news stories for digital preservation,” phd thesis, faculty of computer science, preston university kohat, islamabad campus, hec pakistan, 2018. 20 dennis dimick, “adobe acrobat captures the web,” washington apple pi journal (1999): 23-25. 21 trupti udapure, ravindra d. kale, and rajesh c. 
dharmik, “study of web crawler and its different types,” iosr journal of computer engineering (iosr-jce) 16, no. 1 (2014): 01-05, https://doi.org/10.9790/0661-16160105. 22 dora biblarz et al., “guidelines for a collection development policy using the conspectus model,” international federation of library associations and institutions, section on acquisition and collection development (2001). 23 farrell, ashley, and davis, “guide to web preservation;” e. pinsent et al., “powr: the preservation of web resources handbook,” http://jisc.ac.uk/publications/programmerelated/2008/powrhandbook.aspx (2010); michael day, “preserving the fabric of our lives: a survey of web preservation initiatives,” lecture notes in computer science (berlin/heidelberg: springer, 2003): 461-72, https://doi.org/10.1007/978-3-540-45175-4_42. 24 pinsent et al., “powr:”; day, “preserving the fabric.” 25 allan arvidson, “the royal swedish web archive: a complete collection of web pages,” international preservation news (2001): 10-12. 26 andreas rauber, andreas aschenbrenner, and oliver witvoet, “austrian online archive processing: analyzing archives of the world wide web,” research and advanced technology for digital libraries (2002): ecdl 2002. lecture notes in computer science, vol 2458, (berlin/heidelberg: springer, 2002), 16-31, https://doi.org/10.1007/3-540-45747-x_2. 27 william arms, “collecting and preserving the web: the minerva prototype,” rlg diginews 5, no. 2 (2001). 28 sonya betz and robyn hall, “self-archiving with ease in an institutional repository: micro interactions and the user experience,” information technology and libraries 34, no. 3 (sept. 2015): 43-58, https://doi.org/10.6017/ital.v34i3.5900. 29 serge abiteboul et al., “a first experience in archiving the french web,” in international conference on theory and practice of digital libraries, (berlin/heidelberg: springer, 2002), 115, https://doi.org/10.1007/3-540-45747-x_1; sergey brin and lawrence page, “reprint of: information technology and libraries | march 2019 89 the anatomy of a large-scale hypertextual web search engine,” computer networks 56, no. 18 (2012): 3825-33, https://doi.org/10.1016/j.comnet.2012.10.007. 30 masanès, “web archiving.” 31 niso-press, “understanding metadata,” national information standards (2004), http://www.niso.org/publications/understanding-metadata. 32 ibid. 33 jane greenberg, “understanding metadata and metadata schemes,” cataloging & classification quarterly 40, no. 3-4 (2009): 17-36, https://doi.org/10.1300/j104v40n03_02. 34 michael day, “preservation metadata initiatives: practicality, sustainability, and interoperability,” publishers: archivschule marburg (2004): 91-117. 35 jenn riley, glossary of metadata standards (2010). 36 corey harper, “dublin core metadata initiative: beyond the element set,” information standards quarterly 22, no. 1 (2010): 20-31. 37 jane greenberg, “dublin core: history, key concepts, and evolving context (part one),” in slide presentation on dc-2010 international conference on dublin core and metadata applications pittsburgh, pa (2010). 38 cundiff v. morgan, “an introduction to the metadata encoding and transmission standard (mets),” library hi tech 22, no. 1 (2004): 52-64, https://doi.org/10.1108/07378830410524495; leta negandhi, “metadata encoding and transmission standard (mets),”in texas conference on digital libraries, tcdl-2012 (2012). 39 sally h. mccallum, “an introduction to the metadata object description schema (mods),” library hi tech 22, no. 
1 (2004): 82-88, https://doi.org/10.1108/07378830410524521. 40 r. gartner, “mode: metadata object description schema,” jisc techwatch report tsw (2003): 03-06. www.loc.gov/standards/mods/. 41 vra-core, “an introduction of vra core,” http://www.loc.gov/standards/vracore/vra core4 intro.pdf, created: oct 2014. 42 vra-core, “vra core element outline,” http://www.loc.gov/standards/vracore/vra core4 outline.pdf, created: feb 2007. 43 priscilla caplan, “understanding premis,” washington dc, usa: library of congress, (2009), https://www.loc.gov/standards/premis/understanding-premis.pdf; j. relay, “an introduction to premis,” singapore ipress tutorial, (2011), http://www.loc.gov/standards/premis/premistutorial ipres2011 singapore.pdf. systematic approach towards web preservation | khan and ur rahman 90 https://doi.org/10.6017/ital.v38i1.10181 44 jennifer schaffner, “the metadata is the interface: better description for better discovery of archives and special collections, synthesized from user studies,” making archival and special collections more accessible, 85 (2015). 45 joao miranda and daniel gomes, “trends in web characteristics,” in web congress, 2009. laweb'09. latin american, (ieee, 2009), 146-53, https://doi.org/10.1109/la-web.2009.28. 46 daniel gomes, joão miranda, and miguel costa, “a survey on web archiving initiatives,” research and advanced technology for digital libraries (2011): 408-20, https://doi.org/10.1007/978-3-642-24469-8_41. 47 ibid. 48 schaffner, “metadata is the interface.” 49 miguel costa and mário j. silva, “evaluating web archive search systems,” in international conference on web information systems engineering (berlin/heidelberg: springer, 2012), 440454. https://doi.org/10.1007/978-3-642-35063-4_32. 50 foundation, i, “web archiving in europe,” technical report, commercenet labs (2010). 51 georgia solomou and dimitrios koutsomitropoulos, “towards an evaluation of semantic searching in digital repositories: a dspace case-study,” program 49, no. 1 (2015): 63-90, https://doi.org/10.1108/prog-07-2013-0037. 52 liu yan quan and sarah briggs, “a library in the palm of your hand: mobile services in top 100 university libraries,” information technology and libraries 34, no. 2 (june 2015): 133, https://doi.org/10.6017/ital.v34i2.5650. 53 ricardo baeza-yates and berthier ribeiro-neto, modern information retrieval 463. (new york: acm pr., 1999). 54 daniel burda and frank teuteberg, “sustaining accessibility of information through digital preservation: a literature review,” journal of information science, 39, no. 4 (2013): 442-58, https://doi.org/10.1177/0165551513480107. 55 muzammil khan et al., “normalizing digital news-stories for preservation,” in digital information management (icdim), 2016 eleventh international conference on (ieee, 2016), 8590, https://doi.org/10.1109/icdim.2016.7829785. 56 khan, et al., “normalizing digital news.” 57 muzammil khan, arif ur rahman, and m. daud awan, “term-based approach for linking digital news stories,” in italian research conference on digital libraries (cham, switzerland: springer, 2018), 127-38, https://doi.org/10.1007/978-3-319-73165-0_13. letter from the editor: a blank page letter from the editor a blank page kenneth j. varnum information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.12405 nothing is as daunting as a blank page, particularly now. as i sat down to write this issue’s letter, i was struck by how much fundamental uncertainty is in our lives, so much trauma. 
a blank page can emphasize our concerns that the old familiar should return at all, or that a new, better, normal will emerge. at the same time, a blank page can be liberating at a time when so much of our social, professional, and personal lives needs to be reconceptualized and reactivated in new, healthier , more respectful and inclusive ways. we are collectively faced with two important societal ailments. the first is the literal disease of the covid-19 pandemic that has been with us for only months. the other is the centuries-long festering disease of racial injustice, discrimination, and inequality that typifies (particularly, but not uniquely) american society. while some of us may be in better positions to help heal one or the other of these two ailments, we can all do something in both, as different as they are. lend emotional support to those in need of it, take part in rallies if your personal health and circumstances allow, and advocate for change to government officials at all levels from local to national. learn about the issues and explore ways you can make a difference on either or both fronts. i hope i am not being foolish or naive when i say i believe the blank page before us as a society will be liberating: an opportunity to shift ourselves toward a better, more equitable, more just path. * * * * * * to rephrase humphrey bogart’s rick blaine in casablanca, “it doesn’t take much to see that the problems of three little people library association divisions don’t amount to a hill of beans in this crazy world.” but despite the small global impact of our collective decision, i am glad our alcts, llama, and lita colleagues chose a united future as core: leadership, infrastructure, futures. watch for more information about what the merged division means for our three divisions and this journal in the months to come. sincerely, kenneth j. varnum, editor varnum@umich.edu june 2020 https://core.ala.org/ mailto:varnum@umich.edu letter from the editor kenneth j. varnum information technology and libraries | june 2019 1 https://doi.org/10.6017/ital.v38i2.11241 welcome to the june 2019 issue of ital. you’ll likely notice a new look to the journal when you read this issue’s content. our helpful and supportive partners at boston college, where information technologies and libraries is archived, have updated the journal’s content management system to the current version of open journal systems. i am grateful to john o’connor at boston college for his patience with and quick, helpful responses to my numerous questions as we adapted to the new user interface and editorial workflows. columns in this issue include bohyun kim’s final “president’s message” as her term concludes, summarizing the work that has gone into the planned division merger that would combine lita, alcts, and llama. editorial board member cinthya ippoliti discusses the role of libraries in fostering digital pedagogy in her “editorial board thoughts” column. and, in the second of our new “public libraries leading the way” column, jeffrey davis discusses the technologies and advantages of digital pass systems. peer-reviewed articles in this issue include: • “no need to ask: creating permissionless blockchains of metadata records,” by dejah rubel, laying a path for using blockchain for managing metadata. 
• “50 years of ital/jla: a bibliometric study of its major influences, themes, and interdisciplinarity,” by brady lund, a thorough study of how our journal has influenced, and been influenced by, other leading information technology journals. • “weathering the twitter storm: early uses of social media as a disaster response tool for public libraries during hurricane sandy,” by sharon han. this article is the 2019 lita/ex libris student writing award-winning paper. • “‘good night, good day, good luck’: applying topic modeling to chat reference transcripts,” by megan ozeran and piper martin, describing a process to categorize chat reference themes using topic mapping software. • “information security in libraries: examining the effects of knowledge transfer,” by tonia san nicolas-rocca and richard j burkhard, investigating the importance of knowledge transfer across an organization to enhance information security behaviors. • “wikidata: from ‘an’ identifier to ‘the’ identifier,” by theo van veen, describing how libraries could use wikidata as a source of linked open data. thank you to this issue’s authors, and all of information technology and libraries’ readers for supporting peer-reviewed, open-access, scholarly publishing. in closing, i would like to thank the members of the editorial board whose terms are ending june 30: patrick “tod” colegrove, joseph deodato, richard guajardo, and frank cervone. i’m grateful to these four individuals, upon whom i’ve relied for their excellent advice and guidance in steering ital’s course. we are in the process of appointing new editorial board members with two-year terms starting on july 1, and i’ll introduce them in the next issue. kenneth j. varnum, editor varnum@umich.edu june 2019 the “black box”: how students use a single search box to search for music materials kirstin dougan information technology and libraries | december 2018 81 kristin dougan (dougan@illinois.edu) is head, music and performing arts library, university of illinois. abstract given the inherent challenges music materials present to systems and searchers (formats, title forms and languages, and the presence of additional metadata such as work numbers and keys), it is reasonable that those searching for music develop distinctive search habits compared to patrons in other subject areas. this study uses transaction log analysis of the music and performing arts module of a library’s federated discovery tool to determine how patrons search for music materials. it also makes a top-level comparison of searches done using other broadly defined subject disciplines’ modules in the same discovery tool. it seeks to determine, to the extent possible, whether users in each group have different search behaviors in this search environment. the study also looks more closely at searches in the music module to identify other search characteristics such as type of search conducted, use of advanced search techniques, and any other patterns of search behavior. introduction music materials have inherent qualities that present difficulties to the library systems that describe them and to the searchers who wish to find them. this can be exemplified in three main areas: formats, titles, and relationships. first, printed music comes in multiple formats such as full scores, vocal scores, study scores, and parts; and in multiple editions such as facsimiles, scholarly editions, performing editions (of various caliber); each format and edition serving a different purpose or need. 
related to this, but less problematic, is the variety of sound recording formats available. second, issues resulting from titling practices abound in music, ranging from frequent use of foreign terms, not just in descriptive titles (l'oiseau de feu = zhar-ptitsa = the firebird = feuervogel), but in generic titles as translated by various publishers from different countries (symphony=sinfonie). additionally, musical works often have only generic genre titles enhanced by key and work number metadata, for example symphony no. 1 in c minor. third, music materials present a relationship issue best defined as “one-to-many.” musical works often have multiple sections or songs in them (an aria in an opera or a movement in a symphony), and a cd or a score anthology may contain multiple pieces of music. given these three main challenges presented by music materials, it is possible that those searching for music develop distinctive search habits compared to patrons in other subject areas. this study uses transaction log analysis of the music and performing arts module of a library’s federated discovery tool to determine how patrons search for music materials. it also makes a top-level comparison of searches done using other broadly defined subject disciplines’ modules in the same discovery tool. it seeks to determine, to the extent possible, whether users in each group have different search behaviors in this search environment. the study also looks more closely at mailto:dougan@illinois.edu the “black box” | dougan 82 https://doi.org/10.6017/ital.v37i4.10702 searches in the music module to identify other search characteristics such as type of search conducted, use of advanced search techniques, and any other patterns of search behavior. background since fall 2007 the university of illinois library has had easy search (es), a locally developed search tool designed to aid users in finding results from multiple catalog, a&i, and reference targets quickly and simultaneously. there is a “main” es on the library’s main gateway page that searches a variety of cross-disciplinary tools (see figure 1). figure 1. gateway easy search. on the gateway, users have the option of selecting one of the format tabs to narrow their search to books, articles, journals, or media. when the data for this study was gathered, the journals tab was not present. starting in 2010 many of the subject and branch libraries in the university library created their own es modules with target resources specific to the disciplinary areas they serve. search boxes for these es subject modules are often displayed right on the branch library’s home page, but users can also select these subject module options from the dropdown in the main es (see figure 2). information technology and libraries | december 2018 83 figure 2. gateway dropdown subject choices. the mpal es interface as it appears on the mpal homepage can be seen in figure 3—it was created in 2011. figure 3. mpal easy search interface. the “black box” | dougan 84 https://doi.org/10.6017/ital.v37i4.10702 es is a federated search tool and does not have a central index like most current discovery layer tools. rather, it utilizes broadcast search technologies to target different tools and search them directly. while the gateway es now uses a “bento box” layout to display selected citations from each target, in the first iterations of the tool and still today in the subject modules, users are simply presented with a list of hit counts in each of the target tools (see figures 4 and 5). 
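as a rough illustration of the broadcast-search pattern described in this background section, in which one query is sent to several targets at once and a per-target hit count is reported back, here is a small python sketch. the target names and stub functions are invented placeholders, and this is not the actual easy search implementation; a real system would call each target's own search api over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# stub target searchers; real ones would query a catalog, an a&i database, etc.
def search_catalog(query: str) -> int:
    return 42        # pretend hit count

def search_ai_index(query: str) -> int:
    return 7

def search_reference(query: str) -> int:
    return 3

TARGETS = {
    "library catalog": search_catalog,
    "a&i index": search_ai_index,
    "reference source": search_reference,
}

def broadcast_search(query: str) -> dict[str, int]:
    """send the same query to every target concurrently and collect hit counts."""
    with ThreadPoolExecutor(max_workers=len(TARGETS)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in TARGETS.items()}
        return {name: fut.result() for name, fut in futures.items()}

if __name__ == "__main__":
    print(broadcast_search("ligeti requiem"))
```

the design choice this illustrates is that there is no central index: each target is searched live, so the interface can only present whatever counts or citations the targets return at query time.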
figure 4. mpal easy search display screen part 1. figure 5. mpal easy search results screen part 2. information technology and libraries | december 2018 85 not shown in the screen captures are the results from various newspaper indexes and reference sources such as oxford music online and the international encyclopedia of dance. literature review many studies have examined patron search behavior using transaction log analysis and other methods over the past few decades. since the appearance of google in 1998, and its vast impact on individuals’ expectations and search behavior, recent studies have looked at user search behavior in tools that initially present a single search box. additional studies have looked at disciplinespecific searching behaviors. general search studies and single search boxes many advantages and disadvantages have been ascribed to tools with single search boxes (whether federated search tools or discovery layers), namely ease and convenience on the one hand, and the lack of precision possible in searching and overwhelming number of results on the other. two companion articles, boyd et al. and dyas-correira, written ten years apart, attempted to visit and revisit these issues.1 results and patron satisfaction can vary based on size of library and number of resources accessed by these tools. these types of tools will never be able to search and display everything and that problem is magnified by the number of resources a library has. holman, porter, and zimerman discovered in independent studies that undergraduates do not search very efficiently or effectively and find library tools difficult to use.2 avery and tracy also found this true for the es tool under discussion in this study: the generation of keywords by many students indicates they often struggled to identify alternative terminology that may have resulted in a more successful search . . . . many students exhibited persistence in their searching, but the selection of search terms, sometimes compounded by spelling problems or problems in search string structure, likely did not yield the most relevant results.3 asher, duke, and wilson state in their study comparing student search strategies and success across a variety of library search tools and google that there were “strong patterns in the way students approached searches no matter which tool they used. . . . students treated almost every search box like a google search box, using simple keyword searches in 81.5 percent (679/829) of the searches observed.”4 dempsey and valenti note students’ infrequent use of limiters such as “peer-review” and “date” in eds, the high non-use and misuse rates of quotation marks, relatively low instances of repeated uncorrected spelling errors, and variance patterns in keyword usage.5 students like federated search tools and discovery layers because of their convenience and ease, as found in studies by armstrong, belliston, and williams et al.6 this is reiterated in asher et al., “despite the fact that they did not necessarily perform better on the research tasks in this study, students did prefer google and google-like discovery tools because they felt that they could get to full-text resources more quickly.”7 this one-box approach could hinder students, as described by swanson and green: the search box became an obstacle in other questions where it should not have been used. in some cases, the search box was viewed as an all-encompassing search of the entire site. 
the “black box” | dougan 86 https://doi.org/10.6017/ital.v37i4.10702 several students searched for administrative information, research guides, and podcasts in this box.8 lown et al. also found that users hope to access a vast range of information via a single search box. “one lesson is that library search is about more than articles and the catalog. about 23 percent of use of quicksearch took place outside either the catalog or articles modules, indicating that ncsu library users attempt to access a wide range of information from the single search box.”9 search and library use in different disciplines in their study comparing a discovery layer and subject-specific tools, dahlen and hanson found “subject-specific indexing and abstracting databases still play an important role for libraries that have adopted discovery layers. discovery layers and subject-specific indexing and abstracting databases have different strengths and can complement each other within the suite of library resources.”10 they also observed things iterated by previous authors, chiefly that “not all students prefer discovery tools” and “the tools that students prefer may not be those that give them the best results.”11 in addition, they found that default configuration matters in terms of students’ success in and preference for a given tool. fu and thomes found that creating smaller disciplinespecific subsets in discovery tools was beneficial to searchers by reducing results and in creasing the results’ relevance.12 few studies investigate how music students search for music materials. dougan found in her observational study of music students’ search behaviors that they have difficulty forming good searches; misuse quotation marks and other search elements; and at times struggle with finding music materials.13 mayer noted upper-class music students’ frustration with using library tools to find specific works of music, going so far as to state, “the music students agreed that both the discovery layer and the catalog are not effective for music-related searching, for any format.”14 clark and yeager found that students had an easier time searching for media items than music scores, and frequently struggled with search strategy revisions.15 there is more research on the larger information needs of disciplines and creating models for research behavior, and not necessarily specific search processes or constructions.16 whitmire, in her 2002 pre-google article, found that students majoring in the social sciences were engaged in information-seeking behaviors at a higher rate than students majoring in engineering.17 chrzastowski and joseph surveyed graduate students at the university of illinois at urbana– champaign and found that those in the life sciences, physical sciences, and engineering visited the libraries less often than students in other academic disciplines.18 students in the arts and humanities used the library more often than students in other disciplines. collins and stone report that in prior studies of users across different disciplines, arts and humanities users do not account for the biggest users of library materials, their survey found the opposite to be true. 19 when looking at the various student populations in their study, musicians had the highest library usage in terms of items borrowed and almost the highest number of library visits. 
music users in the study also showed high numbers of hours logged into the library e-resources and the highest number of e-resources accessed compared to others in their discipline group (but not as much as other disciplines). however, they show a low number of pdf downloads and a low number of e-resources accessed frequently. methodology this study conducted quantitative analysis of easy search (es) data as a whole and from a selection of the subject modules, including the music and performing arts library (mpal) module, using data from the period june 20, 2014 through june 16, 2015. additional quantitative and qualitative analysis was conducted only on the mpal es transaction log data. data from the following subject modules were included in comparative analyses: • funk agricultural, consumer and environmental sciences library (aces) (http://www.library.illinois.edu/funkaces/) • grainger engineering library (http://search.grainger.illinois.edu/top/) • history, philosophy, and newspaper library (hpnl) (http://www.library.illinois.edu/hpnl/) • music and performing arts library (mpal) (http://www.library.illinois.edu/mpal/) • social science, health, and education library (sshel) (http://www.library.illinois.edu/sshel/) • undergraduate library (ugl) (http://www.library.illinois.edu/ugl/) each of these libraries has a search box for es on its home page that is customized to the search targets identified as best for those subject areas by the subject librarians in that library. transaction log data on searches done in es is continuously compiled in a sql database, and queries were written to determine certain quantitative measures. searches done in these various subject modules were isolated by a variable in the sql data that indicates whether the search was done in the main gateway es, in the main gateway es but using one of the subject dropdown choices, or from the subject es box directly on that library’s homepage. searches in the six subject modules listed above and in the main es were assessed for the average number of searches per session and the average words per search. further analysis of searches done in the mpal es module used 25,503 sessions conducted on mpal public computers from march 21, 2014 to june 21, 2015, which is a slightly longer timespan than used for the comparative analysis between subject es modules described above. to make this more manageable, only every tenth session was considered, meaning 2,550 sessions were analyzed out of the full set of mpal data. searches were sorted by session id number, which is assigned to each session when a new session is begun. this method kept all strings from one session together, whereas simply sorting by date and string id did not, since multiple sessions can occur simultaneously. a session is a series of user actions (searches and click-throughs) from the same workstation in which there is less than a twenty-minute pause between actions. if there are user actions from the same workstation after a twenty-minute pause, a new session is established; therefore, there is the possibility that some of the sequential sessions were from the same user, but there is no easy way to determine that.
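the session rule just described (a new session after a twenty-minute pause on the same workstation), the every-tenth-session sample, and the two summary measures can be expressed in a short sketch. the log record layout and field names below are invented for illustration and are not the library's actual sql schema or code.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=20)

# each log row is assumed to be (workstation_id, timestamp, query_string)
def build_sessions(rows):
    """group rows into sessions: same workstation, gaps under twenty minutes."""
    sessions, last_seen = [], {}
    for station, ts, query in sorted(rows, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(station)
        if prev is None or ts - prev > SESSION_GAP:
            sessions.append([])            # start a new session for this workstation
        sessions[-1].append(query)
        last_seen[station] = ts
    return sessions

def summarize(sessions):
    sample = sessions[::10]                # every tenth session, as in the study
    searches = [q for s in sample for q in s]
    avg_per_session = len(searches) / len(sample) if sample else 0
    avg_words = sum(len(q.split()) for q in searches) / len(searches) if searches else 0
    return avg_per_session, avg_words

if __name__ == "__main__":
    rows = [("pc1", datetime(2015, 3, 1, 10, 0), "ligeti requiem"),
            ("pc1", datetime(2015, 3, 1, 10, 5), "ligeti requiem score"),
            ("pc1", datetime(2015, 3, 1, 11, 0), "mahler symphony 1")]
    print(summarize(build_sessions(rows)))
```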
the mpal data set was assessed using the following quantitative measures: 1) average number of searches per session and whether the session contained a) a single search, b) multiple searches for the same thing (either repeated exactly or varied), or c) multiple strings searching for multiple things; 2) average number of search terms per search; 3) type of search by index (title/author/keyword) or other advanced search; 4) use of boolean operators, quotation marks, parentheses, etc.; 5) use of work or opus numbers or key indications; and 6) searches indicating format (score, cd, etc.). findings comparing the data for searches done in the main es to some of the subject modules (see table 1) shows that the ugl es and the hpnl es have the fewest average searches per session, and the mpal es has the third highest average number of searches per session. the sciences tend to have higher average words per search string values, while mpal has the second lowest average number of words per search. this is not surprising given that the sciences tend to use a lot of journal literature and it is common for researchers to copy and paste such citations into es. whereas in music, as we will see later, keyword searches tend to focus on combinations of the composer’s name and words from the work title, occasionally with other terms added.

source | sessions | searches | average searches per session | average words per search
all es searches | 599,482 | 1,340,159 | 2.121 | 5.08220
gateway only | | | |
gateway everything tab | 382,040 | 757,862 | 1.9837 | 5.255
gateway books tab | 71,007 | 136,724 | 1.9255 | 4.048
gateway articles tab | 57,169 | 107,893 | 1.887 | 6.35
gateway total | | 1,002,479 | |
all subject modules | | | |
departmental searches (incl. those from gateway dropdown) | 75,035 | 214,364 | 1.9288 |
searches done directly from subject library pages | | 144,283 | |
select subject modules21 | | | |
agricultural, consumer and environmental sciences library | 2,732 | 5,221 | 1.911 | 4.07
engineering library | 32,018 | 68,146 | 2.128 | 5.092
history, philosophy, and newspaper library | 1,264 | 1,985 | 1.57 | 3.09
music and performing arts library (mpal) | 21,047 | 41,590 | 1.976 | 3.375
mpal data from march 21, 2014 to june 21, 2015 | 25,503 | 49,702 | 1.949 | 3.349
social science, health, and education library | 9,458 | 19,760 | 2.089 | 4.906
undergraduate library | 26,988 | 44,588 | 1.65 | 3.909
table 1. comparative search data from june 20, 2014 to june 16, 2015 (unless otherwise noted).

average number (and range) of searches per session in looking at the searches done directly from the mpal homepage and from the gateway dropdown from march 21, 2014 through june 21, 2015, there were 25,503 sessions conducted in the mpal es that contained a total of 49,702 searches, resulting in an average of 1.949 searches per session. of the 2,550 mpal search sessions in the study sample, the majority (63.2 percent) consisted of one search.22 this means the patron conducted one search and then left es, presumably clicking into the library catalog or another tool that is a target in es to complete their research.
sessions consisting of two to four searches account for 31 percent of sessions, while sessions involving five to nine searches only account for 5 percent of total sessions, and only 32 sessions, or fewer than 1 percent, consist of ten or more searches (see table 2).

searches per session | number of sessions
1 | 1604
2 | 476
3 | 191
4 | 116
5 | 51
6 | 29
7 | 22
8 | 12
9 | 15
10 | 6
11 | 6
12 | 7
13 | 2
14 | 3
16 | 1
17 | 2
18 | 1
19 | 2
20 | 1
23 | 1
30 | 2
total | 2,550
table 2. searches per session.

sessions with multiple searches (n = 946) were evaluated to see whether patrons were searching multiple times for the same thing (either with the same term[s] or with different terms), or whether they were searching for different things. five sessions that were clearly not music-related were removed from the sample. each session was categorized as “same/exact,” “same/different,” or “different.” at times, sessions might include several searches for the same thing using altered strings, in addition to searches for other things. those sessions were coded as “different.” for example: crumb zodiac crumb georgy crumb georgy cromb korean music there were 478 multi-search sessions (50.6 percent) in which patrons searched for different things within their session, 391 sessions (41.3 percent) in which patrons looked for the same thing with differing search strings, and 71 (7.5 percent) in which patrons reiterated the exact same search in each attempt. in the 71 sessions in which patrons used the same exact search multiple times, they averaged 2.25 searches. those sessions tagged as “same/exact” provide an opportunity to try to determine why patrons repeat the same search. common themes include: using too broad a search, searching in the wrong place (a non-performing-arts–related search), or repeatedly typing in the wrong information (e.g., typos or other errors) and not realizing the mistake. in the 391 sessions in which patrons spent their session searching for the same thing with different search strings, they did so with an average of 2.96 searches. often the variation in the search string was a change in spelling or a minor change in the terms, but sometimes it involved the addition or subtraction of terms, such as starting with morley fitzewilliam virginalists and going to morley fitzewilliam. in another example, we see how music metadata can prove challenging for searchers to format, such as when a patron started with schumann op.68 (without the necessary space between op. and 68), then progressed to album for the young, and finally schumann album for the young. in the 478 sessions in which patrons searched for completely different things within their session, they did so with an average of 4.08 searches per session.
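the categorization of multi-search sessions described above was done by hand; of the three labels, only “same/exact” lends itself to trivial automation, as in this hedged sketch (the other two labels require human judgment about whether different strings seek the same item). the function name and normalization are illustrative assumptions, not the study's actual procedure.

```python
def is_same_exact(session_queries: list[str]) -> bool:
    """true when a multi-search session repeats one identical query string."""
    normalized = {q.strip().lower() for q in session_queries}
    return len(session_queries) > 1 and len(normalized) == 1

if __name__ == "__main__":
    print(is_same_exact(["schumann op. 68", "schumann op. 68"]))     # True
    print(is_same_exact(["schumann op.68", "album for the young"]))  # False
```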
in many cases, although the searches were for different items, they were related in some way, either by genre, instrument, or some other element, such as in this example: microjazz color me jazz jamey aebersold play-along vandall jazz jazz piano pieces but sometimes the searches were for very different things: debussy voiles composition as problem mart humal composition as problem debussy ursatz average number (and range) of terms per search in looking at the approximately 4,900 searches included in the sample of 2,550 mpal sessions, without removing the small percentage of duplicate searches, two-term searches are the most common, followed by three-term searches—together accounting for more than half of the searches (55.3 percent). oneand four-term searches are the next most common, together accounting for 25.5 percent of searches (see table 3). in 2012, regular es single-term searches were at almost 60 percent.23 information technology and libraries | december 2018 91 number of terms in search string instances percentage (%) 1 605 12.4 2 1,559 31.8 3 1,149 23.5 4 642 13.1 5 400 8.2 6 196 4.0 7 100 2.0 8-15 216 4.5 16-57 29 .06 table 3. words per search string. longer search strings (8-15 terms) ranged from 74 to ten examples each, respectively, while searches with 16 to 20 terms ranged from 8 to 2 examples each, respectively. the following term counts each had only one example in the logs (25, 26, 31, 32, 36, 57). single-term searches types of single-term searches can be broken down into several categories (see table 4). over half (58.4 percent) were searches for personal names or part/all of a work title. some names and work titles are in fact so unique that a one-word search might in fact be successful (e.g., beyoncé, schwanengesang, newsies, or landowska). over a fifth (22.2 percent) were classified as “other or undetermined,” including publisher names, cities, or subject terms. type of one-word search number personal name 260 title 93 instrument/genre 51 tool/location/format 51 call number/barcode/label number 15 other/undetermined 135 table 4. one-term search types. in the tool/location/format category patrons searched for things such as: albums, images, dissertation, rilm, worldcat, jstor, and imslp. while rilm (abstracts of music literature) and worldcat can be found by a search in this tool because they will match on journal or database titles to which we subscribe, a search for imslp [international music score library project] only brings back mentions of imslp in rilm, etc. mpal links to imslp on its webpage, but neither imslp nor the library’s website are targets in es. when patrons only searched for a format, as in a session where a patron first searched for performances, then albums, and then audio cd [sic], it is difficult to know whether the patron expected to be led to a tool that only searched or listed recordings, whether they wanted a list of all of our recordings, or if some other logic was occurring. searchers also used this technique in multi-word searches, such as in the example george gershwin articles. the “black box” | dougan 92 https://doi.org/10.6017/ital.v37i4.10702 single-term searches in the “other/undetermined” category were a mix of subject terms like solfege, tuning, and spectralism. the patron could be trying to find materials related to these topics, examples of them (in the case of solfege), or definitions for them. they also included publisher or label terms such as rubank and puntamayo [sic], and even, on more than one occasion, urls and dois. 
two-term searches and names

the largest segment of the mpal data (31.8 percent) comprises two-term searches. the examples show that often a musical work can be easily sought based on the composer’s name and a word from the title, especially in cases where it is a common title but adding the composer’s name makes it unique (e.g., ligeti requiem). sometimes the patron only knows the work’s characteristics and not its proper name (e.g., lakme duet). patrons do attempt to search for topical material using only two words, and that is not likely enough for a good topic search in most cases, such as in the example mahler dying. sometimes phonetic spellings are employed such as woozy wick followed by woyzeck (which is both a play and a film with this spelling but could also potentially be a misspelling of berg’s opera wozzeck). another example is image cartier followed by images quartier.

personal names are frequently seen in two-term search strings. occasional use of foreign versions of names is observed, e.g., georgy crumb. it is difficult to know if these are typos or an artifact of our high international student population. as with any search that contains only a name, it is impossible to know whether the searcher was looking for materials by that individual or information about them. additionally, when current faculty names are searched, it is difficult to know whether patrons are looking for contact information for them, or scores or recordings by them. also observed in name searches is the phenomenon of patrons repeating their search with a change in order of names, such as bryan gilliam and then gilliam bryan. this occurs with other two-word searches as well, such as a change from introitus gubaidulina to gubaidulina introitus. switching the order of the words in a search no longer makes a difference in most search tools (although in some catalogs, of course, it was once required to formulate an author search as last name, first name). there is still the occasional use of a comma in last name, first name searches here. echoing the results of an earlier study that asked students what data points they used in searching, only occasionally did searches in this data set incorporate specific performers combined with a particular piece or composer: franck mutter, or for a particular edition: idomeneo barenreiter.24 sometimes names/titles were combined with format, such as a session in which a patron searched for hedwig images and then hedwig photo. here it is hard to tell if they are looking for pictures of a fictional owl or images from productions of hedwig and the angry inch, or something else. names are also frequently combined with work numbers instead of title words, such as mozart k.395 and moscheles op.73. search strings in the “other/undetermined” category sometimes included what appears to be an author/date search, perhaps for an article, such as mccord 2006.

long search strings

on the other end of the spectrum, the vast majority of the ten-plus word string searches are for performing arts items, but some were in other subject areas. these long searches are often citations that have been copied and pasted, which can be discerned from the use of punctuation and capitalization, like “welded in a single mass”: memory and community in london’s concert halls during the first world war.25 it is very common in general gateway es searches to see an entire citation pasted in,26 but less common in the mpal module.
searches such as this are often truncated through iteration to make the search more generic (see table 5). given easy search’s doi search recognition function, the longest version of this search would have worked had the doi been correct, but the correct doi number lacks the “.2” at the end (see table 5, query 1). the middle three searches (#s 2-4) failed because none of the a&i services that include this citation use hess, j. for the author’s name, but instead use her full first name (juliet). other examples showed that even when patrons use the exact citation, their search might not be successful if the citation formatting did not match that of the database(s) in which the article was indexed.

query #    query string
1    hess, j. (2014). radical musicking: towards a pedagogy of social change. music education research, 16(3), 229-250. doi: 10.1080/14613808.2014.909397.2).
2    hess, j. (2014). radical musicking: towards a pedagogy of social change. music education research, 16(3), 229-250.
3    hess, j. (2014). radical musicking: towards a pedagogy of social change.
4    hess, j. radical musicking: towards a pedagogy of social change.
5    radical musicking:
table 5. search truncation.

in some instances, searches were long because the patron included additional information such as in this example: bernstein, leonard. arranger: jack mason. title: west side story-selections (for symphonic "full" orchestra piano-conductor score). edition/publisher: hal leonard corporation. it is hard to tell if this was a copy and paste from another source such as a publisher catalog, or if the patron was trying to be very precise. in any case, this search was not successful, but would have been had the searcher omitted extraneous information such as the terms “arranger” and “edition/publisher.”

type of index search—title/author/keyword and adding subsets or tools

easy search does have an advanced search function with indexes for title and author, although it is rarely used by patrons. including repeated searches, searches done selecting the “title” index only numbered 207, or fewer than 10 percent of the sample. searches done selecting “author” were even scarcer, at 141 (5.5 percent). the remaining ~2,300 searches in the sample were conducted using the default keyword search. occasionally there was a misuse of index searching, such as:
ti: js bach english suite
ti: scarlatti sonatas
ti: haydn cello concerto d
in these examples, composer name is included in a title index search. it is unclear whether searchers do not realize that they have selected something other than a keyword search, or whether people inherently think of the composer’s name as part of the title. later in this paper the phenomenon of searches using possessive name forms is discussed, which may be associated.

patrons have the option to start from the main library gateway and perform a search in es, and in the advanced search screen can choose other subject modules such as arts and humanities, life sciences, and so forth, and/or types of tools to cross-search (see figure 6). patrons chose the music and performing arts tool subset in 161 sessions.

figure 6. easy search advanced search screen.
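the doi recognition behavior discussed above with table 5 can be illustrated in general terms: a pasted citation is scanned for a string shaped like a doi before the query is passed along. the following is a hypothetical sketch of that technique only, not easy search’s actual code; the regular expression simply follows the common “10.registrant/suffix” doi shape.

import re

# hypothetical illustration of doi recognition in a pasted citation; not easy
# search's actual implementation. the pattern matches the common
# "10.<registrant>/<suffix>" shape and stops at whitespace.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")

def extract_doi(query):
    match = DOI_PATTERN.search(query)
    if not match:
        return None
    # strip punctuation that often trails along when a citation is pasted
    return match.group(0).rstrip(".,;)")

citation = ("hess, j. (2014). radical musicking: towards a pedagogy of social "
            "change. music education research, 16(3), 229-250. "
            "doi: 10.1080/14613808.2014.909397.2).")
print(extract_doi(citation))  # 10.1080/14613808.2014.909397.2 -- still the patron's incorrect doi

a tool taking this approach would still have to resolve the extracted string against a doi service, which is where the extra “.2” in the patron’s citation would cause the lookup to fail.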
the vast majority of the time (4,557 searches or 93 percent), patrons chose to start from the mpal es on the mpal homepage and do a basic search there, but 179 times patrons started from the mpal es and chose other subsets through the advanced search.27 given our large music education program, logically, some patrons made tool choices that included the music subset and the education and/or social science subsets. but sometimes patrons chose every or almost every option available across multiple unrelated subject areas, which likely made for a very unwieldy result set.

use of boolean operators, quotation marks, parentheses, truncation, etc.

as in most search tools, there are several ways in es to conduct more sophisticated searches. however, patrons do not employ these techniques often, in part because they don’t always have to. in most older catalogs (including our classic voyager opac), searchers had to use boolean terms in capital letters, whereas in vufind and worldcat boolean and is now implied between terms. in the 159 examples of boolean logic in the searches, and is the most common term used. interestingly, some researchers used plus signs instead of and (as they might in google), not just between individual words, but in between multi-word segments of the string (without employing quotation marks). however, the + sign, like and, is ignored/implied by es.
berg + warm die lufte
progressive studies for trumpet
progressive studies for trumpet + john miller
progressive studies for trumpet (john miller)
new orleans + bossa nova
johnny alf + brazil
dick farney + brazil
dick farney + booker pittman
in some cases, the use of boolean did not seem intentional; that is, the term “and” appears as part of a common phrase (especially for instrument combinations), such as in webern violin and piano. only a handful of the boolean searches included examples of or and not, which seemed to stem from a class assignment designed by a professor, as the search strings are all very similar. one set is below:
machaut not mass
machaut or mass
machaut and mass
machaut mass
notre dame machaut mass
machaut and mass
commas were sometimes seen to stand in for boolean operators in a sense, or at least to separate search concepts, like the plus signs above, but were not counted in the total uses of boolean terms cited above. they are ignored by es.
rachmaninoff, moment music
planet, holst
city noir, john adams
piazzolla, flute and marimba
mussorgsky, pictures at an exhibition, manfred schandert
searchers used quotation marks on occasion (n=162) to keep phrases together, and parentheses were also used in this manner eight times (although they are ignored by es), such as in these examples: preludes and fugues (well-tempered clavier) cohen chaconne (from partita in d minor, bwv 1004) in some cases, searchers did not seem to grasp the function of quotation marks, as in this example: “snowforms" raymond murrey schafer, which was also observed by avery and tracy.28

truncation symbols can be another powerful tool in a searcher’s arsenal, but examples of their use in the transaction logs show that most searchers who attempt to use them do not understand them, such as in the examples doctor atomic?, boethius music,* and:
orchestra* history
history of the orchestra
orchest* history
orchestr* history
orchestra history
orchestral history
in fact, the current library catalog assists users by automatically applying truncation logic so that “symphony” returns results for “symphonies” and vice versa. it is doubtful that this is generally known among users and likely functions in a manner transparent to most of them.

work numbers and key indications

searching by music metadata elements such as work or opus numbers and key designations has always proved challenging in online search environments given that numbers and single letters can appear in other parts of the catalog record with different meanings (e.g., 1 part instead of symphony no. 1). added to this is the difficulty of describing items that contain multiple works—the item’s title might be “mozart’s complete symphonies” or “beethoven symphonies 1-6” without complete work details provided. nevertheless, 134 searches had some form of work number included, and 36 searches included a key indication. fantasie in f# minor presto georg philipp telemann and concerto en ut mineur j.c. bach are further examples of why a work’s key is hard to search by, one because of the use of the french solfege syllable “ut” and one because it includes a sharp symbol (#).29 the difficulties this can cause often led searchers to try various permutations of their search. mozart concerto g major sam franko; mozart concerto k 283 sam franko; scores; mozart violin concerto g major; mozart violin concerto g major sam franko; mozart violin concerto; sonata g major flute cpe bach sonata g major flute bach hamburger sonata flute cpe bach hamburger sonata hamburger sonata it is counterintuitive to searchers that including specific details in their search string might not help, but that is in fact the case in many online catalogs. searchers often run into the question of how or if to include the work indicator (op., k., bwv, etc.), which can lead to a “misuse” of this extra data such as in mozart k501 and mahler symphony no.9 (no spaces). another observation includes the use of what the author calls musicians’ shorthand. that is, those familiar with classical repertoire will know that examples such as sibelius 1 and mahler 5 are searches for symphonies even though they do not say so, but it will be harder, if not impossible, for the catalog to interpret that, leaving the searcher to sort through many extra results.
in addition is the long-standing issue of whether to enter the number as “1”, “1st”, or “first” and whether the system can interpret these against the form of the number present in the catalog record.

search by format or edition type

in forty-seven examples searchers used format terms in their searches, including score, vocal score, full score, dvd, performance recordings, albums, and audio cd as well as the following: prokofiev romeo and juliet orchestra parts orchestra excerpts prokofiev romeo and juliet viola tosca harp part assassins cd saxophone article

in fifteen examples searchers searched for edition types including urtext, facsimile, critical edition, and complete works. in the latter case they occasionally used the word “complete” and the composer’s name, such as complete schumann or complete webern. unfortunately, this approach will often not be successful, because even though the term “complete works” is used colloquially by musicians, the titles of such editions are often something else (and often in a foreign language, such as “opera omnia”).

other observations on formulation of searches

searching by call numbers and recording label numbers

while some catalogs allow call number searches, our current instance of vufind does not have a call number index, and keyword searching for them only works in some instances.30 but while call number searching does not work well in vufind (e.g., it has to be done as a keyword search and not a call number index search like in voyager), it still works in es because it is searching by keywords. there were thirty-two examples of searches in mpal’s es where patrons used entire call numbers or the first part of a classification number to find related materials:
count basie biography
count basie ml 410
duke ellington ml 410
duke ellington bibiliography
it is also not unrealistic to think that patrons might want to search by a recording’s label number, since most catalogs provide search options for isbns and issns for print materials. searchers attempted this in a handful of searches like lpo-0014,31 7.24356e+11,32 and 777337-2.33 unfortunately this information is not usually reflected in mpal’s catalog records.

common descriptions, natural language queries, genre queries, and context words

as mentioned already with the examples mahler 1 and complete works, patrons regularly search with terms and phrases that make sense to them or that are used colloquially when discussing music and sources, which may or may not be in the bibliographic record. additional examples in the data set include:
handel messiah critical edition
rodelinda in italian
mamma mia! book [for the text of a musical]
grove encyclopedia [the title of this is in fact “dictionary” not “encyclopedia”]
mgg sachteil [the abbreviation for musik in geschichte und gegenwart and the name for a section of it]
dance collection
the last example in the list is particularly intriguing—somewhat like the earlier search examples of performances and albums, one wonders if the patron hoped to find everything in that category and then be able to browse; however, it is hard to know what the searcher anticipated getting in return.
sometimes natural language queries appear, often in an attempt to find a smaller part of a larger work, such as the slow movement of brahms's first symphony, anonymous chant from vespers for christmas day, and chaconne (from partita in d minor, bwv 1004); or for things other than musical works, such as in reviews of stravinsky article by robert craft. another variation on natural language or colloquial searches is the use of the possessive form of composer names. although not common (23 examples), patrons do this when searching for composer and title of a work, e.g., verdi's requiem. it seems unlikely that people do this when searching for books or other works, but musicians make works possessive to the composer, such as in the example mendelssohn's violin concerto, to differentiate between pieces with the same form/generic title. in rare cases searchers used the term “by,” such as jeptha by carrissimi.

genre searches such as south indian vocal music and hindustani classical music show that people may want to search the way they might in pandora or itunes, although it is possible this person was looking for secondary materials and not recordings or scores:
pop
female pop
women pop
contemporary pop
searchers also exhibit a desire to find things by genre and instrument or voice type, such as soprano arias [which is ‘high voice’ in the lc subject heading], mozart satb sanctus, and baroque arias for medium voice. other examples include marimba literature, organ literature, and organ techniques. catalogs do not necessarily aid in these types of searches, even though they are natural constructions for users. sometimes searchers add context words to their search like they would in google in a way that will not necessarily help them in the catalog, such as daniel read composer.

discussion

even given the difficulties of searching for music materials, mpal patrons have embraced es—its module has almost as many searches as the undergraduate library’s, which serves a much larger population. it also has twice as many searches as the social science, health, and education library module, which also serves a much larger population than mpal. one of the possible reasons for this is the fact that mpal was an early adopter of developing an es subject module that could be searched from our homepage, which means our patrons have had longer to grow accustomed to using it. mpal has a lower average words-per-search ratio (3.375 or 3.349 depending on data set) than most other es modules, likely because there are more composer plus title keyword searches for musical works and not as many pasted article citation searches, which tend to be longer. this is supported by the comparison of the average number of words in searches done in the gateway books tab (4.048) vs. the gateway articles tab (6.35). in addition, although two- and three-word searches are most common, mpal has a significant number of single-word searches (12.4 percent). such searches can work in music, when there are unique titles like turandot and treemonisha that are unlikely to appear for more than one composer or as terms in other disciplines. for this same reason, single- or even two-word searches are unlikely to be effective in most other disciplines. at around seven words per search a transition in search patterns occurs. eight-word and longer search strings are almost always some version of a title of a book, article, chapter, or dissertation, etc.,
and strings with six words and fewer tend to be topical searches or combination composer/piece searches. other transaction log studies of es have shown that “title searching and results display—of journal titles, article titles, and book titles—is being heavily employed by users.”34 however, in music, where title alone may not be sufficient to identify and retrieve a musical work, searches with a combination of composer name and elements of the title and/or additional information will always be most prevalent.

search location appropriateness and context

even though discovery layers and federated search tools help with minimizing the number of silos and places in which scholars need to search, there are still issues with patrons attempting to use the es box to find things it is not designed to find.35 searchers see a box and search, without always understanding the context. this can happen on multiple levels. the mpal page clearly states that the mpal es box searches for arts-related things, but obviously patrons do not always see or comprehend this, even after they type in many queries that do not provide (good) results. this is likely related to the number of visitors to mpal from other disciplines who do not realize that there are various differently scoped versions of es. the following example could be a theatre set-construction-related search, which would work only moderately well in our tool. or, it may have been conducted by an architecture or structural engineering student, who would have better luck using a different es module.
light weigh [sic] structures in architecture
building research
the evolving design vocabulary of fabric structure
the engineering discipline of tent structures
building research jan/feb 1972:22
it would be ideal if the system were smart enough to make suggestions: “you appear to need architecture resources—if you are not finding what you need, might we suggest tool x, y, or z?” while es does this to an extent when it can in the generic es, it does not do so in the subject modules, and in reality, can only go so far. it raises the question of whether we are doing patrons a disservice by offering pre-defined subject modules. while this approach has some benefits for most users, it also creates different challenges for some. mpal’s es does not target all available relevant online tools and neither does the general es, so interdisciplinary researchers still need to be cautious of silos, even well-intentioned ones created by librarians or traditional ones created by vendors. it is difficult to inform patrons of this in one-box search settings—they see the box and are eager to get started without first having to read a lengthy set of instructions.

search location context is also important when patrons use es to try to find things that are described or linked on our website and not in es, such as for any of our named special collections. patrons also use es to find tools such as naxos, jstor, worldcat, and librarysource, some of which are targeted by es and some of which are not. es will at least provide a link to a tool, however (see figure 7).

figure 7. easy search post-search suggestion.

these particular tools are all also linked from the mpal website (in fact, naxos is linked further down the home page from the es box) and we also have a separate tool that enables one to search for databases and online journals by name.
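the kind of “you appear to need architecture resources” suggestion imagined above could, in principle, be driven by a simple vocabulary match against the queries in a session. the sketch below is only a toy heuristic under assumed keyword lists and an assumed threshold; it does not reflect how easy search generates its post-search suggestions.

# toy illustration of a subject-module suggestion heuristic; the hint lists and
# threshold are invented for demonstration and are not easy search's logic.
SUBJECT_HINTS = {
    "architecture": {"architecture", "structures", "structure", "fabric", "tent", "building"},
    "music": {"symphony", "opera", "concerto", "score", "aria", "sonata"},
}

def suggest_modules(session_queries, threshold=2):
    """return subject modules whose hint words appear at least `threshold` times."""
    words = [w for q in session_queries for w in q.lower().split()]
    suggestions = []
    for module, hints in SUBJECT_HINTS.items():
        hits = sum(1 for w in words if w in hints)
        if hits >= threshold:
            suggestions.append(module)
    return suggestions

session = [
    "light weigh structures in architecture",
    "the engineering discipline of tent structures",
]
print(suggest_modules(session))  # ['architecture']

even a heuristic this crude raises the design questions noted above: how confident the match must be before interrupting the patron, and whether the suggestion should point to another es module or to an entirely different tool.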
on some occasions, searchers used es to look for help using library tools, such as in the following example: rilm retrieval rilm using rilm

the library website, not the discovery layer, is a better tool for finding instructions, since help information is currently delivered via various libguides. however, this is not intuitive to patrons. on a related note, it is interesting to consider whether patrons searching for specific tools such as imslp expect to find results from non-library resources in our search layers, or if they simply do not differentiate in their minds what is an open tool and what is a library subscription tool.

patron knowledge level

many of the observations of this study are related to known-item searching, since a large percentage of people looking for music materials are looking for specific pieces of music. earlier studies show that it is difficult to search for something if you do not know what it is.36 this can be seen in examples like ombramaifu handel (should be ombra mai fu) or the interworkings of tennis (which was followed by the correct inner game of tennis). topical searches can be especially difficult in any subject when the patron does not quite know how to put what they want into words (or literally does not know the right words, especially in the case of our many patrons for whom english is not their first language).
qualtize musical tension
spell change click: kw:qualitize musical tension
quantize musical tension
quantitative musical tension
music motive similarity
surveying musical form through melodic-motivic similarities
a paradigmatic approach to extract the melodic structure of a musical piece
inding subsequences of melodies in musical pieces
spell change click: kw:finding subsequences melodies musical pieces
similarity measures for melodies
measures of musical tension
measuring musical tension
this echoes head and eisenberg’s 2009 findings and dempsey and valenti’s 2016 findings.37

shortcomings of the easy search tool

this study helped illuminate some shortcomings in es. sometimes the search formulation changes from es to the target, for example cramer preludes in es becomes all(cramer preludes) [a bound phrase] in one target, resulting in many fewer results than if the search had been done in the native interface. patrons may not realize this as they are searching. in another case there were no results for danças folclóricas brasileiras e suas aplicações educativas but removing the diacritics retrieves this title in our catalog, so it appears that diacritics do not function in es (at least when vufind is the target)—something that may not be apparent to searchers and hopefully can be addressed in the code.

further research

additional analysis could be done on this data set, including assessing whether searches were for known items or topics, and more specifically whether for articles, books, scores, or recordings. however, in many cases it is difficult to tell if a patron is looking for a score, recording, or information about a piece or composer. other research on es shows over half of searches (just over 58 percent in 2015) in the main es are for known items.38 this percentage is likely to be much higher in mpal’s es. with an enhanced data set it would also be possible to identify which target tools searchers are choosing most often.

conclusion

while many patrons (and librarians) are eager for a tool that can truly search everything, we are not there yet.
some have tried to make music-specific interfaces for library catalogs, but this work is not widespread.39 perhaps because music students are often searching for things other than articles, it would be better to have one tool that searches the catalog and streaming media tools and one that only searches article indexes. some schools have taken this approach—configuring their discovery layer indexes to include article content but not the local catalog. there were several observations in this data of patron search behaviors that are not fully supported by library systems in all cases, but perhaps should be (e.g., use of + signs, searching by record label numbers or genre names/types of music/formats). in some cases, this is an issue with the metadata standards in use and in others it is about needing more flexible search options based on the metadata that we already have. newcomer et al. discuss this in their article outlining music discovery requirements.40 tools like easy search and discovery layers solve some problems for users but can create others. dedicated library catalogs are still generally the best tools for finding scores and recordings in our physical (and some online) collections, but not all libraries offer that tool anymore, instead offering a discovery layer as the primary search tool. in those cases, serious consideration needs to be given to facets, the ability to limit by format, and especially the frbrization of items, which is particularly problematic for music. additionally, there is a continued need for targeted instruction for music library users, because not only are the tools used in libraries less than perfect, the inherent challenges in searching for music because of its formats and titles are aggravated by musicians’ use of shorthand and colloquialisms to describe music materials.

endnotes

1 john boyd et al., “the one-box challenge: providing a federated search that benefits the research process,” serials review 32, no. 4 (december 2006): 247–54, https://doi.org/10.1016/j.serrev.2006.08.005; sharon dyas-correia et al., “‘the one-box challenge: providing a federated search that benefits the research process’ revisited,” serials review 41, no. 4 (october-december 2015): 250–56, https://doi.org/10.1080/00987913.2015.1095581.

2 lucy holman, “millennial students’ mental models of search: implications for academic librarians and database developers,” journal of academic librarianship 37, no. 1 (january 2011): 19–27, https://doi.org/10.1016/j.acalib.2010.10.003; brandi porter, “millennial undergraduate research strategies in web and library information retrieval systems,” journal of web librarianship 5, no. 4 (july-december 2011): 267–85, https://doi.org/10.1080/19322909.2011.623538; martin zimerman, “digital natives, searching behavior, and the library,” new library world 113, nos. 3/4 (2012): 174–201, https://doi.org/10.1108/03074801211218552.

3 susan avery and dan tracy, “using transaction log analysis to assess student search behavior in the library instruction classroom,” reference services review 42, no. 2 (june 2014): 332, https://doi.org/10.1108/rsr-08-2013-0044.

4 andrew asher, lynda m. duke, and suzanne wilson, “paths of discovery: comparing the search effectiveness of ebsco discovery service, summon, google scholar, and conventional library resources,” college & research libraries 74, no. 5 (september 2013): 473, https://doi.org/10.5860/crl-374.
5 megan dempsey and alyssa valenti, “student use of keywords and limiters in web-scale discovery searching,” journal of academic librarianship 42, no. 3 (may 2016): 203, https://doi.org/10.1016/j.acalib.2016.03.002.

6 annie r. armstrong, “student perceptions of federated searching vs. single database searching,” reference services review 37, no. 3 (august 2009): 291–303, https://doi.org/10.1108/00907320910982785; c. jeffrey belliston, jared l. howland, and brian c. roberts, “undergraduate use of federated searching: a survey of preferences and perceptions of value-added functionality,” college & research libraries 68, no. 6 (november 2007): 472-86, https://doi.org/10.5860/crl.68.6.472; sarah d. williams, angela bonnell, and bruce stoffel, “student feedback on federated search use, satisfaction, and web presence: qualitative findings of focus groups,” reference and user services quarterly 49, no. 2 (winter 2009): 131–39.

7 asher et al., “paths of discovery,” 476.

8 troy swanson and jeremy green, “why we are not google: lessons from a library web site usability study,” journal of academic librarianship 37, no. 3 (may 2011): 227, https://doi.org/10.1016/j.acalib.2011.02.014.

9 cory lown, tito sierra, and josh boyer, “how users search the library from a single search box,” college & research libraries 74, no. 3 (may 2013): 240, https://doi.org/10.5860/crl-321.

10 sarah dahlen and kathlene hanson, “preference vs. authority: a comparison of student searching in a subject-specific indexing and abstracting database and a customized discovery layer,” college & research libraries 78, no. 7 (november 2017), 892, https://doi.org/10.5860/crl.78.7.878.

11 ibid.

12 li fu and cynthia thomes, “implementing discipline-specific searches in ebsco discovery service,” new library world 115, nos. 3/4 (2014): 102–15, https://doi.org/10.1108/nlw-01-2014-0003.

13 kirstin dougan, “finding the right notes: an observational study of score and recording seeking behaviors of music students,” journal of academic librarianship 41, no. 1 (january 2015): 61–67, https://doi.org/10.1016/j.acalib.2014.09.013.

14 jennifer m. mayer, “serving the needs of performing arts students: a case study,” portal: libraries & the academy 15, no. 3 (july 2015): 416, https://doi.org/10.1353/pla.2015.0036.

15 joe clark and kristin yeager, “seek and you shall find? an observational study of music students’ library catalog search behavior,” journal of academic librarianship 44, no. 1 (january 2018): 105-12, https://doi.org/10.1016/j.acalib.2017.10.001.

16 christine d. brown, “straddling the humanities and social sciences: the research process of music scholars,” library & information science research 24, no.
1 (march 2002): 73–94, https://doi.org/10.1016/s0740-8188(01)00105-0; stephann makri and claire warwick, “information for inspiration: understanding architects' information seeking and use behaviors to inform design,” journal of the american society for information science & technology 61, no. 9 (september 2010): 1745–70, https://doi.org/10.1002/asi.21338; francesca marini, “archivists, librarians, and theatre research,” archivaria 63 (2007): 7–33; ann medaille, “creativity and craft: the information-seeking behavior of theatre artists,” journal of documentation 66, no. 3 (may 2010): 327–47, https://doi.org/10.1108/00220411011038430; marybeth meszaros, “a theatre scholar-artist prepares: information behavior of the theatre researcher,” in advances in library administration and organization (v. 29), delmus e. williams and janine golden, eds. (bingley, uk: emerald group publishing limited, 2010): 185-217; bonnie reed and donald r. tanner, “information needs and library services for the fine arts faculty,” journal of academic librarianship 27, no. 3 (may 2001): 231, https://doi.org/10.1016/s0099-1333(01)00184-7; shannon robinson, “artists as scholars: the research behavior of dance faculty,” college & research libraries 77, no. 6 (november 2016): 779-94, https://doi.org/10.5860/crl.77.6.779.

17 ethelene whitmire, “disciplinary differences and undergraduates’ information‐seeking behavior,” journal of the association for information science and technology 53 (june 2002): 631-38, https://doi.org/10.1002/asi.10123.

18 tina chrzastowski and lura joseph, “surveying graduate and professional students' perspectives on library services, facilities and collections at the university of illinois at urbana-champaign: does subject discipline continue to influence library use?,” issues in science & technology librarianship 45, no. 1 (winter 2006), https://doi.org/10.5062/f4dz068j.

19 ellen collins and graham stone, “understanding patterns of library use among undergraduate students from different disciplines,” evidence based library and information practice 9 (september 2014): 51–67, https://doi.org/10.18438/b8930k.

20 this is up from the 4.33 average reported by mischo in 2012 (164).

21 including direct from departmental webpage and via gateway es dropdown choices.

22 in mischo’s 2012 analysis of easy search logs, 52 percent of sessions had one string and 48 percent had two or more. by 2015, single-query sessions had risen to 57 percent (william mischo et al., “the bento approach to library discovery: web-scale and beyond,” internet librarian international, october 21, 2015).

23 william h. mischo et al., “user search activities within an academic library gateway: implications for web-scale discovery systems,” in planning and implementing resource discovery tools in academic libraries, ed. mary popp and diane dallis (hershey, pa: igi global, 2012), 163.

24 kirstin dougan, “information seeking behaviors of music students,” reference services review 40, no.
4 (november 2012): 563, https://doi.org/10.1108/00907321211277369.

25 vanessa williams, “‘welded in a single mass’: memory and community in london’s concert halls during the first world war,” the journal of musicological research 33, nos. 1–3 (2014): 27–38.

26 mischo, “user search activities,” 162.

27 this echoes earlier research that shows most searchers use default settings and keyword searches.

28 avery and tracy, “using transaction logs,” 31.

29 barbara d. henigman and richard burbank, “online music symbol retrieval from the access angle,” information technology & libraries 14, no. 1 (march 1995): 5–16.

30 we still have to use our older voyager opac or the staff-side of voyager to effectively search by call number until we get a newer version of vufind.

31 symphony no. 4 in e flat “romantic” by anton bruckner, klaus tennstedt (conductor), london philharmonic orchestra (performer).

32 this is mozart, “clarinet concerto in a, k. 622,” meyer/berlin philharmonic/abbado emi classics 57128; 7.24356e+11.

33 this is reich: sextet / piano phase / eight lines (griffiths kevin/ london steve reich ensemble/ the/ stephen wallace) (cpo: 777337-2).

34 mischo, “user search activities,” 169.

35 this reinforces what lown and asher et al. found as cited in the literature review above.

36 kirstin dougan, “finding the right notes: an observational study of score and recording seeking behaviors of music students,” journal of academic librarianship 41, no. 1 (january 2015): 66.

37 alison head and michael eisenberg, “finding context: what today’s college students say about conducting research in the digital age,” progress report (2009) (retrieved from http://projectinfolit.org/images/pdfs/pil_progressreport_2_2009.pdf); dempsey and valenti, “student use of keywords and limiters,” 2016.

38 william h. mischo et al., “the bento approach to library discovery: web-scale and beyond,” internet librarian international, october 21, 2015.

39 anke hofmann and barbara wiermann, “customizing music discovery services: experiences at the hochschule für musik und theater, leipzig,” music reference services quarterly 17, no. 2 (june 2014): 61–75, https://doi.org/10.1080/10588167.2014.904699; bob thomas, “creating a specialized music search interface in a traditional opac environment,” oclc systems & services 27, no. 3 (august 2011): 248–56, https://doi.org/10.1108/10650751111164588.

40 nara newcomer et al., “music discovery requirements: a guide to optimizing interfaces,” notes 69, no. 3 (march 2013): 494-524, https://doi.org/10.1353/not.2013.0017.
http://projectinfolit.org/images/pdfs/pil_progressreport_2_2009.pdf https://doi.org/10.1080/10588167.2014.904699 https://doi.org/10.1108/10650751111164588 https://doi.org/10.1353/not.2013.0017 abstract introduction background literature review general search studies and single search boxes search and library use in different disciplines methodology findings average number (and range) of searches per session average number (and range) of terms per search single-term searches two-term searches and names long search strings type of index search—title/author/keyword and adding subsets or tools use of boolean operators, quotation marks, parentheses, truncation, etc. work numbers and key indications search by format or edition type other observations on formulation of searches searching by call numbers and recording label numbers common descriptions, natural language queries, genre queries, and context words discussion search location appropriateness and context patron knowledge level shortcomings of the easy search tool further research conclusion endnotes june_ital_fagan_final an evidence-based review of academic web search engines, 2014-2016: implications for librarians’ practice and research agenda jody condit fagan an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 7 7 abstract academic web search engines have become central to scholarly research. while the fitness of google scholar for research purposes has been examined repeatedly, microsoft academic and google books have not received much attention. recent studies have much to tell us about google scholar’s coverage of the sciences and its utility for evaluating researcher impact. but other aspects have been understudied, such as coverage of the arts and humanities, books, and non-western, non-english publications. user research has also tapered off. a small number of articles hint at the opportunity for librarians to become expert advisors concerning scholarly communication made possible or enhanced by these platforms. this article seeks to summarize research concerning google scholar, google books, and microsoft academic from the past three years with a mind to informing practice and setting a research agenda. selected literature from earlier time periods is included to illuminate key findings and to help shape the proposed research agenda, especially in understudied areas. introduction recent pew internet surveys indicate an overwhelming majority of american adults see themselves as lifelong learners who like to “gather as much information as [they] can” when they encounter something unfamiliar (horrigan 2016). although significant barriers to access remain, the open access movement and search engine giants have made full text more available than ever.1 the general public may not begin with an academic search engine, but google may direct them to google scholar or google books. within academia, students and faculty rely heavily on academic web search engines (especially google scholar) for research; among academic researchers in high-income areas, academic search engines recently surpassed abstracts & indexes as a starting place for research (inger and gardner 2016, 85, fig. 4). given these trends, academic librarians have a professional obligation to understand the role of academic web search engines as part of the research process. jody condit fagan (faganjc@jmu.edu) is professor and director of technology, james madison university, harrisonburg, va. 
1 khabsa and giles estimate “almost 1 in 4 of web accessible scholarly documents are freely and publicly available” (2014, 5). an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 8 two recent events also point to the need for a review of research. legal decisions in 2016 confirmed google’s right to make copies of books for its index without paying or even obtaining permission from copyright holders, solidifying the company’s opportunity to shape the online experience with respect to books. meanwhile, microsoft rebooted their academic web search engine, now called microsoft academic. at the same time, information scientists, librarians, and other academics conducted research into the performance and utility of academic web search engines. this article seeks to review the last three years of research concerning academic web search engines, make recommendations related to the practice of librarianship, and propose a research agenda. methodology a literature review was conducted to find articles, conference presentations, and books about the use or utility of google books, google scholar, and microsoft academic for scholarly use, including comparisons with other search tools. because of the pace of technological change, the focus was on recent studies (2014 through 2016, inclusive). a search was conducted on “google books” in ebsco’s library and information science and technology abstracts (lista) on december 19, 2016, limited to 2014-2016. of the 46 results found, most were related to legal activity. only four items related to the tool’s use for research. these four titles were entered into google scholar to look for citing references, but no additional relevant citations were found. in the relevant articles found, the literature reviews testified to the general lack of studies of google books as a research tool (abrizah and thelwall 2014; weiss 2016) with a few exceptions concerning early reviews of metadata, scanning, and coverage problems (weiss 2016). a search on “google books” in combination with “evaluation or review or comparison” was also submitted to jmu’s discovery service,2 limited to 2014-2016 in combination with the terms. forty-nine items were found and from these, three relevant citations were added; these were also entered into google scholar to look for citing references. however, no additional relevant citations were found. thus, a total of seven citations from 2014-2016 were found with relevant information concerning google books. earlier citations from the articles’ bibliographies were also reviewed when research was based on previous work, and to inform the development of a fuller research agenda. a search on “microsoft academic” in lista on february 3, 2017 netted fourteen citations from 2014-2016. only seven seemed to focus on evaluation of the tool for research purposes. a search on “microsoft academic” in combination with terms “evaluation or review or comparison” was also submitted to jmu’s discovery service, limited to 2014-2016. eighteen items were found but no additional citations were added, either because they had already been found or were not relevant. 
the seven titles found in lista were searched in google scholar for citing references; four additional relevant citations were found, plus a paper relevant to google scholar not 2 jmu’s version of ebsco discovery service contained 453,754,281 items at the time of writing and is carefully vetted to contain items of curricular relevance to the jmu community (fagan and gaines 2016). information technology and libraries | june 2017 9 previously discovered (weideman 2015). thus, a total of eleven citations were found with relevant information for this review concerning microsoft academic. because of this small number, several articles prior to 2014 were included in this review for historical context. an initial search was performed on “google scholar” in lista on november 19, 2016, limited to 2014-2016. this netted 159 results, of which 24 items were relevant. a search on “google scholar” in combination with terms “evaluation or review or comparison” was also submitted to jmu’s discovery tool limited to 2014-2016, and eleven relevant citations were added. items older than 2014 that were repeatedly cited or that formed the basis of recent research were retrieved for historical context. finally, relevant articles were submitted to google scholar, which netted an additional 41 relevant citations. altogether, 70 citations were found to articles with relevant information for this review concerning google scholar in 2014-2016. readers interested in literature reviews covering google scholar studies prior to 2014 are directed to (gray et al. 2012; erb and sica 2015; harzing and alakangas 2016b). findings google books google books (https://books.google.com) contains about 30 million books, approaching the library of congress’s 37 million, but far shy of google’s estimate of 130 million books in existence (wu 2015), which google intends to continue indexing (jackson 2010). content in google books includes publisher-supplied, self-published, and author-supplied content (harper 2016) as well as the results of the famous google books library project. started in december 2004 as the “google print” project,3 the project involved over 40 libraries digitizing works from their collections, with google indexing and performing ocr to make them available in google books (weiss 2016; mays 2015). scholars have noted many errors with google books metadata, including misspellings, inaccurate dates, and inaccurate subject classifications (harper 2016; weiss 2016). google does not release information about the database’s coverage, including which books are indexed or which libraries’ collections are included (abrizah and thelwall 2014). researchers have suggested the database covers mostly u.s. and english-language books (abrizah and thelwall 2014; weiss 2016). the conveniences of google books include limits by the type of book availability (e.g. free ebooks vs. google e-books), document type, and date. the detail view of a book allows magnification, hyperlinked tables of contents, buying and “find in a library” options, “my library,” and user history (whitmer 2015). google books also offers textbook rental (harper 2016) and limited print-on-demand services for out-of-print books (mays 2015; boumenot 2015). in april 2016, the supreme court affirmed google’s right to make copies for its index without paying or even obtaining permission from copyright holders (authors guild 2016; los angeles times 2016). 
scanning of library books and “snippet view” was deemed fair use: “the purpose of the copying is highly transformative, the public display of text is limited, and the revelations do 3 https://www.google.com/googlebooks/about/history.html an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 10 not provide a significant market substitute for the protected aspects of the originals” (u.s. court of appeals for the second circuit 2015). literature concerning high-level implications of google books suggests the tool is having a profound effect on research and scholarship. the tool has been credited for serving as “a huge laboratory” for indexing, interpretation, working with document image repositories, and other activities (jones 2010). at the same time, the academic community has expressed concerns about google books’s effects on social justice and how its full-text search capability may change the very nature of discovery (hoffmann 2014; hoffmann 2016; szpiech 2014). one study found that books are far more prevalently cited in wikipedia than are research articles (kousha and thelwall 2017). yet investigations of google books’ coverage and utility as a research tool seem to be sorely lacking. as weiss noted, “no critical studies seem to exist on the effect that google books might have on the contemporary reference experience” (weiss 2016, 293). furthermore, no information was found concerning how many users are taking advantage of google books; the tool was noticeably absent from surveys such as (inger and gardner's (2016) and from research centers such as the pew internet research project. in a largely descriptive review, harper (2016) bemoaned google books’ lack of integration with link resolvers and discovery tools, and judged it lacking in relevant material for the health sciences, because so much of the content is older. she also noted the majority of books scanned are in english, which could skew scholarship. the non-english skew of google books was also lamented by weiss, who noted an “underrepresentation of spanish and overestimation of french and german (or even japanese for that matter)” especially as compared to the number of spanish speakers in the united states (weiss 2016, 286-306). whitmer (2015) and mays (2015) provided practical information about how google books can be used as a reference tool. whitmer presented major google books features and challenged librarians to teach google books during library instruction. mays conducted a cursory search on the 1871 chicago fire and described the primary documents she retrieved as “pure gold,” including records of city council meetings, notes from insurance companies, reports from relief societies, church sermons on the fire, and personal memoirs (mays 2015, 22). mays also described google books as a godsend to genealogists for finding local records (e.g. police departments, labor unions, public schools). in her experience, the geographic regions surrounding the forty participating google books library project libraries are “better represented than other areas” (mays 2015, 25). mays concludes, “its poor indexing and search capabilities are overshadowed by the ease of its fulltext search capabilities and the wonderful ephemera that enriches its holdings far beyond mere ‘books’” (mays 2015, 26). 
abrizah and thelwall (2014) investigated whether google books and google scholar provided “good impact data for books published in non-western countries.” they used a comprehensive list of arts, humanities, and social sciences books (n=1,357) from the five main university presses in information technology and libraries | june 2017 11 malaysia 1961-2013. they found only 23% of the books were cited in google books4 and 37% in google scholar (p. 2502). the overlap was small: only 15% were cited in both google scholar and google books. english-language books were more likely to be cited in google books; 40% of english language books were cited versus 16% malay. examining the top 20 books cited in google books, researchers found them to be mostly written in english (95% in google books vs 29% in the sample), and published by university of malaysia press (60% in google books vs 26% in the sample) (2505). the authors concluded that due to the low overlap between google scholar and google books, searching both engines was required to find the most citations to academic books. kousha and thelwall (2015; 2011) compared google books with thomson reuters book citation index (bkci) to examine its suitability for scholarly impact assessment and found google books to have a clear advantage over bkci in the total number of citations found within the arts and humanities, but not for the social sciences or sciences. they advised combining results from bkci with google books when performing research impact assessment for the arts and humanities and social sciences, but not using google books for the sciences, “because of the lower regard for books among scientists and the lower proportion of google books citations compared to bkci citations for science and medicine” (kousha and thelwall 2015, 317). microsoft academic microsoft academic (https://academic.microsoft.com) is an entirely new software product as of 2016. therefore, the studies cited prior to 2016 refer to entirely different search engines than the one currently available. however, a historical account of the tool and reviewers’ opinions was deemed helpful for informing a fuller picture of academic web search engines and pointing to a research agenda. microsoft academic was born as windows live academic in 2006 (carlson 2006), was renamed live search academic after a first year of struggle (jacsó 2008), and was scrapped two years later after the company recognized it did not have sufficient development support in the united states (jacsó 2011). microsoft asia research group launched a beta tool called libra in 2009, which redirected to the “microsoft academic search” service by 2011. early reviews of the 2011 edition of microsoft academic search were promising, although the tool clearly lacked the quantity of data searched by google scholar (jacsó 2011; hands 2012). there were a few studies involving microsoft academic search in 2014. ortega and aguillo (2014) compared microsoft academic search and google scholar citations for research evaluation and concluded “microsoft academic search is better for disciplinary studies than for analyses at institutional and individual levels. on the other hand, google scholar citations is a good tool for individual assessment because it draws on a wider variety of documents and citations” (1155). 4 google books does not support citation searching; the researchers searched for the book title to manually find citations to a book. 
an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 12 as part of a comparative investigation of an automatic method for citation snowballing using microsoft academic search, choong et al. (2014) manually searched for a sample of 949 citations to journal or conference articles cited from 20 systematic reviews. they found microsoft academic search contained 78% of the cited articles and noted its utility for testing automated methods due to its free api and no blocks to automated access. the researchers also tested their method against google scholar, but noted “computer-access restrictions prevented a robust comparison” (n.p.). also in 2014, orduna-malea et al. (2014) attempted a longitudinal study of disciplines, journals, and organizations in microsoft academic search only to find the database had not been updated since 2013. furthermore they found the indexing to be incomplete and still in process, meaning microsoft academic search’s presentation of information about any particular publication, organization, or author was distorted. despite this finding, mas was included in two studies of scholar profiles. ortega (2015) compared scholar profiles across google scholar, microsoft academic search, research gate, academia.edu, and mendeley, and found little overlap across the sites. they also found social and usage indicators did not consistently correlate with bibliometric indicators, except on the researchgate platform. social and usage indicators were “influenced by their own social sites,” while bibliometric indicators seemed more stable across all services (13). ward et al. (2015) still included microsoft academic search in their discussion of scholarly profiles as part of the social media network, noting microsoft academic search was painfully time-consuming to work with in terms of consolidating data, correcting items, and adding missing items. in september 2016, hug et al. demonstrated the utility of the new microsoft academic api by conducting a comparative evaluation of normalized data from microsoft academic and scopus (hug, ochsner, and braendle 2016). they noted microsoft academic has “grown massively from 83 million publication records in 2015 to 140 million in 2016” (10). the microsoft academic api offers rich, structured metadata with the exception of document type. they found all attributes containing text were normalized and that identifiers were available for all entities, including references, supporting bibliometricians’ needs for data retrieval, handling, and processing. in addition to the lack of document type, the researchers also found the “fields of study” to be too granular and dynamic, and their hierarchies incoherent. they also desired the ability to use the doi to build api requests. nevertheless, the advantages of microsoft academic’s metadata and api retrieval suggested to hug et al. that microsoft academic was superior to google scholar for calculating research impact indicators and bibliometrics in general. in october 2016, harzing and alakangas compared publication and citation coverage of the new microsoft academic with google scholar, scopus, and web of science using a sample of 145 academics at the university of melbourne (harzing and alakangas 2016a) including observations from 20-40 faculty each in the humanities, social sciences, engineering, sciences, and life sciences. 
in october 2016, harzing and alakangas compared publication and citation coverage of the new microsoft academic with google scholar, scopus, and web of science, using a sample of 145 academics at the university of melbourne (harzing and alakangas 2016a) that included 20-40 faculty each from the humanities, social sciences, engineering, sciences, and life sciences. they discovered microsoft academic had improved substantially since their previous study (harzing 2016b): the comparison sample grew 9.6%, compared with growth of 1.4%, 2%, and 1.7% in google scholar, scopus, and web of science, respectively (n.p.). the researchers noted a few problems with data quality, "although the microsoft academic team have indicated they are working on a resolution" (n.p.). on average, the researchers found that microsoft academic found 59% as many citations as google scholar, 97% as many citations as scopus, and 108% as many citations as web of science. google scholar had the top counts for each disciplinary area, followed by scopus, except in the social sciences and humanities, where microsoft academic ranked second. the researchers explained that microsoft academic "only includes citation records if it can validate both citing and cited papers as credible," as established through a machine-learning-based system, and discussed an emerging metric of "estimated citation count" also provided by microsoft academic. the researchers concluded that microsoft academic promises to be "an excellent alternative for citation analysis" and suggested microsoft should work to improve coverage of books and grey literature.

google scholar

google scholar was released in beta form in november 2004 and was expanded to include judicial case law in 2009. while google scholar has received much attention in academia, it seems to be regarded by google as a niche product: in 2011 google removed scholar from the list of top services and the list of "more" services, relegating it to the "even more" list. in 2014, the scholar team consisted of just nine people (levy 2014). describing google scholar in an introductory manner is not helped by google's vague documentation, which simply says it "includes scholarly articles from a wide variety of sources in all fields of research, all languages, all countries, and over all time periods." [5] the "wide variety of sources" includes "journal papers, conference papers, technical reports, or their drafts, dissertations, pre-prints, post-prints, or abstracts," as well as court opinions and patents, but not "news or magazine articles, book reviews, and editorials." books and dissertations uploaded to google book search are "automatically" included in scholar. google says abstracts are key, noting "sites that show login pages, error pages, or bare bibliographic data without abstracts will not be considered for inclusion and may be removed from google scholar."

[5] https://scholar.google.com/intl/en/scholar/inclusion.html

studies of google scholar can be divided into three major categories of focus: investigating the coverage of google scholar; the use and utility of google scholar as part of the research process; and google scholar's utility for bibliographic measurement, including evaluating the productivity of individual researchers and the impact of journals. there is some overlap across these categories, because studies of google scholar seem to involve three questions: 1) what is being searched? 2) how does the search function? and 3) to what extent can the user usefully accomplish her task?
the coverage of google scholar

scholars want to know what "scholarship" is covered by google scholar, but the documentation merely states that it indexes "papers, not journals" [6] and challenges researchers to investigate google scholar's coverage empirically, despite its notoriously challenging technical limitations. while some limitations of google scholar have been corrected over the years, longstanding logistical hurdles involved with studying google scholar's coverage have been well documented for over a decade (shultz 2007; bonato 2016; haddaway et al. 2015; levay et al. 2016), and include:

• search queries are limited to 256 characters
• not being able to retrieve more than 1,000 results
• not being able to display more than 20 results per page
• not being able to download batches of results (e.g., to load into citation management software)
• duplicate citations (beyond the multiple article "versions"), requiring manual screening
• retrieving different results with advanced and basic searches
• no designation of the format of items (e.g., conference papers)
• minimal sort options for results
• basic boolean operators only [7]
• illogical interpretation of boolean operators: "esophagus or oesophagus" and "oesophagus or esophagus" return different numbers of results (boeker, vach, and motschall 2013)
• non-disclosure of the algorithm by which search results are sorted.

[6] https://www.google.com/intl/en/scholar/help.html#coverage
[7] e.g., no nesting of logical subexpressions deeper than one level (boeker, vach, and motschall 2013) and no truncation operators.

additionally, one study reported experiencing an automated block to the researcher's ip address after the export of approximately 180 citations or 180 individual searches (haddaway et al. 2015, 14). furthermore, the research excellence framework was unable to use google scholar to assess the quality of research in uk higher education institutions because of researchers' inability to agree with google on a "suitable process for bulk access to their citation information, due to arrangements that google scholar have in place with publishers" (research excellence framework 2013, 1562). such barriers can limit what can be studied and also cost researchers significant time in terms of downloading (prins et al. 2016) and cleaning citations (levay et al. 2016). despite these hurdles, research activity analyzing the coverage of google scholar has continued in the past two years, often building on previous studies. this section will first discuss google scholar's size and ranking, followed by its coverage of articles and citations, then its coverage of books, grey literature, and open access and institutional repositories.

google scholar size and ranking

in a 2014 study, khabsa and giles estimated there were at least 114 million english-language scholarly documents on the web, of which google scholar had "nearly 100 million." another study, by orduna-malea, ayllón, martín-martín, and lópez-cózar (2015), estimated that the total number of documents indexed by google scholar, without any language restriction, was between 160 and 165 million.
by comparison, in 2016 the author's discovery tool contained about 168 million items in academic journals, conference materials, dissertations, and reviews. [8] google scholar's presence in the information marketplace has influenced vendors to increase the discoverability of their content, including pushing for the display of abstracts and/or the first page of articles (levy 2014). proquest and gale indexes were added to google scholar in 2015 (quint 2016). martín-martín et al. (2016b) noted that google scholar's agreements with big publishers come at a price: "the impossibility of offering an api," which would support bibliometricians' research (54).

[8] the discovery tool does not contain all available metadata but has been carefully vetted (fagan and gaines 2016).

google scholar's results ranking "aims to rank documents the way researchers do, weighing the full text of each document, where it was published, who it was written by, as well as how often and how recently it has been cited in other scholarly literature." [9] martín-martín and his colleagues (2017, 159) conducted a large, longitudinal study of null-query results in google scholar and found a strong correlation between result-list ranking and times cited. the influence of citations is so strong that when the researchers performed the same search process four months later, 14.7% of documents were missing in the second sample, causing them to conclude that even a change of one or two citations could lead to a document being excluded from or included in the top 1,000 results (157). using citation counts as a major part of the ranking algorithm has been hypothesized to produce the "matthew effect," where "work that is already influential becomes even more widely known by virtue of being the first hit from a google scholar search, whereas possibly meritorious but obscure academic work is buried at the bottom" (antell et al. 2013, 281). google scholar has been shown to heavily bias its ranking toward english-language publications even when there are highly cited non-english publications in the result set, although selection of interface language may influence the ranking. martín-martín and his colleagues noted that google scholar seems to use the domain of the document's hosting web site as a proxy for language, meaning that "some documents written in english but with their primary version hosted in non-anglophone countries' web domains do appear in lower positions in spite of receiving a large number of citations" (martín-martín et al. 2017, 161). this effect is shown dramatically in figure 3 of their paper.

[9] https://www.google.com/intl/en/scholar/about.html

google scholar coverage: articles and citations

the coverage of articles, journals, and citations by google scholar has commonly been examined by using brute-force methods to retrieve a sample of items from google scholar and possibly one or more of its competitors (studies discussed in this section are listed in table 1). the goal is usually to determine how well google scholar's database compares to traditional research databases, usually in a specific field. core methodology involves importing citations into software such as publish or perish (harzing 2016a), cleaning the data, then performing statistical tests, expert review, or both. haddaway (2015) and moed et al. (2016) have written articles specifically discussing methodological aspects.
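the cleaning step matters because exported citations differ in capitalization, punctuation, and completeness across sources (issues discussed further in the "messy metadata" section below). the sketch below is a minimal illustration of that workflow: it normalizes titles exported from two tools, deduplicates them, and reports overlap. the file names, field names, and normalization rules are assumptions for illustration, not the procedure of any study reviewed here.

```python
# illustrative sketch of the cleaning/comparison step; file names, field names,
# and normalization rules are assumptions, not any reviewed study's procedure.
import csv
import re

def normalize_title(title):
    """collapse case, punctuation, and whitespace so near-duplicate titles match."""
    title = title.lower()
    title = re.sub(r"[^a-z0-9 ]+", " ", title)   # keep only ascii letters, digits, spaces
    return re.sub(r"\s+", " ", title).strip()

def load_titles(path, title_field="title"):
    """read exported citations (e.g., a publish or perish csv) and dedupe by title."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {normalize_title(row[title_field]) for row in csv.DictReader(handle)}

if __name__ == "__main__":
    scholar = load_titles("google_scholar_export.csv")   # hypothetical export files
    wos = load_titles("web_of_science_export.csv")
    overlap = scholar & wos
    print(f"google scholar only: {len(scholar - wos)}")
    print(f"web of science only: {len(wos - scholar)}")
    print(f"overlap: {len(overlap)} ({len(overlap) / len(scholar | wos):.0%} of the union)")
```

real studies typically follow this kind of automated matching with manual or expert review of unmatched records, since title normalization alone cannot resolve every duplicate.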
recent studies repeatedly find that google scholar's coverage meets or exceeds that of other search tools, regardless of whether the target samples comprise journals, articles, or citations (karlsson 2014; harzing 2014; harzing 2016b; harzing and alakangas 2016b; moed, bar-ilan, and halevi 2016; prins et al. 2016; wildgaard 2015; ciccone and vickery 2015). in only three studies did google scholar find fewer items, and the meaningful difference was minimal. [10] science disciplines were the most studied in google scholar, including agriculture, astronomy, chemistry, computer science, ecology, environmental science, fisheries, geosciences, mathematics, medicine, molecular biology, oceanography, physics, and public health. social sciences studied include education (prins et al. 2016), economics (harzing 2014), geography (ştirbu et al. 2015, 322-329), information science (winter, zadpoor, and dodou 2014; harzing 2016b), and psychology (pitol and de groote 2014). studies related to the arts or humanities in 2014-2016 included an analysis of open access journals in music (testa 2016) and a comparison between google scholar and web of science for research evaluation within education, pedagogical sciences, and anthropology [11] (prins et al. 2016). wildgaard (2015) and bornmann et al. (2016) included samples of humanities scholars as part of bibliometric studies but did not discuss disciplinary aspects related to coverage. prior to 2014, the only study found related to the arts and humanities compared google scholar with historical abstracts (kirkwood jr. and kirkwood 2011).

[10] for example, bramer, giustini, and kramer (2016a) found slightly more of their 4,795 references from systematic reviews in embase (97.5%) than in google scholar (97.2%). in testa (2016), the music database rilm indexed two more of the 84 oa journals than google scholar (which indexed at least one article from 93% of the journals). finally, in a study using citations to the most-cited article of all time as a sample, web of science found more citations than did google scholar (winter, zadpoor, and dodou 2014).
[11] prins et al. classified anthropology as part of the humanities.

google scholar's coverage has been growing over time (meier and conkling 2008; harzing 2014; winter, zadpoor, and dodou 2014; bartol and mackiewicz-talarczyk 2015, 531; orduña-malea and delgado lópez-cózar 2014), with recent increases in older articles (winter, zadpoor, and dodou 2014; harzing and alakangas 2016b), leading some to question whether this supports the documented trend of increased citation of older literature (martín-martín et al. 2016c; varshney 2012). winter et al. noted that in 2005 web of science yielded more citations than google scholar for about two-thirds of their sample, but for the same sample in 2013, google scholar found more citations than web of science, with only 6.8% of citations not retrieved by google scholar (winter, zadpoor, and dodou 2014, 1560). the unique citations of web of science were "typically documents before the digital age and conference proceedings not available online" (winter, zadpoor, and dodou 2014, 1560). harzing and alakangas's (2016b) large-scale longitudinal comparison of google scholar, scopus, and web of science suggested that google scholar's retroactive expansion has stabilized and that all three databases are now growing at similar rates.

google scholar also seems to cover both the oldest and the most recent publications.
unlike traditional abstracts and indexes, google scholar is not limited by starting year, so as publishers post tables of contents of their earliest journals online, google scholar discovers those sources (antell et al. 2013, 281). trapp (2016) reported the number of citations to a highly cited physics paper after the first 11 days of publication to be 67 in web of science, 72 in scopus, and 462 in google scholar (trapp 2016, 4). in a study of 800 citations to nobelists in multiple fields, harzing found that "google scholar could effectively be 9-12 months ahead of web of science in terms of publication and citation coverage" (2013, 1073).

an increasing proportion of journal articles in google scholar are freely available in full text. a large-scale, longitudinal study of highly cited articles from 1950-2013 found 40% of article citations in the sample were freely available in full text (martín-martín et al. 2014). another large-sample study found 61% of articles in their sample from 2004-2014 could be freely accessed (jamali and nabavi 2015). in both studies, nih.gov and researchgate were the top two full-text providers.

google scholar's coverage of major publisher content varies; having some coverage of a publisher does not imply all articles or journals from that publisher are covered. in a sample of 222 citations compared across google scholar, scopus, and web of science, google scholar contained all of the springer titles, as many elsevier titles as scopus, and the most articles by wolters kluwer and john wiley. however, among the three databases, google scholar contained the fewest articles by bmj and nature (rothfus et al. 2016).

table 1. studies investigating google scholar's coverage of journal articles and citations, 2014-2016.

study: (bartol and mackiewicz-talarczyk 2015)
sample: documents retrieved in response to searches on crops and fibers in article titles, 1994-2013 (samples varied by crop)
results: google scholar returned more documents for each crop; for example, "hemp" retrieved 644 results in google scholar, 493 in scopus, and 318 in web of science. google scholar demonstrated higher yearly growth of records over time.

study: (bramer, giustini, and kramer 2016b)
sample: references from a pool of systematic reviewer searches in medicine (n=4,795)
results: google scholar found 97.2%, embase 97.5%, and medline 92.3% of all references; when using search strategies, embase retrieved 81.6%, medline 72.6%, and google scholar 72.8%.

study: (ciccone and vickery 2015)
sample: based on 183 user searches randomly selected from ncsu libraries' 2013 summon search logs (n=137)
results: no significant difference between the performance of google scholar, summon, and eds for known-item searches; "google scholar outperformed both discovery services for topical searches."

study: (harzing 2014)
sample: publications and citation metrics for 20 nobelists in chemistry, economics, medicine, and physics, 2012-2013 (samples varied)
results: google scholar coverage is now "increasing at a stable rate" and provides "comprehensive coverage across a wide set of disciplines for articles published in the last four decades" (575).

study: (harzing 2016b)
sample: citations from one researcher (n=126)
results: microsoft academic found all books and journal articles covered by google scholar; google scholar found 35 additional publications, including book chapters, white papers, and conference papers.
study: (harzing and alakangas 2016a)
sample: samples from (harzing and alakangas 2016b, 802) (samples varied by faculty)
results: google scholar provided higher "true" citation counts than microsoft academic, but microsoft academic "estimated" citation counts were 12% higher than google scholar for the life sciences and equivalent for the sciences.

study: (harzing and alakangas 2016b)
sample: citations of the works of 145 faculty among 37 scholarly disciplines at the university of melbourne (samples varied by faculty)
results: for the top faculty member, google scholar had 519 total papers (compared with 309 in both web of science and scopus); google scholar had 16,507 citations (compared with 11,287 in web of science and 11,740 in scopus).

study: (hilbert et al. 2015)
sample: documents published by 76 information scientists in german-speaking countries (n=1,017)
results: google scholar covered 63%; scopus, 31%; bibsonomy, 24%; mendeley, 19%; web of science, 15%; citeulike, 8%.

study: (jamali and nabavi 2015)
sample: items published between 2004 and 2014 (n=8,310)
results: 61% of articles were freely available; of these, 81% were publisher versions and 14% were pre-prints. researchgate was the top full-text source, netting 10.5% of full-text sources, followed by ncbi.nlm.nih.gov (6.5%).

study: (karlsson 2014)
sample: journals from ten different fields (n=30)
results: google scholar retrieved documents from all the selected journals; summon only retrieved documents from 14 out of 30 journals.

study: (lee et al. 2015)
sample: journal articles housed in florida state university's institutional repository (n=170)
results: metadata found in google for 46% of items and in google scholar for 75% of items; google scholar found 78% of available full text. google scholar found full text for six items with no full text in the ir.

study: (martín-martín et al. 2014)
sample: items highly cited by google scholar (n=64,000)
results: 40% could be freely accessed using google scholar; nih.gov and researchgate were the top two full-text providers.

study: (moed, bar-ilan, and halevi 2016)
sample: citations to 36 highly cited articles in 12 scientific-scholarly english-language journals (n=about 7,000)
results: 47% of sources were in both google scholar and scopus; 47% of sources were in google scholar only; 6% of sources were in scopus only. of the unique google scholar citations, sources were most often from google books, springer, ssrn, researchgate, acm digital library, arxiv, and aclweb.org.

study: (prins et al. 2016)
sample: article citations in the field of education and pedagogies, and citations to 328 articles in anthropology (n=774)
results: google scholar found 22,887 citations in education & pedagogical science compared to web of science's 8,870, and 8,092 in anthropology compared with web of science's 1,097.

study: (ştirbu et al. 2015)
sample: compared the number of citations resulting from two geographical topic searches (samples varied)
results: google scholar found 2,732 geographical references, whereas web of science found only 275, georef 97, and francis 45. for sedimentation, google scholar found 1,855 geographical references compared to web of science's 606, georef's 1,265, and francis's 33. google scholar overlapped web of science by 67% and 82% for the two searches, and georef by 57% and 62%.

study: (testa 2016)
sample: open access journals in music (n=84)
results: google scholar indexed at least one article from 93% of oa journals. rilm indexed two additional journals.
study: (wildgaard 2015)
sample: publications from researchers in astronomy, environmental science, philosophy, and public health (n=512)
results: publication count from web of science was 2-4 times lower for all disciplines than google scholar; citation count was up to 13 times lower in web of science than in google scholar.

study: (winter, zadpoor, and dodou 2014)
sample: growth of citations to two classic articles (1995-2013) and 56 science and social science articles in google scholar, 2005-2013 (samples varied)
results: total citation counts were 21% higher in web of science than google scholar for lowry (1951), but google scholar was 17% higher than web of science for garfield (1955) and 102% higher for the 56 research articles; google scholar showed a significant retroactive expansion for all articles compared to negligible retroactive growth in web of science.

google scholar coverage: books

many studies mentioned that books, including google books, are sometimes included in google scholar results. jamali and nabavi (2015) found 13% of their sample of 8,310 citations from google scholar were books, while martín-martín et al. (2014) had found that 18% of their sample of 64,000 citations from google scholar were books. within the field of anthropology, prins (2016) found books to generate the most citation impact in google scholar (41% of books in their sample were cited in google scholar) compared to articles (21% of articles were cited in google scholar). in education, 31% of articles and 25% of books were cited by google scholar (3). abrizah and thelwall found only 37% of their sample of 1,357 arts, humanities, and social sciences books from the five main university presses in malaysia had been cited in google scholar (23% of the books had been cited in google books) (abrizah and thelwall 2014, 2502). the overlap was small: 15% had impact in both google scholar and google books. the authors concluded that, due to the low overlap between google scholar and google books, searching both engines is required to find the most citations to academic books. english books were significantly more likely to be cited in google scholar (48% vs. 32%), as were edited books (53% vs. 36%). they surmised edited books' citation advantage was due to the use of book chapters in the social sciences. they found arts and humanities books more likely to be cited in google scholar than social sciences books (40% vs. 34%) (abrizah and thelwall 2014, 2503).

google scholar coverage: grey literature

grey literature refers to documents not published commercially, including theses, reports, conference papers, government information, and poster sessions. haddaway et al. (2015) was the only empirical study found that focused on grey literature. they discovered that between 8% and 39% of full-text search results from google scholar were grey literature, with the greatest concentration of citations from grey literature on page 80 of results for full-text searches and page 35 for title searches. they concluded "the high proportion of grey literature that is missed by google scholar means it is not a viable alternative to hand searching for grey literature as a standalone tool" (2015, 14). for one of the systematic reviews in their sample, none of the 84 grey literature articles cited were found within the exported google scholar search results.
the only other investigation of grey literature found was bonato (2016), who, after conducting a very limited number of searches on one specific topic and a search for a known item, concluded google scholar to be "deficient." in conclusion, despite much offhand praise for google scholar's grey literature coverage (erb and sica 2015; antell et al. 2013), the topic has been little studied, and when it has, grey literature results have not been prominent.

google scholar coverage: open access and institutional repository content

erb and sica touted google scholar's access to "free content that might not be available through a library's subscription services," including open access journals and institutional repository coverage (2015, 48). recent research has dug deeper into both these content areas.

in general, oa articles have been shown to net more citations than non-oa articles, as koler-povh, južnic, and turk (2014) showed within the field of civil engineering. across their sample of 2,026 scholarly articles in 14 journals, all indexed in web of science, scopus, and google scholar, oa articles received an average of 43 citations while non-oa articles were cited 29 times (1039). google scholar did a better job discovering those citations; in google scholar the median number of citations of oa articles was always higher than that for non-oa articles, whereas this was true in web of science for only 10 of the 14 journals and in scopus for 11 of the 14 journals (1040). similarly, chen (2014) found google scholar to index far more oa journals than scopus and web of science, especially "gold oa." [12] google scholar's advantage should not be assumed across all disciplines, however; testa (2016) found both google scholar and rilm to provide good coverage of oa journals in music, with google scholar indexing at least one article from 93% of the 84 oa journals in the sample. but the bibliographic database rilm indexed two more oa journals than google scholar.

[12] oa articles on publisher web sites, whether the journal itself is oa or not (chen 2014).

google scholar indexing of repositories may be critical for success, but results vary by ir platform and by whether the ir metadata has been structured according to google's guidelines. in a random sample from shodhganga, india's central etd database, weideman (2015) found not one article had been indexed in full text by google scholar, although in many cases the metadata was indexed, leading the author to identify needed changes to the way shodhganga stores etds. [13] likewise, chen (2014) found that neither google scholar nor google appears to index baidu wenku, a major full-text archive and social networking site in china similar to researchgate, and orduña-malea and lópez-cózar (2015) found that latin american repositories are not very visible in google or google scholar due to limitations of the description schemas chosen as well as search engine reliability. in yang's (2016) study of texas tech's dspace ir, google was the only search engine that indexed, discovered, or linked to pdf files supplemented with metadata; google scholar did not discover or provide links to the ir's pdf files and was less successful at discovering metadata.

[13] most notably, the need to store thesis documents as one pdf file instead of dividing them into multiple, separate files, to create html landing pages as per google's recommendations, and to submit the addresses of these pages to google scholar.
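the html landing pages mentioned in note 13 refer to google scholar's inclusion guidelines (the document cited in note 5), which ask repositories to expose bibliographic metadata as meta tags on each item's page. the sketch below generates such a page head; the highwire-style tag names follow those guidelines as published at the time of writing, but the helper itself is illustrative and is not code from any repository platform discussed above.

```python
# illustrative helper: emits the kind of landing-page metadata google scholar's
# inclusion guidelines ask repositories to provide; tag names follow those
# guidelines at the time of writing, but this is a sketch, not platform code.
from html import escape

def scholar_meta_tags(title, authors, pub_date, pdf_url):
    """build highwire-style meta tags for one etd or article landing page."""
    tags = [("citation_title", title)]
    tags += [("citation_author", author) for author in authors]  # one tag per author
    tags += [("citation_publication_date", pub_date),  # yyyy/mm/dd per the guidelines
             ("citation_pdf_url", pdf_url)]            # single full-text pdf, not split files
    return "\n".join(
        f'<meta name="{name}" content="{escape(value)}">' for name, value in tags
    )

if __name__ == "__main__":
    print(scholar_meta_tags(
        title="an example thesis title",
        authors=["lastname, firstname"],
        pub_date="2016/05/01",
        pdf_url="https://repository.example.edu/etd/1234.pdf",  # hypothetical url
    ))
```

pointing citation_pdf_url at one consolidated pdf reflects the change weideman recommends for shodhganga in note 13: full text split across multiple files is less likely to be indexed.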
when google scholar is able to index ir content, it may be responsible for significant traffic. in a study of four major u.s. universities' institutional repositories (three dspace, one contentdm) involving a dataset of 57,087 unique urls and 413,786 records, researchers found that 48%-66% of referrals came from google scholar (obrien et al. 2016, 870). the importance of google scholar in contrast to google was noted by lee et al. (2015), who conducted title searches on 170 journal articles housed in florida state university's institutional repository (using bepress's digital commons platform), 100 of which existed in full text in the ir. links to the ir were found in google results for 45.9% of the 170 items and in google scholar for 74.7% of the 170 items. furthermore, google scholar linked to the full text for 78% of the 100 cases where full text was available, and it even provided links to freely available full text for six items that did not have full text in the ir. however, the researchers also noted "relying on either google or google scholar individually cannot ensure full access to scholarly works housed in oa irs." in their study, among the 104 fully open access items there was an overlap in results of only 57.5%; google provided links to 20 items not found with google scholar, and google scholar provided links to 25 items not found with google (lee et al. 2015, 15).

google scholar results note the number of "versions" available for each item. in a study of 982 science article citations (including both oa and non-oa) in irs, pitol and de groote found 56% of citations had between four and nine google scholar versions (2014, 603). almost 90% of the citations shown were the publisher version, but of these, only 14.3% were freely available in full text on the publisher web site. meanwhile, 70% of the items had at least one free full-text version available through a "hidden" google scholar version. the author's experience in retrieving full text for this review indicates this issue still exists, but research would be needed to formulate reliable recommendations for users.

use and utility of google scholar as part of the research process

studies were found concerning google scholar's popularity with users and their reasons for preferring it (or not) over other tools. another group of studies examined issues related to the utility of google scholar for research processes, including issues related to messy metadata. finally, a cluster of articles focused specifically on using google scholar for systematic reviews.

popularity and user preferences

several studies have shown google scholar to be well known to scholarly communities. a survey of 3,500 scholars from 95 countries found that over 60% of scientists and engineers and over 70% of respondents in the social sciences, arts, and humanities were aware of google scholar and used it regularly (van noorden 2014). in a large-scale journal-reader survey, inger and gardner (2016) found that among academic researchers in high-income areas, academic search engines surpassed abstracts and indexes as a starting place for research (2016, 85, figure 4). in low-income areas, google use exceeded google scholar use for academic research.
major library link resolver software offers reports of full-text requests broken down by referrer. inger and gardner (2016) showed a large variance across subjects in whether people prefer google or google scholar: "people in the social sciences, education, law, and business use google scholar more to find journal articles. however, people working in the humanities and religion and theology prefer to use google" (88). humanities scholars' use of google over google scholar was also found by kemman et al. (2013); google, google images, google scholar, and youtube were used more than jstor or other library databases, even though humanities scholars' trust in google and google scholar was lower.

user research since 2014 concerning google scholar has focused on graduate students. results suggest scholar is used regularly but is only partially sufficient. in their study of 20 engineering master's students' use of abstracts and indexes, johnson and simonsen (2015) found that half their sample (n=20) had used google scholar the last time they located an article using specific search terms or criteria. google was the second most-used source at 20%, followed by abstracting and indexing services (15%). graduate students describe google scholar with nuance and refer to it as a specific part of their process. in bøyum and aabø's (2015) interviews with eight phd business students and wu and chen's (2014, 381) interviews with 32 graduate students drawn from multiple academic disciplines, the majority described using library databases and google scholar for different purposes depending on the context. graduate students in both studies were well aware of google scholar's use for citation searching. bøyum and aabø's (2015) subjects described library resources as more "academically robust" than google or google scholar. wu and chen's (2014) interviewees praised google scholar for its wider coverage and convenience, but lamented its uncertain quality, sometimes inaccessible full text, excessive results, lack of sort options (by document type or date), retrieval of documents from other disciplines, and duplicate citations. google scholar was seen by their subjects as useful during early stages of information seeking. in contrast to general assumptions, more than half the students interviewed (wu and chen 2014, 381) reported browsing more than three pages' worth of google scholar results. about half of interviewees reported looking at cited documents to find more; however, students had mixed opinions about whether the citing documents turned out to be relevant.

google scholar's "my library" feature, introduced in 2013, now competes with other bibliographic citation management software. in a survey of 344 (mostly graduate) students, conrad, leonard, and somerville found google scholar was the most used (47%), followed by endnote (37%) and zotero (19%) (2015, 572).
follow-up interviews with 13 of the students revealed that a few students used multiple tools; for example, one participant used endnote for sharing data with lab partners and others "across the community"; mendeley for her own thesis work, where she needs to "build a whole body of literature"; and google scholar citations for "quick reference lists that i may not need for a second or third time."

messy metadata

many studies have suggested google scholar's metadata is "messy." although none in the period of study examined this phenomenon in conjunction with relative user performance, the issues found could affect scholarship. a 2016 study itemized the most common mistakes in google scholar resulting from its extraction process: 1) incorrect title identification; 2) missing or incorrectly assigned authors; 3) book reviews indexed as books; 4) failing to group versions of the same document, which inflates citation counts; 5) grouping different editions of books, which deflates citation counts; 6) attributing citations to documents that did not cite them, or missing citations that did; and 7) duplicate author profiles (martín-martín et al. 2016b). the authors concluded that "in an academic big data environment, these errors (which we deem affect less than 10% of the records in the database) are of no great consequence, and do not affect the core system performance significantly" (54). two of these issues have been studied specifically: duplicate citations and missing publication dates.

the rate of duplicate citations in google scholar has been reported at up to 2.93% (haddaway et al. 2015) and 5% (winter, zadpoor, and dodou 2014, 1562), which can be compared to a 0.05% duplicate citation rate in web of science (haddaway et al. 2015, 13). haddaway found the main reasons for duplication include "typographical errors, including punctuation and formatting differences; capitalization differences (google scholar only), incomplete titles, and the fact that google scholar scans citations within reference lists and may include those as well as the citing article" (2015, 13). the issue of missing publication dates varies greatly across samples. dates were found to be missing 9% of the time in winter et al.'s study, although this varied by publication type: 4% of journals, 15% of theses, and 41% of the unknown document types (winter, zadpoor, and dodou 2014, 1562). however, martín-martín et al. studied a sample of 32,680 highly cited documents and found that web of science and google scholar agreed on publication dates 96.7% of the time, with an idiosyncratically large proportion of those mismatches in 2012 and 2013 (2017, 159).

utility for research processes

prior to 2014, studies such as asher, duke, and wilson's (2012) evaluated google scholar's utility as a general research tool, often in comparison with discovery tools. since 2014, the only such study found was namei and young's comparison of summon, google scholar, and google using 299 known-item queries. they found google scholar and summon returned relevant results 74% of the time; google returned relevant results 91% of the time. for "scholarly formats," they found summon returned relevant results 76% of the time; google scholar, 79%; and google, 91% (2015, 526-527). the remainder of studies in this category focused specifically on systematic reviews, perhaps because such reviews are so time-consuming.
authors develop search strategies carefully, execute them in multiple databases, and document their search methods and results. some prestigious journals are beginning to require similar rigor for any original research article, not just systematic reviews (cals and kotz 2016). information provided by professional organizations about the use of google scholar for systematic reviews seems inconsistent: the cochrane handbook for systematic reviews of interventions lists google scholar among sources for searching, but none of the five "highlighted reviews" on the cochrane web site at the time of this article's writing used google scholar in their methodologies. the manual of the uk's national institute for health and care excellence (nice) mentions google scholar only in an appendix of search sources under "conference abstracts."

a study by gehanno et al. (2013) found google scholar contained 100% of the references from 29 systematic reviews and suggested google scholar could be the first choice for systematic reviews or meta-analyses. this finding prompted a slew of follow-up studies in the next three years. an immediate response by giustini and boulos (2013) pointed out that systematic reviews are not performed by searching for article titles, as with gehanno et al.'s method, but through search strategies. when they tried to replicate a systematic review's topical search strategy in google scholar, the citations were not easily discovered. in addition, the authors were not able to find all the papers from a given systematic review even by title searching. haddaway et al. also found imperfect coverage: for one of the seven reviews examined, 31.5% of citations could not be found (2015, 11). haddaway also noted that special characters and fonts (as with chemical symbols) can cause poor matching when such characters are part of article titles.

recent literature concurs that it is still necessary to search multiple databases when conducting a systematic review, including abstracts and indexes, no matter how good google scholar's coverage seems to be. no one database's coverage is complete, including google scholar's (thielen et al. 2016), and the practical recall of google scholar is exceptionally low due to the 1,000-result limit; yet at the same time, google scholar's lack of precision is costly in terms of researchers' time (bramer, giustini, and kramer 2016b; haddaway et al. 2015). the challenges limiting study of google scholar's coverage also bedevil those wishing to use it for reviews, especially the 1,000-result retrieval limit, lack of batch export, and lack of exported abstracts (levay et al. 2016). additionally, google scholar's changing content, unknown algorithm and updating practices, search inconsistencies, limited boolean functions, and 256-character query limit prevent the tool from accommodating the detailed, reproducible search methodologies required by systematic reviews (bonato 2016; haddaway et al. 2015; giustini and boulos 2013). bonato noted google scholar retrieved different results with advanced and basic searches, could not determine the format of items (e.g., conference papers), and returned other inconsistent results. [14] bonato also lamented the lack of any kind of document type limit.

[14] bonato (2016) found zero hits for conference papers when limiting by year to 2015-2016, but found two papers presented at a 2015 meeting.
despite the limitations and logistical challenges, practitioners and scholars are finding solid reasons for including academic web search engines as part of most systematic review methodologies (cals and kotz 2016). stansfield et al. noted that "relevant literature for low- and middle-income countries, such as working and policy papers, is often not included in databases," and that google scholar finds additional journal articles and grey literature not indexed in databases (2016, 191). for eight systematic reviews by the eppi-center, "over a quarter of relevant citations were found from websites and internet search engines" (stansfield, dickson, and bangpan 2016, 2).

specific tools and practices have been recommended when using search engines within the context of systematic reviews. software is available to record search strategies and results (harzing and alakangas 2016b; haddaway 2015). haddaway suggests the use of snapshot tools (haddaway 2015) to record the first 1,000 google scholar records rather than the typical assessment of the first 50 search results as had been done in the past: "this change in practice could significantly improve both the transparency and coverage of systematic reviews, especially with respect to their grey literature components" (haddaway et al. 2015, 15). both haddaway (2015) and cochrane recommend that review authors print or save locally electronic copies of the full text or relevant details rather than bookmarking web sites, "in case the record of the trial is removed or altered at a later stage" (higgins and green 2011). new methods for searching, downloading, and integrating academic search engine results into review procedures using free software to increase transparency, repeatability, and efficiency have been proposed by haddaway and his colleagues (2015).

google scholar citations and metrics

google scholar citations and google scholar metrics are not academic search engines, but this article includes them because these products are interwoven into the fabric of the google scholar database. google scholar citations, launched in late 2011 (martín-martín et al. 2016b, 12), groups citations by author, while google scholar metrics (launch date uncertain) provides similar data for articles and journals. readers interested in an in-depth literature review of google scholar citations for earlier years (2005-2012) are directed to thelwall and kousha (2015b). in his comprehensive review of more recent literature about using google scholar citations for citation analysis, waltman (2016) described several themes. google scholar's coverage of many fields is significantly broader than that of web of science and scopus, and this seems to be continuing to improve over time. however, studies regularly report google scholar's inaccuracies, content gaps, phantom data, easily manipulable citation counts, lack of transparency, and limitations for empirical bibliometric studies. as discussed in the coverage section, google scholar's citation database is competitive with other major databases such as web of science and has been growing dramatically in the last few years (winter, zadpoor, and dodou 2014; harzing and alakangas 2016b; harzing 2014) but has recently stabilized (harzing and alakangas 2016b).
more and more studies are concluding that google scholar will report more comprehensive information about citation impact than web of science or scopus. across a sample of articles from many years of one science journal, trapp (2016) found the proportion of articles with zero citations was 37% for web of science, 29% for scopus, and 19% for google scholar. some of google scholar's superiority for citation analysis in the social sciences and humanities is due to its inclusion of book content, software, and additional journals (prins et al. 2016; bornmann et al. 2016). bornmann et al. (2016) noted citations to all ten of a research institute's books published in 2009 were found in google scholar, whereas web of science found citations for only two books. furthermore, they found data in google scholar for 55 of the institute's 71 book chapters. the four conference proceedings they could identify had 100 citations, of which 65 could be found in google scholar. the comparative success of google scholar for citation impact varies by discipline, however: levay et al. (2016) found web of science to be more reliable than google scholar, quicker for downloading results, and better for retrieving 100% of the most important publications in public health.

despite google scholar's growth, using all three major tools (scopus, web of science, and google scholar) still seems to be necessary for evaluating researcher productivity. rothfus et al. (2016) compared web of science, scopus, and google scholar citation counts for evaluating the impact of the canadian network for observational drug effect studies (cnodes), as represented by a sample of 222 citations from five articles. attempting to determine citation metrics for the cnodes research team yielded different results for every article when using the three tools. they found that "using three tools (web of science, scopus, google scholar) to determine citation metrics as indicators of research performance and impact provided varying results, with poor overall agreement among the three" (237). major academic libraries' web sites often explain how to find one's h-index in all three (suiter and moulaison 2015).

researchers have also noted the disadvantages of google scholar for citation impact studies. google scholar is costly in terms of researcher time. levay et al. (2016) estimated the cost of "administering results" from web of science to be 4 hours versus 75 hours for google scholar. administering results includes using the search tool to search, download, and add records to bibliographic citation software, and removing duplicate citations. duplicate citations are often mentioned as a problem (prins et al. 2016), although moed (2016) suggested the double counting by google scholar would occur only if the level of analysis is on target sources, not if it is on target articles. [15] downloaded citation samples can still suffer from double counts, however: harzing and alakangas described how cleaning "a fairly extreme case" in their study reduced the number of papers from 244 to 106 (2016b). google scholar also does not identify self-citations, which can dramatically influence the meaning of results (prins et al. 2016).

[15] "if a document is, for instance, first published in arxiv, and a next version later in a journal j, citations to the two versions are aggregated. in google scholar metrics, in which arxiv is included as a source, this document (assuming that its citation count exceeds the h5 value of arxiv and journal j) is listed both under arxiv and under journal j, with the same, aggregate citation count" (moed 2016, 29).
furthermore, researchers have shown it is possible to corrupt google scholar citations by uploading obviously false documents (delgado lópez-cózar, robinson-garcía, and torres-salinas 2014). while the researchers noted traditional citation indexes can also be defrauded, google's products are less transparent, and abuses may not be easily detected. google did not respond to the research team when contacted and simply deleted the false documents to which it had been alerted without reporting the situation to the affected authors, and the researchers concluded: "this lack of transparency is the main obstacle when considering google scholar and its by-products for research evaluation purposes" (453).

because these disadvantages do not outweigh google scholar's seemingly broader coverage, many articles investigate workarounds for using google scholar more effectively when evaluating research impact. harzing and alakangas (2016b) recommend the hia index [16], which is corrected for career length and co-authorship patterns, as the citation metric of choice for a fair comparison of google scholar with other tools. bornmann et al. (2016) investigated a method to normalize data and reduce errors when using google scholar data to evaluate citations in the social sciences and humanities.

[16] harzing and alakangas (2016b) define the hia as hi norm / academic age. academic age refers to the number of years elapsed since first publication. to calculate hi norm, one divides the number of citations by the number of authors for each paper and then calculates the h-index of the normalized citation counts.
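as a worked illustration of the hia calculation summarized in note 16, the sketch below computes hi norm and hia from a list of (citation count, number of authors) pairs. the function names and sample data are illustrative only; they are not taken from harzing and alakangas's implementation.

```python
# illustrative sketch of the hia calculation summarized in note 16:
# normalize each paper's citations by its author count, take the h-index
# of the normalized counts (hi norm), then divide by academic age.
def h_index(citation_counts):
    """largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

def hia(papers, years_since_first_publication):
    """papers: list of (citations, n_authors) tuples; returns the hia value."""
    normalized = [citations / n_authors for citations, n_authors in papers]
    hi_norm = h_index(normalized)
    return hi_norm / years_since_first_publication

if __name__ == "__main__":
    sample = [(40, 2), (25, 5), (12, 1), (9, 3), (3, 4)]  # hypothetical publication record
    print(hia(sample, years_since_first_publication=10))  # hi norm of 3 over 10 years = 0.3
```

dividing by author count and by academic age is what makes the metric comparable across researchers with different co-authorship patterns and career lengths, which is why harzing and alakangas prefer it for cross-tool comparisons.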
researcher profiles can also be used to find other scholars by topic. in a 2014 survey of researchers (n=8,554), dagienė and krapavickaitė found that 22% used a third-party service such as google scholar or microsoft academic to produce lists of their scholarly activities, and 63% reported their scholarly record was freely available on the web (2016, 158, 161). google scholar ranked second only to microsoft word as the most frequently used software to maintain academic activity records (160). martín-martín et al. (2016b) examined 814 authors in the field of bibliometrics using google scholar citations, researcherid, researchgate, mendeley, and twitter. google scholar was the most used social research sharing platform, followed by researchgate, with researcherid gaining wider acceptance among authors deemed "core" to the field. only about one-third of the authors created a twitter profile, and many mendeley and researcherid profiles were found empty. the study found the distinctive advantages of google scholar academic profiles to be automatic updates and a high growth rate, and its disadvantages to be scarce quality control, metadata mistakes inherited from google scholar, and its manipulability. overall, martín-martín and colleagues concluded that google scholar "should be the preferred source for relational and comparative analyses in which the emphasis is put on author clusters" (57).

google scholar metrics provides citation information for articles and journals. in a sample of 1,000 journals, orduña-malea and delgado lópez-cózar found that "despite all the technical and methodological problems," google scholar metrics provides sound and reliable journal rankings (2014, 2365). google scholar metrics seems to be an annual publication; the 2016 edition contains 5,734 publications and 12 language rankings. russian, korean, polish, ukrainian, and indonesian were added this year, while italian and dutch were removed for unknown reasons (martín-martín et al. 2016a). researchers also found that many discussion papers and working papers were removed in 2016. english-language publications are broken into subject areas and disciplines. google scholar metrics often, but not always, creates separate entries for each language in which a journal is published. bibliometricians call for google scholar metrics to display the total number of documents published in the publications indexed and the total number of citations received: "these are the two essential parameters that make it possible to assess the reliability and accuracy of any bibliometric indicator" (13). other improvements suggested by lópez-cózar and colleagues include adding country and language of publication and self-citation rates.

informing practice

the glaring lack of research related to the coverage of arts and humanities scholarship, the limited research on book coverage, and the relaunch of microsoft academic make it impossible to form a general recommendation regarding the use of academic web search engines for serious research. until the ambiguity of arts and humanities coverage is clarified, and until academic web search engines are transparent and stable, traditional bibliographic databases still seem essential for systematic reviews, citation analysis, and other rigorous literature search purposes. discipline-specific databases also have features such as controlled vocabulary, industry classification codes, and peer review indicators that make scholars more efficient and effective. nevertheless, the increasing relevance of academic search engines and their solid coverage of the sciences and social sciences make it essential for librarians to become expert with google scholar, google books, and microsoft academic. for some scholarly tasks, academic search engines may be superior: for example, when looking up doi numbers for this paper's bibliography, the most efficient process seemed to be a google search on the article title plus the term "doi," and the most likely site to display in the results was researchgate. [17] librarians and scholars should champion these tools as an important part of an efficient, effective scholarly research process (walsh 2015), while also acknowledging their gaps in coverage, biases, metadata issues, and missing features that are available in other databases. academic web search engines could form the centerpiece for instruction sessions surrounding the scholarly network, as shown by "cited by" features, author profiles, and full-text sources. traditional abstracts and indexes could then be presented on the basis of their strengths. at some point, explaining how to access full text will likely no longer focus on the link resolver but on the many possible document versions a user might encounter (e.g., pre-prints or editions of books) and how to make an informed choice.

[17] because the authority of researchgate is ambiguous, in such cases i then looked up the doi using google to find the publisher's version. in some cases, the doi was not displayed on the publisher's result page (e.g., https://muse.jhu.edu/article/197091).
in the meantime, even though web search engines and repositories may retrieve copious full text outside library subscriptions, college students should still be made aware of the library's collections and services such as interlibrary loan. when considering google scholar's weaknesses, it's important to keep in mind chen's observation that we may not have a tool available that does any better (antell et al. 2013). while google scholar may be biased toward english-language publications, so are many bibliographic databases. overall, google scholar seems to have increased the visibility of international research (bartol and mackiewicz-talarczyk 2015). while google scholar's coverage of grey literature has been shown to be somewhat uneven (bonato 2016; haddaway et al. 2015), it seems to include more diversity among relevant document types than many abstracts and indexes (ştirbu et al. 2015; bartol and mackiewicz-talarczyk 2015). although the rigors of systematic reviews may contraindicate the tool's use as a single source, it adds value to search results from other databases (bramer, giustini, and kramer 2016a). user preferences and priorities should also be taken into account; google scholar results have been said to contain "clutter," but many researchers have found the noise in google scholar tolerable given its other benefits (ştirbu et al. 2015).

google books purportedly contains about 30 million items, focused on u.s.-published and english-language books. but its coverage is hit-or-miss, surprising mays (2015) with an unexpected wealth of primary sources but disappointing harper (2016) with limited coverage of academic health sciences books. recent court decisions have enabled google to continue progressing toward its goal of full-text indexing and making snippet views available for the google-estimated universe of 130 million books, which suggests its utility may increase. google books is not integrated with link resolvers or discovery tools but has been found useful for providing information about scholarly research impact, especially for the arts, humanities, and social sciences.

as re-launched in 2016, microsoft academic shows real potential to compete with google scholar in coverage and utility for finding journal articles. as of february 2017 its index contains 120 million citations. in contrast to the mystery of google scholar's black-box algorithms and restrictive limitations, microsoft academic uses an open-system approach and offers an api. microsoft academic appears to have less coverage of books and grey literature compared with google scholar. research is badly needed about the coverage and utility of both google books and microsoft academic.

google scholar continues to evolve, launching a new algorithm for known-item searching in 2016 [18] that appears to work very well. google scholar does not reveal how many items it searches, but studies have suggested 160 million documents have been indexed. studies have shown the google scholar relevance algorithm to be heavily influenced by citation counts and language of publication.

[18] google scholar's blog notes that in january 2016, a change was made so "scholar now automatically identifies queries that are likely to be looking for a specific paper"; technically speaking, "it tries hard to find the intended paper and a version that that particular user is able to read." https://scholar.googleblog.com/.
google scholar has been so heavily researched and is such a “black box” that more attention would seem to have diminishing returns, except in the area of coverage of and utility for arts and humanities research. librarians may find these takeaways useful for working with or teaching google scholar: • little is known about coverage of arts and humanities by google scholar. • recent studies repeatedly find that in the sciences and social sciences google scholar covers as much if not more than library databases, has more recent coverage, and frequently provides access to full text without the need for library subscriptions. • although the number of studies is limited, google scholar seems excellent at retrieving known scholarly items compared with discovery tools. • using proper accent marks in the title when searching for non-english language items appears to be important. 18 google scholar’s blog notes that in january 2016, a change was made so “scholar now automatically identifies queries that are likely to be looking for a specific paper” technically speaking, “it tries hard to find the intended paper and a version that that particular user is able to read” https://scholar.googleblog.com/. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 32 • finding full text for non-english journal articles may require searching google scholar in the original language. • while google scholar may include results from google books, it appears both tools should be used rather than assuming google books will appear in google scholar. • while google scholar does include grey literature, these results do not usually rank highly. • google scholar and google must both be used to effectively search across institutional repository content. • free full text may be buried underneath the “all x versions” links because the publisher’s web site is usually the dominant version presented to the user. the right-hand column links may help ameliorate this situation, but not reliably. • google scholar is well-known in most academic communities and used regularly; however, it is seldom the only tool used, with scholars continuing to use other web search tools, library abstracts and indexes, and published web sites as well. • experts in writing systematic reviews recommend google scholar be included as a search tool along with traditional abstracts and indexes, using software to record the search process and results. • for evaluating research impact, google scholar may be superior to web of science or scopus, but using all three tools still seems necessary. • as with any database, citation metadata should be verified against the publisher’s data; with google scholar, publication dates should receive deliberate attention. • when google scholar covers some of a major publisher’s content, that does not imply it covers all of that publisher’s content. • google scholar metrics appears to provide reliable journal rankings. research agenda this review of the literature also provides direction for future research concerning academic web search engines. because this review focused on 2014-2016, researchers may need to review studies from earlier periods for methodological ideas and previous findings, noting that dramatic changes in search engine coverage and behavior can occur within only a few years.19 across the studies, some general best practices were observed. 
19 for example, ştirbu found that google scholar overlapped georef by 57% and 62% (ştirbu et al. 2015, 328), compared with a finding by neuhaus in 2006 where scholar overlapped with georef by 26% (2006, 133). when comparing the coverage of academic web search engines, assessing their utility for establishing research impact, or conducting other bibliometric studies, researchers should strongly consider using software such as publish or perish and designing their research approach with previous methodologies in mind. information scientists have charted a set of clear disciplinary methods; there is no need to start from scratch. even when performing a large-scale quantitative assessment such as kousha and thelwall (2015), manually examining and discussing a subset of the sample seems helpful for checking assumptions and for enhancing the meaning of the findings to the reader. some researchers examined the "top 20" or "top 10" results qualitatively (kousha and thelwall 2015), while others took a random sample from within their large-study sample (kousha, thelwall, and rezaie 2011). academic search engines for arts and humanities research research into the use of academic web search engines within arts and humanities fields is sorely needed. surveys show humanities scholars use both google and google scholar (inger and gardner 2016; kemman, kleppe, and scagliola 2013; van noorden 2014). during interviews of 20 historians by martin and quan-haase (2016) concerning serendipity, five mentioned google books and google scholar as important for recreating the serendipity of the physical library online. almost all arts and humanities scholars search the internet for researchers and their activities, and commonly express the belief that having a complete list of research activities online improves public awareness (dagienė and krapavickaitė 2016). mays's (2015) practical advice and the few recent studies on citation impact of google books for these disciplines point to the enormous potential for this tool's use. articles describing opportunities for new online searching habits of humanities scholars have not always included google scholar (huistra and mellink 2016). wu and chen's interviews with humanities graduate students suggested their behavior and preferences were different from those of science and technology students, doing more known-item searching and struggling with "semantically ambiguous keywords" that retrieved irrelevant results (2014, 381). platform preferences seem to have a disciplinary aspect: hammarfelt's (2014) investigation of altmetrics in the humanities suggests mendeley and twitter should be included along with google scholar when examining citation impact of humanities research, while a 2014 nature survey suggests researchgate is much less popular in the social sciences and humanities than in the sciences (van noorden 2014). in summary, arts and humanities scholars are active users of academic web search engines and related tools, but their preferences and behavior, and the relative success of google scholar as a research tool, cannot be inferred from the vast literature focused on the sciences. advice from librarians and scholars about the strengths and limitations of academic web search engines in these fields would be incredibly useful.
specific examples of needed research, and related studies to reference for methodological ideas: • similar to the studies that have been done in the sciences, how well do academic search engines cover the arts and humanities? an emphasis on formats important to the discipline would be important (prins et al. 2016). • how does the quality of search results compare between academic search engines and traditional library databases for arts and humanities topics? to what extent can the user usefully accomplish her task? (ruppel 2009)? an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 34 • to what extent do academic search engines support the research process for scholarship distinctive to arts and humanities disciplines (e.g. historiographies, review essays)? • in academic search engines, how visible is the arts and humanities literature found in institutional repositories (pitol and de groote 2014)? specific aspects of academic search engine coverage this review suggests that broad studies of academic search engine coverage may have reached a saturation point. however, specific aspects of coverage need additional investigation: • grey literature: although google scholar’s inclusion of grey literature is frequently mentioned as valuable, empirical studies evaluating its coverage are scarce. additional research following the methodology of haddaway (2015) could investigate the bibliographies of literature other than systematic reviews, investigate various disciplines, or use a sample of valuable known items (similar to kousha, thelwall, and rezaie’s (2011) methodology for books). • non-western, non-english language literature: for further investigation of the repeated finding of non-western, non-english language bias (abrizah and thelwall 2014; cavacini 2015), comparisons to library abstracts and indexes would be helpful for providing context. to what extent is this bias present in traditional research tools? hilbert et al. found the coverage of their sample increased for english language in both web of science and scopus, and “to a lesser extent” in google scholar (2015, 260). • books: any investigations of book coverage in microsoft academic and google scholar would be welcome. very few 2014-2016 studies focused on books in google scholar, and even looking in earlier years turned up little research. georgas (2015) compared google with a federated search tool for finding books, so her study may be a useful reference. kousha et al. (2011) found three times as many citations in google scholar than in scopus to a sample of 1,000 academic books. the authors concluded “there are substantial numbers of citations to academic books from google books and google scholar, and it therefore may be possible to use these potential sources to help evaluate research in bookoriented disciplines” (kousha, thelwall, and rezaie 2011, 2157). • institutional repositories: yang (2016) recommended that “librarians of digital resources conduct research on their local digital repositories, as the indexing effects and discovery rates on metadata or associated text files may be different case by case,” and the studies found 2014-2016 show that ir platform and metadata schema dramatically affect discovery, with some irs nearly invisible (weideman 2015; chen 2014; orduña-malea and lópez-cózar 2015; yang 2016) and others somewhat findable by google scholar (lee et al. 2015; obrien et al. 2016). 
askey and arlitsch (2015) have explained how google scholar's decisions regarding metadata schema can dramatically affect results.20 libraries that would like their institutional repositories to serve as social sharing platforms for research should consider conducting a study similar to martín-martín et al. (2016b). finally, a study of ir journal article visibility in academic web search engines could be extremely informative. 20 for example, google's rejection of dublin core. • full-text retrieval: the indexing coverage of academic search engines relates to the retrieval of full text, which is another area ripe for more research studies, especially in light of the impressive quantity of full text that can be retrieved without user authentication. johnson and simonsen (2015) found that more of the engineering students they surveyed obtained scholarly articles from a free download or a pdf from a colleague at another institution than from the library's subscription. meanwhile, libraries continue to pay for costly subscription resources. monitoring this situation is essential for strategic decision-making. quint (2016) and karlsson (2014) have suggested strategies for libraries and vendors to support broader access to subscription full text through creative licensing and per-item fee approaches. institutional repositories have had mixed results in changing scholars' habits (both contributors and searchers) but are demonstrably contributing to the presence of full text in the academic search engine experience. when will academic users find a good-enough selection of full-text articles that they no longer need the expanded full text paid for by their institutions? google books similarly to microsoft academic, google books as a search tool also needs dedicated research from librarians and information scientists about its coverage, utility, and/or adoption. a purposeful comparison with other large digital repositories such as hathitrust (https://www.hathitrust.org) would be a boon to practitioners and the public. while hathitrust is transparent about its coverage (https://www.hathitrust.org/statistics_visualizations), specific areas of google books' coverage have been called into question. weiss (2016) suggested a gap in google books exists from about 1915-1965 "because many publishers either have let it fall out of print, or the book is orphaned and no one wants to go through the trouble of tracking down the copyright owners" and found that copies in google books "will likely be locked down and thus unreadable, or visible only as a snippet, at best" (303). has this situation changed since the court rulings concerning the legality of snippet view? longitudinal studies of the growth of google books similar to harzing (2014) could illuminate this and other questions about google books' ability to deliver content. uneven coverage of content types, geography, and language should be investigated. mays noted a possible geographical imbalance within the united states (mays 2015, 26). others noted significant language and international imbalances, and large disciplinary differences (weiss 2016; abrizah and thelwall 2014; kousha and thelwall 2015). weiss and others suggest that google books' coverage imbalance has enormous social implications: "google and other [massive digital libraries] have essentially canonized the books they have scanned and contribute to the marginalization of those left unscanned" (301).
therefore more holistic quantitative investigations of the types of information in google books and possible skewness would be welcome. finally, chen's study (2012) comparing the coverage of google books and worldcat could be repeated to provide longitudinal information. the utility of google books for research purposes also needs further investigation. books are far more prevalently cited in wikipedia than are research articles (thelwall and kousha 2015a). examining samples of wikipedia articles' citation lists for the prevalence of google books could reveal how dominant a force google books has become in that space. on a more philosophical level, investigating the ways google books might transform scholarly processes would be useful. szpiech (2014) considered how the google books version of a medieval manuscript transformed his relationship with texts, causing a rupture "produced by my new power to extract words and information from a text without being subject to its order, scale, or authority" (78). he hypothesized readers approach google books texts as consumers, rather than learners, whereby "the critical sense of the gestalt" is at risk of being forgotten (84). have other researchers experienced what he describes? microsoft academic given the stated openness of microsoft's new academic web search engine,21 the closed nature of google scholar, and the promising findings of bibliometricians (harzing 2016b; harzing and alakangas 2016a), librarians and information scientists should embark on a thorough review of microsoft academic with the same enthusiasm with which they approached google scholar. the search engine's coverage, utility for research, and suitability for bibliometric analysis22 all need to be examined. microsoft academic's abilities for supporting scholarly social networking would also be of interest, perhaps using ward et al. (2015) as a theoretical groundwork. the tool's coverage and utility for various disciplines and research purposes are a wide-open field for highly useful research. 21 microsoft's faq says the company is "adopting an open approach in developing the service, and we invite community participation. we like to think what we have developed is a community property. as such, we are opening up our academic knowledge as a downloadable dataset" and offers the academic knowledge api (https://www.microsoft.com/cognitive-services/en-us/academic-knowledge-api). 22 see jacsó (2011) for methodology. professional and instructional approaches based on user research to inform instructional approaches, more study on user behavior is needed, perhaps repeating herrera's (2011) study with google scholar and microsoft academic. in light of the recent focus on graduate students, research concerning the use of academic web search engines by undergraduates, community college students, high school students, and other groups would be welcome. using an interview or focus group generates exploratory findings that could be tested through surveys with a larger, more representative sample of the population of interest. studying searching behaviors has been common; can librarians design creative studies to investigate reading, engagement, and reflection when web search engines are used as part of the process? is there a way to study whether the "matthew effect" (antell et al. 2013, 281), the aging citation
phenomenon (verstak et al. 2014; martín-martín et al. 2016a; davis and cochran 2015), or other epistemological hypotheses are influencing scholarship patterns? a bold study could be performed to examine differences in quality outcomes between samples of students using primarily academic search engines versus traditional library search tools. exploratory studies in this area could begin by surveying students about their use of search tools for research methods courses or asking them to record their research process in a journal, and correlating the findings with their grades on the final research product. three specific areas of user research needed are the use of scholarly social network platforms, researcher profiles, and the influence of these on scholarly collaboration and research (ward, bejarano, and dudás 2015, 178); the performance of google's relatively new known-item search23 (compared with microsoft academic's known-item search abilities); and searching in non-english languages. regarding the latter, albarillo's (2016) method, which he applied to library databases, could be repeated with google scholar, microsoft academic, and google books. finally, to continue their strong track record as experts in navigating the landscape of digital scholarship, librarians need to research assumptions regarding best practices for scholarly logistics. for example, searching google for article titles plus the term "doi," then scanning the results list for researchgate, was found by this study's author to be the most efficient way to obtain doi numbers, but is this a reliable approach? does researchgate have sufficient accuracy to be recommended as the optimal tool for this task? what is the most efficient way for a scholar to locate full text for a citation? are academic search engines' bibliographic citation management software export tools competitive with third-party commercial tools such as refworks? another area needing investigation is the visibility of links to free full text in google scholar. pitol and de groote found that 70 percent of the items in their study had at least one free full-text version available through a "hidden" google scholar version (2014, 603), and this author's work on this review article indicates this problem still exists — but to what extent? also, when free full text exists in multiple repositories (e.g. researchgate, digital commons, academia.edu), which are the most trustworthy and practically useful for scholars? librarians should discuss the answers to these questions and be ready to provide expert advice to users. 23 google scholar's blog notes that in january 2016, a change was made so "scholar now automatically identifies queries that are likely to be looking for a specific paper"; technically speaking, "it tries hard to find the intended paper and a version that that particular user is able to read" (https://scholar.googleblog.com/). conclusion with so many users opting to use academic web search engines for research, librarians need to investigate the performance of microsoft academic, google books, and google scholar for the arts and humanities, and to re-think library services and collections in light of these tools' strengths and limitations. the evolution of web indexing and increasing free access to full text should be monitored in conjunction with library collection development. to remain relevant to
an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 38 modern researchers, librarians should continue to strengthen their knowledge of and expertise with public academic web search engines, full-text repositories, and scholarly networks. bibliography abrizah, a., and mike thelwall. 2014. "can the impact of nonwestern academic books be measured? an investigation of google books and google scholar for malaysia." journal of the association for information science & technology 65 (12): 2498-2508. https://doi.org/10.1002/asi.23145. albarillo, frans. 2016. "evaluating language functionality in library databases." international information & library review 48 (1): 1-10. https://doi.org/10.1080/10572317.2016.1146036. antell, karen, molly strothmann, xiaotian chen, and kevin o’kelly. 2013. "cross-examining google scholar." reference & user services quarterly 52 (4): 279-282. https://doi.org/10.5860/rusq.52n4.279. asher, andrew d., lynda m. duke, and suzanne wilson. 2012. "paths of discovery: comparing the search effectiveness of ebsco discovery service, summon, google scholar, and conventional library resources." college & research libraries 74(5):464-488. https://doi.org/10.5860/crl374. askey, dale, and kenning arlitsch. 2015. "heeding the signals: applying web best practices when google recommends." journal of library administration 55 (1): 49-59. https://doi.org/10.1080/01930826.2014.978685. authors guild. "authors guild v. google." accessed january 1, 2016, https://www.authorsguild.org/where-we-stand/authors-guild-v-google/. bartol, tomaž, and maria mackiewicz-talarczyk. 2015. "bibliometric analysis of publishing trends in fiber crops in google scholar, scopus, and web of science." journal of natural fibers 12 (6): 531. https://doi.org/10.1080/15440478.2014.972000. boeker, martin, werner vach, and edith motschall. 2013. "google scholar as replacement for systematic literature searches: good relative recall and precision are not enough." bmc medical research methodology 13 (1): 1. bonato, sarah. 2016. "google scholar and scopus for finding gray literature publications." journal of the medical library association 104 (3): 252-254. https://doi.org/10.3163/15365050.104.3.021. bornmann, lutz, andreas thor, werner marx, and hermann schier. 2016. "the application of bibliometrics to research evaluation in the humanities and social sciences: an exploratory study using normalized google scholar data for the publications of a research institute." information technology and libraries | june 2017 39 journal of the association for information science & technology 67 (11): 2778-2789. https://doi.org/10.1002/asi.23627. boumenot, diane. "printing a book from google books." one rhode island family. last modified december 3, 2015, accessed january 1, 2017. https://onerhodeislandfamily.com/2015/12/03/printing-a-book-from-google-books/. bøyum, idunn, and svanhild aabø. 2015. "the information practices of business phd students." new library world 116 (3): 187-200. https://doi.org/10.1108/nlw-06-2014-0073. bramer, wichor m., dean giustini, and bianca m. r. kramer. 2016. "comparing the coverage, recall, and precision of searches for 120 systematic reviews in embase, medline, and google scholar: a prospective study." systematic reviews 5(39):1-7. https://doi.org/10.1186/s13643-016-0215-7. cals, j. w., and d. kotz. 2016. "literature review in biomedical research: useful search engines beyond pubmed." journal of clinical epidemiology 71: 115-117. 
https://doi.org/10.1016/j.jclinepi.2015.10.012. carlson, scott. 2006. "challenging google, microsoft unveils a search tool for scholarly articles." chronicle of higher education 52 (33). cavacini, antonio. 2015. "what is the best database for computer science journal articles?" scientometrics 102 (3): 2059-2071. https://doi.org/10.1007/s11192-014-1506-1. chen, xiaotian. 2012. "google books and worldcat: a comparison of their content." online information review 36 (4): 507-516. https://doi.org/10.1108/14684521211254031. ———. 2014. "open access in 2013: reaching the 50% milestone." serials review 40 (1): 21-27. https://doi.org/10.1080/00987913.2014.895556. choong, miew keen, filippo galgani, adam g. dunn, and guy tsafnat. 2014. "automatic evidence retrieval for systematic reviews." journal of medical internet research 16 (10): 1-1. https://doi.org/10.2196/jmir.3369. ciccone, karen, and john vickery. 2015. "summon, ebsco discovery service, and google scholar: a comparison of search performance using user queries." evidence based library & information practice 10 (1): 34-49. https://ejournals.library.ualberta.ca/index.php/eblip/article/view/23845. conrad, lettie y., elisabeth leonard, and mary m. somerville. 2015. "new pathways in scholarly discovery: understanding the next generation of researcher tools." paper presented at the association of college and research libraries annual conference, march 25-27, portland, or. https://pdfs.semanticscholar.org/3cb1/315476ccf9b443c01eb9b1d175ae3b0a5b4e.pdf. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 40 dagienė, eleonora, and danutė krapavickaitė. 2016. "how researchers manage their academic activities." learned publishing 29(3):155-163. https://doi.org/10.1002/leap.1030. davis, philip m., and angela cochran. 2015. "cited half-life of the journal literature." arxiv preprint arxiv:1504.07479. https://arxiv.org/abs/1504.07479. delgado lópez-cózar, emilio, nicolás robinson-garcía, and daniel torres-salinas. 2014. "the google scholar experiment: how to index false papers and manipulate bibliometric indicators." journal of the association for information science & technology 65 (3): 446-454. https://doi.org/10.1002/asi.23056. erb, brian, and rob sica. 2015. "flagship database for literature searching or flelpful auxiliary?" charleston advisor 17 (2): 47-50. https://doi.org/10.5260/chara.17.2.47. fagan, jody condit, and david gaines. 2016. "take charge of eds: vet your content." presentation to the ebsco users' group, boston, ma, may 10-11. gehanno, jean-françois, laetitia rollin, and stefan darmoni. 2013. "is the coverage of google scholar enough to be used alone for systematic reviews." bmc medical informatics and decision making 13 (1): 1. https://doi.org/10.1186/1472-6947-13-7. georgas, helen. 2015. "google vs. the library (part iii): assessing the quality of sources found by undergraduates." portal: libraries and the academy 15 (1): 133-161. https://doi.org/10.1353/pla.2015.0012. giustini, dean, and maged n. kamel boulos. 2013. "google scholar is not enough to be used alone for systematic reviews." online journal of public health informatics 5 (2). https://doi.org/10.5210/ojphi.v5i2.4623. gray, jerry e., michelle c. hamilton, alexandra hauser, margaret m. janz, justin p. peters, and fiona taggart. 2012. "scholarish: google scholar and its value to the sciences." issues in science and technology librarianship 70 (summer). https://doi.org/10.1002/asi.21372/full. haddaway, neal r. 2015. 
"the use of web-scraping software in searching for grey literature." grey journal 11 (3): 186-190. haddaway, neal robert, alexandra mary collins, deborah coughlin, and stuart kirk. 2015. "the role of google scholar in evidence reviews and its applicability to grey literature searching." plos one 10 (9): e0138237. https://doi.org/10.1371/journal.pone.0138237. hammarfelt, björn. 2014. "using altmetrics for assessing research impact in the humanities." scientometrics 101 (2): 1419-1430. https://doi.org/10.1007/s11192-014-1261-3. hands, africa. 2012. "microsoft academic search – http://academic.research.microsoft.com." technical services quarterly 29 (3): 251-252. https://doi.org/10.1080/07317131.2012.682026. information technology and libraries | june 2017 41 harper, sarah fletcher. 2016. "google books review." journal of electronic resources in medical libraries 13 (1): 2-7. https://doi.org/10.1080/15424065.2016.1142835. harzing, anne-wil. 2013. "a preliminary test of google scholar as a source for citation data: a longitudinal study of nobel prize winners." scientometrics 94 (3): 1057-1075. https://doi.org/10.1007/s11192-012-0777-7. ———. 2014. "a longitudinal study of google scholar coverage between 2012 and 2013." scientometrics 98 (1): 565-575. https://doi.org/10.1007/s11192-013-0975-y. ———. 2016a. publish or perish. vol. 5. http://www.harzing.com/resources/publish-or-perish. ———. 2016b. "microsoft academic (search): a phoenix arisen from the ashes?" scientometrics 108 (3): 1637-1647.https://doi.org/10.1007/s11192-016-2026-y. harzing, anne-wil, and satu alakangas. 2016a. "microsoft academic: is the phoenix getting wings?" scientometrics: 1-13. harzing, anne-wil, and satu alakangas. 2016b. "google scholar, scopus and the web of science: a longitudinal and cross-disciplinary comparison." scientometrics 106 (2): 787-804. https://doi.org/10.1007/s11192-015-1798-9. herrera, gail. 2011. "google scholar users and user behaviors: an exploratory study." college & research libraries 72 (4): 316-331. https://doi.org/10.5860/crl-125rl. higgins, julian, and s. green, eds. 2011. cochrane handbook for systematic reviews of interventions. version 5.1.0 ed.: the cochrane collaboration. http://handbook.cochrane.org/. hilbert, fee, julia barth, julia gremm, daniel gros, jessica haiter, maria henkel, wilhelm reinhardt, and wolfgang g. stock. 2015. "coverage of academic citation databases compared with coverage of scientific social media." online information review 39 (2): 255-264. https://doi.org/10.1108/oir-07-2014-0159. hoffmann, anna lauren. 2014. "google books as infrastructure of in/justice: towards a sociotechnical account of rawlsian justice, information, and technology." theses and dissertations. paper 530. http://dc.uwm.edu/etd/530/. ———. 2016. "google books, libraries, and self-respect: information justice beyond distributions." the library 86 (1). https://doi.org/10.1086/684141. horrigan, john b. "lifelong learning and technology." pew research center, last modified march 22, 2016, accessed february 7, 2017, http://www.pewinternet.org/2016/03/22/lifelonglearning-and-technology/. hug, sven e., michael ochsner, and martin p. braendle. 2016. "citation analysis with microsoft academic." arxiv preprint arxiv:1609.05354.https://arxiv.org/abs/1609.05354. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 42 huistra, hieke, and bram mellink. 2016. "phrasing history: selecting sources in digital repositories." 
historical methods: a journal of quantitative and interdisciplinary history 49 (4): 220-229. https://doi.org/10.1093/llc/fqw002. inger, simon, and tracy gardner. 2016. "how readers discover content in scholarly publications." information services & use 36 (1): 81-97. https://doi.org/10.3233/isu-160800. jackson, joab. 2010. "google: 129 million different books have been published." pc world, august 6, 2010. http://www.pcworld.com/article/202803/google_129_million_different_books_have_been_pu blished.html. jacsó, p. 2008. "live search academic." peter’s digital reference shelf, april. jacsó, péter. 2011. "the pros and cons of microsoft academic search from a bibliometric perspective." online information review 35 (6): 983-997. https://doi.org/10.1108/14684521111210788. jamali, hamid r., and majid nabavi. 2015. "open access and sources of full-text articles in google scholar in different subject fields." scientometrics 105 (3): 1635-1651. https://doi.org/10.1007/s11192-015-1642-2. johnson, paula c., and jennifer e. simonsen. 2015. "do engineering master's students know what they don't know?" library review 64 (1): 36-57. https://doi.org/10.1108/lr-05-2014-0052. jones, edgar. 2010. "google books as a general research collection." library resources & technical services 54 (2): 77-89. https://doi.org/10.5860/lrts.54n2.77. karlsson, niklas. 2014. "the crossroads of academic electronic availability: how well does google scholar measure up against a university-based metadata system in 2014?" current science 107 (10): 1661-1665. http://www.currentscience.ac.in/volumes/107/10/1661.pdf. kemman, max, martijn kleppe, and stef scagliola. 2013. "just google it-digital research practices of humanities scholars." arxiv preprint arxiv:1309.2434. https://arxiv.org/abs/1309.2434. khabsa, madian, and c. lee giles. 2014. "the number of scholarly documents on the public web." plos one 9 (5): https://doi.org/10.1371/journal.pone.0093949 kirkwood jr., hal, and monica c. kirkwood. 2011. "historical research." online 35 (4): 28-32. koler-povh, teja, primož južnic, and goran turk. 2014. "impact of open access on citation of scholarly publications in the field of civil engineering." scientometrics 98 (2): 1033-1045. https://doi.org/10.1007/s11192-013-1101-x. kousha, kayvan, mike thelwall, and somayeh rezaie. 2011. "assessing the citation impact of books: the role of google books, google scholar, and scopus." journal of the american society information technology and libraries | june 2017 43 for information science and technology 62 (11): 2147-2164. https://doi.org/10.1002/asi.21608. kousha, kayvan, and mike thelwall. 2017. "are wikipedia citations important evidence of the impact of scholarly articles and books?" journal of the association for information science and technology. 68(3):762-779. https://doi.org/10.1002/asi.23694. kousha, kayvan, and mike thelwall. 2015. "an automatic method for extracting citations from google books." journal of the association for information science & technology 66 (2): 309320. https://doi.org/10.1002/asi.23170. lee, jongwook, gary burnett, micah vandegrift, hoon baeg jung, and richard morris. 2015. "availability and accessibility in an open access institutional repository: a case study." information research 20 (1): 334-349. levay, paul, nicola ainsworth, rachel kettle, and antony morgan. 2016. "identifying evidence for public health guidance: a comparison of citation searching with web of science and google scholar." research synthesis methods 7 (1): 34-45. 
https://doi.org/10.1002/jrsm.1158. levy, steven. "making the world’s problem solvers 10% more efficient." backchannel. last modified october 17, 2014, accessed january 14, 2016, https://medium.com/backchannel/the-gentleman-who-made-scholar-d71289d9a82d. los angeles times. 2016. "google, books and 'fair use'." los angeles times, april 19, 2016. http://www.latimes.com/opinion/editorials/la-ed-google-book-search-20160419-story.html martin, kim, and anabel quan-haase. 2016. "the role of agency in historians’ experiences of serendipity in physical and digital information environments." journal of documentation 72 (6): 1008-1026. https://doi.org/10.1108/jd-11-2015-0144. martín-martín, alberto, juan manuel ayllón, enrique orduña-malea, and emilio delgado lópezcózar. 2016a. "2016 google scholar metrics released: a matter of languages... and something else." arxiv preprint arxiv:1607.06260. https://arxiv.org/abs/1607.06260. martín-martín, alberto, enrique orduña-malea, juan m. ayllón, and emilio delgado lópez-cózar. 2016b. "the counting house: measuring those who count. presence of bibliometrics, scientometrics, informetrics, webometrics and altmetrics in the google scholar citations, researcherid, researchgate, mendeley & twitter." arxiv preprint arxiv:1602.02412. https://arxiv.org/abs/1602.02412. martín-martín, alberto, enrique orduña-malea, juan manuel ayllón, and emilio delgado lópezcózar. 2014. "does google scholar contain all highly cited documents (1950-2013)?" arxiv preprint arxiv:1410.8464. https://arxiv.org/abs/1410.8464. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 44 martín-martín, alberto, enrique orduña-malea, juan ayllón, and emilio delgado lópez-cózar. 2016c. "back to the past: on the shoulders of an academic search engine giant." scientometrics 107 (3): 1477-1487. https://doi.org/10.1007/s11192-016-1917-2. martín-martín, alberto, enrique orduña-malea, anne-wil harzing, and emilio delgado lópezcózar. 2017. "can we use google scholar to identify highly-cited documents?" journal of informetrics 11 (1): 152-163. https://doi.org/10.1016/j.joi.2016.11.008. mays, dorothy a. 2015. "google books: far more than just books." public libraries 54 (5): 23-26. http://publiclibrariesonline.org/2015/10/far-more-than-just-books/ meier, john j., and thomas w. conkling. 2008. "google scholar’s coverage of the engineering literature: an empirical study." the journal of academic librarianship 34 (3): 196-201. https://doi.org/10.1016/j.acalib.2008.03.002. moed, henk f., judit bar-ilan, and gali halevi. 2016. "a new methodology for comparing google scholar and scopus." arxiv preprint arxiv:1512.05741.https://arxiv.org/abs/1512.05741. namei, elizabeth, and christal a. young. 2015. "measuring our relevancy: comparing results in a web-scale discovery tool, google & google scholar." paper presented at the association of college and research libraries annual conference, march 25-27, portland, or. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/201 5/namei_young.pdf national institute for health and care excellence (nice). "developing nice guidelines: the manual." last modified april 2016, accessed november 27, 2016. https://www.nice.org.uk/process/pmg20. neuhaus, chris, ellen neuhaus, alan asher, and clint wrede. 2006. "the depth and breadth of google scholar: an empirical study." portal: libraries and the academy 6 (2): 127-141. https://doi.org/10.1353/pla.2006.0026. 
obrien, patrick, kenning arlitsch, leila sterman, jeff mixter, jonathan wheeler, and susan borda. 2016. "undercounting file downloads from institutional repositories." journal of library administration 56 (7): 854-874. https://doi.org/10.1080/01930826.2016.1216224. orduña-malea, enrique, and emilio delgado lópez-cózar. 2014. "google scholar metrics evolution: an analysis according to languages." scientometrics 98 (3): 2353-2367. https://doi.org/10.1007/s11192-013-1164-8. orduña-malea, enrique, and emilio delgado lópez-cózar. 2015. "the dark side of open access in google and google scholar: the case of latin-american repositories." scientometrics 102 (1): 829-846. https://doi.org/10.1007/s11192-014-1369-5. orduña-malea, enrique, alberto martín-martín, juan m. ayllon, and emilio delgado lópez-cózar. 2014. "the silent fading of an academic search engine: the case of microsoft academic information technology and libraries | june 2017 45 search." online information review 38(7):936-953. https://doi.org/10.1108/oir-07-20140169. ortega, josé luis. 2015. "relationship between altmetric and bibliometric indicators across academic social sites: the case of csic's members." journal of informetrics 9 (1): 39-49. https://doi.org/10.1016/j.joi.2014.11.004. ortega, josé luis, and isidro f. aguillo. 2014. "microsoft academic search and google scholar citations: comparative analysis of author profiles." journal of the association for information science & technology 65 (6): 1149-1156. https://doi.org/10.1002/asi.23036. pitol, scott p., and sandra l. de groote. 2014. "google scholar versions: do more versions of an article mean greater impact?" library hi tech 32 (4): 594-611. https://doi.org/0.1108/lht05-2014-0039. prins, ad a. m., rodrigo costas, thed n. van leeuwen, and paul f. wouters. 2016. "using google scholar in research evaluation of humanities and social science programs: a comparison with web of science data." research evaluation 25 (3): 264-270. https://doi.org/10.1093/reseval/rvv049. quint, barbara. 2016. "find and fetch: completing the course." information today 33 (3): 17-17. rothfus, melissa, ingrid s. sketris, robyn traynor, melissa helwig, and samuel a. stewart. 2016. "measuring knowledge translation uptake using citation metrics: a case study of a pancanadian network of pharmacoepidemiology researchers." science & technology libraries 35 (3): 228-240. https://doi.org/10.1080/0194262x.2016.1192008. ruppel, margie. 2009. "google scholar, social work abstracts (ebsco), and psycinfo (ebsco)." charleston advisor 10 (3): 5-11. shultz, m. 2007. "comparing test searches in pubmed and google scholar." journal of the medical library association : jmla 95 (4): 442-445. https://doi.org/10.3163/1536-5050.95.4.442. stansfield, claire, kelly dickson, and mukdarut bangpan. 2016. "exploring issues in the conduct of website searching and other online sources for systematic reviews: how can we be systematic?" systematic reviews 5 (1): 191. https://doi.org/10.1186/s13643-016-0371-9. ştirbu, simona, paul thirion, serge schmitz, gentiane haesbroeck, and ninfa greco. 2015. "the utility of google scholar when searching geographical literature: comparison with three commercial bibliographic databases." the journal of academic librarianship 41 (3): 322-329. https://doi.org/10.1016/j.acalib.2015.02.013. suiter, amy m., and heather lea moulaison. 2015. "supporting scholars: an analysis of academic library websites' documentation on metrics and impact." the journal of academic librarianship 41 (6): 814-820. 
https://doi.org/10.1016/j.acalib.2015.09.004. an evidence-based review of academic web search engines, 2014-2016| fagan | https://doi.org/10.6017/ital.v36i2.9718 46 szpiech, ryan. 2014. "cracking the code: reflections on manuscripts in the age of digital books." digital philology: a journal of medieval cultures 3(1): 75-100. https://doi.org/10.1353/dph.2014.0010. testa, matthew. 2016. "availability and discoverability of open-access journals in music." music reference services quarterly 19 (1): 1-17. https://doi.org/10.1080/10588167.2016.1130386. thelwall, mike, and kayvan kousha. 2015b. "web indicators for research evaluation. part 1: citations and links to academic articles from the web." el profesional de la información 24 (5): 587-606.https://doi.org/10.3145/epi.2015.sep.08. thielen, frederick w., ghislaine van mastrigt, l. t. burgers, wichor m. bramer, marian h. j. m. majoie, sylvia m. a. a. evers, and jos kleijnen. 2016. "how to prepare a systematic review of economic evaluations for clinical practice guidelines: database selection and search strategy development (part 2/3)." expert review of pharmacoeconomics & outcomes research: 1-17. https://doi.org/10.1080/14737167.2016.1246962. trapp, jamie. 2016. "web of science, scopus, and google scholar citation rates: a case study of medical physics and biomedical engineering: what gets cited and what doesn't?" australasian physical & engineering sciences in medicine. 39(4): 817-823. https://doi.org/10.1007/s13246-016-0478-2. van noorden, r. 2014. "online collaboration: scientists and the social network." nature 512 (7513): 126-129. https://doi.org/10.1038/512126a. varshney, lav r. 2012. "the google effect in doctoral theses." scientometrics 92 (3): 785-793. https://doi.org/10.1007/s11192-012-0654-4. verstak, alex, anurag acharya, helder suzuki, sean henderson, mikhail iakhiaev, cliff chiung yu lin, and namit shetty. 2014. "on the shoulders of giants: the growing impact of older articles." arxiv preprint arxiv:1411.0275. https://arxiv.org/abs/1411.0275. walsh, andrew. 2015. "beyond "good" and "bad": google as a crucial component of information literacy." in the complete guide to using google in libraries, edited by carol smallwood, 3-12. new york: rowman & littlefield. waltman, ludo. 2016. "a review of the literature on citation impact indicators." journal of informetrics 10 (2): 365-391. https://doi.org/10.1016/j.joi.2016.02.007. ward, judit, william bejarano, and anikó dudás. 2015. "scholarly social media profiles and libraries: a review." liber quarterly 24 (4): 174–204.https://doi.org/10.18352/lq.9958. weideman, melius. 2015. "etd visibility: a study on the exposure of indian etds to the google scholar crawler." paper presented at etd 2015: 18th international symposium on electronic theses and dissertations, new delhi, india, november 4-6. http://www.web information technology and libraries | june 2017 47 visibility.co.za/0168-conference-paper-2015-weideman-etd-theses-dissertation-india-googlescholar-crawler.pdf. weiss, andrew. 2016. "examining massive digital libraries (mdls) and their impact on reference services." reference librarian 57 (4): 286-306. https://doi.org/10.1080/02763877.2016.1145614. whitmer, susan. 2015. "google books: shamed by snobs, a resource for the rest of us." in the complete guide to using google in libraries, edited by carol smallwood, 241-250. new york: rowman & littlefield. wildgaard, lorna. 2015. 
"a comparison of 17 author-level bibliometric indicators for researchers in astronomy, environmental science, philosophy and public health in web of science and google scholar." scientometrics 104 (3): 873-906. https://doi.org/10.1007/s11192-015-1608-4. winter, joost, amir zadpoor, and dimitra dodou. 2014. "the expansion of google scholar versus web of science: a longitudinal study." scientometrics 98 (2): 1547-1565. https://doi.org/10.1007/s11192-013-1089-2. wu, tim. 2015. "whatever happened to google books?" the new yorker, september 11, 2015. wu, ming-der, and shih-chuan chen. 2014. "graduate students appreciate google scholar, but still find use for libraries." electronic library 32 (3): 375-389. https://doi.org/10.1108/el-082012-0102. yang, le. 2016. "making search engines notice: an exploratory study on discoverability of dspace metadata and pdf files." journal of web librarianship 10 (3): 147-160. https://doi.org/10.1080/19322909.2016.1172539. bento box user experience study at franklin university articles bento-box user experience study at franklin university marc jaffy information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11581 marc jaffy (marc.jaffy@franklin.edu) is acquisitions librarian, franklin university. abstract this article discusses the benefits of the bento-box method of searching library resources, including a comparison of the method with a tabbed search interface. it then describes a usability study conducted by the franklin university library in which 27 students searched for an article, an ebook, and a journal on two websites: one using a bento box and one using the ebsco discovery service (eds). screen recordings of the searches were reviewed to see what actions users took while looking for information on each site, as well as how long the searches took. students also filled out questionnaires to indicate what they thought of each type of search. overall students found more items on the bento-box site, and indicated a slight preference for the bento-box search over eds. the bento-box site also provided quicker results than the eds site. as a result, the franklin university library decided to implement bento-box searching on its website. introduction “one page, one search box, results from as many library-resource types as possible.”1 in 2018, the franklin university library redesigned its website to provide users with a more modern interface that more closely matched franklin university’s website. the library also wanted to improve the site’s usability and make it easier for students to find information. to determine how to best improve the user experience, library staff members held a number of meetings to discuss the new site’s layout and contents. because “students almost always resort[] to searching via web site search boxes rather than navigating through the web site by browsing,” a crucial decision involved what search results the redesigned library website would provide. 2 as a result of these discussions, the franklin university library’s initial website redesign included a persistent search bar in the upper left of each page which searched the library’s website, as well as a prominent tabbed search bar on the library’s homepage (see figure 1). the homepage search bar provided a default tab that used ebsco discovery service (eds) to search the library resources cataloged in eds (most of the library’s databases and catalog) and a second tab which used ebsco’s journal finder to look for e-journals. 
once our new website went live, feedback from patrons demonstrated that the persistent website search bar caused confusion among users who expected it to search the library's databases rather than the library's website. we also found the "search journals" tab on the homepage unnecessary. as a result, we removed both the persistent search bar and the journals tab. after these changes, the main search option provided to library users was the eds search bar on the library's homepage, although some interior pages of the library's website provided a search bar related to that page (such as an option to search the catalog on the catalog page). although eds searches mainly for articles and books, it "may overlook user needs for other types of library resources or services."3 this is a problem because "library users increasingly perceive the discovery interface as a portal to all of the library's resources."4 due to dissatisfaction with the eds search, the library decided to look for alternatives. one alternative which "[a] number of libraries have turned to [is] the bento-based approach to discovery and display."5 figure 1. redesigned franklin university library website with two search boxes on the homepage. the circled search box in the upper left was initially persistent across the entire site. to determine whether the bento-box search format would serve our users better than eds, the library designed and conducted a usability study comparing eds and bento-box searches. by comparing user search behavior and results for each search method, as well as user opinion regarding these different methods of searching library websites, the library hoped to gain a clearer understanding of how its users interact with search boxes on the library's website and—most importantly—which search method would best serve its users. the remainder of this article sets forth the results of that trial. after explaining what bento-box search is, as well as reasons a library might want to use bento-box search results, it reports on a usability study conducted by the franklin university library, discussing both observations of screen recordings demonstrating user search behavior and responses to questionnaires. bento-box library search what is bento-box search? the term "bento box" is based on "japanese cuisine where different parts of a meal are compartmentalized in aesthetically pleasing ways."6 instead of compartmentalizing food, a bento-box search results page compartmentalizes search results from a variety of different resources on a single page. the user sees a single search box which gives "side-by-side results from multiple library search tools, e.g., library catalog, article index, institutional repository, website, libguides, etc."7 a bento-box search provides results based on searches of individual library resources.
this is important because of the difficulty of providing a single search that includes combined results from all resources: “the nature of library content and current technology makes it difficult to create usable ‘blended’ results; catalog materials may crowd out books or vice versa.” 8 bento-box results avoid this problem by “provid[ing] these resources on equal footing, leveraging the ranking algorithms internal to each resource’s individual search interface.” 9 as a result, the bento box gives libraries “the best cost/benefit way to improve article search relatively quickly with relatively large user benefit.”10 figure 2, the university of michigan’s bento-box results page, illustrates how a bento box provides search results from a variety of resources in visually discrete boxes. this is done behind the scenes by using separate searches to query the individual resources, as demonstrated by figure 3, the architecture of wayne state university’s bento-box search. benefits of a bento-box search results page wayne state university’s switch to a bento box “resulted in increased access to resources.”11 a bento box can increase access to resources both because it makes library search easier for users and because it provides results in a format that makes it easier for users to find information. simplified search when deciding what type of search to provide on the library website, the main consideration involves what users expect when searching. student experiences with internet search engines have influenced their expectations for library search, which leads them to “approach library search interfaces as if they were google.”12 what do users like about google? “one of the main reasons that users are satisfied with google is its simple user interface.”13 based on their experience with google and other search engines, library users “expect easy searching” that provides “one-step immediate access to the full text of library resources.”14 bento-box results let libraries meet these expectations by presenting users with a simple interface that permits easy searching and “returns results across many library sources.”15 additionally, bento-box results “can integrate library website results, allowing users to type things like ‘how do i renew a book’ into a single search box and get meaningful results.”16 as a result, adopting a bento-box search results page can permit a library to satisfy user search expectations. the bento-box format will provide users the information they seek whether they are looking for an article, a book, or information about the library. information technology and libraries march 2020 bento box user experience study at franklin university | jaffy 4 figure 2. university of michigan library’s bento-box results. information technology and libraries march 2020 bento box user experience study at franklin university | jaffy 5 figure 3. wayne state university library bento box architecture, from cole hudson and graham hukill, “one-to-many: building a single-search interface for disparate resources,” in k. varnum (ed.) exploring discovery: the front door to your library’s licensed and digitized content (chicago: ala editions, 2016): 147. 
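to make the pattern in figure 3 concrete, the sketch below (written in python) shows one way a bento-box results page can fan a single query out to each resource's own search api in parallel and keep the top hits for each compartment. the endpoint urls, resource names, and response shape are hypothetical placeholders, not wayne state's or franklin university's actual implementation.

import concurrent.futures
import requests

# hypothetical per-resource search endpoints; each compartment gets its own api
resources = {
    "articles": "https://library.example.edu/api/articles/search",
    "catalog": "https://library.example.edu/api/catalog/search",
    "journals": "https://library.example.edu/api/journals/search",
    "website": "https://library.example.edu/api/site/search",
}

def search_resource(name, url, query, limit=3):
    # query one resource's own search api and keep only its top hits,
    # relying on that resource's internal relevance ranking
    response = requests.get(url, params={"q": query, "limit": limit}, timeout=5)
    response.raise_for_status()
    return name, response.json().get("results", [])[:limit]

def bento_search(query):
    # send the same query to every resource in parallel and return
    # {resource name: top hits}, one entry per bento compartment
    boxes = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(resources)) as pool:
        futures = [
            pool.submit(search_resource, name, url, query)
            for name, url in resources.items()
        ]
        for future in concurrent.futures.as_completed(futures):
            name, hits = future.result()
            boxes[name] = hits
    return boxes

# each entry returned by bento_search("criminal justice") would be rendered as its own box

because each box is populated by a separate query, every resource's own relevance ranking is preserved rather than blended, which is the design choice the bento-box approach makes.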
better presentation of results bento-box results can help alleviate user confusion because “format types [are] more evident: novice users, such as undergraduates, may not have a good understanding of the difference between books, journals, and articles.”17 the bento-box presentation makes it easier for users to find information since “results from different sources are returned to visually discrete boxes” 18 on a single page. this presentation of grouped results benefits library users because “[b]y presenting search results in separate streams, users can more easily navigate to what they need.”19 when the princeton university library implemented a bento-box results page (termed all search), it found that “most users praised the new all search for its ease of use and also for the ‘bento-box’ approach of grouping results by category. they felt that this clarified the results they were seeing and made it easier for them to pursue different avenues of inquiry.”20 information technology and libraries march 2020 bento box user experience study at franklin university | jaffy 6 comparison with tabbed searching one alternative to the bento box is to offer users a tabbed search box which lets users select a specific resource to search by selecting a tab on the search bar. before the 2018 redesign, the franklin university library’s website provided users with a tabbed search box, as shown in figure 4. our redesigned website reduced the number of tabs from four to two, but we ultimately removed tabbed search from our website because we did not find it effective. figure 4. previous franklin university library website with tabbed search box. tabbed search requires users to decide which tab to use. when the franklin university library provided a tabbed search box we found that users had difficulty identifying which tab they should use. in addition to causing user confusion over which tab to use for their search, because each search tab only searches a portion of library resources, tabbed searching “misses a wide swath of available information and resources [which] will make that missing information practically invisible.”21 another problem with tabbed search is that it requires a library to designate one of the tabs as a default search. this can lessen the chances that users will search (or use) resources in the nondefault tab(s) because library users “tend to favor the most prominent search option.”22 lown, sierra, and boyer cited a study which found that the default option was used 60.5 percent of the time in a tabbed interface, and reported that on the north carolina state university website the default tab was used 73.7 percent of the time.23 tabbed search does not meet library user needs because it is “inconsistent and confusing.”24 when wayne state university switched from tabbed searching to a bento box “many library resources that were previously hidden from search and discovery on the main library website were, for the first time, exposed to all searches, for all users . . . [which] resulted in increased use and awareness of these resources.”25 information technology and libraries march 2020 bento box user experience study at franklin university | jaffy 7 design considerations bento-box results pages are highly customizable. a 2017 review of 38 academic libraries using bento-box search found “much variation in the implementation.”26 a bento-box results page needs to balance providing necessary information with displaying results in a way that is not too overwhelming—or too cluttered. 
the university of michigan analyzed usage of its bento-box results page and redesigned it to improve how it presented results to library users by displaying "[f]ewer results . . . in each section—with a more prominent 'see all' link—than in the original design."27 given user expectations and the challenges previously discussed, libraries must design the results to "maximize the exposure of [their] collection[s] and services, in the most appropriate precedence, while preventing cognitive overload."28 a cluttered results page, with a lack of distinction between categories, will make it difficult for users to find information and will cause confusion rather than ease it. another concern with a bento-box results page occurs when some of the result boxes "end up 'below the fold,' meaning users will need to scroll down to see them. this creates the same problem as a tabbed search box—users don't see results from all library sources."29 because users are less likely to see below-the-fold search results, the bento-box results page needs to prioritize category locations so that the more important results are above the fold (which requires the library to determine the relative importance of search result categories).
user experience study at franklin university
the trial design
during franklin university's fall 2018 and spring 2019 welcome weeks, in addition to providing students with information about the library's services, staff at the library's information table asked students to participate in a trial to help determine whether adopting a bento-box results page would benefit our users. participants were offered a franklin university coffee mug as an incentive. the trial asked participants to look for information on two different library websites: one using eds (franklin university library) and one using a bento box (wayne state university library). we set up two laptops for participants to use and made screen recordings of the participants' actions during their searches for later viewing and analysis. after they finished the tasks we asked participants to fill out a questionnaire (reproduced in appendix a), which had three background questions and six questions about their experience and thoughts relating to the tasks. to decide what information to ask participants to look for, we reviewed library websites that used bento boxes to see what categories they searched. we compared those categories to the types of information available on our library's website. although we identified a number of possible categories, we decided to limit the trial to three tasks because we did not want to overburden participants. based on our experience working with students, we decided on the three categories we were most interested in investigating: articles, books, and journals. to see how users searched for items in these categories we asked participants to complete the following three tasks on each website:
1. find an article available through the library on the topic "criminal justice."
2. find the ebook lean six sigma for leaders by martin brenig-jones and jo dowdall.
3. find the electronic journal business today.
participants
thirty-four people participated in the trial. however, not all of the participants completed all of the tasks on each library's website. we discarded the results from participants who did not attempt at least two tasks on each library's website.
removing those who did not complete at least two tasks on each site left 27 participants ("adjusted" results). unless otherwise noted, the data discussed below refers to the adjusted results. the trial was open to students, faculty, and staff. eleven participants took the trial in fall 2018 and 16 participants took it in spring 2019. most of the participants were undergraduates (21), with some graduate students (6). no doctoral students, faculty, or staff were included in the adjusted results. (one staff member participated and completed a questionnaire but did not perform enough tasks for their results to be included.)
results
we watched the screen recordings to time how long it took students to complete the three tasks on each library's website. however, if a student flipped between the sites while searching instead of first completing all three searches on one site we did not time the results. we also observed what students did while searching to gain an understanding of how they searched for information. overall, students spent less time searching on the site using the bento box (wayne state university library) than they did on the franklin university library site:
• students spent an average of 2 minutes, 35 seconds to complete the tasks on the wayne state site compared to 3 minutes, 28 seconds on franklin's.
• twelve students finished their searches quicker on wayne state's site, while six had quicker results from franklin's.
how students searched for information
the screen recordings showed that students looking for information often went to parts of the library websites which did not contain the content they sought. frequently, they would search whatever part of the site they were on—even if it did not contain the content they needed. on the wayne state site, when a student used an interior search bar the bento box provided results from a wide range of library resources. a search on the journals page would give results for journals, books, articles, and more—even if the page they were on did not contain the resource they needed to find. by contrast, any interior search box found on the franklin university page would only provide results for whatever portion of the library's resources that search box accessed: a search on the journals page would only provide journal results, a search on the catalog page would only provide catalog results, etc. student action after the initial search demonstrates the need for interior search boxes which can search the library's entire site. twelve of 27 students on each site (although not the same 12 students) followed up a search by using a search bar they found on their results page without returning to the homepage to use the main search bar. students did this even when the page they were on did not relate to the content they were looking for. for library users, the division between the content of the library's website and the content provided by the library "is not obvious and makes no real sense."30 the screen recordings of student behavior when searching for content on library websites demonstrated that students also could not distinguish between different areas within the library's website.
search for articles
the first task asked students to find an article on criminal justice. because it was the first task on the list, most students started with this search.
students had more success finding the article on franklin's site than on wayne state's site (18 to 14). although only 14 students were credited with finding the article on wayne state's site, several students actually reached the bento-box page which included a category for article results. however, they did not realize that they had found the article and selected an ebook or database instead.
search for ebooks
the second task asked students to find the ebook lean six sigma for leaders by martin brenig-jones and jo dowdall. students had more success finding this book on the bento-box site. twenty students found the book using the wayne state library's bento-box search, compared to ten who found the book on the franklin university library's site. many students, in searching for the ebook, typed their search on the results page from the previous search. on the wayne state university library's site, this led to a bento-box results page which included the book. on franklin university library's site, the results were more complex. between fall 2018 and spring 2019, our ebsco eds custom catalog was not renewed, which resulted in the search results for the ebook no longer displaying in eds. [this non-renewal was not intentional.] as a result, in fall 2018, the ebook that students looked for appeared in eds search results (on both the franklin university library's main search bar and interior ebsco search boxes); however, in spring 2019 it did not. to see what effect this had, we looked at the more limited search results from the fall 2018 trial, when an eds search on franklin university library's site would successfully find the book. even then, students had more success finding the book on the bento-box site: 9 students successfully found the book on wayne state's site, compared to 7 on franklin university's site. some students on franklin university library's site tried, unsuccessfully, to identify the proper page of the library's site to find "books." instead of the catalog page, they went to a page labeled "textbooks" which helped students locate course reserves. because there was no search option for the entire site on that page, those students did not successfully find the ebook.
search for journals
the third task asked students to look for the journal business today. as with the ebook search, students had more success finding the journal on the bento-box site: 19 students found the journal on wayne state library's site compared to 10 on franklin university library's site. students searched for the journal in a similar manner to the way they searched for the ebook. many just put the journal title in the search bar on the page they were on—if that search bar on franklin university library's site provided access to the journal, they would find it with their search. but if they were on a page which did not have a search which included the journal as a result, they would not. the journal search on the wayne state university library's site demonstrated the "below the fold" problem discussed above because the bento-box result for "journals" was below the fold. as a result, at least one student properly searched for the journal, but did not find it because the result was not visible on the screen and they did not scroll down.
questionnaires
we asked students a series of questions about their experience searching for information on the two library websites (see appendix a for the questionnaire and appendix b for the results). slightly more students preferred the wayne state library's site (14) to franklin university's (12). however, five of the users who preferred franklin university's site referenced their familiarity with that site. four of these users specifically referenced their familiarity with the franklin university library site in response to a question asking "[w]hy did you prefer the type of search you picked," while one referenced their familiarity with franklin university library's site in response to a question asking whether there was "anything else you'd like to tell us." the questionnaire also asked students why they preferred one site to the other and what they liked and disliked about each site.
preference: franklin university library
as mentioned above, many of the comments from those who indicated a preference for the franklin university library site indicated the preference was due to familiarity:
• "might be because i am a bit used to it, i just found it easier to navigate."
• "because it's the one i am familiar with."
• "because i'm familiar with it."
• "i liked both. both easy to use. familiar with franklin's."
other comments favored franklin university library's overall website design (as opposed to search):
• "the website is cleaner. easier to use."
• "easier to navigate."
some of the comments did indicate a preference for the search technique:
• "one search bar to search all types. seemed to include more in search results."
• "it easy to search and straight forward."
• "access to research was quicker on the franklin website. also, i felt like there was more research material available."
preference: wayne state university library
those who preferred the wayne state search did so more based on the search technique than did those who preferred franklin:
• "simple and the search was in one spot."
• "one search bar that pulled from the [catalog]."
• "because you can type in exactly what you were looking for and it comes up."
others appreciated the way search results were displayed:
• "their search system organizes the result by type of information, whereas franklin's website makes you search for the type of material information before displaying the results."
• "better layout breaks articles, journals, etc. into separate columns."
• "wayne had each section (book, e-journal, article) separately which was easier to find."
• "the layout."
still others just found the wayne state library search easier to use:
• "it is more visual and easy to find and easy to use."
• "it presented the information in an easy way to find."
• "easy—all in one."
what search results do library users want?
we asked participants to rank which results they would like to see displayed when searching on the library's site. while most applied a numerical ranking, some just circled items. all questionnaire responses were included when compiling these rankings, including rankings from those who did not complete at least two tasks on each website, because user preferences about what search results they want are valid even if they did not perform the required tasks on each library's website. we converted numerical rankings so that the first choice received six points, the second choice five points, etc.
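a short sketch of this point conversion is given below; it also covers the handling of circled-only responses, which is spelled out in the next paragraph (a ranked item in position r receives 7 - r points, and when a participant only circled n items, those items are treated as the top n choices and each receives the average of those point values). the function names and example data are illustrative only.

```python
def points_for_rank(rank, n_options=6):
    """first choice = 6 points, second choice = 5, and so on."""
    return n_options + 1 - rank

def score_ranked(ranking):
    """ranking maps each option to its rank (1 = first choice)."""
    return {option: points_for_rank(rank) for option, rank in ranking.items()}

def score_circled(circled, n_options=6):
    """circled items are treated as the top choices; each gets their average value."""
    top_points = [points_for_rank(r, n_options) for r in range(1, len(circled) + 1)]
    average = sum(top_points) / len(top_points)
    return {option: average for option in circled}

# two circled answers are treated as a first and second choice: 5.5 points each
print(score_circled(["articles", "journals"]))        # {'articles': 5.5, 'journals': 5.5}
print(score_ranked({"articles": 1, "databases": 3}))  # {'articles': 6, 'databases': 4}
```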
of the 34 participants who answered this question, 24 provided rankings and ten circled items without indicating how they ranked those items. where participants circled items, we converted their responses to a numerical equivalent based on how many answers they circled. if they only circled one, it was treated as the first choice, and given 6 points. if they circled more than one, we combined the numerical value of the answers and each answer received the average value. (for example, if two answers were circled, they were treated as a first and second choice, and each circled answer was given a score of 5.5.) the responses indicate students most wanted library search to provide results for articles and journals, followed by databases and ebooks:
• articles: 144
• journals: 125
• databases: 112
• books/ebooks: 111
• research guides: 69.5
• library site: 63.5
mischo, norman, and schlembach reported actual usage of the university of illinois' bento-box results page by category between 2015 and 2017. how do the categories our users indicated they would like to see on a bento-box results page compare with the actual use of bento-box results at the university of illinois? at the university of illinois, 56.1 percent of click-throughs were for articles (franklin university students' first choice), while 33.6 percent were for book and online catalog content (our students' fourth choice).31 databases, our students' third choice, were not a listed category, while journals (franklin university students' second choice) were only the fifth-most used resource (and the percentage of click-throughs was low, at only 3.6 percent).32
limitations
there were a few issues with the study which should be kept in mind when evaluating the results.
number of participants
thirty-four individuals participated in the trial. after removing results from participants who failed to complete a sufficient number of tasks, only 27 participants remained. while this number is small, it does provide information on what students think and, more importantly, how they act when searching for various types of information on the library's website. examples of library user experience testing based on similar numbers include:
• the university of kansas library conducted "usability testing of our primo discovery interface on basic library searching scenarios for undergraduate and graduate students" and reported results from 27 users.33
• the university of southern mississippi library conducted usability testing of 24 users ("six participants from each of the following library user groups: undergraduate students, graduate students, faculty, and library employees") to evaluate and modify their website.34
• syracuse university conducted usability testing on "ten students . . . and eighteen library staff members."35
familiarity with franklin university library website
student familiarity with the franklin university library website affected student opinion. of 12 students in the adjusted results who preferred franklin university library's site, 5 (41.7 percent) gave an answer indicating that familiarity was a factor in their preference. when considering all the questionnaires (including those from participants who were not included in the adjusted results), 7 out of 17 users (41.2 percent) who preferred franklin university library's site gave an answer referencing familiarity.
as a result, opinion may have been skewed in favor of franklin university, despite wayne state being slightly favored overall. a good illustration of this problem is the response from a student who, the screen recording showed, did not even attempt any of the tasks on the franklin university library website but indicated a preference for the franklin university library site because it's "easy to use."
conclusion
as a result of the user experience study, the franklin university library decided that providing bento-box search results would benefit our library's users. the trial showed that students required less time to conduct their searches using wayne state's bento-box search and found more items successfully on wayne state's site. the lack of student distinction between different types of library content, along with the likelihood of their entering a search in whatever search box they see on a page, further supports providing bento-box results. adopting a bento-box results page will permit the library to provide search boxes on interior pages which permit students to search for materials site-wide. the bento box will let students search for content anywhere on the library's website without requiring them to first figure out what type of library resource they are looking for and then find the correct section of the library's website. additionally, the ebook search issue previously discussed demonstrates the benefits of switching to a bento box. the disappearance of ebook search results from the eds listing would not have mattered with a bento-box-style search because the bento box would have displayed a box for catalog results. comments from two of the students who preferred the wayne state library website demonstrate the benefits of a bento-box format. the bento-box search meant that wayne state's site is "simple and the search was in one spot." it also helps students because "[wayne state's] search system organizes the result by type of information, whereas franklin's website makes you search for the type of material information before displaying the results."
appendix a: questionnaire
about this study
the franklin university library is studying how users search for, and find, information on library websites. the purpose of this study is to ask library users (and potential library users) to search for information on two different library websites and give their opinion on their search experience. you are asked to be a participant as a member of the franklin university community who is a library user, or a potential library user. we hope to have between 20 and 50 people participate in this study. if you agree to participate in this study, you will be asked to look for information to find four different resources on two library websites (franklin university's website and wayne state university's website). you will only be asked to find the information, not to access the information. you will then be asked to fill out a questionnaire providing demographic information and your opinion about library search. as part of this study, your search activity on the websites may be recorded by screen recording software. your participation in this study is anticipated to take about 15 minutes. there are no known risks to participation in the study.
the benefits of participation include helping the library to better serve its users by identifying how users search for information on the library's website. in return for your participation in this survey, you will receive a franklin university coffee mug. this study is conducted anonymously—no personally identifiable information will be collected. your participation in this survey is voluntary. if you decide to participate, you have the right to refuse to answer any of the questions that make you uncomfortable. you also have the right to withdraw from this study at any time, with no repercussions. this research has been reviewed and approved by the franklin university institutional review board. for questions regarding participants' rights, you can contact the institutional review board at irb@franklin.edu.
please answer the following background questions:
1) what do you do at franklin university (circle all that apply)? a) non-degree-seeking student b) undergraduate student c) master student d) doctoral student e) staff f) faculty
2) how often do you use the library's website (circle the best choice)? a) frequently (every week) b) occasionally (every month) c) rarely (less than once a month) d) never
3) how often do you use the search function on the library's website? a) frequently b) occasionally c) rarely d) never
please answer the following questions about your experience and preferences when searching for information on library websites:
1) which library's search did you prefer: a) franklin university library b) wayne state university library
2) why did you prefer the type of search you picked in the answer to question 1?
3) for franklin university's search results: a) what did you like? b) what didn't you like?
4) for wayne state university's search results: a) what did you like? b) what didn't you like?
5) please rank in order of preference what search results you would want to see displayed when searching on the library's website: a) articles related to a topic b) books / ebooks c) databases d) library website e) journals f) research guides g) other (please list):
6) is there anything else you'd like to tell us about your experience looking for information on these library websites?
appendix b: adjusted questionnaire results
below are the results from participants who completed at least two tasks on each university library's website. where the screen recordings indicated that participants did not complete at least two tasks on each of the websites the questionnaire responses were not recorded.
what do you do at franklin university (circle all that apply)? undergraduate: 21; masters: 6
how often do you use the library's website (circle the best choice)? occasionally: 9; frequently: 12; rarely: 6
how often do you use the search function on the library's website? occasionally: 7; frequently: 13; rarely: 7
which library's search did you prefer? franklin university: 12; wayne state university: 14; n/a: 1
appendix c: screen recording results
analysis of screen recordings from participants who completed at least two tasks on each university library's website. for timed results, we did not include the results of students who flipped between library websites while completing the tasks. (an example of flipping between sites occurred when a student found the article on the franklin university library site, then looked for it on the wayne state university library site before looking for the ebook on the franklin university site.)
time to complete tasks (average): franklin university library site: 3:28; wayne state university library site: 2:35
site where student finished search quicker: franklin university library site: 6; wayne state university library site: 12
total items found: franklin university library site: 38; wayne state university library site: 53
articles found: franklin university library site: 18; wayne state university library site: 14
books found: franklin university library site: 10; wayne state university library site: 20
journals found: franklin university library site: 10; wayne state university library site: 19
endnotes
1 cole hudson and graham hukill, "one-to-many: building a single-search interface for disparate resources," in k. varnum (ed.), exploring discovery: the front door to your library's licensed and digitized content (chicago: ala editions, 2016): 146.
2 suzanna conrad and nathasha alvarez, "conversations with web site users: using focus groups to open discussion and improve user experience," journal of web librarianship 10, no. 2 (april 2016): 71, https://doi.org/10.1080/19322909.2016.1161572.
3 scott hanrath and miloche kottman, "use and usability of a discovery tool in an academic library," journal of web librarianship 9, no. 1 (january 2015): 4, https://doi.org/10.1080/19322909.2014.983259.
4 irina trapido, "library discovery products: discovering user expectations through failure analysis," information technology and libraries 35, no. 3 (2016): 22, https://doi.org/10.6017/ital.v35i3.9190.
5 william mischo, michael norman, and mary schlembach, "innovations in discovery systems: user studies and the bento approach," proceedings of the charleston library conference (2017): 299, https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1991&context=charleston.
6 hudson and hukill, "one-to-many," 142.
7 emily singley, "to bento or not to bento—displaying search results," http://emilysingley.net/usablelibraries/to-bento-or-not-to-bento-displaying-search-results/.
8 jonathan rochkind, "article search improvement strategy," https://bibwild.wordpress.com/2012/10/02/article-search-improvement-strategy/.
9 hudson and hukill, "one-to-many," 145.
10 rochkind, "article search improvement strategy."
11 hudson and hukill, "one-to-many," 142.
12 nancy turner, "librarians do it differently: comparative usability testing with students and library staff," journal of web librarianship 5, no. 4 (october 2011), https://doi.org/10.1080/19322909.2011.624428; elena azadbakht, john blair, and lisa jones, "everyone's invited: a website usability study involving multiple library stakeholders," information technology & libraries 36, no. 4 (2017): 43, https://doi.org/10.6017/ital.v36i4.9959.
13 colleen kenefick and jennifer a. devito, "google expectations and interlibrary loan: can we ever be fast enough?" journal of interlibrary loan, document delivery & electronic reserves 23, no. 3 (july 2013): 158, https://doi.org/10.1080/1072303x.2013.856365.
14 kenefick and devito, "google expectations and interlibrary loan," 157; carol diedrichs, "discovery and delivery: making it work for users," serials librarian 56, no. 1-4 (january 2009): 81, https://doi.org/10.1080/03615260802679127.
15 singley, "to bento or not to bento."
16 singley.
17 singley.
18 hudson and hukill, "one-to-many," 146.
19 singley, "to bento or not to bento."
20 eric phetteplace and jeremy darrington, "a hybrid approach to discovery services," reference & user services quarterly 53, no. 4 (2014): 293.
21 cory lown, tito sierra, and josh boyer, "how users search the library from a single search box," college & research libraries 74, no. 3 (2013): 229.
22 singley, "to bento or not to bento."
23 lown, sierra, and boyer, "how users search the library from a single search box."
24 hudson and hukill, "one-to-many," 150.
25 hudson and hukill, 150.
26 mischo, norman, and schlembach, "innovations in discovery systems," 304.
27 suzanne chapman et al., "manually classifying user search queries on an academic library web site," journal of web librarianship 7, no. 4 (october 2013): 419, https://doi.org/10.1080/19322909.2013.842096.
28 aaron tay and feng yikang, "implementing a bento-style search in libguides v2," code4lib journal no. 29 (july 2015), https://journal.code4lib.org/articles/10709.
29 singley, "to bento or not to bento."
30 chapman et al., "manually classifying user search queries," 406.
31 mischo, norman, and schlembach, "innovations in discovery systems."
32 mischo, norman, and schlembach.
33 hanrath and kottman, "use and usability of a discovery tool," 5.
34 azadbakht, blair, and jones, "everyone's invited," 34.
35 turner, "librarians do it differently," 290.
public libraries leading the way
harnessing the power of orcam
mary howard
information technology and libraries | september 2020. https://doi.org/10.6017/ital.v39i3.12637
mary howard (mhoward@sccl.lib.mi.us) is reference librarian, library for assistive media and talking books (lamtb), at the st. clair county library, port huron, michigan. © 2020.
library for assistive media and talking books (lamtb) services are located at the main branch of the st. clair county library system. lamtb facilitates resources and technologies for residents of all ages who have visual, physical, and/or reading limitations that prevent them from using traditional print materials. operating out of port huron, michigan, we encounter many instances where we need to provide assistance above and beyond what a basic library may offer. we host talking book services which provide free players, cassettes, braille titles, and downloads to users who are vision or mobility impaired. we also have a large and stationary kurzweil reading machine that converts print to speech, video-enhanced magnifiers, and large print books. we also provide home delivery service for patrons who are unable to travel to branches. the library has been searching for a more technology-forward focus for our patrons. the state's talking books center in lansing set up an educational meeting at the library of michigan in 2018 to see a live demonstration of the orcam my eye reader. this was the innovation we were seeking, and i was thoroughly impressed with the compact and powerful design of the reader, the ease of use, and the stunningly accurate feedback provided by this ai reading assistive device. users are able to read with minimal setup and total control. orcam readers are lightweight, easily maneuverable assistive technology devices for users who are blind, visually impaired, or have a reading disability, including children, adults, and the elderly. the device automatically reads any printed text: newspapers, money, books, menus, labels on consumer products, text on screens or smartphones, etc. the orcam reader will repeat back any text immediately and is fit for all ages and abilities. orcam works with english, spanish, and french languages and can identify money and other business and household items. it can be placed near either the left or right ear. users can easily adjust the volume and speed of the read text. it can be attached to either the left or right temple of your glasses using a magnetic docking device. having a diverse group of users with different needs use the reader as they like is one of the more impressive offerings. changing most settings is normally facilitated with just a finger swipe on the orcam device.
the mission of orcam is to develop a "portable, wearable visual system for blind and visually impaired persons, via the use of artificial computer intelligence and augmented reality." by offering these devices to our sight-, mobility-, or otherwise impaired patrons, we open up the world of literacy, discovery, and education. some of our users are not able to read in any other fashion, and the orcam provides a much-needed boost to their learning profile. we secured a grant from the institute of museum and library services (imls) for the purchase of the readers (cfda 45.310). we also worked with orcam to get lower pricing for these units. normally they retail for $3,500 but we were able to move this to the lower price point of $3,000. we also were awarded a $22,106 improving access to information grant from the library of michigan to fund the entire purchase. without this funding stream we would not have been able to secure the orcam. however, if you have veterans in your service area please contact the company, since there is availability for va health coverage for low vision or legally blind veterans who may qualify to receive an orcam device, fully paid for by the va. please visit https://orcam.com/en/veterans for more information.
figure 1. close-up of the orcam device.
the grant was initially set to run from september 2019 to september 2020. we purchased six orcam readers for our library users, and they were planned to be rotated among our twelve branches throughout this grant cycle. however, due to the pandemic and out of safety concerns for staff and visitors, our library was closed from march 23 to june 15 and we were only able to offer it to the public at six branches. as of july 14, 2020, we are projecting that we may open to the public in september, but covid-19 issues could halt that. we have had to make arrangements with the grantor to extend the period for the usage of the orcam from september to december. this will make up for some of the lost time and open a path for the other six libraries to have their turn offering the orcam to their patrons. the interesting aspect of this is we now have to take our technology profile even further by offering remote training to prospective orcam users. thankfully, the design and rugged housing of the reader make it easy to clean and maintain, but the social distancing can prove to be intrusive for training. to set up a user you need to be within a foot or two of them, staying very close in order to get them used to how the orcam reads. there is a lot of directing involved and close contact between the user and instructor. we will use a workaround of providing distance instruction, including in-person and remote training. orcam also has a vast array of instructional videos that we will have cued up for users. we have had over 150 residents attend presentations, demonstrations, and talks on the orcam. i anticipate that this number will not be achieved for the second round; however, we may be more successful in our online presence since we can add the instruction to our youtube page, offer segments on facebook and other social media, and provide film clips for our webpage. the situation has been difficult, but it has opened up lamtb services to think about how we should be working to provide better and more remote service to our users.
since we cover over 800 square miles in the county, becoming more adaptable in serving our patrons has become a paramount area of work for the library. the orcam will bring a new way of remote training to our patrons, which will raise awareness of the reader and how it can benefit users. the st. clair county library system would like to thank the institute of museum and library services for supporting this program. the views, findings, conclusions or recommendations expressed in this article do not necessarily represent those of the institute of museum and library services.
what technology skills do developers need? a text analysis of job listings in library and information science (lis) from jobs.code4lib.org
monica maceli
information technology and libraries | september 2015
monica maceli (mmaceli@pratt.edu) is assistant professor, school of information and library science, pratt institute, new york.
abstract
technology plays an indisputably vital role in library and information science (lis) work; this rapidly moving landscape can create challenges for practitioners and educators seeking to keep pace with such change. in pursuit of building our understanding of currently sought technology competencies in developer-oriented positions within lis, this paper reports the results of a text analysis of a large collection of job listings culled from the code4lib jobs website. beginning more than a decade ago as a popular mailing list covering the intersection of technology and library work, the code4lib organization's current offerings include a website that collects and organizes lis-related technology job listings. the results of the text analysis of this dataset suggest the currently vital technology skills and concepts that existing and aspiring practitioners may target in their continuing education as developers.
introduction
for those seeking employment in a technology-intensive position within library and information science (lis), the number and variation of technology skills required can be daunting. the need to understand common technology job requirements is relevant to current students positioning themselves to begin a career within lis, those currently in the field who wish to enhance their technology skills, and lis educators. the aim of this short paper is to highlight the skills and combinations of skills currently sought by lis employers in north america through textual analysis of job listings. previous research in this area explored job listings through various perspectives, from categorizing titles to interviewing employers;1,2 the approach taken in this study contributes a new perspective to this ongoing and highly necessary work. this research report seeks a further understanding of the following research questions:
• what are the most common job titles and skills sought in technology-focused lis positions?
• what technology skills are sought in combination?
• what implications do these findings have for aspiring and current lis practitioners interested in developer positions?
as detailed in the following research method section, this study addresses these questions through textual analysis of relevant job listings from a novel dataset—the job listings from the code4lib jobs website (http://jobs.code4lib.org/). code4lib began more than a decade ago as an electronic discussion list for topics around the intersection of libraries and technology.3 over time, the code4lib organization expanded to an annual conference in the united states, the code4lib journal, and most relevant to this work, an associated jobs website that highlights jobs culled from both the discussion list and other job-related sources. figure 1 illustrates the home page of the code4lib jobs website; the page presents job listings and associated tags, with the tags facilitating navigation and viewing of other related positions. users may also view positions geographically or by employer.
figure 1. homepage of the code4lib jobs website, displaying most-recently posted jobs and the associated tags.4
in addition to the visible user interface for job exploration, the website consists of software to gather the job listings from a variety of sources. the website incorporates jobs posted to the code4lib discussion list, american library association, canadian library association, australian library and information association, highered jobs, digital koans, idealist, and archivesgig. this broad incoming set of jobs provides a wide look into new technology-related postings.
new job listings are automatically added to a queue to be assessed and tagged by human curators before posting. this allows manual intervention where a curator assesses whether the job is relevant to technology in the library domain and to validate the job listing information and metadata (see figure 2). curating is done on a volunteer basis, and curators are asked to assess whether the position is relevant to the code4lib community, if it is unique, and to ensure that it has an associated employer, set of tags, and descriptive text. combining both software processes and human intervention in the job assessment results in the ability to gather a large number of jobs of high relevance to the code4lib community. as mentioned earlier, code4lib's origins are in the area of software development and design as applied in lis contexts. these foci mean that most jobs identified as relevant for inclusion in the code4lib jobs dataset are oriented toward developer activities. the code4lib jobs website therefore provides a useful and novel dataset within which to understand current employment opportunities relating to the intersection between technology—particularly developer work—and the lis field.
figure 2. code4lib job curators interface where job data is validated and tags assigned.5
research method
to analyze the job listing data in greater depth, a textual analysis was conducted using the r statistical package, exploring job titles and descriptions.6 first, the job listing data from the most recent complete year (2014) were dumped from the database backend of the code4lib jobs website; this dataset contained 1,135 positions in total. the dataset included the job titles, descriptions, location and employer information, as well as tags associated with the various positions. the text was then cleaned to remove any markup tags or special characters that remained from the scraping of listings. finally, the tm (text mining) package in r was used to calculate term frequencies and correlations, generate plots, and cluster terms across both job titles and descriptions.7
results
job title analysis
of the full set of 1,135 positions, 30 percent were titled as a librarian position; popular specialties included systems librarian and various digital collections and curation-oriented librarian titles. figures 3 and 4 detail the most common terms used in position titles across librarian and nonlibrarian positions.
figure 3. most common terms used in librarian position titles.
figure 4. most common terms used in nonlibrarian position titles.
the most popular job title terms were then clustered using ward's agglomerative hierarchical method (dendrogram in figure 5). agglomerative hierarchical clustering, of which ward's method is widely used, begins first with single-item clusters, then identifies and joins similar clusters until the final stage in which one larger cluster is formed. commonly used in text analysis, this allows the investigator to explore datasets in which the number of clusters is not known before the analysis. the dendrograms generated (e.g., figure 5) allow for visual identification and interpretation of closely related terms representing various common positions, e.g., digital librarian, software engineer, collections management, etc. given that job titles in listings may include extraneous or infrequent words, such as the organization name, the cluster analysis can provide an additional view into common job titles across the full dataset in a more generalized fashion.
figure 5. cluster dendrogram of terms used in job titles generated using ward's agglomerative hierarchical method.
tag analysis
as described earlier, the code4lib jobs website allows curators to validate and tag jobs before listing. the word cloud in figure 6 displays the most common tags associated with positions, with xml being the most popular tag (178 occurrences). figure 7 contains the raw frequency counts of common tags observed.
figure 6. word cloud of most frequent tags associated with job listings by curators.
figure 7. frequency of commonly occurring tags (frequency of fifty occurrences or more) in the 2014 job listings.
job description analysis
the job description text was then analyzed to explore commonly co-occurring technology-related terms, focusing on frequent skills required by employers. figures 8, 9, and 10 plot term correlations and interconnectedness. terms with correlation coefficients of 0.3 or higher were chosen for plotting; this common threshold broadly included terms with a range of positive relationship strength from moderate to strong. plots were created to express correlations around the top five terms identified from the tags: xml, javascript, php, metadata, and html (frequencies in figure 7). any number of terms and frequencies can be plotted from such a dataset; to orient the findings closely around the job listing text, a focus on the top terms was chosen. these plots illustrate the broader set of skills related to these vital competencies represented in the job listings.
figure 8. job listing terms correlated with "xml" (most popular tag).
figure 9. job listing terms correlated with "javascript" (second most popular tag), including "php" and "html" (third and fifth most popular tags, respectively).
figure 10. job listing terms correlated with "metadata" (fourth most popular tag).
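the analysis itself was carried out in r with the tm package, as described in the research method section. the sketch below reproduces the same general workflow (building a document-term matrix, counting term frequencies, filtering pairwise term correlations at the 0.3 threshold, and clustering terms with ward's method) in python with scikit-learn and scipy; the tiny sample corpus and the choice of libraries are illustrative only, not the original code.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from scipy.cluster.hierarchy import linkage, dendrogram

# stand-in for the 1,135 job titles/descriptions -- illustrative only
docs = [
    "web developer javascript php html css",
    "digital initiatives librarian metadata xml",
    "systems librarian integrated library system sql",
    "metadata librarian xml marc dublin core",
]

# document-term matrix: rows are listings, columns are terms
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(docs).toarray()
terms = list(vectorizer.get_feature_names_out())

# raw term frequencies across the collection (cf. figure 7)
frequencies = dict(zip(terms, dtm.sum(axis=0)))

# pairwise term correlations; keep partners at or above the 0.3 threshold
corr = np.corrcoef(dtm, rowvar=False)
target = terms.index("xml")
related = [
    (terms[j], round(float(corr[target, j]), 2))
    for j in range(len(terms))
    if j != target and corr[target, j] >= 0.3
]
print("terms correlated with 'xml':", related)

# ward's agglomerative clustering of term vectors (cf. the figure 5 dendrogram)
links = linkage(dtm.T, method="ward")
dendrogram(links, labels=terms, no_plot=True)
```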
finally, a series of general plots was created to visualize the broad set of skills necessary in fulfilling the positions of interest to the code4lib community. as detailed in the title analysis (figures 3 and 4), apart from the generic term librarian, the two most common terms across all job titles were digital and developer. correlation plots were created to detail the specific skills and requirements commonly sought in positions using such terms. figure 11 illustrates the terms correlated with the general term of developer, while figure 12 displays terms correlated with digital. the implications of these findings will be discussed further in the following discussion section.
figure 11. job listing terms correlated with "developer."
figure 12. job listing terms correlated with "digital."
discussion
taken as a whole, the job listing dataset covered a quite dramatic range of positions, from highly technical (e.g., senior-level software engineer or web developer) to managerial and leadership roles (e.g., director or department head roles centered on digital services or emerging technologies). these findings support the suggestions of earlier research,8 which advocated for lis graduate programs to build their offerings not just in technology skills but also in technology management and decision-making. however, the code4lib jobs dataset is a one-dimensional view into the employment process and is focused largely on the developer perspective. additional contextual information, including whether suitable candidates were easily identified and if the position was successfully filled, would provide a more complete view of the employment process. prior research has indicated that many technology-related positions in lis are in fact difficult to fill with lis graduates.9 while lis graduate programs have made great strides in increasing the number of courses and topics covered that address technology, these improvements may not benefit those already in the field or wishing to shift towards a more technology-focused position.
in the common tags and terms analysis, experience with specific lis applications was relatively infrequently required, with the drupal content management system a notable exception. more generalizable programming languages or concepts, e.g., python, relational databases, xml, etc., were favored. as with technology positions outside of the lis domain, employers likely seek those with the ability to flexibly apply their skills across various tools and platforms. this may also relate to the above challenges in filling such positions with lis graduates, with the goal of opening up the position to a larger technologist applicant base.
common web technologies popular in the open-source software often favored by lis organizations continued to dominate, with a clear preference for candidates well versed in html, css, javascript, and php. relating to these skills, web development and design practices were often intertwined, with positions requesting both developer-oriented skillsets as well as interface design (e.g., figure 7). technologies supporting modern web application development and workflow management were evident as well, e.g., common requirements for experience with versioning systems such as git, popular javascript libraries, and development frameworks. also striking was the richness of the terms correlated with metadata (figure 10), including mention of growing areas of expertise, such as linked data.
interestingly, the general correlation plots expressing the common terms sought around "digital" and "developer" positions were quite varied. while the developer plot (figure 11 above) provided a richly technical view into common technologies broadly applied in web and software development, the terms correlated around digital were notably less technical (figure 12 above). while there was a clear focus on digital preservation activities and common standards in this area, mention of terms such as "grant" indicated that these positions likely have a broad role. the term digital was frequently observed in librarian job titles, so these roles may be tasked with both technical and administrative work.
finally, there are inherent difficulties in capturing all jobs relating to technology use in the lis domain that introduce limitations into this study. while the incoming job feeds attempt to broadly capture recent job posts, it is possible that jobs are missed or overlooked by the job curators. given the lack of one centralized job-posting source regardless of the field, this is a common challenge to research work attempting to assess every job posting. and as mentioned above, there is also a lack of corresponding data as to whether these jobs are successfully filled and what candidate backgrounds are ultimately chosen (i.e., from within or outside of lis).
conclusion
this assessment of the in-demand technology skills provides students, educators, and information professionals with useful direction in pursuing technology education or strengthening their existing skills. there are myriad technology skills, tools, and concepts in today's information environments. reorienting the pursuit of knowledge in this area around current employer requirements can be useful in professional development, new course creation, and course revision. the constellations of correlated skills presented above (figures 8–12) and popular job tags (figure 7) describe key areas of technology competencies in the diverse areas of expertise presently needed, from web design and development to metadata and digital collection management.
in addition to the results presented in this paper, the code4lib jobs website provides a continuously current view into recent jobs and related tags; these data can help those in the lis field orient professional and curricular development toward real employer needs.

acknowledgements

the author would like to thank ed summers of the maryland institute for technology in the humanities for generously providing the jobs.code4lib.org dataset for analysis.

references

1. janie m. mathews and harold pardue, "the presence of it skill sets in librarian position announcements," college & research libraries 70, no. 3 (2009): 250–57, http://dx.doi.org/10.5860/crl.70.3.250.
2. vandana singh and bharat mehra, "strengths and weaknesses of the information technology curriculum in library and information science graduate programs," journal of librarianship & information science 45, no. 3 (2013): 219–31, http://dx.doi.org/10.1177/0961000612448206.
3. "about," code4lib, accessed january 6, 2014, http://jobs.code4lib.org/about/.
4. "code4lib jobs: all jobs," code4lib jobs, accessed january 12, 2015, http://jobs.code4lib.org/.
5. "code4lib jobs: curate," code4lib jobs, accessed january 17, 2015, http://jobs.code4lib.org/curate/.
6. r core team, r: the r project for statistical computing, 2014, http://www.r-project.org/.
7. ingo feinerer and kurt hornik, "tm: text mining package," 2014, http://cran.r-project.org/package=tm.
8. meredith g. farkas, "training librarians for the future: integrating technology into lis education," in information tomorrow: reflections on technology and the future of public & academic libraries, ed. rachel singer gordon (medford, nj: information today, 2007), 193–201.
9. mathews and pardue, "the presence of it skill sets in librarian position announcements."

articles

is creative commons a panacea for managing digital humanities intellectual property rights?

yi ding

information technology and libraries | september 2019

yi ding (yi.ding@csun.edu) is online instructional design librarian and affordable learning $olutions co-coordinator, california state university, northridge.

abstract

digital humanities is an academic field that applies computational methods to explore topics and questions in the humanities. digital humanities projects, as a result, consist of a variety of creative works different from those in traditional humanities disciplines. born to provide free, simple ways to grant permissions to creative works, creative commons (cc) licenses have become top options for many digital humanities scholars to handle intellectual property rights in the us. however, there are limitations of using cc licenses that are sometimes unknown to scholars and academic librarians.
by analyzing case studies and influential lawsuits about intellectual property rights in the digital age, this article advocates for a critical perspective on copyright education and provides academic librarians with specific recommendations about advising digital humanities scholars to use cc licenses with four limitations in mind: 1) the pitfall of a free license; 2) the risk of irrevocability; 3) the ambiguity of noncommercial and nonderivative licenses; 4) the dilemma of sharealike and the open movement.

introduction

along with an increasing amount of digital scholarship, open access became a preferred, more affordable model for scholarly communication in the us.1 in particular, digital humanists envision a sharing culture in which digital content and tools can be widely distributed through open-access licenses.2 creative commons (cc) licenses, with their promise to provide simple ways to grant permissions to creative works, became top options for many digital humanities scholars to handle intellectual property rights in the us. however, creative commons is not a panacea for managing the intellectual property rights of digital scholarship. digital humanities projects usually consist of complicated components, and their intellectual property rights involve various licenses and stakeholders. with misunderstandings of intellectual property and cc licenses, many scholars are not fully aware of the implications of using cc licenses, which cannot provide legal solutions to all intellectual property rights issues. the increasingly popular application and commercialization of digital humanities projects in the us further complicate the issue. based on case studies and influential lawsuits involving this topic in the us, this article critically investigates the limitations of using cc licenses and recommends that academic librarians provide scholars with more sophisticated advice on using cc licenses as well as education on intellectual property rights in general.

literature review

usually identified as rights experts, academic librarians are in a unique position to provide copyright education in the digital humanities field through consultation, instruction, and other means to faculty and students.3 librarians sometimes position themselves as "reuse evangelists" who embrace the vision of creative commons by applying cc licenses as well as introducing cc licenses to the campus community through guides and webpages.4 yet few discussions have been raised about the limitations of cc licenses in the library community.5 drawing from scholarly literature in the law field and primary sources including lawsuits, websites, magazine articles, and newspaper articles involving this topic, this article intends to bring a critical perspective into the copyright education academic librarians provide by analyzing the four limitations of cc licenses in managing the intellectual property rights of digital humanities projects.

in the law community, scholars have examined the limitations of open licensing and creative commons. katz elaborated on the mismatch between the vision of creative commons and its licensors as well as how the incompatibility of cc licenses may result in potential detriment to the dissemination of knowledge.6 scholars have since referred to katz in extensive discussions of the limitations of cc licenses in different realms of copyrighted works.
for example, johnson investigated several limitations of cc licenses for entertainment media, including those with sharealike, noncommercial, and nonderivative licenses.7 lukoseviciene acknowledged the efficiency of cc licenses while pointing out its limitation in ensuring equity in a sharing culture.8 when discussing the problems of cc licenses in data sharing, khayyat and bannister echoed katz’s critique on the limitation of cc licenses in combining copyrighted works with different types of licenses.9 scholars have also addressed problems related to intellectual property rights other than copyright when applying cc licenses. for example, hietnanen discussed the problems of license interpretation and concluded that although cc licenses are useful for “low value high volume licensing,” it fails to address some important intellectual property rights including privacy and moral rights.10 burger demonstrated how cc commercial licenses have encouraged publicity right infringement in several cases.11 nevertheless, none of the above scholars discussed the implication of the limitations of cc licenses in digital scholarship. to solve the problem of excessive open-source licenses, gomulkiewicz suggested a license-selection “wizard” modeling what creative commons offers, which demonstrates the limitation of cc licenses in managing the intellectual property rights of codes, a common component of many digital humanities projects.12 this article does not aim to conduct a comprehensive assessment of pitfalls of cc licenses in digital scholarship or make legal recommendations to manage the intellectual property rights of digital humanities projects. rather, it discusses the four limitations of cc licenses that are usually overlooked but essential for academic librarians to educate patrons in the digital humanities field. with the development of the digital humanities field and more students involved in it, academic librarians should educate both faculty scholars and emerging scholars about implications of applying cc licenses.13 information technology and libraries | september 2019 36 four limitations of cc licenses is creative commons really free?—the pitfall of a free license one major reason that scholars and institutions are using cc licenses is the ease of applying them to creative works. the directory of open access journals (doaj), which is regarded as “both an important mode of discovery and marker of legitimacy within the world of open access publishing,” now recommends cc licenses as a best practice.14 doaj explicitly encourages scholars to use creative commons’ “simple and easy” license chooser tool. indeed, the creative commons website provides scholars and institutions a very user-friendly way to select and apply a license to copyrightable works.15 anyone can place a cc license on a work by copying and pasting from its website. however, this oversimplified process of handling intellectual property rights of creative works may mislead both copyright owners and copyrighted works users to overlook pitfalls of this free license, including unintentional copyright and other intellectual property rights infringements. more specifically, one prominent legal formality of cc licenses is that licensees do not need to pay to register with creative commons to apply a cc license. as indicated by creative commons website, a cc license is legally valid as soon as a user applies it to any material the user has the legal right to license. 
creative commons also does not “require registration of the work with a national copyright agency.”16 while copyright protection is automatic the moment a work is created and “fixed in a tangible form,” there are various advantages to register copyrighted works through the united states copyright office to establish a public record of the copyright claim.17 one foremost important advantage of copyright registration is that copyright owners can file an infringement suit of works of u.s. origin in court. actually, filing a registration before or within five years of publishing a work will actually put the copyright owner in a stronger position in court to validate the copyright.18 additionally, copyright registration enables one to get awarded statutory damages and attorney’s fees and to gain protection against the importation of infringing copies.19 the emphasis on a free-to-use license along with the lack of clarification of the functions of copyright registration on the website of creative commons may not only mislead scholars to ignore important legal formalities within the copyright law, but also increase the abuse of original materials by stakeholders such as predatory publishers. one example is how the integrated study of marine mammals repackaged existing articles taking advantage of the creative commons licenses used by plos one, which has been publishing articles on digital humanities. 20 the oversimplified process of using cc licenses advocated by creative commons website may also prevent licensors from double-checking or clarifying if they have the legal right to license a work. in 2013, persephone magazine, which used an image with a creative commons license, was later sued for $1,500 for using it. it turned out the photo did not belong to the person who uploaded it with a cc license, which led to 73 companies who used it being sued. persephone magazine claimed that $1,500 was more than its entire advertising revenue for the year and it had to ask its users to donate just to keep the site going.21 therefore, scholars of digital humanities projects, which usually include different types of content such as artworks and photographs, should be wary of using cc licensed images. otherwise, a freely available license might end up costing a scholar unexpected money and energy. in the is creative commons a panacea for managing digital humanities ip rights? | ding 37 https://doi.org/10.6017/ital.v38i3.10714 meantime, when deciding to put their projects under cc licenses or to publish their works in a journal that requires cc licenses, scholars should also be reminded to make accurate and clear copyright statements to prevent innocent infringements of other copyright owners ’ works. for example, a team of art historians who create an online map of architectures in ancient china are very likely to use and critique other people’s images in digital projects under fair use. these digital humanists should cite image sources and clarify the scope of the cc license that they apply to their project. it is understandable that in order to promote an open, sharing culture, the application of a cc license is intentionally designed to be simple and free by creative commons to fulfill its mission. however, the misuse of a free license can lead to false licenses and more innocent infringements and ultimately costs. 
academic librarians should become aware of these pitfalls and provide more in-depth training on cc licenses to scholars, especially by collaborating with campus centers of digital humanities or language and literature faculty as well as other institutional research support departments as suggested by fraser and arnhem.22 is creative commons really safe?—the risk of irrevocability similar to the pitfall of inaccurate licenses, the irrevocability of cc licenses can also be problematic. a “revocable” license is one that can be terminated by the licensor at any time during the term of the license agreement. an “irrevocable” license, on the other hand, cannot be terminated if there is no breach. all cc licenses are irrevocable.23 licenses and contracts usually have effective date of termination and even if they don’t have one, most courts hold that simple, nonexclusive licenses with unspecified durations that are silent on revocability are revocable at will.24 as a result, the irrevocability of cc licenses can be easily overlooked by cc licensors. this means that while in traditional academic publishing and other means of the dissemination of research, scholarly, and creative output, a scholar will be able to revise the copyright agreement he or she has established with a publisher or a scholarly communication venue due to the usually clear rules on termination dates and revocability, it is impossible to revoke a cc license. this discrepancy of the revocability between traditional copyright agreements and cc licenses may put copyright owners at disadvantage especially because many of them apply noncommercial cc licenses. copyright experts have warned scholars to keep in mind that once a “nonexclusive license,” which cc noncommercial licenses are, has been chosen to grant one’s work, the scholar has lost potential opportunities to “license the same work on an exclusive basis,” which is the case in the commercialization of a digital humanities work.25 we can understand this pitfall of the irrevocability of cc licenses in a case in late 2014. a plan by yahoo to begin selling prints of images uploaded to flickr was met with anger by users, even though yahoo only used photos with creative commons licenses that explicitly allowed commercial uses. although yahoo’s use of cc licensed works was legal, users who initially applied cc licenses with commercial use would not have wanted the company making canvas prints from the photos they posted to flickr to make money.26 should these copyright owners understand better the irrevocability of cc licenses, they might have chosen a different type of cc license with caution. bill of rights, a community of people advocating for protecting the intellectual property rights of artists, even called this kind of commercial use “abuse.”27 although most digital scholars, like those flickr users, have a genuine interest in making their works available to as many people as possible, it can be hard to gauge their reactions to all unforeseen outcomes of applying cc licenses to their works. therefore, scholars need more institutional support and education to information technology and libraries | september 2019 38 become aware of the irrevocability of cc licenses when managing the intellectual property rights of their digital scholarship projects. this institutional awareness-building is especially important because of the lack of support from creative commons. irrevocability is listed in the “considerations before licensing” section on the website of creative commons. 
however, scholars may easily overlook the irrevocability feature of cc licenses due to two reasons. first, the 6,500-plus-word “considerations before licensing” section is not a mandatory step to go through for licensors. it is simply a clickable link from the “choose a license” webpage of creative commons.28 second, although every cc license consists of three layers, the lawyer-readable legal code, the human-readable deed, and the machine-readable code, the irrevocability of cc licenses can be easily buried in those texts when a layperson without any experience or training of cc license look for the simplest way to promote and expose their works as much as possible.29 some may suggest putting everything under noncommercial use. however, it is not an option for some platforms and is even discouraged by some digital scholarship repositories. for example, the open access scholarly publishers association strongly encourages the use of the cc-by license wherever possible.30 the rationale behind the recommendation is the hope to make scientific findings available for innovations as well as to make open-access journals sustainable with sufficient profit to operate. driven by the same objectives, cc-by has become the gold standard for oa publishing. the three largest oa publishers (biomed central, plos, and hindawi) all use this license.31 in particular, the often multimedia and viable characteristics of digital humanities projects can expose them to even more infringement issues in the future. one example of this is romelab, a project focusing on the recreation of the roman forum, and its website is made up of multiple separate components. the project’s website is constructed with the drupal content management system, and is integrated with a 3d virtual world component, where users can access the romelab website and walk through the virtual space of rome itself. romelab is currently under a creative commons attribution-noncommercial license. as a project funded by the mellon foundation, romelab is required to offer “nonexclusive, royalty-free, worldwide, perpetual, irrevocable license to distribute” its data. 32 however, it is never clear to the researchers creating the site how to release the data that only work within the proprietary software unity engine that they used to produce the virtual space and more importantly, all the 3d models and pictures. simply putting the whole site under the creative commons attribution-noncommercial license doesn’t automatically make its research data accessible by the public. in this case, the irrevocability of cc licenses further complicates the issue of cc licenses being oversimplified. specifically, since the romelab website is also equipped with a chat feature and a multiplayer function, allowing multiple users to interact with each other, the project has a great potential to make profit if repurposed as a teaching tool and even an educational game in the future. whether or not researchers of romelab manage to make their research data publicly available, cc licenses are not a panacea to handle conflicting data release expectations and intellectual property rights of unity engine and mellon foundation. it is therefore recommended that digital scholars consider various data types and licensing options before exclusively applying irrevocable cc licenses to their creative works. 
moreover, if the creator of romelab wants to produce a virtual introduction of the 3d world of the project, he should take into consideration of the limitation of cc licenses before disseminating his is creative commons a panacea for managing digital humanities ip rights? | ding 39 https://doi.org/10.6017/ital.v38i3.10714 work via platforms such as youtube. in 2014, a user found out that somebody took his drone video of burning man 2013 and reposted it in its entirety to youtube under the inaccurate and misleading title “drone’s eye view of burning man 2014,” which earned a large number of views and advertising.33 when everyone was looking for the newest drone video of burning man in 2014, the video posted by this other person received millions of views, which earned them money from youtube advertising. the reason the user cannot sue this other person is that he originally licensed his video under cc by license, which allows commercial use, and which unfortunately is youtube’s only cc license option.34 had the original videographer better understood the irrevocability of cc licenses, he might have chosen a different platform to disseminate his video or at least utilized other ways to protect his copyright. scholars would not want this kind of abuse of their original works and thus should be more cautious of the irrevocability of cc licenses. furthermore, youtube and many other platforms that digital humanities scholars use to disseminate their research, scholarly, and creative work fail to provide effective functionalities and incentives to fulfill cc’s attribution requirement.35 cc by license stipulates, “if supplied, you must provide the name of the creator and attribution parties, a copyright notice, a license notice, a disclaimer notice, and a link to the material.”36 to find this piece of information on youtube, however, someone must go to a video’s landing page and first click the “show more” text in the description below the video. although it is clear to see the cc attribution license with link displayed, someone must click a “view attributions” link to discover the original author’s credit and source video link. the difficulty of going through different steps may impede an average youtube user or most potential licensees of a cc-licensed digital scholarly work to learn the original creator of any content and if what they are viewing was partially or wholly created by someone else.37 since cc licenses only provide licensees with a very general requirement to attribute, licensees are allowed to attribute “in any reasonable manner.”38 with the only limitation to be “not in any way that suggests the licensor endorses you or your use,” licensees are not incentivized to accurately attribute to the scholar of the original work and thus to help disseminate his or her work crediting the copyright owner.39 while users can search for registered works on the official website of united states copyright office, there is no way to conduct a comprehensive search for works under cc licenses. creative commons does not maintain a database of works distributed under cc licenses. although there are search engines and websites for works under cc licenses, there is no way to conduct an exhaustive search.40 this can create hurdles for future licensees of a derivative work to accurately and clearly attribute the original work. one of the most important motivations of scholars to distribute their works under cc licenses is to get gain more exposure. 
due to all these above limitations and others to be discussed in this paper, scholars should be more cautious of the irrevocability of cc licenses and its lack of enforcement and support system to help licensors accurately attribute the original work. is creative commons really clear?—the ambiguity of noncommercial and nonderivative licenses noncommercial license in the legal code of a cc attribution-noncommercial-sharealike license, noncommercial is defined as “not primarily intended for or directed towards commercial advantage or monetary compensation. for purposes of this public license, the exchange of the licensed material for other material subject to copyright and similar rights by digital file-sharing or similar means is noncommercial provided there is no payment of monetary compensation in connection with the https://www.youtube.com/watch?v=m2thtb6iffa https://www.youtube.com/watch?v=m2thtb6iffa https://www.youtube.com/watch?v=z9jtiouk_6o https://creativecommons.org/licenses/by/2.0/ information technology and libraries | september 2019 40 exchange.”41 this seemingly clear statement can create some confusion and problems in the real world. while a commercial use weighs against fair use, copyright law does not rule infringement solely on a use being commercial. in fact, it is hard to determine a use as totally noncommercial. in the case of princeton university press v. michigan document services, inc., michigan document services (mds) being a commercial copy shop weighs against a finding of fair use, but mds’s use being commercial is only one of the four factors in a fair-use analysis. in this case, the court held that mds’s commercial exploitation of the copyrighted works from princeton university press did not constitute fair use although the courts clarified the educational use was “noncommercial in nature.”42 there have been a number of cases in us copyright law where commercial uses have been ruled lawful fair use. by making commercial use a decisive factor to determine an illegal use, creative commons fails to specify real cases of commercial uses and thus oversimplifies the complicated copyright issues involving commercial uses that scholars should be aware of. more specifically, many digital scholars nowadays post their articles and projects with noncommercially cc licensed images on a website, the maintenance of which is seldom free. similar to the case of princeton university press v. michigan document services, inc., the educational or scholarly use of those noncommercially licensed images should be considered “noncommercial in nature.”43 however, if a digital humanist maintains a website that is subsidized partly by google ads or a company, the nature of the use of those noncommercially licensed images might be called into question as in the case of princeton university press v. michigan document services, inc. although in both situations, the image is not “primarily intended for or directed towards commercial advantage or monetary compensation,” the digital humanist may still increase the traffic of his site and thus profit from including those images on his site. 44 the “different viewpoints and colliding interests” among commercial publishers, librarians, scholars, university administrators, and others may further complicate the already “ambiguous commercial nature of use” in fair use analysis that creative commons oversimplifies.45 the more recent case of great minds v. fedex office & print services, inc. 
demonstrates this ambiguity of commercial use and one use of cc noncommercial license that is legal yet unexpected and unwanted by copyright owners. to specify, great minds argued that fedex should compensate it for the money the company made from copying materials that great minds distributed under a cc attribution-noncommercial-sharealike 4.0 license. in an amicus brief to support fedex office, creative commons held that “entities using cc-licensed works must be free to act as entities do—including through employees and the contractors they engage in their service” and otherwise “the value of the license would be significantly diminished.”46 creative commons demonstrated its interpretation of a commercial use to be different from the ruling in the case princeton university press v. michigan document services, in which the judge explicitly ruled the use to be commercial because the copyright complaint was performed on “a profitmaking basis by a commercial enterprise” and clearly forbade the contract between this enterprise and a nonprofit organization to copy and distribute copyrighted content.47 in contrast, in the case of great minds v. fedex office & print services, inc., the court held that great minds ’ nonexclusive public license, i.e. cc attribution-noncommercial-sharealike 4.0 international public license, “unambiguously permitted school districts to engage fedex, for a fee, to reproduce” the copyrighted content.48 scholars should therefore be wary of the complicated process and “several areas of uncertainty” surrounding creative commons, which can be easily is creative commons a panacea for managing digital humanities ip rights? | ding 41 https://doi.org/10.6017/ital.v38i3.10714 overlooked when applying the “simple and easy” cc licenses.49 none of the interpretations of noncommercial uses by creative commons are specified in the generic license deed. compared to more customized licenses that usually involve direct interactions between the licensor and the licensee, the free-of-charge license, cc licenses, has a long way to go to protect both licensors and licensees from infringements and financial loss. a study of noncommercial uses conducted by creative commons indicates that noncommercial licenses account for “approximately two-thirds of all creative commons licenses associated with works available on the internet.”50 kim confirmed this popularity of cc noncommercial licenses that “over 60 percent flickr users prohibit commercial use or derivative work.”51 as kim elaborated and as the previous section in this paper on the irrevocability of cc licenses showcases, either commercial or noncommercial cc licenses are “likely to be detrimental to potential professional careers” of copyright owners.52 nevertheless, as stated by creative commons, they do not offer legal advice. 53 when providing copyright education, academic librarians should therefore remind digital scholars to be careful in using both commercial and noncommercial content and making their own content available for noncommercial purposes. nonderivative license similarly, scholars should be reminded to have a critical view of nonderivative use of cc licenses. 
according to title 17 section 101 of the copyright act, a “derivative work” is a work based upon one or more preexisting works in which it may be recast, transformed, or adapted.54 however, creative commons used the phrase “adapted material” to define derivative work in the legal code for nonderivative uses.55 creative commons has a different understanding of derivative works from what is defined by the copyright act in musical works. “adapted material is always produced where the licensed material is synched in timed relation with a moving image.”56 this means that while using an original soundtrack in a video is not derivative work according to the copyright act, videos that use an nd-licensed song violate the terms of the cc license. similar to the difference of revocability and commercial use between creative commons and copyright act as discussed earlier in this article, this different understanding of derivative work should be made aware to scholars. specifically, when providing copyright education to scholars, academic librarians should make it clear that nonderivative license cannot alienate the fair use rights of users and that a noncommercial nonderivative license does not prevent companies from using a work in a parody.57 some licensors of cc licenses may not share creative commons’ vision of an open, sharing culture as suggested by the prevalence of nd licenses.58 therefore, instead of providing generic recommendation on using cc licenses, academic librarians should “balance the interests of information users and rights holders” by providing a more sophisticated and critical perspective when educating the scholarly community about the nonderivative cc licenses.59 is creative commons really sustainable—the dilemma of sharealike and open access incompatible sharealike licenses for many digital scholars, the sharealike term in cc licenses is intended to distribute their works more broadly and openly since a licensee is required by creative commons to “distribute . . . [their contributions] . . . under the same license as the original.”60 nevertheless, some incompatibility issues arise to prevent a more open distribution of works. for example, since the creative commons system offers two different sharealike licenses, a scholar cannot create a new derivative work combining two sharealike works with different terms of their respective licenses. http://www4.law.cornell.edu/uscode/17/101.html information technology and libraries | september 2019 42 it is the open and accessible nature of cc-licensed works that makes them ideal for scholars including digital humanists to collaborate on projects but ironically, the sharealike function can create the risk of “an intractable thicket” if incompatibilities between those licenses hinder future collaboration.61 creative commons does provide a series of compatible licenses, but only same licenses with differences in cc versions are considered compatible.62 against open access? 
these incompatibilities between certain cc licenses have been pointed out by copyright experts to limit "the future production and distribution of creative works" and even to be "anti-public domain."63 in 2009, the cofounder and ceo of creative commons, lawrence lessig, pointed out the perils of openness in government in his article "against transparency."64 echoing his argument that "whether and how new information is used to further public objectives depends upon its incorporation into complex chains of comprehension, action, and response," this paper advocates for a critical perspective on cc licenses in digital scholarship. apart from all the limitations of cc licenses discussed already, a more unsettling misuse of cc licenses is a failure to recognize other rights in a work beyond copyright. in 2011, the image of an underage girl, which was placed on flickr under a cc license, was used in an advertising campaign for mobile phone services.65 although, after the lawsuit, the creative commons ceo added a term to the legal deed of the latest version (4.0) of every cc license to explicitly state that other rights such as publicity, privacy, or moral rights may limit how the material can be used, the case reveals the perils of openness.66 when providing copyright education, academic librarians should not only warn digital scholars of this limitation of cc licenses but also encourage them to include a statement of intellectual property rights, including privacy and other rights, on their digital scholarship websites to reduce abuses and innocent infringements.

conclusion

even though cc licenses are helpful for digital humanities scholars seeking more exposure, these licenses are still being improved. creative commons pledged to the community to "clarify how the nc limitation works in the practical world."67 yet, when providing copyright consultation or partnering with digital humanities scholars, academic librarians should warn these scholars, as both licensors and licensees, about the sophisticated implications not only of the noncommercial license but also of other characteristics and limitations of cc licenses. academic librarians should introduce digital scholars to a more critical view of cc licenses by collaborating with different campus stakeholders.68 while it is recommended that academic librarians suggest digital scholars place their creative works under a noncommercial license, academic librarians should also educate them about the ambiguous definitions of commercial use as well as the possibility of commercial parody and other fair use situations. it is also recommended that academic librarians provide digital humanists with guidance on how to create intellectual property statements on their websites, which should include not only copyright but also privacy and other intellectual property rights. currently, a number of university libraries and nonprofit organizations, ranging from duke university library (http://library.duke.edu/) to the library of congress (https://www.flickr.com/photos/library_of_congress) and wikipedia (https://en.wikipedia.org/wiki/main_page), use cc licenses for their entire site.69 as cc license users, academic librarians should also be extremely careful when using cc-licensed pictures or music on the library's website.
the safest way is to use only those that are in the public domain or that are acquired by the library. despite the use of free and simple cc licenses, academic libraries are recommended to include terms-of-use and privacy sections on their websites to provide more detailed explanations of the function of cc licenses and intellectual property rights in general. the alignment between the visions of creative commons, digital humanities, and "higher education as a cultural and knowledge commons" puts academic librarians in a unique position to provide copyright education in the digital humanities field.70 because of all the limitations of cc licenses, academic librarians should go beyond a simple endorsement of cc licenses and offer a more sophisticated and critical perspective when educating the scholarly community about cc licenses.

notes

1 amanda hornby and leslie bussert, "digital scholarship and scholarly communication," university of washington libraries, accessed november 30, 2016, https://www.uwb.edu/getattachment/tlc/faculty/teachingresources/newmedia.

2 oya y. rieger, "framing digital humanities: the role of new media in humanities scholarship," first monday 15, no. 10 (october 11, 2010), http://firstmonday.org/ojs/index.php/fm/article/view/3198.

3 elizabeth joan kelly, "rights instruction for undergraduate students: needs, trends, and resources," college & undergraduate libraries 25, no. 1 (2018): 1-16, https://doi.org/10.1080/10691316.2016.1275910.

4 daniel hickey, "the reuse evangelist: taking ownership of copyright questions at your library," reference & user services quarterly 51, no. 1 (2011): 9-11; "research guides: image resources: creative commons images," ucla library, accessed april 28, 2019, https://guides.library.ucla.edu/c.php?g=180361&p=1185834; "finding public domain & creative commons media: images," harvard library research guides, accessed april 28, 2019, https://guides.library.harvard.edu/c.php?g=310751&p=2072816. ucla and harvard are two good examples.

5 lewin-lane et al., "the search for a service model of copyright best practices in academic libraries," journal of copyright in education and librarianship 2, no. 2 (2018): 1-24. this article, for example, does not discuss any limitation of cc licenses when conducting a literature review of copyright education in academic libraries to search for best practices.

6 zachary katz, "pitfalls of open licensing: an analysis of creative commons licensing," idea: the intellectual property law review 46, no. 3 (2006): 391-413.

7 eric e. johnson, "rethinking sharing licenses for entertainment media," cardozo arts & entertainment law journal 26, no. 2 (2008): 391-440.
8 aurelija lukoseviciene, "beyond the creative commons framework of production and dissemination of knowledge," http://dx.doi.org/10.2139/ssrn.1973967.

9 mashael khayyat and frank bannister, "open data licensing: more than meets the eye," information polity: the international journal of government & democracy in the information age 20, no. 4: 231–52, https://doi.org/10.3233/ip-150357.

10 herkko hietanen, "the pursuit of efficient copyright licensing: how some rights reserved attempts to solve the problems of all rights reserved," lappeenranta university of technology, 2008.

11 christa engel pletcher burger, "are publicity rights gone in a flash?: flickr, creative commons, and the commercial use of personal photographs," florida state business review 8 (2009): 129, https://ssrn.com/abstract=1476347.

12 robert w. gomulkiewicz, "open source license proliferation: helpful diversity or hopeless confusion?," washington university journal of law & policy 30 (2009): 261, expanded academic asap, accessed april 28, 2019, http://link.galegroup.com.libproxy.csun.edu/apps/doc/a208273638/eaim?u=csunorthridge&sid=eaim&xid=4bbf2442.

13 jacob h. rooksby, "a fresh look at copyright on campus," missouri law review (summer 2016): 769, general onefile, accessed april 27, 2019, http://link.galegroup.com.libproxy.csun.edu/apps/doc/a485538679/itof?u=csunorthridge&sid=itof&xid=1f2822f3.

14 "escholarship: copyright & legal agreements," accessed december 1, 2016, http://escholarship.org/help_copyright.html#creative.

15 "directory of open access journals," doaj, accessed december 1, 2016, https://doaj.org.

16 "frequently asked questions—creative commons," accessed december 7, 2016, https://creativecommons.org/faq/#do-i-need-to-register-with-creative-commons-before-i-obtain-a-license.

17 "copyright in general," u.s. copyright office, accessed july 30, 2019, https://www.copyright.gov/help/faq/faq-general.html.

18 "why should i register my work if copyright protection is automatic?," copyright alliance, accessed july 28, 2019, https://copyrightalliance.org/ca_faq_post/copyright-protection-ata/.

19 "copyright basics," u.s. copyright office and library of congress, accessed november 30, 2016, https://www.copyright.gov/circs/circ01.pdf#page=7.

20 phil clapham, "are creative commons licenses overly permissive?
the case of a predatory publisher," bioscience (2018): 842-43, accessed april 20, 2019, https://doi.org/10.1093/biosci/biy098; cornelius puschmann and marco bastos, "how digital are the digital humanities? an analysis of two scholarly blogging platforms," plos one 10, no. 2 (2015), accessed april 20, 2019, https://doi.org/10.1371/journal.pone.0115035.

21 "why your blog images are a ticking time bomb," koozai.com, accessed december 2, 2016, https://www.koozai.com/blog/content-marketing-seo/blog-sued-for-images/.

22 john w. white and heather gilbert, eds., laying the foundation: digital humanities in academic libraries (west lafayette: purdue university press, 2016), proquest ebook central.

23 "considerations for licensors and licensees—creative commons," accessed december 7, 2016, https://wiki.creativecommons.org/wiki/considerations_for_licensors_and_licensees.

24 "the terms 'revocable' and 'irrevocable' in license agreements: tips and pitfalls," accessed december 7, 2016, http://www.sidley.com/news/the-terms-revocable-and-irrevocable-in-license-agreements-tips-and-pitfalls-02-21-2013.

25 mark seeley and lois wasoff, "legal aspects and copyright," in academic and professional publishing, ed. robert campbell, ed pentz, and ian borthwick (cambridge, uk: elsevier, 2012), 355-83.

26 douglas macmillan, "fight over yahoo's use of flickr photos," wall street journal, november 25, 2014, sec. tech, http://www.wsj.com/articles/fight-over-flickrs-use-of-photos-1416875564.

27 "flickr apologizes but what about cc abuses by others?," accessed december 7, 2016, http://www.artists-bill-of-rights.org/news/campaign-news/flickr-apologizes-but-what-about-cc-abuses-by-others?/.

28 "the terms 'revocable' and 'irrevocable' in license agreements: tips and pitfalls," accessed december 7, 2016, http://www.sidley.com/news/the-terms-revocable-and-irrevocable-in-license-agreements-tips-and-pitfalls-02-21-2013.

29 "legal code—creative commons," accessed december 7, 2016, https://wiki.creativecommons.org/wiki/legal_code.

30 "why cc-by?—oaspa," accessed december 7, 2016, http://oaspa.org/why-cc-by/.

31 "why cc-by?—oaspa."

32 "intellectual property policy," the andrew w. mellon foundation, accessed july 28, 2019, https://mellon.org/grants/grantmaking-policies-and-guidelines/grantmaking-policies/intellectual-property-policy/.
33 "why i'm giving up on creative commons on youtube," eddie.com, september 6, 2014, http://eddie.com/2014/09/05/why-im-giving-up-on-creative-commons-on-youtube/.

34 "creative commons—attribution 4.0 international—cc by 4.0," accessed december 7, 2016, https://creativecommons.org/licenses/by/4.0/.

35 "why i'm giving up on creative commons on youtube."

36 "creative commons—attribution 4.0 international—cc by 4.0."

37 "why i'm giving up on creative commons on youtube."

38 "creative commons—attribution 4.0 international—cc by 4.0."

39 ibid.

40 "cc search," accessed december 7, 2016, https://search.creativecommons.org/.

41 "creative commons—attribution-noncommercial-sharealike 4.0 international—cc by-nc-sa 4.0," accessed december 7, 2016, https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.

42 "u.s. copyright office fair use index," u.s. copyright office, accessed april 21, 2019, https://www.copyright.gov/fair-use/.

43 ibid.

44 ibid.

45 jerry d. campbell, "intellectual property in a networked world: balancing fair use and commercial interests," library acquisitions: practice and theory 19, no. 2 (1995): 179-84, https://doi.org/10.1016/0364-6408(95)00020-a; igor slabykh, "ambiguous commercial nature of use in fair use analysis," aipla quarterly journal 46, no. 3 (2018): 293-339.

46 "defending noncommercial uses: great minds v fedex office," creative commons, august 30, 2016, https://creativecommons.org/2016/08/30/defending-noncommercial-uses-great-minds-v-fedex-office/.

47 "princeton university press v. michigan document services," bitlaw, accessed december 7, 2016, http://www.bitlaw.com/source/cases/copyright/pup.html#iiia.

48 justia, "great minds v. fedex office & print services, inc.," stanford copyright and fair use center, march 21, 2018, https://fairuse.stanford.edu/case/great-minds-v-fedex-office-print-services-inc/.

49 minjeong kim, "the creative commons and copyright protection in the digital era: uses of creative commons licenses," journal of computer-mediated communication 13, no.
1 (2007): 187-209, https://doi.org/10.1111/j.1083-6101.2007.00392.x; "directory of open access journals," doaj, accessed december 1, 2016, https://doaj.org.

50 "feature: creative commons: copyright tools for the 21st century," accessed december 7, 2016, http://www.infotoday.com/online/jan10/gordon-murnane.shtml.

51 "the creative commons and copyright protection in the digital era: uses of creative commons licenses."

52 ibid.

53 "creative commons—attribution-sharealike 4.0 international—cc by-sa 4.0," accessed december 7, 2016, https://creativecommons.org/licenses/by-sa/4.0/legalcode#s6a.

54 "17 u.s. code § 101—definitions," legal information institute, accessed april 20, 2019, https://www.law.cornell.edu/uscode/text/17/101.

55 "creative commons—attribution-noncommercial-noderivatives 4.0 international—cc by-nc-nd 4.0," accessed december 7, 2016, https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode.

56 "creative commons—attribution-noncommercial-noderivatives 4.0 international—cc by-nc-nd 4.0."

57 the famous campbell v. acuff-rose music case established that a commercial parody could qualify as fair use.

58 katz, "pitfalls of open licensing," 411.

59 "professional ethics," tools, publications & resources, american library association, february 6, 2019, http://www.ala.org/tools/ethics.

60 "creative commons—attribution-sharealike 4.0 international—cc by-sa 4.0," accessed december 7, 2016, https://creativecommons.org/licenses/by-sa/4.0/.

61 molly houweling, "the new servitudes," georgetown law journal 96, no. 3 (2008): 885-950.

62 "compatible licenses," creative commons, accessed december 7, 2016, https://creativecommons.org/share-your-work/licensing-considerations/compatible-licenses/.
63 katz, "pitfalls of open licensing," 391; susan corbett, "creative commons licences, the copyright regime and the online community: is there a fatal disconnect?," the modern law review 74, no. 4 (2011): 506, http://www.jstor.org/stable/20869091.

64 lawrence lessig, "against transparency," new republic, october 8, 2009, https://newrepublic.com/article/70097/against-transparency.

65 "creative commons ceo apologizes to virgin mobile—stock photography news, analysis and opinion," accessed december 7, 2016, https://www.selling-stock.com/article/creative-commons-ceo-apologizes-to-virgin-mob.

66 "frequently asked questions," creative commons, accessed july 30, 2019, https://creativecommons.org/faq/#how-are-publicity-privacy-and-personality-rights-affected-when-i-apply-a-cc-license.

67 "defending noncommercial uses: great minds v fedex office," creative commons, august 30, 2016, https://creativecommons.org/2016/08/30/defending-noncommercial-uses-great-minds-v-fedex-office/.

68 andrea malone et al., "center stage: performing a needs assessment of campus research centers and institutes," journal of library administration 57, no. 4 (2017): 406–19, https://doi.org/10.1080/01930826.2017.1300451.

69 laura gordon-murnane, "feature: creative commons: copyright tools for the 21st century," information today, accessed december 7, 2016, http://www.infotoday.com/online/jan10/gordon-murnane.shtml.

70 ibid.

letter from the editor

reviewers wanted

kenneth j. varnum
varnum information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.13xxx together with one of the other journals published by ala's core division, information technology and libraries (ital) and library leadership and management (ll&m) invite applications for peer reviewers. serving as a reviewer is a great opportunity for individuals from all types of libraries and with a wide variety of experience to contribute to scholarship within our chosen profession. we are seeking the broadest pool of reviewers possible. reviewers for both journals are expected to have an interest in or experience with the journal's topics, as described below. reviewers should expect to review 2-4 articles a year and should provide thoughtful and actionable comments to authors and the editor. reviewers will work with the editor, associate editor, and/or editorial board of the corresponding journal. see the job description for ital reviewers for more details about this new role. we welcome applications from individuals at libraries of all types, levels of experience, locations, perspectives, and voices, especially those from underrepresented groups. reviewers will be selected to maximize the diversity of representation across these areas, so if you're not sure whether you should apply, please do! increasing the pool of reviewers for information technology and libraries is part of the editorial board's desire to provide equitable treatment to submitted articles and will enable us to follow a more typical process for peer-reviewed journals: a two-reviewer, double-blind process. that will be a welcome and, frankly, overdue change to ital's current process, in which submitted articles are typically reviewed by one person. expanding the number of reviewers across the breadth of subject areas our journal covers will foster a more rigorous yet more open review process. should you be more interested in the policy side of this journal, please watch for a call for volunteers for the ital editorial board. that process will start in april. * * * * * * * as this issue of the journal goes online, covid as a global health crisis has just entered its second year. i'm constantly reminded of the duality of our collective ability to show resilience and exhibit fragility as we continue to endure this period. when i wrote the letter from the editor a year ago, i focused on the imminent vote to establish a new ala division, core, as the most important question facing me. how quickly things changed! by the time the march 2020 issue was published, everything was different. wherever you are, however you have adapted to the situation, i hope you are well and, like me, are turning from wondering when this period will end, to wondering what "normal" will be in the post-pandemic world. kenneth j. varnum, editor varnum@umich.edu march 2021
letter from the core president leadership, infrastructure, futures christopher cronin information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.13027 christopher cronin (cjc2260@columbia.edu) is core president and associate university librarian for collections, columbia university. © 2020. i am so pleased to be able to welcome all ital subscribers to core: leadership, infrastructure, futures! this issue marks the first of ital since the election of core’s inaugural leadership. a merger of what was formerly three separate ala divisions—the association of library collections & technical services (alcts), library & information technology association (lita), and the library leadership & management association (llama)—core is an experiment of sorts. it is, in fact, multiple experiments in unification, in collaboration, in compromise, in survival. while initially born out of a sheer fight or flight response to financial imperatives and the need for organizational effectiveness, developing core as a concept and as a model for an enduring professional association very quickly became the real motivation for those of us deeply embedded in its planning. core is very deliberately not an all-caps acronym representing a single subset of practitioner within the library profession. it is instead an assertion of our collective position at the center of our profession. it is a place where all those working in libraries, archives, museums, historical societies—information and cultural heritage broadly—will find reward and value in membership and a professional home. all organizations need effective leaders, strong infrastructure, and a vision for the future. and that is what core strives to build with and for its members. while i welcome ital’s readers into core, i also welcome core’s membership into ital. no longer publications of their former divisions, all three journals within core have an opportunity to reconsider their mandates. as with all things, audience matters. ital’s readership has now expanded dramatically, and those new readers must be invited into ital’s world just as much as ital has been invited into theirs. as we embark on this first year of the new division, we do so with a sense of altogether newness more than of a mere refresh, and a sense of still becoming more than a sense of having always been. and who doesn’t want to reinvent themselves every once in a while? start over. move away from the bits that aren’t working so well, prop up those other bits that we know deserve more, and venture into some previously uncharted territory. how will being part of this effort, and of an expanded division, reframe ital’s mandate? the importance of information technology has never been more apparent. it is not lost on me that we do this work in core during a year of unprecedented tumult. in 2020, a murderous global pandemic was met with unrelenting political strife, pervasive distribution of misinformation and untruths, devastating weather disasters, record-setting unemployment, heightened attention on an array of omnipresent social justice issues, and a racial reckoning that demands we look both inward and outward for real change. individually and collectively, we grieve so many losses —loss of life, loss of income, loss of savings, loss of homes, loss of dignity, loss of certainty, loss of control, loss of physical contact. and throughout all of these challenges, what have we relied on more this year than technology? technology kept us productive and engaged. 
it provided a focal point for communication and connection. it provided venues for advocacy, expression, inspiration, and, as a counterpoint to that pervasive distribution of misinformation, it provided mechanisms to amplify the voices of the oppressed and marginalized. for some, but unfortunately not all, technology also kept us employed. and as the physical doors of our organizations closed, technology provided us with new ways to invite our users in, to continue to meet their information needs, and to exceed all of our expectations for what was possible even with closed physical doors. and yet our reliance on and celebration of technology in this moment has also placed another critical spotlight on the devastating impact of digital poverty on those who continue to lack access, and by extension also a spotlight on our privilege. in her parting words to you in the final issue of ital as a lita journal, evviva weinraub lajoie, the last president of lita, wrote: we may have always known that inequities existed, that the system was structured to make sure that some folks were never able to get access to the better goods and services, but for many, this pandemic is the first time we have had those systemic inequities held up to our noses and been asked, "what are you going to do to change this?" balancing those priorities will require us to lean on our professional networks and organizations to be more and to do more. i believe that together, we can make core stand up to that challenge. i believe we will do this, too, and with a spirit of reinvention that is guided by principles and values that don't just inspire membership but also improve our professional lives and experience in tangible ways. it was a privilege to have served as the final president of alcts and such a humbling and daunting responsibility to now transition into serving as core's first. it is a responsibility i do not take lightly, particularly in this moment when so much is demanded of us. as we strive for equity and inclusion, we do so knowing that we are only as strong as every member's ability to bring their whole selves to this work. we must work together to make our professional home everything we need it to be and to help those who need us. it is yours, it is theirs, it is ours. editorial board thoughts do space's virtual interview lab: using simple technology to serve the public in a time of crisis michael p. sauers information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.13461
michael sauers (msauers@travelinlibrarian.info) is technology manager, do space, and a member of the journal's editorial board. © 2021. as a result of the pandemic, we were closed to the public for three months in early 2020 and reopened with limited services to the public in june 2020. both while we were closed and since, we have implemented many changes to our services and programming, including limiting the number of available computers to support social distancing and moving our all-in-person educational programming to all-online programming, to name just two of the bigger changes. but what i'd like to talk about is one new service we implemented that was simple, easy to set up, and has had a significant impact on a number of our members: do space's virtual interview lab. when we reopened to the public with limited services in june 2020, one of the questions we asked ourselves was what new services we could provide in the circumstances and, under those same circumstances, what new services the public might need. we considered the reality of social distancing, and the fact that our meeting rooms could no longer be used for meetings with multiple members. then, although nebraska has historically had a low unemployment rate, we realized that covid had pushed many employers that had not yet moved to online interviews to do so. this was combined with the fact that covid only reinforced the already significant digital divide. someone needing to attend a job interview online could easily be lacking something as simple as a good-quality webcam or microphone, or not have the bandwidth available at home to successfully video conference. worst case, they may not have a computer at all. these are exactly the situations that do space was created for: to offer the public access to the hardware, software, and bandwidth that they need to become successful. with this in mind, we decided to turn our small conference room into a virtual interview lab. the room already had a good-sized table, excellent available wifi, generally good lighting, and plain white walls, perfect as a simple background. previous users of this room would generally use a laptop to connect to our wifi and a large screen in the room. instead, for this setup we added a small micro pc which we connected via an ethernet port to our gigabit fiber internet connection. to this pc we added a 27" monitor, 1080p webcam, a blue yeti microphone, and headphones. on the pc we installed every virtual meeting platform we could think of, including zoom, adobe connect, microsoft teams, gotowebinar/meeting, and more, placing direct shortcuts to all of the programs and online services right on the desktop for easy access. with our setup complete, we opened the lab for bookings starting july 1, 2020. use has been slow and steady, possibly due to our low unemployment rate, but the members that have used it are grateful for its existence. our marketing at first was just on our website and social media, but after a month or two we gathered a list of over 50 area groups and organizations that assist folks with finding work, mailed them a stash of postcards that they could hand out, and asked them to let us know if they needed any more. one group was so inspired by the project that in their thank-you they said that they'd be starting their own virtual interview lab at their location.
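the setup described above is essentially a small, repeatable configuration: a dedicated pc on a wired connection with the common conferencing clients installed and pinned to the desktop. as a minimal sketch of how staff could script a pre-booking readiness check against that kind of configuration (the folder path and platform list here are hypothetical, not do space's actual setup), something like the following would do:

from pathlib import Path

# hypothetical list of conferencing clients expected on the interview lab pc
EXPECTED_PLATFORMS = ["zoom", "microsoft teams", "adobe connect", "gotomeeting"]

# hypothetical location of the desktop shortcuts on the lab machine
DESKTOP = Path("C:/Users/InterviewLab/Desktop")

def missing_shortcuts(desktop, expected):
    """return the expected platforms that have no .lnk shortcut on the desktop."""
    present = {p.stem.lower() for p in desktop.glob("*.lnk")}
    return [name for name in expected if name.lower() not in present]

if __name__ == "__main__":
    missing = missing_shortcuts(DESKTOP, EXPECTED_PLATFORMS)
    if missing:
        print("reinstall or re-pin before the next booking:", ", ".join(missing))
    else:
        print("all expected meeting clients are pinned to the desktop.")

a quick scripted check along these lines fits naturally into the 30-minute setup window the lab asks members to book ahead of their session, described below.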
in the past year the lab hasn’t changed all that much with the exception of moving to a different room and a broadening of the use case. we quickly realized that members were wishing to use the room for events beyond job interviews. those using the lab have done so for attending ged and language classes, business meetings, attending do space’s own online programming, and even participating in our virtual tech mentoring program. have there been any problems? we’re dealing with technology here so of course the answer is yes, but luckily, they have been minor. for example, one person commented that our blue headphones didn’t look as “professional” as they would have liked. other times zoom needed a last-minute update which staff quickly addressed. (we encourage everyone to book the start of their session 30 minutes in advance to give us a chance to fix such issues.) otherwise, feedback has been overall very positive. here’s just a few examples: • “thank you! i actually used the room on short notice for several conference calls (plumbing disaster at my house!). it's not the intended use, but it was open and your team was kind enough to let me use it. i sincerely appreciate it. the room, by the way, has an excellent set up. wifi was lightning fast, lighting was perfect and i love that you have a microphone to focus the sound. oh, and that cute coat rack dressed up my background when i had to talk to a large client. it's fantastic that you offer this resource to the community. thanks again for letting me use it!” • “wanted to note a few things. i used this space for a research interview where i was a participant. i wasn't strictly using this space for a job interview. that said, it suited my needs perfectly. i was very happy to utilize this space. it was quiet, clean, and accommodated what i was looking for. customer service was also excellent. the service desk worker was able to promptly get me set up when i was already running a bit late for my interview. thank you for making this service available and also making it intuitive and easy to utilize. i will probably look to use it again in the future.” • “the room is ideal, quiet, no distractions, i was able to connect clearly using teams, no glitches, the volume was loud enough. was able to hear clearly and see interviewers faces clearly. staff at do space were available and prompt to assist before the interview when i needed set-up help adjusting the appearance / display of my head within the frame/screen.” • “you are a godsend! i am so grateful, especially in these times, that you are here. the staff are kind, patient and thoroughly knowledgeable. i love you.” information technology and libraries june 2021 do space’s virtual interview lab | sauers 3 this experience has reminded me that while all the advanced experimenting and complex coding we create to better assist our users is all well and good, sometimes just a simple computer, internet connection, and webcam can make all the difference in someone’s life. while some of the changes that we’ve made over the past year, such as moving all programming online, will be either ending or slowly transitioning to pre-pandemic states, our virtual interview lab is one new service that we will be definitely keeping for the foreseeable future. applying gamification to the library orientation: a study of interactive user experience and engagement preferences articles applying gamification to the library orientation a study of interactive user experience and engagement preferences karen nourse reed and a. 
miller information technology and libraries | september 2020 https://doi.org/10.6017/ital.v39i3.12209 karen nourse reed (karen.reed@mtsu.edu) is associate professor, middle tennessee state university. a. miller (a.miller@mtsu.edu) is associate professor, middle tennessee state university. © 2020. abstract by providing an overview of library services as well as the building layout, the library orientation can help newcomers make optimal use of the library. the benefits of this outreach can be curtailed, however, by the significant staffing required to offer in-person tours. one academic library overcame this issue by turning to user experience research and gamification to provide an individualized online library orientation for four specific user groups: undergraduate students, graduate students, faculty, and community members. the library surveyed 167 users to investigate preferences regarding orientation format, as well as likelihood of future library use as a result of the gamified orientation format. results demonstrated a preference for the gamified experience among undergraduate students as compared to other surveyed groups. introduction background newcomers to the academic campus can be a bit overwhelmed by their unfamiliar environment: there are faces to learn, services and processes to navigate, and an unexplored landscape of academic buildings to traverse. whether one is an incoming student or a recently hired employee of the university, all newcomers need to become quickly oriented to their surroundings to ensure productivity. in the midst of this transition, the academic library may or may not be on the list of immediate inquiries; however, the library is an important place to start. newcomers would be wise to familiarize themselves with the building and its services so that they can make optimal use of its offerings. two studies found that students who used the library received better grades and had higher retention rates.1 another study regarding university employees revealed that untenured faculty made less use of the library than tenured faculty, a problem attributed to lack of familiarity with the library.2 researchers have also found that faculty will often express interest in different library services without realizing that these services are in fact available.3 it is safe to say that libraries cannot always rely on newcomers to discover the physical and electronic services on their own; they need to be shown these items in order to mitigate the risk of unawareness. in consideration of these issues, the walker library at middle tennessee state university (mtsu) recognized that more could be done to welcome its new arrivals to campus. the public university enrolls approximately 21,000 students, the majority of whom are undergraduates. however, with a carnegie classification of doctoral/professional and over one hundred graduate degree programs, there was a strong need for specialized research among the university's graduate students and faculty. other groups needed to use the library too: non-faculty employees on campus as well as community users who frequently used walker library for its specialized and general collections. the authors realized that when new members of these different groups arrived on campus, few opportunities were available for acclimation to the library's services or building layout.
limited orientation experiences were conducted within library instruction classes, but these sessions primarily taught research skills and targeted freshman generaleducation classes as well as select upper-division and graduate classes. in short, it appeared that students, employees, and visitors to the university would largely have to discover the library’s services on their own through a search on the library website or an exploration of the physical library. it was very likely that, in doing so, the newcomers might miss out on valuable services and information. as mtsu librarians, the authors felt strongly that library orientations were important to everyone at the university so that they might make optimal use of the library’s offerings. the authors based this opinion on their knowledge of relevant scholarly literature as well as their own anecdotal experiences with students and faculty.4 the authors defined the library orientation differently from library instruction: in their view, an orientation should acquaint users with the services and physical spaces of the library, as compared to instruction that would teach users how to use the library’s electronic resources such as databases. the desired new approach would structure orientations in response to the different needs of the library’s users. for example, the authors found that undergraduates typically had distinct library interests compared to faculty. it was recognized that library orientations were time-consuming for everyone: library patrons at mtsu often did not want to take the time for a physical tour, nor did the library have the staffing to accommodate large-scale requests. the authors turned to the gamification trend, and specifically interactive storytelling, as a solution. interactive storytelling has previous applications in librarianship as a means of creating an immersive and self-guided user experience.5 however, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups. to overcome this gap, the authors developed an online, interactive, game-like experience via storytelling software to orient four different groups of users to the library’s services. these groups were undergraduate students, graduate students, faculty members (which included both faculty and staff at the university), and community members (i.e., visitors to the university or alumni); see figure 1 for an illustration of each groups’ game avatars. these groups were invited to participate in the gamified experience called libgo (short for library game orientation). after playing libgo, participants gave feedback through an online survey. this paper will give a brief explanation of the creation of the game, as well as describe the results of research conducted to understand the impact of the gamified experience across the four user groups. information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 3 figure 1. libgo players were allowed to self-select their user group upon entering the game. each of the four user groups was assigned an avatar and followed a logic path specified for that group. literature review traditional orientation searches for literature on library orientation yield very broad and yet limited details about users of the traditional library orientation method. 
it is important to note that the terms “library tour” and “library orientation” can be somewhat vague, because this terminology is not interchangeable, yet is frequently treated as such in the literature.6 these terms are often included among library instruction materials which predominately influence undergraduate students.7 kylie bailin, benjamin jahre, and sarah morris define orientation as “any attempt to reduce library anxiety by introducing students to what a college/university library is, what it contains, and where to find information while also showing how helpful librarians can be.”8 their book is a culmination of case studies of academic library orientation in various forms worldwide where the common theme across most chapters is the need to assess, revise, and change library orientation models as needed, especially in response to feedback, staff demands, and the evolving trend of libraries and technology.9 furthermore, the majority of these studies are undergraduate-focused, and often freshman-focused, while only a few studies are geared towards graduate students. other traditional orientation problems discussed in the literature include students lacking intrinsic motivation to attend library orientation, library staff time required to execute the orientation, and lack of attendance.10 additionally, among librarians there seems to be consensus that the traditional library tours are the least effective means of orientation, yet they are the most highly used and with attention predominately focused on the undergraduate population alone. 11 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 4 in 1997, pixey anne mosely described the traditional guided library tour as ineffective, and documented the trend of libraries discontinuing it in favor of more active learning options.12 her study surveyed 44 students who took a redesigned library tour, all of whom were undergraduates (with freshmen as the target population). although mosely’s study only addressed one group of library users, it does attempt to answer a question on library perception whereby 93 percent of surveyed students indicated feeling more comfortable in using the library after the more active learning approach.13 a comparison study by marcus and beck looked at traditional vs treasure hunt orientations, and ultimately discovered that perception of the traditional method is limited by the selective user population and lack of effective measurements. they cited the need for continued study of alternative approaches to academic library orientation.14 a study by kenneth burhanna, tammy eschedor voelker, and julie gedeon looked at the traditional library tour from the physical and virtual perspective. confronted with a lack of access to the physical library, these researchers at kent state university decided to add an online option for the required traditional freshman library tour.15 their study compared the efficacy of learning and affective outcomes between face-to-face library tours and those of online library tours. of the 3,610 students who took the required library tour assignment, 3,567 chose the online tour method and 63 opted or were required to take the in-person, librarian-led tour. surveys were later sent to a random list of 250 students who did not take the in-person tour and the 63 students who did take the in-person tour. 
of the 46 usable responses all but one were undergraduates and 39 (85 percent) of them were freshman.16 this is a small sample size with a ratio of slightly greater than 2:1 for online versus in-person tour participation. although results showed that an instructor’s recommendation on format selection was the strongest influencing factor, convenience was also significant for those who selected the online option (81.5 percent). in contrast, only 18.5 percent of the students who took the face-toface tour rated it as convenient. the authors found that regardless of tour type, students were more comfortable using the library (85 percent) and more likely to use library resources (80 percent) after having taken a library tour. interestingly, students who took the online tour seemed slightly more likely to visit the physical library than those who took the in-person tour. ultimately the analysis of both tours showed this method of library orientation encourages library resource use, and the “online tour seems to perform as well, if not slightly better than the in-person tour.”17 gamification use in libraries an alternative format to the traditional method is gamification. gamification has become a familiar trend within academic libraries in recent years, and most often refers to the use of a technology based game delivery within an instructional setting. some users find gamified library instruction to be more enjoyable than traditional methods. for these people, gamification can potentially increase student engagement as well as retention of information.18 the goal of gamification is to create a simplified reality with a defined user experience. kyle felker and eric phetteplace emphasized the importance of user interaction over “specific mechanics or technologies” in thinking about the gamification design process.19 proponents of gamification of library instructional content indicate that it connects to the broader mission of library discovery and exploration as exemplified through collaboration and the stimulation of learning.20 additional benefits of gamification are its teaching, outreach and engagement functions.21 many researchers have documented specific applications of online gaming as a means of imparting library instruction. mary j. broussard and jessica urick oberlin described the work of librarians at lycoming college in developing an online game as one approach to teaching about information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 5 plagiarism.22 melissa mallon offered summaries of nine games produced for higher education, several of which were specifically created for use by academic libraries.23 many of these online library games reviewed used flash, or required players to download the game before playing. by contrast, j. long detailed an initiative at miami university to integrate gamification into the library instruction, a project which utilized twine.24 twine is an in-browser method and therefore avoids the problem of requiring users to download additional software prior to playing the game. other libraries have used online gamification specifically as a tool for library orientations. 
although researchers have demonstrated that the library orientation is an important practice in establishing positive first impressions of the library and counteracting library anxiety among new users, the differences between in-person versus online delivery formats are unclear.25 several successful instances have been documented in which the orientation was moved to an online game format. nancy o’hanlon, karen diaz, and fred roecker described a collaboration at ohio state university libraries between librarians and the office of first year experience; for this project, they created a game to orient all new students to the library prior to arrival on campus.26 the game was called “head hunt,” and was cited among those games listed in the article by mallon. 27 anna-lise smith and lesli baker reported the “get a clue” game at utah valley university which oriented new students over two semesters.28 another orientation game developed at california state university-fresno was noteworthy for its placement in the university’s learning management system (lms).29 in reviewing the literature regarding online library gamification efforts, there appear to be several best practices. several studies cite initial student assessment to understand student knowledge and/or perceptions of the content, followed by an iterative design process with a team of librarians and computer programmers.30 felker and phetteplace reinforced the need for this iterative process of prototyping, testing, deployment, and assessment as one key to success; however they also stated that the most prevalent reason for failure is that the games are not fun for users.31 librarians are information experts, and are not necessarily trained in fun game design. some libraries have solved this problem by partnering with or hiring professional designers; however for many under-resourced libraries, this is not an option.32 taking advantage of opensource tools, as well as the documented trial-and-error practices of others, can be helpful to newcomers who wish to break into new library engagement methods utilizing gamification. as literature has shown, a traditional library tour may have a place in the list of library services, but for whom and at what cost are questions with limited answers in studies done to date. gamification has offered an alternative perspective but with narrow accounts of its success in the online storytelling format and for users outside of the heavily focused freshman group. across all literature of library orientation studies, there is little reference to other library user populations such as faculty, staff, community users, distance students, or students not formally part of a class that requires library orientation. development of the library game orientation (libgo) libgo was developed by the authors with not only a consideration for the walker library user experience, but also with a specific attention to the differing needs of the multiple user groups served by the library. this user-focused concern led to exploring creative methodologies such as user experience research and human-centered design thinking, a process of overlapping phases that produces a creative and meaningful solution in a non-linear way. 
the three pillars of design information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 6 thinking are inspiration, ideation, and iteration.33 defining the problem and empathizing with the users (inspiration) led into the ideation phase, whereby the authors created lowand high-fidelity prototypes. the prototypes were tested and improved (iteration) through the use of beta testing in which playtesters interacted with the gamified orientation. the authors were novice developers of the gamified orientation, and this entailed a learning curve for not only the design thinking mindset but also the technical achievability. the development started with design thinking conversations and quickly turned to low-fidelity prototypes designed on paper. the development soon advanced to the actual coding so that the authors could get early designs tested before launching the final version. prior to deployment on the library’s website, libgo underwent a series of playtesting by library faculty, staff, and student employees. this testing was invaluable and led to such improvements as streamlining of processes and less ambiguity of text. libgo was developed with the twine open-source software (https://twinery.org), a product which is primarily used for telling interactive, non-linear stories with html. twine was an excellent application for this project as it allowed the creation of an online and interactive “choose your own adventure” styled library orientation game, in which users could explore the library based upon their selection of one of multiple available plot directions. with a modest learning curve and as an open source software, twine is highly accessible for those who are not accustomed to coding. for those who know html, css, javascript, variables, and conditional logic, twine’s capabilities can be extended. the library’s interactive orientation adventure requires users to select one of the four available personas: undergraduate student, graduate student, faculty, or community member. users subsequently follow that persona through a non-linear series of places, resources and points of interest built with the html output of using twee (twine’s programming language). see figure 2 for an example point of interest page and figure 3 for an example of a user’s final score after completing the gamified experience. once the twine story went through several iterations of design and testing, the html file was placed on the library’s website for the gamified orientation to be implemented with actual users. https://twinery.org/ information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 7 figure 2. this instructional page within libgo explains how to reserve different library spaces online. upon reading this content, the user will progress by clicking on one of the hypertext lines in blue font at the bottom. figure 3. based upon the displayed avatar, this libgo page is representative of a graduate student’s completion of libgo. the page indicates the player’s final score and gives additional options to return to the home page or complete the survey. information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 8 purpose of study libgo utilized the common "choose your own adventure" format whereby players progress through a storyline based upon their selection of one of multiple available plot directions. 
although the literature suggests that other technology-based methods are an engaging and instructive mode of content delivery, little prior research exists regarding this specific approach to library outreach. furthermore, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups. the researchers wanted to understand the potential of interactive storytelling as a means to educate a range of users on library services as well as make the library more approachable from a user perspective. the study was designed to understand the user experience of each of the four groups. the researchers hoped to discern which users, if any, found the gamified experience to be a helpful method of orientation to the library’s physical and electronic services. another area of inquiry was to determine whether this might be an effective delivery method by which to target certain segments of the campus for outreach. finally, the study intended to determine whether this method of orientation might incline participants toward future use of the library. methodology overview the authors selected an embedded mixed methods design approach in which quantitative and qualitative data were collected concurrently through the same assessment instrument.34 the survey instrument primarily collected quantitative data, however a qualitative open-response question was embedded at the end of the survey: this question gathered additional data by which to answer the research questions. each data set (one quantitative and one qualitative) was analyzed separately for each participant group, and then the groups were compared to develop a richer understanding of participant behavior. research questions the data collection and subsequent analysis attempted to answer the following questions: 1. which group(s) of library users prefer to be oriented to library services and resources through the interactive storytelling format, as compared to other formats? 2. which group(s) of library users are more likely to use library services and resources after participating in the interactive storytelling format of orientation? 3. what are user impressions of libgo, and are there any differences in impression based on the characteristics of the unique user group? participants participants for the study were recruited in-person and via the library website. in-person recruitment entailed the distribution of flyers and use of signage to recruit participants to play libgo in a library computer lab during a one-day event. online recruitment lasted approximately ten weeks and simply involved the placement of a link to libgo on the home page of th e library’s website. a total of 167 responses were gathered through both methods and participants were distributed as shown in table 1. information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 9 table 1. composition of study’s participants group number affiliation number of responses 1 undergraduate students 55 2 graduate students 62 3 faculty 13 4 staff 28 5 community members 9 total 167 for the purposes of statistical data analysis, groups 3 and 4 were combined to produce a single group of 41 university employee respondents; also, group 5’s data was not included in the statistical analysis due to the low number of participants. qualitative data for all groups, however, was included in the non-statistical analysis. 
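before any statistics were run, the five affiliations in table 1 were reduced to three analysis groups: faculty and staff were combined into a single university-employee group, and community members were excluded from the quantitative tests. a short pandas sketch of that recoding step, using hypothetical column names and toy rows rather than the study's actual data:

import pandas as pd

# hypothetical responses; the real study captured affiliation in the survey's second part
responses = pd.DataFrame({
    "affiliation": ["undergraduate", "graduate", "faculty", "staff", "community member"],
    "q1": [7, 6, 5, 6, 8],
})

# combine faculty and staff into one university-employee analysis group
responses["analysis_group"] = responses["affiliation"].replace(
    {"faculty": "university employee", "staff": "university employee"})

# community members are kept for qualitative review but dropped from the statistical tests
quantitative = responses[responses["analysis_group"] != "community member"]
print(quantitative["analysis_group"].value_counts())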
survey instrument a survey with twelve total questions was developed for this study and was administered online through qualtrics. after playing libgo, participants were asked to voluntarily complete the survey; if they agreed, they were redirected to the survey’s website. before answering any survey questions, the instrument administered an informed consent statement to participants . all aspects of the research, including the survey instrument, were approved through the university’s institutional review board (protocol number 18-1293). the first part of the survey (see appendix a) consisted of ten questions, each with a ten-point likert scaled response. the first five questions were each designed to measure a preference construct, and the next five questions each measured a likelihood construct. the pref erence construct referred to participant’s preference for a library orientation: did they prefer libgo’s online interactive storytelling format, or did they prefer another format such as in-person talks? the likelihood construct referred to the participant’s self-perceived likelihood of more readily engaging with the library in the future (both in-person and online) after playing libgo. the second part of the survey gathered the participant’s self-reported affiliation (see table 1 for the list of possible group affiliations) as well as offered participants an open-ended response area for optional qualitative feedback. data collection the study’s data was collected in two stages. in stage one, libgo was unveiled to library visitors during a special campus-wide week of student programming events. on the library’s designated event day, the researchers held a drop-in event at one of the library’s computer labs (see figure 4 for an example of event advertisement). library visitors were offered a prize bag and snacks if they agreed to play libgo and complete the survey. during the three-hour-long drop-in session, 58 individual responses were collected: the vast majority of these came from undergraduate students (51 responses), with additional responses from graduate students (n = 4), university staff employees (n = 2), and one community member responding. community members were defined as anyone not currently directly affiliated with the university; this group may have included prospective students or alumni. stage 2 began the following day after the library drop-in event, and simply involved the placement of a link to libgo on the home page of the library’s website. any visitor to the library’s website could click on the advertisement to be taken to libgo. this link information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 10 remained active on the library website for ten weeks, at which point the final data was gathered. a total of 167 responses were gathered during both stages and participants were distributed as previously shown in table 1. figure 4. example of student libgo event advertisement results quantitative findings statistical analysis of each of the ten quantitative questions required the use of one-way anova in spss. a post hoc test (hochberg’s gt2) was run in each instance to account for the different sample sizes. for all statistical analysis, only the data from undergraduates, graduate students, and university employees (a group which combined both faculty and staff results) were utilized. a listing of mean comparisons by group, for each of the ten survey questions, may be found in table 2. 
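the study ran its omnibus tests as one-way anovas in spss with hochberg's gt2 post hoc comparisons. as a rough, open-source re-creation of that workflow (not the authors' actual code), the sketch below computes the two construct scores, runs scipy's one-way anova for each question across the three analysis groups, and substitutes tukey's hsd from statsmodels for the post hoc step, since hochberg's gt2 is not available in the common python packages; the file and column names are hypothetical.

import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# hypothetical data file: one row per respondent, columns analysis_group and q1..q10 (10-point scale)
df = pd.read_csv("libgo_survey.csv")

# construct scores: preference = mean of q1-q5, likelihood = mean of q6-q10
df["preference"] = df[[f"q{i}" for i in range(1, 6)]].mean(axis=1)
df["likelihood"] = df[[f"q{i}" for i in range(6, 11)]].mean(axis=1)

groups = ["undergraduate", "graduate", "university employee"]
for question in [f"q{i}" for i in range(1, 11)]:
    samples = [df.loc[df["analysis_group"] == g, question] for g in groups]
    f_stat, p_value = f_oneway(*samples)  # omnibus one-way anova across the three groups
    print(question, round(f_stat, 3), round(p_value, 3))

# post hoc pairwise comparisons for one question (tukey hsd standing in for hochberg's gt2)
print(pairwise_tukeyhsd(df["q2"], df["analysis_group"]))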
the analysis of the one-way anovas yielded statistically significant results for three of the ten individual questions in the first part of the survey: questions 2, 3, and 6 (see table 3). table 2. descriptive statistics for survey results (10-point scale, with 10 as most likely) survey question mean for undergraduate students mean for graduate students mean for university employees 1. in considering the different ways to learn about walker library, do you find this library orientation game to be more or less preferable as compared to other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)? 7.02 6.39 6.02 2. in your opinion, was the library orientation game a useful way to get introduced to the library’s services and resources? 8.13 6.94 7.12 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 11 3. if your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own?) 7.38 5.94 5.98 4. please indicate your level of agreement with the following statement: “as compared to playing the game, i would have preferred to learn about the library’s resources and services by my own exploration of the library website?” 6.11 6.50 5.88 5. please indicate your level of agreement with the following statement: “as compared to playing the game, i would have preferred to learn about the library’s resources and services through an inperson orientation tour.” 6.11 5.08 5.76 6. after playing this orientation game, are you more or less likely to visit walker library in person? 8.27 6.94 6.90 7. after playing this library orientation game, are you more or less likely to use the walker library website to find out about the library (such as hours of operation, where to go to get different materials/services, etc.)? 7.82 6.97 7.20 8. after playing this library orientation game, are you more or less likely to seek help from a librarian at walker library? 6.95 6.58 6.63 9. after playing this library orientation game, are you more or less likely to use the library’s online resources (such as databases, journals, e-books)? 7.67 7.15 6.90 10. after playing this library orientation game, are you more or less likely to attend a library workshop, training, or event? 6.96 6.73 6.24 table 3. overall statistically significant group differences df f p w2 question 2 2 3.714 .027 .03 question 3 2 4.508 .012 .04 question 6 2 7.178 .001 .07 question 2 asked “in your opinion, was the library orientation game a useful way to get introduced to the library’s services and resources?” the one-way anova found that there was a statistically significant difference between groups (f(2,155) = 3.714, p = .027, ω2 = .03). the post hoc comparison using the hochberg’s gt2 test revealed that undergraduates were statistically significantly more likely to prefer libgo in this manner (m = 8.13, sd = 1.94, p = .031) as information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 12 compared to the graduate students (m = 6.94, sd = 2.72). there was no statistically significant difference between undergraduates and the university employees (p = .145). 
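the ω2 values reported with these anovas are omega-squared effect sizes. one common formulation for a one-way design, given here for reference rather than quoted from the article, is

\omega^{2} = \frac{SS_{\text{between}} - df_{\text{between}}\, MS_{\text{within}}}{SS_{\text{total}} + MS_{\text{within}}}

commonly cited benchmarks (roughly .01 small, .06 medium, .14 large) are consistent with the small and medium labels applied in the text that follows.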
according to criteria suggested by roger kirk, the effect size of .03 indicates a small effect in perceived usefulness of libgo as an introduction among undergraduates.35 question 3 asked “if your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)?” the one-way anova found that there was a statistically significant difference between groups (f(2, 155) = 4.508, p = .012, ω2 = .04). the post hoc comparison using the hochberg’s gt2 test found that undergraduates were statistically significantly more likely to prefer libgo over other orientation options (m = 7.38, sd = 2.49, p = .021) as compared to graduate students (m = 5.94, sd = 3.06). there was no statistically significant difference between undergraduates and university employees (p = .053). the effect size of .04 indicates a small effect regarding undergraduate preference for libgo versus other orientation options. question 6 asked “after playing this library orientation game, are you more or less likely to visit walker library in person?” the one-way anova found that there was a statistically significant difference between groups (f(2,155) = 7.178, p = .001, ω2 = .07). the post hoc comparison using the hochberg’s gt2 test revealed that undergraduates were statistically significantly more likely to visit the library after playing libgo (m = 8.27, sd = 2.09, p = .003) as compared to graduate students (m = 6.94, sd = 2.20). additionally, the test found that undergraduates were statistically significantly more likely to visit the library after playing libgo (p = .007) as compared to university employees (m = 6.90, sd = 2.08). according to criteria suggested by kirk, the effect size of .07 indicates a medium effect regarding undergraduate potential to visit the library in person after playing libgo. 36 in addition to testing each individual survey question, tests were run to understand the possible group differences by construct (preference and likelihood). the preference construct was an aggregate of survey questions 1-5, and the likelihood construct was an aggregate of survey questions 6-10. for both constructs, the one-way anova found results which were not statistically significant. in all, the quantitative findings indicated three areas by which the experience of playing libgo was more helpful for the surveyed undergraduates than the other surveyed groups (i.e., graduate students or university employees). at this point, the analysis turned to the qualitative data so as to better understand participant views of libgo. qualitative findings analysis of the qualitative results was limited to the data collected in the survey’s final question. question 12 was an open-response area, and was intentionally prefaced with a vague prompt: “do you have any final thoughts for the library (suggestions, additions, modification, comments, criticisms, praise, etc.)?” of the 167 total survey responses, 67 individuals chose to answer this question. preliminary analysis showed that the feedback derived from this question covered a spectrum of topics, ranging from remarks on the libgo experience itself to broader concerns regarding other library services. open coding strategies were utilized to interpret the content of participant responses. 
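once each open-ended response segment has been assigned a code and a respondent group, the cell counts reported later in tables 4 and 5 are simple tallies. a minimal sketch of that tally, with invented (group, code) pairs rather than the study's coded data:

from collections import Counter

# hypothetical (group, code) pairs produced by open coding of the open-ended responses
coded_concerns = [
    ("undergraduate", "positive feedback"),
    ("graduate", "libgo improvement tip"),
    ("graduate", "library building feedback"),
    ("staff", "negative feedback"),
]

# per-cell counts by group and code, as in table 4
cell_counts = Counter(coded_concerns)
for (group, code), n in sorted(cell_counts.items()):
    print(f"{group:15s} {code:30s} {n}")

# row totals per code across all groups
print(Counter(code for _, code in coded_concerns))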
under this methodology, the responses were evaluated for general themes and then coded and grouped information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 13 under a constant comparative approach.37 nvivo 12 software was used to code all 67 participant responses. initial coding yielded eight open codes, but these were later consolidated into six final codes (see table 4). one code (libgo improvement tip) was rather nuanced and yielded five axial codes (see table 5). axial codes denoted secondary concerns which fell under a larger category of interest. although some participants gave longer feedback which addressed multiple concerns, care was taken to segregate each distinct concern to a specific code. therefore, it is important to note that some comments addressed multiple concerns, and so the total number of concerns (n = 76) is greater than the total number of individuals responding to the prompt (n = 67). table 4. distribution of qualitative codes by user group code undergraduate graduate faculty staff community member total # concerns positive feedback 7 7 1 4 2 21 negative feedback 1 2 0 3 0 6 in-person tour preference 2 3 0 1 0 6 libgo improvement tip 5 11 1 3 3 23 library services feedback 2 4 3 0 0 9 library building feedback 1 7 1 2 0 11 total: 18 34 6 13 5 76 discussion of qualitative themes positive feedback (21 separate concerns). affirmative comments regarding libgo were primarily split between undergraduate and graduate students, with a small number of comments coming from the other groups. although all groups stated that the game was helpful, one undergraduate wrote “i wish i would’ve received this orientation at the very beginning of the year!” a graduate student declared “this was a creative way to engage students, and i think it should be included on the website for fun.” both community members commented on the utility of libgo in providing an orientation without having to physically come to the library; for example, “interactive without having to actually attend the library in person which i liked.” additionally, a community member pointed out the instructional capability of libgo, writing “i think i learned more from the game than walking around in the library.” negative feedback (6 separate concerns). unfavorable comments regarding libgo primarily challenged the orientation’s characterization as a “game” in terms of its lack of fun. one graduate student wrote a comment representative of this concern by stating, “the game didn’t really seem like a game at all.” a particularly searing comment came from a university staff member who information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 14 wrote, “calling this collection of web pages an ‘interactive game’ is a stretch, which is a generous way of stating it.” in-person tour preference (6 separate concerns). a small number of concerns indicated a preference for in-person orientations versus online. one undergraduate cited the ability to ask questions during an in-person tour as an advantage of that delivery medium. a graduate student mentioned their desire for kinesthetic learning over an online approach, writing, “i prefer hands on exploration of the library.” libgo improvement tip (23 separate concerns). suggested improvements to libgo were the largest area of qualitative feedback and produced five axial themes (subthemes); see table 5 for a breakdown of the five axial themes by group. 1. 
design issues were the largest cited area of improvement, and the most commonly mentioned design problem was the inability of the user to go back to previously seen content. although this functionality did in fact exist, it was apparently not intuitive to users; design modifications in future iterations are therefore critical. other users made suggestions as to the color scheme used and the ability to magnify image sizes. 2. user experience was another area of feedback, and primarily included suggestions on how to make libgo a more fun experience. one graduate student offered a role-playing game alternative. another graduate student expressed an interest in a game with side missions, in addition to the overall goals, where tokens could be earned for completed missions; the student justified these changes by stating “i feel that incorporating these types of idea will make the game more enjoyable.” in suggesting similar improvements, one undergraduate stated that libgo “felt more like a quiz than a game.” 3. technology issues primarily addressed two related issues: images not loading and broken links. images not loading could be dependent on many factors, including the user’s browser settings, internet traffic (volume) delaying load time, or broken image links, among others. broken links could be the root issue since the images used in libgo were taken from other areas of the library website. this method of gathering content pointed out a design vulnerability of using existing image locations (controlled by non-libgo developers) rather than images exclusively for libgo. 4. content issues were raised exclusively by graduate students. one student felt that libgo placed an emphasis on physical spaces in the library and did not give a deep enough treatment to library services. another graduate student asked for “an interactive map to click on so that we physically see the areas” of the library, thus making the interaction more user-friendly with a visual. 5. didn’t understand purpose is a subtheme where improvement is needed and is based on two comments made by the two university staff members. one wrote that “an online tour would have been better and just as informative,” although libgo was not only designed to be an online tour of the library, but also an orientation of the library’s services. the other staff member wrote, “i read the rules but it was still unclear what the objective was.” in all, it is clear that libgo’s purpose was confusing for some. information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 15 table 5. libgo improvement tip axial codes by user group axial code undergraduate graduate faculty staff community member total # concerns design 4 3 0 0 1 8 user experience 1 2 1 0 1 5 tech issue 0 1 0 1 0 2 content 0 5 0 0 1 6 didn’t understand purpose 0 0 0 2 0 2 total: 5 11 1 3 3 23 library services feedback (9 separate concerns). several participants took the opportunity to provide feedback on general library services rather than on libgo itself. undergraduates simply gave general positive feedback about the value of the library, but many graduate students gave recommendations regarding specific electronic resource improvements. additionally, one graduate student wrote, “i think it is critical to meet with new graduate students before they start their program,” something the library used to do but had not pursued in recent years. 
although these comments did not directly pertain to libgo, the authors accepted all of them as valuable feedback to the library. library building feedback (11 separate concerns). this was another theme in which graduate students dominated the comments. feedback ranging from requests for microwave use, additional study tables and better temperature control in the building appeared. several participants asked for greater enforcement of quiet zones. like the library services feedback, the authors again took these comments as helpful to the overall library rather than libgo. discussion the results of this study indicated that some groups of library visitors better received the gamified library orientation experience than other groups. undergraduate students indicated the largest appreciation for a library orientation via libgo. specifically, they demonstrated a statistically significant difference over the other groups in supporting libgo’s usefulness as an orientation tool, a preference for libgo over other orientation formats, and a likelihood of future use of the physical library after playing libgo. these very encouraging results provide evidence for the efficacy of alternative means of library orientation. the qualitative results provided additional helpful insight regarding the user impressions from each of the five surveyed groups. this feedback demonstrated that a variety of groups benefited from the experience of playing libgo, including some community members who appreciated libgo as a means of becoming acclimated to the library without having to enter the building. a virtual orientation format was not ideal for a few players who indicated a preference for a face-toface orientation due to the ability to ask questions. many people identified areas of improvement for libgo. graduate students in particular offered a disproportionate number of suggestions as compared to the other groups. while they provided a great deal of helpful feedback, it is possible that graduate students were so distracted by the perceived problems that they could not fully take in the experience or gain value from libgo’s orientation purpose. it is also very likely that libgo information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 16 simply was not very fun for these players: several players noted that it did not feel like a game but rather a collection of content. the review of literature indicated that this amusement issue is a common pitfall of educational games. although the authors tried to design an enjoyable orientation experience, it is possible that more work is needed to satisfy user expectations. the mixed-methods design of this study was instrumental in providing a richer understanding of user perceptions. while the statistical analysis of participant survey responses was very helpful in identifying clear trends between groups, the qualitative analysis helped the authors draw valuable conclusions. specifically, the open-response data demonstrated that additional groups such as graduate students and community members appreciated the experience of playing libgo; this information was not readily apparent through the statistical analysis. additionally, the qualitative analysis demonstrated that many groups had concerns regarding areas of improvement that may have impaired their user experience. these important findings could help guide future directions of the research. 
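as a side note on the tallying behind tables 4 and 5 above, the following sketch shows, in a simplified and hypothetical form, how coded concerns can be counted by code and by user group. it is not the authors' nvivo workflow, and the sample records are invented; it only illustrates why the concern total (n = 76) can exceed the respondent total (n = 67) when one response carries several coded concerns.

// illustrative sketch only: counting coded concerns by code and by group.
// each respondent can contribute more than one coded concern, so the sum
// of all cells exceeds the number of respondents.
const codedResponses = [
  { group: "undergraduate", codes: ["positive feedback", "libgo improvement tip"] },
  { group: "graduate", codes: ["library building feedback"] },
  // ... one entry per respondent (n = 67), with invented values here
];

const tally = {}; // tally[code][group] = number of concerns
let totalConcerns = 0;
for (const response of codedResponses) {
  for (const code of response.codes) {
    tally[code] = tally[code] || {};
    tally[code][response.group] = (tally[code][response.group] || 0) + 1;
    totalConcerns += 1;
  }
}

console.log(`${totalConcerns} concerns from ${codedResponses.length} respondents`);
console.table(tally);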
in all, the authors concluded this phase of the research feeling satisfied that libgo showed great promise for library orientation delivery but could benefit from continued development and future user assessment. although undergraduate students seemed most receptive overall to a virtual orientation experience, other groups appeared to have benefited from the resource. study limitations a primary limitation of this study was its small sample size. as the entire university campus was targeted for participation in the study, the number of respondents was far too small to generalize the results. despite this limitation however, the study’s population reflected many different groups of library patrons on campus. the findings are therefore valuable as a means of stimulating future discussion regarding the value of alternative library orientation methods utilizing gamification. another limitation is that the authors did not pre-assess the targeted groups for their prior knowledge of walker library services and building layout, nor for their interest in learning about these topics. it is possible that various groups did not see the value in learning about the library for a variety of reasons. faculty members, in particular, may have considered their prior knowledge adequate for navigating the electronic holdings or building layout without recognizing the value of the other many services offered physically and electronically by the library. all groups may have experienced a level of “library anxiety” that prevented them from being motivated to learn more about the library.38 it is difficult to understand the range of covariate factors without a pre-assessment. finally, there was qualitative evidence supporting the limitation that libgo did not properly convey its stated purpose of orientation rather than imparting research skills. without understanding libgo’s focus on library orientation, users could have been confused or disappointed by the experience. although care was taken to make this purpose explicit, some users indicated their confusion in the qualitative data. this observed problem points to a design flaw that undoubtedly had some bearing on the study’s results. conclusion & future research convinced of the importance of the library orientation, the authors sought to move this traditional in-person experience to a virtual one. the quantitative results indicated that the gamified information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 17 orientation experience was useful to undergraduate students in its intended purpose of acclimating users to the library, as well as encouraging their future use of the physical library. at a time in which physical traffic to the library has shown a marked decline, new outreach strategies should be considered.39 the results were also helpful in showing that this particular iteration of the gamified orientation was preferred over other delivery methods by undergraduate students, as compared to other groups, to a statistically significant level. this is an important finding as it demonstrates that a diversified outreach strategy is necessary: different groups of library patrons desire their orientation information in different formats. the next logical question to ask however is: why did the other groups examined through the statistical data analysis (graduate students and faculty) not appreciate the gamified orientation to the same level as undergraduates? 
the answers to this question are complicated and may be explained in part by the qualitative analysis. based upon those findings, it is possible that the game did not appeal to these groups on the basis of fun or enjoyment; this concern was specifically mentioned by graduate students. faculty members, including staff, provided a smaller level of qualitative feedback; it is therefore difficult to speculate as to their exact reasons for disengagement with libgo. with this concern in mind, the authors would like to concentrate their next iteration of research on the specific library orientation needs of graduate students and faculty. both groups present different, but critical, needs for outreach. graduate students were the largest group of survey respondents, presumably indicating a high level of interest in learning more about the library. many graduate programs at mtsu are delivered partially or entirely online; as a result, these students may be less likely to come to campus. due to graduate students’ relatively infrequent visits to campus, a virtual library orientation could be even more meaningful for them in meeting their need for library services information. faculty are another important group to target because if they lack a full understanding of the library’s offerings, they are unlikely to assign assignments that wholly utilize the library’s services. although it is possible that faculty prefer an in-person orientation, many new faculty have indicated limited availability for such events. a virtual orientation seems conducive to busy schedules. however, it is possible that the issue is simply a matter of marketing: faculty may not know that a virtual option is available, nor do they necessarily understand all that the library has to offer. in all, future research should begin with a survey to understand what both groups already know about the library, as well as the library services they desire. another necessary step in future research would be the expansion of the development team to include computer programmers. although the authors feel that libgo holds great promise as a virtual orientation tool, more needs to be done to enhance the user’s enjoyment of the experience. twine is a user-friendly software that other librarians could pick up without having to be computer programmers; however, programmers (professional or student) could bring a design expertise to the project. future iterations of this project should incorporate the skills of multiple groups, including expertise in libraries, user research, visual design, interaction design, programming, marketing, and testers from each type of intended audience. collectively, this group will have the greatest impact on improving the user experience and ultimately the usefulness of a gamified orientation experience. this experience with gamification, and specifically interactive storytelling, was a valuable experience for walker library. these results should encourage other libraries seeking an alternate information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 18 delivery method for orientations. the authors hope to build upon the lessons learned from this mixed methods research study of libgo to find the correct outreach medium for their range of library users. acknowledgments special thanks to our beta playtesters and student assistants who worked the libgo event, which was funded, in part, by mt engage and walker library at middle tennessee state university. 
information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 19 appendix a: survey instrument information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 20 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 21 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 22 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 23 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 24 endnotes 1 sandra calemme mccarthy, “at issue: exploring library usage by online learners with student success,” community college enterprise 23, no. 2 (january 2017): 27–31; angie thorpe et al., “the impact of the academic library on student success: connecting the dots,” portal: libraries and the academy 16, no. 2 (2016): 373–92, https://doi.org/10.1353/pla.20160027. 2 steven ovadia, “how does tenure status impact library usage: a study of laguardia community college,” journal of academic librarianship 35, no. 4 (january 2009): 332–40, https://doi.org/10.1016/j.acalib.2009.04.022. 3 chris leeder and steven lonn, “faculty usage of library tools in a learning management system,” college & research libraries, 75, no. 5 (september 2014): 641–63, https://doi.org/10.5860/crl.75.5.641. 4 kyle felker and eric phetteplace, “gamification in libraries: the state of the art,” reference and user services quarterly 54, no. 2 (2014): 19-23, https://doi.org/10.5860/rusq.54n2.19; nancy o’hanlon, karen diaz, and fred roecker, “a game-based multimedia approach to library orientation,” (paper, 35th national loex library instruction conference, san diego, may 2007), https://commons.emich.edu/loexconf2007/19/; leila june rod-welch, “let’s get oriented: getting intimate with the library, small group sessions for library orientation,” (paper, association of college and research libraries conference, baltimore, march 2017), http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/201 7/letsgetoriented.pdf. 5 kelly czarnecki, “chapter 4: digital storytelling in different library settings,” library technology reports, no. 7 (2009): 20-30; rebecca j. morris, “creating, viewing, and assessing: fluid roles of the student self in digital storytelling,” school libraries worldwide, no. 2 (2013): 54–68. 6 sandra marcus and sheila beck, “a library adventure: comparing a treasure hunt with a traditional freshman orientation tour,” college & research libraries 64, no. 1 (january 2003): 23–44, https://doi.org/10.5860/crl.64.1.23. 7 lori oling and michelle mach, “tour trends in academic arl libraries,” college & research libraries, 63, no. 1 (january 2002): 13-23, https://doi.org/10.5860/crl.63.1.13. 8 kylie bailin, benjamin jahre, and sarah morriss, “planning academic library orientations: case studies from around the world,” (oxford, uk: chandos publishing, 2018): xvi. 9 bailin, jahre, and morriss, “planning academic library orientations.” 10 marcus and beck, “a library adventure”; a. carolyn miller, “the round robin library tour,” journal of academic librarianship 6, no. 4 (1980): 215–18; michael simmons, “evaluation of library tours,” edrs, ed 331513 (1990): 1-24. 
11 marcus and beck, “a library adventure”; oling and mach, “tour trends”; rod-welch, “let’s get oriented.” https://doi.org/10.1353/pla.20160027 https://doi.org/10.1016/j.acalib.2009.04.022 https://doi.org/10.5860/crl.75.5.641 https://commons.emich.edu/loexconf2007/19/ http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/letsgetoriented.pdf http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/letsgetoriented.pdf https://doi.org/10.5860/crl.64.1.23 https://doi.org/10.5860/crl.63.1.13 information technology and libraries september 2020 applying gamification to the library orientation | reed and miller 25 12 pixey anne mosley, “assessing the comfort level impact and perceptual value of library tours,” research strategies 15, no. 4 (1997): 261–70, https://doi.org/10.1016/s07343310(97)90013-6. 13 mosley, “assessing the comfort level impact and perceptual value of library tours.” 14 marcus and beck, “a library adventure,” 27. 15 kenneth j. burhanna, tammy j. eschedor voelker, and jule a. gedeon, “virtually the same: comparing the effectiveness of online versus in-person library tours,” public services quarterly 4, no. 4(2008): 317–38, https://doi.org/10.1080/15228950802461616. 16 burhanna, voelker, and gedeon, “virtually the same,” 326. 17 burhanna, voelker, and gedeon, “virtually the same,” 329. 18 felker and phetteplace, “gamification in libraries.” 19 felker and phetteplace, “gamification in libraries,”20. 20 felker and phetteplace, “gamification in libraries.” 21 felker and phetteplace, “gamification in libraries”; o’hanlon et al., “a game-based multimedia approach.” 22 mary j. broussard and jessica urick oberlin, “using online games to fight plagiarism: a spoonful of sugar helps the medicine go down,” indiana libraries 30, no. 1 (january 2011): 28–39. 23 melissa mallon, “gaming and gamification,” public services quarterly 9, no. 3 (2013): 210–21, https://doi.org/10.1080/15228959.2013.815502. 24 j. long, “chapter 21: gaming library instruction: using interactive play to promote research as a process,” distributed learning (january 1, 2017), 385–401, https://doi.org/10.1016/b978-008-100598-9.00021-0. 25 rod-welch, “let’s get oriented.” 26 o’hanlon et al., “a game-based multimedia approach.” 27 mallon, “gaming and gamification.” 28 anna-lise smith and lesli baker, “getting a clue: creating student detectives and dragon slayers in your library,” reference services review 39, no. 4 (november 2011): 628–42, https://doi.org/10.1108/00907321111186659. 29 monica fusich et al., “hml-iq: frenso state’s online library orientation game,” college & research libraries news 72, no. 11 (december 2011): 626–30, https://doi.org/10.5860/crln.72.11.8667. 
30 broussard and oberlin, “using online games”; fusich et al., “hml-iq”; o’hanlon et al., “a game-based multimedia approach.” 31 felker and phetteplace, “gamification in libraries.” 32 felker and phetteplace, “gamification in libraries”; fusich et al., “hml-iq.” 33 “design thinking for libraries: a toolkit for patron-centered design,” ideo (2015), http://designthinkingforlibraries.com. 34 john w. creswell and vicki l. plano clark, designing and conducting mixed methods research (thousand oaks, ca: sage publications, 2007). 35 roger kirk, “practical significance: a concept whose time has come,” educational and psychological measurement, no. 5 (1996). 36 kirk, “practical significance.” 37 sandra mathison, “encyclopedia of evaluation,” sage, 2005, https://doi.org/10.4135/9781412950558. 38 rod-welch, “let’s get oriented.” 39 felker and phetteplace, “gamification in libraries.”
public libraries leading the way utilizing technology to support and extend access to students and job seekers during the pandemic daniel berra information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.13261 daniel berra (danielb@pfulgervilletx.gov) is assistant director, pflugerville (texas) public library. © 2021. “public libraries leading the way” is a regular column spotlighting technology in public libraries. the ongoing pandemic necessitated a reimagining of public library services and resources. out of this challenge rose opportunities to better serve the needs of our communities during the pandemic and beyond. when our library first closed our doors to the public last march, we began discussions on how the needs of our community have changed. we identified two key groups for whom the pandemic had forced an uncomfortable shift: students suddenly thrust into virtual learning and adults who had lost their jobs. while we continue to serve all members of our community in a variety of ways, we looked to increase support for these specific groups utilizing available technology. like many public libraries, the pflugerville public library quickly shifted our service model to include virtual programs, curbside pickup, library cards issued remotely, and a focus on electronic resources. our community is rapidly growing and diverse.
many of our nearly 70,000 residents are frequent users of library services, attend our wide array of programs, hold meetings, study or work inside the building and enjoy both the physical and virtual library collection. the pandemic shift required our talented staff to find ways to provide a similar level of service to a community who heavily utilizes the library. for both students and job seekers, we took steps to alleviate some of the difficulties the building’s closure caused by utilizing existing technology. we worked with the city’s it department to extend the library’s wi-fi to cover the entire parking lot, allowing for 24-hour access. we also utilized our existing print from your own device system to allow library users to submit print jobs and then pick them up through our curbside service. we added additional wi-fi hotspots available for checkout to ensure access at home for those lacking internet. since these services were already offered to some degree, the expansion of access was relatively easy to implement. for students we drew upon our existing relationship with the pflugerville independent school district (pfisd) to provide support and extend access. we expanded the offering of our special digit cards, which allow students to sign up for an account giving them access to all of our electronic resources and wi-fi hotspots. the school district’s librarians handle the signups and then submit the forms so we can set up the accounts then contact students by email or phone. we further extended access to ebooks by working with the district and our vendor overdrive, to provide a direct way for students to browse and check out through the district’s own ebook app. this allows students to seamlessly see both of our collections, significantly increasing their reading options and removing barriers to access. on the support front, we utilized a portion of the city’s cares act funds directed toward the library to launch a live, virtual tutoring service called brainfuse helpnow. students of all ages have anonymous access to tutors from home seven days a week, as well as additional homework mailto:danielb@pfulgervilletx.gov information technology and libraries march 2021 utilizing technology to support and extend access | berra 2 support resources. this piece meshes nicely with some of our virtual programming for teens, like our sat and act practice tests and other testand career-preparation e-resources. recognizing the pandemic’s impact on the economy, and how this directly affects our community, we worked to prioritize support for the unemployed and under-employed. we added a resume review/job-search coaching service led by two of our circulation staff members. we utilized another portion of our cares act funds to offer career online high school, providing adults with access to an online program to obtain their high school diploma. we also began lending laptops for home use to ensure access to necessary technology. some of our support was already in place before the pandemic began, and we made a significant marketing push to highlight these e-resources. for instance, we partner with the pflugerville community development corporation to provide the online training resource lynda.com (soon to be linkedin learning). we saw a large increase in usage particularly in the first few months of the pandemic as community members looked to add employable skills to their toolboxes. 
we also created a page on our website with all of our job search assistance resources and services highlighted in one place. while the main emphasis of these efforts utilizes technology, serving the needs of the entire community also requires supporting those who are generally less connected. we have to balance our digital expectations with something more tangible, recognizing many library users still utilize the library in a more traditional way. for students, our senior youth services librarian partnered with pfisd for a book give away in conjunction with the district’s food distribution program to get books in the hands of children for the summer. we also began distributing “care kits” through our curbside service that include personal grooming products and cold weather gear for anyone in need. while 2020 featured the addition of many new services or significant expansion of existing ones, we are focused in 2021 on increasing our marketing efforts for these offerings. relying too heavily on digital forms of communication can limit the impact of our services. for instance, if we want to let people who do not have access to the internet at home know we have wi-fi hotspots and laptops available for checkout, then spreading the word through our standard methods of social media, website, and email will prove ineffective. with the building currently closed to the public, we face an additional barrier to communication. to help alleviate some of this, we have created a job search assistance flyer that we are distributing at places like local food pantries. we plan to expand on similar methods of marketing throughout the year. while positive feedback is often hidden from libraries since we prioritize patron privacy and anonymity, we have received a few specific stories that highlight our impact. our firs t scholarship recipient for career online high school shared how the opportunity to obtain her high school diploma will open up new professional avenues and erase the stigma of having not completed high school. another community member who took advantage of our job search coaching to prepare for an interview expressed gratitude to the library staff who helped increase his employment chances. we also see resumes and homework assignments printed through our virtual printing service, hear from parents with children utilizing hotspots for virtual schooling, see cars in the parking lot using the extended wi-fi and track statistics showing a large increase in the usage of our electronic resources. https://library.pflugervilletx.gov/services/assistance-for-job-seekers information technology and libraries march 2021 utilizing technology to support and extend access | berra 3 the ongoing pandemic necessitated a re-imagining of library services. the needs of our community changed and we set out to find ways to provide assistance to those who need it the most utilizing technology, while remaining mindful of those who are not as comfortable in the digital age. the combination of utilizing technology to address the current needs and expanding access to this technology, has allowed us to better serve the community. we are in the process now of evaluating all of our changes to determine which ones will continue even after the pandemic ends. we already know that we will keep our methods of extending access like the expanded wi-fi availability, laptops for checkout, digit cards for students and the seamless connection to our ebook collection for pfisd. 
in the area of support, we will continue to offer career online high school, brainfuse helpnow for virtual tutoring, and our resume review/job search coaching service. public libraries are well positioned to innovate and adjust to changes in society. it is one of the things we do extremely well, out of necessity, but also out of a deep desire to serve our communities. all of the shifts the pflugerville public library made related to supporting students and job seekers drew upon existing technology and available resources. what changed was the areas on which we chose to focus our efforts. by prioritizing support and access while pinpointing the needs of the moment, we found ways to better serve our community within the context of everything else we provide. while the jury is still out on how successful some of these initiatives will prove, we already know that many of these changes will continue long after the pandemic ends. june_ital_fifarek_final president’s message: for the record aimee fifarek information technologies and libraries | june 2017 1 this is my final column as lita president. having just finished the 2016/17 annual report, i must admit i’m a little tapped out. over the last year i’ve written on the events of an ala annual and midwinter conferences, a lita forum, a new strategic plan, information ethics, and advocacy. even for an english major and a librarian that’s a lot of words. as i work with executive director jenny levine and the rest of the lita board to prepare the agenda for our meetings at annual, the temptation is to focus on all the work that is yet to be done. but with the end of school and fiscal years approaching, it is the ideal time to celebrate everything that has been accomplished over the last 12 months. first off, at some magical point during the year we completed the lita staff transition period. jenny has truly made the executive director position her own, and although she and mark beatty have more than enough work for six people, they are well on their way to guiding lita to a bright new future. with her knowledge of the inner workings of ala and her desire to make everything easier, faster and better, jenny is truly the right person for this job. next, we have a great new set of people coming in to lead lita. andromeda yelton is going to be a fabulous lita president. she is an eloquent speaker, has more determination than anyone i know, and is a kick ass coder to boot. bohyun kim has an amazing talent for organizing and motivating people, and as president-elect work wonders with the new appointments committee. our new directors-at-large lindsay cronk, amanda goodman, and margaret heller are all devoted litans who will be great additions to the board. i’m glad i get to work with them all in their new roles as i transition to past-president. and last but certainly not least we have started to make inroads on our advocacy and information policy strategic focus. the privacy interest group has already raised lita’s profile by supplementing ala’s intellectual freedom committee’s privacy policies with privacy checklists.1 a group of board members along with office for information technology policy liaison david lee king and advocacy coordinating committee liaison callan bignoli are working on a new task force proposal to outline strategies for effectively collaborating with the ala washington office. these are just the first steps towards a future in which lita is not only relevant but necessary. 
with all that hard work accomplished, it must be time to toast to our successes. i hope that everyone who will be at ala annual in chicago (http://2017.alaannual.org/) later this month will join us as we conclude our 50th anniversary year. sunday with lita promises to be amazing, with aimee fifarek (aimee.fifarek@phoenix.gov) is lita president 2016-17 and deputy director for customer support, it and digital initiatives at phoenix public library, phoenix, az. president’s message | fifarek https://doi.org/10.6017/ital.v36i2.10019 2 hugo award winner kameron hurley (http://www.kameronhurley.com) speaking at the president’ program, followed by what is sure to be a spectacular lita happy hour at the beer bistro (http://www.thebeerbistro.com/). we are still working on our goal to raise $10,000 for professional development scholarships. we’re only halfway there, so please donate at: https://www.crowdrise.com/lita-50th-anniversary. being lita president during the association’s 50th anniversary year has been both an honor and a challenge. during a milestone year like this you become acutely aware of all of the hard work and innovation that was required for the association to thrive for half a century, and feel more than a little pressure to leave an extraordinary legacy that will ensure another fifty years of success. it’s a tall order, especially in an era of rapid political and societal change. but as i navigated through my presidential year i realized that i didn’t have to do anything more than ensure that people who already want to work hard for the greater good have a welcoming place to do just that. after fifty years, lita still has the thing that made it a success in the first place: a core group of volunteers committed to the belief that new technologies can empower libraries to do great things. the talented and passionate people i have worked with on the board, in the committee and interest group leadership, and throughout the membership are the best legacy that an association can have. now more than ever the people in libraries who “do tech” can be leaders in their communities and on the national stage. now more than ever it is lita’s time to shine. references 1. http://litablog.org/2017/02/new-checklists-to-support-library-patron-privacy/ 10738 20190318 galley determining textbook cost, formats, and licensing with google books api: a case study from an open textbook project eamon costello, richard bolger, tiziana soverino, and mark brown information technology and libraries | march 2019 91 eamon costello (eamon.costello@dcu.ie) is assistant professor, open education at dublin city university. richard bolger (richard.bolger@dcu.ie) is lecturer at dublin city university. tiziana soverino (tiziana.soverino@dcu.edu) is researcher at dublin city university. mark brown (mark.brown@dcu.ie) is full professor of digital learning, dublin city university. abstract the rising cost of textbooks for students has been highlighted as a major concern in higher education, particularly in the us and canada. less has been reported, however, about the costs of textbooks outside of north america, including in europe. we address this gap in the knowledge through a case study of one irish higher education institution, focusing on the cost, accessibility, and licensing of textbooks. we report here on an investigation of textbook prices drawing from an official college course catalog containing several thousand books. 
we detail how we sought to determine metadata of these books including: the formats they are available in, whether they are in the public domain, and the retail prices. we explain how we used methods to automatically determine textbook costs using google books api and make our code and dataset publicly available. introduction the cost of textbooks is a hot topic for higher education. it has been reported that by 2014 the average student spent $1,200 annually on textbooks.1 another study claimed that between 2006 and 2016 the costs of college textbooks increased over four times the cost of inflation.2 despite this rise in textbook costs, a survey of more than 3,000 us faculty members (“the babson survey”) found that almost every course (98 percent) mandated a textbook or related study resources.3 one response to the challenge of rising textbook costs is open textbooks. open textbooks are a type of open educational resource (oer). oers have been defined as “teaching, learning, and research resources that reside in the public domain or have been released under an intellectual property license that permits their free use and repurposing by others. open educational resources include full courses, course materials, modules, textbooks, streaming videos, tests, software, and any other tools, materials, or techniques used to support access to knowledge.”4 oers stem from the principle that access to education is a human right and that, as such, education should be accessible to all.5 hence an open textbook is made available under terms which grant legal rights to the public, not only to use, but also to adapt and redistribute. creative commons licensing is the most prevalent and well-developed intellectual property licensing tool for this purpose. open textbook projects aimed at promoting publishing and redistributing open textbooks, both in digital and print formats, have been growing. for example, the bcampus project in canada began in 2012 with the aim of creating a collection of open textbooks aligned with the most popular subject areas in british columbia.6 the project has shown strong growth, with over 230 open digital textbooks now available and more than forty institutions involved. a significant recent determining textbook cost, formats, and licensing | costello, bolger, soverino, and brown 92 https://doi.org/10.6017/ital.v38i1.10738 development in open textbooks occurred in march 2018, when the us congress announced a $5 million investment in an open textbook initiative.7 in addition to helping change institutional culture, and challenge attitudes to traditional publishing models, one of the most oft-cited benefits of open textbooks is cost savings. according to the college board’s survey of colleges, the average annual cost to us undergraduate students in 2017 for textbooks and materials was estimated at $1,250.8 this figure is remarkably close to the aforementioned figure of $1,200 a year, as reported by baglione and sullivan. however, there is little known about the monetary face value of books that students are expected to buy, beyond studies based on self-reported data. students themselves in the us have attempted to at least open the debate in this area by highlighting book price disparities.9 nonetheless, they only report on a very small number of books, and the college board representing on-campus us textbook retailers have disputed their results for this reason, claiming that they have been selective in the book prices they have chosen. 
hence this study seeks to address the gap that exists in knowledge about the true cost of textbooks in higher education. this is in the context of a wider research project we are conducting on open textbooks in ireland.10 determining the cost of books is not straightforward as books can be new, used, rental, or digital subscription. however, the cost of new books does set a baseline for other forms, particularly rental and used books. our aim here is hence to start with new books, by analyzing costs of all the required and recommended textbooks of one higher education institution (hei) in ireland. the overarching research question this study sought to address is: what is known about the currently assigned textbooks in an irish university? the sub-questions were: • rq1: what is the extent of textbooks that are required reading? • rq2: what are the retail costs of textbooks? • rq3: are textbooks available in digital or e-book form? • rq4: are textbooks available in the public domain? the next section outlines our methodology and how we sought to find answers to these questions. methods in this section we describe our approach, the dataset generated, and the methods we used to analyze the data. we identified a suitable data source comprising the official course catalog of a hei in ireland with more than ten thousand students. in the course catalog faculty give required and recommended textbook details for all courses. this information is freely accessible on the website of the hei; the course catalog is powered by a software system known as akari (http://www.akarisoftware.com/). akari is a proprietary software system used by several heis in and outside ireland to create and manage academic course catalogs. the course team gained access to a download of all books recorded in the database of the course catalog (figure 1). in this catalog, fields are provided for lecturers to input information for students about books such as title, international standard book number (isbn), author, and publisher. following manual and automated data cleansing, 3,014 unique records of books were created. due to the large number of books, at this stage we sought a programmatic solution for finding out more information about these books. information technology and libraries | march 2019 93 figure 1. course catalog screenshot. we initially thought that isbns might prove the best way to accurately reconcile records of books. however, many isbns were incomplete or mistyped. moreover, many instructors simply did not enter an isbn. given the capacity for errors in the data—for instance, some lectures simply entered “i will tell you in class” in the book title field—we required a tool that could handle fuzzy search queries, e.g. cases where a book title or author were misspelled. the tool we selected was the google books application programming interface (api).11 this api provides an interface to the google books database of circa thirty million books. the service, like the main google search engine, is forgiving of queries that are mistyped or misspelled. hence, we constructed a query based on a combination of author name, book title, and publisher. following experimentation, we determined that these three search terms together allowed us to find books with a high degree of accuracy whilst also accounting for possible spelling errors. determining textbook cost, formats, and licensing | costello, bolger, soverino, and brown 94 https://doi.org/10.6017/ital.v38i1.10738 figure 2. system design. 
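to make the query construction described above concrete, the sketch below shows one way such a request could be issued. this is not the authors' released middleware (that code is available through the github and zenodo links they cite); it is a minimal, hypothetical example that assumes node.js 18 or later for the built-in fetch and uses the intitle, inauthor, and inpublisher qualifiers documented for the public google books volumes endpoint. the sample record values are illustrative only.

// illustrative sketch: querying the google books api with a combined
// title/author/publisher search; the forgiving, fuzzy matching happens
// on google's side.
const BASE = "https://www.googleapis.com/books/v1/volumes";

function buildQuery({ title, author, publisher }) {
  const parts = [];
  if (title) parts.push(`intitle:"${title}"`);
  if (author) parts.push(`inauthor:"${author}"`);
  if (publisher) parts.push(`inpublisher:"${publisher}"`);
  return parts.join(" ");
}

async function searchBook(record) {
  const url = `${BASE}?q=${encodeURIComponent(buildQuery(record))}&maxResults=1`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`google books returned http ${response.status}`);
  const data = await response.json();
  return data.totalItems > 0 ? data.items[0] : null; // null when nothing matched
}

// example call with one (illustrative) row from a course reading list
searchBook({
  title: "psychiatric and mental health nursing",
  author: "phil barker",
  publisher: "crc press", // hypothetical catalog value
}).then((volume) => console.log(volume ? volume.volumeInfo.title : "not found"));

because the whole query string is url-encoded and matched loosely by google's search, records with small spelling errors in the catalog can still resolve to the intended volume, which is the behavior the authors describe relying on.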
we then wrote a custom javascript middleware program deployed in the google cloud platform. this program parsed the file of the book search queries, passed them to the google books api as search requests and saved the results. the api returned results in javascript object notation (json) format. json is a modern web language for describing data. it is related to javascript and can be used to translate objects in the javascript programming language into textual strings. it is used as a replacement for xml as it is arguably more human readable and is considerably less verbose. we then imported this json into a mongodb database to filter and clean the data, before finally exporting them to excel for statistical analysis. mongodb is a document store database that natively stores objects in the json format and allows for efficient querying of the data. the google books api provides some key metadata on books aside from the usual author, publisher, isbn, edition, pages, etc. as it gives prices for selected books. google draws this information from its own e-book store which contains over three million books and a network of resellers who sell print and digital versions of the books. in addition to price, google books also contains information on accessible versions of books, digital/e-pub versions, pdf versions, and whether the book is in the public domain. we have published a release of this dataset and all of our code to the software repository github. we then used the zenodo platform to generate a digital object identifier (doi) for the code.12 one of the functions of the zenodo platform is to allow for code to be properly cited and referenced. we published our code in this way for others interested in replicating this work in other contexts. in the next section we will provide an analysis of the results of our queries. results after extracting and processing the data from the course catalog and google platforms, we obtained 3,030 unique course names and in these courses we found over 15,414 books listed. required versus recommended reading from the course catalog data, we found that 11,022 (71.5 percent) books were required readings and the remaining 4,392 (28.5 percent) were recommended. information technology and libraries | march 2019 95 upon cleaning and removing duplicates and missing data, we identified 3,014 books that could be queried using the google books api. querying the api returned results for 2,940 books, i.e. it found 97 percent of the books and only seventy-four books could not be found. the google books api returns information in json format. figure 3 below shows an example of the json information returned for one book. { "volumeinfo" : { "title" : "psychiatric and mental health nursing", "authors" : [ "phil barker" ], "industryidentifiers" : [ { "type" : "isbn_13", "identifier" : "9781498759588" }, { "type" : "isbn_10", "identifier" : "1498759580" } ], "imagelinks" : { "smallthumbnail" : "http://books.google.com/books/content?id=btsocgaaqbaj&printsec=frontcover&img=1&zo om=5&edge=curl&source=gbs_api" } }, "saleinfo" : { "isebook" : true, "retailprice" : { "amount" : 62.39, "currencycode" : "usd" } }, "accessinfo" : { "publicdomain" : false, "pdf" : { "isavailable" : true } } } figure 3. sample of book information returned by google books api. digital formats and public domain license figure 4 shows the numbers of pdf (1,219) and e-book (1,016) versions of books reported to be available. eight hundred and fifty-four were available in both pdf and e-book format. 
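the second half of the pipeline described above, taking the returned json and loading the fields of interest into mongodb, can be sketched in a similarly hypothetical way. the field names follow the public api response (and match the sample record shown in figure 3 below); the connection string, database, and collection names are placeholders rather than details taken from the authors' system.

// illustrative sketch: flattening the fields of interest from a google
// books volume and inserting the results into mongodb.
const { MongoClient } = require("mongodb"); // assumed dependency: npm install mongodb

function extractFields(volume) {
  const info = volume.volumeInfo || {};
  const sale = volume.saleInfo || {};
  const access = volume.accessInfo || {};
  return {
    title: info.title,
    authors: info.authors || [],
    isbns: (info.industryIdentifiers || []).map((id) => id.identifier),
    isEbook: sale.isEbook === true,
    retailPrice: sale.retailPrice ? sale.retailPrice.amount : null,
    currency: sale.retailPrice ? sale.retailPrice.currencyCode : null,
    pdfAvailable: access.pdf ? access.pdf.isAvailable === true : false,
    publicDomain: access.publicDomain === true,
  };
}

async function saveVolumes(volumes) {
  const client = new MongoClient("mongodb://localhost:27017"); // placeholder uri
  await client.connect();
  try {
    const books = client.db("textbook_study").collection("books"); // hypothetical names
    await books.insertMany(volumes.map(extractFields));
  } finally {
    await client.close();
  }
}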
from the determining textbook cost, formats, and licensing | costello, bolger, soverino, and brown 96 https://doi.org/10.6017/ital.v38i1.10738 total of 2,940 individual books listed their availability was as follows: figure 4. availability of 2,940 books in digital formats and public domain license. as per figure 4, only 0.18 percent (six) of the books had a version available in the public domain according to google books. cost results the google books api only returned prices for 596 (20 percent) of the books that we searched for. within that sample, the cost ranged from $0.99 to over $452, as illustrated in figure 5. the median price of a book was $40, and the mean price was $56.67. as there are on average 3.96 books per course, this implies an average cost to students of $224.41 per course taken. as students take an average of 8.05 courses per year, this further implies a cost per year of $1,806.50 per student if they were to buy new versions of all the books. 1,219 (39.73% ) 1,016 (34.56% ) 6 (0. 18%) 0 500 1000 1500 2000 2500 pdf ebook openpdf e-book public domain information technology and libraries | march 2019 97 figure 5. summary of book prices (n = 596). discussion and conclusion we have demonstrated that it is possible to programmatically search and determine the prices of large numbers of books. we used this information to attempt to estimate the full economic cost of books to students on average in an irish hei. we are still actively developing this tool and encourage others to use and even contribute to the code which we have published with the dataset. this proof of concept tool may allow stakeholders with an interest in book costs for students to quickly get real data on large numbers of books. ultimately, we hope that this will help highlight the costs of many textbooks. our findings also highlight relatively low levels of digital book availability. very few books were found to be in the public domain. a limitation of this research is that there are issues around the coverage of google books and its index policies or algorithms. in a literature review of research articles about google books in 2017, fagan pointed out that the coverage of google books is “hit and miss.”13 in 2017, google books included about thirty million books, though google did not release specific details on its database, as emphasized by fagan. it is known that content includes digitized collections from over forty libraries, and that us and englishlanguage books are overrepresented.14 furthermore, google books is only returning results for books that are in the public domain and cannot tell us if books are made available through open licenses such as creative commons. accepting such caveats, however, we have found the google books api to be a very useful tool for answering questions about large numbers of books in a systematic way and hope that our findings can help others. the prices that we derived in this study were for new books only. however, the new book prices provide a baseline for all other prices, e.g. 
a used book or a loan book price will be relative to a new book price and library budgets will need to take account of new book prices.15 further study is required to determine a more realistic figure for the cost of textbooks and the next phase of our 0 50 100 150 200 250 300 350 400 450 500 1 16 31 46 61 76 91 10 6 12 1 13 6 15 1 16 6 18 1 19 6 21 1 22 6 24 1 25 6 27 1 28 6 30 1 31 6 33 1 34 6 36 1 37 6 39 1 40 6 42 1 43 6 45 1 46 6 48 1 49 6 51 1 52 6 54 1 55 6 57 1 58 6 d ol la rs cost in usd books determining textbook cost, formats, and licensing | costello, bolger, soverino, and brown 98 https://doi.org/10.6017/ital.v38i1.10738 wider open textbook research projects involves interviews and focus groups with students to better understand the lived reality of their relationship with textbooks.16 references 1 stephen l. baglione and kevin sullivan, “technology and textbooks: the future,” american journal of distance education 30, no. 3 (aug. 2016): 145-55, https://doi.org/10.1080/08923647.2016.1186466. 2 etan senack and robert donoghue, “covering the cost: why we can no longer afford to ignore high textbook prices,” report, the student pirgs (feb. 2016), www.studentpirgs.org/textbooks. 3 elaine allen and jeff seaman, “opening the textbook: educational resources in u.s. higher education, 2015-16,” report, babson survey research group (july 2016), https://www.onlinelearningsurvey.com/reports/openingthetextbook2016.pdf. 4 william and flora hewlett foundation (2019), http://www.hewlett.org/programs/education-program/open-educational-resources. 5 2012 paris oer declaration, http://www.unesco.org/new/fileadmin/multimedia/hq/ci/wpfd2009/english_declaratio n.htm. 6 mary burgess, “the bc open textbook project,” in open: the philosophy and practices that are revolutionizing education and science, rajiv s. jhangiani and robert biswas-diener (eds.). (london: ubiquity pr., 2017): 227–36. 7 nicole allen, “congress funds $5 million open textbook grant program in 2018 spending bill,” sparc open (mar. 20, 2018), https://sparcopen.org/news/2018/open-textbooks-fy18/. 8 jennifer ma et al., “trends in college pricing,” report, the college board (oct. 2017), https://trends.collegeboard.org/sites/default/files/2017-trends-in-college-pricing_0.pdf. 9 kaitlyn vitez, “open 101: an action plan for affordable textbooks,” report, student pirgs (jan. 2018), https://studentpirgs.org/campaigns/sp/make-textbooks-affordable. 10 mark brown, eamon costello, and mairéad nic giolla mhichíl, “from books to moocs and back again: an irish case study of open digital textbooks,” in exploring the micro, meso and macro. proceedings of the european distance and e-learning network 2018 annual conference, genova, 17-20 june, 2018 (budapest: the european distance and e-learning network): 206-14. 11 google books api (2018), https://developers.google.com/books/docs/v1/reference/volumes. 12 eamon costello and richard bolger, “textbooks authors, publishers, formats and costs in higher education,” bmc research notes 12, no. 1 (jan. 2019): 12-56, https://doi.org/10.1186/s13104-019-4099-1. information technology and libraries | march 2019 99 13 jody condit fagan, “an evidence-based review of academic web search engines, 2014-2016: implications for librarians’ practice and research agenda,” information technology and libraries 36, no. 2 (mar. 2017): 7-47, https://doi.org/10.6017/ital.v36i2.9718. 14 ibid. 15 anne christie, john h. 
pollitz, and cheryl middleton, “student strategies for coping with textbook costs and the role of library course reserves,” portal: libraries and the academy 9, no. 4 (oct. 2009): 491-510, http://digital.library.wisc.edu/1793/38662. 16 eamon costello et al., “textbook costs and accessibility: could open textbooks play a role?” proceedings of the 17th european conference on elearning (ecel), vol. 17 (athens, greece: 2018): 99-106. editorial board thoughts: events in the life of ital sharon farnel information technology and libraries | june 2018 4 sharon farnel (sharon.farnel@ualberta.ca) is metadata coordinator, university of alberta libraries. at the end of june 2018, i will be ending my time on the ital editorial board. during my term i have had the opportunity to write several “from the board” pieces and have very much enjoyed the freedom to explore a library technology topic of choice. this time around i would like to examine ital as seen through crossref’s event data service. crossref launched its event data service in beta in 2017; production service was announced in late march of this year. event data is “an open data service that registers online activity (specifically, events) associated with crossref metadata. event data will collect and store a record of any activity surrounding a research work from a defined set of web sources. the data will be made available as part of our metadata search service or via our metadata api and normalised across a diverse set of sources. data will be open, audit-able and replicable.”1 using dois as a basis, event data captures information on discussions, citations, references and other actions on wikipedia, twitter, and other services. i thought it might be interesting to see what the crossref event data might say about ital. i used the event data api2 to pull event data using the prefix for all ojs journals hosted by boston college (10.6017). i then used openrefine3 to filter out all non-ital records and then began further examining the data. the data was gathered on may 9, 2018. in total, 313 events were captured. of these, 193 events were from wikipedia, 110 from twitter, and 5 each from the lens (patent citations) and wordpress blogs. the 313 events are associated with 38 ital articles, the earliest from 1973 (volume 6, number 1, from ital’s digitized archive), and the most recent from 2018 (volume 37, number 1). the greatest number of events (126) are associated with an article from volume 25, number 1 (2006) on rfid in libraries.4 the other articles are associated with a varying number of discrete events, from one to 24. looking more closely at the events associated with the 2006 article on rfid, all 126 events are references in wikipedia. these represent references to the english and japanese language wikipedia articles on radio frequency identification. other references from wikipedia are to articles on open access, fast (faceted application of subject terminology), library 2.0 , biblioteca 2.0, and others. what about that article from 1973? it was written by j. j. dimsdale and titled “file structure for an on-line catalog of one million titles.” the abstract provides a tantalizing glimpse into the content: “a description is given of the file organization and design of an on-line catalog suitable for automation of a library of one million books. a method of virtual hash addressing allows rapid search of the indexes to the catalog file. 
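the api pull described in the column can be illustrated with a short, hypothetical sketch. it is not the workflow actually used; it assumes the current public query endpoint, the obj-id.prefix filter, and cursor-based paging as documented for crossref event data, plus node.js 18 or later for the built-in fetch. the mailto address is a placeholder requested by crossref for polite use.

// illustrative sketch: pulling all events whose object doi falls under a
// prefix (here, 10.6017 for the boston college-hosted ojs journals) and
// tallying them by source, roughly as reported in the column.
const ENDPOINT = "https://api.eventdata.crossref.org/v1/events";

async function eventsForPrefix(prefix, mailto) {
  const events = [];
  let cursor = "";
  do {
    const url =
      `${ENDPOINT}?obj-id.prefix=${prefix}&rows=1000` +
      `&mailto=${encodeURIComponent(mailto)}` +
      (cursor ? `&cursor=${cursor}` : "");
    const page = await (await fetch(url)).json();
    events.push(...page.message.events);
    cursor = page.message["next-cursor"]; // null when there are no more pages
  } while (cursor);
  return events;
}

eventsForPrefix("10.6017", "example@example.org").then((events) => {
  const bySource = {};
  for (const e of events) {
    bySource[e.source_id] = (bySource[e.source_id] || 0) + 1;
  }
  console.log(events.length, "events", bySource);
});

the resulting records could then be filtered down to ital dois (for example in openrefine, as described above) before looking at individual articles.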
storage of textual material in a compressed f orm allows considerable reduction in storage costs.”5 mailto:sharon.farnel@ualberta.ca events in the life of ital | farnel 5 https://doi.org/10.6017/ital.v37i2.10460 there are only four events associated with this 1973 article, but interestingly all are from the lens,6 a global patent search database. these are a set of related patents, by mayers and whiting, for data compression apparatus and methods.7 there are 110 events associated with twitter, with tweets from 15 different users. the largest number of events, 21, begins with aaron tay, 8 a librarian and blogger from singapore management university, tweeting about a 2016 ital article9 on user expectations of library discovery products, which was then retweeted 20 times. the two next most-tweeted articles (17 tweets/retweets each) discuss privacy and user experience in library discovery 10 and “reference rot” in etd (electronic theses & dissertations) repositories. 11 what value can such a brief examination of this small set of data from a very new service provide to ital authors, or the editorial board? it can certainly provide a glimpse of who might be accessing ital articles, and how, and perhaps provide some hints as to ways to increase the reach of the journal. this kind of data is not a replacement for download counts or bibliographic citation patterns, but can complement them and add another layer to our understanding of the place of ital in the library technology community and beyond. as ital continues to thrive and as services like event data continue to improve, i look forward to seeing what story this data continues to tell! references the event data used for this analysis can be found at https://bit.ly/2kgdjcm. 1 madeleine watson, “event data: open for your interpretation,” crossref blog, february 25, 2016, https://www.crossref.org/blog/event-data-open-for-your-interpretation/. 2 crossref, event data user guide, https://www.eventdata.crossref.org/guide/. 3 openrefine, http://openrefine.org/. 4 jay sing, navjit brar and carmen fong, “the state of rfid applications in libraries,” information technology and libraries 25 no. 1, 2006, https://doi.org/10.6017/ital.v25i1.3326. 5 j. j. dimsdale, “file structure for an on-line catalog of one million titles,” information technology and libraries 6, no. 1, 1973, https://doi.org/10.6017/ital.v6i1.5760. 6 the lens, https://www.lens.org/. 7 clay mayers and douglas whiting. data compression apparatus and method using matching string searching and huffman encoding. us patent 5532694, filed july 7, 1995, and issued july 2, 1996. 8 aaron tay, https://twitter.com/aarontay. 9 irina trapido, “library discovery products: discovering user expectations through failure analysis,” information technology and libraries 35, no. 3, 2016, https://doi.org/10.6017/ital.v35i3.9190. https://bit.ly/2kgdjcm https://www.crossref.org/blog/event-data-open-for-your-interpretation/ https://www.eventdata.crossref.org/guide/ http://openrefine.org/ https://doi.org/10.6017/ital.v25i1.3326 https://doi.org/10.6017/ital.v6i1.5760 https://www.lens.org/ https://twitter.com/aarontay https://doi.org/10.6017/ital.v35i3.9190 information technology and libraries | june 2018 6 10 shayna pekala, “privacy and user experience in 21st century library discovery,” information technology and libraries 36, no. 2, 2017, https://doi.org/10.6017/ital.v36i2.9817. 
11 mia massicotte and kathleen botter, “reference rot in the repository: a case study of electronic theses and dissertations (etds) in an academic library,” information technology and libraries 36, no. 1 (2017), https://doi.org/10.6017/ital.v36i1.9598.
letter from the editor: september 2021
kenneth j. varnum
information technology and libraries | september 2021
https://doi.org/10.6017/ital.v40i3.13859
in the editorial section of this issue, we have two columns to share. the september editorial board thoughts essay is by paul swanson, “building a culture of resilience in libraries,” which reflects on the lessons of covid-driven flexibility and suggests that a culture of resilience in our libraries will help us adapt more easily to these and other emerging changes we will inevitably encounter. that is followed by carole williams’ public libraries leading the way column, “delivering: automated materials handling for staff and patrons,” in which she discusses the effects of an automated materials handling system on both the staff and patrons of the charleston county (sc) public library. in peer-reviewed content, we have a diverse set of articles on a range of topics: bias mitigation in metadata; accessibility of pdf documents; two articles on automated classification of different kinds of texts; two articles with lessons learned from our abrupt move to remote service; and a case study on the importance of product ownership.
1. mitigating bias in metadata: a use case using homosaurus linked data / juliet hardesty and allison nolan
2. accessibility of tables in pdf documents: issues, challenges and future directions / nosheen fayyaz, shah khusro, and shakir ullah
3. text analysis and visualization research on the hetu dangse during the qing dynasty of china / zhiyu wang, jingyu wu, guang yu, and zhiping song
4. topic modeling as a tool for analyzing library chat transcripts / hyunseung koh and mark fienup
5. expanding and improving our library’s virtual chat service: discovering best practices when demand increases / parker fruehan and diana hellyar
6. a rapid implementation of a reserve reading list solution in response to the covid-19 pandemic / matthew black and susan powelson
7. product ownership of a legacy institutional repository: a case study on revitalizing an aging service / mikala narlock and don brower
kenneth j. varnum, editor
varnum@umich.edu
september 2021
article
stateful library analysis and migration system (slam): an etl system for performing digital library migrations
adrian-tudor pănescu, teodora-elena grosu, and vasile manta
information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.12035
adrian-tudor pănescu (tudor@figshare.com) is software engineer, figshare. teodora-elena grosu (teodora@figshare.com) is software engineer, figshare. vasile manta (vmanta@tuiasi.ro) is professor, faculty of automatic control and computer engineering, gheorghe asachi technical university of iași, romania. © 2021.
abstract
interoperability between research management systems, especially digital libraries or repositories, has been a central theme in the community in recent years, with the discussion focused on means of enriching, linking, and disseminating outputs. this paper considers a frequently overlooked aspect, namely the migration of records across systems, by introducing the stateful library analysis and migration system (slam) and presenting practical experiences with migrating records from dspace and digital commons repositories to figshare.
introduction
bibliographic record repositories are a central part of the research venture, playing a key role in both the dissemination and preservation of outcomes such as journal articles, conference papers, theses and dissertations, monographs, and, more recently, datasets. as the ecosystem of which they are a part has evolved at a sustained pace in the last decade, repositories have also had to adapt while ensuring uninterrupted service to the research community. nevertheless, a number of developments, both at the local repository level and at a more general, global scale, have created the necessity of considering the complete replacement of certain systems with new repository solutions that are better suited to their stakeholders’ requirements. the following are a few such developments:
• the need to consolidate both technological solutions and operational teams, in order to reduce running costs and provide a unified experience for end users, the research personnel.1
• various policies require researchers to provide not only traditional outputs, such as journal articles or conference papers, but also the datasets and other materials backing up scientific claims.
for repositories, this means both adapting to larger amounts of stored data as well as ensuring that the metadata dissemination and preservation mechanisms are suited for the new output types (e.g., while full-text search is a common feature of literature repositories, it cannot be easily applied to numeric datasets).2 • apart from extending the set of stored outputs, policies have also created new requirements for existing record types. for example, the research excellence framework (ref) in the uk mandates monitoring open access (oa) publishing of research articles; thus, institutional repositories are no longer only a facilitator of green open access (selfarchiving of records) but also a means of monitoring compliance.3 this requires the implementation of new logic in existing repositories, which can frequently be difficult, especially when faced with legacy repository code bases or insufficient technological resources. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 2 • commercial, contractual, or leadership changes can also create the need to replace repository systems, due to uncertainty (see the acquisition of bepress by elsevier) or preference for certain platforms.4 while these developments can generate the requirement to switch repositories in a very short span of time, such a venture needs to be properly planned and executed in order to ensure, on the one hand, that no records are lost or corrupted and, on the other hand, that minimal or no downtime is caused. ideally, migrations would also be an opportunity to curate and enrich the existing corpus by consolidating and correcting bibliographic records. between 2018 and 2019 the research team has performed six digital library migrations from various source repository solutions (dspace, digital commons, custom in-house built systems) to the figshare software as a service (saas) repository platform. for this purpose, slam, an extract, transform, load (etl) system, was developed and successfully employed in order to migrate over 80,000 records. this article describes the rationale behind slam, its design and implementation, and the practical experiences with employing it for repository migrations. a number of future enhancements and open problems are also discussed. motivation and background of slam in early 2018 figshare started considering the suitability of its repository platform for storing content which is usually specific to institutional repositories (journal articles, theses, monographs), along with non-traditional research outputs (datasets or scientific software).5 while feature-wise this was validated by its hosted preprint servers, a new challenge was posed, as stakeholders choosing to use figshare as an institutional repository also had to transfer all content from their existing systems.6 thus, in the first half of 2018, a first migration was performed, transferring records from a bepress digital commons (dc) repository (https://www.bepress.com/products/digitalcommons/) to figshare (https://figshare.com). from a technical point of view, a python (https://www.python.org/) script was developed for this migration; this script parsed a commaseparated values (csv) report produced by dc which contained all metadata and links to the record files.7 using this information, records were created on the figshare repository using its application programming interface (api) (https://api.figshare.com). 
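the article does not reproduce that first script, but its shape is easy to sketch. the snippet below is an illustrative reconstruction only, not the authors’ code: the csv column names are hypothetical, and the figshare endpoint and payload fields are assumptions that would need to be checked against the current public api documentation.

```python
# Illustrative reconstruction of the naive, stateless 2018 migration script
# described above (not the authors' code). CSV column names are hypothetical;
# the figshare endpoint and payload fields are assumptions to verify against
# the current API documentation.
import csv

import requests

FIGSHARE_API = "https://api.figshare.com/v2/account/articles"
HEADERS = {"Authorization": "token PERSONAL_ACCESS_TOKEN"}  # placeholder token


def migrate(report_path: str) -> None:
    with open(report_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # 1. Create the record on the target repository from the exported metadata.
            payload = {
                "title": row["title"],
                "description": row["abstract"],
                "authors": [{"name": n.strip()} for n in row["authors"].split(";")],
            }
            response = requests.post(FIGSHARE_API, json=payload, headers=HEADERS)
            response.raise_for_status()
            print("created record for", row["title"], response.json())  # response shape assumed

            # 2. Fetch each file from the source repository and attach it.
            for file_url in filter(None, row["file_links"].split("|")):
                content = requests.get(file_url).content
                # Attaching files to a figshare record is a multi-step upload
                # (initiate, send parts, complete); elided here for brevity.


if __name__ == "__main__":
    migrate("digital_commons_report.csv")
```

a stateless, single-pass loop of this shape is quick to write, but it keeps no record of what it has already done, which is the root of the problems described next.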
while this migration succeeded, the naive technical solution presented a number of issues: • difficulties with the metadata crosswalk: while a crosswalk was initially set up, mostly based on the definition of the fields in the source and target repositories’ metadata schema, issues were discovered while migrating the records, mainly generated by inconsistencies in the values of the fields across the corpus. these issues were fixed on a case-by-case basis, in order to ensure a lossless migration, but it would have been preferable to surface them in the early phases, in order to have the migration script mitigate the issues in the final run. • running the migration procedure multiple times: the migration script followed mostly an all or nothing approach, which, at each run, fully migrated all records between repositories. this is undesirable, as there was a need to run the script only for those records that failed to migrate (due, for example, to metadata crosswalk issues). after the full migration was completed, there was also a need to apply only some minor corrections to records, without following the full procedure. this was not possible, since the script would recreate all records to migrate from scratch on the target repository, as it did not have any memory of information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 3 previous runs. this issue was also amplified by the fact that in the source repository records did not have any type of persistent identifier attached. thus, additional scripts, which only performed the corrections, had to be developed. • ability to run the migration procedure with minimal supervision: like most migrations, this instance considered a large number of records (over 10,000) and, ideally, the process would run with minimal supervision operators. while the script partially accomplished this, the need for better fault-tolerance and enhanced logging was identified. given the lessons learned from the initial attempt and the requirement that five additional migrations were to be completed between october 2018 and december 2019, a more robust alternative to the naive migration script was required. this alternative had to adhere to three design principles: 1. reusability: the system should be usable for multiple migrations without extensive additions or modifications. thus, it should be able to adapt to the workflows of multiple repositories, metadata schemas, and other concerns specific to each migration. 2. statefulness: in software engineering, programs can either discard knowledge of past transactions or preserve it, allowing previous results and operations to be revisited. migration systems benefit from a stateful architecture, as the system should be able to perform the same migration multiple times, without creating duplicate records on the target repository, while allowing for incremental record improvements with each run. apart from allowing for corrections to be applied post-migration, this would also support the prototyping phase (where multiple test migrations are performed in order to validate the metadata crosswalks), that no information is lost, and other general workflow aspects. 3. fault tolerance: the system should implement fault tolerance mechanisms at all levels, allowing it to run migrations of large corpora with minimal supervision and, at the same time, implement sufficient logging and exception handling to allow operators to identify and correct potential issues. 
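to make the statefulness principle concrete, the following is a minimal sketch of a persistent per-record, per-step registry; re-running the migration then retries only the steps that have not yet succeeded. this is an illustration of the principle under stated assumptions (sqlite storage, invented step names), not slam’s actual implementation.

```python
# Minimal illustration of the statefulness principle: a persistent per-record,
# per-step registry so that repeated runs skip work that already succeeded.
# Not SLAM's actual implementation; table and step names are invented.
import sqlite3

STEPS = ("metadata", "files", "usage_stats", "pid_update")


class MigrationState:
    def __init__(self, path: str = "migration_state.db"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS state ("
            " record_id TEXT, step TEXT, done INTEGER DEFAULT 0,"
            " PRIMARY KEY (record_id, step))"
        )

    def is_done(self, record_id: str, step: str) -> bool:
        row = self.db.execute(
            "SELECT done FROM state WHERE record_id = ? AND step = ?",
            (record_id, step),
        ).fetchone()
        return bool(row and row[0])

    def mark_done(self, record_id: str, step: str) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO state (record_id, step, done) VALUES (?, ?, 1)",
            (record_id, step),
        )
        self.db.commit()


def migrate_record(record_id: str, state: MigrationState, handlers: dict) -> None:
    """Run only the steps that did not complete in a previous run."""
    for step in STEPS:
        if state.is_done(record_id, step):
            continue  # already migrated in an earlier run; do not duplicate work
        handlers[step](record_id)  # if this raises, the step simply stays pending
        state.mark_done(record_id, step)
```

a registry of this kind is what later allows corrective runs to touch only the records that failed or need amending, rather than recreating the whole corpus.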
several repository migrations are represented in the literature. in van tuyl et al., the authors describe the process of moving from a dspace (https://duraspace.org/dspace) to a samvera (https://samvera.org) system, while in the study from do van chau records were migrated from a solution developed in house to dspace.8 both instances offer valuable insight into the challenges posed by digital library migrations, especially at the level of bibliographic metadata; on the other hand, both works are focused mostly on a specific use-case and do not propose general technical solutions for other migrations. it is interesting to note that the migration presented by van tuyl et al. required two and a half years of work, while slam was employed to carry five migrations in 14 months. the bridge2hyku toolkit (https://bridge2hyku.github.io/toolkit) is a collection of tools, including a module for the hyku repository solution (https://hyku.samvera.org), aimed at facilitating the import of records into digital libraries based on this software. similar to slam, it includes an analysis component, useful for surfacing and correcting potential metadata issues during the migration. slam provides two major improvements over this solution, namely it defines a generic architecture that can be used for migrating records between any two repositories, while also defining a procedural migration workflow to create a robust, fault-tolerant, and extensible solution. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 4 pygrametl (http://chrthomsen.github.io/pygrametl/) and petl (https://github.com/petldevelopers/petl) are two open-source frameworks which allow the defining of etl workflows; similar to slam, the processing steps are defined using python functions. these projects are targeted towards tabular and numeric data, making them unsuitable for the transfer of files and metadata across bibliographic repositories. singer (https://www.singer.io/) is an etl framework similar in design to slam, which allows the composing of various data sources (or taps) and targets, in order to move data between them. the two downsides of this implementation are that it is focused on processing data specified in the javascript object notation (json) format, which is not always available for bibliographic metadata, and that it does not facilitate extending the pipeline with, for example, the analysis facilities targeted by slam. hevo data (https://hevodata.com/), pentaho kettle (https://github.com/pentaho/pentahokettle) and talend open studio (https://www.talend.com/products/talend-open-studio/) are etl frameworks which employ graphical interfaces to allow users to define the processing workflows. while such functionality was not initially identified as a requirement for our planned migration projects, during testing it became obvious that providing such an interface could bring value by having repository administrators be more involved in defining and validating the processing applied to bibliographic records, as the administrators possess the most knowledge of the organisation of the repositories. a downside of the three solutions is that their usage requires commercial agreements, which did not line up with the business requirements of the considered migrations. 
in their work, tešendić and boberić krstićev use the pentaho suite in order to implement the etl component of a business intelligence (bi) solution for reporting on bibliographic records.9 while the structure of the etl processing is different—the authors being mostly interested only on certain aspects of the metadata—this work provides insights into the types of analysis that could be performed while migrating records. slam’s design and implementation following the design principles previously mentioned, slam’s architecture was devised as presented in figure 1; as for most etl systems, the easiest way of understanding its operation is by examining the data flow. the migration workflow proceeds by extracting all the required information from the source repository. this could be achieved in multiple ways, such as harvesting through an oai-pmh (https://www.openarchives.org/pmh/) endpoint or other types of api, using the bulk export functionality implemented by most repository systems, or even by crawling the html markup describing records, similar to what search engines do in order to discover web pages. once this mechanism has been established, practical experience proves that it is beneficial to move this raw data closer to the destination repository (to a staging area as depicted in figure 1). while this transfer might prove cumbersome, especially for large corpora, it is required only once. moreover, having the data close to the destination repository allows faster prototyping and testing of the migration procedure, as network latency and throughput are improved, while also ensuring that the source repository’s functioning is not affected in any manner. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 5 figure 1. main components and data flow in slam. areas in light blue are currently under development, while the components highlighted in green need to be adapted for each migration. the system splits the data to be migrated into four logical slices: bibliographic metadata, record files (e.g., pdfs of journal articles), persistent identifiers of records (pids, such as digital object identifiers or handles), and usage data (views and downloads). metadata is the first aspect to be considered. from the migration point of view, two dimensions are considered: the syntax and the semantics. metadata comes in various formats, such as csv or extensible markup language (xml) files, but most of these can be easily parsed by openly available software solutions. of more interest are the semantics of the metadata, which stem from the employed schemas or ontologies of field definitions; examples include dublin core (https://www.dublincore.org) or datacite (https://schema.datacite.org). a schema crosswalk, which describes how the fields in the target repository schema should be populated using the source data, needs to be set up when transferring records. while this should not be a concern if information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 6 the two repositories use the same schema, for the performed migrations (described below) this was not the case. other reasons for setting up such a crosswalk include • loosely defined schema in at least one of the repositories: certain repository systems do not specify a schema with clear field definitions, validations or applicability. 
by having the source repository administrators help with setting up a crosswalk, the migration team can avoid issues caused by incomplete understanding of the metadata. • support for the review of bibliographic records: migrations can prove to be an opportunity for reviewing and amending the records’ metadata; for example, infrequently used fields can be completely removed, and values which tend to confuse end users can be moved to other fields. • ensuring that a record on how the migration was performed, from the metadata point of view, is maintained. the crosswalk is considered an artefact of the migration and is preserved for future reference. in slam, the crosswalk is tested using elasticsearch, “an open-source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.”10 the setup uses the crosswalk to create elasticsearch documents which include all fields as they would be transferred to the destination repository. a kibana (https://www.elastic.co/products/kibana) dashboard is then used to inspect the records’ metadata and perform structured searches across the corpus. this can allow, for example, discovering fields which do not follow a consistent pattern for the values, as seen in figure 2. as the crosswalk includes, apart from the field mapping, altering operations that can be performed on each field, this analysis can facilitate the review process described by the second point above. while performing actual migrations, a number of inconsistencies that the source repository administrators were unaware of were surfaced by slam and corrected in the target repository. this is commonplace especially in large corpora spanning decades, where the repository metadata workflows and schemas changed multiple times. two points should be noted about this component: • this is the only component of the architecture for which we mention an actual solution chosen for the practical implementation, namely elasticsearch. while other solutions could have been chosen, such as the ones included in the bridge2hyku toolkit, elasticsearch proved to be the best fit for a highly automated system which requires analysis capabilities; it is a production-grade solution which can index a high number of documents and support complex queries, while also providing user-friendly analytical views via kibana. • there are arguments for loading the metadata in the analysis component without having it processed through the crosswalk; such a workflow could provide further insights into various issues in the corpus which are possibly obscured by the crosswalk. our practical experiences did not fully justify this requirement, while the actual implementation provided a mean to test the crosswalk, a major migration component; nevertheless, we are still considering the possibility of having to load the raw metadata for analysis in future migrations. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 7 figure 2. a view examining the possible values of the temporal coverage field from the dublin core schema in an institutional repository corpus to be migrated. this shows variation in the format of the values (full date, year only) which can cause issues when migrating to a schema which applies strict validation on date/time values, and thus need to be handled by the migration harness. 
this view is generated using kibana from the elasticsearch stack, employed by slam for metadata analysis purposes. with the crosswalk set up, the migration module can be completed. from a logical point of view, it comprises of four components: 1. metadata processing: this component uses the crosswalk in order to transfer the metadata to the target repository. 2. file upload: this simply uploads all files associated to a bibliographic record to their new locations. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 8 3. usage data transfer: most repositories implement counters for views and downloads of records, and this information, if available, is also transferred to the target repository. 4. persistent identifier update: if the records are using persistent identifiers, such as digital object identifiers (dois) (https://doi.org/) or handles (http://handle.net/), these are updated to resolve to the new locations in the target repository. while employing slam for migrations, cases in which persistent identifiers were not employed on source repositories were encountered, with records being accessible only via uniform resource locators (urls). as these cannot always be transferred across repositories, because each software uses its own url schema, it is advisable to implement persistent identifiers before migrations. figure 3. a simplified process diagram describing the steps required for migrating a bibliographic record. each successful operation is recorded in a persistent database which is used in subsequent runs for resuming the workflow. for example, files will not be uploaded each time the script is run, thus avoiding duplication. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 9 one of the architectural goals of slam is statefulness and this is implemented at this level, the migration module being designed as a state machine. a trivial example of such a state machine is shown in figure 3. the state machine status is serialised in a persistent database, with each migration run deserializing it in order to understand which operations still need to be applied for each record. maintaining such a registry provides several other benefits: • facilitates testing and prototyping: this was the original reason behind the architecture, useful especially before the metadata analysis functionality was implemented. if one of the operations required for transferring a record fails, subsequent runs will not apply all steps, but only the ones that did not complete. as for each record a separate state section is maintained, this becomes especially useful when migrating multiple entries; records which failed to migrate can be easily isolated and subsequently reprocessed. • allows creating reports on the migration: these are used, for example, to validate that all records were indeed transferred to the target repository. • allows the migration module to be portable: if the state machine serialisation is accessible, the module can run from different locations and at different points in time. the first architectural principle previously presented relates to the reusability of slam across migrations. the most common cause of divergence between migrations is related to the differences between repository solutions; slam isolates this concern by using two connectors, one for the source and one for the target repository. 
these connectors translate the information to be migrated to and from slam’s internal data model. thus, the source connector needs to be able to traverse the staging storage and provide slam with all the required record information, while the target connector will upload the records to the new repository (using a web-accessible api for example). this means that for each migration only three parts of slam need to be adapted (shown in green highlights in figure 1): the source and target connectors, and the metadata crosswalk. all other components can remain unchanged, thus reducing the technical development time. in the last step of slam’s workflow, the information that was used for the migration is sent to a long-term preservation storage, in order to ensure that it remains available for future reference. in our implementation, the following information is preserved: • original metadata and files, as extracted from the source repository. • metadata crosswalk from source to target repository. • migration script state machine serialisation. this information is sufficient for understanding the exact steps applied during the migration and, if required, for applying certain corrections to the migrated records at a future point in time. employing slam for real-world migrations slam was used for performing five repository migrations in one year, as described in table 1; the target repository in all five cases was figshare. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 10 table 1. overview of repositories migrated to figshare using slam. source repository identifier repository type software number of records ir1 institutional dspace 37,000 ir2 institutional dspace 25,605 d1 data custom 334 (105 gb) ir3 institutional digital commons 2,275 ir4 institutional dspace 15,474 slam’s viability was assessed based on the design principles outlined above. reusability, the main rationale behind slam, relates to being able to reuse as much of the system as possible across migrations. the architecture isolated the parts that required adaption from one migration to another (the connectors and the crosswalk); the time spent by a software engineer in order to set up these was monitored. the target here was to support the specialised staff on making domainspecific decisions, especially on the metadata crosswalk, by reducing the time needed to develop the three mentioned components. for example, the research excellence framework (ref) 2021 exercise in the united kingdom had strict metadata requirements, which required thorough testing in connection with current research information systems and open access monitoring solutions. between the first and fourth migration, this was reduced from six person-weeks to only two; it is important to note that slam evolved between the migrations, based on the lessons learned from each instance. statefulness, the property which allows re-processing already-migrated records, is covered in slam by the state machine implemented in the migration module, which is persistent and can be referenced in subsequent runs. all the migrations in table 1 required supplementary runs after all records were migrated, most frequently in order to fix metadata issues discovered after the full corpus was transferred. for example, ir1 required three such runs: 1. the first run fixed a number of issues caused by omissions in the metadata schema crosswalk. 2. 
the second run enriched the metadata using information taken from a current research information system (a source external to slam). 3. the last run corrected the usage statistics (view and downloads) which were incorrectly imported initially, due to incomplete understanding of the source repository’s database. due to slam’s design, no issues were encountered while performing these runs, as no records were duplicated, removed, or erroneously modified; this was manually checked by the repository administrators, either by sampling the corpus or by inspecting each migrated record, depending on the repository size. a key aspect highlighted by the requirement to reprocess migrated records relates to the granularity of the state machine. as an example, in ir3 a second run required attaching supplementary files to a number of migrated records, and this posed a challenge due to the fact that the state machine only recorded if all files have been uploaded, and not which files were successfully added to the record. thus, the state machine was amended to record the complete list of record files, allowing for more granular control over this processing step. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 11 the last concern, fault tolerance, was achieved by applying basic software engineering principles, such as fail-fast (report migration issues as soon as they manifest), the implementation of proper exception handling (such as not to ignore any potential issues), and addition of enhanced logging in order to provide a complete record of the processing steps. for each of the five migrations, slam ran unsupervised, reporting at the end of each run the records for which an issue was encountered. as an example, in the ir4 migration, slam initially failed to migrate 300 records. these were reported to the operator, and after minor fixes were applied to the metadata crosswalk the migration completed successfully. fault-tolerance plays a central role in ensuring that during migrations no data is lost or corrupted, by surfacing any edge-case that might have been missed during the development of the metadata crosswalk, repository connectors, or core migration module, while also isolating such issues to the records exhibiting them, with no impact on the full corpus. future directions while proven viable in real-world scenarios, a number of areas which can benefit from further improvements were identified through an analysis of the current implementation, based on the experiences of the five migrations. first, the migration-specific components (connectors and metadata crosswalk, shown in green in figure 1) require further decoupling from the core migration module. for example, since all migrations considered figshare as a target repository, this connector is currently strongly interlinked with the core module, in order to save development time according to business requirements and migration timelines. further decoupling will ensure that the core migration module’s design is not influenced in any way by the repository’s architecture and capabilities. completing this work will also allow making the source code of our current implementation of slam publicly available, as in its current state it is making use of proprietary components which are employed across other parts of the figshare platform. 
aside from these, the source code includes straightforward python modules and makes use of open technologies such as elasticsearch, which will allow the larger community to adapt and use slam with other source or target repositories, or even enhance it with further functionality. nevertheless, the general architecture can already be implemented in any other way or using a different set of technologies. further to this point, the metadata crosswalk is currently influenced by the logic and design of the migration module; for example, it uses the same procedural programming language, python, as all other components of slam. employing technologies such as extensible stylesheet language transformations (xslt, for metadata in xml formats) or sparql (for rdf) will help involve staff with in-depth domain knowledge further in the migration, for whom these technologies are more familiar; moreover, such a design does not require any knowledge of slam’s internal processes. second, the five completed migrations highlighted the importance of reviewing, correcting, and enhancing records during the migration. for example, when migrating a journal article’s version of record in an open access context, special care needs to be given to its metadata (title, authors, journal name, publication date or persistent identifier), as mistakes can generate issues with scholarly search engines which will not be able to link the published version to the repository one. a possible input for comparing and correcting existing metadata is the information contained by current research information systems, which aggregate information from various databases, such as scopus (https://www.scopus.com/). if access to such systems is not available, it is possible to information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 12 source metadata from open directories, such as crossref (https://www.crossref.org/). this component is included in the architectural overview presented in figure 1. the third area in need of improvement relates to testing the outcome of the migrations. as mentioned in the previous section, this is currently a manual process and can be both cumbersome and error prone. while in line with slam’s philosophy of automating every step of the process, implementing a mechanism for validating the end migration result could also provide stronger assurances on the completeness and correctness of the migration. finally, slam’s preservation module requires further development in order to ensure that it is fully automated; moreover, the possibility of adding a manifest explaining the migration artefacts needs to be considered, as knowledge on the organisation of the information, which is specific to each migration, might be lost in time. it is important to note that architecture-wise, which was the main concern of this work, we did not identify any major shortcomings in slam—most issues discussed above focus on implementation issues. slam’s modular design will facilitate any additions to the system, required to support new use cases and migrations. conclusions this paper describes slam, the stateful library analysis and migration system, an etl software architecture for performing digital library migrations. 
what differentiates such transfers from other data migrations is the required domain knowledge, the particularities of the target and source repositories in the context of the scholarly communications ecosystem, and the structure of the migration package, which includes, among others, bibliographic metadata, record files, and usage data. digital libraries are an integral part of the cultural heritage; thus, any migration needs to ensure that no information is lost or corrupted in the process. the main contributions brought by slam are 1. it includes an analysis module based on an industry standard search engine, elasticsearch, which allows operators to analyse the metadata and schema crosswalk, facilitating the decisions required for properly migrating information between repositories; 2. it implements a serializable state machine in its migration module, which facilitates running the migration procedures multiple times without duplicating, removing, or corrupting records, while allowing for corrections to be applied to the corpus; 3. it follows a modular design, which enhances its reusability across multiple migrations, by reducing the development time required for adapting the system to new source and target repositories. slam applies established software engineering principles in order to provide a trustworthy tool to digital library administrators that need to transfer content between systems. its design was both influenced and validated by real-world applications, having been used for five different migrations with various requirements and targeted repository solutions. future work will consider enhancing slam’s metadata analysis and enrichment capabilities as well as the collection of further data points on its performance and possible improvement directions while using it for new digital library migrations. information technology and libraries december 2021 stateful library analysis and migration system (slam) | pănescu, grosu, and manta 13 endnotes 1 david scherer and dan valen, “balancing multiple roles of repositories: developing a comprehensive repository at carnegie mellon university,” publications 7, no. 2 (2019), https://doi.org/10.3390/publications7020030. 2 directorate-general for research & innovation, “h2020 programme—guidelines to the rules on open access to scientific publications and open access to research data in horizon 2020,” version 3.2, march 21, 2017, https://web.archive.org/web/20180826235248/http://ec.europa.eu/research/participants/ data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf; national institutes of health, “nih public access policy details,” last updated march 25, 2016, https://web.archive.org/web/20180421191423/https://publicaccess.nih.gov/policy.htm. 3 the ref, “research excellence framework,” https://web.archive.org/web/20191215143352/https://www.ref.ac.uk/. 4 roger c. schonfeld, “elsevier acquires bepress,” scholarly kitchen (blog), august 2, 2017, https://web.archive.org/web/20191212183253/https://scholarlykitchen.sspnet.org/2017/0 8/02/elsevier-acquires-bepress/. 5 alan hyndman, “announcing the figshare institutional repository… and data repository… and thesis repository… really just an all-in-one next gen repository,” figshare (blog), march 22, 2018, https://figshare.com/blog/announcing_the_figshare_institutional_repository_and_data_repos itory_and_thesis_repository_really_just_an_all-in-one_next_gen_repository/389. 
6 alan hyndman, “figshare to power chemrxiv™ beta, new chemistry preprint server for the global chemistry community,” figshare (blog), august 14, 2017, https://web.archive.org/web/20191218194210/https:/figshare.com/blog/_/322. 7 bepress, “digital commons dashboard,” https://web.archive.org/web/20191218192450/https://www.bepress.com/reference_guide_ dc/digital-commons-dashboard/. 8 steve van tuyl et al., “are we still working on this? a meta-retrospective of a digital repository migration in the form of a classic greek tragedy (in extreme violation of aristotelian unity of time),” code{4}lib journal no. 41 (august, 9, 2018), https://journal.code4lib.org/articles/13581; do van chau, “challenges of metadata migration in digital repository: a case study of the migration of duo to dspace at the university of oslo library” (master’s thesis, university of oslo, 2011), http://hdl.handle.net/10642/990. 9 danijela tešendić and danijela boberić krstićev, “business intelligence in the service of libraries,” information technology and libraries 38, no. 4 (2019), https://doi.org/10.6017/ital.v38i4.10599. 10 “what is elasticsearch?” elasticsearch bv, http://web.archive.org/web/20191207032247/https://www.elastic.co/whatis/elasticsearch. navigating uncharted waters: utilizing innovative approaches in legacy theses and dissertations digitization at the university of houston libraries article navigating uncharted waters utilizing innovative approaches in legacy theses and dissertations digitization at the university of houston libraries annie wu, taylor davis-van atta, bethany scott, santi thompson, anne washington, jerrell jones, andrew weidner, a. laura ramirez, and marian smith information technology and libraries | september 2022 https://doi.org/10.6017/ital.v41i3.14719 annie wu (awu@uh.edu) is head of metadata and digitization services and the ambassador kenneth franzheim ii and mrs. jorgina franzheim endowed professor, university of houston libraries. taylor davis-van atta (tgdavis-vanatta@uh.edu) is director of the digital research commons, university of houston libraries. bethany scott (bscott3@uh.edu) is head of preservation and reformatting, university of houston libraries. santi thompson (sathompson3@uh.edu) is associate dean for research and student engagement and the eva digital research endowed library professor, university of houston libraries. anne washington (washinga@oclc.org) is semantic applications product analyst, oclc. jerrell jones (jjones46@uh.edu) is digitization lab manager, university of houston libraries. andrew weidner (andrew.weidner@bc.edu) is head of digital production services, boston college libraries. a. laura ramirez (alramirez@uh.edu) is senior library specialist, university of houston libraries. marian smith (mrsmith8@uh.edu) is digital photo tech, university of houston libraries. © 2022. abstract in 2019, the university of houston libraries formed a theses and dissertations digitization task force charged with digitizing and making more widely accessible the university’s collection of over 19,800 legacy theses and dissertations. supported by funding from the john p. mcgovern foundation, this initiative has proven complex and multifaceted, and one that has engaged the task force in a broad range of activities, from purchasing digitization equipment and software to designing a phased, multiyear plan to execute its charge. 
this plan is structured around digitization preparation (phase one), development of procedures and workflows (phase two), and promotion and communication to the project’s targeted audiences (phase three). the plan contains step-by-step actions to conduct an environmental scan, inventory the theses and dissertations collections, purchase equipment, craft policies, establish procedures and workflows, and develop digital preservation and communication strategies, allowing the task force to achieve effective planning, workflow automation, progress tracking, and procedures documentation. the innovative and creative approaches undertaken by the theses and dissertations digitization task force demonstrated collective intelligence resulting in scaled access and dissemination of the university’s research and scholarship that helps to enhance the university’s impact and reputation. introduction to answer the call of implementing university of houston (uh) libraries strategic plan to position the libraries as a campus leader in research and transform library space to reflect evolving modes of learning and scholarship, the uh libraries launched a cross-departmental task force in 2019 charged with digitizing the university’s extensive print theses and dissertations collection. providing online access to newly digitized theses and dissertations boosts the reach and impact of our institution’s research and scholarship while expanding available space for computing, mailto:awu@uh.edu mailto:tgdavis-vanatta@uh.edu mailto:bscott3@uh.edu mailto:sathompson3@uh.edu mailto:washinga@oclc.org mailto:jjones46@uh.edu mailto:andrew.weidner@bc.edu mailto:alramirez@uh.edu mailto:mrsmith8@uh.edu information technology and libraries september 2022 navigating uncharted waters | 2 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith technology, and faculty and student learning and research activities. a study by bennett and flanagan revealed the positive impact and benefits of online dissemination of theses and dissertations, including enhanced discoverability by google’s strong indexing capabilities, significant increase in the usage of the works, and an overall enhancement of the reputation of an institution.1 encouraged by the positive outcomes and supported by funding from the john p. mcgovern foundation to initiate this project, the theses and dissertations digitization (tdd) task force developed a phased project plan and utilized creative, automated processes and methods to execute it. this article articulates the tdd project planning and the innovative work undertaken by the task force to achieve efficiency in making our print theses and dissertations readily available to new readerships around the world. literature review over the past several decades, research libraries have been building programs around digitization and open access repository infrastructures, largely aimed at expanding their digital collections and engaging communities with newly available research materials. for some, part of their programming has included projects that digitize their institution’s legacy print collections of theses and dissertations. the review below explores literature on the mass digitization process , including institutional case studies, guidance documents, legal and policy papers, and local documentation developed as libraries have planned and implemented these projects. 
any library tackling a retrospective thesis and dissertation project needs a framework for determining the copyright status of these works en masse. perhaps it is no coincidence, then, that copyright concerns are the most heavily documented aspect of the process. clement and levine provide the definitive work to date on copyright and the publication status of theses and dissertations written in the united states before 1978. their study asserts that “p re-1978 american dissertations were considered published for copyright purposes by virtue of their deposit in a university library or their dissemination by a microfilm distributor.”2 they go on to write that, “for copyright purposes, these were acts of publication with the same legal effect as dissemination through presses, publishers, and societies.”3 they suggest that libraries should investigate the copyright status for theses and dissertations authored between 1909 and 1978 (typically found on the title page and verso); if there is no copyright notice, then the thesis or dissertation is likely in the public domain and eligible for digitization and public release without permission. moreover, even those works that have a printed copyright notice might have fallen out of copyright if they were not renewed after 28 years for the same length of time.4 broad guidance and best practice for copyright status and other matters of process around theses and dissertations is provided in guidance documents for lifecycle management of etds, which acknowledges that legal services may be required for some retrospective thesis and dissertation digitization projects, especially “before scanning without the permission of former students .”5 the authors assert that information professionals should investigate any “appropriate access options” with institutional legal expertise before engaging in a retrospective digitization project and articulate the two most commonly encountered copyright scenarios: “[either] former student authors may not allow the reproduction and open dissemination of their work, or unauthorized copyrighted material was used in the original theses and dissertations.”6 strategies that might be employed to determine copyright status include “consulting with legal counsel at one’s institution to see where it stands on this issue; negotiating with commercial entities that make such content information technology and libraries september 2022 navigating uncharted waters | 3 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith available at a price so that institutions can have some control over it for the purpose of broader access; and working with groups such as alumni associations, colleges, departments, and graduate schools to establish contact with thesis and dissertation authors for securing their permission to digitize, and render available online, their past scholarship.”7 on the question of public access to newly digitized works, the guidance documents detail the implications of the “transition from print to electronic,” which “has led to increased scrutiny over who will be allowed to access the electronic versions and how widely they will be disseminated.”8 when there is any legal doubt, there are many reasons for libraries to exercise caution and restrict access to electronic theses and dissertations; that said, “research available on the web immediately upon submission of the final, approved thesis can prove advantageous to the newlydegreed student, the institution, and other researchers.”9 again, 
consulting legal officers and the original authors, if possible, remains the consensus approach to establishing a strategy for access to digitized theses and dissertations. the guidance documents also touch on the thorny issue of digitizing theses and dissertations that contain third-party content. they summarize the history and routine application of the fair use doctrine in both the creation and dissemination of scholarly works but provide little firm guidance on the matter.10 indeed, after reviewing the entire body of literature on retrospective thesis and dissertation projects, this remains a practical challenge that any library undertaking a mass digitization project must consider and the associated risks must be accounted for. in recent years, several case studies have documented institutions’ efforts to digitize and make more widely available legacy theses and dissertations. of the institutions that the tdd task force reviewed for the environmental scan, none of their case studies attempts an exhaustive documentation of end-to-end workflows and processes developed to execute the task; most focus on particularly difficult questions inherent to the process. martyniak provides a rationale for the university of florida’s (uf) retrospective scanning project and details their process for contacting authors before works were scanned.11 the workflow outlines several points of contact with authors to obtain signed distribution agreements, as well as uf’s approach to automate this process as much as possible. notably, the distribution agreement form and correspondence templates are provided as appendices to the article.12 as part of this retrospective digitization project, uf also released a scanning policy that articulates their approach to determining the copyright status of works and their resultant practice. 13 this policy document is an excellent example of an institution’s implementation of clement and levine’s research described above. likewise, mundle describes the methods used by simon fraser university (sfu) to establish its approach to the issues of copyright status and access, ultimately resulting in a public thesis access policy and procedures for contacting authors whenever possible to offer them the ability to opt their work out of the project.14 unlike the uf, sfu began scanning before any explicit permission had been obtained from authors. 
sfu also shares their use of scripts to automate the ingest of metadata from original marc records into their dspace repository.15 piorun and palmer, meanwhile, focus on an analysis of the time and cost associated with digitizing 300 doctoral dissertations for a newly implemented institutional repository at the university of information technology and libraries september 2022 navigating uncharted waters | 4 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith massachusetts chan medical school.16 piorun and palmer detail the library’s process for obtaining cost comparisons from external vendors as well as estimated costs, including labor, associated with undertaking the task in house.17 issues of workflow, policy development, and permissions are also addressed with an emphasis on developing accurate and streamlined methods of processing works; however, piorun and palmer conclude, “regardless of the amount of planning and thought that goes into a project, there is always the possibility that each record or file will need to be reworked.”18 shreeves and teper discuss theses and dissertations’ complicated status as grey literature and the university of illinois urbana-champaign (uiuc) library’s digitization project, which they describe as “less of a collection management or preservation issue and more as an effort to tackle broader scholarly communication and outreach issues.”19 after consulting with university legal counsel, digitized works were ingested to the uiuc institutional repository as a restricted (campus -only access) collection. as authors provide consent, access to their work is broadened to the public. worley demonstrates that, according to an analysis of circulation numbers, works that are accessible electronically are used dramatically more than print copies, serving as rationale for undertaking digitization of student works.20 they provide significant detail around virginia polytechnic institute and state university’s process to establish file specifications for its digitization process, and image quality/resolution and file format selection are discussed in some detail, with helpful visual examples.21 these case studies are particularly valuable in that they provide evidence and cautionary tales around how local contexts have made a difference in copyright and workflow issues. this case study contributes to the existing body of literature by attempting to provide an exhau stive, end-toend description of the retrospective digitization process—from copyright evaluation, to physical handling, to digitizing with an eye to access controls and digital preservation concerns. furthermore, our approach to digitization at scale incorporates automation at several points throughout the workflow, representing a production improvement to the decade-old case studies we reviewed. project planning and execution digitizing a large corpus of print theses and dissertations is a complex process touching areas of equipment, copyright policy, workflows for different sections of the process, progress tracking, preservation, and communication. to handle such a multifaceted project, the tdd task force designed a plan that divided the project activities into three phases (see table 1). phase one is dedicated to tasks of preparation such as the environmental scan, copyright permission investigation, digitization equipment purchasing, and print theses and dissertation inventory. 
phase two includes activities such as digitization and metadata workflow development, documentation, project tracking, ingestion, and preservation of digitized files. phase three is mainly for promotion and communication to our researchers on the availability of our digitized theses and dissertations collection. task force members volunteered to serve in subteams for identified specific tasks in each project phase. information technology and libraries september 2022 navigating uncharted waters | 5 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith table 1. phased planning for the tdd project theses and dissertation digitization task force planning phases task force activities subteams (*subteam lead) phase one: preparation environmental scan jerrell, anne, crystal, santi*, bethany physical theses/dissertations inventory/retention bethany copyright permissions and policies bethany, annie, taylor* purchase equipment jerrell*, crystal phase two: workflow development td digitization workflow jerrell, crystal* td metadata workflow anne*, taylor, annie td ingest and publishing workflow andrew, taylor td progress tracking annie, andrew preservation/storage strategy bethany*, santi phase three: communication/promotion promote dtd to colleagues and researchers taylor*, santi communicate progress to staff and users annie*, santi* develop training materials for stakeholders anne*, crystal* phase one: preparation a subgroup of the tdd task force conducted an environmental scan of similar theses and dissertations digitization approaches previously used by other institutions. the lead for the subgroup created a google sheet that all group members used to document information found in published literature, public documents, and institutional websites. the lead assigned group members to review information from institutions with publicly available data, including: the university of florida, the university of north texas, the university of illinois urbana champaign, brigham young university, william and mary university, texas a&m university, the university of arizona, the california institute of technology, the massachusetts institute of technology, iowa state university, xavier university, texas tech university, and the university of british columbia. group members noted relevant information pertaining to a variety of topics focused on theses and dissertations digitization. one of the most prominent was the institution’s response to copyright permissions. the group tried to determine if the institution required author permission before releasing a digitized thesis or dissertation (the “opt in” option), or incorporated policies and procedures that prioritized taking down digitized theses and dissertations once requested by the author (the “opt out” option). they observed software and hardware specifications used by other institutions—critical data that would inform the technology needed to complete a project of this scale. the group documented the key components of the digitization and metadata workflows, including roles and responsibilities, sequencing of actions, and the implications that policies and procedures had on the process. this data helped the group understand what gaps, common problems, and emerging best practices existed. finally, the group reviewed physical retention and preservation strategies articulated by institutions to ensure it understood the long-term stewardship hurdles and requirements for analog and digital material. 
information technology and libraries september 2022 navigating uncharted waters | 6 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith based on the assessment of the 19,800 uh theses and dissertations identified for inclusion in the project, the digitization subgroup members determined that several scanners would be required for agility in digitization production. the tdd digitization workflow was designed so that this project could run effectively, in parallel, with existing digitization projects regardless of the need for some theses to be scanned on existing equipment. automatic document feed (adf) scanners were a strategic choice for the rapid scanning of disbound items. two canon dr-g2110 adf scanners were purchased for the project. these scanners were chosen for their scanning speed, scanning quality, ease of use, onboard image preprocessing, and reasonable price point. the canon dr-g2110 can handle a large page stack, approximately 500 pages. theses and dissertations can be scanned on the longer or shorter dimension, which allows faster scanning times. among many innovative features, this duplex scanner simultaneously digitizes both sides of a document, rotates the pages based on text orientation, and auto-crops through preprocessing during the scanning process. this canon adf solution makes more image postprocessing automation possible since the resulting scans match our output expectations with minimal user input. other scanning options were needed for a smaller subset of theses and dissertations that could not be disbound. the digitization team leveraged an existing zeutschel os 12002 planetary scanner for items that could not be disbound. an existing plustek opticbook (po) a300 plus was used for items with foldouts containing graphs, maps, and illustrations that measure beyond 11 inches on the longest dimension. additionally, a plustek opticbook 3800l was purchased to accommodate fragile us letter–sized pages that are not suitable for adf scanning. thin or heavily waxed papers typically do not stand up well to the fast-moving rubber rollers and other internal scanning mechanisms. while the po 3800l provides a much longer scanning time than the po a300, both scanners can scan into the page gutter of bound materials, a useful feature for items with insufficient margins. figure 1. image processing workflow testing on a thesis in limb processing. the green check marks on the left indicate that a page has been processed correctly. information technology and libraries september 2022 navigating uncharted waters | 7 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith the canon adf scanner operates through two pieces of software working concurrently, canon captureontouch v4 pro and kofax vrs (an additional product supplied by the task force’s scanner vendor). some image processing settings are applied in kofax vrs, which communicates with captureontouch v4 pro. both pieces of software were bundled with purchases of the two canon adf scanners. limb processing by i2s was also purchased for the project. limb processing is a powerful mass image processing product that operates through user-built processing workflows that can be applied to multiple folders, creating standardized output suitable for automation. the limb software can transform an imperfect scan into a fully processed, clean derivative with minimal user input, which is especially useful for transforming legacy image data. 
abbyy finereader server 14 is used to provide quality optical character recognition (ocr) data and features efficient tools for automation, allowing for large ocr processing jobs to be queued and run recursively with minimal user intervention (see fig. 1). with these powerful tools, uh libraries has been able to leverage our existing scanners, new scanners, and advanced software to plan for the timely capture of nearly three million pages of content. the number of theses and dissertations required the implementation of a semiautomatic disbinding system. the spartan 185a paper cutter from the challenge machinery company was purchased to ensure the replication of many clean binding removals. options from several manufacturers were considered for these needs but the spartan 185a offered the cutting power needed to cut millions of pages over the life of the project (see fig. 2). the cutter features several safeguards that protect the operator, such as the lowering of a protective acrylic guard and the requirement of two hands, away from the blade, to lower the blade automatically. uh libraries chose a local cutter blade replacement company that services the equipment quarterly. in addition to the cutter, supplies for binding removal and physical volume management were needed, such as: • x-acto knife and/or utility blades • recycling bins • table brooms and dustpans • disbinding tables • cutting mats • standing mats • letter and legal-size folders • folder holders • surface cleaning materials • carts/book trucks information technology and libraries september 2022 navigating uncharted waters | 8 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith figure 2. (l) thesis scanning test on the zeutschel os 12002; (r) challenge spartan 185a cutter with red numbers indicating blade lowering and cutting safety button order. the physical retention of volumes was considered in the context of the overall preservation of the theses and dissertations collection, including the digital preservation approach to the tdd project. the uh libraries holds two copies of a student’s thesis and dissertation. after consulting with stakeholders throughout the library—such as the university archivist, the dean of libraries and associate deans, and access services’ shelving team for shelf space/storage in different areas of the library building—the task force decided to retain one bound copy of each thesis or dissertation. additional copies will be weeded from the general collection, and the best copy for digitization will be disbound for feeder scanning using the equipment described above. when only one copy of a thesis or dissertation exists in the collection, it will be scanned using a scanner that will not destroy or damage the binding. the retained theses and dissertations collection will be housed in uh libraries special collections in the secure and climate controlled closed stacks. once the tdd task force settled on this retention strategy, the digital projects coordinator, a member of the task force who represents special collections, conducted a full shelf -read of the theses and dissertations already housed in special collections. using a master tracking spreadsheet that was generated from catalog reports for project tracking and pulling, a small team of student workers reviewed over 20,000 volumes to identify missing titles, titles with multiple copies that can be weeded from special collections, and copies with label and/or cataloging errors. 
missing titles were transferred from the general stacks to special collections, and the items were reshelved in chronological order. a more extensive shelving shift still needs to be completed to move volumes to accommodate additions and finalize the shelf location for all items in this collection, which will no longer be growing because all theses and dissertations at uh have been submitted electronically since 2014. as part of the shifting project, the items also need to be checked in and/or have their location codes changed in the catalog to reflect their new permanent home in special collections.

phase two: workflow development
the theses and dissertations digitization workflow starts with pulling physical volumes from shelves. the task force generated a report of all uh theses and dissertations and sorted them by call number so that student workers can pull these volumes from the general stacks in call number order. after the pulled volumes' records are withdrawn from the catalog system, they are shelved by call number order in the "ready for digitization" section of the tdd shelf in the library basement, close to the digitization lab. volumes are pulled from a section of the library stacks dedicated to the tdd project and loaded onto book carts for transfer to the physical volume processing room. using a custom-built processing table, covers are removed with utility knives and discarded. the text is placed in a folder with a pre-printed label indicating the oclc number and call number of the volume. the spine of each volume is removed with a spartan guillotine. the completely disbound volumes, housed in labeled folders, are then moved on book carts to the scanning room. prior to scanning, physical volumes are grouped in batches of approximately 50 and a text file is created that lists each oclc number in a batch, one per line. a simple executable file reads the text file and creates a batch directory. the batch directory is labeled with the current date in yyyymmdd format and contains a folder for each scanned volume. the scanned volume folders are labeled by oclc number and contain a metadata.txt file that records the volume's descriptive metadata from the uh catalog system in yaml format: a data carrier that is easily readable for both humans and machines. scanning is performed with one of the two canon dr-g2110 high-speed feed scanners controlled by kofax vrs and captureontouch v4 pro software. before a volume is placed in the scanner, it is checked to ensure that the binding has been completely removed, that there are no pages that have been glued in after binding, and that there are no onion skin pages, irregular page sizes, inserts, or foldouts. if necessary, additional scans for delicate onion skin pages, inserts, or foldouts are performed on a flatbed scanner. page images are output as 300 dpi grayscale or color tiffs, and first-pass quality control for completeness, page legibility, rotation, and cropping is performed in captureontouch. after page images have been captured, a batch is loaded into limb for final processing. scanned volumes are again checked for completeness, legibility, and orientation. text pages are processed as 300 dpi bitonal tiffs. pages with grayscale or color images are processed as such.
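the batch-preparation step described above lends itself to a short illustration. the ruby sketch below is not the task force's actual executable; the oclc_numbers.txt input file and the lookup_catalog_metadata helper are hypothetical stand-ins for the oclc number list and the catalog lookup.

```ruby
# batch_setup.rb -- illustrative sketch only; not the uh tdd executable.
# 'oclc_numbers.txt' and lookup_catalog_metadata are hypothetical stand-ins.
require 'fileutils'
require 'yaml'
require 'date'

# stub: in the real workflow, descriptive metadata comes from the uh catalog system
def lookup_catalog_metadata(oclc_number)
  { 'oclc_number' => oclc_number, 'title' => 'unknown', 'creator' => 'unknown' }
end

batch_dir = Date.today.strftime('%Y%m%d') # batch directory labeled yyyymmdd
FileUtils.mkdir_p(batch_dir)

File.readlines('oclc_numbers.txt', chomp: true).each do |line|
  oclc = line.strip
  next if oclc.empty?
  volume_dir = File.join(batch_dir, oclc) # one folder per scanned volume, named by oclc number
  FileUtils.mkdir_p(volume_dir)
  # metadata.txt records the volume's descriptive metadata in yaml
  File.write(File.join(volume_dir, 'metadata.txt'), lookup_catalog_metadata(oclc).to_yaml)
end
```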
when batch processing is complete, the documents' processed signature pages, which include names and signatures of the author, advisor, and committee members, are separated out so they are not included in the final version published online. this step protects the privacy of individuals by not sharing their signature openly over the internet. the tdd project uses abbyy finereader server 14 to generate full text pdfs and a plain text file for each scanned volume. the data in each scanned volume directory undergoes transformations both before and after the ocr processing. the transformations are accomplished with the tdd workflow utility, a ruby command line application. before running a batch through ocr, the archive digitized batch function moves the high-resolution master tiffs to an archive directory and formats the batch directory for the input that abbyy expects. after ocr processing, the archive ocr batch function moves the derivative tiffs used as ocr input to the archive directory as well. in a final process before sending the batch to the metadata unit, the process ocr batch function adds descriptive metadata to the embedded exif metadata of each pdf document with exiftool for improved accessibility. the tdd task force sought to align print materials' metadata standards with the existing metadata standards applied to electronic theses and dissertations in the university's institutional repository, largely based on the dictionary of texas digital library descriptive metadata guidelines for electronic theses and dissertations, v.2.22 early on in the project, the metadata subteam reviewed thesis and dissertation records in the institutional repository (ir) as well as marc catalog records in uh libraries' library services platform, with special emphasis on the metadata elements used, to identify alignments and gaps. after analysis, the team established the crosswalk from marc to the qualified dublin core profile in the ir (see table 2). in july 2019, uh libraries migrated to the alma library services platform. prior to this migration, the task force exported tdd marc records from uh libraries' former library services platform, sierra, and crosswalked into dublin core metadata fields using the freely available software marcedit. data was further normalized using openrefine. at this early stage, openrefine proved to be a valuable tool for batch editing and formatting metadata and identifying legacy terms or missing data. once the crosswalked data was cleaned up and put into place, standard values for all records were added (see table 3).

table 2. metadata crosswalk from marc to qualified dublin core
metadata field | marc field | qualified dublin core element
oclc number | 001, 035 $a | dc.identifier.other
call number | 099 | [n/a, admin use only]
author name | 100 $a | dc.creator
title | 245 $a $b | dc.title
thesis year | 264 $c | dc.date.issued
degree information | 500, 502 $a | thesis.degree.name
subject | 6xx fields | dc.subject
department | 710 $b | thesis.degree.department

during the ongoing processing of digitized materials and as part of the quality control, each volume's metadata is evaluated against its corresponding metadata record and edited when necessary.
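table 2 above amounts to a simple field mapping. the task force performed the crosswalk with marcedit and openrefine rather than custom code, but as an illustration the sketch below expresses the same mapping in ruby using the marc gem; the input and output file names are hypothetical placeholders, and only the first subfield of each field is taken.

```ruby
# crosswalk_sketch.rb -- hypothetical illustration of the table 2 mapping;
# the project itself used marcedit and openrefine, not this script.
require 'marc' # ruby marc gem
require 'csv'

CSV.open('tdd_dublin_core.csv', 'w') do |csv|
  csv << %w[dc.identifier.other dc.creator dc.title dc.date.issued
            thesis.degree.name dc.subject thesis.degree.department]
  MARC::Reader.new('tdd_records.mrc').each do |record|
    subjects = record.fields(('600'..'699').to_a).map { |f| f['a'] }.compact
    csv << [
      record['001']&.value,                                                        # oclc number
      record['100'] && record['100']['a'],                                         # author name
      record['245'] && [record['245']['a'], record['245']['b']].compact.join(' '), # title
      record['264'] && record['264']['c'],                                         # thesis year
      record['502'] && record['502']['a'],                                         # degree information
      subjects.join('; '),                                                         # 6xx subjects
      record['710'] && record['710']['b']                                          # department
    ]
  end
end
```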
in an effort to enrich the metadata available to users and increase visibility of the volumes, information not typically provided in the marc records, such as thesis committee chairs, other committee members, and abstracts, is added to the records using the dublin core contributor (dc.contributor.committeemember) and abstract (dc.description.abstract) elements.

table 3. standard values added to all records
qualified dublin core element | value
dc.format.mimetype | application/pdf
dc.type.genre | "thesis" or "dissertation," as applicable
thesis.degree.grantor | university of houston
dc.type.dcmi | text
dc.format.digitalorigin | reformatted digital

in the interest of closely observing copyright best practices, members of the tdd task force, including the digital projects coordinator and the director of the digital research commons, created copyright review guides and applicable rights statements. under these guidelines, theses and dissertations are considered under copyright if a copyright notice appears on a volume created in 1977 or earlier; if the item was created between january 1, 1978, and february 28, 1989, and was registered with the us copyright office within five years of its creation; or if it was created on march 1, 1989, or later. inserts and other research material provided in the volumes are similarly considered for copyright evaluation during the copyright review process. once a volume has been evaluated for copyright status, an out-of-copyright or in-copyright statement is assigned. in alignment with the uh libraries' mission to provide valuable research and educational materials, digitized volumes and metadata records are then ingested into the institutional repository.23 in this stage of the process, out-of-copyright volumes are made available as open access materials. due to the limitations inherent in their copyright status, in-copyright volumes are access restricted and available solely to the university community. when content is ready for ingest, volumes are moved to the ingest folder and placed in staging directories based on rights status: open access or in copyright. the ingest process is the same for both types of content, but in-copyright content requires additional post-ingest processing, so ingest batch folders are labeled according to rights status for clarity. the tdd workflow utility's prepare ingest package function is used to create ingest packages in an input format expected by the saf creator, a utility for preparing dspace batch imports in the simple archive format.24 pdf files are copied and renamed in the format lastname_year_oclcnumber.pdf, a csv file is created with descriptive metadata for the batch, and the original files and metadata are moved to an archive directory. the saf creator is then used to create an saf ingest package that is imported into dspace. limiting access to copyrighted content was a necessary component of the project that took some time to solve. the team investigated creating a separate collection for the in-copyright content with access limited to users logged in with uh credentials. the downside to this approach was that the content within the restricted collection was not discoverable to users who were not logged into the ir.
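the copyright determination described above reduces to a small decision rule. the ruby sketch below restates a simplified reading of those guidelines for illustration; the method name, keyword arguments, and date boundaries are interpretations of the prose, not the task force's implementation.

```ruby
# copyright_status_sketch.rb -- a simplified, hypothetical reading of the
# review guides described above; not the task force's actual logic.
require 'date'

NOTICE_ERA_END   = Date.new(1977, 12, 31) # 1977 and earlier: in copyright only with a notice
REGISTRATION_END = Date.new(1989, 2, 28)  # 1978 through feb 28, 1989: registration window
# march 1, 1989 and later: in copyright automatically

def in_copyright?(created_on:, has_notice: false, registered_within_five_years: false)
  if created_on <= NOTICE_ERA_END
    has_notice
  elsif created_on <= REGISTRATION_END
    registered_within_five_years
  else
    true
  end
end

# example: a 1985 dissertation registered within five years would be flagged in copyright
puts in_copyright?(created_on: Date.new(1985, 5, 1), registered_within_five_years: true)
```

in this reading, a true result would correspond to the in-copyright rights statement and the restricted staging directory, and a false result to open access.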
in the end, the tdd task force worked with the texas digital library, a consortial nonprofit organization that hosts uh libraries’ dspace repository, to enable restricted access using bitstream authentication with shibboleth. this allows the task force to ingest all tdd project content into a single collection and apply uh authentication to copyrighted pdf documents. in this manner, descriptive metadata for all documents is discoverable, but access to the document content is only available to members of the uh community. applying authentication to bitstreams in the dspace administrative interface is a tedious process involving numerous clicks and dropdown menu selections. selenium ide, a browser plug-in designed for automated web development testing, is used to automate that process in the firefox web browser. after an in-copyright batch has been ingested, the tdd workflow utility’s prepare selenium script function is used to create an automation script for selenium. when loaded in the firefox selenium add-on, the script automatically applies the bitstream authentication steps in the browser for each volume in the batch. the tdd workflow comprises detailed tasks carried out at different units in the library in a sequential routine as an assembly line. tdd activities flow from pulling volumes from shelves to disbinding, scanning, image quality control and ocr, metadata creation and copyright evaluation, and digitized files ingestion into the dspace system. as the tdd task force worked collaboratively to develop and confirm workflows for this complicated process, they documented each section of the workflow in the one-stop tdd workflow google document for easy access and transparency of the overall process.25 the tdd working group members notify each other at completion of tasks at each section. to better track each thesis and dissertation as it moved through the digitization, metadata, and copyright verification workflows, the task force developed an excel spreadsheet tracking system.26 this tracking system lists uh libraries’ theses and dissertation titles, their oclc numbers, dates, and call numbers. it records the tdd volumes pulled from shelves, digitization completed, digitization batch, borrower notes, metadata completed, and other notes. this tracking system provides a channel for the team members to inform each other of completed tasks at each unit and to communicate issues in the working process (see fig. 3). information technology and libraries september 2022 navigating uncharted waters | 13 wu, davis-van atta, scott, thompson, washington, jones, weidner, ramirez, and smith figure 3. a screenshot of a portion of the tdd tracking system. phase three: promotion, communication, and next steps it is important to have strategies for tdd promotion and communication to raise awareness of the online availability of the university’s legacy theses and dissertations. the tdd task force brainstormed elements such as audience, channels, and timeline for tdd communication. theses and dissertation authors and campus users are the two main groups the task force plans to target in its promotion and communication plan. to attract audience attention, the tdd task force will design an online flyer/postcard for dissemination. they are currently collaborating with the uh libraries director of communication, the uh alumni office, the uh graduate office, and the uh division of research to distribute messages to targeted audiences. 
the task force will communicate tdd digitization progress as they reach important milestones, including the completion of pre-1978 volumes, then at increments of 10,000 and 15,000 volumes, and once all volumes have been digitized and deposited in the repository. with the disbinding, digitization, and metadata workflows firmly in place, the tdd task force commenced the process of generating digitized versions of uh's theses and dissertations in 2020. while this process will continue over the next several years, the task force will also focus on refining policies and workflows around its copyright and digital preservation activities. the tdd task force has developed a draft copyright policy development document, which outlines copyright determination decisions and access controls for content deemed in copyright. the task force is currently consulting with uh general counsel to ensure its recommended copyright approaches are in concert with university best practices. at the same time, the task force is developing digital preservation procedures to ensure the long-term access, storage, and preservation of digitized theses and dissertations. the group has made some foundational decisions to date. since one physical copy of each title will be retained, allowing for future higher-resolution rescanning if needed, the task force determined that the preservation master file for each digitized thesis or dissertation will be one pdf. this will allow the uh libraries to greatly reduce the ongoing storage costs associated with digitally preserving the tdd collection. throughout 2023, the task force will be exploring ways to sync tdd content to its current digital preservation workflow process, including submitting content to uh libraries' archivematica instance for preservation curation services such as file fixity checks and normalization, and transferring preserved tdd content to cloud storage for distributed digital preservation. prior to ingesting any content into the institutional repository, the team reached out to uh's electronic and information resources accessibility (eira) coordinator for feedback on the accessibility of the pdf documents produced by abbyy. the eira coordinator recommended encoding our pdfs as pdf/a-1a, a standard designed for preservation and accessibility, and introduced the team to the accessibility tools available in adobe acrobat. the adobe acrobat accessibility checker has been useful for identifying and addressing accessibility issues with the pdfs that we are producing. uh libraries web accessibility standards strive to comply with the world wide web consortium's (w3c) web content accessibility guidelines (wcag). combined with the feedback from uh's eira coordinator, the current output was reviewed against these accessibility checklists, and areas needing improvement were identified. after several adjustments, the newest output for the project passes a majority of adobe acrobat's accessibility checker parameters, with further investigation planned to address weak points moving forward.

conclusion
the tdd project at uh libraries provides an in-depth view of the planning and workflow processes needed to launch a retrospective theses and dissertations digitization effort in an academic library setting.
collaborating across uh libraries departments, the tdd task force designed a phased approach to identify technology and resources needed to undertake the project, to develop policies, procedures, and workflows to guide the work to its completion, and to communicate about the scope, purpose, and progress of the project to internal and external stakeholders. throughout the planning and development phases, the task force leveraged automation, bibliographic data reuse, and project management tracking to achieve workflow objectives efficiently and responsibly. with the project well underway, the task force will continue refining its processes and working across uh libraries and campus units to ensure it complies fully with copyright and digital preservation best practices. through these ongoing efforts, the tdd task force is ensuring that the original research and scholarship contained in thousands of theses and dissertations are more accessible than ever before—broadening the reach and impact of uh graduates well into the future.

funding
this project was funded by the john p. mcgovern foundation.

acknowledgments
the authors dedicate this work to the memory of their colleague and tdd task force member crystal cooper.

endnotes
1 linda bennett and dimity flanagan, "measuring the impact of digitized theses: a case study from the london school of economics," insights: the uksg journal 29, no. 2 (2016): 111–19, https://doi.org/10.1629/uksg.300.
2 gail clement and melissa levine, "copyright and publication status of pre-1978 dissertations: a content analysis approach," portal: libraries and the academy 11, no. 3 (2011): 825, https://doi.org/10.1353/pla.2011.0031.
3 clement and levine, "copyright and publication status," 825.
4 clement and levine, "copyright and publication status," 826.
5 xiaocan (lucy) wang, "guidelines for implementing etd programs—roles and responsibilities," in guidance documents for lifecycle management of etds, eds. matt schultz, nick krabbenhoeft, and katherine skinner (2014): sect. 1, p. 14, https://educopia.org/guidance-documents-for-lifecycle-management-of-etds.
6 wang, "guidelines," 1-17.
7 patricia hswe, "briefing on copyright and fair use issues in etds," in guidance documents for lifecycle management of etds, eds. matt schultz, nick krabbenhoeft, and katherine skinner (2014): sect. 3, p. 12, https://educopia.org/guidance-documents-for-lifecycle-management-of-etds.
8 geneva henry, "guide to access levels and embargoes of etds," in guidance documents for lifecycle management of etds, eds. matt schultz, nick krabbenhoeft, and katherine skinner (2014): sect. 2, p. 1, https://educopia.org/guidance-documents-for-lifecycle-management-of-etds.
9 henry, "guide to access levels," 2-1.
10 hswe, "briefing on copyright," 3-9–3-13.
11 cathleen l. martyniak, "scanning our scholarship: the university of florida retrospective dissertation scanning project," microform and imaging review 37, no. 3 (2008): 122–24, https://doi.org/10.1515/mfir.2008.013.
12 martyniak, "scanning our scholarship," 127–29.
13 "retrospective dissertation scanning policy," (2011), university of florida, accessed january 1, 2022, https://ufdc.ufl.edu/aa00007596/00001.
14 todd mundle, "digital retrospective conversion of theses and dissertations: an in house project," in proceedings of eighth symposium on electronic theses and dissertations (sydney, australia, 2005): 3–4.
15 mundle, "digital retrospective conversion," 3.
16 mary piorun and lisa a. palmer, "digitizing dissertations for an institutional repository: a process and cost analysis," journal of the medical library association: jmla 96, no. 3 (2008): 223–29, https://doi.org/10.3163/1536-5050.96.3.008.
17 piorun and palmer, "digitizing dissertations," 224.
18 piorun and palmer, "digitizing dissertations," 227.
19 sarah l. shreeves and thomas h. teper, "looking backwards: asserting control over historic dissertations," college and research library news 73, no. 9 (2012): 532–33, https://doi.org/10.5860/crln.73.9.8830.
20 gary m. worley, "dissertations unbound: a case study for revitalizing access," in proceedings of the 10th international symposium on electronic theses and dissertations (uppsala, sweden, 2007).
21 worley, "dissertations unbound," 3–6.
22 dictionary of texas digital library descriptive metadata for electronic theses and dissertations, version 2.0, (2015), http://hdl.handle.net/2249.1/68437.
23 to access cougar roar, see https://guides.lib.uh.edu/roar.
24 saf creator is a tool developed by james creel at texas a&m university. for more, see https://github.com/jcreel/safcreator.
25 see the tdd google document: https://docs.google.com/document/d/18gyjq6isn7qsuelo1z3b7btmxlxnchmvqp8rhquzy8g/edit?usp=sharing.
26 see the complete tdd tracking system: https://docs.google.com/spreadsheets/d/1tehagvcqw6wb3n5cdaulbtlwzqzstwdbltiapd1oan0/edit?usp=sharing.

50 years of ital/jla: a bibliometric study of its major influences, themes, and interdisciplinarity
brady lund
information technology and libraries | june 2019
brady lund (blund2@g.emporia.edu) is a phd student at emporia state university's school of library and information management.
abstract
over five decades, information technology and libraries (and its predecessor, the journal of library automation) has influenced research and practice in library and information science technology. from its inception, the journal has been consistently ranked as one of the superior publications in the profession and a trendsetter for all types of librarians and researchers. this research examines ital using a citation analysis of all 878 peer-reviewed feature articles published over the journal's 51 volumes. impactful authors, articles, publications, and themes from the journal's history are identified. the findings of this study provide insight into the history of ital and potential topics of interest to ital authors and readership.

introduction
fifty-one years have passed since the first publication of the journal of library automation (jla), the precursor to information technology and libraries (ital), in 1968: 51 volumes, 204 issues, and 878 feature articles. information technology and its use within libraries has evolved dramatically in the time since the first volume, as has the content of the journal itself. given the interdisciplinary nature of library and information science (lis) and ital, and the celebration of this momentous achievement, an examination of the journal's evolution, based on the authors, publishers, and works that have influenced its content, seems apropos. the following article presents a comprehensive study of all 7,575 references listed for the 878 articles (~8.6 refs/article average) published over ital's fifty years, identifying the authors and publishers whose work has been cited the most in the journal and major themes in the cited publications, and evaluating the interdisciplinarity of references in ital publications. this study not only frames the history of the ital journal, but also demonstrates an evolution of the journal that suggests new paths for future inquiry.

conceptual framework
a major influence for the organization and methodology of this paper is imad al-sabbagh's 1987 dissertation from florida state university's school of library and information studies, the evolution of the interdisciplinarity of information science: a bibliometric study.1 in this study, al-sabbagh sought to examine the interdisciplinary influences on the burgeoning field of information science by examining the references of the journal of the american society of information science (jasis), today known as the journal of the association for information science and technology (jasist). in al-sabbagh's study, a sample of ten percent of jasis references was selected for examination.2 the references were sorted into disciplines based on the definitions supplied by dewey decimal classification categories, with the count in each discipline compared to the total number of sampled references to derive percentages (e.g., if 150 references of 1,000 total jasis references examined belonged to the category of library science, then 15 percent of references belonged to library science, and so on for all disciplines). the present study deviates slightly from al-sabbagh's in that it does not use a sampling method. instead, all 878 articles published in jla/ital and their 7,575 references will be examined.
the categories for disciplines, instead of being based on dewey decimal classification, will be based on definitions derived from the encyclopedia britannica, and will include new disciplines that were not used in al-sabbagh's original analysis, such as information systems and instructional design.3 additionally, the major authors, publishers, and articles cited throughout jla/ital's history will be identified; this was not done in al-sabbagh's study, but will likely provide additional beneficial information for researchers and potential contributors to ital. ital is an ideal publication to study using al-sabbagh's methodology, in that it is affiliated with librarianship and library science but, due to its content, is also closely associated with the disciplines of information science, computer science, information systems, instructional design, psychology, and many others. ital is likely one of the more interdisciplinary journals to still fall within the category of "library science." in fact, as part of al-sabbagh's 1987 study, he distributed a survey to several prominent information science researchers, asking them to name journals relevant to information science (this method was used to determine that jasis was the most representative journal for the discipline of information science). on the list of 31 journals compiled from the respondents' rankings, ital ranked as the seventh most representative journal for information science, above datamation, scientometrics, jelis, and library hi-tech.4 this shows that, for a long time, ital has been considered an important journal not just in library science, but in information science and likely beyond.

key terminology
while the findings of this study are pertinent to the ital reader, some of the terminology used throughout the study may be unfamiliar. to acclimate the reader to the terminology used in this study, brief definitions for key concepts are provided below.

bibliometrics. "bibliometrics" is the statistical study of properties of documents.5 the present study constitutes a "citation analysis," a type of bibliometric analysis that examines the citations in a document and what they can reveal about said document.

cited publications. "cited publications" are the references ("publications") listed at the end of a journal article.6 the purpose of al-sabbagh's study (and the present study) is to analyze these cited publications to determine what disciplines influenced the research published in a specific journal. this bibliometric analysis methodology is distinct from those that examine the influence of a specific journal on a discipline (i.e., the present study looks at what disciplines influenced ital, not what disciplines are influenced by ital).

discipline. in this study, the term "discipline" is used liberally to refer to any area of study that is presently or was historically offered at an institution of higher education (sociology, anthropology, education, etc.). in this study, library science and information science are considered distinct disciplines (as was the case with al-sabbagh's study).7 as discussed in the methodology section, the names and definitions of disciplines are all derived from the encyclopedia britannica.
literature review
the type of citation analysis used by al-sabbagh, which serves as the basis of the current study, is frequently used to examine the interdisciplinarity of library and information science and specific lis journals, as noted by huang and chang.8 tsay used a methodology similar to al-sabbagh's to examine cited publications in the 1980, 1985, 1990, 1995, 2000, and 2004 volumes of jasist. in this study, the researcher found that about one-half of the citations in jasist came from the field of lis.9 butler examined lis dissertations using a similar approach, finding that about 50 percent of the cited publications in the dissertations originated in lis, with education, computer science, and health science following in the second, third, and fourth positions.10 chikate and paul and chen and liang conducted similar studies of dissertations in india and taiwan.11 each study found different degrees of interdisciplinarity, possibly indicating a fluctuation within the discipline of lis based on publication type, country of origin, and other characteristics of the publications used in the study. several researchers have used these methods recently to examine library and information science journals, such as chinese librarianship,12 pakistan journal of library and information science,13 library philosophy and practice,14 and the journal of library and information technology.15 these studies are more common for journals published outside of the united states, but there is no reason why the methodology would not hold true for a u.s.-based journal like ital. recently, publications in a wide array of fields have used methodologies similar to al-sabbagh's to evaluate interdisciplinarity in a discipline. ramos-rodriguez and ruiz-navarro (2004) examined reference trends in the journal of strategic management.16 fernandez-alles and ramos-rodriguez (2009) conducted a bibliometric analysis to identify those publications most frequently cited in the journal human resource management.17 crawley-low (2006) used a similar methodology to identify the core (most frequently cited) journals in veterinary medicine from the american journal of veterinary research.18 these studies show a growing interest in the use of citation analysis to present new information about a publication to potential authors, editors, and readers. jarvelin and vakkari (1993) noted trends in lis from 1965 to 1985 based on an examination of cited publications in lis journals. the authors noted a trend in interest in the topic of information storage and retrieval, with a de-emphasis on classification and indexing and a strengthened emphasis on information systems and retrieval.19 this study deviated from al-sabbagh and related studies of interdisciplinarity—though it employed a similar methodology—in that it examined trends or subtopics within the discipline of lis. though it is not a primary focus of the present study, the use of subtopics to further divide the discipline of library science and examine what aspects (management, technology, cataloging, reference) of the discipline are the focus of cited publications is incorporated in several tables in the results section.

methods
all references from the 878 articles published in the jla/ital journals (n=7,575) were transcribed to an excel spreadsheet for analysis (this spreadsheet can be found as a supplemental file [https://ejournals.bc.edu/index.php/ital/article/view/10875/9469]).
the spreadsheet includes separate columns for primary author, title, publisher, and discipline of each reference. the list of disciplines with their definitions, derived from encyclopedia brittanica, is displayed in table 1 below. information technology and libraries | june 2019 21 table 1. definitions of disciplines used for this study. discipline definition library science the principles and practices of library operation and administration, and their study. information science the discipline that deals with the processes of storing and transferring information. information systems the study of the integrated set of components for collecting, storing, and processing data and for providing information, knowledge, and digital products. computer science the study of computers, including their design (architecture) and their uses for computations, data processing, and systems control. engineering the application of science to the optimum conversion of the resources of nature to the uses of humankind. instructional design the systematic development of instructional specifications using learning and instructional theory to ensure the quality of instruction. education the discipline that is concerned with methods of teaching and learning in schools or school-like environments as opposed to various nonformal and informal means of socialization. government resources produced within the political system by which a country or community is administered and regulated. sociology a social science that studies human societies, their interactions, and the processes that preserve and change them. popular newspaper, magazine, media reports that do not fit better within another category. philosophy the rational, abstract, and methodical consideration of reality as a whole or of fundamental dimensions of human existence and experience. psychology the scientific discipline that studies mental states and processes and behaviour in humans and other animals. corporate business, corporate, private organization publications that do not fit better within another category. archival science the study of the repository for an organized body of records produced or received by a public, semipublic, institutional, or business entity in the transaction of its affairs and preserved by it or its successors. management the study of the process of dealing with or controlling things or people. linguistics the scientific study of language. literature the art of creation of a written work. law the discipline and profession concerned with the customs, practices, and rules of conduct of a community that are recognized as binding by the community. 50 years of ital/jla | lund 22 https://doi.org/10.6017/ital.v38i2.10875 discipline definition mathematics the science of structure, order, and relation that has evolved from elemental practices of counting, measuring, and describing the shapes of objects (also includes statistics). health science study of humans, the extent of an individual’s continuing physical, emotional, mental, and social ability to cope with his or her environment. communication science the study of the exchange of meanings between individuals through a common system of symbols. geography the study of the diverse environments, places, and spaces of earth’s surface and their interactions. physics the science that deals with the structure of matter and the interactions between the fundamental constituents of the observable universe. 
art/design the study of the nature of art, including such concepts as interpretation, representation and expression, and form. economics the social science that seeks to analyze and describe the production, distribution, and consumption of wealth. biology the study of living things and their vital processes. museum studies the study of institutions dedicated to preserving and interpreting the primary tangible evidence of humankind and the environment. music the art concerned with combining vocal or instrumental sounds for beauty of form or emotional expression, usually according to cultural standards of rhythm, melody, and, in most western music, harmony. chemistry the science that deals with the properties, composition, and structure of substances (defined as elements and compounds), the transformations they undergo, and the energy that is released or absorbed during these processes. science and technology studies the study, from a philosophical perspective, of the elements of scientific inquiry. journalism the collection, preparation, and distribution of news and related commentary and feature materials through such print and electronic media as newspapers, magazines, books, blogs, webcasts, podcasts, social networking and social media sites, and e-mail as well as through radio, motion pictures, and television. anthropology the study of human beings in aspects ranging from the biology and evolutionary history of homo sapiens to the features of society and culture that decisively distinguish humans from other animal species. to determine the discipline in which a cited publication would be classified, the researcher used the cited publication’s title, abstract, and journal to select the most appropriate discipline from the table. in those cases where a source could not be easily identified as falling within one specific discipline, the researcher conferred with additional reviewers (professional librarians) to determine the best fit. information technology and libraries | june 2019 23 several analyses of this data were conducted to explore various aspects of jla/ital’s publication history. for the complete data of the publication’s 51 volumes, the top ten most referenced authors, articles, publishers (journals/publishing houses/organizations/websites), and disciplines were identified with the aid of excel’s functions. the same was done separately for both the jla’s 14 volumes and ital’s 37 volumes. this will allow for the comparison of the journal before and after the 1982 rebranding. the 51 volumes of jla/ital were also divided into the five decades of its history: 1968-77, 1978-87, 1988-97, 1998-2007, 2008-18 (eleven volumes instead of ten). for each of these decades, the top ten authors, publishers, and disciplines were identified. for each of the three categories, a table was created to show the top ten of each decade side-by-side. lastly, the titles of the 7,575 cited publications in jla/ital articles were examined using a content analysis, to identify major concepts and themes that appear to have influenced jla/ital articles. nvivo content analysis software was utilized for this analysis. titles were fed from the excel spreadsheet into the nvivo software, and the word frequency tools were used to identify the most frequently used terms and “generalizations,” or types or themes of statements in the titles.20 results table 2 displays the top ten most-cited authors, articles, publishers/publications, and disciplines throughout ital’s fifty-year history. 
among the authors, four of the top six are associated with two institutions: library of congress and oclc. there are four corporate or nonprofit organizations, three academics (associated with institutions of higher education), two women and four men. of the top ten articles, eight were published before 1973; three were published in jla/ital and five were published in journals versus five in other (non-journal) publications. of the top ten publishers, seven are journals; five of the publishers are directly associated with library science. within the disciplines, lis represents 60 percent of the total. there are 31 total disciplines represented throughout the 51 volumes, a greater number of disciplines than identified in al-sabbaugh’s study of jasist. table 3 displays the results for jla. jla emerged at the same time as the machine-readable catalog (marc) and oclc, and this is evident in the authors, articles, and publishers cited in the journal. during this phase of the journal’s history, the top three authors—fred kilgour, the library of congress, and henriette avram—dominated the citations. these three authors were cited more than the next seven combined (143 to 101). the cited publications during this period reflected a focus on systems, corporate, and government publications. results for the 37 volumes of ital are displayed in table 4. during this period, marshall breeding emerged as one of the biggest influences on information technology and libraries. all but two of the top articles (larson and bizer) were written before 1985. while six publishers were the same as with jla, three of these six (library of congress, association for computing machinery, and college and research libraries) changed position in the top ten. the disciplines of systems, psychology, educational and instructional design rose, while government, corporate, management, linguistics, and electrical engineering dropped; library science, information science, and computer science remained at the top. 50 years of ital/jla | lund 24 https://doi.org/10.6017/ital.v38i2.10875 table 2. overall most cited. top ten authors (affiliation) top ten articles top ten publishers top ten disciplines top ten disciplines with percentages 1 u.s. library of congress american library association. (1967). anglo-american cataloging rules. chicago, il: american library association. ital/jla library science— technology library science 44% 2 fred g. kilgour (oclc) avram, h. d. (1968). the marc ii format: a communications format for bibliographic data. washington, dc: library of congress. asist information science information science 16% 3 henriette d. avram (library of congress) ruecking jr, f. h. (1968). bibliographic retrieval from bibliographic input; the hypothesis and construction of a test. information technology and libraries, 1(4), 227-238. association for computing machinery library science— cataloging computer science 8% 4 american library association kilgour, f. g., leiderman, e. b., & long, p. l. (1971). retrieval of bibliographic entries from a name-title catalog by use of truncated search keys. ohio college library center. college and research libraries computer science information systems 7% 5 ibm: international business machines kilgour, f. g. (1968). retrieval of single entries from a computerized library catalog file. proceedings of the american society for information science, 5, 133136. library of congress information systems government 3% 6 ohio college library center/online computer library center (oclc) long, p. 
l., & kilgour, f. (1972). a truncated search key title index. information technology and libraries, 5(1), 17-20. american library association library science— general instructional design 3% 7 marshall breeding (vanderbilt university/independent) hildreth, c. r. (1982). online public access catalogs: the user interface. oclc online computer library center, incorporated. library resources and technical services government corporate 2% 8 jakob nielsen (independent) nugent, w. r. (1968). compression word coding techniques for information retrieval. information technology and libraries, 1(4), 250-260. library hitech library science— administration education 2% 9 karen markey (university of michigan) curwen, a. g. (1990). international standard bibliographic description. in standards for the international exchange of bibliographic information: papers presented at a course held at the school of library, archive and information studies, university college london (pp. 3-18). library journal instructional design psychology 2% 10 walt crawford (research libraries group/independent) fasana, p. j. (1963). automating cataloging functions in conventional libraries (no. isl-9028-37). lexington, ma: itek corp information sciences lab. oclc library science— academic sociology 2% information technology and libraries | june 2019 25 table 3. jla most cited. top ten authors (affiliation) top ten articles top ten publishers top ten disciplines top ten disciplines with percentages 1 fred g. kilgour (oclc) avram, h. d. (1968). the marc ii format: a communications format for bibliographic data. washington, dc: library of congress. journal of library automation library science— technology library science 58% 2 u.s. library of congress american library association. (1967). anglo-american cataloging rules. chicago, il: american library association. asist information science information science 14% 3 henriette d. avram (library of congress) ruecking jr, f. h. (1968). bibliographic retrieval from bibliographic input; the hypothesis and construction of a test. journal of library automation, 1(4), 227-238. library of congress library science— cataloging computer science 6% 4 ibm: international business machines kilgour, f. g., leiderman, e. b., & long, p. l. (1971). retrieval of bibliographic entries from a nametitle catalog by use of truncated search keys. ohio college library center. library resources and technical services library science— general government 5% 5 american library association long, p. l., & kilgour, f. (1972). a truncated search key title index. journal of library automation, 5(1), 17-20. ibm computer science corporate 5% 6 william r. nugent (inforonics, inc.) kilgour, f. g. (1968). retrieval of single entries from a computerized library catalog file. proceedings of the american society for information science, 5, 133-136. american library association government information systems 4% 7 paul j. fasana (columbia university) livingston, l.g. (1973). international standard bibliographic description for serials. library resources and technical services, 17(3), 293-298. association for computing machinery corporate management 2% 8 philip l. long (oclc) fasana, p. j. (1963). automating cataloging functions in conventional libraries (no. isl-9028-37). lexington, ma: itek corp information sciences lab. university of illinois press information systems linguistics 1% 9 martha e. williams (university of illinois) nugent, w. r. (1968). compression word coding techniques for information retrieval. 
journal of library automation, 1(4), 250-260. college and research libraries library science— academic electrical engineering 1% 10 university of california avram, h. d. (1970). the recon pilot project: a progress report. journal of library automation, 3(2), 102-114. special libraries library science— special psychology 1% 50 years of ital/jla | lund 26 https://doi.org/10.6017/ital.v38i2.10875 table 4. ital most cited. top ten authors (affiliation) top ten articles top ten publishers top ten disciplines top ten disciplines with percentages 1 u.s. library of congress american library association. (1967). anglo-american cataloging rules. chicago, il: american library association. information technology and libraries library science— technology library science 41% 2 american library association hildreth, c. r. (1982). online public access catalogs: the user interface. oclc online computer library center, incorporated. asist information science information science 16% 3 marshall breeding (vanderbilt university/independent) markey, k. (1984). subject searching in library catalogs. oclc online computer library center. association for computing machinery library science— cataloging computer science 9% 4 jakob nielsen (independent) malinconico, s. m. (1979). bibliographic data base organization and authority file control. wilson library bulletin, 54(1), 36-45. college and research libraries computer science information systems 7% 5 karen markey (university of michigan) matthews, j. r., lawrence, g. s., & ferguson, d. (1983). using online catalogs: a nationwide survey. neal-schuman publishers, inc.. library hitech information systems instructional design 3% 6 oclc bizer, c., heath, t., & berners-lee, t. (2011). linked data: the story so far. in semantic services, interoperability and web applications: emerging concepts (pp. 205-227). igi global. american library association instructional design government 2% 7 walt crawford (research libraries group/independent) tolle, j. e. (1983). current utilization of online catalogs: transaction log analysis. volume i of three volumes. final report. ohio college library center library science— administration education 2% 8 clifford a. lynch (university of california/coalition for networked 0information) larson, r. r. (1991). the decline of subject searching: long-term trends and patterns of index use in an online catalog. journal of the american society for information science, 42(3), 197-215. journal of academic librarianship library science— general sociology 2% 9 charles r. hildreth (read ltd.) markey, k. (1983). online catalog use: results of surveys and focus group interviews in several libraries. volume ii of three volumes. final report. library journal library science— academic psychology 2% 10 j.r. matthews (san jose state university/independent) ludy, l.e., & logan, s.j. (1982). integrating authority control in an online catalog. american society for information science meeting, 19, 176-178. library of congress government management 2% the top ten authors of each decade are shown in table 5. for the first two decades, fred kilgour was a dominate influence, receiving 15 more citations than the next closest author (the library of congress). in the third decade, kilgour dropped entirely from the top ten and was supplanted at the top spot by karen markey, professor at the university of michigan. during the fourth decade, in the wake of cipa and the u.s. 
patriot act, the library of congress rose to the top spot and john bertot and paul jaeger, who wrote extensively on these topics and their legal, social, and administrative implications, rose up the list. web resources, such as google, also began to emerge in the fourth decade. in the final decade, breeding, who writes on library systems as well as information technology in general, rose to the top spot. tim berners-lee, one of the pioneers of the internet and linked data, took the second spot. jakob nielsen, known for his contributions to usability testing, appears in the top three of the rankings for both the fourth and fifth decades. only the library of congress and the american library association appear in the top ten list for all five decades.

table 5. top ten authors of each decade.
rank | 1968-77 | 1978-87 | 1988-97 | 1998-2007 | 2008-18
1 | fred g. kilgour (oclc) | fred g. kilgour (oclc) | karen markey (university of michigan) | u.s. library of congress | marshall breeding (vanderbilt university/independent)
2 | u.s. library of congress | robert de gennaro (harvard university/pennsylvania university) | u.s. library of congress | jakob nielsen (independent) | tim berners-lee (w3 consortium/university of oxford/massachusetts institute of technology)
3 | henriette d. avram (library of congress) | henriette d. avram (library of congress) | clifford a. lynch (university of california/coalition for networked information) | john c. bertot (university of maryland) | jakob nielsen (independent)
4 | ibm: international business machines | ibm: international business machines | michael k. buckland (university of california) | oclc | u.s. library of congress
5 | american library association | s. michael malinconico (new york public library/university of alabama) | american library association | paul t. jaeger (university of maryland) | american library association
6 | paul j. fasana (columbia university) | u.s. library of congress | christine l. borgman (university of california-los angeles) | walt crawford (research libraries group/independent) | national information standards organization
7 | william r. nugent (inforonics, inc.) | frederick w. lancaster (university of illinois) | charles r. hildreth (read ltd) | american library association | u.s. government
8 | university of california | allen b. veaner (stanford university/university of california) | joseph r. matthews (san jose state university/independent) | roy tennant (university of california/oclc) | john c. bertot (university of maryland)
9 | kenneth j. bierman (oklahoma state university/university of nevada-las vegas) | alan l. landgraf (oclc) | walt crawford (research libraries group/independent) | google | oclc
10 | robert m. hayes (university of california-los angeles) | american library association | lois m. chan (university of kentucky) | thomas b. hickey (oclc) | jung-ran park (drexel university)

jla/ital appears as the most cited publisher in all decades except the fourth, as shown in table 6. during that decade, acm and jasist rose above ital, and library journal and websites (websites are considered in this study as a collective group) emerged on the list. library journal was a frequently used source for bertot and jaeger, who authored several ital articles during this period. there were also more articles about the internet, digital libraries, google and google scholar, and the future of libraries during the fourth decade.
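as the limitations section notes, these decade-by-decade tallies were compiled by hand in excel. purely as an illustration of the counting behind tables 5 and 6, a per-decade author ranking could be produced from a machine-readable citation list along the following lines; the record layout, the decade_label helper, and the sample data are hypothetical and are not part of the study.

```python
from collections import Counter, defaultdict

# hypothetical records: (year of the citing jla/ital article, first author of the cited work)
citations = [
    (1969, "fred g. kilgour"),
    (1972, "u.s. library of congress"),
    (2011, "marshall breeding"),
]

def decade_label(year):
    # the study's spans: 1968-77, 1978-87, 1988-97, 1998-2007, and an eleven-year 2008-18
    if year >= 2008:
        return "2008-18"
    start = 1968 + ((year - 1968) // 10) * 10
    return f"{start}-{start + 9}"

authors_by_decade = defaultdict(Counter)
for year, author in citations:
    authors_by_decade[decade_label(year)][author] += 1

for decade, tally in sorted(authors_by_decade.items()):
    print(decade, tally.most_common(10))  # ten most-cited authors in that span
```

the same counter pattern would apply to publishers or to assigned disciplines by swapping out the second element of each record.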
jasist appears in the top four of every decade but has declined in the fifth decade of ital. oclc, ibm, college and research libraries, cataloging and classification quarterly, journal of academic librarianship, library resources and technical services, and library hi-tech all appear in multiple decades of this list. table 6. top ten publishers of cited articles for each decade. 1968-77 1978-87 1988-97 1998-2007 2008-18 1 jla jla/ital ital association for computing machinery ital 2 library of congress jasist jasist jasist library hi-tech 3 jasist library journal college and research libraries ital association for computing machinery 4 library resources and technical services oclc american library association college and research libraries jasist 5 ibm university of illinois press library resources and technical services american library association journal of academic librarianship 6 american library association library of congress oclc library journal college and research libraries 7 special libraries library resources and technical services library of congress journal of academic librarianship computers in libraries 8 college and research libraries american library association library hi-tech general websites d-lib 9 association for computing machinery prentice-hall journal of academic librarianship library hi-tech cataloging and classification quarterly 10 university of illinois press ibm cataloging and classification quarterly oclc ieee as shown in table 7, library science and information science maintained the first and second positions for every decade of jla/ital’s publication, while computer science and information systems jockeyed for the third and fourth positions every decade except the first (when government reports had a major impact on the journal). government and corporate (ibm particularly) were important in the first three decades but were replaced by instructional design and education in the final two decades. sociology appears in four of five decades, while psychology appears in three of five. in the first two decades, electrical engineering (as it applied to the design of computer systems) rounded up the top ten; law emerged in decade four, following cipa and the information technology and libraries | june 2019 29 patriot act; in the final decade, with the discussion about encoded archival description in ital, archival science rose to the tenth spot. table 7. top ten disciplines of each decade (library science subcategories combined). 1968-77 1978-87 1988-97 1998-2007 2008-18 1 library science library science library science library science library science 2 information science information science information science information science information science 3 computer science information systems computer science computer science information systems 4 government computer science information systems information systems computer science 5 corporate corporate government instructional design instructional design 6 information systems government philosophy education psychology 7 management management sociology corporate government 8 linguistics sociology literature sociology education 9 electrical engineering psychology psychology philosophy sociology 10 chemistry electrical engineering education law archival science table 8 compares all disciplines (including subcategories of library science) in the first decade of jla/ital and the fifth decade. 
compared to the first decade, the fifth decade saw greater diversification of subtopics under library science, which led to "information science" supplanting "library science—technology" in the top spot. instructional design and archival science emerged from disciplines not discussed in the first decade to become some of the most important disciplines of the fifth decade. the library science subtopics of accessibility and teaching grew significantly as the roles of the librarian evolved.

table 8. first ten years vs. last eleven years disciplines (with subcategories of library science).
rank | 1968-77 | 2008-18
1 | library science—technology | information science
2 | information science | library science—technology
3 | library science—cataloging | information systems
4 | library science—general | computer science
5 | computer science | library science—cataloging
6 | government | instructional design
7 | corporate | library science—accessibility
8 | library science—academic | library science—academic
9 | information systems | library science—reference
10 | library science—special | library science—administration
11 | management | psychology
12 | linguistics | government
13 | electrical engineering | library science—general
14 | library science—medical | education
15 | popular | popular
16 | library science—reference | library science—teaching
17 | chemistry | sociology
18 | physics | archival science
19 | engineering—general | management
20 | psychology | law
21 | mathematics | corporate
22 | library science—local | mathematics
23 | communication science | philosophy
24 | health science | literature
25 | library science—accessibility | linguistics
26 | library science—school | physics
27 | philosophy | health science
28 | library science—administration | geography
29 | journalism | electrical engineering
30 | government | library science—medical
31 | music | biology
32 | education | art/design
33 | literature | museum studies
34 | - | economics
35 | - | communication science
36 | - | engineering—general
37 | - | journalism
38 | - | library science—special
39 | - | chemistry
40 | - | science and technology studies
41 | - | library science—school
42 | - | library science—local
43 | - | anthropology

table 9 shows the ten biggest themes and most frequently used terms throughout jla/ital's 51 volumes. library is the most common theme and term. the library catalog, and the associated concept of the integrated library system (ils), influence the second and third themes. "online" is an interesting theme/term for the different ways in which it was used throughout the history of the journal. in the early years, "online" was used to refer to the retrieval of computerized bibliographic information; in later years, "online" came to refer almost exclusively to the use of the world wide web. rounding out the top ten terms are several associated with the study of information science: data, bibliography, and retrieval. finally, table 10 depicts the top ten themes for each of ital's five decades. libraries remained at the top for all decades; the second spot, however, shifted dramatically. in the first decade, with marc being a major topic of discussion, "system" and "catalog" rose to the top. in decades two and three, with the melding of the disciplines of library science and information science, "information" rose to the top. in the final two decades, the world wide web was influential on the ital discourse. users, usability, and accessibility remained important themes throughout the history of the journal.
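the theme and term counts in tables 9 and 10 come from the titles of cited publications, which the study coded manually. purely as a rough sketch of that kind of tally, title terms could be counted along the lines below; the sample titles and the stopword list are hypothetical stand-ins, not the study's data or coding scheme.

```python
import re
from collections import Counter

# hypothetical sample of cited-publication titles (the study hand-coded 7,575 of them)
titles = [
    "online public access catalogs: the user interface",
    "the marc ii format: a communications format for bibliographic data",
    "linked data: the story so far",
]

# a minimal stopword list; a real analysis would need a fuller one
stopwords = {"the", "a", "an", "and", "of", "for", "in", "on", "to", "its", "so", "far"}

term_counts = Counter(
    word
    for title in titles
    for word in re.findall(r"[a-z]+", title.lower())
    if word not in stopwords
)

print(term_counts.most_common(10))  # most frequent title terms, cf. table 9
```

grouping related terms (catalog/catalogs, library/libraries) into broader themes, as the study does, would require an additional mapping step on top of the raw counts.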
table 9. major themes and term frequency in titles of cited publications (all 51 volumes).
rank | themes | terms
1 | library | library
2 | catalog | information
3 | system | online
4 | information | system
5 | online | web
6 | usability | catalog
7 | web | digital
8 | search | data
9 | computer | bibliography
10 | digital | retrieval

table 10. major themes in titles of cited publications (by decade).
rank | 1968-77 | 1978-87 | 1988-97 | 1998-2007 | 2008-18
1 | library | library | library | library | library
2 | system | information | information | web | web
3 | catalog | system | catalog | information | digital
4 | information | catalog | web | digital | information
5 | online | online | system | usability | usability
6 | usability | web | digital | users | data
7 | web | usability | online | catalog | users
8 | search | digital | usability | search | accessibility
9 | computer | users | users | accessibility | studies
10 | digital | search | accessibility | data | academic

discussion

one of the major benefits of a bibliometric study/citation analysis is the production of a set of themes, disciplines, seminal sources, influences, and influencers that may benefit potential authors in determining whether their manuscript is suitable for publication in a specific discipline or journal.21 the results of this study demonstrate that ital is undoubtedly a library science journal, but that it invites a high level of interdisciplinarity and has experienced a growing impact from the disciplines of information science, computer science, and information systems (which combined presently comprise about 30 percent of total ital references). throughout the journal's history, there has been an emphasis on library systems, particularly systems for library cataloging. recently, however, there has also been an emphasis on technology, law, and the library as well as instructional technology, teaching, and the library. ital authors take the majority of their citations/ideas from other ital articles, jasist, acm, and other library technology (library hitech, d-lib) and academic librarianship (college and research libraries, journal of academic librarianship) journals. some of the major authors to read to familiarize oneself with the history and themes of the ital publication include fred kilgour, henriette avram, karen markey, and marshall breeding. these are findings that potential ital authors may put to practical use while crafting their research and writing. with ital having a sustained role as a leading publication in library and information science, this study may have some generalizable findings for the discipline. in 2015, richard van noorden produced an interactive chart of the interdisciplinarity of a variety of disciplines, based on data from web of science and the national science foundation.22 if ital is considered representative of a sub-discipline called "library and information science—technology," it can be compared to the interdisciplinarity of the disciplines listed in van noorden's study. in the last decade of ital, 45.4 percent of citations came from outside of lis. compared to van noorden's findings, only 42 of 144 (29 percent) "fields" (or "disciplines," as they have been referred to in this study) have a higher proportion of references to outside disciplines.
this lis-tech sub-discipline would have a level of interdisciplinarity comparable to the fields of oceanography, botany, philosophy, history, and psychology, and on par with the average for all social sciences.23 this shows that the discipline certainly has its own proprietary knowledge base to build upon, but also values the contributions of knowledge from other disciplines. though it is not necessarily the purpose of this study to examine the influence of ital on other journals and within the discipline of lis, some of this information can be gathered rather easily from google scholar (by searching for the journal and comparing the number of citations for each article, as displayed by scholar) and is worth sharing. table 11 shows the top ten most-cited articles published over the history of jla/ital, with mcclure's 1994 article "network literacy: a role for libraries" receiving the most references of any article published in the journal. three ital articles have been cited by articles which themselves have over 1,000 citations, including one article (2007's "checking out facebook.com") that has been cited by an article which itself has been cited over 10,000 times. fifty-seven ital articles have at least 57 citations each, giving the journal an h-index24 of 57.

table 11. citations of ital articles in outside journals.
rank | journal citation | number of citations
1 | mcclure, c. r. (1994). network literacy: a role for libraries? information technology and libraries, 13(2), 115-26. | 447
2 | charnigo, l., & barnett-ellis, p. (2007). checking out facebook.com: the impact of a digital trend on academic libraries. information technology and libraries, 26(1), 23-34. | 391
3 | antelman, k., lynema, e., & pace, a. k. (2006). toward a 21st century library catalog. information technology and libraries, 25(3), 128-39. | 267
4 | spiteri, l. f. (2007). the structure and form of folksonomy tags: the road to the public library catalog. information technology and libraries, 26(3), 13-25. | 260
5 | katz, i. r. (2007). testing information literacy in digital environments: ets's iskills assessment. information technology and libraries, 26(3), 3-12. | 226
6 | jeng, j. (2005). what is usability in the context of the digital library and how can it be measured. information technology and libraries, 24(2), 47-56. | 196
7 | lankes, r. d., silverstein, j., & nicholson, s. (2007). participatory networks: the library as conversation. information technology and libraries, 26(4), 17-33. | 189
8 | dickstein, r., & mills, v. (2000). usability testing at the university of arizona library: how to let the users in on the design. information technology and libraries, 19(3), 144-51. | 188
9 | schaffner, a. c. (1994). the future of scientific journals: lessons from the past. information technology and libraries, 13(4), 239-47. | 177
10 | kopp, j. j. (1998). library consortia and information technology: the past, the present, the promise. information technology and libraries, 17(1), 7. | 166

limitations of this study

there were a couple of potential limitations to this study. this bibliometric analysis was conducted in the "old-fashioned" way, using excel and hand-typing out all 7,575 cited publications. this was deemed the most effective way to collect the data, based on the availability of the ital journal, but did take a great deal of time.
to save time in recording data, only the first author for each cited publication was listed and no publication dates were collected, nor were abstracts retained and analyzed (which may provide additional compelling details about the content of these cited publications). greater validity for the assignment of disciplines to cited publications may be achieved by having a large team of researchers for analysis, or using multiple researchers for all citations, not just those that the first researcher deems questionable.25 as with a content analysis, independent review of data and comparison and compromising of coding is likely to provide the most consistent and accurate results. information technology and libraries | june 2019 35 conclusion fifty-one volumes of the journal of library automation/information technology and libraries have been published, over which time library technology has evolved from early-marc, a time in which the exceptional library would have perhaps a single computer for “online retrieval,” to the internet age, characterized by personal computing, library management systems, and technology-aided instruction. as time has passed, many of the major influences on the journal have changed, yet the journal has remained one of the most influential for library and information science technology. increased interdisciplinarity in cited publications and new directions in information law and education offer new directions as the journal enters its sixth decade. endnotes 1 imad al-sabbagh, “evolution of the interdisciplinarity of information science: a bibliometric study” (phd diss., florida state university, 1987). 2 ibid. 3 encyclopedia britannica, https://www.britannica.com/ (accessed sept. 13, 2018). 4 al-sabbagh, “evolution of the interdisciplinarity.” 5 melissa k. mcburney and pamela l. novak, “what is bibliometrics and why should you care?” professional communication conference, ieee (2002): 108-14, https://doi.org/10.1109/ipcc.2002.1049094. 6 lutz bornmann and rudiger mutz, “growth rates of modern science,” journal of the association for information science and technology 66, no. 11 (2015): 2, 215-222, https://doi.org/10.1002/asi.23329. 7 al-sabbagh, “evolution of the interdisciplinarity.” 8 mu-hsuan huang and yu-wei chang, “a study of interdisciplinarity in information science: using direct citation and co-authorship analysis,” journal of information science 37, no. 4 (2011): 369-78, https://doi.org/10.1177/0165551511407141. 9 ming-yueh tsay, “journal bibliometric analysis: a case study on the jasist,” malaysian journal of library & information science 13, no. 2 (2008): 121-39, http://ejum.fsktm.um.edu.my/article/663.pdf. 10 lois buttlar, “information sources in library and information science doctoral research,” library & information science research 21, no. 2 (1999): 227-45, https://doi.org/10.1016/s0740-8188(99)00005-5 11 r.v. chikate and s.k. patil, “citation analysis of theses in library and information science submitted to university of pune: a pilot study,” library philosophy and practice 222 (2008); kuang-hua chen and chiung-fang liang, “disciplinary interflow of library and information science in taiwan,” journal of library and information studies 2, no. 2 (2004): 31-55. 50 years of ital/jla | lund 36 https://doi.org/10.6017/ital.v38i2.10875 12 akhtar hussain and nishat fatima, “a bibliometric analysis of the ‘chinese librarianship: an international electronic journal,’ 2006-2010,” chinese librarianship 31, no. 1 (2011): 1-14, http://www.iclc.us/cliej/cl31hf.pdf. 
13 nosheen fatima warraich and sajjad ahmad, “pakistan journal of library and information science: a bibliometric analysis,” pakistan journal of information management and libraries 12, no. 1 (2011): 1-7. http://eprints.rclis.org/25600/. 14 s. thanuskodi, “bibliometric analysis of the journal library philosophy and practice from 20052009,” library philosophy and practice 437 (2010): 1-6. https://digitalcommons.unl.edu/libphilprac/437/. 15 manoj kumar and a.l. moorthy, “bibliometric analysis of desidoc journal of library and information technology during 2001-2010,” desicoc journal of library and information technology 31, no. 3 (2011): 203-08. 16 antonio ramos-rodriguez and jose ruiz-navarro, “changes in the intellectual structure of strategic management research: a bibliographic study of the strategic management journal, 1980-2000,” strategic management journal 25, no. 10 (2004): 981-1,004, https://doi.org/10.1002/smj.397. 17 mariluz fernandez-alles and antonio ramos-rodriguez, “intellectual structure of human resources management research: a bibliometric analysis of the journal human resource management, 1985-2005,” jasist 60, no. 1 (2009): 161, https://doi.org/10.1002/asi.20947. 18 jill crawley-low, “bibliometric analysis of the american journal of veterinary research to produce a list of core veterinary medicine journals,” jmla 94, no. 4 (2006): 430-34. 19 kalervo jarvelin and pertti vakkari, “the evolution of library and information science 19651985: a content analysis of journal articles,” information processing and management 29, no. 1 (1993): 129-44, https://doi.org/10.1016/0306-4573(93)90028-c. 20 r. barry lewis, “nvivo and atlas.ti 5.0: a comparative review of two popular qualitative data-analysis programs,” field methods 16, no. 4 (2004): 439-69, https://doi.org/10.1177/1525822x04269174. 21 thad van leeuwen, “the application of bibliometric analyses in the evaluation of social science research: who benefits from it, and why it is still feasible,” scientometrics 66, no. 1 (2006): 133-54, https://doi.org/10.1007/s11192-006-0010-7. 22 richard van noorden, “interdisciplinarity research by the numbers,” nature 525, no. 7569 (2015): 306-07, https://doi.org/10.1038/525306a. 23 ibid, 306. 24 lutz bornmann and hans-dieter daniel, “what do we know about the h index,” journal of the american society for information science and technology 58, no. 9 (2007): 1,381-385, https://doi.org/10.1002/asi.20609. 25 linda c. smith, “citation analysis,” library trends 30, no. 1 (1981): 83-106. at the click of a button: assessing the user experience of open access finding tools articles at the click of a button assessing the user experience of open access finding tools elena azadbakht and teresa schultz information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.12041 elena azadbakht (eazadbakht@unr.edu) is health sciences librarian, university of nevada, reno. teresa schultz (teresas@unr.edu) is social sciences librarian, university of nevada, reno. abstract a number of browser extension tools have emerged in the past decade aimed at helping information seekers find open versions of scholarly articles when they hit a paywall, including open access button, lazy scholar, kopernio, and unpaywall. while librarians have written numerous reviews of these products, no one has yet conducted a usability study on these tools. this article details a usability study involving six undergraduate students and six faculty at a large public research university in the united states. 
participants were tasked with installing each of the four tools as well as trying them out on three test articles. both students and faculty tended to favor simple, clean design elements and straightforward functionality that enabled them to use the tools with limited instruction. participants familiar with other browser extensions gravitated towards tools like open access button, whereas those less experienced with other extensions preferred tools that load automatically, such as unpaywall. introduction while the open access (oa) movement seeks to make scholarly output freely accessible to a wide number of people, finding the oa versions of scholarly articles can be challenging. in recent years, several tools have emerged to help individuals retrieve an oa copy of articles when they hit a paywall. some of the most familiar of these—lazy scholar, open access button, unpaywall, and kopernio—are all free browser extensions. however, poor user experience can hamper even the adoption of free tools. usability studies, particularly of academic websites and search tools, are prevalent in the literature, but as of yet no one has compared the user-friendliness of these extensions. how open access tools work all of the tools can be installed for free as a google chrome browser extension. all four tools also work in firefox. the idea is that when a user hits a paywall for an article, they can use that tool to search for an open version. each works slightly differently: open access button (https://openaccessbutton.org/)—the oa icon will appear to the right of the browser’s search bar (see figure 1). when a user clicks it, a new page will open that is either the open version of the article if one is found or a message saying it was not able to find an open version. the user is then given the option to write an email to the author asking that it be made open. mailto:eazadbakht@unr.edu mailto:teresas@unr.edu https://openaccessbutton.org/ information technology and libraries june 2020 at the click of a button | azadbakht and schultz 2 figure 1. the oab icon appears as an orange padlock in the browser's toolbar. lazy scholar (http://www.lazyscholar.org/)—a horizontal bar will appear at the top of the page for any scholarly article (see figure 2). along with other information, such as how many citations an article has and the ability to generate a citation for that article, pdf and/or file icons will appear in the middle of the bar if an open version is found. users can then click on any of the icons to be taken to that open version. if no open version is found, no icons will appear. there is no text message indicating nothing has been found. a browser button is also installed, and users can click it to make the bar disappear and reappear. figure 2. the lazy scholar toolbar appears just below the browser's search bar. kopernio (https://kopernio.com/)—a tab will appear in the bottom left corner of the screen for any scholarly article (see figure 3). if there is an open version, the tab will be dark green. if no article is found, the tab will be shorter and grey. if a user hovers over it, they will see a message indicating if an open version was found. when a user clicks on the dark green tab, kopernio automatically opens the article in its own viewer, called a locker, instead of the browser’s viewer. unlike the other three tools, users must register with kopernio and they can add their institution so kopernio can search to see if their institution has access to the article. 
http://www.lazyscholar.org/ https://kopernio.com/ information technology and libraries june 2020 at the click of a button | azadbakht and schultz 3 figure 3. the kopernio icon appears on the bottom left of the screen. unpaywall (https://unpaywall.org/)—a short tab will appear in the middle right of a screen for a scholarly article. when it has found an open version, the tab will turn bright green (see figure 4). when an open version has not been found, it will turn a light grey. clicking on the grey tab will also open a message indicating an open version could not be found. figure 4. unpaywall's green padlock icon appears halfway down on the right side of the screen. literature review the need for open access finding tools although oa helps take down financial barriers to accessing the scholarly literature, there is no one place to deposit content in order to make it oa. the registry of open access repositories, a database of both institutional and subject repositories, shows 4,725 repositories.1 no central database exists that searches every possible location for oa material, which means discovery of oa content remains difficult. willi hooper noted that “making repository content findable is a major challenge facing libraries.”2 nicholas et al. found in their study of international early-career researchers that most rely on google and google scholar to find scholarly articles and that one of their main goals is to find the full text as fast as possible.3 google scholar does include oa versions of articles, but this is not always readily obvious without clicking and trying each article version until they find an oa version. dhakal also notes that search engines do not always aggregate content in institutional repositories on a consistent basis.4 joe mcarthur, one of the founders of the open access button, said he decided to invent it after hitting paywalls after graduating.5 https://unpaywall.org/ information technology and libraries june 2020 at the click of a button | azadbakht and schultz 4 oaft reviews and other research most of the scholarly literature on oafts has focused on reviews of specific tools. unpaywall has received a number of positive reviews,6 and both open access button7 and kopernio8 have received several as well. dhakal has noted that unpaywall “helps take the guesswork out of accessing oa articles.”9 reviewers have generally found the tools easy to use, although some have included criticism. for instance, rodriguez found that open access button can result in odd error messages and false negatives (that is, not finding an open access version that actually does exist), although he liked the tool overall.10 little research has looked at how well the tools work and how usable they are, however. regier informally investigated why unpaywall and similar tools do not always find articles that are open and noted that one problem is likely that publishers of oa journals do not always upload their license information to crossref, one of the sites that unpaywall relies on.11 schultz et al. looked at how many oa versions the tools found in comparison to google scholar. none of the tools found as many as google scholar, although lazy scholar, unpaywall, and open access button all compared favorably to it, and each tool found at least some open versions that no other tool did.12 usability and other evaluation studies since the late 1990s, libraries have sought to improve the user experience of their websites and electronic resources. 
usability testing has since become a popular means of evaluating a library's online presence with the input of its users. blummer's 2007 literature review chronicles the first phase of this trend in a section of her article dedicated to early usability studies of academic library websites.13 many of these studies included both student and faculty participants and found that navigation issues needed to be resolved in order to maximize users' ability to locate key information on the library websites being evaluated. some also discovered that users misunderstood library terminology and that providing better descriptive terms and text helped improve the user experience.14

more recent examples of library website usability studies include one from 2018 by guay, rudin, and reynolds and another published in 2019 by overduin. the former's findings echoed those of earlier studies in finding that a cluttered interface can mask important navigational elements and content, hindering use.15 overduin describes a think-aloud, task-based study of the california state university bakersfield walter w. stiern library that concluded it was important for libraries to consider the preferences of both new and returning users when redesigning their websites.16

while most of the literature involves usability studies of library websites, online catalogs, and discovery layers, librarians have also evaluated other academic products and tools. in 2015, imler, garcia, and clements investigated pop-up chat reference widgets, such as those available through springshare's libchat software program.17 librarians at penn state university interviewed thirty students across three campuses, asking them to interact with a chat widget. the vast majority of students did not find the pop-up widget annoying, and many agreed that they would be more likely to use chat reference if they encountered it on the website. in addition, the participants preferred to have at least a ten-second delay between the loading of the webpage and the appearance of the pop-up, with an average ideal time of about fourteen seconds.18 haggerty and scott evaluated the usability of an academic library search box and its specialized search features through task-based interviews with twenty participants, most of whom were students.19 most of the study's participants indicated a preference for a simplified search box, though some were reluctant to lose access to the specialized search tabs. at around the same time, beisler, bucy, and medaille conducted a primarily task-based usability study of three streaming video databases to determine how patrons were using them.20 the students showed a preference for intuitive interfaces, whereas the faculty were concerned with the videos' metadata and descriptions as well as the accessibility and shareability of the content. the databases' advanced features were used less successfully. the results suggest that vendors would benefit from making navigation simpler and terminology clearer while enhancing search functionality.

methodology

in keeping with usability testing best practices, this study involved twelve subjects total, six students and six faculty members at the university of nevada, reno (unr).21 the authors sought subjects from a diverse set of science and social science disciplines.
recruitment efforts consisted of fliers and targeted emails, some sent directly by the authors to faculty members in their liaison areas and others distributed to students and faculty by liaison librarian colleagues. interested students were directed to a simple qualtrics form that asked them for their name, major, class standing, contact information, and whether they had ever used any of the four tools before. the student participants each received a $15 amazon gift card. faculty did not receive any compensation. the study was approved by unr's institutional review board (project number 1452303-2). faculty interviews took place in september 2019, and student interviews took place in november 2019.

the usability testing took place in three private conference rooms within the main library on a university-owned laptop running microsoft windows 10 and the chrome browser. participants were asked if they were regular users of chrome, and all were to some degree, with several indicating that they use it exclusively. both authors were present at all of the tests, alternating who walked the participants through the various tasks and who took notes. the screencast-o-matic screen capture software recorded both the participants' audio and video as well as their movements on the computer screen. referring to a script, the authors asked the participants to install each browser extension and then use it, in turn, to find three scholarly articles from journals not available to the unr community but recently requested through the libraries' interlibrary loan service. the authors switched the order in which they had participants install the four tools. half of the participants started by installing open access button, followed by lazy scholar, unpaywall, and kopernio, whereas the other half installed them in reverse order, beginning with kopernio. the authors uninstalled the tools between usability tests.

the three journal articles were selected with the assistance of unr libraries' coordinator of course reserves and document delivery services. the study purposely included articles that the university did not have access to in order to ensure that none of the tools found "open versions" simply because of the libraries' subscription. two were findable by all four oa finding tools and one was a "planned fail" that could not be retrieved by any of them, allowing the authors to witness how participants responded to this failure on the part of the tools. participants decided
they also noted any issues the participants experienced and any comments they made. the authors met and coded all the information they had gleaned from the 12 usability tests together. limitations this study included only 12 participants, all from the same institution. the authors know the faculty participants, as they recruited faculty directly from the disciplines with which they routinely work. moreover, these faculty are all considered early or mid-career. while this was intentional, as the authors wanted to focus on that sub-population of researchers, it may have had an effect on the faculty members’ impression of the tools. likewise, the participants’ comfort with technology, particularly their ability to learn new technology on-the-fly, and prior experience using other browser extensions or research productivity tools was not formally assessed prior to testing. these skills may have impacted how quickly participants were able to figure out a particular tool and how long they tried to find the full text of one of the articles before giving up. results installation all participants successfully installed each of the four tools, and most took around the same time to install each tool, with none taking more than 90 seconds to install. the longest installation, 84 seconds for kopernio, was connected to a technical issue that occurred during the installation. most participants seemed to have an easy time installing the tools, although some noted they found certain tools easier to download than others. for instance, faculty 1 noted that they thought lazy scholar was easier to install than open access button, and faculty 3 said they thought both open access button and unpaywall “were pretty smooth.” student 2 liked it when there was an obvious “install now”–type button, saying, “that’s pretty convenient.” when participants did struggle, it was usually with open access button and lazy scholar. participants did not always seem to realize right away which button on open access button’s website would download the tool. other times, participants were not sure if the tool had installed, not noting the new button on the chrome browser bar. for lazy scholar, one participant, student 4, noted it seemed to take more clicks to install it, and two participants received an error message, although they both were able to successfully install the tool on a second try. kopernio also information technology and libraries june 2020 at the click of a button | azadbakht and schultz 7 resulted in several error messages for one participant when creating an account, and the authors had to use one of their accounts to allow the participant to continue with the study. ability to use the tools when looking at whether participants were able to successfully use each tool on each of the three sample articles, we determined that participants were most successful using unpaywall. all participants successfully used the tool on article 1 and article 3. because article 2 was the planned fail in that no tool was able to locate it, participants who did not realize this and continued to try to use a tool to find it were deemed to have “failed” that particular task. by this measure, one faculty member failed on the second article while using unpaywall. lazy scholar and open access button each had a total of eight fails, with two participants—a faculty member and a student—failing on all three articles, and another student failing on the first two before successfully using the tool on the third article. 
kopernio had a total of ten fails, with two participants failing on all three articles, and two others failing on the first two. all four of these participants were faculty members. in some cases of failures, participants would either try to find instructions for the tools or try clicking around on the screen and then following various links to see if they could successfully use the tools. in other cases, participants gave a cursory search for the tool but stopped after a short period of time. article 2, the planned fail, also caused confusion for participants. for instance, one faculty participant seemed to think that open access button had a technical glitch and looked to the instructions to see if they could troubleshoot it. others never seemed certain if the tool was working incorrectly or if the article just was not available. another issue came with the article version that lazy scholar returned for article 3. unlike the other instances when the tool took users directly to the article file, in this case lazy scholar took participants to the record page for the article in a scholarly repository. participants could then click on a link for the full text, and it took several participants a few tries of clicking around on the page before finding the correct link. student 2 noted “i expected it to pull up the pdf like all the others did.” another student stopped at the record page, not realizing they could click one more link to get the full text, which was considered a fail. themes several themes emerged during usability testing. a major one was the design of the various extensions, encompassing their aesthetics and on-screen behavior. other themes include the usefulness of each tool’s instructions and additional features as well as how participants’ experience with other browser extensions shaped how their expectations of the four tools. design as with most usability studies, certain design choices determined how successful students and faculty were at finding the three test articles and how they felt about the experience. participants gravitated toward simple, clean designs and faltered or expressed displeasure whenever they encountered extension elements that appeared overloaded with information or details. several participants, for instance, thought that lazy scholar’s toolbar was clunky or too cluttered looking, information technology and libraries june 2020 at the click of a button | azadbakht and schultz 8 even when they successfully used it to find the test articles. there were too many options embedded in the toolbar, which caused confusion, and its small font also proved problematic for a majority of the students and faculty. conversely, several participants said that they appreciated unpaywall’s minimalism, and many turned to it first when instructed to use the four tools to find the three articles. “this one is the most obvious one,” faculty 1 stated. many also responded positively to open access button’s neat-looking icon and simplicity. kopernio’s design led to a mix of user experiences. while participants seemed to appreciate its clear-cut, dark green icon, some of its other features—the search box and the storage “locker”—created unnecessary clutter. throughout testing, participants also expressed mixed views of tools that featured an automatic pop-up as a means of indicating that a free version of an article was or was not available. lazy scholar, unpaywall, and kopernio all involve some version of this design choice. 
open access button behaves like other commonly used browser extensions, such as zotero, in that the extension remains inactive until clicked upon. the participants’ stated preferences did not always align with their behavior. some participants did not like that the pop-ups appeared without prompting and that the pop-up tools blocked parts of the computer screen. “i like things that go away,” explained faculty 3. faculty 6 noted that they preferred open access button because it did not load automatically and that it opened in a new, separate window. what’s mo re, those participants who had experience using other browser extensions were not expecting the pop -ups and first tried clicking on the tools’ icons embedded in the browser bar. this happened more often with lazy scholar and kopernio than it did with unpaywall, but all three experienced this. however, several who said that they found pop-ups “annoying” or “distracting” nevertheless were able to successfully use the tools to quickly find free versions of the test articles. this discrepancy was especially evident in the case of unpaywall, which almost everyone used successfully and with apparent ease. a tool’s placement on the screen was likewise one of the key aspects of the tools’ design that made it either easier or more difficult to use during usability testing. unpaywall’s tab sits on the middleright side of the computer screen, whereas kopernio’s green “k” tab appears toward the bottom of the screen. sometimes the icon would disappear entirely after a few seconds, reappearing only after the page had been reloaded. kopernio’s location was especially problematic because most participants are not accustomed to needing to scroll or look to the bottom of a webpage. moreover, needing to scroll is “not convenient,” explained student 3. this appeared related to at least some of the failures that participants had with the tool. kopernio’s design did improve somewhat midway through usability testing. the icon is now highlighted when the webpage first loads and stopped dropping to the bottom of the page. however, some participants still missed the icon on their initial use of kopernio. student 2 said afterward that “unpaywall is definitely easier to use, because its pop-up button stayed up.” lazy scholar’s toolbar also proved a stumbling block for several participants. some did not notice it at first whereas others were not sure where within the toolbar they needed to click to retrieve the article, even though this is indicated by a standard pdf or html icon. the use of color also impacted participants’ success with the tools, particularly unpaywall. unpaywall’s lock icon turns bright green when the tool has found an open version of the article information technology and libraries june 2020 at the click of a button | azadbakht and schultz 9 and grey when it has not. both faculty and students appreciated this simple status indicator. “i recognized the little green button,” said student 4. for users with color vision deficiency, however, this favorite feature could be problematic. users can click on the icon regardless of whether they can differentiate between the icon’s two settings or not, but some convenience is lost. the kopernio icon’s darker green is likewise an issue for those with some forms of color blindness. open access button’s and lazy scholar’s color choices garnered less comment. 
prior experience with browser extensions another aspect of the tools’ design that influenced how participants interacted with a particular tool and how intuitive they ultimately found it to use was their prior experience with other browser extensions. several participants indicated that they used other browser extensions in their everyday lives. specifically, this knowledge appeared to affect their success with open access button, which behaves like most browser extensions do in that it does not launch automatically. faculty 2 said that using open access button “felt the most natural,” and faculty 3 said, “most other browser extensions i’ve used, when you want it to do the thing, you click it.” some participants who had less experience with other browser extensions still managed to use open access button successfully, though it took them slightly longer to do so. however, a few participants failed to use the tool at all during testing, having given up when they could not determine how it worked. instructions participants expressed a desire for simple, straightforward instructions and were more likely to read instructions that seemed succinct and easy-to-follow. they were also more likely to try out a tool just after installing it if the tools’ instructions were clear and if the instructions provided an example they could use to see the tool in action. unpaywall’s instructions do this particularly well, as they consist of minimal text on a large image of how the tool works. open access button and kopernio both provided instructions and examples that helped mitigate some of the issues participants had with them. for example, those who tried out open access button’s example before attempting to find the test articles—or who referred back to the instructions when they encountered a problem—were more likely to use it successfully, even if their prior experience with traditionally designed browser extensions was limited. kopernio’s instructions highlight where the icon appears, which primed the participants to later look towards the bottom of the screen for it. although this did not prevent confusion when using kopernio (as noted previously), it did reduce it. lazy scholar’s instructions, on the other hand, are quite detailed and are written in a very small font. this combination intimidated the participants, many of whom chose to quickly move on to the next task. some scanned the instructions, but none read through them. additional features three of the four tools—lazy scholar, kopernio, and, to a limited extent, open access button— offer additional features, including a way to contact article authors, integration with citation management tools, article metrics, and file storage space. however, many participants did not take note of the tools’ ability to do things other than find open versions of scholarly articles, and their enthusiasm for these options varied. this is likely partly due to the focus of the usability tests on these tools’ core function. more of the students responded positively to the tools’ extra features than did the faculty. information technology and libraries june 2020 at the click of a button | azadbakht and schultz 10 lazy scholar’s and kopernio’s extra features received the most attention. two students responded positively to lazy scholar’s cite option (a mendeley integration) in particular. for student 5, it made lazy scholar stand out. participants also tried out kopernio’s google scholar-powered search box when they had trouble locating and using its pop-up tab. 
a few students indicated that they would consider using this feature again to find related articles. however, those participants who came across mentions of kopernio’s article storage tool, known as a “locker,” either expressed confusion over its purpose—“locker? what locker?” wondered faculty 5—or were simply not interested in learning more about it. others said they did not need storage space of this kind. “i don’t get the metaphor. my hard drive is my locker,” noted faculty 2. favorites when asked which, if any, of the tools they preferred and would consider using, eight of the participants said unpaywall, followed by seven who said open access button (see table 1). four said they liked lazy scholar, and two said they liked kopernio, although two participants said they specifically would not use kopernio, and two said the same of lazy scholar. it is important to note that many of the participants named multiple tools, suggesting that they saw the need to rely on more than just one tool. table 1. breakdown of preference for oa finding tool by faculty and students. participant group open access button unpaywall lazy scholar kopernio faculty 3 4 1 1 students 4 4 3 1 discussion keep it simple the results show that users most preferred simplicity, including the instructions for downloading and using the tools. for example, participants seemed to have the easiest time downloading and trying out unpaywall because of how large and obvious its download button was, as well as how minimal and large their instructions were. in comparison, participants also seemed to like lazy scholar’s large and easy-to-see download button but disliked the long instructions, which were in a smaller font. as most of them did look at the instructions for unpaywall, it is clear they do find instructions helpful, as long as they can be read and understood in just a few seconds. this also seemed to be the reason why some participants struggled to find open access button’s icon. although the site does provide an instructional image similar to unpaywall’s, it is smaller and does not do as good of a job of pointing out the button’s location. some participants took a moment to look at the image but failed to notice what it was trying to highlight. likewise, participants, especially faculty, seemed to prefer the tools with a simple and clean design. the added features of lazy scholar were not worth the space it took up on the page. a few even remarked negatively on the size of the kopernio pop-up tab, saying it blocked too much of the screen. although a few at first remarked negatively on the unpaywall tab, several said that by information technology and libraries june 2020 at the click of a button | azadbakht and schultz 11 the end of the study, the tab no longer bothered them and that its usefulness outweighed its obtrusiveness. do not assume prior experience most participants who figured out how to use open access button seemed to like it; however, several struggled with finding it to begin with. part of this might be because the other tools, all of which use a pop-up tab, might have conditioned them to look for something similar. however, several participants noted that they were not familiar with browser extensions, which likely affected their ability to find the tool in the browser bar. they would try clicking directly on the article homepage screen. providing clear and obvious instructions would likely help ameliorate this issue. 
extra features not always worthwhile overall, participants did not seem interested in the extra features, especially kopernio’s locker and open access button’s option to email the author. and for faculty, the additional features of lazy scholar, including citation information and similar articles, proved to be a negative. however, some students did seem interested in these features, meaning this tool might be better for those who are still new to information discovery. conclusion although participants’ reaction to certain design elements, such as pop-ups and the finding tools’ additional features, were mixed, most of them were able to use the four browser extensions successfully. the tools’ location on the computer screen and their similarity (or dissimilarity) to other browser extensions influenced success rates. likewise, clean, simple design elements and straightforward instructions enhanced participants’ experience with the four tools. even though more of the students and faculty said they preferred unpaywall and open access button, each of the four tools appealed to at least some of the participants. both students and faculty were excited to find out about these tools and some even expressed surprise that they are freely available. many seemed open to the idea of using more than one tool, which can be helpful given each extension’s distinctive approach to finding and retrieving articles. however, having them use four tools at once also appeared to create issues for at least some of the participants as they would confuse which tool was which. librarians and other oa advocates can use the information from this study to help guide potential users to the tools that best suit their individual preferences and comfort level with similar technologies. increased promotion will ramp up adoption of the tools by a more diverse pool of users, which will ultimately generate the feedback needed to make the extensions more intuitive overall. endnotes 1 registry of open repositories, “welcome to the registry of open access repositories registry of open access repositories,” 2019, http://roar.eprints.org/. 2 michaela d. willi hooper, “product review: unpaywall [chrome & firefox browser extension],” journal of librarianship & scholarly communication 5 (january 2017): 1–3, https://doi.org/10.7710/2162-3309.2190. http://roar.eprints.org/ https://doi.org/10.7710/2162-3309.2190 information technology and libraries june 2020 at the click of a button | azadbakht and schultz 12 3 david nicholas et al., “where and how early career researchers find scholarly information,” learned publishing 30, no. 1 (january 1, 2017): 19–29, https://doi.org/10.1002/leap.1087. 4 kerry dhakal, “unpaywall,” journal of the medical library association 107, no. 2 (april 15, 2019): 286–88, https://doi.org/10.5195/jmla.2019.650. 5 eleanor i. cook and joe mcarthur, “what is open access button? an interview with joe mcarthur,” the serials librarian 73, no. 3–4 (november 17, 2017): 208–10, https://doi.org/10.1080/0361526x.2017.1391152. 6 terry ballard, “two new services aim to improve access to scholarly pdfs,” information today 34, no. 9 (november 2017): cover-29; chris bulock, “delivering open,” serials review 43, no. 3–4 (october 2, 2017): 268–70, https://doi.org/10.1080/00987913.2017.1385128; dhakal, “unpaywall”; e. e. gering, “review: unpaywall,” may 24, 2017, https://eegering.wordpress.com/2017/05/24/review-unpaywall/; barbara quint, “must buy? maybe not,” information today 34, no. 
editorial board thoughts: policy before technology — don't outkick the coverage
brady lund
information technology and libraries | march 2022, https://doi.org/10.6017/ital.v41i1.14773
brady lund (blund@g.emporia.edu) is a doctoral candidate and lecturer at emporia state university and a member of the ital editorial board. © 2022. opinions expressed in this column are the author's and do not reflect those of the editorial board as a whole or of core, a division of ala.

in the race to adopt the newest and best, practical considerations for emerging technologies are frequently overlooked. technology can set an organization apart and, in the case of libraries, be instrumental in helping demonstrate value. yet all new technologies carry additional, potentially unpleasant consequences, whether threats to privacy and security, barriers to accessibility, risks to health, learning barriers, or exposure to misinformation. organizations must consider these threats before introducing new technologies, rather than the other way around. to illustrate these threats and their policy implications, i will briefly discuss two popular technologies/innovations—virtual reality and data analytics—and the threats that are often overlooked by organizations and how they may be appropriately addressed by policy. virtual reality (vr) has quickly become a popular technology in all types of libraries and learning organizations. as noted in many recent publications, vr provides an immersive and interactive medium to engage with learning and entertainment content.1 of course, libraries are always seeking new ways to engage patrons with their collections and services, so it is natural that there would be high interest in this technology. however, i have observed that this technology is frequently made available with little foresight or oversight of potential issues. the engaging interface of vr technology also presents risks to certain individuals.
it has been known to trigger seizures among those who are predisposed and can cause severe dizziness and disorientation.2 these risks are severe enough that the institutional review board at my university required that a safety disclaimer be included for any project that utilized vr technology for learning. however, inclusion of a disclaimer is not necessarily common practice in library research and certainly not for non-research projects. further, substantial learning barriers should be acknowledged for virtual reality technology. a learning curve is perhaps a less serious threat, compared to the health and safety risks, but can still lead to non-use or misuse of the technology.3 libraries should want as many patrons as possible to use the technology to enrich their lives. this includes individuals who have limited technology experience. it is important to provide education and policy to ensure the technology is used properly, such that the technology will not be damaged and the user will not quit trying to use the technology due to frustration. specific policies for the use of vr technology could be integrated into existing technology policy (if such a policy already exists) or created as a new policy. either way, it should be highly visible, and patrons should be asked to acknowledge it before use. the policy may include elements such as "the patron must follow all library employees' guidance on how to properly use the vr headset" and "the patron is encouraged to ask any employee for assistance with the headset." while a library may not be able to foresee or enforce perfect policy for all issues that arise from using emerging technologies like vr, these are some commonsense policy items that protect the user, the library, and the technology while it is in use. though a vastly different "technology" in many ways, the evolution of data analytics in modern libraries similarly poses significant threats to library patrons. as opposed to physical threats to well-being, the threats associated with data analytics are mostly related to social, psychological, and economic well-being through privacy and security risks. depending on how data is used, it can be rather innocuous or overtly malicious. it is not always clear when data goes from being innocuous to being a threat.4 collecting patron addresses can seem like a necessary and acceptable practice in order to issue a library card. libraries could use this data, though, in conjunction with census and other government data, to identify demographics of library users, such as the ethnicity of patrons. this could be helpful in knowing, for instance, that a library has a large hispanic patron base and thus may want to invest in spanish-language resources, but it also involves using data that patrons were forced to supply in order to profile them and make inferences about what materials they would like. understandably, many patrons would likely rather not have private data about them collected and analyzed, even if it could significantly improve services. there is certain data—like the addresses mentioned above—that libraries must collect in order to provide services. this cannot be avoided. rather, what should be done is to have a policy that clearly (without much legal jargon) outlines what data is collected and for what purposes it is used. everyone knows that no one reads lengthy legal disclaimers.
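to make the idea of a short, plain-language data policy concrete, here is a minimal sketch, not drawn from any particular library's practice, of how a library might keep a simple inventory of what patron data it collects, why, and for how long, and render it as the kind of brief notice described above. all field names and example entries are hypothetical.

```python
# a hypothetical sketch of a patron-data inventory that could sit behind a
# library's written data policy and be rendered as a short notice for patrons.

from dataclasses import dataclass

@dataclass
class DataItem:
    name: str         # what is collected
    purpose: str      # why it is collected
    retention: str    # how long it is kept
    shared_with: str  # who else sees it, if anyone
    optional: bool    # can the patron decline and still receive service?

INVENTORY = [
    DataItem("home address", "issuing a library card and verifying residency",
             "while the card is active", "no one", optional=False),
    DataItem("checkout history", "holds, renewals, and reading suggestions",
             "deleted 30 days after return", "no one", optional=True),
]

def plain_language_notice(items):
    """render the inventory as a notice a patron can read in a minute."""
    lines = []
    for item in items:
        choice = "you may decline" if item.optional else "required for service"
        lines.append(f"- we collect your {item.name} for {item.purpose}; "
                     f"kept {item.retention}; shared with {item.shared_with}; {choice}.")
    return "\n".join(lines)

if __name__ == "__main__":
    print(plain_language_notice(INVENTORY))
```

keeping the inventory in one structured place makes the policy auditable by staff, while the rendered notice stays short enough that patrons might actually read it.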
while it may be seen as above-board in the eyes of the law, slipping policies that the library knows most patrons would question into a disclaimer is unethical. any questionable policies or procedures should be made clear to patrons, so that they can make an informed decision on whether to opt out of those services. it is great to have extensive data to improve services, but it should not be collected without real consent. no librarian should go home at the end of the day with any question about whether they used proper data collection procedures. additionally, there are always risks with the storage and maintenance of data. how is the data being stored? what security measures have been taken? these questions, along with the concerns in the prior paragraphs, are items that would all have to be addressed in an ethical review application for human subjects research at a university, but they can be (and often are) overlooked when it comes to library services and assessment. this may be particularly true at public libraries, which are not connected to an institution of higher education (which provides some ethical oversight). it is always better to start with a policy than to make one up as one goes along, even if it is necessary to adjust the policy over time as new risks and considerations emerge. for those who are creating a new policy from scratch, one of the best sources of information and inspiration can be the existing policies of other, similar organizations.5 for example, a large public library may look to the data policies of a similarly situated large public library for inspiration. i encourage additional work by researchers within the field of library technology to strengthen evidence-based practice within the area of technology policy formation. it is important to be careful with the design of policy and not to come at it without first doing your homework. yet, at the same time, it is important to consider the unique context of your own institution. what is a successful policy for one library may not be so for another—you must know your service population and your specific space, technology infrastructure, and management capacities. something like the administrative structure of a library system can significantly impact the success of policy implementation. policy, understandably, can be seen as a boring—if necessary—part of the proper functioning of a library and its technology. this can lead to policy being something that is created either in haste or after considerable procrastination, or something that becomes the subject of unnecessary, prolonged debate among library administration. in most cases, appropriate policy can, in fact, be quite straightforward, if libraries rely upon existing policy examples, an understanding of the technology in question, and a thorough assessment of their library environment to guide the policy-drafting process. technology policy can be a boring subject, but its necessity cannot be overstated for reducing liability and threats to the well-being of patrons, library employees, and property. it is important to have technology policy in place before the technology is made available to the public so that patrons can make informed decisions about whether to use the technology and/or agree to share data.

endnotes

1 matt cook et al., "challenges and strategies for educational virtual reality," information technology and libraries 38, no.
4 (2019): 25–48, https://doi.org/10.6017/ital.v38i4.11075; kenneth j. varnum, beyond reality: augmented, virtual, and mixed reality in the library (chicago, il: american library association, 2019).
2 james s. spiegel, "the ethics of virtual reality technology: social hazards and public policy recommendations," science and engineering ethics 24, no. 5 (2018): 1537–50, https://doi.org/10.1007/s11948-017-9979-y.
3 amy restorick roberts et al., "older adults' experiences with audiovisual virtual reality: perceived usefulness and other factors influencing technology acceptance," clinical gerontologist 42, no. 1 (2019): 27–33, https://doi.org/10.1080/07317115.2018.1442380.
4 yong jin park, "personal data concern, behavioral puzzle and uncertainty in the age of digital surveillance," telematics and informatics 66 (2022): article 101748, https://doi.org/10.1016/j.tele.2021.101748.
5 lili luo, "experiencing evidence-based library and information practice: academic librarians' perspective," college and research libraries 79, no. 4 (2018): 554–67, https://doi.org/10.5860/crl.79.4.554.

lita president's message: facing what's next, together
emily morton-owens
information technology and libraries | june 2020, https://doi.org/10.6017/ital.v39i2.12383
emily morton-owens (egmowens.lita@gmail.com) is lita president 2019–20 and the acting associate university librarian for library technology services at the university of pennsylvania libraries.

when i wrote my march editorial, i was optimistically picturing some of the changes that we are now seeing for lita—while being scarcely able to imagine how the world and our profession would need to adapt quickly to the impacts on library services as a result of covid-19. it is a momentous and exciting change for us to turn the page on lita and become core, yet this suddenly pales in comparison to the challenges we face as professionals and community members. libraries' rapid operational changes show how important the ingenuity and dedication of technology staff are to our libraries. since states began to shut down, our listserv, lita-l, has hosted discussions on topics like how to provide person-to-person reference and computer assistance remotely, how to make computer labs safe for re-occupancy, how to create virtual reading lists to share with patrons, and how to support students with limited internet access. there has been an explosion in practical problem-solving (ils experts reconfiguring our systems with new user account settings and due dates), ingenuity (repurposing 3d printers and conservation materials to make masks), and advocacy (for controlled digital lending). sometimes the expense of library technologies feels heavy, but these tools have the ability to scale services in crucial ways—making them available to more people at the same time, available to people who can only take advantage of them after hours, available across distances. technologists are focused on risk, resilience, and sustainability, which makes us adaptable when the ground rules change. our websites communicate about our new service models and community resources; ill systems regenerate around increased digital delivery; reservation systems for laptops now allocate the use of study seating.
our library technology tools bridge past practices, what we can do now, and what we'll do next. one of our values as ala members is sustainability. (we even chose this as the theme for lita's 2020 team of emerging leaders.) sustainability isn't about predicting the future and making firm plans for it; it's about planning for an uncertain future, getting into a resilient mindset, and including the community in decision-making. although the current crisis isn't climate-related per se, this way of thinking is relevant to helping libraries serve their communities. we will need this agile mindset as we confront new financial realities. our libraries and ala itself are facing difficult budget challenges, layoffs, reorganizations, and fundamental conversations about the vital importance of the services we provide. my favorite example from my own library of a covid-19 response is one where management, technical services, and it innovated together. our leadership negotiated an opportunity for us to gain access to digitized, copyrighted material from hathitrust that corresponds to print materials currently locked away in our library building. thanks to decades of careful effort by our technical services team, we had accurate data to match our print records with records for the digital versions. our it team had processes for loading the new links into our catalog almost instantaneously. the result was a swift and massive bolstering of our digital access precisely when our users needed it most. this collaboration perfectly illustrates how natural our merger with alcts and llama is. as threats to our profession and the ways we've done things in the past gather around us, i am heartened by the strengths and opportunities of core. it is energizing to be surrounded by the talent of our three organizations working together. i hope more of our members experience that over the summer and fall, as we convene working groups and hold events together, including a unique social hour at ala virtual and an online fall forum. i close out my year serving as the penultimate lita president in a world with more sadness and uncertainty than we could have foreseen. we are facing new expectations and new pressures, especially financial ones. as professionals and community members, we are animated by our sense of purpose. while lita has been transformed by our vote to continue as core, the support and inspiration we provide each other in our association will carry on.

the current state and challenges in democratizing small museums' collections online
avgoustinos avgousti and georgios papaioannou
information technology and libraries | march 2023, https://doi.org/10.6017/ital.v42i1.14099
avgoustinos avgousti (a.avgousti@cyi.ac.cy) is a researcher, the cyprus institute, cyprus. georgios papaioannou (gpapaioa@ionio.gr) is associate professor in museum studies and director of the museology research laboratory, ionian university, corfu, greece. © 2023.

abstract

this article focuses on the problematic democratization of small museum collections online in cyprus. while the web has enabled cultural heritage organizations to democratize information to diverse audiences, numerous small museums do not enjoy the fruits of this digital revolution; many of them cannot democratize their collections online.
the current literature provides insight into small and large museums' challenges worldwide. however, we do not have any knowledge concerning small cypriot museums. this article aims to fill this gap by raising the following research question: what is the current state of small museum collections online in cyprus, and what challenges do they face in democratizing their collections online? we present our empirical results from the interview summaries gathered from six small museums.

introduction

cultural heritage digitization and online accessibility offer an unprecedented opportunity to democratize museum collections. online collections, typically presented on institutional websites, represent the world's culture and reflect an increasing trend toward a world where information is digitally preserved, stored, accessed, and disseminated instantaneously through a global and interconnected digital network. consumers' habit of searching for information on the web has enabled cultural heritage institutions to democratize their collections online, yet most small museums have not benefited from this process and do not have their collections online. as a result of this problem, digital versions of small museum collections are largely inaccessible, meaning less access to information and knowledge. there is a clear need for small museums to remain relevant by publishing their collections online. small museums must move quickly into the digital world. current literature provides insights into the challenges they face worldwide. however, we do not have knowledge regarding the situation in cyprus. this study aims to fill this gap by researching small museums in cyprus and asking the following research question: what is the current state of small museum collections online in cyprus, and what challenges do they face in democratizing their collections online?

what is a small museum?

museums are defined as small based on their annual budget and number of staff. the american association for state and local history (aaslh) defines museums as small if they have an annual budget of less than $250,000 and limited staff with multiple responsibilities. other factors such as the size of collections and the physical size of the museum could further categorize a museum as small. katz set the same budget and set the staff number at five or fewer.1 honeysett and falkowski put the budget at $300,000 and five or fewer employees.2 miller notes that the average small museum has just two full-time employees and a budget of less than $90,000.3 watson, by contrast, defines small museums as ones that grew out of the community they serve.4 for the purposes of this article, a small museum is one with more than one but fewer than five full-time employees, not including museum custodians. categorizing a museum based on its budget is difficult and contentious since museum staff are often funded by another body, such as a municipality.

literature review

cultural heritage institutions such as galleries, libraries, archives, and museums (glams) were among the first organizations to digitize information by creating databases whose access was granted locally to institutional cardholders (horan, 2013).
the process of digitization is of paramount importance, with museums eager to offer online access to their physical collections.5 online collections provide a range of opportunities, including the facilitation of knowledge sharing and the creation of a participatory environment that promotes information exchange.6 through their online presence, museums can present their collections to a global audience.7 the accessibility of digital knowledge opens the door for further knowledge to be generated and enhances the educational reach of cultural institutions.8 online collections create opportunities for small and geographically isolated museums to deliver learning opportunities to audiences around the world, something all museums should aim for.9 while larger museums have done well, smaller ones have not been as successful.10 much of past knowledge is stored in small museums, whose importance in preserving cultural heritage should not be underestimated.11 they sometimes add far more to social capital than larger national ones.12 though the need for museum collections online is recognized, there are limitations. if it were simple, every museum would be online.13 however, most small museums are not online.14 their collections remain digitally inaccessible to future generations.15 oberoi and arnold have gone so far as to maintain that information absent from the internet can be regarded as nonexistent.16 on the other hand, in the rare cases where small museums do have their collections online, they target human consumers.17 the information is stored in isolated data silos incompatible with automatic processing. the challenge is to make collections discoverable via online search engines and metadata aggregators.18 the issue has been ongoing for many years: as gergatsoulis and lilis noted 18 years ago, the web lacks semantic information, and it has proved challenging to process such a massive set of interconnected data.19 clearly, online collections must be understood and used efficiently both by humans and machines, because machine-consumable content ultimately ends up as human-consumable content.20

the current state and challenges

small museums find it difficult to publish their collections online. most large museums have undergone a digital transformation, but few small ones have.21 the museum survey by tongue in 2017 showed that the number of museums planning to publish their collections online decreased from 40 percent in 2016 to 24 percent in 2017, although only 8 percent had already gone online by 2018.22 the survey in 2020 by the network of european museum organisations (nemo) on digitization in european museums shows that an average of 20 percent of museum collections in europe as a whole are online, and the median is 10 percent.
surprisingly, 43 percent are digitized but not online, meaning the public has access to less than half of the existing digital items.23 a report by flynn in 2018 reveals that most historical society collections are not accessible online.24 according to honeysett and falkowski, the majority of museums in their survey in the us have less than 10 percent of their collections online.25 according to a survey by axiell in 2017, only 21 percent of museums have a complete collection online, 27 percent have more than half, 38 percent have less than half, and 14 percent have no collections online.26 in 2020 beaudoin pointed out that approximately 32 percent of us art museums with holdings provide online collection systems that are openly available to the public, while 13 percent do not even have an institutional website.27 avgousti, papaioannou, and gouveia indicated that even if small museums manage to give online access to their collections, those collections are often stored in isolated data silos incompatible with automatic processing.28 the museum survey by vernon systems in 2016 showed that 82 percent of museums do not use any machine-consumable standards.29 furthermore, only 11.9 percent use dublin core as a metadata standard, 3.6 percent darwin core, 1.2 percent ead, and 8.3 percent other standards. further, the existence of individual collections online, maintained by different organizations, brings challenges to the discoverability, sharing, and reuse of resources.30 metadata aggregation is a frequently utilized strategy in which centralized organizations, such as europeana,31 collect associated metadata to make resources more discoverable and usable. why do we witness such low levels of online publishing in small museums? and why are online collections not in a format that is searchable and easy to find? according to the relevant literature, small museums lack the resources and skilled staff to move into the digital age.

current obstacles

a key obstacle in the digitization of small museum collections is insufficient resources. large cultural heritage institutions have much greater access to funds.32 according to klimper, while the internet has had a tremendous impact on the democratization of european culture, insufficient financial resources remain a significant challenge for small museums.33 irina oberlander from the institute of cultural memory has pointed out that small and medium-sized museums with limited budgets are victims of the digital age.34 laine-zamojska stressed that small museums, which are often entirely run by volunteers, cannot afford to digitize or make their collections available to a wider audience.35 therefore, online access to cultural heritage in these small institutions is minimal. the nemo report in 2020 showed that insufficient staff is another major obstacle for museum digitization and online accessibility.36 small museums are understaffed.37 this is confirmed by gallery systems, who noted that small museums face their own set of collection challenges.38 with smaller team sizes and limited staff hours, it is difficult to operate. the museum survey by tongue in 2017 showed that 73 percent of museums did not have dedicated staff to manage online collections.39 this means that collection management is given to staff who already have a full job description.
avgousti, papaioannou, and gouveia pointed out that museums do not usually hire experts to plan, develop, deploy, and maintain a digital collection, but delegate the task to museum staff who are often limited in technological skills,40 while wigodner and kearney mentioned that small museums typically have fewer (if any) employees devoted to web publishing.41 fewer employees often means a lack of skilled personnel. in the aforementioned survey, no museum with fewer than 50 staff members reported employing a computer expert.42 additionally, honeysett and falkowski mention that two-thirds of museums have one or no it personnel.43 in addition, the same concern has been observed in small university libraries, 67 percent of which did not have an it expert.44 further, small museums do not have suitable technology, and in many cases the staff is not technologically adept.45 additionally, klimper affirms that the internet's promise of providing access to european culture is hampered by a lack of technological skills.46 considerable expertise in semantic web technologies is needed to expose machine-consumable content to the "web of data."47 finally, in-depth knowledge of modeling, along with programming skills, is also essential.

complexity of technology and metadata issues

the nemo report in 2020 showed that less than 20 percent of museum collections are online.48 as already mentioned, this may be attributed to the prerequisites of online collections, which include complex technology and the need for online platforms. additionally, avgousti, papaioannou, and gouveia pointed out that small museums do not have suitable technologies.49 within the discussion of the semantic web (also known as web 3.0: extensions of the world wide web that make internet data machine-readable by applying standards), corlosquet stated that one of the significant challenges is getting semantic web data annotations into end-user applications. if this is achieved, there will be faster adoption of the web of data. moreover, while content management systems (cms) significantly aid the production of online content by end users, the problem of allowing the user to produce semantic web content remains elusive.50 further, velios discusses the problem of understanding semantic web concepts in relation to complex setups.51 such setups may be bewildering for humanities scholars without a technical background. he mentions that the semantic web does not offer the necessary tools to accommodate data easily. vavliakis, karagiannis, and mitkas postulate that even for mainstream use of the semantic web in the cultural heritage community, easily operated tools are required.52 cultural heritage institutions are encouraged to start processing and publishing content with semantic technologies. still, the tools that can undertake such a considerable task continue to lack user-friendly features.
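to make the idea of machine-consumable collection records more concrete, the following sketch shows one way a collection tool could publish the same object record for both people and machines, here by generating schema.org json-ld that can be embedded in the object's web page. the object, field values, and helper function are hypothetical and are not drawn from any museum in this study; the schema.org property names should be checked against the current schema.org documentation before use.

```python
# a hypothetical sketch: turning a simple museum object record into schema.org
# json-ld so the same content is readable by people and by search engines
# or metadata aggregators. all values below are placeholders.

import json

record = {
    "title": "terracotta votive figurine",
    "creator": "unknown",
    "date_created": "c. 600 BCE",
    "material": "terracotta",
    "identifier": "OBJ-0042",
    "image_url": "https://example-museum.org/objects/OBJ-0042.jpg",
    "page_url": "https://example-museum.org/objects/OBJ-0042",
}

def to_jsonld(rec):
    """map the flat record onto schema.org terms used by search engines."""
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": rec["title"],
        "creator": rec["creator"],
        "dateCreated": rec["date_created"],
        "material": rec["material"],
        "identifier": rec["identifier"],
        "image": rec["image_url"],
        "url": rec["page_url"],
    }

# the resulting <script> block can be embedded in the object's html page
snippet = ('<script type="application/ld+json">\n'
           + json.dumps(to_jsonld(record), indent=2)
           + "\n</script>")
print(snippet)
```

a nontechnical staff member would never need to see this output; the point is that a tool could generate it automatically from the same fields a small museum already records, which is the low-effort, machine-consumable publishing the literature above calls for.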
daradimos, vassilakis, and katifori claim that small museums use content management systems to publish their collections online.53 however, using a general-purpose cms (e.g., drupal) comes with great difficulty, primarily due to the lack of technical information such as dublin core fields, as nontechnical staff cannot be expected to know how to install and configure appropriate modules within drupal to enable the entry and publication of this metadata.54 moreover, there has been little development within current cmss of user-friendly tools targeting the implementation of semantic markup annotations. the integration of cmss with semantic web technologies would increase cultural heritage knowledge dissemination remarkably. further, the absence of robust and easily usable tools is considered a central challenge that continues to pose obstacles to the rapid adoption of the semantic web and linked data.55 antoniou and van harmelen explain that the semantic web's adoption relies on developing new and straightforward tools.56 the semantic web is also based on the adoption of existing technology rather than on new scientific solutions. modern and easy-to-use tools will facilitate the semantic web's adoption beyond what is available at present. however, only a small number of institutions use semantic technologies. tim berners-lee, the brains behind the semantic web, points out that the machine-readable web is always farther off than the human-readable web.57 in the case of large and well-funded organizations or museums like the bbc or the british museum, it is possible to work with semantic web technologies. on the other hand, small museums will have difficulties with the semantic web's smooth implementation.58 it is pivotal to emphasize that the challenges related to the implementation of machine-consumable content by museums depend heavily on adopting existing technology rather than on new scientific solutions. as antoniou and van harmelen have underlined, the most significant needs are for easily accessible tools that reach nontechnical communities. such technological progress will lead to a more advanced semantic web than what can be achieved today.59

methodology

data collection methods

interviews are regularly used in qualitative research for data collection.60 structured interviews lead to more specific answers, usually in a controlled environment. in unstructured interviews, there are no set-in-advance questions, and the interview can be very broad, open, and exploratory. semistructured interviews fall in the middle, as they allow both a few specific questions to be addressed and space for extra information by deviating from the set questions. this is the main reason why they are one of the most popular and widely used methods of data collection.61 the interview type selected depends on the questions to be asked and the research method. the current research aims to develop a comprehensive understanding of a problem. therefore, semistructured interviews were the ideal tool, and an interview guide containing open-ended questions was developed.

selection of the sample

the researcher selected museums based on nonrandom criteria. nonprobability sampling techniques are often suitable for qualitative research.
nonprobability sampling’s aim is not to test a hypothesis about a large population but to establish an initial understanding of a small community or a population under research. the current research targets small museums in cyprus. therefore, a nonprobability sampling method was used to select small museums. the small museums contacted were not always responsive. however, we managed to conduct interviews with six small museums in cyprus using the snowball sampling method, where the researcher asked the interviewee to refer other people for conducting future interviews. sample size in the current study, the sample population is homogeneous, meaning the population is related to small museums in cyprus. when the population is homogenous the sample size should be at least 4 to 12 cases. in cases of heterogeneous samples, for example in small museums from around the world, the sample size must be at least 12 to 30 cases. in more complex cases such as ethnographic or grounded theory, the sample size must be larger. information technology and libraries march 2023 the current state and challenges in democratizing small museums’ collections online 6 avgousti and papaioannou in our case, we started with two cases and continued until data saturation was achieved, the point in the research when no new information is discovered in data analysis.62 cyprus has 34 museums of which 10 are small (for this survey, defined has having one full-time and fewer than five total employees). we interviewed six of the 10 small museums and reached data saturation after interviewing the first four. conducting the interviews in preparation for the interviews, we contacted the interviewees by phone and email, informing them about the interview. information on the size of the staff was gathered by contacting the museum. while ten museums met the definition of small, only six agreed to participate in the research. first, a pilot test was conducted on two interviewees to identify any problems with the interview guide. based on this pilot test, we made changes and corrected mistakes. due to the covid-19 pandemic, interviews were conducted via internet-based technologies, mostly zoom, a video telephony software program, chosen because of its ability to record video. the interview length was about 20–25 minutes, and all participants had the option to choose greek or english as the interview language. due to the pandemic and logistic challenges, it took about six months to identify subjects and conduct the interviews. results this section discusses the empirical results extracted from the interview summaries. interviews were conducted in greek (both authors are native speakers of greek) and translated to english by the authors. under the major headings of our research subject, we present our findings concerning our research question. the current state of collections online our results indicate that most small museums in cyprus do not have an online presence. two of the six museums (4 and 5) do not have a website. the ones that do have websites created but not updated or supported for more than 15 years, and which therefore need replacement. here are two representative comments: “the museum has an old and simple website” (respondent 1); “[we have] a very old website that needs to be changed soon” (respondent 3). the two museums that do not have a website, use/have used social media: “the museum uses facebook and instagram” (respondent 4); “[we] used to have a facebook page” (respondent 5). 
we discovered that five of the six museums do not have their collections online: "the museum does not have any of its collections online" (respondent 1); "no online collections" (respondent 4); "we do not have any collections online" (respondent 5). further, we learned that none of the museums use machine-consumable standards to achieve wider interoperability on the web: "the online collections are only in a human-readable format" (respondent 2); "we do not use any machine-readable [standards]" (respondent 3). however, museums understand the need for and benefits of such solutions: "our goal is to have the online collection understandable by machines and share metadata online" (respondent 2). we noticed that all museums are willing to give online access to their collections, complete or partial, and agree that the primary goal is to disseminate information: "the primary goal is to give access to museum collections for general use" (respondent 1); "to put it another way, to communicate information, knowledge to scholars and the general public" (respondent 2); "to reach as many people as possible and spread those collections online to a variety of audiences" (respondent 3); "the main reason that online collections exist is that it is the tool to reach more people and disseminate those collections online to diverse audiences, researchers, and the public alike, in other words to disseminate knowledge" (respondent 4); "to disseminate knowledge and information to more people such as students and researchers and the general public" (respondent 5). museums also view online collections as a marketing tool that can bring more people to the museum's physical space: "[online collections] can work as a marketing tool, people that can view our collections online may visit the museum physical space" (respondent 4); "the main goal is to be found" (respondent 5); "tourists coming to cyprus can use the system and find out about our collections and the museum" (respondent 6). clearly, museums are eager to give online access to their collections. the goal is to disseminate information and attract more people to their physical premises. when asked about the goals of publishing machine-consumable content online, findability was most significant: "nowadays, people are using search engines to find the information they are looking for. and since the information is not in a machine-readable format and understandable by search engines, it creates difficulties to be located online" (respondent 1); "[the goal is] to make the collections more findable" (respondent 2); "… to be easily findable by search engines on the internet" (respondent 3); "[to] increase wider findability of the collections over the web" (respondent 6). additionally, we discovered that some museums are not aware of the existence of machine-readable formats: "i am not aware of machine-readable data" (respondent 4); "the museum is not aware of any machine-readable standards for wider web interoperability" (respondent 5). it is evident that findability is the main goal for online content. but it is also clear that some museums are not aware of the existence of machine-readable standards and such technologies.
the current challenges of collections online

insufficient resources and the cost of existing solutions

our study shows that museums' insufficient resources and the cost of existing solutions are the main obstacles to having their collections online. representative comments include: "lack of money" (respondent 1); "we got offers from different companies; however, the costs of existing solutions were well above our budget and possibilities" (respondent 2); "the main obstacle related to giving online access to the museum collections is the cost … outsourcing this kind of work costs a lot of money that the museum does not have" (respondent 4); "of course is the cost" (respondent 5).

insufficient staff (time) and skilled staff (know-how)

according to our findings, staff limitations are another obstacle small museums face in providing online access to their collections: "the existing staff has so many other responsibilities mostly related to research and museum daily functions" (respondent 1); "populating all the material to a new system requires a lot of time and staff that the museum does not have" (respondent 2); "the museum's limited staff" (respondent 4); "the limited staff of the museums is a problem" (respondent 6). further, interviewees shared that the lack of know-how is another obstacle to digitizing museum collections and making them accessible online: "we do not have the technical knowledge. of all of the staff members, no one has technical knowledge … this means that we must hire a person that has this kind of knowledge" (respondent 1); "we do not have a dedicated staff to work specifically for this function" (respondent 6).

complexity of technology (existing systems)

according to our research, the existing technological complexity is another major problem: "the lack of easy-to-use tools that we can use at the museum [is a problem]" (respondent 1); "creating a content model selecting all necessary fields is a very complex and time-consuming process" (respondent 2); "we need tools that are user-friendly, easy to use with nontechnical complexity without requiring a too specialized technical know-how" (respondent 3); "the technological complexity that is involved" (respondent 5); "hosting your own online collections due to the maintenance and technical knowledge is another issue that small museums are facing" (respondent 6).

insufficient infrastructure

our research revealed that the lack of technological infrastructure was an obstacle: "the lack of infrastructures … we cannot work with this kind of old infrastructure … we cannot work with a computer that is 20 years old, this is impossible … [we have] only one old computer that is connected to the internet" (respondent 1); "primary challenges related to technological infrastructure" (respondent 3); "the existing infrastructure of the museum, we have old computers" (respondent 4); "hosting your own online collections due to the maintenance and technical knowledge is another issue that small museums are facing in cyprus. this is why we use external platforms" (respondent 6).
not machine consumable

the complexity of technology was highlighted as the biggest challenge in publishing collections online in machine-consumable formats: "easy-to-use solutions" (respondent 1); "selection of the appropriate technology, there are so many standards for machine-readable data making the selection process extremely hard" (respondent 2); "the complexity of technology is the main obstacle" (respondent 3); "if the system we use can automatically create machine-consumable content this will help" (respondent 4); "the platform that publishes the collection [in] human-consumable content can at the same time publish in machine-understandable content will solve the problem" (respondent 6). for some, machine consumption is not a priority: "it is not a first priority of the museum" (respondent 5); "the museum is not familiar with machine publishing" (respondent 6). the complexity of technology and the lack of easy-to-use tools are among the biggest obstacles to publishing machine-consumable content.

discussion and conclusions

existing online collections and/or museum resources should be researched further, as they may not be completely digitized and accessible to different audiences online. with one-third of small museums in cyprus providing access to their collections online, there are many opportunities to help small museums give access to their collections and so benefit the democratization of information and knowledge. we discovered that the lack of resources and infrastructure are two significant challenges small museums in cyprus face in providing online access to their collections. our results show that no museum partners with national institutions, such as universities or academic research centers. we assert that such collaboration can reduce costs and eliminate the need for infrastructure. at the same time, institutions such as universities usually have the technological know-how and can provide museums with new tools and free and open-source systems that focus on cypriot small museum needs. such tools, which can be found in our research, will help museums drastically reduce the cost involved in buying such systems. moreover, we found that the lack of staff (time) is another challenge that prevents museums from having their collections online. we believe that developing new tools that can accelerate the process of generating, administering, maintaining, and uploading museum collections online will reduce demands on staff time. our research also uncovered that small museums in cyprus do not work with volunteers, as they have no time and resources to find and then train volunteers for museum work; we suggest museums consider this option in view of the lack of staff (time). additionally, we learned that museums lack specialized staff (know-how), another significant challenge that blocks museums from democratizing their online collections. we anticipate that developing technology that requires less technical expertise will benefit small museums that do not have specialized staff (e.g., developers and information technology specialists). support from external bodies such as universities may also help. on the other hand, there are platforms available that do not need specialized technical knowledge. however, we discovered that the complexity of existing technology impedes putting museum collections online.
we hope that creating less complex technology will enable museums to use and publish their collections online in human- and machine-consumable formats. further, training of existing staff in new technologies is needed. to sum up, small museums in cyprus and around the world need to invest in democratizing their collections online by digitizing, describing, and making their objects and collections available online. simple and turnkey solutions for publishing and describing digitized objects are required. there is a will; we continue researching to find the most suitable case-oriented and affordable ways.

acknowledgments

many thanks to all the interviewees from small museums in cyprus that opened their doors to our research. for ethical considerations, we keep institutions and interviewees anonymous.

endnotes

1 paul katz, "the quandaries of the small museum," the journal of museum education 20, no. 1 (1995): 15–17, https://www.jstor.org/stable/40479486.
2 nik honeysett and julia falkowski, museum technology landscape 2018: discovery and findings (lyrasis, 2018), https://www.lyrasis.org/leadership/documents/lyrasis-museum-tech-landscape-report-2018.pdf.
3 eric miller, uche ogbuji, victoria mueller, and kathy macdougall, bibliographic framework as a web of data: linked data model and supporting services (washington, dc: library of congress, 2012), 42.
4 sheila watson, ed., museums and their communities (new york: routledge, 2007).
5 guido cimadomo, "documentation and dissemination of cultural heritage: current solutions and considerations about its digital implementation," in 2013 digital heritage international congress (digitalheritage) (ieee, 2013), 555–62, https://doi.org/10.1109/digitalheritage.2013.6743796; s. sylaiou, f. liarokapis, p. patias, and o. georgoula, "virtual museums: first results of a survey on methods and tools" (paper presented at cipa 2005 xx symposium, 26 september–01 october 2005, torino, italy); rachel regelein, "a digital collections plan for the southwest seattle historical society" (unpublished master's project, university of washington, 2019), https://www.washington.edu/museology/2019/11/13/a-digital-collections-plan-for-the-southwest-seattle-historical-society/; ion gil fuentetaja and maria economou, "studying the type of online access provided to museum collections" (2008), https://www.semanticscholar.org/paper/studying-the-type-of-online-access-provided-to-fuentetaja-economou/b44415e02b5fca204d79b481d325b66482461f41.
6 regelein, "a digital collections plan"; bernadette flynn, "making collections accessible" (federation of australian historical societies inc., january 2018), https://www.history.org.au/wp-content/uploads/2018/10/makingcollectionsaccessible.pdf; karol j. borowiecki and trilce navarrete, "digitization of heritage collections as indicator of innovation," economics of innovation and new technology 26, no. 3 (2017): 227–46, https://doi.org/10.1080/10438599.2016.1164488; morgan schlesinger, "the museum wiki: a model for online collections in museums" (master's project/capstone, university of san francisco, 2016), https://repository.usfca.edu/capstone/456; genevieve horan, "digital heritage: digitization of museum and archival collections" (research paper, master of public administration, political science department, southern illinois university, 2013), https://opensiuc.lib.siu.edu/gs_rp/374.
7 ilse harms and werner schweibenz, "evaluating the usability of a museum web site" (2001), https://www.museumsandtheweb.com/mw2001/papers/schweibenz/schweibenz.html.
8 steen hvass, preface to the museum's web users: a user survey of museum websites (heritage agency of denmark, 2010), https://slks.dk/fileadmin/publikationer/kulturarv/the_museum_s_web_users_2010.pdf; monica bercigli, "dissemination strategies for cultural heritage: the case of the tomb of zechariah in jerusalem, israel," heritage 2, no. 1 (march 2019): 306–14, https://doi.org/10.3390/heritage2010020; elena villaespesa and trilce navarrete, "museum collections on wikipedia: opening up to open data initiatives" (paper presented at mw19, the 23rd museweb conference, boston, ma, april 2–6, 2019), https://mw19.mwconf.org/paper/museum-collections-on-wikipedia-opening-up-to-open-data-initiatives/; gerald wayne clough, "democratization of knowledge through digitization in libraries, museums, and archives" (2020), https://smartech.gatech.edu/handle/1853/62423; gerald wayne clough, "democratization of knowledge through digitization in libraries & archives" (2020), video, https://smartech.gatech.edu/bitstream/handle/1853/62423/clough.mp4?sequence=5&isallowed=y; eva richani, georgios papaioannou, and christina banou, "emerging opportunities: the internet, marketing and museums," in 20th international conference on circuits, systems, communications and computers (cscc 2016) 76, https://doi.org/10.1051/matecconf/20167602044.
9 lynsey martenstyn, "digital archives: making museum collections available to everyone," culture professionals network, the guardian, may 3, 2013, https://www.theguardian.com/culture-professionals-network/culture-professionals-blog/2013/may/03/museum-archives-digital-online; shyam oberoi and kristen arnold, "new architectures for online collections and digitization" (paper presented at mw2015: museums and the web, chicago, il, april 8–11, 2015), https://mw2015.museumsandtheweb.com/paper/new-architectures-for-online-collections-and-digitization/.
10 barbara lejeune, “the effects of online catalogues in london and other museums: a study of an alternative way of access,” papers from the institute of archaeology 18, no. s1 (2007): 79–97, https://doi.org/10.5334/pia.289. 11 chryssoula bekiari, leda charami, martin doerr, christos georgis, and athina kritsotaki, “documenting cultural heritage in small museums” (paper presented in 2008 annual conference of cidoc), https://cidoc.mini.icom.museum/wpcontent/uploads/sites/6/2018/12/25_papers.pdf; rolf däßler and ulf preuß, “digital preservation of cultural heritage for small institutions,” in digital cultural heritage, ed. horst kremers (springer international publishing, 2020), https://www.springerprofessional.de/en/digital-preservation-of-cultural-heritage-for-smallinstitutions/16842836. 12 penelope kelly, “managing digitization projects in a small museum” (master’s project, arts and administration program, university of oregon, march 2005), https://scholarsbank.uoregon.edu/xmlui/handle/1794/937. 13 kate taylor, “going digital not easy for cultural institutions,” the globe and mail, april 18, 2020, https://www-theglobeandmail-com.cdn.ampproject.org. 14 regelein, “a digital collections plan”; flynn, “making collections accessible”; susan wigodner and caitlin kearney, “who reviewed this?! a survey on museum web publishing in 2018 https://doi.org/10.3390/heritage2010020 https://mw19.mwconf.org/paper/museum-collections-on-wikipedia-opening-up-to-open-data-initiatives/ https://mw19.mwconf.org/paper/museum-collections-on-wikipedia-opening-up-to-open-data-initiatives/ https://smartech.gatech.edu/handle/1853/62423 https://smartech.gatech.edu/bitstream/handle/1853/62423/clough.mp4?sequence=5&isallowed=y https://smartech.gatech.edu/bitstream/handle/1853/62423/clough.mp4?sequence=5&isallowed=y https://doi.org/10.1051/matecconf/20167602044 https://www.theguardian.com/culture-professionals-network/culture-professionals-blog/2013/may/03/museum-archives-digital-online https://www.theguardian.com/culture-professionals-network/culture-professionals-blog/2013/may/03/museum-archives-digital-online https://mw2015.museumsandtheweb.com/paper/new-architectures-for-online-collections-and-digitization/ https://mw2015.museumsandtheweb.com/paper/new-architectures-for-online-collections-and-digitization/ https://doi.org/10.5334/pia.289 https://cidoc.mini.icom.museum/wp-content/uploads/sites/6/2018/12/25_papers.pdf https://cidoc.mini.icom.museum/wp-content/uploads/sites/6/2018/12/25_papers.pdf https://www.springerprofessional.de/en/digital-preservation-of-cultural-heritage-for-small-institutions/16842836 https://www.springerprofessional.de/en/digital-preservation-of-cultural-heritage-for-small-institutions/16842836 https://www-theglobeandmail-com.cdn.ampproject.org/ information technology and libraries march 2023 the current state and challenges in democratizing small museums’ collections online 12 avgousti and papaioannou (paper presented at mw18: museums and the web 2018, vancouver, canada, april 18–21, 2018), https://mw18.mwconf.org/paper/who-reviewed-this-a-survey-on-museum-webpublishing-in-2018/index.html; shyam oberoi and kristen arnold, “new architectures for online collections and digitization” (paper presented at mw2015: museums and the web, chicago, il, april 8–11, 2015), https://mw2015.museumsandtheweb.com/paper/newarchitectures-for-online-collections-and-digitization/index.html. 
15 flynn, “making collections accessible.” 16 oberoi and arnold, “new architectures.” 17 nuno freire, pável calado, and bruno martins, “availability of cultural heritage structured metadata in the world wide web” (paper presented at 22nd international conference on electronic publishing, june 2018), 11, https://www.researchgate.net/publication/325914185_availability_of_cultural_heritage_stru ctured_metadata_in_the_world_wide_web. 18 flynn, “making collections accessible.” 19 manolis gergatsoulis and pantelis lilis, “multidimensional rdf,” in on the move to meaningful internet systems 2005: coopis, doa, and odbase (2005): 1188–1205, https://doi.org/10.1007/11575801_17. 20 cruce saunders, “content authoring for human and machine consumption.” [a], september 2, 2019, https://simplea.com/articles/content-authoring-for-human-and-machine. 21 alejandra garcia bittar, “is a digital strategy necessary in small museums?” museum and digital culture – pratt institute, november 11, 2018, https://museumsdigitalculture.prattsi.org/is-adigital-strategy-necessary-in-small-museums-a72c1645e495. 22 charles tongue, “museum survey 2017,” vernon systems (blog), may 12, 2017, https://vernonsystems.com/museum-survey-2017/. 23 network of european museum organisations, “final report: digitisation and ipr in european museums” (network of european museum organisations, july 2020), https://www.nemo.org/fileadmin/dateien/public/publications/nemo_final_report_digitisation_and_ipr_in_ european_museums_wg_07.2020.pdf; cf. kelly, “managing digitization projects.” 24 flynn, “making collections accessible.” 25 honeysett and falkowski, “museum technology landscape.” 26 axiell, “museums accelerate implementation of digital strategies, making more content available online and on-site to improve visitor experiences,” axiell (blog), june 28, 2017, https://www.axiell.com/axiell-news/museums-accelerate-implementation-of-digitalstrategies-making-more-content-available-online-and-on-site-to-improve-visitor-experiences2/. 
https://mw18.mwconf.org/paper/who-reviewed-this-a-survey-on-museum-web-publishing-in-2018/index.html https://mw18.mwconf.org/paper/who-reviewed-this-a-survey-on-museum-web-publishing-in-2018/index.html https://www.researchgate.net/publication/325914185_availability_of_cultural_heritage_structured_metadata_in_the_world_wide_web https://www.researchgate.net/publication/325914185_availability_of_cultural_heritage_structured_metadata_in_the_world_wide_web https://doi.org/10.1007/11575801_17 https://simplea.com/articles/content-authoring-for-human-and-machine https://museumsdigitalculture.prattsi.org/is-a-digital-strategy-necessary-in-small-museums-a72c1645e495 https://museumsdigitalculture.prattsi.org/is-a-digital-strategy-necessary-in-small-museums-a72c1645e495 https://vernonsystems.com/museum-survey-2017/ https://www.ne-mo.org/fileadmin/dateien/public/publications/nemo_final_report_digitisation_and_ipr_in_european_museums_wg_07.2020.pdf https://www.ne-mo.org/fileadmin/dateien/public/publications/nemo_final_report_digitisation_and_ipr_in_european_museums_wg_07.2020.pdf https://www.ne-mo.org/fileadmin/dateien/public/publications/nemo_final_report_digitisation_and_ipr_in_european_museums_wg_07.2020.pdf https://www.axiell.com/axiell-news/museums-accelerate-implementation-of-digital-strategies-making-more-content-available-online-and-on-site-to-improve-visitor-experiences-2/ https://www.axiell.com/axiell-news/museums-accelerate-implementation-of-digital-strategies-making-more-content-available-online-and-on-site-to-improve-visitor-experiences-2/ https://www.axiell.com/axiell-news/museums-accelerate-implementation-of-digital-strategies-making-more-content-available-online-and-on-site-to-improve-visitor-experiences-2/ information technology and libraries march 2023 the current state and challenges in democratizing small museums’ collections online 13 avgousti and papaioannou 27 joan beaudoin, “art museum collections online: extending their reach” (paper presented at mw20, the 24th museweb conference, march 31–april 4, 2020), https://mw20.museweb.net/paper/art-museum-collections-online-extending-their-reach/. 28 avgoustinos avgousti, georgios papaioannou, and feliz ribeiro gouveia, “content dissemination from small-scale museum and archival collections: community reusable semantic metadata content models for digital humanities,” the code4lib journal, no. 43 (february 14, 2019), https://journal.code4lib.org/articles/14054. 29 “vernon systems – museum survey,” vernon systems (blog), may 19, 2016, https://vernonsystems.com/vernon-systems-museum-survey-data/. 30 nuno freire et al., “a survey of web technology for metadata aggregation in cultural heritage,” information services & use 37, no. 4 (2017): 425–36, https://doi.org/10.3233/isu-170859. 31 europeana: discover europe’s digital cultural heritage (website), accessed january 29, 2023, https://www.europeana.eu/en. 32 denis pitzalis, “3d and semantic web: new tools to document artefacts and to explore cultural heritage collections” (2013); denis pitzalis, “3d and semantic web: new tools to document artifacts and to explore cultural heritage collections. signal and image processing,” (phd diss. 
université pierre et marie curie, 2013); chryssoula bekiari et al., “documenting cultural heritage in small museums;” lejeune, “the effects of online catalogues in london and other museums: a study of an alternative way of access.” 33 paul klimper, “introduction to museums in the digital age,” in nemo 21st annual conference documentation, bukarest, romania, 2013, ed. julia pagel and kelly donahue, https://www.nemo.org/fileadmin/dateien/public/statements_and_news/nemo_21st_annual_conference_doc umentation.pdf. 34 network of euopean museum organisations, “final report”; dorel micle, “heritage networks and portals,” in museum and the internet. selected papers from the international summer course in buşteni, romania, 20th – 26th of september, 2004, ed. irina oberlander-târnoveanu (budapest: archaeolingua, 2008), 73-120; bittar, “is a digital strategy necessary in small museums?” 35 magdalena laine-zamojska, “virtual museum and small museums: vimuseo.fi project,” in museums and the web 2011: proceedings, ed. j. trant and d. beardman (toronto: archives and museum informatics, 2011), https://www.museumsandtheweb.com/mw2011/papers/virtual_museum_and_small_museu ms_vimuseofi_pro. 36 network of european museum organisations, “final report.” 37 noah lenstra, “website development for small museums: a case study of the katherine dunham dynamic museum,” january 1, 2008, https://mw20.museweb.net/paper/art-museum-collections-online-extending-their-reach/ https://journal.code4lib.org/articles/14054 https://vernonsystems.com/vernon-systems-museum-survey-data/ https://doi.org/10.3233/isu-170859 https://www.europeana.eu/en https://www.ne-mo.org/fileadmin/dateien/public/statements_and_news/nemo_21st_annual_conference_documentation.pdf https://www.ne-mo.org/fileadmin/dateien/public/statements_and_news/nemo_21st_annual_conference_documentation.pdf https://www.ne-mo.org/fileadmin/dateien/public/statements_and_news/nemo_21st_annual_conference_documentation.pdf https://www.museumsandtheweb.com/mw2011/papers/virtual_museum_and_small_museums_vimuseofi_pro https://www.museumsandtheweb.com/mw2011/papers/virtual_museum_and_small_museums_vimuseofi_pro information technology and libraries march 2023 the current state and challenges in democratizing small museums’ collections online 14 avgousti and papaioannou https://www.academia.edu/2439955/website_development_for_small_museums_a_case_stud y_of_the_katherine_dunham_dynamic_museum. 38 “resources for small museums,” gallery systems (blog), accessed july 13, 2022, https://www.gallerysystems.com/online-tools-resources-small-museums/. 39 tongue, “museum survey 2017.” 40 avgousti, papaioannou, and ribeiro gouveia, “content dissemination.” 41 wigodner and kearney, “who reviewed this?!” 42 wigodner and kearney, “who reviewed this?!” 43 honeysett and falkowski, “museum technology landscape.” 44 jasmine hoover, “gaps in it and library services at small academic libraries in canada,” information technology and libraries 37, no. 4 (2018): 15–26, https://doi.org/10.6017/ital.v37i4.10596. 45 avgousti, papaioannou, and ribeiro gouveia, “content dissemination.” 46 klimper, introduction. 47 vikas bhushan, shiv shakti ghosh, and sudipta biswas, “bridging the gap between cms and semantic web” (paper presented at naclin 2015 conference, kamataka, india), researchgate, 2016, https://www.researchgate.net/publication/307923260_bridging_the_gap_between_cms_and_ semantic_web. 
48 network of european museum organisations, “final report.” 49 avgousti, papaioannou, and ribeiro gouveia, “content dissemination.” 50 stéphane jean joseph corlosquet, “bootstrapping the web of data with drupal” (master’s thesis, national university of ireland, galway, 2009), https://aic.ai.wu.ac.at/~polleres/supervised_theses/stephane_corlosquet_mseng_2009.pdf . 51 athanasios velios and aurelie martin, “off-the-shelf crm with drupal: a case study of documenting decorated papers,” international journal on digital libraries 18, no. 4 (2017): 321–31, https://doi.org/10.1007/s00799-016-0191-5. 52 konstantinos vavliakis, georgios karagiannis, and pericles mitkas, “semantic web in cultural heritage after 2020” (2012), https://www.semanticscholar.org/paper/semantic-web-incultural-heritage-after-2020-vavliakiskaragiannis/c69d14de020d5dedb9e76a173c94cc56cc254251. 53 illias daradimos, costas vassilakis, and akrivi katifori, “a drupal cms module for managing museum collections” (2015), https://www.academia.edu/2439955/website_development_for_small_museums_a_case_study_of_the_katherine_dunham_dynamic_museum https://www.academia.edu/2439955/website_development_for_small_museums_a_case_study_of_the_katherine_dunham_dynamic_museum mailto:https://www.gallerysystems.com/online-tools-resources-small-museums/ https://doi.org/10.6017/ital.v37i4.10596 https://www.researchgate.net/publication/307923260_bridging_the_gap_between_cms_and_semantic_web https://www.researchgate.net/publication/307923260_bridging_the_gap_between_cms_and_semantic_web https://doi.org/10.1007/s00799-016-0191-5 https://www.semanticscholar.org/paper/semantic-web-in-cultural-heritage-after-2020-vavliakis-karagiannis/c69d14de020d5dedb9e76a173c94cc56cc254251 https://www.semanticscholar.org/paper/semantic-web-in-cultural-heritage-after-2020-vavliakis-karagiannis/c69d14de020d5dedb9e76a173c94cc56cc254251 https://www.semanticscholar.org/paper/semantic-web-in-cultural-heritage-after-2020-vavliakis-karagiannis/c69d14de020d5dedb9e76a173c94cc56cc254251 information technology and libraries march 2023 the current state and challenges in democratizing small museums’ collections online 15 avgousti and papaioannou https://www.academia.edu/29947679/a_drupal_cms_module_for_managing_museum_collect ions. 54 quinn dombrowski, drupal for humanists (college station: texas a&m university press, 2016.) 55 jennifer zaino, “2017 trends for semantic web and semantic technologies,” dataversity (blog), november 29, 2016, https://www.dataversity.net/2017-predictions-semantic-websemantic-technologies/. 56 grigoris antoniou and frank van harmelen, a semantic web primer, second edition (cambridge massachusetts and london, u.k.: the mit press, 2008). 57 jackson joab, “tim berners-lee: machine-readable web still a ways off” gcn, october 30, 2009, https://gcn.com/articles/2009/10/30/berners-lee-semantic-web.aspx. 58 eric miller, uche ogbuji, victoria mueller, and kathy macdougall, “bibframe primer – bibliographic framework as a web of data: linked data model and supporting services” (november 2012): 42, https://www.researchgate.net/publication/280113409_bibframe_primer__bibliographic_framework_as_a_web_of_data_linked_data_model_and_supporting_services. 59 antoniou and van harmelen, a semantic web primer. 60 bryn farnsworth, “qualitative vs quantitative research – what is what?” imotions (blog), june 11, 2019, https://imotions.com/blog/qualitative-vs-quantitative-research/. 
public libraries leading the way

intro to coding using python at the worcester public library

melody friedenthal

information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.12207

melody friedenthal (mfriedenthal@mywpl.org) is a public services librarian, worcester public library.

abstract

the worcester public library (wpl) offers several digital learning courses to our adult patrons; among them is "intro to coding using python." this six-session class teaches basic programming concepts and the vocabulary of software development, and it prepares students to take more intensive, college-level classes. the bureau of labor statistics predicts a bright future for software developers, web developers, and software engineers. wpl is committed to helping patrons increase their "hireability," and we believe our python class will help patrons break into these lucrative and gratifying professions… or just have fun.

history and details of our class

i came to librarianship from a long career in software development, so when i joined the worcester public library in january 2018 as a public services librarian, my manager proposed that i teach a class in programming. she asked me to research which language would be best. python got high marks for ease of use, flexibility, growing popularity, and a very active online community. once i selected a language, i had to choose an environment to teach it in – or so i thought.
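as a quick aside, to give a sense of the readability that earned python those high marks, here is a minimal first-session-style example; it is illustrative only and is not taken from the wpl course materials.

```python
# a first-lesson style example: variables, input, and a conditional.
# illustrative only; not taken from the wpl course manual.
name = input("what is your name? ")
favorite = input("what is your favorite number? ")

# input() always returns text, so convert it before doing arithmetic
favorite = int(favorite)

if favorite % 2 == 0:
    print(f"hello, {name}! {favorite} is an even number.")
else:
    print(f"hello, {name}! {favorite} is an odd number.")
```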
i had absolutely no experience in front of a classroom, and few pedagogical skills, so i sought out an online python course within which to teach. i decided to use the code academy (ca) website as our programming environment. ca has self-guided classes in a number of subjects, and the free beginning python course seemed to be just what we needed. i went through the whole class myself before using it as courseware. my intent was to help students register for ca, then, each day, teach them the concepts in that day's ca lesson. they would then be set to do the online lesson and assignments.

we first offered python in june 2018. problems with ca came up right from the start: students registered for the wrong class (despite the handout explicitly naming the correct class), and ca frequently tried to upsell a not-free python class. since ca's classes are moocs (massive open online courses), the developers built in an automated way of correcting student code: embedded behind each web page of the course, there is code that examines the student's code and decides whether it is acceptable or not. good in theory, not so good in practice. ca's "code-behind" is flawed and sometimes prevented students from advancing to the next lesson.

moreover, some of the ca tasks were inane. for example, one lesson incorporated a kind of mad libs game, in which the instructions ask, for example, for 13 nouns and 11 adjectives, and these are combined with set sentences to generate a silly story. this assignment turned out to be too long and difficult to complete, preventing students from advancing. (a minimal sketch of this kind of exercise appears later in this article, after the class-logistics discussion.) although i used ca the first few times i offered the class, i subsequently abandoned it and wrote my own classroom material.

after determining that ca wasn't appropriate, i chose an online ide where the students could code independently. this platform worked well when i tested it ahead of time, but when the whole class tried to log on at once, we received denial-of-service error messages. hurriedly moving on to plan c, i chose thonny, a free python ide which we downloaded to each pc in the lab (see https://thonny.org/).

each student receives a free manual (see figure 1), which i wrote. every time i've offered this class i've edited the manual, clarifying the topics students had a hard time with. i've also added new material, including commands students have shown me. it is now 90 pages long, written in microsoft word, and printed in color. we use soft binders with metal fasteners.

figure 1. intro to coding using python manual developed for the course.

the manual consists of the following sections:
• cover: course name, dates we meet, time class starts and ends, location, instructor's name, manual version number, and a place for the student to write their own name.
• syllabus: goals for each of the six sessions. this is aspirational.
• basic information about programming, including an online alternative to thonny, for students who don't have a computer at home and wish to use our public computers for homework.
• lessons 1–17: "hello world" and beyond.
• lesson 18: object-oriented design, which i consider to be advanced, optional material. skipped if time is pressing or the class isn't ready for it.
• lesson 19: wrap-up:
  o how to write good code.
  o how to debug.
  o list of suggested topics for further study.
  o online resources for python forums and community.
• list of wpl's print resources on python and programming.
• relevant comic strips and cartoons.

in march 2019, my manager asked me to start assigning homework. if a student attends all six sessions and makes a decent attempt at each assignment, at the sixth session they receive a certificate of completion. the certificate has the wpl name and logo, the student's name, and my signature. typically three or four students earn a certificate.

homework is emailed to me as an attachment. this class meets on tuesday evenings, and i tell students to send me their homework as soon as possible. inevitably, several students don't email me until the following monday. while i don't give out grades, i do spend considerable time reviewing homework, line by line, and i email back detailed feedback.

when the january 2020 course started, i found that between october's class and january, outlook had implemented a security protocol which removes certain file extensions from incoming email. and – you can see where this is going – the .py python extension was one of them. i told students to rename their python code files from xxxx.py to xxxx.py.doc, where "xxxx" is their program name. this fools outlook into thinking the file is a microsoft word document, and the email is delivered to me intact. when it arrives, i remove the .doc extension from the attachment and save it to a student-specific file. then i open the file in thonny and review it.

physically, our computer lab contains an instructor's computer and twelve student computers (see figure 2). it also has a projector which projects the active window from the instructor's computer onto a screen: usually the class manual. i use dry erase markers in a variety of colors to illustrate the concepts on a whiteboard. there is also a supply of pencils on hand for student note-taking.

the class is offered once per season. although the classroom can accommodate twelve students, we set our maximum registration to fourteen, which allows us to maximize attendance even if patrons cancel or don't show up. and if all fourteen do attend the first class, we have two laptops i can bring into the lab. we also maintain a small waitlist, usually of five spots. we've offered this class seven times, and the registration and waitlists have been full every time. sometimes we have to turn students away.

figure 2. classroom at worcester public library.

however, we had a problem with registered patrons not showing up, so last spring we implemented a process where, about a week before class starts, i email each student, asking them to confirm their continued interest in the class. i tell them that if they are no longer interested (or don't respond), i will give the seat we reserved for them to another interested patron from the waitlist. in this email i also outline how the course is structured and that they can each earn a certificate of completion. i tell them class starts promptly at 5:30 and to please plan accordingly. some students don't check their email. some patrons show up without ever registering; they are told registration is required and to try again in a few months. i keep track of attendance on an excel spreadsheet. here in worcester, ma, weather is definitely a factor for our winter sessions.
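as promised earlier, here is a minimal sketch of a mad-libs-style exercise of the kind described in the code academy discussion above; it is illustrative only and does not reproduce ca's actual assignment, which asked for far more words.

```python
# a small mad-libs-style exercise: collect a few words, then drop them
# into a fixed sentence. illustrative only; not the actual ca assignment.
nouns = []
adjectives = []

for i in range(2):
    nouns.append(input(f"enter noun #{i + 1}: "))
for i in range(2):
    adjectives.append(input(f"enter adjective #{i + 1}: "))

story = (
    f"the {adjectives[0]} {nouns[0]} walked into the library "
    f"and asked for a {adjectives[1]} book about {nouns[1]}s."
)
print(story)
```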
over time i've made the class more dynamic. i have a student read a paragraph in the manual aloud. i've switched around the order of some lessons in response to student questions. i have them play a game to teach boolean logic: "if you live in worcester and you love pizza, stand up!"… then: "if you live in worcester or you love pizza, stand up!"

students range from experienced programmers (of other languages), to people with no experience but great aptitude, to people who just never seem to "get it." this material is technical, and i try hard to communicate the concepts, but i lose a few students every time.

we ask our patrons for feedback on all of our programs. our python students have written:
• "… the classes were formatted in an organized manner that was beginner friendly"
• "the manual is a big help. i'm thankful that the program is free."
• "… coding is fun and i learned a new skill."
• "this made me think critically and helped me understand where my errors in the programs were."

wpl is proud to offer classes that make a difference in our patrons' lives.

article

applying topic modeling for automated creation of descriptive metadata for digital collections

monika glowacka-musial

information technology and libraries | june 2022
https://doi.org/10.6017/ital.v41i2.13799

monika glowacka-musial (monikagm@nmsu.edu) is assistant professor/metadata librarian, new mexico state university library. © 2022.

abstract

creation of descriptive metadata for digital objects tends to be a laborious process. specifically, subject analysis that seeks to classify the intellectual content of digitized documents typically requires considerable time and effort to determine subject headings that best represent the substance of these documents. this project examines the use of topic modeling to streamline the workflow for assigning subject headings to the digital collection of new mexico state university news releases issued between 1958 and 2020. the optimization of the workflow enables timely scholarly access to unique primary source documentation.

introduction

digital scholarship relies on digital collections and data. in the influential book digital_humanities, anne burdick and her associates affirm that humanistic knowledge production depends on collection building and curation.1 access to historical documents and data resources is essential for the development of new research questions and methodologies.2 this project utilizes topic modeling to support building a digital collection of institutional news releases. it is one of our initiatives to apply digital technologies in library workflows.

new mexico state university news releases

in response to a growing scholarly and public interest in original university press announcements, the digitization of past nmsu print news releases was approved in september 2013. sixty years of news releases, from the late 1950s to the present, were to be included. one of the arguments presented in justification of the project was that these institutional news briefs have a truly unique historical value. researchers view university press announcements as anchors in the history of nmsu and the region, particularly for dating events and initiatives.
they also find official communications essential for studying the way the news was framed by participants and the university administration. historically, the relationships between the university and the local media had always been a major concern of college administrators: how to respect the freedom of the press while ensuring responsible and factual journalism, and how to build an effective partnership that would benefit both sides?3 to address these questions, the administration early on established the college's information services, which have issued news releases about campus events, programs, and developments in the college's research, teaching, and service. these formal news reports, representing the perspective of the university, have been regularly distributed to local and worldwide media for many decades. this collection has become one of the most popular primary sources documenting the history of this southwestern educational institution.

since the beginning of the digitization project, thousands of press releases have been scanned, described, and added to the digital collection. currently, the collection features press releases issued by the university between 1958 and 1974. there is still a lot to be done. the most time-consuming element in the process is adding metadata, including library of congress subject headings, to individual news releases. with decreasing personnel, dwindling library resources, and competing work priorities, progress on the project has slowed substantially. its revitalization requires a fresh, problem-solving approach that would allow for a significant reduction in the time catalogers spend on metadata creation. in search of a viable solution, topic modeling, a computational tool for classifying large collections of texts, was put to the test and generated promising results. the following sections describe the tools, data, and process created for this experiment in some detail.

topic modeling and its applications

topic modeling (tm) is one of the methodologies used in natural language processing (nlp). it was specifically designed for text mining and discovering hidden patterns in huge collections of documents, images, and networks.4 according to practitioners, topic modeling is best viewed as a statistical tool for text exploration and open-ended discovery.5 it has been used extensively in computer science, genetics, marketing, political science, journalism, and digital humanities for the last two decades.
a growing literature on topic modeling applications provides clear evidence of its viability.6 examples of tm applications in digital social sciences and humanities include finding geographic themes in gps-associated documents on social media platforms such as flickr and twitter,7 selecting news articles on opposition to the euro currency from financial times data,8 identifying paragraphs on epistemological concerns in english and german novels,9 tracking research trends in different disciplines,10 and revealing dominant themes in newspapers,11 governance literature,12 and wikipedia entries.13 topic modeling has also been applied, in addition to text mining, to enhance access to large digital collections by providing minimal description and enriching metadata, including subject headings.14 the possibility of using topic modeling to determine subject headings for books on project gutenberg has also been explored.15

topic modeling in a nutshell

topic models help to identify the contents of document collections. topic modeling is a process of discovering clusters of words that best represent a set of topics. figure 1 shows the basic idea behind topic modeling. a large collection of text documents (the scrolls on top) consists of thousands of words (shown symbolically at the bottom). the algorithm seeks the most frequent words that tend to occur in proximity and clusters them together. each cluster, referred to as a topic, has a set of words with their probabilities of belonging to that topic. each document in the collection displays a set of combined topics to different degrees. here, documents are seen as mixtures of topics, and topics are seen as mixtures of words.16 topics also provide context to words. documents that have similar combinations of topics tend to be related. as a result, a large collection of text documents can be represented by a limited set of topics (as presented by icons in the middle of the figure). (a toy illustration of this mixture idea appears at the end of this section.)

figure 1. basic idea behind topic modeling.

topics and subject headings combined

the original purpose of topic modeling, as formulated by david blei and his associates in 2003, was to make large collections of texts more approachable for scholars by organizing texts automatically based on latent topics.17 these hidden topics can be discovered, measured, and consequently used by scholars to navigate the collection. the purpose of assigning subject headings is to identify "aboutness," or simply the subject concepts covered by the intellectual content of a given work, and then again to collocate related works.18 since both topic models and subject headings have a similar purpose, although very different methodology and scale, we decided to combine them and make topic models a prerequisite for assigning subject headings. in such a scenario, the computer deals with text collections at a scale beyond human reading capacity, and catalogers then fine-tune the results generated by the algorithm. the following methods section shows the subsequent stages involved in this process of semiautomated assignment of subject headings to documents.
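before turning to the methods, here is a toy illustration of the mixture idea described in the "topic modeling in a nutshell" section above. the article's own implementation uses r and the topicmodels package; the following python analogue, built on scikit-learn with a made-up four-document corpus and arbitrary parameters, is a sketch for illustration only.

```python
# a toy illustration of "documents as mixtures of topics, topics as mixtures
# of words," using scikit-learn's lda. the article's actual implementation
# is in r (the topicmodels package); this python analogue, with a made-up
# corpus and arbitrary parameters, is for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "students rehearse play for campus theater production",
    "engineering students win award at science competition",
    "symphony orchestra concert features student musicians",
    "college of engineering announces new research grant",
]

# build the document-term matrix (rows = documents, columns = terms)
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# fit a small lda model; n_components is the number of topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
gammas = lda.fit_transform(dtm)      # per-document topic proportions
betas = lda.components_              # per-topic term weights

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(betas):
    top_terms = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {top_terms}")
print("document-topic mixtures:\n", gammas.round(2))
```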
methods

overview

for topic modeling, we used the algorithm of latent dirichlet allocation (lda).19 lda takes a document-term matrix, with rows corresponding to documents and columns corresponding to terms (words), and, based on semirandom exploration, finds optimal probabilities of topics in documents (called gammas) and probabilities of terms in topics (called betas). after lda generates a set of topics that best represent the collection of news releases, each topic is associated with several subject headings that were previously assigned to news releases by catalogers. for a new news release, lda finds a set of most representative topics. subject headings associated with the dominant topics are combined into a list of subject candidates presented to a cataloger. the last step involves a cataloger using this short list of subject candidates to select subject headings for the news release.

training data

the training data used in this project consists of over 6,000 news releases (from 1958 to 1967) annotated with metadata. only two metadata properties—titles and subject headings—were considered. created by catalogers, both properties reflect the content of news releases accurately, although mistakes may sometimes happen. the values from the titles field were converted into a document-term matrix that, in turn, became an input for the algorithm. texts produced by ocr on the original news releases were not included in the analysis due to their poor quality.

detailed steps of the proposed method (a short sketch of the scoring in step 3 follows the list):

1. topic modeling on training data:
   a. run standard preprocessing of the training text data, including tokenization, stop words removal, and stemming.
   b. run topic modeling (lda) where each document from the training data set is assigned a set of topics (subsets of words), each one with a measurable contribution to the document.
2. assignment of subject headings to topics.20 for each topic:
   a. select a number of documents with the highest probability (gamma) for the topic. we used 400.
   b. gather the set of subject headings assigned to the documents selected in 2.a. and arrange them in decreasing frequency (freq) of occurrence in the set.
3. assignment of subject headings to a new document:
   a. assign to the new document gammas (probabilities) of topics using the lda model trained in 1.b.
   b. in subsequent topics, for each subject heading calculate its weight in the document as the product of its frequency in the topic (freq) and the probability of the topic (gamma) in the document; for subject headings duplicated across topics, sum their weights across topics.
   c. create a list of the candidate subject headings processed in 3.b. in descending order with respect to their weights in the document.
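the weighting in steps 3.b and 3.c can be made explicit with a small sketch. the article's implementation is in r; the following python version uses hypothetical data structures (doc_gammas, topic_subject_freqs) and made-up numbers, and is meant only to illustrate the freq × gamma scoring and the summing of duplicates across topics.

```python
# a rough sketch of the candidate-scoring step (3.b-3.c) described above.
# the article's implementation is in r; this python version uses hypothetical
# data structures and is meant only to make the weighting explicit.
from collections import defaultdict

def rank_candidate_subjects(doc_gammas, topic_subject_freqs):
    """doc_gammas: {topic_id: gamma}  probability of each topic in the document.
    topic_subject_freqs: {topic_id: {subject_heading: freq}}  how often each
    heading occurs among the top documents for that topic (step 2.b)."""
    weights = defaultdict(float)
    for topic_id, gamma in doc_gammas.items():
        for heading, freq in topic_subject_freqs.get(topic_id, {}).items():
            # weight = freq * gamma; duplicates across topics are summed
            weights[heading] += freq * gamma
    # descending order by weight = the list shown to the cataloger
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

# tiny made-up example
doc_gammas = {0: 0.7, 1: 0.3}
topic_subject_freqs = {
    0: {"theater": 120, "students": 80},
    1: {"concerts": 90, "students": 40},
}
for heading, weight in rank_candidate_subjects(doc_gammas, topic_subject_freqs):
    print(f"{heading}\t{weight:.1f}")
```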
implementation

there is a growing number of tools used for topic modeling.21 for this project, we used the r programming language, which has many packages for data preprocessing and topic modeling (tm).22 the r packages used for this project are listed below:

• topicmodels, with the functions lda() for producing topic models, posterior() for assigning topics to test documents using pretrained models, and perplexity() for perplexity calculation.23
• tidytext, with tidying functions that allow for rearranging and exploring data as well as for interpreting the models.
• textstem for preprocessing data, including stemming and lemmatization.
• tidyr, dplyr, and stringr for data and string manipulation and arrangement.
• ggplot2 for data visualizations.

the code related to topic modeling was mostly reused from the datacamp class on topic modeling.24 occasionally, the data.table data structure was applied instead of data.frame. in addition to standard stop words, custom stop words including initials, names of weekdays, and dates were removed from the corpus using the function anti_join(). for finding topics in test documents with a pretrained model, the function posterior() from the r package topicmodels was used.25 the extra step needed before using posterior() was to align the new document with the document-term matrix used for training the lda model.26

results

for assessing the method's performance, we adopted the idea of recall. in this specific context, recall is defined as the fraction of original subject headings (i.e., those assigned to a document manually by a cataloger) that are present on the list of candidate subject headings produced by the method. the average recall is estimated using a leave-one-out setting.27 once a single test document is set aside, the lda model is trained on the remaining documents and recall is calculated for the tested document using the list of candidate subject headings produced by the method. recall is then averaged over a set of testing documents. this approach produces an estimate of the method's performance if tested on a new document. (a small sketch of this recall calculation appears after the discussion of figure 2 below.)

figure 2. average recall as a function of the size of the list of subject heading candidates.

figure 2 shows the dependence of average recall on the length of the list of candidate subjects produced by the method. recall is averaged over 1,500 randomly selected test documents. the dashed line represents chance-level performance, i.e., when the method would produce a random subset of all subject headings available in the data. for a list of 100 suggested subject headings, the recall is on average above 0.6, and for a list of 500 candidate subject headings, above 0.8. even though the average recall stays noticeably below 1 (a recall of 1 would mean perfect performance), it is still considerably above the chance level. the results presented in figure 2 were produced by the lda model trained with 16 topics.

one of the parameters affecting the method's performance is the number of topics used by the lda model. for finding the number of topics corresponding to the highest recall, an overall measure of recall across different lengths of the subject candidate list was defined as the cumulative recall for the first 100 subject candidates.
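as referenced above, here is a minimal sketch of the recall-at-k idea behind figures 2 and 3. it is written in python rather than the r used in the project, with made-up data, and it illustrates only the metric, not the leave-one-out training loop.

```python
# recall@k: the fraction of the cataloger's original subject headings that
# appear among the first k candidates proposed by the model. python sketch
# with made-up data; the article's actual implementation is in r.
def recall_at_k(original_headings, ranked_candidates, k):
    top_k = set(ranked_candidates[:k])
    hits = sum(1 for h in original_headings if h in top_k)
    return hits / len(original_headings)

original = ["theater", "students", "concerts"]
candidates = ["theater", "award winners", "students", "music", "scholarships"]

for k in (1, 3, 5):
    print(k, recall_at_k(original, candidates, k))
# averaging recall_at_k over many held-out documents gives the curves
# reported in figures 2 and 3.
```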
we assumed that 100 is a likely size for candidate lists that catalogers would be willing to go through. figure 3 shows the cumulative recall for different numbers of topics, based on which 16 was chosen as the optimum. interestingly, this corresponds well with the dependence of perplexity on the number of topics (fig. 4). perplexity, a measure of the model's surprise at the data, shows how well the model fits the data—a smaller number means a better fit, i.e., a better topic model.28

figure 3. cumulative recall as a function of number of topics in the lda model.

figure 4. perplexity of the lda model as a function of number of topics in the lda model.

to give a better idea of the method's performance, figure 5 shows the distribution of recall for individual test documents, for a list of 100 subject headings. since most documents in the training data have just a few subject headings, there is only a small set of discrete values possible for recall for individual documents. the distribution is wide, with a fraction of documents with no subject heading present on the proposed list (recall = 0) but also with a bigger fraction of documents fully covered by the list (recall = 1).

figure 5. distribution of recall across 1,500 test documents, for 100 subject candidates (for 16 topics).

the following examples show the sets of subject headings selected by the algorithm; the headings originally chosen by catalogers appear in bold blue in the published version.

example 1

title of news release: "'romeo and juliet' play part of campus celebration for 400th anniversary of shakespeare's birth"

subjects weights
new mexico state university. playmakers 0.280
theater 0.143
students 0.080
academic achievement 0.080
theater--production and direction 0.075
high school students 0.052
competitions 0.048
new mexico state university. college of engineering 0.042
plays 0.041
debates and debating 0.038
new mexico state university. aggie forensic festival 0.036
zohn, hershel 0.034
shakespeare, william, 1564-1616. a midsummer night's dream 0.034
forensics (public speaking) 0.034
frisch, max, 1911-1991. firebugs 0.027
tickets 0.027
theater rehearsals 0.027
new mexico state university. college of agriculture and home economics 0.022
shakespeare, william, 1564-1616. romeo and juliet 0.020
frisch, max, 1911-1991 0.020
performances 0.020
garcia lorca, federico, 1898-1936. casa de bernarda alba. english 0.020
molière, 1622-1673. bourgeois gentilhomme. english 0.020
anniversaries 0.014
new mexico state university. college of teacher education 0.012

example 2

title of caption to photo: "locals barbara gerhard, donna herron, lillian jean taylor rehearse for upcoming concert"

subjects weights
concerts 0.123
new mexico state university. university-civic symphony orchestra 0.085
institution. playmakers 0.077
united states. air force rotc 0.073
united states. army. reserve officers' training corps 0.062
military cadets 0.058
award presentations 0.054
theater 0.039
award winners 0.038
scholarships 0.035
music 0.035
musicians 0.031
awards 0.027
new mexico state university. department of military science 0.023
theater--production and direction 0.021
kennecott copper corporation 0.019
students 0.019
glowacki, john 0.019
new mexico state university symphonic band 0.015
new mexico state university. university-community chorus 0.015
lynch, daniel 0.015
drath, jan 0.015
performances 0.015
military art and science 0.012
united states. army--inspection 0.012

discussion

the major advantage of the method described above is reducing the long list of library of congress subject headings that catalogers need to consult before assigning subject headings to news releases. it is important to note that this method produces subject headings that are already present in the training data. the list of available subject headings can be expanded by periodic updates of the training data to include all entries in the catalog, assuming catalogers will add, where needed, subjects not present so far in the data set.

in this project we utilized metadata from just two fields: titles and subject headings. although documents' titles are supposed to compactly represent the content of documents, we expect that the presented approach would give better results if the full text (ocr) were analyzed. in this project, the limiting factors were both the quality of the print copies and the robustness of available ocr tools.

in some cases, subject annotations are imperfect, depending on the skills and experience of catalogers. that also affects the performance of our method, which relies on the quality of subject assignments. on the other hand, there are cases when the method suggests subjects that fit the content of news releases but were not selected by catalogers. this indicates that the method can also be used to refine existing annotations.

conclusion

we propose a way to streamline the workflow of metadata creation for university news releases by applying topic modeling. first, we use this digital technology to identify topics in a large collection of text documents. then, we associate the discovered topics with sets of subject headings. finally, to a new document, we assign those subject headings that are associated with the document's most dominant topics.

the proposed method facilitates the process of document annotation. it produces short lists of candidate subject headings that account for a significant part of the original labeling performed by catalogers. this approach can be applied to support annotation of any large digital collection of text documents.

one of the advantages of applying topic modeling is that it produces numeric representations of text documents. these numeric representations can be used by advanced analytical methodologies, including machine learning, for numerous practical purposes in library workflows, like text categorization, collocation of similar materials, enhancing metadata for digital collections, finding trends in government literature, etc. in addition, mastering digital methodologies may open new ways of collaboration between librarians and digital scholars across university campuses.
as johnson and dehmlow argue, "... digital humanities represent a clear opportunity for libraries to offer significant value to the academy, not only in the areas of tool and consultations, but also in collaborative expertise that supports workflows for librarians and scholars alike."29 digital technologies are best learned in hands-on practice. if librarians are to contribute to the development of digital scholarship, then they need to learn how to apply new technologies to their own work. and since both librarians and humanists work with texts, they might have much to offer each other.

correction

on november 21, 2022, the urls to references 24 and 26 were updated at the author's request to avoid user login.

endnotes

1 anne burdick et al., digital_humanities (cambridge, massachusetts: the mit press, 2012), 32–33.

2 thomas g. padilla, "collections as data implications for enclosure," acrl news 79, no. 6 (2018), https://crln.acrl.org/index.php/crlnews/article/view/17003/18751; rachel wittmann, anna neatrour, rebekah cummings, and jeremy myntti, "from digital library to open datasets: embracing a 'collections as data' framework," information technology and libraries 38, no. 4 (december 2019), https://doi.org/10.6017/ital.v38i4.11101.

3 gerald w. thomas, academic ecosystem: issues emerging in a university environment (gerald w. thomas, 1998), 159–64.

4 david m. blei, andrew ng, and michael jordan, "latent dirichlet allocation," journal of machine learning research 3, no. 1 (2003); david m. blei, "topic modeling and digital humanities," journal of digital humanities 2, no. 1 (winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-and-digital-humanities-by-david-m-blei/.

5 megan r. brett, "topic modeling: a basic introduction," journal of digital humanities 2, no. 1 (winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/; jordan boyd-graber, yuening hu, and david mimno, "applications of topic models," foundations and trends® in information retrieval 11, no. 2–3 (2017): 143–296.

6 boyd-graber, hu, and mimno, "applications of topic models," foundations and trends® in information retrieval 11, no. 2–3 (2017): 143–296; rania albalawi, tet hin yeap, and morad benyoucef, "using topic modeling methods for short-text data: a comparative analysis," frontiers in artificial intelligence 3 (2020): 42, https://doi.org/10.3389/frai.2020.00042; hamed jelodar, yongli wang, chi yuan, and xia feng, "latent dirichlet allocation (lda) and topic modeling: models, applications, a survey" (2017), https://www.ccs.neu.edu/home/vip/teach/dmcourse/5_topicmodel_summ/notes_slides/lda_survey_1711.04305.pdf.

7 zhijun yin et al., "geographical topic discovery and comparison," in www: proceedings of the 20th international conference on the world wide web (2011), https://doi.org/10.1145/1963405.1963443.

8 david andrzejewski and david buttler, "latent topic feedback for information retrieval," in kdd '11: proceedings of the 17th acm sigkdd international conference on knowledge discovery and data mining (2011), https://dl.acm.org/doi/10.1145/2020408.2020503.
9 matt erlin, "topic modeling, epistemology, and the english and german novel," cultural analytics 1, no. 1 (may 1, 2017), https://doi.org/10.22148/16.014.

10 cassidy r. sugimoto et al., "the shifting sands of disciplinary development: analyzing north american library and information science dissertations using latent dirichlet allocation," journal of the american society for information science and technology 62, no. 1 (january 2011), https://doi.org/10.1002/asi.21435; david mimno, "computational historiography: data mining in a century of classics journals," journal on computing and cultural heritage 5, no. 1 (april 2012): 3:1–3:19; andrew j. torget and jon christensen, "mapping texts: visualizing american historical newspapers," journal of digital humanities 1, no. 3 (summer 2012), http://journalofdigitalhumanities.org/1-3/mapping-texts-project-by-andrew-torget-and-jon-christensen/; andrew goldstone and ted underwood, "the quiet transformations of literary studies: what thirteen thousand scholars could tell us," new literary history 45 (2014): 359–84; carlos g. figuerola, francisco javier garcia marco, and maria pinto, "mapping the evolution of library and information science (1978–2014) using topic modeling on lisa," scientometrics 112 (2017): 1507–35, https://doi.org/10.1007/s11192-017-2432-9; jung sun oh and ok nam park, "topics and trends in metadata research," journal of information science theory and practice 6, no. 4 (2018): 39–53; manika lamba and margam madhusudhan, "metadata tagging of library and information science theses: shodhganga (2013–2017)," paper presented at etd 2018: beyond the boundaries of rims and oceans: globalizing knowledge with etds, national central library, taipei, taiwan, https://doi.org/10.5281/zenodo.1475795; manika lamba and margam madhusudhan, "author-topic modeling of desidoc journal of library and information technology (2008–2017), india," library philosophy and practice (2019): 2593, https://digitalcommons.unl.edu/libphilprac/2593.

11 david j. newman and sharon block, "probabilistic topic decomposition of an eighteenth-century american newspaper," journal of the american society for information science and technology 57, no. 6 (april 1, 2006): 753–67;
robert k. nelson, "mining the dispatch," last modified november 2020, https://dsl.richmond.edu/dispatch/about; tze-i yang, andrew torget, and rada mihalcea, "topic modeling on historical newspapers," in latech '11: proceedings of the 5th acl-hlt workshop on language technology for cultural heritage, social sciences, and humanities (2011), https://dl.acm.org/doi/10.5555/2107636.2107649; carina jacobi, wouter van atteveldt, and kasper welbers, "quantitative analysis of large amounts of journalistic texts using topic modelling," digital journalism 4, no. 1 (2015), https://doi.org/10.1080/21670811.2015.1093271.

12 jonathan o. cain, "using topic modeling to enhance access to library digital collections," journal of web librarianship 10, no. 3 (2016): 210–25, https://doi.org/10.1080/19322909.2016.1193455; alexandra lesnikowski et al., "frontiers in data analytics for adaptation research: topic modeling," wires climate change 10, no. 3 (2019): e576, https://doi.org/10.1002/wcc.576.

13 tiziano piccardi and robert west, "crosslingual topic modeling with wikipda," in proceedings of the web conference 2021 (www '21), april 19–23, 2021, ljubljana, slovenia (acm, new york), https://doi.org/10.1145/3442381.3449805.

14 cain, "using topic modeling to enhance access to library digital collections," 210–25; a. krowne and m. halbert, "an initial evaluation of automated organization for digital library browsing," in jcdl '05: proceedings of the 5th acm/ieee-cs joint conference on digital libraries (june 7–11, 2005): 246–55; david newman, kat hagedorn, and chaitanya chemudugunta, "subject metadata enrichment using statistical topic models," paper presented at acm ieee joint conference on digital libraries jcdl '07, vancouver, bc, june 17–22, 2007.

15 craig boman, "an exploration of machine learning in libraries," ala library technology report 55, no. 1 (january 2019): 21–25.

16 julia silge and david robinson, text mining with r: a tidy approach (sebastopol, california: o'reilly media, inc., 2017), 90.

17 blei, ng, and jordan, "latent dirichlet allocation."

18 arlene g. taylor, introduction to cataloging and classification, 10th ed. (westport, connecticut: libraries unlimited, 2006), 19–20, 301–14; arlene g. taylor and daniel n. joudrey, the organization of information, 3rd ed. (westport, connecticut: libraries unlimited, 2009), 303–28.

19 blei, ng, and jordan, "latent dirichlet allocation."

20 silge and robinson, text mining with r, 149.

21 albalawi, yeap, and benyoucef, "using topic modeling methods for short-text data," 42.

22 the r project for statistical computing, https://www.r-project.org/.

23 bettina grün and kurt hornik, "topicmodels: an r package for fitting topic models," journal of statistical software 40, no. 13 (2011): 1–30, https://doi.org/10.18637/jss.v040.i13.
public libraries leading the way the democratization of artificial intelligence: one library's approach thomas finley information technology and libraries | march 2019 thomas finley (tfinley@friscotexas.gov) is adult services manager, frisco public library. chances are that before you read this article, you probably checked your email, used a mapping app to find your way, or typed a search term online. without your even perceiving it, artificial intelligence (ai) has already helped you to accomplish something today. email spam filters use variants of ai to help cut down on harmful or useless emails in your inbox.1 with ai doing the fact-crunching, mapping apps quickly preview the best route based on a myriad of factors. search engine companies like google have been using ai to suggest or produce results faster for longer than anyone outside of the company really knew until recently.2 according to a recent study by northeastern university and gallup, 85% of americans are already using ai products.3 the true revelation behind these recent technological developments may not be the fact that ai is already embedded into the fabric of our modern lives. the real surprise might just be the sudden ubiquitous availability (and approachability) of ai tools for all. as google's former chief scientist of ai and machine learning, fei-fei li, said in 2017, "the next step for ai must be democratization, lowering the barriers of entry, and making it available to the largest possible community of developers, users and enterprises."4 this sounds a lot like most public libraries' mission statements. as with other important workforce development efforts, libraries are uniquely placed to participate in this new revolution as key platforms for the discovery and dissemination of emerging tech knowledge. at the frisco public library (https://www.friscolibrary.com), we saw this ai trend surfacing, recognized ai as a critical future job skill, and investigated ways to introduce our patrons to this space. the frisco public library has leveraged readily available technology in a cost-effective way that has engaged community interest. our efforts are also replicable and scalable in terms of multi-nodal experiences both at home and in classroom-based learning.
some basic definitions let's take a few steps back to give some broad definitions and boundaries to the scope of ai. according to the oxford english dictionary, artificial intelligence is "the capacity of computers or other machines to exhibit or simulate intelligent behavior."5 in the literature, you will find a further distinction between general ai, narrow ai, and something called machine learning.6 general ai is something that begins to look like science fiction: an artificial intelligence that learns how to learn, then is able to generalize what it has learned and apply that knowledge to a different case. in advanced examples of general ai, scientists are thinking of not putting a specific problem in front of a general ai program to solve; rather, they are giving it an entire dataset so the program itself can choose what problems it should work on, removing the limited point of view of whoever programs the program.7 narrow ai is easier to understand because it is what we interact with the most in our day-to-day lives. it is what powers those little speed-ups that help us do things faster every day: search through our emails to help us avoid spam, translate speech to text when we dictate a message on a smartphone, or help to parallel park a car at the touch of a button. narrow ai accomplishes a specific task extremely fast and accurately, and thus becomes an extension and multiplier of our own human productivity. a lot of these narrow ai activities are based on a type of artificial intelligence called machine learning (ml). ml is a set of very complex processes that can review large sets of information; create and train models based on this data; make predictions of what will happen next; and then refine that data for better future results.8 machine learning is the focus of our efforts at the frisco public library for two main reasons: 1) it is what has been made available through free tools such as google's open ai resources; and 2) it makes ai attainable in a library setting. our approach: makerspaces for everyone, at home the frisco public library has had 4 years of success with circulating makerspace technology in reasonably priced, hard-shell waterproof boxes with foam inserts. each kit is cataloged, rfid tagged, security tagged, and sealed with zip ties to enable self-checkouts (zip ties can be easily cut open at home, but prevent items from disappearing in the library). these cases are easy to handle and can take some abuse while protecting their contents. this is important because we circulate about 20 different kinds of robotics kits, no-soldering circuitry kits, 3d scanning kits, programming kits, and internet of things kits. most kits contain the theme item with quick start guides, instruction booklets, and a book to inspire advanced learning. we call these maker kits, and we have about 150 total. in our community, they are wildly popular and have circulated more than 4,000 times since their introduction in january 2016.9 aiy: artificial intelligence kits for everyone in 2017, google released their maker-focused aiy voice project kit (where aiy stands for "artificial intelligence yourself," a catchy play on do-it-yourself).10 the kit consists of several components that pair a raspberry pi (an entry-level computer) with a small speaker, housed in a cardboard box with a button prominently placed on top. the result is a stripped-down version of an amazon echo or google home device: essentially, a smart speaker.
although the aiy voice kit is not necessarily initially set up to play music, it is designed to take voice commands like the other products on the market. with a minimum of python coding expertise, aiy kits enable mass participation in artificial intelligence. there isn't even any soldering required to put this kit together! this is 100% in line with fei-fei li's (google's former chief scientist for ai and ml) remarks about the need to democratize ai. google has since released another kit called aiy vision that uses similar components paired with a camera. more information on the kits can be found at https://aiyprojects.withgoogle.com/. frisco public library's artificial intelligence maker kits based on our previous experience with other maker kits, we made a few modifications to the original google design that most librarians with access to a 3d printer can accomplish. the original aiy voice kit uses a punch-out cardboard box to fold and envelop the device. apart from being an extremely cost-effective way of making a box, it also seems like there is delicious irony (and message) in the contrasting of cardboard, a cheap and widely available material, with the advanced tech of ai. durability being our priority, we knew we needed to upgrade this aspect of google's original design. our maker librarian, adam lamprecht, quickly found a shared design file uploaded to the website www.thingiverse.com, which he modified to better suit our needs (see figure 1).11 figure 1. ai maker kits with 3d printed aiy voice device. we then printed these in a variety of colors on our 3d printers and modified the grid-patterned foam inserts to make room for the device and a few other items (see figure 2). we are currently circulating 21 of these kits without major incident. figure 2. interior view of the kits. library instruction: python as a window onto artificial intelligence our basic artificial intelligence classes have been key in the introduction of this technology to the public. we reserve 10 kits for a class and pair them with classroom laptops for ease of use. the structure of the class provides a short introduction to the technology and then walks participants through a basic voice recognition coding challenge. all of this is accomplished in python. python is great for beginning coders because it is easier to learn than other programming languages, takes less time to write lines of code, and can telescope up into a very large number of projects and applications.12 in fact, according to neal ford, director and software architect at thoughtworks, python "is very good at solving bigger kinds of problems."13 so with python, a beginning learner has a programming language that continues to be useful beyond the classroom and into the world of work or school. python provides another important advantage: "python provides the front-end method of how to hook into google's open ai," states tech writer serdar yegulalp.14 it is this combination of a free, accessible coding language with the powerful (and also free) resources of google's open ai that truly lowers the barrier to entry for anyone interested in a hands-on experience with artificial intelligence.
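to give a sense of what such a voice recognition exercise can look like, the following is a minimal sketch in python. it is not the library's class material or the aiy kit's own library; it assumes the third-party speechrecognition package (with a microphone backend such as pyaudio) and sends a short clip of audio to google's free web speech api for transcription.

import speech_recognition as sr  # assumed dependency: pip install SpeechRecognition (plus pyaudio for the microphone)

def listen_once():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate against room noise
        print("say something...")
        audio = recognizer.listen(source)
    try:
        # hand the recorded audio to google's web speech api and return the transcript
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return "(speech was not understood)"
    except sr.RequestError as err:
        return "(speech service unavailable: {})".format(err)

if __name__ == "__main__":
    print("you said:", listen_once())

a beginner who gets this far has already touched the core loop of the class: capture audio, send it to a recognition service, and do something with the returned text.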
lessons learned the ai maker kits are, by far, our most complicated circulating kits. we are hearing back from patrons that the kits are right on the mark. our users get it: they see the power in getting access to these ai tools (utilizing python) and, by all accounts thus far, are happy with their results. there has been a perception gap, however, between library staff expectations and what an ai kit can reasonably accomplish. adam lamprecht reports, "staff members had the expectation that perhaps with this kit, a rookie coder was going to be able to jump directly into developing deep learning neural networks (a very advanced subset of artificial intelligence) and so we definitely benefited from ongoing discussions of those broad ai terms and expectations."15 google's aiy voice is a good start, but there is plenty of room to grow ai classes for more depth. aiy vision is the next logical step that would allow us to enter the world of basic image recognition. our approach does rely on one company's platform, but there are now more platforms for exploring ai, one of which is amazon's machine learning offerings on aws (amazon web services). these services have recently been opened up to a wider audience, and amazon is now offering everyone the same online courses it uses to train its own engineers.16 the aws ml resources are currently behind paywalls, but access to the training alone could be powerful for the right learner. there are even interesting developments for younger learners in ai with robotics. anki (www.anki.com) is a consumer robotics company that uses ai to enliven its products. they released vector in 2018: a seemingly simple toy that responds to its environment and simple commands with the aid of ai. with the release of their software development kit, the company is allowing others under the hood of its robots, which potentially means an entry point for autonomous (or semi-autonomous) robotic vehicle technology powered by ai. what is clear is that the world of ai is already upon us. public libraries are well positioned to help meet the challenge of developing the workforce of the near and far future, with ai classes being a vital tool. the doorway to artificial intelligence is now open; the only question that remains is this: do you step through it? references 1 cade metz, "google says its ai catches 99.9 percent of gmail spam," wired, july 9, 2015, https://www.wired.com/2015/07/google-says-ai-catches-99-9-percent-gmail-spam/. 2 jack clark, "google turning its lucrative web search over to ai machines," bloomberg business, october 26, 2015, https://www.bloomberg.com/news/articles/2015-10-26/google-turningits-lucrative-web-search-over-to-ai-machines. 3 rj reinhart, "most americans already using artificial intelligence products," gallup, march 6, 2018, https://news.gallup.com/poll/228497/americans-already-using-artificial-intelligenceproducts.aspx. 4 scot petersen, "google joins chorus of cloud companies promising to democratize ai," eweek, march 10, 2017, ebscohost academic search complete. 5 "artificial intelligence, n," oed online, december 2018, oxford university, accessed march 1, 2019.
6 bernard marr, “what is the difference between artificial intelligence and machine learning?,” forbes, december 6, 2016, https://www.forbes.com/sites/bernardmarr/2016/12/06/whatis-the-difference-between-artificial-intelligence-and-machine-learning/#6d40eeec2742. 7 lex fridman, “juergen schmidhuber: godel machines, meta-learning, and lstms,” mit ai podcast, december 22, 2018. 8 serdar yegulalp, “what is tensorflow? the machine learning library explained,” infoworld. june 6, 2018, https://www.infoworld.com/article/3278008/tensorflow/what-is-tensorflow-themachine-learning-library-explained.html. 9 frisco public library, 2019 “unpublished maker kit statistics 2016-2019.” 10 “aiy projects: voice kit,” google, accessed december 15, 2018, https://aiyprojects.withgoogle.com/voice/. 11 adam lamprecht, “google aiy voice box,” thingiverse, accessed february 14, 2019, https://www.thingiverse.com/thing:3247685. 12 elena ruchko, “why learn python? here are 8 data-driven reasons,” dbader.org, accessed february 14, 2019, https://dbader.org/blog/why-learn-python. 13 christina cardoza, “the python programming language grows in popularity,” sd times, june 15, 2017, https://sdtimes.com/artificial-intelligence/python-programming-language-growspopularity/. 14 yegulalp, “what is tensorflow? the machine learning library explained.” 15 adam lamprecht, email message to the author, february 15, 2019. 16 locklear mallory, “amazon opens up its internal machine learning training to everyone,” engadget, november 26, 2018, https://www.engadget.com/2018/11/26/amazon-opensinternal-machine-learning-training/. near-field communication (nfc): an alternative to rfid in libraries articles near-field communication (nfc) an alternative to rfid in libraries neeraj kumar singh information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.11811 neeraj kumar singh (neerajkumar78ster@gmail.com), phd, is deputy librarian, panjab university, chandigarh, india abstract libraries are the central agencies for the dissemination of knowledge. every library aspires to provide maximum opportunities to its users and ensure optimum utilization of available resources. hence, libraries have been seeking technological aids to improve their services. near-field communication (nfc) is a type of radio-frequency technology that allows electronics devices—such as computers, mobile phones, tags, and others—to exchange information wirelessly across a small distance. the aim of this paper is to explore nfc technology and its applications in modern era. the paper will discuss potential use of nfc in the advancement of traditional library management system. introduction similar to other identification technologies such as radio-frequency identification (rfid), barcodes, and qr codes, near-field communication (nfc) is a short-range (4–10 cm) wireless communication technology. nfc is based on the existing 13.56 mhz rfid contactless card standards which have been established for several years and are used for payment, ticketing, electronic passport, and access control among many other applications. data rates range from 106 to 424 kilobits per second. a few nfc devices are already capable of supporting up to 848 kilobits per second which is now being considered for inclusion in the nfc forum specifications. 1 compared to other wireless communication technologies nfc is designed for proximity or shortrange communication which provides a dedicated read zone and some inherent security. 
its 13.56 mhz frequency places it within the ism band, which is available worldwide. it is a bi-directional communication meaning that you can exchange data in both directions with a typical range of 4 – 10 cm depending on the antenna geometry and the output power.2 nfc is convenient and fast: the action is automatically triggered when your phone comes within 10 cm near the nfc tag and you get instant access to the content on mobile, without a single click.3 rfid and nfc technologies are similar in that both use radio waves. both rfid and nfc technologies exchange data within electronic devices in active mode as well as in passive mode. in the active mode, outgoing signals are basically those that actually come from the power source, whereas in case of passive mode the signals use the reflected energy they have received from the active signal. in rfid technology the radio waves can send information to receivers up to hundreds of meters away depending on the frequency of the band used by th e tag. if provided with high amount of power, these signals can also be sent to extreme distances (e.g., in the case of airport radar). at large airports it typically controls traffic within a radius of 100 kilometers of the airport below an elevation of 25,000 feet. rfid is also used very often in tracking animals and vehicles. mailto:neerajkumar78ster@gmail.com information technology and libraries june 2020 near field communication (nfc) | singh 2 in contrast, items like passports and payment cards should not be capable of long-distance transmissions because of the threat of theft of personal information or funds. nfc is designed to meet this need. nfc tags are very small in size so as to fit on the inner side of devices and products such as inside luggage, purses and packs as well as from inside wallets and clothing and can be tracked. nfc technology has added security features that make it much more secure than the previously popular rfid equivalent and it is difficult to steal information stored in it. nfc has short range of work area compared to other wireless technologies, so it can be widely used for payments, ticketing and service admittance and thus has proved to be a safer technology. it is because of this security feature that this technology is used in cellular phones to turn them into a wallet.4 both rfid and nfc wireless technologies can operate in active and passive communication modes to exchange data within electronic devices. the main differences between nfc and rfid are: • though both rfid and nfc use radio frequencies for communication, nfc can be said to be an extension of the rfid technology. the rfid technology has been in use for more than a decade, but nfc has emerged on the scene recently. • rfid has a wider range whereas nfc has limited communication and operates only at close proximity. nfc typically has a range of a few centimeters. • rfid can function in many frequencies and many standards are being used, but nfc requires a fixed frequency of 13.56 mhz, and some other fixed technical specifications to function properly. • rfid technology can be used for such applications as item tracking, automated toll collecting on roads, vehicle movement, etc., that require wide area signals. nfc is appropriate for applications that carry data that needs to be kept secure like mobile payments, access controls, etc., that carry sensitive information. • rfid operates over long distances while exchanging data wirelessly so it is not secure for the applications that store personalized data. 
rfid using items susceptible to various fraud attacks such as data corruption. nfc’s short working range considerably reduces this risk of data theft, eavesdropping, and “man in the middle” attacks. • nfc has the capability to communicate both ways and thus is suitable to be used for advanced interactions such as card emulation and peer-to-peer sharing. • a number of rfid tags can be scanned simultaneously, while only a single nfc tag can be scanned at a time. how nfc works the extended functionality of a traditional rfid system has led to the nfc forum. the nfc forum has defined three operating modes for nfc devices: tag reader/writer mode; peer-to-peer mode, and card emulation mode (see figure 1). the nfc forum technical specifications for the different operating modes are based on the iso/iec 18092 nfc ip-1, jis x 6319-4, and iso/iec 14443. these specifications must be used to derive the full benefit from the capabilities of nfc technology. contactless smart card standards are referred to as nfc-a, nfc-b, and nfc-f in nfc forum specifications.5 information technology and libraries june 2020 near field communication (nfc) | singh 3 figure 1. nfc operation modes6 reader/writer mode in reader/writer mode (see figure 2), an nfc-enabled device is capable of reading nfc forummandated tag types, such as a tag embedded in an nfc smart poster. this mode allows nfcenabled devices to read the information that is stored on nfc tags embedded in smart posters and displays. since these tags are relatively inexpensive, they provide a great marketing tool for companies. figure 2. reader mode7 the reader/writer mode on the radio frequency interface is compliant with the nfc-a, nfc-b, and nfc-f schemes. examples of its use include reading timetables, tapping for special offers, and updating frequent flyer points, etc.8 information technology and libraries june 2020 near field communication (nfc) | singh 4 peer-to-peer mode in peer-to-peer mode (see figure 3), both devices must be nfc-enabled in order for them to communicate with each other to exchange information and to share files. the users of nfcenabled devices can thus quickly share information and other files with a touch. as an example, users can exchange data such as digital photos or virtual business cards via bluetooth or wifi. figure 3. peer-to-peer mode9 peer-to-peer mode is based on the nfc forum’s logical link control protocol specification and is standardized on the iso/iec 18092 standard. card-emulation mode in card-emulation mode (see figure 4), an nfc device behaves like a contactless smart card so that users can perform transactions such as purchases, ticketing, and transit access control with just a touch. an nfc device may have the ability to emulate more than one card. in card-emulation mode, an nfc-enabled device communicates with an external reader much like a traditional contactless smart card. this allows contact less payments and ticketing by nfc-enabled devices without changing the existing infrastructure. information technology and libraries june 2020 near field communication (nfc) | singh 5 figure 4. card-emulation mode by adding nfc to a contactless infrastructure one can enable two-way communications. in the air transport sector, this could simplify many operations such as updating seat information while boarding or adding frequent flyer points while making a payment.10 nfc standards and specifications the nfc specifications are defined by an industry organization called the nfc forum, which has nearly 200 member companies. 
the nfc forum was formed in 2004 with the objective of advancing the use of nfc technology. this was achieved by educating the market about nfc technology and developing specifications to ensure interoperability among devices and services. the nfc forum members are working together in task forces and working groups. as noted earlier, nfc technology is based on existing 13.56 mhz rfid standards and includes several protocols such as iso 14443 type a and type b, and jis x 6319-4 (which is also a japanese industrial standard known as sony felica). the iso 15693 standard, an additional 13.56 mhz protocol established in the market, is being integrated into the nfc specification by an nfc forum task force. smartphones in the market are already supporting the iso 15693 protocol.11 these nfc specifications and especially the specifications for the extended nfc functionalities are again standardized by the international standard organizations like iso/iec ecma and etsi.12 initially the rfid standards i.e. iso/iec 14443 a, iso/iec 14443 b and jis x6319-4 were also pronounced as nfc standards by different companies working in the field such as nxp, infineon, and sony. the first ever nfc standard was ecma 340, based on the air interface of iso/iec 14443a and jis x6319-4. ecma 340 adapted the iso/iec standard 18092. at the same time, major credit card companies like europay, mastercard, and visa introduced the emvco payment standard, which is based on iso/iec 14443 a and iso/iec 14443 b. these groups harmonised the over-the-air interfaces within the nfc forum. they are named nfc-a (iso/iec 14443 a based), nfc-b (iso/iec 14443 b based), and nfc-f (felica based).13 information technology and libraries june 2020 near field communication (nfc) | singh 6 nfc tags an nfc tag is a small microchip embedded in a sticker or wristband that can be read by the mobile devices that are within range. information regarding the item is stored in these microchips.14 an nfc tag has the capability to send the information stored on it to nfc enabled mobile phones. nfc tags can also perform various actions, such as changing the settings of handsets or even launch a website.15 tag memory capacity varies by the type of tag. for example, a tag may store a phone number or a url.16 the most common use of the nfc tag function on an object is mobile wallet payment processing, where the user swipes or flicks a mobile phone on a nfc tag to make payment. google’s version of this system is google wallet.17 figure 5. a quick overview of the tag types18 applications of nfc since it emerged as a standard technology in 2003, nfc technology has been implemented across multiple platforms in various ways. the primary driving force behind nfc is its application in the commercial sector in which the implementation of the technology focuses on such areas as sales and marketing. there are also emerging many new and interesting applications in various other fields of education and healthcare. all of these may impact libraries, librarians, and library users, either by prompting adaptations to existing collections and services or inspiring innovation in our profession.19 • mobile payment: customers with nfc-enabled smartphones can link with their bank accounts and are able to pay by simply tapping phones to an nfc-enabled point-of-sale.20 information technology and libraries june 2020 near field communication (nfc) | singh 7 • access and authentication: “keyless access” to restricted areas, cars, and other vehicles. 
one can imagine other potential uses of nfc in the future with the devices in the home being controlled by it.21 • transportation and ticketing: nfc-enabled phones can connect with an nfc-enabled kiosk to download a ticket, or the ticket can be sent directly to an nfc-enabled phone over the air (ota). the phone can then tap a reader to redeem that ticket and gain access. 22 • mobile marketing: nfc tags they can be embedded into the indoor and outdoor signage. upon tapping their smartphone on an nfc-enabled smart poster, the customer can read a consumer review, visit a website, or even view a movie trailer. • healthcare: nfc medical cards and bracelet tags can store relevant, up-to-date patient information like health history, allergies, infectious diseases, etc. • gaming: nfc technology is the bridge between physical and digital games. players can tap each other’s phones together and earn extra points or receive access to a new level, or get clues, by using nfc application.23 • inventory tracking, smart packaging, and shelf labels: nfc-tagged objects could provide a wide variety of information in different use environments. nfc-enabled smartphones can be used to tap the tags to access book reviews and information about the book’s author and recommend the book to other readers. users could check out a book or add it to a wish list to check out at a later date. indeed, with nfc, library records and metadata could theoretically be stored on and retrieved from library physical holdings themselves, allowing a patron to tap a book or resource borrowed from the library to recall its title, author, and due date.24 applications of nfc in libraries: introducing the smart library some libraries are beginning to use nfc technology as an alternative to rfid. yusof et al. proposed a newly developed application called the smart library, or “s-library,” that has adopted the nfc technology.25 in the s-library, library users can perform many library transactions just by using their mobile smartphones with integrated nfc technology. the users of s-library are required to download and install an app in their compatible mobile phone. this app provides the user relevant and easy to use library functionality such as searching, borrowing, returning, and viewing their transaction records. in this s-library model the app is integrated with the library management software. the s-library app needs to be installed on the mobile device, and the mobile device requires an internet connection that will connect it to the lms. the s-library provides five major functionalities to the user: scan, search, borrow, return, and transaction history. in the scanning function, users can access the information of a book by simply touching their mobile phone to the nfc tag on the book. as soon as the phone touches the book, information regarding its title, author, contents, synopsis, etc. will automatically be displayed on the screen of the mobile device. users can search for books by entering keywords such as book title, author name, year, etc. through the borrowing function the app allows users to check out books of interest. the user just needs to touch their mobile phone to the nfc-tagged book to borrow it. the transaction is automatically stored to the lms database. similar to the borrowing process is the returning process. the user is required to select the return function on the menu and touch the mobile device to the book, and the returning transaction will be automatically performed and stored in the lms database. 
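to make the scan-and-borrow flow concrete, the following is a minimal sketch in python of how a tag read could be wired to a library management system. it is not the s-library code described by yusof et al.; it assumes the open-source nfcpy library driving a usb reader on a workstation (a phone app would use the platform's own nfc api), and the checkout endpoint, patron id, and the idea that the tag holds the item id in a text record are all hypothetical placeholders for whatever interface a given lms exposes.

import nfc        # nfcpy (assumed dependency: pip install nfcpy), drives a usb nfc reader
import requests   # used here to call a hypothetical lms rest endpoint

LMS_CHECKOUT_URL = "https://lms.example.org/api/checkout"  # placeholder url, not a real service

def on_connect(tag):
    # read the ndef text record that, in this sketch, holds the item's barcode/id
    if tag.ndef and tag.ndef.records:
        item_id = tag.ndef.records[0].text
        response = requests.post(LMS_CHECKOUT_URL,
                                 json={"item": item_id, "patron": "demo-patron"},
                                 timeout=10)
        print("checkout status:", response.status_code)
    return True  # tell nfcpy the tag has been handled

clf = nfc.ContactlessFrontend("usb")
try:
    # block until a tag is tapped on the reader, then run on_connect
    clf.connect(rdwr={"on-connect": on_connect})
finally:
    clf.close()

the same pattern, with the post request replaced by a "return" call, would cover the returning function described above.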
however, it should be ensured that the book is physically returned to the library through the library's nfc-enabled book drop system, and only then should the transaction be updated in the lms. the user can check the due date for the current transaction as well as their transaction history. the transaction history function allows the user to view the list of books that have been borrowed from time to time and their status.26 data transmission for nfc technology can be up to 848 kilobits/second, whereas the data transmission rate with rfid technology is 484 kilobits/second. taking advantage of this high data rate, the response time for s-library is also very fast. this is a huge improvement over rfid technology and especially over barcode technology, where the data transmission rate is variable and inconsistent and dependent upon the quality of the barcodes. the second key advantage of s-library is that the time taken to read a tag (the communication time between a reader and an nfc-enabled device) is very fast. the third advantage of nfc is its usability in comparison to the other two technologies. nfc technology is human-centric because it is intuitive and fast, and the user is able to use it anywhere, anytime, using their mobile phone. in rfid and barcode technology, usability is item-centric, as the person has to go to the specific device located in the library.27 most of the shortcomings of rfid and barcode technology have been overcome by the s-library. with barcode technology, the quality of barcodes, printing clarity, print contrast ratio, and also the low level of security were all challenges. rfid technology had many drawbacks, such as a lack of common rfid standards, security vulnerability, and reader and tag collision, which happens when multiple tags are energized by the rfid tag reader simultaneously and reflect their respective signals back to the reader at the same time. because nfc is touch based, it has presented a viable alternative tool for library users to overcome these weaknesses of the older technology. yusof et al. found many advantages to s-library: faster book borrowing; saved time for the user as well as the library staff; the connection can be initialised in less than a second; no configuration on the mobile device is required; and higher usability ratings and security.28 however, there are also some limitations of s-library. first, device compatibility is an issue, because s-library presently supports only the android platform. second, as the s-library application only supports up to a 10-centimeter range, coverage is an issue. mobile payments nfc technology can be used for several library functions such as making payments, paying library fines, purchasing tickets to library events, or donating to the library. users may also be able to use their digital wallet to pay for photocopying, printing, scanning, etc. keeping the requirements of nfc technology in mind, libraries will have to enquire about the possibility of adding nfc payment capabilities to existing hardware and consider it when purchasing new machines. already, bibliotheca's smartserv 1000 self-serve kiosk, introduced in september 2013, includes nfc as a payment option. other library automation companies would also be worth monitoring for nfc integration in the future.29 library access and authentication nfc-enabled devices can be used to access the library and authenticate users.
these capabilities suggest that nfc technology may play an important role in the next generation of identity management systems. of particular interest in this context are several applications of nfc in two-factor authentication, which generally combines a traditional password or other digital credential with a physical, nfc-enabled component as well. for example, an authentication system information technology and libraries june 2020 near field communication (nfc) | singh 9 could require the user to type in a fixed password in addition to tapping an nfc-enabled phone, identity card, or ring to the device they are logging in to. ibm has demonstrated a two -factor authentication method for mobile payment in which a user first types in a password and then taps an nfc-enabled credit card, issued by their bank, to their nfc-enabled smartphone. libraries could investigate similar access and authentication applications for nfc, both for internal use (staff badges and keys) as well as for public services. particularly if nfc mo bile payment finally gains consumer attraction, library patrons may begin to expect that they can use their nfc-enabled mobile devices to replace not just their credit cards but also their library cards. already, d-tech’s rfid air self check unit allows library patrons to log into their user accounts by tapping their nfc-enabled phone to the kiosk. the patron then uses the kiosk’s rfid reader to check out their library materials and receives a receipt via email or sms. beyond its application in circulation, nfc authentication can be applied to streamline access to other services and resources of the library.30 nfc-enabled devices could be used to make reservation of library spaces, classrooms, auditoriums or community halls, digital media labs, meeting rooms , etc. library users could use nfc authentication to be able to access digital library resources, such as databases, e-journals, e-books collections, and other digital collections. nfc might allow libraries of all kinds to provide more convenient access and authentication options to users, though privacy and security considerations would certainly need to be addressed. nfc access and authentication will certainly have an impact on academic libraries. at universities where nfc access systems are deployed, student identification cards can be replaced with nfc-enabled mobile phones for afterhours services such as library building entry, wifi access, and printing, copying, and scanning services. the inconvenience of multiple logins can be eliminated. however, the libraries will have to take the responsibility of protecting student information and library resources with added security.31 promotion of library services librarians can borrow ideas from commercial implementations of nfc-based marketing to enhance promotions for library resources, services, and events. as a first step, as kane and schneidewind suggested, nfc tags can complement several promotional uses of qr codes that have already been piloted or implemented in libraries. 32 for promotional use, libraries can easily embed nfc tags in their new book displays that can be linked to the bestseller list or current acquisitions lists in the library catalog or digital collections. similarly, if the reference book collection is tagged with nfc tags, it could be linked to the relevant digital collections of databases or e-books. 
nfc tags can be placed on library building doors or on library promotional material by which information such as library hours, opening days, schedule of events, membership rules , or floor plans for the building could be shared. as an example, at the renison university college library in ontario, canada, visitors can tap an nfc-enabled “library smartcard” to retrieve a digital brochure of library services in a variety of formats, including pdf, epub, and mp3.33 to promote outreach programs and events instead of merely sharing links the libraries can take advantage of nfc’s interactive capabilities. as an example, libraries could use nfc tags on their event posters so that the users that can scan them and register for an event, save the event to their personal calendar, join the friends of the library program, or even download a library app. to send a text message to a librarian the users can tap the smart poster promoting a virtual reference service. nfc-enabled promotional materials can engage users with library content even when they are outside of the library building itself. a brilliantly creative example was created by the field information technology and libraries june 2020 near field communication (nfc) | singh 10 museum of chicago. it used nfc-enabled outdoor smart posters throughout the city to promote an exhibit of the 1893 world’s fair. the event posters depicted a personage from 1893 that invited the viewer to “see what they saw.” users could tap their nfc-enabled mobile device to the smart poster (or read a qr code) to download an app from the field museum that included 360° images of the fair as well as videos highlighting items in the exhibition.34 inventory control the smart packaging use case brings forward a very important question for libraries that use rfid for inventory control. first, can existing rfid tags and infrastructure be leveraged to provide additional services to patrons with nfc-enabled mobile devices? the concept is not new; walsh envisioned using library rfid tags to store book recommendations or other digital information, which users could then access with a conveniently located rfid reader. 35 what nfc brings to walsh’s vision is that a dedicated rfid reader may no longer be necessary; a patron could use their own nfc-enabled smartphone to read a tag rather than taking it to a special location to be read. indeed, with nfc, library records and metadata could theoretically be stored on and retrieved from library physical holdings themselves, allowing a patron to tap a book or resource borrowed from the library to recall its title, author, and due date. an exciting and immediate use for nfc in libraries is for self-checkout: a patron can browse the stacks and could tap an nfctagged book with their nfc-enabled phone to check it out without visiting the circulation desk or waiting in line.36 smart packaging a sector close to librarians’ hearts is publishing and several publishers have started testing smart packaging for books, using embedded nfc tags to share additional content with readers such as book reviews, reading lists, etc. with digital extras, the concept of smart packaging has significant implications for libraries as a new opportunity to connect physical collections (i.e., from books to digital media). 
one can envision in the future that when a user taps an nfc-enabled library book they shall get access to relevant digital information (such as bibliographic information) in a variety of citation formats, editorial reviews, the author’s biography, a projected rating for the book, and links to other similar information. borrowing and returning books one of a library’s key functions is circulating physical books from the library’s collections. due to the low cost of barcode technology, many libraries around the world are using it for circulation management. however, barcode technology has several constraints: it requires a line-of-sight to the barcode, it does not provide security of library collection, it does not offer any benefit for collection management, and it is becoming challenging for libraries to satisfy the increasing demands of their users, for example, reservation of books issued out, checking their transaction history, etc. this leads to the need to implement a new technology to improve the library circulation management, inventory, and security of library collections. librarians are known as early adopters of technology and have started using rfid to provide circulation services in a more effective and efficient manner, for security of library collections, and to satisfy the increasing demands of the users, for example putting tags in books allows them to issue multiple books together by placing stack of books near a reader. information technology and libraries june 2020 near field communication (nfc) | singh 11 recommendations according to mchugh and yarmey, the implementation of nfc has been slow and unsteady and they do not foresee an immediate implementation in libraries.37 however, they recommend that librarians learn and prepare for nfc. they recommend, for example, that librarians: • follow the progress of research and scholarship on nfc and commercial progress of nfc technology to better anticipate its adoption in your community; • experiment with nfc technology and develop prototype applications for nfc use in the library; • offer an informational workshop on nfc for users and library colleagues; • enquire from the rfid vendor about tag compatibility with nfc and rewriting the tags; • monitor the progress of security and privacy aspects of nfc technology and educate the users about these issues; develop or update your library security policy; • allow patrons to “opt-in” to any nfc services at your library, providing other modes of communication where possible; • develop and share best practices for nfc implementations; and • support research on nfc in libraries via planning grants, research forums, and conference sessions. conclusions beyond the potential benefits of nfc, librarians should also be aware of and prepared for privacy and security concerns that accompany the technology. user privacy is of the utmost concern. nfc involves users’ mobile devices generating, collecting, storing, and sharing a significant amount of personal data. several of these functions, particularly mobile payment, necessitate the exchange of highly confidential data, including but not limited to a user’s financial accounts, purchase history, etc. spam may also be a concern; sending unwanted content (e.g., advertisements, coupons, or adware) to users’ mobile devices without their consent. librarians should also use special caution when considering the implementation of nfc for library promotions or services. 
security is a significant concern and an active area of research, as many nfc implementations involve the exchange of sensitive financial or otherwise personal data. an important concept in nfc security, particularly in the context of mobile payment, is the idea of a tamper-proof “secure element” as a basic protection for sensitive or confidential data such as account information and credentials for authentication.38 outside of continued standardization, the most effective measures for protecting n fc data transmissions are data encryption and the establishment of a secure channel between the sending and receiving devices (e.g., using a key agreement protocol and/or via ssl). for security concerns, as with privacy concerns, librarians have a crucial role to play in user education. there are important steps that individual users can and should take to protect their devices—e.g., setting a lock code for their device, knowing how to remotely wipe a stolen phone, and installing and regularly updating antivirus software. however, many users are unaware of the vulnerability of their mobile devices and often fail to enact even basic protections. by empowering objects and people to communicate with each other at a different level and establish a “touch to share” paradigm, nfc technology has the potential to transform the information technology and libraries june 2020 near field communication (nfc) | singh 12 information environment surrounding our libraries and fundamentally alter the ways in which the library patrons interact with information. endnotes 1 doaa abdel-gaber and abdel-aleem ali, “near-field communication technology and its impact in smart university and digital library: comprehensive study,” journal of library and information sciences, 3, no. 2 (december 2015): 43-77, https://doi.org/10.15640/jlis.v3n2a4. 2 “nfc technology discover what nfc is, and how to use it,” accessed march 17, 2019, https://www.unitag.io/nfc/what-is-nfc. 3 apuroop kalapala, “analysis of near field communication (nfc) and other short range mobile communication technologies” (project report, indian institute of technology, roorkee, 2013 ), accessed march 19, 2019, https://idrbt.ac.in/assets/alumni/pt2013/apuroop%20kalapala_analysis%20of%20near%20field%20communication%20(nfc) %20and%20other%20short%20range%20mobile%20communication%20technologies_2013. pdf. 4 ed, “near field communication vs radio frequency identification,” accessed march 10, 2019, http://www.nfcnearfieldcommunication.org/radio-frequency.html. 5 “what it does,” nfc forum, accessed march 12, 2019, https://nfc-forum.org/what-is-nfc/whatit-does. 6 josé bravo et al., “m-health: lessons learned by m-experiences,” sensors 18, 1569 (2018): 1–27. 10.3390/s18051569. 7 vedat coskun, busra ozdenizci, and kerem ok, “the survey on near field communication,” sensors 15, no. 6 (2015): 13348-405, https://doi.org/10.3390/s150613348. 8 coskun, ozdenizci, and ok, “the survey on near field communications,” 13352. 9 coskun, ozenizci, and ok, “the survey on near field communication.” 10 “how nfc works?,” cnrfid, accessed january 12, 2019, http://www.centrenationalrfid.com/how-nfc-works-article-133-gb-ruid-202.html. 11 coskun, ozdenizci, and ok, “the survey on near field communication,” 13352. 12 c. ruth, “nfc forum calls for breakthrough solutions for annual competition,” accessed march 21, 2019, https://nfc-forum.org/newsroom/nfc-forum-calls-for-breakthrough-solutions-forannual-competition/. 13 m. 
roland, "near field communication (nfc) technology and measurements," accessed may 12, 2019, https://cdn.rohdeschwarz.com/pws/dl_downloads/dl_application/application_notes/1ma182/1ma182_5e_nfc_white_paper.pdf. 14 roland, "near field communication (nfc) technology and measurements." 15 "what is a near field communication tag (nfc tag)?," techopedia, accessed may 27, 2019, https://www.techopedia.com/definition/28812/near-field-communication-tag-nfc-tag. 16 "what is meant by the nfc tag?," quora, accessed july 12, 2019, https://www.quora.com/what-is-meant-by-the-nfc-tag. 17 s. profis, "everything you need to know about nfc and mobile payments," accessed june 27, 2019, https://www.cnet.com/how-to/how-nfc-works-and-mobile-payments/. 18 "the 5 nfc tag types," accessed march 24, 2019, https://www.dummies.com/consumer-electronics/5-nfc-tag-types/. 19 abdel-gaber and ali, "near-field communication technology and its impact in smart university and digital library," 64–71. 20 iviane ramos de luna et al., "nfc technology acceptance for mobile payments: a brazilian perspective," review of business management 19, no. 63 (2017): 82–103, https://doi.org/10.7819/rbgn.v0i0.2315. 21 rajiv, "applications and future of near field communication," accessed march 14, 2019, https://www.rfpage.com/applications-near-field-communication-future/. 22 "nfc in public transport," nfc forum, accessed april 12, 2019, http://www.smart-ticketing.org/downloads/papers/nfc_in_public_transport.pdf. 23 "gaming applications with rfid and nfc technology," smarttech, accessed may 14, 2019, https://www.smarttec.com/en/applications/gaming. 24 sheli mchugh and kristen yarmey, "near field communication: recent developments and library implications," synthesis lectures on emerging trends in librarianship 1, no. 1 (march 2014), 1–93. 25 m.k.
yusof et al., "adoption of near field communication in s-library application for information science," new library world 116, no. 11/12 (2015): 728–47, https://doi.org/10.1108/nlw-02-2015-0014. 26 yusof et al., "adoption of near field communication," 734–36. 27 yusof et al., "adoption of near field communication," 744. 28 yusof et al., "adoption of near field communication," 745. 29 abdel-gaber and ali, "near-field communication technology and its impact in smart university and digital library," 64. 30 mchugh and yarmey, "near field communication," 27. 31 mchugh and yarmey, "near field communication," 734. 32 danielle kane and jeff schneidewind, "qr codes as finding aides: linking electronic and print library resources," public services quarterly 7, no. 3–4 (2011): 111–24, https://doi.org/10.1080/15228959.2011.623599. 33 mchugh and yarmey, "near field communication," 31. 34 mchugh and yarmey, "near field communication," 31. 35 andrew walsh, "blurring the boundaries between our physical and electronic libraries: location-aware technologies, qr codes and rfid tags," the electronic library 29, no. 4 (2011): 429–37, https://doi.org/10.1108/02640471111156713. 36 projes roy and shailendra kumar, "application of rfid in shaheed rajguru college of applied sciences for women library, university of delhi, india: challenges and future prospects," qualitative and quantitative methods in libraries 5, no. 1 (2016): 117–130, http://www.qqml-journal.net/index.php/qqml/article/view/310. 37 mchugh and yarmey, "near field communication," 61–62. 38 garima jain and sanjeet dahiya, "nfc: advantages, limits and future scope," international journal on cybernetics & informatics 4, no. 4 (2015): 1–12, https://doi.org/10.5121/ijci.2015.4401. articles assessing the effectiveness of open access finding tools teresa auch schultz, elena azadbakht, jonathan bull, rosalind bucy, and jeremy floyd information technology and libraries | september 2019 teresa auch schultz (teresas@unr.edu) is social sciences librarian, university of nevada, reno.
elena azadbakht (eazadbakht@unr.edu) is health sciences librarian, university of nevada, reno. jonathan bull (jon.bull@valpo.edu) is scholarly communications librarian, valparaiso university. rosalind bucy (rbucy@unr.edu) is research & instruction librarian, university of nevada, reno. jeremy floyd (jfloyd@unr.edu) is metadata librarian, university of nevada, reno. abstract the open access (oa) movement seeks to ensure that scholarly knowledge is available to anyone with internet access, but being available for free online is of little use if people cannot find open versions. a handful of tools have become available in recent years to help address this problem by searching for an open version of a document whenever a user hits a paywall. this project set out to study how effective four of these tools are when compared to each other and to google scholar, which has long been a source of finding oa versions. to do this, the project used open access button, unpaywall, lazy scholar, and kopernio to search for open versions of 1,000 articles. results show none of the tools found as many successful hits as google scholar, but two of the tools did register unique successful hits, indicating a benefit to incorporating them in searches for oa versions. some of the tools also include additional features that can further benefit users in their search for accessible scholarly knowledge. introduction the goal of open access (oa) is to ensure as many people as possible can read, use, and benefit from scholarly research without having to worry about paying to read and, in many cases, restrictions on reusing the works. however, oa scholarship helps few people if they cannot find it. this is especially problematic for green oa works, which are those that have been made open by being deposited in an open online repository even if they were published in a subscription-based journal. opendoar reports more than 3,800 such repositories.1 as users are unlikely to search each individual repository, an efficient search method is needed to find the oa items spread across so many locations. in recent years, several browser extensions have been released that allow a user to search for an open version of an article while on a webpage for that article. the tools include: • lazy scholar, a browser extension that searches google scholar, pubmed, europepmc, doai.io, and dissem.in. it has extensions for both the chrome and firefox browsers.2 • open access button, which uses both a website and a chrome extension to search for oa versions.3 • unpaywall, which also acts through a chrome extension to search for open articles via the digital object identifier.4 • kopernio, a browser extension that searches subject and institutional repositories and is owned by clarivate analytics. kopernio has extensions for chrome, firefox, and opera.5 some of the tools offer other services, such as open access button's ability to help the user email the author of an article if no open version is available, as well as integration with libraries' interlibrary loan workflows.
kopernio and lazy scholar offer to sync with a user’s institutional library to see if an article is available through the library’s collection.6 although other similar extensions might also exist, this article is focused on the four mentioned above based on the authors’ knowledge of available oa finding tools at the time of the project. literature review as noted above, scholars have indicated for several years a need for reliable and user-friendly methods, systems, or tools that can help researchers find oa materials. bosman et al. forwarded the idea of a scholarly commons—a set of principles, practices, and resources to enable research openness—that depends upon clear linkages between digital research objects.7 bulock notes that oa has “complicated” retrieval in that oa versions are often housed in various locations across the web, including institutional repositories (irs), preprint servers, and personal websites. 8 there is no perfect search option or tool, although some have tried creating solutions, such as the open jericho project from wayne state university, which is seeking to create an aggregator to search institutional repositories and eventually other sources as well.9 however, this lack of a central search tool can lead to confusion among researchers.10 nicholas and colleagues found that their sample of early career scholars drawn from several countries relied heavily on google and google scholar to find articles that interested them.11 many also turn to researchgate and other social media platforms and risk running afoul of copyright. the results of ithaka s+r’s 2015 survey of faculty in the united states reflect these findings to a certain extent, as variations exist between researchers in different disciplines.12 a majority of the respondents also indicated an affinity for freely accessible materials. as more researchers become aware of and gravitate toward oa options, the efficacy of various discovery tools, such as the browser extensions evaluated in this study, will become even more pertinent. previous studies on the findability of oa scholarship have focused primarily on google and google scholar.13 a few have assessed tools such as oaister, opendoar, and pubmed central.14 norris, oppenheim, and rowland sought a selection of articles using google, google scholar, oaister, and opendoar.15 while oaister and opendoar found just 14 percent of the articles’ open versions, google and google scholar combined managed to locate 86 percent. jamali and nabavi assessed google scholar’s ability to retrieve the full text of scholarly publications and documented the major sources of the full-text versions (publisher websites, institutional repositories, researchgate, etc.).16 google scholar was able to locate full-text versions of more than half (57.3 percent) of the items included in the study. most recently, martin-martin et al. likewise used google scholar to gauge the availability of oa documents across different disciplines.17 they found that roughly 54.6 percent of the scholarly content for which they searched was freely available, although only 23.1 percent of their sample were oa by virtue of the publisher. as of yet, no known studies have systematically evaluated the growing selection of open access tools’ efficiency and effectiveness at retrieving oa versions of articles. 
however, several scholars and journalists have reviewed these new tools, especially the more established open access button and unpaywall.18 these reviews were mostly positive, even as some acknowledged that the tools are not a wholescale solution for locating oa publications. despite pointing out these tools’ information technology and libraries | september 2019 84 limitations, reviewers voiced their hope that the oa finding tools could help disrupt the traditional scholarly publishing industry.19 at least one study has used the open access button to determine the green oa availability of journal articles. emery used the tool as the first step to identify oa article versions and then searched individual institutional repositories, followed by google scholar as the final steps.20 emery found that 22 percent of the study sample was available as green oa but did not say what portion of that was found by the open access button. emery did note that the open access button returned 17 false positives (six in which the tool took the user to the wrong article or other content, and 11 in which it took the user to a citation of the article with no full text available). she also found at least 38 cases of false-negative returns from the open access button, or articles that were openly available that the tool failed to find. the study did not count open versions found on researchgate or academia.edu. methodology oa finding tools this study compared the chrome browser extensions for google scholar and four oa finding tools: lazy scholar, unpaywall, open access button, and kopernio. each extension was used while in the chrome browser to search for open versions of the selected articles and the success of each extension in finding any free, full version was recorded. the authors did not track whether an article was licensed for reuse. for the four oa finding tools, the occurrences of false positives (e.g., the retrieval of an error page, a paywalled version, or the wrong article entirely) were also tracked. false positives were not tracked for google scholar, which does not purport to find only open versions of articles. data collection occurred over a six-week period in october and november 2018. the authors used web of science to identify the test articles (n=1,000) with the aim of selecting articles that would give the tools the best chance for finding a high number of open versions. articles selected were published in 2015 and 2016. these years were selected in order to try to avoid embargoes that might have prevented articles being made open through deposit. the articles were selected from two disciplines: applied physics and oncology, both of which have a large share in web of science and come from a broader discipline with a strong oa culture.21 each comparison began with searching the google scholar extension by article doi or title if a doi was not available. all versions retrieved by google scholar were examined until an open version was located or until the retrieved versions were exhausted. the remaining oa tools were then tested from the webpage for the article record on the journal’s website (if available). if no journal page was available, the article pdf page was tested. all data were recorded in a shared google sheet according to a data dictionary. searches for open versions of paywalled articles were performed away from the authors’ universities to ensure the institutions’ subscriptions to various journals did not impact the results. 
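the study gathered its data by operating each tool’s browser extension by hand, but the doi-based lookup that unpaywall performs can also be scripted against its public rest api. the sketch below is only an illustration of that kind of lookup, not part of the study’s method; the endpoint, email parameter, and response field names reflect the unpaywall api as generally documented and should be checked against the current documentation before use.

```python
# a minimal sketch of a scripted doi lookup against the unpaywall rest api.
# this is not how the study gathered its data (the authors worked through each
# tool's browser extension by hand); it only illustrates the doi-based lookup
# that unpaywall performs. the endpoint, email parameter, and field names are
# as generally documented and should be verified against the current api docs.
import requests

UNPAYWALL_API = "https://api.unpaywall.org/v2/{doi}"
CONTACT_EMAIL = "you@example.edu"  # unpaywall asks callers to identify themselves

def find_open_version(doi):
    """return a url for an open copy of the article, or None if none is known."""
    response = requests.get(
        UNPAYWALL_API.format(doi=doi),
        params={"email": CONTACT_EMAIL},
        timeout=30,
    )
    response.raise_for_status()
    record = response.json()
    location = record.get("best_oa_location")  # None when no open copy is known
    if record.get("is_oa") and location:
        return location.get("url_for_pdf") or location.get("url")
    return None

if __name__ == "__main__":
    # doi of an article cited in this issue, used here only as a test value
    print(find_open_version("10.1080/15228959.2011.623599"))
```

a batch of dois could be pre-screened with a function like this, subject to the same polite rate limits that constrained the manual searching described here.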
authors were limited in the number of articles they could search each day as some tools blocked continued use, presumably over concerns of illegitimate web activity, after as few as 15 searches. study limitations this methodology might have missed open versions of articles, even using these five search tools. although studies have found google scholar to be one of the most effective ways of searching for assessing the effectiveness of open access finding tools |auch schultz, azadbakht, et al. 85 https://doi.org/10.6017/ital.v38i3.11109 open versions, way has shown that it is not perfect.22 therefore, it is possible that this study undercounted the number of oa articles. the study tested the ability of oa finding tools to locate open articles from a journal’s main article page, not other possible webpages (e.g., the google scholar results page). this design may have limited the effectiveness of some tools, such as kopernio, which appear to work well with some webpages but not others. results overall, the tools found open versions for just less than half of the study sample (490), whereas they found no open versions for 510 articles. although lazy scholar, unpaywall, open access button, and kopernio all found open versions, google scholar returned the most with 462 articles (94 percent of all articles with at least one open version). open access button, lazy scholar, and unpaywall all found a majority of the open articles (62 percent, 73 percent, and 67 percent, respectively); however, kopernio found open versions for just 34 percent of the articles (see figure 1). figure 1. number of open versions found by each tool. it was most common for three or more of the tools to find an open version for an article, with just 48 found by two tools and 98 found by only one tool (see figure 2). information technology and libraries | september 2019 86 figure 2. number of articles where x number of oa finding tools found an open version. when looking at articles where only one tool returned an open version, google scholar had the highest results (84). open access button (4) and lazy scholar (10) also returned unique hits, but unpaywall and kopernio did not. open access button returned the most false positives with 46, or nearly 5 percent of all 1,000 articles. lazy scholar returned 31 false positives (3 percent), unpaywall returned 14 (1 percent), and kopernio returned 13 (1 percent). discussion the results for the oa search tools show that while all four options met with some success, none of them performed as well as google scholar. three of the tools—lazy scholar, open access button, and unpaywall—did find at least half or more of the open versions that google scholar did. it is important to note that open access button, which found the second fewest open versions, does not search researchgate and academia.edu because of legal concerns over article versions that are likely infringing copyright.23 this could have affected open access button’s performance. likewise, kopernio’s lower percentage of finding oa resources might relate to concerns over article versions as well. when creating an account on kopernio, the user is asked to affiliate themselves with an institution so that the tool can search existing library subscriptions at that institution. for this study, the authors did not affiliate with their home institutions when setting up kopernio to get a better idea of which content was open as opposed to content being accessible because of the tool connecting to a library’s subscription collection. 
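as a brief aside on tabulation: the figures above were tallied from the shared results sheet, and a short script along the following lines could produce the same per-tool summaries (open versions found, unique hits, false positives) from a csv export. the column names and cell values shown are hypothetical, since the study’s actual data dictionary is not reproduced in this article.

```python
# a minimal sketch of tallying the per-tool figures reported above from a csv
# export of the shared results sheet. column names ("google_scholar",
# "unpaywall", etc.) and cell values ("open", "none", "false_positive") are
# hypothetical; the study's actual data dictionary may differ.
import csv
from collections import Counter

TOOLS = ["google_scholar", "lazy_scholar", "open_access_button", "unpaywall", "kopernio"]

def summarize(path):
    hits, unique_hits, false_positives = Counter(), Counter(), Counter()
    total = 0
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            total += 1
            found_by = [tool for tool in TOOLS if row.get(tool) == "open"]
            for tool in found_by:
                hits[tool] += 1
            if len(found_by) == 1:  # an open version located by exactly one tool
                unique_hits[found_by[0]] += 1
            for tool in TOOLS:
                if row.get(tool) == "false_positive":
                    false_positives[tool] += 1
    for tool in TOOLS:
        print(f"{tool}: {hits[tool]} open versions, {unique_hits[tool]} unique hits, "
              f"{false_positives[tool]} false positives across {total} articles")

if __name__ == "__main__":
    summarize("oa_tool_results.csv")
```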
if the authors were to identify assessing the effectiveness of open access finding tools |auch schultz, azadbakht, et al. 87 https://doi.org/10.6017/ital.v38i3.11109 with an institution, the number of accessible articles would likely increase, but this access would not be a true representation of what open content is discoverable. in addition, some tools might work better with certain publishers than others. for instance, kopernio did not appear to work with spandidos publications, a leading biomedical science publisher that publishes much of its content as gold oa, meaning the entire journal is published as oa. kopernio found just one open version of a spandidos article, compared to 153 by google scholar. this could be an unintentional malfunction either with spandidos or kopernio, which if fixed, could greatly increase the efficacy of this finding tool. however, open access button, lazy scholar, unpaywall, and google were able to find oa publications from spandidos at similar rates (135, 138, and 139, respectively) with no false positives. while none of the tools performed as well as google scholar, some of the tools were easier to use compared to google scholar. google scholar does not automatically show an open version first; instead, users often have to first select the “all x versions” option at the bottom of each record and then open each version until they find an open version. lazy scholar and unpaywall appear (for the most part) automatically, meaning users can see right away if an open version is available and then click a button once to be taken to that version. although open access button and kopernio do not show automatically if they have found an open version, users need to click a button on their toolbar once to activate each tool and see if the tool was able to find an open version. open access button also provides the extra benefit of making it easy for users to email authors to make their works open if an open version is not already available. relying on lazy scholar, unpaywall, or open access button first causes users no harm, and they can always rely on google scholar as a backup. whether all four tools are needed is questionable. for instance, a few of the authors found kopernio difficult to work with as it seemed to be incompatible with at least one publisher’s website and it introduced extra steps in downloading a pdf file. the fact that it also returned by far the fewest open versions—just 36 percent of the ones google scholar found and no unique hits—does not argue well for users to include it in their oa finding toolbox. also, while lazy scholar, unpaywall, and open access button all performed better on their own, the authors wonder what improvements could be created by combining the resources of the individual tools. conclusion the growth of oa finding tools is encouraging to see as far as helping to make oa works more discoverable. although the study showed that google scholar uncovered more articles than any of the other tools, the utility of at least two of the tools—lazy scholar and open access button—can still be seen in that both found articles not discovered by the other tools, including google scholar. indeed, using the tools in conjunction with one another appears to be the best method. and although open access button found the second fewest articles, the tool’s effort to integrate with interlibrary loan and discovery workflows, as well as its concern about legal issues are all promising for its future. 
likewise, kopernio might be a better tool for those interested in combining access to a library collection—which likely has a large number of final, publisher versions of scholarship—with their search for openly available scholarship. future studies can include newer oa finding tools that have entered the market, as well as evaluate the user experience of the tools. another study can also look at how well open access button’s author email feature works. also, as open access button and unpaywall continue to move into new areas, such as interlibrary loan support, research could explore if these are more effective ways of connecting users to oa material as well as measure users’ understanding of oa versions they find. overall, the emergence of oa finding tools offers much potential for increasing the visibility of oa versions of scholarship, although no tool is perfect. however, if scholars wish to support oa through their research practices or find themselves unable to purchase or legally acquire the publisher's version, each of these tools can be valuable additions to their work.
data statement
the data used for this study has been shared publicly in the zenodo database under a cc-by 4.0 license at https://doi.org/10.5281/zenodo.2602200.
endnotes
1 jisc, “browse by country and region,” accessed february 15, 2019, http://v2.sherpa.ac.uk/view/repository_by_country/countries_by_region.html.
2 colby vorland, “extension,” accessed march 14, 2019, http://www.lazyscholar.org/; colby vorland, “data sources,” lazy scholar (blog), accessed march 14, 2019, http://www.lazyscholar.org/data-sources/.
3 “avoid paywalls, request research,” open access button, accessed march 14, 2019, https://openaccessbutton.org/.
4 unpaywall, “browser extension,” accessed march 14, 2019, https://unpaywall.org/products/extension.
5 kopernio, “faqs,” accessed march 14, 2019, https://kopernio.com/faq.
6 colby vorland, “features,” lazy scholar (blog), accessed march 14, 2019, http://www.lazyscholar.org/category/features/.
7 jeroen bosman et al., “the scholarly commons—principles and practices to guide research communication,” open science framework, september 15, 2017, https://doi.org/10.17605/osf.io/6c2xt.
8 chris bulock, “delivering open,” serials review 43, no. 3–4 (october 2, 2017): 268–70, https://doi.org/10.1080/00987913.2017.1385128.
9 elliot polak, email message to author, june 4, 2019.
10 bulock, “delivering open.”
11 david nicholas et al., “where and how early career researchers find scholarly information,” learned publishing 30, no. 1 (january 1, 2017): 19–29, https://doi.org/10.1002/leap.1087.
12 christine wolff, alisa b. rod, and roger c. schonfeld, “ithaka s+r us faculty survey 2015,” 2015, 83, https://sr.ithaka.org/publications/ithaka-sr-us-faculty-survey-2015/.
13 mamiko matsubayashi et al., “status of open access in the biomedical field in 2005,” journal of the medical library association 97, no. 1 (january 2009): 4–11, https://doi.org/10.3163/1536-5050.97.1.002; michael norris, charles oppenheim, and fytton rowland, “the citation advantage of open-access articles,” journal of the american society for information science and technology 59, no. 12 (october 1, 2008): 1963–72, https://doi.org/10.1002/asi.20898; doug way, “the open access availability of library and information science literature,” college & research libraries 71, no. 4 (2010): 302–09; charles lyons and h. austin booth, “an overview of open access in the fields of business and management,” journal of business & finance librarianship 16, no. 2 (march 31, 2011): 108–24, https://doi.org/10.1080/08963568.2011.554786; hamid r. jamali and majid nabavi, “open access and sources of full-text articles in google scholar in different subject fields,” scientometrics 105, no. 3 (december 1, 2015): 1635–51, https://doi.org/10.1007/s11192-015-1642-2; alberto martín-martín et al., “evidence of open access of scientific publications in google scholar: a large-scale analysis,” journal of informetrics 12, no. 3 (august 1, 2018): 819–41, https://doi.org/10.1016/j.joi.2018.06.012.
14 norris, oppenheim, and rowland, “the citation advantage of open-access articles”; michael norris, fytton rowland, and charles oppenheim, “finding open access articles using google, google scholar, oaister and opendoar,” online information review 32, no. 6 (november 21, 2008): 709–15, https://doi.org/10.1108/14684520810923881; maria-francisca abad‐garcía, aurora gonzález‐teruel, and javier gonzález‐llinares, “effectiveness of openaire, base, recolecta, and google scholar at finding spanish articles in repositories,” journal of the association for information science and technology 69, no. 4 (april 1, 2018): 619–22, https://doi.org/10.1002/asi.23975.
15 norris, rowland, and oppenheim, “finding open access articles using google, google scholar, oaister and opendoar.”
16 jamali and nabavi, “open access and sources of full-text articles in google scholar in different subject fields.”
17 martín-martín et al., “evidence of open access of scientific publications in google scholar.”
18 stephen curry, “push button for open access,” the guardian, november 18, 2013, sec. science, https://www.theguardian.com/science/2013/nov/18/open-access-button-push; bonnie swoger, “the open access button: discovering when and where researchers hit paywalls,” scientific american blog network, accessed may 30, 2017, https://blogs.scientificamerican.com/information-culture/the-open-access-button-discovering-when-and-where-researchers-hit-paywalls/; lindsay mckenzie, “how a browser extension could shake up academic publishing,” chronicle of higher education 68, no. 33 (april 21, 2017): a29–a29; joyce valenza, “unpaywall frees scholarly content,” school library journal 63, no. 5 (may 2017): 11–11; barbara quint, “must buy? maybe not,” information today 34, no. 5 (june 2017): 17–17;
michaela d. willi hooper, “product review: unpaywall [chrome & firefox browser extension],” journal of librarianship & scholarly communication 5 (january 2017): 1–3, https://doi.org/10.7710/2162-3309.2190; terry ballard, “two new services aim to improve access to scholarly pdfs,” information today 34, no. 9 (november 2017): cover-29; diana kwon, “a growing open access toolbox,” the scientist, accessed december 11, 2017, https://www.the-scientist.com/?articles.view/articleno/51048/title/a-growing-open-access-toolbox/; kent anderson, “the new plugins — what goals are the access solutions pursuing?,” the scholarly kitchen, august 23, 2018, https://scholarlykitchen.sspnet.org/2018/08/23/new-plugins-kopernio-unpaywall-pursuing/.
19 curry, “push button for open access”; swoger, “the open access button”; mckenzie, “how a browser extension could shake up academic publishing”; kwon, “a growing open access toolbox.”
20 jill emery, “how green is our valley?: five-year study of selected lis journals from taylor & francis for green deposit of articles,” insights 31, no. 0 (june 20, 2018): 23, https://doi.org/10.1629/uksg.406.
21 anna severin et al., “discipline-specific open access publishing practices and barriers to change: an evidence-based review,” f1000research 7 (december 11, 2018): 1925, https://doi.org/10.12688/f1000research.17328.1.
22 way, “the open access availability of library and information science literature.”
23 open access button, “open access button library service faqs,” google docs, accessed february 19, 2019, https://docs.google.com/document/d/1_hwkryg7qj7ff05-cx8kw40ml7exwrz6ks5fb10gegg/edit?usp=embed_facebook.
article
user experience testing in the open textbook adaptation workflow: a case study
camille thomas, kimberly vardeman, and jingjing wu
information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.12039
camille thomas (cthomas5@fsu.edu) is scholarly communications librarian, florida state university. kimberly vardeman (kimberly.vardeman@ttu.edu) is user experience librarian, texas tech university. jingjing wu (jingjing.wu@ttu.edu) is web librarian, texas tech university. © 2021.
abstract
as library publishers and open education programs grow, it is imperative that we integrate practices in our workflows that prioritize and include end users. although there is information available on best practices for user testing and accessibility compliance, more can be done to give insight into the library publishing context. this study examines the user and accessibility testing workflow during the modification of an existing open textbook using pressbooks at texas tech university.
introduction
as library publishers and open education programs grow, there is an opportunity to integrate into our workflows practices that prioritize and include end users. although there is information available on best practices for user testing and accessibility compliance, more can be done to give insight into the library publishing context. there are currently no case studies that examine the user and accessibility testing workflow during the modification of an existing open textbook. this study examines user experience testing as a method to improve oer interfaces, learning experience, and accessibility during the oer production process using pressbooks at a large research university.
literature review
user experience (ux) is a “momentary, primarily evaluative feeling (good–bad) while interacting with a product or service” that can go beyond simple usability evaluations to consider “qualities such as meaning, affect and value.”1 ux evaluations are generally applied to library websites, spaces, and interfaces and are not currently a common element in library publishing workflows. open educational resources (oer) are defined as teaching, learning, and research resources that reside under an intellectual property license that permits their free use and repurposing by others.2 whitfield and robinson make a distinction between teaching vs. learning resources, instructional vs. interface usability, and ease of modification for creators.3 this select literature review considers usability testing of e-books, oer workflows, and accessibility evaluations and how they apply to local contexts.
along with incentives for instructors to engage with oer, the ability to adapt oer is o ften highlighted as a benefit. walz shares common workflows for oer production, including broad steps for design and development during creation of original oer.4 in the case of reuse, the design stage in walz’s workflow includes review, redesign, redevelopment, and adoption. open university models for transforming oer include the integrity model, in which the new oer remains close to the original material; the essence model, in which material is transformed by mailto:cthomas5@fsu.edu mailto:kimberly.vardeman@ttu.edu mailto:jingjing.wu@ttu.edu information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 2 reducing some features and adding new activities for interactivity; and the remix model, in which content is redesigned to be optimal for web viewing.5 student participation in oer production is often seen in open pedagogy, but these cases look at student frustrations and feedback with the objective of experiential learning, not for usability or evaluation.6 now that oer production has grown in scale, librarians and advocates seek the most effective and sustainable workflows. figure 1. illustrations of two oer lifecycles. this work was adapted by camille thomas from an original by anita walz (2015) under a cc by 4.0 international license.7 in his workflow and framework analysis meinke recommends the inclusion of more discrete steps and believes each institution’s ideal workflow will be based on local context.8 usability testing is a discrete workflow step that gives us human-centered insight about how users are affected by interfaces and how they value systems.9 libraries favor collections-based assessment that measures how many end users are using digital items, without prioritizing who users are or how and why they use resources.10 demonstrating and assessing value is essential for scholar buy-in and content recruitment, for example, which are central to all types of open resources. in the case of educational materials, lack of engagement and breakdowns in learning can be attributed to barriers and marginalization of learners.11 additionally, critiques of oer include assumptions that access to information equates to meaningful access to knowledge, but withou t context there is no guarantee that there will be meaningful transference or learning.12 harley believes defining a target audience and considering reasons for use and nonuse of resources in specific educational contexts beyond measuring anecdotal demand (e.g., website page views or survey responses, which harley does not see as indicators of value but rather of popularity) may address challenges to effectively measuring outcomes for content that is freely available on the web.13 meaningful evaluation of learning resources requires deep understanding of contextualized information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 3 educator and student needs, not just content knowledge.14 to address these barriers and assumptions, openstax, a leading oer publisher, has ux experts on staff, but this model is exceptional and rare at a university or library. many universities and libraries publishing oer do not have full-time personnel dedicated to review. 
some library user experience departments have hired content strategists for auditing, analyzing, defining, and designing website content, contentrelated projects, and overall content strategy.15 currently, oer work is rarely included in the scope of library user experience departments. however, limited literature does show the use of ux research methods in library publishing contexts. libraries and support units with few resources can also perform user testing.16 user experience practitioners have established that a low number of test participants—three to five—are enough to identify a majority of usability issues.17 borsci et al. suggest the important aspect is not securing a high-volume sample, but rather finding behavior diversity to make reliable evaluations.18 the number of users required to find diverse behavior can depend on what is being tested. following this standard, the consistent inclusion of user evaluations in oer workflows will not necessarily require large amounts of funding, participants, resources, or time. oer, in particular, are well suited to cumulative, early, frequent, and specific user testing. with their open copyright licensing and availability, oer are an example of the mutable digital content needed for collaboration, cumulative production, and support of networked communities. 19 several studies assert that complete information behavior analysis should be carried out before or during development, not after.20 meinke concludes his workflow analysis by encouraging iterative release in oer production workflows, which aligns with lean and iterative “guerilla” approaches used in libraries to sustainably improve usability.21 iteration is a process of development that involves cyclical inquiry or repetitive testing, providing multiple opportunities for people to revisit ideas or deliverables in order to improve an evolving design until the end goal is reached.22 in the context of design, it is a method of creating high-quality deliverables at a faster rate.23 a cyclical approach also reflects walz’s as well as blakiston and mayden’s workflow visualizations.24 walz asserts that incentives for instructors and the quality of the resources are key factors in advancing adoption, adaptation, and creation of oer.25 harley uncovered disconnects between what faculty say they need in undergraduate education and what those who produce digital educational resources imagine is an ideal state. 26 influence on faculty resource use, including oer, varied by discipline, teaching style and philosophy, and digital literacy levels, with personal preferences having the most influence on decision-making. in the evaluation or tracking stage found in most oer production workflows, we can see the impact of the quality assurance stage. the study by woodward et al. on student voice in evaluating textbooks found that incorporating multiple stakeholders into the process resulted in deeper exploration of students’ expectations when learning. students ranked one oer and one conventional textbook the highest based on content, design, pedagogy, and cost. multiauthored options ranked higher, and texts with examples were seen as more beneficial for distance learners.27 meinke believes unless discrete parts of the development process are identified, it is not useful to signal others to contribute to a project.28 an example of an oer production workflow containing usability considerations is the content, openness, reuse & repurpose, and evidence (corre) framework by the university of leicester (see fig. 
2).29 the openness phase of the corre 2.0 framework includes “transformation for usability,” which is assigned to the oer evaluator, editorial board, or head of department.30 versions of the corre workflow were adapted by the information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 4 university of bath and the university of derby in the united kingdom for their institutional contexts. for example, the university of derby assigned “transformation for usability” to a developer. by building usability as a discrete step in oer production workflows, publishers and collaborators can make improvements on pain points, make changes in context, and create clear guidelines for partnerships based on local needs. betz and hall’s study supports considerations for how user microinteractions, or individual actions within a process, can be improved to make them scalable and commonplace in library workflows.31 this can include publishing workflows. for example, a study of oer on mobile devices in brazil found problems related to performance, text legibility, and trigger actions in small objects.32 other guidelines for oer and usability include using high-quality multimedia source material, advanced support from educational technologists, and balancing high and low technology in order to avoid assumptions about learners’ internet connection or devices.33 although usability testing alone is an important part of evaluating a website or product, because the user experience is multifaceted, it is also important to ensure that the product is accessible, meets user needs, and has an appealing design.34 figure 2. corre framework for oer development at the university of leicester.35 accessibility studies also encourage integrating user interactions into the creation workflow. accessibility impacts usability, findability, and holistic user experience. 36 creators and supporting advocates have relied on universal design, web standards, and ada compliance when creating information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 5 accessible digital content, emphasizing that accessible content for those with disabilities means more accessible content for all users.37 areas addressed can include text style and formatting, linking, and alternative text considerations, as detailed in the bccampus open education accessibility toolkit.38 for example, kourbetis and boukouras drew from universal design to create a bilingual oer for students with hearing impairments in greece, incorporating contextual considerations for vernacular languages and other local user needs.39 early efforts toward accessible oer, such as a 2010 framework, prompted critiques from members of the accessibility community and impeded adoption.40 while guides based in universal design offer a starting place and consistent reference, oer advocates could create workflows that support adaptive changes seen in inclusive design. universal design and web standards are fixed, while inclusive design seeks to support adaptive changes as needs evolve and does not treat non normative users as a homogenous, segregated group.41 treviranus et al. 
go on to state that compliance is not achieved by providing a set of rules, guidelines, or regulations to be followed.42 beyond lack of awareness of accessibility best practices, librarians and creators tend to have little control over customizing proprietary digital content platforms to add local context. 43 the flexible learning for open education (floe) project, for example, aims to integrate automatic and unconscious inclusive oer design through open source tools, but many institutions may not be able to develop such tools to incorporate local contexts.44 both librarians and e-resources vendors have been interested in the features and usability of ebooks to fine-tune their collection development strategies as well as improve the user experience of their platforms. literature shows that most studies about e-books have focused on features or the interface design of e-book reading applications. the recent academic student ebook experience survey showed that three-quarters of survey attendees considered it extremely important or important for e-books to have page numbers for citation purposes.45 this survey and other studies suggest that search, navigation, zoom, annotation, mobile compatibility, as well as offline availability including downloading, printing, and saving, were the most expected features.46 other features, such as citation generation and emailing, were mentioned or tested in some research.47 while using e-books and using e-textbooks may involve the same functionality, the purpose, devices, and user types differ because knowledge transfer is needed in learning. jardina and chaparro evaluated eight e-textbook applications with four tasks: bookmarking, note-taking, note locating, and word searching.48 they found that the interfaces to these common features varied on the different applications. standardization, or at least following general web convention when designing these interfaces, may reduce distractions that keep students from learning. the etextbook user interface can be critical to the future success of e-textbook adoption. although limited research on usability of e-textbooks or open textbooks has been conducted, a considerable number of findings from studies on e-books are relevant and applicable to etextbook projects. the e-book or e-textbook applications usability evaluation methods and results can be borrowed when understanding oer user needs. libraries can apply these e-book usability evaluations to the basic infrastructure of oer, but leverage the local contexts of students, instructors, and institutional culture when adapting the material. the more normalized usability, prototyping, and collaboration are in oer production workflows, the richer the resources and community investment. this approach can address diverse and evolving oer user needs, locally and sustainably, as they arise. our study contributes to the information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 6 literature by examining the impacts of integrating usability testing in an inaugural oer adaptation project at a large research university in the united states. case study the project to adapt the open textbook college success, published by the university of minnesota libraries, for use in the raiderready first-year seminar course, was brought to the texas tech university libraries’ scholarly publishing office by the head of the outreach department in march 2018. 
the program was deciding between a commercial textbook and the adapted open textbook. the course was offered during each fall semester and had an average enrollment of over 1,600 since 2016. an initial meeting took place and regular weekly meetings were set up afterward to review edits and ensure communication within the original 30-day timeline. it was the first oer production project for the libraries’ publishing program, which had previously focused on open access journals and materials in the repository. originally, we sought to use open monograph press because we had a local instance already installed. however, a platform with more robust formatting capabilities was needed in order to reach the desired product within the timeline. we decided to use the pressbooks pro (a wordpress plugin) sandbox for one project through our open textbook network membership. a rough draft of edits to the original work were already completed. we used a mediated service model, in which librarians performed the formatting, quality assurance, and publishing. this was in contrast to self-service models in which creators work independently and consult with support specialists. the digital publishing librarian and scholarly communication librarian formatted the edits, with html/css customization and platform troubleshooting from the web librarian. other library staff involved in the project included communications & marketing (cover design), the user experience librarian, and the electronic resources librarian (cataloging in the catalog). campus stakeholders and partners included the libraries, the raiderready program, editors, copytech (printers affiliated with the university), the campus bookstore, and the worldwide elearning unit. program partners were enthusiastic about usability and accessibility testing for the textbook. the initial testing took place in the middle of the adaptation project timeline, once initial content was formatted and ready for testing. the bccampus’ accessibility toolkit and the pressbooks user guide were used as primary guides throughout the process. the scholarly publishing librarian and the user experience librarian met to develop the testing method and identify users who would reflect the audience using the textbook. a second round of tests was conducted a year after the initial project when the editors made updates to the text. while the resulting changes were minor, this further testing allowed us to seek more feedback on the most recent version of the textbook and apply some lessons learned from the first round of testing. we did not use personas or identify user needs beforehand. we planned to recruit first-year students and students who took the raiderready course in a previous semester. however, we decided to instead recruit from existing pools of student volunteers for library usability tests in order to get three to six students in a short amount of time. for the second round of testing, we planned to recruit on-campus students, distance students, and students with diverse abilities. we recruited from newly established pools of volunteers for distance students as well as existing volunteer pools. during the first iteration, we requested that worldwide elearning, texas tech university’s distance learning unit on campus, test the textbook pilot content in pdf and epub formats using screen reader software. 
information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 7 the user experience librarian conducted a first round of four usability tests in march 2018 and a second round of two usability tests in april 2019. a sample test script from steve krug provided a solid foundation for conducting our own tests.49 in each test, participants were asked to answer two pretest questions, complete four tasks, and answer four posttest questions (see appendix for script). tasks included finding the textbook, exploring the textbook itself, locating activities for a specific chapter, and searching for the student code of conduct. participants were instructed to think aloud as they worked through the tasks. the think-aloud protocol is a commonly used test method, where participants are asked to verbalize their thoughts and actions as they perform tasks on a website. the observation tasks are set beforehand, and the facilitator follows a script to ensure consistency among testers.50 the combination of observing and recording participants’ comments and reactions provides insight into a site’s usability. testers were invited to comment on their experience at the end of the session. each usability test was recorded using morae software to track on-screen activity such as mouse movements and clicks, typing, and the verbal comments of the facilitator and participant. we conducted tests using a laptop running windows 10 with a 15.6-inch display. in the first round of testing, we also showed students the book on an ipad mini, both in adobe reader and ibooks. while we asked them to briefly view the textbooks, we did not ask them to complete specific tasks while using the tablet. limitations the biggest limitation was that we did not test on users using a screen reader or other assistive technology. the user experience librarian built a pool of on-campus students who volunteered to participate in user research in 2018, and relationships with a pool of distance student users was established in 2019. however, a pool of other types of non-normative learners had not yet been established for either round of testing. another limitation of the study was that we primarily tested on campus servers, so we do not have data on rural or distance learner experiences with the textbook until the second round of testing. in addition, we used only a few devices, a windows 10 laptop for formal testing and an ipad students briefly viewed afterward. we also did not have an educational technologist as a partner throughout the process. results once testing was complete, the scholarly communications librarian and the user experience librarian analyzed the notes and identified areas of common concern and confusion among participants. all participants were familiar with online textbooks from other courses. participants cited cost as a major consideration when deciding between purchasing print or electronic texts. more than one participant said that electronic textbooks can be cheaper but can be more frustrating to use. participants had more experience viewing textbooks on laptops. the ability to download texts for reading on a phone was not always available due to publisher restrictions. content and navigation participants liked pictures and visuals to break up the blocks of text. however, one participant expressed a dislike for too many slideshows or other media. 
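before turning to the results, it is worth noting that the session script and recording setup described above lend themselves to a small, structured observation log so that notes stay comparable across participants and rounds. the sketch below is one possible shape for such a log; the field names and task list are illustrative (loosely following the script in the appendix), not the authors’ actual notes instrument.

```python
# a minimal sketch of a per-session observation log a facilitator could fill in
# while the screen recorder captures the session. the fields and task list are
# illustrative, loosely following the appendix script, and are not the study's
# actual data-collection form.
from dataclasses import dataclass, field
from typing import List

TASKS = [
    "find the digital textbook",
    "explore the textbook",
    "locate the chapter 1 activities",
    "find the student code of conduct",
]

@dataclass
class TaskObservation:
    task: str
    completed: bool
    path_taken: str          # e.g. "library catalog -> repository record -> pdf"
    quotes: List[str] = field(default_factory=list)  # notable think-aloud comments

@dataclass
class Session:
    participant_id: str
    device: str              # e.g. "windows 10 laptop" or "ipad mini"
    observations: List[TaskObservation] = field(default_factory=list)

    def completion_rate(self) -> float:
        """share of assigned tasks the participant completed in this session."""
        done = sum(1 for obs in self.observations if obs.completed)
        return done / len(self.observations) if self.observations else 0.0
```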
another in the first round of testing liked that there were not “too many” links that brought you out of the textbook, stating it was “annoying to split screen in order to see text plus activity/homework assignment.” in the second round of testing, one participant felt the lack of interactive content was best for the first-year students compared to videos and activities in textbooks for advanced courses. that participant also thought the simpler language of the text was more welcoming to first-year students. a information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 8 participant said an ipad would be better than a laptop for viewing this book, because scrolling was easier. several users did not find features such as bookmarked sections in the in-browser pdf viewers or adobe reader. participants who did not see or use the table of contents (toc) continually relied on scrolling through pages to locate content. only one participant, unprompted, used the ctrl+f shortcut to keyword search the text. a few other participants viewed the toc, then entered the desired page number in the top toolbar navigation field. most of them expected the code of conduct information in one of the tasks to be in the front matter. the emphasis on content reflects blakiston and mayden’s experience that without a content strategy, it becomes difficult to search and to demonstrate credibility, and it is a challenge to create a coherent, user-centered information architecture.51 all participants navigated to the toc several times to complete tasks, making it a relevant feature. in the second round of testing, one participant preferred the statements and questions at the beginning of the first chapter to learning objectives typically listed in textbooks. discovery and access participants took varied approaches to finding textbooks. one would get links from the professor via email or the syllabus. others would use the campus bookstore for purchases. one would use the student information system (raiderlink) to locate information about the textbook. potential access points to make the raiderready textbook discoverable included the institutional repository, the open textbook library, and the local library catalog. the open textbook library was ruled out mostly due to campus-specific adaptations, which were not more substantial for public use than the original college success. thinktech, the institutional repository, was the most viable option and allowed for permanent linking, which worked well with the access points student users mentioned. in the second round of testing, one participant searched for the textbook via the library catalog/discovery system, google search, and the raiderready department website. the course description on the department website listed an open textbook, but the user pointed out that it was not actually linked there. discussion user testing changed our actions during the project. interactions with students did not occur during any other stage of the adaptation process before the resource was adopted in the course. many insights from the testing were indicative of self-reported preferences such as requesting more visuals, preferring print for reading and exercises, and auditory screen reading. we also learned ways that cost impacted how students used textbooks. 
for example, when we followed up on a participant’s comment and asked if they liked to highlight books, the student responded that they try not to mark their books because they want to resell them. testing also helped us observe actual behaviors among similar users in a way oer toolkits and guidelines alone did not. we learned more about how oer fits into the culture of learning and resources at texas tech university and how that may differ from other institutions. for a visual representation of our workflow, we adapted billy meinke’s oer production workflow (targeted to creators) because it was an openly available, editable workflow with comprehensive discrete steps. similar to the corre framework adaptations, meinke’s workflow was adapted by others, including the southern alberta institute of technology (sait), lansing community college, and the university of houston, to fit their institutional contexts.52 our process did not include an external peer review process; instead review was done by the editors. priming and preproduction information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 9 phases in the workflow were relatively quick, occurring in the first two weeks. the bulk of time— about four weeks—was spent in the development phase. the quality assurance and publishing phases occurred for about two weeks, with most of the time spent on finalizing edits and formatting. the first round of user testing took about two hours total and redux (revisiting the prototype and implementing changes), along with the format finalization, took about two weeks. finalization for formatting and redux changes in the first iteration of the text involved pressbooks troubleshooting. the original timeline for the project was 30 days, but the actual time for the project was 60 days. the second round of user testing took about two hours total and occurred at the halfway point within a new 60-day deadline for an updated version of the textbook. we acknowledge that even though the actual time spent with users in the first round was limited to two hours, the process also required time for drafting recruitment emails, communicating with volunteers, scheduling testing, and debriefing after sessions. figure 3 shows our workflow diagram, including a new quality assurance phase (see fig. 4 for detail) based on our case study. it includes prototyping (content and format draft), user testing, and implementing user feedback on the oer prototype. figure 3. discrete production workflow including quality assurance phase. this workflow is an adaptation of a workflow by billy meinke and university of hawai’i at manoa under a cc-by license. information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 10 figure 4. quality assurance phase with user testing. we addressed several suggestions participants made during testing to improve the textbook’s navigation functionality. we were able to address requests for a linked toc in the second round of testing. in the first round of testing, formatting was tailored to print pdf format because the editors wanted a print version to be available. we were able to create a linked toc in the digital pdf format, but not the printable pdf format. 
we were not aware that the toc could be changed based on available documentation in the first round of testing, but we were able to successfully troubleshoot this issue in the second round of tests. we were not able to do any customization on the search feature, which was built in. for customization, pressbooks allows styling through css files for three formats (pdf, web page, epub) separately. we customized them for look and feel. many of the requests were constrained by our working knowledge of and the technical limits of pressbooks, so we added a tips for pdf readers and printing section in the front matter of the textbook during the first round. it is important to note that although these were not major changes to the interface, they gave us insight for iterative changes. upon reflection, it would have been information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 11 preferable to involve someone with pressbooks developer experience at the outset. because we had not led a similar project before, nor worked with the software previously, we were more limited on the changes we could make as a result of testing than we expected. however, after this experience, we know what areas to test and are better prepared to effect actual changes. we made chapters available separately in the institutional repository to cut down on excessive scrolling, because scrolling through an entire textbook slowed students’ ability to study and quickly reference the text. also, the editors requested digital as well as print access to the textbook through the campus bookstore. the raiderready textbook was also added to the library’s local records in the catalog. we did not make a web version available through pressbooks. a web version was not a priority because of the institution-specific customization and because the editors did not request one. usage statistics from the repository between march 2018 and february 2019 peaked during midterms and at the end of semesters in which the class was taught. chapters 5, 6, 7, and 8 had the most downloads—the last chapters of the book, likely the chapters students were tested on for the final—with the majority of downloads (1,368) taking place during october 2018. this indicates that the option to download individual chapters appealed to students. accessibility testing the textbook with screen readers confirmed the need for an epub format of the text. hyde gives the following guidance for educators using pressbooks: “pdf accessibility is known to be somewhat limited, and in particular some files are not able to be interpreted by screen readers. the pdf conversion tool that pressbooks uses does support pdf tagging, which improves screen reader compatibility, but often the best solution is to offer web and e-book versions of a textbook as well as the pdf, so readers can find the format that best suits their needs.”53 for pdfs, issues included lack of alt tags, headings not set, and tables and images lacking tags. adding alt tags was planned early, after they were lost when uploading the wxr (wordpress extended rss) file—a wordpress export file in xml format—in pressbooks, and loss of the alt tags was confirmed during testing at the midpoint of the process. however, due to deadlines and pressbooks functionality, we were not able to address more of the tagging issues. epubs worked much better in tests with screen readers, apple devices, and e-readers. 
editors preferred that a pdf be used as the primary version and wanted an epub for screen readers upon request. our partners’ preference was likely based on the common use of pdfs, but it did not comply with the principles of universal or inclusive design. regarding e-book accessibility, pressbooks documentation says, “ebook accessibility is somewhat dictated by the file format standards, which focus on font sizes and screen readers, and improvements are also being made with dynamic content. the international digital publishing forum has a checklist to prompt ebook creators on accessibility functions they can incorporate while creating their content.”54 we made a decision to include multiple formats to take multiple types of use into consideration. in the first round of changes, we included an epub alongside the pdf in the repository, so users with disabilities would not have to self-identify by making a request in order to gain access. upon learning more about inclusive design after the pilot, we realized we were treating users as a homogeneous group and segregating the more accessible version. in the second round, when we realized the epub was not available by separate chapters as was the pdf version, we then made it available by chapter as well. we recommend that evaluating oer according to the international digital publishing forum checklist be incorporated into the qa part of the workflow. conclusion there is room for future research on iterative testing for oer, testing with more emphasis on mobile devices, testing with deeper investigation into microinteractions concerning accessibility, and testing in workflows that use other publishing platforms. as the creators of the floe project suggest, many more customizations can be made to points of user interaction if the software platform for adaptation is open source. future research may also examine regional and cultural influence on learning and interface preferences. one change that may support future adaptation projects at texas tech university would be modifying internal guidelines to take into consideration previous testing and local context. we also recommend keeping detailed documentation, particularly of steps for changes that are not included in existing guides on oer production. creating a memorandum of understanding with partners that clearly outlined responsibilities could have prevented some of the misunderstandings that occurred. for example, when stakeholders discussed producing print copies of the textbook, it wasn’t clear what the library’s role was. with a short timeline and more work involved than expected, the library was in a position of overpromising and underdelivering. it was apparent that the workflows themselves needed to be open and adaptable to support resources, communities, and processes in local contexts. it was important throughout the process to be aware of our partners’ priorities (e.g., instructional preferences, cost to students, departmental buy-in), because we had to balance these priorities with user feedback. we recommend having specific roles for content strategists, educational technologists, and developers in workflows during oer production. the work of creating workflows, assigning roles, and creating standards for oer content currently falls on librarians, instructional designers, and creators.
as librarians seek the most sustainable workflows, it will be beneficial to emphasize investing in the quality assurance stages of oer production and evenly distributing responsibilities. this can be done through collaborative partnerships or by hiring additional positions. if other institutions were to scale the practices from our case study, ideally, librarians would take responsibility for adding roles or formalized work to the scope of either ux or oer departments so that it becomes normalized in oer workflows. we recommend working with editors to advocate for one textbook format that addresses a variety of learning needs. we plan to use these experiences, along with existing resources, to include inclusive and user-friendly recommendations in policies and guidelines for oer adaptation. conducting user testing challenged librarians’ and editing instructors’ assumptions about student use of oer. while we referred to toolkits, guidelines, and best practices, internal testing allowed us to make improvements to several specific microinteractions students encountered while using the text. it was very feasible to incorporate testing into the workflow. we were able to directly observe user information behavior from members of the community that the resource was intended to serve.
appendix: usability test method
pretest questions
1. what is your academic classification? (undergraduate, graduate, faculty)
2. have you ever used an e-textbook or a digital textbook in one of your classes? (if yes, ask for course details.)
tasks to observe
1. imagine you needed to get a copy of the digital textbook raider ready: unmasking the possibilities of college success. how would you go about finding it? it will help us if you think out loud as you go along—tell us what you’re looking at, what you’re trying to do, what you’re thinking.
2. [if the tester is unable to locate the digital textbook, the moderator will open it.] please take a couple of minutes to look at this textbook. explore and click on a link or two.
3. for the next task, imagine an instructor asked you to locate the chapter activities for chapter 1. could you show us how you would locate those?
4. for the final task, could you find the student code of conduct?
posttest questions
1. what were your impressions of this resource?
2. what did you like? dislike? what would you change?
3. how easy or difficult was it to find what you wanted? please explain.
4. is there anything else about your experience using this textbook today that you’d like to tell us?
endnotes
1 sonya betz and robyn hall, “self-archiving with ease in an institutional repository: microinteractions and the user experience,” information technology and libraries 34, no. 3 (september 21, 2015): 44–45, https://doi.org/10.6017/ital.v34i3.5900.
2 anita r. walz, “open and editable: exploring library engagement in open educational resource adoption, adaptation and authoring,” virginia libraries 61 (january 2015): 23, http://hdl.handle.net/10919/52377.
3 stephen whitfield and zoe robinson, “open educational resources: the challenges of ‘usability’ and copyright clearance,” planet 25, no. 1 (2012): 52, https://doi.org/10.11120/plan.2012.00250051.
4 walz, “open and editable,” 24.
5 andy lane, “from pillar to post: exploring the issues involved in repurposing distance learning materials for use as open educational resources” (working paper, uk open university, december 2006), accessed august 1, 2018, http://kn.open.ac.uk/public/document.cfm?docid=9724.
6 andy arana et al., eds., “open logic project,” university of calgary faculty of arts and the campus alberta oer initiative, accessed april 26, 2019, http://openlogicproject.org/; robin derosa, the open anthology of earlier american literature (public commons publishing, 2015), https://openamlit.pressbooks.com/; timothy robbins, “case study: expanding the open anthology of earlier american literature,” in a guide to making open textbooks with students, ed. elizabeth mays (the rebus community for open textbook creation, 2017), https://press.rebus.community/makingopentextbookswithstudents/chapter/case-study-expanding-open-anthology-of-earlier-american-literature/.
7 walz, “open and editable,” 24.
8 billy meinke, “discovering oer production workflows,” uh oer (blog), university of hawai’i, december 23, 2016, https://oer.hawaii.edu/discovering-oer-production-workflows/.
9 betz and hall, “self-archiving with ease,” 44.
10 beth st. jean et al., “unheard voices: institutional repository end-users,” college & research libraries 72, no. 1 (january 2011): 23, https://doi.org/10.5860/crl-71r1.
11 jutta treviranus et al., “an introduction to the floe project,” in international conference on universal access in human-computer interaction, universal access to information and knowledge, ed. constantine stephanidis and margherita antona, uahci 2014 (june 2014), lecture notes in computer science 8514: 454, https://doi.org/10.1007/978-3-319-07440-5_42.
12 sarah crissinger, “a critical take on oer practices: interrogating commercialization, colonialism, and content,” in the library with the lead pipe, october 21, 2015, http://www.inthelibrarywiththeleadpipe.org/2015/a-critical-take-on-oer-practices-interrogating-commercialization-colonialism-and-content/; diane harley, “why
understanding the use and users of open education matters,” in opening up education: the collective advancement of education through open technology, open content, and open knowledge, ed. toru iiyoshi and m.s. vijay kumar (cambridge, ma: the mit press, 2008), 197–212.
13 harley, “why understanding,” 208.
14 tom carey and gerard l. hanley, “extending the impact of open educational resources through alignment with pedagogical content knowledge and institutional strategy: lessons learned from the merlot community experience,” in opening up education: the collective advancement of education through open technology, open content, and open knowledge, ed. toru iiyoshi and m.s. vijay kumar (cambridge, ma: the mit press, 2008), 238.
15 rebecca blakiston and shoshana mayden, “how we hired a content strategist (and why you should too),” journal of web librarianship 9, no. 4 (2015): 202–6, https://doi.org/10.1080/19322909.2015.1105730; “our team,” openstax, rice university, accessed december 9, 2019, https://openstax.org/team.
16 maria nuccilli, elliot polak, and alex binno, “start with an hour a week: enhancing usability at wayne state university libraries,” weave: journal of library user experience 1, no. 8 (2018), https://doi.org/10.3998/weave.12535642.0001.803.
17 jakob nielsen and thomas k. landauer, “a mathematical model of the finding of usability problems,” in proceedings of the interact’93 and chi’93 conference on human factors in computing systems (may 1993): 211–12, https://doi.org/10.1145/169059.169166.
18 simone borsci et al., “reviewing and extending the five-user assumption: a grounded procedure for interaction evaluation,” in acm transactions on computer-human interaction 20, no. 5, article 29 (november 2013), 18–19, http://delivery.acm.org/10.1145/2510000/2506210/a29-borsci.pdf.
19 treviranus et al., “floe project,” 454.
20 laura icela gonzález-pérez, maría-soledad ramírez-montoya, and francisco j. garcía-peñalvo, “user experience in institutional repositories: a systematic literature review,” international journal of human capital and information technology professionals 9, no. 1 (january–march 2018): 79, 84, https://doi.org/10.4018/ijhcitp.2018010105; betz and hall, “self-archiving with ease,” 45; st. jean et al., “unheard voices,” 23, 36–37, 40.
21 meinke, “discovering oer production workflows”; nuccilli, polak, and binno, “start with an hour.”
22 steven d. eppinger, murthy v. nukala, and daniel e. whitney, “generalised models of design iteration using signal flow graphs,” research in engineering design 9, no. 2 (1997): 112; helen timperley et al., teacher professional learning and development (wellington, new zealand: ministry of education, 2007), http://www.oecd.org/education/school/48727127.pdf.
23 eppinger, nukala, and whitney, “design iteration,” 112–13.
https://doi.org/10.1080/19322909.2015.1105730 https://openstax.org/team https://doi.org/10.3998/weave.12535642.0001.803 https://doi.org/10.1145/169059.169166 http://delivery.acm.org/10.1145/2510000/2506210/a29-borsci.pdf https://doi.org/10.4018/ijhcitp.2018010105 http://www.oecd.org/education/school/48727127.pdf information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 16 24 walz, “open and editable,” 23; blakiston and mayden, “how we hired a content strategist,” 203. 25 walz, “open and editable,” 28. 26 harley, “why understanding,” 201–6. 27 scott woodward, adam lloyd, and royce kimmons, “student voice in textbook evaluation: comparing open and restricted textbooks,” international review of research in open and distributed learning 18, no. 6 (september 2017), 150–63, https://doi.org/10.19173/irrodl.v18i6.3170. 28 meinke, “discovering oer production workflows.” 29 samuel k. nikoi et al., “corre: a framework for evaluating and transforming teaching materials into open educational resources,” open learning: the journal of open, distance and e-learning 26, no. 3 (2011), 194–99, https://doi.org/10.1080/02680513.2011.611681. 30 “corre 2.0,” institute of learning innovation, university of leicester, accessed april 25, 2019, https://www2.le.ac.uk/departments/beyond-distance-researchalliance/projects/ostrich/corre-2.0. 31 betz and hall, “self-archiving with ease,” 45–46. 32 andré constantino da silva et al., “portability and usability of open educational resources on mobile devices: a study in the context of brazilian educational portals and android-based devices” (paper, international conference on mobile learning 2014, madrid, spain, february 28–march 2, 2014), 198, https://eric.ed.gov/?id=ed557248. 33 sarah morehouse, “oer bootcamp 3-3: oers and usability,” youtube video, 3:16, march 2, 2018, https://www.youtube.com/watch?v=cncxbcs-2gm. 34 krista godfrey, “creating a culture of usability,” weave: journal of library user experience 1, no. 3 (2015), https://doi.org/10.3998/weave.12535642.0001.301; peter morville, “user experience design,” semantic studios, june 21, 2004, http://semanticstudios.com/user_experience_design/. 35 meinke, “discovering oer production workflows.” 36 cynthia ng, “a practical guide to improving web accessibility,” weave: journal of library user experience 1, no. 7 (2017), https://doi.org/10.3998/weave.12535642.0001.701; whitney quesenbery, “usable accessibility: making web sites work well for people with disabilities,” ux matters, february 23, 2009, http://www.uxmatters.com/mt/archives/2009/02/usableaccessibility-making-web-sites-work-well-for-people-with-disabilities.php. 37 ng, “improving web accessibility.” 38 amanda coolidge et al., accessibility toolkit 2nd edition (victoria, b.c.: bccampus, 2018), 1–71, https://opentextbc.ca/accessibilitytoolkit/. 
https://doi.org/10.19173/irrodl.v18i6.3170 https://doi.org/10.1080/02680513.2011.611681 https://www2.le.ac.uk/departments/beyond-distance-research-alliance/projects/ostrich/corre-2.0 https://www2.le.ac.uk/departments/beyond-distance-research-alliance/projects/ostrich/corre-2.0 https://eric.ed.gov/?id=ed557248 https://www.youtube.com/watch?v=cncxbcs-2gm https://doi.org/10.3998/weave.12535642.0001.301 http://semanticstudios.com/user_experience_design/ https://doi.org/10.3998/weave.12535642.0001.701 http://www.uxmatters.com/mt/archives/2009/02/usable-accessibility-making-web-sites-work-well-for-people-with-disabilities.php http://www.uxmatters.com/mt/archives/2009/02/usable-accessibility-making-web-sites-work-well-for-people-with-disabilities.php https://opentextbc.ca/accessibilitytoolkit/ information technology and libraries march 2021 user experience testing in the open textbook adaptation workflow | thomas, vardeman, and wu 17 39 vassilis kourbetis and konstantinos boukouras, “accessible open educational resources for students with disabilities in greece,” in universal access in human-computer interaction, universal access to information and knowledge, ed. constantine stephanidis and margherita antona, uahci 2014 (june 2014), lecture notes in computer science 8514: 349–57, https://doi.org/10.1007/978-3-319-07440-5_32. 40 treviranus et al., “floe project,” 455–56. 41 treviranus et al., 456–57. 42 treviranus et al., 456–57. 43 ng, “improving web accessibility”; treviranus et al., “floe project,” 460–61. 44 treviranus et al., “floe project,” 461. 45 2018 academic student ebook experience survey report (library journal research, 2018): 6, accessed may 3, 2019, https://mediasource.formstack.com/forms/2018_academic_student_ebook_experience_survey _report. 46 michael gorrell, “the ebook user experience in an integrated research platform,” against the grain 23, no. 5 (december 2014): 38; robert slater, “why aren’t e-books gaining more ground in academic libraries? e-book use and perceptions: a review of published literature and research,” journal of web librarianship 4, no. 4 (2010): 305–31; joelle thomas and galadriel chilton, “library e-book platforms are broken: let’s fix them,” academic e-books: publishers, librarians, and users (2016): 249–62; christina mune and ann agee, “ebook showdown: evaluating academic ebook platforms from a user perspective,” in creating sustainable community: the proceedings of the acrl 2015 conference (2015): 25–28; laura muir and graeme hawes, “the case for e-book literacy: undergraduate students’ experience with ebooks for course work,” the journal of academic librarianship 39, no. 3 (2013): 260–74; esta tovstiadi, natalia tingle, and gabrielle wiersma, “academic e-book usability from the student’s perspective,” evidence based library and information practice 13, no. 4 (2018): 70– 87. 47 erin dorris cassidy, michelle martinez, and lisa shen, “not in love, or not in the know? graduate student and faculty use (and non-use) of e-books,” the journal of academic librarianship 38, no. 6 (2012): 326–32; gorrell, “the ebook user experience,” 36–40. 48 jo r. jardina and barbara s. chaparro, “investigating the usability of e-textbooks using the technique for human error assessment,” journal of usability studies 10, no. 4 (2015): 140–59. 49 steve krug, rocket surgery made easy (berkeley, ca: new riders, 2010), 146–53. 50 danielle a. 
becker and lauren yannotta, “modeling a library website redesign process: developing a user-centered website through usability testing,” information technology and libraries 32, no. 1 (march 2013): 9–10.
51 blakiston and mayden, “how we hired a content strategist,” 194.
52 jessica norman, sait oer workflow, may 2019, accessed july 14, 2020, https://docs.google.com/drawings/d/1xvjpu9s4bb32k3gblnvw4uy1ely9rtxnr8bkfdm5-yk/; regina gong, oer production workflow, accessed july 14, 2020, http://libguides.lcc.edu/oer/adopt; ariana e. santiago, oer adoption workflow visual overview, april 2019, accessed july 14, 2020, https://docs.google.com/drawings/d/1czqhpgpqyrr46vm5iytoemyqj-s1zr0p-m-lj16rtto/; meinke, “discovering oer production workflows.”
53 zoe wake hyde, “accessibility and universal design,” in pressbooks for edu guide (pressbooks.com, 2016), https://www.publiconsulting.com/wordpress/eduguide/.
54 hyde.
letter from the editor kenneth j. varnum information technology and libraries | december 2019 https://doi.org/10.6017/ital.v38i4.11923 earlier this fall, i had the privilege of participating in the sharjah library conference, a three-day event hosted by the sharjah book authority in the united arab emirates with programming coordinated by the ala international relations office. the experience of meeting with so many librarians from cultures different from my own was truly rewarding and enriching. it was both refreshing and invigorating to see, first-hand, the global importance of the local matters that occupy so much of my professional life. i returned to my regular job with a newfound appreciation for how much the issues i spend so much of my professional time on—information access, equity, user experience, and the like—are universal. it is easy to get lost in the weeds of my own circumstances and environment, and sometimes difficult to look up and explore what colleagues, known and unknown, are doing and thinking. the experience reinforces the importance of open access publications such as information technology and libraries. while “open access” doesn’t remove every possible barrier to accessing the knowledge, experience, and lessons contained within its virtual cover, it does remove the all-important paywall. and that is no small thing, in a community of library technologists who interact and exchange information through social media, email, and other tools. our open access status gives this journal a vibrant platform for sharing knowledge, experience, and expertise to all who seek it.
i hope you find this issue’s contents useful and informative, and will share the items you find most important with your peers at your institutions and beyond. i invite you to add your own knowledge and experience to our collective wisdom through a contribution to the journal. for more details, see the about the journal page or get in touch with me. sincerely, kenneth j. varnum, editor varnum@umich.edu december 2019
personalization of search results representation of a digital library ljubomir paskali, lidija ivanovic, georgia kapitsaki, dragan ivanovic, bojana dimic surla, and dusan surla information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.12647 ljubomir paskali (ljubomir.paskali@gmail.com) phd student, university of novi sad, serbia. lidija ivanovic (lidija.ivanovic@uns.ac.rs) assistant professor, university of novi sad, serbia; she is the corresponding author. georgia kapitsaki (gkapi@cs.ucy.ac.cy) associate professor, university of cyprus, cyprus. dragan ivanovic (dragan.ivanovic@uns.ac.rs) full professor, university of novi sad, serbia. bojana dimic surla (bdimicsurla@raf.edu.rs) full professor, union university, serbia. dusan surla (surla@uns.ac.rs) professor emeritus, university of novi sad, serbia. © 2021. abstract the process of discovering appropriate resources in digital libraries within universities is important, as it can have a big effect on whether retrieved works are useful to the requester. the improvement of the user experience with the digital library of the university of novi sad dissertations (phd uns) through the personalization of search results representation is the aim of the research presented in this paper. there are three groups of phd uns digital library users: users from the academic community, users outside the academic community, and librarians who are in charge of entering dissertation data. different types of textual and visual representations were analyzed, and representations which needed to be implemented for the groups of users of the phd uns digital library were selected. after implementing these representations and putting them into operation in april 2017, the user interface was extended with functionality that allows users to select their desired style for representing search results using an additional module for storing message logs. the stored messages represent an explicit change in the results representation by individual users. using these message logs and the elk technology stack, we analyzed user behavior patterns depending on the type of query, type of device, and search mode. the analysis has shown that the majority of users of the phd uns system prefer using the textual style of representation rather than the visual. some users have changed the style of results representation several times and it is assumed that different types of information require a different representation style. also, it has been established that the most frequent change to the visual results representation occurs after users perform a query which shows all the dissertations from a certain time period and which is issued from the advanced search mode; however, there is no correlation between this change and the client’s device used.
introduction in order to place their current work within a framework of previous methods or identify research gaps, researchers often need to identify and study previous research. discovering information on the web is not always a trivial task. many systems allow scholars to search for research papers, dissertations, and other technical reports, providing at the same time relevant recommendations to users based on their areas of interest or previous searches. although web search engines are considered a superior solution to more specialized digital library systems, these specialized systems may provide more benefits in specific conditions, e.g., when searching for dissertations in specific languages, or by affiliated countries or institutions.1 nowadays, digital libraries are widely used by diverse communities of users for diverse purposes.2 xie and colleagues conducted an analysis in 2018 to compare similarities and differences in perceptions of the importance of different digital library evaluation criteria by heterogeneous stakeholders in academic settings.3 specifically, they surveyed three groups of stakeholders (scholars, librarians, and digital library users), and through their analysis of the survey’s responses, they identified differences in opinions not only between user expectations and the digital library practice but also between what is desirable and what is possible in the academic environment.
finally, it is necessary to determine if the initial search results representation should be stored in the history of users’ queries, device types, and search mode. users have the ability to provide their feedback on the visualization of the search results, therefore indicating if they prefer a textual or new visual results representation (by changing search results representation style). the feedback received is used to adapt the results representation based on the user preferences. this component represents the first step towards a completely personalized system, in which different contextual parameters will be used for providing a personalized context-based user experience. at this point, user feedback is used for personalization, search results representation, and subsequent system use. a preliminary version with preliminary results regarding the word cloud component is described by kapitsaki and ivanovic.5 in respect to this previous work, we are presenting the evolvement of personalization in the phd uns system and a more thorough evaluation that allows us to perform statistical analysis and draw more generic conclusions. accordingly, the motivation for this research is the personalization of the search results representation of a digital library, and the research questions to which this research should provide answers have been identified. we are discussing our results based on these questions: 1. rq1: what are the users’ profiles of the phd uns digital library? 2. rq2: how could search results best be presented to different users within phd uns’s digital library collections? 3. rq3: can the search results representation with phd uns’s digital library depend on the history of users’ queries, device types, and search mode? information technology and libraries march 2021 personalization of search results | paskali, ivanovic, kapitsaki, ivanovic, surla, and surla 3 related work dosird uns the dosird uns project (http://dosird.uns.ac.rs/) was launched in 2009 with the aim to develop software infrastructure for the research domain of the university of novi sad (uns). the cris uns system (www.cris.uns.ac.rs) is the first result of this project. this system represents the information system of the research domain of the uns. the development of the system started with the beginning of the project in 2009 and is still active. the digital library of theses and dissertations (phd uns), which is the topic of this paper, is integrated within cris uns. the complete cris uns system was developed in accordance with the recommendations of the eurocris (www.eurocris.org) non-profit organization. systems which contain the published scientific results were analyzed and, on the basis of these analyses, a set of metadata describing the scientific and research result in cris uns was created. 6 a paper by ivanović et al. described the cerif compatible data model based on the marc 21 format which maps part of the cerif data model to the marc 21 format data model. 7 the marc 21 format is a standardized format for storing bibliographic data. cris uns has been built on th is model. the system architecture and implementation are described in previous publications.8 the development of the digital ph.d. dissertations library (phd uns) began in 2010. in december 2012, the senate of the university of novi sad approved the commissioning of a public service for the search of a digital library of dissertations defended at the university (https://cris.uns.ac.rs/searchdissertations.jsf). 
phd uns has been implemented with the following characteristics: • the digital library of e-theses is integrated into the information system of the scientific research activity of the university of novi sad (cris uns). • the digital library is cerif compatible, that is, it can exchange metadata with cerifcompatible systems of scientific research activity. • e-theses are described by a set of metadata which includes all the metadata prescribed by the dublin core and the etd-ms metadata format, that is, the system can exchange the data in dublin core or etd-ms format via the oai-pmh protocol. • the digital e-thesis library has a data model and architecture that can be easily integrated with a bibliographic system based on the marc 21 bibliographic format. • the user interface allows a user to enter the thesis and dissertation data without knowing the standardized metadata formats on which the digital library is built. the integration of phd uns within cris uns involved the following four steps: 1. the cris uns data model has been extended with entities and properties for describing phd theses in accordance with cerif, dublin core, and etd-ms data models.9 2. the cris uns software architecture and user interface has been extended in order to support basic functionality of cataloguing theses.10 3. theses’ metadata have been imported from the previous source.11 4. the web page for searching among the collection of theses has been implemented.12 http://dosird.uns.ac.rs/ http://www.cris.uns.ac.rs/ http://www.eurocris.org/ https://cris.uns.ac.rs/searchdissertations.jsf information technology and libraries march 2021 personalization of search results | paskali, ivanovic, kapitsaki, ivanovic, surla, and surla 4 searching personalization the findings and analysis of scientific results described in papers, theses, and dissertations is an important part of research activities in the scientific community. therefore, the use and development of the tools and the bibliographic systems which enable advanced search is becoming increasingly more common. the personalization of search results can include automatic recommendations to users.13 moreover, part of search personalization refers to the personalization of results representation. similar to popular web search engines like google, the way the search results are represented is very important to users. the way the results are represented can affect user perception of the system and the frequency of its use. the results can be presented to users in formats other than textual in order to improve the user experience in search tools, as well as to improve access to finding information and in recommendation systems.14 ferran and colleagues described browsing and searching personalization systems for digital libraries.15 their approach is based on the use of ontologies for describing the relationships between elements that determine the functionalities of the desired personalization system. those elements include the user’s profile, including navigational history and the user preferences, as well as the information collected from the navigational behavior of the digital library users. such a personalization system can improve digital library users’ experience. sebrechts and colleagues presented a controlled comparison of text, 2d, and 3d approaches to a set of typical information seeking tasks on a collection of 100 top ranked documents retrieved from a much larger document set.16 the conducted experiments included 15 participants. 
the study revealed that although visualization can assist the reduction of the mental workload for interpreting the results, these reductions and their acceptance depend on an appropriate mapping among the interface, the task and the user. in relevance to the above, our approach lies in the area of 2d display of information (see the visual results representation section later in this article), but instead of focusing on basic text information we have adopted newer approaches found in word clouds. bowers and card analyzed visualization in the framework of database search. 17 soliman et al. presented an approach for the clustering of search engine results that relies on the semantics of the retrieved documents.18 the approach takes into consideration both lexical and semantic similarities among documents and applies activation spreading tech nique, in order to generate clusters based on semantic properties. nguyen and zhang proposed a model for web search visualization, where physical location, spatial distance, color, and movement of graphical objects are used to represent the degree of relevance between a query and relevant web pages considering this way the context of users’ subjects of interest.19 a word or tag cloud is a visual representation of word content commonly used to represent content in different environments.20 several past works have introduced various algorithms for the tag selection or new ways for the word cloud creation. 21 tag clouds have been used in pubcloud for the summarization of results from queries over the pubmed database of biomedical literature.22 pubcloud responds to queries of this database with tag clouds generated from words extracted from the abstracts returned by the query. the authors found that the descriptive information is this way provided in a better way to users. however, the discovery of relations between concepts is rendered less effective. information technology and libraries march 2021 personalization of search results | paskali, ivanovic, kapitsaki, ivanovic, surla, and surla 5 context awareness context awareness is a part of many systems in various domains, where the application or system functionality adapts to the context of use, such as in mobile computing or pervasive computing applications.23 the first definition of context was given by abowd and colleagues, where they defined context “as any information that is relevant to the user, to the system and to any interaction between the user and the system.”24 applications utilize context data in order to provide context-aware services to users. context information, such as user location or user preferences, are used to adapt the application functionality or presentation to a specific user. mobile computing and pervasive computing offer the necessary information from mobile device sensors and in users’ environments for context-aware application provision.25 fink and kobsa claimed that personalization may adapt various features in order to address the specific needs of each individual.26 many systems utilize users’ search history in order to offer personalized search in the framework of the web information retrieval systems. 
yoganarasimhan found that personalization based on short-term history or within-session behavior is less valuable than long-term or across-session personalization.27 behnert and lewandowski analyzed the application of web search engine ranking approaches to digital libraries, and they argued for a user-centric view on ranking, taking into account that ranking should be for the benefit of the user, and user preferences may vary across different contexts.28 frias-martinez and colleagues defined an approach to constructing personalized digital libraries. adaptive digital libraries automatically learn user preferences and goals and personalize their interaction using this information.29 based on previous work, frias-martinez and colleagues developed a personalized digital library to suit the needs of different cognitive styles.30 contribution of our work we share similarities with previous work in terms of techniques used, such as the harvard reference style, standardized bibliographic formats (marc 21, dublin core, etd-ms), and word clouds, which have been used in other systems for representation of search results in order to improve the user experience. however, in contrast to previous work, we apply these techniques in a specific context for a serbian digital library, allowing automatic adaptation of the representation of search results based on a user’s type, history, and reaction. those search results representations are implemented and integrated within a real system (phd uns) and are tested with real users’ feedback. this user feedback is analyzed using elk stack technologies, making our main conclusions useful for similar systems and future research on personalization in digital libraries. methodology the main requirements for implementation of the phd uns digital library were for the system to be compatible for integration with other systems of scientific research activity, support data exchange in different standardized formats, and provide representation of the results to users of different categories and profiles (researchers, scientists, librarians, users from outside the academic community, etc.). for these reasons, existing formats for representation of references, bibliographic metadata formats, as well as techniques for visual representation of textual publications, were analyzed. the format adopted for the representation of references was harvard style, implemented with a freemarker template. freemarker (https://freemarker.apache.org/) is an open-source template engine for java that assists in separating the web user interface from the main system functionality, following the mvc (model view controller) pattern. the analysis established that it was necessary to implement the search results representation in three bibliographic formats: marc 21, dublin core, and etd-ms. for each of these formats, appropriate mappers/serializers were made which transform the data from a database into the xml (extensible markup language) representation of the previously mentioned bibliographic formats. for visual representation, word cloud style was adopted. a component for generating word cloud images was implemented and integrated into the phd uns digital library.
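as a rough, hedged sketch of what one of those mappers/serializers might look like, the following java fragment builds a minimal dublin core (oai_dc) record for a dissertation; the thesis fields, method name, and output shape are illustrative assumptions and do not reproduce the actual phd uns source code.

    // illustrative sketch only: field names and structure are assumptions, not the phd uns implementation
    public class DublinCoreSerializerSketch {

        // minimal stand-in for a dissertation record loaded from the database
        static class ThesisRecord {
            String title;
            String authorName;
            int publicationYear;
            String language;

            ThesisRecord(String title, String authorName, int publicationYear, String language) {
                this.title = title;
                this.authorName = authorName;
                this.publicationYear = publicationYear;
                this.language = language;
            }
        }

        // maps a thesis record to a minimal oai_dc xml representation
        static String toDublinCore(ThesisRecord thesis) {
            StringBuilder xml = new StringBuilder();
            xml.append("<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\"")
               .append(" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n");
            xml.append("  <dc:title>").append(escape(thesis.title)).append("</dc:title>\n");
            xml.append("  <dc:creator>").append(escape(thesis.authorName)).append("</dc:creator>\n");
            xml.append("  <dc:date>").append(thesis.publicationYear).append("</dc:date>\n");
            xml.append("  <dc:language>").append(escape(thesis.language)).append("</dc:language>\n");
            xml.append("  <dc:type>PhD dissertation</dc:type>\n");
            xml.append("</oai_dc:dc>");
            return xml.toString();
        }

        // escapes the characters that may not appear as raw text in xml
        static String escape(String value) {
            return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }

        public static void main(String[] args) {
            ThesisRecord thesis = new ThesisRecord("sample dissertation title", "jane doe", 2017, "sr");
            System.out.println(toDublinCore(thesis));
        }
    }

an analogous method per format (marc 21, etd-ms) keeps each serializer independent of the others, which fits the idea of one mapper per bibliographic format described above.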
when presented with search results, users are able to choose which desired representation format is stored and used as preferable for the given user in future search results representations. a logging system was implemented to assist with this; this module is invoked when the user changes the default data view mode. received messages are preprocessed for the purpose of analyzing messages and obtaining a more accurate evaluation. the aim of the analysis of the messages on the work of the phd uns system is to obtain the desired statistics: • distribution of used representation styles • top queries executed before changing into the textual style, as well as into the visual style • distribution of devices used before changing into the textual style, as well as into the visual style • distribution of search modes before changing into the textual style, as well as into the visual style these statistics are analyzed to determine the user behavior patterns depending on the type of search (basic or advanced), the search device used, the executed query, etc. based on the established patterns it is possible to determine the representation style for future searches of the new users. the results of the analysis are graphically represented using elk stack technologies and are presented below in the evaluation section. the methodological approach is shown in figure 1. the rest of this paper is organized in accordance with the methodology steps shown in figure 1. https://freemarker.apache.org/ information technology and libraries march 2021 personalization of search results | paskali, ivanovic, kapitsaki, ivanovic, surla, and surla 7 figure 1. the methodology of the study presented in this paper. information technology and libraries march 2021 personalization of search results | paskali, ivanovic, kapitsaki, ivanovic, surla, and surla 8 analysis of search results’ representation styles textual representation for the needs of the phd uns library, three types of search results’ textual representation for three types of phd uns users were analyzed: reference representation. a citation style is defined as a set of rules for citing sources in academic writing and those rules prescribe style for in-text citations, as well as style for reference representation in the references’ list. this textual approach is intended for users from the academic community—researchers, teaching staff of universities, and phd students. since the majority of phd uns users belong to this group, this is the default representation in a textual representation of the search results. this group of users is familiar with this type of representation. from this type of representation, the users can easily recognize basic data of interest and can use this representation for citing and referencing the dissertations which have been retrieved as a result of executing a query. taking into account that there are currently many different styles for representing references and citations (e.g., apa, mla, harvard, vancouver, and chicago, among others), and that they might change in the future with the emergence of new trends in science (for example, the emergence of open science and the need to cite data sets, not just publications), it is necessary to create a scalable component for representing the results in a form of a reference. 
based on this analysis, we decided that the architecture of this component will be based on freemarker, which makes the introduction of new templates for the output format easier, and that the first freemarker template should be created for harvard style. structured representation. this textual approach is intended for users outside the academic community who want to search the digital library. this type of representation presents only the data from the digital library database in a legible format that is rendered in the web browser. bibliographic formats representation. this textual approach is intended for librarians who are in charge of entering and maintaining data in the digital library. in addition to one central library of the university of novi sad, there are libraries in every department within the university. most of these libraries use the bisis library system, which is based on the marc format. therefore, it can be concluded that the majority of the librarians who enter data into the phd uns library are familiar with the marc format. librarians can use the representation of metadata about dissertations in the marc 21 format to check if all of the information about a dissertation is entered correctly.
• marc 21 bibliographic format supports not only descriptions of theses and dissertations but also other published scientific results, such as a paper published in a journal, a monograph, a paper published in conference proceedings, etc. there are several examples where theses and dissertations are described using the marc 21 format in the bibliographic information systems of some universities.31
• dublin core (http://dublincore.org) is the most commonly used format for data exchange between different information systems, and data are exported in this format via the oai-pmh protocol from the phd uns system into a network of digital libraries, such as dart-europe, oatd, and nardus. the dublin core xml schema is available online at www.openarchives.org/oai/2.0/oai_dc.xsd. the representation in dublin core format can be used by librarians to check if the metadata will be correctly exported to the previously mentioned aggregation systems.
• electronic theses and dissertations metadata standard (etd-ms) (www.ndltd.org/standards/metadata) is an extension of the dublin core format with new features/properties. the standard defines a set of metadata that is used to describe a master’s thesis or a doctoral dissertation. the metadata of this standard describe the author, his/her paper, and the context in which this paper has been created in a way that will be useful not only to the researcher, but also to the librarian and/or the technical staff in charge of maintaining the paper in electronic form. this format is used within the ndltd worldwide network of digital theses and dissertations, and in this format the data is exported via the oai-pmh protocol from the phd uns system to this network. the xml schema of the etd-ms format is available online at www.ndltd.org/standards/metadata/etdms/1.0/etdms.xsd. the representation in the etd-ms format can be used by librarians to check if the metadata will be correctly exported to the ndltd network (http://union.ndltd.org/).
visual representation a word cloud is a visual representation of textual content, with the importance of each word indicated with a different font size and/or color. word clouds are often used in many digital libraries to represent textual content.32 as previously written, the word cloud is used in different environments and is a popular way to represent web results by summarizing the content of documents and other sources of information. we adopted a word cloud approach for visual representation of the user search results in the phd uns library. various tools for generating word clouds are available, such as the tool offered by jin.33 based on the characteristics of available tools, we decided to use the kumo library, available in the java programming language (https://github.com/kennycason/kumo), which allows easier integration within the phd uns digital library. implementation details this section presents the implementation of the textual and visual search results representation, as well as the implementation of the search results personalization. textual results representation based on the analysis presented in the previous section, we decided to implement the following functionality in order to enhance the phd uns digital library: a structured representation for users outside the academic community, a representation in the form of references for scholars, and a representation of bibliographic and library formats for librarians in charge of data in the phd uns digital library. reference representation. figure 2 depicts the architecture of the module for generating phd dissertations’ representations in the form of references for scholars. figure 2. architecture of module for generating reference representation. (figure 2 shows the reference generator drawing on the data model, a freemarker template, and the phd uns database to produce the reference representation.) the model of the reference generator component is shown as the class diagram in figure 3. this component can be used to generate textual representations for all publications from the data model component (figure 4) in the chosen reference style (freemarker template—see listing 1). the central class is templaterunner, which includes the necessary operations to generate reports. the templatesholder represents the template container and has operations for adding new templates and selecting a template for generating a report. the template class is the model of the template for one reference style and one publication type. the component architecture described in the class diagram of figure 3 is independent of the number of templates, whereas adding a new template to the component requires creating a new instance of the template class. as similarly performed in the cris uns system, the implementation of these instances of the template class is done in freemarker, which does not require recompilation of the source code. figure 3. architecture of the component for generating template. (figure 3 shows three classes: templaterunner, with the operations getrepresentation, getrectype, makeonereference, and organizerecords; templatesholder, with gettemplate and addtemplate; and template, with the attributes pubtype and referencestyle and the operations getdata and formatdata.)
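for orientation, a minimal sketch of how such a template instance might be processed through freemarker’s java api is shown below; the template file name and the data-model keys (chosen to mirror the variables in listing 1 below) are assumptions for illustration, not the actual templaterunner code.

    import freemarker.template.Configuration;
    import freemarker.template.Template;
    import freemarker.template.TemplateExceptionHandler;

    import java.io.File;
    import java.io.StringWriter;
    import java.util.HashMap;
    import java.util.Map;

    public class ReferenceRendererSketch {
        public static void main(String[] args) throws Exception {
            // freemarker is configured once and loads .ftl files from a template directory
            Configuration cfg = new Configuration(Configuration.VERSION_2_3_31);
            cfg.setDirectoryForTemplateLoading(new File("templates"));
            cfg.setDefaultEncoding("UTF-8");
            cfg.setTemplateExceptionHandler(TemplateExceptionHandler.RETHROW_HANDLER);

            // hypothetical data model whose keys mirror the variables used in listing 1 below
            Map<String, Object> name = new HashMap<>();
            name.put("firstname", "jane");
            name.put("lastname", "doe");
            Map<String, Object> author = new HashMap<>();
            author.put("name", name);
            Map<String, Object> institution = new HashMap<>();
            institution.put("somename", "university of novi sad");

            Map<String, Object> model = new HashMap<>();
            model.put("author", author);
            model.put("publicationyear", 2017);
            model.put("sometitle", "sample dissertation title");
            model.put("localizedstudytype", "doctoral dissertation");
            model.put("institution", institution);

            // one template per reference style and publication type, e.g. a harvard template for theses
            Template template = cfg.getTemplate("harvard-thesis.ftl");
            StringWriter out = new StringWriter();
            template.process(model, out);
            System.out.println(out.toString());
        }
    }

because each reference style and publication type maps to its own template file, supporting a new citation style amounts to dropping in a new template rather than recompiling the component, as noted above.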
<#macro nameinitial name>
  <@compress>
    <#t><#if name?length >1>
      <#t><#if (name?upper_case?starts_with("lj") || name?upper_case?starts_with("nj"))>, ${name?substring(0,2)?upper_case}.
      <#t><#else>, ${name?substring(0,1)?upper_case}.
      <#t></#if>
    <#t><#elseif name?length=1>, ${name?upper_case}.
    <#t></#if>
  </@compress>
</#macro>
<#t>${author.name.lastname?upper_case}
<#t><@nameinitial author.name.firstname/>
<#t><#if publicationyear??> (${publicationyear})</#if> ${sometitle!""}. (${localizedstudytype}), ${institution.somename}
listing 1. harvard-style freemarker template
structured representation. the simplified version of the bibliographic records data model that is used in the cris uns system is shown in figure 4. the cris uns system also contains other publication entities, such as monograph, journal paper, etc. the phd uns digital library is integrated into the cris uns system and uses the entities shown in figure 4. figure 4. data model. (figure 4 shows the thesis class, with the methods tohtmlrepresentation, tomarc21representation, todublincorerepresentation, and toetdmsrepresentation, as a specialization of the abstract publication class with title and publicationyear attributes; the diagram also includes the author class (firstname, lastname, address, email) and the institution class (name, address), connected to publications through the authors, supervisors, defendboardmembers, defendedat, and affiliations associations.) a structured representation for users outside the academic community is implemented with the help of the tohtmlrepresentation method contained in the thesis class. this method forms a structured representation of the dissertation with the help of html markup. when storing a dissertation in a database, the html representation is generated and stored in lucene indexes for faster representation of search results. bibliographic formats representation. after analyzing the bibliographic and library formats (see the methodology section above), we concluded that we should implement the search results representation in marc 21, dublin core, and etd-ms bibliographic formats for the needs of librarians who are in charge of entering data in the phd uns library. the representation of these formats is implemented in a similar way as the representation of structured data for users outside the academic community, with the help of the following methods: the tomarc21representation,
Screenshots of the user interface. Figure 5 presents the textual search results representation. The basic representation contains the metadata of the dissertation presented as a Harvard-style reference. This is the basic representation because researchers from the academic environment are the most common users of the PhD UNS library. Additional metadata are displayed by pressing the button located next to the reference; this shows the data structured for the needs of users outside the academic community (see fig. 6).

Figure 5. Results representation in a textual format.

In addition, the representation of the dissertation metadata is also available to library users in the MARC 21, Dublin Core, and ETD-MS formats (see fig. 6).

Figure 6. Structured and bibliographic formats representation.

Visual results representation

This section describes the implementation of the graphical (visual) search results representation. The graphical representation is realized using a word cloud to represent the content of a dissertation.

Word cloud generator component. The word cloud generator component forms a new part of the PhD UNS digital library. The aim of this new component is to present the user search results in a word cloud representation (see fig. 7).

Figure 7. Word cloud generator steps.

The word cloud component was implemented in Java. The component accepts a PDF file as input and generates an image (PNG file) as output. The tool takes as input the PDF file of the dissertation; it then parses the textual content of the file and performs a preprocessing of the text. The result of the preprocessing is a list of pairs containing the original version of each word from the text and its stem. The details of the tool utilized for this preprocessing step can be found in an existing publication.34 The tool then calculates the top frequencies of words in the text, generates the word cloud, and creates an image file. As mentioned above, the Kumo library was used for implementation purposes (https://github.com/kennycason/kumo). Kumo is open-source software released under the MIT license. Its source code has been extended to accommodate the needs of the PhD UNS digital library.

Integration into the PhD UNS system. The word cloud generator component described in the previous section has been integrated into the PhD UNS digital library application and was put into operation in April 2017; some necessary adaptations have been performed and integrated since then. Because word cloud generation is a lengthy and computationally intensive process, it is invoked in the indexing phase, and the generated image is stored as supplementary material to the PhD dissertation in the server file system. Figure 8 presents a Unified Modeling Language (UML) activity diagram which describes the process of adding a new dissertation to the PhD UNS digital library. The activity "generate word cloud image" is highlighted in red and represents invoking the execution of the word cloud component. Moreover, the activity "create Lucene index" includes the same text preprocessing steps as those described for the word cloud generator component (see fig. 7).
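Before turning to figure 8, the following sketch gives a rough illustration of the pipeline in figure 7: a dissertation PDF is turned into a PNG word cloud with the Kumo API. The use of Apache PDFBox for text extraction, the omission of the stemming step, and the chosen image size and frequency cut-offs are assumptions made for the sake of a compact example.

import com.kennycason.kumo.CollisionMode;
import com.kennycason.kumo.WordCloud;
import com.kennycason.kumo.WordFrequency;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.awt.Dimension;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: dissertation PDF -> plain text -> word frequencies -> PNG word cloud.
public class WordCloudGeneratorSketch {

    public void generate(File dissertationPdf, File outputPng) throws IOException {
        // 1. Extract the textual content of the dissertation (PDFBox is an assumed choice).
        String text;
        try (PDDocument document = PDDocument.load(dissertationPdf)) {
            text = new PDFTextStripper().getText(document);
        }

        // 2. Count word frequencies and keep the top 100
        //    (the real component also pairs each word with its stem).
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (token.length() >= 3) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<WordFrequency> frequencies = counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(100)
                .map(e -> new WordFrequency(e.getKey(), e.getValue()))
                .collect(Collectors.toList());

        // 3. Render the word cloud and write it as a PNG image.
        WordCloud wordCloud = new WordCloud(new Dimension(600, 600), CollisionMode.PIXEL_PERFECT);
        wordCloud.build(frequencies);
        wordCloud.writeToFile(outputPng.getPath());
    }
}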
Figure 8. Adding a new dissertation into the PhD UNS system.

The search results representation in the form of a word cloud is enabled via the user interface page for representing the search results of the PhD UNS digital library (see fig. 9).

Figure 9. Results of a search of the PhD UNS system in a word cloud format.

Personalization of representation

This section describes the implementation related to the personalization of the search results representation. The user can select the desired style of representation, and the representation history is recorded in order to personalize the results representation and adapt it to the user's profile and information needs. The initial search results representation style for users who search for dissertations in the PhD UNS system for the very first time is a random selection of one of two options:

• result representation in a textual format
• result representation as word cloud images

After analyzing the logs of changes to the results representation (see the next section), this random selection could be replaced with a choice that depends on the context: queries, devices, types of searches, etc. The parts of the page which show how the results are presented in the textual and word cloud representations are shown in figures 5 and 9, respectively. Users can change the representation style from the page. In this way, users give feedback and indicate their preference for the visualization of the results, which is used in future results representations for that user.

Evaluation

Collecting user feedback

If a digital library user changes the style of results representation, a message about the change of the representation style, together with the user metadata, is recorded using Log4j. This process is shown in red in the activity diagram in figure 10.

Figure 10. The process of executing queries and giving feedback on the representation style.

Listing 2 is an example of a recorded message about the change of the representation style containing user metadata from the PhD UNS system. Information such as the time, the territorial determinant of the web client, the user agent, and the representation style is also recorded. The representation style is stored in the user's browser in the form of cookies and represents the default style for representing results in future searches of dissertations in the PhD UNS system. By analyzing the messages about the change of the representation style, we evaluate the results of our approach and examine how the users respond to the new style of representation.
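The log entry in listing 2 below is produced by a setter on the search page's managed bean (SearchDissertationsManagedBean:setRepresentationStyle). The following minimal sketch shows how such a setter might record the change with Log4j and keep the chosen style in a cookie in a JSF application; Log4j 2 is assumed (the paper only says Log4j), the geolocation lookup is omitted, and all identifiers other than the logged field labels are illustrative, not taken from the PhD UNS source code.

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import javax.faces.context.ExternalContext;
import javax.faces.context.FacesContext;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Sketch of the feedback-collection step: log the style change with user
// metadata and remember the chosen style in a cookie for future searches.
public class SearchDissertationsBeanSketch {

    private static final Logger LOG = LogManager.getLogger(SearchDissertationsBeanSketch.class);

    public void setRepresentationStyle(String newStyle, String userId, String query, String searchingMode) {
        ExternalContext ctx = FacesContext.getCurrentInstance().getExternalContext();
        String ipAddress = ctx.getRequestHeaderMap().getOrDefault("X-Forwarded-For", "unknown");
        String userAgent = ctx.getRequestHeaderMap().get("User-Agent");

        // One pipe-separated record per change, in the spirit of listing 2.
        LOG.info("date and time: {}| session id: {}| userid: {}| ip address: {}| "
                + "user agent (device): {}| new representation style: {}| query: {}| searching mode: {}",
                new Date(), ctx.getSessionId(false), userId, ipAddress, userAgent,
                newStyle, query, searchingMode);

        // Persist the preferred style in a browser cookie (here: 30 days).
        Map<String, Object> cookieProps = new HashMap<>();
        cookieProps.put("maxAge", 60 * 60 * 24 * 30);
        ctx.addResponseCookie("representationStyle", newStyle, cookieProps);
    }
}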
[INFO] 22.08.2017. 16:07:33 (SearchDissertationsManagedBean:setRepresentationStyle) date and time: Tue Aug 22 16:07:33 CEST 2017| miliseconds: 1503410853455| + session id: 2a4ce66932d0c3c8db97098dff956074| userid: 150341083728649| ip address: 188.2.29.239| location: city: Belgrade, postal code: null, regionname: null (region: 00), countryname: Serbia (country code: RS), latitude: 44.818604, longitude: 20.468094| user agent (device): Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36 OPR/47.0.2631.55| new representation style: wordcloud

Listing 2. Example of a message about the change of the representation style.

Preprocessing users' feedback

As already indicated, each change in the representation style of the search results causes the creation of an appropriate message (see listing 2). In order to better understand the context of use and the reason for changing the representation style, these messages are preprocessed and supplemented with information on the type of search and the query which preceded the change in the representation style (highlighted in yellow in listing 3). By analyzing this additional information, we can understand which context of usage and which user actions preceded the change of the representation style. The additional information is obtained from the queries received by the PhD UNS system and is mapped by using a unique user session identifier. An example of a message after preprocessing is shown in listing 3.

[INFO] 22.08.2017. 16:07:33 (SearchDissertationsManagedBean:setRepresentationStyle) date and time: Tue Aug 22 16:07:33 CEST 2017| miliseconds: 1503410853455| + session id: 2a4ce66932d0c3c8db97098dff956074| userid: 150341083728649| ip address: 188.2.29.239| location: city: Belgrade, postal code: null, regionname: null (region: 00), countryname: Serbia (country code: RS), latitude: 44.818604, longitude: 20.468094| user agent (device): Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36 OPR/47.0.2631.55| new representation style: wordcloud| query: internet| searching mode: basic

Listing 3. Example of a message about the change of the representation style after preprocessing.

Analysis of user feedback

Messages such as the one in listing 3, with additional information about contextual use, are suitable for further analysis using ELK stack technologies. Messages in this format are collected from the logs of the PhD UNS system using the Logstash grok filter. This filter is used for parsing, statistical analysis based on field values, data filtering, and advanced search using multiple filters. The parsed messages are forwarded to the Elasticsearch component of the ELK stack. The grok pattern definition, which represents the rules and instructions for parsing messages, is located in configuration files that are passed as a parameter when running the tool. An example of a configuration file is shown in listing 4.
input {
  file {
    path => "/config-dir/logs-style-formatted/*.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    break_on_match => false
    match => { "message" => "%{LOGLEVEL:loglevel}" }
    match => { "message" => "date and time: %{DAY:logday} %{MONTH:logmonth} %{MONTHDAY:logmonthday} %{NUMBER:loghour}:%{NUMBER:logminute}:%{NUMBER:logsecond} %{WORD:logtimezone} %{YEAR:logyear}" }
    match => { "message" => "userid: %{NUMBER:userid}\|" }
    match => { "message" => "city: %{DATA:city}," }
    match => { "message" => "countryname: %{DATA:country} \(" }
    match => { "message" => "user agent \(device\): %{DATA:useragent}\|" }
    match => { "message" => "new representation style: %{DATA:newstyle}\|" }
    match => { "message" => "query: %{DATA:query}\|" }
    match => { "message" => "searching mode: %{DATA:searchingmode}\|" }
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
  }
}

Listing 4. An example of a grok pattern used to analyze the messages about the change of the representation style.

The analysis of the messages about the work of the PhD UNS system is presented in this section. The results are represented using the Kibana graph component of the ELK stack. This component is used for visualization and data exploration, analysis of logs over specified time intervals, and real-time monitoring of applications.

The word cloud generating component was put into operation in April 2017. Log messages were analyzed from then until the end of 2019. In total, 17,474 messages about changing the style of search results representation were analyzed. In these messages, the style was changed to the textual representation 16,032 times, while it was changed to the visual representation style in the form of a word cloud image 1,442 times. Thus, most of the users of the PhD UNS system changed the representation style to the textual rather than the visual format. This tells us that the majority of users are more familiar with the textual style of representing search results in interaction with scientific systems. Based on this analysis, it can be concluded that the random selection of the representation style of the results is not a good choice.

We also analyzed the client devices used when changing the representation style (textual and visual). Computers were used considerably more frequently than mobile devices. Devices with larger-resolution screens are more suitable for presenting search results in different formats. The distribution of changes in the representation style of the search results is similar for computers and mobile devices, so based on the device we cannot conclude which representation style is more suitable for the user.

The queries that were submitted by users before changing the style of representation were also analyzed; in other words, which queries and results representations initiated the change to the other style of representation. Figure 11 and figure 12 show the most commonly executed queries before changing the style of representation to the textual and visual format, respectively. Figure 11 shows the most commonly executed queries before changing to the textual style of representation.
Some of the queries shown in this figure represent the names of faculties of the university, such as

• fakultet tehnickih nauka (Faculty of Technical Sciences)
• filozofski fakultet (Faculty of Philosophy)

Figure 11. Top queries before changing to the textual style.

The most commonly executed queries before changing to the visual representation style are shown in figure 12. Some queries shown in this figure represent scientific fields, such as

• doktor medicinskih nauka (doctor of medical science)
• doktor geografskih nauka (doctor of geographical science)

Figure 12. Top queries before changing to the visual representation style.

Based on figures 11 and 12, we can conclude that the queries users submitted before the change in the style of representing the results are of a general type; that is, they are queries by faculty or by scientific field. These types of queries give long lists of results. For queries over longer periods of time, where the representation of all dissertations defended in a certain period is required, users changed the representation style to visual.

The search mode used before the change to the textual style is shown in figure 13, while the search mode used before the change to the visual representation style is shown in figure 14.

Figure 13. Search mode before changing to the textual style.

Figure 14. Search mode before changing to the visual style.

By analyzing figures 13 and 14, we conclude that most queries preceding the change of the representation style were issued from the basic search mode (labeled basic in the figures), which is the default search mode. We also notice that the share of the advanced search mode is higher before changes to the visual style than before changes to the textual style. This is in line with the analysis following figures 11 and 12, because in the advanced search mode users make queries for a time range, which gives long lists of results. We also notice that some users changed the style of results representation several times, so it is assumed that different types of information require a different representation style. There has been no decrease or increase in the number of users since the introduction of the word cloud generating component, which indicates that the introduction of the new component has not significantly affected the frequency of system use.

Conclusion

This paper describes an improvement of the user experience for the users of the PhD UNS digital library. This improvement was implemented through the personalization of the search results representation, which was put into operation in April 2017. Users of the PhD UNS digital library use desktop and laptop computers considerably more than mobile devices (RQ1). Moreover, besides specific exploratory queries, the users issue general queries by scientific field, faculty, or time range. The PhD UNS digital library has three user groups: those from the academic community, those from outside the academic community, and librarians in charge of entering the dissertation data.
For these three groups of users, the following textual search results representations (RQ2) have been selected and implemented: a Harvard-style representation of the dissertation in the form of a reference for users from the academic community; an HTML structured results representation for users outside the academic community; and MARC 21, Dublin Core, and ETD-MS bibliographic records for the library users. For the visual representation, a word cloud presentation based on the complete text from the PDF file of the dissertation has been selected and implemented. It is possible to select the desired search results representation, which initiates the storing of messages about the representation style of the results, the client device used, the time, etc. This message is joined with the preceding query message to analyze the patterns of system usage and establish a correlation between the change of the representation style and the type of query, device, and search mode (RQ3). Based on the conducted analysis, we reached the following conclusions:

• A significantly larger number of users of the PhD UNS system use the textual representation style rather than the visual representation. This tells us that a larger number of users are more familiar with the textual style of representing search results in interaction with scientific systems and that the random selection of the representation style of the results used since April 2017 was not a good choice for the first-time user. Because of this observation, the initial selection of the representation style for the first-time user was changed to the textual search results representation (RQ3).

• Some users changed the representation style of the results several times, and it is assumed that different types of information require a different representation style. Based on this, we can conclude that the possibility of personalizing the search results representation is a useful functionality that contributes to the improvement of the PhD UNS system and the user experience.

• It has been established that the most frequent change to the visual results representation occurs after a query which shows all the dissertations from a certain time period, issued from the advanced search mode, but there is no correlation between this change and the device being used. Based on this, it can be concluded that in certain cases, for queries which give long lists of results, it is clearer to view the results in the visual mode (RQ3). It is necessary to collect more data and carry out additional analysis in order to establish this correlation precisely, or to determine precisely to which queries and which types of users it applies, so that the system could automatically change the style of representation in certain cases.

Directions for future research and application development include the following. It is planned to collect and analyze additional messages about the work of the digital library in order to further enhance the user experience. It is also necessary to follow trends in results representation due to changes in standardized reference styles, bibliographic formats, technologies, and hardware devices, and to coordinate the results representation with these trends. Differences between the behavior of the different user groups will also be examined further.

Endnotes
1 J. Brophy and D. Bawden, "Is Google Enough? Comparison of an Internet Search Engine with Academic Library Resources," Aslib Proceedings 57, no. 6 (2005): 498–512, https://doi.org/10.1108/00012530510634235.

2 A. F. Smeaton and J. Callan, "Personalisation and Recommender Systems in Digital Libraries," International Journal on Digital Libraries 5, no. 4 (2005): 299–308, https://doi.org/10.1007/s00799-004-0100-1.

3 Iris Xie, Soohyung Joo, and Krystyna K. Matusiak, "Multifaceted Evaluation Criteria of Digital Libraries in Academic Settings: Similarities and Differences from Different Stakeholders," The Journal of Academic Librarianship 44, no. 6 (2018): 854–63, https://doi.org/10.1016/j.acalib.2018.09.002.

4 Theodora Nanou, George Lekakos, and Konstantinos Fouskas, "The Effects of Recommendations' Presentation on Persuasion and Satisfaction in a Movie Recommender System," Multimedia Systems 16, no. 4–5 (August 2010): 219–30, https://doi.org/10.1007/s00530-010-0190-0.

5 Georgia Kapitsaki and Dragan Ivanović, "Representation with Word Clouds at the PhD UNS Digital Library," Computer Science & Information Technology 21 (2017), https://doi.org/10.5121/csit.2017.71102.

6 Dragan Ivanović, "Software Systems for Increasing Availability of Scientific-Research Outputs," Novi Sad Journal of Mathematics – NS JOM 42, no. 1 (2012): 37–48.

7 Dragan Ivanović, Dušan Surla, and Zora Konjović, "CERIF Compatible Data Model Based on MARC 21 Format," The Electronic Library 29, no. 1 (2011): 52–70, https://doi.org/10.1108/02640471111111433.

8 Dragan Ivanović, "A Scientific-Research Activities Information System" (PhD thesis, University of Novi Sad, 2010); D. Ivanović, G. Milosavljević, B. Milosavljević, and D. Surla, "A CERIF-Compatible Research Management System Based on the MARC 21 Format," Program: Electronic Library and Information Systems 44, no. 1 (2010): 229–51; Dragan Ivanović and Branko Milosavljević, "Software Architecture of System of Bibliographic Data," in Proceedings of the XXI Conference on Applied Mathematics PRIM 2009, 85–94.

9 Lidija Ivanović, Dragan Ivanović, and Dušan Surla, "A Data Model of Theses and Dissertations Compatible with CERIF, Dublin Core and EDT-MS," Online Information Review 36, no. 4: 548–67, https://doi.org/10.1108/14684521211254068.

10 Lidija Ivanović, Dragan Ivanović, and Dušan Surla, "Integration of a Research Management System and an OAI-PMH Compatible ETDs Repository at the University of Novi Sad, Republic of Serbia," Library Resources & Technical Services 56, no. 2: 104–12, https://doi.org/10.5860/lrts.56n2.104.

11 Lidija Ivanović and Dušan Surla, "A Software Module for Import of Theses and Dissertations to CRISs," in Proceedings of the CRIS 2012 Conference (Prague, June 6–9, 2012): 313–22.

12 Lidija Ivanović, "Search of Catalogues of Theses and Dissertations," Novi Sad Journal of Mathematics – NS JOM 43, no. 1 (2013): 155–65; Lidija Ivanović, Dragan Ivanović, Dušan Surla, and Zora Konjović, "User Interface of Web Application for Searching PhD Dissertations of the University of Novi Sad," in Proceedings of the Intelligent Systems and Informatics (SISY), 2013 IEEE 11th International Symposium: 117–22.
13 Joel Azzopardi, Dragan Ivanović, and Georgia Kapitsaki, "Comparison of Collaborative and Content-Based Automatic Recommendation Approaches in a Digital Library of Serbian PhD Dissertations," in Proceedings of the International KEYSTONE Conference 2016: 100–11, https://doi.org/10.1007/978-3-319-53640-8_9.

14 Dragan Ivanović and Georgia Kapitsaki, "Personalisation of Keyword-Based Search on Structured Data Sources," in Proceedings of the 1st International KEYSTONE Conference (IKC 2015).

15 Núria Ferran, Enric Mor, and Julià Minguillón, "Towards Personalization in Digital Libraries through Ontologies," Library Management 26, no. 4/5 (2005): 206–17, https://doi.org/10.1108/01435120510596062.

16 Mark M. Sebrechts et al., "Visualization of Search Results: A Comparative Evaluation of Text, 2D, and 3D Interfaces," in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99): 3–10, https://doi.org/10.1145/312624.312634.

17 Frank H. Bowers and Stuart K. Card, "Method and Apparatus for Visualization of Database Search Results," U.S. Patent No. 5,546,529.

18 Sara Saad Soliman, Maged F. El-Sayed, and Yasser F. Hassan, "Semantic Clustering of Search Engine Results," The Scientific World Journal (2015), https://doi.org/10.1155/2015/931258.

19 Tien Nguyen and Jin Zhang, "A Novel Visualization Model for Web Search Results," IEEE Transactions on Visualization and Computer Graphics 12, no. 5 (2006), https://doi.org/10.1109/tvcg.2006.111.

20 Daniel Scanfeld, Vanessa Scanfeld, and Elaine L. Larson, "Dissemination of Health Information through Social Networks: Twitter and Antibiotics," American Journal of Infection Control 38, no. 3 (2010): 182–88, https://doi.org/10.1016/j.ajic.2009.11.004.

21 Carmel McNaught and Paul Lam, "Using Wordle as a Supplementary Research Tool," The Qualitative Report 15, no. 3 (2010): 630; Weiwei Cui et al., "Context Preserving Dynamic Word Cloud Visualization," in IEEE Pacific Visualization Symposium (PacificVis) (2010): 121–28, https://doi.org/10.1109/pacificvis.2010.5429600; Yusef Hassan-Montero and Víctor Herrero-Solana, "Improving Tag-Clouds as Visual Information Retrieval Interfaces," in Proceedings of the International Conference on Multidisciplinary Information Sciences and Technologies (2006): 25–28.

22 Byron Kuo, Thomas Hentrich, Benjamin Good, and Mark Wilkinson, "Tag Clouds for Summarizing Web Search Results," in Proceedings of the 16th International Conference on World Wide Web (WWW '07) (2007): 1203–04, https://doi.org/10.1145/1242572.1242766.

23 Jong-Yi Hong, Eui-Ho Suh, and Sung-Jin Kim, "Context-Aware Systems: A Literature Review and Classification," Expert Systems with Applications 36, no. 4 (2009): 8509–22, https://doi.org/10.1016/j.eswa.2008.10.071; Georgia M. Kapitsaki, George N. Prezerakos, Nikolaos D. Tselikas, and Iakovos S. Venieris, "Context-Aware Service Engineering: A Survey," Journal of Systems and Software 82, no. 8 (2009): 1285–97, https://doi.org/10.1016/j.jss.2009.02.026.
24 Gregory D. Abowd et al., "Towards a Better Understanding of Context and Context-Awareness," in Handheld and Ubiquitous Computing (Berlin/Heidelberg: Springer, 1999): 304–07, https://doi.org/10.1007/3-540-48157-5_29.

25 Mika Raento, Antti Oulasvirta, Renaud Petit, and Hannu Toivonen, "ContextPhone: A Prototyping Platform for Context-Aware Mobile Applications," IEEE Pervasive Computing 4, no. 2 (2005): 51–59, https://doi.org/10.1109/mprv.2005.29.

26 Josef Fink and Alfred Kobsa, "A Review and Analysis of Commercial User Modeling Servers for Personalisation on the World Wide Web," User Modeling and User-Adapted Interaction 10, no. 2 (2000): 209–49, https://doi.org/10.1023/a:1026597308943.

27 Hema Yoganarasimhan, "Search Personalization Using Machine Learning," Management Science 66, no. 3 (2020): 1045–70, https://doi.org/10.1287/mnsc.2018.3255.

28 Cristiane Behnert and Dirk Lewandowski, "Ranking Search Results in Library Information Systems—Considering Ranking Approaches Adapted from Web Search Engines," The Journal of Academic Librarianship 41, no. 6 (2015): 725–35, https://doi.org/10.1016/j.acalib.2015.07.010.

29 Enrique Frias-Martinez, George Magoulas, Sherry Chen, and Robert Macredie, "Automated User Modeling for Personalised Digital Libraries," International Journal of Information Management 26, no. 3 (2006): 234–48, https://doi.org/10.1016/j.ijinfomgt.2006.02.006.

30 Enrique Frias-Martinez, Sherry Chen, and Xiaohui Liu, "Evaluation of a Personalised Digital Library Based on Cognitive Styles: Adaptivity vs. Adaptability," International Journal of Information Management 29, no. 1 (2009): 48–56, https://doi.org/10.1016/j.ijinfomgt.2008.01.012.

31 Magda El-Sherbini and George Klim, "Metadata and Cataloging Practices," The Electronic Library 22, no. 3 (2004): 238–48, https://doi.org/10.1108/02640470410541633; Shawn Averkamp and Joanna Lee, "Repurposing ProQuest Metadata for Batch Ingesting ETDs into an Institutional Repository" (University Libraries Staff Publications, 2009): 38; Sai Deng and Terry Reese, "Customized Mapping and Metadata Transfer from DSpace to OCLC to Improve ETD Workflow," New Library World 110, no. 5/6 (2009): 249–64, https://doi.org/10.1108/03074800910954271.

32 Steffen Lohmann, Jürgen Ziegler, and Lena Tetzlaff, "Comparison of Tag Cloud Layouts: Task-Related Performance and Visual Exploration," in Proceedings of the IFIP Conference on Human-Computer Interaction (2009): 392–404, https://doi.org/10.1007/978-3-642-03655-2_43.

33 Yuping Jin, "Development of Word Cloud Generator Software Based on Python," Procedia Engineering 174 (2017): 788–92, https://doi.org/10.1016/j.proeng.2017.01.223.
34 Joel Azzopardi, Dragan Ivanović, and Georgia Kapitsaki, "Comparison of Collaborative and Content-Based Automatic Recommendation Approaches in a Digital Library of Serbian PhD Dissertations," in Proceedings of the International KEYSTONE Conference 2016: 100–11, https://doi.org/10.1007/978-3-319-53640-8_9.

Communication

Navigation Design and Library Terminology: Findings from a User-Centered Usability Study on a Library Website

Isabel Vargas Ochoa

Information Technology and Libraries | December 2020
https://doi.org/10.6017/ital.v39i4.12123

Isabel Vargas Ochoa (ivargas2@csustan.edu) is Web Services Librarian, California State University, Stanislaus. © 2020.

Abstract

The university library at California State University, Stanislaus is not only undergoing a library building renovation, but a website redesign as well. The library conducted a user-centered usability study to collect data in order to best lead the library website "renovation." A prototype was created to assess an audience-based navigation design, homepage content framework, and heading terminology. The usability study consisted of 38 student participants. It was determined that a topic-based navigation design will be implemented instead of an audience-based navigation, a search-all search box will be integrated, and the headings and menu links will be modified to avoid ambiguous library terminology. Further research on different navigation and content designs, and on usability design approaches, will be explored in future studies.

Introduction

The university library at California State University, Stanislaus is currently undergoing a much anticipated and necessary redesign of the library website. Website redesigns are crucial and a part of website maintenance needed to keep pace with modern technology and meet accessibility standards. "If librarians are expected to be excellent communicators at the reference desk and in the classroom, then the library website should complement the work of a librarian."1 In this case, a library website prototype was created, using a Springshare LLC product, LibGuides CMS, as the testing subject for our user-centered usability study. The usability study was completed with 38 student participants belonging to different academic years and areas of study. The library website prototype tested was designed using a user-based design framework and an audience-based navigation.
This study found issues reported by users concerning the navigation design and ambiguous library terminology. An audience-based navigation was chosen in order to organize and group the information and services offered in a way that makes them most accessible to users. However, an audience-based navigation will directly affect users and their search behaviors.2 The prototype, like the current library website, did not have a search-all search box during the study. A catalog search box was utilized to test whether or not the catalog was enough for student participants to find information. This also forced the participants to utilize the menu navigation.

Literature Review

The design and approach of usability studies, preferences for types of search boxes, navigation design, and library terminology evolve over time in parallel with technology changes. Most recent usability studies use screen and audio recording tools as opposed to written observation notes. Participants in recent studies are also more adept at navigating websites than participants in usability studies twenty years ago. Regardless, it's crucial to compare the results from previous usability studies to analyze differences and similarities.

Different types of usability studies include user-centered usability studies and heuristic usability studies. This study chose a user-centered approach because of the library's desire to collect data and feedback from student users. The way in which the usability study is presented is also critical to the approach. Website usability studies are meant to test the website, although participants may unconsciously believe they are being tested. In Tidal's library website case study (2012), researchers assured the participants that "the web site was being tested and not the participants themselves."3 This unconscious belief may also affect the data collected from the participants and "influence user behavior, including number of times students might attempt to find a resource or complete a given task."4

The features tested were the navigation design and homepage elements. The navigation design in the prototype was developed to test an audience-based navigation design (see figure 1). An audience-based navigation design organizes the navigation content by audience type.5 That is to say, the user will begin their search by identifying themselves first. Although this design can organize content in a more efficient manner, especially for organizations that have specific, known audiences, critics argue that this design forces users to identify themselves before searching for information, thus taking them out of their task mindset.6 For this usability study, I wanted to test this navigation design and compare the results to our current navigation design, which is a topic-based navigation design. A topic-based navigation design is developed to present topics as navigation content.7 This design is our current library website navigation design (see figure 2).

Figure 1. Screenshot of the audience-based navigation design developed for the library website prototype.

Figure 2. Screenshot of the current content-based navigation design in the library website.

Designing the navigation and homepage also means choosing accessible terms that are relevant to all users.
Unfortunately, over the course of many decades, library terminology has been a hindrance for student users. Terms such as "catalog," "reference," and "research guides" are still difficult for users to understand. As Conrad states (2019), "students are not predisposed to think of a 'research guide' as a useful tool to help them get started."8 A research guide isn't necessarily a self-explanatory term. In many ways, the phrase is ambiguous. Augustine's case study in 2002 had similar difficulties. Students' "lack of understanding and awareness of library resources impacted their ability more than the organization of the site did."9 It's unsettling to know that our own terminology has been deterring users from accessing library resources for decades. Librarians use library terminology to such an extent that it's part of our everyday language, but what is common knowledge to us may be completely alien to our very own audience.

Not only should libraries be aware of confusing library terms, but content should also not overwhelm the user with an abundance of information. Most students who visit the library are looking for something specific and easy to find. It's important for librarians to condense their information on guides or website pages so as not to frustrate the user or make them search elsewhere, like Google. "Students scan . . . rather than [read] material."10 This is also something that has been noted from our Crazy Egg statistics. Heatmaps of our website's pages show that users are not scrolling to the bottom of the pages. This also applies to the use of large images, or unnecessarily flashy or colorful content that covers most of the desktop or mobile screen. These images should be reduced in size so that users can find information swiftly. For this reason, any large design on the homepage should also be included in menu links, in case large flashy content is ignored.11

The search box is another fundamental element I analyzed. In this case study, our search box was the catalog search box for Ex Libris Primo. If a page, particularly the homepage, has two search boxes—search-all and catalog search—the user can be confused. Search boxes are primarily placed at the center of the page. Depending on how these search boxes are labeled and identified, users may not know which one to use. Students approach library search boxes as if searching Google.12 In our case, neither the current website nor the prototype has a general search-all box. We have a catalog search box placed at the top center of the homepage for both sites. If we were to add a general search-all box, it would be placed away from the catalog search box and preferably in the header, where it is visible on all pages.

Methodology

The usability study was conducted by the author, the web services librarian at California State University, Stanislaus, who also worked with a computer science instructor in order to recruit participants. Not only is the university library redesigning its website, but the university library building is also undergoing a physical renovation. Due to this project, the library has relocated to the Library Annex, a collection of modular buildings providing library services to the campus community. The usability study was conducted in a quiet study room in one of these modular sites. I reserved this study area and borrowed eight laptops for the sessions. The usability study employed two different methods to get students to participate.
The first was an extra credit incentive, arranged in collaboration with the computer science instructor. This instructor was teaching a course on human-centered design for websites. She offered her students an extra credit incentive, since several of her learning objectives centered on website design and usability studies. The second approach was an informal one. It consisted of scouting students who were already at the Library Annex during the scheduled usability study sessions. This enabled students to participate without having to sign up or remember to participate. The students were recruited in person during the usability sessions and through flyers posted in study rooms on the days of the study. An incentive of snacks for students to take home was also included.

I created questions and seven tasks to be handed out to the participants during the study. The tasks were created to test the navigation design of the main menu and the content on the homepage. I also added a task to test the research skills of the student. After these tasks, students were asked to rate the ease of access, answer questions about their experience navigating the prototype, and provide feedback. All students were given the same tasks; however, if the student was taking the human-centered design course, they were also given specific web design questions for feedback (see appendices A and B). The tasks were piloted before the study with three library student workers, who provided feedback on how to better word the tasks for students. The following are the final seven tasks used for the usability study:

1. Find research help on citing legal documents—a California statute—in APA style citation.
2. Find the library hours during spring break.
3. Find information on the library study spaces hours and location.
4. You're a student at the Stan State campus and you need to request a book from Turlock to be sent to Stockton. Fill out the request-a-book form.
5. You are a graduate student and you need to submit your thesis online. Fill out the thesis submission form.
6. For your history class, you need to find information on the university's history in the University Archives and Special Collections. Find information on the University Archives and Special Collections.
7. Find any article on salmon migration in Portland, Oregon. You need to print it, email it to yourself, and you also need the article cited.

The usability study sessions took place from 11 a.m. to 2 p.m. on February 10, 12, and 14, 2020. These days and times were chosen because the snack incentive would attract students during the lunch hour and I wanted to accommodate the start and end times of the human-centered design course on Mondays, Wednesdays, and Fridays. The total time it took for students to complete the seven tasks averaged 15 minutes. In total, there were 38 student participants. The students' experiences were recorded anonymously. I asked students to provide their academic year and major. Students included freshmen (5), sophomores (2), juniors (12), seniors (17), graduate students (1), and unknown (1). Areas of study included computer science (16), criminal justice (2), business (2), psychology (3), communications (1), sociology (1), English (3), nursing (1), Spanish (1), biology (3), geology (1), history (2), math (1), gender studies (1), and undeclared (1).
The subject tested was the library website prototype created and executed using a Springshare LLC product, LibGuides CMS. The tools I used were eight laptops and a screen recording tool, Snagit. Snagit is a recording tool made accessible through a campus subscription. The laptops were borrowed from the library for the duration of the sessions. During the sessions, students navigated and completed the tasks on their own with no direct interference, including no direct observations. I planned to create a space where my presence didn't directly influence or intimidate their experience with the website. My findings were based solely on their written responses and screen recordings. Because students had to sign in to the laptops using their campus student ID, I also explained to them that their screen-recorded video would not be linked to their identity. I did, however, occasionally walk around the tables in the room in case a student was navigating the current website or using a separate site to complete the tasks. Once the students completed the tasks and answered the questions, I collected the handouts and the screen-capture videos by copying them to a flash drive.

Limitations

During the usability study sessions, there were two technical issues that hindered the initial process. On the first day, there were difficulties accessing the campus Wi-Fi in the room as well as difficulties accessing the Snagit video recording application. This limitation affected some of the students' experiences and feedback. These issues were resolved and not present on the second and third days of the study.

Results and Observations

The results and observations collected from this study mirror results from the studies conducted by Azadbakht and Swanson.13 I found that students used the catalog search box to search for library collections, citations, and other library terms they didn't understand, even though it was a catalog search box labeled with the keywords "find articles, books, and other materials" in the search bar. Another finding was that the navigation design can detrimentally affect a user's experience with the website. Mixed reviews were received on the audience-based navigation design. The study also found that students are adept at finding research materials. For example, most students knew how to search, find, print, email, and cite an article. Students in general are also familiar with book requests, ILL accounts, and filling out book request webforms. This indicates that, in terms of utilizing library services, students are well aware of how to find, request, and acquire resources using the website on their own. What was most difficult for students was interpreting library terminology. This was explicitly shown in their attempts to complete tasks 1 and 6: finding how to cite a legal document in APA style and finding information on Special Collections and the University Archives.

The following results and observations are divided into three categories: written responses, video recording observations, and data collected. Data was collected based on observations from the video recordings and the written responses. Data was then input into eight separate charts.
Written Responses Observations

Comments from both the students in the human-centered design course and the other student participants included mixed reviews of the navigation layout, an overall positive outlook on the page layout design, suggestions to add a search-all "search bar," and frustrations with tasks 1 and 6.

Video Recording Observations

The Ex Libris Primo search box was constantly mistaken for a search-all search box. This occurred during students' searches for tasks 1 and 6: citation help and University Archives, respectively. Students also used the research guides search box in LibGuides as a search-all search box. Students found the citation style guides easily because of this feature; however, on the proposed new website it was difficult to find citation help. Students were also using research guides to complete other tasks, such as task 6. A search bar for the entire website was continuously mentioned as a solution by student participants.

Tasks 2 and 3, regarding library hours and study spaces, were easily completed. Tasks 4 and 5 were also easily accessible. After completing task 4 (book request form), it was easier for participants to complete task 5 (thesis submission form) because both tasks required students to search the top main navigation menu. To complete task 4, several students immediately signed in to their ILL account or logged in to Primo for CSU+, which was expected, as signing in to these accounts is an alternate way to request a book. An additional observation regarding task 4 is that confusion around the library term "call number" was resolved by adding an image reference pointing to the call number in the catalog. The call number image reference was opened several times for assistance. Most students completed task 7 (find a research article), but not all students used the catalog search box on the homepage to complete it. Several students searched the top main navigation and clicked on the "research help" link. Others utilized research guides and the research guides search box on the homepage.

A notable observation concerned some computer science students. Most computer science students were quicker to give up on a task than non-computer science students. Some computer science students did not scroll down when browsing pages. These students failed to complete several tasks because they didn't scroll down the page after being on the page for less than ten seconds.

Data Collected

Figure 3. Ease of navigation (overall).

Figure 3 illustrates the ease of navigation rating overall from all student participants. Students were asked to rate the ease of access of the website (see appendices A and B). Other than the keywords "ease of navigation (1 difficult; 10 easy)," students were given the freedom to define what "easy" and "difficult" meant to them individually. The mean ease of access rating for all student participants was 7.7. The lowest rating was 3 and the highest rating was 10.

Figure 4. Ease of navigation (computer science major).

Figure 4 illustrates the ease of access rating by student participants based on whether or not the student was a computer science major. The lowest ease of access ratings were from computer science majors.
Overall, non-computer science majors had higher ease of access ratings than computer science majors.

Figure 5. Ease of navigation (human-centered design).

Figure 5 illustrates the ease of access rating by student participants based on whether the student was taking the human-centered design course. The human-centered design students' learning outcomes include website user-interface design and an assignment on how to create a usability study. Similar to the pattern found in figure 4, human-centered design students had lower ease of access ratings.

Figure 6. Tasks – status of completion.

Figure 6 illustrates whether a task was completed or not. A task was counted as completed if the student found the page(s) that provided the solution to the task, and as not completed if the student was unable to find those page(s). "Not applicable" was recorded if the student did not use the website prototype (e.g., followed a link that led elsewhere or opted to use Google search instead). Most students completed tasks 2, 3, 4, 5, and 7. The task with the most "did not complete" results was task 1, which 64 percent of student participants did not complete. Task 6 had a middling completion rate of 63 percent. 86 percent of students completed tasks 2 and 4, and 90 percent of students completed tasks 3, 5, and 7. It is evident that task 1 was a difficult task to complete, regardless of the student's area of study. Task 1 required students to find APA legal citation help. The terms "APA legal citation" confused users. Likewise, for task 6 (special collections), students did not understand what "collections" referred to or where to search for them.

Figure 7. Tasks – number of clicks (complete).

Figure 7 illustrates how many clicks it took students to complete each task. The clicks were separated into three categories: 1-2 clicks, 3-5 clicks, and more than 6 clicks. This figure only illustrates data collected from tasks that were completed. The count of clicks began at the website prototype's homepage or at the main menu navigation found in the website prototype's header, when it was evident that the student was starting a new task. Tasks 2 and 3 were completed in 1-2 clicks, whereas tasks 1, 4, 5, 6, and 7 required an average of 3-5 clicks. Because of experience helping students find articles at the librarians' research help desk, task 7 (find research articles) was expected to require 6+ clicks. Task 1 may show a pattern of needing a high number of clicks because it was generally a difficult task to complete.

Figure 8. Tasks – number of clicks (did not complete).

Figure 8 illustrates how many clicks a student participant made before they decided to skip the task or believed they had completed it. This figure only illustrates data from tasks that were not completed. The clicks were separated into three categories: 1-2 clicks, 3-5 clicks, and more than 6 clicks. The count of clicks began at the website prototype's homepage or at the main menu navigation found in the website prototype's header, when it was evident that the student was starting a new task.
Tasks 1 and 6 show the clearest patterns in this figure. Task 1 (citation help) shows that students generally skipped the task after more than 6 clicks. Task 6 (special collections) was generally skipped after 3-6+ clicks.

Figure 9 illustrates the duration to complete each task. The duration was separated into three categories: 0-1 minutes, 1-3 minutes, or more than 3 minutes. This figure only illustrates data for tasks that were completed. The duration began when the student started a new task. This was determined when it was observed that the student started to use the main menu navigation, or directed their screen back to the website prototype's homepage. There are parallels between the number of clicks and the duration of tasks. For tasks 2, 3, and 5, the duration to complete the task was less than 1 minute. Task 5 was similar to task 4 (both are forms, linked once on the website), but the duration for task 5 may have averaged lower than the duration of task 4 because task 5 came after task 4. Having completed a form before task 5 may have influenced the students' behavior when searching for forms. Tasks 1, 6, and 7 averaged 1-3 minutes to complete.

Figure 9. Tasks – question duration (complete).

Figure 10. Tasks – question duration (did not complete).

Figure 10 illustrates the duration of each task that wasn't completed. The duration was separated into three categories: 0-1 minutes, 1-3 minutes, or more than 3 minutes. This figure only illustrates data for tasks that were not completed. The duration began when the student started a new task. This was determined when it was observed that the student started to use the main menu navigation, or directed their screen back to the website prototype's homepage. Similar to the observations for figure 7, there are parallels between the number of clicks and the duration of tasks. For task 1, the average time before students skipped the task varied; however, most students who didn't complete the task skipped it after more than 3 minutes of trying to complete it. For task 6, the average duration before skipping the task was 1-3 minutes.

Conclusion and Recommendations

This study was primarily designed to test the user-centered study approach and the navigational redesign of the library website. The results, however, provided the library with a variety of outcomes. Based on suggestions and comments on the website prototype's navigation design, menus, and page content, there are several elements that will be integrated to help lead the redesign of the library's website. Students found that the navigation design of the website was clear and simple, but also required some getting used to. Because of this, and in light of the navigation design literature, it is recommended to design a menu navigation that is topic-based as opposed to audience-based. Our findings also highlighted the effects of the use of library terms. To make menu links exceptionally user-friendly, it is recommended to utilize clear and common terminology. Student participants also voiced that a search-all search box for the website was necessary. This will enable users to access information efficiently.
library website developers should also map more than one link to a specific page, especially if the only link to the page is on an image or slideshow. the user-centered usability approach for this case study worked well in collaboration with campus faculty and as an informal recruitment method. it provided relevant and much-needed data and feedback for the university library. in terms of future usability studies, a heuristic approach may be effective. a heuristic study approach will enable moderators to gather feedback and analysis from library web development experts.14 moreover, the usability study could be conducted over a semester-long period and include focus groups to acquire consistent feedback.15 overall, website usability studies are evolving and require constant improvement and research.

appendix a

major: ___________ year (freshman, sophomore, etc.): ______________
link to site: url
please do not use url

please complete the following situations. for some of these, you don't need to actually submit/send, but pretend as if you are.
1. find research help on citing legal documents (a california statute) in apa style citation.
2. find the library hours during spring break.
3. find information on the library study spaces hours and location.
4. you're a student at the stan state campus and you need to request a book from turlock to be sent to stockton. fill out the request-a-book form.
5. you are a graduate student and you need to submit your thesis online. fill out the thesis submission form.
6. for your history class, you need to find information on the university's history in the university archives and special collections. find information on the university archives and special collections.
7. find any article on salmon migration in portland, oregon. you need to print it, e-mail it to yourself, and you also need the article cited.

complete the following questions.
1. rate the ease of access of the website (1 = really difficult to navigate, 10 = easy to navigate) 1 2 3 4 5 6 7 8 9 10
2. did you ever feel frustrated or confused? if so, during what question?
3. do you think the website provides enough information to answer the above questions? why or why not?

appendix b

cs 3500
major: ___________ year (freshman, sophomore, etc.): ______________
link to site: url
please do not use url

please complete the following situations. for some of these, you don't need to actually submit/send, but pretend as if you are.
1. find research help on citing legal documents (a california statute) in apa style citation.
2. find the library hours during spring break.
3. find information on the library study spaces hours and location.
4. you're a student at the stan state campus and you need to request a book from turlock to be sent to stockton. fill out the request-a-book form.
5. you are a graduate student and you need to submit your thesis online. fill out the thesis submission form.
6. for your history class, you need to find information on the university's history in the university archives and special collections. find information on the university archives and special collections.
7. find any article on salmon migration in portland, oregon. you need to print it, e-mail it to yourself, and you also need the article cited.

then, complete the following questions.
1.
rate the ease of access of the website (1= really difficult to navigate, 10=easy to navigate) 1 2 3 4 5 6 7 8 9 10 2. what did you think of the overall web design? 3. what would you change about the design? please be specific. 4. what did you like about the design? please be specific. information technology and libraries december 2020 navigation design and library terminology | ochoa 15 endnotes 1 mark aaron polger, “student preferences in library website vocabulary,” library philosophy and practice, no. 1 (june 2011): 81, https://digitalcommons.unl.edu/libphilprac/618/. 2 jakob nielsen, “is navigation useful?,” nn/g nielsen norman group, https://www.nngroup.com/articles/is-navigation-useful/. 3 junior tidal, “creating a user-centered library homepage: a case study,” oclc systems & services: international digital library perspectives 28, no. 2 (may 2012): 95, https://doi.org/10.1108/10650751211236631. 4 suzanna conrad and christy stevens, “‘am i on the library website?’: a libguides usability study,” information technology and libraries (online) 38, no. 3 (september 2019): 73, https://doi.org/10.6017/ital.v38i3.10977. 5 eric rogers, “designing a web-based desktop that's easy to navigate,” computers in libraries 20, no. 4 (april 2000): 36, proquest. 6 katie sherwin, “audience-based navigation: 5 reasons to avoid it,” nn/g nielsen norman group, https://www.nngroup.com/articles/audience-based-navigation/. 7 rogers, “designing a web-based desktop that's easy to navigate,” 36. 8 conrad, “‘am i on the library website?’: a libguides usability study,” 71. 9 susan augustine and courtney greene, “discovering how students search a library web site: a usability case study,” college & research libraries 63, no. 4 (july 2002): 358, https://doi.org/10.5860/crl.63.4.354. 10 conrad, “‘am i on the library website?’: a libguides usability study,” 70. 11 kate a. pittsley and sara memmott, “improving independent student navigation of complex educational web sites: an analysis of two navigation design changes in libguides,” information technology and libraries 31, no. 3 (september 2012): 54, https://doi.org/10.6017/ital.v31i3.1880. 12 elena azadbakht, john blair, and lisa jones, “everyone's invited: a website usability study involving multiple library stakeholders,” information technology and libraries 36, no. 4 (december 2017): 43, https://doi.org/10.6017/ital.v36i4.9959. 13 azadbakht, “everyone's invited,” 43; troy a. swanson and jeremy green, “why we are not google: lessons from a library web site usability study,” the journal of academic librarianship 37, no. 3 (february 2011): 226, https://doi.org/10.1016/j.acalib.2011.02.014. 14 laura manzari and jeremiah trinidad-christensen, “user-centered design of a web site for library and information science students: heuristic evaluation and usability testing ,” information technology and libraries 25, no. 3 (september 2006): 164, https://doi.org/10.6017/ital.v25i3.3348. 15 tidal, “creating a user-centered library homepage: a case study,” 97. 
article

beyond viaf: wikidata as a complementary tool for authority control in libraries

carlo bianchini, stefano bargioni, and camillo carlo pellizzari di san girolamo

information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12959

abstract
this paper aims to investigate the reciprocal relationship between viaf® and wikidata and their possible roles in the semantic web environment. it deals with their data, their approach, their domain, and their stakeholders, with particular attention to identification as a fundamental goal of universal bibliographic control. after examining interrelationships among viaf, wikidata, libraries and other glam institutions, a double approach is used to compare viaf and wikidata: first, a quantitative analysis of viaf and wikidata data on personal entities, presented in eight tables; and second, a qualitative comparison of several general characteristics, such as purpose, scope, organizational and theoretical approach, data harvesting and management (shown in table 9). quantitative data and qualitative comparison show that viaf and wikidata are quite different in their purpose, scope, organizational and theoretical approach, data harvesting, and management. the study highlights the reciprocal role of viaf and wikidata and its helpfulness in the worldwide bibliographical context and in the semantic web environment and outlines new perspectives for research and cooperation.

introduction
in 2011, the library linked data incubator group, a w3c working group with the aim "to help increase global interoperability of library data on the web," published its final report. two interrelated issues were tackled in that milestone report: what libraries can do for the semantic web and what the semantic web can do for libraries. linked data is an important asset for libraries as the "use of identifiers allows diverse descriptions to refer to the same thing. through rich linkages with complementary data from trusted sources, libraries can increase the value of their own data beyond the sum of their sources taken individually."1 so linked data greatly contribute to library cataloguing work not just for description of resources but also for their proper identification. on the other hand, libraries have always created and curated a significant amount of valuable information assets and library authority data for names and subjects to help reduce "redundancy of bibliographic descriptions on the web by clearly identifying key entities that are shared across linked data.
this will also aid in the reduction of redundancy of metadata representing library holdings."2 the report opened a new way of thinking about universal bibliographic control (ubc), a "worldwide system for control and exchange of bibliographic information" (https://archive.ifla.org/ubcim/ubcim-archive.htm), the purpose of which is "to make universally and promptly available, in a form which is internationally acceptable, basic bibliographic data on all publications in all countries."3 exchanging information and data requires standards, at both the national and international level, for description, identification, and data format.

carlo bianchini (carlo.bianchini@unipv.it) is associate professor, department of musicology and cultural heritage, university of pavia. stefano bargioni (bargioni@pusc.it) is deputy director, library of the pontifical university santa croce (rome). camillo carlo pellizzari di san girolamo (camillo.pellizzaridisangirolamo@sns.it) is graduate student, department of classics, university of pisa and scuola normale superiore. © 2021.

nowadays, a pillar of ubc is viaf® (the virtual international authority file), a worldwide project designed by a few national libraries and run by oclc, which combines multiple name authority files with the goal "to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the web [https://www.viaf.org/]." it "clusters together the various forms of names for an entity" and has become "a major source for authority control and is becoming the collective reference source at the international level."4 viaf is a fundamental tool for the identification of entities (people, locations, works, and expressions) relevant for the bibliographic universe. yet, as it is based on the harvesting of data from authoritative national libraries spread all over the world, it has a top-down approach: libraries and services that are not viaf sources can only refer to viaf, but not actively cooperate with it, and, by its nature, viaf cannot admit user cooperation. therefore, on a global scale, a very large number of local libraries are excluded, and their data, collections, and specificities are, too.
furthermore, since the design and development of viaf at the beginning of the 21st century, the semantic web environment has hugely evolved, and libraries are more and more required to act in new directions and to explore new forms of cooperation.5 illien and bourdon maintain not only that libraries "must now be careful to keep up their own interoperability," but also that they "would be well-advised to keep up or enter into dialogue with the most influential communities in the web of data—smoothing out their own disputes in the meantime."6 moreover, they believe that "building collaborative authority registries linked to standardized identifiers is one of the fundamental cornerstones of the new universal bibliographic control."7 also, dunsire and willer suggest that a "smart ubc should strive to support all those who wish to think globally and act locally, with a better mix of bottom-up and top-down methodologies," since "attempts to implement ubc as a worldwide system for the control and exchange of bibliographic information using top-down methodologies have only partially succeeded at global scale."8 as a result, a better integration of libraries into the semantic web seems to require the involvement of a larger group of stakeholders—such as non-national agencies, museums, archives, and users—and the adoption of a complementary bottom-up approach.

a new global actor of the semantic web has both a bottom-up and a very inclusive approach: wikidata. wikidata is a freely available hosted platform that anyone—including libraries—can use to create, publish, and use linked open data (lod). since 2012, many users have been involved in a bottom-up approach to identity management in wikidata. furthermore, interest in and experience with the use of wikidata to publish lod among glam (galleries, libraries, archives, and museums) institutions is constantly increasing.9 wikidata's role as an important tool for the identification of entities of any kind—not just those of traditional importance to glam—has likewise been increasingly recognized in recent years.10

so, two worldwide identification tools, two different backgrounds, two opposite approaches. are they mutually exclusive, or integrable? is one of them sufficient for libraries' needs, or do libraries need both? which stakeholders are best served by viaf? which are best served by wikidata? this paper investigates the reciprocal relationship between viaf and wikidata and their possible specific roles in the semantic web environment with respect to their approach, their domain, and their stakeholders, with particular attention to identification as a fundamental goal of ubc.

relationship between viaf and libraries
viaf gathers a huge quantity of authority data from more than 50 sources, listed on the home page of the project (https://viaf.org). millions of records coming from national libraries and other institutions are continuously processed using algorithms based on the matching of data and bibliographic relationships, with the goal of creating clusters of names (figure 1).11

figure 1. viaf cluster for wolfgang amadeus mozart

clusters are usable in many services "to identify names, locations, works, and expressions while preserving regional preferences for language, spelling, and script" (https://www.oclc.org/en/viaf.html). clusters may contain one or more ids from viaf sources.
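the source ids gathered in a cluster such as the one in figure 1 can also be inspected programmatically. the sketch below is not part of the article's toolchain; it assumes viaf's justlinks.json view of a cluster (a mapping from source codes such as lc, dnb, or wkp to identifiers), which should be verified against the current service, and it uses the mozart cluster id shown in figure 1.

```python
"""sketch only: list the source ids gathered in one viaf cluster,
assuming the justlinks.json view of the cluster."""
import requests

def cluster_sources(viaf_id: str) -> dict:
    url = f"https://viaf.org/viaf/{viaf_id}/justlinks.json"
    response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
    response.raise_for_status()
    data = response.json()
    # keep only the per-source id lists, dropping bookkeeping keys such as "viafID"
    return {source: ids for source, ids in data.items() if isinstance(ids, list)}

if __name__ == "__main__":
    # 32197206 is the mozart cluster shown in figure 1
    for source, ids in cluster_sources("32197206").items():
        print(source, ids)
```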
furthermore, the unique identifiers of clusters (a viaf id, e.g., https://viaf.org/viaf/7524651/) are freely reusable and are reused by other institutions to add useful information to their catalogues, open up new paths of information for the end user, contribute local data to the linked data cloud, and much more.12

data sources are selected and approved by the viaf council (see https://www.oclc.org/en/viaf/contributing.html) and may belong to two categories: viaf contributors, usually national lam (libraries, archives, museums) agencies, admitted following very selective criteria; and other data providers, i.e., "other selected sources (e.g., wikipedia [sic]) that are not viaf contributor agencies."13 other data providers include isni and wikidata (even if wikidata is often confused with wikipedia, as in the quotation above).14 while contributors are eligible to appoint a representative to the viaf council, other data providers are not. so, viaf is based on a rigid three-level hierarchical approach: viaf, viaf contributors, and other data providers. all the other national and local institutions, i.e., relevant national data producers that are not national agencies, cannot provide data to viaf; instead, they are expected to benefit from the use of viaf ids after performing a reconciliation process of their own data with viaf ids. however, the benefits may not be completely satisfactory in terms of data quality: while viaf deals with "widely-used authority files," it can be supposed that the libraries of non-national agencies need authority data that are more relevant on a local or specialist basis. lastly, while the viaf guidelines state that viaf participants should periodically send updated data to viaf, it is not clear when and how viaf retrieves and collects data from other data providers (https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf).

relationships between wikidata and academic, research, and public libraries
wikidata was launched in 2012 by the wikimedia foundation as the central storage of the structured data from all wikimedia foundation projects; it is "a freely available hosted platform that anyone—including libraries—can use to create, publish, and use lod."15 wikidata stores stable and common information about entities, i.e., items and properties, and interlinks between different wikimedia projects, in a form compliant with the rdf model (see https://www.mediawiki.org/wiki/wikibase/datamodel/primer). additionally, wikidata uses triples and enriches them with qualifiers and references.16 qualifiers allow adding specifications about the validity of a statement (start/end date, precision, obsolescence, series ordinal, etc.); references are fundamental to justify the data, i.e., to document the authority data creator's reason for choosing the name or form of name on which a controlled access point is based.17 wikidata uses the software wikibase (https://wikiba.se/), which is "an open-source software suite for creating collaborative knowledge bases" whose "data model prioritizes language independence and knowledge diversity." the wikibase open-source software, which is currently used by more than thirty institutions, supports federated sparql queries.18 wikibase's approach and characteristics are particularly interesting for the library world.
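as a concrete illustration of the statement model just described, the following sketch (not the authors' code) reads the viaf id statements of a single item, together with their qualifiers and references, through the public wikidata api; p214 is the viaf id property discussed later in the article, and q1339 (the item behind the bach example in the discussion section) is used only as an example.

```python
"""sketch only: read the viaf id (p214) statements of one wikidata item,
including their qualifiers and references, through the public wikidata api."""
import requests

API = "https://www.wikidata.org/w/api.php"

def viaf_statements(qid: str):
    params = {"action": "wbgetentities", "ids": qid, "props": "claims", "format": "json"}
    entity = requests.get(API, params=params, timeout=30).json()["entities"][qid]
    for claim in entity.get("claims", {}).get("P214", []):
        yield {
            "viaf_id": claim["mainsnak"]["datavalue"]["value"],
            "rank": claim["rank"],
            # qualifiers and references are attached to the statement itself,
            # as described above for the wikibase data model
            "qualifier_properties": list(claim.get("qualifiers", {})),
            "reference_count": len(claim.get("references", [])),
        }

if __name__ == "__main__":
    for statement in viaf_statements("Q1339"):
        print(statement)
```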
gemeinsame normdatei (gnd) created a working group with wikimedia deutschland in order to “debate whether wikibase is suitable for the needs of existing authority files coming from libraries” (https://wiki.dnb.de/display/gnd/authority+control+meets+wikibase); in march 2020 it was stated that the cooperation “has proven successful” and the current aim is to “develop a wikibasebased gnd and put it into use” (https://wiki.dnb.de/pages/viewpage.action?pageid=167019461). similarly, the bibliothèque nationale de france (bnf) and the agence bibliographique de l'enseignement supérieur (abes) launched the joint french national entities file (fne), which in 2019 carried out “a proof of concept to investigate the feasibility of using the software https://www.oclc.org/en/viaf/contributing.html https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf https://www.mediawiki.org/wiki/wikibase/datamodel/primer https://wikiba.se/ https://wiki.dnb.de/display/gnd/authority+control+meets+wikibase https://wiki.dnb.de/pages/viewpage.action?pageid=167019461 information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 5 infrastructure of wikibase to support the fne.”19 a synthesis of the proof of concept, published in july 2020, mentioned, among the decisions taken, the choice to develop fne to build on wikibase (https://www.transition-bibliographique.fr/wp-content/uploads/2020/07/synthese-preuveconcept-fne.pdf). fne is scheduled to be launched in the next few years (https://f.hypotheses.org/wpcontent/blogs.dir/2167/files/2020/02/20200128_8_versunfichiernationaldentites.pdf ). even more interestingly, between 2017 and 2018, oclc explored a linked data wikibase prototype; the final report shows, among other results, that “the building blocks of wikibase can be used to create structured data with a precision that exceeds current library standards” and that “to populate knowledge graphs with library metadata, tools that facilitate the import and enhancement of data created elsewhere are recommended [. . . and . . .] the pilot underscored the need for interoperability between data sources, both for ingest and export.”20 in late 2019, the ifla wikidata working group was formed “to explore and advocate for the use of and contribution to wikidata by library and information professionals, the integration of wikidata and wikibase with library systems, and alignment of the wikidata ontology with library metadata formats such as bibframe, rda, and marc” (https://www.ifla.org/node/92837). on the wikimedia side, in 2019 the ld4-wikidata affinity group (ld4 stands for “linked data for”) was created by hilary thorsen, wikimedian in residence at stanford university, to understand “how the library can contribute to and leverage wikidata as a platform for publishing, linking, and enriching library linked data” (https://wiki.lyrasis.org/display/ld4p2/ld4wikidata+affinity+group). libraries’ interest in wikidata is usually focused on lod and semantic discovery, not on authority control: “libraries may each use different, unique, or select identifiers and authority control methods for disambiguation. increasingly, wikidata is becoming an important tool for synchronizing across identifiers like virtual international authority file (viaf) and orcid identifiers. 
integrating awareness of wikidata and its uses for enhancing metadata and linked open data will help advance a more interconnected research web."21 identification is a key issue both in bibliographic control and in the semantic web environment, as john riemer noted: "recent examination of the efforts involved in what we have historically called authority control in the pcc community has led us to the conclusion that the primary emphasis should be on identity management."22 as a matter of fact, wikibase and wikidata's approach to authority control and bibliographic description is quite new: not only does the traditional distinction between authority and bibliographic data disappear in a wikibase description, but wikidata is to be considered first of all as an identity management tool for any kind of entity.23

relationship between viaf and wikidata
the first attempt at cooperation between viaf and wikidata goes back to 2012, when maximilian klein and alex kyrios, wikipedians in residence at oclc and the british library, respectively, developed a project to integrate authority data from viaf with english wikipedia biographical articles. the project successfully "added authority data to hundreds of thousands of articles on the english wikipedia," but above all showed that "linking of data represents an opportunity for libraries to present their traditionally siloed data, such as catalogue and authority records, in more openly accessible web platforms."24 at the time, wikidata was taking its first steps, but later authority data were successfully transferred from english wikipedia to wikidata.

at present, the connection between wikidata and viaf is very strong. both viaf and wikidata are founded on a strict authority control that is built on a few cataloguing principles. in particular, both apply the principle that the authorized access point "for the name of an entity should be recorded as authority data along with identifiers for the entity and variant forms of name."25 in addition, wikidata is a data provider in viaf, while viaf ids are constantly recorded and updated in wikidata items. at present, wikidata has 8,304,947 personal items, out of which 2,061,046 items have a viaf id. moreover, each month a wikidata bot (https://www.wikidata.org/wiki/user:krbot) updates links in wikidata items to redirected viaf clusters and removes links to abandoned viaf clusters.
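the monthly maintenance just described amounts to asking viaf what has become of each recorded cluster id. the sketch below is a hedged illustration, not the bot's actual code: it assumes that viaf answers a request for a merged cluster with an http redirect to the surviving cluster and a request for a missing one with an error status, which reflects observed web behaviour rather than a documented api contract.

```python
"""sketch only: ask viaf what has become of a cluster id. the mapping of http
responses to statuses below (redirect = merged, error = abandoned/unavailable)
is an assumption, not a documented api contract."""
import requests

def viaf_status(viaf_id: str) -> dict:
    response = requests.get(f"https://viaf.org/viaf/{viaf_id}/",
                            allow_redirects=False, timeout=30)
    if response.status_code in (301, 302):
        # merged/redirected cluster: the surviving cluster id ends the location header
        target = response.headers.get("Location", "").rstrip("/").rsplit("/", 1)[-1]
        return {"id": viaf_id, "status": "redirected", "target": target}
    if response.status_code == 200:
        return {"id": viaf_id, "status": "active", "target": viaf_id}
    return {"id": viaf_id, "status": "abandoned or unavailable", "target": None}

if __name__ == "__main__":
    # 7524651 is the example viaf id cited earlier in the article
    print(viaf_status("7524651"))
```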
the relevance of viaf to the wikidata information ecosystem is evident in the visualization of external identifiers in the items: viaf ids, represented on wikidata by property p214 (https://www.wikidata.org/wiki/property:p214), are automatically sorted as the first external identifier, preceded by the group of iso standards and followed by the group of viaf sources.26 using specific gadgets, i.e., enhancements of the edit interface, registered wikidata users can add to a specific item the ids of single viaf sources, extracting them from the viaf id(s) present in the item.27 unfortunately, there is no automatic reciprocity between viaf and wikidata: when a wikidata item gets a link to a viaf cluster, viaf does not have an automated way to add a reciprocal link to the wikidata item. likewise, when a viaf cluster gets a link to a wikidata item, wikidata has no automatic way to add a reciprocal link to the viaf cluster. another very important aspect of the viaf-wikidata relationship is that wikidata uploads data from viaf only through the voluntary work of wikidata users; this approach applies to national library data and to any other data, too. when available, viaf ids are typically one of the most important elements used by users to decide the identity of a wikidata item.

wikidata controls on viaf
in wikidata, the use of constraints—i.e., rules that check the appropriate use of a property (https://www.wikidata.org/wiki/help:property_constraints_portal)—enables easy discovery of possible inconsistencies in statements, both in data and in external identifiers. weekly, a wikidata bot (https://www.wikidata.org/wiki/user:krbot2) updates the database reports containing the constraint violations for each property, so that users can check the issues and try to fix them. users can also check constraint violations in real time using the appropriate queries linked in the talk page of each property. regarding viaf ids, two types of constraint violations are particularly relevant both for data entry and for the present paper:

• "single value" violations, i.e., one item has two or more viaf ids. this means either that one or more viaf ids are not related to the item, so that the non-pertinent viaf ids should be removed from the wikidata item, or that more than one viaf id exists for the same real entity, so that all the existing viaf ids must be kept in the wikidata item until viaf merges them. an example of a merge performed by viaf, perhaps on the basis of the corresponding wikidata item, can be found in iulius rufinianus (https://www.wikidata.org/wiki/q28131664), where the eight distinct viaf ids contained in the wikidata item on september 24, 2019, have now been merged (https://www.wikidata.org/w/index.php?title=q28131664&oldid=1001570078); in april 2021, the wikidata item for alaricus i (https://www.wikidata.org/wiki/q102371) contains four viaf ids (but there were ten on june 29, 2020; https://www.wikidata.org/w/index.php?title=q102371&oldid=1220309663).

• "unique value" violations, i.e., two or more wikidata items have the same viaf id. this violation not only signals a possible error on the wikidata side, but could also imply an error in viaf. in the former case, either one or more wikidata items have a non-pertinent viaf id, to be removed, or the same entity is referred to by two or more wikidata items, to be merged. in the latter case, the viaf id conflates two or more distinct entities in one cluster. an example of conflation is the cluster at https://viaf.org/viaf/57898554/, where the painter herbert e. abrams (1920–2003; https://www.wikidata.org/wiki/q4117019) and the physician herbert l. abrams (1920–2016; https://www.wikidata.org/wiki/q23665535) are conflated. in that case, wikidata users can report the viaf conflation error in the proper wikidata error-report pages.28
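both kinds of violation can be surfaced with the query service mentioned above. the sketch below illustrates only the first case ("single value"), restricted to the personal items analysed in this study; on the live service such an aggregation may need to be narrowed further, or taken from the weekly constraint-violation reports instead.

```python
"""sketch only: list person items carrying more than one viaf id (p214), i.e.,
candidates for "single value" constraint violations, via the wikidata query
service. the real weekly reports are produced differently by the bot."""
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item (COUNT(DISTINCT ?viaf) AS ?ids) WHERE {
  ?item wdt:P31 wd:Q5 ;     # instance of: human, the class analysed in this study
        wdt:P214 ?viaf .    # viaf id
}
GROUP BY ?item
HAVING (COUNT(DISTINCT ?viaf) > 1)
LIMIT 20
"""

def single_value_violations():
    response = requests.get(
        ENDPOINT,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "beyond-viaf-example/0.1 (demo script)"},
        timeout=120,
    )
    response.raise_for_status()
    for row in response.json()["results"]["bindings"]:
        print(row["item"]["value"], row["ids"]["value"])

if __name__ == "__main__":
    single_value_violations()
```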
in most cases just a few weeks are required for viaf to merge clusters regarding the same entity when wikidata includes them in the same item, but cases of conflation are fixed more slowly. while updates to viaf clusters and ids are obviously necessary and welcome, they are somewhat risky for viaf contributors, providers, and users that base the consistency of their data on viaf: national libraries could import incorrect data into their own files, and wikidata could import wrong national library ids, referring to different entities, into the same wikidata item. there is no evidence that the error-report pages created and updated by wikidata users are being systematically taken into consideration by viaf to solve its conflations.

recently, other issues in the use of viaf as a source were raised when viaf removed very important information about its cluster merging process, information that is no longer available to worldwide libraries and users. the viaf data dump page (http://viaf.org/viaf/data) is refreshed monthly and, until april 2020, it included a persist file. for example, the february 2020 dump, viaf-20200203-persist-rdf.xml.gz, contained data about redirected clusters and—potentially—abandoned clusters as well. this information is essential to the prompt and safe synchronization of local data with viaf clusters. in this dump, each redirected cluster was described with an explicit statement, while every abandoned cluster (14,692,237 out of 24,030,176!) was erroneously described by an empty xml statement that omits the specific information about the abandoned cluster. to obtain this invaluable information again, we filed a bug by email.29 the decision taken was drastic: starting in may 2020, viaf stopped including this information in its monthly dump, as stated at the bottom of the page itself.30 as a result, the only recourse available to viaf contributors or any other institution that would synchronize their authority records with viaf identifiers is to rely on an external identification tool such as wikidata!

materials and methods
any comparison between viaf and wikidata must consider their different content. viaf contains personal name clusters, corporate name clusters, geographic name clusters, and work clusters, whereas wikidata allows items to describe any kind of entity relevant in the universe of discourse of the users' data, irrespective of its bibliographic nature.
even if all kinds of viaf clusters are relevant for bibliographic control, this study is limited to the analysis of personal name clusters in viaf and of items having "instance of: human" (p31:q5) in wikidata, because they are by far the most represented in viaf and they can be directly compared.31 some entities, such as mythological persons, legendary persons, etc., that are personal clusters in viaf are not treated as humans in wikidata and belong to other instances (e.g., https://www.wikidata.org/wiki/q95074).

a double approach was used to compare viaf and wikidata: first, data analyses of viaf and wikidata were performed to compare viaf clusters and wikidata items and to investigate their reciprocal relationships (see the data analysis section); second, a comparison of several general characteristics, such as scope, objectives, philosophy, authority control, and identification, was made based on the respective websites and available literature to find and highlight differences and similarities.

full viaf dumps are available in native xml, rdf, marc-21 xml, or iso-2709 marc-21 (http://viaf.org/viaf/data/). viaf clusters were analyzed using an xml dump published on september 6, 2020 (http://viaf.org/viaf/data/viaf-20200906-clusters.xml.gz). full wikidata dumps are available in xml, json, or rdf.32 however, given the size of the entire dataset, it is much more convenient to create customized rdf dumps using the tool wdumper (https://wdumps.toolforge.org/). all the information (settings, size, and date of the base dump) about dumps created using wdumper remains recorded (https://wdumps.toolforge.org/dumps). wikidata items were analyzed using a customized rdf dump updated to september 14, 2020 (https://wdumps.toolforge.org/dump/732). the customized dump contains all statements with non-deprecated values33 present in items having both "instance of: human" (p31:q5) in best rank and at least one value of "viaf id" (p214) in best rank. both dumps were parsed using three perl scripts. dumps and scripts were uploaded to zenodo and are all available for analysis and reuse.34 the perl scripts generate json data that are published on the html page http://catalogo.pusc.it/beyond_viaf/, where they are interpreted by javascript scripts in order to populate eight tables: three dedicated to viaf (tables 1–3) and five to wikidata (tables 4–8).
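for readers who want to reproduce this kind of counting without perl, the sketch below streams a gzipped n-triples dump and tallies the viaf ids per item. it is not one of the authors' scripts (those are on zenodo): the file name is a placeholder, and the sketch assumes the dump serializes truthy statements in the wdt: (prop/direct) namespace.

```python
"""sketch only: stream a gzipped n-triples dump produced with wdumper and
count the viaf ids per item."""
import gzip
from collections import defaultdict

WDT_P214 = "<http://www.wikidata.org/prop/direct/P214>"

def count_viaf_ids(path: str) -> None:
    viaf_ids_per_item = defaultdict(set)
    with gzip.open(path, "rt", encoding="utf-8") as triples:
        for line in triples:
            parts = line.split(" ", 2)
            if len(parts) < 3:
                continue
            subject, predicate, rest = parts
            if predicate == WDT_P214:
                # the object is a quoted literal such as "7524651"
                viaf_ids_per_item[subject].add(rest.split('"')[1])
    total_ids = sum(len(ids) for ids in viaf_ids_per_item.values())
    print(f"{len(viaf_ids_per_item)} items carry {total_ids} viaf ids")

if __name__ == "__main__":
    count_viaf_ids("wikidata-humans-with-viaf.nt.gz")  # hypothetical local dump file
```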
in order to select the statements to be analyzed in wikidata items, three sets of relevant properties were found through three distinct sparql queries at the end of september 2020: viaf members (table 5), authority controls related to libraries that are not viaf members (table 6), and biographical dictionaries (table 7).35 at the beginning of october 2020, another sparql query was performed to find all the personal items containing the authority controls related to libraries that are not viaf members (table 6, column 4), without filtering the search to personal items having at least one value of "viaf id" (p214).36

data analysis: viaf clusters and wikidata items
for this paper, two different versions of the data tables were produced: the first version, available at http://catalogo.pusc.it/beyond_viaf/, is a full, commented, and dynamic version of all the tables. within that version, links to the acronyms (such as lc, dnb, sudoc, etc.) of all the viaf contributors and other data providers are available too. static versions of these tables are included in this paper with commentary.

viaf
viaf has 22,099,715 personal clusters, half of which (50.90%; table 1, col. 2) are isolated clusters (i.e., they contain only one id). the presence of isolated clusters is interesting because it means that those clusters were created from data coming from just one source. what is more, the percentage of isolated clusters is much higher (71.19%; table 1, col. 12) if just viaf contributors are taken into account (i.e., excluding isolated clusters due to data from other data providers, such as isni). it is worth noting that other data providers can form isolated clusters, with the relevant exception of wikidata (for which viaf uses the acronym wkp), which never appears in isolated clusters (table 1, cols. 7 and 8).

table 1. viaf personal clusters by number of sources [adapted from http://catalogo.pusc.it/beyond_viaf/#tb1]

the total number of ids present in viaf clusters is 51,327,847 (table 2), distributed in 22,099,715 clusters; the most relevant contributors include lc (7,266,628 ids), dnb (5,677,731 ids), sudoc (3,278,189 ids), and nta (2,754,036 ids), while the most relevant other data providers are isni (8,455,814 ids) and wkp (2,148,680 ids) (table 2). apart from lc and dnb, data about isolated clusters (table 2, col. 5) show that the number of isolated clusters tends to slowly decrease over time and that clustering has improved: recently added sources tend to have a higher share of isolated ids. another relevant figure is that sources in non-latin alphabets usually have higher shares of isolated ids.37 so, a high number of isolated clusters may reveal a source that still partially needs to be gathered into existing clusters.
table 2. viaf personal clusters by source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb2]

the histories of viaf clusters, as contained in the xml dumps, appear weird and incoherent. for example, many viaf contributors in their first year of appearance seem to have no additions and many removals (e.g., the bav row; for complete information see table 3 on the website at http://catalogo.pusc.it/beyond_viaf/#tb3). the incoherence is due to the absence of redirected and abandoned clusters in the data. nevertheless, the histories allow us to reconstruct the year of first contribution of each source—information otherwise unavailable—and to detect major changes in the data provided to viaf by each source.38

table 3. viaf history of personal clusters by source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb3]

wikidata
wikidata has 8,304,947 personal items, and 2,061,046 of them contain a viaf id. usually one or more viaf sources are extracted from the viaf id(s), so that 1,905,470 personal items containing a viaf id have at least one viaf source id (table 4, col. 1). wikidata records ids from a wide range of other resources, such as non-viaf bibliographic agencies and biographical dictionaries (investigated in these tables), but also encyclopedias and various online databases. considering the 2,061,046 items containing a viaf id, 684,367 items contain only one viaf source id (table 4, col. 1), but only 353,710 items contain only one id from among viaf source ids, non-viaf source ids, and biographical dictionary ids (table 4, col. 15); so, more than 300,000 items containing only one viaf source id have at least one non-viaf source id and/or one biographical dictionary id.

table 4. wikidata personal items (pers. it.) by number of ids [adapted from http://catalogo.pusc.it/beyond_viaf/#tb4]

viaf and wikidata: a data comparison
from a quantitative perspective, wikidata personal items (8,304,947) are 37.58% of viaf personal clusters (22,099,715), while wikidata personal items having a viaf id (2,061,046) are 9.26%. ids from viaf sources present in wikidata personal items containing a viaf id (6,292,778; table 5, col. 3) are 12.91% of the ids present in viaf personal clusters (48,740,933; table 5, col. 4). in the authors' opinion, a quantitative comparison between viaf and wikidata must be carefully considered. it could be argued that this is a noticeable disadvantage of wikidata with respect to viaf, but that would be true only from a bibliographic control perspective, and the other side of the coin must be examined too. as wikidata represents any kind of entity relevant for its users (libraries, archives, museums, and many other stakeholders), viaf contains just over a third of wikidata items (37%).
furthermore, a very large part of the personal entities represented in wikidata (at present, more than 6,200,000, i.e., about 75%) cannot rely on viaf for identification purposes (for example, because wikidata personal items can also represent singers, lawyers, pilots, and so on). it can be concluded that viaf can be considered just one specialized source, in the domain of the semantic web and with respect to the objectives of wikidata. considering single viaf sources, wikidata surpasses viaf by number of ids only in two cases, perseus (135.18%) and simacob (102.17%) (table 5, col. 5). this is possible because wikidata and viaf gather different sets of data from both sources: the former uses sets of data obtained by its users, while the latter uses only the data sent by the contributor. all the other sources, because of the absence of systematic imports, are much rarer in wikidata than in viaf.

table 5. wikidata personal items (pers. it.) by viaf source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb5]

table 6 and table 7 show authority control in wikidata alongside viaf. wikidata contains some non-viaf sources (usually non-national libraries or groups of libraries which could not become viaf contributors); their ids in personal items having a viaf id (894,161) are 86.04% of their ids in all personal items (958,206; table 6, col. 4), meaning that wikidata provides a clusterization for more than 64,000 ids (6%) probably corresponding to non-existent viaf clusters (table 6, totals).

table 6. wikidata personal items (pers. it.) by non-viaf sources [adapted from http://catalogo.pusc.it/beyond_viaf/#tb6]

table 7. wikidata personal items (pers. it.) by biographical dictionary [adapted from http://catalogo.pusc.it/beyond_viaf/#tb7]

in general, the presence of ids of biographical dictionaries (796,609 ids in total) in 725,755 personal items having a viaf id helps significantly in the definition of authoritative dates of birth and death (table 7, total of column 2, and table 4, total of column 12).

a comparison between table 1, column 7, and table 2, row wkp (the acronym for wikidata wrongly used by viaf) shows that 2,147,319 clusters contain 2,148,680 wkp ids; this means that, from a viaf point of view, wikidata duplicates are only 1,361. furthermore, a comparison between the total and row 0 in table 8, col. 1, shows that 2,061,046 items contain at least one viaf id and that 2,037,638 items contain exactly one viaf id; so, items containing one or more viaf duplicates number 23,408. as a result, it can be concluded that the percentage of duplicates in wikidata is less than 0.01% and in viaf is about 0.01%, so wikidata is as trustworthy as viaf.
viaf and wikidata not only are able to discover reciprocal duplicates, but also discover duplicates in viaf sources, by a comparison between table 8, col. 3—containing the total number of the cases in which a viaf source has at least one duplicate—and table 8, col. 5—containing the total number of the cases in which viaf sources are duplicated. however, while duplicates recorded by viaf are findable only by querying the monthly dumps using in-house–made programs, duplicates discovered by wikidata are easily findable through sparql queries detecting single-value constraint violations. table 8. wikidata personal items (pers. it.) by repeated viaf sources and viaf source ids [adapted from http://catalogo.pusc.it/beyond_viaf/#tb8] discussion viaf and wikidata are quite different in their purpose, scope, organizational and theoretical approach, data harvesting and management. a major difference between viaf and wikidata is in their purpose: on the one hand, viaf aims to identify bibliographic entities and to connect authority data provided by selected contributors (national libraries, cultural agencies, and other major institutions) and extracted from other data providers (such as isni, rism or de663, wikidata, etc.) through the creation of clusters by means of software. on the other hand, like isni, wikidata focuses on both identification and description of entities and has the purpose of building collaboratively a database concerning the sum of all relevant knowledge—provided that each item complying with its notability criteria is accepted— using a crowdsourced approach (https://www.wikidata.org/wiki/wikidata:notability). http://catalogo.pusc.it/beyond_viaf/#tb1 http://catalogo.pusc.it/beyond_viaf/#tb2 http://catalogo.pusc.it/beyond_viaf/#tb8 http://catalogo.pusc.it/beyond_viaf/#tb8 http://catalogo.pusc.it/beyond_viaf/#tb8 http://catalogo.pusc.it/beyond_viaf/#tb8 https://www.wikidata.org/wiki/wikidata:notability information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 15 another relevant difference between viaf and wikidata is their scope: while viaf aims to identify a few selected types of entities already described within the bibliographic universe by national agencies, wikidata aims to identify and describe any kind of entity of interest for the wikidata community. wikidata items may exist for any kind of entity and may contain a very broad range of data and of external identifiers. so, wikidata can represent bibliographic data and entities —e.g., at present wikidata records data for the 54% of all the bibliographic sources cited in wikipedia entries—any other kind of entity provided for in viaf (i.e., agents, works, expressions, and places), and any other entity defined by the frbr-ifla lrm model (e.g., manifestations, items, timespans, nomens, res, etc.), and by other models relevant for the glam universe (such as frbroo and cidoc).39 but it is open to any data model because it can also include any kind of entity outside the bibliographic or cultural heritage universe, as it is a knowledge base capable of containing any kind of statement on any entity users want to describe. in addition, for any kind of entity there is no minimum or maximum number of statements that must or can be added; as soon as an entity is clearly identified, it can be added to wikidata. 
moreover, when missing, new identifiers—and properties for description—can be proposed by anyone through property proposals and, if well defined, they are usually approved within two weeks (https://www.wikidata.org/wiki/wikidata:property_proposal). a broader scope is supposed to be much more convenient for users who wish to discover previously unknown links and information in the semantic web.

organizational model
due to the viaf top-down approach, data is completely managed by oclc, with no chance for common users or medium and small libraries or other institutions to directly improve viaf clusters (e.g., by adding other data coming from their collections or from encyclopedias or online databases, merging duplicates, solving conflations, etc.). as the wikidata approach is "to crowdsource data acquisition, allowing a global community to edit the data," data is curated directly by users interested in its creation and use.40 so, in wikidata, data is produced by volunteers, by means of semiautomatic or manual data harvesting from any desired and available source. moreover, user statistics show that authoritative data from national bibliographic agencies and other libraries, archives, and museums are normally uploaded by common users, not by librarians (or any other kind of institutional data curator).41

identification function
the theoretical approach differs too, both as to the form of the names and as to the identification function. in viaf, preferred and variant forms of names for persons are based on national cataloguing codes. because national codes are different, viaf is needed and works as a neutral hub of all the national preferred forms. cataloguing rules can assure uniformity and univocity to the forms of the names of the entities within a national catalogue, but they are quite complicated for users to understand and use. in ranganathan's words, "the cataloguing conventions are on the surface quite contrary to what mr. everybody is familiar with."42 in contrast, preferred forms in wikidata are based on the international principles of the convenience of the user and common usage.43 a clear example is the use of the direct form of name (jane doe) instead of the inverted form of name (doe, jane).

a different usage in the forms of names could be an issue for the integration of library metadata in wikidata. in practice, however, it is not. first, there is no conflict between the wikidata form and any other form from a theoretical point of view, as the wikidata form is already treated in viaf as the preferred form within its specific context.44 in addition to that, wikidata accepts any library identifier, so that any library-controlled form can be linked to a wikidata item and vice versa. furthermore, a wikidata bot could be programmed to dump authorized and variant access points from national authority files and add them to the item labels and aliases (a sketch of what such a bot could look like follows at the end of this subsection).45 lastly, it could be argued that national cataloguing codes are compliant with the icp principles and with the convenience of the user and common usage. but a remarkable difference is that, while in national codes the principles are applied by cataloguers for users, in wikidata they are expressed directly by the users themselves. as the identification function is a major feature of the semantic web, the different approach of viaf and wikidata to this issue must be underlined.
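the following is a minimal, hypothetical sketch of the alias-adding bot mentioned above, using the pywikibot library; it is not an existing bot, the item id and variant forms are placeholders, and an actual run would require a logged-in bot account (and, on wikidata, community approval for the task).

```python
"""hypothetical sketch: merge variant access points taken from an authority
file into a wikidata item's aliases using pywikibot. run with DRY_RUN = True
to only preview the edit."""
import pywikibot

DRY_RUN = True

def add_aliases(qid: str, language: str, variant_forms) -> None:
    site = pywikibot.Site("wikidata", "wikidata")
    item = pywikibot.ItemPage(site.data_repository(), qid)
    item.get()  # load current labels, aliases, and claims
    current = set(item.aliases.get(language, []))
    label = item.labels.get(language)
    # keep existing aliases, add the new variant forms, never duplicate the label
    merged = sorted(current | {form for form in variant_forms if form != label})
    if DRY_RUN:
        print(f"{qid} [{language}] aliases would become: {merged}")
        return
    item.editAliases({language: merged},
                     summary="adding variant access points from an authority file")

if __name__ == "__main__":
    # placeholder data: an example item and variant forms of a personal name
    add_aliases("Q1339", "en", ["Bach, Johann Sebastian", "J. S. Bach"])
```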
as noted, “viaf remains neutral towards differences in the cataloguing policy of its data contributors” and, for this reason, viaf accepts all ids provided by its sources, even when they are not clearly identifiable entities but are just labels (see for example https://viaf.org/viaf/307171748 or https://viaf.org/viaf/305052259).46 on the contrary, wikidata explicitly requires each item to refer to “a clearly identifiable conceptual or material entity” (second notability criterium; https://www.wikidata.org/wiki/wikidata:notability). as a consequence, many isolated clusters formed by viaf on the basis of single contributors’ ids related to not-clearly-identifiable entities are not acceptable in wikidata and remain unlinked. moreover, data on cluster duplication shows that identification in wikidata is performed with the same quality level as in viaf. clusters for identification purpose are created both in viaf and wikidata, but differently from viaf, in wikidata external identifiers—as all the other data—are not provided in a structured way by national libraries or other institutions (with very few exceptions); instead, identifiers are usually found and added by common users through web scrapers and after data cleaning. what is more, matches are not performed automatically, but semiautomatically (through tools such as openrefine or mix’n’match (https://mix-n-match.toolforge.org/ and https://openrefine.org/) or manually. an enhanced feature of wikidata in clusterization is the record of a wider variety of sources and relative ids: due to its openness, wikidata refers to viaf and its sources, but also to any other library or cultural institution and to a large number of reference sources like encyclopedias and biographical dictionaries too (table 7). a wider variety of identification sources and manual work assure a higher level of identification. data quantity data harvesting affects both quantity and quality of data. in viaf, data are collected from periodical contributions of viaf participants, with very large sets of data. therefore, from a quantitative point of view, viaf has a far larger number of people (22,099,715 personal clusters) in comparison with wikidata (8,304,947 personal items). even though wikidata was created in 2012, the number of personal items in wikidata is currently only over a third (37%) of all viaf personal clusters. although quantities are not directly comparable due to the different universe to be described, in the last few years initiatives to enhance organized cooperation between libraries and wikidata and to promote data production in wikidata are increasing. a very high-quality initiative is supported by cornell university, harvard university, stanford university, and the university of iowa’s school of library and information science, in collaboration with the library of congress and the program for cooperative cataloging (pcc). their linked data for production (ld4p) wikidata project is “an indepth exploration of how wikidata could serve as a platform for publishing, linking, and enriching library linked data” https://viaf.org/viaf/307171748 https://viaf.org/viaf/305052259/#jones,_a._l https://www.wikidata.org/wiki/wikidata:notability https://mix-n-match.toolforge.org/ https://openrefine.org/ http://catalogo.pusc.it/beyond_viaf/#tb7 information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 17 (https://www.wikidata.org/wiki/wikidata:wikiproject_linked_data_for_production). 
an additional example is the ifla wikidata working group that was formed "to explore and advocate for the use of and contribution to wikidata by library and information professionals, the integration of wikidata and wikibase with library systems, and alignment of the wikidata ontology with library metadata formats such as bibframe, rda, and marc" (https://www.ifla.org/node/92837). even so, wikidata is still very far from having a structured workflow to ingest data from national or local libraries, museums, and archives. in fact, while the projects mentioned above are mainly dedicated to explaining to librarians and institutions why wikidata is important and how to contribute to it, there are still very few projects mainly dedicated to the concrete, massive synchronization of library and bibliographic data with wikidata; such projects also require a relevant effort in the manual cleaning of discrepancies and oddities emerging from the synchronization. relevant exceptions are the national library of wales47 and the biblioteca europea di informazione e cultura, where significant work has been done to synchronize the respective databases of authors (and of other types of entities) with wikidata.48

data quality
data quality also needs to be analyzed in detail. even if data from national libraries are authoritative and of high quality, as a virtual file viaf neither has nor produces its own data. consequently, viaf data does not always remain authoritative, because errors can be both inherited and added, and clusters can be duplicated. the issue is well known to isni, which "whenever necessary [. . .] splits and merges data coming from viaf, and even applies protection to data that has been fixed manually."49 as shown in table 2 and table 8, viaf clusters are subject to isolation and duplication when they are created and to many changes and updates when they are maintained. so, even if viaf collects a huge amount of authoritative data and creates clusters of ids, viaf users cannot always safely and continuously rely on them. data flows in just one direction (from national libraries to viaf), viaf deletes and rebuilds clusters without giving priority to the stability of one cluster over another, and, after april 2020, viaf no longer makes available to users a record of its changes.50

on the contrary, wikidata data is always under the strict control of any user, as its structure is designed to trace any minimal change to its data. every single addition or deletion is documented, not just to easily recover from eventual vandalism, but also to support any decision with clear evidence. any stakeholder can know exactly if, how, when, and why data changed, at any moment. what is more, from a qualitative point of view, wikidata seems to offer a better solution for the recording of authority data than viaf. first, it can store a wider variety of data about a person in a more semantic way. not only is it possible in wikidata to express preferred and variant forms of the name, related names, works, co-authors, publication statistics, and other data about the person—as in viaf—but all these data are expressed in a semantic way. for example, whereas in viaf "bach, anna magdalena" is just a related name of johann sebastian bach, in wikidata she is recorded and qualified as the person who married the musician. thanks to that different approach, wikidata can represent and show bach's full genealogic tree (https://magnus-toolserver.toolforge.org/ts2/geneawiki/?q=q1339).
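the difference can be made concrete with a one-triple query: on wikidata the relation is an explicit spouse (p26) statement on the bach item (q1339), not a flat "related name." the sketch below simply asks the public query service for it and is only an illustration.

```python
"""sketch only: retrieve the spouse (p26) statements of johann sebastian bach
(q1339) from the wikidata query service."""
import requests

QUERY = """
SELECT ?spouse ?spouseLabel WHERE {
  wd:Q1339 wdt:P26 ?spouse .                       # spouse of j. s. bach
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "beyond-viaf-example/0.1 (demo script)"},
    timeout=60,
)
for row in response.json()["results"]["bindings"]:
    print(row["spouse"]["value"], "-", row["spouseLabel"]["value"])
```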
as adamich noted, “building graphs from bibliographic entities is really about making the data machine readable and understandable. it is about making the data web enabled. in terms of translation, linked data opens up a whole new world over our marc entrapment.”51

quality is enhanced by matching methods too; whereas viaf matches identities by an algorithm based on explicit identifiers or string matching (such as the forms of the name, dates, and bibliographic relationships),52 wikidata matches are usually decided by a human user or (in the case of semiautomatic imports) at least checked a posteriori by a human after some time. the higher precision of manual over automatic matching is also recognized in the viaf guidelines.53 furthermore, as seen above, notability requires that, when clear identification is impossible, no item should be created in wikidata.

data maintenance and usability

data quality relies also on maintenance. comparison between wikidata items and viaf clusters shows a very small but constant presence of errors to be fixed in both (around 0.01%), even if it is impossible to determine with certainty whether viaf uses wikidata error pages. issues with fixing viaf errors directly by viaf contributors have already been noted: “while clustering anomalies can be handled by viaf itself, reporting errors found in source data of viaf partners raise problems related to the efficiency of the notification workflows. at this point, involvement of viaf partners themselves in the process is needed.”54 on the other hand, in wikidata anyone can edit items, add new data or delete mistakes, merge items, fix various issues, and so on, on the fly. due to its openness, wikidata may also suffer from vandalism, but it has its own solutions.55 along with this, data receive special attention to their accuracy and reliability because they are uploaded and maintained by users who are direct stakeholders. for this reason, in wikidata, references to bibliographical or biographical sources and to other data provider ids, such as any national and international identification system, are suggested, promoted, and carefully examined. moreover, there is a commitment to monitor the consistency of viaf clusters. the ability of wikidata to identify inconsistent viaf clusters, and the fact that viaf isolated clusters can be reduced by at least 30%56 by referring to identifiers from wikidata and other data providers, are the best demonstration of the quality of its data and of the importance of the other data providers in viaf clusterization.

as to the usability of data, the internal search of viaf offers little more than basic functions: the only available filter allows users to limit results to clusters having one specific source; by contrast, filtering searches to clusters having and/or not having a specific group of sources, or to clusters having more or fewer sources, would be very useful, especially in order to find duplicates.
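the kind of filter just described can be expressed directly against wikidata’s public sparql query service, which is introduced in the next paragraph. the sketch below is only an illustration: the property ids (p214 for viaf, p213 for isni) and the query endpoint are taken from the article’s endnotes, while the query shape, the restriction to persons, and the use of the requests library are our own assumptions.

# a minimal sketch, not a production client: ask wikidata's public sparql
# endpoint for person items that carry a viaf id (p214) but no isni (p213).
# p31/q5 ("instance of: human") is an assumption added for this example.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item ?viaf WHERE {
  ?item wdt:P31 wd:Q5 ;                      # restrict to persons
        wdt:P214 ?viaf .                     # has a VIAF identifier
  FILTER NOT EXISTS { ?item wdt:P213 [] }    # but no ISNI identifier
}
LIMIT 10
"""

response = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "beyond-viaf-example/0.1 (illustration)"},
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row["item"]["value"], row["viaf"]["value"])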
in contrast, wikidata has a sparql query service which returns results based on the current status of the database, and its internal search can integrate some of the functions of the query service, allowing users to look for items having and/or not having specific statements (https://www.wikidata.org/wiki/special:search).57 considering cases in which viaf and wikidata discover potential duplicates in their sources, viaf has no page dedicated to listing cases of (supposedly) duplicate ids from its sources, while wikidata makes it easy to find cases in which single sources have (supposedly) duplicate ids, through constraint violations58 and appropriate sparql queries.

a comparison table

a comparison table was built to compare scope, role, system, and functions between viaf and wikidata, inspired by and adapted from a viaf vs isni comparison.59

table 9. comparison between and complementarity of viaf and wikidata features

scope
viaf: ● persons ● organizations ● works ● expressions ● locations
wikidata: ● any kind of viaf entity ● any “res” of ifla lrm ● any entity of cidoc ● any other non-glam entity ● any entity in the universe of discourse

software
viaf: ● unknown
wikidata: ● wikibase60

data. person entity properties
viaf: ● preferred form of name, based on national cataloguing rules ● very rich variant forms of name, identified by national agencies variant forms ● sources
wikidata: ● preferred form of name (label) based on convenience of the user and common usage61 ● variant forms of name (aliases), organized by languages and scripts62 ● sources (as statements and references and with qualifiers)

data. quantity (persons)
viaf: ● number of clusters: 33,656,281 (sept. 2020) ● number of personal clusters: 22,099,715 (sept. 2020)
wikidata: ● number of entities: 90,260,081 (oct. 2020) ● number of personal items: 8,304,947 (oct. 2020) ● number of personal items with viaf id: 2,061,046 (sept. 2020)

data. harvesting
viaf: ● data are provided by authoritative national bibliographic agencies
wikidata: ● data are added through massive semiautomatic imports and/or manually by any interested user

data. quality
viaf: ● data are granted by authoritative national bibliographic agencies
wikidata: ● data are controlled by any directly interested user, based on data from viaf, available bibliographic agencies, and other authoritative bibliographic sources

data. other entities properties
viaf: ● isbn, titles, dates included in the cluster ● dates, genre, bibliographic references from sources, xlinks, etc. ● properties are unchangeable
wikidata: ● any kind of property applicable to an entity can be used (multimedia included)63 ● all statements admit references, which are strongly recommended in some cases ● unavailable properties can be freely added through a process of property proposal64

data. dates
viaf: ● dates are extracted from authority and bibliographic records using a parsing technique; calendars and precision are not available65
wikidata: ● dates are imported semiautomatically from various sources or filled in manually; different calendars are available and further statements can be made through qualifiers66

data. vandalism
viaf: ● no vandalism: data are editable only by oclc
wikidata: ● everyone can edit, but items which are frequently vandalized can be temporarily or permanently protected from the edits of unregistered users67
data. fixing errors, deduplicating, or unmerging clusters/items
viaf: ● suggestions and requests via email ● asynchronous ● presumably, automated processes and human interventions ● viaf rebuilds clusters and does not give priority to the stability of one cluster over another68
wikidata: ● everyone can edit69 ● instantaneous ● probable errors (constraint violations) are detected in an automated way (by bots and through queries) ● pages with lists of probable errors (constraint violations) are freely available and constantly updated in an automated way (by bots)70

data. license
viaf: ● all public data (license: http://opendatacommons.org/licenses/by/1.0/)
wikidata: ● all public data (license: https://creativecommons.org/publicdomain/zero/1.0/deed.it)

role
viaf: ● create clusters ● ingest authority records from viaf contributors and other data providers (including wkd and isni) ● publish and diffuse viaf ids and data
wikidata: ● create items with a worldwide recognized and standard identifier ● interlink items with any available external identifier ● ingest data from viaf, from viaf contributors, and other data providers (e.g., isni) ● allow the creation and maintenance on toolforge of free tools—e.g., mix’n’match—to ingest external identifiers71 ● manage library, bibliographic, and non-library and non-bibliographic linked data ● publish and diffuse wikidata ids and data

organizational model
viaf: ● oclc service, guided by the viaf council of participating institutions ● hierarchical, top-down ● membership on request and subordinated to approval ● largely limited to national bibliographic agencies
wikidata: ● wikimedia project ● distributed, bottom-up ● everyone can take part in the project72 ● open to any bibliographic or non-bibliographic institution (national, large, medium, and small)

system. website
viaf: ● interface only in english
wikidata: ● interface in nearly any language and script; new ones can be added ● online facilities (end user input; edit online facilities for end user) ● login enhances users’ experience (by gadgets and scripts)

system. updating
viaf: ● periodical (asynchronous) ingestions
wikidata: ● continuous, instantaneous, free updates
system. versioning
viaf: ● history is included in each present cluster and for abandoned clusters ● history is inaccessible in redirected clusters
wikidata: ● page history available in each item and for redirected items ● for deleted items, history is accessible only to administrators

long-term preservation policy
viaf: ● oclc maintains the hosting, software, and data for viaf73
wikidata: ● the wikimedia foundation maintains the hosting, software, and data for wikidata74

notifications to stakeholders
viaf: ● notifications to be sent to data providers
wikidata: ● notifications are sent to end users and contributors

display, search, and download
viaf: ● in multiple formats: xml and json, including justlinks.json ● basic search interface ● clusters are listed without a clear ranking rule ● integrating monthly dumps ● api endpoint75 ● before april 2020, monthly dumps with persist links; after, monthly dumps without persist links
wikidata: ● in multiple formats: json, php, n3, ttl, nt, rdf, jsonld, html76 ● search interface77 ● api endpoint78 ● sparql query endpoint79 ● dumps80, also customizable81 ● see https://www.wikidata.org/wiki/help:about_data

linked data and sru
viaf: ● linked data ● sru82 (search and browse indexes, using cql syntax; output formats are xml or html)
wikidata: ● linked data

interoperability. local
viaf: ● a local institution can only reconcile viaf ids to its own data ● as changes are made by viaf, synchronization must be periodically performed by sources and local institutions
wikidata: ● full reconciliation, upload, and synchronization of local ids on wikidata and vice versa ● dedicated tools: mix’n’match ● other tools: openrefine ● bots ● manually

conclusion

main viaf and wikidata features and personal entities data were analyzed and compared in this study to focus on analogies and differences, and to highlight their reciprocal role and helpfulness in the worldwide bibliographical context and in the semantic web environment. viaf is a major international initiative to address the challenge of reliably identifying bibliographic agents on the web, by means of authoritative data based on national cataloguing codes and coming from the national libraries involved in the ubc program. moreover, viaf is a pillar of the identification process that users enact within wikidata. still, the comparison emphasized a few relevant issues in viaf’s approach, designed more than twenty years ago: a very selective policy regarding the inclusion of its sources—contributors and other data providers—and their participation in its governance, which prevents a worldwide openness of the project to non-national libraries and cultural institutions; an obvious neutrality toward data coming from its contributors, even when data are not compliant with the identification requirements of the semantic web; troubles in correctly clustering ids (duplicate clusters to be merged and conflated clusters to be split), and a one-way flow of data due to its top-down approach that prevents a quick and cooperative workflow to identify and fix errors; and the ability to identify only a narrow range of entities (i.e., mainly bibliographic entities, and not even all those provided by ifla lrm).
on the other hand, the semantic web has offered important new tools and opportunities to libraries, archives, museums, and other cultural institutions, and their data are recognized as a relevant asset for building the backbone of the semantic web with regard to the control of entities of bibliographic and cultural interest. after eight years of existence, wikidata is playing a relevant role in the publication, aggregation, and control of bibliographic and non-bibliographic information in the semantic web too. it is increasingly indicated as a hub for identifiers in the semantic web.83 wikidata depends on viaf for a large part of the identification work on its items, and viaf’s preeminent role in wikidata is acknowledged by its primary position in the identifiers section of the data of each item. for this reason, the wikidata community constantly monitors the consistency of viaf clusters and continuously updates lists of errors present in them. conversely, if viaf is undoubtedly very useful to the wikidata community, wikidata can support the consistency of viaf clusters. the wikidata informational ecosystem is much larger and wider, can be built by any interested institution and person, and its identification function can also count on the authority work of national and non-national libraries excluded from the viaf environment, and on authoritative non-bibliographical reference sources too.

this study opens some research perspectives. the analysis was limited to data about personal entities, as this kind of entity was the only one directly comparable, so further research is needed to extend the analysis to other kinds of entities. moreover, more research should be devoted to the treatment of special categories of persons and their names, such as mythological and legendary characters, ancient greek and latin authors, kings, queens, popes, saints, and so on, as the viaf guidelines84 themselves list the clusterization of such names among viaf’s typical problems (and these entities often get five or more viaf ids in wikidata). a further line of research should consider the relevance of the clusterization of encyclopedias and other reference sources in the identification process within wikidata. lastly, isolated clusters would need more consideration; in this study they were used as a clue to relatively recent uploads in viaf, but lc and dnb show a high rate of isolated clusters too (maybe due to the richness of their collections and metadata). more research on isolated clusters could help to describe with more precision the possible role of non-national libraries and institutions, and of their locally rich collections, in identifying lesser-known agents (not just persons) in a worldwide perspective.

from the analyzed data and the direct comparison, it can be concluded that viaf and wikidata can be constantly improved through reciprocal comparison, which allows the discovery of errors in both. viaf and wikidata are two relevant tools for authority control in the semantic web, and each has a specific role to play and different stakeholders. unfortunately, as opposed to the relationship between viaf and isni, at present no aspect of viaf-wikidata interoperability is discussed between the managing structures of the two systems, on a regular or irregular basis.
while wikidata appears to be more reliable with regard to the identification process, its most significant weakness consists in its unorganized and unplanned crowdsourced data acquisition, even if it is based at present on about 11,500 active editors.85 furthermore, the wikidata community still lacks the constant support and cooperation of institutional data curators such as librarians, archivists, and museum curators. many current projects are mainly dedicated to explaining to the potential institutional stakeholders the importance and the usefulness of wikidata for their institutional missions, but there are still too few projects devoted to the massive synchronization of data from institutional silos to wikidata. but, as soon as these initiatives reach a critical mass, wikidata will become the real global hub of the web of data.

acknowledgements

all the authors have cooperated in the drafting and revision of the article. nevertheless, each author has mainly authored specific sections and subsections of the article:
• stefano bargioni: data analysis; viaf; wikidata; viaf and wikidata: a data comparison.
• carlo bianchini: introduction; discussion; organizational model; identification function; data quantity; data quality; data maintenance and usability.
• camillo carlo pellizzari di san girolamo: relationship between viaf and libraries; relationship between wikidata and academic, research, and public libraries; relationship between viaf and wikidata; wikidata controls on viaf; materials and methods; conclusion.
all authors contributed to the comparison table. the authors wish to thank the anonymous reviewer whose suggestions helped to improve and enrich the paper, and the editor for his helpful edits.

endnotes

1 thomas baker et al., library linked data incubator group final report, sec. 2 (w3c incubator group, october 25, 2011), http://www.w3.org/2005/incubator/lld/xgr-lld-20111025/.
2 baker et al., library linked data.
3 dorothy anderson, universal bibliographic control. a long term policy—a plan for action (münchen: verlag dokumentation, 1974), 11.
4 anila angjeli, andrew mac ewan, and vincent boulet, “isni and viaf: transforming ways of trustfully consolidating identities,” in ifla wlic 2014 (ifla 2014 lyon, ifla, 2014), 2, http://library.ifla.org/985/1/086-angjeli-en.pdf.
5 rick bennett et al., “viaf (virtual international authority file): linking the deutsche nationalbibliothek and library of congress name authority files,” international cataloguing and bibliographic control 36, no. 1 (2007): 12–18; barbara b. tillett, the bibliographic universe and the new ifla cataloging principles : lectio magistralis in library science = l’universo bibliografico e i nuovi principi di catalogazione dell’ifla : lectio magistralis di biblioteconomia (fiesole (firenze): casalini libri, 2008), 14–15, http://digital.casalini.it/9788885297814; “viaf. connect authority data across cultures and languages to facilitate research,” oclc, 2020, https://www.oclc.org/en/viaf.html.
6 gildas illien and françoise bourdon, “a la recherche du temps perdu, retour vers le futur: cbu 2.0” (paper, ifla wlic 2014, lyon, france, 2014), 13–14, http://library.ifla.org/956/.
7 illien and bourdon, “a la recherche,” 15.
8 gordon dunsire and mirna willer, “the local in the global: universal bibliographic control from the bottom up” (paper, ifla wlic 2014, lyon, france, 2014), 11, http://library.ifla.org/817/.
9 luca martinelli, “wikidata: la soluzione wikimediana ai linked open data,” aib studi 56, no. 1 (march 2016): 75–85, https://doi.org/10.2426/aibstudi-11434; jesús tramullas, “objetos culturales y metadatos: hacia la liberación de datos en wikidata,” anuario thinkepi 11 (2017): 319–21, https://doi.org/10/ghbj63; xavier agenjo-bullón and francisca hernández-carrascal, “wikipedia, wikidata y mix’n’match,” anuario thinkepi 14 (2020), https://doi.org/10/ghbj6t; claudio forziati and valeria lo castro, “the connection between library data and community participation: the project share catalogue-wikidata,” jlis.it 9, no. 3 (2018): 109–20, https://doi.org/10/ggxj9n; adrian pohl, “was ist wikidata und wie kann es die bibliothekarische arbeit unterstützen?,” abi technik 38, no. 2 (2018): 208, https://doi.org/10/ghbj6w; arl white paper on wikidata: opportunities and recommendations (the association of research libraries, 2019), https://www.arl.org/wp-content/uploads/2019/04/2019.04.18-arl-white-paper-on-wikidata.pdf; regine heberlein, “on the flipside: wikidata for cultural heritage metadata through the example of numismatic description” (paper, ifla wlic 2019, libraries: dialogue for change, session 206: art libraries with subject analysis and access, athens, greece, august 28, 2019), http://library.ifla.org/2492/1/206-heberlein-en.pdf.
10 arl white paper on wikidata, 27–30; theo van veen, “wikidata: from ‘an’ identifier to ‘the’ identifier,” information technology and libraries 38, no. 2 (2019): 72–81, https://doi.org/10/ghbj62; hilary thorsen, “ld4p: linked data for production: wikidata as a hub for identifiers” (slideshow presentation, june 11, 2020), https://docs.google.com/presentation/d/1jwz3_ncf5rdd-7ejetglfv99uv2pnd1v/edit?usp=embed_facebook.
11 tillett, the bibliographic universe, 15.
12 open data commons attribution license (odc-by) v1.0 (as stated in http://viaf.org/viaf/data/).
13 “viaf admission criteria,” oclc, 2020, https://www.oclc.org/content/dam/oclc/viaf/viaf%20admission%20criteria.pdf.
14 the description of the wikidata source in http://viaf.org/viaf/partnerpages/wkp.html seems to refer to wikipedia before the existence of wikidata. the same acronym wkp reflects this anachronism, whereas isni correctly uses wkd. in any case, this description, as well as many others, requires an update.
15 stacy allison-cassin and dan scott, “wikidata: a platform for your library’s linked open data,” code4lib journal 40 (may 4, 2018), https://journal.code4lib.org/articles/13424.
16 carlo bianchini and pasquale spinelli, “wikidata at fondazione levi (venice, italy): a case study for the publication of data about fondo gambara, a collection of 202 musicians’ portraits,” jlis.it 11, no. 3 (september 15, 2020): 24.
17 ifla working group on functional requirements and numbering of authority records (franar), functional requirements for authority data: a conceptual model (münchen: k. g. saur, 2009), 46, https://www.ifla.org/files/assets/cataloguing/frad/frad_2013.pdf. for qualifiers, see https://www.wikidata.org/wiki/help:qualifiers; for references see https://www.wikidata.org/wiki/help:sources.
18 partial lists are linked from https://wikibase-registry.wmflabs.org/wiki/main_page.
19 see https://www.transition-bibliographique.fr/fne/french-national-entities-file/; the proof of concept is available at https://github.com/abes-esr/poc-fne.
20 jean godby et al., creating library linked data with wikibase: lessons learned from project passage (dublin, oh: oclc research, 2019): 8, https://doi.org/10.25333/faq3-ax08.
21 ifla, “opportunities for academic and research libraries and wikipedia” (discussion paper, 2016), 10, https://www.ifla.org/files/assets/hq/topics/info-society/iflawikipediaopportunitiesforacademicandresearchlibraries.pdf.
22 john riemer, “the program for cooperative cataloging & a wikidata pilot” (slideshow presentation, june 16, 2020), slide 5, https://docs.google.com/presentation/d/1npkaqdggft1wi2vx0zgmtixwxwjpq96ntxx4mmyxffi/edit#slide=id.p.
23 godby et al., “creating library linked data,” 8.
24 maximilian klein and alex kyrios, “viafbot and the integration of library data on wikipedia,” code4lib journal 22 (october 14, 2013), https://journal.code4lib.org/articles/8964.
25 ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp) (den haag: ifla, 2016), para. 5.3.
26 https://www.wikidata.org/wiki/mediawiki:wikibase-sortedproperties#ids_with_datatype_%22external-id%22; isni (p213, https://www.wikidata.org/wiki/property:p213) is presently sorted after viaf instead of in the iso section because it is considered primarily as a viaf source.
27 epìdosis, viaf e wikidata.mpg, 2020, https://commons.wikimedia.org/wiki/file:viaf_e_wikidata.mpg; a list of gadgets is available at https://www.wikidata.org/wiki/wikidata:viaf/cluster#gadgets.
28 the main error-report page is https://www.wikidata.org/wiki/wikidata:viaf/cluster/conflating_entities; its subpage https://www.wikidata.org/wiki/wikidata:viaf/cluster/conflating_specific_entries is designed for collecting “easy” cases of conflation, when only a few members of a cluster should be moved elsewhere, while the cluster is substantially sane.
29 moreno hayley, email to author, march 23, 2020. to the question of whether data about abandoned clusters would be maintained, viaf answered, “we recognize that the data in the file was not usable. viaf is in a period of transition and it was decided that we could not at this time fix the file so it has been removed from the list of available downloads.”
30 the statement read: “the persist-rdf.xml file has been removed and will no longer be available,” accessed october 23, 2020.
31 angjeli, mac ewan, and boulet, “isni and viaf,” 3.
32 https://dumps.wikimedia.org/wikidatawiki/; instructions and a list of kinds of data dumps are available at https://www.wikidata.org/wiki/wikidata:database_download.
33 a general explanation of ranks is available at https://www.wikidata.org/wiki/help:ranking. here is a small summary: values of statements can be ranked in three ways, “preferred,” “normal” (default), and “deprecated”; the expression “values with non-deprecated rank” includes all values with preferred rank or normal rank; the expression “values with best rank” includes only values with preferred rank or normal rank, with this condition: if the same statement has two or more values and at least one of them has preferred rank, values with normal rank aren’t counted; if there aren’t values with preferred rank, all values with normal rank are counted.
34 viaf and wikidata dumps, together with the scripts, were published on zenodo at https://doi.org/10.5281/zenodo.4457114.
35 the queries can be performed using the following links: viaf members: https://w.wiki/i5j; authority controls related to libraries but not being viaf members: https://w.wiki/i5k; biographical dictionaries: https://w.wiki/i5n.
36 the query can be performed using the following link: https://w.wiki/i5p.
37 it could be because they are probably more difficult to cluster, but in some cases also because they represent infrequently described entities.
38 as suggested by the reviewer, more removals than additions may be a clue of a cleanup project.
39 pat riva, patrick le boeuf, and maja zumer, ifla library reference model, draft (den haag: ifla, 2017), https://www.ifla.org/files/assets/cataloguing/frbr-lrm/ifla_lrm_2017-03.pdf; nick crofts et al., “definition of the cidoc conceptual reference model,” version 5.0.4, icom/cidoc crm special interest group, 2011, http://www.cidoc-crm.org/html/5.0.4/cidoc-crm.html; chryssoula bekiari et al., eds., frbr object-oriented definition and mapping from frbrer, frad and frsad, version 2.0 (international working group on frbr and cidoc crm harmonisation, 2013), http://old.cidoc-crm.org/docs/frbr_oo/frbr_docs/frbroo_v2.0_draft_2013may.pdf; lydia pintscher, lea lacroix, and mattia capozzi, “what’s new on the wikidata features this year,” youtube video, october 26, 2020, truocolo, https://www.youtube.com/watch?v=ebxdzk54gru.
40 denny vrandečić and markus krötzsch, “wikidata: a free collaborative knowledgebase,” communications of the acm 57, no. 10 (september 23, 2014): 80, https://doi.org/10/gftnsk.
41 for a general statistic see http://wikidata.wikiscan.org/users; for a statistic about the viaf property see https://bambots.brucemyers.com/navelgazer.php?property=p214; changing the id of the property at the end of the url allows exploring other property statistics.
42 shiyali ramamrita ranganathan, reference service, 2nd ed., ranganathan series in library science 8 (bombay: asia publishing house, 1961), 74.
43 ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp), 5, https://www.ifla.org/publications/node/11015.
44 wikidata does have a guideline for a preferred label, and its choice is based on users’ convenience (https://www.wikidata.org/wiki/help:label, par. 1.2) as required by international cataloguing principles (2016). as to the choice of the wikidata label in a specific language, viaf does not show any clear principle, while the authors believe that it would be preferable to use the english (“en”) label, whenever available. see ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp).
45 for example, in september it was done for nkc using openrefine (sample edit: https://www.wikidata.org/w/index.php?title=q520487&diff=1269046867&oldid=1266870464).
46 angjeli, mac ewan, and boulet, “isni and viaf,” 9.
47 simon cobb (https://www.wikidata.org/wiki/user:sic19) became wikidata visiting scholar in 2017 (https://en.wikipedia.org/wiki/user:jason.nlw/wikidata_visiting_scholar).
48 federico leva and marco chemello, “the effectiveness of a wikimedian in permanent residence: the beic case study,” jlis.it 9, no. 3 (september 2018): 141–47, https://doi.org/10.4403/jlis.it-12481.
49 angjeli, mac ewan, and boulet, “isni and viaf,” 11.
50 andrew mac ewan, “isni, viaf and naco and their relationship to orcid, discussion paper for pcc policy committee, 4 november,” 2013, 2, http://www.loc.gov/aba/pcc/documents/isni%20poco%20discussion%20paper%202013.docx.
51 tom adamich, “library cataloging workflows and library linked data: the paradigm shift,” technicalities 39, no. 3 (may/june 2019): 14.
52 oclc, viaf guidelines, rev. july 16, 2019, 2, https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf.
53 oclc, viaf guidelines, 5. “when viaf is unable to algorithmically match some of the source authority records with each other, they can be manually pulled together into a single cluster using an internal table.”
54 angjeli, mac ewan, and boulet, “isni and viaf,” 16.
55 stefan heindorf et al., “vandalism detection in wikidata,” in proceedings of the 25th acm international conference on information and knowledge management, cikm ’16 (new york, ny: association for computing machinery, 2016), 327–36, https://doi.org/10/gg2nmm; amir sarabadani, aaron halfaker, and dario taraborelli, “building automated vandalism detection tools for wikidata,” in proceedings of the 26th international conference on world wide web companion, www ’17 companion (republic and canton of geneva, che: international world wide web conferences steering committee, 2017), 1647–54, https://doi.org/10/ghhtzf.
56 see table 1, col. 1 vs col. 9; it should be noted that col. 9 considers only non-viaf sources and biographical dictionaries, but wikidata also links to encyclopedias and other online databases.
57 for example, people not having a viaf id but having an iccu id (https://tinyurl.com/y6hbtjuo); instructions about the internal search are available at https://www.mediawiki.org/wiki/help:extension:wikibasecirrussearch.
58 https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations.
59 angjeli, mac ewan, and boulet, “isni and viaf,” 16.
60 https://www.mediawiki.org/wiki/wikibase/datamodel.
61 “the label is the most common name that the item would be known by” (https://www.wikidata.org/wiki/help:label). see also ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp), 5, https://www.ifla.org/publications/node/11015.
62 bots exist to create more and more variant forms based on matching properties, such as date of birth (p569) and date of death (p570), and to import variant forms of names from national authority files.
see, for example, https://www.wikidata.org/w/index.php?title=q5669&diff=611600491&oldid=608231160.
63 https://www.wikidata.org/wiki/help:data_type.
64 https://www.wikidata.org/wiki/wikidata:property_proposal.
65 jenny a. toves and thomas b. hickey, “parsing and matching dates in viaf,” code4lib journal, 26 (october 21, 2014), https://journal.code4lib.org/articles/9607; stefano bargioni, “from authority enrichment to authoritybox: applying rda in a koha environment,” jlis.it 11, no. 1 (2020): 175–89, https://doi.org/10/gg66rq.
66 https://www.wikidata.org/wiki/help:dates.
67 see heindorf et al., “vandalism detection in wikidata.”
68 see mac ewan, “isni, viaf and naco.”
69 see https://www.wikidata.org/wiki/help:merge, https://www.wikidata.org/wiki/help:split_an_item, and https://www.wikidata.org/wiki/help:conflation_of_two_people.
70 complete list at https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations (e.g., https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations/p214).
71 https://admin.toolforge.org/; see also xavier agenjo-bullón and francisca hernández-carrascal, “registros de autoridades, enriquecimiento semántico y wikidata,” anuario thinkepi 12 (2018): 361–72, https://doi.org/10/ghbj6z.
72 https://www.wikidata.org/wiki/wikidata:property_proposal.
73 https://www.oclc.org/en/viaf.html.
74 https://www.wikidata.org/wiki/wikidata:introduction.
75 https://platform.worldcat.org/api-explorer/apis/viaf.
76 https://www.wikidata.org/wiki/special:entitydata; see also https://www.wikidata.org/wiki/wikidata:database_download.
77 https://www.wikidata.org/wiki/special:search.
78 https://www.wikidata.org/w/api.php.
79 https://query.wikidata.org/.
80 https://dumps.wikimedia.org/wikidatawiki/.
81 https://wdumps.toolforge.org/.
82 https://www.oclc.org/developer/develop/web-services/viaf/authority-source.en.html.
83 van veen, “wikidata.”
84 see “typical problems” in viaf guidelines: https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf.
85 pintscher, lacroix, and capozzi, “what’s new.”

virtual reality as a tool for student orientation in distance education programs: a study of new library and information science students

sandra valenti, brady lund, and ting wang
information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.11937

dr. sandra valenti (svalenti@emporia.edu) is assistant professor, school of library and information management, emporia state university. brady lund (blund2@g.emporia.edu) is a doctoral student of library and information management at emporia state university. ting wang (twang2@emporia.edu) is a doctoral student of library and information management, emporia state university.

abstract

virtual reality (vr) has emerged as a popular technology for gaming and learning, with its uses for teaching presently being investigated in a variety of educational settings. however, one area where the effect of this technology on students has not been examined in detail is as a tool for new student orientation in colleges and universities. this study investigates this effect using an experimental methodology and the population of new master of library science (mls) students entering a library and information science (lis) program. the results indicate that students who received a vr orientation expressed more optimistic views about the technology, saw greater improvement in scores on an assessment of knowledge about their program and chosen profession, and saw a small decrease in program anxiety compared to those who received the same information as standard text-and-links. the majority of students also indicated a willingness to use vr technology for learning for long periods of time (25 minutes or more). the researchers concluded that vr may be a useful tool for increasing student engagement, as described by game engagement theory.

literature review

computer-assisted instruction (cai) has, for many years, been considered an effective method of instructional delivery that improves student engagement and outcomes.1 new technologies, such as the learning management system (lms), online video, laptops and tablets, word processors, spreadsheets, and presentation platforms, have all significantly altered how knowledge is transferred and measured in students. when adopted by instructors, these technologies can improve the quality of student learning, work, and their evaluation of this work.
empirical research has shown that learning technologies do indeed contribute to better learning than a lecture alone.2 positive reaction to the adoption of new learning technologies among student populations has been shown across all grade levels, from pre-k through postgraduate education.3 research in the fields of instructional design technology (idt) and information science (is) has shown that the novelty of new learning technology provides short-term improvement in outcomes.4 this supports the broader hypothesis that engagement increases retention of knowledge. these findings would suggest that, at least in the short term, instructors could anticipate improvement in knowledge retention through the use of a new technology like virtual reality. when used in sustained instructional efforts, many learning technologies show some promise for improving the attainment of learning outcomes.5 this is why interest in learning technology has grown so significantly in the past two decades and the job outlook for instructional designers is increasing faster than the national average.6

a large proportion of instructional technologies are not truly “adopted” by instructors, but rather used only in one-off sessions and then discarded.7 there seem to be some common factors among those technologies that are adopted and used regularly by instructors:
1. practicality, or the amount of work the new technology requires versus the perceived value of said technology;
2. affordability, or the cost of a new technology versus the perceived value of said technology; and
3. stability, or the likelihood of the product to be continuously supported and updated by its manufacturer (e.g., a product like microsoft office has a higher likelihood of ongoing maintenance).8

as noted by lund and scribner, only recently, with the introduction of free vr development programs and inexpensive viewers/headsets like google cardboard, has vr fit these criteria.9 it is finally practical to use vr as a learning tool for classrooms with large numbers of students. “virtual reality is the computer-created counterpart to actual reality. through a video headset, computer programs present a visual world that can, pixel-perfectly, replicate the real world—or show a completely unreal one.”10 virtual reality is distinct from augmented reality, which augments a real-world, real-time image (e.g., viewed through a camera on a mobile device) with computer-generated information, such as images, text, videos, animation, and sound.11 the focus of the present study is virtual reality only, not the related augmented (or mixed) reality technology.

an important contribution to the study of virtual reality in library and information science (lis) is varnum’s beyond reality.12 this short introductory book covers both theoretical and practical considerations for the use of virtual, augmented, and mixed reality in a variety of library contexts. while the book describes how vr can be utilized in a variety of library education (for non-lis majors) contexts, it does not include an example of how virtual reality may be used for library school education. it also does not investigate in significant detail the use of virtual reality for a virtual orientation to an academic program. these are the gaps that the following study attempts to address.
the present study may be viewed through the framework of game engagement theory, as described by whitton.13 game engagement theory suggests that five major learning engagement factors exist and that using gaming activities may improve how well learning activities address these factors. these factors include:
• challenge, motivation to undertake the activity;
• control, the level of choice;
• immersion, the extent to which an individual is absorbed into the activity;
• interest, an individual’s interest in the subject matter; and
• purpose, the perceived value of the outcome of the activity.
it has been suggested by several researchers, including dede, that immersive experiences like vr touch on similar factors of engagement.14

emporia state university’s school of library and information management

the setting for this study is emporia (ks) state university’s school of library and information management (esu slim). esu slim is the oldest library school west of the mississippi river, founded in 1902. compared to other lis education programs, esu slim is unique in that it offers a hybrid course delivery format. the six core courses in the mlis degree program are online with two in-person class weekends for each class. each class weekend is eleven hours: from 6 to 9 p.m. friday and 9 a.m. to 5 p.m. saturday at one of nine distance education locations scattered throughout the western half of the united states. due to this course delivery format, the student population of esu slim may skew slightly older and have more individuals who are employed full-time relative to residential master’s programs. esu slim uses a cohort system, with a new group of students beginning annually at each of the eight distance locations as well as the main emporia, kansas campus. before each new cohort begins its first course, a one-day, in-person student orientation is offered on the campus at which the cohort will attend classes. the purpose of this experimental study is to examine how well vr technology can support or satisfy the role of the in-person student orientation by emulating the experience and information students receive during this informational session.

methods

this study used a pre-test/post-test experimental design. depending on the state in which the students reside, they were assigned either to the experimental or the control group. the experimental group received a cardboard vr headset (similar to google cardboard) and a set of instructions on how to use it. they were instructed to utilize this headset to view an interactive experience that introduced elements of library service and library education as a form of new student orientation. students in the control group received a set of links that contained the same information as the vr experience, but in a more static (non-immersive, non-interactive) setting.

participants for this study were library school students from four states: south dakota, idaho, nevada, and oregon. these students were all enrolled in a mixed-delivery program in lis. for each core course in the program, students attend two intensive, in-person, weekend class sessions. the rest of the course content is delivered via a learning management system.
for this study, the researchers were particularly interested in understanding the role of vr orientation for distance education students, as these students do not have access to the physical university campus and thus miss out on information that in-person interaction with faculty and the library environment might provide. this also seemed like a worthwhile population to study given that a large portion of lis programs have adopted the distance education (online or mixed-delivery) format.

in march 2019, a sample of this population was asked to complete a short survey to indicate their interest in virtual reality for new student orientation and the extent to which acquiring information via this medium might relieve their anxiety and increase their success in the program. sixty-one percent of students indicated at least some elevated level of anxiety about their first mls course, while 55 percent agreed that knowing more about the program’s faculty and course structure and purpose would decrease that anxiety. students were also asked to indicate the most pressing information needs they have about the program. these needs are displayed in table 1 below. this information was used to guide the design of the vr content for this study.

table 1. information needs expressed by new mls students
information need | number of respondents (out of 55)
information about esu’s curriculum | 50
what courses professors normally teach | 42
information about information access | 41
information about librarianship in general | 39
professors’ research interests | 35
information about esu’s faculty | 27
to see who they are via a video introduction | 25
information about esu’s library | 24
why they teach for esu’s mls program | 23
a little personal information about faculty | 20
information about my regional director | 14
to which associations do faculty belong | 13
information about esu’s physical spaces | 5
information about esu’s archives | 4

these students were also asked to indicate the extent to which they would like to use vr to virtually “meet” faculty, learn more about the program’s format, see program spaces, and learn about library services, using a five-point likert scale. the findings for this question are displayed in figure 1.

figure 1. new mls students’ reception to using vr as an orientation tool (bar chart: frequency of respondents for each category of vr use, rated from strongly agree to strongly disagree)

based on the largely positive response towards using vr for new student orientation, the researchers progressed to the experimental phase of the study. a vr experience was developed using veer vr (veer.tv), a completely free and intuitive vr-creation platform. within this platform, creators are able to upload images captured using a 360-degree vr camera (we used a samsung gear 360 camera) and drag-and-drop interactive elements, including text boxes, videos, audio, and transitions to new images. thus, it was possible to create a vr experience within the setting of an academic library where users could navigate throughout the building, virtually meet faculty, and learn about fundamental concepts in librarianship. for this phase of the study, a set of research questions was defined, a hypothesis created, and independent and dependent variables identified:

research questions
1.
research question 1: will vr improve students’ knowledge of topics related to their library school and basic library topics, relative to those without a vr experience?
2. research question 2: will vr reduce students’ anxiety about their library program, relative to those without a vr experience?
3. research question 3: will students’ perceptions towards the usefulness of vr be significantly different based on whether or not they utilized the vr experience?

hypothesis

use of vr will improve students’ knowledge of topics related to library schools and librarianship, reduce their anxiety, and result in a more positive perspective towards vr technology.

variables

independent variable: whether a student viewed the vr experience for a virtual orientation or viewed the web links for an online orientation.
dependent variables: change in students’ scores on a post-test assessment of orientation knowledge, compared to their pre-test scores; change in students’ anxiety levels and perceptions of vr.

experimental phase

the experimental phase of the study was conducted in august 2019. twenty-nine students agreed to participate in this study. the age and gender characteristics of this population are as follows: fourteen under age 35, eleven age 35–44, four age 45+; nine male, seventeen female, and three fluid or transgender. thirty-three percent of the students who agreed to participate were in the control group, while 67 percent were in the experimental group. all participants in the study received a free vr headset, which was theirs to keep. funding for these vr headsets was provided by a generous grant from a benefactor at the researchers’ university. participants in the control group were encouraged to use the vr headset after they had completed their participation in the study.

both groups received instructions with their viewer that directed them to complete a pre-test survey, embedded within a module of their learning management system account. following the pre-test, the experimental group was instructed to use the vr experience created by the researchers to learn about their library school, its faculty, and library concepts. the control group was instructed to use links provided in the module to experience the same content, but without the vr experience. following the experience, both groups were instructed to complete a post-test survey in the module, as well as a follow-up survey that asked questions about how long they interacted with the content, how the experience affected their program anxiety, and additional comments. once the data were collected for all participants, the researchers conducted a series of analyses on the data, including an analysis of covariance (ancova) for post-test scores among the control and experimental groups, and an ancova for program anxiety following the experimental treatment.15
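as an aside for readers who want to see what such an analysis looks like in code, the following python sketch runs an ancova of post-test scores on group membership with pre-test scores as the covariate, using pandas and statsmodels. the data frame, column names, and values are hypothetical stand-ins for illustration, not the study’s raw data.

# a minimal ancova sketch with hypothetical data; "pretest", "posttest", and
# "group" are stand-in column names, not the study's actual data file.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# toy scores for illustration only
df = pd.DataFrame({
    "group":    ["vr"] * 5 + ["links"] * 5,
    "pretest":  [14, 15, 13, 16, 14, 15, 16, 15, 17, 15],
    "posttest": [17, 18, 16, 18, 17, 17, 17, 16, 18, 17],
})

# ancova: model post-test scores on group, controlling for pre-test scores
model = smf.ols("posttest ~ C(group) + pretest", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)   # type ii sums of squares
print(anova_table)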
nearly 60 percent of participants spent more than 25 minutes using the virtual reality experience. this finding may seem remarkable, given that the average attention span of students is generally no more than a handful of minutes, but it aligns with the finding of geri, winer, and zaks that engagement with interactive video lengthens the attention span of users, and it supports the premise of engagement theory as discussed in the literature review.16 only 10 percent of individuals assigned to the experimental group decided not to use the headset. additionally, about one-third of participants in both the experimental and control groups indicated that they used the vr headset to view other content after they completed the study.

figure 2. amount of time experimental group participants spent in vr experience

table 2 shows responses to the likert questions about participants' post-test perspectives of vr. participants in the vr group generally had more favorable perspectives on their experience than participants in the control group. participants in the control group, however, were a bit more optimistic about the idea that vr has promising uses for education and librarianship (though both groups expressed optimistic perspectives on these questions). there was some indication that participants would be willing to use vr for student orientation again, as both groups responded favorably to the idea that vr orientation information is appropriate and negatively to the idea that it would be better to get information from other sources. tables 3 and 4 display the ancova for pre-test/post-test score change among groups and the change in anxiety among the groups, respectively. post-test scores for the experimental group (17.23 correct out of 20 questions, or 86 percent) and the control group (17.38/20, or 87 percent) were virtually identical; however, the pre-test scores differed (the experimental group, at 72 percent, scored worse on the pre-test than the control group, at 78 percent), so the change in scores was actually greater for the experimental group. as shown in table 3, though, this difference in score change was not found to be statistically significant, f(1, 20) = .641, p = .4, r = .01. that is, no significant difference was found as to whether vr improves scores compared to links. it can be concluded, however, that the links and vr together did improve scores from the pre-test to the post-test, with ancova values of f(1, 20) = 7.6, p < .01, r = .47.

table 2.
post-test perspectives of vr for experimental and control groups (values are levels of agreement on a five-point likert scale: 1, strongly disagree; 5, strongly agree)
question: control (text links) / experimental (vr)
the instructions were easy to understand and follow: 3 / 3.38
the viewer/text links were fun to use: 3 / 3.63
the vr/text links content was engaging: 3 / 3.13
i would recommend continuing vr/text links use: 2.67 / 3
i felt better informed about the topics presented: 2.5 / 3.11
the information given was helpful: 2.5 / 3.38
i feel more connected to the school than before: 2.5 / 2.88
virtual reality is just a fad: 2 / 2.88
there are exciting uses for vr in education: 4 / 3.5
there are exciting uses for vr in librarianship: 4 / 3.5
using vr is too time consuming: 2 / 3
i'd rather get information in formats other than vr: 2.5 / 2.89
vr orientation information is appropriate: 4 / 3.38

table 3. ancova for pre-test/post-test change in scores
source: degrees of freedom / f value / p value
pretest: 1 / .135 / .7
group: 1 / .641 / .4
error: 18
total: 19
corrected total: 20

though the vr group generally reported less anxiety on a five-point likert scale following the experiment than the control group (both groups showed some reduction), this difference was not statistically significant at p < .05 (though it was significant at p < .1). it is worth noting that few students indicated prior experience with vr before this study, so it may simply have been the unfamiliar technology, rather than the nature of the content, that kept anxiety from dropping as far as anticipated. at the same time, it is worth noting, as bawden and robinson did, that information overload, which could certainly be the product of immersive vr orientations, is connected to information anxiety.17 thus, in designing vr orientations it may be better to keep the amount of new information to a minimum, introducing only broad concepts and allowing more freedom and flexibility for the user.

table 4. ancova for anxiety following the orientation experience
source: sum of squares / df / mean square / f / sig.
between groups: 3.219 / 1 / 3.219 / 3.449 / .079
within groups: 17.733 / 19 / .933
total: 20.952 / 20

discussion
participants in this study expressed willingness to use vr for extended periods of time (over 25 minutes) and demonstrated strong levels of engagement. based on this finding, it seems possible that a well-designed vr orientation could be a suitable substitute for the in-person orientation for distance students. this is a significant finding, given that the majority of existing research on orientation for distance education students focuses on the design of online course modules or video streaming for orientation, which are not nearly as immersive and dynamic as physical presence in the environment.18 vr much more closely emulates physical presence than noninteractive, nonimmersive videos and text. participants in the experimental (vr) group expressed more favorable perspectives towards the technology. this suggests that experience with the technology increases comfort and interest in it, and it aligns with the findings of theng, mei-ling, liu, and cheok, among others, who found that users of vr were more likely to accept the technology after using it.19 additionally, participants stated interest in using vr for other purposes; one-third of participants had already used the technology to explore other apps suggested by the researchers.
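for readers who want to see how this kind of analysis is assembled, the sketch below shows one way the analyses summarized in tables 3 and 4 could be computed with pandas and statsmodels. it is an illustration only: the file name and column names (group, pretest, posttest, anxiety_post) are hypothetical placeholders, not the study's actual data file or code.

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# hypothetical data file: one row per participant, with columns
# group ("control" or "experimental"), pretest, posttest, anxiety_post
df = pd.read_csv("orientation_study.csv")

# ancova for post-test scores with the pre-test score as covariate
# (the structure reported in table 3: pretest and group effects on posttest)
posttest_model = ols("posttest ~ pretest + C(group)", data=df).fit()
print(sm.stats.anova_lm(posttest_model, typ=2))

# simple between-groups test of post-orientation anxiety
# (mirrors the between-groups/within-groups structure of table 4)
anxiety_model = ols("anxiety_post ~ C(group)", data=df).fit()
print(sm.stats.anova_lm(anxiety_model, typ=2))

in each model, the group term is the comparison whose f and p values correspond to those reported in the tables above.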
the findings of this study align with game engagement theory in several of its key aspects. vr is shown to have garnered the interest of the students who participated in the study, as indicated in table 2, aligning with the aspect of interest. they could see the purpose of the experience and were able to take control of the experience to ensure that they interacted with necessary information to satisfy this purpose. this is opposed to the control group, which had to follow links and read text in a sequential order with little control or creativity involved. accordingly, greater improvement in scores was observed for the experimental group. even though the improvement was not statistically significant, this could likely be explained by the relatively small sample size. with a larger number of participants, the statistical strength of the differences between the two study groups may have been more pronounced. this is one limitation of the present study. in addition to a small participant group, several other limitations exist with this study. participants came from only a small sample of states, all in the western half of the united states. a less homogeneous sample may have produced more robust results. some vr headsets arrived late due to delays in distributing them, giving the students less opportunity to review the content than information technology and libraries june 2020 virtual reality as a tool for student orientation | valenti, lund, and wang 10 they otherwise may have had. finally, the researchers were not able to easily troubleshoot problems with accessing the vr experience for distance students. while the best was done to help all participants figure out how to use the technology, several students opted to discontinue participation when the technology gave them trouble. this also led to a smaller study sample population than initially anticipated. conclusion the findings of this study may have several important implications for library professionals who are considering using vr technology for library orientations or instruction. this study found vr to have a positive effect on students’ interest and to slightly increase scores and reduce anxiety among them. while there is no indication from this study whether vr would produce positive effects over a sustained period of time (e.g., every class session over the course of a semester), in limited usage it appears to at least draw students’ attention more so than the traditional online teaching options like static text and links. the same vr experience developed to introduce students to basic concepts within the librarianship/the library could be used for undergraduate and graduate students in all majors during library orientation sessions. this may make the library a more memorable component of students’ early university experiences, as opposed to lecture information that students are likely to easily forget. library professionals may consider these factors when deciding whether to opt for the more traditional methods of instruction/orientation or experimenting with a more innovative method of teaching like virtual reality. endnotes 1 jennifer j. vogel et al., “using virtual reality with and without gaming attributes for academic achievement,” journal of research on technology in education 39, no. 1 (2006): 105–18, https://doi.org/10.1080/15391523.2006.10782475. 
2 yigal rosen, “the effects of an animation-based on-line learning environment on transfer of knowledge and on motivation for science and technology learning,” journal of educational computing research 40, no. 4 (2009): 451–67, https://doi.org/10.2190/ec.40.4.d; elisha chambers, efficacy of educational technology in elementary and secondary classrooms: a metaanalysis of the research literature from 1992–2002 (carbondale, il: southern illinois university at carbondale, 2002). 3 elisha chambers, “efficacy of educational technology in elementary and secondary classrooms: a meta-analysis of the research literature from 1992–2002,” phd diss., southern illinois university at carbondale, 2002. 4 jason m. harley et al., “comparing virtual and location-based augmented reality mobile learning: emotions and learning outcomes,” educational technology research and development 64, no. 3 (2016): 359–88, https://doi.org/10.1007/s11423-015-9420-7; jocelyn parong and richard e. mayer. “learning science in immersive virtual reality,” journal of educational psychology 110, no. 6 (2018): 785–95, https://doi.org/10.1037/edu0000241; paul legris, john ingham, and pierre collerette, “why do people use information technology? a https://doi.org/10.1080/15391523.2006.10782475 https://doi.org/10.2190%2fec.40.4.d https://doi.org/10.1007/s11423-015-9420-7 https://psycnet.apa.org/doi/10.1037/edu0000241 information technology and libraries june 2020 virtual reality as a tool for student orientation | valenti, lund, and wang 11 critical review of the technology acceptance model,” information and management 40, no. 3 (2003): 191–204, https://doi.org/10.1016/s0378-7206(01)00143-4. 5 zaid khot et al., “the relative effectiveness of computer‐based and traditional resources for education in anatomy,” anatomical sciences education 6, no. 4 (2013): 211–15, https://doi.org/10.1002/ase.1355; michael j. robertson and james g. jones, “exploring academic library users’ preferences of delivery methods for library instruction,” reference & user services quarterly 48, no. 3 (2011): 259–69. 6 joshua kim, “instructional designers by the numbers,” inside higher ed (2015), https://www.insidehighered.com/blogs/technology-and-learning/instructional-designersnumbers. 7 elena olmos-raya et al., “mobile virtual reality as an educational platform: a pilot study on the impact of immersion and positive emotion induction in the learning process,” eurasia journal of mathematics science and technology education 14, no. 6 (2018): 2045-57, https://doi.org/10.29333/ejmste/85874. 8 brady d. lund and shari scribner, “developing virtual reality experiences for archival collections: case study of the may massee collection at emporia state university,” the american archivist, https://doi.org/10.17723/aarc-82-02-07. 9 lund and scribner, “developing virtual reality experiences for archival collections.” 10 kenneth j. varnum, “preface,” in kenneth j. varnum, ed., beyond reality: augmented, virtual, and mixed reality in the library (chicago: ala editions, 2019): x. 11 brady d. lund and daniel a. agbaji, “augmented reality for browsing physical collections in academic libraries,” public services quarterly 14, no. 3 (2018): 275–82, https://doi.org/10.1080/15228959.2018.1487812. 12 kenneth j. varnum, ed., beyond reality: augmented, virtual, and mixed reality in the library (chicago: ala editions, 2019). 13 nicola whitton, “game engagement theory and adult learning,” simulation and gaming 42, no. 5 (2011): 596–609, https://doi.org/10.1177/1046878110378587. 
14 chris dede, “immersive interfaces for engagement and learning,” science 323, no. 5910 (2010): 66–69, https://doi.org/10.1126/science.1167311. 15 pat dugard and john todman, “analysis of pre‐test‐post‐test control group designs in educational research,” educational psychology 15, no. 2 (1995): 181–98, https://doi.org/10.1080/0144341950150207. https://doi.org/10.1016/s0378-7206(01)00143-4 https://doi.org/10.1002/ase.1355 https://www.insidehighered.com/blogs/technology-and-learning/instructional-designers-numbers https://www.insidehighered.com/blogs/technology-and-learning/instructional-designers-numbers https://doi.org/10.29333/ejmste/85874 https://doi.org/10.17723/aarc-82-02-07 https://doi.org/10.1080/15228959.2018.1487812 https://doi.org/10.1177%2f1046878110378587 https://doi.org/10.1126/science.1167311 https://doi.org/10.1080/0144341950150207 information technology and libraries june 2020 virtual reality as a tool for student orientation | valenti, lund, and wang 12 16 nitza geri, amir winer, and beni zaks, “challenging the six-minute myth of online video lectures: can interactivity expand the attention span of learners?,” online journal of applied knowledge management 5, no. 1 (2017): 101–11. 17 david bawden and lyn robinson, “the dark side of information: overload, anxiety and other paradoxes and pathologies,” journal of information science 35, no. 2 (2009): 180–91, https://doi.org/10.1177/0165551508095781. 18 moon-heum cho, “online student orientation in higher education: a developmental study,” educational technology research and development 60, no. 6 (2012): 1051–69, https://doi.org/10.1007/s11423-012-9271-4; karmen crowther and alan wallace, “delivering video-streamed library orientation on the web: technology for the educational setting,” college and research libraries news 62, no. 3 (2001): 280–85. 19 yin-leng theng et al., “mixed reality systems for learning: a pilot study understanding user perceptions and acceptance,” international conference on virtual reality (2007): 728–37, https://doi.org/10.1007/978-3-540-73335-5_79. https://doi.org/10.1177/0165551508095781 https://doi.org/10.1007/s11423-012-9271-4 https://doi.org/10.1007/978-3-540-73335-5_79 abstract literature review emporia state university’s school of library and information management methods research questions hypothesis variables experimental phase results discussion conclusion endnotes persistent urls and citations offered for digital objects by digital libraries article persistent urls and citations offered for digital objects by digital libraries nicholas homenda information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12987 abstract as libraries, archives, and museums make unique digital collections openly available via digital library platforms, they expose these resources to users who may wish to cite them. often several urls are available for a single digital object, depending on which route a user took to find it, but the chosen citation url should be the one most likely to persist over time. catalyzed by recent digital collections migration initiatives at indiana university libraries, this study investigates the prevalence of persistent urls for digital objects at peer institutions and examines the ways their platforms instruct users to cite them. 
this study reviewed institutional websites from the digital library federation’s (dlf) published list of 195 members and identified representative digital objects from unique digital collections navigable from each institution’s main web page in order to determine persistent url formats and citation options. findings indicate an equal split between offering and not offering discernible persistent urls with four major methods used: handle, doi, ark, and purl. significant variation in labeling persistent urls and inclusion in item-specific citations uncovered areas where the user experience could be improved for more reliable citation of these unique resources. introduction libraries, archives, and museums often make their unique digital collections openly available in digital library services and in different contexts, such as digital library aggregators like the digital public library of america (dpla, https://dp.la/) and hathitrust digital library (https://www.hathitrust.org/). as a result, there can be many urls available that point to digital objects within these collections. take, for example, image collections online (http://dlib.indiana.edu/collections/images) at indiana university (iu), a service launched in 2007 featuring open access iu image collections. users discover images on the site through searching and browsing and its collections are also shared with dpla. the following urls exist for the digital object shown in figure 1, an image from the building a nation: indiana limestone photograph collection: • the url as it appears in the browser in image collections online: https://webapp1.dlib.indiana.edu/images/item.htm?id=http://purl.dlib.indiana.edu/iudl/i mages/vac5094/vac5094-01446 • the persistent url on that page (“bookmark this page at”) http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446 • the url pasted from the browser for the image in dpla: https://dp.la/item/eb83ff0a6ae507e2ba441634f7eb0f18?q=indiana%20limestone nicholas homenda (nhomenda@indiana.edu) is digital initiatives librarian, indiana university bloomington. © 2021. https://dp.la/ https://www.hathitrust.org/ http://dlib.indiana.edu/collections/images https://webapp1.dlib.indiana.edu/images/item.htm?id=http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446 https://webapp1.dlib.indiana.edu/images/item.htm?id=http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446 http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446 https://dp.la/item/eb83ff0a6ae507e2ba441634f7eb0f18?q=indiana%20limestone mailto:nhomenda@indiana.edu information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 2 as a digital library or collection manager, which url would you prefer to see cited for this object? figure 1. an example of a digital object with multiple urls. mcmillan mill, ilco id in2288_1. courtesy, indiana geological and water survey, indiana university, bloomington, indiana. retrieved from image collections online at http://purl.dlib.indiana.edu/iudl/images/vac5094/vac509401446. citation instructions given to authors in major style guides explicitly mention using the best possible form of a resource’s url: “[i]t is important to choose the version of the url that is most likely to continue to point to the source cited.”1 of the three urls above, the second is a purl, or persistent url (https://archive.org/services/purl/), which is why both image collections online and dpla instruct users to bookmark or cite it. 
other common methods for issuing and maintaining persistent urls include digital object identifiers (doi, https://www.doi.org/), handles (http://handle.net/), and archival resource keys (ark, https://n2t.net/e/ark_ids.html). all of those have been around since the late 1990s to early 2000s. at indiana university libraries, recent efforts have focused on migrating digital collections to new digital library platforms, mainly based on the open source samvera repository software (https://samvera.org/). as part of these efforts, we wanted to survey how peer institutions were employing persistent, citable urls for digital objects to determine if a prevailing approach had emerged since indiana university libraries' previous generation of digital library services was developed in the early to mid-2000s. besides having the capability of creating and reliably serving these urls, our digital library platforms need to make these urls easily accessible to users, preferably along with some assertion that the urls should be used when citing digital objects and collections instead of the many non-persistent urls also directing to those same digital objects and collections. although libraries, archives, and museums have digitized and made digital objects in digital collections openly accessible for decades using several methods for providing persistent, citable urls, how do institutions now present digital object urls to people who encounter, use, and cite them? by examining digital collections within a large population of digital library institutions' websites, this study aims to discover:
1. what methods of url persistence are being employed for digital objects by digital library institutions?
2. how do these institutions' websites instruct users to cite these digital objects?

literature review
the study of digital objects in the literature often takes a philosophical perspective in attempting to define them. moreover, practical accounts of digital object use and reuse note the challenges associated with infrastructure, retrieval, and provenance. much of the literature about common methods of persistent url resolution comes from individuals and entities who developed and maintain these standards, as well as overviews of the persistent url resolution methods available. finally, several studies have investigated the problem of "link rot" by tracking the availability of web-hosted resources over time.
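the link-rot studies discussed below all rest on the same basic operation: revisiting a previously published url and recording whether it still resolves. as an illustration only, a minimal check of that kind can be written with the python standard library; the example urls and timeout are placeholders, not data or code from any of the cited studies.

import urllib.error
import urllib.request

def url_resolves(url, timeout=10):
    # returns true when the url answers with an http status below 400;
    # some servers reject head requests, so a sketch like this undercounts
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError):
        return False

for url in ["https://www.doi.org/", "http://handle.net/"]:
    print(url, "resolves" if url_resolves(url) else "does not resolve")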
allison notes the generations of philosophical thought that it took to recognize common characteristics of physical objects and the difficulty in understanding an authentic version of a digital object, especially with different computer hardware and software changing the way digital objects appear.2 hui also investigates the philosophical history of physical objects to begin to define digital objects through his methods of datafication of objects and objectification of data, noting that digital objects can be approached in three phases: objects, data, and networks, in order to define them.3 lynch is also concerned with determining the authenticity of digital objects and challenges inherent in the digital realm. in describing digital objects, he creates a hierarchy with raw data at the bottom, elevated to interactive experiential works at the top which elicit the fullest emotional connection contributing to the authentic experience of the work.4 the literature often examines digital objects from the practitioner’s perspective, such as the publishing industry’s difficulty in repurposing digital objects for new publishing products. publishers in benoit and hussey’s 2011 case study note the tension between managers and technical staff concerning assumptions about what their computer system could automatically do with their digital objects; their digital objects always require some human labor and intervention to be accurately described and retrievable later. 5 dappert et al. note the need to describe a digital object’s environment in order to be able to reproduce it in their work with the premis data dictionary for preservation metadata (https://www.loc.gov/standards/premis/).6 strubulis et al. provide a model for digital object provenance using inference and resource description framework (rdf) triples (https://w3.org/rdf/) since storing full provenance information for https://www.loc.gov/standards/premis/ https://w3.org/rdf/ information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 4 complex digital objects, such as the large amount of mars rover data they offer as an example, would be cost prohibitive.7 in 2001, arms describes the landscape of persistent uniform resource names (urn) of handles, purls, and dois near the latter’s inception.8 recent work by koster explains the persistent identifier methods most in use today and examines current infrastructure practices for maintaining them.9 the persistent link resolution method most prominently featured in the literature is the digital object identifier (doi). beginning in 1999, those behind developing and implementing doi have explained its inception, development, and trajectory, continuing with paskin’s deep explanation in 2002 of the reasons why doi exist and the technology behind the service. 10 discipline-specific research notes the utility of doi. sidman and davidson and weissberg studied doi for the purposes of automating the supply chain in the publishing industry.11 derisi, kennison, and twyman, on behalf of the public library of science (plos) announced their 2003 decision to broadly implement doi, followed by additional disciplinespecific encouragement of the practice by skiba in nursing education and neumann and brase in molecular design.12 the archival resource key (ark) is an alternative permanent link resolution scheme. 
since 2001, the open-source ark identifier offers a self-hosted solution for providing persistent access to digital objects, their metadata, and a maintenance commitment.13 recently, duraspace working groups have planned for further development and expansion of ark with the arks in the open project (https://wiki.lyrasis.org/display/arks/arks+in+the+open+project). persistent urls (purls) have been used to provide persistent access to digital objects for nearly 20 years, and their use in the library community is well documented. shafer, weibel, and jul anticipate uniform resource names becoming a web standard and offer purls as an intermediate step to aid in urn development.14 shafer also explained how oclc uses purls and alternate routing methods (arms) to properly direct global users to oclc resources.15 purls are also used to provide persistent access to government information and were seen by the cendi persistent identification task group as essential to their early efforts to implement the federal enterprise architecture (fea) and a theoretical federal persistent identification resolver.16 digital objects and collections should ideally be accessible via urls that work beyond the life of any one platform, lest the materials be subjected to “link rot,” or the process of decay when previously working links no longer correctly resolve. ducut et al. investigated 1994–2006 medline abstracts for the presence of persistent link resolution services such as handle, purl, doi, and webcite and found 20% of the links were inaccessible in 2008.17 mcmurry et al. investigated link rot in life sciences data and suggested practices for formatting links for increased persistence and approaches for versioning.18 the topic of link rot has been examined as early as 2003, in markwell and brooke’s “broken links: just how rapidly do science education hyperlinks go extinct,” cited by multiple link rot studies. ironically, this article is no longer accessible at the cited url.19 methodology this study sought a set of digital objects within library institutions’ digital collections websites. to locate examples of publicly accessible digital objects in digital collections, this study collected institutional websites from the digital library federation’s (dlf) published list of 195 members https://wiki.lyrasis.org/display/arks/arks+in+the+open+project information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 5 as of august 2019.20 subsequent investigation aimed to find one representative digital object from unique digital collections navigable from each institution’s main web page. this study aimed to locate digital collections that met the following criteria: 1. collections are openly available. 2. collections are in a repository service, as opposed to highlighted content visible on an informational web page or blog. 3. collections are gathered within a site or service that contains multiple collections, as opposed to individual digital project websites, when possible. 4. collections are unique to an institution, as opposed to duplicated or licensed content. these criteria were developed in an effort to find unique, publicly accessible digital objects within each institution’s digital collections. to be sure, users search for and discover materials in a variety of ways and in numerous services, but studying the information-seeking behavior of users looking for digital objects or digital collections is outside the scope of this study. 
ultimately, digital collections indexed by search engines or available in aggregator services like dpla often contain links to collections and objects in their institutionally hosted platforms. users who discover these materials are likely to be directed to the sites this study investigated. for the purposes of this study, at least one digital collection was investigated from each dlf institution. multiple sites for an institution were investigated when more than one publicly accessible site or service met the above criteria. when digital collections at an institution were delivered only through the library catalog discovery service, reasonable attempts were made to delimit discoverable digital collections content. in total, 183 digital collections were identified for this study. once digital collections were located, subsequent investigation aimed to locate individual digital objects within them. while digital objects represent diverse materials available in a variety of formats, for ease of comparing approaches between institutions, a mixture of individual digital images, multipage digital items, and audiovisual materials was examined. objects for this study were primarily available in websites containing a variety of collections and format types with common display characteristics despite format differences, and no additional efforts were made to locate equal or proportional digital object formats at each institution. one representative digital object was identified per digital collection, totaling 183 digital objects. once a digital object was located at an institution, the object's unique identifier, format, persistent url, persistent url label, method of link resolution (if identifiable), and citation were collected, with particular focus on the object's persistent url, if available. commonly used persistent url types and their url components can be identified, as seen in table 1; however, any means of persistence was collected if clearly identified. after examining initial results, the object's provided citation, if available, was added to the list of data collected, since many digital collection platforms provide recommended citations for individual objects.

table 1. commonly used persistent url methods and corresponding url components
persistent url type: url component
archival resource key (ark): ark:/
digital object identifier (doi): doi.org/ (or doi:)
handle: hdl.handle.net
persistent url (purl): purl.

results
most institutions have a single digital collection site or service that met the selection criteria for this study. some appear to have multiple digital collection repositories, often separated by digital object format or library department, and many institutions have collections that are only publicly accessible through discrete project websites, such as digital exhibits or focused digital humanities research projects. out of 195 dlf member institutions, 171 had publicly accessible digital collections. of these 171 institutions, 153 had digital collections services/sites that adhered to the criteria of this study, while 21 had only project-focused digital collections sites. since several institutions had more than one digital collection platform accessible via their main institutional website, a population of 183 digital collections was investigated.
one representative digital object from each collection was gathered, consisting of 107 digital images, 73 multipage items, and 3 audiovisual items (totaling 183).

table 2. number of instances of digital collection platforms identified
platform: number / percentage of total (183)
custom or unidentifiable: 53 / 29%
contentdm: 46 / 25%
islandora: 19 / 10%
dspace: 11 / 6%
samvera: 11 / 6%
omeka: 10 / 5%
internet archive: 7 / 4%
digital commons: 6 / 3%
fedora custom: 4 / 2%
luna: 3 / 2%
xtf: 3 / 2%
artstor: 2 / 1%
iiif server: 2 / 1%
primo: 2 / 1%
aspace: 1 / 1%
elevator: 1 / 1%
knowvation: 1 / 1%
veridian: 1 / 1%

as seen in table 2, almost a third of the digital collection platforms encountered appear to be custom-developed or customized so as not to reveal the software platform upon which they were based. of the platform-based services encountered where software was identifiable, 17 different platforms were used, and the top five were contentdm, islandora, dspace, samvera (hyrax, avalon, curation concerns, etc.), and omeka.

table 3. occurrence of persistent links in surveyed digital collections, method of link persistence, and persistent link labels
persistent links?: number / percentage of total (183)
no/unknown: 93 / 51%
yes/persistence claimed: 90 / 49%
persistent link method: number / percentage of total (90)
unknown: 33 / 37%
handle: 27 / 30%
ark: 19 / 21%
doi: 6 / 7%
purl: 5 / 6%
persistent link label: number / percentage of total (90)
other*: 24 / 26.7%
permalink: 22 / 24.4%
identifier: 13 / 14.4%
[no label given]: 10 / 11.1%
permanent link: 7 / 7.8%
uri: 5 / 6%
persistent link: 3 / 3.3%
handle: 2 / 2.2%
link to the book: 2 / 2.2%
persistent url: 2 / 2.2%
* twenty-four other persistent link labels were reported,21 each occurring only once.

as seen in table 3, the numbers of digital objects with and without publicly accessible persistent (or seemingly persistent) links were nearly equal. among the digital objects with persistent links, the majority claimed persistence without a discernible resolution method, with the rest divided between handle, ark, doi, and purl. these objects also had 33 different labels for these links in the public-facing interface. the top five labels were: permalink (22), identifier (13), permanent link (7), uri (5), and persistent link (3). as seen in table 4, the majority of digital objects surveyed had a unique item identifier in their publicly viewable item record. the majority did not offer a citation in the item's publicly viewable record. among items that offered citations, the majority contained a link to the item, and three offered downloadable citation formats only, such as endnote, zotero, and mendeley.

table 4. various digital object characteristics surveyed
unique item identifier in item record: number / percentage of total (183)
yes: 132 / 72%
no: 51 / 28%
citation in item record: number / percentage of total (183)
yes: 65 / 36%
no: 118 / 64%
citations containing links to item: number / percentage of total (65)
yes: 39 / 60%
downloadable citation format only: 3 / 5%
no: 23 / 35%

discussion
since proper citation practice dictates choosing the url most likely to provide continuing access to a resource, it follows that providing persistent urls to resources such as digital objects or digital collections is also a good practice.
it is encouraging to see a large number of institutions surveyed providing urls that persist (or claim to persist). providing persistent access to a unique digital resource implies a level of commitment to maintaining its url into the future, requiring policies, technology, and labor resources, further augmented by costs associated with registering certain types of identifiers like doi.22 it is likely that institutions not providing persistent (or not obviously persistent) urls are either internally committing to preserving their objects, collections, and services through means not known to end users; are constrained by technological limitations of their digital collection platforms; hope to develop or adopt new digital library services that offer these capabilities; or lack the resources to offer persistent urls. the four commonly used methods of persistent link resolution (doi, handle, ark, and purl) have been used for nearly 20 years, and it is not surprising that alternative observable methods were seldom encountered in this study. handles were the most common persistent url method, which seems related to the digital library platform used by an institution. dspace distributions are pre-bundled with handle server software, for example, and 12 out of 27 platforms serving digital objects with handles were based on dspace (https://duraspace.org/dspace/). when choosing to implement or upgrade a digital library platform, institutions often consider several available options. choosing a platform that offers the ability to easily create and maintain persistent urls might be less burdensome than making urls persist via independent or alternative means. thirty-three digital objects offered links that had labels implying some sort of persistence but lacked information describing the methods used or url components consistent with commonly used methods, as seen in table 1. to achieve persistence, there might be a combination of url rewriting, locally implemented solutions, or nonpublic persistent urls at work. it would benefit users, increasingly aware of the need to cite digital objects using persistent links, for digital object platforms that offer persistent linking to explicitly state that fact and ideally offer some evidence of the resolution method used. researchers will be looking for citable persistent links that offer some cues signifying their persistence, whether it is clearly indicated language on the website or a url pattern consistent with the four major methods commonly used.
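those url patterns can be matched mechanically. as an illustration only (the function name and screening logic below are hypothetical, not the script used for this study's data collection), a url can be flagged as one of the four major persistent-identifier types using the components from table 1:

def persistent_url_type(url):
    # heuristic match against the url components listed in table 1
    lowered = url.lower()
    if "ark:/" in lowered:
        return "ark"
    if "doi.org/" in lowered or lowered.startswith("doi:"):
        return "doi"
    if "hdl.handle.net" in lowered:
        return "handle"
    if "purl." in lowered:
        return "purl"
    return "unknown"

# the first two urls appear earlier in this article; the handle is a placeholder
examples = [
    "http://purl.dlib.indiana.edu/iudl/images/vac5094/vac5094-01446",
    "https://doi.org/10.6017/ital.v40i2.12987",
    "https://hdl.handle.net/1234/5678",
]
for url in examples:
    print(persistent_url_type(url), url)

a label such as "permalink" or "permanent link" on the page would still need to be read by a person; the pattern match covers only the url itself.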
it was somewhat surprising to encounter these types of citation helpers that did not include persistent links. since a digital object’s preferred persistent link is often different than the url visible in the browser, efforts should be made to make citations available containing persistent links. there are institutions with digital collections that were not examined in this study due to a number of factors. first, this study examined the 195 institutions who were members of the digital library federation, and there are 2,828 four-year postsecondary institutions in the united states as of 2018.23 additional study could expand perceptions about persistent links for digital objects when looking beyond the dlf member institutions, which are predominantly four-year postsecondary institutions but also contain museums, public libraries, and other cultural heritage organizations. an alternative approach to collecting this data would be to conduct user testing focused on finding and citing digital objects from a number of institutions. this approach was not used, however, since the initial goal of this study was to see how peer digital library institutions have employed persistent links and citations across a broad yet contained spectrum. as one librarian with extensive digital library experience, my approach to locating these platforms and resources is subject to subconscious bias i may have accumulated over my professional career, but i would hope that my experience makes me more able to locate these platforms and materials than the average user. digital library platforms are numerous, and often institutions have several of them with varying degrees of public visibility or connectivity to their institution’s main library website. this study’s findings for any particular institution are not as authoritative as self-reported information from the institution itself. while a survey aimed at collecting direct responses from institutions might have yielded more accuracy, a potentially low response rate would also make it difficult to truly know what methods of persistent linking peer institutions are employing, especially with the majority of these resources being openly findable and accessible. still, further study with self reported information could shed more light on the decisions to provide certain methods of persistent links to objects within their chosen digital collection platforms. moreover, it is possible that some digital object formats are more likely to have persistent urls than others. newer formats such as three-dimensional digital objects, commonly cited resources like data sets, and scholarship held in institutional repositories could be available in digital library services similar to those surveyed in this study with different persistent url characteristics. additional study could aim to survey populations of digital objects by format across multiple institutions to investigate any correlation between persistent urls and object format. information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 10 conclusion unique digital collections at digital library institutions are made openly accessible to the pu blic in a variety of ways, including digital library software platforms and digital library aggregator services. regardless of how users find these materials, best practices require users to cite urls for these materials that are most likely to continue to provide access to them. 
persistent urls are a common way to ensure cited urls to digital objects remain accessible. commonly used methods of issuing and maintaining persistent urls can be identified in digital object records within digital collection platforms available at these institutions. this study identified characteristics about these digital objects, their platforms, prevalence of persistent urls in their records, and the way these urls are presented to users. findings indicate that dlf member institutions are split evenly between providing and not providing publicly discernible persistent urls with wide variation on how these urls are presented and explained to users. decisions made in developing and maintaining digital collection platforms and the types of urls made available to users impact which urls users cite and the possibility of others encountering these resources through these citations. embarking on this study also was prompted by digital collection migrations at indiana university, and these findings provide us interesting examples of persistent url usage at other institutions and ways to improve the user experience in digital collection platforms. endnotes 1 the chicago manual of style online (chicago: university of chicago press, 2017), ch. 14, sec. 7. 2 arthur allison et al., “digital identity matters,” journal of the american society for information science & technology 56, no. 4 (2005): 364–72, https://doi.org/10.1002/asi.20112. 3 yuk hui, “what is a digital object?” metaphilosophy 43, no. 4 (2012): 380–95, https://doi.org/10.1111/j.1467-9973.2012.01761.x. 4 clifford lynch, “authenticity and integrity in the digital environment: an exploratory analysis of the central role of trust” council on library and information resources (clir), 2000, https://www.clir.org/pubs/reports/pub92/lynch/. 5 g. benoit and lisa hussey, “repurposing digital objects: case studies across the publishing industry,” journal of the american society for information science & technology 62, no. 2 (2011): 363–74, https://doi.org/10.1002/asi.21465. 6 angela dappert et al., “describing and preserving digital object environments,” new review of information networking 18, no. 2 (2013): 106–73, https://doi.org/10.1080/13614576.2013.842494. 7 christos strubulis et al., “a case study on propagating and updating provenance information using the cidoc crm,” international journal on digital libraries 15, no. 1 (2014): 27–51, https://doi.org/10.1007/s00799-014-0125-z. 8 william y. arms, “uniform resource names: handles, purls, and digital object identifiers,” communications of the acm 44, no. 5 (2001): 68, https://doi.org/10.1145/374308.375358. https://doi.org/10.1111/j.1467-9973.2012.01761.x https://www.clir.org/pubs/reports/pub92/lynch/ https://doi.org/10.1002/asi.21465 https://doi.org/10.1080/13614576.2013.842494 https://doi.org/10.1007/s00799-014-0125-z https://doi.org/10.1145/374308.375358 information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 11 9 lukas koster, “persistent identifiers for heritage objects,” code4lib journal 47 (2020), https://journal.code4lib.org/articles/14978. 10 albert w. simmonds, “the digital object identifier (doi),” publishing research quarterly 15, no. 2 (1999): 10, https://doi.org/10.1007/s12109-999-0022-2; norman paskin, “digital object identifiers,” information services & use 22, no. 2/3 (2002): 97, https://doi.org/10.3233/isu2002-222-309. 
11 david sidman and tom davidson, “a practical guide to automating the digital supply chain with the digital object identifier (doi),” publishing research quarterly 17, no. 2 (2001): 9, https://doi.org/10.1007/s12109-001-0019-y; andy weissberg, “the identification of digital book content,” publishing research quarterly 24, no.4 (2008): 255–60, https://doi.org/10.1007/s12109-008-9093-8. 12 susanne derisi, rebecca kennison, and nick twyman, “the what and whys of dois,” plos biology 1, no. 2 (2003): 133–34, https://doi.org/10.1371/journal.pbio.0000057; diane j. skiba, “digital object identifiers: are they important to me?,” nursing education perspectives 30, no. 6 (2009): 394–95, https://doi.org/10.1016/j.lookout.2008.06.012; janna neumann and jan brase, “datacite and doi names for research data,” journal of computer-aided molecular design 28, no. 10 (2014): 1035–41, https://doi.org/10.1007/s10822-014-9776-5. 13 john kunze, “towards electronic persistence using ark identifiers,” california digital library, 2003, https://escholarship.org/uc/item/3bg2w3vs. 14 keith e. shafer, stuart l. weibel, and erik jul, “the purl project,” journal of library administration 34, no. 1–2 (2001): 123, https://doi.org/10.1300/j111v34n01_19. 15 keith e. shafer, “arms, oclc internet services, and purls,” journal of library administration 34, no. 3–4 (2001): 385, https://doi.org/10.1300/j111v34n03_19. 16 cendi persistent identification task group, “persistent identification: a key component of an egovernment infrastructure,” new review of information networking 10, no. 1 (2004): 97–106, https://doi-org/10.1080/13614570412331312021. 17 erick ducut, fang liu, and paul fontelo, “an update on uniform resource locator (url) decay in medline abstracts and measures for its mitigation,” bmc medical informatics & decision making 8, no. 1 (2008): 1–8, https://doi.org/10.1186/1472-6947-8-23. 18 julie a. mcmurry et al., “identifiers for the 21st century: how to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data,” plos biology 15, no. 6 (2017): 1–18, https://doi.org/10.1371/journal.pbio.2001414. 19 john markwell and david brooks, “broken links: just how rapidly do science education hyperlinks go extinct?” (2003), cited by many and previously available from: http://wwwclass.unl.edu/biochem/url/broken_links.html [currently non-functional]. 20 “our member institutions,” digital library federation (2020), https://www.diglib.org/about/members/. 
https://journal.code4lib.org/articles/14978 https://doi.org/10.1007/s12109-999-0022-2 https://doi.org/10.3233/isu-2002-222-309 https://doi.org/10.3233/isu-2002-222-309 https://doi.org/10.1007/s12109-001-0019-y https://doi.org/10.1007/s12109-008-9093-8 https://doi.org/10.1371/journal.pbio.0000057 https://doi.org/10.1016/j.lookout.2008.06.012 https://doi.org/10.1007/s10822-014-9776-5 https://escholarship.org/uc/item/3bg2w3vs https://doi.org/10.1300/j111v34n01_19 https://doi.org/10.1300/j111v34n03_19 https://doi-org/10.1080/13614570412331312021 https://doi.org/10.1186/1472-6947-8-23 https://doi.org/10.1371/journal.pbio.2001414 http://www-class.unl.edu/biochem/url/broken_links.html http://www-class.unl.edu/biochem/url/broken_links.html https://www.diglib.org/about/members/ information technology and libraries june 2021 persistent urls and citations offered for digital objects by digital libraries | homenda 12 21 twenty-four labels used only once: archival resource key; ark; bookmark this page at; citable link; citable link to this page; citable uri; copy; copy and paste this url; digital object url; doi; identifier (hdl); item; link; local identifier; permanent url; permanently link to this resource; persistent link to this item; persistent link to this record; please use this identifier to cite or link to this item; related resources; resource identifier; share; share link/location; to cite or link to this item, use this identifier. 22 one of the frequently asked questions (https://www.doi.org/faq.html) states that doi registration fees vary. 23 national center for education statistics, “table 317.10. degree-granting postsecondary institutions, by control and level of institution: selected years, 1949–50 through 2017–18,” in digest of education statistics, 2018, https://nces.ed.gov/programs/digest/d18/tables/dt18_317.10.asp. https://www.doi.org/faq.html https://nces.ed.gov/programs/digest/d18/tables/dt18_317.10.asp abstract introduction literature review methodology results discussion conclusion endnotes research on knowledge organization of intangible cultural heritage based on metadata article research on knowledge organization of intangible cultural heritage based on metadata qing fan, guoxin tan, chuanming sun, and panfeng chen information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.14093 qing fan (fanqmy@hotmail.com) is phd student, jingchu university of technology and central china normal university. guoxin tan (gxtan@mail.ccnu.edu.cn) is professor, central china normal university. chuanming sun (cms@ccnu.edu.cn) is assistant professor, central china normal university. panfeng chen (94388389@qq.com) is phd student, guizhou university. © 2022. abstract metadata has been analyzed and summarized. based on dublin core metadata, combined with the characteristics and forms of intangible cultural heritage, this article explores the metadata for intangible cultural heritage in knowledge organizations based on relevant resource description standards. the wuhan woodcarving ship model is presented as an example of national intangible cultural heritage to control the application of metadata in intangible cultural heritage knowledge organizations. new ideas are provided for the digital development of intangible cultural heritage. introduction intangible cultural heritage includes traditions or living expressions inherited from our ancestors and passed on to our descendants. 
digital storage and presentation of intangible cultural heritage resources is an inevitable requirement for the protection of china’s long history and its culture in the information age. with the rapid development of artificial intelligence and big data, all kinds o f massive data in the internet age are expanding, necessitating the development of a database platform for the inheritance and protection of intangible cultural heritage. at the same time, organizations must consider how to deal with the intangible cultural heritage using complex data. searching for data and visualizing the relationship with intangible cultural heritage is a current research hotspot. however, at this stage, there are still some problems in the construction of digital resources of intangible cultural heritage in china, such as the establishment of accurate and interoperable metadata. in this process, the diversity and uniqueness of intangible cultural heritage items needs to be fully considered, including the subsequent integration of digital resources and its existing digital resource system of intangible cultural heritage in china. therefore, the construction of the intangible cultural heritage resource database is not only to simply organize and list the data, but more importantly, to reveal the relationships between the knowledge content and resources in the intangible cultural heritage field and to build a thorough and relevant knowledge system. research status at home and abroad metadata is data that describes the attributes of a certain type of resource (or object). metadata can be used to locate and manage the resource and display information about it.1 metadata can also be structured data used to describe online information resources and strengthen the collection development, organization, and utilization of online information resources.2 from the perspective of knowledge organization, general metadata is used to describe the theme, content, and characteristics of information resources. the most common metadata format is dublin core mailto:fanqmy@hotmail.com mailto:gxtan@mail.ccnu.edu.cn mailto:94388389@qq.com information technology and libraries june 2022 research on knowledge organization of intangible cultural heritage | qing, tan, sun, and chen 2 (dc) metadata, which is structured and descriptive. the creation of metadata standards in the field of intangible cultural heritage must first combine the basic concepts and characteristics of cultural heritage to extract specific attributes and provide element definitions that describe the basic characteristics of intangible cultural heritage resources, that is, core metadata. this is not easy to achieve since intangible cultural heritage is traditional art, music, folklore, etc. only by unifying intangible heritage resources of different expressions through metadata standards can a relatively standardized intangible cultural heritage resource library be formed. the visual resources association of america (vra) created the vra core metadata standard to describe art, architecture, prehistoric artifacts, folk culture, and other artistic visual resources in the network environment.3 in terms of intangible cultural heritage material, lan xuliu et al. proposed the vra core as the foundation format and added elements from the categories for the description of works of art (cdwa) as the extended element metadata format of digital cultural resources.4 a sculpture of abraham lincoln was used as the basis for the metadata format. 
the example explains the specific use method of the proposed metadata format in practice. the solution does not extend the core elements and there is an overall lack of flexibility as users cannot customize the required elements. b. murtha proposed a descriptive metadata architecture in the field of art and architecture, including the core category of ontology id, and added a controlled vocabulary and classification system in the field of art and architecture to enrich the specific metadata model.5 it is mainly based on the theoretical discussion of metadata standards in this field, and there is no specific practice, but its method of formulating metadata from the perspective of user retrieval effects is worth learning. yi junkai et al. proposed the core metadata specification for digital museums as the basis for expansion, implemented the relevant methods in the metadata expansion rules, and finally formed a special metadata specification. 6 this metadata specification system can guarantee the basic and personalized description of resources. the metadata specification was developed and completed by the national museum of china. to keep this specification consistent with the metadata description of other metadata specifications at home and abroad, the description method refers to the iso-11179 standard.7 the national museum metadata specification contains seven element sets, 60 elements, and 342 restricted elements. the seven metadata element sets are: collection resource entity, data resource entity, responsible entity, business entity, transaction entity, relationship entity, and save entity. each metadata element of the museum’s digital resources defines several elements according to the concept of hierarchical structure; each element is defined and described by a group of attributes, such as name, version, logo, definition, type, and value range. there are 11 attributes of necessity, repeatability, lower-level elements, application scope, and annotations. the establishment of the metadata standard framework for museum digital resources is based on the digitization of museum collection resources. collection resources are the core of museum work, and the content of museum collection resources is the core component of digital resources.8 the digital resources of these collections are related to communication, transmission, storage, or business activities. based on the characteristics of china’s existing intangible cultural heritage information resources, li bo proposed a compatible and interoperable metadata model. the description of intangible cultural heritage information resources was created on the basis of information structure and semantic component analysis.9 the ontological characteristics of each intangible cultural heritage information and related documents, characters, objects, spaces, and other entities are included in the construction of the intangible cultural heritage metadata model, which combines china’s nonmaterial cultural heritage. the actual situation of the tangible cultural heritage database has a information technology and libraries june 2022 research on knowledge organization of intangible cultural heritage | qing, tan, sun, and chen 3 certain degree of international generality. ye peng compared the dc metadata standard system with the needs of china’s intangible cultural heritage protection, proposed a metadata standard based on intangible cultural heritage resources, and gave the scope of application. 
this metadata standard contains multiple core metadata elements corresponding to the relevant elements in dc.10 however, ye peng also pointed out that a major problem with this intangible cultural heritage metadata standard is that it is not compatible with china's existing intangible cultural heritage databases in areas such as information storage, data mining, file retrieval, and multimedia distribution.

connotation and design principles of intangible cultural heritage metadata

connotation of intangible cultural heritage metadata

the core metadata of this study is designed on the basis of the dc core metadata set, considering its versatility, scalability, ease of conversion between metadata schemes, interoperability between systems, and comparisons with existing standards. dc is the most influential and widely used metadata standard for describing information resources in the network environment. since the dc metadata standard is mainly aimed at the retrieval of network entity resources, it reveals common characteristics of digital entity resources but does not consider the cultural connotation and knowledge context of specific knowledge topics such as intangible cultural heritage.11 to reveal the originality of the object, the model proposed in this article also combines the application and recording of china's intangible cultural heritage items, reflecting the characteristics of specific intangible cultural heritage items. this facilitates compatibility and integration with existing information resources, forming a unified interface standard with the existing intangible cultural heritage management systems of the cultural sector and enabling the sharing of digital resources among cultural centers in different regions.

design principles of intangible cultural heritage metadata

the design of the metadata model for intangible cultural heritage information resources should be fully compatible with popular metadata standards. various metadata standards apply to different objects: dc is suitable for network resources, cdwa is suitable for artworks, and the federal geographic data committee (fgdc) standard is suitable for geographic space. when it comes to digital collections, the national library of the netherlands was one of the first institutions in the world to respond, starting in 1994 with the decision to collect digital publications and working with publishers and it partners to make important contributions to digital collections research. the national library of the netherlands went on to develop a new global information network. the main approach of the system is to add dc data to all collected web pages: providers are required to add dc core elements to new pages themselves, and once a page is submitted, the library's search engine uses these dc elements to assist retrieval. in recent years, the art museum community has adopted several metadata standards, such as cdwa and vra core, to describe its collections of art works. nam and lee proposed a set of metadata elements customized to fit the distinct context of small-scale art museums in south korea.12 their scheme combines the existing cdwa, vra core, and dc standards, and the proposed set of metadata elements is expected to support the description of artistic resources in small korean art museums. the metadata design of intangible cultural heritage resources should refer to the design cases of the netherlands and south korea.
when applying existing metadata standards, it is important to fully reveal the characteristics of the described objects and to decide whether to adopt the overall framework or only parts of it; existing standards must not be adopted blindly.13 the design of the metadata should connect with existing intangible cultural heritage information sources. at present, china should refer to the relevant standards for world intangible cultural heritage digital resources and establish a management system for intangible cultural heritage resources aligned across the national, provincial, municipal, and county levels. the relevant cultural management departments have also established information systems that form a unified set of authoritative and standardized data. therefore, in terms of the elements and concepts used in the metadata model, special attention should be paid to the connection with these existing data models, so that as new resources are developed, these rich information sources can be shared through the mapping relationships between elements.

the metadata model should have good scalability and strong descriptive ability, containing a rich set of elements. an element-rich metadata model strengthens the organization and management of information resources and the disclosure of their content, and makes data inspection more flexible. conversely, a metadata model that lacks elements will be less flexible when technology is upgraded or user description requirements expand; such a model requires constant expansion and modification, and its practicality is greatly reduced. the design of the metadata model should therefore also include a mechanism that allows different types of users to extend the elements according to their needs. the design of the metadata should also show the relationships between intangible cultural heritage resource entities. with the development of information resource description technology at home and abroad, a batch of metadata standards for various types of information resources has been formed. the metadata standards for china's intangible cultural heritage should aim for compatibility and integrate existing world standards on the basis of current results, and should be further expanded and developed in accordance with the needs of preserving intangible cultural heritage works. the metadata standards for intangible cultural heritage archives should describe resources while displaying the greatest possible degree of versatility, compatibility, and standardization. therefore, combining the requirements of cultural heritage archiving with the characteristics of intangible cultural heritage, the dc metadata standard is used as the basic standard, and the advantages of other metadata standards are combined to determine the metadata standard for china's intangible cultural heritage archives.
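to make the element-mapping idea described above concrete, the following minimal sketch (not part of the original study) shows how records from an existing management system might be crosswalked onto the dc-based elements defined later in this article (table 1); the field names on the left are hypothetical stand-ins for whatever an existing provincial or municipal system actually uses.

```python
# a minimal crosswalk sketch: map fields from an existing system onto the
# dc-based element names, keeping unmapped fields so nothing is lost.
EXISTING_TO_DC = {
    "item_name": "dc_title",            # dc_* names come from table 1
    "item_class": "dc_category",
    "inheritor_name": "dc_creator_own",
    "inheritor_region": "dc_creator_area",
}

def map_record(existing_record: dict) -> dict:
    """convert a record from an existing system into the dc-based scheme,
    routing unmapped fields into an 'extension' group."""
    mapped, extension = {}, {}
    for field, value in existing_record.items():
        target = EXISTING_TO_DC.get(field)
        if target:
            mapped[target] = value
        else:
            extension[field] = value
    if extension:
        mapped["extension"] = extension
    return mapped

# example record exported from a hypothetical municipal system
print(map_record({"item_name": "wuhan woodcarving ship model",
                  "item_class": "traditional handicraft",
                  "inheritor_name": "long congfa",
                  "local_id": "hb-2008-042"}))
```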
intangible cultural heritage knowledge organization

definition of intangible cultural heritage metadata

through semantic analysis, the core attributes and concepts involved in the metadata can be obtained, and the specificity of attributes and concepts can be improved through metadata standards, making users' cognition, retrieval, and evaluation of information more accurate and effective. at the same time, the normative concepts and common attributes in existing metadata schemes should be reused as much as possible: according to the attribute characteristics of the object, close and similar conceptual entities are selected from one or more general metadata schemes, so that the element definitions are universal and normative. in the convention for the safeguarding of the intangible cultural heritage, unesco pointed out that the types of intangible cultural heritage include oral traditions, performing arts, social practices, festivals, and traditional handicrafts. based on these definitions of intangible cultural heritage types and the previous comparative research on the metadata standards of various countries, combined with the research results of other scholars, this study draws on the dc standard metadata names and standard affix library to define a set of intangible cultural heritage archive metadata containing 23 elements plus extended elements (table 1).

table 1. core metadata of intangible cultural heritage digital resources (category; standard metadata name and field name; annotation)
content: title (dc_title), category (dc_category), bintroduction (dc_bintroduction); annotation: name and content of intangible cultural heritage
creator: mcreator (dc_creator_own), nation (dc_creator_nation), sex (dc_creator_sex), age (dc_creator_age), area (dc_creator_area), biography (dc_creator_biography); annotation: creator identity information
category: dance (dc_category_dance), song (dc_category_song), literature (dc_category_literature), quyi (dc_category_quyi), art (dc_category_art); annotation: heritage list category
resources: video (dc_category_video), picture (dc_resources_picture), text (dc_resources_text), network (dc_resources_network); annotation: resource type, including a description of resource content
organization: area (dc_organization_area), principal (dc_organization_principal), officephone (dc_organization_officephone), jobtitle (dc_organization_jobtitle), introduction (dc_organization_introduction); annotation: organization information

intangible cultural heritage metadata standards unify the information format and the mutual mapping relationships of intangible cultural heritage digital achievements. on the one hand, a single standard removes barriers to sharing metadata caused by intangible cultural heritage information resources residing on different hardware, different platforms, and in different formats; on the other, it enables the digital resources of intangible cultural heritage to be shared online. for example, the china intangible cultural heritage digital museum (https://www.ihchina.cn/) uses unified metadata in its design, which solves the problem of integrating and sharing different resources. smooth conversion between new and old data benefits the protection of intangible cultural heritage inventory data, avoids duplication of work, and improves the efficiency and effectiveness of intangible cultural heritage storage. in addition, the design of intangible cultural heritage metadata standards must consider the versatility, compatibility, and individualization of the metadata system.
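read programmatically, the element set in table 1 could be represented roughly as follows; this is an illustrative sketch only (the dc_* field names come from table 1, while the grouping keys, helper function, and sample record are this example's own assumptions).

```python
# the 23 core elements of table 1, grouped by the table's "category" column
CORE_ELEMENTS = {
    "content": ["dc_title", "dc_category", "dc_bintroduction"],
    "creator": ["dc_creator_own", "dc_creator_nation", "dc_creator_sex",
                "dc_creator_age", "dc_creator_area", "dc_creator_biography"],
    "heritage_category": ["dc_category_dance", "dc_category_song",
                          "dc_category_literature", "dc_category_quyi",
                          "dc_category_art"],
    "resources": ["dc_category_video", "dc_resources_picture",
                  "dc_resources_text", "dc_resources_network"],
    "organization": ["dc_organization_area", "dc_organization_principal",
                     "dc_organization_officephone", "dc_organization_jobtitle",
                     "dc_organization_introduction"],
}

ALL_FIELDS = {f for group in CORE_ELEMENTS.values() for f in group}

def non_core_fields(record: dict) -> list:
    """return the fields in a record that are not part of the core set,
    so they can be routed to the extension mechanism instead."""
    return [f for f in record if f not in ALL_FIELDS]

# toy record: dc_title and dc_creator_own are core, 'dc_custom_note' is not
print(non_core_fields({"dc_title": "wuhan woodcarving ship model",
                       "dc_creator_own": "long congfa",
                       "dc_custom_note": "exhibition history"}))
```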
description of digital resources of intangible cultural heritage

through analysis of intangible cultural heritage project objects, attributes covering content, management, resources, and so on can be derived. these attributes can correspond to the elements of the metadata during metadata design or serve as the semantic basis for the definition of elements. the analysis and extraction of the core attributes and concepts of the object should first consider the full presentation of the object's knowledge and resource content, and the concepts should have a certain degree of specificity so that users can recognize, retrieve, and evaluate the information. secondly, it is important to draw on the normative concepts and general attributes in existing metadata schemes as much as possible: according to the attribute characteristics of the object, close and similar conceptual entities are selected from one or more general metadata schemes, so that the element definitions are versatile and standardized. the content description of intangible cultural heritage items should therefore reflect their unique cultural meanings and characteristics. at present, only general concepts such as name, category, subject, and region are available across the common general metadata schemes. figure 1 shows the metadata framework of intangible cultural heritage.

figure 1. metadata framework of intangible cultural heritage.

in the content description, there are five elements: name, type, subject, region, and protection level as special attributes. of these, the only elements that can be drawn from general metadata standards are name, subject, category, and region. the protection level indicates whether a listed intangible cultural heritage item is an object of national or provincial protection. the national intangible cultural heritage declaration form uses these five content-description elements, and resource description analysis is then conducted based on the information organization structure of the intangible cultural heritage project object to construct the intangible cultural heritage description framework. the framework covers the main attributes and definitions involved in intangible cultural heritage objects, as well as their connections and hierarchical relationships. in the description framework, in addition to the attributes and definitions specified by the dc metadata, a set of custom elements is also defined.14 without changing the basic structure, users can customize elements according to their needs, making the model extensible. in the description of related resources, entities related to intangible cultural heritage are divided into four categories: inheritors, object categories, resources, and organizations. among them, the inheritor-related attributes include six general attributes: name, ethnicity, gender, region, age, and personal profile. object category attributes include dance, song, art, literature, video, network, and so on. for visually intuitive objects, the category description can refer to the categories for the description of works of art (cdwa) or the vra core categories for visual materials, and documentation and supporting materials can use the metadata defined by dc. this model does not prescribe how the attributes and concepts are used in metadata: in a specific metadata solution, they can correspond to metadata element names, or they can serve as modifiers, values, or parts of metadata element definitions; for example, the inheritor of shadow puppetry is lin shimin.
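the four categories of related entities and the custom-element mechanism could be modeled along the following lines; this is one possible reading only (the class and field choices follow the lists above, but none of this code is taken from the original framework).

```python
# sketch of the description framework's related entities: an item is linked
# to inheritors, resources, and an organization, and carries a dict of custom
# elements that users can extend without changing the basic structure.
from dataclasses import dataclass, field

@dataclass
class Inheritor:
    name: str
    ethnicity: str = ""
    gender: str = ""
    region: str = ""
    age: str = ""
    profile: str = ""

@dataclass
class IchItem:
    title: str
    category: str                                         # dance, song, literature, quyi, art ...
    inheritors: list = field(default_factory=list)
    resources: list = field(default_factory=list)         # video, picture, text, network
    organization: str = ""
    custom: dict = field(default_factory=dict)            # user-defined extension elements

item = IchItem(title="wuhan woodcarving ship model",
               category="traditional handicraft",
               inheritors=[Inheritor(name="long congfa", region="wuhan, hubei")],
               custom={"protection_level": "national"})
print(item)
```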
data association

linked data is a technical specification recommended by the world wide web consortium (w3c). the relationships among linked data objects support a greater degree of resource sharing and utilization, enabling users to efficiently and accurately locate needed resources on a larger scale. publishing linked data means describing the metadata of cultural resources in the form of the resource description framework (rdf). after semantic associations are formed, intelligent retrieval and data discovery services are provided on the application platform, ensuring the visual presentation and data sharing of intangible cultural heritage digital resources in knowledge organizations. linked data publishing provides standardized data access specifications. its biggest advantage is that it can correlate data across platforms and establish links among different data, making it convenient for users to search for data in different repositories. as far as the content of intangible cultural heritage is concerned, linked data presents unstructured, semi-structured, and structured data on the internet in the form of rdf. rdf description refers to the transformation of metadata in resources into rdf triples through data and relationship mapping, and the formation of w3c-supported documents through the construction of semantic relationships. visual presentation refers to the visual presentation of relevant content to users through network search, with the support of the network architecture. in essence, publishing digital resource data means realizing the rdf description and sharing of intangible cultural heritage metadata by reusing these relationships; at its core, it is a management and application process over the database. the linked data publishing process for intangible cultural heritage resources consists of three steps: (1) converting the metadata of the repository into an rdf triple model and assigning a uri identifier to form an rdf document of linked data; (2) establishing semantic relationships and building relational links to form semantic associations; and (3) mapping cultural resource data to the network through the uri access mechanism and presenting data search results in a visual way through user sparql queries. although different data publishing tools differ in structure, metadata-based linked data publishing follows these basic steps.
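with a common rdf toolkit such as rdflib, the three steps could look roughly like this; the code is a sketch under stated assumptions (the namespace uri, class and property names, and the local sparql query are illustrative, not the project's actual published vocabulary or endpoint).

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

ICH = Namespace("http://example.org/ich/")   # hypothetical base uri
g = Graph()
g.bind("ich", ICH)

# step 1: convert repository metadata into rdf triples and assign a uri
item = ICH["wuhan-woodcarving-ship-model"]
g.add((item, RDF.type, ICH.HeritageItem))
g.add((item, ICH.title, Literal("wuhan woodcarving ship model")))

# step 2: build relational links so items, inheritors, and organizations
# form semantic associations
inheritor = ICH["long-congfa"]
g.add((inheritor, RDF.type, ICH.Inheritor))
g.add((item, ICH.hasInheritor, inheritor))

# step 3: expose the data through the uri access mechanism and query it;
# a local sparql query stands in here for a published endpoint
results = g.query(
    "SELECT ?who WHERE { ?item <http://example.org/ich/hasInheritor> ?who }")
for row in results:
    print(row.who)

print(g.serialize(format="turtle"))   # the rdf document to be published
```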
examples of metadata application of intangible cultural heritage knowledge organizations

introduction to the wuhan woodcarving ship model intangible cultural heritage project

the wuhan woodcarving ship model is a unique art form within chinese woodcarving craftsmanship, with a history of more than 2,000 years.15 according to the song dynasty account in the history of jin (jin shi · zhang zhongyan): "when the boat was to be built, the craftsmen did not know how to build it. the boat model made by zhang zhongyan was only a few inches long and very delicate; the front and rear of the boat could be joined precisely without glue, and the other craftsmen were all amazed." as early as the 12th century, there were people in china who could carve small boats several inches long as models for building ships. the hubei woodcarving boat is a national intangible cultural heritage project, but the art and craft face challenges. like other intangible cultural heritage projects, development of the craftwork is weak. while younger generations in hubei may recognize the form of the woodcarving boat, few are willing to learn this art, and many young people have never even heard of it. in order to better honor this long-standing tradition, this article focuses on the characteristics of intangible cultural heritage digital resources, combines them with relevant theories of knowledge organization, and adopts relevant technical standards to carry out knowledge organization and metadata standard construction for the hubei woodcarving ship model.

knowledge organization construction based on metadata

to use metadata effectively for intangible cultural heritage, metadata specifications must be defined and described. rdf is a metadata specification description language that can express, at the semantic level, the attributes of an ontology and the interrelationships between those attributes. rdf information can easily be exchanged between computers running different operating systems and application languages.16 rdf regulates the realization of semantics in a standardized and interoperable way. a web page can invoke rdf in a simple way, thereby facilitating the retrieval of network data and the discovery of related knowledge. in this paper, the metadata system uses rdf to define the attributes, so that they can be better transformed into a language that the computer can understand. intangible cultural heritage items have relationships with inheritors, organizations, resource content, and so on; in order to establish a complete intangible cultural heritage resource database, these entities need to be described separately in rdf.

wuhan woodcarving ship model metadata definition

following the rdf description, the wuhan woodcarving ship model is used as a specific example to show the designed metadata scheme; that is, the relevant content of the example is filled into the defined resource description framework. for example, part of the rdf description of the wuhan woodcarving ship model intangible cultural heritage item records the inheritor long congfa, male, of wuhan, hubei, described as a family-level intangible cultural heritage project inheritor of the wuhan woodcarving ship model.
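the xml markup of the original example did not survive reproduction, so the following is only a sketch of how that description might be expressed in rdf using the table 1 field names; the namespace uri and the turtle serialization are this example's assumptions, not the project's published schema.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

ICH = Namespace("http://example.org/ich/")   # hypothetical namespace
g = Graph()
g.bind("ich", ICH)

# the inheritor record quoted above, re-expressed with table 1 field names
inheritor = ICH["long-congfa"]
g.add((inheritor, RDF.type, ICH.Inheritor))
g.add((inheritor, ICH.dc_creator_own, Literal("long congfa")))
g.add((inheritor, ICH.dc_creator_sex, Literal("male")))
g.add((inheritor, ICH.dc_creator_area, Literal("wuhan, hubei")))
g.add((inheritor, ICH.dc_creator_biography,
       Literal("family-level intangible cultural heritage project inheritor "
               "of the wuhan woodcarving ship model")))

print(g.serialize(format="turtle"))
```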
conclusion

this article reviews the classification system of china's intangible cultural heritage items and integrates existing knowledge organization work with other types of resources to design a more comprehensive and reasonable set of metadata standards with a certain degree of scalability, which is then applied to actual intangible cultural heritage knowledge organization. to effectively protect and use the digital resources of intangible cultural heritage, further research is needed beyond this study, including discussion of how to update and promote existing metadata specifications and how to aggregate existing resources along multiple dimensions to achieve knowledge discovery. through the integration of linked data and the sharing of existing digital resources, this article can encourage scholarship and conversation that leads to the preservation of china's intangible cultural heritage.

funding statement

this work was supported by the hubei key laboratory of big data in science and technology. this work was also supported by the palace museum's open project in 2021, research on the dissemination of intangible cultural heritage of the palace museum from the perspective of artificial intelligence. this subject has also been funded by the mercedes-benz star wish fund of the china youth foundation.

endnotes

1 feng xiangyun, xiao long, liao sansan, and zhuang jilin, "a comparative study of commonly used foreign metadata standards," journal of university libraries 4 (2001): 15–21, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2001&filename=dxts200104005&uniplatform=nzkpt&v=v9a8p-rcf4csl9yoaqskj5nbnfjrmwjhsaoj2pnqq9jl0tdsle3ntrjrzeto32h.
2 ma min, "metadata—the basic format for organizing online information resources," information science 4 (2002): 377–79, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2002&filename=qbkx200204012&uniplatform=nzkpt&v=yemo5mxwo0mzg5mkz6qml62oruvfchtdy2slxdbn_hesfdvspxuc-naorq0v0ikl.
3 "specification data function requirements," november 24, 2014, http://eprints.rclis.org/13191/1/frad_2009-zh.pdf.
4 lan xuliu and meng fang, "metadata format analysis of digital cultural resources," modern information 33, no. 8: 61–64, 102, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2013&filename=xdqb201308015&uniplatform=nzkpt&v=skctnh3sg04qrgzqahxdh3nj2hmpk2ppmjbp4ymnpdq-phf2ffjwxpp5vcns9qc9.
5 murtha baca, "practical issues in applying metadata schemas and controlled vocabularies to cultural heritage information," cataloging & classification quarterly 36, no. 3–4 (2003): 47–55, https://doi.org/10.1300/j104v36n03_5.
6 yi junkai, zhou yubin, and chen gang, "research and practice of scalable digital museum metadata specification," digital library forum 2 (2014): 43–53, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfdtemp&filename=sztg201402011&uniplatform=nzkpt&v=tf76zueher7ymnfxdfafenmm2z2tetze08zqkdhoc7wq2zwtkoao3i0ei7oyvcf1.
7 jin saiying, "research on chinese and foreign art image metadata and framework," new art 37, no. 1 (2016): 129–32, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfdlast2016&filename=xmsh201601019&uniplatform=nzkpt&v=eynvoucbcnpzjkw84mxeabs--auqafuwanchem0p5phcmjw0s7jttnplobqop0_h.
8 xiao long and zhao liang, introduction and examples of chinese metadata (beijing: beijing library press, 2007).
9 li bo, "research on metadata model of intangible cultural heritage information resources," library circle 5 (2011): 38–41, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2011&filename=tsgu201105016&uniplatform=nzkpt&v=unflzsdezr0jue0ut_npb7h0ri5vioemybvm3zytqfh2quzuycubz5tzrbshnkwh.
10 ye peng and zhou yaolin, "the framework and standards of chinese intangible cultural heritage metadata," 2013 international conference on applied social science research (paris: atlantis press, 2013).
11 bamo qubumo, guo cuixiao, yin hubin, and li gang, "customizing discipline-based metadata standards for digital preservation of living epic traditions in china: basic principles and challenges," 2013 digital heritage international congress, https://ieeexplore.ieee.org/document/6744746.
12 y. j. nam and s. m. lee, "localization of metadata elements in the art museum community," 충청문화연구 46, no. 2 (2012): 139–74.
13 bamo qubumo, c. guo, h. yin, et al., "customizing discipline-based metadata standards for digital preservation of living epic traditions in china: basic principles and challenges," digital heritage international congress, ieee, 2014.
14 chao gojin, "unesco ethical principles for the protection of intangible cultural heritage: an introduction and comment," inner mongolia social sciences (chinese version) 37, no. 5 (2016): 1–13, https://doi.org/10.14137/j.cnki.issn1003-5281.2016.05.00.
15 chen junxiu, "research on the mode of productive protection and utilization of intangible cultural heritage," learning and practice 5 (2015): 118–23, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfdlast2015&filename=xxys201505014&uniplatform=nzkpt&v=telpps4abo6-qidxtqjyu9a_hy0q6ukovi4x5nz8br-u33pzq6py2d1cshqlclnw.
16 zhao zhihui, "visual analysis of the evolution path and hot frontiers of cultural heritage digitization research," library forum 2 (2013): 33–40, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2013&filename=tsgl201302007&uniplatform=nzkpt&v=yezmntrx2f00eqvogxwtz5yehk3zz1dm8layjik4l1lmjvvjuq7gaiymloplnmiv.
the impact of web search engines on subject searching in opac

holly yu and margo young

holly yu (hyu3@calstatela.edu) is library web administrator and reference librarian at the university library, california state university, los angeles. margo young (margo.e.young@jpl.nasa.gov) is manager of the library, archives and records section at the jet propulsion laboratory, california institute of technology, pasadena.

this paper analyzes the results of transaction logs at california state university, los angeles (csula) and studies the effects of implementing a web-based opac along with interface changes. the authors find that user success in subject searching remains problematic. a major increase in the frequency of searches that would have been more successful in resources other than the library catalog is noted over the time period 2000-2002. the authors attribute this increase to the prevalence of web search engines and suggest that metasearching, relevance-ranked results, and relevance feedback ("more like this") are now expected in user searching and should be integrated into online catalogs as search options.

in spite of many studies and articles on online public access catalogs (opacs) over the last twenty-five years, many of the original ideas about improving user success in searching the library catalog have yet to be implemented. ironically, many of these techniques are now found in web search engines. the popularity of the web appears to have influenced users' mental models and thus their expectations and behavior when using a web-based opac interface. this study examines current search behavior using transaction-log analysis (tla) of subject searches when zero hits are retrieved. it considers some of the features of web search engines and online bookstores and suggests future enhancements for opacs.

literature review

many studies have been published since the 1980s centering on the opac. seymour and large and beheshti provide in-depth overviews of opac research from the mid-1980s through the mid-1990s.1 much of this research has addressed system design and user behavior, including:
• user demographics,
• search behavior,
• knowledge of system,
• knowledge of subject matter,
• library settings,
• search strategies, and
• opac systems.2

opac research has employed a number of data-collection methodologies: experiment, interviews, questionnaires, observation, think aloud, and transaction logs.3 transaction logs have been used extensively to study the use of opacs, and library literature reflects this. while the exact details of tla vary greatly, peters et al. define it simply as "the study of electronically recorded interactions between online information retrieval systems and the persons who search for the information found in those systems."4 this section reviews the tla literature relevant to the study.

number of hits

tla cannot portray user intention or actual satisfaction since relevance, success, or failure are subjectively determined and require the user to decide. peters recommends combining tla with another technique such as observation, questionnaire or survey, interview, or focus group.5 in spite of the limitations of tla, many studies (including this one) rely on it alone. typically, these studies define failure as zero hits in response to a search.
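as a rough illustration of this zero-hit style of analysis (none of this code comes from the authors' study, and the tab-separated log format with a search type, query, and hit count per line is invented for the example):

```python
# compute, per search type, the share of searches that retrieved zero hits
from collections import Counter

def failure_rates(log_lines):
    """return the zero-hit rate for each search type found in the log."""
    totals, zeros = Counter(), Counter()
    for line in log_lines:
        search_type, _query, hits = line.rstrip("\n").split("\t")
        totals[search_type] += 1
        if int(hits) == 0:
            zeros[search_type] += 1
    return {t: zeros[t] / totals[t] for t in totals}

sample = ["subject\tglobal warming policy\t0",
          "subject\tdogs\t143",
          "keyword\tglobal warming\t27",
          "title\tthe great gatsby\t1"]
print(failure_rates(sample))   # {'subject': 0.5, 'keyword': 0.0, 'title': 0.0}
```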
generalizing from several studies, approximately 30 percent of all searches result in zero hits.6 the failure rate is even higher for subject searches: peters reported that about 40 percent of subject searches failed by retrieving zero hits.7 some researchers also define an upper number of results for a successful search. buckland found that the average retrieval set was 98.8 blecic reported that cochrane and markey found that opac users retrieve too much (15 percent of the time).9 wiberly, daugherty, and danowski (as reported in peters) found that the median number of postings considered to be too many was fifteen, although when fifteen to thirty postings were retrieved, more users displayed them all than abandoned the search.10

subject searching

some studies have specifically looked at subject searching. hildreth differentiated among various types of searches and defined one hundred items as the upper limit for keyword searches and ninety as the upper limit for subject searches.11 larson defined reasonable subject retrieval as between one and twenty items and found that only 12 percent of subject searches retrieved the appropriate number.12

larson is not the only researcher to have reported poor results in subject searching. for more than twenty years, research has demonstrated that subject or topical searches are both popular and problematic. tolle and hah found that subject searching is most frequently used and the least successful.13 moore reported that 30 percent of searches were for subject, and matthews et al. found that 59 percent of all searches were for subject information.14 hunter found that 52 percent of all searches were subject searches and that 63 percent of these had zero hits.15 van pulis and ludy referred to alzofon and van pulis's earlier work in 1984, where they reported that 42 percent of all searches were subject searches.16 hildreth found that 62.1 percent of subject searches and 35.4 percent of keyword searches failed.17 larson categorized the major problems with online catalogs as follows:
• users' lack of knowledge of library of congress subject headings (lcsh),
• users' problems with mechanical and conceptual aspects of query formulation,
• searches that retrieve nothing,
• searches that retrieve too much, and
• searches that retrieve records that do not match what the user had in mind.18

during an eleven-year longitudinal study, larson found that subject searching was being replaced by keyword searching.19 no consistent pattern in the number of search terms has emerged in the literature. van pulis and ludy reported that user searches were typically single words.20 markey contended that users' search terms frequently matched standardized vocabulary in large catalogs.21 none of markey's researchers consulted lcsh, and only 11 percent of van pulis and ludy's did so, notably in spite of their library's user-education programs. peters reported that lester found that the average search was less than two words and fewer than thirteen characters.22 hildreth found that more than two-thirds of keyword searches included two or more words and 42 percent of these multiple-word searches resulted in zero hits.23 the proportion of zero-hit keyword searches rose with the increasing number of words in the search. subject headings have been a matter of considerable study.
gerhan examined catalog records and surmised their accessibility in an online catalog. he contended that when a keyword from the title only is accessed, only 50 percent of all relevant books would be found, and that title keywords would lead a user to subject-relevant records in 55 percent of cases while lcsh would lead a user successfully in 85 percent of cases.24 in contrast, cherry found that 42 percent of zero-hit subject searches would have been more fruitful as keyword or title searches than by following cross-references retrieved from the subject field.25 she recommended converting zero-hit subject queries to other types of subject searches (keyword). thorne and whitlatch recommended that subject searchers should select keyword rather than subject headings as their first access strategy.26

types of problems in subject searches

numerous studies have categorized reasons for search failure (typically in zero-hit situations), but peters reports that a standard categorization has not yet been established.27 in cases where more than one error is made in a search (and hunter reported this to be frequent), there is no consistency in how that is assigned. nonetheless, some major categories of problems stand out:
• misspelling and typographical errors: peters found that these errors accounted for 20.8 percent of all unsuccessful keyword searches, while henty (reported by peters) concluded that 33 percent of such searches could be attributed to this.28 hunter found that 9.3 percent of subject searches had typographical and spelling errors.29
• keyword search: hunter found 52.6 percent of zero-hit searches used uncontrolled vocabulary terms.30
• wrong source or field: hunter concluded that 4.5 percent of searches should have been done in a source other than the catalog, while 1.3 percent of searches were of the wrong type (an author search in the subject-search option).31
• items not in the database: peters found that searches for items not held in the database accounted for 39.1 percent of unsuccessful searches, while hunter found that problem in only 2.5 percent of the problem cases.32

in addition to these problems, hunter also found that index display and rules relating to the systems accounted for 27 percent of errors.33

resulting recommendations for change

while hildreth stated in 1997 that "there has been little research on most components of the opac interface," he proposed two options to improve user success: increased user training or improved design based on information-seeking behavior.34 wallace pointed out that there is a very short window of opportunity when searchers are amenable to instruction and that successful screen designs should therefore focus on presenting first the quick-searching options employed by the majority of users.35 large and beheshti observed "that too many options simply caused confusion, at least for less experienced opac users," and they summarized that opac-interface research focuses on menu sequence, browsing, and querying.36

menu sequence

in terms of menu sequence, hancock-beaulieu indicated that "the menu sequence in which search options are offered will influence user selection."37 ballard found that the amount of keyword searching was affected by its position on the menu.38
scott reported that both keyword- and subject-search success improved when keyword was placed at the top of the menus.39 thorne and whitlatch used a combination of methods in their study and concluded that several interface changes should be implemented:
• strongly encourage novice users to start with keyword (list keyword above subject heading),
• relabel "keyword" to "subject or title words," and
• relabel "subject heading" to "library of congress subject heading."40

blecic et al. studied transaction logs over six months to track the impact of "simplifying and clarifying" opac introductory screens. after moving the keyword option to the top, keyword searching increased from 13.30 percent to 15.83 percent of all search statements. blecic et al. found their original tally of 35.05 percent of correct searches having zero hits decreased to 31.35 percent after screen changes.41

querying

opac-interface design has been based on an assumption that users come to the catalog knowing what they need to know. in either text-based or web-based opacs, query-based searches are still mainstream. searchers are required to have knowledge of title, author, or subject. ortiz-repiso and moscoso observed that web-based catalogs, like all library catalogs, basically fulfill two functions: locating works based on known details and identifying which documents in the database cover a given subject.42 natural-language input has long been considered a desirable way to overcome this shortcoming.

browsing

relevance-ranked output and hypertext were considered by hildreth to be promising in 1997.43 opacs have not been conceived within a true hypertext environment; rather, they maintain the structure of their original formats, principally machine-readable cataloging (marc), and therefore impede the generation of a structure of nodes and links.44 in addition to continuing to employ the marc format as their underlying structure, the concepts of main entry and added entry, field labels, and display logic all reflect cataloging rules. amazon.com and barnes and noble have completely moved away from this century-old structure to provide easy access to book information. in the web environment, the concept of main entry loses its meaning to multiple access points and the linking capabilities of author, subject, and call number. another prominent drawback of web-based opacs is that they have not taken advantage of thesaurus structure and utilized the thesaurus for search feedback. the hierarchical relationships in lcsh are underutilized in terms of the relationships between terms and associations through related terms. web-based opacs have failed to make use of this important access. the persistence of these drawbacks in opac-interface design is rooted deeply in cataloging rules that were derived from the manual environment more than a century ago. it reflects the gap between "concepts typically held by nonprofessional users and those used in library practices."45 in her article "why are online catalogs still hard to use?" borgman concludes: "despite numerous improvements to the user interface of online catalogs in recent years, searchers still find them hard to use. most of the improvements are in surface features rather than in the core functionality. we see little evidence that our research on searching behavior studies has influenced online catalog design."46

catalog content

users misunderstand the scope of the catalog.
in questionnaire responses, 80 percent of van pulis and ludy's participants indicated they had considered looking elsewhere than the library catalog, as in periodical indexes.47 blazek and bilal reported a request for inclusion of journal-article titles in one response to their questionnaire.48 libraries responded to these requests by acquiring databases on cd-rom, loading them locally (sometimes using the catalog system to mount a separate database), and, most recently, providing access to databases over the internet. however, seldom have libraries responded to these requests by integrating search access through a single front end as the default search.

impact of web search engines

blecic et al. found that keyword searching increased from 13.3 percent to 28.3 percent over their four-year series of logs. at the same time, zero hits in keyword increased from 8.71 percent to 20.78 percent while subject zero hits dropped from 23 percent to 13.69 percent. they surmised that the influence of web interfaces might have affected the regression-fluctuation in search syntax, initial articles, and author order.49

... automatically scouts the web for pages that are related to its results so it can find a large number of resources very quickly without requiring the user to select the right keywords. teoma structures the appropriate communities of interest on the fly and ranks the results on a range of factors, including authorities and hubs (good resources pointing to related resources). google offers an option of "similar pages." while the subject-redirect function in a web-based opac emulates this, it succeeds only if the user's initial search term yielded the right result. opac users have the option of clicking on hyperlinked headings (author, title, subject headings) but cannot ask the system to perform a more sophisticated search on their behalf.

user-popularity tracking

amazon and barnes and noble web sites present enhanced information about items by user-popularity tracking. circulation statistics or user comments could serve as a form of "recommender system" to help novices narrow their selections. messages such as "other students who checked this book out also read these books" could be dynamically inserted in bibliographic records. users could also be allowed to provide comments on materials in the catalog, thus providing an interactive experience for opac users.
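a minimal sketch of how the co-checkout idea might be computed from circulation data (the loan records and titles below are invented, and a production recommender would also need privacy safeguards and minimum-count thresholds):

```python
# "other students who checked this book out also read these books":
# co-checkout counts from circulation data drive a simple "more like this" list
from collections import Counter
from itertools import combinations

def co_checkout_counts(loans_by_patron):
    """count how often each pair of items was borrowed by the same patron."""
    pair_counts = Counter()
    for items in loans_by_patron.values():
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def also_read(item, pair_counts, top_n=3):
    """rank other items by how often they were borrowed alongside this one."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [title for title, _ in scores.most_common(top_n)]

loans = {"p1": ["intro to gis", "remote sensing basics", "cartography"],
         "p2": ["intro to gis", "remote sensing basics"],
         "p3": ["intro to gis", "cartography"]}
pairs = co_checkout_counts(loans)
print(also_read("intro to gis", pairs))   # both co-borrowed titles, each scoring 2
```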
summary of web features

there are positive and negative impacts of web search engines and online bookstores on web-based opac users. users who find web pages to be comfortable, easy, and familiar may make greater use of web-based opacs. while they bring with them their knowledge of search engines, they also bring their misperceptions. the possibility of using tools similar to those found on web search engines can greatly "reinforce the usefulness of the catalog as well as the positive perception that the end user has of it."61 given the diversity of the errors that users experience, a combination of approaches is necessary to improve their search success. automatic mapping of free text to thesaurus terms, translation of common spelling mistakes, and links to related pages are tools already in use in web search engines. "see similar pages," extensive use of relevance feedback, and popularity tracking along with natural language are less common.

recommendations for web-based opacs

the authors' tla revealed a continuing problem with subject-heading searches and showed a trend toward searching topics that are not typically answered in a book catalog. the former problem has a well-documented history, while the authors believe the latter problem stems from the influence of the web and web search engines. several changes to typical opacs are recommended to address the trends observed in the course of this study.

metasearching

the recent trend of incorporating databases and opacs into a single search reflects the necessity of expanding information resources and simplifying access to resources. this study's empirical results clearly indicate a need to expand this integration into one search. while some argue that this metasearching will further augment the syntax digression and prevent users from becoming information literate, others believe that metasearching, along with the option of searching each individual database, is an ultimate goal for online search. like it or not, the metasearch technology, also known as federated or broadcast search, "creates a portal that could allow the library to become the one-stop shop their users and potential users find so attractive."65 one-search-for-all cannot solve all problems; however, guiding users to where they are most likely to find results quickly (the quick search) should satisfy the needs of the majority of users.
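as a very small sketch of the broadcast-search idea (the source functions, result format, and merging step here are stand-ins, not any particular vendor's metasearch api; a real implementation would call the catalog, article databases, and other targets through their own protocols):

```python
# broadcast one query to several sources concurrently and merge the results
from concurrent.futures import ThreadPoolExecutor

def search_catalog(query):
    return [{"source": "catalog", "title": f"book about {query}"}]

def search_articles(query):
    return [{"source": "articles", "title": f"article on {query}"}]

def metasearch(query, sources):
    """query every source concurrently and merge results into one list."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda s: s(query), sources))
    merged = [hit for results in result_lists for hit in results]
    # a relevance-ranking step would normally go here before display
    return merged

print(metasearch("climate change", [search_catalog, search_articles]))
```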
menu sequence

effective screen design has a positive effect on user success. the menu sequence for search options plays a significant role in user selection. this research and others have demonstrated that users choose an option higher rather than lower in a list. too many options "simply cause confusion, at least for less experienced opac users."66

browsing feature

browsing is a natural and effective approach to many information-seeking problems and requires less effort and knowledge on the part of the user. the literature suggests that a great deal of the use of the web relies on known web sites, recommended sites, or return visits to sites recently visited, thus relying on browsing rather than on searching. jenkins, corritore, and widenbeck found that domain novices seldom clicked very deep (out and back) while web experts explored more deeply.67 holscher and strube note that hurtineene and wandtke claim that only minimal training is necessary for browsing an individual web site, while pollock and hockley claim that considerably more experience is required for querying and navigating among sites.68 hancock-beaulieu found that between 30 percent and 45 percent of all online searches, regardless of the type of search, are concluded with browsing the library shelves.69

... to implement user help through tips or tactics selected and accumulated from a collection of common user-search mistakes. in such a case, the system would play a more active role by generating relevant search tips on the fly and using zero-hit search results as a basis for generating a spell check or suggesting alternate wording. an ideal scenario is that the opac allows the user to pursue multiple avenues of an inquiry by entering fragments of the question, exploring vocabulary choices, and reformulating the search with the assistance of various specialized intelligent assistants. borgman suggests that an opac should be judged by whether the catalog answers questions rather than merely matches queries. she suggests the need to design systems that are based on behavioral models of how people ask questions, arguing that users still need to translate their question into what a system will accept.73
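the on-the-fly search-tips idea sketched above, using zero-hit results to drive a spell check or alternate wording, might be prototyped along these lines (the vocabulary list is a toy stand-in for a catalog's own indexes or lcsh cross-references):

```python
# suggest close index terms for query words that found nothing
import difflib

VOCABULARY = ["psychology", "philosophy", "photography", "physiology"]

def suggest_alternatives(query, vocabulary=VOCABULARY, cutoff=0.75):
    """for each query term with no index match, offer the closest index terms."""
    tips = {}
    for term in query.lower().split():
        if term not in vocabulary:
            close = difflib.get_close_matches(term, vocabulary, n=2, cutoff=cutoff)
            if close:
                tips[term] = close
    return tips

print(suggest_alternatives("psycology of learning"))
# {'psycology': ['psychology']} -- 'of' and 'learning' have no close index match
```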
user instruction

onsite training and online documentation can help make it easier to use the opac. with the advent of information literacy, the shift in library instruction from procedure-based query formulation to the question being answered has taken place. at csula, instruction for entry-level classes focuses on formulating a research statement and then identifying keywords and alternate terms. the instruction sessions that follow the initial concept formulation are short and focus on how to enter keyword or subject, author, and title, and the use of boolean operators. this approach may improve success until the systems provide the tools to improve search strategies or accept an untrained user's input. as an increasing number of users access online library catalogs remotely, assistance needs to be embedded into intuitive systems. "time invested in elaborate help systems often is better spent in redesigning the user interface so that help is no longer needed."74 users are not willing to devote much of their time to learning to use these systems. they just want to get their search results quickly and expect the catalog to be easy to use with little or no time invested in learning the system.

conclusion

the empirical study reported in this paper indicates that progress has been made in terms of increasing search success by improving the opac search interface. the goal is to design web-based opac systems for today's users, who are likely to bring a mental model of web search engines to the library catalog. web-based opacs and web search engines differ in terms of their systems and interface design. however, in most cases, these differences do not result in different search characteristics by users. research findings on the impact of web search engines and user searching expectations and behavior should be adequately utilized to guide interface design. web users typically do not know how a search engine works. therefore, fundamental features in the design of the next generation of the opac interface should include changing the search to allow natural-language searching with keyword search first, and a focus on meeting the quick-search need. such a concept-based search will allow users to enter the natural language of their chosen topic in the search box while the system maps the query to the structure and content of the database. relevance feedback to allow the system to bring back related pages, spelling correction, and relevance-ranked output remain key goals for future opacs.

references and notes

1. sharon seymour, "online public-access catalog user studies: a review of research methodologies, march 1986-november 1989," library and information science research 13 (1991): 89-102; andrew large and jamshid beheshti, "opacs: a research review," library and information science research 19 (1997): 2, 111-33.
2. ibid., 113-16.
3. ibid., 116-20.
4. thomas a. peters et al., "an introduction to the special section on transaction-log analysis," library hi tech 11 (1993): 2, 37.
5. thomas a. peters, "the history and development of transaction-log analysis," library hi tech 11 (1993): 2, 56.
6. pauline a. cochrane and karen markey, "catalog use studies since the introduction of online interactive catalogs: impact on design for subject access," in redesign of catalogs and indexes for improved subject access: selected papers of pauline a. cochrane (phoenix: oryx, 1985), 159-84; steven a. zink, "monitoring user success through transaction-log analysis: the wolfpac example," reference services review 19 (spring 1991): 449-56; michael k. buckland et al., "oasis: a front end for prototyping catalog enhancements," library hi tech 10 (1992): 7-22.
7. thomas a. peters, "when smart people fail: an analysis of the transaction log of an online public-access catalog," journal of academic librarianship 15 (1989): 5, 267.
8. michael k. buckland et al., "oasis," 7-22.
9. deborah d. blecic et al., "using transaction-log analysis to improve opac retrieval results," college and research libraries (jan. 1998): 48.
10. peters, "history and development of transaction-log analysis," 2, 52.
11. charles r. hildreth, "the use and understanding of keyword searching in a university online catalog," information technology and libraries 16 (1997): 6.
12. ray r. larson, "the decline of subject searching: long-term trends and patterns of index use in an online catalog," journal of the american society for information science and technology 42 (1991): 3, 210.
13. john e. tolle and sehchang hah, "online search patterns: nlm catline database," journal of the american society for information science 36 (mar. 1985): 82-93.
14. carol weiss moore, "user reaction to online catalogs: an exploratory study," college and research libraries 42 (1981): 295-302; joseph r. matthews et al., using online catalogs: a nationwide survey-a report of a study sponsored by the council on library resources (new york: neal-schuman, 1983), 144.
15. rhonda n. hunter, "success and failures of patrons searching the online catalog at a large academic library: a transaction-log analysis," rq 30 (spring 1991): 399.
16. noelle van pulis and lorne e. ludy, "subject searching in an online catalog with authority control," college and research libraries 49 (1988): 526.
ludy, "subject searching in an onl ine cata log with aut h ority contro l," college and research libraries 49 (1988): 526. 17. hildret h, "th e use and understanding of keyword searching," 6. 18. ray r. larson, "the decline of subjec t searching," 3, 60. 19. ibid. 20. van pulis and ludy, "subj ect searching in an onlin e cat alog," 527. 21. karen markey, research report on the process of subject searching in the library catalog: final report of the subject access research project (repo rt no. oclc /op r/ rr-83-1) (dub lin , ohio: oclc online co mput er library center, 1983), 529. 22. pe ters, "the history and deve lopment o f transactionlog ana lysis," 2, 43. 23. hi ldr eth, "the use and understanding of keyword searching," 8-9. 24. david r. gerhan, "lcsh in vivo: subje ct searching performance an d strategy in th e opac era," journal of academic librarianship 15 (1989): 86-8 7. 25. joan m. cherry, "improving subject access in op acs: an exploratory study of conversion of users' queries," journal of academic librarianship 18 (1992): 2, 98. 26. rosemary thorne and jo bell whitlatch, "patron on line catalog success," college and research libraries 55 (1994): 496. 27. peters, "the history and developmen t of transactionlog analys is," 2, 48. 28. ibid. 29. h unt er, "succe ss and failures," 400. 30. ibid., 399. 31. ibid., 400. 32. peters, "the histor y and developmen t of transa ctionlog analysis," 2, 56. 33. hunter, "success and failures," 400. 34. hildreth , "the use and understandi n g of keyword searchi n g," 6. 35. patricia m . wa llace, "how do patrons search th e online c:, talog w h en no one 's looking? trnn sae tion-log a nal ysis and impli cation s for bibliographic instruction and system desi gn, " rq 33 (winter 1993): 3, 249. 36. large and beheshti, "opacs: a research review," 125. 37. m. m. hancock-beaulieu , "online cata logue: a case for the user," in the online catalogue: developments and directions, c. hildreth, ed. (london: library association , 1989), 25-46. 38. terry ballard, "com parative searching styles of patrons and staff," library resources and technical services 38 (1994): 293305. 39. jane scott et al.,"@*&#@ this computer and the horse it rode in on: patron frustration and failur e at th e opac" (in "co ntinuity and transformation : the promise of confluen ce": u sabi li rs·"' i [: ,, ), b p l..jr l.i ""( ' " user interface consulting fed erat ed search tn(,ln es 1.ibr;'.'\ry portals & [)at/\, (ln 'itr s ()pacs f.." ( h i ldrei' l's dl(, ital libr ar ies ezra schwartz locs (773) 256-1418 ezra@artandtech.com http://www.artandtech.com proceedings of the acrl 7th nationa l conference, chicago: acrl 1995), 247-56. 40. thorne and whitlat ch, "patron on lin e catalog success," 496. 41. blecic et al., "usin g tran sac tion-log ana lys is," 46. 42. virginia ortiz-repiso and purificac ion moscoso, "we bbased op a cs: between tradition and innovation ," lnformntion technology and libraries 18, no. 2 (june 1999): 68-69. 43. hildreth, "the use and understanding of keyword searching," 6. 44. ortiz-repiso and mos coso, "web-bas ed opac s," 71. 45. ibid., 75. 46. chris tine borgm an, "why are on line catalogs still hard to us e?" journal of the americnn society for information science 47 (1996): 7, 501. 47. van pulis and ludy, "subje ct searching in an onlin e cat alog," 53. 48. rla zek and bilal , "prob lems with opac: a case study of an academic research library," rq 28 (w int er 1988): 175. 49. debora h d. 
blecic et al., "a longitud inal stu dy of the effects of opac screen changes on searching behavior and user success," college and research library 60, no. 6 (nov. 1999): 524,527. 50. bernar d j. jan sen and udo pooch, "a revi ew of web searching studies and a framework for future resear ch," journal of the american society for information science and technology 52 (2001): 3, 249-50. 51. ibid., 250. 52. blazek and bilal, "problems with opac: a case study," 175; moore , "user reaction to online cata logs," 295-302. the impact of web search engines on subject searching in opac i yu and young 179 reproduced with permission of the copyright owner. further reproduction prohibited without permission. 53. m. j. bates, "the design of browsin g and berry-pickin g techniques for the onlin e search interfac e," online review 13 (1989): 5, 407-24. 54. jan sen and pooch, "a review of web searc hing studies, " 238. 55. judy luther, "trumping google? metasearching's promise," library journal 128 (2003): 16, 36. 56. jack muramatsu and wanda pratt, "transparent queries: investigating users' mental models of search engines," research and development in information retrieval sept. 2001. accessed mar. 10, 2003, http://citeseer.nj.nec.com/muramatsuoltransparent. html. 57. jans en and pooch, "a review of web searching studie s," 235. 58. luth er, 't rumping goog le," 36. 59. blecic et a l., "a lon gitudina l study of th e effects of opac screen changes," 527. 60. sus an m. colaric, "ins truction for web searching: an empirical study," college and research libraries news, 64 (2003): 2. 61. a. g. sutcliff, m. ennis, and s. j. watkinson, "empirical studies of end-user informati on searching," joumal of the american society for information science and tcchnologtj 51 (2000): 13, 1213. 62. "a ll about google," google. accessed dec. 10, 2003, www.google.com. 63. g. salton, introduction to modern information retrieval (new york: mcgraw-hill, 1983), 18. 64. orti z-rep iso and moscoso, "we b-ba sed opacs," 71. 65. luth er, "trumping google," 37. 66. maaike d. kiestr a et al, "end-us ers searching th e online catalogue: the influenc e of domain and system knowledge on search patterns. experiment at tilburg university," the electronic library 12 (dec. 1994): 335-43. 67. c. jen kins et al., "pa tterns of in forma tion seeking on the web: a qualitative study of domain expertise and web experti se," it and society l (winter 2003): 3, 74,77. accessed may 10, 2003, www.itandsociety.org/. 68. c. holscher and g. strube, "web search behavior of internet experts and newbi es," 9th international world wide web conference, (amsterdam. 2000). accessed mar. 28, 2003, www9.org/ w9cdrom /8 1/81.html; a. pollock and a. hockley, "wha t's wrong with internet searching," d-lib magazine (mar. 1997). accessed may 10, 2003, www.dlib.org/dlib/march97 /b t /03 pollo ck.h tml. 69. m . m . hanco ck-beau lieu , "on lin e catalogue: a case for the user," 25-46. 70. wilbert 0. galitz, the essential guide to user interface design: an introduction to gui design principles and techniques (chichester, england: wiley, 1996). 71. juliana chan," an evaluation of displays of bibliographic records in opacs in canadian academic and public libraries," mis report, univ. of toronto, 1995. [025.3132 c454e] 72. giorgio brajnik et al., "strategic h elp in user interfaces for information retriev al," journal of the american society for information science and technology (jasist) 53 (2002): 5, 344 . 73. borgman, "why are online catalogs still hard to use?" 500. 74. 
ibid.

article

algorithmic literacy and the role for libraries

michael ridley and danica pawlick-potts

information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12963

abstract

artificial intelligence (ai) is powerful, complex, ubiquitous, often opaque, sometimes invisible, and increasingly consequential in our everyday lives. navigating the effects of ai as well as utilizing it in a responsible way requires a level of awareness, understanding, and skill that is not provided by current digital literacy or information literacy regimes. algorithmic literacy addresses these gaps. in arguing for a role for libraries in algorithmic literacy, the authors provide a working definition, a pressing need, a pedagogical strategy, and two specific contributions that are unique to libraries.

introduction

algorithms, in one form or another, are as old as human problem solving and as simple as "a sequence of computational steps that transform the input into the output."1 for centuries they have been effective, and uncontroversial, methodologies. however, the rise of artificial intelligence (the integration of big data, enhanced computation, and advanced algorithms) with its human and greater-than-human performance in many areas has positioned algorithms as transformational and a "major human rights issue in the twenty-first century."2

algorithmic literacy is important given the prevalence of algorithmic decision-making in many aspects of everyday life and because "the danger is not so much in delegating cognitive tasks, but in distancing ourselves from—or in not knowing about—the nature and precise mechanisms of that delegation."3 as a result, david lankes warns of a new type of digital divide with "a class of people who can use algorithms and a class used by algorithms."4 in a 2019 deloitte survey "only 4 percent reported they were confident explaining what ai is and how it works."5 while a 2019 edelman survey indicated general awareness of ai, it also revealed a similar lack of knowledge about the details of ai.6 an informed, algorithmically literate public is better able to negotiate and employ the complexities of ai.7

identifying and acting upon algorithms as a literacy makes them as "fundamental as reading, writing, and arithmetic."8 however, the uncritical use of the term literacy should make one suspicious of extending it to algorithms. increasingly "literacy" has come to mean merely a body of knowledge or a set of domain-specific skills.9 various literacies have been described, such as health, death, financial, physical, ocean, religious, visual, dancing, spatial, screen, and porn. this includes a dozen different technology-related literacies.10 the case for algorithmic literacy, and the role for libraries in advancing it, must rest on a clear definition, a recognized problem and need, a pedagogical strategy, and a unique (or at least supportive) contribution libraries can provide.

michael ridley (mridley@uoguelph.ca) is librarian emeritus, mclaughlin library, university of guelph, ontario, canada. danica pawlick-potts (dpawlic@uwo.ca) is phd candidate, faculty of information and media studies, western university, ontario, canada. © 2021.
algorithms and literacy

while the term "algorithmic literacy" is recent, it has antecedents that cover similar if not equivalent ground. the general terms computer literacy or digital literacy have spawned more specific terms such as cyber literacy, computational thinking, and algorithmic thinking.11 most of these arise from the field of computer science, where algorithms are central, and focus on the computational nature of algorithms as a "matter of mathematical proof" where "other knowledge about algorithms—such as their applications, effects, and circulation—is strictly out of frame."12 the implications of algorithms in everyday life suggest that a deeper and broader interpretation is required.

whether a literacy, a mode of thinking, or merely a set of skills, discussions about computation and algorithms have been plagued by "ambiguity and vagueness" and "definitional confusion" resulting in ongoing challenges in establishing core pedagogy in both k–12 and higher education.13 without a clear, acknowledged, and actionable definition that differentiates it from concepts such as digital literacy, computational thinking, and algorithmic thinking, algorithmic literacy will be relegated to a buzz phrase and the urgency of its recognition and application will be lost.

the relationship between algorithms and artificial intelligence might recommend the adoption of "ai thinking" or "ai literacy" as the more appropriate term.14 however, algorithmic literacy is both more foundational than the broader concept of ai and more actionable than just thinking. algorithms are not a technology like ai or, more generally, computers. algorithms provide a structure that frames—and constrains—how we express ourselves. they are a way of seeing and acting in the world and "need to be understood as relational, contingent, [and] contextual."15 while the technical and operational aspects of algorithms are important to understand and use (as they are for the technologies and processes of reading and writing in a new language), they are complemented by a broader awareness:

literacy is not a set of generic skills or something we do or do not possess, it's a sociocultural practice, it's something that we do, and what we do with literacy depends on the social, cultural, and historical contexts in which we do it. literacy looks different in different contexts and communities. literacy is not neutral, it's ideological. there are dominant and marginalized literacies.16

this perspective is the essence of critical algorithm studies, where algorithms are viewed as sociotechnical systems that are "intrinsically cultural . . . constituted not only by rational procedures, but by institutions, people, intersecting contexts, and the rough-and-ready sensemaking that obtains in ordinary cultural life."17 algorithms as part of increasingly ubiquitous ai, such as machine learning and deep learning systems, reflect and promulgate certain ideologies and have impacts and influences in the full range of human society.
cautions about algorithmic decision-making have identified the far-reaching implications for bias, fairness, privacy, and democratic processes.18 at the same time, numerous national strategies to support ai development have highlighted the substantial economic impact, anticipated to be $15.7 trillion (us) by 2030.19 the idea of algorithmic literacy must encompass multiple perspectives and contexts.

"literacies of the digital"

computer, internet, information, computation, and algorithmic are all "literacies of the digital."20 while each of these has its own domain and focus, they share common ideas and are generally symbiotic with each other. there is an especially strong and complementary connection between computational literacy and information literacy.21 computational thinking and algorithmic literacy are closely related even if most definitions of the former fail to fully acknowledge the broader social, economic, and political implications. however, the extensive literature on computational thinking is useful in helping to articulate aspects of algorithmic literacy.

wing's foundational article about computational thinking describes the key characteristics in terms that closely resemble a literacy:

1. conceptualizing, not programming
2. fundamental, not a rote skill
3. a way that humans, not computers, think
4. complements and combines mathematical and engineering thinking
5. ideas, not artifacts
6. for everyone, everywhere.22

jacob and warschauer make a strong case for computational thinking as a literacy. their three-part framework identifies computational thinking as a new literacy embedded in modern sociocultural practices (computational thinking as literacy), discusses how literacy development can be leveraged to foster computational thinking (computational thinking through literacy), and explores ways in which computational thinking can facilitate literacy development (literacy through computational thinking).23 this analysis of computational thinking informs the larger context and broader implications of algorithmic literacy.

defining algorithmic literacy

scribner and cole define a literacy as "socially organized practices [that] make use of a symbol system and a technology for producing and disseminating it."24 therefore, literacy = practices + symbol system + technology. to this definition, steiner adds a more aspirational and humanistic definition:

by "literacy" i mean the ability to engage with, to respond to, what is most challenging and creative in our societies. to experience and contribute to the energies of informed debate. to distinguish the "news that stays news," as ezra pound put it, from the tidal waves of ephemeral rubbish, superstition, irrationalism, and commercial exploitation.25

literacy is about knowing and meaning making through the processes of internalizing and externalizing information. literacy enables a reflective, critical, and integrative approach to information that utilizes a broad knowledge base for both understanding and communicating ideas.
finn calls for an algorithmic literacy "that builds from a basic understanding of computational systems, their potential and their limitations, to offer us intellectual tools for interpreting the algorithms shaping and producing knowledge" and thereby provides "a way to contend with both the inherent complexity of computation and the ambiguity that ensues when that complexity intersects with human culture."26 referring more broadly to "ai literacy," long and magerko provide an operational view defining it as "a set of competencies that enables individuals to critically evaluate ai technologies; communicate and collaborate effectively with ai; and use ai as a tool online, at home, and in the workplace."27

following an exhaustive analysis of different, and often contradictory, definitions of literacy, information literacy, and digital literacy, bawden suggests "explaining, rather than defining, terms."28 this provisional description of algorithmic literacy acknowledges that advice. algorithmic literacy is the skill, expertise, and awareness to

• understand and reason about algorithms and their processes
• recognize and interpret their use in systems (whether embedded or overt)
• create and apply algorithmic techniques and tools to problems in a variety of domains
• assess the influence and effect of algorithms in social, cultural, economic, and political contexts
• position the individual as a co-constituent in algorithmic decision-making.

this description recognizes two overarching concepts: "creativity and critical analysis."29 creativity involves building, creating, and using algorithms for specific purposes. critical analysis involves recognizing the application of algorithms in decision-making and the implications of their use in a variety of settings and within certain contexts.

why algorithmic literacy?

the need for algorithmic literacy arises from two key and equally important perspectives, both of which essentially focus on power: control and empowerment. algorithms, especially those using machine learning and deep learning, are complex, opaque, invisible, shielded by intellectual property protection, and most importantly, consequential in the everyday lives of people.30 control is held by those who build and deploy algorithms, not those who use them.

in part because of these characteristics, people hold significant misconceptions about algorithms, their use, and their effect. in a 2019 global survey of consumers, 72% said they understood what ai was. however, despite ai being used in a wide variety of consumer-facing applications (e.g., email, search, social media), 64% said they had never used ai.31 a study of facebook users found that 62% were unaware that the news feed is algorithmically constructed and, even when told this, 12% concluded that it is, as a result, completely random.32

bias, discrimination, and unfairness in ai have been well documented.33 it is clear that poor data combined with underspecified algorithms and uncritical interpretations of the ai model outcomes can lead to abuses in a variety of ways. there is no quick fix, no automated solution to these problems. accordingly, those creating algorithms and those using them must be able to question the source of training data, the strengths and weaknesses of learning algorithms, the metrics for success, and how (and for whom) the systems are being optimized.
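one way to make those questions concrete is to look past a single aggregate success metric. the sketch below (a minimal illustration in python; the evaluation records, group labels, and values are invented for the example and are not drawn from any study cited here) computes accuracy separately for each group represented in the data, a simple first check on how, and for whom, a system is being optimized.

```python
from collections import defaultdict

# invented evaluation records: (group label, true outcome, model prediction)
RECORDS = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0),
]

def per_group_accuracy(records):
    """accuracy for each group, so disparities are visible rather than averaged away."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, truth, prediction in records:
        total[group] += 1
        correct[group] += int(truth == prediction)
    return {group: correct[group] / total[group] for group in total}

overall = sum(truth == pred for _, truth, pred in RECORDS) / len(RECORDS)
print(f"overall accuracy: {overall:.2f}")   # about 0.83, which looks acceptable in aggregate
print(per_group_accuracy(RECORDS))          # reveals that group_b fares markedly worse
```

the point of the sketch is not the arithmetic but the habit it encourages: asking what the success metric hides before accepting it.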
the overarching objectives are accountability and transparency.

perhaps most critically, the prevalence of algorithms in our lives has changed the way we interact with and use those systems, and the ways we behave in personal and social contexts. we conduct ourselves to be "algorithmically recognizable" allowing us to become "increasingly legible to machines for capture and calculation."34 the danger is that this will "lead users to internalize their [algorithm's] norms and priorities."35

at the same time the power of algorithmic technology is abused and misused, it remains a powerful technology to enhance human capabilities and insight. algorithms are attributable to dramatic advances in health care and science as well as more mundane (but appreciated) applications such as spam filters. anti-science sentiments, typified by anti-vaxxers, should not be allowed to undermine the opportunities for algorithms that materially improve the human condition and the natural world. those opportunities now extend beyond the well-funded, technology-rich research and corporate ai departments. increasingly more consumer-friendly tools and applications allow a broader and more diverse population to create algorithmic solutions. the rise of mlaas (machine learning as a service) brings together powerful cloud-based machine learning environments with accessible toolsets.36

algorithmic literacy is needed to acknowledge both the technology's power (control) over people and power (empowerment) for people. recognizing the need for protection and encouragement, many governments have enacted protective legislation and training initiatives. emblematic of the former is the general data protection regulation (gdpr) of the european union with its "right to explanation" for algorithmic decisions.37 exemplary of the latter is finland's initiative to educate a large portion of their population through "elements of ai," a free online course.38 despite these advances there remain power imbalances that require vigilance on the part of 21st-century digital citizens.

understanding the power and politics of algorithms recognizes their ontological impact in "new ways of ordering the world."39 effects this profound suggest a deeper and more comprehensive understanding of algorithms is needed:

efforts to help people understand algorithms need to continue moving away from a focus on building awareness of algorithms—people increasingly know about "those things called algorithms"—and toward explaining algorithms in such a way that people have a more consistent conceptualization of what algorithms are, what algorithms do, and—what often is overlooked—what algorithms cannot do.40

algorithmic literacy, like all literacies, is not about mastery but levels of competence appropriate to age, circumstance, and need. understood simply as recipes or visual decision trees, algorithms are accessible to even those with minimal digital literacy. public institutions, and specifically libraries, can and must take a lead role in addressing the challenges of this "new world."

the library role in algorithmic literacy

libraries have traditionally played a central role in making emerging technologies accessible to their communities whether those be online systems, makerspaces, interactive media, virtual reality, or a host of others.
advancing digital access, digital literacy, and digital inclusion have long been acknowledged by governments and public agencies as a role of the public library, even if libraries are not appropriately funded to do so.41 recently, libraries have begun addressing their role in relation to ai and algorithmic literacy.

the urban libraries council (ulc) conducted an informal poll about ai and public libraries.42 of the responding libraries (83 of its 150-member library systems), 45% identified ai as important to their leadership with 23% having a staff person dedicated to ai and 27% providing programming to help the public learn about ai. in response to a question of how best libraries could serve their community in this area, 79% said by framing and building awareness of ai, 68% recommended providing continuous education opportunities for the public, and 61% supported the provision of experiential programming. in 2019, the ulc formed a working group to advance the public library role in ai awareness, education, and experiences.

in 2018 the canadian federation of library associations (cfla) held a national forum in part focused on artificial intelligence.43 participant discussions yielded three key priorities with respect to ai: training for library staff, educational materials for the public, and advocacy initiatives regarding privacy, bias, and transparency. a fourth priority was the inclusion of ai literacy and awareness in mis and mlis curricula to facilitate a leadership role for the profession in this area.

algorithmic literacy programs have two general audiences: members of the community the library serves and the staff of the libraries themselves. for the community, these programs center on awareness and implications, skill development, and application and use.44 through workshops, hands-on laboratories and makerspaces, consumer checklists, and a variety of informational tools, libraries can provide, or partner in providing, resources in an age- and context-appropriate setting. for library staff, an additional focus is required on advocacy with respect to regulatory issues, system development, and the evolution of the local and national information infrastructures. library staff can lead, and participate in, advocacy programs that seek to influence government, public agencies, commercial system and service providers, and others about algorithmic literacy.

it is a misconception to think of algorithms, and ai more generally, as arcane topics beyond the ability of library staff to understand and teach. while the technical details of ai are complex, this is not the level of understanding required of staff or needed by the library's community. for example, ai programming at the frisco public library introduced ai maker kits and ran basic ai classes. the toronto public library, through its digital innovation hubs, has offered learning circles in basic ai (using the finnish elements of ai course as a foundation) and hosted presentations on various aspects of algorithms in everyday life. by abstracting algorithms to higher-level concepts related directly to daily experience (using facebook is illustrative of many key ideas regarding algorithmic literacy), staff can obtain a sufficient overview from a variety of accessible, introductory texts or videos.
perhaps most importantly, given the new and evolving nature of this technology, library staff should view themselves as co-learners. no matter the setting or context, an active learning approach is recommended with learners situated as makers as well as consumers.45 a review of the k–12 curricula regarding computational literacy identified active learning strategies based on projects, problem solving, cooperation, and games. the researchers recommend augmenting these with scaffolding strategies, storytelling, and aesthetic approaches.46 while intended for algorithmic literacy initiatives involving children, four design principles from dasgupta and mako are relevant for any demographic:

1. make data analysis central and ensure the data is relevant to the learner,
2. manage risk by using sandboxes for experimentation,
3. respect community values about technology that may differ, and
4. support authenticity with real-world examples and scenarios.47

long and magerko document a set of 17 core competencies and 15 associated learning design considerations regarding ai literacy.48 taken together these represent the basis for an algorithmic literacy program for any demographic and any context.

libraries are encouraged to seek partnerships and collaborations with schools (k–12 and higher education) as well as with non-profit advocacy and training groups.49 examples among these include the algorithmic literacy project (algorithmliteracy.org) and a.i. for anyone (https://aiforanyone.org). many technology companies also offer high-quality programs and resources. however, a report from the public policy forum notes that digital literacy campaigns are "too often funded by the very companies that are contributing to the problem."50

a key issue is the lack of assessment instruments. there are none for algorithmic literacy and few for computational thinking. the most prominent of the latter is skills based, focusing on concepts and operational practices and very little on the wider social and cultural implications.51 library experience with information literacy assessment can inform algorithmic literacy assessment by helping to balance skills and operational concerns with a wider focus on concepts and contextual awareness.

information literacy and explainable ai (xai): unique library contributions

while libraries can make contributions to algorithmic literacy through a variety of programs, resources, and advocacy initiatives, two specific areas suggest opportunities for unique contributions: algorithmic literacy as a part of information literacy and algorithmic literacy in support of "explainable ai" (xai).

algorithmic literacy and information literacy

annemaree lloyd describes the opacity and ubiquity of algorithms as "a wicked problem for librarians and archivists who have a vested interest in equitable access, informed citizenry and the maintenance of public memory" and insists that information literacy "provides resistance to the expansionist claims of algorithms, while at the same time ensuring that people harness the power of this culture to their advantage."52 information literacy programs championed by libraries have been instrumental in raising awareness and skill building among their user communities.
using information literacy programs as a scaffold, algorithmic literacy can be incorporated into these successful initiatives. however, given the current needs, "machine learning and algorithms present frontstage in the information literacy constellation."53

head et al., in their important 2020 study of algorithms and information literacy, present a view of student perspectives that is both troubling and optimistic.54 the students expressed "a tangle of resignation and indignation" about the effects of algorithms on their lives. for them, algorithms obscure more than they reveal, privacy is compromised, "trust is dead," and skepticism is total. the authors conclude that we face an "epistemological crisis" where algorithms are "stripping individuals of the responsibility to interpret the facticity of the information these systems give us when that interpretation has been performed by the algorithms themselves." however, students also employed "defensive practices" against algorithms, utilized "multiple selves" to preserve their privacy, and were keen to learn how to "fight back" against surveillance and algorithmic decision-making. this is a reminder that "while algorithms certainly do things to people, people also do things to algorithms."55 people have "algorithmic capital" which they can use in "negotiation with algorithmic power."56

with these findings, it seems clear that status quo information literacy programs will not address the unique challenges presented by algorithms. jason clark, scott young, and lisa janicke hinchliffe took up this challenge with a project funded by an imls grant.57 calling "algorithmic awareness" a "new competency," these researchers identified a gap in the acrl framework for information literacy that revealed "a lack of an understanding around the rules that govern our software and shape our digital experiences."58 those rules are the "invisible logic" of algorithms that need to be made transparent for users and library staff. deliverables from this project include an integrated curriculum, syllabus, and software prototype that respond uniquely to the pedagogical challenges of algorithmic literacy.59

in promoting ml (machine learning) literacy, ryan cordell also calls for a specific pedagogical approach that would "emphasize the situated-ness of ml training data and experiments, including the biases or oversights that influence the outcomes of academic, economic, and governmental ml processes."60 recommendations from this report provide guidelines for developing staff expertise, running pilot projects, and creating toolsets and checklists supportive of responsible machine learning.

algorithmic literacy and explainable ai (xai)

perhaps a less obvious way for libraries to contribute to algorithmic literacy is through explainable ai (xai).61 difficulties in interrogating algorithms to assess bias, discrimination, and unfairness (as well as other deficiencies such as veracity and generalizability) have led to widespread interest in xai.
the purpose of xai is to "enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners" and to deploy ai systems that have "the ability to explain their rationale, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future."62

there is complementarity between the objectives of xai and algorithmic literacy. both seek transparency, promote understanding, and facilitate accountability. both recognize the primacy of human agency in human-machine interaction. xai is accomplished through a variety of techniques, strategies, and processes. these can involve unambiguous proofs, technical and statistical interventions for verification and validation, and authorizations that rely on standards, audits, and policy directives.63 explanations are contextual. system designers, professionals, regulators, end users, and the general public need explanations specific to their objectives and tailored to their skills and knowledge.

as algorithmic decision-making is increasingly embedded in the information tools, services, and resources provided by libraries and promoted to users, xai and algorithmic literacy can operate in close association. libraries can incorporate aspects of xai into algorithmic literacy programming and the principles of algorithmic literacy (and more generally information literacy) can inform how xai is sensitive and responsive to different explanatory needs. xai is still an emergent field but it has had, and will continue to have, a profound impact on the development of machine learning systems. the opportunity for library involvement is immediate:

librarians need to become well versed in these technologies, and participate in their development, not simply dismiss them or hamper them. we must not only demonstrate flaws where they exist but be ready to offer up solutions. solutions grounded in our values and in the communities we serve.64

a repeated message from lis researchers is that library-developed tools to interrogate ai systems are essential components in advancing algorithmic literacy.65 these tools can address the complexity and opacity of machine learning systems and provide levels of explainability and transparency in contextually appropriate ways. one such tool, either as a stand-alone system or embedded in an existing discovery system, might provide a user with access to the nature, and potential bias, of the training data, the general efficacy of the learning algorithm(s) used, and the generalizability of the trained model to different contexts. this xai scorecard would integrate the objectives of xai, algorithmic literacy, and information literacy.
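as an illustration of what such a scorecard might contain, the sketch below defines a small data structure in python. the fields, wording, and example values are assumptions made for this article's hypothetical tool; they are not a specification from the xai literature or from any existing discovery system.

```python
from dataclasses import dataclass, field

@dataclass
class XAIScorecard:
    """a plain-language summary a library system could attach to an algorithmic feature."""
    feature: str                       # e.g., a "recommended titles" panel in a discovery layer
    training_data: str                 # provenance and coverage of the training data
    known_biases: list = field(default_factory=list)   # documented gaps or skews
    efficacy: str = "not evaluated"    # how well the learning algorithm performs, in brief
    generalizability: str = "unknown"  # how far the trained model transfers to new contexts

    def summary(self) -> str:
        bias_note = "; ".join(self.known_biases) or "none documented"
        return (f"{self.feature}: trained on {self.training_data}. "
                f"known biases: {bias_note}. efficacy: {self.efficacy}. "
                f"generalizability: {self.generalizability}.")

card = XAIScorecard(
    feature="recommended titles",
    training_data="circulation records, 2015-2020, single institution",
    known_biases=["underrepresents non-english holdings"],
    efficacy="modest offline precision; no user study",
    generalizability="untested outside the originating library",
)
print(card.summary())
```

the value of a structure like this is less the code than the discipline it imposes: every algorithmic feature exposed to users would carry a short, contextually appropriate account of its data, its limits, and its reach.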
by leveraging and developing library staff skills and by partnering with ai research and industry groups "libraries can become ideal sites for cultivating responsible and responsive ml."66 padilla views this engagement as not just a technical initiative but a library-wide effort to promulgate "responsible operations" with ai, noting that library practices "that embed transparency and explainability increase the likelihood of organizational accountability."67

conclusion

algorithms are "the new power brokers in society" and "we are growing increasingly dependent on computational spectacles to see the world."68 lash argues that this development has altered the rules by which society operates. constitutive rules (e.g., rules that define the boundaries of society) and regulative rules (e.g., rules that define how we operate in society) are now joined by "algorithmic, generative rules." these rules are "compressed and hidden and we do not encounter them in the way that we encounter constitutive and regulative rules. yet this third type of generative rules is more and more pervasive in our social and cultural life of the post-hegemonic order."69

algorithmic literacy is a means to understand this new set of rules and to encourage the skills and abilities so people can use algorithms and not be used by them. libraries have typically championed accessible technology and its effective use. the ubiquity of algorithmic decision-making and its profound impact on everyday lives makes the recognition and promotion of algorithmic literacy a critical new challenge and imperative for libraries of all types.

endnotes

1 thomas h. cormen et al., introduction to algorithms, 3rd ed. (cambridge ma: mit press, 2009), 13.
2 yoav shohman et al., "ai index 2017 report" (stanford, ca: human-centered ai initiative, stanford university, 2017), http://cdn.aiindex.org/2017-report.pdf; safiya noble, algorithms of oppression: how search engines reinforce racism (new york: new york university press, 2018), 1.
3 jos de mul and bibi van den berg, "remote control: human autonomy in the age of computer mediated agency," in law, human agency, and autonomic computing, ed. mireille hildebrandt and antoinette rouvroy (abingdon: routledge, 2011), 58.
4 lee rainie and janna anderson, "code-dependent: pros and cons of the algorithmic age" (pew research center, february 2017), http://www.pewinternet.org/wp-content/uploads/sites/9/2017/02/pi_2017.02.08_algorithms_final.pdf.
5 "canada's ai imperative: from predictions to prosperity" (toronto: deloitte, 2019), 16, https://www.canada175.ca/en/reports/ai-imperative?&id=ca:2el:3or:awa_2019_fcc_omnia1:from_dca_fccomnia2.
6 "2019 edelman ai survey," edelman, 2019, https://www.edelman.com/sites/g/files/aatuss191/files/2019-03/2019_edelman_ai_survey_whitepaper.pdf.
7 jenna burrell, "how the machine 'thinks': understanding opacity in machine learning algorithms," big data & society 3, no. 1 (2016), https://doi.org/10.1177/2053951715622512; rainie and anderson, "code-dependent."
8 jeannette wing, "computational thinking, 10 years later," communications of the acm 59, no. 7 (2016): 10, https://doi.org/10.1145/2933410.
9 loanne snavely and natasha cooper, "the information literacy debate," journal of academic librarianship 23, no. 1 (1997): 9–14, https://doi.org/10.1016/s0099-1333(97)90066-5.
10 alfred thomas bauer and ebrahim mohseni ahooei, "rearticulating internet literacy," cyberspace studies 2, no. 1 (2018): 29–53, https://doi.org/10.22059/jcss.2018.245833.1012.
11 evelyn stiller and cathie leblanc, "from computer literacy to cyber-literacy," journal of computing sciences in colleges 21, no. 6 (2006): 4–13; peter j. denning and matti tedre, computational thinking (cambridge ma: mit press, 2019); z. katai, "the challenge of promoting algorithmic thinking of both sciences- and humanities-oriented learners," journal of computer assisted learning 31, no. 4 (2015): 287–99, https://doi.org/10.1111/jcal.12070.
12 nick seaver, "what should an anthropology of algorithms do?" (american anthropological association, chicago, 2013), 1–2, http://nickseaver.net/papers/seaveraaa2013.pdf.
13 jesús moreno-león and marcos román-gonzález, "on computational thinking as a universal skill," in ieee global engineering education conference (educon, santa cruz de tenerife, spain: ieee, 2018), 1684–89; shuchi grover and roy pea, "computational thinking in k–12: a review of the state of the field," educational researcher 42, no. 1 (2013): 38–43, https://doi.org/10.3102/0013189x12463051; betual c. czerkawski and eugene w. lyman iii, "exploring issues about computational thinking in higher education," techtrends 59, no. 2 (2015): 57–65.
14 daniel zeng, "from computational thinking to ai thinking," ieee intelligent systems (november/december, 2013), 2–4; duri long and brian magerko, "what is ai literacy? competencies and design considerations," in proceedings of the 2020 chi conference on human factors in computing systems, chi '20 (honolulu, hi: association for computing machinery, 2020), 1–16, https://doi.org/10.1145/3313831.3376727.
15 rob kitchin, "thinking critically about and researching algorithms," information, communication & society 20, no. 1 (2017): 18, https://doi.org/10.1080/1369118x.2016.1154087.
16 karen nicholson, "information into action? reflections on (critical) practice" (workshop on instruction in library use (wilu), university of ottawa, 2018), 7–8, https://ir.lib.uwo.ca/fimspres/51/.
17 nick seaver, "algorithms as culture: some tactics for the ethnography of algorithm systems," big data & society 4 (2017): 10, https://doi.org/10.1177/2053951717738104.
18 virginia eubanks, automating inequity: how high-tech tools profile, police, and punish the poor (new york: st.
martin’s press, 2018); noble, algorithms of oppression; cathy o’neil, weapons of math destruction: how big data increases inequality and threatens democracy (new york: crown, 2016); frank pasquale, the black box society: the secret algorithms that control money and information (cambridge, ma: harvard university press, 2015). 19 time dutton, “building an ai world: report on national and regional ai strategies” (toronto: cifar, 2018), https://www.cifar.ca/docs/default-source/aisociety/buildinganaiworld_eng.pdf?sfvrsn=fb18d129_4; pricewaterhousecooper, “sizing the prize: what’s the real value of ai for your business and how can you capitalise?,” 2017, https://www.pwc.com/gx/en/issues/analytics/assets/pwc-ai-analysis-sizing-the-prizereport.pdf. 20 allan martin and jan grudziecki, “digeulit: concepts and tools for digital literacy development,” innovation in teaching and learning in information and computer sciences 5, no. 4 (2006): 249–67, https://doi.org/10.11120/ital.2006.05040249. 21 rosanne cordell, “information literacy and digital literacy: competing or complementary?,” communications in information literacy 7, no. 2 (2013): 177–83, https://doi.org/10.15760/comminfolit.2013.7.2.150; andreas dengel and ute heuer, “a curriculum of computational thinking as a central idea of information & media literacy,” in proceedings of the 13th workshop in primary and secondary computing education (wipsce’18) october 4-6, 2018, potsdam, germany (new york: acm, 2018), https://doi.org/10.1145/3265757.3265777; sarah gretter and aman yadav, “computational https://doi.org/10.3102/0013189x12463051 https://doi.org/10.1145/3313831.3376727 https://doi.org/10.1080/1369118x.2016.1154087 https://ir.lib.uwo.ca/fimspres/51/ https://doi.org/10.1177/2053951717738104 https://www.cifar.ca/docs/default-source/ai-society/buildinganaiworld_eng.pdf?sfvrsn=fb18d129_4 https://www.cifar.ca/docs/default-source/ai-society/buildinganaiworld_eng.pdf?sfvrsn=fb18d129_4 https://www.pwc.com/gx/en/issues/analytics/assets/pwc-ai-analysis-sizing-the-prize-report.pdf https://www.pwc.com/gx/en/issues/analytics/assets/pwc-ai-analysis-sizing-the-prize-report.pdf https://doi.org/10.11120/ital.2006.05040249 https://doi.org/10.15760/comminfolit.2013.7.2.150 https://doi.org/10.1145/3265757.3265777 information technology and libraries june 2021 algorithmic literacy and the role for libraries | ridley and pawlick-potts 12 thinking and media & information literacy: an integrated approach to teaching twenty-first century skills,” techtrends 60 (2016): 510–16, https://doi.org/10.1007/s11528-016-0098-4. 22 jeannette wing, “computational thinking,” communications of the acm 49, no. 3 (2006): 35. 23 sharin rawhiya jacob and mark warschauer, “computational thinking and literacy,” journal of computer science integration 1, no. 1 (2018): 3, https://doi.org/10.26716/jcsi.2018.01.1.1. 24 sylvia scribner and michael cole, the psychology of literacy, acls humanities e-book (series) (cambridge, ma: harvard university press, 1981), 99. 25 george steiner, “school terms: redefining literacy for the digital age,” lapham’s quarterly 1, no. 4 (2008): 198. 26 ed finn, “algorithm of the enlightenment,” issues in science and technology 33, no. 3 (2017): 25; ed finn, what algorithms want: imagination in the age of computing (cambridge, ma: mit press, 2017), 2. 27 long and magerko, “what is ai literacy?,” 2. 28 david bawden, “information and digital literacies: a review of concepts,” journal of documentation 57, no. 2 (2001): 233. 
29 gretter and yadav, "computational thinking," 510.
30 pasquale, the black box society; o'neil, weapons of math destruction.
31 "what consumers really think about ai: a global study," pega, 2019, https://www.ciosummits.com/what-consumers-really-think-about-ai.pdf.
32 motahhare eslami et al., "first i 'like' it, then i hide it: folk theories of social feeds," in proceedings of the 2016 chi conference on human factors in computing systems, chi '16 (san jose, ca: association for computing machinery, 2016), 2371–82, https://doi.org/10.1145/2858036.2858494.
33 julia angwin et al., "machine bias," propublica, may 23, 2016, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing; eubanks, automating inequity; noble, algorithms of oppression; pasquale, the black box society; ruha benjamin, race after technology: abolitionist tools for the new jim code (polity press, 2019); o'neil, weapons of math destruction.
34 tarleton gillespie, "the relevance of algorithms," in media technologies: essays on communication, materiality, and society, ed. tarleton gillespie, pablo j. boczkowski, and kirsten a. foot (cambridge, ma: mit press, 2014), 184; sun-ha hong, technologies of speculation: the limits of knowledge in a data-driven society (new york: new york university press, 2020), 2.
35 gillespie, "the relevance of algorithms," 187.
36 altexsoft, "comparing machine learning as a service: amazon, microsoft azure, google cloud ai, ibm watson," data science (blog), september 27, 2019, https://www.altexsoft.com/blog/datascience/comparing-machine-learning-as-a-service-amazon-microsoft-azure-google-cloud-ai-ibm-watson/.
37 european union, "regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016," 2016, http://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:32016r0679; bryce goodman and seth flaxman, "european union regulations on algorithmic decision making and a 'right to explanation,'" ai magazine 38, no. 3 (2017): 50–57, https://doi.org/10.1609/aimag.v38i3.2741.
38 finland, "work in the age of artificial intelligence: four perspectives on economy, employment, skills and ethics" (helsinki: ministry of economic affairs and employment, 2018), http://urn.fi/urn:isbn:978-952-327-313-9.
39 taina bucher, if . . . then: algorithmic power and politics (new york: oxford university press, 2018), 20.
40 alison j. head, barbara fister, and margy macmillan, "information literacy in the age of algorithms: student experiences with news and information, and the need for change" (project information literacy, 2020), 41, https://www.projectinfolit.org/uploads/2/7/5/4/27541717/algoreport.pdf.
41 paul t. jaeger et al., "the intersection of public policy and public access: digital divides, digital literacy, digital inclusion, and public libraries," public library quarterly 31, no. 1 (2012): 1, https://doi.org/10.1080/01616846.2012.654728.
42 "ulc snapshot: artificial intelligence," urban libraries council weekly newsletter, july 18, 2018.
43 canadian federation of library associations, "artificial intelligence and intellectual freedom: key policy concerns for canadian libraries" (ottawa: cfla, 2018), http://cfla-fcab.ca/wp-content/uploads/2018/07/cfla-fcab-2018-national-forum-paper-final.pdf.
44 martin and grudziecki, "digeulit."
45 b. alexander, s. adams becker, and m. cummins, "digital literacy: an nmc horizon project strategic brief" (austin, tx: the new media consortium, 2016), https://www.nmc.org/publication/digital-literacy-an-nmc-horizon-project-strategic-brief/.
46 ting-chia hsu, shao-chen chang, and yu-ting hung, "how to learn and how to teach computational thinking: suggestions based on a review of the literature," computers & education 126 (2018): 296–310, https://doi.org/10.1016/j.compedu.2018.07.004.
47 sayamindu dasgupta and benjamin mako hill, "designing for critical algorithmic literacies," arxiv:2008.01719 [cs], 2020, http://arxiv.org/abs/2008.01719.
48 long and magerko, "what is ai literacy?"
49 alexander, adams becker, and cummins, "digital literacy."
50 edward greenspon and taylor owen, "democracy divided: countering disinformation and hate in the digital public sphere" (ottawa: public policy forum, 2018), 19, https://ppforum.ca/wp-content/uploads/2018/08/democracydivided-ppf-aug2018-en.pdf.
51 marcos román-gonzález, juan-carlos pérez-gonzález, and carmen jiménez-fernández, "which cognitive abilities underlie computational thinking? criterion validity of the computational thinking test," computers in human behavior 72 (2017): 678–91, https://doi.org/dx.doi.org/10.1016/j.chb.2016.08.047.
52 annemaree lloyd, "chasing frankenstein's monster: information literacy in the black box society," journal of documentation 75, no. 6 (2019): 1476, https://doi.org/10.1108/jd-02-2019-0035.
53 head, fister, and macmillan, "information literacy in the age of algorithms," 42.
54 head, fister, and macmillan, "information literacy in the age of algorithms."
55 taina bucher, "the algorithmic imaginary: exploring the ordinary affects of facebook algorithms," information, communication & society 20, no. 1 (2017): 42, https://doi.org/10.1080/1369118x.2016.1154086.
56 tanya kant, making it personal: algorithmic personalization, identity, and everyday life (oxford: oxford university press, 2020), 152.
57 jason clark, lisa janicke hinchliffe, and scott young, "unpacking the algorithms that shape our ux" (washington, dc: imls, 2017), https://www.imls.gov/sites/default/files/grants/re-72-17-0103-17/proposals/re-72-17-0103-17-full-proposal-documents.pdf.
58 association of college and university libraries, "framework for information literacy for higher education," 2015, http://www.ala.org/acrl/standards/ilframework; jason clark, "building competencies around algorithmic awareness" (washington, dc: code4lib, 2018), https://www.lib.montana.edu/~jason/talks/algorithmic-awareness-talk-code4lib2018.pdf.
59 jason clark, algorithmic awareness (2018; repr., github, 2020), https://github.com/jasonclark/algorithmic-awareness.
60 ryan cordell, "machine learning + libraries: a report on the state of the field" (washington dc: library of congress, 2020), 31, https://labs.loc.gov/static/labs/work/reports/cordell-loc-ml-report.pdf.
61 michael ridley, "explainable artificial intelligence," research library issues, no. 299 (2019): 28–46, https://doi.org/10.29242/rli.299.3.
62 matt turek, "explainable artificial intelligence (xai)" (arlington, va: darpa, 2016), https://www.darpa.mil/program/explainable-artificial-intelligence; darpa, "explainable artificial intelligence (xai)" (arlington, va: darpa, 2016), http://www.darpa.mil/attachments/darpa-baa-16-53.pdf.
63 ashraf abdul et al., "trends and trajectories for explainable, accountable, and intelligible systems: an hci research agenda," in proceedings of the 2018 chi conference on human factors in computing systems, chi '18 (new york: acm, 2018), 582:1–582:18, https://doi.org/10.1145/3173574.3174156; wojciech samek and klaus-robert muller, "towards explainable artificial intelligence," in explainable ai: interpreting, explaining and visualizing deep learning, ed. wojciech samek et al., lecture notes in artificial intelligence 11700 (cham: springer international publishing, 2019), 5–22; alejandro barredo arrieta et al., "explainable artificial intelligence (xai): concepts, taxonomies, opportunities and challenges toward responsible ai," arxiv:1910.10045 [cs], 2019, http://arxiv.org/abs/1910.10045.
64 r. david lankes, "decoding ai and libraries," r. david lankes (blog), july 3, 2019, https://davidlankes.org/decoding-ai-and-libraries/.
article developing a minimalist multilingual full-text digital library solution for disconnected remote library partners todd digby information technology and libraries | december 2021 https://doi.org/10.6017/ital.v40i4.13319 todd digby (digby@ufl.edu) is chair, library technology services department, and associate university librarian, george a. smathers libraries, university of florida. © 2021. abstract the university of florida (uf) george a. smathers libraries have been involved in a wide range of partnered digital collection projects throughout the years with a focus on collaborating with institutions across the caribbean region. one of the countries in which we have a number of digitization projects is cuba. one of these partnerships is with the library of the temple beth shalom (gran sinagoga bet shalom) in havana, cuba. as part of this partnership, we have sent personnel to cuba to do onsite scanning and digitization of selected materials found within the institution. the digitized content from this project was brought back to uf and loaded into our university of florida digital collections (ufdc) system. because internet availability and low bandwidth are issues in cuba, the synagogue could not reliably access the full-text digitized content residing on ufdc. the synagogue also did not have a local digital library system into which to load the newly digitized content. to respond to this need, we focused on providing a minimalist technology solution that was highly portable to meet their desire to conduct full-text searches within their library on their digitized content.
this article will explore the solution that was developed using a usb flash drive loaded with a portableapps version of zotero and populated with multilingual ocr'd documents. about the partnership the university of florida (uf), george a. smathers libraries have been involved in a wide range of partnered digital collection projects throughout the years with a focus on collaborating with institutions across the caribbean region. uf has been involved with the digital library of the caribbean (dloc), which began in 2006, and the university is its technical home. the dloc brings together collections from countries around the caribbean in order to provide researchers with greater online access to these physically dispersed collections.1 this partnership reflects common interests of preservation, access, accessibility, discovery, and content management.2 one of the countries with which we have a number of digitization projects is cuba.3 the cuban judaica collection comprises materials held in the library of the temple beth shalom (gran sinagoga bet shalom) in havana, cuba. the synagogue library collection contains over 10,000 books. the collection originated with abraham marcus matterin, the founder of the cultural group la agrupacion cultural hebreo cubana, who first gathered and arranged the materials. in addition to matterin's own works, the materials in the library include many rare yiddish publications from the early 20th century, as well as little-known works produced in cuba beginning in the 1930s. the temple beth shalom library as a whole provides a complete snapshot of cuban jewish intellectual, cultural, religious, and political life as it evolved and progressed during the 20th century.4 among the rare publications included in the smathers libraries collections are habanera lebn, the main cuban jewish newspaper published between 1932 and 1960; israelia, a spanish-language newspaper which circulated as a monthly during the 1950s; as well as many other cuban jewish publications. this collection will also provide access to the synagogue library's wealth of jewish publications from other parts of the caribbean and latin america. the synagogue library digitization project is a partnership between the george a. smathers libraries, the isser and price library of judaica under the auspices of its neh challenge grant, la comunidad hebrea de cuba, and la biblioteca nacional de cuba josé martí. the digitization process as part of this partnership, graduate interns who were fluent in spanish travelled to cuba to do onsite scanning and digitization of selected materials found within the institution. the digitized content from this project was brought back to uf and loaded into our university of florida digital collections (ufdc) system. this digitization process involved taking images in a high-resolution tiff format and then creating the appropriate metadata to accompany these records for use once they were loaded into the digital library system. an additional step of this process was developed by fletcher durant, uf's preservation librarian, who cut strips of colorful acid-free paper, which were placed in physical items to indicate that they were digitized. these paper flags were used to tell local synagogue users which items were digitized and available locally in the synagogue and more broadly on the internet.
these digital files were then transported back to the digital support services group at the university of florida with the returning personnel on usb hard drives, which were appropriately scanned for viruses before the digitized files were extracted. as part of the ingest process to ufdc we created derivative access files, including jpgs, jpeg 2000 files, and thumbnail images, from the high-resolution tiffs. additionally, these files were processed through an optical character recognition (ocr) process to generate full-text searchable files. this ocr process involved a combination of adobe acrobat and abbyy finereader. the unique aspect of the ocr process was the need to ensure multilingual character recognition that could recognize and generate full-text files that may include spanish, hebrew, and english. these scanned files, along with the derivatives and ocr files, were then loaded into the ufdc system, made publicly searchable, and made accessible to the wider internet audience around the world. working with a partner in cuba, however, presented additional challenges. partnership challenges providing the synagogue with access to their own content presented challenges that were not necessarily anticipated when the scanning project started. the synagogue did not have a local digital library system that they could use to load and access the newly digitized content. we did provide the synagogue with copies of all the digital files, but since this digitization effort focused on text-based printed materials, access to the ocr files in a searchable format was the end goal for making full use of this content. this in itself is not normally an issue, since the digitized content would be loaded on ufdc systems and could be accessed online. unfortunately, one challenge that presented itself when working with cultural heritage partners within cuba is that limited technology infrastructure and internet connectivity can create issues in supporting the physical and digital needs of the project activities. with internet availability intermittent and bandwidth speeds limited in cuba, creative ways to address the digitization work and subsequent file sharing needed to be developed.5 aside from the broader infrastructural challenges related to technology when working with cuban partners, there are additional bureaucratic challenges: the conditions for partnered projects in cuba can be in flux and change with shifts in policy by either the us or cuban governments. these technological and political hurdles made our ability to offer ongoing remote support highly challenging. additionally, there were barriers to how we could offer remote support because our respective it technicians spoke either english or spanish, but not both. the need for translation between languages can be overcome, but it does slow down responsiveness. also, translators may not be accustomed to translating technical jargon, which can further complicate providing support. given the challenges presented above, we endeavored to provide the synagogue with a solution that would provide a multilingual full-text search of the materials that had been scanned and put through the ocr process.
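as a rough illustration of what a multilingual ocr pass like this involves, the sketch below uses the open-source tesseract engine from python rather than the adobe acrobat and abbyy finereader tools the project actually used; the folder names, the pytesseract package, and tesseract itself are assumptions made only for this example, and the sketch presumes that the spanish, hebrew, and english language packs are installed.

```python
# illustrative only: the project used adobe acrobat and abbyy finereader, not this code.
# assumes the tesseract engine plus its "spa", "heb", and "eng" language packs, and the
# pytesseract and pillow python packages.
from pathlib import Path

import pytesseract
from PIL import Image

scan_dir = Path("scans")      # hypothetical folder of high-resolution page images
text_dir = Path("ocr_text")   # hypothetical output folder for plain-text files
text_dir.mkdir(exist_ok=True)

for page in sorted(scan_dir.glob("*.tif")):
    # "spa+heb+eng" asks tesseract to consider all three languages on each page
    text = pytesseract.image_to_string(Image.open(page), lang="spa+heb+eng")
    (text_dir / f"{page.stem}.txt").write_text(text, encoding="utf-8")
```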
additional factors that influenced our planning, as discussed above, were the recognition that our cuban partner did not have reliable internet access to the content hosted in the us, and that this solution would be run locally at the synagogue, so we would be able to provide minimal, if any, support in installing or maintaining the system once it was deployed. minimalist computing solutions in the search for a solution, we were influenced by the concept of minimalist computing.6 borrowed from digital humanities, minimal computing "refer[s] to computing done under some set of significant constraints of hardware, software, education, network capacity, power, or other factors. minimal computing includes both the maintenance, refurbishing, and use of machines to do dh work out of necessity."7 an important focus of minimal computing in the digital humanities, as noted by risam and edwards (2017), is how these practices can be used by those who find themselves with technological needs while working outside larger macro structures of financial and technical support.8 in short, minimal computing is a solution for those individuals or institutions that are not positioned at a larger scale to support projects financially and technologically. appropriate technology in addition to minimalist computing, we drew upon related frameworks for our project. the most prominent of these is appropriate technology (at), a concept that comes from the field of economics.9 this concept was further adapted and used in the field of economic development, where its implementation was described as pertaining to "… small production units, appropriate technologies are small-scale but efficient, replicable in numerous units, readily operated, maintained and repaired, low-cost and accessible to low-income persons. in terms of the people who use or benefit from them, appropriate technologies seek to be compatible with local cultural and social environments."10 one of the main reasons for the use of appropriate technology is that advanced technologies were often inappropriate for the needs of the populations in countries that did not have the same level of technological infrastructure, support, and knowledge. this idea is composed of multiple facets: in some cases it describes using the simplest level of technology needed to meet the intended purpose of the user, and it can also refer to developing a system in a way that takes into account the social and environmental factors of a given use. open-source appropriate technology further influences on the design of this project can be found in a more granular approach to appropriate technology that has become known as open source appropriate technology (osat). as defined by pearce (2012), osat refers to technologies that can be sustainably developed, while at the same time being developed using the concepts and principles of free and open source software.
pearce (2012) further states that osat is made up of technologies that are easily and economically utilized from readily available resources by local communities to meet their needs, and that it must meet the boundary conditions set by the environmental, cultural, economic, and educational resource constraints of the local community.11 developing a solution as mentioned previously, the digitized synagogue content from this project was brought back to uf from cuba and loaded into our local digital collections system. this made the content accessible to anyone with internet access around the world; yet because internet availability and low bandwidth are issues in cuba, the synagogue's ability to access the full-text digitized content residing on ufdc could not be assured. additionally, the synagogue did not have a local digital library system to load the newly digitized content into. to respond to their desire to conduct full-text searches on their digitized content within their library, we focused on providing a minimalist technology solution that was highly portable, user friendly, open source, and sustainable with minimal or no technical support. in scanning the library technological landscape, our first thought was to find a small digital library system that could be used to meet these needs. although there are a number of open source digital library systems, and some of these can be configured to work in a non-internet-connected environment, the level of customization and ongoing technical support posed a problem, especially since we might not be able to provide support due to both issues in the telecommunications systems and possible language barriers. although we knew that they had a windows laptop, there were still technical uncertainties about the local computing environment within the synagogue. once we determined that a full digital library system was not going to be sustainable and deployable, we decided to look for alternative approaches. a solution that just involved providing the scanned materials on dvds was considered, but this also presented a problem, because the ocr'd pdfs would need to be opened individually to search the text within, and there was no logical way to provide citation information or organize files in a meaningful way. zotero portable eventually, we looked at the various citation management systems, since many of these allow pdf files to be imported and allow for searching the full text of pdf files using the ocr'd text. we then focused on a solution that is open source; this was especially important given that we were providing this to an entity in a foreign country and we did not want to experience any licensing or update issues with the chosen software. the platform that was chosen was zotero, primarily because of the open source license of the software, but also because of existing technical knowledge and experience using it. with zotero chosen as our platform, we then investigated how it could be delivered in a portable form that was already installed, configured, and populated for the end user. i had existing knowledge of the portableapps platform (https://portableapps.com/), which is a fully open source and free platform that enables a broad range of windows applications to be installed on a portable device, in our case a large flash drive.
once installed using the portableapps platform, these applications can be used without any additional installation, just by plugging the flash drive into another computer. in the case of the zotero application, there was already a deployment created to be used with portableapps, which made the installation and configuration less of a hurdle. see figure 1. figure 1. once plugged into the pc, the portableapps.com drive would present itself; double-clicking on the icon would load the menu for the user with only the zotero application available to choose. once installed, we started to load the digitized files into zotero. this consisted of a two-step process. first, because the content was already added to our digital library system, we imported citations for each item or volume into zotero, as one normally would when adding citations. then we went into each of these citations and added the corresponding digitized file(s) into each entry. some of these items consisted of multiple volumes that would be placed under an existing citation. in addition to this, some of the materials contained both spanish and hebrew text, so during the ocr process a separate file was created for each of these languages. at this point we were able to test the full-text search capabilities of zotero against our multilingual ocr'd pdf files. figure 2. this image shows a set of citations of the scanned materials. to perform a full-text search within these documents, a user uses the "everything" search box in the zotero toolbar. at this point in our testing, we had determined that our method was successful in being able to perform a full-text search across all the loaded pdf documents (see figure 2). however, a limit of zotero at this point was that the search would only identify which files the search terms were located in, and not the exact location of those terms within each file (see figure 3). although this was a limitation of our solution, we were able to provide a second step for searching within the pdf files for the exact location of the search terms. figure 3. the zotero search results, which highlighted the files in which the search text could be found. figure 4. acrobat search conducted on a file to locate the exact location of the text in the digitized materials. since we were aware that you can search within a full-text pdf file using adobe acrobat reader, we decided that for a more granular search we would instruct the users to click on the file that included the identified search term, which would load acrobat reader and open the file. then the user could search for the exact term using adobe acrobat reader's search capabilities to locate the place in the document where the term appears (see figure 4 for an example).
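for readers curious about what this kind of cross-document, full-text search involves under the hood, the minimal python sketch below is purely illustrative: it is not how the deployed solution works (the project deliberately relied on off-the-shelf zotero and acrobat reader rather than custom code), and the folder name, the pypdf package, and the sample search term are assumptions made for the example.

```python
# illustrative only: the deployed solution used zotero for cross-document search and
# adobe acrobat reader for locating terms within a document, not custom code.
from pathlib import Path

from pypdf import PdfReader  # assumes the pypdf package is installed

pdf_dir = Path("ocr_pdfs")  # hypothetical folder holding the ocr'd, searchable pdfs

def search_pdfs(term: str) -> None:
    """print each file and page on which the (case-insensitive) term appears."""
    needle = term.lower()
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        reader = PdfReader(str(pdf_path))
        for page_number, page in enumerate(reader.pages, start=1):
            text = (page.extract_text() or "").lower()
            if needle in text:
                print(f"{pdf_path.name}: page {page_number}")

search_pdfs("habana")  # hypothetical search term
```

the zotero-plus-acrobat workflow described above achieves the same result without requiring any scripting or software maintenance on the partner's side.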
although this two-step process is not ideal, for a minimal technological solution that addresses all the concerns it would meet the overall goals of the project and provide a workable searching solution for our partner. with the workflows, installation, and configuration of the flash drive complete, we next created documentation in both english and spanish to guide the user through the search process. we then provided the flash drive with accompanying documentation to the next staff member who was travelling to cuba. because of federal rules implemented shortly after we transported the flash drive to cuba, our ability to travel to cuba to work with our partners was limited. this has substantially reduced the information flow between our institution and our cuban partners and has limited how much we can know about how actively this resource is being used. it is hoped that in the near future we will once again be able to travel to cuba and re-engage with our partners to determine the success of this and the other projects we have been working on with them. conclusion in the realm of library technology, we often implement and support complex and highly costly systems as part of our regular oversight. by working on this project, we have been given a chance to take a step back and design a solution that uses open source and free tools that are readily available and require little support. looking broadly across the technology platforms, systems, and software in use, there is a tendency to find that they include a plethora of features and functions that are rarely used but add complexity. focusing on solutions that reduce this complexity and still meet user needs has been a rewarding experience. endnotes 1 brooke wooldridge, laurie taylor, and mark sullivan, "managing an open access, multi-institutional, international digital library: the digital library of the caribbean," resource sharing & information networks 20, no. 1–2 (2009): 35–44, https://doi.org/10.1080/07377790903014534. 2 miguel asencio, "collaborating for success: the digital library of the caribbean," journal of library administration 57, no. 7 (2017): 818–25, https://doi.org/10.1080/01930826.2017.1362902. 3 "celebrating cuba! collaborative digital collections of cuban patrimony," university of florida digital collections, accessed february 15, 2021, https://ufdc.ufl.edu/cuba. 4 "cuban judaica," university of florida digital collections, accessed february 15, 2021, https://ufdc.ufl.edu/cuban_judaica. 5 xuefei deng, nancy armando camacho, and larry press, "how do cubans use internet? the effects of capital," in proceedings of the 52nd hawaii international conference on system sciences (2019), https://doi.org/10.24251/hicss.2019.617. 6 jentery sayers, "minimal definitions," minimal computing: a working group of go::dh, october 2, 2016, https://go-dh.github.io/mincomp/thoughts/2016/10/02/minimal-definitions. 7 "about: what is minimal computing?" minimal computing: a working group of go::dh, https://go-dh.github.io/mincomp/about/. 8 roopika risam and susan edwards, "micro dh: digital humanities at the small scale," digital humanities 2017, http://works.bepress.com/roopika-risam/27/. 9 ernest f. schumacher, small is beautiful: economics as if people mattered (london: blond and briggs, 1973).
10 peter thormann, "proposal for a program in appropriate technology," in appropriate technologies for third world development (new york: st. martin's press, 1979): 280–99. 11 j. m. pearce, "the case for open source appropriate technology," environment, development and sustainability 14 (2012): 425–31, https://doi.org/10.1007/s10668-012-9337-9. articles from digital library to open datasets: embracing a "collections as data" framework rachel wittmann, anna neatrour, rebekah cummings, and jeremy myntti information technology and libraries | december 2019 rachel wittmann (rachel.wittmann@utah.edu) is digital curation librarian, university of utah. anna neatrour (anna.neatrour@utah.edu) is digital initiatives librarian, university of utah. rebekah cummings (rebekah.cummings@utah.edu) is digital matters librarian, university of utah. jeremy myntti (jeremy.myntti@utah.edu) is head of digital library services, university of utah. abstract this article discusses the burgeoning "collections as data" movement within the fields of digital libraries and digital humanities. faculty at the university of utah's marriott library are developing a collections as data strategy by leveraging existing digital library and digital matters programs. by selecting various digital collections, small- and large-scale approaches to developing open datasets are explored. five case studies chronicling this strategy are reviewed, along with testing the datasets using various digital humanities methods, such as text mining, topic modeling, and gis (geographic information system). introduction for decades, academic research libraries have systematically digitized and managed online collections for the purpose of making cultural heritage objects available to a broader audience. making archival content discoverable and accessible online has been revolutionary for the democratization of scholarship, but the use of digitized collections has largely mimicked traditional use: researchers clicking through text, images, maps, or historical documents one at a time in search of deeper understanding. "collections as data" is a growing movement to extend the research value of digital collections beyond traditional use and to give researchers more flexible access to our collections by facilitating access to the underlying data, thereby enabling digital humanities research.1 collections as data is predicated upon the convergence of two scholarly trends happening in parallel over the past several decades.2 first, as mentioned above, librarians and archivists have digitized a significant portion of their special collections, giving access to unique material that researchers previously had to travel across the country or globe to study. at the same time, an increasing number of humanist scholars have approached their research in new ways, employing computational methods such as text mining, topic modeling, gis (geographic information system), sentiment analysis, network graphs, data visualization, and virtual/augmented reality in their quest for meaning and understanding.
gaining access to high-quality data is a key challenge of digital humanities work, since the objects of study in the humanities are frequently not as amenable to computational methods as data in the sciences and social sciences.3 typically, data in the sciences and social sciences is numerical in nature and collected in spreadsheets and databases with the intention that it will be computationally parsed, ideally as part of a reproducible and objective study. conversely, data (or, more commonly, "evidence" or "research assets") in the humanities is text- or image-based and is created and collected with the intention of close reading or analysis by a researcher who brings their subjective expertise to bear on the object.4 even a relatively simple digital humanities method like identifying word frequency in a corpus of literature is predicated on access to plain text (.txt) files, high-quality optical character recognition (ocr), and the ability to bulk download the files without running afoul of copyright or technical barriers. as "the santa barbara statement on collections as data" articulates, "with notable exceptions like the hathitrust research center, the national library of the netherlands data services & api's, the library of congress' chronicling america, and the british library, cultural heritage institutions have rarely built digital collections or designed access with the aim to support computational use."5 by and large, digital humanists have not been well-served by library platforms or protocols. current methods for accessing collections data include contacting the library for direct access to the data or "scraping" data off library websites. recently funded efforts such as the institute of museum and library services' (imls's) always already computational and the andrew w. mellon foundation's collections as data: part to whole seek to address this problem by setting standards and best practices for turning digital collections into datasets amenable to computational use and novel research methods.6 the university of utah j. willard marriott library has a long-running digital library program and a burgeoning digital scholarship center, creating a moment of synergy for librarians in digital collections and digital scholarship to explore collaboration in teaching, outreach, and digital collection development. a shared goal between the digital library and digital scholarship teams is to develop collections as data of regional interest that could be used by researchers for visualization and computational exploration. this article will share our local approach to developing and piloting a collections as data strategy at our institution.
relying upon best practices and principles from thomas padilla's "on a collections as data imperative," we transformed five library collections into datasets, made select data available through a public github repository, and tested the usability of the data with our own research questions, relying upon expertise and infrastructure from digital matters and the digital library at the marriott library.7 digital matters in 2015, administration at the marriott library was approached by multiple colleges at the university of utah to explore the possibility of creating a collaborative space to enable digital scholarship. while digital scholarship was happening across campus in disparate and unfocused ways, there was no concerted effort to share resources, build community, or develop a multi-college digital scholarship center with a mission and identity. after an eighteen-month planning process, the digital matters pop-up space was launched as a four-college partnership between the college of humanities, college of fine arts, college of architecture + planning, and the marriott library. an anonymous $1 million donation in 2017 allowed the partner colleges to fund staffing and activity in the space for five years, including the hire of a digital matters director tasked with planning for long-term sustainability. the development of digital matters brings new focus, infrastructure, and partners for digital humanities research to the university of utah and the marriott library. monthly workshops, speakers, and reading groups led by digital scholars from all four partner colleges have created a vibrant community with cross-disciplinary partnerships and unexpected synergies. close partnerships and ongoing dialogue have increased awareness for marriott faculty, particularly those working in and collaborating with digital matters, of the challenges facing digital humanists and the ways in which the library community is uniquely suited to meet those needs. for example, a university of utah researcher in the college of humanities developed "century of black mormons," a community-based public history database of biographical information and primary source documents on black mormons baptized between 1830 and 1930.8 working closely with the digital initiatives librarian and various staff and faculty at the marriott library, they created an omeka s site that allows users to interact with the historical data using gis, timeline features, and basic webpage functionality. institution digital library the university of utah has had a robust digital library program since 2000, including one of the first digital newspaper repositories, utah digital newspapers (udn, https://digitalnewspapers.org/). in 2016, the library developed its own digital asset management system using open-source systems such as solr, phalcon, and nginx after using contentdm for over fifteen years.9 this new system, solphal, has made it possible for us to implement a custom solution to manage and display a vast amount of digital content, not only for our library, but also for many partner institutions throughout the state of utah. our main digital library server (https://collections.lib.utah.edu/) contains over 765,000 objects in nearly 700 collections, consisting of over 2.5 million files. solphal is also used to manage the udn, containing nearly 4 million newspaper pages and over 20 million articles.
digital library projects are continually evolving as we redefine our digital collection development policies, ensuring that we are providing researchers and other users the digital content that they are seeking. with such a large amount of data available in the digital library, we can no longer view our digital library as a set of unique, yet siloed, collections, but more as a wealth of information documenting the history of the university, the state of utah, and the american west. we are also engaged in remediating legacy metadata across the repository in order to achieve greater standardization, which could support computational usage of digital library metadata in the future. with this in mind, we are working to strategically make new digital content available on a large scale that can help researchers discover this historical content within a collections as data mindset. leveraging the existing digital library and digital matters programs, faculty at the marriott library are in the process of piloting a collections as data strategy. we selected digital collections with varying characteristics and used them to explore small- and large-scale approaches to developing datasets for humanities researchers. we then tested the datasets by employing various digital humanities methods such as text mining, topic modeling, and gis. the five case studies below chronicle our efforts to embrace a collections as data framework and extend the research value of our digital collections. text mining mining texts when developing the initial collections as data project, several factors were considered to identify the optimal material for this experiment. selecting already digitized and described material in the university of utah digital library was ideal to avoid waiting periods required for new digitization projects. the marriott library special collections' relationship with the american west center, an organization based at the university of utah with the mission of documenting the history of the american west, has produced an extensive collection of oral histories held in the audio visual archive which have typewritten transcripts yielding high-quality ocr. given the availability and readiness of this material, we built a selected corpus of mining-related oral histories, drawn from collections such as the uranium oral histories and carbon county oral histories. engaging in the entire process with a digital humanities framework, we scraped our own digital library repository as though we had no special access to the back end of the system, developing a greater understanding of the process and workflows needed to build a text corpus to support a research inquiry. in this way, we extended our skills so that we would be able to scrape any digital library system if this expertise was needed in the future. the extensive amount of text produced by the corpus of 230 combined oral histories provided ideal material for topic modeling. simply put, "topic modeling is an automatic way to examine the contents of a corpus of documents."10 the output of these models is word clouds with varying sizes of words based on the number of co-occurrences within the corpus; larger words indicate more occurrences and smaller ones indicate fewer. each topic model then points to the most relevant documents within the corpus based on the co-occurrences of the words contained in that model.
in order to create these topic models from the corpus of oral histories, a workflow was developed with the expertise of the digital matters cohort, implementing mallet via an r script and using the lda topic model developed by blei et al.11 figure 1. topic model from text mining the mining-related oral histories found in the university of utah's digital library. from the mining-related oral history corpus, twenty-six topic models were created. once generated, each topic model points to five interviews that are most related to the words in a particular model. in figure 1, the words carbon, county, country, and italian are the largest, because the interviews are about carbon county, utah. considering this geographical area of utah was the most ethnically diverse in the late 1800s due to the coal mining industry recruiting labor from abroad, including italy, these words are not surprising. as indicated by their prominence in the topic model, these words co-occur most often in the interview set. we approached the process of topic modeling the oral histories as an exploration, but after reviewing the results, we discovered that many of the words which surfaced through this process pointed to deficiencies in the original descriptive metadata, highlighting new possibilities for access points and metadata remediation. homing in on the midsize words tended to uncover unique material that is not covered in descriptive metadata, as these words are often mentioned more than a handful of times and across multiple interviews. the largest words in the model are typically thematic to the interview and included in the descriptive metadata. for example, when investigating the inclusion of "wine" in the topic model found in figure 1, conversations about the winemaking process amongst the italian mining community in carbon county, utah, were revealed. in an interview conducted in 1973 for the carbon county oral history project, mary nicolovo juliana discusses how her father, a miner, made wine at home.12 as the topic models are based on co-occurrences in the corpus, there was another interview, with emile louise cances, conducted in 1973 for the carbon county oral history project. cances, from a french immigrant mining family, discusses the vineyards her family had in france.13 with both of these oral histories, there was no reference to wine in the descriptive metadata. a researcher may miss this content because it isn't included as an access point in metadata. thus, topic modeling allowed for the discoverability of potentially valuable topics that may be buried in hundreds of pages of content. from this collections as data project, text mining the mining oral history texts to produce topic models, we are considering employing topic modeling when creating new descriptive metadata for similar collections. setting a precedent, the text files for this project are hosted on the growing marriott library collections as data github repository.
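a researcher who downloads those text files can reproduce a comparable topic modeling pass in python; the sketch below is illustrative only, using scikit-learn's lda implementation rather than the mallet-and-r workflow the team actually used, and the folder name and preprocessing parameters (everything except the twenty-six topics reported above) are assumptions made for the example.

```python
# illustrative only: the project's topic models were built with mallet driven from r.
from pathlib import Path

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# read every plain-text oral history transcript from a hypothetical local folder
docs = [p.read_text(encoding="utf-8", errors="ignore")
        for p in sorted(Path("oral_histories").glob("*.txt"))]

# build a document-term matrix, dropping very common and very rare words
vectorizer = CountVectorizer(stop_words="english", max_df=0.9, min_df=2)
dtm = vectorizer.fit_transform(docs)

# fit 26 topics to mirror the number of topic models reported for this corpus
lda = LatentDirichletAllocation(n_components=26, random_state=0)
lda.fit(dtm)

# print the ten highest-weighted terms for each topic
terms = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[-10:][::-1]]
    print(f"topic {topic_id}: {', '.join(top_terms)}")
```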
after we developed this corpus, we discovered that a graduate student in the history department had developed a similar project, demonstrating the research value of oral histories combined with computational analysis.14 harold stanley sanders matchbooks collection when assessing potential descriptive metadata for the harold stanley sanders matchbooks collection, an assortment of matchbooks reflecting many bygone establishments, predominately from salt lake city, including restaurants, bars, hotels, and other businesses, non-dublin core metadata was essential for computational purposes. with the digital project workflow now extending beyond publishing the collection in the digital library to publishing the collection data to the marriott library collections as data github repository, assessing metadata needs has evolved. as matchbooks function as small advertisements, they often incorporate a mix of graphic design, advertising slogans, and addresses of the establishment. the descriptive metadata was created first with the most relevant fields for computational analysis, including business name, type of business, transcription of text, notable graphics, colors of matchbooks, and street addresses. for collection mapping capabilities, street addresses were then geocoded using a google sheets add-on called geocode cells, which uses google's geocoding api (see figure 2). figure 2. a screenshot of google sheets add-on, geocode cells (https://chrome.google.com/webstore/detail/geocode-cells/pkocmaboheckpkcbnnlghnfccjjikmfc). figure 3. a screenshot of harold stanley sanders matchbook collection map, made with arcgis online. this proved efficient for this collection, as other geocoding services required zip codes, which were not present in the matchbooks' street addresses. with the latitude and longitude addition to the metadata, the collection was then mapped using arcgis online (see figure 3).15 the extensive metadata, including geographic-coordinate data, is available on the library's github repository for public use. after the more computationally ready metadata was created, it was then massaged to fit library best practices and dublin core (dc) standards. this included deriving library of congress subject headings for dc subjects from business type and concatenating notable matchbook graphics and slogans for the dc description. while providing the extensive metadata is beneficial for computational experimentation, it adds time and labor to the lifespan of the project. kennecott copper miner records one aspect of our collections as data work at the university of utah moving forward is the need for long-term planning for resources that contain interesting information that could eventually be used for computational exploration, even if we don't currently have the capacity to make the envisioned dataset available. the marriott library holds a variety of personnel records from the kennecott copper corporation, utah copper division.
these handwritten index cards contain a variety of interesting demographic data about the workers who were employed by the company from 1900 to 1919, such as name, employee id, date employed, address, dependents, age, weight, height, eyes, hair, gender, nationality, engaged by, last employer, education, occupation, department, pay rate, date leaving employment, and reason for leaving. not all the cards are filled out with the complete level of detail listed above; however, name, date employed, ethnicity, and notes about pay rates are usually included for each employee. developing a scanning and digitization procedure for creating digital surrogates of almost 40,000 employment records was fairly easy due to an existing partnership and reciprocal agreement with familysearch; however, developing a structure for making the digitized records available and providing full transcription is a long-term project. librarians used this project as an opportunity to think strategically about the limits of dublin core when developing a collections as data project from the start. the digital library repository at the university of utah provides the ability to export collection-level metadata as .tsv files. with this in mind, the collection metadata template was created with the aim of eventually being able to provide researchers with the granular information on the records. this required introducing a number of new, non-standard field labels to our repository. since we are not able to anticipate exactly how a researcher might interact with this collection in the future, our main priority was developing a metadata template that would accommodate full transcription for every data point on the card. twenty new fields in the template reflect the demographic data on the card, and ten are existing fields that map to our standard practices with dublin core fields. because we do not currently have the staffing in place to transcribe 40,000 records, we are implementing a phased approach of transcribing four basic fields, with fuller transcription to follow if we are able to secure additional funding. figure 4. employment card for alli ebrahim, 1916. figure 5. employment card for richard almond, 1917. woman's exponent a stated goal for digital matters is to be a digital humanities space that is unique to utah and addresses issues of local significance such as public lands, water rights, air quality, indigenous peoples, and mormon history.16 when considering what digital scholarship projects to pursue in 2019, digital matters faculty became aware of the upcoming 150th anniversary of women in utah being the first in the nation to vote. working with a local nonprofit, better days 2020, and colleagues at brigham young university (byu), digital matters faculty and staff decided to embark on a multimodal analysis of the 6,800-page run of the woman's exponent, a utah women's newspaper published between 1872 and 1914 primarily under the leadership of latter-day saint relief society president emmeline b. wells. in its time, the woman's exponent was a passionate voice for women's suffrage, education, and plural marriage, and chronicled the interests and daily lives of latter-day saint women. initially, we hoped to access the data through the brigham young university harold b. lee library, which digitized the exponent back in 2000.
we quickly learned that ocr from nearly twenty years ago would not suffice for digital humanities research and considered different paths for rescanning the exponent. after accessing the original microfilm from byu, we leveraged existing structures for digitization. through an agreement that the marriott library has in place with a vendor for completing large-scale digitization of newspapers on microfilm for inclusion in the utah digital newspapers program, we were able to add the woman's exponent to the existing project without securing a new contract for digitization. the vendor digitized the microfilm, created an index of each title, issue, date, and page, and extracted the full text through an ocr process. they then delivered 330 gb of data to us, including high-quality tiff and jp2000 images, a pdf file for each page, and mets-alto xml files containing the metadata and ocr text. acquiring data for the woman's exponent project illuminated the challenges that digital humanists face when looking for clean data. our original assumption was that if something had already been scanned and put online, the data must exist somewhere. we soon learned, when working with legacy digital scans, that the ocr might be insufficient or the original high-quality scans might be lost over the course of multiple system migrations. as librarians with existing structures in place for digitization, we had the content rescanned and delivered within a month. our digital humanities partners from outside of the library did not know this option was available and assumed our research team would have to scan 6,800 pages of newspaper content before we were able to start analyzing the data. this incongruity highlighted cultural differences between digital humanists with their learned self-reliance and librarians who are more comfortable and conversant looking to outside resources. indeed, our digital humanities colleagues seemed to believe that "doing it yourself" was part and parcel of digital humanities work. the woman's exponent project is still in early phases, but now that we have secured the data, we are considering what digital humanities methods we can bring to bear on the corpus. with the 2020 150th anniversary of women's suffrage in utah, we have considered a topic modeling project looking at themes around universal voting, slavery, and polygamy and tracking how the discussion around those topics evolved over the 42-year run of the paper. another potential project is building a social network graph of the women and men chronicled throughout the run of the paper. developing curriculum around women in utah history is of particular interest to the group as women are underrepresented in the current k-12 utah history curriculum. keeping in line with our commitment to collections as data, we have released the woman's exponent as a .tsv file with ocr full-text data, which can be analyzed by researchers studying utah, mormon studies, the american west, or various other topics. collaborators have also developed a digital exhibit on the woman's exponent which includes essays about a variety of topics as well as sections showcasing its potential for digital scholarship.17 obituary data the utah digital newspapers (udn) program began in 2002 with the goal of making historical newspaper content from the state of utah freely available to the public for research purposes.
between 2002 and 2019, there have been over 4 million newspaper pages digitized for udn. due to search limitations of the software system used for udn at the time, the data model for newspapers was made more granular, and included segmentation for articles, obituaries, advertisements, birth notices, etc. this article segmentation project ended in 2016 when it was determined that the high cost of segmentation was not sustainable with mass digitization of newspapers and users were still able to find the content they were looking for on a full newspaper page. before the article segmentation project concluded, udn had accrued over 20 million articles, including 318,044 articles that were tagged as obituaries or death notices. in 2013, the marriott library partnered with familysearch to index the genealogical information that can be gleaned from these obituaries. the familysearch indexing (fsi) program crowdsourced the indexing of this data to thousands of volunteers worldwide. certain pieces of data, such as place names, were mapped to an existing controlled vocabulary and dates were entered in a standardized format to ensure that certain pieces of the data are machine actionable.18 after the obituaries were indexed by fsi in 2014, a copy of the data was given to the marriott library to use in udn. the indexed data included fields such as name of deceased, date of death, place of death, date of birth, birthplace, and relative names with relationships. since this massive amount of data didn't easily fit within the udn metadata schema, it was stored for several years without the marriott library doing anything with the data. now that we are thinking about our digital collections as data, we are exploring ways that researchers could use this vast amount of data. the data was delivered to the library in large spreadsheets that are not easily usable in any spreadsheet software. we are exploring ingesting the data into a revised newspaper metadata schema within our digital asset management system or converting the data into a mysql database so it is possible to search and find relationships between pieces of data. working with a large dataset such as this can be challenging. the data from only two newspapers, including 1,038 obituaries, is a 25 mb file. the full database is over 10 gb of data. since this is a large amount of data, we are working through issues related to how we can distribute it in a usable way so that researchers can make use of it. we are also looking at the possibility of having fsi index additional obituary data from udn, which will make the database continually expand. conclusion as the digital library community recognizes the need for computational-ready collections, the university of utah digital library has embraced this evolution with a strategic investment. implementing the collections as data github repository for computational users is a first step towards providing access to collections beyond the traditional digital library environment. while there may be improved ways to access this digital library data in the future, the github repository filled an immediate need. developing standardized metadata for computational use can often require more time from metadata librarians who are already busy with the regular work of describing new assets for the digital library. developing additional workflows for metadata enhancement and bulk download can delay the process of making new collections available.
in most cases, collections need to be evaluated individually to determine what type of resources can be invested in making them available for computational use. for a project needing additional transcription, like the kennecott mining records, crowdsourcing might seem like a potential avenue to pursue. however, the digital library collection managers have misgivings about the training and quality assurance involved in developing a new large-scale transcription project. combined with the desire to ensure that the people who are working on the project have adequate training and compensation for their labor, we are making the strategic decision to transcribe some of the initial access points to the collection now, and attempt full transcription at a later date pending additional funding. for the udn obituary data, leveraging an existing transcription program at no cost, with minimal supervision needed by librarians, worked well in being able to surface additional genealogical data that can be released for researchers. the collections as data challenge mirrors a perennial digital library conundrum—how much time and effort should librarians invest for unknown future users with unknown future needs? much like digitization and metadata creation, creating collections as data requires a level of educated guesswork as to what collections digital humanists will want to access, what metadata fields they will be interested in manipulating, and in what formats they will need their data. considering the limited resources of librarians, should we convert our digital collections into data in anticipation of use or convert our collections on demand? this "just in case" vs. "just in time" question is worthy of debate and will naturally be dependent on the resources and priorities of individual institutions. with an increasing number of researchers experimenting with digital humanities methods, collections as data will be a standard consideration when working with new digitization projects at the university of utah. visualization possibilities outside of the digital-library environment will be regularly assessed. descriptive metadata practices beyond dublin core will be developed when beneficial to the computational and experimental use of the data by the public. integrating techniques like topic modeling into descriptive metadata workflows provides additional insight about the digital objects being described. while adding collections as data to existing digitization workflows will require an additional investment of time, developing these projects has also created new opportunities for collaboration both within the library and in developing expanded partnerships at the university of utah and other institutions in the mountain west. by leveraging our existing partnerships, we were able to create collections as data pilots organically by taking advantage of our current workflows and digitization procedures. while we have been successful in releasing smaller-scale collections as data projects, we still need to consider integration issues with our larger digital library program and experiment more with enabling access to large datasets. by producing curated datasets that evolve from unique special collections materials, librarians can extend the research value of the digital library and the collections that are unique to each institution.
as we look towards the future, we see this work continuing and expanding as librarians engage more with digital humanities teaching and support. acknowledgements the authors would like to acknowledge dr. elizabeth callaway, former digital matters postdoctoral fellow and current assistant professor in the department of english at the university of utah, for developing the topic modeling workflow used in the collections as data project, text mining mining texts. callaway's expertise was invaluable in creating the scripts to enable distance reading of the text corpus, documenting this process, and training library staff. references 1 thomas g. padilla, "collections as data: implications for enclosure," college & research libraries news 79, no. 6 (june 2018): 296, https://crln.acrl.org/index.php/crlnews/article/view/17003/18751. 2 thomas padilla et al., "the santa barbara statement on collections as data (v1)," n.d., https://collectionsasdata.github.io/statementv1/. 3 christine l. borgman, "data scholarship in the humanities," in big data, little data, no data: scholarship in the networked world (cambridge, ma: the mit press, 2015), 161–201. 4 miriam posner, "humanities data: a necessary contradiction," miriam posner's blog (blog), june 25, 2015, http://miriamposner.com/blog/humanities-data-a-necessary-contradiction/. 5 thomas padilla et al., "the santa barbara statement on collections as data (v1)," n.d., https://collectionsasdata.github.io/statementv1/. 6 thomas padilla, "always already computational," always already computational: collections as data, 2018, https://collectionsasdata.github.io/; thomas padilla, "part to whole," collections as data: part to whole, 2019, https://collectionsasdata.github.io/part2whole/. 7 "marriott library collections as data github repository," april 16, 2019, https://github.com/marriott-library/collections-as-data. 8 "century of black mormons," accessed april 25, 2019, http://centuryofblackmormons.org. 9 anna neatrour et al., "a clean sweep: the tools and processes of a successful metadata migration," journal of web librarianship 11, no. 3-4 (october 2, 2017): 194-208, https://doi.org/10.1080/19322909.2017.1360167. 10 anna l. neatrour, elizabeth callaway, and rebekah cummings, "kindles, card catalogs, and the future of libraries: a collaborative digital humanities project," digital library perspectives 34, no. 3 (july 2018): 162–87, https://doi.org/10.1108/dlp-02-2018-0004. 11 david m. blei et al., "latent dirichlet allocation," journal of machine learning research 3, no. 4/5 (may 15, 2003): 993–1022, http://search.ebscohost.com/login.aspx?direct=true&db=asn&an=12323372&site=ehost-live. 12 "mary nicolovo juliana, carbon county, utah, carbon county oral history project, no.
47, march 30, 1973," carbon county oral histories, accessed april 29, 2019, https://collections.lib.utah.edu/details?id=783960. 13 "mrs. emile louise cances, salt lake city, utah, carbon county oral history project, no. cc-25, february 24, 1973," carbon county oral histories, accessed april 29, 2019, https://collections.lib.utah.edu/details?id=783899. 14 nate housley, "a distance reading of immigration in carbon county," utah division of state history blog, 2019, https://history.utah.gov/a-distance-reading-of-immigration-in-carbon-county/. 15 "harold stanley sanders matchbooks collection," accessed may 8, 2019, https://collections.lib.utah.edu/search?facet_setname_s=uum_hssm; "harold stanley sanders matchbooks collection map," accessed may 8, 2019, https://mlibgisservices.maps.arcgis.com/apps/webappviewer/index.html?id=d16a5bc93b864fc0b9530af8e48c6c6f. 16 rebekah cummings, david roh, and elizabeth callaway, "organic and locally sourced: growing a digital humanities lab with an eye towards sustainability," digital humanities quarterly, 2019. 17 "woman's exponent data," https://github.com/marriott-library/collections-as-data/tree/master/womansexponent; "woman's exponent digital exhibit," https://exhibits.lib.utah.edu/s/womanexponent/. 18 john herbert et al., "getting the crowd into obituaries: how a unique partnership combined the world's largest obituary with utah's largest historic newspaper database," in salt lake city, ut: international federation of library associations and institutions, 2014, https://www.ifla.org/files/assets/newspapers/slc/2014_ifla_slc_herbert_mynti_alexander_witkowski_-_getting_the_crowd_into_obituaries.pdf. letter from the editor december 2021 kenneth j. varnum information technology and libraries | december 2021 https://doi.org/10.6017/ital.v40i4.14291 another year is nearly in the books. it has not been an easy year for many of us; perhaps not as truly chaotic and frightening as 2020 was, but still a year filled with uncertainty and rising concerns about the path of the pandemic.
as we turn to a new year in our calendars, i wish all our readers health, peace, and a continued spirit of adaptation as we begin 2022. our public libraries leading the way column, “how covid affected our python class at the worcester public library” by melody friendenthal, is a follow up to her 2020 column on moving a library course on the python programming language from in-person to online for the worcester (ma) public library. our peer-reviewed content this month showcases topics including: digital library innovations; virtual reality; diversity, equity, and inclusion (dei); and library hackathons. 1. stateful library analysis and migration system (slam): etl system for performing digital library migrations / adrian-tudor panescu, teodora-elena grosu, and vasile manta 2. black, white, and grey: the wicked problem of virtual reality in libraries / gillian d. ellern and laura cruz / 3. bridging the gap: using linked data to improve discoverability and diversity in digital collections / jason boczar, bonita pollock, xiying mi, and amanda yeslibas 4. developing a minimalist multilingual full-text digital library solution for disconnected remote library partners / todd digby 5. diversity, equity & inclusion statements on academic library websites: an analysis of content, communication, and messaging / eric ely 6. a 21st century technical infrastructure for digital preservation / nathan tallman 7. hackathons and libraries: the evolving landscape 2014-2020 / meris mandernach longmeier kenneth j. varnum, editor varnum@umich.edu december 2021 articles no need to ask: creating permissionless blockchains of metadata records dejah rubel information technology and libraries | june 2019 1 dejah rubel (rubeld@ferris.edu) is metadata and electronic resource management librarian, ferris state university. abstract this article will describe how permissionless metadata blockchains could be created to overcome two significant limitations in current cataloging practices: centralization and a lack of traceability. the process would start by creating public and private keys, which could be managed using digital wallet software. after creating a genesis block, nodes would submit either a new record or modifications to a single record for validation. validation would rely on a federated byzantine agreement consensus algorithm because it offers the most flexibility for institutions to select authoritative peers. only the top tier nodes would be required to store a copy of the entire blockchain thereby allowing other institutions to decide whether they prefer to use the abridged version or the full version. introduction several libraries and library vendors are investigating how blockchain could improve activities such as scholarly publishing, content dissemination, and copyright enforcement. a few organizations, such as katalysis, are creating prototypes or alpha versions of blockchain platforms and products.1 although there has been some discussion about using blockchains for metadata creation and management, only one company appears to be designing such a product. therefore, this article will describe how permissionless blockchains of metadata records could be created, managed, and stored to overcome current challenges with metadata creation and management. limitations of current practices metadata standards, processes, and systems are changing to meet twenty-first century information needs and expectations. 
there are two significant limitations, however, to our current metadata creation and modification practices that have not been addressed: centralization and traceability. although there are other sources for metadata records, including the open library project, the largest and most comprehensive database with over 423 million records is provided by the online computer library center (oclc).2 oclc has developed into a highly centralized operation that requires member fees to maintain its infrastructure. oclc also restricts some members from editing records contributed by other members. one example of these restrictions is the program for cooperative cataloging (pcc). although there is no membership fee for pcc, catalogers from participating libraries must receive additional training to ensure that their institution contributes high quality records.3 requiring such training, however, limits opportunities for participation and can create bottlenecks when non-pcc institutions identify errors in a pcc record. decentralization no need to ask | rubel 2 https://doi.org/10.6017/ital.v38i2.10822 would help smaller, less-well-funded institutions overcome such barriers to creating and contributing their records and modifications to a central database. the other significant limitation to our current cataloging practices is the lack of traceability for metadata changes. oclc tracks record creation and changes by adding an institution’s oclc symbol to the 040 marc field.4 however, this symbol only indicates which institution created or edited the record, not what specific changes they made. oclc also records a creation date and a replacement date in each record, but a record may acquire multiple edits between those two dates. recording the details of each change within a record would help future metadata editors to understand who made certain changes and possibly why they were made. capturing these details would also mitigate concerns about the potential for metadata deletion because every datum would still be recorded even if it is no longer part of the active record. information science blockchain research many researchers and institutions are exploring blockchain for information science applications. most of these applications can be categorized as either scholarly publishing, content dissemination and management, or metadata creation and management. one of the most promising applications for blockchain is coordinating, endorsing, and incentivizing research and scholarly publishing activities. in “blockchain for research,” rossum from digital science describes benefits such as data colocation, community self-correction, failure analysis, and fraud prevention.5 research activity support and endorsement would use an academic endorsement points (aep) currency to support work at any level, such as blog posts, data sets, peer reviews, etc. the amount credited to each scientist is based on the aep received for their previous work. therefore, highly endorsed researchers will have a greater impact on the community. one benefit of this system is that such endorsements would accrue faster than traditional citation metrics.6 one detriment to this system is its reliance on the opinions of more experienced scientists. the current peer review process assumes these experts would be the best to evaluate new research because they have the most knowledge. breakthroughs often overturn the status quo, however, and consequently may be overlooked in an echo chamber of approved theories and approaches. 
micropayments using aep could “also introduce a monetary reward scheme to researchers themselves,” bypassing traditional publishers.7 unfortunately, such rewards could become incentives to propagate unscientific or immoral research on topics like eugenics. in addition, research rewards might increase the influence of private parties or corporations to science and society’s detriment. blockchains might also reduce financial waste by “incentivizing research collaboration while discouraging solitary and siloed research.”8 smart contracts could also be enabled that automatically publish any article, fund research, or distribute micropayments based on the amount of endorsement points.9 to support these goals, digital science is working with katalysis on the blockchain for peer review project. it is hard to tell exactly where they are in development, but as of this writing, it is probably between the pilot phase and the minimum viable product.10 the decentralized research platform (deip) serves as another attempt “to create an ecosystem for research and scientific activities where the value of each research…will be assessed by an experts’ community.”11 the whitepaper authors note that the lack of negative findings and unmediated or open access to information technology and libraries | june 2019 3 research results and data often leads to scientists replicating the same research.12 they also state that 80 percent of publishers’ proceeds are from university libraries, which spend up to 65 percent of their entire budget on journal and database subscriptions.13 this financial waste is surprising because universities are the primary source of published research. therefore, deip’s goals include research and resource distribution, expertise recognition, transparent grant processes, skill or knowledge tracking, preventing piracy, and ensuring publication regardless of the results.14 the second most propitious application of blockchain to information science is content dissemination and management.15 blockchain is an excellent way to track copyright. several blockchains have already been developed for photographers, artists, and musicians. examples include photochain, copytrack, binded, and dotbc.16 micropayments for content supports the implementation of different access models, which can provide an alternative to subscriptionbased models.17 micropayments can also provide an affordable infrastructure for many content types and royalty payment structures. blockchain could also authenticate primary sources and trace their provenance over time. this authentication would not only support archives, museums, and special collections, but it would also ensure law libraries can identify the most recent version of a law.18 finally, blockchain could protect digital first sale rights, which are key to libraries being able to share such content.19 “while drm of any sort is not desirable, if by using blockchain-driven drm we trade for the ability to have recognized digital first sale rights, it may be a worthy bargain for libraries.”20 to support such restrictions, another use for blockchain developed by companies such as libchain is open, verifiable, and anonymous access management to library content.21 another suitable application for blockchain is metadata creation and management.22 an open metadata archive, information ledger, or knowledgebase is very appealing because access to high quality records often requires a subscription to oclc.23 some libraries cannot afford such subscriptions. 
therefore, they must rely on records supplied by either a vendor or a government agency, like the library of congress. unfortunately, as of this writing, there is little research on how these blockchains could be constructed at the scale of large databases like those of oclc and the library of congress. in fact, the only such project is demco’s private, invitation-only beta.24 demco does not provide any information regarding their new product, but to make its development profitable, it is most likely a private, permissioned blockchain. creating permissionless blockchains for metadata records this section will describe how to create permissionless blockchains for metadata records including grouping transactions, an appropriate consensus algorithm, and storage options. please note that these blockchains are intended to augment current metadata record creation and modification practices and standards, not supersede them. the author assumes that record creation and modification will still require content (rda) and encoding (marc) validation prior to blockchain submission. validation in this section will refer solely to blockchain validation. generating and managing public and private keys all distributed ledger participants will need a public key or address for blocks of transactions to be sent to them and a private key for digital signatures. one way to create these key pairs is to generate a seed, which can be a group of random words or passphrases. the sha-256 algorithm can then be applied to this seed to create a private key.25 next, a public key can be generated from that private key using an elliptic curve digital signature algorithm.26 for additional security, the no need to ask | rubel 4 https://doi.org/10.6017/ital.v38i2.10822 public key can be hashed again using a different cryptographic hash function, such as ripemd160, or multiple hash functions, like bitcoin does to create its addresses.27 these key pairs could be managed with digital wallet software. “a bitcoin wallet is an organized collection of addresses and their corresponding private keys.”28 larger institutions, such as the library of congress, could have multiple key pairs with each pair designated for the appropriate cataloging department based on genre, form, etc. creating a genesis block every blockchain must start with a “genesis block.”29 for example, a personal name authority blockchain might start with william shakespeare’s record. a descriptive bibliographic blockchain might start with the king james bible. this genesis block includes a block header, a recipient’s public key or address, a transaction count, and a transaction list.30 being the first block, the block header will not contain a hash of the previous block header. it will contain, however, a hash of all of the transactions within that block to verify that the transactions list has not been altered. the block header will also include a timestamp and possibly a difficulty level and nonce.31 then the block header is hashed using the sha-256 algorithm and encrypted with the creator’s private key to produce a digital signature. 
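before turning to how this signature is attached to the block, the key-generation step described above can be made concrete with a short sketch. the following python example is an illustration only, not the author's implementation: the third-party ecdsa package, the secp256k1 curve, and the ripemd-160 address step are assumptions borrowed from bitcoin-style tooling.

import hashlib
import ecdsa  # third-party package; an assumption for this sketch

def make_key_pair(seed_phrase):
    # sha-256 of the seed phrase becomes the 32-byte private key
    private_key = hashlib.sha256(seed_phrase.encode("utf-8")).digest()
    # derive the public key on the secp256k1 curve (bitcoin's curve; assumed here)
    signing_key = ecdsa.SigningKey.from_string(private_key, curve=ecdsa.SECP256k1)
    public_key = signing_key.get_verifying_key().to_string()
    # hash the public key again (sha-256, then ripemd-160 where openssl provides it)
    # to produce a shorter address that other nodes can send blocks to
    address = hashlib.new("ripemd160", hashlib.sha256(public_key).digest()).hexdigest()
    return signing_key, public_key, address

# a cataloging unit could hold several such key pairs in a simple wallet structure
signing_key, public_key, address = make_key_pair("example seed phrase for a name authority unit")
wallet = {address: signing_key}

in practice, a digital wallet application would generate and protect many such key pairs, for example one per cataloging department, rather than holding a single key in memory.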
this digital signature will be appended to the end of the block so validators can verify that the creator made the block by using their (the creator’s) public key.32 finally, the recipient’s public key or address, the transaction count, and transaction list are appended to the block header.33 block header • hash of previous block header • hash of all transactions in that block • timestamp • difficulty level (if applicable) • nonce (if applicable) block • recipient public key or address • transaction count • transaction list • digital signature in her master of information security and intelligence thesis at ferris state university, amber snow investigated the feasibility of using blockchain to add, edit, and validate changes to woodbridge n. ferris’ authority record.34 as shown in figure 1, she began by creating a hash function using the sha-256 algorithm to encrypt the previous hash, the timestamp, the block number, and the metadata record. “the returned encrypt value is significant because the returned data is the encrypted data that is being committed as [a] mined block transaction permanently to ledger.”35 the ledger block, however, “contains the editor’s name, the entire encrypted hash value, and the prior blocks [sic] hashed value.”36 information technology and libraries | june 2019 5 figure 1. creating a sha-256 hash. next, as shown in figures 2 and 3, she created a genesis block with a prior hashed value of zero by ingesting ferris’ authority record as “a single line file that contains the indicator signposts for cataloging the record.”37 figure 2. ingesting woodbridge n. ferris' authority record.38 figure 3. woodbridge n. ferris' authority record as a genesis block. note the previoushash value is zero. snow noted that “the understanding and interpretation of the marc authority record’s signposts is not inherently relevant for the blockchain data processing.”39 to keep the scope narrow, she also avoided using public and private key pairs to exchange records between nodes. “the ri [research institution] blockchain does not necessarily require two users to agree…instead the ri blockchain is looking to commit and track single user edits to the record.”40 creating and submitting new blocks for validation once a genesis block has been created and distributed, any node on the network can submit new blocks to the chain. for metadata records, new blocks should contain either new records or multiple modifications to the same record with each field being treated as a transaction. when a no need to ask | rubel 6 https://doi.org/10.6017/ital.v38i2.10822 second block is appended, the new block header will include the hash of the previous block header, a hash of all of the new transactions, a new timestamp, and possibly a new difficulty level and/or nonce. the block header will then be hashed using sha-256 and encrypted with the submitter’s private key to become a digital signature for that block. finally, another recipient’s public key or address, a new transaction count, and a new transaction list will be appended to the block header. additional blocks can then be securely appended to the chain ad infinitum without losing any of the transactional details. 
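a minimal sketch of the block structure and chaining just described follows; it is illustrative rather than prescriptive, and the field names, the json serialization, and the throwaway key are assumptions made for the example.

import hashlib
import json
import time
import ecdsa  # third-party package; an assumption for this sketch

# a throwaway key pair for the example (see the key-generation sketch above)
signing_key = ecdsa.SigningKey.generate(curve=ecdsa.SECP256k1)
recipient_address = "example-recipient-address"  # hypothetical; a real address would be a key hash

def build_block(previous_header_hash, transactions, recipient, key):
    # each transaction is one metadata field created or changed in a single record
    tx_blob = json.dumps(transactions, sort_keys=True).encode("utf-8")
    header = {
        "previous_header_hash": previous_header_hash,  # empty for a genesis block
        "transactions_hash": hashlib.sha256(tx_blob).hexdigest(),
        "timestamp": int(time.time()),
    }
    header_hash = hashlib.sha256(json.dumps(header, sort_keys=True).encode("utf-8")).hexdigest()
    return {
        "header": header,
        "header_hash": header_hash,
        "recipient": recipient,
        "transaction_count": len(transactions),
        "transactions": transactions,
        "signature": key.sign(header_hash.encode("utf-8")).hex(),
    }

# a genesis block followed by a second block holding one edit to the same (hypothetical) record
genesis = build_block("", [{"record": "auth-0001", "field": "100", "value": "initial heading"}],
                      recipient_address, signing_key)
edit = build_block(genesis["header_hash"],
                   [{"record": "auth-0001", "field": "670", "value": "source citation added"}],
                   recipient_address, signing_key)
assert edit["header"]["previous_header_hash"] == genesis["header_hash"]

because each changed field travels as its own transaction, every edit remains recorded in the ledger even after the active record moves on, which is the traceability benefit described earlier.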
if two validators approve the same block at the same time, then the fork where the next block is appended first becomes the valid chain while the other chain becomes orphaned.41 although snow’s method does not include exchanging records using public keys or addresses, she was able to change a record, add it to the blockchain, and successfully commit those edits using the proof of work consensus algorithm.42 as shown in figure 4, after creating and submitting a genesis block as “tester 1,” she added a modified version of woodbridge n. ferris’ record as “tester 2.” this version appended the string “testerchanged123” to woodbridge n. ferris’ authority record. then she validated or “mined” the second block to commit the changes. figure 4. submitting and validating an edited record. figure 5 shows that the second block is chained to the genesis block because the “previoushash” value of the second block matches the “hash” of the genesis block. this link is what commits the block to the ledger. the appended string in the second block is at the end of the “metadata” variable. information technology and libraries | june 2019 7 figure 5. the new authority record blockchain. a more sophisticated method to append a second block would require key pairs. as described previously, a block would include a recipient’s public key or address, which would route the new and modified records to large, known institutions like the library of congress. although every node on the network can see the records and all of the changes, large institutions with welltrained and authoritative catalogers may be the best repository for metadata records and could store a preservation or backup copy of the entire chain. they are also the most reliable for validating records for content accuracy and correct encoding. achieving algorithmic consensus once a block has been submitted for validation, the other nodes use a consensus algorithm to verify the validity of the block and its transactions. “consensus mechanisms are ways to guarantee a mutual agreement on a data point and the state…of all data.”43 the most well-known consensus algorithm is bitcoin’s proof of work, but the most suitable algorithm for permissionless metadata blockchains is a federated byzantine agreement. proof of work proof of work (pow) relies on a one-way cryptographic hash function to create a hash of the block header. this hash is easy to calculate, but it is very difficult to determine its components.44 to solve a block, nodes must compete to calculate the hash of the block header. to calculate the hash of a block header, a node must first separate it into its constituent components. the hash of the previous block header, the hash of all of the transactions in that block, the timestamp, and the difficulty target will always have the same inputs. the validator, however, changes the nonce or random value appended to the block header until the hash has been solved.45 in bitcoin this process is called “mining” because every new block creates new bitcoins as a reward for the node that solved the block.46 no need to ask | rubel 8 https://doi.org/10.6017/ital.v38i2.10822 bitcoin also includes a mechanism to ensure the average number of blocks solved per hour remains constant. this mechanism is the difficulty target. “to compensate for increasing hardware speed and varying interest in running nodes over time, the proof-of-work difficulty is determined by a moving average targeting an average number of blocks per hour. 
if they’re generated too fast, the difficulty increases.”47 adjusting the difficulty target within the block header keeps bitcoin stable because its block rate is not determined by its popularity.48 in sum, validators are trying to find a nonce that generates a hash of the block header that is less than the predetermined difficulty target. unfortunately, proof of work requires immense and ever-increasing computational power to solve blocks, which poses a sustainability and environmental challenge. bitcoin and other financial services may need to rely on proof of work because “the massive amounts of electricity required helps to secure the network. it disincentivizes hacking and tampering with transactions…”49 because an attacker would need to control over 51 percent of the entire network to convince the other nodes that a faulty ledger is correct.50 metadata blockchains would rely on public information and therefore would not need the same level of security as private financial, medical, or personally identifiable information. unlike bitcoin, metadata blockchains also would not need a difficulty target because fluctuations in block production rates would not affect a metadata block’s value the same way cryptocurrency inflation would. therefore, despite its incredible security, proof of work would be computationally excessive for metadata record blockchains. federated byzantine agreement byzantine agreements are “the most traditional way to reach consensus. […] a byzantine agreement is reached when a certain minimum number of nodes (known as a quorum) agrees that the solution presented is correct, thereby validating a block and allowing its inclusion on the blockchain.”51 byzantine fault-tolerant (bft) state machine replication protocols support consensus “despite participation by malicious (byzantine) nodes.”52 this support ensures consensus finality, which “mandates that a valid block…never be removed from the blockchain.”53 in contrast, proof of work does not satisfy consensus finality because there is still the potential for temporary forking even if there are no malicious nodes.54 the “absence of consensus finality directly impacts the consensus latency of pow blockchains as transactions need to be followed by several blocks to increase the probability that a transaction will not end up being pruned and removed from the blockchain.”55 this latency increases as block size increases, which may also increase the number of forks and possibility of attack.56 “with this in mind, limited performance is seemingly inherent to pow blockchains and not an artifact of a particular implementation.”57 bft protocols, however, can sustain tens of thousands of transactions at nearly network latency levels.58 a bft consensus algorithm is also superior to one based on proof of work because “users and smart contracts can have immediate confirmation of the final inclusion of a transaction into the blockchain.”59 bft consensus algorithms also decouple trust from resource ownership, allowing small organizations to oversee larger ones.60 to use bft, every node must know and agree on the exact list of participating peer nodes. ripple, a bft protocol, tries to ameliorate this problem by publishing an initial membership list and allowing members to edit that list after implementation. 
unfortunately, users are often reluctant to edit the membership list thereby placing most of the network’s power in the person or organization that maintains the list.61 information technology and libraries | june 2019 9 federated byzantine agreement (fba), however, does not require each node to agree upon and maintain the same membership list. “in fba, each participant knows of others it considers important. it waits for the vast majority of those others to agree on any transaction before considering the transaction settled.”62 theoretically, an attacker could join the network enough times to outnumber legitimate nodes, which is why quorums by majority would not work. instead, fba creates quorums using a decentralized method that relies on each node selecting its own quorum slices.63 “a quorum slice is the subset of a quorum convincing one particular node of agreement.”64 a node may have many slices, “any one of which is sufficient to convince it of a statement.”65 the system constructs quorums based on individual node decisions thereby generating consensus without every node being required to know about every other node in the system.66 one example of quorum slices that might be good for metadata blockchains is a tiered system as shown in figure 6. the top tier would be structured like a bft system where the nodes can tolerate a limited number of byzantine nodes at the same level. this level would include the core metadata authorities, such as the library of congress or pcc members. members of this tier would be able to validate any record. the second or middle tier nodes would depend on the top tier because, in this example, a middle tier node requires two top tier nodes to form a quorum slice. these middle tier nodes would be authoritative, known institutions, such as universities, that already rely on the core metadata authorities on the top tier to validate and distribute their records. finally, a third tier, such as smaller institutions, would, in this example, rely on at least two middle tier nodes for their quorum slice. figure 6. tiered quorum example. no need to ask | rubel 10 https://doi.org/10.6017/ital.v38i2.10822 using an fba protocol to validate a transaction requires each node to exchange two sets of messages. the first set of messages gathers validations and the second set of messages confirms those validations. “from each node’s perspective, the two rounds of messages divide agreement…into three phases: unknown, accepted, and confirmed.”67 the unknown status becomes an acceptance when the first validation succeeds. acceptance is not sufficient for a node to act on that validation, however, because acceptance may be stuck in an indeterminate state or blocked for other nodes.68 the accepting node may also be corrupted and validate a transaction the network quorum rejects. therefore, the confirmation validation “allows a node to vote for one statement and later accept a contradictory one.”69 figure 7. validation process of statement a for a single node v. fba would lessen concerns about sharing a permissionless blockchain, but it can “only guarantee safety when nodes choose adequate quorum slices.”70 after discovery, byzantine nodes should be excluded from quorum slices to prevent interference with validation. one example of such interference is tricking other nodes to validate a bad confirmation message. 
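the tiered quorum-slice arrangement described above can be illustrated with a small sketch. the node names and slice choices below are hypothetical, and the check shown covers only the basic quorum property (every member of a candidate set has at least one slice contained in the set), not the full federated voting protocol.

# each node lists one or more quorum slices: sets of peers (including itself)
# whose agreement convinces that node; names and choices here are hypothetical
slices = {
    "loc": [{"loc", "pcc_1", "pcc_2"}],                                              # top tier
    "pcc_1": [{"loc", "pcc_1", "pcc_2"}],
    "pcc_2": [{"loc", "pcc_1", "pcc_2"}],
    "university": [{"university", "loc", "pcc_1"}, {"university", "loc", "pcc_2"}],  # middle tier
    "small_library": [{"small_library", "university", "pcc_1"}],                     # third tier
}

def is_quorum(candidate, slices):
    # a set of nodes is a quorum when every member has at least one slice
    # that lies entirely inside the set
    return all(any(s <= candidate for s in slices[node]) for node in candidate)

print(is_quorum({"loc", "pcc_1", "pcc_2"}, slices))                                  # True: the top tier alone
print(is_quorum({"small_library", "university", "loc", "pcc_1", "pcc_2"}, slices))   # True: all three tiers together
print(is_quorum({"small_library", "university"}, slices))                            # False: slices reach outside the set

in this configuration the three top-tier nodes form a quorum on their own, while a smaller institution can only be part of a quorum that also convinces its chosen middle- and top-tier peers.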
"in such a situation, nodes must disavow past votes, which they can only do by rejoining the system under new node names."71 theoretically, this recovery process could be automated to include "having other nodes recognize reincarnated nodes and automatically update their slices."72 therefore, the key limitation to using an fba algorithm is continuity of participation. if too many nodes leave the network, reengineering consensus would require centralized coordination, whereas proof of work algorithms could operate after losing many nodes without substantial human intervention.73 storing the blockchain storing a large blockchain, such as bitcoin's, is a significant challenge. one method to facilitate that storage would be to rely on top tier nodes to retain a complete copy of the blockchain and allow smaller, lower tier nodes to retain an abridged version. in bitcoin, these methods are known as full payment verification (fpv) and simplified payment verification (spv). fpv requires a complete copy of the blockchain to "verify that bitcoins used in a transaction originated from a mined block by scanning backward, transaction by transaction, in the blockchain until their origin is found."74 unfortunately, as one might expect, fpv consumes many resources and can take a long time to initialize. for example, downloading bitcoin's blockchain can take several days. this long installation period is partly due to the size of the blockchain, but if proof of work is used as the consensus algorithm, then the new node must also connect to other full nodes "to determine whose blockchain has the greatest proof-of-work total (by definition, this is assumed to be the consensus blockchain)."75 using fba instead of proof of work would eliminate this time- and resource-consuming step. in contrast, spv only allows a node "to check that a transaction has been verified by miners and included in some block in the blockchain."76 a node does this by downloading the block headers of every block in the chain. in addition to retaining the hash of the previous block header, these headers also include root hashes derived from a merkle tree. a merkle tree is a method where "the spent transactions…can be discarded to save disk space."77 as shown in figure 8, combining transaction hashes for the entire block into a single root hash in the block header saves a considerable amount of storage capacity because the interior hashes can be eliminated or "pruned" off the merkle tree. figure 8. using a merkle tree for storage. as shown in figure 9, to verify that a transaction was included in a block, a node "obtains the merkle branch linking the transaction to the block it's timestamped in."78 although it cannot check the transaction directly, "by linking it to a place in the chain he can see that a network node has accepted it and blocks after it further confirm the network has accepted it."79 figure 9. verifying a transaction using a merkle root hash. compared to fpv, spv "requires only a fraction of the memory that's needed for the entire blockchain."80 this small amount of storage enables spv ledgers to sync and become operational in less than an hour.81 spv is limited, however, only allowing nodes to manage addresses or public keys that they maintain, whereas fpv ledgers are able to query the entire network.
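the merkle-root construction and branch verification described above can be sketched as follows; the pairing convention (duplicating the last hash when a level has an odd number of nodes) is an assumption borrowed from bitcoin rather than something specified in this article.

import hashlib

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions):
    level = [sha(tx.encode("utf-8")) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash when a level is odd
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_branch(tx, branch, root):
    # branch is a list of (sibling_hash, sibling_is_on_the_right) pairs, leaf to root
    node = sha(tx.encode("utf-8"))
    for sibling, sibling_is_on_the_right in branch:
        node = sha(node + sibling) if sibling_is_on_the_right else sha(sibling + node)
    return node == root

transactions = ["tx-a", "tx-b", "tx-c", "tx-d"]  # hypothetical transaction payloads
root = merkle_root(transactions)
# the branch for "tx-a": its sibling leaf, then the combined hash of the other pair
branch = [(sha(b"tx-b"), True), (sha(sha(b"tx-c") + sha(b"tx-d")), True)]
print(verify_branch("tx-a", branch, root))  # True

because an spv node holds only block headers and branches like these, it never examines the full transaction history itself.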
thus, an spv ledger must rely "on its network peers to ensure its transactions are legit."82 theoretically, an attacker could overpower the entire network and convince nodes using spv to accept fraudulent transactions, but such an attack is very unlikely for metadata blockchains. for additional security, an spv node could also "accept alerts from network nodes when they detect an invalid block, prompting the user's software to download the full block and alerted transactions to confirm the inconsistency."83 adding such a feature to metadata blockchain software would eliminate the slight risk of it being contaminated by malicious actors. thus, spv offers the ability for smaller institutions to participate in creating and maintaining a metadata blockchain without requiring them to have the storage capacity for the entire blockchain. conclusion and future directions this article described how permissionless metadata blockchains could be created to overcome two significant limitations in current cataloging practices: centralization and a lack of traceability. the process would start by creating private keys using a seed and the sha-256 algorithm and public keys using an elliptic curve digital signature algorithm. after creating the genesis block, nodes would submit either a new record or modifications to a single record for validation. validation would rely on a federated byzantine agreement (fba) consensus algorithm because it offers the most flexibility for institutions to select authoritative peers. quorum slices would be chosen using a tiered system where the top tier institutions would be the core metadata authorities, such as the library of congress. only the top tier nodes would be required to store a copy of the entire blockchain (fpv), thereby allowing other institutions to decide whether they prefer to use spv or fpv. future directions for research could start with investigating whether this theoretical design will work. fba has not been heavily promoted as an option for a consensus algorithm, but its quorum slices create trust between recognized authorities and smaller institutions. another area of study could be whether there is a significant demand for metadata blockchains. many institutions appear frustrated at the costs and limitations of working with a vendor, but they also view such relationships as necessary for metadata record creation and maintenance. a metadata blockchain would reduce such dependence, but some institutions may be leery of using open source software. other institutions might be hesitant to adopt blockchain because they believe it is merely another "fad" or an unnecessary addition to metadata exchange systems. a third area for research could be a cost-benefit analysis for implementing metadata blockchains that weighs current vendor fees and labor costs against potential storage and labor costs. such an analysis may create a tipping point where long-term return on investment outweighs the short-term challenges. endnotes 1 "about the project," blockchain for peer review, digital science and katalysis, accessed nov. 29, 2018, https://www.blockchainpeerreview.org/about-the-project/. 2 "marc record services," marc standards, library of congress, accessed nov. 29, 2018, https://www.loc.gov/marc/marcrecsvrs.html; "open library data," open library, internet archive, accessed nov. 29, 2018, https://archive.org/details/ol_data; oclc, 2017-2018 annual report.
3 “join the pcc,” program for cooperative cataloging, library of congress, accessed nov. 29, 2018, http://www.loc.gov/aba/pcc/join.html. 4 “040 cataloging source (nr),” oclc support & training, oclc, accessed nov. 29, 2018, https://www.oclc.org/bibformats/en/0xx/040.html. 5 dr. joris van rossum, “blockchain for research,” accessed nov. 29, 2018, https://www.digitalscience.com/resources/digital-research-reports/blockchain-for-research/. 6 van rossum, 11. 7 van rossum, 12. 8 van rossum, 12. 9 van rossum, 16. 10 digital science and katalysis, “about the project.” 11 “decentralized research platform,” deip, accessed nov. 29, 2018, https://deip.world/wpcontent/uploads/2018/10/deip-whitepaper.pdf. 12 deip, 13. 13 deip, 14. 14 deip, 16. no need to ask | rubel 14 https://doi.org/10.6017/ital.v38i2.10822 15 jason griffey, “blockchain for libraries,” feb. 26, 2016, https://speakerdeck.com/griffey/blockchain-for-libraries. 16 “e-services,” concensum, accessed nov. 29, 2018, https://concensum.org/en/e-services; “about,” binded, accessed nov. 29, 2018, https://binded.com/about; “faq,” dot blockchain media, accessed nov. 29, 2018, http://dotblockchainmedia.com/. 17 van rossum, “blockchain for research,” 10. 18 debbie ginsberg, “law and the blockchain,” blockchains for the information profession, nov. 22, 2017, https://ischoolblogs.sjsu.edu/blockchains/law-and-the-blockchain-by-debbieginsberg/. 19 griffey, “blockchain for libraries.” 20 “ways to use blockchain in libraries,” san josé state university, accessed nov. 29, 2018, https://ischoolblogs.sjsu.edu/blockchains/blockchains-applied/applications/. 21 “libchain: open, verifiable, and anonymous access management,” libchain, accessed nov. 29, 2018, https://libchain.github.io/. 22 griffey, “blockchain for libraries.” 23 san josé state university. “ways to use blockchain in libraries.” 24 “demco software blockchain,” demco, accessed nov. 29, 2018, http://blockchain.demcosoftware.com/. 25 jordan baczuk, “how to generate a bitcoin address—step by step,” coinmonks, accessed nov. 29, 2018, https://medium.com/coinmonks/how-to-generate-a-bitcoin-address-step-by-step9d7fcbf1ad0b. 26 “elliptic curve digital signature algorithm,” bitcoin wiki, accessed nov. 29, 2018, https://en.bitcoin.it/wiki/elliptic_curve_digital_signature_algorithm. 27 conrad barski and chris wilmer, bitcoin for the befuddled (san francisco: no starch pr., 2015), 139. 28 barski and wilmer, 12-13. 29 barski and wilmer, 11. 30 barski and wilmer, 172-73. 31 barski and wilmer, 172-73. 32 satoshi nakamoto, “bitcoin: a peer-to-peer electronic cash system,” accessed nov. 29, 2018, https://bitcoin.org/bitcoin.pdf. 33 barski and wilmer, bitcoin for the befuddled, 170-72. information technology and libraries | june 2019 15 34 amber snow, “the design and implementation of blockchain technology in academic resource’s authoritative metadata records: enhancing validation and accountability” (master’s thesis, ferris state university, 2018), 34. 35 snow, 40. 36 snow, 40. 37 snow, 37, 40. 38 snow, 42. 39 snow, 37. 40 snow, 39. 41 barski and wilmer, bitcoin for the befuddled, 23. 42 snow, “the design and implementation of blockchain technology,” 37. 43 “9 types of consensus mechanisms you didn’t know about,” daily bit, accessed nov. 29, 2018, https://medium.com/the-daily-bit/9-types-of-consensus-mechanisms-that-you-didnt-knowabout-49ec365179da. 44 barski and wilmer, bitcoin for the befuddled, 138. 45 barski and wilmer, 171. 46 barski and wilmer, 138. 47 nakamoto, “bitcoin,” 3. 
48 barski and wilmer, bitcoin for the befuddled, 171. 49 helen zhao, “bitcoin and blockchain consume an exorbitant amount of energy. these engineers are trying to change that,” cnbc, feb. 23, 2018, https://www.cnbc.com/2018/02/23/bitcoinblockchain-consumes-a-lot-of-energy-engineers-changing-that.html. 50 barski and wilmer, bitcoin for the befuddled, 23. 51 shaan ray, “federated byzantine agreement,” towards data science, accessed nov. 29, 2018, https://towardsdatascience.com/federated-byzantine-agreement-24ec57bf36e0. 52 marko vukolić, “the quest for scalable blockchain fabric: proof-of-work vs. bft replication,” ibm research – zurich, accessed nov. 29, 2018, http://vukolic.com/inetsec_2015.pdf 53 vukolić, “the quest for scalable blockchain fabric,” [5]. 54 vukolić, [6]. no need to ask | rubel 16 https://doi.org/10.6017/ital.v38i2.10822 55 vukolić, [6]. 56 vukolić, [7]. 57 vukolić, [7]. 58 vukolić, [7]. 59 vukolić, [6]. 60 david mazières, “the stellar consensus protocol: a federated model for internet-level consensus,” stellar development foundation, accessed nov. 29, 2018, https://www.stellar.org/papers/stellar-consensus-protocol.pdf. 61 mazières, 3. 62 mazières, 1. 63 mazières, 4. 64 mazières, 4. 65 mazières, 4. 66 mazières, 5. 67 mazières, 11. 68 mazières, 11. 69 mazières, 13. 70 mazières, 28. 71 mazières, 29. 72 mazières, 29. 73 mazières, 29. 74 barski and wilmer, bitcoin for the befuddled, 191. 75 barski and wilmer, 191. 76 barski and wilmer, 192. 77 nakamoto, “bitcoin,” 4. 78 nakamoto, 5. 79 nakamoto, 5. information technology and libraries | june 2019 17 80 barski and wilmer, bitcoin for the befuddled, 192. 81 barski and wilmer, 193. 82 barski and wilmer, 193. 83 nakamoto, “bitcoin,” 5. tech tools in pandemic-transformed information literacy instruction: pushing for digital accessibility article tech tools in pandemic-transformed information literacy instruction pushing for digital accessibility amanda rybin koob, kathia salomé ibacache oliva, michael williamson, marisha lamont-manfre, addie hugen, and amelia dickerson information technology and libraries | december 2022 https://doi.org/10.6017/ital.v41i4.15383 amanda rybin koob (amanda.rybinkoob@colorado.edu) is assistant professor, literature and humanities librarian, university of colorado. kathia salomé ibacache oliva (kathia.ibacache@colorado.edu) is assistant professor, romance languages librarian, university of colorado. michael williamson (michael.d.williamson@colorado.edu) is assistant director, assessment and usability, digital accessibility office, university of colorado. marisha lamont-manfre (marisha.manfre@colorado.edu) is accessibility and usability assessment coordinator, digital accessibility office, university of colorado. addie hugen (addison.hugen@colorado.edu) is senior accessibility tester, digital accessibility office, university of colorado. amelia dickerson (amelia.dickerson@colorado.edu) is accessibility professional, digital accessibility office, university of colorado. © 2022. abstract inspired by pandemic-transformed instruction, this paper examines the digital accessibility of five tech tools used in information literacy sessions, specifically for students who use assistive technologies such as screen readers. the tools are kahoot!, mentimeter, padlet, jamboard, and poll everywhere. 
first, we provide an overview of the americans with disabilities act (ada) and digital accessibility definitions, descriptions of screen reading assistive technology, and the current use of tech tools in information literacy instruction for student engagement. second, we examine accessibility testing assessments of the five tech tools selected for this paper. our data show that the tools had severe, significant, and minor levels of digital accessibility problems, and while there were some shared issues, most problems were unique to the individual tools. we explore the implications of tech tools’ unique environments as well as the importance of best practices and shared vocabularies. we also argue that digital accessibility benefits all users. finally, we provide recommendations for teaching librarians to collaborate with campus offices to assess and advance the use of accessible tech tools in information literacy instruction, thereby enhancing an equitable learning environment for all students. introduction the last fifteen years have seen the rise of collaborative and interactive web platforms and whiteboards, game-based learning technologies, audience polls, and other tools that contribute to student engagement in higher education classrooms. these educational tech tools have supported one-time library information literacy (il) sessions by enabling student participation in real time. still, knowing that tech tools may enhance engagement is not enough; we should also be asking whether these tech tools are accessible for all students and, if not, what can be done to make them more accessible. this paper examines the digital accessibility of five tech tools specifically for students who use assistive technologies such as screen readers. the tools are kahoot!, mentimeter, padlet, mailto:amanda.rybinkoob@colorado.edu mailto:kathia.ibacache@colorado.edu mailto:michael.d.williamson@colorado.edu mailto:marisha.manfre@colorado.edu mailto:addison.hugen@colorado.edu mailto:amelia.dickerson@colorado.edu information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 2 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson jamboard, and poll everywhere. these tech tools were identified in a 2021 paper inquiring which tech tools librarians used in emergency remote il instruction during the covid-19 pandemic along with their perceptions of the weaknesses and strengths of these tech tools.1 although there are guidelines aiding librarians in assessing ada accessibility around library spaces, there are no disability-related recommendations for specific tech tools used in il instruction or studies examining tech tools’ digital accessibility features. 2 there is also a lack of documentation regarding librarians’ outreach to ada-related academic offices and tech companies regarding tech tools. we argue that collaboration between libraries and ada-related offices at the campus level increases awareness of digital accessibility issues and requirements and could ultimately advance digital accessibility in educational tech tools used in il instruction. we place our paper within the context of other pandemic-responsive digital pedagogy research. we acknowledge that technology needs for student engagement are evolving in new face-to-face, hybrid, and remote instruction environments; thus, we hope to impact the way tech tools are assessed for digital accessibility and to promote the use of accessibility-tested tech tools in library instruction. 
first, we provide an overview of ada and digital accessibility definitions, descriptions of screen reading assistive technology, and the current use of tech tools in instruction for student engagement. secondly, we examine accessibility testing reports for the five tech tools selected for this paper. then, we discuss two trends found in the reports: shared issues between the tools and the implications of unique environments. we also argue that digital accessibility benefits all users. finally, we provide recommendations for teaching librarians to collaborate with campus offices to assess and advance the use of accessible tech tools in il instruction, thereby enhancing an equitable learning environment for all students. overview ada accessibility the americans with disabilities act (ada) was made law in 1990, signaling an initiative to protect people with disabilities from discrimination in employment opportunities, when purchasing goods, and when participating in state and local government services. 3 the idea behind the ada law was to provide equal opportunity.4 however, as health sciences librarian ariel pomputius notes, ada law protects people from discrimination, but it does not guarantee a right to accessibility beyond the legal requirements granted by this act.5 as higher education advances through the covid-19 pandemic, digital accessibility has become more essential than ever in il instruction as it takes place in hybrid, remote, and in -person environments. to ensure the digital accessibility of tech tools for all students, we should first understand its meaning. what is digital accessibility? the covid-19 pandemic brought digital accessibility to the forefront as universities navigated complex remote and hybrid learning environments. fernando h. f. botelho, a scholar with expertise in technology and disability, explains digital accessibility as the interconnection of “hardware design, software development, content production, and standards definition.”6 for botelho, accessibility is “an ongoing and dynamic process” rather than an immobilized state, where standards work together as a part of a ubiquitous process.7 as information studies information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 3 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson professor jonathan lazar notes, “digital accessibility means providing an equal user experience for people with disabilities, and it never happens by accident.” 8 georgetown law also defines digital accessibility from a perspective that may resonate with instructors who seek technologies that are accessible to all students. 
they define digital accessibility as “the inclusive practice of removing barriers that prevent interaction with, or access to websites, digital tools, and technologies.”9 however, it is lazar who moves the topic forward when referring to digital accessibility in research libraries, arguing that although accessibility laws protect people with disabilities, digital accessibility also benefits the whole population.10 lazar made this assertion after capturing the challenges and lessons learned related to digital accessibility during covid-19.11 the most salient lesson is that research libraries should create an infrastructure that supports digital accessibility, especially now that the covid-19 pandemic has driven universities to provide instruction in multiple formats.12 we argue that this infrastructure should also include digital accessibility evaluation of tech tools used in the classroom. assistive technology for blind users congress defined assistive technology in the disabilities act of 1988 as “any item, piece of equipment, or product system, whether acquired commercially off the shelf, modified, or customized, that is used to increase, maintain, or improve functional capabilities of individuals with disabilities.”13 furthermore, special education professors kathleen puckett and kim w. fisher state, “technology becomes assistive when it supports an individual … to accomplish tasks that would otherwise be difficult or impossible.”14 as scholars of occupational therapy claire kearney-volpe and amy hurst note, screen readers assist people with no or low vision by presenting web information on “a non-visual interface” via braille or speech synthesis.15 screen readers’ purpose is important because all people should have the opportunity to access the same information and services in the digital environment without facing undue barriers or burdens. the digital accessibility office’s (dao) assessment and usability team at university of colorado boulder (cu boulder) primarily tests tools for accessibility by utilizing screen reader assistive technology for both computers and mobile devices. assessment and usability staff rely on screen readers for testing because this assistive technology uses and responds to the underlying code of each webpage, application, and environment. this in-depth output makes screen readers good tools for overall accessibility testing, even though they are generally for people with no vision. however, we found no studies on tech tools and classroom engagement that consider assistive technology such as screen readers. classroom engagement with tech tools academic librarians emily chan and lorrie knight state in a 2010 study that library instruction risks being anachronistic if it does not include an engaging technology-based activity.16 with this in mind, there is ample literature documenting the impact and benefits of tech tools in the classroom. for example, authors highlight tech tools’ anonymous environment, categorized as free of judgment, noting that it is student-centered and enhances student participation.17 moreover, anonymity provides a means for students to answer honestly, fostering classroom discussion that includes introverted students.18 on the other hand, some authors argue that anonymous participation does not enhance critical thinking. 
ann rosnida md deni and zainor izat zainal, referring to padlet as an educational tool, argue that one challenge of using tech tools to advance student engagement is that they do not, on their own, enhance criticality or discussion because students may not want to oppose their peers' opinions.19 as with other pedagogical techniques, intentional facilitation with tech tools is necessary to enhance criticality.

many authors regard the use of tech tools in the classroom positively.20 examining kahoot! to test students' performance, darren h. iwamoto et al. state that students valued receiving immediate feedback on their answers after taking a high-stakes examination.21 carolyn m. plump and julia larosa also appreciate the use of instructional games to provide immediate feedback to students, noting that this feedback warned faculty instructors against making assumptions about how much students understand in class.22 similarly, librarian maureen knapp, referring to online tools for active learning, notes that instant feedback drives classroom discussions forward.23 liya deng, a librarian with a focus on disability studies, notes in a 2019 study that using poll everywhere in library instruction provides an opportunity to build rapport with students and a strategy to keep students focused and away from non-instruction-related internet distractions.24 engineering classes have also used tech tools to enhance teaching and learning. a 2021 case study addressing online education due to covid-19 reports that students found kahoot! to be a useful online tool that helped them reflect, apply knowledge, and receive feedback.25 similarly, engineering educator razzaqul ahshan advocates for incorporating tech tools like jamboard for active "think-pair-share" activities, noting that it enables instructors to connect with students as they do small group work.26 these studies suggest that tech tools continue to be relevant and beneficial during the pandemic, though again, they do not consider whether the tools are digitally accessible. in any case, the continued use of tech tools in various modalities (in-person, hybrid, and remote) attests to their relevance, which may continue to grow as instructors transition to pandemic-transformed pedagogy.

pandemic-transformed pedagogy

in a 2020 publication exploring covid-19 impacts on teaching, learning, and technology use, scholars jillianne code, rachel ralph, and kieran forde coined the phrase "pandemic-transformed pedagogy."27 as they state, educators find themselves "on the cusp of a rapid change that is compelling them to re-think their worldview in both how they teach and how their students learn, calling for their transformation as educators."28 a review of the recent literature available through google scholar on "pandemic-transformed pedagogy" shows expanding adoption of this phrase, including academics publishing on a range of interdisciplinary subjects and in international contexts, with implications for both k–12 and post-secondary education.29 as we reflect on this transformation and call for responsiveness to rapid change, we emphasize the need for support, planning, and advocacy for digital accessibility and tech tools.
before the covid-19 pandemic, scholars at the university of sydney found in 2018 that the most significant factor driving the choice to use technology was whether it was immediately available.30 these scholars emphasized "just in time" use, noting that ready access to the technology required "actions, expenditure, support, and commitment from policymakers and administrators."31 at the beginning of the pandemic, teachers, librarians, and students had a matter of days to pivot to remote work, and as ibacache, rybin koob, and vance found in a 2021 study, "availability" was a consideration for librarians in selecting tech tools for engagement and content delivery.32 this "just in time" consideration is even more important in the aftermath of covid-19, which prompted emergency remote learning. yet, teaching librarians also ought to go beyond what is easily available and move towards what is digitally accessible. part of the transformation we envision is to extend the concept of "pandemic-transformed pedagogy" to include digital accessibility and thus push for the tech tools we use in il instruction to be both readily available and digitally accessible.

methodology

as previously mentioned, this study examines the digital accessibility of five educational tech tools used in il instruction. to initiate a formal accessibility test, we created scripts detailing how to interact with the samples we provided for the five tools.33 these scripts were then used to manually test each tech tool for its digital accessibility using a variety of screen readers on both computers and mobile devices.

about the testers

the testers are native users of screen reading assistive technology and are blind. they test each tech tool first, with additional staff in the dao reviewing and validating results.

about the test scripts process

each test script contained the following parameters:
1. basic information about the tool.
2. contact information for access issues and technical questions, such as the tools' customer support email and librarians' emails for follow-up questions.
3. access points to the software and websites (urls).
4. step-by-step instructions for testers to impersonate a student engaging in an il task.

as a part of these test scripts, we created short sample quizzes and activities for each of the five tools considered in this paper. in addition, the test scripts provided step-by-step descriptions to help the testers interact with the tools. the testers then tried each tool, focusing on functionality and whether they could complete the tasks in the script. the reports describe three levels of problems: severe, significant, and minor. the results section of this paper reports on these problems as found with the five tools tested. the testers also assessed general user experience (usability), using a holistic approach that engaged with the entire virtual environment of the tool rather than looking only at isolated functions.

assistive technology

the testers utilized four types of screen reader software: voiceover, talkback, nvda, and jaws. voiceover, developed by apple, is a screen reader for mobile devices and computers that "comes standard on every iphone, mac, apple watch, and apple tv.
it is gesture-controlled, so by touching, dragging, or swiping[,] users can hear what's happening on screen and navigate between elements, pages, and apps."34 talkback is a google-based screen reader included in android mobile devices that functions similarly to voiceover.35 nvda is a free, open-source screen reader for microsoft windows that supports people who are blind or have vision impairment.36 jaws, also compatible with microsoft windows, allows people with visual impairment to read the pc screen with text-to-speech output or via a braille display.37 we also tested for visual usability issues using a free web-based color contrast analyzer.38 the testers provided thorough reports detailing the results of their testing, including the exact versions used. the tests were conducted between february 27 and may 1, 2022.

tools evaluated

the educational technology tools in this study are web-based and have free options, allowing students to engage in activities using their computers or their phones. we identified these tools based on a survey about tech tool use during the covid-19 pandemic and from our own experiences.39 the tools are jamboard, kahoot!, mentimeter, padlet, and poll everywhere. jamboard is a google-powered virtual whiteboard tool. kahoot! is a quiz/game platform offering multiple styles; we tested the standard quiz question format and utilized one of the vendor-provided sample quizzes. mentimeter is another quiz-making platform; we created the sample quiz utilizing multiple-choice and short-answer question formats. padlet is a collaborative bulletin board platform with various formats (including the three we tested: padlet maps, padlet shelf, and padlet wall). padlet includes options for users to add text and multimedia in response to question prompts or to post their own questions and other content in a collaborative virtual space. finally, poll everywhere is a polling/survey platform.

limitations

although digital accessibility offices at different universities commonly rely on shared standards for technology evaluation, such as the web content accessibility guidelines (wcag) 2.1, we acknowledge that the assessment approach will vary from office to office. overall, there is much debate on which practices and standards for evaluating tech tools yield the best results. not all higher education institutions have digital accessibility offices, let alone accessibility and usability labs and testers. some institutions may rely on automated checkers or a mix of automated and manual testing; there is disagreement among digital accessibility practitioners about whether a fully automated, fully manual, or hybrid approach is best. regardless, we expect that manual testing of these educational tech tools using similar assistive technologies would have produced similar results during the timeframe in which these tools were tested. the testing reports capture a moment in time, and it is important to note that web-based tools are frequently updated. we only tested the free versions of these tools; there may be differences in accessibility between free and paid versions. we tested only the browser versions of these tools on computers and mobile devices and did not test mobile applications, which may or may not be more accessible.
this decision was made due to the probability that most il librarians and other instructors would not regularly ask students to download applications to their personal devices for in-class engagement. kahoot!, mentimeter, padlet, and poll everywhere were tested on windows, ios, and android platforms. jamboard was tested only on windows because the browser version would not open on a mobile device using assistive technology; instead, it attempted to force an app download. we also tested each tool using sample environments and functions that we hope captured some of the ways the tools would be used in a typical il classroom. due to the nature of the tools and the many options available for question and collaboration formats in each tool, these samples were not exhaustive of all options available. these testing results are meant to be illustrative rather than comprehensive. finally, this study evaluated tech tools only for digital accessibility using the specific assistive technology of screen readers. further research is needed regarding how students with a range of different disabilities may interact with the technology tools examined here.

results

this section reports the three levels of problems (severe, significant, and minor) that dao testers found in jamboard, kahoot!, mentimeter, padlet (shelf, map, and wall), and poll everywhere. the testers also assessed user experience ("usability"). issues may be present in multiple categories based on how they impact the user's ability to complete actions.

severe issues

table 1 shows the severe issues found in the tools tested. severe issues create access barriers that prevent assistive technology users from completing tasks and are issues that need to be remediated. the testers consider these issues prohibitive for many individuals with disabilities and for those who use assistive technologies. the dao identified ten severe issues in padlet shelf; five severe issues each in jamboard, padlet wall, and poll everywhere; four severe issues in kahoot!; and three severe issues in padlet map. the testers did not find severe issues in mentimeter.

table 1 shows that the most common severe issue corresponds to elements that are unlabeled or inappropriately labeled. in the case of padlet map and jamboard, the testers found buttons that were unlabeled or labeled with irrelevant numbers, leaving testers unclear as to what the buttons were or what their functionality was. padlet shelf contained the most unlabeled buttons, including the buttons to add posts and the three-vertical-dot menu to edit or delete. this issue is highly relevant since users need these buttons to navigate and contribute to the padlet. the testers observed a similar problem when using the screen reader talkback to engage with padlet shelf. talkback found unknown or unlabeled buttons, which impede users' ability to navigate or interact with videos they submit to the padlet. figure 1 illustrates the play button located at the center of a video. in the screen reader, this button is unlabeled and appears after the video, preventing the screen reader from understanding its function and leaving users unclear whether this button is connected to the video.

figure 1. the play button at the center of the video is unlabeled. in the reading order, this unlabeled button appears after the video; therefore, it is unclear what it does or how it relates to the video.
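at root, an "unlabeled or inappropriately labeled" element is a control without an accessible name. as a minimal sketch of the difference (the markup and label text here are hypothetical, not padlet's or jamboard's actual code), the gap between a button announced only as "button" and one announced as "add a post, button" can be a single attribute:

```typescript
// inaccessible pattern: an icon-only button with no text and no label.
// a screen reader can only announce it as "button" (or an internal id).
const unlabeled = document.createElement("button");
unlabeled.className = "add-post-icon"; // icon supplied via css, invisible to screen readers

// accessible pattern: the same control given an explicit accessible name,
// so voiceover, talkback, nvda, and jaws can all announce "add a post, button".
const labeled = document.createElement("button");
labeled.className = "add-post-icon";
labeled.setAttribute("aria-label", "add a post");

document.body.append(unlabeled, labeled);
```

as discussed later in this paper, labeling problems like this are usually among the easier issues for developers to remediate.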
the second most prevalent severe issue is elements that are not accessible to screen readers. this issue affected padlet shelf and padlet wall. in the case of padlet shelf, the testers utilizing the voiceover screen reader were unable to interact with or locate gifs and graphics. when the testers utilized talkback, they would hear the gif but could not find the graphics because they were marked as links. in addition, the drawing feature was also not accessible to screen readers, including the visual elements that control colors, which appear as clickable links instead of the visual elements associated with colors. these elements were unavailable for users utilizing voiceover and jaws. the testers found a similar problem with the visual elements in padlet wall, especially when they tried to edit a post (see fig. 2).

figure 2. when users want to edit a post in padlet wall, there are visual elements that are available to change the color of the post. these elements are not available to screen readers.

figure 3. when images are not programmed to be read as graphics, screen readers are not able to gather information related to the gif. this image was read as "jaf3mi0ja5huk/giphy."

figure 4. while using nvda, the user hears links for images that do not make sense.

the third most frequent severe issue relates to graphics and gifs that are not appropriately programmed. this issue affected padlet shelf and padlet wall. when the testers were using jaws in padlet shelf, the gifs read as links with the following text: "jaf3ml0ja5huk/giphy." when the testers utilized nvda, the gifs read as "giphy," conveying no information describing the gif and hindering navigation (fig. 3). similarly, graphics and gifs in padlet wall are programmed as links rather than graphics. when the testers used jaws to understand graphics and gifs, they heard long links such as "eb351cc20e6bfda76d443f1e93ad7963/pumpkin_seedling_3." long links like this are useless to people using screen readers and disrupt people's ability to search for graphics. when the testers used nvda, they also heard links for the images, but without the other series of characters included in jaws (fig. 4).

the testers also found severe issues with elements not available by keyboard or screen reader (jamboard and poll everywhere) and timer features (kahoot!). for example, the pen, eraser, laser, shapes, and text box elements in jamboard can only be utilized or placed on the screen by a mouse, making them inaccessible to blind learners. another issue is the lack of alternative text. since jamboard offers a collaborative multi-user space, some users may post images; however, there is no way to input alternative text for an image. in the case of kahoot!, when the timer is activated, the countdown plays as the screen reader tries to read the page, confusing the screen reader and the user, who will hear the timer with random numbers and not the question. the timer feature also affects the user when starting a quiz or moving between questions. it is unclear whether the screen reader is unable to read the questions due to the short timeframe or whether the questions are truly unavailable to the screen reader. the instructor may extend the timer for quizzes in kahoot!, but it is impossible to turn it off altogether when using the kahoot! quiz question format.
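several of these severe issues come down to how a graphic is exposed to assistive technology. a minimal sketch of the contrast between a gif exposed as a bare link, roughly as the testers heard it, and one exposed as an image with alternative text (the urls and markup are hypothetical, not the vendors' actual code):

```typescript
// pattern the testers encountered: the gif is wrapped in a link whose only
// content is the file path, so jaws and nvda read a string of characters.
const asLink = document.createElement("a");
asLink.href = "https://example.com/media/giphy.gif"; // hypothetical url
asLink.textContent = "jaf3ml0ja5huk/giphy";

// accessible pattern: expose the gif as a graphic with alternative text that
// describes its content; a purely decorative image would instead get alt="".
const asImage = document.createElement("img");
asImage.src = "https://example.com/media/giphy.gif"; // hypothetical url
asImage.alt = "animated gif of a pumpkin seedling sprouting";

document.body.append(asLink, asImage);
```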
table 1. number of occurrences of severe issues found during screen reader testing for kahoot!, jamboard, mentimeter, padlet (three formats tested), and poll everywhere. (jamboard was tested on a windows computer only; the other tools were tested on windows, ios, and android.)

severe issue | jamboard | kahoot! | mentimeter | padlet: map | padlet: shelf | padlet: wall | poll everywhere | total occurrences
element not available by keyboard or screen reader | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 3
element presents gesture/navigation traps | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
elements are not keyboard accessible | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1
elements are unlabeled or inappropriately labeled | 2 | 0 | 0 | 1 | 4 | 2 | 0 | 9
elements not accessible to screen reader | 0 | 0 | 0 | 0 | 3 | 2 | 0 | 5
errors do not get focus | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
graphics and gifs are not programmed appropriately | 0 | 0 | 0 | 0 | 3 | 1 | 0 | 4
graphics are unlabeled or inappropriately labeled | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2
graphics lack alternative text | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
lack of alert | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1
text not read by screen reader | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1
timed pages disrupt the ability to read the page | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 3
tool totals | 5 | 4 | 0 | 3 | 10 | 5 | 5 | 32

the testers found other severe issues such as text not being read by screen readers, missing notifications, elements not accessible to screen readers, unlabeled graphics, and lack of focus on images. for example, in the case of kahoot!, the screen reader could not read the answer notification text. this issue meant that while the tool offered visual indicators for correct and incorrect answers, the screen reader did not read these indicators and it remained unclear to testers whether their answers were correct or not. finally, "lack of focus" or challenges with "focus handling" indicate that the assistive technology's attention was not where it should be. this problem happens because tool developers do not set the appropriate code for screen readers.
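the missed answer notification in kahoot! is a status change that is shown visually but never announced. a minimal sketch (hypothetical code, not kahoot!'s implementation) of how a live region can announce that kind of feedback without moving the user's place in the page:

```typescript
// a live region that screen readers monitor for text changes.
const status = document.createElement("div");
status.setAttribute("role", "status"); // implies polite live-region behavior
status.setAttribute("aria-live", "polite");
document.body.append(status);

// updating the live region's text causes screen readers to speak it
// without stealing focus from the question the user is working on.
function announceResult(correct: boolean): void {
  status.textContent = correct ? "correct!" : "incorrect.";
}

announceResult(true);
```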
significant issues

table 2 shows the significant issues found in the tools. significant issues represent items that create great difficulty for people who use assistive technologies, but they do not necessarily prevent the tool from being used. significant issues are recommended for remediation. interestingly, most significant issues were not shared across the five tools; out of sixteen problems, only one was shared by four tools ("inconsistent focus handling"), and three were shared by two tools each ("graphics are inappropriately labeled," "reading order can be confusing to users," and "state is not indicated"). because of this lack of overlap, brief descriptions of how frequent issues affected specific tech tools are warranted, focusing on those issues that affected multiple tools, recurred most frequently, or both.

the significant issue that recurred most frequently was "reading order can be confusing to users," affecting jamboard as well as all three padlet styles. in jamboard, when creating a sticky note, the focus of the assistive technology went into the edit field but ignored the color options, meaning that users were unable to switch between colors when making a post. reading order also caused difficulties elsewhere. reading order is the way elements are tagged and read by screen readers; it may not be the same order most sighted users experience when reading elements on the page from top to bottom, though it should closely reflect the visual layout of the page, and it determines what a blind learner will understand about the digital environment and in what order. in padlet map, the screen reader went through irrelevant content, including the terms and conditions, before reading the "new post" button. padlet shelf had three instances of confusing reading order; for example, the "publish" and "update" options were in the reading order above the "edit" field, so the user would have to know to navigate back to finalize their post (this issue is repeated in padlet wall as well). further, if a user leaves the new post dialog box, it is difficult to return due to the reading order. the "more buttons" element was also read before the heading of a new post, and those additional buttons are unlabeled. finally, in padlet wall, the tester utilizing voiceover could not discard a post (fig. 5). a dialog opened asking for discard confirmation, but this dialog was buried in the reading order and challenging to locate.

figure 5. a dialog box appears visually in padlet wall when a user attempts to discard a post, but it is buried in the reading order of the voiceover screen reader, making it difficult to locate and complete the task.

the next most frequent significant issue was "inconsistent focus handling," which occurred six times. focus handling directs the attention of the user and facilitates various actions in a given environment. inconsistent focus handling emerged in four out of the five tools: jamboard, kahoot!, all three padlet styles, and poll everywhere. this issue often appeared when a new element on the screen was opened but the "focus" (what the screen reader was paying attention to at any given time) remained on the previous panel or element, causing confusion and difficulty. for example, in jamboard, when selecting the "open a jamboard" button, the panel opened visually, but the screen reader's focus remained on the button behind the open panel. to get to the new jamboard, the tester had to navigate the other page content first. focus handling was inconsistent across many activation buttons and interactions in all three padlet styles. in kahoot!, focus handling was inconsistent across screen readers, with the focus going to different places, such as after answering a question. in poll everywhere, the focus traveled to other areas of the page after answering a question, returning to previous questions, or refreshing the page. these inconsistencies varied among screen readers.
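the jamboard example above, where a panel opens visually but the screen reader's attention stays on the button behind it, is the kind of problem developers usually address by moving focus programmatically when the new panel appears. a minimal sketch, assuming hypothetical element ids rather than jamboard's actual markup:

```typescript
// when the trigger is activated, reveal the panel and move focus into it so
// the screen reader's attention starts on the new content, not behind it.
function openPanel(trigger: HTMLButtonElement, panel: HTMLElement): void {
  panel.hidden = false;
  trigger.setAttribute("aria-expanded", "true");
  panel.tabIndex = -1; // a generic container must be made focusable first
  panel.focus();
  // when the panel closes, focus should be returned to the trigger.
}

const trigger = document.querySelector<HTMLButtonElement>("#open-board")!; // hypothetical id
const panel = document.querySelector<HTMLElement>("#board-picker")!;       // hypothetical id
trigger.addEventListener("click", () => openPanel(trigger, panel));
```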
table 2. number of occurrences of significant issues found during screen reader testing for jamboard, kahoot!, mentimeter, padlet (three formats tested), and poll everywhere. (jamboard was tested on a windows computer only; the other tools were tested on windows, ios, and android.)

significant issue | jamboard | kahoot! | mentimeter | padlet: map | padlet: shelf | padlet: wall | poll everywhere | total occurrences
difficult combination/list box | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 3
element difficult to access | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
element state not indicated | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 3
error does not get focus | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
extensive load times create difficulties | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1
graphics are inappropriately labeled | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 2
graphics not programmed appropriately | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
headings are not used | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
inconsistent focus handling | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 6
lack of alert | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 3
lack of contextual text/information | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1
lack of focus indicators | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 2
lack of notification | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 3
object placement | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
reading order can be confusing to users | 1 | 0 | 0 | 1 | 3 | 2 | 0 | 7
user-created objects initially lack markup | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
tool totals | 10 | 7 | 2 | 4 | 4 | 7 | 3 | 37

another issue that recurred across two tools (kahoot!, mentimeter) was "graphics are inappropriately labeled." in kahoot!, a graphic showing the final scoreboard from the quiz and a podium were hidden from screen readers. in mentimeter, the logo for the tool is labeled as a "logotype" in the alt text. while these inappropriate labels for graphics may seem minor, they leave players using assistive technology out of celebratory or fun elements and may be confusing. inappropriate labels cannot be corrected by instructors, who are unable to adjust the alt text for elements that are built into the software.

another issue that recurred was "state is not indicated" (jamboard, poll everywhere). here, "state" refers to any change or option for an element, that is, its state in the digital environment. in jamboard, there is no indication of what color is selected for sticky notes, for example, which can be problematic if instructors use color to convey meaning (fig. 6). in one test question on poll everywhere, the unlabeled image reads as clickable to nvda, and visually, the image becomes larger when clicked. but this change is not announced and again may be confusing.

figure 6. for screen readers, there is not an indication of what color has been selected for sticky notes, though this is available visually.

with ten issues listed, jamboard was the tool with the greatest number of significant problems. this was true even though jamboard was tested only on windows and not on mobile devices. padlet wall and kahoot! had seven significant issues each. this is a slight departure from the data in table 1, where padlet shelf had the most severe issues. in general, tools with severe issues consistently exhibited some significant issues as well.
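"state is not indicated" is usually a matter of exposing the current selection programmatically as well as visually. a minimal sketch of a color palette whose selected swatch is announced by screen readers (hypothetical code, not jamboard's actual implementation):

```typescript
// a small color palette exposed as a radio group, so screen readers report
// both the name of each swatch and which one is currently selected.
const colors = ["yellow", "blue", "green", "pink"];
const palette = document.createElement("div");
palette.setAttribute("role", "radiogroup");
palette.setAttribute("aria-label", "sticky note color");

colors.forEach((color, i) => {
  const swatch = document.createElement("button");
  swatch.setAttribute("role", "radio");
  swatch.setAttribute("aria-label", color);
  swatch.setAttribute("aria-checked", i === 0 ? "true" : "false");
  swatch.addEventListener("click", () => {
    // keep the programmatic state in sync with the visual selection.
    palette.querySelectorAll("[role='radio']")
      .forEach((el) => el.setAttribute("aria-checked", "false"));
    swatch.setAttribute("aria-checked", "true");
  });
  palette.append(swatch);
});

document.body.append(palette);
```

a full implementation would also add the arrow-key navigation screen reader users expect of a radio group; the sketch only shows how the selected state is exposed.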
minor issues

table 3 shows the minor issues found in the five technology tools. minor issues represent items that are inconvenient or annoying but do not necessarily create barriers to accessibility, e.g., repetitiveness or unclear text. the testers found that each tool had between one and four minor issues of its own but did not share any of the minor issues listed.

table 3. number of occurrences of minor issues found during screen reader testing for jamboard, kahoot!, mentimeter, padlet (three formats tested), and poll everywhere. (jamboard was tested on a windows computer only; the other tools were tested on windows, ios, and android.)

minor issue | jamboard | kahoot! | mentimeter | padlet: map | padlet: shelf | padlet: wall | poll everywhere | total occurrences
element is inappropriately labeled | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
elements confusing to users | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 3
elements read twice | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1
heading level not concise | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
headings are not used to provide structure | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1
headings used too often | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1
inconsistent focus handling | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
labels are inconsistent | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
lack of a programmatic list creates confusion | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
lack of notification | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
same information is presented to screen reader multiple times | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
sound effects portray meaning | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1
submenu item count not provided | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1
unclear text is confusing to user | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2
tool totals | 2 | 4 | 2 | 1 | 2 | 3 | 1 | 17

kahoot! had three issues related to confusing elements: gibberish text heard on screen readers, blanks in the statement not read by screen readers, and an icon that shows the total number of users who finished a test, which the screen reader could not read. other minor issues include instructions, questions, and answers announced multiple times (poll everywhere), long heading text (padlet wall), lack of notification when the user adds an image (padlet wall), excessive use of headings in a page, forcing users to go through the entire page to find a heading (padlet shelf), and headings not used to provide structure and facilitate navigation (padlet map). the lack of heading structure complicates the ability of users who want to add a post with a heading and text, as seen in figure 6.

figure 6. users can enter text on option 1 and option 2, but these options do not generate a heading.

usability issues

the accessibility assessment reports also included usability evaluation. usability issues may impact users of any ability. the testers noticed insufficient color contrast in three tools (poll everywhere, padlet wall and map, and kahoot!). for example, figure 7 illustrates a poll everywhere sample question where the color of the text does not have enough contrast with the background. the evaluators also found a lack of color contrast in instructions and captions. in some padlet formats, the instructor can change color contrast by choosing a different template.

figure 7. an example of poll everywhere answer options that do not have sufficient color contrast between the text and background.

conveying meaning using colors is another issue. in the case of jamboard, the sticky note (fig. 8) has a blue bar at the bottom that looks like a loading bar; this bar is connected to a character count that is not noted by screen readers. in addition, the testers could continue typing past the character limit when the loading bar turned red.

figure 8. in jamboard, the blue bar (below the yellow box) is used as a visual indication of the character limit that is not available to screen readers.

the testers also noticed layered elements that caused usability problems. figure 9 illustrates how the preview panel in padlet map visually blocks the post and the button to close the preview panel.

figure 9. when the user has the "preview panel" in padlet map open and starts a new post, the preview panel blocks the post.

padlet shelf and mentimeter did not have usability issues.
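the insufficient-contrast findings (and the 4.5:1, 2:1, and 1.8:1 figures reported in appendix b) are wcag contrast ratios, which the web-based analyzer we used derives from the relative luminance of the text and background colors. a minimal sketch of that calculation, with hypothetical example colors rather than values sampled from the tools:

```typescript
// relative luminance of an srgb color given as [r, g, b] values from 0-255,
// following the wcag 2.x definition.
function luminance([r, g, b]: [number, number, number]): number {
  const channel = (v: number) => {
    const c = v / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

// contrast ratio = (lighter + 0.05) / (darker + 0.05); normal-size text
// needs at least 4.5:1 to meet wcag 2.1 level aa.
function contrastRatio(a: [number, number, number], b: [number, number, number]): number {
  const [lighter, darker] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (lighter + 0.05) / (darker + 0.05);
}

// light gray text on a white background fails the 4.5:1 threshold.
console.log(contrastRatio([170, 170, 170], [255, 255, 255]).toFixed(2)); // ≈ 2.32
```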
summary of results

the reports showed that mentimeter was the most digitally accessible tool of those considered in this study. kahoot! and poll everywhere were judged as relatively accessible, with caveats. both jamboard and all three types of padlet tested were found to be inaccessible for many individuals who use assistive technologies. in any case, all tools included either severe or significant issues, creating a great deal of difficulty for users.

most issues were unique to individual tools. of twelve severe issues, only two were shared across two tools each. of sixteen significant issues, only four were shared across tools, and only one was shared across more than two tools ("inconsistent focus handling" was a problem in all tools except mentimeter). all fourteen minor issues were unique. mentimeter, with very few issues, and padlet, with the most problems in aggregate, were outliers. because padlet offers so many different format options, we tested three, which affected the findings. still, padlet shelf had the most severe issues (ten), and jamboard had the most significant issues (ten) of any tool aside from padlet.

discussion

we hypothesized that the tools selected for this study would share similar digital accessibility issues. to our surprise, the reports showed that these tech tools had few shared problems. we will thus consider two trends in the reports worthy of further examination: shared issues among the tools and unique environments. we will also discuss how digital accessibility can benefit all users of tech tools, not only people with disabilities.

shared issues among the tools

as previously mentioned, many issues were unique to individual tech tools covered in this study, while a few problems were shared among the tools. when a particular issue was shared among different tools, the level of severity determined whether a person using assistive technology could have a successful interaction with a tool or not. tracking shared issues and their severity may help developers and digital accessibility staff create a shared vocabulary for discussing user experience. it may also help both parties recognize when issues are common and relatively easy to remediate (e.g., labeling, heading, and alt text problems). other shared issues, such as focus handling inconsistencies, are more difficult to resolve even though they are at the heart of screen reading assistive technology. tracking focus handling problems may allow developers and digital accessibility advocates to share possible solutions with one another. moreover, if tech tool developers and digital accessibility staff both understand the importance of a factor like focus handling, difficult and severe problems can be prioritized when creating and fixing tech tools.
it is also important for instruction librarians to have a basic grasp of this shared vocabulary so that they can anticipate the needs and experiences of the learners in their classrooms. looking at each tech tool in isolation offers only a tiny glimpse into what might happen when students connect to engagement technologies; evaluating multiple tools allowed us to better understand recurring problems and the barriers they create.

unique environments

though tracking shared issues is important for these reasons, by the end of our testing we found that the tools were not similar, and that even when they had shared issues, those problems had unique characteristics. for this study, we selected tools that have similar functionality (for example, both kahoot! and mentimeter can function as quiz platforms) and others that are distinctive (such as padlet maps, which incorporates gps data to allow users to interact with maps). these tools offer students real-time engagement, which helps foster a collaborative learning environment. as mentioned above, most severe issues (ten out of twelve), most significant issues (twelve out of sixteen), and all minor issues were unique; in other words, they were not shared across tools. from the testers' point of view, the presence of unique problems is explained by the fact that the elements of each tool combine to create unique environments. for example, some tech tools are more image-based, while others are text-based.40 our study shows that even tools that initially appear similar are revealed as unique through assistive technology testing.

an interesting finding concerned padlet. when tools have problems, these issues usually exist across all of the screen readers used for testing. padlet, however, caused inconsistencies across different screen readers. for example, padlet shelf had many unlabeled or inappropriately labeled elements that created different experiences between voiceover, nvda, and jaws. when irregularities appear across similar assistive technologies, this might mean that developers created unusual code in order to facilitate specific visual elements or other aspects of the technology. developers should consider testing on multiple platforms and with two or more screen readers to catch these inconsistencies, and should also consider whether simple html alternatives are possible in place of more complicated code, as sketched below. regarding padlet, it may be that the software developers used accessible rich internet applications (aria) code, which is known to cause inconsistencies for assistive technology. whenever possible, user experience should be consistent across screen readers: users should never be asked to switch assistive technologies in order to adapt to a tech tool.

although we sampled only five tech tools, when considering the breadth of other tools on the market and those that may not yet be developed, we wonder whether our results could indicate an abundance of unique environments with unique digital accessibility problems. this inference suggests that software developers may not be creating tech tools with digital accessibility in mind, or may be testing with only one type of screen reader. it also speaks to the lack of digital accessibility best practices in software development for educational tech tools.
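the "simple html alternatives" point is worth making concrete. a minimal sketch (hypothetical code, not padlet's actual implementation) of the difference between a custom aria widget and the native element it imitates:

```typescript
function addPost(): void {
  console.log("post added"); // placeholder action for the example
}

// custom widget: a div only becomes button-like with a role, a tabindex, and
// its own keyboard handling; mistakes in any of these are the kind of thing
// that can produce different behavior in voiceover, nvda, and jaws.
const divButton = document.createElement("div");
divButton.setAttribute("role", "button");
divButton.tabIndex = 0;
divButton.textContent = "add post";
divButton.addEventListener("click", addPost);
divButton.addEventListener("keydown", (e) => {
  if (e.key === "Enter" || e.key === " ") addPost();
});

// native element: a <button> is focusable, keyboard-operable, and announced
// consistently by screen readers with no extra code.
const nativeButton = document.createElement("button");
nativeButton.textContent = "add post";
nativeButton.addEventListener("click", addPost);

document.body.append(divButton, nativeButton);
```

the native element generally behaves the same way across screen readers with no extra code, which is why native html is usually the safer default.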
if anything, our results also illustrate the complexity of tech tool environments and the nuances of assistive technology. digital accessibility benefits users with different abilities digital accessibility is valuable for everyone, not just people with disabilities. two specific values illustrate this comprehensive benefit. first, if standards for digital accessibility are followed, digital content will be more “portable across platforms, browsers, and operating systems.”41 this interoperability could mean that learning content and properly formatted tech tools will be easy to use across assistive technologies and devices such as smartphones. secondly, accessible features benefit people who do not see themselves as having a disability. 42 for example, covid-19 amplified the benefits of using captioning for all learners, even when these learners did not have a specific disability.43 a 2004 microsoft survey also inferred that accessibility features benefit a wide variety of people.44 while a person with a disability benefits from clear organization, headings, labels, and color contrast, those aspects are also helpful for all users. recommendations and next steps planning with intention teaching librarians need to invest time learning about the environment of a tech tool they decide to use in il instruction. sometimes, tech tools that are digitally accessible are not easy or intuitive for instructors to use. we experienced this “easy to use” versus digital accessibility conflict when preparing the scripts for mentimeter and padlet. padlet is used extensively at our institution due perhaps to its instructor-friendly platform. however, padlet’s wall, shelf, and map assessments revealed many problems with digital accessibility. additionally, we had a difficult time creating a quiz in mentimeter, finding this platform unfriendly for the instructor; yet this tool had the fewest digital accessibility problems. this tension between ease of use and digital accessibility illustrates the importance of taking time to read and understand documentation and training materials before creating engagement activities for il sessions. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 20 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson we encourage teaching librarians to work with their local digital accessibility offices to evaluate the technology used most frequently in il classrooms. if a digital accessibility office does not exist on your campus, you may wish to advocate for your administration to create one. even if your institution does not yet have a digital accessibility office, there are ways for librarians to plan their il sessions with accessibility in mind. librarians may do basic assessments of tech tools without access to assistive technology by testing whether it is possible to access all features in a given tool using the tab keyboard key. if there is a function or action that you cannot access with tab or that you must use a mouse to navigate to, then that part of the tool will not be accessible to someone using a screen reader. you can also unplug or turn off the mouse and attempt using a tech tool. librarians can approach each tech tool by asking: is there anything in the tool that uses only images or colors? do sounds convey a meaning that is not otherwise communicated on the screen? 
if there is anything in the tool that relies on a single form of sensory feedback, it may be unperceivable to people using assistive technology. finally, we strongly suggest considering whether these tools add value to il instruction. if you like a certain tool but know it is inaccessible (or you are unsure), consider trying a different way of involving students in the same kind of engagement. think about simplifying the tech tools that you do use. extend or turn off timers where possible if you choose to use kahoot!, mentimeter, or other quiz-making tools. avoid using questions on any platform that require users to engage with images, even if alt text is provided, because they tend to be more difficult for screen readers. pursue documentation and take time to understand various options for each tool and each question, then weigh which option will be most accessible for most students. it takes time and energy to plan ahead with intention but increasing the ability of all students to engage in learning makes the planning process worthwhile. collaboration if collaboration between librarians and digital accessibility experts is possible on your campus, take the time to talk to one another about learning outcomes and reasons for using specific tech tools. consult with experts in digital accessibility who can also help you advocate for accessibility clauses in purchase contracts before agreeing to subscribe to a given tool or service. you may also foster collaboration with an inclusive community of practice if you have one at your library. further, the teaching and learning unit on your campus may offer support for integrating technology with pedagogy to promote the engagement and learning experience of all students attending il instruction. this collaboration may be impactful for the library and the campus teaching community. as librarians with teaching responsibilities, we usually do not work in isolation. instruction librarians can also serve as a resource for teaching faculty who may want to incorporate accessible tech tools into their instruction. in addition, librarians could investigate professional organizations that provide support and development in understanding digital accessibility. while a framework for assessing tech tools for accessibility does not currently exist, the development of standards and best practices would be beneficial for librarians, software developers, and accessibility professionals alike. we hope to undertake future research and consultation to develop such frameworks with colleagues, possibly through ala round tables or acrl sections focused on instruction and accessibility. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 21 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson next steps our next steps include sharing these reports with the companies who created the five tools we tested. we will ask them to prioritize both the most severe issues and those issues that are easy to fix and that impact user experience. we will also underscore those areas that surprised us, such as inconsistencies between screen readers for the same issue in a given environment. the goal of this outreach will be to build relationships with tech tool developers so that continued dialogue and testing can occur. the ultimate goal is a more accessible learning environment for everyone with technology vendors as partners in this journey. 
conclusion advocating for digital accessibility in research libraries requires relationship and capacity building. the challenges faced during emergency remote learning illustrate the necessity of campus units working together to ensure student inclusion and success. increased collaborations between academic libraries, tech tool developers, and digital accessibility offices mean that all parties can benefit from mutual expertise. librarians may share the kinds of tech tools being used in il sessions, while accessibility offices may test those tools and provide recommendations for improvement, which may then be leveraged when working with software companies to advocate for positive change. if more people are aware of digital accessibility vocabulary, needs, and resources across campus, that can also augment the number of people available to respond to and triage needs when future emergencies arise. acknowledgment we would like to thank scott holman and eric klinger from the cu boulder writing center for their help revising this manuscript. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 22 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson appendix a: tester instruction and script background poll everywhere is an online platform used in classrooms to engage with students through questions, surveys, and polls. people can sign up for a free account or for one of four subscription based account options. the free account allows users to create unlimited questions, have access to webinar tutorials, and upload images as question choices. this tool also allows people to respond via browser, sms, or app; to export data and screenshots; and to share to social networks, though some of these features are limited with a free account. poll everywhere script 1. type in your browser or click on the link provided. a pop-up might show on your screen. agree to the cookie policy if it does. 2. you may be prompted to introduce yourself and enter the screen name you would like to appear alongside your responses. 3. click continue. the survey will let you know that there are six questions. click start survey. 4. the first question is multiple choice. select your favorite sport. 5. click next on the upper right-hand corner. 6. the second question is a short response. type your favorite ice cream flavor. click submit. you can enter as many answers as you want. when you are ready to go to the next question click next. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 23 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson 7. the third question is also a short response. type in your favorite food. you can enter as many answers as you want. when you are ready to go to the next question click next. 8. the fourth question is also a short response. type what you are looking forward to this semester. you can enter as many answers as you want. when you are ready to go to the next question click next. 9. the fifth question is a clickable image question. click on the face that describes how you are feeling today. for this question, if you want to clear your response and enter a new one, you may do so by clicking “clear last response.” when you are ready, click next. 10. the sixth question is a ranking question. you need to use the arrow feature, which appears when you click next to the image. 
move images up and down organizing them from favorite to least favorite. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 24 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson for this question, if you want to clear your response and enter a new one, you may do so by clicking “clear last response.” when you are ready, click submit. 11. click finish in the upper right-hand corner. a screen will appear that says “all done!” the results of the survey are only available when the creator of the survey presents them in class. we were not able to figure out a way for respondents to access group responses asynchronously. notes • we noticed that when preparing questions 4 and 6, we were not prompted to enter alt-text by default. • the creator of the poll must enable alt-text for clickable image questions (such as question 4) by going to the user profile and selecting “features lab.” • alt-text did not seem to be available for question 6. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 25 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson appendix b: digital accessibility assessment report for poll everywhere information • testing tools: o windows 10 / jaws 2021/ nvda 2021.3.1 / google chrome (most recent version) o pixel 3a, android 12 / talkback / last updated 9/30/21 o iphone 12 mini / voiceover / safari ios 14.3 • testing dates: february 27, 28, and march 3, 2022 summary this document provides an overview of the issues the digital accessibility office (dao) identified on the poll everywhere platform. overall, we found the site to be relatively accessible for many individuals with disabilities or who use assistive technologies or alternative forms of access depending on the question type. the questions with images—to rank or select—were inaccessible. that said, through our testing, we found five severe issues, two significant issues, one minor issue, and one usability issue. severe issues represent items that create access barriers and need to be remediated, significant issues represent items that create a great deal of difficulty and should be remediated, and minor issues represent items that are the lowest priority but would be good to remediate. usability issues can impact users of any ability. if there are questions, concerns, or the desire to see demos of the issues presented in this report, please reach out to the assessment & usability testing team. please also consider filling out the assessment & usability testing feedback form to help us improve our testing protocols. issues severe graphics are unlabeled or inappropriately labeled • in question 6, there are four pictures of animals. the screen readers read all four images as “unlabeled images.” there is no differentiation between the four images. appropriate image descriptions are needed. o additionally, while reviewing the history of submissions, the answers are a list that read “(an image), (an image), (an image), (an image)” information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 26 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson • there were several elements that have dots in the label name. when using voiceover, these elements were read as “unpronounceable. [braille dots] ...” followed by numbers. 
o these elements included the marker on the emoji image, the up and down arrow buttons on question 6, and the finished icon. element presents gesture/navigation traps • on question 6, while using voiceover and talkback, the user could not swipe between the answer options. this made the buttons, links, and text before and after the options inaccessible. o a tester was able to leave the trap, but they had to use direct touch and focus landed outside the answers. o additionally, while using talkback, there was not any indication that the image was being moved up or down. element not available by keyboard or screen reader • question 5 is an emoji question where the user would need a mouse or direct touch (while not using a screen reader) to answer successfully. the alternative text says there are emojis, but the user does not know what five emotes or different colors are presented. to activate, the user selects “enter” or double taps (mobile screen reader). this makes a random selection and places the marker in the middle of the image without a way to move the marker to the appropriate emoji. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 27 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson errors do not get focus • during one instance, a user received an error that the response was not submitted. during this one instance, the focus was not pushed to the error message. the user would have to know it was there. ideally, the focus would be pushed to the error so all users would be aware that an error had occurred. significant element state not indicated • in question 6, the unlabeled image reads as “clickable” to nvda. when selecting enter, the state of the element is not announced. visually, the image gets larger. inconsistent focus handling • focus handling for all tools could be improved. focus goes to different areas of the page after responses, returning to previous questions, or refreshing the page. o focus inconsistencies depended on the screen reader. while going through the questions, focus would go to the top of the page, “close app download offer” button, “submit,” or “next.” ideally, focus would be on the heading 1 for each question. minor same information is presented to screen reader multiple times • while using voiceover, the instructions, questions, and answers were announced multiple times. this was noted on several occasions. information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 28 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson usability insufficient color contrast (4.5:1) • in the multiple-choice question, after selecting an answer, the question’s color becomes lighter. the lighter color has insufficient color contrast for both the answer selected (2:1) and the answers not selected (1.8:1). information technology and libraries december 2022 tech tools in pandemic-transformed information literacy instruction 29 rybin koob, ibacache, williamson, lamont-manfre, hugen, and dickerson endnotes 1 kathia ibacache et al., “emergency remote library instruction and tech tools: a matter of equity during a pandemic,” information technology and libraries 40, no. 2 (2021): 8, https://doi.org/10.6017/ital.v40i2.12751. 2 there are many examples of guides which include ada compliance and library spaces, including but not limited to william w. sannwald, checklist of library building design considerations, 6th ed. 
(chicago: ala editions, an imprint of the american library association, 2016); and carrie scott banks et al., including families of children with special needs: a how-to-do-it manual for librarians (chicago: american library association, 2014). 3 "introduction to the ada," ada.gov: united states department of justice civil rights division, accessed june 15, 2022, https://www.ada.gov/ada_intro.htm. 4 "introduction to the ada." 5 ariel pomputius, "assistive technology and software to support accessibility," medical reference services quarterly 39, no. 2 (2020): 203, https://doi.org/10.1080/02763869.2020.1744380. 6 fernando h. f. botelho, "accessibility to digital technology: virtual barriers, real opportunities," assistive technology 33, no. s1 (2021): s31, https://doi.org/10.1080/10400435.2021.1945705. 7 botelho, "accessibility to digital technology," s27. 8 jonathan lazar, "planning for digital accessibility in research libraries," research libraries issues, no. 302 (2021): 20, https://doi.org/10.29242/rli.302.3. 9 "digital accessibility," georgetown law, accessed june 15, 2022, https://www.law.georgetown.edu/your-life-career/campus-services/information-systems-technology/digital-accessibility/. 10 lazar, "planning for digital accessibility," 21. 11 lazar, "planning for digital accessibility," 19. 12 lazar, "planning for digital accessibility," 26–28. 13 education of individuals with disabilities, 20 u.s.c. §§ 1400–1485 (suppl. 2 1988), https://tile.loc.gov/storage-services/service/ll/uscode/uscode1988-03202/uscode1988-032020033/uscode1988-032020033.pdf; see also kathleen puckett and kim w. fisher, "assistive technology," in the sage encyclopedia of intellectual and developmental disorders, ed. ellen b. braaten (thousand oaks, ca: sage publications, inc., 2018), 100–101. 14 puckett and fisher, "assistive technology," 100. 15 claire kearney-volpe and amy hurst, "accessible web development: opportunities to improve the education and practice of web development with a screen reader," acm transactions on accessible computing 14, no. 2 (july 21, 2021): 8:2. 16 emily k. chan and lorrie a. knight, "'clicking' with your audience: evaluating the use of personal response systems in library instruction," communications in information literacy 4, no. 2 (march 1, 2011): 192–201, https://doi.org/10.15760/comminfolit.2011.4.2.96. 17 referring to the use of padlet to foster collaboration in a statistics course, henrik skaug saetra suggests that students feel more welcome to ask basic questions in an anonymous environment, in "using padlet to enable online collaboration mediation and scaffolding in a statistics course," education sciences 11, no.
17 referring to the use of padlet to foster collaboration in a statistics course, henrik skaug saetra suggests that students feel more welcome to ask basic questions in an anonymous environment, in "using padlet to enable online collaboration mediation and scaffolding in a statistics course," education sciences 11, no. 5 (2021), 6, https://eric.ed.gov/?id=ej1297247. christopher j. e. anderson notes that anonymity invites classroom discussion participation for introverted students and states that answers can be reviewed without "requiring participants to reveal their choice, thus removing stigmas that keep many introverted students from orally participating," in "repurposing digital devices: using poll everywhere as a vehicle for classroom participation," journal of teaching and learning with technology 7 (2018): 154, https://eric.ed.gov/?id=ej1307006.
18 citing a 2010 article by b. jean mandernach and jana hackathorn, jared hoppenfeld states that anonymity provides a means for students to answer honestly, in "keeping students engaged with web-based polling in the library instruction session," library hi tech 30, no. 2 (2012): 238, https://doi.org/10.1108/07378831211239933. see also anderson, "repurposing digital devices," 154.
19 this paper considered pedagogical approaches when using padlet in the classroom, noting that this tool did not enhance criticality or students' desire to counter a post by a classmate; see ann rosnida md deni and zainor izat zainal, "padlet as an educational tool: pedagogical considerations and lessons learnt," proceedings of the 10th international conference on education technology and computers (october 2018), 157, https://doi.org/10.1145/3290511.3290512.
20 some authors surmise that an instructional game could be used to prepare students for exams, for example, patricia a. baszuk and michele l. heath, "using kahoot! to increase exam scores and engagement," journal of education for business 95, no. 8 (2020): 550, https://doi.org/10.1080/08832323.2019.1707752. examining technology as a tool to enhance teaching and learning in engineering classes, vian ahmed and alex opoku mentioned that students found kahoot! a useful online tool that helped them reflect, apply knowledge, and receive feedback, in "technology supported learning and pedagogy in times of crisis: the case of covid-19 pandemic," education and information technologies 27 (2021), https://doi.org/10.1007/s10639-021-10706-w. darren h. iwamoto et al. assert that kahoot! provided students with a fun activity that helped them memorize important concepts, in darren h. iwamoto, jace hargis, erik jon taitano, and kv vuong, "analyzing the efficacy of the testing effect using kahoot! on student performance," turkish online journal of distance education 18, no. 2 (2017): 83, 89, https://eric.ed.gov/?id=ej1145220.
21 iwamoto et al., "analyzing the efficacy," 83, 89.
22 carolyn m. plump and julia larosa, "using kahoot! in the classroom to create engagement and active learning: a game-based technology solution for elearning novices," management teaching review 2, no. 2 (2017): 156, https://doi.org/10.1177/2379298116689783.
23 maureen knapp, "technology for one-shot instruction and beyond," journal of electronic resources in medical libraries (2014): 224, https://doi.org/10.1080/15424065.2014.969969.
24 liya deng, "assess and engage: how poll everywhere can make learning meaningful again for millennial library users," journal of electronic resources librarianship 31, no. 2 (2019): 63, https://doi.org/10.1080/1941126x.2019.1597437.
25 a survey- and interview-based study by engineering faculty members ahmed and opoku examined both instructors' and students' perceptions of technology-supported learning during times of crisis. with regard to technological and pedagogical best practices, student participants noted that interactive feedback tools such as kahoot! helped them synthesize and apply their knowledge. as one student said, "kahoot! was a fun and interactive application and engaging." see ahmed and opoku, "technology supported learning and pedagogy," 381.
26 razzaqul ahshan, "a framework of implementing strategies for active student engagement in remote/online teaching and learning during the covid-19 pandemic," education sciences 11, no. 9 (2021): 487, https://doi.org/10.3390/educsci11090483.
27 jillianne code, rachel ralph, and kieran forde, "pandemic designs for the future: perspectives of technology education teachers during covid-19," information and learning sciences 121, no. 5/6 (january 1, 2020): 419–31, https://doi.org/10.1108/ils-04-2020-0112.
28 code, ralph, and forde, "pandemic designs," 426.
29 one such study examines the impact of covid-19 on higher education in ethiopia; see berhanu abera, "the effects of covid-19 on ethiopian higher education and their implication for the use of pandemic-transformed pedagogy: 'corona batches' of addis ababa university in focus," journal of international cooperation in education 24, no. 2 (2021): 3–25. another study focuses on polish primary school integration of ipads; see lucyna kopciewicz and hussein bougsiaa, "understanding emergent teaching and learning practices: ipad integration in polish school," education and information technologies 26, no. 3 (2021): 2916, https://doi.org/10.1007/s10639-020-10383-1. a third article explores pandemic-transformed pedagogy from the perspectives of early childhood instructors in the caribbean; see sabeerah abdul-majied, zoyah kinkead-clark, and sheron c. burns, "understanding caribbean early childhood teachers' professional experiences during the covid-19 school disruption," early childhood education journal (2022), https://doi.org/10.1007/s10643-022-01320-7.
30 paul f. burke et al., "exploring teacher pedagogy, stages of concern and accessibility as determinants of technology adoption," technology, pedagogy and education 27, no. 2 (2018): 149–63, https://doi.org/10.1080/1475939x.2017.1387602.
31 burke et al., "exploring teacher pedagogy," 158–59.
32 ibacache, rybin koob, and vance, "emergency remote library instruction," 9.
33 the authors of this study hold roles as academic subject specialist librarians and digital accessibility office staff, including accessibility and usability team testers.
34 "free accessibility tools and assistive technology you can use today," bureau of internet accessibility (blog), october 26, 2018, https://www.boia.org/blog/free-accessibility-tools-and-assistive-technology-you-can-use-today; see also "chapter 1, introducing voiceover," in voiceover getting started guide, apple, inc., accessed june 16, 2022, https://www.apple.com/voiceover/info/guide/_1121.html.
35 "get started on android with talkback," android accessibility help, accessed june 16, 2022, https://support.google.com/accessibility/android/answer/6283677?hl=en.
36 "about nvda," nv access, accessed june 17, 2022, https://www.nvaccess.org/about-nvda/.
37 jaws was developed by freedom scientific members with vision loss; see "jaws®—freedom scientific," accessed june 16, 2022, https://www.freedomscientific.com/products/software/jaws/.
38 "colour contrast analyser (cca)," tpgi, accessed june 16, 2022, https://www.tpgi.com/color-contrast-checker/.
39 ibacache, rybin koob, and vance, "emergency remote library instruction," 9.
40 jamboard is very visual, with multiple options such as sticky notes, drawing pens, and image searching. other tools such as kahoot! and mentimeter are not solely visual; they also include additional moving parts, such as sounds and other notifications.
41 lazar, "planning for digital accessibility," 21.
42 lazar, "planning for digital accessibility," 21.
43 lazar indicated that captioning benefits people who process information in different ways, who are learning the language being used, or who may otherwise struggle to understand a dialect, in "planning for digital accessibility," 21.
44 forrester research, inc., accessible technology in computing: examining awareness, use, and future potential, redmond, wa: microsoft corporation (2004): 9, http://download.microsoft.com/download/0/1/f/01f506eb-2d1e-42a6-bc7b-1f33d25fd40f/researchreport-phase2.doc.
the open access citation advantage: does it exist and what does it mean for libraries?
colby lewis
information technology and libraries | september 2018
colby lewis (colbyllewis@gmail.com), a second year master of science in information student at the university of michigan school of information, is winner of the 2018 lita/ex libris student writing award.
abstract
the last literature review of research on the existence of an open access citation advantage (oaca) was published in 2011 by philip m. davis and william h. walters. this paper reexamines the conclusions reached by davis and walters by providing a critical review of oaca literature that has been published since 2011 and explores how increases in open access publication trends could serve as a leveraging tool for libraries against the high costs of journal subscriptions.
introduction
since 2001, when the term "open access" was first used in the context of scholarly literature, the debate over whether there is a citation advantage (ca) caused by making articles open access (oa) has plagued scholars and publishers alike.1 to date, there is still no conclusive answer to the question, or at least not one that the premier publishing companies have deemed worthy of acknowledging. there have been many empirical studies, but far fewer with randomized controls. the reasons for this range from data access to the numerous potential "methodological pitfalls" or confounding variables that might skew the data in favor of one argument or another. the most recent literature review of articles that explored the existence (or lack thereof) of an open access citation advantage (oaca) was published in 2011 by philip m. davis and william h. walters. in that review, davis and walters ultimately concluded that "while free access leads to greater readership, its overall impact on citations is still under investigation. the large access-citation effects found in many early studies appear to be artifacts of improper analysis and not the result of a causal relationship."2 this paper seeks to reexamine the conclusions reached by davis and walters in 2011 by providing a critical review of oaca literature that has been published since their 2011 literature review.3 this paper will examine the methods and conclusions provoking such criticisms and whether these criticisms are addressed in the studies.
i will begin by identifying some of the top confounders in oaca studies, in particular the potential for self-archiving bias. i will then examine articles from july 2011, when davis and walters published their findings, to july 2017. there will be a few exceptions to this time frame, but the studies cited in figures 4 and 5 are entirely from this period. in addition to reviewing oaca studies since davis and walters' march 2011 study, i will explore the implications of an oaca on the future of publishing and the role of librarians in the subscription process. as antelman points out in her association of college and research libraries conference paper, "leveraging the growth of open access in library collection decision making," it is the responsibility of libraries to use the newest data and technology available to them in the interest of best serving their patrons and advancing scholarship.4 in connecting oaca studies and the potential bargaining power an oaca could bring libraries, i assess the current roles that universities and university libraries play in promoting (or not) oa publications and the implications of an oaca for researchers, universities, and libraries, and i provide suggestions on how recent research could influence the present trajectory. i conclude by summarizing what my findings tell us about the existence (or lack thereof) of an oaca, and what these findings imply for the future of library journal subscriptions and the publish-or-perish model for tenure. lastly, i will suggest some alternative metrics to citations that could be used by libraries in determining future journal subscriptions and general collection management.
self-archiving bias and why it doesn't matter
the idea of a self-archiving bias is based upon the concept that, if faced with a choice, authors will always opt to make their best work more widely available. effectively, when open access is not mandated, these articles may be specifically chosen to be made open access to increase readership and, hypothetically, citations.5 this biased selection method has the potential to confound the results of oaca studies because of the intuitive notion that an author's best work is much more likely to be cited than any of their other work. its effect is amplified by making this work available oa, but it prevents studies in which articles were self-archived from being able to convincingly claim that the citation advantage these articles received was due to oa and not to its inherent quality and subsequent likelihood to be cited anyway.
in a 2010 study, gargouri et al. determined that articles by authors whose institutions mandated self-archiving (such as in an institutional repository [ir]) saw an oaca just as great for articles that were mandated to be oa as for articles that were self-selected to be oa.6 this by no means proves a causal relationship between oa and ca, but does counter the notion that self-archived articles are an uncontrollable confounder that automatically compromises the legitimacy of oaca studies.7 ottaviani affirms this conclusion in a 2016 study in which he writes, "in the long run better articles gain more citations than expected by being made oa, adding weight to the results reported by gargouri et al."8 in short, claiming that articles self-selected for self-archiving irreparably confound oaca studies ignores the fact that these authors have accounted for the likelihood that articles of higher quality will inherently be cited more. as gargouri et al. put it, "the oa advantage [to self-archived articles] is a quality advantage, rather than a quality bias" (italics in original).9
gold versus green and their effect on oaca analyses
many critics of oaca studies have argued that such studies do not distinguish between gold oa, green oa, and hybrid (subscription journals that offer the option for authors to opt-in to gold oa) journals in their sample pool, thus skewing the results of their studies. in fact, there are many acknowledged subcategories of oa, but for the purposes of this paper, i will primarily focus on gold, green, and hybrid oa. figure 1, provided by elsevier as a guide for their clients, distinguishes between gold and green oa.10 while the chart provided applies specifically to those looking to publish with elsevier, it highlights the overarching differences between gold oa and green oa. a comprehensive list of oa journals is available through the directory of open access journals (doaj) website (https://doaj.org/).
figure 1. elsevier explains to potential clients their options for publishing oa with elsevier and the differences between publishing with gold oa versus green oa.
the argument that not distinguishing between gold oa and green oa in oaca studies distorts study results primarily stems from the potential for skew in green oa journals. green oa journals allow authors to self-archive their articles after publication, but the articles are often not made full oa until an embargo period has passed. this problem was addressed in a recent study conducted by science-metrix and 1science, who manually checked and coded approximately 8,100 top-level domains (tlds).11 it is important to note that this study was made available as a white paper on the 1science website and has not been published in a peer-reviewed journal. additionally, 1science is a company built on providing oa solutions to libraries, which means they have a vested interest in proving the existence of an oaca. however, just as publishers such as elsevier have a vested interest in a substantial oaca not existing, this should not prevent us from examining their data. for their study, 1science did not distinguish hybrid journals as being in a distinct journal category.
critics, such as the editorial director of journals policy for oxford university press, david crotty, were quick to fixate on this lack of distinction as a means of discrediting the study.12 employees of elsevier were similarly inclined to criticize the study, declaring that it, "like many others [studies] on this topic, does not appear to be randomized and controlled."13 however, archambault et al., acknowledging that their study "does not examine the overlap between green and gold," have provided an extremely comprehensive sample pool, examining 3,350,910 oa papers published between 2007 and 2009 in 12,000 journals.14 this paper examines the notion that "the advantage of oa is partly due to citations having a chance to arrive sooner . . . and concludes that the purported head start of oa papers is actually contrary to observed data."15
in a more recent study published in february 2018, piwowar et al. examine the prevalence of oa and average relative citation (arc) based on three sample groups of one hundred thousand articles each: "(1) all journal articles assigned a crossref doi, (2) recent journal articles indexed in web of science, and (3) articles viewed by users of unpaywall, an open-source browser extension that lets users find oa articles using oadoi."16 unlike the 1science study, piwowar et al. had a twofold purpose: to examine the prevalence of oa articles available on the web and whether an oaca exists based on their sample findings. i do not include their results in my literature review because of the dual focus of their study, although i do compare their results with those of archambault et al. and analyze the implications of their findings.
bronze: neither gold nor green
in their article, piwowar et al. introduce a new category of oa publication: bronze. if gold oa refers to complete open access at the time of publication, and green oa refers to articles published in a paywalled journal but ultimately made oa either after an embargo period or via an ir, bronze oa refers to oa articles that somehow don't fit into either of these categories. piwowar et al. define bronze oa articles as "free to read on the publisher page, but without any clearly identifiable license."17 however, as crotty points out in a scholarly kitchen article reflecting on the preprint version of piwowar et al.'s article, "bronze" already exists as an oa category, but has simply been called "public access."18 while coining "bronze" as a new term for "public access" is helpful in connecting it to oa terms such as "green" and "gold," it is not quite the new phenomenon it is touted to be.
arc as an indication of an oaca
both archambault et al. and the authors of the 1science paper provide the arc as a means of establishing a paper's impact on the larger research community.19 within their arc analyses, archambault et al. distinguish between non-oa and oa, within which they differentiate between gold and green oa (figure 2). piwowar et al. group papers by closed (non-oa) and oa, with the following oa subcategories: bronze, hybrid, gold, and green oa (figure 3). an arc of 1.0 is the expected amount of citations an article will receive "based on documents published in the same year and [national science foundation (nsf)] specialty."20 based on this standard, articles with an arc above or below 1.0 represent a citation impact that percentage above or below the expected citation impact of like articles. for example, an article with an arc of 1.23 has received 23 percent more citations than expected for articles of similar content and quality. this scale can be incredibly useful in determining the presence of a citation advantage, and it can enable researchers to determine overall ca patterns.
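concretely, an arc is an article's citation count divided by the mean citation count of its publication-year and nsf-specialty cohort, averaged over whatever group of articles is being compared. the python sketch below uses invented numbers purely to show the shape of the calculation; the actual computations in archambault et al. and piwowar et al. are more elaborate.

```python
# minimal sketch of an average relative citation (arc) calculation.
# the years, specialties, and citation counts here are made up for illustration.

from collections import defaultdict
from statistics import mean

articles = [
    # (year, nsf_specialty, is_oa, citations)
    (2014, "ecology", True, 12),
    (2014, "ecology", False, 7),
    (2014, "ecology", True, 9),
    (2014, "ecology", False, 5),
    (2015, "dentistry", True, 3),
    (2015, "dentistry", False, 4),
]

# expected citations = mean citations of all articles published in the same
# year and specialty (the "like articles" baseline described above)
baseline = defaultdict(list)
for year, field, _, cites in articles:
    baseline[(year, field)].append(cites)
expected = {key: mean(vals) for key, vals in baseline.items()}

# relative citation of one article = citations / expected citations;
# the arc of a group is the mean of those ratios
def arc(group):
    return mean(c / expected[(y, f)] for y, f, _, c in group)

oa_arc = arc([a for a in articles if a[2]])
closed_arc = arc([a for a in articles if not a[2]])
print(f"oa arc = {oa_arc:.2f}, closed arc = {closed_arc:.2f}")
```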
figure 2. research impact of paywalled (not oa) versus open access (oa) papers "computed by science-metrix and 1science using oaindx and the web of science." archambault et al., "research impact of paywalled versus open access papers," white paper, science-metrix and 1science, 2016, http://www.1science.com/1numbr/.
critics' fixation on the "randomized and controlled" nature of the 1science study ignores the fact that the authors do not claim causation. rather, their findings suggest the existence of an oaca when comparing oa (in all forms) and non-oa (in any form) articles (see figure 2). the authors ultimately conclude that "in all these fields, fostering open access (without distinguishing between gold and green) is always a better research impact maximization strategy than relying on strictly paywalled papers."21 unlike archambault et al., piwowar et al. found that gold oa articles had a significantly lower arc, and that the average arc of all oa balances out to 1.18 because of the high arcs of bronze (1.22), hybrid (1.31), and green (1.33). however, both studies found that non-oa (referred to by piwowar et al. as "closed") articles had an arc below 1.0, suggesting a definitive correlation between oa (without specifying type) and an increase in citations.
figure 3. "average relative citations of different access types of a random sample of web of science (wos) articles and reviews with a digital object identifier (doi) published between 2009 and 2015." heather piwowar et al., "the state of oa: a large-scale analysis of the prevalence and impact of open access articles," peerj, february 13, 2018, https://doi.org/10.7717/peerj.4375.
six years and what has changed in oaca research
between july 2011 and the publication of piwowar et al.'s work in february 2018, nine new oaca studies have been published in peer-reviewed journals. of these, five only look at the oaca in one field, such as cytology or dentistry. the other four are multidisciplinary studies, two of which are repository-specific and only use articles from deep blue and academia.edu, respectively. this is important to note because of critics' earlier stated objections to the use of studies that are not randomized controlled studies. however, the deep blue study can still be considered a randomized controlled sample group because the authors are not self-selecting articles to upload to the repository as they are with academia.edu. rather, articles were made accessible through deep blue "via blanket licensing agreements between the publishers and the [university of michigan] library."22 some of the field-specific studies use sample sizes that may not reflect a general oaca, but rather one only for that field, and in certain cases, only for a single journal.
field-specific studies
between july 2011 and july 2017, five field-specific studies were conducted to determine whether an oaca existed in those fields. i summarize the scope and conclusions of these studies in table 1.
as you can see from the table, the article sample size vastly varied between studies, but that can likely be accounted for by considering the specific fields studied since there are only five major cytopathology journals and nearly fifty major ecology journals. piwowar et al. acknowledge this in their study, noting that the nsf assigns all science journals "exactly one 'discipline' (a high-level categorization) and exactly one 'specialty' (a finer-grained categorization)."23 the more deeply nested in an nsf discipline a subject is, the more specialized the field becomes and the fewer journals there are on the subject. this alone is reason not to extrapolate from the results of these studies and project their results on the existence of oaca across all fields.
only two of these studies, those focused on an oaca in dentistry and ecology, can be considered truly randomized controlled studies. both the cytopathology and marine ecology studies chose a specific set of journals from which to draw their entire sample pool. while the dentistry and ecology studies can be considered randomized controlled in nature, they still only reflect the occurrence (or lack thereof) of an oaca in those specific fields. it would be irresponsible to allow the results from studies in a single field of a single discipline to represent oaca trends across all disciplines. therefore, it is surprising that elsevier employees use the dentistry study to make such a claim. hersh and plume write, "another recent study by hua et al (2016) looking at citations of open access articles in dentistry found no evidence to suggest that open access articles receive significantly more citations than non-open access articles."24 the key phrase missing from the end of this analysis is in dentistry. one might question whether a claim about multidisciplinary oaca can effectively be extrapolated from a single-field analysis. the authors do, two sentences later, qualify their earlier statement by saying, "in dentistry at least, the type of article you publish seems to make a difference but not oa status."25 that is indeed what this study seems to show, and is therefore a logical claim to make. likewise, the three empirical studies in table 1 show that, for those respective fields, oa status does correlate to a citation advantage. in the case of the ecology study, the authors are confident enough in their randomized controlled methodology to claim causation.26 the ecology study is the most recently published oaca study, and its authors were able to learn from similar past studies about the necessary controls and potential confounders in oaca studies. with this knowledge, tang et al. determined that:
by comparing oa and non-oa articles within hybrid journals, our estimate of the citation advantage of oa articles sets controls for many factors that could confound other comparisons. numerous studies have compared articles published in oa journals to those in non-oa journals, but such comparison between different journals could not rule out the impacts of potentially confounding factors such as publication time (speed) and quality and impact (rank) of the journal. these factors are effectively controlled with our focus on hybrid journals, thereby providing robust and general estimates of citation advantages on which to base publication decisions.27
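the within-journal design tang et al. describe can be reduced to a simple computation: compare oa and non-oa citation counts only inside the same hybrid journal, then average the per-journal differences so that journal-level factors cancel out. the sketch below uses invented journal names and citation counts purely for illustration; the published study additionally pairs articles and controls for author-country gni, article length, and other covariates.

```python
# minimal sketch of a within-journal oa vs. non-oa comparison.
# all records are invented; this is a simplified stand-in for tang et al.'s design.

from statistics import mean

records = [
    # (journal, is_oa, citations)
    ("journal a", True, 30), ("journal a", False, 22),
    ("journal a", True, 25), ("journal a", False, 18),
    ("journal b", True, 10), ("journal b", False, 9),
    ("journal b", True, 12), ("journal b", False, 7),
]

journals = {j for j, _, _ in records}
per_journal_advantage = []
for journal in journals:
    oa = [c for j, is_oa, c in records if j == journal and is_oa]
    non_oa = [c for j, is_oa, c in records if j == journal and not is_oa]
    # journal-level factors (rank, speed, field) are shared by both groups here
    per_journal_advantage.append(mean(oa) - mean(non_oa))

print(f"mean within-journal oa citation advantage: {mean(per_journal_advantage):.1f} citations")
```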
summary of key field-specific studies
author | study design | content | number of articles | controls | results, interpretation, and conclusion
clements 2017 | empirical | 3 hybrid-oa marine ecology journals | all articles published in these journals between 2009 and 2012; specific number not provided | jif; article type; self-citations | "on average, open access articles received more peer-citations than non-open access articles." oaca found.
frisch et al. 2014 | empirical | 5 cytopathology journals; 1 oa and 4 non-oa | 314 articles published between 2007 and 2011 | jif; author frequency; publisher neutrality | "overall, the averages of both cpp and q values were higher for oa cytopathology journal (cytojournal) than traditional non-oa journals." oaca found.
gaulé and maystre 2011 | empirical | 1 major biology journal | 4,388 articles published between 2004 and 2006 | last author characteristics; article quality | "we find no evidence for a causal effect of open access on citations. however, a quantitatively small causal effect cannot be statistically ruled out." oaca not found.
hua et al. 2016 | randomized controlled | articles randomly selected from pubmed database, not specific dentistry journals | 908 articles published in 2013 | randomized article selection; exclusion of articles unrelated to dentistry; multidatabase search to determine oa status | "in the present study, there was no evidence to support the existence of oa 'citation advantage', or the idea that oa increases the citation of citable articles." oaca not found.
tang et al. 2017 | randomized controlled | 46 hybrid-oa ecology journals | 3,534 articles published between 2009 and 2013 | gni of author country; randomized article pairing; article length | "overall, oa articles received significantly more citations than non-oa articles, and the citation advantage averaged approximately one citation per article per year and increased cumulatively over time after publication." oaca found.
table 1. scope, controls, and results of field-specific oaca studies since 2011. based on a chart in stephan mertens, "open access: unlimited web based literature searching," deutsches ärzteblatt international 106, no. 43 (2009): 711. jif, journal impact factor; cpp, citations per publication; q, q-value (see frisch, nora k., romil nathan, yasin k. ahmed, and vinod b. shidham. "authors attain comparable or slightly higher rates of citation publishing in an open access journal (cytojournal) compared to traditional cytopathology journals—a five year (2007–2011) experience." cytojournal 11, no. 10 (april 2014). https://doi.org/10.4103/1742-6413.131739 for specific equation used.)
summary of key multidisciplinary studies
author | study design | content | number of articles | controls | results, interpretation, and conclusion
mccabe and snyder 2014 | empirical | 100 journals in ecology, botany, and multidisciplinary science | all articles published in these journals between 1996 and 2005; specific number not provided | jif; journal founding year | "we found that open access only provided a significant increase for those volumes made openly accessible via the narrow channel of their own websites rather than the broader pubmed central platform." oaca found.
niyazov et al. 2016 | empirical | unspecified number of journals across 23 academic divisions | 31,216 articles published between 2009 and 2012 | field; jif; publication vs. upload date | "we find a substantial increase in citations associated with posting an article to academia.edu. . . . we find that a typical article that is also posted to academia.edu has 49% more citations than one that is only available elsewhere online through a non-academia.edu venue." oaca found for academia.edu.
ottaviani 2016 | randomized controlled | unspecified number of journals who have blanket licensing agreements between the publishers and the university of michigan library | 93,745 articles published between 1990 and 2013 | self-selection | "even though effects found here are more modest than reported elsewhere, given the conservative treatments of the data and when viewed in conjunction with other oaca studies already done, the results lend support to the existence of a real, measurable, open access citation advantage with a lower bound of approximately 20%." oaca found.
sotudeh et al. 2015 | empirical | 633 apc-funded oa journals published by springer and elsevier | 995,508 articles published between 2007 and 2011 | journals who adopted oa policies after 2007; journals with non–article processing charge oa policies | "the apc oa papers are, also, revealed to outperform the ta ones in their citation impacts in all the annual comparisons. this finding supports the previous results confirming the citation advantage of oa papers." oaca found.
table 2. scope, controls, and results of multi-disciplinary oaca studies since 2011. jif, journal impact factor; apc, article processing charge; ta, toll access.
based on the randomized controlled methodology that tang et al. found hybrid journals to provide, it is possible that this study may serve as an ideal model for future larger oaca studies across multiple disciplines. however, more field-specific hybrid journal studies will have to be conducted before determining if this model would be the most accurate method for measuring oaca across multiple disciplines in a single study.
multidisciplinary studies
the multidisciplinary oaca studies conducted since 2011 include a single randomized control study and three empirical studies (table 2). all these studies found an oaca; in the case of niyazov et al., an oaca was found specifically for articles posted to academia.edu. i included this study because it is an important contribution to the premise that a relationship exists between self-selection and oaca. niyazov et al. highlight this point in the section "sources of selection bias in academia.edu citations," explaining that "even if academia.edu users were not systematically different than non-users, there might be a systematic difference between the papers they choose to post and those they do not. as [many] . . . have hypothesized, users may be more likely to post their most promising, 'highest quality' articles to the site, and not post articles they believe will be of more limited interest."28 to underscore this point, i refer to gargouri et al., who stated that "the oa advantage [to self-archived articles] is a quality advantage, rather than a quality bias" (italics in original).29 again, it is unsurprising that articles of higher caliber are cited more and that making such articles more readily available increases the amount of citations they would likely already receive.
similar to my conclusion in the field-specific study section, we simply need more randomized controlled studies, such as ottaviani's, to determine the nature and extent of the relationship between oa and ca across multiple disciplines.
conclusions
critics of some of the most recent studies, specifically archambault et al. and ottaviani, have argued that authors of oaca studies are too quick to claim causation. while a claim of causation does indeed require strict adherence to statistical methodology and control of potential confounders, few of the authors i have examined actually claim causation. they recognize that the empirical nature of their studies is not enough to prove causation, but rather to provide insight into the correlation between open access and a citation advantage. in all their conclusions, these authors acknowledge that further studies are needed to prove a causal relationship between oa and ca. the recent work published by piwowar et al. provides a potential model for replication by other researchers, and ottaviani offers a replicable method for other large research institutions with non-self-selecting institutional repositories. alternatively, field-specific studies conducted in the style of tang et al. across all fields would serve to provide a wider array of evidence for the occurrence of field-specific oaca and therefore of a more widespread oaca.
recent developments in oa search engines have created alternative routes to many of the same articles offered by subscriptions, but at a fraction (if any) of the cost. antelman proposed that libraries use an oa-adjusted cost per download (oa-adj cpd), a metric that "subtracts the downloads that could be met by oa copies of articles within subscription journals," as a tool for negotiating the price of journal subscriptions.30 by calculating an oa-adj cpd, libraries could potentially leverage their ability to access journal articles through means other than traditional subscription bundles to save money and encourage oa publication. while antelman suggests using oa-adj cpd as a leveraging tool when making deals with publishers for journal subscriptions, i suggest that libraries use the data-gathering methods of piwowar et al. via unpaywall to determine whether enough articles from a specific journal can be found oa via unpaywall. by using metrics such as those collected by piwowar et al. through unpaywall, the potential confounding variable of articles found through illegitimate means (such as sci-hub) is alleviated. instead, piwowar et al.'s metrics focus on tracking the percentage of material searched by library patrons that can be found oa through the unpaywall browser extension. according to unpaywall's "libraries user guide" page, libraries "can integrate unpaywall into their sfx, 360 link, or primo link resolvers, so library users can read oa copies in cases where there's no subscription access. over 1000 libraries worldwide are using this now."31
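as a rough illustration of how these two ideas could be combined, the sketch below queries the public unpaywall rest api (which reports oa status per doi and asks for a contact email) for the articles patrons requested from one journal, and then computes an oa-adjusted cost per download by subtracting the downloads an oa copy could have satisfied. the dois, subscription cost, and download counts are invented, and this is only one possible way to operationalize antelman's metric.

```python
# minimal sketch: unpaywall lookups plus an oa-adjusted cost per download.
# the dois, price, and download counts below are hypothetical.

import requests

EMAIL = "your-email@example.edu"  # placeholder; unpaywall requests a real contact address

def is_oa(doi: str) -> bool:
    resp = requests.get(f"https://api.unpaywall.org/v2/{doi}", params={"email": EMAIL}, timeout=10)
    resp.raise_for_status()
    return bool(resp.json().get("is_oa"))

# hypothetical usage data for one subscribed journal: doi -> downloads in the period
requested_dois = {
    "10.1000/example.001": 120,
    "10.1000/example.002": 45,
    "10.1000/example.003": 210,
}
annual_subscription_cost = 4_500.00

total_downloads = sum(requested_dois.values())
oa_downloads = sum(n for doi, n in requested_dois.items() if is_oa(doi))

plain_cpd = annual_subscription_cost / total_downloads
# subtract downloads that an oa copy could have met, in the spirit of oa-adj cpd
oa_adjusted_cpd = annual_subscription_cost / max(total_downloads - oa_downloads, 1)
print(f"cost per download: {plain_cpd:.2f}; oa-adjusted: {oa_adjusted_cpd:.2f}")
```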
ideally, scholars will also be more willing to publish papers oa, and institutions will be more supportive of providing the necessary costs for making publications oa. though the publish-or-perish model still reigns in academia, there is great potential in encouraging tenured professors to publish oa by supplementing the costs through institutional grants and other incentives wrapped into a tenure agreement. perhaps through this model, as gargouri et al. have suggested, the longstanding publish-or-perish doctrine will give way to an era of "self-archive to flourish."32
bibliography
antelman, kristin. "leveraging the growth of open access in library collection decision making." acrl 2017 proceedings: at the helm, leading the transformation, march 22–25, baltimore, maryland, ed. dawn m. mueller (chicago: association of college and research libraries, 2017), 411–22. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/leveragingthegrowthofopenaccess.pdf.
archambault, éric, grégoire côté, brooke struck, and matthieu voorons. "research impact of paywalled versus open access papers." white papers, science-metrix and 1science, 2016. http://www.1science.com/1numbr/.
calver, michael c. and j. stuart bradley. "patterns of citations of open access and non-open access conservation biology journal papers and book chapters." conservation biology 24, no. 3 (may 2010): 872–80. https://doi.org/10.1111/j.1523-1739.2010.01509.x.
chua, s. k., ahmad m. qureshi, vijay krishnan, dinker r. pai, laila b. kamal, sharmilla gunasegaran, m. z. afzal, lahri ambawatta, j. y. gan, p. y. kew, et al. "the impact factor of an open access journal does not contribute to an article's citations" [version 1; referees: 2 approved]. f1000 research 6 (2017): 208. https://doi.org/10.12688/f1000research.10892.1.
clarivate analytics. "incites journal citation reports." dataset updated september 9, 2017. https://jcr.incites.thomsonreuters.com/.
clements, jeff c. "open access articles receive more citations in hybrid marine ecology journals." facets 2 (january 2017): 1–14. https://doi.org/10.1139/facets-2016-0032.
crotty, david. "study suggests publisher public access outpacing open access; gold oa decreases citation performance." scholarly kitchen, october 4, 2017. https://scholarlykitchen.sspnet.org/2017/10/04/study-suggests-publisher-public-access-outpacing-open-access-gold-oa-decreases-citation-performance/.
crotty, david. "when bad science wins, or 'i'll see it when i believe it.'" scholarly kitchen, august 31, 2016. https://scholarlykitchen.sspnet.org/2016/08/31/when-bad-science-wins-or-ill-see-it-when-i-believe-it/.
davis, philip m. "open access, readership, citations: a randomized controlled trial of scientific journal publishing." faseb journal 25, no. 7 (july 2011): 2129–34. https://doi.org/10.1096/fj.11-183988.
davis, philip m., and william h. walters. "the impact of free access to the scientific literature: a review of recent research." journal of the medical library association 99, no. 3 (july 2011): 208–17. https://doi.org/10.3163/1536-5050.99.3.008.
elsevier. "your guide to publishing open access with elsevier." amsterdam, netherlands: elsevier, 2015. https://www.elsevier.com/__data/assets/pdf_file/0020/181433/openaccessbooklet_may.pdf.
evans, james a. and jacob reimer. "open access and global participation in science." science 323, no. 5917 (february 2009): 1025. https://doi.org/10.1126/science.1154562.
eysenbach, gunther. "citation advantage of open access articles." plos biology 4, no. 5 (may 2006): e157. https://doi.org/10.1371/journal.pbio.0040157.
fisher, tim. "top-level domain (tld)." lifewire, july 30, 2017. https://www.lifewire.com/top-level-domain-tld-2626029.
frisch, nora k., romil nathan, yasin k. ahmed, and vinod b. shidham. "authors attain comparable or slightly higher rates of citation publishing in an open access journal (cytojournal) compared to traditional cytopathology journals—a five year (2007–2011) experience." cytojournal 11, no. 10 (april 2014). https://doi.org/10.4103/1742-6413.131739.
gaulé, patrick, and nicolas maystre. "getting cited: does open access help?" research policy 40, no. 10 (december 2011): 1332–38. https://doi.org/10.1016/j.respol.2011.05.025.
gargouri, yassine, chawki hajjem, vincent larivière, yves gingras, les carr, tim brody, and stevan harnad. "self-selected or mandated, open access increases citation impact for higher quality research." plos one 5, no. 10 (october 2010). https://doi.org/10.1371/journal.pone.0013636.
hajjem, chawki, stevan harnad, and yves gingras. "ten-year cross-disciplinary comparison of the growth of open access and how it increases research citation impact." ieee data engineering bulletin 28, no. 4 (december 2005): 39–46.
hall, martin. "green or gold? open access after finch." insights 25, no. 3 (november 2012): 235–40. https://doi.org/10.1629/2048-7754.25.3.235.
hersh, gemma, and andrew plume. "citation metrics and open access: what do we know?" elsevier connect, september 14, 2016. https://www.elsevier.com/connect/citation-metrics-and-open-access-what-do-we-know.
houghton, john, and alma swan. "planting the green seeds for a golden harvest: comments and clarifications on 'going for gold.'" d-lib magazine 19, no. 1/2 (january/february 2013). https://doi.org/10.1045/january2013-houghton.
hua, fang, heyuan sun, tanya walsh, helen worthington, and anne-marie glenny. "open access to journal articles in dentistry: prevalence and citation." journal of dentistry 47 (april 2016): 41–48. https://doi.org/10.1016/j.jdent.2016.02.005.
internet corporation for assigned names and numbers. "list of top-level domains." last updated september 13, 2018. https://www.icann.org/resources/pages/tlds-2012-02-25-en.
jump, paul. "open access papers 'gain more traffic and citations.'" times higher education, july 30, 2014. https://www.timeshighereducation.com/home/open-access-papers-gain-more-traffic-and-citations/2014850.article.
mccabe, mark j., and christopher m. snyder. "identifying the effect of open access on citations using a panel of science journals." economic inquiry 52, no. 4 (october 2014): 1284–1300. https://doi.org/10.1111/ecin.12064.
mccabe, mark j., and christopher m. snyder. "does online availability increase citations? theory and evidence from a panel of economics and business journals." review of economics and statistics 97, no. 1 (march 2015): 144–65. https://doi.org/10.1162/rest_a_00437.
mertens, stephan. "open access: unlimited web based literature searching." deutsches ärzteblatt international 106, no. 43 (2009): 710–12. https://doi.org/10.3238/arztebl.2009.0710.
moed, hank. "does open access publishing increase citation or download rates?" research trends 28 (may 2012). https://www.researchtrends.com/issue28-may-2012/does-open-access-publishing-increase-citation-or-download-rates/.
niyazov, yuri, carl vogel, richard price, ben lund, david judd, adnan akil, michael mortonson, josh schwartzman, and max shron. "open access meets discoverability: citations to articles posted to academia.edu." plos one 11, no. 2 (february 2016): e0148257. https://doi.org/10.1371/journal.pone.0148257.
ottaviani, jim. "the post-embargo open access citation advantage: it exists (probably), it's modest (usually), and the rich get richer (of course)." plos one 11, no. 8 (august 2016): e0159614. https://doi.org/10.1371/journal.pone.0159614.
pinfield, stephen, jennifer salter, and peter a. bath. "a 'gold-centric' implementation of open access: hybrid journals, the 'total cost of publication,' and policy development in the uk and beyond." journal of the association for information science and technology 68, no. 9 (september 2017): 2248–63. https://doi.org/10.1002/asi.23742.
piwowar, heather, jason priem, vincent larivière, juan pablo alperin, lisa matthias, bree norlander, ashley farley, jevin west, and stefanie haustein. "the state of oa: a large-scale analysis of the prevalence and impact of open access articles." peerj (february 13, 2018): 6:e4375. https://doi.org/10.7717/peerj.4375.
research information network. "nature communications: citation analysis." press release, 2014. https://www.nature.com/press_releases/ncomms-report2014.pdf.
riera, m. and e. aibar. "¿favorece la publicación en abierto el impacto de los artículos científicos? un estudio empírico en el ámbito de la medicina intensiva" [does open access publishing increase the impact of scientific articles? an empirical study in the field of intensive care medicine]. medicina intensiva 37, no. 4 (may 2013): 232–40. http://doi.org/10.1016/j.medin.2012.04.002.
sotudeh, hajar, zahra ghasempour, and maryam yaghtin. "the citation advantage of author-pays model: the case of springer and elsevier oa journals." scientometrics 104 (june 2015): 581–608. https://doi.org/10.1007/s11192-015-1607-5.
swan, alma, and john houghton. "going for gold? the costs and benefits of gold open access for uk research institutions: further economic modelling." report to the uk open access implementation group, june 2012. http://wiki.lib.sun.ac.za/images/d/d3/report-to-the-uk-open-access-implementation-group-final.pdf.
tang, min, james d. bever, and fei-hai yu. "open access increases citations of papers in ecology." ecosphere 8, no. 7 (july 2017): 1–9. https://doi.org/10.1002/ecs2.1887.
unpaywall. "libraries user guide." accessed september 13, 2018. https://unpaywall.org/user-guides/libraries.
wray, k. brad. "no new evidence for a citation benefit for author-pay open access publications in the social sciences and humanities." scientometrics 106 (january 2016): 1031–35. https://doi.org/10.1007/s11192-016-1833-5.
endnotes
1 elsevier, "your guide to publishing open access with elsevier" (amsterdam, netherlands: elsevier, 2015), 2, https://www.elsevier.com/__data/assets/pdf_file/0020/181433/openaccessbooklet_may.pdf.
2 philip m. davis and william h. walters, "the impact of free access to the scientific literature: a review of recent research," journal of the medical library association 99, no. 3 (july 2011): 213, https://doi.org/10.3163/1536-5050.99.3.008.
3 davis and walters, "the impact of free access," 208.
4 kristin antelman, "leveraging the growth of open access in library collection decision making," acrl 2017 proceedings: at the helm, leading the transformation, march 22–25, baltimore, maryland, ed. dawn m. mueller (chicago: association of college and research libraries, 2017): 411, 413, http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/leveragingthegrowthofopenaccess.pdf.
5 research information network, "nature communications: citation analysis," press release, 2014, https://www.nature.com/press_releases/ncomms-report2014.pdf.
6 gargouri et al., "self-selected or mandated, open access increases citation impact for higher quality research," plos one 5, no. 10 (october 2010): 17, https://doi.org/10.1371/journal.pone.0013636.
7 david crotty, "when bad science wins, or 'i'll see it when i believe it'," scholarly kitchen, august 31, 2016, https://scholarlykitchen.sspnet.org/2016/08/31/when-bad-science-wins-or-ill-see-it-when-i-believe-it/.
8 jim ottaviani, "the post-embargo open access citation advantage: it exists (probably), it's modest (usually), and the rich get richer (of course)," plos one 11, no. 8 (august 2016): 9, https://doi.org/10.1371/journal.pone.0159614.
9 gargouri et al., "self-selected or mandated," 18.
10 elsevier, "your guide to publishing," 2.
11 top-level domain (tld) refers to the last string of letters in an internet domain name (i.e., the tld of www.google.com is .com). for more information on tlds, see tim fisher, "top-level domain (tld)," lifewire, july 30, 2017, https://www.lifewire.com/top-level-domain-tld-2626029. for a full list of tlds, see "list of top-level domains," internet corporation for assigned names and numbers, last updated september 13, 2018, https://www.icann.org/resources/pages/tlds-2012-02-25-en.
12 crotty, "when bad science wins."
13 hersh and plume, "citation metrics and open access: what do we know?," elsevier connect, september 14, 2016, https://www.elsevier.com/connect/citation-metrics-and-open-access-what-do-we-know.
14 archambault et al., "research impact of paywalled versus open access papers," white paper, science-metrix and 1science, 2016, http://www.1science.com/1numbr/.
15 archambault et al., "research impact."
16 heather piwowar et al., "the state of oa: a large-scale analysis of the prevalence and impact of open access articles," peerj, february 13, 2018, https://doi.org/10.7717/peerj.4375.
17 piwowar et al., "the state of oa," 5.
18 david crotty, "study suggests publisher public access outpacing open access; gold oa decreases citation performance," scholarly kitchen, october 4, 2017, https://scholarlykitchen.sspnet.org/2017/10/04/study-suggests-publisher-public-access-outpacing-open-access-gold-oa-decreases-citation-performance/.
19 archambault et al., "research impact"; piwowar et al., "the state of oa," 15.
20 piwowar et al., "the state of oa," 9–10.
21 archambault et al., “research impact.”
22 ottaviani, “the post-embargo open access citation advantage,” 2.
23 piwowar et al., “the state of oa,” 9.
24 hersh and plume, “citation metrics and open access.”
25 hersh and plume, “citation metrics and open access.”
26 tang et al., “open access increases citations of papers in ecology,” ecosphere 8, no. 7 (july 2017): 8, https://doi.org/10.1002/ecs2.1887.
27 tang et al., “open access increases citations,” 7. tang et al. list the following as examples of the “numerous studies” as quoted above, which i did not include in the quote for the purpose of brevity: (antelman 2004, hajjem et al. 2005, eysenbach 2006, evans and reimer 2009, calver and bradley 2010, riera and aibar 2013, clements 2017).
28 yuri niyazov et al., “open access meets discoverability: citations to articles posted to academia.edu,” plos one 11, no. 2 (february 2016): e0148257, https://doi.org/10.1371/journal.pone.0148257.
29 gargouri et al., “self-selected or mandated,” 18.
30 antelman, “leveraging the growth,” 414.
31 “library user guide,” unpaywall, accessed september 13, 2018, https://unpaywall.org/user-guides/libraries.
32 gargouri et al., “self-selected or mandated,” 20.
article
perceived quality of whatsapp reference service: a quantitative study from user perspectives
yan guo, apple hiu ching lam, dickson k. w. chiu, and kevin k. w. ho
information technology and libraries | september 2022. https://doi.org/10.6017/ital.v41i3.14325
yan guo (kguo@connect.hku.hk) is msc(lim) graduate, faculty of education, the university of hong kong. apple hiu ching lam (applelamwork@gmail.com) is edd candidate/msc(lim) graduate, faculty of education, the university of hong kong. dickson k. w. chiu (dicksonchiu@ieee.org) is lecturer, faculty of education, the university of hong kong. kevin k. w. ho (ho.kevin.ge@u.tsukuba.ac.jp) is professor of management information systems, graduate school of business sciences, humanities and social sciences, university of tsukuba. © 2022.
abstract
academic libraries are experiencing significant changes and making efforts to deliver their service in the digital environment. libraries are transforming from being places for reading to extensions of the classroom and learning spaces. due to the globalized digital environment and intense competition, libraries are trying to improve their service quality through various evaluations. as reference service is crucial to users, this study explores user satisfaction towards the reference service through whatsapp, a social media instant messenger, at a major university in hong kong and discusses the correlation between the satisfaction rating and three variables. suggestions and recommendations are raised for future improvements. the study also sheds light on the usage of reference services through instant messaging in other academic libraries.
introduction
due to the advancement of new technologies and mobile devices, library resources and services are more accessible.1 apart from independent searching strategies, the interactions between librarians and users have become an effective method to solve user problems, referred to as reference services.2 according to the reference and user services association (rusa), reference services include creating, managing, and assessing reference transactions and activities.3 with increasing user needs, reference services have become an essential part of library services and commonplace in academic libraries.4 further, technology development requires reference librarians to possess updated skills, willingness, and interest to deal with user inquiries.5
recently, due to the covid-19 pandemic, users have increasingly utilized virtual reference services to help them obtain information required for their academic studies instead of face-to-face modes.6 some libraries have employed different virtual tools, for example, instant messaging services, to provide reference services to their users. one of the most popular global instant messaging services is whatsapp.7 referring to the digital 2022—hong kong report, the most-used social media platform among internet users aged 16 to 64 in hong kong was whatsapp (84.3%), followed by facebook (83.7%), instagram (65.6%), wechat (55.2%), and facebook messenger (50.4%).8 the popularity of whatsapp in hong kong accordingly increases whatsapp reference service usage in academic libraries. the qualitative study by tsang and chiu has identified whatsapp as one of the most commonly used and relatively preferred reference services of an academic library in hong kong.9
many studies have investigated reference service quality with measurements such as satisfaction rating, perceived gaps in reference services’ ability to meet user expectations, and other information-seeking behaviors. however, few studies focus on instant messaging reference services compared to traditional services, except for a notable recent qualitative study by tsang and chiu.10 therefore, this research aims to quantitatively evaluate user satisfaction with whatsapp’s application in reference service of a major university’s library in hong kong through three dimensions: affect of service (as), information control (ic), and library as place (lp), which are detailed in the research purpose section. the results can help librarians better understand the effectiveness of applying whatsapp and other instant messaging to improve reference service quality. by expanding technology-based services, libraries can become more competitive in the digital era and provide better user experiences in the future. thus, this study deals with the following four research questions (rqs):
rq1. what is the users’ awareness level of the library’s reference services?
rq2. how do users evaluate the whatsapp application in the library?
rq3. what are the relationships between user satisfaction and the three service dimensions as, ic, and lp?
rq4. how can academic libraries increase user satisfaction with whatsapp reference services?
literature review
in the late 1800s, library leaders started to pay attention to the importance of reference services.
11 since then, reference services have also caught the public’s attention and were introduced into public libraries. reference services can assist readers in solving problems through various interactions between users and staff.12 currently, the library is not merely a repository of collections, and librarians can provide more help, particularly fulfilling users’ various information needs rather than just offering directions or physical locations of books.13 nowadays, librarians strive to solve various user problems and inquiries with their professional skills and information literacy.14
at first, in-person and telephone were the most common ways for reference services. however, with the increasing number of remote users and ubiquitous internet connectivity, face-to-face reference and asynchronous emails can no longer satisfy users’ needs.15 thus, libraries increasingly explore collaborative software and mobile applications such as instant messaging, online chatting, video sessions, and other methods to serve users, referred to collectively as virtual reference.16 virtual reference occurs electronically in real time, where users may interact with librarians through smartphones, computers, or other devices without physical presence.17
as libraries began to use the internet, several case studies investigated instant messaging reference services in academic libraries.18 at the same time, librarians and researchers began to investigate reference service quality with designated measurements. various indicators can help measure user satisfaction levels, such as accuracy, communication skills, user satisfaction, instruction, and user’s willingness to return.19 although these indicators were originally developed for physical reference services, most principles and methods can still be applied to virtual reference services, as instant messaging has become one of the most frequently used channels.20 some studies have confirmed the effectiveness of instant messaging for reference services for more traditional means, such as phone and email.21
as one of the most popular social media chatting software platforms, whatsapp has become a powerful tool for connecting librarians and users. a primary difference from traditional phone-based reference services is that whatsapp can share texts, images, documents, and videos (and their links) at a low cost.22 whatsapp can run as a mobile application on smartphones or as a web page on desktop browsers named whatsapp web. whatsapp web users are required to use their mobile phone to scan the qr (quick response) code on the computer browser (https://web.whatsapp.com/) for authentication before use. as the functionality of whatsapp web is similar to whatsapp, users can easily adapt to whatsapp web on desktop computers.
as of march 2020, the number of active whatsapp users has globally increased to approximately 2 billion and is still growing steadily.23 whatsapp, by april 2021, had become the most popular messaging application based on the number of monthly active users, compared with other popular messaging applications.24 studies also indicate that students may use whatsapp for two to three hours daily.25 although the essential chat functions of whatsapp are similar to other instant messaging services such as facebook messenger and wechat, whatsapp and wechat have been more popular for hongkongers and mainland chinese, respectively.26 surprisingly, howard et al. studied students’ habits of using social media platforms at purdue university in the us and revealed that respondents rarely use whatsapp in their daily lives, indicating that residents in different regions may have different social media platform preferences.27 recently, odu and omini have demonstrated a significant relationship between using whatsapp and library service satisfaction from the student’s perspective.28 some studies also stressed that many students welcome whatsapp as an effective reference service platform.29 however, friday et al. pointed out that some librarians might not be trained and equipped with proper and up -todate skills in using social media tools to provide library services effectively and efficiently.30 further, aina, babalola, and oduwole argued that hurdles such as instructional policies, lack of time, and heavy workloads might cause difficulties in using these tools to provide library services.31 as for evaluation, mohd azmi, noorhidawati, and yanti idaya aspura applied rusa’s guidelines for the behavioral performance of reference and information service providers to evaluate the perceived importance versus actual practices of whatsapp reference service from librarians’ perspectives.32 they suggested that although librarians expressed their awareness of the importance of rusa guidelines, they would not fully comply with the guidelines because of time and other constraints. yet, few studies deal with the satisfaction with whatsapp reference services of academic libraries from user perspectives. 
research purpose
regarding whatsapp and library services, a few studies focused on finding the relationship between whatsapp and service usage, user attitudes toward whatsapp applications, and the difficulties of using whatsapp, particularly for reference services.33 though mohd azmi, noorhidawati, and yanti idaya aspura evaluated librarians’ behavioral performance in providing whatsapp reference service, it was from librarians’ perspectives instead of users.34
thus, we studied user satisfaction with the whatsapp reference service offered by academic libraries by adapting tsui’s instrument to develop the survey framework.35 tsui employed three key indicators, i.e., affect of service (as), information control (ic), and library as place (lp), in libqual+, an online library assessment tool developed by the association of research libraries (arl).36 as measures “empathy, responsiveness, assurance, and reliability of library employees,”37 like librarian-user interactions concerning the librarians’ knowledge in inquiry responses and the level of reference service provided.38 ic measures “how users want to interact with the modern library and include scope, timeliness and convenience, ease of navigation, modern equipment, and self-reliance,”39 such as library resource availability and accessibility from user perspectives.40 lp measures “the usefulness of space, the symbolic value of the library, and the library as a refuge for work or study,”41 such as the availability of adequate facilities and an appropriate physical environment from user perspectives.42 the application of these indicators will be further discussed in the methodology section.
methodology
this study chose a major academic library in hong kong with a long track record of technological advancement. a reference desk is situated near the library’s main entrance for traditional services. the library’s web page shows a clear ask a librarian column with diversified methods for reference services, including email, telephone, whatsapp, and other electronic devices to access the reference services. notably, the whatsapp reference service is operated the same as other channels, available monday to friday from 9 am to 5 pm (except on public holidays). the library promises an inquiry response in no more than four hours. the mission and vision of such reference services are to
• help locate information resources;
• assist in searching strategies and research;
• deal with queries about the use of facilities and services; and
• equip users with information literacy.
the library has developed a whatsapp business account with a mobile phone in the whatsapp business application and uses the whatsapp web function to handle user inquiries on desktop computers. one to two library assistants support the whatsapp reference service on shift seamlessly from 9 am to 5 pm, including the lunch break, and a professional librarian reviews whatsapp inquiry records weekly.
this study used a survey administered through google form, both online and offline, to collect user perceptions about the library’s whatsapp reference service. online methods for collecting survey responses included email, facebook, wechat, and whatsapp, and offline methods included site delivery at the library entrance and sticking the survey qr code on public notice boards. no incentives were provided for the voluntary data collection.
the data collected comprised mainly undergraduate and postgraduate students to represent a general user view of the whatsapp reference service. microsoft excel and ibm spss statistics were used to analyze the data, including bivariate correlation for investigating the relationships between whatsapp satisfaction and the three variables based on tsui’s study, as, ic, and lp.43 among these indicators, as focuses on whether whatsapp is easy to use and supportive; ic evaluates the response speed, accuracy, and accessibility of the whatsapp reference service; and lp measures the staff attitude and whether whatsapp helps encourage librarian-user communication. the survey also includes demographic information, reference services usage, and user satisfaction with the whatsapp reference service. participants were asked to evaluate the quality of whatsapp reference service from these three dimensions through five-point likert scales in the satisfaction rating part. finally, the survey asked for the overall satisfaction and other useful comments about the reference service.
data analysis
demographic information
as the main analysis of this study is regression analysis, a check on the minimum number of participants needed for analysis was performed. as explained later in this paper, the regression involved six predictors of satisfaction. using medium effect size and 0.8 as the statistical power, the minimum sample size should be 97 using an online a-priori sample size calculator for multiple regression (https://www.danielsoper.com/statcalc/calculator.aspx?id=1); see the sketch after table 1 below. the data collection yielded 131 completed responses, with 66% of master’s students and the rest undergraduates. respondents had diversified academic backgrounds, including education (26.0%), science (14.5%), business and economics (13.0%), engineering (12.2%), liberal arts (10.7%), social science (9.9%), architecture (9.2%), and legal studies (4.6%).
for the time spent on instant messaging such as whatsapp and wechat, 39% spent three to five hours every day, while one-fifth of them would spend one to two hours. 22% of respondents spend five hours or above, and only a small portion of them (19%) would spend less than an hour. in summary, most respondents would spend at least one hour on instant messaging daily.
usage of reference service
table 1 summarizes respondents’ usage of reference services with a five-point likert scale (1 = never; 5 = always). as shown, walk-in and email are the most common methods to use the reference service, while whatsapp is the least frequent. when it comes to the purposes of using reference services (see table 2), databases and e-resources and identifying information sources are the two most common purposes for respondents, followed by service and facility and research assistance.
table 1. usage frequency of reference service through different methods (n = 131)
methods:    walk-in  email  phone  whatsapp
mean score: 3.23     3.24   2.53   2.32
note: 1 = never; 5 = always
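the minimum-sample-size figure reported in the demographic information subsection above can be cross-checked with a short script. the sketch below is illustrative only and is not the online calculator the authors used; it assumes cohen's f² = 0.15 for a "medium" effect, six predictors, α = 0.05, a target power of 0.80, and a g*power-style noncentrality of λ = f² × n, so the answer may differ from the reported 97 by an observation or two depending on the convention.

# illustrative a-priori power calculation for the overall f test of a multiple
# regression with six predictors; assumptions: cohen's f2 = 0.15, alpha = 0.05,
# target power = 0.80, noncentrality lambda = f2 * n (g*power-style convention).
from scipy.stats import f, ncf

def power_for_n(n, k=6, f2=0.15, alpha=0.05):
    df1, df2 = k, n - k - 1
    f_crit = f.ppf(1 - alpha, df1, df2)              # critical f under the null
    return 1 - ncf.cdf(f_crit, df1, df2, nc=f2 * n)  # power under the noncentral f

n = 10
while power_for_n(n) < 0.80:
    n += 1
print(n, round(power_for_n(n), 3))  # lands in the high 90s, in line with the reported 97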
table 2. purposes of using reference service (n = 131)
purposes:   service and facility  database and e-resources  identify information sources  research assistance (individual/group)  other
mean score: 3.10                  3.36                      3.28                          3.08                                    2.31
note: 1 = never; 5 = always
when asked about their preferred way to use reference services, more than half of the respondents said they would use email (67.9%), followed by walk-in (59.5%), whatsapp (23.7%), and phone (12.2%). as the traditional method, most respondents considered walk-in, in-person reference the most effective reference method because users could receive instant help from librarians, especially for urgent and complex problems. however, results indicated that despite a gap in users seeking reference services help by instant messaging and email, this gap is smaller than that for face-to-face and telephone.44
the user ratings for reference services through different methods were compared using anova. our result shows a significant result, f(3, 520) = 30.52, p < 0.01. walk-in (m = 4.18, sd = 0.71) is the most satisfying method. post-hoc tests showed that the ratings of email (m = 3.78, sd = 0.60) and phone (m = 3.73, sd = 0.83) did not differ from each other and were lower than walk-in, while both were considered better than whatsapp (m = 3.27, sd = 0.90).
apart from these ratings, respondents were also asked to leave a few comments and suggestions for the reference service. notably, most respondents showed a positive attitude to the whatsapp reference service while suggesting some improvements. for example, one respondent requested “longer office hours for whatsapp.” at the time of this research, the whatsapp reference service hours were monday to friday from 9 am to 5 pm, while in-person reference service hours were monday to friday, 8:30 am to 7 pm, and saturdays from 8:30 am to 7 pm. therefore, the library should extend the whatsapp service hours to provide more flexible service time, aligning with the findings of tsang and chiu.45 further, a respondent suggested that librarians should “respond to email more efficiently.” for this issue, whatsapp could serve to expand user access to reference services instead of emails.
users’ satisfaction with whatsapp reference service
prior research reported that as, ic, and lp influenced user satisfaction. this study adapted the instrument developed in tsui’s prior research (see appendix) to collect data to investigate these relationships.46 as the cronbach’s alpha values for all three constructs are higher than 0.7, it is valid to use the average value of these items for our data analysis.47
table 3 shows the analysis of whether respondents’ academic level would affect as, ic, lp, and overall user satisfaction with whatsapp using anova. results indicated that academic level affected as but not the other factors and satisfaction. further, multiple regression results indicated that ic and lp affected whatsapp satisfaction. table 4 tabulates our findings.
table 3. anova results
                              overall  undergraduate (n = 45)  master’s student (n = 86)  f-value
affect of service (as)        3.380    3.200                   3.474                      5.712 *
information control (ic)      3.202    3.162                   3.223                      0.286
library as place (lp)         3.645    3.550                   3.695                      0.273
whatsapp satisfaction (sat)   3.275    3.200                   3.314                      0.495
notes: *** p < 0.001; ** p < 0.01; * p < 0.05.
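for readers who want to reproduce the style of analysis behind tables 3 through 5, a minimal sketch follows. it is not the authors' workflow (the study used microsoft excel and ibm spss statistics); it assumes a hypothetical export whatsapp_survey.csv with columns as1–as5, ic1–ic7, lp1–lp5, sat (overall whatsapp satisfaction), and academic (0 = undergraduate, 1 = master's student), reverse-codes the three items flagged in the appendix, checks cronbach's alpha for each construct, and fits the main-effect and interaction regressions with statsmodels.

# minimal sketch (not the authors' spss workflow): scale reliability and the
# regressions summarized in tables 3-5. file and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("whatsapp_survey.csv")   # hypothetical survey export

for item in ["ic5", "ic6", "ic7"]:        # appendix: these items are reverse coded
    df[item] = 6 - df[item]               # flip a five-point likert item

def cronbach_alpha(items: pd.DataFrame) -> float:
    # classic formula: k/(k-1) * (1 - sum of item variances / variance of the total score)
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

# "as" is a python keyword, so the affect-of-service score is stored as "as_"
scales = {"as_": [f"as{i}" for i in range(1, 6)],
          "ic": [f"ic{i}" for i in range(1, 8)],
          "lp": [f"lp{i}" for i in range(1, 6)]}
for name, cols in scales.items():
    print(name, round(cronbach_alpha(df[cols]), 3))  # acceptable if above 0.7
    df[name] = df[cols].mean(axis=1)                 # construct score = mean of its items

main = smf.ols("sat ~ as_ + ic + lp", data=df).fit()                 # table 4, main effects
inter = smf.ols("sat ~ (as_ + ic + lp) * academic", data=df).fit()   # table 4, interactions
print(main.summary())
print(inter.summary())

coding academic as 0/1 means each interaction coefficient can be read as the master's-minus-undergraduate difference in a dimension's effect, which is how the group-specific effects in table 5 can be derived.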
as shown in table 4, ic and lp have significant positive impacts on users’ satisfaction with using whatsapp for reference services. however, considering the academic level (undergraduate = 0; master’s student = 1) in our regression model (i.e., interaction effect), the following effects are notable. first, as does not affect user satisfaction with using whatsapp for reference services for undergraduates but positively affects master’s respondents. second, ic positively impacts satisfaction for both undergraduate and master’s respondents, of which the difference between these two respondent groups is statistically insignificant. lastly, even though lp also positively impacts satisfaction for both groups, the effect is higher for undergraduates than for master’s respondents. the different learning needs of the groups may explain such differences, as shown in table 5.48
table 4. regression analysis
                              main effect                interaction effect
independent variables         coefficient   t-value      coefficient   t-value
affect of service (as)        0.0933        0.7556       −0.3503       −1.744
information control (ic)      0.6718        6.736 ***    0.7624        4.178 ***
library as place (lp)         0.6092        6.143 ***    0.9366        6.335 ***
as × academic                                            0.7817        3.076 ***
ic × academic                                            −0.1741       −0.8701
lp × academic                                            −0.5721       −0.3178 ***
intercept                     −1.412        −3.748 ***   −1.423        −3.876 ***
r2 (adj.)                     0.5444                     0.5742
f-value                       52.78 ***                  30.21 ***
notes: *** p < 0.001; ** p < 0.01; * p < 0.05
table 5. impacts of as, ic, and lp on different student groups
                           undergraduate     master’s student
affect of service (as)     not significant   0.7817
information control (ic)   0.7624            0.7624
library as place (lp)      0.9366            0.3645
discussions and recommendations
subdivision of the whatsapp reference service into specialist subjects
our findings indicated that as had the strongest correlation with whatsapp satisfaction for master’s students, while the as part had the lowest satisfaction with undergraduate students. this reflected that respondents who are undergraduates could not receive adequate supportive help from librarians through whatsapp, aligning with the findings of tsang and chiu.49 a possible reason is that the number of whatsapp reference librarians with specialist subject knowledge was small. yet, one general reference whatsapp number on the library website is insufficient compared to other methods, as the library website shows seven telephone numbers of branch libraries to serve different patrons. through different numbers, users could easily find the required experts accordingly. the whatsapp reference service had only a single number probably because users mostly ask basic and general questions.
such a process would cost professionals too much time and energy to deal with.50 to further enhance the service, it is necessary to reform the operational policies and add a few more whatsapp accounts, for instance, creating a whatsapp business account for each branch library (a total of six branch libraries) or for each school serving users in different disciplines to connect to corresponding subject librarians via specialized whatsapp accounts.51 this approach can separate users from the general inquiry number dealing with quick and straightforward information inquiries from those requiring specific domain inquiries.52 further, the general inquiry whatsapp service should be extended to cater to various students’ needs by possibly improving to provide 24-hour service.53 to remedy human resources requirements, student helpers, interns, and volunteers can serve on shifts on saturdays, sundays, and even public holidays.54 more users may seek troubleshooting services during the holidays, especially long holidays, and recently, under the covid-19 pandemic and its associated isolation requirements.55 more staff training due to the whatsapp reference service features, the skills required for online and face-to-face conversations are different, e.g., it is difficult to convey emotions like facial expressions and body language online.56 further, due to the limited interactions between librarians and users and the lack of visual and audio cues through the whatsapp reference service, librarians can hardly identify user needs in a short time.57 therefore, librarians may need further professional training for such scenarios, particularly in answering questions quickly and precisely in real-time chat, because users tend to be more impatient during a chat engagement.58 in addition, unlike face-toface inquiry, some complex issues often cannot be adequately explained through whatsapp. therefore, librarians should make appropriate referrals if some problems cannot be solved through whatsapp. reference services through video-based platforms such as zoom can also help.59 regular training could offer librarians updated information on using the tool and refresh the skills used in responding to the whatsapp reference service among various staff members, i.e., librarians, library assistants, student helpers, volunteers, and interns. 
if the library staff does not acquire well-developed skills and competencies in texting, comprehension, and communication specialized in instant messaging services, they cannot efficiently and effectively understand the inquiries and search, locate, explain, and convey the appropriate information resources to users on the asynchronous whatsapp reference service in a shorter response time.60 establishment of whatsapp reference service guidelines whether the whatsapp reference service increases the capability to deal with user problems, it still relies on consistently favorable reference behaviors.61 mohd azmi, noorhidawati, and yanti idaya aspura pointed out that users need timely responses and friendly online contacts from librarians, though librarians might not completely follow the rusa guidelines due to human resources constraints.62 therefore, libraries should establish easy-to-follow, concise, and information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 9 whatsapp-tailored guidelines for appropriately conducting whatsapp reference services, especially because such skills differ from face-to-face services, as discussed above.63 the studied library has developed a simple series of internal work procedures for using whatsapp web, including how to open and close whatsapp web and how on-duty staff should handle inquiries. to enhance and standardize the whatsapp reference service, the library should develop guidelines by offering some polite, brief, and interactive text templates for answering inquiries, such as “i am (name) (job title). what may i help you with, dear user?”, as well as answers to frequently asked questions. progress reporting messages should be sent to users to acknowledge their searching status.64 the relationship between librarians and users can thus be enhanced by creating a consistent, friendly, and warm atmosphere, using informal conversation and emojis, and incorporating whatsapp’s features to engage users and establish continued service use.65 further, such guidelines can save time and energy in training new staff and provide the basis for the future development of artificial intelligence aids such as chatbots.66 promotion for the whatsapp service most respondents conveyed a positive attitude, considering whatsapp a convenient way to access the reference service, which is in line with the studies by ansari and tripathi and sudharani and nagaraju.67 however, it is still not the most frequently used and preferred method in the library. one reason is that users still need help with physical materials and ask for the answers face-toface. 68 however, this is not the only reason, as many studies showed that library promotional efforts are often weak.69 in addition to the traditional promotional materials such as leaflets and contact cards with whatsapp numbers, the library can also broaden the promotion through massive emails and social media such as facebook, instagram, linkedin, twitter, and signal.70 in the information era, social media is an effective and efficient channel for reaching the target audience and disseminating information in an accessible way. 71 the library should reform the webpage of the whatsapp reference service to further attract users. 
for instance, displaying some sample whatsapp chat screenshots of librarian-user interactions on the library’s website can increase the attractiveness of the service as images can graphically represent the application’s ease of use for library reference help.72
conclusion
the study has investigated user satisfaction with the whatsapp reference service in a major academic library in hong kong and explored the correlations between whatsapp satisfaction with three quality dimensions as, ic, and lp. the survey revealed various opinions toward using reference services and preference methods, including inconsistencies between users’ frequently used methods and preferred methods. moreover, by analyzing the correlation between whatsapp satisfaction and the three variables, results showed that users emphasized the whatsapp reference service. the results have led to some practical suggestions for improvement: subdividing the whatsapp reference service with subject specialists, providing more staff training, establishing staff guidelines and policies, and increasing whatsapp service promotion.
limitations and future research
there are still some limitations to the study. firstly, the survey only collected limited complete responses, which may not represent all users’ views. additionally, the perceptions of both library staff and users should be considered. secondly, the research evaluation design with three dimensions can be extended to measure other quality and effects. thirdly, as whatsapp is just one application among various emerging instant-messaging tools, further studies should cover other instant messaging platforms for similar and different purposes. for instance, as the studied university comprises a significant student population from mainland china, wechat could be investigated for its possibility and effectiveness as a whatsapp alternative for providing reference services and promotion to chinese students.73
appendix: key survey items (item, mean, sd)
affect of service (ease of use, supportive) (as) (cronbach’s alpha = 0.799)
as1. there is a clear introduction teaching library users about how to use the whatsapp function. 3.18 0.87
as2. the directories are easy to understand. 3.16 0.85
as3. reference service through whatsapp is easy to use. 3.69 0.94
as4. i can receive instant help from a librarian through whatsapp. 3.55 0.78
as5. i can request service anytime, anywhere. 3.31 0.93
information control (response speed, accuracy, accessible) (ic) (cronbach’s alpha = 0.707)
ic1. response of inquiry is reliable. 3.66 0.74
ic2. whatsapp application makes reference services easily accessible for users. 3.25 0.95
ic3. response of inquiry is accurate. 3.87 0.66
ic4. using whatsapp to gain access to reference services can meet my needs. 3.73 0.95
ic5. the quality of response obtained through whatsapp is inferior to walk-in (r). 2.60 1.26
ic6. the quality of response obtained through whatsapp is inferior to email (r). 2.66 1.17
ic7. the quality of response obtained through whatsapp is inferior to phone (r). 2.63 0.98
library as place (staff attitude, encourage communication) (lp) (cronbach’s alpha = 0.843)
lp1. reference staff is friendly or pleasant. 4.08 0.76
lp2. using whatsapp to contact a librarian is convenient. 3.90 0.78
lp3. whatsapp application in reference service increases my productivity in using online library services. 3.53 0.99
lp4. it provides an efficient channel to communicate with librarians. 3.87 0.89
lp5. i request more reference services after i know about the whatsapp channel. 3.28 0.86
note: ic5, ic6, and ic7 are reversed codes.
endnotes
1 karen hiu tung yip, patrick lo, kevin k. w. ho, and dickson k. w. chiu, “adoption of mobile library apps as learning tools in higher education: a tale between hong kong and japan,” online information review 45, no. 2 (2020): 389–405, https://doi.org/10.1108/oir-07-2020-0287; ken yiu kwan fan, patrick lo, kevin k. w. ho, stuart so, dickson k. w. chiu, and eddie h. t. ko, “exploring the mobile learning needs amongst performing arts students,” information discovery and delivery 48, no. 2 (2020), 103–12, https://doi.org/10.1108/idd-12-2019-0085; vanessa hiu ying chan, dickson k. w. chiu, and kevin k. w. ho, “mediating effects on the relationship between perceived service quality and public library app loyalty during the covid-19 era,” journal of retailing and consumer services 67 (2022): 102960, https://doi.org/10.1016/j.jretconser.2022.102960.
2 samuel s. green, “personal relations between librarians and readers,” library journal 1, no. 2 (1876): 74–81.
3 “measuring and assessing reference services and resources: a guide,” reference and user services association, accessed july 25, 2021, http://www.ala.org/rusa/sections/rss/rsssection/rsscomm/evaluationofref/measrefguide.
4 angel lok yi tsang and dickson k. w. chiu, “effectiveness of virtual reference services in academic libraries: a qualitative study based on the 5e learning model,” the journal of academic librarianship 48, no. 4 (2022): 102533; kun zhang and peixin lu, “what are the key indicators for evaluating the service satisfaction of wechat official accounts in chinese academic libraries?,” library hi tech, (2022), ahead-of-print, https://doi.org/10.1108/lht-07-2021-0218; yifei zhang, patrick lo, stuart so, and dickson k. w. chiu, “relating library user education to business students’ information needs and learning practices: a comparative study,” reference services review 48, no. 4 (2020): 537–58, https://doi.org/10.1108/rsr-12-2019-0084.
5 andrew chean yang yew, dickson k. w. chiu, yuriko nakamura, and king kwan li, “a quantitative review of lis programs accredited by ala and cilip under contemporary technology advancement,” library hi tech, (2022), ahead of print, https://doi.org/10.1108/lht-12-2021-0442; james friday, oluchi chidozie, and lauretta ngozi chukwuma, “social media and library services: a case of covid-19 pandemic era,” international journal of research and review 7, no. 10 (2020): 230–37, https://www.ijrrjournal.com/ijrr_vol.7_issue.10_oct2020/abstract_ijrr0031.html.
6 ruth sara connell, lisa c. wallis, and david comeaux, “the impact of covid-19 on the use of academic library resources,” information technology and libraries 40, no. 2 (2021): 1–20, https://doi.org/10.6017/ital.v40i2.12629.
7 “digital 2022: global overview report,” we are social and hootsuite, accessed april 30, 2022, https://wearesocial.com/hk/blog/2022/01/digital-2022/; tsang and chiu, “effectiveness of virtual reference services.”
8 “digital 2022—hong kong,” we are social and hootsuite, accessed april 30, 2022, https://wearesocial.com/hk/blog/2022/01/digital-2022/.
https://doi.org/10.1108/oir-07-2020-0287 https://doi.org/10.1108/oir-07-2020-0287 https://doi.org/10.1108/idd-12-2019-0085 https://doi.org/10.1016/j.jretconser.2022.102960 http://www.ala.org/rusa/sections/rss/rsssection/rsscomm/evaluationofref/measrefguide https://doi.org/10.1108/lht-07-2021-0218 https://doi.org/10.1108/lht-07-2021-0218 https://www.emerald.com/insight/search?q=yifei%20zhang https://www.emerald.com/insight/search?q=patrick%20lo https://www.emerald.com/insight/search?q=stuart%20so https://www.emerald.com/insight/search?q=dickson%20k.w.%20chiu https://doi.org/10.1108/rsr-12-2019-0084 https://doi.org/10.1108/lht-12-2021-0442 https://www.ijrrjournal.com/ijrr_vol.7_issue.10_oct2020/abstract_ijrr0031.html https://doi.org/10.6017/ital.v40i2.12629 https://wearesocial.com/hk/blog/2022/01/digital-2022/ https://wearesocial.com/hk/blog/2022/01/digital-2022/ information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 13 9 tsang and chiu, “effectiveness of virtual reference services.” 10 tsang and chiu, “effectiveness of virtual reference services.” 11 green, “personal relations.” 12 green, “personal relations.” 13 p. sankar and e. s. kavitha, “ask librarian to whatsapp librarian: reengineering of traditional library services,” international journal of information sources and services 3, no. 2 (march– april 2016): 35–40, https://www.researchgate.net/profile/drkavithaes/publication/304466788_ask_librarian_to_whatsapp_librarian_reengineering_of_traditio nal_library_services/links/5770958c08ae10de639c0ca3/ask-librarian-to-whatsapplibrarian-reengineering-of-traditional-library-services.pdf; spear wing sze wong and dickson k. w. chiu, “re-examining the value of remote academic library storage in the mobile digital age: a comparative study,” portal 23, no. 1 (2023), in press; tin nok leung, dickson k. w. chiu, kevin k. w. ho, and canon k. l. luk, “user perceptions, academic library usage and social capital: a correlation analysis under covid-19 after library renovation,” library hi tech 40, no. 2 (2021): 304–22, https://doi.org/10.1108/lht-04-2021-0122. 14 syeda hina batool, ata ur rehman, and imran sulehri, “the current situation of information literacy education and curriculum design in pakistan: a discovery using delphi method,” library hi tech (2021): ahead of print, https://doi.org/10.1108/lht-02-2021-0056; yew et al., “quantitative review of lis programs.” 15 tsang and chiu, “effectiveness of virtual reference services”; zhang and lu, “what are the key indicators.” 16 james ogom odu, and emmanuel ubi omini, “mobile phone applications and the utilization of library services in the university of calabar library, calabar, nigeria,” global journal of educational research 16, no. 2 (2017): 111–19, https://doi.org/10.4314/gjedr.v16i2.5. 17 “guidelines for behavioral performance of reference and information service providers,” american library association, june 2004, http://www.ala.org/template.cfm?section=home&template=/contentmanagement/contentd isplay.cfm&contentid=26937. 18 marianne foley, “instant messaging reference in an academic library: a case study,” college & research libraries 63, no. 
1 (2002): 36–45, https://doi.org/10.5860/crl.63.1.36; tsang and chiu, “effectiveness of virtual reference services.” 19 chun-wai tsui, “a study on service quality gap in remote service delivery with mobile devices among academic libraries in hong kong,” (master’s thesis, the university of hong kong, 2015), https://doi.org/10.5353/th_b5611574; leung et al., “user perceptions”; zhang and lu, “what are the key indicators.” 20 tsang and chiu, “effectiveness of virtual reference services.” https://www.researchgate.net/profile/drkavitha-es/publication/304466788_ask_librarian_to_whatsapp_librarian_reengineering_of_traditional_library_services/links/5770958c08ae10de639c0ca3/ask-librarian-to-whatsapp-librarian-reengineering-of-traditional-library-services.pdf https://www.researchgate.net/profile/drkavitha-es/publication/304466788_ask_librarian_to_whatsapp_librarian_reengineering_of_traditional_library_services/links/5770958c08ae10de639c0ca3/ask-librarian-to-whatsapp-librarian-reengineering-of-traditional-library-services.pdf https://www.researchgate.net/profile/drkavitha-es/publication/304466788_ask_librarian_to_whatsapp_librarian_reengineering_of_traditional_library_services/links/5770958c08ae10de639c0ca3/ask-librarian-to-whatsapp-librarian-reengineering-of-traditional-library-services.pdf https://www.researchgate.net/profile/drkavitha-es/publication/304466788_ask_librarian_to_whatsapp_librarian_reengineering_of_traditional_library_services/links/5770958c08ae10de639c0ca3/ask-librarian-to-whatsapp-librarian-reengineering-of-traditional-library-services.pdf https://doi.org/10.1108/lht-04-2021-0122 https://doi.org/10.1108/lht-02-2021-0056 https://doi.org/10.4314/gjedr.v16i2.5 http://www.ala.org/template.cfm?section=home&template=/contentmanagement/contentdisplay.cfm&contentid=26937 http://www.ala.org/template.cfm?section=home&template=/contentmanagement/contentdisplay.cfm&contentid=26937 https://doi.org/10.5860/crl.63.1.36 https://doi.org/10.5353/th_b5611574 information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 14 21 charlotte clements, “implementing instant messaging in four university libraries,” library hi tech 27, no. 3 (2009): 393–402, https://doi.org/10.1108/07378830910988522. 22 gunnan dong et al., “relationships between research supervisors and students from coursework-based master’s degrees: information usage under social media,” information discovery and delivery 49, no. 4 (2021): 319–27, https://doi.org/10.1108/idd-08-2020-0100; tsang and chiu, “effectiveness of virtual reference services.” 23 “number of monthly active whatsapp users worldwide 2013–2020,” statista research department, accessed july 26, 2021, https://www.statista.com/statistics/260819/number-ofmonthly-active-whatsapp-users/. 24 “most popular global mobile messaging apps 2021,” statista research department, accessed july 26, 2021, https://www.statista.com/statistics/258749/most-popular-global-mobilemessenger-apps/. 25 mohd shoaib ansari and aditya tripathi, “use of whatsapp for effective delivery of library and information services,” desidoc journal of library & information technology 37, no. 5 (2017): 360–65, https://doi.org/10.14429/djlit.37.5.11090; y. sudharani and k. nagaraju, “whatsapp usage among the students of svu college of engineering, tirupathi,” journal of advances in library and information science 5, no. 4 (2016): 325–29, https://jalis.in/pdf/5-4/nagaraju.pdf. 
26 jianhua xu, qi kang, zhiqiang song, and christopher peter clarke, “applications of mobile social media: wechat among academic libraries in china,” the journal of academic librarianship 41, no. 1 (2015): 21–30, https://doi.org/10.1016/j.acalib.2014.10.012; tsang and chiu, “effectiveness of virtual reference”; “digital 2022—hong kong,” 54; zhang and lu, “what are the key indicators.” 27 heather howard, sarah huber, lisa carter, and elizabeth moore, “academic libraries on social media: finding the students and the information they want,” information technology and libraries 37, no. 1 (2018): 8–18, https://doi.org/10.6017/ital.v37i1.10160. 28 odu and omini, “mobile phone applications.” 29 ansari and tripathi, “use of whatsapp”; sudharani and nagaraju, “whatsapp usage.” 30 friday, chidozie, and chukwuma, “social media and library services.” 31 adebowale japhet aina, yemisi tomilola babalola, and adebambo adewale oduwole, “use of web 2.0 tools and services by the library professionals in lagos state tertiary institution libraries: a study,” world digital libraries – an international journal 12, no.1 (2019): 1–17, https://content.iospress.com/articles/world-digital-libraries-an-internationaljournal/wdl12101. 32 nor azilawati mohd azmi, a. noorhidawati, and m. k. yanti idaya aspura, “librarians’ behavioral performance on chat reference service in academic libraries: perceived importance vs actual practices,” malaysian journal of library & information science 22, no. 3 (2017): 19–33, https://doi.org/10.22452/mjlis.vol22no3.2. https://doi.org/10.1108/07378830910988522 https://doi.org/10.1108/idd-08-2020-0100 https://www.statista.com/statistics/260819/number-of-monthly-active-whatsapp-users/ https://www.statista.com/statistics/260819/number-of-monthly-active-whatsapp-users/ https://www.statista.com/statistics/258749/most-popular-global-mobile-messenger-apps/ https://www.statista.com/statistics/258749/most-popular-global-mobile-messenger-apps/ https://doi.org/10.14429/djlit.37.5.11090 https://jalis.in/pdf/5-4/nagaraju.pdf https://doi.org/10.1016/j.acalib.2014.10.012 https://doi.org/10.6017/ital.v37i1.10160 https://content.iospress.com/articles/world-digital-libraries-an-international-journal/wdl12101 https://content.iospress.com/articles/world-digital-libraries-an-international-journal/wdl12101 https://doi.org/10.22452/mjlis.vol22no3.2 information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 15 33 aina, babalola, and oduwole, “use of web 2.0 tools and services”; ansari and tripathi, “use of whatsapp”; friday, chidozie, and chukwuma, “social media and library services”; odu and omini, “mobile phone applications”; sudharani and nagaraju, “whatsapp usage.” 34 mohd azmi, noorhidawati, and yanti idaya aspura, “librarians’ behavioral performance.” 35 tsui, “a study on service quality gap.” 36 “what is libqual+®?,” libqual+, accessed may 1, 2022, https://www.libqual.org/home; tsui, “a study on service quality gap.” 37 jessica kayongo, and sherri jones, “faculty perception of information control using libqual+™ indicators,” the journal of academic librarianship 34, no. 2 (2008): 131, https://doi.org/10.1016/j.acalib.2007.12.002. 38 rachael kwai fun ip and christian wagner, “libqual+® as a predictor of library success: extracting new meaning through structured equation modeling,” the journal of academic librarianship 46, no. 
2 (2020): 102102, https://doi.org/10.1016/j.acalib.2019.102102; selena killick, anne van weerden, and fransje van weerden, “using libqual+® to identify commonalities in customer satisfaction: the secret to success?.” performance measurement and metrics 15, no. 1/2 (2014), 23–31, https://doi.org/10.1108/pmm-04-2014-0012. 39 kayongo and jones, “faculty perception of information control,” 131. 40 ip and wagner, “libqual® as a predictor”; killick, van weerden, and van weerden, “using libqual® to identify commonalities.” 41 kayongo and jones, “faculty perception of information control,” 131. 42 ip and wagner, “libqual® as a predictor”; killick, van weerden, and van weerden, “using libqual® to identify commonalities.” 43 tsui, “a study on service quality gap.” 44 anabel quan–haase, “instant messaging on campus: use and integration in university students' everyday communication,” the information society 24, no. 2 (2008): 105–15, https://doi.org/10.1080/01972240701883955. 45 tsang and chiu, “effectiveness of virtual reference services.” 46 tsui, “a study on service quality gap.” 47 robert a. peterson, “a meta-analysis of cronbach's coefficient alpha,” journal of consumer research 21, no. 2 (1994): 381–91, https://doi.org/10.1086/209405. 48 ka po lau, dickson k. w. chiu, kevin k. w. ho, patrick lo, and eric w. k. see-to, “educational usage of mobile devices: differences between postgraduate and undergraduate students ,” the journal of academic librarianship 43, no. 3 (2017): 201–8, https://doi.org/10.1016/j.acalib.2017.03.004. https://www.libqual.org/home https://doi.org/10.1016/j.acalib.2007.12.002 https://doi.org/10.1016/j.acalib.2019.102102 https://doi.org/10.1108/pmm-04-2014-0012 https://doi.org/10.1080/01972240701883955 https://doi.org/10.1086/209405 https://doi.org/10.1016/j.acalib.2017.03.004 information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 16 49 tsang and chiu, “effectiveness of virtual reference services.” 50 aina, babalola, and oduwole, “use of web 2.0 tools and services.” 51 aina, babalola, and oduwole, “use of web 2.0 tools and services.” 52 leung et al., “user perceptions”; zhang et al., “relating library user education.” 53 maggie ka yin chan, dickson k. w. chiu, and ernest tak hei lam, “effectiveness of overnight learning commons: a comparative study,” the journal of academic librarianship 46, no. 6 (2020): 102253, https://doi.org/10.1016/j.acalib.2020.102253; tsang and chiu, “effectiveness of virtual reference services.” 54 wesley wing hong cheng, ernest tak hei lam, and dickson k. w. chiu, “social media as a platform in academic library marketing: a comparative study,” the journal of academic librarianship 46, no. 5 (2020): 102188, https://doi.org/10.1016/j.acalib.2020.102188. 55 parker fruehan and diana hellyar, “expanding and improving our library's virtual chat service: discovering best practices when demand increases,” information technology and libraries 40, no. 3 (2021): 1–9, https://doi.org/10.6017/ital.v40i3.13117; pui yik yu, ernest tak hei lam, and dickson k. w. chiu, “operation management of academic libraries in hong kong under covid-19,” library hi tech, (2022), ahead of print, https://doi.org/10.1108/lht10-2021-0342. 
56 friday, chidozie, and chukwuma, “social media and library services.” 57 mohd azmi, noorhidawati, and yanti idaya aspura, “librarians’ behavioral performance.” 58 aina, babalola, and oduwole, “use of web 2.0 tools and services”; friday, chidozie, and chukwuma, “social media and library services.” 59 yu, lam, chiu, “operation management” 60 tsang and chiu, “effectiveness of virtual reference services.” 61 kirsti nilsen, “the library visit study: user experiences at the virtual reference desk,” information research 9, no. 2 (2004), paper 171, http://informationr.net/ir/92/paper171.html. 62 mohd azmi, noorhidawati, and yanti idaya aspura, “librarians’ behavioral performance.” 63 tsang and chiu, “effectiveness of virtual reference services.” 64 mohd azmi, noorhidawati, and yanti idaya aspura, “librarians’ behavioral performance.” 65 tsang and chiu, “effectiveness of virtual reference services.” 66 dessy harisanty et al., “leaders, practitioners and scientists’ awareness of artificial intelligence in libraries: a pilot study,” library hi tech, (2022), ahead of print, https://doi.org/10.1108/lht10-2021-0356. https://doi.org/10.1016/j.acalib.2020.102253 https://doi.org/10.1016/j.acalib.2020.102188 https://doi.org/10.6017/ital.v40i3.13117 https://doi.org/10.1108/lht-10-2021-0342 https://doi.org/10.1108/lht-10-2021-0342 http://informationr.net/ir/9-2/paper171.html http://informationr.net/ir/9-2/paper171.html https://doi.org/10.1108/lht-10-2021-0356 https://doi.org/10.1108/lht-10-2021-0356 information technology and libraries september 2022 perceived quality of whatsapp reference service | guo, lam, chiu, and ho 17 67 ansari and tripathi, “use of whatsapp”; sudharani and nagaraju, “whatsapp usage.” 68 leung et al., “user perceptions”; tsang and chiu, “effectiveness of virtual reference services.” 69 ernest tak hei lam, cheuk hang au, and dickson k. w. chiu, “analyzing the use of facebook among university libraries in hong kong,” the journal of academic librarianship 45, no. 3 (2019): 175–83, https://doi.org/10.1016/j.acalib.2019.02.007; foley, “instant messaging reference”; tsang and chiu, “effectiveness of virtual reference services.” 70 lam, au, and chiu, “analyzing the use of facebook”; tsang and chiu, “effectiveness of virtual reference services.” 71 cheng, lam, and chiu, “social media as a platform.” 72 tsang and chiu, “effectiveness of virtual reference services.” 73 apple hiu ching lam, kevin k. w. ho, and dickson k. w. chiu, “instagram for student learning and library promotions? a quantitative study using the 5e instructional model,” aslib journal of information management, (2022), in press, https://doi.org/10.1108/ajim-12-2021-0389. 
editorial board thoughts
public libraries respond to the covid-19 pandemic, creating a new service model
jon goddard
information technology and libraries | december 2020. https://doi.org/10.6017/ital.v39i4.12847
jon goddard (jgoddard@northshorepubliclibrary.org) is a librarian at the north shore public library, and a member of the ital editorial board. © 2020.
during the covid-19 pandemic, public libraries have demonstrated, in many ways, their value to their communities. they have enabled their patrons not only to resume their lives but also to learn and grow. additionally, electronic resources offered to patrons through their library card have allowed people to be educated and entertained. the credit must go to the librarians, who initially fueled, and have maintained, this level of service by re-writing the rules, creating a new service model.
once libraries closed, librarians promoted ebooks and other important platforms available to patrons with their library cards. the result: the checkout of ebooks and the use of these platforms rose exponentially. community engagement became completely virtual with librarians, and those who provide library programs to the public, providing services on platforms that they may or may not have heard of, such as zoom and discord. as libraries re-opened, many offered real-time reference services, as well as seamless and contactless curbside service, providing a sense of control and continuity amongst the chaos.
exponential increases in electronic resource usage
overdrive, which is currently used by nearly 90% of public libraries in the united states to manage both ebook and audiobook collections, saw an exponential increase in its usage. since the lockdown began in mid-march, the daily average for ebook checkouts has been consistently 52% above pre-covid periods. additionally, new users to the platform have been consistently double and triple 2019 highs.1 library staff have been helping readers during this time to ensure they obtain access with their devices. in suffolk county, new york, where new patron registration to overdrive is up 72% from last year (as of august 2020), there has been no shortage of requests for help.2
with kids being home from school and learning virtually, it is no surprise that ebook readership skyrocketed amongst ya and juvenile readers with an 87% increase from last year.3 to help them with their homework and studies, families turned to online tutoring.
in suffolk county, new york, the usage of the brainfuse online live tutoring service has been consistently up by nearly 50% during the school closures.4 gale, a cengage company, which offers miss humblebee's academy, a virtual learning program for preschoolers, saw its user sessions increase by 100% from the previous year.5 mailto:jgoddard@northshorepubliclibrary.org information technology and libraries december 2020 public libraries respond to the covid-19 pandemic | goddard 2 adults, also eager to learn new skills, took to online courses as well. gale courses saw a 50% increase in enrollments from march-july from the previous year. likewise, gale presents: udemy, which offers on-demand video courses, saw just over 21,000 course enrollments from marchjune.6 to help those who did not have sufficient broadband wifi to use these necessary resources and platforms, many libraries left their wifi on even when the building was closed to allow access to those in the vicinity of the building. in addition, many libraries purchased wifi hotspots to lend to their patrons. according to pew research, approximately 25% of households do not have a broadband internet connection at home.7 while public libraries cannot provide the only local solution to this gap, here are other steps libraries have been taking during the shutdown: • strengthening wireless signals so people can access wireless from outside library buildings. • hosting drive-up wifi hotspot locations. • partnering with schools to obtain and distribute wifi hotspots to families in need. community engagement virtually community engagement has been vital since the covid-19 lockdown. both librarians and those who provide library programs to the public had to quickly adjust to the virtual world in which we were suddenly living. using a mixture of social media platforms, including facebook live and stories, discord, instagram, youtube, zoom, and gotomeeting, librarians flocked to the internet, providing a wide range of programming. even those libraries that did not previously have any virtual programs managed to very quickly provide quality programs to their patrons. virtual programming was not available at the san josé public library (sjpl) prior to the shutdown. librarians quickly started to move programs online, including story time, and created a program called spring into reading, similar to the summer reading program, to continue to encourage families to read together. they also started a weekly recorded story time, so patrons could call the library and use their phones to hear a story. to date, sjpl has hosted over 2,000 virtual events since the lockdown began on march 17th.8 some libraries, like the oceanside library in new york, were offering virtual programs before the pandemic. when the library closed on march 13, the team started planning to move completely virtual. two days later, the library was offering four programs a day, including story times, book chats, and book clubs. by the end of the week, they were offering eight programs a day.9 in april, may, and june, they found book discussions and story times were the most popular programs. they then started to open their programs to people from out of state, partnering with other libraries. the result? program attendance has increased and several zoom meeting rooms have been maxed out.10 through the lockdown, library patrons have been exercising, listening to concerts, taking virtual vacations, learning new skills, cooking, playing games, and reducing stress. 
this incredible adaptation was only possible due to library workers' quick thinking and a never-ending determination to help.

delivering information and materials with a new service model
at the san josé public library (sjpl), which has over 500,000 library members, library staff had to quickly shift to a new online reality just after the shutdown. to help patrons get the most from their electronic resources, sjpl used libanswers to post faqs and email responses to their issues and questions. when a librarian was available, patrons could use libchat to ask questions in real time. because no one was in the library buildings to answer phones, libanswers and libchat became the only way the public could communicate with staff. chat reference conversations increased roughly fourfold, from approximately 40 chat sessions per day to 160 per day. the chat service was also made available in spanish, vietnamese, and chinese. when the library implemented its express pickup service, sjpl utilized the spaces functionality in libcal to allow patrons to create pickup appointments. when patrons arrived at the library for their appointment, the sms functionality in libanswers allowed them to text staff upon arrival. through the city of san josé's sj access initiative, which aims to help bridge the digital divide in the city, sjpl worked closely with other city departments and the santa clara county office of education to purchase approximately 16,000 high-speed at&t hotspots for students and the public.11

working towards the new normal
the american library association (ala) is committed to advocating strongly for libraries on several different fronts. thanks to thousands of advocacy communications with congress, libraries secured $50 million for the institute of museum and library services (imls) in the coronavirus aid, relief, and economic security (cares) act. this enabled libraries and museums to apply for grants during this time of need.12 in addition, the ala is currently advocating for the passage of the library stabilization fund act (s.4181 / h.r.7486) to allow libraries to retain staff, maintain services, and safely keep communities connected and informed. the legislation calls for $2 billion in emergency recovery funding for america's libraries through the institute of museum and library services (imls).13

while the ala is rightly advocating for these emergency funds, public librarians and administrators should take advantage of this time to strategically review what has been put into place to react to the covid-19 pandemic, and plan for the long term. while it is true that libraries are physical spaces, they are also technology-driven services for learning and connections for all ages. additionally, they have shown that due to this new service model, access has exponentially expanded to new patrons, showing tremendous value when it comes to education and engagement. this new service model should be preserved. programs that engage our communities should be both physical and virtual. physical media and books should be provided both at the circulation desk and through a contactless service. reference services should be provided both at the reference desk and through chat reference. this must be our new normal.
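the express pickup flow described earlier (a patron books a pickup slot, then texts on arrival and staff bring the items out) is simple enough to prototype outside any particular product. the sketch below is a generic python illustration of that flow under the assumption of a hypothetical sms gateway; the class and function names are invented for illustration and this is not the libcal or libanswers api.

```python
# a minimal, generic sketch of a curbside "express pickup" flow:
# a patron books a pickup slot, then texts on arrival and staff are notified.
# this is not the libcal/libanswers api; all names here are hypothetical.
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

@dataclass
class PickupAppointment:
    patron_phone: str
    slot: datetime
    items: List[str]
    arrived: bool = False

class PickupQueue:
    def __init__(self) -> None:
        self.appointments: Dict[str, PickupAppointment] = {}

    def book(self, phone: str, slot: datetime, items: List[str]) -> None:
        # one active appointment per phone number keeps the example simple
        self.appointments[phone] = PickupAppointment(phone, slot, items)

    def handle_inbound_sms(self, phone: str, text: str) -> str:
        # a real deployment would receive this via an sms gateway webhook
        appt = self.appointments.get(phone)
        if appt is None:
            return "no pickup appointment found for this number."
        if "here" in text.lower():
            appt.arrived = True
            self._notify_staff(appt)
            return "thanks! staff will bring your items out shortly."
        return "reply 'here' when you arrive at the pickup area."

    def _notify_staff(self, appt: PickupAppointment) -> None:
        # stand-in for a staff dashboard alert or printed paging slip
        print(f"[staff alert] {appt.patron_phone} arrived for {len(appt.items)} item(s).")

if __name__ == "__main__":
    q = PickupQueue()
    q.book("555-0100", datetime(2020, 10, 14, 15, 30), ["holds: 2 books"])
    print(q.handle_inbound_sms("555-0100", "i'm here"))
```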
endnotes
1 david burleigh, director, brand marketing & communication at overdrive, phone conversation with author, october 9, 2020.
2 maureen mcdonald, special projects supervisor at the suffolk cooperative library system, phone conversation, september 14, 2020.
3 burleigh.
4 mcdonald.
5 kayla siefker, head of media & public relations at gale, a cengage company; brian risse, vp of sales – public libraries; muna sharif, product manager, discovery & analytics; phone conversation with author, october 16, 2020.
6 siefker.
7 pew research center, "internet/broadband fact sheet," june 12, 2019, accessed october 13, 2020, https://www.pewresearch.org/internet/fact-sheet/internet-broadband/.
8 laurie willis, web services at sjpl, phone conversation with author, october 14, 2020.
9 erica freudenberger, "programming through the pandemic," library journal, may 22, 2020, https://www.libraryjournal.com/?detailstory=programming-through-the-pandemic-covid-19.
10 tony iovino, assistant director for community services at the oceanside library, phone conversation with author, october 19, 2020.
11 willis.
12 american library association, "advocacy & policy," accessed october 15, 2020, http://www.ala.org/tools/covid/advocacy-policy.
13 ibid.

researchgate metrics' behavior and its correlation with rg score and scopus indicators: a combination of bibliometric and altmetric analysis of scholars in medical sciences
saeideh valizadeh-haghi, hamed nasibi-sis,* maryam shekofteh, and shahabedin rahmatizadeh
information technology and libraries | march 2022
https://doi.org/10.6017/ital.v41i1.14033
saeideh valizadeh-haghi (saeideh.valizadeh@gmail.com) is assistant professor, department of medical library and information sciences, school of allied medical sciences, shahid beheshti university of medical sciences, tehran, iran. *corresponding author. hamed nasibi-sis (nasibi.lib@gmail.com) is msc graduate, department of medical library and information sciences, school of allied medical sciences, shahid beheshti university of medical sciences. maryam shekofteh (shekofteh_m@yahoo.com) is associate professor, department of medical library and information sciences, school of allied medical sciences, shahid beheshti university of medical sciences. shahabedin rahmatizadeh (shahab.rahmatizadeh@gmail.com) is assistant professor, department of health information technology and management, school of allied medical sciences, shahid beheshti university of medical sciences, tehran, iran. © 2022.

abstract
objective: social networking sites are appropriate tools for sharing and exposing scientific works to increase citations. the objectives of the present study are to investigate the activity of iranian scholars in the medical sciences in researchgate and to explore the effect of each of the four researchgate metrics on the rg score.
moreover, the citation metrics of the faculty members in scopus and the relationship between these metrics and the rg score were explored.
methods: the study population included all sbmu faculty members who have profiles in researchgate (n=950). the data were collected through researchgate and scopus in january 2021. the spearman correlation coefficient was applied to examine the relationship between researchgate metrics and scopus indicators as well as to determine the effect of each researchgate metric on the rg score.
results: the findings revealed that the publication sharing metric had the highest correlation (0.918) with the rg score and had the greatest impact on it (p-value <0.001), while the question asking metric showed the lowest correlation (0.11). moreover, there was a significant relationship between the rg score and scopus citation metrics (p-value <0.05). furthermore, all four rg metrics had a positive and significant relationship with scopus indicators (p-value <0.05), in which the number of shared publications had the highest correlation compared to other rg metrics.
conclusion: researchers' participation in the researchgate social network is effective in increasing citation indicators. therefore, more activity in the researchgate social network may have favorable results in improving universities' rankings.

introduction
conducting any scientific activity first requires gaining knowledge of previous relevant research and citing those sources. there is often a content link between these activities and the sources cited.1 typically, receiving citations is essential and valuable for researchers because, on the one hand, citations contribute to the career advancement and promotion of researchers and, on the other hand, researchers intend to have a greater impact on science by receiving more citations. to increase the citation rate of scholars' research, these works should be shared with other researchers and made visible to them using appropriate tools.2 in this respect, academic social network sites are appropriate tools for sharing and exposing scientific works to increase citation rates.3 academic social network sites have brought researchers together regardless of time and space constraints and have facilitated scientific communication and information exchange.4 in addition, researchers can use these networks to pursue their common interests with other users.5 various studies indicate that sharing publications through social networks, and the visibility that follows, increases the citation rate of these works by more than 50%.
it has also been observed that journal articles which are shared through these networks have received more citations than other articles in the same journals.6 one academic social network site for the exchange of scientific information is researchgate, which authors can use to cooperate with researchers in all scientific disciplines.7 through this network, researchers‘ scientific works will have better visibility by other people.8 to use this network, users must initially create their profile and then perform scientific activities.9 the researchers’ activity level in this network is indicated by the rg score, which is determined based on four individual metrics, including the number of shared publications, the researcher’s activity in asking questions, the researcher’s activity in answering other people’s questions, and the number of followers. the individual rg metrics affect researchers’ rg score, but the extent of individual metrics impact on this score is not clear.10 shahid beheshti university of medical sciences (sbmu) is one of the top universities in iran. according to the evaluation of medical universities’ research activities in the webometrics ranking of world universities, sbmu has achieved the fourth rank among iran’s medical universities.11 in the centre for science and technology studies (cwts) leiden ranking, this university is ranked 11th among iranian universities and 646th among world universities.12 faculty members are one of the main components in universities’ educational structure and play a crucial role in generating, conducting, and disseminating knowledge. due to the importance of citations of faculty members’ scientific works in ranking systems and the situation of sbmu in world rankings, it seems that measures should be taken to improve the citations of faculty members of this university as one of the ways to improve the ranking of the university. considering that more than half of the published articles never receive citations, as well as the positive role of research sharing on social networks in increasing the number of citations, it seems that the activities of sbmu faculty members in the researchgate network may increase the citations count to their research and, consequently, improve the university’s ranking.13 however, to date, no research has been carried out on the activity of the faculty members of sbmu in researchgate. literature review various studies have addressed researchers’ activity in the researchgate academic network. the level of researchers’ activities in researchgate and the relationship between citation metrics and rg score are among the topics that have been investigated. information technology and libraries march 2022 researchgate metrics’ behavior | valizadeh-haghi, nasibi-sis, shekofteh, and rahmatizadeh 3 regarding researchers’ activity in researchgate, numerous studies have been carried out. among these, the research of nikkar, rahmani, lui, sheeja, muhammad yousuf ali, mahajan, and joshi can be mentioned.14 nikkar et al. conducted a study to investigate surgical researchers’ activities in researchgate, which revealed that the majority of these researchers (86.24%) are active members in researchgate.15 rahmani et al. identified the activity of faculty members of technical colleges in researchgate, which showed that most of these researchers (64.16%) were active members of this network.16 the study by sheeja et al. 
of naval architecture engineering researchers at researchgate revealed that most of them (65%) have a researchgate profile.17 the study by muhammad yousuf ali, titled “altmetrics of pakistani library and information science researchers at researchgate,” indicated that 75.73% of researchers have a researchgate profile.18 in contrast, in studies conducted by mahajan et al., joshi et al., and lui et al., findings showed that less than half of the surveyed researchers are active users of this network.19 in addition to measuring activity in researchgate, some other studies also examined the relationship between the rg score and citation indicators. in this regard, in a study by joshi et al., it was revealed that there is a significant relationship between the rg score and citation metrics.20 shrivastava et al. also conducted an analysis of researchgate profiles of panjab university lecturers.21 the results demonstrated that there is a significant relationship between rg and citation metrics. a study conducted by naderbeigi et al. showed that there is a significant relationship between activity on the researchgate network, rg score, and scopus metrics of the faculty members of sharif university of technology.22 the allied medical science scientists’ activity in researchgate was examined by valizadeh-haghi et al.23 the study revealed that there is a significant relationship between rg scores and scopus indicators. correspondingly, the findings showed that there is a significant relationship between lecturers’ academic ranking and their rg scores as well as scopus indicators. according to the literature, it seems that the effect of each of the individual metrics of researchgate on the rg score has not so far been studied. researchgate also has not officially specified the impact of any of its individual metrics on the rg score, while researchers’ awareness of this impact may affect their activity behavior in any of the individual metrics to increase their rg score. previous studies also show that limited research has been conducted in iran regarding faculty members’ activity in researchgate. accordingly, none of these studies has investigated the activity of all faculties of a university. therefore, the objectives of the present study include (1) investigating the activity of sbmu faculty members in researchgate, (2) investigating the effect of each of the four individual researchgate metrics on the rg score of the faculty members, (3) determining the citation metrics of the faculty members in scopus, and (4) the relationship between individual rg metrics and the faculty members’ scopus citation metrics. material and methods the present altmetrics study population included all sbmu faculty members who have profiles in researchgate (n=950). information technology and libraries march 2022 researchgate metrics’ behavior | valizadeh-haghi, nasibi-sis, shekofteh, and rahmatizadeh 4 to extract the number as well as the name of the faculty members, we used the iranian scientometrics information database, which is developed by the deputy of research and technology of ministry of health and medical education of iran.24 the number of faculty members in this system was 1,430, of which 950 had profiles in researchgate and were examined. the data regarding rg score were collected through direct observation of their profiles in researchgate. 
the rg score includes four metrics: number of shared publications, researcher's activity in asking questions, researcher's activity in answering other people's questions, and followers. data related to the number of citations and the h-index of each of the lecturers were collected by viewing their profiles in the scopus database in january 2021. in this study, it was assumed that there is a significant relationship between researchgate individual metrics and scopus citation metrics. given that the data were not normally distributed, the spearman correlation coefficient was used to examine this relationship. moreover, because the impact of each of the researchgate metrics on the rg score has not been officially determined, the spearman correlation coefficient was also applied to determine the effect of the individual metrics of researchgate on the rg score of the participants. the collected data were analyzed using excel and spss version 18 software.

results
the rg score of the faculty members is shown in table 1. all faculty members had an rg score, and most of the faculty members (79.5%) had an rg score of less than one. the average rg score of participants was 15.88.

table 1. rg score of the sbmu faculty members
rg score | members | % | mean | min | max | sd | median
<1 | 55 | 5.79 | 0.01 | 0 | 0.59 | 0.08 | 0
1-11 | 300 | 31.58 | 6.18 | 1.13 | 10.98 | 2.89 | 6.56
11-21 | 297 | 31.26 | 16.09 | 11 | 20.98 | 2.83 | 15.92
21-31 | 209 | 22 | 25.24 | 21.06 | 30.97 | 2.79 | 25.25
31-41 | 82 | 8.63 | 34.9 | 31.02 | 40.32 | 2.56 | 34.5
41-51 | 6 | 0.63 | 42.88 | 41.46 | 45.84 | 1.57 | 42.34
51-61 | 1 | 0.11 | 56.49 | 56.49 | 56.49 | | 56.49
total | 950 | 100 | 15.88 | 0 | 56.49 | 10.42 | 15.05

the findings show that most of the faculty members have shared their publications in researchgate; only 4.11% of them are not active in sharing their publications (see table 2).

table 2. number of shared publications of the sbmu faculty members
publications | members | % | mean | min | max | sd | median
0 | 39 | 4.11 | 0 | 0 | 0 | 0 | 0
1-50 | 689 | 72.53 | 19.70 | 1 | 50 | 13.12 | 17
51-100 | 145 | 15.26 | 69.83 | 51 | 100 | 13.24 | 68
101-150 | 36 | 3.79 | 121.28 | 101 | 147 | 13.50 | 118
151-200 | 23 | 2.42 | 174.04 | 153 | 199 | 14.66 | 172
201-250 | 10 | 1.05 | 222.80 | 203 | 243 | 14.29 | 226
251-300 | 5 | 0.53 | 262.40 | 251 | 275 | 11.13 | 264
401-450 | 1 | 0.11 | 402 | 402 | 402 | | 402
501-550 | 1 | 0.11 | 522 | 522 | 522 | | 522
801-850 | 1 | 0.11 | 824 | 824 | 824 | | 824
total | 950 | 100 | 39.32 | 0 | 824 | 54.73 | 23

the findings indicate that most faculty members (94.42%) did not have any activity in asking questions. the highest level of activity in this metric was performed by 0.11% of faculty members (see table 3).

table 3. the faculty members' activity in asking questions
questions | members | % | mean | min | max | sd | median
0 | 897 | 94.42 | 0 | 0 | 0 | 0 | 0
1-10 | 51 | 5.37 | 1.73 | 1 | 9 | 1.58 | 1
20-30 | 1 | 0.11 | 28 | 28 | 28 | | 28
41-50 | 1 | 0.11 | 46 | 46 | 46 | | 46
total | 950 | 100 | 0.17 | 0 | 46 | 1.82 | 0

table 4. faculty members' activity in answering questions
answers | members | % | mean | min | max | sd | median
0 | 840 | 88.42 | 0 | 0 | 0 | 0 | 0
1-5 | 91 | 9.58 | 2.08 | 1 | 5 | 1.34 | 2
6-10 | 10 | 1.05 | 7 | 6 | 9 | 1.15 | 7
11-15 | 3 | 0.32 | 13.33 | 12 | 15 | 1.53 | 13
16-20 | 2 | 0.21 | 16 | 16 | 16 | 0 | 16
21-25 | 1 | 0.11 | 25 | 25 | 25 | | 25
31-35 | 1 | 0.11 | 31 | 31 | 31 | | 31
41-45 | 1 | 0.11 | 41 | 41 | 41 | | 41
216-220 | 1 | 0.11 | 218 | 218 | 218 | | 218
total | 950 | 100 | 0.68 | 0 | 218 | 7.43 | 0

additionally, in answering other researchers' questions, most faculty members (88.42%) did not have any activity.
the highest level of activity in this metric was done by 0.11% of them (see table 4). the findings demonstrated that most faculty members had followers, and only 0.74% had no followers (see table 5).

table 5. the number of followers of sbmu faculty members
followers | members | % | mean | min | max | sd | median
0 | 7 | 0.74 | 0 | 0 | 0 | 0 | 0
1-50 | 654 | 68.84 | 21.51 | 1 | 50 | 13.58 | 20
51-100 | 146 | 15.37 | 69.36 | 51 | 99 | 13.76 | 65.5
101-150 | 66 | 6.95 | 121 | 102 | 150 | 14.53 | 119
151-200 | 39 | 4.11 | 171.92 | 151 | 198 | 14.38 | 169
201-250 | 20 | 2.11 | 223.60 | 202 | 246 | 13.29 | 223.5
>250 | 18 | 1.89 | 391.44 | 253 | 891 | 181.72 | 338
total | 950 | 100 | 53.05 | 0 | 891 | 72.17 | 31

the correlation between rg metrics and rg score was examined using the spearman correlation test. the findings showed that the publication sharing metric had the highest correlation (0.918) with the rg score; therefore, it had the greatest impact on the rg score (p-value <0.001). the question asking metric had the lowest correlation (0.11) with the rg score (see table 6).

table 6. the correlation between researchgate metrics and rg score of faculty members
rg score | publication | followers | question | answers
correlation coefficient | 0.918 | 0.774 | 0.11 | 0.185
p-value | < .001 | < .001 | .001 | < .001

the number of citations of the faculty members in the scopus database is presented in table 7. the findings showed that most faculty members had citations, and only 5.16% of them had not received any citations.

table 7. number of citations of sbmu faculty members in scopus
citations | members | % | mean | min | max | sd | median
0 | 49 | 5.16 | 0 | 0 | 0 | 0 | 0
1-500 | 771 | 81.16 | 108.91 | 1 | 495 | 114.80 | 67
501-1000 | 68 | 7.16 | 694.06 | 503 | 997 | 148.87 | 690
1001-1500 | 29 | 3.05 | 1213.66 | 1016 | 1481 | 131.55 | 1180
1501-2000 | 14 | 1.47 | 1771.14 | 1594 | 1995 | 131.41 | 1741
2001-2500 | 9 | 0.95 | 2175.56 | 2029 | 2446 | 137.10 | 2192
2501-3000 | 2 | 0.21 | 2747 | 2686 | 2808 | 86.27 | 2747
3001-3500 | 2 | 0.21 | 3207.5 | 3007 | 3408 | 283.55 | 3207.5
3501-4000 | 2 | 0.21 | 3767 | 3582 | 3952 | 261.63 | 3767
4001-4500 | 1 | 0.11 | 4459 | 4459 | 4459 | | 4459
4501-5000 | 1 | 0.11 | 4581 | 4581 | 4581 | | 4581
6501-7000 | 1 | 0.11 | 6907 | 6907 | 6907 | | 6907
19001-19500 | 1 | 0.11 | 19272 | 19272 | 19272 | | 19272
total | 950 | 100 | 279.37 | 0 | 19272 | 817.14 | 83

the findings indicated that most faculty members had an h-index in scopus, and the mean of their h-index was 6.46 (see table 8).

table 8. h-index of sbmu faculty members
h-index | members | % | mean | min | max | sd | median
0 | 49 | 5.16 | 0 | 0 | 0 | 0 | 0
1-10 | 732 | 77.05 | 4.54 | 1 | 10 | 2.57 | 4
11-20 | 137 | 14.42 | 14.38 | 11 | 20 | 2.81 | 14
21-30 | 29 | 3.05 | 24.41 | 21 | 30 | 2.71 | 24
31-40 | 2 | 0.21 | 35.50 | 31 | 40 | 6.36 | 35.5
61-70 | 1 | 0.11 | 63 | 63 | 63 | | 63
total | 950 | 100 | 6.46 | 0 | 63 | 5.96 | 5

the correlation between researchgate indicators and scopus citation metrics is presented in table 9.

table 9. correlation between researchgate indicators and scopus citation metrics
metric | h-index correlation coefficient | h-index p-value | citation correlation coefficient | citation p-value
rg score | 0.803 | < .001 | 0.791 | < .001
publication | 0.735 | < .001 | 0.715 | < .001
question | 0.09 | .006 | 0.076 | .019
answers | 0.147 | < .001 | 0.119 | < .001
followers | 0.694 | < .001 | 0.676 | < .001

the findings showed a positive and significant relationship between the rg score and scopus citation metrics (p-value <0.05). additionally, all four rg metrics had a positive and significant relationship with scopus citation metrics, including citations and the h-index (p-value <0.05).
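as a brief aside, the nonparametric test reported in tables 6 and 9 is straightforward to reproduce on any pair of metric vectors. the sketch below uses scipy's spearmanr on invented values (not the study's data) and adds a small helper showing how an h-index is derived from a citation list.

```python
# minimal sketch of the spearman rank correlation used for tables 6 and 9.
# the vectors below are invented illustrations, not the study's data.
from scipy.stats import spearmanr

# paired observations for a handful of hypothetical researchers
shared_publications = [12, 45, 3, 80, 27, 0, 150, 9]
rg_score            = [6.2, 18.5, 1.1, 24.9, 11.3, 0.0, 33.7, 4.8]

rho, p_value = spearmanr(shared_publications, rg_score)
print(f"spearman rho = {rho:.3f}, p-value = {p_value:.4f}")
# the same call works for any rg metric against a scopus indicator
# (e.g., followers vs. h-index); no normality assumption is required,
# which is why a rank-based test suits these skewed count data.

def h_index(citations):
    """largest h such that the author has h papers with at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

print(h_index([10, 8, 5, 4, 3]))  # prints 4
```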
the findings showed that the number of shared publications had the highest correlation with citations (0.715) and h-index (0.735) compared to other rg metrics. discussion researchgate’s mission is to link the academic world and make research accessible to all scholars. this study has compared the rg metrics of sbmu faculty members. the major findings are highlighted and discussed around the four research questions of this study. the findings of the present study revealed that even though more than half of the faculty members have profiles in researchgate and are active in this network, compared to the findings of other studies, such as those of yousuf ali, janmohammadi, rahmani and nikkar, this rate is low.25 this issue may be due to the lack of knowledge and familiarity of faculty members with the researchgate social network or the lack of the need to publish outputs on the researchgate social network, which needs further investigation. the present study results also showed that the mean rg score of sbmu faculty members is similar to the results of other studies conducted in iran and other international studies.26 the current study results indicated that the subjects’ activity in the four rg metrics was slight in some indicators, and the highest activity was related to publications metric. the lowest level of members’ activity was related to the asking-questions metric. considering that the rg score results from the scores obtained by the researcher in the four rg metrics, this study’s research results confirm that the faculty members did not pay enough attention to the activity in all the rg metrics. the present findings showed that faculty members have limited activities in sharing publications, which is aligned with the results from other studies.27 this could be due to several reasons. firstly, young faculty members who have recently joined the university as faculty members may have fewer publications in comparison with older members. another reason may be that faculty members who have recently joined the researchgate social network have not had enough time to share all of their publications. it should be noted that sharing publications on researchgate has massive ramifications for the open access movement. it might be that one of the reasons authors do not publish on rg is because they do not have the rights to do so. in this regard, it is worthy to mention that the publication-sharing metric includes both full-text sharing and/or abstract information technology and libraries march 2022 researchgate metrics’ behavior | valizadeh-haghi, nasibi-sis, shekofteh, and rahmatizadeh 9 sharing in which sharing the abstract is legal. so, researchers were free to share their abstracts but they have not done it. the findings also show that most of the sbmu faculty members have no activity in two metrics: asking questions and answering questions. compared to other studies, the activity of sbmu faculty members in these metrics is at a lower level.28 possible reasons for this could be a lack of awareness of the importance of these metrics to increase the rg score, a lack of english language proficiency to participate in asking and answering questions, and a lack of time. however, more research is needed in this area. the results showed that most of the sbmu faculty members have followers. the mean number of followers of the faculty members is similar to what was found in other studies.29 as the number of followers increases, a person’s popularity in their subject area increases. 
they may even be influenced by the researcher’s studies in other areas and follow the researcher’s activities in researchgate and, with the increase of followers, there is a possibility of increasing the citation rate.30 therefore, it is recommended to elaborate on the importance and role of each of the rg metrics in raising the rg score through posters, workshops, and educational brochures for faculty members. in this study, the correlation between each of the rg metrics and rg score was examined using the spearman correlation test. the present study results revealed that the shared publication and number of followers metrics have a stronger correlation with the rg score compared to the metrics of questions and answers. the results also showed a significant correlation between all rg metrics and the rg score of sbmu faculty members. the study results indicated that most of the sbmu faculty members have citations in the scopus database and have an h-index, but most of them received the least number of citations. according to the present study findings, the subjects have little activity in the researchgate social network. as one of the possible reasons for the low number of citations, we can mention the low activity in the researchgate social network. research on surgeons’ publications has also confirmed this.31 nevertheless, there is a need for further research on the low number of citations of faculty members of sbmu. the present study’s findings demonstrated a significant relationship between the rg score and scopus citation metrics (h-index and number of citations). in this regard, the highest correlation was observed between the h-index and rg score (p-value = 0.803). this finding is consistent with other studies’ findings.32 there is also a significant relationship between each of the rg score metrics and scopus citation metrics. in this regard, the highest and lowest correlations with scopus citation metrics were observed between publication, questions, and answers metrics, respectively (p-value = 0.001). considering the positive relationship between each of the rg metrics and scopus citation metrics, it is suggested that faculty members pay enough attention to all of these metrics to help increase their citation indicators. due to the researchgate social network’s role in increasing the visibility of researchers’ scientific outputs, faculty members can consider the use of this network as one of the tools to increase the number of citations and the h-index. conclusion easy access to research outputs and increasing visibility is one of the most important features of researchgate, which, according to the results of this study, has a significant impact on increasing information technology and libraries march 2022 researchgate metrics’ behavior | valizadeh-haghi, nasibi-sis, shekofteh, and rahmatizadeh 10 the use and thus increasing citations. as the results revealed, researchers’ participation in the researchgate social network is effective in increasing citation indicators, including the number of citations and h-index. therefore, more activity in the researchgate social network, followed by receiving citations, can have favorable results in improving rankings for both research institutes and universities. universities can encourage faculty members to join and work in researchgate and other academic-social networks by considering privileges to improve their academic rank. 
libraries and research centers can explain the importance of faculty members' activities in these networks by holding workshops on altmetric indicators and academic social network sites, especially researchgate. they can also justify to researchers the benefits of using this network and sharing more scientific outputs.

declaration of interest: none
funding: this work was supported by the school of allied medical sciences, shahid beheshti university of medical sciences, tehran, iran [grant number 28727]. the research ethics committee has approved this research under the ethical code number ir.sbmu.retech.rec.1400.310.

endnotes
1 bart penders, "ten simple rules for responsible referencing," plos computational biology 14, no. 4 (2018), https://doi.org/10.1371/journal.pcbi.1006036; b. s. lancho barrantes et al., "citation flows in the zones of influence of scientific collaborations," journal of the american society of information science technology 63, no. 3 (2012): 481–89, https://doi.org/10.1002/asi.21682.
2 h. a. piwowar, r. s. day, and d. b. fridsma, "sharing detailed research data is associated with increased citation rate," plos one 2, no. 3 (2007): e308, https://doi.org/10.1371/journal.pone.0000308.
3 stefano bortoli, paolo bouquet, and themis palpanas, "social networking: power to the people," in papers presented in w3c workshop on the future of social networking position, january, barcelona (2009); brian kelly, "can linkedin and academia.edu enhance access to open repositories?", impact of social sciences blog, 2012, https://blogs.lse.ac.uk/impactofsocialsciences/2012/08/23/linkedin-academia-enhance-access-to-open-repositories/.
4 bortoli, bouquet, and palpanas, "social networking."
5 nicole muscanell and sonja utz, "social networking for scientists: an analysis on how and why academics use researchgate," online information review 41, no. 5 (2017): 744–59, https://doi.org/10.1108/oir-07-2016-0185.
6 stevan harnad, "publish or perish—self-archive to flourish: the green route to open access," ercim news 64 (2006), http://eprints.ecs.soton.ac.uk/11715/1/harnad-ercim.pdf.
7 vala ali rohani and siew hock ow, "eliciting essential requirements for social networks in academic environments," in 2011 ieee symposium on computers & informatics (ieee, 2011): 171–76, https://doi.org/10.1109/isci.2011.5958905.
8 elena giglia, "academic social networks: it's time to change the way we do research," european journal of physical and rehabilitation medicine 47, no. 2 (2011): 345–49.
9 rohani and hock ow, "eliciting."
10 peter kraker and elisabeth lex, "a critical look at the researchgate score as a measure of scientific reputation" (paper, quantifying and analysing scholarly communication on the web (ascw15), oxford, uk, june 30, 2015): 7–9, https://doi.org/10.5281/zenodo.35401.
11 webometrics ranking of world universities 2021, https://www.webometrics.info/en/asia/iran%20%28islamic%20republic%20of%29.
12 cwts leiden ranking 2018, https://www.leidenranking.com/ranking/2018/list.
13 richard van noorden, "the science that's never been cited," nature 552 (2017): 162–64, https://doi.org/10.1038/d41586-017-08404-0; rishabh shrivastava and preeti mahajan, "an altmetric analysis of researchgate profiles of physics researchers: a study of university of delhi (india)," performance measurement and metrics 18, no. 1 (2017): 52–66, https://doi.org/10.1108/pmm-07-2016-0033.
14 dennis h. lui et al., "contemporary engagement with social media amongst hernia surgery specialists," hernia 21, no. 4 (2017): 509–15, https://doi.org/10.1007/s10029-017-1609-8; n. k. sheeja and susan k. mathew, "researchgate profiles of naval architecture scientists in india: an altmetric analysis," library philosophy and practice (2019): 1–9, https://digitalcommons.unl.edu/libphilprac/2305/; muhammad yousuf ali and joanna richardson, "pakistani lis scholars' altmetrics in researchgate," program 51, no. 2 (2017): 152–69, https://doi.org/10.1108/prog-07-2016-0052; preeti mahajan, har singh, and anil kumar, "use of snss by the researchers in india: a comparative study of panjab university and kurukshetra university," library review 62, no. 8/9 (2013): 525–46; neil d. joshi et al., "social media in neurosurgery: using researchgate," world neurosurgery 127 (2019): e950–e956, https://doi.org/10.1016/j.wneu.2019.04.007; maliheh nikkar, rahim alijani, and hamid ghazizadeh khalifeh mahaleh, "investigation of the presence of surgery researchers in research gate scientific network: an altmetrics study," iranian journal of surgery 25, no. 2 (2017): 76–82; maryam rahmani et al., "rg score compared with h-index: a case study," sciences and techniques of information management 4, no. 2 (2018): 61–76, http://stim.qom.ac.ir/article_1139_en.html.
15 nikkar, alijani, and ghazizadeh khalifeh mahaleh, "investigation."
16 rahmani et al., "rg score."
17 sheeja and mathew, "researchgate."
18 yusuf ali and richardson, "pakistani."
19 mahajan, singh, and kumar, "use of snss"; joshi et al., "social media"; lui et al., "contemporary."
20 joshi et al., "social media."
21 rishabh shrivastava and preeti mahajan, "relationship amongst researchgate altmetric indicators and scopus bibliometric indicators: the case of panjab university chandigarh (india)," new library world (2015), https://doi.org/10.1108/nlw-03-2015-0017.
22 farahnaz naderbeigi and alireza isfandyari-moghaddam, "researchers' scientific performance in researchgate: the case of a technology university," library philosophy & practice (2018), https://digitalcommons.unl.edu/libphilprac/1752.
23 hamed nasibi-sis, saeideh valizadeh-haghi, and maryam shekofteh, "researchgate altmetric scores and scopus bibliometric indicators among lecturers," performance measurement and metrics 22, no. 1 (2020): 15–24, https://doi.org/10.1108/pmm-04-2020-0020.
24 iranian scientometrics information database, https://isid.research.ac.ir/, accessed december 26, 2021.
25 yusuf ali and richardson, "pakistani"; maryam janmohammadi, maryam rahmani, and zahra rootan, "review of rg indices and ranking of researchers in research gate: case study: faculty of veterinary medicine, university of tehran," in proceedings international interactive information retrieval conference (tehran, 2017); rahmani et al., "rg score"; nikkar, alijani, and ghazizadeh khalifeh mahaleh, "investigation."
26 rahmani et al., "rg score"; naderbeigi and isfandyari-moghaddam, "researchers"; janmohammadi, rahmani, and rootan, "review of rg"; shrivastava and mahajan, "an altmetric analysis"; shrivastava and mahajan, "relationship."
27 shrivastava and mahajan, "an altmetric analysis"; shrivastava and mahajan, "relationship"; janmohammadi, rahmani, and rootan, "review of rg."
28 shrivastava and mahajan, "an altmetric analysis"; shrivastava and mahajan, "relationship."
29 shrivastava and mahajan, "an altmetric analysis"; shrivastava and mahajan, "relationship"; naderbeigi and isfandyari-moghaddam, "researchers"; janmohammadi, rahmani, and rootan, "review of rg."
30 shrivastava and mahajan, "an altmetric analysis."
31 nikkar, alijani, and ghazizadeh khalifeh mahaleh, "investigation."
32 sheeja and mathew, "researchgate"; joshi et al., "social media"; shrivastava and mahajan, "relationship"; naderbeigi and isfandyari-moghaddam, "researchers."

an overview of the current state of linked and open data in cataloging
irfan ullah, shah khusro, asim ullah, and muhammad naeem
information technology and libraries | december 2018
irfan ullah (cs.irfan@uop.edu.pk) is doctoral candidate, shah khusro (khusro@uop.edu.pk) is professor, asim ullah (asimullah@uop.edu.pk) is doctoral student, and muhammad naeem (mnaeem@uop.edu.pk) is assistant professor, at the department of computer science, university of peshawar.

abstract
linked open data (lod) is a core semantic web technology that makes knowledge and information spaces of different knowledge domains manageable, reusable, shareable, exchangeable, and interoperable. the lod approach achieves this through the provision of services for describing, indexing, organizing, and retrieving knowledge artifacts and making them available for quick consumption and publication. this is also aligned with the role and objective of traditional library cataloging. owing to this link, major libraries of the world are transferring their bibliographic metadata to the lod landscape. some developments in this direction include the replacement of the anglo-american cataloging rules, 2nd edition, by resource description and access (rda) and the trend towards the wider adoption of bibframe 2.0. an interesting and related development in this respect is the discussion among knowledge resource managers and the library community on the possibility of enriching bibliographic metadata with socially curated or user-generated content.
the popularity of linked open data and its benefit to librarians and knowledge management professionals warrant a comprehensive survey of the subject. although several reviews and survey articles on the application of linked data principles to cataloging have appeared in literature, a generic yet holistic review of the current state of linked and open data in cataloging is missing. to fill the gap, the authors have collected recent literature (2014–18) on the current state of linked open data in cataloging to identify research trends, challenges, and opportunities in this area and, in addition, to understand the potential of socially curated metadata in cataloging mainly in the realm of the web of data. to the best of the authors’ knowledge, this review article is the first of its kind that holistically treats the subject of cataloging in the linked and open data environment. some of the findings of the review are: linked and open data is becoming the mainstream trend in library cataloging especially in the major libraries and research projects of the world; with the emergence of linked open vocabularies (lov), the bibliographic metadata is becoming more meaningful and reusable; and, finally, enriching bibliographic metadata with user-generated content is gaining momentum. conclusions drawn from the study include the need for a focus on the quality of catalogued knowledge and the reduction of the barriers to the publication and consumption of such knowledge, and the attention on the part of library community to the learning from the successful adoption of lod in other application domains and contributing collaboratively to the global scale activity of cataloging. introduction with the emergence of the semantic web and linked open data (lod), libraries have been able to make their bibliographic data publishable and consumable on the web, resulting in an increased understanding and utility both for humans and machines.1 additionally, the use of linked data principles of lod has allowed connecting related data on the web.2 traditional catalogs as mailto:cs.irfan@uop.edu.pk mailto:khusro@uop.edu.pk mailto:asimullah@uop.edu.pk mailto:mnaeem@uop.edu.pk current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 48 https://doi.org/10.6017/ital.v37i4.10432 collections of metadata about library content have served the same purpose for a long time.3 it is, therefore, natural to establish a link between the two technologies and exploit the capabilities of lod to enhance the power of cataloging services. in this regard, significant milestones have been achieved, which includes the use of linked and open data principles for publishing and linking library catalogs, bibframe, and europeana data model (edm).4 however, the potential of linked and open data for building more efficient libraries and the challenges involved in that direction are mostly unknown due to the lack of a holistic view of the relationship between cataloging and the lod initiative and the advances made in both areas. likewise, the possibility of enriching the bibliographic metadata with user-generated content such as ratings, tags, and reviews to facilitate the search for known-items as well as exploratory search has not received much attention. 5 some studies of preliminary extent have, however, appeared in literature an overview of which is presented in the following paragraphs. several survey and review articles have contributed to different aspects of cataloging in the lod environment. hallo et al. 
investigated how linked data is used in digital libraries, how the major libraries of the world implemented it, and how they benefit from it by focusing on the selected ontologies and vocabularies. 6 they identified several specific challenges to applying linked data to digital libraries. more specifically, they reviewed the linked data applications in digital libraries by analyzing research publications regarding the major national libraries (obtaining five-stars by following linked data principles) and published from 2012 to 2016.7 tallerås examined statistically the quality of linked bibliographic data published by the major libraries including spain, france, the united kingdom, and germany. 8 yoose and perkins presented a brief survey of lod uses under different projects in different domains including libraries, archives, and museums.9 by exploring the current advances in the semantic web, robert identified the potential roles of libraries in publishing and consuming bibliographic data and institutional research output as linked and open data on the web.10 gardašević presented a detailed overview of semantic web and linked open data from the perspective of library data management and their applicability within the library domain to provide a more open and integrated catalog for improved search, resource discovery, and access.11 thomas, pierre-yves, and bernard presented a review of linked open vocabularies (lov), in which they analyzed the health of lov from the requirements perspective of its stakeholders, its current progress, its uses in lod applications, and proposed best practices and guidelines regarding the promotion of lov ecosystem.12 they uncovered the social and technical aspects of this ecosystem and identified the requirements for the long-term preservation of lov data. vandenbussche et al. highlighted the features, components, significance, and applications of lov and identified the ways in which lov supports ontology & vocabulary engineering in the publication, reuse and data quality of lod.13 tosaka and park performed a detailed literature review of rda (2005–11) and identified its fundamental differences from aacr2, its relationship with the metadata standards, and its impact on metadata encoding standards, users, practitioners, and the training required.14 sprochi presented the current progress in rda, frbr (functional requirements for bibliographic records), and bibframe to predict the future of library metadata, the skills and knowledge required to handle it, and the directions in which the library community is heading. 15 gonzales identified the limitations of marc21 and the benefits of and challenges in adopting the bibframe information technology and libraries | december 2018 49 framework.16 taniguchi assessed bibframe 2.0 for the exchange and sharing of metadata created in different ways for different bibliographic resources.17 he discussed bibframe 1.0 from rda point of view.18 he examined bibframe 2.0 from the perspective of rda to uncover issues in its mapping to bibframe including rda expressions in bibframe, mapping rda elements to bibframe properties, and converting marc21 metadata records to bibframe metadata. 19 fayyaz, ullah, and khusro reported on the current state of lod and identified several prominent issues, challenges, and research opportunities. 
20 ullah, khusro, and ullah reviewed and evaluated different approaches for bibliographic classification of digital collections.21 by looking at the above survey and review articles, one may observe that these articles target a specific aspect of cataloging from the perspective of lod. the holistic analysis and a complete picture of the current state of cataloging in transiting to lod ecosystem are missing. this paper adds to the body of knowledge by filling this gap in the literature. more specifically, it attempts to answer the following research questions (rqs): rq01: how linked open data (lod) and vocabularies (lov) are transforming the digital landscape of library catalogs? rq02: what are the prominent/major issues, challenges, and research opportunities in publishing and consuming bibliographic metadata as linked and open data? rq03: what is the possible impact of extending bibliographic metadata with the usergenerated content and making it visible on the lod cloud? the first section of this paper answers rq01 by discussing the potential role of lod and lov in making library catalogs visible and reusable on the web. the second section answers rq02 by identifying some of the prominent issues, challenges, and research opportunities in publishing, linking, and consuming library catalogs as linked data. it also identifies specific issues in rda and bibframe from lod perspective and highlights the quality of lod-based cataloging. the third section answers rq03 by reviewing the state-of-the-art literature on the socially curated metadata and its role in cataloging. the last section concludes the paper followed by references cited in this article. the role of linked open data and vocabularies in cataloging the catalogers, librarians, and information science professionals have always been busy defining the set of rules, guidelines, and standards to record the metadata about knowledge artifacts accurately, precisely, and efficiently. the aacr2 are among the widely used rules and guidelines for cataloging. 
however, it has several issues with the nature of authorship, the relationships between bibliographic metadata, the categorization of format-specific resources, and the description of new data types.22 in an attempt to produce its revised version, aacr3, the cataloging community noticed that a new framework should be developed with the name of rda.23 based on frbr conceptual models, rda is a “flexible and extendible bibliographic framework” that supports data sharing and interoperability and is compatible with marc21 and aacr2.24 according to the rda toolkit, rda describes digital and non-digital resources by taking advantage of the flexibilities and efficiencies of modern information storage and retrieval technologies while at the same time is backward-compatible with legacy technologies used in conventional resource discovery and access applications.25 it is aligned with the ifla’s current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 50 https://doi.org/10.6017/ital.v37i4.10432 (international federation of library associations and institutions) conceptual models of authority and bibliographic metadata (frbr, frad [functional requirements for authority data], frsad [functional requirements for subject authority data]).26 rda accommodates all types of content and media in digital environments with improved bibliographic control in the realm of linked and open data; however, its responsiveness to user requirements needs further research.27 the discussion of the cataloging rules and guidelines stays incomplete without the metadata encoding standards and formats that give practical shape to these rules in the form of library catalogs. the most common encoding formats include dublin core (dc) and marc21. dublin core (http://lov.okfn.org/dataset/lov/vocabs/dce) is a [general-purpose metadata encoding scheme and] vocabulary of fifteen properties with “broad, generic, and usable terms” for resource description in natural language. it is advantageous as it presents relatively low barriers to repository construction; however, it lacks in standards to index subjects consistently as well as to offer a uniform semantic basis necessary for an enhanced search experience.28 the lack of uniform semantic basis is due to the individual interpretations and exploitations of dc metadata by the libraries, which in turn originated from its different and independent implementations at the element level.29 marc21 is the most common machine process-able metadata encoding format for bibliographic metadata. 
it can be mapped to several formats including dc, marc/xml (http://www.loc.gov/standards/marcxml/), mods (http://www.loc.gov/standards/mods), mads (http://www.loc.gov/standards/mads), and other metadata standards.30 however, marc21 has several limitations: only library software and librarians understand it, it is semantically inexpressive and isolated from the web structure, and it lacks expressive semantic connections to relate different data elements in a single catalog record.31 besides its limitations, the marc metadata encoding format is vital for resource discovery, especially within the library environment, and therefore ways must be found to make library collections visible outside the libraries and available through the major web search engines.32 one such effort is from the library of congress (http://catalog.loc.gov/), which introduced a new bibliographic metadata framework, bibframe 2.0, which will eventually replace marc21 and allow the semantic web and linked open data to interlink bibliographic metadata from different libraries. other metadata encoding schemas and frameworks include schema.org, edm, and the international community for documentation (cidoc)'s conceptual reference model (cidoc-crm).33

today, bibliographic metadata records are available on the web in several forms, including marc21, online public access catalogs (opacs), bibliographic descriptions from online catalogs (e.g., the library of congress), online cooperative catalogs (e.g., oclc's worldcat program, https://www.oclc.org/en/worldcat.html), social collaborative cataloging applications (e.g., librarything, https://www.librarything.com), digital libraries (e.g., the ieee xplore digital library, https://ieeexplore.ieee.org/xplore/home.jsp, and the acm digital library, https://dl.acm.org), book search engines such as google books, and commercial databases such as amazon.com. most of these cataloging web applications use either marc or other legacy standards as metadata encoding and representation schemes. however, the majority of these applications are either considering or transiting to the emerging cataloging rules, frameworks, and encoding schemes so that the bibliographic descriptions of their holdings could be made visible and reusable as linked and open data on the web for the broader interests of libraries, publishers, and end-users.

the presence of high-quality reusable vocabularies makes the consumption of linked data more meaningful, which is made possible by linked open vocabularies (lov) that bring value-added extensions to the web of data.34 the following two subsections attempt to answer rq01 by highlighting how lod and lov are transforming the current digital landscape of cataloging.

linked and open data
the semantic web and linked open data have enabled libraries to publish and make visible their bibliographic data on the web, which increases the understanding and consumption of this metadata both for humans and machines.35 lod connects and relates bibliographic metadata on the web using linked data principles.36 publishing, linking, and consuming bibliographic metadata as linked and open data brings several benefits.
these include improvements in data visibility, linkage with different online services, interoperability through a universal lod platform, and credibility due to user annotations.37 other benefits include the semantic modeling of entities related to bibliographic resources; ease in transforming topics into skos; ease in the usage of linked library data in other services; better data visualization according to user requirements; linking and querying linked data from multiple sources; and improved usability of library linked data in other domains and knowledge areas.38 different users, including scientists, students, citizens, and other stakeholders of library data, can benefit from adopting lod in libraries.39 linked data has the potential to make bibliographic metadata visible, reusable, shareable, and exchangeable on the web with greater semantic interoperability among the consuming applications. several major projects, including bibframe, lodlam (linked open data in libraries, archives and museums [http://lodlam.net]), and ld4l (linked data for libraries [https://www.ld4l.org]), are in progress, which advocate for this potential.40 similarly, library linked data (lld) consists of lod-based bibliographic datasets, available in mods and marc21, that could be used in making search systems more sophisticated and may also be used in lov datasets to integrate applications requiring library and subject-domain datasets.41 bianchini and guerrini report on the current changes in the library and cataloging domains from ranganathan's point of view of the trinity (library, books, staff), which states that changes in one element of this trinity undoubtedly affect the others.42 they found that several factors, including readers, collections, and services, influence this trinity and emphasize the need for change:
• readers have moved from libraries to the web and want to save their time, but they also want many capabilities, including searching and navigating the full text of resources by following links. they want resources connected to similar and related resources. they want concepts interlinked so they can perform exploratory searches and find serendipitous results that fulfill their information needs.
• collections encompass several changes, from their production to dissemination and from search and navigation to the representation and presentation of content. the ways users access them and catalogers describe them are changing. their management is moving beyond the boundaries of the corresponding libraries to the broader landscape of open access and exposure to the lod environment.
• services are moving from bibliographic data silos to the semantic web. this means moving the bibliographic model to a more connected, linked data model and environment. the data is moving from bibliographic database management systems to a large lod graph, where millions of marc records are reused and converted to new encoding formats that are backward compatible with marc21, rda, and others and provide opportunities to be exploited fully by the linked and open data environment.
thinking along this direction, new cataloging rules and guidelines, such as rda, are making us a part of the growing global activity of cataloging. therefore, catalogers should take keen interest in and avail themselves of the opportunities that lie in linked and open data for cataloging.
otherwise, they (as a service) might be forgotten or removed from the trinity, i.e., from collections and readers.43 several major libraries have been actively working to make their bibliographic metadata visible and reusable on the web. the library of congress, through its linked data service (http://id.loc.gov), enables humans and machines to access its authority data programmatically.44 it exposes and interconnects data on the web through dereferenceable uniform resource identifiers (uris).45 its scope includes providing access to the commonly used loc standards and vocabularies (controlled vocabularies and data values) for the list of authorities and controlled vocabularies that loc currently supports.46 according to loc, the linked data service brings several benefits to users, including: accessing data at no cost; providing granular access to individual data values; downloading controlled vocabularies and their data values in numerous formats; enabling linking to loc data values within user metadata using linked data principles; providing a simple restful api and a clear license and usage policy for each vocabulary; accessing data across loc divisions through a unified endpoint; and visualizing relationships between concepts and values.47 however, to fully exploit the potential of lod, loc is mainly focusing on its bibframe initiative.48 bibframe is not only a replacement for the current marc21 metadata encoding format; it is a new way of thinking about how the available large amount of bibliographic metadata could be shared, reused, and made available as linked and open data.49 the bibframe 2.0 (https://www.loc.gov/bibframe/docs/bibframe2-model.html) model organizes information into work (details about the work), instance (work on specific subject quantity in numbers), item (format: print or electronic), and nature (copy/original work). bibframe 2.0 elaborates the roles of the persons involved in a specific work as agents, and the subject of the work as subjects and events.50 according to taniguchi, bibframe 2.0 takes the bibliographic metadata standards to linked and open data with a model and vocabulary that make cataloging more useful both inside and outside the library community.51 to achieve this goal, it needs to fulfill two primary requirements: (1) accepting and representing metadata created with rda by replacing marc21, and therefore serving as a means of creating, exchanging, and sharing rda metadata; and (2) accepting and accommodating descriptive metadata for bibliographic resources created by libraries, cultural heritage communities, and users for wide exchange and sharing. bibframe 2.0 should comply with the linked data principles, including the use of rdf and uris. in addition to the library of congress, oclc, through its linked data research, has also been actively involved in transforming and publishing its bibliographic metadata as linked data.52 under this program, oclc aims to provide a technical platform for the management and publication of its rdf datasets at a commercial scale. it models the key bibliographic entities, including work and person, and populates them with legacy and marc-based metadata. it extends these models to efficiently describe the contents of digital collections, art objects, and institutional repositories, which are not well described in marc.
it improves the bibliographic description of works and their translations. it manages the transition from marc and other legacy encoding formats to linked data and develops prototypes for native consumption of linked data to improve resource description and discovery. finally, it organizes teaching and training events.53 since 2012, oclc has been publishing bibliographic data as linked data with three major lod datasets, including oclc persons, worldcat works, and worldcat.org.54 inspired by google research, oclc has been working on a knowledge vault pipeline to harvest, extract, normalize, weigh, and synthesize knowledge from bibliographic records, authority files, and the web to generate linked data triples that improve the exploration and discovery experience of end-users.55 worldcat.org publishes its bibliographic metadata as linked data by extracting a rich set of entities, including persons, works, places, events, concepts, and organizations, to make possible several web services and functionalities for resource discovery and access.56 it uses schema.org (http://schema.org) as the base ontology, which can be extended with different ontologies and vocabularies to model worldcat bibliographic data to be published and consumed as linked data.57 tennant presents a simple example of how this works. suppose we want to represent the fact "william shakespeare is the author of hamlet" as linked data.58 to do this, the important entities should be extracted along with their semantics (relationships) and represented in a format that is both machine-processable and human-readable. using schema.org, the virtual international authority file (viaf.org), and worldcat.org, the sentence can be represented as a linked data triple, as shown in figure 1, based on tennant.59 the digital bibliography & library project (dblp) is an online computer science bibliography that provides bibliographic information about major publications in computer science, with the goal of providing free access to high-quality bibliographic metadata and links to the electronic versions of these publications.60 as of october 2018, it has indexed more than 4.3 million publications from more than 2.1 million authors, covering more than 40,000 journal volumes, 38,000 conference/workshop proceedings, and more than 80,000 monographs.61 its dataset is available as lod, which allows for faceted search and faceted navigation to the matching publications. it uses growbag graphs to create topic facets and uses the dblp++ dataset (an enhanced version of dblp) and additional data extracted from related webpages.62 a mysql database stores the dblp++ dataset, which is accessible in several ways: (1) getting the database dump; (2) using its web services; (3) using a d2r server to access it in rdf; and (4) getting the rdf dump available in n3 serialization.63 the above discussions on loc, oclc, and dblp make it clear that lod can potentially transform the cataloging landscape of libraries by making bibliographic metadata visible and reusable on the web. however, this potential can only be exploited to its fullest if relevant vocabularies are provided to make the linked data more meaningful. lov fulfills this demand for relevant and standard vocabularies, as discussed in the next subsection.
figure 1. an example of publishing a sample fact as linked data (based on tennant64).
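the fact shown in figure 1 can be sketched in a few lines of python with rdflib. the following is a hedged illustration in the spirit of tennant's example; the specific viaf and worldcat uris are assumptions and should be replaced with the authoritative identifiers.

# a hedged sketch of the fact "william shakespeare is the author of hamlet"
# as a linked data triple, in the spirit of figure 1. the viaf and worldcat
# uris are illustrative assumptions; look up the authoritative ones.
from rdflib import Graph, URIRef, Namespace

SCHEMA = Namespace("http://schema.org/")

hamlet = URIRef("http://worldcat.org/entity/work/id/1835352")   # assumed worldcat work uri
shakespeare = URIRef("http://viaf.org/viaf/96994048")           # assumed viaf uri

g = Graph()
g.bind("schema", SCHEMA)
g.add((hamlet, SCHEMA.author, shakespeare))

print(g.serialize(format="turtle"))
# the triple reads: <hamlet work> schema:author <shakespeare> .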
linked open vocabularies
linked open vocabularies (lov) are a "high-quality catalog of reusable vocabularies to describe linked and open data."65 they assist publishers in choosing the appropriate vocabulary to efficiently describe the semantics (classes, properties, and data types) of the data to be published as linked and open data.66 lov interconnects vocabularies, supports version control, matches the property types of values against a query to increase the score of the terms, and offers a range of data access methods including apis, a sparql endpoint, and data dumps. the aim is to make the reuse of well-documented vocabularies possible in the lod environment.67 the lov portal brings value-added extensions to the web of data, which is evident from its adoption in several state-of-the-art applications.68 the presence of a vocabulary makes the corresponding linked data meaningful; if the original vocabulary vanishes from the web, linked data applications that rely on it no longer function because they cannot validate against the authoritative source. lov systems prevent vocabularies from becoming unavailable by providing redundant or backup locations for these vocabularies.69 the lov catalog meets almost all types of search criteria, including search using metadata, ontology, apis, rdf dump, and sparql endpoint, enabling it to provide a range of services regarding the reuse of rdf vocabularies.70 linked data should be accompanied by its meaning to achieve its benefits, which is possible using vocabularies, especially rdf vocabularies that are themselves published as linked data and linked with each other, forming an lov ecosystem.71 such an ecosystem defines the health and usability of linked data by making its meaningful interpretation possible.72 for an ontology or vocabulary to be included in the lov catalog, it must be of an appropriate size with low-level and normalized constraints and represented in rdfs or the web ontology language (owl); it must allow creating instances and support documentation by permitting comments, labels, definitions, and descriptions to support end users.73 the ontology must have additional characteristics, such as being described in semantic web languages like owl, being published on the web with no limitations on its reuse, and supporting content negotiation using searchable content and namespace uris.74 the lov catalog offers four core functionalities that make it attractive for libraries: aggregate accesses vocabularies through a dump file or a sparql endpoint; search finds classes/properties in a vocabulary or ontology; stat displays descriptive statistics of lov vocabularies; and suggest enables the registration of new vocabularies.75 radio and hanrath uncovered the concerns regarding transitioning to lov, including how preexisting terms could be mapped while considering the potential semantic loss.76 they describe this transition in light of a case study at the university of kansas institutional repository, which adopted oclc's fast vocabulary and analyzed the outcomes and impact of exposing its data as linked data. to them, a vocabulary that is universal in scope and detail can become "bloated" and may result in an aggregated list of uncontrolled terms. however, such a diverse system may be capable of accurately describing the contents of an institutional repository.
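the kind of term-to-vocabulary reconciliation that radio and hanrath describe can be prototyped crudely, as in the hedged sketch below: legacy keyword strings are matched against a small, hand-made table of oclc fast uris (the table and the uris are hypothetical placeholders, not a real reconciliation service), and unmatched terms are flagged as candidates for semantic loss.

# a rough, hypothetical illustration of mapping legacy subject keywords to a
# linked data vocabulary such as oclc fast. the lookup table and uris are
# made-up placeholders; a real workflow would query a reconciliation service.
legacy_terms = ["world war, 1939-1945", "local history club minutes", "birds"]

fast_lookup = {                      # hypothetical mappings
    "world war, 1939-1945": "http://id.worldcat.org/fast/0000001",
    "birds": "http://id.worldcat.org/fast/0000002",
}

mapped, unmatched = {}, []
for term in legacy_terms:
    uri = fast_lookup.get(term.lower().strip())
    if uri:
        mapped[term] = uri
    else:
        unmatched.append(term)       # candidate for semantic loss or local extension

print("mapped:", mapped)
print("needs review (potential semantic loss):", unmatched)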
in this regard, adopting linked data vocabulary may serve to increase the overall quality of data by ensuring consistency with greater exposure of the resources when published as lod. however, such a transition to a linked data vocabulary is not that simple and gets complicated when the process involves reconciling the legacy metadata especially when dealing with the issues of under or misrepresentation.77 publishers, commercial entities, and data providers such as universities are taking keen interest and consortial participation, and therefore the library community must contribute to, benefit from, and consider this inevitable opportunity seriously.78 considering, the core role of libraries in connecting people to the information, they should come forward to make available their descriptive metadata collections as linked and open data for the benefit of the scholarly community on the web. it is time to move from strings (descriptive bibliographic records) to things (data items) that are connected in a more meaningful manner for the consumption of both machines and humans.79 besides the numerous benefits of the lov, there are some well-documented [and well-supported] vocabularies that are “not published or no longer available.”80 while focusing on the mappings between schema.org and lov, nogales et al. argue that the lov portal is limited as “some of the vocabularies are not available here.”81 in other words, the lov portal is growing, but currently, it is at the infant stage, where much work is needed to bring all or at least the missing welldocumented and well-supported vocabularies. this way the true benefits of lov could be exploited to the fullest when such vocabularies are linked and made available for the consumption and reuse of the broader audience and applications of the web of data. challenges, issues, and research opportunities to answer the rq02, this section attempts to identify some of the prominent/key challenges and issues regarding publishing and consuming bibliographic metadata as linked and open data. the sheer scale and diversity of cataloging frameworks, metadata encoding schemes, and standards make it difficult to approach cataloging effectively and efficiently. the quality of the cataloging data is another dimension that needs proper attention. current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 56 https://doi.org/10.6017/ital.v37i4.10432 the multiplicity of cataloging rules and standards the importance and critical role of standards in cataloging are clear to everyone. with standards, it becomes possible to identify authors uniquely; link users to the intended and the required resources; assess the value and usage of the services a library or information system provides; operate efficiently different transactions regarding bibliographic metadata, link content, preserve metadata, and generate reports; and enable the transfer of notifications, data, and events across machines.82 the success of these standards is because of the community-based efforts and their utility for a person/organization and ease of adoption. 83 however, we are living in a “jungle of standards” with massive scale and complexity.84 we are facing a flood of standards, schemas, protocols, and formats to deal with bibliographic metadata. 85 it is necessary to come up with some uniform and widely accepted standard, schema, protocol, and format, which will make possible the uniformity between bibliographic records and make way for records de-duplication on the web. 
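as a rough illustration of why uniform records help de-duplication, the following sketch (an assumption-laden toy, not a standard algorithm) normalizes title, creator, and year into a match key so that records that differ only in formatting collapse into one cluster.

# a simplistic, assumed illustration of record de-duplication: records that
# differ only in punctuation, case, or name order collapse onto one key.
import re
from collections import defaultdict

def norm(s):
    # lowercase, keep alphanumeric runs, sort tokens so name order does not matter
    return "".join(sorted(re.findall(r"[a-z0-9]+", s.lower())))

def match_key(title, creator, year):
    return (norm(title), norm(creator), year)

records = [  # hypothetical records from different sources and encodings
    {"title": "Hamlet", "creator": "Shakespeare, William", "year": "1603"},
    {"title": "HAMLET.", "creator": "William Shakespeare", "year": "1603"},
]

clusters = defaultdict(list)
for rec in records:
    clusters[match_key(rec["title"], rec["creator"], rec["year"])].append(rec)

for key, recs in clusters.items():
    if len(recs) > 1:
        print("possible duplicates:", key, len(recs))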
also, because of the exponential growth of the digital landscape of document collections and the emerging yet widely adopted linked data environment, it becomes necessary for librarians to be part of this global scale activity of making their bibliographic data available as linked and open data.86 therefore, all these standards need reenvisioning and reconsideration when libraries transit from the current implementations to a more complex lod-based environment.87 rda is easy to use, user-centric, and retrieval-supportive with a precise vocabulary.88 however, it has lengthier descriptions with a lot of technical terms, is time-consuming, needs re-training, and suffers from the generation gap.89 rda is transitioning from aacr2 to produce metadata for knowledge artifacts, and it will be adaptive to the emerging data structures of linked data.90 although librarians could potentially play a vital role in making rda successful, it is challenging to bring them on the same page with publishers and vendors.91 while studying bibframe 2.0 from rda point of view, taniguchi observed that: • bibframe has no class correspondence with rda, especially making a distinction between work and expression is challenging. • some rda elements have no corresponding properties in bibframe, and therefore, cannot be expressed in bibframe. in other cases, bibframe properties cannot be converted back to rda elements due to the many-to-one and many-to-many mappings between them. • the availability of multiple marc21-to-bibframe tools results in the variety of bibframe metadata, which makes its matching and merging in the later stages challenging.92 to understand whether bibframe 2.0 is suitable as a metadata schema, taniguchi examined it closely for domain constraint of properties and developed four additional methods for implementing such constraints, i.e., defining properties in bibframe.93 in these methods, method 1 is the strictest one for defining such properties, method 2 from bibframe, and the remaining gradually loosen. method 1 defines the domain of individual properties as work or instance only, which is according to the method in rda. method 2 defines properties using multiclass structure (work-instance-item) for descriptive metadata. method 3 introduces a new class bibres to accommodate work and instance properties. method 4 uses two classes bibres and work for representing a bibliographic resource. method 5 leaves the domain of any property unspecified and uses rdf:type to represent whether a resource belongs to the work or instance. he observed that: information technology and libraries | december 2018 57 • the multi-class structure used in bibframe (method 2) questions the consistency between this structure and the domain definition of the properties. • if the quality of the metadata is concerned especially matching among converted metadata from different source metadata, then method 1 works better than method 2. • if metadata conversion from different sources is required, then method 4 or 5 should be applied.94 taniguchi concludes that bibframe’s domain constraint policy is unsuitable for descriptive metadata schema to exchange and share bibliographic resources, and therefore, should be reconsidered.95 according to sprochi, bibliographic metadata is passing through a significant transformation. 96 frbr, rda, and bibframe are among the three major and currently running programs that will affect the recording, storage, retrieval, reuse and sharing of bibliographic metadata. 
ifla has focused on reconciling the frbr, frad, and frsad models into one model, namely the frbr-library reference model (frbr-lrm [https://www.ifla.org/node/10280]), published in may 2016.97 sprochi further adds that it is generally expected that, by adopting this new model, rda will be changed and revised significantly. bibframe will also get substantial modifications to become compatible with frbr-lrm and the resulting rda rules.98 these initiatives, on the one hand, make possible their visibility on the web but, on the other hand, introduce several changes and challenges for the library and information science community.99 to cope with the challenges of making bibliographic data visible, available, reusable, and shareable on the web, sprochi argues that:100
• the library and information science community must think of bibliographic records in terms of data that is both human-readable and machine-understandable and that can be processed across different applications and databases with no format restrictions. this data must also support interoperability among vendors, publishers, users, and libraries and should therefore be thought of beyond the notion that "only libraries create quality metadata" (coyle (2007), as cited by sprochi101).
• a shared understanding of the semantic web, lod, data formats, and other related technologies is necessary for the library and information science community to have more meaningful and fruitful conversations with software developers, information & library science (ils) designers, and it & linked data professionals. at least some basic knowledge about these technologies will enable the library community to participate actively in publishing, storing, visualizing, linking, and consuming bibliographic metadata as linked and open data.
• the library community must show a strong commitment to moving ils vendors to "post-marc" standards such as bibframe or any other standard that is supportive of the lod environment. this way we will be in a better position to exploit linked data and the semantic web to their fullest.
the library community must be ready to adopt lod in cataloging. transitioning from marc to linked data needs collaborative efforts and requires addressing several challenges. these challenges include:
• committing to a single standard by getting all units in the library on board, so that the big data problem resulting from the use of multiple metadata standards by different institutions could be mitigated;
• bringing individual experts, libraries, universities, and governments to work together and organize conferences, seminars, and workshops to bring linked data into the mainstream;
• translating the bibframe vocabulary into other languages;
• involving different users and experts in the area; and
• obtaining funding from the public sector and other agencies to continue the journey towards linked data.102
in the current scenario of metadata practices, interoperability for the exchange of metadata varies across different formats.103 the semantic web and lod support different library models such as frbroo, edm, and bibframe. these conceptual models and frameworks suffer from the interoperability issue, which makes data integration difficult.
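the interoperability issue noted above can be seen in a small, hedged example: the same book described twice in one rdflib graph, once with dublin core terms and once with schema.org. nothing in the graph states that the two sets of properties are equivalent, so a consuming application cannot merge them without an explicit mapping; the uri and values are illustrative.

# a hedged sketch of the interoperability problem: one book described with
# two vocabularies. without a mapping (e.g., stating that dcterms:creator and
# schema:author are equivalent), an aggregator sees two unrelated descriptions.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DCTERMS

SCHEMA = Namespace("http://schema.org/")
book = URIRef("http://example.org/book/hamlet")   # illustrative uri

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("schema", SCHEMA)

g.add((book, DCTERMS.title, Literal("hamlet")))
g.add((book, DCTERMS.creator, Literal("william shakespeare")))
g.add((book, SCHEMA.name, Literal("hamlet")))
g.add((book, SCHEMA.author, Literal("william shakespeare")))

print(g.serialize(format="turtle"))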
currently, several options are available for encoding bibliographic data to rdf (and to lod), which further complicates the interoperability and introduces inconsistency.104 existing descriptive cataloging methodologies and the bibliographic ontology descriptions in cataloging and metadata standards set the stage for redesigning and developing better ways of achieving improved information retrieval and interoperability.105 besides the massive heaps of information on the web, the library community (especially digital libraries) has devised standards for metadata and bibliographic description to meet the interoperability requirements for this part of the data on the web.106 semantic web technologies could be exploited to make information presentation, storage, and retrieval more user-friendly for digital libraries.107 to achieve such interoperability among resources, castro proposed an architecture for semantic bibliographic description.108 gardašević emphasizes employing information system engineers and developers to understand the resource description, discovery, and access processes in libraries and then extend these practices by applying linked data principles.109 this way, bibliographic metadata will be more visible, reusable, and shareable on the web. godby, wang, and mixter stress collaborative efforts to establish a single and universal platform for cataloging rules, encoding schemas, and models at a higher level of maturity, which requires initiatives such as rda, bibframe, ld4l, and bibflow (https://bibflow.library.ucdavis.edu/about).110 the massive volume of metadata (available in marc and other legacy formats) makes data migration to bibframe challenging.111 although bibframe challenges the conventional ground of cataloging, which aims to record tangible knowledge containers, it is still at an infant stage at both theoretical and practical levels.112 for bibframe to be more efficient, enhanced, and enriched, it needs the attention of librarians and information science experts who will use it to encode their bibliographic metadata.113 gonzales suggests that librarians must be willing to share metadata and upgrade metadata encoding standards to bibframe; they should train, learn, and upgrade their systems to efficiently use the bibframe encoding scheme and research new ways of bringing interoperability between bibframe and other legacy metadata standards; and they should ensure the data security of patrons and mitigate the legal and copyright issues involved in making their resources visible as linked and open data.114 also, lov must be exploited from the cataloging perspective by finding ways to create a single, flexible, adaptable, and representative vocabulary. such a vocabulary would bring together the cataloging data from different libraries of the world and make it accessible and consumable as a single body of library linked data, freeing us from the jungle of metadata vocabularies [and standards].
publishing and consuming linked bibliographic metadata
according to the findings of one survey, there are several primary motives for publishing an institution's [meta]data as linked data.
these include (in the order from most frequent/ essential to a lesser one):115 • making data visible on the web; • experimenting and finding the potentials of publishing datasets as linked data; • exposing local datasets to understand the nature of linked data; • exploring the benefits of linked data for search engine optimization (seo); • consuming and reusing linked data in future projects; • increasing the data reusability and interoperability; • testing schema.org and bibframe; • meeting the requirements of the project; and • making available the “stable, integrated, and normalized data about research activities of an institution.”116 they also identified several reasons from the participants regarding the consumption of such data. these include (in the order from most frequent/essential to a lesser one):117 • improving the user experience; • extending local data with other datasets; • effectively managing the internal metadata; • improving the accuracy and scope of search results; • trying to improve seo for local resources; • understanding the effect of data aggregation from multiple datasets; and • experimenting and finding the potentials of consuming linked datasets. publishing and consuming bibliographic data on the lod cloud brings numerous applications. kalou et al. developed a semantic mashup by combining semantic web technologies, restful services, and content management services (cms) to generate personalized book recommendations and publish them as linked data.118 it allows for the expressive reasoning and efficient management of ontologies and has potential applications in the library, cataloging services, and ranking book records and reviews. this application exemplifies how we can use the commercially [and socially] curated metadata with bibliographic descriptions from improved user experience in digital libraries using linked data principles. however, publishing and consuming bibliographic metadata as linked and open data is not that simple and need addressing several prominent challenges and issues, which are identified in the following subsections along with some opportunities for further research. 
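before turning to those challenges, the mashup idea attributed to kalou et al. above can be sketched as follows; the review-count function stands in for a real restful service, and the uris, isbn, and property choices are illustrative assumptions rather than the authors' implementation.

# a hedged, illustrative sketch of a mashup in the spirit of kalou et al.:
# merge bibliographic triples with externally sourced review data and expose
# the result as json-ld. "fetch_review_count" is a stand-in for a real web
# service call, and the direct use of schema:reviewCount is a simplification.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import XSD

SCHEMA = Namespace("http://schema.org/")

def fetch_review_count(isbn):
    # placeholder for a restful call to a (hypothetical) review service
    return {"9780000000000": 1312}.get(isbn, 0)

book = URIRef("http://example.org/book/9780000000000")   # illustrative uri and isbn
g = Graph()
g.bind("schema", SCHEMA)
g.add((book, SCHEMA.name, Literal("hamlet")))
g.add((book, SCHEMA.isbn, Literal("9780000000000")))
g.add((book, SCHEMA.reviewCount, Literal(fetch_review_count("9780000000000"), datatype=XSD.integer)))

# json-ld output requires rdflib >= 6.0 (or the rdflib-jsonld plugin)
print(g.serialize(format="json-ld"))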
publishing linked bibliographic metadata
the university of illinois library worked on publishing the marc21 records of 30,000 digitized books as linked library data by adding links, transforming them to lod-friendly semantics (mods), and deploying them as rdf, with the objective of use by a wider community.119 to them, using semantic web technologies, a book can be linked to related resources and multiple possible contexts, which is an opportunity for libraries to build innovative user-centered services for the dissemination and use of bibliographic metadata.120 in this regard, the challenge is to maximally utilize the existing book-related bibliographic and descriptive metadata in a manner that parallels existing services (both inside the library and outside), as well as to exploit to the fullest full-text search and semantic web technologies, standards, and lod services.121 while publishing national bibliographic information as free open linked data, ifla identifies several issues, including:122
• dealing with the negative financial impact on the revenue generated from traditional metadata services;
• the inability to offer consistent services due to the complexity of copyright and licensing frameworks;
• the confusion in understanding the difference between the terms "open" and "free";
• remodeling library data as library linked data;
• the limited persistence and sustainability of linked data resources;
• the steep learning curve in understanding and applying linked data practices to library data;
• making choices between sites to link to; and
• creating persistent uris for library data objects.
from the analysis of the relevant literature, hallo identified several issues in publishing bibliographic metadata as linked and open data. these include difficulties in cataloging and migrating data to new conceptual models; the multiplicity of vocabularies for the same metadata; the lack of agreements to share data; the lack of experts and tools for transforming data; the lack of applications and indicators for its consumption; mapping issues; providing useful links to datasets; defining and controlling data ownership; and ensuring dataset quality.123 libraries should adopt the linked data five-star model by adopting emerging non-proprietary formats to publish their data, linking to external resources and services, and participating actively in enriching and improving the quality of metadata to improve knowledge management and discovery.124 cataloging has a bright future with more dataset providers, achieved by involving citizens and end-users in metadata enrichment and annotation, making ranking and recommendation part of library cataloging services, and increasing the participation of the library community in the semantic web and linked data.125 publishing linked data poses several issues. these include data cleanup issues, especially when dealing with legacy data; technical issues such as data ownership; the software maturity needed to keep linked data up to date; managing its colossal volume; providing it support for data entry, annotation, and modeling; developing representative and widely applicable lovs; and handling the steep learning curve to understand and apply linked data principles.
126 bull and quimby stress understanding how the library community is transiting their cataloging methods, systems, standards, and integrations to the lod for making them visible on the web and how they keep backward compatibility with legacy bibliographic metadata.127 it is necessary for the lod data model to maintain the underlying semantics of the existing models, schemas, and standards, yet innovate and renew old traditions, where the quality of the conversion solely depends on the ability of this new model to cope with heterogeneity conflicts, information technology and libraries | december 2018 61 maintain granularity and semantic attributes and consequently prevent loss of data and semantics.128 the new model should be semantically expressive enough to support meaningful and precise linking to other datasets. by thinking alternatively, these challenges are the significant research opportunities that will enable us to be part of linked and open data community in a more profound manner. consuming linked bibliographic metadata consuming linked data resources can be a daunting task and may involve resolving/mitigating several challenges. these challenges include:129 • dealing with the bulky or non-available rdf dumps, no authority control within rdf dumps, and data format variations; • identifying terms’ specificity levels during concept matching; • the limited reusability of library linked data due to lack of contextual data; • harmonizing classes and objects at the institution level; • excessive handcrafting due to few off-the-shelf visualization tools; • manual mapping of vocabularies; • matching, aligning, and disambiguating library and linked data; • the limited representation of several essential resources as linked data due to nonavailability of uris; • the lack of sufficient representative semantics for bibliographic data; • the time-consuming nature of linked data to understand its structure for reuse; • the ambiguity of terms across languages; and • the non-stability of endpoints and outdated datasets. syndication is required to make library data visible on the web. also, it is necessary to understand how current applications including web search engines perceive and treat visibility, to what extent schema.org matters, and what is the nature of the linked data cloud.130 an influential work may be translated into several languages, which results in multiple metadata records. some of these are complete, and others are with missing details. godby and smith‐ yoshimura suggest aggregating these multiple metadata records into a single record, which can be complete, link the work to its different translations and translators, and is publishable (and consumable) as linked data.131 however, such an aggregation demands a great deal of human effort to make these records visible and consumable as linked data. this also includes describing all types of objects that libraries currently collect and manage, translating research findings to best practices; and establishing policies to use uris in marc and other types of records. 132 to achieve the long-term goal of making metadata consumable as linked data; the libraries, as well as individual researchers, should align their research with work that of the major players such as oclc, loc, and ifla and follow their best practices.133 the issues in lov needs immediate attention to make lod more useful. 
these issues, according to include the following:134 • lov publishes only a subset of rdf vocabularies with no inclusion for value vocabularies such as skos thesaurus; • it provides no or almost negligible support for vocabulary authors; current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 62 https://doi.org/10.6017/ital.v37i4.10432 • it relies on third parties to get the information about vocabulary usage in published datasets; • it has insufficient support for multilingualism or many languages; • it should support multi-term vocabulary search, which is required from the ontology designers to understand and employ the complex relationships among concepts; • it should support vocabulary matching, vocabulary checking, and multilingualism to allow users to search and browse vocabularies using their native language. it also improves the quality of the vocabulary by translation, which allows the community to evaluate and collaborate; and • efforts are required to improve and make possible the long-term preservation of vocabularies. lod emerged to change the design and development of metadata, which has implications for controlled vocabularies, especially, the person/agent vocabularies that are fundamental to data linkage but suffer from the issues of metadata maintenance and verification. 135 therefore, practical data management and the metadata-to-triples transition should be studied in detail to make the wider adaptation of lod possible.136 to come out of the lab environment and make lod practically useful, the controlled vocabularies must be cleaned, and its cost should be reduced.137 however, achieving this is challenging and needs to answer how knowledge artifacts could be uniquely identified and labeled across digital collections and what should be the standard practices to use them.138 linked data is still new to libraries.139 the technological complexities, the feeling of risks in adopting new technology and limitations due to the system, politics, and economy are some of the barriers in its usage in libraries.140 however, libraries can potentially overcome these barriers by learning from the use of linked data in other domains including, e.g., google’s knowledge graph and facebook’s open graph.141 the graph interfaces could be developed to link author, publisher, and book-related information, which in turn can be linked to the other open and freely available datasets.142 it is time that the library and information science professionals come out of the old, document-centric approach to bibliographic metadata and adapt their thinking as more datacentric for a more meaningful consumption of bibliographic metadata by both users and machines.143 quality of linked bibliographic metadata the use of a cataloging data defines its quality.144 the quality is essential for the discovery, usage, provenance, currency, authentication, and administration of metadata. 145 cataloging data or bibliographic metadata is considered fit for use based on its accuracy, completeness, logical consistency, provenance, coherence, timeliness, conformance and accessibility. 
146 data is commonly assessed by its quality for use in specific application scenarios and use cases; however, sometimes low-quality data can still be useful for a specific application, as long as its quality meets the requirements of that application.147 several factors, including availability, accuracy, believability, completeness, conciseness, consistency, objectivity, relevance, understandability, timeliness, and verifiability, determine the quality of data.148 the quality of linked data can be of two types: one is the inherent quality of the linked data, and the other relates to its infrastructural aspects. the former can be further divided into aspects including domain, metadata, rdf model, links among data items, and vocabulary. the infrastructural aspects include the server that hosts the linked data, linked data fragments, and file servers.149 this typology introduces issues of its own; the issues related to inherent quality include "linking, vocabulary usage and the provision of administrative metadata."150 the infrastructural aspect introduces issues related to naming conventions, which include avoiding blank nodes and using http uris, linking through owl:sameas links, describing by reusing existing terms, and dereferencing.151 definitions of quality cataloging are mainly based on the experience and practices of the cataloging community.152 cataloging quality falls into at least four basic categories: (1) the technical details of the bibliographic records, (2) the cataloging standards, (3) the cataloging process, and (4) the impact of cataloging on the user.153 the cataloging community focuses mainly on the quality of bibliographic metadata. however, it is not sufficient to consider only the accuracy, completeness, and standardization of bibliographic metadata; catalogers should also consider the information needs of the users.154 van kleeck et al. investigated issues in the quality management of metadata for electronic resources to assess how well it supports the user tasks of finding, selecting, and accessing library holdings, as well as to identify the potential for increasing efficiencies in acquisition and cataloging workflows.155 they evaluated the quality of existing bibliographic records, mostly provided by their vendors, and compared them with those of oclc, finding that the latter better support users in resource discovery and access.156 from the management perspective, the complexity and volume of bibliographic metadata and the method of ingesting it into the catalog emphasize the selection of the highest-quality records.157 from the perspective of digital repositories, the absence of well-defined theoretical and operational definitions of metadata quality, interoperability, and consistency are some of the issues affecting the quality of metadata.158 the national information standards organization (niso) identifies several issues in creating metadata.
159 these include the inadequate knowledge about cataloging in both manual and automatic environments leading to inaccurate data entry, inconsistency of subject vocabularies, and limitations of resource discovery, and the development of standardized approaches to structure metadata.160 the poor quality of linked data can make its usefulness much difficult.161 datasets are created at the data level resulting in a significant variance in perspectives and underlying data models.162 this also leads to errors in triplication, syntax, and data; misleading owl:sameas links, and the low availability of sparql endpoints.163 library catalogs, because of their low quality, most often fail to communicate clear and correct information correctly to the users.164 the reasons for such low quality include user’s inability to produce catalogs that are free from faults and duplicates as well as low standards and policies that drive these cataloging practices. 165 although the rich collections of bibliographic metadata are available, these are rich in terms of the heaps of cataloging data and not in terms of quality with almost no bibliographic control. 166 these errors in and the low quality of bibliographic metadata are the result of misunderstanding the aims and functions of bibliographic metadata and adopting the “unwise” cataloging standards and policies.167 still there exist some high-quality cataloging efforts with well-maintained cataloging records, where the only quality warrant is to correctly understand the subject matter of the artifact and effectively communicate between librarians and experts in the corresponding domain knowledge. 168 the demand for such high quality and well-managed catalogs has increased on the web. although current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 64 https://doi.org/10.6017/ital.v37i4.10432 people are more accustomed to web search engines, the quality catalogs will attract not only libraries but the general web users as well (when published and consumed as linked data).169 the community must work together on metadata with publishers and vendors to approach cataloging from the user perspective and refine the skillset as well as produce quality metadata.170 as library and information science professionals, we should not only be the users of the standards , instead, we must actively participate and contribute to its development and improvement so that we may effectively and efficiently connect our data with the rest of the world.171 such collaboration is required from not only the librarians and vendors but also from the users in developing an efficient cataloging environment and for a more usable bibliographic metadata, this is discussed in the next section. linking the socially curated metadata this section addresses rq03 by reviewing the state-of-the-art literature from multiple but related domains including library sciences, information sciences, information retrieval, and semantic web. the section below discusses the importance and possible impact of making socially curated metadata as part of the bibliographic or professionally curated metadata. the next section highlights why social collaborative cataloging approaches should be adopted by librarians to work with other stakeholders in making their bibliographic data available and visible as linked and open data and what is the possible impact of fusing the user-generated content with professional metadata and making it available as linked and open data. 
the socially curated metadata matters in cataloging
conventional libraries have clear and well-established classification and cataloging schemes, but these are as challenging to learn, understand, and apply as they are slow and painful to consume.172 using computers to retrieve bibliographic records resulted in the massive usage of copy cataloging.173 however, adopting this practice is challenging because these records are inconsistent; incomplete; less visible, granular, and discoverable; unable to integrate metadata and content into the corresponding records; difficult to preserve in new and usable formats for consumption by users and machines; and not supportive of integrating user-generated content into the cataloging records.174 the university of illinois library, through its vufind service, offers extra features to enhance the search and exploration experience of end users by providing a book's cover image, table of contents, abstracts, reviews, comments, and user tags.175 users can contribute content such as tags, reviews, and comments, and recommend books to friends. however, it is necessary to research whether this user-generated content should be integrated into or preserved alongside the bibliographic records.176 in their book, alemu and stevens mention several advantages of making user-generated content part of library catalogs.177 these include (i) enhancing the functionality of professionally-curated metadata by making information objects findable and discoverable; (ii) removing the limitations posed by the sufficiency and necessity principles of professionally-curated metadata; (iii) bringing users closer to the library by "pro-actively engaging" them in rating, tagging, reviewing, etc., provided that users are also involved in managing and controlling metadata entries; and (iv) the resulting "wisdom of the crowd," which would benefit all the stakeholders of this massively growing socially-curated metadata. however, this combination can only be utilized optimally if we can semantically and contextually link it to internal and external resources; if the resulting metadata is openly accessed, shared, and reused; and if users are supported in easily adding metadata and are made part of quality control by enabling them to report spamming activities to the metadata experts.178 librarything for libraries (ltfl) makes a library catalog more informative and interactive by enhancing the opac, providing access to professional and social metadata, and enabling users to search, browse, and discover library holdings in a more engaging way (https://www.librarything.com/forlibraries). it is one of the practical examples of enriching library catalogs with user-generated content. this trend of merging social and professional metadata innovates library cataloging by dissolving the borders between the "social sphere" and library resources.179 social media has expanded the library into social spaces by exploiting tags and tag clouds as navigational tools and by enriching bibliographic descriptions with integrated user-generated content.180 it bridges the communication gaps between the library and its users, where users participate actively in resource description, discovery, and access.
181 the potential role of the socially curated metadata in resource description, discovery, and access is also evident from the long long-tail social book search research under the initiative for xml retrieval (inex) where both professionally curated bibliographic and user-generated social metadata are exploited for retrieval and recommendation to support both known-item as well as exploratory search.182 by experimenting with amazon/librarything datasets of 2.8 million book records, containing both professional and social metadata, the results conclude that enriching the professional metadata with social metadata especially tags significantly improves search and recommendation.183 koolen also noticed that the social metadata especially tags and reviews significantly improve the search performance as professionally curated metadata is “often too limited” to describe books resourcefully.184 users add socially curated metadata with the intention of making resource re-findable during a future visit, i.e., they add metadata such as tags to facilitate themselves and allow other similar users in resource discovery and access, and therefore, form a community around the resource.185 clements found user tags (social tagging) beneficial for librarians while browsing and exploring the library catalogs.186 to some librarians, tags are complementary to controlled vocabulary; however, training issues and lack of awareness of social tagging functionality in cataloging interfaces prevent their perceived benefit.187 the socially curated metadata as linked data metadata is socially constructed.188 it is shaping and shaped by the context in which it is developed and applied, and demands community-driven approaches, where data should be looked at from a holistic point of view rather than considering them as discrete (individual) semantic units.189 the library is adopting the collaborative social aspect of cataloging that will take place between authors, repository managers, libraries, e-collection consortiums, publishers, and vendors.190 librarians should improve their cataloging skills in line with the advances in technology to expose and make visible their bibliographic metadata as linked and open data.191 currently, linked library data is generated and used by library professionals. 
socially constructed metadata will add value in retrieving knowledge artifacts with precision.192 the addition of socially constructed and community-driven metadata to current metadata structures, controlled vocabularies, and classification systems provides a holistic view of these structures, as it adds a community-generated sense to the professionally-curated metadata structures.193 an example of the possibilities of making user-generated content part of cataloging and linked open data is the semantic book mashup (see "consuming linked bibliographic metadata" above), which demonstrates how commercially [and socially] curated metadata could be retrieved and linked with bibliographic descriptions.194 while enumerating the possible applications of this mashup, the authors argue that book reviews from different websites could be aggregated using linked data principles by extending the review class of bibframe 2.0.195 from the analysis of twenty-one in-depth interviews with lis professionals, alemu discovered four metadata principles, namely metadata enrichment, linkage, openness, and filtering.196 this analysis revealed that the absence of socially curated metadata is sub-optimal for realizing the potential of lod in libraries.197 their analysis advocates a mixed-metadata approach, in which social metadata (tags, ratings, and reviews) augments the bibliographic metadata by involving users proactively and by offering a social collaborative cataloging platform. the metadata principles should be reconceptualized, and linked data should be exploited to address the existing library metadata challenges. therefore, the current efforts in linked data should fully consider social metadata.198 library catalogs should be enriched by mixing professional and social metadata, as well as being semantically and contextually interlinked to internal and external information resources, to be optimally used in different application scenarios.199 to fully exploit this linkage, the duplication of metadata should be reduced. it must be made openly accessible so that its sharing, reuse, mixing, and matching become possible. the enriched metadata must be filtered per user requirements using an interface that is flexible, personalized, contextual, and reconfigurable.200 their analysis suggests a "paradigm shift" in metadata's future, i.e., from simple to enriched; from disconnected, invisible, and locked to well-structured, machine-understandable, interconnected, visible, and more visualized metadata; and from a single opac interface to reconfigurable and adaptive metadata interfaces.201 by involving users in the metadata curation process, the mixed approach will bring diversity to metadata and make resources discoverable, usable, and user-centric on the wider and well-supported platform of linked and open data.202 in conclusion, the fusion of socially curated metadata with standards-based professional metadata is essential from the perspective of the user-centric paradigm of cataloging, which has the potential to aid resource discovery and access and to open new opportunities for information scientists working in linked and open data as well as catalogers who are transitioning to the web of data to make their metadata visible, reusable, and linkable to other resources on the web.
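a hedged sketch of what this mixed-metadata approach might look like in rdf is given below: user-contributed tags and a rating are attached to the same resource that carries the professional description, so both can be published together as linked data. the uris, values, and the (deliberately simplified) use of schema.org properties are assumptions for illustration only.

# a hedged sketch of mixing professional and social metadata on one resource.
# the uris, tag values, and the direct attachment of schema:ratingValue to the
# book are illustrative simplifications, not a prescribed model.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DCTERMS, XSD

SCHEMA = Namespace("http://schema.org/")
book = URIRef("http://example.org/bib/hamlet")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("schema", SCHEMA)

# professionally curated description
g.add((book, DCTERMS.title, Literal("hamlet")))
g.add((book, DCTERMS.creator, Literal("shakespeare, william")))
g.add((book, DCTERMS.subject, Literal("revenge tragedy")))

# socially curated enrichment (user tags and an aggregate rating)
for tag in ["denmark", "ghosts", "read for class"]:
    g.add((book, SCHEMA.keywords, Literal(tag)))
g.add((book, SCHEMA.ratingValue, Literal("4.2", datatype=XSD.decimal)))

print(g.serialize(format="turtle"))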
from the analysis and scholarly discussions of alemu, stevens, farnel, and others as well as from the initial experiments of kalou et al.203 it becomes apparent that the application of linked data principles for library catalogs is future-proof and promising towards more user-friendly search and exploration experience with efficient resource description, discovery, access, and recommendations. conclusions in this paper, we presented a brief yet holistic review of the current state of linked and open data in cataloging. the paper identified the potentials of lod and lov in making the bibliographic descriptions publishable, linkable, and consumable on the web. several prominent challenges, issues, and future research avenues were identified and discussed. the potential role of sociallycurated metadata for enriching library catalogs and the collaborative social aspect of cataloging were highlighted. some of the notable points include the following: information technology and libraries | december 2018 67 • publishing, linking, and consuming bibliographic metadata on the web using linked data principles brings several benefits for libraries.204 the library community should improve their skills regarding this paradigm shift and adopt the best practices from other domains.205 • standards have a key role in cataloging, however, we are living in a “jungle of metadata standards” with varying complexity and scale, which makes it difficult to select, apply and work with.206 to be part of global scale activity of making bibliographic data available on the web as linked and open data, these standards should be considered and reenvisioned.207 • the quality of bibliographic metadata depends on several factors including accuracy, completeness, logical consistency, provenance, coherence, timeliness, conformance and accessibility.208 however, achieving these characteristics is challenging because of several reasons including cataloging errors; limited bibliographic control; misunderstanding the role of metadata; and “unwise” cataloging standards and policies.209 to ensure high-quality and make data visible and reusable as linked data, the library community should contribute to developing and refining these standards and policies. 210 • metadata is socially constructed and demands community-driven approaches and the social collaborative aspect of cataloging by involving authors, repository managers, librarians, digital collection consortiums, publishers, vendors, and users.211 this is an emerging trend, which is gradually dissolving the borders between the “social sphere” and library resources and bridging the communication gap between libraries and their users, where end users contribute to the bibliographic descriptions resulting in a diversity of metadata and making it user-centric and usable.212 • adopting a “mixed-metadata approach” by considering bibliographic metadata and the user-generated content complementary and essential for each other suggests a “paradigm shift” in the metadata’s future from simple to enriched; from human-readable data silos to machine understandable, well-structured, and reusable; from invisible and restricted to visible and open; and from single opac to reconfigurable interfaces on the web.213 several researchers including the ones cited in this article agree that the professionally curated bibliographic metadata supports mostly the known-item search and has little value to open and exploratory search and browsing. 
they believe that not only the collaborative social efforts of the cataloging community are essential but also the socially curated metadata, which can be used to enrich bibliographic metadata and support exploration and serendipity. this is evident not only from the wide usage of librarything and its ltfl but also from the long-running inex social book search research, where both professionally curated bibliographic metadata and user-generated social metadata are exploited for retrieval and recommendation to support known-item as well as exploratory search.214 therefore, this aspect should be considered for further research to make cataloging more useful for all stakeholders, including libraries, users, authors, and publishers, and for general consumption as linked data on the web. the current trend of social collaborative cataloging efforts is essential to fully exploit the potential of linked open data. however, if we look closely, we find four groups, namely librarians, linked data experts, information retrieval (ir) and interactive ir researchers, and users, all going their separate ways with minimal collaboration and communication. more specifically, they are not benefiting from one another as much as they could, which limits the possibilities for better resource description, discovery, and access. for example, the library community should consider the findings of the inex sbs track, which have demonstrated that professional and social metadata are essential to each other to facilitate end users in resource discovery and access and to support not only known-item search but also exploration and serendipity. the current practices of librarything, ltfl, and the social web in general advocate user-centric cataloging, where users are not only consumers of bibliographic descriptions but also contributors to metadata enrichment. linked open data experts have achieved significant milestones in other domains, e.g., e-government; they should understand cataloging and resource discovery and access practices in libraries to make bibliographic metadata not only visible as linked data on the web but also shareable, reusable, and beneficial to end users. a social collaborative cataloging approach that actively involves these four groups is essential to making bibliographic descriptions more useful, not only for the library community and users but also for consumption on the web as linked and open data. together we can, and we must.

references

1 maría hallo et al., “current state of linked data in digital libraries,” journal of information science 42, no. 2 (2016): 117–27, https://doi.org/10.1177/0165551515594729.

2 tim berners-lee, “design issues: linked data,” w3c, 2006, updated june 18, 2009, accessed november 09, 2018, https://www.w3.org/designissues/linkeddata.html; hallo, “current state,” 117.

3 yuji tosaka and jung-ran park, “rda: resource description & access—a survey of the current state of the art,” journal of the american society for information science and technology 64, no. 4 (2013): 651–62, https://doi.org/10.1002/asi.22825.

4 hallo, “current state,” 118; angela kroeger, “the road to bibframe: the evolution of the idea of bibliographic transition into a post-marc future,” cataloging & classification quarterly 51, no. (2013): 873–90.
https://doi.org/10.1080/01639374.2013.823584; martin doerr et al., “the europeana data model (edm).” paper presented at the world library and information congress: 76th ifla general conference and assembly, gothenburg, sweden, august 10–15, 2010. 5 getaneh alemu and brett stevens, an emergent theory of digital library metadata—enrich then filter,1st edition (waltham, ma: chandos publishing, elsevier ltd. 2015). 6 hallo, “current state,” 118 . 7 berners-lee, “design issues.” 8 kim tallerås, “quality of linked bibliographic data: the models, vocabularies, and links of data sets published by four national libraries,” journal of library metadata 17, no. 2 (2017):126– 55, https://doi.org/10.1080/19386389.2017.1355166. 9 becky yoose and jody perkins, “the linked open data landscape in libraries and beyond,” journal of library metadata 13, no. 2–3 (2013): 197–211, https://doi.org/10.1080/19386389.2013.826075. https://doi.org/10.1177/0165551515594729 https://www.w3.org/designissues/linkeddata.html https://doi.org/10.1002/asi.22825 https://doi.org/10.1080/01639374.2013.823584 https://doi.org/10.1080/19386389.2017.1355166 https://doi.org/10.1080/19386389.2013.826075 information technology and libraries | december 2018 69 10 robert fox, “from strings to things,” digital library perspectives 32, no. 1 (2016): 2–6, https://doi.org/10.1108/dlp-10-2015-0020. 11 stanislava gardašević, “semantic web and linked (open) data possibilities and prospects for libraries,” infotheca—journal of informatics & librarianship 14, no. 1 (2013): 26–36, http://infoteka.bg.ac.rs/pdf/eng/2013-1/infotheca_xiv_1_2014_26-36.pdf. 12 thomas baker, pierre-yves vandenbussche, and bernard vatant, “requirements for vocabulary preservation and governance,” library hi tech 31, no. 4 (2013): 657-68, https://doi.org/10.1108/lht-03-2013-0027. 13 pierre-yves vandenbussche et al., “linked open vocabularies (lov): a gateway to reusable semantic vocabularies on the web,” semantic web 8, no. 3 (2017): 437–45, https://doi.org/10.3233/sw-160213. 14 tosaka, “rda,” 651, 652. 15 amanda sprochi, “where are we headed? resource description and access, bibliographic framework, and the functional requirements for bibliographic records library reference model,” international information & library review 48, no. 2 (2016): 129–36, https://doi.org/10.1080/10572317.2016.1176455. 16 brighid m.gonzales, “linking libraries to the web: linked data and the future of the bibliographic record,” information technology and libraries 33, no. 4 (2014): 10, https://doi.org/10.6017/ital.v33i4.5631. 17 shoichi taniguchi, “is bibframe 2.0 a suitable schema for exchanging and sharing diverse descriptive metadata about bibliographic resources?,” cataloging & classification quarterly 56, no. 1 (2018): 40–61, https://doi.org/10.1080/01639374.2017.1382643. 18 shoichi taniguchi, “bibframe and its issues: from the viewpoint of rda metadata,” journal of information processing and management 58, no. 1 (2015): 20–27, https://doi.org/10.1241/johokanri.58.20. 19 shoichi taniguchi, “examining bibframe 2.0 from the viewpoint of rda metadata schema,” cataloging & classification quarterly 55, no. 6 (2017): 387–412, https://doi.org/10.1080/01639374.2017.1322161. 20 nosheen fayyaz, irfan ullah, and shah khusro, “on the current state of linked open data: issues, challenges, and future directions,” international journal on semantic web and information systems (ijswis) 14, no. 4 (2018): 110–28, https://doi.org/10.4018/ijswis.2018100106. 
21 asim ullah, shah khusro, and irfan ullah, “bibliographic classification in the digital age: current trends & future directions,” information technology and libraries 36, no. 3 (2017): 48–77, https://doi.org/10.6017/ital.v36i3.8930. 22 tosaka, “rda,” 659. https://doi.org/10.1108/dlp-10-2015-0020 http://infoteka.bg.ac.rs/pdf/eng/2013-1/infotheca_xiv_1_2014_26-36.pdf https://doi.org/10.1108/lht-03-2013-0027 https://doi.org/10.3233/sw-160213 https://doi.org/10.1080/10572317.2016.1176455 https://doi.org/10.6017/ital.v33i4.5631 https://doi.org/10.1080/01639374.2017.1382643 https://doi.org/10.1241/johokanri.58.20 https://doi.org/10.1080/01639374.2017.1322161 https://doi.org/10.4018/ijswis.2018100106 https://doi.org/10.6017/ital.v36i3.8930 current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 70 https://doi.org/10.6017/ital.v37i4.10432 23 tosaka, “rda,” 651, 652, 659. 24 tosaka, “rda,” 653, 660. 25 the first author used the trial version of rda toolkit to report these facts about rda (https://access.rdatoolkit.org). rda toolkit is co-published by american library association (http://www.ala.org), canadian federation of library associations (http://cflafcab.ca/en/home-page), and facet publishing (http://www.facetpublishing.co.uk). 26 ifla, “ifla conceptual models,” the international federation of library associations and institutions (ifla), 2017, updated april 06, 2009, accessed november 12, 2018, https://www.ifla.org/node/2016. 27 tosaka, “rda,” 651, 652, 655. 28 michael john khoo et al., “augmenting dublin core digital library metadata with dewey decimal classification,” journal of documentation 71, no. 5 (2015): 976–98. https://doi.org/10.1108/jd-07-2014-0103; ulli waltinger et al., “hierarchical classification of oai metadata using the ddc taxonomy,” in advanced language technologies for digital libraries, edited by raffaella bernardi, frederique segond and ilya zaihrayeu. lecture notes in computer science (lncs), 29–40: springer, berlin, heidelberg, 2011; aaron krowne and martin halbert, “an initial evaluation of automated organization for digital library browsing,” paper presented at the proceedings of the 5th acm/ieee-cs joint conference on digital libraries, denver, co, usa, june 7–11, 2005 2005; waltinger, “ddc taxonomy,” 30. 29 khoo, “dublin core,” 977, 984 . 30 loc, “marc standards: marc21 formats,” library of congress (loc), 2013, updated march 14, 2013, accessed january 2, 2014, http://www.loc.gov/marc/marcdocz.html. 31 philip e schreur, “linked data for production and the program for cooperative cataloging,” pcc policy committee meeting, 2017, accessed may 18, 2018, https://www.loc.gov/aba/pcc/documents/facil-session-2017/pcc_and_ld4p.pdf. 32 sarah bull and amanda quimby, “a renaissance in library metadata? the importance of community collaboration in a digital world,” insights 29, no. 2 (2016): 146–53, http://doi.org/10.1629/uksg.302. 33 philip e. schreur, “linked data for production,” pcc policy committee meeting, 2015, accessed november 09, 2018, https://www.loc.gov/aba/pcc/documents/pcc-ld4p.docx. 34 vandenbussche, “linked open vocabularies,” 437, 438, 450. 35 hallo, “current state,” 120. 36 hallo, “current state,” 118. 37 hallo, “current state,” 120, 124. 
https://access.rdatoolkit.org/ http://www.ala.org/ http://cfla-fcab.ca/en/home-page http://cfla-fcab.ca/en/home-page http://www.facetpublishing.co.uk/ https://www.ifla.org/node/2016 https://doi.org/10.1108/jd-07-2014-0103 http://www.loc.gov/marc/marcdocz.html https://www.loc.gov/aba/pcc/documents/facil-session-2017/pcc_and_ld4p.pdf http://doi.org/10.1629/uksg.302 https://www.loc.gov/aba/pcc/documents/pcc-ld4p.docx information technology and libraries | december 2018 71 38 hallo, “current state,” 120, 124. 39 hallo, “current state,” 124. 40 bull, “community collaboration,” 147. 41 sam gyun oh, myongho yi, and wonghong jang, “deploying linked open vocabulary (lov) to enhance library linked data,” journal of information science theory and practice 2, no. 2 (2015): 6–15, http://dx.doi.org/10.1633/jistap.2015.3.2.1. 42 carlo bianchini and mauro guerrini, “a turning point for catalogs: ranganathan’s possible point of view,” cataloging & classification quarterly 53, no. 3-4 (2015): 341–51, http://doi.org/10.1080/01639374.2014.968273. 43 bianchini, “turning point,” 350. 44 loc, “library of congress linked data service,” the library of congress, accessed march 24, 2018, http://id.loc.gov/about/. 45 loc, “linked data service.” 46 loc, “linked data service.” 47 loc, “linked data service.” 48 loc, “linked data service.” 49 margaret e dull, “moving metadata forward with bibframe: an interview with rebecca guenther,” serials review 42, no. 1 (2016): 65–69, https://doi.org/10.1080/00987913.2016.1141032. 50 loc, “overview of the bibframe 2.0 model,” library of congress, april 21, 2016, accessed november 09, 2018, https://www.loc.gov/bibframe/docs/bibframe2-model.html. 51 taniguchi, “bibframe 2.0,” 388; taniguchi, “suitable schema,” 40. 52 oclc. 2016, “oclc linked data research,” online computer library center (oclc), https://www.oclc.org/research/themes/data-science/linkeddata.html. 53 oclc, “linked data research.” 54 jeff mister, “turning bibliographic metadata into actionable knowledge,” next blog—oclc, february 29, 2016, http://www.oclc.org/blog/main/turning-bibliographic-metadata-intoactionable-knowledge/. 55 mister, “turning bibliographic metadata.” 56 george campbell, karen coombs, and hank sway, “oclc linked data,” oclc developer network, march 26, 2018, https://www.oclc.org/developer/develop/linked-data.en.html. 57 campbell, “oclc linked data.” http://dx.doi.org/10.1633/jistap.2015.3.2.1 http://doi.org/10.1080/01639374.2014.968273 http://id.loc.gov/about/ https://doi.org/10.1080/00987913.2016.1141032 https://www.loc.gov/bibframe/docs/bibframe2-model.html https://www.oclc.org/research/themes/data-science/linkeddata.html http://www.oclc.org/blog/main/turning-bibliographic-metadata-into-actionable-knowledge/ http://www.oclc.org/blog/main/turning-bibliographic-metadata-into-actionable-knowledge/ https://www.oclc.org/developer/develop/linked-data.en.html current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 72 https://doi.org/10.6017/ital.v37i4.10432 58 roy tennant, “getting started with linked data,” next blog—oclc, february 8, 2016, http://www.oclc.org/blog/main/getting-started-with-linked-data-3/. 59 tennant, “linked data.” 60 dblp, “dblp computer science bibliography: frequently asked questions,” digital bibliography & library project (dblp), updated november 07, 2018, accessed 08 november 2018. http://dblp.uni-trier.de/faq/. 
61 dblp, “frequently asked questions.” 62 jörg diederich, wolf-tilo balke, and uwe thaden, “demonstrating the semantic growbag: automatically creating topic facets for faceteddblp,” paper presented at the proceedings of the 7th acm/ieee-cs joint conference on digital libraries, vancouver, canada, june 17–22, 2007. 63 jörg diederich, wolf-tilo balke, and uwe thaden, “about faceteddblp,” 2018, accessed november 09, 2018, http://dblp.l3s.de/dblp++.php. 64 tennant, “linked data.” 65 in this section, lov catalog or portal refers to the lov platform available at http://lov.okfn.org/dataset/lov/, whereas the abbreviation lov, when used alone (without the term catalog/portal), refers to linked open vocabularies in general; vandenbussche, “linked open vocabularies,” 437. 66 vandenbussche, “linked open vocabularies,” 443, 450. 67 vandenbussche, “linked open vocabularies,” 437. 68 vandenbussche, “linked open vocabularies,” 437, 438, 450. 69 vandenbussche, “linked open vocabularies,” 438. 70 vandenbussche, “linked open vocabularies,” 437, 438, 443–46. 71 baker thomas, pierre-yves vandenbussche, and bernard vatant, “requirements for vocabulary preservation and governance,” library hi tech 31, no. 4 (2013): 657–68, https://doi.org/10.1108/lht-03-2013-0027. 72 thomas, “vocabulary preservation,” 658. 73 oh, “deploying,” 9. 74 oh, “deploying,” 9. 75 oh, “deploying,” 9, 10. http://www.oclc.org/blog/main/getting-started-with-linked-data-3/ http://dblp.uni-trier.de/faq/ http://dblp.l3s.de/dblp++.php http://lov.okfn.org/dataset/lov/ https://doi.org/10.1108/lht-03-2013-0027 information technology and libraries | december 2018 73 76 erik radio and scott hanrath, “measuring the impact and effectiveness of transitioning to a linked data vocabulary,” journal of library metadata 16, no. 2 (2016): 80–94, https://doi.org/10.1080/19386389.2016.1215734. 77 radio, transitioning,” 81. 78 robert, “strings to things,” 2. 79 robert, “strings to things,” 2, 4, 6. 80 vandenbussche, “linked open vocabularies,” 438. 81 as of april 23, 2018, the schema.org vocabulary is now available at http://lov.okfn.org/dataset/lov/; alberto nogales et al., “linking from schema.org microdata to the web of linked data: an empirical assessment,” computer standards & interfaces 45 (2016): 90-99. https://doi.org/10.1016/j.csi.2015.12.003. 82 bull, “community collaboration,” 146. 83 bull, “community collaboration,” 146. 84 bull, “community collaboration,” 147. 85 bull, “community collaboration,” 147. 86 bull, “community collaboration,” 147, 148. 87 schreur, 2015. linked data for production. 88 yhna therese p. santos, “resource description and access in the eyes of the filipino librarian: perceived advantages and disadvantages,” journal of library metadata 18, no. 1 (2017): 45–56, https://doi.org/10.1080/19386389.2017.1401869. 89 santos, “filipino librarian,” 51–55. 90 philomena w. mwaniki, “envisioning the future role of librarians: skills, services and information resources,” library management 39, no. 1, 2 (2018): 2–11, https://doi.org/10.1108/lm-01-2017-0001. 91 mwaniki, “envisioning the future,” 7, 8. 92 taniguchi, “bibframe 2.0,” 410, 411 . 93 taniguchi, “suitable schema,” 52–58 . 94 taniguchi, “suitable schema,” 59, 60. 95 taniguchi, “suitable schema,” 60. 96 sprochi, “where are we headed?,” 129, 134. 
https://doi.org/10.1080/19386389.2016.1215734 http://lov.okfn.org/dataset/lov/ https://doi.org/10.1016/j.csi.2015.12.003 https://doi.org/10.1080/19386389.2017.1401869 https://doi.org/10.1108/lm-01-2017-0001 current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 74 https://doi.org/10.6017/ital.v37i4.10432 97 sprochi, “where are we headed?,” 129. 98 sprochi, “where are we headed?,” 134. 99 sprochi, “where are we headed?,” 134. 100 sprochi, “where are we headed?,” 134, 135. 101 sprochi, “where are we headed?,” 134. 102 caitlin tillman, joseph hafner, and sharon farnel, “forming the canadian linked data initiative,” paper presented at the the 37th international association of scientific and technological university libraries 2016 (iatul 2016) conference, dalhousie university libraries in halifax, nova scotia, june 5–9, 2016. 103 carol jean godby, shenghui wang, and jeffrey k mixter, library linked data in the cloud: oclc's experiments with new models of resource description. vol. 5, synthesis lectures on the semantic web: theory and technology, san rafael, california (usa),morgan & claypool publishers, 2015, https://doi.org/10.2200/s00620ed1v01y201412wbe012. 104 sofia zapounidou, michalis sfakakis, and christos papatheodorou, “highlights of library data models in the era of linked open data,” paper presented at the the 7th metadata and semantics research conference, mtsr 2013, thessaloniki, greece, november 19 –22, 2013; timothy w. cole et al., “library marc records into linked open data: challenges and opportunities,” journal of library metadata 13, no. 2–3 (2013): 163–96, https://doi.org/10.1080/19386389.2013.826074; kim tallerås, “from many records to one graph: heterogeneity conflicts in the linked data restructuring cycle, information research 18, no. 3 (2013) paper c18, accessed november 10, 2018. 105 fabiano ferreira de castro, “functional requirements for bibliographic description in digital environments,” transinformação 28, no. 2 (2016): 223–31. https://doi.org/10.1590/231808892016000200008. 106 castro, “functional requirements,” 223, 224. 107 castro, “functional requirements,” 224, 230. 108 castro, “functional requirements,” 223, 228–30. 109 gardašević, “possibilities and prospects,” 35. 110 godby, oclc's experiments, 112. 111 gonzales, “the future,” 17. 112 karim tharani, “linked data in libraries: a case study of harvesting and sharing bibliographic metadata with bibframe,” information technology and libraries 34, no. 1 (2015): 5–15. https://doi.org/https://doi.org/10.6017/ital.v34i1.5664. 113 tharani, “harvesting and sharing,” 16. https://doi.org/10.2200/s00620ed1v01y201412wbe012 https://doi.org/10.1080/19386389.2013.826074 https://doi.org/10.1590/2318-08892016000200008 https://doi.org/10.1590/2318-08892016000200008 https://doi.org/https:/doi.org/10.6017/ital.v34i1.5664 information technology and libraries | december 2018 75 114 gonzales, “the future,” 16. 115 karen smith-yoshimura, “analysis of international linked data survey for implementers,” dlib magazine, 2016, july/august 2016. 116 smith-yoshimura, “analysis.” 117 smith-yoshimura, “analysis.” 118 aikaterini k. kalou, dimitrios a. koutsomitropoulos, and georgia d. solomou, “combining the best of both worlds: a semantic web book mashup as a linked data service over cms infrastructure,” journal of library metadata 16, no. 3–4 (2016): 228–49, https://doi.org/10.1080/19386389.2016.1258897. 119 cole, “marc,” 163, 165, 175. 120 cole, “marc,” 163, 164, 191. 121 cole, “marc,” 164, 191. 
122 ifla, “linked open data: challenges arising,” the international federation of library associations and institutions (ifla), 2014, accessed march 03, 2018, https://www.ifla.org/book/export/html/8548. 123 hallo, “current state,” 124. 124 hallo, “current state,” 126. 125 hallo, “current state,” 124. 126 karen smith-yoshimura, “linked data survey results 4–why and what institutions are publishing (updated),” hanging together the oclc research blog, september 3, 2014, accessed november 12, 2018, https://hangingtogether.org/?p=4167. 127 bull, “community collaboration,” 148. 128 tallerås, “one graph.” 129 karen smith-yoshimura, “linked data survey results 3–why and what institutions are consuming (updated),” hanging together the oclc research blog, september 1, 2014, accessed november 12, 2018, http://hangingtogether.org/?p=4155. 130 godby, oclc’s experiments, 116. 131 carol jean godby and karen smith‐yoshimura, “from records to things: managing the transition from legacy library metadata to linked data,” bulletin of the association for information science and technology 43, no. 2 (2017): 18–23, https://doi.org/10.1002/bul2.2017.1720430209. 132 godby, “from records to things,” 23. https://doi.org/10.1080/19386389.2016.1258897 https://www.ifla.org/book/export/html/8548 https://hangingtogether.org/?p=4167 http://hangingtogether.org/?p=4155 https://doi.org/10.1002/bul2.2017.1720430209 current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 76 https://doi.org/10.6017/ital.v37i4.10432 133 godby, “from records to things,” 22. 134 vandenbussche, “linked open vocabularies,” 449, 450. 135 silvia b. southwick, cory k lampert, and richard southwick, “preparing controlled vocabularies for linked data: benefits and challenges,” journal of library metadata 15, no. 3–4 (2015): 177–190, https://doi.org/10.1080/19386389.2015.1099983. 136 southwick, “controlled vocabularies,” 177. 137 southwick, “controlled vocabularies,” 189, 190. 138 southwick, “controlled vocabularies,” 183. 139 robin hastings, “feature: linked data in libraries: status and future direction,” computers in libraries (magzine article), 2015, http://www.infotoday.com/cilmag/nov15/hastings-linked-data-in-libraries.shtml. 140 hastings, “status and future.” 141 hastings, “status and future.” 142 hastings, “status and future.” 143 hastings, “status and future.” 144 tallerås, “national libraries,” 129 (by quoting from van hooland 2009; wang and strong 1996). 145 jung-ran park, “metadata quality in digital repositories: a survey of the current state of the art,” cataloging & classification quarterly 47, no. 3–4 (2009): 213–28, https://doi.org/10.1080/01639370902737240. 146 tallerås, “national libraries,” 129 (by quoting from bruce & hillmann, 2004). 147 park, “metadata quality,” 213, 224; tallerås, “national libraries,” 129, 150. 148 park, “metadata quality,” 213, 215, 218–21, 224, 225; tallerås, “national libraries,” 141. 149 tallerås, “national libraries,” 129. 150 tallerås, “national libraries,” 129. 151 tallerås, “national libraries,” 129. 152 karen snow, “defining, assessing, and rethinking quality cataloging,” cataloging & classification quarterly 55, no. 7–8 (2017): 438–55, https://doi.org/10.1080/01639374.2017.1350774. 153 snow, “quality cataloging,” 445. 154 snow, “quality cataloging,” 451, 452. 
https://doi.org/10.1080/19386389.2015.1099983 http://www.infotoday.com/cilmag/nov15/hastings--linked-data-in-libraries.shtml http://www.infotoday.com/cilmag/nov15/hastings--linked-data-in-libraries.shtml https://doi.org/10.1080/01639370902737240 https://doi.org/10.1080/01639374.2017.1350774 information technology and libraries | december 2018 77 155 david van kleeck et al., “managing bibliographic data quality for electronic resources,” cataloging & classification quarterly 55, no. 7-8 (2017): 560–77, https://doi.org/10.1080/01639374.2017.1350777. 156 van kleeck, “data quality,” 560, 575, 576. 157 van kleeck, “data quality,” 575. 158 park, “metadata quality,” 214, 216–18, 225. 159 niso, a framework of guidance for building good digital collections, ed. niso framework advisory group, 3rd ed (baltimore, md: national information standards organization, 2007), https://www.niso.org/sites/default/files/2017-08/framework3.pdf. 160 park, “metadata quality,” 214, 215; niso. guidance; jane barton, sarah currier, and jessie mn hey, “building quality assurance into metadata creation: an analysis based on the learning objects and e-prints communities of practice,” paper presented at the proceedings of the international conference on dublin core and metadata applications: supporting communities of discourse and practice—metadata research & applications, seattle, washington, september 28–october 2, 2003. 161 pascal hitzler and krzysztof janowicz, “linked data, big data, and the 4th paradigm,” semantic web 4, no. 3 (2013): 233–35, https://doi.org/10.3233/sw-130117. 162 hitzler, “4th paradigm,” 234. 163 hitzler, “4th paradigm,” 234. 164 alberto petrucciani, “quality of library catalogs and value of (good) catalogs,” cataloging & classification quarterly 53, no. 3–4 (2015): 303–13. https://doi.org/10.1080/01639374.2014.1003669. 165 petrucciani, “quality,” 303, 305. 166 petrucciani, “quality,” 303, 309, 311. 167 petrucciani, “quality,” 303, 309. 168 petrucciani, “quality,” 309, 310. 169 petrucciani, “quality,” 310. 170 bull, “community collaboration,” 147. 171 bull, “community collaboration,” 148. 172 han, myung-ja, “new discovery services and library bibliographic control,” library trends 61, no. 1 (2012):162–72, https://doi.org/10.1353/lib.2012.0025. 173 han, “bibliographic control,” 162. https://doi.org/10.1080/01639374.2017.1350777 https://www.niso.org/sites/default/files/2017-08/framework3.pdf https://doi.org/10.3233/sw-130117 https://doi.org/10.1080/01639374.2014.1003669 https://doi.org/10.1353/lib.2012.0025 current state of linked and open data in cataloging | ullah, khusro, ullah, and naeem 78 https://doi.org/10.6017/ital.v37i4.10432 174 han, “bibliographic control,” 169–71. 175 han, “bibliographic control,” 163. 176 han, “bibliographic control,” 167–70. 177 alemu, emergent theory, 29–33, 43–65. 178 alemu, emergent theory, 29–65. 179 lorri mon, social media and library services, synthesis lectures on information concepts, retrieval, and services, ed. gary marchionini, 40, san rafael, california (usa), morgan & claypool publishers, 2015), https://doi.org/10.2200/s00634ed1v01y201503icr040. 180 mon, social media, 50. 181 mon, social media, 24. 
182 marijn koolen et al., “overview of the clef 2016 social book search lab,” paper presented at the 7th international conference of the cross-language evaluation forum for european languages, évora, portugal, september 5–8, 2016; koolen et al., “overview of the clef 2015 social book search lab,” paper presented at the 6th international conference of the crosslanguage evaluation forum for european languages, toulouse, france, september 8–11, 2015; patrice bellot et al., “overview of inex 2014,” paper presented at the international conference of the cross-language evaluation forum for european languages, sheffield, uk, september 15–18, 2014; bellot et al., “overview of inex 2013,” paper presented at the international conference of the cross-language evaluation forum for european languages, valencia, spain, september 23–26, 2013. 183 bo-wen zhang, xu-cheng yin, and fang zhou, “a generic pseudo relevance feedback framework with heterogeneous social information,” information sciences 367–68 (2016): 909–26, https://doi.org/10.1016/j.ins.2016.07.004; xu-cheng yin et al., “isart: a generic framework for searching books with social information,” plos one 11, no. 2 (2016): e0148479, https://doi.org/10.1371/journal.pone.0148479; faten hamad and bashar alshboul, “exploiting social media and tagging for social book search: simple query methods for retrieval optimization,” in social media shaping e-publishing and academia, edited by nashrawan tahaet al., 107–17 (cham: springer international publishing, 2017). 184 marijn koolen, “user reviews in the search index? that’ll never work!” paper presented at the 36th european conference on ir research (ecir 2014), amsterdam, the netherlands, april 13–16, 2014. 185 alemu, emergent theory, 29–33, 43–65. 186 lucy clements and chern li liew, “talking about tags: an exploratory study of librarians’ perception and use of social tagging in a public library,” the electronic library 34, no. 2 (2016): 289–301, https://doi.org/10.1108/el-12-2014-0216. 187 clements, “talking about tags,” 291, 297-99. https://doi.org/10.2200/s00634ed1v01y201503icr040 https://doi.org/10.1016/j.ins.2016.07.004 https://doi.org/10.1371/journal.pone.0148479 https://doi.org/10.1108/el-12-2014-0216 information technology and libraries | december 2018 79 188 sharon farnel, “understanding community appropriate metadata through bernstein’s theory of language codes,” journal of library metadata 17, no. 1 (2017): 5–18, https://doi.org/10.1080/19386389.2017.1285141. 189 farnel, “bernstein’s theory,” 5, 6. 190 mwaniki, “envisioning the future,” 8. 191 mwaniki, “envisioning the future,” 8, 9. 192 getaneh alemu et al., “toward an emerging principle of linking socially-constructed metadata,” journal of library metadata 14, no. 2 (2014): 103–29, https://doi.org/10.1080/19386389.2014.914775. 193 farnel, “bernstein’s theory,” 15–16. 194 kalou, “book mashup.” 195 kalou, “book mashup,” 242, 243. 196 alemu, “socially-constructed metadata,” 103, 107. 197 alemu, “socially-constructed metadata,” 103. 198 alemu, “socially-constructed metadata,” 103, 104, 120, 121. 199 getaneh alemu, “a theory of metadata enriching and filtering: challenges and opportunities to implementation,” qualitative and quantitative methods in libraries 5, no. 2 (2017): 311–34, http://www.qqml-journal.net/index.php/qqml/article/view/343 200 alemu, “metadata enriching and filtering,” 311. 201 alemu, “socially-constructed metadata,” 125. 202 alemu, “metadata enriching and filtering,” 319, 320. 
203 alemu, “metadata enriching and filtering”; alemu, emergent theory; alemu, “socially-constructed metadata”; farnel, “bernstein's theory”; kalou, “book mashup.”

204 hallo, “current state,” 120.

205 alemu, “socially-constructed metadata,” 125; hastings, “status and future.”

206 bull, “community collaboration,” 147.

207 bull, “community collaboration,” 152; schreur, 2015. linked data for production.

208 tallerås, “national libraries,” 129.

209 petrucciani, “quality,” 303, 309.

210 bull, “community collaboration,” 147, 152.

211 farnel, “bernstein's theory,” 5, 6, 12, 13, 15, 16; mwaniki, “envisioning the future,” 8.

212 mon, social media, 3; alemu, “metadata enriching and filtering,” 320.

213 alemu, “socially-constructed metadata,” 125.

214 koolen, “clef 2016”; koolen, “clef 2015”; bellot, “inex 2014”; bellot, “inex 2013.”

lita president’s message
in the middle of difficulty lies opportunity: hope floats
evviva weinraub lajoie
information technology and libraries | september 2020 https://doi.org/10.6017/ital.v39i3.12687
evviva weinraub lajoie (evviva@gmail.com) is vice provost for university libraries and university librarian, university at buffalo, and the last lita president. © 2020.

if quarantine has illustrated anything to me, it’s that time is merely a construct. while my approximately 2-month term as president may be the shortest in lita history, it has been filled with meetings, reports, protests, and preparations for our metamorphosis into core. my thoughts have been consumed with the myriad financial, health, and societal issues that have also filled my news feed. i spend a lot of time thinking and worrying about what their impact will be on our work and our institutions, how they affect me and the people i work with personally, and what role core may play for many of us in the future. i imagine all of us are thinking about health and safety. we are all balancing those parts of ourselves that want to aid, to help, to teach and guide with the parts of ourselves that are anxious and scared. many of us have responsibilities where we need to protect our loved ones and ourselves. we are seeing the health and safety of our bipoc colleagues disproportionately harmed. balancing our crucial role within our communities is complicated, and there are no right answers. i imagine many of us have been spending a lot of time thinking about money, whether it be personal concerns, institutional and organizational concerns, or their intersection point.
we’re thinking a lot more about where our money comes from, how it is invested, how we pay for things, how we prioritize paying for things, who decides what gets purchased, and whose voice gets centered when we make that purchase. we’re thinking carefully about the institutions and infrastructures that have existed and how they will look different and should be different in a post-covid landscape. i imagine most of us are thinking about societal connections. we are interacting with our professional colleagues differently, and many of us are, perhaps for the first time, perceiving the deep imbalances that permeate our personal, social, and professional lives. we are all trying to figure out how to do the work we need to do when we are uncomfortable and the world is uncertain and the demands for change are coming from all angles and in a variety of forms. lita remained my professional home through the years because i found it to be a place where no matter who you were or where you worked, there was a place for you. that feeling of connection is so vital to all of us, pandemic and social unrest or not. knowing there is a network i can depend on to be there when i’m working through the difficult and uncomfortable makes the work just a little bit easier and significantly more meaningful. our professional organizations and affiliations have the ability to be an anchor in uncertain times whether through a change in career, a financial crisis, an environmental catastrophe, or a global health emergency. on august 31, 2020, lita officially dissolved and on september 1, our home became core. at our last lita board meeting, margaret heller and amanda l. goodman presented a history of lita. what became clear to me in the retelling is that this is not lita’s first reorganization. nor is it our second or our third. llama, lita, and alcts have always been dancing with each other. our merger is an acknowledgement that we “...play a central role in every library, shaping the future of the profession by striking a balance between maintenance and innovation, process and progress, collaboration and leading.” collectively, we have had a year that is beyond comprehension—it has been filled with loss, anger, frustration, grief, anxiety, depression, horror...we have all been weathering the same storm, but our ships are not all equally prepared for the task laid ahead of them. that has been, for so many of us, the hardest part of all of this. we may have always known that inequities existed, that the system was structured to make sure that some folks were never able to get access to the better goods and services, but for many, this pandemic is the first time we have had those systemic inequities held up to our noses and been asked, “what are you going to do to change this?” balancing those priorities will require us to lean on our professional networks and organizations to be more and to do more. i believe that together, we can make core stand up to that challenge. it has been an honor to serve as the last lita president. for the brief time i have served, to have the chance to hold an office so many people i truly admire have held...it is a legacy i am proud to have had a moment to uphold. i am gratified to transition lita into a partnership that will take all that we have loved about lita and make something new, something core.
public libraries leading the way
the first 500 mistakes you will make while streaming on twitch.tv
chris markman, kasper kimura, and molly wallner
information technology and libraries | september 2022 https://doi.org/10.6017/ital.v41i3.15475
chris markman (chris.markman@cityofpaloalto.org) is senior librarian, palo alto city library. kasper kimura (kasper.tsutomu@gmail.com) is methodist youth fellowship high school director, wesley united methodist church. molly wallner (molly.wallner@cityofpaloalto.org) is senior librarian, palo alto city library. © 2022.

introduction

three librarians at the palo alto city library embarked on an epic virtual event journey in 2020. this is our story. twitch.tv is the most popular video game streaming platform on the internet right now, but that does not mean it is the easiest to use or navigate as content creators. while the mistakes were many, you do not have to repeat them. in short, lessons learned over the past two years fell under four distinct categories, many of them interrelated or compounding one another:

• physical space limitations and challenges migrating studio setups during various phases of the covid-19 pandemic.
• complex decision-making around audio and video equipment purchases.
• our own familiarity with videogame streaming platforms and specialized software.
• converting our in-person event policies and codes of conduct for virtual events.

mistakes 001–135: picking the right time, place, and software

we can say confidently that mistake #1 in your 500-mistake journey is pretending the library will strike gold with its first-ever stream and achieve instant online success. we chose minecraft as our first videogame featured on twitch.tv. the cold reality is that real-world streamers who host thousands of viewers at one time are not building the interpersonal connections you are likely aiming for as a librarian. the second biggest mistake you’re likely to make while setting up a stream involves picking the right location. over the course of two years, in response to different levels of building access, we ended up moving our ad-hoc studio location a total of four times. each location posed its own challenges, and we learned more about what worked with every move. your streaming space should not only be distraction-free, but also easy to adjust as needed, because your setup will change over time. picking the right av equipment for your stream is a gigantic topic, and the subject of infinite support forum threads and online discussions. the correct answer also largely depends on whether you plan to stick with console game streaming, or pc, or some mixture of both. we can summarize by saying that to start off, you do not need the very best studio gear, and in fact, this thinking can lead to an artificial barrier that might result in more “tech debt” than necessary. you will end up spending a considerable amount of time troubleshooting strange quirks that were not there the last time you streamed, or with each new equipment purchase/upgrade.
mistakes 136–223: moderation tools and volunteers

we have had to block a few bots, as well as tactfully defuse some loose-cannon stream surfers by maintaining aggressive kindness in answer to their sarcastic questioning. overall, our moderating world has not been rocked in a way we weren’t prepared for, due to our thoughtfully crafted and transparent policy that was adapted from our patron code of conduct, trained teen volunteer moderators, and clear communication as a team.

mistakes 224–301: the finer points of twitch.tv

in addition to having had little experience playing video games in general, our stream host also had no experience with streaming. by design, kasper went into our first stream with only two guidelines for interacting with twitch viewers: don’t stop talking and be friendly. no one wants to watch someone silently play a game badly; it’s not engaging and it’s not fun. another part of using twitch that we did not account for until we were in the middle of the first stream is that the chat runs on a delay. this makes sense from a moderating point of view; you want to be able to catch inappropriate or spam comments. however, in terms of holding a conversation with the chat, it became a mental challenge to hold multiple threads of conversation at a time—all while playing—and all while narrating what’s happening on screen, and as people were typing to respond to what was just said or done. this process can be very overwhelming for twitch.tv hosts.1 imagine driving a car on the highway while also watching a movie of yourself, and then simultaneously holding a conversation with ten or more people in the back seat of this car at the same time. they’re not commenting on what you’re currently doing, though; instead, they’re making jokes about the on-ramp or stoplight two miles back. it’s not impossible to juggle these tasks simultaneously, but as the host, it does require practice.

mistakes 302–389: art is a process, just like the inevitable bugs you will find in your setup every time you change anything

heed our warning! you can find a mountain of well-meaning online advice and tutorials about the best possible streaming setup and content strategy: much of this is outdated or aimed at a very specific subset of gamers. there is a cottage industry of media consultants and youtuber personalities that review hardware and share tips-and-tricks advice. your information literacy skills should not go to waste here! always consider the source.

stream decks and keyboard shortcuts: what the twitch.tv pros get right

if we could go back in time, there is one element to our stream setup that could have been integrated sooner, and that’s the stream deck by elgato (https://www.elgato.com/en/stream-deck). this extra desktop keypad is literally a game changer for usability—it is the peanut butter that smooths over all the ux cracks created by obs studio (open broadcaster software) and the chaos of chat interactions already discussed. this small hardware upgrade also makes onboarding new stream hosts much easier because there is no need to memorize keyboard shortcuts: the buttons on the stream deck can be customized to do exactly what they say they do (like mute audio, change screen layouts, or stop and start streaming).
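for readers curious what a lightweight moderation aid for the chat challenges described above might look like, the sketch below connects to twitch chat (which speaks the irc protocol) and flags messages containing terms from a blocklist for a human moderator to review. the channel name, bot account, token, and blocklist phrases are all placeholders, and this is not the setup used at the palo alto city library; it is only a minimal illustration of how chat can be watched programmatically alongside twitch’s own moderation tools.

```python
import socket

# placeholder credentials: a real oauth token comes from twitch's developer flow
TOKEN = "oauth:replace-me"
NICK = "examplelibrarybot"
CHANNEL = "#examplelibrary"
BLOCKLIST = {"buy followers", "free prizes"}  # placeholder spam phrases

sock = socket.socket()
sock.connect(("irc.chat.twitch.tv", 6667))  # twitch chat's irc endpoint
sock.send(f"PASS {TOKEN}\r\n".encode())
sock.send(f"NICK {NICK}\r\n".encode())
sock.send(f"JOIN {CHANNEL}\r\n".encode())

while True:
    data = sock.recv(2048).decode("utf-8", errors="ignore")
    if data.startswith("PING"):
        sock.send("PONG :tmi.twitch.tv\r\n".encode())  # keep the connection alive
        continue
    for line in data.split("\r\n"):
        if "PRIVMSG" in line:
            # irc messages look like ":user!user@host PRIVMSG #channel :text"
            message = line.split(":", 2)[-1].strip().lower()
            if any(term in message for term in BLOCKLIST):
                # flag for a human moderator rather than acting automatically
                print(f"[flagged] {message}")
```

in practice, twitch’s built-in automod settings and trained human moderators do the heavy lifting; a script like this is only a safety net that surfaces suspicious messages without auto-banning anyone.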
mistakes 390–499: do androids dream of electric animal crossing dream codes? what does twitch.tv outreach look like?

we used social media to connect with other organizations doing similar work, such as the lgbtq+ youth space in san jose. we had worked with this group before the pandemic on some pride programs for teens at the library, and so in 2020, when we saw on their instagram that they had a minecraft server open to the local community, our team eagerly jumped on this opportunity to collaborate with them. we had a minecraft stream; they had a minecraft server—could the stars be any more aligned? after some planning, one of the server mods joined us for a stream and gave us a tour of their server, which ended up being one of our most popular streams to date.

conclusion: and what did we learn from all this?

the final mistake (#500) is giving up. over the past two years we have hosted over 50 streams at https://www.twitch.tv/paloaltolibrary and can say confidently that each virtual event was not only unique but also improved over time. we encourage more librarians to test out this mode of online outreach and practice their iterative design skills. video game streaming is not only fun for both the audience and hosts, but also a great way to connect with “extremely online” patrons of all ages.

endnotes

1 to illustrate this problem in more detail: consider the events of our very first stream, in which kasper’s dog saw a postal employee through the window while live on camera and reacted accordingly. this was one of the many reasons why moving our center of operations from the living room to the library was an upgrade.

public libraries leading the way
a collaborative approach to newspaper preservation
ana krahmer and laura douglas
information technology and libraries | september 2020 https://doi.org/10.6017/ital.v39i3.12596
ana krahmer (ana.krahmer@unt.edu) oversees the digital newspaper unit at unt. through this work, she manages the texas digital newspaper program collection on the portal to texas history, which is a gateway to historic research materials freely available worldwide. laura douglas (laura.douglas@cityofdenton.com) is the librarian in charge of special collections at the denton public library, which houses the genealogy, texana, and local denton history collections as well as the denton municipal archives. in her work, she regularly assists patrons with newspaper research questions specifically related to denton newspapers. © 2020.
introduction

when we first proposed this column in january 2020, we had no idea how much the world would change between then and the july deadline. while we have collaborated for many years on a variety of projects, the value of our collaboration has never proven itself more than in this covid-19 reality: collaboration leverages the strengths and resources of partners to form something stronger than each. in this world of covid-19, the collaboration between the denton public library (dpl) and the university of north texas libraries (unt) has allowed us to build open, online access to the first 16 years of the denton record-chronicle (drc). this newspaper is the city’s daily newspaper of record, and the collaboration between dpl and unt resulted in free, worldwide research access via the portal to texas history. the project was funded by a $24,820.00 grant through the imls library services and technology act (lsta), awarded from september 2019 to august 2020 by the texas state library and archives commission (tslac) as part of its textreasures program, to digitize 24,000 newspaper pages. this project has also resulted in a follow-up collaboration to build open access to further years of this daily newspaper title, through a 2021 textreasures award to digitize an additional 24,000 newspaper pages. the real question, though, is what recipe made this a successful collaboration.

background

the drc has been the community newspaper in denton for over 100 years. due to the sheer amount of material, digitizing a daily newspaper with such an extensive publication run is a long-term project that requires a lot of planning, time, and funding. since the dpl’s inception in 1937, the library has endeavored to collect items related to denton and texas history. with community support, the library has developed a well-rounded collection of local history, texana, and genealogical materials, all of which are housed in the special collections research area at the emily fowler central library. these materials support research, projects, and exhibits. one major research resource is the archival collection of local newspapers, mainly the drc, maintained on 752 rolls of microfilm containing issues from 1908 to 2018. before this project, access to these newspapers was only available in the special collections research area, through microfilm readers or paid subscription services. in addition, although steps had been taken to preserve the film, many of the rolls show wear from years of use, while others have developed vinegar syndrome and soon will no longer be a usable resource. in 2018, unt obtained publisher permission to make the drc run freely accessible on the portal to texas history. laura had been exploring different avenues to digitize this microfilm and make it freely available to the public when ana contacted her with information about tslac, which awards annual grants supported by library services & technology act funds, through the institute of museum & library services. lsta funding is annually provided to all fifty states through the institute of museum and library services, and the state library determines how this funding is expended.
in texas, lsta funding is provided through a number of grant programs, including textreasures, a competitive grant program for any texas library. as described by tslac, the “textreasures grant is designed to help libraries make their special collections more accessible for the people of texas and beyond. activities considered for possible funding include digitization, microfilming, and cataloging.” libraries can apply to fund the same type of project up to three years in a row, and the drc project applied for $24,820.00 in 2019 to digitize 24,000 newspaper pages, representing the earliest years of microfilm available at the denton public library. to create a viable grant application, dpl partnered with the texas digital newspaper program (tdnp), available through unt’s portal to texas history, and decided to start first by digitizing as many early years of microfilm as grant funding could cover. tdnp is the largest single-state, open-access digital newspaper preservation repository in the u.s., hosting just under 8 million newspaper pages at the time of this writing. in late 2018, unt received permission from the owner of the drc to include the newspaper run in the tdnp collection, which represented a very exciting opportunity for city and county researchers, as well as for the dpl. as thanks to the publisher for granting permission, unt built access to the 2014 to 2018 pdf eprint editions, which the tdnp preserves as a service to texas press association member publishers. after this, unt contacted the dpl to discuss applying for grant funding. once laura learned that the dpl had received the 2019 award, she prepared the local planning steps necessary to collaborate with the university.

the project becomes real

the denton record-chronicle digitization project grant contract and resolution for adoption went before the denton city council on october 8, 2019. the city of denton issued a press release that day, and the drc also published an article announcing the project. over the next few days, the drc article appeared across social media, including the city of denton’s social media accounts, as well as through library-associated email newsletters. after the first newspapers became available on the portal, both dpl and unt prepared blog posts about the project, which have also appeared on social media. these blog posts fulfilled publicity requirements specified by the grant, even while offering training to researchers in how to work with the online newspaper collection. one major convenience of this collaboration is that both organizations are in the same city. transfer of materials was arranged by email and accomplished by a trip across town. we completed the digitization process in batches, with the first 10 microfilm rolls going to unt on october 10, 2019, and unt uploading the first 854 issues in december 2019. the newspapers from the first microfilm set represented 1908–1916. dpl transferred the last set of microfilm in april 2020, with dates ranging from 1917 through september 1924, shortly after which unt completed and uploaded the grant-funded count of 24,000 newspaper pages. the grant proposal estimated that the scans would extend through 1938, but the newspaper’s page count proved to be much, much higher than originally estimated, and as a result, the funding only covered issues through september 1924.
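the gap between estimated and actual page counts is easy to underestimate when budgeting a digitization grant. the short calculation below is a rough sketch built from the award figures above; the per-issue page averages and the one-issue-per-day assumption are hypothetical and only meant to illustrate the arithmetic, not to restate unt’s actual fee schedule or the drc’s actual page counts.

```python
# award figures from the 2019 textreasures grant described above
award_amount = 24_820.00   # dollars
pages_funded = 24_000      # newspaper pages the award covers

print(f"implied cost per page: ${award_amount / pages_funded:.2f}")  # about $1.03

# hypothetical planning estimate vs. hypothetical observed average
planned_pages_per_issue = 6
observed_pages_per_issue = 8

issues_planned = pages_funded // planned_pages_per_issue   # 4,000 issues
issues_covered = pages_funded // observed_pages_per_issue  # 3,000 issues

# for a daily paper (assuming roughly one issue per day, 365 per year),
# the same page budget now covers nearly three fewer years of the run
years_lost = (issues_planned - issues_covered) / 365
print(f"approximate years of coverage lost: {years_lost:.1f}")        # about 2.7
```

running this kind of sensitivity check at the proposal stage makes it easier to explain to the granting agency why a fixed-page budget may not reach the originally projected end date.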
dpl and unt will continue their partnership by digitizing further years of the drc, through a variety of methods. as we were in the midst of preparing this column, tslac contacted laura to inform her that dpl had received a second grant award of $24,820.00 to digitize 24,000 additional newspaper pages, which will move the newspapers through 1954. as of july 23, 2020, the denton record-chronicle collection on the portal to texas history hosts 6,168 items and has been used 16,397 times. this includes 1,743 items that are pdf eprint editions of the paper from 2014 to 2018, which unt uploaded for long-term preservation and access. unt uploads eprint editions without a charge, and digitally preserves these through an agreement with the texas press association; these pdfs were not a part of the funded grant, but they do enhance access to the collection and helped to build community interest in seeing earlier years available on the portal. the usage of the collection skyrocketed after the early editions became available. january 2020 saw the highest monthly total, with 3,105 uses. once this project is complete, it will include over 200,000 newspaper pages. neither dpl nor unt could tackle this project on its own, but through collaboration, it is possible.

recipe for your own collaboration success

these are planning recommendations as you prepare for your own collaboration, drawn from what we’ve learned as we worked on this project together.

1. communicate early and often: communicating needs enables partners to identify each other’s strengths. each partner will bring their strengths to the project, which in this case included actual archival materials from dpl and technological expertise on the unt side. in addition, be prepared to communicate with local groups who need to endorse or sign off on the project, including possibly the city council, the historical commission, or the city manager.

2. partner to write the grant: partnering in preparing the grant achieves two goals: first, it enables partners to develop a communication flow that will move forward throughout the collaboration; second, it ensures that partners know what each can realistically accomplish within the grant timeline. in this case, laura wrote most of the grant application herself, but she had very specific questions that ana had to answer, and she needed key elements from unt, including project budget, technological infrastructure, and a commitment letter. communicating early and partnering on the grant application process ensured that there were no unexpected surprises that were within the control of either partner.

3. work together to explain your partnership: with a grant of this size, we always spoke in advance to ensure we weren’t over-promising about when newspapers would appear online. this also gave both laura and ana lead time for promoting the project: laura would share the years of the physical microfilm before sending them over, and ana would walk laura through the years that would get uploaded in a given month. this allowed them to plan publicity, training, and outreach efforts based on the dates of newspapers going online. in addition, laura regularly communicated with ana prior to submitting grant reports, and this was critical in preventing miscommunication going to the funding agency.
4. pad enough time for the unexpected: of course, we had no way of knowing a pandemic would occur when we began this project, and what saved us was that we'd started planning as soon as we learned about receiving the grant, rather than as soon as the grant started, which was in september 2019. planning two months in advance put us two months ahead of schedule, and we were able to start exchanging materials as soon as the grant period started. this gave us a few weeks of lead time, so we successfully completed the project by the end of april 2020, at which point the microfilm page count had been scanned and unt staff could remote in to complete the digitization processes. extra time is only a benefit: even if the covid-19 pandemic had not occurred, we still might have had to address technological or film-deterioration problems, and we could have resolved these earlier rather than later because we had given ourselves a few extra weeks of lead time.

5. don't be afraid to explain changes to your granting agency: if your project changes due to unforeseen circumstances, explain what happened. in our project, the uploaded total of pages reached 24,000 before we had digitized the entire planned date range; unt charges a per-page digitization fee, and these newspaper issues proved to contain more pages than expected. laura contacted the representative at tslac to explain the situation and offer an alternative approach to cover the digitization of the remaining years. the important thing is to keep the granting agency informed of any changes, delays, or hiccups in the project.

we are both proud of having completed this project three months before the end of the grant period, but we know that without solid communication, planning, and flexibility, the covid-19 pandemic would have made the situation extremely difficult if not impossible. leveraging the portal's technical infrastructure and tdnp's newspaper expertise, together with the volume of material and collection expertise provided by the dpl, has given us a model for success we plan to capitalize on in future projects. best of all, in the world of covid-19, our patrons can access these newspapers from the comfort of their own couches, without even taking off their pajamas!

primo new user interface: usability testing and local customizations implemented in response

blake lee galbreath, corey johnson, and erin hvizdak

information technology and libraries | june 2018

blake lee galbreath (blake.galbreath@wsu.edu) is core services librarian, corey johnson (coreyj@wsu.edu) is instruction and assessment librarian, and erin hvizdak (erin.hvizdak@wsu.edu) is reference and instruction librarian, washington state university.

abstract

washington state university was the first library system of its 39-member consortium to migrate to primo new user interface. following this migration, we conducted a usability study in july 2017 to better understand how our users fared when the new user interface deviated significantly from the classic interface. from this study, we learned that users had little difficulty using basic and advanced search, signing into and out of primo, and navigating their account. in other areas, where the difference between the two interfaces was more pronounced, study participants experienced more difficulty.
finally, we present customizations implemented at washington state university to the design of the interface to help alleviate the observed issues.

introduction

a july 2017 usability study by washington state university (wsu) libraries was the final segment of a six-month process for migrating to the new user interface of ex libris primo, called primo new ui. wsu libraries assembled a working group in december 2016 to plan for the migration from the classic interface to primo new ui and met bi-weekly through may 2017. to start, the primo new ui working group attempted to answer some baseline questions: what can and cannot be customized in the new interface? how, and according to what timeline, should we introduce the new interface to our library patrons? what methods could be used to assess the new interface? this working group customized the look and feel of the new interface to conform to wsu branding and then released a beta version of primo new ui in march, leaving the older interface (primo classic) as the primary means of access to primo but allowing users to enter and test the beta version of the new interface. in early may (at the start of the summer semester), the prominence of the old and new interfaces was reversed, making primo new ui the default interface but leaving the possibility of continued access to primo classic. the older interface was removed from public access in mid-august, just prior to the start of the fall semester. the public had the opportunity to work with the beta version from march to may and then another two months of experience with the production release by the time the usability study took place in july 2017. the remainder of this paper will focus on the details of this usability study.

research questions

primo new ui was the name given to the new front end of the primo discovery layer, which was made available to customers in august 2016. according to ex libris, "its design is based on user studies and feedback to address the different needs of different types of users."1 we were primarily interested in understanding the usability of the essential functionalities of primo new ui, especially where the design of the new interface deviated significantly from the classic interface (taking local customizations into account). for example, we noted that the new interface introduced the following differences to the user (this ordinal list corresponds to the number labels in figure 1):

1. basic search tabs were expressed as drop-downs.
2. the advanced search link was less prominent than it was with our customized shape and color in the classic interface.
3. main menu items were located in a separate area from the sign in and my account links.
4. my favorites and help/chat icons were located together and in a new section of the top navigation bar.
5. sign in and my account links were hidden beneath a "guest" label.
6. facet values were no longer associated with checkboxes or underlining upon hover.
7. availability statuses were expressed through colored text.

figure 1. basic search screen in primo new ui.

we also observed a fundamental change in the structure of the record in primo new ui: the horizontally oriented and tabbed structure of the classic record (see figure 2) was converted to a vertically oriented and non-tabbed structure in the new interface (see figure 3).
additionally, the tabbed structure of the classic interface opened in a frame of the brief results area, while the same information was displayed on the full display page of the new interface. the options displayed in these areas are known as get it and view it (although we locally branded our sections availability and request options and access options, information technology and libraries | june 2018 12 respectively). therefore, we were eager to see how this change in layout might affect a participant’s ability to find get it and view it information on the full display page. taking the above observations into account, we formulated the following questions: 1. will the participant be able to find and use the basic search functionality? 2. will the participant be able to understand the availability information of the brief results? 3. will the participant be able to find and use the sign in and sign out features? 4. will the participant be able to understand the behavior of the facets? 5. will the participant be able to find and use the actions menu? (see the “send to” boxed area in figure 3.) 6. will the participant be able to navigate the get it and view it areas of the full display page? (see the “availability and request options” boxed area in figure 3.) 7. will the participant be able to navigate the my account area? 8. will the participant be able to find and use the help/chat and my favorites icons? 9. will the participant be able to find and use the advanced search functionality? 10. will the participant be able to find and use the main menu items? (see figure 1, number 3.) figure 2. horizontally oriented and tabbed layout of primo classic. literature review 2012 witnessed a flurry of studies involving primo classic. majors compared the experiences of users within the following discovery interfaces: encore synergy, summon, worldcat local, primo central, and ebsco discovery service. the study used undergraduate students enrolled at the university of colorado and focused on common undergraduate searching activities. each interface was tested by five or six participants who also completed an exit survey. observations specific to the primo interface noted that users had difficulty finding and using existing features, such as email and e-shelf, and difficulty connecting their failed searches to interlibrary loan functionality.2 primo new user interface | galbreath, johnson, and hvizdak 13 https://doi.org/10.6017/ital.v37i2.10191 figure 3. vertically oriented and non-tabbed layout of primo new ui. comeaux noted issues relating to terminology and the display of services during usability testing carried out at tulane university. twenty people, including undergraduates, graduates, and faculty members, participated in this study, which tested five typical information-seeking scenarios. the study found several problems related to terminology. for example, participants did not fully understand the meaning of the expand my results functionality.3 participants also did not understand that the display text “no full-text” could be used to order an item via interlibrary loan. 4 the study also concluded that the mixed presentation of differing resource types (e.g., books, articles, reviews) was confusing for patrons who were attempting known-item searches.5 jarrett documented a usability study conducted at flinders university library. 
the aims of the study were to determine user perceptions regarding the usability of the discovery layer, the relevance of the information retrieved, and the user experiences of this search interface compared to other interfaces. 6 the usability portion of the study scored the participants’ completion of tasks in the primo discovery layer as difficult, confusing, neutral, or straightforward. scores indicated that participants had difficulty determining different editions of a book, locating a local thesis, and placing an item on hold. the investigators also observed that students had issues signing into primo and distinguishing between journals and journal articles.7 information technology and libraries | june 2018 14 nichols et al. conducted a usability test on a newly implemented primo instance at the university of vermont libraries in 2012. their research questions were designed to understand primo’s design, functionality, and layout.8 the majority of the participants were undergraduate students. similar to comeaux, confusion occurred when participants had to find specific or relevant records within longer sets of results.9 nichols et al. also noticed that test subjects had difficulty navigating and finding information in the primo tabbed structure. like jarrett, nichols et al. noted that participants had difficulty distinguishing between the journals and articles.10 similar to majors, participants in nichols et al. had difficulty finding certain primo functionality, such as email, the e-shelf, and the feature to open items in a new window.11 the investigators concluded that these tools were difficult to find because they were buried too deep in the interface. the university of kansas libraries conducted two usability studies on primo. the first study took place during the 2012–13 academic year and involved 27 participants, including undergraduate, graduate, and professional students, who performed four to five main tasks in two separate sessions. similar to other studies, participants experienced great difficulty using the save to e-shelf and email citation tools.12 kliewer et al. conducted the second usability study in 2016, which focused primarily on student satisfaction with the primo discovery tool. thirty undergraduates participated in this study that collected both qualitative and quantitative data. in contrast to most usability studies of discovery services, this study allowed participants to explore primo with open-ended searches to more closely mimic natural searching strategies. results of the study indicated that the participants preferred basic search to advanced search, used facets (but not enough to maximize their searching potential), rarely moved beyond the first page of search results, and experienced difficulties using the link resolver. in response to the latter, a primo working group clarified language on the link resolver page to better differentiate between links to articles and links to journals.13 brett, lierman, and turner conducted a usability study at the university of houston libraries focusing primarily on undergraduate students. users were able to complete the assigned tasks, but the majority did not do so in the most efficient manner. that is, the participants did not take full advantage of primo functionality, such as facets, holds, and recalls. additionally, some participants exhibited difficulty deciphering among the terms journals, journal articles, and newspaper articles. 
another difficulty participants experienced was knowing what further steps to take once they had successfully found an item in the results list. for example, participants had trouble locating stacks guides, finding request features, and using call numbers. the researchers concluded that many of the issues witnessed in this usability study could be mitigated via library instruction.14 usability testing of primo new ui has recently begun to take a foothold in academic libraries. in addition to conducting usability testing on the primo classic in april 2015 (5 participants, 5–6 tasks), researchers at boston university carried out both preand post-launch testing of the new interface in december 2016 and april 2017, respectively. pre-launch testing with five student participants identified issues with “labelling, locating links to online services, availability statement links in full results, [and] my favorites.”15 after completing fixes, post-launch testing with four students (2 infrequent users, 2 frequent) found that they were able to easily complete tasks, use filters, save results, and find links to online resources. usage statistics for the new interface, compared to classic, also showed an increased use of facets after fixes, and an increase in the use of some features but decrease in the use of others, providing information on what features warranted further examination.16 primo new user interface | galbreath, johnson, and hvizdak 15 https://doi.org/10.6017/ital.v37i2.10191 california state university (csu) libraries conducted usability studies on primo new ui with 24 participants (undergraduate students, graduate students, and faculty) across five csu campuses. five standard tasks were required: find a specific book, find a specific film, find a peer-reviewed journal article, find an item in the csu network not owned locally, and find a newspaper article. each campus added additional questions based on local needs. participants were overwhelmingly positive about the interface look and feel, ease of use, and speed of the system. the success rate for each task varied across the campuses, with participants having greater success on simple tasks such as finding a specific or known item and mixed results on more difficult tasks including using scopes, understanding icons and elements of the frbr record, and facets. steps were taken to relabel and rearrange the scopes and facets so that they were more meaningful to users, and frbr icons were replaced. the authors concluded that primo is an ideal solution to incorporate both global changes and local preference because of its customizability.17 university of washington libraries conducted usability studies on the classic and new primo interfaces. the primo new ui study observed 12 participants. each 60-minute session included an orientation, pre and post-tests, tasks, and follow-up questions. difficulties were noted with terminology, the site logo, the inability to select multiple facets, unclear navigation, volume requesting, advanced search logic, the pin location in item details, and the date facet. a/b testing with 12 participants (from both the new and c lassic ui studies) revealed the need to fix the sign-in prompt for my favorites, enable libraries to add custom actions to the actions menu, add a sort option for favorites in the new interface, add the ability to rearrange elements on a single item page, and add zotero support. overall, participants preferred the new interface. 
generally, participants easily completed basic tasks, such as known-item searches, searches for course reserves, and open searches, but had more difficulty with article subject searching, audio/visual subject searching, and print-volume searching, which was consistent from the classic to the new interfaces for student participants.18 method we conducted a diagnostic usability evaluation of primo new ui using eight participants, whom we recruited from the wsu faculty, staff, and student populations. in the end, we received a skewed distribution among the categories: three members of staff and five students (two undergraduate students and three graduate students). the initial composition of the participants comprised a greater number of undergraduate students, but substitution created the final makeup. all the study participants had some exposure to primo classic in the past. we recruited participants by hanging flyers around the libraries of our pullman campus and the adjoining student commons area. we offered the participants $15 in exchange for their time, which we advertised as being a maximum of one hour. the usability test was designed by a team of three library staff, one from systems (it) and two from research services (reference/instruction). two of us were present at each session, one to read the tasks aloud and the other to document the session. we used camtasia to record each session so that we would have the ability to return to it later if we needed to verify our notes or other specifics of the session. we stored the recordings on a secured share of the internal library drive. we received an institutional review board certificate of exemption (irb #16190) to conduct this study. information technology and libraries | june 2018 16 this usability test comprised eleven tasks (see appendix a) to test the research questions described above. the tasks were drafted in consultation with the ex libris set of recommendations for conducting primo usability testing.19 each investigator drew their conclusions as to the participants’ successes and failures. we then met as a group to form a consensus regarding task success and failure (see appendix b). we met to discuss the patterns that emerged and to formulate remedies to problems we perceived as hindering student success. results for each of the ten research questions below, consult appendix b to see details regarding the associated tasks and how each participant approached and completed each task. task set(s) related to research question 1: will the participant be able to find and use the basic search functionality? this was one of the easier tasks for the participants to complete. some participants did not follow the task literally to find their favorite book or movie, but rather completed a search for an item or topic of interest to them. all the participants completed this task successfully. task set(s) related to research question 2: will the participant be able to understand the availability information of the brief results? the majority of the participants understood that the availability text and its color represented important access information. however, there were instances where the color of the availability status was in conflict with its text. this led at least one participant to evaluate the availability of a resource incorrectly. task set(s) related to research question 3: will the participant be able to find and use the sign in and sign out features? the participants all successfully completed this task. 
participants used multiple methods to sign in: the guest link in the top navigation bar, the sign in link from the ellipsis main menu item, and the get it sign in link on the full display page. all participants signed out via the user link in the top navigation bar. task set(s) related to research question 4: will the participant be able to understand the behavior of the facets? almost all of the participants were able to select the articles facet without issue. one person, however, misunderstood the include behavior of the facets. instead of using the include behavior, this participant used the exclude behavior to remove all facets other than the articles facet. only two participants attempted to use the print books facet to complete the task, “from the list of results, find a print book that you would need to order from another library.” instead, the other 75 percent simply scanned the list of results to find the same information. five out of the eight participants attempted to find the peer-reviewed facet when completing the task to choose any peer-reviewed article from a results list: three were successful, while one selected the newspaper articles facet, and another selected the reviews facet. task set(s) related to research question 5: will the participant be able to find and use the actions menu? the tasks related to the actions menu (copy a citation and email a record) were some of the most difficult for the participants: two were successful, three had some difficulty, and three were unsuccessful. of those primo new user interface | galbreath, johnson, and hvizdak 17 https://doi.org/10.6017/ital.v37i2.10191 who experienced difficulty, one seemed not to understand the task fully; this participant found and copied the citation, but then spent additional time looking for a “clipboard.” the other two participants were both distracted by competing areas of interest: the citations section of the full display and the section headings of the full display. of those who were unsuccessful, one suffered from a technical issue that ex libris needs to resolve (the functionality to expand the list of action items failed), one did not seem to understand what a citation was when they found it, and another could not find the email functionality. this last subject continued searching in the ellipsis area of the main menu, in the my account area, and the facets, but ultimately never found the email icon in the scrolling section of the actions menu. task set(s) related to research question 6: will the participant be able to navigate the get it and view it areas of the full display page? three participants experienced substantial difficulty in completing this set of tasks. these participants were distracted by the styled show libraries and stack chart buttons on the full display page that were competing for attention with the requesting options. task set(s) related to research question 7: will the participant be able to navigate the my account area? all of the participants completed this task successfully. four participants located the back-arrow icon to exit the my account area, while the other four participants used alternate methods: using the library logo, selecting the new search button, and signing out of primo. task set(s) related to research question 8: will the participant be able to find and use the help/chat and my favorites icons? participants encountered very little difficulty in finding a way to procure help and chat with a librarian, with one exception. 
participant 2 immediately navigated to and opened our help/chat icon, but then moved away from this service because it opened in a new tab. this same participant, along with three others, had a more difficult time finding and deciding to use the pin this item icon than did the three participants who completed the same task with ease. the remaining participant failed to complete this task because they could not find the my favorites area of primo. task set(s) related to research question 9: will the participant be able to find and use the advanced search functionality? one participant had more trouble finding the advanced search functionality than the other seven. another experienced a technical difficulty, in which the primo screen froze during the experiment, and we had to begin the task anew. the remaining six people easily finished the tasks. task set(s) related to research question 10: will the participant be able to find and use the main menu items? the majority of the participants completed this task with ease, navigating to the databases link in the main menu items. one participant, however, was confused by the term database but was able to succeed once we provided a brief definition of the term. the remaining two participants were further confused by the term and instead entered general search terms into the primo search bar. these two participants failed to find the list of databases. discussion information technology and libraries | june 2018 18 study participants completed four of our task sets with relative ease: using basic search (see research question 1 above), signing into and out of primo (see research question 3 above), navigating their my account area (see research question 7 above), and using advanced search (see research question 9 above). there was one exception: one participant experienced minor trouble finding the advanced search link, checking first among the drop-down options on our basic search page. subsequent and unrelated to this study, wsu elected to eliminate the first set of drop-down options from our primo landing page. further testing might tell us if this elimination in the number of drop-down options has effectively made the advanced search link more prominent for users. also, the ease with which participants were able to use items located underneath the “guest” label contradicted our expectations. we predicted that this opacity would cause users issues, but it did not seem to deter them. from this, we concluded that the placement of the sign in options in the upper right corner is sufficient to maintain continuity. participants encountered a moderate degree of difficulty completing two task sets: determining availability statuses and navigating the get it area of the full display page. concerning availability, participants were quick to understand that statuses such as “check holdings” relayed that the item was not available. the participants were also keen to notice that green availability statuses implied access while non -green availability statuses implied non-access. however, per the design of the new interface, certain non-green links became green after opening the full display page of primo. this was a significant deviation from the classic interface, where colors indicating availability status did not change. this design element misled one participant. of note, we did not observe participants experiencing issues with the converted format of the get it and view it areas (see figures 2 and 3) per se. 
however, we did notice that three of our participants were unnecessarily distracted by the show libraries link when trying to find resource sharing options because wsu had previously styled the show libraries links with color and shape. therefore, our local branding in this area impeded usability and led us to rethink the hierarchy of actions on the full display page. similar to comments made by demars, study participants also remarked that the layout of the full display was cluttered and difficult to read.20 we therefore took steps to make this page more readable for the viewer. study participants displayed the greatest difficulty completing the remaining four task sets: selecting a main menu item, refining a search via the facets, using the actions menu, and navigating the my favorites functionality. however, web design was not necessarily the culprit in all four areas. three participants experienced difficulty finding the databases link (a main menu item). after further discussion, it became apparent that this trouble related not to usability but to information literacy—they did not understand the term databases. therefore, like majors and comeaux,21 we recognize the recurring issue of library jargon, and like brett, lierman, and turner,22 we believe that this issue would best be mitigated via library instruction. in agreement with the literature, two participants selected the incorrect facet because they had difficulty distinguishing among the terms articles, newspaper articles, reviews and peer-reviewed.23 further, one of these participants experienced even more difficulty because of not understanding the inherent functionality of the facet values. that is, this participant did not grasp that the facet value links performed an inclusion process by default. to the contrary, this person believed that they would have had to exclude all unwanted facet values to arrive at the wanted facet value. the change in facet behavior between classic and new interfaces likely caused this confusion. in primo classic, wsu had installed a local customization that provided checkboxes and underlining upon hover for each facet value. the new interface did not primo new user interface | galbreath, johnson, and hvizdak 19 https://doi.org/10.6017/ital.v37i2.10191 provide either one of these clues to the user. additionally, we observed, similar to kliewer et al. and brett, lierman, and turner, that participants oftentimes preferred to scan the results list over refining their search via faceting.24 this finding also matches a 2014 ex libris user study indicating that users are easily confused by too many interface options and thus tend to ignore them.25 regarding the actions menu, the majority of the participants attempted to find the email icon in the correct section of the full display page (i.e., the “send to” section). however, because of a technical issue in the design of the new interface, the email icon was not always present for the participant to find. for others, it was difficult to reach the icon even when it was present as participants had to click the right arrow three to four times to navigate past all the citation manager icons. this observed difficulty in finding existing functionalities in primo echoes that cited by majors and nichols et al.26 participants also experienced significant difficulty deciphering between the similarly named functionalities of the citation icon and the citations section of the full display page. 
as a result of this observed difficulty, we concluded that differentiating sections of the page with distinct naming conventions would be beneficial to users. like the results reported by boston university, our study participants encountered significant issues when trying to save items into their my favorites list.27 we noticed that participants had difficulty making connections between the icons named keep this item/remove this item and the my favorites area. during testing, it was clear that many of the participants were drawn to the pin icon for the correctly anticipated functionality but then were confused that the tooltips did not include any language resembling "my favorites." from this last observation, we surmised that providing continuity in language between these icons and the my favorites area would increase usability for our library patrons. pepitone reported problems with the placement of the my favorites pin icon,28 but we observed that this was less of a problem than the actual terminology used to name the pin icon. beyond success and failure, a 2014 ex libris user study suggested that academic level and discipline play a key role in user behavior.29 however, we were unable to draw meaningful conclusions among user groups because of our small and homogeneous participant pool.

decisions made in response to usability results

declined to change

facets. although one participant did not understand the inclusion mechanism of the facet values, we declined to investigate a customization in this area. according to the primo august 2017 release notes, ex libris plans to make considerable changes to the faceting functionality.30 therefore, we decided to wait until after this release to reassess whether customization was warranted.

implemented a change

labels

citations. we observed confusion between the citation icon of the actions menu and the section of the full display page labeled "citations." to differentiate between the two items, we changed the actions menu icon text to "cite this item" (see figure 4) and the heading for the citations section to "references cited" (see figure 5).

figure 4. cite this item icon of the actions menu.
figure 5. references cited section of the full display page.

my favorites. there was a mismatch among the tooltip texts of the my favorites icons. we changed the tooltip language for the "keep this item" pin to read "add to my favorites" (see figure 6) and the tooltip language for the "unpin this item" pin to read "remove from my favorites" (see figure 7).

figure 6. add to my favorites language for my favorites tooltip.
figure 7. remove from my favorites language for my favorites tooltip.

availability statuses. per the design of the new interface, certain non-green links became green after opening the full display page of primo new ui. we implemented css code to retain the non-green coloring of the availability statuses after opening the full display; in this case, "check holdings" remains orange (see figure 8).

figure 8. availability status color of brief display, before and after opening the full display.

link removal

full display page headings. there was confusion as to the function of the headings on the full display page. these are anchor tags, but patrons clicked on them as if they were functional links. no patrons used the headings successfully.
therefore, we hid the headings section via css (see figure 9).

figure 9. removal of headings on full display page.

links to other institutions. we observed participants attempting to use the links to other institutions to place resource sharing requests. therefore, we removed the hyperlinking functionality of the links in the list via css (see figure 10).

figure 10. neutralization of links to other institutions.

prioritized the emphasis of certain functionalities

request options and show libraries buttons. it is usually more important to be able to place a request than to find the names of other institutions that own an item. however, the show libraries button was originally styled with crimson coloring, which drew unwarranted attention, while the requesting links were not. therefore, we added styling to the resource-sharing links and removed styling from the show libraries button via css (see figure 11).

figure 11. resource sharing link styled in crimson; show libraries button with styling removed.

e-mail icon. we observed that the e-mail icon of the actions menu was difficult to find. therefore, we decreased the number of icons and moved the emailing functionality to the left side of the actions menu (see figure 12).

figure 12. email icon prioritized over citation manager icons.

contrast and separation

full display page sections. participants noted that the information on the full display page tended to run together. to remedy this, we created higher contrast between the foreground and background of the page sections via css. we also styled the section titles and dividers with color, among other edits (see figure 13).

figure 13. separated sections of full display page (see figure 3 to compare to the new ui default full display page design).

conclusion

while providing one of the first studies on primo new ui, we acknowledge several limitations. previous studies on primo had larger study populations compared to this one (which had eight participants). however, we adhered to nielsen's findings that usability studies uncover most design deficiencies with five or more participants.31 additionally, the scope of this study was limited to the usability of the desktop view. we recommend further studies that concentrate on accessibility compliance and that test the interface on mobile devices. regarding the study design, the question arose as to whether the participants' difficulties reflected poor design functionality or a misunderstanding of library terminology (as noted by majors and comeaux).32 the researchers did not carry out pre-tests or an assessment of participants' level of existing knowledge. this limitation is almost always unavoidable, however, as a task list will always risk not fitting the skills or knowledge of every participant. the lack of use of some features also might have been a consequence of the study design: while not using the facets may reflect that participants are unaware of them, it could also stem from the fact that they never had to scroll past the first few items to find the needed resource. users might have felt a greater need to use the facets had we asked more difficult discovery tasks. the study also contained an investigative bias in that the researchers were part of the working group that developed the customized interface, and then tested those customizations.
this bias could have been reduced if the study had used researchers who were not a part of the same group that made these customizations. despite these limitations, there are still key findings of note. tasks that participants completed with the greatest ease mapped to those that we assume they do most often, which included basic searching for materials and accessing account information. tasks beyond these basics proved to be more difficult. this raises the question of whether difficulties were really a function of the interface design or if they reflected ongoing literacy issues. therefore, it is crucial that designers work with public services and instruction librarians to identify areas where users might be well-served by making certain functionalities more userfriendly and creating educational and training opportunities to increase awareness of these functionalities.33 bringing diverse perspectives into the study is also crucial so that researchers can discover and be more conscious of commonalities in design and literacy needs, particularly regarding advanced tasks. information technology and libraries | june 2018 24 appendix a: usability tasks note: search it is the local branding for primo at washington state university. 1) please search for your favorite book or movie. a) is this item available for you to read or watch? b) how do you know that this item is or isn’t available for you to read or watch? 2) please sign in to search it. 3) please perform a search for “causes of world war ii” (do not include quotation marks). a) limit your search results to articles. b) for any of the records in your search results list: i) find the citation for any item and copy it to the clipboard. ii) email this record to yourself. 4) please perform a search for “actor’s choice monologues” (do not include quotation marks). a) from the list of results, find a print book that you would need to order from another library. 5) please perform a search for a print book with isbn 0582493498. a) this book is checked out. how would you get a copy of it? b) pretend that this book is not checked out. please show us the information from this record that you would use to find this item on the shelves. 6) please navigate to your library account (from within search it). a) pretend that you have forgotten how many items you have checked out. please show us how you would find out how many items you currently have checked out. b) exit your library account area. 7) please navigate to advanced search. a) perform any search on this page. 8) please show us where you would go to find help and/or chat with a librarian? 9) please perform a search using the keywords “gender and media.” a) add any source to your my favorites list. then open my favorites and click on the title of the source you just added. b) return to your list of results. choose any peer-reviewed article that has the full text available. click on the link that will access the full text. 10) please find a database that might be of interest to you (e.g., jstor). 11) please sign out of search it and close your browser. primo new user interface | galbreath, johnson, and hvizdak 25 https://doi.org/10.6017/ital.v37i2.10191 appendix b: usability results note: search it is the local branding for primo at washington state university. research question 1: will the participant be able to find and use the basic search functionality? associated task(s): 1. please search for your favorite book or movie. participant successful? 
commentary 1 yes searches for “the truman show” from the beginning. 2 yes searches for “pet sematary” from the beginning. 3 yes searches for “additive manufacturing” from the beginning. 4 yes signs in first, navigates to new search, searches for “pzt sensor design.” 5 yes searches for “the notebook” from the beginning. 6 yes searches for “das leben der anderen” from the beginning. 7 yes searches for “legally blonde” from the beginning. 8 yes searches for “jurassic park” from the beginning. research question 2: will the participant be able to understand the availability information of the brief results? associated task(s): 1b. how do you know that this item is or isn’t available for you to read or watch? 4a. from the list of results, find a print book that you would need to order from another library. participant successful? commentary 1 yes differentiates between green and orange text; uses the “check holdings” availability status. clicks on “availability and request option” heading and then clicks on the resource sharing link. 2 yes, with difficulty. says that green “check holdings” status indicates ability to read the book. selects book with “check holdings” status and locates resource sharing link. information technology and libraries | june 2018 26 participant successful? commentary 3 yes, with difficulty unclear. initially, goes to a record with online access; redoes search, eventually locates resource sharing link. 4 yes says the record for the item reads “in place” and the availability indicator = 1. the record for the item reads “check holdings.” 5 yes says that status is indicated by statement “available at holland/terrell libraries.” the record for the item reads “check holdings.” 6 yes says that status is indicated by statement “available at holland/terrell libraries” and “item in place.” clicks on “check holdings”; says that orange color denotes fact that we don’t have it. 7 yes hovers over “check holdings” status, and then notes that “availability” statement reads “did not match any physical resources.” the record for the item reads “check holdings.” 8 yes says that status is indicated by statement “available at holland/terrell libraries.” says the record for the item reads “check holdings.” research question 3: will the participant be able to find and use the sign in and sign out features? associated task(s): 2. please sign into search it. 11. please sign out of search it and close your browser. participant successful? commentary 1 yes navigates to “guest” link, signs in. 2 yes navigates to ellipsis, signs in. navigates to “user” link, signs out. 3 yes navigates to “guest” link, signs in. navigates to “user” link, signs out. 4 yes n/a—already signed in. navigates to “user” link, signs out. primo new user interface | galbreath, johnson, and hvizdak 27 https://doi.org/10.6017/ital.v37i2.10191 participant successful? commentary 5 yes navigates to “guest” link, signs in. navigates to “user” link, signs out. 6 yes navigates to “guest” link, signs in. navigates to “user” link, signs out. 7 yes uses sign in link from full display page. navigates to “user” link, signs out. 8 yes navigates to “guest” link, signs in. navigates to “user” link, signs out. research question 4: will the participant be able to understand the behavior of the facets? associated task(s): 3a. limit your search results to articles. 4a. from the list of results, find a print book that you would need to order from another library. 9b. return to your list of results. 
choose any peer-reviewed article that has the full text available. participant successful? commentary 1 yes selects articles facet. n/a—does not use facets (however, participant investigates the library and type facets, returns to results lists). 2 yes selects articles facet. n/a—does not use facets. 3 no uses “exclude” property to remove everything but articles. uses “exclude” property to remove everything but print books. looks in facet type for articles; selects newspaper articles instead. 4 yes, with difficulty selects articles facet. selects print books facet. selects articles under type facet, clicks on “full-text available” status, selects peer-reviewed articles facet. 5 no selects articles facet. n/a—does not use facets. screen freezes (technical issue) and participant is forced to redo search. n/a— does not use facets. when further prompted to find only peerreviewed articles, participant searches pre-filter area and then selects reviews facet. information technology and libraries | june 2018 28 participant successful? commentary 6 yes selects articles facet. clicks on “check holdings.” participant hovers over “online access” text and then selects peer-reviewed facet. 7 yes looks in drop-down scope, then moves to articles facet. n/a— does not use facets. n/a—does not use facets. 8 yes hovers over peer-reviewed articles facet, and then selects articles facet. n/a—does not use facets. selects peer-reviewed facet. research question 5: will the participant be able to find and use the actions menu? associated task(s): 3.b.i. for any of the records in your search results list, find the citation for any item and copy it to the clipboard. 3.b.ii. for any of the records in your search results list, email this record to yourself. participant successful? commentary 1 yes briefly looks at citation icon, scrolls to bottom of page and looks at citations area, returns to citation icon. scrolls to bottom of page, returns to actions area, scrolls with arrow to find email icon, emails to self. 2 no initially clicks on citation manager icon (easybib), then clicks on citation icon and copies to clipboard. could not find email icon (technical issue with search it). although further discussion reveals that participant expects to see email function within “send to” heading. 3 no opens full display page of item, scrolls to bottom of page. clicks on the citation icon but doesn’t see what looking for. finds email icon and emails to self. 4 no opens full display page of item, clicks on the citation icon, double-clicks to highlight citation. could not find email icon. searches in ellipsis. attempts the keep this item pin. navigates to my account. searches in facets. 5 yes, with difficulty finds citation icon, but then leaves the area via citations heading and winds up at web of science homepage. hovers over “cited in this” language. finds the copy functionality. primo new user interface | galbreath, johnson, and hvizdak 29 https://doi.org/10.6017/ital.v37i2.10191 participant successful? commentary attempts sent to heading twice, looks through actions icons, scrolls to right, finds email icon. 6 yes finds citation icon, copies to clipboard. scrolls down page, returns to actions menu, scrolls to email icon, emails record to self. 7 yes, with difficulty copies citation from the brief result, and then spends some time trying to find “the clipboard.” navigates to the email icon. 
8 yes, with difficulty scrolls to bottom of full display page, clicks on citing this link, clicks on title to record, and then copies first 3 lines of record. scrolls until finds email icon, but then moves to sent to heading, and then back to email icon, and sends. research question 6: will the participant be able to navigate the get it and view it areas of the full display page? associated task(s): 5.a. this book is checked out. how would you get a copy of it? 5.b. please show us the information from this record that you would use to find this item on the shelves. 9.b. click on the link that will access the full text. participant successful? commentary 1 yes clicks on “check holdings” availability status, clicks on availability and request options heading, clicks on request summit item link. refers to call number in alma iframe. clicks “full-text available” status, clicks database name. 2 yes opens record, locates resource sharing link. refers to call number; opens stack chart to find call number. clicks on title, clicks database name. 3 yes locates request option. locates call number in record. clicks “full-text available” status, clicks database name. 4 yes, with difficulty. clicks on show libraries button, then finds request option after searching page. locates call number in record. clicks “full-text available” status but does not click on database name. 5 yes, with difficulty. moves to stack chart button, then to show libraries button, and then to availability and request options heading, clicks on stack chart, clicks on show libraries, moves into first library listed and information technology and libraries | june 2018 30 participant successful? commentary back out, and finally to ill link. finds call number on full display page. 6 yes finds request summit option. identifies call number and stack chart as means to find book. clicks on database name. 7 yes, with difficulty. looks at status statement, scrolls to bottom of page, then show libraries button, then request summit option. identifies call number and stack chart as means to find book. attempts to use “full-text available” link, then clicks on database name. 8 yes finds summit request option. identifies call number and stack chart as means to find book. attempts to use “full-text available” link, then clicks on database name. research question 7: will the participant be able to navigate their my account area? associated task(s): 6. please navigate to your library account (from within search it). 6a. pretend that you have forgotten how many items you have checked out. please show us how you would find out how many items you currently have checked out. 6b. exit your library account area. participant successful? commentary 1 yes navigates to my account from “user” link. navigates to loans tab. uses back arrow icon. 2 yes navigates to my account from “user” link. navigates to loans tab. uses back arrow icon. 3 yes navigates to my account from main menu ellipsis. navigates to loans. uses back arrow icon. 4 yes navigates to my account from main menu ellipsis. navigates to loans. uses to back arrow icon. 5 yes navigates to my account from “user” link. navigates to loans. signs out of search it. 6 yes navigates to my account from “user” link. navigates to loans. uses search it logo to exit. primo new user interface | galbreath, johnson, and hvizdak 31 https://doi.org/10.6017/ital.v37i2.10191 participant successful? commentary 7 yes navigates to my account from “user” link. navigates to loans. uses new search button to exit. 
8 yes navigates to my account from “user” link. navigates to loans. uses search it logo to exit. research question 8: will the participant be able to find and use the help/chat and my favorites icons? associated task(s): 8. please show us where would you go to find help and/or chat with a librarian? 9.a. add any source to your my favorites list. then, open my favorites and click on the title of the source you just added. participant successful? commentary 1 yes, with difficulty navigates to help/chat icon. navigates to keep this item pin, hesitates, navigates to ellipsis, returns to and clicks on pin. moves to my favorites via animation. clicks on title. 2 yes, with difficulty initially navigates to help/chat icon, but thinks it is the wrong button because chat is not directly available within search it. navigates to keep this item pin, hesitates, looks around, selects pin. moves to my favorites via animation. clicks on title. 3 yes, with difficulty navigates to help/chat icon. navigates to ellipsis, actions menu, and tags section. finds keep this item pin. 4 no navigates to help/chat icon. navigates to ellipsis, keep this item pin, my account, and facets quits search. 5 yes, with difficulty navigates to help/chat icon. adds keep this item pin after investigating 12 other icons. moves to my favorites via animation. clicks on title. 6 yes navigates to help/chat icon. adds keep this item pin and moves to my favorites via animation. clicks on title. 7 yes navigates to help/chat icon. checks actions menu, adds keep this item pin and moves to my favorites via animation clicks on title. 8 yes navigates to help/chat icon. adds keep this item pin and moves to my favorites via animation. clicks on title. information technology and libraries | june 2018 32 research question 9: will the participant be able to find and use the advanced search functionality? associated task(s): 7. please navigate to advanced search. 7a. perform any search on this page. participant successful? commentary 1 yes navigates to advanced search. performs search. 2 yes navigates to advanced search. performs search. 3 yes, with difficulty navigates to basic search drop-down, then to new search, then to advanced search. has trouble inserting cursor into search box. 4 yes, with difficulty navigates to advanced search. builds complex search, then search it freezes and we have to restart the search tool. 5 yes navigates to advanced search. performs search. 6 yes navigates to advanced search. performs search. 7 yes navigates to advanced search. performs search. 8 yes navigates to advanced search. performs search. research question #10: will the participant be able to find and use the main menu items? associated task(s): 10. please find a database that might be of interest to you (e.g., jstor). participant successful? commentary 1 yes navigates to “databases” link of main menu. 2 yes navigates to “databases” link of main menu. 3 no types query “stretchable electronics” into search box, but unsure how to find a database in the results lists. 4 no types query “reinforced concrete” into search box, but unsure how to find a database in the results lists. primo new user interface | galbreath, johnson, and hvizdak 33 https://doi.org/10.6017/ital.v37i2.10191 participant successful? commentary 5 yes, with difficulty is confused by term database. enters “ieee” in search box. 6 yes navigates to “databases” link of main menu. 7 yes searches within drop-down scopes, then facets, then moves to “databases” link of main menu. 
8 yes navigates to "databases" link of main menu.

1 "frequently asked questions," ex libris knowledge center, accessed august 28, 2017, https://knowledge.exlibrisgroup.com/primo/product_documentation/050new_primo_user_interface/010frequently_asked_questions.
2 rice majors, "comparative user experiences of next-generation catalogue interfaces," library trends 61, no. 1 (2012): 186–207, https://doi.org/10.1353/lib.2012.0029.
3 david comeaux, "usability testing of a web-scale discovery system at an academic library," college & undergraduate libraries 19, no. 2–4 (2012): 199, https://doi.org/10.1080/10691316.2012.695671.
4 comeaux, "usability testing," 202.
5 comeaux, "usability testing," 196–97.
6 kylie jarrett, "findit@flinders: user experiences of the primo discovery search solution," australian academic & research libraries 43, no. 4 (2012): 280, https://doi.org/10.1080/00048623.2012.10722288.
7 jarrett, "findit@flinders," 287.
8 aaron nichols et al., "kicking the tires: a usability study of the primo discovery tool," journal of web librarianship 8, no. 2 (2014): 174, https://doi.org/10.1080/19322909.2014.903133.
9 nichols, "kicking the tires," 181.
10 nichols, "kicking the tires," 184.
11 nichols, "kicking the tires," 184–85.
12 scott hanrath and miloche kottman, "use and usability of a discovery tool in an academic library," journal of web librarianship 9, no. 1 (2015): 9, https://doi.org/10.1080/19322909.2014.983259.
13 greta kliewer et al., "using primo for undergraduate research: a usability study," library hi tech 34, no. 4 (2016): 576, https://doi.org/10.1108/lht-05-2016-0052.
14 kelsey brett, ashley lierman, and cherie turner, "lessons learned: a primo usability study," information technology & libraries 35, no. 1 (2016): 21, https://doi.org/10.6017/ital.v35i1.8965.
15 cece cai, april crockett, and michael ward, "our experience with primo new ui," ex libris users of north america conference 2017, accessed november 4, 2017, http://documents.el-una.org/1467/1/caicrockettward_051017_445pm.pdf.
16 cai, crockett, and ward, "our experience with primo new ui."
17 j. michael demars, "discovering our users: a multi-campus usability study of primo" (paper presented at the international federation of library associations and institutions world library and information conference 2017, warsaw, poland, august 14, 2017), 11, http://library.ifla.org/1810/1/s10-2017-demars-en.pdf.
18 anne m. pepitone, "a tale of two uis: usability studies of two primo user interfaces" (slideshow presentation, primo day 2017: migrating to the new ui, june 12, 2017), https://www.orbiscascade.org/primo-day-2017-schedule/.
19 "primo usability guidelines and test script," ex libris knowledge center, accessed october 28, 2017, https://knowledge.exlibrisgroup.com/primo/product_documentation/new_primo_user_interface/primo_usability_guidelines_and_test_script.
20 demars, "discovering our users," 9.
21 majors, "comparative user experiences," 190; comeaux, "usability testing," 198–204.
22 brett, lierman, and turner, “lessons learned,” 21. 23 jarrett, “findit@flinders,” 287; nichols, “kicking the tires,” 184; brett, lierman, and turner, “lessons learned,” 20–21. 24 kliewer et al., “using primo for undergraduate research,” 571–72; brett, lierman, and turner, “lessons learned,” 17. 25 miri botzer, “delivering the experience that users expect: core principles for designing library discovery services,” white paper, nov 25 2015, 10, http://docplayer.net/10248265-delivering-theexperience-that-users-expect-core-principles-for-designing-library-discovery-services-miri-botzerprimo-product-manager-ex-libris.html. 26 majors, “comparative user experiences,” 194; nichols et al., “kicking the tires,” 184–85. 27 cai, crockett, and ward, “our experience with primo new ui,” 28–29. 28 pepitone, “a tale of two uis,” 29. 29 botzer, “delivering the experience,” 4–5; christine stohn, “how do users search and discover? findings from ex libris user research,” library technology guides, may 5 2015, 7–8, https://librarytechnology.org/document/20650. https://doi.org/10.6017/ital.v35i1.8965 http://documents.el-una.org/1467/1/caicrockettward_051017_445pm.pdf http://documents.el-una.org/1467/1/caicrockettward_051017_445pm.pdf http://library.ifla.org/1810/1/s10-2017-demars-en.pdf http://library.ifla.org/1810/1/s10-2017-demars-en.pdf https://www.orbiscascade.org/primo-day-2017-schedule/ https://knowledge.exlibrisgroup.com/primo/product_documentation/%20new_primo_user_interface/primo_usability_guidelines_and_test_script https://knowledge.exlibrisgroup.com/primo/product_documentation/%20new_primo_user_interface/primo_usability_guidelines_and_test_script http://docplayer.net/10248265-delivering-the-experience-that-users-expect-core-principles-for-designing-library-discovery-services-miri-botzer-primo-product-manager-ex-libris.html http://docplayer.net/10248265-delivering-the-experience-that-users-expect-core-principles-for-designing-library-discovery-services-miri-botzer-primo-product-manager-ex-libris.html http://docplayer.net/10248265-delivering-the-experience-that-users-expect-core-principles-for-designing-library-discovery-services-miri-botzer-primo-product-manager-ex-libris.html https://librarytechnology.org/document/20650 primo new user interface | galbreath, johnson, and hvizdak 35 https://doi.org/10.6017/ital.v37i2.10191 30 “primo august 2017 highlights,” ex libris knowledge center, accessed november 2, 2017, https://knowledge.exlibrisgroup.com/primo/product_documentation/highlights/ 027primo_august_2017_highlights. 31 jakob nielsen, “how many test users in a usability study?,” nielsen norman group, jun 4, 2012, https://www.nngroup.com/articles/how-many-test-users/. 32 majors, “comparative user experiences,” 190; comeaux, “usability testing,” 200–204. 33 brett, lierman, and turner, “lessons learned,” 21. https://knowledge.exlibrisgroup.com/primo/product_documentation/highlights/%20027primo_august_2017_highlights https://knowledge.exlibrisgroup.com/primo/product_documentation/highlights/%20027primo_august_2017_highlights https://www.nngroup.com/articles/author/jakob-nielsen/ https://www.nngroup.com/articles/how-many-test-users/ abstract introduction research questions literature review method results task set(s) related to research question 1: will the participant be able to find and use the basic search functionality? task set(s) related to research question 2: will the participant be able to understand the availability information of the brief results? 
communications
using the harvesting method to submit etds into proquest: a case study of a lesser-known approach
marielle veve
information technology and libraries | september 2020
https://doi.org/10.6017/ital.v39i3.12197
marielle veve (m.veve@unf.edu) is metadata librarian, university of north florida. © 2020.

abstract
the following case study describes an academic library's recent experience implementing the harvesting method to submit electronic theses and dissertations (etds) into the proquest dissertations & theses global database (pqdt). in this lesser-known approach, etds are deposited first in the institutional repository (ir), where they get processed, to be later harvested for free by proquest through the ir's open archives initiative (oai) feed. the method provides a series of advantages over some of the alternative methods, including students' choice to opt in or out of proquest, better control over the embargo restrictions, and more customization power without having to rely on overly complicated workflows. institutions interested in adopting a simple, automated, post-ir method to submit etds into proquest, while keeping the local workflow, should benefit from this method.

introduction
the university of north florida (unf) is a midsize public institution established in 1972, with the first theses and dissertations (tds) submitted in 1974. since then, copies have been deposited in the library, where bibliographic records are created and entered in the library catalog and the online computer library center (oclc). during the period of 1999 to 2012, some tds were also deposited in proquest by the graduate school on behalf of students who chose to do so.
this practice, however, was discontinued in the summer of 2012, when the institutional repository, digital commons, was established and submission to it became mandatory. five years later, in the summer of 2017, interest in getting unf tds hosted in proquest resurfaced. this renewed interest grew out of a desire of some faculty and graduate students to see the institution's electronic theses and dissertations (etds) posted there, in addition to a recent library subscription to the proquest dissertations & theses global database (pqdt). a month later, conversations between the library and graduate school began on the possibility of resuming hosting unf etds in proquest. consensus was reached that the pqdt database would be a good exposure point for our etds, in addition to the institutional repository (ir), yet some concerns were raised. one of the concerns was the cost of the service and who would be paying for it. neither the library nor the graduate school had allocated funds for this. the next concern was the possibility of proquest imposing restrictions that could prevent students, or the university, from posting etds in other places. it was important to make sure there were no such restrictions. another concern was expressed over students entering embargo dates in proquest that do not match the embargo dates selected for the ir. this is a common problem encountered by other libraries.1 for that reason, we wanted to keep the local workflow. the last concern expressed during the conversations was preserving students' right to opt in or out of distributing their theses in proquest. this is something both the graduate school and library have been adamant about. in higher education, requiring students to submit to proquest is a controversial issue which has raised ethical concerns and has been highly debated over the years.2 once conversations between the library and graduate school were held and concerns were gathered, the library moved ahead to investigate the available options to submit etds into proquest.

literature review
currently, there are three options to submit etds into proquest: (1) submission through the proquest etd administrator tool, (2) submission via file transfer protocol (ftp), and (3) submission through harvests performed by proquest.3

proquest etd administrator submission option
in this option, a proprietary submission tool called proquest etd administrator is used by students, or assigned administrators, to upload etds into proquest. inside the tool, a fixed metadata form is completed with information on the degree, subject terms are selected from a proprietary list, and keywords are provided. the whole administrative and review process gets done inside the tool. afterwards, zip packages with the etds and proquest's extensible markup language (xml) files are sent to the institution via ftp transfers, or through direct deposits to the ir using the simple web-service offering repository deposit (sword) protocol. the etd administrator submission method presents several shortcomings.
first, the proquest xml metadata that is returned to the institutions must be transformed into ir metadata for ingest in the ir, a process that can be long and labor intensive.4 second, the subject terms supplied in the returned files come from a proprietary list of categories maintained by proquest, which does not match the library of congress subject headings (lcsh) used by libraries.5 third, control over the metadata provided is lost because the metadata form cannot be altered, plus customizations to other parts of the system can be difficult to integrate.6 fourth, there have been issues with students indicating different embargo periods in the proquest and ir publishing options, with instances of students choosing to embargo etds in the ir, while not in proquest.7 lastly, this method does not allow students' choice, unless the etds are submitted separately in two systems in a process that can be burdensome. ultimately, for these reasons, we found the etd administrator not a suitable option for our institution.

ftp submission option
in this option, an administrator sends zip packages with the institution's etd files and proquest xml metadata to proquest via ftp.8 at the time of this investigation, there was a $25 charge per etd submitted through this method.9 we did not want to pursue this option because of the charge and the tedious metadata transformations that would be needed between ir and proquest xml schemas. another way around this would have been to submit the etds through the vireo application. vireo is an open-source etd management system used by libraries to freely submit etds into proquest via ftp.10 this alternative, however, was not an option for us as our ir, digital commons, does not support the vireo application.

harvesting submission option
this is the latest method available to submit etds into proquest. in this option, etds are submitted first into an ir, or other internal system, where they get processed to be later harvested by proquest through the ir's existing open archives initiative (oai) feed.11 at the time of this writing, we were not able to find a single study that documents the use of this method. this option looked appealing and worth pursuing as it met most of our desired criteria. first, with this option, students' choice would not be compromised as etds would be submitted to proquest after being posted in the ir. second, because the etd administrator would not be used, issues with conflicting embargo dates and unalterable metadata forms would be avoided. in addition, the local workflow would be retained, thus eliminating the need for tedious metadata transformations between proquest and ir schemas. from the available options, this one seemed the most feasible solution for our institution.

implementation of the harvesting method at unf
after research on the different submittal options was performed, the library approached proquest to express interest in depositing our future etds into their system by using a post-ir option. in the first communications, proquest suggested we use the etd administrator to submit etds because it is the most commonly used method.
when we expressed interest in the harvesting option, they said "we have not been harvesting from bepress sites" (the company that makes digital commons) and suggested we use the ftp option instead.12 ten months later, they clarified the harvests could be performed from bepress sites and that the option is free, with the only requirement of a non-exclusive agreement between the university and proquest. the news appeased both the library's and the graduate school's previous concerns, as we would be able to adopt a free method that would not compromise on students' choice nor restrict students from posting in other places, while keeping the local workflow. after agreement on the submittal method was established, planning and testing of the harvesting method began. the library worked with proquest and bepress to customize the harvesting process while the university's office of the general counsel worked with proquest on the negotiation process.

negotiation process
before proquest could harvest unf etds, two legal documents needed to be in place. the first document was the theses and dissertations distribution agreement, which specifies the conditions under which etds can be obtained, reproduced, and disseminated by proquest. the document had to be signed by unf's board of trustees and proquest. the agreement stipulated the following conditions:
• the agreement must be non-exclusive.
• the university must make the full-text uniform resource locators (urls) and abstracts of etds available to proquest.
• proquest must harvest the etds from the university's ir.
• the university and students have the option to elect not to submit individual works or to withdraw them.
• no fees are due from the university or students for the service.
• proquest must include the etds in the pqdt database.
the second document that needed to be in place was the theses and dissertations availability agreement, which grants the university the non-exclusive right to reproduce and distribute the etds. this agreement between students and unf specifies the places where etds can be hosted and the embargo restrictions, if any. unf had already been using this document as part of its etd workflow, but the document needed to be modified to include the additional option to submit etds into proquest. beginning with the spring 2019 semester, the revised version of the agreement provided students with two hosting alternatives: posting in the ir only or in the ir and proquest.

local steps performed before the harvesting
the workflow begins when students upload their etds and supplemental files (certificate of approval and availability agreements) directly into the digital commons ir. there, students complete a metadata template with information on the degree and provide keywords related to the thesis. after this, the graduate school reviews the submitted etds and approves them inside the ir platform. next, the library digital projects' staff downloads the native pdf files of etds, processes them, and creates public and archival versions for each etd. availability agreements are reviewed to determine which students chose to embargo their etds and which ones chose to host them in proquest, in addition to the ir. if students choose to embargo their etds, the embargo dates are entered in the metadata template. if students choose to publish their etds in proquest, a "proquest: yes" option is checked in their metadata template, while students who choose not to host in proquest get a "proquest: no" in their template. (the proquest field is a new administrative field that was added to the etd metadata template, starting with the spring 2019 semester, to assist with the harvesting process. it was designed to alert proquest of the etds that were authorized for harvesting. more detail on its functionality will be provided in the next section.) the reason library staff enters the proquest and embargo fields on behalf of students is to avoid having students enter incorrect data on the template. following this review, the metadata librarian assigns library of congress subject headings to each etd and creates authority files for the authors. these are also entered in the metadata template. afterwards, the etds get posted in the digital commons' public display, with the full-text pdf files available only for the non-embargoed etds. information that appears in the public display of digital commons will also appear immediately in the oai feed for harvesting. at this point, two separate processes take place:
1. the metadata librarian harvests the etds' metadata from the oai feed and converts it into marc records that are sent to oclc, with the ir's url attached. the workflow is described at https://journal.code4lib.org/articles/11676 (a simplified sketch of this conversion step appears after this list).
2. on the seventh of each month, proquest harvests the full-text pdf files, with some metadata, of the non-embargoed etds that were authorized for harvesting from the oai feed.
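the conversion step in item 1 is documented in full in the cited code4lib article; as a purely illustrative sketch, and assuming the pymarc library (version 5.x syntax) with hypothetical author, title, and ir url values, a brief record of this kind could be built as follows:

```python
# minimal sketch only: build one brief marc record from values already
# harvested out of the oai feed; the names and url below are hypothetical.
from pymarc import Record, Field, Subfield

record = Record()
record.add_field(
    Field(tag="100", indicators=["1", " "],
          subfields=[Subfield(code="a", value="doe, jane.")]),
    Field(tag="245", indicators=["1", "0"],
          subfields=[Subfield(code="a", value="a sample thesis title /"),
                     Subfield(code="c", value="jane doe.")]),
    # the 856 field carries the ir's url, as described in the workflow above
    Field(tag="856", indicators=["4", "0"],
          subfields=[Subfield(code="u",
                              value="https://digitalcommons.example.edu/etd/1234")]),
)

# write a binary marc file that could then be batch-loaded for oclc
with open("etd_records.mrc", "wb") as out:
    out.write(record.as_marc())
```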
harvesting process (customized for our institution)
to perform the harvests, proquest creates a customized robot for each institution that crawls oai-pmh-compliant repositories to harvest metadata and full-text pdf files of etds.13 the robot performs a date-limited oai request to pull everything that has been published or edited in an ir's publication set during a specific timeframe. information to formulate the date-limited request is provided to proquest by the institution for the first harvest only; subsequently, the process gets done automatically by the robot. the request contains the following elements:
• base url of the oai repository
• publication set
• metadata prefix or type of metadata
• date range of titles to be harvested
a hypothetical request assembled from these elements is sketched below.
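as an illustration only, a date-limited oai-pmh request built from these four elements could look like the following; the base url, publication set name, and qualified dublin core metadata prefix shown here are placeholders for the values an institution would actually supply:

```python
# minimal sketch of a date-limited oai-pmh listrecords request; the base
# url, set name, and metadata prefix are hypothetical placeholders.
import requests

params = {
    "verb": "ListRecords",
    "metadataPrefix": "qdc",        # qualified dublin core feed (placeholder prefix)
    "set": "publication:etd",       # the repository's etd publication set (placeholder)
    "from": "2019-01-07",           # previous harvest date
    "until": "2019-02-07",          # current harvest date
}
response = requests.get("https://digitalcommons.example.edu/do/oai/", params=params)
print(response.url)   # the full request a harvesting robot of this kind would issue
```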
in the particular case of our institution, we needed to customize the robot to limit the harvests to authorized etds only. to achieve this, we worked with bepress to add a new, hidden field at the bottom of our digital commons' etd metadata template. the field, called proquest, consisted of a dropdown menu with two alternatives: "proquest yes" or "proquest no" (see figure 1). the field was mapped to an element in the oai feed that displays the value of "proquest: yes" or "proquest: no," thus alerting the robot of the etds that were authorized for harvesting and the ones that were not. the element used to map the proquest field in the oai feed is a qualified dublin core (qdc) element (figure 2). for that reason, the robot needs to perform the harvests from the qdc oai feed in order to see this field.
figure 1. display of the proquest field's dropdown menu in the metadata template
figure 2. display of the proquest field in the qdc oai feed
after the etds authorized for harvesting have been identified with help from the "proquest: yes" field, the robot narrows down the ones that can be harvested at the present moment by using the availability-date element. this element, as the name implies, provides the date when the full-text file of an etd becomes available. it also displays in the qdc oai feed (see figure 3). if the date is on or before the monthly harvest day, the etd is currently available for harvesting. if the date is in the future, the robot identifies that etd as embargoed and adds its title to a log of embargoed etds with some basic metadata (including the etd's author and the last time it was checked). the log of embargoed etds is then pulled out in the future to identify the etds that come out of embargo so the robot can retrieve them.
figure 3. display of the availability-date element in the qdc oai feed
after the etds that are currently available for harvesting have been identified (because they have the "proquest: yes" field and a present or past availability date), the robot performs a harvest of their full-text pdf files by using a third element, which displays at the bottom of records in the oai feed (figure 4). this third element contains a url with direct access to the complete pdf file of etds that are currently not embargoed. etds that are currently on embargo contain a url that redirects the user to a webpage with the message: "the full-text of this etd is currently under embargo. it will be available for download on [future date]" (see figure 5).
figure 4. display of the third element at the bottom of records in the qdc oai feed
figure 5. message that displays in the url of embargoed etds
once the metadata and full-text pdf files of authorized, non-embargoed etds have been obtained by the robot, they get queued for processing by the proquest editorial team, who then assigns them international standard book numbers (isbns) and proquest's proprietary terms. it takes an average of four to nine weeks for the etds to display in the pqdt database after being harvested. records in the pqdt come with the institutional repository's original cover page and a copyright statement that leaves copyright to the author. afterwards, the process gets repeated once a month. this frequency can be set to quarterly or semi-annually if desired. a minimal sketch of the selection logic described in this section follows.
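the following sketch summarizes that selection logic under stated assumptions: each oai record has already been reduced to a simple python dictionary, and the key names are stand-ins for the qdc elements shown in figures 2-4 rather than the feed's actual tags:

```python
# illustrative sketch of the robot's selection logic described above;
# each record is assumed to be a dict built from one qdc oai record,
# and the key names are placeholders, not the feed's actual element tags.
from datetime import date

def select_for_harvest(records, harvest_day, embargo_log):
    """return full-text urls to pull now; queue embargoed titles for later rechecks."""
    to_harvest = []
    for rec in records:
        if rec["proquest"] != "proquest: yes":      # not authorized by the student
            continue
        available = date.fromisoformat(rec["available"])
        if available <= harvest_day:                # availability date has arrived
            to_harvest.append(rec["fulltext_url"])
        else:                                       # still embargoed: log and recheck later
            embargo_log.append({"title": rec["title"],
                                "author": rec["author"],
                                "checked": harvest_day.isoformat()})
    return to_harvest

# example: one authorized, non-embargoed record and one embargoed record
records = [
    {"proquest": "proquest: yes", "available": "2019-01-15", "title": "thesis a",
     "author": "doe, jane", "fulltext_url": "https://example.edu/etd/1"},
    {"proquest": "proquest: yes", "available": "2021-05-01", "title": "thesis b",
     "author": "roe, sam", "fulltext_url": "https://example.edu/etd/2"},
]
log = []
print(select_for_harvest(records, date(2019, 2, 7), log))  # ['https://example.edu/etd/1']
```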
additional points on the harvesting method
handling of etds that come out of embargo. when the embargo period of an etd expires, the full-text pdf of it becomes automatically available in the ir's webpage, and consequently, in the third element that displays in the oai record. each month, when the robot prepares to crawl the oai feed, it will first check the titles in the log of embargoed etds to determine if any of them have become fully available through the third element. the ones that become available are then pulled by the robot through this element.
handling of metadata edits performed after the etds have been harvested and published in pqdt. edits performed to the metadata of etds will trigger a change of date in the datestamp element that displays in the oai records. this change of date will alert the robot of an update that took place in a record, which is then manually edited or re-harvested, depending on the type of update that took place.
sending marc records to oclc. as part of the harvesting process, proquest provides free marc records for the etds hosted in their pqdt database. these can be delivered to oclc on behalf of the institution on an irregular basis. records are machine-generated "k" level and come with urls that link to the pqdt database and with proquest's proprietary subject terms. we requested to be excluded from these deliveries and continue our local practice of sending marc records to oclc with lcsh, authority file headings, and the ir's urls.
notifications of harvests performed by proquest and imports to the pqdt database. when harvests or imports to the pqdt have been performed by proquest, institutions do not get automatically notified. still, they can request to receive scheduled monthly reports of the titles that have been added to the pqdt. unf requested to receive these monthly reports.
usage statistics of etds hosted in pqdt. usage statistics of an institution's etds hosted in the pqdt can be retrieved from a tool called dissertation dashboard. this tool is available to the institution's etd administrators and provides the number of times some aspect of an etd (e.g., citations, abstract viewings, page previews, and downloads) has been accessed through the pqdt database.
royalty payments to authors. students who submit etds through this method are also eligible to receive royalties from proquest.

obstacles faced
during the planning phase, we encountered some obstacles that hindered progress on the implementation. these were:
• amount of time it took to get the ball rolling. initially, we were led to believe we would not be able to use the harvesting method to submit etds into proquest because we were bepress users, but that ended up not being the case. ten months later, we were notified by the same source that the harvesting option for bepress sites was indeed possible. those ten months delayed the implementation process.
it should be noted that digital commons users who want to exclude embargoed etds from displaying in the oai can do so by setting up an optional yes/no button in their submission form. this button prevents metadata of particular records from displaying in the oai feed. we did not pursue this option because we have been using the etd metadata that displays in th e oai to generate the marc records we send to oclc. in addition, we took the necessary precautions to avoid exposing full content of the embargoed etds in the oai feed. institutions planning to use this method should be very careful with the content they display in the oai as to avoid embargoed etds from been mistakenly pulled by proquest. access restrictions can be set by either suppressing the metadata of embargoed etds from displaying in the oai or by suppressing the urls with full access to the embargoed etds. the same precaution should be taken if planning to provide students with the choice to opt-in or out from proquest. altogether, the harvesting option proved to be a reliable solution to submit etds into proquest without having to compromise on students’ choice nor rely on complicated workflows with metadata transformations between ir and proquest schemas. institutions interested in adopting a simple, automated, post-ir method, while keeping the local workflow, should benefit from this method. information technology and libraries september 2020 using the harvesting method to submit etds into proquest | veve 9 endnotes 1 dan tam do and laura gewissler, “managing etds: the good, the bad, and the ugly,” in what’s past is prologue: charleston conference proceedings, eds. beth r. bernhardt et al. (west lafayette, in: purdue university press, 2017), 200-04, https://doi.org/10.5703/1288284316661; emily symonds stenberg, september 7, 2016, reply to wendy robertson, “anything to watch out for with etd embargoes?,” digital commons google users group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:da te/digitalcommons/rningtrarny/6byzt9apaqaj. 2 gail p. clement, “american etd dissemination in the age of open access: proquest, noquest, or allowing student choice,” college & research libraries news 74, no. 11 (december 2013): 562– 66, https://doi.org/10.5860/crln.74.11.9039; fuse, 2012-2013, graduate students re-fuse!, https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students %20re-fuse.pdf?sequence=25&isallowed=y. 3 “pqdt submissions options for universities,” proquest, http://contentz.mkt5049.com/lp/43888/382619/pqdtsubmissionsguide_0.pdf . 4 meghan banach bergin and charlotte roh, “systematically populating an ir with etds: launching a retrospective digitization project and collecting current etds,” in making institutional repositories work, eds. burton b. callicott, david scherer, and andrew wesolek (west lafayette, in: purdue university press, 2016), 127–37, https://docs.lib.purdue.edu/purduepress_ebooks/41/. 5 cedar c. middleton, jason w. dean, and mary a. gilbertson, “a process for the original cataloging of theses and dissertations,” cataloging and classification quarterly 53, no. 2 (february 2015): 234–46, https://doi.org/10.1080/01639374.2014.971997. 6 wendy robertson and rebecca routh, “light on etd’s: out from the shadows” (presentation, annual meeting for the ila/acrl spring conference, cedar rapids, ia, april 23, 2010), http://ir.uiowa.edu/lib_pubs/52/; yuan li, sarah h. theimer, and suzanne m. 
preate, “campus partnerships advance both etd implementation and ir development: a win-win strategy at syracuse university,” library management 35, no. 4/5 (2014): 398–404, https://doi.org/10.1108/lm-09-2013-0093. 7 do and gewissler, “managing etds,” 202; banach bergin and roh, “systematically populating,” 134; donna o’malley, june 27, 2017, reply to andrew wesolek, “etd embargoes through proquest,” digital commons google users group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort :date/digitalcommons/gadwi8infga/sg7de7sdcaaj. 8 gail p. clement and fred rascoe, “etd management & publishing in the proquest system and the university repository: a comparative analysis,” journal of librarianship and scholarly communication 1, no. 4 (august 2013): 8, http://doi.org/10.7710/2162-3309.1074. 9 “u.s. dissertations publishing services: 2017-2018 fee schedule,” proquest. https://doi.org/10.5703/1288284316661 https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:date/digitalcommons/rningtrarny/6byzt9apaqaj https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:date/digitalcommons/rningtrarny/6byzt9apaqaj https://doi.org/10.5860/crln.74.11.9039 https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students%20re-fuse.pdf?sequence=25&isallowed=y https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students%20re-fuse.pdf?sequence=25&isallowed=y http://contentz.mkt5049.com/lp/43888/382619/pqdtsubmissionsguide_0.pdf https://docs.lib.purdue.edu/purduepress_ebooks/41/ https://doi.org/10.1080/01639374.2014.971997 http://ir.uiowa.edu/lib_pubs/52/ https://doi.org/10.1108/lm-09-2013-0093 https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort:date/digitalcommons/gadwi8infga/sg7de7sdcaaj https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort:date/digitalcommons/gadwi8infga/sg7de7sdcaaj http://doi.org/10.7710/2162-3309.1074 information technology and libraries september 2020 using the harvesting method to submit etds into proquest | veve 10 10 “support: proquest export documentation,” vireo users group, https://vireoetd.org/vireo/support/proquest-export-documentation/. 11 “pqdt global submission options, institutional repository + harvesting,” proquest, https://media2.proquest.com/documents/dissertations-submissionsguide.pdf. 12 marlene coles, email message to author, january 19, 2018. 13 “proquest dissertations & theses global harvesting process,” proquest. https://vireoetd.org/vireo/support/proquest-export-documentation/ https://media2.proquest.com/documents/dissertations-submissionsguide.pdf abstract introduction literature review proquest etd administrator submission option ftp submission option harvesting submission option implementation of the harvesting method at unf negotiation process local steps performed before the harvesting harvesting process (customized for our institution) additional points on the harvesting method handling of etds that come out of embargo. handling of metadata edits performed after the etds have been harvested and published in pqdt. sending marc records to oclc. notifications of harvests performed by proquest and imports to the pqdt database. usage statistics of etds hosted in pqdt. royalty payments to authors. 
articles
making disciplinary research audible: the academic library as podcaster
drew smith, meghan l. cook, and matt torrence
information technology and libraries | september 2020
https://doi.org/10.6017/ital.v39i3.12191
drew smith (dsmith@usf.edu) is associate librarian, university of south florida. meghan l. cook (mlcook3@usf.edu) is coordinator of library operations, university of south florida. matt torrence (torrence@usf.edu) is associate librarian, university of south florida. © 2020.

abstract
academic libraries have long consulted with faculty and graduate students on ways to measure the impact of their published research, which now include altmetrics. podcasting is becoming a more viable method of publicizing academic research to a broad audience. because individual academic departments may lack the ability to produce podcasts, the library can serve as the most appropriate academic unit to undertake podcast production on behalf of researchers. the article identifies what library staff and equipment are required, describes the process needed to produce and market the published episodes, and offers preliminary assessments of the podcast impact.

introduction
the academic library has always had an essential role in the research activities of university faculty and graduate students, but until the last several years, that role has primarily focused on assisting university researchers with obtaining access to all relevant published research in their fields, making it possible for those researchers to complete a thorough literature review. more recently, that role has evolved to encompass assisting with other aspects of research and publication, including consulting on copyright-related issues, advising researchers on the most appropriate places to publish, preserving publications and data in institutional repositories, helping tenure-track faculty to evaluate their research impact as part of the tenure and promotion process, and hosting open-access journals. meanwhile, libraries of all types have experimented in the last ten to fifteen years with using social media to promote library collections, services, and events. many libraries have taken advantage of facebook, twitter, and youtube as part of these efforts. increasingly, libraries have incorporated makerspaces so that library patrons can create and edit video and audio files, meaning that this same equipment and software is now available to librarians and other library staff for their own purposes. this has resulted in libraries producing promotional videos and podcasts. the dramatic increase in mobile technology (smartphones and tablets) ownership and usage over the last decade has resulted in an increase in the consumption of podcasts wherever the listener happens to be when their ears are not otherwise fully occupied, such as commuting, exercising, and engaging in home chores. as a result, academic libraries are now finding themselves in an excellent position to use podcasting for instructional and promotional purposes in an effort to reach a broad audience. what happens when the university library combines its inherent interest in supporting the promotion of faculty and graduate student research with its ability to create podcasts to quickly and inexpensively reach an international audience?
this paper documents the efforts of an academic library at a high-level research university to partner with one of the university's academic departments to use podcasting to promote the research done by that department's faculty and doctoral candidates. we will describe which library staff were involved, how the podcast was planned, the execution of the podcasting process, the issues that were encountered throughout the process, and how the impact of the podcast was assessed. calling: earth, the podcast produced by the university of south florida (usf) libraries, can be found at http://callingearth.lib.usf.edu/.

literature review
podcasting as a means for promoting scholarly communication is a relatively new and uncommon idea in a library setting; therefore, the extant literature is scarce on the subject. a high percentage of the contemporary articles on the aforementioned topic focus on the use of podcasts as a means to satisfy a wide array of student learning needs. while pedagogical best-practices knowledge is useful, what current literature does exist is not an exact match for the concept of promoting scholarly communication, which offers subject specificity, faculty and graduate interaction, marketing of libraries, and research visibility as aggregate goals. what follows in this literature review is a summary of a slice of the literature related to podcasting, academia, and/or libraries. the researchers chose as a starting point to look at the general use of podcasting, as well as social media, in various academic and library environments. in a recent article on the use of social media and altmetrics, for example, the increased use of these tools is outlined, but with numerous caveats regarding the initial non-probabilistic methods of gathering information on the how and why of their adoption.1 to further emphasize the use of podcasts and, in a related way, social marketing, an article related to association of research libraries (arl) efforts in this vein was examined. a comprehensive study of arl member libraries published in 2011, with not much on this topic published since that date, demonstrated in figure 1 of their research that five of the 37 respondents' podcasts contained recorded interviews and only one included scholarly publishing content.2 this ten-year vacuum in further research was unexpected but indicates an opportunity for a new type of podcast focusing on academic production. scholars in academic libraries have long examined student preferences for new technologies and types of information transfer, including the use of podcasts. a study from sam houston state university found that 36 percent of users in 2011 were using podcasts for recreational purposes as opposed to much lower use for academic and scholarly communication benefits.3 in the future, academic creation and utilization of podcasts for scholarly communication is ripe for a hearty statistical and qualitative analysis. specific to this inquiry, literature on the application of podcasts for scholarly communication in a subject discipline appears to be lacking. furthermore, this literature review emphasizes the dearth of research that relates to promoting the research efforts of geosciences faculty and graduate students.
in terms of recent literature, there are also a number of publications available that deal with the history and evolution of podcasting in education and, specifically, higher education. one such current work provides an excellent outline of this growth in use, as well as outlining several major types, or genres, of podcasting in these types of environments. following a strong and succinct overview of the technology and its use in college and university settings, the author continues to effectively define, with examples, the three main genres they have identified: the "quick burst," the "narrative," and the "chat show."4 the model that most represents usf's calling: earth program is "narrative," as this includes a subcategory of "storytelling." this work is truly beneficial for any group or individual developing, or improving, an educational podcast effort. in 2011, peoples and tilley outlined the emergence of podcasts to disseminate information in academic libraries. one of the excellent questions that arises from this work deals with the access, advancement, and archiving of the content; is this content to be archived, or cataloged, as more permanent material, or is it electronic ephemera?5 this is a question for the usf calling: earth podcast group going forward as the level and quality of content and, ideally, use is expanded. additionally, educators are studying more about the limitations of podcasts; not to rule them out as academic tools, but to inspire and enhance the best possible outcomes. one excellent warning to be heeded by any library hoping to utilize podcasts for education and dissemination of research is summed up well in this quote: "if students do not utilize or do not realize the benefits of the self-pacing multimedia characteristics of podcasting, then the resource becomes a more likely contributor to cognitive overload."6 a small number of studies have examined the quantitative elements of podcast use in academic libraries. an article in vine: the journal of information & knowledge management systems outlined, via content analysis and other methods, various unique and shared characteristics of existing academic podcasts, while also furthering the concept of podcasting as a "library service."7 this may not have been the first publication to make this assertion, but this is a view that is also held by these authors, and this view shapes the development and advancement of the usf libraries podcasting efforts. librarians of all types must be wary, however, as there are numerous articles that focus on the better understanding of student learning preferences. as a recent article on the success of satellite and distance learners showed, though, these tools often match the delivery preferences of these types of students.8 switching gears to a bit more topic specificity, a number of news and academic articles were identified on the use of podcasts in areas of the geosciences. one such effort is the geology flannelcast. the development and implementation of this combination of education and entertainment, which is also a goal of these authors, is outlined by the creators' poster presentation at a recent geological society of america conference.
with a focus on the increasing ease of podcasting technology, reduced cost of equipment, and the use of a "conversational atmosphere" within a pedagogical framework, this model stood out as one worth studying.9 furthermore, the geosciences are, or can be, interesting and exciting. a recent podcast on communicating geosciences with fun and flair is just the encouragement this research group needed to go all-in on this project, and a reminder that the geosciences are far from boring!10 as is evidenced by an examination of current and historical literature on this topic, there are multiple opportunities for further exploration and library efforts, expressly as one of the main points of this work is to emphasize faculty and graduate research efforts and scholarly communication and original content creation. in addition to the focus on these publications and presentation efforts, the results will be measured by the initial assessment projects including download and utilization data and, hopefully, positive feedback from participants and library administration. further measurement is expected to demonstrate increased citation counts and downloads of the publications of the faculty and graduate student interviewees. it will be correlation and not causation, of course, but the team hopes to have positive feedback for participants and the library.

staffing
as with any successful project, a project to produce a podcast focused on academic research had to begin with individuals who had either the interest or the expertise, ideally both, to initiate the work. one was an associate librarian with more than 13 years of experience in producing regular podcasts, while the other was a library staff member who was a doctoral candidate serving on the usf libraries research platform team (rpt) for the usf school of geosciences. the rpt was already tasked with assisting the geosciences faculty and graduate students in maximizing the impact of their work and had been using various means in order to accomplish this, such as an institutional repository for research output, and tools to measure the impact of previously published work. during a conversation in late 2018, the librarian suggested to the rpt staff person that podcasting could be used to promote research to a variety of audiences, including usf faculty and students, faculty and students at other universities, k-12 science teachers, and members of the general public (both local and beyond). the librarian offered to initiate the podcast and train the rpt staff on how to continue the podcast after a number of episodes had been produced. the librarian brought to the project the needed expertise with launching and maintaining a podcast, while the rpt doctoral candidate was already familiar with the geosciences faculty and other doctoral candidates and could identify those who would make good candidates for being interviewed about their research.

planning
the initial planning for the podcast began approximately two months before the first episode release. the original project managers and podcast creators met a number of times to discuss logistics, equipment, and staffing needs, and to agree upon a podcast name (calling: earth). since the notion of podcasting for researcher promotion was an unexplored territory, the support from higher administration was cautious.
however, after production of the first episodes, traction behind the podcast grew and additional support for future endeavors was received. the podcasters acquired handheld recording equipment, a tascam dr-05 linear pcm recorder, from the usf libraries digital media commons and tested it out in multiple environments (for instance, a quiet office versus a recording studio) to find the optimal location to record the interviews. we found the hand-held recording equipment worked well in a quiet office and allowed for travel to the researcher's office if they requested. the podcast creation team discussed how to add intro and outro music to the podcast that would not violate any copyright restrictions but that would fit the mood of the podcast. the rpt staff person knew of a local tampa-based band, the growlers, as a potential source for music because the bass guitarist was an adjunct professor and alumnus of the usf school of geosciences. the alumnus gave permission to use a portion of the band's recorded music for the podcast. a hosting service was needed to host and publish the podcast. the librarian suggested using libsyn, because of their 13 years of previous experience with the platform, libsyn's inexpensive hosting plan, and the ability to acquire statistics including the geographic locations (countries and states) where the podcast was being downloaded.

execution
potential interviewees were contacted via email and invited to be interviewed. once the potential interviewee agreed, a time and a place to conduct the interview was agreed upon. the rpt staff person determined what the most recent research was for each interviewee, and then provided that content to the librarian host for review. the host then prepared interview questions based on the research content. the host went over the questions with the interviewee before the interview began to clear the content with the interviewee and to make sure everything would be covered in the interview that the interviewee wished to cover. the interviews took approximately 30 minutes to an hour. editing of the podcast was done using garageband, allowing for the addition of the music to the beginning and end, as well as the host introducing both the general podcast and the specific episode, identifying the academic units involved in the podcast, indicating how listeners might provide feedback, and thanking the music group for allowing the use of their music. in a few rare cases, small interview segments were removed, usually due to the interviewee feeling that it did not represent them well.

challenges
as with any new endeavor, challenges were faced at all stages in the process of getting the podcast to production and beyond.

buy-in from library administration
an early challenge was to gain buy-in from the library administration. this began with requesting that the library fund the hosting service, and the feeling of the administrator was that it was a worthwhile experiment, at least in the short term. once a number of episodes had been produced, the library administration had a better sense of the quality of the production and how it would serve the interests of the library in its academic support role.

lack of budget
with no budget for this project (beyond the administration's monthly payment of the hosting service), the podcasters were at the mercy of the quality of the recorders available for library checkout.
if the recorders did not produce a high-quality recording, the podcast would possibly lack the sophistication needed for production. also, high-quality graphics work was needed and required us to look into other library units for help with creating a logo.

getting the podcast into apple podcasts
once content was being produced and published, it was time to submit the podcast to apple podcasts. apple initially rejected the submission because the first logo looked very similar to an iphone. it should be noted that apple did not supply a specific explanation of what copyright was being infringed, so the podcasters were faced with making a best guess as to what the problem was. based on our assumption, we changed the logo and resubmitted the podcast. a further problem arose when apple required that the new submission use a different rss feed than the original submission. eventually the podcasters sought assistance from libsyn, who explained how to make a minor change to the url of the rss feed so that the podcast could be successfully resubmitted.

new logo creation
the first logo continued to be used for the entire first season, but before the second season was released, the library's new communications and marketing coordinator assisted with the creation of a new logo that looked more sophisticated and more in line with other podcast logos. having an in-house graphics designer was extremely helpful in rolling out a new logo (see figures 1 and 2).
figure 1. season 1 logo
figure 2. current logo

setting up interviews
identifying potential interviewees, requesting interviews, and setting good times and locations for the interviews brought on another batch of challenges. the usf school of geosciences is composed of geologists, geographers, and environmental scientists, so when planning out the schedule for the potential interviewees, an effort was made to involve a wide range of researchers. some potential interviewees declined the request altogether, while others were not available for the needed time period. given that the podcast was released every two weeks, there was a little wiggle room for scheduling hiccups, but once or twice a last-minute request to a new potential interviewee was made to ensure production stayed on schedule. deciding where and when the interview would be held required a lot of back-and-forth emails between the rpt staff person and the interviewee. preference on time and location was given to the interviewee, but it was requested that, if they did not want to come to the library to be interviewed, their own office/lab space could be used if it was a sufficiently quiet environment for recording purposes.

comfortability of the interviewee
once an interview began, the challenge of engagement from the host and comfortability of the interviewee became apparent. the host had to engage the researcher at a level appropriate for a general audience, which was challenging given that the research done by the usf school of geosciences is often at a high level of critical thinking and problem-solving. to add on to the complexity of the research being explained, the comfort level of the interviewee had the potential to dampen the interview. one researcher was so uncomfortable speaking in an interview that they typed up in advance what they wanted to say.
assessment
libsyn statistics
according to libsyn statistics (as of july 17, 2020) there were a total of 3,593 unique downloads from 48 different countries of the published 35 episodes of calling: earth. in table 1, the 48 countries where calling: earth has been downloaded are shown, as well as how many times the podcast has been downloaded in each country. it is worth noting that there are 105 downloads that do not have a location specified, so the total of the downloads in table 1 does not equal the total number of downloads reported by libsyn.

table 1. downloads by country
name                   downloads    name                     downloads
united states          2,729        chile                    3
united kingdom         103          denmark                  3
india                  98           romania                  3
australia              88           south africa             3
france                 62           yemen                    3
ireland                50           argentina                2
bangladesh             43           ecuador                  2
spain                  37           poland                   2
russian federation     36           taiwan                   2
norway                 30           turkey                   2
portugal               30           belgium                  1
germany                20           bulgaria                 1
japan                  19           colombia                 1
mexico                 18           costa rica               1
italy                  14           estonia                  1
netherlands            12           greece                   1
new zealand            11           latvia                   1
brazil                 9            macedonia                1
korea, republic of     9            nigeria                  1
czech republic         7            pakistan                 1
ukraine                7            saudi arabia             1
china                  6            united arab emirates     1
hong kong               5            vietnam                  1
sweden                 4            without a location       105
canada                 3

preliminary survey and scholarly impact
a survey was sent out to the interviewees to gauge their impressions of the podcast and to see if they had noticed any impact to their citations or document downloads. our goal for the survey was to find out if the podcast was accomplishing its original intention, increasing researcher impact through research dissemination, as well as to inform the podcast processes and procedures. the questions asked were:
1. in what ways do you view the calling: earth podcast as a way to positively affect your research impact?
2. what evidence do you have, if any, to suggest your research has been positively impacted because of being an interviewee on the calling: earth podcast?
3. what would you have liked to be different about your interview process for the calling: earth podcast?
4. what suggestions do you have for the future seasons of the calling: earth podcast? for example, should the format change, should the focus be different, should the length of the interview change, etc.
furthermore, each interviewee was asked to contribute their scholarship to the library's institutional repository, scholar commons, to allow for the archiving of their research publications and to use as a means of tracking scholarship impact as a result of the podcast. once an interviewee's scholarship was placed in scholar commons, a selected works profile was created so that a direct link to the scholar's work could be disseminated through the podcast notes. impact on faculty has also been noteworthy. the download totals for faculty interview participants (when comparing roughly the same amount of time just prior to and following their published interview) showed an average increase of 30 percent and suggest a strong correlative link between the podcast and researcher impact. furthermore, anecdotal evidence from interviewees such as "puts my name out there to a wider audience," "enhances the visibility of my work," and "allow others to hear about [my research] in a more passive way" indicates the potential impact a researcher can see from being a part of the podcast.
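for clarity, the 30 percent figure reflects a before-and-after comparison of download counts for each participant; treating it as an average of per-participant percent changes is our reading of that comparison, and the counts in this small sketch are made up rather than actual usf data:

```python
# sketch of a percent-change calculation with hypothetical download counts
# for each participant's publications before and after their episode aired
downloads = [
    {"before": 40, "after": 55},   # +37.5%
    {"before": 80, "after": 100},  # +25.0%
    {"before": 20, "after": 26},   # +30.0%
]
changes = [(d["after"] - d["before"]) / d["before"] * 100 for d in downloads]
print(round(sum(changes) / len(changes), 1))  # 30.8 (average percent increase)
```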
a second survey was sent to the faculty, students, and staff of the entire school of geosciences to determine who was listening to the podcast, who was not, and their reasons for listening or not listening. the survey contained five questions in total, but depending on how a participant answered, not all of the questions were presented (see figure 3; a sketch of this branching logic appears after the checklist below). the first question asked their status in the school of geosciences (faculty, staff, undergraduate, graduate, or other). the second question asked whether they had heard of the podcast and whether they had listened to it. if a participant chose the option that they had never heard of the podcast, the survey ended for them. if a participant chose the option that they had heard of the podcast but had not listened to it, the survey directed them to a question that asked them to provide reasons they had not listened to the podcast. if a participant chose the option that they had heard of the podcast and had listened to at least one episode, the survey directed them to a question that asked how many episodes the participant had listened to and for what reasons they were listening to the podcast. this data was collected to inform the future direction of the podcast.

figure 3. flow chart for the entire school of geosciences survey. the survey branches from the status and awareness questions either to the end of the survey, to the reasons for not listening (not knowing what a podcast is, lack of time, lack of interest, or other), or to the number of episodes listened to (one through eight) and the reasons for listening (enjoyable content, awareness of current research in the usf school of geosciences, instructional purposes, finding collaborators, or other).

checklist for podcast planning/execution

based on our experiences in the production of the calling: earth podcast, we recommend that academic librarians and library staff use the following list to help with planning and executing the production of their own podcasts:
• get general buy-in from library staff and administration, and update them as the planning progresses and budgeting is needed.
• decide on goals, audience, content, format, frequency of production, and methods of assessment.
• work with media staff to design marketing, including podcast title (avoiding duplication with other podcasts) and logo development.
• choose a podcast hosting service.
• identify relevant staff for hosting, recording, editing, and publishing and train as needed.
• evaluate existing hardware and software and make additional purchases as needed.
• contact potential interviewees and create a schedule.
• prepare customized interview questions and share as appropriate with interviewees.
• record interviews.
• edit and publish episodes.
• submit the podcast to apple podcasts, spotify, and other popular podcast directories.
• monitor statistics.
• continue to engage in marketing and assessment activities.
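as a companion to figure 3 above, the sketch below encodes the survey's branching ("skip") logic as plain conditionals. the question wording is abbreviated, and the response values and data structure are our own shorthand for illustration, not the survey platform's export format.

```python
# sketch: the branching ("skip") logic of the school of geosciences survey in figure 3.
# question wording is abbreviated; the response values are our own shorthand.

def run_survey(status: str, awareness: str, answers: dict) -> dict:
    """Walk one participant through the branching survey and return their responses."""
    responses = {"status": status, "awareness": awareness}
    if awareness == "never heard of it":
        return responses  # survey ends immediately
    if awareness == "heard of it, not listened":
        # branch: ask why they have not listened
        responses["reason_not_listening"] = answers.get("reason_not_listening")
        return responses
    if awareness == "listened to at least one episode":
        # branch: ask how many episodes and why they listen
        responses["episodes_listened"] = answers.get("episodes_listened")
        responses["reasons_listening"] = answers.get("reasons_listening", [])
    return responses

# example participant: a graduate student who listens for content and awareness
print(run_survey(
    status="graduate student",
    awareness="listened to at least one episode",
    answers={"episodes_listened": 4,
             "reasons_listening": ["enjoyable content", "awareness of current research"]},
))
```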
conclusions and future directions

enthusiasm and anecdotal positive feedback are enough fuel for current activities, and the future of podcasting in libraries appears open and exciting. at the usf libraries, calling: earth is currently in its third season, and with each new episode, new ideas and increased archival content become a permanent part of the library's legacy and collections. this is another area ripe for future exploration: as this type of original content is archived, cataloged, and disseminated, it becomes another regular measure of academic impact. in this vein, the usf libraries podcasting group plans to further codify cyclical assessment tools, including obtaining irb clearance for future surveys and data collection. in addition to cleaning up and refining these assessment practices, this will also provide the opportunity to publish and present publicly on more specific data. ideally, the group will be able to correlate the show's presence with positive citation or other metrics for show participants. the usf libraries geosciences rpt is currently collecting baseline aggregate information, which could then be compared following further maturation and dissemination of the podcast. causality may never be within reach, but any positive impacts will be exciting and beneficial. it is also the hope of those involved with calling: earth that it might provide a model or template for other rpt or library podcasts or media efforts. one of the current benefits is the strong and effective support from the development and communication directors at the usf libraries, and their partnerships will certainly be key to the success of this and any other potential projects of this type. in closing, the academic library podcasting landscape is wide open for further exploration and examination, and the usf libraries plans to lead and learn.

endnotes

1 cassidy r. sugimoto et al., "scholarly use of social media and altmetrics: a review of the literature," journal of the association for information science and technology 68, no. 9 (2017): 2037–62.
2 james bierman and maura l. valentino, "podcasting initiatives in american research libraries," library hi tech 29, no. 2 (may 2011): 349, https://doi.org/10.1108/07378831111138215.
3 erin dorris cassidy et al., "higher education and emerging technologies: student usage, preferences, and lessons for library services," reference & user services quarterly 50, no. 4 (2011): 380–91, https://doi.org/10.5860/rusq.50n4.380.
4 christopher drew, "educational podcasts: a genre analysis," e-learning and digital media 14, no. 4 (2017): 201–11, https://doi.org/10.1177/2042753017736177.
5 brock peoples and carol tilley, "podcasts as an emerging information resource," college & undergraduate libraries 18, no. 1 (january 2011): 44, https://doi.org/10.1080/10691316.2010.550529.
6 stephen m. walls et al., "podcasting in education: are students as ready and eager as we think they are?," computers & education 54, no. 2 (january 2010): 372, https://doi.org/10.1016/j.compedu.2009.08.018.
7 tanmay de sarkar, "introducing podcast in library service: an analytical study," vine 42, no. 2 (2012): 191–213, https://doi.org/10.1108/03055721211227237.
8 lizah ismail, "removing the road block to students' success: in-person or online? library instructional delivery preferences of satellite students," journal of library & information services in distance learning 10, no. 3–4 (2016): 286–311, https://doi.org/10.1080/1533290x.2016.1219206.
9 jesse thornburg, "podcasting to educate a diverse audience: introducing the geology flannelcast," in innovative and multidisciplinary approaches to geoscience education (posters) (boulder, co: geological society of america, 2015).
10 catherine pennington, "podcast: geology is boring, right? what?! no! why scientists should communicate geoscience...," n.d., https://britgeopeople.blogspot.com/2018/10/podcast-geology-is-boring-right.html.

editorial board thoughts

what more can we do to address broadband inequity and digital poverty?

lori ayre

information technology and libraries | september 2020
https://doi.org/10.6017/ital.v39i3.12619

lori ayre (lori.ayre@galecia.com) is principal, the galecia group, and a member of the ital editorial board. © 2020.

we are now almost seven months into our new lives with the novel coronavirus and over 190,000 americans have died of covid-19. library administrators have been struggling with their commitment to provide services to their communities while keeping their staff safe. initially, libraries relied on their online offerings, so more e-books and other online resources were acquired. staff learned that they could do quite a bit of their work from home. they could still respond to email and phone messages. they could evaluate and order new material. they could deliver online programs like summer reading and story time. they could interact with people on social media. they could put together key resources for patrons and post them on the website.1 a lot of what the library was doing while the buildings were closed was not obvious. most people associate the library with the building, and since the building was closed… it seemed like nothing was happening at the library. yet, library workers were busy. once it became possible for library staff to enter the building (per local health ordinances), the first thing that libraries started to do was accept returns.
that was a little fraught considering how little we knew about the virus and how long contaminants might live on returned library material. eventually, with the long-awaited testing results from the realm project and battelle labs (https://www.webjunction.org/explore-topics/covid-19-research-project.html), people started standardizing on a three-day quarantine of returns. then more testing of stacked material was done, resulting in some people choosing to quarantine returns for four days. as of early september, we have learned that even five days isn't enough to quarantine delivery totes and some other plastic material. curbside pick-up was born in these early days of being allowed back in the buildings. if someone had mapped who was offering curbside pick-up, it would look like popcorn popping across the country. the number of libraries offering the service slowly increased, and pretty soon nearly everyone was doing it.2 many library directors will say that curbside pick-up is here to stay. people love the convenience too much to take the service away. rolling out curbside pick-up has had some challenges: how to safely make the handoff between library staff and library patrons; whether to accept returns; whether to charge fines; modifying circulation policies to fit current needs; and selecting books for people who want them but who don't have the skills needed to negotiate the library catalog's requesting system. some libraries started putting together grab bags of materials selected by staff for specific patrons, kind of like homebound services on the fly. curbside helped get material into circulation again. importantly, also during this period, libraries started finding creative ways to get wi-fi hotspots out into communities. they began lending them if they weren't already. those libraries already circulating hotspots increased their inventory. they took their bookmobiles into neighborhoods and created temporary hotspot connections around town. many libraries made sure wi-fi from their building was available in their own parking lots.3 but one thing everyone has learned during this pandemic is that libraries alone cannot be the solution to the digital divide. this isn't news to librarians, who have been arguing that internet access should be as readily available as electricity and water. librarians understand that information cannot be free and accessible unless everyone has internet access and knows how to use it. public access computers, wi-fi hotspots, and media literacy are all staple services in our libraries today.4 however, these services are not enough to bridge a digital divide that only seems to be getting worse. the coronavirus that closed libraries and schools has made it painfully clear that something much bigger has to happen to address the problem.
as gina millsap stated in a recent facebook post: "i think it's become obvious that the covid-19 crisis is shining a spotlight on the flaws we have in our broadband infrastructure and on our failure to make the investments that should have been made for equitable access to what should be a basic utility, like water or electricity."5 according to broadbandnow, the number of people who lack broadband internet access could be as high as 42 million.6 the fcc reports that at least "18 million people lacked access to broadband internet at the end of 2018."7 even if all the libraries were open and circulating hundreds of wi-fi hotspots, we'd still have a very serious access problem.

thinking differently about addressing the digital divide

in a paper published march 28, 2019, by the urban libraries council (ulc), the author suggested three specific actions that libraries can take to address race and social equity and the digital divide. they are:
1. assess and respond to the needs of your community through meaningful conversation (including considering different partners for your work),
2. optimize funding opportunities to support your efforts (e.g., e-rate), and
3. think outside the box to create effective solutions that are informed by those in need (e.g., lending wi-fi hot spots).8
while we know libraries have been heeding this advice when it comes to wi-fi hotspots, let's look into what can be done when we consider ulc's suggestion to consider different partners for your work.

community partners

an excellent example of what can be done with a coalition of community partners comes from detroit, where a mesh wireless network was put in place to provide permanent broadband in a low-income neighborhood.9 the project is called the detroit community technology project. with the community-based mesh network, only one internet account is needed to provide access for multiple homes. the networks also enable people on the network to share resources (calendar, files, bulletin board), and that data lives on their network, not in the cloud. one of the sponsors of the detroit community technology project is the allied media project (https://www.alliedmedia.org/), which also sponsors casscowifi and the equitable internet initiative to get broadband and digital literacy training into several underserved areas. community networks (https://muninetworks.org/), a project of the institute for local self-reliance (https://ilsr.org/), describes several innovative projects in which communities partner with electric utilities. surry county, virginia, expects to extend broadband access to 6,700 households through a first-ever partnership between a utility (dominion energy virginia) and an electric cooperative (dominion energy). a similar project is underway with the northern neck cooperative and dominion energy.10 these initiatives are made possible due to some regulatory changes made in virginia (sb 966). according to community networks, there are 900 communities providing broadband connectivity locally (https://muninetworks.org/communitymap). but nineteen states still have barriers in place that discourage, if not outright prevent, local communities from investing in broadband.
libraries in states where community networks are a viable option should be at the table, or perhaps setting the table, for discussions about how to bring broadband to the entire community, not just into the library or dispatched one at a time via wi-fi hotspots. this is an opportunity to convene community conversations focusing on the issue of broadband. library staff have been doing more and more of this type of outreach into the community and acting as facilitators. the ala has even produced a community conversation workbook (http://www.ala.org/tools/sites/ala.org.tools/files/content/ltc_convoguide_final_062414.pdf) to support libraries just getting started.

state partners

in california, the governor recently issued executive order n-73-20 (https://www.gov.ca.gov/wp-content/uploads/2020/08/8.14.20-eo-n-73-20.pdf) directing state agencies to pursue a goal of 100 mbps download speed and outlining actions across state agencies and departments to accelerate mapping and data collection, funding, deployment, and adoption of high-speed internet.11 this will undoubtedly create fertile ground for libraries to partner with other agencies and community organizations to advance this initiative. libraries are specifically called out to raise awareness of low-cost broadband options in their local communities. every state has some kind of broadband task force, commission, or advisory council (https://www.ncsl.org/research/telecommunications-and-information-technology/state-broadband-task-forces-commissions.aspx). this is another instance where libraries should be at the table. in my state, our state librarian is on the california broadband council. but many of these commissions do not have a representative from the library world, which means they probably are not hearing from us. whether it is through your local library, your state library, or your state library association, it is important for librarians to build relationships with people on these commissions, if not get a seat on the commissions themselves.

national partners

unless your community is blanketed with affordable broadband connectivity, it will be important that we continue to advocate nationally for the needs we see. in addition to helping the patron standing right in front of us checking out their hotspot, we also need to address the needs of the people who aren't able to get to the library but are equally in need of access. our job is to make sure that any new initiatives undertaken by a new administration provide for free and equitable access to the internet for every household. extending e-rate (the federal communications commission's program for making internet access more affordable for schools and libraries) isn't enough. free (or at least affordable) broadband needs to be brought to every home. the electronic frontier foundation (eff) argues that fiber-to-the-home is the best option for consumers today because it will be easily upgradeable without touching the underlying cables and will support the next generation of applications (see https://www.eff.org/wp/case-fiber-home-today-why-fiber-superior-medium-21st-century-broadband).
libraries have worked with the eff on issues related to privacy and government transparency. maybe it's time to team up with them on broadband.

global partners

low earth orbit (leo) satellites could potentially bring broadband to everyone on earth.12 starlink (https://www.starlink.com/) is elon musk's initiative, and project kuiper (https://blog.aboutamazon.com/company-news/amazon-receives-fcc-approval-for-project-kuiper-satellite-constellation) is amazon's jeff bezos' project. a private beta starlink service is due (or perhaps it is already happening). if it works as musk has envisioned, it could be a game-changer. or it might just make the digital divide worse if it isn't affordable to everyone who needs it. how might we lobby musk to roll out this service in a way that is equitable and fair?

speak up, speak out, and get in the way

these are just a few avenues that we, as professionals committed to free access to information, might pursue. i worry that we have not made enough noise about the problems we see in our communities that are a result of broadband inequity and digital poverty. and although virtually every library is doing something to address the problem, our efforts are no match for the magnitude of the problem. in a blog post on the brookings institution's website, authors lara fishbane and adie tomer argue for a new agenda focused on comprehensive digital equity that includes (among other things) "building networks of local champions, ensuring community advocates, government officials, and private network providers share intelligence, debate priorities, and deploy new programming."13 there are no better local champions and advocates for communities than the city or county librarians and their staffs. let's treat this problem with the seriousness it deserves and at a scale that will be meaningful. to quote john lewis (as so many of us have since his death on july 17, 2020), it's time for us to "speak up, speak out, and get in the way."14 we have to make it painfully clear to policymakers that libraries cannot bridge the digital divide with public access computers and hotspots. we need to tell our communities' stories, convene conversations, and agitate for equitable broadband that is as readily available as water and electricity.

endnotes

1 "libraries respond: covid-19 survey," american library association, accessed august 25, 2020, http://www.ilovelibraries.org/sites/default/files/may-2020-covid-survey-pdf-summary-of-results-web-2.pdf.
2 erica freudenberger, "reopening libraries: public libraries keep their options open," library journal, june 25, 2020, https://www.libraryjournal.com/?detailstory=reopening-libraries-public-libraries-keep-their-options-open.
3 lauren kirchner, "millions of americans depend on libraries for internet. now they're closed," the markup, june 25, 2020, https://themarkup.org/coronavirus/2020/06/25/millions-of-americans-depend-on-libraries-for-internet-now-theyre-closed.
4 jim lynch, "the gates library foundation remembered: how digital inclusion came to libraries," techsoup, accessed august 24, 2020, https://blog.techsoup.org/posts/gates-library-foundation-remembered-how-digital-inclusion-came-to-libraries.
5 gina millsap, "this was in april. q. we're starting a new school year and what has changed? a. not much. it's past time to get serious about universal broadband in the u.s.," facebook, august 16, 2020, 5:37 a.m., https://www.facebook.com/gina.millsap.7/posts/10218986781485855, accessed september 14, 2020.
6 "libraries are filling the homework gap as students head back to school," broadband usa, last modified september 4, 2018, https://broadbandusa.ntia.doc.gov/ntia-blog/libraries-are-filling-homework-gap-students-head-back-school.
7 james k. willcox, "libraries and schools are bridging the digital divide during the coronavirus pandemic," consumer reports, last modified april 29, 2020, https://www.consumerreports.org/technology-telecommunications/libraries-and-schools-bridging-the-digital-divide-during-the-coronavirus-pandemic/.
8 sarah chase webber, "the library's role in bridging the digital divide," urban libraries council, last modified march 28, 2019, https://www.urbanlibraries.org/blog/the-librarys-role-in-bridging-the-digital-divide.
9 cecilia kang, "parking lots have become a digital lifeline," the new york times, may 20, 2020, https://www.nytimes.com/2020/05/05/technology/parking-lots-wifi-coronavirus.html.
10 ry marcattilio-mccracken, "electric cooperatives partner with dominion energy to bring broadband to rural virginia," last modified august 6, 2020, https://muninetworks.org/content/electric-cooperatives-partner-dominion-energy-bring-broadband-rural-virginia.
11 "newsom issues executive order on digital divide," cheac (improving the health of all californians), last modified august 14, 2020, https://cheac.org/2020/08/14/newsom-issues-executive-order-on-digital-divide/.
12 tyler cooper, "bezos and musk's satellite internet could save americans $30b a year," podium: opinion, advice, and analysis by the tnw community, last modified august 24, 2019, https://thenextweb.com/podium/2019/08/24/bezos-and-musks-satellite-internet-could-save-americans-30b-a-year/.
13 lara fishbane and adie tomer, "neighborhood broadband data makes it clear: we need an agenda to fight digital poverty," brookings institution, last modified february 6, 2020, https://www.brookings.edu/blog/the-avenue/2020/02/05/neighborhood-broadband-data-makes-it-clear-we-need-an-agenda-to-fight-digital-poverty/.
14 rashawn ray, "five things john lewis taught us about getting in 'good trouble,'" brookings institution, last modified july 23, 2020, https://www.brookings.edu/blog/how-we-rise/2020/07/23/five-things-john-lewis-taught-us-about-getting-in-good-trouble/.

article

peer reading promotion in university libraries based on a simulation study about readers' opinion seeking in social networks

yiping jiang, xiaobo chi, yan lou, lihua zuo, yeqi chu, and qingyi zhuge

information technology and libraries | march 2021
https://doi.org/10.6017/ital.v40i1.12175

yiping jiang (jyp@zjut.edu.cn) is associate professor in information and library science, zhejiang university of technology, china. xiaobo chi (chixiaobo@zjut.edu.cn) is associate professor in information and library science, zhejiang university of technology, china. yan lou (jljly@zju.edu.cn) is associate professor in administrative department of continuing education, zhejiang university, china. lihua zuo (jljly@zju.edu.cn) is librarian at zhejiang university of technology, china.
yeqi chu (cyq77@zjut.edu.cn) is librarian at zhejiang university of technology, china. qingyi zhuge (beckygoodly@163.com) is librarian at zhejiang university of technology, china. © 2021.

abstract

university libraries use social networks to promote reading; however, there are challenges to increasing the use of these library platforms, such as poor promotion and low reader participation. therefore, these libraries need to find ways of dealing with the behavior characteristics of social network readers. in this study, a simulation experiment was developed to explore the behaviors of readers seeking book reviews and opinions on social networks. the study draws on social network theory to find the causes of students' behavior and how these affect their selection of information. finally, it presents strategies for peer reading promotion in university libraries.

introduction

over the last decade, social media has made an impact on almost every aspect of daily life. university libraries have gradually accepted social media as a way of promoting their services, and almost every university library in the people's republic of china has its own social media accounts. however, there are challenges to increasing libraries' use of social media, such as poor promotion and low reader participation.1 university libraries cannot depend only on promoting reading through readers' unenthusiastic use of social media tools, as constructive engagement with social networks requires users' participation, dominance, and construction.2 therefore, as a baseline, university libraries must take into consideration their readers' social attributes and then make full use of the mutual cooperation and sharing mechanisms between peers so that readers can become more involved in the use of these platforms that promote reading. in the current study, a free simulation was conducted wherein participants were required to complete a preferential choice task while browsing a book review survey that was integrated with social media platforms.3 finally, we provide some suggestions to promote reading.

literature review

university reading promotion

our review of literature on the promotion of university reading reveals three main research perspectives. the first perspective focuses on libraries. there is some evidence to suggest increasing enthusiasm for reading programs within universities.4 rodney detailed the experiences of a library at a small liberal arts university that launched a one book, one community program.5 hou emphasized the dominant position of university reading in reading promotion and put forward specific promotion strategies.6 li et al.
established a subscription digital service system for reading promotion in universities that provided personalized services for users.7 the national resource center for the first-year experience and students in transition hosts a discussion site and has compiled a list of institutions reporting first-year summer reading programs along with a list of book titles used in the programs.8 appalachian state university also has an active university reading program discussion list.9 college reading experience programs have the potential to bring disparate disciplines and college departments together in ways that extend student learning and engagement beyond the classroom. it could be argued that librarians are one group of natural "boundary spanners."10 gustavus adolphus college has compiled a lengthy list of links to universities that participate in the first-year experience. querying this list suggests that there is a growing number of reading programs on college campuses and that librarians are increasingly finding a role in their development and delivery.11

the second perspective focuses on readers. for instance, zhou et al. analyzed users' reading needs and proposed ideas of how universities could promote reading through the use of questionnaires.12 based on self-determination theory, wei et al. constructed an index of students' and teachers' motivation and participation in reading promotions in libraries by using questionnaires. their factor analysis of readers' reading psychology from the aspects of information value, social sharing, interest, cognition, and emotional entertainment concludes that the theme, intelligence, and interactivity of college reading promotions are significant.13 dali made specific recommendations on how to give reading practices in academic libraries a boost and a new direction through the lens of the differentiated nature of readerships on campuses.14

the third perspective focuses on cultural constructions in colleges. boff et al. studied the practical activities they termed library participation in the "campus reading experience" (cre) at two american community colleges and two four-year institutions in the united states.15 their research pointed out the importance of reading promotion activities in the cultural constructions of colleges and universities. moreover, it presented efficient suggestions of how librarians can hold reading promotion activities on campus and how librarians can play a more positive role in presenting reading promotion plans to their administrations. marcoux et al. emphasized the vital status of canadian college libraries in various subject areas and cultural dominance in colleges and insisted that reading promotion should be enforced by bringing together colleges, teachers, and students.16

peer education in university libraries

since the 1970s, university libraries have experimented with making students an extension of reference services and part of established peer instruction services.
for example, at california state university, fresno, student assistants were recruited to work on the reference desk and answer directional and simple reference questions.17 the university of michigan in ann arbor developed its peer information counseling program in 1985 to focus on the retention of minority students.18 the university of wisconsin-parkside and binghamton university in new york employ student peers to provide instructional support.19 all these programs incorporate peer tutoring models developed specifically for their settings. there are many practical projects in which libraries provide instruction to peer readers, such as the peer practical project at wabash college, the student assistant project at valparaiso university, the student consultant project at the university of new mexico, the curriculum consultant project at the university of new hampshire, and the student assistants project at utah state university.20 surveys of targeted students in these programs revealed that students were more likely to ask questions of student assistants than of librarians. descriptions of the training in these projects emphasized knowledge of the library's resources, with little or no explanation of incorporating peer-learning principles.21 by 2010, libraries acknowledged that peer learning provides an opportunity to further solidify the information literacy skills of all students. programs such as the library instruction tutor project at the university of new mexico and research mentors at the university of new hampshire were designed with a focus on taking advantage of the uniqueness of the peer-to-peer relationship, rather than replacing reference services.22 the librat program at california polytechnic state university trained students to provide single-shot information literacy instruction. student endorsement of peer-led sessions provides supportive evidence that participating attendees perceive this type of session as useful and valuable.23 to summarize, university libraries have identified that harnessing the uniqueness of peer relationships is an effective way to engage students in learning.24

social reading promotion

the first published papers regarding the use of facebook by libraries and librarians appeared in 2006.25 up to that time, most scholars defined social reading as sharing theoretical research but had not yet explored its relationship to social media. for instance, zhang et al. suggested that classic books and other resources should be aimed more accurately at potential users, and that the convenience of the connection between a library and its users in the social media environment should be fully utilized in order to understand the personal needs of users.26 yang et al. explored the relationship between users who engage in social reading and reading resources by analyzing it from the perspective of users.27 white et al. explored how social networks promote reading and studying and found that social media can promote users' selection, critiques, and discussions in the reading context, which are part of the process of constructive studying.28 similarly, asteman et al.
investigated the impact of users' discussions on reading participation and reading promotion by using facebook as a research target, and they found that users' discussions on social reading were beneficial to reading, studying, and understanding complex scientific topics.29 some researchers have explored resource recommendation methods and algorithms for social reading. take kochitchi et al.'s research as an example: they built a visual analysis system to construct the relationship between user characteristics and resources by extracting social reading tags and user interaction behavior characteristics.30 huang et al. proposed an efficient method for recommending information among social network groups.31 some scholars put particular emphasis on researching user services. for instance, liu et al. created reading review flows to help improve users' reading ability and optimize the reading experience by analyzing users' needs on social platforms.32 fox subdivided users on social media platforms into three categories (passive, active, and interactive) and summarized that users' standardized behaviors can effectively enhance user interaction.33 yao et al. studied the data gathered from practical activities such as information posts and book retrieval through social reading platforms at the tsinghua university library.34

methods

research questions

existing studies on social reading promotion in libraries have mainly explored how to use social media platforms to promote and develop reading promotion services. however, few scholars have explored the patterns of readers seeking opinions within social networks. this study explored the effects that peers have on each other, as opposed to services provided by university libraries, and addressed the following three questions:
1. do readers value the opinions of peers on social networks?
2. what tendencies do readers exhibit when seeking opinions on different types of literature?
3. how does social capital influence readers' tendencies when those readers are seeking opinions?
social capital refers to the potential value of social relations and includes two key dimensions: structural and relational.35 social structure can be characterized by quantity and configuration.36 with respect to quantity, the more social ties one has the potential to activate, the more information resources can be transferred.37 the configuration of social capital means that it is higher when a network's structure is more sparse.38 relational capital refers to the potential value associated with the quality of social relationships, which are created and embedded by network peers and can be utilized by network friends.39 previous studies have used different social attributes to describe relational capital, including homogeneity, trust, expertise, power, and closeness.40

study design

we used a questionnaire applet and designed a survey of book reviews to explore patterns in the way readers seek information on books through social networks and the factors that influence readers when they adopt others' opinions. as an initial step, we recruited 300 college student participants from 15 colleges and universities. these students, from the wechat group of the eighth national mechanical design competition for college students, expressed interest in the study.
the students were offered three lists of books, categorized as leisure literature, mechanical literature, and information resources utilization literature. there were 10 books in every list, and the books were selected from a 2019 lending list compiled by five colleges and universities. participants were asked to log in to our survey using their wechat credentials. they were required to write reviews for the books that they had read, and they were encouraged to recommend similar literature and write reviews for those books as well. meanwhile, 30 librarians were also invited to write reviews for the books on the lists (see fig. 1).

figure 1. the book review steps.

to keep a representative sample, we adopted stratification and divided participants into non-overlapping homogeneous groups. we set up six groups of similar size according to the number of wechat friends that the readers had within the 300-student sample. readers in the first group did not have any friends. readers in the second group had one or two friends. readers in the third group had three or four friends. readers in the fourth group had five or six friends. readers in the fifth group had seven to eleven friends. readers in the sixth group had twelve or more friends. subsequently, we randomly invited 15 readers from every group to complete the following steps: (1) complete an online questionnaire that measured their relational capital (see table 1 and fig. 2) and (2) consult others' reviews and select books that they intended to read (see fig. 2).

table 1. three measurements of relational capital
professional skills: 1. reading is a part of my life. 2. reading is on my daily to-do list. 3. i go to the library to study. 4. my reading ability is good. 5. reading helps me a lot. 6. i read a lot (at least 10 books a year).
similarity: 1. my friends and i read similar books. 2. my friends and i have similar feelings about reading. 3. recommendations are useful to me.
intimacy: 1. my survey group is trustworthy. 2. others' comments are beneficial to me. 3. i am willing to share my feelings about reading with my friends.
note: a five-point likert scale (strongly disagree, disagree, neutral, agree, strongly agree) was used.

figure 2. selecting the books.

data collection

measurement of readers' behavior when seeking opinions. a book review applet (with wechat's questionnaire function) was incorporated to record the number of times a reader looked at reviews from peers and librarians (see fig. 3).

figure 3. measurement of readers' behavior when seeking opinions. note: when readers wanted to refer to other people's comments, the applet allowed them to choose either classmates' or librarians' reviews. readers could browse the comments through a drop-down menu, and the number of reviews read by the respondents was recorded.
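the applet's log reduces to a tally of which source each browsed review came from and which literature list the book belonged to. the sketch below shows how such a log might be aggregated into the counts reported later in table 5; the event format and the sample records are assumptions for illustration, not the applet's actual data model.

```python
# sketch: aggregating logged "review browsed" events into counts by source and
# literature type, as in table 5. the event format and sample data are invented.
from collections import Counter

# each event: (reader id, literature type, source of the review that was browsed)
events = [
    ("r001", "leisure", "peer"),
    ("r001", "leisure", "peer"),
    ("r002", "mechanical", "librarian"),
    ("r003", "information resources utilization", "peer"),
    ("r003", "information resources utilization", "librarian"),
]

counts = Counter((literature, source) for _, literature, source in events)
totals = Counter(source for _, _, source in events)

for (literature, source), n in sorted(counts.items()):
    # share of all reviews browsed for this literature type that came from this source
    share = n / sum(v for (lit, _), v in counts.items() if lit == literature)
    print(f"{literature:35s} {source:9s} {n:4d} ({share:.1%})")
print("totals:", dict(totals))
```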
measurement of structural capital. the scale of a reader's structural capital was related to the number of wechat friends they had.41 drawing on the extant literature, we computed network sparseness by dividing the network's effective size by the overall size. network effective size is the average number of wechat friends within the sample set of 300 people.

measurement of relational capital. we used the three variables of professional skills, similarity, and intimacy to measure social relationships.42 data were gathered from the online questionnaire (see table 1).

experiment validity

reliability analysis. we performed measurement validity checks on the three variables applied in the measurement of relational capital. table 2 shows evidence of satisfactory convergence and discriminant validity.

table 2. reliability analysis results
cronbach's alpha: .811; standardized cronbach's alpha: .811; number of items: 12

factor analysis. we tested whether it was scientifically meaningful to consider network scale, network sparseness, and relational capital as independent variables in respectively analyzing the two dependent variables, that is, times of seeking peers' opinions and times of referring to librarians' opinions. tables 3 and 4 show the results of the kaiser-meyer-olkin (kmo) spherical test and bartlett's test, which show that factor analysis is suitable and further analysis can be performed.

table 3. kmo spherical test
kmo: .654
bartlett's test: chi-square 314.342, df 10, significance .00

table 4. bartlett's test (initial, extract)
network scale: 1.000, .724
network sparseness: 1.000, .676
relational capital: 1.000, .720
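the reliability figure in table 2 can be reproduced directly from the item-level likert responses. the sketch below computes cronbach's alpha from a small invented respondent-by-item matrix; the data are illustrative only, and the kmo and bartlett statistics in tables 3 and 4 would normally come from a dedicated factor-analysis package rather than this snippet.

```python
# sketch: cronbach's alpha for a set of likert items, computed from an invented
# respondent-by-item matrix (rows = respondents, columns = the 12 items in table 1).
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# invented 5-point likert responses for 30 respondents on 12 items;
# real use would substitute the survey's actual response matrix.
rng = np.random.default_rng(0)
responses = rng.integers(low=1, high=6, size=(30, 12))
print(f"cronbach's alpha: {cronbach_alpha(responses):.3f}")
```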
results

reviews written and browsed

we recorded the number of reviews read by the respondents and then compared the number of times they consulted peers with the number of times they consulted librarians. the results are shown in table 5. in total, readers sought opinions from peers 1,374 times (70.9%) and from librarians 563 times (29.1%). for leisure literature, mechanical literature, and information resources utilization literature, they sought opinions from peers 422 times (85.3%), 519 times (88.3%), and 433 times (50.7%), respectively. from these results, it can be surmised that readers tend to seek opinions from peers.

table 5. comparison of sources consulted by readers seeking opinions (reviews browsed)
readers: leisure 422 (85.3%); mechanical 519 (88.3%); information resources utilization 433 (50.7%); total 1,374 (70.9%)
librarians: leisure 73 (14.7%); mechanical 69 (11.7%); information resources utilization 421 (49.3%); total 563 (29.1%)

we used regression analysis to analyze the relationship between readers' opinion-seeking behavior and social capital. results are shown in table 6. according to the t-test, given the significance level of 0.10, the significance probability of the three variables was less than 0.10 for both the times that readers sought peer opinions and the times that they consulted librarians. table 7 shows the introduction and elimination of variables in the process of stepwise regression. the analysis of seeking peers' opinions eliminated relational capital, and the analysis of referring to librarians' opinions eliminated network sparseness.

table 6. regression analysis results
dependent variable: seeking peers' opinions
(constant): b = 4.864, std. error = 1.334, t = 3.645, sig. = .000
network scale: b = 1.341, std. error = .143, beta = .715, t = 9.393, sig. = .000
network sparseness: b = 3.450, std. error = 1.012, beta = -.245, t = -3.408, sig. = .001
relational capital: b = .094, std. error = .318, beta = .017, t = .295, sig. = .769
dependent variable: seeking librarians' opinions
(constant): b = .292, std. error = 1.943, t = .150, sig. = .881
network scale: b = -1.691, std. error = .208, beta = .909, t = 8.133, sig. = .000
network sparseness: b = -1.312, std. error = 1.474, beta = .094, t = .890, sig. = .376
relational capital: b = -1.385, std. error = .463, beta = -.247, t = -2.990, sig. = .004

table 7. variable introduction and elimination process
dependent variable: seeking peers' opinions. variables entered: network scale, network sparseness; variable removed: relational capital.
dependent variable: seeking librarians' opinions. variables entered: network scale, relational capital; variable removed: network sparseness.

according to the analysis results, we can draw some conclusions. first, the number of times readers sought opinions from peers was in proportion to network sparseness. in other words, the higher the number of readers' online peers, the lower the network sparseness and the more they sought advice from their peers. second, the number of times readers sought librarians' opinions was inversely proportional to their network scale and relational capital. this means that readers tended not to seek opinions from librarians if they had more network peers and more relational capital.

according to further analysis of tables 6 and 7, two equations can be derived:
1. regression equation of readers seeking opinions from peers: y1 = b0 + b1x1 + b2x2, where y1 represents seeking peers' opinions and x1 and x2 represent network scale and network sparseness, respectively, with the estimated coefficients reported in table 6.
2. regression equation of readers seeking opinions from librarians: y2 = c0 + c1x1 + c3x3, where y2 represents seeking librarians' opinions and x1 and x3 represent network scale and relational capital, respectively, again with the estimated coefficients reported in table 6.
based on the two regression equations, it can be observed that the number of peers' reviews consulted increases by 13.9% for each point increase in network scale and by 35.7% for each additional unit of network sparseness. the number of librarians' reviews consulted decreases by 60.7% for each point increase in network scale and by 70.7% for each additional unit of relational capital.
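the regressions summarized in tables 6 and 7 can be approximated with any ordinary least squares routine once the three predictors are assembled per reader. the sketch below uses statsmodels on invented data, with a single backward-elimination step standing in for full stepwise selection; the variable names, threshold, and data are assumptions for illustration, not the study's actual analysis.

```python
# sketch: regressing times-seeking-peer-opinions on network scale, network sparseness,
# and relational capital, then dropping the least significant predictor (a crude
# stand-in for stepwise selection). the data frame below is invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 90
df = pd.DataFrame({
    "network_scale": rng.integers(0, 15, n),          # number of wechat friends in the sample
    "network_sparseness": rng.uniform(0.2, 1.0, n),   # effective size / overall size
    "relational_capital": rng.uniform(1, 5, n),       # mean of the likert items in table 1
})
# invented outcome loosely tied to two of the predictors plus noise
df["peer_lookups"] = (5 + 1.3 * df["network_scale"]
                      - 3.0 * df["network_sparseness"]
                      + rng.normal(0, 2, n))

predictors = ["network_scale", "network_sparseness", "relational_capital"]
X = sm.add_constant(df[predictors])
model = sm.OLS(df["peer_lookups"], X).fit()
print(model.params.round(3))
print(model.pvalues.round(3))

# drop the predictor with the largest p-value if it is not significant at 0.10
weakest = model.pvalues.drop("const").idxmax()
if model.pvalues[weakest] > 0.10:
    predictors.remove(weakest)
    model = sm.OLS(df["peer_lookups"], sm.add_constant(df[predictors])).fit()
    print(f"removed {weakest}; refit coefficients:")
    print(model.params.round(3))
```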
discussion

the value of peers

according to the results in table 5, readers tend to seek opinions from peers. this is mainly because familiar information sources can provide more diagnostic help.43 meanwhile, the cognitive effort required to process such information is lower, and the information is easier to understand. research on peer education shows similar findings. jean piaget and lev vygotsky found that it is easier to build partnerships among children than between children and adults. moreover, children are more willing to negotiate and theorize with partners who are not authoritative. in this social media age, the value of peers is even more pronounced. therefore, libraries should recognize this, recruit influential readers for reading promotion, and utilize the influence of peer social networks to spread information related to the promotion of reading.

opinion seeking tendency of different types of literature information

the participants in this study were university students majoring in mechanical engineering disciplines across several different universities. this means that they had similar backgrounds, experiences, and feelings as they participated in the mechanical design competition. under these circumstances, it would be expected that they had close peer relationships. behavioral science research shows that if the communicator and the receiver have similar experiences, are concerned about similar things, and face similar problems, the receiver is more likely to accept information from the communicator. this viewpoint is consistent with the standpoint proposed by psychological models in which relevant sources of information are more frequently activated.44 therefore, the result that readers are more willing to seek opinions from peers is supported. the result regarding information resources utilization literature shows similar numbers of opinions sought from peers and from librarians, with a far larger share of opinion seeking directed to librarians than for the other two types of literature. this may be because libraries are the literature and information resource centers of universities, and readers trust the professional abilities of librarians and are willing to seek their help with regard to resource utilization. the results indicate that librarians should make efforts to promote the use of information resources.

seeking tendency of readers with different social capital

in the process of decision-making, readers will look for homogeneous and credible people to assist them in their search for and evaluation of information.45 identity, experience, reading level, and taste in reading are contributing factors to credibility. the more partners a reader has, the more relational capital; if readers have more trustworthy sources of information, they will not turn to librarians for their opinions. information searching is a dynamic and adaptive process. when readers find information that is novel, it has value in decision-making. conversely, when the information is redundant (not novel), it may cause the seeker to stop searching. it is widely acknowledged that sparse social networks reduce the possibility of information redundancy.46 therefore, the sparser the social network, the more likely it is that readers will need peer help.

conclusions

in this age of social media, university students are accustomed to using social networks. seeking information that complies with their psychological needs is more significant and valuable to them than the value of the information itself. therefore, libraries should make full use of peer influences when employing social media in reading promotion activities. first, university libraries ought to realize their great potential for involving students within the social flow to participate in reading promotion activities. in the digital age, readers' consciousness is repeatedly awakened. individuality, advocating for information freedom, and improving the flow of information mean that readers are no longer satisfied with passively receiving information and are more willing to actively search out and read information. meanwhile, sharing and interaction can fully meet the desires of individuals to share and communicate as well as meet their psychological need of realizing their self-worth.
libraries must understand the characteristics of contemporary university students' information needs and create a space for readers to take the initiative. only in this way can readers be more than passive recipients of information; they can also be pushers and disseminators of information. bolder attempts at innovation should be applied to reading promotions. in this research, an analysis and exploration were conducted based on the literature, indicating that readers prefer to seek opinions from their own social networks. therefore, the library can make full use of readers' social network groups when promoting the library's literature resources. likewise, other services and activities provided by libraries can be promoted through readers' social networks. for instance, libraries can invite student volunteers to take part in a new service before launching and then invite them to share their feelings and evaluations through social networks. these methods are more efficient than traditional flyer notification.

last but not least, when organizing reading promotion activities, libraries should stay behind the scenes. university libraries can establish a set of systematic peer reading promotion rules, including recruitment, training, and management systems, to build a wide and influential reading promotion student volunteer team on social networks. libraries should, however, strengthen the monitoring of peer reading promotions to prevent negative influences caused by harmful information on social media.47 they should pay special attention to the control of social opinion by using the reader volunteer team. in addition, through the monitoring and analysis of data, the strategy and direction of reading promotions can be adjusted over time to improve their pertinence and effectiveness. moreover, libraries should strengthen the effective evaluation of peer reading promotion projects. readers should be involved in the systematic readjustment of traditional reading promotion methods. innovative methods need to be tested in practice so that libraries can strengthen the effective evaluation of peer projects.

limitations and future research

there are some limitations in the methodology design, theoretical scope, empirical context, and research perspective of the current study. addressing these limitations can also provide direction for further research. in the methodology, although the cross-sectional sampling method is often adopted, it makes it difficult for this research to disentangle the roles of readers' social capital from those of opinion seeking. this study explored the correlations between several variables, which need to be further investigated by introducing control variables to fully examine the interactions between the variables. in theory, this research focused on analyzing the pattern by which readers seek opinions through social networks, which reflects the behavior observed in information searching and browsing. readers then need to decide whether to use the information they find. these two behaviors are related and, based on this research, we need to further explore readers' adoption behavior to better guide reader service work in libraries. in the empirical analysis, it is worth further considering how the various efforts undertaken by university libraries have promoted information channels.
the participants in this study were university students majoring in mechanical engineering disciplines. however, students in different majors may exhibit different information selection behavior, and this deserves further analysis and exploration. this study compared the influence of peers' and librarians' opinions on readers. however, how do readers feel about opinions from peers as compared to those from librarians? what is the impact of students preferring to use their social networks instead of librarians for information retrieval? is there a difference in the adoption of peer opinions by readers in different social media contexts? these questions deserve further study to fully understand the impact of social networks on reader opinion-seeking behavior.

acknowledgements

this work was supported by the humanities and social sciences research fund of the chinese ministry of education [grant number 17yja870003] and the philosophy and social science fund of zhejiang province [grant number 21ndjc039yb].

endnotes

1 shi-man tang, "research on the reading promotion model and implementation path of university library based on social media platform" (master's thesis, university of jilin, 2013): 90–111.

2 yao qi, hua-wei ma, huan yan, and qi chen, "analysis of social network users' online behavior from the perspective of psychology," advances in psychological science 22, no. 10 (2014): 1647–59, https://doi.org/10.3724/sp.j.1042.2014.01647.

3 david gefen, elena karahanna, and detmar w. straub, "trust and tam in online shopping: an integrated model," mis quarterly 27, no. 1 (march 2003): 51–90, https://doi.org/10.2307/30036519.

4 colleen boff, robert schroeder, carol letson, and joy gambill, "building uncommon community with a common book: the role of librarians as collaborators and contributors to campus reading programs," research strategies 20 (2007): 272–83, https://doi.org/10.1016/j.resstr.2006.12.004.

5 mae l. rodney, "building community partnerships: the 'one book one community' experience," c&rl news 65, no. 3 (march 2004): 130–32, https://doi.org/10.5860/crln.65.3.130.

6 ai-hua hou, "analysis and research on the reading promotion strategy of university library," lifelong education 9, no. 5 (2020), https://doi.org/10.18282/le.v9i5.1251.

7 mei-ning li, tian-zi zhao, xu guan, and xin-hua chen, "study on building digital service system of 'subscription' reading promotion for university library," library and information service 62, no. 18 (2018): 77–82, https://doi.org/10.13266/j.issn.0252-3116.2018.18.008.

8 boff, "building uncommon community with a common book," 271–83.

9 kim becnel et al., "'somebody signed me up': north carolina fourth-graders' perceptions of summer reading programs," children & libraries: the journal of the association for library service to children 15, no. 3 (2017): 3–8, https://doi.org/10.5860/cal.15.3.3.

10 rodney, "building community partnerships," 130–32, 155.

11 boff, "building uncommon community with a common book," 271–83.

12 mu-chen wan and liang ou, "the empirical research of university libraries reading promotion effect based on the wechat public platform," library and information service 60, no. 22 (2015): 72–78, https://doi.org/10.13266/j.issn.0252-3116.2015.22.011.
13 xiao-li wei, yi-ming mi, and fang sheng, "motivation measurement of university library's participation in the reading promotion based on self-determination theory," journal of library and information science 10 (2018): 1–8.

14 dali keren and lindsay mcniff, "reading work as a diversity practice: a differentiated approach to reading promotion in academic libraries in north america," journal of librarianship and information science 52, no. 4 (february 2020): 1050–62, https://doi.org/10.1177/0961000620902247.

15 boff, "building uncommon community with a common book," 271–83.

16 elizabeth betty marcoux and d. v. loertscher, "the role of a school library in a school's reading program," teacher librarian 37, no. 1 (2009): 8, 10–14, 84.

17 brett b. bodemer, "they can and they should: undergraduates providing peer reference and instruction," college & research libraries 75, no. 2 (2014): 162–78, https://doi.org/10.5860/crl12-411.

18 barbara macadam and darlene p. nichols, "peer information counseling: an academic library program for minority students," journal of academic librarianship 15, no. 4 (1989): 204–9, https://doi.org/10.1016/0268-4012(89)90012-1.

19 turkey alzahrani and melinda leko, "the effects of peer tutoring on the reading comprehension performance of secondary students with disabilities: a systematic review," reading & writing quarterly (april 2017): 1–17, https://doi.org/10.1080/10573569.2017.1302372.

20 bodemer, "they can and they should," 162–78; ruth sara connell and patricia j. mileham, "student assistant training in a small academic library," public services quarterly 2, no. 2–3 (2006): 69–84, https://doi.org/10.1300/j295v02n02_06; michael m. smith and leslie j. reynolds, "the street team: an unconventional peer program for undergraduates," library management 29, no. 3 (2008): 145–58, https://doi.org/10.1108/01435120810855287; gail fensom et al., "navigating research waters: the research mentor program at the university of new hampshire at manchester," college & undergraduate libraries 13, no. 2 (2006): 49–74, https://doi.org/10.1300/j106v13n02_05; wendy holliday and c. nordgren, "extending the reach of librarians: library peer mentor program at utah state university," college & research libraries news 66, no. 4 (2005), https://doi.org/10.5860/crln.66.4.7422.

21 mary o'kelly, julie garrison, brian merry, and jennifer torreano, "building a peer-learning service for students in an academic library," libraries and the academy 15, no. 1 (2015): 163–82, https://doi.org/10.1353/pla.2015.0000.

22 fensom et al., "navigating research waters," 49–74.

23 bodemer, "they can and they should," 162–78.

24 ling-jie yao, "peer education: a new mode of university library services," library development 12 (2012): 57–59.

25 jamie m. graham, allison faix, and lisa hartman, "crashing the facebook party: one library's experiences in the students' domain," library review 58, no. 3 (2009): 228–36, https://doi.org/10.1108/00242530910942072.
26 yue-qun zhang and chun-ning li, "change of library role in knowledge transfer in social network environment and countermeasures," library and information 166, no. 6 (2015): 107–12.

27 yi yang and ji-qing sun, "professional reading habits correlation research based on the social network theory," new century library 70, no. 10 (2012): 81, 91–92, https://doi.org/10.16810/j.cnki.1672-514x.2012.10.024.

28 john wesley white and holly hungerford-kresser, "character journaling through social networks," journal of adolescent & adult literacy 57, no. 8 (2014): 642–54, https://doi.org/10.1002/jaal.306.

29 christa s. c. asterhan and rakheli hever, "learning from reading argumentative group discussions in facebook," computers in human behavior, no. 53 (2015): 570–76, https://doi.org/10.1016/j.chb.2015.05.020.

30 a. kochtchi, t. v. landesberger, and c. biemann, "networks of names: visual exploration and semi-automatic tagging of social networks from newspaper articles," computer graphics forum 33, no. 3 (2014): 211–20, https://doi.org/10.1111/cgf.12377.

31 zhen-hua huang, bo zhang, qiang fang, and yang xiang, "an efficient algorithm of information recommendation between groups in social networks," acta electronica sinica 43, no. 6 (2015): 1090–93.

32 cheng-ying liu, ming-syan chen, and chi-yao, "incrests: towards real-time incremental short text summarization on comment streams from social network services," ieee transactions on knowledge and data engineering 27, no. 11 (2015): 2986–3000, https://doi.org/10.1109/tkde.2015.2405553.

33 jesse fox and courtney anderegg, "romantic relationship stages and social networking sites: uncertainty reduction strategies and perceived relational norms on facebook," cyberpsychology behavior & social networking 17, no. 11 (2014): 685–91, https://doi.org/10.1089/cyber.2014.0232.

34 fei yao, cheng-yu zhang, wu chen, and tian-fang dou, "study on integrating library services into social network sites: taking the book club of tsinghua university library as a practice example," library journal 30, no. 6 (2011): 24–28, https://doi.org/10.13663/j.cnki.lj.2011.06.014.

35 lin nan, "social capital: a theory of social structure and action" (cambridge: cambridge university press, 2001); paul s. adler and seok-woo kwon, "social capital: prospects for a new concept," academy of management review 27, no. 1 (2002): 17–40, https://doi.org/10.5465/amr.2002.5922314; peter moran, "structural vs. relational embeddedness: social capital and managerial performance," strategic management journal 26, no. 12 (2005): 1129–51, https://doi.org/10.1002/smj.486.

36 peter h. gray, s. parise, and b. iyer, "innovation impacts of using social bookmarking systems," mis quarterly 35, no. 3 (2011): 629–43, https://doi.org/10.1002/asi.21581.

37 linton c. freeman, "centrality in social networks' conceptual clarification," social networks (1978), https://doi.org/10.1016/0378-8733(78)90021-7; stephen p. borgatti, "centrality and network flow," social networks 27, no. 1 (2005): 55–71, https://doi.org/10.1016/j.socnet.2004.11.008.

38 ronald s. burt, "structural holes: the social structure of competition" (cambridge: harvard university press, 1992).

39 adler and kwon, "social capital," 17–40.

40 peter v. marsden and k. e. campbell, "reflections on conceptualizing and measuring tie strength," social forces 91, no. 1 (2012): 17–23, https://doi.org/10.1093/sf/sos112tti; stephen p. borgatti and r. cross, "a relational view of information seeking and learning in social networks," management science 49, no. 4 (2003): 432–45, https://doi.org/10.1287/mnsc.49.4.432.14428; peter moran, "structural vs. relational embeddedness," 1129–51; mesch gustavo and i. talmud, "the quality of online and offline relationships: the role of multiplexity and duration of social relationships," the information society 22, no. 3 (2006): 137–48, https://doi.org/10.1080/01972240600677805.

41 camille grange and i. benbasat, "opinion seeking in a social network-enabled product review website: a study of word-of-mouth in the era of digital social networks," social science electronic publishing 27, no. 6 (2018): 629–53, https://doi.org/10.2139/ssrn.2993427.

42 borgatti and cross, "a relational view of information seeking and learning in social networks," 432–45; gustavo and talmud, "the quality of online and offline relationships," 137–48; fox and anderegg, "romantic relationship stages and social networking sites," 685–91; gray, parise, and iyer, "innovation impacts of using social bookmarking systems," 629–43.

43 david gefen, "e-commerce: the role of familiarity and trust," omega 28, no. 6 (2000): 725–37, https://doi.org/10.1016/s0305-0483(00)00021-9.

44 tam kar yan and s. y. ho, "understanding the impact of web personalization on user information processing and decision outcomes," mis quarterly 30, no. 4 (2006): 865–90, https://doi.org/10.2307/25148757.

45 jacqueline johnson brown and peter h. reingen, "social ties and word-of-mouth referral behavior," journal of consumer research 14, no. 3 (december 1987): 350–62, https://doi.org/10.1086/209118.

46 glenn j. browne, mitzi g. pitts, and james c. wetherbe, "cognitive stopping rules for terminating information search in online tasks," mis quarterly 31 (march 2007): 89–104, https://doi.org/10.2307/25148782.

47 yue long and yi-yang liu, "propagation characteristics and paths of negative network public opinions in colleges under the new media environment," information science 37, no. 12 (2019): 134–39, https://doi.org/10.13833/j.issn.1007-7634.2019.12.022.
article

decision-making in the selection, procurement, and implementation of alma/primo: the customer perspective

jin xiu guo and gordon xu

information technology and libraries | march 2023
https://doi.org/10.6017/ital.v42i1.15599

jin xiu guo (jiguo@fiu.edu) is associate dean for technical services, florida international university. gordon xu (gordon.xu@njit.edu) is associate university librarian for collections & information technology, new jersey institute of technology. © 2023.

abstract

this case study examines the decision-making process of library leaders and administrators in the selection, procurement, and implementation of ex libris alma/primo as their library services platform (lsp). the authors conducted a survey of libraries and library consortia in canada and the united states who have implemented or plan to implement alma. the results show that most libraries use both request for information (rfi) and request for proposal (rfp) in their system selection process, but the vendor-offered training is insufficient for effective operation. one-third of the libraries surveyed are considering switching to open-source options for their next automation system. these insights can benefit libraries and library consortia in improving their technological readiness and decision-making processes.

introduction

with the exponential growth of digital information, libraries have been seeking innovative systems to manage electronic resources and provide collection services. the next-generation integrated library system (ils) should address both current challenges and future demands. with that in mind, new cloud-based commercial products have come into the market in recent years. ex libris alma, oclc worldshare, and innovative sierra are often referred to as library service platforms (lsps) compared to a client-based ils. whichever of these new products a library chooses, selecting and implementing a new system is no small task. studies show that libraries might overlook the capacity of an ils to accommodate many functions and make a tough choice between sticking with the current vendor or switching to another before investing time and resources to migrate to a completely new system.1 libraries do not always make these kinds of decisions in a rational manner, that is, one that involves clearly defining the problem, identifying and evaluating potential options, weighing the pros and cons of each option, considering an organization's values, goals, and preferences, making a choice based on a systematic analysis, and continuously reassessing and adjusting the decision as new information becomes available.
as a result, a selected system might not be the best fit for a library's actual needs.2 library consortia also face a similar challenge, but in a more complex context. for example, cost sharing, level of collaboration, and integration with other library applications can be quite different for a small library than for a large research library. additionally, the requirements for security and scalability can vary among consortial members. ninety-four percent of academic libraries that migrated their systems to alma in 2018 did so by joining a consortium.3 at a consortial level, managing a system migration project adds a significant challenge because of the competing, often conflicting desires of constituent institutions.

budgeting for a migration project needs to be secured before the project takes place. the one-time migration cost has a huge impact on a library's decision on a new system. lengthy procurement processes mean that it can take a year to communicate requirements, solicit bids, and make a final decision. libraries also wonder if they should acquire such a new system through a consortial deal or on their own. a successful implementation of a new system starts with making a sound choice.

the system migration project encompasses various technological and management decisions made by project managers, team leaders, and library administrators. decisions about data cleanup, migration mapping, system configuration, communication, and training can have a tremendous impact on project outcomes, staffing, existing workflows, and job functions and responsibilities. in the meantime, the project itself also provides libraries a great opportunity to improve the existing operational and staffing model and to adjust their strategy for managing technological and organizational change.

there are few studies on decision-making in the alma/primo selection, procurement, and migration from the user's perspective. alma is a cloud-based library management system that helps libraries manage, deliver, and discover digital and physical resources. it offers functionalities such as resource discovery, resource management, resource sharing, and analytics. primo ve is a next-generation library discovery platform that provides users with access to a central index of the library's collections. it offers a personalized and intuitive search experience, with features such as faceted searching, saved searches, and item recommendations. both alma and primo ve are ex libris products. this case study fills the gap and provides a better understanding of how american and canadian library leaders and administrators make decisions for their libraries and consortia. the pairing of ex libris's alma and primo products has become a widely accepted next-generation system due to its cloud-based model for managing both electronic and print resources. the findings of this study offer insights and lessons learned to help library leaders and administrators make better decisions on future technological change.

literature review

the growing user demand for electronic resources over the last decade has led libraries to make a rapid digital transformation to manage and deliver online library services. consequently, system providers are eager to develop next-generation library systems. organizations have started to adopt cloud computing as their infrastructure.
a benefit of cloud computing is that local it staff no longer need to handle hardware failures and software installation. cloud computing streamlines processes and saves time and money. additionally, cloud computing not only enables libraries to deliver resources and services within a network and a library community but also frees libraries from managing technology so they can focus on collection building, service improvement, and innovation. therefore, libraries have started to migrate their client-based integrated library systems (ils) to cloud-based next-generation systems, often referred to as lsps. these lsps can be connected with other web applications, increase collection visibility and accessibility, streamline workflows, reduce duplication of staffing and collections, and create a greener ecosystem for organizations.4

library consortia have been playing vital roles in resource sharing, cooperative purchasing, discovery, user experience, and technical support. many libraries migrate to a shared next-generation ils or lsp by joining a consortium. besides sharing common needs, participating libraries are quite different with respect to their sizes, the kinds and numbers of resources they provide, services, priorities, and staffing. although this could pose some challenges for participating libraries, such as cost sharing, workflow design, policy, and a collaboration model, libraries still benefit greatly from the shared catalog and enhanced metadata as well as cooperation on a global level through product communities such as eluna and igelu.5

the selection of a new system is not a small decision. calvert and read pointed out that some libraries turned to the "sheep syndrome" of selecting what other libraries have bought due to the lack of software knowledge.6 their study suggested that a request for proposal (rfp) could be a part of the lsp selection process by providing a consistent set of vendor responses with a narrow scope, a formal statement of requirements for benchmarking, and a mechanism for vendors to compete. gallagher advised considering existing contracts, financial resources, and rfps before beginning a system assessment. he indicated that the expiration date of the current ils and opt-out clauses of the existing contract could be indicators of a go-live date. a price quote including a one-time implementation fee and a cost-benefit analysis of the current ecosystem compared to the vendor offer could provide a helpful document that envisions future library services.7 in addition to an rfp, yang and venable also considered the library automation marketplace and the needs of their own library when migrating from sirsidynix symphony to alma/primo.8

gallaway and hines embraced competitive usability techniques to test a set of standard tasks across multiple systems by using focus groups at loyola university new orleans to select a next-generation system.9 they also collected anecdotal information and feedback on the system performance of the current library online catalog through a survey of library staff. this evidence-based decision-making process makes system selection more rational. manifold, on the other hand, proposed a principled approach to selecting a new lsp. he believed that system selection was a part of the continuing process of organizational change and needed to involve library staff and users throughout the process.
today's lsp systems can connect almost the entire range of library operations, from resource management and acquisitions to user request fulfillment and the integration of subject guides on research, teaching, and learning. a system migration is much more than just a move to a new system; instead, it is a transfer to a new culture. he suggested the acquisitions process must start with educating participants on the features of various systems, methods of vendor assessment, the rules of contract negotiation, communication, and stress management. the success of system selection and implementation should be measured over the life span of the system to guide new decisions along the way.10

in addition to commercial products, some libraries are acquiring open-source software (oss) that enables them to have greater control over customization. the potential benefits of oss include cost effectiveness, interoperability, user friendliness, reliability, stability, auditability, and customization. koha, evergreen, folio, abcd, winisis, newgenlib, emilda, pmb (phpmybibli), and weblis are examples of oss ils/lsp products on the market.11 when selecting and implementing an oss solution, small libraries such as the paine college collins-callaway library, with a limited budget and small staff, chose a hosted open-source ils (koha) to obtain specific expertise and services at a reasonable price.12

once a system is selected, the implementation process itself can be critical to the perception of overall system success. lovins expressed concern about choosing a project management approach that is schedule-driven over one that is results-driven. he also recommended organizing implementation activities around the incoming system functionality. for a consortium-wide system migration, a "train-the-trainer" strategy was adopted in the training program, which mostly offered demonstrations instead of instruction to future trainers.13 the program hardly met libraries' expectations for training.

active staff participation in a system migration is key to project success. banerjee and middleton reported that when library staff owned the migration process, fewer mistakes and greater satisfaction with the new system, as well as quicker troubleshooting of problems that did arise as a result of the migration, were observed.14 avery shared that the god's bible college libraries did an informal pre- and post-assessment of library users and staff to gather feedback on both the legacy and target ils. he recommended conducting a formalized pre- and post-evaluation of user satisfaction with the ils.15 stewart and morrison observed that acquisitions workflows in a shared alma environment must balance required consortial needs with local policies and procedures. the unmet training needs and the lack of an electronic resources management (erm) module in alma presented challenges for library staff in developing and managing alma workflows. they argued that a two-year project cycle was very ambitious, especially if the consortium was large and its individual libraries varied widely.16 when migrating from horizon to symphony (both are sirsidynix products), king fahd university of petroleum and minerals, based in dhahran, saudi arabia, experienced a delayed implementation.
some unmet needs, such as a dramatic shift of workflows, user interface customization, and training support by a system provider or its parent company not matched by a local vendor, became hurdles for this project.17

although a new lsp, whether alma/primo or an oss product, empowers libraries to create unified workflows across functional modules, this feature requires a system user to have cross-functional roles to conduct these activities.18 when migrating from non-ex libris product lines to alma/primo, libraries may need to make tough implementation decisions. for example, the university of south carolina migrated library data to alma/primo from innovative's millennium and ebsco's full text finder. when the legacy and target products are from different vendors, the system migration can be more complicated in communication, data mapping, data quality, and expected results of data migration. for the usc library, the preexisting duplicate records for electronic resources should have been cleaned up before the migration.19

libraries should address their concerns about key activities during the implementation to get the best possible result. the joint bank fund library had a three-day onsite training in workflows in the middle of the project. it would have been much more effective if the library had communicated with the vendor to reschedule the training at a later stage of the migration, because library staff were not yet familiar with the lsp by the expected time.20 the university of north carolina at charlotte migrated from oclc's worldshare management services (wms) to alma/primo after migrating from millennium to wms four and a half years previously. the atkins library went through the second system migration because wms modules did not meet the library's needs. going through two system migrations in the span of five years was particularly costly, and frustrated technical services staff spent more than half of their work time on data cleanup. additional time for data cleaning, workflow design, and training was also needed after the migration to alma.21

fu and fitzgerald studied the effect of lsp staffing models for library systems and technical services by analyzing the software architecture, workflows, and functionality of voyager and millennium against those realigned in alma, wms (worldshare management services), and innovative sierra. they discovered that the workload of systems staff could be reduced by around 40 percent, so library systems staff could have additional time to focus on local applications development, the discovery interface, and system integration. meanwhile, the functionality of the next-generation ils provides a centralized data services platform to manage all types of library assets with unified workflows. consequently, libraries could streamline and automate workflows for both physical and electronic resources through systems integration and enhanced functionality. this change requires libraries to reconsider their staffing models, redefine job descriptions, and even reorganize the library structure to leverage the benefits of a new lsp.22 western michigan university (wmu) decided to reorganize its technical services department after the alma migration was completed in 2015. after the alma implementation, it was observed that staff spent 38 percent less time working with physical materials.
the systems department also shifted its focus from back-end system support to front-end user support and other new technologies. wmu consolidated fourteen departments into six and renamed technical services to resource management, composed of cataloging and metadata, collections and stacks, and electronic resources. the lsp administration was shared by four certified alma administrators and one discovery administrator residing in the resource management department.23

although researchers and library practitioners have studied ils selection and implementation processes and the impact of migration on library operation and staffing, only the studies on the rfp and usability testing have focused on decision-making in ils selection. today, library administrators and leaders face technological change more often while making a transformation to a digital business model. they should understand how decisions are made at different organizational levels when managing change. this study fills this gap and helps library administrators and leaders better prepare for future change through the following research questions:

• what is the decision-making process and what do libraries consider?
• how do libraries evaluate the migration project?
• what are the impacts of the system migration on library staffing and operation?
• what lessons have libraries learned from the system migration?
• what will libraries do differently for a future system migration?

methods

researchers have adopted both qualitative and quantitative methods for studies about system migration. the literature indicates that both interviews and surveys have been employed to collect data for these studies.24 usability testing through a set of tasks across systems has also been utilized in a system selection.25 a comparative analysis of vendor documents, rfp responses, and webinars has been applied in studying the impact of system migration on staffing models.26 in this research, the authors used a qualitative method through a survey to understand decision-making in system selection, procurement, and implementation.

data collection

the population for this study is those libraries that implemented or are planning to implement alma. through the eluna membership management site (https://eluna40.wildapricot.org/), the authors identified 1,440 libraries in the united states and canada that use at least one ex libris product. with help from sue julich at university of iowa libraries, who manages the site, 1,150 alma libraries were identified. the authors also contacted marshall breeding, the founder and publisher of library technology guides (https://librarytechnology.org/), and obtained a list of 1,134 alma libraries in the united states and canada. comparing the alma libraries acquired from the two different sources, they eventually identified 1,079 libraries from the united states and 55 libraries from canada as eligible survey-participating libraries.

the authors developed a 13-question survey in qualtrics. this questionnaire aimed to help participants recall the project experience and offer them an opportunity to self-reflect and give feedback. the survey was distributed via email to the eligible libraries. a few email reminders were sent out to encourage participation. upon the closure of the survey, 291 libraries (27%) completed the survey in full.
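to make the quantification step concrete, the following is a minimal sketch (with hypothetical category labels and counts, not the authors' actual data or code) of how manually coded open-ended responses of the kind described in the data analysis section below can be tallied into the percentage tables reported in the findings:

import pandas as pd

# hypothetical coded responses to one open-ended question; in the study the
# coding was done manually and the tallies were produced in qualtrics/excel
coded_responses = pd.Series([
    "training", "communication", "data cleanup", "training",
    "implementation process", "training", "communication",
])

# count each category and express it as a percentage of all responses,
# mirroring the percentage tables reported in the findings
summary = (coded_responses.value_counts(normalize=True) * 100).round(1)
print(summary.to_string())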
data analysis

qualtrics generates data analysis and reports. the authors also conducted a text analysis by manually categorizing responses to the open-ended survey questions to clarify the characteristics of each response, and then presented and analyzed the data in microsoft excel.

findings

part i: library profile & background information

the participating libraries have diverse profiles in terms of size and geographic location, reflecting points of view ranging from small libraries to library consortia. remarkably, during the survey, the authors received requests for a complete survey questionnaire so that respondents could coordinate and provide complete and accurate data on behalf of their libraries.

respondents

the majority of the respondents in this survey were deans, directors of the library or university librarians, and system librarians (see table 1). respondents holding a wide variety of other position titles across cataloging, acquisitions, technical support, and reference also participated in the survey (see table 2).

participating libraries' geographic location

the participating libraries were located in the united states and canada, and the majority of them were american libraries (see table 3). the american libraries were distributed across 36 states, while the canadian libraries came from 4 provinces.

table 1. the position titles of the respondents
dean/director of the library/university librarian: 35%
system librarian: 23%
other: 42%

table 2. the other position titles of the respondents
assessment librarian; asset management librarian; assistant director; associate dean; associate director; associate law librarian; associate university librarian; cataloging and metadata librarian; cataloging librarian; collections librarian; consortial executive director; deputy director of the library; director of library systems; director of library technology services; director of technical services; electronic resources librarian; head librarian; head of acquisitions; head of collection management; head of library systems; head of library technology services; head of metadata and cataloging; head of technical services; ils coordinator; instructional technology librarian; lead librarian; library technician; library technology manager; manager of archives & access services; manager of digital services; manager of technical support; metadata librarian; project director; public services librarian; reference librarian/webmaster; resource description and access librarian; solutions architect, alma implementation project manager; supervisor for access services; technical services and instruction librarian; technical services librarian; technical services section head; technology manager

table 3. the geographic locations of the libraries
united states: 92%
canada: 8%

library size

the libraries served a wide variety of student populations, ranging from less than 1,000 to over 50,000 students (see table 4). the smallest library had only 199 students while the largest library system or consortium had 482,000. the number of employees in those institutions ranged from less than 1,000 employees to over 20,000 faculty and staff (see table 5).
the smallest institution may only have 10 employees, while there were three larger institutions with over 50,000 faculty and staff.

table 4. student population (number of ftes)
<1,000: 6%
1,000–1,999: 14%
2,000–2,999: 10%
3,000–3,999: 8%
4,000–4,999: 4%
5,000–5,999: 6%
6,000–6,999: 4%
7,000–7,999: 6%
8,000–8,999: 4%
9,000–9,999: 1%
10,000–14,999: 9%
15,000–19,999: 8%
20,000–29,999: 6%
30,000–39,999: 5%
40,000–49,999: 3%
50,000+: 4%

table 5. faculty and staff population (number of ftes)
<100: 9%
100–499: 25%
500–1,000: 17%
1,000–1,999: 14%
2,000–2,999: 7%
3,000–4,999: 12%
5,000–9,999: 9%
10,000–19,999: 4%
20,000+: 5%

library type

the majority of the libraries were single campus libraries; some were part of a multicampus library system or consortium libraries (see table 6). the other types of libraries may include single campus libraries serving more than one institution or location, central offices of a consortium, part of a statewide system, or independent libraries involved in consortium purchase and implementation of alma.

table 6. library type
single campus library: 45%
part of a multicampus library system: 24%
part of a consortium: 26%
other: 5%

previous integrated library system (ils)

the majority of previous ilss used by the participating libraries were voyager, aleph, millennium, and sierra (see table 7), and their vendors were ex libris, innovative interfaces, inc., and sirsidynix (see table 8). thirty-seven percent of libraries reported that they had used their previous ils over 20 years before they planned to migrate or migrated to alma (see table 9). also, one-fifth of libraries indicated that prior to alma, it was their first time to adopt an ils. therefore, this was their only experience in system migration (see table 10). all libraries used cataloging, circulation, and opac modules in their previous ilss, and they also used other modules (see tables 11 and 12).

table 7. the previous ilss
voyager: 29%
aleph: 24%
millennium: 16%
sierra: 12%
symphony: 6%
worldshare management services: 3%
horizon: 2%
workflows: 2%
tlc: 1%
clio: 1%
evergreen: 1%
surpass: 1%
the library corporation: 1%
other: 3%

table 8. the previous system vendors
ex libris: 49%
innovative interfaces, inc.: 28%
sirsidynix: 11%
oclc: 4%
endeavor: 1%
tlc: 1%
surpass: 1%
the library corporation: 1%
other: 5%

table 9. years with the previous systems
3: 1%
4: 1%
5–9: 7%
10–14: 18%
15–19: 27%
20+: 37%
unknown: 9%

table 10. whether the previous systems were the first ilss
no: 72%
yes: 20%
unknown: 7%

table 11. modules used in previous ilss
cataloging: 100%
circulation: 100%
opac: 100%
serials: 77%
acquisitions: 76%
course reserves: 64%
interlibrary loan: 28%
other: 9%

table 12.
other modules used in previous ilss
analytics
booking
course reserves
discovery system
electronic resource management
ereserves
inn-reach
licensing

part ii: implementation process

alma modules/functions

the majority of libraries reported that they will implement or have implemented the following alma modules: fulfillment, primo/primo ve, resource management, and acquisitions (see table 13). some libraries mentioned that they also used summon to replace primo/primo ve as they had used it before the system migration.

table 13. alma modules/functions implemented
fulfillment: 100%
primo/primo ve: 93%
resource management: 92%
acquisitions: 84%
erm (electronic resources management): 77%
course reserves: 73%
network zone: 50%
interlibrary loan: 40%
digital collections: 21%
other: 8%

selection process

rfi and rfp

when asked if an rfi (request for information) was involved, more than half of the libraries responded affirmatively (see fig. 1). about half of the libraries reported that they did not conduct a system functionality survey to collect information from library users and colleagues (see fig. 2). more than half of the libraries indicated that the rfp (request for proposal) process was required for the system migration (see fig. 3). the libraries that did not conduct an rfp process cited a variety of reasons (see fig. 4), such as that an rfp may not be necessary when migrating systems to the same vendor, there was no increase in expenditure, the expenditure did not reach a budget threshold (e.g., less than $100,000), or the previous contract stipulated it if upgrading to a new product with the same vendor. another reason was that libraries might have an existing relationship with vendors and would like to continue using their products. some libraries were given authority by the university administration and library directors to handle the negotiation, or they thought an rfi offered sufficient information to make this decision. other libraries had no choice in conducting an rfi or rfp process for reasons such as their system was outdated and they had to migrate, the decision was made by a consortium, or alma was their sole-source procurement.

figure 1. whether an rfi (request for information) was involved (yes 52%, no 40%, unknown 8%).

figure 2. whether a system functionality survey was conducted (no 51%, yes 43%, unknown 6%).

figure 3. whether an rfp (request for proposal) was involved.

figure 4. the rationales for libraries who did not conduct the rfp.

decision-making

the authors found that the common roles involved in the decision-making process included library dean/director, alma local implementation team, and alma project working group (consortium) (see fig. 5). some libraries indicated that their system migration decision was made by university executives (provost, vp finance, cio, and cfo), campus it, the aul for library technology, or all librarians/staff.
one library reported that the dean of arts, languages & learning services made the selection decision instead of the library or librarians.

figure 5. the decision makers.

important factors for system selection

the authors found that the four most important elements to consider for system selection were budget reality; electronic resource management (erm), bibliographic, and authority control; discovery layers (primo, primo ve); and cloud hosted (see table 14).

table 14. the important factors for system selection (strongly disagree / somewhat disagree / neither agree nor disagree / somewhat agree / strongly agree)
the budget reality: 3% / 6% / 11% / 34% / 47%
the number of libraries adopted: 7% / 7% / 27% / 40% / 19%
erm, bibliographic, & authority control: 2% / 2% / 17% / 38% / 41%
discovery layers (primo, primo ve): 6% / 4% / 13% / 27% / 50%
the analytics/reporting functionality: 4% / 6% / 15% / 41% / 35%
cloud hosted: 3% / 3% / 12% / 36% / 47%
the campus it infrastructure & its ecosystems: 8% / 12% / 31% / 31% / 18%
integration with other erps: 12% / 15% / 30% / 33% / 10%
customer support & satisfaction: 4% / 6% / 21% / 37% / 31%
system user training programs: 5% / 11% / 24% / 38% / 21%

figure 6. the data migrated to alma.

data migrated

the most common types of data migrated to alma were bibliographic records, holdings and items, patrons, and circulation data (see fig. 6). some libraries reported that they also migrated other types of data, including vendor lists, e-resource data, all available data types, etc.

discovery service

the survey asked if there were any libraries that migrated to alma and did not choose primo/primo ve for their discovery service. nine libraries reported that this was the case. four of them used summon, four chose ebsco discovery service, and one adopted a locally developed product. when asked the reason for their choices, the nine libraries indicated that they would like to stay with their existing discovery service. additionally, two of the libraries stated that a budget limitation was a part of their reasons, and one library cited a better discovery service for users as the rationale.

part iii: feedback on alma migration

system migration evaluation

the majority of libraries reported that they did not conduct a formal post-migration evaluation. half of the libraries thought the migration achieved their project goals or met the needs of library operations (acquisitions, cataloging, fulfilment, discovery, etc.) (see fig. 7).

figure 7. whether a formal post-migration evaluation was conducted.

some libraries also conducted their own migration evaluation, including rfp mandatory requirements signoff, an availability study, focus groups with library staff, usability testing with students and faculty, feedback and cross-checking with the consortium, debriefs of library staff, etc. some only did an informal evaluation, which turned out to be not handled well or not very satisfactory. for example, one consortium did a survey on the migration and provided the feedback to ex libris for improvement.
other libraries reported that they had not done an evaluation because they had not started the migration process, were still in the migration stage, did not include an evaluation in the decision-making process, or received alma as a free product through their consortial partnerships.

valuable lessons learned

the authors asked what were the most valuable lessons the libraries had learned from the migration project, and how they would implement the migration differently if they had a chance to do it again. the most valuable lessons concentrated on training, communication, engagement, the implementation process, and data cleanup/preparation (see fig. 8). these lessons are shared in greater detail in the discussion section.

figure 8. the valuable lessons learned from the migration project.

prospective migration

when asked if they would consider working with ex libris again if they migrated to a new system in the future, 70 percent of libraries gave an affirmative answer, but some libraries indicated that they would seek other alternatives (see fig. 9). when asked how likely they would be to consider implementing an open-source ils, the majority of libraries conveyed that they would not consider open source; only 7 percent of libraries would consider it (see fig. 10).

figure 9. whether ex libris products would be considered in the future.

figure 10. whether an open-source ils would be considered in the future.

discussion

the authors examine the above findings further through the lens of the research questions raised in the literature review section.

the decision-making process and factors considered

the survey indicates that both rfi and rfp are important for a selection process. fifty-two percent of the libraries conducted an rfi and 57 percent required the rfp process for the system migration. interestingly, even with a variety of sound reasons, such as no increase in expenditure, staying within the budget threshold, existing relationships with vendors, sole-source procurement, consortium decision, riders, etc., some libraries still did not roll out the rfp process. besides the rfi and rfp, 43 percent of libraries went through a system functionality survey to collect information from library users and colleagues.

for most libraries, the library dean or director, alma local implementation team, or alma project working group of a consortium were involved in the decision-making process. in some cases, university executives such as the provost, vp finance, cio, cfo, campus it, and the associate dean or associate university librarian for library technology made a collective decision. in a rare case, the dean of arts, languages & learning services made the call for the system selection.

when considering system migration, many factors can be important. this survey shows that libraries mainly consider budget reality; erm, bibliographic, and authority control; discovery layers; and cloud-hosted systems. it is interesting that most libraries would like to move to a cloud-based system that has better functionality for discovery and electronic resources management. the survey also reveals that library administration needs to find a way to offset the cost increase of the system migration.
the lack of comparable system or service offerings in the market also contributes to the decision on system selection.

project evaluation

project evaluation provides important feedback from both system users and system providers and a great opportunity for libraries to learn. the findings indicate that many libraries do not have a formal assessment process. some consortia have conducted surveys and provided feedback to ex libris, but no response from ex libris to that feedback was reported. both libraries and system vendors have lost the opportunity to learn and improve project management. for example, well-documented complaints about dissatisfaction with ex libris training have not been effectively addressed. some libraries believe a demonstration-focused training model does not provide the same experience that onsite training offers. many libraries have had trouble with acquisitions workflows. the eocr (electronic order confirmation record) and edi (electronic data interchange) processes are standard practices in libraries today to generate order records and create invoices automatically, and they should be a part of the implementation contract to ensure that libraries can operate appropriately after a new system goes live. it is time for both libraries and system providers to consider a formal project assessment as a part of system migration down the road. libraries will not do better if they do not improve today. libraries cannot improve if they do not know where previous projects have gone wrong. a better way to learn from mistakes is project assessment.

impacts on library staffing and library operation

some libraries reported that insufficient staffing over the system migration created additional problems and hardships. some library departments were stretched very thin in order to work on the migration project in addition to their regular operational duties. however, about one-third of survey-participating libraries reported that meeting the needs of library operation, including acquisitions, cataloging, fulfilment, and discovery, is a criterion of project evaluation. the lack of dedicated lsp migration project staff creates a challenge for system migration. most importantly, additional staffing time and technical capacity are important factors that determine whether libraries can fully take advantage of the functionalities of a new system. libraries might manage the system migration better by hiring additional technical staff on a project basis to handle technical aspects if staff cannot be released from library operations to focus on the migration project.

the system integration and unified automated workflows of a modern lsp can enable libraries to run their operations more efficiently. particularly in a shared environment or network, libraries could share bibliographic records for general collections more widely and deeply, which could dramatically reduce the need for both original and copy cataloging. system staff no longer need to install or upgrade proprietary software and maintain servers in house. these changes might cause job insecurity for some library staff. it is critical for library leaders to adjust some job responsibilities or develop new skills to meet new demands. this requires library administration to create a culture of embracing change, learning, and collaboration.
staff can take the advantage of a new system by being curious and reassessing previous workflows. library administration could create a flexible structure to encourage learning and collaboration across departments. lessons learned many libraries shared valuable lessons they learned from the migration projects. those lessons concentrate on training, communication and engagement, implementation process, and data cleanup and preparation. training many libraries expressed dissatisfaction with the training provided by their vendor. for example, libraries moving to alma reported that ex libris could have focused more on in-person, postmigration training. as it was, staff felt undertrained because they had access only to online training before the libraries had access to their own data in alma/primo. additionally, ex libris did not have regular trainers for a particular library, so there was less continuity across training sessions than there could have been. some suggest that ex libris do a concentrated several-day initial training for migration so that libraries have a solid overview of the entire system before data exports for testing loads, and then delve into a detailed weekly training that includes more library staff. it seems a good idea to schedule more training sessions after implementation because libraries may not know how the system functions during the implementation period. in an ideal world, libraries would put more contractual obligations on ex libris to train staff more thoroughly. after all, libraries need to hold ex libris more accountable for project outcome. for consortium libraries, they should insist that ex libris provide specialized individual trainers and technical contacts. attending group training sessions conducted by a variety of different ex libris information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 21 guo and xu trainers does not work well in large migration projects. ex libris needs to train the library staff rather than focusing on training the consortium support staff and expecting them to do most of the staff training. ex libris indeed carries a variety of training webinars that are free; however, for bespoke training or intimate training sessions, they charge their customers. a barrier for many libraries is that they just cannot afford to pay more on these bespoke training sessions so they depend on other in-house training and best practices (e.g., work groups, training committees, inhouse power users, etc.) to train/manage the training needs of their library personnel. communication and engagement many libraries express that communication is extremely important and buy-in from stakeholders at all levels is critical to the migration project’s success. investing the initial time to have all stakeholders onboard will pay off. blocking off time for weekly meetings with involved staff and ex libris is key. some suggested asking more questions and seeking to understand the functionality of the new system more deeply. for consortial libraries, librarians can become much closer to each other and learn to seek out and receive help from one another in the ways that they might never do before. the networking can be an invaluable source for mutual support going forward. some libraries reported that due to the lack of communication, an overly sudden decision for the implementation timeline was made at the legislative level. 
information regarding requirements and expenses was not fully clarified before the process began and came as a surprise during the migration. the whole process felt very rushed by the vendor with insufficient trainings, which turned out to be very dissatisfying. implementation process a system migration is complex and requires a great deal of time, institutional resources, and staff. some key processes needed to be better prepared in advance, such as staff trainings, project plans and major milestones, system analysis, customer inputs for implementation and configuration, data cleanup, physical to electronic processing (p2e), source data extraction, validation and delivery, workflow analysis, fulfillment network, authentication, third-party integrations, data review and testing, go-live readiness checklist, etc. in practice, the migration was often more time and resource-intensive than expected, meaning that libraries found it difficult to complete their part of the process in the contractually-specified time. libraries should clear the decks of core staff to focus on migration, and make sure there are no other major projects occurring at the same time. if staff have insufficient time during the migration window, libraries need to hire temporary experienced staff for the project. this investment will benefit library operation in the long run. the implementation team members should have more dedicated time to be trained so that the library staff are well prepared and knowledgeable in the areas in which they work. it is wise to clean up data as much as possible prior to migration. it would be ideal if the existing workflows were fully documented with diagrams so that it would be easier to determine what parts of the workflows need change. some libraries reported their migration happened during the pandemic with state-issued stay-athome orders in force. it was extremely stressful juggling all of the changes for the library while keeping up with system migration. ideally, it would be better to avoid doing the migration during a pandemic and postpone the migration. but if libraries have no other choices, one benefit is to take advantage of closures for cutover days. the stress of the implementation and trying to get information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 22 guo and xu things done may cause frustrations to boil over. it is advised to manage these situations by adding additional support where needed and by always ensuring that communication is a top priority so that any confusion is kept to a minimum. for consortial libraries, it is important for individual institution members to have their own project managers. the consortial libraries would have tried to standardize more configurations across the consortia, like user groups, circulation settings, item types, etc. some libraries felt the whole migration process was rushed by the vendor, which turned out to be not very successful. libraries should not let the vendor talk them into a compressed, severalmonth migration timeline; instead, they should spend more time in the preparation and implementation process. data cleanup and preparation although it is tedious and time consuming, many libraries suggested cleaning up data as much as possible prior to migration. more pre-migration data cleanup would avoid the post-migration mess. 
some libraries recommended more stringent cleanup of catalog records, acquisitions data, circulation data, patron records, weeding, etc. it is important to make sure the cataloging structure matches the structure of the new system. had they taken the data review stage more seriously and fully modeled the processes and workflows that would be needed, they would have had fewer data cleanup problems to address after the migration was complete. some libraries cautioned that alma’s p2e (physical to electronic) migration process was more complex than anticipated. they stated that the p2e conversion did not work as it should have, and ex libris should do a better job in the future. due to misalignment of source and target collections, the p2e process resulted in a large cleanup after the migration. a number of libraries would have asked more questions about what data was migrated and to where. ex libris had migrated data that should not have been migrated. as a result, a messy system became a reality. planning for future system migrations when asking what libraries will do differently for a future system migration, many provided very interesting insights. some libraries believed that the system migration put library leadership in a difficult position. they needed to engage all library employees in decision-making and provide staff with the resources they needed to navigate change, experience the vulnerability of learning a new system, and even have difficult conversations with colleagues. at the same time, library leaders are accountable to their parent organizations and subject to budget pressure and mandates to follow procurement processes, which are geared around efficiency and hierarchy rather than promoting democratic decision-making and self-governance. many libraries expressed a concern about training. they stated that they would demand a separate contract for training in the future and put more contractual obligations on system providers to train staff more thoroughly. they would spell out in greater detail what a successful migration would consist of to hold ex libris responsible for outcome. during the bidding process, library staff should be less distracted by smooth presentations but ask difficult questions about system functionality. another concern is about the pricing. one early adopter of alma stated that they learned the risks, rewards, and excitement of helping with a developing product as they felt aleph was a dead end information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 23 guo and xu and did not see many other alternatives. they would have negotiated more strongly with ex libris on pricing considering the immaturity of the product and pricing model at the time of adoption. some libraries felt they were not given competitive pricing, and their costs went up significantly, which constituted a large budget shift. some small libraries believed alma is too big for them, and oclc might be more appropriate for their size of collections and materials. they realized they underutilized a very expensive system. some libraries preferred a customized implementation as opposed to the one-size-fits-all model ex libris offered. they stated that despite learning the new system, they found that the solutions ex libris offered for their implementation rarely worked. they would better off fitting in their own workflows with alma (especially for budgeting). 
ex libris seems to be not ready to work with single-campus small colleges. other libraries reported that they had multiple people in a project management role, which created communication issues. they learned that in any future migration processes they should have a single project manager empowered to make decisions. for consortium libraries, some libraries suggested taking advantage of cohorts of migrating institutions to share information, issues, and raise common questions. they would have made some local decisions instead of simply going with the consortiums. one consortium experienced a major difficulty that the group implementation took place in different countries. the time difference with their implementation team had added an additional dimension to project management. they would have done an individual migration instead of a group migration since they had a very complex institutional structure. some libraries strongly recommended open-source systems as well. they believed that the trend toward vertical consolidation of vendors is not healthy for the library system market in the long run. with mergers and acquisitions, gigantic companies are formed and might over-control the market and pricing. conclusions decision-making on the selection, procurement, and implementation of a new lsp is a process that requires gathering information and seeking input from library administration, experts, and different levels of stakeholders in a systematical way to ensure the system quality, fitness, and a successful implementation. the findings suggest that libraries should adopt an rfi/rfp (request for information/proposal) or system functionality survey as the basis for system selection. budget, resource discovery, and electronic resources management are the most important factors to be considered in an ils selection. staffing time and technical capability must be addressed before implementing a new system to enable libraries to manage user expectations. insufficient staff and the lack of technical skills could affect the realization of the benefits of a new system. technological change can lead to the shifts of staff job responsibilities and lead to a new way of working together. it is important for library administration to address organizational change when making technological change. a formal project assessment is essential for libraries and system providers to learn and improve collectively. open-source systems could open doors for libraries to seek more customized and affordable systems. research limitations like all research studies, this study has limitations that provide opportunities for further investigations. firstly, because we asked for responses from individuals, not libraries, the findings information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 24 guo and xu might be biased by participants due to individual experiences. secondly, due to the limitation of time, space, and number of survey questions, reported data mainly focused on alma libraries and could not cover migration experiences of libraries migrating to other products or all aspects of system migration. further research would benefit the library community from interviewing participating libraries in a different size, type, and geographic location as well as different system providers. practical implications every new system has its advantages and downsides. 
to help libraries fully take advantage of a new system, it would be helpful if vendors could evaluate training, physical to electronic (p2e) process, and system affordability. providing training after a system goes live will help libraries implement workflows effectively and give staff better experience. p2e is crucial for ensuring that all relevant information is transferred and maintained in the new system. vendors could address potential p2e issues before a system migration takes place so that libraries might approach data cleanup differently. it would be great if vendors could customize system modules or functionalities as needed by both small and large libraries. this will give libraries flexibility to invest in most needed library operations at different prices to make the system affordable. customer services can be crucial for libraries to continue optimizing the new system down the road. regularly seeking libraries’ feedback can foster a positive customer relationship and benefit both libraries and vendors. acknowledgements the authors appreciate the support of marshall breeding and sue julich for providing the library contact lists. the authors would also like to thank the office of research integrity for reviewing the survey questionnaire and providing comments. much gratitude goes to the survey participants who volunteered their time to participate in this study and took the time to communicate with the authors in order to provide accurate responses for their libraries or consortia. information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 25 guo and xu appendix: survey questionnaire adult online consent to participate in a research study a customers’ perspective: decision-making on system migration summary information things you should know about this study: • purpose: the purpose of the study is to understand how library leaders make decisions on system migration during technological change and the impact of these decisions on library operation and staff. • procedures: if you choose to participate, you will be asked to answer 12 multiplechoice questions and 3 open-ended questions. • duration: this will take about 15 to 20 minutes. • risks: there is little risk or discomfort from this research since you share your project experience anonymously. • benefits: the main benefit to you from this research is to self-reflect on the project and have an opportunity to share the project experience. we plan to publish our findings, which will bring potential benefits to you and the library community. • alternatives: there are no known alternatives available to you other than not taking part in this study. • participation: taking part in this research project is voluntary. please carefully read the entire document before agreeing to participate. confidentiality the records of this study will be kept private and will be protected to the fullest extent provided by law. in any sort of report we might publish, we will not include any information that will make it possible to identify you. research records will be stored securely and only the researcher team will have access to the records. information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 26 guo and xu the following questions are for general analytical use only. 
although qualtrics does not collect your email address, please do not provide your personal identification indicators (pii) with your answers. if pii appear in the responses, we will apply a data anonymization process to anonymize pii after the results are added into the final tally. right to decline or withdraw your participation in this study is voluntary. you are free to participate in the study or withdraw your consent at any time during the study. you will not lose any benefits if you decide not to participate or if you quit the study early. the investigator reserves the right to remove you without your consent at such time that he/she feels it is in the best interest. researcher contact information if you have any questions about the purpose, procedures, or any other issues relating to this research study you may contact jin guo (jiguo@fiu.edu) or gordon xu (gordon.xu@njit.edu). irb contact information if you would like to talk with someone about your rights of being a subject in this research study or about ethical issues with this research study, you may contact the fiu office of research integrity by phone at 305-348-2494 or by email at ori@fiu.edu. participant agreement i have read the information in this consent form and agree to participate in this study. i have had a chance to ask any questions i have about this study, and they have been answered for me. by clicking on the “consent to participate” button below i am providing my informed consent. consent to participate mailto:jiguo@fiu.edu mailto:gordon.xu@njit.edu information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 27 guo and xu section i: library profile and background information 1. your title: a. dean/director of the library/university librarian b. system librarian c. other (please specify: _________________) 2. describe your institution a. location i. us ii. canada iii. state 2. total student and faculty population a. total student population (number of ftes) b. total faculty population (number of ftes) 3. information about your library a. single campus library b. part of a multicampus library system c. part of a consortium d. other (please specify: _________________) 4. previous ils: a. the previous ils name: b. the previous ils vendor: c. years with the previous system: d. was it your first ils? a. yes b. no 5. ils modules in use prior to alma migration: (please check all that apply) a. acquisitions b. cataloging c. circulation d. interlibrary loan e. reserves f. serials g. opac h. other (please specify: _____________________) information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 28 guo and xu section ii: alma implementation process 6. alma modules/functions implemented: (please check all that apply) a. acquisitions b. resource management c. fulfillment d. interlibrary loan e. course reserves f. erm g. network zone h. primo/primo ve i. digital collections j. other (please specify: ________________________) 7. the system selection process • was an rfi (request for information) involved? a. yes b. no • did you conduct a system functionality survey to collect information from library users and colleagues? a. yes b. no • was the rfp (request for proposal) process required? • a. yes, please specify the person/department that prepared for rfp. _____ • b. no, please provide the reason why (e.g., budget cap less than $100k, etc.)_____ 8. 
who was involved in the decision-making process? • alma project working group (consortium) • alma local implementation team • project manager(s) • library dean • institutional coordinators/leads • departmental heads • others (please specify ______) information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 29 guo and xu 9. what are important factors for system selection (5 points, weight/per response)? • the budget reality • the number of libraries adopted • e-resource management (erm), bibliographic, and authority control • discovery layers (primo, primo ve) • the analytics/reporting functionality • cloud hosted • the university/college it infrastructure and its ecosystems • integration with other erp (enterprise resource planning) systems/platforms • customer support & satisfaction • system user training programs 10. what data was migrated (please select all that apply)? • authority data • bibliographic records • holdings and items • patrons • loans, holds, and fines • acquisitions • course reserves • digital metadata and objects 11. please skip this question if you use primo/primo ve. if you chose non-ex libris products for discovery service, please specify the product____, and select the possible reason below: • budget limitation • stay with the existing discovery service • others section iii: feedback on alma migration project. 12. how did your library evaluate the system migration project? • no formal post-migration evaluation • user satisfaction survey • achieved the project goals • met the needs of library operations (acquisitions, cataloging, fulfilment, discovery, etc.) information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 30 guo and xu 13. open-ended questions • what are the most valuable lessons you have learned from this project? if you had a chance to do it again, how would you implement the migration differently? • would the library consider working with ex libris again if it were to migrate to a new system in the future? • how likely is it that this library would consider implementing an open-source ils? information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 31 guo and xu endnotes 1 zhonghong wang, “integrated library system (ils) challenges and opportunities: a survey of us academic libraries with migration projects,” the journal of academic librarianship 35, no. 3 (2009): 207–20, https://doi.org/10.1016/j.acalib.2009.03.024. 2 teri oaks gallaway and mary finnan hines, “competitive usability and the catalogue: a process for justification and selection of a next-generation catalogue or web-scale discovery system,” library trends 61, no. 1 (2012): 173–85. 3 guoying liu and ping fu, “shared next generation ilss and academic library consortia: trends, opportunities and challenges,” international journal of librarianship 3, no. 2 (2018): 53–71. 4 matt goldner, “winds of change: libraries and cloud computing,” bcla browser: linking the library landscape 4, no. 1 (2012): 1–7. 5 liu and fu, “shared next generation,” 53–71; jone thingbø, frode arntsen, anne munkebyaune, and jan erik kofoed, “transitioning from a self-developed and self-hosted ils to a cloudbased library services platform for the bibsys library system consortium in norway,” bibliothek forschung und praxis 40, no. 3 (2016): 331–40, https://doi.org/10.1515/bfp-20160052. 
6 philip calvert and marion read, “rfps: a necessary evil or indispensable tool?” electronic library 24, no. 5 (2006): 649–61. 7 matt gallagher, “how to conduct a library services platform review and selection,” computers in libraries 36, no. 8 (2016): 20. 8 zhongqin (june) yang and linda venable, “from sirsidvnix symphony to alma/primo: lessons learned from an ils migration,” computers in libraries 38, no. 2 (march 2018): 10–13. 9 gallaway and hines, “competitive usability,” 173–85. 10 alan manifold, “a principled approach to selecting an automated library system,” library hi tech 18, no. 2 (2000): 119–30, https://doi.org/10.1108/07378830010333455. 11 ayoku a. ojedokun, grace o. o. olla, and samuel a. adigun, “integrated library system implementation: the bowen university library experience with koha software,” african journal of library, archives and information science 26, no. 1 (2016): 31–42. 12 lyn h. dennison and alana faye lewis, “small and open source: decisions and implementation of an open source integrated library system in a small private college,” georgia library quarterly 48, no. 2 (spring 2011): 6–9. 13 daniel lovins, “management issues related to library systems migrations. a report of the alcts camms heads of cataloging interest group meeting. american library association annual conference, san francisco, june 2015,” technical services quarterly 33, no. 2 (2016): 192–98, https://doi.org/10.1080/07317131.2016.1135005. https://doi.org/10.1016/j.acalib.2009.03.024 https://doi.org/10.1515/bfp-2016-0052 https://doi.org/10.1515/bfp-2016-0052 https://doi.org/10.1108/07378830010333455 https://doi.org/10.1080/07317131.2016.1135005 information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 32 guo and xu 14 kyle banerjee and cheryl middleton, “successful fast track implementation of a new library system,” technical services quarterly 18, no. 3 (2001): 21–33. 15 joshua m. avery, “implementing an open source integrated library system (ils) in a special focus institution,” digital library perspectives 32, no. 4 (2016): 287–98, https://doi.org/10.1108/dlp-02-2016-0003. 16 morag stewart and cheryl aine morrison, “breaking ground: consortial migration to a nextgeneration ils and its impact on acquisitions workflows,” library resources & technical services 60, no. 4 (2016): 259–69. 17 zahiruddin khurshid and saleh a. al-baridi, “system migration from horizon to symphony at king fahd university of petroleum and minerals,” ifla journal 36, no. 3, (2010): 251–58, https://doi.org/10.1177/0340035210378712. 18 efstratios grammenis and antonios mourikis, “migrating from integrated library systems to library services platforms: an exploratory qualitative study for the implications on academic libraries’ workflows,” qualitative and quantitative methods in libraries 9, no. 3 (september 2020): 343–57, http://qqml-journal.net/index.php/qqml/article/view/655/585. 19 abigail wickes, “e-resource migration: from dual to unified management,” serials review 47, no. 3–4 (2021): 140–42. 20 yang and venable, “from sirsidynix,” 13. 21 joseph nicholson and shoko tokoro, “cloud hopping: one library’s experience migrating from one lsp to another,” technical services quarterly 38, no. 4 (2021): 377–94. 22 ping fu and moira fitzgerald, “a comparative analysis of the effect of the integrated library system on staffing models in academic libraries,” information technology and libraries 32, no. 3 (september 2013): 47–58. 
23 geraldine rinna and marianne swierenga, “migration as a catalyst for organizational change in technical services,” technical services quarterly 37, no. 4 (2020): 355–75, https://doi.org/10.1080/07317131.2020.1810439. 24 vandana singh, “experiences of migrating to open source integrated library systems,” information technology and libraries 32, no. 1 (2013): 36–53, https://doi.org/10.6017/ital.v32i1.2268; shea-tinn yeh and zhiping walter, “critical success factors for integrated library system implementation in academic libraries: a qualitative study,” information technology and libraries 35, no. 3 (2016): 27–42, https://doi.org/10.6017/ital.v35i3.9255; grammenis and mourikis, “migrating from integrated library systems,” 343–54; xiaoai ren, “service decision-making processes at three new york state cooperative public library systems,” library management 35, no. 6 (2014): 418–32, https://doi.org/10.1108/lm-07-2013-0060; wang, “integrated library system,” 207– 20; pamela r. cibbarelli, “helping you buy ils,” computers in libraries 30, no. 1 (2010): 20–48, https://www.infotoday.com/cilmag/cilmag_ilsguide.pdf; calvert and read, “rfps,” 649–61. https://doi.org/10.1108/dlp-02-2016-0003 https://doi.org/10.1177/0340035210378712 http://qqml-journal.net/index.php/qqml/article/view/655/585 https://doi.org/10.1080/07317131.2020.1810439 https://doi.org/10.6017/ital.v32i1.2268 https://doi.org/10.6017/ital.v35i3.9255 https://doi.org/10.1108/lm-07-2013-0060 information technology and libraries march 2023 decision-making in the selection, procurement, and implementation of alma/primo 33 guo and xu 25 gallaway and hines, “competitive usability,” 173–85. 26 fu and fitzgerald, “a comparative analysis,” 47–58. abstract introduction literature review methods data collection data analysis findings part i: library profile & background information respondents participating libraries geographic location library size library type previous integrated library system (ils) part ii: implementation process alma modules/functions selection process rfi and rfp decision-making important factors for system selection data migrated discovery service part iii: feedback on alma migration system migration evaluation valuable lessons learned prospective migration discussion the decision-making process and factors considered project evaluation impacts on library staffing and library operation lessons learned training communication and engagement implementation process data cleanup and preparation planning for future system migrations conclusions research limitations practical implications acknowledgements appendix: survey questionnaire endnotes automated fake news detection in the age of digital libraries article automated fake news detection in the age of digital libraries uğur mertoğlu and burkay genç information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12483 uğur mertoğlu (umertoglu@hacettepe.edu.tr) is a phd candidate, hacettepe university. burkay genç (bgenc@cs.hacettepe.edu.tr) is assistant professor, hacettepe university. © 2020. abstract the transformation of printed media into the digital environment and the extensive use of social media have changed the concept of media literacy and people’s habits of news consumption. while online news is faster, easier, comparatively cheaper, and offers convenience in terms of people's access to information, it speeds up the dissemination of fake news. 
due to the free production and consumption of large amounts of data, fact-checking systems powered by human efforts are not enough to question the credibility of the information provided, or to prevent its rapid dissemination like a virus. libraries, long known as sources of trusted information, are facing challenges caused by misinformation as mentioned in studies about fake news and libraries.1 considering that libraries are undergoing digitization processes all over the world and are providing digital media to their users, it is very likely that unverified digital content will be served by world’s libraries. the solution is to develop automated mechanisms that can check the credibility of digital content served in libraries without manual validation. for this purpose, we developed an automated fake news detection system based on turkish digital news content. our approach can be modified for any other language if there is labelled training material. this model can be integrated into libraries’ digital systems to label served news content as potentially fake whenever necessary, preventing uncontrolled falsehood dissemination via libraries. introduction collins dictionary which chose the term “fake news” as the “word of the year 2017,” describes news as the actual and objective presentation of a current event, information, or situation that is published in newspapers and broadcast on radio, television, or online.2 we are in an era where everything goes online, and news is not an exception. many people today prefer to read their daily news online, because it is a cost-effective and convenient way to remain up to date. although this convenience has lucrative benefits for society, it can also have harmful side effects. having access to news from multiple sources, anytime, anywhere has become an irresistible part of our daily routines. however, some of these sources may provide unverified content which can easily be delivered right to your mobile device. most importantly, potential fake news content delivered by these sources may mislead society and cause social disturbances such as triggering violence against ethnic minorities and refugees, causing unnecessary fear related to health issues, or even sometimes result in crisis, devastating riots and strikes. not having a steady definition compared to news, fake news is often defined according to the data used or the limited perspective of the study in the literature. for example; difranzo and gloriagarcia defined the fake news as “false news stories that are packaged and published as if they were genuine.”3 on the other hand, guess et al. see the term as “a new form of political misinformation” within the domain of politics, whereas mustafaraj is more direct and defines it as mailto:umertoglu@hacettepe.edu.tr mailto:bgenc@cs.hacettepe.edu.tr information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 2 “lies presented as news.”4 a comprehensive list of 12 definitions can be found in egelhofer and lecheler.5 in simplified terms, news which is created to deceive or mislead readers can be called fake news. however, the concept of fake news is a quite broad one that needs to be specified meticulously. fake news is created for many purposes and emerges in many different types. having an interwoven structure, most of these types are shown in figure 1. 
although, it is not easy to cluster these types into separate groups, they can be categorized according to the information quality or based on the intention as it is created to deceive deliberately or not, as rashkin et al. did.6 we propose the following classification where the two dimensions represent the potential impact and the speed of propagation. figure 1. the volatile distribution of the fake news types (clustered in four regions: sr, sr, sr, sr) with respect to two dimensions: speed of propagation and potential impact. the four regions visualized are clustered according to their dangerousness. first of all, it should be noted that to order types of fake news in a stable precision is quite a challenging task. the variations within the field highly depend on dynamic factors such as timespan, actors, and echochamber effect. hence, this figure should be considered as a clustering effort. there are possible intersecting areas of types within the regions. we will now give examples for two regions, “sr” and “sr.” for example, the sr grouping shows characteristics of high-risk levels and fast dissemination. this includes varieties of fake news such as propaganda, manipulation, misinformation, hate news, information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 3 provocative news, etc. we usually encounter this in the domain of politics. this kind of news may cause critical and nonrecoverable results in politics, the economy, etc., in a short period of time. the rise of the term fake news itself can also be attributed to this kind of news. on the other hand, the relatively less severe group (sr) of fake news, comprising of satire, hoax, click-bait, etc., has low-risk levels and a slow speed of dissemination. a frequently used type of this group, click-bait, is a sensational headline or link that urges the reader to click on a post, link, article, image, or video. these kinds of news have a repetitive style. it can be said that readers become aware of falsehood after experiencing a few times. so, risk level is lower, and dissemination is slower. vosoughi et al. stated the assumption that “falsehood diffuses significantly farther, faster, deeper, and more broadly than the truth.”7 so indeed, just one piece of fake news may affect many more people than thousands of true news items do because of the dramatic circulation of fake news. in their recent survey about fake news, zhou and zafarani highlighted that fake news is a major concern for many different research disciplines especially information technologies. 8 being a trusted source of information for a long time, libraries will play an important role in fighting against fake news problem. kattimani et al. claims that the modern librarian must be equipped with necessary digital skills and tools to handle both printed collections and newly emerging digital resources.9 similarly, we foresee that digital libraries, which can be defined as collections of digital content licensed and maintained by libraries, can be a part of the solution as an authority service with a collective effort. connaway et al. point to the key role of information professionals such as librarians, archivists, journalists, and information architects in helping society use the products and services related to news in a convenient way. 
10 as libraries all over the world are transitioning into digital content delivery services, they should implement mechanisms to avoid fake and misleading content being disseminated through them under the guidance of information professionals. to lay out proper future directions for the solution strategy, a clear understanding of interaction between library and information science (lis) community and fake news must be addressed. sullivan states that the lis community has been affected deeply in the aftermath of the 2016 us presidential elections.11 moreover, he quotes many other scientists, emphasizing libraries’ and librarians’ role in the fight against fake news. for example, finley et al. say that libraries are the direct antithesis of fake news, the american library association (ala) called fake news an anathema to the ethics of librarianship in 2017, rochlin emphasizes the role of librarians in this fight, and talks about the need to adopt fake news as a central concern in librarianship and many other researchers name librarians in the front lines of the fight against fake news.12 today, the struggle to detect fake news and prevent their spread is so popular that competitions are being organized (e.g., http://www.fakenewschallenge.org/) and conferences are being held (e.g., bobcatsss 2020). the struggle against fake news can be classified under three main venues: • reader awareness • fact-checking organizations and websites • automated detection systems the first item requires awareness of individuals against fake news and a collective conscience within the society against spreading fake news. to this end, visual and textual checklists, frameworks, and guidance lists are being published by official organizations, such as ifla’s13 http://www.fakenewschallenge.org/ information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 4 (international federation of library associations) infographic which contains eight steps to spot fake news. the radar framework and the currency, relevance, authority, accuracy, and purpose (craap) test are some of the efforts trying to increase reader-awareness of fake news.14 unfortunately, due to the nature of fake news and the clever way they are created triggering people’s hunger to spread sensational information, it is very difficult to achieve full control via this strategy. some studies explicitly showed that humans are prone to get confused when it comes to spotting lies or deciding whether a news item is fake or not.15 furthermore, people often overlook facts that conflict with their current belief, especially in politics and controversial social issues.16 the second strategy focuses on third-party manually driven systems for checking and labelling content as fake or valid. recently, we have seen many examples of offline and online organizations trying to work according to this strategy, such as a growing body of fact-checking organizations, start-ups (storyzy, factmata, etc.), and other projects with similar purposes.17 unfortunately, these manually powered systems cannot cope with the huge amounts of digital content being steadily produced. therefore, they focus only on a subset of digital content that they classify as having higher priority. even for this subset of content, their reaction speed is much slower than the fake information’s spread speed. therefore, automated and verified systems emerge as an inevitable last option. 
the third strategy offers automated fact-checking systems, which once trained, can deliver content labelling at unprecedented speeds. today, many researchers are researching automated solutions and building models with different methodologies.18 notwithstanding the latest studies, there is still a lot to do in the realm of automated fake news detection. automated fact-checking systems will be detailed in the rest of the paper. thanks to the internet, the collections of digital content served by digital libraries can be accessed by a great number of users without distance and time limits. therefore, we propose a solution to the problem by positioning digital libraries as automated fact-checking services, which label digital news content as fake or valid as soon as or before it is served through library systems. the main reason we associate this approach with digital libraries is their access to a wide variety of digital content which can be used to train the proposed mathematical models, as well as their role in the society as the publisher of trusted information. to this end, we develop a mathematical model that is trained using existing news content served by digital libraries, and capable of labelling news content as fake or valid with unprecedented accuracy. the proposed solution uses machine learning techniques with an optimized set of extracted features and annotated labels of existing digital news content. our study mainly contributes (a) a new set of features highly applicable for agglutinative languages, (b) the first hybrid model combining a lexicon/dictionarybased approach with machine learning methods to detect fake news, and (c) a benchmark dataset prepared in turkish for fake news detection. literature review contemporary studies have indicated that social, economic, and political events in recent years, especially after the 2016 us presidential elections, are increasingly associated with the concept of fake news.19 since then, fake news has begun to be used as a tool in many domains. on the other hand, researchers motivated by finding automated solutions started to make use of machine learning, deep learning, hybrid models, and other methodologies for their solutions. https://storyzy.com/ information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 5 although computational deception detection studies applying nlp (natural language processing) operations are not new, textual deception in the context of text-based news is a new topic for the field of journalism.20 accordingly, we believe that there is a hidden body language of news text, which has linguistic clues indicating whether the news is fake or not. thus, lexical, syntactic, semantic, and rhetorical analysis when used with machine learning and deep learning techniques offers encouraging directions. the textual deception spread over a wide spectrum and the studies have utilized many different techniques. there are some prominent studies which took the problem as a binary classification problem utilizing linguistic clues.21 although it is still early to say the linguistic characteristics of fake news are fully understood, research into fake-news detection in english-language texts is relatively advanced compared to that in other languages. in contrast, agglutinative languages such as turkish have been little researched when it comes to fake news detection. 
agglutinative languages enable the construction of words by adding various morphemes, which means that words that are not practically in use may exist theoretically. for example, “gerek-siz-leş-tir-ebilecek-leri-miz-den-dir,” is a theoretically possible word that means “it is one of the things that we will be able to make redundant,” but it is not a practical one. shu et al. classified the models for the detection of fake news in their study.22 according to this study, the automated approaches can focus on four types of attributes to detect fake news: knowledge based, style based, stance based, or propagation based. among these, it can be said that the most useful approaches are the ones which focus on the textual news content. th e textual content can be studied by an automated process to extract features that can be very helpful in classifying content as fake or valid. many scholars have tried to build models for automatic detection and prediction of fake news using machine learning algorithms, deep learning algorithms, and other techniques. these scholars approach the detection of fake news from many different perspectives and domains. for example, in one of the studies, scientific news and conspiracy news were used.23 in shu et al.’s study based on credibility of news, the headlines were used to determine whether the article was clickbait or not. in another study, reis et al. worked on buzzfeed articles linked to the 2016 us election using machine learning techniques with a supervised learning approach.24 studies which try to detect satire and sarcasm can be attributed to subcategories of fake news detection.25 our observation, in line with the general view, is that satire is not always recognizable and can be misunderstood for real news.26 for this reason, we included satirical news in our dataset. it should be noted that although satire or sarcasm can be classified by automated detection systems, experts should still evaluate the results of the classification. while some scholars used specific models focusing on unique characteristics, some others such as ruchansky et al. proposed hybrid deep models for fake news detection making use of multiple kinds of features such as temporal engagement between users and news articles over time and generated a labelling methodology based on those features.27 in related studies, many features such as automatic extracted features, hand-crafted features, social features, network information, visual features, and some others such as psycholinguistic features, are applied by researchers.28 in this work, we focused on news content features, however the social context features can also be adapted using different tiers such as user activity patterns, information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 6 analysis of user interaction, profile metadata, social network/graph analysis etc. to extract features. we also have some of these features in our data but not having ground truth quantitatively, we avoided using these features. methodology in this section, we present our motivation for this work which we visualized in a framework and named global library and information science (glis_1.0). subsequently, we discuss the construction of the automated detection system as the key element of the glis_1.0 framework. we explain the framework, model, dataset, features, and the techniques used in this section. 
framework the main structure of the proposed framework is shown in figure 2. this framework consists of highly cohesive but flexible layers. figure 2. the glis_1.0 framework main structure. in the presentation layer one can find the different sources of news that are publicly available. these sources can be accessed directly using their websites or can be searched for via search engines. the news is received by fact-checking organizations which classify them manually, digital libraries which archives and serves them, and automated detection systems (ads) which classify them automatically. digital libraries work together with fact-checking organizations and adss to present clean and valid news to the public. moreover, search engines use digital libraries systems to label their results as fake or valid. information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 7 fact-checking organizations should also benefit from the output of adss, as instead of manually checking heaps of news content, they could now focus on news labeled as potentially fake by an ads. through glis, adss make the life of fact-checking organizations and digital libraries much easier, all the while increasing the quality of news served to the public. considering this is a high-level overview of a structure given in figure 2, there may be many other components, mechanisms, or layers, but the key elements of this structure are automated detection systems and the digital libraries. a critical approach to this framework can be why we need such an authority mechanism. the answer will be quite simple, technological progress is not the only solution. on the contrary, tech giants have already been subject to regulatory scrutiny for how they handle personal information.29 also, their policy related to political ads has been questioned. furthermore, they are often blamed for failing to fight fake news. indeed, there is an urgent need for a global action more than ever. digital libraries are much more than a technological advancement. hence, they should be considered as institutions or services which can be a great authority service to provide news to society since the printed media disappears day by day. the threats caused by fake news are real and dangerous, but only recently have researchers from different disciplines been trying to find possible solutions such as educational, technological, regulatory, or political. digital librarianship can be the intersection of all these solutions for promoting information/media literacy. hence, digital librarianship will make use of many automated detection systems (ads) to serve qualified news. in the following section, we discuss ads in detail. model an overview of our model of automated detection system solution which is very critical for the framework is shown in figure 3. our fake news detection model consists of two phases. first is the language model/lexicon generation and the second is machine learning integration. in this work, we used machine learning algorithms via supervised learning techniques which learn from labeled news data (training) and helps us to predict outcomes for unforeseen news data (test). dataset we collected our data from three sources: • the primary source is the gdelt (global database of events, language and tone) project (https://www.gdeltproject.org/), a massive global news media archive offering free access to news text metadata for researchers worldwide. 
it can almost be considered a digital library of news in its own right. however, gdelt does not provide the actual news text and only serves processed metadata along with the url of the news item. gdelt normally does not check the validity of any news item. however, we only used news from approved news agencies and completely ignored news from local and lesser-known sources to maximize the validity of the news we automatically obtained through gdelt. moreover, we post-processed the obtained texts by cross-validating them against teyit.org data to remove any potential fake news obtained through gdelt links. • the second source is teyit.org, a fact-checking organization based in turkey that is compliant with the principles of the ifcn (international fact-checking network) and aims to prevent the spread of false information through online channels. manually analyzing each news item, they tag it as fake, true, or uncertain. we used their results to automatically download and label each news text. • lastly, our team collected manually curated and verified fake and valid news obtained from various online sources and named it mvn (manually verified news). this set includes fake and valid news that we manually accumulated over time during our studies and that did not overlap with the news obtained from the gdelt and teyit.org sources.
figure 3. integrated fake news detection model with main phases combining the language-model-based approach with the machine learning approach.
we named our dataset trfn. in phase 2, the data is very similar to the data we used in phase 1; however, to assess the effectiveness of the model, we excluded old news from before 2017 and added new items from 2019. the news items in our dataset span the time frame 2017–2019 and are uniformly distributed. table 1 outlines the dataset statistics, namely where the news text comes from, its class (fake or valid), the number of distinct texts, and the corresponding data collection method. it can be seen from the table that most of our valid news comes from the gdelt source, whereas teyit.org, a fact-checking organization, contributes only fake news.
table 1. trfn dataset summary after cleaning and duplicate removal.
dataset      class      size of processed data    collection method
gdelt        non-fake   82708                     automated
teyit.org    fake       1026                      automated
mvn          non-fake   1049                      manual
mvn          fake       400                       manual
all news items were processed through zemberek (http://code.google.com/p/zemberek), the turkish nlp engine, to extract different morphological properties of the words within the texts. after this processing phase, all obtained features were converted into tabular format and made available for future studies. this dataset is now available for scholarly studies upon request. in a study of this nature, the verifiability of the data used is important. as we have already mentioned, most of the data we used comes from verified sources, namely mainstream news agencies accessed through gdelt and the teyit.org archives, which are verified by teyit.org staff. all data used in training the mathematical models, which are explained in the rest of the paper, is either directly or indirectly verified.
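to make the construction of trfn concrete, the following minimal sketch shows how the three labeled sources described above could be stacked into a single table and de-duplicated; the in-memory records and column names are illustrative assumptions rather than the authors' actual pipeline.

```python
# A minimal sketch (not the authors' actual pipeline) of how the three labeled
# sources described above could be stacked into one TRFN-style table.
# The tiny in-memory records and column names are illustrative assumptions.
import pandas as pd

gdelt = pd.DataFrame({"text": ["valid news text 1", "valid news text 2"]})
teyit = pd.DataFrame({"text": ["fake news text 1"]})
mvn = pd.DataFrame({"text": ["manually verified text 1"], "label": ["non-fake"]})

gdelt["label"], gdelt["source"] = "non-fake", "gdelt"   # mainstream agencies via GDELT
teyit["label"], teyit["source"] = "fake", "teyit.org"   # items debunked by fact-checkers
mvn["source"] = "mvn"                                   # manually curated and labeled set

# Stack the sources, then drop duplicate and cross-source overlapping texts.
trfn = pd.concat([gdelt, teyit, mvn], ignore_index=True)
trfn = trfn.drop_duplicates(subset="text").reset_index(drop=True)

print(trfn.groupby(["source", "label"]).size())  # cf. the counts reported in table 1
```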
another important issue was the generalizability of the dataset, which determines whether the results of the study are applicable only to specific domains or to all available domains. although focusing on a specific news domain would clearly improve our accuracies, we preferred to work in the general domain and included news from all specific domains. the distribution of domains in our dataset is visualized in figure 4. this distribution closely matches the distribution one would experience reading daily news in turkey. hence, we have no domain-specific bias in our training dataset.
figure 4. the distribution of domains in the dataset. (scitechenvwetnatlife = science, technology, environment, weather, nature, life. educulturearttourism = education, culture, art, tourism.)
moreover, during the exploratory data analysis we found strongly correlated evidence of syntactic similarity with other nlp studies in turkish. for example, the results of a study by the zemberek developers (http://zembereknlp.blogspot.com/2006/11/kelime-istatistikleri.html) on the most common words in turkish, which experimented with over five million words, are compatible with the most common words in our corpus. this evidence can be attributed to the representativeness of our dataset. the last issue worth discussing is the imbalanced nature of the dataset. an imbalanced dataset occurs in a binary classification study when the frequency of one class dominates the frequency of the other class. in our dataset, the amount of fake news is far surpassed by the amount of valid news. this generally causes difficulties in applying conventional machine learning methods. however, it is a frequently observed phenomenon, because such class disparities are common in real-world problems. to avoid potential problems due to the imbalanced nature of the dataset, we used smote (synthetic minority over-sampling technique), an over-sampling method.30 it creates synthetic samples of the minority class that are relatively close in the feature space to the existing observations of the minority class, as sketched below.
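as an illustration of this step, the following minimal sketch applies smote from the imbalanced-learn package to a toy imbalanced dataset; the synthetic data and parameters are assumptions for demonstration and not the exact configuration used in this study.

```python
# A minimal sketch of over-sampling the minority (fake) class with SMOTE from
# the imbalanced-learn package; the toy data below stands in for a numeric
# feature matrix, and the parameters are illustrative assumptions.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy imbalanced data: roughly 2% minority class, mimicking the fake/valid disparity.
X, y = make_classification(n_samples=5000, n_features=6, weights=[0.98, 0.02], random_state=0)
print("before:", Counter(y))

# SMOTE synthesizes new minority samples close to existing minority observations
# in feature space; it should be applied to the training split only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after: ", Counter(y_res))
```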
features

in this study, we discarded some features because of their relatively low impact on overall performance during the exploratory data analysis and, subsequently, in the training phase. the most effective features we decided on are shown in table 2.

table 2. main features.

features          group                     definition
nrootscore        language model features   the news score calculated according to the root model
nrawscore         language model features   the news score calculated according to the raw model
spellerrorscore   extracted features        spell errors per sentence
complexityscore   extracted features        the complexity/readability score of the news
source            labels                    the url or identifier of the news
maincategory      labels                    the category of the news
newssite          labels                    the unique address of the news

the language model features nrootscore and nrawscore are borrowed from our earlier study on fake news detection.31 in that study, we focused on constructing a fake news dictionary/lexicon based on different morphological segments of the words used in news texts. these two scores were found to be the most successful in determining the fakeness or validity of a news text, one considering the raw form of the words and the other the root form.

the extracted features are complexityscore and spellerrorscore. complexityscore represents the readability of the text. studies on good readability metrics exist for the turkish language.32 we used a modified version of the gunning fog metric, which is based on word length and sentence length.33 since turkish is an agglutinative language, we used word length instead of syllable count. we also made some modifications to normalize the scores. the average number of syllables per word in turkish is 2.6, so we defined a long word as one with more than nine letters.34 for a given news text t, the complexity score (cs) is computed by equation 1.

(1)  $T_{CS} = \dfrac{\frac{Word_{count}}{Sentences_{count}} + \frac{LongWord_{count} \times 100}{Word_{count}}}{10}$

the second extracted feature is spellerrorscore. we expected fake news to contain considerably more spelling errors than valid news. we calculated spell error counts using the turkish spell-checker class of zemberek. because news texts vary in length, we normalize the count by the number of sentences. for a given news text t, the spell error score (se) is calculated as shown in equation 2.

(2)  $T_{SE} = \dfrac{SpellErrorCount}{SentencesCount}$
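to make these two formulas concrete, a minimal python rendering follows. it assumes simple punctuation-based sentence splitting and whitespace tokenization in place of zemberek's turkish tooling, and it leaves spell checking to a caller-supplied function, so the helper names and tokenization rules here are illustrative only.

```python
# illustrative implementation of equations 1 and 2 (complexity score and spell error score).
import re

def complexity_score(text: str, long_word_len: int = 10) -> float:
    """equation 1: ((words/sentences) + (long_words*100/words)) / 10.
    a 'long word' has more than nine letters, per the paper."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        return 0.0
    long_words = [w for w in words if len(w) >= long_word_len]
    return (len(words) / len(sentences) + len(long_words) * 100 / len(words)) / 10

def spell_error_score(text: str, count_errors) -> float:
    """equation 2: spelling errors per sentence. `count_errors` is a caller-supplied
    function (e.g., wrapping zemberek's spell checker) returning the error count."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return count_errors(text) / len(sentences) if sentences else 0.0

# toy usage with a dummy spell checker that reports no errors
print(complexity_score("bu bir deneme metnidir. cümleler kısadır."))
print(spell_error_score("bu bir deneme metnidir. cümleler kısadır.", lambda t: 0))
```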
finally, we included the metadata categories source, maincategory, and newssite as additional identifiers for the learning process. we then combined the features obtained from text representation techniques with the features shown in table 2 and trained the model with different classifiers. for text representation, we followed two directions in the experiments. first, we converted text into structured features with the bag-of-words (bow) approach, in which text data is represented as the multiset of its words. second, we experimented with n-grams, which represent sequences of n words; in other words, the text is split into chunks of n words.

in the bow model, documents in trfn are represented as a collection of words, ignoring grammar and even word order but preserving multiplicity. in a classic bow approach, each document is represented as a fixed-length vector whose length equals the vocabulary size, so each dimension of the vector corresponds to the occurrence of a word in a news item. we customized the generic approach by reducing variable-length documents to fixed-length vectors so that they can be used with many machine learning models.

figure 5. an overview of the bow (bag-of-words) approach.

because word order is ignored, each document is reduced to a fixed-length histogram of counts, as seen in figure 5. assuming n is the number of news documents and w is the number of possible words in the corpus, the resulting n × w count matrix is large but sparse: we have many news documents, yet most words do not occur in any given document, so many terms are rare, which is a drawback of the approach. we therefore compensated for this rarity problem by weighting terms with the tf-idf measure, which evaluates how important a word is to a document within a collection.

the other technique we used is the n-gram model, the generic term in computational linguistics for a contiguous sequence of words; it is used extensively in text mining and nlp tasks. the prefix that replaces the n indicates the number of consecutive words in the sequence: a unigram refers to one word, a bigram to two words, and an n-gram to n words.

experimental results and discussion

in this section, the experimental process and the results are presented. all experiments were performed using the scikit-learn library. to evaluate the performance of the model and the proposed features, we employed the precision, recall, f1 score (the harmonic mean of precision and recall), and accuracy metrics. we ran many experiments using different combinations of features.

several classification models were trained: k-nearest neighbor, decision trees, gaussian naive bayes, random forest, support vector machine, extratrees classifier, and logistic regression. to be effective, a classifier should correctly classify previously unseen data; to this end, we tuned the parameter values for all the classification models used. the models were then trained and evaluated on the trfn dataset using 10-fold cross-validation.
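a minimal sketch of this setup with scikit-learn is shown below; the documents, labels, and parameter values are placeholders rather than the trfn data or the authors' tuned settings, and the combination of the handcrafted features of table 2 with the text features (e.g., via a column transformer) is omitted for brevity.

```python
# illustrative sketch: unigram+bigram tf-idf text features, an extra-trees classifier,
# and 10-fold cross-validation reporting precision, recall, f1, and accuracy.
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

# placeholder documents and labels (0 = valid, 1 = fake), not the trfn corpus
texts = [f"örnek geçerli haber metni {i}" for i in range(50)] + \
        [f"örnek sahte haber metni {i}" for i in range(50)]
labels = [0] * 50 + [1] * 50

pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),                      # unigram + bigram, tf-idf weighted
    ExtraTreesClassifier(n_estimators=200, random_state=42),  # extremely randomized trees
)

scores = cross_validate(
    pipeline, texts, labels, cv=10,
    scoring=["precision", "recall", "f1", "accuracy"],
)
print({k: round(v.mean(), 4) for k, v in scores.items() if k.startswith("test_")})
```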
in table 3, we present the best scores of the proposed model. the results are highly encouraging and exemplify how useful automated detection systems can be as a key component of the integrated solution framework in figure 2. we compared the algorithms on three final feature sets, which gave the most consistent results among the feature set combinations we tried. set1 stands for bigram + fopt (optimized features), set2 for bowmodified + fopt, and set3 for unigram + bigram + fopt. the results show relative consistency in performance across the models. in almost all models, the combination of unigram + bigram with the optimized feature set (fopt) gives better results than the other combinations. the extratrees classifier was chosen as the best model due to its higher performance. this model, also known as the extremely randomized trees classifier, is an ensemble learning technique that aggregates the results of multiple decision trees collected in a “forest” to produce its classification. it is very similar to the random forest classifier and differs only in how the decision trees are constructed, which is why the results of these two classifiers are close.

table 3. results. evaluation results of all combinations of features and classification models (precision and recall are given per class, in the order class 0, class 1).

model                    feature set   precision % (0, 1)   recall % (0, 1)   accuracy   f1 score
gaussian naive bayes     set1          93.32 / 93.96        93.92 / 93.36     93.64      93.62
                         set2          93.37 / 94.02        93.98 / 93.42     93.70      93.68
                         set3          93.95 / 94.21        94.19 / 93.97     94.08      94.07
k-nearest neighbour      set1          93.70 / 93.50        93.52 / 93.69     93.60      93.61
                         set2          93.66 / 94.05        94.03 / 93.68     93.85      93.84
                         set3          94.42 / 94.21        94.22 / 94.41     94.31      94.32
extratrees classifier    set1          94.15 / 94.92        94.88 / 94.19     94.53      94.51
                         set2          94.09 / 94.94        94.90 / 94.14     94.51      94.49
                         set3          97.90 / 95.72        95.81 / 97.86     96.81      96.85
support vector machine   set1          89.61 / 88.92        88.99 / 89.54     89.26      89.30
                         set2          89.70 / 88.96        89.04 / 89.62     89.33      89.37
                         set3          90.85 / 91.26        91.22 / 90.89     91.05      91.03
logistic regression      set1          91.56 / 92.28        92.23 / 91.62     91.92      91.89
                         set2          91.50 / 92.28        92.22 / 91.56     91.89      91.86
                         set3          92.25 / 92.90        92.86 / 92.30     92.57      92.55
random forest            set1          93.71 / 94.44        94.40 / 93.75     94.07      94.05
                         set2          93.87 / 95.00        94.94 / 93.94     94.44      94.41
                         set3          94.77 / 95.14        95.12 / 94.79     94.96      94.95
decision trees           set1          93.95 / 94.59        94.56 / 93.99     94.27      94.25
                         set2          94.05 / 95.08        95.03 / 94.11     94.57      94.54
                         set3          94.94 / 95.24        95.23 / 94.95     95.09      95.08

every ads in the glis_1.0 framework may use its own approach to detect fake news, and open-source adss may improve through feedback. hybrid models and other techniques, such as neural networks with deep learning, can also be used depending on the data, the language of the news, and news features related to both social context and news content.

conclusion and future work

in this study we presented a novel framework that offers a practical architecture for an integrated fake news identification system. we have tried to illustrate how digital libraries can act as a service authority to promote media literacy and fight fake news. because librarians are trained to critically analyze information sources, their contributions to our proposed model are critical. accordingly, we see this work as encouragement for further collaborative studies between the lis and cs (computer science) communities. we think there is an immediate need for lis professionals to participate in and contribute to automated solutions that can help detect inaccurate and unverified information. in the same manner, we believe the collaboration of lis professionals, computer scientists, fact-checking organizations, and pioneering technology platforms is the key to providing qualified news within a real-time framework and promoting information literacy. moreover, we place the reader at the core of the framework, in the feed-reader position, while consuming news. in terms of automated detection systems, we proposed a fake news detection model integrating a dictionary-based approach with machine learning techniques and offering optimized feature sets applicable to agglutinative languages. we comparatively analyzed the findings with several classification models and demonstrated that machine learning algorithms, when used together with dictionary-based findings, yield high scores for both precision and recall. consequently, we believe that, once operational in the field, the proposed workflow can be extended to support other news elements such as photographs and videos. with the help of social network analysis (sna), it may be possible to stop or slow the spread of fake news as it emerges.
during all the experiments we did, this work also highlighted several tasks as future research directions such as: • the studies can be deepened to mathematically categorize the fake news types and the dissemination characteristics of each type can be analyzed. • the workflow has the potential to provide an automated verification platform for all news content existing in digital libraries to promote media literacy. endnotes 1 m. connor sullivan, “why librarians can’t fight fake news,” journal of librarianship and information science 51, no. 4 (december 2019): 1146–56, https://doi.org/10.1177/0961000618764258. 2 “definition of 'news',” available at: https://www.collinsdictionary.com/dictionary/english/news 3 dominic difranzo and kristine gloria-garcia, “filter bubbles and fake news,” xrds: crossroads, the acm magazine for students 23, no. 3 (april 2017): 32–35, https://doi.org/10.1145/3055153. https://doi.org/10.1177/0961000618764258 https://www.collinsdictionary.com/dictionary/english/news https://doi.org/10.1145/3055153 information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 16 4 andrew guess, brendan nyhan, and jason reifler, “selective exposure to misinformation: evidence from the consumption of fake news during the 2016 us presidential campaign,” european research council 9, no. 3 (2018): 4; eni mustafaraj and p. takis metaxas, “the fake news spreading plague: was it preventable?” proceedings of the 2017 acm on web science conference, (june 2017): 235–39, https://doi.org/10.1145/3091478.3091523. 5 jana laura egelhofer and sophie lecheler, “fake news as a two-dimensional phenomenon: a framework and research agenda,” annals of the international communication association 43, no. 2 (2019): 97–116, https://doi.org/10.1080/23808985.2019.1602782. 6 hannah rashkin et al., “truth of varying shades: analyzing language in fake news and political fact-checking,” proceedings of the 2017 conference on empirical methods in natural language processing, (2017): 2931–37. 7 soroush vosoughi, deb roy, and sinan aral, “the spread of true and false news online,” science 359, no. 6380 (2018): 1146–51, https://doi.org/10.1126/science.aap9559. 8 xinyi zhou and reza zafarani, “a survey of fake news: fundamental theories, detection methods, and opportunities,” acm computing surveys (csur) 53, no. 5 (2020): 1–40, https://doi.org/10.1145/3395046. 9 s. f. kattimani, praveenkumar kumbargoudar, and d. s. gobbur, “training of the library professionals in digital era: key issues” (2006), https://ir.inflibnet.ac.in:8443/ir/handle/1944/1234. 10 lynn silipigni connaway et al., “digital literacy in the era of fake news: key roles for information professionals,” proceedings of the association for information science and technology 54, no. 1 (2017): 554–55, https://doi.org/10.1002/pra2.2017.14505401070. 11 matthew c. sullivan, “libraries and fake news: what’s the problem? what’s the plan?,” communications in information literacy 13, no. 1 (2019): 91–113, https://doi.org/10.15760/comminfolit.2019.13.1.7. 12 wayne finley, beth mcgowan, and joanna kluever, “fake news: an opportunity for real librarianship,” ila reporter 35, no. 3 (2017): 8–12; american library association, “resolution on access to accurate information,” 2018; nick rochlin, “fake news: belief in post-truth,” library hi tech 35, no. 
3 (2017): 386–92, https://doi.org/10.1108/lht-03-2017-0062; linda jacobson, “the smell test: in the era of fake news, librarians are our best hope,” school library journal 63, no. 1 (2017): 24–29; angeleen neely–sardon, and mia tignor, “focus on the facts: a news and information literacy instructional program,” the reference librarian 59, no. 3 (2018): 108–21, https://doi.org /10.1080/02763877.2018.1468849; claire wardle and hossein derakhshan, “information disorder: toward an interdisciplinary framework for research and policy making,” council of europe report 27 (2017). 13 ifla, “how to spot fake news,” 2017. https://doi.org/10.1145/3091478.3091523 https://doi.org/10.1080/23808985.2019.1602782 https://doi.org/10.1145/3395046 https://doi.org/10.1002/pra2.2017.14505401070 https://doi.org/10.15760/comminfolit.2019.13.1.7 https://www.emerald.com/insight/publication/issn/0737-8831 https://doi.org/10.1108/lht-03-2017-0062 information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 17 14 jane mandalios, “radar: an approach for helping students evaluate internet sources,” journal of information science 39, no. 4 (2013): 470–78, https://doi.org/10.1177/0165551513478889; sarah blakeslee, “the craap test,” loex quarterly 3, no. 3 (2004):4. 15 victoria l. rubin and niall conroy, “discerning truth from deception: human judgments and automation efforts,” first monday 17, no. 5 (2012), https://doi.org/10.5210/fm.v17i3.3933; verónica pérez-rosas et al., “automatic detection of fake news,” arxiv preprint arxiv:1708.07104 (2017). 16 justin p. friesen, troy h. campbell, and aaron c. kay, “the psychological advantage of unfalsifiability: the appeal of untestable religious and political ideologies,” journal of personality and social psychology 108, no. 3 (2015): 515–29, https://doi.org/10.1037/pspp0000018. 17 tanja pavleska et al., “performance analysis of fact-checking organizations and initiatives in europe: a critical overview of online platforms fighting fake news,” social media and convergence 29 (2018). 18 yasmine lahlou, sanaa el fkihi, and rdouan faizi, “automatic detection of fake news on online platforms: a survey,” (paper, 2019 1st international conference on smart systems and data science (icssd), rabat, morocco, 2019), https://doi.org/10.1109/icssd47982.2019.9002823; christian janze, and marten risius, “automatic detection of fake news on social media platforms,” (paper, pasific asia conference on information systems (pacis), 2017); torstein granskogen, “automatic detection of fake news in social media using contextual information” (master’s thesis, norwegian university of science and technology (ntnu), 2018). 19 jacob l. nelson and harsh taneja, “the small, disloyal fake news audience: the role of audience availability in fake news consumption,” new media & society 20, no. 10 (2018): 3720–37, https://doi.org/10.1177/1461444818758715; philip n. howard et al., “social media, news and political information during the us election: was polarizing content concentrated in swing states?,” arxiv preprint arxiv:1802.03573 (2018); alexandre bovet and hernán a. makse, “influence of fake news in twitter during the 2016 us presidential election,” nature communications 10, no. 7 (2019): 1–14, https://doi.org/10.1038/s41467-018-07761-2. 20 lina zhou et al., “automating linguistics-based cues for detecting deception in text-based asynchronous computer-mediated communications,” group decision and negotiation 13, no. 
1 (2004): 81–106, https://doi.org/10.1023/b:grup.0000011944.62889.6f; myle ott et al., “finding deceptive opinion spam by any stretch of the imagination,” arxiv preprint arxiv:1107.4557 (2011); rada mihalcea and carlo strapparava, “the lie detector: explorations in the automatic recognition of deceptive language,” (paper, proceedings of the acl-ijcnlp 2009 conference short papers, (2009): association for computational linguistics, 309–12); julia b. hirschberg et al., “distinguishing deceptive from non-deceptive speech,” (2005), https://doi.org/10.7916/d8697c06. 21 victoria l. rubin, yimin chen, and nadia k. conroy, “deception detection for news: three types of fakes,” proceedings of the association for information science and technology 52, no. 1 (2015): 1–4, https://doi.org/10.1002/pra2.2015.145052010083; david m. markowitz, and jeffrey t. hancock, “linguistic traces of a scientific fraud: the case of diederik stapel,” plos https://doi.org/10.1177/0165551513478889 https://doi.org/10.5210/fm.v17i3.3933 https://psycnet.apa.org/doi/10.1037/pspp0000018 https://doi.org/10.1109/icssd47982.2019.9002823 https://doi.org/10.1177%2f1461444818758715 https://doi.org/10.1038/s41467-018-07761-2 https://doi.org/10.1023/b:grup.0000011944.62889.6f https://doi.org/10.7916/d8697c06 https://doi.org/10.1002/pra2.2015.145052010083 information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 18 one 9, no. 8 (2014): e105937, https://doi.org/10.1371/journal.pone.0105937; jing ma et al., “detecting rumors from microblogs with recurrent neural networks,” (paper, proceedings of the 25th international joint conference on artificial intelligence (ijcai 2016), (2016): 3818–24), https://ink.library.smu.edu.sg/sis_research/4630. 22 kai shu et al., “fake news detection on social media: a data mining perspective,” acm sigkdd explorations newsletter 19, no. 1 (2017): 22–36, https://doi.org/10.1145/3137597.3137600. 23 eugenio tacchini et al., “some like it hoax: automated fake news detection in social networks,” arxiv preprint arxiv:1704.07506 (2017). 24 julio c.s. reis et al., “supervised learning for fake news detection,” ieee intelligent systems 34, no. 2 (2019): 76–81, https://doi.org10.1109/mis.2019.2899143. 25 victoria l. rubin et al., “fake news or truth? using satirical cues to detect potentially misleading news,” (paper, proceedings of the second workshop on computational approaches to deception detection, (2016): 7–17); francesco barbieri, francesco ronzano, and horacio saggion, “is this tweet satirical? a computational approach for satire detection in spanish,” procesamiento del lenguaje natural, no. 55 (2015): 135-42; soujanya poria et al., “a deeper look into sarcastic tweets using deep convolutional neural networks,” arxiv preprint arxiv:1610.08815 (2016). 26 lei guo and chris vargo, “’fake news’ and emerging online media ecosystem: an integrated intermedia agenda-setting analysis of the 2016 us presidential election,” communication research 47, no. 2 (2020): 178–200, https://doi.org/10.1177/0093650218777177. 27 natali ruchansky, sungyong seo, and yan liu, “csi: a hybrid deep model for fake news detection,” proceedings of the 2017 acm on conference on information and knowledge management, (november 2017): 797–806, https://doi.org/10.1145/3132847.3132877. 
28 yaqing wang et al., “eann: event adversarial neural networks for multi-modal fake news detection,” proceedings of the 24th acm sigkdd international conference on knowledge discovery & data mining, (2018): 849–57, https://doi.org/10.1145/3219819.3219903; james w. pennebaker, martha e. francis, and roger j. booth, “linguistic inquiry and word count: liwc 2001”, mahway: lawrence erlbaum associates 71, no. 2001 (2001). 29 “facebook, twitter may face more scrutiny in 2019 to check fake news, hate speech,” accessed may 17, 2020, available: https://www.huffingtonpost.in/entry/facebook-twitter-may-facemore-scrutiny-in-2019-to-check-fake-news-hate-speech_in_5c29c589e4b05c88b701d72e. 30 nitesh v. chawla et al., “smote: synthetic minority over-sampling technique,” journal of artificial intelligence research 16, (2002): 321–57, https://doi.org/10.1613/jair.953. 31 uğur mertoğlu and burkay genç, “lexicon generation for detecting fake news,” arxiv preprint arxiv:2010.11089 (2020). 32 burak bezirci, and asım egemen yilmaz, “metinlerin okunabilirliğinin ölçülmesi üzerine bir yazilim kütüphanesi ve türkçe için yeni bir okunabilirlik ölçütü,” dokuz eylül üniversitesi https://doi.org/10.1371/journal.pone.0105937 https://ink.library.smu.edu.sg/sis_research/4630 https://doi.org/10.1145/3137597.3137600 https://doi.org10.1109/mis.2019.2899143 https://doi.org/10.1177%2f0093650218777177 https://doi.org/10.1145/3132847.3132877 https://doi.org/10.1145/3219819.3219903 https://www.huffingtonpost.in/entry/facebook-twitter-may-face-more-scrutiny-in-2019-to-check-fake-news-hate-speech_in_5c29c589e4b05c88b701d72e https://www.huffingtonpost.in/entry/facebook-twitter-may-face-more-scrutiny-in-2019-to-check-fake-news-hate-speech_in_5c29c589e4b05c88b701d72e https://doi.org/10.1613/jair.953 information technology and libraries december 2020 automated fake news detection in the age of digital libraries | mertoğlu and genç 19 mühendislik fakültesi fen ve mühendislik dergisi 12, no. 3 (2010): 49–62, https://dergipark.org.tr/en/pub/deumffmd/issue/40831/492667. 33 robert gunning, “the technique of clear writing,” revised edition, new york: mcgraw hill, 1968. 34 ender ateşman, “türkçede okunabilirliğin ölçülmesi,” dil dergisi 58, no. 71–74 (1997). https://dergipark.org.tr/en/pub/deumffmd/issue/40831/492667 abstract introduction literature review methodology framework model dataset features experimental results and discussion conclusion and future work endnotes library-authored web content and the need for content strategy articles library-authored web content and the need for content strategy courtney mcdonald and heidi burkhardt information technology and libraries | september 2019 8 courtney mcdonald (crmcdonald@colorado.edu) is learner experience & engagement librarian and associate professor, university of colorado at boulder. heidi burkhardt (heidisb@umich.edu) is web project manager & content strategist, university of michigan. abstract increasingly sophisticated content management systems (cms) allow librarians to publish content via the web and within the private domain of institutional learning management systems. “libraries as publishers” may bring to mind roles in scholarly communication and open scholarship, but the authors argue that libraries’ self-publishing dates to the first “pathfinder” handout and continues today via commonly used, feature-rich applications such as wordpress, drupal, libguides, and canvas. although this technology can reduce costly development overhead, it also poses significant challenges. 
these tools can inadvertently be used to create more noise than signal, potentially alienating the very audiences we hope to reach. no cms can, by itself, address the fact that authoring, editing, and publishing quality content is both a situated expertise and a significant, ongoing demand on staff time. this article will review library use of cms applications, outline challenges inherent in their use, and discuss the advantages of embracing content strategy. introduction we tend to look at content management as a digital concept, but it’s been around for as long as content. for as long as humans have been creating content, we’ve been searching for solutions to manage it. the library of alexandria (300 bc to about ad 273) was an early attempt at managing content. it preserved content in the form of papyrus scrolls and codices, and presumably controlled access to them. librarians were the first content managers.1 (emphasis added) content is, and has always been, central to the mission of libraries. content is physical, digital, acquired, purchased, leased, subscribed, and created. “libraries as publishers” may bring to mind roles in scholarly communication and open scholarship, but the authors argue that libraries’ selfpublishing dates to the first mimeographed ‘pathfinder’ handout and continues today via commonly used, feature-rich web content management systems (cmss). libraries use these cmss to support research, teaching, and learning in a variety of day-to-day operations. the sophisticated and complex infrastructure surrounding web-based library content has evolved from the singular, independently hosted and managed “library website” into a “library web ecosystem” comprised of multiple platforms, including integrated library systems, institutional repositories, cmss, and others. multiple cms applications, whether open-source (e.g., wordpress, drupal), institutionally supported (e.g., canvas, blackboard) or library-specific (e.g., springshare’s libguides), are employed by most libraries to power the library’s website and research guides, as well as to make their collections, in any and all formats, discoverable and accessible. mailto:crmcdonald@colorado.edu mailto:heidisb@umich.edu) library-authored web content and the need for content strategy | mcdonald and burkhardt 9 https://doi.org/10.6017/ital.v38i3.11015 library staff at all levels create and publish content through these cms platforms, an activity that is critical to our users discovering what we offer and accomplishing their goals. the cms removes technical bottlenecks and enables subject matter experts to publish content without coding expertise or direct access to a server. this disintermediation has many benefits, enabling librarians to share and interact directly with their communities, and reducing costly development overhead. as with any powerful technology that’s simple to use, effectively implementing a cms is not without pitfalls. through these tools, we can inadvertently create more noise than signal, potentially alienating the very audiences we hope to reach. further, effective management of content and workflows across and among so many platforms is not trivial. distributing web content creation among many authors can quickly lead to numerous challenges requiring expert attention. governance strategies for library-authored web content are rarely addressed in the library literature. 
this article will review library use of cms applications, outline challenges inherent in their use, and discuss the advantages of embracing content strategy as a framework for library-authored web content governance. content management systems: a definition any conversation on this topic is complicated by the fact that there is both misunderstanding and disagreement regarding the definition of a content management system. in their survey of 149 libraries covering day-to-day website management, including staffing, infrastructure, and organizational structures, bundza et al. observed “[w]hen reviewing the diverse systems mentioned, it is obvious that people defined cmss very broadly.”2 connell surveyed over 600 libraries regarding their use of cmss, defined as “website management tools through which the appearance and formatting is managed separately from content, so that authors can easily add content regardless of web authoring skills.”3 a few respondents “indicated their cms was dreamweaver or adobe contribute” and another “self-identified as a non-cms user but then listed drupal as their web management tool.”4 while the authors find the survey definition itself slightly ambiguous (likely in the service of clarity for survey respondents), we also believe that these responses may hint at an underlying and widespread lack of clarity regarding the technology itself. an early report on potential library use of content management systems by browning and lowndes in 2001 opined that “a cms is not really a product or a technology. it is a catch-all term that covers a wide set of processes that will underpin the ‘next generation’ large-scale website.”5 while technological developments over the last twenty years reveal some limitations to this early characterization, we believe it is fundamentally sound to define the cms primarily through its functions. fulton defined a cms as “an application that enables the shared creation, editing, publish ing, and management of digital content under strict administrative parameters.”6 the authors concur with barker’s (2016) similarly task-based definition: “a content management system (cms) is a software package that provides some level of automation for the tasks required to effectively manage content . . . usually server-based, multi-user . . . [and] interact[ing] with content stored in a information technology and libraries | september 2019 10 repository.”7 browning & lowndes defined the key tasks, or functions, of the cms as encompassing four major categories: authoring, workflow, storage, and publishing.8 barker (2016) also outlined “the big four” of content management as: enterprise content management (e.g., intranets), digital asset management (dam), records management, and web content management (wcm), with wcm defined as “the management of content primarily intended for mass delivery via a website. wcm excels at separating content from presentation and publishing to multiple channels.”9 for the purpose of clarity within the scope of this article, our discussion will primarily focus on content management systems as they are used for wcm, acknowledging that some principles may apply in varying degrees to other categories. 
the cms and library websites the library literature reveals that, generally speaking, libraries began the transition from telnet and gopher catalog interfaces to launching websites in the 1990s.10 case studies of library websites from this period through the mid-2000s report library website pages increasing at a rapid rate, in some cases doubling or tripling on a yearly basis. 11 a comment from dallis and ryner in regard to their own case study provides a sense of what might be considered typical during this period: “the management of the site was decentralized, and it grew to an estimated 8,000 pages over a period of five years.”12 this proliferation, in turn, spurred focused interest in content management. “web content management (wcm) as a branch of content management (cm) gained importance during the web explosion in the mid-1990s.”13 as early as 2001 there were published laments regarding the state of library websites: institutions are struggling to maintain their web sites. out of date material, poor control over design and navigation, a lack of authority control and the constriction of the webmaster (or even web team) bottleneck will be familiar to many in the he/fe [higher education / further education] sector. the pre-millennial web has been characterized by highly manual approaches to maintenance; the successful and sustainable post-millennial web will have significant automation. one vehicle by which this can be achieved is the cms.14 mach wrote: the special concerns of web maintenance have only multiplied with the increased size and complexity of many library web sites. not only does the single webmaster model no longer work for most libraries, but the static html page is also in jeopardy. many overworked web librarians dream about the instant content updates possible with database-driven site or content management software. but while these technical solutions save staff time, they demand a fair amount of compromise.15 in 2010, fulton noted, “at one time, all institutions [mentioned in her literature review] could effectively manage their sites outside of a cms. however, changing standards combined with uncontrollable growth patterns persuaded them to take steps to prevent prolonged chaos.”16 library-authored web content and the need for content strategy | mcdonald and burkhardt 11 https://doi.org/10.6017/ital.v38i3.11015 changing technology, accessibility, and literacy throughout the early 2000s, advances in consumer technology and in web development (e.g., css, html 5, bootstrap) together with the need to comply with web-accessibility standards resulted in a gradual move from static, hand-coded sites to other solutions. in 2005, yu stated, “today’s content management solution is either a sophisticated software-based system or a databasedriven application.”17 after a detailed explanation of the cumbersome process of managing and updating a static site using microsoft’s frontpage, kane and hegarty noted, “the opportunity to migrate the site to a content management system provided a golden opportunity . . . to bring the code into line with best practice.”18 this transition also coincided with the growth of viable cms options, particularly open-source tools. 
black stated in 2011: “in the past few years, the field of open-source cmss has increased, making it more likely that a library will find a viable cms in the existing marketplace that will meet the organization’s needs.”19 in 2013, comeaux and schmetzke replicated an earlier study of library websites’ accessibility, reviewing the homepages of library websites at 56 institutions offering ala-accredited library and information science programs using bobby, an automated web-accessibility checker. they found that cms-powered library websites had a higher average of approved pages and a lower average of errors per page than those not powered by a cms.20 in a 2017 study, comeaux manually reviewed 37 academic library websites (members of the association of southeastern research libraries), and found that approximately three-quarters of cms-driven sites were responsive, as compared to only one-quarter of sites without a cms.21 accessibility also manifests itself on the web in other ways. it is important to consider what we know about literacy and how people read online. the ability to write using plain language, in addition to other essential techniques for effective web writing, is an important aspect of accessibility that must be addressed in tandem with compliance with industry standards such as the web content accessibility guidelines (wcag, https://www.w3.org/tr/wcag20/). a summary of recent results for the program for the international assessment of adult competencies (piaac, https://nces.ed.gov/surveys/piaac/) survey, administered to us adults, reported “the majority of people may struggle to read through a ‘simple’ bullet-point list of rules . . . nearly 62% of our population might not be able to read a graph or calculate the cost of shoes reliably.”22 blakiston succinctly observed: “on the web, scanning and skimming is the default.”23 these trends have led to an increasing push to adopt “plain language” by governmental agencies and others.24 skaggs stated, “adopt plain language throughout your website. plain language focuses on understanding and writing for the user’s goals, making content easily scannable for the user, and writing in easy to understand sentences.”25 library websites and the challenges of a distributed environment in 2011, black pointed out one of the chief advantages to using a cms: “cmss support a distributed content model by separating the content from the presentation and giving the content provider an easy to use interface for adding content”.26 empowerment to focus on special expertise is noted as another benefit: “chief among the efficiencies gained in using a cms is the simple act of giving content authors the tools they need to create webpages and, most importantly, to do so without requiring the technical knowledge that used to be a part of webpage development. designers can design, writers can write, editors can edit, and technology folks can manage the cms and support https://www.w3.org/tr/wcag20/ https://nces.ed.gov/surveys/piaac/ information technology and libraries | september 2019 12 its users.”27 browning and landes agreed: “the concept of ‘self-service authoring’, whereby staff do not need special skills to edit the content for which they are responsible, can be regarded as a major step towards acceptance of the web as a medium for communication by non-web specialists. 
providing this is the key advantage of a cms.”28 librarians quickly found, however, that while the adoption of a cms could empower more subject matter experts to participate in web content development and address technical issues such as responsive design and compliance with accessibility standards, the transition to a distributed model of content creation, oversight, and maintenance resulted in larger organizational ramifications. in 2006, approximately a decade following libraries’ general move to the web and at an early stage for cms adoption, guenther (2006) cautioned: “a cms is only a tool. purchasing the very best cms with every bell and whistle available will be a useless exercise without a solid plan to guide people and processes around its use.”29 this same article went on to observe: what makes using a cms a tremendous advantage is exactly what makes it a potential nightmare. a cms can make website development really easy; that's the good part. the bad part is, it makes webpage development really easy. one of the first issues you encounter is having to suddenly support a lot more content authors posting a lot more content. what once was an environment with limited activity can become a web development environment requiring considerably more oversight and technical support. having more hands stirring the pot, so to speak, is wrought with all kinds of challenges. 30 untenable growth this model of distributed content creation, in which authorship is undertaken by numerous parties across the organization, generally results in a rapidly increasing quantity of content without necessarily guaranteeing consistent quality. a review of the literature reveals that, more commonly, a distributed model leads to a lack of consistency and focus in library web content’s structure and execution. some papers underscore the problematic quality of the highly individualized nature of the content: “the sheer mass of [libraries’] public web presence has reached the point where maintenance is a problem. often the webpages grew out of the personal interests of staff members, who have since left for other jobs for other responsibilities or simply retired.”31 blakiston stated, “for a number of years, librarians were motivated to create more web content. it was assumed that adding more content was a service for library users, and it was also seen as a way to improve their web skills and demonstrate their fluency with technology.”32 similarly, chapman and demsky described how the university of michigan library website grew “in an organic fashion” and noted, “[a]s in many places, the library’s longstanding attitude toward the web was that more was more and that there was really no harm in letting the website develop however individual units and librarians thought best.”33 other papers described “authority and decision-making issues . . . differing opinions, turf struggles or a lack of communication . . . a shortage of time and motivation, general inertia, and resistance to change on the part of content authors.”34 iglesias noted, “some librarians will always be more comfortable creating webpages from scratch, fearing a loss of control. 
the library as a whole must decide if the core responsibility of librarians is to create content or to create websites.”35 library-authored web content and the need for content strategy | mcdonald and burkhardt 13 https://doi.org/10.6017/ital.v38i3.11015 newton and riggs stated, “this approach to content appears to be at odds with the role of librarians as leaders in information management practices and in supporting users to find , filter and critically evaluate information.”36 in her article “editorial and technological workflow tools to promote website quality,” morton-owens discussed several studies measuring the severe impact of even small flaws (such as typographical errors) on users’ judgements of a website’s credibility, and, by extension, of the organization’s credibility: “users’ experience of a website leads them to attribute characteristics of competence and trustworthiness to the sponsoring organization.”37 a. paula wilson, citing mcconnell and middleton, summarized the potential pitfalls inherent in a distributed model in which empowerment of content creators overshadows a unified vision, strategy, and approach to library-wide content management: a decentralized model without the use of guidelines, standards or templates will eventually fail. the website may experience inconsistency in presentation and navigation, outdated and incorrect information, and gaps in content, and its webpages maybe noncompliant in usability and accessibility design so much so that users cannot find information.38 inconsistent voice and lack of organizational unity in addition to such compounding factors and in contrast to journalistic practice, “libraries lack an editorial culture where content production and management is viewed as a collective rather than a personal effort.”39 morton-owens noted: “the concept of editing is not yet consistently applied to websites unless the site represents an organization that already relies on editors (like a newspaper)—but it is gaining recognition as a best practice. if the website is the most readily available public face of an institution, it should receive editorial attention just as a brochure or fundraising letter would.”40 in an environment with distributed authorship lacking a strong and consistent editorial culture, an organization's “voice” can quickly deteriorate. in web writing, voice is often defined as personality. blakiston stated: “the written content you provide plays an essential role in defining your library as an organization.”41 young went further, aligning voice with values, and arguing “[a]ny item of content that your library creates—an faq, a policy page, or a facebook post—should be conveyed in the voice of your library and should communicate the values of your library. a combined expression of content and values defines the voice of your organization.” 42 in their 2006 article “cms/cms: content management system/change management strategies,” goodwin et al. insightfully explore organizational challenges: the effort of developing a unified web presence reveals where the organization itself lacks unity . . . effective use of a content management system requires an organized and comprehensive consolidation of library resources, which emphasizes the need for a different organizational model and culture—one that promotes thinking about the library as a whole, sharing and collaboration.43 fulton built on this concept: “disunity in the library’s web interface could signify disunity within the institution. 
on the other hand, a harmonious web presence suggests an institution that works well together.”44 young drew an inherent connection between a strongly unified organizational identity and a consistent and coherent “content strategy”: information technology and libraries | september 2019 14 while libraries in general can draw on decades or centuries of cultural identity, each individual library may wish to convey a unique set of attributes that are appropriate for unique contexts. in this way, the element of “organizational values” inherent to content strategy signals a larger visioning project for determining the mission, vision, and values of your library. if these elements are already in place, then the work of content strategy can easily be adapted to fit existing values statements. otherwise, content strategy and organizational values can develop as a joint initiative. 45 library websites and content strategy content strategy is an emerging discipline that brings together concepts from user experience design, information architecture, marketing, and technical writing. content strategy encompasses activities related to creating, updating, and managing content that is intentional, useful, usable, well-structure, easily found, and easily understood, all while supporting an organization’s strategic goals.46 browning and lowndes recognized as early as 2002 that strategy would be required as the variety of communication channels for libraries increased: “as local information systems integrate and become more pervasive, self-service authoring extends to the concept of ‘write once, re-use anywhere’, in which the web is treated as just another communication channel along with email, word processor files and presentations, etc.”47 more than a decade later, in the introductory column to a 2013 themed issue of information outlook focused on content strategy, hales stated: content strategy is a field for which information professionals and librarians are ideally suited, by virtue of both their education and temperament. content, after all, is another word for information, and librarians and information professionals have been developing strategies for acquiring, managing, and sharing information for centuries. today, however, information is available to more people in more forms and through more channels than ever before, making content strategies a necessity for organizations rather than an afterthought.48 jones and farrington posited a common refrain for stating the importance of content strategy for librarianship: “library website content must be viewed in much the same way as a physical collection” and the “library website, to apply s. r. ranganathan’s fifth law, is a growing organism and must be treated as such, especially with the complexity of web content.”49 claire rasmussen drew connections between ranganathan’s laws and content strategy in a blog post, pointing out that web content represents an additional set of responsibilities to be managed: “for hundreds of years, librarians have been the primary caretakers of the content corpus. 
but somebody needs to care for the content that never makes it into a library’s collections, too.”50 blakiston & mayden provided a helpful overview of content strategy and its application in libraries in their article “how we hired a content strategist (and why you should too),” finding many points of connection between skill sets essential to content strategy and those commonly possessed by librarians: librarians who have worked in public services may have the needed skills to ask good questions and find out what users need . . . professionals doing this kind of work came from backgrounds including communications, english and library science . . . desirable library-authored web content and the need for content strategy | mcdonald and burkhardt 15 https://doi.org/10.6017/ital.v38i3.11015 qualifications for . . . content strategist[s] . . . [include] strategic planning, web skills and project management.51 the circumstances that motivated them to propose and eventually hire a dedicated content strategist at the university of arizona libraries hearken back to the discussion earlier in this article regarding the increasing complexity of web librarianship: “the web product manager had independently coordinated all user research and content strategy work. the idea of both managing [a major web redesign project] and leading these other important areas was not realistic.”52 datig also pointed to increasing day-to-day responsibilities when advocating for the importance of content strategy for librarians with outreach and marketing responsibilities: “lack of time, and a desire for that time to be well spent, is a huge concern for all librarians involved in library outreach and marketing . . . content strategy is an important and overlooked aspect of maintaining an effective and vital library outreach program.”53 hackett reflected on her role as web content strategist in a blog post after a recent website migration, noting: “moving forward with a content strategy . . . will ensure that university libraries’ website is useful, usable, and discoverable—now and in the future.”54 yet, while the need for strategy is hard to dispute and librarians are theoretically well suited for web content strategy work, blakiston & mayden noted that explicit organizational support for content strategy in libraries remained limited: “despite the growing popularity of content strategy as a discipline, only a handful of libraries had hired staff dedicated to this role at the time we proposed adding a content strategist to our staff.”55 conclusion this article has traced the history of library adoption of web content management systems, the evolution of those systems, and the corresponding challenges as libraries have attempted to manage increasingly prolific content creation workflows across multiple, divergent cms platforms. what is the library website, anyway? while some variation would to be expected from institution to institution, largely missing from the conversation is agreement on the purpose and aim of the library website writ large. this lack of definition, together with the technological and growth-related issues already discussed, has doubtless contributed to the confusion. after all, how would we know if we are “building it right” if we are not sure what we are meant to be building in the first place? 
in response to this ambiguity, the following definition was proposed: the library website is an integrated representation of the library, providing continuously updated content and tools to engage with the academic mission of the college/university. it is constructed and maintained for the benefit of the user. value is placed on consump tion of content by the user rather than production of content by staff.56 effective management of library web content requires dedicated resources and clear authority inconsistent processes, disconnects between units, varying constituent goals, and vague or ineffective wcm governance structures are recurrent themes throughout the literature. as cms applications have enabled broader access to web publishing, models of library web management information technology and libraries | september 2019 16 have moved away from workflows structured around strictly technical tasks and permissions, and have instead migrated toward consensus-based, revolving committee structures. while greater involvement of subject matter experts has been noted as a positive earlier in this article, other challenges have also been acknowledged. mcdonald, haines, and cohen stated: “in the context of web design and governance, consensus is a blocker to nimble, standards-based, user-focused action.”57 library website as an integrated representation of the organization as previously discussed, web content governance issues often signal a lack of coordination, or even of unity, across an organization. demsky stated, “we won’t be fully successful until we see it as our website” (emphasis added).58 internal documentation from the university of michigan library emphasized the value of “publicly represent[ing] ourselves as one library,” and stated: the more people are provided with clear communication that shows our offerings and unique items are part of the . . . library—rather than confuse users by making primary attribution to a sub-library, collection, or service point—the more people will recognize and understand the library's tremendous, overall value.59 content strategy and the case for library-authored content no cms can, by itself, address the fact that authoring, editing, and publishing quality content is both a situated expertise and a significant, ongoing demand on staff time. each platform, resource, or database brings its own visual style, terminology, tone and functionality. they are all parts of the library experience, which in turn is one part of the student, research or teaching experience. an understanding of content strategy is critical if staff are to see the connections between their own content and the rest of the content delivered by the organization.60 libraries must proactively embrace and employ best practices in content strategy and in writing for the web to effectively address considerations of literacy and to present a consistent voice for the organization. these practices position libraries to fully realize the promise of content management systems through embracing an ethos of library-authored content. the authors define library-authored content as collectively owned and authored content that represents the organization as a whole. 
library-authored content is: • collaboratively planned, written, and edited with participation of both subject matter experts and domain experts (i.e., library staff with expertise in content strategy, web librarianship); • carefully drafted to optimize for clarity within the context of the end-user; • current, reviewed on a recurrent schedule, and regularly updated; • consistent across the ecosystem of cms applications and other platforms, including print materials and social media; • compliant with industry standards (including but not limited to those related to accessibility), and with relevant internal brand standards; and • centrally managed as the primary responsibility of one or more domain experts. library-authored web content and the need for content strategy | mcdonald and burkhardt 17 https://doi.org/10.6017/ital.v38i3.11015 in order for libraries to meet the ever-increasing demands on our resources to produce timely, user-centered content that advances our missions for supporting teaching, research, and learning, a cultural shift toward a more collective, collaborative model of web content management and governance is necessary. content strategy provides a flexible, adaptable framework for libraries to more efficiently and effectively leverage the power of multiple cms platforms, to present engaging on-point content, and to provide appropriate, scaffolded support for researchers at all levels — with a team of one or a team of many. endnotes 1 deane barker, “what web content management is (and isn’t),” in web content management (o’reilly media, inc., 2016), sec. what web content management is (and isn’t), https://learning.oreilly.com/library/view/web-content-management/9781491908112/. 2 maira bundza, patricia fravel vander meer, and maria a. perez-stable, “work of the web weavers: web development in academic libraries,” journal of web librarianship 3, no. 3 (september 15, 2009): 252, https://doi.org/10.1080/19322900903113233. 3 ruth sara connell, “content management systems: trends in academic libraries,” information technology and libraries 32, no. 2 (june 10, 2013): 43, https://doi.org/10.6017/ital.v32i2.4632. 4 connell, 46. 5 paul browning and mike lowndes, “jisc techwatch report: content management systems,” 2001, 3, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.9100. 6 camilla fulton, “library perspectives on web content management systems,” first monday 15, no. 8 (july 15, 2010): sec. review of literature, https://doi.org/10.5210/fm.v15i8.2631. 7 barker, “what web content management is (and isn’t),” sec. what is a content management system? 8 browning and lowndes, “jisc techwatch report,” 4. within a diagram outlining the major functions within the content life-cycle, they include the steps ‘review’, ‘archive’ and ‘dispose’ steps which, in the experience and observations of the authors, are often overlooked in general library web practice. 9 barker, sec. types of content management systems. 10 laura b. cohen, matthew m. calsada, and frederick j. 
public libraries leading the way: on educating patrons on privacy and maximizing library resources

t.j. lamanna
t.j. lamanna (professionalirritant@riseup.net) is an adult services librarian, cherry hill public library.

abstract

libraries are one of our most valuable institutions. they serve people of all demographics and provide services to patrons that they would not be able to get anywhere else. the list of services libraries provide is extensive and comprehensive, but there are significant gaps in what we can offer, particularly around advancing technology and patron privacy. library classes on privacy protection are a valiant effort, but we can do much more and lead the way, maybe not for the privacy industry but for our communities and patrons. building a strong foundational knowledge will help patrons apply these skills in their day-to-day lives and help them educate their families about common privacy issues. in this column, we'll explore some of the ways libraries can use their current resources, and offer ideas on how to maximize their effectiveness and fold new technologies into their operations.

though many libraries have policies on how they handle patron privacy, some of those policies aren't very strong, and staff often aren't trained in their details. fortunately, for libraries that don't yet have such policies, others, such as the san jose public library, offer their own as a framework.1 libraries that do have a strong, comprehensive policy must make sure they enforce it and regularly update it to account for newly released technologies. it's a daunting task, but as article vii of the library bill of rights says, "all people, regardless of origin, age, background, or views, possess a right to privacy and confidentiality in their library use. libraries should advocate for, educate about, and protect people's privacy, safeguarding all library use data, including personally identifiable information."2 this means we have a responsibility to our patrons to do everything in our power to protect them and to teach them to protect themselves. this requires a concerted effort not just from technology and it librarians, but from all library workers. a privacy policy means little if those on the front lines are unaware of it or unsure how it is to be implemented. all library staff should therefore understand the fundamental reasons behind library privacy policies and be trained in maintaining them. libraries may consider implementing this training during staff development days or offering independent training sessions as needed.

since the introduction of the patriot act, libraries have stopped collecting patrons' reading habits, but many integrated library systems (ils) still retain large amounts of patron information we may not even be aware of. i've been administering our ils for over two years, and i recently found yet another place where data was being unnecessarily retained that i hadn't noticed before. cases like this call for limiting personally identifiable information (pii) to what is strictly necessary. in limiting the pii gathered in the first place, library staff should consider the following questions: what information do libraries really need to collect to offer library cards or programming? does your library really need patrons' date of birth or gender? probably not. if not, you shouldn't be collecting it, and if you must, make sure you anonymize the data. metrics are vital to how libraries function, receive funding, and schedule programming; you can still use the information, but it should not be connected to a patron in any way.
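one way to keep those metrics while severing the connection to individual patrons is to pseudonymize identifiers before the data is ever used for statistics. the sketch below is illustrative only and is not a feature of any particular ils: the csv layout, the column names, and the handling of the secret key are assumptions made for the sake of example.

```python
import csv
import hashlib
import hmac

# a secret held only by the library; rotating it breaks linkability across
# reporting periods. (placeholder value for illustration, not a real key.)
SECRET_KEY = b"replace-with-a-locally-generated-secret"

def pseudonymize(patron_id: str) -> str:
    """return a keyed hash of a patron identifier.

    hmac-sha256 with a local secret prevents re-identification by simply
    hashing a list of known barcodes, which a plain sha-256 would allow.
    """
    return hmac.new(SECRET_KEY, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()

def anonymize_circulation_export(src_path: str, dest_path: str) -> None:
    """copy a circulation export, keeping only what the statistics need.

    assumes (hypothetically) columns named 'patron_barcode', 'item_call_number',
    and 'checkout_date'; every other column (name, address, birth date, gender)
    is dropped entirely.
    """
    keep = ["item_call_number", "checkout_date"]
    with open(src_path, newline="") as src, open(dest_path, "w", newline="") as dest:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dest, fieldnames=["patron_token"] + keep)
        writer.writeheader()
        for row in reader:
            record = {col: row[col] for col in keep}
            record["patron_token"] = pseudonymize(row["patron_barcode"])
            writer.writerow(record)

if __name__ == "__main__":
    anonymize_circulation_export("circulation_export.csv", "circulation_stats.csv")
```

because the hash is keyed, someone who knows a patron's barcode cannot recompute the token without the library's secret; rotating or discarding the key periodically removes even the ability to link the same patron across reporting periods.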
after educating staff, we can educate patrons on developing better and safer practices around personal privacy and security in their daily lives. practical examples range from teaching patrons how to create strong passwords and back up sensitive files to explaining how malware works and what the "cloud" actually is. this is a start, but it goes far beyond that. i've served many patrons who, even after taking courses on the subject, are overwhelmed by the security measures needed to protect themselves. this isn't necessarily a sign that our classes are ineffective, but it does suggest that new tactics are needed. let's look at a few examples.

another source of pii that we often overlook is our own security measures, such as closed-circuit television (cctv) and the security or police officers in our buildings.3 these are often either forgotten or treated as outside the purview of the library itself. as the college of policing notes, "cctv is more effective when directed at reducing theft of and from vehicles, while it has no impact on levels of violent crime."4 while there are justifications for bringing this technology into the library, cameras should be installed only where needed, taking great care not to point them at patron or staff computers. if cctv is needed, make sure to follow local retention laws and delete the footage as soon as its retention period has expired. this applies to all collected information: there is no reason to archive data past the point at which it can be destroyed, as doing so puts the library and its patrons in a compromised position.

law enforcement in the library is a tough thing to argue against in our current political climate, but studies have shown that police presence does little to deter crime and may disproportionately impact marginalized communities.5 consider the purpose of law enforcement personnel and whether their presence is actually necessary to the proper functioning of your library. should law enforcement arrive with a subpoena requiring you to turn over patron data, it's important to have a warrant canary in place that can be removed so your patrons understand what has happened.6

another way libraries can lead in protecting patron privacy, both inside and outside the library, is by supporting legislation that bans facial recognition software. this type of technology is becoming ubiquitous, but communities have already started pushing back, and libraries can be at the center of this movement. it has already been banned in oakland7 and san francisco8 (one of the homes of this technology), as well as somerville, massachusetts, and groups like the massachusetts library association have unanimously called for a moratorium on facial surveillance, the practice of recording people's faces to create user profiles.9 other states are moving down this path, and it is heartening to see libraries step up in front of something they know would damage our communities. we ought to be activists, standing on the front lines and showing our patrons our deepest commitment to them.

surely there are greater strides we can make, such as revising wifi policies. wifi is one of the most used services libraries offer, and many libraries don't use it to its full potential.
for instance, some libraries turn off their wifi when the building is closed, severely limiting patrons' access. it's a service we pay for, and there is no reason it shouldn't be available at all times. your it staff should make sure the wifi is secure, whether it's available at all hours or not. around-the-clock access to wifi is invaluable to users who need it for emergencies, for completing work, or for reaching important online services when the library is closed. while bandwidth is limited and it services must actively maintain wifi security, libraries should make sure it's available to the public as often as possible.

now that we've covered using bandwidth when we aren't open, let's talk about libraries with excess bandwidth. no resource should go unused in the library; we have limited budgets, and we should make sure every penny is used to serve our communities. one fantastic use of excess bandwidth, especially during closed hours, would be to set up a tor relay in your library. tor is an anonymity network that allows people to browse the internet with extra security and privacy in mind. a relay is quite easy to set up, and you can limit how much bandwidth it uses so you aren't shorting anyone in your library. the network is used by people such as journalists and activists who want to make positive change in the world and need a safe place to do so. some are concerned that the tor network is used for malicious purposes, but the tor project, the organization that runs the network, constantly works to ensure nothing like that is taking place. also, anything illicit you can find on the tor network is available on the regular internet, including on sites like facebook or craigslist, so the stigma around the network should be taken in context. the tor project routinely monitors the network and searches out illegal material (there are no hired killers on the tor network). given all this, you could help the network greatly by partitioning off just a small amount of your bandwidth.

libraries have the unique ability to be transformative. unlike other nonprofits or organizations, we have the ability to pivot: we can change direction as needed and pave the way for our communities as leaders in the movement toward patron privacy. i leave you with a quote from hardt and negri: "…we share common dreams of a better future."10 that should be our motto.

endnotes

1 "our privacy policy," san jose public library, accessed august 15, 2019, https://www.sjpl.org/privacy/our-privacy-policy.
2 "library bill of rights," american library association, last modified january 19, 2019, http://www.ala.org/advocacy/intfreedom/librarybill.
3 "importance of cctv in libraries for better security," accessed august 14, 2019, https://www.researchgate.net/publication/315098570_importance_of_cctv_in_libraries_for_better_security.
4 "effects of cctv on crime," college of policing, accessed august 14, 2019, http://library.college.police.uk/docs/what-works/what-works-briefing-effects-of-cctv-2013.pdf.
5 "do police officers in schools really make them safer?" accessed august 14, 2019, https://www.npr.org/2018/03/08/591753884/do-police-officers-in-schools-really-make-them-safer.
6 "warrant canary," wikipedia, https://en.wikipedia.org/wiki/warrant_canary.
7 sarah ravani, "oakland bans use of facial recognition technology, citing bias concerns," san francisco chronicle, july 17, 2019, https://www.sfchronicle.com/bayarea/article/oakland-bans-use-of-facial-recognition-14101253.php.
8 kate conger, richard fausset, and serge f. kovaleski, "san francisco bans facial recognition technology," new york times, may 14, 2019, https://www.nytimes.com/2019/05/14/us/facial-recognition-ban-san-francisco.html.
9 sarah wu, "somerville city council passes facial recognition ban," boston globe, june 27, 2019, https://www.bostonglobe.com/metro/2019/06/27/somerville-city-council-passes-facial-recognition-ban/sfaqq7mg3dgulxonbhscyk/story.html.
10 michael hardt and antonio negri, multitude: war and democracy in the age of empire (new york: the penguin press, 2009), 128.

lita president's message: sustaining lita

emily morton-owens

emily morton-owens (egmowens.lita@gmail.com) is lita president 2019-20 and the assistant university librarian for digital library development & systems at the university of pennsylvania libraries.

recently, at the 2019 midwinter meeting in seattle, ala decided to adopt sustainability as one of the core values of librarianship. the resolution includes the idea of a triple bottom line: "to be truly sustainable, an organization or community must embody practices that are environmentally sound and economically feasible and socially equitable." if you had thought of sustainability mainly in terms of the environment, you have plenty of company. i originally pictured it as an umbrella term for a variety of environmental efforts: clean air, waste reduction, energy efficiency. but in fact the idea encompasses human development in a broader sense. one definition of sustainability involves making decisions in the present that take into account the needs of the future.
of course our current environmental threats demand our attention, and libraries have found creative ways to promote environmental consciousness (myriad examples include books on bikes, seeking leed or passive house certification for library buildings, providing resources on xeriscaping, and many more). even if you're not presently working in a position that allows you to engage directly on the environment, though, the concept of sustainability turns out to permeate our work and values. the ideas of solving problems in a way that doesn't create new challenges for future people, developing society in a way that allows all people to flourish, and fostering strong institutions: these concepts all resonate with the work we do daily, not only in what we offer our users but also in how we work with each other.

as a profession, we have a history of designing future-proof systems (or at least attempting to). whenever i've been involved in planning a digital library project, one of the first questions on the table is "how do we get our data back out of this, when the time comes?" no matter how enamored we are of the current exciting new solution, we remember that things will look different in the future. library metadata schemas are all about designing for interoperability and reusability, including in new ways that we can't picture yet. someone who is unaccustomed to this kind of planning may see a high project overhead in these concerns, but we have consistently incorporated long-term thinking into our professional values due to the importance we place on free access, data preservation, and interoperability.

the triple-bottom-line approach, considering economic, social, and environmental factors, also influences the lita leadership. i recently announced the lita board's decision to reduce our in-person participation at ala midwinter for 2020, which is partly in response to ala's deliberations about reinventing the event starting in 2021. with all the useful collaboration technologies now at our fingertips, it is harder to justify requiring our members to meet in person more than once per year. it is possible for us to do great work, on a continuous and rolling basis, throughout the year. more importantly, we want to offer committee and leadership positions to members who may not be able to travel extensively, for personal or work reasons, especially when many do not receive financial support from their employers. (and, to come back around to environmental concerns for a moment, think of all the flights our in-person meetings require.) by being more flexible about what participation looks like, we sustain the effort that our members put into lita through a world of work that is changing.

financial sustainability is also a factor in our pursuit of a merger with alcts and llama. we are three smaller divisions based on professional role, not library type, who share interests and members. we also have similar needs and processes for running our respective associations.
unfortunately, lita has been on an unsustainable course with our budget for some time: we spend more than we take in annually, due to overhead costs and working within ala's processes and infrastructure. the lita board has engaged for many years on the question of how to balance our financial future with the fact that our programs require full-time staff, instructors, technology, printing, meeting rooms, etc. core, as the new merged division will be known, will allow us to correct that balance by combining our operations, streamlining workflows, and containing our costs. the staff will also be freed up to invest more effort in member engagement. we can't predict all the services that associations will offer in the future, but we know that, for example, online professional development is always needed, so we're ensuring that the plan allows it to continue. it is inspiring to talk about the new collaborations and subject-matter synergies that the merger will bring with it, but core will also achieve something important for sustaining a level of service to our membership.

at the ala level, the steering committee on organizational effectiveness (scoe) is also looking at ways to streamline the association's structure and make it more approachable and welcoming to new members. i would add that a simplified structure should make ala more accountable to members as well, which is crucial for positioning it as an organization worth devoting yourself to. these shifts are essential because member volunteers are what make ala happen, and we need a structure that invites participation from future generations of library workers.

taken together, these may look like a confusing flurry of changes. but librarians have evolved to be excellent at long-term thinking about our goals and values and how to pursue an exciting future vision based on what we know now and what tools (technology, people, ideas) we have at hand. we care about helping our users thrive and are able to take a broad view of what that encompasses. in particular, with the new resolution about sustainability, we're including the health of our communities and the security of our environment as a part of that mission. due to their innovative spirit and principled sense of commitment, our members are well-placed to lead transformations in their home institutions and to participate in the development of lita. as we weigh all these changes, we value the achievements of our association and its past leaders and members, and seek to honor them by making sure those successes carry on for our future colleagues.

editorial board thoughts: building a culture of resilience in libraries

paul swanson

paul swanson (swans062@umn.edu) is technology manager, minitex, and a member of the ital editorial board. © 2021.

we find ourselves a year and a half into the global pandemic, and libraries, just like the rest of the workforce, have been navigating through a drastic amount of change in a short amount of time with no real guideposts as to what may come next. libraries were completely shut down, library staff displaced, and library services transformed in a short period of time like they hadn't been before. we've also found that as we've reopened our libraries there are new patron and staff expectations.
it is expected that the changes we've enacted in the middle of a crisis will continue and will be folded into a new service delivery model. patrons have different expectations of libraries; library staff have different expectations of management and of the technology that drives library services and their day-to-day work. in order to meet these new expectations, we've embarked on a path of implementing more flexibility into our environments and workflows.

i feel, however, that the concept of flexibility misses the mark. flexibility is about being open to change and reacting to it, and possibly taking different paths to solving a common problem. flexibility, though, is part of a broader concept that i think we need to embrace, and that is resiliency. resiliency is defined as "tending to recover from or adjust easily to misfortune or change."1 we've all reacted and persevered through the last year and a half in our own way, but i would bet that those organizations that have a culture of resiliency have fared better. organizations that pride themselves on their flexibility can sometimes be more reactionary than anything: a problem arises, and they may be quick to change, to flex and react. a resilient organization can make the conscious decision to absorb the problem, make a change that can be rolled back when the time is right, or even make it permanent. such organizations have built a resilient foundation that allows them not just to be flexible when a problem arises, but to assimilate the solutions to the challenges that come up. flexibility is what you do; resilience is what you are.

to be more resilient with the technology that your library uses and depends on is really rather easy. it means doing all those things that you have always meant to do but haven't made the time for. you are building the foundation that your organization can rely on when change or misfortune strikes. that foundation is what enables you to make positive change through a crisis, and it is what can keep you from making reactionary decisions. let's talk through a few of the different places where you can instill resilience in your organization.

the stronger your change management practices are, the more resilient your technology will be: everything from governance for changes, to documentation around decision making, to a detailed history of the changes that have been made to your systems. it sounds like a lot because it is a lot, but you have to start now. you need to be able to think beyond today's current state of your systems and understand where you were in the past and how you got to where you are today. you cannot effectively make changes to your systems, especially in the midst of a crisis, if you don't know why you've done the things you've done up to this point. you also need that disaster recovery plan that you've always meant to start and keep updated. think of a disaster recovery plan as a set of decisions that have already been made, so you have the latitude to act in a crisis without the usual bureaucracy of change governance. in your disaster recovery plans, make sure you have defined roles for everyone who is involved, and make sure that communication plays just as important a role as resolving the crisis at hand.
you need to finish that disaster recovery plan and keep it updated if you are going to instill resilience into your technology and organization. when it comes down to building resilience in the decisions that you are making, you need to start integrating the concept of two-way decision making into your process.2 a two-way decision is reversible. yes, most decisions that we make feel like a one-way street. anything from starting a new service, applying for a grant, or purchasing a new technology platform. but we can temper this by applying two-way decision making to the implementation decisions for the project or activity that we are undertaking. even if you just plan on ways to mitigate the impact of a decision gone wrong, you are building resilience into your decision-making process. you should also go beyond asking “how can we roll this decision back?” by having discussions around “what are the impacts this decision might have that would cause us to roll it back?” defining those evaluation metrics before you embark on a path can help to make your decisions more resilient. if you think about decisions as a two-way door and incorporate both the how and the why into them, you will be on your way to having a much more resilient change management process. when it comes to building resilience into our day-to-day workflows, we have to paradoxically standardize how we do work. if we are going to be more flexible and absorb change, it would be reasonable to think that we should support many different ways of working. however, that type of work fragmentation doesn’t scale to a hybrid work environment. it is just too much to try to support a work environment where common tasks are performed differently by staff that are physically located almost anywhere. therefore, we need to standardize how we work in order to support a hybrid work environment. staff communications and expectations need to be the same for every single staff member. everyone uses instant messaging or no one does. everyone keeps their calendar up to date. meetings are held with the same expectations whether a one-on-one with a supervisor or a team brainstorming session. this is foundational to incorporating resilience into the organization. you need to be able to rely on consistent communication channels when a crisis is thrust upon you. workflows that have been traditionally on-premise all need to be reenvisioned into cloud-based processes. work needs to be performed from anywhere, and the only way you can do that effectively is by standardizing how that work is accomplished. the more exceptions that you allow to this, the more you will run into problems with change. this type of standardization has to be ingrained into the culture of the organization with a top -down approach. therefore, it is imperative that those at the top of the library’s org chart embody this standardization. building this type of operational resilience will go a long way towards having that strong foundation that you can rely on across your organization. through the pandemic, many libraries focused on core services, reimagining them so they can function in new ways during the crisis. pandemic-prompted services that may well be permanent include increased use of pop-up libraries and bookmobiles, low-touch self-service kiosks, and webinar-based story times, among others.3 these are the types of changes and decisions that are upon us right now and may very well take up all of the oxygen in the room. 
however, we still need to make sure that we dedicate resources to that next thing. we can't overcorrect and only circle the wagons around the delivery of our core services. we have future challenges that we will have to meet. the seeds of the solutions to those challenges may be growing right now in an open source project, a new resource sharing standard, or an innovative use of blockchain. even if, in the end, you don't take on those projects, you will help build resilience into your library by expanding awareness and understanding. just as the speed of societal change always seems to be increasing, the speed of expectations for libraries will be increasing. we have to make sure that we continue to dedicate resources to understanding, researching, and building those new services that are only now on the horizon.

none of the things that i've said are new or groundbreaking. the urgency is what is new. delivery of library services has permanently changed over the past 18 months. how work gets done has permanently changed as well. the only way that we will be able to contend with this new paradigm is to do all of those things that we know we should be doing. shore up your foundations. absorb the change that has happened and will continue to happen. don't be reactionary or fight for your own personal work needs. be confident problem solvers and work together for the strength of your library and for the needs of your patrons. instill resilience in the work that you do and the services that your library provides, and it will help you to endure through whatever may come next.

endnotes

1 "resilient," merriam-webster, https://www.merriam-webster.com/dictionary/resilient.
2 jeff haden, "why emotionally intelligent people embrace the 2-way-door rule to make better and faster decisions," inc., july 6, 2021, https://www.inc.com/jeff-haden/why-emotionally-intelligent-people-embrace-2-way-doors-rule-to-make-better-faster-decisions.html.
3 ellen rosen, "beyond the pandemic, libraries look toward a new era," the baltimore sun, september 27, 2020, https://www.baltimoresun.com/featured/sns-nyt-libraries-digital-resources-beyond-the-pandemic-20200927-lrgsracn6jhandorc37lxngnye-story.html.
web content strategy in practice within academic libraries

courtney mcdonald and heidi burkhardt

courtney mcdonald (crmcdonald@colorado.edu) is associate professor and user experience librarian, university of colorado boulder. heidi burkhardt (heidisb@umich.edu) is web project manager and content strategist, university of michigan. © 2021.

abstract

web content strategy is a relatively new area of practice in industry, in higher education, and, correspondingly, within academic and research libraries. the authors conducted a web-based survey of academic and research library professionals in order to identify present trends in this area of professional practice by academic librarians and to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries. this article presents the findings of that survey. based on analysis of the results, we propose a web content strategy maturity model specific to academic libraries.

introduction

our previous article traced the history of library adoption of web content management systems (cms), the evolution of those systems and their use in day-to-day library operations, and the corresponding challenges as libraries have attempted to manage increasingly prolific content creation workflows across multiple, divergent cms platforms.1 these challenges include inconsistencies in voice and a lack of sufficient or dedicated resources for library website management, resulting in the absence of shared strategic vision and organizational unity regarding the purpose and function of the library website. we concluded that a productive solution to these challenges lay in the inherently user-centered practice of web content strategy, defined as "an emerging discipline that brings together concepts from user experience design, information architecture, marketing, and technical writing."2 we further noted that organizational support for web content management and governance strategies for library-authored web content had been rarely addressed in the library literature, despite the growing importance of this area of expertise to the successful provision of support and services: "libraries must proactively embrace and employ best practices in content strategy . . .
to fully realize the promise of content management systems through embracing an ethos of library-authored content."3

we now investigate the current state of practice and philosophy around the creation, editing, management, and evaluation of library-authored web content. to what degree, if at all, does web content strategy factor into the actions, policies, and practices of academic libraries and academic librarians today? does a suitable measure for estimating the maturity of web content strategy practice for academic libraries exist?

background

maturity models

maturity models are one useful mechanism for consistently measuring and assessing an organization's current level of achievement in a particular area, as well as providing a path to guide future growth and improvement: "maturity levels represent a staged path for an organization's performance and process improvement efforts based on predefined sets of practice areas. . . . each maturity level builds on the previous maturity levels by adding new functionality or rigor."4 the initial work on maturity models emerged from carnegie mellon institute (cmi), focused on contract software development.5 since that time, cmi founded the cmmi institute, which has expanded the scope of maturity models into other disciplines. many such models, tailored to a variety of specific industries or specializations, have since been developed based on the cmmi institute approach, in which stages are defined as:

• maturity level 1: initial (unpredictable and reactive);
• maturity level 2: managed (planning, performance, measurement, and control occur on the project level);
• maturity level 3: defined (proactive, rather than reactive, with organization-wide standards);
• maturity level 4: quantitatively managed (data-driven with shared, predictable, quantitative performance improvement objectives that align to meet the needs of internal and external stakeholders); and
• maturity level 5: optimizing (stable, flexible, agile, responsive, and focused on continuous improvement).6

application of maturity models within user experience work in libraries

thus far, discussion of maturity models in the library literature relevant to web librarianship has primarily centered on user experience (ux) work. in their 2020 paper "user experience methods and maturity in academic libraries," young, chao, and chandler noted, ". . . several different ux maturity models have been advanced in recent years," reviewing approximately a half-dozen approaches with varying emphases and numbers of stages.7

in 2013, coral sheldon-hess developed the following five-stage model, based on the aforementioned cmmi framework, for assessing maturity of ux practice in library organizations:

1 – decisions are made based on staff's preferences, management's pet projects. user experience [of patrons] is rarely discussed.
2 – some effort is made toward improving the user experience. decisions are based on staff's gut feelings about patrons' needs, perhaps combined with anecdotes from service points.
3 – the organization cares about user experience; one or two ux champions bring up users' needs regularly. decisions are made based on established usability principles and studies from other organizations, with occasional usability testing.
4 – user experience is a primary motivator; most staff are comfortable with ux principles. users are consulted regularly, not just for major decisions, but in an ongoing attempt at improvement.
5 – user experience is so ingrained that staff consider the usability of all of their work products, including internal communications. staff are actively considerate, not only toward users but toward their coworkers.8

as an indicator of overall ux maturity within an organization, sheldon-hess focuses on "consideration" in interactions not only between library staff and library patrons, but also between library staff: "when an organization is well and truly steeped in ux, with total awareness of and buy-in on user-centered thinking, its staff enact those principles, whether they're facing patrons or not."9

in 2017, macdonald conducted a series of semi-structured interviews with 16 ux librarians to investigate, among other things, "the organizational aspects of ux librarianship across various library contexts."10 macdonald proposes a five-stage model, broadly similar in concept to the cmmi institute structure and to sheldon-hess's model. most compelling, however, were these three major findings, taken from macdonald's list:

• some (but not all) ux librarian positions were created as part of purposeful and strategic efforts to be more self-aware; . . .
• the biggest challenges to doing ux are navigating the complex library culture, balancing competing responsibilities, and finding ways to more efficiently employ ux methods; and
• the level of co-worker awareness of ux librarianship is driven by the extent to which ux work is visible and by the individual ux librarian's ability to effectively communicate their role and value.11

based on analysis of the results of their 2020 survey of library ux professionals, in which they asked respondents to self-diagnose their organizations, young, chao, and chandler presented, for use in libraries, their adaptation of the nielsen norman group's eight-stage scale of ux maturity:

• stage 1: hostility toward usability / stage 2: developer-centered ux—apathy or hostility to ux practice; lack of resources and staff for ux.
• stage 3: skunkworks ux—ad hoc ux practices within the organization; ux is practiced, but unofficially and without dedicated resources or staff; leadership does not fully understand or support ux.12
• stage 4: dedicated ux budget—leadership beginning to understand and support ux; dedicated ux budget; ux is assigned fully or partly to a permanent position.
• stage 5: managed usability—the ux lead or ux group collaborates with units across the organization and contributes ux data meaningfully to organizational and strategic decision-making.
• stage 6: systematic user-centered design process—ux research data is regularly included in projects and decision-making; a wide variety of methods are practiced regularly by multiple departments.
• stage 7: integrated user-centered design / stage 8: user-driven organization—ux is practiced throughout the organization; decisions are made and resources are allocated only with ux insights as a guide.13

young et al.'s findings supported macdonald's, underscoring the importance of shared organizational understandings, priorities, and culture related to ux activities and personnel:

ux maturity in libraries is related to four key factors: the number of ux methods currently in use; the level of support from leadership in the form of strategic alignment, budget, and personnel; the extent of collaboration throughout the organization; and the degree to which organizational decisions are influenced by ux research. when one or more of these four connected factors advances, so too does ux maturity.14

these findings are consistent with larger patterns in the management of library-authored web content identified in the earlier cited literature review:

inconsistent processes, disconnects between units, varying constituent goals, and vague or ineffective wcm governance structures are recurrent themes throughout the literature . . . web content governance issues often signal a lack of coordination, or even of unity, across an organization.15

assessing the maturity of content strategy practice in libraries

we consider kristina halvorson's definition of content strategy, offered in content strategy for the web, as the authoritative definition. halvorson states: "content strategy is the practice of planning for the creation, delivery, and governance of useful, usable content."16 this definition can be divided into five elements:

1. planning: intentionality and alignment, setting goals, discovery and auditing, connecting to a strategic plan or vision
2. creation: roles, responsibilities, and workflows for content creation; attention to content structure; writing or otherwise developing content in its respective format
3. delivery: findability of content within the site and more broadly (i.e., search engine optimization), use of distinct communication channels
4. governance: maintenance and lifecycle management of content through coordinated process and decision making; policies and procedures; measurement and evaluation through analysis of usage data, testing, and other means
5. useful/usable (hereafter referred to as ux): relevant, current, clear, concise, and in context

jones discusses the application of content strategy–specific maturity models as a potential tool for content strategists: "the[se] model[s] can help your company identify your current level of content operations, . . . decide whether that level will support your content vision and strategy . . . [and] help you plan to get to the next level of content operations."17 three examples of maturity models developed for use by content strategy industry professionals map industry-specific terms, tools, and actions to the level-based structure put forward by the cmmi institute (see table 1).
table 1. comparative table of content strategy maturity models

content strategy, inc. [2016]:18
• ad hoc: inconsistent quality, lack of uniform practice, little or no opportunity to understand customer needs
• rudimentary: movement toward structure, unified process and voice; can be derailed by timelines, resistance
• organized & repeatable: strong leadership, uniform process and voice has become routine, integration of user-focused data collection
• managed & sustainable: larger buy-in across organization, can sustain changes in leadership, increased number and sophistication of methods
• optimized: close alignment to strategic objectives, integration across the organization, leadership within and outside the organization

jones (gathercontent) [2018]:19
• chaotic: no formal content operations, only ad hoc approaches
• piloting: trying content operations in certain areas, such as for a blog
• scaling: expanding formal content operations across business functions
• sustaining: solidifying and optimizing content operations across business functions
• thriving: sustaining while also innovating and seeing return on investment (roi)

randolph (kapost) [2020]:20
• reactive: chaotic, siloed, lacking clarity, chronically behind
• siloed: struggles to collaborate, poorly defined and inconsistently measured goals
• mobilizing: varying collaboration, content is centralized but not necessarily accessible, defined strategy sometimes impacted by ad hoc requests
• integrating: effective collaboration across multiple teams, capability for proactive steps, still struggle to prove roi
• optimizing: cross-functional collaboration results in seamless customer messaging and experiences, consistently measured roi contributes to planning

while these models have some utility for content strategy practitioners in higher education, including those in academic and research libraries, emphasis on commercial standards for assessing success (e.g., business goals, centrally managed marketing) limits their direct application in the academic environment. the 2017 blog post by tracey playle, "ten pillars for getting the most of your content: how is your university doing?", presented ten concepts paired with questions, which could be used by higher education content professionals to reflect on their current state of practice.21 this model was developed for use by a consultancy, and the "pillars"—"strategy and vision," "risk tolerance and creativity," and "training and professional development"—are more broadly conceived than typical maturity models. thus, this approach seems more appropriate as a personal or management planning tool rather than as a model for evaluating maturity across library organizations.

methods

following review and approval by the researchers' institutional review boards, a web-based survey collecting information about existing workflows for web content, basic organizational information, and familiarity with concepts related to web content strategy was distributed to 208 professionals in april 2020. the survey was available for four weeks. participants were drawn from academic and research libraries across north america, providing their own opinions as well as information on behalf of their library organization. (see appendix a: institution list.)
the sample group (n=208) was composed of north american academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding non-academic member institutions): the association of research libraries, the big ten academic alliance, the greater western library alliance, and/or the oberlin group. some libraries are members of multiple groups. details are supplied below in table 3.

we identified individuals (n=165) based on their professional responsibilities and expertise using the following order and process:

1. individual job title contains some combination of the following words and/or phrases: content strategy, content specialist, content strategist, web content, web communications, digital communications, digital content
2. head of web department or department email
3. head of ux department or department email
4. head of it or department email

for institutions where a specific named individual could not be identified through a review of the organizational website, we identified a general email (e.g., libraries@state.edu) as the contact (n=43). a mailing list was created in mailchimp, and two campaigns were created: one for named individuals, and one for general contacts. only one response was requested per institution. (see appendix b: recruitment emails.)

the 165 named individuals, identified as described above, received a personalized email inviting them to participate in the study. the recruitment email explained the purpose of the study, advised potential participants of possible risks and their ability to withdraw at any time, and included a link to the survey. a separate email was sent to the 43 general contacts on the same day, explaining the purpose of the study and requesting that the recipient forward the communication to the appropriate person in the organization. this email also included information advising potential participants of possible risks and their ability to withdraw at any time, and a link to the survey. data was recorded directly by participants using qualtrics. the bulk of survey data does not include any personal information; we did not collect the names of institutions as part of our data collection, so identifying information is limited to information about institutional memberships.

for the group of named individuals, one email bounce was recorded. the open rate for personalized emails sent to named individuals was approximately 62% (88 of 142 successfully delivered emails were opened) and the survey link was followed 66 times. the general email group had a 51% open rate (n=22) with 11 clicks of the survey link. with recruitment occurring in april 2020, most individuals and institutions were at the height of switching to remote operations in light of the covid-19 pandemic. despite this, our open rates were considerably higher than average open rates as reported by mailchimp.22 as discussed below, we achieved our minimum response rate goal of 20%.
the survey included 16 questions; question topics and response counts are noted in table 2. informed consent was obtained as part of the first survey question. (see appendix c: survey questions and appendix d: informed consent document.) most questions were multiple-choice or short answer (i.e., a number). two questions required longer-form responses.

table 2. survey question topics and response count
question | topic | category | response count
1 | consent | — | 43
2 | organizational memberships | demographic | 40
3 | approx. # full-time employees | demographic | 41
4 | cms products used | infrastructure/organizational structure | 41
5 | primary cms | infrastructure/organizational structure | 39
6 | number of site editors | infrastructure/organizational structure | 39
7 | describe responsibility for content | infrastructure/organizational structure | 39
8 | existence of position(s) with primary duties of web content | infrastructure/organizational structure | 39
9 | titles of such positions, if any | infrastructure/organizational structure | 24
10 | familiar with web content strategy | content strategy practices | 36
11 | definition of web content strategy | content strategy practices | 32
12 | policies or documentation | content strategy practices | 35
13 | methods | content strategy practices | 37
14 | willing to be contacted | — | 37
15 | name | — | 27
16 | email | — | 26

information collected fell into the following three categories:
• demographics (estimated total number of employees; institutional memberships; estimated number of employees with website editing privileges)
• infrastructure and organizational structure (content management systems used to manage library-authored web content; system used to host the primary public-facing website; distribution of responsibility for website content; titles of positions, if any, whose primary responsibilities focus on web content)
• web content strategy practices (familiarity with the concept; personal definition; presence or absence of policy or documentation; evaluation methods regularly used)

upon completion of the survey questions, participants had the option to indicate that they would be willing to be contacted for an individual interview as part of planned future research on this topic. twenty-seven individuals (63%) opted in and provided us with their contact information.

findings

in sum, 43 responses were received, resulting in a response rate of 20.67%. because we did not collect names of individuals or institutions and used an anonymous link for our survey, we cannot determine the response rate by contact group (named individuals or general email).

demographic information

the bulk of responses came from association of research libraries members, but within-group response rates show that the proportion of responses from each group was relatively balanced within the overall 20% response rate.

table 3. distribution of survey contacts, responses, and response rates by group23
organization | member libraries contacted | responses | share of total responses (%) | group response rate (%)
association of research libraries | 117 | 26 | 50.98 | 22.22
big ten academic alliance | 15 | 5 | 9.80 | 33.33
greater western library alliance | 38 | 8 | 15.69 | 21.05
oberlin group | 80 | 12 | 23.53 | 15.00

infrastructure & organizational structure

content management systems

a variety of content management systems are used to manage library-authored web content (see table 4); libguides, wordpress, omeka, and drupal were most commonly used across the group.
other systems mentioned as write-in responses included acquia drupal, cascade, fedora-based systems, archivesspace, google sites, and "wiki and blog." one response stated, "most pages are just non-cms for the website." write-in responses for "other" and "proprietary system hosted by institution" were carried forward within the survey from question 3 to question 4 and are available in full in appendix e: other content management systems mentioned by respondents.

table 4. cms products used to manage library-authored web content
q3: cms products used | percentage (%) | count
libguides | 28.06 | 39
wordpress | 18.71 | 26
omeka | 15.11 | 21
drupal | 13.67 | 19
other | 9.35 | 13
sharepoint | 7.19 | 10
proprietary system hosted by institution | 7.19 | 10
adobe experience manager | 0.72 | 1
total | 100 | 139

for their primary library website, just under half of respondents relied on drupal (n=17, 43.59%). slightly fewer, just under 36% in total (n=14), selected the specific system they had shared as a write-in answer to the previous question, whether the institution's proprietary system or some other option. despite the widespread use reported in the previous question, only two respondents indicated that their primary website was hosted in libguides. (see table 5.)

table 5. cms used to host primary library website
q4: primary website cms | percentage (%) | count
drupal | 43.59 | 17
other (write-in answers) | 20.51 | 8
wordpress | 15.38 | 6
libguides | 5.13 | 2
proprietary system hosted by institution (write-in answers) | 15.38 | 6

dedicated positions, position titles, and organizational workflows

almost two-thirds of respondents (n=24, 61.5%) indicated there were position(s) within their library whose primary duties were focused on the creation, management, and/or editing of web content. a total of 52 position titles were shared (the full list of position titles can be found in appendix f). terms and phrases most commonly occurring across this set were web (15), librarian (15), user experience (10), and digital (8). explicitly content-focused terms appeared more rarely: content (6), communication/communications (5), and editor (1).

most respondents described collaborative workflows for web content management, in which a group of representatives or delegates collectively stewards website content (see table 6 for a summary and appendix f for full-text responses). collaborative concepts appeared 29 times, including terms like group (7), team (6), distributed (5), and committee (3); within this set, decentralized, inclusive, and cross-departmental each appeared once. similarly, within terms related to locus of control, the phrase "their own" appeared seven times. specifically assigned roles or responsibilities were mentioned 18 times, including terms like admin/administrator (6), manager (5), and editor/s or editorial (4). respondents discussed support structures such as training, guidance, or consulting five times. libguides were mentioned 14 times.

table 6. frequency of terms and phrases in free-text descriptions of website content management, grouped by the authors into concepts
• collaborative (29): group (7), team (6), distributed (5), committee (3), stakeholder (3), cross-departmental (1), decentralized (1), inclusive (1)
• assigned roles (18): admin/administrator (6), manager (5), editor/s (4), developer (3), product owner (2)
• locus of control (13): their own (7), review (3), oversight (3), representative (2), permission (1)
• support (5): training (2), guidance (2), consulting (1)
• libguides (14)
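the concept counts in table 6 come from the authors' manual grouping of terms found in the free-text responses; a rough automated analogue of that counting is sketched below. it is illustrative only: the concept-to-term mapping and the sample responses are hypothetical, and the naive substring matching does not reproduce the authors' coding.

```python
# rough, illustrative analogue of the term counting summarized in table 6:
# tally how often terms associated with each concept appear in free-text
# responses. the mapping and the responses below are hypothetical examples,
# and matching is naive substring matching rather than qualitative coding.
from collections import Counter

concept_terms = {
    "collaborative": ["group", "team", "distributed", "committee", "stakeholder"],
    "assigned roles": ["admin", "manager", "editor", "developer"],
    "locus of control": ["their own", "review", "oversight"],
    "support": ["training", "guidance", "consulting"],
}

responses = [
    "a cross-departmental team manages the site; librarians edit their own guides.",
    "the web group provides training and reviews new pages before publication.",
]

counts = Counter()
for text in responses:
    lowered = text.lower()
    for concept, terms in concept_terms.items():
        counts[concept] += sum(lowered.count(term) for term in terms)

for concept, n in counts.most_common():
    print(f"{concept}: {n}")
```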
over 60% of respondents indicated that 20 or fewer employees had editing privileges on the library website (see table 7). three respondents commented "too many" when citing the number or range: "too many! i think about five, but there could be more"; "too many, about 12"; "too many to count, maybe 20+."

table 7. distribution of the number of employees with website editing privileges
response | percentage (%) | count
less than five | 23.08 | 9
5–10 | 20.51 | 8
11–20 | 17.95 | 7
21–99 | 23.08 | 9
100–199 | 10.26 | 4
200+ | 2.56 | 1

the greatest variation in practice regarding how many employees had website editing privileges occurs in institutions with more than 100 total employees, where institutions reported within every available range (see table 8).

table 8. comparison of number of total employees and of number of employees with editing privileges
(rows: total employees; columns: employees with editing privileges)
number of employees | less than 5 | 5–10 | 11–20 | 21–99 | 100–199 | 200+
4–10 | 2 | — | — | — | — | —
11–25 | 3 | 1 | — | — | — | —
26–50 | — | 2 | 2 | — | — | —
51–99 | 1 | 1 | 4 | 1 | — | —
100+ | 3 | 4 | 2 | 8 | 4 | 1

web content strategy practices

almost all respondents (n=36, 83%) reported that they were familiar with the concept of web content strategy. conversely, only 20% (n=7) reported that their library had either a documented web content strategy or a web content governance policy. respondents were asked, optionally, to provide a definition of web content strategy in their own words, and we received 32 responses (see appendix g: definitions of web content strategy). we analyzed the free-text definitions of content strategy based on the five elements of halvorson's previously cited definition: planning, creation, delivery, governance, and ux. we first rated the definitions individually, then determined a mutually agreed rating for each. across the set, responses most commonly addressed concepts or activities related to planning and ux, and least commonly mentioned concepts or activities related to delivery (see table 9).

table 9. occurrence of content strategy elements in free-text definitions
element | count | percentage (%)
plan (intentional, strategic, brand, style, best practices) | 29 | 91
creation (workflows, structure, writing) | 20 | 63
delivery (findability, channels) | 13 | 41
governance (maintenance, lifecycle, measurement/evaluation) | 16 | 50
ux (needs of the user, relevant, current, clear, concise, in context) | 19 | 59

responses were scored on each of the five elements as follows: zero points, concept not mentioned; one point, some coverage of the concept; two points, thorough coverage of the concept. representative examples are provided in table 10. a perfect score for any individual definition would be 10. the median score across the group was four, and the average score was 3.4. we consider scores of three or less to indicate a basic level of practice; scores from four to seven, an intermediate level of practice; and scores of eight or above, an advanced level of practice.
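to make the rubric concrete, here is a minimal sketch of how a 0–2 rating per element could be totaled and classed using the thresholds just described; the element names follow halvorson's definition as used in this article, but the ratings in the example are hypothetical, not respondent data.

```python
# illustrative sketch of the scoring described above: each free-text definition
# is rated 0-2 on five elements (maximum total 10) and then classed by total.
from statistics import mean, median

ELEMENTS = ("plan", "creation", "delivery", "governance", "ux")

def total_score(ratings: dict) -> int:
    """sum the 0-2 ratings over the five elements (missing elements count as 0)."""
    return sum(ratings.get(element, 0) for element in ELEMENTS)

def practice_level(score: int) -> str:
    """class a total: three or less = basic, four to seven = intermediate, eight or above = advanced."""
    if score <= 3:
        return "basic"
    if score <= 7:
        return "intermediate"
    return "advanced"

# hypothetical ratings for three definitions (not respondent data)
ratings = [
    {"plan": 1},
    {"plan": 1, "creation": 1, "ux": 1},
    {"plan": 2, "creation": 1, "governance": 1, "ux": 2},
]

scores = [total_score(r) for r in ratings]
print(scores, [practice_level(s) for s in scores])  # [1, 3, 6] ['basic', 'basic', 'intermediate']
print("median:", median(scores), "mean:", round(mean(scores), 2))
```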
of the 32 responses to the free-text definition question, one contained no usable data; 14 were classed as basic, 17 as intermediate, and none as advanced.

table 10. example showing scoring of four representative free-text definitions provided by respondents (element descriptors as given in table 9)
• "intentional and coordinated vision for content on the website." plan 1, creation 0, delivery 0, governance 0, ux 0; total score 1
• "an overarching method of bringing user experience best practices together on the website including heuristics, information architecture, and writing for the web." plan 1, creation 1, delivery 0, governance 0, ux 1; total score 3
• "strategies for management of content over its entire lifecycle to ensure it is accurate, timely, usable, accessible, appropriate, findable, and well-organized." plan 1, creation 0, delivery 1, governance 1, ux 1; total score 4
• "the process of creating and enacting a vision for the organization and display of web content so that it is user friendly, accurate, up-to-date, and effective in its message. web content strategy often involves considering the thoughts and needs of many stakeholders, and creating one cohesive voice to represent them all." plan 2, creation 1, delivery 0, governance 1, ux 2; total score 6

respondents reported most frequent use of practices associated with web librarianship and user experience work: analysis of usage data (n=36) and usability testing (n=28) (see fig. 1). content-specific methods were less commonly used overall.

figure 1. frequency of reported usage of analysis and evaluation methods

the five "other" responses mainly clarified or qualified the selections, although some added additional information, for example:

"at this time, all library websites use a standard template, so they have the same look and feel. beyond that everything else is 'catch as catch can' because we do not have a web services librarian, nor are we likely to get that dedicated position any time soon, given the recent covid-19 financial upheaval."

brand guidelines, accessibility guidance, and personal responsibility were also mentioned.

discussion

the targeted recruitment methodology and survey, combining demographic and practice-based questions, aspired to collect data suitable to generate a snapshot of how web content strategy work is being undertaken in academic libraries at this time, as well as the depth and breadth of that practice. we were struck by several contrasts in the findings: first and foremost, the 80–20 inversion across responses related to knowledge of web content strategy versus its practice. this was particularly notable in combination with respondents' reports that, in nearly two-thirds of organizations, one or more positions exist with primary duties focused on the creation, management, and/or editing of web content.
the influence of ux thinking and methods in academic libraries is visible in the frequency of respondents' reported use of general and established ux practices for maintaining the primary website (e.g., usability testing). the other four elements of halvorson's definition were less thoroughly covered, both in the provided definitions of web content strategy and in the methods reported. some respondents mentioned use of methods such as content audits or inventories and style guides, but many fewer reported reliance on review checklists, content calendars, and readability scores.

in reviewing the self-reported definitions of content strategy for evidence of each of the five elements of halvorson's previously discussed definition, trends in the findings suggest higher levels of maturity in the elements of planning, creation, and ux, and lower levels in the elements of delivery and governance. nearly all respondents (91%) referenced the element of planning. almost two-thirds mentioned concepts or practices related to creation, and approximately 60% of respondents referenced usability of content or a focus on the user in some capacity. only half made mention of governance (including maintenance and evaluation), and even fewer (41%) referenced delivery, whether considering content channels or findability; in fact, no single definition touched on both. overall, the results of the analysis of provided definitions (discussed in the previous section) suggest that at present, web content strategy as a community of practice in academic libraries is operating at, or just above, a basic level.

proposed maturity model

from these findings, and referencing the structure of the cmmi institute five-stage maturity model, the authors propose the following content strategy maturity model for academic libraries. as previously noted in our findings, we assess the web content strategy community of practice in academic libraries as operating at, or just above, a basic level. to align the proposed maturity model with the definition scores, we applied the 10-point rating scale for provided definitions to the five levels by assigning two points per level, so that a score of one or two is equivalent to level 1, a score of three or four to level 2, and so on (table 11).
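assuming the two-points-per-level alignment just described, the mapping from a 1–10 definition score to a maturity level can be expressed as a small lookup; the sketch below uses the level names of the model presented in the next section, and the function name is ours, not part of the published model.

```python
# illustrative sketch of the alignment in table 11: two definition-score points
# per maturity level, so scores 1-2 map to level 1, 3-4 to level 2, and so on.
# level names follow the content strategy maturity model for academic libraries
# presented below; the function name is ours.
import math

LEVEL_NAMES = {1: "ad hoc", 2: "establishing", 3: "scaling", 4: "sustaining", 5: "thriving"}

def maturity_level(definition_score: int) -> tuple[int, str]:
    """map a 1-10 definition score onto the five-level model."""
    if not 1 <= definition_score <= 10:
        raise ValueError("definition scores range from 1 to 10")
    level = math.ceil(definition_score / 2)
    return level, LEVEL_NAMES[level]

print(maturity_level(4))   # (2, 'establishing')  -- the group median score
print(maturity_level(8))   # (4, 'sustaining')
```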
table 11. comparison of maturity model with definition rating scale and maturity assessment
maturity model level | definition score | assessment
level 1 | 1 | basic
level 1 | 2 | basic
level 2 | 3 | basic
level 2 | 4 | intermediate
level 3 | 5 | intermediate
level 3 | 6 | intermediate
level 4 | 7 | intermediate
level 4 | 8 | advanced
level 5 | 9 | advanced
level 5 | 10 | advanced

content strategy maturity model for academic libraries

level 1: ad hoc
• no planning or governance
• creation and delivery are reactive, distributed, and potentially chaotic
• no or minimal consideration of ux

level 2: establishing
• some planning and evidence of strategy, such as use of content audits and creation of a style guide; may be localized within specific groups or units
• basic coordination of content creation workflows
• delivery workflows not explicitly addressed, or remain haphazard
• no or minimal organization-wide governance structures or documentation in place; may be localized within specific groups or units
• evidence of active consideration of ux in creation and structure of content

level 3: scaling
• intentional and proactive planning coordinated across multiple units
• basic content creation workflows in place across organization
• delivery considered, but may not be consistent or strategic
• ad hoc evaluation through usage data and usability testing; organization-wide governance documents and workflows may be at a foundational level
• consideration of ux is integral to process of creating useful, usable content
• web content creation and maintenance is assigned at least partly to a permanent position with some level of authority and responsibility for the primary website

level 4: sustaining
• alignment in planning, able to respond to organizational priorities; style guidelines and best practices widely accepted
• established and accepted workflows for content creation are coordinated through a person, department, team, or other governing body
• delivery includes strategic and consistent use of channels, as well as consideration of findability
• regular and strategic evaluation occurs; proactive maintenance and retirement practices in place; managed through established governance documents and workflows
• web content strategy explicitly assigned partly or fully to a permanent position

level 5: thriving
• full lifecycle of content (planning, creation, delivery, maintenance, retirement) managed in coordination across all library-authored web content platforms
• governance established and accepted throughout the organization, including documented policies, procedures, and accountability
• basic understanding of content strategy concepts and importance across the organization
• overall stable, flexible, agile, responsive, user-centered and focused on continuous improvement

as previously mentioned, the median score across the group was four, and the average score was 3.4; these measures suggest that the majority of survey respondents' organizational web content strategy maturity levels would currently stand at level 2 or 3, with a few at level 1.

conclusion

the findings of this survey and assessment, while inherently limited, suggest that web content strategy is currently not a pervasive factor for academic libraries and academic web librarians in the development and implementation of actions, policies, and practices related to website creation, maintenance, and evaluation.
we have proposed a measure for self-estimating the maturity of web content strategy practice in academic libraries. our content strategy maturity model for academic libraries, while grounded both in industry best practices and in evidence from practitioners in academic libraries, is nonetheless a work in progress. we intend to further develop and strengthen the model through follow-up interviews with practitioners, drawing on those survey respondents who opted in to being contacted. interviewees will be invited to discuss their work within and outside the frame of the proposed maturity model and to provide feedback on the model itself, with the ultimate goal of better understanding web content strategy practice in academic libraries and the needs of its community of practice.

endnotes

1 courtney mcdonald and heidi burkhardt, "library-authored web content and the need for content strategy," information technology and libraries 38, no. 3 (september 15, 2019): 8–21, https://doi.org/10.6017/ital.v38i3.11015.
2 mcdonald and burkhardt, 14.
3 mcdonald and burkhardt, 16.
4 "cmmi levels of capability and performance," sec. maturity levels, cmmi institute llc, accessed may 28, 2020, https://cmmiinstitute.com/learning/appraisals/levels.
5 "about cmmi institute," cmmi institute llc, accessed may 28, 2020, https://cmmiinstitute.com/company.
6 "cmmi levels of capability and performance," sec. maturity levels.
7 scott w. h. young, zoe chao, and adam chandler, "user experience methods and maturity in academic libraries," information technology and libraries 39, no. 1 (march 16, 2020): 2, https://doi.org/10.6017/ital.v39i1.11787.
8 coral sheldon-hess, "ux, consideration, and a cmmi-based model," para. 6, july 25, 2013, http://www.sheldon-hess.org/coral/2013/07/ux-consideration-cmmi/.
9 sheldon-hess, "ux, consideration, and a cmmi-based model," para. 2, http://www.sheldon-hess.org/coral/2013/07/ux-consideration-cmmi/.
10 craig m. macdonald, "'it takes a village': on ux librarianship and building ux capacity in libraries," journal of library administration 57, no. 2 (february 17, 2017): 196, https://doi.org/10.1080/01930826.2016.1232942.
11 macdonald, 212.
12 skunk works is trademarked by lockheed martin corporation, but is informally used to describe an experimental, sometimes secret, research and development group focused on agile innovation.
13 young, chao, and chandler, "user experience methods and maturity in academic libraries," 19.
14 young, chao, and chandler, 23.
15 mcdonald and burkhardt, "library-authored web content and the need for content strategy," 15–16.
16 kristina halvorson, content strategy for the web, 2nd ed. (berkeley, ca: new riders, 2012), 28.
17 colleen jones, "a content operations maturity model," sec. a maturity model for content operations, gathercontent (blog), november 30, 2018, https://gathercontent.com/blog/content-operations-model-of-maturity.
18 "understanding the content maturity model," content strategy inc.
(blog), march 2016, https://www.contentstrategyinc.com/understanding-content-maturity-model/. 19 jones, “a content operations maturity model,” sec. a maturity model for content operations. 20 zoë randolph, “where do you fall on the content operations maturity model?,” sec. the content operations maturity model, kapost blog (blog), april 20, 2020, https://kapost.com/b/content-operations-maturity-model/. 21 tracy playle, “ten pillars for getting the most of your content: how is your university doing?,” pickle jar communications (blog), september 29, 2017, http://www.picklejarcommunications.com/2017/09/29/content-strategy-benchmarking/. 22 “email marketing benchmarks by industry,” mailchimp, accessed june 15, 2020, https://mailchimp.com/resources/email-marketing-benchmarks/. 23 some libraries are members of multiple groups. https://doi.org/10.1080/01930826.2016.1232942 https://gathercontent.com/blog/content-operations-model-of-maturity https://www.contentstrategyinc.com/understanding-content-maturity-model/ https://kapost.com/b/content-operations-maturity-model/ http://www.picklejarcommunications.com/2017/09/29/content-strategy-benchmarking/ https://mailchimp.com/resources/email-marketing-benchmarks/ information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 19 appendices appendix a: institution list appendix b: recruitment emails appendix c: survey questions appendix d: informed consent document appendix e: other content management systems mentioned by respondents appendix f: organizational responsibility for content; and position titles appendix g: definitions of web content strategy information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 20 appendix a: institution list institution membership(s) agnes scott college oberlin group alabama arl alberta arl albion college oberlin group alma college oberlin group amherst college oberlin group arizona arl, gwla arizona state arl, gwla arkansas gwla auburn arl augustana college oberlin group austin college oberlin group bard college oberlin group barnard college oberlin group bates college oberlin group baylor gwla beloit college oberlin group berea college oberlin group boston arl boston college arl boston public library arl bowdoin college oberlin group brigham young arl, gwla british columbia arl brown arl bryn mawr college oberlin group bucknell university oberlin group calgary arl california, berkeley arl california, davis arl california, irvine arl california, los angeles arl information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 21 institution membership(s) california, riverside arl california, san diego arl california, santa barbara arl carleton college oberlin group case western reserve arl chicago arl, btaa cincinnati arl claremont colleges gwla, oberlin group clark university oberlin group coe college oberlin group colby college oberlin group colgate university oberlin group college of the holy cross oberlin group college of wooster oberlin group colorado arl, gwla colorado college oberlin group colorado state arl, gwla columbia arl connecticut arl connecticut college oberlin group cornell arl dartmouth arl davidson college oberlin group delaware arl, gwla denison university oberlin group denver gwla depauw university oberlin group dickinson college oberlin group drew university oberlin group duke arl earlham 
college oberlin group eckerd college oberlin group emory arl information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 22 institution membership(s) florida arl florida state arl franklin & marshall college oberlin group furman university oberlin group george washington arl georgetown arl georgia arl georgia tech arl gettysburg college oberlin group grinnell college oberlin group guelph arl gustavus adolphus college oberlin group hamilton college oberlin group harvard arl haverford college oberlin group hawaii arl hope college oberlin group houston arl, gwla howard arl illinois, chicago arl, gwla illinois, urbana arl, btaa indiana arl, btaa iowa arl, btaa iowa state arl, gwla johns hopkins arl kalamazoo college oberlin group kansas arl, gwla kansas state gwla kent state arl kentucky arl kenyon college oberlin group knox college oberlin group lafayette college oberlin group information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 23 institution membership(s) lake forest college oberlin group laval arl lawrence university oberlin group library of congress arl louisiana state arl louisville arl macalester college oberlin group manhattan college oberlin group manitoba arl maryland arl, btaa massachusetts arl mcgill arl mcmaster arl miami arl michigan arl, btaa michigan state arl, btaa middlebury college oberlin group mills college oberlin group minnesota arl, btaa missouri arl, gwla mit arl morehouse/spelman colleges (auc) oberlin group mount holyoke college oberlin group nebraska arl, btaa nevada las vegas gwla new mexico arl, gwla new york arl north carolina arl north carolina state arl northwestern arl, btaa notre dame arl oberlin college oberlin group occidental college oberlin group information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 24 institution membership(s) ohio arl ohio state arl, btaa ohio wesleyan university oberlin group oklahoma arl, gwla oklahoma state arl, gwla oregon arl, gwla oregon state gwla ottawa arl pennsylvania arl pennsylvania state arl, btaa pittsburgh arl princeton arl purdue arl, btaa queen's arl randolph-macon college oberlin group reed college oberlin group rhodes college oberlin group rice arl, gwla rochester arl rollins college oberlin group rutgers arl, btaa sarah lawrence college oberlin group saskatchewan arl sewanee: the university of the south oberlin group simmons university oberlin group simon fraser arl skidmore college oberlin group smith college oberlin group south carolina arl southern california arl, gwla southern illinois arl, gwla southern methodist gwla st. john's university / college of st. benedict oberlin group information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 25 institution membership(s) st. lawrence university oberlin group st. 
olaf college oberlin group suny-albany arl suny-buffalo arl suny-stony brook arl swarthmore college oberlin group syracuse arl temple arl tennessee arl texas arl, gwla texas a&m arl, gwla texas state gwla texas tech arl, gwla toronto arl trinity college oberlin group trinity university oberlin group tulane arl union college oberlin group utah arl, gwla utah state gwla vanderbilt arl vassar college oberlin group virginia arl virginia commonwealth arl virginia tech arl wabash college oberlin group washington arl, gwla washington and lee university oberlin group washington state arl, gwla washington u.-st. louis arl, gwla waterloo arl wayne state arl, gwla wellesley college oberlin group information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 26 institution membership(s) wesleyan university oberlin group west virginia gwla western arl wheaton college oberlin group whitman college oberlin group whittier college oberlin group willamette university oberlin group williams college oberlin group wisconsin arl, btaa wyoming gwla yale arl york arl information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 27 appendix b: recruitment emails recruitment email: named recipients this message is intended for *|mmerge6|* dear *|fname|*, we are writing today to ask for your participation in a research project “content strategy in practice within academic libraries,” (cu boulder irb protocol #18-0670), led by co-investigators courtney mcdonald and heidi burkhardt (university of michigan). we have provided the information below as a downloadable pdf should you wish to keep it for your records. the purpose of the study is to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding nonacademic member institutions): association of research libraries, big ten academic alliance, greater western library alliance, and/or the oberlin group. if you opt to participate, we expect that you will be in this research study for the duration of the time it takes to complete our web-based survey. you will not be paid to be in this study. whether or not you take part in this research is your choice. you can leave the research at any time and it will not be held against you. we expect about 210 people, representing their institutions, in the entire study internationally. this survey will be available over a four-week period in the spring of 2020, through friday, may 1. ** confidentiality ----------------------------------------------------------- information obtained about you for this study will be kept confidential to the extent allowed by law. research information that identifies you may be shared with the university of colorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 28 research protections. 
the information from this research may be published for scientific purposes; however, your identity will not be given out. ** questions ----------------------------------------------------------- if you have questions, concerns, or complaints, or think the research has hurt you, contact the research team at crmcdonald@colorado.edu. this research has been reviewed and approved by an irb. you may talk to them at (303) 735 3702 or irbadmin@colorado.edu if: * your questions, concerns, or complaints are not being answered by the research team. * you cannot reach the research team. * you want to talk to someone besides the research team. * you have questions about your rights as a research subject. * you want to get information or provide input about this research. thank you for your consideration, courtney mcdonald crmcdonald@colorado.edu heidi burkhardt heidisb@umich.edu ============================================================ not interested in participating? you can ** unsubscribe from this list (*|unsub|*). this email was sent to *|email|* (mailto:*|email|*) why did i get this? (*|about_list|*) unsubscribe from this list (*|unsub|*) update subscription preferences (*|update_profile|*) information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 29 recruitment email: named recipients dear library colleague, we are writing today to ask for your participation in a research project “content strategy in practice within academic libraries,” (cu boulder irb protocol #18-0670), led by co-investigators courtney mcdonald and heidi burkhardt (university of michigan). our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding non academic member institutions): association of research libraries, big ten academic alliance, greater western library alliance, and/or the oberlin group. we ask that you forward this message to the person in your organization whose role includes oversight of your public web site. we are only requesting a response from one person at each institution contacted. thank you for your assistance in routing this request. we have provided the information below as a downloadable pdf should you wish to keep it for your records. the purpose of the study is to establish an understanding of the degree of institutio nal engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. if someone within your library opts to participate, we expect that person will be in this research study for the duration of the time it takes to complete our web-based survey. the participant will not be paid to be in this study. whether or not someone in your library takes part in this research is an individual choice. the participant can leave the research at any time and it will not be held against them. we expect about 210 people, representing their institutions, in the entire study internationally. this survey will be available over a four-week period in the spring of 2020, through friday, may 1. ** confidentiality ----------------------------------------------------------- information obtained about you for this study will be kept confidential to the extent allowed by law. 
research information that identifies you may be shared with the university of co lorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human research protections. the information from this research may be published for scientific purposes; however, your identity will not be given out. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 30 ** questions ----------------------------------------------------------- if you have questions, concerns, or complaints, or think the research has hurt you, contact the research team at crmcdonald@colorado.edu. this research has been reviewed and approved by an irb. you may talk to them at (303) 735 3702 or irbadmin@colorado.edu if: * your questions, concerns, or complaints are not being answered by the research team. * you cannot reach the research team. * you want to talk to someone besides the research team. * you have questions about your rights as a research subject. * you want to get information or provide input about this research. thank you for your consideration, courtney mcdonald crmcdonald@colorado.edu heidi burkhardt heidisb@umich.edu ============================================================ not interested in participating? you can ** unsubscribe from this list (*|unsub|*). this email was sent to *|email|* (mailto:*|email|*) why did i get this? (*|about_list|*) unsubscribe from this list (*|unsub|*) update subscription preferences (*|update_profile|*) information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 31 appendix c: survey questions web content strategy methods and maturity start of block: introduction q1 web content strategy methods and maturity in academic libraries (cu boulder irb protocol #20-0581) purpose of the study the purpose of the study is to gather feedback from practitioners on the proposed content strategy maturity model for academic libraries, and to further enhance our understanding of web content strategy practice in academic libraries and the needs of its community of practice. q2 please make a selection below, in lieu of your signature, to document that you h ave read and understand the consent form, and voluntarily agree to take part in this research. o yes, i consent to take part in this research. (1) o no, i do not grant my consent to take part in this research. (2) skip to: end of survey if q2 = no, i do not grant my consent to take part in this research. end of block: introduction start of block: demographic information information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 32 q3 estimated total number of employees (fte) at your library organization: o less than five (12) o 5-10 (13) o 11-20 (14) o 21-99 (15) o 100-199 (16) o 200+ (17) q4 estimated number of employees with editing privileges within your primary library website: o less than five (12) o 5-10 (13) o 11-20 (14) o 21-99 (15) o 100-199 (16) o 200+ (17) q5 does your library have a documented web content strategy and / or a web content governance policy? 
o no (1) o yes (2) information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 33 q6 are there position(s) within your library whose primary duties are focused on creation, management, and/or editing of web content? o no (1) o yes, including myself (2) o yes, not including myself (3) end of block: demographic information start of block: web content strategy q7 please indicate the degree to which each of the five elements of content strategy are currently in practice at your library. q8 creation employ editorial workflows, consider content structure, support writing. definitely true (48) somewhat true (49) somewhat false (50) definitely false (51) this is currently in practice at my institution. (1) o o o o information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 34 q9 delivery consider findability, discoverability, and search engine optimization, plus choice of content platform or channels. definitely true (48) somewhat true (49) somewhat false (50) definitely false (51) this is currently in practice at my institution. (1) o o o o q10 governance support maintenance and lifecycle of content, as well as measurement and evaluation. definitely true (31) somewhat true (32) somewhat false (33) definitely false (34) this is currently in practice at my institution. (1) o o o o q11 planning use an intentional and strategic approach, including brand, style, and writing best practices. definitely true (31) somewhat true (32) somewhat false (33) definitely false (34) this is currently in practice at my institution. (1) o o o o information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 35 q12 user experience consider needs of the user to produce relevant, current, clear, concise, and in context. definitely true (31) somewhat true (32) somewhat false (33) definitely false (34) this is currently in practice at my institution. (1) o o o o q13 please rank the elements of content strategy (as defined above) in order of their priority based on your observations of practice in your library. • ______ creation (1) • ______ delivery (2) • ______ governance (3) • ______ planning (4) • ______ user experience (5) q14 how would you assess the content strategy maturity of your organization? o basic (1) o intermediate (2) o advanced (3) end of block: web content strategy information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 36 start of block: thank you! q15 your name: ________________________________________________________________ q16 thank you very much for your willingness to be interviewed as part of our research study. prior to continuing on to finalize your survey submission, please sign up for an interview time: [link] (this link will open in a new window in order to allow you to finalize and submit your survey response after scheduling an appointment) please contact courtney mcdonald, crmcdonald@colorado.edu, if you experience any difficulty in registering or if there is not a time available that works for your schedule. end of block: thank you! 
information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 37 appendix d: informed consent document permission to take part in a human research study page 37 of 28 title of research study: content strategy in practice within academic libraries irb protocol number: 18-0670 investigators: courtney mcdonald and heidi burkhardt purpose of the study the purpose of the study is to establish an understanding of the degree of institutional engagement in web content strategy within academic and research libraries, and what trends may be detected in this area of professional practice. our primary subject population consists of academic and research libraries that are members of the following nationally and regionally significant membership organizations (excluding nonacademic member institutions): association of research libraries, big ten academic alliance, and/or greater western library alliance. we expect that you will be in this research study for the duration of the time it takes to complete our web-based survey. we expect about 210 people, representing their institutions, in the entire study internationally. explanation of procedures we are directly contacting each library to request that the appropriate individual(s) complete a web-based survey. this survey will be available over a four-week period in the spring of 2020. voluntary participation and withdrawal whether or not you take part in this research is your choice. you can leave the research at any time and it will not be held against you. the person in charge of the research study can remove you from the research study without your approval. possible reasons for removal include an incomplete survey submission. confidentiality information obtained about you for this study will be kept confidential to the extent allowed by law. research information that identifies you may be shared with the university of colorado boulder institutional review board (irb) and others who are responsible for ensuring compliance with laws and regulations related to research, including people on behalf of the office for human research protections. the information from this research may be published for scientific purposes; however, your identity will not be given out. payment for participation you will not be paid to be in this study. contact for future studies we would like to keep your contact information on file so we can notify you if we have future research studies we think you may be interested in. this information will be used by only th e information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 38 principal investigator of this study and only for this purpose. you can opt-in to provide your contact information at the end of the online survey. questions if you have questions, concerns, or complaints, or think the research has hurt you, contact to the research team at crmcdonald@colorado.edu this research has been reviewed and approved by an irb. you may talk to them at (303) 7353702 or irbadmin@colorado.edu if: • your questions, concerns, or complaints are not being answered by the research team. • you cannot reach the research team. • you want to talk to someone besides the research team. • you have questions about your rights as a research subject. • you want to get information or provide input about this research. 
signatures in lieu of your signature, your acknowledgement of this statement in the online survey document documents your permission to take part in this research. mailto:crmcdonald@colorado.edu mailto:irbadmin@colorado.edu information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 39 appendix e: other content management systems mentioned by respondents question #4: which of the following content management systems does your library use to manage library-authored web content? write-in responses for ‘proprietary system hosted by institution’ ● xxxxxxxxxxx • archivesspace • pressbooks • preservica • hippo cms • siteleaf • cascade • dotcms • terminal four • acquia drupal • fedora based digital collections system built in house write-in responses for ‘other” • wiki and blog • we draft content in google docs & also use gather content for auditing. • google sites • cascade • ebsco stacks • modx • islandora and online journal system • contentful • we also have some in-house-built tools such as for room booking; some of these are quite old and we would like to upgrade or improve them when we can. (very few people can make edits in these tools.) • cascade • the majority of the library website (and university website) is managed by a locally developed cms; however, the university is in the process of migrating to the acquia drupal cms. • blacklight, vivo, fedora • most pages are just non-cms for the website information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 40 appendix f: organizational responsibility for content; and position titles question 6 please explain how your organization distributes responsibility for content hosted in your content management system(s). if different parties (individuals, departments, collaborative groups) are responsible for managing content in different platforms please describe. • we have one primary website manager who oversees the management of the website, including content strategy and editing, and 2 editors who assist with small editing tasks. • we have content editors that edit content for individual libraries and collections. there is a content creator network managed by library communications. they provide trainings and guidance for content editors and act as reviewers, but not every single thing gets reviewed. • we have a team of developers and product owners who are responsible for managing web content. • we currently have a very distributed model, where virtually any library staff member or student assistant can request a drupal account and then make changes to existing content or develop new pages. we have a cross-departmental team that oversees the libraries' web interfaces and makes decisions about library homepage content, the menu navigation, overall ia, etc. we have web content guidelines to help staff as they develop new content. we have identified functional and technical owners for each of our cmss and have slightly different processes for managing content in those cmss. our general approach, however, is very inclusive (for better or worse ;) )-lots of staff have access to creating and editing content. we are, however, moving to a less distributed content for drupal in particular. moving forward, we'll have a small team responsible for editing and developing new content. this is to ensure that content is more consistent and user-centered. 
we attempted to identify funding for a full-time content manager but were unsuccessful, so this team will attempt to fill the role of a full-time content manager. • ux is the product owner and admin. if staff want content added to the website, they send a request to ux, we structure and edit content in a google doc, and then ux posts to the website. • there's no method for how or why responsibility is distributed. it ends up being something like, someone wants to add some content, they get editing access, they can now edit anything for as long as they're at the library. we are a super decentralized and informal library. • the primary content managers are the xxxxxx librarian and the xxxxxx. other individuals (primarily librarians) that are interested in editing their content have access on our development server. their edits are vetted by the xxxxxxand/or the xxxxxx librarian before being moved into production. • the xxxxxx department (6 staff) manages content and helps staff throughout the organization create and maintain content. ux staff sometimes teach others how to manage content, and sometimes do it for them. if design or content is complex, usually ux staff do the work. many staff don't maintain any content beyond their staff pages. subject specialists and instruction librarians maintain content [like] libguides-like content, but we don't use libguides. branch library staff maintain most of the content for their library pages. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 41 • in addition, the xxxxxx manages the catalog. the xxxxxx department manages special web projects. and the xxxxxx department manages social media, publications, and news. • a web content team made up of two administrators and librarians from xxxxxx and xxxxxx makes executive-level decisions about web content. • the xxxxxx team (xxxxxx) provides oversight and consulting for online user interfaces chaired by a xxxxxxposition which is new and is not yet filled. • for the public website, content editing is distributed to many groups and teams throughout the libraries. • the xxxxxxteam manages the main portions of the site including the homepage, news, maps, calendars, etc. the research librarians and subject liaisons manage the research guides. the xxxxxx provides guidance regarding overall responsibilities and style guidelines. • site structure and top-level pages for our main website resides with xxxxxx. page content is generally distributed to the departments closest to the services described by the pages. • right now editing of pages is distributed to those individuals who have the closest relationship to the pages being edited, with a significantly smaller number of people having administrative access to all of the libraries' websites. • primary website is co-managed by xxxxxx team (4 people) and xxxxxx team (3 people). xxxxxxteam creates timely content about news/events/initiatives while xxxxxx team manages content on evergreen topics. • research librarians and staff manage libguides content, which is in sore need of an inventory and pruning. • primarily me, plus two colleagues who serve with me as a web editorial board • one librarian manages the content and makes changes based on requests from other library staff • my role (xxxxxx) is xxxxxx. we also have a web content creator in our xxxxxx. 
i chair our xxxxxxgroup (xxxxxx), which has representatives from each division in the library and they are the primary stewards of supporting library authored web content. our "speciality" platforms (libguides, omeka, and wordpress for microsites) all have service leads, but content is managed by the respective stakeholders. the lead for libguides is a xxxxxx [group] member due to its scope and scale. in our primary website, we are currently structured around drupal organic groups for content management with xxxxxx [group] having broad editing access. in our new website, all content management will go through the xxxxxx, with communications for support and dynamic content (homepage, news, events) management. • management is somewhat in flux right now. we recently migrated our main web site to acquia drupal; there is a very new small committee consisting of xxxxxx, and three representatives from elsewhere in the library. for libguides, all reference, instructio n, and subject librarians can edit their own guides; the xxxxxx has tended to have final oversight but i don't know if this has ever been formally delegated. • librarians manage their own libguides subject guides; several members of xxxxxx can make administrative changes to coding, certificates, etc. on the entire site; there are individuals in different departments who control their own pages/libguides. there is a group within the library that administers wordpress for the institution. other content systems are administered by individuals within the library. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 42 • librarians are responsible for their own libguides. the xxxxxx department manages changes to most content, although some staff do manage their own wordpress content. they tend not to want to. • individuals. mainly one person authors content. the other individual has created some research guides. • individuals in different positions and departments within the library are assigned roles based on the type of content they frequently need to edit. • for instance, xxxxxx staff have the ability to create and edit exhibition content in drupal. xxxxxx staff and xxxxxx staff have the ability to create and edit equipment content. the event coordinator and librarians and staff involved in instruction are allowed to create and edit event and workshop listings. • only the communication coordinator is permitted to create news items that occupy real estate on the home page and various service point home pages. • as for general content, the primary internal stakeholders for that content typically create and edit that content, but if any staff notice a typo or factual error they are encouraged to correct them on their own, although they can also submit a request to the it department if they are not comfortable doing so. • subject specific content is hosted in libguides, and is maintained by subject liaison librarians. other content in libguides, software tutorials or information related to electronic resources for example, is created and maintained by appropriate specialists. • the drupal site when launched had internal stakeholders explicitly defined for each page, and only staff from the appropriate group could edit that content (e.g. if xxxxxx was tagged as the only stakeholder for a page about xxxxxx policies, then only staff from the xxxxxx department with editing privileges could edit that page). 
this system was abandoned after about two years as it was considered too much overhead to maintain and also the introduction of a content revisioning module that kept a history of edits alleviated fears of malicious editing. • individuals are assigned pages to keep content updated. the xxxxxx is responsible for coordinating with those staff and offers training to make sure content gets updated. • individual liaison librarians are responsible for their own libguides. i and the "xxxxxx" are the primary editors of the wordpress site, although 4 others have editing access (an employee who writes and posts news articles, the liaison librarian who spearheaded our new video tutorials, and two who work in special collections to update finding aids on that site, which is still on wordpress and i would consider under the main libraries web page, but is part of a multisite installation.) • in omeka and libguides, librarians are pretty self-sufficient and responsible for all of their own content. the three or four digital projects faculty and staff who work with omeka manage it internally alongside one of our developers. our omeka instance is relatively small-scale. • i (xxxxxx) oversee our libguides environment. while i am in the process of creating and implementing formal libguides content and structure guidelines, as of now it's a bit of a free-for-all with everyone responsible for the content pertaining to their own liaison department(s). content is made available to patrons via automatically populating legacy landing pages (we've had libguides for a decade and i've been with the institution not yet a year). information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 43 • as the xxxxxx, i am ultimately responsible for almost all of the content in our wordpress environment. that said, i try to distribute content upkeep responsibilities to the relevant department for each piece of the site. managers and committee chairs provide me with what they want on the web, and as needed (and in consultation with them) i review/rewrite it for the web (plain language), develop information architecture, design the front-end, and accessibly publish the content. there are only a few faculty and staff at my library who are comfortable with wordpress -but one of my long-term goals is to empower more folks to enact their own minor edits (e.g., updating hours, lending policies, etc.) while i oversee large-scale content creation, overall architecture, and strategy. we have a blog portion of our wordpress site which is not managed by anyone in particular, but i tend to clean it up if things go awry. • generally all of our web authors *can* publish to most parts of the site. (a very few content types (mostly featured images that display on home pages) can be edited only by admins and a small number of super-users.) however the great majority of people who can post content very rarely do (and some never do). some edit or post only to specific blogs, some only to their own guides or to very specific pages or suites of pages (e.g. liaison librarians to their own guides; thesis assistant to thesis pages). our small group in xxxxxx reviews new and updated pages and edits for in-house style and usability guidelines, and also trains and works collaboratively with web authors to create more usable content and reduce duplication -but given the large number of authors (with varied priorities, skills, and preferences) and pages we have trouble keeping up. 
we also more actively manage content on home pages. • for the main website and intranet, we have areas broken apart by unit area. we use workbench access to determine who can edit which pages. libguides is managed by committee, but most of the librarians have access. proprietary systems have separate accounts for those who need access. • for libguides, librarians can create content as they like, though there is a group that provides some (light) oversight. for main library website, most content is overseen by departments (in practice, one person each from a handful of “areas”, such as the branches, access services, etc.). • dotcms is primarily managed in systems (2 staff), with delegates from admin and outreach allowed to make limited changes to achieve their goals. libguides is used by all librarians and several staff, with six people given admin privileges. wordpress is used only in special collections. • xxxxxx dept manages major public facing platforms (drupal, wordpress, and shares libguides responsibilities with xxxxxx dept). xxxxxx manages omeka. within platforms, responsibilities are largely managed by department with individuals assigned content duties & permissions as needed. • different units maintain their content; one unit has overall management and checks for uniformity, needed updates, and broken links. • developers/communications office oversees some aspects, library management, research and collections librarians, and key staff edit other pieces. • currently, content is maintained by the xxxxxx librarian in coordination with content stakeholders from around the organization. we are in the process of migrating our site from drupal to omniupdate. once that is complete, we will develop a new model for content responsibilities. information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 44 • content is provided by department/services. • 5 librarians manage the libguides question 9 titles of positions in your organization whose primary duties involve creation, management and/or editing of web content: • head of web services; developer; web designer; user experience librarian • user experience librarian, lead librarian for discovery systems, digital technologies development librarian, lead librarian for software development. and we have titles that are university system it titles that don't mean a whole lot, such as technology support specialist and business and technology applications analyst. • web content specialist • user experience strategist, user experience designer, user experience student assistants , director of marketing communications and events • sr. ux specialist • web support consultant; coordinator, web services & library technology • editor & content strategist in library communications • web manager • discovery & systems librarian • head of library systems and technology • web services and data librarian • communications manager • web content and user experience specialist • metadata and discovery systems librarian, systems analyst, outreach librarian • digital services librarian; manager, communication services; communication specialist • (1) web project manager and content strategist, (2) web content creator • web services librarian • web developer ii • sr. 
software engineer, program director for digital services • user experience librarian • digital initiatives & scholarly communication librarian; senior library associate in digital scholarship and services • web services and usability librarian • senior library specialist -web content • web developer, software development librarian information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 45 appendix g: definitions of web content strategy question 11 in your own words, please define web content strategy. • a cohesive plan to create an overall strategy for web content that includes tone, terminology, structure, and deployment to best communicate the institution's message and enable the user. for the next question, the true answer is sort of. we have the start of a style guide. we also have the university's branding policies. we also have a web governance committee that is university-wide, of which i'm a part of. however, we don't have a complete strategy and it is certainly not documented. so you pick. • planning, development, and management of web content. two particularly important parts of web content strategy for academic library websites: 1. keeping content up to date and unpublishing outdated content. 2. building consensus for the creation and maintenance of a web style guide and ensuring that content across the large website adheres to the style guide. • strategies for management of content over its entire lifecycle to ensure it is accurate, timely, usable, accessible, appropriate, findable, and well-organized. • a system of workflows, training, and governance that supports the entire lifecycle of content, including creation, maintenance, and updating of content across all communications channels (e.g. websites, social media, signage). • a comprehensive, coordinated, planned approach to content across the site including components such as style guides, accessibility, information architecture, discoverability, seo. • not terribly familiar with the concept in a formal sense but think of it related to how the institution considers the intersection of content made available by the institution, the management and governance of issues such as branding/identity, accessibility, design, marketing, etc. • intentional and coordinated vision for content on the website • content strategy is the planning for the lifecycle of content. it includes creating, editing, reviewing, and deleting content. we also use a content strategy framework to determine each of the following for the content on our websites: audience, page goal, value proposition, validation, and measurement strategy. • website targets the community to ensure they can find what they need • the process of creating and enacting a vision for the organization and display of web content so that it is user friendly, accurate, up-to-date, and effective in its message. web content strategy often involves considering the thoughts and needs of many stakeholders, and creating one cohesive voice to represent them all. • web content strategy is the planning, design, delivery and governance plan for a website. this responsibility is guided by the library website management working group. • a web content strategy is a cohesive approach to managing and editing online content. an effective strategy takes into account web accessibility standards and endeavors to produce and maintain consistent, reliable, user-centered content. 
an effective content strategy evolves to meet the needs of online users and involves regular user testing and reviews of web traffic/analytics. • web content strategy is the theory and practice of creating, managing, and publishing web content according to evidence-based best practices for usability and readability information technology and libraries march 2021 web content strategy in practice within academic libraries | mcdonald and burkhardt 46 • making sure your content aligns with both your business goals and your audience needs. • a plan to oversee the life cycle of useful, usable content from its creation through maintenance and ultimately removal. • web content strategy is the overarching strategy for how you develop and disseminate web content. ideally, it would be structured and user tested to ensure that the content you are spending time developing is meeting the needs of your library and your community. • a web content strategy guides the full lifecycle of web content, including creation, maintenance, assessment, and retirement. it also sets guiding principles, makes responsibility and authority clear, and documents workflows. • an overarching method of bringing user experience best practices together on the website including: heuristics, information architecture, and writing for the web • planning and management of online content • a defined strategy for creating and delivering effective content to a defined audience at the right time. • in the most basic sense, web content strategy is matching the content, services and functionality of web properties with the organizational strategic goals. • web content strategy can include guidelines, processes, and/or approaches to making your website(s) usable, sustainable, and findable. it's a big-picture or higher-level way of thinking about your site(s), rather than page by page or function by function. • deliberate structures and practices to plan, deliver, and evaluate web content. • producing content that will be useful to users and easy for them to access • tying content to user behavior/user experience? • web content strategy is the thoughtful planning and construction of website content to meet users' needs. • n/a • cohesive planning, development, and management of web content, to engage and support library users. • working with teams and thinking strategically and holistically about the usability, functions, services, information, etc. provided on the website to best meet the needs of the site's users, as well as incorporating the marketing/promotional perspectives offered by the website. • planning and managing web content • web content strategy is the idea that all written and visual information on a certain site would conform to or align with the goals for that site. • ensuring that the most accurate and appropriate words, images, and other assets are presented to patrons at the point of need, while using web assets to tell stories patrons might not know they want to know. 
article tending to an overgrown garden: weeding and rebuilding a libguides v2 system rebecca hyams information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12163 rebecca hyams (rhyams@bmcc.cuny.edu) is web and systems librarian, borough of manhattan community college/cuny. © 2020. abstract in 2019, the borough of manhattan community college's library undertook a massive cleanup and reconfiguration of the content and guides contained in their libguides v2 system, which had been allowed to grow out of control over several years as no one was in charge of its maintenance. this article follows the process from identifying issues, getting departmental buy-in, and doing all of the necessary cleanup work for links and guides. the aim of the project was to make their guides easier for students to use and understand and for librarians to maintain. at the same time, work was done to improve the look and feel of their guides and implement the built-in a-z database list, both of which are also discussed. introduction in early 2019, the a. philip randolph library at the borough of manhattan community college (bmcc) (part of the city university of new york (cuny) system) hired a new web and systems librarian. the position itself was new to the library, though some of its functions had previously been performed by a staff member who had left more than a year prior. it quickly became apparent to the newest member of the library's faculty that, while someone had at one point managed the website, the same could not really be said for the library's libguides system and the mass of content contained within. the library's libguides system was first implemented in january 2013 and over time the system came to be used primarily by instruction librarians to serve their teaching efforts. not long after bmcc implemented libguides, springshare announced libguides version 2 (v2), a new version of the system that included several enhancements and features not present in the earlier version.1 these features included the ability to mix content types in a single box (in the earlier version, for example, boxes could have either rich text or links but not both), a centrally managed asset library, and an automatically-generated a-z database list designed to make it easy to manage a public-facing display. bmcc moved to libguides v2 around early 2015, but few of those who worked with the system ever took advantage of the newer features offered for quite some time, if at all. at the time the web and systems librarian came aboard, the bmcc libguides system contained over 400 public guides and an unwieldy asset library filled with duplicates and broken widgets and links. many of the guides essentially duplicated others, with only the name of the classroom instructor differing.
there were, for example, 69 separate guides just for english 101, some of which had not been updated in three or four years. there were no local guidelines for creating or maintaining guides, and in theory, each librarian was responsible for their own. however, it was apparent that in practice, no one was actively managing the guides or their related assets, as the lists of both were overwhelming. the creators of existing guides were primarily reference and instruction librarians whose other responsibilities meant there was little time to do guide upkeep, and because there was no single person in charge of the guides, there was no one to ensure any maintenance took place. in addition to the unwieldy guide list and asset library, the bmcc library was also effectively maintaining two separate a-z database lists, one on the library's website that was a homegrown sql database built by a previous staff member, and another running on libguides to provide links to databases via the guides. the lists were not in sync with one another and several of the librarians were unaware that the libguides version of the list even existed, leading to links to databases appearing on both the database list and as link assets. and, while the libguides a-z list was not linked from the library's website, it was still accessible from points within libguides, meaning that patrons could encounter an incorrect list that was not being maintained. getting started before any work could be done on our system, there needed to be buy-in from the rest of the library faculty. with the library director in agreement, agenda items were added to department meetings between march and may 2019 for discussion and department approval. the various aspects of the project were pitched to emphasize the following goals: • removing outdated material, broken links, etc. • streamlining where information could be found • decluttering guides to make everything easier to use and understand for students • improving the infrastructure to make maintenance and new guide creation easier and more manageable • standardizing layouts and content the aim of all of this would be to increase guide usability, accessibility, and make the guides overall a more consistent resource for our students. for the sake of transparency (as well as to have a demo of some of the aesthetic changes discussed in more detail below), a project guide was created and shared with the rest of the library department to share preliminary data as well as detailed updates as tasks were completed.2 process the database list while the libguides a-z database list, a feature built into v2 of the platform, contained information about our databases, it was essentially only serving to provide links to databases when creating guide content. there was some indication, in the form of a dormant a-z database "guide," that someone had tried to create a list in libguides by manually adding assets to a guide. while that was a common practice in libguides v1 sites, as the built-in list was not yet a part of the system, the built-in list itself was never properly put into use. the links on our website all pointed to a homegrown list which, while powered by an sql database, was essentially a manual list. because of its design, it had proved impossible for anyone in the library to update without extensive web programming knowledge.
it seemed a no-brainer to work on the database list first. this way we had both the infrastructure to update database-related content on the guides and a single and up-to-date list of resources with enhanced functionality that could benefit the library’s users almost immediately.3 information technology and libraries december 2020 tending to an overgrown garden | hyams 3 to begin, the two lists were compared to find any discrepancies, of which there were many. as the e-resources librarian was on leave at the time, the library director was consulted to determine which of databases missing from the libguides list were active subscriptions (and which of the ones missing from the homegrown list were previously cancelled so they could be removed). once the database list reflected current holdings, the metadata entries for the databases on the libguides side were updated to include resource type, related subjects, and related icons. these updates would enhance the functionality of the libguides list, as it could be filtered or searched using that additional information, something that was missing from the homegrown lis t. in addition to updating content and adding useful metadata, some slight visual changes were made to improve the look and usability of the list using custom css. most of this was done because as the list was being worked on, several librarians (of those who were even aware of it in the first place) mentioned that one reason they disliked the libguides list was because of the font size and spacing, which they felt was too small and hard to read. with the list updated, it was presented at the march 2019 department meeting and quickly won over all in attendance, especially when it was pointed out that the list could be very easily maintained because it required no special coding knowledge. while the homegrown list would remain live on the server for the rest of the semester (so as to not disrupt any classes that may have been using it), it was agreed that the web and systems librarian could go ahead with switching all of the links pointing to the homegrown list to point to the springshare list instead. the asset library because of how guides were typically created over the years since adopting libguides (many appeared to have been copied from another existing guide each time) the asset library had grown immense and unmanageable. for example, there were 149 separate links to our “databases by subject” page on the library’s website, the overwhelming majority of which were only used once. there were also 145 separate widgets for the same embedded scribd-hosted keyword worksheet, which was in fact broken and displayed no content. this is to say nothing of the broken-link report that no one had reviewed in quite some time. tackling the cleanup of duplicates and fixing of broken links/embeds was a large piece of the invisible work taken on behind the scenes to make maintaining the guides easier in the future. in order to analyze the data, the asset library report was exported to an excel file to make it easier to identify issues that needed correction. to start this process, we requested that springshare technical support wipe out all assets (other than documents) that were not mapped to anything and were just cluttering up the asset library (this ended up being just under 2,000 assets).4 most of those items had been removed from the guides they were originally included on but were never removed from the asset library. they served no real function other than to clutter up the backend. 
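the kind of duplicate-spotting described above (149 copies of one link, 145 copies of one widget) can also be scripted rather than done entirely by eye in a spreadsheet. the following is a minimal sketch only, not part of the library's actual workflow; it assumes the asset report has been exported as a csv file named assets.csv with columns named "Name" and "URL" (hypothetical headings chosen for illustration):

import csv
from collections import defaultdict

# group exported asset rows by normalized url so duplicates stand out
groups = defaultdict(list)
with open("assets.csv", newline="", encoding="utf-8") as report:
    for row in csv.DictReader(report):
        url = (row.get("URL") or "").strip().lower()
        if url:
            groups[url].append(row.get("Name", ""))

# list every url that appears more than once, most-duplicated first
for url, names in sorted(groups.items(), key=lambda item: -len(item[1])):
    if len(names) > 1:
        print(f"{len(names):4d} copies: {url}")

a listing like this makes it straightforward to pick one canonical asset per url and queue the remaining copies for remapping and deletion.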
the guide authors had given the web and systems librarian permission to remove anything broken that could not be easily fixed. this included the aforementioned broken worksheet (and other similar items), as well as an assortment of youtube video embeds where the video had since been taken down, resulting in a “this video is unavailable” error message. it was felt that since those were already not working and seriously hurt the reliability of our guides to our users, that no further permission was needed. then came the much more tedious task of standardizing (where possible) which assets were in use. this involved going into guides listed as containing known-duplicate assets, replacing them with a single, designated asset, and then removing the resulting unmapped items.5 it was decided information technology and libraries december 2020 tending to an overgrown garden | hyams 4 that while many of the guides would likely be deleted after the spring semester, that only assets appearing on currently-active guides would be standardized. while in hindsight, as many of the links that were fixed were on guides that were soon-to-be deleted, it would have been better to hold off and wait until guides could be deleted first. however, doing at least some of this work in advance helped find other issues including instances where our proxy prefix was included directly in the url (an issue as we were also in the process of changing our ezproxy hosting) and where custom descriptions or link names were unclear. “books from the catalog” assets had their own issues that also needed to be addressed. with a pending migration of the library’s ils, it was already apparent that the links to any books in the library’s catalog would need updating so they could have a shot at continuing to function postmigration.6 we had been told at the time that the library’s primo instance would remain through the migration (though this changed during the migration process) so at the time we felt it important to ensure that all links were pointing to primo, as some had been pointing to the soonto-be decommissioned opac. for consistency, the urls were structured as isbn searches instead of ones relying on internal system numbers that would soon change. however, it became obvious very early on that some of the links to books were either pointing to materials that were no longer in the library’s collection, or were pointing to a previously decommissioned opac server, both of which resulted in errors. because the domain of the previously decommissioned opac server had been whitelisted in the link checker report settings, these items had not appeared on the broken link list. using the filtered list of “books from the catalog” assets, all titles were checked, which allowed the web and systems librarian to remove items that were no longer in the collection and make other adjustments as needed. as a result of the asset cleanup process, the asset library went from an unwieldy total of more than 5,000 items to just over 2,000 items. it also simplified the process for reusing assets in new guides, as there was now only one choice per item, and made it much easier to find and fix broken links and embeds. the guides the cleanup of the guides themselves was by far the most complex task. before starting the guide cleanup work itself, the web and systems librarian performed a content analysis to identify and recommend guides for deletion and which could be converted into general subject area guides. 
because a common practice was to create a “custom” guide for each class that came in for a library instruction session, there was an overrepresentation of guides for the classes that had regular sessions: english 101 (english composition), english 201 (introduction to literature), speech 100 (public speaking), and introduction to critical thinking. those four courses accounted for 187 guides, or over 40 percent of the total number in our system. the majority of them had not been updated directly in over three years, and in some cases, were designed for instructors who no longer taught at the college. perhaps more telling was that the content for these guides diff ered more across the librarians who created them than across the courses they were designed for. this meant that while there might be three or four different iterations of the english 101 guide, the guides created by the same librarian for different introductory courses were essentially the same. before the arrival of the web and systems librarian, one of the other librarians had been occasionally maintaining guide groups for “current courses” and “past courses,” but it was unclear if anyone was still actively maintaining these groupings, as guides for current instructors were sometimes under “past courses” and vice versa. because these groups did not actually hide the information technology and libraries december 2020 tending to an overgrown garden | hyams 5 guides from view on the master list of guides and appeared to be unnecessary work, it was decided to remove the groupings. instead, the web and systems librarian would plan to revisit the guides on a regular basis to unpublish/remove anything for courses that were no longer taught. however, since the philosophy behind the guides was to move from “custom” guides for each instructor’s section to a general guide for the course as a whole for the overwhelming majority of cases, the need for maintaining these groupings was essentially eliminated anyway. in may 2020, a preliminary list of guides to be deleted was presented to the librarians at the monthly department meeting. the list was broken down as: • duplicates to be deleted: this portion consisted primarily of course guides like those mentioned above where multiple guides existed for the same course, most of which used the exact same content. • guides to be “merged:” while merging guides is not actually possible in the libguides platform, there were cases where we had two or three for the same course. they could be condensed into a single guide with the rest deleted. • guides to convert to subject area guides: these were guides that were essentially already structured as a subject guide but were titled for a specific course, and in many cases, a guide for the subject area did not already exist (for example, a course-specific guide for business would become the business subject area guide). • dead guides: these were guides that had not been updated in more than two years and had not been viewed in at least one year. librarians were given an opportunity in the department meeting to comment on the list, as well as to contact the web and systems librarian with any comments. additionally, as some of the classroom faculty on campus had connections to specific guides, the library director also sent out a message to classroom faculty to let them know of our general plan to revamp the guides and that many would be removed over the summer. 
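the "dead guides" criterion above (no updates in more than two years and no views in at least a year) is the sort of rule that can be applied mechanically to an exported guide report rather than checked guide by guide. the sketch below is illustrative only and was not part of the review described here; the file name, date format, and column headings ("Guide Name", "Last Updated", "Views Past Year") are all hypothetical:

import csv
from datetime import datetime, timedelta

TWO_YEARS_AGO = datetime.now() - timedelta(days=2 * 365)

# flag guides that meet the "dead guide" criterion described above
with open("guides.csv", newline="", encoding="utf-8") as report:
    for row in csv.DictReader(report):
        last_updated = datetime.strptime(row["Last Updated"], "%Y-%m-%d")
        views_past_year = int(row["Views Past Year"])
        if last_updated < TWO_YEARS_AGO and views_past_year == 0:
            print("candidate for deletion:", row["Guide Name"])

a first pass like this does not replace human review, but it produces a short list that librarians and classroom faculty can react to, much as the preliminary list presented at the may 2020 department meeting did.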
surprisingly, there were few objections either amongst the librarians or the classroom faculty once they understood the rationale and process. of the few classroom faculty members that did respond to the library director’s message, most of them were more concerned with content or specific links that they felt strongly about versus the guides themselves. in those cases, we noted the content requests to make sure they appeared on the new guides. most of these instructors were satisfied when we further explained our process and , if needed, ensured them that the content they requested would be worked into the new guide. only one instructor who responded, whose assignment was related to a grant they had received, made a strong case for keeping a separate guide for their sections of english 101. with the project approval out of the way, it was then time to begin removing all of the to-bedeleted guides and start the process of revamping those that would be kept. the goal was that the project would be completed by the start of the fall semester so that faculty and students would come back to a new (and hopefully, much improved) set of guides. removing debris to be cautious, a few preliminary steps were taken before the guides selected for deletion were removed. for starters, the selected guides had their status changed to “unpublished,” meaning that they no longer appeared on the public-facing list of guides. this gave everyone a chance to say something if a guide they were actively using suddenly went “missing.” these unpublished guides were then downloaded using the libguides html backup feature and saved to the department’s information technology and libraries december 2020 tending to an overgrown garden | hyams 6 network share drive. while the html backup output is not a full representation of the guide (the file generated displays as a single page and is missing any formatting or images that were included in the guide), it does include all of a guide’s content, meaning that a link or block of text can be retrieved from the backup in case of moments of “i know i had this on my guide before but....” because of the somewhat haphazard nature of our guides, deleting unwanted ones turned out to result in interesting and unexpected challenges. over the years, some of librarians had, from time to time, reused individual boxes between guides, but there was no consistency to the practice. while there was a repository guide for reusable content, not everyone used it or used it consistently. thankfully, libguides runs a pre-delete check, which proved to be invaluable in this process, as it showed if any of the boxes displayed on one guide were reused on any others. in most cases where boxes were reused, they were reused on guides that were also on the “to be deleted” list, but that was not always the case. by having that check we could find the other guides listed and make copies of the boxes that would have otherwise been deleted. if a box was reused on multiple guides that were being kept, it was copied to the reusable content guide and then remapped from there. cosmetic improvements in conjunction with the work being done to improve content of our guides, the web and systems librarian felt it was the perfect opportunity to update the guide templates and overall aesthetics to make the guides more visually appealing, especially considering little had been done in this area system-wide apart from setting the default color scheme. 
using the project guide as an initial sandbox, several changes were put into motion that would eventually be worked into new templates and pushed out to all of the reworked guides. the first, and perhaps biggest, change was the move from tab navigation to side navigation (an option first made available with the release of libguides v2). while there have been several usability studies that have debated using one over the other, in this case side navigation was chosen both for the streamlined nature of the layout as a whole (by default there is only one full content column), and because enabling the box-level navigation could serve as a quick index for anyone looking to find specific content on a page.7 side navigation also avoided the issue of long lists of tabs spilling into a second row, which further complicated page navigation. several changes to the look and feel of the guides were also put into place, with many of the changes coming from suggestions given on various libguide style or best practice guides or more general recommendations from web usability guidelines.8 perhaps most importantly, all of the font sizes were increased for improved readability, especially on box titles and headers, to better facilitate visual scanning. the default fonts were also replaced with two commonly used fonts from the google fonts library, roboto (for headings and titles) and open sans (for body text). additionally, the navigation color scheme was changed because the orange of the college’s blueand-orange color scheme regularly failed accessibility contrast checks and was described by some colleagues as “harsh on the eyes.” instead, two analogous lighter shades of blue (one of which was taken from the college’s branding documentation) were selected for the navigation and box titles respectively, both of which allowed for the text in those areas to be changed from white to black (again, for improved readability). figure 1 shows a typical “before” guide navigation design, and figure 2 shows a typical “after” design. information technology and libraries december 2020 tending to an overgrown garden | hyams 7 figure 1. a sample of guide navigation and content frequently found on guides before start of cleanup figure 2. navigation and content after revisions additionally, the web and systems librarian took this opportunity to go through the remaining guides to ensure they were all consistent. most of this work fell in the area of text styling, or rather, undoing text styling. it was clear from several of the guides that over the years, librarians had not been happy with the default font sizes or styles, which lead to a lot of customizing using the built-in wysiwyg text editor. not only did this create a nightmare in the code itself (as the wysiwyg editor adds a lot of extraneous tags and style markup), but it also meant that the changes coming from the new stylesheet were not being applied universally as any properties assigned on a page overrode the global css. there was also the issue of paragraph text (
<p>
    ) that was sometimes styled as fake headings (made larger or bolder to look like headings, but not using the proper tags) which needed to be corrected for consistency and accessibility purposes. replanting and sprucing up with an overwhelming majority of the guides (and their associated assets) deleted, it was finally time to rework the remaining guides into clear, easy-to-use resources that would benefit our students. at this point the guides fell into three categories: • guides that just needed to be pruned and updated. • guides that should be combined into a single subject area guide. • guides that should be created to fill an unmet need. information technology and libraries december 2020 tending to an overgrown garden | hyams 8 pruning and updating tasks were generally the least-arduous, as many of the guides included content that was also housed on discrete guides (citations, resource evaluation, etc.). instead of duplicating, for example, citation formats on every guide, those pages were replaced with navigation-level links out to the existing citation guide. this was also the point that we could do more extensive quality control such as switching to a single content column which further emphasized the extraneous information on many of our guides. infographics, videos, and long blocks of links or text were scrutinized to determine if they were helping to enhance students’ understanding of the core content or if they were merely providing clutter that would make it more difficult to understand the important information.9 in some cases, by going from guide to guide, it became apparent that there were guides for multiple courses in a subject area where the resources were basically identical. this was most noticeable in the criminal justice and health education subject areas. in these cases, it made little sense to keep separate course guides when the content was basically the same across them. to remedy this duplication, one of the course guides for each subject was transformed into the subject area guide, and resources were added to ensure they covered the same materials that the separate course guides may have covered. the remaining course guides were then marked for future deletion as they were no longer needed. lastly, subject areas without guides were identified so that work could be done later to create them. as we had discussed moving towards using the “automagic” integration of guide content into our blackboard learning management system (lms), this step will be key in ensuring that all subject areas have at least some resources students can use. however, as of this time we have yet to finish creating these additional guides, and several subject areas (including computer science, nursing, and gender studies) have no guides at all. next steps now that all of the work to clean and update our libguides is done, the most important next step is coming up with a workflow to ensure that the guides stay relevant and useful. the web and systems librarian mostly left the guides alone for the fall 2019 semester to allow their colleagues time to use them and report back any issues. to the web and systems librarian’s surprise there were few issues reported, but that does not mean there is no room for future improvement. as a department, it is clear that we need a formal plan for maintaining the guides, including update frequency, content review, and guidelines for when guides should be added or deleted. 
additionally, immediately following the conclusion of this cleanup project the library’s website was forced into a server migration and full rebuild for reasons outside of the scope of this article. however, as a result there were changes made on the library’s site involving the look and feel of pages that will need to be carried through into our guides and associated springshare platforms. while most of this work is relatively simple, mimicking changes developed in wordpress to work properly on external services will take time and effort. conclusion overall, while this project was a massive undertaking (done almost entirely by a single person), the end result, at least on the surface, has made our guides much easier to use and understand. there were obviously several things that, if the project were to be done over, should have been done differently, mostly involving the cleaning of the asset library. however, it is now much easier information technology and libraries december 2020 tending to an overgrown garden | hyams 9 to refer students to guides for their courses and the feelings about the guides amongst the library faculty have become much more positive. endnotes 1 “libguides: the next generation!,” springshare blog (blog), june 26, 2013, https://blog.springshare.com/2013/06/26/libguides-the-next-generation/. 2 the guide can be viewed at: https://bmcc.libguides.com/guidecleanup. 3 though the author only learned of the project undertaken at unc a few years ago, after they had already finished this project, a similar project was outlined here: sarah joy arnold, “out with the old, in with the new: migrating to libguides a-z database list,” journal of electronic resources librarianship 29, no. 2 (april 2017): 117–20, https://doi.org/10.1080/1941126x.2017.1304769. 4 because there was no way to view the documents before a bulk deletion, documents were manually reviewed and deleted as needed. 5 it was only long after this process that springshare promoted that they could do this on the backend by request. 6 however, it turned out that due to the differences in url structure between classic primo and primo ve that this change was completely unnecessary as the urls did actually needed to be changed again post-migration. at least they were consistent which meant a systemwide findand-replace could take care of most of the links. 7 several studies have been done since the roll out of libguides v2 including: sarah thorngate and allison hoden, “exploratory usability testing of user interface options in libguides 2,” college and research libraries 78, no. 6 (2017): 844–61, https://doi.org/10.5860/crl.78.6.844; kate conerton and cheryl goldenstein, “making libguides work: student interviews and usability tests,” internet reference services quarterly 22, no. 1 (january 2017): 43–54, https://doi.org/10.1080/10875301.2017.1290002. 8 of the many guides the author consulted, the following were the most informative: stephanie jacobs, “best practices for libguides at usf,” https://guides.lib.usf.edu/c.php?g=388525&p=2635904; jesse martinez, “libguides standards and best practices,” https://libguides.bc.edu/guidestandards/getting-started; carrie williams, “best practices for building guides & accessibility tips,” https://training.springshare.com/libguides/best-practices-accessibility/video. 9 there is a very detailed discussion of cognitive overload in libguides in jennifer j. little, “cognitive load theory and library research guides,” internet reference services quarterly 15, no. 
1 (march 1, 2010): 53–63, https://doi.org/10.1080/10875300903530199. articles a comprehensive approach to algorithmic machine sorting of library of congress call numbers scott wagner and corey wetherington information technology and libraries | december 2019 62 scott wagner (smw284@psu.edu) is information resources and services support specialist, penn state university libraries. corey wetherington (cjw36@psu.edu) is open and affordable course content coordinator, penn state university libraries. abstract this paper details an approach for accurately machine sorting library of congress (lc) call numbers which improves considerably upon other methods reviewed. the authors have employed this sorting method in creating an open-source software tool for library stacks maintenance, possibly the first such application capable of sorting the full range of lc call numbers. the method has potential application to any software environment that stores and retrieves lc call number information. background the library of congress classification (lcc) system was devised around the turn of the twentieth century, well before the advent of digital computing.1 consequently, neither it nor the system of library of congress (lc) call numbers which extend it were designed with any consideration to machine readability or automated sorting.2 rather, the classification was formulated for the arrangement of a large quantity of library materials on the basis of content, gathering like items together to allow for browsing within specific topics, and in such a way that a new item may always be inserted (interfiled) between two previously catalogued items without disruption to the overall scheme. unlike, for instance, modern telephone numbers, isbns, or upcs—identifiers which pair an item with a unique string of digits having a fixed and regular format, largely irrespective of any particular characteristics of the item itself—lc call numbers specify the locations of items relative to others and convey certain encoded information about the content of those items. the library of congress summarizes the essence of the lcc in this way: the system divides all knowledge into twenty-one basic classes, each identified by a single letter of the alphabet. most of these alphabetical classes are further divided into more specific subclasses, identified by two-letter, or occasionally three-letter, combinations. for example, class n, art, has subclasses na, architecture; nb, sculpture; nd, painting; as well as several other subclasses. each subclass includes a loosely hierarchical arrangement of the topics pertinent to the subclass, going from the general to the more specific.
individual topics are often broken down by specific places, time periods, or bibliographic forms (such as periodicals, biographies, etc.). each topic (often referred to as a caption) is assigned a single number or a span of numbers. whole numbers used in lcc may range from one to four digits in length, and may be further extended by the use of decimal numbers. some subtopics appear in alphabetical, rather than hierarchical, lists and are represented by decimal numbers that combine a letter of the alphabet with a numeral, e.g., .b72 or .k535. relationships among topics in lcc are shown not by the numbers that are assigned to them, but by indenting subtopics under the larger topics that they are a part of, much like an outline. in this respect, it is different from more strictly hierarchical classification systems, such as the dewey decimal classification, where hierarchical relationships among topics are shown by numbers that can be continuously subdivided.3 as this description suggests, lcc cataloging practices can be quite idiosyncratic and inconsistent across different topics and subtopics, and sorting rules for properly shelf-ordering lc call numbers can be correspondingly complex, as we will see below.4 for the purposes of discussion in what follows, we divide lc call number strings into three principal substrings: the classification, the cutter, and what we will term the specification. the classification categorizes the item on the basis of its subject matter, following detailed schedules of the lcc system published by the library of congress; the cutter situates the item alongside others within its classification (often on the basis of its title and/or author5); and the specification distinguishes a specific edition, volume, format, or other characteristic of the item from others having the same author and title: hc125 (a) .g25313 (b) 1997 (c) in the above example, the classification string (a) denotes the subject matter (in this case, general economic history and conditions of latin america), the cutter string (b) locates the book within this topic on the basis of author and/or title (following a specific encoding process), and the specification string (c) denotes the particular edition of the text (in this case, by year). each of these general substrings may contain further substrings having specific cataloging functions, and though each is constructed following certain rigid syntactical rules, a great deal of variation in format may be observed within the basic framework. the following is an inexhaustive summary of the basic syntax of each of the three call number components: • the classification string always begins with one to three letters (the class/subclass), almost always followed by one to four digits (the caption number), possibly including an additional decimal. the classification may also contain a date or ordinal number following the caption number. • the beginning of the cutter string is always indicated by a decimal point followed by a letter and at least one digit. while the majority of call numbers contain a cutter, it is not always present in all cases. among the sorting challenges posed by lc call numbers, we note in particular the "double cutter"—a common occurrence in certain subclasses—wherein the cutter string changes from alphabetic to numeric, then back to alphabetic, and finally again to numeric.
triple cutters are also possible, as are dates intervening between cutters. certain cutter strings (e.g., in juvenile fiction) end with an alphabetic "work mark" composed of two or more letters. • the specification string (which may be absent on older materials) is always last, and usually contains the date of the edition, but may also include volume or other numbering, ordinal numbers, format/part descriptions (e.g., "dvd," "manual," "notes"), or other distinguishing information. figure 1 shows example call numbers, all found within the catalog of penn state university libraries, suggesting the wide variety of possibilities: • b1190 1951 (no cutter string) • dt423.e26 9th.ed. 2012 (compound specification) • e505.5 102nd.f57 1999 (ordinal number in classification) • hb3717 1929.e37 2015 (date in classification) • kbd.g189s (no caption number, no specification) • n8354.b67 2000x (date with suffix) • ps634.b4 1958-63 (hyphenated range of dates) • ps3557.a28r4 1955 ("double cutter") • pz8.3.g276lo 1971 (cutter with "work mark") • pz73.s758345255 2011 (lengthy cutter decimal) figure 1. example call numbers. as one might expect given this irregularity in syntax, systematic machine-sorting of lc call numbers is by no means trivial. to begin with, sorting procedures within the lcc system are to a certain degree contextual—that is, the sorter must understand how a given component of a call number operates within the context of the entire string in order to determine how it should sort. both integer and decimal substrings appear in lc call numbers, so that a numeral may properly precede a letter in one part of a call number (a '1' sorts before an 'a' in the classification portion, for example: h1 precedes ha1), while the contrary may occur in another part (within the cutter, in particular, an 'a' may well precede a '1': hb74.p65a2 precedes hb74.p6512). similarly, letters may have different sorting implications depending on where and how they appear. compare, for instance, the call numbers v23.k4 1961 and u1.p32 v.23 1993/94. the v in the former denotes the subclass of general nautical reference works and simply sorts alphabetically, whereas the v in the latter call number functions in part as an indicator that the numeral 23 refers to a specific volume number and is to be sorted as an integer rather than a decimal. such contextual cues are often tacitly understood by a human sorter, but can present considerable challenges when implementing machine sorting procedures. furthermore, the lack of uniformity or regularity in the format of call number strings poses various practical obstacles for machine sorting. taken together, these assorted complexities suggest the insufficiency of a single alphanumeric sorting procedure to adequately handle lc call numbers as unprocessed, plain text strings. literature review a thorough review of information science literature revealed little formal discussion of the algorithmic sorting of lc call numbers. if the topic has been more widely addressed in the scholarly or technical literature, we were unable to discover it. nevertheless, the general problem appears to be fairly well known. this is evident both from informal online discussions of the topic (e.g., in blog posts, message board threads, and coding forums) and from the existence of certain features of library management system (lms) and integrated library system (ils) software designed to address the issue. in this section we examine methods proffered by some of these sources, and detail how each fails to fully account for all aspects of lc call number sorting.
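before turning to those methods, the three-part structure just described can be made concrete with a small parsing sketch. the following is a simplified illustration only, not the authors' tool: it assumes a basic call number shape (class letters, caption number with optional decimal, optional cutter, anything remaining treated as the specification) and will not handle every format shown in figure 1, such as call numbers lacking a caption number or containing dates within the classification.

import re

# rough split of an lc call number into classification / cutter / specification
# (illustrative only; real call numbers require far more careful handling)
PATTERN = re.compile(
    r"^(?P<classification>[a-z]{1,3}\s*\d{1,4}(?:\.\d+)?)"  # class letters + caption
    r"\s*(?P<cutter>\.[a-z]\d+(?:[a-z]+\d*)*)?"              # optional cutter string
    r"\s*(?P<specification>.+)?$",                           # whatever is left over
    re.IGNORECASE,
)

def split_call_number(call_number):
    match = PATTERN.match(call_number.strip())
    return match.groupdict() if match else None

print(split_call_number("hc125 .g25313 1997"))
# {'classification': 'hc125', 'cutter': '.g25313', 'specification': '1997'}

even this toy version shows why the specification has to be treated separately: whatever trails the cutter may be a date, volume numbering, a format note, or some combination of these, and cannot simply be compared character by character against the rest of the string.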
in a brief article archived online, conley and nolan outline a method for sorting lc call numbers through the use of function programming in microsoft excel.6 given a column of plain-text lc call numbers, their approach entails successive processing of the call numbers across several spreadsheet columns with the aim of properly accounting for the sorting of integers. the fully-processed strings are then ultimately ready for sorting in the rightmost column using excel's built-in sorting functionality. we note that conley and nolan's method (hereafter "cnm") only attempts to sort what the authors refer to as the "base call number" (i.e., the classification and cutter portions), and does not address the sorting of "volume numbers, issue numbers, or sheet numbers" (what we refer to here as the "specification").7 cnm stems from the tacit observation that ordinary, single-column sorting of lc call numbers is clearly inadequate in an environment like excel's. for instance, in the following example, standard character-by-character sorting fails at the third character position, since pz30.a1 erroneously sorts before pz7.a1 (as 3 is compared to 7 in the third character position), contrary to the correct order (7 before 30). to address this, cnm normalizes the numeric portion of the class number with leading zeros so that each numeric string is of equal length, ensuring that the proper digits are compared during sorting. this entails a transformation, pz30.a1 → pz0030.a1 and pz7.a1 → pz0007.a1, following which the strings will in fact sort correctly in an excel column. this technique appears adequate until we compare call numbers having subclasses of different length: p180.a1 → p0180.a1 versus pz30.a1 → pz0030.a1. here, while standard excel sorting will in fact properly order the resulting strings, in other applications, depending on the sorting hierarchy employed, sorting may fail in the second position if letters are sorted before numbers. hierarchy aside, it is not difficult to see the potential issues that may arise from sorting unlike portions of the call number string against one another in this way, particularly when comparing characters within the cutter string or in situations involving a "double cutter." for instance, the call numbers b945.d4b65 1998 and b945.d41 1981b are listed here in their proper sorting order, but are in fact sorted in reverse by cnm when, in the eighth character position, 1 is sorted before b in accordance with excel's default sorting priority. this again illustrates an essential problem of character-by-character sorting: in certain substrings we require a letters-before-numbers sorting priority, while in others a numbers-before-letters order is needed. this impasse makes clear that no single-column sorting methodology can succeed for all types of lc call numbers without significant modification to the call number string. in a blog post, dannay observed that cnm does not account for certain call number formats, particularly those of legal materials within the k classification having 3-letter class strings.8 (the
in a blog post, dannay observed that cnm does not account for certain call number formats, particularly those of legal materials within the k classification having 3-letter class strings. 8 (the information technology and libraries | december 2019 66 same would also be true in the d classification, where 3-letter strings also appear.) although minor modification of portions of the function code (e.g., replacing certain ‘2’s with ‘3’s) would be sufficient to alleviate this particular issue, dannay proposed instead to employ placeholder characters to normalize the classification string and avoid instances of alphabetic characters being compared against numeric ones. dannay’s method (dm) normalizes various parts of the classification string, including the subclass, caption, and decimal portions: q171.t78  q**0171.0.t78 qa9.r8  qa*0009.0.r8 (here, of course, it is imperative that the chosen placeholder character sort before all letters in the sorting hierarchy.) dm thus successfully avoids the issue of comparing classification strings of unequal length or format. nevertheless, despite the improvements of dm over cnm, both approaches are ultimately unable to properly process certain types of common lc call numbers. for example, call numbers with dates preceding the cutter (e.g., gv722 1936.h55 2006) and call numbers without cutters (e.g., b1205 1958) both result in errors, as do those containing the aforementioned “double cutters.” furthermore, as we previously noted, neither dm nor cnm were designed to handle any portion of the specification string following the cutter, where the presence of ordinal and volumetype numbering is commonplace. hence neither method is able to properly order the quite ordinary pair of call numbers ac1.g7 v.19 and ac1.g7 v.2, since the first digit of each’s volume number is compared and ordered numerically (i.e., character-by-character), resulting in a mis-sort. though neither dn nor cnm is ultimately comprehensive (nor designed to be), both methods contain valuable insights and strategies that inform our own approach to the problem. software review available software solutions for sorting lc call numbers appear to be nearly as scant as literature on the subject. while github contains a handful of programs that attempt to address the problem, we found none which could be considered comprehensive. table 1 is a summary of those programs we discovered and were able to examine. the “sqlite3-lccn-extension” program is an extension for sqlite 3 which provides a collation for normalizing lc call numbers, executing from a sqlite client shell. we discovered several limitations in its ability to sort certain call number formats similar to those discussed above in the literature review. for instance, the program cannot correctly sort specification integers (e.g., it sorts v.13 before v.3) or call numbers lacking cutter strings (e.g., it sorts b 1190.a1 1951 before b 1190 1951). we found similar issues with “js-loc-callnumbers,” a javascript program with a web interface into which a list of call numbers can be pasted. the program transforms the call numbers into normalized strings, which are then sorted and displayed to the user. 
however, we observed that it does not account for dates or ordinal numbers in the classification string, nor can it correctly sort call numbers lacking caption numbers.9

program and author | app-type, interface | repository url | last update
"sqlite3-lccn-extension" by brad dewar | database extension, shell | https://github.com/macdewar/sqlite3-lccn-extension | dec. 2013
"js-loc-callnumbers" by ray voelker | javascript, web | https://github.com/rayvoelker/js-loc-callnumbers | feb. 2017
"library-of-congress-system" by luis ulloa | python tutorial, command line | https://github.com/ulloaluis/library-of-congress-system | sep. 2018
"lcsortable" by mbelvadi2 | google sheets script | https://github.com/mbelvadi2/lcsortable | may 2017
"library-callnumber-lc" by library hackers | perl, python | https://github.com/libraryhackers/library-callnumber-lc/tree/master/perl/library-callnumber-lc | dec. 2014
"lc_call_number_compare" by smu libraries | javascript, command line | https://github.com/smu-libraries/lc_call_number_compare | dec. 2016
"lc_callnumber" by bill dueber | ruby | https://github.com/billdueber/lc_callnumber | feb. 2015
table 1. list of github software involving lc call number sorting.

several of the programs are rather narrow in scope. the "lcsortable" script is a google sheets tool for normalizing lc call numbers into a separate column for sorting, very much like cnm and dm. its normalization routine appears to conflate decimals and integers, though, leading to transformations such as

hf5438.5.p475 2001 → hf5438.0005.p04752001

which would clearly result in a great deal of incorrect sorting across a wide array of lc call number formats. the command-line-based python program "library-callnumber-lc" processes a call number and returns a normalized sort key, but is not intended to store or sort groups of call numbers. it cannot adequately handle compound specifications or cutters containing consecutive letters (e.g., s100.bc123 1985), and does not appear to preserve the demarcation between a caption integer and caption decimal (i.e., the decimal point), thereby commingling integer and decimal sorting logic. lastly, "library-of-congress-system" is a tutorial/training program written in python that runs from the command line and supplies a list of call numbers for the user to sort.
it does not draw call numbers from a static collection nor allow call numbers to be input by the user; rather, it randomly generates call numbers within certain parameters and following a prescribed pattern. as such, we were not able to satisfactorily test its sorting capabilities for the kind of use-case scenario under discussion. we did not evaluate the remaining two github programs, "lc_call_number_compare" and "lc_callnumber," as we could not get the former, a javascript es6 module, to execute, and as the latter, a ruby application which we did not attempt to install, evidently remains unfinished: its github documentation lists "normalization: create a string that can be compared with other normalized strings to correctly order the call numbers" as among the tasks yet to be completed.

in addition to these open resources, we examined lc sorting capability within the commercial lms/ils software we had at hand. the marc (machine-readable cataloging) 21 protocol, a widely used international standard for formatting bibliographic data, provides a specific syntax for cataloging lc call numbers for the purposes of machine parsing.10 symphony workflows, the lms licensed by penn state university libraries from sirsidynix (and thus the only one available for our direct examination), contains within its search module a call number browsing feature which attempts to sort call numbers in shelving order via "shelving ids," call number strings rendered from each item's marc 21 "050" data field for sorting purposes. while these shelving ids are not visible within workflows (that is, they operate in the background), they can be accessed as plain text strings via bluecloud analytics, a separate, sirsidynix-branded data assessment and reporting tool peripheral to the lms. examination of these sort keys revealed integer normalization strategies similar to those of dm and cnm, with additional processing of volume-type numbering within the specification string. however, these shelving ids are similarly unable to correctly sort "double cutter" substrings and other syntactic complexities, such as ordinal numbers appearing in the classification.
the following shelving id transformations of two call numbers in the penn state university libraries catalog, for instance, fail to properly account for the ordinal numbers which appear within the classification:

e507.5 36th.v47 2003 → e 000507.5 36th.v47 2003
e507.5 5th.c36 2000 → e 000507.5 5th.c36 2000

consequently, and as expected, these two call numbers sort incorrectly within workflows' call number browsing panes.11

proposed parsing and sorting methodology

given the sorting difficulties inherent in the single-column approaches outlined above, we suggest a multi-column, tiered sorting procedure in which only like portions of the call number are compared to one another. this requires the call number to be processed, its various components identified, and each component appropriately sorted according to its specific type. this, in turn, requires a sorting algorithm which can identify like substrings by scanning for specific patterns and cues. "shelf reading" is a term for the common practice of verifying the correct ordering of items filed on a library shelf, typically unaided by technology, and our approach is primarily informed by the kind of mental procedures one undertakes when performing such sorting "in one's head."12 perhaps the most significant component of this process involves recognizing and interpreting the role and logic of specific types of substrings and identifying their positions within the sorting hierarchy.

the overall design of the lc classification, from class to subclass to caption, constitutes a left-to-right progression from general to specific, and the classification portion of a call number can be interpreted as a series of containers holding items of increasingly narrow scope, some of which may be empty (that is, absent). this creates a structure that has a linear, hierarchical aspect, but also contains within it subcategories that share a common position within the structure. the priority that a subcategory (or container) is afforded in the sorting process depends first upon its position in the linear hierarchy, and subsequently on the depth ascribed to it relative to other subcategories that share the same position. call numbers indicate a subcategory's position in the linear dimension by including or expanding sections; its depth within a given position is encoded in the character or series of characters chosen to represent it. thus, the sorting process may be regarded as a comparison of the paths that two call numbers denote through this structure, and the point at which the paths diverge is then the decisive point in determining an item's position relative to others. this inflection point may occur at any juncture of the comparison, from the first character to the last. given these observations, a comprehensive machine-sorting strategy must observe the following provisions:

1. characters in call numbers should only be compared to characters that occupy an equivalent section of another call number. ("like compared to like.")
2. within these designated sections, characters should only be compared to characters that occupy a corresponding position (place value) within that section.
3. if call numbers are identical up to the point that one of them lacks a section that the other call number possesses, the one with the "missing" section is ordered first.
this is in keeping with the convention that items occupying a more general level in the hierarchy are ordered before those occupying a more specific one. (this principle is often summarized in shelf-reading tutorials as "nothing before something.")
4. if call numbers are identical up to the point that one of them lacks a character in a given position within a particular section that the other call number possesses, the one missing the character is ordered first. again, this preserves the general-to-specific scheme of lcc sorting. (another instance of "nothing before something.")
5. whole numbers (e.g., caption integers, volume numbers) must be distinguished from decimals. for character-by-character sorting to work in sections of the call number containing integers, the length of whole numbers must be normalized to assure each digit is compared to another of equal place value.

application of methodology

shelfreader is a software application designed by the authors to improve the speed and accuracy of the shelf-reading process in collections filed using the library of congress system and is, to our knowledge, the first such application to do so. it was coded by scott wagner in php and javascript, uses mysql for data storage and sorting, and is deployed as a web application. shelfreader allows the user to scan library items in the order they are shelved and receive feedback regarding any mis-shelved items. the program receives an item's unique barcode identification via a barcode scanner, assembles a rest request incorporating the barcode, and sends it to an api connected to the lms. the application then processes the response, retrieving the title and call number of the item, along with information about the item's status (for example, if it has been marked as lost or missing). the call number is passed off to the sorting algorithm, which processes it and assigns it a position among the set of call numbers recorded during that session. a user interface then presents a "virtual shelf" to the user displaying a graphical representation of the items in the order they were scanned. when items are out of place on the shelf, the program calculates the smallest number of moves needed to correct the shelf and presents the necessary corrections for the user to perform until the shelf is properly ordered. a screenshot depicting the shelfreader gui during a typical shelf-reading session is presented in figure 2.

figure 2. a screenshot of the shelfreader gui, showing an incorrectly filed item (highlighted in blue text) and its proper filing position (represented by the green band).

shelfreader's sorting strategy consists of breaking call numbers into elemental substrings and arranging those parts in a database table so that any two call numbers may be compared exclusively on the basis of their corresponding parts. to this end, a base set of call number components was established. these are shown in table 2, along with their abbreviations (for ease in reference), maximum length, and corresponding mysql data types. the specific mysql data type determines the kind of sorting employed in each column:

• varchar accepts alphanumeric string data. sorting is character by character, numbers before letters.
• integer accepts numerical data; numbers are evaluated as whole numbers.
• decimal accepts decimal values.
specifying the overall length of the column and the number of characters to the right of the decimal point has the effect of adding zeros as placeholders in any empty spaces to the right of the last digit. the values are then compared digit by digit.
• timestamp is a date/time value that defaults to the date and time the entry is made. this orders call numbers that are identical (i.e., multiple copies of the same item) in the order they are scanned.

section, component | abbreviation | max. length | mysql data type
classification: class/subclass | sbc | 3 | varchar
classification: caption number, integer part | ci | 4 | integer
classification: caption number, decimal part | cdl | 16 | decimal
classification: caption date | cdt | 4 | varchar
classification: caption ordinal | co | 16 | integer
classification: caption ordinal indicator | coi | 2 | varchar
cutter: first cutter, alphabetical part | c1a | 3 | varchar
cutter: first cutter, numerical part | c1n | 16 | decimal
cutter: first cutter date | cd | 4 | integer
cutter: second cutter, alphabetical part | c2a | 3 | varchar
cutter: second cutter, numerical part | c2n | 16 | decimal
cutter: second cutter date | cd2 | 4 | integer
cutter: third cutter, alphabetical part | c3a | 3 | varchar
cutter: third cutter, numerical part | c3n | 16 | decimal
specification: specification | sp | 256 | varchar
timestamp | — | — | mysql timestamp
table 2. shelfreader call number components and data types.

when parsing a call number, it must be assumed that each call number may contain all of the components identified above. the following is a general outline of the parsing algorithm which processes the call number:

1. an array is created from the call number. each character, including spaces, is an element of the array.
2. a second array is then created to serve as a template for each call number, replacing the actual characters with ones indicating data type. for example, all integers are replaced with 'i's. this makes pattern matching and data-type testing simpler.
3. pattern matching is used to identify the presence or absence of landmarks such as cutters, spaces, volume-type numbering, etc.
4. when landmarks are identified, their beginning and ending positions in the call number string are noted.
5. component strings are created by looping through the appropriate section of the call number template, constructing a string in which the template characters are replaced by the actual characters in the call number string and continuing until a space, the end of the string, or an incompatible character is encountered.
6. where needed, whole-number strings are normalized to uniform length.

dividing a call number into its component parts and placing those parts in separate columns in a database table, then, effectively creates a sort key that may be used for ordering. this key occupies a row of the table, and is an inflated representation of the call number insofar as it makes use of the maximum possible string length of each component type. it contains the characters of each component the call number possesses, and any empty columns serve as placeholders for components it does not possess. when two call numbers are compared, sorting proceeds through each successive column, each component (and each character within each component) serving as a potential break point within the sorting process. we note that every column (with the exception of the specification) contains exclusively alphabetic or numeric data, so that numbers and letters are never compared in those sections of the call number string. (the use of spaces in the specification string effectively accounts for the mixed alphanumeric data type.) a simplified sketch of this component-based sorting appears below.
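the following python snippet is our own highly simplified illustration of the idea, not shelfreader's php/mysql implementation; it recognizes only a subset of the components in table 2 (class/subclass, caption integer and decimal, up to two cutters, and a trailing specification), and the regular expression and padding widths are assumptions made for the example.

```python
import re

# a simplified component pattern: subclass, caption integer, optional caption
# decimal, up to two cutters, and whatever remains as the specification
PATTERN = re.compile(
    r"^(?P<sbc>[a-z]{1,3})\s*"
    r"(?P<ci>\d+)"
    r"(?:\.(?P<cdl>\d+))?"
    r"(?:\.(?P<c1a>[a-z]+)(?P<c1n>\d+))?"
    r"(?:\.?(?P<c2a>[a-z]+)(?P<c2n>\d+))?"
    r"\s*(?P<sp>.*)$"
)

def sort_key(call_number: str):
    """Build a tuple of components; tuple comparison mimics column-by-column sorting."""
    m = PATTERN.match(call_number.lower())
    if not m:
        return (call_number,)  # fall back to the raw string for unparsed input
    g = m.groupdict(default="")
    return (
        g["sbc"],
        int(g["ci"] or 0),                              # caption integer as a whole number
        float("0." + g["cdl"]) if g["cdl"] else 0.0,    # caption decimal
        g["c1a"],
        float("0." + g["c1n"]) if g["c1n"] else 0.0,    # cutter numbers are decimals
        g["c2a"],                                       # empty string sorts before letters
        float("0." + g["c2n"]) if g["c2n"] else 0.0,
        g["sp"].zfill(7),                               # crude whole-number padding of the spec
    )

shelf = ["b945.d4b65 1998", "b945.d41 1981b", "pz7.a1", "pz30.a1"]
for cn in sorted(shelf, key=sort_key):
    print(cn)
# b945.d4b65 1998, b945.d41 1981b, pz7.a1, pz30.a1 -- the orderings that defeat
# single-column sorting come out correctly because only like parts are compared
```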
some additional points of clarification regarding the algorithm's multi-column approach to sorting are worth mentioning:

1. any lowercase alphabetic characters are converted to uppercase before processing in order to ensure that letter case does not affect sorting.
2. components are arranged in the database table from left to right in the order they occur in the call number.
3. if a call number does not contain a given component, the column is left empty (in the case of a varchar column) or is assigned a zero value (in the case of numeric columns).
4. empty columns and zero columns sort before columns containing data.
5. in columns designated as varchar columns, numbers are compared as whole numbers. this means that, in order to sort correctly, the length of any number stored must be normalized to a uniform length (6 places) by adding leading zeros. for example, 17 must be normalized to "000017."
6. sorting proceeds column by column, provided the call numbers are identical. when the first difference is encountered, sorting is complete.

table 3 shows two randomly selected call numbers of rather common configuration, along with the corresponding sort keys created by shelfreader:

column order: sbc | ci | cdl | cdt | co | coi | c1a | c1n | cd | c2a | c2n | cd2 | c3a | c3n | sp
e169.1.b634 2002 → e | 0169 | 0.10000 | 0 | b | 0.6340000000 | 0002002
e169.1.b653 1987 → e | 0169 | 0.10000 | 0 | b | 0.6530000000 | 0001987
table 3. example shelfreader sort-key processing of two similar call numbers (empty columns omitted).

in this first example, sorting is complete when 3 is compared to 5 in the first numerical cutter (c1n) column. (note that we have here truncated the length of certain strings for space and readability.) to illustrate how the application handles call numbers having heterogeneous formats, table 4 shows the sort keys created from two call numbers in an example mentioned above, one with a "double cutter" and one without:

column order: sbc | ci | cdl | cdt | co | coi | c1a | c1n | cd | c2a | c2n | cd2 | c3a | c3n | sp
b945.d4b65 1998 → b | 0945 | 0.0 | 0 | d | 0.400000 | b | 0.650000 | 0001998
b945.d41 1981b → b | 0945 | 0.0 | 0 | d | 0.410000 | 0.000000 | 0001981b
table 4. shelfreader sort-key processing of a "double cutter" call number and a nearby, single-cutter call number (empty columns omitted).

by pushing the second cutter (b65) in the first call number into the c2a and c2n columns, the issue of comparing incompatible sections of the call number is avoided, as the 1 in the second call number is compared to the placeholder 0 in the first. when the sorting routine reaches this position, it terminates, and any subsequent characters are ignored. aspects of this multi-column approach may seem counterintuitive at first, but the method mimics what we do when we order call numbers mentally. one compares two call numbers character by character within these component categories until encountering a difference, or until a character or entire category in one of the call numbers is found to be absent.

results

shelfreader's sorting method is powerful and accurate, and it has been extensively tested without issue in a number of different academic libraries within penn state's statewide system.
the application accurately sorts all valid lc call numbers (with the exception of those for certain cartographic materials in the g1000–g9999 range, which sometimes employ a different syntax and sorting order) as well as those of the national library of medicine classification system (which augments lcc with class w and subclasses qs–qz) and the national library of canada classification (which adds to lcc the subclass fc, for canadian history). while there may conceivably be valid lc or lc-extended call numbers having exotic formats that would fail to correctly sort in shelfreader, we are not aware of any examples (outside of, once again, the g1000–g9999 range), nor have we received reports of any from users.

in addition to verifying proper shelf-ordering, shelfreader contains a number of other features useful for stacks maintenance. the program can identify shelved items that are still checked out to patrons, have been marked missing or lost, or are flagged as in transit between locations, and often reveals items which have been inadvertently "shadowed" (i.e., excluded from public-facing library catalogs) or have shelf labels which do not match their catalogued call numbers. the gui has different modes to accommodate the user's preferred view (both single-shelf and multi-shelf, stacks views), and allows for a good deal of flexibility in how and when the user wishes to make and record shelf corrections. a reports module is also included, which tracks shelving statistics and other useful information for later reference.

the shelfreader application code (including the full sorting algorithm) is freely available via an mit license at https://github.com/scodepress/shelfreader. while shelfreader was developed and tested using the collections and systems of penn state university libraries, its architecture could be adapted and configured for use with other library apis and adjusted to suit local practices within the general confines of the lc call number structure.13 we can also envision a wide array of potential applications of the sorting functionality within other software environments, and we welcome and encourage users to pursue innovative adaptations of the method.

references and notes:

1 leo e. lamontagne, american library classification: with special reference to the library of congress (hamden, ct: the shoe string press, 1961). the lengthy development of the lcc is described in detail in chapters xiii and xiv (pp. 221-51).
2 indeed, as lamontagne asserts, "the classification was constructed [ . . . ] to provide for the needs of the library of congress, with no thought to its possible adoption by other libraries. in fact, the library has never recommended that other libraries adopt its system . . . " (ibid., p. 252). nevertheless, lcc is employed by the overwhelming majority of academic libraries in the united states (brady lund and daniel agbaji, "use of dewey decimal classification by academic libraries in the united states," cataloging & classification quarterly 56, no. 7 (december 2018): 653-61, https://doi.org/10.1080/01639374.2018.1517851).
3 "library of congress classification," library of congress, https://www.loc.gov/catdir/cpso/lcc.html. italics in original.
4 for a summary of lc sorting rules, see "how to arrange books in call number order using the library of congress system," rutgers university libraries, https://www.libraries.rutgers.edu/rul/staff/access_serv/student_coord/libconsys.pdf.
note that this summary is not comprehensive and does not cover all contingencies.
5 here we emphasize that our definition of the cutter string may differ from that of others, including (at times) that of the library of congress. for instance, the schedules for certain lcc subclasses regard the first portion of a cutter as part of the classification itself. since this paper concerns sorting rather than classification, we favor the simpler and more convenient definition.
6 j.f. conley and l.a. nolan, "call number sorting in excel," https://scholarsphere.psu.edu/downloads/9cn69m421z.
7 conley and nolan, "call number sorting in excel."
8 tim dannay, "sorting lc call numbers in excel," https://medium.com/@tdannay/sorting-lc-call-numbers-in-excel-75de044bbb04.
9 while there is in fact a "hack" or partial patch built into the program which identifies call numbers beginning with the subclass kbg and parses them separately, there is no general support for other call numbers in this category.
10 for the details of this syntax, see "050 library of congress call number (r)," library of congress, https://www.loc.gov/marc/bibliographic/bd050.html.
11 testing was conducted on sirsidynix symphony workflows staff client version 3.5.2.1079, build date june 5, 2017.
12 for an overview, see "student library assistant training guide: shelving basics," florida state college at jacksonville, https://guides.fscj.edu/training/shelving.
13 shelfreader was written to receive real-time data directly from a sirsidynix api connected to penn state university libraries' lms, a great improvement over drawing from a static collections database. this does, however, present a challenge for making the program easily adaptable to libraries using distinct web services. a strategy to adapt the program would need to account for potential differences in barcode structure, structure and naming conventions in the rest request, and structure and naming conventions within the server response from institution to institution. it is possible that these issues could be resolved via a configuration file made available to the user, but no attempt to address this issue has been undertaken as of yet.

letter from the editor
kenneth j. varnum
information technology and libraries | september 2018
https://doi.org/10.6017/ital.v37i3.10747

this september 2018 issue of ital continues our celebration of the journal's 50th anniversary with a column by former editorial board member mark dehmlow, who highlights the technological changes beginning to stir the library world in the 1980s. the seeds of change planted in the 1970s are germinating, but the explosive growth of the 1990s is still a few years away.
in addition to peer-reviewed articles on recommender systems, big data processing and storage, finding vendor accessibility documentation, using gis to find specific books on a shelf, and a recommender system for archival manuscripts, we are also publishing the student paper by this year's ex libris/lita student writing award winner, "the open access citation advantage: does it exist and what does it mean for libraries?", by colby lewis at the university of michigan school of information. this insightful paper impressed the competition's judges (as ital's editor, i was one of them) and i am very pleased to include ms. lewis' work here. this issue also marks my fourth as editor. with one year under my belt i am finding a rhythm for the publication process and starting to see the increased flow of articles from outside traditional academic library spaces that i wrote about in december 2017. as always, if you have an idea for a potential ital article, please do get in touch. we on the editorial board look forward to working with you.

sincerely,
kenneth j. varnum, editor
varnum@umich.edu
september 2018

article
bridging the gap: using linked data to improve discoverability and diversity in digital collections
jason boczar, bonita pollock, xiying mi, and amanda yeslibas
information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.13063

jason boczar (jboczar@usf.edu) is digital scholarship and publishing librarian, university of south florida. bonita pollock (pollockb1@usf.edu) is associate director of collections and discovery, university of south florida. xiying mi (xmi@usf.edu) is digital initiative metadata librarian, university of south florida. amanda yeslibas (ayesilbas@usf.edu) is e-resource librarian, university of south florida. © 2021.

abstract

the year of covid-19, 2020, brought unique experiences to everyone in their daily as well as their professional lives. facing many challenges of division in all aspects (social distancing, political and social divisions, remote work environments), university of south florida libraries took the lead in exploring how to overcome these various separations by providing access to its high-quality information sources to its local community and beyond. this paper shares the insights of using linked data technology to provide easy access to digital cultural heritage collections not only for scholarly communities but also for underrepresented user groups. the authors present the challenges at this special time in history, discuss the possible solutions, and propose future work to further the effort.

introduction

we are living in a time of division. many of us are adjusting to a new reality of working separated from our colleagues and the institutions that formerly brought us together physically and socially due to covid-19. even if we can work in the same physical locale, we are careful and distant with each other. our expressions are covered by masks, and we take pains with hygiene that might formerly have felt offensive. but the largest divisions and challenges being faced in the united states go beyond our physical separation.
the nation has been rocked and confronted by racial inequality in the form of black lives matter, a divisive presidential campaign, income inequality exacerbated by covid-19, the continued reckoning with the #metoo movement, and the wildfires burning the west coast. it feels like we are burning both literally and metaphorically as a country. adding fuel to this fire is the consumption of unreliable information. ironically, even as our divisions become more extreme, we are increasingly more connected and tuned into news via the internet. sadly, fact checking and sources are few and far between on social media platforms, where many are getting their information. the pew foundation report the future of truth and misinformation online warns that we are on the verge of a very serious threat to the democratic process due to the prevalence of false information. lee raine, director of the pew research center’s internet and technology project, warns, “a key tactic of the new anti-truthers is not so much to get people to believe in false information. it’s to create enough doubt that people will give up trying to find the truth, and distrust the institutions trying to give them the truth.”1 libraries and other cultural institutions have moved very quickly to address and educate their populations and the community at large, trying to give a voice to the oppressed and provide information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 2 reliable sources of information. the university of south florida (usf) libraries reacted by expanding antiracism holdings. usf’s purchases were informed by work at other institutions, such as the university of minnesota’s antiracism reading lists, which has in turn grown into a rich resource that includes other valuable resources like the mapping prejudice project and a link to the umbra search.2 the triad black lives matter protest collection at the university of north carolina greensboro is another example of a cultural institution reacting swiftly to document, preserve, and educate.3 these new pages and lists being generated by libraries and cultural institutions seem to be curated by hand using tools that require human intervention to make them and keep them up to date. this is also a challenge the usf libraries faced when constructing its new african american experience in florida portal, a resource that leverages already existing digital collections at usf to promote social justice. another key challenge is linking new digital collections and tools to already established collections and holdings. beyond the new content being created in reaction to current movements, there is already a wealth of information established in rich archives of material, especially regarding african american history. digital collections need to be discoverable by a wide audience to achieve resource sharing and educational purposes. this is a challenge many digital collections struggle with, because they are often being siloed from library and archival holdings even within their own institutions. all the good information in the world is not useful if it is not findable. an example of a powerful discovery tool that is difficult to find and use is the umbra search (https://www.umbrasearch.org/) linked to the university of minnesota’s anti-racism reading list. 
umbra search is a tool that aggregates content from more than 1,000 libraries, archives, and museums.4 it is also supported by high-profile grants from the institute of museum and library services, the doris duke charitable foundation, and the council on library and information resources. however, the website is difficult to find in a web search. umbra search was named after society of umbra, a collective of black poets from the 1960s. the terms umbra and society of umbra do not return useful results for finding the portal, nor do broader searches of african american history; the portal is difficult to find through basic web searches. one of the few chances for a user to find the site is if they came upon the human-made link in the university of minnesota anti-racism reading list. despite enthusiasm from libraries and other cultural institutions, new purchases and curated content are not going to reach the world as fully as hoped. until libraries adopt open data formats instead of locking away content in closed records like marc, library and digital content will remain siloed from the internet. the library catalog and digital platforms are even siloed from each other. we make records and enter metadata that is fit for library use but not shareable to the web. as karen coyle asked in her lita keynote address a decade ago, the question is how can libraries move from being "on the web" to being "of the web"?5 the suggested answer and the answer the usf libraries are researching is with linked data.

literature review

the literature on linked data for libraries and cultural heritage resources reflects an implementation that is "gradual and uneven." as national libraries across the world and the library of congress develop standards and practices, academic libraries are still trying to understand their role in implementation and identify their expertise.6 in 2006 tim berners-lee, the creator of the semantic web concept, outlined four rules of linked data:

1. use uris as names for things.
2. use http uris so that people can look up those names.
3. when someone looks up a uri, provide useful information, using the standards (rdf, sparql).
4. include links to other uris so that they can discover more things.7

it was not too long after this that large national libraries began exploring linked data and experimenting with uses. in 2010 the british library presented its prototype of linked data. this move was made in accordance with the uk government's commitment to transparency and accountability along with the user's expectation that the library would keep up with cutting-edge trends.8 today the british library has released the british national bibliography as linked data instead of the entire catalog because it is authoritative and better maintained than the catalog.9 the national libraries of europe, spurred on by government edicts and europeana (https://www.europeana.eu/en), are leading the progress in implementation of linked data. national libraries are uniquely suited to the development and promotion of new technologies because of their place under the government and proximity to policy making, bridging communication between interested parties and the ability to make projects into sustainable services.10 a 2018 survey of all european national libraries found that 15 had implemented linked data, two had taken steps for implementation, and three intended to implement it.
even national libraries that were unable to implement linked data were contributing to the linked data open cloud by providing their data in datasets to the world.11 part of the difficulty with earlier implementation of linked data by libraries and cultural heritage institutions was the lack of a “killer example” that libraries could emulate.12 the relatively recent success of european national libraries might provide those examples. many other factors have slowed the implementation of linked data. a survey of norwegian libraries in 2009 found considerable gap in the semantic web literature between the research undertaken in the technological field and the research in the socio-technical field. implementing linked data requires reorganization of the staff, commitment of resources, education throughout the library and buy-in from the leadership to make it strategically important.13 the survey of european national libraries cited the exact same factors as limitations in 2018.14 outside of european national libraries the implementation of linked data has been much slower. many academic institutions have taken on projects that tend to languish in a prototype or proof of concept phase.15 the library-centric talis group of the united kingdom “embraced a vision of developing an infrastructure based on semantic web technologies” in 2006, but abandoned semantic web-related business activities in 2012.16 it has been suggested that it is premature to wholly commit to linked data, but it should be used for spin-off projects in an organization for experimentation and skill development.17 linked data is also still proving to be technologically challenging for implementation of cultural heritage aggregators. if many human resources are needed to facilitate linked data, it will remain an obstacle for cultural heritage aggregators. a study has shown automated interpretation of information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 4 ontologies is hampered by a lack of inter-ontology relations. cross-domain applications will not be able to use these ontologies without human intervention.18 aiding in the development and awareness of linked data practices for libraries is the creation and implementation of bibframe by the library of congress. the library of congress’s announcement in july 2018 that bibframe would be the replacement of marc definitively shows that the future of library records is focused on linking out and integrating into the web.19 the new rda (resource description and access) cataloging standards made it clear that marc is no longer the best encoding language for making library resources available on the web.20 while rda has adopted the cataloging rules to meet a variety of new library environments, the marc encoding language makes it difficult for computers to interpret and apply logic algorithms to the marc format. in response, the library of congress commissioned the consulting agency zepheria to create a framework that would integrate with the web and be flexible enough to work with various open formats and technologies, as well as be able to adapt to change. using the principles and technologies of the open web, the bibframe vocabulary is made of “resource description framework (rdf) properties, classes, and relationships between and among them.”21 eric miller, the ceo of zepheria, says bibframe “works as a bridge between the description component and open web discovery. 
it is agnostic with regards to which web discovery tool is employed" and though we cannot predict every technology and application bibframe can "rely on the ubiquity and understanding of uris and the simple descriptive power of rdf."22 the implementation of linked data in the cultural heritage sphere has been erratic but seems to be moving forward. it is important to pursue, though, because "bringing local data out of the 'deep web' and making them open and universally accessible, means offering minority cultures a democratic opportunity for visibility."23

linked data

linked data is one way to increase the access and discoverability of critical digital cultural heritage collections. also referred to as semantic web technologies, linked data follows the w3c resource description framework (rdf) standards.24 according to tim berners-lee, the semantic web will bring structure and well-defined meaning to web content, allowing computers to perform more automated processes.25 by providing structure and meaning to digital content, information can be more readily and easily shared between institutions. this provides an opportunity for digital cultural heritage collections of underrepresented populations to get more exposure on the web. following is a brief overview of linked data to illustrate how semantic web technologies function.

linked data is created by forming semantic triples. each rdf triple contains uniform resource identifiers, or uris. these identifiers allow computers (machines) to "understand" and interpret the metadata. each rdf triple consists of three parts: a subject, a predicate, and an object. the subject defines what the metadata rdf triple is about, while the object contains information about the subject which is further defined by the relationship link in the predicate.

figure 1. example of a linked data rdf triple describing william shakespeare's authorship of hamlet.

for example, in figure 1, "william shakespeare wrote hamlet" is a triple. the subject and predicate of the triple are written as a uri containing the identifier information, and the object of the triple is a literal piece of information. the subject of the triple, william shakespeare, has an identifier which in this example links to the library of congress name authority file for william shakespeare. the predicate of the rdf triple describes the relationship between the subject and object. the predicate also typically defines the metadata schema being used. in this example, dublin core is the metadata schema being used, so "wrote" would be identified by the dublin core creator field. the object of this semantic triple, hamlet, is a literal. literals are text that are not linked because they do not have a uri. subjects and predicates always have uris to allow the computer to make links. the object may have a uri or be a literal. together these uris, along with the literal, tell the computer everything it needs to know about this piece of metadata, making it self-contained.

rdf triples with their uris are stored in a triple-store, graph-style database which functions differently from a typical relational database. relational databases rely on table headers to define the metadata stored inside. moving data between relational databases can be complex because tables must be redefined every time data is moved. graph databases don't need tables since all the defining information is already stored in each triple.
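to make the figure 1 example concrete, the short python sketch below constructs a comparable triple with the rdflib library. this is our illustration rather than anything drawn from the usf workflow: the library of congress name authority identifier is shown only as an assumed example, and the "wrote" predicate is minted in a hypothetical example.org namespace because canonical dublin core usage would invert the direction (the work as subject, dc:creator pointing to the person).

```python
# a minimal sketch of the figure 1 triple using rdflib; the loc identifier and
# the example.org predicate are illustrative assumptions, not verified values.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/relation/")  # hypothetical namespace for "wrote"
shakespeare = URIRef("http://id.loc.gov/authorities/names/n78095332")  # assumed lcnaf uri

g = Graph()
# subject: the person (a uri); predicate: "wrote"; object: the literal "Hamlet"
g.add((shakespeare, EX.wrote, Literal("Hamlet")))

# serializing shows the triple in turtle, ready to be loaded into a triple store
print(g.serialize(format="turtle"))
```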
the absence of fixed table structures allows for bidirectional flow of information between pieces of metadata and makes transferring data simpler and more efficient.26 information in a triple-store database is then retrieved using sparql, a query language developed for linked data. because linked data is stored as self-contained triples, machines have all the information needed to process the data and perform advanced reasoning and logic programming. this leads to better search functionality and lends itself well to artificial intelligence (ai) technologies. many of today's modern websites make use of these technologies to enhance their displays and provide greater functionality for their users. the internet is an excellent avenue for libraries to un-silo their collections and make them globally accessible. once library collections are on the web, advanced keyword search functionalities and artificial intelligence machine-learning algorithms can be developed to automate metadata creation workflows and enhance search and retrieval of library resources. the use of linked data metadata in these machine-learning functions will add a layer of semantic understanding to the data being processed and analyzed for patron discovery. ai technology can also be used to create advanced graphical displays making connections for patrons between various resources on a research topic.

sharing digital cultural heritage data with other institutions often involves transferring data and is considered one of the greatest difficulties in sharing digital collections. for example, if one institutional repository uses dublin core to store its metadata for a certain cultural heritage collection and another repository uses mods/mets to store digital collections, there must first be a data conversion before the two repositories could share information. dublin core and mods/mets are two completely different schemas with different fields and metadata standards. these two schemas are incompatible with each other and must be crosswalked into a common schema. this typically results in some data loss during the transformation process. this makes combining two collections from different institutions into one shared web portal difficult.

linked data allows institutions to share collections more easily. because linked data triples are self-contained, there is no need to crosswalk metadata stored in triples from one schema into another when transferring data. the uris contained in the rdf triples allow the computer to identify the metadata schema and process the metadata. rdf triples can be harvested from one linked data system and easily placed into another repository or web portal. a variety of schemas can all be stored together in one graph database. storing metadata in this way increases the interoperability of digital cultural heritage collections. collections stored in triple-store databases have sparql endpoints that make harvesting the metadata in a collection more efficient. libraries can easily share metadata on important collections, increasing the exposure and providing greater access for a wider audience. philip schreur, author of "bridging the worlds of marc and linked data," sums this concept up nicely: "the shift to the web has become an inextricable part of our day-to-day lives. by moving our carefully curated metadata to the web, libraries can offer a much-needed source of stable and reliable data to the rapidly growing world of web discovery."27
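as a companion to the sketch above, the following snippet shows what retrieval with sparql can look like against a small in-memory rdflib graph; it is a toy stand-in for the sparql endpoints just described, reusing the same illustrative uris rather than any real endpoint or collection.

```python
# a toy sparql retrieval sketch over an in-memory rdflib graph (illustrative uris only)
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/relation/")
shakespeare = URIRef("http://id.loc.gov/authorities/names/n78095332")

g = Graph()
g.add((shakespeare, EX.wrote, Literal("Hamlet")))
g.add((shakespeare, EX.wrote, Literal("King Lear")))

# ask for every work linked to this subject by the "wrote" predicate
results = g.query(
    """
    PREFIX ex: <http://example.org/relation/>
    SELECT ?work
    WHERE { <http://id.loc.gov/authorities/names/n78095332> ex:wrote ?work . }
    ORDER BY ?work
    """
)
for row in results:
    print(row.work)  # Hamlet, then King Lear
```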
linked data also makes it easier to harvest metadata and import collections into larger cultural heritage repositories like the digital public library of america (dpla), which uses linked data to "empower people to learn, grow, and contribute to a diverse and better-functioning society by maximizing access to our shared history, culture, and knowledge."28 europeana, the european cultural heritage database, uses semantic web technologies to support its mission, which is to "empower the cultural heritage sector in its digital transformation."29 using linked data to transfer data into these national repositories is more efficient, and there is less loss of data because the triples do not have to be transformed into another schema. this increases the access of many cultural heritage collections that might not otherwise be seen.

one of the big advantages to linked data is the ability to create connections between other cultural heritage collections worldwide via the web. incorporating triples harvested from other collections into the local datasets enables libraries to display a vast amount of information about cultural heritage collections in their web portals. libraries thus can provide a much richer display and allow users access to a greater variety of resources. linked data also allows web developers to use uris to implement advanced search technologies, creating a multifaceted search environment for patrons. current research points to the fact that using semantic web technologies makes the creation of advanced logic and reasoning functionalities possible. according to liyang yu in the book introduction to the semantic web and semantic web services, "the semantic web is an
through the use of linked data, we can connect to other trusted sources on the web.… we can also take advantage of a truly international web environment and reuse metadata created by other national libraries.”31 university of south florida libraries practice university of south florida libraries digital collections house a rich collection varying from cultural heritage objects to natural science and environment history materials to collections related to underrepresented populations. most of the collections are unique to usf and have significant research and educational value. the library is eager to share the collections as widely as possible and hopes the collections can be used at both document and data level. linked data creates a “web of data” instead of a “web of documents,” which is the key to bringing structure and meaning to web content, allowing computers to better understand the data. however, collections are mostly born at the document level. therefore, the first problem librarians need to solve is how to transform the documents to data. for example, there is a beautiful natural history collection called audubon florida tavernier research papers in usf libraries digital collections. the audubon florida tavernier research papers is an image collection which includes rookeries, birds, people, bodies of water, and man-made structures. the varied images come from decades of research and are a testament to the interconnectedness of bird population health and human interaction with the environment. the images reflect the focus of audubon’s work in research and conservation efforts both to wildlife and to the habitat that supports the wildlife.32 this was selected to be the first collection the authors experimented with to implement linked data at usf libraries. the lessons learned from working with this collection are applied to later work. when the collection was received to be housed in the digital platform, it was carefully analyzed to determine how to pull the data out of all the documents as much as possible. the authors designed a metadata schema of the combination of mods and darwin core (darwin core, abbreviated to dwc, is an extension of dublin core for biodiversity informatics) to pull out and properly store the data. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 8 figure 2. american kestrel. figure 3. american kestrel metadata. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 9 figure 2 is one of the documents in the collection, which is a photo of an american kestrel. figure 3 shows the data collected from the document and the placement of the data in the metadata schema. the authors put the description of the image in free text in the abstract field. this field is indexed and searchable through the digital collections platform. location information is put in the hierarchical spatial field. the subject heading fields describe the “aboutness” of the image, that is, what is in the image. all the detailed information about the bird is placed in darwin core fields. thus, the document is dissembled into a few pieces of data which are properly placed into metadata fields where they can be indexed and searched. having data alone is not sufficient to meet linked data requirements. 
the first of the four rules of linked data is to name things using uris.33 to add uris to the data, the authors needed to standardize the data and reconcile it against widely used authorities such as the library of congress subject headings, wikidata, and the getty thesaurus of geographic names. standardized data tremendously increases the percentage of successful data reconciliation, which will lead to more links with related data once published.

figure 4. amenaprkitch khachkar.

figure 4 shows an example from the armenia heritage and social memory program. this is a visual materials collection with photos and 3d digital models. it was created by the digital heritage and humanities collection team at the library. the collection brings together comprehensive information and interactive 3d visualization of the fundamentals of armenian identity, such as their architectures, languages, arts, etc.34 when preparing the metadata for the items in this collection, the authors devoted extra effort to adding geographic location metadata. this effort serves two purposes: one is to respectfully and honestly include the information in the collection; and the second is to provide future reference to the location of each item, as the physical items are in danger and could disappear or be ruined. the authors employed the getty thesaurus of geographic names because it supports a hierarchical location structure. the location names at each level can be reconciled and have their own uris. the authors also paid extra attention to the subject headings. figure 5 shows how the authors used library of congress subject headings, local subject headings assigned by the researchers, and the getty art and architecture thesaurus for this collection. in the data reconciliation stage, the metadata can be compared against both library of congress subject headings authority files and the getty aat vocabularies so that as many uris as possible can be fetched and added to the metadata. the focus on geographic names and subject headings is to standardize the data and use controlled vocabularies as much as possible. once moving to the linked data world, the data will be ready to be added with uris. therefore, the data can be linked easily and widely.

figure 5. amenaprkitch khachkar metadata.
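as a rough picture of this reconciliation step, the python sketch below stands in for a reconciliation service such as those the authors reach through openrefine; the tiny in-memory authority table and the identifiers in it are placeholders only, and a uri is returned solely on an exact, case-insensitive label match, mirroring the exact-match policy described here.

```python
from typing import Optional

# a stand-in for a reconciliation service: the identifiers below are
# placeholders, not real lcsh or getty aat records
AUTHORITY_LABELS = {
    "khachkars": "http://vocab.getty.edu/aat/300000000",                          # placeholder id
    "church architecture": "http://id.loc.gov/authorities/subjects/sh00000000",   # placeholder id
}

def reconcile(label: str) -> Optional[str]:
    """Return a matching authority uri, or None when no exact match exists."""
    return AUTHORITY_LABELS.get(label.strip().lower())

for term in ["Khachkars", "Church architecture", "medieval monuments"]:
    uri = reconcile(term)
    print(term, ":", uri if uri else "no exact match; term kept as a plain literal")
```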
with semantic web technology support, these collections can be turned into machine actionable datasets to assist research and education activities on racism, anti-racism and to piece into the holistic knowledge base. usf libraries started to partner with dpla in 2018. dpla leverages linked data technology to increase discoverability of the collections contributed to it. dpla employs javascript object notation for linked data (json-ld) as its serialization for their data which is in rdf/xml format. json-ld has a method of identifying data with iris. the use of this method can effectively avoid data ambiguity considering dpla is holding a fairly large amount of data. json-ld also provides computational analysis in support of semantics services which enriches the metadata and in results, the search will be more effective.35 in the 18 months since usf began contributing selected digital collections to dpla, usf materials have received more than 12,000 views. it is exciting to see the increase in the usage of the collections and it is the hope that they will be accessed by more diverse user groups. usf libraries are exploring ways to scale up the project and eventually transition all the existing digital collections metadata to linked data. one possible way of achieving this goal would be through metadata standardization. a pilot project at usf libraries is to process one medium-size image collection of 998 items. the original metadata is in mods/mets xml files. we first decided to use the dpla metadata application profile as the data model. if the pilot project is successful, we will apply this model to all of our linked data transformation processes. in our pilot, we are examining the fields in our mods/mets metadata and identify those that will be meaningful in the new metadata schema. then we transport the metadata in those fields to excel files. the next step is to use openrefine to reconcile the data in these excel files to fetch uris for exact match terms. during this step, we are employing reconciliation services from the library of congress, getty tgn, and wikidata. after all the metadata is reconciled, we are transforming the excel file to triples. the column headers of the excel file become the predicates and the metadata as well as their uris will be the objects of the triples. next, these triples will be stored in an apache jena triple-store database so that we can start designing sparql queries to facilitate search. the final step will be designing a user-friendly interface to further optimize the user experiences. in this process, to make the workflow as scalable as possible, we are focusing on testing two processes: first, creating a universal metadata application profile to apply to the most, if not all, of the collections; and second, only fetching uris for exactly matching terms during the reconciliation information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 12 process. both of these processes aim to reduce human interactions with the metadata so that the process is more affordable to the library. conclusion and future work linked data can help collection discoverability. in the past six months, usf has seen an increase in materials going online. usf special collections department rapidly created digital exhibits to showcase their materials. 
if the trend in remote work continues, there is reason to believe that digital materials may be increasingly present and, given enough time and expertise, libraries can leverage linked data to better support current and new collections. the societal impact of covid-19 worldwide sheds light on the importance of technologies such as linked data that can help increase discoverability. when items are being created and shared online, either directly related to covid-19 or a result of its impact, linked data can help connect those resources. for instance, new covid-19 research is being developed and published daily. the publications office of the european union datathon entry “covid-19 data as linked data” states that “[t]he benefit of having covid-19 data as linked data comes from the ability to link and explore independent sources. for example, covid-19 sources often do not include other regional or mobility data. then, even the simplest thing, having the countries not as a label but as their uri of wikidata and dbpedia, brings rich possibilities for analysis by exploring and correlating geographic, demographic, relief, and mobility data.”36 the more institutions that contribute to this, the greater the discoverability and impact of the data. in 2020 there has been an increase in black lives matter awareness across the country. this affects higher education. usf libraries are not the only ones engaged in addressing racial disparities. many institutions have been doing this for years. others are beginning to focus on this area. no matter whether it’s a new digital collection or one that’s been around for decades, the question remains: how do people find these resources? perhaps linked data technologies can help solve that problem. linked data is a technology that can help accentuate the human effort put forth to create those collections. linked data is a way to assist humans and computers in finding interconnected materials around the internet. usf libraries faced many obstacles implementing linked data. there is a technological barrier that takes well-trained staff to surmount, i.e., creating a linked data triple store database and having linked data interact correctly on webpages. there is a time commitment necessary to create the triples and sparql queries. sparql queries themselves vary from being relatively simple to incredibly complicated. the authors also had the stumbling block of understanding how linked data worked together on a theoretical level. taking all of these considerations into account, we can say that creating linked data for a digital collection is not for the faint of heart. a cost/benefit analysis must be taken and assessed. the authors of this paper must continue to determine the need for linked data. at usf, the authors have taken the first steps in converting digital collections into linked data. we’ve moved from understanding the theoretical basis of linked data and into the practical side where the elements that make up linked data start coming together. the work to create triples, sparql queries, and uris has begun, and full implementation has started. our linked data group has learned the fundamentals of linked data. the next, and current, step is to develop workflows for existing metadata conversion into appropriate linked data. the group meets regularly and has created a triple store database and converted data into linked data. 
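a minimal sketch of what that conversion can look like, assuming an rdflib graph in which spreadsheet column headers have become dublin core predicates and reconciled uris have become objects; the item uris, the placeholder authority identifier, and the query are hypothetical, not usf's actual data model. the same serialized triples could then be loaded into an apache jena triple store and queried with sparql.

```python
# Minimal sketch: column headers become predicates, reconciled URIs
# become objects, and a SPARQL query lists items sharing one subject.
# Item URIs and the authority identifier below are placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS

ITEM = Namespace("https://digital.lib.usf.edu/example-item/")          # hypothetical
SUBJECT_URI = URIRef("http://id.loc.gov/authorities/subjects/sh0000000")  # placeholder

rows = [
    {"id": "1", "title": "american kestrel", "subject_uri": str(SUBJECT_URI)},
    {"id": "2", "title": "roseate spoonbill", "subject_uri": str(SUBJECT_URI)},
]

g = Graph()
for row in rows:
    item = ITEM[row["id"]]
    g.add((item, DCTERMS.title, Literal(row["title"])))
    g.add((item, DCTERMS.subject, URIRef(row["subject_uri"])))

# Query the in-memory graph; against Jena the same SPARQL would be sent
# to the store's endpoint instead.
query = """
    SELECT ?item ?title WHERE {
        ?item <http://purl.org/dc/terms/subject> ?s ;
              <http://purl.org/dc/terms/title> ?title .
    }
"""
for item, title in g.query(query, initBindings={"s": SUBJECT_URI}):
    print(item, title)
```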
while the process is slow information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 13 moving due to group members’ other commitments, progress is being made by looking at the most relevant collections we would like to transform and moving forward from there. we’ve located the collections we want to work on, taking an iterative approach to creating linked data as we go. with linked data, there is a lot to consider. how do you start up a linked data program at your institution? how will you get the required expertise to create appropriate and high-quality linked data? how will your institution crosswalk existing data into triples format? is it worth the investment? it may be difficult to answer these questions but they’re questions that must be addressed. the usf libraries will continue pursuing linked data in meaningful ways and showcasing linked data’s importance. linked data can help highlight all collections but more importantly those of marginalized groups, which is a priority of the linked data group. endnotes 1 peter perl, “what is the future of truth?” pew trust magazine, february 4, 2019, https://www.pewtrusts.org/en/trust/archive/winter-2019/what-is-the-future-of-truth. 2 “anti-racism reading lists,” university of minnesota library, accessed september 24, 2020, https://libguides.umn.edu/antiracismreadinglists. 3 “triad black lives matter protest collection,” unc greensboro digital collections, accessed december 9, 2020, http://libcdm1.uncg.edu/cdm/blm. 4 “umbra search african american history,” umbra search, accessed december 10, 2020, https://www.umbrasearch.org/. 5 karen coyle, “on the web, of the web” (keynote at lita, october 1, 2011), https://kcoyle.net/presentations/lita2011.html. 6 donna ellen frederick, “disruption or revolution? the reinvention of cataloguing (data deluge column),” library hi tech news 34, no. 7 (2017): 6–11, https://doi.org/10.1108/lhtn-072017-0051. 7 tim berners-lee, “linked data,” w3, last updated june 18, 2009, https://www.w3.org/designissues/linkeddata.html. 8 neil wilson, “linked data prototyping at the british library” (paper presentation, talis linked data and libraries event, 2010). 9 diane rasmussen pennington and laura cagnazzo, “connecting the silos: implementations and perceptions of linked data across european libraries,” journal of documentation 75, no. 3 (2019): 643–66, https://doi.org/10.1108/jd-07-2018-0117. 10 jane hagerlid, “the role of the national library as a catalyst for an open access agenda: the experience in sweden,” interlending and document supply 39, no. 2 (2011): 115–18, https://doi.org/10.1108/02641611111138923. 11 pennington and cagnazzo, “connecting the silos,” 643–66. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 14 12 gillian byrne and lisa goddard, “the strongest link: libraries and linked data,” d-lib magazine 16, no. 11/12 (2010): 2, https://doi.org/10.1045/november2010-byrne. 13 bendik bygstad, gheorghita ghinea, and geir-tore klæboe, “organisational challenges of the semantic web in digital libraries: a norwegian case study,” online information review 33, no. 5 (2009): 973–85, https://doi.org/10.1108/14684520911001945. 14 pennington and cagnazzo, “connecting the silos,” 643–66. 15 heather lea moulaison and anthony j. million, “the disruptive qualities of linked data in the library environment: analysis and recommendations,” cataloging & classification quarterly 52, no. 
4 (2014): 367–87, https://doi.org/10.1080/01639374.2014.880981. 16 marshall breeding, “linked data: the next big wave or another tech fad?” computers in libraries 33, no. 3 (2013): 20–22. 17 moulaison and million, “the disruptive qualities of linked data,” 369. 18 nuno freire and sjors de valk, “automated interpretability of linked data ontologies: an evaluation within the cultural heritage domain,” (workshop, ieee conference on big data, 2019). 19 “bibframe update forum at the ala annual conference 2018,” (washington, dc: library of congress, july 2018), https://www.loc.gov/bibframe/news/bibframe-update-an2018.html. 20 jacquie samples and ian bigelow, “marc to bibframe: converting the pcc to linked data,” cataloging & classification quarterly 58, no. 3–4 (2020): 404. 21 oliver pesch, “using bibframe and library linked data to solve real problems: an interview with eric miller of zepheira,” the serials librarian 71, no. 1 (2016): 2. 22 pesch, 2. 23 gianfranco crupi, “beyond the pillars of hercules: linked data and cultural heritage,” italian journal of library, archives & information science 4, no. 1 (2013): 25–49, http://dx.doi.org/10.4403/jlis.it-8587. 24 “resource description framework (rdf),” w3c, february 25, 2014, https://www.w3.org/rdf/. 25 tim berners-lee, james hendler, and ora lassila, “the semantic web,” scientific american 284, no. 5 (2001): 34–43, https://www.jstor.org/stable/26059207. 26 dean allemang and james hendler, “semantic web application architecture,” in semantic web for the working ontologist: effective modeling in rdfs and owl, (saint louis: elsevier science, 2011): 54–55. information technology and libraries december 2021 bridging the gap | boczar, pollock, mi, and yeslibas 15 27 philip e. schreur and amy j. carlson, “bridging the worlds of marc and linked data: transition, transformation, accountability,” serials librarian 78, no. 1–4 (2020), https://doi.org/10.1080/0361526x.2020.1716584. 28 “about us,” dpla: digital public library of america, accessed december 11, 2020. https://dp.la/about. 29 “about us,” europeana, accessed december 11, 2020, https://www.europeana.eu/en/about-us. 30 liyang yu, “search engines in both traditional and semantic web environments,” in introduction to semantic web and semantic web services (boca raton: chapman & hall/crc, 2007): 36. 31 schreur and carlson, “bridging the worlds of marc and linked data.” 32 “audubon florida tavernier research papers,” university of south florida libraries digital collections, accessed november 30, 2020, https://lib.usf.edu/?a64/. 33 berners-lee, “linked data,” https://www.w3.org/designissues/linkeddata.html. 34 “the armenian heritage and social memory program,” university of south florida libraries digital collections, accessed november 30, 2020, https://digital.lib.usf.edu/armenianheritage/. 35 erik t. mitchell, “three case studies in linked open data,” library technology reports 49, no. 5 (2013): 26-43. 36 “covid-19 data as linked data,” publications office of the european union, accessed december 11, 2020, https://op.europa.eu/en/web/eudatathon/covid-19-linked-data. letter from the editor: improving ital's peer review letter from the editor improving ital’s peer review kenneth j. varnum information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.13573 over the past several months, ital has enrolled almost 30 reviewers to the journal’s new review panel. 
increasing the pool of reviewers for the journal supports the editorial board’s desire to provide equitable treatment to submitted articles by having two independent reviews provide double-blind consideration of each article, a practice that has now been in effect for articles submitted after may 1, 2021. i am grateful to the individuals (listed on the editorial team page) who volunteered, attended an orientation session, and have begun contributing to the work of the journal. * * * * * * in this issue in the editorial section of this issue, we have a column by incoming core president margaret heller. her essay, “making room for change through rest,” highlights the need for each of us to recharge after a collectively challenging year. this inaugurates what we plan to be an occasional feature, the “core leadership column,” to which we invite contributions from members of core leadership. it is joined by two other regular items, our editorial board thoughts essay by michael p. sauers, “do space’s virtual interview lab: using simple technology to serve the public in a time of crisis,” and william yarbrough’s public libraries leading the way column, “service barometers: using lending kiosks to locate patrons.” an interesting and diverse set of peer-reviewed articles rounds out the issue: 1. the impact of covid-19 on the use of academic library resources / ruth sara connell, lisa c. wallis, and david comeaux 2. emergency remote library instruction and tech tools: a matter of equity during a pandemic / kathia ibacache, amanda rybin, and eric vance 3. off-campus access to licensed online resources through shibboleth / francis jayakanth, anand t. byrappa, and raja visvanathan 4. a framework for measuring relevancy in discovery environments / blake l. galbreath, alex merrill, and corey m. johnson 5. beyond viaf: wikidata as a complementary tool for authority control in libraries / carlo bianchini, stefano bargioni, and camillo carlo pellizzari di san girolamo 6. algorithmic literacy and the role for libraries / michael ridley and danica pawlick-potts 7. persistent urls and citations offered for digital objects by digital libraries / nicholas homenda kenneth j. varnum, editor varnum@umich.edu june 2021 article classical musicians v. copyright bots: how libraries can aid in the fight adam eric berkowitz information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.14027 adam eric berkowitz (berkowitza@hcflgov.net) is supervisory librarian, tampa-hillsborough county public library. © 2022.
abstract the covid-19 pandemic forced classical musicians to cancel in-person recitals and concerts and led to the exploration of virtual alternatives for engaging audiences. the apparent solution was to livestream and upload performances to social media websites for audiences to view, leading to income and a sustained social media presence; however, automated copyright enforcement systems add new layers of complexity because of an inability to differentiate between copyrighted content and original renditions of works from the public domain. this article summarizes the conflict automated copyright enforcement systems pose to classical musicians and suggests how libraries may employ mitigation tactics to reduce the negative impacts when uploaders are accused of copyright infringement. introduction the covid-19 pandemic, unlike anything the country has seen in a century, forced industries to reevaluate the manner in which they provide services to the public. businesses and citizens everywhere made hairpin turns as they quickly searched for virtual alternatives to everyday inperson activities. with many remaining home for extended periods of time, demand for digital content and entertainment skyrocketed. in may 2020, comcast reported a 40% increase in online video streaming since march 1, just weeks before governments instated stay-at-home mandates.1 throughout the year, subscription-based streaming services saw enormous surges in customer usage and, likewise, social media platforms saw a significant spike in content production and consumption.2 daily blogging on facebook replaced in-person interactions, and youtubers generated higher volumes of videos to meet viewer demand. classical musicians were also heavily reliant on social media platforms in order to showcase performances as pointed out in the washington post article “copyright bots and classical musicians are fighting online. the bots are winning.” highlighted by american library association’s american libraries, the article illustrated the toll social media content moderation algorithms took on classical musicians sharing their performances online.3 this article became the starting point for the 2021 study “are youtube and facebook canceling classical musicians?,” which investigated the relationship between classical musicians and automated copyright enforcement systems.4 the following is a summary of this study’s findings and brings attention to the role libraries can play in aiding classical musicians facing copyright infringement claims. automated copyright enforcement evidence shows that automated copyright enforcement systems wrongfully remove useruploaded materials in the name of copyright protections on a regular basis.5 in fact, it happens so often that the australian broadcasting corporation began wittingly dubbing such instances “copywrongs.”6 these algorithms are not designed to distinguish between recordings of music mailto:berkowitza@hcflgov.net information technology and libraries june 2022 classical musicians v. copyright bots | berkowitz 2 owned by record labels and those shared online by freelance musicians. they are instructed to recognize copyrighted recordings and content resembling those recordings as identical matches, ensuring the protection of intellectual property from unauthorized reproduction. as such, automated content moderation systems are incapable of making allowances for the performance of works from the public domain. such performances comprise nearly all of a classical musician’s repertoire. 
automated copyright enforcement systems are typically based on a combination of matching and classification methods. the most effective matching technique for content moderation is perceptual hashing, which isolates unique strings of data (hashes) taken from an uploaded file and compares distinguishing markers and patterns to a database of samples provided by copyright owners.7 this technique allows systems to detect exact matches and iterations of the original work, such as live recordings and remixes.8 among classification methods, artificial neural networks with deep learning are best suited to the task of algorithmic moderation. consisting of a network of nodes, they are meant to simulate the structure and function of neural networks in animals and humans.9 this enables them to solve multifaceted, dynamic problems, which makes them ideal for instantaneous content moderation, allowing them to identify musical similarities in real time.10 both youtube and facebook enable users to upload recordings and broadcast live feeds to their websites. matching techniques are used to review prerecorded content since the upload process allows for automated systems to sample the material for comparison to the companies’ hash databases before allowing the recording to be posted.11 in contrast, live broadcasts are transmitted instantaneously and allow for no time to review the footage before it is visible online. therefore, hashes cannot be sampled from streaming content, requiring that classification methods using training data identify infringing material on the fly.12 while these algorithms make content moderation easier, they are limited in their capacity. one study showed that youtube is surprisingly inaccurate in its attempts to recognize infringing material in live broadcasts, failing to identify 26% of copyrighted footage within the first thirty minutes of streaming and blocking 22% of non-infringing livestreams.13 research strongly suggests that the only factors considered by music copyright enforcement systems are pitch, volume, and melodic and harmonic contour.14 those values alone cannot be used to distinguish copyrighted works from the public domain. as such, these systems are not yet advanced enough to account for the total complexity of human creativity, and human intervention is required before these programs systematically accuse uploaders of copyright infringement.15 compositions in the public domain are not subject to copyright; however, recorded performances of compositions from the public domain can be copyrighted. individuals may upload or livestream their own performances of classical music without fear of infringing copyright but may not upload another musician’s copyrighted recordings of the same pieces. for example, no one owns the copyright to bach’s cello suites and, therefore, anyone can profit from performing these works. sony music, though, owns the copyright to yo-yo ma’s recordings of bach’s cello suites, and anyone uploading these specific recordings to social media would be infringing copyright and subject to the repercussions. unfortunately, automated copyright enforcement systems often misidentify an individual’s performances as copyrighted recordings. information technology and libraries june 2022 classical musicians v. 
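the matching side of these systems can be illustrated with a toy fingerprint: derive a coarse bit string from an audio signal and compare uploads against reference hashes by hamming distance. this is only a sketch of the idea behind perceptual hashing; production systems such as content id rely on far richer acoustic features, and the features and thresholds below are assumptions made for illustration.

```python
# Toy illustration of hash-based matching in the spirit of perceptual
# hashing: a coarse fingerprint from frame energies, compared by
# Hamming distance. Real systems are far more sophisticated.
import numpy as np

def fingerprint(samples: np.ndarray, frame: int = 1024) -> int:
    """Bit string: 1 where a frame is louder than the previous frame."""
    frames = samples[: len(samples) // frame * frame].reshape(-1, frame)
    energy = (frames.astype(float) ** 2).sum(axis=1)
    bits = (energy[1:] > energy[:-1]).astype(int)
    return int("".join(map(str, bits)), 2)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

rng = np.random.default_rng(0)
reference = rng.normal(size=64 * 1024)                # stand-in for a label's recording
upload = reference + rng.normal(scale=0.05, size=reference.size)  # a very similar performance
unrelated = rng.normal(size=64 * 1024)                # a different piece entirely

ref_hash = fingerprint(reference)
print("near-duplicate distance:", hamming(ref_hash, fingerprint(upload)))
print("unrelated distance:     ", hamming(ref_hash, fingerprint(unrelated)))
# A small distance triggers a match claim; an original performance of a
# public-domain work can still land close enough to be flagged.
```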
copyright bots | berkowitz 3 the impact on classical musicians classical musicians are accustomed to having their content misidentified for infringing copyright, but with the pandemic forcing many more musicians to share performances regularly on social media, the problem has become ever more pervasive. adrian spence, the artistic director for chamber ensemble camerata pacifica, found himself appealing multiple copyright claims from both facebook and youtube. on occasion, he would dispute several claims issued by different copyright owners for the same recording. until these issues were resolved, facebook suspended camerata pacifica’s ability to livestream, and youtube displayed a notification on their channel informing viewers that their videos were likely to be removed due to anticipated copyright infringement.16 owen espinosa, a high school senior, was preparing for a piano recital, and during rehearsal, facebook ended his livestream over claims of copyright infringement. he was unable to successfully appeal the claim which meant that facebook would not host his performance. instead, he had to broadcast his recital on an acquaintance’s youtube channel.17 michael sheppard, a professional pianist, has had broadcasts interrupted and videos removed by facebook multiple times with notifications stating that music owned by naxos of america was detected in his performances.18 after facebook rejected his disputes, sheppard took to twitter, alerting naxos of his situation. his videos were eventually restored, but nothing could be done about his livestreams.19 the violinist.com broadcasts weekly, hour-long concerts featuring multiple guest musicians. during one of these performances, facebook muted child violinist yugo maeda due to a claim of copyright infringement. after appealing the notice, facebook unmuted maeda’s performance three days later.20 while covid-19 exacerbated the issue, classical musicians often had their performances interrupted or removed from social media. in 2019, conducting students at the university of british colombia had their facebook live feed interrupted over copyright infringement claims and, in 2018, facebook removed a recording of an in-home performance given by pianist james rhodes also stating that the music infringed copyright.21 also in 2018, the australia broadcasting corporation’s abc classic fm livestreamed a performance of beethoven’s symphony no. 9. the broadcast ended with facebook issuing a claim stating that the music in question was owned by two different copyright owners.22 in 2016, violinist claudia schaer disputed several of youtube’s copyright claims. she typically had success with these appeals, but one of her recordings received three claims from different copyright owners. she was able to refute two of them; however, the third remained, and she was warned that if she was unsuccessful in her second attempt at appealing the claim, her account would receive a copyright strike, deleting her video from the site permanently. she felt both intimidated and aggravated by the ordeal.23 the author of this article has also had to refute a copyright infringement claim on youtube. according to the notice, 51 seconds of the author’s approximately five-minute performance of beethoven’s “für elise” infringed copyright. as a result, the claimant authorized youtube to include ads in the video, allowing them to generate revenue. the dispute was upheld after the claimant’s 30-day window for a response expired. 
although the author does not rely on monetized videos and livestreams for income, it is unethical for another entity to profit from the work of an unaffiliated individual. information technology and libraries june 2022 classical musicians v. copyright bots | berkowitz 4 disputing a copyright claim while there is recourse for uploaders facing copyright claims from social media sites, the appeals process can be lengthy and overwhelming. it can take more than two months for youtube to render a verdict when a musician disputes a copyright notice. during this span of time, classical musicians depending on ad revenue cease to generate income as these funds are held by the company until a final decision is made, at which point all profits accumulated by the video are released to the appropriate party. if the claim is upheld, the recording may remain online with proceeds going to the supposed copyright owner.24 uploaders may attempt to refute the result, but a failed appeal leads to the video’s removal and a copyright strike levied against the uploader s preventing them from livestreaming and monetizing videos for three months. should this occur, a counter notification can be issued which insists that the content in question has been mischaracterized as infringing and requires that would-be copyright owners file a lawsuit to uphold the claim. after three strikes, accounts are permanently deleted along with all associated uploads.25 the time that elapses for a final verdict along with the suspension of uploading and livestreaming permissions due to a copyright strike amounts to more than five months without being able to sustain an income. when a single performance is charged with multiple claims from different entities, as in the aforementioned examples, the uploader must dispute each one individually. this makes it easy to accumulate copyright strikes, risking account termination. it would be reasonable to assume that many classical musicians who endure these circumstances avoid the dispute process for fear of youtube removing their recordings, enforcing limitations on their ability to broadcast and monetize videos, and even permanently deleting their accounts. meanwhile, mistakenly recognized copyright owners can leverage this by appropriating the earnings generated by the work of unaffiliated musicians. furthermore, should the matter be redirected to the courts, the uploader faces the burden of retaining legal counsel. youtube algorithms deal with approximately 98% of all copyright issues and, because youtube’s business model generates profits primarily via user-uploaded content, it has been found to show bias towards established copyright owners.26 copyright owners can set preferences for how they want the system to react to instances of copyright infringement, resulting in the automatic monetization of 95% of claims for the copyright owner. as a result, user uploads make up 50% of the revenue generated by youtube for the music industry.27 although google reported in 2018 that 60% of disputed claims were found in favor of accused uploaders, the system clearly benefits established copyright owners.28 all of the aforementioned musicians who were accused of copyright infringement had their livestreams interrupted, saw their videos removed, and witnessed companies profiting from their work performing music that has long since passed into the public domain. 
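the public-domain question that runs through these disputes can be reduced to the copyright terms cited later in this article: the life of a known composer plus 70 years, and, for anonymous or pseudonymous works, 95 years from publication or 120 years from creation, whichever expires first. the sketch below encodes only that simplified rule of thumb; it is illustrative rather than legal advice, and works published before 1978 involve additional rules it ignores.

```python
# Rough sketch of the copyright terms as summarized in this article.
# Real public-domain determinations involve more rules than this.
from datetime import date

def us_copyright_expires(death_year=None, publication_year=None, creation_year=None):
    """Year the stated term ends under the article's simplified summary."""
    if death_year is not None:               # known composer: life plus 70
        return death_year + 70
    terms = []
    if publication_year is not None:
        terms.append(publication_year + 95)  # 95 years from publication
    if creation_year is not None:
        terms.append(creation_year + 120)    # 120 years from creation
    return min(terms)                        # whichever expires first

# Beethoven died in 1827, so his compositions (as opposed to modern
# recordings of them) are long out of copyright under this rule.
print(us_copyright_expires(death_year=1827))                              # 1897
print(us_copyright_expires(publication_year=1950, creation_year=1940))    # 2045
print(date.today().year > us_copyright_expires(death_year=1827))          # True
```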
youtube’s video series copyright and content id on youtube attempts to educate users on how automated copyright enforcement and the dispute process work, and while fair use and copyright permissions are discussed, the public domain is never mentioned; although, youtube does offer a brief explanation of the public domain on its help site.29 according to the us copyright act, the duration of copyright extends to 70 years after the death of the known composer, and for uncredited compositions or those composed by a musician under a pseudonym, copyright is recognized for 95 years from the date the work was published or 120 years from when it was composed, depending on which information technology and libraries june 2022 classical musicians v. copyright bots | berkowitz 5 expires first.30 while record labels are fully within their right to protect the recordings they own, that should have no bearing on individual musicians performing pre-twentieth-century music. the majority of online music consumption occurs on social media sites with 47% of the market share going to youtube.31 reports from deezer showed a near 20% increase in users listening to classical music since the start of the pandemic.32 given that more users are gravitating towards listening to classical music, and that the most popular digital access point for music is youtube, classical musicians coping with pandemic-induced restrictions were presented with what should have proven to be a lucrative opportunity. adhering to social distancing requirements and stay-athome mandates meant musicians cancelled their performances, leading to an exploration of virtual alternatives such as uploading recordings and livestreaming. obstructing these activities interrupts their sole source of income. conclusion while researchers have suggested a handful of improvements for automated copyright enforcement systems, they have not addressed the role that libraries can play in assis ting classical musicians.33 the tampa-hillsborough county public library, prior to the spread of covid-19, maintained four branches outfitted with recording studios; today, that number has grown to five. prior to pandemic library closures, recording studios were reserved just over 800 times, amounting to about 1,600 hours of usage between january 1, 2019 and march 13, 2020. patrons using the recording studios produce music and videos with the intention of uploading them to social media. other libraries with recording studios likely see their patrons doing the same, but without knowledge of copyright. libraries have the means and the motive to assist classical musicians. libraries can hold classes covering the basics of copyright, fair use, and the public domain, or that expand upon how automated copyright enforcement systems work on social media. library staff, however, may feel overwhelmed by the numerous texts on these subjects and may not know where to begin. an excellent starting point is the frequently asked questions page on the us copyright office website. this webpage offers explanations for a broad array of copyright-related issues and questions.34 fair use allows for unauthorized borrowing from a creative work; however, navigating how fair use is determined is always challenging. steven m. 
davis’ “computerized takedowns: a balanced approach to protect fair uses and the rights of copyright owners” is a reliable point of reference for defining fair use, its application in copyright infringement cases, and ethical and legal implications regarding the limitations of algorithmic moderation systems.35 for a thorough look into the mechanics and applications of automated copyright enforcement, refer to the previously mentioned “are youtube and facebook cancelling classical musicians?” this article offers a synopsis on the shift from physical to digital media, descriptions of different algorithmic models developed specifically for copyright enforcement, and an account of how youtube’s and facebook’s copyright enforcement systems came to be.36 libraries can also offer help sessions that support patrons through the copyright claims dispute process. the youtube dispute interface is user friendly, and the instructions are comprehensible. throughout each step, explanations are offered to clarify what is being required of the user. for example, when asked for the reasoning behind the dispute, the user is offered four options: the disputed material is original content, the user has acquired permission to reproduce the co ntent, the content falls under fair use, or the content originates from the public domain. once selected, information technology and libraries june 2022 classical musicians v. copyright bots | berkowitz 6 additional explanations for each option are given in order to provide further clarification and context which allows the user to reconsider their choice and also helps the user better explain how their content falls under the selected category. finally, the user is asked to provide a narrative explaining how the content in question does not infringe copyright. facebook’s counternotification process is less generous, providing brief, ineffectual descriptions of copyright and a simple form requesting the user’s personal information and explanation for why the copyright infringement claim is unfounded. after library staff demonstrate the use of these interfaces, patrons can be guided to library resources to help them articulate and refine their arguments. for anything that cannot be found among the library’s collections, library staff may need to assist with internet searches, or patrons may request materials through interlibrary loan. additionally, patrons may still feel overwhelmed by the terminology being presented, which would further support the need for library programming that covers copyright-related topics. when considering the research involved to produce a convincing counterargument, information literacy and metaliteracy classes may be warranted. libraries can also encourage patrons to include descriptions in their uploads and livestreams with links to supporting evidence explaining that the featured music belongs to the public domain, and as the uploader, they own the rights to recordings and broadcasts of their own performances. the public domain description on youtube’s help page provides links to columbia university libraries’ copyright advisory service and cornell university’s copyright information center, and it suggests that these resources can lead to supporting evidence regarding works in the public domain.37 another excellent resource is the international music score library project’s petrucci music library. 
this database of almost 200,000 compositions belonging to the public domain features both sheet music and recordings of each of these works.38 users can also point to the public domain song anthology, a book comprising 348 popular songs from the public domain; the entire text can be downloaded from the publisher’s website.39 these resources and explanations can be included in disputes to support the reasoning for why a copyright claim is invalid. it should be noted that library employees are most often not lawyers, and as such, it is ill-advised to answer direct questions about the specific legality of the myriad of situations musicians face when disputing copyright claims. these matters require expert, specialist knowledge with which library staff are not equipped. the role of the library should only be to provide access to resources and inform the public on various issues regarding the use of information. as information specialists, librarians are in a unique position to educate patrons on information policy, and in this case, copyright. library systems with law libraries or with access to law collections and databases would be especially suited to teach patrons about copyright, guide them through the dispute process, and assist them with gathering resources to support their counterarguments. the tampahillsborough county public library and other systems like it that are outfitted with both music recording studios and a law library are encouraged to offer such services. hopefully, this overview of automated copyright enforcement, its impacts on classical musicians, and the suggestions to libraries offered here will promote further conversation that eventually leads to action and a possible solution. perhaps, as progress is made, automated copyright enforcement systems will grow more hospitable towards user-generated recordings and livestreams of classical music. after all, social media should be able to freely host the artistic talents of all musicians. information technology and libraries june 2022 classical musicians v. copyright bots | berkowitz 7 endnotes 1 “covid-19 network update,” comcast, may 20, 2020, https://corporate.comcast.com/covid19/network/may-20-2020. 2 julia alexander, “the entire world is streaming more than ever—and it’s straining the internet,” the verge, march 27, 2020, https://www.theverge.com/2020/3/27/21195358/streaming-netflix-disney-hbo-nowyoutube-twitch-amazon-prime-video-coronavirus-broadband-network; ella koeze and nathaniel popper, “the virus changed the way we internet,” the new york times, april 7, 2020, https://www.nytimes.com/interactive/2020/04/07/technology/ coronavirus-internet-use.html. 3 michael andor brodeur, “copyright bots and classical musicians are fighting online. the bots are winning,” the washington post, may 21, 2020, https://www.washingtonpost.com/entertainment/music/copyright-bots-and-classicalmusicians-are-fighting-online-the-bots-are-winning/2020/05/20/a11e349c-98ae-11ea-89fd28fb313d1886_story.html. 4 adam eric berkowitz, “are youtube and facebook cancelling classical musicians? the harmful effects of automated copyright enforcement on social media platforms,” notes 78, no. 2 (december 2021): 177–202. 5 rebecca tushnet, “all of this has happened before and all of this will happen again: innovation in copyright licensing,” berkeley technology law journal 29, no. 3 (december 2014): 1147–87. 
6 matthew lorenzon, “why is facebook muting classical music videos?” abc classic fm, december 21, 2018, https://www.abc.net.au/classic/read-and-watch/music-reads/facebook-copyright/10633928. 7 xia-mu niu and yu-hua jiao, “an overview of perceptual hashing,” acta electronica sinica 36, no. 7 (2008): 1405–11. 8 robert gorwa, reuben binns, and christian katzenbach, “algorithmic content moderation: technical and political challenges in the automation of platform governance,” big data & society 7, no. 1 (january 2020): 7. 9 larry hardesty, “explained: neural networks,” mit news, april 14, 2017, https://news.mit.edu/2017/explained-neural-networks-deep-learning-0414. 10 daniel graupe, principles of artificial neural networks, 3rd ed. (hackensack, nj: world scientific publishing company, 2013), 1–3. 11 gorwa, binns, and katzenbach, “algorithmic content moderation,” 7. 12 daniel (yue) zhang, jose badilla, herman tong, and dong wang, “an end-to-end scalable copyright detection system for online video sharing platforms,” in proceedings of the 2018 ieee/acm international conference on advances in social networks analysis and mining (barcelona, spain: ieee press, 2018), 626–27. 13 daniel (yue) zhang et al., “crowdsourcing-based copyright infringement detection in live video streams,” in proceedings of the 2018 ieee/acm international conference on advances in social networks analysis and mining (barcelona, spain: ieee press, 2018), 367. 14 berkowitz, “are youtube and facebook cancelling classical musicians?,” 200. 15 diego cerna aragon, “behind the screen: content moderation in the shadows of social media,” critical studies in media communication 37, no. 5 (october 19, 2020): 512–14. 16 brodeur, “copyright bots and classical musicians are fighting online.” 17 amy williams, “camerata pacifica to stream high school graduate’s senior recital,” classical candor: classical music news and reviews (blog), june 6, 2020, https://classicalcandor.blogspot.com/2020/06/classical-music-news-of-week-june-6-2020.html. 18 baltimore school for the arts, “sometimes you have to fight!,” facebook, may 22, 2020, https://www.facebook.com/baltimoreschoolforthearts/posts/sometimes-you-have-to-fight-our-michael-sheppard-was-recently-giving-a-facebook-/3146142648740808/. 19 michael sheppard (@pianistcomposer), “dear @naxosrecords please stop muting portions of works whose composers have been dead for hundreds of years.” twitter, may 9, 2020, https://twitter.com/pianistcomposer/status/1259118489622777856. 20 laurie niles, “facebook and naxos censor music student playing bach,” violinist.com (blog), july 13, 2020, https://www.violinist.com/blog/laurie/20207/28375/. 21 brodeur, “copyright bots and classical musicians are fighting online”; ian morris, “facebook blocks musician from uploading his own performance—but did he break copyright?” daily mirror, september 7, 2018, https://www.mirror.co.uk/tech/facebook-blocks-musician-uploading-performance-13208194. 22 matthew lorenzon, “why is facebook muting classical music videos?” abc classic fm, december 21, 2018, https://www.abc.net.au/classic/read-and-watch/music-reads/facebook-copyright/10633928. 23 claudia schaer, “youtube copyright issues,” violinist.com (blog), february 15, 2016, https://www.violinist.com/discussion/archive/27589/. 24 “monetization during content id disputes,” youtube help, accessed october 24, 2019, https://support.google.com/youtube/answer/7000961?hl=en&ref_topic=9282678#zippy=,filing-a-content-id-dispute,more-info-about-the-content-id-dispute-process,filing-a-content-id-appeal,more-info-about-the-content-id-appeal-process. 25 “copyright strike basics,” youtube help, accessed october 24, 2019, https://support.google.com/youtube/answer/2814000#zippy=,what-happens-when-you-get-a-copyright-strike,resolve-a-copyright-strike. 26 google, how google fights piracy (november 2018), 14, https://www.blog.google/documents/27/how_google_fights_piracy_2018.pdf; joanne e. gray and nicolas p. suzor, “playing with machines: using machine learning to understand automated copyright enforcement at scale,” big data & society 7, no. 1 (april 2020): 1–15. 27 karl borgsmiller, “youtube vs. the music industry: are online service providers doing enough to prevent piracy?” southern illinois university law journal 43, no. 3 (spring 2019): 660. 28 google, how google fights piracy, 28–31. 29 youtube creators, copyright and content id on youtube, october 12, 2020, accessed december 11, 2021, https://www.youtube.com/playlist?list=plpjk416fmkwrnrbv72kshryeknnsaafkd; “frequently asked copyright questions,” youtube help, accessed october 24, 2019, https://support.google.com/youtube/answer/2797449#c-pd&zippy=,what-is-the-public-domain. 30 “how long does copyright protection last?” copyright.gov, us copyright office, https://www.copyright.gov/faq/faq-duration.html. 31 adam j. reis and manon l. burns, “who owns that tune? issues faced by music creators in today’s content-based industry,” landslide 12, no. 3 (january & february 2020): 13–16. 32 maddy shaw roberts, “research shows huge surge in millennials and gen zers streaming classical music,” classic fm, august 19, 2020, https://www.classicfm.com/music-news/surge-millennial-gen-z-streaming-classical-music/. 33 berkowitz, “are youtube and facebook cancelling classical musicians?,” 199–201. 34 “frequently asked questions,” copyright.gov, us copyright office, https://www.copyright.gov/help/faq. 35 steven m. davis, “computerized takedowns: a balanced approach to protect fair uses and the rights of copyright owners,” roger williams university law review 23, no. 1 (winter 2018): 1–24. 36 berkowitz, “are youtube and facebook cancelling classical musicians?,” 177–202. 37 “frequently asked copyright questions,” youtube help. 38 “main page,” imslp: petrucci music library, accessed december 12, 2021, https://imslp.org/wiki/main_page. 39 david berger and chuck israels, the public domain song anthology: with modern and traditional harmonization (charlottesville: aperio, 2020), https://aperio.press/site/books/m/10.32881/book2/.
article the role of the library in the digital economy serhii zharinov information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.12457 serhii zharinov (serhii.zharinov@gmail.com) is researcher, state scientific and technical library of ukraine. © 2020. abstract the gradual transition to a digital economy requires all business entities to adapt to new environmental conditions, an adaptation that takes place through their digital transformation. these tasks are especially relevant for scientific libraries, as digital technologies make changes in the main subject field of their activities: the processes of creating, storing, and disseminating information. in order to find directions for the transformation of scientific libraries and determine their role in the digital economy, a study of the features of digital transformation and the experience of the digital transformation of foreign libraries was conducted. management of research data, which is implemented through the creation of current research information systems (cris), was found to be one of the most promising areas of the digital transformation of libraries. the problem area of this direction and ways of engaging libraries in it have also been analyzed in this work. introduction the transition to a digital economy contributes to the even greater penetration of digital technologies into our lives and the emergence of new conditions of competition and trends in organizations’ development. big data, machine learning, and artificial intelligence are becoming common tools implemented by the pioneers of digital transformation in their activities.1 significant changes in the main functions of libraries (the storage and dissemination of information), caused by the development of digital technologies, affect the operational activities of libraries, user and partner requests to the library, and the ways to meet them. in the process of adapting to these changes, the role of libraries in the digital economy is changing. this study is designed to find current areas of library development and to determine the role of the library in the digital economy.
achieving this goal requires studying the “digital economy” concept and the peculiarities of the digital transformation of organizations in order to better understand the role of the library in it; researching the development of libraries to determine what best fits the new role of the library in the digital economy; and identifying obstacles to the development of this area and ways to engage libraries in it. the concept of the “digital economy” the transition to an information society and digital economy will gradually change all industries, and all companies must change accordingly.2 taking advantage of the digital economy is the main driving force of innovation, competitiveness, and economic development of the country.3 the transition to a digital economy is not instant but occurs over many years. the topic emerged at the end of the twentieth century, but in recent years it has experienced rapid growth. in the web of science (wos) citation database, publications with this term in the title began to appear in 1996 (figure 1). figure 1. the number of publications in the wos citation database for the query “digital economy.” one of the first books devoted entirely to the study of the digital economy concept is the work of don tapscott, published in 1996. in this book, the author understands the digital economy as an economy in which the use of digital computing technologies in economic activity becomes its dominant component.4 thomas mesenbourg, an american statistician and economist, identified in 2000 the three main components of the digital economy: e-business, e-commerce, and e-business infrastructure.5 a number of works on the development of indicators to assess the state of the digital economy, in particular the work of philip barbet and nathalie coutinet, are based on the analysis of these components.6 alnoor bhimani, in his 2003 paper “digitization and accounting change,” defined the digital economy as “the digital interrelationships and dependencies between emerging communication and information technologies, data transfers along predefined channels and emerging platforms, and related contingencies within and across institutional and organizational entities.”7 bo carlsson’s 2004 article described the digital economy as a dynamic state of the economy characterized by the constant emergence of new activities based on the use of the internet and new forms of communication between different authors of ideas, whose communication allows them to generate new activities.8 in 2009, john hand defined the digital economy as the new design or use of information and communication technologies that help transform the lives of people, society, or business.9 ciocoiu carmen nadia, in her 2011 article, explained the digital economy as a state of the economy where knowledge and networking begin to play a more important role than capital in a postindustrial society due to technology.10
mykhailo voinarenko and larysa skorobohata, in a study of network tools in 2015, gave the following definition of the digital economy: "the digital economy, unlike the internet economy, assumes that all economic processes (except for the production of goods) take place independently of the real world. goods and services do not have a physical medium but are 'electronic.'"12 yurii pivovarov, director of the ukrainian association for innovation development (uaid), gives the following definition: "digital economy is any activity related to information technology. and in this case, it is important to separate the terms: digital economy and it sphere. after all, it is not about the development of it companies, but about the consumption of services or goods they provide—online commerce, e-government, etc.—using digital information technology."13

taking into account the above, in this study the digital economy is defined as a digital infrastructure that encompasses all business entities and their activities. the transition to the digital economy is the process of creating conditions for the digital transformation of organizations, the creation of digital infrastructure, and the gradual involvement of various economic entities and certain sectors of the economy in that digital infrastructure.

one of the first practical and political manifestations of the transition to the digital economy was the european commission's digital economy and society index (desi), first published in 2014. the main components of the index are connectivity, human capital, internet use, digital integration, and digital public services. the 2019 index shows significant progress among european countries in the digitalization of business and in the interaction of society with the state.14 for ukraine, the first step towards the digital economy was the concept for the development of the digital economy and society of ukraine, which defines the understanding of the digital economy and the direction and principles of the transition to it.15 for active representatives of the public sector, this concept is a signal that the development of structures and organizations should be based not on improving operational efficiency but on transformation in accordance with the requirements of industry 4.0. confirmation of the seriousness of the ukrainian government's intentions in this direction is the creation of the ministry of digital transformation in 2019 and the delivery of new public services through online channels.16

one of the priority challenges that needs to be solved at the stage of transition to the digital economy is the development of digital skills in the entire population. this is relevant not only for ukraine but also for the european union. in europe, a third of the active workforce does not have basic skills in working with digital technologies; in ukraine, 15.1 percent of ukrainians do not have digital skills, and the share of the working population with below-average digital skills is 37.9 percent.17 part of the solution to this challenge in ukraine is entrusted to the "digital education" project, implemented by the ministry of digital transformation (osvita.diia.gov.ua), which aims to build digital literacy in the population of ukraine through short educational series created for different target audiences.
features of digital transformation
developed digital skills in the population make the digital transformation of organizations not just a competitive advantage but a prerequisite for their survival: the more accustomed the target audience is to the benefits of the digital economy, the more actively the organization has to adapt to new requirements and customer needs and to the new competitive environment.

digital transformation of an organization is a complex process that is not limited to implementing software in the company's activities or automating certain components of production. it includes changes to all elements of the company, including methods of manufacturing and customer service, the organization's strategy and business model, and approaches and methods of management. according to a study by mckinsey, the integration of new technologies into a company's operations can reduce profits in 45 percent of cases.18 therefore, it is extremely important to take a comprehensive approach to digital transformation: understanding the changes being implemented, choosing the method of their implementation, and gradually involving all structural units and business processes in the transformation.

the boston consulting group study identified six factors necessary for the effective use of the benefits of modern technologies:19
• connectivity of analytical data;
• integration of technologies and automation;
• analysis of results and application of conclusions;
• strategic partnership;
• competent specialists in all departments; and
• flexible structure and culture.

mckinsey consultants draw attention to the low percentage of successful digital transformation practices and, based on the successful experience of 83 companies, form five categories of recommendations that can contribute to successful digitalization:20
• involvement of leaders experienced in digitalization;
• development of digital staff skills;
• creating conditions for the use of digital skills by staff;
• digitization of tools and working procedures of the company; and
• establishing digital communication and ensuring the availability of information.

experts at the institute of digital transformation identify four main stages of digital transformation in the company:21
1. research, analysis, and understanding of customer experience.
2. involvement of the team in the process of digital transformation and implementation of a corporate culture that contributes to this process.
3. building an effective operating model based on modern systems.
4. transformation of the business model of the organization.

the "integrated model of digital transformation" study identifies focusing on priority digital projects as one of the key factors of successful digital transformation; dedicated organizational teams should be engaged in the development and implementation of these projects.
the authors identify three main functional activities for digital transformation teams, the implementation of which provides a gradual, comprehensive renewal of the company: the creation and implementation of a digital strategy, digital activity management, and digitization of operational activities.22 in their study, ukrainian scientists natalia kraus, oleksandr holoborodko, and kateryna kraus determine that the general pattern for all digital economy projects is their focus on a specific consumer and the comprehensive use of available information about that consumer and about the conditions of project effectiveness.23 initially, a project is pre-tested on a small scale, and only after obtaining satisfactory results from testing the new principles of activity on a narrow target audience is the project scaled to a wider range of potential users. all this reduces the risks associated with digital transformation: eliminating unnecessary changes and false hypotheses on a small scale allows an organization to avoid overspending at the stage of a comprehensive transformation of the entire enterprise.

therefore, the process of effective digital transformation should begin with the involvement of leaders experienced in digital transformation, analysis of the weaknesses of the organization, and the building of a plan for its comprehensive transformation, divided into individual projects implemented by qualified teams, with a gradual increase in the volume of these projects as their effectiveness is confirmed on a small scale. the process of digital transformation should be accompanied by constant training of employees in digital skills. the goal of digital transformation is to build an efficient, high-performing company that can quickly adapt to new environmental conditions, which is achieved through the introduction of digital technologies and new methods and tools of organizational management.

directions of library development in the digital economy
based on the study of the digital economy concept and the peculiarities of digital transformation, a review of library development in the digital economy was conducted to find the library's place in digital infrastructure and to identify potential projects that an individual library can implement as part of its comprehensive transformation plan. the main task is to determine the new role of the library in the digital economy and the areas that best meet it. the search for directions in the development of the library in response to the spread of digital technology began at the end of the last century.
one of the first concepts to reflect the impact of the internet on the library sector is the concept of the digital library, published in 1999.24 in 2006, the concept of "library 2.0" emerged, based on the use of web 2.0 technologies: dynamic sites, users becoming data authors, open-source software, api interfaces, and data added to one database being immediately fed to partner databases.25 the spread of social networks and mobile technologies, and their successful use in library practice, led to the formation of the concept of "library 3.0."26 the development of open source, cloud services, big data, augmented reality, context-aware computing, and other technologies has influenced library activities, which is reflected in the concept of "library 4.0."27 researchers, scholars, and the professional community continued to develop concepts of the modern library, drawing on the experience of implementing changes in library activities and taking into account the development of other areas, and in 2020 articles began to appear describing the concept of "library 5.0," based on a personalized approach to students: support of each student during the whole period of study, development of the skills necessary for learning, and a set of other supporting actions integrated into the educational process.28

in determining the current role of the library in the digital economy, it is necessary to pay attention to a study by denys solovianenko, who identifies research and educational infrastructure as one of the key elements of scientific libraries of the twenty-first century.29 olga stepanenko considers libraries part of the information and communication infrastructure, the development of which is one of the main tasks of transforming the socioeconomic environment in accordance with the needs of the digital economy; this infrastructure ensures high efficiency for stakeholders and sets the pace of digitalization of the state economy, which occurs through the development of its constituent elements.30 the replacement of traditional library services by digital infrastructure, illustrated by the example of the moravian library, is demonstrated in a study by michal indrák and lenka pokorná, published in april 2020.31

projects that contribute to the library's adaptation to the conditions of the digital economy, implemented in the environment of public libraries, include: digitization of library collections (including historical heritage) and the creation of a database of full-text documents; providing free access to the internet via library computers and wi-fi; organization of online customer service and development of services that do not require a physical presence in the library; and organization of events for the development of users' digital and information skills.32 under such conditions, the role of the librarian as a specialist in the field of information changes from that of a custodian to that of an intermediary, a distributor.33 one of the main objectives of library activity in the digital economy becomes overcoming the digital divide: disseminating knowledge about modern technologies and innovations, assisting the community in their use, and developing digital skills in all users of the library.34 an example of the digital public library is the digital north library project in canada, which resulted in the creation of the inuvialuit digital library (https://inuvialuitdigitallibrary.ca).
the project lasted four years, bringing together researchers from different universities and the community in the region, who together digitized cultural heritage documents and created metadata. the library now has more than 5,200 digital resources collected in 49 catalogues. the implementation of this project provides access to library services and information to a significant number of people living in remote areas of northern canada who are unable to visit libraries in person (https://sites.google.com/ualberta.ca/dln/home?authuser=0, https://inuvialuitdigitallibrary.ca).35 other representatives of modern digital libraries, one of whose main tasks is the preservation of cultural heritage and the spread of national culture, are the british library (https://www.bl.uk), the hispanic digital library of the biblioteca nacional de españa (http://www.bne.es), the gallica digital library in france (https://gallica.bnf.fr), the german digital library (deutsche digitale bibliothek, https://www.deutsche-digitale-bibliothek.de), and europeana (https://www.europeana.eu).

another direction is the development of analytical skills in information retrieval. academic libraries, drawing on their competencies in information retrieval and information technology and refining the results of their analyses, have been able to better identify trends in academia and expand cooperation with teachers to update curricula.36 libraries become active participants in the processes of teaching, learning, and assessment of acquired knowledge in educational institutions. t. o. kolesnikova, in her research on models of library development, substantiates the expediency of creating information intelligence centers for introducing the latest scientific advances into training and production processes, involving libraries in the educational activities of higher educational establishments, and creating centralized repositories as directions of development for the university libraries of ukraine.37 one of the advantages of the development and dissemination of digital technologies is the possibility of forming individual curricula for students; involvement of university libraries in this work is one of the new areas of their activity in the digital economy.38

one of the important areas of operation for departmental and scientific-technical libraries that contributes to increasing the innovative potential of the country is activity in the area of intellectual property.
consulting services in the field of intellectual property, information support for scientists, the creation of electronic patent information databases in the public domain, and other related services are important components of library activity in many countries.39 another important component of libraries' transformation is the deepening of their role in scientific communication: expanding the use of information technology in order to integrate scientific information into a single network and creating and managing the information technology infrastructure of science.40

the presence of libraries on social networks has become an important component of their digital transformation. on the one hand, libraries have thus created another source of information dissemination and expanded the number of service delivery channels, developing online training videos and interactive help services in support of them.41 on the other hand, social networks have become a marketing tool for engaging the audience with the library's digital collections and online services. an additional important benefit of the presence of libraries on social networks has been the establishment of contacts and the exchange of ideas with other professional organizations, which has contributed to the further expansion of the network of library partners.42

another area of activity that libraries are taking on in the digital economy is the management of research data, which is confirmed by the significant number of publications on this topic in professional scientific and research journals for 2017–18.43 joining this area allows libraries to become part of the scientific digital information and communication infrastructure, the creation of which is one of the main tasks of digital transformation on the way to the digital economy.44 the development of this area contributes to the digitalization of the scientific and information sphere; the systematization and structuring of research data has a positive effect on the effectiveness of research and on the level of scientific novelty of the results of intellectual activity.

the ukrainian institute of the future, together with the digital agency of ukraine, considers digital transformation to be the integration of modern digital technologies into all spheres of business. the introduction of modern technologies (artificial intelligence, blockchain, cobots, digital twins, iiot platforms, and others) into the production process will lead to the transition to industry 4.0. according to their forecasts, the key competence in industry 4.0 will be data processing and analytics.45 research information is an integral part of this competence, so its development is one of the most promising directions for the library in the digital economy.

the tools used in the management of research data are called current research information systems, abbreviated as cris. in ukraine, there is no such system connected to the international community.46 the change of the library's role from a repository of information to its manager, the alignment of the functions and tasks of a cris with the key requirements of the digital economy, and the advantages of such systems, together with the fact that they are still not used in ukraine, make this area extremely relevant for research and a promising area of work for scientific libraries, so we will consider it more thoroughly.
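to make the shape of such a system more concrete, the following sketch models in python the kind of core entities a cerif-oriented cris links together. the class names, fields, and example values are illustrative assumptions for this discussion only, not the cerif specification, the design of uris, or any existing product.

    # a minimal, hypothetical sketch of the linked entities at the heart of a
    # cris: researchers, projects, and publications. real systems (and the
    # cerif standard) define far richer models; this only illustrates why a
    # cris can answer questions that siloed, unlinked databases cannot.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Researcher:
        orcid: str          # persistent identifier for the person
        name: str
        affiliation: str

    @dataclass
    class Project:
        title: str
        funder: str
        members: List[Researcher] = field(default_factory=list)

    @dataclass
    class Publication:
        doi: str            # persistent identifier for the output
        title: str
        year: int
        authors: List[Researcher] = field(default_factory=list)
        project: Optional[Project] = None

    # linking outputs to projects and people is what lets a cris answer
    # questions such as "which funded projects produced which publications?"
    alice = Researcher("0000-0002-1825-0097", "a. researcher", "example university")
    grant = Project("digital library infrastructure", "national research fund", [alice])
    paper = Publication("10.1234/example", "a sample study", 2020, [alice], grant)
    print(paper.project.funder)  # -> national research fund

the point of the sketch is the linkage itself: once people, projects, and outputs share persistent identifiers, reports and evaluations can be generated across them rather than from disconnected lists.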
problems in research data management
global experience of research information management shows several problems in the process of research data management. some of them are related to the processes of workflow organization, control, and reporting. this is due to the use of several poorly coordinated systems to organize the work of scientists. data sets from different systems without metadata are very difficult to combine into a single system, and it is almost impossible to automate the process. all this manifests itself in a lack of informational support for decision-making in the field of science, both at the state level and at the level of individual structures. this situation can lead to poor management decisions, to overspending on similar, duplicate projects, and to higher costs for recruiting and finding scientists with relevant experience and for finding the equipment needed for research.

cris, which began to appear in europe in the 1990s, are designed to overcome these shortcomings and promote the effective organization of scientific work. such systems are now widespread throughout the world, with a total of about five hundred, concentrated mainly in europe and india. however, there is currently no research information management system in ukraine that meets international standards and integrates with international scientific databases. this omission slows down ukraine's integration into the international scientific community. the solution to this problem may be the creation of the national electronic scientific information system uris (ukrainian research information system).47 the development of this system is an initiative of the ministry of education and science of ukraine. it is based on combining data from ukrainian scientific institutions with data from crossref and other organizations, as well as on ensuring integration with other international cris through the use of the cerif standard.

future developers of the system face a number of challenges, some specific to ukraine and some already studied by foreign scientists. a significant number of studies in this area are designed to overcome the problem of lack of access to research data, as well as to solve problems of data standardization and openness. global experience has examined how to manage collection processes and develop structured data sets, how to distribute them on a commercial basis, and how to benefit from providing them in open access. the mechanisms for financing these processes have been studied; in particular, effective ways of attracting patronage funds have been analyzed, and the possibilities for licensing and distributing the resulting data sets, along with the approaches and tools likely to be most effective for the library, have been determined. alice wise, for example, describes the experience of settling some legal aspects by clarifying site use in the license agreement, which covers the conditions of access to information and search within it while maintaining a certain level of anonymity.48 the problem of data consistency is related to the lack of uniform standards for information retention covering the data format, the metadata, and the methods of their generation and use.
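as a concrete illustration of the kind of aggregation described above, the following sketch queries the public crossref rest api for a single doi and reshapes the response into a small record a cris could store. the doi shown and the choice of fields are illustrative assumptions on my part, not part of the uris design.

    # a minimal sketch: harvesting publication metadata from the public
    # crossref rest api, one of the sources a national cris such as uris
    # could aggregate.
    import requests

    CROSSREF_API = "https://api.crossref.org/works/"

    def fetch_publication(doi):
        """return a small, cris-friendly record for one doi."""
        response = requests.get(CROSSREF_API + doi, timeout=30)
        response.raise_for_status()
        message = response.json()["message"]
        return {
            "doi": doi,
            "title": (message.get("title") or [""])[0],
            "authors": [
                (a.get("given", "") + " " + a.get("family", "")).strip()
                for a in message.get("author", [])
            ],
            "year": message.get("issued", {}).get("date-parts", [[None]])[0][0],
            "journal": (message.get("container-title") or [""])[0],
        }

    if __name__ == "__main__":
        # the doi below is purely illustrative; substitute a real one to run
        print(fetch_publication("10.1234/example-doi"))

the same pattern, pull from an external source, normalize to a local record shape, applies whether the source is crossref, an institutional repository, or another cris exchanging cerif data.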
thus, the use of different standards and formats in repositories and archives leads to data consistency problems for researchers, which in turn affects the quality of service delivery and makes it impossible to use multiple data sets together.49 another important problem for the dissemination of research data is the lack of tools and components in the libraries and repositories of higher educational establishments and scientific institutions. it is worth developing the infrastructure so that, at the end of a project, scientists publish not only the research results but also the research data they used and generated. this approach will be convenient both for authors (in case they need to reuse the research data) and for other scientists (because they will have access to data that can be used in their own research).50 developing the necessary tools is quite relevant, especially because, according to international surveys, researcher-practitioners are in favor of sharing the data they create with other researchers and of the licensed use of other people's datasets in conducting their own research.51

another reason for the low prevalence of research data sharing is that datasets have less impact on a researcher's reputation and rating than publications.52 this is partly due to the lack of citation-tracking infrastructure for datasets, in contrast to the publication of research results, and to the lack of standards for storing and publishing data. prestigious scientific journals have been grappling with this problem for several years. for example, the american economic review requires authors whose articles contain empirical, modelling, or experimental work to provide information about the research data in sufficient detail for replication.53 nature and science require authors to preserve research data and provide them at the request of the journals' editors.54 one of the reasons for the underdeveloped infrastructure in research data management is the weak policy of disseminating free access to these data, as a result of which even the small share of scientific data that is usable remains closed by license agreements and cannot be used by other scientists.55 open science initiatives related to publications have been operating in the scientific field for a long time, but their extension to research data remains insufficient.

the development of the uris system will provide management of scientific information and will address the problems highlighted in the scientific works cited above; it will promote the efficient use of funds, simplify the process of finding data for research, and discipline research, and it will therefore have a positive impact on the entire economy of ukraine.

library and research information management
library involvement in the development of scientific information management systems will be an important future direction of their work. such systems, which could include all the necessary information about scientific research, will contribute to the renewal and development of the library sphere of ukraine and will promote the state's transition to a digital economy. the creation of the uris system is designed to provide access to research data generated by both ukrainian and foreign scientists.
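for a system like uris to provide access to research data and make datasets as citable as publications, each deposited dataset needs a minimal, machine-readable description with a persistent identifier. the sketch below assembles one hypothetical shape for such a record and renders a simple data citation from it; the field names loosely follow the mandatory properties of the datacite metadata schema, while all values, helper functions, and the citation format are illustrative assumptions, not part of the uris design.

    # a hypothetical sketch of a minimal, citable dataset record.
    def dataset_record(doi, creators, title, publisher, year, resource_type="Dataset"):
        """assemble a minimal descriptive record for a deposited dataset."""
        return {
            "identifier": {"identifierType": "DOI", "value": doi},
            "creators": creators,
            "title": title,
            "publisher": publisher,
            "publicationYear": year,
            "resourceType": resource_type,
        }

    def citation(record):
        """render a simple data citation string from the record."""
        names = "; ".join(record["creators"])
        return "{} ({}). {} [{}]. {}. https://doi.org/{}".format(
            names, record["publicationYear"], record["title"],
            record["resourceType"], record["publisher"],
            record["identifier"]["value"])

    example = dataset_record(
        doi="10.1234/example-dataset",            # hypothetical doi
        creators=["researcher, a.", "colleague, b."],
        title="survey responses on data sharing practices",
        publisher="example university library",
        year=2020,
    )
    print(citation(example))

once every deposited dataset carries such a record, citation tracking for data can reuse the same identifier-based infrastructure that already exists for publications.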
such a system can ensure the development of cooperation in the field of research, the intensification of knowledge exchange and interaction through the open exchange of scientific data, and the integration of ukrainian scientific infrastructure into the world scientific and information space. according to surveys conducted by the international organizations eurocris and oclc, of the 172 respondents working in the field of research information management, 83 percent said that libraries play an important role in the development of open science, copyright, and the deposit of research results; the share of libraries that play a major role in this direction was 90 percent. almost 68 percent of respondents noted the significant contribution of libraries in filling in the metadata needed to correctly identify the work of researchers in various databases; 60 percent noted the important role of libraries in verifying the correctness of the metadata entered by researchers; and almost 49 percent of respondents assess the role of libraries as the main one in the management of research data (figure 4).

figure 4. the proportion of organizations among 172 users of cris systems that assess the role of libraries in the management of research information as basic or supporting.56 (the activities compared in the figure are financial support for rim, project management, maintaining or servicing technical operations, impact assessment and reporting, strategic development, management and planning, creating internal reports for departments, system configuration, outreach and communication, initiating rim adoption, research data management, metadata validation workflows, metadata entry, training and support, and open access, copyright, and deposit.)

at the same time, library assistance in the information management of scientific research can take various forms, which should be adopted by the scientific libraries of ukraine; some of these forms will also be useful to public libraries, which can become science ambassadors in their communities. based on the experience of foreign libraries, we have identified areas of activity in which the library can join the management of research information.

one of the main directions for libraries that cooperate with cris users, or are themselves the organizers of such systems, is the introduction and support of open science. historically, libraries support open science because they provide access to scientific papers, but they can expand their activities further. using open data resources and promoting them among the scientific community, involving scientific users in disseminating their own research results on the principles of open science, supporting users in disseminating their publications, creating conditions for increasing the citation of scientific papers, tracking information about user publications, and creating and supporting public profiles of scientists in scientific and professional resources and scientific social networks: all this will help researchers engage more actively in open science and take advantage of this area. analysis of world experience shows that scientific libraries are significantly intensifying their support for the strategic goals of the structures that finance their activities and to which they are subordinate.
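the survey findings above single out metadata entry and metadata validation as places where libraries contribute most. the sketch below shows what a small automated validation pass over researcher-entered records might look like; the field names, patterns, and rules are illustrative assumptions rather than the checks of any real cris.

    # a minimal sketch of a metadata validation pass of the kind a library
    # might run over researcher-entered records before they enter a cris.
    import re

    ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
    DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
    REQUIRED_FIELDS = ("title", "authors", "year", "doi", "orcid")

    def validate_record(record):
        """return a list of human-readable problems found in one record."""
        problems = []
        for name in REQUIRED_FIELDS:
            if not record.get(name):
                problems.append("missing required field: " + name)
        if record.get("orcid") and not ORCID_PATTERN.match(record["orcid"]):
            problems.append("malformed orcid: " + record["orcid"])
        if record.get("doi") and not DOI_PATTERN.match(record["doi"]):
            problems.append("malformed doi: " + record["doi"])
        year = record.get("year")
        if year and not 1900 <= int(year) <= 2030:
            problems.append("implausible publication year: " + str(year))
        return problems

    # a hypothetical researcher-entered record
    example = {"title": "sample study", "authors": ["a. researcher"],
               "year": 2020, "doi": "10.1234/abc", "orcid": "0000-0002-1825-0097"}
    print(validate_record(example))  # -> []

checks like these do not replace the librarian's judgment; they simply surface the records that need human review, which is the division of labor the survey responses describe.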
libraries are moving away from routine customer service and expanding their activities through the use of their own assets and the introduction of new, modern tools. such libraries try to promote the development of their parent structures and to build up modern competencies in order to better meet the needs and goals of these institutions. by introducing various management development tools, libraries synchronize their strategy with the strategy of the parent structure to achieve a synergistic effect.

the next important direction of library development is socialization. wanting to shed the antiquated understanding of the word library, many libraries conduct campaigns aimed at changing the image of the library in the minds of users, communities, and society. an important component of this systematic effort is building relationships with the target audience and creating user communities around the library whose members are not only its users but also supporters, friends, and promoters. building relationships with members of the scientific community allows libraries to reduce resistance to the changes that come with the introduction of scientific information management systems and to influence users positively, so that they adopt new tools in their usual activities, receive benefits, and become an active part of the process of structuring the scientific space.

recently, work with metadata has undergone some changes. the need to identify and structure data in the world scientific space means that metadata are now filled in not only by libraries but also by other organizations that produce, issue, and publish scientific results and scientific literature. scientists are beginning to make more active use of modern information standards in order to promote their own work. libraries, in turn, take on the role of consultant or contractor with many years of experience working with metadata and sufficient knowledge in this area. on the other hand, the filling in of metadata by users frees up librarians' time and creates conditions for them to perform other functions, such as information management and the creation of automated data collection and management systems integrated with scientific databases, both ukrainian and international.

another area of research information management is the direct management of this process. cris are developed and implemented with the contribution of scientific libraries in different countries of the world. this allows libraries to combine disparate data obtained from different sources, compile scientific reports, evaluate the effectiveness of the scientific activities of the institution, create profiles of scientific institutions and scientists, develop research networks, etc. scientists and students can find the results of scientific research and look for partners and sources of funding for research. research managers have access to up-to-date scientific information, which allows them to assess more accurately the productivity and influence of individual scientists, research groups, and institutions. business representatives get access to up-to-date information on promising scientific developments, and the public gains a way to monitor how effectively research is being conducted.
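one way to picture the "combining disparate data" task mentioned above is as a merge of records about the same publications arriving from different sources, keyed on a persistent identifier. the sketch below is a minimal illustration under that assumption; the normalization and merge rules are hypothetical, not a prescription for any particular cris.

    # a minimal sketch of combining records about the same publications that
    # arrive from different sources (for example, an institutional repository
    # and an external index), keyed on a normalized doi.
    def normalize_doi(doi):
        """lowercase a doi and strip common url prefixes."""
        doi = doi.strip().lower()
        for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
            if doi.startswith(prefix):
                doi = doi[len(prefix):]
        return doi

    def merge_sources(*sources):
        """merge lists of records into one record per doi; later sources
        fill in fields that earlier sources left empty."""
        merged = {}
        for source in sources:
            for record in source:
                key = normalize_doi(record.get("doi", ""))
                if not key:
                    continue  # records without a doi need manual matching
                target = merged.setdefault(key, {})
                for name, value in record.items():
                    if value and not target.get(name):
                        target[name] = value
        return merged

    repository = [{"doi": "10.1234/abc", "title": "sample study"}]
    index = [{"doi": "https://doi.org/10.1234/ABC", "year": 2020}]
    print(merge_sources(repository, index))
    # -> {'10.1234/abc': {'doi': '10.1234/abc', 'title': 'sample study', 'year': 2020}}

the library's contribution in practice is less the code than the rules it encodes: deciding which source is authoritative for which field, and what happens to records that cannot be matched automatically.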
conclusions
ukraine is on the path to a digital economy, characterized by the penetration of new technologies into all areas of human activity, simplified access to information, goods, and services, the blurring of companies' geographical boundaries, an increasing share of automated and robotic production units, and a strengthening role for the creation and use of databases. these changes affect all sectors of the economy, and all organizations, without exception, need to adapt accordingly. rapid response to these changes helps to increase competitiveness both at the level of individual organizations and at the level of the state economy. adaptation to the conditions of the digital economy occurs through digital transformation: a complex process that requires a review of all business processes of the organization and radically changes its business model. the digital transformation of an organization takes place through the involvement of management that is competent in digitization, updating management methods, developing digital skills, establishing efficient production and services, implementing digital tools and building digital communication, implementing individual development projects, and adapting to new user needs. the digital transformation of the economy occurs through the transformation of its individual sectors, creating conditions for the transformation of their representatives.

one of the first steps in the process of transition to the digital economy is the establishment of digital information and communication infrastructure. libraries are representatives of the information sphere and were the main operators of information in the analogue era. significant changes in the subject area of their activities require the search for a new role for libraries. modern projects and directions of library development are integral elements of transformation to the conditions of the digital economy. completing this complex implementation will allow libraries to update their management methods, the range of services, and the channels of their provision; change their fixed assets through digitization, structuring of data, and creation of metadata; rework their approaches to communication with users and to cooperation with both domestic and international partners; change the functions and positioning of the library; and become effective information operator-managers.

in the digital economy, the role of the library is changing from passively collecting and storing information to actively managing it. one of the areas of development that most comprehensively meets this role is the management of research data, which is implemented through the creation of cris. in this model, the main asset of libraries is a digital, structured database, automatically and regularly updated, whose main purpose is to support the decision-making process. the library becomes an assistant in conducting research and in finding funding, partners, fixed assets, and information; it becomes a partner in the strategic management of both scientific organizations and the state at the level of committees and ministries. the development of this area in ukraine requires solving a number of technical, administrative, and managerial questions that are relevant not only in ukraine but also around the world.
in particular, libraries need to address the issues of data integration and consistency, data accessibility and openness, copyright, and personal data. the problems of creating and operating cris in ukraine are promising areas for future research.

endnotes

1 andriy dobrynin, konstantin chernykh, vasyl kupriyanovsky, pavlo kupriyanovsky, and serhiy sinyagov, "tsifrovaya ekonomika—razlichnyie puti k effektivnomu primeneniyu tehnologiy (bim, plm, cad, iot, smart city, big data i drugie)," international journal of open information technologies 4, no. 1 (2016): 4–10, https://cyberleninka.ru/article/n/tsifrovaya-ekonomika-razlichnye-puti-k-effektivnomu-primeneniyu-tehnologiy-bim-plm-cad-iot-smart-city-big-data-i-drugie.

2 jurgen meffert, volodymyr kulagin, and alexander suharevskiy, digital @ scale: nastolnaya kniga po tsifrovizatsii biznesa (moscow: alpina, 2019).

3 victoria apalkova, "kontseptsiia rozvytku tsyfrovoi ekonomiky v yevrosoiuzi ta perspektyvy ukrainy," visnyk dnipropetrovskoho universytetu. seriia «menedzhment innovatsii» 23, no. 4 (2015): 9–18, http://nbuv.gov.ua/ujrn/vdumi_2015_23_4_4.

4 don tapscott, the digital economy: promise and peril in the age of networked intelligence (new york: mcgraw-hill, 1996).

5 thomas l. mesenbourg, measuring the digital economy (washington, dc: bureau of the census, 2001).

6 philippe barbet and nathalie coutinet, "measuring the digital economy: state-of-the-art developments and future prospects," communications & strategies, no. 42 (2001): 153, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.576.1856&rep=rep1&type=pdf.

7 alnoor bhimani, "digitization and accounting change," in management accounting in the digital economy, ed. alnoor bhimani, 1–12 (london: oxford university press, 2003), https://doi.org/10.1093/0199260389.003.0001.

8 bo carlsson, "the digital economy: what is new and what is not?," structural change and economic dynamics 15, no. 3 (september 2004): 245–64, https://doi.org/10.1016/j.strueco.2004.02.001.

9 john hand, "building digital economy—the research councils programme and the vision," lecture notes of the institute for computer sciences, social informatics and telecommunications engineering 16 (2009): 3, https://doi.org/10.1007/978-3-642-11284-3_1.

10 carmen nadia ciocoiu, "integrating digital economy and green economy: opportunities for sustainable development," theoretical and empirical researches in urban management 6, no. 1 (2011): 33–43, https://www.researchgate.net/publication/227346561.
11 lesya zenoviivna kit, "evoliutsiia merezhevoi ekonomiky," visnyk khmelnytskoho natsionalnoho universytetu. ekonomichni nauky, no. 3 (2014): 187–94, http://nbuv.gov.ua/ujrn/vchnu_ekon_2014_3%282%29__42.

12 mykhailo voinarenko and larysa skorobohata, "merezhevi instrumenty kapitalizatsii informatsiino-intelektualnoho potentsialu ta innovatsii," visnyk khmelnytskoho natsionalnoho universytetu. ekonomichni nauky, no. 3 (2015): 18–24, http://elar.khnu.km.ua/jspui/handle/123456789/4259.

13 yurii pivovarov, "ukraina perekhodyt na 'tsyfrovu ekonomiku': shcho tse oznachaie," ed. miroslav liskovuch, ukrinform (january 21, 2020), https://www.ukrinform.ua/rubric-society/2385945-ukraina-perehodit-na-cifrovu-ekonomiku-so-ce-oznacae.html.

14 european commission, "digital economy and society index," brussels, belgium, https://ec.europa.eu/commission/news/digital-economy-and-society-index-2019-jun-11_en.

15 kabinet ministriv ukrainy, "pro skhvalennia kontseptsii rozvytku tsyfrovoi ekonomiky ta suspilstva ukrainy na 2018–2020 roky ta zatverdzhennia planu zakhodiv shchodo yii realizatsii" (kyiv: 2018), https://zakon.rada.gov.ua/laws/show/67-2018-%d1%80.

16 kabinet ministriv ukrainy, "pytannia ministerstva tsyfrovoi transformatsii" (kyiv: 2019), https://zakon.rada.gov.ua/laws/show/856-2019-%d0%bf.

17 5 kanal, "biblioteky stanut pershymy oflain-khabamy: mintsyfry zapustyt kursy z tsyfrovoi osvity," https://www.5.ua/suspilstvo/biblioteky-stanut-pershymy-oflain-khabamy-mintsyfry-zapustyt-kursy-z-tsyfrovoi-osvity-206206.html.

18 jacques bughin, jonathan deakin, and barbara o'beirne, "digital transformation: improving the odds of success," mckinsey & company, https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/digital-transformation-improving-the-odds-of-success.

19 domynyk fyld, shylpa patel, and henry leon, "kak dostich tsifrovoy zrelosti," the boston consulting group inc. (2018), https://www.thinkwithgoogle.com/_qs/documents/5685/ru_adwords_marketing___sales_891609_mastering_digital_marketing_maturity.pdf.

20 hortense de la boutetière, alberto montagner, and angelika reich, "unlocking success in digital transformations," mckinsey & company, https://www.mckinsey.com/business-functions/organization/our-insights/unlocking-success-in-digital-transformations.

21 top lead, "tsyfrova transformatsiia biznesu: navishcho vona potribna i shche 14 pytan," businessviews, https://businessviews.com.ua/ru/business/id/cifrova-transformacija-biznesu-navischo-vona-potribna-i-sche-14-pitan-2046.
22 vasily kupriyanovsky, andrey dobrynin, sergey sinyagov, and dmitry namiot, "tselostnaya model transformatsii v tsifrovoy ekonomike—kak stat tsifrovyimi liderami," international journal of open information technologies 5, no. 1 (2017): 26–33, https://cyberleninka.ru/article/n/tselostnaya-model-transformatsii-v-tsifrovoy-ekonomike-kak-stat-tsifrovymi-liderami.

23 nataliia kraus, alexander holoborodko, and kateryna kraus, "tsyfrova ekonomika: trendy ta perspektyvy avanhardnoho kharakteru rozvytku," efektyvna ekonomika no. 1 (2018): 1–7, http://www.economy.nayka.com.ua/pdf/1_2018/8.pdf.

24 david bawden and ian rowlands, "digital libraries: assumptions and concepts," international journal of libraries and information studies (libri), no. 49 (1999): 181–91, https://doi.org/10.1515/libr.1999.49.4.181.

25 jack m. maness, "library 2.0: the next generation of web-based library services," logos 13, no. 3 (2006): 139–45, https://doi.org/10.2959/logo.2006.17.3.139.

26 woody evans, building library 3.0: issues in creating a culture of participation (oxford: chandos publishing, 2009).

27 younghee noh, "imagining library 4.0: creating a model for future libraries," the journal of academic librarianship 41, no. 6 (november 2015): 786–97, https://doi.org/10.1016/j.acalib.2015.08.020.

28 helle guldberg et al., "library 5.0," septentrio conference series, uit the arctic university of norway, no. 3 (2020), https://doi.org/10.7557/5.5378.

29 denys solovianenko, "akademichni biblioteky u novomu sotsiotekhnichnomu vymiri. chastyna chetverta.
suchasnyi riven dyskursu akademichnoho bibliotekoznavstva ta postup e-nauky," bibliotechnyi visnyk no. 1 (2011): 8–24, http://journals.uran.ua/bv/article/view/2011.1.02.

30 olga petrivna stepanenko, "perspektyvni napriamy tsyfrovoi transformatsii v konteksti rozbudovy tsyfrovoi ekonomiky," in modeliuvannia ta informatsiini systemy v ekonomitsi: zb. nauk. pr., ed. v. k. halitsyn (kyiv: kneu, 2017), 120–31, https://ir.kneu.edu.ua/bitstream/handle/2010/23788/120-131.pdf?sequence=1&isallowed=y.

31 michal indrák and lenka pokorná, "analysis of digital transformation of services in a research library," global knowledge, memory and communication (2020), https://doi.org/10.1108/gkmc-09-2019-0118.

32 irina sergeevna koroleva, "biblioteka—optimalnaya model vzaimodeystviya s polzovatelyami v usloviyah tsifrovoy ekonomiki," informatsionno-bibliotechnyie sistemyi, resursyi i tehnologii no. 1 (2020): 57–64, https://doi.org/10.20913/2618-7515-2020-1-57-64.

33 james currall and michael moss, "we are archivists, but are we ok?," records management journal 18, no. 1 (2008): 69–91, https://doi.org/10.1108/09565690810858532.

34 kirralie houghton, marcus foth, and evonne miller, "the local library across the digital and physical city: opportunities for economic development," commonwealth journal of local governance no. 15 (2014): 39–60, https://doi.org/10.5130/cjlg.v0i0.4062.

35 sharon farnel and ali shiri, "community-driven knowledge organization for cultural heritage digital libraries: the case of the inuvialuit settlement region," advances in classification research online no. 1 (2019): 9–12, https://doi.org/10.7152/acro.v29i1.15453.

36 elizabeth tait, konstantina martzoukou, and peter reid, "libraries for the future: the role of it utilities in the transformation of academic libraries," palgrave communications no. 2 (2016): 1–9, https://doi.org/10.1057/palcomms.2016.70.

37 tatiana alexandrovna kolesnykova, "suchasna biblioteka vnz: modeli rozvytku v umovakh informatyzatsii," bibliotekoznavstvo. dokumentoznavstvo. informolohiia no. 4 (2009): 57–62, http://nbuv.gov.ua/ujrn/bdi_2009_4_10.

38 ekaterina kudrina and karina ivina, "digital environment as a new challenge for the university library," bulletin of kemerovo state university. series: humanities and social sciences 2, no. 10 (2019): 126–34, https://doi.org/10.21603/2542-1840-2019-3-2-126-134.

39 anna kochetkova, "tsyfrovi biblioteky yak oznaka xxi stolittia," svitohliad no. 6 (2009): 68–73, https://www.mao.kiev.ua/biblio/jscans/svitogliad/svit-2009-20-6/svit-2009-20-6-68-kochetkova.pdf.
40 victoria alexandrovna kopanieva, "naukova biblioteka: vid e-katalohu do e-nauky," bibliotekoznavstvo. dokumentoznavstvo. informolohiia no. 6 (2016): 4–10, http://nbuv.gov.ua/ujrn/bdi_2016_3_3.

41 christy r. stevens, "reference reviewed and re-envisioned: revamping librarian and desk-centric services with libstars and libanswers," the journal of academic librarianship 39, no. 2 (march 2013): 202–14, https://doi.org/10.1016/j.acalib.2012.11.006.

42 samuel kai-wah chu and helen s. du, "social networking tools for academic libraries," journal of librarianship and information science 45, no. 1 (february 17, 2012): 64–75, https://doi.org/10.1177/0961000611434361.

43 acrl research planning and review committee, "2018 top trends in academic libraries: a review of the trends and issues affecting academic libraries in higher education," c&rl news 79, no. 6 (2018): 286–300, https://doi.org/10.5860/crln.79.6.286.

44 currall and moss, "we are archivists, but are we ok?," 69–91, https://doi.org/10.1108/09565690810858532.

45 valerii fishchuk et al., "ukraina 2030e—kraina z rozvynutoiu tsyfrovoiu ekonomikoiu," ukrainskyi instytut maibutnoho, 2018, https://strategy.uifuture.org/kraina-z-rozvinutoyu-cifrovoyu-ekonomikoyu.html.

46 eurocris, "search the directory of research information systems (dris)," https://dspacecris.eurocris.org/cris/explore/dris.

47 mon, "mon zapustylo novyi poshukovyi servis dlia naukovtsiv—vin bezkoshtovnyi ta bazuietsia na vidkrytykh danykh z usoho svitu," https://mon.gov.ua/ua/news/mon-zapustilo-novij-poshukovij-servis-dlya-naukovciv-vin-bezkoshtovnij-ta-bazuyetsya-na-vidkritih-danih-z-usogo-svitu.

48 nancy herther et al., "text and data mining contracts: the issues and needs," proceedings of the charleston library conference, 2016, https://doi.org/10.5703/1288284316233.

49 karen hogenboom and michele hayslett, "pioneers in the wild west: managing data collections," portal: libraries and the academy 17, no. 2 (2017): 295–319, https://doi.org/10.1353/pla.2017.0018.

50 philip young et al., "library support for text and data mining," a report for the university libraries at virginia tech, 2017, http://bit.ly/2fccowu.

51 carol tenopir et al., "data sharing by scientists: practices and perceptions," plos one 6, no. 6 (2011), https://doi.org/10.1371/journal.pone.0021101.
52 filip kruse and jesper boserup thestrup, "research libraries' new role in research data management, current trends and visions in denmark," liber quarterly 23, no. 4 (2014): 310–35, https://doi.org/10.18352/lq.9173.

53 american economic review, "data and code," aer guidelines for accepted articles: instructions for preparation of accepted manuscripts, 2020, https://www.aeaweb.org/journals/aer/submissions/accepted-articles/styleguide#iic.

54 "data access and retention," the publication ethics and malpractice statement (new york: marsland press, 2019), http://www.sciencepub.net/marslandfile/ethics.pdf.

55 patricia cleary et al., "text mining 101: what you should know," the serials librarian 72, no. 1–4 (may 2017): 156–59, https://doi.org/10.1080/0361526x.2017.1320876.

56 rebecca bryant et al., practices and patterns in research information management: findings from a global survey (dublin: oclc research, 2018), https://doi.org/10.25333/bgfg-d241.

letter from the editors
kenneth j. varnum and marisha c. kelly
information technology and libraries | september 2022
https://doi.org/10.6017/ital.v41i3.15559

as summer turns to fall—too soon for the editor of this journal—the northern hemisphere's academic calendar is getting started. after the past two years of covid-directed activities, it feels good to be returning to a more typical start to the year. while we're not out of the pandemic woods yet, by any means, it does feel as if we've turned a corner. some of us have returned to pre-covid modes of working and socializing, while others are finding the return to the status quo ante a bit more challenging. if the pandemic has shown us one thing, it's the power—and limitations—of technology to adapt to a changed world, and of our human ability to adapt, or not, to new habits of technology. we've covered many of these responses of adaptation in the pages of this journal over the past two years and expect that many more innovations and lessons learned will be shared here in the years to come. technological change is a never-ending process. information technology and libraries will continue to share the ways cultural memory institutions adapt, respond, and react to changes in the technological tools we use. as always, if you have lessons learned about technologies and their effect on our mission, we'd like to hear from you. our call for submissions outlines the topics and process for submitting an article for review.
if you have questions or wish to bounce ideas off the editor and assistant editor, please contact either of us at the email addresses below.

this issue's contents
the september "public libraries leading the way" column, "the first 500 mistakes you will make while streaming on twitch.tv" by chris markman, kasper kimura, and molly wallner of the palo alto (california) public library, is all about lessons learned. the authors summarize the many things they discovered while launching, managing, and sustaining a library presence on twitch. our peer-reviewed content this month showcases topics including public library broadband connectivity, two articles on aspects of chat reference, digitization, library management, and a learning object repository to support a cross-institutional, land-based, multidisciplinary academic initiative.
1. measuring library broadband networks to address knowledge gaps and data caps / chris ritzo, colin rhinesmith, and jie jiang
2. perceived quality of whatsapp reference service: a quantitative study from user perspectives / yan guo, apple hiu ching lam, dickson k. w. chiu, and kevin k. w. ho
3. library management practices in the libraries of pakistan: a detailed retrospective / asim ullah, shah khusro, and irfan ullah
4. navigating uncharted waters: utilizing innovative approaches in legacy theses and dissertations digitization at the university of houston libraries / annie wu, taylor davis-van atta, bethany scott, santi thompson, anne washington, jerrell jones, andrew weidner, a. laura ramirez, and marian smith
5. using machine learning and natural language processing to analyze library chat reference transcripts / yongming wang
6. an omeka s repository for place- and land-based teaching and learning / neah ingram-monteiro and ro mckernan

kenneth j. varnum, editor (varnum@umich.edu)
marisha c. kelly, assistant editor (marisha.librarian@gmail.com)

public libraries leading the way
service barometers: using lending kiosks to locate patrons
william yarbrough
information technology and libraries | june 2021
https://doi.org/10.6017/ital.v40i2.13499

public libraries have been using lending kiosks for close to ten years now. typically, kiosks are used as a sort of satellite collection, delivering library services directly to the community. often, they target people with limited mobility or who lack reliable transportation. reaching these underserved populations helps expand a library's service area and user base, which can factor into state aid. but when amanda jackson first took over as director of the chesapeake public library, back in 2018, one of her first ideas involved using lending kiosks in a way that's slightly unconventional.
chesapeake is the second largest city in the commonwealth of virginia, at 350 square miles. while considered suburban, the city is also plenty rural, with large areas of farmland, forest, swamps, and river. one of those areas is southern chesapeake. spanning roughly 130 square miles, southern chesapeake stretches all the way down to the border with north carolina. the closest library, however, is located on the northern end of the city, all the way up in great bridge. this library, which houses over 289,000 items (not to mention a law library and history room), is the largest in the city. more than 500,000 people visit it annually. still, for many of the 19,586 residents in southern chesapeake, it’s a bit of a hike. for a population this size, breaking ground on a new library branch would be warranted, especially since that population is growing. from 2018 to 2019, southern chesapeake saw its population increase by 1.62%, good for second highest among the city’s nine boroughs. but even a small library of, say, 12,000 square feet would cost $3 million—and that’s being conservative. justifying that big an expense, both to the city council and local taxpayers, requires proving a return on investment. “building a new library brings a lot of excitement and energy into the community,” jackson says. “but first, to support that decision, we need to better understand what south chesapeake needs from the library.” the first person jackson turned to was maiko medina, who heads up cpl’s it division. like jackson, medina has worked in libraries for close to 20 years, first as frontline staff before transitioning to it. while with the neighboring virginia beach public library system, he helped install a variety of new systems, including self-checkout kiosks. together, medina and jackson came up with a plan that uses lending kiosks as a type of service barometer. cpl would install kiosks all around south chesapeake, at city parks, community centers, police and fire stations, and local businesses. each kiosk would provide a selection of new and popular items from the library’s catalogue, which patrons could check out on the spot. “by studying how the kiosks are used, we’ll get a better idea for where our patrons are,” medina says. “it’ll also tell us what they’re interested in.” this plan was submitted as a capital project. at a proposed $113,000, the project would fund one initial lending kiosk, along with an accompanying holds locker. chesapeake city council approved this project for fiscal year 2020. installation of the first kiosk in southern chesapeake was then scheduled for 2021. however, when the covid-19 pandemic hit, jackson, medina, and the rest of the team at cpl recognized the need to speed the plan into action. “we kept hearing about how kids were struggling with virtual learning,” jackson says, “especially those in our underserved communities.” one of those communities is south norfolk. among the neighborhood’s 22,851 residents, 59.2% identify as black. 31% are 19 or younger. of those children, 39% are enrolled at a title i school. as schools were forced to move learning online, many of these students fell behind, whether because they lacked reliable internet, access to a home computer, or both. to meet this need, the neighboring dr. clarence v.
cuffee library was transformed into an outreach and innovation center. along with a business center, maker spaces, stem walls, and a rotating art gallery, the new-and-improved cuffee library came with a student learning center. through this service, students can schedule one-on-one tutoring appointments with library staff and local college students, either virtually or in person. of course, adding these new services required moving other things around. books, dvds, and other materials were redistributed to other libraries across the system. this may seem like an odd decision (after all, what’s a library without books?). but patrons weren’t using this library for materials; in fact, the number of items checked out from the collection (17,922) was significantly lower than at any of the other six branches. still, the library didn’t want to abandon those patrons who rely on cuffee for more traditional services, especially since many were likely stuck at home during the pandemic. to meet this need, rather than wait until 2021, the library secured additional funding through the cares act to install a lending kiosk right outside the center’s main entrance last october, shortly after the newly renovated dr. clarence v. cuffee outreach and innovation center opened. with this kiosk—a lendit 200 (courtesy of d-tech international)—patrons can check out from a rotating list of 200 items. they can also get any other item in the library’s collection by using the holdit locker (also from d-tech). both services are free and are available 24 hours a day, 7 days a week. so far, since last december, 136 items have been checked out through the cuffee lendit kiosk. another 304 have been checked out via the holds locker. among those checkouts, the most popular categories are adult nonfiction dvds (33%) and adult fiction books (20%). as a result, more of these items will be rotated into the kiosk’s collection. not only that, but next month the library will break ground on another, bigger lending kiosk. located at fire station 7, this kiosk will be the first lendit 500 installed in south chesapeake. “the lending kiosk has really helped us continue to serve our patrons during all the changes brought on over the past 18 months,” say both jackson and medina. “now that things are starting to move ahead a little, we’re excited for how this technology will help us reach more of chesapeake.” over the next couple of years, chesapeake public library will use these lending kiosks to learn more about what the growing number of people in south chesapeake need from the library. maybe that’s more kiosks, a small storefront, or even a full-sized, brick-and-mortar building. either way, new and innovative technologies like the lending kiosks will lead the way, helping cpl deliver services further into the community.
president’s message
andromeda yelton
information technology and libraries | june 2018 https://doi.org/10.6017/ital.v37i2.10493
andromeda yelton (andromeda.yelton@gmail.com) is lita president 2017–18 and senior software engineer, mit libraries, cambridge, massachusetts.
as i started planning this column, i looked back over my other columns for the year and discovered that they have a theme: the connection that runs from our past, through our present, and into our future. in my first column, i talked about the first issues of ital: henriette avram founding marc right here in these pages.
early lita hackers cobbling together the technologies of their age to make streamlined, inventive library services—just as lita members do today. in my second column, i talked about conferences where we come together today—lita forum 2017 and 2018—and encounter the issues of today—data for black lives. i can close my eyes and i’m in denver, chatting with long-time colleagues and first-time presenters ... or i’m at the mit media lab, watching algorithmic opportunity and injustice spar with one another, while artists and poets point us toward the wakandan imaginary. and in my third column, i talked about the possibility of lita, llama, and alcts coming together to form a new division: a potential future. this possibility both knocked my world off its axis and let me see it in a new light; i didn’t imagine that i’d spend my presidency exploring the options for large-scale organizational transformation, and yet i can see how this route could not only address challenges all three divisions face, but also give us opportunities to be stronger together. i believe in this roadmap, but i also want us all to grapple with the question of identity. what’s peripheral, and what’s central, to who we are as library technologists? what’s ephemeral, and what endures? what’s the through line we can hold on to, across that past and present, and carry with us into the future? today, here in the present, i’m preparing to turn over my piece of that line to president-elect bohyun kim. she has been unfailingly brilliant and diligent in the years i’ve known her, and i know she’ll ask insightful questions, advocate for all that’s best in lita and its people, and get things done. but i’m also cognizant that it was never really my line; it was yours. i had the immense privilege of carrying it for a while, but as we hear every time we survey our members, the best part of lita is the networking—it’s you. we will have many chances to discuss our through line in the months to come, and i urge you to bring your voices to the table: ask your questions, tell us what matters, and depict your imaginaries.
articles
likes, comments, views: a content analysis of academic library instagram posts
jylisa doney, olivia wikle, and jessica martinez
information technology and libraries | september 2020 https://doi.org/10.6017/ital.v39i3.12211
jylisa doney (jylisadoney@uidaho.edu) is social sciences librarian, university of idaho. olivia wikle (omwikle@uidaho.edu) is digital initiatives librarian, university of idaho. jessica martinez (jessicamartinez@uidaho.edu) is science librarian, university of idaho. © 2020.
abstract
this article presents a content analysis of academic library instagram accounts at eleven land-grant universities. previous research has examined personal, corporate, and university use of instagram, but fewer studies have used this methodology to examine how academic libraries share content on this platform and the engagement generated by different categories of posts. findings indicate that showcasing posts (highlighting library or campus resources) accounted for more than 50 percent of posts shared, while a much smaller percentage of posts reflected humanizing content (emphasizing warmth or humor) or crowdsourcing content (encouraging user feedback).
crowdsourcing posts generated the most likes on average, followed closely by orienting posts (situating the library within the campus community), while a larger proportion of crowdsourcing posts, compared to other post categories, included comments. the results of this study indicate that libraries should seek to create instagram posts that include various types of content while also ensuring that the content shared reflects their unique campus contexts. by sharing a framework for analyzing library instagram content, this article will provide libraries with the tools they need to more effectively identify the types of content their users respond to and enjoy as well as make their social media marketing on instagram more impactful. introduction library use of social media has steadily increased over time; in 2013, 86 percent of libraries reported using social media to connect with their patron communities.1 the ways in which libraries use social media tend to vary, but common themes include marketing services, content, and spaces to patrons, as well as creating a sense of community.2 even with this wealth of research, fewer studies have examined how libraries use instagram, and those that do often utilize a formal or informal case study methodology.3 this research seeks to fill that gap by examining the types of content shared most frequently by a subset of academic library instagram accounts. although this research focused on academic libraries, its methods and findings could be leveraged by educational institutions and non-profits in their own investigations of instagram usage and impact. literature review since its inception in 2010, instagram’s number of account holders has been steadily increasing. by 2019, more than one billion user accounts were active each month, making it the third most popular social media network in the world, and the pew research center has reported that instagram is the second most used social media platform among people ages 18-29 in the united states, after facebook.4 instagram has estimated that 90 percent of user accounts follow at least one business account.5 previous research has also shown that individuals who use instagram to follow specific brands have the highest rates of engagement with, and commitment to, those mailto:jylisadoney@uidaho.edu mailto:omwikle@uidaho.edu mailto:jessicamartinez@uidaho.edu information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 2 brands when compared to users of other social media platforms.6 though businesses are fundamentally different in the products or services they are trying to market, academic libraries share a desire to provide information to, and engage with, their followers. as such, in the past decade, libraries have begun to adopt instagram as a way to market their libraries and interact with patrons.7 however, methods and parameters for libraries’ use of instagram vary across types of libraries and even within specific library types.8 research has demonstrated that academic libraries’ use of social media, including instagram, is often for the purpose of increasing the sense of community among librarians and patrons by marketing the library’s services and encouraging student feedback and interaction.9 similarly, harrison et al. 
discovered that academic library social media posts reflected three main themes: “community connections, inviting environment, and provision of content.”10 chatten and roughley have also reported that libraries’ use of social media ranges from providing customer service to promoting the library and building a community of users.11 indeed, when comparing modern social networking systems, such as instagram, to older platforms, such as myspace, fernandez posited that today’s popular social media sites encourage networking and are especially suited to creating community.12 ideally, community engagement in the virtual social media environment would encourage more patrons to enter the library and thus engage in more face-to-face encounters.13 libraries’ methods for measuring the success of their social media engagement are as varied as the ways in which they use social media. assessment of libraries’ social media efficacy is tricky, and highly variable from institution to institution. hastings has cautioned that librarians should recognize that patrons both actively and passively interact with social media content.14 for this reason, while a large number of comments or likes may be identified as positive markers for active engagement, passive forms of engagement, such as the number of times a post appeared in users’ instagram feeds, may also be relevant.15 therefore, when librarians measure the success of an instagram post by examining only the number of likes and comments, they should be aware that they are measuring a very specific type of engagement: one which, on its own, may not determine a post’s full reach or effectiveness. other ways to measure engagement include monitoring how the number of people subscribed to an account changes over time, evaluating reach and impressions,16 or analyzing the content of comments (a type of qualitative measure that may indicate the type of community developing around the library’s social media). despite, or perhaps because of, the general excitement surrounding the possibilities that libraries’ engagement with social media can produce, very little has been written about how different types of libraries (such as academic libraries, law libraries, public libraries, etc.), or libraries in general, use these platforms.17 additionally, many librarians may lack expertise in marketing, including those who are managing social media accounts.18 as social media culture continues to evolve, librarians should move toward a more targeted and pragmatic approach to their instagram practices. this refinement in social media practices may enable libraries to develop more structure, so that they may create and share the type of content that would achieve their desired result at a given time. however, in order to develop this kind of measured approach, it is necessary for researchers to first analyze libraries’ current instagram practices to determine how posts are being used and the outcomes they generate. one effective method of analyzing instagram content centers on coding and classifying images. while many such schemas have been developed for analyzing images posted by instagram users and businesses, transferring these schemas to academic contexts has been difficult. 19 to address information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 3 this gap, stuart et al. adapted a schema that had been used to examine how “news media [and] non-profits,” as well as businesses, used instagram.20 this new schema allowed stuart et al. 
to classify instagram posts produced by academic institutions in the uk and measure the effect of these universities’ attempts to engage with students via instagram.21 stuart et al.’s schema, which classified instagram images into six categories (orienting, humanizing, interacting, placemaking, showcasing, and crowdsourcing), was the basis for the present study.22 methods research questions the impetus for this study was to learn more about how academic libraries use instagram to connect with their campus communities and promote their services and events. the authors of the present study adapted the research questions posed by stuart et al. to reflect academic library contexts:23 • rq1: which type of post category is used most frequently by libraries on instagram? • rq2: is the number of likes or the existence of comments related to the post category? identifying a sample population this study investigated a small subset of academic institutions: the university of idaho’s sixteen peer institutions. these peers have similar “student profiles, enrollment characteristics, research expenditures, [or] academic disciplines and degrees”; each is designated as a land-grant institution; and the university of idaho considers three to be “aspirational peers.”24 after selecting this population, the authors investigated the library websites of each of the sixteen peer institutions to determine whether or not they had a library-specific instagram account. when a link was not available on the library websites, the authors conducted a search within instagram as well as a general google search in an attempt to identify these instagram accounts. of the university of idaho’s sixteen peer institutions, eleven had active, library-specific instagram accounts. data collection the authors undertook manual data collection between november and december 2018 for these eleven library instagram accounts. initial information about each instagram account was gathered prior to the study on october 23, 2018: the date of the first post, the total number of posts shared by the account, the total number of followers, and the total number of accounts followed. for each account, the authors identified posts shared from january 1, 2018, to june 30, 2018. the “print to pdf” function available in the chrome browser was used to preserve a record of the content, in case the accounts were later discontinued while research was underway. if a post included more than one image, only the first image was captured in the pdf and analyzed. to organize the 3 77 instagram posts shared within this timeframe, the authors assigned each institution a unique, fivedigit identifier; file names included this identifier as well as the date of the post (e.g. , 00004_igpost_20180423). this file naming convention ensured that posts were separated based on institution and that future studies could use the same file naming convention, even if the sample size increased significantly. the authors added the file names of all 377 instagram posts to a shared google sheet, and for each post they reported the kind of post (photo or video), the number of likes, and whether comments existed. information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 4 research data analysis content analysis this project adapted the coding schema stuart et al. employed to investigate the ways in which uk universities used instagram.25 expanding on research by mcnely, stuart et al. 
employed six instagram post categories: orienting, humanizing, interacting, placemaking, showcasing, and crowdsourcing.26 for the purposes of the present study, the authors used the same category names when coding library instagram posts. however, they updated and adapted the descriptions of each category over the course of two rounds of coding to better reflect academic library contexts (see table 1). within this coding schema, the authors elected to apply only a single category name (i.e., a code) to each library instagram post. interrater reliability during the first round of coding, the authors selected two or three institutions every month, independently coded the posts based on the initial adapted schema, met to discuss discrepancies, and identified the final code based on consensus.27 however, during these discussions, it became evident that there was substantial disagreement concerning how specific categories were interpreted. to examine the impact of this disagreement, the authors calculated fleiss’ kappa, which can be used to assess interrater reliability when two or more coders categorically evaluate data.28 although this project’s fleiss’ kappa (0.683554901) was relatively close to a score of 1.0, demonstrating moderate agreement between each of the three coders, the authors recognized that additional fine-tuning of the adapted coding schema would allow for a more accurate representation of the types of content shared by academic libraries. after updating the schema (table 1), a small sample of collected instagram posts (20 percent, or 76 posts) was randomly selected for independent recoding by each of the authors. again, after coding this random sample individually, the authors met to seek consensus. anecdotal feedback from the coders, as well as an increase in the project’s fleiss’ kappa (0.795494117), demonstrated that the updated coding schema was more robust and representative. based on this evidence, the authors randomly distributed the remaining 301 posts amongst themselves; each post was coded by one author. information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 5 table 1. coding schema for library instagram posts [adapted from: emma stuart, david stuart, and mike thelwall, “an investigation of the online presence of uk universities on instagram,” online information review 41, no. 5 (2017): 588, https://doi.org/10.1108/oir-02-2016-0057.] category description example1 crowdsourcing posts that were created with the intention of generating feedback within the platform. if the content of the post itself fits within a different classification category, but the image is accompanied by text that explicitly asks for viewer feedback, then the post should be classified as crowdsourcing. includes requests for followers to like, comment on, or tag others in a particular post. humanizing posts that aim to emphasize human character or elements of warmth, humor, or amusement. this includes historic/archival photos used to convey these sentiments. this code is only used if both the text and the photo or video can be categorized as humanizing because many library posts contain a “humanizing” element. 1 sample images from the university of idaho library’s instagram account. https://doi.org/10.1108/oir-02-2016-0057 information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 6 category description example1 interacting posts with candid photographs or videos at library and libraryassociated events. 
includes events within or outside the library. orienting posts that situate the library within its larger community, especially regarding locations, artifacts, or identities. text often includes geographic information. placemaking posts that capture the atmosphere of the library through its physical space and attributes. includes permanent murals, statues, etc. showcasing posts that highlight library or campus resources, services, or future events. can include current or on-going events if people are not the focus of the image (e.g., exhibit, highlight of collection, etc.). these posts can also present information about library operations, such as hours and fundraising. posts can also entice their audience to do something outside of instagram, such as visit a specific website.
results
general data about the library instagram accounts
as of october 23, 2018 (the date this initial information was gathered), the eleven academic library instagram accounts had shared a combined 3,124 posts. most libraries created their instagram accounts and started posting between 2013 and 2016, but one library shared a post in 2012 and one created its account in april 2018. since the date of their first post, each account had shared 284 posts on average, while the actual number of posts shared across accounts ranged from 62 to 520. the number of followers and accounts followed across these eleven accounts ranged from 115 to 1,390 and 65 to 2,717, respectively. between january 1, 2018, and june 30, 2018, these eleven library instagram accounts shared a total of 377 posts. the number of posts shared by each account during this time period ranged from four to 57, with an average of 34 posts.
rq1: which type of post category is used most frequently by libraries on instagram?
of the 377 posts analyzed, 359 included photos and 18 included videos. more than 50 percent of posts shared were coded as showcasing, with humanizing (18 percent) and crowdsourcing (9.8 percent) being the next most common categories (see table 2), although data demonstrated that individual libraries differed in their use of specific post categories (see table 3). when examining frequency based on category of post, the authors identified slight differences between video and photo posts. as with photos, the majority of videos (55.6 percent) were still coded as showcasing; however, the second most common post category for videos was interacting (16.7 percent).
table 2. number and percentage of posts by category for posts with photos or videos
category        number of posts    percentage of posts
crowdsourcing   38                 10.1%
humanizing      68                 18.0%
interacting     16                 4.2%
orienting       28                 7.4%
placemaking     33                 8.8%
showcasing      194                51.5%
total           377                100%
table 3. percentage of posts by category and library for posts with photos or videos
library   crowdsourcing   humanizing   interacting   orienting   placemaking   showcasing
lib 1     7.7%            15.4%        0%            23.1%       30.8%         23.1%
lib 2     4.2%            50.0%        0%            4.2%        0%            41.7%
lib 3     56.1%           10.5%        1.8%          3.5%        7.0%          21.1%
lib 4     0%              4.1%         4.1%          4.1%        2.0%          85.7%
lib 5     0%              24.4%        2.2%          20.0%       26.7%         26.7%
lib 6     7.5%            18.9%        3.8%          11.3%       11.3%         47.2%
lib 7     0%              20.0%        0%            0%          10.0%         70.0%
lib 8     0%              21.6%        9.8%          5.9%        0%            62.7%
lib 9     0%              25.0%        25.0%         0%          0%            50.0%
lib 10    0%              16.1%        6.5%          0%          9.7%          67.7%
lib 11    0%              15.0%        5.0%          5.0%        5.0%          70.0%
rq2: is the number of likes or the existence of comments related to the post category?
number of likes by category
the results of the coding process also indicated that the number of likes differed based on the category of post. when examining photo posts, the authors noted that every post received at least five likes, with most posts receiving between 20 and 39 likes (see table 4). on average, crowdsourcing photo posts generated the highest average number of likes across all categories, followed by orienting and placemaking posts (see table 5). however, it is important to recognize that crowdsourcing posts often asked visitors to participate in a post by “liking” it, often with the chance to win a library-sponsored contest, which may partially explain the higher average number of likes.
table 4. number of posts by category and range of likes for posts with photos (does not include posts with videos)
category        5-19   20-39   40-59   60-79   80-99   100-119   120-140
crowdsourcing   0      11      16      6       1       1         1
humanizing      16     26      10      9       5       0         1
interacting     5      5       3       0       0       0         0
orienting       2      7       9       8       0       1         0
placemaking     3      10      12      3       2       1         1
showcasing      67     83      27      5       1       0         1
total           93     142     77      31      9       3         4
table 5. average number of likes by category for posts with photos (does not include posts with videos)
category        average number of likes   number of posts
crowdsourcing   53.6                      36
humanizing      39.9                      67
interacting     27.8                      13
orienting       50.0                      27
placemaking     46.9                      32
showcasing      27.6                      184
existence of comments by category
the authors also examined the existence of comments, another metric for engagement with instagram posts. data demonstrated that 78.9 percent of crowdsourcing posts included comments, while a much lower percentage of placemaking (30.3 percent), orienting (28.6 percent), and humanizing (26.5 percent) posts generated this type of engagement (see table 6). as with the data on the number of “likes,” many crowdsourcing posts encouraged visitors to comment on a particular post, at times with an incentive connected to this type of engagement.
table 6. presence of comments by category for posts with photos or videos
category        number of posts with comments   number of posts without comments   total number of posts   percentage of posts with comments
crowdsourcing   30                              8                                  38                      78.9%
humanizing      18                              50                                 68                      26.5%
interacting     3                               13                                 16                      18.8%
orienting       8                               20                                 28                      28.6%
placemaking     10                              23                                 33                      30.3%
showcasing      40                              154                                194                     20.6%
total           109                             268                                377                     28.9%
discussion
as noted previously, the post category used most frequently by these eleven libraries on instagram was showcasing (51.5 percent).
the fact that libraries were more likely to share this type of content—which highlighted library resources, events, or collections—is understandable, as library promotion is one of the foundational reasons libraries spend the time and effort required to maintain social media accounts.29 this finding differs substantially from previous research with uk universities, which classified only 28.8 percent of posts as showcasing.30 when examining other post categories, it also became clear that uk universities shared humanizing posts more frequently (31 percent) than the eleven libraries (18 percent) included in this study.31 although the results of this study demonstrated that showcasing posts were shared most often, the data also indicates that showcasing posts were neither the category with the most likes on average nor the category that received comments most often. crowdsourcing posts were the category with the highest average number of likes (53.6) with orienting posts coming in at a close second (50), followed by placemaking (46.9) and humanizing (39.9) posts. showcasing posts, along with interacting posts, only generated slightly more than half the number of likes on average, when compared to the other categories (27.6 and 27.8, respectively). the category with the largest proportion of comments was crowdsourcing posts, with 78.9 percent of posts in this category generating comments from visitors. however, this result is likely skewed, as one of the library instagram accounts had exceptionally successful crowdsourcing posts, which often included a giveaway or other incentive for participation. in fact, when this institution was removed from the data set, only six crowdsourcing posts remained, two of which generated information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 11 comments. to better determine whether crowdsourcing posts are always this effective at generating engagement, it would be necessary to code a larger sample of instagram pos ts. it is clear that while showcasing posts were the most common among the instagram accounts analyzed, they also received the lowest number of likes, on average, and generated comments less frequently than all but one post category. while this may seem disheartening, it is important to remember that the showcasing category includes informational posts that convey library hours, services, or closures; this information that may be effectively relayed to users without necessitating an active response in the form of likes and comments. therefore, one might use different criteria to determine the success of showcasing posts, perhaps examining instagram data related to reach (the total number of unique visitors that view a post) and impressions (the total number of times a post is viewed).32 data on reach and impressions are only available to instagram account “owners.” in the current study, the authors did not quantify these types of engagement as their goal was to evaluate the content and metrics available to all instagram users, rather than the data that was only available to the “owners” of these library instagram accounts. in addition to answering the research questions, coding these instagram posts prompted several new questions regarding the types of information libraries and other institutions share online. one such question includes: with both universities and academic libraries working with students, why did academic libraries share a smaller percentage of interacting posts than uk universities? 
33 additional research is needed to answer this question, but anecdotally, this difference may be related to the fact that universities, as a whole, have a larger number of opportunities to promote and share instances of interaction via instagram than libraries. for example, general university instagram accounts often include photos of students and affiliates interacting at large scale events such as sports games, musical performances, and other student gatherings that take place across campus. library-specific accounts on the other hand, have fewer opportunities to post photos that capture individuals “interacting” candidly. further, the fact that libraries tend to be proponents of privacy rights may inhibit library staff from taking photos of their users and sharing them online without first getting permission. therefore, differences related to the number of events and the organization type may contribute to whether or not universities and libraries share interacting posts; more research is needed to examine this hypothesis. another issue that arose during coding was that, if not for their inclusion of a request to comment, many crowdsourcing posts could have been classified under other categories. if an account follower looked only at the photos included in many of the crowdsourcing posts without reading the captions, they may not interpret those posts as crowdsourcing. therefore, a future research project might examine whether applying secondary categories to crowdsourcing posts, as a means of further classifying images and not just their captions, could generate a more comprehensive picture of what libraries are sharing on their instagram accounts. the authors also discovered that a majority of the library instagram posts included in this sample contained humanizing elements. almost all posts attempted to convey warmth, humor, or assistance, and therefore had the potential to be classified as humanizing. to successfully adapt stuart et al.’s coding schema for academic library instagram accounts, the authors specified that a post had to have both a humanizing caption as well as a humanizing photo to be coded as such.34 as with crowdsourcing posts, adding secondary categories to humanizing posts could better reflect the dual nature of this content and help future coders more accurately interpret the types of content shared by academic libraries. information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 12 limitations and future research the number of library instagram accounts selected as well as the use of a six-month timeframe were limitations of the current study. in the future, selecting a larger sample size and a different group of academic libraries would serve to advance the discipline’s understanding of the types of content shared by academic libraries and how users interact with these instagram posts. additionally, collecting instagram posts shared during an expanded timeframe could allow researchers to explore whether library instagram accounts consistently share the same types of content at various points throughout the year. as mentioned in the discussion section, future research could also include adding secondary categories to posts, which would allow researchers to gather more granular information about the types of content shared and the relationships between post category, comments, and likes. 
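the kind of per-category tabulation reported in tables 5 and 6 (average likes and the share of posts with comments) is straightforward to reproduce on any set of coded posts, with or without secondary categories. the sketch below is purely illustrative: the category labels follow the schema used here, but the sample records, field names, and values are hypothetical and are not this study's data.

```python
from collections import defaultdict

# hypothetical coded posts; field names and values are illustrative only,
# not the data set analyzed in this study
coded_posts = [
    {"category": "showcasing",    "likes": 24, "has_comments": False},
    {"category": "crowdsourcing", "likes": 61, "has_comments": True},
    {"category": "humanizing",    "likes": 38, "has_comments": False},
    {"category": "orienting",     "likes": 47, "has_comments": True},
    {"category": "showcasing",    "likes": 19, "has_comments": False},
]

likes = defaultdict(list)    # likes observed per category
comments = defaultdict(int)  # count of posts with comments per category
totals = defaultdict(int)    # total posts per category

for post in coded_posts:
    cat = post["category"]
    likes[cat].append(post["likes"])
    totals[cat] += 1
    if post["has_comments"]:
        comments[cat] += 1

for cat in sorted(totals):
    avg_likes = sum(likes[cat]) / totals[cat]
    pct_with_comments = 100 * comments[cat] / totals[cat]
    print(f"{cat:13} n={totals[cat]:2}  avg likes={avg_likes:5.1f}  "
          f"with comments={pct_with_comments:5.1f}%")
```

adding a secondary category would only require a second field on each record and a second pass of the same grouping logic.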
lastly, to better understand the post categories that generate the greatest engagement, collaborative research between institutions could allow researchers to gather and analyze metrics that are only available to account owners, such as impressions and reach. with this type of collaboration, researchers could also investigate how social media outreach goals influence the types of content shared on library instagram accounts. for example, researchers could conduct interviews or surveys with libraries and ask questions such as: what does your library hope to accomplish with its instagram account, who are you attempting to reach, how do you define a successful post, what metrics do you use to evaluate your instagram presence, and do your social media outreach goals influence the types of content shared on instagram? pursuing these types of questions, in addition to examining the actual content shared, would allow researchers to gain a more complete picture of what a successful social media presence looks like for an academic library. conclusion this research provides initial insight into the instagram presence of a subset of academic libraries at land-grant institutions in the united states. expanding on the research of stuart et al., this project used an adapted coding schema to document and analyze the content and efficacy of academic libraries’ instagram posts.35 the results of this study suggest that social media accounts, including those used by academic libraries, perform better when they reflect the community the library inhabits by highlighting content that is unique to their particular constituents, rather than simply functioning as another platform through which to share information. this study’s findings also demonstrate that academic libraries should strive to create an instagram presence that encompasses a variety of post categories to ensure that their online information sharing meets various needs. information technology and libraries september 2020 likes, comments, views | doney, wikle, and martinez 13 endnotes 1 nancy dowd, “social media: libraries are posting, but is anyone listening?,” library journal 138, no. 10 (may 7, 2013), 12, https://www.libraryjournal.com/?detailstory=social-media-libraries-areposting-but-is-anyone-listening. 2 marshall breeding, next-gen library catalogs (london: facet publishing, 2010); zelda chatten and sarah roughley, “developing social media to engage and connect at the university of liverpool library,” new review of academic librarianship 22, no. 2/3 (2016), https://doi.org/10.1080/13614533.2016.1152985; amanda harrison et al., “social media use in academic libraries: a phenomenological study,” the journal of academic librarianship 43, no. 3 (2017), https://doi.org/10.1016/j.acalib.2017.02.014; nicole tekulve and katy kelly, “worth 1,000 words: using instagram to engage library users,” brick and click libraries symposium, maryville, mo (2013), https://ecommons.udayton.edu/roesch_fac/20; evgenia vassilakaki and emmanouel garoufallou, “the impact of twitter on libraries: a critical review of the literature,” the electronic library 33, no. 4 (2015), https://doi.org/10.1108/el03-2014-0051. 3 yeni budi rachman, hana mutiarani, and dinda ayunindia putri, “content analysis of indonesian academic libraries’ use of instagram,” webology 15, no. 
2 (2018), http://www.webology.org/2018/v15n2/a170.pdf; catherine fonseca, “the insta-story: a new frontier for marketing and engagement at the sonoma state university library,” reference & user services quarterly 58, no. 4 (2019), https://www.journals.ala.org/index.php/rusq/article/view/7148; kjersten l. hild, “outreach and engagement through instagram: experiences with the herman b wells library account,” indiana libraries 33, no. 2 (2014), https://journals.iupui.edu/index.php/indianalibraries/article/view/16633; julie lê, “#fashionlibrarianship: a case study on the use of instagram in a specialized museum library collection,” art documentation: bulletin of the art libraries society of north america 38, no. 2 (2019), https://doi.org/10.1086/705737; danielle salomon, “moving on from facebook: using instagram to connect with undergraduates and engage in teaching and learning,” college & research libraries news 74, no. 8 (2013), https://doi.org/10.5860/crln.74.8.8991.
4 “our story,” instagram, https://business.instagram.com/; chloe west, “17 instagram stats marketers need to know for 2019,” sprout blog, april 22, 2019, https://web.archive.org/web/20191219192653/https://sproutsocial.com/insights/instagram-stats/; pew research center, “social media fact sheet,” last modified june 12, 2019, http://www.pewinternet.org/fact-sheet/social-media/.
5 “our story,” instagram.
6 joe phua, seunga venus jin, and jihoon jay kim, “gratifications of using facebook, twitter, instagram, or snapchat to follow brands: the moderating effect of social comparison, trust, tie strength, and network homophily on brand identification, brand engagement, brand commitment, and membership intention,” telematics and informatics 34, no. 1 (2017), https://doi.org/10.1016/j.tele.2016.06.004.
7 fonseca, “the insta-story;” hild, “outreach and engagement;” lê, “#fashionlibrarianship;” rachman, mutiarani, and putri, “content analysis;” salomon, “moving on from facebook;” tekulve and kelly, “worth 1,000 words.”
8 vassilakaki and garoufallou, “the impact of twitter.”
9 breeding, next-gen library catalogs; hild, “outreach and engagement;” rachman, mutiarani, and putri, “content analysis;” vassilakaki and garoufallou, “the impact of twitter.”
10 harrison, burress, velasquez, and schreiner, “social media use,” 253.
11 chatten and roughley, “developing social media.”
12 peter fernandez, “‘through the looking glass: envisioning new library technologies’ social media trends that inform emerging technologies,” library hi tech news 33, no. 2 (2016), https://doi.org/10.1108/lhtn-01-2016-0004.
13 robin m. hastings, microblogging and lifestreaming in libraries (new york: neal-schuman publishers, 2010).
14 hastings, microblogging.
15 robert david jenkins, “how are u.s. startups using instagram? an application of taylor’s six-segment message strategy wheel and analysis of image features, functions, and appeals” (ma thesis, brigham young university, 2018), https://scholarsarchive.byu.edu/etd/6721.
16 lucy hitz, “instagram impressions, reach, and other metrics you might be confused about,” sprout blog, january 22, 2020, https://sproutsocial.com/insights/instagram-impressions/.
17 vassilakaki and garoufallou, “the impact of twitter.”
18 mark aaron polger and karen okamoto, “who’s spinning the library? responsibilities of academic librarians who promote,” library management 34, no. 3 (2013), https://doi.org/10.1108/01435121311310914.
19 yuhen hu, lydia manikonda, and subbarao kambhampati, “what we instagram: a first analysis of instagram photo content and user types,” eighth international aaai conference on weblogs and social media (2014), https://www.aaai.org/ocs/index.php/icwsm/icwsm14/paper/viewpaper/8118; jenkins, “how are u.s. startups using instagram?;” brian j. mcnely, “shaping organizational image-power through images: case histories of instagram,” proceedings of the 2012 ieee international professional communication conference, piscataway, nj (2012), https://doi.org/10.1109/ipcc.2012.6408624; emma stuart, david stuart, and mike thelwall, “an investigation of the online presence of uk universities on instagram,” online information review 41, no. 5 (2017): 584, https://doi.org/10.1108/oir-02-2016-0057.
20 stuart, stuart, and thelwall, “an investigation of the online presence;” mcnely, “shaping organizational image-power,” 3.
21 stuart, stuart, and thelwall, “an investigation of the online presence.”
22 stuart, stuart, and thelwall, “an investigation of the online presence,” 588.
23 stuart, stuart, and thelwall, “an investigation of the online presence,” 585.
24 “university of idaho’s peer institutions,” university of idaho, accessed october 8, 2019.
25 stuart, stuart, and thelwall, “an investigation of the online presence,” 588.
26 mcnely, “shaping organizational image-power,” 4; stuart, stuart, and thelwall, “an investigation of the online presence,” 588.
27 johnny saldaña, the coding manual for qualitative researchers (los angeles: sage publications, 2013), 27.
28 “fleiss’ kappa,” wikipedia, https://en.wikipedia.org/wiki/fleiss%27_kappa.
29 chatten and roughley, “developing social media.”
30 stuart, stuart, and thelwall, “an investigation of the online presence,” 590.
31 stuart, stuart, and thelwall, “an investigation of the online presence,” 590.
32 hitz, “instagram impressions, reach, and other metrics.”
33 stuart, stuart, and thelwall, “an investigation of the online presence,” 590.
34 stuart, stuart, and thelwall, “an investigation of the online presence,” 588.
35 stuart, stuart, and thelwall, “an investigation of the online presence.”
articles
user experience with a new public interface for an integrated library system
kelly blessinger and david comeaux
information technology and libraries | march 2020 https://doi.org/10.6017/ital.v39i1.11607
kelly blessinger (kblessi@lsu.edu) is head of access services, louisiana state university. david comeaux (davidcomeaux@lsu.edu) is systems and discovery librarian, louisiana state university.
abstract
the purpose of this study was to understand the viewpoints and attitudes of researchers at louisiana state university toward enterprise, the new public search interface from sirsidynix. fifteen university constituents participated in user studies to provide feedback while completing common research tasks. of particular interest to the librarian observers was identifying and characterizing the problems participants expressed as they used the new interface. this study was approached within the framework of cognitive load theory and user experience (ux). problems that were discovered are discussed along with remedies, in addition to areas for further study.
introduction
the library catalog serves as a gateway for researchers at louisiana state university (lsu) to access the print and electronic resources available through the library. in 2018 lsu, in collaboration with our academic library consortium (louis: the louisiana library network), upgraded to a new library catalog interface. this system, called enterprise, was developed by sirsidynix, which also provides symphony, an integrated library system (ils) long used by the lsu libraries. “sirsidynix and innovative interfaces are the two largest companies competing in the ils arena that have not been absorbed by one of the top-level industry players.”1 there were several reasons for the change. most importantly, sirsidynix made the decision to discontinue updates to the previous online public access catalog (opac), known as e-library, and to focus development on enterprise. in response to this announcement, the louis consortium chose to sunset the e-library opac in the summer of 2018. this was welcome news to many, especially the systems librarian, who had felt frustrated by the antiquated interface of the old opac as well as its limited potential for customization. the newer interface has a more modern design and includes features such as faceted browsing to better suit the twenty-first-century user. enterprise also delivers better keyword searching. this is largely because it uses the solr search platform, which operates on an inverted index. solr (pronounced “solar”) is based on open source indexing technology and is customizable, more flexible, and usually provides more satisfactory results to common searches than our previous catalog. inverted indexing can be conceptualized similarly to indexes within books. “instead of scanning the entire collection, the text is preprocessed and all unique terms are identified. this list of unique terms is referred to as the index. for each term, a list of documents that contain the term is also stored.”2 unlike the old catalog, which sorted results by date (newest to oldest), enterprise ranks results by relevance, like search engines. the new search is also faster because results are matched against the inverted index instead of whole records.3
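to make the inverted-index idea concrete, here is a minimal sketch in python. it is purely illustrative and uses a toy set of invented catalog records; it is not how solr or enterprise is implemented, but it shows the basic mechanism the quotation above describes: preprocess the collection once, map each unique term to the records containing it, and answer a keyword query by combining those lists instead of scanning every record.

```python
from collections import defaultdict

# a few toy catalog records; titles are invented for illustration
records = {
    1: "introduction to information retrieval",
    2: "library catalogs and discovery systems",
    3: "information literacy in academic libraries",
}

# build the inverted index: each unique term -> set of record ids containing it
index = defaultdict(set)
for record_id, text in records.items():
    for term in text.lower().split():
        index[term].add(record_id)

def search(query):
    """return the ids of records that contain every term in the query."""
    term_sets = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

print(search("information libraries"))  # {3}
```

a production engine such as solr layers text analysis (stemming, stop words), relevance scoring, and faceting on top of this same underlying structure.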
the authors wanted to investigate how well this new interface would meet users’ research needs. while library database and website usage patterns can be assessed through quantitative measures using web analytics, librarians are often unaware of the elements that cause frustration for users unless they are reported. prior to enterprise going live, the library’s head of access services solicited internal “power users” in the library to use the new interface. power users were identified as library personnel in units that used the catalog daily for their work. this group included interlibrary loan, circulation, and some participants in research and instruction services. these staff members were asked to use enterprise as their default search to help discover problems before it went live. a shared document was created in google drive for employees to leave feedback regarding their experiences and suggestions for improvements. the systems librarian had access to this folder, reviewed it periodically, and made warranted changes that were within his control. several changes were made based on feedback from the internal user group. these included adding the term “checked out” in addition to the due date, and the adjustment of features that were not available or working correctly in the advanced search mode, such as date, series, call number, and isbn. several employees were curious about differences between the results in the old system and enterprise due to the new algorithm. additionally, most internal users found the basic search too simplistic and not useful, so the advanced search mode was made the default search. among the suggestions, there was also praise for the new interface. these statements concerned elements of the user-enablement tools, such as “i was able to login using my patron information. i really like the way that part functions,” and areas where additional information was now available, such as “i do enjoy that it shows the table of contents—certainly helps with checking citations for ill.” while the feedback from internal stakeholders was helpful, the authors were determined to gather feedback from patrons as well. to obtain this feedback, the authors elected to conduct usability studies. usability testing employs representative users to complete typical tasks while observers watch, facilitate, and take notes.
the goal of this type of study is to collect both qualitative and quantitative data to explore problem areas and to gauge user satisfaction.4 enterprise includes an integration with ebsco discovery service (eds) to display results from the electronic databases subscribed to by the library as well as the library’s holdings. eds was implemented several years ago as a separate tool. the implementation team suspected that launching enterprise with periodical article search functionality might be confusing to those who were not accustomed to the catalog operating in this manner. therefore, for the initial roll-out, the discovery functionality was disabled in enterprise, leaving it to function strictly as a catalog to library resources. this decision will be revisited later. like many other academic libraries, eds is currently the default search for users from the lsu libraries homepage. other search interfaces, labeled “catalog,” “databases,” and “e-journals,” are also included as options in a tabbed search box. conceptual framework two schools of thought helped to frame this research inquiry: cognitive load theory and user experience (ux). cognitive load theory relates to the amount of new information a novice learner can take on at a given time due to limitations of the working memory. this theory originated in the field of instructional design in the late 1980s.5 the theory states that what separates novice learners from experts is that the latter know the background, or are familiar with the schema of a information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 3 problem, whereas novices start without this information. accordingly, experts are able to categorize and work with problems as they are presented, whereas new learners need to formulate their problem-solving strategies while encountering new knowledge. as a result, novices are quicker to max out the cognitive load in their working memories while trying to solve problems. ux emerged from the field of computer science and measures the satisfaction of users with their experience with a product. a 2010 article reviewed materials published in 65 separate studies with “cognitive load” in the title or abstract.6 early articles on cognitive load focused on learning and instructional development. these studies concentrated on limiting extraneous information (e.g. , materials and learning concepts), which affects the amount of information able to be held in the working memory.7 while the research that developed cognitive load theory centered on real-life problemsolving scenarios, later research focused on its impact in e-learning environments and learning regarding this delivery mode.8 in contrast to cognitive load theory, which was formed by academic study, the concept of ux was developed in response to user/customer satisfaction, particularly regarding electronic resources such as websites.9 user testing allows end users to provide realtime feedback to developers so they see the product working, and in particular, to note where it could be improved. ux correlates well with cognitive load theory for this study, as the concept arose with the widespread use of computers in the workplace and in homes in the mid-1990s. 
user studies user expectations have shifted beyond the legacy or “classic” opac, originally designed for use by experienced researchers with the primary goal of searching for known items.10 user feedback has historically been sought when libraries release new platforms and services and to gauge user satisfaction regarding research tools. “libraries seek fully web-based products without compromising the rich functionality and efficiencies embodied in legacy platforms.”11 a study by borbely used a combination of the log files of opac searches and a satisfaction questionnaire to determine which factors were most important to both professional and nonprofessional users. their findings indicated that task effectiveness, defined as the system returning relevant results, was the primary factor related to user satisfaction.12 many of the articles dealing with user studies and library holdings published in recent years have focused on next-generation catalogs (ngcs). this was defined in a 2011 study by 12 characteristics: “a single point of entry for all library resources, state of the art web interface, enriched content, faceted navigation, simple keyword search box with a link to advanced search box on every page, relevancy based on circulation statistics and number of copies, ‘did you mean . . .’ spell checking recommendations/related materials based on transaction logs, user contributions (tagging and ranking), rss feeds, integration with social networking sites, and persistent links.”13 catalogs defined as next-generation provide more options and functionality in a user-friendly, intuitive format. they are typically designed to more closely mimic web search engines, with which novice users are already familiar. tools within ngcs such as the faceted browsing of results have been reported as popular in user studies, especially among searchers without high levels of previous search experience. “faceted browsing offers the user relevant subcategories by which they can see an overview of results, then narrow their list.”14 a 2015 study interviewed 18 academic librarians and users to seek their feedback regarding new features made possible by ngcs. their findings indicate “that while the next-generation catalogue interfaces and information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 4 features are useful, they are not as ‘intuitive’ as some of the literature suggests, regardless of the users’ searching skills.”15 this study also indicated that users typically use the library catalog in combination with other tools such as google scholar or worldcat local both for ease of use and for a more complete review of the literature. while enterprise contains the twelve elements that yang and hofmann defined for a ngc, since the discovery element has been disabled, lsu libraries use of enterprise may be better described as an ils opac with faceted results navigation. while the implementation of discovery services and other web tools has shifted users to sources other than the catalog, many searchers often still prefer to use the library’s catalog. reasons for this may include familiarity with the interface, the ability to limit results to smaller numbers, or the unavailability of specific desired search options through other interfaces. problem statement the purpose of this study was to understand the viewpoints and attitudes of university stakeholders regarding a new interface to the online catalog. in particular, four areas were investigated: 1. 
identification of problems searching for books on general and distinct topics. 2. identification of problems searching for books with known titles and specific journal titles. 3. exploration of the usability of patron-empowerment features. 4. identification of other issues and/or frustrations (e.g., “pain points”). methodology three groups of users were identified for this study: undergraduate students, graduate students, and staff/faculty. the student participants were the easiest to recruit due to a fine forgiveness program that was initiated at lsu libraries in 2016. this program gives library users the option of completing user testing in lieu of some or all of their library fines (up to $10 per user test). all the student participants were recruited in this manner. additionally, five faculty/staff members identified as frequent library users were asked to participate. the participant pool for user testing included five undergraduate students, five graduate students, and five faculty and staff members. each of these groups had five participants, which is considered a best practice in user testing.16 the total sample studied for this study was 15 library users representing these three unique user groups. these participants are described in more detail in appendix a. for the observations, individuals were brought to the testing room in the library. this is a small neutral conference room with a table, laptop, and chairs for the librarian observers and participants. each participant was tested individually and was asked to speak aloud through their thought process as they used the new interface. the authors employed a technique known as “rapid iterative testing.” this type of testing involves updating the interface soon after problems are identified by a user or observer. thus, after each user test, confusing and extraneous information was removed applying cognitive load theory, improving the interface in alignment with the concept of ux. this approach helped to minimize the number of times participants repeatedly encountered the same issues. this framework makes this study more of a working study than a typical user study. a demonstration of this type of testing is included as figure 1. information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 5 before after figure 1. iterative testing model. this shows the logon screen before and after it was modified. this was based on the observation that users were unsure what information to enter here. the software usability studio was utilized to record audio and video of the participants’ electronic movements throughout each test. although the software can also record video of users throughout tests, the authors felt that this may make the participants uncomfortable and possibly more reluctant to openly share their opinions. at the beginning of each user study, participants were informed of the purpose of the study, the process, and the estimated time of the observation (30 to 45 minutes). the participants were then asked to sign a consent form for participation in the study. the interviews began with two open-ended pre-observation questions to gauge the users’ previous library experience. the first question asked students whether they had received library training in any of their courses, or, if the participant was a teaching staff or faculty member, if they regularly arranged library instruction for their students. 
the second question explored whether they had needed to use library resources in a previous assignment or had required these in one they had assigned. then volunteers were given a written list of four multi-part task-based exercises, detailed in appendix b. these exercises were designed to evaluate the areas of concern outlined in the problem statement and to let the users explore the system, helping the observers discover unforeseen issues. the observations ended with two follow-up questions that asked the participants to describe their experience with the new interface. they were asked what they liked and what they found frustrating. they were also asked if there were areas where they felt they needed more help, and how the design could be made more user friendly. after the testing was completed, the audio files were imported into temi, a transcription tool that provided the text of what the users and the observers said throughout the test periods. the authors reviewed these transcripts and the recorded videos of the users’ keystrokes within the system for further clarity. the process and all instruments involved were reviewed by the lsu institutional review board prior to the testing. all user tests took place from march through november 2018. findings previous library training and assignments three of the five undergraduate participants had received some previous library training from their professors or from a librarian who visited their classes. those who had training tended to recall specific databases they found useful, such as cq researcher or jstor. the assignments requiring library research mentioned by the undergraduate participants typically required several scholarly resources as a component of an assignment. four out of five of the graduate-student participants also indicated that they had some library training, and most also indicated they used the library frequently. two of the graduate-student participants, both studying music, mentioned that a research course was a required component of their degree program. the staff and faculty members tested mentioned that, depending on the format of the course, they either demonstrated resources to their students themselves, or would request a librarian to teach more extensive training. participant 13, a teaching staff member, mentioned that in a previous class she was “able to get our subject librarian to provide things to cater to the students, and they had research groups, so she [the subject librarian] was very helpful.” some of the teaching staff and faculty mentioned providing specific scholarly resources for their students. they acknowledged that since these were provided, their students did not gain hands-on experience finding scholarly materials themselves. participant 10, a faculty member, stated that she usually requires that students in one of her courses “do an annotated bibliography. i’ll require that they find six to ten sources in the library and usually require that at least three or four of those sources be on the shelf physically, because i want them to actually work with the books, and in addition, to avail themselves of electronic resources.” most of the staff and faculty participants indicated that, despite its weaknesses, they preferred using the online catalog over eds, mainly because eds included materials outside of our collection.
when asked to explain, participant 12, a staff member said, because i feel like [with] ebsco you get a ton of results, and you know, i’m still looking for stuff that you guys have. um, [however] because of the way the original catalog is, i feel like i have to go through discovery to get a pretty accurate search on what lsu has. because, when i do use the discovery search, it’s a lot more sensitive, or should i say maybe a lot less sensitive, and it will pick up a lot of results. . . . it searches your catalog really well, just like worldcat does. . . . so, if the catalog was the thing that was able to do that, that would be cool. if the catalog search was more intuitive and inviting, i wouldn’t even bother going to some of these other places. books: general and distinct topics the observers noticed multiple participants using or commenting on known techniques learned from experience with the old catalog interface. these included boolean operators such as and to connect terms within the results. enterprise does not include boolean logic in its searches. a goal of the structure for the new algorithm is to provide a search closer to natural language. while most of the student participants typically searched by keyword when searching for books on general topics, staff and faculty participants typically preferred to search within the subject or title fields. faculty and staff participants also actively utilized the library of congress subject heading links within records and said that they also recommended that their students find materials in this manner. participant 9, a faculty member, said that he usually told his students to “find one book, then go to the catalog record . . . where you’ll get the subject headings. because . . . you’re not going information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 7 to guess the subject headings just off the top of your head, and that's how they are organized. that's the best way of getting . . . your real list.” many users were able to deduce that a book was available in print based on the call number listed. however, some undergraduate-student participants were confused by links in individual records and assumed that these were links to electronic versions of the book. these primarily linked to additional information about the book, such as the table of contents. the observers also found that many students used the publication date limiter when searching for materials, while commenting that they typically favored recent publications, unless the subject matter made a historical perspective beneficial. the date limiter, while effective for books, is less effective for periodicals, which include a range of publication dates. more advanced researchers, such as one staff participant, enjoyed the option to sort their results by date, but sorted these by “oldest first” indicating that they did this to find primary documents. known title books none of the user groups tested had trouble locating a known book title within the new interface, although two undergraduate students remarked that they preferred to search by the author, if known, to narrow the results. most undergraduates determined if the books were relevant to their needs based on the title and did not explore the tables of contents or further information available. graduate students tended to be more sophisticated in their searches for relevance and used the additional resources available in the records. 
participant 12, a staff member, mentioned that he liked the new display of the results when he was searching for a specific book. while the old system contained brief title information in the results display, he believed the new version showed more pertinent information, such as the call number, in the initial results. he said, “and this is also great too, because the old system . . . you would bring up information, then there’s another tab you have to click on to get to the . . . real meat of it. so . . . this is really good to see if it’s a book, to know what the number is immediately, just to not have to go through so many clicks.” specific journals specific journal results were problematic and confusing in multiple ways. the task regarding journals directed users to find a specific journal title within enterprise, and then to determine whether 2018 issues were available and in what format (e.g., print or electronic). all the student users had trouble determining whether a journal was available in print or electronically and whether the year they needed was available. the task of finding a specific journal title and its available date range was also troublesome to many students. the catalog lists “publication dates” for journals prominently in the search results. however, these dates indicate the years that a journal was published, not the years that the library holds. users need to go into the full record for a journal to see the available years listed under the holding information. unfortunately, this was not intuitive to many. additionally, the presentation of years in records for journals was also unclear to some. for instance, participant 2, a freshman, did not understand that a dash next to an initial date (e.g., 2000–) indicated that the library held issues from 2000 to the present. many student users, especially those familiar with google scholar or eds, did not understand that journals are solely indexed in the catalog by the title of the journal. this is problematic for those who are accustomed to more access points for journals, such as article title and author. journals were additionally confusing because each format (e.g., print, electronic, or microfilm) has its own record in the catalog. typically, users clicked on the first record with the title that matched the one they required, and assumed that this was all that was available, rather than scrolling through the results to view the different formats and the varying years within. this issue was problematic for all the student participants. participant 5, a phd student, summed up this frustration by saying, “i find stuff like that sometimes when i’m looking for other things. like, it shows the information [for the journal] but great, awesome. i found that [the journal] is maybe, hopefully somewhere, but sometimes you click on whatever link they have, and it goes to this one thing and there’s like one of the volumes available. so, this is not useful.” in the past, records were cataloged with all the holding information in one record, which was better functionality for end users, but this practice was changed in the mid- to late 2000s because updating journal holdings was a manual process completed by technical services staff. this timeframe was when the influx of electronic journal records began to steadily increase, making this workflow too cumbersome.
due to all these known issues with journals, when asked to search for a specific journal in the catalog, several advanced searchers (graduate students, staff, and faculty) indicated they would not use the catalog to find journals. several stated other sources they preferred to use, whether google scholar, interlibrary loan, or subject-specific databases in their fields. after fumbling around with the catalog, participant 12, a staff member, summed this up by saying, “i guess if i was looking for a journal, i would just go back to the main page, and go from there [from the e-journals option]. i haven’t really searched for journals from the catalog. the catalog is usually my last [resort], especially for something like a journal.” usability of patron-empowerment features many participants were confused by the login required to engage with the patron-enablement tools prior to the iterative changes demonstrated in figure 1. once changes were made clarifying the required login information, patrons were able to use the patron-enablement tools well, placing holds and selecting the option to send texts regarding overdue materials. however, few undergraduate participants intuitively understood the functionality of the checkboxes next to records to retrieve items later. some participants assumed that they needed to be logged into the system to use this functionality, similar to eds. participant 1, a senior, said that she used a different method for retrieving items later, stating “normally, i'm going to be honest, if i needed the actual title, i’d put it in a word document on my computer. i wouldn’t do it on the library website.” another graduate student, participant 14, stated that while he was aware of the purpose of the checkboxes, he would not use them because the catalog would not be the only resource he would be using. he said that his preference was to “keep a reference list [in word] for every project. and then this reference list will ultimately become the reference for the work done.” participants in every category noted that they did not usually create lists in the catalog to refer to them later. there was enthusiasm regarding the new option to text records, with participant 6, a staff member, going so far as to say “boy, this is gonna make me very annoying to my friends” and staff participant 12 stating “that’s a really cool feature. i think that’s more helpful than this email to yourself.” unfortunately, there were several issues discovered regarding the text functionality. the first issue was that it was not reliably working with all carriers. once that was resolved, the systems librarian removed extraneous information regarding the functionality. this included text that “standard rates apply” and a requirement for users to choose their phone carrier before a text could be sent. these were both deemed unnecessary as it was assumed that users would information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 9 know whether they were charged for receiving text messages. additionally, one of the international graduate student participants did not understand the connection between texting information and the tool to do this, which was titled “sms notification.” while the texts were successful while the users were performing the studies, it was discovered later that the texts did not include the call numbers for items. 
after discussion regarding this problem arose at a louis conference, the decision was made to hide texting as an option until the system was able to properly text all the necessary information. when sirsidynix can fix this issue, the language around this will likely be made more intuitive by labeling it “text notifications” instead of sms. other issues and frustrations the researchers noticed that some options were causing confusion, such as the format limiter. under this drop-down option, several areas were displayed that did not align to known formats in the lsu collections, such as “continuing resources.” to remedy this, all the formats that could not be identified were removed as options. another confusing element was the way that records were displaying in initial user tests. some marc cataloging information was visible to users, so the systems librarian modified the full record display to hide this information. originally, the option to see this information was moved to the side under an option to “view marc record.” however, since this still seemed to confuse users, this button was changed to “librarian view.” undergraduate-student users reported confusion when they needed to navigate out of the catalog into a new, unfamiliar database interface to obtain an electronic article. participant 3, a senior, described her feelings when this happened, that she felt like she was “not in the hands of lsu anymore. i’m with these people, and i don’t know how to work this for sure.” another undergraduate user gave the suggestion that the system provide a warning when this occurs, so users knew that they would be navigating in a new system. since so many of the records link to databases and other non-catalog resources, this was not pursued. several undergraduate-student users mentioned that they didn’t understand the physical layout of the library, and that they used workarounds to get the materials they needed rather than navigate the library. for example, some were using the “hold” option in the catalog to have staff pull materials for them for reasons not initially intended by the library. rather than using this feature for convenience, they stated they were using it due to a lack of awareness of the layout of the library or the call number system. one user, participant 4, a sophomore, used the hold feature to determine whether a book was in print or electronic. when she clicked on the “hold” button in a record and it was successful, she said “okay, so i can place a hold on it, so i’m assuming there is copy here.” follow-up questions feedback to the new interface was primarily positive. several participants mentioned that the search limiters were now more clearly presented as choices from drop-down boxes. additionally, result lists are now categorized by facets such as format and location, which users had options to “include” or “exclude” at their discretion. participant 9, a faculty member, particularly liked the new library of congress subject facet from within the search results. she mentioned that these were available in the past interface, but the process to get to them was much more cumbersome. she regarded this new capability as a “game changer” and “something she hadn’t even dreamed of.” experienced searchers, such as participant 6, a staff member, noticed and appreciated the improvements in search results made possible by the new algorithm. 
she said, “it’s very easy to information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 10 look at, especially compared to the old database, and the keyword searching is a lot better.” after conducting a brief search, another staff member, participant 12, mentioned that he thought the results returned by an author search were much more relevant than in the past. he said, “sometimes with the older system even name searches can be sort of awful. i mean . . . this is a lot better. . . . if you type in benjamin franklin, for me at least, it’s difficult to get to the things he actually wrote. you know, you can find a lot of books about [them], and so you kind of have to filter until you can find . . . the subject.” figure 2. catalog disclaimer. the new search is also more forgiving of misspellings than the old version, which responded with “this search returned no results” when an item was misspelled. those who were very familiar with the old interface, such as staff and faculty, were particularly excited by small changes. these included being able to use a browser’s back button instead of the navigation internal to the system, or the addition of the isbn field to the primary search. prior to the new interface, the system would give an error message when a user attempted to use a browser’s back button instead of the internal navigation. additionally, users mentioned that they liked that features were similar to the previous interface with additional options. an example of a new feature is the system employing fuzzy logic to provide “did you mean” and autocomplete suggestions when users start typing in titles they are interested in, similar to what google provides. this same logic also returns results with related terms, eliminating the need for truncation tools.17 one graduate student, however, particularly mentioned missing boolean operators; they thought they were helpful because students had been taught these and were familiar with them. due to this comment, and other differences between the old and new interfaces, a disclaimer was added to information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 11 make users aware of these changes (see figure 2). two of the five undergraduate participants and two faculty participants noted if they didn’t understand something, or needed help, they would ask a person at a service desk for assistance. one of the staff members mentioned she would use the virtual chat if she had a question regarding the catalog. she also suggested that a question mark symbol providing additional information when clicked might be helpful if users got confused in the system. to allow users to provide continued feedback, the systems librarian created a web form for users to ask questions or report errors regarding the system. discussion study participants requested several updates; unfortunately, some of the recommendations suggested were in areas where the systems librarian had little control to make changes. since the initial advanced search was not easy to customize, the systems librarian created a custom new advanced search that more closely fit the needs of catalog users than the built-in search. one limitation of the default advanced search that several participants and staff users noted was the inability to limit by publication date. 
to work around this problem, one of the features the systems librarian implemented in the custom search was a date-range limiter. while still falling short of the patrons’ desired outcome of inputting a precise date to limit by, the date range feature was still a step forward. he was also able to make stylistic changes, such as bolding fields or making buttons appear in bright colors to make them more visible. other changes included eliminating confusing options and reordering the way full records appeared. this included moving the call number to a more visible area than where it was originally located. after a staff participant suggested it, he was also able to make the author name a hyperlinked field. now users can click on an author’s name to see what other books are available by that author within the library. the systems librarian was also able to make the functionality of the checkboxes more intuitive by adding a “select an action” option at the top of the list of results, which more clearly indicated what could be done with the checked options. these include being added to a list, printed, emailed, or placed on hold. the username and pin required to engage with the user-enablement tools was continually problematic, and not intuitive. only one of the student participants knew their login information, a graduate student close to graduation. the user name is a unique nine-digit lsu id number, which students, faculty, and staff don’t often use. the pin is system generated, so there is no way users could intuit what their pin is. once the user selects the “i forgot my pin” option however, the pin is sent to them, and then they have the option to change it to something they prefer. this setup is not ideal, especially since many other resources on campus are accessed through a more familiar text-based login and password. the addition of “i forgot my pin” to this part of the interface helps by anticipating and assisting with this problem by providing an example with the nine-digit id number, but this can also be overlooked. for this reason and for other security reasons related to paying fines, the library is exploring options to provide a single-sign-in login mechanism. the lack of knowledge regarding the physical layout of the library cannot be solely blamed on the users. in 2014, the lsu libraries made several changes to middleton library, the main campus library. the first was the closing of a once distinct collection, education resources, whose titles were merged with the regular collections. the second was weeding a large number of materials on the library’s third floor to facilitate the creation of a math lab. the resulting shifting of the collection had a direct impact on how patrons were able to locate materials within the library. due to required deadlines, access services staff needed to place books in groupings out of their typical information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 12 call number locations. the department is still working to remedy this five years later. in addition, service points previously manned to assist with wayfinding on the third and fourth floors were closed. conclusion to optimize whatever resources the library provides, user feedback is a useful tool. however, there are limitations to this study, the most obvious being that the data collection took place and was based on researchers at one institution. this could limit the applicability of the study’s results. 
the second is regarding the sampling method for user tests. student users self-selected by volunteering in lieu of user fines, while staff and faculty identified as frequent library users were purposively selected. less-experienced users may have encountered different issues as they navigated the system. most of the student participants indicated they had received library training in some manner, and that they had been required to use library resources in the past to complete assignments. only a small number of participants were undergraduates without library training. the authors noted that the student participants who had received library training were more likely to attempt complicated searches and to explore advanced features. however, they also tended to try to conduct searches that were optimized for the previous catalog, such as using boolean logic. those with library training were also more likely to identify problematic areas, such as searching for journals, and to develop workarounds to get the materials they desired. the two graduate students in music, who were required to take a research course, both indicated how helpful this knowledge was to conducting research in their field. the user tests in this study demonstrated which information points the users at lsu found to be the most relevant, which allowed the systems librarian to redesign a search that better fit their needs. this included hiding or separating extraneous information, such as additional information regarding texting, and making changes so all marc coding only appeared under the newly created “librarian view.” while this study demonstrated that the advanced researcher participants created workarounds regarding journal searches, undergraduate participants also created workarounds (such as placing holds) to accommodate their lack of knowledge regarding the library system and the physical library. several of the undergraduate participants reported having anxiety regarding their ability to navigate systems when the catalog linked to databases with interfaces new to them. the authors found that more advanced researchers appreciated having more data in catalog records, such as information on publishers and library of congress subject headings. students without as much exposure to library resources tended to prefer to conduct keyword searches and were more likely to judge the relevance of a record based mainly on the title or year of publication. most of the staff and faculty participants in this study indicated that they preferred to use the opac over eds. less-seasoned researchers tended to prefer ease and convenience over additional control and functionality. these kinds of generalizations could be tested by additional studies at other universities. the new user-empowerment features were received positively, especially the new “text notifications” feature. most participants indicated that they found it easy to renew items within the interface. however, the authors discovered that few patrons indicated they would use “my lists” to capture records they would like to retrieve later. the user tests highlighted how many problems lsu library users were having signing on to the system to utilize the user-enablement tools. it is hoped that the upcoming change to a single sign-on will alleviate these issues and the users’ frustrations.
the systems librarian would like to incorporate other changes, such as the information technology and libraries march 2020 user experience with a new public interface | blessinger and comeaux 13 request to return to the same spot on a list after going into a full record rather than returning to the top of the list. he is also planning on programming the mobile view for the catalog soon. currently the mobile site is still linking to the desktop version. he has also reintroduced the option to conduct a boolean search by linking to the old catalog due to so many users being familiar with it. the text messaging is expected to be corrected in an upcoming upgrade. overall, response from the participants in this study was positive, especially regarding the new algorithm. they also appreciated the familiarity of the design with the previous catalog interface, with additional features and functionality. regardless of the limitations in this study, some of this study’s findings reaffirm those from previous user studies. these include researchers indicating a need to consult multiple resources either in combination with or in exclusion of the catalog, and ngcs not being as intuitive as expected. the need to consult multiple resources particularly correlated with this study’s findings regarding journals. librarians were aware that searching journals in the online catalog was tricky for users due to multiple issues. many of the experienced participants in this study mentioned that they appreciated the new algorithm because it provided more accurate results. this reaffirmed results from the borbely study, which indicated that task effectiveness, or the system returning relevant results, as the primary factor related to user satisfaction. also similar to findings from the literature, users appreciated the newly available faceted-browsing features. dissimilar to a previous study however, it was the advanced searchers, rather than the novices, who mentioned these specifically as an improvement.18 the authors noted that it was common for undergraduate library participants to express confusion regarding navigating the physical library, so the library has taken several steps to remedy this. since this user testing was completed, a successful grant was written to provide new digital signage to replace outdated signage. this digital signage will be much more flexible and easier to update than the older fixed signage. additionally, this grant provided a three-year license to the stackmaps software. this software has since been integrated into the catalog and eds tool to direct users to physical locations within the libraries. additionally, the access services department updated physical bookmarks that display the call number ranges and resources available on each floor. these are now available at all the library’s public service desks. the library will also continue providing the popular “hold” services for patrons. this is a relatively new service, which was started to offset confusion and to assist patrons during the construction they may have encountered during the changes to the library. finally, since the fine forgiveness program has been so fruitful regarding recruitment for user studies, the special collections library also anticipates providing user studies in lieu of photocopying costs in the future. future research these user tests made it obvious that finding specific journal information through the catalog was difficult for most users. 
this is an area that needs remediation, and the systems librarian plans to conduct further user testing to explore avenues to make searching for journal holdings more efficient. another potential area for further study includes assessing enterprise’s integration of article records. as previously mentioned, enterprise can be configured to include article-level records into its display. however, this functionality would duplicate an existing feature of our main search tab, an implementation of eds that we have labeled “discovery.” while the implementation team felt that duplicating this functionality on a search tab labeled “catalog” might initially confuse users, replacing our current default search tab with enterprise warrants serious consideration. an additional area to explore is a rethinking of the tabbed search box design. while the tabbed design remains popular in libraries, a trend toward a single search box on the library homepage has been observed in academic libraries.19 a future study with an emphasis on determining the best presentation of the various search interfaces, including either a reshuffling of available tabs or a move to a single search box, is planned in the foreseeable future.
appendix a: study participants
participant | status | year | major | date tested
1 | undergrad | senior | international studies & psychology | 3/23/2018
2 | undergrad | freshman | mass communication | 4/6/2018
3 | undergrad | senior | child and family studies | 4/13/2018
4 | undergrad | sophomore | pre-psychology | 4/24/2018
5 | graduate | phd | music | 4/26/2018
6 | staff | n/a | english | 5/3/2018
7 | graduate | masters | music | 5/3/2018
8 | graduate | phd | french | 5/4/2018
9 | faculty | n/a | history | 5/7/2018
10 | faculty | n/a | english | 5/8/2018
11 | undergrad | junior | accounting | 6/1/2018
12 | staff | n/a | history | 6/4/2018
13 | staff | n/a | mass communication | 8/28/2018
14 | graduate | phd | curriculum and instruction | 10/3/2018
15 | graduate | phd | petroleum engineering | 10/2/2018
appendix b: user exercises worksheet
1) you need to do a research paper on gerrymandering and race. a) identify three books that you may want to use. b) how would you save these titles to refer to later?
2) you are looking for the book titled harriet tubman and the fight for freedom by lois e. horton. find out the following and write your answers below. a) does the library own this in print? b) what is the call number? c) if we have this book, go into the record, and text yourself the information. d) place a hold on this book.
3) you need an article from the journal of philosophy. do we have access to the 2018 issues? what type of access (e.g., print or electronic)?
4) log in to your personal account to see the following: a) what you have checked out currently; if you have materials out, try to renew an item. b) determine any fines you owe. c) add a text notification for overdue notices.
endnotes
1 marshall breeding, “library systems report 2018: new technologies enable an expanded vision of library services,” american libraries (may 1, 2018): 22–35.
2 david a. grossman and ophir frieder, information retrieval: algorithms and heuristics, 2nd ed.
(dordrecht, the netherlands: springer, 2004).
3 dikshant shahi, apache solr: a practical approach to enterprise search (berkeley, ca: springer ebooks, 2015), ebscohost.
4 “usability testing,” u.s. department of health and human services, accessed june 1, 2019, https://www.usability.gov/how-to-and-tools/methods/usability-testing.html.
5 john sweller, “cognitive load during problem solving: effects on learning,” cognitive science 12, no. 2 (1988): 257–85, https://doi.org/10.1207/s15516709cog1202_4.
6 nina hollender et al., “integrating cognitive load theory and concepts of human–computer interaction,” computers in human behavior 26, no. 6 (2010): 1278–88, https://doi.org/10.1016/j.chb.2010.05.031.
7 wolfgang schnotz and christian kürschner, “a reconsideration of cognitive load theory,” educational psychology review 19, no. 4 (2007): 469–508, https://doi.org/10.1007/s10648-007-9053-4.
8 jeroen j. g. van merriënboer and paul ayres, “research on cognitive load theory and its design implications for e-learning,” educational technology research and development 53, no. 3 (2005): 5–13, https://doi.org/10.1007/bf02504793.
9 ashok sivaji and soo shi tzuaan, “website user experience (ux) testing tool development using open source software (oss),” in 2012 southeast asian network of ergonomics societies conference (seanes), ed. halimahtun m. khalid et al. (langkawi, kedah, malaysia: ieee, 2012), 1–6, https://doi.org/10.1109/seanes.2012.6299576.
10 deeann allison, “information portals: the next generation catalog,” journal of web librarianship 4, no. 4 (2010): 375–89, https://doi.org/10.1080/19322909.2010.507972.
11 breeding, “library systems report 2018.”
12 maria borbely, “measuring user satisfaction with a library system according to iso/iec tr 9126‐4,” performance measurement and metrics 12, no. 3 (2011): 151–71, https://doi.org/10.1108/14678041111196640.
13 sharon q. yang and melissa a. hofmann, “next generation or current generation? a study of the opacs of 260 academic libraries in the usa and canada,” library hi tech 29, no. 2 (2011): 266–300, https://doi.org/10.1108/07378831111138170.
14 jody condit fagan, “usability studies of faceted browsing: a literature review,” information technology & libraries 29, no. 2 (2010): 58–66, https://doi.org/10.6017/ital.v29i2.3144.
15 hollie m. osborne and andrew cox, “an investigation into the perceptions of academic librarians and students towards next-generation opacs and their features,” program: electronic library and information systems 51, no. 4 (2015): 2163, https://doi.org/10.1108/prog-10-2013-0055.
16 “best practices for user centered design,” online computer library center (oclc), accessed june 7, 2019, https://www.oclc.org/content/dam/oclc/conferences/acrl_user_centered_design_best_practices.pdf.
17 “enterprise,” sirsidynix, accessed june 27, 2019, https://www.sirsidynix.com/enterprise/.
18 fagan, “usability studies.”
19 david j. comeaux, “web design trends in academic libraries—a longitudinal study,” journal of web librarianship 11, no. 1 (2017): 1–15, https://doi.org/10.1080/19322909.2016.1230031.
technology integration in storytime programs: provider perspectives
maria cahill, erin ingram, and soohyung joo
information technology and libraries | june 2023, https://doi.org/10.6017/ital.v42i2.15701
maria cahill (maria.cahill@uky.edu) is professor, university of kentucky. erin ingram (erin.ingram@chpl.org) is youth librarian, cincinnati and hamilton county public library. soohyung joo (soohyung.joo@uky.edu) is associate professor, university of kentucky. © 2023.
abstract technology use is widespread in the lives of children and families, and parents and caregivers express concern about children’s safety and development in relation to technology use. children’s librarians have a unique role to play in guiding the technology use of children and families, yet little is known about how public library programs facilitate children’s digital literacy. this study sought to uncover librarians’ purposes for using technology in programs with young children as well as the supporting factors and barriers they encountered in attempting to do so. findings reveal 10 purposes for integrating technology into public library storytime programs and 15 factors across four dimensions that facilitate and/or inhibit its inclusion. if librarians are to embrace the media mentor role with confidence and the necessary knowledge and skills required of the task, much greater attention should be devoted to the responsibility, and more support in the way of professional development and resources is necessary.
introduction technology use is widespread in the lives of children and families. from a very early age, children in highly developed countries across the world regularly interact with technology, and data from device trackers substantiate parental reports.1 nearly all families have access to one or more mobile devices, and nearly three-fourths of children in the united states begin some form of digital engagement, primarily television viewing, before age three.2 prior to formal schooling, children (ages two to four) in highly developed countries tend to use a device with a screen for about two and a half hours per day on average.3 differences in screen use by income level and race are significant, with children from lower-income families and children of color spending more time on electronic devices than children from higher-income families and children who are white.
though most parents do allow their children to use technology, many voice some concerns about their children’s well-being, particularly regarding privacy as well as the content of the media.4 yet, young children’s digital activity can be beneficial, particularly when the technology is designed to foster active, meaningful engagement and when it facilitates social interaction.5 in light of children’s usage and parents’ concerns, librarians in public libraries have a unique role to play in this information realm. not only can librarians provide access to technology and recommended resources but they can also provide guidance in how to use technology to contribute to children’s learning, especially in the areas of reading, information literacy, and academic concepts.6 yet, little is known about whether librarians actually facilitate children’s digital literacy through integration of technology into programs, and this dearth of empirical evidence is highlighted in the association of library services to children (alsc) research agenda.7 storytime, as a program attended by both children and caregivers, can be used as a time for children’s librarians to integrate technology for the purposes of modeling and explaining how various electronic tools might be beneficial for young children.8 due to this potential, it is important to understand how and why children’s librarians are—or are not—integrating technology into storytime programs. previous studies of technology use in children’s programs and storytimes internationally, there have been few investigations of technology integration within library programs for young children. within the united states, two survey studies, both commissioned by alsc, sought to capture the use of technology in youth programming.9 the initial survey launched in 2014 and the follow-up survey in 2018. respondents to these surveys reported that the types of devices used most often in libraries were proprietary institutional devices, digital tablets, tangible tech such as squishy circuits that allow children to build electrical circuits with play dough, and programmable tech such as cubetto, a wooden robot toy.10 additionally, more than half of respondents working in medium and large libraries and more than 45% of those working in small libraries indicated using digital devices during storytimes.11 conversely, a comprehensive study of programming for young children in public libraries, which included observations, concluded that, “while many libraries offer families a place to use computers and other digital resources together, few libraries actively promote the use of technology during their programming.”12 notably, neither the 2014 nor 2018 alsc survey included questions about the types of technology used in storytimes, nor were respondents asked to explain their thoughts on why or how technology was or was not included in storytime.13 a study conducted in aotearoa new zealand collected data about technology use in storytime in three phases: a survey of 25 children’s librarians, interviews with librarians in nine libraries, and a survey of 28 caregivers who attend a library storytime with a young child.14 slightly more than a quarter of the librarians responding to the survey reported incorporating digital technology such as tablets or e-books into storytime programs.
the most common rationale for technology use in storytime was to educate caregivers. other reasons included for the novelty of it and to promote accessibility and the aims of library services. interviewees explained that they used technology in storytime to show caregivers the availability of high-quality digital media such as e-books and educational apps, with one likening the use and recommendation of digital media to librarians’ traditional role as recommenders of storybooks (i.e., readers’ advisory services). conversely, one interviewee expressed reservations about using technology for fear that children would be distracted from the content of the story. the majority of caregiver respondents who had attended a storytime with digital technology reported enjoying the experience. however, those who had never attended a storytime with technology were apprehensive about doing so. technology best practices: joint media engagement and media mentorship recent scholarship encourages children’s librarians to use their expertise and experience to evaluate and recommend technology and new media resources as well as to model for adults how to interact with children as they use technology.15 for example, librarians can promote joint media engagement during storytimes both by modeling the practice and by directly explaining it to the adults in attendance. using technology during storytime can be seen as modeling modern literacy practices, just as reading print books has modeled literacy practices in traditional storytimes since the 1940s .16 information technology and libraries june 2023 technology integration in storytime programs 3 cahill, ingram, and joo alsc instructs youth services librarians to act as media mentors, a role that means they will assist caregivers in choosing and using technology by researching new technology and by modeling technology use, such as joint media engagement, for caregivers in programs such as storytimes.17 media mentorship is seen as an extension of how youth services librarians have traditionally been called upon to meet the needs of caregivers and children with their knowledge of child development and ability to facilitate caregivers’ information seeking.18 while alsc encourages media mentorship, the extent to which children’s librarians have embraced this role is unclear in professional research. findings from prior surveys and interviews with storytime providers suggest that librarians are regularly integrating technology into programs while observations of library programs suggest otherwise.19 further, goulding and colleagues found that while many librarians were comfortable recommending technology such as apps, it was unclear whether or not they were modeling its use during storytimes.20 study objectives the overarching research question of this study is “how do storytime providers view the integration of technology into storytime programs?” the following three research questions guide this study. 1. what are the purposes for using technology in storytimes? 2. what are factors associated with adopting technology in storytimes? 3. what are barriers to integrating technology in storytimes? 
method participants as part of a larger institute of museum and library services (imls)-funded, multistate study that was approved by the university of kentucky institutional review board (irb number 42829), researchers conducted semi-structured interviews with 34 library staff who facilitate storytime programs at public libraries serving urban, suburban, and rural communities across kentucky, ohio, and indiana.21 interviewees were not asked to identify their race or ethnicity. thirty-two identified as female and two as male. all but one of the participants (97%) had earned a college degree, but only 13 (38.2%) held a master’s degree from a library and information science (lis) program, while another two were enrolled in an lis master’s degree program when the interviews occurred. the majority of participants (57.1%) had five years or more of experience in children’s library services. the participants will be referred to as “storytime providers.” procedure the interviews were conducted by one member of the research team. other members of the team created written transcripts from recordings of the interviews. for the study reported in this paper, researchers focused on participants’ answers to the interview question “what place, if any, does technology or digital media have in a quality storytime program?” an open coding method was used to organize participants’ statements within three categories: purposes underlying technology use, factors associated with technology adoption, and barriers to technology integration. three researchers conducted open coding independently and came up with the initial set of coding results. then, the researchers discussed the coding results multiple times to assess the relevance of the coded constructs, refine operational definitions, and select one representative quote for each code. interviewees were assigned a number between 1 and 34 to eliminate identifying information. information technology and libraries june 2023 technology integration in storytime programs 4 cahill, ingram, and joo results what are the purposes for using technology in storytimes? to find answers to this research question, the researchers coded statements related to how or why interviewees used or wanted to use technology in storytime programs. we identified 10 specific purposes, formed operational definitions for each, and chose one representative quote (table 1). although most purposes had statements from more than one interviewee associated with them, we collaborated to choose one example due to space constraints. researchers determined that the purposes for technology use could be divided into two categories: experiential and learning. experiential purposes are those for which technology is used to create a positive, engaging experience for child and/or adult participants. learning purposes are those for which technology use is intended to help child and/or adult participants learn. what are factors associated with adopting technology in storytimes? to answer the second research question, researchers looked for statements explaining the reasons or causes for storytime providers using or wanting to use technology in their storytime programs. these would be factors that facilitate technology adoption. researchers coded statements independently and then discussed results multiple times to verify relevance and consolidate categories into 15 factors in four dimensions: storytime provider, participant, library system, and content. 
though many factors had more than one corresponding statement from participants, we chose one representative quote for each. results are presented in table 2. what are barriers to integrating technology in storytimes? to answer this question, researchers independently reviewed responses, looking for statements related to why storytime providers did not or did not wish to use technology during storytime. after individual coding, we collaborated to verify relevance, refine definitions of the 15 identified barriers, and choose representative quotes. the results are presented in table 3. researchers found that three of the dimensions created for factors that lead to techno logy adoption could also be applied to barriers to technology integration: storytime provider, participant, and library system. information technology and libraries june 2023 technology integration in storytime programs 5 cahill, ingram, and joo table 1. purposes for using technology in storytimes category purpose operational definition representative quote experiential accommodating large groups technology is used to enable a large group to view books/materials 2: “i had this huge group of kids. and i took them to our red room and did a story on our big screen. you know, through tumblebooks.” children’s enjoyment provider incorporates media or technology because children enjoy it 14: “and then as far as, um, sometimes, um, we’ll have, like, at the end of a storytime, we may have a little short, um, like nonfiction or sign language or if we were doing something on the alphabet, maybe i would throw in a little dvd and give them popcorn for the end of storytime and things like that and i think that they really enjoy that. it is important to integrate that in.” facilitating adult participation provider uses technology to display the words to songs to facilitate adult participation 12: “the closest thing i would say, i use a powerpoint that has the words on it for the parents to be able to follow along, um, or for the kids if they can pick out some of the letters or start to read, even some of the older ones.” facilitating movements technology is used to facilitate movements or dancing 19: “in addition to our singing, just to give, you know, to change it up a little bit. so, they can hear the music. we clap rhythms. so, we use that a lot.” playing songs/music technology is used to play songs or music 13: “we have a sound system that i love, with surround sound. we always do our last song with, you know, that, and i’ve been fortunate that it’s worked all the time.” information technology and libraries june 2023 technology integration in storytime programs 6 cahill, ingram, and joo category purpose operational definition representative quote sound effects technology is used to create a sound or voice 17: “one of the better things that i’ve done, that i like to do, is, i like to use animal sounds. i’ll research or pull up a list of sounds on youtube or whatever and have the kids listen to them. i think that’s always been a fun way to work in a little bit of technology without taking out all of the flow.” visual aids technology is used to support children’s visual experience 24: “and, like, it gives the kids a visual. 
and i feel like sometimes, if we could give them a better visual, they might be more engaged.” learning support for adult-child interaction technology is used to support adult-child interaction 1: “if you’re actually sitting down with your child, looking at it together, it’s a lot more effective and the child is getting a lot more out of it versus just sitting them in front of it and expecting to teach something to the child.” teaching caregivers technology is integrated to model for caregivers 11: “i think it’s important to share with parents really good e-resources, such as, like, apps. and books and stuff. so, that, i think it’s very important…. i have, like, when i have like a screen, a projector screen, maybe when the book i picked for the storytime was an e-book that they could get through the library, and kind of, you know, advertise that resource, and then we would, we would read the e-book, you know, from the projector. so i’ve done, like, e-books and stuff.” teaching concepts technology is used to present letters, words, numbers, shapes, sign language, colors, or coding skills, to children 22: “…. all these different color songs, um, and they’re actually just on youtube…. so that is one way that we’ve been incorporating technology, um, is with those color songs because it spells it out for them. they can see the word, it’s a familiar tune, and it helps them, you know, at least be able to sing, sing the song.” information technology and libraries june 2023 technology integration in storytime programs 7 cahill, ingram, and joo table 2. factors associated with adopting technology in storytimes dimension factor operational definition representative quotes storytime provider awareness provider is aware of the tool/technology available for storytime 1: “i’m aware of all kinds of apps that are out there and of course the ebooks.” familiarity provider feels comfortable with the technology and with integrating the technology into programs 1: “i feel like it’s going to be effective if it’s what you’re comfortable with and you’re excited about. because that will come through when you actually provide the storytime.” choice of provider ultimately it is up to the provider to choose to integrate technology or not 1: “i think it all depends on the provider.” provider’s philosophy and approach how the provider views storytime and its purpose influences technology integration 1: “everyone has their own, unique storytime philosophy and the way that they approach planning storytimes…. so, really, a lot of it is just ... theory of how you want to approach it since there’s so many options out there.” reaction/success with initial attempt if the provider tried technology integration, the success or failure of that initial attempt influences subsequent attempts 2: “it went over really well.” information technology and libraries june 2023 technology integration in storytime programs 8 cahill, ingram, and joo dimension factor operational definition representative quotes research base provider is aware of research to support integration of technology 1: “... it’s kind of what the research is saying with parents and digital media at home. it all depends on how you are using it. 
if you’re actually sitting down with your child, looking at it together, it’s a lot more effective and the child is getting a lot more out of it versus just sitting them in front of it and expecting to teach something to the child.” participant number of participants the number of participants facilitates technology integration 2: “i think this summer was the first time i ever did that [used technology], and it was because i had this huge group of kids.” perception of caregivers’ reactions provider’s perception of how the caregivers would react to technology use 1: “i think they would probably be open to it…. i don’t know if maybe the perception some parents don’t want any technology, that would keep some people from appreciating it. but i think in general, it would be wellreceived if we tried it.” responsive to children’s interests provider uses digital resources because the children show interest or engagement 10: “kids are automatically interested in that stuff. they don’t need to be enticed. you know, you just get out an iphone or an ipad and they’re, like, gasp.” library system access to equipment and resources provider has access to technology and tools 1: “... we have technology, i think, in our system to implement it. you know, e-readers and ipads and things that we can use in storytimes. and large screen tvs.” information technology and libraries june 2023 technology integration in storytime programs 9 cahill, ingram, and joo dimension factor operational definition representative quotes colleague support provider is part of a branch or system that shares information and resources for technology integration 17: “so, you know, we have, and we’ve gotten pretty [good] at sharing with other storytime providers in our system if we have any websites or anything that we’ve been using or music that works really well for ‘movers and shakers’ or anything like that.” expectation to integrate technology in programs provider feels pressure to integrate technology and is defensive about the choice not to do so 1: “i kind of apologize for it…. so, we have the technology available, and they encourage us to use it....” training provider has used or wants to use technology during storytime because of a training 17: “we did a digital mentoring training about how to appropriately model, like, tech skills and screen time with families. so we’ve been encouraged to add in a little bit more technology into our storytimes if we can do those, you know, in an appropriate way.” content interactivity provider can use technology to facilitate interactivity 24: “... i would love to use some, like, smart tvs, smart boards, those kind of things. just for some interactive songs and you know, activities... when i go into these kindergartens and first grade and second grade rooms, like, these kids are using the smart boards for interactive activities for abcs and colors and shapes and numbers. and it may be through an activity or a song that’s being used with that smart board. and i say, ‘oh, i love that! i wish i could do that!’” theme provider uses technology that clearly connects to the theme of the storytime 17: “actually in my kinderbridge storytime now, it’s shapes month. we have the osmotangrams that i bring out. so that’s one of the ones all four weeks i’m going to use the apps and bring out both of our ipads so that kids can practice those spatial shapes.” information technology and libraries june 2023 technology integration in storytime programs 10 cahill, ingram, and joo table 3. 
barriers to integrating technology in storytime dimension barrier operational definition representative quote storytime provider fear of difficulties/ problems provider doesn’t plan or hesitates to plan technology use because there may be problems with using it 13: “but technology can be a problem. when you’re planning or something and it’s not working.” previous/ own child’s experiences with tech provider has negative experience using technology with children 5: “i have a four-year old. and it’s interesting to see how he responds to technology and what he responds to. and what helps him to learn the most. and it’s just, like, night and day what he learns from. you know, hearing repeated songs and rhymes and just reading tons of books versus what he learns.… i mean, i think that probably the most he ever learned from an ipad was getting to watch sesame street. just sort of the same, sort of like watching a storytime, i think. but yeah, i think just now from experience seeing like, ‘oh! that really doesn’t. it’s not a helpful tool, i don’t think, for that age.’ just from my experience.” undecided about the value of tech provider is unsure if tech integration is appropriate 5: “i have been all over the board in terms of that subject … like i said, it’s really important for me to pack in as much of what i think they need in a storytime. and i don’t know, again, i’m not sure that i’m doing exactly what is correct and maybe i should be exposing them more. but i feel like, especially for threeto five-year olds, it’s one of those things.... screen time/ overuse concerns provider is concerned about children’s screen time 2: “because i think there’s plenty of opportunity to be had in other places.” information technology and libraries june 2023 technology integration in storytime programs 11 cahill, ingram, and joo dimension barrier operational definition representative quote storytime activities as purposeful alternative to technology provider deliberately chooses not to use technology in storytime because they see storytime activities as equally or more beneficial 16: “and one thing that i’ve gotten feedback on is that kids are exposed to the technology in pretty much every facet of their life, so if we can make this a space where they can learn and experience things in a way that doesn’t have technology and they can see that it’s still really fun and exciting and we can learn a lot, then that has its own place, too.” unwilling to adopt a new technology provider keeps using the prior tool and does not try a new alternative technology 18: “i’m kind of old school because we’ve been using our cd player.” participant children devalue other components of storytime when tech is integrated provider perceives that the children prefer tech over other components of storytime 5: “i used to sometimes show a short video, and then i kind of found that that’s what they looked forward to most. i wanted to sort of change that perception of what the library was for some kids.” difficult to use tech with young children provider experiences difficulty using technology with young children 5: “i have found, for preschoolers, that it is really hard to incorporate anything digital.” lack of access to the internet poor broadband in rural area; why expose children to something they can’t use at home 5: “i feel like, especially here in this rural area, … [w]e have a really poor broadband network here, so not a lot of people have access to the internet. 
and so sometimes i feel like, also, showing them something that they can’t really utilize at home is not really helpful until they’re a little older also. information technology and libraries june 2023 technology integration in storytime programs 12 cahill, ingram, and joo dimension barrier operational definition representative quote perception or anticipated perception of some parents/ caregivers if the provider perceives that some parents/caregivers will object to tech integration, the storytime provider may be reluctant to do so 1: “i don’t know if maybe the perception, some parents don’t want any technology, that would keep some people from appreciating it.” tech is distracting for young children provider believes technology is distracting 5: “personally, i think i kind of get distracted by the media, so, then i think they would, too. library system lack of access to devices library does not have a certain device or technology even though the provider would like to have or think useful for storytime 24: “um, i’ll be honest with you, if we had the ability, i would love to use some like smart tvs, smart boards, those kind of things.… we just don’t really have that option here.” lack of time to integrate tech into storytime, the provider has to have time to explore tools and know the best resources/media to integrate, and that takes time 1: “and part of it’s time, too. having the time to find quality resources, and to learn how to use them. because we have the technology, i think, in our system to implement it. “ information technology and libraries june 2023 technology integration in storytime programs 13 cahill, ingram, and joo dimension barrier operational definition representative quote lack of training provider thinks self doesn’t have the knowledge, interest, skill, or training to use technology during storytime 15: “and i’d be open to ways to use it, but i guess i haven’t taken, you know, any trainings on … i mean, i really haven’t seen a lot of things offered at conferences.” old facility library does not support installing newer technology 21: “... that’s a thing that we have struggled with previously because of our infrastructure and set-up. it was almost a hazard to set up a projector and have some sort of digital aspect to storytime.” information technology and libraries june 2023 technology integration in storytime programs 14 cahill, ingram, and joo discussion purposes experiential many of the storytime providers’ purposes for using technology revealed a goal to create a positive, engaging experience for all children and adults who attend storytime, a theme that prior research has highlighted.22 specifically, technology facilitates the sharing of visual aids, sound effects, and songs. providers also use technology to encourage adult participation, and like their early childhood educator colleagues, storytime providers in this study reported using technology to scaffold and coordinate children’s gross motor movements with songs and action rhymes.23 learning storytime providers’ responses also show the aim to contribute to the learning of children and adults in storytime. this finding mirrors those of goulding, shuker, and dickie, which found that providers like to use technology in ways that coincide with the aims of children’s services. 
24 two of the purposes show an awareness of best practices in technology integration: support for adult-child interaction and teaching caregivers.25 additionally, storytime can be an opportune time for providers to model technology best practices for caregivers as providers have been modeling literacy best practices throughout the history of storytime programming.26 importantly, when storytime providers do model and intentionally seek to support caregivers’ learning, caregivers expand their knowledge, experience heightened confidence, and tend to utilize the strategies they encountered.27 notably, storytime providers tend to feel discomfort with providing instructional or developmental information directly to caregivers via “asides”; thus, a more palatable approach for many storytime providers might include using “we” language along the lines of “when we use digital media, we want to be sure that we are developing healthy habits. some families set a timer to help them monitor the duration of their children’s screentime.”28 one way that storytime providers might model digital media use is to search for and find information related to the storytime theme or book in one of the library’s databases. for example, if a book shared in storytime included a sloth, the storytime provider might demonstrate how to search for a video of a sloth in one of the library’s digital encyclopedias (e.g., encyclopedia britannica). storytime providers should also keep in mind that digital play can be incorporated into the informal activities that typically occur before and after storytime programs as a means to support children’s social interaction with other children.29 for example, if puzzles are typically included as one of the informal activity options before or after the storytime program, the provider might offer both traditional and digital puzzles (e.g., https://kids.nationalgeographic.com/games/puzzles/) on library-owned tablets and provide a simple how-to if needed. supports and barriers through the process of open coding, researchers identified four dimensions that storytime providers’ perceived supports and barriers could fall into based on the primary influential factor: provider, library system, participants, or content. provider the providers’ perceptions about technology and experiences with technology in the library setting serve as facilitators or barriers to integration. if a provider is aware of useful technology, familiar and comfortable with its use, knowledgeable of research supporting technology use, has a professional philosophy that can accommodate technology use, and/or has had a positive experience trying out technology, then these may be factors that lead to the adoption of technology in storytime. on the other hand, if the provider has concerns about the difficulties of technology use or the amount of time children spend on screens, if the provider’s professional philosophy views storytime as a deliberate alternative to time with technology, or if the provider has had a negative experience with technology, then these may be factors that prevent the adoption of technology in storytime. these same factors affect early childhood practitioners and influence their decisions to incorporate technology into classroom practices.30 the factors that lead to technology integration could be seen as related to media mentorship.
a media mentor has awareness, familiarity, knowledge, and a professional philosophy that supports technology use, all of which were factors identified by interviewees. professional training in mentorship was mentioned by one interviewee (17) who stated, “we did a digital mentoring training about how to appropriately model, like, tech skills and screen time with families.” thus , some providers’ responses indicate some general awareness of the currently emphasized best practice of media mentorship. however, the ambivalence toward the role of media mentor that goulding and colleagues found amongst librarians is also found here as interviewees’ responses do not give a clear picture of how they model technology use for caregivers during storytimes .31 in addition, responses that highlight barriers to technology integration show ways in which some providers are opposed to employing the role of media mentor specifically during storytime. as such, our findings align with prior observational studies that noted “few instances of librarians willing to speak directly to parents about how to interact with their children using technology.”32 participant providers consider the perspectives of the adult and child participants in storytimes in relation to integrating technology. providers are more likely to integrate technology if they view it as an aid to facilitating sessions for large groups, they believe caregivers will be open to the technology, and they appreciate that young children show a high interest in devices such as ipads. however, children’s high interest in devices was seen by other providers as a negative aspect of technology use and a barrier to integration because they thought children were too focused on the technology itself or would be distracted by the technology. just as early childhood teachers have been encouraged to broaden their perspectives of literacy to encompass digital literacy, so too might storytime providers, as this shift in focus would enable them to view these incidences as engagement rather than distraction.33 also, the same interviewee who thought caregivers might be open to technology in storytime expressed the concern that other caregivers might not like its use. our findings related to caregiver reaction echo similar findings from goulding and colleagues: the reaction that providers anticipate from adult participants might be either a support or a barrier for technology integration.34 library system two aspects of the library system were present in both factors and barriers: access and training. when the library system in which the provider worked gave them access to technology and training in its use for programs, they were more likely to integrate technology. in contrast, when a provider did not have access to technology, the library building did not support its use, or training was not given, the provider was less likely to integrate technology. 
libraries pride themselves on providing the highest level of service to members of the community and “removing barriers to access presented by socioeconomic circumstances.”35 yet, if libraries are to facilitate the digital learning of young children, it is necessary for them to recognize the digital divide impacting information technology and libraries june 2023 technology integration in storytime programs 16 cahill, ingram, and joo children’s access to technology throughout the world, and parents’ reluctance to spend money on digital apps.36 content content was a dimension only found in factors that support technology integration, not in barriers. providers used or wanted to use technology because they could connect the technology to two essential elements in the content of storytime: interactivity and theme. this dimension relates to purposes for technology use in the learning category as providers want to use the interactivity of technology as well as technology directly related to the session’s theme to boost children’s learning. indeed, child learning has long been librarians’ goal in providing storytime programs as has facilitating the development of parent skills.37 conclusion technology is prevalent in the lives of children and many begin interacting with digital tools as early as the first year of life; and caregivers seek guidance regarding their children’s technology use.38 while alsc has championed children’s librarians as media mentors, findings from this study, coupled with those from prior research, highlight storytime providers’ opposition to the media mentor role and the integration of technology within storytime programs.39 some first steps storytime providers might take are to integrate the digital tools the library is already providing. for example, if the library offers e-books (e.g., via libby), the storytime provider might consider integrating one or more picturebooks from that collection into storytime. alternatively, if the library does not have the tools necessary to share the book electronically during the program (e.g., a screen large enough for the storytime group), the provider might read the print version but then follow that up with a comment along the lines of “grownups, did you know that the library also offers this as an e-book that you could read on a phone, tablet, or other device? 
i would be happy to show you how to access it and other e-books after the program.” providers looking for other ways to incorporate digital tools into library programs might read strategies recommended by librarians in a fully and freely accessible online book.40 as scholars have previously noted, early childhood providers, including those who support young children and families in libraries, need much more professional development.41 specifically, the field needs more opportunities for librarians and other early childhood educators to develop their knowledge and skills within the realm of digital technology for young children, but they also need training that advances the notion of media mentor and boosts their confidence and identities relative to that role.42 the institute of museum and library services recently funded a project designed to support librarians’ knowledge and skills within the realm of family media for children ages five to eleven years—and products from that project are certainly a good starting place for storytime providers; however, additional resources and research focused on library programs and services designed for children from birth through five years are needed.43 if librarians are to embrace the media mentor role with confidence and the necessary knowledge and skills required of the task, much greater attention should be devoted to the responsibility and more support in the way of professional development and resources is necessary. acknowledgement this work was supported by the institute of museum and library services [federal award identification number: lg-96-17-0199-17]. endnotes 1 nalika unantenne, mobile device usage among young kids: a southeast asia study (the asianparent insights, november 2014), https://s3-ap-southeast-1.amazonaws.com/tap-sg-media/theasianparent+insights+device+usage+a+southeast+asia+study+november+2014.pdf; brooke auxier, monica anderson, andrew perrin, and erica turner, parenting children in the age of screens (pew research center, 2020), https://www.pewresearch.org/internet/2020/07/28/parenting-children-in-the-age-of-screens/; stephane chaudron, rosanna di gioia, and monica gemo, young children (0–8) and digital technology: a qualitative study across europe (publications office of the european union, 2018), https://doi.org/10.2760/294383; organization for economic cooperation and development, what do we know about children and technology? (2019), https://www.oecd.org/education/ceri/booklet-21st-century-children.pdf; victoria rideout and michael b. robb, the common sense census: media use by kids age zero to eight, 2020 (common sense media, 2020), https://www.commonsensemedia.org/sites/default/files/uploads/research/2020_zero_to_eight_census_final_web.pdf; jenny s. radesky et al., “young children’s use of smartphones and tablets,” pediatrics 146, no. 1 (2020). 2 unantenne, mobile device usage; auxier, anderson, perrin, and turner, parenting children; chaudron, di gioia, and gemo, young children (0-8) and digital technology. 3 rideout and robb, the common sense census; sebastian paul suggate and philipp martzog, “preschool screen-media usage predicts mental imagery two years later,” early child development and care (2021): 1–14. 4 auxier, anderson, perrin, and turner, parenting children; suggate and martzog, “preschool screen-media usage.” 5 marc w. hernandez, carrie e.
markovitz, elc estrera, and gayle kelly, the uses of technology to support early childhood practice: instruction and assessment. sample product and program tables (administration for children & families, u.s. department of health & human services, 2020), https://www.acf.hhs.gov/media/7970; lisa b. hurwitz and kelly l. schmitt, “can children benefit from early internet exposure? short- and long-term links between internet use, digital skill, and academic performance,” computers & education 146 (2020): 103750; kathy hirsh-pasek et al., “putting education in ‘educational’ apps: lessons from the science of learning,” psychological science in the public interest 16, no. 1 (2015): 3–34. 6 amy koester, ed., young children, new media, and libraries: a guide for incorporating new media into library collections, services, and programs for families and children ages 0–5 (little elit, 2015), https://littleelit.files.wordpress.com/2015/06/final-young-children-new-media-and-libraries-full-pdf.pdf. 7 association for library service to children, national research agenda for library service to children (ages 0–14), 2019, https://www.ala.org/alsc/sites/ala.org.alsc/files/content/200327_alsc_research_agenda_print_version.pdf. 8 christner, hicks, and koester, “chapter six: new media in storytimes: strategies for using tablets in a program setting.” in a. koester, ed., a guide for incorporating new media into library collections, services, and programs for families and children ages 0–5 (little elit, 2015), 77–88. 9 kathleen campana, j. elizabeth mills, marianne martens, and claudia haines, “where are we now? the evolving use of new media with young children in libraries,” children and libraries 17, no. 4 (2019): 23–32; j. elizabeth mills, emily romeign-stout, cen campbell, and amy koester, “results from the young children, new media, and libraries survey: what did we learn?”, children and libraries 13, no. 2 (2015): 26–32. 10 campana, mills, martens, and haines, “where are we now?”. 11 campana, mills, martens, and haines, “where are we now?”. 12 susan b.
neuman, naomi moland, and donna celano, “bringing literacy home: an evaluation of the every child ready to read program” (chicago: association for library service to children and public library association, 2017), 5, http://everychildreadytoread.org/wpcontent/uploads/2017/11/2017-ecrr-report-final. 13 campana, mills, martens, and haines, “where are we now?”; mills, romeign-stout, campbell, and koester, “results from the young children, new media, and libraries survey.” 14 anne goulding, mary jane shuker, and john dickie, “media mentoring through digital storytimes: the experiences of public libraries in aotearoa new zealand,” in proceedings of ifla wlic (2017), https://library.ifla.org/id/eprint/1742/1/138-goulding-en.pdf. 15 goulding, shuker, and dickie, “media mentoring through digital storytimes”; cen campbell and amy koester, “new media in youth librarianship,” in a. koester, ed., a guide for incorporating new media into library collections, services, and programs for families and children ages 0–5 (little elit, 2015), 8–24. 16 jennifer nelson and keith braafladt, technology and literacy: 21st century library programming for children and teens (chicago: american library association, 2012). 17 c. campbell, c. haines, a. koester, and d. stoltz, media mentorship in libraries serving youth (chicago: association for library service to children, 2015), https://www.ala.org/alsc/sites/ala.org.alsc/files/content/media%20mentorship%20in%20li braries%20serving%20youth_final_no%20graphics.pdf. 18 association for library service to children, competencies for librarians serving children in libraries. 19 campana, mills, martens, and haines, “where are we now?”; mills, romeign-stout, campbell, and koster, “results from the young children, new media, and libraries survey”; neuman, moland, and celano, “bringing literacy home”; goulding, shuker, and dickie, “media mentoring through digital storytimes.” http://everychildreadytoread.org/wp-content/uploads/2017/11/2017-ecrr-report-final http://everychildreadytoread.org/wp-content/uploads/2017/11/2017-ecrr-report-final https://www.ala.org/alsc/sites/ala.org.alsc/files/content/media%20mentorship%20in%20libraries%20serving%20youth_final_no%20graphics.pdf https://www.ala.org/alsc/sites/ala.org.alsc/files/content/media%20mentorship%20in%20libraries%20serving%20youth_final_no%20graphics.pdf information technology and libraries june 2023 technology integration in storytime programs 19 cahill, ingram, and joo 20 goulding, shuker, and dickie, “media mentoring through digital storytimes” in proceedings of ifla wlic (2017), https://library.ifla.org/id/eprint/1742/1/138-goulding-en.pdf. 21 institute of museum and library services, public libraries survey, 2016, https://www.imls.gov/research-evaluation/data-collection/public-libraries-survey. 22 maria cahill, soohyung joo, mary howard, and suzanne walker, “we’ve been offering it for years, but why do they come? the reasons why adults bring young children to public library storytimes,” libri 70, no. 4 (2020), 335–44; peter andrew de vries, “parental perceptions of music in storytelling sessions in a public library,” early childhood education journal 35, no. 5 (2008): 473–78; goulding and crump, “developing inquiring minds.” 23 courtney k. blackwell, ellen wartella, alexis r. lauricella, and michael b. robb, technology in the lives of educators and early childhood programs: trends in access, use, and professional development from 2012 to 2014 (chicago: northwestern school of communication, 2015). 
24 campbell and koester, “new media in youth librarianship.” 25 campbell, haines, koester, stoltz, media mentorship in libraries serving youth; prachi e. shah et al., “daily television exposure, parent conversation during shared television viewing and socioeconomic status: associations with curiosity at kindergarten,” plos one 16, no. 10 (2021), e0258572. 26 nelson and braafladt, technology and literacy. 27 roger a. stewart et al., “enhanced storytimes: effects on parent/caregiver knowledge, motivation, and behaviors,” children and libraries 12, no. 2 (2014): 9–14; scott graham and andré gagnon, “a quasi-experimental evaluation of an early literacy program at the regina public library/évaluation quasi-expérimentale d'un programme d'alphabétisation des jeunes enfants à la bibliothèque publique de regina,” canadian journal of information and library science 37, no. 2 (2013): 103–21. 28 maria cahill and erin ingram, “instructional asides in public library storytimes: mixed methods analyses with implications for librarian leadership,” journal of library administration 61, no. 4 (2021): 421–38. 29 leigh disney and gretchen geng, “investigating young children’s social interactions during digital play, early childhood education journal (2021): 1–11. 30 hernandez, markovitz, estrera, and kelly, “the uses of technology”; karen daniels et al., “early years teachers and digital literacies: navigating a kaleidoscope of discourses,” education and information technologies 25, no. 4 (2020): 2415–26. 31 goulding, shuker, and dickie, “media mentoring through digital storytimes.” 32 neuman, moland, and celano, “bringing literacy home,” 58. 33 daniels et al., “early years teachers and digital literacies.” https://library.ifla.org/id/eprint/1742/1/138-goulding-en.pdf https://www.imls.gov/research-evaluation/data-collection/public-libraries-survey information technology and libraries june 2023 technology integration in storytime programs 20 cahill, ingram, and joo 34 goulding, shuker, and dickie, “media mentoring through digital storytimes.” 35 association for library service to children, competencies for librarians serving children in libraries (2020) https://www.ala.org/alsc/edcareeers/alsccorecomps; american library association, code of ethics of the american library association (2021), https://www.ala.org/tools/ethics 36 jenna herdzina and alexis r. lauricella, “media literacy in early childhood report,” child development 101 (2020): 10; sara ayllon et al., digital diversity across europe: policy brief september 2021 (digigen project, 2021), https://www.digigen.eu/news/digital-diversityacross-europe-recommendations-to-ensure-children-across-europe-equally-benefit-fromdigital-technology/. 37 goulding and crump, “developing inquiring minds”; nancy l. kewish, “south euclid’s pilot project for two-year-olds and parents,” school library journal 25, no. 7 (1979): 93–97. 38 auxier, anderson, perrin, and turner, parenting children; rideout and robb, the common sense census. 39 neuman, moland, and celano, “bringing literacy home”; goulding, shuker, and dickie, “media mentoring through digital storytimes.” 40 koester, ed., young children, new media, and libraries. 41 us department of education, office of educational technology, policy brief on early learning and use of technology, 2016, https://tech.ed.gov/files/2016/10/early-learning-tech-policybrief.pdf. 42 herdzina and lauricella, “media literacy in early childhood report.” 43 rebekah willett, june abbas, and denise e. 
agosto, navigating screens (blog), https://navigatingscreens.wordpress.com.

articles are ivy league library website homepages accessible? wenfan yang, bin zhao, yan quan liu, and arlene bielefield information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.11577 wenfan yang (youngwf@126.com) is a master’s student in the school of management, tianjin university of technology, china. bin zhao (andy.zh@126.com) is professor in the school of management, tianjin university of technology, china. yan quan liu (liuy1@southernct.edu) is professor in information and library science at southern connecticut university and special hired professor of tianjin university of technology. arlene bielefield (bielefielda1@southernct.edu) is professor in information and library science at southern connecticut university. copyright © 2020. abstract as a doorway for users seeking information, library websites should be accessible to all, including those who are visually or physically impaired and those with reading or learning disabilities. in conjunction with an earlier study, this paper presents a comparative evaluation of ivy league university library homepages with regard to the americans with disabilities act (ada) mandates. data results from wave and achecker evaluations indicate that although the error of missing form labels still occurs in these websites, other known accessibility errors and issues have been significantly improved from five years ago. introduction an academic library is “a library that is an integral part of a college, university, or other institution of postsecondary education, administered to meet the information and research needs of its students, faculty, and staff.”1 people living with physical disabilities face barriers whenever they enter a library. many blind and visually impaired persons need assistance when visiting a library to do research. in such cases, searching the collection catalog, periodical indexes, and other bibliographic references is frequently conducted by a librarian or the person accompanying that individual to the library. thus, professionals in these institutions can advance the use of academic libraries for the visually impaired, physically disabled, hearing impaired, and people with learning disabilities.
library websites are libraries’ virtual front doors for all users pursuing information from libraries. fichter stated that the power of the website is in its popularization.2 access by everyone regardless of disability is an essential reason for its popularization. whether users are students, parents, senior citizens, or elected officials navigating the library website to find resources or signing up for computer courses at the library, the website can be either a liberating or a limiting experience.3 according to the web accessibility initiative (https://www.w3.org/wai/), website accessibility means that people with disabilities can use the websites. more specifically, website accessibility means that people with disabilities can perceive, understand, navigate, and interact with websites and that they can contribute to the websites. incorporating accessibility into website design enables people with disabilities to enjoy the benefits of websites to the same extent as anyone else in their community. this study evaluated the current state of the accessibility of university websites of the american ivy league university libraries using guidelines established by the americans with disabilities act (ada) for those who are visually or physically impaired or who have reading or learning disabilities. section 508 of the rehabilitation act and the web content accessibility guidelines (wcag) from the world wide web consortium (w3c) provide guidelines for website developers that define what makes a website accessible to those with physical, sensory, or cognitive disabilities. since a broad array of disabilities is recognized under the ada, websites seeking to be compliant with the ada should use the act’s technical criteria for website design. this study used two common accessibility evaluation tools—wave and achecker—for both section 508 and the wcag version 2.0 level aa. among universities in the united states, the eight ivy league universities—brown, columbia, cornell, dartmouth, harvard, princeton, university of pennsylvania, and yale—all have a long and distinguished history, strict academic requirements, high-quality teaching, and high-caliber students. because of their good reputations, they are expected to lead by example, not only in terms of academic philosophy and campus atmosphere, but also by the accessibility of their various websites. of course, any library website, whether an urban public library or a university library, should be accessible to everyone. hopefully, this study of their accessibility can enlighten other universities on how to better develop and maintain library websites so that individuals with disabilities can enjoy the same level of accessibility to academic knowledge as everyone else. literature review in 1999, schmetzke reported that emerging awareness about the need for accessible website design had not yet manifested itself in the actual design of library websites. for example, at the fourteen four-year campuses within the university of wisconsin system, only 13 percent of the libraries’ top-level pages (homepages plus the next layer of library pages linked to them) were free of accessibility problems.4 has this situation changed in the last twenty years?
to answer this question, a number of authors have suggested various methods for evaluating software/hardware for accessibility and usability.5 included in the process of compiling data is “involving the user at each step of the design process. involvement typically takes the form of an interview and observation of the user engaged with the software/hardware.”6 providenti & zai conducted a study in 2007 focused on providing an update on the implementation of website accessibility guidelines of kentucky academic library websites. they tested the academic library homepages of bachelor-degree-granting institutions in kentucky for accessibility compliance using watchfire’s webxact accessibility tester and the w3c’s html validator. the results showed that from 2003 to 2007, the number of library homepages complying with basic accessibility guidelines was increasing.7 billingham conducted research on edith cowan university (ecu) library websites. the websites were tested twice, in october 2012 and june 2013, using automated testing tools such as code validators and color analysis programs, resulting in findings that 11 percent of the guidelines for wcag 2.0 level a to level aa were passed in their first test. additionally, there was a small increase in the percentage of wcag 2.0 guidelines passed by all pages tested in the second test.8 while quite a few research studies focus on library website accessibility rather than the university websites, the conclusions diverge. tatiana & jeremy (2014) tested 509 webpages at a large public university in the northeastern united states using wave (http://wave.webaim.org) and cynthia says (http://www.cynthiasays.com). the results indicated that 51 percent of those webpages passed automated website accessibility tests for section 508 compliance with cynthia says. however, when using wave for wcag priority 1 compliance, which is a more rigorous evaluation level, only 35 percent passed the test.9 maatta smith reported that not one of the websites of 127 us members of the urban library council (ulc) was without errors or alerts, with the average number of errors being 27.10 such results were similar to those reported by liu.11, 12 they also found that about half (58 of 127) of the urban public libraries provided no information specifically for individuals with disabilities. of the 127 websites, some were confusing, using a variety of verbiage to suggest services for individuals with disabilities. sixty-six of them provided some information about services within the library for individuals with disabilities. the depth of the information varied, but in all instances contact information was included for additional assistance. liu, bielefield, and mckay examined 122 library homepages of ulc members and reported on three main aspects. first, only seven of them presented as error free when tested for compliance with the 508 standards. the highest percentage of errors occurred in accessibility sections 508(a) and 508(n). second, the number of issues was dependent on the population served. that means libraries serving larger populations tend to have more issues with accessibility than those serving smaller ones.
third, the most common errors were missing label and contrast errors, while the highest number of alerts was related to the device-dependent event handler, which means that a keyboard or mouse is a necessary piece of equipment to initiate a desired transaction.12 although they were interested in overall website accessibility, theofanos and redish focused their research on the visually impaired website user. the authors investigated and revealed six reasons to bridge the gap between accessibility and usability. the six reasons were:
1. disabilities affect more people than you may think. worldwide, 750 million people have a disability, and three of every ten families are touched by a disability. in the united states, one in five have some kind of disability, and one in ten have a severe disability. that’s approximately 54 million americans.
2. it is good business. according to the president’s committee on the employment of people with disabilities, the discretionary income of people with disabilities is $175 billion.
3. the number of people with disabilities and income to spend is likely to increase. the likelihood of having a disability increases with age, and the overall population is aging.
4. the website plays an important role and has significant benefits for people with disabilities.
5. improving accessibility enhances usability for all users.
6. it is morally the right thing to do.13
lazar, dudley-sponaugle, and greenidge validated that most blind users were just as impatient as most sighted users. they want to get the information they need as quickly as possible. they don’t want to listen to every word on the page just as sighted users do not read every word.14 similarly, foley found that using automated validation tools did not ensure complete accessibility. students with low vision found many of the pages hard to use even though they were validated.15 outcomes of all the research revealed that most university library websites have developed a policy on website accessibility, but the policies of most universities had deficiencies.16 library staff must be better informed and trained to understand the tools available to users, and when reviewing web pages, the audiences of all kinds must be considered.17 research design and methods this study, as a continuing effort from an earlier study on urban library websites, made use of content analysis methodology to examine the website accessibility of the university libraries against the americans with disabilities act (ada), with a focus on those with visual or cognitive disabilities.18 under the ada, people with disabilities are guaranteed access to all postsecondary programs and services. the evaluation of accessibility focuses on the main pages of these university library websites, as shown in table 1, because these homepages considerably demonstrate the institution’s best effort or, at least, most recent redesigns. it was the intent of the authors of this research to reveal the current status of the ivy league library homepages’ accessibility and the importance that ivy league universities attach to the accessibility of their websites. commonly recognized website evaluators (wave, achecker, and cynthia says), along with other online tools, evaluate a website’s accessibility by checking its html and xml code.
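the kind of markup inspection these evaluators automate can be illustrated with a short script. the sketch below is only a simplified illustration, not the actual logic of wave, achecker, or cynthia says: it assumes the python packages requests and beautifulsoup4 are installed, the function name quick_accessibility_scan is invented for this example, and it tests just two of the many conditions the full tools cover (images lacking alternative text and form controls lacking an associated label or aria name).

import requests
from bs4 import BeautifulSoup

def quick_accessibility_scan(url):
    # fetch the homepage and parse its html
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    # images with no alt attribute give a screen reader nothing to announce
    # (this simplification also flags intentionally empty alt="" on decorative images)
    images_missing_alt = [img for img in soup.find_all("img") if not img.get("alt")]

    # form controls should have an associated <label for=...>, an aria-label,
    # or an aria-labelledby; this check ignores labels that wrap their control
    labeled_ids = {lab.get("for") for lab in soup.find_all("label") if lab.get("for")}
    unlabeled_controls = [
        ctrl for ctrl in soup.find_all(["input", "select", "textarea"])
        if ctrl.get("type") not in ("hidden", "submit", "button", "image")
        and ctrl.get("id") not in labeled_ids
        and not ctrl.get("aria-label")
        and not ctrl.get("aria-labelledby")
    ]

    return {"images missing alt text": len(images_missing_alt),
            "form controls without labels": len(unlabeled_controls)}

# one of the homepages listed in table 1
print(quick_accessibility_scan("https://library.brown.edu"))

production evaluators apply the complete section 508 and wcag rule sets and report many more categories (errors, alerts, features, structural elements, and aria usage), but the basic approach is the same: fetch the page, parse the markup, and test each element against the guidelines.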
wave and achecker were selected for this study for the robustness of their evaluation based on w3c guidelines, comprehensiveness of evaluation reporting, and ready availability to any institution or individual conducting website evaluations. wave is a web evaluation tool that was utilized to check websites against section 508 standards and wcag 2.0 guidelines. this assessment was conducted by entering a uniform resource locator (url), or website address, in the search box. the evaluation tool provided a summary of errors, alerts, features, structural elements, html5 and aria. achecker is a tool to check single html page content for conformance with accessibility standards to ensure the content can be accessed by everyone. it produces a report of all accessibility problems for the selected guidelines by three types of problems: known problems, likely problems, and potential problems. both wave and achecker help website developers make their website content more accessible. data from different periods were compared to show statistically whether enough attention was paid to accessibility issues by the ivy league university systems. the study team collected the first data set in february 2014, using wave for section 508. in 2018, achecker accessibility checker was used for both section 508 and wcag 2.0 aa. the access board published new requirements for information and communication technology covered by section 508 of the rehabilitation act (https://www.access-board.gov/guidelines-and-standards/communications-and-it/about-the-ict-refresh) on january 18, 2017. the latest wcag 2.0 guidelines were updated on september 5, 2013 (https://www.w3.org/tr/wcag2ict/). while the wave development team indicated that they have updated the indicators in wave regarding wcag 2.0, the current indicators regarding section 508 refer to the previous technical standards for section 508, not the updated 2017 ones. according to achecker.ca, the versions of the section 508 standards and wcag 2.0 aa guidelines used were published on march 12, 2004 and june 19, 2006, respectively, with neither being the latest versions. this study centered on three research questions:
1. are the library websites of the eight ivy league universities ada compliant?
2. are there easily identified issues that present barriers to access for the visually impaired on the ivy league university library homepages?
3. what should ivy league libraries do to achieve ada compliance and to maintain it?
table 1. investigated websites of ivy league university libraries (library: website address).
brown university library: https://library.brown.edu
columbia university libraries: http://library.columbia.edu
cornell university library: https://www.library.cornell.edu
dartmouth library: https://www.library.dartmouth.edu
harvard library: http://library.harvard.edu
princeton university library: http://library.princeton.edu
penn libraries: http://www.library.upenn.edu
yale university library: https://web.library.yale.edu
results & discussion all five evaluation categories employed by wave for section 508 standards, as shown in figure 1, were examined, with a more in-depth review of the homepage of the university of pennsylvania library.
similar results in numbers of the five categories are presented in the library homepages of brown university, columbia university, and cornell university. interestingly, wave indicates more errors and alerts on the homepage of yale university. figure 1. wave results for section 508 standards. in order to determine the accuracy of the results, the team also used achecker to reevaluate these homepages in the year 2018. known problems as the category in achecker are as serious as errors in wave. they have been identified with certainty as accessibility barriers by the website information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 6 evaluators and need to be fixed. likely problems are problems that could be barriers which require a human to decide whether there is a need to fix them. achecker cannot identify potential problems and requires a human to confirm if identified problems need remediation. figure 2 shows the numbers for each category as detected by achecker on june 18, 2018, on the eig ht ivy league university libraries’ homepages. the library homepage of the university of pennsylvania was found to contain the most, which was the same as the result from wave. however, among the seven remaining libraries’ homepages, the homepage of harvard university library displayed the same number of problems as the university of pennsylvania detected by achecker. figure 2. achecker results for section 508 standards. there was significant improvement between 2014 and 2018 the results of this research from wave for section 508 standards signify a significant shift in the accessibility of these websites between the years of 2014 and 2018. among the five wave detection categories in the eight library homepages, the total of errors and alerts decreased during this period. for instance, the total number of errors was 36 in 2014 decreasing to 11 in 2018, and the number of alerts decreased from 141 to 14. figure 3 shows the number of errors in each library homepage, and figure 4 shows the number of alerts. they all show a downward trend from 2014 to 2018. but features, structural elements and html/aria were all on the rise when comparing the two years’ data sets. the green sections in table 2 indicate a decrease of the numbers in three categories from 2014 to 2018, and the yellow sections indicate an increase in numbers. these data results revealed that errors and alerts, the most common problems related to access, had been better controlled during these years, while others might still remain unchanged. information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 7 figure 3. change of errors from 2014 to 2018. figure 4. change of alerts from 2014 to 2018. table 2. changes of features, structural elements, and html/aria between 2014 and 2018. categories features structural elements html/aria year of data collection 2014 2018 2014 2018 2014 2018 total 108 191 184 233 24 89 brown university library 13 15 6 13 0 1 columbia university libraries 12 13 23 14 17 0 cornell university library 5 6 20 18 0 4 dartmouth library 10 8 15 27 0 23 harvard library 20 20 14 24 0 4 princeton university library 15 31 45 24 0 3 penn libraries 12 90 29 104 7 50 yale university library 21 8 32 9 0 4 missing form labels were the top error against the ada the data used in the analysis below were all the test data collected in 2018. all errors appearing in data results were collected and analyzed. 
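before turning to the 2018 error breakdown, the direction of change for each wave category can be recomputed from the totals reported above; the short python sketch below is purely illustrative and uses only the figures given in the text and table 2.

```python
# Totals across the eight homepages as reported in the text and table 2
# (2014 value, 2018 value) per WAVE category; the per-library rows of
# table 2 could be handled the same way.
totals = {
    "errors":              (36, 11),
    "alerts":              (141, 14),
    "features":            (108, 191),
    "structural elements": (184, 233),
    "html/aria":           (24, 89),
}

for category, (y2014, y2018) in totals.items():
    change = y2018 - y2014
    pct = 100 * change / y2014
    trend = "decrease" if change < 0 else "increase"
    print(f"{category}: {y2014} -> {y2018} ({trend}, {pct:+.0f}%)")
```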
figure 5 shows the number of errors that were identified based on the specific requirements contained in section 508 of the rehabilitation act as evaluated by wave. information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 8 figure 5. occurrences of specific error per specific 508 standards. the term error refers to accessibility errors that need to be fixed. missing form label was the highest frequency error type shown. only two types of errors occurred in ivy league university libraries’ homepages. but these errors didn’t appear on every homepage. there are several errors in some homepages while others had no errors. for example, linked image missing alternative text occurred on the library homepage of harvard university twice. table 3 shows the distribution of errors in eight homepages. table 3. distribution of errors in eight homepages. missing form label linked image missing alternative text brown university library columbia university libraries 1 cornell university library dartmouth library 3 harvard library 2 princeton university library penn libraries 1 yale university library 4 missing form label is listed in section 508 (n) and means there is a form control without a corresponding label. this is important because if a form control does not have a properly associated text label, the function or purpose of that form control may not be presented to sc reen reader users. linked image missing alternative text occurred only in the harvard library homepage among the eight ivy league university libraries’ homepages. it indicated that an image without alternative text results in an empty link. if an image is within a link that does not provide alternative text, a screen reader has no content to present to the user regarding the function of the link. these website accessibility issues may be easy fixes and considered minor to some; however, if they are not detected, they are major barriers for persons living with low vision or blindness. as a result, users are left at a disadvantage because they are lacking critical information to successfully fulfill their needs. examples of such error icons in wave are displayed in figures 6 and 7. information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 9 figure 6. missing form label icon from yale university library homepage. figure 7. linked image missing alternative text icon from harvard library homepage. a total of eleven errors, as shown in figure 8, were located on the homepages of the eight ivy league libraries and illustrated the number of errors that occurred in each library homepage. the average number of errors for each homepage was 1.375. yale university library homepage had the most errors with a total of four. library homepages of brown university, cornell university and princeton university performed best with zero errors. figure 8. the total of errors in ivy league libraries’ homepages. information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 10 six alerts appear among ada requirements the issues that alerts identify are also significant for website accessibility. figure 9 shows there are six different kinds of alerts that were identified based on the specific requirements contained in section 508 of the rehabilitation act. figure 9. occurrences of specific alert per specific 508 standards. the noscript element was the most encountered alert issue. 
alerts that wave reports need close scrutiny, because they likely represent an end-user accessibility issue. the noscript element is related to the 508 (l) requirement and means that noscript content is presented when javascript is disabled. because almost all users of screen readers and other assistive technologies have javascript enabled, noscript cannot be used to provide an accessible version of inaccessible scripted content. skipped heading level ranked second in number. the importance of headings is in their provision of document structure and facilitation of keyboard navigation for users of assistive technology. these users may be confused or may experience difficulty navigating when heading levels are skipped. example icons of these alerts, as evaluated by wave and shown in figures 10 and 11, mark these elements as potential barriers to accessibility. figure 10. noscript element icon from cornell university library homepage. figure 11. skipped heading level icon from dartmouth library homepage. a total of fourteen alert problems were detected. figure 12 illustrates the number of alerts that occurred on each library homepage. on average, there were 1.75 alerts present on the eight websites. the library homepages of yale university and the university of pennsylvania had the most alerts with 4 on each site. only the brown university library’s homepage had zero alerts. figure 12. the total of alerts in ivy league libraries’ homepages. linked image with alternative text was the most frequently found feature issue features as a category of issues indicates conditions of accessibility that probably need to be improved and usually require further verification and manual fixing. for example, if a feature is detected on a website, further manual verification is required to confirm its accessibility. figure 13 shows the number of features that were identified, based on the specific requirement contained in section 508 of the rehabilitation act. figure 13. occurrences of specific features per specific 508 standards. linked image with alternative text, which is a 508 (a) requirement, was shown to be the most encountered feature issue. this means that alternative text should be present for an image that is within a link. by including appropriate alternative text on an image within a link, the function and purpose of the link and the content of the image are available to screen reader users even when images are unavailable. another frequently occurring feature was form label, which means a form label is present and associated with a form control. a properly associated form label is presented to a screen reader user when the form control is accessed. these evaluation steps were the same ones used for errors and alerts. example icons of features evaluated by wave are displayed as figures 14 and 15. figure 14. linked image with alternative text icon from brown university library homepage. figure 15. form label icon from penn libraries homepage. this study also ranked the number of features that were detected by wave in the eight ivy league library homepages. figure 16 displays the number of features that occurred on each library homepage.
in total there were 191 features detected by wave in the eight ivy league university libraries’ homepages. the homepage of the university of pennsylvania library was found to have 90 features, by far the most of all the libraries. no library was entirely free of features according to the wave measurement using section 508 standards. figure 16. the total of features in ivy league libraries’ homepages. information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 14 table 4a. comparison between wave & achecker section 508 standards on brown and columbia’s library homepages. section 508 standards brown university columbia university wave achecker wave achecker april june april june april june april june total 33 29 47 47 28 29 79 83 a 9 9 9 9 12 13 12 14 b c 14 14 26 28 d 8 8 14 14 e f g h i j 8 8 14 14 k l 6 6 12 12 m n 1 1 1 1 1 1 o 23 19 1 1 15 15 1 1 p table 4b. comparison between wave & achecker section 508 standards on cornell and dartmouth’s library homepages. section 508 standards cornell university dartmouth college wave achecker wave achecker april june april june april june april june total 30 29 107 106 59 68 65 67 a 2 2 2 2 8 8 10 11 b c 36 36 22 23 d 32 32 9 9 e f g h i j 33 32 9 9 k l 3 3 7 7 m n 7 7 23 29 8 8 o 21 20 1 1 28 31 p information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 15 table 4c. comparison between wave & achecker section 508 standards on harvard and princeton’s library homepages. section 508 standards harvard university princeton university wave achecker wave achecker april june april june april june april june total 51 51 139 139 57 61 74 74 a 20 20 29 29 25 25 20 20 b c 43 43 32 32 d 32 32 10 10 e f g h i j 34 34 10 10 k l 1 1 m n 5 5 3 7 o 26 26 1 1 29 29 1 1 p table 4d. comparison between wave & achecker section 508 standards on pennsylvania and yale’s library homepages. section 508 standards university of pennsylvania yale university wave achecker wave achecker april june april june april june april june total 253 249 129 139 28 29 84 85 a 40 37 14 19 6 7 4 5 b c 82 87 28 28 d 11 11 21 21 e f g 1 1 h i j 11 11 21 21 k l 1 1 9 9 3 3 4 4 m 3 2 n 103 104 1 1 8 8 4 4 o 106 105 1 1 11 11 1 1 p information technology and libraries june 2020 are ivy league library website homepages accessible? | liu 16 a few 508 standards deviate from comparison between two evaluators to determine whether the wave tool missed some specific requirements in section 508, the authors comparatively examined these eight university homepages using both wave and achecker from one site to another synchronously in april and again in june 2019. there are sixteen principles in section 508. they are arranged from a to p. tables 4a–4d indicate issues for these section 508’s requirements in the eight universities’ homepages respectively. except the requirement g for yale library homepage which shows one issue in achecker, in neither wave nor achecker during the time we conducted our examination, there was no issue found for the seven requirements (b, e, f, h, i, k, and p) below: b. equivalent alternatives for any multimedia presentation shall be synchronized with the presentation; e. redundant text links shall be provided for each active region of a server-side image map; f. client-side image maps shall be provided instead of server-side image maps except where the regions cannot be defined with an available geometric shape; h. 
markup shall be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers; i. frames shall be titled with text that facilitates frame identification and navigation; k. a text-only page, with equivalent information or functionality, shall be provided to make a website comply with the provisions of this part, when compliance cannot be accomplished in any other way. the content of the text-only page shall be updated whenever the primary page changes; p. when a timed response is required, the user shall be alerted and given sufficient time to indicate more time is required. the results tabulated in tables 4a–4d indicate that these seven section 508 requirements are probably not problematic for these websites. conclusions based on the results, this study determined that the eight ivy league universities’ homepages exhibited some issues with accessibility for people with disabilities. considerable effort is necessary to ensure their websites are ready to meet the challenges and future needs of web accessibility. users with visual impairments can navigate a website with assistive technology only when it is designed to be accessible. while each institution presented both general and comprehensive coverage of services for users with disabilities, it would have been more practical and efficient if specific links were posted on the homepage. according to the american foundation for the blind (https://www.afb.org), “usability” is a way of describing how easy a website is to understand and use. accessibility refers to how easily a website can be used, understood, and accessed by people with disabilities. this study has concluded that expertise and specialized training and skill are still needed in this area. principles of accessible website design must be introduced and taught, underscoring that design matters for people with disabilities just as it does in the physical environment. as highlighted earlier through the evaluation tool wave, most of the problems detected can be fixed with the solutions the tool provides. frequent review is critical, and websites should be assessed at least yearly for accessibility compliance. there is much to be done if accessibility is to be realized for everyone. limitations the authors recognize that this study, using free website accessibility testing tools, has certain limitations. as wave notes in its help pages, the aim for website developers is not to eliminate every category of item the tool identifies (only errors must be fixed) but to determine whether a website is accessible. at the time of writing, neither wave nor achecker had been updated with the latest wcag 2.1 aa rules. while wcag 2.1 is expected to provide new guidelines for making websites even more accessible, more careful and comprehensive studies against the wcag 2.1 aa rules could further assist university library professionals and their website developers in providing accessible websites to those with disabilities. moreover, while machine-generated evaluations are efficient, it is equally important that researchers check the issues manually and apply human analysis in determining the major issues with content. endnotes 1 joan m. reitz, odlis: online dictionary for library and information science (westport, ct: libraries unlimited, 2004), 1–2.
2 darlene fichter, “making your website accessible,” online searcher 37, no. 4 (2013): 73–76. 3 fichter, “making your website accessible,” 74. 4 axel schmetzke, web page accessibility on university of wisconsin campuses: a comparative study (stevens point, wi, 2019). 5 jeffrey rubin and dana chisnell, handbook of usability testing: how to plan, design, and conduct effective tests (idaho: wiley, 2008), 6–11. 6 alan foley, “exploring the design, development and use of websites through accessibility and usability studies,” journal of educational multimedia and hypermedia 20, no. 4 (2011): 361–85, http://www.editlib.org/p/37621/. 7 michael providenti and robert zai iii, “web accessibility at kentucky’s academic libraries,” library hi tech 25, no. 4 (2007): 478–93, https://doi.org/10.1108/07378830710840446. 8 lisa billingham, “improving academic library website accessibility for people with disabilities,” library management 35, no. 8/9 (2014): 565–81, https://doi.org/10.1108/lm-11-2013-0107. 9 tatiana i. solovieva and jeremy m. bock, “monitoring for accessibility and university websites: meeting the needs of people with disabilities,” journal of postsecondary education and disability 27, no. 2 (2014): 113–27, http://search.proquest.com/docview/1651856804?accountid=9744. 10 stephanie l. maatta smith, “web accessibility assessment of urban public library websites,” public library quarterly 33, no. 3 (2014): 187–204, https://doi.org/10.1080/01616846.2014.937207. 11 yan quan liu, arlene bielefeld, and peter mckay, “are urban public libraries’ websites accessible to americans with disabilities?,” universal access in the information society 18, no. 1 (2019): 191–206, https://doi.org/10.1007/s10209-017-0571-7. 12 liu, bielefeld, and mckay, “are urban public library websites accessible.” 13 mary frances theofanos and j. redish, “bridging the gap: between accessibility and usability,” interactions 10, no. 6 (2003): 36–51, https://doi.org/10.1145/947226.947227. 14 jonathan lazar, a. dudley-sponaugle, and k. d. greenidge, “improving web accessibility: a study of webmaster perceptions,” computers in human behavior 20, no. 2 (2004): 269–88, https://doi.org/10.1016/j.chb.2003.10.018. 15 foley, “exploring the design,” 365. 16 david a. bradbard, cara peters, and yoana caneva, “web accessibility policies at land-grant universities,” internet & higher education 13, no. 4 (2010): 258–66, https://doi.org/10.1016/j.iheduc.2010.05.007. 17 mary cassner, charlene maxey-harris, and toni anaya, “differently able: a review of academic library websites for people with disabilities,” behavioral & social sciences librarian 30, no. 1 (2011): 33–51, https://doi.org/10.1080/01639269.2011.548722. 18 liu, bielefeld, and mckay, “are urban public library websites accessible,” 195.
ontology for the user-learner profile personalizes the search analysis of online learning resources: the case of thematic digital universities marilou kordahi information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.13601 marilou kordahi (marilou_kordahi@yahoo.fr) is assistant professor, faculty of business administration and management, saint-joseph university of beirut, and associate researcher, paragraph research laboratory, paris 8 university. © 2022. abstract we hope to contribute to the field of research in information technology and digital libraries by analyzing the connections between thematic digital universities and digital user-learner profiles. thematic digital universities are similar to digital libraries, and focus on creating and indexing open educational resources, as well as improving learning in the information age. the digital user profile relates to the digital representation of a person’s identity and characteristics. in this paper we present the design of an ontology for the digital user-learner profile (ontoulp) and its application program. ontoulp is used to structure a user-learner’s digital profile. the application provides each user-learner with tailor-made analyses based on informational behaviors, needs, and preferences. we rely on an exploratory research approach and on methods of ontologies, user modeling, and semantic matching to design the ontoulp and its application program. any user-learner could use the ontoulp and its application program.
introduction more online learning environments are supporting the creation and dissemination of quality open educational resources (oer) to facilitate change in the education sector, improve education, ensure longlife learning, reduce cost, and other motives.1 in 2002, the united nations educational, scientific and cultural organization (unesco) recommended the definition of oer as follows: “the open provision of educational resources, enabled by information and communication technologies, for consultation, use and adaptation by a community of users for non-commercial purposes.”2 the william and flora hewlett foundation defined oer as “freely licensed, remixable learning resources—[they] offer a promising solution to the perennial challenge of delivering high levels of student learning at lower cost.”3 in 2012, unesco noted that oer offer education stakeholders an opportunity to access textbooks and other learning contents to enhance their knowledge and professional experiences.4 education stakeholders may choose oer based on their informational needs, behaviors, and preferences.5 we hope to contribute to the field of research in information technology and digital libraries by analyzing the connections between thematic digital universities and digital user-learner profiles. we are conducting a case study using the digital university engineering and technology.6 in the following we will explain these topics and the interest in the digital university engineering and technology. in 2003, the french ministry of higher education, research, and innovation initiated the creation of thematic digital universities to facilitate the integration and use of information and mailto:marilou_kordahi@yahoo.fr information technology and libraries june 2022 ontology for the user-learner profile | kordahi 2 communication technologies for education in university teaching practices.7 in total, there are six thematic digital universities which are organized by broad disciplines: health sciences and sports, engineering sciences, environment and sustainable development, humanities, economics and management, as well as technical studies. thematic digital universities are similar to digital libraries, and focus on creating and indexing oer, as well as improving learning in the information age.8 although thematic digital libraries are mostly comprised of oer, they also develop complete training programs with some of these resources (e.g., massive open online courses, or moocs). they are partners with canal-u, the video library for higher education, as well as the french national platform for massive open online courses (fun-mooc). thematic digital universities are mostly created for learners and teachers, as they offer complementary educational resources to bachelor, masters, and doctoral programs.9 to date, learners and teachers have free access to most thematic digital universities and corresponding educational resources. registration is not required; however, without registration neither the learner nor the teacher can analyze her/his search for oer based on informational behaviors, needs, and preferences.10 we will focus on the analysis of oer metadata records in the context of thematic digital universities. each oer in the repository holds a metadata record to precisely describe its specifications to the learner or teacher (e.g., the learning level, language, and topics). 
specifications are written according to the institute of electrical and electronics engineers (ieee) standards for learning object metadata (lom),11 lomfr, and suplomfr. lom provides an accurate descriptive schema of a learning object suitable for educational resources12 (e.g., the classification and identification of an educational resource). lomfr and suplomfr are currently applications of lom in the french educational community.13 the digital university engineering and technology attracted our attention because of the following characteristics: clear presentation of its objectives, regular information updates, priority for free access to oer and open data, 3,000 published educational resources, extensive documentation of oer indexing, interoperability of oer and metadata records, and an advanced search engine for oer. each metadata record describes precise information on the oer, including the main title, keywords, descriptive text, educational types (or resources), learning level, copyrights, knowledge domains, topics, authors, and publishers. it is processed and structured with xml language which is human-readable and machine-readable. digital user profiles relate to the digital representation of a person’s identity and characteristics.14 digital identity is the sum of digital traces (or “footprints”) relating to an individual or a community found on the web or in digital systems. digital traces correspond to the user’s profile, browsing history, and contribution actions.15 our focus is the learner who wishes to use the thematic digital universities for tailor-made analysis of retrieved information based on her/his needs and preferences. we offer the learner an option to register on these platforms to track behavior over time while searching for oer. analyses are based on criteria the learner has previously chosen to personalize this search. subsequently, we suggest using the term “digital user-learner profile.” we will do our best to respect the general data protection regulations when collecting information on the digital userlearner profile.16 the general data protection regulations are privacy laws drafted and passed by information technology and libraries june 2022 ontology for the user-learner profile | kordahi 3 the european union that prohibit the processing, storage, or sharing of certain types of information about individuals without their knowledge and consent. the research questions are as follows: 1. in the context of thematic digital universities, how can a user-learner personalize the search for open educational resources according to her/his digital profile? 2. in this same context, what kinds of information can a user-learner analyze in a search for open educational resources according to her/his digital profile? the objectives of this article are to present the preliminary results of work in progress on the design of the ontology for the digital user-learner profile (ontoulp) and its application program, the personalized modeling system for the user-learner profile (psul). we rely on the methods of ontology,17 user modeling,18 and semantic matching.19 the method of ontology is used to describe in a formal manner a set of concepts and objects which represent the meaning of an information system in a specific area and the relationships between these concepts and objects.20 the method of user modeling describes the process of designing and changing a user’s conceptual understanding. it is applied to customize and adjust systems to meet the user’s needs and preferences. 
the method of semantic matching is used to identify and relate a meaning concept (or class) to its homologous concept in tree-like schemas and to consider the concept’s position in these schemas (e.g., mapping a class in an ontology to homologous concepts in metadata records). this relationship can be a one-to-one concept or one-to-many concepts. the ontoulp is a first approach, and it will be used to structure a user-learner’s digital profile in the context of thematic digital universities. we design this ontology for three main reasons: to structure collected and generated information21 (e.g., structuring a user-learner’s learning preferences will enable the identification of learning behaviors and activities), to analyze collected and generated information22 (e.g., analyzing generated information by a user-learner may predict a search for oer), as well as to facilitate relationships between a user-learner and thematic digital universities23 (e.g., analyzing user-learner informational behaviors may improve oer creation and dissemination). the psul will be designed as an application program for the ontoulp. it will be used to provide each user-learner with tailor-made analyses based on informational behaviors, needs, and preferences. psul will include a secure database and web pages, namely those for registering and editing the user-learner profile and its dashboard.24 ontoulp and its application program will offer each registered user-learner an opportunity to analyze the search for oer according to informational behaviors and needs. ontoulp and psul could be implemented in the structure of information systems for educational and research institutions, documentation and information centers, and many others. we will finetune our analysis by relying on a case example—the thematic digital universities. this article comprises six sections. first, we will explain the exploratory research carried out in the context of thematic digital universities. second, we will present the main published works related to the subject of the article. third, we will explain the approach followed to design and write the ontoulp. fourth, we will discuss the creation of the psul application program. fifth, we will demonstrate the integration of the designed ontology and its application program into a information technology and libraries june 2022 ontology for the user-learner profile | kordahi 4 mirror site to perform a technical test. finally, we will discuss the completed work before concluding the article. exploratory research approach this exploratory research is based on an analysis of the literature, a semistructured questionnaire, and an in-depth documentary research. we check the consistency of collected information and identify the need to personalize the search for oer as well as make tailor-made analysis of information. methods used during the first 18 months of the covid-19 pandemic (november 2020–may 2021), we conducted qualitative research to deepen our comprehension of the practices of thematic digital universities. we collected and interpreted primary and secondary information. primary information: we contacted the digital university association and their six thematic digital universities.25 because of their extensive expertise and robust knowledge in leading or managing thematic digital universities, directors and general secretaries were chosen to selfadminister an electronic semistructured questionnaire. we contacted seven individuals and received six responses. 
in this questionnaire, we asked about the following topics: the recent knowledge of thematic digital universities, conditions of access to oer, metadata records indexing as well as user-learner’s expectations. an example of the questionnaire is included in the appendix. secondary information: we analyzed a report by the french general inspectorate of the national education and research administration. we have also studied recently-published scientific articles by anne boyer (2011), deborah arnold (2018),26 and sihem zghidi and mokhtar ben henda (2020). the results and findings will be explained in the following paragraphs. results of information collection we have compared responses to the questionnaire and contents of published documents and articles. for the digital university in health sciences and sports, “resources are mostly accessible to learners from member universities, through an identification system based on the university email address.”27 only a few resources are open to the public. otherwise, according to comments gathered from the other four digital universities and digital university association, “thematic digital universities are part of global movements providing access to oer by promoting open access to knowledge.”28 they are an opportunity for learners to discover new disciplines and explore new areas.29 in fact, “the process for indexing metadata records meets standards for education, such as lom, lomfr and suplomfr.”30 at present, there is no feedback on the use of thematic digital universities platforms. in other words, “thematic digital universities have no information about learners who view oer, because there is no login and password. this is done on purpose to make them as open as possible.”31 these platforms are considered as a means of selftraining with quality assurance, as the documents have been produced and validated by higher education teachers. “thematic digital universities provide a certain flexibility allowing learners to work when and where they want.”32 information technology and libraries june 2022 ontology for the user-learner profile | kordahi 5 findings five thematic digital universities and the digital university association responded to the semistructured questionnaire. two thematic digital universities can track user-learners’ behaviors. these digital universities are related to the disciplines of health and sport in addition to technical studies. to date, four thematic digital universities cannot track user-learners’ interactions based on informational behaviors and preferences. ontoulp and its application program could be implemented in four thematic digital universities, which are related to the disciplines of engineering sciences, environment and sustainable development, humanities, economics, and management. literature review to our best knowledge, published research works addressing this research subject are limited in the context of thematic digital universities.33 we analyze the most recent ontologies and user modeling systems that are close to our research objectives. the main works we use are those of bloom et al. (1984),34 smythe et al. (2001),35 green and panzer (2009),36 and kordahi (2020),37 in addition to kelly and belkin (2002). the work methods and field studies these researchers have developed are useful to design the structure of the ontoulp and the model of its application program. in the following paragraphs, we will explain these works and the relationships with this research article. 
selection of recently published works in 2020 and 2021, kordahi designed an ontology and a personalized dashboard for user learners.38 the objectives of these works were to track individual searches for oer and compare them with a user-learner’s field of work. to design her ontology, kordahi relied on standardized ontologies and validated taxonomies which are used in online learning environments, namely the ims learner information profile (ims lip)39 and bloom’s taxonomy. the personalized dashboard was linked to the user-learner ontology. the designed dashboard was tested technically with its ontology in a digital library environment to examine its performance. kordahi used the methods of ontologies and semantic matching. learner model we are mostly interested in the learner model40 as it “is a model of the knowledge, difficulties and misconceptions of the individual [learner].” 41 as students learn the educational resources they find, the learner model is updated to display their current progress. the model can continue to tailor students’ interactions as they learn. there are several learner models, such as the ims lip.42 we examine the ims lip, which is based on a standardized data model describing a learner’s characteristics. it is mainly used to manage a student’s learning history to discover her/his learning opportunities. ims lip is made from 11 categories that gather learning information: “the identification, goals, qualifications and licenses, activity, interest, competency, accessibility, transcript, affiliation, security, and relationships.”43 this model has been successfully used by many renowned researchers (e.g., paquette 201044) to design a learner model and then adapt it to appropriate contexts. ims lip’s reliability, accuracy, and flexibility match well with the ontoulp motives. we will use it to begin designing the structure of the ontoulp and adapt it to the thematic digital universities context. we will also consider the ieee lom, lomfr, and suplomfr classification fields. this measure will be used to improve semantic matching between the ontoulp and oer metadata records. information technology and libraries june 2022 ontology for the user-learner profile | kordahi 6 taxonomy of educational objectives we examine the user-learner’s educational objectives to meet informational needs and expectations.45 in each oer metadata record, educational objectives are defined based on bloom’s taxonomy (e.g., “understand the context and rules of scientific publication” 46). bloom et al. have developed a taxonomy for educational objectives to classify statements teachers expected students to learn as a result of lessons and instructions. the researchers described a method for allowing students to achieve educational goals while carrying out exercises utilizing the resources of the environment. bloom et al. relied on in-depth qualitative studies to design and validate this taxonomy. bloom’s taxonomy contains the following six major categories related to the cognitive domain: knowledge, comprehension, application, analysis, synthesis, and evaluation. this taxonomy was revised in 2001 by lorin anderson et al.47 bloom’s taxonomy is still in use internationally as in the works of kordahi. integrating bloom’s taxonomy into the ontoulp will enhance the structure of a user-learner’s educational objectives. these educational objectives will be organized in six categories allowing the user-learner to refine her/his informational goals. 
therefore, we will create a mutual link between the user-learner’s educational objectives and oer educational objectives. knowledge domains knowledge organization systems48 are seen as a valuable component for searching for oer.49 our research includes analyses of oer metadata records to establish relationships between their knowledge topics and the user-learner’s topics of interest. in the thematic digital universities’ metadata records, a precise classification is reported respecting both knowledge topics and dewey decimal classification (e.g., geographic information systems (526.028 5)). 50 the dewey decimal classification and relative index 22nd edition,51 published in 2003 by the online computer library center,52 is being used worldwide in digital libraries and by the thematic digital universities.53 in their works published in 2009, green and panzer have developed an ontology to structure knowledge domains.54 this ontology recognizes two classes, which are dewey classes and knowledge topics. we selected the dewey decimal classification for the ontoulp because the thematic digital universities are already using it. we will rely on green and panzer’s ontology to structure the knowledge domains in the ontoulp (e.g., the use of dewey classes and knowledge topics). we will establish relationships between the knowledge domains and user-learner model, allowing the user-learner to choose the most appropriate learning topics. user modeling system the “user modeling system for personalized interaction and tailored retrieval” is useful for analyzing each user-learner’s informational needs and preferences.55 kelly and belkin’s system helps the user to track informational needs over time. it contains three classes of models and a set of interactions. the “general behavioral model” tracks information seeking and user behavior to determine informational needs. the “personal behavioral model” characterizes each user’s information search according to specific preferences and behaviors. the “topical models” are associated with concepts related to each user’s informational behaviors. this model is developed by renowned researchers specialized in information retrieval and corresponds to the objectives of the research article. we will use the structure of kelly and belkin’s model (2002) to design the psul application program, in the context of thematic digital information technology and libraries june 2022 ontology for the user-learner profile | kordahi 7 universities. relationships between both the psul and ontoulp ontology will be established to carry out personalized analyses of oer search. ontoulp ontology ontoulp’s design is based on the works discussed in the previous section. it consists of two stages. we start by writing it. we then describe the ontology and emphasize the relationships between different entities. writing the ontology we write ontoulp with protégé editor and use the hermit inference engine to check the consistency of classes and their relationships with objects. the ontology’s first approach is saved in owl format, which is compliant with the semantic web technologies. ontoulp description the ontology is comprised of five subsystems. these are: user-learner, user-learner model, educational objectives, learning design, and knowledge domains. each subsystem is composed of classes that inherit the attributes of the subsystem on which they depend. for brevity, the figures show the hierarchical representation of these subsystems. 
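the five-subsystem layout just described can be pictured as a small owl class hierarchy. the sketch below is hypothetical: ontoulp is authored in the protégé editor and saved in owl format, and its real class names, namespace iri, root class, and properties are not published here, so every identifier in the sketch is illustrative only. it uses the python library rdflib to declare the five subsystems and serialize them in a semantic-web-compliant format.

```python
# Hypothetical sketch of an ONTOULP-like skeleton: five subsystem classes under
# a common root, serialized as RDF/OWL. Class names, the namespace IRI, and the
# object property are illustrative stand-ins, not the published ontology.
# Requires the third-party package rdflib (pip install rdflib).
from rdflib import Graph, Namespace, RDF, RDFS, OWL, Literal

ULP = Namespace("http://example.org/ontoulp#")  # placeholder IRI
g = Graph()
g.bind("ulp", ULP)

subsystems = {
    "UserLearner": "user-learner",
    "UserLearnerModel": "user-learner model",
    "EducationalObjectives": "educational objectives",
    "LearningDesign": "learning design",
    "KnowledgeDomains": "knowledge domains",
}

g.add((ULP.ProfileSubsystem, RDF.type, OWL.Class))  # illustrative root class
for name, label in subsystems.items():
    cls = ULP[name]
    g.add((cls, RDF.type, OWL.Class))
    g.add((cls, RDFS.subClassOf, ULP.ProfileSubsystem))
    g.add((cls, RDFS.label, Literal(label)))

# illustrative object property for the relationship described in the text,
# where the user-learner model subsystem conveys structured information
# to the user-learner subsystem
g.add((ULP.conveysStructuredInformationTo, RDF.type, OWL.ObjectProperty))
g.add((ULP.conveysStructuredInformationTo, RDFS.domain, ULP.UserLearnerModel))
g.add((ULP.conveysStructuredInformationTo, RDFS.range, ULP.UserLearner))

print(g.serialize(format="turtle"))
```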
the user-learner subsystem contains all recorded private information on the digital user-learner profile. the classes personal information, identification sessions, and traces provide information about the user-learner’s behavior and search history for oer, e.g., the search duration for oer (see fig. 1). the user-learner model subsystem is responsible for structuring collected information related to learning behaviors and needs, namely the classes identification, interest, learning level (or qualifications and licenses), personal preferences (or accessibility), activities, learning objectives (or goals), affiliation, and network of contacts (or relationships). in the context of thematic digital universities, the resulting subsystem is composed of eight classes instead of eleven. the userlearner model subsystem conveys the structured information to the user-learner subsystem. figure 1 shows the structure of both subsystems, the user-learner and user-learner model. information technology and libraries june 2022 ontology for the user-learner profile | kordahi 8 figure 1. hierarchical representation of both subsystems, the user-learner and user-learner model. the educational objectives subsystem includes cognitive objectives involved in the process of acquiring knowledge. we design their structure by adapting bloom’s taxonomy. the cognitive objectives class includes six interrelated subclasses: remember (or knowledge), understand (or comprehension), apply (or application), analyze (or analysis), synthetize (or synthesis), and evaluate (or evaluation). the cognitive objectives class is enhanced with the ieee lom, lomfr, and suplomfr classification fields enabling the user-learner to choose objectives which best describe their needs and preferences, e.g., the class apply has subclasses design, choose (see fig. 2). information technology and libraries june 2022 ontology for the user-learner profile | kordahi 9 figure 2. hierarchical representation of educational objectives and learning design subsystems. the learning design subsystem is an adaptation of the ims learning design model, in the context of thematic digital universities.56 the learning design subsystem has two main classes: the userlearner’s environment and learning activities. the environment class has six thematic digital universities as subclasses. in a general manner, information about the environment class comes from thematic digital universities platforms (e.g., the viewed metadata records). the learning activities class has resources as a subclass. the resources subclass is also enriched with the ieee lom, lomfr, and suplomfr classification fields to complete its structure and meet the userlearner’s needs and expectations. further, we have connected the learning activities with cognitive objectives classes to ensure continuity between them (e.g., the subclass experimentation is associated with subclass analyze). figure 2 illustrates the main structure of both subsystems, the learning objectives and learning design. the knowledge domains subsystem contains the main class dewey decimal classification and class contacts. this main class has two subclasses: dewey classes, with the corresponding divisions as subclasses, and knowledge topics, with the corresponding subtopics as subclasses (e.g., science topic corresponds to dewey class 500, manufacturing subtopic corresponds to division 670). information technology and libraries june 2022 ontology for the user-learner profile | kordahi 10 figure 3. 
hierarchical representation of the subsystem knowledge domains. the subclass knowledge topics is related to the subclass user-learner’s learning topics to improve informational behavior analyses. the class contacts is linked to the subclass user-learner’s network of contacts to analyze the strength or weakness of networks between the user-learner and oer publishers/authors (see fig. 1). the subsystem knowledge domains can deal with questions which belong to different levels in the ontoulp. for example, which learning topics is the user-learner looking for? which network of contacts is the user-learner interested in? what are the activities related to the user-learner learning topics? what keywords searched relate to the user-learner’s learning topics?57 in figure 3, we show some of the subsystem’s elements. personalized modeling system for the user-learner profile the psul is based on the works discussed in the previous sections. it is written with php, javascript, and xml, computing languages for the web. this new modeling system comprises three classes of models: the general behavioral, personal behavioral, and topical (see fig. 4). the general behavioral model has two roles. it registers a user-learner’s digital profile in order to determine informational needs and preferences for oer. it also collects informational behaviors of a user-learner while viewing oer metadata records for tailor-made analyses. the general information technology and libraries june 2022 ontology for the user-learner profile | kordahi 11 behavioral model includes the ontology ontoulp as well as user-learner registration and editing pages. the registration page contains relevant information about a user-learner, an option to accept or reject data collection, and a list of choices for behavioral analyses. once registered, the user-learner can modify her/his profile from the editing page. both pages are mapped to the ontoulp to populate criteria fields. the user-learner profile information is stored in a secure database (as described in the introduction). the personal behavioral model is used to analyze information according to the registered digital user-learner profile and informational behaviors. it contains a set of queries to collect and tailor information for each user-learner. the sources of information are the general behavioral model and oer metadata records. this model is designed based on analyses of the general behavioral model. when a user-learner begins searching for oer, the general behavioral model provides the personal behavioral model with all profile information as well as the history of oer search. this information is transmitted to make an adjustment to the personalized user-learner profile. the user-learner profile changes as the personal behavioral model receives more information from the general behavioral model. informational interactions connect the personal behavioral model to topical models. the topical models bring together all analyses of oer search for each user-learner.58 they are inferred from the personal behavioral model. informational interactions connect the topical models to the general behavioral model. for now, we have designed four topical models and present their outcome in the user-learner dashboard page. this page may be used as a practical dashboard providing feedback to each user-learner, who can use these analyses to adjust or make changes in the profile or the oer search. 
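the kind of tailor-made analysis the personal behavioral model feeds into can be illustrated with a short script before the four topical models are described below. the sketch is hypothetical: psul itself is written with php, javascript, and xml, and the element names in the sample record are simplified stand-ins for a real lom/lomfr metadata record; it parses one viewed record, extracts its knowledge topics, and compares them with the topics of interest stored in a registered user-learner’s profile.

```python
# Hypothetical sketch of one tailor-made analysis step. The XML element names
# below are simplified stand-ins for a LOM/LOMFR metadata record, and the
# profile structure is illustrative; PSUL itself is implemented with PHP,
# JavaScript, and XML. Uses only the Python standard library.
import xml.etree.ElementTree as ET
from collections import Counter

VIEWED_RECORD = """
<record>
  <title>introduction to geographic information systems</title>
  <classification scheme="dewey">526</classification>
  <topic>technology</topic>
  <topic>engineering sciences</topic>
</record>
"""

# fields a registered user-learner filled in on the registration page
profile = {"topics_of_interest": {"technology", "management and public relations"}}

def analyze_viewed_record(xml_text: str, interests: set[str]) -> Counter:
    """Count how the viewed record's topics match the profile's interests."""
    root = ET.fromstring(xml_text)
    topics = {t.text.strip().lower() for t in root.findall("topic") if t.text}
    matches = Counter()
    for topic in topics:
        matches["matched" if topic in interests else "unmatched"] += 1
    return matches

if __name__ == "__main__":
    result = analyze_viewed_record(VIEWED_RECORD, profile["topics_of_interest"])
    print(dict(result))  # e.g. {'matched': 1, 'unmatched': 1}
```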
topical model 1 is used to synthesize each user-learner’s search history and to suggest a profile adjustment. the suggested adjustment is based on analyses of user-learner behavioral trends.59 topical model 2 allows each user-learner to examine the list of knowledge topics which have caught her/his attention. it contains two separate lists describing viewed oer metadata records and matching them to the chosen topics of interest. topical model 3 shows comparative analyses between the user-learner’s preference criteria and viewed metadata records. the user-learner can interact with this model by comparing the chosen topics of interest to the viewed knowledge topics. the user-learner can also compare the chosen learning activities to the viewed teaching pedagogies. the teaching pedagogies as well as knowledge topics are extracted from oer metadata records (see fig. 5a). topical model 4 highlights each user-learner’s interest based on the keyword search volume. the user-learner can interact with this model by studying the relationships between searched keywords and chosen topics of interest (see fig. 5a and fig. 5b). figure 4 shows the diagram of psul as explained in the paragraph. information technology and libraries june 2022 ontology for the user-learner profile | kordahi 12 figure 4. the psul diagram based on the kelly and belkin’s system (2002).60 ontoulp and its modeling system in the context of a thematic digital university for now, ontoulp and its application program are implemented in the digital university engineering and technology private platform which is hosted on a private server. we conducted a technical test to mainly assess ontoulp’s precision and performance. the digital university’s team has sent us a complete archive of their oer metadata records. these oer metadata records are saved on the private server with the digital university engineering and technology platform. once a user-learner is registered to this platform, she/he can carry out actions through the psul. for example, these actions are a search by keyword, personalization of profile, tailored-made analysis of oer search, and visualization of analyses in the dashboard. information technology and libraries june 2022 ontology for the user-learner profile | kordahi 13 figure 5a. screenshot of a section of the dashboard. the bar chart shows comparative analyses between a user-learner’s topic of interest and knowledge topics. the knowledge topics are extracted from the viewed oer metadata records. information technology and libraries june 2022 ontology for the user-learner profile | kordahi 14 figure 5b. screenshot of a section of the dashboard. the pie chart highlights a user-learner’s interest based on a keyword search volume. the bar chart shows comparative analyses between a user-learner learning activities and viewed teaching pedagogies. the keywords are extracted from the search. the teaching pedagogies are extracted from oer metadata records. to avoid making the article longer, in figures 5a and 5b, we show brief results of a technical test. in this example, the user-learner’s identity is fictitious, or the user-learner’s persona is a construct.61 information technology and libraries june 2022 ontology for the user-learner profile | kordahi 15 in other words, the user-learner’s identity is not real, it is fabricated to conduct and complete the technical test. 
when registering, this user-learner has selected the technology topic (dewey class 600) in addition to the management and public relations subtopic (dewey division 650). this userlearner has also selected all topical models. during a viewing session, this user-learner chose to search for oer while using a few keywords. the keywords were chosen according to the userlearner’s profile and in order to continue the technical test. discussion and conclusion the ontology for the digital user-learner profile is a first approach based on the semantic web. it is designed for the personalization of interactions and retrieval of tailored information. we have combined standardized and validated resources, such as the ims lip, bloom’s taxonomy, and knowledge domains ontology, to allow the user-learner’s search analyses. we have discussed the design of a new application program prototype allowing a user-learner to analyze the search for oer according to her/his digital profile. psul provides automated real-time feedback based on the user-learner’s search history and information she/he has inserted about herself/himself. we have then demonstrated the integration of the ontoulp and psul into a mirror site to perform a technical test. the ontology’s main characteristics are flexibility and adaptability. while designing ontoulp, we have reused or restructured resources to allow its use in other thematic digital universities and online learning environments, including digital libraries. another advantage of ontoulp is the application of several information processing techniques. for example, a registered user-learner can self-assess her/his search for oer by keywords. she/he can also analyze the relevance of the search for oer through the psul. we have successfully overcome three essential limitations. the first limitation concerns the literature on the subject (see literature review section). while contributing to the field of research in information technology and digital libraries, this work has also drawn on disciplines as diverse as those of education as well as cognitive, social, and human sciences. the terminological definitions of disciplines, concepts, and even methods vary over decades or centuries, and among groups of researchers. we have made every effort to define the different terms correctly and to cite the corresponding researchers. the second difficulty relates to the design of ontoulp. published works dealing with this topic are rare. we used an exploratory research approach and the published works of renowned international researchers to fine-tune our study (see the exploratory research approach and literature review sections). we then determined the classes and objects as well as relationships between them. the third constraint concerns the design of the psul by following the thematic digital universities policies and respecting the general data protection regulations. according to the regulations, we have opted for an optional registration to thematic digital universities and to collecting information on the digital user-learner profile. thus, the user-learner will always have the possibility of registering to these platforms to make a tailor-made information analysis according to the digital profile. as we conclude our work, we have a plan to focus our research and initiatives in the following areas. firstly, we will further deepen our study of ontoulp classes to further increase their precision. 
we will also examine the personalization of oer searches based on the uses and practices of algorithms in ontoulp.62 for example, by relying on a newer version of the ontology, we will identify the topics likely to interest a specific user-learner. we will implement this newer version in some thematic digital universities to perform technical tests. second, we will conduct qualitative and quantitative studies to analyze participants' behavior while using ontoulp and its application program in the context of thematic digital universities. for example, we will examine how many participants would choose to use ontoulp and the psul and how many would not (e.g., the usefulness of ontologies to participants). we will analyze the behavior of individuals with digital personae and make connections between their searches for oer.63 we will study their profiles, behaviors, and interests to ultimately suggest oer (e.g., through recommendation systems). we will also analyze how participants' behavior and feedback may affect future findings. participants will be selected in advance to contribute to these studies. third, we will study the effects of ontoulp and psul practices on the thematic digital universities. this study will concern an analysis of the thematic digital universities' search engines and user-learners' needs. for example, exploratory research will allow us to better understand user-learners' informational needs and expectations when using the oer search engines. we will analyze the design of oer search engines in light of these needs and expectations. we will then integrate these findings to suggest alternatives to the thematic digital universities to further improve these search engines.

acknowledgments
we thank the digital university association and the thematic digital universities for their elaborate and enlightening explanations concerning the platforms. we thank the reviewers and claude baltz, emeritus professor in information and communication sciences at paris 8 university, for carefully reviewing this article and enriching it with their expert observations. thanks to mohammad hajj hussein, communication and it engineer, for his help programming the dashboard.

appendix: semistructured questionnaire example
email subject: digital university engineering and technology
dear sir, madam,
i am affiliated with the paragraph research laboratory at paris 8 university (laboratoire de recherche paragraphe, université paris 8). i am writing to you to gather further information concerning the digital university engineering and technology. the objective of this semistructured questionnaire is to deepen my understanding of the practices of the digital university engineering and technology in order to write a research article and contribute to its improvement. i would be grateful if you could answer the following questions:
• what are your responsibilities at the digital university engineering and technology?
• do the thematic digital universities as well as the digital university engineering and technology provide "open" educational resources?
• are the educational resources accessible only to students enrolled in the training programs of partner universities?
• how is access to the educational resources provided?
• do the educational resources follow document processing for their indexing? • is the document processing specific to the thematic digital universities? • what are the expectations of “users” searching for educational resources? thank you in anticipation sincerely yours, marilou kordahi information technology and libraries june 2022 ontology for the user-learner profile | kordahi 18 endnotes 1 “cape town open education declaration: unlocking the promise of open educational resources,” 2007, http://www.capetowndeclaration.org/read-the-declaration. 2 unesco, “forum on the impact of open courseware for higher education in developing countries,” (2002): 24, http://unesdoc.unesco.org/images/0012/001285/128515e.pdf. 3 william and flora hewlett foundation, “open education,” accessed april 5, 2022, https://hewlett.org/strategy/open-education. 4 unesco, “2012 paris oer declaration,” 2012, http://www.unesco.org/new/fileadmin/multimedia/hq/ci/wpfd2009/english_declaratio n.html. 5 camille thomas, kimberly vardeman, and jingjing wu, “user experience testing in the open textbook adaptation workflow,” information technology and libraries journal 40, no. 1 (2021): 1–18, https://doi.org/10.6017/ital.v40i1.12039. 6 digital university engineering and technology, “open educational resources for engineering and technology,” accessed april 5, 2022, https://unit.eu. 7 jean delpech de saint guilhem, sonia dubourg-lavroff, and jean-yves de longueau, “thematic digital universities,” general inspectorate of the national education and research administration, 2016, https://www.enseignementsuprecherche.gouv.fr/cid104387/www.enseignementsup-recherche.gouv.fr/cid104387/lesuniversites-numeriques-thematiques.html. 8 asim ullah, shah khusro, and irfan ullah, “bibliographic classification in the digital age: current trends & future directions,” information technology and libraries 36, no. 3 (2017): 48–77, https://doi.org/10.6017/ital.v36i3.8930; anne boyer, “thematic digital universities: report,” sciences et technologies de l'information et de la communication pour l'éducation et la formation 18, no. 1 (2011): 39–52. 9 sihem zghidi and mokhtar ben henda, “open educational resources and open archives in the open access movement: an educational engineering and scientific research crossed analysis,” distances and mediations of knowledge 31 (2020), https://doi.org/10.4000/dms.5347. 10 diane kelly and nicholas j. belkin, “a user modeling system for personalized interaction and tailored retrieval in interactive ir,” proceedings of the american society for information science and technology 39, no. 1 (2002): 316–25, https://doi.org/10.1002/meet.1450390135. 11 ieee learning technology standards committee, “learning object metadata, final draft standard, 1484.12.1-2002,” http://ltsc.ieee.org/wg12. 12 gregory m. shreve, and marcia lei zeng, “integrating resource metadata and domain markup in an nsdl collection,” in international conference on dublin core and metadata applications (2003): 223–29. 
http://www.capetowndeclaration.org/read-the-declaration http://unesdoc.unesco.org/images/0012/001285/128515e.pdf https://hewlett.org/strategy/open-education http://www.unesco.org/new/fileadmin/multimedia/hq/ci/wpfd2009/english_declaration.html http://www.unesco.org/new/fileadmin/multimedia/hq/ci/wpfd2009/english_declaration.html https://doi.org/10.6017/ital.v40i1.12039 https://www.enseignementsup-recherche.gouv.fr/cid104387/www.enseignementsup-recherche.gouv.fr/cid104387/les-universites-numeriques-thematiques.html https://www.enseignementsup-recherche.gouv.fr/cid104387/www.enseignementsup-recherche.gouv.fr/cid104387/les-universites-numeriques-thematiques.html https://www.enseignementsup-recherche.gouv.fr/cid104387/www.enseignementsup-recherche.gouv.fr/cid104387/les-universites-numeriques-thematiques.html https://doi.org/10.6017/ital.v36i3.8930 https://doi.org/10.4000/dms.5347 https://doi.org/10.1002/meet.1450390135 http://ltsc.ieee.org/wg12 information technology and libraries june 2022 ontology for the user-learner profile | kordahi 19 13 french standardization association, “description standard for the field of education in france – – part 1: description of learning resources (nodefr-1), nf z76-041,” 2019. 14 arthur allison, james currall, michael moss, and susan stuart, “digital identity matters,” journal of the american society for information science and technology 56, no. 4 (2005): 364–72, https://doi.org/10.1002/asi.20112. 15 katalin feher, “digital identity and the online self: footprint strategies – an exploratory and comparative research study.” journal of information science 47, no. 2 (2021): 192–205. https://doi.org/10.1177/0165551519879702. 16 robyn caplan and danah boyd, “who controls the public sphere in an era of algorithms,” mediation, automation, power (2016), https://www.datasociety.net/pubs/ap/mediationautomationpower_2016.pdf. 17 thomas r. gruber, “a translation approach to portable ontology specifications,” knowledge acquisition 5, no. 2 (1993): 199–220, https://doi.org/10.1006/knac.1993.1008. 18 gerhard fischer, “user modeling in human–computer interaction,” user modeling and useradapted interaction 11, no. 1 (2001): 65–86, https://doi.org/10.1023/a:1011145532042. 19 yannia kalfoglou and marco schorlemmer, “ontology mapping: the state of the art,” the knowledge engineering review 18, no. 1 (2003): 1–31, https://doi.org/10.1017/s0269888903000651. 20 tom gruber, “collective knowledge systems: where the social web meets the semantic web,” web semantics: science, services and agents on the world wide web 6 no. 1 (2008): 4–13, https://doi.org/10.1016/j.websem.2007.11.011. 21 peter ingwersen, “search procedures in the library – analysed from the cognitive point of view,” journal of documentation 38, no. 3 (1982): 165–97, https://doi.org/10.1108/eb026727. 22 tefko saracevic, amanda spink, and mei-mei wu, “users and intermediaries in information retrieval: what are they talking about?” in user modeling: proceedings of the sixth international conference (vienna: springer, 1997): 43–54. 23 núria ferran, enric mor, and julià minguillón, “towards personalization in digital libraries through ontologies,” library management 26, no. 4/5 (2005): 206–17. https://doi.org/10.1108/01435120510596062. 24 katrien verbert, erik duval, joris klerkx, sten govaerts, and josé luis santos, “learning analytics dashboard applications,” american behavioral scientist 57, no. 10 (2013): 1500– 1509, https://doi.org/10.1177/0002764213479363. 
25 digital university association, “open educational resources for all,” accessed april 5, 2022, https://univ-numerique.fr. https://doi.org/10.1002/asi.20112 https://doi.org/10.1177/0165551519879702 https://www.datasociety.net/pubs/ap/mediationautomationpower_2016.pdf https://doi.org/10.1006/knac.1993.1008 https://doi.org/10.1023/a:1011145532042 https://doi.org/10.1017/s0269888903000651 https://doi.org/10.1016/j.websem.2007.11.011 https://doi.org/10.1108/eb026727 https://doi.org/10.1108/01435120510596062 https://doi.org/10.1177/0002764213479363 https://univ-numerique.fr/ information technology and libraries june 2022 ontology for the user-learner profile | kordahi 20 26 deborah arnold, “the french thematic digital universities – a 360° perspective on open and digital learning,” in european distance and e-learning network conference proceedings, no. 1 (2018): 370–78. 27 director of the digital university in health and sport messaged author, may 3, 2021. 28 director of the virtual university of environment and sustainable development messaged author, january 6, 2021. 29 director of the digital university in economics and management messaged author, december 08, 2020. 30 general secretary of the open university of the humanities messaged author, may 1, 2021. 31 member of digital university association messaged author, december 18, 2020. 32 director of the digital university engineering and technology messaged author, december 11, 2020. 33 laecio araujo costa, leandro manuel pereira sanches, ricardo josé rocha amorim, laís do nascimento salvador, and marlo vieira dos santos souza, “monitoring academic performance based on learning analytics and ontology: a systematic review,” informatics in education 19, no. 3 (2020): 361–97. 34 benjamin s. bloom, david r. krathwohl, and bertram b. masia, taxonomy of educational objectives: the classification of educational goals (new york: longman, 1984). 35 colin smythe, frank tansey, and robby robson, “ims learner information package. best practice & implementation guide,” ims global learning consortium, 2001. 36 rebecca green and michael panzer, “the ontological character of classes in the dewey decimal classification,” the library, (2009), https://www.ergonverlag.de/isko_ko/downloads/aiko_vol_12_2010_25.pdf 37 marilou kordahi, «le changement de l’apprentissage, l’ontologie du profil de l’utilisateurapprenant, » management des technologies organisationnelles, 10 (2020): 73–88. 38 marilou kordahi, “information literacy: ontology structures user-learner profile in online learning environment,” in seventh european conference on information literacy, (2021): 130, http://ecil2021.ilconf.org/wpcontent/uploads/sites/9/2021/09/ecil2021_book_of_abstracts_final_v3.pdf#page=149. 39 “ims learner information package accessibility for lip best practice and implementation guide,” ims global learning consortium, last revised june 18, 2003, https://www.imsglobal.org/accessibility/acclipv1p0/imsacclip_bestv1p0.html. 40 judy kay, “learner know thyself: student models to give learner control and responsibility,” in proceedings of international conference on computers in education (1997): 17–24. 
https://www.ergon-verlag.de/isko_ko/downloads/aiko_vol_12_2010_25.pdf https://www.ergon-verlag.de/isko_ko/downloads/aiko_vol_12_2010_25.pdf http://ecil2021.ilconf.org/wp-content/uploads/sites/9/2021/09/ecil2021_book_of_abstracts_final_v3.pdf#page=149 http://ecil2021.ilconf.org/wp-content/uploads/sites/9/2021/09/ecil2021_book_of_abstracts_final_v3.pdf#page=149 https://www.imsglobal.org/accessibility/acclipv1p0/imsacclip_bestv1p0.html information technology and libraries june 2022 ontology for the user-learner profile | kordahi 21 41 susan bull, “supporting learning with open learner models,” in proceedings of 4th hellenic conference with international participation information and communication technologies in education (2004): 47–61. 42 peter dolog and wolfgang nejdl, “challenges and benefits of the semantic web for user modelling,” in proceedings of the workshop on adaptive hypermedia and adaptive web-based systems (ah2003) at 12th international world wide web conference (2003). 43 ims global learning consortium, “ims learner information package accessibility for lip best practice and implementation guide,” para. 2. 44 gilbert paquette, “ontology-based educational modelling-making ims-ld visual,” technology, instruction, cognition & learning 7, no. 3–4 (2010): 263–93. 45 john seely brown and richard p. adler, “open education, the long tail, and learning 2.0,” educause review 43, no. 1 (2008): 16–20. 46 open university of the humanities, “how to write and publish a scientific article,” accessed on april 5, 2022, https://uoh.fr/front/noticefr/?uuid=6a063dd7-3a02-482a-9857934501f7c82d. 47 lorin w. anderson, david r. krathwohl, peter w. airiasian, kathleen a. cruikshank, richard e. mayer, paul r. pintrich, james raths, and merlin c. wittrock. a taxonomy for learning, teaching and assessing: a revision of bloom’s taxonomy of educational objectives (new york: longman publishing group, 2001). 48 birger hjørland, “theories are knowledge organizing systems (kos).” knowledge organization 42, no. 2 (2017): 113–28, https://doi.org/10.5771/0943-7444-2015-2-113. 49 walter moreira and daniel martínez-ávila, “concept relationships in knowledge organization systems: elements for analysis and common research among fields,” cataloging & classification quarterly 56, no. 1 (2018): 19–39, https://doi.org/10.1080/01639374.2017.1357157. 50 wayne a. wiegand, “the ‘amherst method’: the origins of the dewey decimal classification scheme.” libraries & culture 33, no. 2 (1998): 175–94. 51 melvil dewey, dewey decimal classification and relative index, ed. joan s. mitchell, julianne beall, giles martin, and winton e. matthews, 22nd ed., (dublin, ohio: oclc, 2003). 52 joan s. mitchell, “ddc 22: dewey in the world, the world in dewey,” advances in knowledge organization 9 (2004): 139–45. 53 hamid saeed and abdus sattar chaudhry, “using dewey decimal classification scheme (ddc) for building taxonomies for knowledge organisation,” journal of documentation 58, no. 5 (2002): 575–83. 54 rebecca green and michael panzer, “the interplay of big data, worldcat, and dewey,” advances in classification research online 24, no. 1 (2013): 51–58. https://doi.org/10.5771/0943-7444-2015-2-113 https://doi.org/10.1080/01639374.2017.1357157 information technology and libraries june 2022 ontology for the user-learner profile | kordahi 22 55 kelly and belkin, “a user modeling system,” 319. 
56 rob koper and colin tattersall, eds., learning design: a handbook on modelling and delivering networked education and training (heidelberg: springer science and business media, 2005).
57 david beer, "envisioning the power of data analytics," information, communication & society 21, no. 3 (2018): 465–79, https://doi.org/10.1080/1369118x.2017.1289232.
58 charles lang, george siemens, alyssa wise, and dragan gasevic, eds., handbook of learning analytics (society for learning analytics and research, 2017), https://doi.org/10.18608/hla17.
59 joris klerkx, katrien verbert, and erik duval, "learning analytics dashboards," in handbook of learning analytics, ed. charles lang, george siemens, alyssa wise, and dragan gasevic (society for learning analytics and research, 2017), 143–50, https://doi.org/10.18608/hla17.
60 kelly and belkin, "a user modeling system," 319.
61 roger clarke, "the digital persona and its application to data surveillance," the information society 10, no. 2 (1994): 77–92, https://doi.org/10.1080/01972243.1994.9960160.
62 ahu sieg, bamshad mobasher, and robin burke, "web search personalization with ontological user profiles," in proceedings of the sixteenth acm conference on conference on information and knowledge management (2007): 525–34, https://doi.org/10.1145/1321440.1321515.
63 roger clarke, "persona missing, feared drowned: the digital persona concept, two decades later," information technology & people 27, no. 2 (2014): 182–207, https://doi.org/10.1108/itp-04-2013-0073.

topic modeling as a tool for analyzing library chat transcripts
hyunseung koh and mark fienup
information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13333
hyunseung koh (hyunseung.koh@uni.edu) is an assessment librarian and assistant professor of library services, university of northern iowa. mark fienup (mark.fienup@uni.edu) is an associate professor in the computer science department, university of northern iowa. © 2021.

abstract
library chat services are an increasingly important communication channel to connect patrons to library resources and services. analysis of chat transcripts could provide librarians with insights into improving services. unfortunately, chat transcripts consist of unstructured text data, making it impractical for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing tools.
as a stepping-stone toward a more sophisticated chat transcript analysis tool, this study investigated the application of different types of topic modeling techniques to analyze one academic library’s chat reference data collected from april 10, 2015, to may 31, 2019, with the goal of extracting the most accurate and easily interpretable topics. in this study, topic accuracy and interpretability—the quality of topic outcomes—were quantitatively measured with topic coherence metrics. additionally, qualitative accuracy and interpretability were measured by the librarian author of this paper depending on the subjective judgment on whether topics are aligned with frequently asked questions or easily inferable themes in academic library contexts. this study found that from a human’s qualitative evaluation, probabilistic latent semantic analysis (plsa) produced more accurate and interpretable topics, which is not necessarily aligned with the findings of the quantitative evaluation with all three types of topic coherence metrics. interestingly, the commonly used technique latent dirichlet allocation (lda) did not necessarily perform better than plsa. also, semi-supervised techniques with human-curated anchor words of correlation explanation (corex) or guided lda (guidedlda) did not necessarily perform better than an unsupervised technique of dirichlet multinomial mixture (dmm). last, the study found that using the entire transcript, including both sides of the interaction between the library patron and the librarian, performed better than using only the initial question asked by the library patron across different techniques in increasing the quality of topic outcomes. introduction with the rise of online education, library chat services are an increasingly important tool for student learning.1 library chat services have the potential to support student learning, especially for distant learners who have a lack of opportunity to come and learn about library and research skills in person. in addition, unlike traditional in-person reference services whose use has declined drastically, library chat services have become an important communication channel that connects patrons to library resources, services, and spaces.2 quantitative and qualitative analysis of chat transactions could provide librarians with insights into improving the quality of these resources, services, and spaces. for example, in order to maximize patrons’ satisfaction, librarians could identify or evaluate quantitative and qualitative mailto:hyunseung.koh@uni.edu mailto:mark.fienup@uni.edu information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 2 patterns of chat reference data (e.g., busiest days and times of nondirectional, research-focused questions) and develop a better staffing plan for assigning librarians or student employees to most appropriate days and times. furthermore, these insights could be used to help demonstrate library value by showing external stakeholders how successfully library chat services support students’ needs, which is increasingly in demand for higher education. 
3 in practice, it is burdensome for librarians to go beyond simple quantitative analysis (e.g., chat duration, message count, word frequencies) with existing chat software tools, such as libraryh3lp, questpoint, springshare’s libchat, and liveperson.4 currently, in order to obtain rich and hidden insights from large volumes of chat transcripts, librarians need to conduct manual qualitative analysis of chat transcripts with unstructured text data, which requires a lot of time and effort. in an age when library patrons' information needs have been changing, the lack of chat analysis tools that handle large volumes of transcripts hinders librarians’ ability to respond to patrons’ wants and needs in a timely manner.5 in particular, small and medium-sized academic libraries have seen a shortage of librarians and need to hire and train student employees , so librarians’ capabilities for real-time quick and easy analysis and assessment will become critical in helping them take appropriate actions to best meet user needs.6 as part of an effort to develop a quick and easy analysis tool for large volumes of chat transcripts, this study applied topic modeling, which is a statistical technique “for learning the latent structure in document collections” or “a type of statistical model for finding hidden topical patterns of words.”7 we compared outcomes of different types of topic modeling techniques and attempted to propose topic modeling techniques that would be most appropriate in the context of chat reference transcript data. literature review to identify the most appropriate research methods that would facilitate analyzing a vast amount of chat transcripts, this section first introduces literature in relation to research methods used in analyzing chat transcript data in library settings and nonlibrary settings. it follows by discussing different types of topic modeling techniques that have high potential for quick and easy analysis of chat transcripts and their strengths and weaknesses. chat transcript analysis methods in library settings in analyzing library chat transcripts, which are one major data source of library chat service research, researchers have used variants of quantitative and qualitative research methods.8 coding-based content analysis with or without predefined categories is one type of qualitative method.9 the other type of qualitative research method is conversation or language usage analysis but it is not a dominant type of research method, as compared to coding-based qualitative content analysis.10 the most common quantitative methods are simple descriptive countor frequencybased analyses that are accompanied by qualitative coding-based content analyses.11 in some recent research, advanced quantitative research methods, such as cluster analysis and topic modeling techniques, have been used, but they have not been fully explored yet with a wide range of techniques.12 chat transcript analysis methods in nonlibrary settings as shown in table 1, researchers in nonlibrary settings also used research methods in analyzing chat data from diverse technology platforms or contexts, ranging from qualitative manual coding methods to data mining and machine learning techniques. 
topic modeling techniques are one of the chat analysis methods, but again, it seems that they have not been fully explored yet in chat analyses in nonlibrary settings, even though they have been used in a wide range of contexts.13 information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 3 table 1. chat transcript analysis applications in non-library settings disciplines platforms/sources of chat transcript data chat transcript analysis methods/tools/techniques education chat rooms and text chat14 qualitative content analysis health social media15 qualitative & quantitative content analysis business in-game chat features and chatbots16 a spell-checker, readability scores, the number of spelling and grammatical errors, linguistic inquiry and word count (liwc) program, logistic regression analysis, decision tree, support vector machine (svm) criminology instant messengers, internet relay chat (irc) channels, internet-based chat logs, and social media17 liwc program, cluster analysis, latent dirichlet allocation (lda) topic modeling techniques and their strengths and weaknesses as a quantitative and statistical method appropriate for analyzing a vast amount of chat transcript data, researchers from both library and nonlibrary settings used topic modeling. as shown in table 2, conventional topic modeling techniques include latent semantic analysis, probabilistic latent semantic analysis, and latent dirichlet allocation, each of which has its unique strengths and weaknesses.18 in order to overcome weaknesses of the conventional techniques, researchers have developed alternative techniques. for example, dirichlet multinomial mixture (dmm) has been proposed to overcome data sparsity problems in short texts.19 as another example, correlation explanation (corex) has been proposed to avoid time and effort to identify topics and their structure ahead of time.20 last, guided lda (guidedlda) has been proposed to improve performance of infrequently occurring topics.21 information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 4 table 2. strengths and weaknesses of conventional topic modeling techniques acronym definitions strengths weaknesses latent semantic analysis lsa a document is represented as a vector of numbers found by applying dimensionality reduction (specifically, truncated svd) to summarize the frequencies of cooccurring words across documents. can deal with polysemy (multiple meanings) to some extent. is hard to obtain and to determine the optimal number of topics. probabilistic latent semantic analysis plsa a document is represented as vectors, but these vectors have nonnegative entries summing to 1 such that each component (topic) represents the relative prominence of some probabilistic mixture of words in the corpus. topics in a document are “probabilistic instead of the heuristic geometric distances.”22 can deal with polysemy issues; provides easy interpretation terms of word, document, and topic probabilities. has over-fitting problems. latent dirichlet allocation lda a bayesian extension of plsa that adds assumptions about the relative probability of observing different document's distributions over topics. prevents overfitting problems; provides a fully bayesian probabilistic interpretation. does not show relationships among topics. data, preprocessing, analysis, and evaluation this section first introduces the data used for this study. 
next, it explains the procedures of each stage, from preprocessing to analyzing the chat transcript data with different types of conventional and alternative topic modeling techniques. last, it discusses the quantitative and qualitative evaluation of the quality of topic outcomes across the different topic modeling techniques. for more details, including python scripts, please visit our github page at https://github.com/mfienup/uni-library-chat-study.

data
this study collected the university of northern iowa's rod library chat reference data dated from april 10, 2015, to may 31, 2019 (irb#18-0225). this raw chat data was downloaded from libchat in the form of an excel spreadsheet with 9,942 english chat transcripts, with each transcript as a separate row.

preprocessing
as the first step, this study removed unnecessary components of each chat transcript using a custom python script. components removed were timestamps, patron and librarian identifiers, http tags (e.g., urls), and non-ascii characters. next, it processed the resulting text using python's natural language toolkit (https://www.nltk.org/) and its wordnetlemmatizer function (https://www.nltk.org/_modules/nltk/stem/wordnet.html) to normalize words for further analyses. as the final step, it prepared four types of data sets to identify which type would produce better topic outcomes. the four types of data sets were as follows:
• question-only: consists of only the initial question asked by the library patron in each chat transcript. only the latter 10.7% of the chats recorded in the excel spreadsheet contained an initial question column entry; the remaining chats were assumed to contain their initial question in the patron's first response if it was longer than a trivial welcome message.
• whole-chat: consists of the whole chat transcript from the library patron and librarians.
• whole-chat with nouns and adjectives: consists of only nouns and adjectives as parts of speech (pos) from the whole chat transcripts.
• whole-chat with nouns, adjectives, and verbs: consists of only nouns, adjectives, and verbs as pos from the whole chat transcripts.
the first two data sets were prepared to examine whether the first question initiated by each patron or the whole chat transcript would help produce better topic outcomes. the last two data sets were prepared to examine which retained parts of speech would help produce better topic outcomes.

data analysis with conventional topic modeling techniques
this study first analyzed chat reference data using three conventional topic modeling techniques: latent semantic analysis (lsa), probabilistic latent semantic analysis (plsa), and two versions of latent dirichlet allocation (lda), as shown in table 3. all three are unsupervised topic modeling techniques that automatically analyze text data from a set of documents (in this study, a set of chat transcripts) to infer predominant topics or themes across all documents without human help. a key challenge, or a key parameter to be determined, for unsupervised topic modeling techniques is identifying the optimal number of topics. the study ran the commonly used lda technique on the whole-chat data set with various numbers of topics. fifteen was chosen as the optimal number of topics for this study by calculating and comparing log-likelihood scores across the various numbers of topics; a brief illustrative sketch of this preprocessing and model-selection workflow is given below.
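the authors' released scripts are on their github page; the following is only a minimal sketch of the workflow described above (lemmatization with nltk's wordnetlemmatizer, then fitting scikit-learn's lda at several candidate numbers of topics and comparing approximate log-likelihood scores). the sample transcripts and candidate topic counts here are invented for illustration, and the sketch assumes the nltk wordnet data have been downloaded.

```python
import re
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# toy stand-ins for cleaned chat transcripts
raw_chats = [
    "Hi, can I renew my books online?",
    "How do I request an interlibrary loan for this article?",
    "Is the library open late tonight? I need a study room.",
]

lemmatizer = WordNetLemmatizer()

def normalize(text):
    # keep ascii word characters only, then lemmatize each token
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(lemmatizer.lemmatize(token) for token in tokens)

docs = [normalize(chat) for chat in raw_chats]

# bag-of-words counts for lda (lda expects raw counts rather than tf-idf weights)
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# compare approximate log-likelihood across candidate numbers of topics
for n_topics in (5, 10, 15, 20):
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(X)
    print(n_topics, round(lda.score(X), 1))  # higher (less negative) is better
```

for the lsa and plsa runs, the same document-term matrix could instead be tf-idf weighted and passed to truncated svd or to scikit-learn's nmf, as listed in table 3.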
fifteen was chosen as an optimal number of topics for this study by calculating and comparing the log-likelihood scores among various number of topics. https://www.nltk.org/ https://www.nltk.org/_modules/nltk/stem/wordnet.html information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 6 table 3. conventional topic modeling techniques and their sources technique programming language implementation source version used in the study latent semantic analysis python https://pypi.org/project/gensim/ 3.8.1 probabilistic latent semantic analysis python https://scikitlearn.org/stable/modules/generated/ sklearn.decomposition.nmf.html 0.21.3 latent dirichlet allocation (with sklearn) python https://scikitlearn.org/stable/modules/generated/ sklearn.decomposition.latentdirichlet allocation.html 0.21.3 latent dirichlet allocation (with pymallet) python https://github.com/mimno/pymallet dated february 26, 2019 also, before analyzing chat transcript data using lsa and plsa, this study performed a term frequency–inverse document frequency (tf–idf) transformation. tf–idf is a measure of how important a word is to a document (i.e., a single chat transcript) compared to its relevance in a collection of all documents. data analysis with alternative topic modeling techniques in addition to conventional topic modeling techniques, this study analyzed chat reference data using three alternative techniques of dirichlet multinomial mixture (dmm), anchored correlation explanation (corex) and guided lda (guidedlda), as shown in table 4. this study selected dmm as an alternative unsupervised topic modeling technique that has been developed for short texts. also, this study selected anchored corex and guided lda (guidedlda) as semi-supervised topic modeling techniques that require human-curated sets of words, called anchors or seeds, which nudge topic models toward including the suggested anchors. this is based on the assumption that human’s curated techniques would help produce better quality of topics than the unsupervised techniques. for example, the three words “interlibrary,” “loan,” and “request,” or the two words “article” and “database,” are possible anchor words in the context of library chat transcripts. such anchor words can appear anywhere within a chat in any order. https://pypi.org/project/gensim/ https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.nmf.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.nmf.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.nmf.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.latentdirichletallocation.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.latentdirichletallocation.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.latentdirichletallocation.html https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.latentdirichletallocation.html https://github.com/mimno/pymallet information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 7 table 4. alternative topic modeling techniques and their sources unsupervised vs. 
semisupervised technique programming language implementation source version used in the study unsupervised dirichlet multinomial mixture (dmm) java https://github.com/qiang2 100/sttm 9/27/2019 semi-supervised anchored correlation explanation (corex) python https://github.com/gregve rsteeg/corex_topic 1/21/2020 semi-supervised guided lda using collapsed gibbs sampling python https://guidedlda.readthe docs.io/en/latest/ 10/5/2017 given that a known set of anchor words associated with academic library chats seems unavailable in the literature, this study decided to obtain a list of most meaningful anchor words by combining outcomes of the unsupervised techniques with a human’s follow-up curation, as follows: step 1. execute unsupervised topic modeling techniques step 2. combine resulting topics from all unsupervised topic modeling techniques step 3. identify a list of all possible pairs of words (bi-occurrences), e.g., 28 pairs of words if each topic has 8 words, and all possible combinations of tri-occurrences of words step 4. identify most common bi-occurrences and tri-occurrences of words across all topics by ordering in descending order by frequency step 5. select a set of anchors from these bi-occurrences and tri-occurrences of words by a human’s judgment in terms of selecting a set of anchor words, the librarian author of this paper judged whether combinations of words in each row from step 4 were aligned with frequently asked questions or easily inferable themes in academic library contexts. as shown in table 5, a set of “interlibrary,” “loan,” and “request” was selected as anchor words that are aligned with one frequently asked question about interlibrary loan requests, whereas a set of “access,” “librarian,” and “research” was not selected as anchor words because multiple themes, such as access to resources and asking for research help from librarians, can be inferred. additionally, a set of “hour,” “time,” and “today” was selected over a set of “time,” “tomorrow,” and “tonight” as better or clearer anchor words. https://github.com/qiang2100/sttm https://github.com/qiang2100/sttm https://github.com/gregversteeg/corex_topic https://github.com/gregversteeg/corex_topic https://guidedlda.readthedocs.io/en/latest/ https://guidedlda.readthedocs.io/en/latest/ information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 8 table 5. examples of anchor words that were selected and not selected examples of tri-occurrences of words (note: strikethrough denotes a set of words that were not selected as anchor words) 1 interlibrary loan request 2 hour time today 3 time tomorrow tonight 4 time today tomorrow 5 floor librarian research 6 access librarian research 7 camera digital hub 8 digital hub medium 9 access article journal 10 access article database 11 access account campus 12 research source topic 13 paper research topic quantitative evaluation with topic coherence metrics comparing the quality of topic outcomes across various topic modeling techniques is tricky. 
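the authors' own scripts for the five-step anchor-selection procedure above are on their github page; the counting in steps 3 and 4 can be sketched roughly as follows. the topic lists here are invented examples standing in for the combined output of step 2, and step 5 (the librarian's judgment) remains a manual review of the ranked lists.

```python
# a minimal sketch of steps 3-4: counting bi- and tri-occurrences of words
# across the topics produced by the unsupervised techniques
from collections import Counter
from itertools import combinations

topics = [
    ["request", "loan", "interlibrary", "link", "book", "form", "submit", "fill"],
    ["room", "reserve", "study", "scheduler", "group", "space", "check", "desk"],
    ["hour", "time", "today", "open", "close", "pm", "tomorrow", "tonight"],
]

pair_counts, triple_counts = Counter(), Counter()
for topic in topics:
    words = sorted(set(topic))                    # order within a topic does not matter
    pair_counts.update(combinations(words, 2))    # all 28 pairs for an 8-word topic
    triple_counts.update(combinations(words, 3))  # all 56 triples for an 8-word topic

# step 4: rank candidate anchors by how often they co-occur across topics;
# step 5 applies a human's judgment to these ranked lists
print(pair_counts.most_common(10))
print(triple_counts.most_common(10))
```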
purely statistical and quantitative evaluation techniques, such as held-out log-likelihood measures, have proven to be unaligned with human intuition or judgment with respect to topic interpretability and coherency.23 thus, this study adopted the three topic coherence metrics of tcpmi (normalized pointwise mutual information), tc-lcp (normalized log conditional probability), and tc-nz (number of topic word pairs never observed together in the corpus) that have been introduced by boyd-graber, mimno, and newman; bouma; and lau, newman, and baldwin.24 these three metrics are based on the assumption that the likelihood that two words that co-occur in a topic would also co-occur within a corpus. to utilize the three topic coherence metrics, the study chose a binarized choice (e.g., does a transcript contain two words?) instead of a sliding window of fixed size (e.g., do two words appear within a fixed window of 10 consecutive words?) as a type of how to count term cooccurrences. this decision was made because each chat transcript is relatively short, and a fixed information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 9 window size seemed inconsistent across different type of data sets that included different parts of speech. in terms of the other decision to be made for applying the three topic coherence metrics, this study chose a training corpus of all the chat transcripts instead of external corpuses such as the entire collection of english wikipedia articles that has little in common with average library chat transcripts. qualitative evaluation with human judgment in addition to quantitative evaluation with topic coherence metrics, qualitative accuracy and interpretability were judged by the librarian author of this paper based on whether topics were aligned with frequently asked questions or easily inferable themes in academic library contexts. for example, “find or access book or article” was inferred, from a set of words in topic 1 on lsa in table 6, as an accurate and easily interpretable theme. from a set of words in topic 3 on lda, “reserve study room” and “check out laptop computer” were inferred as two separable, easily interpretable themes. from a set of words in topic 15 on corex with nine anchors, no theme was inferred as an easily interpretable theme. (see table 10 in the results section for all themes inferred from table 6.) information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 10 table 6. examples of topics found by topic modeling techniques topic modeling technique topics (top 15 topics with eight words per topic) note: parenthetical additions are explanations or descriptions and not part of the topic. latent semantic analysis (lsa) topic 1. article book search find access link will check topic 2. renew book article room reserve search journal check topic 3. room renew reserve book study scheduler loan online topic 4. renew request loan interlibrary search room review peer topic 5. loan floor renew access interlibrary request log book topic 6. book open print request search loan renew interlibrary topic 7. print floor open printer color hour research pm topic 8. open hour print search review close peer floor topic 9. print access renew research book loan librarian open topic 10. floor article open book renew print locate database topic 11. article book attach file print database floor check topic 12. 
check book desk laptop answer print shortly open topic 13. answer desk shortly place room database circulation pick topic 14. review peer search reserve log access campus database topic 15. database file attach collection access journal research reserve probabilistic latent semantic analysis (plsa) topic 1. collection special youth contact email number archive department topic 2. book title hold online check pick number reserve topic 3. room reserve study scheduler reservation group rodscheduler (software) space topic 4. search bar click type journal onesearch (a discovery tool) result homepage topic 5. request loan interlibrary link illiad (system) submit inter instruction topic 6. renew online account book today number circulation item topic 7. access link log campus click work online sign topic 8. article journal attach file title access google scholar topic 9. research librarian paper appointment consultation source topic question topic 10. open hour today close pm tomorrow midnight tonight topic 11. check answer place shortly desk laptop student long topic 12. print color printer computer printing mobile release black topic 13. floor locate desk stack main fourth number section information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 11 topic modeling technique topics (top 15 topics with eight words per topic) note: parenthetical additions are explanations or descriptions and not part of the topic. topic 14. database az subject ebsco(database) list business topic access topic 15. review peer journal topic sociology study article result latent dirichlet allocation (lda) with sklearn topic 1. file attach cite citation link article author pdf topic 2. check book renew student item today time member topic 3. room reserve computer laptop study check reservation desk topic 4. book request loan interlibrary check title online copy topic 5. search article database review result type google bar topic 6. student class access iowa course university college fall topic 7. research librarian source paper topic good appointment specific topic 8. email contact chat good librarian work question address topic 9. open hour today check pick hold desk close topic 10. link access click log work campus sign database topic 11. floor locate desk main art music circulation section topic 12. medium digital check video hub desk rent camera topic 13. article journal access title online link education amp topic 14. print printer color card scan document charge job topic 15. answer check place collection shortly special question number dirichlet multinomial mixture (dmm) topic 1. room reserve how will study check floor what topic 2. request loan book interlibrary how article will link topic 3. article access find journal link how search full topic 4. book how find check what online link will topic 5. article find attach file what how will link topic 6. how check open today desk hour will what topic 7. find article what search how research source database topic 8. how print will cite printer link what citation topic 9. search article find how review will database journal topic 10. book find floor how will where call number topic 11. book check how renew will today request what topic 12. research how librarian find what article will email topic 13. find how will contact collection what special email topic 14. access article link log how campus database work topic 15. 
article find will search what link book how anchored correlation explanation (corex) with nine anchor words topic 1. request loan interlibrary illiad (system) form submit inter fill topic 2. study reserve room scheduler hub medium equipment digital information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 12 topic modeling technique topics (top 15 topics with eight words per topic) note: parenthetical additions are explanations or descriptions and not part of the topic. topic 3. search review peer bar result type onesearch (a discovery tool) homepage topic 4. today open hour pm assist close window midnight topic 5. locate floor main where third fourth desk stack topic 6. print printer color printing black white mobile release topic 7. number collection special call phone youth archive xxx topic 8. research librarian appointment consultation paper set xxx transfer topic 9. access database journal article campus full az text topic 10. email will contact work when good who student topic 11. education read school class professor amp teacher child topic 12. topic source cite write apa start citation recommend topic 13. find attach file google what scholar title specific topic 14. click log link left side catid button hand topic 15. shortly place answer check cedar fall iowa northern guidedlda with nine anchor words and confidence 0.75 topic 1. book request loan interlibrary will how check link topic 2. room reserve how check will desk study medium topic 3. search article find how will database book review topic 4. book check how renew today will hour open topic 5. book floor find how check where call locate topic 6. print how computer will printer color desk student topic 7. contact collection will find email special how check topic 8. research librarian find how what will email article topic 9. article access link how log click database find topic 10. article find how access what link attach file topic 11. find chat copy how good online what will topic 12. article find file attach what journal will work topic 13. how check book answer place shortly what find topic 14. book how find what sport link video textbook topic 15. how cite what find citation author article source information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 13 results this section first introduces which topic modeling techniques, as well as which type of data set, performed the best on each of the three topic coherence metrics. it follows by introducing which technique was the best according to human qualitative judgment. quantitative evaluation with topic coherence metrics given that for a topic coherence metric tc-pmi larger values mean more coherent topics, table 7 and its corresponding figure 1 show that corex with anchor words on the whole-chat performed best on tc-pmi. tf–idf & plsa on the whole-chat performed better than lda on the whole-chat. given that for topic coherence metric tc-lcp larger values mean more coherent topics, table 8 and its corresponding figure 2 show that dmm on the whole-chat performed best on tc-lcp. tf– idf & plsa on the whole-chat performed better than lda, even though lda (pymallet) on the whole-chat performed better than tc-idf & plsa on the whole-chat. 
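as context for the numbers reported in tables 7 through 9, the following sketch shows roughly how a per-topic tc-pmi (npmi) score and a tc-nz count can be derived from binarized document co-occurrence over the chat corpus, as described in the methods above. the exact definitions, smoothing, and averaging used in this study follow the cited works and the authors' scripts, so this is only an approximation of the general idea, run on a toy corpus.

```python
import math
from itertools import combinations

def doc_frequencies(docs):
    """map each word and each word pair to the number of documents containing it."""
    single, pair = {}, {}
    for doc in docs:
        words = set(doc.split())
        for w in words:
            single[w] = single.get(w, 0) + 1
        for a, b in combinations(sorted(words), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair

def topic_coherence(topic_words, docs, eps=1e-12):
    single, pair = doc_frequencies(docs)
    n = len(docs)
    npmi_scores, never_together = [], 0
    for a, b in combinations(sorted(topic_words), 2):
        p_a, p_b = single.get(a, 0) / n, single.get(b, 0) / n
        p_ab = pair.get((a, b), 0) / n
        if p_ab == 0:
            never_together += 1          # contributes to the tc-nz count
            continue
        pmi = math.log((p_ab + eps) / (p_a * p_b + eps))
        npmi_scores.append(pmi / -math.log(p_ab + eps))
    tc_pmi = sum(npmi_scores) / max(len(npmi_scores), 1)
    return tc_pmi, never_together        # (tc-pmi, tc-nz) for one topic

docs = ["renew book online", "interlibrary loan request book", "reserve study room"]
print(topic_coherence(["book", "renew", "loan"], docs))
```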
given that for topic coherence metric tc-nz smaller values mean more coherent topics, table 9 and its corresponding figure 3 show that tf–idf & plsa, lda and lda (pymallet) on the wholechat performed best on tc-nz. table 7. tc-pmi comparison of topic modeling techniques on the four types of data sets (with top 15 topics with eight words per topic) topic modeling technique whole-chat whole-chat (noun, adjective, verb) whole-chat (noun, adjective) question-only tf–idf & lsa -0.066 -0.061 -0.063 -0.429 tf–idf & plsa 0.508 0.321 0.494 -0.122 lda (sklearn) 0.378 0.261 0.099 -0.995 lda (pymallet) 0.218 0.262 0.271 -0.091 dmm 0.136 0.22 0.285 0.109 corex without anchor words 0.47 0.497 0.396 -0.584 corex with nine anchor words 0.522 0.534 0.558 -0.401 guidedlda with nine anchor words and confidence 0.75 0.133 0.216 0.262 0.069 information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 14 figure 1. tc-pmi comparison of topic modeling techniques on the four types of data sets. table 8. tc-lcp comparison of topic modeling techniques on the four types of data sets (with top 15 topics with eight words per topic) topic modeling technique whole-chat whole-chat (noun, adjective, verb) whole-chat (noun, adjective) question-only tf–idf & lsa -1.114 -1.124 -1.204 -1.675 tf–idf & plsa -0.751 -0.793 -0.893 -1.956 lda (sklearn) -0.789 -0.979 -1.263 -2.827 lda (pymallet) -0.637 -0.767 -0.918 -1.626 dmm -0.546 -0.645 -0.731 -1.159 corex without anchor words -0.868 -0.853 -1.062 -2.618 corex with nine anchor words -0.82 -0.791 -0.884 -2.348 guidedlda with nine anchor words and confidence 0.75 -0.637 -0.686 -0.792 -1.143 information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 15 figure 2. tc-lcp comparison of topic modeling techniques on the four types of data sets. information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 16 table 9. tc-nz comparison of topic modeling techniques on the four types of data sets (with top 15 topics with eight words per topic) topic modeling technique whole-chat whole-chat (noun, sdjective, verb) whole-chat (noun, adjective) question-only tf–idf & lsa 0.267 0.267 0.333 1.8 tf–idf & plsa 0 0 0.067 3.8 lda (sklearn) 0 0.467 1.2 7.067 lda (pymallet) 0 0.133 0.267 1.8 dmm 0.067 0 0 0.267 corex without anchor words 0.333 0.067 0.6 7.067 corex with nine anchor words 0.133 0 0.133 5.267 guidedlda with nine anchor words and confidence 0.75 0.2 0.067 0 0.133 figure 3. tc-nz comparison of topic modeling techniques on all four data sets. information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 17 last, all tables 7 to 9 and their corresponding figures 1 to 3 clearly show that the whole-chat data set with all parts of speech was generally the best data set on all the techniques. qualitative evaluation with human judgment as shown in table 10, all techniques had relatively high accuracy and interpretability in terms of straightforward topics or themes in italicized text, such as “interlibrary loan,” “technology,” “hours,” and “room reservations,” where one keyword could represent a whole theme. however, in terms of less-straightforward topics or themes plsa performed better than the other techniques. 
in other words, plsa had the highest number of topics that are aligned clearly with frequently asked questions or are easily inferable themes in academic library contexts. also, plsa had a lower number of unrelated or multiple themes within one topic, whereas other techniques had a higher number of unrelated or multiple themes within one topic. as an example, topic 8 on dmm shows that “print” and “citation” can be inferred as two unrelated themes within one topic. table 10. examples of themes qualitatively inferred from a list of words (a topic) identified by each topic modeling technique topic modeling technique themes inferred from table 6 (note: italics denotes straightforward themes; and strikethrough denotes themes with no interpretability or unrelated, multiple themes within one topic) latent semantic analysis (lsa) topic 1. find or access book or article topic 2. renew book or article; reserve a room; search journal topic 3. renew book online; reserve room; loan topic 4. renew; interlibrary loan; search; room topic 5. renew book; interlibrary loan; floor topic 6. renew; interlibrary loan print book; search topic 7. print color; floor; hours; research topic 8. hours; print; search; peer peer review; floor topic 9. print; renew book; librarian; open hours topic 10. renew book and article, print, floor and locate; database topic 11. print; database; floor topic 12. check out book or laptop; print; open topic 13. circulation desk; room; database topic 14. not clear topic 15. not clear probabilistic latent semantic analysis (plsa) topic 1. contact information of special collection and youth topic 2. not clear topic 3. room reservation topic 4. journal search and onesearch topic 5. interlibrary loan request topic 6. how to renew book online topic 7. working from off campus (not clear) topic 8. journal article via google scholar topic 9. appointment with librarians for research consultations topic 10. open hours topic 11. not clear topic 12. printing information technology and libraries september 2021 topic modeling as a tool for analyzing library chat transcripts | koh and fienup 18 topic modeling technique themes inferred from table 6 (note: italics denotes straightforward themes; and strikethrough denotes themes with no interpretability or unrelated, multiple themes within one topic) topic 13. stack on the fourth floor topic 14. databases a-z for business including ebsco topic 15. peer reviewed journals for sociology latent dirichlet allocation (lda) with sklearn topic 1. not clear topic 2. not clear topic 3. reserve study room; check out laptop computer topic 4. interlibrary loan online topic 5. search article via databases topic 6. not clear topic 7. appointment with research librarians topic 8. contact librarian via email topic 9. open hours topic 10. database access from off campus topic 11. floor for art and music circulation desk topic 12. rent camera topic 13. access journal article topic 14. printing and charge topic 15. special collection dirichlet multinomial mixture (dmm) topic 1. reserve study room and floor topic 2. interlibrary loan topic 3. search and access article topic 4. find book online topic 5. find article (not clear) topic 6. open hours topic 7. find article and database topic 8. print; citation topic 9. find article & database topic 10. find book with call number topic 11. renew book (not clear) topic 12. email librarians for research help topic 13. special collection (not clear) topic 14. access article/database from on campus topic 15. 
anchored correlation explanation (corex) with nine anchor words
topic 1. interlibrary loan
topic 2. reserve study room; equipment
topic 3. peer-reviewed and onesearch
topic 4. open hours
topic 5. floor location
topic 6. printing
topic 7. special collection and phone number
topic 8. research consultation appointment
topic 9. access database a-z
topic 10. not clear
topic 11. not clear
topic 12. apa citations
topic 13. google scholar (not clear)
topic 14. log in
topic 15. not clear

guidedlda with nine anchor words and confidence 0.75
topic 1. interlibrary loan
topic 2. reserve study room & medium
topic 3. search and find article; databases
topic 4. renew book; hours
topic 5. find book with call number
topic 6. printing
topic 7. special collection
topic 8. email to research librarian
topic 9. access article and databases
topic 10. access article; attach file (not clear)
topic 11. not clear
topic 12. find article and journal; file attach (not clear)
topic 13. not clear
topic 14. find book, video, and textbook about sport
topic 15. citation

discussion

given that different topic modeling techniques performed best depending on the topic coherence metric used, it is not possible to conclude firmly that one technique is better than the others. interestingly, lda, the commonly used technique tested in both sklearn and pymallet in this study, did not consistently outperform tf–idf & plsa. in addition, the semisupervised techniques, anchored correlation explanation (corex) and guided lda (guidedlda), did not necessarily outperform the unsupervised dirichlet multinomial mixture (dmm). last, by human qualitative judgment, plsa performed the best, which aligns with the findings on tc-nz. this might imply that tc-nz is a more appropriate metric than the other metrics for measuring topic coherence in the context of academic library chat transcripts.

in terms of data sets, all three whole-chat data sets significantly outperformed the question-only data set. at the outset of the study, it was conjectured that the initial question of each chat transaction might concentrate the essence of the chat, thereby leading to better performance. clearly this was not the case, possibly because the rest of a chat transcript reinforces a topic by standardizing the vocabulary of the chat's initial question. it was somewhat interesting that varying the parts of speech (pos) retained in the three whole-chat data sets had little effect on the topic modeling analyses. this might imply that topic modeling techniques are sensitive enough to differentiate across parts of speech on their own, leading to good performance regardless of the data set type.

conclusion

this study clearly showed that conventional techniques should also be examined, to avoid the erroneous assumption that newly developed techniques such as lda will always perform better regardless of context.
also, both the quantitative and qualitative evaluations indicate that unsupervised techniques should be weighted equally with semi-supervised techniques that involve human intervention. as a future study, and as in other similar research, it would be meaningful to compare human qualitative judgment with the scores of each metric more rigorously, along with input from more librarians, to confirm (or disconfirm) our preliminary conclusion that tc-nz is the most appropriate topic coherence metric in the context of library chat transcripts.25 it would also be interesting to investigate and examine semi-supervised techniques with different types of anchoring approaches, such as tandem anchoring.26 last, in order to overcome the limitations of this study, it would be valuable to collect a larger and more diverse set of chat reference data and compare the output of topics across different types of institutions (e.g., teaching versus research institutions).

acknowledgments

this project was made possible in part by the institute of museum and library services [national leadership grants for libraries, lg-34-19-0074-19].

endnotes

1 christina m. desai and stephanie j. graves, "cyberspace or face-to-face: the teachable moment and changing reference mediums," reference & user services quarterly 47, no. 3 (spring 2008): 242–55, https://www.jstor.org/stable/20864890; megan oakleaf and amy vanscoy, "instructional strategies for digital reference: methods to facilitate student learning," reference & user services quarterly 49, no. 4 (summer 2010): 380–90, https://www.jstor.org/stable/20865299; shu z. schiller, "chat for chat: mediated learning in online chat virtual reference service," computers in human behavior 65 (july 2016): 651–65, https://doi.org/10.1016/j.chb.2016.06.053; mila semeshkina, "five major trends in online education to watch out for in 2021," forbes, february 2, 2021, https://www.forbes.com/sites/forbesbusinesscouncil/2021/02/02/five-major-trends-in-online-education-to-watch-out-for-in-2021/?sh=3261272521eb.

2 maryvon côté, svetlana kochkina, and tara mawhinney, "do you want to chat? reevaluating organization of virtual reference service at an academic library," reference and user services quarterly 56, no. 1 (fall 2016): 36–46, https://www.jstor.org/stable/90009882; sarah lemire, lorelei rutledge, and amy brunvand, "taking a fresh look: reviewing and classifying reference statistics for data-driven decision making," reference & user services quarterly 55, no. 3 (spring 2016): 230–38, https://www.jstor.org/stable/refuseserq.55.3.230; b. jane scales, lipi turner-rahman, and feng hao, "a holistic look at reference statistics: whither librarians?," evidence based library and information practice 10, no. 4 (december 2015): 173–85, https://doi.org/10.18438/b8x01h.
3 pamela j. howard, "can academic library instant message transcripts provide documentation of undergraduate student success?," journal of web librarianship 13, no. 1 (february 2019): 61–87, https://doi.org/10.1080/19322909.2018.1555504.

4 côté and kochkina, "do you want to chat?"; sharon q. yang and heather a. dalal, "delivering virtual reference services on the web: an investigation into the current practice by academic libraries," journal of academic librarianship 41, no. 1 (november 2015): 68–86, https://doi.org/10.1016/j.acalib.2014.10.003.

5 feifei liu, "how information-seeking behavior has changed in 22 years," nn/g nielsen norman group, january 26, 2020, https://www.nngroup.com/articles/information-seeking-behavior-changes/; amanda spink and jannica heinström, eds., new directions in information behavior (bingley, uk: emerald group publishing limited, 2011).

6 kathryn barrett and amy greenberg, "student-staffed virtual reference services: how to meet the training challenge," journal of library & information services in distance learning 12, no. 3–4 (august 2018): 101–229, https://doi.org/10.1080/1533290x.2018.1498620; robin canuel et al., "developing and assessing a graduate student reference service," reference services review 47, no. 4 (november 2019): 527–43, https://doi.org/10.1108/rsr-06-2019-0041.

7 bhagyashree vyankatrao barde and anant madhavrao bainwad, "an overview of topic modeling methods and tools," in proceedings of international conference on intelligent computing and control systems, 2018, 745–50, https://doi.org/10.1109/iccons.2017.8250563; jordan boyd-graber, david mimno, and david newman, "care and feeding of topic models: problems, diagnostics, and improvements," in handbook of mixed membership models and their applications, eds. edoardo m. airoldi et al. (new york: crc press, 2014), 225–54.

8 miriam l. matteson, jennifer salamon, and lindy brewster, "a systematic review of research on live chat service," reference & user services quarterly 51, no. 2 (winter 2011): 172–89, https://www.jstor.org/stable/refuseserq.51.2.172.

9 kate fuller and nancy h. dryden, "chat reference analysis to determine accuracy and staffing needs at one academic library," internet reference services quarterly 20, no. 3–4 (december 2015): 163–81, https://doi.org/10.1080/10875301.2015.1106999; sarah passonneau and dan coffey, "the role of synchronous virtual reference in teaching and learning: a grounded theory analysis of instant messaging transcripts," college & research libraries 72, no. 3 (2011): 276–95, https://doi.org/10.5860/crl-102rl.

10 paula r. dempsey, "'are you a computer?' opening exchanges in virtual reference shape the potential for teaching," college & research libraries 77, no. 4 (2016): 455–68, https://doi.org/10.5860/crl.77.4.455; jennifer waugh, "formality in chat reference: perceptions of 17- to 25-year-old university students," evidence based library and information practice 8, no. 1 (2013): 19–34, https://doi.org/10.18438/b8ws48.
11 robin brown, "lifting the veil: analyzing collaborative virtual reference transcripts to demonstrate value and make recommendations for practice," reference & user services quarterly 57, no. 1 (fall 2017): 42–47, https://www.jstor.org/stable/90014866; sarah maximiek, elizabeth brown, and erin rushton, "coding into the great unknown: analyzing instant messaging session transcripts to identify user behaviors and measure quality of service," college & research libraries 71, no. 4 (2010): 361–73, https://doi.org/10.5860/crl-48r1.

12 christopher brousseau, justin johnson, and curtis thacker, "machine learning based chat analysis," code4lib journal 50 (february 2021), https://journal.code4lib.org/articles/15660; ellie kohler, "what do your library chats say?: how to analyze webchat transcripts for sentiment and topic extraction," in brick & click libraries conference proceedings (maryville, mo: northwest missouri state university, 2017), 138–48, https://files.eric.ed.gov/fulltext/ed578189.pdf; megan ozeran and piper martin, "'good night, good day, good luck,'" information technology and libraries 38, no. 2 (june 2019): 49–57, https://doi.org/10.6017/ital.v38i2.10921; thomas stieve and niamh wallace, "chatting while you work: understanding chat reference user needs based on chat reference origin," reference services review 46, no. 4 (november 2018): 587–99, https://doi.org/10.1108/rsr-09-2017-0033; nadaleen tempelman-kluit and alexa pearce, "invoking the user from data to design," college & research libraries 75, no. 5 (2014): 616–40, https://doi.org/10.5860/crl.75.5.616.

13 jordan boyd-graber, yuening hu, and david mimno, "applications of topic models," foundations and trends in information retrieval 11, no. 2–3 (2017): 143–296, https://mimno.infosci.cornell.edu/papers/2017_fntir_tm_applications.pdf.

14 ewa m. golonka, medha tare, and carrie bonilla, "peer interaction in text chat: qualitative analysis of chat transcripts," language learning & technology 21, no. 2 (june 2017): 157–78, http://hdl.handle.net/10125/44616; laura d. kassner and kate m. cassada, "chat it up: backchanneling to promote reflective practice among in-service teachers," journal of digital learning in teacher education 33, no. 4 (august 2017): 160–68, https://doi.org/10.1080/21532974.2017.1357512.

15 eradah o. hamad et al., "toward a mixed-methods research approach to content analysis in the digital age: the combined content-analysis model and its applications to health care twitter feeds," journal of medical internet research 18, no. 3 (march 2016): e60, https://doi.org/10.2196/jmir.5391; janet richardson et al., "tweet if you want to be sustainable: a thematic analysis of a twitter chat to discuss sustainability in nurse education," journal of advanced nursing 72, no. 5 (january 2016): 1086–96, https://doi.org/10.1111/jan.12900.
16 shuyuan mary ho et al., "computer-mediated deception: strategies revealed by language-action cues in spontaneous communication," journal of management information systems 33, no. 2 (october 2016): 393–420, https://doi.org/10.1080/07421222.2016.1205924; mina park, milam aiken, and laura salvador, "how do humans interact with chatbots?: an analysis of transcripts," international journal of management & information technology 14 (2018): 3338–50, https://doi.org/10.24297/ijmit.v14i0.7921.

17 abdur rahman, m. a. basher, and benjamin c. m. fung, "analyzing topics and authors in chat logs for crime investigation," knowledge and information systems 39, no. 2 (march 2014): 351–81, https://doi.org/10.1007/s10115-013-0617-y; michelle drouin et al., "linguistic analysis of chat transcripts from child predator undercover sex stings," journal of forensic psychiatry & psychology 28, no. 4 (february 2017): 437–57, https://doi.org/10.1080/14789949.2017.1291707; da kuang, p. jeffrey brantingham, and andrea l. bertozzi, "crime topic modeling," crime science 6, no. 12 (december 2017): 1–12, https://doi.org/10.1186/s40163-017-0074-0; md waliur rahman miah, john yearwood, and siddhivinayak kulkarni, "constructing an inter‐post similarity measure to differentiate the psychological stages in offensive chats," journal of the association for information science and technology 66, no. 5 (january 2015): 1065–81, https://doi.org/10.1002/asi.23247.
18 charu c. aggarwal and chengxiang zhai, eds., mining text data (new york: springer, 2012); rubayyi alghamdi and khalid alfalqi, "a survey of topic modeling in text mining," international journal of advanced computer science and applications 6, no. 1 (2015): 146–53, https://doi.org/10.14569/ijacsa.2015.060121; leticia h. anaya, "comparing latent dirichlet allocation and latent semantic analysis as classifiers" (phd diss., university of north texas, 2011); barde and bainwad, "an overview of topic modeling"; david m. blei, "topic modeling and digital humanities," journal of digital humanities 2, no. 1 (winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-and-digital-humanities-by-david-m-blei/; tse-hsun chen, stephen w. thomas, and ahmed e. hassan, "a survey on the use of topic models when mining software repositories," empirical software engineering 21, no. 5 (september 2016): 1843–919, https://doi.org/10.1007/s10664-015-9402-8; elisabeth günther and thorsten quandt, "word counts and topic models: automated text analysis methods for digital journalism research," digital journalism 4, no. 1 (october 2016): 75–88, https://doi.org/10.1080/21670811.2015.1093270; gabe ignatow and rada mihalcea, an introduction to text mining: research design, data collection, and analysis (new york: sage, 2017); stefan jansen, hands-on machine learning for algorithmic trading: design and implement investment strategies based on smart algorithms that learn from data using python (birmingham: packt publishing limited, 2018); lin liu et al., "an overview of topic modeling and its current applications in bioinformatics," springerplus 5, no. 1608 (september 2016): 1–22, https://doi.org/10.1186/s40064-016-3252-8; john w. mohr and petko bogdanov, "introduction—topic models: what they are and why they matter," poetics 41, no. 6 (december 2013): 545–69, https://doi.org/10.1016/j.poetic.2013.10.001; gerard salton, anita wong, and chung-shu yang, "a vector space model for automatic indexing," communications of the acm 18, no. 11 (november 1975): 613–20, https://doi.org/10.1145/361219.361220; jianhua yin and jianyong wang, "a dirichlet multinomial mixture model-based approach for short text clustering," in proceedings of the twentieth acm sigkdd international conference on knowledge discovery and data mining (new york: acm, 2014), 233–42, https://doi.org/10.1145/2623330.2623715; hongjiao xu et al., "exploring similarity between academic paper and patent based on latent semantic analysis and vector space model," in proceedings of the twelfth international conference on fuzzy systems and knowledge discovery (new york: ieee, 2015), 801–5, https://doi.org/10.1109/fskd.2015.7382045; chengxiang zhai, statistical language models for information retrieval (williston, vt: morgan & claypool publishers, 2018).

19 neha agarwal, geeta sikkaa, and lalit kumar awasthib, "evaluation of web service clustering using dirichlet multinomial mixture model based approach for dimensionality reduction in service representation," information processing & management 57, no. 4 (july 2020), https://doi.org/10.1016/j.ipm.2020.102238; chenliang li et al., "topic modeling for short texts with auxiliary word embeddings," in proceedings of the thirty-ninth international acm sigir conference on research and development in information retrieval (new york: acm, 2016), 165–74, https://doi.org/10.1145/2911451.2911499; jipeng qiang et al., "short text topic modeling techniques, applications, and performance: a survey," ieee transactions on knowledge and data engineering 14, no. 8 (april 2019): 1–17, https://doi.org/10.1109/tkde.2020.2992485.
20 ryan j. gallagher et al., "anchored correlation explanation: topic modeling with minimal domain knowledge," transactions of the association for computational linguistics 5 (december 2017): 529–42, https://doi.org/10.1162/tacl_a_00078.

21 jagadeesh jagarlamudi, hal daumé iii, and raghavendra udupa, "incorporating lexical priors into topic models," in proceedings of the thirteenth conference of the european chapter of the association for computational linguistics (stroudsburg, pa: acl, 2012), 204–13, https://www.aclweb.org/anthology/e12-1021; olivier toubia et al., "extracting features of entertainment products: a guided latent dirichlet allocation approach informed by the psychology of media consumption," journal of marketing research 56, no. 1 (december 2019): 18–36, https://doi.org/10.1177/0022243718820559.

22 nan zhang and baojun ma, "constructing a methodology toward policy analysts for understanding online public opinions: a probabilistic topic modeling approach," in electronic government and electronic participation, eds. efthimios tambouris et al. (amsterdam, netherlands: ios press bv, 2015): 72–9, https://doi.org/10.3233/978-1-61499-570-8-72.

23 jonathan chang et al., "reading tea leaves: how humans interpret topic models," in proceedings of the twenty-second international conference on neural information processing systems (new york: acm, 2009), 288–96, https://dl.acm.org/doi/10.5555/2984093.2984126.

24 gerlof bouma, "normalized (pointwise) mutual information in collocation extraction," in proceedings of the international conference of the german society for computational linguistics and language technology (tübingen, germany: gunter narr verlag, 2009), 43–53; boyd-graber, mimno, and newman, "care and feeding of topic models," in handbook of mixed membership models and their applications, eds. edoardo m. airoldi, david m. blei, elena a. erosheva, and stephen e. fienberg (boca raton: crc press, 2014), 225–54; jey han lau, david newman, and timothy baldwin, "machine reading tea leaves: automatically evaluating topic coherence and topic model quality," in proceedings of the fourteenth conference of the european chapter of the association for computational linguistics (stroudsburg, pa: acl, 2014), 530–39, https://doi.org/10.3115/v1/e14-1056.

25 lau, newman, and baldwin, "machine reading tea leaves"; david newman et al., "automatic evaluation of topic coherence," in proceedings of human language technologies: the 2010 annual conference of the north american chapter of the association for computational linguistics (new york: acm, 2010), 100–108, https://dl.acm.org/doi/10.5555/1857999.1858011.

26 jeffrey lund et al., "tandem anchoring: a multiword anchor approach for interactive topic modeling," in proceedings of the fifty-fifth annual meeting of the association for computational linguistics (stroudsburg, pa: acl, 2017), 896–905, https://doi.org/10.18653/v1/p17-1083.
article

an omeka s repository for place- and land-based teaching and learning

neah ingram-monteiro and ro mckernan

information technology and libraries | september 2022
https://doi.org/10.6017/ital.v41i3.15123

neah ingram-monteiro (ingramn@wwu.edu) is a teaching and learning librarian, western washington university. ro mckernan (rmckernan@whatcom.edu) is the oer librarian, whatcom community college. © 2022.

abstract

our small community college library developed a learning object repository to support a cross-institutional, land-based, multidisciplinary academic initiative using the open-source platform omeka s. drawing on critical, feminist, and open practices, we document the relational labor, dialogue, and tensions involved with this open education project. this case study shares our experience with tools and processes that may be helpful for other small-scale open education initiatives, including user-centered iterative design, copyright education, metadata design, and user-interface development in omeka s.

introduction

whatcom community college (wcc) is a rural, public institution located on the lands of coast salish peoples, including lummi, nooksack, semiahmoo, and samish, in the northwest region of washington state, just south of british columbia and the us-canada border. referred to as the pacific northwest or the north puget sound, this area is part of the greater salish sea bioregion (see fig. 1). the sea's name was adopted in 2009 in washington state and british columbia to refer collectively to the strait of georgia, the strait of juan de fuca, and the puget sound.1 the library at wcc has recently established several new digital services, including the college's first institutional repository. housed within this repository is a site named the salish sea curriculum repository, which has been developed to host a collection of materials and multidisciplinary curriculum related to engaging college students with this bioregion and is a unique cross-institutional collaboration between the library and the salish sea institute at nearby western washington university (wwu). in this paper, we document, from the perspective of the constraints of a small community college library, the development of the institutional repository service through the creation of the salish sea curriculum repository.
this first phase of the development process began through relational work and proceeded through user-centered iterative design, copyright education, metadata design, and user-centered interface development. a second phase was then launched that produced a curated index of existing work. we document our process as a case study of a small-scale, open-source–backed scholarly communication project that can reasonably be replicated by other smaller institutions, in order to encourage scholarly communication and open education services at all levels of librarianship.

figure 1. "reference map for the salish sea bioregion," aquila flower, 2020. made as part of the salish sea atlas, https://wp.wwu.edu/salishseaatlas/. creative commons attribution-noncommercial-noderivatives 4.0 international license.

description of library repository service development

in spring 2020, our library began to develop an institutional repository in response to a need to document faculty and staff scholarship and student scholarship, including newspapers and journals, and to host a collection of historical college images and videos. lacking the budget for bepress and the dedicated technical expertise to implement dspace, we found that omeka s hosted on a shared server through reclaim hosting was the most appropriate fit for our needs. omeka was originally developed at the roy rosenzweig center for history and new media at george mason university; it offers libraries and museums a way to publish online exhibits while ensuring accessibility and the inclusion of standards-based metadata to support discovery and use.2 omeka s is a later platform that offers a single point of administration for multiple instances of omeka, making it more usable for institutions like ours with a variety of unique collection sites with their own display templates. omeka s adheres to international standards, such as the dublin core schema for metadata, and allows creation of digital content collections, simple web pages, and complex online exhibits. the software can be managed and administered by one librarian. it allows interoperability through the open archives initiative protocol for metadata harvesting (oai-pmh is critical for future ingestion into the digital public library of america, which will provide wider discoverability) and rest apis (which will be necessary for any integration into the library's opac; a sketch of a simple api query appears below). as the college's open education resources and copyright librarian, mckernan initially developed and administers the library's omeka s installation. while the initial collections were in line with traditional institutional repository sites, a new need developed later in 2020 in response to curricular developments at the college: a repository based around multidisciplinary, land-based learning objects.
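as a rough illustration of that interoperability, the following minimal sketch lists items from an omeka s site using python's requests library. the base url and item-set id are hypothetical placeholders, and the endpoint and property names reflect our understanding of the omeka s rest api (public read access, json-ld responses in which dcterms properties are lists of value objects); treat it as a sketch rather than a tested integration.

```python
# minimal sketch: list items from an omeka s site via its rest api.
# BASE_URL and ITEM_SET_ID are hypothetical placeholders; the call assumes
# public read access (write operations would additionally need the
# key_identity/key_credential api keys issued from the omeka s admin dashboard).
import requests

BASE_URL = "https://repository.example.edu"  # hypothetical omeka s install
ITEM_SET_ID = 123                            # hypothetical item set (a collection)


def list_items(base_url, item_set_id, per_page=25):
    """Fetch one page of items belonging to an item set."""
    response = requests.get(
        f"{base_url}/api/items",
        params={"item_set_id": item_set_id, "per_page": per_page},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def first_value(item, prop):
    """Return the first literal value of a property such as 'dcterms:title'."""
    values = item.get(prop, [])
    return values[0].get("@value") if values else None


if __name__ == "__main__":
    for item in list_items(BASE_URL, ITEM_SET_ID):
        print(first_value(item, "dcterms:title"), "|", first_value(item, "dcterms:subject"))
```

a harvester for the digital public library of america would instead use the oai-pmh endpoint rather than this json api, but the same items and dublin core metadata are exposed either way.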
this repository was the salish sea curriculum repository.

development of salish sea curriculum repository

relational work, which in our process includes time to build relationships and engage in dialogue, is important given our mutual exploration of open education projects. luo, hostetler, freeman, and stefaniak point to the importance of a campus culture that supports open education, through resource allocation such as oer design and development.3 as part of a larger team, this project represented three institutions (wcc, wwu, and the university of british columbia) with three different open education cultures. and additional faculty partners at wcc and wwu had varying experience with open education, ranging from an awareness of creative commons licenses to experience authoring oer textbooks. dai and carpenter discuss the feminized labor that goes into oer projects, arguing that, like instruction librarianship, oer librarianship is predicated on relational work.4 as feminized work is often invisible and undervalued, they suggest planning and documenting time for consultative tasks such as meeting with faculty as ways of bringing critical, feminist, and open pedagogies into this work.5 by discussing the development of the salish sea curriculum repository in terms of development phases, we want to devote space not just to the final products, but also to documenting this collaborative process.

a note on terminology: the eric descriptor place based education is described as pedagogy to engage learners in their cultural, social, and ecological contexts; it often includes inviting community members in as instructors and bringing learners into the natural environments where they live. while place-based education is the prevailing term, calderon has argued that the expression of this term has typically erased indigenous relations with land and obscures the violence of settler colonialism.6 in contrast, calderon writes that land education or land-based education makes explicit the ideologies and structures of settler colonialism and that land education centers indigenous peoples' relations to land, critically examining what it means to inhabit the lands of indigenous nations. we will continue to use the phrase land-based education in this paper.

at wcc, the salish sea studies (sali) curriculum was developed by history instructor anna booker in partnership with natalie j. k. baloy of the salish sea institute at wwu. the curriculum includes experiential learning about the complex human-environment systems of the bioregion that builds a sense of place, connection, and relational accountability.7 at both colleges, the instruction teams who co-teach this course rotate from term to term and include faculty from multiple disciplines. instructors at wcc have included faculty from the departments of history, anthropology, geology, and sociology. at wwu, instructors have included faculty with appointments in salish sea studies, canadian-american studies, comparative indigenous studies, the college of the environment, and fairhaven college of interdisciplinary studies. units in the introductory sali course are designed to demonstrate that many ways of knowing are relevant and important to understanding the salish sea.
when the second iteration of the course shifted online due to the covid-19 pandemic, with less than two weeks' notice in spring 2020, instructors shifted to creating learning objects for asynchronous learning. since then, curricula for this course and adjacent courses in salish sea studies have been designed for online, in-person, and hybrid learning. subsequently, booker received a 2020–22 grant from the national endowment for the humanities (neh) to further develop the salish sea studies curriculum. with the pivot to online, she was looking for digital ways to support curriculum sharing. baloy had been approached by ingram-monteiro (who was in ubc's master of library and information studies program at the time) about supporting salish sea studies during the initial covid-19 shutdown. baloy connected her with booker about a possible grant work project. because of booker's previous work on oer with mckernan, she suggested approaching the college's library about a collaboration. the idea of using existing repository software to build a new site that would serve as a space to collect and share curriculum was born out of this dynamic context. in addition to the rotating instruction team and the teaching modality variables introduced by the global health pandemic, the field of salish sea studies as taught in our context was also being defined and articulated concurrently with the initial development of this repository (through distinct curricular conversations). what started as a repository to share open educational resources about the salish sea bioregion became an online space for creating and sharing a curated set of oer explicitly for use in land-based, experiential, multidisciplinary, and transboundary teaching and learning about the salish sea bioregion.

phase one: initial digital repository development

the first phase of development ran from november 2020 to january 2021. library work in the first phase included designing, building out, styling, and initially populating the repository. we used a user-centered iterative design process in this phase (see fig. 2). user-centered iterative design is used in the human-computer interaction field to foreground user needs during design processes. van house, butler, ogle, and schiff discuss how designers need to know the larger context and purposes of users' work with a digital library, as well as their specific tasks and information acts, such as searching or repackaging.8 for our project, user-centered iterative design started with consulting faculty partners to help them articulate use cases, identify primary users, and discuss ways to build the repository to incentivize submissions from instructors. these consultations included asking a lot of questions to draw out their needs and wishes for the repository. ingram-monteiro spent about 15 hours (of the 120 dedicated to this phase of the project) meeting or corresponding with our faculty partners over the course of four weeks. our partners contributed this time and more, in addition to their standard workloads and during winter break. this was very much a dialogue, as we went back and forth on some topics over the course of a few weeks, brainstorming together and looking for inspiration to share with each other.

figure 2. "agile development" by dave gray is licensed under cc by-nd 2.0.
through our dialogue, we were able to articulate that the repository would include adaptable, reusable teaching materials for lessons and courses about the salish sea. users of the repository would primarily include faculty contributors who use the repository to submit teaching materials and instructors from bioregional higher education institutions who use the repository to find material to adapt for their teaching. other users considered could include site visitors who are seeking information about the introduction to salish sea studies course that is taught in parallel at wcc and wwu.

copyright education

we provided copyright education in various modalities to the instructors who were involved during this phase. our faculty partners' questions informed how we built copyright considerations into the repository. questions included what materials they could use and remix in a learning object that they would then license for reuse. while they felt protected by fair use for distributing copyrighted videos, maps, or readings within a traditionally mediated classroom environment, this calculus could not be automatically extended to distribution in oer. a recent systematic literature review of empirical oer studies found that faculty are consistently uncertain how to license their creation when it includes remixed materials,9 and our experience echoed that.10 the solution for the salish sea repository was the creation of a resources section in the repository that included citations for all-rights-reserved published works, so that an instructor could point to traditionally copyrighted works without directly uploading them to the repository (see fig. 3).

figure 3. screenshot of an omeka s record for an article citation in the resources section.

our faculty partners also had questions about creative commons licensing and how to select an appropriate license for their work. we designed the curriculum submission form to include explanations of each creative commons license type, as well as public domain and all rights reserved options. a submitter can read about these six terms and select which license is appropriate for their activity.

figure 4. screenshot of copyright guidance and license selector on the submission form.

while we were able to provide some guidance in the context of this project, bigger questions remained. faculty partners worked through the challenges of creating public oer from private, contextual lesson plans and learning objects.11 given the curricular emphasis on relational accountability in the salish sea bioregion, and the central role of indigenous knowledge holders and ways of knowing in line with land-based education, materials referenced in salish sea studies include traditional knowledge of indigenous nations.
while this knowledge is shared in a consensual way in the context of a course (where a knowledge holder may be an invited guest, for example), sharing these materials in an open repository online introduces different considerations.12 local contexts' traditional knowledge (tk) labels are a popular topic in open education and have been adopted by the library of congress, but as reijerkerk demonstrates, simply applying these labels in online catalogs is insufficient.13 the use of these labels is intended to be one intervention, done in relationship with indigenous knowledge holders.14 our role may be more to ask questions about existing permissions to share that knowledge, especially around ownership of that knowledge. christen provides more context on how tk labels can be applied to such material when it has been published in the public domain.15

metadata schema

omeka s offers linked data infrastructure with the dublin core™ metadata initiative's dcmi metadata terms (dcterms:) as the default vocabulary.16 we used this vocabulary to create a functional metadata schema that allows faculty to describe their submissions in ways that would be useful for other users. the metadata added during the submission process was then cleaned and enhanced by the librarian who reviewed each submission. for site visitors, this metadata allows browsing through the set of learning objects in the repository; they can browse lessons from a particular discipline or place, for example. they can also perform keyword searching to find results based on titles and lesson descriptions.

during this design phase of the metadata, we started with an examination of the types of learning objects that would be shared through the repository. through iterative design we arrived at an initial structure that included four types of objects that would be deposited: assignments, activities, existing published resources, and learning modules. for each type, we then created an omeka s resource template to support consistent metadata processing and a "collecting form" to support metadata collection. we then added 40 resources, one module, two assignments, and five activities—all provided by our faculty partners—to test this structure. after more faculty consultation, we simplified the metadata design to include two learning object types: activities (including assignments) created by instructors and bibliographic citations for core resources used in salish sea studies. we refined our metadata schema for each of these and documented this metadata design and processing. see table 1 for one example.
table 1. metadata design—resource type: activity

metadata field | label | values | notes
dcterms:title | activity | described by submitter |
dcterms:description | lesson description | described by submitter |
dcterms:subject | discipline | indigenous ways of knowing, humanities, natural sciences, social sciences, multidisciplinary/interdisciplinary, other |
dcterms:spatial | spatial coverage | described by submitter | repeatable
dcterms:audience | course modality | face-to-face, online synchronous, online asynchronous, hybrid, other |
dcterms:temporal | temporal coverage | described by submitter | repeatable
dcterms:format | primary format of activity | icebreaker, problem-based discussion, field trip, other |
dcterms:extent | estimated time for students to complete activity | 15 minutes, 30 minutes, one hour, two hours, more than two hours, multiple sessions, all quarter |
dcterms:creator | primary creator(s) | full name | repeatable
dcterms:contributor | institutional affiliation | western washington university, whatcom community college, other |
dcterms:license | license | 8 listed in item set | add as "omeka resource"

user interface design

once we had a working metadata schema, a collection mechanism and workflow, and the high-level site structure, we shifted our focus to considerations of the interactivity and look and feel of the repository site. we heard from our faculty partners that they wanted a clean, colorful design that would be appealing to users. they shared the stanford history education group as one example, noting its simple navigation, and blackpast.org, noting its interactive timeline. they also shared spokanehistorical.org, which is built on omeka and includes an interactive map. we tried to manage expectations about what would be possible. we did not have many resources available for web design or experience with the omeka-compatible mapping and timeline tool neatline. still, we found that it was possible to create a simple, visually pleasing interface with some basic css skills, omeka s modules, and documentation from histsex.org, a library collection made with omeka s.17

modules in omeka s are plug-ins that can be installed and activated to add additional functionality. one of the key modules we activated (on the admin side) was the css editor.18 we could then write an internal style sheet in this editor to style the colors and links in accordance with the web content accessibility guidelines for styling headers, color contrast, and text decoration for hyperlinks.19 the css editor module also includes input fields for external style sheets, which enabled us to include one for google fonts. the color scheme we selected uses wcc's colors and complements the blues and greens of the salish sea.
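as an illustration of the kind of rules this workflow involves, a small internal style sheet like the one below could be pasted into the css editor module; the selectors, font, and hex colors here are illustrative placeholders rather than the rules actually used on the salish sea site.

```css
/* illustrative internal style sheet for the css editor module.
   selectors, font, and colors are placeholders, not the production rules. */

/* headings: a single sans-serif family keeps the hierarchy consistent */
h1, h2, h3 {
  font-family: "Open Sans", Arial, sans-serif;
}

/* body text: dark text on a light background for wcag-level contrast */
body {
  color: #1c2b36;
  background-color: #ffffff;
}

/* links: keep underlines so links are not signaled by color alone */
a {
  color: #005a70;
  text-decoration: underline;
}

/* hover and keyboard focus: darker color plus a visible focus outline */
a:hover,
a:focus {
  color: #003947;
  outline: 2px solid #003947;
}
```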
we initially deployed it only for bibliographic citations because this was the only resource type with a critical mass of existing content when we built the repository. when a visitor opens the resources page from the navigation, they see the option to “find resources by place,” with an embedded openstreetmap that includes markers that link to associated citations in the repository (see fig. 5). the spatial markers help locate scholarship to concepts of land-based education. figure 5. a screenshot of map indicators tied to repository items. a third omeka s module that we installed was fields as tags.21 this module increases the number of browsable metadata fields available to visitors on the main pages of the repository; in addition to title, subject, extent, and creator, visitors can also browse spatial and temporal coverage tags. in the interim months between phase one and phase two, we introduced the repository to wcc faculty who were participating in a year-long professional development workshop about teaching salish sea studies. the culminating project for that workshop involved submitting a teaching activity to the repository. however, while many participants began the submission process, few information technology and libraries september 2022 an omeka s repository for placeand land-based teaching and learning 10 ingram-monteiro and mckernan were able to submit an activity that was repository ready. reflecting on this and their own experiences developing curriculum for introduction to the salish sea, our faculty partners scaled back on expectations for oer development. it was evident that thoughtfully designing land and place-based, experiential, multidisciplinary, transboundary curriculum that is also open and adaptable would require dedicated resources in the context of deeper relationships and a longer timeframe. phase two: a curated index of published works the second phase of development, which took place roughly from july to december 2021, focused on further honing the interface and usability of the site. our faculty partners designated some neh grant funds to pay ingram-monteiro a stipend for summer website development work. further developing the resources section of the repository thus became the focus of our work in summer 2021. by providing a central access point for curated, published works about the salish sea, the repository would support faculty who were developing salish sea teaching materials as oer. we also referred to the resources section as the salish sea index, a space that provides building blocks for teaching materials. developing this index included adding individual pages for maps, collections, and terminology. the maps and collections pages—as well as the original resources page that includes published articles, books, videos, podcasts—are configured to automatically pull in newly entered omeka s items. each item was added using our previously developed bibliographic citation resource template for published resources. whenever there was a creative commons license, open access license, or copyright information provided, we note this at the item level to facilitate reuse and attribution. the digital collections page points visitors to digital collections (such as the northwest indian college salish sea speaker series videos and the south asian american digital archive) as well as to information about physical collections (such as the wing luke museum and the center for pacific northwest studies), which can be visited in person. 
finally, in addition to maps, journal articles, archival collections, and other media catalogued in the index, another building block for creating salish sea teaching materials is the terminology. this html page is in progress. it will be a reference tool for the vocabulary of salish sea studies, synthesizing concepts that are critical to this multidisciplinary and transboundary pedagogy. providing these building blocks functions as a way of supporting oer creation and remixing. phase three: future work—building transboundary community as of spring 2022, the salish sea repository’s role is to share curricular building blocks, learning outcomes, and sample materials. our faculty partners, with our support, are working on building a transboundary community around the repository, including librarians and interdisciplinary scholars engaged with relational accountability and landand place-based learning in higher education. we are producing this article in this context. as we expressed earlier, we wanted this to document the way relationship building is critical to the development and future growth of this project. as an example, we met with ashley edwards, one of two indigenous initiatives and instruction librarians at simon fraser university (sfu). in her work with the indigenous curriculum resource centre at sfu, edwards collects and organizes http://whatcomdigitalcommons.org/s/salishsea/item/1269 http://whatcomdigitalcommons.org/s/salishsea/item/1269 http://whatcomdigitalcommons.org/s/salishsea/item/1273 http://whatcomdigitalcommons.org/s/salishsea/item/1274 http://whatcomdigitalcommons.org/s/salishsea/item/1271 http://whatcomdigitalcommons.org/s/salishsea/item/1271 information technology and libraries september 2022 an omeka s repository for placeand land-based teaching and learning 11 ingram-monteiro and mckernan resources to assist faculty in learning about and engaging in indigenizing their pedagogy and curriculum, centering materials by and about coast salish communities.22 the creation of the center is part of sfu’s response to canada’s truth and reconciliation commission calls to action.23 though no such mandate exists in the us, indigenous and non-indigenous settler faculty at wwu and wcc are engaged with indigenization, as reflected in the inclusion of land-based education in salish sea studies. building a transboundary community invites collaboration with indigenous librarians like edwards, from whose work we can learn how to better support indigenization of curriculum in ethical, responsible, and respectful ways. we also presented the repository at the washington library association academic libraries division/association of college and research libraries of oregon and washington (ald/acrlwa and acrl-wa) academic libraries summit in fall 2021 with the intention of sharing this case study to document our work in the vein of open scholarship. audience questions focused on labor—attendees were interested in knowing about the job titles of people involved with the project. as more oer are developed for sharing in the salish sea repository, we intend to continually evaluate the effectiveness of the repository for users, including user experiences around adapting and remixing the building blocks, filling out the submission form, and browsing learning objects. one area that we expect to focus on is refining the metadata scheme. for example, what is the best approach for describing spatial coverage in this repository, given the variety of place names that can describe one location? 
we began with a controlled vocabulary and then shifted to an open entry user-defined field. this trades off the user’s ability to browse by a place name for the contributor’s ability to choose the specificity of a location name, which is important given the interdisciplinarity of land-based learning and the inclusion of multiple ways of knowing in this curriculum. we hope metadata librarians will be interested in bringing their skills to this project and working through such questions. future refinements will be driven by these evaluations. summary in response to an emerging, multidisciplinary academic initiative that originated at two local public colleges, our small library utilized our omeka s installation to create the salish sea curriculum repository. we implemented this open education project using a user-centered iterative development process. as of spring 2022, this has involved three phases of development. in the first phase, library work focused on metadata design, copyright education, and user interface development in omeka s. in the second phase, we focused on developing an index of salish sea resources, including information to help instructors find, adapt, and remix published maps, vetted terminology, and bioregional archival collections. the third phase will be focused on building a transboundary community around the creation and sharing of oer that meets salish sea studies learning objectives, including inviting other librarians to bring their specialized skills in support of this project. https://whatcomdigitalcommons.org/s/salishsea/page/welcome https://whatcomdigitalcommons.org/s/salishsea/page/welcome information technology and libraries september 2022 an omeka s repository for placeand land-based teaching and learning 12 ingram-monteiro and mckernan endnotes 1 “salish sea naming project,” college of the environment, western washington university, accessed march 17, 2022, https://cenv.wwu.edu/si/salish-sea-naming-project. 2 “project,” omeka, accessed march 16, 2022, https://omeka.org/about/project/. 3 tina luo, kirsten hostetler, candice freeman, and jill stefaniak, ”the power of open: benefits, barriers, and strategies for integration of open educational resources,” open learning: the journal of open, distance, and e-learning 35, no. 2 (october 2020): 149, https://doi.org/10.1080/02680513.2019.1677222. 4 jessica y. dai and lindsay inge carpenter, “bad (feminist) librarians: theories and strategies for oer librarianship,” international journal of open educational resources 3, no. 1 (may 2020): 152, https://doi.org/10.18278/ijoer.3.1.10. 5 dai and carpenter, 159. 6 dolores calderon, “speaking back to manifest destinies: a land education-based approach to critical curriculum inquiry,” environmental education research 20, no. 1, (2014): 24–36. https://doi.org/10.1080/13504622.2013.865114. 7 kathryn l. sobocinski, “section 6: opportunities for improving assessment and understanding of the salish sea,” in state of the salish sea, ed. k. l. sobocinski (bellingham: salish sea institute, 2021), 213, http://doi.org/10.25710/vfhb-3a69. 8 nancy a. van house, mark h. butler, virginia ogle, and lisa shiff, “user-centered iterative design for digital libraries,” d-lib magazine (february 1996), http://webdoc.sub.gwdg.de/edoc/aw/d-lib/dlib/february96/02vanhouse.html. 9 luo, hostetler, freeman, and stefaniak, 143. 
10 the 2021 guide "code of best practices in fair use for open educational resources," distributed by american university's washington college of law and the center for media and social impact, has since become an important resource in such consultations, https://cmsimpact.org/code/open-educational-resources/.

11 one faculty partner shared walthausen's article "how the internet is complicating the art of teaching" (the atlantic, october 26, 2016, https://www.theatlantic.com/education/archive/2016/10/how-the-internet-is-complicating-the-art-of-teaching/505370/), pointing to the sentence "what i hadn't understood before this tentative jump into the broader sharing economy was that making assignments is so much about personalization," which illustrates one challenge to this work.

12 we are writing from lummi territory and so will share an example from here. anthropologist stacy michelle rasmus was asked by the lummi nation to study knowledge transmission and acquisition in the 1990s and early 2000s, including how research relationships are affected by the way knowledge is accessed and controlled in different contexts. in a 2002 article, rasmus shares several ways that outside researchers unethically extract and disseminate knowledge beyond the community. she shares how knowledge holders will share knowledge without giving it away, but outsiders often interpret this sharing as a license to do with it as they wish. she writes, "… some researchers may not in fact know when they have been exposed to knowledge that, within a community context, is considered private in nature."

13 dana reijerkerk, "ux design in online catalogs: practical issues with implementing traditional knowledge (tk) labels," first monday 25, no. 8 (august 2020), https://doi.org/10.5210/fm.v25i8.10406.

14 jane anderson and kim christen, "'chuck a copyright on it': dilemmas of digital return and the possibilities for traditional knowledge licenses and labels," museum anthropology review 7, no. 1–2 (spring–fall 2013), 110.

15 kimberly christen, "tribal archives, traditional knowledge, and local contexts: why the 's' matters," journal of western archives 6, no. 1 (2015), 13, https://doi.org/10.26077/78d5-47cf.

16 "vocabularies," omeka s user manual, accessed may 13, 2022, https://omeka.org/s/docs/user-manual/content/vocabularies/.

17 brian m. watson, "grant application for 50 years on, many years past," iu scholarworks (march 2020), https://hdl.handle.net/2022/25593.

18 "css editor," omeka s user manual, accessed may 13, 2022, https://omeka.org/s/docs/user-manual/modules/csseditor/.

19 "wcag 2 overview," web accessibility initiative, accessed may 13, 2022, https://www.w3.org/wai/standards-guidelines/wcag/.

20 "mapping," omeka s user manual, accessed may 13, 2022, https://omeka.org/s/docs/user-manual/modules/mapping/.

21 libnamic, "omeka s fields as tags," omeka s modules, accessed may 13, 2022, https://omeka.org/s/modules/fieldsastags/.
22 ashley edwards, "supporting faculty in indigenizing curriculum and pedagogy: case study of the indigenous curriculum resource centre," in ethnic studies in academic and research libraries, eds. raymond pun, melissa cardenas-dow, and kenya s. flash (chicago, il: association of college & research libraries, 2021), 171–72.

23 edwards, 177.

article

exploring final project trends utilizing nuclear knowledge taxonomy: an approach using text mining

faizhal arif santosa

information technology and libraries | march 2023
https://doi.org/10.6017/ital.v42i1.15603

faizhal arif santosa (faizhalarif@gmail.com) is academic librarian, polytechnic institute of nuclear technology, national research and innovation agency. © 2022.

abstract

the national nuclear energy agency of indonesia (batan) taxonomy is a framework of nuclear competence fields organized into six categories. the polytechnic institute of nuclear technology, as an institution of nuclear education, faces a challenge in organizing student publications according to the fields in the batan taxonomy, especially in the library. the goal of this research is to determine the most efficient automatic document classification model using text mining to categorize student final project documents in indonesian and to monitor the development of the nuclear field in each category. the knn algorithm is used to classify documents, and the best model is identified by comparing cosine similarity, correlation similarity, and dice similarity, along with binary term occurrence and tf-idf vector creation. a total of 99 documents labeled as reference data were obtained from the batan repository, and 563 unlabeled final project documents were prepared for prediction. several text mining techniques were applied, including stemming, stop word filtering, n-grams, and filtering by token length. the best model used k = 4 with cosine similarity and binary term occurrence, reaching an accuracy of 97 percent; knn worked better with binary term occurrence than with tf-idf for these indonesian-language documents. engineering of nuclear devices and facilities is the most popular field among students, while management is the least preferred; isotopes and radiation, however, is the most prominent field in nuclear technochemistry. text mining can assist librarians in grouping documents based on specific criteria. it also makes it possible to observe the evolution of each category as documents accumulate and to apply similar methods in other circumstances.
because of the curriculum and courses given, the growth of each discipline of nuclear science in the study program is different and varied. introduction the national nuclear energy agency of indonesia (batan), now known as the research organization for nuclear energy (ortn)—national research and innovation agency (brin), in 2018 issued a decision regarding batan’s six competencies: isotopes and radiation (ir), nuclear fuel cycle and advanced materials (nfcam), engineering of nuclear devices and facilities (endf), nuclear reactor (nr), nuclear and radiation safety and security (nrss), and management (mgt). these areas of focus are also known as batan’s knowledge taxonomy, which is used to support nuclear knowledge management (nkm) and the grouping of explicit knowledge in repositories.1 the polytechnic institute of nuclear technology (pint), which is under the auspices of batan and is now in one of the directorates of brin, can also utilize batan’s knowledge taxonomy to classify students’ final assignments. every year the pint library accepts final assignments from mailto:faizhalarif@gmail.com information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 2 santosa students who have graduated from three study programs, namely nuclear technochemistry, electronics instrumentation, and electromechanics. over the past six years (2017 to 2022), 563 final assignments in indonesian were collected and needed to be classified into the batan’s knowledge taxonomy in order to see the document growth of each existing competency. however, it is quite time consuming for librarians to assign individual documents to the most appropriate taxonomy term. it is also possible to involve experts to determine the right group, which results in increased working time to complete a document. this obstacle arises because librarians do not have in-depth and detailed knowledge of the nuclear field so it is feared that grouping errors will occur. in this study, the author tried to classify the collection of final project documents owned by the pint library based on batan’s knowledge taxonomy. the author used text mining tools, choosing the k-nearest neighbors (knn) algorithm for this study. similar research also leads to trying to focus on automatic document classification of certain subjects,2 which in this case is the subject of nuclear engineering. the hope is that users will find it easier to explore knowledge according to their area of interest through taxonomy grouping based on explicit knowledge,3 in this case, pint students’ final project documents. finding the trend of research conducted by students on each subject is also one of the goals of this research. literature review text mining in libraries the increasing number of publications currently makes it a challenge to classify and find out the growth and trends of a topic. document classification is one of the jobs that is quite time consuming so document classification automation by utilizing text mining is very necessary.4 the application and utilization of text mining itself is very broad. several studies have demonstrated the usefulness of text mining in libraries. pong et al. 
from city university of hong kong conducted research to facilitate the classification process using machine learning.5 this study aimed to streamline document categorization utilizing automatic document classification by using a system called the web-based automatic document classification system (wadcs) and claimed to be the pioneer of a comprehensive study of automatic document classification on a classification that is already popular in the world, namely the library of congress classification (lcc) utilizing knn and naive bayes (nb). this research indicates that the machine-learning algorithm they used can be applied by the library for document classification. wagstaff and liu utilized text mining to perform automatic classification to help make decisions to select candidate documents for weeding.6 this study used data from wesleyan university from 2011 to 2014 to predict which documents were eligible for weeding and which will be stored. five classifier models, namely knn, naive bayes, decision tree, random forest, and support vector machines (svm), were used to compare their performance. while this process may not replace librarians, this study can help librarians make better decisions and reduce their workload significantly. lamba and madhusudhan applied the use of text mining to extract important topics which were published in the desidoc journal of library and information technology over a period of 38 years.7 the latent dirichlet allocation (lda) method used in this study is able to find topics from information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 3 santosa within a collection of documents so that they can see how these topics develop over time. because lda is an algorithm for looking at topics from a group of words that appear together, the authors suggest that this study be expanded by utilizing articles that have been labeled using supervised classification. knn classifier various studies try to find answers to the most appropriate method of grouping the collection of documents. the knn and svm algorithms were used as comparative methods in the document classification study.8 however, there is no definite standard for the methods used in text mining.9 choosing the right technique in each phase of document classification can improve the performance of the text classifier, so, experts generally make adjustments to existing methods to get better results.10 kim and choi compared knn, maximum entropy model (mem), and svm to classify japanese patent documents by focusing on the structure of patents.11 instead of comparing the entire text, specific components named semantic elements, such as purpose, background, and application fields, are compared from the training document. these semantically grouped components are the basis for patent categorization. in addition, the strategy used is the existence of cross -references from two semantic fields that are useful for determining the intentions of the patent writer s who are still unsure or hidden. this strategy works well on knn compared to mem and svm where svm doesn’t do very well when handling large data sets. however, research conducted by alhaj et al. on arabic documents showed that svm can outperform knn by implementing a stemming strategy.12 meanwhile, through the approach to the relationship between unstructured text documents, the study conducted by mona et al. 
was able to increase the performance of knn combined with tf-idf by 5 percent.13 the knn algorithm is one of the popular classifiers that categorizes new data based on the concept of similarity from the amount of data (determined by the specified “k” value) around it.14 this method is believed to be able to group documents effectively because it is not limited to the number of vector sizes.15 wagstaff and liu noted that one of the weaknesses of knn is the long processing time when faced with large datasets, but knn as a classifier is easy to apply.16 in terms of measurement, previous experiments showed that knn was not suitable when used with euclidean distance.17 generally, similarity measures such as cosine, jaccard, and dice were used in the knn classifier.18 one of the problems in text classification is the number of attributes or dimensions so that many irrelevant attributes in the data set cause the classifier’s performance to not run optimally.19 for this reason, it is necessary to have a technique to increase effectiveness and reduce dimensions that are too large through the selection of features or terms,20 such as within-document tf, weighting with one of the popular methods, namely tf-idf (which sees how important a word is in a collection of corpus),21 and binary representation which looks at the absence and presence of a concept in a document22 by converting it to 0 and 1.23 aims of the study university libraries have a vital role in managing internal publications to support the education ecosystem. in connection with the role of the pint to support nkm and nuclear development, it is necessary to apply technology to help provide advice on certain classes of documents. in addition, in order to see scientific developments, generally experts conduct bibliometric studies which are information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 4 santosa limited to the title and abstract fields. text mining provides an opportunity to dig deeper. instead of just the title and abstract, this study used the full text of the final project collection. the trend of a subject will be seen from the growth and percentage of existing documents. so, the objectives of this study are to • explore the best knn model to be applied to classify the final project; • know the development of nuclear subjects based on batan’s knowledge taxonomy; and • know the development of nuclear subjects from each study program at the pint. methods a total of 99 documents were taken from the batan repository and manually labeled as reference data. this study was conducted using rapidminer studio software. the first document processing method is to convert all words into lower case and divide the text into a collection of tokens. filters on tokens are also applied based on the length of the token. in this case, the author applied a minimum of 3 characters and a maximum of 25 characters. stop words were also applied to eliminate short words (e.g., “and,” “the,” and “are”), thereby reducing the vector size. english and indonesian stop words were used for this study to overcome the use of english in the abstract section and indonesian as the document language. 
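the study itself was built in rapidminer studio, but the same sequence of steps can be sketched in code. the following python example, using scikit-learn, is a minimal approximation of the pipeline described in this section and in the paragraphs that follow (length filtering, stop word removal, n-grams, binary term occurrence, and a distance-weighted knn with cosine similarity). the document list, labels, and stop word list are placeholders rather than the study's actual data, and the indonesian stemming step is only indicated in a comment.

```python
# a minimal sketch of the classification pipeline, not the author's rapidminer process.
# replace the placeholder lists with the 99 labeled batan repository documents.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = ["..."]    # full text of each labeled reference document (indonesian)
labels = ["..."]   # one of: ir, nfcam, endf, nr, nrss, mgt

# combined english + indonesian stop words; an indonesian stemmer (e.g., sastrawi)
# would be applied to `texts` before vectorization in a fuller version of this sketch.
stop_words = ["dan", "yang", "untuk", "the", "and", "are"]  # placeholder list

model = make_pipeline(
    CountVectorizer(
        lowercase=True,                     # transform cases
        token_pattern=r"(?u)\b\w{3,25}\b",  # keep tokens of 3 to 25 characters
        stop_words=stop_words,
        ngram_range=(1, 3),                 # unigrams through trigrams
        binary=True,                        # binary term occurrence; swap in TfidfVectorizer for tf-idf
    ),
    KNeighborsClassifier(
        n_neighbors=4,                      # k = 4, the best value reported in the results
        metric="cosine",                    # cosine similarity as the numerical measure
        weights="distance",                 # weighted vote by neighbor proximity
    ),
)

# 10-fold cross validation, as in the study
scores = cross_val_score(model, texts, labels, cv=10, scoring="accuracy")
print(scores.mean())
```

note that scikit-learn treats cosine as a distance (one minus the similarity), so the nearest neighbors in this sketch are the documents with the highest cosine similarity, matching the intent of the measures compared in the results.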
the collection of words from haryalesmana was chosen to be the stop words for indonesian.24 the stemming technique is applied to reduce dimensions that are useful for improving the function of the classification system 25 by changing word forms into basic word,26 e.g., water, waters, watered, and watering into water. this analysis applies wicaksana data to indonesian stemming.27 some words cannot be separated from other words because they form a meaning, e.g., nondestructive testing, biological radiation effects, structural chemical analysis, and water -cooled reactors. to overcome this case, the use of n-grams can help identify compound words that have a meaning so that the words are not reduced.28 n-grams will record a number of “n” words that follow the previous word.29 to accommodate these words, in this study, three words were assigned to n-grams. information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 5 santosa figure 1. nuclear taxonomy classification framework. vector creation in this study used tf-idf and binary term occurrence and then compared them to determine the best performance. in the knn method, it is necessary to determine the value of “k” manually, so a value of 2–10 was chosen by activating a weighted vote which is useful for weighing the contributions of neighbors in the vicinity. weight voting indicates the use of multiple voting methods by assigning a weight to each neighbor depending on their distance from the unknown item.30 the types of measurement chosen to get maximum results were numerical measure and tested cosine similarity, correlation similarity, and dice similarity. meanwhile, to measure performance, the author used cross validation with a number of folds of 10. then, using this set of procedures, documents from the batan repository are classified. the procedure that achieves the highest level of accuracy is then submitted as a model. this model was applied to 563 final project documents that have not been labeled so that each document has a label according to batan’s knowledge taxonomy. results the experiment was carried out 54 times to determine the best knn performance from the proposed approach, namely cosine-binary, correlation-binary, dice-binary, cosine–tf-idf, correlation–tf-idf, and dice–tf-idf utilizing cross validation. cosine was still the most accurate in the tf-idf vector creation process, with an accuracy of 81.89 percent on seven neighbors, and dice reaches the lowest point when used on four neighbors. in contrast to correlation and dice, cosine can perform well when creating binary vectors. cosine on four neighbors had the best performance, with a 97 percent accuracy rate. the lowest accuracy occurred when the number of selected neighbors was two and the overall numerical measure had decreased in neighbors more than nine. the classification model for unlabeled documents was determined to be the cosine-binary method with four neighbors. the experiment found that this method did not successfully group three information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 6 santosa documents (for details of the confusion matrix, see appendix a). even though document 7 ought to be on nfcam, but with a lower score of 0.49921, it was predicted on the nrss with a confidence value of 0.50079. documents 86 and 93, which were supposed to be about endf, were unable to be foreseen. 
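the per-document confidence values reported here and in the discussion are typically derived from the weighted vote of the k nearest neighbors: each candidate field receives a share of the vote proportional to the distance-weighted contributions of its neighbors. a small, self-contained sketch of how such confidence values could be extracted and low-confidence predictions flagged for librarian review is shown below; the corpus, labels, and threshold are invented placeholders, not the study's data.

```python
# illustrative only: obtain per-field confidence values from a fitted knn model and
# flag predictions that fall near the decision boundary for manual review.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

labeled_texts = [
    "reaktor nuklir dan termohidrolika ...",   # placeholder documents
    "aplikasi isotop dan radiasi ...",
    "pengelolaan limbah radioaktif ...",
    "instrumentasi dan kendali fasilitas ...",
]
labels = ["nr", "ir", "nfcam", "endf"]
unlabeled_texts = ["deteksi radiasi pada fasilitas nuklir ..."]

model = make_pipeline(
    CountVectorizer(binary=True),
    KNeighborsClassifier(n_neighbors=2, metric="cosine", weights="distance"),
)
model.fit(labeled_texts, labels)

confidences = model.predict_proba(unlabeled_texts)   # one row of field confidences per document
for doc, row in zip(unlabeled_texts, confidences):
    best = int(np.argmax(row))
    score = row[best]
    # predictions with confidence close to 0.5, like documents 7, 86, and 93 above,
    # are good candidates for direct review by a librarian
    note = " <- review manually" if score < 0.55 else ""
    print(f"{model.classes_[best]}: {score:.5f}{note}")
```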
document 93 was predicted on the nrss with a confidence value of 0.50126, and document 86 was predicted on the nr with a value of 0.49936.

figure 2. a comparison of the accuracy levels in the knn method.

this study utilized 563 unlabeled documents spanning six years. there were 34 fewer documents in 2021 than in 2020, a significant drop from the previous year (see table 1). the number of documents then climbed again in 2022, reaching 98. rapidminer's labeling process ran into issues at the document-processing stage. to improve memory performance, the documents were split into three runs (2017–2018, 2019–2020, and 2021–2022) because memory was not sufficient to execute the full set of document-processing commands at once. the results of each run were then exported as tabular data for further study. the evolution of each nuclear subject in the final project reports can be seen year by year (see fig. 3). during the test period, 282 documents (50.09%) of the total were predicted on endf, followed by ir with 95 documents (16.87%) and nfcam with 69 documents (12.26%). there was little difference between nr and nrss: nr had 47 documents (8.35%) while nrss had 45 documents (7.99%). mgt was the subject with the fewest documents, with a total of 25 (4.44%) from 2017 to 2022.

table 1. the pint's final project documents growth from 2017 to 2022

study program                2017  2018  2019  2020  2021  2022  grand total
electromechanics             35    34    43    35    24    41    212
electronics instrumentation  27    34    38    38    22    28    187
nuclear technochemistry      31    31    26    27    20    29    164
grand total                  93    99    107   100   66    98    563

see appendix b for more information on the confidence value of each predicted document. of the 212 final project reports in the electromechanics study program, 63.68 percent (135 documents) were projected to be on the endf subject, followed by 17.92 percent (38 documents) on nfcam, 8.96 percent (19 documents) on nrss, and 5.19 percent (11 documents) on nr. meanwhile, ir and mgt had the fewest papers predicted, with 2.83 percent (6 documents) and 1.42 percent (3 documents) respectively. every year, endf was the most predicted subject in this study (see fig. 4).

figure 3. nuclear subject development by percentage each year.

figure 4. nuclear subject development in electromechanics by percentage each year.

the 187 final project reports in electronics instrumentation were predicted into five subjects. endf was projected to contain 141 documents (75.40%), nrss 24 documents (12.83%), and nr 14 documents (7.49%). furthermore, only 7 documents (3.74%) were predicted on mgt and 1 document (0.53%) on ir. nfcam, on the other hand, is not represented in any of the electronics instrumentation documents (see fig. 5). final processing was performed on the collection of nuclear technochemistry documents. of these 164 documents, 53.66 percent (88 documents) were predicted on ir, 18.90 percent (31 documents) on nfcam, 13.41 percent (22 documents) on nr, 9.15 percent (15 documents) on mgt, and 3.66 percent (6 documents) on endf, with the remaining 1.22 percent (2 documents) predicted on nrss. subjects that were popular each year vary (see fig.
6) when compared to electromechanics and instrumentation electronics, where endf was the most popular topic in these two study programs. information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 9 santosa figure 5. nuclear subject development in electronics instrumentation by % each year. information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 10 santosa figure 6. nuclear subject development in nuclear technochemistry by % each year. discussion the study found that implementing knn with cosine similarity in association with vector construction=binary and k=4 resulted in the highest accuracy results of 97 percent. in general, this strategy outperformed in every class examined, and it can only be balanced on one occasion, notably at k=9 by utilizing correlation similarity. when compared to the use of tf-idf, the results likewise indicated that binary term occurrence always functioned well. tf-idf was only able to achieve its highest accuracy of 81.89 percent when k was 7 using correlation similarity. cosine similarity also seemed to work efficiently on every vector creation, both when using binary and tf-idf (in classes numbering 2, 5, and 10 the use of tf-idf was not optimal), compared to numerical measures of correlation similarity and dice similarity. cosine similarity evaluates the similarity of documents, and a high similarity score indicates that the documents are quite similar.31 nuclear field growth in general, aside from the endf field, which is steady and increasing, other subjects endure annual changes in development. for the past six years, endf has been the most popular subject among students. the endf reached the highest percentage rate in 2022, with 59 documents predicted on this subject. students preferred engineering final project reports on mechanics and structures, electromechanics, control systems, nuclear instrumentation, or nuclear facility process technology. research conducted by wang et al. also suggests that the current popular topic of research on nuclear power is modeling and simulation.32 information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 11 santosa the endf document’s average confidence value was 0.6499916, with a median value of 0.7490455. the two documents with the lowest confidence in the endf were document numbers 233 and 597. document 233 had a confidence value of 0.25105 and was predicted in the other three subject areas (nrss, nr, mgt) with close values. likewise, the 597 documents predicted in the endf with a confidence value of 0.25156 were higher than the nrss, nfcam, and ir subjects, but with a not too significant difference. both of these documents can be investigated further and directly evaluated by the librarian in order to obtain a more precise field. the majority of the final project reports projected in the endf have confidence levels around 0.50, and some even higher at 0.75. this study also reveals that 11 documents in the endf category have a confidence value of 1. with lower nrss confidence values, 239 endf documents connected to the nrss field. this relationship demonstrates a good tendency among students conducting nuclear engineering related to the nrss discipline. though it differs significantly from endf, ir is becoming a prominent field. 
the final project report for ir was developed in 2017–2018, but it shrank again from 2019 to 2021, then increased in 2022. in comparison to other fields, ir has the highest minimal confidence score of 0.4 987, with many documents lying within the 0.5 and 0.75 range. meanwhile, the confidence value for 26 documents predicted by ir is 1. the nfcam subject area is a prediction that appears frequently in ir predictions but has a lower level of confidence. there are 54 documents indicating the existence of research that involves isotopes and radiation in nuclear materials, nuclear excavations, radioactive waste, structures, or advanced materials. nfcam is inversely proportional to the conditions that occur in endf. after increasing in 2019, this subject faced a reversal over the next three years, with only two documents classified in this subject through 2022. students are still uncommonly interested in nuclear minerals, nuclear fuel, radioactive waste, structural materials, and advanced materials. six projected documents in this field have confidence levels of 1, while many more have confidence levels between 0.50 and 0.75. the ir field is also expected to appear alongside the nfcam field publications. there were also ups and downs in nr and nrss. twenty-five of the 47 documents identified on the nr were also predicted with a lower value in the nrss field. this demonstrates that students explored the relationship between the subject of reactor research and safety and security in various documents. meanwhile, only eight of the 46 nrss papers are unrelated to the endf field. this demonstrates that students who study nuclear safety and security tend to perform engineering to address situations involving nuclear safety and security. documents in these two fields are usually concentrated in the 0.5 confidence value range in both nr and nrss. mgt is one of the least studied topics among students. human resources, organization, management, program planning, auditing, quality systems, informatics utilization, or cooperation are more commonly associated with the mgt field. the mgt increased in 2020, although it became the field with the fewest documents on earlier occasions (2017 to 2019 and 2021 to 2022). in terms of confidence value, 21 mgt documents have a value greater than 0.5, with eight documents worth 1. with 10 documents, the endf is the most often discussed study area with mgt. progression in each study program even if they are still within the purview of nuclear science, the growth of the nuclear field in each study program differs depending on the curriculum. students are influenced by knowledge, and information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 12 santosa more specifically the process of learning and comprehending (whether theoretical or more practical).33 endf is still the most popular field in electromechanics and electronics instrumentation study programs. these two study programs offer courses in endf topic areas such as mechanical, civil and architectural, electromechanical, electrical, control systems, and radiation detection for nuclear devices. furthermore, the electronics instrumentation study program offers courses on nuclear electronics, signal processing techniques, and practical work on interface and data acquisition techniques, all of which are part of the endf nuclear instrumentation group. apart from endf, the fields of nfcam and nrss have been present in electromechanics for a period of six years. 
while mgt is currently a less appealing topic, there have been no final project reports relating to mgt in the most recent three years. in electronics instrumentation, the absence of a field occurs in nfcam. the findings of the predictions demonstrate that none of the documents predicted on nfcam were proper. meanwhile, only 10 documents that intersect with nfcam which have lower confidence in the range of values from 0.247 to 0.251. nuclear minerals, nuclear fuel, structural materials and advanced materials, and radioactive waste were not studied in depth in this study program, illustrating why nfcam is not predicted in instrumentation electronics. in contrast to other study programs, ir is the most predictable field in the final project report in nuclear technochemistry. in this investigation, nuclear technochemistry owns 88 of the 95 documents examined. this study program includes ir specializations such as the use of isotopes and radiation in agriculture, health, and industry. radioisotope production becomes another discipline that specializes in the creation of isotopes and radiation sources, which explains why ir is so popular among nuclear technochemistry students. the nfcam field was not present in 2022, despite the fact that it had been the topic of several students’ studies throughout the preceding five years. while the endf and mgt fields have only been present in the last three years, there were no predictable papers in the previous three years. conclusion the trend of research activities carried out by students from one study program to the next appears to vary although they are both within the scope of the nuclear field. for example, the field of endf is quite popular among electromechanics and electronics instrumentation students but not for nuclear technochemistry students because endf only appeared three years ago and the number of documents is still modest. however, endf deserves to be a field that needs attention. nuclear technochemistry students with radiochemistry learning experiences demonstrate that the ir field is linear and interesting to them. due to a paucity of publications, the low proportion in certain categories, e.g., mgt, shows a potential to further investigate this field. this study demonstrates an opportunity to use text mining to assist librarians in performing automatic document classification based on specific subjects. the best model in this study is produced by combining knn with cosine similarity and binary term occurrence. the model used can help improve the quality of decisions made to accurately and efficiently categorize documents. to determine a more specific classification, pay close attention to documents that have a low level of confidence and intersect with other issues. this study is limited to the knn method and information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 13 santosa documents from the batan repository, as well as final project documents for pint students. large-scale testing can be conducted, for instance, in the international atomic energy agency ’s (iaea) nuclear repository known as the international nuclear information system (inis) repository, or in other databases with the complexity of categorizing documents throughout many languages. data accessibility datasets and data analysis code for rapidminer have been uploaded to the rin dataverse: https://hdl.handle.net/20.500.12690/rin/asrgvo. 
data visualization can be accessed through tableau public: https://public.tableau.com/app/profile/faizhal.arif/viz/finalprojecttrendsutilizingnuclearknowledgetaxonomy/story1

appendix a: confusion matrix of 10-fold cross validation

accuracy: 97.00% +/- 4.83% (micro average: 96.97%)

               true nfcam  true ir  true nrss  true mgt  true nr  true endf  class precision
pred. nfcam    13          0        0          0         0        0          100.00%
pred. ir       0           18       0          0         0        0          100.00%
pred. nrss     1           0        20         0         0        1          90.91%
pred. mgt      0           0        0          19        0        0          100.00%
pred. nr       0           0        0          0         13       1          92.86%
pred. endf     0           0        0          0         0        13         100.00%
class recall   92.86%      100.00%  100.00%    100.00%   100.00%  86.67%

appendix b: the confidence value of each field

[this appendix consists of figures showing the distribution of confidence values for documents predicted in each field: endf, ir, mgt, nfcam, nr, and nrss.]

endnotes

1 budi prasetyo and anggiana rohandi yusuf, "pengelolaan pengetahuan eksplisit berbasis teknologi informasi di batan," in prosiding seminar nasional sdm teknologi nuklir (seminar nasional sdm teknologi nuklir, yogyakarta: sekolah tinggi teknologi nuklir, 2018), 126–32, https://inis.iaea.org/collection/nclcollectionstore/_public/50/062/50062856.pdf?r=1.

2 joanna yi-hang pong et al., "a comparative study of two automatic document classification methods in a library setting," journal of information science 34, no. 2 (april 2008): 213–30, https://doi.org/10.1177/0165551507082592.

3 prasetyo and yusuf, "pengelolaan pengetahuan eksplisit."

4 jae-ho kim and key-sun choi, "patent document categorization based on semantic structural information," information processing & management 43, no. 5 (september 2007): 1200–15, https://doi.org/10.1016/j.ipm.2007.02.002; pong et al., "a comparative study"; khusbu thakur and vinit kumar, "application of text mining techniques on scholarly research articles: methods and tools," new review of academic librarianship (may 12, 2021): 1–25, https://doi.org/10.1080/13614533.2021.1918190.

5 pong et al., "a comparative study."

6 kiri l. wagstaff and geoffrey z. liu, "automated classification to improve the efficiency of weeding library collections," the journal of academic librarianship 44, no. 2 (march 2018): 238–47, https://doi.org/10.1016/j.acalib.2018.02.001.

7 manika lamba and margam madhusudhan, "mapping of topics in desidoc journal of library and information technology, india: a study," scientometrics 120, no. 2 (august 2019): 477–505, https://doi.org/10.1007/s11192-019-03137-5.

8 fábio figueiredo et al., "word co-occurrence features for text classification," information systems 36, no.
5 (july 2011): 843–58, https://doi.org/10.1016/j.is.2011.02.002; yen-hsien lee et al., “use of a domain-specific ontology to support automated document categorization at the concept level: method development and evaluation,” expert systems with applications 174 (july 2021): 114681, https://doi.org/10.1016/j.eswa.2021.114681; yousif a. alhaj et al., “a study of the effects of stemming strategies on arabic document classification,” ieee access 7 (2019): 32664–71, https://doi.org/10.1109/access.2019.2903331. 9 david antons et al., “the application of text mining methods in innovation research: current state, evolution patterns, and development priorities,” r&d management 50, no. 3 (june 2020): 329–51, https://doi.org/10.1111/radm.12408; muhammad arshad et al., “next generation data analytics: text mining in library practice and research,” library philosophy and practice (2020): 1–12. 10 mowafy mona, rezk amira, and hazem m. el-bakry, “an efficient classification model for unstructured text document,” american journal of computer science and information technology 06, no. 01 (2018), https://doi.org/10.21767/2349-3917.100016. https://inis.iaea.org/collection/nclcollectionstore/_public/50/062/50062856.pdf?r=1 https://doi.org/10.1177/0165551507082592 https://doi.org/10.1016/j.ipm.2007.02.002 https://doi.org/10.1080/13614533.2021.1918190 https://doi.org/10.1016/j.acalib.2018.02.001 https://doi.org/10.1007/s11192-019-03137-5 https://doi.org/10.1016/j.is.2011.02.002 https://doi.org/10.1016/j.eswa.2021.114681 https://doi.org/10.1109/access.2019.2903331 https://doi.org/10.1111/radm.12408 https://doi.org/10.21767/2349-3917.100016 information technology and libraries march 2023 exploring final project trends utilizing nuclear knowledge taxonomy 18 santosa 11 kim and choi, “patent document categorization.” 12 alhaj et al., “a study of the effects of stemming strategies.” 13 mona, amira, and el-bakry, “an efficient classification model.” 14 thakur and kumar, “application of text mining techniques.” 15 kim and choi, “patent document categorization.” 16 wagstaff and liu, “automated classification.” 17 najat ali, daniel neagu, and paul trundle, “evaluation of k-nearest neighbour classifier performance for heterogeneous data sets,” sn applied sciences 1, no. 12 (december 2019): 1559, https://doi.org/10.1007/s42452-019-1356-9. 18 roiss alhutaish and nazlia omar, “arabic text classification using k-nearest neighbour algorithm,” the international arab journal of information technology 12, no. 2 (2015): 190–95. 19 mona, amira, and el-bakry, “an efficient classification model.” 20 guozhong feng et al., “a probabilistic model derived term weighting scheme for text classification,” pattern recognition letters 110 (july 2018): 23–29, https://doi.org/10.1016/j.patrec.2018.03.003. 21 snezhana sulova et al., “using text mining to classify research papers,” in 17th international multidisciplinary scientific geoconference sgem 2017, vol. 17, international multidisciplinary scientific geoconference-sgem (17th international multidisciplinary scientific geoconference sgem, sofia: surveying geology & mining ecology management (sgem), 2017), 647 –54, https://doi.org/10.5593/sgem2017/21/s07.083. 22 lee et al., “use of a domain-specific ontology.” 23 man lan et al., “supervised and traditional term weighting methods for automatic text categorization,” ieee transactions on pattern analysis and machine intelligence 31, no. 4 (april 2009): 721–35, https://doi.org/10.1109/tpami.2008.110. 
24 devid haryalesmana, "masdevid/id-stopwords," 2019, https://github.com/masdevid/id-stopwords.

25 alhaj et al., "a study of the effects of stemming strategies."

26 pong et al., "a comparative study."

27 ananta pandu wicaksana, "nolimitid/nolimit-kamus," 2015, https://github.com/nolimitid/nolimit-kamus.

28 antons et al., "the application of text mining methods."

29 kanish shah et al., "a comparative analysis of logistic regression, random forest and knn models for the text classification," augmented human research 5, no. 1 (december 2020): 12, https://doi.org/10.1007/s41133-020-00032-0.

30 judit tamas and zsolt toth, "classification-based symbolic indoor positioning over the miskolc iis data-set," journal of location based services 12, no. 1 (january 2, 2018): 2–18, https://doi.org/10.1080/17489725.2018.1455992.

31 hanan aljuaid et al., "important citation identification using sentiment analysis of in-text citations," telematics and informatics 56 (january 2021): 101492, https://doi.org/10.1016/j.tele.2020.101492.

32 qiang wang, rongrong li, and gang he, "research status of nuclear power: a review," renewable and sustainable energy reviews 90 (july 2018): 90–96, https://doi.org/10.1016/j.rser.2018.03.044.

33 ronald barnett, "knowing and becoming in the higher education curriculum," studies in higher education 34, no. 4 (june 2009): 429–40, https://doi.org/10.1080/03075070902771978.

article

a rapid implementation of a reserve reading list solution in response to the covid-19 pandemic

matthew black and susan powelson

information technology and libraries | september 2021
https://doi.org/10.6017/ital.v40i3.13209

matthew black (mblack@ucalgary.ca) is the discovery & systems librarian, university of calgary. susan powelson (spowelso@ucalgary.ca) is the associate university librarian, technology, discovery & digital services, university of calgary. © 2021.

abstract

in the spring of 2020, as post-secondary institutions and libraries were adapting to the covid-19 pandemic, libraries and cultural resources at the university of calgary rapidly implemented ex libris' reading list solution leganto to support the necessary move to online teaching and learning. this article describes the rapid implementation process and changes to our reserve reading list service and policies, reviews the status of the implementation to date, and presents key takeaways that will be helpful for other libraries considering implementing an online reading list management system or other systems on a rapid timeline.
overall, rapid implementation allowed us to meet our immediate need to support online teaching and learning; however, long term successful adoption of this tool will require additional configuration, engagement, and support. introduction in response to the changes to the post-secondary learning environment due to covid-19 and to better integrate our course reserve reading list services with our library management system (ex libris’ alma), libraries and cultural resources (lcr) at the university of calgary (ucalgary) decided to rapidly implement ex libris’ reading list solution leganto. after rapidly implementing this reading list solution, lcr made it accessible to instructors and students in our learning management system, desire2learn (d2l), for the fall 2020 term. this article will discuss lcr’s decision to rapidly implement leganto, the implementation process, changes to our reserve reading list service and policies, and will conclude by reviewing the implementation so far and present key takeaways. this paper will be helpful for those who are considering implementing an online reading list management system in general but more so for those looking to do so on a rapid timeline. it will also be helpful for those implementing new systems to support changes to services due to covid-19. literature review online reading lists management systems have been in use since the early 2000s. from 1999 to 2000, loughborough university developed and implemented an in-house open-source reading list management system which “was an electronic representation of the academic’s paper-based reading list.”1 since then open-source and commercial solutions have been developed and implemented by many libraries. richard cross summarizes the development and growth of the market for “resource list software” in the uk and notes that “in the absence of a mature commercial market, several uk universities have developed in-house” systems. cross notes that since 2010, commercial solutions have been developed and that “the high-level specifications requirements” for reading list management systems have now been distilled.2 as part of the development of the commercial market, ex libris launched its leganto online reading list solution in 2015; the solution has since been implemented by over 230 institutions worldwide.3 mailto:mblack@ucalgary.ca mailto:spowelso@ucalgary.ca information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 2 overall, the literature on reading list management systems focuses on reviewing implementations and identifying factors that contribute to successful implementation of these solutions.4 also, some literature captures instructor and student perceptions of these systems and provides recommendations for successful engagement and use.5 these recommendations focus on the importance of administrative support for the tool, updating or creating policy and workflows to support implementation, technical configurations and integrations, and faculty/instructor engagement. 6 libraries that have implemented an online reading list system indicate that it is important to have the adoption of the system supported and championed by senior administration within the library and from the wider institution.7 having this support means that reading list policies and services can be aligned with library and institutional goals to support teaching and learning. 
furthermore, marie o'neill and lara musto contend that successful adoption is dependent on this support and integration with institutional goals and not just “premised on technology.”8 establishing policies and service goals that are supported by senior administration provides an impetus to make sure workflows and functionalities are configured to achieve these. the literature recommends that implementing an online reading list solution provides an important opportunity to review or revise previous workflows or to develop new workflows for standardization and “timely satisfaction of resource list needs.”9 to support these new workflows and services, technical integrations and configurations need to be considered. these include integrations with the library management system, the learning management system (lms), and institutional authentication systems.10 these integrations are essential to make the system streamlined and accessible for instructors and students and are expected by these user groups. for example, when the university of manchester library was implementing leganto in 2019, they surveyed students and instructors and found that students valued convenience and access, and that instructors were interested in a system that would allow them to order books and digitized chapters and see analytics.11 thus, the reading list solution should be configured and implemented with functionalities and integrations to support these expectations. ease of access within the lms is perceived as is an important technical requirement for successful implementation. o’neill and musto’s study finds that faculty had a strong desire to have a reading list system integrated with the lms.12 meredith gorran farkas contends that “positioning the library at the heart of the virtual campus seems as important as positioning the library as the heart of the physical campus.”13 murray and feinberg address the placement of libguides in the learning management system stating it is critical to make resources available despite where students are physically located.”14 this can be extrapolated to reading lists, and in this environment, where students are learning online in geographically dispersed locations, the lists need to be easily available. faculty and student engagement are important nontechnical considerations when embedding library services in the lms. knowing and including relevant stakeholders, gaining instructor buyin and understanding of the benefits, and determining how to raise student awareness of the tool, both short and longer term, are key elements.15 the literature recommends providing instructors with resources such as templates and training “at specific points in the semesters when faculty have less time pressures.”16 however, engagement should not be limited to this: it should be information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 3 ongoing and collaborative. this ongoing engagement can work to create “a virtuous circle” in which instructors “see return on the investment in resource list work” and “improved student satisfaction.”17 one university reports this can provide an opportunity for collaboration and generate “word-of mouth” to promote adoption.18 what is leganto? leganto is a cloud-based reading list solution fully integrated with ex libris’ alma library management system. 
it allows reading lists to be processed directly in alma by library staff and provides an interface for creating and engaging with readings lists that can be integrated directly into learning management systems using the learning tools interoperability (lti) integration. instructors can use leganto to create reading lists by searching for and adding resources. leganto allows diverse resources to be added to reading lists, including physical and electronic resources that are in alma, ex libris discovery index resources (via their central discovery index), internet sources (any url added manually or through the cite it! browser bookmark tool), or uploaded resources (documents uploaded by the instructor). university of calgary context the university of calgary is a comprehensive research university, ranked one of canada’s top ten research universities.19 the university is home to 14 faculties (offering more than 200 academic programs) and more than 33,000 students are currently enrolled in undergraduate, graduate, and professional degree programs. d2l is the university’s learning management system, managed by the university’s teaching and learning unit. libraries and cultural resources is a principal division of the university of calgary and includes eight libraries on campus and across the city, and two art galleries. the main library is the taylor family digital library (tfdl) which opened in 2011. in 2018, lcr adopted ex libris’ alma as its library management system. pre-covid-19 reserve reading list service prior to covid-19, lcr had distinct and unintegrated systems and processes for managing course reserves and reading lists. while some functions in alma were used to manage physical course reserves, most of the workflows were managed outside of alma. these included a web form that instructors could use to submit requests for course materials and a course reserve tool (atlas systems’ ares product). the submissions from the web form would be reviewed by our copyright office who would determine if the requested item needed copyright clearance or not. through this process, physical items which were already in the library collection were flagged and sent to fulfillment staff. staff would create or update the course in alma, add the item to a course reading list, and move it to a reserve location so it could be borrowed by students. doing this also made the items searchable through our course reserve search in our discovery service (primo). requested items that were not in the library collection were sent to the collections department for purchase consideration. through email communications with the copyright office, instructors could also request parts of items be scanned and approved for use in classes. requests for electronic reserves were not managed in alma. how did covid-19 affect our reserve reading list service? shortly after the covid-19 pandemic became widespread in the province of alberta, the government implemented restrictions which required post-secondary institutions to close physical locations and to restrict or move most courses and services online. in march 2020, lcr had to close all physical locations and was only able to provide online access to resources and information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 4 services. from march to august, lcr focused on online services and promoted access to online resources. 
the main taylor family digital library was not able to open until august, when we began offering contactless pickup service and limited study space bookings. as of july 2021, the majority of the courses offered by the university continue to be online. as a result, we are not fulfilling instructor requests to make print/physical resources accessible to students through course reserve. this has been a significant change from our previous operations in which physical course reserve was a popular and successful service. in 2019, lcr had a total 3,428 physical items added in alma as course reserve citations and these items circulated 16,345 times. with the announcement that fall 2020 term courses would be predominantly online, and the anticipation that the same would hold true for winter 2021, we realized that a new mechanism to deliver course reserves would be necessary. while we had been considering ex libris’ course reading list tool before the pandemic, the operational changes due to covid-19 provided us with an urgent need and impetus. we had to quickly develop alternative ways to support instructors in creating reading lists and for students to access reading list resources in the online learning environment. leganto rapid implementation leganto implementation is managed by ex libris and is typically done over an eight-to-twelveweek schedule. during this implementation, institutions work with ex libris to set up configurations to meet local needs. in 2020, ex libris began offering rapid implementation for leganto, which involves a shortened timeline of approximately four to five weeks. this is achievable because institutions allow ex libris staff to set up the leganto now configuration, which focuses on configuring essential features to quickly get the tool up and running. there are inherent tradeoffs between the typical and rapid implementation that we needed to consider. the shortened timeline would allow us to launch the service sooner and begin promoting it as a tool to support faculty in moving to teaching online due to covid-19. in addition, because we would implement leganto out of the box we could use the vendor created videos and support tools rather than creating our own. the additional three to seven weeks required for the typical implementation would have provided us more time to fine tune and configure the tool to meet our specific needs but would have delayed our ability to support our faculty in this time of need. for example, the rapid implementation schedule did not include support for setting up workflows for digitization requests or configuration of the more advanced copyright clearance functions to improve automatic processing. to configure these, we would need to work on them on our own while going through the rapid implementation or once the implementation was complete. after considering these tradeoffs, in april 2020 we made the decision to rapidly implement leganto. this decision was supported by the provost and lcr senior administration providing us with the institutional support identified in the literature as necessary for a successful implementation. table 1 outlines the rapid implementation schedule we followed. information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 5 table 1. 
project implementation schedule

event | dates
kickoff | may 27, 2020
implementation | may 28 to june 28, 2020
go-live | june 29, 2020
switch to support | july 15, 2020

the implementation portion involved weekly meetings to review the project status, frequent posting to the online project management tool to update the status and ask clarifying questions, and many internal meetings to discuss our progress and to make decisions about workflows, configurations, and engagement. as a requirement for go-live, at least one reading list had to be created by an instructor and be accessible to students. ex libris offered the opportunity to implement together with the university of manitoba, which is a similarly sized canadian university. this was a great opportunity to not only start implementation sooner but also connect with the university of manitoba. in the short term, rapid implementation would help us meet an immediate need to support online access and learning while operating under covid-19 restrictions. in the long term, this was an opportunity to develop a new way to engage with instructors and further promote our collections as resources for supporting learning. also, it was the opportunity to revise or develop workflows to support these goals. overall, we hoped leganto would make it easier for instructors to add resources to reading lists and to get copyright clearance so these resources would be accessible to students through a standardized tool integrated with d2l. with these goals in mind, we aimed to
• pilot leganto for summer 2020 term,
• have leganto accessible for all courses by fall 2020 term, and
• revise or develop workflows for digitization, copyright clearance, and purchase requests.

implementation work
despite the out-of-the-box leganto now configuration which aimed to have leganto up and running for our go-live date, implementation still required the lcr team to do a significant amount of work and planning. to manage this work, we established a technical team and an engagement team. the technical team, comprised of representatives from the library's systems and discovery, copyright, digitization and collections units, the university's teaching and learning unit, and the associate university librarian, technology, discovery and digital services (aul), worked to review, revise, and develop workflows and to test configurations in alma and leganto. the engagement team, created by a call for participants and comprised of three subject librarians and the aul, worked to develop strategies, communications, and resources to promote and support the use of leganto to instructors. also, the two teams collaborated to test configurations and functions and suggest improvements. the aul's presence in the teams was critically important to demonstrate senior leadership support of the project, an important element for success noted by o'neill and musto.20 overall, this work required the teams to meet and discuss short-term and long-term changes to our course reserve service and how leganto could be configured to support these. the rapid implementation schedule made this challenging because we had to start with the leganto now configurations and then test these to see how they aligned with our desired services and workflows. if they did not align well, we had to investigate and adapt.
this was a back-and-forth negotiation as we learned in a short time frame how to configure the system to support our desired workflows and services and how to adjust our services and workflows to align with the capabilities/functionalities of the system. an important part of the implementation was configuring leganto to use the learning tools interoperability (lti) standard to integrate with d2l. as mentioned in the literature, having the online reading list solutions embedded and integrated with the institutional lms is a key factor for successful adoption of the tool by instructors and students.21 using the lti integration, we were able to connect alma, leganto, and d2l. this work required coordination with the adminis trators of d2l at the university so that course and student data could be communicated between d2l, alma, and leganto. after configuring and testing the lti integration, we decided to use it to create reading list (leganto) links in d2l courses through the d2l tools menu. when a user clicks the link, user and course data from d2l is sent to alma to • create the course based on the course information and • assign the appropriate role in alma and leganto for the instructor or student. since lcr could not provide physical reserves because of covid-19, we decided to support the full/partial digitization of physical resources based on copyright approval and the creation of purchase requests for electronic copies of physical items added to a reading list. to achieve these goals, we needed to revise and establish workflows that make use of the basic digitization and purchase request functions in leganto and alma. these functions were not part of the leganto now configuration, but we decided to take the opportunity to make them available to instructors. for our purposes, the digitization workflows and functions in alma and leganto would be used to support the full or partial scanning of physical resources. we already had workflows to support this type of work, but these needed to be revised so requests created by instructors in leganto could be reviewed by the copyright office, items could be retrieved from the collection and scanned by staff, and scans could be made accessible in leganto for instructors and students. the technical team worked with the departments involved in these workflows to determine how the new functions could support this work and how the workflows needed to be adjusted. also, we had to decide how to use the functions in alma and leganto to facilitate electronic purchase requests for print/physical resources added to readings lists. similarly, we already had workflows for this but needed to review and revise these to make use of the functions supported by alma and leganto. using these functions, we configured alma and leganto so that if an instructor added a book to their reading list and we did not have an e-book version, a purchase request for an e-book version would be submitted to the collections department. this was achieved using tags and automatic processing rules with definable parameters. there were a few other settings we had to customize to meet our needs. this included configuring the default out-of-the-box processing and copyright statuses for citations added to reading lists, reading list visibility settings to make sure only enrolled students could access the lists, user interface adjustments to control what functions are available to students and instructors, and interface text/messaging changes to align with our services. 
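to make the lti integration described above more concrete, the sketch below shows the kind of form parameters an lms course link posts to the reading list tool in a standard lti 1.1 basic launch. this is an illustrative sketch of the lti 1.1 specification, not our institution-specific d2l or alma configuration; the tool url, course, and user values are all placeholders.

```python
# Hypothetical sketch of an LTI 1.1 basic launch from the LMS to Leganto.
# The tool URL and all identifiers below are placeholders; in practice the
# LMS (D2L) builds and signs this request itself.
LEGANTO_LTI_URL = "https://example.alma.exlibrisgroup.com/leganto/lti"  # placeholder

launch_params = {
    "lti_message_type": "basic-lti-launch-request",
    "lti_version": "LTI-1p0",
    "resource_link_id": "d2l-course-12345-readinglist",   # unique per course link
    # course data used to create or match the course on the library side
    "context_id": "12345",                                # LMS course identifier
    "context_title": "HIST 201 - Introduction to Archives",
    # user data used to assign an instructor or student role
    "user_id": "a1b2c3d4",
    "roles": "Instructor",                                # or "Learner" for students
    "lis_person_name_full": "Jane Instructor",
    "lis_person_contact_email_primary": "jane@example.edu",
}

# An LTI 1.1 launch is an HTML form POST of these parameters, signed with
# OAuth 1.0a (HMAC-SHA1) using the consumer key/secret shared between the
# LMS and the tool; the oauth_* signature fields are added before the
# browser submits the form.
```

the point of the sketch is simply that the course and user fields carried in the launch are what allow the course to be created and the correct instructor or student role to be assigned automatically, as described above.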
information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 7 the other major part of this work involved engagement to publicize the new tool and provide training and support to instructors and library staff. the engagement team worked to develop resources and provided training to staff and instructors. library staff were oriented to the product, the key benefits, and how to add course materials so that they could speak knowledgably to their faculty about the tool. faculty training sessions introduced faculty to the tool and how to add specific types of resources, for example a book chapter, a website, or an item from their personal collection. one libguide was created to support both faculty and staff. ex libris provided support to the engagement team by providing communication templates and advice on engagement strategies. the team also promoted the new tool in our institutional newsletter and worked with the university’s teaching and learning unit to promote it. overall, it was challenging during the five-week rapid implementation to quickly map and adapt our current course reserve workflows and ensure that these configurations will meet long term needs while considering the restraints of the current covid environment. go-live and post go-live we finished implementation on june 29, 2020. for the summer term pilot, we only had one instructor publish a reading list for their course. however, instead of using the functionalities to add citations to the list the instructor uploaded a document version of their reading list. this was similar to what instructors were used to doing in d2l and we had to support the instructor to add the citations from the document to the reading list in leganto. shortly after going live, we had to make the lti link to leganto available in all courses because after publicizing the new tool, we received requests for access from instructors who were preparing for their fall term courses. this was sooner than expected and we believe this was because the online fall term may have motivated instructors to start preparing earlier than usual. with the end of the fall 2020 term, we have been able to use alma and leganto’s analytics reporting to see instructor use and student engagement with reading lists (see table 2). from these reports, we can see that for the fall 2020 term, there were 50 reading lists created that were associated with a course and that had at least one citation added. this means the instructors of these courses at least tried to use the reading list tool. however, of these 50 lists, there were only 30 lists that had student activity. student activity is a category of interactions that ex libris uses to indicate how well the lists are being used by students and includes activities such as reading list views, number of citation views, number of full text views, number of files downloaded, and number of students that have marked a reading as done. we surmise that for the other 20 reading lists, the instructors encountered barriers to list creation and abandoned the process. we continued to use the analytics reporting to monitor the status of instructor use for the winter 2021 term. by the end of that term, there were 66 lists created with at least one citation. information technology and libraries september 2021 a rapid implementation of a reserve reading list solution | black and powelson 8 table 2. 
fall 2020 and winter 2021 term usage

term | reading lists with at least one citation added | number of reading lists with student activity | students viewing reading lists | reading list views
fall 2020 | 50 | 30 | 664 | 5,320
winter 2021 | 66 | 49 | 1,154 | 13,200

takeaways
rapidly implementing leganto during covid-19 has been a valuable and challenging learning experience which has offered several takeaways. go through rapid implementation with another university of a comparable size so that you can connect, support, and learn from each other. rapid implementation is an intense period of uncertainty which can be challenging to go through alone. another institution can provide support and help the implementation team cope. for example, during the implementation we had meetings with the university of manitoba to discuss our progress and challenges with the implementation before meeting with the vendor implementation team. since implementing we have stayed in contact and have been able to rely on each other to discuss the status of our engagement, adoption, and configurations. rapid implementation requires effort and time. the shortened implementation timeline implied that it would be relatively easy; however, the project dominated the schedules of those involved and has continued to require work. the weekly meetings with ex libris, the weekly internal meetings, the work in between meetings, and the continuous testing and reworking of settings required a significant time commitment from all involved. timing is everything. we had hoped for a successful pilot in the summer term and to use this experience to learn and prepare for the wider rollout in the fall term. however, instructors began preparing for the fall term earlier than expected and consequently we needed to make leganto accessible earlier. understand your instructor timelines and practices around course preparation. the rapid implementation allowed us to respond to this unexpected pressure and begin to support instructors who wanted to make use of the tool. finally, further customizations and integrations will be necessary because of the nature of the rapid implementation and the inherent tradeoffs. the rapid implementation did not provide the time for us to pursue these customizations and integrations, and this is typical. for example, the course data integration would have required us to coordinate with our central it department and registrar to get approval and resources to build a data source and scripts to export and format the data so it could be loaded by the tool. normal implementation would have provided support and time for this. interestingly, sheedy, wells, and bellenger, in discussing their implementation at curtin university, noted that they too did not pursue some of the configurations during their implementation.22 regardless of the implementation schedule, libraries may be uncertain of how these configurations will be useful until after using the tool. in our case, it was after implementation that we realized the benefits these configurations could offer staff and users in terms of efficiency.
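for readers who want to pull usage figures like those in table 2 programmatically, ex libris exposes saved alma/leganto analytics reports through a rest api. the sketch below is a minimal, hedged example of retrieving such a report; the regional api host, report path, and api key are placeholders, and the specific report behind table 2 is an assumption rather than something documented in this article.

```python
# Minimal sketch: fetch a saved Alma/Leganto Analytics report over the REST API.
# The host, report path, and API key below are placeholders, not actual values.
import requests
import xml.etree.ElementTree as ET

BASE = "https://api-ca.hosted.exlibrisgroup.com/almaws/v1/analytics/reports"
params = {
    "path": "/shared/University/Reports/Leganto reading list usage",  # placeholder path
    "limit": 100,
    "apikey": "YOUR_ANALYTICS_API_KEY",  # placeholder
}

resp = requests.get(BASE, params=params, timeout=60)
resp.raise_for_status()

# The API returns the report rows as XML; each <Row> element carries the
# report's columns (e.g., term, lists created, lists with student activity).
root = ET.fromstring(resp.text)
for elem in root.iter():
    if elem.tag.endswith("Row"):
        print([child.text for child in elem])
```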
our next steps include • loading course data (we are working on this with our central it department and hope to have it complete for summer 2021), • refining and expanding the advanced automatic rules for copyright approval, • refining digitization workflows, and • implementing the q&a functionality to prompt faculty to describe if they need an entire ebook or just a digitized chapter. these steps will support staff in the administration of courses and reading list in alma and leganto and in improved efficiency in processing citations. furthermore, they will help ensure resources are accessible to all registered students for the appropriate time period. conclusion it is not yet clear if the adoption of the tool has been successful with only 30 lists with student activity for the fall 2020 term and 49 for the winter 2021 term. we had hoped that given the increased need for online learning in this term, instructors would have been eager to use the reading list tool to support student access to resources and learning. conversely, it seems likely that the new tool may have been too much for some instructors during this stressful period.23 sheedy, wells and bellenger noted that there is a potential for system rejection by end users if the change is perceived as creating additional workloads for academic staff, and this may be a factor in our implementation.24 as a result, we will continue to monitor use and determine if our engagement strategies are working. if not, we will need to provide further engagement, training, and support to build interest and use. the public health restrictions due to covid-19 challenged the structure of learning and library services in post-secondary institutions. in some cases, these challenges were opportunities for change. presented with the challenge of how to continue to provide course reserve reading list service, lcr decided to adopt leganto through a rapid implementation. this implementation was an opportunity for lcr to continue to provide reserve reading list service while implementing new workflows to support online access to resources (digitization and purchase requests). although this has met some of lcr’s immediate needs, there is still work that needs to be done to ensure long term successful adoption and use. we will need to continue to review and improve engagement strategies, workflows, and integrations to make the most of this investment. endnotes 1 gary brewerton and jon knight, “from local project to open source: a brief history of the loughborough online reading list system (lorls),” vine 33, no. 4 (2003): 189–95, https://doi.org/10.1108/03055720310510909. 2 richard cross, “implementing a resource list management system in an academic library,” electronic library 33, no. 2 (july 2015): 221, https://doi.org/10.1108/el-05-2013-0088. 3 “leganto course resource list management,” ex libris, march 18, 2021, https://exlibrisgroup.com/products/leganto-reading-list-management-system/. 
4 cross, "implementing a resource list management system"; brewerton and knight, "local project to open source."
5 marie o'neill and lara musto, "faculty perceptions of loughborough's online reading list system (lorls) at dublin business school (dbs)," new review of academic librarianship 23, no. 4 (2016): 368, https://doi.org/10.1080/13614533.2016.1272473; cross, "implementing a resource list management system in an academic library."
6 cross, "implementing a resource list management system"; brewerton and knight, "local project to open source"; o'neill and musto, "faculty perceptions"; olivia walsby, "implementing a reading list strategy at the university of manchester—determination, collaboration and innovation," insights the uksg journal 33 (2020), https://doi.org/10.1629/uksg.494.
7 o'neill and musto, "faculty perceptions," 368.
8 o'neill and musto, "faculty perceptions," 368.
9 o'neill and musto, "faculty perceptions," 368; linda sheedy, david wells, and amanda bellenger, "implementation of a leganto reading list service at curtin university library," in technology, change and the academic library, ed. jeremy atkinson (chandos publishing, 2021), 55–61, https://doi.org/10.1016/b978-0-12-822807-4.00005-1.
10 o'neill and musto, "faculty perceptions," 368.
11 walsby, "implementing a reading list strategy."
12 o'neill and musto, "faculty perceptions," 368.
13 meredith gorran farkas, "libraries in the learning management system," in american libraries tips and trends (summer 2015), https://acrl.ala.org/is/wp-content/uploads/2014/05/summer2015.pdf.
14 jennifer murray and daniel feinberg, "collaboration and integration," information technology and libraries 39, no. 2 (november 2020), https://doi.org/10.6017/ital.v39i2.11863.
15 murray and feinberg, "collaboration and integration," 9.
16 o'neill and musto, "faculty perceptions," 368.
17 cross, "implementing a resource list management system."
18 ex libris, "course materials affordability: a win for university of st. thomas," library journal (november 7, 2018), https://www.libraryjournal.com/?detailstory=course-materials-affordability-a-win-for-university-of-st-thomas.
19 "canada's best medical doctoral universities: rankings 2020," maclean's, october 3, 2019, https://www.macleans.ca/education/university-rankings-2020-canadas-top-medical-doctoral-schools/.
20 o'neill and musto, "faculty perceptions," 368.
21 murray and feinberg, "collaboration and integration," 9.
22 sheedy, wells, and bellenger, "implementation of a leganto reading list service," 55–61.
23 "faculty wellness and careers," course hero (blog), december 1, 2020, https://www.coursehero.com/blog/faculty-wellness-research/.
24 sheedy, wells, and bellenger, "implementation of a leganto reading list service," 55–61.

information technology and libraries at 50: the 1980s in review
mark dehmlow
information technology and libraries | september 2018
mark dehmlow (mdehmlow@nd.edu) is director, library information technology at the hesburgh libraries, university of notre dame.

my view of library technology in the 1980s through the lens of journal of library automation (jola) and its successor information technology and libraries (ital) is a bit skewed by my age. i am a gen-xer and much of my professional perspective has been shaped by the last two decades in libraries. while i am cognizant of our technical past, my perspective is very much grounded in the technical present. in a way, i think that context made my experience reviewing the 1980s in jola and ital all the more fun. the most pronounced event for the journal during the 1980s was the transition from the journal of library automation to information technology and libraries between 1981 and 1982. the rationale for this change is perhaps best captured through the context set in the guest editorial "old wine in new bottles?" by kenney in the first issue of ital: "proliferating technologies, the trend toward integration of some of these technologies into new systems, and rapidly increasing adoption of technology-based systems of all types in libraries ..."1 the article grounds us in the anxieties and challenges of the decade surrounding an accelerating change in technology. libraries were evolving from implementing systems of "automation," a term that focuses more on processes, to broadening their view to "information technology," which is more of a discipline — an ecosystem made up of technology, process, systems, standards, policies, etc. in a way, the article acknowledges the departure of libraries from their adolescent technological pasts to their young adult present for which the 80s would be the background. perhaps no other event is more technologically significant during the decade than the standardization of the internet. while the concept of networks and a network of networks, e.g., the internet, was conceptualized in the 1960s, it was the development of the tcp/ip network protocol that is the most consequential event because it made it possible to interconnect computer systems using a common means of communication. while the internet wouldn't become ubiquitously popularized until the early 1990s with the emergence of the world wide web, the internet was active and alive well before that and, in its early state, was critical to the emergence and evolution of library technologies. from the first issue through the last of the 1980s, ital references the term "online" frequently. the "online" of the 80s, however, was largely text based, where systems were interconnected using lightweight terminals to navigate browse and search systems.
it was not unlike a massive “choose your own adventure book,” skipping from menu to menu to find what you were looking for. throughout my review, i was happy to see a small, but significant, percentage of international articles that focused on character sets, automation, and collection comparisons in countries like kuwait, australia, china, and israel. diversity is a cornerstone for lita and ala and the journal has continued this trend to encourage the submission of articles from outside of the u.s. the 1980s volumes of ital traversed a plethora of topics ranging from measuring system the 1980s in review | dehmlow 9 https://doi.org/10.6017/ital.v37i3.10749 performance (efficiency was important during a time when computing was relativ ely slow and expensive) to how to use library systems to provide data that can be used to make business decisions. over the decade, there was a significant focus on library organizations coming to terms with new technology, e.g. the automation of circulation, acquisitions, and the marc bibliographic record. there were several articles that discussed the complications, costs, and best practices for converting card-catalog metadata to electronic records and several other articles that detailed large barcoding projects. the largest number of articles on a single topic focused on the automation and management of authority control in automated library systems. there were articles on the emergence of research databases often delivered as applications on cd-roms which would then be installed on microcomputers. the term “microcomputer” was frequently used because the 80s saw the emergence of the personal computer in the work environment, a transformative step in enabling staff and patrons alike to access online library services and applications to support their research and work. electronic mail was in its infancy and became a novel way to share information with end users across a campus. several articles focused on the physical design of search terminals and optimizing the ergonomics of computers. there were also many articles about designing the best opac interface for users, ranging from how to present bibliographic records to users, to what information should be sent to printers, to early efforts to extend local catalogs with article-based metadata. many of these topics have parallels today. instead of only analyzing statistical usage data we can pull from our systems, libraries are striving to develop predictive analytics, leveraging big-data from across an assortment of institutions. i found the 1988 article “investigating computer anxiety in an academic library,” which examines staff resistance to technology and change to be as apropos today as it was then.2 cd-roms have gone the way of the feathered and overly hairsprayed coifs of the 80s and have largely been superseded by hard drives and solid state flash media that can hold significantly more data and can transfer data more rapidly. the current decade of the 2010s has been dedicated to providing the optimal search experience for our end users as we have broadened our efforts to the discovery of all scholarly information, not just what is held in our collections. and of course, instead of adding a few article abstracting resources to our catalogs in an innovative, but difficult to sustain manner, the commercial sector has created web-scale mega-indexes that are integrated with our catalogs and offer the promise of searching a predominant amount of the scholarly record. 
there was a really interesting thread of articles over the decade that traced the evolution of the ils in libraries. there were articles about how to develop automation systems for libraries, the various functions that could be automated — cataloging, circulation, acquisitions, etc. — and evaluation projects for commercial systems. if the 2000s was the era of consolidation, the early 1980s could easily represent the era of proliferation. the decade nicely traces the first two generations of library systems, starting with university-developed automation and database backed systems and the migration of many of those systems to vendors. the northwestern university-based notis system was referenced a lot and there were some mentions of oclc’s acquisition and distribution of the ls/2000 system. this part of our automation history is a palpable reminder that libraries have been innovative leaders in technology for decades, often developing systems ahead of the commercial industry in an effort to meet our evolving service portfolios. this early strategy for libraries mirrors recent developments of institutional repositories, current research information systems (criss), and faculty profiling systems like vivo that were developed before the commercial sector saw the feasibility of commercialization. information technology and libraries | september 2018 10 the cycle of selecting and implementing a new integrated library system is something that m any organizations are faced with again. the only difference is that the commercial sector has entered into the development of the 4th or 5th generation of integrated library systems, many of which are coming with data services integrated and most of them are implemented in the cloud. in addition to seeing our technically rudimentary past, there were several articles over the decade that discussed especially innovative ideas or that anticipated future technologies. a 1983 article by tamas doszkocs which was written long before the emergence of google is an early revelation that regular patrons struggle to use expert systems that require normalized and boolean searching strategies. not surprising is the conclusion that users lean organically toward natural language searching, but even then we were having the expert experience vs. intuitive experience debate in the profession: “the development of alternative interfaces, specifically designed to facilitate direct end user interaction in information retrieval systems, is a relatively new phenomenon.”3 the 1984 article, “packet radio for library automation,” is about eliminating the challenges of retrofitting buildings with cabling to connect lan networks by using radio based interfaces.4 could this be an early precursor to wifi? there is the 1985 article titled “microcomputer based faculty-profile” about using a local database management application on a pc to create an index of faculty publications and university publishing trends.5 this is nearly three decades before the popularization of the cris and faculty profile system. 
in 1986, there is an article “integrating subject pathfinders into a geac ils: a marc-formatted record approach,” an article that made me think about how library websites are structured, and the current trend of developing online research guides and making them discoverable in our websites as a research support tool.6 and finally, i was struck by the innovative approach in 1987’s “remote interactive online support,” wherein the authors wrote about using hardware to make simultaneous shell connections to a search interface so they could give live search guidance to researchers remotely. 7 we take remote technical support for granted now, but in the late 80s, this required several complicated steps to achieve. the 80s were an exciting time for technology development and a decade that is rife with technical evolution. i think this quote from the article “1981 and beyond: visions and decisions” by fasana in the journal of library automation best elucidates the deep connection between the past and the future, “library managers are currently confronted with a dynamic environment in which they are attempting simultaneously to plan library services and systems for the future, and to control the rate and direction of change.”8 this still holds true. library managers are still planning services in a rapidly changing environment, except, i like to think we have learned to live with change that we cannot control the rate nor direction of. 1 b. kenney, “guest editorial: old wine in new bottles?,” information technology and libraries, 1 no. 1 (march 1982), p. 3. 2 maryellen sievert, rosie l. albritton, paula roper, and nina clayton, “investigating computer anxiety in an academic library,” information technology and libraries 7 no. 3 (september 1988), pp. 243-252. the 1980s in review | dehmlow 11 https://doi.org/10.6017/ital.v37i3.10749 3 tamas e. doszkocs, “cite nlm: natural-language searching in an online catalog,” information technology and libraries 2 no. 4 (december 1983), p. 364. 4 edwin b. brownrigg, clifford a. lynch, and rebecca pepper, “packet radio for library automation,” information technology and libraries 3 no. 3 (september 1984), pp. 229-244. 5 vladimir t. borovansky and george s. machovec, “microcomputer based faculty-profile,” information technology and libraries 4 no. 4 (december 1985), pp. 300-305. 6 william e. jarvis and victoria e. dow, “integrating subject pathfinders into a geac ils: a marcformatted record approach,” information technology and libraries 5 no. 3 (september 1986), pp. 213-227. 7 s. f. rossouw and c. van rooyen, “remote interactive online support,” information technology and libraries 6 no. 4 (december 1987), pp. 311-313. 8 paul j. fasana, “1981 and beyond: visions and decisions,” journal of library automation 13 no. 2 (june 1980), p. 96. letter from the editors (december 2022) letter from the editors kenneth j. varnum and marisha c. kelly information technology and libraries | december 2022 https://doi.org/10.6017/ital.v41i4.16005 from the articles and communications in our december issue, the library technology profession has begun thinking through and reporting on the adaptations and changes wrought by the ongoing (some may say never-ending) covid-19 pandemic. four of the 5 articles in this issue relate to the many ways the pandemic altered how libraries do their work, both behind the scenes and in public. 
from the tools we use internally for project management to those we provide to our public service colleagues, it seems no aspect of library technology has been untouched. in particular, the seriousness of the challenges caused by interfaces with poor accessibility has been brought to the foreground. a critical component of libraries' diversity, equity, inclusion, and accessibility (deia) efforts, ensuring equitable access for all must be top of mind. when the pandemic led libraries, and education in general, to adapt to largely virtual presentation models, the interactive tools we reached for—products such as padlet, jamboard, and poll everywhere—became de rigueur for establishing two-way interactions with our communities. yet little attention was paid, until now, to the accessibility of those tools. in this issue, "tech tools in pandemic-transformed information literacy instruction: pushing for digital accessibility" provides excellent qualitative data to help us understand how well, or poorly, these tools meet accessibility needs.

articles
• digitization of libraries, archives, and museums in russia / heesop kim and nadezhda maltceva
• tech tools in pandemic-transformed information literacy instruction: pushing for digital accessibility / amanda rybin koob, kathia salomé ibacache oliva, michael williamson, marisha lamont-manfre, addie hugen, and amelia dickerson
• spatiotemporal distribution change of online reference during the time of covid-19 / thomas gerrish and ningning nicole kong

communications
• a library website redesign in the time of covid: a chronological case study / erin rushton and bern mulligan
• a library website migration: project planning in the midst of a pandemic / isabel vargas ochoa

as always, if you have lessons learned about technologies and their effect on our mission, we'd like to hear from you. our call for submissions outlines the topics and process for submitting an article for review. if you have questions or wish to bounce ideas off the editor and assistant editor, please contact either of us at the email addresses below. we particularly welcome our public library colleagues to consider a column in our "public libraries leading the way" series; proposals for 2023 may be submitted through the pllw proposal form. with best wishes for 2023,
kenneth j. varnum, editor (varnum@umich.edu)
marisha c. kelly, assistant editor (marisha.librarian@gmail.com)

off-campus access to licensed online resources through shibboleth
francis jayakanth, ananda t. byrappa, and raja visvanathan
information technology and libraries | june 2021 https://doi.org/10.6017/ital.v40i2.12589

abstract
institutions of advanced education and research, through their libraries, invest substantially in licensed online resources. only authorized users of an institution are entitled to access licensed online resources.
seamless on-campus access to licensed resources happens mostly through internet protocol (ip) address authentication. increasingly, licensed online resources are accessed by authorized users from off-campus locations as well. libraries will, therefore, need to ensure seamless off-campus access to authorized users. libraries have been using various technologies, including proxy server or virtual private network (vpn) server or single sign-on, to facilitate seamless off-campus access to licensed resources. in this paper, the authors share their experience in setting up a shibboleth-based single sign-on (sso) access management system at the jrd tata memorial library, indian institute of science, to enable authorized users of the institute to seamlessly access licensed online resources from off-campus locations.

francis jayakanth (francis@iisc.ac.in) is scientific officer, j.r.d. tata memorial library, indian institute of science. ananda t. byrappa (anandtb@iisc.ac.in) is librarian, j.r.d. tata memorial library, indian institute of science. raja visvanathan (raja@inflibnet.ac.in) is scientist c (computer science), inflibnet centre, gandhinagar, india. © 2021.

introduction
the internet has both necessitated and offered options for libraries to enable remote access to an organization's licensed online content—journals, e-books, technical standards, bibliographical and full-text databases, and more. in the absence of such an option for remote access, faculty, students, and researchers have limited and constrained access to the licensed online content from off-campus locations. as scholarly resources transitioned from print to online in the mid-1990s, libraries and their vendors had to start identifying user affiliations in order to grant access to licensed online resources to the authorized users of an institution. the ip address was an obvious mechanism to do that. allowing or denying access to online resources based on a user's ip address was simple, it worked, and, in the absence of practical alternatives, it became the universal means of authentication for gaining access to licensed library content.1 to facilitate seamless access to licensed online resources from off-campus sites, libraries have been using various technologies including proxy server or vpn server or remote desktop gateway or federated identity management or a combination of the said technologies. in our institute, the on-campus ip-based access to the licensed content is supplemented by vpn technology for off-campus access. the covid-19 pandemic has necessitated that academic and scientific staff work from home, which demands smooth and seamless access to the organization's licensed content. the sudden surge in demand for seamless off-campus access to the licensed online resources had an impact on the institute's vpn server. also, not all authorized users of the institute are entitled to get vpn access. to mitigate the situation, the library, therefore, had to explore a secure, reliable, and cost-effective solution to facilitate seamless off-campus access to all the licensed online resources for all the authorized users of the institute. after exploring the possibilities, the library decided to implement a single sign-on solution based on shibboleth.
shibboleth software implements the security assertion markup language (saml) protocol, separating the functions of authentication (undertaken by the library or university, which knows its community of end users) and authorization (undertaken by the resource provider, which knows which libraries have licenses for their users to access the resource in question).2

about the indian institute of science (iisc)
the indian institute of science (iisc, or "the institute") was established in 1909 by a visionary partnership between the industrialist jamsetji nusserwanji tata, the maharaja of mysore, and the government of india. over the 109 years since its establishment, iisc has become the premier institute for advanced scientific and technological research and education in india. since its inception, the institute has laid a balanced emphasis on the pursuit of fundamental knowledge in science and engineering, and the application of its research findings for industrial and social benefit. during 2017–18, the institute initiated the practice of undergoing international peer academic reviews over a 5-year cycle. each year, a small team of invited international experts reviews a set of departments. the experts spend 3 to 4 days at the institute. during this period, they interact closely with the faculty and students of these departments and tour the facilities, aiming to assess the academic work against international benchmarks. iisc has topped the ministry of human resource development (mhrd), government of india's nirf (national institutional ranking framework) rankings not only in the university's category but also overall among all ranked institutions. times higher education has placed iisc at the 8th position in its small university rankings (that is, among universities with fewer than 5,000 students), at the 13th position in its ranking of universities in the emerging economies, and in the range 91–100 in its world reputation rankings. in the qs world university rankings, iisc is ranked 170. in the same ranking system, on the metric of citations per faculty, iisc is placed in second position. iisc publishes about 3,000 papers per year in scopus and web of science indexed journals and conferences and, each year, the institute awards around 400 phd degrees.

about the iisc library
jrd tata memorial library (https://www.library.iisc.ac.in), popularly known as the indian institute of science library, is one of the best science and technology libraries in india. started in 1911, as one of the first three departments in the institute, it has become a precious national resource center in the field of science and technology. the library receives annually a grant of 10–12% of the total budget of the institute. the library spends about 95% of its budget toward periodical subscriptions, which is unparalleled in this part of the globe. with a collection of nearly 500,000 volumes of books, periodicals, technical reports and standards, the jrd tata memorial library is one of the finest in the country. currently, it subscribes to over 13,000 current periodicals. the library also maintains the iisc's research publications repository, eprints@iisc (http://eprints.iisc.ac.in), and its theses and dissertations repository, etd@iisc (https://etd.iisc.ac.in).
off-campus access to licensed online resources
in a typical research library, licensed scholarly resources comprise research databases, electronic journals, e-books, standards, and more. a library licenses these resources through publishers/vendors. these license agreements limit access to the resources to the authorized users of an institute. in our case, authorized users include faculty members, enrolled students, current staff, contractual staff, and walk-in users to the library. seamless access to the licensed resources from on-campus sites is predominantly ip-address authenticated, which is a simple and efficient model for users physically located on the institute campus. these users expect a similar experience while accessing licensed online resources from off-campus locations. therefore, the challenge to the libraries is to ensure that such off-campus accesses are secure, seamless, and restricted to authorized users of an institute. libraries have been using various technologies including proxy servers, vpn servers, or single sign-on to facilitate seamless off-campus access to licensed resources. our institute has been using vpn technology to enable off-campus access to licensed online resources. a virtual private network (vpn) is a service offered by many organizations to its members to enable them to remotely connect to the organization's private network. a vpn extends a private network across a public network and allows users to send and receive data across shared or public networks as if their computing devices were directly connected to the private network. applications running across a vpn may therefore benefit from the functionality, security, and management of the private network. encryption is common, although not an inherent part of a vpn connection.3 in our institute, faculty members and students are provided access to the vpn service when their institute email address is created. users follow four steps to use a vpn client to get connected to the campus network:
• install vpn client software on their computer system. cisco anyconnect (https://www.cisco.com/c/en/us/products/collateral/security/anyconnect-secure-mobility-client/at-a-glance-c45-578609.html) is one such software.
• start the vpn client software every time there is a need to connect to the private network.
• enter the address of the institute's vpn server, and click connect in the anyconnect window.
• log in to the vpn server using their institutional email credentials.
an authorized user of the institute can use any of the ip-authenticated network services, including the licensed online resources, after a successful login to the vpn server. the vpn technology has been serving the purpose well, but the service is, by default, available only to the institute's faculty and students. other categories of employees such as project assistants, project associates, research assistants, post-doctoral fellows, and others, who constitute a good percentage of iisc staff, are provided vpn access on a case-by-case basis. during the covid-19 lockdown, the library received several enquiries about accessing the online resources from off-campus sites.
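the steps above assume the cisco anyconnect gui client. as an aside, anyconnect-compatible vpn servers can usually also be reached from the command line with the open-source openconnect client; the hostname and username in the sketch below are placeholders, not iisc's actual vpn details.

```
# Connect to an AnyConnect-compatible VPN server with the open-source
# openconnect client (hostname and username are placeholders).
sudo openconnect --user=jdoe vpn.example.ac.in
```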
realizing the importance of the situation, the library quickly assessed the various possibilities for facilitating seamless off-campus access to the subscribed online resources apart from the vpn-based access. federated access through a shibboleth identity provider (idp) service emerged as a possible solution to facilitate seamless off-campus access for the entire institute community.

federated access
federated access is a model for access control in which authentication and authorization are separated and handled by different parties. if a user wishes to access a resource controlled by a service provider (sp), the user logs in via an identity provider (idp). more complex forms of federated access involve the use of attributes (information about the user passed from the idp to the sp, which can be used to make access decisions) and can include extra services such as trust federations and discovery services (where the user selects which idp to use to connect to the sp).4 examples of this federated access model include shibboleth and openathens. shibboleth is open-source software that offers single sign-on infrastructure. openathens is a commercial product delivered as a cloud-based solution. it supports many of the same standards as shibboleth. so, an institution could pay and join the openathens federation, which will provide technical support to set up, integrate, and operationalize federated access using openathens. we decided to go with shibboleth for the following reasons:
• to avoid the recurring cost associated with the openathens solution.
• the existence of a shibboleth-based infed federation in the country. infed manages the trust between the participating institutions and publishers (http://infed.inflibnet.ac.in/).
• infed is part of the edugain inter-federation, which enables our users to gain access to the resources of federations of other countries.

what is shibboleth?
shibboleth is a standards-based, open-source software package for web single sign-on across or within organizational boundaries. it allows sites to make informed authorization decisions for individual access of protected online resources in a privacy-preserving manner. the shibboleth software implements widely used federated identity standards, principally the oasis security assertion markup language (saml), to provide a federated single sign-on and attribute exchange framework. a user authenticates with their organizational credentials, and the organization (or identity provider) passes the minimal identity information necessary to the service provider to enable an authorization decision. shibboleth also provides extended privacy functionality allowing a user and their home site to control the attributes released to each application (https://www.shibboleth.net/index/). shibboleth has two major components: (1) an identity provider (idp), and (2) a service provider (sp). the idp supplies required authorizations and attributes about the users to the service providers (for example, publishers). the service providers make use of the information about the users sent by the idp to make decisions on whether to allow or deny access to their resources.
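which attributes an idp releases to which sp is normally expressed as attribute filter policy. the fragment below is an illustrative, heavily trimmed shibboleth idp attribute-filter rule (in a real deployment it sits inside an AttributeFilterPolicyGroup element with the appropriate xml namespaces declared); the sp entityid and the released attributes are placeholder examples, not the configuration described in this article.

```xml
<!-- Illustrative, trimmed Shibboleth IdP attribute-filter rule; all values are placeholders. -->
<AttributeFilterPolicy id="releaseToPublisherSP">
  <!-- Apply this rule only to one publisher's SP -->
  <PolicyRequirementRule xsi:type="Requester"
      value="https://sp.publisher-example.com/shibboleth"/>
  <!-- Release only a pseudonymous identifier and the library entitlement -->
  <AttributeRule attributeID="eduPersonTargetedID">
    <PermitValueRule xsi:type="ANY"/>
  </AttributeRule>
  <AttributeRule attributeID="eduPersonEntitlement">
    <PermitValueRule xsi:type="ANY"/>
  </AttributeRule>
</AttributeFilterPolicy>
```

releasing only a pseudonymous identifier and an entitlement value (such as the widely used common-lib-terms entitlement) is one way the "minimal identity information" and privacy-preserving behavior described above is achieved in practice.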
interaction between a shibboleth identity provider and service provider
when a user attempts to access licensed content on the service provider's platform, the service provider generates an authentication request and then directs the request and the user to the user's idp server. the idp prompts for the login credentials. in our setup, the idp server communicates the login credentials to the institute's active directory (ad) using the secure lightweight directory access protocol (ldap). ad is a directory service provided by microsoft. in a directory service, objects (such as a user, a group, a computer, a printer, or a shared folder) are arranged in a hierarchical manner facilitating easy access to the objects. organizations primarily use ad to perform authentication and authorization. once the authenticity of a user is verified, ad helps in determining if a user is authorized to use a specific resource or service. access is granted to a user only if the user checks out on both counts. the ad authenticates a user, and the response is sent back to the idp along with the required attributes. the idp then releases only the required set of attributes to the service provider. based on the idp attributes, which represent a user's entitlements, the sp grants access to the resource. figure 1 illustrates the functioning of the two components of shibboleth.

figure 1. a shibboleth workflow involving a user, identity provider, and service provider.

identity federation
the interaction between a service provider and identity provider happens based on mutual trust. the trust is established by providing idp metadata as encrypted keys and the idp url that the sp uses to send and request information from the idp. the exchange of metadata between idp and sp can be informal if an institution licenses online resources from only a few publishers. however, research libraries license content from hundreds of sps. therefore, the role of federations is significant. in the absence of a federation, each identity provider and service provider must individually communicate with each other about their existence and configuration, as illustrated in figure 2.

figure 2. individual communication between idps and sps.

a federation is merely a list of metadata entries aggregated from their member idps and their sps. our institute is a member of infed (information and library network access management federation). infed was established as a centralized agency to coordinate with member institutions in the process of implementing user authentication and access control mechanisms across all member institutions. infed manages the trust relationship between the idps and sps (publishers) in india. therefore, individual idps that intend to facilitate access to subscribed online resources through shibboleth will share their metadata with infed. infed, in turn, will share the metadata of the idps with respective service providers, as illustrated in figure 3. other regions have their own federations. for example, in the us, incommon (https://www.incommon.org/) serves as the federation, and in the uk, it is the uk access management federation (http://www.ukfederation.org.uk/).
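for reference, in a shibboleth idp v3 installation such as the one described in this article, the connection from the idp to active directory is typically configured in conf/ldap.properties. the sketch below only illustrates the kind of settings involved; every hostname, dn, and filter is a placeholder rather than the institute's actual directory configuration.

```properties
# Illustrative conf/ldap.properties for a Shibboleth IdP v3 authenticating
# against Active Directory over LDAPS. All values are placeholders.
idp.authn.LDAP.authenticator    = bindSearchAuthenticator
idp.authn.LDAP.ldapURL          = ldaps://ad.example.ac.in:636
idp.authn.LDAP.useStartTLS      = false
idp.authn.LDAP.baseDN           = OU=Users,DC=example,DC=ac,DC=in
idp.authn.LDAP.userFilter       = (sAMAccountName={user})
idp.authn.LDAP.bindDN           = CN=idp-bind,OU=Service Accounts,DC=example,DC=ac,DC=in
idp.authn.LDAP.bindDNCredential = *****
```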
figure 3. role of a federation as a trust manager between idps and sps.

how does one gain access to shibboleth-enabled resources?
a federation manages the trust between identity providers and service providers. the sps enable shibboleth-based access to subscribed resources for the idps based on the metadata shared by a federation. once the sps allow access, users can access such resources by using the institutional login option via the athens/shibboleth link found on the service provider's platform. alternatively, a library can create a simple html page listing all the shibboleth-enabled licensed resources, as shown in figure 4.

figure 4. partial screenshot of shibboleth-enabled resources of our institute.

each of the links in figure 4 is a wayfless url. a wayfless url is specific to an institution (idp), and it enables users of that institution to gain federated access to a service or resource in a way that bypasses the where are you from (wayf), or institutional login (discovery service), steps on the sp's platform. since the institutional login or the discovery service step can be confusing to end users, wayfless links to the resources will facilitate an improved end-user experience in accessing licensed resources. a user needs to follow a link from the list of resources. the link will take the user to the sp. the sp will redirect the user to the idp server for authentication. after successful authentication, the user will gain access to the resource. there are two ways to get a wayfless url to a service: (1) the service provider can share the url, or (2) one can make use of a wayfless url generator service like wugen (https://wugen.ukfederation.org.uk/wugen/login.xhtml). a sketch of what such a link typically looks like appears after the benefits list below.

benefits of shibboleth-based access
shibboleth-based single sign-on can effectively address several requirements of the libraries in ensuring secure and seamless on-campus and off-campus access to subscribed online resources. there are other benefits of shibboleth-based sso:
1. it is open-source software that provides single sign-on infrastructure.
2. it enables organizations to use their existing user authentication mechanism to facilitate seamless access to licensed online resources.
3. being a single sign-on system, for the end users, it eliminates the need to have individual credentials for each online resource.
4. it uses security assertion mark-up language (saml) to securely transfer information about the process of authentication and authorization.
5. it is used by most of the publishers, who facilitate shibboleth-based access through shibboleth federations.
6. it requires a formal federation as a trusted interface between the institutions as an identity provider (idp) and publishers as service providers (sp), thereby ensuring the use of uniform standards and protocols while transmitting attributes of authorised users to publishers. inflibnet's access management federation, infed, plays this role (https://parichay.inflibnet.ac.in/objectives.php).
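as noted above, the exact form of a wayfless url varies from publisher to publisher, but for publishers running the shibboleth service provider software a common pattern is an sp-side session-initiator link that names the user's idp and the target resource. the hostnames and entityid in the sketch below are placeholders, not any real publisher or institute values.

```
# A common SP-initiated WAYFless pattern (a single line in practice);
# hostnames and entityID are placeholders, and in a real link the
# entityID and target values are URL-encoded.
https://www.publisher-example.com/Shibboleth.sso/Login?entityID=https://idp.example.ac.in/idp/shibboleth&target=https://www.publisher-example.com/journal/some-article
```

when a user follows such a link, the publisher's sp skips its discovery step, sends the authentication request straight to the named idp, and returns the user to the target resource after a successful login.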
idp server configuration

we installed the shibboleth idp software, version 3.3.2, on a virtual machine on the azure platform. the vm is configured with two virtual cpus, 4 gb of ram, a 300 gib os disk (standard hdd), and ubuntu linux 18.04.4 lts. coordination with the organization's network support team is essential: the team handles domain name service resolution for the idp server, enables the idp server to communicate with the organization's active directory, and opens the non-standard communication ports required on the idp server.

shibboleth idp usage statistics

the infed team has developed a beta version of a usage analysis tool called infedstat to analyse how federated access is being used to reach licensed resources. we have implemented the tool on the idp server. figure 5 shows a redacted screenshot of the infedstat dashboard. it shows

• date-wise usage details of logged-in users, along with ip address, time logged in, and the publishers' platforms accessed,
• the number of times the publishers' platforms were accessed during a specific period,
• the number of times users logged in during a specific period,
• unique users for a specific period, and
• unique publishers accessed during a specific period.

figure 5. idp usage dashboard.

conclusions

the implementation of federated access to subscribed online resources has ensured that all the authorized users of the institute can access almost all the licensed resources from wherever they are. the counter 5 usage analysis of subscribed resources for the period of january 2020 to october 2020 indicates that usage of online resources increased by nearly 20 percent compared with the same period of the previous year. the enhanced use could be partly because of the ease of access facilitated by federated access. to assess the reasons for the increased usage, the library is planning to conduct a survey to understand how convenient and useful federated access to online resources has been, especially while off campus. federated access through single sign-on is useful not just for accessing licensed online resources. a typical research library offers various other services to its users, including an institutional repository, a learning management system, an online catalogue, etc. the library intends to integrate such services with sso, thereby freeing end users from service-specific credentials.

endnotes

1 thomas dowling, "we have outgrown ip authentication," journal of electronic resources librarianship 32, no. 1 (2020): 39–46, https://doi.org/10.1080/1941126x.2019.1709738.
2 john paschoud, "shibboleth and saml: at last, a viable global standard for resource access management," new review of information networking 10, no. 2 (2004): 147–60, https://doi.org/10.1080/13614570500053874.
3 andrew g. mason, ed., cisco secure virtual private network (cisco press, 2001): 7, https://www.ciscopress.com/store/cisco-secure-virtual-private-networks-9781587050336.
4 masha garibyan, simon mcleish, and john paschoud, "current access management technologies," in access and identity management for libraries: controlling access to online information (london, uk: facet publishing, 2014): 31–38.
contactless services: a survey of the practices of large public libraries in china

yajun guo, zinan yang, yiming yuan, huifang ma, and yan quan liu

information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.14141

yajun guo (yadon0619@hotmail.com) is professor, school of information management, zhengzhou university of aeronautics. zinan yang (yangzinan612@163.com) is master, school of information management, zhengzhou university of aeronautics. yiming yuan (yuanyiming361@163.com) is master, school of information management, zhengzhou university of aeronautics. huifang ma (mahuifang126@126.com) is master, school of information management, zhengzhou university of aeronautics. yan quan liu (liuy1@southernct.edu), corresponding author, is professor, department of information and library science, southern connecticut state university. © 2022.

abstract

contactless services have become a common way for public libraries to provide services. the strategies used by public libraries in china will help reduce the spread of epidemics transmitted through human contact and will serve as a model for other libraries throughout the world. the primary goal of this study is to gain a deeper understanding of the contactless service measures provided by large chinese public libraries for users in the pandemic era, as well as the challenges and countermeasures involved in providing such services. the data for this study were obtained using a combination of website investigation, content analysis, and telephone interviews for an analytical survey of 128 large public libraries in china. the study finds that touch-free information dissemination, remote resources use, no-touch interaction self-services, network services, online reference, and smart services without personal interactions are among the contactless services available in chinese public libraries. exploring the current state of contactless services in large public libraries in china helps fill a need for empirical attention to contactless services in libraries and the public sector. up-to-date information to assist libraries all over the world in improving their contactless services implementation and practices is provided.

introduction

the spread of covid-19 began in 2020, and people all over the world are still fighting the severity of its spread, the breadth of its impact, and the extent of its endurance. the virus's continued spread has had a wide-ranging impact on industry sectors worldwide, including libraries. the growth of public libraries has also seen significant changes as a result of covid-19, resulting in added patron services, including contactless services. contactless services are those that patrons can use without having to interact face to face with librarians.
these services transcend time and geographical constraints, as well as lower the danger of disease transmission through human interaction. since the covid-19 pandemic, contactless or touch-free interaction services are emerging in chinese public libraries. this service model can also serve as a reference for other libraries. this study evaluates and analyzes contactless service patterns in large public libraries in china, and then suggests a contactless service framework for public libraries, which is currently in the process of being implemented. mailto:yadon0619@hotmail.com mailto:yangzinan612@163.com mailto:mahuifang126@126.com mailto:liuy1@southernct.edu information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 3 literature review the available literature shows that the term “non-contact” appeared as early as 1916 in the article “identification of the meningococcus in the naso-pharynx with special reference to serological reactions” and described a patient’s infection in the context of medical research.1 in recent years, with the widespread application of “internet +” and the development and promotion of technologies such as the internet of things, cloud computing, and artificial intelligence, the contactless economy has grown by leaps and bounds, and so has the research on library contactless services.2 library contactless services encompass a wide range of services such as selfservices, online reference, and smart services without personal interactions. library self-service has become a major service model for contact-free services. the self-service model was first adopted in american public libraries in the 1970s with the emergence of self service borrowing and returning practices.3 many public libraries have since adopted stand-alone, fully automated self-service halls, self-service counters, etc.4 by the 1990s, a range of commercial self-service kiosks and self-service products had been introduced.5 currently, the most mature self-service type used by the library community is the circulation self-service product.6 in addition to self-service borrowing and returning of titles, libraries have launched self-service printing systems, self-service computer systems, and self-service booking of study spaces.7 as an example, patrons can complete printing operations using a self-service system and can offer payment by bank card, alipay, wechat, and other means.8 a face recognition system can also be used to borrow and return books, a solution for patrons who forget their library cards.9 these library selfservice system elements are confined to simple, repetitive, and routine tasks such as conducting book inventories, book handling, circulating books, and the like, whose development stems from the widespread application of electronic magnetic stripe technology and radio frequency identification (rfid), optical character recognition (ocr) technology, and face recognition.10 new applications of technology continue to advance the development of contactless services in libraries. the overall work and service processes of the library have been made intelligent to varying degrees. online reference is an important service in the contactless service program. researchers have started to study the current state of library reference services. 
interactive online reference services support patrons using the library, including how to search for literature, locate and renew books, schedule a study or seminar room, and participate in other library activities, such as seminars, lectures, etc.11 in response to the problem of how patrons access various library service abilities, digital reference systems need to have functions such as automated semantic processing, automated scene awareness, through automatic calculation and adaptive matching, understanding of patrons’ interests preferences and needs, and the ability to recommend the most suitable information resources for them.12 at present, most library reference services in china mainly include the use of telephone, email, wechat, robot librarians/interactive communication, microblogs, and qq, an instant messaging software popular in china. during the past two years, most public libraries in china have essentially implemented the use of the aforementioned reference tools to communicate and interact with patrons, with wechat having a 55.6% adoption rate when compared to other instant reference tools.13 the use of online chat in reference services has allowed librarians to help patrons from anywhere and at any time through embedding chat plug-ins into multiple pages of the library website and directing patrons to ask questions based on the specific page they are viewing, setting up automatic pop-up chat windows, and changing patrons’ passive waiting to active engagement. 14 in terms of technology, emerging technologies information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 4 such as patron profiling, natural language processing, and contextual awareness can support the development of reference advisory services in libraries.15 the online reference service provides a 24/7, high-quality, efficient, and personalized service that connects libraries more closely with society and is an important window in the future smart library service system. smart services without personal interactions may become the most popular form of library services development for the future, and research on library smart services has gradually deepened. in terms of conceptual definition, the library community generally understands the concept of library smart services as mobile library services that are not limited by time and space and can help patrons find books and other types of materials in the library by connecting to the wireless internet.16 apart from this, there are two other ways to define library smart services. one discusses the meaning of smart services in an abstract way, such as library smart services that should be an advanced library form dedicated to knowledge services through human-computer interaction, a comprehensive ecosystem.17 the other concretizes the extension of this concept expressed with a formula “smart library = library + internet of things + cloud computing + smar t devices.”18 applied technology research is an important part of smart services in libraries. library smart services have three main features: digitization, networking, and clustering. 
among them, digitization provides the technical basis, networking provides the information guarantee, and clustering provides the library management model of resources sharing, complementary advantages, and common development among libraries.19 the key breakthrough in the development of smart services is the applications deployment of smart technologies to truly realize a new form of integration of online and offline, virtual and reality. 20 the integration of face recognition technology in traditional libraries, as well as its application to services like acces s control management, book borrowing and returning, and wallet payment, can help libraries build smart services faster.21 the integration of deep learning into a mobile visual search system for library smart services can play an important role in integrating multiple sources of heterogeneous visual data and the personalized preferences of patrons.22 blockchain technology, born out of the impact of the new wave of information technology, has also been applied to the construction of smart library information systems because of its decentralized and secure features.23 library smart services can leverage new technologies and smart devices to enhance the efficiency of library contact-free services and provide new opportunities for knowledge innovation, knowledge sharing, and universal participation, thereby enabling innovation in service models. additional research on the development of contactless services in service areas such as library self-services, online reference, and smart services is discussed. in particular, the research and construction of smart library services have been enriched with the advent of big data and artificial intelligence. however, non-contact service has not been systematically researched and elaborated in domestic and international librarianship. the emergence and prevalence of covid-19 has enabled libraries in many countries to practice various types of touch-free services, such as the introduction of postal delivery, storage deposit, and click-and-collect in australian libraries; curbside pickup service or build a book bag service in us public libraries; and delivery book to the building services in chinese university libraries. 24 therefore, a systematic investigation and study of contactless services in public libraries in the pandemic is of great importance for the adaptation and innovation of library services. information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 5 methods survey samples the survey selected some of the most typical public libraries for the study. the selection criteria were those large public libraries in the more economically and culturally developed regions of china. a total of 128 large public libraries were identified, including national libraries, 32 provincial public libraries, and municipal public libraries in the top 100 cities by gdp ranking in 2020, of which five public libraries, including the capital library and nanjing library, are both top 100 city libraries and provincial libraries. these 128 large public libraries can more obviously reflect the current service level of the better developed public libraries in china, and represent the highest level of public library construction in china. (see table 1 for a list of the libraries studied.) table 1. a list of the 128 public libraries that were studied no. library no. library 1. national library of china 2. hebei library 3. shanxi library 4. liaoning provincial library 5. 
jilin province library 6. heilongjiang provincial library 7. zhejiang library 8. anhui provincial library 9. fujian provincial library 10. jiangxi provincial library 11. shandong library 12. henan provincial library 13. hubei provincial library 14. hunan library 15. guangzhou library 16. hainan library 17. sichuan library 18. guizhou library 19. yunnan provincial library 20. shanxi library 21. gansu provincial library 22. qinghai library 23. guangxi library 24. inner mongolia library 25. tibet library 26. ningxia library 27. xinjiang library 28. shanghai library 29. capital library of china 30. shenzhen library 31. guangzhou digital library 32. chongqing library 33. tianjin library 34. suzhou library 35. chengdu public library 36. wuhan library 37. hangzhou public library 38. nanjing library 39. qingdao library 40. wuxi library 41. changsha library 42. ningbo library 43. foshan library 44. zhengzhou library 45. nantong library 46. dongguan library 47. yantai library 48. quanzhou library 49. dalian library 50. jinan library 51. xi’an public library 52. hefei city library 53. fuzhou library 54. tangshan library 55. changzhou library 56. changchun library 57. guilin library 58. harbin library 59. xuzhou library 60. shijiazhuang library 61. weifang library 62. shenyang library 63. wenzhou library 64. shaoxing library information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 6 no. library no. library 65. yangzhou library 66. yancheng library 67. nanchang library 68. zibo library 69. kunming library 70. taizhou library 71. erdos city library 72. public library of jining 73. taizhou library 74. linyi library 75. luoyang library 76. xiamen library 77. dongying library 78. nanning library 79. zhenjiang library 80. jiaxing library 81. xiangyang library 82. jinhua library 83. yichang library 84. huizhou tsz wan library 85. cangzhou digital library 86. zhangzhou library 87. weihai library 88. digital library of handan 89. guiyang library 90. sun yat-sen library of guangdong province 91. ganzhou library 92. baotou library 93. huaian library 94. yulin digital library 95. dezhou network library 96. yuyang library 97. changde library 98. baoding library 99. the library of jiujiang city 100. taiyuan library 101. hohhot library 102. wuhu library 103. langfang library 104. national library of hengyang city 105. maoming library 106. nanyang library 107. heze library 108. urumqi library 109. zhanjiang library 110. zunyi library 111. shangqiu library 112. jiangmen library 113. liuzhou library 114. zhuzhou library 115. xuchang library 116. chuzhou library 117. lianyungang library 118. suqian library 119. mianyang library 120. zhuhai library 121. xinyang library 122. zhoukou library 123. zhumadian library 124. huzhou library 125. lanzhou library 126. fuyang library 127. xinxiang library 128. jiaozuo library survey methods web-based investigation, content analysis, and interviews with librarians were used to assess 128 public libraries in china. the survey was carried out between march 10 and september 15 in 2021. first, the authors identified the media platforms for sharing information about each public library’s contactless services, including an official website, a social networking account on wechat, or a library-developed app. the authors investigated whether these media platforms were updated with information about the contactless services and if they provided various information about these services. 
next, the authors searched the various contactless services offered by each library through these media platforms and recorded them. finally, the authors reviewed the data and findings from the survey to minimize errors and ensure the accuracy of the findings.

findings

touch-free information distribution

the distribution of library information is generally carried out in a touch-free manner. there are three commonly used information media in libraries: the official website, the wechat official account, and a library-developed app. the adoption rate of each information medium was determined by investigating whether libraries had opened that platform and whether the opened platform was updated with service information. the results showed that the information medium with the highest adoption rate was the wechat official account, reaching 100%. the library's official website showed an adoption rate of 94%. only 57% of libraries use apps to distribute contactless information (see fig. 1).

figure 1. percentage of touch-free information distribution platforms in large public libraries in china.

patron services must provide timely and convenient access if public libraries want to effectively expand their patron base or increase library usage. wechat is better adapted to user convenience than websites, which explains its greater utilization as a contactless information dissemination tool for libraries. as a public service institution, the chinese public library has an incomparable impact on politics, economy, and culture. libraries have a great influence on the cultural popularization and educational development of the public. therefore, touch-free information dissemination plays an important role in improving the efficiency of information dissemination. wechat has been fully integrated into china's public library services as a communication tool, allowing libraries to better foster cultural growth. in the process of cultural growth, libraries need to emphasize interactive public participation and combine public culture, social topics, citizen interaction, and media communication, bringing innovative value to promote urban vitality and urban humanism. the widespread use of wechat helps users stay up to date on the newest information and access library resources and services more conveniently.

remote resources services

restrictions on the use of digital resources are closely related to the frequency of patrons' use. restrictive measures that posed obstacles to patrons using digital resources were identified. among the 128 large public libraries surveyed, 42% of libraries require reader card authentication before patrons can access remote resources services, while 8% of libraries do not require users to have reader cards for services. patrons can use the remote resources services of the remaining 49% of public libraries without needing to register for a user account or patron id on the library website. to reduce the risk of infection between librarians and patrons, some libraries adopted non-contact paper document delivery services for users in urgent need of paper books during the pandemic. for example, the peking university library's book delivery to building service (see fig. 2) and xiamen library and wenzhou library's book delivery to home service (see fig. 3) allow patrons to reserve books online, and librarians will express mail the books to patrons' homes according to their needs.

figure 2. peking university library's book delivery service to the building.

figure 3. book delivery service of xiamen library and wenzhou library.

contactless services have two outstanding advantages: services can be obtained without person-to-person contact, and they are convenient. however, if the use of remote resources is restricted in many ways, it will lead to a decrease in the utilization of digital resources in libraries. while intellectual property requirements and concerns must be appropriately managed, public libraries should strive to provide patrons with unlimited access to digital materials and physical print books.

no-touch interaction self-services

no-touch interaction self-services in chinese public libraries mainly include self-checkout, self-retrieval, self-storage, self-printing, self-card registration, and other self-service options, such as self-payment and self-reservation of study rooms or seminar rooms (see fig. 4).

figure 4. percentage of large public libraries in china that provide contactless self-service.

the survey of large public libraries in china shows that the majority offer self-checkout and self-retrieval services. the percentage of public libraries offering self-storage, self-card registration, and self-printing is low, at 50% or less. self-storage, one of the earlier self-services, has a usage rate of 50%. only 34 percent of public libraries offered self-card registration. the self-service card registration machine has four main functions: reader card registration, payment, password modification, and renewal. for example, when patrons need to pay deposits or overdue fines, they can use the self-service card registration machine to swipe their cards and make payments, facilitating subsequent borrowing of various resources. the machine supports face recognition technology for card application and online deposit recharge, catering to the needs of patrons in many aspects of operation (see fig. 5). the proportion offering self-printing is even lower, available at only 15% of libraries. self-card registration and self-printing are both emerging self-service options that require strong financial and technical support and are therefore not widely available.

figure 5. self-service card registration machine in chinese large public libraries.
information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 11 figure 6. changsha library no-touch interaction self-service hall. figure 7. taiyuan library no-touch interaction self-service hall. 24-hour self-service library the 24-hour self-service library, a contactless phenomenon in china’s public libraries, was introduced in 2006 and officially launched in 2007 by dongguan library and followed by shenzhen library’s initial batch of ten self-service libraries. the success of the shenzhen model has sparked a boom in the construction of self-service libraries in china, with 77% of the chinese public libraries surveyed having opened self-help libraries. the development of self-service libraries is divided into two types of service models: space-based self-service libraries (see fig. 8), i.e., unattended libraries with a certain amount of space for use, in which patrons can freely select books and read for leisure, such as 24-hour city bookstores; and a cabinet-type self-service library (see fig. 9), similar to a bank atm with an operating panel and similar in appearance to a bookcase, which allows real-time data interaction with the central library via the network. the eight self-service libraries in taiyuan library in shanxi can provide self-service book borrowing services through the new model of library + internet + credit, which allows patrons to apply for a reader’s card without a deposit and make reservations online and deliver books to the counter (see fig. 10). by cross-referencing the reader’s card with the patron’s face information, the guangzhou self-service library provides self-service borrowing and returning services for patrons through face recognition. there are many similar self-service libraries in china, which provide various types of patron services in different forms, largely reducing direct contact between patrons and librarians, and between patrons and readers. for example, when the pandemic was most severe, data collected from the ningbo self-service library showed that 7,022 physical books were borrowed and returned from january to march 2020, 50% more than in a normal year.25 information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 12 figure 8. space-based self-service libraries. figure 9. cabinet type self-service library. information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 13 figure 10. taiyuan self-service library. the popularity of 24-hour self-service libraries in china is first and foremost due to the strong support and financial investment of government departments in the construction of self -service libraries. secondly, the features of self-service libraries, which are convenient, time-independent, time-saving, efficient, and diversified, are in line with modern lifestyles, integrating public library services into people’s lives, increasing the visibility and penetration of public library patron services, and maximizing patrons’ needs in reading. network services there is a wide range of network services but the most common are seat reservation, online renewal, and overdue fee payment (see fig. 11). the survey found that 89% of chinese public libraries offer at least one of these network services, indicating a high adoption rate of network services. in 2002, online renewals began to appear in china and then gradually became popular. 
most of the public libraries in china provide this service through the patron's personal library account or the wechat official account, and the rate of adoption of online renewal is as high as 85% among the 128 public libraries surveyed. the prevalence of seat reservation services is not high: only 28% of the public libraries surveyed offered seat reservation services.

figure 11. percentage of large chinese public libraries that provide network services.

coverage of the online overdue fee payment service was even lower, with only 21% of public libraries providing access. however, some libraries have replaced the overdue fee system with other methods, such as the shantou library's lending points system. in that system, the initial number of points on a patron's account is 100, with two points added for each book borrowed and one point deducted for each day a book is overdue. when the points on the account reach zero, the reader's card is frozen for seven days and cannot be used to borrow books. after the freeze is lifted, the number of points is reset to 20.26 in summary, contactless services in china's public libraries are moving in a more humane direction.

online reference services

as a type of contactless service, online reference services are extremely helpful in developing access to documentary information resources. the survey shows that 94% of public libraries provide online reference services. online reference services are available by telephone, website, email, qq, and wechat. telephone reference and website reference are the earliest forms of contactless service, with the highest usage rates of 79% and 71%, respectively, among the public libraries surveyed. this is followed by slightly lower coverage of email reference and qq reference, at 55% and 48% respectively. wechat reference coverage is the lowest, at only 16% (see fig. 12). qq and wechat are both tencent's instant messengers, but qq's file-transfer function is slightly stronger than wechat's: qq can send large files of over 1 gb, and files do not expire, making it easy for reference librarians to communicate with patrons.

figure 12. percentage of large public libraries in china that provided online reference service tools.
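before turning to other reference methods, the shantou lending-points scheme described above is simple enough to express directly. the following sketch is a hypothetical illustration, not the library's actual system: it only applies the rules as reported, namely that accounts start at 100 points, gain two points per loan, lose one point per overdue day, are frozen for seven days at zero, and resume at 20 points.

# minimal sketch of the lending-points rules reported for shantou library.
# purely illustrative; the library's real system is not public.
class LendingPoints:
    START = 100            # initial balance on a new account
    BONUS_PER_LOAN = 2     # points added per book borrowed
    PENALTY_PER_DAY = 1    # points deducted per overdue day
    FREEZE_DAYS = 7        # freeze length once the balance reaches zero
    RESUME_POINTS = 20     # balance after the freeze is lifted

    def __init__(self):
        self.points = self.START
        self.frozen_days_left = 0

    def borrow(self):
        if self.frozen_days_left:
            raise RuntimeError("card is frozen; borrowing not allowed")
        self.points += self.BONUS_PER_LOAN

    def overdue(self, days):
        self.points = max(0, self.points - self.PENALTY_PER_DAY * days)
        if self.points == 0:
            self.frozen_days_left = self.FREEZE_DAYS

    def pass_day(self):
        if self.frozen_days_left:
            self.frozen_days_left -= 1
            if self.frozen_days_left == 0:
                self.points = self.RESUME_POINTS

card = LendingPoints()
card.borrow()          # balance rises to 102
card.overdue(110)      # balance drops to zero, card frozen
for _ in range(7):
    card.pass_day()    # freeze lifted on the seventh day
print(card.points)     # 20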
patrons can also choose expert reference and see available reference experts in the expert list and their details, including name, library, title, specialties, status, etc.27 in addition, the hunan library provides joint online reference, which is a public welfare platform of the hunan provincial literature and information resources common construction and sharing collaborative network, to provide online reference services to the public. eleven member units, including hunan library, hunan university library, and hunan science and technology information institute benefit from the rich literature resources, information technology, and human resources of the network, and all sites work together to provide free online reference advice and remote delivery of literature to a wide range of patrons, as well as advisory and tutorial services to guide patrons on how to use the library’s physical and digital resources.28 smart services without personal interactions driven by artificial intelligence, blockchain, cloud computing, and other technologies, libraries are evolving from physical and digital libraries to smart libraries. smart services without personal interactions are a fundamental capability of smart libraries. this survey found that the coverage of 4% 79% 71% 55% 48% 16% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% others telephone website email qq wechat information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 16 smart services was 52%, with virtual reality coverage at 21%, face recognition coverage at 20%, and swipe face to borrow books at 9%. face recognition can be used in library resources services, face gates, security monitoring, self-checkout, and other online and offline real-name identity verification instances, which can improve the efficiency of identity verification. the biggest advantage of face recognition is that it is contactless and easy to use, avoiding the health and safety risks associated with contact identification such as fingerprints. swipe face to borrow books is one of the applications included in face recognition technology that allows patrons to quickly borrow and return books by swiping faces, even if they have forgotten their reader’s card. this technology also tracks the interests of patrons based on their borrowing habits and history records, providing them with corresponding reading recommendation services. it is worth noting that chinese public libraries have a rich variety of smart service methods. in terms of vr technology applications, the national library of china launched the national library virtual reality system in 2008, the first service in china to bring vr technology to the public eye. the virtual reality system provides patrons with the option to explore virtual scenes and interact with virtual resources available in the library. the virtual scenes are distributed by using computer systems to build realistic architectural structures and reading rooms, so that patrons can learn about the library in the library lobby with the help of vr equipment. virtual resources are digital resources presented in virtual form. the technology combines flash and human gesture recognition systems, allowing patrons to flip through books touch-free at virtual reality reading stations, enhancing the reading style and interactive experience. 
in addition, the fuzhou library is concerned with the characteristics of different groups of people and has made virtual experiences a focus of its services, using vr technology to innovate reading methods, such as presenting animal images in 3d form on a computer screen, which has been welcomed by a large number of readers, especially children. shanghai library, tianjin library, shenzhen library, chongqing library, and jinan library have introduced vr technology into their patron services as to attract more users. in terms of blockchain applications, the national digital library of china makes use of the special features of blockchain technology in terms of distributed storage, traceable transmission, and high-grade encryption to provide full-time, full-domain, and full-scene copyright protection for massive digital resources and promotes the construction of intelligent library services. related to big data technology, the shanghai library provides personalized recommendation services for e-books based on the characteristics of the books borrowed by readers. patrons using a mobile phone can scan a code on borrowed books and click on the recommended book’s cover for immediate reading.29 conclusion & recommendations an in-depth analysis of the contactless service strategy will help to steadily improve the smart library development process in public libraries and to support their transition to smart libraries. this report provides a systematic framework for contactless services for public libraries based on a survey and assessment of the contactless service status of large public libraries in china. contactless patron services, contactless space services, contactless self -services, and contactless extension services are the four key components of the framework (see fig. 13). information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 17 figure 13. a systematic framework of contactless services for public libraries. providing contactless patron services patron services are the heart and soul of each public library. the library’s services providing no personal physical contact or touch-free connection with patrons are referred to as contactless patron services. this includes book lending, online reference, digital resources and network reading promotion. at present, most chinese public libraries have few contactless lending options, making it difficult to meet the needs of patrons who cannot access the library due to covid-19 or transportation difficulties for various reasons. therefore, public libraries can enrich their existing book lending methods by providing patrons with contactless services, such as book delivery and online lending, to create a convenient reading environment. a focus on digital resources is fundamental to achieving contactless patron services. at present, some public libraries in china neglect the management of digital resources due to the emphasis on paper resources, and digital resources are not updated and maintained in a timely manner, which leads to the inability of patrons to use them smoothly; therefore, the effective management of digital resources in libraries is crucial. in addition, public libraries can carry out activities such as network reading promotion and reader education to effectively improve the utilization of library resources. 
information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 18 building contactless space services contactless space services refer to the touch-free interaction between physical space and virtual space. physical space services mainly include self-reservation of study rooms, discussion rooms, meeting rooms, as well as providing venues for public lectures or exhibitions, etc., to fulfill the space demands arising from patrons’ access to information. virtual space services mainly include building spaces for collaboration and communication, creative spaces, information sharing spaces, and cultural spaces, providing a virtual integrated environment for patrons’ needs for information exchange and acquisition in the online environment. public libraries can develop their activities through different channels according to the characteristics and elements of physical and virtual spaces, so that libraries can evolve from “library as a place” to “library as a platform.” the combination of an offline library space and an online library platform provides a more convenient and accessible library experience for patrons. implementing no-touch interaction self-services no-touch interactive self-service plays a pivotal role as one of the service forms of the contactless service strategy. it mainly includes no-touch interaction self-services such as information retrieval, resources navigation, self-checkout, and self-printing. public libraries can set up no-touch interaction self-service sections on their official websites or social media accounts to help patrons quickly access up-to-date information from anywhere and at any time. developing contactless extension services in the three dimensions of time, space, and approach, contactless extension services refer to the mutual extension of the library. public libraries can be open year round on a 24/7 basis or during holidays without librarians, allowing patrons to swipe their own cards to gain access. the traditional collection of paper books should not only be available in offline libraries but can extend to individual self-service libraries or city bookshops. libraries can approach patrons with a more individualized service strategy. for example, some public libraries provide a service called build a book bag, where librarians select books according to the patron’s personal interests and reading preferences and deliver them to a designated location. limitations and prospects after analyzing the current status of contactless services in large public libraries in china, this paper finds that contactless services such as reference and access to digital resources are well established in chinese public libraries. on the other hand, the availability of contactless applications such as no-touch interaction self-services, network services, and smart services without personal interaction are less well-developed. despite the rapid development of touch-free services and their variety, public libraries in china have not yet implemented a system of contactless services. this paper proposes a systematic framework to improve the development and practice of contactless services in public libraries and interrupt the spread of covid-19. the framework includes four core modules: contactless patron services, contactless space services, contactless self-help services, and contactless extension services. it is foreseeable that contactless services will become the mainstream of public library services in the future. 
information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 19 endnotes 1 fred griffith, “identification of the meningococcus in the naso-pharynx with special reference to serological reactions,” journal of hygiene 15, no. 3 (1916): 446–63, https://doi.org/10.1017/s0022172400006355. 2 “guiding opinions of the state council on actively promoting the ‘internet +’ action,” 2015, http://www.gov.cn/zhengce/content/2015-07/04/content_10002.htm. 3 d. brooks, “a program for self-service patron interaction with an online circulation file,” in proceedings of the american society for information science 39th annual meeting (oxford, england, 1976). 4 beth dempsey, “do-it-yourself libraries,” library journal 135, no. 12 (2010): 86–93, https://doi.org/10.1016/j.lisr.2010.03.004. 5 jackie mardikian, “self-service charge systems: current technological applications and their implications for the future library,” reference services review 23, no. 4 (1995): 19–38, https://doi.org/10.1108/eb049262. 6 pan yongming, liu huihui, and liu yanquan, “mobile circulation self-service in u.s. university libraries,” library and information service 58, no. 12 (2014): 26–31, https://doi.org/10.13266/j.issn.0252-3116.2014.12.004. 7 chen wu and jang airong, “building a modern self-service oriented library,” journal of academic libraries, no. 3 (2013): 93–96, https://doi.org/cnki:sun:mrfs.0.2016-24-350. 8 rao zengyang, “innovative strategies for university library services in the era of smart libraries,” library theory and practice, no. 12 (2016): 75–76, https://doi.org/10.14064/j.cnki.issn1005-8214.2016.12.018. 9 wang weiqiu and liu chunli, “functional design and model construction of intelligent library services in china based on face recognition technology,” research on library science, no. 18 (2018): 44–50, https://doi.org/10.15941/j.cnki.issn1001-0424.2018.18.008. 10 cheng huanwen and zhong yuanxin, “a three-dimensional analysis of a smart library,” library tribune 41, no. 6 (2021): 43–45. 11 nahyun kwon and vicki l. gregory, “the effects of librarians’ behavioral performance and user satisfaction in chat reference services,” reference & user services quarterly, no. 47 (2007): 137–48, https://doi.org/10.5860/rusq.47n2.137. 12 w. uutoni, “providing digital reference services: a namibian case study,” new library world 119, no. 5 (2018): 342–56, https://doi.org/10.1108/ils-11-2017-0122. 13 zhu hui, liu hongbin, and zhang li, “an analysis of the remote service model of university libraries in response to public safety emergencies,” new century library, no. 5 (2021): 39–45, https://doi.org/10.16810/j.cnki.1672-514x.2021.05.007. https://doi.org/10.1017/s0022172400006355 http://www.gov.cn/zhengce/content/2015-07/04/content_10002.htm https://doi.org/10.1016/j.lisr.2010.03.004 https://doi.org/10.1108/eb049262. https://doi.org/10.13266/j.issn.0252-3116.2014.12.004 https://doi.org/10.14064/j.cnki.issn1005-8214.2016.12.018 https://doi.org/10.15941/j.cnki.issn1001-0424.2018.18.008 https://doi.org/10.5860/rusq.47n2.137 https://doi.org/10.1108/ils-11-2017-0122 https://doi.org/10.1080/24750158.2020.1840719 information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 20 14 xiangming mu, alexandra dimitroff, jeanette jordan, and natalie burclaff, “a survey and empirical study of virtual reference service in academic libraries,” journal of academic librarianship 37, no. 2 (2011): 120–29, https://doi.org/10.1016/j.acalib.2011.02.003. 
15 cheng xiufeng et al., “a study on a library’s intelligent reference service model based on user portraits,” research on library science, no. 2 (2021): 43–55, https://doi.org/10.15941/j.cnki.is sn1001-0424.2021.02.012. 16 m. aittola, t. ryhänen, and t. ojala, “smart library-location-aware mobile library service,” in human-computer interaction with mobile devices and services, international symposium, (2003). 17 chu jingli and duan meizhen, “from smart libraries to intelligent libraries,” journal of the national library of china, no. 1 (2019): 3–9, https://doi.org/10.13666/j.cnki.jnlc.2019.01.001. 18 yan dong, “iot-based smart libraries,” journal of library science 32, no. 7 (2010): 8–10, http://doi.org/10.14037/j.cnki.tsgxk.2010.07.034. 19 wang shiwei, “a brief discussion of the five relationships of smart libraries,” library journal 36, no. 4 (2017): 4–10, https://doi.org/10.13663/j.cnki.lj.2017.04.001. 20 morell d. boone, “unlv and beyond,” library hi tech 20, no. 1 (2002): 121–23, https://doi.org/10.1108/07378830210733981. 21 qin hong et al., “research on the application of face recognition technology in libraries,” journal of academic libraries 36, no. 6 (2018): 49–54, https://doi.org/10.16603/j.issn10021027.2018.06.008. 22 li mo, “research on a mobile visual search service model for smart libraries based on deep learning,” journal of modern information 39, no. 5 (2019): 89–96. 23 zhou jie, “study on the application of lora technology in smart libraries,” new century library, no. 5 (2021): 57–61, https://doi.org/10.16810/j.cnki.1672-514x.2021.05.010. 24 international federation of library associations and institutions, “the covid-19 and the global library community,” 2020, https://www.ifla.org/covid-19-and-the-global-library-field/; guo yajun, yang zinan, and yang zhishun, “the provision of patron services in chinese academic libraries responding to the covid-19 pandemic,” library hi tech 39, no. 2 (2021): 533–48, https://doi.org/10.1108/lht-04-2020-0098; peking university library, “book delivery service to the buildings where the patrons live,” (2020), https://mp.weixin.qq.com/s/eknyg_-_rjrcl6sjc-it-a. 25 hu bin ying yan, “study on the intelligent construction of ningbo library under the influence of epidemic,” jiangsu science & technology information 38, no. 24 (2021): 17–21, https://doi.org/10.3969/j.issn.1004-7530.2021.24.005. 26 shantou library, “come and be a book ‘saint’! city library changes lending rules, points system instead of overdue fees,” 2021, http://www.stlib.net/information/26182. https://doi.org/10.1016/j.acalib.2011.02.003 https://doi.org/10.13666/j.cnki.jnlc.2019.01.001 https://doi.org/10.13663/j.cnki.lj.2017.04.001 https://doi.org/10.1108/07378830210733981 https://doi.org/10.16603/j.issn1002-1027.2018.06.008 https://doi.org/10.16603/j.issn1002-1027.2018.06.008 https://doi.org/10.16810/j.cnki.1672-514x.2021.05.010 https://www.ifla.org/covid-19-and-the-global-library-field/ https://doi.org/10.1108/lht-04-2020-0098 https://mp.weixin.qq.com/s/eknyg_-_rjrcl6sjc-it-a' http://dx.chinadoi.cn/10.3969/j.issn.1004-7530.2021.24.005 http://www.stlib.net/information/26182 information technology and libraries june 2022 contactless services | guo, yang, yuan, ma, and liu 21 27 zhejiang library, “online reference services,” 2020, https://www.zjlib.cn/yibanwt/index.htm?liid=2. 28 hunan provincial collaborative network for the construction and sharing of literature and information resources, “reference union of public libraries in hunan province,” 2021, http://zx.library.hn.cn/. 
29 ministry of culture and tourism of the people’s republic of china, “shanghai library launches personalized recommendation service for e-books,” 2021, https://www.mct.gov.cn/whzx/qg whxxlb/sh/202101/t20210106_920497.htm. https://www.zjlib.cn/yibanwt/index.htm?liid=2 http://zx.library.hn.cn/ https://www.mct.gov.cn/whzx/qgwhxxlb/sh/202101/t20210106_920497.htm https://www.mct.gov.cn/whzx/qgwhxxlb/sh/202101/t20210106_920497.htm abstract introduction literature review methods survey samples survey methods findings touch-free information distribution remote resources services no-touch interaction self-services 24-hour self-service library network services online reference services smart services without personal interactions conclusion & recommendations providing contactless patron services building contactless space services implementing no-touch interaction self-services developing contactless extension services limitations and prospects endnotes letter from the editor: farewell 2020 letter from the editor farewell 2020 kenneth j. varnum information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.13051 i don’t think i’ve ever been so ready to see a year in the rear-view mirror as i am with 2020. this year is one i’d just as soon not repeat, although i nurture a small flame of hope. hope that as a society what we have experienced this year will exert a positive influence on the future. hope that we recall the critical importance of facts and evidence. hope that we don’t drop the effort to be better members of our local, national, and global communities and treat everyone equitably. hope that as a global populace we continue to get into “good trouble” and push back against institutionalized policies and practices of racism and discrimination and strive to be better. despite the myriad challenges this year has brought, it is welcome to see so many libraries continuing to serve their communities, adapting to pandemic restrictions, and providing new and modified access to books and digital information. and equally gratifying, from my perspective as ital’s editor, is that so many library technologists continue to generously share what they have learned through submissions to this journal. along those lines, i’m extending my annual invitation to our public library colleagues to propose a contribution to our quarterly column, “public libraries leading the way.” items in this series highlight a technology-based innovation from a public library perspective. topics we are interested in could include any way that technologies have helped you provide or innovate service to your communities during the pandemic, but could touch on any novel, interesting, or promising use of technology in a public library setting. columns should be in the 1,000-1,500 word range and may include illustrations. these are not intended to be research articles. rather, public libraries leading the way columns are meant to share practical experience with technology development or uses within the library. if you are interested in contributing a column, please submit a brief summary of your idea. wishing you the best for 2021, kenneth j. 
varnum, editor varnum@umich.edu december 2020 https://ejournals.bc.edu/index.php/ital/pllw https://docs.google.com/forms/d/e/1faipqlsd7c0-g-lxetkj2ukjokd7oyt-vprtoizdm1fs8xuhkotctug/viewform https://docs.google.com/forms/d/e/1faipqlsd7c0-g-lxetkj2ukjokd7oyt-vprtoizdm1fs8xuhkotctug/viewform mailto:varnum@umich.edu articles weathering the twitter storm: early uses of social media as a disaster response tool for public libraries during hurricane sandy sharon han information technology and libraries | june 2019 37 sharon han (shrnhan@gmail.com) is candidate for master of science in library and information science, school of information sciences, university of illinois. abstract after a disaster, news reports and online platforms often document the swift response of public libraries supporting their communities. despite current scholarship focused on social media in disasters, early uses of social media as an extension of library services require further scrutiny. the federal emergency management agency (fema) recognized hurricane sandy as one of the earliest u.s. disasters in which first responders used social media. this study specifically examines early uses of twitter by selected public libraries as an information tool during sandy’s aftermath. results can inform uses of social media in library response to future disasters. introduction in the digital age of instantaneous communication, when disasters hit, they hit us all. the fall and winter of 2017-18 brought a literal and figurative deluge to our screens with the arrival of hurricanes harvey, irma, and maria to the united states. within moments of each event, websites and news feeds filled with images of destruction and cries for help. the use of social media to bring awareness to victims’ situations through hashtags and directly tagging first responders underscores the importance of this technological tool in the twenty-first century. in fact, the ubiquity of social media in documenting hurricane harvey have led some to believe that it should be considered the first “social media storm.”1 however, many of the most popular social media platforms have existed since the mid-2000s and have already been used to communicate disasterrelated information since well before harvey reached the united states’ shores. some of social media’s earliest adapters were even public libraries who had the resources and means to use this information technology as a method of connecting with their communities. why should social media matter to public libraries in times of disaster? as a physical manifestation of information access, the public library maintains a relationship with its community that varies across regions, time, and context. currently, the public library as an entity is in an interventionist period, according to jaeger’s article “libraries, policy, and politics in a democracy: four historical epochs,” where its roles and responsibilities are heavily influenced by outside factors, especially the federal government.2 from tax forms to permits to insurance claims, the government encourages people to use the public library to find and use information necessary to navigate american society. public demand for accessing government and other resources is especially apparent after natural disasters, which, due to their unpredictable nature, can heighten weathering the twitter storm | han 38 https://doi.org/10.6017/ital.v38i2.11018 community uncertainty and the need for credible and reliable information. 
public libraries can meet this information need by using social media as one strategy to assess and provide resources in real time. when hurricane sandy made landfall on new jersey's shore on october 29, 2012, it prompted a new era for societal response to emergencies and community needs. due to the hurricane's trajectory into densely populated areas of the american northeast and subsequent widespread flooding, hurricane sandy was the deadliest storm of 2012.3 with initial estimated recovery costs of up to $50 billion, the degree of damage to buildings and infrastructure, and the endangerment of people's safety, made swift and coordinated communication paramount in response efforts. thus, the aftermath of hurricane sandy resulted in federal agencies using social media for the first time in coordinating and implementing disaster response.4 as community-based service providers, many public libraries responded to the hurricane by sharing available resources and services with patrons. however, few studies explicitly examine the use of social media as a tool for libraries to support their communities. this paper explores the role of social media and its impact on public library services in response to hurricane sandy as a measure of libraries using digital media to support their communities. twitter posts from three public libraries impacted by the hurricane are analyzed and compared to reported library services after the storm. the analysis is then used to discuss the use of social media as a library tool and to offer recommendations for social media implementation in future disaster response. background information library response to disasters according to the institute of museum and library services' public library data from 2009 to 2011, over half of all public libraries are located within declared "disaster counties."5 this figure makes disaster response an important topic within public librarianship discourse. in addition to assessing damage to buildings and collections, libraries must also meet the needs of their communities. information needs are heightened after a disaster, as the destruction results in information uncertainty and loss of important resources such as power and telecommunication services.6 consistent and increased use of public libraries is not unusual post-disaster. for example, despite 35 percent of louisiana libraries being closed after hurricane katrina in 2005, a study found that overall library visitor counts only decreased by 1 percent.7 frequent use of library resources after a disaster can be attributed to the library's free and low-cost resources, as well as the institution's reputation as a source for reliable and credible information.8 libraries also extend their resources and services beyond their walls. library bookmobiles and delivery programs provide services to those who are unable to physically visit the library. 
some libraries use their skills in information management and communication to assist local disaster preparedness groups and response teams.9 in 2011, the federal emergency management agency (fema) declared public libraries eligible for temporary relocation funds in the event of an emergency, a distinction once limited to first responders, hospitals, utilities, and schools.10 former executive director of the american library association’s (ala) washington office, emily sheketoff, stated such a distinction recognizes libraries as “essential community organizations.”11 in context with jaeger’s interventionist period, it benefits libraries and government agencies alike to have libraries open to serve communities after a disaster. information technology and libraries | june 2019 39 in the aftermath of hurricane sandy, communities suffered from varying degrees of damage, such as flooding, power outages, debris, and downed trees.12 the impact of the storm drove many community members to their local libraries to seek shelter, charge their electronics, file insurance claims and other e-government forms, drop off or pick up donations, and obtain entertainment.13 despite the many stories of libraries serving disaster victims and working with first responders, such actions have yet to be translated into widespread library policy and procedures. ala provides a “disaster preparedness and recovery” resources webpage, but it primarily focuses on addressing material and structural needs after a disaster, such as mitigating water damage to collections.14 other studies also note a majority of library disaster response literature remains focused on protecting materials.15 such a limited perspective is highlighted in a national survey in which the majority of librarian respondents believed protecting library materials and performing daily services were their primary goals in the event of an emergency.16 as a result, library communication with the community and local organizations remains a relatively unexplored subject in context with disaster response.17 while trade journals and websites publish stories of individual libraries serving their communities, formal studies and research are comparatively scarce. with the widespread use of technology and the internet, one method of communication stands out as an important tool for library outreach and study: social media. disaster response through social media as information providers and advocates of communication technology, libraries should use social media to connect with their communities. although libraries were early adopters of social media prior to hurricane sandy, their use of these tools tends to focus on one-way information sharing instead of a dialogue with their community.18 social media in context of disaster response may upend traditional library social media use, which is why this topic needs further examination. 
social media coupled with mobile technology has created a society in which information sharing and communication are constant and instantaneous.19 since social networking is a relatively new form of media, formal studies on its impact on social behaviors have only come about in the last decade.20 within this young body of literature, however, social media use in disaster response and recovery is a popular topic for researchers, organizations, and federal agencies.21 alexander claims that social media provides the following benefits during disaster response: • provides an outlet to listen and share thoughts, emotions, opinions; • monitors a situation; • integrates social media into emergency plans; • crowdsources information; • creates social cohesion and promoting therapeutic initiatives; • furthers causes; and • creates research data.22 such a comprehensive list is beneficial to this study because it provides a framework through which library social media use can be examined. these benefits stem from the sharing of information with people or entities, which is a large component of library disaster response, as discussed in the previous section. using alexander’s list as a reference, the three main benefits this study examines in context with library disaster response are: weathering the twitter storm | han 40 https://doi.org/10.6017/ital.v38i2.11018 1. monitors a situation. a survey of library patrons impacted by the 2015 south carolina floods revealed all respondents used social media to learn about the flooding and impacted areas.23 people now frequently use social media to get updates on situations, whether they were directly or indirectly impacted by the natural disaster itself. disaster response groups also monitor social media feeds to assess and allocate resources to those in need.24 libraries can use social media feeds to assess resources and services use, plan outreach opportunities, and even inform the public about its own status during the disaster. 2. integrates social media into emergency plans. social media is a low-cost and effective way to coordinate disaster response between organizations and people. much like bookmobiles, social media serves as outreach for librarians to improve service accessibility. librarians can use platforms like twitter and facebook to help coordinate their activities and services alongside with other responders in the community. having an established plan of action where the library’s role and responsibilities are clearly outlined will result in more effective service and efficient response to community needs.25 3. creates social cohesion and promoting therapeutic initiatives. in alignment with the library’s mission of creating and serving communities, social media can act as an extra method of fostering connections in times of need. disaster victims can take advantage of social media’s speed and ubiquity to check in with family, tell them they are safe, and participate in relief efforts.26 social cohesion through platforms such as twitter can also create participatory discourse between people and organizations. 
for example, then-fema administrator craig fugate's recommendation to read to children during the hurricane prompted the hashtag #stormreads to trend on twitter, as many accounts, libraries included, shared their recommended titles.27 library use of social media can also address growing concerns about rumors and misinformation spread during disasters.28 as providers of reliable and accurate information, libraries help establish source credibility and push more accurate resources to misinformed and unaware community members. although there is a substantial amount of research focused on libraries responding to disasters and on social media use during disasters separately, there is a gap in library science literature examining social media as a method of library disaster response. interestingly, formal studies that mention library disaster response note an explicit absence of social media as a form of emergency communication.29 despite the current dearth, library social media studies can develop quickly thanks to the abundant amount of data available on social media platforms. as libraries continue to respond to disasters, they will require more deliberate and planned use of social media as a communication tool. such a need demands a closer examination of how libraries have historically used social media during disasters. case studies: three public libraries and twitter this study examines the social media feeds of three public libraries during and immediately after hurricane sandy made landfall on the northeastern coast as a measure of social media's impact on communication and information sharing among libraries, patrons, and first responders. due to its frequent use for sharing up-to-date information, twitter was the selected social media platform to study.30 the public library systems were selected for this analysis based on their varying characteristics and available literature describing their actions after the hurricane. new york public library (nypl, @nypl), princeton public library (ppl, @princetonpl), and queens library (ql, @queenslibrary) had twitter accounts that were at least two years old by october 2012. all accounts were active during the time period of interest, although the libraries themselves were closed when hurricane sandy made landfall. nypl and ql were closed an additional two days due to damage to several branch libraries.31 these library systems serve varied communities. nypl and ql are urban libraries located in new york city, with 91 and 62 branches respectively, and ppl is a single-branch library located in downtown princeton, new jersey. the larger library systems reported flooding and power outages at several branches from the hurricane, while ppl sustained no structural or internal damages.32 however, all library systems were in communities where large numbers of households lost electricity and internet access and sustained damage from fallen trees and flooding.33 the library systems were mentioned in news reports for services to library patrons affected by the storm, including providing charging stations for electronics, helping people fill out fema insurance forms, running programs for children and adults, and having public computers and wireless connections to access the internet.34 the libraries' coupled use of twitter and active provision of disaster response services make them ideal candidates for examining the correlation between the two activities. 
methodology this study used a filtered search on twitter to identify tweets from each library's feed within the time period of interest. within searches, each tweet was recorded and categorized based on content and message format. a single tweet could have more than one category. common content subcategories were identified to improve analysis. the defined categories are as follows:
• hurricane information: information on the hurricane's status and impact from news and government agencies.
• library policies: information on library policies.
• library policies, renewals/fines: information on renewals and fines during the studied time period.
• library status: information on library branch closures.
• library event/service related to hurricane: event or service specifically planned in response to the hurricane.
• library event/service not related to hurricane: regular library programming; included event/service cancellations as a direct or indirect result of the hurricane.
• non-library event/service related to hurricane: information on non-library sponsored events and services provided in response to the hurricane.
• replies: a publicly posted message from the library to another twitter user.
• social interactions: non-informative and informative tweets aimed at conversing with people or organizations in a social manner.
selected categories were then associated with a corresponding benefit from three of alexander's defined benefits (table 1).35 after categorizing, the collected data was organized for analysis and comparison.
table 1. categories organized by social media benefits.36 each benefit is listed with its associated twitter content categories.
monitoring a situation: hurricane information; library event/service related to hurricane; replies.
integrating social media into emergency plans: library policies; library status; non-library event/service related to hurricane.
creating social cohesion and promoting therapeutic initiatives: library event/service related to hurricane; library event/service not related to hurricane; non-library event/service related to hurricane; replies; social interactions.
results from october 29-31, each library used twitter regularly to provide information or to communicate with library followers. tweet frequencies were counted and compared over the five-day period across libraries (figure 1). while nypl and ql averaged almost 11 tweets per day, ppl had nearly double their numbers, at about 18 tweets per day. nypl and ql had a generally increasing trend in tweets, while ppl's twitter use fluctuated greatly. nypl and ql's low tweet counts during the studied time frame may be attributed to library-wide closures, although only ql's tweet count increased significantly upon reopening. figure 1. number of tweets per day by library. content analysis illustrated variations in twitter use across all three libraries (figure 2). nypl tweeted the most about their library status and renewal/fine policy, with 21 and 17 tweets, respectively. ppl focused more on advertising library events and services such as electrical outlets, heat, internet, and entertainment. they also used twitter heavily for social interactions, which accounted for 35 percent of ppl's 112 tweets, including asking questions, recommending books, thanking concerned patrons, and even apologizing for retweeting too many news articles about the hurricane. 
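the per-day and per-category tallies above were compiled from hand-coded tweets; for a larger sample, the same aggregation could be scripted once the coding is done. the following is a minimal sketch, not part of the original study, assuming a hypothetical csv export with illustrative date, library, and categories columns:

```python
import csv
from collections import Counter

# hypothetical export of hand-coded tweets, one row per tweet, e.g.:
# date,library,categories
# 2012-10-29,NYPL,"library status;library policies, renewals/fines"
TWEETS_FILE = "coded_tweets.csv"  # illustrative file name

per_day = Counter()        # (library, date) -> tweet count (cf. figure 1)
per_category = Counter()   # (library, category) -> tweet count (cf. figure 2)

with open(TWEETS_FILE, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        library = row["library"]
        per_day[(library, row["date"])] += 1
        # a single tweet can carry more than one category, separated here by ";"
        for category in row["categories"].split(";"):
            category = category.strip()
            if category:
                per_category[(library, category)] += 1

for (library, date), count in sorted(per_day.items()):
    print(f"{library}\t{date}\t{count}")
for (library, category), count in per_category.most_common():
    print(f"{library}\t{category}\t{count}")
```

a script like this only aggregates the labels; the categorization itself remains a manual judgment, as described in the methodology.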
ql’s twitter use was more of a mix, often posting about library status and socially interacting with other twitter users. figure 2. twitter content by library. each library also differed in least common content tweeted. nypl had the fewest tweets about the hurricane, non-library services and events related to the hurricane, other library policies, and social interactions. ppl also had few tweets with information about the hurricane and rarely tweeted about fines and renewals. ql had no tweets about the hurricane, nor did they tweet about any library events or programs that were unrelated to their disaster response. discussion the data collected was analyzed to determine whether each library fulfilled the three identified benefits of social media that directly relate to the library’s mission of information access and community building: monitoring a situation, integrating social media into emergency plans, and creating social cohesion and promoting therapeutic initiatives. each library’s consistent responses to twitter users, status updates, and information about library services illustrates they all monitored their communities’ situations and responded accordingly through services and programs, as evidenced in news reports. libraries also used twitter to engage with others and weathering the twitter storm | han 44 https://doi.org/10.6017/ital.v38i2.11018 create a social network of library patrons and local institutions. based on the lack of information about the storm itself and few recommendations for non-library disaster response group resources, it is not apparent libraries integrated social media as part of their emergency policy and procedures. this also resulted in a dissonance between library action and their online communication. one notable example: many news reports described librarians aiding patrons with finding and filling out fema insurance forms, but only one of the 196 tweets analyzed in this study advertised fema assistance at the library.37 ppl tweeted several posts illustrating library use by affected patrons, but also emphasized they were at capacity due to large visitor numbers and shortages in charging stations and internet bandwidth. ppl also failed to offer alternatives on twitter to meet patron information needs. the lack of a coordinated effort perhaps can be explained in two parts. first, as no two disasters are alike, library response is often a direct reaction to the event and damages to their institution and community. a busy library would logically place social media communication and coordination as a lower priority than other immediate, tangible needs. second, librarians may not make a concerted effort to use social media if they are trained to prioritize protecting library collections and conducting regular services.38 while digital and outreach services such as bookmobiles have been common components of libraries, there is still a noticeable gap in libraries extending these same services using online tools. the libraries in this study used social media as a part of their disaster response, but the lack of planning resulted in each library’s twitter feed acting more as a “triage center,” providing basic assistance as the need arose, rather than an extension of in-house services. takeaways and further research while these libraries provided much needed services in the aftermath of hurricane sandy, their implementation of social media as a communication and information-sharing tool illustrates opportunities to develop more coordinated efforts. 
as library presence on and use of social media continues to grow, it should be considered as a necessary component of library disaster response and collaboration with other government agencies and first responders. while libraries are qualified for fema funding, it is uncertain that local first responder groups are aware of the services and benefits libraries provide post-disaster at all. as of 2013, the u.s. department of homeland security’s virtual social media working group did not include any library organizations, which leaves libraries out of crucial conversations in designing comprehensive disaster response plans.39 in an effort to participate in productive discourse, librarians also need to improve their social media use to better align with their practice when serving distressed communities. while the exact reasons for librarians’ lack of effective social media use in disaster response remains speculative, other research has shown that training opportunities for social media use in libraries remain scarce and not very effective.40 since hurricane sandy, social media has only grown as a powerful tool for people and communities, rendering it an essential skill for librarians today. this should motivate librarians, library associations, and other professional groups to consider developing effective training and workshops geared towards intentional use of social media. despite its power, social media should be seen as a complementary tool to enhance information services for community members. it will optimize the library’s reach, but it cannot completely replace current methods of outreach, nor should it. this is especially important when considering information technology and libraries | june 2019 45 who benefits the most from libraries, many of whom do not necessarily have consistent access to social media.41 social media use varies across age, socioeconomic status, digital access, and education levels, making it important for librarians to consider whose information needs are and are not being met online. considering such limitations, learning impactful social media skills and creating a support network amongst disaster response groups will enable libraries to effectively develop outreach strategies and improve disaster response services. the discussion and takeaways highlight the necessity for further research on social media use in library disaster response. as the history of library development and service informs the direction of libraries today, so too should historic uses of social media as a library service tool guide future work. continuing research may include case studies of public library response to recent disasters, which would provide better insight into the developing use of social media. the identified patterns and strengths can be used to guide future work in incorporating effective social media policies and protocols in library disaster plans. considering social media usage by first responders and federal agencies, future research should also include a closer examination of relationships between public libraries, first responders, and disaster information providers in improving coordinated response efforts. conclusion when disaster strikes, many communities exhibit a great need for resources and information. despite libraries providing much needed service and resources to community members after natural disasters, their use of social media platforms as a tool remains overlooked. 
this study examines historical use of social media as a communication and service tool between libraries, community members, and disaster response groups in the aftermath of hurricane sandy. the effectiveness of social media use was evaluated using alexander’s review of social media benefits and compared with descriptions of post-sandy library resources and services described in the literature. the study found social media use to be highly variable based on content and correlations with reported in-house library services. there was no sign of a coordinated effort with other disaster response groups, and the primary objective of their twitter accounts was connecting with patrons and other organizations through social interactions. improvements to social media use could be achieved through intentional coordination with first responders, directed training, and evaluating social media’s strengths and limitations in disaster response. if libraries wish to continue providing pertinent information, they need to adapt to communication methods used by their community. with social media’s strong presence in society, suburban and urban libraries such as the ones examined in this study should improve their use of social media as an effective information sharing and communication tool. continuing to examine and assess uses of social media as a disaster response tool can help shape policies and procedures that will enable libraries to better serve their communities. references 1 maya rhodan, “‘please send help.’ hurricane harvey victims turn to twitter and facebook,” time, aug. 30, 2017, http://time.com/4921961/hurricane-harvey-twitter-facebook-socialmedia/. weathering the twitter storm | han 46 https://doi.org/10.6017/ital.v38i2.11018 2 paul t. jaeger et al., “libraries, policy, and politics in a democracy: four historical epochs,” library quarterly 83, no. 2 (apr. 2013): 166–81, https://doi.org/10.1086/669559. 3 virtual social media working group and dhs first responders group, “lessons learned: social media and hurricane sandy", u.s. department of homeland security, june 2013, https://www.dhs.gov/sites/default/files/publications/lessons%20learned%20social%20me dia%20and%20hurricane%20sandy.pdf. 4 virtual social media working group and dhs first responders group. 5 bradley w. bishop and shari r. veil, “public libraries as post-crisis information hubs,” public library quarterly 32 (2013): 33–45, https://doi.org/10.1080/01616846.2013.760390. 6 bishop and veil. 7 bishop and veil. 8 bishop and veil; jingjing liu et al., “social media as a tool connecting with library users in disasters: a case study of the 2015 catastrophic flooding in south carolina,” science & technology libraries 36, no. 3 (july 2017): 274–87, https://doi.org/10.1080/0194262x.2017.1358128. 9 charles r. mcclure et al., “hurricane preparedness and response for florida public libraries: best practices and strategies,” florida libraries 52, no. 1 (2009): 4–7. 10 michael kelley, “ala midwinter 2011: fema recognizes libraries as essential community organizations,” school library journal, jan. 11, 2011, http://lj.libraryjournal.com/2011/01/industry-news/ala-midwinter-2011-fema-recognizeslibraries-as-essential-community-organizations/. 11 kelley. 12 maureen m. garvey, “serving a public library community after a natural disaster: recovering from ‘hurricane sandy,’” journal of the leadership & management section 11, no. 2 (spring 2015): 22–31; cathleen a. 
merenda, “how the westbury library helped the community after hurricane sandy,” journal of the leadership & management section 11, no. 2 (spring 2015): 32– 34. 13 sarah bayliss, shelley vale, and mahnaz dar, “libraries respond to hurricane sandy, offering refuge, wifi, and services to needy communities,” school library journal, nov. 1, 2012, http://www.slj.com/2012/11/public-libraries/libraries-respond-to-hurricane-sandy-offeringrefuge-wifi-and-services-to-needy-communities/; joel rose, “for disaster preparedness: pack a library card? : npr,” npr, aug. 12, 2013, https://www.npr.org/2013/08/12/210541233/for-disasters-pack-a-first-aid-kit-bottledwater-and-a-library-card. 14 “disaster preparedness and recovery,” ala advocacy, legislation & issues, 2017, http://www.ala.org/advocacy/govinfo/disasterpreparedness. information technology and libraries | june 2019 47 15 bishop and veil, “public libraries as post-crisis information hubs.” 16 lisl zach, “what do i do in an emergency? the role of public libraries in providing information during times of crisis,” science & technology libraries 30, no. 4 (sept. 2011): 404–13, https://doi.org/10.1080/0194262x.2011.626341. 17 bishop and veil, “public libraries as post-crisis information hubs.” 18 liu et al., “social media as a tool connecting with library users in disasters: a case study of the 2015 catastrophic flooding in south carolina”; zach, “what do i do in an emergency?” 19 virtual social media working group and dhs first responders group, “lessons learned.” 20 david alexander, “social media in disaster risk reduction and crisis management,” science & engineering ethics 20, no. 3 (sept. 2014): 717–33, https://doi.org/10.1007/s11948-013-9502z. 21 alexander; liu et al., “social media as a tool”; virtual social media working group and dhs first responders group, “lessons learned.” 22 alexander, “social media in disaster risk reduction.” 23 liu et al., “social media as a tool.” 24 alexander, “social media in disaster risk reduction.” 25 bishop and veil, “public libraries as post-crisis information hubs.” 26 alexander, “social media in disaster risk reduction.” 27 bayliss, vale, and dar, “libraries respond.” 28 liu et al., “social media as a tool.” 29 liu et al.; zach, “what do i do in an emergency?” 30 deborah d. halsted, library as safe haven: disaster planning, response, and recovery: a how-todo-it manual for librarians, first edition (chicago: american library association, 2014). 31 george m. eberhart, “libraries weather the superstorm,” american libraries magazine, nov. 4, 2012, https://americanlibrariesmagazine.org/2012/11/04/libraries-weather-thesuperstorm/; rose, “for disaster preparedness.” 32 bayliss, vale, and dar, “libraries respond”; eberhart, “libraries weather the superstorm”; rose, “for disaster preparedness.” 33 bayliss, vale, and dar, “libraries respond.” weathering the twitter storm | han 48 https://doi.org/10.6017/ital.v38i2.11018 34 bayliss, vale, and dar; eberhart, “libraries weather the superstorm”; lisa epps and kelvin watson, “emergency! how queens library came to patrons’ rescue after hurricane sandy,” computers in libraries 34, no. 10 (dec. 2014): 3–30; rose, “for disaster preparedness.” 35 alexander, “social media in disaster risk reduction.” 36 benefits listed and defined in alexander, david. “social media in disaster risk reduction and crisis management.” science & engineering ethics 20, no. 3 (sept. 2014): 717–33. https://doi.org/10.1007/s11948-013-9502-z. 
37 eberhart, “libraries weather the superstorm”; rose, “for disaster preparedness.” 38 zach, “what do i do in an emergency?” 39 virtual social media working group and dhs first responders group, “lessons learned.” 40 rachel n. simons, melissa g. ocepek, and lecia j. barker, “teaching tweeting: recommendations for teaching social media work in lis and msis programs,” journal of education for library and information science 57, no. 1 (dec. 1, 2016): 21–30, https://doi.org/10.3138/jelis.57.1.21. 41 alexander, “social media in disaster risk reduction.” microsoft word 13353 20211217 galley.docx article diversity, equity & inclusion statements on academic library websites an analysis of content, communication, and messaging eric ely information technology and libraries | december 2021 https://doi.org/10.6017/ital.v40i4.13353 eric ely (eely@wisc.edu) is a phd candidate in the information school, university of wisconsin-madison. © 2021. abstract post-secondary education in the 21st century united states is rapidly diversifying, and institutions’ online offerings and presence are increasingly significant. academic libraries have an established history of offering virtual services and providing online resources for students, faculty, staff, and the general public. in addition to these services and resources, information on academic library websites can contribute to an institution’s demonstration of value placed on diversity, equity, and inclusion (dei). this article analyzes the dei statements of a library consortium’s member websites to explore how these statements contribute to institutional construction of, and commitment to, diversity, equity, and inclusion. descriptive analysis revealed 12 of 16 member libraries had explicitly labeled dei statements in november 2020, with an additional member updating their website to include such a statement in early 2021. content analysis examined how the existing statements contributed to institutional value of and commitment to dei, and multi-modal theory explored the communicative aspects of dei statement content. analysis revealed vague conceptualizations of diversity and library-centered language in dei statements, while a subset of statements employed anti-racist and social justice language to position the library as an active agent for social change. implications and avenues for future research are discussed. introduction according to the national center for education statistics, 44% of us resident students attending us degree-granting postsecondary institutions were non-white during the fall 2017 term.1 academic libraries can utilize their online presence to engage diverse students. these sites provide users with various services, resources, and information. the convenience of remote access may encourage physical library use and encourage lasting library utilization. clearly demonstrating institutional values of diversity, equity, inclusion (dei) sends a message to users. given the amount and variety of content on academic library websites, creating a shared vision regarding the purpose of academic library websites is challenging.2 as instances of racial discrimination and marginalization continually occur within society, academic libraries can position themselves as agents for social justice via the presence and content of dei statements. 
in addition to social justice and student demographics, professional values, outlined in the american library association’s bill of rights, demonstrate the need for academic libraries to adequately serve non-white students.3 this article examines the dei statements on academic library websites and examines the presence, or lack thereof, and content of these statements to address the following research question: how do dei statements on academic library websites contribute to the construction of institutional value of diversity, equity and inclusion? information technology and libraries december 2021 diversity, equity & inclusion statements on academic library websites | ely 2 literature review literature regarding dei in academic libraries is plentiful.4 in addition to abundant scholarly research, the association of college and research libraries (acrl) addressed diversity via the “diversity standards: cultural competency for academic libraries,” while equity, diversity and inclusion are part of acrl’s timeless core ideology.5 the association of research libraries (arl) has similarly addressed diversity via the spec kit 356: diversity and inclusion, which compiled information regarding recruitment and retention of minority librarians, strategies for fostering inclusive workplaces, and diversity programs and assessment.6 additionally, the american library association (ala) announced the formation of a joint task force to create a framework for cultural proficiencies in racial equity.7 despite ample research and professional attention to dei, surprisingly, no studies have explicitly examined academic library dei statements and few studies have examined diversity content on library websites. academic libraries and dei statements examining website diversity content, mestre reviewed 107 arl member websites for the presence and visibility of diversity content.8 employing content analysis, mestre found that diversity language which focused on ethnic and racial diversity, particularly for black, latinx, native americans, and asian americans, was most included in a strategic plan (37%, n=39) and in a values statement (27%, n=29). member sites that included diversity in a mission (16%, n=17), vision (14%, n=15) or diversity (13%, n=14) statement were less frequent. generally, across types of diversity-related links and information, diversity content was limited on the arl sites and, when present, was often difficult to find, situated deeply within a website behind multiple layers, or requiring a site search to locate. academic library mission statements expanding the scope to include mission statements yields literature that examines communicating purpose. salisbury and griffis examined the presence and placement of mission statements on 113 arl websites.9 operating under the principle that considers website content as hierarchical (e.g., the most important information is most visible), the authors documented the number of steps necessary to reach the library’s mission statement. eighty-four percent (n=95) of library websites contained a mission statement and 3.5% (n=4) of libraries contained a direct link from the homepage. the authors identified a visibility issue, as mission statements on 14% (n=16) of websites required one click to access, but only four were clearly labeled as mission statements. despite this issue, salisbury and griffis found that mission statements were available in two steps or fewer in over 60% (n=72) of libraries. 
the authors’ findings indicate that academic libraries acknowledge the need to make their mission statements more visible to various stakeholders. academic libraries are responsible to the institutions within which they are situated, making their institutions a primary stakeholder. wadas employed discourse analysis and compared library and institutional mission statements from 44 colleges and universities and found 14 (31.8%) institutions, “showed a discernable degree of agreement between the college or university and library mission statements,” while the remaining 30 showed none.10 like salisbury and griffis’ finding regarding the lack of explicit labeling, wadas also identified a labeling inconsistency regarding the information in, and purpose of, each statement type. wadas’ analysis also identified a prevailing sense of vagueness across college/university and academic library mission statements, further contributing to confusion regarding statements’ purpose and intended messages.11 information technology and libraries december 2021 diversity, equity & inclusion statements on academic library websites | ely 3 mission statements, strategic plans and diversity: content analyses wilson, meyer, and mcneal examined institutional mission statements and other diversity-related content on 80 websites of institutions of higher education in the united states.12 fifty-nine (75%) referenced diversity in their mission statements and 52 (65%) had a separate diversity statement. of these statements, wilson et al. found most diversity references fell into two areas: population demographics (student body racial or ethnic composition) and cultural vitality (incorporating various cultures within the campus community). of the 59 institutions that referenced diversity in the mission statement, 63% were related to changing student demographics, while 55% referenced cultural diversity. furthermore, less than 10% of the statements included language that fell into both categories, indicating institutions conceptualized diversity in one area or the other. in addition to formal mission statements, wilson et al. found that 52 (65%) of the institutions contained other diversity content. given these findings, the authors state their disappointment in the 25%–35% of institutions that did not include diversity content in official, primary statements. recognizing the rapid developments affecting the lis field, saunders employed content analysis and examined the publicly available strategic plans of 63 acrl institutions.13 saunders’ analysis indicated that while 40 (63.5%) libraries alluded to institutional mission, goals, or strategic plan to some degree, only 17 (27%) made explicit connections, results similar to wadas’ findings.14 regarding specific content, saunders categorized themes into three tiers: major emphasis (>75% of strategic plans), second tier, and other areas of emphasis. saunders’ analysis revealed that strategic plan diversity content was a second-tier issue related to library staff. saunders found the term diversity was used in two ways: to refer to expertise, skills, and abilities; and to delimitate demographic characteristics, including ethnicity, nationality, or language.15 like wilson et al., saunders’ findings demonstrate academic libraries’ recognition of the importance of diversity in higher education.16 methodology this study employed content analysis and examined uborrow consortium members’ library websites (see appendix a for a list) for the presence and content of dei statements. 
uborrow is an interlibrary loan service composed of big ten academic alliance members, plus the university of chicago and the center for research libraries, in which "users at member institutions are granted access to the collective wealth of information of the entire consortium."17 uborrow members leverage individual campus resources to collaboratively assist the academic pursuits of students and faculty of each institution via the expedited sharing of resources. these libraries were chosen as representative of a model consortium and are a reasonable focus for examination. content analysis is a research technique for making replicable and valid inferences from texts or other forms of contextually based data. it allows for data analysis "in view of the meanings, symbolic qualities, and expressive contents they have and of the communicative roles they play in the lives of the data's sources."18 content analysis provides a foundation for understanding how messages and meanings are constructed. as such, content analysis is appropriate for analyzing the content and meanings of dei statements on uborrow websites. additionally, this study utilized multimodal theory, particularly lemke's hypermodality and three communicative acts: organizational, presentational, and orientational.19 examining dei statements as multimodal texts allows for the analysis of meaning making and construction across each type of act. just as users make meanings across sentences, paragraphs, and pages, users likewise make meanings from the ways in which they interact with digital information.20 the organizational aspect provides a way to examine the spatial arrangement of library websites, for example, libraries that dedicate entire webpages to dei statements or those in which these statements share pages with other content. content analysis provides a way of examining the presentational aspect of information, the ideational content of texts, in this case how dei statements are presented on uborrow websites. content analysis also provides a way to examine the orientational aspect, which indicates the nature of the communicative relationship, by exploring how libraries establish relations with those with whom they are communicating, for example, how the presence of dei statements positions libraries as conscientious entities engaged in the promotion of diverse and inclusive environments. i examined each uborrow member website for an explicit dei statement. informed by previous literature, i created an excel spreadsheet and entered data for each institution, including institution name, library website url, dei statement (yes/no), homepage link (yes/no), dei statement url, and notes, following a standardized process. first, i recorded the library's homepage. next, i searched the homepage for a dei statement link. if found, i indicated the presence in the yes/no columns and recorded the url. only direct links to library dei statements were marked as yes in the homepage link column. if no homepage link existed, i searched the library websites using the following terms: diversity, equity, and inclusion. when it was difficult to locate dei statements, i utilized the chat feature or e-mailed library administrators to ensure i was not overlooking relevant information. i conducted an initial search in july 2020 and a subsequent search in november 2020; no changes to explicit dei statements occurred between the two searches. 
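the homepage check described above was done manually; a scripted first pass could flag candidate links for the same kind of review. the following is a minimal sketch, not part of the original study, assuming a hypothetical sample of homepage urls and an illustrative output file name:

```python
import csv
import requests
from bs4 import BeautifulSoup

# illustrative sample of member homepages; the study itself covered all 16 members
HOMEPAGES = {
    "example university a": "https://library.example-a.edu/",
    "example university b": "https://library.example-b.edu/",
}
TERMS = ("diversity", "equity", "inclusion")

rows = []
for institution, url in HOMEPAGES.items():
    homepage_link, dei_url = "no", ""
    try:
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        # look for an anchor whose visible text mentions any of the dei terms
        for a in soup.find_all("a", href=True):
            text = a.get_text(" ", strip=True).lower()
            if any(term in text for term in TERMS):
                homepage_link, dei_url = "yes", a["href"]
                break
    except requests.RequestException:
        homepage_link = "error"  # flag for manual follow-up

    rows.append([institution, url, homepage_link, dei_url])

with open("dei_homepage_links.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["institution", "library website url",
                     "homepage link (yes/no)", "dei statement url"])
    writer.writerows(rows)
```

a flagged link is only a candidate; whether the destination is an explicitly titled dei statement, rather than, say, an event announcement, still requires the manual verification described above.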
i conducted a follow-up search in april 2021. in the intervening months, one major change occurred on the university of minnesota libraries website; implications of this change are discussed below. once dei statements were identified, i examined the pages on which they were located. first, i examined page organization. in this step, i noted whether the dei statement was the sole content on a page and, if not, i noted the accompanying content. this analysis focused on lemke's traversals, or the varied paths available to users in their search and navigation of websites.21 second, i analyzed dei statement content and identified the ways libraries presented their statements. this step included an examination of the language used in the dei statement. third, i expanded upon the presentational analysis and considered the ways dei statement content oriented the library toward users by exploring how statement language contributed to portraying the library in a certain way. this analysis focused on two areas: library-centered language common across statements and social justice language, which a subset of libraries' dei statements employed. limitations uborrow is a single 16-member library consortium. further research on similar consortia or library associations would help address this study's limited sample size. this study focused on explicit dei statements, thereby excluding other forms of dei content (e.g., announcements, marketing material, events). further research employing a broader view of dei content on academic library sites would also build on this study's findings. finally, this study represents library websites during a snapshot in time. findings and analysis twelve (75%) uborrow member websites had an explicitly titled dei statement in november 2020, and 13 had an explicitly titled and labeled dei statement as of april 2021 (see table 1). in november 2020, the university of minnesota had a clearly defined statement; however, this statement was untitled, and its location was unique among websites during the initial search. initially, this statement was not considered an explicit statement due to its lack of a title, the implications of which are discussed in detail below. however, between november 2020 and april 2021, the university of minnesota libraries updated their homepage to include a link to a clearly defined and labeled dei statement, which university librarian and dean of libraries lisa german approved on february 1, 2021. for this reason, the university of minnesota libraries website receives unique discussion in the analysis that follows. three additional consortium members did not have an explicit dei statement. table 1. 
uborrow member libraries and the presence of dei statements (institution: explicit dei statement, y/n)
university of chicago: yes
center for research libraries: no
university of illinois at urbana-champaign: yes
indiana university: yes
university of iowa: yes
university of maryland: yes
university of michigan: yes
university of minnesota: no (fall 2020), yes (spring 2021)
michigan state university: yes
university of nebraska: yes
northwestern university: no
ohio state university: yes
penn state university: yes
purdue university: no
rutgers university: yes
university of wisconsin-madison: yes
while 12/13 of the 16 uborrow members contained an explicitly labeled dei statement, all member institutions addressed dei in some form, including libguides, links to library resources, library events, and statements responding to specific societal events. however, the degree to which additional dei content was prominent varied, with some content buried deep within library sites, as mestre's work indicated.22 dei statement analysis: organization, presentation, and orientation the following section presents the descriptive analysis of the findings utilizing content analysis and lemke's organizational, presentational, and orientational communicative aspects.23 analysis focused on how the organization and presentation of dei statements contributed to the construction of meaning, on the content of dei statements, and on how content is oriented toward users in ways that position academic libraries as conscientious entities via their dei statements. organizational aspect of dei statements the organizational aspect of communication is instrumental and organizes and composes content in such a way that it is coherent and cohesive.24 organizational meanings have practical consequences, as the example of the university of minnesota's library website demonstrates. unique among uborrow sites as of november 2020, the university of minnesota libraries' website contained a clearly focused, although untitled, dei statement. this statement is accessed via the about option on the homepage's menu (see figure 1), which includes dropdown links to library policies, library overview, and the untitled dei statement. figure 1. the university of minnesota libraries' homepage (november 2020). the statement's placement is problematic for several reasons. first, the statement is easy to overlook. the researcher and a library staff member who responded to the researcher's query via library chat both overlooked the statement; only when a third staff member was consulted was the statement identified. second, lemke discusses the affordances of hypertext and the many ways users can navigate websites, calling possible paths traversals.25 among the most basic is the visual-organizational traversal, which considers how webpage composition guides users' eyes across the page. in this instance, the links are a call to action and signify to users that clicking on a link will transport them to a page with more information. static text on a webpage does not offer the same affordance. as a block of text located next to two panes of links, the statement is static, passive, and non-interactive, contributing to the ease with which users can overlook the statement. finally, this statement did not appear in the results of a library site search for the terms diversity, equity, or inclusion. 
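this kind of findability gap, where content is present in the page body but absent from the page title, headings, and site search results, can also be checked programmatically. the following is a minimal sketch, not part of the original study, assuming a hypothetical page url, that reports where each term appears:

```python
import requests
from bs4 import BeautifulSoup

TERMS = ("diversity", "equity", "inclusion")
PAGE_URL = "https://library.example.edu/about"  # illustrative url

soup = BeautifulSoup(requests.get(PAGE_URL, timeout=10).text, "html.parser")

title = (soup.title.get_text(strip=True) if soup.title else "").lower()
headings = " ".join(
    h.get_text(" ", strip=True) for h in soup.find_all(["h1", "h2", "h3"])
).lower()
body = soup.get_text(" ", strip=True).lower()

for term in TERMS:
    found_in = [label for label, text in
                (("title", title), ("headings", headings), ("body", body))
                if term in text]
    # a term found only in the body mirrors the untitled-statement problem:
    # present on the page, but easy to miss when scanning or searching
    print(f"{term}: {', '.join(found_in) if found_in else 'not found'}")
```

a page where all three terms appear only in the body would be flagged for the kind of closer review described here.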
given the various ways users can transverse a website, including actively searching for information, the lacking title makes this statement difficult to locate via scanning and searching. in users’ traversals of websites, two common approaches, identifying links or actively searching for desired information, are not applicable in locating the university of minnesota libraries’ dei statement. in the intervening months, between november 2020 and april 2021, the university of minnesota libraries website was updated to include an explicitly titled and labeled dei statement, available via a link from the homepage, prominently located in the upper right quadrant between the menu bar and hours and locations information (see figure 2). information technology and libraries december 2021 diversity, equity & inclusion statements on academic library websites | ely 7 figure 2. the umn libraries’ homepage (april 2021). this statement, written by the university libraries’ diversity, equity, and inclusion leadership committee, was approved on february 1, 2021. presented on a standalone page, this statement is similar to those of eight other uborrow consortium members, which are discussed in the next section. organization: stand-alone dei statements of the websites that contained explicitly labeled dei statements, eight libraries dedicated an entire page to the dei statement (nine including the umn libraries update). examination of these webpages revealed similar page titles, with variance according to the terms included. some page titles only included diversity, while others included diversity, equity, and inclusion. the university of michigan was unique as it also included accessibility. the relative consistency across these titles contributes to less frustrating and confusing user experience. clear and descriptive titles provide a positive experience for users accessing pages with an assistive screen reader. logistically, clear titles amplify page presence on searches conducted via google or other search engines. in addition to webpage titles, examination of the eight/nine pages revealed a relatively similar page organization and structure. each page contained headings that included some or all the terms diversity, equity, or inclusion. the pages were text heavy, with the university of nebraska and penn state university the only two whose pages included visual representations of diversity (i.e., images containing multi-racial groups). furthermore, the detail level of libraries’ dei statements were relatively consistent across the eight/nine webpages. while the page titles, organization, and detail of dei statements were similar, differences existed in the amount of additional dei content. for example, along with their dei statement, the university of maryland libraries’ diversity page defined diversity, an equitable environment, and inclusion. the university of michigan library followed their explicit dei statement with information relating the statement to the library’s collections, services, spaces, and people. other library webpages did not contain as much other on-page information. for example, rutgers university libraries links to various dei resources, which was another common trait (see figure 3). although clicking links requires additional steps to reach dei content, the presence of links is significant in consideration of lemke’s traversals. information technology and libraries december 2021 diversity, equity & inclusion statements on academic library websites | ely 8 figure 3. 
rutgers libraries’ diversity homepage with dei links. of the many types, a common organizational traversal is what lemke terms cohesive, in which “each element is an instance of some general category, and therefore with some thematic and/or visual similarities to the others, and as we catenate them we are cumulating toward an exhaustive exploration of the category”.26 the links on the rutgers university libraries’ diversity page allow users to traverse the library’s dei content, along with institutional dei content, as several links direct users to diversity pages external to the library. these links serve as calls to action and require users to click for more information. associated to dei content via the categorical connection, these links allow users to fully explore and expand upon the information found on the library’s dei statement page, allowing users to create their own meaning of library commitment to dei. user creation of meaning is in opposition to the library making this decision for the user, as when dei statements are placed on pages with other content, as is the case in four uborrow member websites. organization: shared dei statements unlike the libraries that dedicated a page to dei statements, variety exists in the page titles of the four libraries on which dei statements share pages with other content. dei statements are available via the about section of the library’s website, while of these, two are further couched on administration pages. the pages on which dei statements shared space exemplify mestre’s finding that dei content situated deeply within a website are difficult to locate.27 furthermore, the location of dei content is not entirely intuitive, making a user’s traversal to locate desired information less cohesive. michigan state university’s (msu) dei statement is found on the information technology and libraries december 2021 diversity, equity & inclusion statements on academic library websites | ely 9 library’s strategic plan page. rather than in a single statement, dei content is spread across the library’s strategic plan, including an inclusivity statement; a vision statement; and diversity, equity, and inclusion strategic direction (see figure 4). also, unlike the pages singularly devoted to a dei statement, msu’s strategic plan page was comparatively static with no links to other library or institutional dei content. the lack of links does not allow users to traverse msu’s site for dei content as easily due to the page’s static nature, making it difficult for users to, “construct a traversal which is more than the sum of its parts.”28 in this way, msu constructs the meaning of their commitment to dei via the limitations and restrictions on users’ interaction opportunities with the page on which their dei content is situated. figure 4. michigan state university libraries’ diversity content as part of strategic plan. organization: homepage links homepage links to dei statements were present on seven (58%) library homepages. when present, homepage links were located at two locations: in the menu or page footers. additionally, two levels of clarity existed regarding homepage links, as some sites contained an explicitly labeled link, while others required a two-step process to access the dei link. 
for example, the university of iowa and penn state university libraries each had a clearly labeled link to their dei statement available via a single click on their library homepages. contrastingly, michigan, maryland, nebraska, ohio state, and rutgers all required users to first navigate a menu bar to find a link to the library's dei statement. this two-step process requires more time and effort, whereas direct links require one less step. however, the university of iowa's direct link sits in the library homepage's footer (see figure 5), and penn state university's direct dei link is located near the bottom of the page, requiring users to scroll through the entire page. although requiring an extra step, libraries with a menu link at the top of the homepage, such as the ohio state university (see figure 6), do not require scrolling. a tradeoff therefore exists between page location and the number of steps needed to locate a link to the library's dei statement when a homepage link is present. regardless of the homepage location, the presence of links to dei statements provides relatively easy access, making a user's traversal to these statements relatively effortless and straightforward.

figure 5. university of iowa libraries' homepage dei statement footer link.

figure 6. the ohio state university libraries' homepage dei statement menu link.

presentational aspect of dei statements

lemke defines presentational meanings as those that present some state of affairs, construed from connections among processes, relations, events, participants, and circumstances, and significant for institutional purposes.29 users see the product of the actions that result in public dei statements. the discussions, meetings, efforts, and decisions that contribute to dei statements on library websites are concealed. the presence of dei statements represents the hidden work necessary for their creation, making dei statement content the library's presentation of its commitment to diversity, equity, and inclusion.

presentation: vague language and diversity conceptualizations

examining the content of the 12/13 libraries with an explicit dei statement revealed that these statements are frequently vague. many statements do not include specific language identifying what diverse means or who is in- or excluded. for example, rutgers university libraries' dei statement states, "the libraries advance and promote diversity in all its forms" without describing, defining, or providing examples of diversity.30 additionally, rutgers libraries endeavors "to create a welcoming workplace that reflects and supports the many populations and programs of the university with which we engage [emphasis original]."31 again, no definition indicates who these many populations include. similarly, vague language produced an inconsistency regarding to whom dei statements were directed, with many, but not all, statements including faculty and staff.
indiana university's statement represents the latter, stating, "iu libraries esteems diversity of all kinds […] to support students from diverse socio-economic backgrounds and foster a global, diverse inclusive community… in addition, the libraries commits to diversifying its own staff to reflect a diversity of perspectives and backgrounds [emphasis original]."32 including library faculty and staff acknowledges the potential significance of having a diverse and representative workforce, but still vaguely addresses the issue. unlike many dei statements, which vaguely conceptualize diversity, the university of maryland libraries includes in its definition of diversity "race, ethnicity, nationality, religion, socioeconomic status, education, marital status, language, age, gender, sexual orientation, cognitive or physical disability; and learning styles" while noting diversity is not limited to these categories.33 similarly, the university of iowa libraries "welcomes and serves all, including people of color from all nations, immigrants, people with disabilities, lgbtq, and the most vulnerable in our community."34 while still broad, and with language to cover additional conceptions of diversity, these statements' explicit mention of various groups is unique among uborrow members' dei statements.

presentation: library-focused language

continuing the broad conceptualizations of diversity, the university of chicago libraries' statement includes an inward focus, which asks library users to consider their own positions and backgrounds: "we encourage open and honest discussion, reflect on our assumptions, and actively seek viewpoints beyond our own … and respect the uniqueness that we each bring to our shared endeavors."35 this statement asks library users to actively challenge their own assumptions, values, beliefs, and views. however, the statement does not include active language regarding the need to prepare for challenging and difficult conversations and interactions. furthermore, the general conceptualization of these interactions with diversity makes it difficult for individuals to prepare for concrete situations in which one may encounter challenging, uncomfortable, or difficult conditions. utilizing lemke's presentational aspect of communication, which considers processes, relations, events, participants, and circumstances to create and present a state of affairs, demonstrates that uborrow member libraries are vague in their dei statements, which the university of illinois at urbana-champaign (uiuc) exemplifies in its recognition of "diversity as a constantly changing concept. it [diversity] is purposefully defined broadly as encompassing, but not limited to, individuals' social, cultural, mental and physical differences."36 dei statements, as representative of academic libraries, present these institutions as attuned to larger social issues and to the difficulties of making sweeping, definitive statements regarding diversity, when the term itself is, as the uiuc statement indicates, evolving and contested. the challenges this creates for library administrators, and the hidden work that contributes to the creation and presentation of dei statements, are invisible in the end product, yet inform the content of these public statements.

orientational aspect of dei statements

orientational meanings establish relations between those who are communicating.
these meanings communicate point of view, attitudes, and values.37 dei statements demonstrate libraries' willingness to engage with and address dei issues and, in some cases, to combat racism and discrimination. analyzing the content of these statements produces insights into how statements orient libraries to their audiences. the vague and general language of many library dei statements creates a sense of detachment between libraries and users. conceptualizing diversity using vague language in an exchange between library users and a library dei statement orients the library in an abstract, immaterial way. using vague, broad, and ill-defined language makes no concrete demands of users. additionally, many dei statements are written in library-centered language, which positions the libraries at the center and users as peripheral. for example, the university of nebraska's statement begins, "the university libraries creates and fosters inclusive environments for teaching, learning, scholarship, creative expression and civic engagement."38 in this instance, the onus is on the libraries and what they can do to address issues of diversity, equity, and inclusion. the statement continues, "libraries staff members are empowered to provide an array of library services, collections, and spaces to meet the diverse needs of students, faculty, and researchers."39 again, the library self-promulgates its efforts to address dei issues and ignores users' contributions to positive, inclusive, and equitable environments. the university of nebraska libraries' statement is not unique in its use of library-centered language, as such language is common across uborrow members' dei statements. less vague, more user-centered language would make library dei statements more humanizing and valuable and would contribute to the inclusive environments these statements espouse.

orientation: anti-racism and social justice language

some libraries' dei statements make explicit mention of larger social issues and actively position themselves as advocates of social justice, particularly anti-racism. the university of wisconsin – madison libraries are "dedicated to the principles and practices of social justice, diversity and equality and … commit ourselves to doing our part to end the many forms of discrimination that plague our society."40 the penn state university libraries' statement includes a commitment to "disrupting racism, hate and bias whenever and wherever we encounter it."41 the university of michigan library "actively work[s] to ensure that tenets of diversity and antiracism influence all aspects of our work."42 these statements present the libraries as cognizant, responsible, and socially aware entities. in these statements, the libraries' employment of social justice discourse demonstrates non-neutrality. the university of wisconsin – madison libraries' statement recognizes its place within society and the continual legacy of discrimination that tangibly affects current students.
identifying social discrimination as a "plague" implies a solution via targeted, collective efforts to "further and enable the opportunities for education, benefit the good of the public and inform citizens."43 similarly, the ohio state university libraries are guided by priorities "which facilitate, celebrate and honor diversity, inclusion, access and social justice."44 embracing an active stance against social discrimination and positing the libraries as proponents of social justice utilizes the libraries' dei statement as a tool to combat these injustices. semantically, dei statement text offers information to users. statement content demonstrates libraries' willingness to address dei issues institutionally and within society. the text demonstrates the libraries' desire to combat injustice and the importance they place on doing so. additionally, in linking dei statements to social justice issues, libraries make demands of users. while still employing library-centered language, these statements provide a call to action via their direct acknowledgement that the libraries' actions are part of larger, collective efforts in the continual struggle against social injustices.

lack of explicit dei statements

as the analysis shows, the ways in which academic libraries organize, present, and orient themselves via their dei statements contribute to the construction of institutional value of, and commitment to, diversity, equity, and inclusion. but what about libraries that do not have an explicit dei statement? in the united states context, given the attention to diversity brought by black lives matter and other social movements advocating for social justice, it is surprising that four uborrow members do not have explicitly labeled dei statements on their websites. orientationally, the absence of an explicit dei statement suggests a lack of concern and consideration on the part of libraries and makes them seem out of touch with broader social contexts in which racial disparities persist. a clear dei statement, however, is only a single piece of a library's online presence. academic libraries can organize and present dei content on their websites in other ways, as all uborrow members did, even when an explicit statement was lacking. for example, the purdue university libraries have a diversity, inclusion, racism and anti-racism resources library guide, which acts as a one-stop shop for dei-related material. additionally, this guide contains a statement from the dean of libraries, dated june 2, 2020, condemning systemic racism and making a collective call to action to address it. given this statement, while acknowledging the bureaucratic mechanisms in place that may slow the creation of an explicit dei statement, the question remains: if "enough is enough," as the statement claims, why have the purdue libraries not taken swift action to expedite the bureaucratic process? purdue university libraries are working behind the scenes and have created a council on equity, inclusion and belonging, as well as a new strategic plan in which "edi [equity, diversity and inclusion] is much more prominent in the current draft of that plan than in previous ones."45 similarly, northwestern university does not, as of this writing, have an explicit dei statement.
however, minimal diversity language is present in a public-facing welcome message on the library's about page stating, "your library serves the diversity of the northwestern community."46 furthermore, minimal diversity language appears in the internal strategic plan, 2019–2021, which includes a commitment to "responding to the vibrant diversity of our campus community."47 additionally, recent conversations regarding racism, diversity, and social justice among library leadership have spurred the creation of a formal edi program at the institutional level.48 examining the situation at northwestern university offered a look into the hidden work required to create and present dei content and an explicit dei statement, demonstrating the institutional significance of the presentational aspect of communication.

discussion and implications

the descriptive analysis presented in this study provides a foundation for closer analysis and future research, with potential avenues suggested below. this analysis also illustrates issues with the way in which dei statements are presented on academic library websites, which, given the pervasive whiteness of academic librarianship, affects academic librarians, staff, and the students they serve. following lemke's treatment of organizational meanings as primarily instrumental, the following section discusses presentational and orientational implications of dei statement content.49

academic libraries are an integral component of the institutions within which they are situated. their physical and digital spaces, services, and resources are critical to students' academic success and faculty research. academic libraries also contribute to larger institutional dei initiatives. while an examination of institutional dei statements is beyond the scope of this study, institutional mission and vision statements also address diversity, equity, and inclusion. although many institutions have implemented specific diversity statements, wilson, meyer, and mcneal identified diversity content on institutional websites as being limited.50 given the changing demographics of higher education in the united states, the significance of dei to academic institutions and libraries will continue to increase. if the purpose of mission and diversity statements is to reflect institutional priorities, as wilson and colleagues argue, the presence, or lack thereof, and the content of these statements indicate the extent to which institutions value diversity, equity, and inclusion.51

presentational implications of dei statements

in the context of the present study, that all uborrow member libraries' websites engaged in some way with dei content demonstrates the value they place on diversity, equity, and inclusion. however, that only 12/13 of the 16 sites contained explicitly titled dei statements demonstrates that more concerted effort is required if these libraries are to truly demonstrate their commitment. despite other dei language, northwestern university, a member of the 2020 acrl diversity alliance, does not have an explicit, public-facing dei statement, which demonstrates that academic libraries are involved with diversity initiatives in many different ways.
while academic libraries may have internal policies that guide practice, policies that are not public do not contribute to the construction and dissemination of the libraries' message of commitment to diversity, equity, and inclusion. the lack of a public-facing statement, whether intentional or not, contributes to the message that the library is not fully committed to diversity, equity, and inclusion. in this vein, further exploration of diversity content and statements, at the institutional and library levels, is necessary to expand upon the findings of the present study regarding the messages dei statements send. qualitative studies could investigate the working cultures of academic libraries and explore the internal mechanisms that contribute to the creation of public-facing statements and how these mechanisms operate. lemke argues that presentational meanings are typically uncritical due to the presupposition of institutional hierarchies and roles, which minimize threats to the status quo, making this avenue especially fruitful from a critical or decolonizing framework.52 other opportunities for further research include quantitative content analysis of diversity statements, which could reveal specific words, terms, and phrases that institutions and academic libraries use, shedding light on how these entities conceptualize dei; a minimal illustration of such an analysis appears at the end of this section. research examining users' perspectives of academic library dei content is necessary to explore the ways in which libraries' messages are received.

orientational implications of dei statements

examining uborrow members' dei statements revealed the frequent employment of library-centered language. framing the statements in this way places the responsibility to create inclusive, equitable, and welcoming environments on academic libraries, librarians, and staff. if the onus is on academic libraries, as this dei statement language suggests, those who staff libraries are required to appropriately serve diverse students. as such, the practical consideration of staff training in cultural competence is of paramount importance, which the university of michigan recognizes as they "encourage all library staff to participate in diversity-focused professional development and training activities."53 while training and professional development opportunities are of limited utility on their own, as cultural competence, cultural humility, and a diversity mindset cannot be acquired in one-off sessions, setting a pervasive atmosphere of this kind establishes the library's institutional valuing of diversity, equity, and inclusion. furthermore, hiring and retaining staff representative of student demographics is critical, as doing so is one way academic libraries can demonstrate the value they place on diversity.
that librarianship has traditionally been a white profession, as 86.7% of ala members self-identified as white as of 2017 and 86.1% of higher education credentialed librarians were white as of 2009–2010, exacerbates the need for representative library staff.54 however, recruiting and hiring diverse staff is challenging, as the number of visible minorities in academic librarianship has remained stagnant.55 retaining academic librarians and staff of color is a separate challenge, as institutional and library environments, expectations, and research output are all explicit barriers, while internal pressure and time management constraints are implicit barriers.56 academic librarians and staff of color are subject to racial microaggressions perpetrated by unaware non-minority colleagues, an issue that permeates higher education, particularly at historically white institutions (hwis).57 these environments contribute to individual stress and fatigue for faculty of color.58 a history of what mehra and gray label white-ist trends in lis, an amalgamation of practices that symbolize "racist connotations and racism in lis that is part of its historical evolution and development in the united states," affects librarians and staff of color.59 at the societal level, hate crimes are a continual issue in the united states.60

academic library dei statements were not created to directly address grand social issues. however, some dei statements included a social justice call to action. while not all dei statements contained such language, those that did not still made a commitment to supporting diversity. academic libraries' dei statements identify the scope of available services and demonstrate libraries' collective attempt to provide equitable spaces for all campus community members. while these statements occasionally align with institutional diversity statements, institutional responses to bias and discrimination provide insight into other ways institutions craft an identity.61 especially at hwis, these responses typically include demonstrating a professed commitment to dei; acknowledging actions to prevent future instances; establishing a protocol in the event an incident occurs; and addressing the issue while removing the institution from the perpetrators' actions.62 an academic library dei statement that simply states a commitment to diversity and inclusion without actively promoting change (a promotion lacking in the vague, library-centric language common to these statements) represents a typical, though not emphatic, stance. this passive stance demonstrates the need for critical analysis of orientational meanings. such critical analysis allows for scrutiny of the actors and processes involved in dei statement creation, presentation, and messaging, and offers an avenue to hold institutions accountable for their words and dei statements. future research that examines academic libraries' responses to specific incidents of bias and discrimination could provide further insight into the internal processes that lead to the public display of academic libraries as change agents. additional research could examine individual academic librarians and staff to interrogate the congruences or dissimilarities between individual and institutional practices regarding engagement with dei initiatives.
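as a minimal sketch of the quantitative content analysis suggested above, simple term counts across a set of statements can surface the words and phrases libraries lean on. the statements in the example below are placeholder text, not data from this study, and the term list is only one possible operationalization.

```python
# count how often diversity-related terms appear in each statement.
# the statements dict is placeholder text, not data from this study.
from collections import Counter
import re

statements = {
    "library a": "we advance and promote diversity in all its forms ...",
    "library b": "we are committed to equity, inclusion, and anti-racism ...",
}

terms = ["diversity", "equity", "inclusion", "anti-racism", "social justice"]

def term_counts(text: str) -> Counter:
    counts = Counter()
    lowered = text.lower()
    for term in terms:
        # word-boundary match so "equity" is not counted inside other words
        counts[term] = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
    return counts

for library, text in statements.items():
    print(library, dict(term_counts(text)))
```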
conclusion

examination of uborrow members' websites revealed that 12/13 of 16 sites contained explicitly labeled dei statements. although not all members' sites contained an explicit statement, every library engaged with dei content in some way. among the 12/13 sites that contained an explicit dei statement, distinctions existed in statement organization. eight/nine libraries dedicated an entire page to their dei statement, while four members' statements shared a page with other content. organizationally, the pages containing dei statements were similar, with text-heavy pages common across the websites. presentationally, dei statements serve as public-facing representations of university libraries. the most telling insight into the presentational aspect of communication was revealed in an analysis of the sites that did not contain explicit dei statements, as this analysis examined the hidden work that is necessary in dei statement creation. orientationally, vague and library-centric language distances academic libraries and positions them as abstract entities. those libraries whose dei statements employed social justice language made more concrete demands of users. while explicit dei statements comprise only a portion of academic library dei content, an analysis of these statements revealed the ways in which they contribute to academic libraries' construction of value of, and commitment to, diversity, equity, and inclusion. this analysis demonstrated how the presence, or absence, of dei statements positions libraries as conscious entities operating within institutional and social contexts that both restrain and encourage the promotion of diversity, equity, and inclusion. that the university of minnesota libraries updated their homepage to include a link to a newly constructed dei statement during the months between the first and second examination of uborrow consortium members' websites in this study indicates the significance and value institutions place on dei initiatives. academic libraries, as entities that operate within institutions in the social context of historical racism, discrimination, and marginalization in the united states, are not immune to the consequences of these enduring legacies. despite current and ongoing efforts, this analysis revealed that much work and dedication are still required in the continual engagement with dei initiatives.
appendix a: uborrow member institutions

university of chicago
university of illinois at urbana-champaign
indiana university
university of iowa
university of maryland
university of michigan
michigan state university
university of minnesota
university of nebraska – lincoln
northwestern university
ohio state university
penn state university
purdue university
rutgers university
university of wisconsin – madison
center for research libraries

appendix b: urls for dei pages from uborrow consortium websites

university of chicago: https://www.lib.uchicago.edu/about/thelibrary/
university of illinois at urbana-champaign: https://www.library.illinois.edu/about/administration-overview/
indiana university: https://libraries.indiana.edu/administration#panel-about
university of iowa: https://www.lib.uiowa.edu/about/diversity-equity-inclusion/
university of maryland: https://www.lib.umd.edu/about/deans-office/diversity
university of michigan: https://www.lib.umich.edu/about-us/about-library/diversity-equity-inclusion-and-accessibility
michigan state university: https://lib.msu.edu/strategic-plan/
university of minnesota: https://www.lib.umn.edu/about/inclusion
university of nebraska – lincoln: https://libraries.unl.edu/diversity
ohio state university: https://library.osu.edu/equity-diversity-inclusion
penn state university: https://libraries.psu.edu/about/diversity
rutgers university: https://www.libraries.rutgers.edu/diversity
university of wisconsin – madison: https://www.library.wisc.edu/diversity/

endnotes

1 "table 306.30 fall enrollment of u.s. residents in degree-granting postsecondary institutions, by race/ethnicity: selected years, 1976–2028," national center for education statistics, last modified march 2019, https://nces.ed.gov/programs/digest/d18/tables/dt18_306.30.asp.

2 courtney mcdonald and heidi burkhardt, "library-authored web content and the need for content strategy," information technology and libraries 38, no. 3 (2019): 8–21, https://doi.org/10.6017/ital.v38i3.11015; courtney mcdonald and heidi burkhardt, "web content strategy in practice within academic libraries," information technology and libraries 40, no. 1 (2021): 52–98, https://doi.org/10.6017/ital.v40i1.12453.

3 library bill of rights, american library association, amended january 29, 2019, https://www.ala.org/advocacy/intfreedom/librarybill.

4 alice m. cruz, "intentional integration of diversity ideals in academic libraries: a literature review," the journal of academic librarianship 45, no. 3 (2019): 220–27, https://doi.org/10.1016/j.acalib.2019.02.011; jenny lynne semenza, regina koury, and sandra shropshire, "diversity at work in academic libraries 2010–2015: an annotated bibliography," collection building 36, no. 3 (2017): 89–95, https://doi.org/10.1108/cb-12-2016-0038.

5 acrl racial and ethnic diversity committee, "diversity standards: cultural competency for academic librarians," college and research libraries news 73, no. 9 (2012): 551–61, https://doi.org/10.5860/crln.73.9.8835; "acrl plan for excellence," american library association, revised november 2019, http://www.ala.org/acrl/aboutacrl/strategicplan/stratplan.
6 toni anaya and charlene maxey-harris, diversity and inclusion, spec kit 356 (washington, dc: association of research libraries, september 2017), https://doi.org/10.29242/spec.356.

7 american library association, "acrl, arl, odlos, and pla announce joint cultural competencies task force," news release, may 18, 2020, https://www.ala.org/news/membernews/2020/05/acrl-arl-odlos-and-pla-announce-joint-cultural-competencies-task-force.

8 lori s. mestre, "visibility of diversity within association of research libraries websites," the journal of academic librarianship 37, no. 2 (2011): 101–8, https://doi.org/10.1016/j.acalib.2011.02.001.

9 preston salisbury and matthew r. griffis, "academic library mission statements, web sites, and communicating purpose," the journal of academic librarianship 40, no. 6 (2014): 592–96, https://doi.org/10.1016/j.acalib.2014.07.012.

10 linda r. wadas, "mission statements in academic libraries: a discourse analysis," library management 38, no. 2/3 (2017): 108–16, https://doi.org/10.1108/lm-07-2016-0054.

11 salisbury and griffis, "academic library mission statements"; wadas, "mission statements in academic libraries."

12 jeffery l. wilson, katrina a. meyer, and larry mcneal, "mission and diversity statements: what they do and do not say," innovative higher education 37 (2012): 125–39, https://doi.org/10.1007/s10755-011-9194-8.

13 laura saunders, "academic libraries' strategic plans: top trends and under-recognized areas," the journal of academic librarianship 41, no. 3 (2015): 285–91, https://doi.org/10.1016/j.acalib.2015.03.011.

14 saunders, "academic libraries' strategic plans"; wadas, "mission statements in academic libraries."

15 saunders, "academic libraries' strategic plans."

16 wilson, meyer, and mcneal, "mission and diversity statements"; saunders, "academic libraries' strategic plans."

17 "library borrowing," big ten academic alliance, accessed november 5, 2020, https://www.btaa.org/library/reciprocal-borrowing.

18 klaus krippendorf, content analysis: an introduction to its methodology, 3rd ed. (los angeles, ca: sage, 2013), 49.

19 jay l. lemke, "travels in hypermodality," visual communication 1, no. 3 (2002): 299–325, https://doi.org/10.1177%2f147035720200100303.

20 lemke, "travels in hypermodality."

21 lemke, "travels in hypermodality."

22 mestre, "visibility of diversity."

23 krippendorf, content analysis; lemke, "travels in hypermodality," 304–5.

24 lemke, "travels in hypermodality," 304.

25 lemke, "travels in hypermodality," 300–1.

26 lemke, "travels in hypermodality," 318.

27 mestre, "visibility of diversity."

28 lemke, "travels in hypermodality," 318.

29 lemke, "travels in hypermodality," 304.

30 "diversity, equity, and inclusion," rutgers university libraries, accessed april 2, 2021, https://www.libraries.rutgers.edu/about-rutgers-university-libraries/diversity-equity-and-inclusion.

31 "diversity, equity, and inclusion," rutgers university libraries.
32 "indiana university libraries diversity strategic plan," indiana university libraries, accessed april 2, 2021, https://libraries.indiana.edu/strategicplan.

33 "diversity, equity, inclusion," university of maryland libraries, accessed april 2, 2021, https://www.lib.umd.edu/about/deans-office/diversity.

34 "the university of iowa libraries' commitment to diversity, equity, and inclusion," iowa university libraries, accessed april 2, 2021, https://www.lib.uiowa.edu/about/diversity-equity-inclusion/.

35 "diversity, equity, and inclusion statement," university of chicago library, accessed april 2, 2021, https://www.lib.uchicago.edu/about/thelibrary/.

36 "library diversity statement," university of illinois library diversity committee, accessed april 2, 2021, https://www.library.illinois.edu/about/administration-overview/.

37 lemke, "travels in hypermodality," 304.

38 "diversity mission statement," university of nebraska-lincoln libraries, accessed april 2, 2021, https://libraries.unl.edu/diversity.

39 "diversity mission statement," university of nebraska-lincoln libraries.

40 "our commitment to diversity and inclusion," university of wisconsin–madison libraries, accessed april 2, 2021, https://www.library.wisc.edu/about/administration/commitment-to-diversity-and-inclusion/.

41 "libraries diversity, equity, inclusion, and accessibility (deia) commitment statement," penn state university libraries, accessed april 2, 2021, https://libraries.psu.edu/about/diversity.

42 "diversity, equity, inclusion, and accessibility," university of michigan library, accessed april 2, 2021, https://www.lib.umich.edu/about-us/about-library/diversity-equity-inclusion-and-accessibility.

43 "our commitment to diversity and inclusion," university of wisconsin–madison libraries.

44 "diversity, equity, inclusion and accessibility (deia)," the ohio state university, university libraries, accessed april 2, 2021, https://library.osu.edu/equity-diversity-inclusion.

45 mark a. puente, associate dean for organizational development, inclusion and diversity, personal communication with the author, november 11, 2020.

46 "about," northwestern university libraries, accessed april 2, 2021, https://www.library.northwestern.edu/about/index.html.

47 "strategic plan," northwestern university libraries, accessed july 21, 2020, https://www.library.northwestern.edu/documents/about/2019-21-plan.pdf.

48 claire roccaforte, director of library marketing & communication, personal communication with the author, october 26, 2020.

49 lemke, "travels in hypermodality," 304.

50 wilson, meyer, and mcneal, "mission and diversity statements."

51 wilson, meyer, and mcneal, "mission and diversity statements."

52 lemke, "travels in hypermodality."

53 "diversity, equity, inclusion, and accessibility," university of michigan library.

54 kathy rosa and kelsey henke, 2017 ala demographic study (chicago: ala office for research and statistics, 2017): 1–3, https://www.ala.org/tools/sites/ala.org.tools/files/content/draft%20of%20member%20demographics%20survey%2001-11-2017.pdf; diversity counts 2012 tables (data from diversity counts study, chicago: american library association), https://www.ala.org/aboutala/sites/ala.org.aboutala/files/content/diversity/diversitycounts/diversitycountstables2012.pdf.
55 janice y. kung, k-lee fraser, and dee winn, "diversity initiatives to recruit and retain academic librarians: a systematic review," college and research libraries 81, no. 1 (2020): 96–108, https://doi.org/10.5860/crl.81.1.96.

56 trevar riley-reid, "breaking down barriers: making it easier for academic librarians of color to stay," the journal of academic librarianship 43, no. 5 (2017): 392–96, https://doi.org/10.1016/j.acalib.2017.06.017.

57 jaena alabi, "racial microaggressions in academic libraries: results from a survey of minority and non-minority librarians," the journal of academic librarianship 41, no. 1 (2015): 47–53, https://doi.org/10.1016/j.acalib.2014.10.008; chavella t. pittman, "racial microaggressions: the narratives of african american faculty at a predominantly white university," the journal of negro education 81, no. 1 (2012): 82–92, https://doi.org/10.7709/jnegroeducation.81.1.0082.

58 william a. smith, tara j. yosso, and daniel g. solorzano, "challenging racial battle fatigue on historically white campuses: a critical race examination of race-related stress," in covert racism: theories, institutions, and experiences, ed. rodney d. coates (boston: brill, 2011): 211–37.

59 bharat mehra and laverne gray, "an 'owning up' of white-ist trends in lis to further real transformations," library quarterly 90, no. 2 (2020): 189–239, https://doi.org/10.1086/707674.

60 "hate crime statistics, 2019," federal bureau of investigation, https://ucr.fbi.gov/hatecrime/2019.

61 wadas, "mission statements in academic libraries."

62 glyn hughes, "racial justice, hegemony, and bias incidents in u.s. higher education," multicultural perspectives 15, no. 3 (2013): 126–32, https://doi.org/10.1080/15210960.2013.809301.

a tale of two tools: comparing libkey discovery to quicklinks in primo ve

communication

jill k. locascio and dejah rubel
information technology and libraries | june 2023
https://doi.org/10.6017/ital.v42i2.16253

jill k. locascio (jlocascio@sunyopt.edu) is associate librarian, suny college of optometry. dejah rubel (dejahrubel@ferris.edu) is metadata and electronic resources management librarian, ferris state university. © 2023.

introduction

consistent delivery of full-text content has been a challenge for libraries since the development of online databases. library systems have attempted to meet this challenge, but link resolvers and early direct linking tools often fell short of patron expectations. in the last several years, a new generation of direct linking tools has appeared, two of which will be discussed in this article: third iron's libkey discovery and quicklinks by ex libris, a clarivate company. figure 1 shows the "download pdf" link added by libkey. figure 2 shows the "get pdf" link provided by quicklinks. given the way we configured our discovery interface, a resource cannot receive both the libkey and quicklinks pdf links. these two direct linking tools were chosen because they were both relatively new to the market in april 2021, when this analysis took place, and because they can both be integrated into primo ve, the library discovery system of choice at the authors' home institutions of suny college of optometry and ferris state university. through analysis of the frequency of direct links, link success rate, and number of clicks, this study may help determine which product is most likely to meet your patrons' needs.

figure 1. example of a libkey discovery link in primo ve.
figure 2. example of a quicklink in primo ve.

literature review

over the past 20 years, link resolvers and direct linking have evolved in tandem. early link generator tools, such as proquest's sitebuilder, often involved a process that "… proved too cumbersome for most end-users."1 five years later, tools from ebsco, gale, ovid, and proquest had improved, but they were all proprietary. bickford postulates that metadata-based standards, like openurl, may make linking as simple as copying and pasting from the address bar; however, they may be more likely to fail "… as long as vendors use incompatible, inaccurate, or incomplete metadata."2 the first research was wakimoto's 2006 study of sfx, which relied on 224 test queries and 188,944 individual uses for its data set.3 of those queries, 39.7% of search results included a full-text link, and that link was accessed 65.2% of the time. unfortunately, wakimoto also discovered that 22.2% of all full-text results failed and concluded that most complaints against sfx were problems with the systems it links to and not the link resolver itself. although intended to be provider-neutral, the openurl standard is, in fact, vulnerable to metadata omissions.

content providers, whether aggregators or publishers, have a vested interest in link stability and platform use and have therefore invested in building direct link generation tools. in 2006, grogg examined ebsco's smartlink, which checks access rights before generating the link; proquest's crosslinks, which was used to link from proquest to another vendor's content; and silverplatter and links@ovid, which relied on a knowledge base in the terabytes for static links.4 in 2008, cecchino described the national library of medicine's linkout tool for selected publishers within pubmed.5 they also described two ovid products, links@ovid and linksolver, noting that the former is similar to linkout and the latter is similar to sfx. most of the time these tools worked well, but their use was restricted to a particular platform or set of publishers.

as online public catalogs became discovery layers, direct linking became a feature of the library management system. two studies have been done thus far: silton's analysis of summon and stuart's analysis of 360 link. in 2014, silton tested the percentage of full-text articles retrievable from summon by running a test query and examining the first 100 results. over a year, the total success rate for unfiltered queries rose from 61% to 76%. after direct linking was introduced, the success rate of link resolver links rose to 65.8–73% and direct links succeeded 90.48–100% of the time. silton concluded, "while direct linking had some issues in its early months, it generally performs better than the link resolver."6 in 2011, stuart, varnum, and ahronheim began testing the 1-click feature of 360 link on 579 citations, 82.2% of which were successful. after direct linking became an option for summon in 2012, 61–70% of their sample relied on it. "between direct linking and 1-click about 93 to 94% of the time an attempt was made to lead users directly to the full text of the article … [and] … we were able to reach full text … from 79% to about 84% of the time."7 direct linking outperformed 1-click with a 90% success rate compared to 58–67% for 1-click.
stuart also compared the actual error rate with one based on user reports and discovered that "relying solely on user reports of errors to judge the reliability of full-text links dramatically underreports true problems by a factor of 100."8 openurl links were especially alarming, with approximately 20% of them failing. although direct linking is more reliable, stuart closes by noting that direct linking binds libraries closer to vendors, thereby decreasing their institutional flexibility.

methods

the goal of this project was to assess two of the latest direct linking tools: ex libris's native quicklinks feature and third iron's libkey discovery. we performed a side-by-side comparison of the two tools by searching for specific articles in primo ve, the library discovery system used by the authors' respective home institutions, suny college of optometry and ferris state university, and measuring
• how often each vendor's direct links appeared on the brief record;
• the success rate of the links; and
• the number of clicks it takes from each link to reach the pdf full text.

both suny college of optometry and ferris state university use ex libris' alma as their library services platform. alma provides a number of usage reports in its analytics module. we sourced the queries used in our analysis from the alma analytics link resolver usage report. the report contains a field, number of requests, which records the number of times an openurl request was sent to the link resolver. an openurl request is sent to the link resolver when the user clicks on a link to the link resolver from an outside source (such as google scholar), when the user submits a request using primo's citation linker, or when the user accesses the article's full record in primo by clicking on either the brief record's title or availability statement. this means that results that have a direct link (whether a quicklink or libkey discovery link) on the brief record will not appear in the report if the user clicked the direct link to the article. thus, in order to create test searches that would be an accurate representation of articles being accessed, we used article titles taken from suny optometry's october 2019 alma link resolver usage report, a report that was generated prior to the implementation of both libkey discovery and quicklinks. the report was filtered to include only articles with the source type of primo/primo central to ensure that the initial search was taking place within the native primo interface, as requests from outside sources like google scholar or from primo's citation linker are irrelevant to this analysis. this filtering generated a total of 412 articles. after further removal of duplicates and non-article material, there were 386 article titles in our test query set.

we created two separate primo views as test environments: one with libkey discovery and the other with quicklinks. we ran the test searches twice in each view. in the first round of testing, we recorded whether a direct link was present. we also recorded the name of the full-text provider (if present), as well as whether the article was open access. suny optometry does not filter its primo results by availability; therefore, many of the articles included in the initial search did not have any associated full-text activations.
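as a sketch of the kind of filtering described above, assuming the usage report has been exported to csv, the steps might look like the following. the column names and values are assumptions for illustration, not the exact alma analytics field names, and this is not the authors' actual script.

```python
# illustrative filtering of an exported link resolver usage report:
# keep primo-sourced requests, then drop duplicates and non-article material.
import pandas as pd

report = pd.read_csv("link_resolver_usage_oct2019.csv")

# keep only requests that originated in the native primo interface
primo_only = report[report["source type"] == "Primo/Primo Central"]

# drop duplicate titles and non-article material
articles = (
    primo_only[primo_only["material type"] == "article"]
    .drop_duplicates(subset="article title")
)

print(len(primo_only), "primo-sourced rows")
print(len(articles), "unique article titles in the test query set")
```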
since these articles are irrelevant to our assessment, we removed them before analyzing the first round of data and proceeding with the second search. the exception to these removals was articles identified as open access by unpaywall, as the presence of unpaywall links is independent of any activations in alma. furthermore, third iron's libkey discovery and ex libris' quicklinks both incorporate unpaywall's api into their products to provide direct links to pdfs of open access articles. this functionality helps fill coverage gaps where institutions may not have activated a hybrid open access journal due to its paywalls. therefore, we are including the presence of direct links resulting from the unpaywall api when determining whether a libkey discovery link or quicklink is present. after filtering for availability, we had 254 article titles for the first round of searching and analysis. the initial analysis revealed the need to further filter the articles used for the second round of searching, which would provide a much closer comparison of the two direct linking tools, as third iron had partnered with more content providers than ex libris. controlling for shared providers would give a more accurate representation of how each direct linking tool performs in relation to the other. when controlling for shared providers and open access articles, we were left with 145 article titles for the second query set. during the second round of searching, we measured whether the direct link was successful in linking to the full text (meaning that the link was neither broken nor linked to an incorrect article) and how many clicks were necessary to get from the direct link to the article pdf. along the way, additional qualitative measures were observed, such as document download time and metadata record quality. while not as easy to measure as the quantitative data, these observations provided additional insight into the strengths and weaknesses of each of these direct linking tools. since april 2022, when our research was conducted, ex libris has added several quicklinks providers, possibly increasing the current number of quicklinks available. additionally, both rounds of searching were conducted on campus, so our analysis excludes any consideration of authentication and/or proxy information.

results

of the 254 articles searched, 208 (82%) had libkey discovery links present, while 129 (52%) had quicklinks present. while this seems like a large discrepancy between the two direct link providers, it can be explained by the fact that, during the time of testing, ex libris was collaborating with fewer content providers than third iron. ex libris has since added more providers. while the provider discrepancy meant that there were many instances where a libkey discovery link was present and a quicklink was not, there were 5 articles where a quicklink was present while a libkey discovery link was not. as mentioned previously, the criterion for the 254 articles included in the second round of searching was that the articles must be activated in alma or must be open access. of these 254 articles, we identified 137 (54%) as open access. of those open access articles, 132 (96%) had libkey discovery links present, and 118 (86%) had quicklinks present. we found that 113 (82%) of the open access articles had both libkey discovery links and quicklinks present.
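as an aside, presence rates and overlap figures like those above can be tallied directly from per-article observations. the sketch below uses placeholder rows, not the study's data, and assumes each test search was recorded with simple boolean flags.

```python
# illustrative tally of link presence and overlap from recorded observations.
# the sample rows are placeholders, not data from this study.
rows = [
    {"title": "article 1", "libkey": True,  "quicklink": False, "open_access": True},
    {"title": "article 2", "libkey": True,  "quicklink": True,  "open_access": False},
    {"title": "article 3", "libkey": False, "quicklink": False, "open_access": False},
]

def pct(count: int, total: int) -> str:
    return f"{count}/{total} ({count / total:.0%})"

total = len(rows)
libkey = sum(r["libkey"] for r in rows)
quick = sum(r["quicklink"] for r in rows)
both = sum(r["libkey"] and r["quicklink"] for r in rows)

print("libkey present:", pct(libkey, total))
print("quicklink present:", pct(quick, total))
print("both present:", pct(both, total))
```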
we also discovered within this set of 137 open access articles that 30 (22%) were from non-activated resources. of those 30 open access articles from non-activated titles, all 30 (100%) had libkey discovery links appearing on the brief results and 24 (80%) had quicklinks. to get a better idea of how libkey discovery links and quicklinks compared in terms of linking success, we filtered to only those articles available from providers participating in both libkey discovery and quicklinks. since both direct linking tools use unpaywall integrations, we continued to include open access articles. this filtering resulted in 145 articles, among which libkey discovery links were present in 137 articles (94%) while quicklinks were present in 129 articles (89%). we found that 123 (85%) of these 145 articles had both libkey discovery links and quicklinks present. there were 2 (1%) articles that had neither libkey discovery links nor quicklinks present despite being activated in a journal currently participating as a provider in both direct linking tools. there were also 14 articles (10%) that had libkey discovery links but not quicklinks; all of these articles were open access. in total, of the 145 articles searched, 128 (88%) were identified as open access. as for the 137 libkey discovery links, 130 (95%) of them successfully linked to the article. on average it took 1.07 clicks to get to the pdf of the article. of the 129 quicklinks, 126 (98%) of them successfully linked to the article. on average it took 1.07 clicks to get to the pdf of the article. we also attempted to measure the time it took for pages to load after the initial click on the libkey discovery links and quicklinks; however, the tools used to measure this, as well as the environments in which the links were being clicked, proved too varied to provide an appropriate comparison. nevertheless, we observed that page load times after clicking on libkey discovery links and quicklinks were generally consistent, but that quicklinks attempts to connect to the wiley platform took a significant time (at least 10 seconds) to load.

conclusions

with high article linking success rates, both third iron's libkey discovery and ex libris' quicklinks deliver on the promise to provide fast and seamless access to full-text articles. however, the libkey discovery tool far outpaces quicklinks when it comes to coverage. both direct linking tools perform well with open access articles, supplying libraries with better options for full-text links to articles that may be in hybrid journals. as with any kind of full-text linking, both direct linking tools rely on metadata. in conclusion, while libkey discovery provides a more complete direct linking solution, both libkey discovery and quicklinks are reliable tools that improve primo's discovery and delivery experience.

endnotes

1 david bickford, "using direct linking capabilities in aggregated databases for e-reserves," journal of library administration 41, no. 1/2 (2004): 31–45, https://doi.org/10.1300/j111v41n01_04.

2 bickford, 45.

3 wendy furlan, "library users expect link resolvers to provide full text while librarians expect accurate results," evidence based library and information practice 1, no. 4 (2006): 60–63, https://doi.org/10.18438/b88c7p.

4 jill e. grogg, "linking without a stand-alone link resolver," library technology reports 42, no. 1 (2006): 31–34.
5 nicola j. cecchino, "full-text linking demystified," journal of electronic resources in medical libraries 5, no. 1 (2008): 33–42, https://doi.org/10.1080/15424060802093377.

6 kate silton, "assessment of full-text linking in summon: one institution's approach," journal of electronic resources librarianship 26, no. 3 (2014): 163–69, https://doi.org/10.1080/1941126x.2014.936767.

7 kenyon stuart, ken varnum, and judith ahronheim, "measuring journal linking success from a discovery service," information technology and libraries 34, no. 1 (2015): 52–76, https://doi.org/10.6017/ital.v34i1.5607.

8 stuart, varnum, and ahronheim, 74.

cultivating digitization competencies: a case study in leveraging grants as learning opportunities in libraries and archives

article

gayle o'hara, emily lapworth, and cory lampert
information technology and libraries | december 2020
https://doi.org/10.6017/ital.v39i4.11859

gayle o'hara (gayle.ohara@wsu.edu) is manuscripts librarian, washington state university. emily lapworth (emily.lapworth@unlv.edu) is digital special collections & archives librarian, university of nevada las vegas. cory lampert (cory.lampert@unlv.edu) is head of digital collections, university of nevada las vegas. © 2020.

abstract

this article is a case study of how six digitization competencies were developed and disseminated via grant-funded digitization projects at the university of nevada, las vegas libraries special collections and archives. the six competencies are project planning, grant writing, project management, metadata, digital capture, and digital asset management. the authors will introduce each competency, discuss why it is important, and describe how it was developed during the course of the grant project, as well as how it was taught in a workshop environment. the differences in competency development for three different stakeholder groups will be examined: early career grant staff gaining on-the-job experience; experienced digital collections librarians experimenting and innovating; and a statewide audience of cultural heritage professionals attending grant-sponsored workshops.

introduction

digitization of cultural heritage resources is commonly viewed as an important and necessary task for libraries, archives, and museums. there are many reasons for engaging in digitization projects and creating digital collections, including providing increased access to unique collections, preserving fragile records, raising the global profile of the institution, meeting user demand, and supporting the teaching, learning, and research needs of host institutions. in addition, there is an expectation among the public that research resources are digitized and available online. from the perspective of librarians and archivists, digitization of special collections and archives materials involves more than just reformatting analog materials into a digital format (this article uses the term "digitization" to refer to the entire lifecycle of digitization projects involving special collections and archives materials, from planning to preservation).
materials must be selected and prepared, the digital surrogates must be described and preserved, and access must be provided to the appropriate audiences. digitization work is often project-based, since each set of materials to be digitized may require different equipment, specifications, approaches, or workflows. digitization projects and workflows can be a solo affair, a temporary project team, or a permanent functional area complete with staff specializing in activities such as project management, grant writing, web development, or metadata. staff learning needs will vary significantly depending on organizational characteristics, assigned roles, project specifications, and the motivation of individuals. overall, the libraries' and archives' profession-wide approach to teaching and developing digitization competencies is somewhat haphazard. there are many methods to learn about digitization, including self-study of published resources, online tutorials and resources, conference presentations, workshops, continuing education courses, and masters in library and information science (mlis) program classes.1 in many graduate school programs there has been a move toward integrating digital library theory and practice, but courses are necessarily broad in nature, and not every student will be required or have the opportunity to complete a practicum or internship while studying. this can make it difficult for new librarians to identify which skills are most in demand and which type of self-study is most useful for the job market. identifying key competencies, and how to acquire them, may be helpful in supporting new librarians as they make the jump from graduate education to their first professional position, but it is not a challenge limited to newer professionals. even seasoned librarians and archivists, with practical experience in their portfolios, may find that their local experience does not translate to different organizations, is too broad for a particular project, or is not deep enough for them to lead the initiation of a new digitization program.

the digital collections department at the university of nevada, las vegas (unlv) has a decade-long record of hiring early career librarians for grant-funded projects, providing them with opportunities to develop digitization competencies on the job. from 2017 to 2019, unlv's digital collections department completed two grant-funded digitization projects that specifically set out goals to contribute to competency development for multiple stakeholders. early career project managers learned, practiced, and refined skills; the department experimented with and innovated its own workflows; and the project team held two workshops to contribute to the development of digitization competencies throughout the state. the six main competencies that were developed during the grant projects are project planning, grant writing, project management, metadata, digital capture, and digital asset management. the authors, who were members of the grant project teams, will discuss the six competencies in this article. using the grant projects as a case study, they will describe each competency and share how it was used and developed within the project team via on-the-job learning, and within the state via the statewide workshops.
literature review the idea of professional competencies for librarians and archivists is well-established and documented in academic literature, and defined competencies are recognized as valuable tools for education, recruitment, professional development, and evaluation. drawing from organizational project management literature, daniel, oliver, and jamieson define competency as the ability to apply combined knowledge, skills, and abilities in service of a measurable and observable goal. 2 in the united states, the american library association (ala) defines “core competencies of librarianship” and “competencies for special collections professionals.”3 the competency framework of the archives & records association of the united kingdom and ireland (ara) describes five levels of experience: novice, beginner, competent, proficient, and expert/authoritative.4 ara’s recognition of the varying dimensions of competency is a helpful guide, and aligns with the reality of different levels of expertise. however, the competencies identified by ala, ara, and other similar professional organizations are necessarily broad; competencies for specific library roles are harder to generalize and define. in order to identify the knowledge, skills, and abilities required of “digital librarians,” researchers such as choi and rasmussen analyzed job announcements and surveyed practitioners.5 job announcement analysis shows that there is no single definition of a digital librarian; instead digital librarian positions consist of many varied roles and responsibilities in almost infinite combinations. the competencies discussed in this article (project planning, grant writing, project management, metadata, digital capture, and digital asset management) were locally important to information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 3 unlv’s digitization projects, but they also align with the competencies identified in previous research. in their study of projects undertaken in the national digital stewardship residency program (ndsr), blumenthal et al. found that project management skills and technical skills (including metadata, workflow enhancement/development, digital asset management, and digitization) were important.6 the level of required technical competency tended to vary by project but workflow enhancement stood out as a universally important skill. a 2019 analysis of the latest career trends for information professionals by san jose state university’s (sjsu) ischool noted that there is increasing demand for project management skills across all career types. 7 this usually encompasses the ability to organize complex tasks and collaborate with other departments or institutions in service of a shared goal. sjsu also cited “new technologies” as a necessary skill. however, they specified that this refers to “all iterations relating to interest in, familiarity with, or experience with new and emerging technologies” (emphasis in the original). in choi and rasmussen’s article analyzing job ads, the authors note that many of the frequently stated job requirements tend to be vaguely described or cover broad areas, including current trends in digital libraries, competency on general technological knowledge, and the current state of information technology as three most frequently mentioned competencies. 
8 digital asset management, digital scanning, digital preservation, and metadata were some of the specific technical skills desired, as well as project management, planning, and organization. research shows that the more generic the competencies, the more broadly applicable they are; but specific competencies depend on the local environment, the role of the position, and the variables of the project or responsibilities. the wide range of competencies required by the digital library field paired with the specificity of local implementation requires new librarians and archivists to seek out learning opportunities that target both theory and practice. in fact, one of the most important aspects of practical experience is the benefit gained by experiencing the concepts in real-world situations that require decision-making, iteration, and sometimes even failure. the education field points to the kolb model of experiential learning, a cycle that is composed of four elements: concrete experience, reflective observation, abstract conceptualization, and active experimentation. 9 these elements mirror the process of learning observed in the grant case studies. new project staff are often trained to do tasks, then reflect upon what went well or was challenging. then permanent staff in leadership roles encourage and facilitate discussions in abstract concepts such as the philosophy behind an organization’s decision to prioritize efficiency or the concepts of creating authentic digital surrogates. while it may not happen in every project, within both grant cases, the final phase of the learning cycle was also reached as project staff and permanent employees worked together to move practice forward through testing, experimentation with new methods, and ultimately innovation of new models for digital library practices in the area of large-scale digitization. kolb’s model can be useful throughout the library and archives field, as shown in the following example. the federal agencies digital guidelines initiative (fadgi) started in 2007 as a collaboration of federal agencies seeking to articulate common sustainable practices and guidelines for digitized and born-digital cultural heritage resources. the fadgi website is a treasure trove of approved and recommended guidelines covering still image digitization, embedding metadata, project planning for digitization activities, and more.10 it essentially provides step-by-step guides for all aspects of digitization and is a tool that those interested or actively involved in digitization should be familiar with and consult on a regular basis. however, information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 4 fadgi technical standards are relatively prescriptive, so organizations often have to decide how to implement them within their local environments, taking into consideration a wide range of variables. if every new digitization project manager conscientiously implemented the fadgi standards without associated institutional context, they could be investing their organization in long-term cost commitments that cannot be sustained over time or that do not meet the project goals. this scenario points to the need for hands-on experience and learning as outlined in the kolb model. the digitization project manager may want to revisit the goals of the project (access vs. 
preservation, or both) and resource allocations (storage capacity, software and hardware specifications, staff time and expertise), and then pilot a subset of materials by capturing with the fadgi standard and calculating the storage sizes of the files and any associated workflows for long-term management. through this small experiential exercise, much information can be gained, reflected upon, and then used to conceptualize how to proceed. most of the tasks associated with digital library projects demand increasing competency over time to progress from enacting the technical standard in an organizational context, to revising it across projects or local environments, to educating others about the role of the standard, or to, at the highest levels of competency actively participate in the creation or revision of the standard itself as it changes over time. the ability to not only implement but also refine and even innovate comes from a process of mastery of the competency in question. experiential learning is an important method for developing and refining competencies from a novice to more expert level, but not all librarians and archivists have the opportunity to learn from more experienced colleagues on the job. matusiak and hu emphasize the importance but also the inconsistency of integrating experiential learning into mlis programs.11 for those who do not gain practical experience in library school or on the job, workshops are an additional learning opportunity that can help professionals bridge the gap from written resources to local implementation. the illinois digitization institute is one example described in detail by maroso in 2005.12 digital directions is a conference that presents the “fundamentals of creating and managing digital collections” in two days.13 other available workshops focus more closely on different aspects of digitization, such as metadata or preservation, or training for specific equipment via a vendor. in the following examination of unlv’s digitization grant projects and workshops, the authors address six competencies that were either employed or developed by staff or have been identified in existing literature. these competencies may be viewed as critical building blocks for digitization projects and the authors address how they were developed to different levels of expertise and using different methods, experiential learning, and workshops. overview of grant projects unlv’s digital collections completed two grant projects with the main goals of: (1) the large-scale digitization of archival collections, (2) the development of large-scale digitization models and workflows that could be reused, and (3) statewide workshops to share those models and workflows with other libraries and archives institutions. both projects were funded by library services & technology act (lsta) grants administered by the nevada state library and archives. the first project, “raising the curtain: large-scale digitization models for nevada cultural heritage,” digitized mainly visual materials on the topic of las vegas entertainment, while the second project, “building the pipelines: large-scale digitization models for nevada cultural heritage,” digitized mostly text documents about water issues in southern nevada. information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 5 digital collections hired two types of temporary project-specific staff for the two digitization grants: project managers and student assistants. 
the project manager for each grant coordinated the day-to-day activities, such as preparation of the materials, digital capture, quality control, metadata, and ingest into the digital collection management system, as well as helping to fine-tune workflow documentation. the student assistants contributed to digital capture, quality control, metadata creation, and upload to the digital collection management system. these grant projects are strong examples of experiential learning and competency development. two of the authors were principal investigators (pis) for both of the grants, and one author was the project manager for the second building the pipelines grant. at time of hire, the project manager for the second grant had experience working in special collections and archives but had not previously worked in a digital environment. one student assistant was hired for this project; she had already worked on the first large-scale digitization grant project in digital collections and was already familiar with the digitization workflow, as well as the hardware and software. employing a student who had already experienced the concrete tasks (phase 1, “concrete experience” in the kolb model) allowed her to help the new project staff as they could together perform “reflective observation” (phase 2) and learn from their compiled shared experience. the project pis were intentional in designing opportunities for discussion. they regularly met with the student and project manager to help them understand what they were seeing and experiencing in the context of the organization's mission and the grant goals (kolb’s “abstract conceptualization”). the building the pipelines grant project facilitated each of them gaining more competency and moving to the next level while also helping the pis learn through experimenting with new approaches (the final phase of “active experimentation”). the same experiential learning model was also successfully used for the first raising the curtain grant project. as previously stated, conducting a day-long digitization workshop for nevada libraries and archives was a goal of both large-scale digitization grant projects undertaken by unlv digital collections. the nevada statewide large-scale digitization workshops, which were held towards the end of each grant period, were free for participants, and travel grants were available thanks to the grant funding. the workshops sought to provide an overview of large-scale digitization using unlv projects as examples, as well as to provide practical advice related to developing digitization competencies. the first workshop that unlv held in may 2018 consisted of presentations and discussions addressing the basics, methods, and challenges of large-scale digitization. the second workshop, held in may 2019, still shared what unlv learned about large-scale digitization during the grant project, but widened the scope to address multiple important digitization competencies, whether the project is large or small. competencies whether presented in a project-based learning environment, a one-day workshop, or in a self-study scenario, learners can benefit from a clear understanding of what is meant by competencies in each of the areas that make up a successful digitization project. below, the authors share the competencies most critical to success in the case study projects. these were also the competencies selected as priorities for the workshops.
while expertise is not mandatory in all of the competencies in order to start a digitization project or apply for a grant, reflection and planning for each of these steps should be addressed prior to initiating any project. by identifying available resources (such as existing documentation, available staff with expertise to consult, or approval from a supervisor for a self-study plan) project managers can ensure that if there are any information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 6 competency gaps, they will learn the needed competencies to carry out the project. in addition, throughout the learning process, interpersonal skills such as proactive communication, adaptability to change, flexibility in evolving job scope, and cultivation of comfort with ambiguity are all qualities that are just as necessary as any technical skill in mastering competency in digitization. project planning this competency can be defined as the ability to create a shared and documented vision and plan so that specifications, expectations, roles, goals, and timelines are considered in advance and clear to everyone involved. planning for a digitization project is best approached holistically. the planning period is the time to consider all needed competencies and plan for their implementation. writing up a project plan is important, especially since digitization can involve many collaborators and stakeholders. even if one is working alone, there are so many components, steps, and details involved in digitization projects that it is important to plan ahead for them and to document everything. brainstorm and write down ideas and plans for the project, from the overall scope, goals, timelines, and roles, to the specific details of each component, including specifications and workflows for digital capture, metadata, access, preservation, assessment, and promotion (see appendix a, “an overview of planning and implementing digitization projects”). the plan should be communicated, remain flexible, and be updated (or better yet, versioned) to document changes implemented during the project. an important part of project planning is selecting materials for digitization. to develop competency in effectively selecting materials, a person should be familiar with the materials and the digitization process or collaborate closely with people who are. it is often not until one is in the weeds and discussing the nitty-gritty details of a project that the challenges and actual viability of digitizing specific materials become apparent. format is a huge factor in digitization, as is description, and understanding how materials will be used.14 digitizing a group of materials that can all be processed the same way is much easier than undertaking a project to digitize many different formats that require different digitization specifications, equipment, description, processing, etc. one must also take into account legal and ethical considerations. successful selection of materials takes all factors into account and targets materials that fit with the overall goals and vision of a specific project.15 in the case of unlv’s grant projects, the head of digital collections and the digital collections librarian identified the main goals, developed tentative workflows, and authored the grant applications as copis. the pis had multiple years of experience planning and completing digitization projects, which they drew upon to plan these projects. 
they both started off developing their digitization competencies by completing pilot projects, developing workflows and writing grants to fund smaller-scale, highly curated “boutique” projects. as they honed their skills and the department’s workflows over the years and the organization built the capacity and expertise to successfully scale up the rate of digitization, digital object production grew from one staff member using one scanner to digitize a couple hundred items in a year, to a robust department with a digitization lab that produces tens of thousands of digital surrogates per year. the pis documented the vision and goals of the projects in the grant applications, along with timelines, desired outcomes, the roles of the team members, and budgets. the grant application information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 7 provided a structure to help with the bigger picture of project planning, and the digital collections librarian also used a template to create detailed digitization plans for the collections. the template was developed locally based on past experience planning and implementing digitization projects (see appendix b for unlv’s “digitization plan template”). project planning was completed prior to the hire of the project managers and student assistants. the project managers and student assistants were responsible for enacting the project plans, and during the projects they were empowered to adapt and improve upon the plans. the modelling provided by the pis, coupled with the day-to-day experience of the project managers, led to the continuous improvement of and adaptation of workflows through experiential learning. the grant application and digitization plan, along with all of the prepared workflow documentation and tracking spreadsheets, provided a concrete example of how large digitization projects can successfully be planned. by implementing and refining the plans herself, the project manager gained direct experience and intimate knowledge of the plans, including what worked well and what did not. the project manager therefore developed competency in project planning to be able to create plans herself, and the pis further refined their own planning skills, allowing them to plan for even larger or more complex projects in the future. based on previous experience with projectbased learning, the pis had already established a level of expertise at roughly level 4 in the ara tiers. level 5 includes innovation, which was a target of the grant project as it required the pis not only to successfully map past experience to a new situation, but in cases where experiences did not map, gain new knowledge through experimentation. the project team included project planning as a topic of the statewide digitization workshops, sharing digitization plan templates, finalized workflows, and other planning resources that aided in the successful completion of the grant projects. building upon feedback from the first workshop in 2018, the second one addressed the ability to create a digitization project plan of any scale, recognizing that many nevada institutions do not have the ability to engage in large-scale projects. despite the emphasis on the foundational importance of project planning, most attendees noted that they do not currently create detailed digitization plans prior to starting a project. 
providing examples of plans, practical resources, sharing hands-on experiences, and welcoming discussion was helpful to participants, as indicated by feedback on the post-workshop survey. the workshop organizers scheduled time for participants to work on their own digitization plan, and also offered private consultations to help them, but many participants did not have a specific project in mind and did not seem ready to jump into the details of project planning during the workshop. overall, these teaching strategies helped participants gain a better idea about how to plan digitization projects, but they do not match the experience of creating or implementing a plan oneself. grant writing all projects begin with an idea, but only a small fraction of possible projects are acted upon. this is due primarily to a scarcity of resources. grant writing is not a necessary competency for all projects, but it is a valuable skill that can secure funding for projects that otherwise would not have been prioritized or possible. in its simplest form, a grant is a well-communicated idea with supporting rationale that effectively communicates why a project is a priority to undertake.16 grant applications are usually composed of a narrative section that covers the main goals, a budget with associated costs for the project, letters of support from partners, and details about the project team leading the work. even if a grant is not needed to undertake a project, the process of writing one often mirrors the very same decision-making that is necessary in the project planning information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 8 step. project planning is recommended for all digitization projects and it is nearly always required by external grant funders. grant writing can be undertaken alone, in a team, or (in larger institutions) as part of a research office or external funding program. in any case, it can be defined as the skill of writing text, calculating costs, and compiling relevant documentation to successfully propose projects for the award of external funding. competency in grant writing requires excellent communication skills, including the ability to craft persuasive arguments advocating for the project and the analytical ability to interpret instructions and guidelines to ensure the project is in compliance with the funder’s requirements. often, grant writing involves several people: disciplinary experts, collaborative partners, commercial vendors or contractors, technicians, and advisory boards. being able to facilitate discussions and coordinate actions is vital to wrangling the pieces of a large grant pre-award, as well as successful grant administration once funded. grants are competitive in nature, so creativity and originality in framing of a problem can mean the difference between a highly ranked grant and one that is passed over by reviewers. one method to obtain competency in grant writing is to read as many grant proposals as possible, specifically targeting those for similar projects.17 in addition, some funders look for a panel of grant reviewers and seeking out opportunities to participate in these processes is a valuable education. 
in the case of unlv digital collections and the pis, grant writing has been honed over time by some of the strategies mentioned above: reading other grant applications, serving on grant review panels, collaborating with other stakeholders, and communicating with the granting agency to understand criteria and solicit feedback. although the grant proposal was written and the grant was secured prior to the hire of the project managers, the project managers were able to develop a thorough understanding of the grant process. by successfully completing the grant projects, in addition to reviewing the grant proposals, contributing to quarterly reports, and discussing the projects with the pis and other stakeholders, the project managers gained valuable experience and understanding to inform their own future grant applications. given the scarcity of resources, the statewide digitization workshops made it a priority to address various aspects of locating grant opportunities, preparing to write proposals, seeking out collaborations to strengthen applications, and the mechanics and timelines to expect when applying for grants. one of the panel sessions in the workshops included a presentation by the state library’s grant administrator, who provided an overview of the state process and what the board looks for when reviewing project proposals. many participants found this particularly helpful because seeking out and applying for grants for digitization projects was not within their frame of reference, especially as many did not believe they had the requisite expertise in digitization. awareness of a need, gathering information, and analyzing examples are some of the first steps in developing a competency. the workshops helped attendees take these first steps of developing competency in grant writing and management but fell short of actually helping them to write their own grants. in this case, however, it was appropriate since the attendees did not have specific projects in mind and likely needed to spend more time in the first stages of competency development before jumping into implementation. workshops are most effective when the level of the content is appropriate to the level of expertise of the attendees. project management project management training is not often specifically emphasized in mlis programs. while there is literature on this topic, most people learn on the job.18 a successful project manager demonstrates information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 9 mastery of this competency by taking responsibility and assuming leadership of the project throughout the process, even if they are not intimately involved in the day-to-day tasks. they often are responsible for hiring and training project team members as well as communicating and responding to project team members and stakeholders. while they are tracking and analyzing progress using appropriate metrics, they are often the one raising a red flag if the project is experiencing delays or challenges. because they are responsible for ensuring the completion of the main goals of the project within the specified timeline, they often need to analyze bottlenecks and propose possible solutions in order to deliver high-quality results. ideally, they learn from their experiences and also help other team members and the organization learn from experience. 
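to make the idea of tracking progress with appropriate metrics concrete, the following sketch (in python, with entirely hypothetical dates, stages, and counts rather than unlv's actual tracking data) compares cumulative output per workflow stage against a straight-line pace toward a project goal, the kind of simple calculation that can back up a project manager's decision to raise a red flag about a bottleneck.

from datetime import date

# hypothetical grant parameters, not unlv's actual figures
PROJECT_GOAL = 10_000                 # digital surrogates promised to the funder
PROJECT_START = date(2018, 7, 1)
PROJECT_END = date(2019, 4, 30)

# hypothetical progress log: (week ending, workflow stage, items completed)
log = [
    (date(2018, 7, 27), "capture", 350),
    (date(2018, 7, 27), "quality control", 300),
    (date(2018, 8, 31), "capture", 1200),
    (date(2018, 8, 31), "quality control", 900),
    (date(2018, 8, 31), "metadata", 650),
]

def pace_report(as_of: date) -> None:
    """print each stage's cumulative total against a straight-line target."""
    elapsed = (as_of - PROJECT_START).days
    total = (PROJECT_END - PROJECT_START).days
    expected = PROJECT_GOAL * elapsed / total
    for stage in ("capture", "quality control", "metadata"):
        done = sum(n for d, s, n in log if s == stage and d <= as_of)
        flag = "on track" if done >= expected else "behind, investigate bottleneck"
        print(f"{stage:16} {done:6} done vs {expected:6.0f} expected ({flag})")

pace_report(date(2018, 8, 31))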
a key role of the project manager is not only to deliver the outputs, but to assess and analyze, both during the project, in order to make improvements, and after, in order to inform future projects. therefore, investment in mentoring and supporting a project manager, whether a temporary or permanent staff member, can greatly influence how much learning takes place during the project and how that acquired knowledge is transferred to others. documentation is a key part of project management. this needs to happen at every interval of the project—while planning, during implementation, and at the conclusion.19 documenting concrete data including the time spent on specific activities helps to plan cost predictions for future projects, as well as to make recommendations regarding future staffing and equipment. mastering this competency involves planning, an eye for both details and the big picture, clarity, transparency, communication, and dedicated recordkeeping from the start of the project to the end. much like in project planning, the unlv pis had multiple years of experience stewarding projects from start to finish, which assisted them in on-the-job development of the project management competency. they were able to share with the project managers their accumulated years of learning experiences on both the projects, providing guidance on what to look for and how to comprehensively document the current digitization projects. this mentorship, combined with the experience of managing the day-to-day workings of the digitization projects, allowed the early career librarians to develop this competency. in addition, monthly project staff meetings, complemented by on the spot consultation when necessary, contributed to the ease of competency development. during the statewide digitization workshops, the project teams discussed digitization project management and shared strategies and tools such as using google sheets and trello to track workflows and progress. the teams also provided advice for aspects of project management such as managing student workers, troubleshooting equipment, transparent communication, and more. the project team chose to focus specifically on their own large-scale digitization experience because literature and resources about general and library project management are readily available. in addition, participants were encouraged to consider how their non-digitization experiences with project management could be translated to this kind of project as a way to encourage reflective learning based on their individual experience. metadata digitizing materials would not be a valuable endeavor without comparable investment in describing them with metadata that aids users in discovering and using the digital objects. developing a project plan that includes metadata approaches is essential in scoping project work and resources. metadata assignment and quality review is often a far more resource-intensive step information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 10 than the process of digital capture. metadata is one digitization competency that is robustly addressed in library school programs. standards are well documented and examples of digital library metadata are easily accessible online. the importance of metadata to the library and archives profession means that many professionals already have a foundational knowledge. 
what makes metadata a difficult competency to master is the level of detail and specificity it entails, which makes the step from theory to practice challenging. metadata competencies require an understanding of recognized standards, the ability to interpret and apply them, and an awareness of metadata mobility including: reusability, interoperability, and flexibility for migration or transfer.20 metadata-related skills require comfort moving along a wide spectrum of varied tasks, often toggling between awareness and understanding of high-level philosophical issues (such as inclusiveness of subject terms) and a laser-focused eye for detail to troubleshoot data issues (like correcting spreadsheets or code). metadata work traverses several phases of the digitization lifecycle: from initial preparation of collections, during capture, through the ingest into systems, and over the long-term to maintain and preserve the assets. metadata quality itself is difficult to quantify, making this a competency that can be tricky to evaluate. mastery can be indicated through the identification and study of appropriate standards, including compliance with any data reuse requirements, such as a regional digital library consortium, or metadata requirements to ensure compatibility with existing systems and data. in addition to selection of standards, or adherence to existing standards, metadata can be subjective and needs to be undertaken with attention to the level of specificity required for the project. completion of successful projects demonstrates efficient processing of records balanced with an appropriate level of metadata richness. documentation of metadata approach via a metadata application profile (map) as well as training materials and examples for metadata creators are also good indicators of metadata expertise. while technical skills are valuable for metadata competencies, communication and soft skills should not be underestimated as part of this skill set. often metadata competency is an area where collaboration is required. many libraries have catalogers, metadata librarians or aggregators that can advise and sometimes train or provide documentation for projects. before creating a new metadata approach from scratch, consultation can be a very effective way to gain greater competency. at unlv, the choice of an already processed collection eased the metadata choices for digitization. this meant there was already a certain amount of basic metadata regarding the collection; in addition, having the curator function as a subject expert engaged in prepping the collection enabled the project team to have a readymade list of prioritized subject terms, people, and corporate bodies available to input as each folder in the collection was digitized. the building the pipelines project manager had prior coursework in metadata, as well as experience assigning metadata in a previous internship. using the unlv’s metadata application profile as a guide and the existing metadata procedures established for the project, the project manager was able to hone a better understanding of metadata theory applied in practice, including how to best capture the “aboutness” of these particular digital objects. the project manager also observed the importance of consistency in applying metadata by performing quality control of the studentcreated metadata. 
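as an illustration of what routine quality control of spreadsheet-based metadata can look like, the sketch below (python, with a made-up set of required fields and a made-up rights vocabulary standing in for a real metadata application profile) flags rows that are missing required values or that use an unrecognized rights statement; it is a sketch of the general approach, not the procedure used at unlv.

import csv

# hypothetical application profile: required fields and one controlled vocabulary
REQUIRED = ["title", "date", "description", "subject", "rights"]
RIGHTS_VOCAB = {"in copyright", "no known copyright", "public domain"}

def check_metadata(path: str) -> list[str]:
    """return a list of human-readable problems found in a metadata spreadsheet."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f), start=2):  # row 1 is the header
            for field in REQUIRED:
                if not (row.get(field) or "").strip():
                    problems.append(f"row {i}: missing {field}")
            rights = (row.get("rights") or "").strip().lower()
            if rights and rights not in RIGHTS_VOCAB:
                problems.append(f"row {i}: unrecognized rights statement '{rights}'")
    return problems

if __name__ == "__main__":
    # "batch_metadata.csv" is a placeholder filename for a batch awaiting review
    for problem in check_metadata("batch_metadata.csv"):
        print(problem)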
a final contributing factor in developing competency in this area is that the team, consisting of the digital collections librarian, the project manager, and the student assistant, had many resources available as a team to solve problems. as previously mentioned, the team met to review any project concerns and to pull in adjacent team members such as technical services staff, the metadata librarian, the curator, or even those with experience in programming and application development who could advise on how the metadata would appear in other systems, such as the one being developed for future digital collections. this larger group feedback was invaluable in the learning process and often touched on the more abstract concepts underpinning the tasks. at the statewide workshops, a metadata “bootcamp” was held in which staff addressed the types of metadata standards attendees were likely to encounter, the role of a metadata application profile, how to identify an existing map and apply it to your collection materials, as well as the value of having a subject expert available for consultation. while reuse of existing description data (e.g., finding aids or inventories) was an important topic for the first workshop, in response to feedback the second workshop’s metadata bootcamps focused more on concrete steps required to make digitized images searchable regardless of other workflows or systems that might be in use. again, this was an example of tailoring the content to the learning level of the audience. while all participants were familiar with metadata, many did not have experience using a map or taking interoperability into consideration. many recognized a need to devote more time to developing this competency, regardless of project. digital capture whether it is done in-house or outsourced to a vendor, competency in digital capture (digitization in the most specific sense) is key. this competency requires considering the materials to be digitized, how they will be displayed, and how long-term access will be provided to the digital objects. working in-house, technical mastery is not required, but it is necessary to have a solid idea of what hardware and software capabilities are, as well as who to consult should difficulties arise (and they will).21 mastery of this competency means having a vision for the ongoing presentation and use of the digitized material and outlining specifications to make that happen. documenting digitization specifications is useful not only for the project manager and for future projects, but also as a training tool for students, interns, and volunteers. it can also be a source of important preservation and technical metadata ensuring files created today are sustainable into the future. in addition, a robust quality control workflow should be in place prior to uploading digital objects for display and use. a key component of digital capture is efficiently preparing the selected materials. at unlv, experience has taught the digital collections department that digitization is most successful when using materials that have already been physically processed (surveyed and arranged) and for which an inventory (finding aid) has been created.
digitization of archival materials can quickly become complicated because they are often not physically uniform or consistent, and sometimes they are grouped together for digitization into complex/compound/aggregate digital objects. well thought-out workflows for naming and tracking individual files can make the digital capture process smoother, especially when files are related (such as the front and back of a photo, or pages of a scrapbook). this item-level documentation is critical to managing the large volume of files created in digital capture. any conservation or preservation concerns of the physical materials should also be addressed prior to capture. additional consultation may be required if unforeseen complications or problems arise during digital capture; item-level review may not be possible for all materials during the planning stage. for instance, there may need to be an alternate workflow for items that contain personally identifiable information or which are too fragile to undergo scanning or capture. information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 12 there are a number of options for capturing images to create digital surrogates, including digital camera systems and a variety of scanners. depending upon the method of capture, additional software may be needed to edit, output, and ingest the images into a digital management system. for a text-heavy collection, software for optical character recognition (ocr) makes the items fulltext searchable. for audiovisual materials, digital capture is even more complex. the local hardware, software, and procedures for capture all may require an investment in hands-on time learning and testing procedures. the repetitive nature of capturing items may also require some investigation of ergonomics or more human-friendly configurations of these variables. at unlv, step-by-step documentation for using the various hardware and software is key to developing staff competencies. such documentation includes screenshots of steps in the process to contribute to comprehensive understanding and correct implementation of the workflow. projectbased staff also make suggestions, as they move through projects, to improve current workflows. the clear documentation, repetition of tasks, access to workflows of prior digitization projects, consultation with experienced staff, and review of available resources (such as the previously mentioned fadgi website) all contributed to competency development for the project managers. although the pis have years of experience digitizing, it is a detailed process that can be forgotten without use and practice, and it is a competency that must be continually cultivated because of changing technology. if it is decided to outsource digital capture, there are a number of factors to take into account in order to find the right vendor. issues to consider include cost, company stability, prior clients and completed projects, timelines, where the work is performed, and preferred communication methods. requesting a quote for services can be a good way to gain visibility into vendor communications, flexibility, and workflows, and will be essential if the project funds are administered in conjunction with any state or organizational purchasing rules or guidelines. although it can be time-consuming, it is vital for the research and legwork to take place prior to starting the project (see the “project planning” section). 
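the item-level naming and tracking workflows described above for in-house capture can be sketched in a few lines; the naming pattern below (collection_box_folder_item_sequence) and the manifest columns are invented for illustration, since any real project would follow its own documented scheme, but they show how related captures such as the front and back of a photograph stay grouped and traceable.

import csv

def master_filename(collection: str, box: int, folder: int, item: int, seq: int) -> str:
    # e.g. pho001_b03_f12_0045_001.tif for the first capture (front) of item 45
    return f"{collection}_b{box:02}_f{folder:02}_{item:04}_{seq:03}.tif"

def write_manifest(rows: list[dict], path: str = "capture_manifest.csv") -> None:
    # one manifest row per file keeps related captures (front/back, scrapbook pages) together
    fields = ["filename", "collection", "box", "folder", "item", "sequence", "note"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

# example: two captures (front and back) of a single hypothetical photograph
rows = []
for seq, note in enumerate(["front", "back"], start=1):
    rows.append({
        "filename": master_filename("pho001", 3, 12, 45, seq),
        "collection": "pho001", "box": 3, "folder": 12,
        "item": 45, "sequence": seq, "note": note,
    })
write_manifest(rows)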
in outsourcing, confidence in the digital capture partner is key. mastery of this aspect of digitization means a comprehensive, transparent agreement, a regular flow of communication, and comfort in letting go of control over a major part of the project. resources provided by the northeast document conservation center (nedcc) and the sustainable heritage network help to consider the pros and cons of both in-house and outsourced digital capture.22 project management skills can also be very useful as working with a vendor shifts the needed competency from digital capture to more of a project management focus. unlv often employs vendors for the more challenging formats mentioned, such as oversized materials like maps and architectural drawings, and for materials like newspapers that require specialized zoning in the metadata to retrieve articles. working with a vendor can be an informative experience, teaching communication skills, negotiation of contracts, building appropriate timelines, and quality reviewing deliverables. some granting agencies cover a limited timeframe and outsourcing digital capture can free up an organization’s time to do more librarycentric work like metadata or archival processing. for the building the pipelines project, most of the material in the selected collections was flat printed material that was not oversized or in challenging formats such as film/transparent material, newspapers, or media (audio/video). this led to a high comfort level for in-house digital capture as there were established procedures for the archival collection. information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 13 at the statewide workshops, participants attended a digital capture session where they were presented with digital capture workflows and information about unlv’s decision-making regarding digitization equipment, outsourcing vendors, and technical standards, and then they went into the digitization lab to observe the equipment in action. the digital capture bootcamp was facilitated by the head of digital collections, the student assistant, and the visual resources curator (who is a professional photographer). this unstructured session offered a place for attendees to preview equipment that might be suitable for their projects, get a sense of costs if they were looking to purchase equipment, and to observe the digital capture in a large-scale workflow (a specially designed rapid capture overhead camera system), a medium-scale workflow (with digital slr camera and copy stand), and a small-scale workflow (flatbed and map scanners). attendees were encouraged to match equipment to their project needs or identify if outsourcing was an appropriate approach for their collection. attendees were not able to use the equipment themselves or practice the digital capture workflows, but the small workshop format allowed them to view demonstrations in person, ask specific questions, and also see example workflows in action, which is a step above what online research or resources provide for competency development. digital asset management competency in digital asset management goes beyond identifying the storage capacity necessary for a project. digital asset management includes the storage, access, and preservation of digital files and their accompanying metadata. 
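one concrete, scriptable piece of this competency, touched on below in the discussion of digital preservation, is fixity checking: recording a checksum for every master file and re-verifying the files later to detect silent corruption. the sketch that follows assumes a plain directory of tiff masters and a csv manifest; it is a minimal illustration, not unlv's preservation workflow.

import csv
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    # hash the file in 1 mb chunks so large masters do not have to fit in memory
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record_fixity(masters_dir: str, manifest: str = "fixity.csv") -> None:
    # write a manifest of relative path plus checksum for every master file
    with open(manifest, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["relative_path", "sha256"])
        for p in sorted(Path(masters_dir).rglob("*.tif")):
            writer.writerow([str(p.relative_to(masters_dir)), sha256(p)])

def verify_fixity(masters_dir: str, manifest: str = "fixity.csv") -> list[str]:
    # return the files whose current checksum no longer matches the manifest
    failures = []
    with open(manifest, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))[1:]
    for rel, expected in rows:
        p = Path(masters_dir) / rel
        if not p.exists() or sha256(p) != expected:
            failures.append(rel)
    return failures

# record_fixity("masters/")            # once, after capture
# print(verify_fixity("masters/"))     # periodically, e.g. before a migration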
there are different ways to provide access to digital objects, some of the most popular being online content management systems like omeka or digital collection management systems like contentdm.23 as mentioned previously, metadata is important for staff and users to discover and locate digital objects. competency in digital asset management requires technical knowledge of how to securely and efficiently transfer digital files that are requested, or how to provide secure and user-friendly online access. it also requires planning to ensure that whichever approach taken is sustainable and can meet demand. good digital preservation means planning and implementing the necessary actions to ensure that digitized resources continue to be discoverable, accessible, and usable well into the future. in the case of digitized libraries and archives materials, this means that they must be well-documented and trustworthy. preserving digital materials includes maintaining multiple copies of files, capturing checksums to verify if the bits of a file have been corrupted over time, and in some cases, migrating file formats so that items can be viewed and used with future hardware and software. models for digital preservation include the open archival information system (oais) model and the national digital stewardship alliance (ndsa) levels of preservation.24 software and tools to aid in digital preservation tasks are available, as is training. however, digital preservation is still relatively new to many in the libraries and archives profession, although some individuals and institutions have developed very sophisticated and carefully considered programs and approaches. since digital preservation is based on technology, it will always be changing. one must not only learn and be able to implement the current standards and best practices of digital preservation, but also always keep up with changes. success in digital preservation requires ongoing effort and evaluation. successful digital preservation means that staff and users can find, understand, view, and use digital resources at any point in the future. information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 14 for the grant project managers, this was the most challenging competency. while they were exposed to the complexities of digital preservation at unlv, this process was already wellestablished, having been developed over time by the pis and other library staff. the project managers essentially stewarded the newly digitized objects up to this point and then handed over the reins to the digital collections librarian. while they were free to ask questions and developed an understanding of the standards that contribute to long-term digital preservation, the project managers did not implement this particular workflow, nor did they contribute to adapting it. it is important to keep in mind that digital preservation is not an all or nothing proposition; small steps can be taken by libraries and archives professionals to address short-term digital preservation while gaining a better understanding of long-term solutions.25 given the complexity of this competency, it was difficult to train participants in the statewide digitization workshop setting. 
however, unlv’s digital collections staff emphasized the multiplicity of options available for libraries and archives with varying levels of resources and encouraged participants to be open to starting despite ambiguity about the ultimate long-term solution for their organization. digital collections staff also provided an overview of these options and shared the evolution of digital preservation strategies at unlv, including suggesting some first steps such as creating an inventory of digital assets and a digital preservation plan. developing expertise in this competency requires in-depth research, consultation, and analysis to customize plans for local circumstances. the statewide workshops provided only an hour-long introduction to the topic and a broad overview as an example. digital preservation is a topic that is well-addressed by other more intensive workshops though, such as the society of american archivists digital archives specialist courses and the powrr institute.26 summary of competency development: experiential learning versus workshops learning through experience for project teams unlv’s grant projects are examples of how specific time-bound projects and grant funding can be utilized to develop both individual and organizational competencies, and to share what is learned via workshops, aiding in the professional development of others. the early career project managers advanced the most in competency development because of the opportunity for focused training and experiential learning through practice. they progressively developed digitization competencies in a number of ways, including training from the pis, working with experienced student assistants and staff, reading locally created documentation, observing project activities and decision-making by the team, proposing solutions to challenges and testing them through trial and error, learning by doing tasks and suggesting small iterations to improve them, consulting the workflows from previous projects, and reviewing recommended resources such as the fadgi website. the project managers, though temporary employees, were treated with the same status as permanent staff and encouraged to attend meetings, ask questions, take risks, and experiment in a safe and controlled environment of learning. given the many multifaceted details and tasks that go into a digitization project from start to finish, it is unrealistic to expect staff to remember everything without engaging in the process themselves. training for the grant projects broke each project down into a series of discrete steps, including preparation, digital capture, quality review, ocr transcription, metadata creation and review, and upload into the digital asset management system. each task was reviewed and practiced in a linear manner. given the volume of materials, basic mastery and self-sufficiency for the grant project staff were achieved fairly quickly. this allowed project staff to then identify areas for workflow improvements and test adjustments for increased efficiency. despite having just two dedicated staff, one of whom had no prior digitization experience, over 55,000 digital surrogates were created during the ten-month building the pipelines project, far exceeding the original goal of 10,000 images.
in both grant projects the project managers were able to develop digitization competencies as a result of on-the-job experience, enriching their skill sets while also assisting unlv digital collections to refine workflows for large-scale digitization projects. this in turn strengthened competencies on the organizational level, and those of the pis. in the best-case scenario of this kind of project, temporary staff develop valuable digitization competencies via project-based work; however, that is not always the case, and temporary project-based positions can be very harmful to the personal and professional development of workers. when undertaking a project that uses temporary labor, the organization should plan for and prioritize equitable hiring practices, fair compensation and benefits, and a positive and productive experience for temporary staff.27 learning through experience for organizations grants are temporary in nature, so it is important that organizations who fund them and who receive funding think about the long-term implications of the temporary work. it is important for project staff to clearly document all of the details of the digitization approaches and workflows that worked successfully in the grant, as well as any problems that can be avoided. all the extra work of testing and refining new workflows completed by the project staff, can (and should) be adopted and integrated by permanent staff into the existing structure of the department or institution. one of the drawbacks for institutions undertaking grant-funded projects is that temporary staff leave and take their expertise with them. it is essential for permanent staff to not only teach, but also be open to and active in learning from the temporary staff during the project, even if the permanent staff are not doing the day-to-day work. building opportunities for information-sharing and knowledge transfer into a project plan vastly increases the value of the grant project funding. this organizational learning is a form of accountability to the funder ensuring that projects can be sustained and that lessons learned contribute to increased capacity in the funded organization and beyond. learning through workshops for professional development grant projects also pave the way to share lessons learned with colleagues via workshops or collaborative endeavors. as previously stated, conducting a day-long digitization workshop for nevada libraries and archives institutions was a goal of both large-scale digitization grant projects undertaken by unlv digital collections. besides the metropolitan areas surrounding las vegas and reno, much of nevada is rural and sparsely populated. these workshops provided a forum for people who might not usually come together to meet and talk about their work. many libraries and archives institutions in nevada are small and may have limited or no experience with digitization. the workshops sought to provide an overview of large-scale digitization using unlv projects as examples, as well as to provide practical advice related to developing digitization competencies. the first workshop at unlv, held in may 2018, consisted of presentations and discussions addressing the basics, methods, and challenges of large-scale digitization (see appendix c for the may 2018 agenda, “nevada statewide large-scale digitization symposium”). the grant team surveyed participants after the workshop and received mostly positive responses. 
sixteen out of nineteen people who completed the survey said they learned something, thirteen said they were confident and likely to apply what they learned, and eleven people said that if there was a follow-up workshop they would attend. the comments from the surveys showed that participants wanted more interactive activities, and many of them were not ready to implement large-scale digitization at their institutions—they wanted to learn more about the basics of digitization first. this feedback highlighted two challenges of workshop-based learning: the tendency toward passive delivery of large amounts of information and designing content for an audience with unknown or varying skill levels. the second workshop, held in may 2019, still shared what unlv learned about large-scale digitization during the grant project, but widened the scope to address multiple important digitization competencies, whether the project is large- or small-scale (see appendix d for the may 2019 agenda, “nevada statewide large-scale digitization symposium”). prior to the workshop, attendees were surveyed about their expertise level and topics of interest and were asked to review a project planning document with their local materials in mind. sessions were designed as bootcamps with more extensive documentation that could be used as a template for implementation at their home organization. participants were encouraged to ask questions and share their own experiences during the workshops and were given the option to sign up for a private consultation. the team endeavored across the workshop to allow for more interactive, hands-on learning. although unlv adjusted the second workshop based on the feedback from the first, teaching practical how-to skills that are broadly applicable in a one-day workshop is challenging. digitization is a complicated and technical undertaking that is most easily learned via hands-on experience, which is most effectively gained through repetition rather than a one-day workshop. there was not enough time or equipment for participants to actually practice parts of the digitization process themselves and so experiential learning was not always an option for every competency. also, if participants return to an organization with different equipment, hardware, and software, there are limits to hands-on training. another potentially problematic issue is staying up to date with the rapid technological changes that characterize digital collections. if a person gains a basic intellectual understanding of digitization via a workshop or other professional education opportunities, and then returns to their setting without starting a specific project in a timely manner, there is a risk that the knowledge they gained becomes outdated. despite the drawbacks of workshop-based learning, workshops are still valuable venues for colleagues to come together and learn from one another. they can also provide demonstrations or hands-on learning activities that help to bridge the gap from written theory to local implementation. conclusion online access to libraries and archives materials is expected and increasingly necessary in order for institutions and their collections to remain vital, useful, and relevant. ideally, digitization in libraries, archives, and museums would be a permanent functional area with specialized staff.
however, many medium-sized and smaller libraries and archives institutions do not have the capacity to sustain such an area. competencies in the areas of project planning and management, grant writing and administration, digital capture, metadata, and digital asset management are instrumental to completing a successful digitization project or instituting a digitization program in any setting. despite the proliferation of professional workshops, online resources, literature, and conferences regarding digitization skills, it can be difficult to make time to study these materials and put such learning into practice in a way that builds to more sophisticated learning through experience. the diversity of collection materials to be digitized, the range of local circumstances, and the rapid pace of technological change prevent any profession-wide standardized approach to digitization education. instead, individuals, organizations, and the profession as a whole must strategically invest in the most effective and efficient methods and opportunities for developing digitization competencies. locally, unlv digital collections has found that experiential project-based learning is the most effective way to pilot new workflows and develop competencies. project-based experiences, if thoughtfully designed with an eye to mentoring and supporting temporary staff, provide an opportunity for individuals to develop and practice these competencies in a hands-on way that encourages deep learning. there is a unique place for small pilot projects, modest grant projects, or one-time experimental projects to create a space for this kind of learning in almost any organization. as capacity increases, digitization projects can also be designed to develop competencies at the staff functional group level, the organizational level, or the regional level. workshops in turn can be an opportunity for project teams or experienced individuals to share what they have learned and teach basic competencies to others. although not as comprehensive and effective as experiential learning, workshops can provide a solid introduction to digitization competencies, especially if interactive and hands-on learning methods are incorporated and organizations are willing to remain available for consultation or questions from attendees. workshops that have pre- and post-session components can add continuity, and workshops that can be offered multiple times have the ability to evolve and scale. rotating instructors, incorporating hands-on sessions, and ongoing mentoring are all ways to improve workshop-based learning. scaffolding these approaches and sharing what is learned individually or locally with others is a way to continue to develop the capacity of libraries and archives institutions to provide global online access to unique historical materials. although this approach is already widespread in the profession, it is important not to leave individuals or institutions with fewer resources behind. when planning new digitization projects or initiatives, institutions should consider adding and investing in new positions, partnerships, and regional collaborations and networks. when new permanent positions are not possible, temporary positions should be designed to be empowering and valuable for workers, rather than exploitative and harmful.
in an age where technology is changing rapidly and is driven by large, well-resourced corporations, developing the profession's competencies in digitization, keeping pace with digital technologies, and remaining relevant in the information environment depends on decentralized, peer-to-peer educational opportunities that use efficient and effective methods of teaching, such as interactive and hands-on learning.
appendix a
an overview of planning and implementing digitization projects, created by emily lapworth for local use, march 8, 2018, and shared at the statewide digitization workshops. these steps were written for large-scale digitization but can be applied to a digitization project of any size.
1) identify collections for digitization.
a) brainstorm your goals for this project. think about what you will do with these digital surrogates, and who your audience is.
b) criteria for selection of materials
i) formats: start simple. if everything is the same, large-scale workflows are easier to apply. ultimately you will need to create different workflows for each format with differing requirements. for example, print photos are digitized differently than film negatives. text documents benefit from transcription using optical character recognition (ocr) software, while photos do not, and handwritten materials present additional discoverability challenges. when creating complex digital objects with different formats within them, things can become even more complicated.
ii) condition: fragile materials require extra handling time and possibly additional physical treatment prior to digitization.
iii) existing arrangement and description: it is easiest if online access can directly mirror physical access, but the materials may need additional arrangement and description before digitization, depending on your goals. if the materials already have item- or folder-level description, that is ideal. if there is any hierarchy in the existing description, especially inconsistent or complex hierarchy, consider how you will reuse that description for digital objects.
iv) copyright: plan on providing public online access only if you own the copyright, have permission from the copyright holder, or if it is a strong case of fair use.
c) see the preparation step (below) to come up with some idea of how you will undertake this project. it will likely be modified during the actual preparation, but you need to have some idea of what you will do and how you will do it in order to gather support and resources.
2) assess the technical infrastructure needed to create, manage, provide access to, and preserve the digital files.
a) estimate how much storage space you will need, and how much space will be needed for long-term digital preservation (see the worked example after this step).
b) make sure that your current digital preservation policies and workflows will be able to accommodate this project. adjust them if needed.
c) identify what equipment and software will be needed and whether you already have it, can acquire it, or can use someone else's.
d) assess whether your existing workflows and systems for providing access to digital materials will be able to accommodate this project, and what changes you might need to make.
e) technology could be a great area for collaboration! if you lack certain resources, explore opportunities to collaborate with other institutions.
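the arithmetic behind the storage estimate in step 2a is simple but easy to underestimate. the short python sketch below uses purely illustrative figures (an 8 x 10 inch print scanned at 600 ppi, a made-up count of 20,000 items) rather than any actual project specification:

# rough storage estimate for uncompressed 24-bit rgb tiffs (illustrative numbers only)
def tiff_bytes(width_in, height_in, ppi, bytes_per_pixel=3):
    """approximate size of one uncompressed rgb tiff, ignoring header overhead."""
    return int(width_in * ppi * height_in * ppi * bytes_per_pixel)

master = tiff_bytes(8, 10, 600)     # one 8 x 10 inch print at 600 ppi: roughly 86 mb
item_count = 20000                  # hypothetical project size
print(f"one master: {master / 1e6:.0f} mb")
print(f"{item_count} masters: {item_count * master / 1e12:.1f} tb before derivatives")

whatever the real numbers are, padding the estimate to cover derivatives, metadata, and at least one preservation copy is a reasonable starting point.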
3) coordinate with other stakeholders to verify choices and plans for digitization.
a) find out what kind of support there is (financial, staffing, etc.) from management, administration, and the community.
b) identify possible collaborators and discuss plans, make agreements, etc.
c) decide who will manage and oversee the project and how different responsibilities will be distributed.
d) identify and apply for grants if appropriate.
4) prepare collections for digitization.
a) arrangement: assess how the materials are physically arranged and described, and whether that will help or slow down your anticipated workflows. plan for and complete additional processing if needed.
b) decide how you will display digitized materials. mirroring the existing arrangement is the easiest, but you also have to consider the file formats you want to create.
c) description: figure out how you can reuse existing description. plan metadata fields, vocabularies, and prioritized subject terms and names.
d) prepare preliminary metadata. reuse what you already have!
e) prepare physical materials. verify that the physical contents of the collection match the existing description or inventory. remove staples, unbind, unsleeve, flatten, etc. identify and address any preservation or conservation issues.
f) identify physical formats (this will help determine the timeline and what equipment is needed).
g) decide: outsource or in-house?
h) create and test workflows and procedures.
i) create documentation for workflows and procedures (important for the duration of the project, for reuse in future projects, and also for future employees stewarding these digital assets to know what you did and how you did it).
j) create and prepare systems, documents, or mechanisms to track work (it is important to stay organized, especially when dealing with a large amount of materials or a team of workers).
5) digitize collections
a) set up consistent file naming procedures and make sure they are followed.
b) when dealing with mixed materials in-house: depending on equipment and the composition of materials, start with the easiest or what you have the most of, then take note of other formats (e.g., transparencies, oversize, etc.) that require different equipment or settings so that you can group them together and digitize them all at once later.
c) keep specifications simple if possible, especially if you have student workers doing the digitization. (for example, if you have complex digital objects with both text and photographic prints, and can digitize both materials on the same equipment without changing settings, do so. if you normally digitize text at 300 ppi but want photos at 600 ppi, rather than having the technician stop and change the settings, capture all at 600 ppi if you have the space.)
d) auto-crop is a great tool if you have it, but otherwise try to improve the efficiency of your processes with any tools at your disposal. sometimes this can be as simple as placing the item with the correct orientation to avoid the need to manually rotate later.
e) file formats: archival images are generally tiffs. smaller derivative files may be necessary for access or to speed up ocr processes. sometimes it is better to output them at the time of scanning than to batch process later (a sketch of batch derivative creation follows this step).
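where derivatives are created after scanning rather than at capture time, a short script can batch the work and enforce the naming convention from step 5a at the same time. the python sketch below uses the pillow imaging library; the folder names, the collection number, and the size and quality settings are illustrative assumptions, not project specifications:

# minimal sketch: create jpeg access derivatives from master tiffs with consistent names
from pathlib import Path
from PIL import Image

masters = Path("masters")            # hypothetical folder of archival tiffs
access = Path("access")
access.mkdir(exist_ok=True)

for n, tiff in enumerate(sorted(masters.glob("*.tif")), start=1):
    with Image.open(tiff) as im:
        im = im.convert("RGB")       # jpeg cannot store 16-bit or cmyk image data
        im.thumbnail((2000, 2000))   # cap the long edge for web access copies
        # zero-padded, sequential names ("ms0001" is a made-up collection number)
        im.save(access / f"ms0001_{n:04d}.jpg", "JPEG", quality=85)

because a script assigns the names, the convention is applied the same way every time, which is harder to guarantee when many student workers name files by hand.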
6) process images
a) see above: try to improve your digitization workflows and procedures to shave time off of image processing.
b) ocr: if you have textual materials, ocr transcription makes them much more accessible with far less manual work than creating detailed metadata. this is especially true for large aggregations of textual documents. resist the urge to aim for perfect ocr transcription. something is better than nothing, and when dealing with scale, you do not have the time to correct everything. this is also an opportunity for crowdsourcing, if you have the technical resources to set it up (see the sketch after this step).
c) ocr file output: depending on how you choose to display and make the digital surrogates available, you may need to output text files and/or pdf/a files.
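as a rough illustration of how little code "good enough" ocr can require, the python sketch below uses the pytesseract wrapper for the tesseract ocr engine. the folder names are hypothetical, and the tesseract engine plus the pytesseract and pillow packages would need to be installed for it to run:

# minimal ocr sketch: write one uncorrected plain-text transcript per access image
from pathlib import Path
from PIL import Image
import pytesseract

access = Path("access")              # access derivatives from the previous step
text_out = Path("ocr_text")
text_out.mkdir(exist_ok=True)

for jpg in sorted(access.glob("*.jpg")):
    with Image.open(jpg) as im:
        text = pytesseract.image_to_string(im)   # uncorrected, "good enough" ocr
    (text_out / f"{jpg.stem}.txt").write_text(text, encoding="utf-8")

uncorrected output like this is usually enough to make a large text collection keyword-searchable, which is the point of step 6b.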
7) describe and provide access
a) reuse description that already exists (e.g., from an inventory or a finding aid). if a finding aid exists, make sure you are using all available information and understand how description is inherited and can be reused.
b) at the beginning of the project, transform the metadata that already exists into a format you can use to describe the digital objects. you can add to this existing metadata throughout the workflow.
c) at the beginning of the project, identify preferred subject terms and important names to look out for and add to digital object metadata when appropriate. this is especially important when metadata is created by students, teams, or anyone unfamiliar with the subject matter of the collection. it will help ensure consistency and make faceting better for users.
d) explore how search engine optimization (seo) works for your public online access system. take that into consideration when creating metadata in order to optimize discovery of the materials.
e) make it as easy as possible for users to identify the provenance of the digital object and to find other digital objects from the same collection.
f) consider the links between the original collection description and the digital surrogates. consider adding digitization information or links to digital surrogates into finding aids and other records. consider also adding a link to the finding aid in the digital object metadata. consider using persistent identifiers, such as arks (archival resource keys), to do this, instead of using regular urls.
g) find out how your access system indexes full-text transcripts and how it displays different file formats. consider whether you are able to, and want to, offer multiple file formats of a digital object. for example, a compound digital object that includes both text and images could be available as a collection of image files, a single pdf file, or both. identify what would be most useful to your users.
h) don't forget about structural, administrative, technical, and preservation metadata!
8) implement quality control procedures (qc)
a) have a strategy (e.g., sampling), guidelines, and goals for qc.
b) for staff performing quality control, identify the most important things to look for.
c) decide how much time should be spent on qc.
d) identify and acquire any automated tools that can be used.
e) set up procedures or steps to follow when errors are found.
9) preserve digital assets
a) you should have already planned how you will ensure access to and preservation of the digital files and metadata in the long term. best practice is to have policies in place identifying what digital assets should be preserved and to what extent. identify applicable standards and best practices, and implement software and technical solutions.
b) set up workflows and procedures to ensure that the digital files receive appropriate ongoing digital preservation treatment (a fixity-checking sketch appears at the end of this appendix).
10) publicize and promote
a) work with administration, collaborators, and other stakeholders to publicize and promote the project.
b) depending on your audience, social media, academic listservs, and professional organization publications can be other avenues to spread the word.
c) set up harvesting with your regional digital library for inclusion in the digital public library of america.
11) assess
a) web statistics can be used to track the use of online materials. see saa/acrl's "standardized statistical measures and metrics for public services," section 8, "online interactions," for general information on what information to collect, and the digital library federation's "best practices for google analytics" for specific information on google analytics. if you are a contentdm user, see "google analytics in contentdm."28
b) surveys, interviews, and focus groups are other methods that can be used to gather feedback.
c) record and compile any oral or written feedback received from stakeholders and audiences.
d) analyze feedback and usage statistics to identify areas of success and areas for improvement. make improvements as necessary and incorporate findings into planning for future projects.
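as a companion to step 9, the python sketch below shows one minimal form of fixity work: recording sha-256 checksums for the master files so that a later audit can detect corruption or silent changes. the file locations and manifest name are illustrative only:

# record a checksum manifest for master files (step 9); rerun later to audit fixity
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    """hash a file in 1 mb chunks so large tiffs never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

masters = Path("masters")
with open("checksum_manifest.txt", "w", encoding="utf-8") as manifest:
    for tiff in sorted(masters.glob("*.tif")):
        manifest.write(f"{sha256sum(tiff)}  {tiff.name}\n")

a full digital preservation system does far more than this, but even a simple manifest, stored alongside the files and compared on a schedule, is a step toward the fixity checks described in the ndsa levels of digital preservation cited in the endnotes.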
appendix b
digitization plan template
template created by emily lapworth for local use and shared at the statewide digitization workshop.
project overview
collection name(s):
collection number(s):
link to finding aid(s) or existing description(s):
project staff:
project supervisor:
research value/audience:
goals:
available resources: staff, money, equipment, software, etc.
additional resources needed: staff, training, money, equipment, software, etc.
priority level: low, medium, or high. why is this being digitized now? part of the regular workflow, part of a grant project, or specially requested?
publicity and promotion plans:
assessment plans:
estimated time frame/due date: estimate how much time should be spent on the collection, or when it should be finished by.
date completed/approximate hours spent:
formats and quantity of items: e.g., seven boxes of photographic prints, three folders of flat text documents, two drawers of oversize materials, etc.
existing arrangement & description: how is the collection currently arranged, and what description is currently available?
copyright: what is the copyright status of the materials, and can you legally digitize and provide access to them?
restricted or sensitive materials: e.g., skip over restricted folders, digitize a restricted-item notice, or physically cover pii (personally identifiable information, such as social security numbers) during digitization.
preservation issues: any fragile or delicate materials that need extra attention?
supply needs: e.g., envelopes needed for rehousing.
notes for future/follow-up: e.g., missing items, materials that should be restricted, recommended additional processing, rehousing, digitization, metadata enhancement, etc.
preparation
what will be digitized and what won't be? e.g., series x will not be digitized at this time; it consists of audiovisual materials, which would need to be outsourced.
how will items be arranged and described online?
how will identifiers/file names be assigned? e.g., each folder = a compound object; file titles from the finding aid will be used as titles for the digital objects.
what physical preparation must take place before digitization? e.g., remove all staples and fasteners.
digitization
equipment/technical specs:
• outsourcing or in-house equipment to be used
• file types (e.g., tiffs)
• file quality (e.g., 24-bit color, 600 ppi)
• file naming
other specifications:
• where will digital files be stored and preserved?
• how will special physical formats be handled? (e.g., scrapbooks: entire page or individual photos? magazines: entire issue or just the cover? etc.)
digital file processing:
• image correction?
• cropping or other editing?
• ocr or transcription?
• create derivative files?
digital file quality control: what procedures and workflows will you put in place to ensure that everything is digitized accurately and according to the project specifications?
metadata
what standards, fields, guidelines, and controlled vocabularies will you use?
metadata quality control: what procedures and workflows will you put in place to ensure that all metadata is accurate, consistent, and conforms to the project specifications?
access
how will digital objects be accessed? what systems, workflows, and procedures will be used to provide access?
appendix c
nevada statewide large-scale digitization symposium
funded by lsta, may 18, 2018
coffee and pastries, 9:00–9:30
digitization lab tour, 9:30–10:00
welcome, 10:00–10:15: opening remarks from the dean of university libraries and the director of special collections and archives.
session: what is large-scale? [live streaming begins], 10:15–11:00: this session will cover the characteristics of large-scale digitization and what sets it apart from other types of digitization projects. the unlv entertainment project team will also provide an update on the lsta-funded project they undertook to digitize over 25,000 items from unlv's entertainment collections.
panel: methods for ramping up: identifying resources, 11:00–12:00: there is a mandate to increase efficiency in digitization, but what resources can help you get there? this session will detail four methods to increase digitization output and address how organizations of varying resource levels can adopt them.
lunch, 12:00–1:00: enjoy a catered lunch and some discussion time with colleagues from across the state and region. there will be time to walk around the room and share digitization activities at your organization via whiteboards. during lunch you can also browse the “equipment buffet,” where we will have handouts/displays on various types of digitization equipment and outsourcing vendors.
panel: challenges of digitization at a larger scale, 1:00–2:00: ramping up digitization is not as simple as merely increasing numbers. in this session we will discuss the challenges encountered in each phase of digitization when scaling up and some strategies to meet those challenges.
break [live streaming ends], 2:00–2:15: during the break, browse the “equipment buffet,” where we will have handouts/displays on various types of digitization equipment and outsourcing vendors. using the provided worksheet, shop the buffet and rank how well each product meets your digitization needs.
discussion: resource 5: statewide collaboration (in groups) 2:15 3:15 the last session of the day will focus on an additional resource to ramping up digitization: your peers and partners right here in nevada! we will review the notes about organizational projects and shared challenges, identify potential partnerships or collaborations, discuss grant opportunities, and work as a group to prioritize our state’s most at-risk collections. wrap up / assessment 3:15 3:30 before everyone departs for home, we will share contact information from attendees, complete a workshop evaluation and discuss follow up activities for next year. all attendees will leave with a customized plan of action for their organization. attendee learning objectives: • be able to define characteristics of digitization projects (mass, large-scale, boutique) and where your organization fits. decide on the type of digitization appropriate for your organization to move toward. • understand pros and cons of each method and the type of resources needed to support implementation. identify one or more method/resource for your organization to target to increase your organizational capacity. • understand complexities of large-scale digitization and identify one or more challenges at your organization. • gain perspective on projects across nevada. be able to identify at least one future collaborative opportunity. information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 26 appendix d nevada statewide large-scale digitization workshop funded by lsta may 10, 2019 workshop outcomes: • digitization boot camp sessions guided by survey responses • upr lsta project update and lessons learned • project consultations available • reflections on statewide workshops compare over 1 year agenda 8:00 9:00 *concurrent session coffee and pastries digitization lab equipment consultations welcome 9:00 9:15 opening remarks from the dean of university libraries panel: challenges of digitization at a larger scale 9:15 -10:00 what does it take to complete a large digitization project? in this case study panel presentation, we will cover the approach used in digitizing the union pacific railroad water documents, including: writing the grant and selecting materials, preparing archival collections for efficient digitization, managing the project, the student technician perspective, and trouble-shooting imaging and technical issues. panelists: project manager; curator; digital collections librarian; student technician; visual resources curator goal: overview of large-scale digitization and project deliverables. boot camp: preparing to digitize 10:00 11:00 goal: dig into the decisions needed to create a digitization plan. there will be a short presentation to go over the planning document, including asking “what makes a good project”? we will discuss labor and students and complete hands-on activities with actual collections to encourage work on individual plans. 11:00 12:00 *concurrent session boot camp: capture images group a boot camp: create metadata group b information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 27 goal: provide introductions to two main workflows in digitization projects: digital capture and metadata creation. there will be demonstrations, hands-on activities, and a chance to ask questions with the goal of helping to complete digitization plans. 
12:00 1:00 lunch 1:00 2:00 *concurrent session boot camp: capture images group b boot camp: create metadata group a goal: provide introductions to two main workflows in digitization projects: digital capture and metadata creation. there will be demonstrations, hands-on activities, and a chance to ask questions with the goal of helping to complete digitization plans. boot camp finding external funding 2:00 2:30 goal: learn what opportunities exist to secure funding for your project. hear tips on successful grant writing. discuss possible collaboration opportunities across the state. presenting online images dams overview 2:30 3:30 goal: see several options for presenting your collection to an online audience. options will highlight strategies for many staffing configurations including: solo librarian/historian, low it resourced institutions, common systems in the profession, and complex open source development communities focused on digital asset management platform (islandora 8). wrap up / assessment 3:30 3:45 goal: complete short survey on the workshop and ideas for future statewide events related to digitization. one on one consultations available 3:45 4:30 information technology and libraries december 2020 cultivating digitization competencies | o’hara, lapworth, and lambert 28 endnotes 1 some examples include: “moving theory into practice: digital imaging tutorial,” cornell university library/research department, http://preservationtutorial.library.cornell.edu/contents.html; “bcr’s cdp digital imaging best practices version 2.0,” bibliographical center for research, june 2008, https://sustainableheritagenetwork.org/system/files/atoms/file/bcrcdpimagingbp.pdf; “new self-guided curriculum for digitization,” digital public library of america, https://dp.la/news/new-self-guided-curriculum-for-digitization/; elizabeth la beaud, “analysis of digital preservation course offerings in ala accredited graduate programs,” slis connecting 6, no. 2 (2017): 10, https://doi.org/10.18785/slis.0602.09. 2 anne daniel, amanda oliver, and amanda jamieson, “toward a competency framework for canadian archivists,” journal of contemporary archival studies 7, article 4 (2020): 1–13, https://elischolar.library.yale.edu/jcas/vol7/iss1/4. 3 “ala’s core competences of librarianship,” american library association, 2009, http://www.ala.org/educationcareers/careers/corecomp/corecompetences; “guidelines: competencies for special collections professionals,” association of college and research libraries, 2017, http://www.ala.org/acrl/standards/comp4specollect. 4 archives & records association of the united kingdom and ireland, “the ara competency framework,” 2016, https://www.archives.org.uk/160-cpd/cpd/700-competency-framework.html. 5 youngok choi and edie rasmussen, "what is needed to educate future digital librarians," d-lib magazine 12, no. 9 (september 2006), https://doi:10.1045/september2006-choi. youngok choi and edie rasmussen, "what qualifications and skills are important for digital librarian positions in academic libraries? a job advertisement analysis," the journal of academic librarianship 35, no. 5 (2009): 457–67, https://doi.org/10.1016/j.acalib.2009.06.003. 6 karl-rainer blumenthal et al., “what makes a digital steward: a competency profile based on the national digital stewardship residencies,” lis scholarship archive (2017), https://doi.org/10.17605/osf.io/tnmra. 
7 “mlis skills at work: a snapshot of job postings,” san jose state university school of information, 2019, https://ischool.sjsu.edu/lis-career-trends-report. 8 choi and rasmussen, “what qualifications.” 9 david a. kolb and ronald fry, “toward an applied theory of experiential learning,” in theories of group process, ed. cary l. cooper (london: john wiley, 1975), 33–57. 10 “guidelines,” federal agencies digital guidelines initiative, http://www.digitizationguidelines.gov/guidelines/. 11 krystyna k. matusiak and xiao hu, “educating a new cadre of experts specializing in digital collections and digital curation: experiential learning in digital library curriculum,” proceedings of the american society for information science and technology 49, no. 1 (2012): 1–3, https://doi.org/10.1002/meet.14504901018. 12 amy lynn maroso, “educating future digitizers,” library hi tech 23, no. 2 (june 1, 2005): 187–204, https://doi.org/10.1108/07378830510605151. 13 “agenda, digital directions: fundamentals of creating and managing digital collections, october 19-20, 2020, tucson, az,” northeast document conservation center, https://www.nedcc.org/preservation-training/dd20/agenda. 14 kim christen and lotus norton-wisla, “digitization project decision-making: starting a digitization project,” center for digital scholarship and curation, sustainable heritage network, july 1, 2017, https://sustainableheritagenetwork.org/digital-heritage/digitizationproject-decision-making-starting-digitization-project. 15 kim christen and lotus norton-wisla, “digitization project decision-making: should we digitize? can we digitize?,” center for digital scholarship and curation, sustainable heritage network, july 1, 2017, https://sustainableheritagenetwork.org/digital-heritage/digitizationproject-decision-making-should-we-digitize-can-we-digitize-0. 16 taylor surface, “getting a million dollar digital collection grant in six easy steps,” oclc next, december 6, 2016, http://www.oclc.org/blog/main/getting-a-million-dollar-digital-collectiongrant-in-six-easy-steps/. 17 institute of museum and library services, “putting your best foot forward: tips on making your preliminary proposal competitive,” december 31, 2015, https://www.imls.gov/blog/2015/12/putting-your-best-foot-forward-tips-making-yourpreliminary-proposal-competitive. 18 examples of project management literature relevant to cultural heritage digitization projects include: cyndi shein, hannah e. robinson, and hana gutierrez, “agility in the archives: translating agile methods to archival project management,” rbm: a journal of rare books, manuscripts, and cultural heritage 19, no.
2 (2018), https://rbm.acrl.org/index.php/rbm/article/view/17418/19208; michael dulock and holley long, “digital collections are a sprint, not a marathon: adapting scrum project management techniques to library digital initiatives,” information technology and libraries 34, no. 4 (2015), https://doi.org/10.6017/ital.v34i4.5869; michael middleton, “library digitisation project management,” proceedings of the iatul conferences (1999), http://docs.lib.purdue.edu/iatul/1999/papers/20; “dlf project managers toolkit,” digital library federation, https://wiki.diglib.org/dlf_project_managers_toolkit; theresa burress and chelcie juliet rowell, “project management for digital library projects with collaborators beyond the library,” journal of college & undergraduate libraries 24, no. 2–4 (2017), https://doi.org/10.1080/10691316.2017.1336954. 19 “guiding digital success,” online computer library center (oclc), https://www.oclc.org/content/dam/oclc/contentdm/guiding_digital_success_handout.pdf. 20 useful metadata resources include: digital public library of america, “metadata application profile,” https://pro.dp.la/hubs/metadata-application-profile; dublin core metadata initiative, “guidelines for dublin core application profiles,” https://www.dublincore.org/specifications/dublin-core/profile-guidelines/; oksana l. zavalina et al., “developing an empirically-based framework of metadata change and exploring relation between metadata change and metadata quality in marc library metadata,” procedia computer science 99 (2016): 50–63, https://doi.org/10.1016/j.procs.2016.09.100. 21 “guidelines: technical guidelines for digitizing cultural heritage materials,” federal agencies digital guidelines initiative, http://www.digitizationguidelines.gov/guidelines/digitizetechnical.html; “digital preservation at the library of congress,” library of congress, https://www.loc.gov/preservation/digital/. 22 robin l.
dale, “reformatting: 6.7 outsourcing and vendor relations,” northeast documentation conservation center, https://www.nedcc.org/free-resources/preservation-leaflets/6.reformatting/6.7-outsourcing-and-vendor-relations; “deciding to outsource or digitize inhouse," digital stewardship curriculum, center for digital scholarship and curation/sustainable heritage network, https://www.sustainableheritagenetwork.org/system/files/atoms/file/1.20_outsourcingvsin house.pdf. 23 “omeka,” roy rosenzweig center for history and new media, https://omeka.org/; “contentdm: build, showcase, and preserve your digital collections,” oclc, https://www.oclc.org/en/contentdm.html. 24 “iso 14721:2012 space data and information transfer systems—open archival information system (oais)—reference model,” international organization for standardization, https://www.iso.org/standard/57284.html; “levels of digital preservation,” national digital stewardship alliance/digital library federation, https://ndsa.org//activities/levels-of-digitalpreservation/. 25 “from theory to action: ‘good enough’ digital preservation solutions for under-resourced cultural heritage institutions”, preserving digital objects with restricted resources (powrr), august 2014, http://commons.lib.niu.edu/handle/10843/13610. 26 “digital archives specialist (das) curriculum and certificate program,” society of american archivists, https://www2.archivists.org/prof-education/das; “powrr institutes,” digital powrr, https://digitalpowrr.niu.edu/institutes/. 27 sandy rodriguez et al., “collective responsibility: seeking equity for contingent labor in libraries, archives, and museums,” institute for museum and library services white paper, http://laborforum.diglib.org/wpcontent/uploads/sites/26/2019/09/collective_responsibility_white_paper.pdf. 
28 saa-acrl/rbms joint task force on public services metrics, “standardized statistical measures and metrics for public services in archival repositories and special collections libraries,” 2018, https://www2.archivists.org/standards/standardized-statistical-measuresand-metrics-for-public-services-in-archival-repositories; molly bragg et al., “best practices for google analytics in digital libraries: digital library federation assessment interest group analytics” working group, 2015, https://doi.org/10.17605/osf.io/ct8bs; “google analytics in contentdm,” oclc, https://help.oclc.org/metadata_services/contentdm/get_started/google_analytics_in_contentdm.
core leadership column
making room for change through rest
margaret heller
information technology and libraries | june 2021
https://doi.org/10.6017/ital.v40i2.13513
i write this column from the vantage point of my current role as a member of the core technology section leadership team, and as a newly elected president-elect of core, with my term starting in july 2021. the planning for core began years ago but became a real division of ala in the most chaotic of times.
visions for the first year of core were set aside as we had to face the reality of all the work needing to be done remotely, without any conferences that would allow for in-person conversations, and with all the leadership and members under personal and professional strain. yet being forced to start up slowly and deliberately provides some advantages. settling into this new situation has allowed staff, leaders, and members to acclimate to a new division an d learn how we want to do things in the future, rather than relying too much on how we did things in the past or feeling pressure to meet every demand. right now, we are all at a juncture in our personal and professional lives, and thinking about how to approach the coming months. summer offers the promise of growth and reinvention. the pause that a break implies allows time for us both as individuals to make time for what is important to us, and as members or employees of institutions to reconsider our priorities. for people working in library technology, however, the “summer break” is often anything but. public libraries become a hub for activity as schools are closed, and school and academic libraries may use slow periods when classes are not in session for necessary systems upgrades or to roll out a new service. the summer of 2020 was one of the most challenging of my life, both professionally and personally, and meeting all the demands of the moment left hardly any time for a true break. this year, just like last year, feels like a summer we might not let ourselves rest for a moment. while many libraries have been open to some degree over the past year, the upcoming summer has the potential for a return to something like normal. shutting down regular in-person services and buildings felt chaotic since it required new ways of providing those services and building up new technical infrastructure, but without us having expected this in advance like a normal summer project. the return may also feel chaotic, but rather than approaching it as a series of tasks in a plan that requires lots of energy and work, i hope we can treat the time as a period of reflective practice and give ourselves time to understand what has changed. adapting to the realities of life since spring of 2020 has changed us all in various ways, and so too our library users have new needs and expectations. in some cases, they have embraced new services, though this has not been a smooth process for everyone. i have a family member who started using an e-reader for the first time during the pandemic to access library e-books when her public library was closed or had limited services. she was grateful for the option to access books this way, but occasionally struggled to follow the complex workflow from library app to vendor site to device. without the ability to visit a physical reference desk to ask for help, she asked me to assist with device troubleshooting on several occasions. that worked well for her, but margaret heller (mheller1@luc.edu) is digital services librarian, loyola university chicago, and (as of july 1, 2021) president-elect of the core: leadership, infrastructure, futures division of ala. © 2021. mailto:mheller1@luc.edu information technology and libraries june 2021 making room for change through rest | heller 2 not everyone has a digital services librarian in their quarantine bubble. i share this to illustrate that while some people will have adapted or gotten the help they need, for many, this time has been one of doing without or maladaptation. 
going back to “normal” will not help those who will need even more than they did pre-pandemic. taking time to understand that fact, and to accept that it will not be a quick process of return for many people, will allow us to give each other space to find a way back to our lives as library users and library employees. while many of us feel uncomfortable when we see slow progress—i know i do—i am coming to realize the value of making space for slowness and for rest. rest comes in all forms. it could be physical rest, but it could be pursuing an artistic or athletic hobby, intentional social interactions, or spiritual practices. institutions might give extra time off or set healthy expectations for work hours and meeting-free days, while also discarding old practices and attitudes to create better future work environments. there are crises to which we must immediately react and respond, but without personal and institutional energy in reserve, we will not do as good a job when they occur. crises include political upheaval, public health emergencies, and other major events, but we can also appreciate how they unfold on a more mundane level. information technology work often requires odd hours, intense bursts of energy to complete projects in a small window of time, and unpredictable problems that require dropping everything else to address an emergency. it is natural to constantly look towards the most urgent and the newest problem. this tendency results in lengthy backlogs for requests and accumulates technical debt from deferred maintenance or refactoring. yet as we bring our libraries and other institutions out of pandemic mode over the next few years, allowing for reflective space can help us to be cautious about the choices we make. for example, during earlier stages of the pandemic, many of us probably had to set u p systems for some type of surveillance to maintain social distancing and aid in contact tracing. taking some time to review all those new procedures and systems—and purposefully dismantle those with negative privacy implications—will help us to go forward as more ethical and empathetic institutions. taking it slow is going to be the only way through the next period. summer 2021 should be about reflection on collective trauma. we responded to the events of the past year, whether it was for closing libraries, keeping libraries open as safely as possible, racial justice work, or election support, and now we must consider how to incorporate what we started into lasting change. to do that reflection will require rest. we know how important rest is but finding space for it is not usually a high priority. rest allows us to integrate our experiences, and will build us back to make sure we can keep responding to what comes next. i am challenging myself to spend time in deliberate reflection at the cost of mindless productivity over the coming months so that i can keep helping my library and core succeed. i hope you will consider doing the same. collaboration and integration: embedding library resources in canvas articles collaboration and integration embedding library resources in canvas jennifer l. murray and daniel e. feinberg information technology and libraries | june 2020 https://doi.org/10.6017/ital.v39i2.11863 jennifer l. murray (jennifer.murray@unf.edu) is associate dean, university of north florida. daniel e. feinberg (daniel.feinberg@unf.edu) is online learning librarian, university of north florida. 
abstract the university of north florida (unf) transitioned to canvas as its learning management system (lms) in summer 2017. this implementation brought on opportunities that allowed for a more userfriendly learning environment for students. working with students in courses which were in-person, hybrid, or online, brought about the need for the library to have a place in the canvas lms. students needed to remember how to access and locate library resources and services outside of canvas. during this time, the thomas g. carpenter library’s online presence was enhanced, yet still not visible in canvas. it became apparent that the library needed to be integrated into canvas courses. this would enable students to easily transition between their coursework and finding resources and services to support their studies. in addition, librarians who worked with students, looked for ways for students to easily find library resources and services online. after much discussion, it became clear to the online learning librarian (oll) and the director of technical services and library systems (library director) that the library needed to explore ways to integrate more with canvas. introduction online learning is not a new concept at the unf. in fact, in-person, hybrid, and online courses used online learning in some capacity since distance learning took hold in higher education. unf transitioned to canvas as their learning management system (lms) in summer 2017, which replaced blackboard. this change, which affected all the unf’s online instruction and student learning, brought on new benefits and challenges that allowed for a more secure system for students to take in-person, hybrid, and distance learning courses. while this change occurred, unf’s library went through many changes in its virtual presence. students, specifically those who had classes that utilized canvas, needed a user-friendly way to use the library website and its resources virtually. in response, the library’s resources transitioned into having a greater online presence. however, ultimately, many students needed to use resources that they did not actually realize were available electronically from the library. through instruction and research consultations (both in-person and virtually), students needed to be directed back to the library homepage to access resources; however, the reality was that unless there was a presence of library instruction or professors pointing out library resources, students instead turned to google or other easy to find online resources to which they were previously exposed. how the project originated by spring 2018, there was growth of unf courses that were converted to online or hybrid courses. as students used canvas more, librarians started receiving feedback from in-person and online sessions that students had difficulty accessing the library’s resources while in canvas. the lack of library visibility in canvas caused the librarians to truly acknowledge that this was a problem. mailto:jennifer.murray@unf.edu mailto:daniel.feinberg@unf.edu information technology and libraries june 2020 collaboration and integration | murray and feinberg 2 students had to open a new browser window to access the library and then go back to canvas to complete their assignments, which involved multiple steps. this caused frustration among students who had to remember the library url, while also getting used to navigating their new courses in canvas. 
librarians consistently spent large amounts of time instructing students how to navigate to the library website during library instruction sessions and research consultations. in effect, more time was spent with students to guide them to library resources such as programmatic or course specific springshare hosted libguides (also known as library guides), or the library homepage. rather than being focused on how to use library resources and become more information literate, students spent more time on just locating the library website to get to the unf library’s online resources. together, the oll and library director talked about possibilities in canvas that would benefit all students who attended unf both in-person and online. canvas is located in unf’s mywings, a portal where all students go for coursework, email, and resources that support their academic studies at unf. it became apparent that if it was possible, there needed to be a quicker way to access the unf library resources for students. literature review with the advent of online learning, it became obvious that students needed to have library access within their online learning management system. for campuses such as unf, this meant within canvas. for unf students that are distance or online students only, this was especially true. farkas noted that librarians had worked to determine the best ways to provide library materials, services, and embed librarians into the lms.1 over the last fifteen years, lms have become more important to support the growth of online learning. pomerantz noted that the lms has become critical to instruction and online learning. approximately 99 percent of institutions adopt an lms and 88 percent of faculty utilize an lms platform.2 this “puts it in the category with cars and cellphones as among the most widely adopted technologies in the united states.”3 library guides that have been integrated into an lms increased their visibility, but did not guarantee that faculty and students would utilize them. that is why it was critical to continuously collaborate and communicate with faculty, students, and librarians to bring attention to the resources that could assist students. farkas noted that librarians at the ohio state university discussed that no matter how the library was integrated into a university’s lms, the usage of the library there was decidedly dependent on if the faculty professor promoted the library to their students.4 the reality that libraries faced was that without visibility in an lms, students that were online/distance learners needed to remember or find the library’s website. while this seemed to be inconsequential, it caused students to use google or other resources instead of their university/college’s library discovery tool or library databases. farkas noted that shank and dewald’s seminal article described a university’s lms as having two levels, macro and micro. when there was one way to access the library in the lms, then it was termed macro. this single pathway allowed for less maintenance since there was a single way to access the library from the lms.5 the university of kentucky embedded the library by adding a library tab in blackboard. 
other institutions like portland state university, ohio state university, and oakland state university developed library widgets to make the library more accessible.6 the addition of library and research guides in library instruction was critical to increase visibility for information technology and libraries june 2020 collaboration and integration | murray and feinberg 3 students and furthermore make sure students had easier access to the library through their lms. getting librarians’ access to the lms at their institutions is an ongoing issue.7 unf librarians wanted to determine best practices to decide how the library could integrate into canvas. therefore, research was needed to see what other university libraries were doing. the librarians at unf discovered that there was no obvious preference based on examples found in research to accomplish how to get the library into canvas. davis observed that “claiming a small amount of real estate in the lms template . . . is an easy way to put the library in students’ periphery.” by simply having a library link added or a page added to each course was “the digital equivalent for students of walking past the library on the way to class.”8 however, it seemed that a lot depended on how an lms was used at an unf and the technical expertise available. thompson and davis noted that the “lms technology has added another layer of complexity to the puzzle. as technology evolves to address changes in learning design, student and facu lty attitudes, expectations, perceptions will continue to be a critical piece of the course integration puzzle.” 9 while looking at other institutions, there were a variety of ways in which canvas and the library were integrated. there were numerous examples from embedded springshare product library guides, to the creation of modules of quizzes or tutorials, and even to the creation of online minicourses, and embedded librarians in lms courses.10 penn state university looked at their method of how to add library content into canvas. they already had a specific way of putting library guides in canvas, but it was not in a highly visible location for students to easily access. when faculty put guides in their courses, with the collaboration of librarians, the guides were used. however, many of the faculty did not use these librarians or resources. a student survey and user studies were used to best learn how to fix the problem of students and faculty that did not use the guides and content more. penn state worked with their comm 190 instructors to administer a survey that was extra credit, to ensure getting responses.11 “general findings included: 53 percent of students did not know what a course guide was; 41 percent of students had used a guide for a class project; 60 percent accessed a course guide via the lms; and 37 percent of students used course guides on their own.”12 many students were interested in doing their library research within canvas itself. it should be noted that the guides needed to be in a prominent place in canvas, but not overwhelm the course content. for course-specific guides an introductory statement was needed to describe what the guide was about. when the release of springshare’s lti tool occurred, it became an optimal time in which the technical solutions allowed for penn state’s library guides to be embedded smoothly into canvas.13 the learning tools interoperability (lti tool) allows for online software to be integrated into canvas. 
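for readers unfamiliar with the plumbing, an lti 1.1 launch is essentially a signed form post from the lms to the external tool. the python sketch below, which uses the oauthlib package and entirely hypothetical launch urls, keys, and course identifiers, shows roughly what such a launch request contains; canvas and springshare handle this exchange automatically, so the sketch is illustrative rather than something an institution would need to write:

# illustrative lti 1.1 launch: the lms signs course context parameters with a shared
# secret (oauth 1.0a) and posts them to the tool's launch url; all values are made up
from urllib.parse import urlencode
from oauthlib.oauth1 import Client, SIGNATURE_TYPE_BODY

params = {
    "lti_message_type": "basic-lti-launch-request",
    "lti_version": "LTI-1p0",
    "resource_link_id": "course-123-research-guide",   # hypothetical placement id
    "context_id": "course-123",                        # hypothetical canvas course
    "roles": "Learner",
}
client = Client("example-consumer-key", client_secret="example-shared-secret",
                signature_type=SIGNATURE_TYPE_BODY)
uri, headers, body = client.sign(
    "https://tool.example.edu/lti/launch",             # hypothetical tool provider url
    http_method="POST",
    body=urlencode(params),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
print(body)   # the original parameters plus oauth_* fields, including the signature

the tool provider verifies the signature with the same shared secret and, if it matches, trusts the course and user information in the post, which is what lets a guide appear inside a course without a separate login.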
in effect, when professors want to add a tool to their course, it allows for more seamless and controlled avenue. in the case of library guides, it creates a way in which guides can be embedded into the lms with little problems. another example of a library integration into a campus lms was at the utah state university (usu) merrill-cazier library. they looked to find a way to maximize the effectiveness of springshare’s library guides when they assessed the design and reach of library guides within their lms.14 they took a unique approach to build an lti tool that automatically pulled in the most appropriate library guide when the “research help” link in canvas was activated by a professor. they also saw this as an opportune time to redesign their subject guides and ensure there were guides for all disciplines. they provided usage data to subject librarians to help determine where there might be opportunities to interact with classes and provide more library instruction. overall, information technology and libraries june 2020 collaboration and integration | murray and feinberg 4 the study and feedback they received from students helped them to find ways to improve how librarians used and thought about library guides, and expanded their reach based on usage data. 15 this ability to add library guides to canvas provided students a way to access library materials or the library without having to leave the online classroom. many libraries have conducted focus groups and usability studies that were key to providing valuable feedback on the knowledge and understanding that faculty and students had of guides, ways to improve information shared that assisted students with their coursework and faculty in their online teaching. research indicated that exploration and implementation of integrating library guides into an lms led to a need to improve and provide more consistently designed guides.16 the literature indicated the importance of a strong relationship with the department that manages the lms. these integrations were made much easier when there was a relationship established and it sometimes led to finding out about additional opportunities to integrate more with the lms. penn state saw an increase of over 200,000 hits to its course guides believed to be because of the lti integration.17 this, however, did not guarantee that the students benefited from the course guides, similar to library statistics not proving resources were being used despite page hits. in addition, faculty were able to turn off the lti course guide tool, which reduced the chances of student usage or awareness of the course guide. based on feedback from students and faculty, it did not matter where the course guides were since they could be ignored anyway. a penn state blog was developed by the teaching and learning with technology unit to provide instructors a venue in which they could be aware of online services that librarians provide.18 “although automatic integration allows specialized library resources to be targeted at all lms courses, that does not mean that they’ll be accessed. it is important then to build ongoing relationships with stakeholders, providing not just information that such integrations exist, but also reasons why to use them.”19 however, not all universities and colleges decided to integrate the library strictly through a library guide or a link to the library integration into their lms. karplus noted that students spent more time online rather than going to the physical academic library. 
karplus discussed that the digital world combined with academic library resources had two benefits. one of which brought online research as a more normal occurrence. the second benefit was that students were more comfortable with accessing online resources.20 while using blackboard, st. mary’s college’s goal was to incorporate library information literacy modules into courses that existed. using the blackboard lms, students were able to access all components of their courses including assigned readings. this became their academic environment. therefore, information literacy modules, tutorials, and outside internet resources could be added to the lms.21 tutorials combined with preand post-testing, gave faculty instant feedback. librarians were also able to participate in blackboard through discussion boards and work with students.22 there was a constant need to update the modules and the information added to blackboard. librarians having access to the blackboard site, allowed for students to use the library resources more readily. “the site can be the focal point for many librarians in one location thus ensuring a consistent, collaborative instructional program.”23 overall, the integration of campus librarians into an lms was to get students to use the library in order to be more successful in their academic endeavors. information technology and libraries june 2020 collaboration and integration | murray and feinberg 5 developing a plan of action initially, the oll and library director brainstormed possible integration ideas, ranging from adding a library global navigation button to the canvas ui, to adding a link to the library in the canvas help menu. at the same time, they also researched what other libraries had done. after brainstorming, it was realized that additional conversations needed to occur within the library and with unf’s online learning support team, a part of the center for instruction and research technology (cirt), the group that manages canvas. the discussion to integrate the library and canvas was a complex matter. unf administrators asked for a proposal to be written so it could be brought to the library, online learning support team, and information technology services (its) stakeholders for discussion and approval. that proposal, along with much needed discussion, was critical in order to determine the possibilities and actions that needed to be taken. that being said it was important to recognize the importance of what was best to serve the faculty and students. when brainstorming discussions started to occur with unf’s online learning support team, it was important for the library to determine what options were available to embed the library in canvas. the library had a strong relationship with unf’s online learning support team and its administrators, which made this an easy process to pursue. what the oll and library director initially wanted was to add a simple link to the global navigation in canvas that would take all users to the library homepage. however, it became apparent that this was not possible due to the fact that this space is limited and many departments on campus would like greater visibility in canvas. the next option, which was easier to implement, was to add a link to the library homepage under the help menu in canvas. although this menu link was added, it was so hidden in canvas that the oll and library director felt that it would never be found in canvas by faculty, let alone students. 
cirt administrators advised the oll and library director of what other possibilities were available. after researching options, the library recommended creating access to library resources and services using a springshare lti tool for library guides, and cirt agreed. library guides, or libguides, are library webpages built with springshare software. using the lti tool seemed like a strong option since it would give the library more of a presence in addition to the help link to the library homepage. after approval from library administration and initial discussions with it, the project moved forward.

implementation

the project took about a year to complete from the time discussions began internally in the library to the time the integration went live (see figure 1).

figure 1. project timeline

a seamless entryway to the library seemed like a good idea based on observations of and feedback from students, but the oll and library director started by completing an environmental scan to see what other institutions had done and to get ideas on ways the unf library could integrate into canvas. they learned that it had been done in a variety of ways: integration of the library at the global navigation level, at the course level, and by an added link to the library under the help menu in canvas. it became clear that an integration into canvas was an obvious progression that would strengthen online learning and also give students the ability to benefit from the resources the library subscribed to in support of their curricular needs. conversations then occurred with unf's online learning support team to discuss integration options further. after much discussion, a decision was made to pursue an added link to the library website under the canvas help menu and a new lti tool at the course level. since canvas was used in so many courses, it was determined that university-wide campus committee agreement was needed on how to go about adding library guides to canvas courses. librarians were also approached at this time for their input and feedback. the goal seemed obvious to the librarians: when they were approached, buy-in to support students in canvas by way of the help button and lti tool integration was more than straightforward. for the librarians, the goal was to solve the problem of making sure that students could easily access library materials. overall, the library faculty's preference for the implementation was to embed the library website under the canvas help menu while also having the student resources libguide inside all canvas courses using the springshare lti tool. after all internal approvals were obtained, the link to the library was added under the canvas help menu. as for the springshare lti tool, it required more work and discussion before it could be implemented. after approval was granted from the unf online learning support team and the campus its security team, the integration began. configuration options for the lti tool were explored, and the systems and digital technologies librarian worked closely with the unf online learning support team and springshare to set up the libapp lti tool.
the first step was to configure springshare's automagic lti tool to automatically add libguides to courses in canvas. this included adding an lti tool name and description, which appeared in canvas during setup and in the course navigation. it was also decided, based on feedback from across campus, to set the student resources libguide as the default guide for all courses. instructors could request to use a different libguide for their course. to enable this, two parameters had to be set in the automagic lti tool to enable libguide matching between canvas and libguides:

• lti parameter name: for unf, this was set to the "context label" value, which selects the course code field in canvas.
• libguides metadata name: this was set to the appropriate value to identify the metadata field used in libguides.

if an instructor decided to change the default guide to another guide, these two parameters would need to be entered into the specific libguide's custom metadata so that canvas could link to the designated guide to display in a course. the change had to be made in the libguide itself, so it was handled by the systems and digital technologies librarian. not many instructors had requested this yet, but if the option were used, the library would also have to ensure the link carried over each semester by updating the metadata in the guide to the new course code. (an illustrative sketch of this matching logic appears at the end of this section.)

after the configuration was completed on the springshare side, the next step was to set up the integration in the canvas test environment. an external application had to be installed in canvas to allow the springshare lti tool to run. after it was tested, the application was applied across all courses and set to display by default, which the majority of faculty preferred. faculty who did not want to use the integration had the ability to manually turn it off in canvas. during the implementation setup, a few minor issues were encountered. after seeing what the student resources guide looked like in canvas, it became clear that the header and footer were not needed and just cluttered the guide; both were removed in the lti setup options to ensure a cleaner-looking guide. since the libguides were being embedded into another system (canvas), formatting of the guides had to be adjusted. the other issue encountered was trying to add available springshare widgets, such as the library hours or room booking content, to the guide using the automagic lti tool. while this was not successful, it was determined that the additional options were not needed. once the integration was set up in the canvas test environment, demonstrations were held and input was gathered from stakeholders through campus-wide meetings with faculty. it was critical to determine whether faculty would utilize libguides in their canvas courses. an overview of the integration and its benefits was given to the campus technology committee and distance learning committee faculty, along with a demonstration so that these faculty committees could see what the integration would look like in their courses. overall, the feedback obtained from the faculty was very positive. the preference was to have the configuration be opt-out, where the library guides would automatically display in canvas courses. many faculty members were excited about the integration and looked forward to having it in their courses.
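the matching itself happens inside springshare's tool, so the following python sketch is only an illustration of the behavior described above; the field names, guide urls, and course codes are hypothetical, not unf's actual configuration.

# schematic sketch only: illustrates the guide-matching behavior described above,
# not springshare's actual implementation. names and urls are hypothetical.

DEFAULT_GUIDE = "https://library.example.edu/student-resources"  # hypothetical default guide

# hypothetical mapping from a libguide's custom metadata value to its url;
# in practice this metadata lives in libguides, not in local code.
guide_metadata = {
    "ENC1101-2020-FALL": "https://library.example.edu/enc1101",
    "BSC2010-2020-FALL": "https://library.example.edu/bsc2010",
}

def choose_guide(lti_launch_params: dict) -> str:
    """return the guide url to embed for a canvas course.

    canvas is assumed to send the course code in the lti launch (here the
    'context_label' parameter, matching the configuration described above);
    if a guide's metadata matches, use it, otherwise fall back to the default.
    """
    course_code = lti_launch_params.get("context_label", "")
    return guide_metadata.get(course_code, DEFAULT_GUIDE)

# example launch payloads (hypothetical values)
print(choose_guide({"context_label": "ENC1101-2020-FALL"}))  # course-specific guide
print(choose_guide({"context_label": "ART1000-2020-FALL"}))  # falls back to the default guide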
after demos took place and final setup was completed based on feedback, the integration was then setup in the canvas production environment and was announced via newsletters, emails , and social media. as of the fall 2019 semester, the library's student resources guide was integrated into all courses in canvas (see figure 2). information technology and libraries june 2020 collaboration and integration | murray and feinberg 8 figure 2. student resources library guide in canvas benefits of the integration students are dependent on their campus lms in order to complete their coursework, support their studies, and in the case at unf, have easier access to the online campus. the libguide integration not only streamlined their way to library resources, but also promoted library usage from students that may not have known how to get to the resources available to them. for faculty it should be noted that they were able to replace the default student libguide with a more specific subject or course guide. either way, it brought more awareness to resources and services that supported curricular needs. the springshare chat widget in the guide also gave students the ability to communicate directly with a librarian from within canvas. this integration not only increased the library visibility in the online environment but enabled all students, whether inperson, hybrid, or online, with direct access to the resources they needed for their coursework. challenges of the integration although there were many benefits to integrate the library into canvas, there were many challenges with making the integration happen. there were many more stakeholders than expected. from library administration, to the canvas administrators, to library faculty, and teaching faculty committees, their input was needed prior to the project taking place. since the project grew organically, this meant that all of the stakeholders were brought in as the project grew or unfolded. once the project received approval from the library and cirt administrators, its administrators had to give the final approval in order to proceed with the integration of library guides. the process to implement the integration took some time to figure out. in addition, getting buy-in from the teaching faculty was key as the navigation options in their canvas courses would be impacted. making sure the faculty understood how it would assist their students was important as the goal was to help their students succeed with their coursework. information technology and libraries june 2020 collaboration and integration | murray and feinberg 9 a concern was if faculty would tell their students, or conversely, would students find the link to the libguide on their own? determining how the news of the library and canvas integration would be communicated to the unf community was the final step. the library director, oll, and cirt administrators needed to determine the best communication routes to get the unf community aware of this news. in effect, emails, unf updates/newsletters, and by word of mouth by teaching faculty. it was crucial that students be aware of these tools. this meant that going forward, unf would depend on word of mouth or student's curiosity in the canvas navigation bars themselves. discussion and next steps integrating the library with the unf’s learning management system, canvas, took much planning and collaboration, which was key to creating a more user-friendly learning environment for students. 
in reflecting on what went well and what did not, the unf librarians learned several important lessons that will help improve the implementation of future projects. to begin, it is important to identify and involve stakeholders early on, so they can provide feedback along the way. getting buy-in from the teaching faculty is also key since the integration affects the navigation options in their canvas courses. initially, the oll and library director did not realize how many groups of teaching faculty and departments would need to approve this canvas change and implementation. it was important to help them understand the importance of the integration and how it could assist their students with their coursework. considering the content of the library guides was important because of the impact it would have on canvas courses. for example, at the unf library some students thought that the librarian's picture on the default guide was in fact their professor and began to contact her, which caused much confusion for students and professors alike. along the way, communication is critical so that everyone is kept informed as the integration progresses; communicating at the appropriate times, and gathering all information about configuration options before starting conversations with stakeholders, is important too. finally, investigating the ability to retrieve usage statistics from day one would be extremely beneficial and would provide data to assess how often the library guides are being used in the lms and by whom. this information would help determine next steps and identify other potential integration opportunities. at unf, the librarians were not able to implement statistics as part of the integration, which has made it more difficult to assess the usage of the library guides in canvas. now that the integration has been completed, ensuring it continues to meet the needs of faculty and students will be important. feedback will need to be gathered from stakeholders to find out if they find the integration useful, if there are any issues being encountered, and/or if they have any recommendations for ways to enhance the integration. usage statistics will also be gathered as soon as they are available. this will provide information on which instructors are using the library guides in their courses and which are not. for those who have used them, it will be an opportunity to target those courses for instruction. for those who have not, it will be an opportunity to find out why and to make sure they are aware of the benefits of using the guides in their courses. exploring other integration possibilities, especially as the technology continues to evolve, will be important to ensure the library continues to reach students. while the natural progression of the unf integration would be to embed librarians in the canvas platform, others have faced challenges.
"according to the ithaka s & r library survey 2013 by long and schonfeld, 80–90 percent of academic library directors perceive their librarians' primary role as contributing to student learning while only 45–55 percent of faculty agree librarians contribute to student learning."24 even though this is a challenge, faculty collaboration with librarians is crucial for the embedded librarian role. without a requirement of embedded librarianship, marketing the librarians and what they can do for students will be essential for their role to be successful.25 at unf, conversations will have to be held to determine what other integrations would be of interest and possible at the university. the unf library will also be looking to improve the design and layout of library guides. now that their visibility has increased, it will be important to standardize them and ensure they all have a consistent look and feel, which will make it easier for students to find the information and resources they are looking for.

conclusion

in today's rapidly changing technological world, it is critical to make resources available regardless of where students are physically located. integrating the library's libguides into canvas not only brought more visibility to the library, its resources, and its services, but also brought the library to where students were engaged with the university. as noted by farkas, "positioning the library at the heart of the virtual campus seems as important as positioning the library as the heart of the physical campus."26 providing resources to students at their point of need enabled them to easily access the information they needed to succeed in their courses. it also allowed faculty to integrate the library resources most beneficial to their courses, enhancing their teaching as well as meeting the educational needs of their students. the unf library will continue to look at how library resources are used and how best to serve the online community going forward. it will be important to explore ways to enhance existing services with existing technology but also to look ahead and determine what may be possible down the road with new and upcoming technologies. in addition, assessing how the library connects to online learners and gathering feedback from students and faculty will be critical to contributing to the success of students.

endnotes

1 meredith gorran farkas, "libraries in the learning management system," american libraries tips and trends (summer 2015), https://acrl.ala.org/is/wp-content/uploads/2014/05/summer2015.pdf.
2 jeffrey pomerantz et al., "foundations for a next generation digital learning environment: faculty, students, and the lms" (jan 12, 2018): 1–4.
3 pomerantz et al., "foundations for a next generation digital learning environment."
4 farkas, "libraries in the learning management system."
5 farkas, "libraries in the learning management system."
6 farkas, "libraries in the learning management system."
7 farkas, "libraries in the learning management system."
8 robin camille davis, "the lms and the library," behavioral & social sciences librarian 36, no. 1 (jan 2, 2017): 31–35, https://doi.org/10.1080/01639269.2017.1387740.
9 liz thompson and davis vess, "a bellwether for academic library services in the future: a review of user-centered library integrations with learning management systems," virginia libraries 62, no. 1 (2017): 1–10, https://doi.org/10.21061/valib.v62i1.1472.
10 davis, "the lms and the library."
11 amanda clossen and linda klimczyk, "chapter 2: tell us a story: canvas integration strategy," library technology reports 54, no. 5 (2018): 7–10, https://doi.org/10.5860/ltr.54n5.
12 clossen and klimczyk, "chapter 2," 8.
13 clossen and klimczyk, "chapter 2," 8.
14 britt fagerheim et al., "extending our reach," reference & user services quarterly 56, no. 3 (2017): 180–88, https://doi.org/10.5860/rusq.56n3.180.
15 fagerheim et al., "extending our reach," 187.
16 fagerheim et al., "extending our reach," 188.
17 amanda clossen, "chapter 7: ongoing implementation: outreach to stakeholders," library technology reports 54, no. 5 (2018): 28.
18 clossen, "chapter 7," 29.
19 clossen, "chapter 7," 29.
20 susan s. karplus, "integrating academic library resources and learning management systems: the library blackboard site," education libraries 29, no. 1 (2006): 5, https://doi.org/10.26443/el.v29i1.219.
21 karplus, "integrating academic library resources and learning management systems."
22 karplus, "integrating academic library resources and learning management systems."
23 karplus, "integrating academic library resources and learning management systems."
24 beth e. tumbleson, "collaborating in research: embedded librarianship in the learning management system," reference librarian 57, no. 3 (july 2016): 224–234, https://doi.org/10.1080/02763877.2015.1134376.
25 tumbleson, "collaborating in research."
26 farkas, "libraries in the learning management system."

president's message
andromeda yelton

information technology and libraries | march 2018
andromeda yelton (andromeda.yelton@gmail.com) is lita president 2017–18 and senior software engineer, mit libraries, cambridge, massachusetts.

in my last president's message, i talked about change — ital's transition to new leadership — and imagination — wakanda and the archival imaginary. today change and imagination are on my mind again as lita contemplates a new path forward: potentially becoming a new combined division with alcts and llama. as you may have already seen on litablog (http://litablog.org/2018/02/lita-alcts-and-llama-document-on-small-division-collaboration/), the three divisional leadership teams have been envisioning this possibility, and all three division boards discussed it at midwinter. while the idea sprang out of our shared challenges with financial stability, in discussing it we've realized how much opportunity we have to be stronger together.
for instance, we've heard for years that you, lita members, want more of a leadership training pathway, and more ways to stay involved with your lita home as you move into management; alignment with llama automatically opens up all kinds of possibilities. they have an agile divisional structure with their communities of practice and an outstanding set of leadership competencies. and anyone involved with library technology knows that we live and die by metadata, but we aren't all experts in it; joining forces with alcts creates a natural home for people no matter where they are (or where they're going) on the technology/metadata continuum. alcts also runs far more online education than lita and runs a virtual conference. meanwhile, of course, lita has a lot to offer to llama and alcts. you already know how rewarding the networking is, and how great the depth of expertise on technology topics. we also bring strong publications (like this very journal), marquee conference programs (like top tech trends and the imagineering panel), and a face-to-face conference. (speaking of which, please pitch a session (http://bit.ly/2gpgxdf) for the 2018 lita forum!) i want to emphasize that no decisions have been made yet. the outcome of our three board discussions was that we all feel there is enough merit to this proposal to explore it further, but none of us are formally committed to this direction. furthermore, it is not practically or procedurally possible to make a change of this magnitude until at least 2019. in the meantime, we expect there will be numerous working groups to determine if and how this all could work, as well as open forums for the membership of all three divisions to express hopes, concerns, and ideas. personally, my highest priority is to ensure that you, the members, continue to have a divisional home: one that gives you learning opportunities and a place for professional camaraderie, and that is on solid financial footing so it can continue to be here for you in the long term. so, i'm excited about the possibilities that a superhero teamup affords, but i'm even more excited to hear from you. do you find this prospect thrilling, scary, both? do you think we should absolutely go this way, or definitely not, or maybe but with caveats and questions? please tell me what you think. you can submit anonymous feedback and questions at https://bit.ly/litamergefeedback. i will periodically collate and answer these questions on litablog. you can also reach out to me personally any time (andromeda.yelton@gmail.com).

public libraries leading the way
vr hackfest
chris markman, m ryan hess, dan lou, and anh nguyen

information technology and libraries | december 2019
chris markman (chris.markman@cityofpaloalto.org) is senior librarian – information technology & collections, palo alto city public library. m ryan hess (ryan.hess@cityofpaloalto.org) is library services manager — digital initiatives, information technology & collections, palo alto city public library. dan lou (dan.lou@cityofpaloalto.org) is senior librarian — information technology & collections, palo alto city public library.
anh nguyen (anh.nguyen@cityofpaloalto.org) is library specialist, information technology & collections, palo alto city public library.

we built the future of the internet…today! the elibrary team at the palo alto city library held a vr hackfest weaving together multiple emerging technologies into a single workshop. during the event, participants had hands-on experience building vr scenes, which were loaded to a raspberry pi and published online using the distributed web. throughout the day, participants discussed how these technologies might change our lives, for good and for ill. and afterward, an exhibit showcasing the participants' vr scenes was placed at our mitchell park branch to stir further conversation.

multiple emerging technologies explored

the workshop was largely focused around a-frame, a framework for publishing 3d scenes to the web (https://aframe.io/). however, we also integrated a number of other technologies, including a raspberry pi, qr codes, a twitter-bot, and the inter-planetary file system (ipfs), which is a distributed web technology.

virtual reality built with a-frame code

in the vr hackfest, participants first learned how to use a-frame code to render 3d scenes that can be experienced through a web browser or vr headset. a-frame is a new framework that web publishers and 3d designers can use to design web sites, games, and 3d art. a-frame is an extension of html, the code used to build web pages. anyone who is familiar with html will pick up a-frame very quickly, but it is simple enough even for beginners. for example, figure 1 shows some raw a-frame code.

figure 1. try this code example! https://tinyurl.com/ipfsvr02

save the code from figure 1 as an html file and open it with a webvr-compatible browser like chrome and you will then see a blue cube in the center of your screen. by just changing the values of a few parameters, novice coders can easily change the shape, size, color, and location of primitive 3d objects, add 3d backgrounds, and more. advanced users can also insert javascript code to make the 3d scenes more interesting. for example, in the workshop, we provided javascript that animated a 3d robot head (see figure 1) pre-loaded into the codepen (https://codepen.io) interface for quicker editing and iteration.

the inter-planetary file system (ipfs)

the collection of 3d scenes created in the vr hackfest was published to the internet using the inter-planetary file system (ipfs), an open source distributed web technology originally created in palo alto by protocol labs in 2014 and now actively improved by a global network of software developers. ipfs allows anyone to publish to the internet without a server, through a peer-to-peer network that can also work seamlessly with the regular internet through http "gateways." in november 2019, brave browser (https://brave.com) became the first to offer seamless ipfs integration, capable of spawning its own background process, or daemon, that can upload and download ipfs content on the fly without the need for an http gateway or a separate browser extension. unlike p2p technologies such as bittorrent, ipfs is best suited for distributing small files available for long periods of time rather than the quick distribution of large files over a short period of time.
this is an oversimplification of what is really happening behind the scenes (part of the magic involves content-addressable storage and asynchronous communication methods based on pub/sub messaging, to name a few), but the ability to share and publish 3d environments and 3d objects in a way that can instantly scale to meet demand could have far-reaching consequences for future technologies like augmented reality. (a minimal sketch of this publishing workflow appears at the end of this section.)

figure 2. workshop attendees.

ipfs can load content much faster and more securely (through features like automated cryptographic hash checking), and it allows people to publish directly to the internet without the need for a third-party host. google, facebook, and amazon web services need not apply. the same technology has already been used to overcome censorship efforts by governments, but like any technology it has its downsides. content on ipfs is essentially permanent, allowing free speech to flourish, but it could also make undesirable content, like hate speech or child pornography, all but impossible to control.

toward 21st century literacy

like our other technology programs, the vr hackfest was designed to engage customers around new forms of literacy, particularly around understanding code and thinking critically about emerging communication technologies. in 2019, we are already seeing how technologies like machine learning and social media are impacting social relations, politics, and the economy. it is no longer enough to know how to read and write the code that underlies the web. true literacy also requires understanding how these technologies interface with each other and how they impact people and society.

figure 3. the free-standing exhibit.

to this end, the vr hackfest sought to take participants on a journey, both technological and sociological. once the initial practice with the code was completed, we moved on to a discussion of the consequences of using these technologies. with the distributed web, for example, we explored questions like:

• what are the implications of permanent content on the web which no one can take down?
• what power do gatekeepers like the government and private companies have over our online speech?
• what does a 3d web look like and how will that change how we communicate, tell stories, and learn?

after the workshop ended, we continued the conversation with the public through an exhibit placed at our mitchell park branch (see figure 3). in this exhibit, we showcased the vr scenes participants had created and introduced the technologies underlying them. but we also asked people to reflect on the future of the internet and to share their thoughts by posting on the exhibit itself. public comments reflected the current discourse around the internet. responses (see figure 5) were generally positive—most of our customers mentioned better download speeds or other efficiency increases, but a few also highlighted online privacy and safety improvements. we recorded an equal number of pessimistic and technical responses to the same question; these often demonstrated either knowledge of similar technology (e.g., "how is this different than napster?") or displeasure with the current state of the world wide web (e.g., "less human connections" or "more spyware and less freedom").
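as a minimal sketch of the publishing workflow described above, the following python snippet assumes the go-ipfs command-line client is installed and a local daemon is running; the scene filename and gateway urls are illustrative and are not the workshop's actual tooling.

# minimal sketch, assuming the "ipfs" cli is installed and a daemon is available.
# filenames and gateways below are illustrative only.
import subprocess

def publish_to_ipfs(path: str) -> str:
    """add a file to ipfs and return its content identifier (cid)."""
    # "ipfs add -Q" prints only the final hash of the added content
    result = subprocess.run(
        ["ipfs", "add", "-Q", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

cid = publish_to_ipfs("scene.html")  # e.g., a participant's a-frame scene
print("local daemon gateway: http://127.0.0.1:8080/ipfs/" + cid)
print("public gateway: https://ipfs.io/ipfs/" + cid)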
outcomes

one surprise outcome was that our project attracted the attention of the developers of ipfs, who happen to live a few blocks away from the library. after reading about the exhibit online, their whole team visited our team at the library. in fact, one of their team turned out to be a former child customer of our library! the workshop itself, which was featured as a summer reading program activity, also brought in record numbers. originally open to 20 participants and later expanded to 30, the workshop grew a waitlist that more than quadrupled our initial room capacity. clearly, people were interested in learning about these two emerging technologies. we also want to take a moment to highlight the number of design iterations this project went through before making its way into the public eye. the free-standing vr hackfest exhibit was originally conceived as a wall-mounted computer kiosk that encouraged users to publish a short message directly to the web with ipfs, but this raised too many privacy concerns, and ultimately our building design does not make mounting a computer on the wall an easy task. our workshop also initially focused much more on command-line skills working directly with ipfs, but user testing with library staff showed that learning a-frame was more than enough.

figure 4. building the exhibit.
figure 5. exhibit responses.
figure 6. visit from protocol labs co-founders.

the vr hackfest was also a win because it combined so many different skills into a single project. we were not only working with open source tools and highlighting new technologies, but also building an experience for workshop attendees and showcasing their work to thousands of people.

future work

our immediate plans include re-use of the exhibit frame for future public technology showcases and offering another round of vr hackfest workshops, perhaps in a smaller group so participants have the chance to view their work while wearing a vr headset.

figure 7. 3d mock-up.

beyond this, we also think libraries have the opportunity to harness the distributed web for digital collections, potentially undercutting the cost of alternative content delivery networks or file hosting services. through this project we have already tested things like embedding ipfs links in marc records and building a 3d object library. essentially, all the pieces of the "future web" are already here, and it is just a matter of time before all modern web browsers offer native support for these new technologies. in general, our project demonstrated the popularity of 21st-century literacy programs. but it also demonstrated the significant technical difficulties of conducting cutting-edge technology workshops in public libraries. clearly, the demand is there, and our library will continue to strive to re-imagine library services.
public libraries leading the way
how covid affected our python class at the worcester public library
melody friedenthal

information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.14041
melody friedenthal (mfriedenthal@mywpl.org) is a public services librarian, worcester public library. © 2021.

in june 2020, ital published my account of how the worcester public library (ma) came to offer a class in python programming and how that class was organized. although readers may have read the article in the middle of our covid-year, i wrote it mostly in early january 2020, before libraries across the country closed in an effort to protect staff and patrons from the disease. from spring 2020 through april 2021, i taught intro to coding: python for beginners five more times. but, of course, these classes were not face-to-face. like virtually all other library, musical, political, religious, and cultural programming across the world, our python course was taught virtually. the public services team has one professional zoom account, which my colleagues and i share. how did going remote affect this class? it depends on whether your perspective is that of a student or that of the instructor. many of us have read how difficult it's been for teachers to effectively reach their elementary-through-college-age students. i've had many of the same challenges, but since nearly all my students are adults and they all chose to take this class, i don't need to grapple with fidgety kids or recess. on the other hand, there were few distractions in our computer lab, while covid-time students have to grapple with pets, children squabbling, or noise from a tv. i was teaching from my home office. at the library i have one monitor but at home i have two, which makes it easier for me to spread out my assorted documents. to "protect" my students from seeing my messy house, i used a virtual background, one chosen not to distract. however, the software which determines the borders of a human presenter isn't perfect, and there is sometimes a halo around my head of the things behind me; this may itself be distracting. prior to covid, since we had twelve seats in the computer lab, we limited registration to fourteen, allowing for some no-shows (and we have two spare laptops, in case everyone showed up). a week prior to session one i would email the registrants, asking them to confirm their continued interest. if a student didn't confirm, i'd give their seat to someone on the waitlist. while i was not prepared to make my class a mooc (massive open online course) because i individually review homework and give lots of feedback, we did increase maximum registration to fifteen since the number of seats in the computer lab was no longer a limiting factor. and, as before, i ask for confirmation via email, but i also include in that email two links and an attached word doc. the document is an excerpt from cory doctorow's novel little brother on the joys of coding. the first embedded link leads to the free version of zoom. the second link is to the thonny website (https://thonny.org). thonny is a free ide (integrated development environment) where students can write and execute python code.
we used thonny when i taught face-to-face, but the lab computers all had thonny installed, and were ready for students to use. now, i have to depend on the ability of students to download the software to their own computers. i ask students to do the two downloads ahead of session one. which brings us to two problems: the class was no longer accessible to students who live in a household without a computer and internet service. and, as i found out with one prospective student, it’s not accessible to patrons who don’t have administrative rights to their computer; that is, the ability to download new software. when a patron confirms their interest, i email them the course manual. it now contains about 93 pages. i told students they might choose to print it but doing so is up to individual preference. the advantage of having a digital copy is that students can search for keywords easily. the disadvantage is that the cost of printing the manual is shifted to the student and may be prohibitive for some. in session one, i acknowledged that it’s difficult to learn technical material via zoom, and i encouraged everyone to ask questions during class and to email me if they are stymied while working on the homework. i reiterated that invitation during every session. while teaching, i bounce back-and-forth between screen-sharing my thonny window and the manual, while trying to keep an eye on the little zoom windows showing my students. some students cannot or choose not to turn on their video. this is a problem for me, since i can’t readily determine who’s asking a question. moreover, it is helpful to associate a face with a name. and since i give out a certificate of completion to each student who does the homework and attends all sessions, i want to make sure the student is actually taking part. i’ve had students who sign in, leave their camera off, and then apparently leave (i call on students by name and sometimes the no-video ones never respond). offering the class online has advantages in snowy worcester. students can tune in from the comfort of their own homes, avoid the slick roads, bypassing paying for parking at the municipal lot next to our building or for a bus to downtown, or the discomfort of walking in a dark citycenter in the evening. another plus: as program organizers and program participants have discovered, with videoconferencing we are no longer limited geographically. i had registrants who live in pennsylvania and georgia. as always, students range from total beginners to experienced programmers-of-other-languages. i’ve thought about how i can give extra time to the former while not boring the latter. one thing i’ve done is to make some assignments optional and say, “if you want an extra challenge, give this a try….” i’ve slowed the class down a bit, leaving more time for coding during each session. if a student has difficulties, i invited them to share their screen. this pedagogical technique actually works better information technology and libraries december 2021 how covid affected our python class | friedenthal 3 via zoom than in-person, because we could all see that screen equally well. in the computer lab, only the student who sat at the same (2-person) desk could easily see what the other person had coded. another thing i’ve done is to ratchet down the formality of the class: i am chattier and demo fun games i’ve written, e.g., hangman, tic-tac-toe, rock-paper-scissors, and you sunk my battleship, for inspiration. 
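for readers curious what those short demos look like, here is a sketch of the kind of beginner-level game mentioned above; it is illustrative only and not the class's actual code.

# illustrative only, not the class's actual code: a short beginner-level
# rock-paper-scissors demo of the kind described above.
import random

choices = ["rock", "paper", "scissors"]
beats = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

player = input("rock, paper, or scissors? ").strip().lower()
computer = random.choice(choices)
print("the computer chose", computer)

if player not in choices:
    print("please type rock, paper, or scissors")
elif player == computer:
    print("tie!")
elif beats[player] == computer:
    print("you win!")
else:
    print("you lose!")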
i experimented with using the built-in zoom whiteboard but that wasn't satisfactory, so i wrote supplementary notes as comments in thonny. parents were fearful their kids were not being intellectually challenged when schools were closed due to the pandemic, so maybe i shouldn't have been surprised that the april 2021 class contained seven children. there would have been an eighth, but when i realized one registrant was just seven years old, i told his mother that, while she was the best judge of her son's abilities, i discouraged him from taking the class. she decided to take it herself.

figure 1. a word-cloud of our fall 2020 project outcome evaluations (includes other digital learning programs).

at our sixth and final session i traditionally execute a program which draws colorful graphics, rather like spirograph. students were able to see each curve being drawn in a new window launched by the ide, but this window doesn't exist until i execute the program. while we were using zoom, when i attempted to share my screen the students missed the first graphics, no matter how fast i was at screen-sharing. i made the program "sleep" for a few seconds to give me time to switch screens before the graphics were drawn. a larger percentage of students earned the certificate of completion during the virtual classes than on average in the in-person pre-covid classes, perhaps 75% vs. 40%. for the in-person classes our communications officer printed the certificates on heavy paper adorned with the wpl logo; i signed each and handed them out during the final session. for our virtual classes, the certificates were digitally signed and then emailed; students could print them if they chose. this follow-up is being written during october 2021, and with a substantial percentage of massachusetts residents vaccinated for covid, the worcester public library is now back to offering many programs in person, including python. the city of worcester requires mask use in all municipal buildings, and while some patrons don't cooperate, i've told my students that anyone not wearing a mask properly will be asked to leave the computer lab. with so many people out of work due to the economic devastation wrought by covid, we were gratified to be able to offer a class that teaches in-demand skills, especially ones that can be applied in a work-from-home environment.

article
a 21st century technical infrastructure for digital preservation
nathan tallman

information technology and libraries | december 2021
https://doi.org/10.6017/ital.v40i4.13355
nathan tallman (ntt7@psu.edu) is digital preservation librarian, pennsylvania state university. © 2021.

abstract

digital preservation systems and practices are rooted in research and development efforts from the late 1990s and early 2000s when the cultural heritage sector started to tackle these challenges in isolation. since then, the commercial sector has sought to solve similar challenges, using different technical strategies such as software-defined storage and function-as-a-service. while commercial sector solutions are not necessarily created with long-term preservation in mind, they are well aligned with the digital preservation use case. the cultural heritage sector can benefit from adapting these modern approaches to increase sustainability and leverage technological advancements widely in use across fortune 500 companies.
introduction most digital preservation systems and practices are rooted in research and development efforts from the late 1990s and early 2000s when the cultural heritage sector started to tackle these challenges in isolation. since then, the commercial sector has sought to solve similar challenges, using different technical strategies. while commercial sector solutions are not necessarily created with long-term preservation in mind, they are well aligned with the digital preservation use case because of similar features. the cultural heritage sector can benefit from adapting these modern approaches to increase sustainability and leverage technological advancements widely in use across fortune 500 companies. in order to understand the benefits, this article will examine the principles of sustainability and how they apply to digital preservation. typical preservation activities that use technology will be described, followed by how these activities occur in a 20th-century technical infrastructure model. after a discussion on advancements in the it industry since the conceptualization of the 20thcentury model, a theoretical 21st-century model is presented that attempts to show how the cultural heritage sector can employ industry advancements and the beneficial impact on sustainability. galleries, libraries, archives, and museums cannot afford to ignore the sustainability of managing and preserving digital content and neither can distributed digital preservation or commercial service providers.1 budgets lag behind economic inflation while the cost of and amount of materials to purchase rises, coupled with the need to hire more employees to do this work. if digital preservation programs are going to scale up to enterprise levels and operate in perpetuity, it is imperative to update technical approaches, adopt industry advancements, and embrace cloud technology. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 2 sustainability for digital preservation programs to succeed, they must be sustainable per the triple bottom line or they risk subverting their mission. the triple bottom line definition of sustainability identifies three pillars: people (labor), planet (environmental), and profit (economic).2 while there are typically few people with digital preservation in their job title within an organization, it’s a collaborative domain with roles and responsibilities distributed throughout organizations, reflecting the digital object lifecycle. it’s important that the underlying technical infrastructure can easily be supported and is not so complicated that it is hard to recruit systems administration staff. digital preservation consumes many technical resources and data centers have a substantial environmental impact. as ben goldman points out in “it’s not easy being green(e),” data centers consume an immense amount of power and require extravagant cooling systems that use precious fresh water resources.3 because there is no point in preserving digital content if there will be no future generation of users, responsible digital preservation programs will seek to reduce carbon outputs and the number of rare-earth elements in our technical infrastructure.4 while cultural heritage organizations rarely seek to make a profit, economic sustainability is vital to organizational health and costs for digital preservation must be controlled. 
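to make the distinction above concrete, the following python sketch computes a crc32 value of the sort suited to transactional fixity and a sha-256 digest suited to authentication fixity, using only the standard library; the file paths are hypothetical.

# illustrative sketch of the two kinds of fixity values discussed above,
# using python's standard library. file paths are hypothetical.
import hashlib
import zlib

def crc32_of(path: str) -> str:
    """cheap, non-cryptographic checksum suited to transactional fixity."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")

def sha256_of(path: str) -> str:
    """cryptographic digest suited to authentication fixity over the long term."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# transactional check after a copy within a trusted system (hypothetical paths)
assert crc32_of("/ingest/master.tif") == crc32_of("/storage/master.tif")

# authentication value recorded at ingest and re-verified at a later date
print(sha256_of("/storage/master.tif"))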
modern technological infrastructures discussed here will help to increase sustainability by using widespread technologies and strategies for which support can be easily obtained, by reducing energy consumption, by minimizing reliance on hardware using rare-earth elements, and by leveraging advances in infrastructure components such as storage to perform digital preservation activities. basic digital preservation activities this paper will examine technical preservation activities and the author acknowledges that basic digital preservation activities are likely to include risk management and other non-technical concepts. while there is no formal, agreed-upon definition of what constitutes a set of basic digital preservation activities, bit-level digital preservation is a common baseline. bit-level digital preservation seeks to preserve the digital object as it was received, ensuring that you can get out an exact copy of what you put in, no matter how long ago the ingest occurred; however, with no guarantees as to the renderability of said digital object. two basic digital preservation activities are key to this strategy: fixity and replication. fixity fixity checking, or the “practice of algorithmically reviewing digital content to ensure that it has not changed over time,” is a foundational digital preservation strategy for verifying integrity that aligns with rosenthal et al.’s “audit” strategy.5 fixity is how preservationists demonstrate mathematically that the content has not changed since it was received. not all fixity is the same, however; fixity can be broken up into three types: transactional fixity, authentication fixity, and fixity-at-rest.6 transactional fixity transactional fixity is checked after some sort of digital preservation event7, such as ingest or replication. depending on the event, it’s desirable to use a non-cryptographic algorithm, such as crc32 or md5, when files move within a trusted system. when it’s only necessary to prove that a file hasn’t immediately changed, such as copying between filesystems, cryptographic algorithms are unnecessarily complex and are too expensive, in terms of compute consumption. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 3 authentication fixity authentication fixity proves that a file hasn’t changed over a long period of time, particularly since ingest. although one could use a chain of transactional fixity checks to cumulatively prove there has been no change, it’s often desirable to conduct one fixity check that can be independently verified. unbroken cryptographic algorithms, such as one from the sha-2 and sha-3 families, are well suited to this use case and worth the complexity and compute expense, particularly since this type of fixity check doesn’t have to be run as often. fixity-at-rest fixity-at-rest is when fixity is monitored while content is stored on disk. while some organizations may choose to only conduct fixity checks when files move or migrate, this strategy can miss bit loss due to media degradation, software or human error, or malfeasance that is only discovered when the file is retrieved.8 a common approach for monitoring fixity-at-rest is to systematically conduct fixity checks on all or a sample of files at regular intervals. these types of fixity checks may or may not use cryptographic algorithms, depending on their availability.9 replication replication is another cornerstone of achieving bit-level digital preservation. 
the national digital stewardship alliance’s 2019 levels of digital preservation, a popular community standard, recommends maintaining at least two copies in separate locations, while noting three copies in geographic locations with different disaster threats is stronger.10 all of these copies must be compared to ensure fixity is maintained. an important concept to consider when thinking about replication is the independence of each copy. according to schaefer et al.’s user guide for the preservation storage criteria, “the copies should exist independently of one another in order to mitigate the risk of having one event or incident which can destroy enough copies to cause loss of data.”11 in other words, a replica should not depend on another replica, but instead depend on the original file. advanced digital preservation activities when considering more robust digital preservation strategies beyond bit-level preservation, additional activities must be considered to ensure that the information contained within digital files can be understood. implementation of these activities may vary by digital object, depending on the digital preservation goal and appraised value of the content. this paper only describes a handful of the many advanced digital preservation activities as illustrative examples; the ideas in this paper could be applied to most advanced activities. metadata extraction digital files often contain various types of embedded metadata that can be used to help describe both its intellectual content and technical characteristics. this metadata can be extracted and used to populate basic descriptive metadata fields, such as title or author. extracted technical metadata is useful for broader preservation planning, but also for validating technical characteristics in derivative files. for example, if generating an access file for digitized motion picture film, it’s necessary to know the color encoding, aspect ratio, and frame rate. if these details are ignored, the access derivative may appear significantly different than the original file and give a false impression to users. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 4 file format conversions file format conversions help to ensure the renderability of digital content. there are two types of file format conversions to consider: normalization and migration. normalization generally refers to proactively converting file formats upon ingest to retain informational content, e.g., converting a wordperfect document to plain text or pdf when only the informational content is desired. migration may occur at any time: upon ingest, upon access, or any time while an object is in storage. migration occurs when file formats are converted to a newer version of the same format, e.g., microsoft access 2003 (mdb) to microsoft access 2016 (accdb) or to a more stable and open format that retains features, e.g., microsoft access 2016 (accdb) to sqlite. versioning versioning, or the retention of past states of a digital object with the ability to restore previous states, is complex to implement and not always necessary. an organization might choose to apply versioning to subsets of digital content, such as within an institutional repository, but not for born-analog, or digitized material. additionally, an organization may choose to version metadata only, ignoring changes to the bitstream, such as for born-analog digital objects. figure 1. 
the infrastructure architecture for a typical 20th-century stack. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 5 the 20th century technical infrastructure the technical infrastructure that enables digital preservation can come in many forms. while technology has advanced over the past thirty years, the cultural heritage sector, particularly where digital preservation is concerned, has been slow to adapt. below are descriptions of three common components of a typical server stack (technical infrastructure), though the author acknowledges that some organizations have already moved past this model. figure 1 is a diagram of the typical 20th-century stack. storage storage, at the core of digital preservation, has benefitted from rapid technological advancement since computers first started storing information on punch cards and magnetic media. twentiethcentury servers often use three main types of storage: file, block, and object. file storage file storage is what most people are familiar with. a filesystem interfaces with the underlying storage technology (block or object) and physical media (hard disk drives, solid state drives, tapebased media, or optical media) to present users with a hierarchy of directories and subdirectories to store data. this data can easily be accessed by users or applications using file paths, while the filesystem negotiates the actual bit-locations on the physical media. the choice of filesystem can impact data integrity (fixity), although choice may be limited by operating system. in the 20th century, journaling filesystems offered the most data protection as the filesystem keeps track of all changes; in the event of a disk failure, it’s possible to recover more data if a journaling filesystem is used. block storage block storage uses blocks of memory on physical media (disk, tape, etc.) that are managed through a filesystem to present volumes of storage to the server. all interactions between server and storage are handled by the filesystem via file paths, though the data is stored on scattered blocks on the media. block storage directly attached to a server is often the most performant option, the data does not travel outside the server. network attached storage, in which an external file system is mounted to the server as if it were locally attached block storage, requires data to travel through cables and networks before it gets to the server, which decreases performance. object storage object storage, which still uses tape and disk media, is an abstraction on top of a filesystem. instead of using a filesystem to interact directly with storage media, the storage media is managed by software. the software pools storage media and interactions happen through an api, with files being organized into “buckets” instead of using a filesystem with paths. object storage is webnative and the basis for commercial cloud storage. software-defined storage, which is discussed in more detail later in this article, also allows users to create block storage volumes that can be directly mounted to virtual servers as part of a filesystem or to create network shares that present the underlying storage to users via a filesystem.12 both block and object storage can be used for high-performance storage, hot storage (online), cold storage (nearline), and offline storage. generally, tape and slower performing hard disks are used for offline and nearline storage; faster performing hard disks are used for online storage. 
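as an illustration of the bucket-and-key interaction described above for object storage, the sketch below uses the widely deployed s3-style api through the boto3 python client; the endpoint, credentials, bucket, and key are placeholder assumptions and do not refer to any specific vendor or to the systems discussed in this article.

import boto3

# connect to an s3-compatible object store; endpoint and credentials are placeholders
s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.example.org",
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
)

# store a file in a bucket under a key rather than a filesystem path
with open("master.tif", "rb") as data:
    s3.put_object(Bucket="preservation-masters", Key="collection-a/item-42/master.tif", Body=data)

# retrieve it later by the same key; no client-side filesystem hierarchy is involved
response = s3.get_object(Bucket="preservation-masters", Key="collection-a/item-42/master.tif")
content = response["Body"].read()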
solid-state drives (ssds) using non-volatile memory express (nvme) protocols are best suited for high-performance storage.13 in the 2019 storage infrastructure survey, by the national digital stewardship alliance, 60% of those aware of their organizational infrastructure reported a reliance on hardware-based filesystems (file and block storage) while about 15% used software-based filesystems (object storage), with 14% reporting a hybrid approach.14 this indicates that the cultural heritage sector continues to rely more on file and block storage and is not yet fully embracing object storage. the survey did not probe into why this might be. servers: physical and virtual twentieth-century technical infrastructures relied primarily upon physical servers. physical servers, also called bare metal, dominated the server landscape up through roughly 2005. virtual servers arrived on the scene after "vmware introduced a new kind of virtualization technology which … [ran] on the x86 system" in 1999.15 server virtualization facilitated a fresh wave of innovation by making it easier and less expensive to create, manage, and destroy servers as necessary. dedicating physical servers to one or a limited number of applications requires more resources and expends a higher carbon cost; virtual servers can be highly configured for their precise needs and this configuration can be changed using software, rather than changing parts on a physical server, resulting in less waste. cultural heritage organizations have been slow to fully adopt virtual servers. the 2019 ndsa storage infrastructure survey reports that 81% of respondents continue to rely on physical servers with 63% of respondents using virtual servers. fewer than 10% reported using containers, an even more efficient virtualization technology.16 containers are an evolution of virtual servers that act like highly optimized, self-contained servers doing a specific activity.17 applications and microservices in the 20th century, applications often required dedicated servers. business logic was handled by applications or microservices that ran on top of the server and storage, the highest level in the stack. there are advantages to handling the business logic at this high level: it's completely in the control of the developer and can be finely tuned to the application's needs. unfortunately, this is also an expensive place to handle all business logic as the application needs to be maintained over time and there's overhead involved in working at this level of the stack. microservices, in this server model, are generally specific commands that can be invoked as needed. while called microservices because they can be applied individually, they still run in this expensive part of the stack and have the same downsides as applications. in digital preservation systems using this type of architecture, basic and advanced digital preservation activities occur within this application layer. fixity can be a costly activity.
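the cost becomes easier to see in a sketch of the kind of whole-repository checksum sweep that an application or microservice in this layer typically performs: every byte of every file must be read back and hashed, whether or not anything has changed. the root directory and algorithm below are illustrative assumptions.

import hashlib
from pathlib import Path

def fixity_sweep(root, algorithm="sha256"):
    # read every file under the repository root and calculate a checksum for each;
    # i/o and cpu cost grow linearly with the total volume of stored content
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.new(algorithm)
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        results[str(path)] = digest.hexdigest()
    return results

comparing these values against stored preservation metadata reveals silent change, but only at the price of re-reading the entire holdings on every sweep.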
garnett, winter, and simpson, in their paper "checksums on modern filesystems, or: on the virtuous consumption of cpu cycles," point out that "calculating full checksums in software is not efficient" and "increases wear and tear on the disks themselves, actually accelerating degradation."18 fixity, when done this way, is a linear process that requires every file to be read from disk so a checksum can be calculated; when performing fixity over large amounts of content, this is very inefficient and time consuming. preservation activities in the 20th-century stack in this model of infrastructure, many cultural heritage institutions are relying on practices created when the field of digital preservation was emerging. basic activities basic preservation activities take a generalized approach and mostly occur in the costly application and microservices layer. this follows the general approach of application development from the commercial sector in the 20th century. fixity although there are differences in frequency, most organizations do not currently make distinctions between transactional fixity, authentication fixity, or fixity-at-rest. common current practices use the same method (md5, sha-256, sha-512) for all fixity checks.19 this inefficient approach takes place in the application and microservices layer and uses more compute power than necessary, increasing the environmental impact. replication in most 20th-century stacks, replication is handled in the application layer, where it is most costly in terms of computational power and labor to maintain, which has a negative impact on sustainability. some systems use 20th-century microservices as well. advanced digital preservation activities like basic preservation activities, advanced ones chiefly take place in the application and microservices layer if they occur at all. metadata extraction and file format conversion metadata extraction and file format conversion tend to occur only upon ingest as a one-time event. archivematica, the popular open-source digital preservation system, uses 20th-century microservices for each, and they only occur during the transfer (ingest) process.20 other systems often include this in the business logic of the application layer. versioning version control is a feature that many organizations choose not to implement. the 2019 ndsa storage infrastructure survey shows that fewer than half of respondents (40 of 83) used any type of version control.21 version control is hard to implement in a custom system, though alternative approaches exist. fedora, a digital preservation backend repository, introduced support for versioning in the application layer around 2004.22 advances in the commercial sector since the conceptualization of the 20th-century stack, there have been significant advancements made in the general it industry. virtualization technology developed in the 1990s led to the proliferation of cloud computing and infrastructure that transformed the it industry in the early 2000s, leading to the "long-held dream of computing as a utility" or commodity.23 clouds can be public, where anyone is able to provision and use services, or private, where services are only available to a group of authorized users. public clouds are run in commercial data centers while private clouds are typically built in privately owned data centers, though it's possible to use commercial data centers to build private clouds.
hybrid clouds are also possible, typically combining private and public clouds, or combining on-premises infrastructure with a private or public cloud. in 2009, researchers at uc berkeley identified three strong reasons why cloud computing has been so widely adopted: the illusion of vertical scaling on demand, elimination of upfront cost, and the ability to pay for short-term resources.24 surveys from the ndsa and the beyond the repository grant project show a steady but slow adoption of cloud infrastructure by the cultural heritage community.25 it is unclear whether early adopters have chosen independently or simply followed it changes in their parent organizations. any organization can build a private cloud and take advantage of the benefits described in this article. using the cloud does not mean that you must contract with commercial cloud providers. some organizations may choose to build a private cloud if there are concerns over data sovereignty, mistrust in public clouds, or for other reasons. the ontario council of university libraries in canada has built a private cloud for its members called the ontario library research cloud using openstack, a suite of open-source software for building clouds.26 software-defined storage while virtualization enables cloud computing, software-defined storage is the foundation for cloud storage. software-defined storage combines inexpensive hardware with software abstractions to create a flexible, scalable storage solution that provides data integrity.27 software-defined storage can use the same pool of disks to present all three of the common types of storage: file, object, and block. file storage is what most users are familiar with. software-defined file storage creates a network file share from which files can be accessed on local devices via a filesystem.28 object storage in this environment is like a web-native file share; files are stored in buckets, which can be further organized by folders. files are not accessed through a filesystem, but are instead accessed through uris, which makes object storage very amenable to web applications and avoids some of the pitfalls of relying on filesystems. block storage is mostly used to mount storage to virtual servers, storage that is directly attached to the server as if it were a physical disk or volume mounted to the server. block storage is more performant than either file or object storage; as such it's typically used for things like the operating system and application code, but not for storing content. all storage can be managed through apis, adding to its suitability for automation, software development, and it operations.29 hardware diversity software-defined storage also has features that make a compelling use case for digital preservation. first, software-defined storage accommodates hardware diversity. because software-defined storage is an abstraction, it's possible to combine different types of storage media, from different manufacturers and production batches, to ensure some technical diversity and avoid the risk of catastrophic failure from a hardware monoculture. fixity and integrity second, like the use of raid in traditional filesystems, file integrity can be strengthened through the use of erasure coding.30 erasure coding splits files into chunks and spreads them across multiple disks or potentially nodes such that the file can be reconstructed if some of the disks or nodes fail.
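the principle can be illustrated with the simplest possible scheme, a single xor parity chunk; production software-defined storage uses more sophisticated codes (reed-solomon, for example) that tolerate multiple simultaneous failures, so the python sketch below is only a toy model of the idea.

def encode_with_parity(data, k=4):
    # split the content into k equal-size data chunks plus one xor parity chunk;
    # any single lost chunk can then be rebuilt from the k that survive
    size = -(-len(data) // k)  # ceiling division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    parity = bytes(size)
    for chunk in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, chunk))
    return chunks, parity

def rebuild_chunk(chunks, parity, missing_index):
    # xor the parity with the surviving chunks to recover the missing one;
    # in practice the missing chunk would be absent, here it is skipped by index
    rebuilt = parity
    for i, chunk in enumerate(chunks):
        if i != missing_index:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, chunk))
    return rebuilt

choosing how many data and parity chunks to use is the configuration decision discussed next.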
this can be configured in different ways, depending on the amount of parity desired.31 replication third is replication of content. for cloud administrators, replication might be an alternative to erasure coding for ensuring data integrity; for digital preservationists, it's a distinct strategy and basic preservation activity. operating nodes in a software-defined storage network can be in different availability zones; through object storage policies, content can be replicated as many times as necessary to provide mitigation of geographically based threats. it's even possible to replicate to object storage in a different software-defined storage network, helping to achieve organizational diversity as well. figure 2. the infrastructure architecture for a theoretical 21st-century stack. an updated technical infrastructure for the 21st century a theoretical 21st-century stack for digital preservation has many of the same components as its 20th-century antecedent. however, these components are used in different ways, largely due to technological advancements. leveraging these advancements to handle digital preservation activities at lower levels of the stack reduces the complexity of the business logic in the application layer. figure 2 shows an updated architecture diagram for this 21st-century stack, which may be used by an individual organization, consortium, or service provider planning to build a digital preservation system. the storage layer is built on software-defined storage with digital content primarily being stored as objects; these objects are stored using the oxford common file layout (discussed further later). physical bare metal servers are used to power virtual machines that host applications such as a digital repository. physical servers also host container and function-as-a-service platforms to provide a suite of microservices for processing digital content. storage in this new stack, storage is primarily managed through software-defined storage with data flowing over networks. there are currently two primary open-source options for running a software-defined storage service: gluster and ceph. both can be installed and run on-premises, in a private or public data center, or even contracted through infrastructure as a service (iaas). in his presentation at the 2018 designing storage architectures for digital collections meeting, hosted by the library of congress, glenn heinle recommended ceph over gluster where data integrity is the highest priority; however, others argue that gluster is better for long-term storage.32 this is likely because ceph is better able to recover from hardware failures.33 file storage reliance on file storage has become minimal in this theoretical stack, with data primarily stored as objects. however, file storage may still be used; when it is, it benefits from using a modern filesystem. several modern filesystems have emerged since 2000, most notably zfs and openzfs34 with their innovative copy-on-write transactional model and methods for managing free space.35 both zfs and openzfs can also be configured to use raid-z, which maintains block-level fixity by calculating checksums for each block of data and verifying the checksum when accessed. this can be combined with simple software to touch every block on a regular basis to ensure fixity-at-rest.
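the "simple software" mentioned above could be as small as a scheduled job that reads every file back in full, which forces a checksumming filesystem such as zfs or openzfs to verify each block it returns (zfs pools can also be scrubbed natively). the python sketch below is an assumption about how such a job might look, not a feature of any particular filesystem.

from pathlib import Path

def scrub_by_reading(root):
    # read every file in full; on a checksumming filesystem each read verifies the
    # per-block checksums, and a read error points to a block that could not be
    # verified or repaired
    bytes_read = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            while True:
                chunk = f.read(1 << 20)
                if not chunk:
                    break
                bytes_read += len(chunk)
    return bytes_read

run from a scheduler at a regular interval, the outcome of each pass can be recorded as a fixity-at-rest event in preservation metadata.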
although this is a different approach than file-level fixity checks, it accomplishes the same thing in a much more efficient way: preservation metadata could be recorded for each block that contains part of the file.36 zfs has also inspired similar modern filesystems such as btrfs, apple file system (apfs), refs, and reiser.37 however, even if this theoretical stack isn't relying on file storage to persist data, software-defined storage is an abstraction that sits atop servers and disks (or tape) that do use filesystems.38 ironically, zfs is not the best option for the underlying disks as its data integrity features come with more overhead and data integrity can be achieved through different means with software-defined storage.39 block storage block storage comes in two forms in this future stack. many virtual servers will leverage the block storage offerings of the software-defined storage service, attaching virtual disk blocks to virtual servers. however, the physical servers that support virtualization will still have some physically attached storage using ssds (through nvme) to support high-performance storage needs. this physically attached block storage is more performant than virtually attached block storage since the system has direct access to the disks and does not have to work through a virtually abstracted filesystem. object storage object storage has become the primary method of storing data in this theoretical stack. the flexibility of object storage, with its web-native apis and authentication, gives it an advantage as systems become less centralized and more integrations are needed. the natural scalability of object storage and the variety of private, public, and commercial offerings greatly simplify geographic and organizational redundancy when replicating data. with software-defined storage, it's also possible to offer hot (live) and cold (nearline, offline) options, giving flexibility for how data is stored to better optimize the storage for various needs. hot storage may use either hard disk or solid-state drives while cold storage would rely on tape or optical media. presently, options for running software-defined storage on tape and optical media are mostly proprietary.40 while this would be a concern if these systems held the only copy, if the data is replicated to systems using other technology and media, this risk can be managed. while optical media has long been criticized for use as a preservation medium, when well-managed, the risk may be overstated.41 oxford common file layout the oxford common file layout (ocfl) is a "shared approach to filesystem layouts for institutional and preservation repositories."42 ocfl is a specification for organizing digital objects in a way that supports preservation while being computationally efficient.
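as a rough picture of what ocfl provides, the python sketch below builds a simplified, non-spec-compliant object: content is written into immutable version directories and an inventory maps sha-512 digests to logical paths, so a file that has not changed is never stored twice. the field names and layout here are abbreviations of the real specification cited in the endnotes, not a substitute for it.

import hashlib
import json
from pathlib import Path

def add_version(object_root, files):
    # files is a mapping of logical path -> bytes describing the new version's state
    root = Path(object_root)
    root.mkdir(parents=True, exist_ok=True)
    inventory_path = root / "inventory.json"
    if inventory_path.exists():
        inventory = json.loads(inventory_path.read_text())
    else:
        inventory = {"digestAlgorithm": "sha512", "head": "v0", "manifest": {}, "versions": {}}
    version = "v" + str(int(inventory["head"][1:]) + 1)
    state = {}
    for logical_path, data in files.items():
        digest = hashlib.sha512(data).hexdigest()
        if digest not in inventory["manifest"]:
            # only new content is written; unchanged files are referenced by digest
            content_path = version + "/content/" + logical_path
            target = root / content_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            inventory["manifest"][digest] = [content_path]
        state.setdefault(digest, []).append(logical_path)
    inventory["versions"][version] = {"state": state}
    inventory["head"] = version
    inventory_path.write_text(json.dumps(inventory, indent=2))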
it has several advantages for use in digital preservation: a repository can be rebuilt from the files alone, it is both human and machine readable, it supports native error detection, it allows objects to be efficiently versioned, and it is designed to work with a variety of storage infrastructures.43 although some implementation details are still being worked out, ocfl can be used with object storage.44 ocfl is in production use and client libraries are available for javascript, java, ruby, and rust.45 in this future stack, all storage operations are handled by an ocfl client, which then interacts with the underlying software-defined storage network as shown in figure 2. servers physical servers are used chiefly to support virtualization in this future stack. however, this stack moves beyond virtual servers and supports containers and serverless computing. virtual servers are chiefly used to support applications and databases while containers are perfectly suited for microservices running preservation activities. serverless, or function-as-a-service, is the next evolution in virtualization. while a container may be idling all the time, spinning into action when a microservice is called, serverless functions are run on-demand only. they can cost less when using commercial services such as aws lambda or aws fargate, where the customer is billed for usage only.46 though serverless functions can make use of containers, function-as-a-service platforms have emerged, such as apache openwhisk and openfaas, that don't always require containers. preservation activities in the 21st-century stack this 21st-century stack performs the same preservation activities as its predecessor. however, it generally does this at lower levels of the stack, in the infrastructure layers as opposed to the application and microservice layers. this change reduces the computational load on the stack and simplifies the business logic. basic activities fixity and replication are achieved by leveraging a combination of microservices and software-defined storage. by optimizing the approach to fixity for each use case, instead of using the same computationally intensive method for all fixity, organizations can use less compute power. while fixity and replication still involve the microservice layer, it is a more targeted approach. transactional fixity transactional fixity is maintained through a function-as-a-service-based microservice. each time a file is moved, the microservice is triggered, which calculates an md5 checksum and compares it to a stored value that was created upon ingest. if there is a mismatch between the md5 values, a second microservice is called that fetches a valid file replica. while crc32 might be preferred (because it's slightly less cpu-intensive), box has shown that crc32 values can differ depending on how they are calculated.47 a stored crc32 can only be reliably used to confirm fixity if the new calculation uses the exact same method, because crc32 is not a true specification in the way md5 is, and implementations may differ. crc32 is recommended only when it's possible to calculate it in the same manner, such as within the same microservice. as this introduces technical complexity, some organizations may prefer to rely solely on md5 for transactional fixity.
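a transactional fixity microservice of the kind just described might look something like the python sketch below; the event shape, the storage and metadata clients, and the repair helper are all assumptions standing in for whatever the chosen function-as-a-service platform and repository actually provide.

import hashlib

def md5_of(storage, bucket, key):
    # stream the moved object from storage and calculate its md5 checksum
    digest = hashlib.md5()
    for chunk in storage.read_chunks(bucket, key):  # assumed storage client interface
        digest.update(chunk)
    return digest.hexdigest()

def repair_from_replica(bucket, key):
    # placeholder for the second microservice that fetches a valid file replica
    pass

def handle_move_event(event, storage, metadata_store):
    # triggered each time a file is moved or copied within the system
    bucket, key = event["bucket"], event["key"]
    expected = metadata_store.get_checksum(key, algorithm="md5")  # recorded at ingest
    actual = md5_of(storage, bucket, key)
    if actual == expected:
        metadata_store.record_event(key, "fixity check", outcome="pass")
    else:
        metadata_store.record_event(key, "fixity check", outcome="fail")
        repair_from_replica(bucket, key)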
authentication fixity authentication fixity is maintained in much the same way as in the 20th-century model, except using a cryptographically secure checksum algorithm, such as sha-512 (part of the sha-2 family). however, distinguishing between transactional and authentication fixity allows more precise use of algorithms, only requiring more computationally intensive cryptography when it's truly needed. authentication fixity may require the use of a container-based microservice, versus a function-as-a-service, due to the increased computational need. fixity-at-rest fixity-at-rest, the most common type of fixity, is managed by the software-defined storage service and reported in preservation metadata. how this is achieved might look different, depending on which software-defined storage service is used. the ceph community has developed a new technology called bluestore, which serves as a custom storage backend that directly interacts with disks, essentially replacing the need to use an underlying filesystem.48 bluestore calculates checksums for every file and verifies them when read. because this is all internal and managed by the same system, crc32 is the default algorithm, but multiple algorithms are supported. ceph will "scrub" data every week. scrubbing is the process of reading the file simply to verify the checksum, even if no user has accessed the file. because of the way ceph performs erasure coding, if a checksum is invalid, the file can be repaired. what remains to be done is writing a script that will read ceph's internal metadata and record preservation events within the object's preservation metadata for the fixity verification and any reparative actions. ryu and park propose a "markov failure and repair model" to optimize the frequency of data scrubbing and number of replicas such that the least amount of power is consumed and that scrubbing occurs at off-peak times.49 it appears that this optimization causes less media degradation from the process of regularly reading the file, though empirical studies are needed to confirm that there is overall less degradation than conducting fixity checks through an application. gluster has a similar scrubbing process for fixity-at-rest in the optional bitrot feature, although it uses sha-256 by default, instead of crc32, which requires more computing power.50 replication replication in this future stack is mostly handled by the software-defined storage service, but microservices may play a role in achieving independence of copies.51 object storage policies allow the automatic copying of data into another region or availability zone that is within the software-defined storage network. however, these copies are not replicas, or independent instances, because all copies are in a chain derived from the primary instance; if there is a problem anywhere in the chain, bad data will be copied. in addition to using object storage policies, microservices could be used to independently verify the fixity of downstream copies as well as trigger true replications to independent instantiations, such as an alternative storage service or a different storage area within the same software-defined storage network.
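the difference between policy-driven copies and independently verified replicas can be sketched as follows; the storage clients, the list of locations, and the checksum of record are assumptions, and the point is only that every downstream copy is compared against the authoritative value rather than against the copy it was made from.

import hashlib

def sha512_of(client, location):
    # stream the copy held at one storage location and calculate its checksum
    digest = hashlib.sha512()
    for chunk in client.read_chunks(location):  # assumed storage client interface
        digest.update(chunk)
    return digest.hexdigest()

def verify_replicas(checksum_of_record, copies):
    # copies maps a label (e.g., "secondary-region", "partner-network") to a
    # (client, location) pair; each copy is checked against the checksum of record
    # so that a corrupted copy cannot silently propagate down a replication chain
    results = {}
    for label, (client, location) in copies.items():
        results[label] = sha512_of(client, location) == checksum_of_record
    return results

copies that fail verification can be re-replicated from a known-good instance, with the outcome recorded as a preservation event for each location.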
bill branan suggested a similar approach in his cloud native preservation presentation at ndsa digital preservation 2019.52 advanced digital preservation activities advanced digital preservation activities in a 21st-century stack also make use of microservices for metadata extraction and file format conversion. versioning, however, relies upon features of the oxford common file layout, even though object storage often supports versioning natively. metadata extraction function-as-a-service microservices are well suited to metadata extraction, actuated upon ingest or as needed. since embedded metadata is machine-readable, this activity will not be resource intensive or time consuming. in addition to extracting metadata and storing it as discrete, sidecar files, these microservices can be used to populate specific metadata fields used by the repository, including descriptive, administrative, and technical metadata. this approach is more efficient as it gives flexibility to reuse the functions in multiple workflows as opposed to specific events like ingest. file format conversion file format conversions use a combination of function-as-a-service and container-based microservices, depending upon the original format. like metadata extraction, conversion may occur at ingest (normalization) or as needed (migration). function-as-a-service is well suited for small to medium files, such as converting wordperfect to opendocument format. function-as-a-service is also well suited for logical preservation, when only the informational content is necessary to preserve, such as converting a tif to a txt file through ocr. container-based microservices are better suited for converting large media files that may take more memory and time, for example, migrating proprietary encoded digital video to open codecs and container formats; function-based services often have a time constraint. versioning although object storage typically supports versioning, it is inefficient because each version is an entirely new object. this means that unchanged data is duplicated, taking up more disk space. the oxford common file layout, which negotiates storage between the application and microservices layers and a software-defined storage service, supports forward delta versioning in which each new version only contains the changes. using the object inventories, it's possible to rebuild any object to any version without duplicating bits.53 an additional benefit of using ocfl is that it inherently uses checksums, and any changes or corruption are detected when an interaction occurs with the object, creating a layered approach to maintaining fixity-at-rest. sustainability in the 21st-century stack the differences between our 20th- and 21st-century stacks result in a more sustainable approach to digital preservation, per the triple-bottom-line definition.54 by adopting commercial sector approaches, cultural heritage organizations can become more efficient consumers of data center resources. people (labor) by shifting activities to lower levels in the stack and letting infrastructure components self-manage, fewer people are needed to develop and maintain the business logic that formerly handled the same action. the application and microservice layers use programming languages and libraries that can become outdated quickly, requiring development work to maintain functionality.
while there is still a need for specialized knowledge, fewer people are needed to do the work when these actions take place in more stable parts of the stack. planet (environmental) our new stack has a lower environmental impact for a variety of reasons. first, per kryder's law (the storage parallel to moore's law for computing), the areal density of storage media predictably increases annually, and our new stack uses the latest hard disk and tape technology.55 this results in needing less media, some of which doesn't need power to run, decreasing the carbon impact. additionally, our new stack uses a mix of hot and cold storage, making it possible to implement automatic tiering to shift objects to less resource-intensive storage, like tape.56 second, as the stack becomes more serverless, fewer computational resources are needed. even though container and function-based microservices may incur some overhead in terms of cpu cycles, it is more efficient in terms of system idling to be running these as microservices on one platform rather than doing the same action in the application or vm layer. this further decreases the carbon impact while also decreasing the dependency on rare-earth elements. relatedly, by leveraging software-defined storage to maintain fixity-at-rest, the compute load is greatly decreased; the cpu cost of calculating checksums in the storage layer is less than when this is done through applications or microservices. profit (economic) sustainability improvements for both people and planet may also result in a lower total cost of ownership for a digital preservation system. cost is a prime motivator when administrators and leaders make long-term decisions; decreasing annual operating costs related to digital preservation is crucial to the viability of a program. future and related work the 21st-century stack proposed in this paper is not the only way to increase sustainability or the only way in which digital preservation stacks will change. the planet is running out of bandwidth and will need to expand into using 5g and low-earth orbit satellite communications. new, quantum-resistant algorithms will need to be introduced as quantum computing advances.57 blockchain technology introduces many possibilities. inherently, blockchain maintains fixity. the archangel project is exploring practical methods of recording provenance and proving authenticity by using a permissioned blockchain.58 blockchain is also the technology behind the interplanetary file system (ipfs), a content-addressed distributed storage network, that is in turn used by filecoin, a marketplace for ipfs storage. small data industries is building starling, a filecoin-based application designed for cultural heritage organizations to securely store digital content.59 it's important to note that these blockchain-based projects use a proof-of-stake model instead of a proof-of-work model, which has a significantly lower environmental impact than other blockchain implementations like the cryptocurrency bitcoin.60 while some organizations, like stanford university, may already leverage software-defined storage, most in the cultural heritage sector do not.61 the metaarchive cooperative, a nonprofit consortium for digital preservation, has begun a noteworthy project to explore using software-defined storage in a distributed digital preservation network.
metaarchive, which currently uses lockss, is one of the few public digital preservation services that mitigates risk through organizational and administrative diversity. because members host and administer the lockss nodes that contain the replications, each copy is managed by a different set of organizational and administrative policies and often use different technology to do so. diversifying in this way protects against a single point of failure if only one organization managed the technical infrastructure. how this same diversity is achieved in a software-defined storage-based distributed digital preservation network will be a great contribution to the community. it would also be useful to study the reasons cultural heritage organizations have been so reluctant to adopt commercial sector technologies. identifying these hesitations would make it possible to create strategies that would encourage adoption of these approaches. it may simply be that when it comes to digital preservation, familiar and proven technologies provide a level of comfort. organizations may also be entrenched in custom developed solutions that are hard to move away from. conclusion digital preservation is a long-term commitment. while re-appraisal may take place, it’s inevitable that the amount of content stored in digital repositories will only increase over time. it is fiduciarily incumbent upon the cultural heritage community to examine our practices and look for better alternatives. exceptionalism ignores technological advancements made by the commercial industry, advancements that are very well suited to digital preservation. by adopting commercial industry data practices, such as software-defined storage, while simultaneously implementing innovations from within the cultural heritage community, like the oxford common file layout, it is possible to reduce the ongoing costs, resource consumption, and environmental impact of digital preservation. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 16 endnotes 1 ben goldman, “it’s not easy being green(e): digital preservation in the age of climate change,” in archival values: essays in honor of mark a. greene, ed. mary a. caldera and christine weidman (chicago: american library association, 2018), 274–95, https://scholarsphere.psu.edu/concern/generic_works/bvq27zn11p. 2 “a simple explanation of the triple bottom line,” university of wisconsin sustainable management, october 2, 2019, https://perma.cc/2hwf-3mmq. 3 goldman, “it’s not easy being green(e).” 4 keith l. pendergrass et al., “toward environmentally sustainable digital preservation,” the american archivist 82, no. 1 (2019): 165–206, https://doi.org/10.17723/0360-9081-82.1.165. 5 sarah barsness et al., 2017 fixity survey report: an ndsa report (osf, april 24, 2018), https://doi.org/10.17605/osf.io/snjbv; david s. h. rosenthal et al., “requirements for digital preservation systems: a bottom-up approach,” d-lib magazine 11, no. 11 (2005), https://perma.cc/x2r7-r5xp. 6 matthew addis, which checksum algorithm should i use? (dpc technology watch guidance note, digital preservation coalition, december 11, 2020), https://doi.org/10.7207/twgn20-12. 7 premis editorial committee, premis data dictionary for preservation metadata, version 3.0 (library of congress, november 2015), https://perma.cc/l79v-gqv7. 8 some organizations may continue to use a strategy where fixity is only checked when a file is accessed, if the potential loss fits within a defined acceptable loss. 
while this strategy may not work for all organizations, recognizing that loss is inevitable and defining a level of acceptable loss is an effective and pragmatic approach to managing risk of bit decay. 9 barsness et al., 2017 fixity survey report. 10 ndsa levels of preservation revisions working group, “levels of digital preservation, 2019 lop matrix, v2.0 (osf, october 14, 2019), https://osf.io/2mkwx/. 11 sibyl schaefer et al., “user guide for the preservation storage criteria,” february 25, 2020, https://doi.org/10.17605/osf.io/sjc6u. 12 mark carlson et al., “software defined storage,” (white paper, storage network industry association, january 2015), https://perma.cc/aq4t-9yxq. 13 abutalib aghayev et al., “file systems unfit as distributed storage backends” (proceedings of the 27th acm symposium on operating systems principles—sosp ’19, huntsville, ontario, canada: association for computing machinery, 2019): 353–69, https://doi.org/10.1145/3341301.3359656. 14 ndsa storage infrastructure survey working group, 2019 storage infrastructure survey: results of the storage infrastructure survey (osf, 2020), https://doi.org/10.17605/osf.io/uwsg7. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 17 15 joseph migga kizza, “virtualization technology and security,” in guide to computer network security, computer communications and networks (springer, cham, 2017), 457–75, https://doi.org/10.1007/978-3-319-55606-2_21. 16 ndsa storage infrastructure survey working group, 2019 storage infrastructure survey. 17 eric jonas et al., “cloud programming simplified: a berkeley view on serverless computing” (university of california, berkeley, february 10, 2019), https://perma.cc/yam2-tz8w. 18 alex garnett, mike winter, and justin simpson, “checksums on modern filesystems, or: on the virtuous consumption of cpu cycles,” in ipres 1028 conference [proceedings] (international conference on digital preservation, boston, mass., 2018), https://doi.org/10.17605/osf.io/y4z3e. 19 barsness et al., 2017 fixity survey report. 20 “import metadata,” documentation for archivematica 1.12.1, artefactual systems, inc., accessed may 21, 2021, https://perma.cc/ue3r-bdgz.; “ingest,” documentation for archivematica 1.12.1, artefactual systems, inc., accessed may 21, 2021, https://perma.cc/5sn5-gfx3. 21 ndsa storage infrastructure survey working group, 2019 storage infrastructure survey. 22 “fedora content versioning,” 2005, https://duraspace.org/archive/fedora/files/download/2.0/userdocs/server/features/version ing.html. 23 michael armbrust et al., above the clouds: a berkeley view of cloud computing, (technical report, eecs department, university of california, berkeley, february 10, 2009), https://perma.cc/qj5w-8s5y. 24 armbrust et al., above the clouds. 25 micah altman et al., “ndsa storage report: reflections on national digital stewardship alliance member approaches to preservation storage technologies,” d-lib magazine 19, no. 5/6 (may 2013), https://doi.org/10.1045/may2013-altman; michelle gallinger et al., “trends in digital preservation capacity and practice: results from the 2nd bi-annual national digital stewardship alliance storage survey,” d-lib magazine 23, no. 
7/8 (2017), https://doi.org/10.1045/july2017-gallinger; ndsa storage infrastructure survey working group, 2019 storage infrastructure survey; evviva weinraub et al., beyond the repository: integrating local preservation systems with national distribution services (northwestern university, 2018), https://doi.org/10.21985/n28m2z. 26 ontario council of university libraries, “ontario library research cloud,” accessed april 14, 2021, https://perma.cc/kmp9-fs8k; “open source cloud computing infrastructure,” openstack, accessed april 14, 2021, https://perma.cc/g9ge-92jd. 27 nathan tallman, “software defined storage,” (presentation for the ndsa infrastructure interest group, march 16, 2020), https://doi.org/10.26207/3nn2-zv13. information technology and libraries december 2021 a 21st century technical infrastructure | tallman 18 28 these network shares typically use the smb (server message block) or cifs (common internet file system) protocols to present file shares through a graphical user interface in operating systems such as windows or macos while the nfs (network file shares) protocol is more often used to mount storage in linux. 29 carlson et al., “software defined storage.” 30 raid, or the redundant array of independent disks, is technology that splits a file into multiple chunks and spreads them across multiple disks in a storage device, adding extra copies of the chunks so that the file can be recovered if an individual drive fails. 31 abhijith shenoy, “the pros and cons of erasure coding & replication vs raid in next-gen storage platforms” (software developer conference, storage networking industry association, 2015), https://perma.cc/yfs5-kxkk. 32 glenn heinle, “unlocking ceph” (presentation, designing storage architectures for digital collections, washington, dc: library of congress, 2019), https://perma.cc/z2r9-79ze; tamara scott, “big data storage wars: ceph vs gluster,” technologyadvice (blog), may 14, 2019, https://perma.cc/2yy2-bbxg. 33 giacinto donvito, giovanni marzulli, and domenico diacono, “testing of several distributed file-systems (hdfs, ceph and glusterfs) for supporting the hep experiments analysis,” journal of physics: conference series 513, no. 4 (june 2014): 042014, https://doi.org/10.1088/1742-6596/513/4/042014. 34 matthew ahrens, “openzfs: a community of open source zfs developers,” in asiabsdcon 2014 (asiabsdcon, tokyo, japan: bsd research, 2014), 27–32, https://perma.cc/xg79-pbu7. 35 brian hickmann and kynan shook, “zfs and raid-z: the über-fs?” (university of wisconsin– madison, december 2007), https://perma.cc/w5pd-enpp. 36 garnett, winter, and simpson, “checksums on modern filesystems.” 37 edward shishkin, “resier5 (format release 5.x.y),” marc mailing list archive, 2019, https://perma.cc/dn8y-v8kq. 38 “fujifilm launches ‘fujifilm software-defined tape,’” fujifilm europe, may 19, 2020, https://perma.cc/b3gn-plr9. 39 aghayev et al., “file systems unfit as distributed storage backends.” 40 ibm systems, “tape goes high speed,” 2016, https://perma.cc/fnv9-rtg9; “fujifilm launches ‘fujifilm software-defined tape’”; desire athow, “here’s what sony’s million gigabyte storage cabinet looks like,” techradar (blog), 2020, https://perma.cc/vhn4-layt. 41 david rosenthal, “optical media durability: update,” dshr’s blog, august 20, 2020, https://perma.cc/vkw9-83j3. 
information technology and libraries december 2021 a 21st century technical infrastructure | tallman 19 42 andrew hankinson et al., “the oxford common file layout: a common approach to digital preservation,” publications 7, no. 2 (june 2019): 39, https://doi.org/10.3390/publications7020039. 43 andrew hankinson et al., “oxford common file layout specification,” july 7, 2020, https://perma.cc/s73z-3n6k. 44 marco la rosa et al., “our thoughts on ocfl over s3 · issue #522 · ocfl/spec,” github, accessed march 12, 2021, https://perma.cc/pa3g-cb78. 45 hannah frost, “version 1.0 of the oxford common file layout (ocfl) released,” stanford libraries (blog), july 23, 2020, 1, https://perma.cc/5j5m-gyqw; andrew woods, “implementations | ocfl/spec,” github, february 10, 2021, https://github.com/ocfl/spec. 46 while serverless might be the ultimate microservice, requiring the least amount of overhead, costs may still be hard to predict. 47 ryan luecke, “crc32 checksums; the good, the bad, and the ugly,” box blog, october 12, 2011, https://perma.cc/mvp7-yvzv. 48 aghayev et al., “file systems unfit as distributed storage backends.” 49 junkil ryu and chanik park, “effects of data scrubbing on reliability in storage systems,” ieice transactions on information and systems e92-d, no. 9 (september 1, 2009): 1639–49, https://doi.org/10.1587/transinf.e92.d.1639. 50 raghavendra talur, “bitrot detection | gluster/glusterfs-specs,” github, august 15, 2015, https://github.com/gluster/glusterfsspecs/blob/fe4c5ecb4688f5fa19351829e5022bcb676cf686/done/glusterfs%203.7/bitrot.m d. 51 schaefer et al., “user guide for the preservation storage criteria.” 52 bill branan, “cloud-native preservation” (osf, october 22, 2019), https://osf.io/kmdyf/. 53 andrew hankinson et al., “implementation notes, oxford common file layout specification,” july 7, 2020, https://perma.cc/pvf8-sqfn. 54 although out of scope in terms of the stack, the policies and practices implemented by organizations can have a direct impact on digital preservation sustainability. for example, appraisal can be the most powerful tool available to an organization to control the amount of content being preserved. despite storage vendors proclamations that storage is cheap, digital preservation is not. it is not wise nor necessary to keep every digital file. organizations will benefit from applying flexible appraisal systems that reduce the amount of content needing preservation, but also establishing different classes of preservation so the most advanced activities are only applied as needed. additionally, organizations should consider allowing lossy compression to decrease disk usage, where appropriate; compression as an appraisal choice is very similar to choosing to sample a grouping of material rather than preserving the whole. for additional information see nathan tallman and lauren work, “approaching information technology and libraries december 2021 a 21st century technical infrastructure | tallman 20 appraisal: framing criteria for selecting digital content for preservation,” in ipres 1028 conference [proceedings] (international conference on digital preservation, boston, mass.: osf, 2018), https://doi.org/10.17605/osf.io/8y6dc. 55 david rosenthal, “cloud for preservation,” dshr’s blog, 2019, https://perma.cc/zls9-r857. 
56 pendergrass et al., “toward environmentally sustainable digital preservation.” 57 henry newman, “industry trends” (presentation, designing storage architectures for digital collections, washington, dc: library of congress, 2019), https://perma.cc/3mgk-n5u3. 58 t. bui et al., “archangel: trusted archives of digital public documents,” in proceedings acm document engineering 2018 (association for computing machinery, arxiv.org, 2018), https://doi.org/10.1145/3209280.3229120. 59 ben fino-radin and michelle lee, “[starling]” (presentation, designing storage architectures for digital collections, washington, dc: library of congress, 2019), https://perma.cc/7lguuew9. 60 for additional information on the differences of proof-of-stake vs. proof-of-work models, see peter fairley, “ethereum plans to cut its absurd energy consumption by 99 percent,” ieee spectrum (blog), january 2, 2019, https://perma.cc/gch7-t556. 61 julian morley, “storage cost modeling” (presentation, pasig, mexico city, mexico, 2019), https://doi.org/10.6084/m9.figshare.7795829.v1. product ownership of a legacy institutional repository: a case study on revitalizing an aging service article product ownership of a legacy institutional repository a case study on revitalizing an aging service mikala narlock and don brower information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13241 mikala narlock (mnarlock@nd.edu) is digital collections strategy librarian, university of notre dame. don brower (dbrower@nd.edu) is digital projects lead, university of notre dame. © 2021. abstract many academic libraries have developed and/or purchased digital systems over the years, including digital collection platforms, institutional repositories, and other online tools on which users depend. at hesburgh libraries, as with other institutions, some of these systems have aged without strong guidance and resulted in stale services and technology. this case study will explore the lengthy process of stewarding an aging service that satisfies critical external needs. starting with a brief literature review and institutional context, the authors will examine how the current product owners have embraced the role of maintainers, charting a future direction by defining a clear vision for the service, articulating firm boundaries, and prioritizing small changes. the authors will conclude by reflecting on lessons learned and discussing potential future work, both at the institutional and professional level. introduction our home-grown institutional repository (ir) began almost a decade ago with enthusiasm and promise, driven by an eagerness to meet as many use cases as possible. over time, the code grew unwieldy, personnel transitioned into new roles, and new priorities emerged, leaving few individuals to manage the repository, allocate resources, articulate priorities, or advocate for user needs. this in turn left the system underutilized and undervalued. in mid -2019, two product owners (pos) at hesburgh libraries, university of notre dame were named to oversee the service and tasked with determining how the service should continue, if at all. the pos began by evaluating the service, current commitments, and benefits, and identifying potential on-campus adopters of the service. 
after agreeing the service should continue, the pos started the lengthy process of turning the metaphorical ship, prioritizing modest adjustments that would have large payoffs.1 selected literature review since the 2003 seminal article by clifford lynch, much has been authored on the topic of institutional repositories as academic libraries and archives have flocked to create their own.2 a complete literature review is beyond the scope of this case study: institutional repositories have contended and continue to contend with a wide variety of challenges, including legal, ethical, and socio-technical challenges.3 while the lessons presented in this case study can apply to a wide variety of legacy services, a brief overview of some of the literature surrounding irs is crucial to understanding the challenges the authors were presented as product owners. broadly defined “as systems and service models designed to collect, organize, store, share, and preserve an institution’s digital information or knowledge assets worthy of such investment,” libraries and archives flocked to build the “essential infrastructure for scholarship in the digital mailto:mnarlock@nd.edu mailto:dbrower@nd.edu information technology and libraries september 2021 product ownership of a legacy institutional repository | narlock and brower 2 age.”4 operating under the assumption that faculty members would flock to the service to deposit their works, irs were promised to solve many problems, including supporting open access publishing and digital asset management.5 as articulated by dorothea salo, however, the field of dreams model (“build it and they will come”) was insufficient as repositories often failed to meet changing user needs and expectations while heavily employing library jargon that was foreign to faculty members.6 moreover, as identified by kim, some irs struggle to even be known to their users, while also grappling with concerns of trust.7 other problems that have plagued repositories include limited adoption rates, restricted resources to support digitization of analog materials for faculty that operate in both analog and digital media, failing support from fellow library colleagues, and inconsistent and incomplete metadata.8 salo warned more than a decade ago that high-level library administrative support would be necessary to empower repository managers to enact lasting and substantive change, and recent studies echo these concerns.9 libraries have slowly started to serve faculty on their terms, such as by creating automated processes for populating irs, streamlining content deposits, experimenting with metadata harvesting features to provide increased access, and building more tools to integrate directly with the research lifecycle.10 however, these new technologies and services may be out of reach for many institutions. in addition to limited resources, some institutions are grappling with a legacy system that is incompatible with newer code, leaving these institutions in a feature desert, reliant on aging technology and cumbersome deposit processes.11 moreover, even in an institution where resources might be more readily available for licensing or purchasing newer technology, early forks of open-source code or otherwise deprecated components might make migration to newer platforms extremely difficult, if not impossible, without extensive infrastructure improvements. 
lastly, as libraries grappled with some of the issues mentioned above and options for repositories continued to proliferate, many institutions struggled to clearly articulate boundaries around their digital library holdings. confusion between digital collections, scholarly content, e-resources, and other digital materials resulted in some institutions having too many options to store content, leaving internal and external stakeholders confused as to where to discover and distribute materials; conversely, other institutions have few options, and a wide variety of content is pigeonholed imperfectly into a single repository.12 in both situations, developing repositories with vague content scopes can be exceedingly difficult, as a restrictive scope can stifle development , while an overly inclusive approach results in too many use cases and competing stakeholder interests to effectively prioritize feature development. local context our institutional repository at the university of notre dame, managed by hesburgh libraries employees, suffered from many problems that affected our locally built code: limited adoption and awareness on campus; aging technology that made adding new features a monumental, if not impossible, task; and an overly broad scope (and a simultaneous proliferation of other digital collection tools). while the detailed history of this repository is beyond the scope of this paper, a brief overview of the development provides critical context. additionally, the technical details and implementation particulars will not be discussed, as this case study transcends specific software frustrations and will resonate with many institutions regardless. information technology and libraries september 2021 product ownership of a legacy institutional repository | narlock and brower 3 in 2012, after a failed attempt to launch a repository in the early 2000’s, consortial development of our ir began in an open-source community. in 2014, an early implementation of the product was envisioned to be a unified digital library service that would provide support to many different stakeholders. this included a plan for a single location for researchers to share their scholarly work, research outputs, and research data, as well as for the university libraries to provide access to digitized rare and archival holdings. as development continued on the homegrown service, features were implemented to serve the numerous purposes mentioned above. this included components of an institutional repository, such as a self-deposit interface, customizable access levels, and a proof-of-concept researcher profile system. over time support for browsing digital collections was added, namely the development of the work type “collection,” which allowed curators to create a landing page for their collection and customize it with a representative image. development continued in a somewhat sporadic fashion, often aligning at the intersection of “what is easy?” and “what is needed?” as technical staff continued growing the open-source code. as content was added to the system, stemming from special collections, various campus partners, and electronic thesis and dissertation (etd) deposits, additional use cases emerged and were added to the scope of the repository. the system quickly grew cumbersome and difficult to work with. in short, the repository struggled with the challenges of many open-source technologies. 
the struggle was compounded by decreasing resources, an overly inclusive scope, limited adoption—both with external faculty as well as library faculty and staff—and consortial development that introduced features extraneous to local campus needs. while our repository did many different things, it failed to do any one well. after falling short of meeting the expectations for digital collections, particularly with regard to browsing and displaying objects, the library applied for, and received, a three-year mellon grant.13 this grant, a collaboration with the snite museum of art, university of notre dame, was initially sought to improve upon the existing repository and to build the infrastructure necessary to support the online display of collective cultural heritage materials and facilitate serendipitous discovery for patrons. however, soon into the grant, it became clear that creating an entirely new system for digital collections would be not only easier to build and maintain, but also better suited to meet the specific needs of digital collections as articulated by campus partners. first things first: what is our ir? around the same time this shift was announced, two individuals were appointed to serve as product owners (pos) of the repository. while exact duties vary between institutions, pos are responsible for liaising with users, managing the product backlog, directing development, communicating with a wide variety of stakeholders, resolving issue tickets, and guiding the overall direction of the product.14 the pos were tasked with making this amorphous, oft-critiqued service usable while dealing with uncertain resources and competing institutional priorities. with the change in grant objectives mentioned above, namely the desire to develop a new repository instead of contending with the legacy code, the option was presented to retire the repository and direct users to other systems that could sufficiently meet their needs, such as discipline-specific repositories, general-purpose repositories, or even online cloud storage. the pos recognized that continuing the system due solely to sunk costs was a fallacy: if the service was too cumbersome to maintain with even nominal use, the return on investment would be abysmal and ultimately prevent the library from investing resources more appropriately. in order to evaluate the service, the pos considered active commitments and ongoing partnerships tied to the service. in particular, several centers and departments on campus had utilized the system to capture citations and demonstrate their impact. additionally, after conversations with library liaisons, it became apparent that there was great value in providing the campus with a discipline-agnostic repository that allows deposition of, provides access to, and preserves scholarly outputs that might otherwise be lost. while the pos recognized that faculty adoption or even awareness of the service was limited, they realized there were several campus-specific features that were useful to local champions, including flexible access controls at the record and file levels, as well as a customized etd workflow that served the graduate school, internal technical services, and the students and faculty required to interact with the system.
acknowledging that the system and related services were still critical, the pos prioritized making sure the system remained useful: maintaining the legacy repository would cost valuable time and resources and would need to overcome the resentment that many internal stakeholders had developed over the years. after deciding the system was worth maintaining, it was necessary to explicitly narrow the scope of the service, which had broadened over time in an ad hoc manner: as other services were turned off, leaving various digital content to find a new location, our institutional repository was often leveraged to host the content, even when support for the needs of niche content was poor at best. when considering the future of the repository, several key use cases emerged, including the etd support provided to the graduate school as mentioned above. while the service had done many things acceptably, the strength was in the support for scholarship: the customized access levels, self-deposit interface, and robust preservation capabilities were frequently lauded as the highlights of the service to internal and external stakeholders. these considerations, combined with the eventual migration of digitized rare and unique materials to the new mellon-funded platform, resulted in rebranding and redefining the service as exclusively focused on scholarly outputs. with the goal of best supporting the teaching and research mission of the university, the directional force became how to (re)build the service as a trusted, useful, and integral repository for campus scholars to provide access to their research outputs. mission (and vision) critical operating under the guiding principles of usefulness, usability, and transparency, the first task after redefining and rearticulating the scope of the service was to keep the service operational. however, with the recognition that maintenance alone, while critical, would not lead to an enhanced reputation on campus, it was important to continue charting a forward direction. the product owners were given the freedom to articulate their ideal mission statement. to complement the vision of the repository as both trusted and integral, the pos further defined the mission statement in three key areas: to increase the impact of campus researchers, scholars, and departments; to advance new research by facilitating access to scholarship in all forms; and to serve as active campus partners in the research lifecycle. while these statements are far from innovative or revolutionary, it was essential for moving the service forward. in fact, these sentences were carefully crafted over the course of a month, during which time the product owners drafted the language, compared it with peer and aspirational peer information technology and libraries september 2021 product ownership of a legacy institutional repository | narlock and brower 5 institutions, and solicited feedback from trusted internal colleagues before sharing it more broadly. this time-consuming process was critical for success, however: with the knowledge that these words would serve as the foundation for prioritizing feature requests and advocating for resources, the pos wanted to establish both the repository and themselves in their new role. this clarity in mission was also important for grappling with legacy emotional and mental frustrations that lingered towards the system, as the pos had a strong, unified foundation to advocate for resources and the service as a whole. 
relatedly, these mission and vision statements provided critical and consistent talking points, which were leveraged in presentations to internal stakeholders, provided to librarians as messaging for the liaison faculty, and useful in short communications to teaching professors, research faculty, and department administrators. clear and present boundaries in rebranding the repository, it also became clear that firm boundaries would be instrumental in attaining success. in addition to narrowly focusing feature development on supporting research and scholarly outputs, the pos also scaled back goals for adoption, intentionally excluded digital collection features, and identified features that were patently unattainable in the short term. the repository was often seen as a failure locally due to limited adoption and an incomplete record of the academic outputs of campus, reflecting concerns of irs more generally.15 combatting this narrative required a clear articulation and acceptance of the fact that the institutional repository, regardless of how seamlessly integrated or easy to use, would never be absolutely comprehensive or the authoritative record of our researchers and scholars. with limited resources and a current technical infrastructure in which it is difficult to incorporate automatic harvesting mechanisms, any effort to make the repository comprehensive would be impractical, unrealistic, and a waste of limited resources. instead, by focusing efforts on making the repository useful and refraining from being yet another requirement for an already overwhelmed faculty member or graduate student, the service can be improved to meet the unique needs of campus faculty, serving as a more viable option for those who need it.16 similarly, because there is less concern with filling the repository and increasing usage statistics and more concern with what the patron needs, the pos have been able to develop robust partnerships with stakeholders, leading to champions in research centers, labs, departments, and other administrative units across campus. this has helped scholars demonstrate the impact of their work, which in turn led to more partnerships with other campus centers, as champions began to advocate for the service to colleagues facing similar challenges across the university. in this way, decreasing the effort to fill the repository has actually increased holdings and driven more traffic to the site: by focusing on useful offerings and decreasing the burden on ourselves to create a comprehensive research repository, the pos have been able to prove the value of a discipline-agnostic approach to internal and external stakeholders. an additional, and extremely beneficial, boundary was intentionally excluding library-owned digital collections from the repository's collecting and feature-development scope. the pos received little pushback from internal users on this change: the repository had been the de facto scholarly and research repository for nearly five years, as it was patently clear that supporting digital collections had been more of an afterthought, with limited features built to support curators and users in creating and interacting with rare and archival materials. in fact, internal colleagues supported this change wholeheartedly, as the pos volunteered to continue providing access to the extant digital content in the ir as the mellon grant-funded site was built.
while this information technology and libraries september 2021 product ownership of a legacy institutional repository | narlock and brower 6 direction had already been understood by individuals across the organization, it was helpful to clearly articulate the new boundaries in open forums for internal stakeholders, communication through a library-wide listserv, and repetition in smaller meetings. by articulating this new boundary clearly and repeating it frequently in different methods of communication, the pos had the authority to reject feature requests that were explicitly in support of rare and archival materials. with a clear focus on collecting and providing access to scholarly and research outputs, niche metadata fields, advanced browsing features, and robust collection landing pages were identified as unnecessary, as they were scoped for the mellon-funded platform, and internal colleagues quickly embraced this boundary. the final, crucial boundary, also related to feature requests, was to clearly define requests that were impossible to accommodate in the current technical infrastructure. as mentioned earlier, the pos focused first on maintenance: by updating code, critically evaluating the service and existing commitments, and charting a future direction, the pos could more effectively steward the project. this also meant revisiting previous feature requests, and even technical promises, in order to set more reasonable expectations on what the service would, and would not, be able to support in the coming years. with limited resources, advanced features such as research profiles—a frequent request from internal allies—was beyond the current capabilities with the aging technical stack. moreover, a feature-rich repository would be essentially useless if users’ basic expectations were left unmet: a cumbersome deposit interface, limited upload support, and confusing language throughout the site were more pressing issues, as they prevented users from even engaging with the site for any amount of time. by resolving these limitations and generating awareness of the repository, the pos could better serve not only current campus partners, but also future users, as an increase in adoption and use would lead to more resources to develop advanced features. instead of planning a new outfit for the proverbial patient, it was more important to stop the bleeding. by adopting firm boundaries, the pos were able to scope developer work, prioritize maintenance and modest feature development, and even deny implementation of previously requested features that were no longer relevant to the repository or would be unattainable in the coming years. the pos could explicitly drop support for unused services, allow other services to limp along, and improve existing strengths. this has further helped to clarify messaging about the service and garner more support from our campus partners; instead of a malleable system that fits too many roles in a limited capacity, the pos could clearly state how the repository offers support and garner users from across campus. small changes, big rewards the last critical component of rebranding and revitalizing the institutional repository was the conscious decision to implement incremental improvements instead of large, sweeping changes. in particular, there were known frustrations with the service that were easy to start working on while the product owners expanded the user base and sought additional user feedback. 
small changes to the user interface, including the addition of use metrics and color-coded access tags, received immediate attention and positive feedback from key stakeholders. additionally, over the numerous years of development, many projects to improve the repository had stalled for various reasons. by either prioritizing the work necessary to complete the project or accepting the sunk costs and clearing the backlog for other projects, the technical development team could build momentum, completing projects and clearing mental space for new, exciting endeavors. with limited resources on hand, maximizing the return on investment also included an emphasis on securing and keeping internal and external champions. due to the limited outreach conducted early in the system's existence as well as the mediocre service offerings, many campus users were unaware of the tool, and a few were using the repository in a somewhat limited fashion. in order to build support for the service, it was critical that key users of the repository received targeted support and outreach efforts. a primary example of this was an imaging facility on campus: this unit provided a critical service to campus, yet had difficulty showing the impact of their work as many faculty members did not cite their team in publications. the facility slowly began collecting citations manually, but still struggled to publicly advertise their capabilities and show the fruits of their labor. they solved this problem by loading citation records into the repository, which became the single location where any interested faculty, staff, and students could look to see the full output of the center. while they were using the repository in a somewhat different manner than anticipated, they found the system useful and were actively directing other campus centers and institutes to the repository for similar support. in conversations with them, it became clear that a few modest changes would streamline their workflows and alleviate some cumbersome burdens. with this concentrated outreach and a minimal amount of development, the repository secured a champion that continues to advocate for the service to colleagues across campus. lastly, prioritizing maintenance and paying down technical debt was critical for moving the repository forward. many software dependencies had fallen behind by several major version updates, making it difficult to add new features or consider potential migration paths to future technical solutions. while the amount of technical debt to be paid was substantial, by prioritizing a small amount of maintenance every month, the development team quickly caught up, thereby improving the overall performance of the site and providing the product owners with the flexibility to consider future technical implementations and key features to continue recruiting users. lessons learned and future work moving forward, the product owners are embracing the role of maintainers. in specific reference to repositories, that includes "repairing, caring for and documenting a wide variety of knowledge systems beyond irs to facilitate access and optimize user experience."17 the work of critically evaluating commitments, establishing clear boundaries, and reaffirming the mission of the repository is useful on a recurring basis, and will need to be continued as the repository ages.
maintaining the technical infrastructure as appropriate and conducting user experience testing to improve the service will be critical to ensuring the long-term success of the repository and the information contained therein. beyond the stewardship and small improvements required for maintaining the service, there is the opportunity to reconsider the role of the institutional repository, both at the local level and within the academic community. by prioritizing usefulness over comprehensiveness, the product owners made great strides in making the service accessible to patrons and actually usable. when considering the future of repositories, specifically through a lens of usefulness, it is critical to consider how future work will best serve faculty needs without overburdening librarians. adding pos who are examining how a service will be used and what will promote the mission of the library reframes a repository from being a piece of technology to being a source of interconnections. scholarship usually requires a level of technology different from what most campus it departments can provide: research does not usually just deal in urls, it requires dois information technology and libraries september 2021 product ownership of a legacy institutional repository | narlock and brower 8 and persistent identifiers; files are not just backed up, but are preserved (an active process that requires consideration for how computing will change over the coming decades). not only is a library a place to go to look for data, but it is also a place that can help publish and deposit items, providing valuable services to connect researchers to tools and platforms to facilitate research. this is an area of service that libraries and repositories can provide. in the relationship between libraries and technologies, innovation and maintenance, one clear challenge was the amount of emotional labor necessary to revitalize a service. the pos spent a large portion of time apologizing for previous failures, managing expectations by scaling back previous promises, and grappling with the current technical shortcomings of the service. while this is, at least in part, the role of the pos, the phenomenon of controlling expectations and handling the emotional debt that comes with broken promises and failed technologies is not localized to hesburgh libraries. in libraries especially, this work tends to fall to women, where they are forced to be the middle ground between technology and patron-facing librarians.18 while embracing the term “product owner” has helped to make visible and valuable the labor invested, especially that which might otherwise be overlooked, libraries writ large still need to contend with the gender divide plaguing the seeming dichotomy between innovation and maintenance. 19 in fact, as libraries continue to build new technologies and support innovative research, the role of the product owners in managing legacy technologies will be crucial for success, as will embracing a culture of care and empathy. while beyond the scope of this case study, continued discussions of the gender roles often employed in library technology need to continue, especially as academic libraries embrace scrum methodology, project management, and product ownership. conclusion in this case study, the product owners of a legacy institutional repository described methods for revitalizing a service. 
for the institutional repository managed by hesburgh libraries, there has been a noticeable increase in usage in the past six months: more deposits, higher access counts, and more support tickets tracked. it appears the efforts of the product owners are showing results. this increased usage is one more piece of evidence that a repository is more than software and more than technology: by allowing the product owners oversight of the mission and ultimate direction of the service, not to mention the freedom to engage with users on behalf of the development team, the system is in a much better position than in previous years. despite these improvements, there is still room for growth as the pos guide the overall mission and development of the institutional repository as both a service and a system. similarly, as more institutions contend with legacy digital technology, using pos and the methods described above may prove beneficial. there is additional work to be done, such as investigating more thoroughly the role of the repository—indeed the concept of the repository—and discussions of gender norms in technology. endnotes 1 this article is based on a presentation by don brower and mikala narlock: "what to do when your repository enters middle age" (online presentation, samvera connect 2020, october 28, 2020), https://doi.org/10.7274/r0-e32v-2h81. 2 clifford lynch, "institutional repositories: essential infrastructure for scholarship in the digital age," portal: libraries and the academy 3 (april 1, 2003): 327–36, https://doi.org/10.1353/pla.2003.0039. 3 soohyung joo, darra hofman, and youngseek kim, "investigation of challenges in academic institutional repositories: a survey of academic librarians," library hi tech 37, no. 3 (january 1, 2019): 525–48, https://doi.org/10.1108/lht-12-2017-0266. 4 j. j. branin, "institutional repositories," in encyclopedia of library and information science, ed. m. a. drake (boca raton, fl: taylor & francis group, 2005): 237–48; lynch, "institutional repositories." 5 raym crow, "the case for institutional repositories: a sparc position paper," arl bimonthly report 223, august 2002: 7; lynch, "institutional repositories." 6 dorothea salo, "innkeeper at the roach motel," december 11, 2007, https://minds.wisconsin.edu/handle/1793/22088. 7 jihyun kim, "motivations of faculty self-archiving in institutional repositories," journal of academic librarianship 37, no. 3 (may 1, 2011): 246–54, https://doi.org/10.1016/j.acalib.2011.02.017; deborah e. keil, "research data needs from academic libraries: the perspective of a faculty researcher," journal of library administration 54, no. 3 (april 3, 2014): 233–40, https://doi.org/10.1080/01930826.2014.915168. 8 trevor owens, "the theory and craft of digital preservation," lis scholarship archive, july 15, 2017, https://doi.org/10.31229/osf.io/5cpjt.
9 e.g., joo, hofman, and kim, "investigation of challenges in academic institutional repositories." 10 sarah hare and jenny hoops, "furthering open: tips for crafting an ir deposit service," october 26, 2018, https://scholarworks.iu.edu/dspace/handle/2022/22547; james powell, martin klein, and herbert van de sompel, "autoload: a pipeline for expanding the holdings of an institutional repository enabled by resourcesync," code4lib journal, no. 36 (april 20, 2017), https://journal.code4lib.org/articles/12427; carly dearborn, amy barton, and neal harmeyer, "the purdue university research repository: hubzero customization for dataset publication and digital preservation," oclc systems & services, february 1, 2014, https://docs.lib.purdue.edu/lib_fsdocs/62. 11 clifford lynch, "updating the agenda for academic libraries and scholarly communications," college & research libraries 78, no. 2 (february 2017): 126–30, https://doi.org/10.5860/crl.78.2.126. 12 lynch, "updating the agenda," 128. 13 diane walker, "hesburgh/snite mellon grant," october 31, 2018, https://doi.org/10.17605/osf.io/cusmx. 14 hrafnhildur sif sverrisdottir, helgi thor ingason, and haukur ingi jonasson, "the role of the product owner in scrum-comparison between theory and practices," in "selected papers from the 27th ipma (international project management association), world congress, dubrovnik, croatia, 2013," special issue, procedia—social and behavioral sciences, 119 (march 19, 2014): 257–67, https://doi.org/10.1016/j.sbspro.2014.03.030. 15 salo, "innkeeper." 16 carolyn ten holter, "the repository, the researcher, and the ref: 'it's just compliance, compliance, compliance'," journal of academic librarianship 46, no. 1 (january 1, 2020): 102079, https://doi.org/10.1016/j.acalib.2019.102079. 17 don brower et al., "on institutional repositories, 'beyond the repository services,' their content, maintainers, and stakeholders," against the grain 32, no. 1 (2020), https://against-the-grain.com/2020/04/v321-atg-special-report-on-institutional-repositories-beyond-the-repository-services-their-content-maintainers-and-stakeholders/. 18 bethany nowviskie, "on capacity and care," october 4, 2015, http://nowviskie.org/2015/on-capacity-and-care/; ruth kitchin tillman, "who's the one left saying sorry?
gender/tech/librarianship," april 6, 2018, https://ruthtillman.com/post/whos-the-one-left-saying-sorry-gender-tech-librarianship/. 19 dale askey and jennifer askey, "one library, two cultures" (library juice press, 2017), https://macsphere.mcmaster.ca/handle/11375/22281; rafia mirza and maura seale, "dudes code, ladies coordinate: gendered labor in digital scholarship," october 22, 2017, https://osf.io/hj3ks/.
using open access institutional repositories to save the student symposium during the covid-19 pandemic allison symulevich and mark hamilton information technology and libraries | march 2022 https://doi.org/10.6017/ital.v41i1.14175 allison symulevich (asymulev@usf.edu) is scholarly communications librarian, university of south florida. mark hamilton (hamiltonma@longwood.edu) is research and digital services librarian, longwood university. © 2022. abstract in 2020, during the covid-19 pandemic, colleges and universities around the world were forced to close or move to online instruction. many institutions host yearly student research symposiums. this article describes how two universities used their institutional repositories to adapt their student research symposiums to virtual events in a matter of weeks. both universities use the bepress digital commons platform for their institutional repositories. even though the two universities' symposium strategies differed, some commonalities emerged, particularly with regard to learning the best practices to highlight student work and support their universities' efforts to host research symposiums virtually. introduction many colleges and universities host student research symposiums as a way to celebrate students' intellectual experiences and support the high-impact practice of presenting original student research. students contribute research outputs and share their projects with others in their institution's community, beyond the classroom.
typically, many of these student research symposiums are conducted in the second half of the spring semester in order to allow students to work on their research throughout the course of the year. during the 2020 school year, the world experienced the covid-19 pandemic. the many ways this pandemic has changed our society are only now being understood, but the pervasive move to virtual meetings and presentations is certainly one of the most dramatic. college campuses began delivering remote instruction in a matter of days and organizers of student research symposiums around the country were forced either to cancel or reimagine the events. longwood university and university of south florida st. petersburg campus (usf) were two institutions that transformed their in-person student symposiums into online events in a matter of weeks. in this article, the authors share their experiences of working with many people throughout their campuses to create a student research symposium experience similar to their past in-person events. both universities use bepress' digital commons platform for their institutional repositories. overall, longwood's and usf's symposium strategies were different in some regards, but some commonalities emerged, particularly with regard to learning the best practices that celebrate the students' achievements and support their universities' efforts promoting high-impact student research. literature review student research has grown in importance. following george kuh's 2008 report, high-impact educational practices: what they are, who has access to them, and why they matter, universities recognized and responded to the need to integrate these high-impact practices into their curricular and co-curricular efforts.1 one of the recognized high-impact practices is student research.2 students can contribute to their disciplinary scholarly conversation through their original research and by presenting on their research projects, and colleges and universities can promote this conversation by facilitating the display of student work and enabling interactive discussions between the student presenters and other members of their academic community. the number of student research conferences has increased internationally.3 students participate in the formal aspect of these conferences, as well as informal conversations where they can continue to expound on their research, extend their professional social networks, and gain confidence as researchers.4 student research is also being captured in institutional repositories (irs) more than in the past.5 "these most junior members of the academic community are doing research and adding to the body of knowledge generated by their institutions and in their disciplines."6 passehl-stoddart and monge point out the importance of institutional repositories supporting student work: "the ir also serves to support, enhance, and capture evidence of high-impact educational practices; acts as an equitable access point to meaningful learning opportunities; and provides a platform for students to begin to develop academic confidence and an entryway into the scholarly communication learning cycle."7 in supporting high-impact student research, librarians do not act alone.
we collaborate with other departments on campus such as offices of undergraduate and graduate studies; offices of research; honors colleges; student affairs; and more. krause, eickholt, and otto describe how the library collaborated with the music department at eastern washington university to upload student musical performances to the institutional repository.8 this type of collaboration leads to increased student support, as well as increased discoverability of student intellectual and creative scholarship. when the covid-19 pandemic hit, universities around the world were forced to change their means of conducting business. classes were moved online at many institutions. conferences were either canceled or moved online as well. many colleges and universities around the country host student research symposiums to highlight the high-impact work that students are doing. these symposiums needed to move to remote delivery, and many of these had to move quickly as the spring semester was well underway when institutions were being forced to close. symposiums and conferences adapted to online environments by moving away from their in-person events. this applied to both academic and professional conferences. for example, oregon state university (osu) and the new haven local section of the american chemical society hosted their respective events virtually using a variety of technologies. osu worked with its distance learning unit to create a canvas course, whereas the new haven local section of the american chemical society used a combination of open broadcaster software (obs studio), youtube, zoom, and google drive. as for professional conferences, many used prerecorded sessions when hosting on digital platforms such as zoom.9 there were positive outcomes from these virtual symposiums. for example, osu saw benefits of "enhanced ability to devote personalized attention to presenters (e.g., by providing links to relevant publications or websites), fewer distractions, more time to craft thoughtful responses, and an ability for students to keep track of shared resources and discussants' contact information that could be used for follow-up after the event." their post-event surveys also showed that students who could not previously participate due to distance circumstances were able to participate in an online forum. osu's approach involved using canvas, their learning management system (lms), through which students submitted prerecorded lightning talks over powerpoint slides with a written narrative. the canvas course was open to the osu community. discussion boards for commenting were open for a two-day period.10 in her article, stephanie houston interviewed various conference coordinators.11 interviewees stated that a major benefit was global access to information from top researchers.12 with regard to cancer research conferences, free registration vastly expanded the number of registrants from previous years.13 conference hosts felt as though some of the differences of online events would stay for future years because of personal scheduling issues, ability to provide global access, and environmental impact.14 others think the novelty of virtual events may wear off following the pandemic.15 however, the switch to virtual events was not without challenges.
osu noted that two main challenges they faced were organization of presentations and presenters responding to comments on their asynchronous presentations.16 houston’s interviewees explain that the lack of informal discussions and face-to-face interactions was a negative of hosting virtual symposiums.17 speirs also states that virtual poster sessions suffer from the lack of interaction of face-to-face exchanges, especially for young researchers.18 some saw that the large number of participants made it difficult for participants to engage in question-and-answer sessions.19 two of the interviewees attempted to fix this by using twitter to have asynchronous q&a using a specific hashtag for the event.20 technology issues such as limited bandwidth and internet connectivity problems are a concern for virtual conferences.21 conferences that are not archived can result in a loss of material beyond the original event. jonathan bull and stephanie davis-kahl discuss the problem of conference ephemera not being accessible in their poster presentation.22 they explain that conference-hosting as an institutional repository service can assist with this lack of accessibility. “by posting documents and artifacts from conferences within an institutional repository, the content is not only accessible for future use, but also preserves those materials for the future and for institutional memory.”23 virtual student research symposiums longwood university’s virtual spring showcase for research and creative inquiry longwood university is a public university in south central virginia. it has about 4,000 undergraduate and 500 graduate students. it is known for its liberal arts focus, with strong programs in the humanities, nursing, and education. since spring of 2018 there has been a spring symposium of undergraduate research (now called the spring showcase for research and creative inquiry). for the first three years of its existence, the spring showcase was planned as a single day event in april, then a fall showcase was added in november 2019. in january 2020, the university showcase committee began planning to have an in-person spring showcase for research and creative inquiry on april 22, 2020. the proposed schedule was to be as follows: students would register to be part of it by march 13, 2020; they would be notified of their acceptance by the end of march, and they would be encouraged to submit posters to the institutional repository, digital commons @ longwood, by the date of the spring showcase, april 22. planning for the in-person showcase continued throughout february and the first part of march. one of the elements on the registration form was giving permission for student content to information technology and libraries march 2022 using open access institutional repositories to save the student symposium | symulevich and hamilton 4 go into the institutional repository. this step had been added in fall 2019 for the previous showcase. as covid-19 cases in the united states began rising in the beginning of march, administrators at longwood began to discuss the possibility of altering certain events. author mark hamilton considered the possibility of offering the institutional repository as a vehicle for hosting digital content for the showcase. by march 23, the director of undergraduate research notified the author that a decision had been made to host the spring showcase as an asynchronous event from april 22–24, 2020. 
this decision had been made by a small group including the co-chairs of the showcase and the provost in consultation with others. the director of undergraduate research specifically asked the library if the event could be hosted virtually through digital commons @ longwood and also requested a comments feature to facilitate online conversation. students, faculty, and staff would comment on the presentations throughout the three-day showcase. presenters would check the comments within those three days and post replies. the director also asked if it would be possible to upload videos that could go along with posters and other presentations. hamilton and the library’s digital initiatives specialist began to work through the technical aspects of making the showcase virtual. after inquiring about potential software from digital commons, they looked through two suggested options, disqus (https://disqus.com/) and intense debate (https://intensedebate.com/). they decided on intense debate because the comments feature was already integrated into the platform. they also looked through the various video formats available. then they worked with the digital commons representative to develop the structure for the showcase. this involved a bit of dialogue back and forth between the showcase co-chairs, digital commons staff, and the library. because the registration form already gave permission to post student content, it was decided that the university did not need to ask for this permission a second time for a virtual conference. new workflows were developed for the research submission process which included posters, presentations, and videos. faculty would submit files on behalf of students in their classes. library staff and instructional designers developed video and printed upload instructions. they were posted on the showcase website (part of the university website) as well as advertised via the library website. faculty asked students to submit their final projects by thursday, april 16, so there was time to upload the project posters and videos to digital commons @ longwood. most students submitted their projects through the campus lms, canvas. as faculty attempted to upload content, library staff were available to help them. the author helped one faculty member via zoom, describing the process of uploading to digital commons. one process that had to be adjusted involved powerpoint presentations that contained videos—they had to be separately downloaded and then just the powerpoint of the poster re-uploaded, so visitors could both view the powerpoint and watch the video. hamilton and the other staff member worked with faculty to place all the required content for each presentation; then the library’s digital initiatives specialist made all the content live. a number of activities occurred during the showcase. initially the digital initiatives specialist individually approved the comments that were posted, because this was the default set up. later this was changed to allow for automatic posting to speed up the approval process and to remove any apparent bias on the part of the administrators. some faculty also uploaded a few new versions of presentations. 
some of the science students decided to post only their abstracts, because they were going to publish their research in journals and did not want their content to be open access and because some faculty were co-authors in these publications. in the subject listing, library staff included a link to the live zoom presentations that were offered. there was also a link to a live zoom session for the showcase awards ceremony, highlighting submissions to the journal of the college of arts and sciences. longwood university has hosted two more virtual showcases: in fall 2020 and spring 2021. the showcase organizers chose to switch the hosting platform to symposium by forager one (https://symposium.foragerone.com/), a third-party platform that allows for virtual and live posting of presentations and videos. the new platform provided an easier interface for students to submit research and administrators to manage it. library staff worked with the showcase organizing committee to preserve all the abstracts from the spring showcase. they are in discussions about how future content will be preserved and whether library staff should collect some of the research into the institutional repository. university of south florida st. petersburg campus virtual student research symposium the university of south florida st. petersburg campus is a branch campus of the larger university of south florida (usf). usf is an r1 research institution with approximately 50,000 undergraduate and graduate students in tampa, florida. at the time of the 2020 virtual student research symposium, usf st. petersburg campus was a separately accredited institution with roughly 5,000 undergraduate and graduate students. the student research symposium was in its 17th iteration in 2020. the office of research at usf st. petersburg organized the event and coordinated with the nelson poynter memorial library and the honors program. undergraduate and graduate students were invited to share their work with the campus community to demonstrate the high-impact research that they were conducting. in 2019, the library had worked with the office of research to host award winners and posters nominated by faculty on the bepress digital commons institutional repository called usfsp digital archive. for the 2020 symposium, the office of research began planning in august 2019. the in-person symposium was scheduled for april 16, 2020. when the covid-19 pandemic hit, the usf st. petersburg campus moved to remote instruction and ended on-campus activities on march 20, 2020. the office of research staff contacted librarians at nelson poynter memorial library to discuss the possibility of a virtual symposium. author allison symulevich considered a variety of platforms for hosting the research symposium, such as the campus website, canvas, libguides, digital commons, and facebook. criteria for platforms included factors related to team control, security, engagement, and archiving. because of these factors and the prior pilot project, the library decided to recommend that the institutional repository be used to host the virtual research symposium. the office of research wanted to capture an experience for the students similar to that of the in-person event.
thus, they requested that the platform include both video and audio options, as well as a way for the poster to be viewed. the office of research also requested audience participation through a commenting feature if possible. they also extended student submission deadlines to assist with the disruption in students' lives. the office of research used a course in canvas, the learning management system used at usf, to collect research posters and presentations from the students. the library digital team was given access to the canvas course so that the team could download posters, presentations, and abstracts to then upload to the institutional repository. the library uploaded 55 student projects, 43 of which had a video or audio presentation. the digital team had hoped to batch upload the files to the institutional repository using spreadsheets containing metadata such as author names, titles, abstracts, and links to audio or video presentation files. however, due to technical concerns, everything was uploaded manually, with work divided amongst team members who had previously used the system. first, all of the content was downloaded from canvas. these files were then posted to a shared drive in a variety of folders organized to maintain a workflow. the projects with audio/video presentations were uploaded first. then the projects that had abstracts and posters were added. due to time constraints, the digital team wanted to make sure the basics were done first so students would have time to make any revisions necessary before the site was promoted to the usf st. petersburg community. after this initial implementation, the team had a meeting with the larger committee to discuss the progress of the digital collection. the committee suggested some changes and offered constructive feedback. once the abstracts, posters, and presentations (either audio or video) were posted, the team noticed issues with some submissions. some students had submitted powerpoint presentations that did not display as the team was hoping, so one of the team members changed the format to mp4 files. audio files did not include a visual component. as a way to add a visual component, the team worked with digital commons to create a digital image gallery and add thumbnail images that could then be added to a special metadata field called poster preview. this enabled the collection to have a visual of the poster displayed above the audio file, allowing virtual attendees to press play on the audio file and see the poster image on the same page. the team then turned to the office of research's request for a feature that allowed virtual attendees to interact with student presenters. digital commons does not have a commenting feature, so the digital team had to look at third-party commenting platforms. digital commons was able to integrate the platform chosen, intense debate, so that virtual attendees could comment on presentations. students were asked to monitor their posters for a two-week period.
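the batch-upload route the team had hoped to take is worth sketching for institutions with more preparation time. the fragment below is a minimal, hypothetical illustration rather than the workflow usf actually ran: it walks a directory of submissions exported from the learning management system and writes a metadata spreadsheet that could then be mapped onto a repository's own batch-loading template. the folder layout, file-naming convention, and column names are all assumptions made for the example; a real digital commons batch template defines its own required headers.

import csv
from pathlib import Path

# hypothetical layout: one folder per student project, exported from the lms,
# containing a poster (pdf), an optional media file, and an abstract.txt file.
SUBMISSIONS = Path("symposium_2020/submissions")
OUTPUT = Path("symposium_2020/batch_upload.csv")

# hypothetical column names; map these onto the repository's own template.
FIELDS = ["title", "authors", "abstract", "document_path", "streaming_media_path"]

def describe(project_dir: Path) -> dict:
    """collect one spreadsheet row of metadata for a single project folder."""
    abstract_file = project_dir / "abstract.txt"
    abstract = abstract_file.read_text(encoding="utf-8").strip() if abstract_file.exists() else ""
    posters = sorted(project_dir.glob("*.pdf"))
    media = sorted(project_dir.glob("*.mp4")) + sorted(project_dir.glob("*.mp3"))
    return {
        "title": project_dir.name.replace("_", " "),
        "authors": "",  # filled in later from a registration export or by hand
        "abstract": abstract,
        "document_path": str(posters[0]) if posters else "",
        "streaming_media_path": str(media[0]) if media else "",
    }

def main() -> None:
    rows = [describe(d) for d in sorted(SUBMISSIONS.iterdir()) if d.is_dir()]
    with OUTPUT.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    print(f"wrote {len(rows)} rows to {OUTPUT}")

if __name__ == "__main__":
    main()

a spreadsheet produced this way also doubles as a manifest for manual uploading and later quality checks, which remains useful even when, as here, the records ultimately have to be keyed in by hand.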
moving forward, the library and the usf st. petersburg campus discussed using the institutional repository for the spring 2021 symposium. however, due to administrative consolidation of the usf tampa, st. petersburg, and sarasota-manatee campuses into one usf with a single accreditation, the new combined office of research on the tampa campus decided to host the newly expanded, one-university, undergraduate student symposium through a canvas course.24 download statistics following the research symposiums, the authors looked at metrics for the virtual events. at usf, 55 presentations were uploaded to the ir. all-time downloads from april 1, 2020 to december 31, 2021, including additional supplementary files, are 2,068, from 53 countries around the world. total streams of audio or video presentations for the same timeframe are 1,168. at longwood, 200 presentations were uploaded to the ir. all-time downloads from april 13, 2020 to december 31, 2021, including additional supplementary files, are 16,190 from 124 countries around the world. total streams of video presentations for the same timeframe are 2,541. see figures 1 and 2. these presentations are still getting downloads and streams—one of the benefits of preserving high-impact student research projects. figure 1. downloads of symposium materials from each campus (university of south florida and longwood university) from april 20, 2020 to december 31, 2021. blue represents downloads of presentations; red represents supplementary materials. figure 2. streams of symposium materials from each campus (university of south florida and longwood university) from april 20, 2020 to december 31, 2021. dark blue represents plays, blue represents views, and light blue represents completed viewings. best practices after reflecting on these large undertakings to move in-person events to online student research symposiums, the authors have identified some common best practices, meant to assist other institutions making similar decisions. these decisions are based on the following core requisites. consistent university branding although both universities used bepress' digital commons platform, institutions can use a variety of online platforms, such as campus websites, other institutional repositories, and third-party software for conference hosting, such as symposium by forager one or lumen learning. use a system that creates a cohesive look and feel to the collection of student research projects. usf had to do this with audio-only presentations for consistency of viewing, adding a visual component to match those of video presentations. university access or open access use a platform that allows archiving of student projects. even if the platform chosen for hosting the event does not allow for archiving, libraries should work with event hosts to provide institutional repository digital archiving of projects, similar to usf's pilot project and longwood's 2021 spring project of archiving abstracts. libraries can offer this as a solution to provide permanent archiving of high-impact student work.25 institutions need to consider whether they will keep their symposiums closed, meaning only accessible to the university community, or open to the world. while it is technically straightforward to restrict access using the campus lms, irs using net id sign-ins, or private websites, the authors argue for worldwide access to these presentations.
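usf addressed the audio-only problem by attaching a poster-preview image in digital commons; a different option, sketched below purely as an illustration and not something either institution did, is to render each audio narration together with its poster into a single mp4 before upload, so that every presentation arrives in the one file format the platforms handled best. the sketch assumes ffmpeg is installed and uses hypothetical file names; the flags are a standard still-image-plus-audio recipe.

import subprocess
from pathlib import Path

def narration_to_mp4(poster: Path, audio: Path, output: Path) -> None:
    """render a static poster image plus an audio narration into a single mp4 file."""
    subprocess.run(
        [
            "ffmpeg",
            "-loop", "1",            # repeat the single poster frame for the whole video
            "-i", str(poster),
            "-i", str(audio),
            "-c:v", "libx264",
            "-tune", "stillimage",
            "-c:a", "aac",
            "-b:a", "192k",
            "-pix_fmt", "yuv420p",   # broad player compatibility
            "-shortest",             # stop when the narration ends
            str(output),
        ],
        check=True,
    )

if __name__ == "__main__":
    # hypothetical file names for one student project
    narration_to_mp4(Path("poster.png"), Path("narration.mp3"), Path("presentation.mp4"))

converting everything to a single video format up front trades a little processing time for consistent playback, streaming statistics, and preservation handling downstream.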
archiving student work archiving these projects allows students to build their cvs for graduate school or interviews by providing hyperlinked citations to worldwide published projects. making these projects available open access allows students to contribute to the worldwide scholarly conversation on their given research topics.26 statistics from both longwood and usf show international downloads. file formats for file formats, consider embedding video and audio files in consistent formats. mp4 video files worked best for both longwood and usf on the digital commons platform. audio files should be consistent as well for preservation. cross-unit collaborations work with other departments on campus to host these major academic events. many units on campus contribute to student success, and these efforts can be combined to distribute work amongst university faculty and staff so as not to overload one department and to provide the best possible symposium. different departments have different skill sets, such as technology and marketing. both longwood and usf st. petersburg libraries worked with departments such as undergraduate studies and communications to switch these in-person events to successful online programs. consider working with distance learning units to increase distance learning student participation in student research symposiums.27 distance learning and it departments may have additional technology experience that could lead to a better overall experience for students. in 2021, longwood worked with the office of student research, the library, marketing and communications, and academic affairs to put on the spring student showcase. this inter-unit work led to another successful online event, with several hundred student researchers presenting their work. flexibility institutions should use flexible workflows when transitioning in-person events to online. both longwood and usf used flexible workflows for posting presentations into institutional repositories. however, the two universities differed in their submission process. longwood had faculty submit student projects directly to the ir, a more distributed approach. usf st. petersburg had students submit projects to canvas, and then the digital team posted projects to the ir, a more centralized approach. institutions will need to decide which approach works best for them. usf st. petersburg does not have a history of allowing outside submissions to its ir. the digital teams needed to remain flexible as event dates were moving and online technology requests were changing, for example, event coordinators requesting online commenting features. similarly, deadlines should be set with realistic timeframes, allowing enough time for uploading projects to the online platform. longwood and usf worked with offices of research to establish flexible timelines for digital teams. submission forms consider using forms or a system to collect student submissions. google forms, microsoft forms, or a learning management system such as canvas are ways to collect the projects. make sure to test these prior to the submission process. both universities used canvas during the 2020 student research symposiums to collect student projects.
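as noted earlier, the usf team had hoped to batch upload projects using spreadsheets of metadata. purely as an illustrative sketch (the export and batch column names below are hypothetical stand-ins, not an actual digital commons batch template), a submissions export from a form or lms could be reshaped into such a sheet with a short script:

```python
import csv

# Hypothetical field names for a forms/LMS export and for a repository batch sheet;
# actual platform batch templates use different, platform-specific headers.
EXPORT_FIELDS = ["student_name", "project_title", "abstract", "presentation_link"]
BATCH_FIELDS = ["title", "author", "abstract", "multimedia_url", "document_type"]

def export_to_batch_sheet(export_path, batch_path):
    """Convert a submission-form export into a batch-upload metadata sheet."""
    with open(export_path, newline="", encoding="utf-8") as src, \
         open(batch_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=BATCH_FIELDS)
        writer.writeheader()
        for row in csv.DictReader(src):
            writer.writerow({
                "title": row["project_title"].strip(),
                "author": row["student_name"].strip(),
                "abstract": row["abstract"].strip(),
                "multimedia_url": row["presentation_link"].strip(),
                "document_type": "poster",  # assumption: all submissions are posters
            })
```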
however, in 2021, longwood students (both graduate and undergraduate) submitted directly to forager one’s symposium platform because it was already integrated into the campus single sign-on service, enabling ease of submissions. for the graduate student research symposium in 2021, usf used microsoft forms to create a form that was tailored to file format preference. although this form was not used after the office of graduate studies went in another direction, symulevich felt it was an improvement from the previous year’s collection process due to the output of an excel spreadsheet for metadata collection for batch uploading purposes. abstract archiving institutions should consider allowing students to submit abstracts only. longwood allowed students to not submit complete presentations if they were planning to publish their projects. this may be more of an issue when students are working with faculty members on research to be published at a later date. promoting the symposium and creating engagement promote the event to increase student participation. this can be done both through social media and through university web presences. consider working with your campus marketing and communications department to broaden marketing beyond the library. this marketing can be both to gain student projects and to promote the event to the broader campus community. likewise, seek ways to promote engagement on the institutional repository or whatever platform is chosen. this could include using a third-party commenting feature as a way to further engage students with their scholarly topic. however, make sure to monitor commenting in some capacity to avoid spam. also, turn off commenting features after a certain period of time so as not to overburden students. commenting features and increased engagement via online platforms, like video and audio presentations, help avoid the negative impact of a lack of face-to-face interactions.28 information technology and libraries march 2022 using open access institutional repositories to save the student symposium | symulevich and hamilton 10 hybrid symposiums even after the end of the covid pandemic when events resume in-person, hybrid symposium models should be considered, as evidenced by longwood’s use of synchronous presentations using zoom. these links were integrated into the ir. osu is considering using hybrid solutions in the future as well.29 conclusion moving in-person student research symposiums to online platforms during a pandemic is challenging. but this process of creating online events allows students to continue to celebrate their highimpact research and contribute to the scholarly community. open access archiving of these projects has been successful based on download counts at longwood university and usf st. petersburg campus. the authors hope to continue to use innovative digital archiving to provide support for student research projects. remaining flexible and working with other departments on campus can lead to successful online events. the authors hope in-person events will eventually return; however, these online platforms can enhance student research symposiums, providing global access to high-impact student projects. acknowledgement the authors thank the collaborative teams at longwood university and university of south florida st. petersburg campus that helped make these student research symposiums happen and succeed during a very difficult time. endnotes 1 george d. 
kuh, “high-impact educational practices: what they are, who has access to them, and why they matter,” leap (2008), association of american colleges & universities, https://provost.tufts.edu/celt/files/high-impact-ed-practices1.pdf. 2 kuh, “high-impact.” 3 helen walkington, jennifer hill, and pauline e. kneale, “reciprocal elucidation: a student-led pedagogy in multidisciplinary undergraduate research conferences,” higher education research & development 36, no. 2 (2017): 417, https://doi.org/10.1080/07294360.2016.1208155. 4 walkington, hill, and kneale, “reciprocal elucidation,” 417–18. 5 danielle barandiaran, betty rozum, and becky thoms, “focusing on student research in the institutional repository: digitalcommons@usu,” college & research libraries news 75, no. 10 (2014): 546–49, https://doi.org/10.5860/crln.75.10.9209; betty rozum, becky thoms, scott bates, and danielle barandiaran, “we have only scratched the surface: the role of student research in institutional repositories” (paper, acrl 2015 conference, portland, or, march 26, 2015), https://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2015/rozum_thoms_bates_barandiaran.pdf. 6 rozum, thoms, bates, and barandiaran, “we have only scratched the surface,” 804. 7 erin passehl-stoddardt and robert monge, “from freshman to graduate: making the case for student-centric institutional repositories,” journal of librarianship and scholarly communication 2, no. 3 (2014): 2, https://doi.org/10.7710/2162-3309.1130. 8 rose sliger krause, andrea langhurst eickholt, and justin l. otto, “creative collaboration: student creative works in the institutional repository,” digital library perspectives 34, no. 1 (2018): 20–31, https://doi.org/10.1108/dlp-03-2017-0010. 9 jessica g. freeze et al., “orchestrating a highly interactive virtual student research symposium,” journal of chemical education 97, no. 9 (2020): 2773–78, https://dx.doi.org/10.1021/acs.jchemed.0c00676; sophie pierszalowski et al., “developing a virtual undergraduate research symposium in response to covid-19 disruptions: building a canvas-based shared platform and pondering lessons learned,” scholarship and practice of undergraduate research 4, no. 1 (fall 2020): 75, https://doi.org/10.18833/spur/4/1/10. 10 pierszalowski et al., “developing a virtual undergraduate research symposium,” 75. 11 stephanie houston, “lessons of covid-19: virtual conferences,” journal of experimental medicine 217, no. 9 (2020): e20201467, https://doi.org/10.1084/jem.20201467. 12 houston, “lessons of covid-19,” 2. 13 valerie speirs, “reflections on the upsurge of virtual cancer conferences during the covid-19 pandemic,” british journal of cancer 123 (2020): 698–99, https://doi.org/10.1038/s41416-020-1000-x. 14 houston, “lessons of covid-19,” 3. 15 speirs, “reflections on the upsurge,” 699. 16 pierszalowski et al., “developing a virtual undergraduate research symposium,” 75.
17 houston, “lessons of covid-19,” 2–3; goedele roos et al., “online conferences—towards a new (virtual) reality,” computational and theoretical chemistry 1189 (november 2020): 5, https://doi.org/10.1016/j.comptc.2020.112975. 18 speirs, “reflections on the upsurge,” 699. 19 houston, “lessons of covid-19,” 2–3. 20 houston, “lessons of covid-19,” 2. 21 houston, “lessons of covid-19,” 3; roos et al., “online conferences,” 5; speirs, “reflections on the upsurge,” 699. 22 jonathan bull and stephanie davis-kahl, “contributions to the scholarly record: conferences & symposia in the repository,” library faculty presentations (2015): paper 12, http://scholar.valpo.edu/ccls_fac_presentations/12. 23 bull and davis-kahl, “contributions to the scholarly record.” 24 digital commons @ usf will be used for a hybrid symposium, the 2022 annual undergraduate research conference. there will be an in-person component, as well as both synchronous and asynchronous presentations. 25 passehl-stoddardt and monge, “from freshman to graduate,” 2; barandiaran, rozum, and thoms, “focusing on student research in the institutional repository”; rozum, thoms, bates, and barandiaran, “we have only scratched the surface,” 804. 26 houston, “lessons of covid-19,” 2. 27 pierszalowski et al., “developing a virtual undergraduate research symposium,” 75. 28 houston, “lessons of covid-19,” 2–3; roos et al., “online conferences,” 3. 29 pierszalowski et al., “developing a virtual undergraduate research symposium,” 75. letter from the editor kenneth j. varnum information technology and libraries | march 2018 1 https://doi.org/10.6017/ital.v37i1.10388 this issue marks 50 years of information technology and libraries. the scope and ever-accelerating pace of technological change over the five decades since journal of library automation was launched in 1968 mirrors what the world at large has experienced. from “automating” existing services and functions a half century ago, libraries are now using technology to rethink, recreate, and reinvent services — often in areas that simply were in the realm of science fiction. in an attempt to put today’s technology landscape in context, ital will publish a series of essays this year, each focusing on the highlights of a decade.
in this issue, editorial board member mark cyzyk talks about selected articles from the first two volumes of the journal. in the remaining issues this year, we’ll tackle the 1970s, 1980s, 1990s, and 2000s. the journal itself, now as ever before, focuses on the present and the near future, so we will hold off recapitulating the current decade until our centennial celebration in 2068. as we look back over the journal’s history, the editorial board is also looking to the future. we want to make sure that we know for whom we are publishing these articles, and to make sure that the journal is as relevant to today’s (and tomorrow’s) readership as it has been for those who have brought us to the present. to that end, we invite anyone who is reading this issue to take this brief survey (https://umich.qualtrics.com/jfe/form/sv_6hafly0cyjpbk4j) — tell us a little about how you came to ital today, how you’re connected with library technology, and what you’d like to see in the journal. it won’t take much of your time (no more than 5 minutes) and will help us understand the context in which we are working. there’s another opportunity for you to help shape the future of the journal. due to a number of terms being up at the end of june 2018, we have at least five openings on the editorial board to fill. if you are passionate about libraries and technology, enjoy working with authors to shape their articles, and want to help set out today’s scholarly record for tomorrow’s technologists, submit a statement of interest at https://goo.gl/forms/5gbqouuseolxrfx52. we seek to have an editorial board that represents the diversity of library technology practitioners, and particularly invite individuals from non-academic libraries and underrepresented demographic groups to apply. sincerely, kenneth j. varnum editor march 2018 the benefits of enterprise architecture for library technology management: an exploratory case study sam searle information technology and libraries | december 2018 27 sam searle (samantha.searle@griffith.edu.au) is manager, library technology services, griffith university, brisbane, australia. abstract this case study describes how librarians and enterprise architects at an australian university worked together to document key components of the library’s “as-is” enterprise architecture (ea). the article covers the rationale for conducting this activity, how work was scoped, the processes used, and the outputs delivered. the author discusses the short-term benefits of undertaking this work, with practical examples of how outputs from this process are being used to better plan future library system replacements, upgrades, and enhancements. longer-term benefits may also accrue as the results of this architecture work inform the library’s it planning and strategic procurement. this article has implications for practice for library technology specialists as it validates views from other practitioners on the benefits for libraries in adopting enterprise architecture methods and for librarians in working alongside enterprise architects within their organizations. introduction griffith university is a large comprehensive university with multiple campuses located across the south east queensland region in australia. library and information technology operations are highly converged and from 1989–2017 were offered within a single division of information services.
scalable, sustainable, and cost-effective it is seen as a key strategic enabler of the university’s core business in education and research. “information management and integration” and “foundation technology” are two of four key areas outlined in the griffith digital strategy 2020, which highlights enterprise-wide decision-making and proactive moves to take advantage of as-a-service models for delivering applications.1 from late 2016 through to early 2018, library and learning services (“the library”) and it architecture and strategy (itas) worked iteratively to document key components of the library’s “as-is” enterprise architecture (ea). around fifty staff members have participated in the process at different points. the process has been very positive for all involved and has led to a number of benefits for the library in terms of improved planning, decision-making, and strategic communication. as manager, library technology services, the author was well placed to act as a participant-asobserver with the objective of sharing these experiences with other library practitioners. the author actively participated in the processes described here and has been able to informally discuss the benefits of this work with the architects and some of the library staff members who were most involved. mailto:samantha.searle@griffith.edu.au benefits of enterprise architecture for library technology management | searle 28 https://doi.org/10.6017/ital.v37i4.10437 literature review enterprise architecture (ea) emerged over twenty years ago and is now a well-established it discipline. like other disciplines such as project management and change management, there are a number of best practice frameworks in common use, including the open group architecture framework (togaf).2 a global federation of member professional associations has been in place since 2011, with aims including the formalization of standards and promotion of the value of ea.3 educational qualifications, certifications, and professional development pathways for enterprise architects are available within universities and the private training sector. according to the international higher education technology association educause, ea is relatively new within universities but is growing in importance. as a set of practices, “ea provides an overarching strategic and design perspective on it activities, clarifying how systems, services, and data flows work together in support of business processes and institutional mission.”4 yet despite this growing interest in our parent organizations, individual academic libraries applying ea principles and methods are notably absent from the scholarly literature and library practitioner information sharing channels. the fullest account to date of the experience and impacts of enterprise architecture practice in a library context is a case study from the canada institute for scientific and technical information (cisti). at the time of the case study’s writing in 2008, cisti was already well underway in its adoption of ea methods in an effort to address the challenges of “legacy, isolated, duplicated, and ineffective information systems” and to “reduce complexity, to encourage and enable collaborations, and, finally, to rein in the beast of technology.”5 the author of this case study concludes that while getting started in ea was complex and resource-intensive, this was more than justified at cisti by the improvements in technology capability, strategic planning, and services to library users. 
broader whole-of-government agendas are a driver for ea adoption in non-university research libraries. the national library of finland’s ea efforts were guided by a national information society policy and the ea architecture design method for finnish government. 6 a 2009 review of the it infrastructure at the u.s. library of congress (lc) argued lc was lagging behind other federal agencies in adoption of government-recommended ea frameworks. the impact of this included: inadequate linking of it to the lc mission; potential system interoperability problems; difficulties assessing and managing the impact of changes; poor management of it security; and technical risk due to non-adherence to industry standards and lack of future planning.7 a followup review in 2015 noted that lc had since developed an architecture, but that it had still fallen short by not gathering data from management and validating the work with stakeholders. 8 there is little discussion in the literature about the ea process as a collaborative effort. in their 2016 discussion of emerging roles for librarians, parker and mckay proposed ea as a new area for librarians themselves to consider moving into, rather than as a source of productive partnerships.9 they argued that there are many similarities in the skillsets and practices of enterprise architects and information professionals (in particular, systems librarians and corporate information managers). areas of crossover identified included: managing risks, for example, related to intellectual property and data retention; structured and standardized approaches to (meta)data and information; technical skills such as systems analysis, database design and vendor management; and understanding and application of information standards and internal information technology and libraries | december 2018 29 information flows. while not a research library, within a broader information management context state archives and records nsw has promoted the benefits to records managers of working with enterprise architects, including improved program visibility, strategic assistance with business case development, and the embedding of recordkeeping requirements within the organization’s overall enterprise architecture.10 getting started: context and planning library technology services context in 2015–16, the awareness of enterprise and solution architecture expanded significantly within griffith university’s library technology services (lts) team. in 2015, some members of the team participated in activities led by external consultants to document griffith’s overall enterprise architecture at a high level. in 2016, the author became a member of the university’s solution architecture board (sab). lts submitted several smaller solution architectures to this group for discussion and approval, and team members found this process useful in identifying alternative ways to do things that we may not have otherwise considered. as a small team looking after a portfolio of high-use applications, lts was seeking to align itself as much as possible with university-wide it governance and strategy. these broader approaches included aggressively seeking to move services to cloud hosting, standardizing methods for transferring data between systems, complying with emerging requirements for greater it security, and participating in large-scale disaster recovery planning exercises. the author also needed to improve communication with senior it stakeholders. 
there was little understanding outside of the library of the scale and complexity involved in delivering online library services to a community of over 50,000 people. in a resource-scarce environment, it was increasingly important to make business cases not just in formal project documents but also opportunistically in less formal situations (the “elevator pitch”). existing systems were definitely hindering the library in making progress toward an improved online student experience and more efficient usage of staff resources. a complex ecosystem of more than a dozen library applications had developed over time. the library had selected these at different times based on requirements for specific library functions rather than alignment with an overall architectural strategy. our situation mirrored that described at cisti: “a complex and ‘siloed’ legacy infrastructure with significant vendor lock-in” combined with “reactionary” projects that “extended or redesigned [existing infrastructure] to meet purported needs, without consideration for the complexity that was being added to overcomplicated systems.”11 complex data flows between local systems and third-party providers that were critical to library services were not always well-documented. while lts staff members were extremely experienced, much of their knowledge was tacit. as in many libraries, staff could be observed sharing in informal, organic ways focused on the tasks at hand, but less effort was spent on capturing knowledge systematically. building a more explicit shared understanding about the library’s application portfolio would help address risks associated with staff succession. improved internal documentation would also address emerging requirements for team members to both develop their own understanding in new areas (upskilling) as well as become more flexible in terms of taking up broader roles and responsibilities across the team (cross-skilling). benefits of enterprise architecture for library technology management | searle 30 https://doi.org/10.6017/ital.v37i4.10437 there was also a sense that the time was right to take stock and evaluate the current state of affairs before embarking on any major changes. the team was supporting several applications, including the library management system and the interlibrary loans system, that were end-of-life. we needed to make decisions, and these needed to not only address our current issues but also provide a firm platform for the future. it was in this context that in 2016 library technology services approached the information technology architecture and solutions group for assistance. information technology architecture and solutions context in 2014, griffith university embarked on a new approach to enterprise architecture. the chief technology officer was given a mandate by the senior leadership of the university to ensure that it architecture was managed within an architecture governance framework, and the information services ea team was tasked with developing and maintaining an ea and providing services to support the development of solution architectures for projects and operational activities. 
two new boards were established to provide governance: the information and technology architecture board (itab) would control architectural standards and business technology roadmaps, while the solution architecture board (sab) would “support the development and implementation of solution architecture that is effective, sustainable and consistent with architectural standards and approaches.” project teams and operational areas were explicitly given responsibility to engage with these boards when undertaking the procurement and implementation of it systems. sets of architectural, information, and integration principles were developed, which promoted integration mechanisms that minimized business impact and were future-proof, loosely coupled, reusable, and shared services.12 our enterprise architects saw their primary role as maximizing the value of the university’s total investment in it by promoting standards and frameworks that could potentially improve consistency and reduce duplication across the whole organization. in order to do this , they would need to work with and through other business units. from the architects’ perspective, a collaboration with the library offered an opportunity to exercise skillsets and frameworks that were in place but still relatively new. griffith was still maturing in this area and attempting to move from the hiring of consultants as the norm to building more internal capability. working with the library would be a good learning experience for a junior architect, who was on a temporary work placement from another part of information services as a professional development opportunity. she could build her skills in a friendly environment before embarking on other engagements with potentially less open client groups. determining scope in a statement of architecture work once the two teams had decided that the process could have benefits on both sides, the next step was to jointly develop a statement of architecture work outlining what the process would include and how we would work together. a formal document was eventually endorsed at the director level, but prior to that, the librarians and the architects had a number of useful informal conversations in which we discussed our expectations, as well as the amount of time that we could reasonably contribute to the process. in developing the statement of work, the two teams agreed to focus on the current “as-is” environment and on assessment of the maturity of the applications already in use (see figure 1). this would help us immediately with developing business cases and roadmaps, without information technology and libraries | december 2018 31 necessarily committing either team to the much greater effort required to identify an ideal “to-be” (i.e., future) state to work towards. figure 1. overview of the architecture statement of work. full size version available at https://doi.org/10.6084/m9.figshare.6667427. 
the open group architecture framework (togaf) supports the development of enterprise architectures through four subdomains: business architecture, data architecture, application architecture, and technology architecture.13 the work that we decided to pursue maps to two of these areas: data architecture, which “describes the structure of an organization’s logical and physical data assets and data management resources;” and application architecture, which “provides a blueprint for the individual applications to be deployed, their interactions, and their relationships to the core business processes of the organization.” enterprise architecture process and outputs once the architecture statement of work had been agreed on, the two teams embarked on the process of working together over an extended period. while the lapsed time from approval of the statement of work through to endorsement of the architecture outputs by the solution architecture board was approximately fourteen months, the bulk of the work was undertaken within the first six months. following an intense period of information gathering involving large numbers of staff, a smaller subset of people then worked iteratively to refine the outputs for final approval. several times architecture activities had to be placed on hold in favor of essential ongoing operational work and higher priority projects, such as a major upgrade of the institutional repository. the process involved four main activities which are described in more detail in following sections. https://doi.org/10.6084/m9.figshare.6667427 benefits of enterprise architecture for library technology management | searle 32 https://doi.org/10.6017/ital.v37i4.10437 data asset and application inventory the first activity consisted of a series of three workshops to review information held about library systems in the ea management system, orbus software’s iserver. this is the tool used by the griffith ea team to develop and store architectural models, and to produce artifacts such as architecture diagrams (in microsoft visio format) and documentation (in microsoft word, excel, and powerpoint formats).14 the architects guided a group of librarians who use and support library systems through a process of mapping the types of data held against an existing list of enterprise data entities. in this context, a data entity is a grouping of data elements that is discrete and meaningful within a particular business context. for library staff, meaningful data entities included all the data relating to a person, to items and metadata within a library collection, and to particular business processes such as purchasing. we also identified the systems into which data were entered (system of entry), the systems that were considered the “source of truth” (system of record), and the systems that made use of data downstream from those systems of record (reference systems). the main output of this process was a workbook (figure 2) showing a range of relationships: between systems and data entities; between internal systems; and between internal systems and external systems. the first two columns in the worksheet contain a list of all the data entities and sub-entities stored in library systems (as expressed in the enterprise architecture). along the top of the worksheet is a list of all the products in our portfolio along with a range of systems they are integrated with. each of the orange arrows in this spreadsheet represents the flow of data from one system to another. 
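to make the structure of that workbook more concrete, the sketch below expresses the same idea in python; the entity and system names are illustrative stand-ins, not the actual griffith inventory, which lives in iserver and is far more granular:

```python
# Illustrative stand-ins for workbook rows (data entities) and columns (systems).
inventory = {
    "bibliographic record": {
        "system_of_entry": ["library management system - cataloguing"],
        "system_of_record": "library management system",
        "reference_systems": ["discovery layer", "reading lists"],
    },
    "collection item - holdings": {
        "system_of_entry": ["library management system - cataloguing"],
        "system_of_record": "library management system",
        "reference_systems": ["interlibrary loans", "discovery layer"],
    },
}

def data_flows(inv):
    """Yield (source_system, target_system, entity) tuples, the 'orange arrows'."""
    for entity, mapping in inv.items():
        record_system = mapping["system_of_record"]
        for entry_system in mapping["system_of_entry"]:
            if entry_system != record_system:
                yield (entry_system, record_system, entity)
        for reference in mapping["reference_systems"]:
            yield (record_system, reference, entity)

for src, dst, entity in data_flows(inventory):
    print(f"{entity}: {src} -> {dst}")
```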
the workbook in this raw form is definitely messy and the data within it is not really meant to be widely consumed in this format. the workbook’s main role is as the data source for the application communication diagram that is described in a later section. as a result of this data asset inventory, the management system used by our architects now contains a far more comprehensive and up-to-date view of the library’s architectural components than before: • the data entities better reflect library content. for example, while iserver already had a collection item data entity, we were able to add new data entity subtypes for bibliographic records, authority records, and holdings records. • library systems are now captured in ways that make more sense to us. workshopping with the architects led to the breakdown of several applications into more granular architectural components. for example, the library management system is now represented not just as a single system, but rather as a set of interconnected modules that support different business functions, such as cataloguing and circulation. similarly, our reading lists solution was broken down into its two main components: one for managing reading lists and one for managing digitized content. this granularity has enabled us to build a clearer picture of how systems (and modules within systems) interface with each other. information technology and libraries | december 2018 33 figure 2. part of the data asset and application inventory worksheet. full size version available at https://doi.org/10.6084/m9.figshare.6667430. https://doi.org/10.6084/m9.figshare.6667430 benefits of enterprise architecture for library technology management | searle 34 https://doi.org/10.6017/ital.v37i4.10437 • the wide range of technical interfaces we have with third parties, such as publishers and other libraries, is now explicitly expressed. feedback from the architects suggested that the library was very unusual compared to other parts of the organization in terms of the number of critical external systems and services that we use as part of our service provision. previously iserver did not contain a full picture of these critical services, including: o the web-based purchasing tools that we use to interact with publishers, such as ebsco’s gobi;15 o the library links program that we use to provide easier access to scholarly content via google scholar;16 and o various harvesting processes that enable us to share metadata with content aggregators, such as the national library of australia’s trove service and the australian national data service’s research data australia portal. 17 application maturity survey the second activity was an application maturity assessment. this involved forty-four staff members from all areas of the library with different viewpoints (technical, non-technical, and management) answering a series of questions in a spreadsheet format. the survey contained questions about: • how often a system was used; • how easy it was to use; • how well it supported the business processes that person carried out; • how well it performed, for example, in terms of response times; • how quickly changes/enhancements were implemented in the product; • how easily the system could be integrated with other systems; • the level of compliance with industry standards; and • overall supportability (including vendor support). 
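before looking at the outputs, it may help to sketch how such one-to-five responses can be rolled up into per-application scores and a quadrant-style recommendation. the question groupings, midpoint threshold, and quadrant labels below are assumptions made purely for illustration, not the actual model used in iserver:

```python
from statistics import mean

# Hypothetical grouping of survey questions into the three headline measures.
BUSINESS_FIT = ["ease_of_use", "process_support", "change_speed"]
TECHNICAL_FIT = ["performance", "integration", "standards", "supportability"]
CRITICALITY = ["frequency_of_use"]

def summarize(responses):
    """responses: list of dicts of 1-5 scores for a single application."""
    def avg(keys):
        return mean(r[k] for r in responses for k in keys)
    return {
        "business_fit": round(avg(BUSINESS_FIT), 1),
        "technical_fit": round(avg(TECHNICAL_FIT), 1),
        "criticality": round(avg(CRITICALITY), 1),
    }

def management_strategy(business_fit, technical_fit, midpoint=3.0):
    """Map fit scores onto four quadrants; the mapping here is an assumption."""
    if business_fit >= midpoint and technical_fit >= midpoint:
        return "optimise"
    if business_fit < midpoint and technical_fit >= midpoint:
        return "implementation review"
    if business_fit >= midpoint and technical_fit < midpoint:
        return "technology refresh"
    return "replace"
```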
as different respondents were assigned multiple systems depending on their level of support and/or use, the final overall number of responses to the survey was 144 responses relating to eleven different systems. the outputs of this process were a summary table and a series of four graphs. the summary table (see figure 3) presents aggregated scores on a scale of one (low) to five (high) for each application as well as recommended technical and management strategies. it is interesting, and somewhat disheartening, to note that scores for the business criticality of the applications are generally much higher than the scores for fitness. there is also some variation in the strategies required; some systems need to be replaced, but there are others where the issues seem to be less technical. the third row of the table shows a product that is scored as highly business-critical and perfectly suited to the job from a technical perspective, yet the product still scores much more poorly for business fit, which could indicate that something has gone wron g in the way that this product has been implemented. information technology and libraries | december 2018 35 figure 3. table summarizing the results of the application maturity assessment [product names redacted]. applications are rated on a scale of one to five, and one of four management strategies (technology refresh—not shown here, optimise, implementation review, or replace) is recommended. full size version available at https://doi.org/10.6084/m9.figshare.6667433. figure 4. two of the four graph types produced from the application maturity survey results, for a product [name redacted] that is performing well. full size version available at https://doi.org/10.6084/m9.figshare.6667436. figures 4 and 5 show the four graph types produced automatically from the survey results. on the left in figure 4 is a view displaying the business criticality, business fit, and technical fit for an individual application (shown in pink) as compared to the overall portfolio (shown in blue). on the right is a graph showing scores for the range of measures covered by the survey. this https://doi.org/10.6084/m9.figshare.6667433 https://doi.org/10.6084/m9.figshare.6667436 benefits of enterprise architecture for library technology management | searle 36 https://doi.org/10.6017/ital.v37i4.10437 particular product is doing well; technical and business fit are high in the graph on the left, and most measures are above average in the graph on the right. figure 5 shows the remaining two graphs for the same product. the graph on the left plots the scores for business criticality and application suitability (fitness for purpose) to produce a recommended technical strategy. the graph on the right plots the scores for business fit and technical fit to produce a recommended management strategy. in both graphs, it is possible to see how the specific application is performing (the red square) compared to the portfolio overall (the blue diamond). placement within the quadrant with the green optimize label is preferred, as in this case. figure 5. the remaining two graph types from the application maturity survey results, for a system [product name redacted] that is performing well. the specific system’s location is shown by the red square, while the blue diamond maps the average for all systems in the application portfolio. full size version available at https://doi.org/10.6084/m9.figshare.6667442. figures 6 and 7 present the same set of graphs for an end-of-life system. 
in figure 6 the graph on the left shows that the product is very business-critical but that its scores for technical fit and business fit (the lower corners of the pink triangle) are lower than the average across all applications (the lower corners of the blue triangle). the graph on the right shows that supportability and the time to market for changes and enhancements (the least prominent “points” in the pink polygon) are below the portfolio average (shown in blue along the same axes) while scores for other criticality, standards compliance, information quality, and performance were more in line with the portfolio average. https://doi.org/10.6084/m9.figshare.6667442 information technology and libraries | december 2018 37 figure 6. the first and second (of four) graphs for a system [product name redacted] that is end of-life. full size version available at https://doi.org/10.6084/m9.figshare.6667478. in figure 7, this application is placed well within the quadrant suggesting replacement. figure 7. the third and final graphs for a system [product name redacted] that is end-of-life. the placement of the red square within the replace quadrant indicates that this product is a high candidate for decommissioning. this is a marked difference from the portfolio as a whole (the blue diamond), which could be reviewed for possible implementation improvements. full size version available at https://doi.org/10.6084/m9.figshare.6667484. https://doi.org/10.6084/m9.figshare.6667478 https://doi.org/10.6084/m9.figshare.6667484 benefits of enterprise architecture for library technology management | searle 38 https://doi.org/10.6017/ital.v37i4.10437 the graphs are also useful for highlighting anomalies. figure 8 shows a product that is assessed as better-than-average in the portfolio on most measures. however, the survey results quite clearly show that information quality is a major issue. figure 8. graph from application maturity survey showing a specific area of concern (data quality) for an otherwise well-performing application [product name redacted]. full size version available at https://doi.org/10.6084/m9.figshare.6667487. this type of finding will help library technology services to target our continuous improvement efforts and work through our relationships with user groups and vendors to get a better result. application communication diagram the third major activity was the production of an application communication diagram (see figure 9). this is a visual representation of all of the information that was collated through the workshops using the workbook described above. https://doi.org/10.6084/m9.figshare.6667487 information technology and libraries | december 2018 39 figure 9. application communication diagram [simplified view]. full size version available at https://doi.org/10.6084/m9.figshare.6667490. https://doi.org/10.6084/m9.figshare.6667490 benefits of enterprise architecture for library technology management | searle 40 https://doi.org/10.6017/ital.v37i4.10437 the diagram includes a number of things to note. • key applications that make up the library ecosystem. an example of this is the large blue box on the top left. this represents the intota product suite from proquest, which contains multiple components, including our link resolver, discovery layer, and electronic resource manager. • physical technology. self-checkout machines appear as the small green box mid-right. • other internal systems that connect to library system components. 
examples of these are throughout and include: corporate systems, such as peoplesoft for human resources and finances; identity management systems like metadirectory and ping federate; the learning management system blackboard; and research systems, including the research information management system and the researcher profiles system. • external systems that connect to our systems. these are mostly gathered into the large grey box bottom right. • actors who access the systems. this includes administrators, staff, students, and the general public. actors are identified using a small person icon. • interfaces between components. each line in the diagram represents a unique connection into another system or interface. captions on these lines indicate the nature of the connection, e.g., manual data entry, z39.50 search, export scripts, and lookup lists. the production of this diagram has been an iterative process that has taken place over a long time period. the number of components involved in the diagram is quite large, so it is worth noting that the version presented here has actually been simplified. the architects’ tools can present information in different ways and this particular “view” was chosen to balance the need for detail and accuracy with the need to communicate meaningfully with a variety of stakeholders. production of interactive visualizations in the fourth and final work package, the data entity and application inventory spreadsheet was used as a data source to provide an interactive visualization (see figure 10). a member of the architecture team converted the workbook (see figure 2) from microsoft excel .xls into a .csv file. he developed a php script to query the file and return a json object based on the parameters that were passed. the data driven documents javascript library (d3.js) was used to produce a force graph that uses shapes, colors, and lines to visually present the spreadsheet information in a more interactive way.18 this tool enables navigation through the library’s network of data entities (shown as orange squares) and applications (shown as blue dots). in the example being displayed, the data entity “bibliographic records—marc” has been selected. it is possible to see both in the visualization and in the popup box on the left how marc records are captured, stored, and used across our entire ecosystem of applications. this visualization was very much an experiment and the value of this in the long term is something we are still discussing. in the short term, other outputs have proven to be more useful for planning purposes. figure 10. interactive visualization of library architecture, showing relationships between a single data subentity (bibliographic records—marc) and various applications. full size version available at https://doi.org/10.6084/m9.figshare.6667493. discussion the process described above was not without its challenges, including establishing a common language. enterprise architecture and libraries are both fertile breeding grounds for jargon and acronyms. there was also a disconnect in our understandings of who our users were, with the architects tending to concentrate on internal users, while the librarians were keen to include the perspectives of the academic staff and students who make up our core client base.
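the visualization work package described above used a php script and d3.js. as an analogous sketch only (in python rather than php, and with hypothetical column names rather than the real iserver export), converting the inventory csv into the nodes-and-links json object that a d3.js force layout consumes might look like this:

```python
import csv
import json

def inventory_to_force_graph(csv_path, json_path):
    """Build the {nodes, links} JSON shape a d3.js force layout expects.

    Assumes hypothetical 'data_entity' and 'application' columns; the real
    workbook exported from iServer is structured differently.
    """
    nodes, links, seen = [], [], {}

    def node_id(name, kind):
        key = (name, kind)
        if key not in seen:
            seen[key] = len(nodes)
            nodes.append({"name": name, "group": kind})
        return seen[key]

    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            source = node_id(row["data_entity"], "entity")
            target = node_id(row["application"], "application")
            links.append({"source": source, "target": target})

    with open(json_path, "w", encoding="utf-8") as out:
        json.dump({"nodes": nodes, "links": links}, out, indent=2)
```

a small d3.js page could then fetch the resulting json file and draw the force graph from it.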
these were minor challenges, and the experience of working with the enterprise architects was overall an interesting and positive one for the library. our collaboration validated mckay and parker’s view that there is much crossover in the skillsets and mindsets of librarians and enterprise architects.19 both groups tended to work in systematic and analytical ways, which was helpful in removing some of the more emotive aspects that might have arisen through a more judgmental “assessment” process. the enterprise architects’ job was to promote conformance with standards that are aspirational in many respects for the library. however, the collaborative nature of the process and the immediate usefulness of its outputs helped us to approach this as an opportunity to improve our internal practices as well as the services that we offer to library customers. the architects observed in return that library staff were very open-minded about the process; this had not necessarily always been their experience with other groups in the university. one reason for this may have been lts’s efforts to communicate early with other library staff. before embarking on this work, we sent emails and provided verbal updates to all participants and their supervisors. these communications were clear about both the time commitment needed for workshops and surveys and also about the benefits we hoped to achieve. short-term impacts in the library domain the level of awareness and understanding in library technology services about ea concepts and methods is much higher than what it was previously. our capacity to self-identify architectural issues is better as a result and this is enabling us to be proactive rather than reactive. a recent example of this is a request from our solution architecture board (sab) to seek an exemption from our it advisory board (itab) for our proposed use of the niso circulation interchange protocol (ncip) to support interlibrary loan. while ncip is a niso standard that is widely used in libraries, it is not one of the integration mechanisms incorporated into the architecture standards. as a result of this request, we plan to develop a document for these it governance groups about all the library-specific data transfer protocols that we use; not just ncip, but also z39.50, the open archives initiative protocol for metadata harvesting (oai-pmh), the edifact standard for transferring purchasing information, and possibly others. it is in our interests to educate these important governance groups about integration methods commonly used in the library environment, since these are not well understood outside of our team. the baseline as-is application architecture diagram gives us a much better grasp on the complexity we are faced with. understanding this complexity is a prerequisite to controlling it. the diagram, and the process worked through to populate it, makes it easier to identify manual processes that should be automated and integrations that might be done more efficiently or effectively. for example, like most libraries, we still have many scheduled batch processes that we could potentially replace in the future with web services to provide real-time updates. information technology and libraries | december 2018 43 the iserver platform is now an important source of data to support our decision-making, in terms of arriving at broad recommendations for replacing, reimplementing, or optimizing our systems as well as highlighting specific areas of concern. 
importantly, the process produced relative results, so that we can see across our application portfolio which systems are underperforming compared to others. this makes it easier to determine where the team should be putting its efforts and highlights areas where firmer approaches to vendor management could be applied. a practical example of this was our decision in late 2017 to review (and ultimately unbundle and replace) an e-journal statistics module that was underperforming compared to other modules within the same suite. the outputs from this process are also helping library technology services communicate, both within our own team and also with other stakeholders. the results of the application maturity assessment were included as part of a business case seeking project funding to upgrade our library management system and replace our interlibrary loans system. that funding bid was successful. while it is possible that the business case would have been approved regardless, a recommendation from the architects that the system needed to be replaced was likely more persuasive than the same recommendation coming solely from a library perspective. in our organizational context, enterprise architects are trusted by very senior executives; they are perceived as neutral and objective, and the processes that they use are understood to be systematic and data-driven. longer-term impacts in an enterprise context there are a number of longer-term impacts that may arise from this work. seeing the library’s applications in a broader enterprise context is likely to lead to more questioning of the status quo and to a desire to investigate new ways to do things. in large organizations like universities, available enterprise systems can offer better functionality and more standardized ways of operating than library systems. financial systems are an obvious example, as are business intelligence tools. the canned and custom reports and dashboards within library systems meet a narrow set of requirements, but do not compare well for increasingly complex analytics when compared to enterprise data warehousing, emerging “data lake” technologies for less structured data, and sophisticated reporting tools. an enterprise approach also highlights where the same process is being done across different systems. for example, oai-pmh harvesting is a feature of multiple systems at griffith. traditionally each system provides its own feeds. our data repository, publications repository, and researcher profile system all provide oai-pmh harvesting endpoints for sending metadata to different aggregators. an alternative solution to explore could be to harvest all publications data from multiple systems into our corporate data warehouse (particularly if this evolved to provide more linked data functionality) and provide a single oai-pmh endpoint that could then be managed as a single service. the ea process has further raised our already high level of concern with the current library systems market. there has been a move in recent years towards larger, highly-integrated “black box” solutions. while there have been some moves towards openness, for example through the development of apis, these are often rhetorical rather than practical. the pricing structures for products mean that we continue to pay for functionality that would not be required if we could integrate library applications with non-library enterprise tools in smarter ways. 
at griffith, the products that scored most highly in our maturity assessment in terms of business and technical fit were the less expensive, lightweight, browser-based, cloud-native tools designed to do one or two things really well. this suggests that strategies around a more loosely coupled microservices approach, such as that being developed through the folio open source library software initiative, will be worth exploring in future.20 conclusion there are few documented examples of librarians working closely with enterprise architects in higher education or elsewhere. the goal of this case study is to encourage other librarians to learn more about architects’ work practices and to seek opportunities to apply ea methods in the library systems space for the benefit not just of the library but also for the organization as a whole. as a single institution case study, the applicability of this work may be limited in other environments. griffith has a long tradition of highly converged library and it operations; other organizations may have more structural barriers to entry if the library and it areas are not as naturally cooperative. a further obvious limitation relates to resourcing. the author of the cisti case study cautions that getting started in ea can be complex and resource-intensive. few libraries are likely to be in the position of cisti in having dedicated library architects, so working with others will be required. in many universities, work of this nature is outsourced to specialist consultants because of a lack of in-house expertise. at griffith university, we conducted this exercise entirely with in-house staff. a downside of this was that, despite our best efforts at the scoping stage, competing priorities in both areas meant that this work took far longer than we expected. in theory, external consultants could have guided the library through similar activities to produce similar outputs, and probably in a shorter timeframe. however, we would observe that the process has been just as important as the outputs; the knowledge, skills, and relationships that have been built will continue into the future. at cisti, investments in ea were assessed by the library as justified by the improvements in technology capability, strategic planning, and services to library users. the griffith experience validates this perspective. it is also important to note that ea work can and should be done in an iterative way. our experience suggests that some outputs can be delivered earlier than others and useful insights can be gleaned even from drafts. our local “ecosystem” of library applications, enterprise applications, and integrations between these different components must respond to changes in technologies; legal and regulatory frameworks; institutional policies and procedures; and other factors. it is therefore unrealistic to expect outputs from a process like this to remain current for long. assuming that the library’s data and application architecture will always be a work-in-progress, it will continue to be worth the effort involved to build and maintain positive working relationships with the enterprise architects, who now have a deeper understanding of who we are and what we do.
acknowledgements thank you to anna pegg, associate it architect; jolyon suthers, senior enterprise architect; colin morris, solution consultant; the library technology services team; all our library and learning services colleagues who participated in this initiative; and joanna richardson, library strategy advisor, for support and feedback during the writing of this article. this work was previously presented at theta (the higher education technology agenda) 2017, auckland, new zealand.
references
1 griffith university, "griffith digital strategy 2020," 2016, https://www.griffith.edu.au/__data/assets/pdf_file/0026/365561/griffithuniversity-digital-strategy.pdf.
2 the open group, "togaf®, an open group standard," accessed june 4, 2018, http://www.opengroup.org/subjectareas/enterprise/togaf.
3 federation of enterprise architecture professional associations, "a common perspective on enterprise architecture," 2013, http://feapo.org/wp-content/uploads/2013/11/common-perspectives-on-enterprise-architecture-v15.pdf.
4 judith pirani, "manage today's it complexities with an enterprise architecture practice," educause review, february 16, 2017, https://er.educause.edu/blogs/2017/2/manage-todays-it-complexities-with-an-enterprise-architecture-practice.
5 stephen kevin anthony, "implementing service oriented architecture at the canada institute for scientific and technical information," the serials librarian 55, no. 1–2 (july 3, 2008): 235–53, https://doi.org/10.1080/03615260801970907.
6 kristiina hormia-poutanen, "the finnish national digital library: national library of finland developing a national infrastructure in collaboration with libraries, archives and museums," accessed march 24, 2018, http://travesia.mcu.es/portalnb/jspui/bitstream/10421/6683/1/fndl.pdf.
7 karl w. schornagel, "information technology strategic planning: a well-developed framework essential to support the library's current and future it needs. report no. 2008-pa-105," may 2, 2009, https://web.archive.org/web/20090502092325/https://www.loc.gov/about/oig/reports/2009/final%20it%20strategic%20planning%20report%20mar%202009.pdf.
8 joel willemssen, "information technology: library of congress needs to implement recommendations to address management," december 2, 2015, https://www.gao.gov/assets/680/673955.pdf.
9 rebecca parker and dana mckay, "it's the end of the world as we know it . . . or is it? looking beyond the new librarianship paradigm," in marketing and outreach for the academic library, ed. bradford lee eden (lanham, md: rowman and littlefield, 2016): 81–106.
10 new south wales state archives and records authority, "recordkeeping in brief 59—an introduction to enterprise architecture for records managers," 2011, https://web.archive.org/web/20120502184420/https://www.records.nsw.gov.au/recordkeeping/government-recordkeeping-manual/guidance/recordkeeping-in-brief/recordkeeping-in-brief-59-an-introduction-to-enterprise-architecture-for-records-managers.
11 anthony, "implementing service oriented architecture," 236–37.
12 jolyon suthers, "information and technology architecture," 2016, accessed april 6, 2018, https://www.caudit.edu.au/system/files/media%20library/resources%20and%20files/communities/enterprise%20architecture/ea2016%20joylon%20suthers%20caudit%20ea%20symposium%202016%20-%20it%20architecture%20v2_0.pdf.
13 the open group, "togaf® 9.1," 2011, 2018, http://pubs.opengroup.org/architecture/togaf9-doc/arch/index.html, part 1 introduction, section 2: core concepts.
14 orbus software, "iserver for enterprise architecture," accessed march 26, 2018, https://www.orbussoftware.com/enterprise-architecture/capabilities/.
15 ebsco, "gobi®," accessed june 5, 2018, https://gobi.ebsco.com/gobi.
16 google scholar, "google scholar support for libraries," accessed june 5, 2018, https://scholar.google.com/intl/en/scholar/libraries.html.
17 national library of australia, "trove," accessed june 5, 2018, https://trove.nla.gov.au/; australian national data service, "research data australia," accessed june 5, 2018, https://researchdata.ands.org.au/.
18 mike bostock, "d3.js—data-driven documents," accessed april 3, 2018, https://d3js.org/.
19 parker and mckay, "it's the end of the world," 88.
20 marshall breeding, "five key technology trends for 2018," computers in libraries 37, no. 10 (december 2017), http://www.infotoday.com/cilmag/dec17/breeding--five-key-technology-trends-for-2018.shtml.
solving seo issues in dspace-based digital repositories: a case study and assessment of worldwide repositories
matúš formanek
information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.12529
matúš formanek (matus.formanek@fhv.uniza.sk) is assistant professor in the department of mediamatics and cultural heritage, faculty of humanities, university of zilina, slovakia. © 2021.
abstract this paper discusses the importance of search engine optimization (seo) for digital repositories. we first describe the importance of seo in the academic environment. online systems, such as institutional digital repositories, are established and used to disseminate scientific information. next, we present a case study of our own institution's dspace repository, performing several seo tests and identifying potential seo issues with a group of three independent audit tools. in this case study, we attempt to resolve most of the seo problems that appeared within our research and propose solutions to them. after making the necessary adjustments, we were able to improve the quality of seo variables by more than 59% compared to the non-optimized state (a fresh installation of dspace). finally, we apply the same software audit tools to a sample of dspace-based institutional repositories worldwide. in the discussion, we compare the seo results of this sample with the average score of the semi-optimized dspace repository (from the case study) and draw conclusions.
introduction and state of the art search engine optimization (seo) is a crucial part of the academic electronic environment.
its users must process large amounts of information and need to retrieve it quickly and effectively. making academic information findable is essential. digital institutional repository systems, used to disseminate scientific information, must present their content in ways that make it easy for researchers elsewhere to find. in this paper, we describe work conducted in the department of mediamatics and cultural heritage at the faculty of humanities, university of zilina, to improve the discoverability of materials contained within its dspace institutional repository. in the literature review, we examine definitions of website quality and discuss audit tools. then, beginning our case study, we describe the tools applied at our institution. we next describe the selection process of a suitable set of testing tools, focused on the optimization of seo variables of the selected institutional repository running on dspace software, that will be applied later in the case study. the remainder of the article focuses on the identification and resolution of potential seo issues using the three independent online tools we selected. we aim to resolve as many problems as possible and compare the level of achieved improvement with the default installation of dspace 6.3 software, which our digital repository is based on. the primary goal is not only to improve the seo parameters of the discussed system but also to increase the searchability of scientific website content disseminated by dspace-based digital repositories. next, we offer insights into worldwide dspace-based repositories. we will show that dspace is currently one of the most widely used software packages to support and run digital repositories. unfortunately, there are many major seo issues that will be discussed later. the secondary objective of this paper is to use the same set of tools to evaluate the current state of a sample of worldwide digital repositories also based on dspace. we will report on our own findings. in the discussion, the seo score of the optimized dspace (from the case study) will be compared with the current state of seo parameters across the worldwide dspace repositories. finally, our work also introduces several relatively novel approaches related to digital repositories that have not yet been extensively discussed in the literature. literature review to achieve our goal, we started with a review of existing academic papers. drawing from those papers, we describe the current state of academic institutions' presentation through the internet and search engines. in this sense, we focus on website optimization. the internet, as a medium, is still rapidly expanding. a massive amount of data is communicated, shared, and available online, as noted by christos ziakos: as a result, billions of websites were created, which made it hard for the average (or even advanced) user to extract useful information from the web efficiently for a specific search. the need for an easier, more efficient way to search for information led to the development of search engines. gradually, search engines began to assess the relevance of every website on their indexes compared to the queries provided to them by the users.
they took into consideration several website characteristics and metrics and calculated the value of each website using complex algorithms. the enormous number of websites being indexed from search engines, along with the increasing competition for the first search results, led to studying and implementing various techniques in order for websites to appear more valuable in search engines.1 that description applies to academic websites as well as to commercial ones. a review of relevant literature suggests that it is very important for academic institutions to carefully consider and apply website optimization. there were around 28,000 universities worldwide in 2010, according to one study that monitored research in the field of worldwide academic webometrics.2 the actual number of universities seems to be very similar in 2020. baka and leyni affirm in their working paper that the success or failure of an academic institution depends on its website: "the work of each university exists only when it encounters and interacts with society. their popularity with the public is steadily growing." this is directly connected with the institution's presence on the world wide web.3 many authors define the term search engine optimization (seo) as a series of processes that are conducted systematically to improve the volume and quality of traffic from search engines to a specific site by utilizing the working mechanism or algorithm of the search engine. it is a technique of optimizing a website's structure and content to achieve a higher position in search results. the aim is to increase the website's ranking in web search results.4 after an extensive search of the relevant literature, we can conclude that although seo is currently a widely discussed topic, there is very little accessible scientific literature related to seo applications in the field of digital repositories in general, and none at all in the particular subset of dspace-based repositories. website quality many authors generally affirm that there is a positive correlation between academic excellence and the complex web presence of an institution. this suggests that website quality is a factor with a predictive or causal relationship to seo performance.5 numerous tools can be employed to measure the quality of websites, test them closely, and produce an seo performance ranking reflecting websites' ability to properly promote their content through the search engines. for example, the academic ranking of world universities (the shanghai ranking, http://www.shanghairanking.com) has been established for the top 1,000 universities in the world. website quality is considered by the authors as the quality of an institution's online presence, its ability to properly promote digital content in search engines, and, in combination, its overall web presence. according to the shanghai ranking list, this is a factor for some "prospective students to decide on whether they will enroll in a specific institute or not."6 a number of recent studies have also attempted to examine the online presence of academic institutions from various points of view.
one of the older studies mentioned that the quality of academic websites is very important for students in the process of enrollment.7 another key aspect is optimized website performance, along with seo and website security.8 audit tools if we want to perform any optimization, we need an appropriate software tool to check a current website's ranking. according to g2, the world's largest technology online marketplace, seo software is designed to improve the ranking of websites in search engine results pages without paying the search engine provider for placement. these tools provide seo insights to companies through a variety of different features, helping identify the best strategies to improve a website's search relevance.9 seo audit software can be used by seo specialists as well as by system administrators. audit software performs one or more of the following functions in relation to seo: content optimization, keyword research, rank tracking, link building, or backlink monitoring. the software then provides reports on the optimization-related metrics.10 many authors stress the importance of a holistic approach to seo factors (24 factors were tested), while noting that success depends most on the most effective ones: for example, the quantity and quality of backlinks, the ssl certificate, and so on, which will be described later in this paper.11 the quality of academic websites is very important for researchers, too. they need to disseminate scientific information and communicate it in effective ways. according to some authors, the topic of academic seo (aseo) has been gaining attention in recent years.12 aseo applies seo principles to the search for academic documents in academic search engines such as google scholar and microsoft academic. in another scientific paper, aseo is considered very similar to traditional seo, in that institutions want to make good use of seo to promote digital scientific content on the internet. beel, gipp, and wilde emphasize the importance for researchers to ensure that their publications will receive a high rank on academic search engines.13 by making good use of aseo, researchers will have a higher chance of improving the visibility of their publications and having their work read and cited by more researchers. in recent years, digital institutional repositories, as academic systems, have become a modern way of promoting and disseminating digital scientific objects through the internet. digital objects need to reach a wider audience: digital repositories have a website interface, interact with students, teachers, or researchers on a daily basis, and hold citations, articles, theses, and other research objects. institutional repositories are affected by search engines too, so some improvements to repositories' seo parameters are needed. these factors contribute to a system's rankings. seo on institutional repositories is not an entirely new scientific topic. kelly stressed eight years ago that google is critical in driving traffic to repositories. he analyzed results from a survey summarizing seo findings for 24 institutional repositories in the united kingdom.
the survey results showed that referring platforms were primarily responsible for driving traffic to those institutional repositories, thanks to many hypertext links in referring domains.14 since then, seo analyses of digital repositories have not been a widely discussed topic in the literature. discussing seo for a specific type of digital repository software is therefore a relatively novel topic; here we focus on dspace, the most widely used and most popular software for running digital libraries and repositories.15 consequently, this paper focuses on that topic, since a dspace-based digital repository is a complex online system in which some seo parameters can be adjusted. seo audit tools help identify potential adjustments to those website properties that could produce higher rankings in search engines (and improve the visibility of the whole system). audit tools selection process website variables that affect seo can be tested using specialized online software tools. this topic is discussed in detail on a semi-professional level on specialized websites that provide a number of recommendations regarding the use of specific tools as well as evaluations of the tools.16 these tools can keep track of changes in many seo variables. we want to use this approach in our study. however, first we need to choose an appropriate set of these tools. we have found that many seo audit tools mentioned in professional online sources are narrowly specialized.17 for example, they may be focused only on keyword analysis, backlink analysis (for example, ahrefs' free backlink checker), and so on. in our study, we intend to describe a greater number of seo parameters to monitor rather than emphasize only a few selected ones. we also need tools that are fully available online for free. based on these criteria, we immediately excluded several tools from the selection, because they provide only sparse, simplistic, or restricted information. many tools were excluded because they were limited to a single test with the requirement of registration or provision of an email address. a number of testing tools were also available only in paid versions. we wanted a set of tools that focus on several aspects of seo analysis and evaluate the quality of websites' seo variables comprehensively. it is important to add that the selected tools' results must be comparable, too. after careful consideration of all possibilities, we finally decided to choose three independent seo audit tools in order to make the approach more transparent. the selected tools met most of the criteria mentioned above. however, it is very important to note that many other software tools surely meet the criteria and could also be suitable for testing purposes. based on the scientific literature review, we were not able to identify specific recommendations in this regard; therefore, we have been guided by the advice offered in the websites and blogs previously mentioned that are focused primarily on seo. our tool selection is as follows (listed in alphabetical order): 1. seo checker (https://suite.seotesteronline.com/seo-checker) is part of a complex audit software suite called seo tester online suite. seo checker provides tests in the following categories: base, content, speed, and connections to social media.
it tracks, among many other parameters, title coherence, text/code ratio, accessibility of microdata, opengraph metadata, social plugins, in-page and off-page links, quality of links, mobile friendliness of the page, and many other seo and technical website attributes. regarding restrictions, only two sites can be tested within a 24-hour period. the limit increases to four sites per day after free registration with a valid email address. moreover, there is a 14-day trial period during which all hidden functionalities work. in the free version that we used, a complete report can only be viewed, not downloaded or saved. 2. seo site checkup (https://seositecheckup.com/) was selected based on many positive recommendations from the technically oriented expert website traffic radius.18 seo site checkup is described as "a great seo tool that offers more than 40 checks in 6 different categories (common seo issues like missing metadata, keywords, issues related with absence of connections to social media, semantic web, etc.) to serve up a comprehensive report that you can use to improve results and the website's organic traffic. it also gives recommendations to fix critical issues in just a few minutes. as a tool, it is very fast and provides in-depth information about the various seo opportunities and accurate results."19 seo site checkup is also ranked number one among audit tools by the geekflare website.20 another reason we selected this tool for our testing scenario is that the google search engine offers a link to it as the first result (excluding paid links) after entering the search query "seo testing tool". seo site checkup is also the fastest of the selected audit tools, which can be considered another advantage. its disadvantages include the ability to test only one website within 24 hours from one public ip address. 3. woorank (https://woorank.com) is recommended by traffic radius: "woorank offers an in-depth analysis that covers the performance of existing seo strategies, social media and more. the comprehensive report analysis is classified into eight sections for improved readability quotient, and you may also download the report as branded pdf."21 woorank occupies the third position among the recommended software tools. trustradius gives it a score of 9.2 out of 10, and users rate it 4.67 out of 5 stars based on 51 reviews.22 some results are hidden in the free version, but the final score is shown. woorank has no limit on the number of websites tested per day, but it is the slowest of the selected testing tools. we selected these three seo audit tools because they work independently, their results are comparable to each other, and they offer a quick way to get comprehensive seo analysis results for a tested site. it should be noted that the results of some tests are hidden, but general guidance on how to fix some issues is provided. however, the solution always depends on the specific site and the technology used. using three different tools adds objectivity because we do not rely on just one tool and a one-sided view of the seo issue. the three selected testers all display results in the same way: test results are always shown as a summarized score in the range of 0 to 100 points (100 represents the best result).
a very large set of seo parameters and technical website properties is evaluated in all three cases. these tests are usually divided into several categories (for example, common seo issues, performance, security issues, and social media integration). although similar parameters are assessed in all three audit tools, there are still some differences between them. each testing tool is unique in a certain area, because it tests parameters that the others do not cover or evaluates a website with a different methodology. still, the fact remains that the evaluated seo parameters overlap between the tools. we will not overload this paper with technical details of the individual tests, because they can easily be found on the websites of the given tools (seo site checkup, seo checker online, woorank). we will just mention the common core of main tests: css minification test, favicon test, google search results preview test, google analytics test, h1 heading tags test, html page size test, image alt test, javascript minification test, javascript error test, keywords usage test, meta description test, meta title test, seo friendly url test, sitemap test, social media test, robots.txt test, url canonicalization test, and url redirects test. another group consists of tests specific to a particular audit tool. thanks to them, we get a more comprehensive view of the tested area of a website's seo characteristics. for example, seo checker features the following specific tests: title coherence test, unique key words test, h1 coherence test, h2 heading tags test, and facebook popularity test. woorank, as the second tool, extends the basic set of tests with the following: title tag length test, in-page links test, off-page links test, language test, twitter account test, instagram account test, traffic estimations, and traffic rank. of course, there is also a set of tests shared by two of the audit tools but not covered by the third, which is specialized in another area. as we have mentioned, the tools offer a list of suggestions for potential improvement of seo characteristics. the user is informed about an issue, but no instructions or solutions are provided on how to resolve it. the main benefit of this paper lies in its objective of solving specific seo issues. this work may improve the visibility and searchability of dspace-based institutional repositories. the set of three audit tools described above will be used in the following section. we attempt to identify possible seo issues of the selected institutional repository in the form of a case study. we then aim to fix the identified seo issues and improve the quality of its seo parameters, as well as demonstrate the potential impact of the repairs on website traffic. all traffic measurements will be based on google analytics data. the institutional repository of the department of mediamatics and cultural heritage (seo case study) background information an older version of our digital repository (based on dspace v5.5) was launched by the department of cultural heritage and mediamatics in april 2017. now, in 2021, the repository makes available online over 180 digital objects, most of them open access under creative commons licenses.
the first attempts to create and establish a similar virtual space for digital objects started long ago. several software solutions had been tested for this purpose, for example invenio and eprints, along with dspace. according to opendoar's statistics, eprints and dspace have always been the most popular tools for running digital repositories.23 a few years ago, dspace was chosen as the primary software for running a digital repository. since then, our usage of open-source software has been growing. for example, ubuntu server lts (long term support) is used as the operating system, tomcat 8 is used as the web server, postgresql assumes the role of the database system, etc. all of those software components are part of a complex digital system and are orchestrated in a virtual environment that is built on an open-source virtualization solution called xcp-ng (in version 8.2). some software components have been switched for others during the development period. based on our experience, the digital repository's regular visitors were mostly staff and students of the department. we initially did not feel a need to improve the visibility of this system to search engines, an oversight that turned out to be a mistake in the long run. we did not perform any search engine optimization on this repository until november 2019, when we coincidentally discovered several scientific articles dealing with seo in the academic environment. after studying the theoretical background, we initiated the practical application process. we applied theory and our experience with dspace software to an seo troubleshooting process within our local repository. most of the optimizing actions related to solving the major seo issues were performed before november 10, 2019. we will describe the seo adjustments we made and derive a list of recommendations for other institutions based on our own experience. initial testing of a clean dspace 6 installation in order to formulate any recommendations related to seo and the administration of dspace digital repositories, it is important to determine and test a starting point. for this purpose, we chose a clean instance of dspace v6.3 with an xml user interface (xmlui), the latest commonly available stable version. this is the same version that we use in this case study and in our production environment. (a newer version, dspace 7 beta 4, was released by atmire on october 13, 2020.)24 no other customization edits were made except a base configuration and necessary url settings.
these notes were retrieved by reports on results. we have used the prefix semiin the last column because we were not able to resolve all detected seo issues—only most of them. all related reasons will be described briefly in the discussion section. when the improving change between states has been made, we have changed a status pictogram (from the red cross to the green correct tick) and set the row color to yellow. the changes leading to improvement (e.g., the yellow rows) will be discussed in detail later, too. recall that we have no need to overload the main text of this paper with detailed technical information about partial tests, because it can be easily found on the websites of the given test tools. table 1 shows the compared results between the non-optimized and semi-optimized states of the dspace repository. based on table 1, the default instance of dspace with basic http and other information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 8 default settings received only 58 points out of 100 in seo site checkup, 50.1 points in seo checker and 32 points in woorank. the average final score is 46.7 points out of 100. although this gained score could be considered as low, the dspace default instance still meets certain basic criteria of seo. in addition, many repository administrators usually do not rely only on a default installation, but they make at least some changes in configuration immediately after the initial installation. inter alia, the first thing to do should be an implementation of https protocol, adding a connection with google analytics services and so on. the improved state is shown in the last column of table 1. whenever we solved an issue, the overall score raised. the semi-optimized repository has obtained a higher score compared to the previous column (default installation). the last column represents the final (however semioptimized) state of technical and seo attributes which we were able to reach at this moment. as shown, many seo issues have been solved. we highlighted them in yellow. on the one hand, some issues remain unsolved. on the other hand, the overall seo improvement is more than noticeable although the final average gained score has not reached the maximum value (100 points). information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 9 table 1. comparison of results between the non-optimized and semi-optimized states of dspace repository. test name state default installation (before optimization) semi-optimized (after a few optimization steps) meta title test, title tag length the title tag is set, but the meta title of the webpage (dspace home) has a length of 11 characters. it is too low. the title tag has been set to “digitálny repozitár katedry mediamatiky a kultúrneho dedičstva” (note: in slovak language). title coherence test the keywords in the title tag are included in the body of the page the title of the page seems optimized. meta description test no meta-description tag is set. meta-description tag has been set. (121 characters) google search results preview test “dspace home” is too general. the title of the page has been changed. keywords usage test the keywords are not included in title and meta-description tags. a set of appropriate keywords has been added. unique key words test the textual content is not optimized on the page. there is an excellent concentration of keywords in the page. 
this page includes 382 words of which 58 are unique. h1 heading tags test 8 h1 tags, 6 h2 tags the h1 tags of the page seem not to be optimized. there are too many h1 tags. h1 coherence test the keywords present in the tag h1 are included in the body of the page. some of the keywords of the tag h1 are not included in the body of the page. h2 heading tags test the keywords present in the tag

    are included in the body of page. information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 10 test name state default installation (before optimization) semi-optimized (after a few optimization steps) language test detected: slovak declared: missing a missed language tag has been implemented. robots.txt test no “robots.txt” file has been found. “robots.txt” file has been enabled. sitemap test no sitemap has been found. sitemap has been enabled. seo friendly url test webpage contains urls that are not seo friendly! webpage contains urls that are not seo friendly. image alt test the webpage does not use “img” tags. it is optimized. inline css test the webpage uses inline css styles. the webpage uses inline css styles. deprecated html tags test the webpage does not use html deprecated tags. google analytics (ga) test ga is not in use. ga has been implemented. favicon test default dspace favicon is used. the favicon has been customized. js error test no severe javascript errors were detected. no severe javascript errors were detected. social media test no connection with social media has been detected. the website is successfully connected with social media (using facebook). facebook account test information about facebook page has been added by schema.org metadata. facebook popularity (low) the webpage is promoted enough on facebook. information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 11 test name state default installation (before optimization) semi-optimized (after a few optimization steps) twitter account test no connection with twitter has been detected. information about twitter account has been added by schema.org metadata. twittercard test no twittercard is implemented. metainformation about twittercard has been added by opengraph metadata. instagram account test no connection with instagram has been detected. information about instagram account has been added by schema.org metadata. microdata (opengraph, schema.org) test there is no microdata or opengraph/schema.org metadata on the website. some opengraph and schema.org matadata has been added. html page size test the size of the page is excellent. (23.65 kb) the size of the page is excellent. (28.84 kb) text/code ratio test 10.71% (excellent) 15.45% (excellent) html compression/gzip (no compression is enabled) the size of html could be reduced up to 79%. the webpage is successfully compressed using gzip compression on your code. your html is compressed with 78% size savings. site loading speed test loading time is around 1.86s loading time is around 2.39s page objects test the webpage has fewer than 20 http requests. the webpage has fewer than 20 http requests. page cache test (server-side caching) the pages are not cached. the pages are not cached. flash test website does not include flash objects. information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 12 test name state default installation (before optimization) semi-optimized (after a few optimization steps) cdn usage test your webpage is not serving all resources (images, javascript and css) from cdns. your webpage is not serving all resources (images, javascript and css) from cdns. image, javascript, css caching tests data are not cached. data are not cached. javascript minification test javascripts are not minified. javascript files’ minification has been enabled in tomcat configuration. 
css minification test. before and after: some of the webpage's css resources are not minified.
nested tables test: the webpage does not use nested tables.
frameset test: the webpage does not use frames.
doctype test: the website has a valid doctype declaration.
url redirects test: 1 url redirect has been detected; it is acceptable.
url canonicalization test: the webpage urls are not canonicalized; https://repozitar.kmkd.uniza.sk/xmlui and https://www.repozitar.kmkd.uniza.sk/xmlui should resolve to the same url, but currently do not.
canonical tag test. before: no canonical tag has been detected. after: the webpage is using a canonical link tag.
https test. before: the website is not ssl secured. after: https has been implemented.
safe browsing test: no malware or phishing activity found.
server signature test: the server self-signature for https is off.
directory browsing test: the server has disabled directory browsing.
plaintext emails test: the webpage does not include email addresses in plain text.
mobile friendliness test (includes tap targets, no plugin content, font size legibility, mobile viewport): the webpage is optimized for mobile visitors.
seo site checkup final score. before: 58/100. after: 81/100.
seo checker online final score. before: 50.1/100. after: 78.0/100.
woorank final score. before: 32/100. after: 65/100.
average final score. before: 46.7/100. after: 74.66/100.
resolving major seo issues this section will look at how we resolved the major seo issues that the tools detected. this is the key technical part, because most of the issues highlighted in table 1 were solved and are described here. the following technical and seo adjustments were implemented and tested in order to improve the average final score by 59.87% (by 27.96 points, from 46.7 to 74.66 points), comparing the fresh installation of dspace against the semi-optimized one. all the following solution procedures are based on our own experience, experiments, and research carried out in the area of digital repositories and their optimization as virtual spaces. during the solving process, we follow the order of issues stated in table 1 and describe them in more detail in the dspace v6.3 environment with the xml user interface (xmlui). the following procedures may differ slightly if you are using a different version of dspace or another graphical interface (for example, jspui). examples of code are given in monospaced font. title, description, and keywords tags in a website header this criterion requires filling in specific metadata (meta-content) fields in the page's html code. search engines process them automatically to find out what the website is about. to solve these seo issues, change the website title (by default "dspace home") located in the language translation config file at /dspace/webapps/xmlui/i18n/messages_en.xml. find the appropriate key and change the value. all content in this file is fully customizable.
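as a concrete illustration, the relevant entry in messages_en.xml might look something like the following sketch. the key name xmlui.general.dspace_home is our assumption based on the default "dspace home" value and may differ between dspace versions; the title shown is the one reported in table 1.

    <!-- /dspace/webapps/xmlui/i18n/messages_en.xml (illustrative sketch) -->
    <!-- locate the entry whose default value is "DSpace Home" and replace the value -->
    <message key="xmlui.general.dspace_home">digitálny repozitár katedry mediamatiky a kultúrneho dedičstva</message>

after saving the file, the xmlui webapp (or the whole tomcat service) usually needs to be reloaded before the new title appears.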
next, edit dspace’s page structure config file (in path /themes/mirage/lib/xsl/core/page-structure.xsl) in order to add the metadata content: • a meta-description tag • a keywords tag • an author tag with a carefully selected content and length just below the main tag, as shown in the example: note: do not forget the termination characters />. the keywords should be included in title and meta–description tags. several other seo parameters are affected by performing those steps, for example, google search results preview test, keywords usage test, unique key words test and keywords concentration test. language declaration the language declaration is very important for search engines to identify the primary language of the website content. if a declared language is missing in a website, you can define it by adding the following line into the page-structure.xsl file (the process is similar to adding keywords and description tag as explained above). edit the page-structure.xsl file (with vim or another text editor, for example) and add a statement like the following above the main tag: note: “sk” is an abbreviation for “slovak language” as stated in w3 namespaces. more information is available at https://www.w3.org/tr/xml/ . google analytics, robots.txt and sitemap implementation the connection between a website and google analytics services enables google analytics to track users’ behavior and understand as they interact with this site. it is the basis of web analysis. the “robots.txt” and “sitemap.xml” files are simple text files which are required for search engines to specify the website structure and additional information about it. https://www.w3.org/tr/xml/ information technology and libraries march 2021 solving seo issues in dspace-based digital repositories | formanek 15 to enable google analytics services, insert a ua code identifier (id is a string), obtained from google analytics, into the dspace.cfg config files located in the dspace home folder. in that file find the key/row named “xmlui.google.analytics.key=” and insert the corresponding ua identifier there. next, it is needed to uncomment the row with the key “xmlui.controlpanel.activity.max = 250” in the same “dspace.cfg” file. finally, uncomment the row below in the “xmlui.xconf”file located in the path /dspace/config/ and restart the tomcat service: the “robots.txt” file is commonly used and enabled in dspace, but many seo audit tools are not able to detect it successfully because this file is located in path other than the expected default one. to enable robots.txt file detection, copy the file /dspace/webapps/xmlui/static/robots.txt to the root of the tomcat folder (usually located in path /var/lib/tomcat8/webapps/root). finally, restart the tomcat web service. a sitemap for a currently running dspace instance is available in the “robots.txt” file mentioned above. edit this file and set an appropriate url for the sitemap location. enabling connections with social media this criterion detects a hyperlink (or other metadata) connection between a website and popular social media, such as facebook, twitter, etc. the primary goal is to promote the digital content. this subsection deals with social media connections with a dspace-based repository. a simple creation of a profile or a site on a social network related to a repository is considered an essential example of good practice. however, an appropriate form of connection between sites must be created, too. 
naturally, further endorsement of this system through social networks is another key step. social media-oriented tests are performed by every seo audit tool nowadays. a detected connection with social media can have a big impact on the site's popularity, as well as on the final seo score. there are many ways to establish these connections. a connection with facebook, instagram, or twitter can be as simple as a direct link from the homepage: to add a link to a facebook site profile, edit the page-structure file (/dspace/webapps/xmlui/themes/mirage/lib/xsl/core/page-structure.xsl) just below the div tag with id "ds-footer-wrapper", for example with a link element like the one sketched below.
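a minimal sketch of such a footer link follows; the facebook url and the wrapping div id are placeholders, so only the idea of a plain outbound link (which the audit tools can detect) is illustrated, not the exact markup used in our theme.

    <!-- illustrative fragment for page-structure.xsl, placed just below the div with id "ds-footer-wrapper" -->
    <div id="ds-social-links">
        <!-- the profile url below is a placeholder; use the repository's real facebook page -->
        <a href="https://www.facebook.com/example-repository-page" target="_blank" rel="noopener">facebook</a>
    </div>

similar anchor elements (or the schema.org and opengraph metadata mentioned in table 1) can be added for twitter and instagram.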